Linear Algebra
A Modern Introduction

Second Edition

David Poole
Trent University

Thomson Brooks/Cole
Australia · Canada · Mexico · Singapore · Spain · United Kingdom · United States
Linear Algebra: A Modern Introduction, Second Edition
David Poole

Executive Publisher: Curt Hinrichs
Executive Editor: Jennifer Laugier
Editor: John-Paul Ramin
Assistant Editor: Stacy Green
Editorial Assistant: Leala Holloway
Technology Project Manager: Earl Perry
Marketing Manager: Tom Ziolkowski
Marketing Assistant: Erin Mitchell
Advertising Project Manager: Bryan Vann
Project Manager, Editorial Production: Kelsey McGee
Art Director: Vernon Boes
Print Buyer: Judy Inouye
Permissions Editor: Joohee Lee
Production Service: Matrix Productions
Text Designer: John Rokusek
Photo Research: Sarah Evertson/Image Quest
Copy Editor: Connie Day
Illustration: Scientific Illustrators
Cover Design: Kim Rokusek
Cover Images: Getty Images
Cover Printing, Printing and Binding: Transcontinental Printing/Louiseville
Compositor: Interactive Composition Corporation
© 2006 Thomson Brooks/Cole, a part of The Thomson Corporation. Thomson, the Star logo, and Brooks/Cole are trademarks used herein under license.

ALL RIGHTS RESERVED. No part of this work covered by the copyright hereon may be reproduced or used in any form or by any means ...

Printed in Canada
2 3 4 5 6 7  09 08 07 06 05

For more information about our products, contact us at:
Thomson Learning Academic Resource Center
1-800-423-0563

For permission to use material from this text or product, submit a request online at http://www.thomsonrights.com. Any additional questions about permissions can be submitted by email to thomsonrights@thomson.com.

Library of Congress Control Number: 2004111976
ISBN 0-534-99845-3
International Student Edition: ISBN 0-534-40596-7 (Not for sale in the United States)

Thomson Higher Education
10 Davis Drive
Belmont, CA 94002-3098
USA

Asia (including India)
Thomson Learning
5 Shenton Way #01-01
UIC Building
Singapore 068808

Australia/New Zealand
Thomson Learning Australia
102 Dodds Street
Southbank, Victoria 3006
Australia

Canada
Thomson Nelson
1120 Birchmount Road
Toronto, Ontario M1K 5G4
Canada

UK/Europe/Middle East/Africa
Thomson Learning
High Holborn House
50/51 Bedford Row
London WC1R 4LR
United Kingdom

Latin America
Thomson Learning
Seneca, 53
Colonia Polanco
11560 Mexico D.F.
Mexico

Spain (including Portugal)
Thomson Paraninfo
Calle Magallanes, 25
28015 Madrid, Spain
For Mary, my standard basis
Contents

Preface  vii
To the Instructor  xvii
To the Student  xxiii

Chapter 1  Vectors  1
    1.0  Introduction: The Racetrack Game  1
    1.1  The Geometry and Algebra of Vectors  3
    1.2  Length and Angle: The Dot Product  15
         Exploration: Vectors and Geometry  29
    1.3  Lines and Planes  31
         Exploration: The Cross Product  45
    1.4  Code Vectors and Modular Arithmetic  47
         Vignette: The Codabar System  55
    Chapter Review  56

Chapter 2  Systems of Linear Equations  58
    2.0  Introduction: Triviality  58
    2.1  Introduction to Systems of Linear Equations  59
         Exploration: Lies My Computer Told Me  66
    2.2  Direct Methods for Solving Linear Systems  68
         Exploration: Partial Pivoting  86
         Exploration: Counting Operations: An Introduction to the Analysis of Algorithms  87
    2.3  Spanning Sets and Linear Independence  90
    2.4  Applications  101
         Allocation of Resources  101
         Balancing Chemical Equations  103
         Network Analysis  104
         Electrical Networks  106
         Finite Linear Games  109
         Vignette: The Global Positioning System  119
    2.5  Iterative Methods for Solving Linear Systems  122
    Chapter Review  132

Chapter 3  Matrices  134
    3.0  Introduction: Matrices in Action  134
    3.1  Matrix Operations  136
    3.2  Matrix Algebra  152
    3.3  The Inverse of a Matrix
    3.4  The LU Factorization
    3.5  Subspaces, Basis, Dimension, and Rank  189
    3.6  Introduction to Linear Transformations  209
         Vignette: Robotics  214
    3.7  Applications
         Error-Correcting Codes  240
    Chapter Review  250

Chapter 4  Eigenvalues and Eigenvectors  252
    4.0  Introduction: A Dynamical System on Graphs  252
    4.1  Introduction to Eigenvalues and Eigenvectors  253
    4.2  Determinants  262
         Exploration: Geometric Applications of Determinants  283
    4.3  Eigenvalues and Eigenvectors of n × n Matrices  289
    4.4  Similarity and Diagonalization  298
    4.5  Iterative Methods for Computing Eigenvalues  308
    4.6  Applications and the Perron-Frobenius Theorem  322
         Markov Chains  322
         Population Growth  327
         The Perron-Frobenius Theorem  329
         Linear Recurrence Relations  332
         Systems of Linear Differential Equations  337
         Discrete Linear Dynamical Systems  345
         Vignette: Ranking Sports Teams and Searching the Internet  353
    Chapter Review  361

Chapter 5  Orthogonality  363
    5.0  Introduction: Shadows on a Wall  363
    5.1  Orthogonality in Rⁿ  365
    5.2  Orthogonal Complements and Orthogonal Projections  375
    5.3  The Gram-Schmidt Process and the QR Factorization  385
         Exploration: The Modified QR Factorization  393
         Exploration: Approximating Eigenvalues with the QR Algorithm  395
    5.4  Orthogonal Diagonalization of Symmetric Matrices  397
    5.5  Applications  405
         Dual Codes  405
         Quadratic Forms  411
         Graphing Quadratic Equations  418
    Chapter Review  429

Chapter 6  Vector Spaces  431
    6.0  Introduction: Fibonacci in (Vector) Space  431
    6.1  Vector Spaces and Subspaces  433
    6.2  Linear Independence, Basis, and Dimension  447
         Exploration: Magic Squares  464
    6.3  Change of Basis  467
    6.4  Linear Transformations  476
    6.5  The Kernel and Range of a Linear Transformation  485
    6.6  The Matrix of a Linear Transformation  501
         Exploration: Tilings, Lattices, and the Crystallographic Restriction  519
    6.7  Applications  522
         Homogeneous Linear Differential Equations  522
         Linear Codes  529
    Chapter Review  536

Chapter 7  Distance and Approximation  538
    7.0  Introduction: Taxicab Geometry  538
    7.1  Inner Product Spaces  540
         Exploration: Vectors and Matrices with Complex Entries  552
         Exploration: Geometric Inequalities and Optimization Problems  556
    7.2  Norms and Distance Functions  561
    7.3  Least Squares Approximation  577
    7.4  The Singular Value Decomposition  599
         Vignette: Digital Image Compression  616
    7.5  Applications  619
         Approximation of Functions  619
         Error-Correcting Codes  626
    Chapter Review  631

Appendix A  Mathematical Notation and Methods of Proof  634
Appendix B  Mathematical Induction  643
Appendix C  Complex Numbers  650
Appendix D  Polynomials  661

Answers to Selected Odd-Numbered Exercises  671
Index  706
Preface

The last thing one knows when writing a book is what to put first.
- Blaise Pascal, Pensées, 1670

See The College Mathematics Journal 24 (1993), 41-46.
I am pleased with the warm response of both teachers and students to the first edition of Linear Algebra: A Modern Introduction. In this second edition, I have tried to preserve the approach and features that users found appealing, while incorporating many of their suggested improvements. I want students to see linear algebra as an exciting subject and to appreciate its tremendous usefulness. At the same time, I want to help them master the basic concepts and techniques of linear algebra that they will need in other courses, both in mathematics and in other disciplines. I also want students to appreciate the interplay of theoretical, applied, and numerical mathematics that pervades the subject.
This book reflects my conviction that linear algebra is fundamentally about vectors and that students need to see vectors first (in a concrete setting) in order to gain some geometric insight. Moreover, introducing vectors early allows students to see how systems of linear equations arise naturally from geometric problems. Matrices then arise equally naturally as coefficient matrices of linear systems and as agents of change (linear transformations). This sets the stage for eigenvectors and orthogonal projections, both of which are best understood geometrically. The arrows that appear on the cover reflect my conviction that geometric understanding should precede computational techniques.
I have tried to limit the number of theorems in the text. For the most part, results labeled as theorems either will be used later in the text or summarize preceding work. Interesting results that are not central to the book have been included as exercises or explorations. For example, the cross product of vectors is discussed only in explorations (in Chapters 1 and 4). Unlike most linear algebra textbooks, this book has no chapter on determinants. The essential results are all in Section 4.2, with other interesting material contained in an exploration. The book is, however, comprehensive for an introductory text. Wherever possible, I have included elementary and accessible proofs of theorems in order to avoid having to say, "The proof of this result is beyond the scope of this text." The result is, I hope, a work that is self-contained.
I have not been stingy with the applications: There are many more in the book than can be covered in a single course. However, it is important that students see the impressive range of problems to which linear algebra can be applied. I have included some modern material on coding theory that is not normally found in an introductory linear algebra text. There are also several impressive real-world applications of linear algebra, presented as self-contained "vignettes."
I hope that instructors will enjoy teaching from this book. More important, I hope that students using the book will come away with an appreciation of the beauty, power, and tremendous utility of linear algebra and that they will have fun along the way.
New in the Second Edition

The overall structure and style of Linear Algebra: A Modern Introduction remain the same in the second edition. However, there are many places where changes or additions have been made. Some of these changes are the result of my using the book in my own courses and realizing that certain explanations or examples could be made clearer, or that an extra exercise here and there would help students understand the material. I have paid attention to comments made by my own students, to the remarks of reviewers of the first edition, and to suggestions sent to me by many users. As always, my first priority is the students who will use the book. To this end, there is much new material designed to stimulate students' enjoyment of linear algebra and to help them review concepts and techniques after each chapter. The second edition of Linear Algebra: A Modern Introduction also comes with a much wider range of ancillary materials that benefit both students and instructors. Here is a summary of the new material:
• Each chapter now concludes with a review section. Key definitions and concepts are listed here, with page references, and each chapter has a set of review questions, including ten true/false questions designed to test conceptual understanding.
• I have added five real-world applications of linear algebra, four of which are new. These "vignettes" ...
• Chapter 6 has a new introduction that approaches the Fibonacci sequence from a vector space point of view ("Fibonacci in (Vector) Space"). I think the new introduction does a much better job of setting up the material on general vector spaces that is the subject of this chapter. I have retained "Magic Squares" as an exploration following Section 6.2.
• A new exploration, "Vectors and Matrices with Complex Entries," has been added after Section 7.1. Here you will find an introduction to Hermitian, unitary, and normal matrices that can be used by instructors whose course includes complex linear algebra, or simply as an enrichment activity in any course.
• In an effort to make the biographical and historical snapshots more diverse, I have added biographies of Olga Taussky-Todd, for her fundamental contributions to matrix theory, and Jessie MacWilliams, a pioneer in coding theory.
• There are over 300 new or revised exercises.
• I have made numerous small changes in wording to improve clarity.
• The numbering scheme for theorems, examples, figures, and tables has been improved to make them easier to find when they are cited elsewhere in the book.
• The supplementary material on technology that appeared in the first edition as Appendix E has been removed, updated, and placed on the CD that now accompanies the book.
• A full range of ancillary materials accompanies the second edition of Linear Algebra: A Modern Introduction. These supplements are described below.
Features

Clear Writing Style

The text is written in a simple, direct, conversational style. As much as possible, I have used "mathematical English" rather than relying excessively on mathematical notation. However, all proofs that are given are fully rigorous, and Appendix A contains an introduction to mathematical notation for those who wish to streamline their own writing. Concrete examples almost always precede theorems, which are then followed by further examples and applications. This flow, from specific to general and back again, is consistent throughout the book.
Key Concepts Introduced Early

Many students encounter difficulty in linear algebra when the course moves from the computational (solving systems of linear equations, manipulating vectors and matrices) to the theoretical (spanning sets, linear independence, subspaces, basis, and dimension). This book introduces all of the key concepts of linear algebra early, in a concrete setting, before revisiting them in full generality. Vector concepts such as dot product, length, orthogonality, and projection are first discussed in Chapter 1 in the concrete setting of R² and R³ before the more general notions of inner product, norm, and orthogonal projection appear in Chapters 5 and 7. Similarly, spanning sets and linear independence are given a concrete treatment in Chapter 2 prior to their generalization to vector spaces in Chapter 6. The fundamental concepts of subspace, basis, and dimension appear first in Chapter 3 when the row, column, and null spaces of a matrix are introduced; it is not until Chapter 6 that these ideas are given a general treatment.
Emphasis on Vectors and Geometry

In keeping with the philosophy that linear algebra is primarily about vectors, this book stresses geometric intuition. Accordingly, the first chapter is about vectors, and it develops many concepts that will appear repeatedly throughout the text. Concepts such as orthogonality, projection, and linear combination are all found in Chapter 1, as is a comprehensive treatment of lines and planes in R³ that provides essential insight into the solution of systems of linear equations. This emphasis on vectors, geometry, and visualization is found throughout the text. Linear transformations are introduced as matrix transformations in Chapter 3, with many geometric examples, before general linear transformations are covered in Chapter 6. In Chapter 4, eigenvalues are introduced with "eigenpictures" as a visual aid. The proof of Perron's Theorem is given first heuristically and then formally, in both cases using a geometric argument. The geometry of linear dynamical systems reinforces and summarizes the material on eigenvalues and eigenvectors. In Chapter 5, orthogonal projections, orthogonal complements of subspaces, and the Gram-Schmidt Process are all presented in the concrete setting of R³ before being generalized to Rⁿ and, in Chapter 7, to inner product spaces. The nature of the singular value decomposition is also explained informally in Chapter 7 via a geometric argument. Of the more than 300 figures in the text, over 200 are devoted to fostering a geometric understanding of linear algebra.
Explorations

See pages 1, 134, 431, 538
See pages 29, 283, 464, 519, 552, 556
See pages 66, 86, 87, 393, 395
The introduction to each chapter is a guided exploration (Section 0) in which students are invited to discover, individually or in groups, some aspect of the upcoming chapter. For example, "The Racetrack Game" introduces vectors, "Matrices in Action" introduces matrix multiplication and linear transformations, "Fibonacci in (Vector) Space" touches on vector space concepts, and "Taxicab Geometry" sets up generalized norms and distance functions. Additional explorations found throughout the book include applications of vectors and determinants to geometry, an investigation of 3×3 magic squares, a study of symmetry via the tilings of M. C. Escher, an introduction to complex linear algebra, and optimization problems using geometric inequalities. There are also explorations that introduce important numerical considerations and the analysis of algorithms. Having students do some of these explorations is one way of encouraging them to become active learners and to give them "ownership" over a small part of the course.
Applications
See pages 53, 532
See pages 55, 119, 214, 353, 616
The book contains an abundant selection of applications chosen from a broad range of disciplines, including mathematics, computer science, physics, chemistry, engineering, biology, psychology, geography, and sociology. Noteworthy among these is a strong treatment of coding theory, from error-detecting codes (such as International Standard Book Numbers) to sophisticated error-correcting codes (such as the Reed-Muller code that was used to transmit satellite photos from space). Additionally, there are five "vignettes" that briefly showcase some very modern applications of linear algebra: the Codabar System, the Global Positioning System (GPS), robotics, Internet search engines, and digital image compression.
Examples and Exercises

There are over 400 examples in this book, most worked in greater detail than is customary in an introductory linear algebra textbook. This level of detail is in keeping with the philosophy that students should want (and be able) to read a textbook. Accordingly, it is not intended that all of these examples be covered in class; many can be assigned for individual or group study, possibly as part of a project. Most examples have at least one counterpart exercise so that students can try out the skills covered in the example before exploring generalizations.
There are over 2000 exercises, more than in most textbooks at a similar level. Answers to most of the computational odd-numbered exercises can be found in the back of the book. Instructors will find an abundance of exercises from which to select homework assignments. (Suggestions are given in the Instructor's Guide.) The exercises in each section are graduated, progressing from the routine to the challenging. Exercises range from those intended for hand computation to those requiring the use of a calculator or computer algebra system, and from theoretical and numerical exercises to conceptual exercises. Many of the examples and exercises use actual data compiled from real-world situations. For example, there are problems on modeling the growth of caribou and seal populations, radiocarbon dating of the Stonehenge monument, and predicting major league baseball players' salaries. Working such problems reinforces the fact that linear algebra is a valuable tool for modeling real-life problems.
Additional exercises appear in the form of a review after each chapter. In each set, there are 10 true/false questions designed to test conceptual understanding, followed by 19 computational and theoretical exercises that summarize the main concepts and techniques of that chapter.
Biographical Sketches and Etymological Notes

It is important that students learn ...
Margin Icons

The margins of the book contain several icons whose purpose is to alert the reader in various ways. Calculus is not a prerequisite for this book, but linear algebra has many interesting and important applications to calculus. The calculus icon denotes an example or exercise that requires calculus. (This material can be omitted if not everyone in the class has had at least one semester of calculus. Alternatively, they can be assigned as projects.) The complex-number icon denotes an example or exercise involving complex numbers. (For students unfamiliar with complex numbers, Appendix C contains all the background material that is needed.) The CAS icon indicates that a computer algebra system or calculator with matrix capabilities is recommended or required.
Technology

This book can be used successfully whether or not students have access to technology. However, calculators with matrix capabilities and computer algebra systems are now widely available and, properly used, can enrich the learning experience as well as help with tedious calculations. In this text, I take the point of view that students need to master all of the basic techniques of linear algebra by solving by hand examples that are not too computationally difficult. Technology may then be used (in whole or in part) to solve subsequent examples and applications and to apply techniques that rely on earlier ones. For example, when systems of linear equations are first introduced, detailed solutions are provided; later, solutions are simply given, and the reader is expected to verify them. This is a good place to use some form of technology. Likewise, when applications use data that make hand calculation impractical, technology should be used.
Finite and Numerical Linear Algebra
The text covers two aspects of linear algebra that are scarcely ever mentioned together: finite linear algebra and numerical linear algebra. By introducing modular arithmetic early, I have been able to make finite linear algebra (more properly, "linear algebra over finite fields," although I do not use that phrase) a recurring theme throughout the book. This approach provides access to the material on coding theory in Sections 1.4, 3.7, 5.5, 6.7, and 7.5. There is also an application to finite linear games in Section 2.4 that students really enjoy. In addition to being exposed to the applications of finite linear algebra, mathematics majors will benefit from seeing the material on finite fields, because they are likely to encounter it in such courses as discrete mathematics, abstract algebra, and number theory.
All students should be aware that in practice, it is impossible to arrive at exact solutions of large-scale problems in linear algebra. Exposure to some of the techniques of numerical linear algebra will provide an indication of how to obtain highly accurate approximate solutions. Some of the numerical topics included in the book are roundoff error and partial pivoting, iterative methods for solving linear systems and computing eigenvalues, the LU and QR factorizations, matrix norms and condition numbers, least squares approximation, and the singular value decomposition. The inclusion of numerical linear algebra also brings up some interesting and important issues that are completely absent from the theory of linear algebra, such as pivoting strategies, the condition of a linear system, and the convergence of iterative methods. This book not only raises these questions but also shows how one might approach them. Gerschgorin disks, matrix norms, and the singular values of a matrix, discussed in Chapters 4 and 7, are useful in this regard.
Appendices

Appendix A contains an overview of mathematical notation and methods of proof, and Appendix B discusses mathematical induction. All students will benefit from these sections, but those with a mathematically oriented major may wish to pay particular attention to them. Some of the examples in these appendices are uncommon (for instance, Example B.6 in Appendix B) and underscore the power of the methods. Appendix C is an introduction to complex numbers. For students familiar with these results, this appendix can serve as a useful reference; for others, this section contains everything they need to know for those parts of the text that use complex numbers. Appendix D is about polynomials. I have found that many students require a refresher about these facts. Most students will be unfamiliar with Descartes' Rule of Signs; it is used in Chapter 4 to explain the behavior of the eigenvalues of Leslie matrices. Exercises to accompany the four appendices can be found on the book's website.
Answers to Selected Odd-Numbered Exercises

Short answers to most of the odd-numbered computational exercises are given at the end of the book. The Student Solutions Manual and Study Guide contains detailed solutions to all of the odd-numbered exercises, including the theoretical ones. The Complete Solutions Manual contains detailed solutions to all of the exercises.
Ancillaries

The following supplements are all available free of charge to instructors who adopt Linear Algebra: A Modern Introduction (Second Edition). The Student Solutions Manual and Study Guide can be purchased by students, either separately or shrinkwrapped with the textbook. The CD-ROM is packaged with the textbook, and the website has password-protected sections for students and instructors.
Student Solutions Manual and Study Guide
by Robert Rogers (Bay State College), ISBN: 0-534-99858-5
Includes detailed solutions to all odd-numbered exercises and selected even-numbered exercises; section and chapter summaries of symbols, definitions, and theorems; and study tips and hints. Complex exercises are explored through a question-and-answer format designed to deepen understanding. Challenging and entertaining problems that further explore selected exercises are also included.
Complete Solutions Manual
by Robert Rogers (Bay State College), ISBN: 0-534-99859-3
Full solutions to all exercises, including those in the explorations and chapter reviews.
Instructor's Guide
by Douglas Shaw and Michael Prophet (University of Northern Iowa), ISBN: 0-534-99861-5
Includes camera-ready group work, teaching tips, interesting exam questions, examples and extra material for lectures, and other items designed to reduce the instructor's preparation time and make linear algebra class an exciting and interactive experience. For each section of the text, the Instructor's Guide includes suggested time and emphasis, points to stress, questions for discussion, lecture materials and examples, technology tips, student projects, group work with solutions, sample assignments, and suggested test questions.
Test Bank
by Richard C. Pappas (Widener University), ISBN: 0-534-99860-7
Contains four tests for each chapter (one each in free-form, multiple-choice, true/false, and "mixed" formats) and four final exams, each in a different format. Answer keys are provided.
iLrn Testing, ISBN: 0-534-99862-3
Efficient and versatile, iLrn Testing is an Internet-ready, text-specific testing suite that enables instructors to customize exams and track student progress in an accessible, browser-based format. iLrn offers problems keyed to Linear Algebra: A Modern Introduction, allows free-response mathematics, and lets students work with real math notation in real time. The complete integration of the testing and course management components simplifies routine tasks. Results flow automatically to the instructor's grade book, and the instructor can easily communicate with individuals, sections, or an entire course.
CD-ROM, ISBN: 0-534-42289-6
Contains data sets for more than 800 problems in Maple, MATLAB, and Mathematica, as well as data sets for selected examples. Also contains CAS enhancements to the vignettes and explorations that appear in the text and includes manuals for using Maple, MATLAB, and Mathematica.

Website for Linear Algebra: A Modern Introduction
math.brookscole.com/poolelinearalgebra
Contains additional online support materials for students and instructors. Online teaching tips, practice exams, selected homework solutions, transparency masters, and exercises to accompany the book's appendices are available. Online versions of some of the material on the book's CD-ROM will also be found here.
Acknowledgments

The reviewers of the first edition contributed valuable and often insightful comments about the book. I am grateful for the time each of them took to do this. Their judgement and suggestions have contributed greatly to the second edition, whose reviewers contributed additional helpful suggestions.

Reviewers for the First Edition
Israel Koltracht, University of Connecticut
Arthur Robinson, George Washington University
Mo Tavakoli, Chaffey College

Reviewers for the Second Edition
Justin Travis Gray, Simon Fraser University
J. Douglas Faires, Youngstown State University
Yuval Flicker, Ohio State University
William Hager, University of Florida
Sasho Kalajdzievski, University of Manitoba
Dr. En-Bing Lin, University of Toledo
Dr. Asamoah Nkwanta, Morgan State University
Gleb Novitchkov, Penn State University
Ron Solomon, Ohio State University
Bruno Welfert, Arizona State University
I am indebted to a great many people who have, over the years, influenced my views about linear algebra and the teaching of mathematics in general. First, I would like to thank collectively the participants in the education and special linear algebra sessions at meetings of the Mathematical Association of America and the Canadian Mathematical Society. I have also learned much from participation in the Canadian Mathematics Education Study Group. I especially want to thank Ed Barbeau, Bill Higginson, Richard Hoshino, John Grant McLoughlin, Eric Muller, Morris Orzech, Bill Ralph, Pat Rogers, Peter Taylor, and Walter Whiteley, whose advice and inspiration contributed greatly to the philosophy and style of this book. Special thanks go to Jim Stewart for his ongoing support and advice. Joe Rotman and his lovely book A First Course in Abstract Algebra inspired the etymological notes in this book, and I relied heavily on Steven Schwartzman's The Words of Mathematics when compiling these notes. I thank Art Benjamin for introducing me to the Codabar system. My colleagues Marcus Pivato and Reem Yassawi provided useful information about dynamical systems. As always, I am grateful to my students for asking good questions and providing me with the feedback necessary to becoming a better teacher. Special thanks go to my teaching assistants Alex Chute, Nick F...
To the Instructor

"Would you tell me, please, which way I ought to go from here?"
"That depends a good deal on where you want to get to," said the Cat.
- Lewis Carroll, Alice's Adventures in Wonderland, 1865
This text was written with flexibility in mind. It is intended for use in a one- or two-semester course with 36 lectures per semester. The range of topics and applications makes it suitable for a variety of audiences and types of courses. However, there is more material in the book than can be covered in class, even in a two-semester course. After the following overview of the text are some brief suggestions for ways to use the book. The Instructor's Guide has more detailed suggestions, including teaching notes, recommended exercises, classroom activities and projects, and additional topics.
An Overview of the Text

Chapter 1, Vectors
See page 29
See page 45
See pages 52, 53, 55
The racetrack game in Section 1.0 serves to introduce vectors in an informal way. (It's also quite a lot of fun to play!) Vectors are then formally introduced from both an algebraic and a geometric point of view. The operations of addition and scalar multiplication and their properties are first developed in the concrete settings of R² and R³ before being generalized to Rⁿ. Section 1.2 defines the dot product of vectors and the related notions of length, angle, and orthogonality. The very important concept of (orthogonal) projection is developed here; it will reappear in Chapters 5 and 7. The exploration "Vectors and Geometry" shows how vector methods can be used to prove certain results in Euclidean geometry. Section 1.3 is a basic but thorough introduction to lines and planes in R² and R³. This section is crucial for understanding the geometric significance of the solution of linear systems in Chapter 2. Note that the cross product of vectors in R³ is left as an exploration. The chapter concludes with an introduction to codes, leading to a discussion of modular arithmetic and finite linear algebra. Most students will enjoy the application to the Universal Product Code (UPC) and International Standard Book Number (ISBN). The vignette on the Codabar system used in credit and bank cards is an excellent classroom presentation that can even be used to introduce Section 1.4.
Chapter 2, Systems of Linear Equations

See page 58
See pages 75, 203, 383, 490
See pages 66, 86, 87
The introduction to this chapter serves to illustrate that there is more than one way to think of the solution to a system of linear equations. Sections 2.1 and 2.2 develop the main computational tool for solving linear systems: row reduction of matrices (Gaussian and Gauss-Jordan elimination). Nearly all subsequent computational methods in the book depend on this. The Rank Theorem appears here for the first time; it shows up again, in more generality, in Chapters 3, 5, and 6. Section 2.3 is very important; it introduces the fundamental notions of spanning sets and linear independence of vectors. Do not rush through this material. Section 2.4 contains five applications from which instructors can choose depending on the time available and the interests of the class. The vignette on the Global Positioning System provides another application that students will enjoy. The iterative methods in Section 2.5 will be optional for many courses but are essential for a course with an applied/numerical focus. The three explorations in this chapter are related in that they all deal with aspects of the use of computers to solve linear systems. All students should at least be made aware of these issues.
Chapter 3, Matrices
See page 134
See pages 170, 204, 29
See page 214
See pages 228, 233
This chapter contains some of the most important ideas in the book. It is a long chapter, but the early material can be covered fairly quickly, with extra time allowed for the crucial material in Section 3.5. Section 3.0 is an exploration that introduces the notion of a linear transformation: the idea that matrices are not just static objects but rather a type of function, transforming vectors into other vectors. All of the basic facts about matrices, matrix operations, and their properties are found in the first two sections. The material on partitioned matrices and the multiple representations of the matrix product is worth stressing, because it is used repeatedly in subsequent sections. The Fundamental Theorem of Invertible Matrices in Section 3.3 is very important and will appear several more times as new characterizations of invertibility are presented. Section 3.4 discusses the very important LU factorization of a matrix. If this topic is not covered in class, it is worth assigning as a project or discussing in a workshop. The point of Section 3.5 is to present many of the key concepts of linear algebra (subspace, basis, dimension, and rank) in the concrete setting of matrices before students see them in full generality. Although the examples in this section are all familiar, it is important that students get used to the new terminology and, in particular, understand what the notion of a basis means. The geometric treatment of linear transformations in Section 3.6 is intended to smooth the transition to general linear transformations in Chapter 6. The example of a projection is particularly important because it will reappear in Chapter 5. The vignette on robotic arms is a concrete demonstration of composition of linear (and affine) transformations. There are four applications from which to choose in Section 3.7. At least one on Markov chains or the Leslie model of population growth should be covered so that it can be used again in Chapter 4, where their behavior will be explained.
Chapter 4, Eigenvalues and Eigenvectors

See page 252
See page 283
See page 353

The introduction, Section 4.0, presents an interesting dynamical system involving graphs. This exploration introduces the notion of an eigenvector and foreshadows the power method in Section 4.5. In keeping with the geometric emphasis of the book, Section 4.1 contains the novel feature of "eigenpictures" as a way of visualizing the eigenvectors of 2×2 matrices. Determinants appear in Section 4.2, motivated by their use in finding the characteristic polynomials of small matrices. This "crash course" in determinants contains all the essential material students need, including an optional but elementary proof of the Laplace Expansion Theorem. The exploration "Geometric Applications of Determinants" makes a nice project that contains several interesting and useful results. (Alternatively, instructors who wish to give more detailed coverage to determinants may choose to cover some of this exploration in class.) The basic theory of eigenvalues and eigenvectors is found in Section 4.3, and Section 4.4 deals with the important topic of diagonalization. Example 4.29 on powers of matrices is worth covering in class. The power method and its variants, discussed in Section 4.5, are optional, but all students should be aware of the method, and an applied course should cover it in detail. Gerschgorin's Disk Theorem can be covered independently of the rest of Section 4.5. Markov chains and the Leslie model of population growth reappear in Section 4.6. Although the proof of Perron's Theorem is optional, the theorem itself (like the stronger Perron-Frobenius Theorem) should at least be mentioned, because it explains why we should expect a unique positive eigenvalue with a corresponding positive eigenvector in these applications. The applications on recurrence relations and differential equations connect linear algebra to discrete mathematics and calculus, respectively. The matrix exponential can be covered if your class has a good calculus background. The final topic of discrete linear dynamical systems revisits and summarizes many of the ideas in Chapter 4, looking at them in a new, geometric light. Students will enjoy reading how eigenvectors can be used to help rank sports teams and websites. This vignette can easily be extended to a project or enrichment activity.

Chapter 5, Orthogonality

See pages 393, 395
See pages 405, 411

The introductory exploration, "Shadows on a Wall," is mathematics at its best: it takes a known concept (projection of a vector onto another vector) and generalizes it in a useful way (projection of a vector onto a subspace, a plane), while uncovering some previously unobserved properties. Section 5.1 contains the basic results about orthogonal and orthonormal sets of vectors that will be used repeatedly from here on. In particular, orthogonal matrices should be stressed. In Section 5.2, two concepts from Chapter 1 are generalized: the orthogonal complement of a subspace and the orthogonal projection of a vector onto a subspace. The Orthogonal Decomposition Theorem is important here and helps to set up the Gram-Schmidt Process. Also note the quick proof of the Rank Theorem. The Gram-Schmidt Process is detailed in Section 5.3, along with the extremely important QR factorization. The two explorations that follow outline how the QR factorization is computed in practice and how it can be used to approximate eigenvalues. Section 5.4 on orthogonal diagonalization of (real) symmetric matrices is needed for the applications that follow. It also contains the Spectral Theorem, one of the highlights of the theory of linear algebra. The applications in Section 5.5 include dual codes, quadratic forms, and graphing quadratic equations. I always include at least the last of these in my course because it extends what students already know about conic sections.
Chapter 6, Vector Spaces

See page 519

The Fibonacci sequence reappears in Section 6.0, although it is not important that students have seen it before (Section 4.6). The purpose of this exploration is to show that familiar vector space concepts (Section 3.5) can be used fruitfully in a new setting. Because all of the main ideas of vector spaces have already been introduced in Chapters 1-3, students should find Sections 6.1 and 6.2 fairly familiar. The emphasis here should be on using the vector space axioms to prove properties rather than relying on computational techniques. When discussing change of basis in Section 6.3, it is helpful to show students how to use the notation to remember how the construction works. Ultimately, the Gauss-Jordan method is the most efficient here. Sections 6.4 and 6.5 on linear transformations are important. The examples are related to previous results on matrices (and matrix transformations). In particular, it is important to stress that the kernel and range of a linear transformation generalize the null space and column space of a matrix. Section 6.6 puts forth the notion that (almost) all linear transformations are essentially matrix transformations. This builds on the information in Section 3.6, so students should not find it terribly surprising. However, the examples should be worked carefully. The connection between change of basis and similarity of matrices is noteworthy. The exploration "Tilings, Lattices, and the Crystallographic Restriction" is an impressive application of change of basis. The connection with the artwork of M. C. Escher makes it all the more interesting. The applications in Section 6.7 build on previous ones and can be included as time and interest permit.
Chapter 7, Distance and Approximation

See page 538
See page 552
See page 556
See page 616
Section 7.0 opens with the entertaining "Taxicab Geometry" exploration. Its purpose is to set up the material on generalized norms and distance functions (metrics) that follows. Inner product spaces are discussed in Section 7.1; the emphasis here should be on the examples and using the axioms. The exploration "Vectors and Matrices with Complex Entries" shows how the concepts of dot product, symmetric matrix, orthogonal matrix, and orthogonal diagonalization can be extended from real to complex vector spaces. The following exploration, "Geometric Inequalities and Optimization Problems," is one that students typically enjoy. (They will have fun seeing how many "calculus" problems can be solved without using calculus at all!) Section 7.2 covers generalized vector and matrix norms and shows how the condition number of a matrix is related to the notion of ill-conditioned linear systems explored in Chapter 2. Least squares approximation (Section 7.3) is an important application of linear algebra in many other disciplines. The Best Approximation Theorem and the Least Squares Theorem are important, but their proofs are intuitively clear. Spend time here on the examples; a few should suffice. Section 7.4 presents the singular value decomposition, one of the most impressive applications of linear algebra. If your course gets this far, you will be amply rewarded. Not only does the SVD tie together many notions discussed previously; it also affords some new (and quite powerful) applications. If MATLAB is available, the vignette on digital image compression is worth presenting; it is a visually impressive display of the power of linear algebra and a fitting culmination to the course. The further applications in Section 7.5 can be chosen according to the time available and the interests of the class.
How to Use the Book

Students find the book easy to read, so I usually have them read a section before I cover the material in class. That way, I can spend class time highlighting the most important concepts, dealing with topics students find difficult, working examples, and discussing applications. I do not attempt to cover all of the material from the assigned reading in class. This approach enables me to keep the pace of the course fairly brisk, slowing down for those sections that students typically find challenging. In a two-semester course, it is possible to cover the entire book, including a reasonable selection of applications.
A Basic Course

A course designed for mathematics majors and students from other disciplines is outlined below. This course does not mention general vector spaces at all (all concepts are treated in a concrete setting) and is very light on proofs. Still, it is a thorough introduction to linear algebra.

Section   Number of Lectures        Section   Number of Lectures
1.1       1                         3.6       1-2
1.2       1-1.5                     4.1       1
1.3       1-1.5                     4.2       2
2.1       0.5-1                     4.3       1
2.2       1-2                       4.4       1-2
2.3       1-2                       5.1       1-1.5
3.1       1-2                       5.2       1-1.5
3.2       1                         5.3       0.5
3.3       2                         5.4       1
3.5       2                         7.3       2

Total: 23-30 lectures
Because the students in a course such as this one represent a wide variety of disciplines, I would suggest using much of the remaining lecture time for applications. In my course, I do Section 1.4, which students really seem to like, and at least one application from each of Chapters 2-5. Other applications can be assigned as projects, along with as many of the explorations as desired. There is also sufficient lecture time available to cover some of the theory in detail.
A Course with a Computational Emphasis

For a course with a computational emphasis, the basic course outlined above can be supplemented with the sections of the text dealing with numerical linear algebra. In such a course, I would cover part or all of Sections 2.5, 3.4, 4.5, 5.3, 7.2, and 7.4, ending with the singular value decomposition. The explorations in Chapters 2 and 5 are particularly well suited to such a course, as are almost any of the applications.
A Course for Students Who Have Already Studied Some Linear Algebra

Some courses will be aimed at students who have already encountered the basic principles of linear algebra in other courses. For example, a college algebra course will often include an introduction to systems of linear equations, matrices, and determinants; a multivariable calculus course will almost certainly contain material on vectors, lines, and planes. For students who have seen such topics already, much early material can be omitted and replaced with a quick review. Depending on the background of the class, it may be possible to skim over the material in the basic course up to Section 3.3 in about six lectures. If the class has a significant number of mathematics majors (and especially if this is the only linear algebra course they will take), I would be sure to cover Sections 1.4, 6.1-6.5, 7.1, and 7.4 and as many applications as time permits. If the course has science majors (but not mathematics majors), I would cover Sections 1.4, 6.1, and 7.1 and a broader selection of applications, being sure to include the material on differential equations and approximation of functions. If computer science students or engineers are prominently represented, I would try to do as much of the material on codes and numerical linear algebra as I could.
There are many other types of courses that can successfully use this text. I hope that you find it useful for your course and that you enjoy using it.
To the Student

"Where shall I begin, please your Majesty?" he asked. "Begin at the beginning," the King said, gravely, "and go on till you come to the end: then stop."
- Lewis Carroll, Alice's Adventures in Wonderland, 1865
Linear algebra is an exciting subject. It is full of interesting results, applications to other disciplines, and connections to other areas of mathematics. The Student Solutions Manual and Study Guide contains detailed advice on how best to use this book; following are some general suggestions.
Linear algebra has ... is the same as the set of all scalar multiples of [1, -2].
As you encounter new concepts, try to relate them to examples that you know. Write out proofs and solutions to exercises in a logical, connected way, using complete sentences. Read back what you have written to see whether it makes sense. Better yet, if you can, have a friend in the class read what you have written. If it doesn't make sense to another person, chances are that it doesn't make sense, period.
You will find that a calculator with matrix capabilities or a computer algebra system is useful. These tools can help you to check your own hand calculations and are indispensable for some problems involving tedious computations. Technology also enables you to explore aspects of linear algebra on your own. You can play "what if?"
games: What if I change one of the entries in this vector? What if this matrix is of a different size? Can I force the solution to be what I would like it to be by changing something? To signal places in the text or exercises where the use of technology is recommended, I have placed the icon in the margin. The CD that accompanies this book contains selected exercises from the book worked out using Maple, Mathematica, and MATLAB, as well as much additional advice about the use of technology in linear algebra.
You are about to embark on a journey through linear algebra. Think of this book as your travel guide. Are you ready? Let's go!
Vectors
Here they come pouring out of the blue
Little arrows for me and for you.
- Albert Hammond and Mike Hazelwood, Little Arrows, Dutchess Music/BMI, 1968
1.0 Introduction: The Racetrack Game

Many measurable quantities, such as length, area, volume, mass, and temperature, can be completely described by specifying their magnitude. Other quantities, such as velocity, force, and acceleration, require both a magnitude and a direction for their description. These quantities are vectors. For example, wind velocity is a vector consisting of wind speed and direction, such as 10 km/h southwest. Geometrically, vectors are often represented as arrows or directed line segments.
Although the idea of a vector was introduced in the 19th century, its usefulness in applications, particularly those in the physical sciences, was not realized until the 20th century. More recently, vectors have found applications in computer science, statistics, economics, and the life and social sciences. We will consider some of these many applications throughout this book.
This chapter introduces vectors and begins to consider some of their geometric and algebraic properties. We will also consider one nongeometric application where vectors are useful. We begin, though, with a simple game that introduces some of the key ideas. [You may even wish to play it with a friend during those (very rare) dull moments in linear algebra class.]
The game is played on graph paper. A track, with a starting line and a finish line, is drawn on the paper. The track can be of any length and shape, so long as it is wide enough to accommodate all of the players. For this example, we will have two players (let's call them Ann and Bert) who use different colored pens to represent their cars or bicycles or whatever they are going to race around the track. (Let's think of Ann and Bert as cyclists.)
Ann and Bert each begin by drawing a dot on the starting line at a grid point on the graph paper. They take turns moving to a new grid point, subject to the following rules:
1. Each new grid point and the line segment connecting it to the previous grid point must lie entirely within the track.
2. No two players may occupy the same grid point on the same turn. (This is the "no collisions" rule.)
3. Each new move is related to the previous move as follows: If a player moves a units horizontally and b units vertically on one move, then on the next move he
or she must move between a - 1 and a + 1 units horizontally and between b - 1 and b + 1 units vertically. In other words, if the second move is c units horizontally and d units vertically, then |a - c| ≤ 1 and |b - d| ≤ 1. (This is the "acceleration/deceleration" rule.) Note that this rule forces the first move to be 1 unit vertically and/or 1 unit horizontally.
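As a quick illustration of rule 3, here is a minimal sketch in Python (our own illustration, not part of the text; the function name and the representation of a move as a pair of integers are assumptions) that lists every move allowed after a given previous move:

```python
# A move is a pair (a, b): a units horizontally and b units vertically.
# Rule 3 says the next move (c, d) must satisfy |a - c| <= 1 and |b - d| <= 1.

def legal_next_moves(previous_move):
    """Return all moves permitted after previous_move (illustrative helper, not from the text)."""
    a, b = previous_move
    return [(c, d) for c in (a - 1, a, a + 1) for d in (b - 1, b, b + 1)]

# Ann's third move was (1, 3), so her fourth move may use 0 to 2 units
# horizontally and 2 to 4 units vertically -- nine possibilities in all
# (some of which may, of course, lie outside the track).
print(legal_next_moves((1, 3)))
```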
The Irish mathematician William Rowan Hamilton (1805-1865) used vector concepts in his study of complex numbers and their generalization, the quaternions.

A player who collides with another player or leaves the track is eliminated. The winner is the first player to cross the finish line. If more than one player crosses the finish line on the same turn, the one who goes farthest past the finish line is the winner.
In the sample game shown in Figure 1.1, Ann was the winner. Bert accelerated too quickly and had difficulty negotiating the turn at the top of the track.
To understand rule 3, consider Ann's third and fourth moves. On her third move, she went 1 unit horizontally and 3 units vertically. On her fourth move, her options were to move 0 to 2 units horizontally and 2 to 4 units vertically. (Notice that some of these combinations would have placed her outside the track.) She chose to move 2 units in each direction.

Figure 1.1  A sample game of racetrack
Problem 1  Play a few games of racetrack.
Problem 2  Is it possible for Bert to win this race by choosing a different sequence of moves?
Problem 3  Use the notation [a, b] to denote a move that is a units horizontally and b units vertically. (Either a or b or both may be negative.) If move [3, 4] has just been made, draw on graph paper all the grid points that could possibly be reached on the next move.
Problem 4  What is the net effect of two successive moves? In other words, if you move [a, b] and then [c, d], how far horizontally and vertically will you have moved altogether?
Problem 5  Write out Ann's sequence of moves using the [a, b] notation. Suppose she begins at the origin (0, 0) on the coordinate axes. Explain how you can find the coordinates of the grid point corresponding to each of her moves without looking at the graph paper. If the axes were drawn differently, so that Ann's starting point was not the origin but the point (2, 3), what would the coordinates of her final point be?

Although simple, this game introduces several ideas that will be useful in our study of vectors. The next three sections consider vectors from geometric and algebraic viewpoints, beginning, as in the racetrack game, in the plane.
1.1 The Geometry and Algebra of Vectors

Vectors in the Plane
The Cartesian plane is named after the French philosopher and mathematician René Descartes (1596-1650), whose introduction of coordinates allowed geometric problems to be handled using algebraic techniques.

The word vector comes from the Latin root meaning "to carry." A vector is formed when a point is displaced, or "carried off," a given distance in a given direction. Viewed another way, vectors "carry" two pieces of information: their length and their direction.

When writing vectors by hand, it is difficult to indicate boldface. Some people prefer to write v with an arrow over it for the vector denoted in print by v, but in most cases it is fine to use an ordinary lowercase v. It will usually be clear from the context when the letter denotes a vector.

The word component is derived from the Latin words com, meaning "together with," and ponere, meaning "to put." Thus, a vector is "put together" out of its components.
We begin by considering the Cartesian plane with the familiar x- and y-axes. A vector is a directed line segment that corresponds to a displacement from one point A to another point B; see Figure 1.2. The vector from A to B is denoted by AB; the point A is called its initial point, or tail, and the point B is called its terminal point, or head. Often, a vector is simply denoted by a single boldface, lowercase letter such as v.
The set of all points in the plane corresponds to the set of all vectors whose tails are at the origin O. To each point A, there corresponds the vector a = OA; to each vector a with tail at O, there corresponds its head A. (Vectors of this form are sometimes called position vectors.)
It is natural to represent such vectors using coordinates. For example, in Figure 1.3, A = (3, 2) and we write the vector a = OA = [3, 2] using square brackets. Similarly, the other vectors in Figure 1.3 are
b = [-1, 3]   and   c = [2, -1]

Figure 1.3

The individual coordinates (3 and 2 in the case of a) are called the components of the vector. A vector is sometimes said to be an ordered pair of real numbers. The order is important since, for example, [3, 2] ≠ [2, 3]. In general, two vectors are equal if and only if their corresponding components are equal. Thus, [x, y] = [1, 5] implies that x = 1 and y = 5.
It is frequently convenient to use column vectors instead of (or in addition to) row vectors. Another representation of [3, 2] is

[3]
[2]

(The important point is that the components are ordered.) In later chapters, you will see that column vectors are somewhat better from a computational point of view; for now, try to get used to both representations.
It may occur to you that we cannot really draw the vector [0, 0] = OO from the origin to itself. Nevertheless, it is a perfectly good vector and has a special name: the zero vector. The zero vector is denoted by 0.

R² is pronounced "r two."

The set of all vectors with two components is denoted by R² (where R denotes the set of real numbers from which the components of vectors in R² are chosen). Thus, [-1, 3.5] and [√2, π], for example, are in R².
Thinking back to the racetrack game, let's try to connect all of these ideas to vectors whose tails are not at the origin. The etymological origin of the word vector in the verb "to carry" provides a clue. The vector [3, 2] may be interpreted as follows: Starting at the origin O, travel 3 units to the right, then 2 units up, finishing at P. The same displacement may be applied with other initial points. Figure 1.4 shows two equivalent displacements, represented by the vectors AB and CD.
-
-
)'
B D
"+-+- -+-+- ---+ .,
ng.r.l.4
- -
When vectors arc referredTcill
their coordinates, they are being considerrd mralytica/ly.
We define two vectors as equal if they have th e same length and the same direction. Thus, AU = CD in Figure 1.4 . (Eve n though they htlve di fferent initial and terminal poi nts, they represent the same displacement.) Geometrically, two vectors are equal If one can be obtained by slidmg (or translatmg) the other parallel to itself until the two vectors coincide. In terms of components, in Figu re 1.4 we have A = (3, 1) a nd B = (6, 3). Notice that the vector 13, 2\ that records the displacement is just the d ifference of the respectIve components:
-
AB ~ [ 3 ,2 1~[ 6 - 3 , 3 -1J
Sim ilarl y,
-
-
Ci5 ~ [- I - ( - , ), '- ( - I )I~[3, l )
a nd thus AB = CD, as expected . A vecto r such as OP with its tail at the origm is said to be in standard position. The fo regoi ng dIscussion shows that cvery vector can be drawn as a vector in sttlndard position. Conversely, a vector in standard position can be redmwn (by translation) so that lIs tail is at any pomt in the plane.
Example 1.1
-
-
If A = (- 1,2) and B = (3, 4 ), fi nd AU and redraw it (a) rn standard position and (b) with its tml at the point C = (2, - I ).
"..-tI,.
-
-
We compute AB =[ 3 - (- 1),4 - 2] [4,2 ]. If AB is then tr:lllslated to CD , where C = (2, - I ), then we must have D = (2 + 4, - \ + 2) = (6, I ). (Sec Figure 1.5.)
Section 1. 1 The Geometry and Algebra of Veclors
5
y
8(3 .4)
A( - i .2)
14. 21
n,lr.1.5
Mew Vectors " •• Old As in the r~l ce \ rack game, we often want \0 "fo llow" one vector by another. Th is leads to the no tion o f veCforadditiotl, the fi rst basic vector operation.
If we fo llow u by v, we can visualize the total displacement as a third vector, denOied by u + v. In Figure 1.6, u = f I, 21 and v = [2, 2 J, so the net effect of fo llo\"ing u by vis
I' + 2,2 + 21 which gLves u the vector
+ v. In general, If u
=
13, 41
= 111,, 1,21a nd v = I v.. \/21 , then their $um u + v IS
II is helpful 10 vLsualize u + v geometrically. The following rule is the geo metric version of the foregoing discussion.
y
,, ,: 2
,, ,, ,
,
___ __ -I!
,', 4
2
, ,
u
, , _______ _ _ £1 3
'Illn 1.' Vector addi tI on
•
Chapter I
Vtxtors
Tbe Head-Io-Tail Rule
Given vecto rs u and v in 1R1, translate v so that its tail coincides with the head of u . The sum u + v of u and v is the vector from the tail o f u to the head o f v. (See Figure 1.7.)
, ,
n••r.1.1 The head-to-tail ru le
, Figure 1.1
By tra nslating u and v parallel 10 themselves, we obtain a parallelogram, as shown in Figure 1.8. This parallelogram is called the parallelogram determirlCd by u
The paTJllelogram
11m/ v. It leads to an equivalent version of the head-Io -tail rule for vectors in standa rd
determined by u and v
position.
The Parallelogram RUle
G iven vectors u and v in R2 (in standard position), t heir sum u + v is the vector in standard position along the diagonal of the parallelogram determined by u and Y. See Figure 1.9.)
"
,
" ng.r.U The parallelogram rule
ExamPle 1.2
If u = [3, - IJ and v = [1, 4], compute and draw u
+ v.
We compute u + v = [3 + I, - \ + 4 1 = [4, 3]. This vecto r IS drawn using the head-to-tail rule in Figure l.iO(a ) and using the parallelogram rule in Figure l.iO(b).
SDI.1I01
&ellon 1.1
The GMll1etry and Algebra of V«tors
y
y
-+-_x
(,)
(b)
Fig,,. 1.11
The second basic vector operation is scalar multiplimtio1l. G iven a vector v and a real number c, the scalar multiple cv IS the vector obtained by m uh iplYlllg each component of v by c. For example, 3[ - 2, 4 J = (- 6, 12 J. In general,
Geometrically, cv is a "scaled" version o f v.
If v = \-2, 4 ], compute and dra w 2v , 1v, and -lv.
S,I,.It.
We calculate as follows: 2v = (2(-2),2(4)J = (- 4,8J
Iv= (\(- 2), \(4) J =
(- I, 2J -2v = (- 2( - 2), - 2(4) J = (4, -8J These vectors are show n in Figure !.II. y
2v
v
,
!v
-+-_ x
- 2.
n"r. 1.11
•
Chapter I Vector5
, " ,
2,
...-::',,,
fIt,r.l.l1 Vector subtraction
fll'" 1.12
The term S(a/nrcomes from the Latin ....,ord Knill, meaning Mlad_ der." The equally spaced rung~ on a ladder suggest II scale, and in vec tor arithmetic, multiplication by a constant changes only thc '>tale (0 length) of a vettor. Thus, constant became known as scalars.
"
u +( - v )
Observe that cv has the same direction as v Lf c > 0 and the opposite di rection if c < O. We also see that cv islc~ times as long as v. For this reason , III the context of veetors, constants (that is, real numbers) are referred to as sca/a N. As Figure 1.1 2 shows, when translation of vectors is taken into account, two vectors are scalar multiples of each other 1f and only if they are parallel. A special case of a scalar multiple is (- \ lv, which is written as - v and is called the negative ofv. We can usc it to define vector subtraction: The difference of u and v is the vector u - v definfd by II -
V
= U
+ (- v)
Figure 1.1 3 shows that II - v corresponds to the "o ther" d iagonal of the parallelogram determined by u a nd v.
EII.,'I 1.4 y
b- .
~ b
B
::..:.... .,
fl,.,. 1.1.
= [-
3, 11. then u - v = (I - (- 3), 2 - 11 = [4,
II.
--
A
•
If u = [I, 2 J and v
The definitio n of subtractIon in Example 1.4 also agrees with the way we calculate a vector such as AB. If the poin ts A and B correspond to the vectors a and b in standard posLtlo n, then All = b - a. as shown in Figure 1.14 .1 Observe thai the headto · ta il rule applied to this diagram gives the equation a + (b - a ) "" b. If we had accidentally drawn b - !l wLth its head at A Lllstead of at 8, the diagram would have read b + (b - a) = a, which is d early wrong! More will be said about algebraic exprl"SSions involvmg vectors later I II this section. I
Veelors
'I ~.
Everything we have just do nt txtends tasily to three dimensions. The set of all ordered triples of real numbers is denoted by RJ. Points and vectors are located using three mutually perpendicular coordinate axes that mC<.'t at the o rigi n O. A poi nt such as A .. ( 1, 2, 3) can be located as follows: First travel 1 unit along the x-axis, then move 2 units parallel to the y· axis, and fi na!!L!nove 3 uni ts parallel to the z·axis. The corresponding vector a = I I, 2, 31 is then OA , as shown in Figure 1.15. Another way to visualize vector a in Ill' is to construct a box whose six sides arc determ ined by the three coordinate planes (the X)'- , XZ' , and yz-planes) ,lIld by three planes through the point ( 1, 2, 3) parallel to the coordinate planes. The \'ector [1, 2,31 then corresponds to the diagonal fTo m the o rigin to the opposite corner of the box (see Figure 1.16).
Section 1.1
The Geometry and Algeb ra of Vectors
•
l
A(1. 2. 3)
a •
I ,
,, ,:3 ,'
2"-J~ Y
fill" 1.15
x n ••r.1 .16
T he ucomponenlwise" d efin itions of vector additio n and scalar multiplication ,lfe extended to IRl in an obvious way.
Vectors il R" In general, we define R~ as the set of all ordered II-wples of real numbers written as row o r column vectors. Thus, a vector v in lQ" is o f the form
)
The individual entries of v arc its compo nents; V, is called the ith component. We extend th e definition s of vector add ition and scalar multiplication to R~ In the o bvious way: If U "" [II I' 1/2' . .. , II" ] and v = [vI' I'., ... , v"l, the Ilh component of u + v is II, + v, and th e ith componen l of cv is just (v,_ Since in IR" we can no longer draw pictures of vectors, it is important to be able to calculate with vectors. We must be ca reful not to ass ume tha t vector arithmetic will be si milar to the arithmetic o f real numbers. Often it is, and the algebraic calculations we do vmh vectors arc Similar to those we wo uld d o with scalars. But, 10 later se<:tions, we will encoun te r situatio ns where vector algebra is quite IInlikeour previous experience with real num bers. SO II IS importan t to verify any algebraic properties before attem pting to usc them. One s uch property is commlitalill ilyo f addition: u + v = v + u for vectors u and v. T his is certainly true in 1Il1. Geometrically, the head-to- tail rule shows that both u + v and y + u arc the main diago nals of the parallelogram determ ined by u and v. (The parallelogra m rule also refl ects th is symmetry; see Figure 1.1 7.) No te that f igure 1.17 is simply an illustration of the property u + v = v + u.Jt is not a proof, si nce it docs not cover every poSSible case. Fo r example, we must also include the cases where u = Y, U = - V , and u = O. (What would d iagrams for these cases look like?) ror this reaso n, an algebraic p roof is needed. However, it is just as easy to give a p roof that is valid in IR" as to give o ne that is valid 10 Ill!. The following theorem summarizes the algebraic properties of vector addition and scalar multipllcat Ion in IR". The proofs follow fro m the corresponding properties of rea l numbers.
11
Chapter I
Vec tors
TheOf•• l.l
Algebraic Properties of Vector s in a" Let u, v, and w be vecto rs in R~ and let c and d be scalars. The
+ v '" v + u b. (u + v) + w = u + (v +
a. u
Commutativi ty
•
w)
c. u + O = u
d. u + ( - u) = 0 e. c( u + v ) = eu + ev f. (e + d )u = eu + ti u g. e( du ) = (ed)u h. iu = u
Distributiv ity Distributivit
••••nl The word rlworem is derived from
the Greek word rhrorema, which in turn CO Ul es from a word mcanin "to look at." Thu5, a theorem i based o n the insights we have when we look at examples and extract from them properties tha we t ry to p rove hold in general. Similarly, when we understand someth ing in mathematics-the proof of a theorem. fi p we o ften say " I see."
• Properries (c) and (d ) together with the commutativity property (a) imply thatO + u = u and - u + u = Oas well. • If we read the dlstributivity properties (e) and ( f) fro m right to left . they say thai we ca n filctor a common scalar or a common vector fro m a su m.
mil We prove p roperties (a) and (b) and leave the proofs of the remaining p roperties as exerCiSes. Let u = [ 11 1,112> " " Il nJ , V = [VI' V2" ' " v"l , and w '" [ wI'
W2•··· > w..l.
(, j
+ [vl . v2, ••• • v" 1 = [Ill + VI' /11 + V1" ' " 11.. + vnl = [ VI + 111' V2+ 112" " , Vn + IInl = [ vI' V1" ' " vnl + [ U p 1/2" ' " U,,]
u + v = [ tl l ,
1/2" ' "
unl
= v+ u The second and fourth equalities are by the defi nitio n o f vecto r addition, and the third equality is by the comm utativity of addition of real n um bers. (b) Figure 1.18 ilIuSlTates associativity in R: 2• Algebraica lly, we have
eu+ v) + w =
([lip [ Il l
(u
+ v) + w
= u
+ ( v + w)
[ ( II I [ Ii i
, 11"1 + [ VI' V2, ••• , v"]) + [Wl' W" . . . , W"] 112 + V" ... , lin + v~ ] + [ WI' W " • • . , wn]
112""
+
Vj ,
+ +
VI)
(VI
+ +
WI' (U 2
WI )' 112
+
V2)
+ ( Vl
+ +
W 2• .•• ,
W l ), · · · ,
( II ~
+
II ~ +
V~)
+
Wn ]
n + \\In) ]
(V
w
, fig.,. 1.1'
= u
+ (V + W)
The fourth equality is by the associativity o f add itio n of real nu mbers. Note the careful use of parentheses.
Section 1.1
The GWllletry and Algebra of "wors
11
By property (b) o f Theorem 1.1, we may u Ilambiguously write u + v + W without parentheses, since we may group the summands In whichever way we please. By (a), we may also rearrange the summands-for example,as w + u + v- if we choose. Likewise, sums of fou r or more vectors can be calculated without rega rd to order or grouping. In general, if VI' v2" •• , vt are vectors in R ~, we will write such sums without parentheses:
The next eX3mple illustrates the use of Theorem 1.1 in performing algebraic calculations with vectors.
Let a, b, and x denote vectors in Rn. (a) Simplify 3a + (Sb - 2a) + 2(b - a). (b ) If Sx - a "" 2{a + 2x), solve for x in terms of a.
hI'tltl We will give both solutions in detail , with reference to all of the properties in Theorem 1.1 th3t we use. It is good practICe to justify all steps the first few times you do this type of calculation. O nce rou arc comfortable with the vector properties, though, it is ;\cccptable to leave out some of the intermediate steps to save time and space. (a) We begin by inserting parentheses.
3a + (Sb - 2. ) + 2(b - a) = (3a + (Sb - 2. )) + 2(b - a) = (3a + (-2a + Sb)) + (2b - 2a) = «3a + ( - 2a» + Sb) + (2b - 2a ) = «3 + (- 2»a + Sb) + (2b - 2a) = (I a + Sb) + (2 b - 2a ) = « a + Sb) + 2b) - 2a = (a + (Sb + 2b» - 2a =(. +(S+2)b) - 2a
(a ), (t )
(bl (fl (bl, (hi (bl (f)
=
(7b + a ) - 2a
(.1
=
7b + (a - 2a)
(bl
=
7b + (1 - 2)a
(fl, (h i
=7b +( - I). =
7b - a
You can sec why we wi ll agree to omit some or these steps! In practice, it is acceptable to si mplify this sequence of steps as 3a
+ (Sb - 2a ) + 2(b - a ) "" 3a + Sb - 23 + 2b - 23 = (3. - 2a - 2a)+(Sb +2b) "" - a
or even 10 do most of the calculation mentally.
+ 7b
12
Chapter I
Vel:IOrs
(b) In de tail, we have
Sx - a "'" 2(a + 2x)
(5x
~
5x - a =2a+ 2(2x)
(eJ
5x - a = 2a + (2 · 2)x 5x - a =2a + 4x
(g)
a) - 4x = (2a + 4x) - 4x
+ 5x) - 4x "" 2a + (4x - 4x) - a + (5x - 4x ) = 2a + 0 -a + (5 - 4)x = 2a
(- a
(a). (b )
(b), (d )
(O. (c)
- a +( I )x =2a
a + (- a + x) = a + 2a (a + (- a)) + x - (I + 2).
(h)
(b ),
O + x =3a
(d )
x = 3a
«)
m
Again . in most cases we will omit most of these steps.
111..1 CO •• llltlOII.I. Coo~II.IeS A vector that IS 11 sum of scalar multiples of other vectors IS said to ~ a linear combi· lIatloll of those vectors. The fo rmal definition follows.
•
lelilill..
A vector v is a Ii" ear combinatiotf of vecto rs vI' v" . .. , Vi if there are scala rs 'I' '.! •... , c1 such that v - ' IVI + ~V2 + .. . + 'tvJ: The scalars Cl' cll ••• , c~ are called the coeffide", s of the linear comb ination.
,
...
2 The vecto r - 2 is a linear combination of - I
3
I
2
0
+2 - 3
- ]
]
-
I
0 - I
5
2
, - 3 • and - 4 , since
5 -4 = 0
•
I
0
2 -2 - ]
-4
••••rk Determining whether a given veCior is a linear combimltion of other vectors is a problem we will address in Chapler 2.
In R2, it is possible to depict linear combinat io ns of two (nonpa rallel ) vectors quite conveniently.
Let u =
[~) and v = [ ~). We can use u and v to locate a new sel of axes (in the same
way that e , =
(~] and e1 = (~] locate the standa rd coordi nate axes). We can use
Section 1.1
The Geometry and Algebra of Vectors
1~
y
2,
w
v
"
.
---jC-+--+--+_ X
-" Flg.r. 1.19 these new axes to determine a coordinate grid that wtlllet us easIly locate lInear combinations of u and v. As Figure 1.19 shows, w can be located by starting at the origin and traveling - u follow'ed by 2v. That is, w == - u
+
2v
We say that the coordinates of w with respect to u and v are - I and 2. ( Note that this IS j ust another way of thinking of the coefficients of the linear combinatio n.) It fol lows that
(Observe that - 1 and 3 are the coordmates of w with respect to c 1 and c 2. )
SWItching from the standard coordi na te axes to alternative ones is a useful idea. It has applications in chemIstry and geology, Slllce molecular and crystalline st ructures often do not fall onto a rectangular grid. It is an idea that we will encounter re peatedly in this book.
J. Draw the following vectors in standard position in Iff:
la)
a ~ [~]
I')' ~[-: ]
Ib)
b~ [ :]
Id)d ~ [_ ~ l
4. If the vectors in Exercise 3 are translated so that their heads are at the point (4, 5, 6), find the po ints that correspond to thei r tails.
-
-
5. For each of the follOWing pairs o f points, draw the vector AB. Then compute and redraw AB as a vector In standard POSI t Ion.
2. Draw the vectors in Exercise I with their tails at the point (- 2, - 3).
la) A ~ II, - I),B ~ 14,2 ) Ib) A ~ 10, - 2), B ~ 12, - I)
3. Draw the following vectors in standard position in lR':
(e) A == (2, f) ,B =
l a ) a~ [ O,2,O [
Ib)b ~[ 3,2,1[
(c)e ==[ 1, - 2, 1)
(d)d ==[ - ], - ], - 2[
(d) A ==
(t,3)
d,D, B == (i, ~ )
14
Chapter I
Vectors
6. A hiker walks 4 km no rth and the n 5 km northeast. Draw displacement vectors representing the hike r's tTiP and draw a vector that rep resents the hike r's net displacem ent frOIll the staning point.
Exercises 7- 10 refer ro the vectors in Exercise J. Compllte the indimtcd vectors (/lui also show how the resl/lts a lii be obtained geometrimlly. 7. a
+b
-
-
(a) AB
-
(b) BC
AD (d) CF (e)
-
+c
(e) AC
lO.a - d
(f) Be
8. b
-.
9. d - c
Exercises II am/ 12 refer to the vectors ;,, Exercise 3. Compute tile indicated I'ectors. 12.2c -3b - d
11.2a+ 3c
13. Find the components of the vectors ti , v, u + v, and u - v, where u and v are as shown ill r igurc 1.20. )'
-
Express each of the followm g VCCIOTS in terms of a :::: OA and b = DB:
--->
+ DE +
~
FA
In exercises 15 and 16, simplify t/ie givell vector expression.
llidimte which properties j" T/leorem 1.1 yOlil/se. 15. 2(a - 3b)
+ 3(2b + a)
16. -3(a - c) + l {a + 2b ) + 3(c - b)
111 Exercises J 7 and 18, solve for the vector x ill terms of the veClnrs a and b.
1
17. x - a = 2( x - 2a)
•
18. x
60'
+ 2a
- b = 3(x
+ a)
- 2(2a - b )
Itl Exercises 19 alld 20, draw tile coordJ/late t/.Xes re/(lfive to u and" and /ocfllew.
- I
19. u = [ _ : ]. v = [ : ], w = 2u
20,:. =
- I
FI'lre 1.11 14. In rlgure 1.21, A, B, C, D, E, and Fare the verti ces of a . regular hexagon centered at the ongill. )'
-u -2v
1/1 ExerCises 21 tHld 22. draw tile stalldard CQOrtli,wte axes 011 {h e same diagram as tIre axes relative to u alld v. Use tlrese to find w as a lillctlT combinat ion of u and v .
21. . ~ [_:]., =[ :].w ~ [!]
22.u ~ [ - ~ ]. , = [ : ]. w~ [:]
1/
C
[- ~ J. v = [ ~J. w =
+ 3v
23. Draw diagrams 10 11IuSIrate properties (d) and (e) o f
A
-",
a E
",Ire 1.11
x
T heorem 1.1 .
24. Give algebraic proofs of propert ies (d ) th ro ugh (g) of Theorem 1.1. F
Section 1.2
Lengt h and Angle: The 001 Product
15
length and Angle: The Dot Product It is qUIte easy to reformulate the familiar geometric concepts of length. distance. and angle in terms of vectors. Doing so wdl allow us to use Ihese Important and powerful ideas in settings more general than R Z and IR' . In subsequent chapters, these simple geo met ric tools will be used to solve a wide variety of problems tlrl Slll g in applicatio ns even when there is no geometry apparent at all!
n. Dot Predict T he vecto r versions of lengt h, d istance, and angle ca n all be descnbed using the noti on of the dot prod uct 0 [ 1\"'0 veC lOrs.
Delililiol
If
", ". then the dot product U · v of u and v is defined by
In wo rds. u· v is the su m of the p roducts o f the correspond ing components of u and v. 1t is importan t to note a couple of thi ngs about this "product" that we have Just defined : First, u and ... must have the same number of com po nents. Second , the dot p roduct u' v is a /lu mber, no t anoth er vector. (T his is why u ' v is sometimes called the scaltlr producl o f u and ....) The dot prod uct of vectors in IRwis a special and importan t case of the mo re general notion of imler producl, which we Iyill explo re in Chapter 7.
EII.ple 1.1
-3
I
Compute u ·v when u =
2 and v =
-3 S.IIII ••
U · V
= ] · (- 3) + 2 · 5
5
2
+ (- 3) ' 2 =
]
Notice that if we had calculated v · u in Exam p le 1.8, we wo uld have com puted v ' u = (- 3) ' 1 + 5 · 2 + 2 ' (- 3)
=
]
That u ' v = v ' u in general is clear, since the individ ual products o f th e componen ts commute. This comm utativity propert y is o ne of the properties of the dot product that we will usc repeatedly. T he main properties of the d ot p roduct arc summarized in Theo rem 1.2.
.2
Let u , v, and w be vectors in IR" and let c be a scalar. Then a. b. c. d.
u' v = v ' u Commutati\·ity Dhlributlvit)" u · (v + w) = u'v + u'w (cu ) ·v = c( u·v)-..; u · u ~ O and u·u =Oifnndonlyifu=O
',..1
We p rove (a) and (c) and leave proof of the remaining properties fo r the exercises. (a ) Applying the definition of dot product to u . v and v' U, we obtam u .v =
II I VI
=
VI lli
+ +
II. V. Y. u .
+ ... + + ... +
Il~Vn Vnll n
where the middle equality fo llo\'IS fro m the fact that multiplicalion of real numbef5 is commutative. (c ) Using the defi n itio ns of scalar multipl icatio n and dot product, we ha\'e
(cu ) • v
= [CU ,. CU2' ..• , CII "1- [ VI> V2' ... ,
= ~
+ (U 2 V1 + ... + c(ll t VI + 112V2 + ... +
(IIIV,
vn]
CII"V"
II"Vn )
« u ·v)
•••arb • Property (b ) can be read from righ t to left, in which case It S,1YS that we can factor out a common vector u from a su m of dot products. Th iS prope rt y also has a " right-handed" analogue that follows from properties (b) and (a) together:
(v + w) · u ::
V ' U
+ w . u.
• Propert y (e) can be extended to give u' (cv) = c(u ' v) (Exercise 44). This extended version of (c) essentially says thaI in t'lking a scalar mult iple of;1 dot product o f vecto rs, the scalar can first be comb med with whic hever vector is more convenient. For example,
(j[- I, - 3,ZJ) · [6,-4,O)
~
[-I ,-3,Z) ' (j[6, - 4,0J)
~
[- I,-3,2) · [3, - Z,O) ~ 3
With this approach we avoid introducing fractions mto the vectors, as the o riginal groupi ng would have. • The second part of (d ) uses the logical connective If (wd only if Appendix A d iscusses this phrase in mo re detail, but for the m oment let us just note that the wording signals a dOl/ble implication- namely, ifu = O,t hen u'u :: 0 and
if u'u = O,thenu = 0
Theorem 1.2 shows that aspects of the algebra of \'e<:tors rtsCmble the algebra of numbers. The next exam ple shows that we can sometimes fi nd vector analogues of fa miliar iden tities.
Section 1.2
Ellmple 1.9
Prove that (u
+ v)· (u + v)
=
Length and .... ngle: The Dol Product
Jl
+ 2(u-v) + v 'v fo r all vectors u and v in IR~.
U·U
(u + v)· (u + v) = (u + v) ' u + (u + v) '\1 = u·u + v·u + u·v + v·y
S.IIIII.
= u ·u + u- v + u· v + v-v -
U ·U
+ 2(u-v) + V'V
(Id en tify the parts o f Theorem 1.2 that were used at each step.)
)'
To sec how the dot product plays a role in the calculalion of lengths, recall how lengths 3rc computed in Ihe plane. The Theore m of Pythagoras IS all we need.
b
IIvl1 = Val + b 1 /
,,: ,, , ,,
-I' ' - - - - - 4_ x a fl,lr.l .U
[~] IS the d istance frorn the origin \0 the point Pythagoras' Theorem. is given by Va- + II, as in Figure 1.22.
In Rl, the length o f the vector \I = (a, b), which, by
Observe that til
+ II-
= v' v. This leads to the followi ng definition. ,•
Dell.llloD
The leng.h (or twnn) ora vector v arive .scalar ~ vf defined by
",.
in R" is tnef'iotu'i
•
v.
V vi + 0 + ... + :!I ________________ I v~
=
, hi
Vv· v =
~=F
In words. the lengTh of a vector is the square root of the sum of the squares of its components. Note that the square root of v' v is always defi ned, since v' v > 0 by Theorem l.2(d ). No te also that the definition can be rewntlen to give U 2 :::: V • v, whICh will be useful in provmg furth er proper ties of the do t p roduct and lengths of vectors.
E. . . plll.l0 Theore m 1.3 lists some of the main properties of vector length.
ThallI. 1.3
Let v be a vector in R" and let c be a scalar. The n a.
Ilvl
= 0 ifand only if v = 0
b. • "'1 - kiN ____________________ .__ .-L-
P'II'
•
Properly (a ) fo llows immediately from Theorem 1.2{d ). To show (b ), we have
levi' -
(ev)·(ev ) - c(v.v ) =
<'lIvl'
using Theorem 1.2(c). Taking square roots of both sides, usi ng the fuct that
W
=
lei for any real number c, gives the result.
11
Chapt~r
I Vectors
A vector of length I is called a unit vector. In lRl, the set of all uni t vectors can be identified wi th the Imit circle, the circle of radius I centered al Ihe origin (see Figure 1.23). Given any nonzero vector v, we can always find a unit vector in the 5.1mcdlrcetion as v by dividing v by its own length (or, equivalenlly, TlIII/lip/ying by 1 /~vI). We can show this algebraically by using property (b) of Theorem 1.3 above: If
u=
(l/I v~ ) v,
then M
11/lvl ll - (1/1 )lvI = I and u is in the same direction as v, since lflvl is a positive scalar. Finding a unit vec=
l(lflvl)vl
=
lor in the same direction is often referred to as lIormalizitlg a vector (see Figure 1.24). y
I
,
/
v
- I
I
,
~
I ,
,
( ..IIPI.1.
fI,.r.'.U
fl,.,. 1.r.
Unit vectors in R!
Normaliling a
In 1R2, let e l
=
l~]
and c1 =
[~]. Then eJ and ez are unit vectors, since the sum of the
squares of their compone nts is I in each case. Similarly, 111 R', we call const ruct unit vectors I 0 0 e, = 0 , e2 = I , ,nd e 3 = 0 0 0 I
, y
" " -4- - >1-- - e,
_ x
fl,.r".15 Standard un it vectors in Rl and R'
x
y
19
Section 1.2 Length and Angle: The Dot Product
Observe in Figure 1.25 that these veClO rs serve to locate the positive coordinate axes in RZ and RI.
In general, in R", we defin e uni t vectors e 1• e Zl • •. , e". whe re e, has li n its Ith com ponen l ilnd zeros elsewhere. These vectors a n se repeatedly in linear algebra a nd are called the stntulnrd unit vectors.
Exlma'e 1.12
2 Norma1i1.e the vecto r v =
- I 3
"IIUII
'"
~ vl = \hz +(_ I )l+3l = \IT4, so a uni t vector in the same direction
as v IS given by
2
u = (1/I>I)v = (1/ Vl4) _ I 3
2/ Vl4 -1/ Vl4 3/ Vl4
Since property (b ) of Theorem 1.3 describes how length behaves With respect to scala r multiplication. natural curiosity suggests that we ask whether length and vector addition arc compatible. It would be nice If we had an identity such as lu + vi = I l u~ + Ivll. bu t fo r nlmost any choice of vectors u a nd vthis turns out to be false. [See Exercise 42(n).) However, all is not lost, fo r It turns out tha t if we replace the = sign by :5 ,l he resulti ng inequality IS true. The proof of this famous and important resuh-
••
Ihe Triangle Inequality-relies on anolher imporlant inequality-the CauchySchwarz Inequality-which \....e will prove and disc uss In m ore detail in Chaple r 7.
Theorem 1.4
• The Cauchy-Schwarz InequaUty
a,
•
For all vectors u and v in Rn,
u
+ ,.
" flllrt 1.21 The Triangle In(>(Juality
.
See Exercises 55 nnd 56 for nlgebraic and geometric npproaches to the pmof o f this inequality. In Rl or IR J , where we G ill usc geom etry, it is dea r from a d iagra m such as Figure 1.26 that l u + vi < ~ u ll + I vl fo r all vectors u and v. We now show that thIs is true more generally.
,
~
ThlOrlm 1.5
The Triangle Inequality For all vectors u and v in R~.
20
Chapter I
Vectors
P"II Since both sides of the inequality arc nonnegative, showing thai the square of the left-hand side is less Ihan or equal to the square of the right-hand side IS equiva lent to proving the theo rem. (Why?) We compute
Il u + V~2
(u + v) ·(u + v)
""
= u·u + 2(u·v) + v'v
By EX31llpie 1.9
" iull' + 21u· vi + ~vr s lul1 2 + 21uU + IvII 2 = (lui + IvI )'
By Cauchy-Schwoarl
as requ ired.
DlSlance The d istance betw'cen two vectors is the direct analogue of the distance between IwO points on the real n umber line or IwO points in the Cartesian plane. On the number line (Figure 1.27 ), t he distance between the numbers a and b is given by ~. (Taking the absolute need to know which o f a o r b is larger.) This d istance is also eq ual to , and its two-dimensional generalization is points (II I' a2 ) and (bl' btl-na mely, the familia r fo rmulaIor the dis· nce (I
la-
d=Y
-b
'+
-11,)' t
/,
o
I i
I
o
-2 flglr'1 .21 d
= 10 -
Jn terms of vectors, if a
=.0
[
~
::]
I 4 I I . 3
= 1- 2 - 31= 5 and b "" [ : : ], then ti is just the length o f a - b.
as shown III Figu re 1.2B. This is the basis for the next definition. y
a- b
,
,, , ,. ,, "' , __ _ ________ JI,
l a~- h _
FIliI,. 1.Z1 (/ - v""(a-,'----b,.,")'~+-c(a-,-b,.,")l
DelialtloD
=
'--------~ x
la - hI
The distanced(u, v) between vectors u and v in Rn is defined b
(u. v)
=
lu .
i
' UW'"
S«t ion
1.2 Length and Angle: Thl' Dot f'roduct
Exlmple 1.13
o
Find the distance betwl'en u =
and v :::
I
- I
S,I.II..
21
2
- 2
v2 We com pute u - v =
-
I ,so
1
d(u . , ) ~ ~ u -
'i
s
\I( v2) ' + ( -
I )'
+
I' ~
V4
~ 2
The dol product can also be used to calc ulat e the ang le between a pair of vcctors. In 1!l2 orR), Ihe angle bet>.vee n Ihe non7.ero vect ors u and v will refer 10 the angle 0 determ ined by these vectors that S:ltisfi es 0 s () :S 180" (sec Plgure 1.29 ).
,
, u
"
• C\
•
u
u
fl,. f.l .2I The lIngie bctw('('n u lind v In Figure 1.30, con sider the tria ngle with side s u, v,and u - v, where () is the angle between u and v. Applyin g the law of cosi nes \0 Ih is triangle yields
U- '
•
Il u - vl1
U
FI"f' U'
lu92 + I 2 - 20ull cos 0 Expanding the left-hand side and usin g Ivf2= v · v seve ral time s, we obta in lui ' - 2(u" ) + lvi' ~ lu! ' + lvi' - '1Iu!II'1005 0 :::
which, after simplification, leaves li S with u' v = U uUvf cosO. From this we obtain the foll owing fo rmula for Ihe cosine of tile angle () between non zero ve<:t ors u and v. We s\:lt c it as a definition. For nonzero vectors u and v in R", u '
•
Eli _p ie l.1
Compu te the angle between Ihe vectors u =
12, I, -21 and v
IE
I I, I, II.
U
Chapter I Vectors
Sollllal
We calculate U· \I = 2· 1 + I . I + (-2 )' I = I. lui =0 V2 2 + 11 + (- 2)1 = V9 = 3, and I"" = V 11 + Il + Il= v'3. Therefo re. cos O= I/ 3v3 . so (J = c05 - 1(1/3 \13 ) .. 1.377 radians, or 78.9". of
tI
Com pute the angle between the diagonals on two adjacent faces of tI cube .
Sol lll. . The dimensions of the cube do not matter, so we will work with a cube with sides of length I. Ori ent the cube rdalive to the coord inate axes in R'. as shown in Figure 1.31, and take the two side diagonals to be the vect ors] 1,0, I J and ]0, I, 1J. Then ang le 0 between these vectors sat isfies cosO =
1·0 +0 ·1
+
v'2 V2
1· 1
I =2
from which it follows that the required ang le is 'TT/ 3 rad itlos, or 60°.
, [0. 1.1 ]
fI , O.lj
_ -x
y
fI,.,. 1.31 (Actuall y. we don 't need to do any calculations at all to get th is answer. If we draw a third side diagonal joining the \lert ices al ( 1,0, I) and (0, I, I ), we get an equilateral tnangle, Sillce all of the side diagona ls arc of equal length. The angle we w:tnt is one of the angles of this triangle and therefore measures 60°. Sometimes a little insight can save a lot of calc ulation; in th is case, il gives a nice chec k on our wor k!)
• As Ihis discussion shows. we usua lly will have 10 settle for an approx imation to the ang le between t....-o vectors. However, when the angle is one of the so-c alled spee cial angles (Oe, 30 • 45°, 60°, 90°, or an mteger mul tiple of these), we sho uld be able to recognize its cosm e Crab le 1.1 ) and thus give the cor respond ing angle exac tly. In all
~able 1,1
00
0
cos 0
CoslDls 01 Special Ing liS
v'4 2
= I
300
0
45
0
VI
2
2
=
600 I
Vi
VI
2
W
I = -
2
Yo 2
=0
n
Section 1.2 I.englh and Angle: The Dol Product
ot her cases, we will use a calculator o r computer to approxi ma te the demed angle by means of the JIlverse cosine function. • The de ri ~ at ion of the fo rmula for the cosine of the angle between two vectors is valid only JIl Rl o r R}, since it depends o n a geometric fac t: the law of cosines. In Rn. (or 1/ > 3, the formula can be ta ken as a (Iefil/ition instead. This makes sense, smce the Cauchy-Schwarz Inequali ty implies that
u· v
U ·
v
S I, so l u~avl r
- I to I, just as the cosine function does.
OnUgOlI1 •• ctors The word orthogonal is deri\'ed fro m the GredI: words orlh..s, meaning "upright," and g,m;a, mea 11 i I1g "angie." Hence. lWtho nal literally means "right·angled.~ The Latin equivalent is rectangular.
The concept of perpend icularity is fu nda mcntal to geometry. Anyone st udying geometry q uickly real izes the importa nce and usefulness of Tigh t angles. We now generalize the idea of perpendicularit y to veClOfS ill R", where it is called orthogonality. In lit: o r R3, two nonzero vectors u and v a re perpendicular If the angle 8 between them is a right angle-that IS, if Q - 1T/ 2 radians. o r 90°. ThUS'
I :~'I:1 = cos 90
0
= 0,
and it follows that u ' v = O. This motivates the following definitio n.
BIIiDlUaD
Th'ovectors u and v in R" are or,hogoll4lto each other if u· v
O.
Since O· v s 0 for every vecto r v in Rn, the zero vecto r IS o rthogonal to every vector.
E....II 1.16
In lR'. u
=
]1, 1, -2] andv = 13,1,21 a reorthogonal,smce u . v = 3
+ I
- 4 =
o.
...i
Using the no tIon of o rthogonalit y, we gel an easy proof of Pythagoras' TheQrem, valid in Rn.
Pythagoras' Theorem Por all vectors u and v in orthogonal.
Rft,
u."
+
~
: if and o nly if u ant.!
"... From Example 1.9, wc have l u + vf = ~ u 11 2 + 2(u · v) + Ivfl for all vectors u and v in R". It foll o\vs immediately that lu + vll = lull + ~ vl: if and only if u . v = O. See Figure ' .32.
v
u+ ,
,
" flilre 1.U Pythagor:u' Theorem
The concept of o rthogo nal ity is one ofthe 1I10si important and useful in linear algebra, and it often rtrises in surprisi ng ways. Chapter 5 contains a detaIled treatment of the topIC, but we will encounter it many lim es befo re then. One problem in which it clea rly plays .. role is fi nding the distance from a point to a line, where "dro pping a perpendicular" is a fa miliar step.
Prolletlo., We now consider the problem o f finding the d istance fro m a po in t to a line in the context of vectors. As you will set, this tedmique leads to an important concept: the p rojection of a veCior onto ano ther vecto r. As Figure 1.33 shows, the problem of finding th e distance from a point B to a li ne C(in R2 or R 3 ) red uces to the p roblem of fi nding the length of the perpendicula r li n e segmen t PB o r, equi valently, the lengt h of the vecto r PH. If we choose a point A on C, then, in the right-angled Irmngle 6.A PB, the other two vectors arc the leg AP and the hypotenuse AB. AP is called the projection of AB onto the line t. We will now look at Ihis situation in terms o f vecto rs.
-
--
-
B
B
.-----'
p
A
n•• r. UJ The distance from a pomt to a [me
Consider two nonzero vectors u and v. Let p be the vector obtained by dropping a perpend icular from the head of v o nto u and let 0 be the angle between u and v, as shown in Figu re 1.34. Then clearl y p = ~ p U u , where u "" ( 1/I ul )u IS the unit vector in the di rcction of u. Morcover, elemen tary trigonometry gives I p~ = Ivi cosO, and we know that cos (J =
I ~~'I:~ ' Thus, after substitution. we obtain
U ,v ) U ( U' U
This is thc for m ula we want, and It is th e basis o f thc follo\" ing definitio n for vccto rs in IR".
Definition
If u and v are vectors in -v onto u ; [he \'cctor proj.. (v) defined by
rojll(v) =
R" and U·
u • 0, then th~ ptPjlL'N,.
v)
- ( U'U
u . .
An alternativc way to derive thiS formula is d escribed in Exercise 57.
•
Section 1.2
,, ,, ,, '<1
proj g(V)
Length and Angle: The Dot Product
2:5
Re • • ,U
•
• U
fI,lr.1.35
• The term projectio" comes from the id ea of projecting an image onto a wall (with a slide projector. for example). Imagine a beam of light with rays parallcll'O each other and perpendicular to u shi ning down on v. The projection of v onto u is just the shadow cast, or p rojccted, by v 011 to u. • it may be helpful to thi nk of proUv) as a function with variable v. Then the variable v occurs only once on the righ t-hand sid e of the defini tion. Also, it is hel pful to remember FIgure 1.34, which reminds us that proju(v) is a scalar multiple of the ' -ector u ( not v). • Although in our derivation of the definition of projg(v) we required v as well as u to be nonzero (why?), it is clear frolll the geometry that the projection of the
zero vector onto u is O .The definition is in agreement with this, si nce ( u. 0) u~ Ou = O.
U· U
• If the angle between u and v is obtuse. as in Figure U S. then proj. (v) will be in the opposite d irection fro m Ui that is. proj. (v) will be a negative scalar multiple of u. • If u is a unit vector then proj.(v) z:: {u . v) u. (Why?)
(KImple 1.11
Find the projection of v onto u in each case.
Sllullon (a) We compute U' v =
[ ~ ] . [ - ~] =
I and u ' u =
[n ·[~]
= 5, so
= [2/S] ( U'V)U~ ~[2] 5 1 115 U·U
(b) Since cJ is a unit vector,
o proj.,(v) = (cJ ' v)c) = 3c) = (c) We sec that
lui =
v'~ + ~ + t =
pro; ..(v}-(u·V )U =
(~
+
I
+
1. Thus.
~)
1/2 1/ 2
I/ Vi 3( 1
+ Vi) 4
0 3
I I
Vi
3( 1
+ Vi) 2
1/ 2 1/ 2
I/ Vi
C h~pt cr
26
I Vectors
In Exercises 1- 6, fil/d u • v.
3. u =
\
,
2 ,v =
3
-
. - 4. u =
\
3
5. " ~ [ \,
-
\12, v'J, 01, v
3.2 - 0.6 , v =
1.5
4.\
- 0.2
- 1.4
~ [4, -
30. Let A = ( - 3,2), B = ( 1, 0 ), and C = (4,6). Prove that a A BC is a right-angled tria ngle. 31. Let A = ( 1, 1, - I), R = ( - 3, 2, - 2), and C= (2,2, - 4). Prove that dARC is a right-:mglcd triangle. .::::. 32. Find the angle between a diagonal of a cube and an adjacent edge. 33, A cube has fou r diagonals. Show that no two of them
\12, 0, - 51
are perpendicular.
6. u = [ 1.1 2, - 3.25, 2.07, - 1.83],
111 Exercises 34-39, find tile projeClioll of v 011to u. Draw (I sketch ill Exercises J4 ami 35.
v = [ - 2.29, l. 72, 4.33, - 1.54]
In Exercises 7- 12. jimi ~ u B for the given exercise. (HId give a un it vector 11/ tile (lireclIOU of u. 1M
7. Exercise I 10. ExerCise
8. E.xercise 2
9. Exercise 3
II . Exercise 5
::. 12. Exercise 6
36. u =
2
2/3 - 2/ 3
- 2 37. u =
2
- 1/ 3 III Exercises 13- 16. find Jhe dis/(lllce d( u, v ) betweell u ali(I v ill the givell exercise. 13. Exercise I
_~ 38. u =
14. ExerCise 2
15. Exercise 3 ~ 16. Exercise 4
17. If u, v, and w are vectors in Rn, /1 ~ 2, and c is a scalar, explam why the follow'lIlg expresSIOns make no sense:
(.) I"'vl
(b) u · v + w
(c) u· (v·w)
(d) c· (u+w)
=-
39. u =
\
\ 8. "
v
= [~]. v = [- :]
19.u =
3.0\
1.34
- 0.33
4.25 - 1.66
- 2
20. u = [4,3 , - I ], v = [ I, - I, I]
(, )
~ 21. II = [0.9,2. 1, 1.2], v = [- 4. 5,2.6. - 0.8 ]
22, u
= [1, - 2,3,4 ], v = [- 3, I, -
1, I ]
23. u = \1,2, 3,4]. v = 15, 6.7 , 8] /11 Exercises 24-29, find lilt allgle between U lind v . .
In
lir e
g,vell exerCIse.
24. Exercise 18
25. Exercise 19
~ 27, Exercise 21 ~ 28. Exercise 22 :
26. Exercise 20 29. Exercise 23
-2
2.52 Figure 1.36 suggests two ways III whICh veclOrs may be used to compute the a rea of a triangle. The area A of
- \
\
- \
1.2
1.5
\
- I ,v =
,v =
[0.5] ,v = [2.\]
is acute, obtuse. or a right angle.
2
\
- \
III E.xercises /8-23, determme wlretlrer the allgle between 1\ IIlId
-\
2 -3
(b)
Fl•• ,. 1.16
Secllo n 1.2
the triangle in part (a) is given by 11 Iv - proj.(v)l, nnd pa rt (b) suggests the trigonometric form of the nrea of a triangle: A = 1I ullvisinO (We can use the identit ysinO "" V I -cos1 8tonnd sinO. ) In Exercises 40 tmd 41 , compllte the arell of the triangle with
tilegil'cnl'crtices !Ising bOlh ltIellwds.
= (1,- 1),8 = (2,2),C= (4,0) 41. A = (3, - I, 4), 8 = (4, - 2, 6), C = (5,0,2) 40. A
, V ""
2
k' k -3
44. Describe nil vectors v = [;] that are orthogonal
lo u =[~l 45. Describe all vectors v "" [;] that are orthogonal
,o u =[~l 46. Under what conditions arc the following true fo r vectors u and v in Rl or RJ ?
(.) lI u+ ~I = Ilull + II~I 47. Prove Theorem 1. 2(b). 48. Prove Theorem 1.2(d).
111 Exercises 49-51, pro\'t~ the statell property of distance betWCCl1vectors. 49. d(u, v) "" dey, u) for all vectors u and v 50. d(u, w) ::5 d(u, v) + dey, w) for all vectors u, v, and w 51. d(u, v) "" 0 ifa nd only if u == v 52. Prove that U' cv = c(u ' v) for all vt""Ctors U and v in R" and all scalars c.
,
53. Prove that Jlu - vU2: UuJl- llv11fo r all vectors u and v in \R:". I Him: Replace u by u - v in the Triangle Inequality.1 54. Suppose we know that u . v = U· w. Does It follow that v "" w? If it does, give a proof that is valid in Rn; otherwise: give a cOllllterexampie (that is. a specific set of vectors u, v, and w for which u ' v = u' w but v ¢ w).
:n
IIul12 -llvWfor all vec-
.h"
56. (. ) Pm" Ilu + ~I' + Ilu - ~I' = 211ull' + 21MI' for all vectors u and v in R~. (b) Draw a diagram showing u, v, u + v, and u - v in R~ and use (a) to deduce a result about parallelograms. U· V
=
1
-4 Ilu + v~2
I -~ u
-
4
- vf
for all
vectors u and v in R~.
which tllc two vectors lire orthogonal. - I
55. Prove that (u + v )· (u - v) = tors u and v in R-.
57. Prove that
[II Exercises 42 Witt 4] , fi nd all wllues of the sealll r k for
1
Le ngt h and Angle: The Dot Product
58. (a) Prove that lIu + vii"" Ilu - vii If and only if u and v are orthogonal. (b) Draw a diagr:lm showing u, v, u + v, and u - v in RI and use (a) to deduce a resu1t abou t parallelograms. 59. (a) Prove thai u + v and u - v arc orthogonal in RM if ull = 11vI1./ Hml:See Exercise 47·1 and only if D (b) Draw a diagram showing u, v, u + v, and u - v in RI and use (a ) to deduce a result about parallelograms. 60. If lIuli
= 2,1Iv11 = v'3, ,nd u. v = I, find Ilu + ~I. 61. Show that there arc no vectors u and v such that Ilull "" [1vI1 ". 2,and U· v :;: 3.
I,
62 ~ (a ) Prove Ihat if u is orthogonal to both rand w, then
u IS orthogonal to v + w. (b) Prove thai if u is orlhogMal to both v and w, then u is orthogonal to sv + t w for all scalars sand t. 63. Prove that u is orthogonal to v - proi~( v) for all vectors u and v in Rn, where u O.
"*
64. (a) Prove tha t proiu{proju(v» = proju(v). (b) Prove tha I proj.( v - proju(v» = O. (e) Explain (a) and (b) gcomClrically.
65. The Cauchy-Schwarz. Inequality lu· vj < Ilull \lvll is equivalent to the inequality we get by squaring both sides: (u' v)2 S lI uW IIvW. (a) In R Z, with u =
["'J 112
and v =
[v'J.
this becomes
V2
Prove this algebraically. I Hint: Subtract the leh -hand side from the right·hand side and show that the difference must necessarily be nonnegative.' (b) Prove the analogue of (a) III Rl. 66. Another approach to the proof of the Cauchy-SchwarL Inequality is suggested by Figure 1.37, which shows
ZI
Chapter I
Vectors
that in R2 or IRJ, l proju(v)1 :s Ivi. Show that this is equivalent to the Cauchy-Schwarz Inequality. v - eu
cu
u
fl •• "1.31
proju(v)
u
Fila" 1.31 67. Use the fa ct tha t proJ.(v) = eu fo r some scalar c, together with Figure 1.38, to fi nd c and thereby denve
68. Using mathematical induction, prove the following generalization of the Tna ngle Inequality:
jlv. +
V2
for all
+ ... + vnl :s Iv.11 + IIv21 + ... +
fI 2;
Ilvn~
I
the formula for projw(v).
•
,
. ,. • 4o
0_
Vectors and Geometry Many results in plane Eucl idean geometry can be proved using vecto r tech niques. For example. in Example 1.17, we used vectors to prove Pythagoras' Theorem. In this exploration we will lise vecto rs to develop proofs for sollle other theorems from Euclid ean geometry. As an introd uct ion to the nOlation and the basic approach. consider the follow ing easy example.
Example 1.18
SllllIOI We fi rst convert evc~ in g to vector nOI:l tionJ.!..o de n~s the Orig i~l and fi! a poin t, l~ be t he vector Op, In this situation, a "" GA, b = DB, m = OM, and AB = DB - OA = b - a (Figure 1.3~ Now, since M is the midpoint of AB, we have
A
•
Give a vecto r description of the midpoi m M of a line scgmcnt AB.
b- a
,--
1
m - a = AM - l AB = l (b - a ) m = a +!
flgar, 1.39
-
The midpomt of AB
1. Give a vector description o f the po ill t P that is one- third of the way from A to B on the line segment AB. Generalize. 2. Prove that th e line segment join ing the nud points of twO sides of a tria ngle is parallel to the third side and half as long. (In vector no tation, prove that PQ = in Figure 1.40.)
tAB
c
Q
A~-------------' . fl,." 1.40
3. Prove th:. t the q uad rilateral PQRS (Figure 1.41 ), whose vertices are the midpoints o f the sides o f an arbitrary quadrilateral ABCD, is a parallelogram. 4. A median of a triangle is a line segmen t from a vertex to the midpoint of the opposite side (J:igure 1.42). Prove th tH the three medians of any tri:.ngle are cOllcurrem (Le., they have a common point of intersection ) at a point G that is two- thirds of the d istance fro m each vertex to the m idpoint of the opposite side. ( Hint: In Figure 1.43, show that the point that is two-thirds o f the d ista nce from A to P is given by~(a + b + c). Then show that 1(a + b + c ) is two-thirds of the distance from B to Q and two-thirds of the distance from Cto R. ) The point G in Figure 1.43 is called the centroid of the triangle.
"
A
A
R
8
M
--"c f)
:..--- p
C
R
n,.rel."1
flg." 1.42:
nlur.l.U
A median
"111' centroid
5. An altitude o f a triangle is a line segmen t fro m a vertex that is perpendicular to the o pposi te side (Figu re 1.44). Prove that the three alti tudes of a tn angle ,1fe concurrent. ( Him: Lei H be the poi nt of intersect ion oflhe altnudes from A and 8 in ~ Figure 1.45. Prove that CH is orthogonal to AB.) Th e point H In Figure 1.45 is called t he ortl'oce,.'er of the tria ngle.
-
6. A perperldicular bis«lor of a line segment is a line through the midpoint of the segment, perpendicular to the segment (Figure 1.46) , Prove Iha\ the perpendicular bisectors of the three sides of a triangle are concurrent. - (l-/IIJI: Let Kbe the point of intersection of the ~pendicu1a r bisectors of ACand Bein Figure 1047. Prove that RK is o rthogonal to AB. ) The poilU K in Figure 1047 is called the circumCfmterof the tria ngle.
_.
c
A
/J
B
A
flllr.1.U
n'I" 1.U
FI'lf,1.41
An altitude
The orthocentcr
A perpendicular biSCC10r
7. Let A and Bbe the endpoints of a diameter of a circle. If C is any point on the circle. pro\'e thai L ACB is a right angle. (Hint: In Figure lAS. let 0 be the center of the circle. E.l:press everything in terms o f a and c and show that AC is orthogonal to BC)
-
-
8. Prove that the line segmen ts Joi ning the midpoin ts of opposite sides of a quadrilateral bisect each other (Figure 1.49).
c A
Q
o
c
B R
A'----~------',. /J
o
R
n•• ,. Ul The circumccnter
n•• ,.1.41
n,.r.1.49
.... -_ _
... ..... ~-, •
cc _ ~ C
31
Lines and Planes
Section 1.3 c
•
1:3
lines and Planes We arc all familiar with Ihe equation of a line in the Cartesian plane. We now want to consider lines in Rl from a vector po int of view. The insights we obtain from this approach will allow us to generalize to lines in R ' and the n to planes in IRl. Much o f thc linea r algebra we will consider in later cha pters has its origins in the simple geometry of lines and planes; the ability to Visualize these and to think geometrically about a
problem will serve you well.
lines In
~.
and
~3
'*
In the xy-planc, the general form of the equation of a line is ax + by = c. If b 0, then the equation ca n be rewritten as y = -(alb)x + c/b, which has the fo rm y = mx + Ie. [This is the slope-intercept form; m is the slope of the line, and the point with coordinates (0, k) is its y-intercept.l To get vectors into the picture, let's consider an example.
The line e with equatlOil 2x + y = 0 is shown in Figure 1.50. It is a line with slope - 2 passing through the ori gin. The left -hand side of the equation is in the form of a dot product; in fact, if we let n =
[ ~] and x =
[;
J,then the equation becomes n . x
=
o.
The vector n is perpendicular to the line-that is, it is orthogol/(II to any vector x that is parallel to the line (Figure 1.51 )- and it is called a normal vector to the line. The equation n . x = 0 is the normal form of the equation of f. Another way to think about this line is to imagine a particle moving along the line. Suppose the particle is initially al the origin at time t = 0 and it moves along the line in such a "Vay that its x-coordinate changes I unit per second. Then at r = I the particle IS at (I, - 2), at r = 1.5 it IS at ( 15, - 3), and, if we allow negatIve values of t (that is, we consider where the particle was in the past ), at t = - 2 it is (or was) at y
y
\
, The l.atin word trormlf refers to
ca rpenter's square, used for d raw~-. ing right angles. Th us, a 110rmal vector is one Ihal is perpendicular to so meth in g else, usuall y a plane.
I
fhilUI' 1.50 The lille 2x + y - 0
flllr.l .51 A normal vector n
..
tI
II
Chapler I
Vtelors
(- 2,4). This movemen t is illustrated in Figure 1.52. In general. if x = t, then y - - 21. and we may write this relationship in vector form as
Whal is the sign ifica nce of the voctor d = [
e,
_~]? It is a particular vector parallel
to caned a din-criol! "t:elor for the line. As shown in Figure 1.53, we may write the equation of 35 x = Id. This IS the vecfor fo rm of the equation of the line. If the line does n o t pass through the origin. then we must modify things slightly.
e
y y
\
,- - 2
-l-
-+-+-+_
_x d , -
I
, ,.. 1.5
" Jlgutl1.52
fI,.r,
,
x
[-il
1.53 1\ dlre ctlOI} veclor d
Consider the line ewilh equation 2x + y= 5 (Figure 1.54). Th is is just the line from Example 1.19 shifted upwa rd 5 u nits. It also has slo pe - 2, but its y-intercept IS the poi nt CO, 5). II is clear Ihat the veClots d and n from Example 1.19 are, respectively, a di rection vector and a normal veClor fo r this line too. Thus. n is orthogonal 10 every vector that is parallel to f. The poi nt J> = ( 1,3) is on f. If X = (x, y) represe nts a general point on f. then the vector PX = x - p is parallel 10 f and n · (x - p ) = 0 (see Figu re 1.55). Simplified , we have n · x = n · p. As a check, we compute
-
n .x =[ ~ ] . [;] = 2X+Y
and
n.p =[ ~] . [ ~] =5
Thus. the normal form n • x = n · p is just a di fferent represcntation of the general form of Ihe equation of thc hne. (Note that in Exam ple 1. 19, p \Vas the ....ero ve<:to r, so n · p = 0 gave the righ t-hand side of the equation.)
-
5«:t ion L3
y
aa
Lines and Plant'S
y
\
/. n d
-+-+-
,
-+-+- \+--+-<_x e
"
x
t
flgur.1.54
Flglr, 1.55
The line 2x + y " 5
n· (x - p)" O
These res ults lead to the fo llowing definition.
DeliDitloD
The normal/orm o/t',e etluatiOtI of a line n' (x - p )=O
where p is a specific point on
or
ein Rl is
n·x = n·p
eand n *' O is a norma) vector for C.
The geneml form of the equation of( i!i ax + by= c, where n mal vecto r fo r
e.
•
Continuing with Example 1.20, lei us now fi nd the vector form of the equation of Note that, for each choice of x, x - p must b e parallel to-and Ihus a multiple ofthe direction vecto r d . That is, x - p = Id or x = p + td for some scalar t. In te rms of componen ts, We have
e.
(I) x= I
+
I
(2)
y = 3 - 21
e,
Th~
word paramt'tl'Tand
ih~ COJ'T~
sponding adjecliv~ JMramctric rome rronl the Greek words pllTa, mea ning "alongside," and metrQ", mcaning "mcasure.~ Mathematically speaking, a parameter is a variable in lemlS of which other va riablcs HC cxpressed-a new "rtlt'
Equat ion ( I) is the vector fo rm of the equat ion of a nd the componentwise equations (2) are called JKlra metric equations o f the line. The va riable t is called a parameter. How dotS all of th iS generalize to R J ? ObserVe tha t the vecto r a nd paramet ric forms of the equations of a line ca rryover perfectly. The notion o f the slo pe of a line in R 2- which is diffi cul t to generalize to three d imensions -is replaced by the mo rc convenient no tion o f a d irec tion vecto r, Icading to t he followi ng defin ition.
Definition
The vector fo nn of the equation of a /jlle C in R2 or R:.is
e
"*
x zr p +td
where p is a specific poi nl o n and d 0 is a direction v~ctor for {. The C
34
Chapter I
Vectors
We will often abbreviate this terminology slightl y, refe rring simply to the genera l, normal, vector, and parametric equations of a line or plane.
Example 1.21
Find vector and parametric equations of the line in IR J through the point P = (1,2, - I), 5 parallel to the vector d =
- I
3 SOlliliol
The vector equation x = p
+
Id is
x
1
y =
2 + t - I
5
- I
The parametric form is
3
+ 5,
x=
I
y=
2-
I
z =- 1 + 31
HamarU
e e
• The vector and parametric forms o f the equation of a given line are not unique-in fac t, there arc infinitely many, since \ve may use any poi nt on to determine p and any direction vector for C. However, all d irection vectors are clearly multiples of each Olher. 10 In Example 1.21,(6, 1,2) is anothe r point on the line (ta ke t = I),and another dircction vectoT. Therefore.
x
6
10
Y
1
+s -2
2
6
- 2
is
6
gives a different (but equivalent) vector equa tion for the line. T he relationsh ip between the two parameters sand t can be fo und by comparing the paramet ric equations: For a givcn point (x. y. z) on C, we have
x=
1 + 51=6+ lOs
y=
2-
1= 1 -
2s
z = -1 +3( = 2+ 6s iml'lying that
- IOs+51 =
25 -
5
t =- I
- 6s + 3t = Each of these equations reduces to I = I + 2s.
3
Section 1.3
Lines and Planes
35
• Int uitively, we know that a line is a one-dimensional object_ The idea of "dimension" will be clarified in Chapters 3 and 6, but for the moment observe tha t th is idea appears to agree wi th th e fact that th e vector fo rm o f the equa tion of a line requires one parameter.
Example 1.22
One often hears the expression "two points de termine a line." Fmd a vector equa tion of the line C in R) de te rmined by the poi nts P = (- 1,5,0) a nd Q = (2, I, I). Solltlot fi ne).
We may c hoose any point on C for p, so we will use P (Q would also be
-
A convenient direction vector is d = PQ = Th us, we o b tain
3 - 4 (or any scalar multi ple o f this). 1
x = p + td - I ~
+
t
-,
1
Plaaes In R'
n
The next question we should ask o urselves is, How docs the general fo rm of the equation o f a line generalize to R~ We might reasonably guess that If ax + by = C IS the general form o f the equation of a line in R l , then ax + by + a = d m ight represent 3 line In IR). In normal (arm, this equation would be n' x = n . p , where n is a normal vector to the line and p corresponds to a point on the line. To see if this is i1 reasonable hypothesis. let's thi nk abou t the special case o f the
flllr, 1.5& orthogonal to infinitely many vectors
n
5 0
3
IS
equatio n ax + by + cz = O. In normal form , it becomes n . x
n ' (x - p )=- Q
,
However, the sct of all vectors x that satisfy this equation is the sct of all vectors orthogo nal to n . A5 shown in Figure 1.56, vectors in infinitely ma ny direc tions have this pro pe rty, determm tng ,1 fa mil y of parallel plmlcs. So our g uess was incorrect: [t appea rs that (IX + by + cz = d IS the equation of a plane-not a line-in iR;3. Let's make thiS fillding more precise. Every plane '1f in IV can be determined by specifying a point p on ~ and a no nzero vector n normal t o~ ( Figure 1.57). Thus. if x rep resents an arbi tra ry point on eJ', we have n . (x - p) = 0 or n . x = n' p. If
n
fl , .r, 1.51
= 0, where n =
a b.
"'-..., ~
n (IX
=
a b
,
x and x =
+ by+ a=
Definition
y, then, in te rms of components, the equation becom cs
,
t/(whered= n ·
pl.
The lIormal fo rm of tile equation of a pla"e ~ in n ' (x - p) = 0
where p is a speci fic point on ~ and n
or
R3 is
n · x = n· p
'* 0 is a normal vector for '3'. a
T he general f orm oftheequ(,tiotl ofr;/' is fIX + b)' + rz = d, where n = a no rmal vector for t1J .
b "
,
.'
38
Ch~Jltcr t
Vectors
Not e that any scal ar mul t iple of a norm al vect or for a plan e is ano ther nor mal vect or.
Exlmple 1.23
Find the norm al and genera! fo rms of the equ atio n o f the plan e that con tain s thc I
poin t P = (6,0 , I) and has norm al vect or n
=
2 3
SO llnl .
Wit h p ""
6 0 and x
x =:
I the nor mal equ atio n n . x = n
0
y , we have n op = 1·6 ,
+ 2·0 + 3 ·1 = 9, so
p bec ome s the gen eral equ atio n x
+
2y
+
3z
::0
9.
4
Geo m etricall y, it is clea r tha t parallel plan es ha ve the sam e nor mal vecto r(s) . Thus, thei r gen eral equ atio ns have left -han d sides that are mul tiple s of each othe r. So. for exam ple, 2x + 4)' + 6z =: 10 is the gene ral equ alio n of a plan e Ih,1I is parHIle! to the plan e in Exa mpl e 1.23. since we may rew rite the equ atio n as x + 2y + 3z = 5-from whi ch we see that the two plan es ha ve the sam e no rma l vect or n . ( Not e tha t the plan es do not coin cide , sinc e the righ t-ha nd Sides of thei r equ a tio ns arc dis tinc t.)
We may also express the equation of a plane in vector or pa rametric fo rm. To do
so, we obse rve tha t a plan e can also be dete rmi ned by spec ifyin g one of its poin ts P ( by the vector p) and two di rect ion vect ors u and v parallel to the plane (but no t pllralle l to each othe r) . As Figu re 1.58 sho ws. given any poin t X in the plan e (loc ated
"
P "' SII + tv
p
x
o flgl rl 1.5. x - p = su + rv by xl, we can always find app rop riat e mul tipl es su and tv of the dire ctio n vect ors such that x - p = su + tv or x = p + s u + tv. If we writ e this equal10n com pon entw ise, we obta m para met ric equ atio ns for the plan e.
Definition   The vector form of the equation of a plane 𝒫 in R^3 is
x = p + su + tv
where p is a point on 𝒫 and u and v are direction vectors for 𝒫 (u and v are nonzero and parallel to 𝒫, but not parallel to each other). The equations corresponding to the components of the vector form of the equation are called parametric equations of 𝒫.
Example 1.24
Find vector and parametric equations for the plane in Example 1.23.
Solution   We need to find two direction vectors. We have one point P = (6, 0, 1) in the plane; if we can find two other points Q and R in the plane, then the vectors PQ and PR can serve as direction vectors (unless by bad luck they happen to be parallel!). By trial and error, we observe that Q = (9, 0, 0) and R = (3, 3, 0) both satisfy the general equation x + 2y + 3z = 9 and so lie in the plane. Then we compute
u = PQ = q - p = [3, 0, -1]    and    v = PR = r - p = [-3, 3, -1]
which, since they are not scalar multiples of each other, will serve as direction vectors. Therefore, we have the vector equation of the plane,
[x, y, z] = [6, 0, 1] + s[3, 0, -1] + t[-3, 3, -1]
and the corresponding parametric equations,
x = 6 + 3s - 3t
y = 3t
z = 1 - s - t
[What would have happened had we chosen R = (0, 0, 3)?]

[Figure 1.59: Two normals determine a line]
[Figure 1.60: The intersection of two planes is a line]
Remarks
• A plane is a two-dimensional object, and its equation, in vector or parametric form, requires two parameters.
• As Figure 1.56 shows, given a point P and a nonzero vector n in R^3, there are infinitely many lines through P with n as a normal vector. However, P and two nonparallel normal vectors n1 and n2 do serve to locate a line ℓ uniquely, since ℓ must then be the line through P that is perpendicular to the plane with equation x = p + s n1 + t n2 (Figure 1.59). Thus, a line in R^3 can also be specified by a pair of equations
a1x + b1y + c1z = d1
a2x + b2y + c2z = d2
one corresponding to each normal vector. But since these equations correspond to a pair of nonparallel planes (why nonparallel?), this is just the description of a line as the intersection of two nonparallel planes (Figure 1.60). Algebraically, the line consists of all points (x, y, z) that simultaneously satisfy both equations. We will explore this concept further in Chapter 2 when we discuss the solution of systems of linear equations.
Tables 1.2 and 1.3 summarize the information presented so far about the equations of lines and planes. Observe once again that a single (general) equation describes a line in R^2 but a plane in R^3. [In higher dimensions, an object (line, plane, etc.) determined by a single equation of this type is usually called a hyperplane.] The relationship among the
Table 1.2   Equations of Lines in R^2
Normal Form:      n · x = n · p
General Form:     ax + by = c
Vector Form:      x = p + td
Parametric Form:  x = p1 + t d1,  y = p2 + t d2

Table 1.3   Lines and Planes in R^3
Lines:
  Normal Form:      n1 · x = n1 · p1,  n2 · x = n2 · p2
  General Form:     a1x + b1y + c1z = d1,  a2x + b2y + c2z = d2
  Vector Form:      x = p + td
  Parametric Form:  x = p1 + t d1,  y = p2 + t d2,  z = p3 + t d3
Planes:
  Normal Form:      n · x = n · p
  General Form:     ax + by + cz = d
  Vector Form:      x = p + su + tv
  Parametric Form:  x = p1 + s u1 + t v1,  y = p2 + s u2 + t v2,  z = p3 + s u3 + t v3
dimension of the object, the number of equations required, and the dimension of the space is given by the "balancing formula":
(dimension of the object) + (number of general equations) = dimension of the space
The higher the dimension of the object, the fewer equations it needs. For example, a plane in R^3 is two-dimensional, requires one general equation, and lives in a three-dimensional space: 2 + 1 = 3. A line in R^3 is one-dimensional and so needs 3 - 1 = 2 equations. Note that the dimension of the object also agrees with the number of parameters in its vector or parametric form. Notions of "dimension" will be clarified in Chapters 3 and 6, but for the time being, these intuitive observations will serve us well.
We can now find the distance from a point to a line or a plane by combining the results of Section 1.2 with the results from this section.
Example 1.25
Find the distance from the point B = (1, 0, 2) to the line ℓ through the point A = (3, 1, 1) with direction vector d = [-1, 1, 0].

Solution   As we have already determined, we need to calculate the length of PB, where P is the point on ℓ at the foot of the perpendicular from B. If we label v = AB, then AP = proj_d(v) and PB = v - proj_d(v) (see Figure 1.61). We do the necessary calculations in several steps.

[Figure 1.61: d(B, ℓ) = ||v - proj_d(v)||]

Step 1:   v = AB = b - a = [1, 0, 2] - [3, 1, 1] = [-2, -1, 1]

Step 2:   The projection of v onto d is
proj_d(v) = ((d · v)/(d · d)) d = (((-1)(-2) + 1(-1) + 0(1)) / ((-1)^2 + 1^2 + 0^2)) [-1, 1, 0] = (1/2)[-1, 1, 0] = [-1/2, 1/2, 0]

Step 3:   The vector we want is
v - proj_d(v) = [-2, -1, 1] - [-1/2, 1/2, 0] = [-3/2, -3/2, 1]

Step 4:   The distance d(B, ℓ) from B to ℓ is
||v - proj_d(v)|| = ||[-3/2, -3/2, 1]||
Using Theorem 1.3(b) to simplify the calculation, we have
||v - proj_d(v)|| = (1/2) ||[-3, -3, 2]|| = (1/2) √(9 + 9 + 4) = √22 / 2

Note   In terms of our earlier notation, d(B, ℓ) = d(v, proj_d(v)).
In the case where the line ℓ is in R^2 and its equation has the general form ax + by = c, the distance d(B, ℓ) from B = (x0, y0) is given by the formula
d(B, ℓ) = |a·x0 + b·y0 - c| / √(a^2 + b^2)        (3)
You are invited to prove this formula in Exercise 39.
Example 1.26
Find the distance from the point B = (1, 0, 2) to the plane 𝒫 whose general equation is x + y - z = 1.

Solution   In this case, we need to calculate the length of PB, where P is the point on 𝒫 at the foot of the perpendicular from B. As Figure 1.62 shows, if A is any point on 𝒫 and we situate the normal vector n = [1, 1, -1] of 𝒫 so that its tail is at A, then we need to find the length of the projection of AB onto n. Again we do the necessary calculations in steps.

[Figure 1.62: d(B, 𝒫) = ||proj_n(AB)||]

Step 1:   By trial and error, we find any point whose coordinates satisfy the equation x + y - z = 1. A = (1, 0, 0) will do.

Step 2:   Set v = AB = b - a = [1, 0, 2] - [1, 0, 0] = [0, 0, 2].

Step 3:   The projection of v onto n is
proj_n(v) = ((n · v)/(n · n)) n = ((1·0 + 1·0 - 1·2)/(1^2 + 1^2 + (-1)^2)) [1, 1, -1] = (-2/3)[1, 1, -1] = [-2/3, -2/3, 2/3]

Step 4:   The distance d(B, 𝒫) from B to 𝒫 is
||proj_n(v)|| = |-2/3| ||[1, 1, -1]|| = (2/3)√3

In general, the distance d(B, 𝒫) from the point B = (x0, y0, z0) to the plane whose general equation is ax + by + cz = d is given by the formula
d(B, 𝒫) = |a·x0 + b·y0 + c·z0 - d| / √(a^2 + b^2 + c^2)        (4)
You will be asked to derive this formula in Exercise 40.
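Both distance formulas lend themselves to direct computation. The following is a minimal sketch in Python (an illustration, not part of the text) that mirrors the projection-based steps of Examples 1.25 and 1.26; the function names are assumptions made for this example.

import math

def dot(u, v):
    # Componentwise dot product of two vectors given as lists.
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def distance_point_to_line(b, a, d):
    # Distance from point b to the line through a with direction d:
    # ||v - proj_d(v)|| where v = b - a (as in Example 1.25).
    v = [bi - ai for bi, ai in zip(b, a)]
    c = dot(d, v) / dot(d, d)
    perp = [vi - c * di for vi, di in zip(v, d)]
    return norm(perp)

def distance_point_to_plane(b, a, n):
    # Distance from point b to the plane through a with normal n:
    # ||proj_n(v)|| where v = b - a (as in Example 1.26).
    v = [bi - ai for bi, ai in zip(b, a)]
    return abs(dot(n, v)) / norm(n)

# The data of Examples 1.25 and 1.26:
print(distance_point_to_line([1, 0, 2], [3, 1, 1], [-1, 1, 0]))   # sqrt(22)/2, about 2.345
print(distance_point_to_plane([1, 0, 2], [1, 0, 0], [1, 1, -1]))  # (2/3)*sqrt(3), about 1.155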
Exercises 1.3 III Exercises 9 muilO, write the eqllatioll of the pia/ie /)(lssing
1n Exercises J alld 2, write the equatlOl! of the line passing through P with normal vector n IfI (a) normal fOrtll and (b) gel/eml form. I. P = (O,O), n=
[:l
2.P = ( 1,2), n =
through P with direct ion vectors u alld v ill (a) vector form alld (b) parametric form.
[-:l
9. P = (0,0,0), u =
III Exercises 3-6, write the et/llation of the /inc passing through P with direction vecford in (a) vector fo rm and (b) parametric form. 1 3.P = (J , 0), d = [ - 3 ]
4. P = (- 4,4 ), d =
5.P = (O,0,0), d =
- I
6.P = (3,0, - 2),d =
° 2
5
4
/11 Exercises 7 and 8, write the equation of the plane passing throllgh P with /lormal vectorn in (a) normal form and (b) gel/eml foml. 7.P =(0,1,0), n =
3 2 I
8. P = ( - 3,5, 1), n =
I , v=
2
2
I
10. P = (6, - 4, - 3), u =
[I ]
I
-3
2
I - I
5
° ,v =
- I
I
I
I
I
/n Exercises II alld 12, give till' I'('(tor eqllation of the line passing throllgh P and Q. I J. P = (I, - 2),Q=(3,0)
12. P = (0, I,- I),Q = (- 2, 1,3) III Exercises / J and 14, give the vector eqlwtioll of the plalle pllSsing througll p, Q, and R.
13. P = (I, I, I),Q = (4,0, 2),R = (0, 1, -1) 14. P = ( 1, 0,0), Q = (0, 1,0), R = (0, 0, I)
15. Find parametric equations and an equation in vector for m for the lines in [R:2 with the followlllg equations: (a) y= 3x - 1 (b ) 3x + 2y = 5
16. Consider the vector equation x = p + t( q - p), where p and q co rrespond to distinct points Pand Q in 1R2 or IRJ.
(a) Show that this equation describes the li ne segm ent PQ as t va ries from 0 to 1. _ (b) For which value o f t is x the midpoin t of PQ, and what is x in this case? (el Fi nd the midpoint of PCl when P = (2, - 3) and Q -(O, I). (d) Fi nd the m idpoint of PCl when P = ( 1, 0, 1l and Q - (4, I, -2). (e) Find the two points tha t d ivide PQ in part (c) into three equal parts. (£) Fi nd the two poin ts that divide PQ in part (d ) in to three equal pa rts.
17. Suggest a "vector proof" of the fact that, in R^2, two lines with slopes m1 and m2 are perpendicular if and only if m1·m2 = -1.
18. The li ne C passes thro ugh the point P "" ( I , - I , I) and
25. A cube has vertices at th e eight points ( x, y, z), where each of x, y, and z is eit her 0 or 1. (See Figure 1.31.)
(a) Fi nd the general equations o f the planes tha t d etermi ne the six faces (sides) o f the cube. (b ) Find the general equation of the plane tha t contains the diagonal from the origin to ( I , I, I ) and is perpendicular to the xy- plane. (cl Find the general eq uation of the plane that contains the side diago nals referred to in Example 1.1 5. 26. Find the equation of th e set of all po ints that are equidistant from the points P = (1,0, - 2) and Q~
(5,2,4).
In Exercises 27 and 28, find the distance from the pomt Q to thelinef. 27. Q = (2, 2),f with equa tion [ ; ] =
[-~] +
2
has direction voc{Qr d ;::
3. For each of rhe
- I follow ing planes (jJ>, determ ine whether f and QI> are parallel, perpendicular, or neither: (a) 2x+ 3y - Z "" I (c) x - y - z = 3
(b) 4x - y + 5z = 0 (d ) 4x + 6y - 2z = 0
19. The plane @'] has the equation 4x - y + 5z = 2. For each of the planes q. in Exercise 18, de termine whether qp] and 'lJ' are parallel, perpendicular, or neither.
x 28. Q = (0, \ ,0), € with eq uatio n y
{-:J -2
1
\ +
,
I
1
0
3
In Exercises 29 and 30, find the distance from the point Q to the plane 𝒫.
29. Q = (2, 2, 2), 𝒫 with equation x + y - z = 0
30. Q = (0, 0, 0), 𝒫 with equation x - 2y + 2z = 1
20. Find the vector fo rm of the equation of the 11Ile in 1R2 that passes thro ugh P = (2, - I ) and is perpendicular to the line with general equation 2x - 3y = 1.
Figure 1.63 suggests a way to use vectors to locate the point R Otl f that is closest to Q.
2\. Find the vecto r fo rm of the eq uatio n of the line in [R:2 that passes th rough P = (2, - \ ) and is parallel to the line with general equat ion 2x - 3y = 1.
32. Find the point Ron f t hat is closest to Q in Exercise 28.
3 1. Find the poin t Ron
e that is doses t to Q in Exercise 27.
Q
22. Find the vector fo rm of the equation of the line in IR J that passes through P "" (- \,0, 3) and is perpendicular to the plane with general equation x - 3y + 2z = 5.
e
23. Fmd the vector fo rm of the equation o f the line in R J that passes through P = ( - 1,0, 3) and is parallel to the lme with parametric equations
,
x = I - t Y "" 2 + 3t z= - 2- I 24. Find th e nor m al for m of the equation of the plane that passes thro ugh P = (0, - 2,5) and is parallel to the plane with general equatio n 6x - y + 22 == 3.
p
o flgur. 1.63
~
r = p
+ PR
Figure 1.64 suggests a way to use vectors to locate the point R on 𝒫 that is closest to Q.
the angle between W> I and qp 2 to be either 8 or 180" - 0, whichever is an acu te angle. (Figure 1.65)
Q
n,
,
c
,
o
flgare 1.64 r = p + PQ + OR
- -
180 - 8
figure 1.65
33. Find the point Ron g> that is closest to Q in Exercise 29. 34. Find the po int Ron 'll' that is closest to Q in Exercise 30.
In Exercises 35 and 36, find the distance between the parallel lines.
In Exercises 43 and 44, find the acute angle between the planes with the given equations.
43. x + y + z = 0 and 2x + y - 2z = 0
44. 3x - y + 2z = 5 and x + 4y - z = 2
35.
= [I] + s[-'] [x] y I 3
III Exercises 45-46, show tlll/tihe pllllie and line with the given 1.'(llIatiol15 illlersecf, (lnd then find the aela/.' angle of intersectioll between them. 45. The plane given by x
36.
x Y
I
I
x
O+siandy
,
- \
I
z
o
I
\ +t \ 1
Z=
3
+
I
46. The plane given by 4x - Y -
In Exercises 37 011(/38, find the distance between the parallel planes. C 137. 2x + y - 1%= 0 and 2x + y - 2z =5
38.x+y + z =
J
,nd x + y +z= 3
39. Prove equation 3 o n page 40. 40. Prove equation 4 on page 4 J. 41. Prove that, in R ', the distance bet..."een parallel lines wit h equations n' x = c, and n· x = c1 is given by
given by x =
Exercises 47-48 explore Olle approach 10 the problem of fillding the projection of a ,'ector onlO (/ pial/e. As Figllre 1.66 shows, if@> is a plalle throllgll the origin ill RJ with normal
n
en
till
p= \
I nil If two nonparallel plalles f!J> I alld 0>2 lrave lIormaI vectors " l al1d 11, mui 8 is tile angle /Jetween " l anti " 2, then we define
6 and the line
t
42. Prove that the dis tance between parallel planes with equations n· x = til and n' x = ti, is given by -
z
y = I +2t. Z = 2 + 31
ICI - ~ I ~ nil
I(il
0 and the line
givcn byx = 2 +
I y = I - 2t.
I
+ Y + 2z =
figure 1.66 Projection onto a pl(lllc
en
vector n, ami v is a vector in Rl, then p = pro~{ v) is a vector ;11 r:I sllch that v - en = p for some scalar c.
onto the planes With the fo llowi ng equations: (a) x+ y+ z = O
(b) 3x - y+ z = O
47. Usi ng the fa ct that n is orlhogonal to every vector in ~ (and hence to p), solve for c ::and the reby fi nd an expressio n fo r p in terms of v and n.
(e) x - 2z = 0
(d ) 2x - 3y
48. Use the method of Exercise 43 to find the p rojection of v =
1 0
-2
+z= 0
The Cross Product
It would be convenient if we could easily convert the vector form x = p + su + tv of the equation of a plane to the normal form n · x = n · p. What we need is a process that, given two nonparallel vectors u and v, produces a third vector n that is orthogonal to both u and v. One approach is to use a construction known as the cross product of vectors; it is defined only in R^3, as follows:
Definition   The cross product of u = [u1, u2, u3] and v = [v1, v2, v3] is the vector u × v defined by
u × v = [u2·v3 - u3·v2,  u3·v1 - u1·v3,  u1·v2 - u2·v1]
A shortcut that can help you remember how to calculate the cross product of two vectors is illustrated below. Under each complete vector, write the first two components of that vector. Ignoring the two components on the top line, consider each block of four: Subtract the products of the components connected by dashed lines from the products of the components connected by solid lines. (It helps to notice that the first component of u × v has no 1s as subscripts, the second has no 2s, and the third has no 3s.)
u2·v3 - u3·v2,   u3·v1 - u1·v3,   u1·v2 - u2·v1
The following problems briefly explore the cross product.
1. Compute u × v for each of the pairs of vectors u and v given in parts (a)-(d).
[Figure 1.67: u × v is orthogonal to both u and v]
2. Show that e1 × e2 = e3, e2 × e3 = e1, and e3 × e1 = e2.
3. Using the definition of a cross product, prove that u × v (as shown in Figure 1.67) is orthogonal to u and v.
4. Use the cross product to help find the normal form of the equation of the plane.
(a) The plane passing through P = (1, 0, -2), parallel to the given vectors u and v
(b) The plane passing through P = (0, -1, 1), Q = (2, 0, 2), and R = (1, 2, -1)
5. Prove the following properties of the cross product:
(a) v × u = -(u × v)    (b) u × 0 = 0    (c) u × u = 0
(d) u × kv = k(u × v)    (e) u × ku = 0    (f) u × (v + w) = u × v + u × w
6. Prove the following properties of the cross product:
(a) u · (v × w) = (u × v) · w    (b) u × (v × w) = (u · w)v - (u · v)w
(c) ||u × v||^2 = ||u||^2 ||v||^2 - (u · v)^2
7. Redo Problems 2 and 3, this time making use of Problems 5 and 6.
8. Let u and v be vectors in R^3 and let θ be the angle between u and v.
(a) Prove that ||u × v|| = ||u|| ||v|| sin θ. [Hint: Use Problem 6(c).]
(b) Prove that the area A of the triangle determined by u and v (as shown in Figure 1.68) is given by A = (1/2)||u × v||.
[Figure 1.68]
(c) Use the result in part (b) to compute the area of the triangle with vertices A = (1, 2, 1), B = (2, 1, 0), and C = (5, -1, 3).
Code Vectors and Modular Arithmetic
The modern theory of codes originated with the work of the American mathematician and computer scientist Claude Shannon (1916-2001), whose 1937 thesis showed how algebra could play a role in the design and analysis of electrical circuits. Shannon would later be instrumental in the formation of the field of information theory and give the theoretical basis for what are now called error-correcting codes.
Throughout history, people have transmitted information using codes. Sometimes the intent is to disguise the message being sent, such as when each letter in a word is replaced by a different letter according to a substitution rule. Although fascinating, these secret codes, or ciphers, are not of concern here; they are the focus of the field of cryptography. Rather, we will concentrate on codes that are used when data must be transmitted electronically. A familiar example of such a code is Morse code, with its system of dots and dashes. The advent of digital computers in the 20th century led to the need to transmit massive amounts of data quickly and accurately. Computers are designed to encode data as sequences of 0s and 1s. Many recent technological advancements depend on codes, and we encounter them every day without being aware of them: satellite communications, compact disc players, the universal product codes (UPC) associated with the bar codes found on merchandise, and the international standard book numbers (ISBN) found on every book published today are but a few examples. In this section, we will use vectors to design codes for detecting errors that may occur in the transmission of data. In later chapters, we will construct codes that can not only detect but also correct errors. The vectors that arise in the study of codes are not the familiar vectors of R^n but vectors with only a finite number of choices for the components. These vectors depend on a different type of arithmetic, modular arithmetic, which will be introduced in this section and used throughout the book.
Binary Codes
Since computers represent data in terms of 0s and 1s (which can be interpreted as off/on, closed/open, false/true, or no/yes), we begin by considering binary codes, which consist of vectors each of whose components is either a 0 or a 1. In this setting, the usual rules of arithmetic must be modified, since the result of each calculation involving scalars must be a 0 or a 1. The modified rules for addition and multiplication are given below.
+ | 0 1          · | 0 1
0 | 0 1          0 | 0 0
1 | 1 0          1 | 0 1
The only curiosity here is the rule that 1 + 1 = 0. This is not as strange as it appears; if we replace 0 with the word "even" and 1 with the word "odd," these tables simply summarize the familiar parity rules for the addition and multiplication of even and odd integers. For example, 1 + 1 = 0 expresses the fact that the sum of two odd integers is an even integer. With these rules, our set of scalars {0, 1} is denoted by Z2 and is called the set of integers modulo 2.
":t
In Z2' 1 + 1 + 0 + 1 = 1 and 1 + 1 + I + I = O. (Thesecakulalions ill ustrate the panty ,"I" Th, sum o f lh,oo odds ,"d , n eve" " odd; lh, sum of fout odds is
'S''"'
We are using the term length differently from the way we used it in R^n. This should not be confusing, since there is no geometric notion of length for binary vectors.
With Z2 as our set of scalars, we now extend the above rules to vectors. The set of all n-tuples of 0s and 1s (with all arithmetic performed modulo 2) is denoted by Z2^n. The vectors in Z2^n are called binary vectors of length n.
Example 1.28
The vectors in Z2^2 are [0, 0], [0, 1], [1, 0], and [1, 1]. (How many vectors does Z2^n contain, in general?)

Example 1.29
Let u = [1, 1, 0, 1, 0] and v = [0, 1, 1, 1, 0] be two binary vectors of length 5. Find u · v.
Solution   The calculation of u · v takes place in Z2, so we have
u · v = 1·0 + 1·1 + 0·1 + 1·1 + 0·0 = 0 + 1 + 0 + 1 + 0 = 0
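For readers who want to experiment, here is a minimal sketch in Python of the mod 2 dot product just computed (an illustration, not part of the text; the function name is an assumption).

def dot_mod2(u, v):
    # Dot product of two binary vectors, with all arithmetic in Z_2.
    return sum(a * b for a, b in zip(u, v)) % 2

u = [1, 1, 0, 1, 0]
v = [0, 1, 1, 1, 0]
print(dot_mod2(u, v))  # 0, as in Example 1.29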
In practice, we have a message (consisting of words, numbers, or symbols) that we wish to transmit. We begin by encoding each "word" of the message as a binary vector.
Definition   A binary code is a set of binary vectors (of the same length) called code vectors. The process of converting a message into code vectors is called encoding, and the reverse process is called decoding.

As we will see, it is highly desirable that a code have other properties as well, such as the ability to spot when an error has occurred in the transmission of a code vector and, if possible, to suggest how to correct the error.
Error-Detecting Codes
Suppose that we have already encoded a message as a set of binary code vectors. We now want to send the binary code vectors across a channel (such as a radio transmitter, a telephone line, a fiber optic cable, or a CD laser). Unfortunately, the channel may be "noisy" (because of electrical interference, competing signals, or dirt and scratches). As a result, errors may be introduced: Some of the 0s may be changed to 1s, and vice versa. How can we guard against this problem?
Example 1.30
We wish to encode and transmit a message consisting of one of the words up, down, left, or right. We decide to use the four vectors in Z2^2 as our binary code, as shown in Table 1.4. If the receiver has this table too and the encoded message is transmitted without error, decoding is trivial. However, let's suppose that a single error occurred. (By an error, we mean that one component of the code vector changed.) For example, suppose we sent the message "down" encoded as [0, 1] but an error occurred in the transmission of the first component and the 0 changed to a 1. The receiver would then see

Table 1.4
Message:   up      down    left    right
Code:      [0, 0]  [0, 1]  [1, 0]  [1, 1]
[1, 1] instead and decode the message as "right." (We will only concern ourselves with the case of single errors such as this one. In practice, it is usually assumed that the probability of multiple errors is negligibly small.) Even if the receiver knew (somehow) that a single error had occurred, he or she would not know whether the correct code vector was [0, 1] or [1, 0]. But suppose we sent the message using a code that was a subset of Z2^3, in other words, a binary code of length 3, as shown in Table 1.5.

Table 1.5
Message:   up         down       left       right
Code:      [0, 0, 0]  [0, 1, 1]  [1, 0, 1]  [1, 1, 0]

This code can detect any single error. For example, if "down" was sent as [0, 1, 1] and an error occurred in one component, the receiver would read either [1, 1, 1] or [0, 0, 1] or [0, 1, 0], none of which is a code vector. So the receiver would know that an error had occurred (but not where) and could ask that the encoded message be retransmitted. (Why wouldn't the receiver know where the error was?)
The term parity comes from the Latin word par, meaning "equal" or "even." Two integers are said to have the same parity if they are both even or both odd.
The code in Table 1.5 is an example of an error-detecting code. Until the 1940s, this was the best that could be achieved. The advent of digital computers led to the development of codes that could correct as well as detect errors. We will consider these in Chapters 3, 6, and 7. The message to be transmitted may itself consist of binary vectors. In this case, a simple but useful error-detecting code is a parity check code, which is created by appending an extra component, called a check digit, to each vector so that the parity (the total number of 1s) is even.
Example 1.31
If the message to be sent is the binary vector [1, 0, 0, 1, 0, 1], which has an odd number of 1s, then the check digit will be 1 (in order to make the total number of 1s in the code vector even) and the code vector will be [1, 0, 0, 1, 0, 1, 1]. Note that a single error will be detected, since it will cause the parity of the code vector to change from even to odd. For example, if an error occurred in the third component, the code vector would be received as [1, 0, 1, 1, 0, 1, 1], whose parity is odd because it has five 1s.
Let's look at this concept a bit more formally. Suppose the message is the binary vector b = [b1, b2, ..., bn] in Z2^n. Then the parity check code vector is v = [b1, b2, ..., bn, d] in Z2^(n+1), where the check digit d is chosen so that
b1 + b2 + ... + bn + d = 0 in Z2
or, equivalently, so that
1 · v = 0
where 1 = [1, 1, ..., 1], a vector whose every component is 1. The vector 1 is called a check vector. If vector v' is received and 1 · v' = 1, then we can be certain that
an error has occurred. (Although we are not considering the possibility of more than one error, observe that this scheme will not detect an even number of errors.) Parity check codes are a special case of the more general check digit codes, which we will consider after first extending the foregoing ideas to more general settings.
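A minimal sketch of the parity check idea in Python (an illustration, not part of the text; the function names are assumptions): append a check digit so the total number of 1s is even, and flag a received vector whose parity is odd.

def encode_parity(b):
    # Append a check digit d so that the sum of all components is 0 in Z_2.
    d = sum(b) % 2
    return b + [d]

def single_error_detected(v):
    # 1 . v = 1 in Z_2 means the parity is odd, so an error must have occurred.
    return sum(v) % 2 == 1

v = encode_parity([1, 0, 0, 1, 0, 1])    # [1, 0, 0, 1, 0, 1, 1], as in Example 1.31
received = v.copy()
received[2] ^= 1                         # flip the third component
print(single_error_detected(v))          # False
print(single_error_detected(received))   # True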
Modular Arithmetic
It is possible to generalize what we have just done for binary vectors to vectors whose components are taken from a finite set {0, 1, 2, ..., k} for k ≥ 2. To do so, we must first extend the idea of binary arithmetic.
Example 1.32
The integers modulo 3 consist of the set Z3 = {0, 1, 2} with addition and multiplication as given below:

+ | 0 1 2          · | 0 1 2
0 | 0 1 2          0 | 0 0 0
1 | 1 2 0          1 | 0 1 2
2 | 2 0 1          2 | 0 2 1

Observe that the result of each addition and multiplication belongs to the set {0, 1, 2}; we say that Z3 is closed with respect to the operations of addition and multiplication. It is perhaps easiest to think of this set in terms of a three-hour clock with 0, 1, and 2 on its face, as shown in Figure 1.69. The calculation 1 + 2 = 0 translates as follows: 2 hours after 1 o'clock, it is 0 o'clock. Just as 24:00 and 12:00 are the same on a 12-hour clock, so 3 and 0 are equivalent on this 3-hour clock. Likewise, all multiples of 3, positive and negative, are equivalent to 0 here; 1 is equivalent to any number that is 1 more than a multiple of 3 (such as -2, 4, and 7); and 2 is equivalent to any number that is 2 more than a multiple of 3 (such as -1, 5, and 8). We can visualize the number line as wrapping around a circle, as shown in Figure 1.70.
[Figure 1.69: Arithmetic modulo 3]
[Figure 1.70: The number line wraps around the 3-hour clock: ..., -3, 0, 3, ... at 0; ..., -2, 1, 4, ... at 1; ..., -1, 2, 5, ... at 2]
Example 1.33
To what is 3548 equivalent in Z3?

Solution   This is the same as asking where 3548 lies on our 3-hour clock. The key is to calculate how far this number is from the nearest (smaller) multiple of 3; that is,
we need to know the remainder when 3548 is divided by 3. By long division, we find that 3548 = 3·1182 + 2, so the remainder is 2. Therefore, 3548 is equivalent to 2 in Z3.

In courses in abstract algebra and number theory, which explore this concept in greater detail, the above equivalence is often written as 3548 ≡ 2 (mod 3) or 3548 ≡ 2 mod 3, where ≡ is read "is congruent to." We will not use this notation or terminology here.
Example 1.34
In Z3, calculate 2 + 2 + 1 + 2.

Solution 1   We use the same ideas as in Example 1.33. The ordinary sum is 2 + 2 + 1 + 2 = 7, which is 1 more than 6, so division by 3 leaves a remainder of 1. Thus, 2 + 2 + 1 + 2 = 1 in Z3.

Solution 2   A better way to perform this calculation is to do it step by step entirely in Z3.
2 + 2 + 1 + 2 = (2 + 2) + 1 + 2
              = 1 + 1 + 2
              = (1 + 1) + 2
              = 2 + 2
              = 1
Here we have used parentheses to group the terms we have chosen to combine. We could speed things up by simultaneously combining the first two and the last two terms:
(2 + 2) + (1 + 2) = 1 + 0 = 1
Repeated multiplication can be handled similarly. The idea is to use the addition and multiplication tables to reduce the result of each calculation to 0, 1, or 2. Extending these ideas to vectors is straightforward.
Example 1.35
In Z3, let u = [2, 2, 0, 1, 2] and v = [1, 2, 2, 2, 1]. Then
u · v = 2·1 + 2·2 + 0·2 + 1·2 + 2·1 = 2 + 1 + 0 + 2 + 2 = 1
Vectors in Z3^5 are referred to as ternary vectors of length 5.

[Figure 1.71: Arithmetic modulo m]

In general, we have the set Zm = {0, 1, 2, ..., m - 1} of integers modulo m (corresponding to an m-hour clock, as shown in Figure 1.71) and m-ary vectors of length n, denoted by Zm^n. Codes using m-ary vectors are called m-ary codes. The next example is a direct extension of Example 1.31 to ternary codes.
Example 1.36
Let b = [b1, b2, ..., bn] be a vector in Z3^n. Then a check digit code vector may be defined by v = [b1, b2, ..., bn, d] (in Z3^(n+1)), with the check digit d chosen so that
1 · v = 0
(where the check vector 1 = [1, 1, ..., 1] is the vector of 1s in Z3^(n+1)); that is, the check digit satisfies
b1 + b2 + ... + bn + d = 0 in Z3
For example, consider the vector u = [2, 2, 0, 1, 2] from Example 1.35. The sum of its components is 2 + 2 + 0 + 1 + 2 = 1, so the check digit must be 2 (since 1 + 2 = 0). Therefore, the associated code vector is v = [2, 2, 0, 1, 2, 2].
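The same recipe works over any Zm. Here is a minimal sketch in Python (an illustration, not part of the text; the function name is an assumption) that computes the check digit for a vector of digits modulo m, using the check vector of all 1s.

def check_digit(b, m):
    # Choose d so that b1 + b2 + ... + bn + d = 0 in Z_m.
    return (-sum(b)) % m

u = [2, 2, 0, 1, 2]
d = check_digit(u, 3)     # 2, as in Example 1.36
print(u + [d])            # [2, 2, 0, 1, 2, 2]
print(sum(u + [d]) % 3)   # 0, so the code vector passes the check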
While simple check digit codes will detect single errors, it is often important to catch other common types of errors as well, such as the accidental interchange, or transposition, of two adjacent components. (For example, transposing the second and third components of v in Example 1.36 would result in the incorrect vector v' = [2, 0, 2, 1, 2, 2].) For such purposes, other types of check digit codes have been designed. Many of these simply replace the check vector 1 by some other carefully chosen vector c.
Example 1.37
The Universal Product Code, or UPC (Figure 1.72), is a code associated with the bar codes found on many types of merchandise. The black and white bars that are scanned by a laser at a store's checkout counter correspond to a 10-ary vector u = [u1, u2, ..., u11, d] of length 12. The first 11 components form a vector in Z10^11 that gives manufacturer and product information; the last component d is a check digit chosen so that c · u = 0 in Z10, where the check vector c is the vector [3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1]. That is, after rearranging,
3(u1 + u3 + u5 + u7 + u9 + u11) + (u2 + u4 + u6 + u8 + u10) + d = 0
where d is the check digit. In other words, the check digit is chosen so that the left-hand side of this expression is a multiple of 10. For the UPC shown in Figure 1.72, we can determine that the check digit is 6, performing all calculations in Z10:

[Figure 1.72: A Universal Product Code]

c · u = 3·0 + 7 + 3·4 + 9 + 3·2 + 7 + 3·0 + 2 + 3·0 + 9 + 3·4 + d
      = 3(0 + 4 + 2 + 0 + 0 + 4) + (7 + 9 + 7 + 2 + 9) + d
      = 3(0) + 4 + d
      = 4 + d
The check digit d must be 6 to make the result of the calculation 0 in Z10. (Another way to think of the check digit in this example is that it is chosen so that c · u will be a multiple of 10.)
The Universal Product Code will detect all single errors and most transposition errors in adjacent components. To see this last point, suppose that the UPC in
Example 1.37 were incorrectly written as u' = [0, 7, 4, 2, 9, 7, 0, 2, 0, 9, 4, 6], with the fourth and fifth components transposed. When we applied the check vector, we would have c · u' = 4 ≠ 0 (verify this!), alerting us to the fact that there had been an error. (See Exercises 48 and 49.)
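A minimal sketch of the UPC check in Python (an illustration, not part of the text; the function name is an assumption), using the check vector c = [3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1]:

def upc_valid(u):
    # A 12-component UPC u is valid when c . u = 0 in Z_10.
    c = [3, 1] * 6
    return sum(ci * ui for ci, ui in zip(c, u)) % 10 == 0

good = [0, 7, 4, 9, 2, 7, 0, 2, 0, 9, 4, 6]   # the UPC of Example 1.37
bad  = [0, 7, 4, 2, 9, 7, 0, 2, 0, 9, 4, 6]   # fourth and fifth components transposed
print(upc_valid(good))  # True
print(upc_valid(bad))   # False, so the transposition is detected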
The International Standard Book Number (ISBN) code is another widely used check digit code. It is designed to detect more types of errors than the Universal Product Code and, consequently, is slightly more complicated. Yet the basic principle is the same. The code vector is a vector in Z11^10. The first nine components give country, publisher, and book information; the tenth component is the check digit. The ISBN for the book Calculus: Concepts and Contexts by James Stewart is 0-534-34450-X. It is recorded as the vector
b = [0, 5, 3, 4, 3, 4, 4, 5, 0, X]
where the check "digit" is the letter X. For the ISBN code, the check vector is the vector c = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1], and we require that c · b = 0 in Z11. Let's determine the check digit for the vector b in this example. We must compute
c · b = 10·0 + 9·5 + 8·3 + 7·4 + 6·3 + 5·4 + 4·4 + 3·5 + 2·0 + d
where d is the check digit. We begin by performing all of the multiplications in Z11. (For example, 9·5 = 1, since 45 is 1 more than the closest smaller multiple of 11, namely 44. On an 11-hour clock, 45 o'clock is 1 o'clock.) The simplified sum is
0 + 1 + 2 + 6 + 7 + 9 + 5 + 4 + 0 + d
and adding in Z11 leaves us with 1 + d. The check digit d must now be chosen so that the final result is 0; therefore, in Z11, d = 10. (Equivalently, d must be chosen so that c · b will be a multiple of 11.) But since it is preferable that each component of an ISBN be a single digit, the Roman numeral X is used for 10 whenever it occurs as a check digit, as it does here.
The ISBN code will detect all single errors and adjacent transposition errors (see Exercises 52-54).
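A minimal sketch of the ISBN-10 check digit in Python (an illustration, not part of the text; the function name is an assumption), using the check vector c = [10, 9, ..., 2, 1]:

def isbn_check_digit(first_nine):
    # Choose d so that 10*b1 + 9*b2 + ... + 2*b9 + d = 0 in Z_11.
    total = sum(w * b for w, b in zip(range(10, 1, -1), first_nine))
    d = (-total) % 11
    return "X" if d == 10 else str(d)

print(isbn_check_digit([0, 5, 3, 4, 3, 4, 4, 5, 0]))  # 'X', as for 0-534-34450-X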
//1 Exercises 1- 4, u t/lld v are bill~ry vectors. Find u u . v III cueh casc. I. u
= [~]" = [:]
2.u =
3. u = (1, 0, I, I]" = ( I, I, I, J)
I
I
l,v =
i
°
+ v (/lid
6. Wrile oullhe addition and multiplicatio n tables for ls'
I" Exercises 7- 19, perform tile i"dicated c~fcu lat iotlS. 7.2 +2+2 in Zl 9. 2(2+ 1 +2) in l}
I
11. 2 · 3 ·2in ~
8. 2 - 2 - 2in Z j I O.3 + 1+ 2+3in ~
12. 3{3 + 3 + 2 )i n ~
4. u - ( I, 1, 0, 1,0]" = (0, I, I, 1, 0]
13. 2+ I +2+2+ lin lJ,~, and Zs
5. Wri te out the addition a nd multiplication tables for Z t.
14. (3 + 4)(3 + 2 + 4 + 2) m Zs
+ 4 + 3) i n ~
16. 21oo in Z II
17. [2, 1,2] +[2, O , I ] inZ~
In Exercises 45 and 46, fi nd the check digit d III the given Universal Product Code.
18. [2, 1, 2J · [2, 2,1] inZ;
45. [0, 5,9,4,6,4, 7,O,O,2 ,7, d]
15.8(6
19. [2, 0,3, 2J · ([3, I, I, 2J + [3,3, 2, I]) in
z: ,nd in Zl
46. [0,1,4,0, 1,4, 1,8, 4,1 ,2, d] 47. Considerthe
III Exercises 20-31, solve tile given eqllariolJ or iluiicale that there is no sollltioll.
+ 5 = 1 in Z6
20,x+ 3 = 2inZs
21. x
22.2x= 1tn Zl
23. 2x = 1 in Z.
24. 2x = I tn Zs
25_ 3x
26. 3x= 4m ~
27. 6x = 5 in Z,
28, 8x= 9in Z il
29. 2x + 3 = 2in Zs
+5
31. 6x + 3 = 1 in Zs
30. 4x
= 2
in ~
= 4
urc [0,4,6,9,5,6,1,8, 2, 0, 1,5 ].
(a) Show that this UPC can not be correct.
(b ) Assuming that a sin gle error was made and that the incorrect digi t is the 6 in the third entry, fi nd the correct U PC. 48. Prove that the Universal ProducCCode will detect all single errors.
in Zs
°
32_ (a) For which va lues of a does x + t l = have a solution in Zs? (b) For which values of a and b does x + a = b have a solution III Z6? (c) For which values of a, b, and m does x + a = b have a solution in Z",,?
49. (a) Prove that if a transposition error is made in the second and third en tries o f the UPC [0, 7, 4, 9, 2, 7, 0, 2, 0, 9, 4, 6 ], the error will be detected. (b ) Show that there is a tra nsposition involving two adjacent en tries o f the UPC in part (a) that would no t be detected. (c) In general, when will the Un iversal Product Code not detect a transposition error involving two
adjacent entries?
33. (a) For which values o f a d oes ax = 1 have a solulion 1Il
Zs?
(b) For which values o f a does (IX = 1 have a solution .III " ""6' (c) Fo r \~hich values of a and In does ax = I have a solution in Z",?
In Exercises 50 mId 51, find the check dig it d ill the givell International Standard Book Number. 50. [0,3,8,7,9,7,9,9,3,dl
51. [0,3 ,9, 4, 7,5,6 ,8,2,d] 52. Consider the ISBN [0,4,4,9, 5,0,8,3, 5,6].
In ExerCISes 34 (IIu/ 35, finritlJ e parity check code vector for
the binary I'ecforu. 34.u = [1,0, 1, 1]
35. u =
II, 1,0, 1, I ]
In Exercises 36-39, a parity check code vector v IS glvell. Determine wiJetiJer a single error could have occurred ill the fr(m sllllssion of v. 36. v = \ 1,0, 1, 0]
37.v = ]I,I,I,0,1,1]
38. v = [0,1,0, I, I, II
39. v = [ I, 1,0,1,0,1 , 1, 1]
Exercises 40-43 refer to check digit codes ill which tlJe check vector c is tIle vector I = [ I , I , . . . , I ] of the approl"ime length. /n each case, find till! dleck digit d that would be appendetl to the wctor u. 40. u = (1,2,2, 21 inZ;
4 1. u = 13,4,2,3] in Z!
42.u = [1,5,6,4,5]i n .z~
43. u = 13,0,7, 5,6,8J in Z;
44. Prove that for any posi tive integers m and fI, the check d igit code 1I1 .z~ with check vector c = 1 = [ I , 1, . .. , I J will detect all single errors. (That is, prove that if vecrors u and v in Z::' differ in exactly o ne entry, then c-u ,·v. )
*"
(a) Show that this ISBN cannot be correct. (b) Assuming that a single error was made and that the incorrect digit is the 5 in the fi ft h entry, find the correct ISBN. 53. (a) Prove that if a transposition error is made in the fourt h and fifth entries of the ISBN [0,6, 7, 9, 7, 6, 2, 9, 0, 61, the error will be detected. (b) Prove that if a transposition error is made in an y two adjacent entries of the ISBN in part (a), the error will be de tected. (c) Prove, in general, that the ISBN code will always detect a transposition error involvlllg two adjacent ent ries. 54. Consider the ISBN [0,8,3,7,0,9,9, 0,2,6]. (a) Show that this ISBN can not be correct. (b) Assum ing that the error was a transposition error involving two adjacent entries, fin d the correct ISBN. (c) Give an example of an ISBN fo r whICh a tr ansposition erro r involving two adjacent entries will be detected but will not be correctable.
The Codabar System

Every credit card number carries a check digit, computed by a scheme known as the Codabar system. Suppose that the first 15 digits of your card are
5412 3456 7890 432
and that the check digit is d. This corresponds to the vector
x = [5, 4, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 4, 3, 2, d] in Z10^16
The Codabar system uses the check vector c = [2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1], but instead of requiring that c · x = 0 in Z10, an extra calculation is added to increase the error-detecting capability of the code. Let h count the number of digits in odd positions that are greater than 4. In this example, these digits are 5, 5, 7, and 9, so h = 4. It is now required that c · x + h = 0 in Z10. Thus, in the example, we have, rearranging and working modulo 10,
c · x + h = (2·5 + 4 + 2·1 + 2 + 2·3 + 4 + 2·5 + 6 + 2·7 + 8 + 2·9 + 0 + 2·4 + 3 + 2·2 + d) + 4
          = 2(5 + 1 + 3 + 5 + 7 + 9 + 4 + 2) + (4 + 2 + 4 + 6 + 8 + 0 + 3 + d) + 4
          = 2(6) + 7 + d + 4
          = 3 + d
Thus, the check digit d for this card must be 7, so that the result of the calculation is 0 in Z10.
The Codabar system is one of the most efficient error-detecting methods. It will detect all single-digit errors and most other common errors such as adjacent transposition errors.
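A minimal sketch of the Codabar check in Python (an illustration, not part of the text; the function name is an assumption), following exactly the rule described in the vignette:

def codabar_valid(x):
    # x is a 16-digit card number as a list of digits.
    # Rule from the vignette: c . x + h = 0 in Z_10, where c = [2, 1, 2, 1, ...]
    # and h counts the digits in odd positions (1st, 3rd, ...) greater than 4.
    c = [2, 1] * 8
    h = sum(1 for d in x[0::2] if d > 4)
    return (sum(ci * di for ci, di in zip(c, x)) + h) % 10 == 0

card = [5, 4, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 4, 3, 2, 7]
print(codabar_valid(card))  # True: 7 is the check digit found in the vignette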
"
,, ,
..
Chapler 1 Vectors
,I
Key Definitions and Concepts
algebraic properties of vectors, 10; angle between vectors, 21; Cauchy-Schwarz Inequality, 19; check digit code, 50; code vector, 48; cross product, 45; direction vector, 32; distance between vectors, 20; dot product, 15; equation of a line, 33; equation of a plane, 35-36; error-detecting code, 49; head-to-tail rule, 6; integers modulo m (Zm), 51; International Standard Book Number, 53; length (norm) of a vector, 17; linear combination of vectors, 12; normal vector, 31, 35; orthogonal vectors, 23; parallel vectors, 8; parallelogram rule, 6; projection of a vector onto a vector, 24; Pythagoras' Theorem, 23; scalar multiplication, 7; standard unit vectors, 19; Triangle Inequality, 19; unit vector, 18; Universal Product Code, 52; vector, 3; vector addition, 5; zero vector, 4
Review Questions
1. Mark each of the following statements true or false:
(a) For vectors u, v, and w in R^n, if u + w = v + w, then u = v.
(b) For vectors u, v, and w in R^n, if u · w = v · w, then u = v.
(c) For vectors u, v, and w in R^3, if u is orthogonal to v, and v is orthogonal to w, then u is orthogonal to w.
(d) In R^3, if a line ℓ is parallel to a plane 𝒫, then a direction vector d for ℓ is parallel to a normal vector n for 𝒫.
(e) In R^3, if a line ℓ is perpendicular to a plane 𝒫, then a direction vector d for ℓ is parallel to a normal vector n for 𝒫.
(f) In R^3, if two planes are not parallel, then they must intersect in a line.
(g) In R^3, if two lines are not parallel, then they must intersect in a point.
(h) If v is a binary vector such that v · v = 0, then v = 0.
(i) If at most one error has occurred, then the UPC [0, 4, 1, 7, 7, 1, 5, 2, 7, 0, 8, 2] is correct.
(j) If at most one error has occurred, then the ISBN [0, 5, 3, 2, 3, 4, 1, 7, 4, 8] is correct.
2. If u = [
e
-~]. v = [~].and the vector4 u + v
IS drawn
with its tail at the POlll t (10, - 10), fi nd the coordi nates of the POlllt at the head of 4u + v. 3.lf u "" fo r x.
[-~]. v = [~].and 2x + u =3( x -
_.
4. Let A, 8, C, and D be the vert ices o f a sq uare centered al the origin 0, labeled in clockwise orde r. If a ::: OA and b = OB, find Be in te rms of a and b. ~
~
5. Find the angle between the vectors (- I, 1,2 J and [2. I. -I[. I
6. Find the projection of v =
1
1 onlo u :::
1
- 2
2
7. Find a un it vector in Ihe xy-plane that is orthogonal 1
to
2 .
3 8. Find the general equation of the plane through the point ( I , I, I) that IS perpendicular to the line with parametric equations
x
=
2-
Y= 3 z= - I
t
+ 2t + t
9. Find the general equalio n o f the plane th rough the point (3,2,5) that IS parallel to the plane whose genem l equatIOn is 2x + 3Y - z = O. 10. Find the general equation of the plane through the points A{I, 1,0), B( l , 0, I ), and C(O, 1, 2). 11. Find the area of the triangle with vertices A( I, 1,0),
8( 1,0, I), and C(O, 1,2).
v),solvc
12. Find the midpoint of th e line segment between A = (5, 1, - 2) and B = (3, -7,0).
Chapter Review
13. Why are there no vectors u and v in R^n such that ||u|| = 2, ||v|| = 3, and u · v = -7?
17, If possible, solve 3(x + 2) = 5 in 71. 7 '
14. Find the distance from the point (3. 2, 5) to the plane whose general equation is 2x + 3y - z = O.
19. Find the check digit din the O,3,1,7,d).
15. FlIld the distance from the point (3.2 ,5) to the line with parametric equmio ns x = I, Y = I + t, Z = 2 +
20. Find the check digit d in the ISBN [0,7,6,7,0,3,9,
16. Compute 3 - (2
+ 4)'(4 + 3)2 in Z s.
18. If possible, solve 3(x
t.
4,',dl,
+ 2)
= 5 in 7L,-
upe [7,3,3. 9,6, I, 7,
5J
Chapter 2
Systems of Linear Equations

"The world was full of equations. ... There must be ..."
The Accidental Tourist, Alfred A. Knopf, 1985, p. 235
2.0 Introduction: Triviality
The word trivial is derived from the Latin root tri- ("three") and the Latin word via ("road"). Thus, speaking literally, a triviality is a place where three roads meet. This common meeting point gives rise to the other, more familiar meaning of trivial: commonplace, ordinary, or insignificant. In medieval universities, the trivium consisted of the three "common" subjects (grammar, rhetoric, and logic) that were taught before the quadrivium (arithmetic, geometry, music, and astronomy). The "three roads" that made up the trivium were the beginning of the liberal arts.
In this section, we begin to examine systems of linear equations. The same system of equations can be viewed in three different, yet equally important, ways; these will be our three roads, all leading to the same solution. You will need to get used to this threefold way of viewing systems of linear equations, so that it becomes commonplace (trivial!) for you. The system of equations we are going to consider is
2x + y = 8
x - 3y = -3
Problem 1   Draw the two lines represented by these equations. What is their point of intersection?
Problem 2   Consider the vectors u = [2, 1] and v = [1, -3]. Draw the coordinate grid determined by u and v. (Hint: Lightly draw the standard coordinate grid first and use it as an aid in drawing the new one.)
Problem 3   On the u-v grid, find the coordinates of w = [8, -3].
Problem 4   Another way to state Problem 3 is to ask for the coefficients x and y for which xu + yv = w. Write out the two equations to which this vector equation is equivalent (one for each component). What do you observe?
Problem 5   Return now to the lines you drew for Problem 1. We will refer to the line whose equation is 2x + y = 8 as line 1 and the line whose equation is x - 3y = -3 as line 2. Plot the point (0, 0) on your graph from Problem 1 and label it P0. Draw a
horizontal line segment from P0 to line 1 and label this new point P1. Next draw a vertical line segment from P1 to line 2 and label this point P2. Now draw a horizontal line segment from P2 to line 1, obtaining point P3. Continue in this fashion, drawing vertical segments to line 2 followed by horizontal segments to line 1. What appears to be happening?
Problem 6   Using a calculator with two-decimal-place accuracy, find the (approximate) coordinates of the points P1, P2, P3, ..., P6. (You will find it helpful to first solve the first equation for x in terms of y and the second equation for y in terms of x.) Record your results in Table 2.1, writing the x- and y-coordinates of each point separately.

Table 2.1
Point | x | y
P0    | 0 | 0
P1    |   |
P2    |   |
P3    |   |
P4    |   |
P5    |   |
P6    |   |

The results of these problems show that the task of "solving" a system of linear equations may be viewed in several ways. Repeat the process described in the problems with the following systems of equations:
(a) 4x - 2y = 0      (b) 3x + 2y = 9      (c) x + y = 5      (d) x + 2y = 4
    x + 2y = 5           x + 3y = 10          x - y = 3           2x - y = 3
Are all of your observations from Problems 1-6 still valid for these examples? Note any similarities or differences. In this chapter, we will explore these ideas in more detail.
2.1 Introduction to Systems of Linear Equations
Recall that the general equation of a line in R^2 is of the form
ax + by = c
and that the general equation of a plane in R^3 is of the form
ax + by + cz = d
Equations of this form are called linear equations.

Definition   A linear equation in the n variables x1, x2, ..., xn is an equation that can be written in the form
a1x1 + a2x2 + ... + anxn = b
where the coefficients a1, a2, ..., an and the constant term b are constants.
Example 2.1
The foll owing equations are linear:
3x - 4y =- 1
V2X+7Y -
r -}s-Y t = 9
( sin ~) z =
1
3.2x I
-
O.Ol xJ "" 4.6
Observe that the third equation is linear because it can be rewritten in the form x1 + 5x2 + x3 - 2x4 = 3. It is also important to note that, although in these examples (and in most applications) the coefficients and constant terms are real numbers, in some examples and applications they will be complex numbers or members of Zp for some prime number p.
The following equations are not linear:
x/y + z = 2        xy + 2z = 1
Thus, linear equations do not contain products, reciprocals, or other functions of the variables; the variables occur only to the first power and are multiplied only by constants. Pay particular attention to the fourth example in each list: Why is it that the fourth equation in the first list is linear but the fourth equation in the second list is not?
A solution of a linear equation a1x1 + a2x2 + ... + anxn = b is a vector [s1, s2, ..., sn] whose components satisfy the equation when we substitute x1 = s1, x2 = s2, ..., xn = sn.

Example 2.2
(a) [5, 4] is a solution of 3x - 4y = -1 because, when we substitute x = 5 and y = 4, the equation is satisfied: 3(5) - 4(4) = -1. [1, 1] is another solution. In general, the solutions simply correspond to the points on the line determined by the given equation. Thus, setting x = t and solving for y, we see that the complete set of solutions can be written in the parametric form [t, 1/4 + (3/4)t]. (We could also set y equal to some parameter, say s, and solve for x instead; the two parametric solutions would look different but would be equivalent. Try this.)
(b) The linear equation x1 - x2 + 2x3 = 3 has [3, 0, 0], [0, 1, 2], and [6, 1, -1] as specific solutions. The complete set of solutions corresponds to the set of points in the plane determined by the given equation. If we set x2 = s and x3 = t, then a parametric solution is given by [3 + s - 2t, s, t]. (Which values of s and t produce the three specific solutions above?)
A system of linear equations is a finite set of linear equations, each with the same variables. A solution of a system of linear equations is a vector that is simultaneously a solution of each equation in the system. The solution set of a system of linear equations is the set of all solutions of the system. We will refer to the process of finding the solution set of a system of linear equations as "solving the system."
Example 2.3
The system
2x - y = 3
x + 3y = 5
has [2, 1] as a solution, since it is a solution of both equations. On the other hand, [1, -1] is not a solution of the system, since it satisfies only the first equation.
Example 2.4
Solve the following systems of linear equations:
(a) x - y = 1      (b) x - y = 2       (c) x - y = 1
    x + y = 3          2x - 2y = 4         x - y = 3
Solution   (a) Adding the two equations together gives 2x = 4, so x = 2, from which we find that y = 1. A quick check confirms that [2, 1] is indeed a solution of both equations. That this is the only solution can be seen by observing that this solution corresponds to the (unique) point of intersection (2, 1) of the lines with equations x - y = 1 and x + y = 3, as shown in Figure 2.1(a). Thus, [2, 1] is a unique solution.
(b) The second equation in this system is just twice the first, so the solutions are the solutions of the first equation alone, namely, the points on the line x - y = 2. These can be represented parametrically as [2 + t, t]. Thus, this system has infinitely many solutions [Figure 2.1(b)].
(c) Two numbers x and y cannot simultaneously have a difference of 1 and 3. Hence, this system has no solutions. (A more algebraic approach might be to subtract the second equation from the first, yielding the equally absurd conclusion 0 = -2.) As Figure 2.1(c) shows, the lines for the equations are parallel in this case.
[Figure 2.1: The three systems of Example 2.4: (a) a unique solution, (b) infinitely many solutions, (c) no solutions]

A system of linear equations is called consistent if it has at least one solution. A system with no solutions is called inconsistent. Even though they are small, the three systems in Example 2.4 illustrate the only three possibilities for the number of solutions of a system of linear equations with real coefficients. We will prove later that these same three possibilities hold for any system of linear equations over the real numbers.
A system of linear equations with real coefficients has either
(a) a unique solution (a consistent system) or
(b) infinitely many solutions (a consistent system) or
(c) no solutions (an inconsistent system).
Solving a System of Linear Equations
Two linear systems are called equivalent if they have the same solution set. For example,
x - y = 1        and        x - y = 1
x + y = 3                       y = 1
are equivalent, since they both have the unique solution [2, 1]. (Check this.)
Our approach to solving a system of linear equations is to transform the given system into an equivalent one that is easier to solve. The triangular pattern of the second example above (in which the second equation has one less variable than the first) is what we will aim for.
Example 2.5
Solve the system
x - y - z = 2
    y + 3z = 5
        5z = 10

Solution   Starting from the last equation and working backward, we find successively that z = 2, y = 5 - 3(2) = -1, and x = 2 + (-1) + 2 = 3. So the unique solution is [3, -1, 2].
The procedure used to solve Example 2.5 is called back substitution. We now turn to the general strategy for transforming a given system into an equivalent one that can be solved easily by back substitution. This process will be described in greater detail in the next section; for now, we will simply observe it in action in a single example.
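Before moving on, here is a minimal sketch of back substitution in Python (an illustration, not part of the text; the function name is an assumption). It solves an upper triangular system by working from the last equation upward, exactly as in Example 2.5.

def back_substitution(U, b):
    # Solve Ux = b where U is upper triangular with nonzero diagonal entries.
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# The system of Example 2.5: x - y - z = 2, y + 3z = 5, 5z = 10
U = [[1, -1, -1],
     [0,  1,  3],
     [0,  0,  5]]
b = [2, 5, 10]
print(back_substitution(U, b))  # [3.0, -1.0, 2.0]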
Example 2.6
Solve the system
x - y - z = 2
3x - 3y + 2z = 16
2x - y + z = 9

The word matrix is derived from the Latin word mater, meaning "mother." When the suffix -ix is added, the meaning becomes "womb." Just as a womb surrounds a fetus, the brackets of a matrix surround its entries, and just as the womb gives rise to a baby, a matrix gives rise to certain types of functions called linear transformations. A matrix with m rows and n columns is called an m × n matrix (pronounced "m by n"). The plural of matrix is matrices, not "matrixes."

Solution   To transform this system into one that exhibits the triangular structure of Example 2.5, we first need to eliminate the variable x from equations 2 and 3. Alongside the equations we will record the corresponding augmented matrix

[ 1  -1  -1 |  2 ]
[ 3  -3   2 | 16 ]
[ 2  -1   1 |  9 ]

where the first three columns contain the coefficients of the variables in order and the final column contains the constant terms.

x - y - z = 2                          [ 1  -1  -1 |  2 ]
3x - 3y + 2z = 16                      [ 3  -3   2 | 16 ]
2x - y + z = 9                         [ 2  -1   1 |  9 ]

Subtract 3 times the first equation    Subtract 3 times the first row
from the second equation:              from the second row:

x - y - z = 2                          [ 1  -1  -1 |  2 ]
         5z = 10                       [ 0   0   5 | 10 ]
2x - y + z = 9                         [ 2  -1   1 |  9 ]

Subtract 2 times the first equation    Subtract 2 times the first row
from the third equation:               from the third row:

x - y - z = 2                          [ 1  -1  -1 |  2 ]
         5z = 10                       [ 0   0   5 | 10 ]
    y + 3z = 5                         [ 0   1   3 |  5 ]

Interchange equations 2 and 3:         Interchange rows 2 and 3:

x - y - z = 2                          [ 1  -1  -1 |  2 ]
    y + 3z = 5                         [ 0   1   3 |  5 ]
         5z = 10                       [ 0   0   5 | 10 ]

This is the same system that we solved using back substitution in Example 2.5, where we found that the solution was [3, -1, 2]. This is therefore also the solution to the system given in this example. Why? The calculations above show that any solution of the given system is also a solution of the final one. But since the steps we just performed are reversible, we could recover the original system, starting with the final system. (How?) So any solution of the final system is also a solution of the given one. Thus, the systems are equivalent (as are all of the ones obtained in the intermediate steps above). Moreover, we might just as well work with matrices instead of equations, since it is a simple matter to reinsert the variables before proceeding with the back substitution. (Working with matrices is the subject of the next section.)
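The row operations used above are easy to automate. Below is a minimal sketch in Python (an illustration, not part of the text; the names are assumptions) that applies the same three elementary operations to the augmented matrix of Example 2.6 and then finishes with back substitution.

def add_multiple(M, src, dst, k):
    # Add k times row src to row dst (use a negative k to subtract).
    M[dst] = [d + k * s for d, s in zip(M[dst], M[src])]

A = [[1, -1, -1,  2],
     [3, -3,  2, 16],
     [2, -1,  1,  9]]

add_multiple(A, 0, 1, -3)      # subtract 3 times row 1 from row 2
add_multiple(A, 0, 2, -2)      # subtract 2 times row 1 from row 3
A[1], A[2] = A[2], A[1]        # interchange rows 2 and 3
print(A)  # [[1, -1, -1, 2], [0, 1, 3, 5], [0, 0, 5, 10]]

# Back substitution on the triangular system, as in Example 2.5:
z = A[2][3] / A[2][2]
y = (A[1][3] - A[1][2] * z) / A[1][1]
x = (A[0][3] - A[0][1] * y - A[0][2] * z) / A[0][0]
print(x, y, z)  # 3.0 -1.0 2.0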
Remark   Calculators with matrix capabilities and computer algebra systems can facilitate solving systems of linear equations, particularly when the systems are large or have coefficients that are not "nice," as is often the case in real-life applications. As always, though, you should do as many examples as you can with pencil and paper until you are comfortable with the techniques. Even if a calculator or CAS is called for, think about how you would do the calculations manually before doing anything. After you have an answer, be sure to think about whether it is reasonable. Do not be misled into thinking that technology will always give you the answer faster or more easily than calculating by hand. Sometimes it may not give you the answer at all! Roundoff errors associated with the floating-point arithmetic used by calculators and computers can cause serious problems and lead to wildly wrong answers to some problems. See Exploration: Lies My Computer Told Me for a glimpse of the problem. (You've been warned!)
In ExerCISes 1-6, tletermme wllich equatiolls are lillear equa tiotlS in the vnr;ables x, y, ami z. If allY equatIOn is not linear, explain wilY nOI. 1. x - 1Ty + ~z = 0 2. X2 + I + Zl = \ 3.
X -I
Tilt systems II! Exercises 25 alld 26 exllibll a "lower trial/gll lar" pmlem tlwt makes them easy to solve by fonvllfd subs/I /Illioll . ( We will encounter forward subs/illllioll again ill Chapter 3.J Solve these systems. = - \ 25. x 2 26. x ,
+ 7y +z = sin(;) 5.3cosx - 4y +z=
4.2x - xy - 5z = 0 oj. 6. (cos3)x - 4y + z =
2x
V3
V3
III Exercises 7- 10, fiml a linear equal/OIl that has tile same soilltion set as the given equation (possibly with some restrictiollS on the 1'llriaIJ/es). Xl -
8.
7.2x+ y = 7 - 3y I I 4 9. - + - = x y xy
x
i
=
y
10. log,o X
1
29.
III Exercises 11-14,jind the solutlOlI S
\2. 2x,
Exercises 15- 18, draw graphs correspOIulillg to II,e gillell lillellr systems. Determine geomerrialll" whether ellch syslem lUiS (/ ullique soilltiol/, iI/finitely mw,y solutiO/IS, or 110 solu tioll. 'f/II! n wive cadI syslem algebraically 10 confirm )'OM t//lSlller. IS, x + )' = O 16. x - 2), = 7 3x + y= 7 2x + )' = 3 17. 3x - 6y=3 18. 0.lOx - 0.05y = 0.20 - x + 2y=1 - O.06x + 0.03)' = - 0.12 III
,/' Exercises /9- 24, soll'C Ihe glvell syslem by back 5111,;;11 til t/Oll.
21.x -
20. 211 - 31' = 5 211 = 6
y + z= 0 2y - z = \ 3z = - \
23. x, + x 1
22.
, ,
0
- 5x + 2x = 0 4x - 0
x }- x 4 = 1 Xl +X} +X4 = O Xj - x4 = 0 -
x~ =
x, + 2X2 + 3x) =
,
1
5
- 3x - 4y + z = - iO
7
x+5y =- 1 - x+ y =- 5 2x + 4)' = 4
30.
a - 2b + d= 2 - a+ b -c- 3d = J
lias tile given matrix as its augmemed matrix.
+ 3x 2 = 5
31.
y= 3
- 3
III Exercises 3 I alld 3Z, fi"d II system of linear etjuatlOtls ,IIat
13. x + 2y + 3z= 4
19. x - 2y= 1
=
rind tile augll/ellled matriCes of the Imear systellls ill Exercises 27-30. 27. x - y = O 28. 2x, + 3X-.! - xJ = 1 x, + x3 = 0 2x + y = 3 - x, + 2X-.! - 2X3 = 0
iog ,o Y = 2
-
+ )'
24. x - 3)' + z = 5 y - 2%=- \
32.
0
I
I I
I
- I
0 I
2
- I
I I
I
- I
I
I
0
I
0 2 0
3
I 2
I
- I 4
2
3 0
For Exercises 33-38, solve the lincnr systems ill Ihe givcn exerCIses. 33. ExerCise 27
34. Exercise 28
35. Exercise 29
36. Exercise 30
37. Exercise 3 \
38. Exercise 32
39. (a) Find a system of two linear equatIOns III the variables x and y \"hose solution set is given by the paramet ric equations x = t and), = 3 - 2t. (b) Find another pa rametric solution to the system III pari (a) in wh ich the parameter is 5 and y = s. 40. (a) Fllld a system of two linear eq uatIOns 111 the variables x,, x 2,and x J whose solution set is given by the parametric equations x I = t, X 2 = \ + I, and x 3 =2 - t. (b) Find another parametric solu tion to the system in part (a) in which the parameter is 5 and xJ = s.
Section 2. L Introduction to Systems of Linear Equations
-
In Exercises 41-44, the systems of equations are lIonliliear. Find substitutiorlS (cha nges of variables) that convert each system 11110 a linear system cmd lise this linear system to help solve the given system. 2
3 41. - + -= 0 x y 3 4 -+-= 1 x y
42.
x? + 2y2 =
6
x 2 - i =3 43.tanx-2siny
2 tanx- siny + cosz = 2 siny - cos z = - \ 44. -2~ + 2(3 b) = I 3(2") - 4(3') = I
65
Using your calculator or CAS, solve this system, rounding the result of every calculation to five significant digits. 3. Solve the system two mort" times, rounding first to four significant digits and then 10 three significant digitS. What happens? 4. Clearly, a ve ry small roundoff error ( less than or equal to 0.(0125 ) call result in very large errors in the solution. Explam why geomet rically. (Think about the grap hs o f the various li near systems you solved in Problems 1-3.) Systems such as the one you just worked w ith are called iI/-conditioned. They are ext re mely sensllive to roundoff errors, and there is nOI m uch we can do about it. We will encoun ter ill-condi tioned systems agai n in Chapters 3 and 7. Here is a nother example to experi ment with:
45S2x + 7.083y "" 1.931 1.73 Ix + 2.693y = 2.00 1 Play around with various numbers o f significant d igi ts to see what happens. startll1g with eight significant digits (if yo u can).
•
"
68
Chapkr 2
Systems of Li near Equations
•
Direcl Melhods for Solving Linear SVSlems In this section we will look at a general , systematic procedure for solving a system of lmear equations. ThIs procedure IS based o n the Idea of red ucing the augmented matrix of the given system to a form that can then be solved by back substitution. The method is direct in the sense that it leads direct ly to the solution (if one ex ists) in a finite number of steps. In Section 2.5, we wi ll co nsider some ;lI(ilrect methods that work in a completely dIfferent way_
Matrices and Echelon lorm There are two important matriccs associatcd with a linear system. The coefficient matrix contams the coeffiCIents of the variables, an
2x+y - z = 3 x +5z= 1 - x + 3y - 2z = O the coefficient 1
- 1
0 - 1 3
5
2 1
a ndlhe ugrncnted matri
-2
•
" 2
1
0 - 1 3 1
- 1
3
5 I
- 2 0
Note that if [I variable is mi ssi ng (as y is in the second equation), its coefficien t 0 is entered In the app ropria te positio n in the matrix_ If we denote the coefficien t matrix of a linear system by A and the column vector of constant terms by b, then the form of the augmented matrix IS [A I b J. In solving a linear system, It will not always be p ossible to reduce the coefficient mat rix to tri3ngular fOTm, ,IS we d id in Example 2.6. However, we can always achieve a stai rcase pattern in the nonzero entries of the final matrix. The word cchdoll comes from th e Lati n word 5C11ifl, meaning " ladder" or "stairs." The Fren.;h wo rd for '"ladder," tel/file, is also deri\-ed from this Latin base. A matrix. in «he1on form exhibilS
a staircase pattern.
---"
Deflnillon
A ma trix is in
form if it satili6es
lh~ ..folJpwing
p roperties:
t ..AIlY ro~ consi!iting entirely of .e!'05"8.T t the bottlm", 2. In each nonzero rO\ !he fi rst no nzero entry (called the leailing entry) is in a columu to the left of any leading entries below it
Section 2.2
&9
DITfil Met hods for Soh'ing Lmear Systems
Note that these p roperties guarantee that the leading en tries fo rm a staircase pattern. In particular, in any col umn con taining a leadlllg entry, all entries below the leading en try ar... zero, as the follow ing exam ples illustrate.
Example 2.7
The following 2 0 0
4
matri c~
1
0
1
1
0 0 0
5
0 0
1
2 D 0 1
are in row echelon form:
1
1 2 1 0 1 3 0 0 <J
0 2 0 0 0 0 0 0
0
1
- I
-I
1
2
0 0 0 0
3 2
0 0 5 4
-t-
If a matrix in ro w echelon fo rm is actually the augmented matrix of a linea r system, the system is qui te easy to solve by back substi tution alone.
Example 2.8
Assumi ng thai each of the mat rices in Example 2.7 IS an augmcnled ma trix, w ri te out the corresponding systems of linear equations and solve them.
Solullon
We fi rSI remind o u rselves that the last col umn in an augmented matrix is the vecto r o f co nslantterms. The first matrix then corresponds to the system
2x,+4x2 = 1 - X:z = 2 ( Notice that we have dropped the last equation 0 = 0, or Ox, + Ox! = 0, whICh is clearly satisfied for any values of XI and xJ.) Back substitutioll gives Xl = - 2 and then 2x, I - 4( - 2) 9, so XI "" ~ . The solution is 2 J. The second matrix has the corresponding system
=
=
n, -
The last equation represents Ox, + OX:! "" 4, which d earl y has no solutions. Therefore, the system has no solutions. Similarly, the system corresponding to the fourth matrix has no solutions. For the system corresponding to the th ird matrix, we have
so XI = I - 2(3) - X1 = -5 - Xz. There are infinitely many solutions, since we m3Y assign Xz any value r to get the parametric solution {- 5 - t, t. 3 J.
ElemeRlarr Row Operallons We now desc ribe the p roced ure by which any mat rix can be reduced to (j matrix in row echelo n form. The allowable o peratio ns, called elementary row operations.
18
Chaptrr 2 Systrms of Linear EquatIOns
correspond to the operations that can be performed on a system or linear equations to transform it into an equivalent system .
Definition
The following elem entary row operations can be performed on a
matrix: I. Interchange two rows.
2. Multiply a row by a nonzr ro constant. 3. Add a multiple of a row to another row.
Observe that dividing a row by a nonzero constan t is implied in the above definition, since, fo r example, dividing a row by 2 is the same as multiplying it by Similarly, subtract ing a multiple of one row from another row is the same as adding a negative multiple of one row to another row. We WII! use the following shorthand notatio n for the three elementary row operatio ns:
t.
I. R, +-+ Rj means intercha nge rows j and j. 2. kR; means multiply row j by k. 3. R, + kRJ means add k tJlnes row J to row I (and replace row i with the result).
The process of applying elemen tary row operat ions to brmg a matrix into row echelon form, called row reduction, is used to reduce a matrix to echelon form.
El3mple 2. 9
Reduce the follow ing matrix to echelon fo rm:
1
2
- 4
- 4
5
2
4
o
0
2
2 3 - 1 1
2
1
5
3
6
5
Solution
We work column by column, from left to right and from top to bottom. The strategy is to create a leading entry in a column and then use it to create zeros below It. The entry chosen to become a lead ing en try is (aLlea a pivot, and this phase of the process is cal led pivoting. Although no t strictly necessary, it is often convenient to use the second elementary row operation to make each leading entry a I. We begin by introducing zeros into the firs t column below the leading 1 in the first row:
1 2 2 4 2 3 - 1 1
- 4 0
2 3
- 4
5
2 1 5 6 5 0
R, - 2R, R, - 2R, 1<, + H,
,
1
2
- 4
- 4
0
0
8
0
- 1 3
8 10 - 1
0
9
5 - 8 -5
2
10
The first colu mn is now as we wan t it, so the next thing to do is 10 create a leading entry in the second row, aiming for the staircase pattern of echelon form.
Se<:tion 2.2
Dire<:t Me thods for Solving Linear Systems
11
In th is case, we do this by interchanging rows. (We coutd also add row 3 or row 4 to row 2.) 11. . _
II,
•
-, -,
I
2
0 0 0
-I
10
9
5 -5
0 3
8
8
-8
- I
2
10
The pivot this time was -I. We now create a zero at the bottom of column 2, using the leading entry - 1 in row 2:
R.+311,
I 0
2 - \
o
0 0
• o
- 4
10 8 29
- 4 9 8 29
5 -5
-8 -5
Column 2 is now done. Noting that we already have a leading entry in column 3, we just pivol on the 8 to introduce a zero below it. This is easiest if we first divide row 3 by8:
III.,
•
-, -,
I
2
0 0 0
-I
10
0 0
1
29
9
5 - 5
I - I 29
- 5
Now we use the leading I in row 3 to create a zero below it:
,
I ' 29 1<,
•
0 0 0
- 4
-4
10
9
5 -5
Q
4
~
-I
0
0
0
24
2
With this fin al step, we have reduced our matrix to echelon for m.
Kellar', • The row echelon form of a matrix is' n ot unigue. (Fi nd a different rowecheton form for the matrix in Example 2.9.) • The lead ing entry in each row is used to create the zeros below it. • The pivots are not necessarily the entries that are originally in the positions even tually occupied by the leading entnes. In Example 2.9, the Pivots were I, - I, 8, and 24. The original matrix had 1,4, 2, and 5 in those positions on the "staircase." • Once we have pivoted and introd uced zeros below the leading entry in a column, that column docs not change. In other words, the row echelon form emerges from left to right , top to bottom. Elementary row operations are reversible-that is, they can be "undone." Thus, if some elementary row operation cOllverts A into B, there is also an elementary row operation tha t converts B into A. (See ExerCises IS and 16. )
12
Chapter 2 Systems of Lmear Equations
Definition
Matrices A and B are row equivalent if there is a sequence of elementa ry row operations that converts A into B.
T he matrices in Example 2.9, ]
2
- 4
- 4
2 2
4
0
3
0 2
I
5 2 5
- I
I
3
6
5
,nd
]
2
- 4
-4
0 0
- ]
10
9
5 - 5
0
I
I
- I
0
0
0
0
24
are row equivalenLIn general, though, how can we tell whether two matrices are row equivalent?
Theorem 2.1
Matrices A and B are row equivalent if and only if they can be reduced to the same row echelon fo rm.
Prool
If A and Bare row equ ivalent, then furt her row operatIons WIll reduce B (and therefore A) to the (same) row echelon form. Conversely, if A and B have the same row echelon form R, then, via elementary ro w operations, we can convert A into Rand B into R. Reversing the latter sequence of operations, we can convert R into B, and therefore the sequence A ---;. R ---;. B ach ieves the desired effect. ________
Relllark
In practice, Theorem I is easiest to use if R is the reduced row echelon fo rm of A and B, as defined on page 76. See Exercises 17 and 18.
Gaussian Elimination When row reduction is applied to the augmented matrix of a system of linear equations, we create an equivalent system that can be solved by back substitution. The entire process is kno wn as Gaussian elimination.
Gaussian ElImination
I . Write the augmented matrix of the system of linear equations. 2. Use elementary row operations to reduce the augme nted matrix to row echelon form . 3. Using back substitution, solve the equivaJent system thaI corresponds to the row- reduced matrix.
Remark
When performed by hand, step 2 of Gaussian elimination allows qUI te a bit of choIce. Here are some useful guidelines: (a ) Locate the leftmost colum n that is not all zeros. ( b) Create a leading entry at the top of this column. (It will usually be easiest if you make this a leadi ng I. See Exercise 22.)
Section 2 2
Direct Methods for Solving Linear Systems
13
•
(c) Use the lead ing entr y to create zeros below it. (d) Cover up the row contain ing the leadmg entry, and go back to step (a) to repeat the procedu re o n the remain ing submatTlx. Stop when the ent ire matrix is in rowechcion form.
Example 2.10
Solve the system
2X2 +
3 Xl
8
5 x, - 2xJ = - 5
2Xl+3~ +
x, 501111101
=
Xl
=
T he augmented malrix is
2 3
0 2 1
- I
8 5 - 2 - 5 3 1
We p roceed 10 red uce this matrix to row echclon form , foll owing the guidelines given fo r slep 2 of the process. The first nonzero column is column I. 'Ne begm by creating a leading entry al the to p of this column; interchanging rows I and 3 is Ihe best way to achieve th is.
0
2
2
3
1
- I
8 1 5 -2 -5 3
,
R,"' R,
-2 - 5
1
- I
2
3
1
5
0
2
3
8
We now create a second zero in the fi rst column, using the leading I:
" ,
lit,
1
- I
0
5 2
0
-2 - 5 5 15 3 8
Carl Friedrich Gauss ( 1777- 1855) IS generally conSidered to be o ne of Ihe thfet: greates t mathematicians of al! time , along with Archimedes and Newton He is often called the "prince of mathematiCIans," a mckname that he richly deserves. A child prodigy, Gauss reportedly could do arithmetIC bdore he could talk_ At the age of 3, he corrected an er rur In his father's calcul;lliot1s for the company pa yroll, and as a young student, he found th e formula 11(11 + 1l/2 for the sum of the first II natural numbers. When he wa.~ 19, he proved that a 17-sided polygon could be co nslru(ted USing only a straightedge and a comp;lss, ~md at the age of 2 1, he proved, in his doctoral dissertnllon, that every polrnomial of degree n wi th real or co mplex coefficie nts has exactl y II zeros, coullting multIple zeros- th r Fundamental Theorem of Aigebr;l. Gauss' t 80 [ publicat ion Disqllisiliolles Anthmcticllc is gener:llly considered to be th e fou ndation of modern num ber theory, but he made co nt ributions to nearly eve ry branch of mathematiCs as well as to statistics, physics, astronomy, and su rveymg. Gauss dId not pubhsh all of his findings, probably because he was tOO crukal of Ills own work. He also did not like \0 teach and was often critical of other mathemnt lcinlls, perhaps be
14
Ch~JHtt
2 Systems of Lmear Equallons
We now cover up the fi rst row and repeat the procedure. The sc
i
\
- \
0
5 2
0
- 2 - 5 5 \5 3 8
til.
•
-2 -5
\
-\
0
\
\
3
0
2
3
8
We now need ano ther zero at the bottom of column 2:
•
. 1 H,
•
-2 - 5
\
- \
0 0
\
\
3
0
\
2
The augmented matrix is now in row echelon form , and we move to step 3. The corresponding system is
2x} ::::: - 5
~ -
XI -
+
Xl
Xl:::::
3
Xj :::::
2
and back substitution gives Xj ::::: 2, then ~ ::::: 3 - X j ::::: 3 - 2 ::::: I, and finally X I ::::: - 5 + X2 + 2x) ::::: - 5 + 1 + 4 ::::: O. Wt: write the solution in vector form as
o \
2 (We are going to write the vecto r solut ions of linear systems as column vectors from now on. The reason for this will become clear in Cha pter 3.)
Example 2.11
Solve the system
w-
x-y+2z:::::
I
2w - 2x - y+3z:::::
3
:::::-3
- w+ x- y
SOlulloD
The augmented matrix is 1
- I
- I
2
I
2
-2
- I 3
3
- I
1
- I
0- 3
which can be row reduced as fo llows: \
-\
- \
2
2
- 2
- \
- \
\
- \
3 3 0 -3
\
• .
••
!R,
'•.
•" •
\
-\
- \
2
\
0 0
0
\
- \
\
0
-2
2 -2
\
- \
- \
2 \
0 0
0
\
0
0
- \
\
0 0
Section 2.2
Direct Methods for Solving Line;!.r Syst mlS
15
The associated system is now
w -x -y +2 z= 1 y - z= I which has mfinitely man y solution s. There is more than one way to aSSI gn JXItlUllelets, but we will proceed to use back substItution, writing the variables corr espo nding to the leading entries (the lead ing vari ables) in terms of the other variable s (Ihe free variables). In this case, Ihe lead mg v:lriables are )\I and y, and the free vari,lbles are x :md z. Thus, y = 1 + z. and from this .....e obtain
w =1 +x +y -2 z = I
+
x
+ ( I + z) - 2z
= 2+ x -z If we assign parameters x es and z = I,t he solution can be written in vect or form as w
2
x y
+s-
,
\
,
t
2 0
+ r
\
r
0
+>
\
- \
\
0
0
+r
0
\ \
4
Example 2.1 I high lights a very importa nt property : In a consistent syst em, the free variables arc just the variables that arc not leading variables. Si nce the num ber of lead ing vari ables is the num ber of non uro rows in the row ech don form of the coefficie nt matrix, we can predict the num ber of free variables (p:lram eters) before we lind Ihe explicil solulion using back substitution. In Cha pter 3, we w,1I prove that , although the row echelon form of a matrix is not unique, the number of nOllzero rows is the So1me in all row echelon forms of a given matrix. Thus, it mak es sense to give a name to th is num ber.
"m k of a matt· is the Q;umber oC .IlOI 1lcrO =
•
\-Ve Will denote the rank of a mat nx A by rank (A) . In Exa mpl e 2.10, the rank of
the coefficient matrix is 3, and in Example 2.11, the rank of the coefficie nl matrix is 2. The observations \ve have just mad e Justify the following theorem, which we will prove in more generality in Chapters 3 and 6.
l
Theor •• 2.2
The Rank The ore m lei A be the coefficien t mntrix of ;l system of linear equations with ""Variables. If
Ihe syslCm is consistenl. Ihen
num~r of fltt
variabl
","iot A ) - p ..
•
16
Chapler 2 Systems of l mear Equations
Thus, in Example 2. 10, we have 3 - 3 = 0 free variables (in other words, a unique solution), and in Example 2.11 , we have 4 - 2 = 2 free variables, as we fo und.
Example 2.12
Solvc thc systcm ~+2X3=
X1 -
Xl
+
2X:1 -
X3
3
= - 3
2xl - 2x3 =
SolutIon
I
When we row reduce the augmented matrix, we have 1 1
0
- I 2 2
2 3 - I -3 - 2 1
,
~ II,
0
- I 3 2
2 3 - 3 - 6 -2 1
1
- I
0
1
0
2
2 3 - I - 2 - 2 1
1
- I
2
3
1
-I
- 2
1
11, - 11
0
,
, o
11. - 1 11
o
o
0
5
i
lead ing to the impossible equal ion 0 = 5. (Wc could also have pc rfo ~m ed R) - R2 as the second elementary row operatio n, which would have given us the sam e contradiction but a d ifferen t row echelon fo rm .) Thu&. th e systcm.has..llP solution5-' . ]J)).QlJ),nSJCAt.
Gauss-Iordan Ellmlnallon Wilhelm lo rdan ( 1842-1899) was a German professor of geodesy whose cont ribution to solving linear systems was a systematic method o f back substitution
closely rdated to the method des<:ribed here.
A modification of Gaussia n elimination greatly sim plifies the back substitut ion phase
and IS particularl y h elpful when calculat ions are being done by hand o n a system with in finitely many solutions. This vanant, known as Gauss-Jordan elimi"ation, rei les on red ucing the augmented matrix cvcn furth cr.
,
Definition
A matrix is in reduced row echelon f orm if it satisfies the follow-
ing properties: S in row echelon form. 2. 1n e leading entry in each nonzero row is a 1 ( called a leading I) , 3. Each column con taining a lead ing 1 has ..:eros everywhere els~
• The following matrix is in reduced row echelon fo rm: 1
2 0 0
0
0
1
0
0
0 0
-3
1
0 0
4
- I
0
0 1
3
-2 0
0
0
0
0
0
1
0
0
0
0
0
0
Section 2 2
Direct Methods for Solving Linear Systems
n
For 2 X 2 matrices, the poSSIble reduced row echelon fo rms are
For a short proof of this filet, see the article by Thomas Yuster, "The Red uced Row &helon Form 01 a Matrix Is Unique: A Simple Proof.~ in the M3rch 19M ~ue of .\1,]llIr mafies Magluim·(\·ol. 57, no. 2, pp. 93-94).
Gauss-Jordan EIImlaalioa
Example 2.13
oI] . 'nd
[00 0]0
whe re · can be any number. It is clear Ihat after a malrix has been reduced to echelon fo rm, further elementary row operatio ns will bring ilto reduced row echelon form. WhaJ is nol dear (although intuition may s uggest it) is that , unlike the row echelon form, the reduced row echelon form of a matrix is unique. In Gauss-Jo rdan ehmination, we proceed as III Gaussian elimination but reduce the augmenled malrix to reduced row echelon form.
I. \Vrite the augmented matrix of the system of linear equations. 2. Usc elementary row operations to reduce the augmented matrix to reduced row echelon form. 3. If the resulting s~ ,h:m IS wnslstcnt, solve for the leading va riables III terms of any remaining free variables.
Solve the system in Example 2.11 by Gauss-lordan elimination .
Solullon The reduction proceeds as it d id in Example 2.1 I until we reach the echelon form: 1
- 1
- 1
o
0
1
2 1 - I 1
000
o
0
We now must create a 0 above the leading I in the second row, thud column. We do this by adding row 2 to row I to obtain I
- I
0
0
1
o o The system has now been reduced
1 2
- I
0 0
o
1
0
10
IV -
X
+ z=2 y - z= 1
It is now much easier to solve for the leadlllg va riables:
w =2 +x - z
,nd
y = I
+
Z
l'
Chapter 2 Systems of Linear Equations
If we assign parameters x = s and z = 'as before, the solution ca n be written in vector
form as 2
w
+5-
,
x
y
1
,
,
+
I
I
.-t-
",."l
From a comp utational point of view, it i.s more emdent (in the sense that it re{luircs fewer calculat ions) to fi rst reduce the matrix to row echelon form and then, working from right to left, make each leading entry a \ and create zeros above these leading I s. However, for manual calculation, you will find it easier to just work from left to right and create the leading Isand the zeros in their columns as you go. Lct's return to the geometry that brought us to this point. Just as systems of linear equations in two variables correspond to lines in R!, so linear equations in th ree variables correspond to planes in RJ. In fact. many questions about li nes and planes can be answered by solving an appropriate linear system.
Example 2.14
Find Ihe Jine of intersection of the planes x + 2y - z = 3 and 2x + 3y + 2 = I.
Solutlol r:irst, observe that there wilfbe a line of in terseClion, since the normal vectors of the two plancs-[ I, 2. -\ J and [2,3, II-are not pa rallel. The points that lie in the intersection of the two planes correspond to the poinls in the solution set of the system
+ 2x + x
2y - 2 = 3
+2=
3y
1
Gauss-Jordan elimination applied to the augmen ted mat rix yields
[;
2 3
- I
3]
R.
1 1
• [~
2
2R
r, .,.1/f.
- I
[~
o
+ 52 =
-7
y - 3z =
5
•
l
-I 3] 3 - 5
1
Replacing variables. we have x
We set the free variable z equal to a parameter I and thus obtain the parametric equations of the line of intersection of the tw"o planes:
x=-7 - 5t
figure 2.2 The intersection of two planes
Y =
z=
5
+
3t t
Section 2.2
19
Direct Meth ods for Solving Linear Systems
In ve<:lo r fo rm, the equat ion is
x
~7
y
5
,
~5
+ ,
o
3 1
See Figure 2.2.
EKamDle 2.15
1 Let p ==
x= p +
~
,q
~
o
1
2 ,u =
l,andv =
3 - I . Determine whether the lin es
- I - I 1 1 I U and x == q + rv inte rsect and, if so, find thei r jXlint o f in tersect io n.
SllllIol
We need to be carefu l here. Altho ugh t has been used as the parameter III the equatio ns of both lines, the li nes are ind ependent and therefo re so arc their parame ters. Let's usc a d ifferent param eter- say, s- for the fi rst li ne, so its equation
x becomes x "" p
+
su. If the lines intersect, then we wan! 10 fi nd an x =
satisfi es both equatio ns simultaneously. Th:lt is, we want x = p su - tv = q - p . Substitu ti ng the given p, q , u, and v, we obtal1l the equations
+
SU
= q
y lhat
,
+
Iv or
5 - 31 =- }
s+
1
=
2
s + 1=
2
t
whose solutIon is easily found to be s = ~ , I =
i .The point of intersection is thereforc
,,
1
x
1
1
y
0
+ )', 1
- I
1
,
~
,, ,
)'
)'
figure 2.3
Sec Figure 2.3. (Check thaI substituting t = ~ in the other equa tion gIves the same
Two intersecting lines
point. )
In R'\ It is possible fo r two lllles 10 intersect in a pOll1t, to be parallel, o r to do neither. Nonparallel lines that d o not intersect a rc called skew lilles.
Be••r'
HomOgeneous
S~stems
We have seen that every syslem of linear equations has either no solutio n, a u nique solutio n, or infinitely ma ny solutions. However, there is one type o f system that always has at least one solutio n.
,
80
Chapter 2
Systems of Linea r Eq uatIons
DeUaltloD
A.system of linear equations is called homogeneous if the constl:1n term in eamequation is zero.
III other words, a homogeneous system has an augmen ted matrix of th e for m IA I OJ. T he following system is homogeneous:
2x - x
+ +
3y -
Sy
+
Z ""'
0
2z = 0
Since a homogeneous system cannot have no solution (fo rgive the double negative!), It will have either a unique solution (namely, the zero. or trivial , solution 0 •nfi nit Iy Illany solutions. The next theorem says that the latte r case m llst occur if the number o f var iables is greater than the number of equations.
, Theore. 2.3
If [A I 01 is a homogeneous system of II! li near equat ions wi th m < II, then the system has in fi ni tely man y solutions.
/I
variables, where
Prool SlIlce the system has at least the zero solution, it is consistent. Also, rank{A) S
//I
(why?) . By the Rank Theorem, we have numberoffree variables =
II -
rank(A );=:
II -
m > 0
So there is at least one free variable and, hence, infinitely many solutions.
Theorem 2.3 says noth ing about the case where m ~ II. Exercise 44 as ks you to give examples to show that, in this case, there can be either a unique solution or infi ni tely many solut ions.
Motl
Linear SlSlems over 7L, Thus far, all of the linear systems we have encountered have involved real numbers, and the solut ions have accordingly been vectors in some [R n. We have seen how other number systems arise--no tably,'Z ..Wd~n p isa prime number,Z.pbehaves in manyrespects like R; in particular, we can a ,5ul)tract, multiply, and d ivide (by no nzero num bers). Thus. we can also solve systems of linear equations when the variab les and coefficien ts belong to some z.p'In such instances, we refer to solving a system over Z,. For example. the linear eq uation X I + Al + X, = 1, when viewed as an equation over 1 2, has exactly fou r solutio ns: Rand Zpa re examples of fields. The set o f rational num~rs 0 and the set o f complex nu mbers Care
other examples. Fields are covered in detail in courses in abstract algebra.
x,
I
X,
0
x,
0
x, ,
x,
X,
0 I ,
x,
0
x,
(where the last so lution arises because I
X,
+I+
0 0 • ,nd I I "" I in Z. l)'
x,
I
x,
,
x,
I
Section 2.2
Direct Methods for Solving Linear Systems
.,
x, When we view the equation XI
+ :s + x, =
l over Z " the sol utio ns
Xl
are
x, 100202JJ2 0 , 1 ,0,2 , 2 , 0 , 1 , 2,1 00
1
02221
1
(Check these. ) But we need not use trial-and-erro r methods; row reductio n o f augmented matrices wo rks just as well over Zpas over R.
Solve the following system o f linear equations over l ,: XI XI
+2x,+ x.~= O +x,= 2 X:z + 2x, :::z 1
Solutlaa
The fi rst thing to note in examples like this one is that subtraction and d ivision are not needed; we can accomplish the same effects usi ng addition and multiplica tion. (Th is, however, requires tha t we be working over l ,. where p is a prim e; see ExerCise 60 at the cnd of this sectIOn and Exercise 33 in Section IA. ) We row red uce the augmented matrix of the system, using calculations modulo 3. 1
2
1 0
1
0
1 2
0
,
II .. .. ZII,
2 1 II, + II,
II, + 211,
Thus, the solution is X I = 1, X2 = 2. x,
Example 2.11
::
1
2
1 0
0
1
0 2
0
1
2 1
1
0
1 2
0
1 0 2
0
0
2 2
1
0 1
0
0 1
0 2
0
0
1 1
I.
Solve the following system of linear equations oyer 1 2:
x, +x1+x, + x, +x, X:z + x, Xj + +
x~
= 1
=1 =0 ~
:: 0
x~
:: 1
82
Chapter 2 Systems of Linear Eq uations
Solation
The row red uction proceeds as fo llows: I
I
I
I
I
I
I
0
0 0 I
I
I
0 0
I
0 I 0 0 I 0
0
I
H. + II,
,
/I. ~ H,
I
11. .... ,1. II, + 11, II, + II,
•
11,+ /1 .
,
/1,. + 11,
I
I
I
I
0 0 0 0
0 I 0 I
I
I 0
I I
0 0 I 0 0 0
I
0
0
I
0 0 0
I
I
0
I
0 0 I 0
I
I 0
0
0 0
0
0 0
I
0
I
0
I
0 0
0 0 0
0 0
I
I
0
0
0
I
I
I
I
I 0
0 0 0 0 0
T herefore, we have +~ = I
x,
+
x~
= 0
X3+ X4=O
Seui ng the free variable x. = t yields
x, x, x,
I + 1
I
1
'"
1
0 0 0
1
I +1
I I I
SlIlce t can take on the two values 0 and I, there are exactly two solutions: I
o
o
I
o o
and
I I
.1 .. 1"
For li near systems over lLp' there can never be infinitely many solutions. (\Vby not?) Rat her, when there is more than one solution, the nu mber of solutio ns IS fi nite and is a function of lhe number of free variables and p. (See Exercise 59.)
-
Direct Methods for Solvmg L.lnear S),Stt'I11S
Sei.:tio n 2.2
13
Exercises 2.2 In Exercises 1-8. determi"e whether the giwn matrix is ill row echelon form. If it I S, state w~,ether it IS also it, reduced row echelon for m. 0 0
3
I
0
I
3
0
0
0
3
5. 0 0 0 I
0
I
I.
3.
0 0
[~ I
7.
I
2.
~l - 4 0 0 0 0
5
7 0
I
0
I
- I
4
111 Exercises J 7 (llid I B, sl/ow that Ihe give" IIIMnC!!5 (l re row
0
0
0
equlI'a/e/lt {Hid find (I scqllf!llce of eicm cll wry ro w opera/iollS Ilwl will COll verl A lfl to 8.
0 0
4.
6.
I
0 0 0
0 0
0
0
0
0
0
I
0
I
0
I
0
0
0
17. A ==
2 3
2
I
3
5
I
0
0
0
0
I
- I
0 0
I
I
0
I
0 0 0 0 0 0
3 0
8.
row «he/Oil form. 0
I
9. 0
I
I
I
I
I
3
5 - 2
II.
5
2 13.
3 2
4
10.
12.
4 - 2
- I
- I
- 1
- 3
- I
14.
- 2 - 3
- 4
- 2
I
6
-, - 6 2
:]
-,
- 4
0
- I
10
9
5 - 5
I
I
- 1
0
0
24
0
0 0
0
1
2
- 4
- 4
5
2 2
4
0
0
3
2
I
- I
I
3
6
2 5 5
0 ,B
- I
I
I
~
I
- I
3 2
5 2
I
0
Perfor m R! + R, and R, + R2• Now rows t and 2 are identical. Now perform Rl - R, to obtain a row o f zeros in the second row.
+
RI • R, - R2• Rz + R, • - R,
21. St udents freque ntly perfo rm th e following type of cal culation 10 introd uce a zero intO :1 matrix:
7
10 - 3
15. Reverse the elementary row operatIOns used in Example 2.9 to show that we can convert
2
I
R!
1
I
I
3
20. What is the net e ffect of performi ng the fo llowing sequence o f elem en tar y row o perations o n a matrix (with at least two rows)?
[~ ~] [:
- I
2 0
19. What is wro ng w ith the fo llowing "proof" that every m atrix with at least two rows is row cqUlvalent to a matrix with a :.:ero row?
ill Exercises 9-1 4, use eielllel//(/fY row operatiolls to redllre Ihe gIVen matrix to (a) row echeloll f orm (md (b) retillced
0
:J. B =[~ -~ ]
[~
18.A=
I
,
16. In general, what IS the elementa ry raw o peratio n that " undoes" each of the three elementary row o perations R, +-+ Rf kR,~ and R, + kR,?
However, 3 Rz - 2R, IS flOl an elem entary row operatio n. Why not? Show how 10 achieve the same resul t using elem entar y row operations. 22. Consider the ma trix A =
into
[~ ~ ]. Show that any o f
Ihe th ree types o f elem entary row operations can be used to create a le:lding I at the top of the first co lumn. \Vh ich d o yo u p refer and why? 23. What is the rank o f each of the matrices in Exercises 1-8? 24. \Vhat are the possible reduced row echelon fo rms of 3 X 3 m atrices?
••
Chapter 2 Systems of Linear Equallons
In Exercises 25-34, solve the given system of eqllations IIsing either Gausswn or Gauss-Jordan elimination, 26. x- y+ z = O 25. X,+ 2x2-3x,= 9 - x + 3y + z = 5 2x, - X2+ x,= O X2 + X, = 4
4x, -
27.
x, - 3x z - 2x, = 0 -x,+ 2xz + xJ= O 2x,
+
4x1 + 6xJ
+ s= + s=
29. 2r 4r
+ 21V + 3x
28.
+
Y
3x - y
31V - X 31V - 4x
= 0
7z = 2
+ 4z = + z=
0 I z= 2
+ y-
J 7
- XI
+ 3x1 - 2x] + 4X4 =
31.
+
3x1
X, ! X1
+ X2 -
ji, x,
+'jX2
}XI
2 =
+ Xs
22 = 3, ~
- vl
-y + V2z=
I
+X+
2y
2
+z=
+
ky = I
kx+ y == 1 43. x
3z = 2
+ Y + kz
x+ky+
x+ y+ z= k 2x - y+4z= f(-
==
1
z=
I
kx+ y+ z=- 2
and
46. 4x +y-z= O and
B
2x - y+ 4z=S
2x-y+ 3z= 4
47. (3) Give an example of th ree planes that have a com mon line of intersection (Figure 2.4 ).
I
1
°
y+z "'=' x+ y - \ w+x +z= 2 c+ d ., 34.n + b + a + 2b + 3c+ 4(1 = n + 3b + 6c+ lOti = a+ 'Ib+ 10c + 20d E> w- x -
+
45. 3x+2y+z =- 1
= -1
- 4x:s=
2x,
vly IV
=
42. x - 2y
41. x
In Exeroses 45 a"d 46, fi "d the li"e of intersection of the give" planes.
6X4
- 3",
32. V2x+ y+
33.
8~
4x, XJ -
-
0
xJ-2 ~ =-J
2x,-6x2+
40. kx + 2y = 3 2x - 4y = - 6
44. Give examples of homogenCQus systems o f m linear equations in 11 variables with It! "" n and with //I > n that have (a) infinitely many solutions and (b ) a unique solution.
2r+5s=- 1 30.
III Exercises 40--43,for what value(s) of k, if any, will the systems hllve (a) I/O soilltion, (b) a unique solution, l/lld (c) illfilli rely II J(lIIY solll t lOllS ?
4 iO 20
35
flglrl
Exercises 35-38, (lelermi"e by impectioll (i.e" withollt performing any ca/clliatiolls) wlletller a /mear system with the given allgmented matrix has a IImque 5O/lItIOIl, infillitely many solutions, or flO solution, lllsllfy your answers.
2.'
In
0 35.
37.
0 I
3 36. I
-2 2
2
4
4 0
I
2 3
4
8 0 12 0
38. 6 7
4
3
7
7
I 2
0 I 0
3 I I I
I
2
5 9
6 10
3 7
"
5 7
0 -3
(b) Give an example of th ree planes that intersect in pairs but have no common point of intersec~on (Figure 2.5).
I
I I - I
-6 2
0
5 6 2 I 7 7
39. Show that if ad - be '1= 0, then the system
(u:+by= r
cx+ dy==s has a unique solution.
fl11U12 .5
Section 2.2
Direct Methods fo r Solving linear Systems
15
show that there are mfmitcly many vectors
(el GLve an example of three planes, exactly two of which arc parallel (Figure 2.6).
x, x '"
X,
x, that simultaneously sat isfy u ' x = 0 and v' x = 0 and that aU arc multiples of u Xv=
fl,.,.2 .• (d) Give an example of three planes that in tersect in single pomt (Figure 2.7).
lL
52. Let P =
I':Y, -
"J v~
" , VI -
III V,
lil Y: -
IIl Vt
2
1
0
1 •q =
1 •u = - 1
0
- 3 , andY = 1
0 6 - 1
Show that the Ji nes x = p + su ::and x = q + IV,\Te skew lines. Find vector equations of a pair of p::arallel planes, one contaimng each line. hi Exercises 53-58, soll'e tile systems oflmeaTequations over
tile indim/ed Zr 53. x + 2y = l over Z,
+ Y= 2 +Y = lover 'Z.2
x 54. x
y+ z=O
+ z=
x
fill" 2.1
y+ z = O +z= l
x 48. P =
49. P -
3 1 .q = 0
- 1
1
2 ,u =
2 ,v =
- 1
0
0 ,v =
2 3
1
1
1
1 ,u = - 1
2 50. Let P = 2 , u = I , and v = 1 . Describe 3 - \ 0 all points Q = (a, b, c) such lhal the line through Q with direclion vecto r v intersects the line with equation x = p + SUo 1
1
51. Recall that the cross prod uct of vecto rs u and v is a v«tor u X v that is orthogonal to both u and v. (See Exploration: The Cross Product in Chapter I.) If and
v=
56. 3x x
1
- 1
0
1
= I over 'Z.,
55. x+ Y
2
"v: ",
I
+ 2y = lover l.J + 4y = I
57. 3x + 2y= I overl., x
58.
+ 4y
= 1
+ 4~
XI
x t + 2x!+ 4x, 2xt+ ~
+
"" I over Z", = 3
~= 1
+ 3x,
"" 2 59. Prove the following corollary to the Rank Theorem: Let A be an m X "m::atrix with entries in Z,. Any consistent syslcm of linear equ::ations wi th coefficienl m:llrix A has exactly p" ,.".V.l sol utions over 7L r ''(I
60. When p is not prime, eXira care is needed in solvi ng a linear system (or, indeed, any equation) over l,. Using Gaussian e limination, solve the foll owing system over l,. What complicatio ns ::arise? 2x + 3y= 4 4x + 3y = 2
.~
...
,
~
.:;$: .
CAS
Partial Pivoting In Exploration: Lies My Computer Told Me following Section 2.1, we saw that ill conditio ned linear systems ca n cause tro uble when roundoff error occurs. In this exploration, you will d iscover another way in which linear systems are sensitive to rou ndoff error and see that very small changes in the coefficients can lead to huge maccuracles in the solution . Fortunately, there is somet hing that can be done to
mmtmize or even e1immate this problem (unlike the problem with tn-conditioned systems). I.
(a) Solve the single linear equation 0.OOO21x = 1 for x.
(b) Su ppose yo ur calculator can carry on ly fo ur significant digits. The equation will be ro unded to 0.OOO2x = I. Solve this equation. The difference between the answers in parts (a) and (b) can be thought of as the effect of an error of 0.0000 1 o n the solution o f the given equation. 2.
Now extend this Id ea to a system o f linear eq uations. (a) Wit h Gaussian elimination, solve the li n ear system 0.400x
+ 99.6y =
75.3x - 45.3y =
100 30.0
using three siglllficant digits. Begin by pivoting on 0.400 and take each calculation to th ree significant digits. You should obt:lin the "solution" x = - 1.00, Y = 1.01. Check that the actual solutjon is x = 1.00, Y = 1.00. Th iS is a huge error-200% 111 the xvalue! Can you discover wholt caused It? (b) Solve the system in part (a) again, this time interchanging the two equa tions (o r, equivalently, the two rows of its augmented matri x) and pivoting on 75.3. Again, take each calculation to three sign ificant digits. What is the solution this time? The moral o f the story is that, when uSing Gaussian or Gauss-Jo rdan dim ination to obtai n a numerical solution to a system of linear equations (i.e., a decimal approx imation), you should choose the pivots with care. Specifically, at each pivotll1g step, choose fro m alTlong ,,11 possible p ivots in a colum n the entry with the la rgest absolute value. Use row interchanges to bring thIS element into the correct posit ion and usc it to create zeros where needed in the column. This strategy is known as partial pivoti"g,
.6
3. Solve the foll owing systems by Gaussian elimlllation, first without and then with part ial pivoting. Take each calculation to th ree sign ificant digits. (The exact solutions arc given.)
+ O.995y == 1O.2x + I.OOy =
(il) O.OOlx
-
1.00
(b) lOx -
- 3x
- 50.0
7y
+ 2.09y + 6z
5x -
. [xJ [,.ooJ 1.00
Exact so]utlon: y =
= 7 = 3.91
y+ Sz = 6
x Exact Solullon: y
- I.DO
z
l.OO
0.00
I Counting Operations: An Introduction to the Analysis of Algorithms
Abu la'fur Muhammad ibn Musa al -Khwarizmi (c. 780-850) was an
Ambie mathematician whose book 1'liMb al-jabT w'l/ll/Ilujllualllll (c. 825) described the uSC of HinduArabiC nume.rals and Ihe rules of basic arithmehc. T he second word oflhe book's title gives rise 10 the English word a/gellm, ,md the word algorithm is dem/cd from
(11· Khwarizml's 113111e.
! ! •
G.l ussian and Gauss-Jorda n elimination arc examples of algorithms: systematic procedures deSigned to Implement a particular task-in this case, the row reduction of the augmented matrix of a system of linear equations. Algorithms arc particularly well sui ted to com pu ter implementation, but not all algo rithms are created equal. Apart from the speed, memory, and other atlributes of the com puter system o n which they are running, some algorithms are fast er than others. One measure of the so-called complexilyof an algori thm (a measure of its effi ciency, or ability to perform its task in a reason
Consider the augmented matrix
[A l b) =
2
4
6
3 - I
9 1
6 12
- I
8 1
OJ
• Count the number of operations requir~d to bring (A I b] to the row echelon form I
o
2 1
3 4 - \ 0
o
0
[1
(By"operation" we mean a mu lt iplica tion or a division.) Now count the number of operations needed to complete the back substitution phase of Gaussian cltminatio n. Record the \olal number of operations.
2. Count the number of operations nceded to perform Gauss-Jordan elim ination-that is, to reduce (A I b] to its reduced row echelo n form I
0
0 - 1
o
1
0
1
o
0
1
1
(where the zeros are introduced inlo each column immed iately after the leading I is created in that column ). What do your answers suggest about the relative efficiency of the two algori thms? We will now attempt to analya the algorithms in a general, systematic way. Suppose the augmen ted matrix [A I b ] arises from a linear system with n equations and /I vanables; th us. [A I b] is /I X ( /I + I):
(A I b J =
a" a"
a"
• ••
au
•• •
a" b, a" b, • •
(/,,1
• ••
a"
II""
b,
We will assume that row in terchanges are never need ed- Ihat we can always creale a leading I from a pivot by dividing by the pivot. 3.
(a) Show that
II
operat ions are needed 10 create the firslleading I: · ..
I
,
•• •
n,,1
n~
•
· ..
·..
·..
• ••
• • a"" bIt
(Why don't we need 10 counl an operation for the creation of the leading 1?) Now show that n operatio ns are needed 10 obtain the first zero in colum n I:
• o • I
• • . .. • • . ..
II
,
(Why don't we need to coun t an operation for the crcatlon of the zero itself?) When the first column has been "swept out," we have the matrix
• •• o • ... • • 1
•• •
• •
o • ... • • Show that the 10lal number of operations nceded up to this poi nt is "
+
(n - I ) n.
(b) Show that the total number of operations needtd to reach the row eche~ Ion form
-
1
• .. . • •
o
1
o
0
••
...
I ·
•
"
[ n + (n - I),,] + [(n - I) + (n - 2)(n - I) ] + [(n - 2) + (n - 3)(" - 2) ]
+ ... +
[2
+
1·2 ]
+
I
whICh si mplifies to
(e) Show that the number of operallons nceded to complete the back substitution phase is
(d ) Usi ng summation formulas for the sums in parts (b) and (e) (set Exercises 45 and 46 in Section 2.4 and Appendix B), show that the total Ilumber of operations, T ( n), perfofmed by Gaussian elim ination is
T(n) :::
1,,) + Ir - in
Since every polynomial function is dOllllllutcd by its leading term for large values of the variable, w~ see thai T ( tI) = ~ for large values o f II.
rr
rr
4. Show that G auss-Jordan el im Ination has '{( til = j tOlal operatlonS if we creale zeros above and below th e leadmg Is as we go. (This shows that. for lurge systems of linear equations, Gaussian elimination is fa ste r than this version of GaussJordan elimination .)
II
90
Ch3pler 2 Systems of Lincnr Eq unliolls
Spanning Sels and linear Independence The second of the three roads in our "trivium" is concerned with linea r combmfllions of vectors. V./e have seen that we can view solving a system of linear equations as asking whether a certam vector IS a linear combination of certain other vectors. We explore this idea in more de tail in this section . It leads to some very Important concepts, which \
Spanning Sets 01 Vectors We can now easily ans wer the question raised in Section 1.1: When is a given vc<:tor a linea r comb inat ion of other given vectors?
lXample 2.18
\ (ll) Is the vector
- \
\
2 a linear combination of Ihe vecto rs 0 and 3 3
\ 2 (b) Is 3 a linear coml>i nalion of the vectors 0 and 4 3
\
,
-3
- \
\ ?
- 3
Sollilion (a) \Ve wanllo fin d scalars x and y s uch that
I
x 0
+y
3
- 1
1
1
2
-3
3
Expandlllg, we obtain the system
x -
y= I Y= 2
3x -3y=3 whose augmen ted matrix is 1
- I
o
\
3
I
2 - 3 3
(Observe that the column s of the augmented matrix are just the given vectors; notICe the o rder of the vecto rs-in particular, \'Ihic h vC<: tor is the co nstant vector.) The reduced echelon form of this matrix is
\
o o
0 3 \2 0 0
( Verify thiS.) So the solution is x = 3,), = 2, and the corresponding linear combinatio n is - \
\
3 0 3
+2
\
- 3
\
=
2 3
!J1
$e(t lon 2.3 Spann ing Sets and Lme;u Independence
whose augmenled (b) Utilizing our observation in par t (3), we oblain a lmea r syslem matrix is 1
-I 2
0
1 3
3
-3 4
which reduces to
5 3 0 0 0 - 2 1
0 1
2 a linenr com revealmg Ihat Ihe system has no sol ution. Thus, in this cnse, 3 is not I
bination of 0 and 3
4
- I
I -3
of linea r The nOli on of a span ning set is int imately connected wIth the solu tion mented masystems. Look back at Example 2. 18. Th ere we saw thn' a system with aug colum ns of trix fA I bl has a solution precisely when b IS a linear combination of the A. This is a gene ral fact , summarized in the next theQrem .
TheOI•• 2.4
nt if and A system of linear eqU
M,
LeI's revisit Example 2.4, iOlcrpreting it inlighl of Throfcm 2.4. (a) The system
x - y::: I x
+
y = 3
has the uniq ue solution x = 2,y ::: I . Thus,
See Figure 2.8(a).
(b) The system
x-
y :;:: 2
2x - 2y
=-
4
that has infinitely many solu tion s of thc form x = 2 + t, y::: t. This imp lies
for all valu es of t. Geo met rically, the vectors
[ ~ ], [=~], and [~] are all parallel and
so all lie along thc same line through the origin Isec Figure 2.8(b) J.
Chapter 2 Systems of Linear Equations
92
)'
!'
!'
5 4
4
3
3
3
2
2
- 2 - 1
1 - 1
-
0
x
x
-2 - 1
3
2
2
-2
-2
-2
-3
-3
-3
(.)
Jlllni
-2
3
3
(0)
(b)
Z.' (e) The system
x- y= 1 x-y=3 has no solutions, so there arc no values of xand y that satisfy
In this case, [ : ] and [
=:]
an: parallel. but
[ ~] does not lie along the same line
thro ugh the o rigin [sec Figure 2.8(e)]. We will often be interested in the collection of allli nclIT combinations of a given sc i o f vectors.
Dellnltlon
If S = Iv!' v 2' •••• v l } is a set of vectors in R • then the set of all linear combi nations of vI' v1, • ••• vi is called the spa" of vI' v 2•... , v l and is denDled by span(vp v2•... , vl ) o r span(S). If span{S) R". lhcn S is called a spa,,ning set for R~.
lumpla 2.19
Show th at Rl = span(
Solulioil
We need to show Ihat an arbitrary vecto r
combination of [ _
1~]
=
[ _ ~]. [~]).
[~] can be ..... ritlen as a linear
~ ] and [ ~ l thai is, we must show that the equat ion x[ _~] +
[ ~] can always be solved for x and y (in terms of
valucs of a and b.
Ii
and b), regardless of the
93
Section 2 3 Spanning Sets and Linear Independence
The augmented matrix is [ _ 1
a]
II -
II.,
>
3 b
~ ~
:l
and row reduction produces
!~
I 3 bj ll [ 2
I
[ -
'
0
II
3 b 7 (/ + 2&
j
at which point it is clear that the system has a (unique) solut ion. (Why?) If we contin ue, we obtain
" [-'o >
3
b I (a + 2b)f7
J, ,- w [-I ~
0
from which we see that x == (3 11 - b)/7 and y = (a (la nd b, we have
0
(b - 3a )/7 ] I (11 + 2b)/7
+ 2b)/7. Thus, fo r any choice o f
(Ched this.)
He.,rk It is also tr ue that IR? "" fi nd xand y such that
o[~] "" [:l
span( [ _~]. [ ~l [ ~]} If,givcn [ :]. wecan
x[ _ ~ ] + y[ ~ ] =
[:l
then we also have
x[ _~] + y[ ~] +
in fact, any sct of vecto rs that cOIl taillS
a spanning set for 1R2(see Exercise 20). The next example is an important (easy) case of a spanningsct. We will encounter versions of this example many times.
EKample 2.20
x Let e p e1 • and e 3 be the standard unit vectors in
RJ. Then for any vector y
o
o
y = x 0
+)' I
+ z 0
o
o
x
1
,
,
= xe l
+ ye2 +
>
we have
ZC.l
1
Thus. Rl = span(e l , e2, ( 3 ). You should have n o difficulty seeing lhat, in general, R~ "" span(e l' c! • ... , e~).
-t-
When the span of a set o f vectors In a d escription of the vectors' span .
EKBmple 2.21
1
Find the span o f 0 and
3
R~
is n ot all of lR~, i\ is reasonable to ask for
- I
1 . (See Exam ple 2.18.)
-3
94
Chapter 2 Systems of Linear EquatIOns
,
Solutloll Thinking geometrically, we can see that the set of all linear combi nations of - I
]
o
and
I IS Just
]
the plane through the origi n with
0
-3
3
x
plane
]
as direction
- 3
x y
,
- ]
]
= , 0
+ 1
,
- 3
3 1
1
- I
y is in the span of 0 and
which is just another way of sayi ng that
Two nonparallel ve<:tors span a
and
3
vectors (Figure 2.9). The vector equ:lIion of this pl ane is
figure 2.9
- I
] z 3-3 Suppose we want to obtain the general eq uation of this plane. There are several ways to proceed. One IS to usc the fa ct that the equation ax + I,y + cz = 0 must be satisfied by the points ( ], 0, 3) and (-I, I, - 3) determined by th e di rection vectors. Substitution then leads to a system of eq uations III (I, II ,\I1d c. (See Exercise 17.) Another method is to use the system of equations arising from the vector equation:
s-
r= x r = )'
3s - 31 =
Z
If we row reduce the augmented matrix, we obtain 1
0
3
- I
X
• o
1 Y
,
- 3
]
11.-)11
o
- I
x
]
Y
0 z - 3x 1
x
Now we know that Ihis system is consistent, since y is in the span of 0 - I , 3 1 by assumption. So we 11/115/ have z - 3x = 0 (or 3x - z = 0, in more standard
- 3 for m ), giving us the general equation we seek.
Remerk
A normal vector to the plane in this example is also given by the cross
product
1
-1
- 3
OX]
°
)
1
- )
lineal Independence -I
1
In Exam ple 2. 18, we found that 3 0
+
2
1
1
2 . Let's abbreviate thIS equa-
- 3 3 3 l ion as 3u + 2v = w. The vector w "depends" on u and v in the sense that it is a linear com blllation of them. We say that a set o f vectors is linearly depetldetlt If one
Section 2.3
Spanning Sets and Lmear Independence
95
of them can ~ written as a linear combination of the others. Note that we also have u = - ~ v + l w and v = u + ! w. To get around the question of which vecto r to express In terms of the rest, the formal definitio n is stated as follows:
-!
Definition
A sct of vectors vI' v~, ... , v l is linearly dependent jf there are scalars 'I' ':2, ... , ,~, (1/ least Ollt: of whi,h is IIo t zero, such that
'I VI
+ '2V2 + ... +
'''V~
= 0
A set of vectors that is not linearly dependent is called lillcarly
i"dependellt. •
RemarllS • In the definition of linear dependence, the requirement that at least one of the scalars 'I' ':2, .. . , 't must be no nuro allows for the possibility that some may be zero. In the example above, u, v, and w are li nea rly dependent, since 3u + 2v - w = 0 and, in fact, all of the scalars are nonzero. On the o ther hand,
so
[~], [~], and [~] are linearly dependen t, since at least o ne (in fact, two) of the
three scalars I, - 2, and 0 is nonzero. (Nott' that the actual dependence arises si mply fro m the fact that the fi rst two vectors are multiples.) (See Exercise 44.) • Smce OVI + OV1 + ... + O v~ ::: 0 fo r any vectors VI' v 2, •• • , v,, linear dependence essentiall y says that the zero vector can be express(.'(\ as a IIontnvia/l inear combinat io n of v,. VI"'" V.. Thus. lincar mdependence means that the zero vector can be expressed as a linear combination of VI' v 2" •• , vt ollly in the trivial way: ' IVI + Czv2 + ... + 'tV, = 0 o nly if ' I ::::c 0, Cz = 0, ... "t = O. •
.5
-
The relationship between the intuitive notion of dependence and the formal definition is given in the next theorem. Happily, the two notions are eq uivalent ! Vectors VI' VI' ... , Vm in R~ are linearly dependent if and o nly if at least one of the vectors coin be expressed as a linear combination of tht' others.
PrlGl If 011t' of the vectors--say. VI-is a linea r combina tion of the others, then such that VI = ':2V2 + ... + ,",v"" Rearranging, we obtam there are scalars ' 2" ' " VI - Czv~ _ .. - cmv'" = 0, which implies that VI' VI" '" V", are hnearlydependent. since at least o ne of the scalars (namely, the coefficient I of v.) is nonzero. Conversely. suppose that v I' V I " '" V m are linearly dependent. Then there are ':2, .. . , ("" not all zero, such that + ' 2V1 + ., . + '..,V,. = O. Suppose scalars CI o. Then
'm
"*
'1'
and we may multiply both sides by other vectors:
'IV.
1/ '1to ob tain VI as a linear combina tio n o( the
-----
96
Chapter 2
Systems of Lmear Equations
Mall
It may appear as if we are cheating a bit in th is proof. After all, we cannot be sure that VI is a linea r combination of the other vecto rs, nor that ci is nonzero. However, the argumen t JS analogous for some other vector v, or for a different scalar Cj . AJternati veiy, we can just relabel things so that they work ou t as in the above proof. In a situation like this, a mathematician might begin by sayi ng, " withou t loss o( generality, we may assume that VI is a linear combination of the other vectors" and then proceed as above.
Example 2.22
Any set of vectors containing the zero vector is linearly dependent. For i( O, v2, · .. , Ym a re in R", then we can find a nont rivial combination o f the form CIO + C:!Yz + . + C,,, Y m = Obysetting c1 = I andC:! = cJ = . . = c,n = O.
Example 2.23
Determine whether the fo llowing sets of vectors are linearl y independent:
[:],nd [-:] 1
{el
- I .
o
o
1
0
I ,
I , and 0
o
1
(b) - I
I , and
(dl
0
- I
1
1
1
1
2.
I ,a nd
4
o
1
1
- I
2
Solullol
In answe ri ng any question of this type, it is a good idea to see if you can d etermlOe by inspection whether one vector is a linear combination of the others. A li ttle tho ught may save a lot of com putatIOn !
Jtt) The only way two vecto rs can be li nearly dependen t is If one IS a multiple o f the other. (Why?) These two vecto rs aTe dearl y not multiples, so the y are linearly independent. ( b) There is no obvious dependence relation here, so we try to find scalars such that 1
'1
o o o
0
1 +£21
o
'p ' 2' cJ
1
The correspond ing linear system is
c, + c, + c,
CJ = 0 ~ O
~ + C,
= 0
and the augmented matrix is 1
0
1 0
1
1
0
1
0 0 1 0
Once again, we make the fundamental observation that the colum ns of the coefficient matrix are jus t the vectors in question!
Section 2.3
Spanning Sets and Li near Independen ce
91
The reduced row echelon form is
(check this), so
I
0
a
o o
I
0 0
0
I 0
0
'I = 0, ':! "" 0, CJ "" O. Thus, the given vectors are linearly independent.
(c) A Im1e reAection reveals that
1
0
-\ +
+
1
o
- I
- ]
0
0
0
I
0
so the three vectors are li nearly dependent. (Sct up a linear system as in parI (b) to check this algebra ically.) (d ) O nce again, we observe no ohvious dependence, so we proceed directly to red uce a homogeneous linear sys tem~h ose augmen ted matrix has as ils columns the given vectors: II,
\
\
\ 0
2
\
4 0
0
- \
2 0
II,- ~ II ,
•
\
\
\ 0
0
- \
2 0
0
- \
2 0
+ H,
• •, R.
\
0
3 0
0 0
\
- 2 0
0
0 0
If we let the scalars be c" ' 2' and '1, we have 'I
+
3(3 ::::
0
'1 - 2c, :: 0 from which we sec that the system has l1lfinitely many solutions. In particular, there
must be :1 nonzero solution, so the given vectors are li nearly dependent. If we continue, we can describe th ese solutions exactly: ' I "" - 3(, and Cz == 2cJ • Thus, for any nonzero value of C3• \Ve have the llnear dependence relation 1
- 3c, 2
1
+ 20
o
1
0
o o
1
-\
(O nce again , check that this is correct. )
•
We sum marize this procedure (or testlllS for linear independence as a theorem.
Theorem 2.6
leI
V I' V1•. •. , v'" be (colum n) IV I v2 • . • v",J with these vectors
vectors in [Rn and let A be the IIXm matrix as its columns. Then V I' v!' ...• v.. are linearly dependent if and only if the homogeneous linear system with augmented matrix iA I OJ has a nontrivial solu tio n. vI' v2' • •• , v", are lin early dependent if and on ly if there are scalars cp c2" . • , cm ' not all zero, such that ' I V I + S!v1 + .. . + c..,vm == O. By Theorem 2.4, this is equiv-
Prool
" alenl to saying that the nonzero vector augmented m:ltrix is [ VI v2 . . . Y", I 0].
is a solution of the system whose
SI
Chapter 2 Systems of Linear Equations
Example 2. 24
The standard umt ve<:tors e p e , and eJ are linearly independent in R', Slllce the sys2 tem wI th augmenTed matrix [el e e I O[ is already in the reduced row 1 J echeloll form I
0
o [ o
0
0 0 0 0 I 0
and so dea rly has only the trivial solution. In general , we see that et. e , 2 ••• , eft will be linearly independent in R".
Perform ing c1ementary row oJXf3tlons on a matrix constructs linear combinations of the row s. We can use Ihis fact to come up wi th anOlher way 10 tesl vectors for linear independence.
Examp le 2.25
Conside r the thre e vectors of Example 2.23(d) as ro w vectors: [1,2 ,0)'
[ 1, 1, - 11,
and
/1, 4, 21
We constru ct a matrix with these vectors 3S its row s ilnd proceed to redu ce it to eche lon fo rm. Each tim e a row changes, we denote the new rO\~ by adding a prim e symbol:
I 2 0 < I I - I I
4
2
,
" • • 0 , 0,
I
2
0
0 0
-I
- I
2
2
/r, - II", >211"·
•
I
2
0
0 0
- I
- I
0
0
From this we sec that
or,
1!I
term s of the orig inal vectors, - 3[ 1.2. 0J + 2[ 1, I , -I] + [ 1,4, 2J ~ [O, O,O J
[No tice that this 3pproach corresponds to t3klllg pl,2 .23 Id)· 1
CJ
= 1 in the sol utio n to Exam-
Thus, the rows of a matrix will be linearly dependent if elem entary ro\~ operations can be used to create a zero row. We SUmm3r1Ze th is fi ndin g as follo ws: ,
."
Let V "
V2 ' ••• ,
v'" be (row ) vectors in
thes e vect ors as its rows. The n vI' v ' 2 rank (A) < m.
R~ and
• , _,
let A be the mX /I matrix
' . II " with ~"
v. v'" art' linearly dependt:nt if and only jf
Proof Ass ume that v,, VI " .• , V... are linearly dept"ndenl. Then, by The orem 2.2, at least one of the vectors can be wriHen as a linear combination of the othe rs. We
Section 2.3
Span ning SC1S and Lmear Independence
99
relabel the vectors, if necessary, so that we can write v", "" civ i + C2V2 + ... + c",_lvm_I' Then the elementary row operatIOns Rm - c1RI • Rm - c2R2, • • . , Rm - cm_I Rm_ 1applied to A will create a zero row In row til. Thus, rank.{A) < n1. Conver$ely, assume that rank(A) < m. Then there is some sequence of row operations that will create a zero row. A successive subst it ution argument analogolls to that used in Example 2.25 can be used to show that 0 is a nontrivial linear combination of v I' v 2' . •• , v .... Thus, vI' V I" •• , V m are linearly dependent. In some si tuations, we GIn ded uce that a set ofvec lors is linearly dependent wi thout domg any work. One such sItuation IS \"hen the zero vector is III the sct (as in Example 2.22). Ano ther is when there arc " too many" ve
Theorem 2.8
•
Any set of til vectors in
Prool Let
R~
is linearly dependent if III >
II.
v 2' •••• v m be (colum n) veCiors 10 R" and let A be the " X III mal rix [VI v2 . •• v",1 wllh these vectors as its co lum ns. By Theorem 2.6, v2••••• v", are linearly dependen t if and only if the homogeneous linear system with augmented matrix fA I OJ has a nontrivial solution. But , according to Theorem 2.6, this will always be the case if A has more columns than rows; it is the case here. since number of col umns m is greater than number of rows fl.
Example 2.26
Vi'
The vectors
v,.
[ ~ ]. [ ~ ], and [~] are linearly d ependent , since there cannot be more
than two linearly indcpcn(lent vectors in [R2. (Note that if we wan t to fi nd the actual dependence relation among thest three vectors, we must solve the homogeneous system whose coefficient matrix has the given vectors as columns. Do this!>
Exercises 2.3

In Exercises 1-6, determine if the vector v is a linear combination of the remaining vectors.
1. …   2. …   3. …   4. …   5. …   6. …

In Exercises 7 and 8, determine if the vector b is in the span of the columns of the matrix A.
7. …   8. …

9. Show that R^2 = span(…).
10. Show that W = span(…).
11. Show that R^3 = span(…). [Hint: We know that R^3 = span(e1, e2, e3).]
12. Show that … = span(…).

In Exercises 13-16, describe the span of the given vectors (a) geometrically and (b) algebraically.
13. …   14. …   15. …   16. …

17. The general equation of the plane that contains the points (1, 0, 3), (-1, 1, -3), and the origin is of the form ax + by + cz = 0. Solve for a, b, and c.
18. Prove that u, v, and w are all in span(u, v, w).
19. Prove that u, v, and w are all in span(u, u + v, u + v + w).
20. (a) Prove that if u1, ..., um are vectors in R^n, S = {u1, u2, ..., uk}, and T = {u1, ..., uk, uk+1, ..., um}, then span(S) is contained in span(T). (Hint: Rephrase this question in terms of linear combinations.)
    (b) Deduce that if R^n = span(S), then R^n = span(T) also.
21. (a) Suppose that vector w is a linear combination of vectors u1, ..., uk, and that each ui is a linear combination of vectors v1, ..., vm. Prove that w is a linear combination of v1, ..., vm and therefore span(u1, ..., uk) is contained in span(v1, ..., vm).
    (b) In part (a), suppose in addition that each vj is also a linear combination of u1, ..., uk. Prove that span(u1, ..., uk) = span(v1, ..., vm).
    (c) Use the result of part (b) to prove that …

Use the method of Example 2.23 and Theorem 2.6 to determine if the sets of vectors in Exercises 22-31 are linearly independent. If, for any of these, the answer can be determined by inspection (i.e., without calculation), state why. For any sets that are linearly dependent, find a dependence relationship among the vectors.
22. …   23. …   24. …   25. …   26. …   27. …   28. …   29. …   30. …   31. …

In Exercises 32-41, determine if the sets of vectors in the given exercise are linearly independent by converting the vectors to row vectors and using the method of Example 2.25 and Theorem 2.7. For any sets that are linearly dependent, find a dependence relationship among the vectors.
32. Exercise 22   33. Exercise 23   34. Exercise 24   35. Exercise 25   36. Exercise 26
37. Exercise 27   38. Exercise 28   39. Exercise 29   40. Exercise 30   41. Exercise 31

42. (a) If the columns of an n x n matrix A are linearly independent as vectors in R^n, what is the rank of A? Explain.
    (b) If the rows of an n x n matrix A are linearly independent as vectors in R^n, what is the rank of A? Explain.
43. (a) If vectors u, v, and w are linearly independent, will u + v, v + w, and u + w also be linearly independent? Justify your answer.
    (b) If vectors u, v, and w are linearly independent, will u - v, v - w, and u - w also be linearly independent? Justify your answer.
44. Prove that two vectors are linearly dependent if and only if one is a scalar multiple of the other. (Hint: Separately consider the case where one of the vectors is 0.)
45. Give a "row vector proof" of Theorem 2.8.
46. Prove that every subset of a linearly independent set is linearly independent.
47. Suppose that S = {v1, ..., vk, v} is a set of vectors in some R^n and that v is a linear combination of v1, ..., vk. If S' = {v1, ..., vk}, prove that span(S) = span(S'). [Hint: Exercise 21(b) is helpful here.]
48. Let {v1, ..., vk} be a linearly independent set of vectors in R^n, and let v be a vector in R^n. Suppose that v = c1v1 + c2v2 + ... + ckvk with c1 not equal to 0. Prove that {v, v2, ..., vk} is linearly independent.
Applications

There are too many applications of systems of linear equations to do them justice in a single section. This section will introduce a few applications to illustrate the diverse settings in which they arise.
Allocation of Resources

A great many applications of systems of linear equations involve allocating limited resources subject to a set of constraints.
Example 2.21
A biologist has placed three st rains of bacteria (denoted I, II, and III ) in a test tube, where they will feed on threedifTerent food sources (A, 13, and C). Each day 2300 units of A, BOO units of B, and 1500 units of C are placed in the test tube, and each bac· terium consumes a certain nu mber of units of each food per day, as shown in Table 2.2 . How many b'lCteria of each strain ca n coexist in the test tube and consume ti ll of the foo d?
Table 2.Z
Food A Food B FoodC
Bacteria Strain I
Bacteria Strain II
Bacteria Strain III
2
2 2 3
4
I I
0 I
Solution  Let x1, x2, and x3 be the numbers of bacteria of strains I, II, and III, respectively. Since each of the x1 bacteria of strain I consumes 2 units of A per day, strain I consumes a total of 2x1 units per day. Similarly, strains II and III consume a total of 2x2 and 4x3 units of food A daily. Since we want to use up all 2300 units of food A, we have the equation

2x1 + 2x2 + 4x3 = 2300

Likewise, we obtain equations corresponding to the consumption of B and C:

x1 + 2x2        =  800
x1 + 3x2 +  x3  = 1500

Thus, we have a system of three linear equations in three variables. Row reduction of the corresponding augmented matrix gives

[2 2 4 | 2300]       [1 0 0 | 100]
[1 2 0 |  800]  -->  [0 1 0 | 350]
[1 3 1 | 1500]       [0 0 1 | 350]

Therefore, x1 = 100, x2 = 350, and x3 = 350. The biologist should place 100 bacteria of strain I and 350 of each of strains II and III in the test tube if she wants all the food to be consumed.
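The same answer can be obtained numerically; the short sketch below, using NumPy (an assumed tool choice, not one prescribed by the text), solves the system of Example 2.27 directly.

    # Solve the resource-allocation system of Example 2.27 numerically.
    import numpy as np

    A = np.array([[2, 2, 4],      # units of food A consumed per bacterium of each strain
                  [1, 2, 0],      # food B
                  [1, 3, 1]])     # food C
    b = np.array([2300, 800, 1500])

    x = np.linalg.solve(A, b)
    print(x)    # approximately [100., 350., 350.]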
Example 2.28
Repeat Example 2.27, using the data on daily consumption of food (units per day) shown in Table 2.3. Assume this time that 1500 units of A, 3000 units of B, and 4500 units of C are placed in the test tube each day.

Table 2.3
          Bacteria    Bacteria    Bacteria
          Strain I    Strain II   Strain III
Food A       1           1            1
Food B       1           2            3
Food C       1           3            5
Solution  Let x1, x2, and x3 again be the numbers of bacteria of each type. The augmented matrix for the resulting linear system and the corresponding reduced echelon form are

[1 1 1 | 1500]       [1 0 -1 |    0]
[1 2 3 | 3000]  -->  [0 1  2 | 1500]
[1 3 5 | 4500]       [0 0  0 |    0]

We see that in this case we have more than one solution, given by

x1      -  x3 =    0
     x2 + 2x3 = 1500

Letting x3 = t, we obtain x1 = t, x2 = 1500 - 2t, and x3 = t. In any applied problem, we must be careful to interpret solutions properly. Certainly the number of bacteria cannot be negative. Therefore, t >= 0 and 1500 - 2t >= 0. The latter inequality implies that t <= 750, so we have 0 <= t <= 750. Presumably the number of bacteria must be a whole number, so there are exactly 751 values of t that satisfy the inequality. Thus, our 751 solutions are of the form

[x1]   [    t    ]   [   0]       [ 1]
[x2] = [1500 - 2t] = [1500] + t * [-2]
[x3]   [    t    ]   [   0]       [ 1]

one for each integer value of t such that 0 <= t <= 750. (So, although mathematically this system has infinitely many solutions, physically there are only finitely many.)
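Since the system in Example 2.28 has a free variable, it is natural to generate the finitely many physically meaningful solutions from the parametric description rather than call a solver. A small sketch in plain Python, written only as an illustration:

    # Enumerate the 751 nonnegative integer solutions of Example 2.28.
    solutions = [(t, 1500 - 2 * t, t) for t in range(0, 751)]   # (x1, x2, x3) with x3 = t

    print(len(solutions))    # 751
    print(solutions[0])      # (0, 1500, 0)
    print(solutions[750])    # (750, 0, 750)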
Balancing Chemical Equations

When a chemical reaction occurs, certain molecules (the reactants) combine to form new molecules (the products). A balanced chemical equation is an algebraic equation that gives the relative numbers of reactants and products in the reaction and has the same number of atoms of each type on the left- and right-hand sides. The equation is usually written with the reactants on the left, the products on the right, and an arrow in between to show the direction of the reaction.

For example, for the reaction in which hydrogen gas (H2) and oxygen (O2) combine to form water (H2O), a balanced chemical equation is

2H2 + O2 -> 2H2O

indicating that two molecules of hydrogen combine with one molecule of oxygen to form two molecules of water. Observe that the equation is balanced, since there are four hydrogen atoms and two oxygen atoms on each side. Note that there will never be a unique balanced equation for a reaction, since any positive integer multiple of a balanced equation will also be balanced. For example, 6H2 + 3O2 -> 6H2O is also balanced. Therefore, we usually look for the simplest balanced equation for a given reaction.

While trial and error will often work in simple examples, the process of balancing chemical equations really involves solving a homogeneous system of linear equations, so we can use the techniques we have developed to remove the guesswork.
Example 2.29
The combustion of ammonia (NH3) in oxygen produces nitrogen (N2) and water. Find a balanced chemical equation for this reaction.

Solution  If we denote the numbers of molecules of ammonia, oxygen, nitrogen, and water by w, x, y, and z, respectively, then we are seeking an equation of the form

w NH3 + x O2 -> y N2 + z H2O

Comparing the numbers of nitrogen, hydrogen, and oxygen atoms in the reactants and products, we obtain three linear equations:

Nitrogen:   w = 2y
Hydrogen:  3w = 2z
Oxygen:    2x = z

Rewriting these equations in standard form gives us a homogeneous system of three linear equations in four variables. [Notice that Theorem 2.3 guarantees that such a system will have (infinitely many) nontrivial solutions.] We reduce the corresponding augmented matrix by Gauss-Jordan elimination:

 w      - 2y      = 0        [1 0 -2  0 | 0]       [1 0 0 -2/3 | 0]
3w           - 2z = 0   -->  [3 0  0 -2 | 0]  -->  [0 1 0 -1/2 | 0]
     2x      -  z = 0        [0 2  0 -1 | 0]       [0 0 1 -1/3 | 0]

Thus, w = (2/3)z, x = (1/2)z, and y = (1/3)z. The smallest positive value of z that will produce integer values for all four variables is the least common denominator of the fractions 2/3, 1/2, and 1/3, namely 6, which gives w = 4, x = 3, y = 2, and z = 6. Therefore, the balanced chemical equation is

4NH3 + 3O2 -> 2N2 + 6H2O
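Balancing a chemical equation amounts to finding an integer vector in the null space of the atom-balance matrix. The sketch below, which assumes the SymPy library (a choice made only for this illustration), carries this out for the ammonia reaction of Example 2.29.

    # Balance w NH3 + x O2 -> y N2 + z H2O by solving the homogeneous atom-balance system.
    from sympy import Matrix
    from math import lcm

    # Columns correspond to (w, x, y, z); rows balance N, H, and O atoms.
    A = Matrix([[1, 0, -2,  0],    # nitrogen:  w = 2y
                [3, 0,  0, -2],    # hydrogen: 3w = 2z
                [0, 2,  0, -1]])   # oxygen:   2x = z

    v = A.nullspace()[0]                             # a rational solution, e.g. (2/3, 1/2, 1/3, 1)
    scale = lcm(*[int(term.q) for term in v])        # clear the denominators
    print((v * scale).T)                             # [4, 3, 2, 6]: 4 NH3 + 3 O2 -> 2 N2 + 6 H2O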
Network Analysis

Many practical situations give rise to networks: transportation networks, communications networks, and economic networks, to name a few. Of particular interest are the possible flows through networks. For example, vehicles flow through a network of roads, information flows through a data network, and goods and services flow through an economic network. For us, a network will consist of a finite number of nodes (also called junctions or vertices) connected by a series of directed edges known as branches or arcs. Each branch will be labeled with a flow that represents the amount of some commodity that can flow along that branch in the indicated direction. At each node we require conservation of flow: the total flow into the node must equal the total flow out of it.

Figure 2.10  Flow at a node: f1 + f2 = 50
Figure 2.10 shows a portion of a network, with two branches entering a node and two leaving. The conservation of flow rule implies that the total incoming flow, f1 + f2 units, must match the total outgoing flow, 20 + 30 units. Thus, we have the linear equation f1 + f2 = 50 corresponding to this node. We can analyze the flow through an entire network by constructing such equations at each of its nodes.

Example 2.30
Describe the possible flows through the network of water pipes shown in Figure 2.11, where flow is measured in liters per minute.
Solution  At each node, we write out the equation that represents the conservation of flow there. We then rewrite each equation with the variables on the left and the constant on the right, to get a linear system in standard form.

Node A:  15 = f1 + f4
Node B:  f1 = f2 + 10
Node C:  f2 + f3 + 5 = 30
Node D:  f4 + 20 = f3

In standard form, the system is

f1           + f4 = 15
f1 - f2           = 10
     f2 + f3      = 25
          f3 - f4 = 20
Figure 2.11  A network of water pipes, with flows measured in liters per minute

Using Gauss-Jordan elimination, we reduce the augmented matrix:
[1  0  0  1 | 15]       [1 0 0  1 | 15]
[1 -1  0  0 | 10]  -->  [0 1 0  1 |  5]
[0  1  1  0 | 25]       [0 0 1 -1 | 20]
[0  0  1 -1 | 20]       [0 0 0  0 |  0]

(Check this.) We see that there is one free variable, f4, so we have infinitely many solutions. Setting f4 = t and expressing the leading variables in terms of f4, we obtain
f1 = 15 - t
f2 =  5 - t
f3 = 20 + t
f4 =      t

These equations describe all possible flows and allow us to analyze the network. For example, we see that if we control the flow on branch AD so that t = 5 L/min, then the other flows are f1 = 10, f2 = 0, and f3 = 25. We can do even better: We can find the minimum and maximum possible flows on each branch. Each of the flows must be nonnegative. Examining the first and second equations in turn, we see that t <= 15 (otherwise f1 would be negative) and t <= 5 (otherwise f2 would be negative). The second of these inequalities is more restrictive than the first, so we must use it. The third equation contributes no further restrictions on our parameter t, so we have deduced that 0 <= t <= 5. Combining this result with the four equations, we see that

10 <= f1 <= 15
 0 <= f2 <=  5
20 <= f3 <= 25
 0 <= f4 <=  5

This is a complete description of the possible flows through this network.
Electrical Networks

Electrical networks are a specialized type of network providing information about power sources, such as batteries, and devices powered by these sources, such as light bulbs or motors. A power source "forces" a current of electrons to flow through the network, where it encounters various resistors, each of which requires that a certain amount of force be applied in order for the current to flow through it.

The fundamental law of electricity is Ohm's law, which states exactly how much force E is needed to drive a current I through a resistor with resistance R.

Ohm's Law
force = resistance x current, or E = RI

Force is measured in volts, resistance in ohms, and current in amperes (or amps, for short). Thus, in terms of these units, Ohm's law becomes "volts = ohms x amps," and it tells us what the "voltage drop" is when a current passes through a resistor; that is, how much voltage is used up.

Current flows out of the positive terminal of a battery and flows back into the negative terminal, traveling around one or more closed circuits in the process. In a diagram of an electrical network, batteries are represented by a pair of parallel bars (where the positive terminal is the longer bar) and resistors are represented by a zigzag symbol. The following two laws, whose discovery we owe to Kirchhoff, govern electrical networks. The first is a "conservation of flow" law at each node; the second is a "balancing of voltage" law around each circuit.

Kirchhoff's Laws
Current Law (nodes)  The sum of the currents flowing into any node is equal to the sum of the currents flowing out of that node.
Voltage Law (circuits)  The sum of the voltage drops around any circuit is equal to the total voltage around the circuit (provided by the batteries).

Figure 2.12 illustrates Kirchhoff's laws. In part (a), the current law gives I1 = I2 + I3 (or I1 - I2 - I3 = 0, as we will write it); part (b) gives 4I = 10, where we have used Ohm's law to compute the voltage drop 4I at the resistor. Using Kirchhoff's laws, we can set up a system of linear equations that will allow us to determine the currents in an electrical network.
Example 2.31
Determine the currents I1, I2, and I3 in the electrical network shown in Figure 2.13.

Solution  This network has two batteries and four resistors. Current I1 flows through the top branch BCA, current I2 flows across the middle branch AB, and current I3 flows through the bottom branch BDA. At node A, the current law gives I1 + I3 = I2, or

I1 - I2 + I3 = 0

(Observe that we get the same equation at node B.)
Figure 2.12  Kirchhoff's laws: (a) at a node, I1 = I2 + I3; (b) around a circuit, 4I = 10
Figure 2.13  An electrical network with an 8-volt battery, a 16-volt battery, and resistors of 2, 2, 1, and 4 ohms
Next we apply the voltage law for each circuit. For the circuit CABC, the voltage drops at the resistors are 2I1, I2, and 2I1. Thus, we have the equation

4I1 + I2 = 8

Similarly, for the circuit DABD, we obtain

I2 + 4I3 = 16

(Notice that there is actually a third circuit, CADBC, if we "go against the flow." In this case, we must treat the voltages and resistances on the "reversed" paths as negative. Doing so gives 2I1 + 2I1 - 4I3 = 8 - 16 = -8, or 4I1 - 4I3 = -8, which we observe is just the difference of the voltage equations for the other two circuits. Thus, we can omit this equation, as it contributes no new information. On the other hand, including it does no harm.)

We now have a system of three linear equations in three variables:

I1 -  I2 +  I3 =  0
4I1 +  I2      =  8
      I2 + 4I3 = 16

Gauss-Jordan elimination produces

[1 -1 1 |  0]       [1 0 0 | 1]
[4  1 0 |  8]  -->  [0 1 0 | 4]
[0  1 4 | 16]       [0 0 1 | 3]

Hence, the currents are I1 = 1 amp, I2 = 4 amps, and I3 = 3 amps.
Remark  In some electrical networks, the currents may have fractional values or may even be negative. A negative value simply means that the current in the corresponding branch flows in the direction opposite that shown on the network diagram.
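The three equations of Example 2.31 form a square system, so any linear solver will reproduce the currents. A minimal sketch with NumPy (an assumed library choice, not one prescribed by the text):

    # Currents in Example 2.31: I1 - I2 + I3 = 0, 4*I1 + I2 = 8, I2 + 4*I3 = 16.
    import numpy as np

    A = np.array([[1, -1, 1],
                  [4,  1, 0],
                  [0,  1, 4]], dtype=float)
    b = np.array([0, 8, 16], dtype=float)

    I1, I2, I3 = np.linalg.solve(A, b)
    print(I1, I2, I3)    # 1.0 4.0 3.0 (amps)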
CAS
Example 2.32
The network shown in Figure 2.14 has a single power source A and five resistors. Find the currents I, I1, ..., I5. This is an example of what is known in electrical engineering as a Wheatstone bridge circuit.
Figure 2.14  A bridge circuit: a 10-volt source at A and five resistors of 1 or 2 ohms
Solution  Kirchhoff's current law gives the following equations at the four nodes:

Node B:  I - I1 - I4 = 0
Node C:  I1 - I2 - I3 = 0
Node D:  I - I2 - I5 = 0
Node E:  I3 + I4 - I5 = 0

For the three basic circuits, the voltage law gives

Circuit ABEDA:  I4 + 2I5 = 10
Circuit BCEB:   2I1 + 2I3 - I4 = 0
Circuit CDEC:   I2 - 2I5 - 2I3 = 0

(Observe that branch DAB has no resistor and therefore no voltage drop; thus, there is no I term in the equation for circuit ABEDA. Note also that we had to change signs three times because we went "against the current." This poses no problem, since we will let the sign of the answer determine the direction of current flow.)

We now have a system of seven equations in six variables. Row reduction gives
[1 -1  0  0 -1  0 |  0]       [1 0 0 0 0 0 |  7]
[0  1 -1 -1  0  0 |  0]       [0 1 0 0 0 0 |  3]
[1  0 -1  0  0 -1 |  0]       [0 0 1 0 0 0 |  4]
[0  0  0  1  1 -1 |  0]  -->  [0 0 0 1 0 0 | -1]
[0  0  0  0  1  2 | 10]       [0 0 0 0 1 0 |  4]
[0  2  0  2 -1  0 |  0]       [0 0 0 0 0 1 |  3]
[0  0  1 -2  0 -2 |  0]       [0 0 0 0 0 0 |  0]

(Use your calculator or CAS to check this.) Thus, the solution (in amps) is I = 7, I1 = I5 = 3, I2 = I4 = 4, and I3 = -1. The significance of the negative value here is that the current through branch CE is flowing in the direction opposite that marked on the diagram.
Remark  There is only one power source in this example, so the single 10-volt battery sends a current of 7 amps through the network. If we substitute these values into Ohm's law, E = RI, we get 10 = 7R or R = 10/7. Thus, the entire network behaves as if there were a single 10/7-ohm resistor. This value is called the effective resistance (Reff) of the network.
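Example 2.32 gives seven equations in six unknowns, but the system is consistent, so a least-squares routine (one possible approach, assumed here rather than prescribed by the text) recovers the unique solution; the effective resistance from the Remark then follows from Ohm's law.

    # The seven Kirchhoff equations of Example 2.32 in the unknowns (I, I1, I2, I3, I4, I5).
    import numpy as np

    A = np.array([
        [1, -1,  0,  0, -1,  0],   # node B
        [0,  1, -1, -1,  0,  0],   # node C
        [1,  0, -1,  0,  0, -1],   # node D
        [0,  0,  0,  1,  1, -1],   # node E
        [0,  0,  0,  0,  1,  2],   # circuit ABEDA
        [0,  2,  0,  2, -1,  0],   # circuit BCEB
        [0,  0,  1, -2,  0, -2],   # circuit CDEC
    ], dtype=float)
    b = np.array([0, 0, 0, 0, 10, 0, 0], dtype=float)

    currents, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.round(currents, 3))            # [ 7.  3.  4. -1.  4.  3.]
    I = currents[0]
    print("effective resistance:", 10 / I)  # about 10/7 ohms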
Finite Linear Games

There are many situations in which we must consider a physical system that has only a finite number of states. Sometimes these states can be altered by applying certain processes, each of which produces finitely many outcomes. For example, a light bulb can be on or off, and a switch can change the state of the light bulb from on to off and vice versa. Digital systems that arise in computer science are often of this type. More frivolously, many computer games feature puzzles in which a certain device must be manipulated by various switches to produce a desired outcome. The finiteness of such situations is perfectly suited to analysis using modular arithmetic, and often linear systems over some Zp play a role. Problems involving this type of situation are often called finite linear games.
Example 2.33
A row of five lights is controlled by five switches. Each switch changes the state (on or off) of the light directly above it and the states of the lights immediately adjacent to the left and right. For example, if the first and third lights are on, as in Figure 2.15(a), then pushing switch A changes the state of the system to that shown in Figure 2.15(b). If we next push switch C, then the result is the state shown in Figure 2.15(c). Suppose that initially all the lights are off. Can we push the switches in some order so that only the first, third, and fifth lights will be on? Can we push the switches in some order so that only the first light will be on?
Figure 2.15  (a) The initial state; (b) after switch A is pushed; (c) after switch C is then pushed
Solution  The on/off nature of this problem suggests that binary notation will be helpful and that we should work with Z2. Accordingly, we represent the states of the five lights by a vector in Z2^5, where 0 represents off and 1 represents on. Thus, for example, the vector [0, 1, 1, 0, 0] corresponds to Figure 2.15(b).
We may also use vectors in Z2^5 to represent the action of each switch. If a switch changes the state of a light, the corresponding component is a 1; otherwise, it is 0. With this convention, the actions of the five switches are given by

a = [1, 1, 0, 0, 0],  b = [1, 1, 1, 0, 0],  c = [0, 1, 1, 1, 0],  d = [0, 0, 1, 1, 1],  e = [0, 0, 0, 1, 1]
The situation depicted in Figure 2.15(a) corresponds to the initial state s = [1, 0, 1, 0, 0]. Pushing switch A adds a = [1, 1, 0, 0, 0] to it, giving the vector sum (in Z2^5)

s + a = [0, 1, 1, 0, 0]

Observe that this result agrees with Figure 2.15(b). Starting with any initial configuration s, suppose we push the switches in the order A, C, D, A, C, B. This corresponds to the vector sum s + a + c + d + a + c + b. But in Z2, addition is commutative, so we have

s + a + c + d + a + c + b = s + 2a + b + 2c + d = s + b + d

where we have used the fact that 2 = 0 in Z2. Thus, we would achieve the same result by pushing only B and D, and the order does not matter. (Check that this is correct.) Hence, in this example, we do not need to push any switch more than once. So, to see if we can achieve a target configuration t starting from an initial configuration s, we need to determine whether there are scalars x1, ..., x5 in Z2 such that

s + x1a + x2b + x3c + x4d + x5e = t
In other words, we need to solve (if possible) the linear system over Z2 that corresponds to the vector equation

x1a + x2b + x3c + x4d + x5e = t - s = t + s

(recall that -s = s in Z2).
In this case, s = 0 and our first target configuration is

t = [1, 0, 1, 0, 1]
The augmented matrix of this system has the given vectors as columns:

[1 1 0 0 0 | 1]
[1 1 1 0 0 | 0]
[0 1 1 1 0 | 1]
[0 0 1 1 1 | 0]
[0 0 0 1 1 | 1]

We reduce it over Z2 to obtain

[1 0 0 0 1 | 0]
[0 1 0 0 1 | 1]
[0 0 1 0 0 | 1]
[0 0 0 1 1 | 1]
[0 0 0 0 0 | 0]

Thus, x5 is a free variable. Hence, there are exactly two solutions (corresponding to x5 = 0 and x5 = 1). Solving for the other variables in terms of x5, we obtain

x1 = x5
x2 = 1 + x5
x3 = 1
x4 = 1 + x5

So, when x5 = 0 and x5 = 1, we have the solutions

[x1, x2, x3, x4, x5] = [0, 1, 1, 1, 0]   and   [x1, x2, x3, x4, x5] = [1, 0, 1, 0, 1]

respectively. (Check that these both work.)
Similarly, in the second case, we have

t = [1, 0, 0, 0, 0]

The augmented matrix reduces as follows:

[1 1 0 0 0 | 1]       [1 0 0 0 1 | 1]
[1 1 1 0 0 | 0]       [0 1 0 0 1 | 0]
[0 1 1 1 0 | 0]  -->  [0 0 1 0 0 | 1]
[0 0 1 1 1 | 0]       [0 0 0 1 1 | 1]
[0 0 0 1 1 | 0]       [0 0 0 0 0 | 1]

showing that there is no solution in this case; that is, it is impossible to start with all of the lights off and turn only the first light on.
Example 2.33 shows the power of linear algebra. Even though we might have found out by trial and error that there was no solution, checking all possible ways to push the switches would have been extremely tedious. We might also have missed the fact that no switch need ever be pushed more than once.
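The arithmetic in Example 2.33 is ordinary Gaussian elimination carried out modulo 2. The sketch below is written in plain Python; gauss_mod_p is a hypothetical helper created for this illustration (it is not part of the text) that row-reduces an augmented matrix over Zp for a prime p and returns one solution, or None if the system is inconsistent.

    # Hypothetical helper: solve A x = b over Z_p (p prime) by Gaussian elimination.
    def gauss_mod_p(A, b, p):
        rows, cols = len(A), len(A[0])
        M = [[A[i][j] % p for j in range(cols)] + [b[i] % p] for i in range(rows)]
        pivot_cols = []
        r = 0
        for c in range(cols):
            pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
            if pivot is None:
                continue
            M[r], M[pivot] = M[pivot], M[r]
            inv = pow(M[r][c], p - 2, p)               # modular inverse, since p is prime
            M[r] = [(entry * inv) % p for entry in M[r]]
            for i in range(rows):
                if i != r and M[i][c] != 0:
                    factor = M[i][c]
                    M[i] = [(M[i][j] - factor * M[r][j]) % p for j in range(cols + 1)]
            pivot_cols.append(c)
            r += 1
        if any(all(v == 0 for v in row[:cols]) and row[cols] != 0 for row in M):
            return None                                 # inconsistent: the target is unreachable
        x = [0] * cols                                  # free variables are set to 0
        for i, c in enumerate(pivot_cols):
            x[c] = M[i][cols]
        return x

    # Switch vectors of Example 2.33 become the columns of the coefficient matrix.
    switches = [[1, 1, 0, 0, 0],   # a: switch A toggles lights 1 and 2
                [1, 1, 1, 0, 0],   # b
                [0, 1, 1, 1, 0],   # c
                [0, 0, 1, 1, 1],   # d
                [0, 0, 0, 1, 1]]   # e: switch E toggles lights 4 and 5
    A = [list(col) for col in zip(*switches)]
    print(gauss_mod_p(A, [1, 0, 1, 0, 1], 2))   # [0, 1, 1, 1, 0]: push B, C, D
    print(gauss_mod_p(A, [1, 0, 0, 0, 0], 2))   # None: only the first light on is impossible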
Example 2.34
Consider a row with only three lights, each of which can be off, light blue, or dark blue. Below the lights are three switches, A, B, and C, each of which changes the states of particular lights to the next state, in the order shown in Figure 2.16. Switch A changes the states of the first two lights, switch B all three lights, and switch C the last two lights. If all three lights are initially off, is it possible to push the switches in some order so that the lights are off, light blue, and dark blue, in that order (as in Figure 2.17)?
Figure 2.16  The states cycle in the order off -> light blue -> dark blue -> off
Figure 2.17  The target configuration: off, light blue, dark blue
Solution  Whereas Example 2.33 involved Z2, this one clearly (is it clear?) involves Z3. Accordingly, the switches correspond to the vectors

a = [1, 1, 0],  b = [1, 1, 1],  c = [0, 1, 1]
in Z3^3, and the final configuration we are aiming for is

t = [0, 1, 2]

(Off is 0, light blue is 1, and dark blue is 2.) We wish to find scalars x1, x2, x3 in Z3 such that

x1a + x2b + x3c = t

(where xi represents the number of times the ith switch is pushed). This equation gives rise to the augmented matrix [a b c | t], which reduces over Z3 as follows:

[1 1 0 | 0]       [1 0 0 | 2]
[1 1 1 | 1]  -->  [0 1 0 | 1]
[0 1 1 | 2]       [0 0 1 | 1]

Hence, there is a unique solution: x1 = 2, x2 = 1, x3 = 1. In other words, we must push switch A twice and the other two switches once each. (Check this.)
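Example 2.34 can also be settled by brute force, since there are only 3^3 = 27 possible combinations of switch pushes over Z3. A self-contained sketch in plain Python, written only as an illustration:

    # Brute-force search over Z_3: find (x1, x2, x3) with x1*a + x2*b + x3*c = t (mod 3).
    from itertools import product

    a, b, c, t = (1, 1, 0), (1, 1, 1), (0, 1, 1), (0, 1, 2)
    for x1, x2, x3 in product(range(3), repeat=3):
        if all((x1 * a[i] + x2 * b[i] + x3 * c[i]) % 3 == t[i] for i in range(3)):
            print(x1, x2, x3)        # prints 2 1 1 only: push A twice, B once, C once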
Exercises 2.4

Allocation of Resources

1. Suppose that, in Example 2.27, 400 units of food A, 600 units of B, and 600 units of C are placed in the test tube each day and the data on daily food consumption by the bacteria (in units per day) are as shown in Table 2.4. How many bacteria of each strain can coexist in the test tube and consume all of the food?

Table 2.4
          Bacteria    Bacteria    Bacteria
          Strain I    Strain II   Strain III
Food A       1           2            0
Food B       2           1            1
Food C       1           1            2

2. Suppose that, in Example 2.27, 400 units of food A, 500 units of B, and 600 units of C are placed in the test tube each day and the data on daily food consumption by the bacteria (in units per day) are as shown in Table 2.5. How many bacteria of each strain can coexist in the test tube and consume all of the food?

Table 2.5
          Bacteria    Bacteria    Bacteria
          Strain I    Strain II   Strain III
Food A       1           2            0
Food B       2           1            3
Food C       1           1            1

3. A florist offers three sizes of flower arrangements containing roses, daisies, and chrysanthemums. Each small arrangement contains one rose, three daisies, and three chrysanthemums. Each medium arrangement contains two roses, four daisies, and six chrysanthemums. Each large arrangement contains four roses, eight daisies, and six chrysanthemums. One day, the florist noted that she used a total of 24 roses, 50 daisies, and 48 chrysanthemums in filling orders for these three types of arrangements. How many arrangements of each type did she make?

4. (a) In your pocket you have some nickels, dimes, and quarters. There are 20 coins altogether and exactly twice as many dimes as nickels. The total value of the coins is $3.00. Find the number of coins of each type.
   (b) Find all possible combinations of 20 coins (nickels, dimes, and quarters) that will make exactly $3.00.

5. A coffee merchant sells three blends of coffee. A bag of the house blend contains 300 grams of Colombian beans and 200 grams of French roast beans. A bag of the special blend contains 200 grams of Colombian beans, 200 grams of Kenyan beans, and 100 grams of French roast beans. A bag of the gourmet blend
contains 100 grams of Colombian beans, 200 grams of Kenyan beans, and 200 grams of French roast beans. The merchant has on hand 30 kilograms of Colombian beans, 15 kilograms of Kenyan beans, and 25 kilograms of French roast beans. If he wishes to use up all of the beans, how many bags of each type of blend can be made?
6. Redo Exercise 5, assuming that the house blend contains 300 grams of Colombian beans, 50 grams of Kenyan beans, and 150 grams of French roast beans and the gourmet blend contains 100 grams of Colombian beans, 350 grams of Kenyan beans, and 50 grams of French roast beans. This time the merchant has on hand 30 kilograms of Colombian beans, 15 kilograms of Kenyan beans, and 15 kilograms of French roast beans. Suppose one bag of the house blend produces a profit of $0.50, one bag of the special blend produces a profit of $1.50, and one bag of the gourmet blend produces a profit of $2.00. How many bags of each type should the merchant prepare if he wants to use up all of the beans and maximize his profit? What is the maximum profit?
Balancing Chemical Equations

In Exercises 7-14, balance the chemical equation for each reaction.

7. FeS2 + O2 -> Fe2O3 + SO2
8. CO2 + H2O -> C6H12O6 + O2 (This reaction takes place when a green plant converts carbon dioxide and water to glucose and oxygen during photosynthesis.)
9. C4H10 + O2 -> CO2 + H2O (This reaction occurs when butane, C4H10, burns in the presence of oxygen to form carbon dioxide and water.)
10. C7H6O2 + O2 -> H2O + CO2
11. C5H11OH + O2 -> H2O + CO2 (This equation represents the combustion of amyl alcohol.)
12. HClO4 + P4O10 -> H3PO4 + Cl2O7
13. Na2CO3 + C + N2 -> NaCN + CO
14. C2H2Cl4 + Ca(OH)2 -> C2HCl3 + CaCl2 + H2O

Network Analysis

15. Figure 2.18 shows a network of water pipes with flows measured in liters per minute.
    (a) Set up and solve a system of linear equations to find the possible flows.
    (b) If the flow through AB is restricted to 5 L/min, what will the flows through the other two branches be?
    (c) What are the minimum and maximum possible flows through each branch?
    (d) We have been assuming that flow is always positive. What would negative flow mean, assuming we allowed it? Give an illustration for this example.

Figure 2.18  [Network of water pipes for Exercise 15]

16. The downtown core of Gotham City consists of one-way streets, and the traffic flow has been measured at each intersection. For the city block shown in Figure 2.19, the numbers represent the average numbers of vehicles per minute entering and leaving intersections A, C, and D during business hours.
    (a) Set up and solve a system of linear equations to find the possible flows.
    (b) If traffic is regulated on CD so that the flow there is 10 vehicles per minute, what will the average flows on the other streets be?
    (c) What are the minimum and maximum possible flows on each street?
    (d) How would the solution change if all of the directions were reversed?

Figure 2.19  [Traffic flow in the downtown core of Gotham City, for Exercise 16]

17. A network of irrigation ditches is shown in Figure 2.20, with flows measured in thousands of liters per day.
    (a) Set up and solve a system of linear equations to find the possible flows.
    (b) Suppose DC is closed. What range of flow will need to be maintained through DB?
    (c) From Figure 2.20 it is clear that DB cannot be closed. (Why not?) How does your solution in part (a) show this?
    (d) From your solution in part (a), determine the minimum and maximum flows through DB.

Figure 2.20  [Network of irrigation ditches for Exercise 17]

18. (a) Set up and solve a system of linear equations to find the possible flows in the network shown in Figure 2.21.
    (b) Is it possible for f4 = 100 and f6 = 150? (Answer this question first with reference to your solution in part (a) and then directly from Figure 2.21.)
    (c) If f5 = 0, what will the range of flow be on each of the other branches?

Figure 2.21  [Network for Exercise 18, with boundary flows of 100, 150, and 200]

Electrical Networks

For Exercises 19 and 20, determine the currents for the given electrical networks.

19. …
20. …

21. (a) Find the currents I, I1, ..., I5 in the bridge circuit in Figure 2.22.
    (b) Find the effective resistance of this network.
    (c) Can you change the resistance in branch BC (but leave everything else unchanged) so that the current through branch CE becomes 0?

Figure 2.22  [Bridge circuit for Exercise 21]

22. The networks in parts (a) and (b) of Figure 2.23 show two resistors coupled in series and in parallel, respectively. We wish to find a general formula for the effective resistance of each network; that is, find Reff such that E = Reff I.
    (a) Show that the effective resistance Reff of a network with two resistors coupled in series [Figure 2.23(a)] is given by Reff = R1 + R2.
    (b) Show that the effective resistance Reff of a network with two resistors coupled in parallel [Figure 2.23(b)] is given by 1/Reff = 1/R1 + 1/R2.

Figure 2.23  Resistors in series and in parallel

Finite Linear Games

23. (a) In Example 2.33, suppose all the lights are initially off. Can we push the switches in some order so that only the second and fourth lights will be on?
    (b) Can we push the switches in some order so that only the second light will be on?
24. (a) In Example 2.33, suppose the fourth light is initially on and the other four lights are off. Can we push the switches in some order so that only the second and fourth lights will be on?
    (b) Can we push the switches in some order so that only the second light will be on?
25. In Example 2.33, describe all possible configurations of lights that can be obtained if we start with all the lights off.
26. (a) In Example 2.34, suppose that all of the lights are initially off. Show that it is possible to push the switches in some order so that the lights are off, dark blue, and light blue, in that order.
    (b) Show that it is possible to push the switches in some order so that the lights are light blue, off, and light blue, in that order.
    (c) Prove that any configuration of the three lights can be achieved.
27. Suppose the lights in Example 2.33 can be off, light blue, or dark blue and the switches work as described in Example 2.34. (That is, the switches control the same lights as in Example 2.33 but cycle through the colors as in Example 2.34.) Show that it is possible to start with all of the lights off and push the switches in some order so that the lights are dark blue, light blue, dark blue, light blue, and dark blue, in that order.
28. For Exercise 27, describe all possible configurations of lights that can be obtained, starting with all the lights off.
29. Nine squares, each one either black or white, are arranged in a 3 x 3 grid. Figure 2.24 shows one possible arrangement. When touched, each square changes its own state and the states of some of its neighbors (black <-> white). Figure 2.25 shows how the state changes work. (Touching the square whose number is circled causes the states of the squares marked * to change.) The object of the game is to turn all nine squares black. [Exercises 29 and 30 …]

Figure 2.24  The nine squares puzzle
Figure 2.25  State changes for the nine squares puzzle

30. Consider a variation on the nine squares puzzle. The game is the same as that described in Exercise 29 except that there are three possible states for each square: white, grey, or black. The squares change as shown in Figure 2.25, but now the state changes follow the cycle white -> grey -> black -> white. Show how the winning all-black configuration can be achieved from the initial configuration shown in Figure 2.26.

Figure 2.26  The nine squares puzzle with more states

Miscellaneous Problems

In Exercises 31-47, set up and solve an appropriate system of linear equations to answer the questions.

31. Grace is three times as old as Hans, but in 5 years she will be twice as old as Hans is then. How old are they now?
32. The sum of Annie's, Bert's, and Chris's ages is 60. Annie is older than Bert by the same number of years that Bert is older than Chris. When Bert is as old as Annie is now, Annie will be three times as old as Chris is now. What are their ages?

The preceding two problems are typical of those found in popular books of mathematical puzzles. However, they have their origins in antiquity. A Babylonian clay tablet that survives from about 300 B.C. contains the following problem.

33. There are two fields whose total area is 1800 square yards. One field produces grain at the rate of 2/3 bushel per square yard; the other field produces grain at the rate of 1/2 bushel per square yard. If the total yield is 1100 bushels, what is the size of each field?

Over 2000 years ago, the Chinese developed methods for solving systems of linear equations, including a version of Gaussian elimination that did not become well known in Europe until the 19th century. (There is no evidence that Gauss was aware of the Chinese methods when he developed what we now call Gaussian elimination. However, it is clear that the Chinese knew the essence of the method, even though they did not justify its use.) The following problem is taken from the Chinese text Jiuzhang suanshu (Nine Chapters in the Mathematical Art), written during the early Han Dynasty, about 200 B.C.

34. There are three types of corn. Three bundles of the first type, two of the second, and one of the third make 39 measures. Two bundles of the first type, three of the second, and one of the third make 34 measures. And one bundle of the first type, two of the second, and three of the third make 26 measures. How many measures of corn are contained in one bundle of each type?

35. Describe all possible values of a, b, c, and d that will make each of the following a valid addition table. [Problems 35-38 are based on the article "An Application of Matrix Theory" by Paul Glaister in The Mathematics Teacher, 85 (1992), pp. 220-223.]
    (a) …   (b) …

36. What conditions on w, x, y, and z will guarantee that we can find a, b, c, and d so that the following is a valid addition table?

    +   a   b
    c   w   x
    d   y   z

37. Describe all possible values of a, b, c, d, e, and f that will make each of the following a valid addition table.
    (a) …   (b) …

38. Generalizing Exercise 36, find conditions on the entries of a 3 x 3 addition table that will guarantee that we can solve for a, b, c, d, e, and f as above.

39. From elementary geometry we know that there is a unique straight line through any two points in a plane. Less well known is the fact that there is a unique parabola through any three noncollinear points in a plane. For each set of points below, find a parabola with an equation of the form y = ax^2 + bx + c that passes through the given points. (Sketch the resulting parabola to check the validity of your answer.)
    (a) (0, 1), (-1, 4), and (2, 1)
    (b) (-3, 1), (-2, 2), and (-1, 5)

40. Through any three noncollinear points there also passes a unique circle. Find the circles (whose general equations are of the form x^2 + y^2 + ax + by + c = 0) that pass through the sets of points in Exercise 39. (To check the validity of your answer, find the center and radius of each circle and draw a sketch.)

The process of adding rational functions (ratios of polynomials) by placing them over a common denominator is the analogue of adding rational numbers. The reverse process of taking a rational function apart by writing it as a sum of simpler rational functions is useful in several areas of mathematics; for example, it arises in calculus when we need to integrate a rational function and in discrete mathematics when we use generating functions to solve recurrence relations. The decomposition of a rational function as a sum of partial fractions leads to a system of linear equations. In Exercises 41-44, find the partial fraction decomposition of the given form. (The capital letters denote constants.)

41. (3x + 1)/(x^2 + 2x - 3) = A/(x - 1) + B/(x + 3)
42. (x^2 - 3x + 3)/(x^3 + 2x^2 + x) = A/x + B/(x + 1) + C/(x + 1)^2
43. (x - 1)/((x + 1)(x^2 + 1)(x^2 + 4)) = A/(x + 1) + (Bx + C)/(x^2 + 1) + (Dx + E)/(x^2 + 4)
44. (x^3 + x + 1)/(x(x - 1)(x^2 + x + 1)(x^2 + 1)^3)
        = A/x + B/(x - 1) + (Cx + D)/(x^2 + x + 1) + (Ex + F)/(x^2 + 1) + (Gx + H)/(x^2 + 1)^2 + (Ix + J)/(x^2 + 1)^3

Following are two useful formulas for the sums of powers of consecutive natural numbers:

1 + 2 + ... + n = n(n + 1)/2    and    1^2 + 2^2 + ... + n^2 = n(n + 1)(2n + 1)/6

The validity of these formulas for all values of n >= 1 (or even n >= 0) can be established using mathematical induction (see Appendix B). One way to make an educated guess as to what the formulas are, though, is to observe that we can rewrite the two formulas above as

(1/2)n^2 + (1/2)n    and    (1/3)n^3 + (1/2)n^2 + (1/6)n

respectively. This leads to the conjecture that the sum of pth powers of the first n natural numbers is a polynomial of degree p + 1 in the variable n.

45. Assuming that 1 + 2 + ... + n = an^2 + bn + c, find a, b, and c by substituting three values for n and thereby obtaining a system of linear equations in a, b, and c.
46. Assume that 1^2 + 2^2 + ... + n^2 = an^3 + bn^2 + cn + d. Find a, b, c, and d. [Hint: It is legitimate to use n = 0. What is the left-hand side in that case?]
47. Show that 1^3 + 2^3 + ... + n^3 = (n(n + 1)/2)^2.
The Global Positioning System
The Global Positioning System (GPS) is used in a variety of situations for determining geographical locations. The military, surveyors, airlines, shipping companies, and hikers all make use of it. GPS technology is becoming so commonplace that some automobiles, cellular phones, and various handheld devices are now equipped with it.

The basic idea of GPS is a variant on three-dimensional triangulation: A point on Earth's surface is uniquely determined by knowing its distances from three other points. Here the point we wish to determine is the location of the GPS receiver, the other points are satellites, and the distances are computed using the travel times of radio signals from the satellites to the receiver.

We will assume that Earth is a sphere on which we impose an xyz-coordinate system with Earth centered at the origin and with the positive z-axis running through the north pole and fixed relative to Earth. For simplicity, let's take one unit to be equal to the radius of Earth. Thus Earth's surface becomes the unit sphere with equation x^2 + y^2 + z^2 = 1. Time will be measured in hundredths of a second. GPS finds distances by knowing how long it takes a radio signal to get from one point to another. For this we need to know the speed of light, which is approximately equal to 0.47 Earth radii per hundredth of a second.

Let's imagine that you are a hiker lost in the woods at point (x, y, z) at some time t. You don't know where you are, and furthermore, you have no watch, so you don't know what time it is. However, you have your GPS device, and it receives simultaneous signals from four satellites, giving their positions and times as shown in Table 2.6. (Distances are measured in Earth radii and time in hundredths of a second past midnight.)
This application is based on the article "An Underdetermined Linear System for GPS" by Dan Kalman in The College Mathematics Journal, 33 (2002), pp. 384-390. For a more in-depth treatment of the ideas introduced here, see G. Strang and K. Borre, Linear Algebra, Geodesy, and GPS (Wellesley-Cambridge Press, MA, 1997).
Let (x, y, z) be your position, and let t be the time when the signals arrive. The goal is to solve for x, y, z, and t. Your distance from satellite 1 can be computed as follows. The signal, traveling at a speed of 0.47 Earth radii per hundredth of a second, was sent at time 1.29 and arrived at time t, so it took t - 1.29 hundredths of a second to reach you. Distance equals velocity multiplied by (elapsed) time, so

d = 0.47(t - 1.29)
Table 2.6  Satellite Data
Satellite     Position              Time
1             (1.11, 2.55, 2.14)    1.29
2             (2.87, 0.00, 1.43)    1.31
3             (0.00, 1.08, 2.29)    2.75
4             (1.54, 1.01, 1.23)    4.06
We can also express d in terms of (x, y, z) and the satellite's position (1.11, 2.55, 2.14) using the distance formula:

d = sqrt((x - 1.11)^2 + (y - 2.55)^2 + (z - 2.14)^2)

Combining these results leads to the equation

(x - 1.11)^2 + (y - 2.55)^2 + (z - 2.14)^2 = 0.47^2 (t - 1.29)^2        (1)

Expanding, simplifying, and rearranging, we find that equation (1) becomes

2.22x + 5.10y + 4.28z - 0.57t = x^2 + y^2 + z^2 - 0.22t^2 + 11.95
Similarly, we can derive a corresponding equation for each of the other three satellites. We end up with a system of four equations in x, y, z, and t:

2.22x + 5.10y + 4.28z - 0.57t = x^2 + y^2 + z^2 - 0.22t^2 + 11.95
5.74x         + 2.86z - 0.58t = x^2 + y^2 + z^2 - 0.22t^2 +  9.90
        2.16y + 4.58z - 1.21t = x^2 + y^2 + z^2 - 0.22t^2 +  4.74
3.08x + 2.02y + 2.46z - 1.79t = x^2 + y^2 + z^2 - 0.22t^2 +  1.26
These are not linear equations, but the nonlinear terms are the same in each equation. If we subtract the first equation from each of the other three equations, we obtain a linear system:

 3.52x - 5.10y - 1.42z - 0.01t =  -2.05
-2.22x - 2.94y + 0.30z - 0.64t =  -7.21
 0.86x - 3.08y - 1.82z - 1.22t = -10.69
The augmented matrix row reduces as

[ 3.52  -5.10  -1.42  -0.01 |  -2.05]       [1 0 0 0.36 | 2.97]
[-2.22  -2.94   0.30  -0.64 |  -7.21]  -->  [0 1 0 0.03 | 0.81]
[ 0.86  -3.08  -1.82  -1.22 | -10.69]       [0 0 1 0.79 | 5.91]

from which we see that

x = 2.97 - 0.36t
y = 0.81 - 0.03t                                                        (2)
z = 5.91 - 0.79t

with t free. Substituting these equations into (1), we obtain

(2.97 - 0.36t - 1.11)^2 + (0.81 - 0.03t - 2.55)^2 + (5.91 - 0.79t - 2.14)^2 = 0.47^2 (t - 1.29)^2
which simplifies to the quadratic equation

0.54t^2 - 6.65t + 20.32 = 0

There are two solutions:

t = 6.74    and    t = 5.60

Substituting into (2), we find that the first solution corresponds to (x, y, z) = (0.55, 0.61, 0.56) and the second solution to (x, y, z) = (0.96, 0.65, 1.46). The second solution is clearly not on the unit sphere (Earth), so we reject it. The first solution produces x^2 + y^2 + z^2 = 0.99, so we are satisfied that, within acceptable roundoff error, we have located your coordinates as (0.55, 0.61, 0.56).

In practice, GPS takes significantly more factors into account, such as the fact that Earth's surface is not exactly spherical, so additional refinements are needed involving such techniques as least squares approximation (see Chapter 7). In addition, the results of the GPS calculation are converted from rectangular (Cartesian) coordinates into latitude and longitude, an interesting exercise in itself and one involving yet other branches of mathematics.
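The entire GPS calculation can be scripted. The sketch below, which assumes NumPy (a library choice made only for this illustration), rebuilds the linear system obtained by subtracting the first satellite's equation from the others, expresses (x, y, z) as an affine function of t, and then solves the resulting quadratic, mirroring the steps above.

    # A sketch of the GPS computation, following the steps in the vignette.
    import numpy as np

    sats = np.array([[1.11, 2.55, 2.14], [2.87, 0.00, 1.43],
                     [0.00, 1.08, 2.29], [1.54, 1.01, 1.23]])
    times = np.array([1.29, 1.31, 2.75, 4.06])
    c = 0.47                                   # speed of light, Earth radii per 0.01 s

    # Subtracting satellite 1's equation from the others removes the nonlinear terms.
    lhs = 2 * (sats[1:] - sats[0])                               # coefficients of x, y, z
    t_coef = -2 * c**2 * (times[1:] - times[0])                  # coefficient of t
    rhs = (np.sum(sats[1:]**2, axis=1) - np.sum(sats[0]**2)
           - c**2 * (times[1:]**2 - times[0]**2))

    # Solve for (x, y, z) as an affine function of t: position = base + t * direction.
    base = np.linalg.solve(lhs, rhs)
    direction = np.linalg.solve(lhs, -t_coef)

    # Substitute into satellite 1's nonlinear equation and solve the quadratic in t.
    d = base - sats[0]
    a2 = direction @ direction - c**2
    a1 = 2 * direction @ d + 2 * c**2 * times[0]
    a0 = d @ d - c**2 * times[0]**2
    for t in np.roots([a2, a1, a0]):
        print(t, base + t * direction)         # keep the root whose point lies on the unit sphere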
Table 2.7
n      0      1      2      3      4      5      6
x1     0   0.714  0.914  0.976  0.993  0.998  0.999
x2     0   1.400  1.829  1.949  1.985  1.996  1.999

The successive vectors [x1, x2] are called iterates, so, for example, when n = 4, the fourth iterate is [0.993, 1.985]. We can see that the iterates in this example are approaching [1, 2], which is the exact solution of the given system. (Check this.) We say in this case that Jacobi's method converges.
Jacobi's method calculates the successive iterates in a two-variable system according to the crisscross pattern shown in Table 2.8.

Table 2.8  [The crisscross pattern of the Jacobi iterates]
Before we consider Jacobi's method in the general case, we will look at a slight modification of it that often converges faster to the solution. The Gauss-Seidel method is the same as the Jacobi method except that we use each new value as soon as we can. So in our example, we begin by calculating x1 = (5 + 0)/7 = 5/7 ≈ 0.714 as before, but we now use this value of x1 to get the next value of x2:

x2 = (7 + 3(5/7))/5 ≈ 1.829

We then use this value of x2 to recalculate x1, and so on. The iterates this time are shown in Table 2.9. We observe that the Gauss-Seidel method has converged faster to the solution. The iterates this time are calculated according to the zigzag pattern shown in Table 2.10.
Table 2.9
n      0      1      2      3      4      5
x1     0   0.714  0.976  0.998  1.000  1.000
x2     0   1.829  1.985  1.999  2.000  2.000
Table 2.10  [The zigzag pattern of the Gauss-Seidel iterates]

The Gauss-Seidel method also has a nice geometric interpretation in the case of two variables. We can think of x1 and x2 as the coordinates of points in the plane. Our starting point is the point corresponding to our initial approximation, (0, 0). Our first calculation gives x1 = 5/7, so we move to the point (5/7, 0) ≈ (0.714, 0). Then we compute x2 = 64/35 ≈ 1.829, which moves us to the point (5/7, 64/35) ≈ (0.714, 1.829). Continuing in this fashion, our calculations from the Gauss-Seidel method give rise to a sequence of points, each one differing from the preceding point in exactly one coordinate. If we plot the lines 7x1 - x2 = 5 and 3x1 - 5x2 = -7 corresponding to the two given equations, we find that the points calculated above fall alternately on the two lines, as shown in Figure 2.27. Moreover, they approach the point of intersection of the lines, which corresponds to the solution of the system of equations. This is what convergence means!
Figure 2.27  Converging iterates
The general cases of the two methods are analogous. Given a system of n linear equations in n variables,

a11x1 + a12x2 + ... + a1nxn = b1
a21x1 + a22x2 + ... + a2nxn = b2
  ...                                                                   (2)
an1x1 + an2x2 + ... + annxn = bn

we solve the first equation for x1, the second for x2, and so on. Then, beginning with an initial approximation, we use these new equations to iteratively update each variable. Jacobi's method uses all of the values at the kth iteration to compute the (k + 1)st iterate, whereas the Gauss-Seidel method always uses the most recent value of each variable in every calculation. Example 2.37 below illustrates the Gauss-Seidel method in a three-variable problem.

At this point, you should have some questions and concerns about these iterative methods. (Do you?) Several come to mind: Must these methods converge? If not, when do they converge? If they converge, must they converge to the solution? The answer to the first question is no, as Example 2.36 illustrates.
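The two update rules just described translate directly into code. The following sketch, which assumes Python with NumPy (a choice made only for this illustration), implements both methods for a square system; applied to the system of Example 2.35, it approximately reproduces the final entries of Tables 2.7 and 2.9.

    # Minimal sketches of the Jacobi and Gauss-Seidel iterations for A x = b.
    import numpy as np

    def jacobi(A, b, iterations):
        n = len(b)
        x = np.zeros(n)
        for _ in range(iterations):
            x_new = np.empty(n)
            for i in range(n):      # use only values from the previous iterate
                s = sum(A[i][j] * x[j] for j in range(n) if j != i)
                x_new[i] = (b[i] - s) / A[i][i]
            x = x_new
        return x

    def gauss_seidel(A, b, iterations):
        n = len(b)
        x = np.zeros(n)
        for _ in range(iterations):
            for i in range(n):      # use each new value as soon as it is available
                s = sum(A[i][j] * x[j] for j in range(n) if j != i)
                x[i] = (b[i] - s) / A[i][i]
        return x

    A = np.array([[7.0, -1.0], [3.0, -5.0]])   # the system 7x1 - x2 = 5, 3x1 - 5x2 = -7
    b = np.array([5.0, -7.0])
    print(jacobi(A, b, 6))          # approximately [0.999, 1.999]
    print(gauss_seidel(A, b, 5))    # approximately [1.000, 2.000]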
Example 2.36
Apply the Gauss-Seidel method to the system

 x1 - x2 = 1
2x1 + x2 = 5

with initial approximation [0, 0].

Solution  We rearrange the equations to get

x1 = 1 + x2
x2 = 5 - 2x1

The first few iterates are given in Table 2.11. (Check these.) The actual solution to the given system is [x1, x2] = [2, 1]. Clearly, the iterates in Table 2.11 are not approaching this point, as Figure 2.28 makes graphically clear; this is an example of divergence.
So when do these iterative methods converge? Unfortunately, the answer to this question is rather tricky. We will answer it completely in Chapter 7, but for now we will give a partial answer, without proof. Let A be the n x n matrix

    [a11  a12  ...  a1n]
A = [a21  a22  ...  a2n]
    [ .    .          .]
    [an1  an2  ...  ann]
We say that A is strictly diagonally dominant if

|a11| > |a12| + |a13| + ... + |a1n|
|a22| > |a21| + |a23| + ... + |a2n|
  ...
|ann| > |an1| + |an2| + ... + |a_{n,n-1}|

That is, the absolute value of each diagonal entry a11, a22, ..., ann is greater than the sum of the absolute values of the remaining entries in that row.
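This condition is easy to test mechanically. A small sketch, assuming NumPy (a library choice made only for this illustration):

    # Check strict diagonal dominance: each |a_ii| must exceed the sum of the other |a_ij| in its row.
    import numpy as np

    def is_strictly_diagonally_dominant(A):
        A = np.abs(np.asarray(A, dtype=float))
        off_diagonal = A.sum(axis=1) - np.diag(A)
        return bool(np.all(np.diag(A) > off_diagonal))

    print(is_strictly_diagonally_dominant([[7, -1], [3, -5]]))    # True
    print(is_strictly_diagonally_dominant([[1, -2], [3, 2]]))     # False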
Theorem 2.9
If a system of n linear equations in n variables has a strictly diagonally dominant coefficient matrix, then it has a unique solution and both the Jacobi and the Gauss-Seidel method converge to it.
Remark  Be warned! This theorem is a one-way implication. The fact that a system is not strictly diagonally dominant does not mean that the iterative methods diverge. They may or may not converge. (See Exercises 15-19.) Indeed, there are examples in which one of the methods converges and the other diverges. However, if either of these methods converges, then it must converge to the solution; it cannot converge to some other point.
Theorem 2.10
If the Jacobi or the Gauss-Seidel method converges for a system of n linear equations in n variables, then it must converge to the solution of the system.
Proof  We will illustrate the idea behind the proof by sketching it out for the case of Jacobi's method, using the system of equations in Example 2.35. The general proof is similar. Convergence means that from some iteration on, the values of the iterates remain the same. This means that x1 and x2 converge to r and s, respectively, as shown in Table 2.12.

Table 2.12
n     ...    k    k+1    k+2    ...
x1    ...    r     r      r     ...
x2    ...    s     s      s     ...

We must prove that [x1, x2] = [r, s] is the solution of the system of equations. In other words, at the (k + 1)st iteration, the values of x1 and x2 must stay the same as at the kth iteration. But the calculations give x1 = (5 + x2)/7 = (5 + s)/7 and x2 = (7 + 3x1)/5 = (7 + 3r)/5. Therefore,

(5 + s)/7 = r    and    (7 + 3r)/5 = s

Rearranging, we see that

7r -  s =  5
3r - 5s = -7

Thus, x1 = r, x2 = s satisfy the original equations, as required.
By now you may be wondering: If iterative methods don't always converge to the solution, what good are they? Why don't we just use Gaussian elimination? First, we have seen that Gaussian elimination is sensitive to roundoff errors, and this sensitivity can lead to inaccurate or even wildly wrong answers. Also, even if Gaussian elimination does not go astray, we cannot improve on a solution once we have found it. For example, if we use Gaussian elimination to calculate a solution to two decimal places, there is no way to obtain the solution to four decimal places except to start over again and work with increased accuracy. In contrast, we can achieve additional accuracy with iterative methods simply by doing more iterations. For large systems, particularly those with sparse coefficient matrices, iterative methods are much faster than direct methods when implemented on a computer. In many applications, the systems that arise are strictly diagonally dominant, and thus iterative methods are guaranteed to converge. The next example illustrates one such application.
Example 2.37
Suppose we heat each edge of a metal plate to a constant temperature, as shown in Figure 2.29.

Figure 2.29  A heated metal plate
Eventually the temperature at the interior points will reach equilibrium, where the following property can be shown to hold:

The temperature at each interior point P on a plate is the average of the temperatures on the circumference of any circle centered at P inside the plate (Figure 2.30).

Figure 2.30  [A circle centered at an interior point P of the plate]
To apply this property in an actual example requires techniques from calculus. As an alternative, we can approximate the situation by overlaying the plate with a grid, or mesh, that has a finite number of interior points, as shown in Figure 2.31.

Figure 2.31  The discrete version of the heated plate problem
For the example shown in Figure 2.3 1, there are th ree In terior points, and each is adjacent to four other points. Let the equ il ib ri um temperatures o f the m terior points
Section 2.5
be t.,
' 2'
and
fl ,
129
Iterati ve Methods for Solving Linear Systems
as snown. Then, by the temperature-averaging p roperty, we have
t1 = (t2 + 50 + 100 + 100)/4               4t1 -  t2       = 250
t2 = (t1 + t3 + 0 + 50)/4        or        -t1 + 4t2 -  t3 =  50        (3)
t3 = (t2 + 0 + 100 + 100)/4                      -t2 + 4t3 = 200
Notice that this system is strictly diagonally dominant. Notice also that equations (3) are in the form required for Jacobi or Gauss-Seidel iteration. With an initial approximation of t1 = 0, t2 = 0, t3 = 0, the Gauss-Seidel method gives the following iterates.

Iteration 1:   t1 = (0 + 50 + 100 + 100)/4 = 62.5
               t2 = (62.5 + 0 + 0 + 50)/4 = 28.125
               t3 = (28.125 + 0 + 100 + 100)/4 = 57.031

Iteration 2:   t1 = (28.125 + 50 + 100 + 100)/4 = 69.531
               t2 = (69.531 + 57.031 + 0 + 50)/4 = 44.141
               t3 = (44.141 + 0 + 100 + 100)/4 = 61.035

Continuing, we find the iterates listed in Table 2.13. We work with five-significant-digit accuracy and stop when two successive iterates agree within 0.001 in all variables. Thus, the equilibrium temperatures at the interior points are (to an accuracy of 0.001) t1 = 74.107, t2 = 46.429, and t3 = 61.607. (Check these calculations.) By using a finer grid (with more interior points), we can get as precise information as we like about the equilibrium temperatures at various points on the plate.
Table 2.13
n      0       1        2        3      ...      7        8
t1     0    62.500   69.531   73.535    ...   74.107   74.107
t2     0    28.125   44.141   46.143    ...   46.429   46.429
t3     0    57.031   61.035   61.536    ...   61.607   61.607
-+
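The computation in Example 2.37 is easy to automate. The following Python sketch (our own illustration, not part of the text; the function and variable names are ours) runs the Gauss-Seidel iteration for equations (3), stopping when two successive iterates agree to within 0.001, and reproduces the values in Table 2.13.

```python
def gauss_seidel_plate(tol=0.001, max_iter=100):
    # Equations (3): each interior temperature is the average of its
    # four neighbours (boundary values 0, 50, and 100).
    t1 = t2 = t3 = 0.0                      # initial approximation
    for n in range(1, max_iter + 1):
        old = (t1, t2, t3)
        t1 = (t2 + 50 + 100 + 100) / 4      # uses the newest t2
        t2 = (t1 + 0 + t3 + 50) / 4         # uses the newest t1
        t3 = (t2 + 100 + 100 + 0) / 4       # uses the newest t2
        print(f"iteration {n}: t1={t1:.3f}, t2={t2:.3f}, t3={t3:.3f}")
        if max(abs(t1 - old[0]), abs(t2 - old[1]), abs(t3 - old[2])) < tol:
            break
    return t1, t2, t3

gauss_seidel_plate()   # converges to about (74.107, 46.429, 61.607)
```

Replacing each right-hand side with the previous iterate's values (rather than the newest ones) turns the same loop into Jacobi's method, which is the comparison asked for in Exercises 7-12 below.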
In Exercises 1-6, apply Jacobi's method to the given system. Take the zero vector as the initial approximation and work with four-significant-digit accuracy until two successive iterates agree within 0.001 in each variable. In each case, compare your answer with the exact solution found using any direct method you like.

1. 7x1 - x2 = 6
   x1 - 5x2 = -4

2. 2x1 + x2 = 5
   x1 - x2 = 1

3. 4.5x1 - 0.5x2 = 1
   x1 - 3.5x2 = -1

4. 20x1 + x2 - x3 = 17
   x1 - 10x2 + x3 = 13
   -x1 + x2 + 10x3 = 18

5. 3x1 + x2 = 1
   x1 + 4x2 + x3 = 1
   x2 + 3x3 = 1

6. 3x1 - x2 = 1
   -x1 + 3x2 - x3 = 0
   -x2 + 3x3 - x4 = 1
   -x3 + 3x4 = 1

In Exercises 7-12, repeat the given exercise using the Gauss-Seidel method. Take the zero vector as the initial approximation and work with four-significant-digit accuracy until two successive iterates agree within 0.001 in each variable. Compare the number of iterations required by the Jacobi and Gauss-Seidel methods to reach such an approximate solution.

7. Exercise 1
8. Exercise 2
9. Exercise 3
10. Exercise 4
11. Exercise 5
12. Exercise 6

In Exercises 13 and 14, draw diagrams to illustrate the convergence of the Gauss-Seidel method with the given system.

13. The system in Exercise 1
14. The system in Exercise 2

In Exercises 15 and 16, compute the first four iterates, using the zero vector as the initial approximation, to show that the Gauss-Seidel method diverges. Then show that the equations can be rearranged to give a strictly diagonally dominant coefficient matrix, and apply the Gauss-Seidel method to obtain an approximate solution that is accurate to within 0.001.

15. x1 - 2x2 = 3
    3x1 + 2x2 = 1

16. x1 - 4x2 + 2x3 = 2
    2x2 + 4x3 = 1
    6x1 - x2 - 2x3 = 1

17. Draw a diagram to illustrate the divergence of the Gauss-Seidel method in Exercise 15.

In Exercises 18 and 19, the coefficient matrix is not strictly diagonally dominant, nor can the equations be rearranged to make it so. However, both the Jacobi and the Gauss-Seidel method converge anyway. Demonstrate that this is true of the Gauss-Seidel method, starting with the zero vector as the initial approximation and obtaining a solution that is accurate to within 0.01.

18. -4x1 + 5x2 = 14
    x1 - 3x2 = -7

19. 5x1 - 2x2 + 3x3 = -8
    x1 + 4x2 - 4x3 = 102
    -2x1 - 2x2 + 4x3 = -90

20. Continue performing iterations in Exercise 18 to obtain a solution that is accurate to within 0.001.

21. Continue performing iterations in Exercise 19 to obtain a solution that is accurate to within 0.001.

In Exercises 22-24, the metal plate has the constant temperatures shown on its boundaries. Find the equilibrium temperature at each of the indicated interior points by setting up a system of linear equations and applying either the Jacobi or the Gauss-Seidel method. Obtain a solution that is accurate to within 0.001.

22. [Figure: plate with constant boundary temperatures and labeled interior points]
23. [Figure: plate with constant boundary temperatures and labeled interior points]
24. [Figure: plate with constant boundary temperatures and labeled interior points]

In Exercises 25 and 26, we refine the grid used in Exercises 22 and 24 to obtain more accurate information about the equilibrium temperatures at interior points of the plates. Obtain solutions that are accurate to within 0.001, using either the Jacobi or the Gauss-Seidel method.

25. [Figure: refined grid for the plate of Exercise 22]
26. [Figure: refined grid for the plate of Exercise 24]

Exercises 27 and 28 demonstrate that sometimes, if we are lucky, the form of an iterative problem may allow us to use a little insight to obtain an exact solution.

27. A narrow strip of paper 1 unit long is placed along a number line so that its ends are at 0 and 1. The paper is folded in half, right end over left, so that its ends are now at 0 and 1/2. Next, it is folded in half again, this time left end over right, so that its ends are at 1/4 and 1/2. Figure 2.32 shows this process. We continue folding the paper in half, alternating right-over-left and left-over-right. If we could continue indefinitely, it is clear that the ends of the paper would converge to a point. It is this point that we want to find.
(a) Let x1 correspond to the left-hand end of the paper and x2 to the right-hand end. Make a table with the first six values of [x1, x2] and plot the corresponding points on x1-x2 coordinate axes.
(b) Find two linear equations of the form x2 = ax1 + b and x1 = cx2 + d that determine the new values of the endpoints at each iteration. Draw the corresponding lines on your coordinate axes and show that this diagram would result from applying the Gauss-Seidel method to the system of linear equations you have found. (Your diagram should resemble Figure 2.27 on page 124.)
(c) Switching to decimal representation, continue applying the Gauss-Seidel method to approximate the point to which the ends of the paper are converging to within 0.001 accuracy.
(d) Solve the system of equations exactly and compare your answers.

[Figure 2.32: Folding a strip of paper]

28. An ant is standing on a number line at point A. It walks halfway to point B and turns around. Then it walks halfway back to point A, turns around again, and walks halfway to point B. It continues to do this indefinitely. Let point A be at 0 and point B be at 1. The ant's walk is made up of a sequence of overlapping line segments. Let x1 record the positions of the left-hand endpoints of these segments and x2 their right-hand endpoints. (Thus, we begin with x1 = 0 and x2 = 1/2. Then we have x1 = 1/4 and x2 = 1/2, and so on.) Figure 2.33 shows the start of the ant's walk.
(a) Make a table with the first six values of [x1, x2] and plot the corresponding points on x1-x2 coordinate axes.
(b) Find two linear equations of the form x2 = ax1 + b and x1 = cx2 + d that determine the new values of the endpoints at each iteration. Draw the corresponding lines on your coordinate axes and show that this diagram would result from applying the Gauss-Seidel method to the system of linear equations you have found. (Your diagram should resemble Figure 2.27 on page 124.)
(c) Switching to decimal representation, continue applying the Gauss-Seidel method to approximate the values to which x1 and x2 are converging to within 0.001 accuracy.
(d) Solve the system of equations exactly and compare your answers. Interpret your results.

[Figure 2.33: The ant's walk]
Key Definitions and Concepts

augmented matrix, 62
back substitution, 62
coefficient matrix, 68
consistent system, 61
convergence, 123-124
divergence, 125
elementary row operations, 70
free variable, 75
Gauss-Jordan elimination, 76
Gauss-Seidel method, 123
Gaussian elimination, 72
homogeneous system, 80
inconsistent system, 61
iterate, 123
Jacobi's method, 122
leading variable (leading 1), 75-76
linear equation, 59
linearly dependent vectors, 95
linearly independent vectors, 95
pivot, 70
rank of a matrix, 75
Rank Theorem, 75
reduced row echelon form, 76
row echelon form, 68
row equivalent matrices, 72
span of a set of vectors, 92
spanning set, 92
system of linear equations, 59
Review Questions

1. Mark each of the following statements true or false:
(a) Every system of linear equations has a solution.
(b) Every homogeneous system of linear equations has a solution.
(c) If a system of linear equations has more variables than equations, then it has infinitely many solutions.
(d) If a system of linear equations has more equations than variables, then it has no solution.
(e) Determining whether b is in span(a1, ..., an) is equivalent to determining whether the system [A | b] is consistent, where A = [a1 ... an].
(f) In R^3, span(u, v) is always a plane through the origin.
(g) In R^3, if nonzero vectors u and v are not parallel, then they are linearly independent.
(h) In R^3, if a set of vectors can be drawn head to tail, one after the other so that a closed path (polygon) is formed, then the vectors are linearly dependent.
(i) If a set of vectors has the property that no two vectors in the set are scalar multiples of one another, then the set of vectors is linearly independent.
(j) If there are more vectors in a set of vectors than the number of entries in each vector, then the set of vectors is linearly dependent.
2. Find the rank o f the mat rix.
] 3
-2 - ]
o
3
2
]
3
4
3
4
2
- 3
o
- 5
- I
6
2 2
II. Find the gener;al equatio n of the plane span ned by
1
3
1 and
2
1
]
2 12. Determ ine whet her independent.
u
~
,v =
]
4. Solve the linear system
- ]
,v =
over 2 7,
6. Solve the linear sys tem
3x +2y = 1 x + 4y = 2
7. For what value(s) of k is the linear system with
2 I] inconsistent?
2k 1
9. Find the point of intersectio n of the fo llowing lines, if it exists.
Y
2 + , -I 3 2
,
10. Determine whether 1
and
2 -2
x
1
and
y
,
5 -2 - 4
- I
+
~
- I
1
3
5 is in the span of
1
3
a1 a,l. What are the possible val-
t
1
17. Show that if u and v are linearly independent vecto rs, thensoare u + vand u - v.
18. Show that span(u, v) = span(u, u u and v.
+ v) fo r any vectors
19. In order for a linear system with augmented mat rix [A I b l to be consisten t, what mus t be true about the ran ks of A and [ A I b j? 1
1 1
- ]
w
16. What is the maximum rank o f a 5 X 3 matrix? What is the minimum rank of a 5 X 3 matrix?
8. Find parametric equations fo r the line of intersection of the planesx+ 2y+ 3z = 4 and 5x + 6y+ 7z = 8.
1
,
15. Let a i' a 2, aJ be linearly dependen : vectors in R', not all zero, and let A = [a l ues o f the rank of A?
x
0 1
0
(a) The reduced row echelo n for m o f A is 13' (b ) The rank of A is 3. (e) The system [A I b] has a unique solution for any vector b in [RJ. (d) (a), (b ), and (c) are all true. (e) (a) and (b ) are both true, but nOI (el.
2x+ 3y = 4 x + 2y = 3
k
1
w ~
14. Let a i' a 2, a J be linearl y independent vectors in [RJ, and let A = ta l a~ a JI. Which o f the following s tatements are true?
5. Solve the linear system
augmented matrix [ I
- 2
1
0
3w+ 8x- 18y+ z = 35 w + 2x - 4y = II w+ 3x- 7y+ z = iO
9 are linearly
0
]
- I
,
= span{u, v, w) if:
,
0
1
Cbl u ~
-2
1
0
x + y - 2z = 4 x + 3y - z = 7 2x+ y - 5z = 7
-I
-3
]
Cal
3. Solve the linear system
,
]
13. Determine whether R'
3
1
20. Arc the matrices
I
I
2 3 - I and - 1 4 1 row equivalent? Why or why not?
I
0
- 1
I 0
I I
I 3
Chapter 3  Matrices

"We [Halmos and Kaplansky] share a philosophy about linear algebra: we think basis-free, we write basis-free, but when the chips are down we close the office door and compute with matrices like fury."
  Irving Kaplansky, in Paul Halmos: Celebrating 50 Years of Mathematics, J. H. Ewing and F. W. Gehring, eds., Springer-Verlag, 1991, p. 88
3.0 Introduction: Matrices in Action

In this chapter, we will study matrices in their own right. We have already used matrices, in the form of augmented matrices, to record information about and to help streamline calculations involving systems of linear equations. Now you will see that matrices have algebraic properties of their own, which enable us to calculate with them, subject to the rules of matrix algebra. Furthermore, you will observe that matrices are not static objects, recording information and data; rather, they represent certain types of functions that "act" on vectors, transforming them into other vectors. These "matrix transformations" will begin to play a key role in our study of linear algebra and will shed new light on what you have already learned about vectors and systems of linear equations. Furthermore, matrices arise in many forms other than augmented matrices; we will explore some of the many applications of matrices at the end of this chapter. In this section, we will consider a few simple examples to illustrate how matrices can transform vectors. In the process, you will get your first glimpse of "matrix arithmetic." Consider the equations
  y1 = x1 + 2x2
  y2 =       3x2              (1)

We can view these equations as describing a transformation of the vector x = [x1; x2] into the vector y = [y1; y2]. If we denote the matrix of coefficients of the right-hand side by F, then F = [1 2; 0 3], and we can rewrite the transformation as

  [y1; y2] = [1 2; 0 3] [x1; x2]

or, more succinctly, y = Fx. (Think of this expression as analogous to the functional notation y = f(x) you are used to: x is the independent "variable" here, y is the dependent "variable," and F is the name of the "function.")
Thus, if x = [-2; 1], then equations (1) give

  y1 = -2 + 2(1) = 0
  y2 =       3(1) = 3

We can write this expression as

  y = [0; 3] = [1 2; 0 3] [-2; 1]
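A quick way to experiment with such transformations is to code the matrix-vector product directly. The short Python sketch below is our own illustration (not part of the text); it applies the matrix F above to the vector x = [-2, 1] and reproduces y = [0, 3], and the same function can be reused with the matrices of Problems 4-8.

```python
def apply(M, v):
    # multiply a 2 x 2 matrix M (given as a list of rows) by a 2-vector v
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

F = [[1, 2],
     [0, 3]]

print(apply(F, [-2, 1]))   # [0, 3], as computed above
```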
Problem 1  Compute Fx for the following vectors x:

Problem 2  The heads of the four vectors x in Problem 1 locate the four corners of a square in the x1x2 plane. Draw this square and label its corners A, B, C, and D, corresponding to parts (a), (b), (c), and (d) of Problem 1. On separate coordinate axes (labeled y1 and y2), draw the four points determined by Fx in Problem 1. Label these points A', B', C', and D'. Let's make the (reasonable) assumption that the line segment AB is transformed into the line segment A'B', and likewise for the other three sides of the square ABCD. What geometric figure is represented by A'B'C'D'?

Problem 3  The center of square ABCD is the origin O = [0; 0]. What is the center of
A'B'C'D'? What algebraic calculation confirms this?

Now consider the equations

  z1 =  y1 - y2
  z2 = -2y1                  (2)

that transform a vector y = [y1; y2] into the vector z = [z1; z2]. We can abbreviate this transformation as z = Gy, where

  G = [1 -1; -2 0]

Problem 4  We are going to find out how G transforms the figure A'B'C'D'. Compute Gy for each of the four vectors y that you computed in Problem 1. [That is, compute z = G(Fx). You may recognize this expression as being analogous to the composition of functions with which you are familiar.] Call the corresponding points A'', B'', C'', and D'', and sketch the figure A''B''C''D'' on z1z2 coordinate axes.

Problem 5  By substituting equations (1) into equations (2), obtain equations for z1 and z2 in terms of x1 and x2. If we denote the matrix of these equations by H, then we have z = Hx. Since we also have z = GFx, it is reasonable to write

  H = GF

Can you see how the entries of H are related to the entries of F and G?

Problem 6  Let's do the above process the other way around: First transform the square ABCD, using G, to obtain figure A*B*C*D*. Then transform the resulting figure, using F, to obtain A**B**C**D**. [Note: Don't worry about the "variables"
x, y, and z here. Simply substitute the coordinates of A, B, C, and D into equations (2) and then substitute the results into equations (1).] Are A**B**C**D** and A''B''C''D'' the same? What does this tell you about the order in which we perform the transformations F and G?

Problem 7  Repeat Problem 5 with general matrices

  F = [f11 f12; f21 f22]   and   G = [g11 g12; g21 g22]

That is, if equations (1) and equations (2) have coefficients as specified by F and G, find the entries of H in terms of the entries of F and G. The result will be a formula for the "product" H = GF.

Problem 8  Repeat Problems 1-6 with the following matrices. (Your formula from Problem 7 may help to speed up the algebraic calculations.) Note any similarities or differences that you think are significant.

(a) F =
~
(0) F ~ ,
[ °1
[I
- 1]G~[20] o' 0 3 I]G~ [ 2 - 1]
2 '
- I
I
(b) F
= [:
(d) F
~
[
~J.G = [~ :J I
-2
-2]G~[ 2 4 '
I
:]
3.1 Matrix Operations

Although we have already encountered matrices, we begin by stating a formal definition.

Definition  A matrix is a rectangular array of numbers called the entries, or elements, of the matrix.

Although the numbers will usually be chosen from the set R of real numbers, they may also be taken from the set C of complex numbers or from Zp, where p is prime.

Technically, there is a distinction between row/column matrices and vectors, but we will not belabor this distinction. We will, however, distinguish between row matrices/vectors and column matrices/vectors. This distinction is important, at the very least for algebraic computations, as we will demonstrate.
The foltowing are all eX:llllpJes of matrices: - I
2 4 , 17
5.1
I ],
1.2
- I
6.9 4.4 , [ 7] - 73 9 8.5 The size of a matrix is a description of the numbers of rows and columns it has. A ma trix is called mX 1/ (pronounced" m by 1/") if it has III rows and tI columns. Th us, the examples above are mat rices of sizes 2 X 2, 2 X 3, 3 X I, 1X4, 3X 3 and 1X I, respectively. A I X m matrix is called a row matrix (or row vector), and an /IX I ntrttr lx is called a column matrix (o r column vector). We usc dOllble-subscrlpt notation 10 refer to the entnes of a matrix A. The entry of A in row i and column j is denoted by art Thus, tf A =
[I
I
[~
9 5
I
°
- :]
then a13 = -1 and a22 = 5. [The notation Aij is sometimes used interchangeably with aij.] We can therefore compactly denote a matrix A by [aij] (or [aij]m×n if it is important to specify the size of A, although the size will usually be clear from the context).
With this notation, a general m×n matrix A has the form

  A = [a11  a12  ...  a1n]
      [a21  a22  ...  a2n]
      [ .    .          . ]
      [am1  am2  ...  amn]

If the columns of A are the vectors a1, a2, ..., an, then we may represent A as

  A = [a1  a2  ...  an]

If the rows of A are A1, A2, ..., Am, then we may represent A as

  A = [A1]
      [A2]
      [ . ]
      [Am]
The diagonal entries of A are a11, a22, a33, ..., and if m = n (that is, if A has the same number of rows as columns), then A is called a square matrix. A square matrix whose nondiagonal entries are all zero is called a diagonal matrix. A diagonal matrix all of whose diagonal entries are the same is called a scalar matrix. If the scalar on the diagonal is 1, the scalar matrix is called an identity matrix. For example, let A = [
2
- I
5 4
B
-
[34 5'I]
o o c= o 6 o , o o 2 3
D =
1 0 0 0 1 0
o
0
1
The diagonal entries of A are 2 and 4, but A is not square; B is a square matrix of size 2×2 with diagonal entries 3 and 5; C is a diagonal matrix; D is a 3×3 identity matrix. The n×n identity matrix is denoted by In (or simply I if its size is understood). Since we can view matrices as generalizations of vectors (and, indeed, matrices can and should be thought of as being made up of both row and column vectors), many of the conventions and operations for vectors carry through (in an obvious way) to matrices. Two matrices are equal if they have the same size and if their corresponding entries are equal. Thus, if A = [aij] is m×n and B = [bij] is r×s, then A = B if and only if m = r and n = s and aij = bij for all i and j.
=
Example 3,1
=
Conside r the matrices
A = [:
!].
B=
[ ~ ~ ].
c-
o
[! ;] 3
Neither A no r B can be eq ual to C( no matter what the values of xand y), s ince A lInd Bare2 X2 malrices a nd C is2X3. However, A = Bifand on ly if ( I = 2, /, ;;; O,e - 5, and d "" 3.
Example 3.2

Consider the matrices

  R = [1 4 3]   and   C = [1; 4; 3]   (a 3×1 column)

Despite the fact that R and C have the same entries in the same order, R ≠ C since R is 1×3 and C is 3×1. (If we read R and C aloud, they both sound the same: "one, four, three.") Thus, our distinction between row matrices/vectors and column matrices/vectors is an important one.
Matrix Addition and Scalar Multiplication

Generalizing from vector addition, we define matrix addition componentwise. If A = [aij] and B = [bij] are m×n matrices, their sum A + B is the m×n matrix obtained by adding the corresponding entries. Thus,

  A + B = [aij + bij]

[We could equally well have defined A + B in terms of vector addition by specifying that each column (or row) of A + B is the sum of the corresponding columns (or rows) of A and B.] If A and B are not the same size, then A + B is not defined.
Example 3. 3
Let A = [
1
4
-2 6
~].
B ~
1 ] [-: -1 2 . 0
Then
A+B = b ut neither A
+
C no r B
[ -~
5 6
"d
C =
[~
:]
-; ]
+ C is defi ned.
The com ponen twise defi n ition of scalar multiplication will come as no surprise. If A is an m Xn matrix and c is a scalar, then the scalar multiple cA is the mXn matrix o btained by m ultiplyi ng each e ntry of A by c. More fo rmally, we have
[ In te rms o f vectors, we could equivalently stipulate that each column (or row) of cA is c times the corresponding colum n (or row) of A.I
Example 3.4
For mat ri x A in Example 3.3 ,
2A = [
2
- 4
8 12
l~l
!A= [_: ~
~l
and
(- l)A =[- ~
- 4 - 6
The matrix (- I)A is written as - A and called the negativeo f A. As with vectors, we can use this fact to defi ne the difference of two matrices; If A and B are the same size, then A - B ~ A +(- B)
Sect ion 3.1
111m pie 3.5
lJ1
Ma tnx Operatio ns
For matrices A and B in Example 3.3,
]- [-3 o A - B= [ I 4 0 I
-2 6
5
3
A matrix all of whose entries are zero is called a zero matrix and denoted by O (or Om×n if it is important to specify its size). It should be clear that if A is any matrix and O is the zero matrix of the same size, then

  A + O = A = O + A   and   A - A = O = -A + A
MaUll MUlllpllclliOD ~13t hcll1aticia ns
are sometimes like Lewis Carroll's Hu mpty Dumpty: "Wb en I use a w\l rd ,~ Hu mpty Dumpty said, "it means just what I choose it to me-anneither more nor JdoS ( from 11Iro11811 Illf Loobll8 GIIIU), M
The Introduction in Sect ion 3.0 suggested that there is a "product" of matrices that is analogous to the compo sition of fun ctions. We now make Ihis no tion morc precise. The defini tion we arc about to give generalizes what you should have discovered in Problems 5 and 7 in Section 3.0. Unl ike the definitions of matrix addition and sca lar multiplicauon, the defi nitio n o f th e product of IwO m,l\rices is not a componentwise definiti on . Of cou rse, there is nothing to stop u s from defin ing a product o f matrices in a componenlwl5e fas hion; unfortunately such a defini tion has fcw ap plica tions and is not as "natu ral" as the one we now give.
Definition  If A is an m×n matrix and B is an n×r matrix, then the product C = AB is an m×r matrix. The (i, j) entry of the product is computed as follows:

  cij = ai1 b1j + ai2 b2j + ... + ain bnj

Notice that A and B need not be the same size. However, the number of columns of A must be the same as the number of rows of B. If we write the sizes of A, B, and AB in order, we can see at a glance whether this requirement is satisfied:

    A        B     =    AB
  m × n    n × r       m × r

The two inner dimensions must agree, and the outer dimensions then give the size of AB.
• T he formula fo r the entries of the product looks like a do t product, and indeed it IS. [t says that the ( I, j) entry of the matrix AB is the dot product of the ith row of A and the jth col umn of 8:
a"
(I ll
·•
• •
a,
a,
a,. • ••
a,.
·•
·..
boo
· ..
b11 . •
a.,
a."
• ••
·.
b. ,
a.
boo
•• •
b"
b" b"
• • •
b.,
Notice that, in the expression C'I = alibi) + (l;zb2j + ... + a'"/' "j' the "o uter subscripts" o n each ab term in the sum are always I and j where:lS the "inner subscripts" alw:lYs agree and increasc from I to 11. We see this pattern clearly if we write e'l using summatio n notation:
Example 3.6

Compute AB if

  A = [ 1  3 -1]        and        B = [-4  0  3 -1]
      [-2 -1  1]                       [ 5 -2 -1  1]
                                       [-1  2  0  6]

Solution  Since A is 2×3 and B is 3×4, the product AB is defined and will be a 2×4 matrix. The first row of the product C = AB is computed by taking the dot product of the first row of A with each of the columns of B in turn. Thus,

  c11 = 1(-4) + 3(5) + (-1)(-1) = 12
  c12 = 1(0) + 3(-2) + (-1)(2) = -8
  c13 = 1(3) + 3(-1) + (-1)(0) = 0
  c14 = 1(-1) + 3(1) + (-1)(6) = -4

The second row of C is computed by taking the dot product of the second row of A with each of the columns of B in turn:

  c21 = (-2)(-4) + (-1)(5) + (1)(-1) = 2
  c22 = (-2)(0) + (-1)(-2) + (1)(2) = 4
  c23 = (-2)(3) + (-1)(-1) + (1)(0) = -5
  c24 = (-2)(-1) + (-1)(1) + (1)(6) = 7

Thus, the product matrix is given by

  AB = [12 -8  0 -4]
       [ 2  4 -5  7]

(With a little practice, you should be able to do these calculations mentally without writing out all of the details as we have done here. For more complicated examples, a calculator with matrix capabilities or a computer algebra system is preferable.)
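The row-by-column recipe above translates directly into a triple loop. The following Python sketch is our own illustration (not the book's): it implements the definition for any conformable matrices and checks it on the matrices of Example 3.6.

```python
def matmul(A, B):
    # A is m x n, B is n x r (lists of rows); returns the m x r product AB
    m, n, r = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "columns of A must equal rows of B"
    C = [[0] * r for _ in range(m)]
    for i in range(m):
        for j in range(r):
            # dot product of row i of A with column j of B
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
    return C

A = [[1, 3, -1], [-2, -1, 1]]
B = [[-4, 0, 3, -1], [5, -2, -1, 1], [-1, 2, 0, 6]]
print(matmul(A, B))   # [[12, -8, 0, -4], [2, 4, -5, 7]]
```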
Before we go further, we will consider two examples that justify our chosen definition of matrix multiplication.

Example 3.7

Ann and Bert are planning to go shopping for fruit for the next week. They each want to buy some apples, oranges, and grapefruit, but in differing amounts. Table 3.1 lists what they intend to buy. There are two fruit markets nearby, Sam's and Theo's, and their prices are given in Table 3.2. How much will it cost Ann and Bert to do their shopping at each of the two markets?

Table 3.1
          Apples   Grapefruit   Oranges
  Ann        6          3          10
  Bert       4          8           5

Table 3.2
               Sam's    Theo's
  Apple        $0.10    $0.15
  Grapefruit   $0.40    $0.30
  Orange       $0.10    $0.20

Solution  If Ann shops at Sam's, she will spend

  6(0.10) + 3(0.40) + 10(0.10) = $2.80

If she shops at Theo's, she will spend

  6(0.15) + 3(0.30) + 10(0.20) = $3.80

Bert will spend

  4(0.10) + 8(0.40) + 5(0.10) = $4.10

at Sam's and

  4(0.15) + 8(0.30) + 5(0.20) = $4.00

at Theo's. (Presumably, Ann will shop at Sam's while Bert goes to Theo's.) The "dot product form" of these calculations suggests that matrix multiplication is at work here. If we organize the given information into a demand matrix D and a price matrix P, we have

  D = [6 3 10]      and      P = [0.10  0.15]
      [4 8  5]                   [0.40  0.30]
                                 [0.10  0.20]

The calculations above are equivalent to computing the product

  DP = [6 3 10] [0.10  0.15]   =   [2.80  3.80]
       [4 8  5] [0.40  0.30]       [4.10  4.00]
                [0.10  0.20]

Thus, the product matrix DP tells us how much each person's purchases will cost at each store (Table 3.3).

Table 3.3
          Sam's    Theo's
  Ann     $2.80    $3.80
  Bert    $4.10    $4.00
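The same computation takes only a few lines of Python; the sketch below is our own illustration (not from the text) and reproduces Table 3.3.

```python
D = [[6, 3, 10],
     [4, 8, 5]]                    # demand matrix: rows are Ann and Bert
P = [[0.10, 0.15],
     [0.40, 0.30],
     [0.10, 0.20]]                 # price matrix: rows are apple, grapefruit, orange

# (i, j) entry of DP = dot product of row i of D with column j of P
DP = [[sum(d * p for d, p in zip(row, col)) for col in zip(*P)] for row in D]
print(DP)                          # approximately [[2.80, 3.80], [4.10, 4.00]]
```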
Example 3.8

Consider the linear system

   x1 - 2x2 + 3x3 = 5
  -x1 + 3x2 +  x3 = 1              (1)
  2x1 -  x2 + 4x3 = 14

Observe that the left-hand side arises from the matrix product

  [ 1 -2  3] [x1]
  [-1  3  1] [x2]
  [ 2 -1  4] [x3]

so the system (1) can be written as

  [ 1 -2  3] [x1]   [ 5]
  [-1  3  1] [x2] = [ 1]
  [ 2 -1  4] [x3]   [14]

or Ax = b, where A is the coefficient matrix, x is the (column) vector of variables, and b is the (column) vector of constant terms.
You should have no difficulty seeing that every linear system can be written in the form Ax = b. In fact, the notation [A | b] for the augmented matrix of a linear system is just shorthand for the matrix equation Ax = b. This form will prove to be a tremendously useful way of expressing a system of linear equations, and we will exploit it often from here on. Combining this insight with Theorem 2.4, we see that Ax = b has a solution if and only if b is a linear combination of the columns of A.

There is another fact about matrix operations that will also prove to be quite useful: Multiplication of a matrix by a standard unit vector can be used to "pick out" or "reproduce" a column or row of a matrix. Let

  A = [4 2  1]
      [0 5 -1]

and consider the products Ae3 and e2A, with the unit vectors e3 and e2 chosen so that the products make sense. Thus,

  Ae3 = [4 2  1] [0; 0; 1] = [1; -1]     and     e2A = [0 1] [4 2  1] = [0 5 -1]
        [0 5 -1]                                             [0 5 -1]

Notice that Ae3 gives us the third column of A and e2A gives us the second row of A. We record the general result as a theorem.

Theorem 3.1  Let A be an m×n matrix, ei a 1×m standard unit vector, and ej an n×1 standard unit vector. Then
a. eiA is the ith row of A, and
b. Aej is the jth column of A.
Proof  We prove (b) and leave proving (a) as Exercise 41. If a1, ..., an are the columns of A, then the product Aej can be written

  Aej = 0a1 + 0a2 + ... + 1aj + ... + 0an = aj

We could also prove (b) by direct calculation: the kth entry of Aej is the dot product of the kth row of A with ej, and this is just akj, since the 1 in ej is the jth entry. Hence Aej is the jth column of A.
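Theorem 3.1 is easy to check numerically. The sketch below is our own illustration (using NumPy for brevity); it multiplies a small matrix by standard unit vectors on the left and on the right, recovering a row and a column.

```python
import numpy as np

A = np.array([[4, 2, 1],
              [0, 5, -1]])

e2_row = np.array([0, 1])        # 1 x m standard unit vector (m = 2)
e3_col = np.array([0, 0, 1])     # n x 1 standard unit vector (n = 3)

print(e2_row @ A)                # [ 0  5 -1]  -- the second row of A
print(A @ e3_col)                # [ 1 -1]     -- the third column of A
```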
Partitioned Matrices It will o ft en be convenient to regard a mat rix as being oomposcd of a number of smaller slIbmatrices. By introducing vertical and horizontal lines into a matrl.1(, we can partition it into blocks. There is a natural wlly to partition man y matrices, particularty those aflSlIlg in certain applications. Por example, consider the matri x
A ~
\
0
0
2
- \
0 0
\
0
\
0
1 4
3 0
0 0
0 0
0 0
1
7
7
2
It seems nalUralto Pllrtition A as 1
0
0
0 !• 2
-I
1
0 •'• \
3 0
• •
I _ :• .... 4 0__ ... 0......... 0 0 01 \ • 0 0 0 :i
.
, 7
[~ ~l
where lis the 3><3 identity matrix, B is 3X2, 0 is the 2 X3 zero matTlx,and Cis 2Xl. [n this way, we can vie\¥ A as a 2X2 matrix whose entries are themselves matrices. When matrices are being mul tiplied, there is ofte n an advantage to be gained by viewing them as partitioned ma trices. No t on ly does th is frequentl y reveal unde rl ying sl ructures, but it ofte n speeds up computa tIOn, espe<:ially when the ma trices are large and have man y blocks o f zeros. It tu rns ou t that the mult iplication of partitioned matrices is just like ordinary mat rix multiplication. We begin by considering some special cases of partitioned matrices. Each giyes rise to a different \"3Y of viewing the prod uct of two nKtt rices. Suppose A is III X II and B is lI >< r, so the produc t AB exists. [fwe partit ion B ill terms of Its wlu mn vectors, as B lb l ' h ! ' ... : b ,l. then
144
Chapter 3 Malrices
Th is result is an imm ediate consequence of the definitio n of matrix mul tiplicatIo n. The form o n the right is called the matrix-colum n rep resellttltiotJ of th e product.
Example 3.9
If
A=
[~
3 -I
~]
4
-I
I
2
3
o
then
3
Ab,. = [ 01
and
- I
3 - I
- I
~] ~
J32 :: - 5]2 . (Check by ordinary matm multi plicatio..+ n.)
Therefore , All = IAb l ; Ab 2 J = [
Observe that the matrix-colum n representa tion of AB allows us to writc cilch column of All as a linear combi nat ion of the columns of A with cntric.s from B as the coefficients. Fo r example,
Rillar.
3 - I
(See Exercises 23 and 26.) Suppose A is mX nand B is /I X r, so the produ ct All exists. If we partition A in terms of its row vecto rs, as
A, A,
........
A=
• ........
A, A, .......
A ,B .........
.......
then
All =
Il =
A,B
.........
• .......
• .........
A.
AmB
O nce again, thIS result IS a d irect consequence of Ihe definitIon of matnx multiplicatio n. The form on the rIght is called the row-1twtrix represemation of l he product.
Example 3.10
Usmg the row- matrix rep resentation, compute AB for the matrices in Example 3.9.
145
Sectio n 3. 1 Matrix Operations
Solltl..
We compute
, A ,B = [ \
3
2)
- \
\
2 = [13 5) 'nd
3
0
- \
A, B = [0
- \
\
2
3
0
!)
- 2)
= [2
Therefore AB := ,
4
[·~~!!-l = ['~'~'------~'l as before. A2B 2 - 2'
The defi n ition o f the matrix product A B uses the natural partitio n of A into rows and B into columns; this form might well be called the row-column rep'~jen laljon o f the product. V·le can also plIrlition A in lo columns and B into rows; this for m is called the column-row representation o f the product .
In this case, we have
8, A =
[3 r : al : ... : a"l and
B,
B =
.- -•... 8.
B, + .. :r-a"B~ • ......
(2)
8,
Notice thai the sum resembles a dot product expansion; the difference is that the indi-
vidual terms are matrices, not scalars. Let's make su re that this makes sense. Each term arB, is the product of 3n //IX I and a I X r matrix. Thus, each a,B, is an mX r Illatrixthe same size as AB. The products a,B, are called outer products, and (2) is called the 014t~rJ1roduct ~xpallsioF1 of AB.
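The outer-product expansion is also easy to verify numerically. In the NumPy sketch below (our own illustration; the matrices are arbitrary examples, not taken from the text), the sum of the outer products of the columns of A with the rows of B reproduces AB.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(2, 3))     # arbitrary 2 x 3 matrix
B = rng.integers(-3, 4, size=(3, 4))     # arbitrary 3 x 4 matrix

# sum of outer products a_i B_i, where a_i is column i of A and B_i is row i of B
outer_sum = sum(np.outer(A[:, i], B[i, :]) for i in range(A.shape[1]))

print(np.array_equal(outer_sum, A @ B))  # True: the expansion equals AB
```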
lllmpia 3.11
Compute the outer product expansion of An for the matrices in Example 3.9.
Solullan
We have
, nd
8, ..... 8 = 8,
B, The outer products are - \) = •
and
[0'
,........... - \
~
\ 2 .....••..... 3 0
141
Chapler 3 Matrices
(Observe that computing each outer product is exactly like filling in a multiplica tIo n table.) Therefore, the outer product expansion of AB is
-1]+[ 3 6] + [6 0]- [1 3 o -2
- I
0
J
2
We will make use of the outer product expansion in Chapters 5 and 7 when we discuss the Spectral Theorem and the singular value decomposition, respectively. Each of the foregoing part itions IS a special case of partitioning in general. A matrix A is sa id 10 be partitioned if horizontal and vertical li nes have been introd uced, subdividing A into submatrices called blocks. Partitioning allows A to be written as a matrix whose entries are its blocks. For example, 2 - I 1 3 A- ••••••••••••••• 0 0 1 1" ••••••••••••• 4 0 0 0 OJ 1 7 •• 0 0 OJ 7 2 1 0 0 0 1 0
3 :• 1 2!• 1 • •
4
212
- 1
8-
,nd
• 1 ••• 1
•• 1 - 5 : J 1 .............•. +• ...•...•.. +• ... • •
1
3: OJ• 0 o i• 2
0
1 10
0;3
arc panitioned matrices. They have the block st ructures
A" ] A- [An An A 21
B_ [8n
and
B"
Du
8"
~]
tf two matrices arc the same size and have been partitioned in the same way. It IS clear that they can be added and multiplied by scalars block by block. Less obvious is the fact that, with suitable partitioning, mat rices can be multiplied blockwise::as well. The next ex::ample illustrates th is process.
Example 3.12
Consider the matrices A and B above. If we Ignore for the moment the fac t that their en tries are matrices, then A appears to be a 2x2 matrix and Ba 2x3 matrix. Their product should thus be a 2X3 matnx given by
A"][B,,
An ~ 1 All Bll + All~l = [ A 21B II
+ A22~'
B" 8"
~]
All Bn
+ Alliin
A' IBu
A ! IB' 2
+ A22~2
A 21BI)
+ AI2Bn] + A!2Bn
But all of the products in this calculation arc actually IIlIItr;x prod ucts, so we need to m::ake sure that they are all defined. A quick check reveals that this is indeed the case, since the numbers of cO/limns in the blocks of A (3 and 2) match the numbers of rows in the blocks of B. The matrices A and B are said to be partitioned conformably for block multiplication. Carrying o ut the calculations indicated gives us the product AB in partitIOned form: 3 -I 2 1 -5 4
+
2 - 1 I 3 = 4 0
6
2 o 5 5 -5
Section 3.1
141
Matrix O perations
(When some of the b locks are 7.C ro matrices o r identity matrices, as is the case here, these calculations can be do ne q uite q uickly.) The calculatio ns fo r the o ther five blocks of AB are similar. Check that the result is 6
o
2 :• I ••
2 1 2 • •
I •. 12
5i2 • 5 -- --.. - 5' . +! ...... 3 __3.. t ....9. ... I 7 :• 0 0 : 23 7
• • •
••
2 i O 0 : 20
..-t-
(Observe tha t the block in the upper left co rne r is Ihe rcsult o f o ur calculations above.)
Check that you obtain the same answer by multiplying A by B in the usual way.
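Block multiplication is also easy to test numerically. The NumPy sketch below is our own illustration with an arbitrary partition (not the matrices of Example 3.12); it multiplies two matrices blockwise and compares the result with the ordinary product.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-2, 3, size=(5, 5))
B = rng.integers(-2, 3, size=(5, 6))

# Partition A into a 2 x 2 block structure and B conformably into 2 x 1 blocks
A11, A12 = A[:3, :3], A[:3, 3:]
A21, A22 = A[3:, :3], A[3:, 3:]
B1, B2 = B[:3, :], B[3:, :]

blockwise = np.vstack([A11 @ B1 + A12 @ B2,
                       A21 @ B1 + A22 @ B2])

print(np.array_equal(blockwise, A @ B))   # True: blockwise product equals AB
```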
Malr'. Powers When A a nd B are twO /I X " m atrices, thei r p roduct AB wil l also be an n X n m atrix. A special case occurs when A = B. It makes sense to defi ne A.! = AA and , in general, to
defi ne Ak as , Ie factor1
if k is a positive integer. Thus, A I = A, and it is convenient to defi ne All = I,.. Hefa re making too many ass um pt ions, we sh o uld ask o urselves to what exten t ma tn x powers behave like powers of real numbers. Th e fo llo w11lg properties fo ll ow immed iately fro m the defin it io ns we have just given and arc the mat ri x a nalogues of the corresponding pro perties for powers of real nu mbe rs.
If A is ,I square m a trix and r and safe nonnegative integers, then
•
J. A'A' = Ar+· 2. (A' )'= A" I
In Section 3.3 , we will extend the definitio n and properties to inclu(ie negative integer powers.
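Here is a quick numerical check of these two power laws; the sketch is our own illustration with an arbitrary matrix.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
r, s = 3, 2

lhs1 = np.linalg.matrix_power(A, r) @ np.linalg.matrix_power(A, s)
rhs1 = np.linalg.matrix_power(A, r + s)          # A^r A^s = A^(r+s)
lhs2 = np.linalg.matrix_power(np.linalg.matrix_power(A, r), s)
rhs2 = np.linalg.matrix_power(A, r * s)          # (A^r)^s = A^(rs)

print(np.array_equal(lhs1, rhs1), np.array_equal(lhs2, rhs2))   # True True
```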
Example 3.13
(a) [f A = [:
A
2
:]. the n
=[: : ][ ~ :J = [~ ~]. AJ= A2A =[~ ~][: :] = [~
and, in general,
:J
.II'" [-'" , 'l 2"
2" 2'
The above statemen t can be proved by m a thema tical induction, si nce it is an mfi m te collectio n of s ta tements, o ne fo r each natu ral num ber II. (Appe ndix B gives a
148
Chapter 3
Matnces
b rief review o f mathematicallllduction.) The basis step is to prove that th e formu la h olds for n = 1. In this case,
:] = A
as required. The induction hypothesIs is to assume that
22H] H fo r some integer k 2: I. The induction step is to prove that the formula ho lds for 1/ "" k + \. Using the d efinition o f matrix powers and the induct ion hypo thesis, we compute
:]
2<-1 + 2H] + 2.1-1
2H
"""-, ]
ii-H r l
- [i
H 1- 1
i h l)- I
T hus, the for mula holds for all n 2: 1 by the principle of mathematical induction. (b) If 8 ""
0 -I] [ I
0' then B2""
[0 -'][0 -I] [-I 0]. \
0
1
0
0
=
- I
Continuing,
we find /j~= 81 B =
[-Io - 0\ ][0 -I] [-~ ~] 1
0
,nd
B' = B' B= [ bus,
If =
° 1][0 -I] [' 0]
- 1 0
\
0
0
1
B, and the sequence of powers of B repeats in a cycle of fOll[:
[0 -I] [-I 0] [ ° '] [' 0] [0 -I] 1
0'
0
- 1' - \
0' 0
\ '
\
0'
Tbe Transpose 01 a Matrix Thus far, all of the mat rix operations we have defined are analogous to operations on real numbers, although they may not always behave in the same way. The next opera tion has no such analogue.
Section 3.1
Matrix Operat ions
141
, The transpou of an mX" matrix A is the nX til matrix A J' oh~ tained by interchanging the rows and columns of A- That is, the ith column of A r js the ith row of A for all i.
Definillon
Enmple 3.14
L
[~
3
0
:J.
~l·
8 = [:
, nd
C = [5
-1
2J
Then their tra nsposes are 5 3 0 • 2 1 1
AT =
aT=
[a/,
~J.
, nd CT =
5 - 1
2
•
4
The transpose is sometimes used to give an alternative defi mtlon of the dot product of two vectors in terms of matrix multiplication. If
u,
v,
u,
v,
and
v =
u,
then u· v -
v,
"I"I +
II~ V~
+ .. . +
II~V.
v,
-
=
[ II I
1/2
1
v,
II w
v,
>i A useful al ternative deflllition of the transpose is given component wise: (A 1)., = Ajo
for all i and j
In words, the entry in row i and column j of AT IS the saine as the entry in row } and column i of AThe transpose is also used to define a ver y Important type of sq uare matn x: a symmetric matrix.
Definition
A square malnx A is symmetric if AT its own transpose.
E... ple 3.15
L
1 3
2
35 20
0 4
A-that is, if A. is equal to
150
Chap teT 3
Matrices
T he n A is symmetric, since AT = A; but a IS not symmet ric, si nce aT =
[~
-1] 3
4= B.
-t
A sym met ric matrix has the property that it is Its own "mirror image" across its m ain diagonal. Figure 3.1 illustrates this propert y for a 3X3 ma tri x. The corresponding shapes represent equal e ntries; the diagonal entries ( those on the dashed line) are arbitrary. A component w ise defin it ion of a symmetric mat rix is also useful. It is si mpl y the algebraic description of the "reflection" propert y.
nuure 3.1 A symmetric matrix
A square m atrix A is sym metric if and only if A ~ = A" for all i a nd j.
L" A
~
[
3
- 1
~
D
~].
-3]
[ 0 -2
In Exercises /-16,
B=
1 •
l~
- 2 2
E ~ [4
COllljJlllt: II,e
:J. 2].
1 2
C ~
3 5
F ~
4 • 6
[-:]
indicated nUl/rices (,f
possible). 1. A
+ 2D
2. 3D - 2A
3. B - C
4.B - CT
5. AB
6. BD
7. D
+
8. BTB
BC
10. F(DF)
9. E(AF)
I I. FE 13. BTCT
12. EF -
(ClW'
15. A3
14. DA - AD 16.
(I2 - D ?
17. Give a n example of a nonzero 2x2 matnx A s uch that
A2 = O. 18. Let A =
[!
~]. Find 2X2 matrices Band C suc h that
AB = ACbut B#: C.
19. A fa ctory man ufactures three products (doohlc kies, gizmos, and widgets) a nd shIps the m to two wareh ouses fo r storage. The number of units of each product shipped to each warehouse is given by the matrix
A =
200
75
150
100
100
125
(whe re a,) is the nu mbe r of uni ts of product i sent to warehouse j a nd the products arc taken in al phabetical orde r). The cost o f shipping o ne unit o f each p roduct by tr uck is $1.50 per doohickey, $1.00 per gizmo,and 52.00 per widget. The co rres ponding unit costs to ship by trai n are $ 1.75, $1.50, and $1 .00. O rganize these costs into a ma trix B and then usc matrix multiplica· tion to show how the factor y can compare the cost of shippi ng its products to each o f the two wareho uses by tr uck and by train. 20. Referri ng to Exercise 19, suppose that the unit cost of dis tribu ting the products to stores ]s the same for each product but varies by warehouse because of the dis· tances invo lved. It costs $0.75 to distrib ute o ne uni t from warehouse I and $1.00 to d istri bute one unIt from wareho use 2. Orgalllze these costs into a matrix C and then use matrix multiplIcat Ion to compute the total cost of distributing each product.
SectIOn ].1
3 2. A =
2xz + 3x) "" 0 2xl + x 2 -5x) ""4 Xl -
[ '4
1
-2 33. A ,..
A = -3
1
1
2 0
- I
, B=
1
- I
0
1
4
the columns of A. 24. Use the row- matrix represen tation of the product to wrrte each row of AB as a linear combination of the rows o f B.
- 2 3
2
o
1
..3.. 4 i ...0
1
o B= o o
••
,
.~
1
oi - I
o ] o 0 o 0
1
- I
o :• 0
1
•
o
] : I
.
"1'
0
i• 0
- I
o
O!]
]
I
,
3
I
0 2
0
1 4
I
I
3
:
B=
'
O[ 4
o.............. 0 1 ·1·· .. ••1 I
I
] :- 1
(a) Compute AZ, AJ , .•. , Ai. (b) What is AlGOL ? Why? 1
26. Usc the matrix-column representallon of the product to write each column of BA as a linear combination of
28. Com pute the o u ter prod uct cxp:msion of BA.
37. letA =
29. Prove that if the columns of B a re linea rly dependent. then so are the colu m ns of A B.
30. Prove that if the rows o r A arc linearly dependen t, then so arc the rows of AB.
1
I
Vi
Vi
. Find, wit h just ificatio n,
[~
:
J. Find a rormula rorA~ (1I ~ J)and
veriry your rormula using mathcmaticallllduction.
_ [COSO
- SinO]
si n 0
cos O
38. Le tA -
.
COS20
-Sin29]. cos 20
(3) Show that A l = [ . Sin 29
(b) Prove, by ma thematical inductIon, that
A"=
lira I lire produci AB I/wkes
sellse.
- Vi
B lOO l.
the columns of B. 27. Use the row-ma trix represen tation of the produc t to write each row of BA as a linear combination of the rows of A.
1
Vi
36. Let IJ =
25. Compu te the outer product expansion of AB.
(mlWlC
4
35. LetA = [_~ :J.
3 0 - I 1 6
5
1
34. A -
23. Use the mat rix-column representation of the product to wrrte each column of AB as a Irnear combination of
I" Exercises 29 rlml 3D,
5
100
-,
1 0
,
1
o I! I
III Exercises 2~28, let
arid
3
0 001
1 ,
- I
151
o:1
In ExerCISes 21-22, write r/ie givetl system of lillear eqllat/O/IS as a matrix equation of rlre form Ax = b. 21.
Matrix Operations
COS 110
.
[ sm
119
- si n 110]
ro r n> I.
cos 110
39. In each or the rollowing, flOd the 4X4 matrix A = (ll~ 1 that satisfies the given condition: ~ j
(a) ll., = (- 1)''"1
(b)
(c) Q., =(i - IY
(d ) a,, =sm
QIj =
j
. (U+ j - I)") 4
40. In each or the rollowlng, find the 6X6 matrl."( A = (Q 1
III &ercises 3 /- 34, compllie A B iJy Mock IIIII/liplicatioll, using tire i"tiict/ted ptrrlllio,,,,rg.
31. A
-
1 0 0
- I
• ••• •••
0
0
1 •• 0
... ,
0
0 •'
J
,
, B=
3
••• • •• •
, 1
0
- I 0 ................... 0 •: I
o 0
•
O! 1
9
that satisfies the given condition:
(a)
a.,
(c) a 'I
i+j = { 0
={I
0
ifi S j
if I
>j
(b)
if6
4 1. Prove Theorern 3.1(3 ).
ll'l ::O
{' if [i - , [ S l 0 if li - JI > I
152
Chapter J
Mat rices
•
Matrix Algebra In some ways, the arith metic of milt riccs gcnerillizes that of vecto rs. We do not expect any surp rises with respect to addition and scala r multiplicatIon and mdeed there are none. Th is will allow us to extend to matrices several concepts that we are already familiar with from our wo rk with vectors. In particular, hnear combUlalions. spann ing sets, and linear independence carryover to matrices with no difficulty. However, matrices have o ther o perations, such as matrix multiplication, that vecto rs do not possess. We sh ould not expect matrix multiplication to behave like multip lICatIon of real n umbers unless we can prove that it does; in fact . it does no t. In this sect ion, we sum marize and prove some of the main proper ties of matrix o perations and begin to develop an algebra of matrices.
Properties 01 Addllion and Scalar Multiplication All of the algebraic properties of addition ;\nd scalar multiplication fo r vecto rs (Theorem 1.1 ) carry over to ma trices. For completen ess, we summa rize these properties in the next theorem.
Theorem 3.2
Algebraic Properties of Matrix Addition and Scalar Multiplication Let A, B, and C be mat rices of the same size and let cand (I be scalars. Then Commutati\'ity Associativity
a.A+B = B + A b. (A + 8) + C"" A + (B + C) c. A+O=A d. A +(- A)=O e. c(A + 8) = cA + cB f. (c + (I)A = cA + dA g. , I dA ) = ("I )A h. IA = A
DistributivilY Distributi vii y
•
The proofs of these p ro perties are duect analogu es of the correspo nd ing proofs of the vector p ro p~ rt i~s and are left as exefClses. LikeWise, the comments following Theorem 1.1 are equally valid here, and you shou ld have no difficulty usi ng these properties to perfo rm algebraic manipulations With matrices. (ReView Example I.S and see Exercises 17 and 18 at the end of this section.) The associat ivity property allows us to unam bigu ously combine scalar multiplication and addit ion without parentheses. If A. B, and C are matrices of the same size, then ( 2A
+ 38)
-
c = 2A + (38
- C)
a nd so we can simply write 2A + 3B - C. Generally, Ihen, if AI' A1• ... , At are matrices o f the same size and c p cz•...• c1 are scalars, we may form the linear combination clA I
+ c2 A1 + ... + etA,
We will refer to cl • '1 •... , c. as the coe{ficients o f the linear combinati on. We can now ask and answer questions about linear combinalio ns of matrices.
5«tion 3.2
Example 3.16
LetAI =[_~ ~J. A2 =[~ ~].alldAJ =[ :
153
Matrix Algebra
:].
(a ) 158 =
[ ~ ~]alinearcomblOationOfAI'Al. andAl?
(b) Is C =
[~ ~] a linear combination of AI' A
l•
and A,?
Solution (a) We want to find scalars ' I' S' and c, such tha t CIA I + <;A2 + clA, = B. Thus,
The left ~ hand side o f this equation can be rewritten as
Comparing entries and using the defini tion of matrix equality, we have equa tions
'I -c.
(OU T linear
0+ C,= 1
+ cJ = 4 +c,= 2 '1+ c,=
1
Gauss-Jordan elimination easily gives
o
1
1 0
- I
o
1
1
o
0
I
1 4
o o
1
o
-2
0
1
3
000
0
1
0
1 2
1
1
1
•
(checklh is!),so'l = 1' (2 = -2,and c, = 3.Thus,A . - 2Al easily checked. (b) ThiS time we want to solve
Proceeding as
to
part ( 3), we obtain the linear system
'1+ c, =1
'. - (I
+', = 2 +c,=3 '1+,)= 4
+ lA, = B,which can be
Row reduction gives 1
1 1
1 0
1 2
0 - 1
0
1 3
0
1
1 4
,
R, - I(
0 1 1 1 0 1 - I 0 1 0 0 0
1
2 3 3
We need go no further : The last row implies that there is no solution. Therefore. in this case, C is 110t a linearcombillation of AI' A2, and Ay
Remarll
Observe that the columns of the augmented matrix contain the ent ries o f thc matrices we are given. If we read the en tries of each matrix from left to righ t and top to bOllom, we get the order in which thc ent ries appea r in the col umns of the augmented matrix. For example, we read AI as "0, I, - ],0," which corresponds to the first column of the augmented matrix. It is as if we simpl y "straightened out" the given matrices 111 10 column vectors. T hus, we would have ended up with exactly the same system o f linear equations as in part (a) if we had asked 1
"
4 2
a linear combinat ion o f
0
1
1 , - 1
0
0
1
0 1
1
,
, nd
1 1
, •
1
We will encounter such pa rallels repeated ly from now on. In Chapler 6, we will explore them III more detail. We can defille the spall o f a set of matr ices 10 be the set of all linear combinat ions of the matrices.
Describe the span of the matrices AI' A2, and AJ in Example 3.16.
Solullon
O ne way 10 do Ihis is simply to write out a general linear combinatio n of AI' A ~. and AJ • Thus, clA I
+ 0 Al +
c)A,:: cI[
_~ ~] + ~[~ ~] + C)[:
:J
(wh ich is analogous to the param etric rep rcsentalion of a plane). Hut suppose we wan t to know when the matrix [;
:J
is in span (Ap A 1 • AJ }. From the representa-
tion above, we know that it is when
[
-~
: ::
:: :
: :] = [ ;
:]
for some choice of scalars cl ' S. c,. Th is gives rise to a system of linear eq uations whose left-hand side IS exactly the S.1l11 e as in Example 3.16 but whose righi-hand side
Section 3.2
155
Matrix Algebra
is generaL The augmented matrix of Ihis system is 0
1
1 w
I
0
1 x
- I
0
0
1
1 Y 1
,
and row reduction produces 0
I
I w
I
0
\
x
- \
0
\
Y
0
\
\
,
,
\
0
0
0 0
\
0
0
\
0
0
0
}x - ~ y - tx - iy + w !x + ly
. w- ,
,
(Check Ihis carefully.) The only restriction comes fro m the last row, where clearl y we must have w - z = 0 in order to have a solution. Thus, the span o f AI> A 21 and AJ COn-
.
slstsof all mat rices
["xl z ' y
.
for whLch w = z. That Ls, span (AI>AZ,A J ) =
{[wxl} y w .
.+ lIole
If we had known this before attempting Example 3. 16, we wo uld have seen
immediately that B =
[I2 4]is a linear combination of AI' A21and A
Ihe necessary form (take w II , x = 41andY = 2),butC=
[~
!]
J,
since it has
can not be alin-
ear combination o f AI' Al , and A, l since it does not have the proper form ( I 'fI:. 4). l inear independence also makes sense for mat rices. We say that mat rices AI' A l , ••• , At of the same size are linearly independent if the only solu tio n o f lhe equatio n
(\) is the trivl3l Olle: (. = ~ = .. = (l = O. If the re a re no ntrivial coefficients that satisfy (I ), then AI' A2•• •• , Ai are called linearly dependent.
Example 3.18
Determ ine whether the matrices AI' A l • and A) in Example 3.16 ar linearly
independent.
Sollilion
We want to solve the equatio n matrices, we have
(I[ _~
CI A .
+
CzA2
~J + (l[~ ~] + 0[:
+
c, A) = 0. Writing out the
:J [~ ~J =
This time we get a homogeneous linear syslem whose left -hand side is the same as in Examples 3.16 and 3. 17. (Are yo u slarting to spot a pallern yet?) The augmented mlllrix row reduces 10 give
o
1
I 0
I - \
0 0
1 0 J 0
o
1
I
0
,
\
o
o
\
o o
0 0
0 0 o 0 I 0 0 0
156
Chapter 3 Matrices
Thus, c1 = c2 = c3 = 0, and we conclude that the matrices A1, A2, and A3 are linearly independent.
Example 3.19
Consider the matrices A ~
[ -~ -~]
,nd
B= [:
~]
Multiplying gives AB = [
2
- ]
-~][ ~
:]
[-: -~]
and
BA
[: ~][ -~ [:
-~]
~]
[rhus, AB"* BA. So, in contrast to m ultiplication of real numbers, matrix multiplication is /lot commutative-the order of the factors in a product matters! It is easy to check that A2 =
[~ ~]
(do so! ). So, for matrices, the equation
does not im ply that A = 0 (unlike the situation for real numbers, where the equation X- = 0 has only x = 0 as a solution ). A~
=0
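Both observations in Example 3.19 take one line each to confirm on a computer. The sketch below is our own illustration with example matrices (not necessarily those of Example 3.19): the first pair shows AB ≠ BA in general, and the last matrix is nonzero yet squares to the zero matrix.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

print(A @ B)            # [[2 1] [4 3]]
print(B @ A)            # [[3 4] [1 2]]  -- different, so matrix multiplication is not commutative

N = np.array([[0, 1], [0, 0]])          # a nonzero matrix whose square is zero
print(N @ N)            # [[0 0] [0 0]], even though N itself is not the zero matrix
```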
However gloomy things might appear after the last example, the situation is not really bad at all- you just need to get used to worki ng with matrices and to constantly remind yourself that they are not numbers. The next theorem summarizes the main properties of matrix multiplication.
Theorem 3.3
Prdperties of Matrlx Multiplication Let A, B, and Cbe matrices (whose sizes are such that the indicated opera tions can be performed) and let k be a scalar. Then , . A(BCI
~
(ABIC
b. A(B + C) = AB + AC c. (A + B)C= AC + BC d. k(ABI ~ (kAIB ~ A(kBI e. {rn A = A = Aln if A is mX /J
Associativity Left dislribulivilY Right distribulivity ~tultiplicat ive
identity
Proof We prove (b) and half of (e). We defer the proof of property (a) until Section 3.6. The remaining properties are considered in the exercises.
•
Section 3.2
Ma tr ix Algebra
(b) To prove A ( B + C) = AB + AC, we let th e rows of A be denoted by Aj and the columns of Band Cby bl and c, Then the;th column o f B + C is bJ + cJ (since addilio n is defined component Wise), and thus [ A(B + C) ], ~ A. · (b,
+ c,)
= A" bJ+ A" cj ~ (A B)"
+ ( Ae),;
+ AC)" Since th is is true fo r all j and j. we must have A( B + C) = ~ (A B
AB
+ AC.
(el To prove Ai" = A, we note lhat the identity matrix I" can be column partitio ned as
I" ==
rei' c
l ' ... :
c,,]
where c, is a standard unit vector. Therefore,
AI" = [Acl : Ac, ' . . , ; Ae,, ] = [ a l : a~ :
... :a"]
~ A
by Theorem 3.1(b). We ca n usc these properties to further explo re how closely matrix multiplication resembles Illultiplicahon of real numbers.
(xample 3.20
If A and B are square matrices o f the S<1mc size.
50lullon
LS (A +
8 )l = Al
+ 2AB +
If?
Using properties of matrix multiplication, we compute
(A
+ 8)'
~ (A ~
+ 8)(A + 8)
(A + 8)A + (A + 8)B
By left dbtribu tivity
+ AB + Ii B) right distribuIIVil)· Therefore, (A + 8)1 ::: A2 + 2AB + Ef if and only if Al + SA + AB + Ii = Al + 2AB + W. Subtract ing Al and If from both sides gives BA + A8 = 2AIl Subtracting AB from both sides gives I3A = A Il Thus, (A + B) 1 = A 2 + 2A IJ + B2rr and only if A :0
A2 + BA
and B commute. (Can you give an example of such a pair o f matrices? Can you find two matrices that do not satisfy this pro perty?)
Properlles 01 .he Transpose Theor... 3.4
Properties of the Transpose LeI A and B be matrices (whose sizes arc such that the indicated operations can be performed ) and Ict k be a scalar. Then .b. (A + a. (A'Y= A c. ( kA )T"" k(A I ) ~. (AB ) c. (A,)T = (AT)' for all nonnegative integers r
Bf"., ~
AT + HT
B' A
158
Chapter 3
Matrices
P,.ot
Properties (a)-(cl arc intuitively dear and straightforward to p rove (sec Exercise 30). Proving property (el is a good exercise in mathematical inductIOn (sec E.'(em se 3 1). We wi ll prove (d ), since it is no t what you might have expected. [Wo uld yo u have suspected that (An) T = ATBT might be true?1 First, If A is m X tI and Jj is /I X r, then BT is r X /I and AT is II X m. Thus, the product OTAT is defined and is r X III. Since AB IS III X r, ( AB) Tis r X III, and so (Am T and HTAT have the same si ze. We must now p rove that thei r correspondi ng entries arc equal. We denote the tth row of a matrix X by row,(X) and its jth colum n by col ,(X). Using these com'en tions, we sec that
[( AB)'l., = (A B),. "" rowj(A) . col,(H) "" coIAAT ). row,( H"'}
= row,(B"'} . co l~A') "" [BTA1" ( Note that we have used the definition of matrix multiplication, the defimtlon of the transpose, and th e fact that the do t product is commutative.) Since J and J are arbitrary, this resuh implies that (A B)T = BTAT.
1I ••• rll
Properties (b ) and (d ) of Theorem 3.4 can be generalized 10 sums and products of finitel y m::l lly mat rices:
(A\
+ A2 + ... + A~V =
A[ + AI + .. + A[ and (AlAI ·'· AI) T
"'" AT . AJAi assuming that the sizes of the matrices ::I re such that all of the operatio ns can be perform ed. You arc :lsked to p rove these facts by mathe matical induction III Exercises 32 and 33.
Ellmpll 3.21
Let A =
Then AT""
[~
[~
!l
- I
B 0:
and
[ ;
3
~l
3] soA + AT"" [2 !J. a sym metric matrix. 4 '
5
We have
- I
2 3
0
1
4
JjT ""
BBT =
- I [ :
- I
2 3
0
1
4
.nd
BTB
=
3
~l
[;
- I
2 3
0
1
4
- I 3
~l =
=
[1; I:] 20 2 2
2 10
3
,3 1
Thus, both BBT and BTB are sym metric, even tho ugh B is no t even square! (Check that AA T and ATA are also sym metric.)
Section 3.2
159
Matrix Algebra
The next theo rem says that the results of Example 3.21 arc true in general.
Theorem 3.5
a. If A is a square matrix, then A + AT is a symmetric matrix. b. For any matrix A, AA Tand ATA are symmet ric matrices.
Prool
We prove (a) and leave proving (b) as Exercise 34. We simply check that (A
+ Al) T = AT + (A l) T = AT + A = A + AT
(using properties of the transpose and the commutativity of mat rix addition ). Thus, A + AT is equal to its own transpose and so, by definition, is symmetric.
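Both parts of Theorem 3.5 are easy to check numerically. The NumPy sketch below is our own illustration with arbitrary matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.integers(-5, 6, size=(3, 3))     # square, for part (a)
B = rng.integers(-5, 6, size=(3, 4))     # need not be square, for part (b)

def is_symmetric(M):
    return np.array_equal(M, M.T)

print(is_symmetric(A + A.T))                            # True  (Theorem 3.5(a))
print(is_symmetric(B @ B.T), is_symmetric(B.T @ B))     # True True  (Theorem 3.5(b))
```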
III Exercises 1- 4, solve the equation for [;
~]atldB= +
I. X - 2A
x, givetl that A =
~ ].
[- :
A.
- \
-\
9. span(A p A2l in Exercise 5
10. span(Ap A2 , A3 ) in Exercise 6
In Exercises 5-8, write B ,IS a litlear combination of tile other matrices, if possible.
5.B = [~ ~l AI=[_: ~l Al=[~
:]
6,8 =[' - 4
-\]
,
A = [0\
11. span(Ap A~ > A 3) in Exercise 7 12. span(A1> Al > AJ> A4 ) in Exercise 8
In Exercises 13-16, determine whether the given matrices are linearly independem.
!J. [~
~] : ], [ -~ ~].[:
0'
13. [;
[~
7. 8 [3 0
\ A I -- [ 0
=
-1]
0 1
14. [:
0 '
0
-I
\5.
0
2
-2
0
0
3
\
- 2 ,
o \ \ 0 0 I , 000
-\ A) =
0 0 10 ,
o o
AI =
002
A2 =
0
In Exercises 9-12, find the general form of the spall of tlte indicated matrices, as in Example 3.17.
4. 2(A - B + X) = 3(X - A)
8. B =
1
38 = 0
3. 2(A + 2B) = 3X
A2 = [
- I
00\
2.2X = A - B
A) =
=
1
0
0
o \ o 0
\ - \
o, - \
\
\
- 2
- \
3 ,
0
\
I
0
2
\
0
\
3
0
4
9
0
2 , 2 - \ 0 \ \ -\ 0 16. 0 2 0 , 0 2 6 5
2 0 0
- \
\
0
0
- \
0
0
0
- 4
:] - \
-3
\
9 5
,
4 2
0
, 0 1 0 , 0
3
5
160
Chapter 3
Matrices
17. Prove Theorem 3.2(a)-(d).
18. Prove Theorem 3.2(e)-(h).
19. Prove Theorem 3.3(c).
20. Prove Theorem 3.3(d).
21. Prove the half of Theorem 3.3(e) that was not proved in the text.
22. Prove that, for square matrices A and B, AB = BA if and only if (A - B)(A + B) = A^2 - B^2.

In Exercises 23-25, if B = [a b; c d], find conditions on a, b, c, and d such that AB = BA, where A is the 2 × 2 matrix given in each exercise.

26. Find conditions on a, b, c, and d such that B = [a b; c d] commutes with both [1 0; 0 0] and [0 0; 0 1].
27. Find conditions on a, b, c, and d such that B = [a b; c d] commutes with every 2 × 2 matrix.
28. Prove that if AB and BA are both defined, then AB and BA are both square matrices.

A square matrix is called upper triangular if all of the entries below the main diagonal are zero. Thus, the form of an upper triangular matrix is

  [* * ... *; 0 * ... *; ... ; 0 0 ... *]

where the entries marked * are arbitrary. A more formal definition of such a matrix A = [a_ij] is that a_ij = 0 if i > j.

29. Prove that the product of two upper triangular n × n matrices is upper triangular.
30. Prove Theorem 3.4(a)-(c).
31. Prove Theorem 3.4(e).
32. Using induction, prove that for all n >= 1, (A_1 + A_2 + ... + A_n)^T = A_1^T + A_2^T + ... + A_n^T.
33. Using induction, prove that for all n >= 1, (A_1 A_2 ... A_n)^T = A_n^T ... A_2^T A_1^T.
34. Prove Theorem 3.5(b).
35. (a) Prove that if A and B are symmetric n × n matrices, then so is A + B.
    (b) Prove that if A is a symmetric n × n matrix, then so is kA for any scalar k.
36. (a) Give an example to show that if A and B are symmetric n × n matrices, then AB need not be symmetric.
    (b) Prove that if A and B are symmetric n × n matrices, then AB is symmetric if and only if AB = BA.

A square matrix is called skew-symmetric if A^T = -A.

37. Determine which of the given matrices are skew-symmetric.
38. Give a componentwise definition of a skew-symmetric matrix.
39. Prove that the main diagonal of a skew-symmetric matrix must consist entirely of zeros.
40. Prove that if A and B are skew-symmetric n × n matrices, then so is A + B.
41. If A and B are skew-symmetric 2 × 2 matrices, under what conditions is AB skew-symmetric?
42. Prove that if A is an n × n matrix, then A - A^T is skew-symmetric.
43. (a) Prove that any square matrix A can be written as the sum of a symmetric matrix and a skew-symmetric matrix. (Hint: Consider Theorem 3.5 and Exercise 42.)
    (b) Illustrate part (a) for the matrix A = [1 2 3; 4 5 6; 7 8 9].

The trace of an n × n matrix A = [a_ij] is the sum of the entries on its main diagonal and is denoted by tr(A). That is, tr(A) = a_11 + a_22 + ... + a_nn.

44. If A and B are n × n matrices, prove the following properties of the trace:
    (a) tr(A + B) = tr(A) + tr(B)
    (b) tr(kA) = k tr(A), where k is a scalar
45. Prove that if A and B are n × n matrices, then tr(AB) = tr(BA).
46. If A is any matrix, to what is tr(AA^T) equal?
47. Show that there are no 2 × 2 matrices A and B such that AB - BA = I_2.
The Inverse of a Matrix

In this section, we return to the matrix description Ax = b of a system of linear equations and look for ways to use matrix algebra to solve the system. By way of analogy, consider the equation ax = b, where a, b, and x represent real numbers and we want to solve for x. We can quickly figure out that we want x = b/a as the solution, but we must remind ourselves that this is true only if a ≠ 0. Proceeding more slowly, assuming that a ≠ 0, we will reach the solution by the following sequence of steps:
  ax = b  ⇒  (1/a)(ax) = (1/a)b  ⇒  ((1/a)a)x = b/a  ⇒  1·x = b/a  ⇒  x = b/a
(This example shows how much we do in our head and how many properties of arithmetic and algebra we take for granted!)
To imitate this procedure for the matrix equation Ax = b, what do we need? We need to find a matrix A' (analogous to 1/a) such that A'A = I, an identity matrix (analogous to 1). If such a matrix exists (analogous to the requirement that a ≠ 0), then we can do the following sequence of calculations:

  Ax = b  ⇒  A'(Ax) = A'b  ⇒  (A'A)x = A'b  ⇒  Ix = A'b  ⇒  x = A'b

(Why would each of these steps be justified?)
Our goal in this section is to determine precisely when we can find such a matrix A'. In fact, we are going to insist on a bit more: We want not only A'A = I but also AA' = I. This requirement forces A and A' to be square matrices. (Why?)
Definition
If A is an n × n matrix, an inverse of A is an n × n matrix A' with the property that

  AA' = I  and  A'A = I

where I = I_n is the n × n identity matrix. If such an A' exists, then A is called invertible.

Example 3.22

If A = [2 5; 1 3], then A' = [3 -5; -1 2] is an inverse of A, since

  AA' = [2 5; 1 3][3 -5; -1 2] = [1 0; 0 1]  and  A'A = [3 -5; -1 2][2 5; 1 3] = [1 0; 0 1]

Example 3.23

Show that the following matrices are not invertible:
(a) O = [0 0; 0 0]    (b) B = [1 2; 2 4]

Solution
(a) It is easy to see that the zero matrix O does not have an inverse. If it did, then there would be a matrix O' such that OO' = I = O'O. But the product of the zero matrix with any other matrix is the zero matrix, and so OO' could never equal the identity matrix I. (Notice that this proof makes no reference to the size of the matrices and so is true for n × n matrices in general.)
(b) Suppose B has an inverse B' = [w x; y z]. The equation BB' = I gives

  [1 2; 2 4][w x; y z] = [1 0; 0 1]

from which we get the equations

  w + 2y = 1
  x + 2z = 0
  2w + 4y = 0
  2x + 4z = 1

Subtracting twice the first equation from the third yields 0 = -2, which is clearly absurd. Thus, there is no solution. (Row reduction gives the same result but is not really needed here.) We deduce that no such matrix B' exists; that is, B is not invertible. (In fact, it does not even have an inverse that works on one side!)
Remarks
• Even though we have seen that matrix multiplication is not, in general, commutative, A' (if it exists) must satisfy A'A = AA'.
• The examples above raise two questions, which we will answer in this section: (1) How can we know when a matrix has an inverse? (2) If a matrix does have an inverse, how can we find it?
• We have not ruled out the possibility that a matrix A might have more than one inverse. The next theorem assures us that this cannot happen.

Theorem 3.6
If A is an invertible matrix, then its inverse is unique.
Prool
In mathematics, a standard way to show that there is just one of something is to show that there cannot be more than one. So, suppose that A has two inversessay, A' and Aff. Then AA' = / = A'A
Thus,
A'
and
AA "= I =A ~ A
= A', = A'(AA") = (A' A)A" = lA" = A"
Hence, A' = A", and the inverse is unique. Thanks to this theorem, we ca ll now refer to the inverse of an invertible matrix. From now on, whell A IS invertible, we ,,,.11 denote its (unique) IIlversc by A - 1 (pronounced " A inverse" ).
Warning
Do not be tempted to write A^-1 = 1/A! There is no such operation as "division by a matrix." Even if there were, how on earth could we divide the scalar 1 by the matrix A? If you ever feel tempted to "divide" by a matrix, what you really want to do is multiply by its inverse.

We can now complete the analogy that we set up at the beginning of this section.
Theorem 3.7
If A is an invertible n × n matrix, then the system of linear equations given by Ax = b has the unique solution x = A^-1 b for any b in R^n.

Proof
Theorem 3.7 essentially formalizes the observation we made at the beginning of this section. We will go through it again, a little more carefully this time. We are asked to prove two things: that Ax = b has a solution and that it has only one solution. (In mathematics, such a proof is called an "existence and uniqueness" proof.)
To show that a solution exists, we need only verify that x = A^-1 b works. We check that

  A(A^-1 b) = (AA^-1)b = Ib = b

So A^-1 b satisfies the equation Ax = b, and hence there is at least this solution.
To show that this solution is unique, suppose y is another solution. Then Ay = b, and multiplying both sides of the equation by A^-1 on the left, we obtain the chain of implications

  Ay = b  ⇒  A^-1(Ay) = A^-1 b  ⇒  (A^-1 A)y = A^-1 b  ⇒  Iy = A^-1 b  ⇒  y = A^-1 b

Thus, y is the same solution as before, and therefore the solution is unique.
So, returning to the questions we raised in the Remarks before Theorem 3.6, how can we tell if a matrix is invertible and how can we find its inverse when it is invertible? We will give a general procedure shortly, but the situation for 2 × 2 matrices is sufficiently simple to warrant being singled out.
Theorem 3.8
If A = [a b; c d], then A is invertible if ad - bc ≠ 0, in which case

  A^-1 = (1/(ad - bc)) [d -b; -c a]

If ad - bc = 0, then A is not invertible.

The expression ad - bc is called the determinant of A, denoted det A. The formula for the inverse of [a b; c d] (when it exists) is thus 1/det A times the matrix obtained by interchanging the entries on the main diagonal and changing the signs on the other two entries. In addition to giving this formula, Theorem 3.8 says that a 2 × 2 matrix A is invertible if and only if det A ≠ 0. We will see in Chapter 4 that the determinant can be defined for all square matrices and that this result remains true, although there is no simple formula for the inverse of larger square matrices.
Proof
Suppose that det A = ad - bc ≠ 0. Then

  [a b; c d][d -b; -c a] = [ad - bc, -ab + ba; cd - dc, -cb + da] = [ad - bc, 0; 0, ad - bc] = det A [1 0; 0 1]

Similarly,

  [d -b; -c a][a b; c d] = det A [1 0; 0 1]

Since det A ≠ 0, we can multiply both sides of each equation by 1/det A to obtain

  [a b; c d]((1/det A)[d -b; -c a]) = [1 0; 0 1]  and  ((1/det A)[d -b; -c a])[a b; c d] = [1 0; 0 1]

[Note that we have used property (d) of Theorem 3.3.] Thus, the matrix (1/det A)[d -b; -c a] satisfies the definition of an inverse, so A is invertible. Since the inverse of A is unique, by Theorem 3.6, we must have

  A^-1 = (1/det A)[d -b; -c a]

Conversely, assume that ad - bc = 0. We will consider separately the cases where a ≠ 0 and where a = 0. If a ≠ 0, then d = bc/a, so the matrix can be written as

  A = [a b; c d] = [a b; ka kb]

where k = c/a. In other words, the second row of A is a multiple of the first. Referring to Example 3.23(b), we see that if A has an inverse [w x; y z], then

  [a b; ka kb][w x; y z] = [1 0; 0 1]

and the corresponding system of linear equations

  aw + by = 1
  ax + bz = 0
  kaw + kby = 0
  kax + kbz = 1

has no solution. (Why?)
If a = 0, then ad - bc = 0 implies that bc = 0, and therefore either b or c is 0. Thus, A is of the form

  [0 0; c d]  or  [0 b; 0 d]

In the first case, the first row of A is zero, so the first row of any product A[w x; y z] is also zero and can never equal the first row of I. A similar argument shows that a matrix of the second form cannot have an inverse either. (Verify this.) Consequently, if ad - bc = 0, then A is not invertible.
Example 3.24

Find the inverses of A = [1 2; 3 4] and B = [12 -15; 4 -5], if they exist.

Solution
We have det A = 1(4) - 2(3) = -2 ≠ 0, so A is invertible, with

  A^-1 = (1/(-2))[4 -2; -3 1] = [-2 1; 3/2 -1/2]

(Check this.) On the other hand, det B = 12(-5) - (-15)(4) = 0, so B is not invertible.
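Theorem 3.8 translates directly into a short routine. The sketch below is only an illustration: it assumes NumPy is available, and the function name inverse2x2 is ours, not from the text. It reproduces the computations of Example 3.24.

    import numpy as np

    def inverse2x2(M):
        # Formula of Theorem 3.8: swap the diagonal entries, negate the
        # off-diagonal entries, and divide by the determinant ad - bc.
        a, b = M[0]
        c, d = M[1]
        det = a * d - b * c
        if det == 0:
            raise ValueError("matrix is not invertible (det = 0)")
        return np.array([[d, -b], [-c, a]]) / det

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    print(inverse2x2(A))     # [[-2.   1. ]
                             #  [ 1.5 -0.5]]

    B = np.array([[12.0, -15.0], [4.0, -5.0]])
    # inverse2x2(B) would raise ValueError, since det B = 0.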
Example 3.25

Use the inverse of the coefficient matrix to solve the linear system

  x + 2y = 3
  3x + 4y = -2

Solution
The coefficient matrix is the matrix A = [1 2; 3 4], whose inverse we computed in Example 3.24. By Theorem 3.7, Ax = b has the unique solution x = A^-1 b. Here we have b = [3; -2]; thus, the solution to the given system is

  x = A^-1 b = [-2 1; 3/2 -1/2][3; -2] = [-8; 11/2]
Solving a linear system Ax = b via x = A^-1 b would appear to be a good method. Unfortunately, except for 2 × 2 coefficient matrices and matrices with certain special forms, it is almost always faster to use Gaussian or Gauss-Jordan elimination to find the solution directly. (See Exercise 13.) Furthermore, the technique of Example 3.25 works only when the coefficient matrix is square and invertible, while elimination methods can always be applied.
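The efficiency point can be seen in practice. In the sketch below (assuming NumPy is available), numpy.linalg.solve performs an elimination-style solve, while forming A^-1 explicitly and multiplying is the approach of Example 3.25; both give the same answer here, but elimination is the preferred route for larger systems.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    b = np.array([3.0, -2.0])

    x_elimination = np.linalg.solve(A, b)   # elimination-based solve
    x_via_inverse = np.linalg.inv(A) @ b    # the method of Example 3.25

    print(x_elimination)   # [-8.   5.5]
    print(x_via_inverse)   # [-8.   5.5]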
Properties of Invertible Matrices
The following theorem records some of the most important properties of invertible matrices.

Theorem 3.9
a. If A is an invertible matrix, then A^-1 is invertible and (A^-1)^-1 = A.
b. If A is an invertible matrix and c is a nonzero scalar, then cA is an invertible matrix and (cA)^-1 = (1/c)A^-1.
c. If A and B are invertible matrices of the same size, then AB is invertible and (AB)^-1 = B^-1 A^-1.
d. If A is an invertible matrix, then A^T is invertible and (A^T)^-1 = (A^-1)^T.
e. If A is an invertible matrix, then A^n is invertible for all nonnegative integers n and (A^n)^-1 = (A^-1)^n.
Proof
We will prove properties (a), (c), and (e), leaving properties (b) and (d) to be proven in Exercises 14 and 15.
(a) To show that A^-1 is invertible, we must argue that there is a matrix X such that

  A^-1 X = I = X A^-1

But A certainly satisfies these equations in place of X, so A^-1 is invertible and A is an inverse of A^-1. Since inverses are unique, this means that (A^-1)^-1 = A.
(c) Here we must show that there is a matrix X such that

  (AB)X = I = X(AB)

The claim is that substituting B^-1 A^-1 for X works. We check that

  (AB)(B^-1 A^-1) = A(BB^-1)A^-1 = AIA^-1 = AA^-1 = I

where we have used associativity to shift the parentheses. Similarly, (B^-1 A^-1)(AB) = I (check!), so AB is invertible and its inverse is B^-1 A^-1.
(e) The basic idea here is easy enough. For example, when n = 2, we have

  A^2 (A^-1)^2 = AAA^-1 A^-1 = AIA^-1 = AA^-1 = I

Similarly, (A^-1)^2 A^2 = I. Thus, (A^-1)^2 is the inverse of A^2. It is not difficult to see that a similar argument works for any higher integer value of n. However, mathematical induction is the way to carry out the proof.
The basis step is when n = 0, in which case we are being asked to prove that A^0 is invertible and that

  (A^0)^-1 = (A^-1)^0

This is the same as showing that I is invertible and that I^-1 = I, which is clearly true. (Why? See Exercise 16.)
Now we assume that the result is true when n = k, where k is a specific nonnegative integer. That is, the induction hypothesis is to assume that A^k is invertible and that

  (A^k)^-1 = (A^-1)^k

The induction step requires that we prove that A^(k+1) is invertible and that (A^(k+1))^-1 = (A^-1)^(k+1). Now we know from (c) that A^(k+1) = A^k A is invertible, since A and (by hypothesis) A^k are both invertible. Moreover,

  (A^-1)^(k+1) = (A^-1)^k A^-1 = (A^k)^-1 A^-1   (by the induction hypothesis)
               = (A A^k)^-1                      (by property (c))
               = (A^(k+1))^-1

Therefore, A^n is invertible for all nonnegative integers n, and (A^n)^-1 = (A^-1)^n by the principle of mathematical induction.
Remarks
• While all of the properties of Theorem 3.9 are useful, (c) is the one you should highlight. It is perhaps the most important algebraic property of matrix inverses. It is also the one that is easiest to get wrong. In Exercise 17, you are asked to give a counterexample to show that, contrary to what we might like, (AB)^-1 ≠ A^-1 B^-1 in general. The correct property, (AB)^-1 = B^-1 A^-1, is sometimes called the socks-and-shoes rule, because, although we put our socks on before our shoes, we take them off in the reverse order.
• Property (c) generalizes to products of finitely many invertible matrices: If A_1, A_2, ..., A_n are invertible matrices of the same size, then A_1 A_2 ... A_n is invertible and

  (A_1 A_2 ... A_n)^-1 = A_n^-1 ... A_2^-1 A_1^-1

(See Exercise 18.) Thus, we can state that

  The inverse of a product of invertible matrices is the product of their inverses in the reverse order.

• Since, for real numbers, 1/(a + b) ≠ 1/a + 1/b in general, we should not expect that, for square matrices, (A + B)^-1 = A^-1 + B^-1 (and, indeed, this is not true in general; see Exercise 19). In fact, except for special matrices, there is no formula for (A + B)^-1.
• Property (e) allows us to define negative integer powers of an invertible matrix: If A is an invertible matrix and n is a positive integer, then A^(-n) is defined by

  A^(-n) = (A^-1)^n = (A^n)^-1

With this definition, it can be shown that the rules for exponentiation, A^r A^s = A^(r+s) and (A^r)^s = A^(rs), hold for all integers r and s, provided A is invertible.

One use of the algebraic properties of matrices is to help solve equations involving matrices. The next example illustrates the process. Note that we must pay particular attention to the order of the matrices in the product.
Example 3.26

Solve the following matrix equation for X (assuming that the matrices involved are such that all of the indicated operations are defined):

  A^-1 (BX)^-1 = (A^-1 B^3)^2

Solution
There are many ways to proceed here. One solution is

  A^-1 (BX)^-1 = (A^-1 B^3)^2
  ⇒ ((BX)A)^-1 = (A^-1 B^3)^2
  ⇒ [((BX)A)^-1]^-1 = [(A^-1 B^3)^2]^-1
  ⇒ (BX)A = [(A^-1 B^3)(A^-1 B^3)]^-1
  ⇒ (BX)A = B^-3 (A^-1)^-1 B^-3 (A^-1)^-1
  ⇒ BXA = B^-3 A B^-3 A
  ⇒ B^-1 BXA A^-1 = B^-1 B^-3 A B^-3 A A^-1
  ⇒ IXI = B^-4 A B^-3 I
  ⇒ X = B^-4 A B^-3

(Can you justify each step?) Note the careful use of Theorem 3.9(c) and the expansion of (A^-1 B^3)^2. We have also made liberal use of the associativity of matrix multiplication to simplify the placement (or elimination) of parentheses.
Elementary Matrices
We are going to use matrix multiplication to take a different perspective on the row reduction of matrices. In the process, you will discover many new and important insights into the nature of invertible matrices.
If

  E = [1 0 0; 0 0 1; 0 1 0]

and A is any 3 × n matrix, we find that EA is the matrix obtained from A by interchanging its second and third rows. In other words, multiplying A by E (on the left) has the same effect as interchanging rows 2 and 3 of A. What is significant about E? It is simply the matrix we obtain by applying the same elementary row operation, R_2 <-> R_3, to the identity matrix I_3. It turns out that this always works.
Definition
An elementary matrix is any matrix that can be obtained by performing an elementary row operation on an identity matrix.
Since there are three types of elementary row operations, there are three corresponding types of elementary matrices. Here are some more elementary matrices.
Example 3.27

Let E_1, E_2, and E_3 be 4 × 4 elementary matrices, with

  E_1 = [1 0 0 0; 0 3 0 0; 0 0 1 0; 0 0 0 1]  and  E_3 = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 -2 0 1]

and E_2 obtained from I_4 by interchanging two of its rows. Each of these matrices has been obtained from the identity matrix I_4 by applying a single elementary row operation. The matrix E_1 corresponds to 3R_2, E_2 to a row interchange, and E_3 to R_4 - 2R_2. Observe that when we left-multiply a 4 × n matrix A = [a_ij] by one of these elementary matrices, the corresponding elementary row operation is performed on A: E_1 A multiplies the second row of A by 3, E_2 A interchanges the corresponding two rows of A, and E_3 A replaces the fourth row of A by row_4(A) - 2 row_2(A).
Example 3.27 and Exercises 24-30 should convince you that any elementary row operation on any matrix can be accomplished by left-multiplying by a suitable elementary matrix. We record this fact as a theorem, the proof of which is omitted.
Theorem 3.10
Let E be the elementary matrix obtained by performing an elementary row operation on I_n. If the same elementary row operation is performed on an n × r matrix A, the result is the same as the matrix EA.

From a computational point of view, it is not a good idea to use elementary matrices to perform elementary row operations; just do them directly. However, elementary matrices can provide some valuable insights into invertible matrices and the solution of systems of linear equations. We have already observed that every elementary row operation can be "undone," or "reversed." This same observation applied to elementary matrices shows us that they are invertible.
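The relationship in Theorem 3.10 is easy to see in code. The sketch below assumes NumPy is available; the helper name elementary_swap is ours, and the matrix A is an arbitrary illustration.

    import numpy as np

    def elementary_swap(n, i, j):
        # Elementary matrix obtained from I_n by interchanging rows i and j
        # (rows are indexed from 0 here).
        E = np.eye(n)
        E[[i, j]] = E[[j, i]]
        return E

    A = np.arange(12.0).reshape(4, 3)   # an arbitrary 4 x 3 matrix
    E = elementary_swap(4, 1, 2)        # corresponds to R2 <-> R3

    swapped_directly = A.copy()
    swapped_directly[[1, 2]] = swapped_directly[[2, 1]]

    print(np.array_equal(E @ A, swapped_directly))   # True: EA performs the row operation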
Example 3.28

Let

  E_1 = [1 0 0; 0 0 1; 0 1 0],  E_2 = [1 0 0; 0 4 0; 0 0 1],  and  E_3 = [1 0 0; 0 1 0; -2 0 1]

E_1 corresponds to R_2 <-> R_3, which is undone by doing R_2 <-> R_3 again. Thus, E_1^-1 = E_1. (Check by showing that E_1^2 = E_1 E_1 = I.) The matrix E_2 comes from 4R_2, which is undone by performing (1/4)R_2. Thus,

  E_2^-1 = [1 0 0; 0 1/4 0; 0 0 1]

which can be easily checked. Finally, E_3 corresponds to the elementary row operation R_3 - 2R_1, which can be undone by the elementary row operation R_3 + 2R_1. So, in this case,

  E_3^-1 = [1 0 0; 0 1 0; 2 0 1]

(Again, it is easy to check this by confirming that the product of this matrix and E_3, in both orders, is I.)
Notice that not only is each elementary matrix invertible, but its inverse is another elementary matrix of the same type. We record this finding as the next theorem.
Theorem 3.11
Each elementary matrix is invertible, and its inverse is an elementary matrix of the same type.
The Fundamental Theorem of Invertible Matrices
We are now in a position to prove one of the main results in this book: a set of equivalent characterizations of what it means for a matrix to be invertible. In a sense, much of linear algebra is connected to this theorem, either in the development of these characterizations or in their application. As you might expect, given this introduction, we will use this theorem a great deal. Make it your friend! We refer to Theorem 3.12 as the first version of the Fundamental Theorem, since we will add to it in subsequent chapters. You are reminded that, when we say that a set of statements about a matrix A are equivalent, we mean that, for a given A, the statements are either all true or all false.

Theorem 3.12  The Fundamental Theorem of Invertible Matrices: Version 1
Let A be an n × n matrix. The following statements are equivalent:
a. A is invertible.
b. Ax = b has a unique solution for every b in R^n.
c. Ax = 0 has only the trivial solution.
d. The reduced row echelon form of A is I_n.
e. A is a product of elementary matrices.
Proof
We will establish the theorem by proving the circular chain of implications

  (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (e) ⇒ (a)

(a) ⇒ (b) We have already shown that if A is invertible, then Ax = b has the unique solution x = A^-1 b for any b in R^n (Theorem 3.7).
(b) ⇒ (c) Assume that Ax = b has a unique solution for any b in R^n. This implies, in particular, that Ax = 0 has a unique solution. But a homogeneous system Ax = 0 always has x = 0 as one solution. So in this case, x = 0 must be the solution.
(c) ⇒ (d) Suppose that Ax = 0 has only the trivial solution. The corresponding system of equations is

  a_11 x_1 + a_12 x_2 + ... + a_1n x_n = 0
  a_21 x_1 + a_22 x_2 + ... + a_2n x_n = 0
  ...
  a_n1 x_1 + a_n2 x_2 + ... + a_nn x_n = 0

and we are assuming that its solution is x_1 = 0, x_2 = 0, ..., x_n = 0. In other words, Gauss-Jordan elimination applied to the augmented matrix of the system gives

  [A | 0] = [a_11 a_12 ... a_1n | 0; a_21 a_22 ... a_2n | 0; ... ; a_n1 a_n2 ... a_nn | 0]  →  [1 0 ... 0 | 0; 0 1 ... 0 | 0; ... ; 0 0 ... 1 | 0] = [I_n | 0]

Thus, the reduced row echelon form of A is I_n.
(d) ⇒ (e) If we assume that the reduced row echelon form of A is I_n, then A can be reduced to I_n using a finite sequence of elementary row operations. By Theorem 3.10, each one of these elementary row operations can be achieved by left-multiplying by an appropriate elementary matrix. If the appropriate sequence of elementary matrices is E_1, E_2, ..., E_k (in that order), then we have

  E_k ... E_2 E_1 A = I_n

According to Theorem 3.11, these elementary matrices are all invertible. Therefore, so is their product, and we have

  A = (E_k ... E_2 E_1)^-1 I_n = (E_k ... E_2 E_1)^-1 = E_1^-1 E_2^-1 ... E_k^-1

Again, each E_i^-1 is another elementary matrix, by Theorem 3.11, so we have written A as a product of elementary matrices, as required.
(e) ⇒ (a) If A is a product of elementary matrices, then A is invertible, since elementary matrices are invertible and products of invertible matrices are invertible.
Example 3.29

If possible, express A = [2 3; 1 3] as a product of elementary matrices.

Solution
We row reduce A as follows:

  A = [2 3; 1 3] → (R_1 <-> R_2) [1 3; 2 3] → (R_2 - 2R_1) [1 3; 0 -3] → (R_1 + R_2) [1 0; 0 -3] → (-(1/3)R_2) [1 0; 0 1] = I

Thus, the reduced row echelon form of A is the identity matrix, so the Fundamental Theorem assures us that A is invertible and can be written as a product of elementary matrices. We have E_4 E_3 E_2 E_1 A = I, where

  E_1 = [0 1; 1 0],  E_2 = [1 0; -2 1],  E_3 = [1 1; 0 1],  E_4 = [1 0; 0 -1/3]

are the elementary matrices corresponding to the four elementary row operations used to reduce A to I. As in the proof of the theorem, we have

  A = (E_4 E_3 E_2 E_1)^-1 = E_1^-1 E_2^-1 E_3^-1 E_4^-1 = [0 1; 1 0][1 0; 2 1][1 -1; 0 1][1 0; 0 -3]

as required.

Remark
Because the sequence of elementary row operations that transforms A into I is not unique, neither is the representation of A as a product of elementary matrices. (Find a different way to express A as a product of elementary matrices.)
"Never bring a cannon on stage in Act I unless you intend to fire it by the last aC1." - Anton Chekhov
The Fundamental Theorem is surprisingly powerful. To illustrate its power, we consider two of its consequences. The first is that, although the definition of an invertible matrix states that a matrix A is invertible if there is a matrix B such that both AB = I and BA = I are satisfied, we need only check one of these equations. Thus, we can cut our work in half!

Theorem 3.13
Let A be a square matrix. If B is a square matrix such that either AB = I or BA = I, then A is invertible and B = A^-1.

Proof
Suppose BA = I. Consider the equation Ax = 0. Left-multiplying by B, we have BAx = B0. This implies that x = Ix = 0. Thus, the system represented by Ax = 0 has the unique solution x = 0. From the equivalence of (c) and (a) in the Fundamental Theorem, we know that A is invertible. (That is, A^-1 exists and satisfies AA^-1 = I = A^-1 A.) If we now right-multiply both sides of BA = I by A^-1, we obtain

  BAA^-1 = IA^-1  ⇒  BI = A^-1  ⇒  B = A^-1

(The proof in the case of AB = I is left as Exercise 41.)
The next consequence of the Fundamental Theorem is the basis for an efficient method of computing the inverse of a matrix.
Theorem 3.14
Let A be a square matrix. If a sequence of elementary row operations reduces A to I, then the same sequence of elementary row operations transforms I into A^-1.
Proof
If A is row equivalent to I, then we can achieve the reduction by left-multiplying by a sequence E_1, E_2, ..., E_k of elementary matrices. Therefore, we have E_k ... E_2 E_1 A = I. Setting B = E_k ... E_2 E_1 gives BA = I. By Theorem 3.13, A is invertible and A^-1 = B. Now applying the same sequence of elementary row operations to I is equivalent to left-multiplying I by E_k ... E_2 E_1 = B. The result is

  E_k ... E_2 E_1 I = BI = B = A^-1

Thus, I is transformed into A^-1 by the same sequence of elementary row operations.
The Gauss-Jordan Method for Computing the Inverse
We can perform row operations on A and I simultaneously by constructing a "super-augmented matrix" [A | I]. Theorem 3.14 shows that if A is row equivalent to I [which, by the Fundamental Theorem (d) <=> (a), means that A is invertible], then elementary row operations will yield

  [A | I] → [I | A^-1]

If A cannot be reduced to I, then the Fundamental Theorem guarantees us that A is not invertible.
The procedure just described is simply Gauss-Jordan elimination performed on an n × 2n, instead of an n × (n + 1), augmented matrix. Another way to view this procedure is to look at the problem of finding A^-1 as solving the matrix equation AX = I_n for an n × n matrix X. (This is sufficient, by the Fundamental Theorem, since a right inverse of A must be a two-sided inverse.) If we denote the columns of X by x_1, ..., x_n, then this matrix equation is equivalent to solving for the columns of X, one at a time. Since the columns of I_n are the standard unit vectors e_1, ..., e_n, we thus have n systems of linear equations, all with coefficient matrix A:

  Ax_1 = e_1,  Ax_2 = e_2,  ...,  Ax_n = e_n

Since the same sequence of row operations is needed to bring A to reduced row echelon form in each case, the augmented matrices for these systems, [A | e_1], ..., [A | e_n], can be combined as

  [A | e_1 e_2 ... e_n] = [A | I_n]

We now apply row operations to try to reduce A to I_n, which, if successful, will simultaneously solve for the columns of A^-1, transforming I_n into A^-1.
We illustrate this use of Gauss-Jordan elimination with three examples.
Example 3.30

Find the inverse of

  A = [1 2 -1; 2 2 4; 1 3 -3]

if it exists.

Solution
Gauss-Jordan elimination produces

  [A | I] = [1 2 -1 | 1 0 0; 2 2 4 | 0 1 0; 1 3 -3 | 0 0 1]

Applying R_2 - 2R_1 and R_3 - R_1, and then continuing the reduction until the left-hand block becomes I_3, we arrive at

  [I | A^-1] = [1 0 0 | 9 -3/2 -5; 0 1 0 | -5 1 3; 0 0 1 | -2 1/2 1]

Therefore,

  A^-1 = [9 -3/2 -5; -5 1 3; -2 1/2 1]

(You should always check that AA^-1 = I by direct multiplication. By Theorem 3.13, we do not need to check that A^-1 A = I too.)
Remark
Notice that we have used the variant of Gauss-Jordan elimination that first introduces all of the zeros below the leading 1s, from left to right and top to bottom, and then creates zeros above the leading 1s, from right to left and bottom to top. This approach saves on calculations, as we noted in Chapter 2, but you may find it easier, when working by hand, to create all of the zeros in each column as you go. The answer, of course, will be the same.
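The procedure [A | I] → [I | A^-1] can be written out directly. The sketch below assumes NumPy is available; gauss_jordan_inverse is our own name, and the row swap to pick a nonzero pivot is our addition (the hand computations above did not need it). It reproduces the result of Example 3.30.

    import numpy as np

    def gauss_jordan_inverse(A):
        # Row reduce the super-augmented matrix [A | I]; if the left block
        # becomes I, the right block is A^(-1) (Theorem 3.14).
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        M = np.hstack([A, np.eye(n)])
        for j in range(n):
            # Choose a nonzero pivot in column j (swap rows if necessary).
            p = j + np.argmax(np.abs(M[j:, j]))
            if np.isclose(M[p, j], 0.0):
                raise ValueError("matrix is not invertible")
            M[[j, p]] = M[[p, j]]
            M[j] = M[j] / M[j, j]                 # make the pivot a leading 1
            for i in range(n):
                if i != j:
                    M[i] = M[i] - M[i, j] * M[j]  # clear the rest of column j
        return M[:, n:]

    A = [[1, 2, -1],
         [2, 2, 4],
         [1, 3, -3]]
    print(gauss_jordan_inverse(A))
    # [[ 9.  -1.5 -5. ]
    #  [-5.   1.   3. ]
    #  [-2.   0.5  1. ]]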
Example 3.31

Find the inverse of

  A = [2 1 -4; -4 -1 6; -2 2 -2]

if it exists.

Solution
We proceed as in Example 3.30, adjoining the identity matrix to A and then trying to manipulate [A | I] into [I | A^-1].

  [A | I] = [2 1 -4 | 1 0 0; -4 -1 6 | 0 1 0; -2 2 -2 | 0 0 1]

Applying R_2 + 2R_1 and R_3 + R_1 gives

  [2 1 -4 | 1 0 0; 0 1 -2 | 2 1 0; 0 3 -6 | 1 0 1]

and then R_3 - 3R_2 gives

  [2 1 -4 | 1 0 0; 0 1 -2 | 2 1 0; 0 0 0 | -5 -3 1]

At this point, we see that it is not possible to reduce A to I, since there is a row of zeros on the left-hand side of the augmented matrix. Consequently, A is not invertible.
As the next example illustrates, everything works the same way over Z_p, where p is prime.

Example 3.32

Find the inverse of

  A = [2 2; 2 0]

if it exists, over Z_3.

Solution 1
We use the Gauss-Jordan method, remembering that all calculations are in Z_3.

  [A | I] = [2 2 | 1 0; 2 0 | 0 1]

Multiplying R_1 by 2, then applying R_2 + R_1 and R_1 + 2R_2, we obtain

  [I | A^-1] = [1 0 | 0 2; 0 1 | 2 1]

Thus, A^-1 = [0 2; 2 1], and it is easy to check that, over Z_3, AA^-1 = I.

Solution 2
Since A is a 2 × 2 matrix, we can also compute A^-1 using the formula given in Theorem 3.8. The determinant of A is

  det A = 2(0) - 2(2) = -4 = 2

in Z_3 (since 2 + 1 = 0). Thus, A^-1 exists and is given by the formula in Theorem 3.8. We must be careful here, though, since the formula introduces the "fraction" 1/det A and there are no fractions in Z_3. We must use multiplicative inverses rather than division. Instead of 1/det A = 1/2, we use 2^-1; that is, we find the number x that satisfies the equation 2x = 1 in Z_3. It is easy to see that x = 2 is the solution we want: In Z_3, 2^-1 = 2, since 2(2) = 1. The formula for A^-1 now becomes

  A^-1 = 2^-1 [0 -2; -2 2] = 2[0 1; 1 2] = [0 2; 2 1]

which agrees with our previous solution.
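The same Gauss-Jordan process works over Z_p if every division is replaced by multiplication by a modular inverse, exactly as in Solution 2. A minimal sketch in pure Python (the function name inverse_mod_p is ours) reproduces the inverse found in Example 3.32.

    def inverse_mod_p(A, p):
        # Gauss-Jordan on [A | I], with all arithmetic done modulo the prime p.
        n = len(A)
        M = [[A[i][j] % p for j in range(n)] + [int(i == j) for j in range(n)]
             for i in range(n)]
        for j in range(n):
            # Find a row with a nonzero entry in column j to use as the pivot row.
            pivot = next((r for r in range(j, n) if M[r][j] % p != 0), None)
            if pivot is None:
                raise ValueError("matrix is not invertible over Z_p")
            M[j], M[pivot] = M[pivot], M[j]
            inv = pow(M[j][j], p - 2, p)        # Fermat: a^(p-2) = a^(-1) mod p
            M[j] = [(inv * x) % p for x in M[j]]
            for i in range(n):
                if i != j and M[i][j] != 0:
                    factor = M[i][j]
                    M[i] = [(M[i][k] - factor * M[j][k]) % p for k in range(2 * n)]
        return [row[n:] for row in M]

    print(inverse_mod_p([[2, 2], [2, 0]], 3))   # [[0, 2], [2, 1]]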
Exercises 3.3

In Exercises 1-10, find the inverse of the given matrix (if it exists) using Theorem 3.8. (Exercise 10 is the matrix [1/a 1/b; 1/c 1/d], where neither a, b, c, nor d is 0.)

In Exercises 11 and 12, solve the given system using the method of Example 3.25.
11. 2x + y = -1
    5x + 3y = 2

13. Let A be the given 2 × 2 matrix and b_1, b_2, b_3 the given vectors.
    (a) Find A^-1 and use it to solve the three systems Ax = b_1, Ax = b_2, and Ax = b_3.
    (b) Solve all three systems at the same time by row reducing the augmented matrix [A | b_1 b_2 b_3] using Gauss-Jordan elimination.
    (c) Carefully count the total number of individual multiplications that you performed in (a) and in (b). You should discover that, even for this 2 × 2 example, one method uses fewer operations. For larger systems, the difference is even more pronounced, and this explains why computer systems do not use one of these methods to solve linear systems.
14. Prove Theorem 3.9(b).
15. Prove Theorem 3.9(d).
16. Prove that the n × n identity matrix I_n is invertible and that I_n^-1 = I_n.
17. (a) Give a counterexample to show that (AB)^-1 ≠ A^-1 B^-1 in general.
    (b) Under what conditions on A and B is (AB)^-1 = A^-1 B^-1? Prove your assertion.
18. By induction, prove that if A_1, A_2, ..., A_n are invertible matrices of the same size, then the product A_1 A_2 ... A_n is invertible and (A_1 A_2 ... A_n)^-1 = A_n^-1 ... A_2^-1 A_1^-1.
19. Give a counterexample to show that (A + B)^-1 ≠ A^-1 + B^-1 in general.

In Exercises 20-23, solve the given matrix equation for X. Simplify your answers as much as possible. (In the words of Albert Einstein, "Everything should be made as simple as possible, but not simpler.") Assume that all matrices are invertible.
20. XA^2 = A^-1
21. AXB = (BA)^2
22. (A^-1 X)^-1 = A(B^-2 A)^-1
23. ABXA^-1 B^-1 = I + A

In Exercises 24-30, let A, B, C, and D be the given matrices. In each case, find an elementary matrix E that satisfies the given equation.
24. EA = B    25. EB = A    26. EA = C
27. EC = A    28. EC = D    29. ED = C
30. Is there an elementary matrix E such that EA = D? Why or why not?

In Exercises 31-38, find the inverse of the given elementary matrix.

In Exercises 39 and 40, find a sequence of elementary matrices E_1, E_2, ..., E_k such that E_k ... E_2 E_1 A = I. Use this sequence to write both A and A^-1 as products of elementary matrices.

41. Prove Theorem 3.13 for the case of AB = I.
42. (a) Prove that if A is invertible and AB = O, then B = O.
    (b) Give a counterexample to show that the result in part (a) may fail if A is not invertible.
43. (a) Prove that if A is invertible and BA = CA, then B = C.
    (b) Give a counterexample to show that the result in part (a) may fail if A is not invertible.
44. A square matrix A is called idempotent if A^2 = A. (The word idempotent comes from the Latin idem, meaning "same," and potere, meaning "to have power." Thus, something that is idempotent has the "same power" when squared.)
    (a) Find three idempotent 2 × 2 matrices.
    (b) Prove that the only invertible idempotent n × n matrix is the identity matrix.
45. Show that if A is a square matrix that satisfies the equation A^2 - 2A + I = O, then A^-1 = 2I - A.
46. Prove that if a symmetric matrix is invertible, then its inverse is symmetric also.
47. Prove that if A and B are square matrices and AB is invertible, then both A and B are invertible.

In Exercises 48-63, use the Gauss-Jordan method to find the inverse of the given matrix (if it exists); Exercises 61-63 are over the indicated Z_p.

Partitioning large square matrices can sometimes make their inverses easier to compute, particularly if the blocks have a nice form. In Exercises 64-68, verify by block multiplication that the inverse of a matrix, if partitioned as shown, is as claimed. (Assume that all inverses exist as needed.)

In Exercises 69-72, partition the given matrix so that you can apply one of the formulas from Exercises 64-68, and then calculate the inverse using that formula.
70. The matrix in Exercise 58
The LU Factorization

Just as it is natural (and illuminating) to factor a natural number into a product of other natural numbers (for example, 30 = 2 · 3 · 5), it is also frequently helpful to factor matrices as products of other matrices. Any representation of a matrix as a product of two or more other matrices is called a matrix factorization. For example,

  [3 -1; 9 -5] = [1 0; 3 1][3 -1; 0 -2]

is a matrix factorization.
Needless to say, some factorizations are more useful than others. In this section, we introduce a matrix factorization that arises in the solution of systems of linear equations by Gaussian elimination and is particularly well suited to computer implementation. In subsequent chapters we will encounter other, equally useful matrix factorizations. Indeed, the topic is a rich one, and entire books and courses have been devoted to it.
Consider a system of linear equations of the form Ax = b, where A is an n × n matrix. Our goal is to show that Gaussian elimination implicitly factors A into a product of matrices that then enable us to solve the given system (and any other system with the same coefficient matrix) easily. The following example illustrates the basic idea.
Example 3.33

Let

  A = [2 1 3; 4 -1 3; -2 5 5]
Row reduction of A proceeds as follows:

  A = [2 1 3; 4 -1 3; -2 5 5] → (R_2 - 2R_1, R_3 + R_1) [2 1 3; 0 -3 -3; 0 6 8] → (R_3 + 2R_2) [2 1 3; 0 -3 -3; 0 0 2] = U    (1)
The three elementary matrices E_1, E_2, E_3 that accomplish this reduction of A to echelon form U are (in order):

  E_1 = [1 0 0; -2 1 0; 0 0 1],  E_2 = [1 0 0; 0 1 0; 1 0 1],  E_3 = [1 0 0; 0 1 0; 0 2 1]

Hence,

  E_3 E_2 E_1 A = U

Solving for A, we get

  A = E_1^-1 E_2^-1 E_3^-1 U = [1 0 0; 2 1 0; 0 0 1][1 0 0; 0 1 0; -1 0 1][1 0 0; 0 1 0; 0 -2 1] U = [1 0 0; 2 1 0; -1 -2 1] U = LU

Thus, A can be factored as

  A = LU
where U is an upper triangular matrix (see the exercises for Section 3.2) and L is unit lower triangular. That is, L has the form

  L = [1 0 ... 0; * 1 ... 0; ... ; * * ... 1]

with zeros above and 1s on the main diagonal.

The LU factorization was introduced in 1948 by the great English mathematician Alan M. Turing (1912-1954) in a paper entitled "Rounding-off Errors in Matrix Processes" (Quarterly Journal of Mechanics and Applied Mathematics, 1 (1948), pp. 287-308). During World War II, Turing was instrumental in cracking the German "Enigma" code. However, he is best known for his work in mathematical logic that laid the theoretical groundwork for the development of the digital computer and the modern field of artificial intelligence. The "Turing test" that he proposed in 1950 is still used as one of the benchmarks in addressing the question of whether a computer can be considered "intelligent."

The preceding example motivates the following definition.
Definition
Let A be a square matrix. A factorization of A as A = LU, where L is unit lower triangular and U is upper triangular, is called an LU factorization of A.
Remarks
• Observe that the matrix A in Example 3.33 had an LU factorization because no row interchanges were needed in the row reduction of A. Hence all of the elementary matrices that arose were unit lower triangular. Thus, L was guaranteed to be unit lower triangular because inverses and products of unit lower triangular matrices are also unit lower triangular. (See Exercises 29 and 30.) If a zero had appeared in a pivot position at any step, we would have had to swap rows to get a nonzero pivot. This would have resulted in L no longer being unit lower triangular. We will comment further on this observation below. (Can you find a matrix for which row interchanges will be necessary?)
• The notion of an LU factorization can be generalized to nonsquare matrices by simply requiring U to be a matrix in row echelon form. (See Exercises 13 and 14.)
• Some books define an LU factorization of a square matrix A to be any factorization A = LU, where L is lower triangular and U is upper triangular.

The first remark above is essentially a proof of the following theorem.
Theorem 3.15
If A is a square matrix that can be reduced to row echelon form without using any row interchanges, then A has an LU factorization.
To see why the LU factorization is useful, consider a linear system Ax = b, where the coefficient matrix has an LU factorization A = LU. We can rewrite the system Ax = b as LUx = b or L(Ux) = b. If we now define y = Ux, then we can solve for x in two stages:
1. Solve Ly = b for y by forward substitution (see Exercises 25 and 26 in Section 2.1).
2. Solve Ux = y for x by back substitution.
Each of these linear systems is straightforward to solve because the coefficient matrices L and U are both triangular. The next example illustrates the method.
Example 3.34

Use an LU factorization of A = [2 1 3; 4 -1 3; -2 5 5] to solve Ax = b, where b = [1; -4; 9].

Solution
In Example 3.33, we found that

  A = [1 0 0; 2 1 0; -1 -2 1][2 1 3; 0 -3 -3; 0 0 2] = LU

As outlined above, to solve Ax = b (which is the same as L(Ux) = b), we first solve Ly = b for y = [y_1; y_2; y_3]. This is just the linear system

  y_1 = 1
  2y_1 + y_2 = -4
  -y_1 - 2y_2 + y_3 = 9

Forward substitution (that is, working from top to bottom) yields

  y_1 = 1,  y_2 = -4 - 2y_1 = -6,  y_3 = 9 + y_1 + 2y_2 = -2

Thus y = [1; -6; -2], and we now solve Ux = y for x = [x_1; x_2; x_3]. This linear system is

  2x_1 + x_2 + 3x_3 = 1
  -3x_2 - 3x_3 = -6
  2x_3 = -2

and back substitution quickly produces

  x_3 = -1,  -3x_2 = -6 + 3x_3 = -9 so that x_2 = 3,  and  2x_1 = 1 - x_2 - 3x_3 = 1 so that x_1 = 1/2

Therefore, the solution to the given system Ax = b is x = [1/2; 3; -1].
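The two-stage solve in Example 3.34 is exactly what one would code. The sketch below assumes NumPy is available; forward_sub and back_sub are our own names, and it simply repeats the computation above.

    import numpy as np

    def forward_sub(L, b):
        # Solve Ly = b from the top row down (L is unit lower triangular).
        n = len(b)
        y = np.zeros(n)
        for i in range(n):
            y[i] = b[i] - L[i, :i] @ y[:i]
        return y

    def back_sub(U, y):
        # Solve Ux = y from the bottom row up (U is upper triangular).
        n = len(y)
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
        return x

    L = np.array([[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [-1.0, -2.0, 1.0]])
    U = np.array([[2.0, 1.0, 3.0], [0.0, -3.0, -3.0], [0.0, 0.0, 2.0]])
    b = np.array([1.0, -4.0, 9.0])

    y = forward_sub(L, b)   # [ 1. -6. -2.]
    x = back_sub(U, y)      # [ 0.5  3. -1. ]
    print(y, x)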
An Easy Way to Find LU Factorizations
In Example 3.33, we computed the matrix L as a product of elementary matrices. Fortunately, L can be computed directly from the row reduction process without our needing to compute elementary matrices at all. Remember that we are assuming that A can be reduced to row echelon form without using any row interchanges. If this is the case, then the entire row reduction process can be done using only elementary row operations of the form R_i - kR_j. (Why do we not need to use the remaining elementary row operation, multiplying a row by a nonzero scalar?) In the operation R_i - kR_j, we will refer to the scalar k as the multiplier.
In Example 3.33, the elementary row operations that were used were, in order,

  R_2 - 2R_1                    (multiplier = 2)
  R_3 + R_1 = R_3 - (-1)R_1     (multiplier = -1)
  R_3 + 2R_2 = R_3 - (-2)R_2    (multiplier = -2)

The multipliers are precisely the entries of L that are below its diagonal! Indeed,

  L = [1 0 0; 2 1 0; -1 -2 1]

and l_21 = 2, l_31 = -1, and l_32 = -2. Notice that the elementary row operation R_i - kR_j has its multiplier k placed in the (i, j) entry of L.
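Recording the multipliers as they arise gives a direct LU routine. The sketch below assumes NumPy is available; lu_no_pivoting is our own name, and it assumes, as in the text, that no row interchanges are needed. It reproduces the factorization of Example 3.33.

    import numpy as np

    def lu_no_pivoting(A):
        # Eliminate below each pivot with R_i - k R_j and store the
        # multiplier k in the (i, j) entry of L.
        U = np.asarray(A, dtype=float).copy()
        n = U.shape[0]
        L = np.eye(n)
        for j in range(n - 1):
            if np.isclose(U[j, j], 0.0):
                raise ValueError("zero pivot: a row interchange would be needed")
            for i in range(j + 1, n):
                k = U[i, j] / U[j, j]      # the multiplier
                L[i, j] = k
                U[i] = U[i] - k * U[j]
        return L, U

    A = np.array([[2.0, 1.0, 3.0],
                  [4.0, -1.0, 3.0],
                  [-2.0, 5.0, 5.0]])
    L, U = lu_no_pivoting(A)
    print(L)                      # [[ 1.  0.  0.] [ 2.  1.  0.] [-1. -2.  1.]]
    print(U)                      # [[ 2.  1.  3.] [ 0. -3. -3.] [ 0.  0.  2.]]
    print(np.allclose(L @ U, A))  # True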
Example 3.35

Find an LU factorization of

  A = [3 1 3 -4; 6 4 8 -10; 3 2 5 -1; -9 5 -2 -4]

Solution
Reducing A to row echelon form, we have

  A → (R_2 - 2R_1, R_3 - R_1, R_4 - (-3)R_1) [3 1 3 -4; 0 2 2 -2; 0 1 2 3; 0 8 7 -16]
    → (R_3 - (1/2)R_2, R_4 - 4R_2)           [3 1 3 -4; 0 2 2 -2; 0 0 1 4; 0 0 -1 -8]
    → (R_4 - (-1)R_3)                        [3 1 3 -4; 0 2 2 -2; 0 0 1 4; 0 0 0 -4] = U

The first three multipliers are 2, 1, and -3, and these go into the subdiagonal entries of the first column of L. So, thus far,

  L = [1 0 0 0; 2 1 0 0; 1 * 1 0; -3 * * 1]

where the entries marked * are yet to be determined. The next two multipliers are 1/2 and 4, so we continue to fill out L:

  L = [1 0 0 0; 2 1 0 0; 1 1/2 1 0; -3 4 * 1]

The final multiplier, -1, replaces the last * in L to give

  L = [1 0 0 0; 2 1 0 0; 1 1/2 1 0; -3 4 -1 1]

Thus, an LU factorization of A is

  A = [3 1 3 -4; 6 4 8 -10; 3 2 5 -1; -9 5 -2 -4] = [1 0 0 0; 2 1 0 0; 1 1/2 1 0; -3 4 -1 1][3 1 3 -4; 0 2 2 -2; 0 0 1 4; 0 0 0 -4] = LU

as is easily checked.
Remarks
• In applying this method, it is important to note that the elementary row operations R_i - kR_j must be performed from top to bottom within each column (using the diagonal entry as the pivot), and column by column from left to right. If the operations are carried out in a different order and the multipliers are placed in L as before, the resulting product LU will in general not equal A. (Check this with a small example, and then find a correct LU factorization of your matrix.)
• An alternative way to construct L is to observe that the multipliers can be obtained directly from the matrices obtained at the intermediate steps of the row reduction process. In Example 3.33, examine the pivots and the corresponding columns of the matrices that arise in the row reduction

  A = [2 1 3; 4 -1 3; -2 5 5] → A_1 = [2 1 3; 0 -3 -3; 0 6 8] → [2 1 3; 0 -3 -3; 0 0 2] = U

The first pivot is 2, which occurs in the first column of A. Dividing the entries of this column vector that are on or below the diagonal by the pivot produces

  (1/2)[2; 4; -2] = [1; 2; -1]

The next pivot is -3, which occurs in the second column of A_1. Dividing the entries of this column vector that are on or below the diagonal by the pivot, we obtain

  (1/(-3))[-3; 6] = [1; -2]

The final pivot (which we did not need to use) is 2, in the third column of U. Dividing the entries of this column vector that are on or below the diagonal by the pivot, we obtain

  (1/2)[2] = [1]

If we place the resulting three column vectors side by side in a matrix, we have

  [1 ; 2 1 ; -1 -2 1]

which is exactly L, once the above-diagonal entries are filled with zeros.
In Chapter 2, we remarked that the row echelon form of a matrix is not unique. However, if an invertible matrix A has an LU factorization A = LU, then this factorization is unique.

Theorem 3.16
If A is an invertible matrix that has an LU factorization, then L and U are unique.
Proof
Suppose A = LU and A = L_1 U_1 are two LU factorizations of A. Then LU = L_1 U_1, where L and L_1 are unit lower triangular and U and U_1 are upper triangular. In fact, U and U_1 are two (possibly different) row echelon forms of A.
By Exercise 30, L_1 is invertible. Because A is invertible, its reduced row echelon form is an identity matrix I by the Fundamental Theorem of Invertible Matrices. Hence U also row reduces to I (why?) and so U is invertible also. Therefore,

  L_1^-1 (LU) U^-1 = L_1^-1 (L_1 U_1) U^-1   so   (L_1^-1 L)(U U^-1) = (L_1^-1 L_1)(U_1 U^-1)

Hence (L_1^-1 L)I = I(U_1 U^-1), so

  L_1^-1 L = U_1 U^-1

But L_1^-1 L is unit lower triangular by Exercise 29, and U_1 U^-1 is upper triangular (why?). It follows that L_1^-1 L = U_1 U^-1 is both unit lower triangular and upper triangular. The only such matrix is the identity matrix, so L_1^-1 L = I and U_1 U^-1 = I. It follows that L = L_1 and U = U_1, so the LU factorization of A is unique.
The P^T LU Factorization
We now explore the problem of adapting the LU factorization to handle cases where row interchanges are necessary during Gaussian elimination. Consider the matrix

  A = [1 2 -1; 3 6 2; -1 1 4]

A straightforward row reduction produces

  A → B = [1 2 -1; 0 0 5; 0 3 3]

which is not an upper triangular matrix. However, we can easily convert this into upper triangular form by swapping rows 2 and 3 of B to get

  U = [1 2 -1; 0 3 3; 0 0 5]

Alternatively, we can swap rows 2 and 3 of A first. To this end, let P be the elementary matrix

  P = [1 0 0; 0 0 1; 0 1 0]

corresponding to interchanging rows 2 and 3, and let E be the product of the elementary matrices that then reduce PA to U (so that E^-1 = L is unit lower triangular). Thus EPA = U, so A = (EP)^-1 U = P^-1 E^-1 U = P^-1 LU.
Now this handles only the case of a single row interchange. In general, P will be the product P = P_k ... P_2 P_1 of all the row interchange matrices P_1, P_2, ..., P_k (where P_1 is performed first, and so on). Such a matrix P is called a permutation matrix. Observe that a permutation matrix arises from permuting the rows of an identity matrix in some order. For example,

  [0 1; 1 0],  [1 0 0; 0 0 1; 0 1 0],  [0 0 1 0; 0 1 0 0; 1 0 0 0; 0 0 0 1]

are all permutation matrices. Fortunately, the inverse of a permutation matrix is easy to compute; in fact, no calculations are needed at all!
Theorem 3.17
If P is a permutation matrix, then P^-1 = P^T.

Proof
We must show that P^T P = I. But the ith row of P^T is the same as the ith column of P, and these are both equal to the same standard unit vector e, because P is a permutation matrix. So

  (P^T P)_ii = (ith row of P^T)(ith column of P) = e^T e = e · e = 1

This shows that the diagonal entries of P^T P are all 1s. On the other hand, if j ≠ i, then the jth column of P is a different standard unit vector from e, say e'. Thus, a typical off-diagonal entry of P^T P is given by

  (P^T P)_ij = (ith row of P^T)(jth column of P) = e^T e' = e · e' = 0

Hence P^T P is an identity matrix, as we wished to show.

Thus, in general, we can factor a square matrix A as A = P^-1 LU = P^T LU.

Definition
Let A be a square matrix. A factorization of A as A = P^T LU, where P is a permutation matrix, L is unit lower triangular, and U is upper triangular, is called a P^T LU factorization of A.
Example 3.36

Find a P^T LU factorization of A = [0 0 6; 1 2 3; 2 1 4].

Solution
First we reduce A to row echelon form. Clearly, we need at least one row interchange.

  A = [0 0 6; 1 2 3; 2 1 4] → (R_1 <-> R_2) [1 2 3; 0 0 6; 2 1 4] → (R_3 - 2R_1) [1 2 3; 0 0 6; 0 -3 -2] → (R_2 <-> R_3) [1 2 3; 0 -3 -2; 0 0 6]

We have used two row interchanges (R_1 <-> R_2 and then R_2 <-> R_3), so the required permutation matrix is

  P = P_2 P_1 = [1 0 0; 0 0 1; 0 1 0][0 1 0; 1 0 0; 0 0 1] = [0 1 0; 0 0 1; 1 0 0]

We now find an LU factorization of PA.

  PA = [0 1 0; 0 0 1; 1 0 0][0 0 6; 1 2 3; 2 1 4] = [1 2 3; 2 1 4; 0 0 6] → (R_2 - 2R_1) [1 2 3; 0 -3 -2; 0 0 6] = U

Hence l_21 = 2, and so

  A = P^T LU = [0 0 1; 1 0 0; 0 1 0][1 0 0; 2 1 0; 0 0 1][1 2 3; 0 -3 -2; 0 0 6]
Theore. 3.18
Every square matri x has a pTLV factoriza tion.
Hemarll
Even fo r an invertible matrix, the pTLV factorization is not Ulllque. [n Example 3.36, a single row interchange RI +-+ R3 also would ha ve worked, leading to a di fferent P. However, once P has been de termined, L and V are unique.
Computational Considerations
If A is n × n, then the total number of operations (multiplications and divisions) required to solve a linear system Ax = b using an LU factorization of A is T(n) = n^3/3, the same as is required for Gaussian elimination. (See the Exploration "Counting Operations," in Chapter 2.) This is hardly surprising, since the forward elimination phase produces the LU factorization in about n^3/3 steps, whereas both forward and backward substitution require about n^2/2 steps. Therefore, for large values of n, the n^3/3 term is dominant. From this point of view, then, Gaussian elimination and the LU factorization are equivalent. However, the LU factorization has other advantages:
• From a storage point of view, the LU factorization is very compact because we can overwrite the entries of A with the entries of L and U as they are computed. In Example 3.33, we found that

  A = [2 1 3; 4 -1 3; -2 5 5] = [1 0 0; 2 1 0; -1 -2 1][2 1 3; 0 -3 -3; 0 0 2]

This can be stored as

  [2 1 3; 2 -3 -3; -1 -2 2]

with the entries placed in the order (1,1), (1,2), (1,3), (2,1), (3,1), (2,2), (2,3), (3,2), (3,3). In other words, the subdiagonal entries of A are replaced by the corresponding multipliers. (Check that this works!)
• Once an LU factorization of A has been computed, it can be used to solve as many linear systems of the form Ax = b as we like. We just need to apply the method of Example 3.34, varying the vector b each time.
• For matrices with certain special forms, especially those with a large number of zeros (so-called "sparse" matrices) concentrated off the diagonal, there are methods that will simplify the computation of an LU factorization. In these cases, this method is faster than Gaussian elimination in solving Ax = b.
• For an invertible matrix A, an LU factorization of A can be used to find A^-1, if necessary. Moreover, this can be done in such a way that it simultaneously yields a factorization of A^-1. (See Exercises 15-18.)

If you have a CAS (such as MATLAB) that has the LU factorization built in, you may notice some differences between your hand calculations and the computer output. This is because most computer implementations perform partial pivoting (row interchanges chosen to reduce roundoff error), so what the software actually computes is a P^T LU factorization.
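For instance, assuming SciPy is available, scipy.linalg.lu(A) returns matrices P, L, U with A = P L U; in the notation of this section, SciPy's P plays the role of P^T. A minimal sketch:

    import numpy as np
    from scipy.linalg import lu

    A = np.array([[0.0, 0.0, 6.0],
                  [1.0, 2.0, 3.0],
                  [2.0, 1.0, 4.0]])

    P, L, U = lu(A)   # SciPy chooses the pivots by partial pivoting,
                      # so P, L, U may differ from a hand computation
    print(P)
    print(L)
    print(U)
    print(np.allclose(P @ L @ U, A))   # True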
Exercises 3.4

In Exercises 1-6, solve the system Ax = b using the given LU factorization of A.

In Exercises 7-12, find an LU factorization of the given matrix.

Generalize the definition of LU factorization to nonsquare matrices by simply requiring U to be a matrix in row echelon form. With this modification, find an LU factorization of the matrices in Exercises 13 and 14.

For an invertible matrix with an LU factorization A = LU, both L and U will be invertible and A^-1 = U^-1 L^-1. In Exercises 15 and 16, find L^-1, U^-1, and A^-1 for the given matrix.
15. A in Exercise 1
16. A in Exercise 4

The inverse of a matrix can also be computed by solving several systems of equations using the method of Example 3.34. For an n × n matrix A, to find its inverse we need to solve AX = I_n for the n × n matrix X. Writing this equation as A[x_1 x_2 ... x_n] = [e_1 e_2 ... e_n], using the matrix-column form of AX, we see that we need to solve n systems of linear equations: Ax_1 = e_1, Ax_2 = e_2, ..., Ax_n = e_n. Moreover, we can use the factorization A = LU to solve each one of these systems. In Exercises 17 and 18, use the approach just outlined to find A^-1 for the given matrix. Compare with the method of Exercises 15 and 16.
17. A in Exercise 1
18. A in Exercise 4

In Exercises 19-22, write the given permutation matrix as a product of elementary (row interchange) matrices.

In Exercises 23-25, find a P^T LU factorization of the given matrix A.

26. Prove that there are exactly n! n × n permutation matrices.

In Exercises 27-28, solve the system Ax = b using the given factorization A = P^T LU. Because PP^T = I, P^T LUx = b can be rewritten as LUx = Pb. This system can then be solved using the method of Example 3.34.

29. Prove that a product of unit lower triangular matrices is unit lower triangular.
30. Prove that every unit lower triangular matrix is invertible and that its inverse is also unit lower triangular.

An LDU factorization of a square matrix A is a factorization A = LDU, where L is a unit lower triangular matrix, D is a diagonal matrix, and U is a unit upper triangular matrix (upper triangular with 1s on its diagonal). In Exercises 31 and 32, find an LDU factorization of A.
31. A in Exercise 1
32. A in Exercise 4

33. If A is symmetric and invertible and has an LDU factorization, show that U = L^T.
34. If A is symmetric and invertible and A = LDL^T (with L unit lower triangular and D diagonal), prove that this factorization is unique. That is, prove that if we also have A = L_1 D_1 L_1^T (with L_1 unit lower triangular and D_1 diagonal), then L = L_1 and D = D_1.
Subspaces, Basis, Dimension, and Rank

[Figure 3.2]
This section introduces perhaps the most important ideas in the entire book. We have already seen that there is an interplay between geometry and algebra: We can often use geometric intuition and reasoning to obtain algebraic results, and the power of algebra will often allow us to extend our findings well beyond the geometric settings in which they first arose. In our study of vectors, we have already encountered all of the concepts in this section informally. Here, we will start to become more formal by giving definitions for the key ideas. As you'll see, the notion of a subspace is simply an algebraic generalization of the geometric examples of lines and planes through the origin. The fundamental concept of a basis for a subspace is then derived from the idea of direction vectors for such lines and planes. The concept of a basis will allow us to give a precise definition of dimension that agrees with an intuitive, geometric idea of the term, yet is flexible enough to allow generalization to other settings. You will also begin to see that these ideas shed more light on what you already know about matrices and the solution of systems of linear equations. In Chapter 6, we will encounter all of these fundamental ideas again, in more detail. Consider this section a "getting to know you" session.
A plane through the origin in R^3 "looks like" a copy of R^2. Intuitively, we would agree that they are both "two-dimensional." Pressed further, we might also say that any calculation that can be done with vectors in R^2 can also be done in a plane through the origin. In particular, we can add and take scalar multiples (and, more generally, form linear combinations) of vectors in such a plane, and the results are other vectors in the same plane. We say that, like R^2, a plane through the origin is closed with respect to the operations of addition and scalar multiplication. (See Figure 3.2.)
But are the vectors in this plane two- or three-dimensional objects? We might argue that they are three-dimensional because they live in R^3 and therefore have three components. On the other hand, they can be described as a linear combination of just two vectors, direction vectors for the plane, and so are two-dimensional objects living in a two-dimensional plane. The notion of a subspace is the key to resolving this conundrum.
Definition   A subspace of Rⁿ is any collection S of vectors in Rⁿ such that:

1. The zero vector 0 is in S.
2. If u and v are in S, then u + v is in S. (S is closed under addition.)
3. If u is in S and c is a scalar, then cu is in S. (S is closed under scalar multiplication.)

We could have combined properties (2) and (3) and required, equivalently, that S be closed under linear combinations: If u1, u2, ..., uk are in S and c1, c2, ..., ck are scalars, then c1u1 + c2u2 + ... + ckuk is in S.
Example 3.37   Every line and plane through the origin in R³ is a subspace of R³. It should be clear geometrically that properties (1) through (3) are satisfied. Here is an algebraic proof in the case of a plane through the origin. You are asked to give the corresponding proof for a line in Exercise 9.

Let P be a plane through the origin with direction vectors v1 and v2. Hence, P = span(v1, v2). The zero vector 0 is in P, since 0 = 0v1 + 0v2. Now let

u = c1v1 + c2v2   and   v = d1v1 + d2v2

be two vectors in P. Then

u + v = (c1v1 + c2v2) + (d1v1 + d2v2) = (c1 + d1)v1 + (c2 + d2)v2

Thus, u + v is a linear combination of v1 and v2 and so is in P. Now let c be a scalar. Then

cu = c(c1v1 + c2v2) = (cc1)v1 + (cc2)v2

which shows that cu is also a linear combination of v1 and v2 and is therefore in P. We have shown that P satisfies properties (1) through (3) and hence is a subspace of R³.
If you look carefully at the details of Example 3.37, you will notice that the fact that v1 and v2 were vectors in R³ played no role at all in the verification of the properties. Thus, the algebraic method we used should generalize beyond R³ and apply in situations where we can no longer visualize the geometry. It does. Moreover, the method of Example 3.37 can serve as a "template" in more general settings. When we generalize Example 3.37 to the span of an arbitrary set of vectors in any Rⁿ, the result is important enough to be called a theorem.

Theorem 3.19   Let v1, v2, ..., vk be vectors in Rⁿ. Then span(v1, v2, ..., vk) is a subspace of Rⁿ.

Proof   Let S = span(v1, v2, ..., vk). To check property (1) of the definition, we simply observe that the zero vector 0 is in S, since 0 = 0v1 + 0v2 + ... + 0vk.
Now let

u = c1v1 + c2v2 + ... + ckvk   and   v = d1v1 + d2v2 + ... + dkvk

be two vectors in S. Then

u + v = (c1v1 + c2v2 + ... + ckvk) + (d1v1 + d2v2 + ... + dkvk)
      = (c1 + d1)v1 + (c2 + d2)v2 + ... + (ck + dk)vk

Thus, u + v is a linear combination of v1, v2, ..., vk and so is in S. This verifies property (2). To show property (3), let c be a scalar. Then

cu = c(c1v1 + c2v2 + ... + ckvk) = (cc1)v1 + (cc2)v2 + ... + (cck)vk

which shows that cu is also a linear combination of v1, v2, ..., vk and is therefore in S. We have shown that S satisfies properties (1) through (3) and hence is a subspace of Rⁿ.

We will refer to span(v1, v2, ..., vk) as the subspace spanned by v1, v2, ..., vk. We will often be able to save a lot of work by recognizing when Theorem 3.19 can be applied.
Example 3.38   Show that the set of all vectors [x, y, z]ᵀ that satisfy the conditions x = 3y and z = -2y forms a subspace of R³.

Solution   Substituting the two conditions into such a vector yields

[x, y, z]ᵀ = [3y, y, -2y]ᵀ = y[3, 1, -2]ᵀ

Since y is arbitrary, the given set of vectors is span([3, 1, -2]ᵀ) and is thus a subspace of R³, by Theorem 3.19.

Geometrically, the set of vectors in Example 3.38 represents the line through the origin in R³ with direction vector [3, 1, -2]ᵀ.
Example 3.39   Determine whether the set of all vectors [x, y, z]ᵀ that satisfy the conditions x = 3y + 1 and z = -2y is a subspace of R³.

Solution   This time, we have all vectors of the form [3y + 1, y, -2y]ᵀ. The zero vector is not of this form. (Why not? Try solving [3y + 1, y, -2y]ᵀ = [0, 0, 0]ᵀ.) Hence, property (1) does not hold, so this set cannot be a subspace of R³.
Example 3.40   Determine whether the set of all vectors [x, y]ᵀ such that y = x² is a subspace of R².

Solution   These are the vectors of the form [x, x²]ᵀ; call this set S. This time 0 = [0, 0]ᵀ belongs to S (take x = 0), so property (1) holds. Let u = [x1, x1²]ᵀ and v = [x2, x2²]ᵀ be in S. Then

u + v = [x1 + x2, x1² + x2²]ᵀ

which, in general, is not in S, since it does not have the correct form; that is, x1² + x2² ≠ (x1 + x2)². To be specific, we look for a counterexample. If u = [1, 1]ᵀ and v = [2, 4]ᵀ, then both u and v are in S, but their sum u + v = [3, 5]ᵀ is not in S, since 5 ≠ 3². Thus, property (2) fails, and S is not a subspace of R².
Remark   In order for a set S to be a subspace of some Rⁿ, we must prove that properties (1) through (3) hold in general. However, for S to fail to be a subspace of Rⁿ, it is enough to show that one of the three properties fails to hold. The easiest course is usually to find a single, specific counterexample to illustrate the failure of the property. Once you have done so, there is no need to consider the other properties.

Subspaces Associated with Matrices

A great many examples of subspaces arise in the context of matrices. We have already encountered the most important of these in Chapter 2; we now revisit them with the notion of a subspace in mind.
Definition
Let A be an m×n matrix.

1. The row space of A is the subspace row(A) of Rⁿ spanned by the rows of A.
2. The column space of A is the subspace col(A) of Rᵐ spanned by the columns of A.

Example 3.41   Consider the matrix

A = [1  -1]
    [0   1]
    [3  -3]

(a) Determine whether b = [1, 2, 3]ᵀ is in the column space of A.
(b) Determine whether w = [4  5] is in the row space of A.
(c) Describe row(A) and col(A).
Solution   (a) By Theorem 2.4 and the discussion preceding it, b is a linear combination of the columns of A if and only if the linear system Ax = b is consistent. We row reduce the augmented matrix as follows:

[1  -1 | 1]      [1  -1 | 1]
[0   1 | 2]  ->  [0   1 | 2]
[3  -3 | 3]      [0   0 | 0]

Thus, the system is consistent (and, in fact, has a unique solution). Therefore, b is in col(A). (This example is just Example 2.18, phrased in the terminology of this section.)

(b) As we also saw in Section 2.3, elementary row operations simply create linear combinations of the rows of a matrix. That is, they produce vectors only in the row space of the matrix. If the vector w is in row(A), then w is a linear combination of the rows of A, so if we augment A by w (adjoining w as an extra row), it will be possible to apply elementary row operations to this augmented matrix to reduce the new row to a zero row using only elementary row operations of the form Ri + kRj, where i > j (in other words, working from top to bottom in each column). (Why?) In this example, we have

[1  -1]               [1  -1]               [1  -1]
[0   1]   R4 - 4R1    [0   1]   R4 - 9R2    [0   1]
[3  -3]  --------->   [3  -3]  --------->   [3  -3]
[4   5]               [0   9]               [0   0]
Therefore, w is in row(A).

(c) For an arbitrary vector w = [x  y], the augmented matrix can be reduced in a similar fashion. Therefore, every vector in R² is in row(A), and so row(A) = R². Finding col(A) is identical to solving Example 2.21, wherein we determined that it coincides with the plane (through the origin) in R³ with equation 3x - z = 0. (We will discover other ways to answer this type of question shortly.)
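The membership tests in Example 3.41 reduce to consistency questions about linear systems, so they are easy to check by machine. The following sketch is an illustration only (it is not part of the text); it uses SymPy and assumes the matrix A, the vector b, and the row vector w of Example 3.41.

```python
# Checking Example 3.41 computationally: b is in col(A) exactly when Ax = b is
# consistent, and w is in row(A) exactly when A^T x = w^T is consistent.
from sympy import Matrix

A = Matrix([[1, -1],
            [0,  1],
            [3, -3]])
b = Matrix([1, 2, 3])
w = Matrix([[4, 5]])          # a row vector

def is_consistent(M, v):
    """True if Mx = v has a solution (no pivot lands in the augmented column)."""
    rref_aug, pivots = M.row_join(v).rref()
    return M.cols not in pivots

print(is_consistent(A, b))       # True: b is in col(A)
print(is_consistent(A.T, w.T))   # True: w is in row(A)
```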
Remark   We could also have answered part (b) and the first part of part (c) by observing that any question about the rows of A is the corresponding question about the columns of Aᵀ. So, for example, w is in row(A) if and only if wᵀ is in col(Aᵀ). This is true if and only if the system Aᵀx = wᵀ is consistent. We can then proceed as in part (a). (See Exercises 21-24.)

The observations we have made about the relationship between elementary row operations and the row space are summarized in the following theorem.
Theorem 3.20
Let B be any matrix that is row equivalent to a matrix A. Then row(B) = row(A).
Proof   The matrix A can be transformed into B by a sequence of row operations. Consequently, the rows of B are linear combinations of the rows of A; hence, linear combinations of the rows of B are linear combinations of the rows of A. (See Exercise 21 in Section 2.3.) It follows that row(B) ⊆ row(A).

On the other hand, reversing these row operations transforms B into A. Therefore, the above argument shows that row(A) ⊆ row(B). Combining these results, we have row(A) = row(B).
There is another important subspace that we have already encountered: the set of solutions of a homogeneous system of linear equations. It is easy to prove that this subspace satisfies the three subspace properties.
Theorem 3.21
Let A be an m×n matrix and let N be the set of solutions of the homogeneous linear system Ax = 0. Then N is a subspace of Rⁿ.

Proof   [Note that x must be a (column) vector in Rⁿ in order for Ax to be defined and that 0 = 0_m is the zero vector in Rᵐ.] Since A0_n = 0_m, 0_n is in N. Now let u and v be in N. Then Au = 0 and Av = 0. It follows that

A(u + v) = Au + Av = 0 + 0 = 0

Hence, u + v is in N. Finally, for any scalar c,

A(cu) = c(Au) = c0 = 0

and therefore cu is also in N. It follows that N is a subspace of Rⁿ.
Definition   Let A be an m×n matrix. The null space of A is the subspace of Rⁿ consisting of solutions of the homogeneous linear system Ax = 0. It is denoted by null(A).

The fact that the null space of a matrix is a subspace allows us to prove what intuition and examples have led us to understand about the solutions of linear systems: They have either no solution, a unique solution, or infinitely many solutions.
Theorem 3.22
Let A be a matrix whose entries are real numbers. For any system of linear equations Ax = b, exactly one of the following is true:

a. There is no solution.
b. There is a unique solution.
c. There are infinitely many solutions.

At first glance, it is not entirely clear how we should proceed to prove this theorem. A little reflection should persuade you that what we are really being asked to prove is that if (a) and (b) are not true, then (c) is the only other possibility. That is, if there is more than one solution, then there cannot be just two or even finitely many; there must be infinitely many.

Proof   If the system Ax = b has either no solutions or exactly one solution, we are done. Assume, then, that there are at least two distinct solutions of Ax = b, say x1 and x2. Thus,

Ax1 = b   and   Ax2 = b

with x1 ≠ x2. It follows that

A(x1 - x2) = Ax1 - Ax2 = b - b = 0

Set x0 = x1 - x2. Then x0 ≠ 0 and Ax0 = 0. Hence, the null space of A is nontrivial, and since null(A) is closed under scalar multiplication, cx0 is in null(A) for every scalar c. Consequently, the null space of A contains infinitely many vectors (since it contains at least every vector of the form cx0 and there are infinitely many of these). Now, consider the (infinitely many) vectors of the form x1 + cx0, as c varies through the set of real numbers. We have

A(x1 + cx0) = Ax1 + cAx0 = b + c0 = b

Therefore, there are infinitely many solutions of the equation Ax = b.
Basis

We can extract a bit more from the intuitive idea that subspaces are generalizations of planes through the origin in R³. A plane is spanned by any two vectors that are parallel to the plane but are not parallel to each other. In algebraic parlance, two such vectors span the plane and are linearly independent. Fewer than two vectors will not work; more than two vectors is not necessary. This is the essence of a basis for a subspace.

Definition   A basis for a subspace S of Rⁿ is a set of vectors in S that

1. spans S and
2. is linearly independent.

Example 3.42   In Section 2.3 we saw that the standard unit vectors e1, e2, ..., en in Rⁿ are linearly independent and span Rⁿ. Therefore, they form a basis for Rⁿ, called the standard basis.
Example 3.43   In Example 2.19, we showed that R² is spanned by the two vectors given there. Since those two vectors are also linearly independent (as they are not multiples of each other), they form a basis for R².

A subspace can (and will) have more than one basis. For example, we have just seen that R² has the standard basis {[1, 0]ᵀ, [0, 1]ᵀ} and the basis of Example 3.43. However, we will prove shortly that the number of vectors in a basis for a given subspace will always be the same.

Example 3.44
Find a basis for S = span(u, v, w), where

u = [3, -1, 5]ᵀ,   v = [2, 1, 3]ᵀ,   and   w = [0, -5, 1]ᵀ

Solution   The vectors u, v, and w already span S, so they will be a basis for S if they are also linearly independent. It is easy to determine that they are not; indeed, w = 2u - 3v. Therefore, we can ignore w, since any linear combinations involving u, v, and w can be rewritten to involve u and v alone. (Also see Exercise 47 in Section 2.3.) This implies that S = span(u, v, w) = span(u, v), and since u and v are certainly linearly independent (why?), they form a basis for S. (Geometrically, this means that u, v, and w all lie in the same plane and u and v can serve as a set of direction vectors for this plane.)
Example 3.45   Find a basis for the row space of

A = [ 1   1   3   1   6]
    [ 2  -1   0   1  -1]
    [-3   2   1  -2   1]
    [ 4   1   6   1   3]

Solution   The reduced row echelon form of A is

R = [1  0  1  0  -1]
    [0  1  2  0   3]
    [0  0  0  1   4]
    [0  0  0  0   0]

By Theorem 3.20, row(A) = row(R), so it is enough to find a basis for the row space of R. But row(R) is clearly spanned by its nonzero rows, and it is easy to check that the staircase pattern forces the first three rows of R to be linearly independent. (This is a general fact, one that you will need to establish to prove Exercise 33.) Therefore, a basis for the row space of A is

{[1  0  1  0  -1], [0  1  2  0  3], [0  0  0  1  4]}
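The row reduction in Example 3.45 is easy to reproduce with software. The following sketch (an illustration only, not part of the text) uses SymPy's rref to recover the same basis for row(A).

```python
# The nonzero rows of the reduced row echelon form of A give a basis for row(A).
from sympy import Matrix

A = Matrix([[ 1,  1, 3,  1,  6],
            [ 2, -1, 0,  1, -1],
            [-3,  2, 1, -2,  1],
            [ 4,  1, 6,  1,  3]])

R, pivot_cols = A.rref()
row_basis = [R.row(i) for i in range(R.rows) if any(R.row(i))]
for r in row_basis:
    print(r)     # [1 0 1 0 -1], [0 1 2 0 3], [0 0 0 1 4]
```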
We can use the method of Example 3.45 to find a basis for the subspace spanned by a given set of vectors.

Example 3.46   Rework Example 3.44 using the method from Example 3.45.

Solution   We transpose u, v, and w to get row vectors and then form a matrix with these vectors as its rows:

B = [3  -1   5]
    [2   1   3]
    [0  -5   1]

Proceeding as in Example 3.45, we reduce B to its reduced row echelon form

[1  0   8/5]
[0  1  -1/5]
[0  0     0]

and use the nonzero row vectors as a basis for the row space. Since we started with column vectors, we must transpose again. Thus, a basis for span(u, v, w) is

{[1, 0, 8/5]ᵀ, [0, 1, -1/5]ᵀ}
Remark   In fact, we do not need to go all the way to reduced row echelon form; row echelon form is far enough. If U is a row echelon form of A, then the nonzero row vectors of U will form a basis for row(A) (see Exercise 33). This approach has the advantage of (often) allowing us to avoid fractions. In Example 3.46, B can be reduced to

U = [3  -1  5]
    [0  -5  1]
    [0   0  0]

which gives us the basis

{[3, -1, 5]ᵀ, [0, -5, 1]ᵀ}

for span(u, v, w).

Observe that the methods used in Example 3.44, Example 3.46, and the Remark above will generally produce different bases.

We now turn to the problem of finding a basis for the column space of a matrix A. One method is simply to transpose the matrix. The column vectors of A become the row vectors of Aᵀ, and we can apply the method of Example 3.45 to find a basis for row(Aᵀ). Transposing these vectors then gives us a basis for col(A). (You are asked to do this in Exercises 21-24.) This approach, however, requires performing a new set of row operations on Aᵀ. Instead, we prefer to take an approach that allows us to use the row reduced form of A that we have already computed.

Recall that a product Ax of a matrix and a vector corresponds to a linear combination of the columns of A with the entries of x as coefficients. Thus, a nontrivial solution to Ax = 0 represents a dependence relation among the columns of A. Since elementary row operations do not affect the solution set, if A is row equivalent to R, the columns of A have the same dependence relationships as the columns of R. This important observation is the basis (no pun intended!) for the technique we now use to find a basis for col(A).
Example 3.47   Find a basis for the column space of the matrix from Example 3.45,

A = [ 1   1   3   1   6]
    [ 2  -1   0   1  -1]
    [-3   2   1  -2   1]
    [ 4   1   6   1   3]

Solution   Let a_i denote a column vector of A and let r_i denote a column vector of the reduced echelon form

R = [1  0  1  0  -1]
    [0  1  2  0   3]
    [0  0  0  1   4]
    [0  0  0  0   0]
We can quickly see by inspection that r3 = r1 + 2r2 and r5 = -r1 + 3r2 + 4r4. (Check that, as predicted, the corresponding column vectors of A satisfy the same dependence relations.) Thus, r3 and r5 contribute nothing to col(R). The remaining column vectors, r1, r2, and r4, are linearly independent, since they are just standard unit vectors. The corresponding statements are therefore true of the column vectors of A.
Thus, among the column vectors of A, we eliminate the dependent ones (a3 and a5), and the remaining ones will be linearly independent and hence form a basis for col(A). What is the fastest way to find this basis? Use the columns of A that correspond to the columns of R containing the leading 1s. A basis for col(A) is

{a1, a2, a4} = {[1, 2, -3, 4]ᵀ, [1, -1, 2, 1]ᵀ, [1, 1, -2, 1]ᵀ}

Warning   Elementary row operations change the column space! In our example, col(A) ≠ col(R), since every vector in col(R) has its fourth component equal to 0, but this is certainly not true of col(A). So we must go back to the original matrix A to get the column vectors for a basis of col(A). To be specific, in Example 3.47, r1, r2, and r4 do not form a basis for the column space of A.
Example 3.48   Find a basis for the null space of matrix A from Example 3.47.

Solution   There is really nothing new here except the terminology. We simply have to find and describe the solutions of the homogeneous system Ax = 0. We have already computed the reduced row echelon form R of A, so all that remains to be done in Gauss-Jordan elimination is to solve for the leading variables in terms of the free variables. The final augmented matrix is

[R | 0] = [1  0  1  0  -1 | 0]
          [0  1  2  0   3 | 0]
          [0  0  0  1   4 | 0]
          [0  0  0  0   0 | 0]

If x = [x1, x2, x3, x4, x5]ᵀ, then the leading 1s are in columns 1, 2, and 4, so we solve for x1, x2, and x4 in terms of the free variables x3 and x5. We get x1 = -x3 + x5, x2 = -2x3 - 3x5, and x4 = -4x5. Setting x3 = s and x5 = t, we obtain

x = [x1]   [ -s +  t]      [-1]     [ 1]
    [x2]   [-2s - 3t]      [-2]     [-3]
    [x3] = [    s   ] = s  [ 1] + t [ 0] = s u + t v
    [x4]   [   -4t  ]      [ 0]     [-4]
    [x5]   [    t   ]      [ 0]     [ 1]

Thus, u and v span null(A), and since they are linearly independent, they form a basis for null(A).
Following is a summary of the most effective procedure to use to find bases for the row space, the column space, and the null space of a matrix A.

1. Find the reduced row echelon form R of A.
2. Use the nonzero row vectors of R (containing the leading 1s) to form a basis for row(A).
3. Use the column vectors of A that correspond to the columns of R containing the leading 1s (the pivot columns) to form a basis for col(A).
4. Solve for the leading variables of Rx = 0 in terms of the free variables, set the free variables equal to parameters, substitute back into x, and write the result as a linear combination of f vectors (where f is the number of free variables). These f vectors form a basis for null(A).

If we do not need to find the null space, then it is faster to simply reduce A to row echelon form to find bases for the row and column spaces. Steps 2 and 3 above remain valid (with the substitution of the word "pivots" for "leading 1s").
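The four-step procedure above is mechanical enough to automate. The following sketch (an illustration only, not part of the text) applies it to the matrix A of Examples 3.45-3.48 with SymPy.

```python
# Bases for row(A), col(A), and null(A), following the summary procedure.
from sympy import Matrix

A = Matrix([[ 1,  1, 3,  1,  6],
            [ 2, -1, 0,  1, -1],
            [-3,  2, 1, -2,  1],
            [ 4,  1, 6,  1,  3]])

R, pivot_cols = A.rref()

# Step 2: nonzero rows of R form a basis for row(A).
row_basis = [R.row(i) for i in range(R.rows) if any(R.row(i))]

# Step 3: the pivot columns of A (not of R!) form a basis for col(A).
col_basis = [A.col(j) for j in pivot_cols]

# Step 4: SymPy solves Rx = 0 and returns a basis for null(A).
null_basis = A.nullspace()

print(len(row_basis), len(col_basis), len(null_basis))   # 3 3 2
```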
Dimension and Rank

We have observed that although a subspace will have different bases, each basis has the same number of vectors. This fundamental fact will be of vital importance from here on in this book.

Theorem 3.23   The Basis Theorem

Let S be a subspace of Rⁿ. Then any two bases for S have the same number of vectors.
Proof   (Sherlock Holmes noted, "When you have eliminated the impossible, whatever remains, however improbable, must be the truth," from The Sign of Four by Sir Arthur Conan Doyle.)

Let B = {u1, u2, ..., ur} and C = {v1, v2, ..., vs} be bases for S. We need to prove that r = s. We do so by showing that neither of the other two possibilities, r < s or r > s, can occur.

Suppose that r < s. We will show that this forces C to be a linearly dependent set of vectors. To this end, let

c1v1 + c2v2 + ... + csvs = 0     (1)

Since B is a basis for S, we can write each vi as a linear combination of the elements uj:

v1 = a11u1 + a12u2 + ... + a1rur
v2 = a21u1 + a22u2 + ... + a2rur
 ...
vs = as1u1 + as2u2 + ... + asrur     (2)

Substituting the equations (2) into equation (1), we obtain

c1(a11u1 + ... + a1rur) + c2(a21u1 + ... + a2rur) + ... + cs(as1u1 + ... + asrur) = 0
Regrouping, we have

(c1a11 + c2a21 + ... + csas1)u1 + (c1a12 + c2a22 + ... + csas2)u2 + ... + (c1a1r + c2a2r + ... + csasr)ur = 0

Now, since B is a basis, the ui's are linearly independent. So each of the expressions in parentheses must be zero:

c1a11 + c2a21 + ... + csas1 = 0
c1a12 + c2a22 + ... + csas2 = 0
 ...
c1a1r + c2a2r + ... + csasr = 0

This is a homogeneous system of r linear equations in the s variables c1, c2, ..., cs. (The fact that the variables appear to the left of the coefficients makes no difference.) Since r < s, we know from Theorem 2.3 that there are infinitely many solutions. In particular, there is a nontrivial solution, giving a nontrivial dependence relation in equation (1). Thus, C is a linearly dependent set of vectors. But this finding contradicts the fact that C was given to be a basis and hence linearly independent. We conclude that r < s is not possible.

Similarly (interchanging the roles of B and C), we find that r > s leads to a contradiction. Hence, we must have r = s, as desired.
Since all bases for a given subspace must have the same number of vectors, we can attach a name to this number.

Definition   If S is a subspace of Rⁿ, then the number of vectors in a basis for S is called the dimension of S, denoted dim S.
Remark   The zero vector 0 by itself is always a subspace of Rⁿ. (Why?) Yet any set containing the zero vector (and, in particular, {0}) is linearly dependent, so {0} cannot have a basis. We define dim {0} to be 0.

Since the standard basis for Rⁿ has n vectors, dim Rⁿ = n. (Note that this result agrees with our intuitive understanding of dimension for n ≤ 3.)

Example 3.50   In Examples 3.45 through 3.48, we found that row(A) has a basis with three vectors, col(A) has a basis with three vectors, and null(A) has a basis with two vectors. Hence, dim(row(A)) = 3, dim(col(A)) = 3, and dim(null(A)) = 2.
A single example is not enough on which to speculate, but the fact that the row and column spaces in Example 3.50 have the same dimension is no accident. Nor is the fact that the sum of dim(col(A)) and dim(null(A)) is 5, the number of columns of A. We now prove that these relationships are true in general.
Theorem 3.24
The row and column spaces of a matrix A have the same dimension.

Proof   Let R be the reduced row echelon form of A. By Theorem 3.20, row(A) = row(R), so

dim(row(A)) = dim(row(R)) = number of nonzero rows of R = number of leading 1s of R

Let this number be called r. Now col(A) ≠ col(R), but the columns of A and R have the same dependence relationships. Therefore, dim(col(A)) = dim(col(R)). Since there are r leading 1s, R has r columns that are standard unit vectors, e1, e2, ..., er. (These will be vectors in Rᵐ if A and R are m×n matrices.) These r vectors are linearly independent, and the remaining columns of R are linear combinations of them. Thus, dim(col(R)) = r. It follows that dim(row(A)) = r = dim(col(A)), as we wished to prove.
(The rank of a matrix was first defined in 1878 by Georg Frobenius (1849-1917), although he defined it using determinants and not as we have done here. See Chapter 4. Frobenius was a German mathematician who received his doctorate from and later taught at the University of Berlin. Best known for his contributions to group theory, Frobenius used matrices in his work on group representations.)

Definition   The rank of a matrix A is the dimension of its row and column spaces and is denoted by rank(A).

For Example 3.50, we can thus write rank(A) = 3.

Remarks
• The preceding definition agrees with the more informal definition of rank that was introduced in Chapter 2. The advantage of our new definition is that it is much more flexible.
• The rank of a matrix simultaneously gives us information about linear dependence among the row vectors of the matrix and among its column vectors. In particular, it tells us the number of rows and columns that are linearly independent (and this number is the same in each case!).

Since the row vectors of A are the column vectors of Aᵀ, Theorem 3.24 has the following immediate corollary.
Theorem 3.25
For any matrix A,

rank(Aᵀ) = rank(A)

Proof   We have

rank(Aᵀ) = dim(col(Aᵀ)) = dim(row(A)) = rank(A)

Definition   The nullity of a matrix A is the dimension of its null space and is denoted by nullity(A).
In other words, nullity(A) is the dimension of the solution space of Ax = 0, which is the same as the number of free variables in the solution. We can now revisit the Rank Theorem (Theorem 2.2), rephrasing it in terms of our new definitions.
Theorem 3.26   The Rank Theorem

If A is an m×n matrix, then

rank(A) + nullity(A) = n

Proof   Let R be the reduced row echelon form of A, and suppose that rank(A) = r. Then R has r leading 1s, so there are r leading variables and n - r free variables in the solution to Ax = 0. Since dim(null(A)) = n - r, we have

rank(A) + nullity(A) = r + (n - r) = n

Often, when we need to know the nullity of a matrix, we do not need to know the actual solution of Ax = 0. The Rank Theorem is extremely useful in such situations, as the following example illustrates.
Example 3.51   Find the nullity of each of the following matrices:

M = [2  3]
    [1  5]

and the 4×4 matrix N given in the text.

Solution   Since the two columns of M are clearly linearly independent, rank(M) = 2. Thus, by the Rank Theorem, nullity(M) = 2 - rank(M) = 2 - 2 = 0.

There is no obvious dependence among the rows or columns of N, so we apply row operations to reduce it to

[2  1  -2  -1]
[0  2   1   3]
[0  0   0   0]
[0  0   0   0]

We have reduced the matrix far enough (we do not need reduced row echelon form here, since we are not looking for a basis for the null space). We see that there are only two nonzero rows, so rank(N) = 2. Hence, nullity(N) = 4 - rank(N) = 4 - 2 = 2.
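Rank and nullity are also easy to compute directly and to check against the Rank Theorem. The sketch below is an illustration only (not part of the text); it reuses the matrix A of Example 3.50 rather than the matrices of Example 3.51.

```python
# Rank Theorem check: rank(A) + nullity(A) equals the number of columns of A.
from sympy import Matrix

A = Matrix([[ 1,  1, 3,  1,  6],
            [ 2, -1, 0,  1, -1],
            [-3,  2, 1, -2,  1],
            [ 4,  1, 6,  1,  3]])

rank = A.rank()
nullity = len(A.nullspace())
print(rank, nullity, rank + nullity == A.cols)   # 3 2 True
```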
The results of this section allow us to extend the Fundamental Theorem of Invertible Matrices (Theorem 3.12).
Theorem 3.27   The Fundamental Theorem of Invertible Matrices: Version 2

Let A be an n×n matrix. The following statements are equivalent:

a. A is invertible.
b. Ax = b has a unique solution for every b in Rⁿ.
c. Ax = 0 has only the trivial solution.
d. The reduced row echelon form of A is I_n.
e. A is a product of elementary matrices.
f. rank(A) = n
g. nullity(A) = 0
h. The column vectors of A are linearly independent.
i. The column vectors of A span Rⁿ.
j. The column vectors of A form a basis for Rⁿ.
k. The row vectors of A are linearly independent.
l. The row vectors of A span Rⁿ.
m. The row vectors of A form a basis for Rⁿ.

(The nullity of a matrix was defined in 1884 by James Joseph Sylvester (1814-1897), who was interested in invariants, properties of matrices that do not change under certain types of transformations. Born in England, Sylvester became the second president of the London Mathematical Society. In 1878, while teaching at Johns Hopkins University in Baltimore, he founded the American Journal of Mathematics, the first mathematical journal in the United States.)

Proof   We have already established the equivalence of (a) through (e). It remains to be shown that statements (f) to (m) are equivalent to the first five statements.

(f) ⇔ (g) Since rank(A) + nullity(A) = n when A is an n×n matrix, it follows from the Rank Theorem that rank(A) = n if and only if nullity(A) = 0.

(f) ⇒ (d) ⇒ (c) ⇒ (h) If rank(A) = n, then the reduced row echelon form of A has n leading 1s and so is I_n. From (d) ⇒ (c) we know that Ax = 0 has only the trivial solution, which implies that the column vectors of A are linearly independent, since Ax is just a linear combination of the column vectors of A.

(h) ⇒ (i) If the column vectors of A are linearly independent, then Ax = 0 has only the trivial solution. Thus, by (c) ⇒ (b), Ax = b has a unique solution for every b in Rⁿ. This means that every vector b in Rⁿ can be written as a linear combination of the column vectors of A, establishing (i).

(i) ⇒ (j) If the column vectors of A span Rⁿ, then col(A) = Rⁿ by definition, so rank(A) = dim(col(A)) = n. This is (f), and we have already established that (f) ⇒ (h). We conclude that the column vectors of A are linearly independent and so form a basis for Rⁿ, since, by assumption, they also span Rⁿ.

(j) ⇒ (f) If the column vectors of A form a basis for Rⁿ, then, in particular, they are linearly independent. It follows that the reduced row echelon form of A contains n leading 1s, and thus rank(A) = n.

The above discussion shows that (f) ⇒ (d) ⇒ (c) ⇒ (h) ⇒ (i) ⇒ (j) ⇒ (f) ⇔ (g). Now recall that, by Theorem 3.25, rank(Aᵀ) = rank(A), so what we have just proved gives us the corresponding results about the column vectors of Aᵀ. These are then results about the row vectors of A, bringing (k), (l), and (m) into the network of equivalences as well.

Theorems such as the Fundamental Theorem are not merely of theoretical interest. They are tremendous labor-saving devices as well. The Fundamental Theorem has already allowed us to cut in half the work needed to check that two square matrices are inverses. It also simplifies the task of showing that certain sets of vectors are bases for Rⁿ. Indeed, when we have a set of n vectors in Rⁿ, that set will be a basis for Rⁿ if either of the necessary properties of linear independence or spanning is true. The next example shows how easy the calculations can be.
Example 3.52   Show that the vectors [1, 2, 3]ᵀ, [-1, 0, 1]ᵀ, and [4, 9, 7]ᵀ form a basis for R³.

Solution   According to the Fundamental Theorem, the vectors will form a basis for R³ if and only if a matrix with these vectors as its columns (or rows) has rank 3. We perform just enough row operations to determine this:

A = [1  -1  4]       [1  -1   4]
    [2   0  9]  -->  [0   2   1]
    [3   1  7]       [0   0  -7]

We see that A has rank 3, so the given vectors are a basis for R³ by the equivalence of (f) and (j).
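A rank computation like the one in Example 3.52 is a one-liner numerically. The sketch below is an illustration only (not part of the text).

```python
# Three vectors form a basis for R^3 exactly when the matrix having them as
# columns has rank 3.
import numpy as np

A = np.array([[1, -1, 4],
              [2,  0, 9],
              [3,  1, 7]], dtype=float)

print(np.linalg.matrix_rank(A))   # 3, so the columns are a basis for R^3
```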
The next theorem is an application of both the Rank Theorem and the Fundamental Theorem. We will require this result in Chapters 5 and 7.

Theorem 3.28   Let A be an m×n matrix. Then

a. rank(AᵀA) = rank(A)
b. The n×n matrix AᵀA is invertible if and only if rank(A) = n.

Proof   (a) Since AᵀA is n×n, it has the same number of columns as A. The Rank Theorem then tells us that

rank(A) + nullity(A) = n = rank(AᵀA) + nullity(AᵀA)

Hence, to show that rank(A) = rank(AᵀA), it is enough to show that nullity(A) = nullity(AᵀA). We will do so by establishing that the null spaces of A and AᵀA are the same. To this end, let x be in null(A), so that Ax = 0. Then AᵀAx = Aᵀ0 = 0, and thus x is in null(AᵀA). Conversely, let x be in null(AᵀA). Then AᵀAx = 0, so xᵀAᵀAx = xᵀ0 = 0. But then

(Ax) · (Ax) = (Ax)ᵀ(Ax) = xᵀAᵀAx = 0

and hence Ax = 0, by Theorem 1.2(d). Therefore, x is in null(A), so null(A) = null(AᵀA), as required.

(b) By the Fundamental Theorem, the n×n matrix AᵀA is invertible if and only if rank(AᵀA) = n. But, by (a), this is so if and only if rank(A) = n.
Coordinates

We now return to one of the questions posed at the very beginning of this section: How should we view vectors in R³ that live in a plane through the origin? Are they two-dimensional or three-dimensional? The notions of basis and dimension will help clarify things.
A plane through the origin is a two-dimensional subspace of R³, with any set of two direction vectors serving as a basis. Basis vectors locate coordinate axes in the plane/subspace, in turn allowing us to view the plane as a "copy" of R². Before we illustrate this approach, we prove a theorem guaranteeing that "coordinates" that arise in this way are unique.

Theorem 3.29   Let S be a subspace of Rⁿ and let B = {v1, v2, ..., vk} be a basis for S. For every vector v in S, there is exactly one way to write v as a linear combination of the basis vectors in B:

v = c1v1 + c2v2 + ... + ckvk

Proof   Since B is a basis, it spans S, so v can be written in at least one way as a linear combination of v1, v2, ..., vk. Let one of these linear combinations be

v = c1v1 + c2v2 + ... + ckvk

Our task is to show that this is the only way to write v as a linear combination of v1, v2, ..., vk. To this end, suppose that we also have

v = d1v1 + d2v2 + ... + dkvk

Then

c1v1 + c2v2 + ... + ckvk = d1v1 + d2v2 + ... + dkvk

Rearranging (using properties of vector algebra), we obtain

(c1 - d1)v1 + (c2 - d2)v2 + ... + (ck - dk)vk = 0

Since B is a basis, v1, v2, ..., vk are linearly independent. Therefore,

c1 - d1 = 0,   c2 - d2 = 0,   ...,   ck - dk = 0

In other words, c1 = d1, c2 = d2, ..., ck = dk, and the two linear combinations are actually the same. Thus, there is exactly one way to write v as a linear combination of the basis vectors in B.
Definition   Let S be a subspace of Rⁿ and let B = {v1, v2, ..., vk} be a basis for S. Let v be a vector in S, and write v = c1v1 + c2v2 + ... + ckvk. Then c1, c2, ..., ck are called the coordinates of v with respect to B, and the column vector

[v]_B = [c1, c2, ..., ck]ᵀ

is called the coordinate vector of v with respect to B.

Example 3.53   Let E = {e1, e2, e3} be the standard basis for R³. Find the coordinate vector of v = [2, 7, 4]ᵀ with respect to E.
Solution   Since v = 2e1 + 7e2 + 4e3,

[v]_E = [2, 7, 4]ᵀ

It should be clear that the coordinate vector of every (column) vector in Rⁿ with respect to the standard basis is just the vector itself.
Example 3.54   In Example 3.44, we saw that u = [3, -1, 5]ᵀ, v = [2, 1, 3]ᵀ, and w = [0, -5, 1]ᵀ are three vectors in the same subspace (plane through the origin) S of R³ and that B = {u, v} is a basis for S. Since w = 2u - 3v, we have

[w]_B = [2, -3]ᵀ

See Figure 3.3.

Figure 3.3: The coordinates of a vector with respect to a basis.
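When the coordinates are not obvious by inspection, they can be found by solving a linear system. The sketch below (an illustration only, not part of the text) recovers the coordinate vector of Example 3.54.

```python
# [w]_B is the solution c of the system (matrix with basis vectors as columns) c = w.
from sympy import Matrix, linsolve, symbols

u = Matrix([3, -1, 5])
v = Matrix([2, 1, 3])
w = Matrix([0, -5, 1])

c1, c2 = symbols('c1 c2')
B = u.row_join(v)                 # 3x2 matrix whose columns are the basis vectors
print(linsolve((B, w), c1, c2))   # {(2, -3)}, so [w]_B = [2, -3]^T
```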
In Exercises 1-4, let S be the collection of vectors [x, y]ᵀ in R² that satisfy the given property. In each case, either prove that S forms a subspace of R² or give a counterexample to show that it does not.
1. x = 0
2. x ≥ 0, y ≥ 0
3. y = 2x
4. xy ≥ 0

9. Prove that every line through the origin in R³ is a subspace of R³.
10. Suppose S consists of all points in R² that are on the x-axis or the y-axis (or both). (S is called the union of the two axes.) Is S a subspace of R²? Why or why not?
In Exercises 5-8, let S be the collection of vectors [x, y, z]ᵀ in R³ that satisfy the given property. In each case, either prove that S forms a subspace of R³ or give a counterexample to show that it does not.
5. x = y = z
6. z = 2x; y = 0
7. x - y + z = 1
8. |x - y| = |y - z|

In Exercises 11 and 12, determine whether b is in col(A) and whether w is in row(A), as in Example 3.41. (The matrices A and the vectors b are given in the text; in Exercise 11, w = [-1  1  1], and in Exercise 12, w = [2  4  -5].)
13. In Exercise 11, determine whether w is in row(A), using the method described in the Remark following Example 3.41.
14. In Exercise 12, determine whether w is in row(A), using the method described in the Remark following Example 3.41.
15. If A is the matrix in Exercise 11, is the vector v given in the text in null(A)?
16. If A is the matrix in Exercise 12, is the vector v given in the text in null(A)?

In Exercises 17-20, give bases for row(A), col(A), and null(A) for the matrices A given in the text.

In Exercises 21-24, find bases for row(A) and col(A) in the given exercises using Aᵀ.
21. Exercise 17    22. Exercise 18
23. Exercise 19    24. Exercise 20

25. Explain carefully why your answers to Exercises 17 and 21 are both correct even though there appear to be differences.
26. Explain carefully why your answers to Exercises 18 and 22 are both correct even though there appear to be differences.

In Exercises 27-30, find a basis for the span of the vectors given in the text.

For Exercises 31 and 32, find bases for the spans of the vectors in the given exercises from among the vectors themselves.
31. Exercise 29    32. Exercise 30

33. Prove that if R is a matrix in echelon form, then a basis for row(R) consists of the nonzero rows of R.
34. Prove that if the columns of A are linearly independent, then they must form a basis for col(A).

For Exercises 35-38, give the rank and the nullity of the matrices in the given exercises.
35. Exercise 17    36. Exercise 18
37. Exercise 19    38. Exercise 20

39. If A is a 3×5 matrix, explain why the columns of A must be linearly dependent.
40. If A is a 4×2 matrix, explain why the rows of A must be linearly dependent.
41. If A is a 3×5 matrix, what are the possible values of nullity(A)?
42. If A is a 4×2 matrix, what are the possible values of nullity(A)?

In Exercises 43 and 44, find all possible values of rank(A) as a varies. (The matrices A, which contain the parameter a, are given in the text.)

Answer Exercises 45-48 by considering the matrix with the given vectors as its columns. In Exercises 45 and 46, do the given vectors form a basis for R³? In Exercises 47 and 48, do the given vectors form a basis for R⁴?
In Exercises 49 and 50, show that w is in span(B) and find the coordinate vector [w]_B. (The bases B and the vectors w are given in the text.)

In Exercises 51-54, compute the rank and nullity of the given matrices over the indicated Z_p.
55. If A is m×n, prove that every vector in null(A) is orthogonal to every vector in row(A).
56. If A and B are n×n matrices of rank n, prove that AB has rank n.
57. (a) Prove that rank(AB) ≤ rank(B). [Hint: Review Exercise 29 in Section 3.1.]
    (b) Give an example in which rank(AB) < rank(B).
58. (a) Prove that rank(AB) ≤ rank(A). [Hint: Review Exercise 30 in Section 3.1 or use transposes and Exercise 57(a).]
    (b) Give an example in which rank(AB) < rank(A).
59. (a) Prove that if U is invertible, then rank(UA) = rank(A). [Hint: A = U⁻¹(UA).]
    (b) Prove that if V is invertible, then rank(AV) = rank(A).
60. Prove that an m×n matrix A has rank 1 if and only if A can be written as the outer product uvᵀ of a vector u in Rᵐ and v in Rⁿ.
61. If an m×n matrix A has rank r, prove that A can be written as the sum of r matrices, each of which has rank 1. [Hint: Find a way to use Exercise 60.]
62. Prove that, for m×n matrices A and B, rank(A + B) ≤ rank(A) + rank(B).
63. Let A be an n×n matrix such that A² = O. Prove that rank(A) ≤ n/2. [Hint: Show that col(A) ⊆ null(A) and use the Rank Theorem.]
64. Let A be a skew-symmetric n×n matrix. (See the exercises in Section 3.2.)
    (a) Prove that xᵀAx = 0 for all x in Rⁿ.
    (b) Prove that I + A is invertible. [Hint: Show that null(I + A) = {0}.]
Introduction to Linear Transformations

In this section, we begin to explore one of the themes from the introduction to this chapter. There we saw that matrices can be used to transform vectors, acting as a type of "function" of the form w = T(v), where the independent variable v and the dependent variable w are vectors. We will make this notion more precise now and look at several examples of such matrix transformations, leading to the concept of a linear transformation, a powerful idea that we will encounter repeatedly from here on.

We begin by recalling some of the basic concepts associated with functions. You will be familiar with most of these ideas from other courses in which you encountered functions of the form f: R → R [such as f(x) = x²] that transform real numbers into real numbers. What is new here is that vectors are involved and we are interested only in functions that are "compatible" with the vector operations of addition and scalar multiplication.
Consider an example. Let

A = [1   0]                 [ 1]
    [2  -1]    and    v =   [-1]
    [3   4]

Then

Av = [1   0] [ 1]   [ 1]
     [2  -1] [-1] = [ 3]
     [3   4]        [-1]

This shows that A transforms v into w = [1, 3, -1]ᵀ.

We can describe this transformation more generally. The matrix equation

A[x, y]ᵀ = [x, 2x - y, 3x + 4y]ᵀ

gives a formula that shows how A transforms an arbitrary vector [x, y]ᵀ of R² into a vector of R³. We denote this transformation by T_A and write

T_A([x, y]ᵀ) = [x, 2x - y, 3x + 4y]ᵀ

(Although technically sloppy, omitting the parentheses in definitions such as this one is a common convention that saves some writing. The description of T_A becomes
T_A[x, y]ᵀ = [x, 2x - y, 3x + 4y]ᵀ

with this convention.)

With this example in mind, we now consider some terminology. A transformation (or mapping or function) T from Rⁿ to Rᵐ is a rule that assigns to each vector v in Rⁿ a unique vector T(v) in Rᵐ. The domain of T is Rⁿ, and the codomain of T is Rᵐ. We indicate this by writing T: Rⁿ → Rᵐ. For a vector v in the domain of T, the vector T(v) in the codomain is called the image of v under (the action of) T. The set of all possible images T(v) (as v varies throughout the domain of T) is called the range of T.

In our example, the domain of T_A is R² and its codomain is R³, so we write T_A: R² → R³. The image of v = [1, -1]ᵀ is w = T_A(v) = [1, 3, -1]ᵀ. What is the range of T_A?
It consists of all vectors in the codomain R³ that are of the form

[x, 2x - y, 3x + 4y]ᵀ = x[1, 2, 3]ᵀ + y[0, -1, 4]ᵀ

which describes the set of all linear combinations of the column vectors [1, 2, 3]ᵀ and [0, -1, 4]ᵀ of A. In other words, the range of T_A is the column space of A! (We will have more to say about this later; for now we'll simply note it as an interesting observation.) Geometrically, this shows that the range of T_A is the plane through the origin in R³ with direction vectors given by the column vectors of A. Notice that the range of T_A is strictly smaller than the codomain of T_A.
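The observation that the range of T_A is col(A) is easy to see in a computation. The sketch below is an illustration only (not part of the text).

```python
# T_A(v) = Av, and each image is a linear combination of the columns of A,
# so the range of T_A is the column space of A.
import numpy as np

A = np.array([[1,  0],
              [2, -1],
              [3,  4]], dtype=float)

def T(v):
    return A @ v

v = np.array([1.0, -1.0])
print(T(v))                                              # [ 1.  3. -1.]
x, y = v
print(np.allclose(T(v), x * A[:, 0] + y * A[:, 1]))      # True
```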
Linear Transformations

The example T_A above is a special case of a more general type of transformation called a linear transformation. We will consider the general definition in Chapter 6, but the essence of it is that these are the transformations that "preserve" the vector operations of addition and scalar multiplication.

Definition   A transformation T: Rⁿ → Rᵐ is called a linear transformation if

1. T(u + v) = T(u) + T(v) for all u and v in Rⁿ and
2. T(cv) = cT(v) for all v in Rⁿ and all scalars c.
Example 3.55   Consider once again the transformation T: R² → R³ defined by

T[x, y]ᵀ = [x, 2x - y, 3x + 4y]ᵀ

Let's check that T is a linear transformation. To verify (1), we let u = [x1, y1]ᵀ and v = [x2, y2]ᵀ. Then

T(u + v) = T[x1 + x2, y1 + y2]ᵀ
         = [x1 + x2, 2(x1 + x2) - (y1 + y2), 3(x1 + x2) + 4(y1 + y2)]ᵀ
         = [x1 + x2, (2x1 - y1) + (2x2 - y2), (3x1 + 4y1) + (3x2 + 4y2)]ᵀ
         = [x1, 2x1 - y1, 3x1 + 4y1]ᵀ + [x2, 2x2 - y2, 3x2 + 4y2]ᵀ
         = T(u) + T(v)
Chapter 3
Matrices
To show (2), we let v = [ ; ] and let cbe a scalar. Th en
ex 2(ex) - (0') 3(ex) + 4(0')
ex c(2x - y) c(3x + 4y)
x - c 2x - y 3x + 4y Thus, T is a linear t ransformation .
B'.lrk
T he d efi nition of a li near transformation can be st re
T : IR~ --J> IRm is a linear transfon nation if
In Exercise 53, yo u will be asked to show that the sta tement above IS eqUIvalen t to the o riginal definition. In p ractice, this equivalent fo n n ula tion can save some writingtry it! Altho ugh the linear transfor mation T in Example 3.55 originally arose as a matrix transfo rmatio n TN it is a sim ple matter to recover th e matrix A fro m the definition o f Tgiven in the example. We observe that x 2x - y 3x + 4y
o
I
= x 2
3
+y
- 1 4
I
=
2 3
o - 1 [; 4
1
I 0 so T = T. . , where A = 2 - 1 . (Notice that when the variables x and yare lined 3 4 up, the matrix A is j ust their coeffiCi ent matrix.) Recogmzmg that a transformati o n JS a matrix transfo rmation is importan t, si nce, as the next theorem shows, all matrix transformat io ns are linear transformatlons.
Theor•• 3.30
Let A be an mX tJ mat rix. Then the matrix tra nsfo rmation T .... : Rn defi ned by
is a linear transfo rmation.
--J>
R"I
Proof
Let u and v be vectors in Rⁿ and let c be a scalar. Then

T_A(u + v) = A(u + v) = Au + Av = T_A(u) + T_A(v)

and

T_A(cv) = A(cv) = c(Av) = cT_A(v)

Hence, T_A is a linear transformation.
Let F : Rl --.. Rl be the transfor mation that sends each poin t to its reflection in the x-axis. Show that F is .tlinear transformation.
Sol.II,.
Fro m Figure 3.'1, it is clear that F sends the point (x, y) to the point (x, - y). Thus, we may write
)'
( I . 2) T
,, ,,,
~---r-->-~,+-+ x
,, ,, ,
,
,, ,,
We could proceed to check that F is linear, as in Example 3.55 (this one is even easier 10 check! ), but it is fas ter to observe Ihal
(r. - I·)
(1. - 2)
flgare 3.4 Therefore,
Refie("tion III the x-alCis
Jx]y rl
= A [x] Y , where A = ['0
0] . .
.
_ I ,SO F IS a malrlx transfo rmallon. It
now follows, by Theorem 3.30, that F is a linea r transformation.
Example 3.51
Let R: 1R1 ---+ Rl be the transformation tha t rotates each poin! 90" counterclockwise about the o rigin. Show Ihat R is a linear transformatio n.
Sal.II,.
( -I'. _1 )
----x
As Figu re 3.5 shows, R sends the point (x, y) to the point (- y. x). Thus,
we have (x. I")
-y
figure 3.5 0
A 90 rotation
.,
Hence, R IS a matrix transformation and is the refore linear.
Observe that if we multiply a matrix by standard basis vectors, we obta in the . columns of the matrix. For example,
,
,
,
,
, ,
b d
f
We can usc this observation to show that tl'l~ry linear transformati on from RM to R'" arises as a matrix tra nsformation.
21.
Chapter 3 Matrices
Theorlm 3.31
lei T: IR" -+ IR'" be a linear transformation. Then T is a n};ltrix transformation. More speCifically, T = TA , where A is the mX" matrLx A = [T« ,) ;T«,) ; ··· ;T «.,)]
I',.,f
Let e l, e 2, .•• , en be the standard basis vectors in R~ and let x be a vecior in R". \Ve can ,."rile x ::: xle l + x2e 2 + ... + x"en (where the x,'s are the components of xl. \Ve also know that T(e l), 1·(e zl, ... , T(e,,) are (colum n) vectors in Am. Let A ::: [T(e l) : T(el ) : ... : T(e n) J be the m X /I matrix wi th these vectors as its colu mns. Then
'f ( x) = T(xic i + x2e2 + ... + x"c,,) ::: Xl T(c l
)
+ .xzT(e2) + ... + x"T(c,,)
= [T(e,) ; 1'(o,) ; '"
; T« . ) ]
x, x,
= Ax
x, as required. The matrix A in Theorem 3.3 1 is called the standartl matrix of ti,e /i" ear transformatio" T.
Example 3.58
Show thtl l a rottllion about the origin through an angle 0 defines a linear transformation from [Rl to Oil and find ilS standard mat rix. lei R, be the rotation. We will give a geometric argumen t to establish the fact that R, is linear. Let u and v be vectors in H2. If they are nOI pa rallel, then Figure 3.6(a) shows the parallelogram rule that determines u + v. If we now apply R9 , the entire parallelogram is rotated th rough the angle 0, as shown in Figu re 3.6(b). But the diagomll of this parallelogram must be U9(u) + R.g{v), again by the parallelogmm rule. Hence, R~( u + v) = R,,(u ) + R,(v). (What happens ifu and v are parallel?)
S"IU,.
y
)'
,
-- -
u h
~I U
u
\)
•
~ u x (,j
(bj
JlIIIII 1.6
Similarly, If we apply R, to v and cv, we obtain R9(v) and R,(c v), as shown in Figure 3.7. But since the rotation does not affect lengths, we must then have ~( cv ) "" c~( v), as required. (Draw diagrams for the cases 0 < c < I, - I < ( < 0, and c< - I. )
SectIOn 3.6
Introduction to Linear Transformations
7:15
" /~___(cos
)'
O.
~jn
0)
sin 0
ROlev )
RI1 (v )
v
x
flgur. 3.]
Therefore, R,; is a linear transformation. Accordi ng to T heorem 3.3 1, we can fi nd Its matrix by determ in ing its effect on the standard basIS vecto rs e l and e 2 of Ill"'. Now,
[1] [<Sin0'09 ].
as Figure 3.8 shows, R,; 0 We can fi nd
=
~[~] similarly, but It IS faster toobscrve that ~[ ~ ] mu st be perpen-
[1]
dic ular (coUilierciockwise) \0 R,; 39) 0 ' . . IF19ure
and so, by Example 3.57, R,;
[0] [-')" 0] =
I
,
Therefore, the standa rd matri x of R,; is
cosO
[COS O -SinO ]. . sm ()
cos 0
y
" o
:...l:......-
4-0 x
flgur. 3.9 R~( el)
T he result of Example 3.58 can now be used to comp ute the effect of an y ro tation. For example, suppose we wish to rotate the point (2, - I) through 60" abo ut the origin. (The convention is that a positive angle corresponds to a coullIcrclockwise
21'
Cha pter 3
Matr ices
rotatio n, while a negative angle is clockwise.) Since cos 6()0 = we compute
Y
R6IJ
..... (2.
'] [,,,, 60" 60"][ ' ] = = [- I sin 60" -,;n cos 60" - I
- 0 /2][ 2] [ 0 '/' /2 1/2 - 1
= [ (2
- I)
1/2and sin 6()0 = VJ/2,
+ 0)f2]
(20 - 1)/2
Thus, the image of the point (2, -1) under this rotation is the point ((2 + √3)/2, (2√3 - 1)/2) ≈ (1.87, 1.23), as shown in Figure 3.10.
Figure 3.10
+ VJ )/2,
A 60" rotation
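The rotation matrix derived in Example 3.58 is simple to code and to check against the 60° computation above. The sketch below is an illustration only (not part of the text).

```python
# Standard matrix of a counterclockwise rotation through angle theta,
# applied to the point (2, -1).
import numpy as np

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

R60 = rotation(np.radians(60))
print(R60 @ np.array([2.0, -1.0]))   # approximately [1.87, 1.23]
```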
Exalllple 3.59
(a ) Show that the transfo rma tion P ; Al --to R! that p rojects a point onto the kaxis is a linear transformation and find its standard matrix. (b) More generally, if is a line thro ugh the o rigin in Al, show tha t the transformation Pt : Rl --+ Rl that projects a point onto is a linear tra nsfo rma tion and find Its standard matrix.
e
Salu1loa
y (x.
e
(a) As Figure 3. 11 shows, Psends the point (x, y) to the point (x,0). Thus,
q
,,
t
-\--" ",_ (x. 0) flglr.3 .11 A projection
.,
It fo llows that P is a ma trix transfo rmatio n (and hen ce a linea r transformat ion ) with
standard matrix
[~ ~J.
e
(b) Let the line have d irectio n vecto r d and let v be an arb itrary vector. Then p( is given by p rojd(v), the p rojectio n of v o nto d , which yo u'll recall from Sect io n 1.2 has the fo rm ula
T hus, to show that Pr is linear. we proceed as fo llows:
Similarly. Pt (cv ) = cPt( v ) for any scalar c (Exerctse 52). Hence, Pf is a linear tra nsform ation.
Section 3.6 Inlroduct ion to Linear Transformations
To find the standard matrix of Pt , we app ly Theorem 3.3 1. If we let d =
tn
~ [ dd:]'
then
.nd Thus, the standard matrix of the projection is A
, [ d ' - a, + di' d d -
tllaJ (tlt + d D] ay (tl L2 + aD
I
l
L
2
As a check, note that in part (a) we could take d = e l as a direaion vector for th e x-axis. Therefore,
a, "" 1 and d
l
= 0, and we ob tai n A ""
[~ ~], as befo re.
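The standard matrix of the projection found in Example 3.59 can be written compactly as an outer product and checked numerically. The sketch below is an illustration only (not part of the text).

```python
# Projection onto the line through the origin with direction d:
# P = (d d^T) / (d . d), which matches the 2x2 matrix derived above.
import numpy as np

def projection_matrix(d):
    d = np.asarray(d, dtype=float)
    return np.outer(d, d) / np.dot(d, d)

print(projection_matrix([1, 0]))                            # [[1. 0.] [0. 0.]]: the x-axis case
print(projection_matrix([1, 1]) @ np.array([2.0, 0.0]))     # [1. 1.]
```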
New linear Translormallons Irom Old If T: R'" -+ R~ and 5 : R" -+ RI' arc linear transformations, then we may follow Tby 5 to form the composition of the two transfo nnations, denoted S o T. Notice that, ' 11 order for S o Tto make sense, the codomain of Tand the domain of 5 must malch (Ln this case, they are both A ") and the resultLllg composite transformation S o T goes from the domain of Tlo Ihe codomain of 5 (Lll lhis case, SOT: R'" -+ IRP) . Figure 3.1 2 shows schematically how this compositIon works. The for mal definit ion of composition of transfo rmations is taken directly from this fi gure and is the same as the corresponding defi nit ion of composit ion of ordinary functions:
(5' T)(v)
~
5( T(v ))
O f co urse, we would like 5 0 T to be a linear transformation too, and happily we find that it is. We can demonstra te this by showing that So Tsat isfies the defimt lon of a li near tra nsform ation (which we will do in Chapter 6), but, since for the lime being we are assuming that line
R" T
s
Ih I
FllIg,. 3.12
The composillon of transrormations
y ) - (S
T)( v )
21.
Chapter 3
Matrices
•
Theor•• 3.32
Let T: Rm -? Rn and 5: R"-+ R' be linear transformations. Then S~ T: A- --t- A' is a linear transfo rmatio n. Moreover, their standard matrices ace..relatcd by ~
[So 1']
[SliT] ..
.,
,
P,..f let [5] = A and [T ] = H. (Not ice that A is px" and 8 is "x m. l I( v is a vector in R-, then we simply compute ~
(So T)(v) - S(7'(v))
5(Bv)
~
A(Bv ): (AB) v
(Notice here that the dimensions of A and B guaran tee that the product AB makes sense.) Thus, we sec that the effect of S 0 Tis to multiply vectors by AB, (rom which it follows immediately that S o T is a matrix (hence, linear) transformation wi th
[So 1' [ : [5 11 1'[. Isn't this a great result? Say it in words: " The matrix of the composite is the product of the matrices." What a lovely formula!
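Theorem 3.32 ("the matrix of the composite is the product of the matrices") is easy to confirm on the transformations of Example 3.60. The sketch below is an illustration only (not part of the text).

```python
# The standard matrix of S o T equals [S][T].
import numpy as np

T_mat = np.array([[1,  0],
                  [2, -1],
                  [3,  4]], dtype=float)      # [T], from Example 3.55

S_mat = np.array([[2,  0,  1],
                  [0,  3, -1],
                  [1, -1,  0],
                  [1,  1,  1]], dtype=float)  # [S], from Example 3.60

v = np.array([1.0, 2.0])
print(np.allclose(S_mat @ (T_mat @ v), (S_mat @ T_mat) @ v))   # True
print(S_mat @ T_mat)                                           # the matrix of S o T
```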
Ellmpl. 3.60
Consider Ihe linear tra nsform3 1ion T: jRl ~ R' from Example 3.55, defined by
and the li nea r tr3 nsformat ion 5: R J -? R4 defi ned by
y, 5 y, y,
SOIIIlIl
,
2Yl + YJ 3Yl - y,
:
Yl - Y2 YI+ Y2+ YJ
We see that the !>tandard matrices arc
[5] ~
, ,
- I
- I
0
0
[So 1'] : [5][ 1'] :
0 1 1
,nd [ 1'] :
,
1
,
so Theo rem 3 gives
,
1
0 3
0 3
, , - I
- I
0 1
1
,
- I
3
4
,
-,
3
4
0
5
4
3
-7
- I
1
6
3
0 :
It fo llows Ih31
(S o
T)[::] :
5
4
3
-7
-I 6
, [::] 3
:
5x1
+
4x2
3x1
-
7x!
-
Xl
6xl
+ X! + 3Xl
S«hon 3 6
Introduction to
Lin~ar Transformations
21.
(I n ExcrcJse 29, you will be asked 10 check this result by seu ing
y, y,
=
-1'[::] =
YJ
3x1
+ 4x2
and substilUling these val ues into the definition of 5, the reby calcul:lting (S d irectly.)
Example 3.61
0
T)
[x, ] XJ
..•-t
Fmd the standard matrix of the transforma tion that first rotates;! pom l 90° co un terclockwise abou t the origi n and then reflects the res ult in the x-axis.
Solallon
The ro tation R and the reAcetion F were discussed in Examples 3.57 and
356, respectively, where we found their standard matrices to be (R] =
[F] = [ ,
[~
-
~ ] and
0 ]. It follows that the composition F o R has for its matrix
o -,
[F' R]
~ [ e][ R] = ['o
0][0 - 1 ]
-']o =[ - 0-'] J
0
(Check that this result is correct by consldenng the effect of Fo R o n the st:mdard basis vectors c l and e J • NOie the importance of the order o f the transformat ions: R is performed before F, but we write F o R. In this case, R <> F also makes sense. Is R o F = F <> R?)
Inverses oIl/near TranslorllaliOls Consider the effect of a 90" counterclockwise ro mlion abo ullhe origin fo llowed by a 90 0 clockwise rolation abou t the origin. C learly Ihis leaves every point in HI unchanged. If we denote these tra nsformations by R90 and R_90 ( remember that a negati ve angle m easure corresponds to clockwise directio n ), then we may express this as (R9!) <> R_\IO)( v) == v fo r every v in R2. Note thai , in this case, if we perform Ihe tra nsformations in the other order, we gel the sa me end result: (R _'K! <> R90 ) (v ) = v for every v in A2. Thus, R90 0 R ~90 (and R~ <JO <> R.,o too) is a linear transformation that leaves every vector in A Z uncha nged. Such a tra nsformation is called an identity transformation. Generally, we have one such transformation for every RW- namely, I: R~ -l> R~ such Ihat l t v) "" v for every v in R". (If it is important to keep track of the dimension of the space, we might write In for clarity.) So, with this notation, we have R90 <> R ~90 = I "" R_90 0 ~ A pair of transform ations that are related to each other in this way are called inverw transformations.
Dennltloa
LeI S and Tbe linear transformations from R~ to A". Then 5 and T are inverse transformatioru if S o T = I" and T o 5 = I,..
ttl
Chapter J
Matrices
•••• r.
Since this definition is symmctric wit h respect to S and T. we will say that, when this situation occurs, S is the inverse of T and T is the inverse of S. Fur· thermore, we will say that 5 and T are invertible. In terms of matrices. we see imme(llatciy that if 5 and Ta re inverse transfonnatlOns, then [5 II TI :: 15 0 TJ "= II J = I. where the last I is the identity marrix. (Why is the standard matrix of the identity tr:lIlsformation the identit y matrix?) We must also have [TI [SI .,. I T o 51 = II] = I. This shows that IS] and I TI are inverse matrices. II shows something more: If a linear tr.ansformatlon T is invertible. then its standard matrix I TI must be invertible, and since matrix inverses are unique. this means that the inverse of Tis also unique. Therefore, we can unambiguously usc the notation r - I to refer to tile inverse of T. Th us, we can rewrite the above equations as ITIl T IJ = 1 = [ r- I JI '1'1, shOWing that the matrix of 'r - I IS the inverse matrix of I TI. We have just proved the follOWing theorem.
Theorem 3.33  Let T: Rⁿ → Rⁿ be an invertible linear transformation. Then its standard matrix [T] is an invertible matrix, and

\[
[T^{-1}] = [T]^{-1}
\]

Say this one in words too: "The matrix of the inverse is the inverse of the matrix." Fabulous!
Example 3.62

Find the standard matrix of a 60° clockwise rotation about the origin in R².

Solution  Earlier we computed the matrix of a 60° counterclockwise rotation about the origin to be

\[
[R_{60}] = \begin{bmatrix} 1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & 1/2 \end{bmatrix}
\]

Since a 60° clockwise rotation is the inverse of a 60° counterclockwise rotation, we can apply Theorem 3.33 to obtain

\[
[R_{-60}] = [R_{60}]^{-1} = \begin{bmatrix} 1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & 1/2 \end{bmatrix}^{-1} = \begin{bmatrix} 1/2 & \sqrt{3}/2 \\ -\sqrt{3}/2 & 1/2 \end{bmatrix}
\]

(Check the calculation of the matrix inverse. The fastest way is to use the 2×2 shortcut from Theorem 3.8. Also, check that the resulting matrix has the right effect on the standard basis in R² by drawing a diagram.)
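As a quick numerical check of Theorem 3.33 (a sketch in Python/NumPy, not part of the text), we can invert the matrix of the 60° counterclockwise rotation and compare the result with a clockwise rotation built directly from cos(−60°) and sin(−60°).

```python
import numpy as np

theta = np.radians(60)
R60 = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])   # 60 deg counterclockwise

# By Theorem 3.33, [R_{-60}] = [R_60]^(-1)
R_minus60 = np.linalg.inv(R60)

# Build the clockwise rotation directly for comparison
direct = np.array([[np.cos(-theta), -np.sin(-theta)],
                   [np.sin(-theta),  np.cos(-theta)]])

print(np.allclose(R_minus60, direct))   # True
print(R_minus60)                        # approx [[0.5, 0.866], [-0.866, 0.5]]
```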
Example 3.63

Determine whether projection onto the x-axis is an invertible transformation, and if it is, find its inverse.

Solution  The standard matrix of this projection P is \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, which is not invertible since its determinant is 0. Hence, P is not invertible either.
Figure 3.13 gives some idea why P in Example 3.63 is not invertible. The projection "collapses" R² onto the x-axis. For P to be invertible, we would have to have a way of "undoing" it, to recover the point (a, b) we started with. However, there are infinitely many candidates for the image of (a, 0) under such a hypothetical "inverse." Which one should we use? We cannot simply say that P⁻¹ must send (a, 0) to (a, b), since this cannot be a definition when we have no way of knowing what b should be. (See Exercise 42.)
Figure 3.13  Projections are not invertible

Associativity
Theorem 3.3(a) in Section 3.2 stated the associativity property for matrix multiplication: A(BC) = (AB)C. (If you didn't try to prove it then, do so now. Even with all matrices restricted to 2×2, you will get some feeling for the notational complexity involved in an "elementwise" proof, which should make you appreciate the proof we are about to give.) Our approach to the proof is via linear transformations. We have seen that every m×n matrix A gives rise to a linear transformation T_A: Rⁿ → Rᵐ; conversely, every linear transformation T: Rⁿ → Rᵐ has a corresponding m×n matrix [T]. The two correspondences are inversely related; that is, given A, [T_A] = A, and given T, T_{[T]} = T. Let R = T_A, S = T_B, and T = T_C. Then, by Theorem 3.32,

A(BC) = (AB)C  if and only if  R ∘ (S ∘ T) = (R ∘ S) ∘ T
We now prove the latter identity. Let x be in the domain of T (and hence in the domain of both R ∘ (S ∘ T) and (R ∘ S) ∘ T; why?). To prove that R ∘ (S ∘ T) = (R ∘ S) ∘ T, it is enough to prove that they have the same effect on x. By repeated application of the definition of composition, we have

(R ∘ (S ∘ T))(x) = R((S ∘ T)(x)) = R(S(T(x))) = (R ∘ S)(T(x)) = ((R ∘ S) ∘ T)(x)

as required. (Carefully check how the definition of composition has been used four times.)

This section has served as an introduction to linear transformations. In Chapter 6, we will take another, more detailed and more general look at these transformations. The exercises that follow also contain some additional explorations of this important concept.
Exercises 3.6

1. Let T_A: R² → R² be the matrix transformation corresponding to A = \begin{bmatrix} 2 & -1 \\ 3 & 4 \end{bmatrix}. Find T_A(u) and T_A(v).

2. Let T_A: R³ → R² be the matrix transformation corresponding to A = \begin{bmatrix} 4 & -2 & -1 \\ 0 & 1 & 3 \end{bmatrix}. Find T_A(u) and T_A(v).
In Exercises 3–6, prove that the given transformation is a linear transformation, using the definition (or the Remark following Example 3.55).

4. T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x - y \\ 2x + y - 3z \end{bmatrix}

5. T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + z \\ y + z \\ x + y \end{bmatrix}

6. T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -y \\ x + 2y \\ 3x - 4y \end{bmatrix}
In Exercises 7–10, give a counterexample to show that the given transformation is not a linear transformation.
T[;l [~l ""[;1 [xx:J 7.
r[;] ~ [:::1 10.T[;] ~ [;+ :] 8.
In Exercises 11–14, find the standard matrix of the linear transformation in the given exercise.

11. Exercise 3    12. Exercise 4    13. Exercise 5    14. Exercise 6
In Exercises 15–18, show that the given transformation from R² to R² is linear by showing that it is a matrix transformation.

15. F reflects a vector in the y-axis.
16. R rotates a vector 45° counterclockwise about the origin.
17. D stretches a vector by a factor of 2 in the x-component and a factor of 3 in the y-component.
18. P projects a vector onto the line y = x.

19. The three types of elementary matrices give rise to five types of 2×2 matrices with one of the following forms:

\[
\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}
\]

Describe geometrically the effect of the matrix transformations from R² to R² that correspond to these elementary matrices. Draw pictures to illustrate your answers.

In Exercises 20–25, find the standard matrix of the given linear transformation from R² to R².

20. Counterclockwise rotation through 120° about the origin
21. Clockwise rotation through 30° about the origin
22. Projection onto the line y = 2x
23. Projection onto the line y = −x
24. Reflection in the line y = x
25. Reflection in the line y = −x

26. Let ℓ be a line through the origin in R², P_ℓ the linear transformation that projects a vector onto ℓ, and F_ℓ the transformation that reflects a vector in ℓ.
(a) Draw diagrams to show that F_ℓ is linear.
(b) Figure 3.14 suggests a way to find the matrix of F_ℓ, using the fact that the diagonals of a parallelogram bisect each other. Prove that F_ℓ(x) = 2P_ℓ(x) − x, and use this result to show that the standard matrix of F_ℓ is

\[
\frac{1}{d_1^2 + d_2^2}\begin{bmatrix} d_1^2 - d_2^2 & 2d_1 d_2 \\ 2d_1 d_2 & d_2^2 - d_1^2 \end{bmatrix}
\]

(where the direction vector of ℓ is d = [d_1, d_2]^T).
(c) If the angle between ℓ and the positive x-axis is θ, show that the matrix of F_ℓ is

\[
\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}
\]
Figure 3.14

In Exercises 27 and 28, apply part (b) or (c) of Exercise 26 to find the standard matrix of the transformation.

27. Reflection in the line y = 2x
28. Reflection in the line y = √3 x
29. Check the formula for S ∘ T in Example 3.60, by performing the suggested direct substitution.
In Exercises 30–35, verify Theorem 3.32 by finding the matrix of S ∘ T (a) by direct substitution and (b) by matrix multiplication of [S][T].
10. T[x,] _ [x' x 'j.s[Y'] ~ [:r' ] x, x,+x, Y2 Y2
31. T[~] = [:~x~ :X:J s[;~] _[Y;, ~ 3~2] Y, + 3Y2 2Yl + Y2 YI - Y2
41. If θ is the angle between lines ℓ and m (through the origin), then F_m ∘ F_ℓ = R_{2θ}. (See Exercise 26.)
42. (a) If P is a projection, then P ∘ P = P.
    (b) The matrix of a projection can never be invertible.
43. If ℓ, m, and n are three lines through the origin, then F_n ∘ F_m ∘ F_ℓ is also a reflection in a line through the origin.
44. Let T be a linear transformation from R² to R² (or from R³ to R³). Prove that T maps a straight line to a straight line or a point. (Hint: Use the vector form of the equation of a line.)
45. Let T be a linear transformation from R² to R² (or from R³ to R³). Prove that T maps parallel lines to parallel lines, a single line, a pair of points, or a single point.
x, 33. T X2
x, x, 34, T Xz
x,
~ [x' +
2Xz -
2,,]. sly'] ~ Xl
Y1
y,
Yl - Y: Y1 + Y2 - YL + Y2
YI - Y2
y,
In Exercises 46–51, let ABCD be the square with vertices (−1, 1), (1, 1), (1, −1), and (−1, −1). Use the results in Exercises 44 and 45 to find and draw the image of ABCD under the given transformation.

46. T in Exercise 3    47. D in Exercise 17    48. P in Exercise 18    49. The projection in Exercise 22
In Exercises 36–39, find the standard matrix of the composite transformation from R² to R².

36. Counterclockwise rotation through 60°, followed by reflection in the line y = x
37. Reflection in the y-axis, followed by clockwise rotation through 30°
38. Clockwise rotation through 45°, followed by projection onto the y-axis, followed by clockwise rotation through 45°
39. Reflection in the line y = x, followed by counterclockwise rotation through 30°, followed by reflection in the line y = −x

In Exercises 40–43, use matrices to prove the given statements about transformations from R² to R².

40. If R_θ denotes a rotation (about the origin) through the angle θ, then R_α ∘ R_β = R_{α+β}.
50. T in Exercise 31    51. The transformation in Exercise 37
52. Prove that P_ℓ(cv) = cP_ℓ(v) for any scalar c [Example 3.59(b)].
53. Prove that T: Rⁿ → Rᵐ is a linear transformation if and only if

T(c_1 v_1 + c_2 v_2) = c_1 T(v_1) + c_2 T(v_2)

for all v_1, v_2 in Rⁿ and scalars c_1, c_2.
54. Prove that (as noted at the beginning of this section) the range of a linear transformation T: Rⁿ → Rᵐ is the column space of its matrix [T].
55. If A is an invertible 2×2 matrix, what does the Fundamental Theorem of Invertible Matrices assert about the corresponding linear transformation T_A in light of Exercise 19?
Robotics

In 1981, the U.S. Space Shuttle Columbia blasted off equipped with a device called the Shuttle Remote Manipulator System (SRMS). This robotic arm, known as Canadarm, has proved to be a vital tool in all subsequent space shuttle missions, providing strong, yet precise and delicate handling of its payloads (see Figure 3.15).
Figure 3.15  Canadarm

Canadarm has been used to place satellites into their proper orbit and to retrieve malfunctioning ones for repair, and it has also performed critical repairs to the shuttle itself. Notably, the robotic arm was instrumental in the successful repair of the Hubble Space Telescope. Since 1998, Canadarm has played an important role in the assembly of the International Space Station.

A robotic arm consists of a series of links of fixed length connected at joints where they can rotate. Each link can therefore rotate in space, or (through the effect of the other links) be translated parallel to itself, or move by a combination (composition) of rotations and translations. Before we can design a mathematical model for a robotic arm, we need to understand how rotations and translations work in composition. To simplify matters, we will assume that our arm is in R². In Section 3.6, we saw that the matrix of a rotation R about the origin through an angle θ is

\[
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\]

(Figure 3.16(a)). If v = [a, b]^T, then a translation along v is the transformation T(x) = x + v or, equivalently,

\[
T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + a \\ y + b \end{bmatrix}
\]

(Figure 3.16(b)).
Figure 3.16  (a) Rotation  (b) Translation
Unfortunately, translation is not a linear transformation, because T(0) ≠ 0. However, there is a trick that will get us around this problem. We can represent the vector x = [x, y]^T as the vector [x, y, 1]^T in R³. This is called representing x in homogeneous coordinates. Then the matrix multiplication

\[
\begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x + a \\ y + b \\ 1 \end{bmatrix}
\]
represents the translated vector T(x) in homogeneous coordinates. We can treat rotations in homogeneous coordinates too. The matrix multiplication
\[
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \\ 1 \end{bmatrix}
\]
represents the rotated vector R(x) in homogeneous coordinates. The composition T ∘ R that gives the rotation R followed by the translation T is now represented by the product
\[
\begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta & a \\ \sin\theta & \cos\theta & b \\ 0 & 0 & 1 \end{bmatrix}
\]
(Note that R ∘ T ≠ T ∘ R.)

To model a robotic arm, we give each link its own coordinate system (called a frame) and examine how one link moves in relation to those to which it is directly connected. To be specific, we let the coordinate axes for the link A_i be x_i and y_i, with the x_i-axis aligned with the link. The length of A_i is denoted by a_i, and the angle between x_i and x_{i-1} is denoted by θ_i. The joint between A_i and A_{i-1} is at the point (0, 0) relative to A_i and (a_{i-1}, 0) relative to A_{i-1}. Hence, relative to A_{i-1}, the coordinate system for A_i has been rotated through θ_i and then translated along [a_{i-1}, 0]^T (Figure 3.17). This transformation is represented in homogeneous coordinates by the matrix
\[
T_i = \begin{bmatrix} \cos\theta_i & -\sin\theta_i & a_{i-1} \\ \sin\theta_i & \cos\theta_i & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

Figure 3.17
To give a specific example, consider Figure 3.18(a). It shows an arm with three links in which A_1 is in its initial position and each of the other two links has been rotated 45° from the previous link. We will take the length of each link to be 2 units. Figure 3.18(b) shows A_3 in its initial frame. The transformation

\[
T_3 = \begin{bmatrix} \cos 45° & -\sin 45° & 2 \\ \sin 45° & \cos 45° & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
causes a rotation of 45° and then a translation by 2 units. As shown in Figure 3.18(c), this places A_3 in its appropriate position relative to A_2's frame. Next, the transformation

\[
T_2 = \begin{bmatrix} \cos 45° & -\sin 45° & 2 \\ \sin 45° & \cos 45° & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
is applied to the previous result. This places both A_3 and A_2 in their correct position relative to A_1, as shown in Figure 3.18(d). Normally, a third transformation T_1 (a rotation) would be applied to the previous result, but in our case, T_1 is the identity transformation because A_1 stays in its initial position. Typically, we want to know the coordinates of the end (the "hand") of the robotic arm, given the length and angle parameters; this is known as forward kinematics. Following the above sequence of calculations and referring to Figure 3.18, we see that we need to determine where the point (2, 0) ends up after T_3 and T_2 are applied. Thus,
Figure 3.18  (a) A three-link chain  (b) A_3 in its initial frame  (c) T_3 puts A_3 in A_2's frame
the arm's hand is at

\[
T_2\!\left(T_3\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} & 2 \\ 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} & 2 \\ 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 & -1 & 2+\sqrt{2} \\ 1 & 0 & \sqrt{2} \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 2+\sqrt{2} \\ 2+\sqrt{2} \\ 1 \end{bmatrix}
\]

which represents the point (2 + √2, 2 + √2) in homogeneous coordinates. It is easily checked from Figure 3.18(a) that this is correct. The methods used in this example generalize to robotic arms in three dimensions, although in R³ there are more degrees of freedom and hence more variables.
The method of homogeneous coordinates is also useful in other applications, notably computer graphics.
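The forward kinematics calculation above is easy to reproduce numerically. The sketch below (Python/NumPy; not part of the text, and the helper name is mine) builds the homogeneous-coordinate matrices T_2 and T_3 for 45° rotations followed by translations of length 2 and applies them to the hand point (2, 0).

```python
import numpy as np

def link_transform(theta_deg, a_prev):
    """Homogeneous matrix: rotation through theta, then translation by a_prev
    along the previous link's x-axis."""
    t = np.radians(theta_deg)
    return np.array([[np.cos(t), -np.sin(t), a_prev],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0,        0.0,       1.0]])

T3 = link_transform(45, 2)   # places A3 relative to A2's frame
T2 = link_transform(45, 2)   # places A2 (and A3) relative to A1's frame

hand = np.array([2.0, 0.0, 1.0])   # the hand of A3 in homogeneous coordinates
print(T2 @ (T3 @ hand))            # approx [3.414, 3.414, 1.0] = (2 + sqrt(2), 2 + sqrt(2))
```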
'"
n.
Chapter 3
Ma trices
Applications
--'-
Markov Chains

A market research team is conducting a controlled survey to determine people's preferences in toothpaste. The sample consists of 200 people, each of whom is asked to try two brands of toothpaste over a period of several months. Based on the responses to the survey, the research team compiles the following statistics about toothpaste preferences. Of those using Brand A in any month, 70% continue to use it the following month, while 30% switch to Brand B; of those using Brand B in any month, 80% continue to use it the following month, while 20% switch to Brand A. These findings are summarized in Figure 3.19, in which the percentages have been converted into decimals; we will think of them as probabilities.
Figure 3.19

Andrei A. Markov (1856–1922) was a Russian mathematician who studied and later taught at the University of St. Petersburg. He was interested in number theory, analysis, and the theory of continued fractions, a recently developed field which Markov applied to probability theory. Markov was also interested in poetry, and one of the uses to which he put Markov chains was the analysis of patterns in poems and other literary texts.

Figure 3.19 is a simple example of a (finite) Markov chain. It represents an evolving process consisting of a finite number of states. At each step or point in time, the process may be in any one of the states; at the next step, the process can remain in its present state or switch to one of the other states. The state to which the process moves at the next step and the probability of its doing so depend only on the present state and not on the past history of the process. These probabilities are called transition probabilities and are assumed to be constants (that is, the probability of moving from state i to state j is always the same).

Example 3.64
In the toothpaste survey described above, there are just two states (using Brand A and using Brand B), and the transition probabilities are those indicated in Figure 3.19. Suppose that, when the survey begins, 120 people are using Brand A and 80 people are using Brand B. How many people will be using each brand 1 month later? 2 months later?

Solution  The number of Brand A users after 1 month will be 70% of those initially using Brand A (those who remain loyal to Brand A) plus 20% of the Brand B users (those who switch from B to A):

0.70(120) + 0.20(80) = 100

Similarly, the number of Brand B users after 1 month will be a combination of those who switch to Brand B and those who continue to use it:

0.30(120) + 0.80(80) = 100
We can summarize these two equations in a single matrix equation:

\[
\begin{bmatrix} 0.70 & 0.20 \\ 0.30 & 0.80 \end{bmatrix}\begin{bmatrix} 120 \\ 80 \end{bmatrix} = \begin{bmatrix} 100 \\ 100 \end{bmatrix}
\]

Let's call the matrix P and label the vectors x_0 = [120, 80]^T and x_1 = [100, 100]^T. (Note that the components of each vector are the numbers of Brand A and Brand B users, in that order, after the number of months indicated by the subscript.) Thus, we have x_1 = Px_0. Extending the notation, let x_k be the vector whose components record the distribution of toothpaste users after k months. To determine the number of users of each brand after 2 months have elapsed, we simply apply the same reasoning, starting with x_1 instead of x_0. We obtain

\[
x_2 = Px_1 = \begin{bmatrix} 0.70 & 0.20 \\ 0.30 & 0.80 \end{bmatrix}\begin{bmatrix} 100 \\ 100 \end{bmatrix} = \begin{bmatrix} 90 \\ 110 \end{bmatrix}
\]

from which we see that there are now 90 Brand A users and 110 Brand B users.
The vectors x_k in Example 3.64 are called the state vectors of the Markov chain, and the matrix P is called its transition matrix. We have just seen that a Markov chain satisfies the relation

x_{k+1} = Px_k    for k = 0, 1, 2, ...

From this result it follows that we can compute an arbitrary state vector iteratively once we know x_0 and P. In other words, a Markov chain is completely determined by its transition probabilities and its initial state.
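As a computational aside (not part of the text), iterating x_{k+1} = Px_k takes only a few lines in Python/NumPy; here is a minimal sketch using the data of Example 3.64.

```python
import numpy as np

P = np.array([[0.70, 0.20],
              [0.30, 0.80]])    # transition matrix from Example 3.64
x = np.array([120.0, 80.0])     # x0: initial numbers of Brand A and Brand B users

# Iterate x_{k+1} = P x_k for a few months
for k in range(1, 4):
    x = P @ x
    print(k, x)
# 1 [100. 100.]
# 2 [ 90. 110.]
# 3 [ 85. 115.]
```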
Remarks

• Suppose, in Example 3.64, we wanted to keep track of not the actual numbers of toothpaste users but, rather, the relative numbers using each brand. We could convert the data into percentages or fractions by dividing by 200, the total number of users. Thus, we would start with

x_0 = [120/200, 80/200]^T = [0.60, 0.40]^T

to reflect the fact that, initially, the Brand A / Brand B split is 60%-40%. Check by direct calculation that Px_0 = [0.50, 0.50]^T, which can then be taken as x_1 (in agreement with the 50-50 split we computed above). Vectors such as these, with nonnegative components that add up to 1, are called probability vectors.

• Observe how the transition probabilities are arranged within the transition matrix P. We can think of the columns as being labeled with the present states and the rows as being labeled with the next states:

                 Present
                  A      B
    Next   A  [ 0.70   0.20 ]
           B  [ 0.30   0.80 ]
The word stochastic is derived from the Greek adjective stokhastikos, meaning "capable of aiming" (or guessing). It has come to be applied to anything that is governed by the laws of probability.
Note also that the columns of P are probability vectors; any square matrix with this property is called a stochastic matrix.

We can realize the deterministic nature of Markov chains in another way. Note that we can write

x_2 = Px_1 = P(Px_0) = P²x_0

and, in general,

x_k = P^k x_0    for k = 0, 1, 2, ...
This leads us to examine the powers of a transition matrix. In Example 3.64, we have

\[
P^2 = \begin{bmatrix} 0.70 & 0.20 \\ 0.30 & 0.80 \end{bmatrix}\begin{bmatrix} 0.70 & 0.20 \\ 0.30 & 0.80 \end{bmatrix} = \begin{bmatrix} 0.55 & 0.30 \\ 0.45 & 0.70 \end{bmatrix}
\]

Figure 3.20
What are we to make of the entries of this matrix? The first thing to observe is that P² is another stochastic matrix, since its columns sum to 1. (You are asked to prove this in Exercise 14.) Could it be that P² is also a transition matrix of some kind? Consider one of its entries, say, (P²)₂₁ = 0.45. The tree diagram in Figure 3.20 clarifies where this entry came from. There are four possible state changes that can occur over 2 months, and these correspond to the four branches (or paths) of length 2 in the tree. Someone who initially is using Brand A can end up using Brand B 2 months later in two different ways (marked * in the figure): The person can continue to use A after 1 month and then switch to B (with probability 0.7(0.3) = 0.21), or the person can switch to B after 1 month and then stay with B (with probability 0.3(0.8) = 0.24). The sum of these probabilities gives an overall probability of 0.45. Observe that these calculations are exactly what we do when we compute (P²)₂₁. It follows that (P²)₂₁ = 0.45 represents the probability of moving from state 1 (Brand A) to state 2 (Brand B) in two transitions. (Note that the order of the subscripts is the reverse of what you might have guessed.) The argument can be generalized to show that

(P^k)_{ij} is the probability of moving from state j to state i in k transitions.
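This multi-step interpretation is easy to check numerically; a small sketch (Python/NumPy, not from the text):

```python
import numpy as np

P = np.array([[0.70, 0.20],
              [0.30, 0.80]])

P2 = np.linalg.matrix_power(P, 2)
print(P2)          # [[0.55 0.30]
                   #  [0.45 0.70]]

# (P^2)_{21}, i.e. row 2, column 1 (zero-based index [1, 0]), is the probability
# of moving from state 1 (Brand A) to state 2 (Brand B) in two transitions:
print(P2[1, 0])    # 0.45 = 0.7*0.3 + 0.3*0.8
```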
In Example 3.64, what will happen to the distribution of toothpaste users in the long run? Let's work with probability vectors as state vectors. Continuing our calculations (rounding to three decimal places), we find

x_0 = [0.600, 0.400]^T, x_1 = Px_0 = [0.500, 0.500]^T, x_2 = Px_1 = [0.450, 0.550]^T, x_3 = [0.425, 0.575]^T, x_4 = [0.412, 0.588]^T, x_5 = [0.406, 0.594]^T, x_6 = [0.403, 0.597]^T, x_7 = [0.402, 0.598]^T, x_8 = [0.401, 0.599]^T, x_9 = [0.400, 0.600]^T, x_10 = [0.400, 0.600]^T
and so on. It appears that the state vectors approach (or converge to) the vector [0.4, 0.6]^T, implying that eventually 40% of the toothpaste users in the survey will be using Brand A and 60% will be using Brand B. Indeed, it is easy to check that, once this distribution is reached, it will never change. We simply compute

\[
\begin{bmatrix} 0.70 & 0.20 \\ 0.30 & 0.80 \end{bmatrix}\begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix} = \begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix}
\]

A state vector x with the property that Px = x is called a steady state vector. In Chapter 4, we will prove that every Markov chain has a unique steady state vector. For now, let's accept this as a fact and see how we can find such a vector without doing any iterations at all. We begin by rewriting the matrix equation Px = x as Px = Ix, which can in turn be rewritten as (I − P)x = 0. Now this is just a homogeneous system of linear equations with coefficient matrix I − P, so the augmented matrix is [I − P | 0]. In Example 3.64, we have

\[
[\,I - P \mid 0\,] = \begin{bmatrix} 1 - 0.70 & -0.20 & 0 \\ -0.30 & 1 - 0.80 & 0 \end{bmatrix} = \begin{bmatrix} 0.30 & -0.20 & 0 \\ -0.30 & 0.20 & 0 \end{bmatrix}
\]

which reduces to

\[
\begin{bmatrix} 1 & -\tfrac{2}{3} & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]
So, if our steady state vector is x = [x_1, x_2]^T, then x_2 is a free variable and the parametric solution is

x_1 = (2/3)t,    x_2 = t

If we require x to be a probability vector, then we must have

1 = x_1 + x_2 = (2/3)t + t = (5/3)t

Therefore, x_2 = t = 3/5 = 0.6 and x_1 = 2/5 = 0.4, so x = [0.4, 0.6]^T, in agreement with our iterative calculations above. (If we require x to contain the actual distribution, then in this example we must have x_1 + x_2 = 200, from which it follows that x = [80, 120]^T.)
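The same steady state vector can be found numerically by solving (I − P)x = 0 together with the requirement that the components sum to 1; a minimal sketch (Python/NumPy, not from the text):

```python
import numpy as np

P = np.array([[0.70, 0.20],
              [0.30, 0.80]])
n = P.shape[0]

# Stack (I - P)x = 0 with the normalization condition sum(x) = 1
A = np.vstack([np.eye(n) - P, np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0

x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)                      # [0.4 0.6]
print(np.allclose(P @ x, x))  # True: x is a steady state vector
```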
Example 3.65
A psychologist places a rat in a cage with three compartments, as shown in Figure 3.21. The rat has been trained to select a door at random whenever a bell is rung and to move through it into the next compartment.

(a) If the rat is initially in compartment 1, what is the probability that it will be in compartment 2 after the bell has rung twice? three times?
(b) In the long run, what proportion of its time will the rat spend in each compartment?

Solution
Let P = [p_ij] be the transition matrix for this Markov chain. Then

p_21 = p_31 = 1/2,   p_12 = p_13 = 1/3,   p_23 = p_32 = 2/3,   and   p_11 = p_22 = p_33 = 0
Figure 3.21

(Why? Remember that p_ij is the probability of moving from j to i.) Therefore,
\[
P = \begin{bmatrix} 0 & \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{2} & 0 & \tfrac{2}{3} \\ \tfrac{1}{2} & \tfrac{2}{3} & 0 \end{bmatrix}
\]

and the initial state vector is x_0 = [1, 0, 0]^T.

(a) After one ring of the bell, we have

\[
x_1 = Px_0 = \begin{bmatrix} 0 & \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{2} & 0 & \tfrac{2}{3} \\ \tfrac{1}{2} & \tfrac{2}{3} & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ \tfrac{1}{2} \\ \tfrac{1}{2} \end{bmatrix} = \begin{bmatrix} 0 \\ 0.5 \\ 0.5 \end{bmatrix}
\]
Continuing (rounding to three decimal places), we find

\[
x_2 = Px_1 = \begin{bmatrix} 0 & \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{2} & 0 & \tfrac{2}{3} \\ \tfrac{1}{2} & \tfrac{2}{3} & 0 \end{bmatrix}\begin{bmatrix} 0 \\ \tfrac{1}{2} \\ \tfrac{1}{2} \end{bmatrix} = \begin{bmatrix} \tfrac{1}{3} \\ \tfrac{1}{3} \\ \tfrac{1}{3} \end{bmatrix} \approx \begin{bmatrix} 0.333 \\ 0.333 \\ 0.333 \end{bmatrix}
\]

and

\[
x_3 = Px_2 = \begin{bmatrix} 0 & \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{2} & 0 & \tfrac{2}{3} \\ \tfrac{1}{2} & \tfrac{2}{3} & 0 \end{bmatrix}\begin{bmatrix} \tfrac{1}{3} \\ \tfrac{1}{3} \\ \tfrac{1}{3} \end{bmatrix} = \begin{bmatrix} \tfrac{2}{9} \\ \tfrac{7}{18} \\ \tfrac{7}{18} \end{bmatrix} \approx \begin{bmatrix} 0.222 \\ 0.389 \\ 0.389 \end{bmatrix}
\]

Therefore, after two rings, the probability that the rat is in compartment 2 is 1/3 ≈ 0.333, and after three rings, the probability that the rat is in compartment 2 is 7/18 ≈ 0.389. [Note that these questions could also be answered by computing (P²)₂₁ and (P³)₂₁.]
(b) This question is asking for the steady state vector x as a probability vector. As we saw above, x must be in the null space of I − P, so we proceed to solve the system

\[
[\,I - P \mid 0\,] = \begin{bmatrix} 1 & -\tfrac{1}{3} & -\tfrac{1}{3} & 0 \\ -\tfrac{1}{2} & 1 & -\tfrac{2}{3} & 0 \\ -\tfrac{1}{2} & -\tfrac{2}{3} & 1 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & -\tfrac{2}{3} & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]
Hence, if x = [x_1, x_2, x_3]^T, then x_3 = t is free, x_1 = (2/3)t, and x_2 = t. Since x must be a probability vector, we need

1 = x_1 + x_2 + x_3 = (2/3)t + t + t = (8/3)t

Thus, t = 3/8 and

x = [1/4, 3/8, 3/8]^T

which tells us that, in the long run, the rat spends 1/4 of its time in compartment 1 and 3/8 of its time in each of the other two compartments.
Population Growth

P. H. Leslie, "On the Use of Matrices in Certain Population Mathematics," Biometrika 33 (1945), pp. 183–212.

One of the most popular models of population growth is a matrix-based model, first introduced by P. H. Leslie in 1945. The Leslie model describes the growth of the female portion of a population, which is assumed to have a maximum lifespan. The females are divided into age classes, all of which span an equal number of years. Using data about the average birthrates and survival probabilities of each class, the model is then able to determine the growth of the population over time.

Example 3.66

A certain species of German beetle, the Vollmar-Wasserman beetle (or VW beetle, for short), lives for at most 3 years. We divide the female VW beetles into three age classes of 1 year each: youths (0–1 year), juveniles (1–2 years), and adults (2–3 years). The youths do not lay eggs; each juvenile produces an average of four female beetles; and each adult produces an average of three females. The survival rate for youths is 50% (that is, the probability of a youth's surviving to become a juvenile is 0.5), and the survival rate for juveniles is 25%. Suppose we begin with a population of 100 female VW beetles: 40 youths, 40 juveniles, and 20 adults. Predict the beetle population for each of the next 5 years.
Solution  After 1 year, the number of youths will be the number produced during that year:

40 × 4 + 20 × 3 = 220

The number of juveniles will simply be the number of youths that have survived:

40 × 0.5 = 20

Likewise, the number of adults will be the number of juveniles that have survived:

40 × 0.25 = 10

We can combine these into a single matrix equation
\[
\begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix}\begin{bmatrix} 40 \\ 40 \\ 20 \end{bmatrix} = \begin{bmatrix} 220 \\ 20 \\ 10 \end{bmatrix}
\]

or Lx_0 = x_1, where x_0 = [40, 40, 20]^T is the initial population distribution vector and x_1 = [220, 20, 10]^T is the distribution after 1 year. We see that the structure of the equation is exactly the same as for Markov chains: x_{k+1} = Lx_k for k = 0, 1, 2, ... (although the interpretation is quite different). It follows that we can iteratively compute successive population distribution vectors. (It also follows that x_k = L^k x_0 for k = 0, 1, 2, ..., as for Markov chains, but we will not use this fact here.) We compute

\[
x_2 = Lx_1 = \begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix}\begin{bmatrix} 220 \\ 20 \\ 10 \end{bmatrix} = \begin{bmatrix} 110 \\ 110 \\ 5 \end{bmatrix}
\]
\[
x_3 = Lx_2 = \begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix}\begin{bmatrix} 110 \\ 110 \\ 5 \end{bmatrix} = \begin{bmatrix} 455 \\ 55 \\ 27.5 \end{bmatrix}
\]

\[
x_4 = Lx_3 = \begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix}\begin{bmatrix} 455 \\ 55 \\ 27.5 \end{bmatrix} = \begin{bmatrix} 302.5 \\ 227.5 \\ 13.75 \end{bmatrix}
\]

\[
x_5 = Lx_4 = \begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix}\begin{bmatrix} 302.5 \\ 227.5 \\ 13.75 \end{bmatrix} \approx \begin{bmatrix} 951.2 \\ 151.2 \\ 56.88 \end{bmatrix}
\]
Therefore, the model predicts that after 5 years there will be approximately 951 young female VW beetles, 151 juveniles, and 57 adults. (Note: You could argue that we should have rounded to the nearest integer at each step, for example, 28 adults after step 3, which would have affected the subsequent iterations. We elected not to do this, since the calculations are only approximations anyway and it is much easier to use a calculator or CAS if you do not round as you go.)
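The iteration x_{k+1} = Lx_k is the same kind of computation as for Markov chains, so the same few lines of code apply; a sketch (Python/NumPy, not from the text) using the VW beetle data:

```python
import numpy as np

L = np.array([[0.0, 4.0,  3.0],
              [0.5, 0.0,  0.0],
              [0.0, 0.25, 0.0]])    # Leslie matrix for the VW beetle
x = np.array([40.0, 40.0, 20.0])    # x0: youths, juveniles, adults

for year in range(1, 6):
    x = L @ x
    print(year, x)
# 1 [220.  20.  10.]
# 2 [110. 110.   5.]
# 3 [455.   55.   27.5]
# 4 [302.5 227.5  13.75]
# 5 [951.25 151.25 56.875]
```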
The matrix L in Example 3.66 is called a Leslie matrix. In general, if we have a population with n age classes of equal duration, L will be an n × n matrix with the following structure:

\[
L = \begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_{n-1} & b_n \\ s_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & s_2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & s_3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & s_{n-1} & 0 \end{bmatrix}
\]

Here, b_1, b_2, ... are the birth parameters (b_i = the average number of females produced by each female in class i) and s_1, s_2, ... are the survival probabilities (s_i = the probability that a female in class i survives into class i + 1).
Figure 3.22
What are we to make of our calculations? Overall, the beetle population appears to be increasing, although there are some fluctuations, such as a decrease from 250 to 225 from year 1 to year 2. Figure 3.22 shows the change in the population in each of the three age classes and clearly shows the growth, with fluctuations.

If, instead of plotting the actual population, we plot the relative population in each class, a different pattern emerges. To do this, we need to compute the fraction of the population in each age class in each year; that is, we need to divide each distribution vector by the sum of its components. For example, after 1 year, we have

\[
\frac{1}{250}\,x_1 = \frac{1}{250}\begin{bmatrix} 220 \\ 20 \\ 10 \end{bmatrix} = \begin{bmatrix} 0.88 \\ 0.08 \\ 0.04 \end{bmatrix}
\]

which tells us that 88% of the population consists of youths, 8% is juveniles, and 4% is adults. If we plot this type of data over time, we get a graph like the one in Figure 3.23, which shows clearly that the proportion of the population in each class is approaching a steady state. It turns out that the steady state vector in this example is [0.72, 0.24, 0.04]^T. That is, in the long run, 72% of the population will be youths, 24% juveniles, and 4% adults. (In other words, the population is distributed among the three age classes in the ratio 18 : 6 : 1.) We will see how to determine this ratio exactly in Chapter 4.
Graphs and Digraphs

There are many situations in which it is important to be able to model the interrelationships among a finite set of objects. For example, we might wish to describe various types of networks (roads connecting towns, airline routes connecting cities, communication links connecting satellites, etc.) or relationships among groups or individuals (friendship relationships in a society, predator-prey relationships in an ecosystem, dominance relationships in a sport, etc.). Graphs are ideally suited to modeling such networks and relationships, and it turns out that matrices are a useful tool in their study.
23&
Cha pler 3
MalTices
Figure 3.23

Figure 3.24  Two representations of the same graph

The term vertex (vertices is the plural) comes from the Latin verb vertere, which means "to turn." In the context of graphs (and geometry), a vertex is a corner: a point where an edge "turns" into a different edge.
A graph consists of a finite set of points (called vertices) and a finite set of edges, each of which connects two (not necessarily distinct) vertices. We say that two vertices are adjacent if they are the endpoints of an edge. Figure 3.24 shows an example of the same graph drawn in two different ways. The graphs are the "same" in the sense that all we care about are the adjacency relationships that identify the edges. We can record the essential information about a graph in a matrix and use matrix algebra to help us answer certain questions about the graph. This is particularly useful if the graphs are large, since computers can handle the calculations very quickly.
Definition  If G is a graph with n vertices, then its adjacency matrix is the n×n matrix A [or A(G)] defined by

a_ij = 1 if there is an edge between vertices i and j, and a_ij = 0 otherwise.
Figure 3.25 shows a graph and its associated adjacency matrix.
",
,',• A-
1
1
1
1
1 0
I
1 0
1 0
'.flaur. 3.25
'J
A gmph with adjacency matrix A
1
0
0
0
0
Remarks

• Observe that the adjacency matrix of a graph is necessarily a symmetric matrix. (Why?) Notice also that a diagonal entry a_ii of A is zero unless there is a loop at vertex i. In some situations, a graph may have more than one edge between a pair of vertices. In such cases, it may make sense to modify the definition of the adjacency matrix so that a_ij equals the number of edges between vertices i and j.

We define a path in a graph to be a sequence of edges that allows us to travel from one vertex to another continuously. The length of a path is the number of edges it contains, and we will refer to a path with k edges as a k-path. For example, in the graph of Figure 3.25, v_1v_2v_3v_1 is a 3-path, and v_4v_1v_2v_1v_3v_2 is a 5-path. Notice that the first of these is closed (it begins and ends at the same vertex); such a path is called a circuit. The second uses the edge between v_1 and v_2 twice. For the graph of Figure 3.25, we compute
\[
A^2 = \begin{bmatrix} 3 & 2 & 1 & 0 \\ 2 & 3 & 2 & 1 \\ 1 & 2 & 2 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix}
\]
What do the entries of A² represent? Look at the (2, 3) entry. From the definition of matrix multiplication, we know that

(A²)₂₃ = a_21 a_13 + a_22 a_23 + a_23 a_33 + a_24 a_43

The only way this expression can result in a nonzero number is if at least one of the products a_2k a_k3 that make up the sum is nonzero. But a_2k a_k3 is nonzero if and only if both a_2k and a_k3 are nonzero, which means that there is an edge between v_2 and v_k as well as an edge between v_k and v_3. Thus, there will be a 2-path between vertices 2 and 3 (via vertex k). In our example, this happens for k = 1 and for k = 2, so

(A²)₂₃ = a_21 a_13 + a_22 a_23 + a_23 a_33 + a_24 a_43 = 1·1 + 1·1 + 1·0 + 0·0 = 2

which tells us that there are two 2-paths between vertices 2 and 3. (Check to see that the remaining entries of A² correctly give 2-paths in the graph.) The argument we have just given can be generalized to yield the following result, whose proof we leave as Exercise 54.

If A is the adjacency matrix of a graph G, then the (i, j) entry of A^k is equal to the number of k-paths between vertices i and j.
Example 3.67

How many 3-paths are there between v_1 and v_2 in Figure 3.25?

Solution  We need the (1, 2) entry of A³, which is the dot product of row 1 of A² and column 2 of A. The calculation gives

(A³)₁₂ = 3·1 + 2·1 + 1·1 + 0·0 = 6

so there are six 3-paths between vertices 1 and 2, which can be easily checked.
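Counting paths with matrix powers is well suited to a computer; a small sketch (Python/NumPy, not from the text) using the adjacency matrix reconstructed above:

```python
import numpy as np

# Adjacency matrix of the graph in Figure 3.25
A = np.array([[0, 1, 1, 1],
              [1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])

A2 = A @ A
A3 = A2 @ A

print(A2[1, 2])   # 2: two 2-paths between vertices 2 and 3
print(A3[0, 1])   # 6: six 3-paths between vertices 1 and 2
```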
In many applications that can be modeled by a graph, the vertices are ordered by some type of relation that imposes a direction on the edges. For example, directed edges might be used to represent one-way routes in a graph that models a transportation network or predator-prey relationships in a graph modeling an ecosystem. A graph with directed edges is called a digraph. Figure 3.26 shows an example. An easy modification to the definition of adjacency matrices allows us to use them with digraphs.
Figure 3.26  A digraph

Definition  If G is a digraph with n vertices, then its adjacency matrix is the n×n matrix A [or A(G)] defined by

a_ij = 1 if there is an edge from vertex i to vertex j, and a_ij = 0 otherwise.
Thus, the adjacency matrix for the digraph in Figure 3.26 is

\[
A = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\]
Not surprisingly, the adjacency matrix of a digraph is not symmetric in general. (When would it be?) You should have no difficulty seeing that A^k now contains the numbers of directed k-paths between vertices, where we insist that all edges along a path flow in the same direction. (See Exercise 54.) The next example gives an application of this idea.
Example 3.68

Five tennis players (Davenport, Graf, Hingis, Seles, and Williams) compete in a round-robin tournament in which each player plays every other player once. The digraph in Figure 3.27 summarizes the results. A directed edge from vertex i to vertex j means that player i defeated player j. (A digraph in which there is exactly one directed edge between every pair of vertices is called a tournament.)

Figure 3.27  A tournament

The adjacency matrix for the digraph in Figure 3.27 is

\[
A = \begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}
\]
where the order of the vertices (and hence the rows and columns of A) is determined alphabetically. Thus, Graf corresponds to row 2 and column 2, for example. Suppose we wish to rank the five players, based on the results of their matches. One way to do this might be to count the number of wins for each player. Observe that the number of wins each player had is just the sum of the entries in the
corresponding row; equivalently, the vector containing all the row sums is given by the product Aj, where j = [1, 1, 1, 1, 1]^T. In our case, we have

\[
Aj = \begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \\ 2 \\ 1 \\ 1 \end{bmatrix}
\]
which produces the following ranking:

First: Davenport, Graf (tie)
Second: Hingis
Third: Seles, Williams (tie)
Are the players who tied in this ranking equally strong? Davenport might argue that since she defeated Graf, she deserves first place. Seles would use the same type of argument to break the tie with Williams. However, Williams could argue that she has two "indirect" victories because she beat Hingis, who defeated two others; furthermore, she might note that Seles has only one indirect victory (over Williams, who then defeated Hingis). Since one player might not have defeated all the others with whom she ultimately ties, the notion of indirect wins seems more useful. Moreover, an indirect victory corresponds to a 2-path in the digraph, so we can use the square of the adjacency matrix. To compute both wins and indirect wins for each player, we need the row sums of the matrix A + A², which are given by
\[
(A + A^2)\,j = \begin{bmatrix} 0 & 1 & 2 & 2 & 3 \\ 1 & 0 & 2 & 2 & 2 \\ 1 & 1 & 0 & 2 & 2 \\ 0 & 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 8 \\ 7 \\ 6 \\ 2 \\ 3 \end{bmatrix}
\]
Thus, we would rank the players as follows: Davenport, Graf, Hingis, Williams, Seles. Unfortunately, this approach is not guaranteed to break all ties.
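The same ranking can be produced mechanically from the adjacency matrix; a sketch (Python/NumPy, not from the text, with the player list added for readability):

```python
import numpy as np

# Adjacency matrix of the tournament in Figure 3.27
# (rows/columns ordered alphabetically)
A = np.array([[0, 1, 0, 1, 1],
              [0, 0, 1, 1, 1],
              [1, 0, 0, 1, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 1, 0, 0]])
players = ["Davenport", "Graf", "Hingis", "Seles", "Williams"]

wins = A.sum(axis=1)                 # row sums of A, i.e. Aj
combined = (A + A @ A).sum(axis=1)   # wins plus indirect wins, i.e. (A + A^2)j

for name, w, c in sorted(zip(players, wins, combined), key=lambda t: -t[2]):
    print(name, w, c)
# Davenport 3 8, Graf 3 7, Hingis 2 6, Williams 1 3, Seles 1 2
```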
Error-Correcting Codes

Section 1.4 discussed examples of error-detecting codes. We turn now to the problem of designing codes that can correct as well as detect certain types of errors. Our message will be a vector x in Z_2^k for some k, and we will encode it by using a matrix transformation T: Z_2^k → Z_2^n for some n > k. The vector T(x) will be called a code vector. A simple example will serve to illustrate the approach we will take, which is a generalization of the parity check vectors in Example 1.31.
Example 3.69
Suppose the message is a single binary digit: 0 or 1. If we encode the message by simply repeating it twice, then the code vectors are [0, 0] and [1, 1]. This code can detect single errors. For example, if we transmit [0, 0] and an error occurs in the first component, then [1, 0] is received and an error is detected, because this is not a legal code vector. However, the receiver cannot correct the error, since [1, 0] would also be the result of an error in the second component if [1, 1] had been transmitted. We can solve this problem by making the code vectors longer, repeating the message digit three times instead of two. Thus, 0 and 1 are encoded as [0, 0, 0] and [1, 1, 1], respectively. Now if a single error occurs, we can not only detect it but also correct it. For example, if [0, 1, 0] is received, then we know it must have been the result of a single error in the transmission of [0, 0, 0], since a single error in [1, 1, 1] could not have produced it.
Note that the code in Example 3.69 can be achieved by means of a matrix transformation, albeit a particularly trivial one. Let G = [1, 1, 1]^T and define T: Z_2 → Z_2^3 by T(x) = Gx. (Here we are thinking of the elements of Z_2 as 1×1 matrices.) The matrix G is called a generator matrix for the code. To tell whether a received vector is a code vector, we perform not one but two parity checks. We require that the received vector c = [c_1, c_2, c_3]^T satisfies c_1 = c_2 = c_3. We can
write these equations as a linear system over Z_2:

c_1 + c_2 = 0
c_1 + c_3 = 0        (1)

If we let

\[
P = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\]

then (1) is equivalent to Pc = 0. The matrix P is called a parity check matrix for the code. Observe that PG = [0, 0]^T = O.
To see how these matrices come into play in the correction of errors, suppose we send 1 as c = [1, 1, 1]^T but a single error causes it to be received as c′ = [1, 0, 1]^T. We compute

\[
Pc' = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

so we know that c′ cannot be a code vector. Where is the error? Notice that Pc′ is the second column of the parity check matrix P; this tells us that the error is in the second component of c′ (which we will prove in Theorem 3.34 below) and allows us to correct the error. (Of course, in this example we could find the error faster without using matrices, but the idea is a useful one.)

To generalize the ideas in the last example, we make the following definitions.
Definitions  If k < n, then any n×k matrix of the form G = \begin{bmatrix} I_k \\ A \end{bmatrix}, where A is an (n − k)×k matrix over Z_2, is called a standard generator matrix for an (n, k) binary code T: Z_2^k → Z_2^n. Any (n − k)×n matrix of the form P = [B  I_{n−k}], where B is an (n − k)×k matrix over Z_2, is called a standard parity check matrix. The code is said to have length n and dimension k.
Here is what we need to know: (a) When is G the standard generator matrix for an error-correcting binary code? (b) Given G, how do we find an associated standard parity check matrix P? It turns out that the answers are quite easy, as shown by the following theorem.
Theorem 3.34  If G = \begin{bmatrix} I_k \\ A \end{bmatrix} is a standard generator matrix and P = [B  I_{n−k}] is a standard parity check matrix, then P is the parity check matrix associated with G if and only if A = B. The corresponding (n, k) binary code is (single) error-correcting if and only if the columns of P are nonzero and distinct.
Before we prove the theorem, let's consider another, less trivial example that illustrates how the theorem is used.
Example 3.70

Suppose we want to design an error-correcting code that uses three parity check equations. Since these equations give rise to the rows of P, we have n − k = 3 and k = n − 3. The message vectors come from Z_2^k, so we would like k (and therefore n) to be as large as possible in order that we may transmit as much information as possible. By Theorem 3.34, the n columns of P need to be nonzero and distinct, so the maximum occurs when they consist of all the 2³ − 1 = 7 nonzero vectors of Z_2^{n−k} = Z_2³. One such candidate is
\[
P = \begin{bmatrix} 1 & 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 1 \end{bmatrix}
\]
This means that

\[
A = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix}
\]
and thus, by Theorem 3.34, a standard generator matrix for this code is

\[
G = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix}
\]
As an example of how the generator matrix works, suppose we encode x = [0, 1, 0, 1]^T to get the code vector

c = Gx = [0, 1, 0, 1, 0, 1, 0]^T

If this vector is received as c′ = [0, 1, 1, 1, 0, 1, 0]^T, we compute

\[
Pc' = \begin{bmatrix} 1 & 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}
\]

which we recognize as column 3 of P. Therefore, the error is in the third component of c′, and by changing it we recover the correct code vector c. We also know that the first four components of a code vector are the original message vector, so in this case we decode c to get the original x = [0, 1, 0, 1]^T.
The code in Example 3.70 is called the (7, 4) Hamming code. Any binary code constructed in this fashion is called an (n, k) Hamming code. Observe that, by construction, an (n, k) Hamming code has n = 2^{n−k} − 1.
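The encode-and-correct procedure of Example 3.70 is straightforward to mechanize; the following sketch (Python/NumPy, not from the text, with helper names of my own) builds G and P for the (7, 4) Hamming code, encodes a message, introduces a single error, and corrects it using the syndrome Pc′.

```python
import numpy as np

# A from Example 3.70; G = [I_4; A] and P = [A  I_3]
A = np.array([[1, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 1, 1]])
G = np.vstack([np.eye(4, dtype=int), A])
P = np.hstack([A, np.eye(3, dtype=int)])

def encode(x):
    return (G @ x) % 2

def correct(c_received):
    """Return the corrected code vector, flipping at most one component."""
    syndrome = (P @ c_received) % 2
    c = c_received.copy()
    if syndrome.any():
        # the syndrome equals the column of P where the error occurred
        for i in range(P.shape[1]):
            if np.array_equal(P[:, i], syndrome):
                c[i] ^= 1
                break
    return c

x = np.array([0, 1, 0, 1])
c = encode(x)                        # [0 1 0 1 0 1 0]
c_err = c.copy(); c_err[2] ^= 1      # single error in the third component
print(correct(c_err))                # [0 1 0 1 0 1 0] -- corrected code vector
print(correct(c_err)[:4])            # [0 1 0 1] -- the decoded message
```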
Proof of Theorem 3.34

(Throughout this proof we denote by a_i the ith column of a matrix A.) With P and G as in the statement of the theorem, assume first that they are standard parity check and generator matrices for the same binary code. Therefore, for every x in Z_2^k, PGx = 0. In terms of block multiplication,
\[
[\,B \quad I_{n-k}\,]\begin{bmatrix} I_k \\ A \end{bmatrix}x = 0 \quad\text{for all } x \text{ in } Z_2^k
\]

Equivalently, for all x in Z_2^k we have

Bx + Ax = (B + A)x = 0    or    Bx = Ax

Richard W. Hamming (1915–1998) received his Ph.D. in mathematics from the University of Illinois at Urbana-Champaign in 1942. His mathematical research interests were in the fields of differential equations and numerical analysis. From 1946 to 1976, he worked at Bell Labs, after which he joined the faculty at the U.S. Naval Postgraduate School in Monterey, California. In 1950, he published his fundamental paper on error-correcting codes, giving an explicit construction for the optimal codes Claude Shannon had proven theoretically possible in 1948.
If we now take x = e_i, the ith standard basis vector in Z_2^k, we see that

b_i = Be_i = Ae_i = a_i    for all i

Therefore, B = A. Conversely, it is easy to check that if B = A, then PGx = 0 for every x in Z_2^k (see Exercise 74).

To see that such a pair P and G gives an error-correcting code when the columns of P are nonzero and distinct, suppose that a code vector c in Z_2^n is received, with a single error in its ith component, as c′ = c + e_i, where e_i is the ith standard basis vector in Z_2^n. Then

Pc′ = P(c + e_i) = Pc + Pe_i = 0 + p_i = p_i
which pinpoints the error in the ith component. On the other hand, if p_i = 0, then an error in the ith component will not be detected (i.e., Pc′ = 0), and if p_i = p_j, then we cannot determine whether an error occurred in the ith or the jth component (Exercise 75).

The main ideas of this section are summarized below.
1. For n > k, an n×k matrix G and an (n − k)×n matrix P (with entries in Z_2) are a standard generator matrix and a standard parity check matrix, respectively, for an (n, k) binary code if and only if, in block form,

G = \begin{bmatrix} I_k \\ A \end{bmatrix}  and  P = [A  I_{n−k}]

for some (n − k)×k matrix A over Z_2.

2. G encodes a message vector x in Z_2^k as a code vector c in Z_2^n via c = Gx.

3. G is error-correcting if and only if the columns of P are nonzero and distinct. A vector c′ in Z_2^n is a code vector if and only if Pc′ = 0. In this case, the corresponding message vector is the vector x in Z_2^k consisting of the first k components of c′. If Pc′ ≠ 0, then c′ is not a code vector and Pc′ is one of the columns of P. If Pc′ is the ith column of P, then the error is in the ith component of c′ and we can recover the correct code vector (and hence the message) by changing this component.
Markov Chains
0.5 0, 3 ] [0.5 0.7 be tile trallS/tiOll II/a· frix for a Marko y cllain wltll two stntes. Let Xo = [0.5] b, 0.5
/11 Exercises /-4, let P =
tile initial suite vector for llie poplI/a liOlI. 1. Compute X I and Xz.
2. What proportion of the state I population will be in state 2 aft~r two steps? 3. What proport ion of the state 2 population will be in state 2 after two steps? 4. Find the steady stale vector.
/" Exercises 5-8, let P =
,1 ,1 1, ,! I
°, , 1
!
be the lransitio" malrix
°
for a Markov elw;" with three stMes. Let XtJ = j"ilial state vector for the populntioll.
120 180 bell,t 90
5. Compute x, and x,;. 6. Wha t proporlion of the state I population will be in
state I after two steps? 7. What proportion of the slate 2 population will be in state 3 after two steps?
8. Find the steady state vector. 9. Suppose that the weather in a particular region behaves according to a Markov cham. Specifically, suppose that the probability that tomorrow will be a wet day is 0.662 if today is wet and 0.250 if today is dry. The probability that tomorrow will be a dry day is 0.750 If today is dry and 0.338 if today is wet. [This exercise is based on an act ual study of rain fa ll in Tel AVIv over a 27·year period. See K. R. Gabnel and J. Neumann, "A Markov Chain Model for Daily Rainfall Occu rrence at Tel Aviv," Quarterly }oumal of tile Royal Meteorological SocielY, 88 ( 1962), pp. 90-95.1 (3) Write down the transition matrix for this Markov
chain. (b) If Monday is a dry day, what is t he probability that Wednesday wl\l be wet? (c) In the long ru n, what will the distribution of wet and dry days be?
10. Data h3ve been accumu [ated on the heights of chilo dren relative to thei r parents. Suppose that the proba· bllities that a taU parent will have a tall, medium· height, or short child arc 0.6, 0.2, and 0.2, respectively; the probabili ties that a mediu m· height paren t will have a tal l, medium-height, or short child are 0. 1, 0.7, and 0.2, respectively; and the probabilities that a short parent wilt have a tall , medium· height, or short child are 0.2, 004 , and 0.4 , respectively. (3) Write down the transi tion matrix for this Ma rkov
chain. (b) What is the probability that a short person will have a tall grandch ild? (c) If20% of the current pop ulation is tall, 50% is of medium height, and 30% is short, what will the distribution be in th ree generations? (d) If the data m part (c) do not change over time, what proportion of the population will be tall , of medium height, and short in the long run? II. A study of pinon (pille) nut crops In the American
sou thwest from 1940 to 1947 hypothesized th3t nut production followed 3 M3 rkovchain. [Sec D. H. Thomas, "A Computer Si mulation Model of Great Basin Shoshonean Subsistenc~ and Settlement Patterns," in D. L Clarke, cd .• Mo(Je/s ill Archaeology (London: Methuen, 1972 ).J The data suggested that if one year's crop was good , then the probabilities that the fo llowmg yea r's crop would be good. fa ir, or poor were 0.08, 0.07. and 0.85, respectively; if one year's crop was fa ir, then the probabilities that the following year's crop would be good, fair, or poor were 0.09, 0. 11 , and 0.80, respectively; if one year's crop was poor, then the probabilities that the followi ng year's crop would be good, fa ir, or poor were 0. 11 ,0.05, and 0.84, respectively. (. ) Write down the transition matrix for this Markov chain. (b) [f the pinon nut crop was good in 1940, fi nd the probabilities of a good crop in the years 1941 through 1945. (c) In the long run, what proportion of the crops will be good, fai r, and poor? 12. Robots have been programmed to traverse the maze shown in Figure 3.28 and at each junction randomly choose which way to go.
Section 3.7
1
2
Applications
245
pOPUlation Growlh 19. A population with three age classes has a Leslie matrix
L ==
4
r'
figure 3.2.1
xo=
1
1
3
0.7
0
O. If the imtlal population vecto r is
o
0.5
0
100 100 ,compute x j, xZ, andx J .
100 20. A populatio n with (our age classes has a Leslie matrLx o 1 2 5 0.5
o o (0) Construct the transition matrix for the Markov chain that models this situation. (b) SUl'pose we start with 15 robots at each Junction. Find the steady state distnbutlon of robots. (Assume that It rakes each robot the same amount of lime to tra vel between two adjacent j unctions.)
13. Let ; denote a row vector consisting entirely of Is. Prove that a no nnegative mat rix P is a stochastic mat rix if andonlyif j P= j.
Suppose we want to know the average (or expected) number of steps it will take to go from state i to state j in a Markov chain. It can be shown that the following computation answers this question: Delete the jth row and the jth column of the transition matrix P to get a new matrix Q. (Keep the rows and columns of Q labeled as they were in P.) The expected number of steps from state i to state j is given by the sum of the entries in the column of (I − Q)⁻¹ labeled i.

15. In Exercise 9, if Monday is a dry day, what is the expected number of days until a wet day?
16. In Exercise 10, what is the expected number of generations until a short person has a tall descendant?
0
0.7
0 0.3
o
O. If the init ial po pulatio n 0
0
10 10 , compute x i> x 2' a nd Xy 10 10
vector is X Q =
21 . A certam species with two agedasses of 1 year's duralion has a survival probability of 80% from class I to class 2. Empirical evidence shows that , o n ave rage, each female gives birth 10 five fema les per year. Thus, two possible Lesl ie mat rices a re
14. (a) Show that the product of two 2 X 2 stochastic matrices is also a stochastic matrix. (b) Prove that the product of l'ovo nX" stochastic
matrices is also a stochastic matrix.
0
L, =
[~.8 ~]
and
~ == [~.B ~]
. (a) Startmg wit h Xo == [ 10] 10 , co mpute x l' .. . . XU) III each case. (b) For each case, plot the relative size of each age class over li me (as in Figure 3.23). What do your graphs suggest? 22. Suppose the Leslie malrix for the VW bectle is L =
o
0
20
0. 1 0
O. Starting with an a rbitrary xO' deler-
o
0
0.5
mine the behavio r of this population. 23. Suppose the Leslie matrix for the VW beetlc is 0 20
o
L=
s 0
o . Invcstigate the effect of va rying
o
0.5 o the survival probability s of lhe yo ung beetles.
17. In ExerCIse II , if the pinon nut crop is fair one ycar, what is the expected numbe r of years until a good crop ~ 24. Woodland cari bou a rc found primarily in the western occurs? provinces of Canada and the American northwest. 18. In Exercise 12. starting from each o f the other ju ncThe average lifespan of a female is about 14 years. tions, what is the expected !lumber o f moves until a The birth and survival rales for each age bracket are robot reaches ju nction 4? given in Table 3.4, which shows that canbou cows do
24&
Chapler 3
Matrices
not give birth at all dUri ng their firs t 2 years and give birth 10 about one calf per year d uring their middle years. The mortality rate for you ng calves IS very high.
yea rs 2000 and 20 10. What do yo u conclude? ( What assum pt ions does this m odel make, li nd how could it be im proved?)
Graphs and Digraphs

In Exercises 25–28, determine the adjacency matrix of the given graph.
The numbers of woodland caribou re ported in Jasper Na tio nal Park in Albe rt;! ill 1990 arc shown III Table 3.5. Using a CAS, predi ct the Cllribo u population for 1992 and 1994. T hen projecllhe population for the
28.
Table 3.5 Age (years)
"
Number
10 2 8
6-8
5
8- 10
12 0 1
10-12 12- 14
"I
Woodland Carlboa population In Jasper National Parl, 1990
0-2 2- 4 4-6
Source: World Wildlife Fund Canada.
", ",
"
",
"
/11 Exercises 29-32, draw a graph tlml ha5 Ille givell adJocell C)'
29,
m(llrix. 0
1
1
1
0 1
1
0
1
1
0
0
0
1
1
1
1
0
0
0
0
1
0
1
1
0
0
0
1
1
1 0
30.
III ExercIses 37-40, dmll' (I digraph dUll Iws tlte given culjacency m(ltrix.
0
0
I
I
0
0
0
0
I
I
0
0
I
I
0
I
I
0
0
I
0
0 0
0
31. I
0 0
0
I
I
0
I
0
0
0
I
0
0
I
I
0
0
0
I
I
I
0
0
I
0
0
I
0
0
0
I
0
I
I
0
0
I
I
I
0
0
0
I
0
0
I
0
I
0
I
I
0
0 0
I
0 0
11/ ExerCISes 33-36, determine the adjacency m(ltrix of tlte
0
0
I
0
I
0
I
0
0
I
givell tligraph.
I
0
I
0
0
I
0
0
0
I
I
0 0
0
0
0 0
0
I
I
I
0
I
0
I
0
I
0
0
0
I
0
I
0 0
I
I
0
0
0
33.
Vt
32.
37.
39.
'·2
UJ
38.
40.
III Exercises 41-48, use powers of adjacency matrices to de-
termine tlie 'IIltllber of paths of lite speCIfied lellgth betwee" ti,e givell vertrces.
'.,
41. Exercise 30, length 2,
34.
v, and 1'1
42. Exercise 32, length 2, v, ilnd
1'2
43. Exercise 30, length 3, v, and
V}
44. Exercise 32, length 4,
1'1
1'2
45. Exercise 37, length 2, v,
and [0
"J
-46. Exercise 37, length 3, v, to v. 47. Exercise 40, length 3,
1'4
to v.
48. ExeTCIse 40, length 4, v, to
v~
49. Let A be the adjacency matrL.'t of a graph G.
(A) If row i of A is all zeros, what docs this imply about C? (b) If column ] of A is all zeros, what does this imply about C?
"J
35.
50. Let A be the adjacency matrix of a digraph D.
",
", \ ' .1
36.
v,
",
(a) If row i of Al is all 7.eros, what does this imply about D? (b) If column j of A2 is all 7.eros, what does this Imply about D? 5 1. Figure 3.29 IS the digraph of a tournament wi th six players, P, to Pt,. Using adjacency matrices, rank the players first by determining wins only and then by using the notion of combined wi ns and ind irect wins, as in Example 3.68. 52. Figure 3.30 is a d igraph rep resent ing a food web in a small ecosystem. A directed edge from (I to b ind.cates tha\ (I has bas a source of food. Construct the adja· cency rnalnx A for this digraph and use illo answer Ihe following questions.
'.J
(a) Which species has th e most direct so urces of food? How does A show this?
Figure 3.30  A food web (species: Plant, Insect, Rodent, Bird, Fish)

(b) Which species is a direct source of food for the most other species? How does A show this?
(c) If a eats b and b eats c, we say that a has c as an indirect source of food. How can we use A to determine which species has the most indirect food sources? Which species has the most direct and indirect food sources combined?
(d) Suppose that pollutants kill the plants in this food web, and we want to determine the effect this change will have on the ecosystem. Construct a new adjacency matrix A* from A by deleting the row and column corresponding to plants. Repeat parts (a) to (c) and determine which species are the most and least affected by the change.
(e) What will the long-term effect of the pollution be? What matrix calculations will show this?
53. Five people are all connected by e-mail. Whenever one of them hears a juicy piece of gossip, he or she passes it along by e-mailing it to someone else in the group according to Table 3.6.
(a) Draw the digraph that models this "gossip network" and find its adjacency matrix A.
(b) Define a step as the time it takes a person to e-mail everyone on his or her list. (Thus, in one step, gossip gets from Ann to both Carla and Ehaz.) If Bert hears a rumor, how many steps will it take for everyone else to hear the rumor? What matrix calculation reveals this?
(c) If Ann hears a rumor, how many steps will it take for everyone else to hear the rumor? What matrix calculation reveals this?
(d) In general, if A is the adjacency matrix of a digraph, how can we tell if vertex i is connected to vertex j by a path (of some length)?
[The gossip network in this exercise is reminiscent of the notion of "six degrees of separation" (found in the play and film by that name), which suggests that any two people are connected by a path of acquaintances whose length is at most 6. The game "Six Degrees of Kevin Bacon" more frivolously asserts that all actors are connected to the actor Kevin Bacon in such a way.]
54. Let A be the adjacency matrix of a graph G.
(a) By induction, prove that for all n >= 1, the (i, j) entry of A^n is equal to the number of n-paths between vertices i and j.
(b) How do the statement and proof in part (a) have to be modified if G is a digraph?
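The following short sketch is not part of the text; it simply illustrates, with NumPy and a made-up 4-vertex graph, the path-counting fact described in Exercise 54: the (i, j) entry of A^k counts the paths of length k from vertex i to vertex j. Substitute the adjacency matrix from any of the exercises above.

```python
# Illustrative sketch (hypothetical graph, not from the text):
# counting paths with powers of an adjacency matrix.
import numpy as np

A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
])

# The (i, j) entry of A^k is the number of k-paths from vertex i to vertex j.
A2 = np.linalg.matrix_power(A, 2)
A3 = np.linalg.matrix_power(A, 3)
print("paths of length 2 from v1 to v2:", A2[0, 1])
print("paths of length 3 from v1 to v4:", A3[0, 3])
```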
55. If A is the adjacency matrix of a digraph G, what does the (i, j) entry of AA^T represent if i is not equal to j?

A graph is called bipartite if its vertices can be subdivided into two sets U and V such that every edge has one endpoint in U and the other endpoint in V. For example, the graph in Exercise 28 is bipartite with U = {v1, v2, v3} and V = {v4, v5}. In Exercises 56-59, determine whether a graph with the given adjacency matrix is bipartite.
56. The adjacency matrix in Exercise 29
57. The adjacency matrix in Exercise 32
58. The adjacency matrix in Exercise 31
59. (0-1 adjacency matrix as given in the text)
60. (a) Prove that a graph is bipartite if and only if its vertices can be labeled so that its adjacency matrix can be partitioned as
A = [ O    B ]
    [ B^T  O ]
(b) Using the result in part (a), prove that a bipartite graph has no circuits of odd length.

Error-Correcting Codes
61. Suppose we encode the four vectors in Z_2^2 by repeating the vector twice. Thus, we have
[0, 0] -> [0, 0, 0, 0]
[0, 1] -> [0, 1, 0, 1]
[1, 0] -> [1, 0, 1, 0]
[1, 1] -> [1, 1, 1, 1]
Show that this code is not error-correcting.
62. Suppose we encode the binary digits 0 and 1 by repeating each digit five times. Thus,
0 -> [0, 0, 0, 0, 0]
1 -> [1, 1, 1, 1, 1]
Show that this code can correct double errors.
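The short sketch below is illustrative only (it is not the book's code): it implements the repeat-five scheme of Exercise 62 with majority-vote decoding, so any pattern of at most two bit errors is corrected.

```python
# A minimal sketch of the repeat-five code in Exercise 62 (assumed decoding
# rule: majority vote over the five received bits).
def encode(bit):
    return [bit] * 5

def decode(received):
    # Majority vote: three or more 1s decodes to 1, otherwise 0
    return 1 if sum(received) >= 3 else 0

codeword = encode(1)        # [1, 1, 1, 1, 1]
corrupted = codeword[:]
corrupted[0] ^= 1           # introduce two errors
corrupted[3] ^= 1
print(decode(corrupted))    # still decodes to 1
```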
What is the result of encoding the messages x in Exercises 63-65 using the (7, 4) Hamming code of Example 3.70?
63. x = (message vector as given in the text)
64. x = (message vector as given in the text)
65. x = (message vector as given in the text)

When the (7, 4) Hamming code of Example 3.70 is used, suppose the messages c' in Exercises 66-68 are received. Apply the standard parity check matrix to c' to determine whether an error has occurred and correctly decode c' to recover the original message vector x.
66. c' = (received vector as given in the text)
67. c' = [1 1 0 0 1 1 0]^T
68. c' = (received vector as given in the text)

69. Consider the parity check code of Example 1.31.
(a) Find a standard parity check matrix for this code.
(b) Find a standard generator matrix.
(c) Apply Theorem 3.34 to explain why this code is not error-correcting.
70. Define a code Z_2^2 -> Z_2^n using the standard generator matrix G given in the text.
(a) List all four code words.
(b) Find the associated standard parity check matrix for this code. Is this code (single) error-correcting?
71. Define a code Z_2^3 -> Z_2^n using the standard generator matrix G given in the text.
(a) List all eight code words.
(b) Find the associated standard parity check matrix for this code. Is this code (single) error-correcting?
72. Show that the code in Example 3.69 is a (3, 1) Hamming code.
73. Construct stand
Key Definitions and Concepts
basis, 196; Basis Theorem, 200; column matrix (vector), 136; column space of a matrix, 193; composition of linear transformations, 217; coordinate vector with respect to a basis, 206; diagonal matrix, 137; dimension, 201; elementary matrix, 168; Fundamental Theorem of Invertible Matrices, 170; identity matrix, 137; inverse of a square matrix, 161; inverse of a linear transformation, 219; linear combination of matrices, 152; linear dependence/independence of matrices, 155; linear transformation, 211; LU factorization, 179; matrix, 136; matrix addition, 138; matrix factorization, 178; matrix multiplication, 139; matrix powers, 147; negative of a matrix, 138; null space of a matrix, 195; nullity of a matrix, 202; outer product, 145; partitioned matrices (block multiplication), 143, 146; permutation matrix, 185; properties of matrix algebra, 156, 157, 165; rank of a matrix, 202; Rank Theorem, 203; representations of matrix products, 144-146; row matrix (vector), 136; row space of a matrix, 193; scalar multiple of a matrix, 138; span of a set of matrices, 154; square matrix, 137; standard matrix of a linear transformation, 214; subspace, 190; symmetric matrix, 149; transpose of a matrix, 149; zero matrix, 139
Review Questions
1. Mark each of the following statements true or false:
(a) For any matrix A, both AA^T and A^T A are defined.
(b) If A and B are matrices such that AB = O and A is not O, then B = O.
(c) If A, B, and X are invertible matrices such that XA = B, then X = A^{-1}B.
(d) The inverse of an elementary matrix is an elementary matrix.
(e) The transpose of an elementary matrix is an elementary matrix.
(f) The product of two elementary matrices is an elementary matrix.
(g) If A is an m x n matrix, then the null space of A is a subspace of R^n.
(h) Every plane in R^3 is a two-dimensional subspace of R^3.
(i) The transformation T: R^2 -> R^2 defined by T(x) = -x is a linear transformation.
(j) If T: R^4 -> R^5 is a linear transformation, then there is a 4 x 5 matrix A such that T(x) = Ax for all x in the domain of T.

In Exercises 2-7, let A and B be the matrices given in the text. Compute the indicated matrices, if possible.
2. A - 2B
3. A^T B^2
4. B^T A^{-1} B
5. (BB^T)^{-1}
6. (B^T B)^{-1}
7. The outer product expansion of AA^T
8. If A is a matrix such that A^{-1} is the matrix given in the text, find A.
9. If A is the matrix given in the text and X is a matrix satisfying the equation given in the text, find X.
16. If A is a square matrix whose rows add up to the zero vector, explain why A cannot be invertible.
17. Let A be an m x n matrix with linearly independent columns. Explain why A^T A must be an invertible matrix. Must AA^T also be invertible? Explain.
18. Find a linear transformation T: R^2 -> R^2 that maps the vectors specified in the text to the given images.
19. Find the standard matrix of the linear transformation T: R^2 -> R^2 that corresponds to a counterclockwise rotation of 45 degrees about the origin followed by a projection onto the line y = -2x.
20. Suppose that T: R^n -> R^n is a linear transformation and suppose that v is a vector such that T(v) is not 0 but T^2(v) = 0 (where T^2 = T o T). Prove that v and T(v) are linearly independent.
Chapter 4
Eigenvalues and Eigenvectors

4.0 Introduction: A Dynamical System on Graphs
"Almost every combination of the adjectives proper, latent, characteristic, eigen and secular, with the nouns root, number and value, has been used in the literature for what we call a proper value."
Paul R. Halmos, Finite-Dimensional Vector Spaces (2nd edition), Van Nostrand, 1958, p. 102

We saw in the last chapter that iterating matrix multiplication often produces interesting results. Both Markov chains and the Leslie model of population growth exhibit
steady states in certain situations. One of the goals of this chapter is to help you understand such behavior. First we will look at another iterative process, or dynamical system, that uses matrices. (In the problems that follow, you will find it helpful to use a CAS or a calculator with matrix capabilities to facilitate the computations.)
Our example involves graphs (see Section 3.7). A complete graph is any graph in which every vertex is adjacent to every other vertex. If a complete graph has n vertices, it is denoted by K_n. For example, Figure 4.1 shows a representation of K_4.
Problem 1  Pick any vector x in R^4 with nonnegative entries and label the vertices of K_4 with the components of x, so that v1 is labeled with x1, and so on. Compute the adjacency matrix A of K_4 and relabel the vertices of the graph with the corresponding components of Ax. Try this for several vectors x and explain, in terms of the graph, how the new labels can be determined from the old labels.
Problem 2  Now iterate the process in Problem 1. That is, for a given choice of x, relabel the vertices as described above and then apply A again (and again, and again) until a pattern emerges. Since the components of the vectors themselves will get quite large, we will scale them by dividing each vector by its largest component after each iteration. Thus, if a computation results in the vector
[4, 2, 1, 1]^T

we will replace it by

(1/4) [4, 2, 1, 1]^T = [1, 0.5, 0.25, 0.25]^T
Note that this process guarantees that the largest component of each vector will now be 1. Do this for K_4, then K_3 and K_5. Use at least ten iterations and two-decimal-place accuracy. What appears to be happening?
Problem 3  You should have noticed that, in each case, the labeling vector is approaching a certain vector (a steady state label!). Label the vertices of the complete graphs with this steady state vector and apply the adjacency matrix A one more time (without scaling). What is the relationship between the new labels and the old ones?
Problem 4  Make a conjecture about the general case K_n. What is the steady state label? What happens if we label K_n with the steady state vector and apply the adjacency matrix A without scaling?
Problem 5  The Petersen graph is shown in Figure 4.2. Repeat the process in Problems 1 through 3 with this graph.
We will now explore the process with some other classes of graphs to see if they behave the same way. The cycle C_n is the graph with n vertices arranged in a cyclic fashion. For example, C_5 is the graph shown in Figure 4.3.
Problem 6  Repeat the process of Problems 1 through 3 with cycles C_n for various odd values of n and make a conjecture about the general case.
Problem 7  Repeat Problem 6 with even values of n. What happens?
A bipartite graph is a complete bipartite graph (see Exercises 56-60 in Section 3.7) if its vertices can be partitioned into sets U and V such that every vertex in U is adjacent to every vertex in V, and vice versa. If U and V each have n vertices, then the graph is denoted by K_{n,n}. For example, K_{3,3} is the graph in Figure 4.4.
Problem 8  Repeat the process of Problems 1 through 3 with complete bipartite graphs K_{n,n} for various values of n. What happens?
By the end of this chapter, you will be in a position to explain the observations you have made in this Introduction.
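The following sketch is not part of the text; it is one way to carry out Problems 1-3 by machine, assuming NumPy is available. It builds the adjacency matrix of K_4, repeatedly applies it to a nonnegative label vector, and rescales by the largest component after each step.

```python
# A computational sketch of Problems 1-3 (assumed setup, not from the text).
import numpy as np

n = 4
A = np.ones((n, n)) - np.eye(n)      # adjacency matrix of the complete graph K_4
x = np.array([4.0, 2.0, 1.0, 1.0])   # any nonnegative starting labels

for step in range(10):
    x = A @ x
    x = x / x.max()                  # scale so the largest label is 1
    print(np.round(x, 2))
# The labels settle down to (1, 1, 1, 1); applying A once more without scaling
# multiplies every label by n - 1 = 3.
```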
4.1 Introduction to Eigenvalues and Eigenvectors

In Chapter 3, we encountered the notion of a steady state vector in the context of two applications: Markov chains and the Leslie model of population growth. For a Markov chain with transition matrix P, a steady state vector x had the property that Px = x; for a Leslie matrix L, a steady state vector was a population vector x satisfying Lx = rx, where r represented the steady state growth rate. For example, we saw that

[0.7  0.2][0.4]   [0.4]            [0    4     3][18]         [18]
[0.3  0.8][0.6] = [0.6]    and     [0.5  0     0][ 6]  = 1.5  [ 6]
                                   [0    0.25  0][ 1]         [ 1]

The German adjective eigen means "own" or "characteristic of." Eigenvalues and eigenvectors are characteristic of a matrix in the sense that they contain important information about the nature of the matrix. The letter λ (lambda), the Greek equivalent of the English letter l, is used for eigenvalues because at one time they were also known as latent values. The prefix eigen is pronounced "EYE-gun."

In this chapter, we investigate this phenomenon more generally. That is, for a square matrix A, we ask whether there exist nonzero vectors x such that Ax is just a scalar multiple of x. This is the eigenvalue problem, and it is one of the most central problems in linear algebra. It has applications throughout mathematics and in many other fields as well.

Definition  Let A be an n x n matrix. A scalar λ is called an eigenvalue of A if there is a nonzero vector x such that Ax = λx. Such a vector x is called an eigenvector of A corresponding to λ.
Example 4.1
Show that x = [1; 1] is an eigenvector of A = [3 1; 1 3] and find the corresponding eigenvalue.

Solution  We compute
Ax = [3 1][1]   [4]     [1]
     [1 3][1] = [4] = 4 [1] = 4x
from which it follows that x is an eigenvector of A corresponding to the eigenvalue 4.

Example 4.2
Show that 5 is an eigenvalue of A = [1 2; 4 3] and determine all eigenvectors corresponding to this eigenvalue.

Solution  We must show that there is a nonzero vector x such that Ax = 5x. But this equation is equivalent to (A - 5I)x = 0, so we compute
A - 5I = [1 2]   [5 0]   [-4  2]
         [4 3] - [0 5] = [ 4 -2]
Since the columns of this matrix are clearly linearly dependent, the Fundamental Theorem of Invertible Matrices implies that its null space is nonzero. Thus, Ax = 5x has a nontrivial solution, so 5 is an eigenvalue of A. We find its eigenvectors by computing the null space:
[A - 5I | 0] = [-4  2 | 0]  ->  [1  -1/2 | 0]
               [ 4 -2 | 0]      [0    0  | 0]
Thus, if x = [x1; x2] is an eigenvector corresponding to the eigenvalue 5, it satisfies x1 - (1/2)x2 = 0, or x1 = (1/2)x2, so these eigenvectors are of the form
x = [(1/2)x2; x2] = x2 [1/2; 1]
That is, they are the nonzero multiples of [1/2; 1] (or, equivalently, the nonzero multiples of [1; 2]).

The set of all eigenvectors corresponding to an eigenvalue λ of an n x n matrix A is just the set of nonzero vectors in the null space of A - λI. It follows that this set of eigenvectors, together with the zero vector in R^n, is the null space of A - λI.
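As a quick numerical cross-check of Example 4.2 (this code is illustrative and not part of the text), NumPy can compute the eigenvalues of A directly and confirm that 5 is one of them, with eigenvectors proportional to (1, 2).

```python
# Illustrative check of Example 4.2 with NumPy (assumed tooling).
import numpy as np

A = np.array([[1, 2],
              [4, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                 # approximately [-1.  5.]
i = np.argmax(eigenvalues)         # index of the eigenvalue 5
v = eigenvectors[:, i]
print(v / v[0])                    # normalized: approximately [1. 2.]
print(np.allclose(A @ v, 5 * v))   # True: A v = 5 v
```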
Definition  Let A be an n x n matrix and let λ be an eigenvalue of A. The collection of all eigenvectors corresponding to λ, together with the zero vector, is called the eigenspace of λ and is denoted by E_λ.

Therefore, in Example 4.2, E_5 = { t [1; 2] }.
Example 4.3
Show that λ = 6 is an eigenvalue of
A = [ 7  1  -2]
    [-3  3   6]
    [ 2  2   2]
and find a basis for its eigenspace.

Solution  As in Example 4.2, we compute the null space of A - 6I. Row reduction produces
A - 6I = [ 1  1  -2]      [1  1  -2]
         [-3 -3   6]  ->  [0  0   0]
         [ 2  2  -4]      [0  0   0]
from which we see that the null space of A - 6I is nonzero. Hence, 6 is an eigenvalue of A, and the eigenvectors corresponding to this eigenvalue satisfy x1 + x2 - 2x3 = 0, or x1 = -x2 + 2x3. It follows that
E_6 = { [-x2 + 2x3; x2; x3] } = { x2 [-1; 1; 0] + x3 [2; 0; 1] } = span( [-1; 1; 0], [2; 0; 1] )

In R^2, we can give a geometric interpretation of the notion of an eigenvector. The equation Ax = λx says that the vectors Ax and x are parallel. Thus, x is an eigenvector of A if and only if A transforms x into a parallel vector (or, equivalently, if and only if T_A(x) is parallel to x, where T_A is the matrix transformation corresponding to A).
Example 4.4
Find the eigenvectors and eigenvalues of A = [1 0; 0 -1] geometrically.

Solution  We recognize that A is the matrix of a reflection F in the x-axis (see Example 3.56). The only vectors that F maps parallel to themselves are vectors parallel to the y-axis (i.e., multiples of [0; 1]), which are reversed (eigenvalue -1), and vectors parallel to the x-axis (i.e., multiples of [1; 0]), which are sent to themselves (eigenvalue 1) (see Figure 4.5). Accordingly, λ = -1 and λ = 1 are the eigenvalues of A, and the corresponding eigenspaces are E_{-1} = span([0; 1]) and E_1 = span([1; 0]).
Figure 4.5  The eigenvectors of a reflection

Another way to think of eigenvectors geometrically is to draw x and Ax head-to-tail. Then x will be an eigenvector of A if and only if x and Ax are aligned in a straight line. In Figure 4.6, x is an eigenvector of A but y is not.
If x is an eigenvector of A corresponding to the eigenvalue λ, then so is any nonzero multiple of x. So, if we want to search for eigenvectors geometrically, we need only consider the effect of A on unit vectors. Figure 4.7(a) shows what happens when we transform unit vectors with the matrix A = [3 1; 1 3] of Example 4.1 and display the results head-to-tail, as in Figure 4.6. We can see that the vector x = [1/√2; 1/√2] is an eigenvector, but we also notice that there appears to be an eigenvector in the second quadrant. Indeed, this is the case, and it turns out to be the vector [-1/√2; 1/√2].
Figure 4.7

In Figure 4.7(b), we see what happens when we use the matrix A given in the text. There are no eigenvectors at all!
We now know how to find eigenvectors once we have the corresponding eigenvalues, and we have a geometric interpretation of them, but one question remains: How do we first find the eigenvalues of a given matrix? The key is the observation that λ is an eigenvalue of A if and only if the null space of A - λI is nontrivial.
Recall from Section 3.3 that the determinant of a 2 x 2 matrix A = [a b; c d] is the expression det A = ad - bc, and A is invertible if and only if det A is nonzero. Furthermore, the Fundamental Theorem of Invertible Matrices guarantees that a matrix has a nontrivial null space if and only if it is noninvertible, hence, if and only if its determinant is zero. Putting these facts together, we see that (for 2 x 2 matrices at least) λ is an eigenvalue of A if and only if det(A - λI) = 0. This fact characterizes eigenvalues, and we will soon generalize it to square matrices of arbitrary size. For the moment, though, let's see how to use it with 2 x 2 matrices.
Example 4.5
Find all of the eigenvalues and corresponding eigenvectors of the matrix A = [3 1; 1 3] from Example 4.1.

Solution  The preceding remarks show that we must find all solutions λ of the equation det(A - λI) = 0. Since
det(A - λI) = det [3-λ   1 ; 1   3-λ] = (3 - λ)(3 - λ) - 1 = λ² - 6λ + 8
we need to solve the quadratic equation λ² - 6λ + 8 = 0. The solutions to this equation are easily found to be λ = 4 and λ = 2. These are therefore the eigenvalues of A.
To find the eigenvectors corresponding to the eigenvalue λ = 4, we compute the null space of A - 4I. We find
[A - 4I | 0] = [-1  1 | 0]  ->  [1  -1 | 0]
               [ 1 -1 | 0]      [0   0 | 0]
from which it follows that x = [x1; x2] is an eigenvector corresponding to λ = 4 if and only if x1 - x2 = 0 or x1 = x2. Hence, the eigenspace is
E_4 = { [x2; x2] } = { x2 [1; 1] } = span([1; 1])
Similarly, for λ = 2, we have
[A - 2I | 0] = [1  1 | 0]  ->  [1  1 | 0]
               [1  1 | 0]      [0  0 | 0]
so y = [y1; y2] is an eigenvector corresponding to λ = 2 if and only if y1 + y2 = 0 or y1 = -y2. Thus, the eigenspace is
E_2 = { [-y2; y2] } = { y2 [-1; 1] } = span([-1; 1])

Figure 4.8 shows graphically how the eigenvectors of A are transformed when multiplied by A: an eigenvector x in the eigenspace E_4 is transformed into 4x, and an eigenvector y in the eigenspace E_2 is transformed into 2y. As Figure 4.7(a) shows, the eigenvectors of A are the only vectors in R^2 that are transformed into scalar multiples of themselves when multiplied by A.
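The computation in Example 4.5 can also be reproduced symbolically. The sketch below is illustrative only (SymPy is an assumed tool, not one the text uses): it forms det(A - λI), solves the resulting quadratic, and lists the eigenspaces.

```python
# Illustrative symbolic check of Example 4.5 with SymPy (assumed tooling).
from sympy import Matrix, symbols, solve

lam = symbols('lambda')
A = Matrix([[3, 1],
            [1, 3]])
p = (A - lam * Matrix.eye(2)).det()   # the characteristic polynomial
print(p.expand())                     # lambda**2 - 6*lambda + 8
print(solve(p, lam))                  # [2, 4]
print(A.eigenvects())                 # eigenvalue 2 with (-1, 1), eigenvalue 4 with (1, 1)
```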
Figure 4.8  How A transforms eigenvectors
Remark  You will recall that a polynomial equation with real coefficients (such as the quadratic equation in Example 4.5) need not have real roots; it may have complex roots. (See Appendix C.) It is also possible to compute eigenvalues and eigenvectors when the entries of a matrix come from Z_p, where p is prime. Thus, it is important to specify the setting we intend to work in before we set out to compute the eigenvalues of a matrix. However, unless otherwise specified, the eigenvalues of a matrix whose entries are real numbers will be assumed to be real as well.

Example 4.6
Interpret the matrix in Example 4.5 as a matrix over Z_3 and find its eigenvalues in that field.

Solution  The solution proceeds exactly as above, except we work modulo 3. Hence, the quadratic equation λ² - 6λ + 8 = 0 becomes λ² + 2 = 0. This equation is the same as λ² = -2 = 1, giving λ = 1 and λ = -1 = 2 as the eigenvalues in Z_3. (Check that the same answer would be obtained by first reducing A modulo 3 to obtain [0 1; 1 0] and then working with this matrix.)

Example 4.7
Find the eigenvalues of A = [0 -1; 1 0] (a) over R and (b) over the complex numbers C.

Solution  We must solve the equation
0 = det(A - λI) = det [-λ  -1; 1  -λ] = λ² + 1
(a) Over R, there are no solutions, so A has no real eigenvalues.
(b) Over C, the solutions are λ = i and λ = -i. (See Appendix C.)
In the next section, we will extend the notion of determinant from 2 x 2 to n x n matrices, which in turn will allow us to find the eigenvalues of arbitrary square matrices. (In fact, this isn't quite true, but we will at least be able to find a polynomial equation that the eigenvalues of a given matrix must satisfy.)
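The following check of Example 4.7 is illustrative only (it is not part of the text): numerical linear-algebra routines work over the complex numbers, so they return the eigenvalues i and -i even though the matrix has no real eigenvectors.

```python
# Illustrative check of Example 4.7 with NumPy (assumed tooling).
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
eigenvalues, _ = np.linalg.eig(A)
print(eigenvalues)     # [0.+1.j  0.-1.j]
```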
In Exercises 1-6, show that v is an eigenvector of A and find the corresponding eigenvalue. (The matrices A and vectors v for Exercises 1-6 are given in the text.)

In Exercises 7-12, show that λ is an eigenvalue of A and find one eigenvector corresponding to this eigenvalue. (The matrices A and scalars λ for Exercises 7-12 are given in the text.)

In Exercises 19-22, unit vectors x in R^2 and their images Ax under the action of a 2 x 2 matrix A are drawn head-to-tail, as in Figure 4.7. Estimate the eigenvectors and eigenvalues of A from each "eigenpicture." (The eigenpictures for Exercises 19-22 appear as figures in the text.)
In Exercises 13-18, find the eigenvalues and eigenvectors of A geometrically.
13. A = [-1 0; 0 1] (reflection in the y-axis)
14. A = [0 1; 1 0] (reflection in the line y = x)
15. A = [1 0; 0 0] (projection onto the x-axis)
16. A = (the matrix given in the text) (projection onto the line through the origin with the direction vector given in the text)
17. A = [2 0; 0 3] (stretching by a factor of 2 horizontally and a factor of 3 vertically)
18. A = [0 -1; 1 0] (counterclockwise rotation of 90 degrees about the origin)
In Exercises 23-26, use the method of Example 4.5 to find all of the eigenvalues of the matrix A. Give bases for each of the corresponding eigenspaces. Illustrate the eigenspaces and the effect of multiplying eigenvectors by A as in Figure 4.8. (The matrices for Exercises 23-26 are given in the text.)

In Exercises 27-30, find all of the eigenvalues of the matrix A over the complex numbers C. Give bases for each of the corresponding eigenspaces. (The matrices for Exercises 27-30 are given in the text.)

In Exercises 31-34, find all of the eigenvalues of the matrix A over the indicated Z_p. (Exercises 31 and 32 are over Z_3, and Exercises 33 and 34 are over Z_5; the matrices are given in the text.)

35. (a) Show that the eigenvalues of the 2 x 2 matrix A = [a b; c d] are the solutions of the quadratic equation λ² - tr(A)λ + det A = 0, where tr(A) is the trace of A. (See page 160.)
(b) Show that the eigenvalues of the matrix A in part (a) are
λ = (1/2) ( a + d ± √((a - d)² + 4bc) )
(c) Show that the trace and determinant of the matrix A in part (a) are given by
tr(A) = λ1 + λ2   and   det A = λ1 λ2
where λ1 and λ2 are the eigenvalues of A.
36. Consider again the matrix A in Exercise 35. Give conditions on a, b, c, and d such that A has (a) two distinct real eigenvalues, (b) one real eigenvalue, and (c) no real eigenvalues.
37. Show that the eigenvalues of the upper triangular matrix A = [a b; 0 d] are λ = a and λ = d, and find the corresponding eigenspaces.
38. Let a and b be real numbers. Find the eigenvalues and corresponding eigenspaces of A = [a -b; b a] over the complex numbers.
4.2 Determinants

Historically, determinants preceded matrices, a curious fact in light of the way linear algebra is taught today, with matrices before determinants. Nevertheless, determinants arose independently of matrices in the solution of many practical problems, and the theory of determinants was well developed almost two centuries before matrices were deemed worthy of study in and of themselves. A snapshot of the history of determinants is presented at the end of this section.
Recall that the determinant of the 2 x 2 matrix
A = [a11  a12]
    [a21  a22]
is
det A = a11 a22 - a12 a21
We first encountered this expression when we determined ways to compute the inverse of a matrix. In particular, we found that
A^{-1} = 1/(a11 a22 - a12 a21) [ a22  -a12]
                               [-a21   a11]
The determinant of a matrix A is sometimes also denoted by |A|, so for the 2 x 2 matrix A we may also write
|a11  a12|
|a21  a22| = |A| = a11 a22 - a12 a21

Warning  This notation for the determinant is reminiscent of absolute value notation. It is easy to mistake |a11 a12; a21 a22|, the notation for the determinant, for [a11 a12; a21 a22], the notation for the matrix itself. Do not confuse these. Fortunately, it will usually be clear from the context which is intended.

We define the determinant of a 1 x 1 matrix A = [a] to be det A = |a| = a. (Note that we really have to be careful with notation here: |a| does not denote the absolute value of a in this case.)
How then should we define the determinant of a 3 x 3 matrix? If you ask your CAS for the inverse of
A = [a  b  c]
    [d  e  f]
    [g  h  i]
the answer will be equivalent to
A^{-1} = (1/Δ) [ ei - fh   ch - bi   bf - ce ]
               [ fg - di   ai - cg   cd - af ]
               [ dh - eg   bg - ah   ae - bd ]
where Δ = aei - afh - bdi + bfg + cdh - ceg. Observe that
Δ = aei - afh - bdi + bfg + cdh - ceg = a(ei - fh) - b(di - fg) + c(dh - eg)
and that each of the entries in the matrix portion of A^{-1} appears to be the determinant of a 2 x 2 submatrix of A. In fact, this is true, and it is the basis of the definition of the determinant of a 3 x 3 matrix. The definition is recursive in the sense that the determinant of a 3 x 3 matrix is defined in terms of determinants of 2 x 2 matrices.
Definition  Let A = [aij] be a 3 x 3 matrix. Then the determinant of A is the scalar
det A = |A| = a11 |a22 a23; a32 a33| - a12 |a21 a23; a31 a33| + a13 |a21 a22; a31 a32|     (1)

Notice that each of the 2 x 2 determinants is obtained by deleting the row and column of A that contain the entry the determinant is being multiplied by. For example, the first summand is a11 multiplied by the determinant of the submatrix obtained by deleting row 1 and column 1. Notice also that the plus and minus signs alternate in equation (1). If we denote by A_ij the submatrix of a matrix A obtained by deleting row i and column j, then we may abbreviate equation (1) as
det A = Σ_{j=1}^{3} (-1)^{1+j} a_{1j} det A_{1j}
For any square matrix A, det A_ij is called the (i, j)-minor of A.

Example 4.8
Compute the determinant of
A = [5  -3  2]
    [1   0  2]
    [2  -1  3]

Solution  We compute
det A = 5 |0 2; -1 3| - (-3) |1 2; 2 3| + 2 |1 0; 2 -1|
      = 5(0 - (-2)) + 3(3 - 4) + 2(-1 - 0)
      = 5(2) + 3(-1) + 2(-1)
      = 5
With a little practice, you should find that you can easily work out 2 x 2 determinants in your head. Writing out the second line in the above solution is then unnecessary.
Another method for calculating the determinant of a 3 x 3 matrix is analogous to the method for calculating the determinant of a 2 x 2 matrix. Copy the first two columns of A to the right of the matrix and take the products of the elements on the six diagonals shown in (2). Attach plus signs to the products from the downward-sloping diagonals and attach minus signs to the products from the upward-sloping diagonals.     (2)
This method gives
det A = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 - a31 a22 a13 - a32 a23 a11 - a33 a21 a12
In Exercise 19, you are asked to check that this result agrees with that from equation (1) for a 3 x 3 determinant.
Example 4.9
Calculate the determinant of the matrix in Example 4.8 using the method shown in (2).

Solution  We adjoin to A its first two columns and compute the six indicated products. Adding the three products at the bottom and subtracting the three products at the top gives
det A = 0 + (-12) + (-2) - 0 - (-10) - (-9) = 5
as before.

Warning  We are about to define determinants for arbitrary square matrices. However, there is no analogue of the method in Example 4.9 for larger matrices. It is valid only for 3 x 3 matrices.

Determinants of n x n Matrices

The definition of the determinant of a 3 x 3 matrix extends naturally to arbitrary square matrices.

Definition  Let A = [aij] be an n x n matrix, where n >= 2. Then the determinant of A is the scalar
det A = |A| = Σ_{j=1}^{n} (-1)^{1+j} a_{1j} det A_{1j}     (3)
It is convenient to combine a minor with its plus or minus sign. To this end, we define the (i, j)-cofactor of A to be
C_ij = (-1)^{i+j} det A_ij
With this notation, definition (3) becomes
det A = Σ_{j=1}^{n} a_{1j} C_{1j}     (4)
Exercise 20 asks you to check that this definition correctly gives the formula for the determinant of a 2 x 2 matrix when n = 2. Definition (4) is often referred to as cofactor expansion along the first row. It is an amazing fact that we get exactly the same result by expanding along any row (or even any column)! We summarize this fact as a theorem but defer the proof until the end of this section (since it is somewhat lengthy and would interrupt our discussion if we were to present it here).

Theorem 4.1  The Laplace Expansion Theorem
The determinant of an n x n matrix A = [aij], where n >= 2, can be computed as
det A = a_{i1} C_{i1} + a_{i2} C_{i2} + ... + a_{in} C_{in} = Σ_{j=1}^{n} a_{ij} C_{ij}     (5)
(which is the cofactor expansion along the ith row) and also as
det A = a_{1j} C_{1j} + a_{2j} C_{2j} + ... + a_{nj} C_{nj} = Σ_{i=1}^{n} a_{ij} C_{ij}     (6)
(the cofactor expansion along the jth column).

Since C_ij = (-1)^{i+j} det A_ij, each cofactor is plus or minus the corresponding minor, with the correct sign given by the term (-1)^{i+j}. A quick way to determine whether the sign is + or - is to remember that the signs form a "checkerboard" pattern:
[+ - + - ...]
[- + - + ...]
[+ - + - ...]
[- + - + ...]
[     ...   ]
Example 4.10
Compute the determinant of the matrix
A = [5  -3  2]
    [1   0  2]
    [2  -1  3]
by (a) cofactor expansion along the third row and (b) cofactor expansion along the second column.

Solution  (a) We compute
det A = 2 |-3 2; 0 2| - (-1) |5 2; 1 2| + 3 |5 -3; 1 0|
      = 2(-6) + 8 + 3(3)
      = 5
(b) In this case, we have
det A = -(-3) |1 2; 2 3| + 0 |5 2; 2 3| - (-1) |5 2; 1 2|
      = 3(-1) + 0 + 8
      = 5

Pierre Simon Laplace (1749-1827) was born in Normandy, France, and was expected to become a clergyman until his mathematical talents were noticed at school. He made many important contributions to calculus, probability, and astronomy. He was an examiner of the young Napoleon Bonaparte at the Royal Artillery Corps and later, when Napoleon was in power, served briefly as Minister of the Interior and then Chancellor of the Senate. Laplace was granted the title of Count of the Empire in 1806 and received the title of Marquis de Laplace in 1817.
Notice that in part (b) of Example 4.10 we needed to do fewer calculations than in part (a) because we were expanding along a column that contained a zero entry, namely a22; therefore, we did not need to compute C22. It follows that the Laplace Expansion Theorem is most useful when the matrix contains a row or column with lots of zeros, since, by choosing to expand along that row or column, we minimize the number of cofactors we need to compute.
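The recursive sketch below is not from the text; it simply translates cofactor expansion along the first row, as in definition (4), into code. It is fine for small matrices, though, as the text notes later, it is far too slow for large ones.

```python
# A short sketch of cofactor expansion along the first row (illustrative only).
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Submatrix A_1j: delete row 1 and column j
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

A = [[5, -3, 2],
     [1,  0, 2],
     [2, -1, 3]]
print(det(A))   # 5, as in Example 4.8
```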
Example 4.11
Compute the determinant of the 4 x 4 matrix A given in the text.

Solution  First notice that column 3 has only one nonzero entry; we should therefore expand along this column. Next note that the +/- pattern assigns a minus sign to the entry a23 = 2. Thus, we have
det A = a13 C13 + a23 C23 + a33 C33 + a43 C43
      = 0(C13) + 2 C23 + 0(C33) + 0(C43)
      = -2 det A23
where A23 is the 3 x 3 submatrix shown in the text. We now continue by expanding along the third row of the determinant above (the third column would also be a good choice) to get
det A = -2( -2(-8) - 5 ) = -2(11) = -22
(Note that the +/- pattern for the 3 x 3 minor is not that of the original matrix but that of a 3 x 3 matrix in general.)
The Laplace expansion is particularly useful when the matrix is (upper or lower) triangular.

Example 4.12
Compute the determinant of
A = [2  -3  1  0   4]
    [0   3  2  5   7]
    [0   0  1  6   0]
    [0   0  0  5   2]
    [0   0  0  0  -1]

Solution  We expand along the first column to get
det A = 2 det [3 2 5 7; 0 1 6 0; 0 0 5 2; 0 0 0 -1]
(We have omitted all cofactors corresponding to zero entries.) Now we expand along the first column again:
det A = 2 · 3 det [1 6 0; 0 5 2; 0 0 -1]
Continuing to expand along the first column, we complete the calculation:
det A = 2 · 3 · 1 · |5 2; 0 -1| = 2 · 3 · 1 · (5(-1) - 2 · 0) = 2 · 3 · 1 · 5 · (-1) = -30
Example 4.12 should convince you that the determinant of a triangular matrix is the product of its diagonal entries. You are asked to give a proof of this fact in Exercise 21. We record the result as a theorem.

Theorem 4.2
The determinant of a triangular matrix is the product of the entries on its main diagonal. Specifically, if A = [aij] is an n x n triangular matrix, then
det A = a11 a22 ··· ann

In general (that is, unless the matrix is triangular or has some other special form), computing a determinant by cofactor expansion is not efficient. For example, the determinant of a 3 x 3 matrix already has 6 = 3! summands, and a full cofactor expansion of an n x n determinant requires
T(n) = (n - 1) n! + n! - 1 > n!
operations. Even the fastest of supercomputers cannot calculate the determinant of a moderately large matrix this way.

Properties of Determinants
The most efficient way to compute determinants is to use row reduction. However, not every elementary row operation leaves the determinant of a matrix unchanged. The next theorem summarizes the main properties you need to understand in order to use row reduction effectively.

Theorem 4.3
Let A = [aij] be a square matrix.
a. If A has a zero row (column), then det A = 0.
b. If B is obtained by interchanging two rows (columns) of A, then det B = -det A.
c. If A has two identical rows (columns), then det A = 0.
d. If B is obtained by multiplying a row (column) of A by k, then det B = k det A.
e. If A, B, and C are identical except that the ith row (column) of C is the sum of the ith rows (columns) of A and B, then det C = det A + det B.
f. If B is obtained by adding a multiple of one row (column) of A to another row (column), then det B = det A.
Proof  We will prove (b) as Lemma 4.14 at the end of this section. The proofs of properties (a) and (f) are left as exercises. We will prove the remaining properties in terms of rows; the corresponding proofs for columns are analogous.
(c) If A has two identical rows, swap them to obtain the matrix B. Clearly, B = A, so det B = det A. On the other hand, by (b), det B = -det A. Therefore, det A = -det A, so det A = 0.
(d) Suppose row i of A is multiplied by k to produce B; that is, b_ij = k a_ij for j = 1, ..., n. Since the cofactors C_ij of the elements in the ith rows of A and B are identical (why?), expanding along the ith row of B gives
det B = Σ_{j=1}^{n} b_ij C_ij = Σ_{j=1}^{n} k a_ij C_ij = k Σ_{j=1}^{n} a_ij C_ij = k det A
(e) As in (d), the cofactors C_ij of the elements in the ith rows of A, B, and C are identical. Moreover, c_ij = a_ij + b_ij for j = 1, ..., n. We expand along the ith row of C to obtain
det C = Σ_{j=1}^{n} c_ij C_ij = Σ_{j=1}^{n} (a_ij + b_ij) C_ij = Σ_{j=1}^{n} a_ij C_ij + Σ_{j=1}^{n} b_ij C_ij = det A + det B

Notice that properties (b), (d), and (f) are related to elementary row operations. Since the echelon form of a square matrix is necessarily upper triangular, we can combine these properties with Theorem 4.2 to calculate determinants efficiently. (See Exploration: Counting Operations in Chapter 2, which shows that row reduction of an n x n matrix uses on the order of n³ operations, far fewer than the n! needed for cofactor expansion.) The next examples illustrate the computation of determinants using row reduction.
Example 4.13
Compute det A if
(a) A = [ 2   3  -1]          (b) A = (the 4 x 4 matrix given in the text)
        [ 0   5   3]
        [-4  -6   2]

Solution  (a) Using property (f) and then property (a), we have
det A = | 2  3  -1|   | 2  3  -1|
        | 0  5   3| = | 0  5   3| = 0
        |-4 -6   2|   | 0  0   0|
since adding twice row 1 to row 3 produces a zero row.
(b) We reduce A to echelon form (there are other possible ways to do this), keeping track of how each elementary row operation affects the determinant: each row interchange changes the sign, each scaling of a row by k multiplies the determinant by k, and adding a multiple of one row to another changes nothing. Carrying out the reduction shown in the text and compensating for each operation gives
det A = -585
Remark  By Theorem 4.3, we can also use elementary column operations in the process of computing determinants, and we can "mix and match" elementary row and column operations. For example, in Example 4.13(a), we could have started by adding column 3 to column 1 to create a leading 1 in the upper left-hand corner. In fact, the method we used was faster, but in other examples column operations may speed up the calculations. Keep this in mind when you work determinants by hand.
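The sketch below is illustrative only (it is not the book's algorithm): it computes a determinant by forward elimination, using properties (b), (d), and (f) of Theorem 4.3, and then multiplies the diagonal entries of the resulting triangular matrix as in Theorem 4.2.

```python
# Determinant by row reduction (illustrative sketch).
def det_by_row_reduction(A):
    A = [row[:] for row in A]      # work on a copy
    n = len(A)
    sign = 1.0
    for k in range(n):
        # Find a nonzero pivot in column k, swapping rows if necessary
        pivot_row = next((i for i in range(k, n) if A[i][k] != 0), None)
        if pivot_row is None:
            return 0.0             # no pivot: determinant is 0
        if pivot_row != k:
            A[k], A[pivot_row] = A[pivot_row], A[k]
            sign = -sign           # property (b): a swap changes the sign
        for i in range(k + 1, n):  # property (f): these steps change nothing
            factor = A[i][k] / A[k][k]
            A[i] = [a - factor * b for a, b in zip(A[i], A[k])]
    result = sign
    for k in range(n):
        result *= A[k][k]          # Theorem 4.2: product of the diagonal
    return result

print(det_by_row_reduction([[2, 3, -1], [0, 5, 3], [-4, -6, 2]]))   # 0.0
```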
Determinants of Elementary Matrices
Recall from Section 3.3 that an elementary matrix results from performing an elementary row operation on an identity matrix. Setting A = I_n in Theorem 4.3 yields the following theorem.

Theorem 4.4
Let E be an n x n elementary matrix.
a. If E results from interchanging two rows of I_n, then det E = -1.
b. If E results from multiplying one row of I_n by k, then det E = k.
c. If E results from adding a multiple of one row of I_n to another row, then det E = 1.

The word lemma is derived from the Greek verb lambanein, which means "to grasp." In mathematics, a lemma is a "helper theorem" that we "grasp hold of" and use to prove another, usually more important, theorem.

Proof  Since det I_n = 1, applying (b), (d), and (f) of Theorem 4.3 immediately gives (a), (b), and (c), respectively, of Theorem 4.4.

Next, recall that multiplying a matrix B by an elementary matrix on the left performs the corresponding elementary row operation on B. We can therefore rephrase (b), (d), and (f) of Theorem 4.3 succinctly as the following lemma, the proof of which is straightforward and is left as Exercise 43.

Lemma 4.5
Let B be an n x n matrix and let E be an n x n elementary matrix. Then
det(EB) = (det E)(det B)

We can use Lemma 4.5 to prove the main theorem of this section: a characterization of invertibility in terms of determinants.
Theorem 4.6
A square matrix A is invertible if and only if det A is nonzero.

Proof  Let A be an n x n matrix and let R be the reduced row echelon form of A. We will show first that det A is nonzero if and only if det R is nonzero. Let E1, E2, ..., Er be the elementary matrices corresponding to the elementary row operations that reduce A to R. Then
Er ··· E2 E1 A = R
Taking determinants of both sides and repeatedly applying Lemma 4.5, we obtain
(det Er) ··· (det E2)(det E1)(det A) = det R
By Theorem 4.4, the determinants of all the elementary matrices are nonzero. We conclude that det A is nonzero if and only if det R is nonzero.
Now suppose that A is invertible. Then, by the Fundamental Theorem of Invertible Matrices, R = I_n, so det R = 1, which is nonzero. Hence, det A is nonzero also. Conversely, if det A is nonzero, then det R is nonzero, so R cannot contain a zero row, by Theorem 4.3(a). It follows that R must be I_n (why?), so A is invertible, by the Fundamental Theorem again.

Determinants and Matrix Operations
Let's now try to determine what relationship, if any, exists between determinants and some of the basic matrix operations. Specifically, we would like to find formulas for det(kA), det(A + B), det(AB), det(A^{-1}), and det(A^T) in terms of det A and det B.
Theorem 4.3(d) does not say that det(kA) = k det A. The correct relationship between scalar multiplication and determinants is given by the following theorem.

Theorem 4.7
If A is an n x n matrix, then
det(kA) = k^n det A

You are asked to give the proof of this theorem in Exercise 44. Unfortunately, there is no simple formula for det(A + B), and in general, det(A + B) is not equal to det A + det B. (Find two 2 x 2 matrices that verify this.) It therefore comes as a pleasant surprise to find out that determinants are quite compatible with matrix multiplication. Indeed, we have the following nice formula due to Cauchy.

Augustin Louis Cauchy (1789-1857) was born in Paris and studied engineering but switched to mathematics because of poor health. A brilliant and prolific mathematician, he published over 700 papers, many on quite difficult problems. His name can be found on many theorems and definitions in differential equations, infinite series, probability theory, algebra, and physics. He is noted for introducing rigor into calculus, laying the foundation for the branch of mathematics known as analysis. Politically conservative, Cauchy was a royalist, and in 1830 he followed Charles X into exile. He returned to France in 1838 but did not return to his post at the Sorbonne until the university dropped its requirement that faculty swear an oath of loyalty to the new king.
Theorem 4.8
If A and B are n x n matrices, then
det(AB) = (det A)(det B)

Proof  We consider two cases: A invertible and A not invertible. If A is invertible, then, by the Fundamental Theorem of Invertible Matrices, it can be written as a product of elementary matrices, say
A = E1 E2 ··· Ek
Then AB = E1 E2 ··· Ek B, so k applications of Lemma 4.5 give
det(AB) = det(E1 E2 ··· Ek B) = (det E1)(det E2) ··· (det Ek)(det B)
Continuing to apply Lemma 4.5, we obtain
det(AB) = det(E1 E2 ··· Ek) det B = (det A)(det B)
If A is not invertible, then neither is AB, by Exercise 47 in Section 3.3. Thus, by Theorem 4.6, det A = 0 and det(AB) = 0. Consequently, det(AB) = (det A)(det B), since both sides are zero.

Example 4.14
Applying Theorem 4.8 to the 2 x 2 matrices A and B given in the text, we find that
det A = 4,   det B = 3,   and   det(AB) = 12 = 4 · 3 = (det A)(det B)
as claimed. (Check these assertions!)

The next theorem gives a nice relationship between the determinant of an invertible matrix and the determinant of its inverse.
Theorem 4.9
213
Deterrninants
If A is inveruble, then 1
det(A ') ~ -;-'--; det A
Since A is invertible, AA - t "" I, so det{AA - 1 ) "" det I I. Hence, (det A)( det A- I) = I, by Theorem 4,8, and SLOce det A of. 0 (why? ), dividing by del A yields the result.
Prlol
Ixample 4.15
Verify Theorem 4.9 for the matrix A of EXlImpJe 4.14.
Solution
We compute
3
I
8
8
=-- - =-= 4
1
det A
ae •• r.
The beauty or Theorem 4.9 is that somet imes we do not need to know what the inverse o r a matrix is, bu t only that it exists, or to know what its determi nant is. r o r the matrix A in the last two examples, once ,ve know that det A = 4 0, we immediately can deduce that A is inveruble and that det A- I =! WIthout actually computing A - I .
"*
We now relate the d eterminant or a matrix A to that of its transpose AT, Since the rows of AT are just the columns of A, evaluat ing det AT by expanding alo ng the first row is identical to evaluating det A by expand ing along ItS fi rst column , wh ich the Laplace Expansion Theorem allows us to d o, Thus, we have the fo11ow1l1 g result .
Theore. 4.10
For any square mllirix A,
del A = dctAT "
Gabriel Cramer (1704-J 752) was a Swi!;5 mathematician. The rule that bears hiS name was published in 1750, In his treatise til rroducrioll 10 the Allalysis of Aigdmllc Curves, As early as t 730, however, spedal cases or thc rormula ",'ere known to other mathematicians, Including the Scotsman Colin Maclaurin (J 698-1746) , perhaps the greatest of Ihe British mathematicians who were the usuccessors of Newton,~
Cramer·, Rule and lbe Ulolal In this section, we derive two useful formulas rel3ting determinants 10 the solution of lincllr systems and th e inverse of a matnx. The fi rst of these, Cramer'J Rule, gIves a (ormulll fo r describing the solution of certain systems or 11 linear equations in 11 vari· lIbles entIrely in terms of determ inants. Wh ile Ihis result is or little practical use beyo nd 2X2 systems, il is of grell t theoreticlil importance, We will need some new notal ion for Ih is result and its proof. Fo r an fiX 11 matrix A lind a vector b in ~n, let A,(b ) denote the matrix obta ined by repillcing the ilh col· umn of A by b, That is,
,
Column
A,(b)
~
j
[a,--- b --- a.J
214
Chapter 4 Eig~ nval ues and Eigenvectors
Cramer's RuJe Let A be an invertible " X" matrix and let b be a vector in IR". Then the unique so lution x of the system Ax = b is given by for i= I , ... ,'1
del A
Proor c2 "
•• ,
The colum ns of the iden tity matrIX I = I" are the standa rd unit vecto rs c" en' If Ax = b, then
AI,(x ) ""
A [ c 1 ...
- ["
..
X
'"
b ..
e"J
= [Ae l
' .J -
A,(b)
...
Ax
.. .
Ae"J
Therefore, by Theorem 4.8,
(dctA)(d
0
x,
0
]
x,
·•
•
·•
•
·•
det l,(x ) = 0
0
x,
o o
o ...
•
. ..
·• ·
0 0
0 0 0 0 ·•
0 ·•
•
- x,
•
• ••
x.-,
]
0
• ••
x.
0
I
as can be seen by expand ing alo ng the ilh row. Th us, (det A) x, = de t(A,(b )), and the result follows by dividing by det A (which is nonzero, since A is inven ible).
Example 4.1.6
Use Cramer's Rule to solve the system
x l +2x2 = 2 -Xl
SOIUtiOD
del A =
+ 4x2 =
I
We compute I
- ]
2 - 6 4 '
dOl(A ,(b)) -
2 2 I
4
= 6,
and
det(A 2(b » =
By C ramer's Rule,
d
6
=-= 1 ,nd 6
X, _
~dc:: "",(A2,(",b,-, )) _ ~ _ ~ dN A
6
2
]
2
-]
I
$«tion 4.2 Determinants
7:15
Be_.rl As noted above, Cramer's Rule is computationally inefficient for all but small systems of linear equations because it involves the calculation of many determinants. The effort expended to compute Just one of these determinants, using even the most efficient method , wou ld be beller spen t using Gaussian elimina tion to solve the system directly. The final result of this St(tion is a formu la fo r the inverse of a matrix in terms of determinants. This formula was hinted at by lhe formula for the inverse of a 3X3 ma trix, which was given Withou t proof at thc beginmng of this section. Thus, we have come full circle. Let's discover the formul
Therefore, Ax,
=
ej , and by Cramer's Rule, X :
det A
'I
Hoy.'cvcr, Hh (olUI11I1
"II
au
...
a ll
all
...
• 0 0
. ..
" I~ ab•
",.
. ..
• ••
o
: (- I))" det AJ'
=
CJ'
•••
which is the (j. j)-cofactorof A. It fo llows thaI x'I = (t l det AlC",. so A I = X = (l / det A)[ C.I'I = (I I det A)[ C,) T. In words, the IIlverse of A is the tmrlSposc of the matrix of cofuctors of A, divided by the determinant of A. The mat fiX
IC,]: IC,l':
C" C;, c;, <="
. C., · . . c,.,
C,. C;.
·
·
. c;..
is called the adjoint (or adjugate) of II and is denoted by ad; A. The result we have just proved can be stated as follows.
211
ChaplC'f 4
ElgC'nvaJuC'S and EigenvC'Ctof$
Theorem 4.12
•
: Ii
Let A be an invertible nX PI matrix. Then I ad' A det A J
A -I =
lumple 4.11
Use the adjoi nt method to compute the Inverse of
A =
1
2
- I
2
2
4
1 J
- 3
Solullon We compute det A = - 2 and the nine cofactors - 3
2
- I
3
- 3
+I~
- I
C' I = +
Cz,
ell
= -
=
• = - 18
2 3
•
=3 = 10
•
2
Cil = -
I
- 3
I C:z2 = + I
- I -3
I
- I
C'2 =
-
2
4
10 = - 2
=-6
Cu =
+
c" = C31 = +
2
2
I 3 I 2 I 3 I 2 2
2
=. = - \
=-
2
The adjoint is the trarlSpose of the ma trix of cofactors-namcJy,
adjA ""
- 18
10
4
3
- 2
- I
10
, =
-6 -2
- 18 10
3 - 2
-6
4
- I
- 2
10
Th,n
A
_I
- 18 10 3 1 . I = det Aad) A = - -2 10 - 2 - 6 = - I - 2
•
9
-,,
-5
-5
I
3
- 2
1
I
which is the same answer we obtained (wi th less work) In Example 3.30.
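The symbolic check below is illustrative only (SymPy is an assumed tool, not one the text uses). It recomputes the adjugate (adjoint) of the matrix of Example 4.17 and confirms that dividing it by the determinant gives the inverse, as Theorem 4.12 states.

```python
# Illustrative check of Example 4.17 and Theorem 4.12 with SymPy.
from sympy import Matrix, Rational

A = Matrix([[1, 2, -1],
            [2, 2,  4],
            [1, 3, -3]])
print(A.det())                      # -2
adj = A.adjugate()                  # transpose of the matrix of cofactors
print(adj)                          # Matrix([[-18, 3, 10], [10, -2, -6], [4, -1, -2]])
print(Rational(1, A.det()) * adj == A.inv())   # True
```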
Proof of I.e laPlace I.panslon Tbeorem Unfortunately, there is no short . e'asy proof of the Laplace Expansion Theorem. The proof we give has the merit of being relat ively straightforward. We break it down into several steps, the first of which is to prove that cofactor expansion along the first row of a matrix is the same' as cofactor expansion along the first column.
Ie .... 4.13
Let A be an nX n matrix. Then
Stction 4.2 Determinants
PrIG' Wf! prove this lemma by induction on
2:n
For /I = 1, the result is trivial. Now assume tha t the result is true for (n - I )X(ll - I ) matriccs; this is our induClion hypothesis. Note that, by thf! definition of cofactor (or minor), all of the terms contain illS {I I I
o. •
o. •
a,_ I,i
i l ,. , ,!
a,. I.l
o. •
a,. ,,!
O' '' I,}
o.
• •
o. •
o. •
o. •
(I,· "n
a,. I••
The jth term in this expansion of det A'l is 0 1/ ( _ 1)1 +r l det A,." where the nota tio n AtI,rf denotes the submalrix of A obtained by deleting rows k and I and columns r and s. Combining these, we see that the term con tain ing a"llJ l o n the right-hand side of equatio n ( 7) is
a,I ( - I) '''a 1/ (-1), .. / - 1 del A 1;' 1] -- (_1)'''/+'"i l aI) del A 1,1] What is the term containing (l0I 1i1/ on the left-hand side of equation (7)? The factor lll) occurs in the Jlh summand , lli/ C,/ = (I11 ( - I )1 + / d el A I / . By the induction hypotheSIS, we can expand det A,/ alo ng its first column:
au
al. r '
a"
o. •
lljl
o. •
a l. /_ I
a,.,.,
o. •
a,.,.,
a,. a,.
I
• •
ll~i-I
a i./H
o. •
a".
a., The nh term in th is expansion of del AI / is 11,.( - I )II-n +' del AI •. ,/ ,SQ the term containing a.,a l ) on the left -hand side of equation (7) is a,) ( - I) '"
( 1)(·- 1) .. , d et Ah.lj - ( - 1),, /.. 1a'lfl1j d el Ah.l)
(101 -
which establishes that the left- and right-hand sides of f!q ualion (7) are eqUivaleni.
Next, we prove property (b) o f Theorem 4 .3.
, 2
Ie ••• 4.14
Ltt A be an nX" matrix and let B be obtained by interchanging any two rows (col umns) of A . Then
det B = -dct A •
~
and EigenvC'Ctors
Proal
Once agai n, the proof is by induction on n. The result can be easily checked when II = 2, so assume that it is true for ( /I - 1) X (II - I) ma trices. We will prove that the res ult is true for /I X /I ma trices. First, we prove tha t it holds when two adjacen t rows o f A are intercha nged- say, rows rand, + I. By Lemma 4.13, we can evalua te de t B by cofactor expansion along its first column. The ith term in this e xpa nsIo n is (- I )1 + 'b,) del B". If I ra nd, r + I, then bi , = a, ) and BII IS an (II - 1) X (n - 1) submat ri x Ihat is ide ntical to A" except tha t two adjacen t rows have been inte rchanged.
'*
*"
a"
a"
·.. • ••
a"
·.. • •
a"
a.,
••
T h us, by the induc tion hypot hesis, det B,) "" - det A'l if ff I "" r, the n hoJ = a,+1.1 and Bil = A ,+ .. ).
j
'* rand
j
*" r + I.
·. .
a"
· ..
Row i_
· ..
·. . T h erefore the rt h sum mand in det B is
(-\ y~ Ib,1 det B,] Similarly, if i = r
+
= (- I
I , the n b'l
yH (I,~ 1.1 det AM-I.I =
(/,1'
= - ( - 1)('+1)+Ia +1. 1det A,+I.I
'
Bd = A,p and th e {r+ I)sl summand in det B IS
( - l){M- 1)+1h,+1. 1del B' I ).I = (- I
Ya" det A'I
= -( _1) M-1 a,1 det Ad
In other wo rds, the rth and ( r + I )st te rms in the first colum n cofactor expansion o f de t B are the tlegluives of the {r + I)S! and rth terms, respectively, in the first column cofactor expa nsion of de t A. Substituting all of these results into det Band using Lemma 4.13 again, we obta m
•
de t 8 = ~ ( - l) ' ~ l b;l detB'1
,- , •
L,- , (- 1) '+l b'l det B,] +
(- I y+l b'l det B'I
+ (- \ t
+I )+l b'+1. 1de! 8,+1, 1
''',.,+1
•
L (- I) '''' aile - de t Ail) I
i- I
''' ",+1
•
- L (-IY'la'l d etA'1 i'" I
- detA
(_1)(r~ 1)+ I(/r+ 1, 1 det AM- 1.1 -
(- 1)'+' a,l det Art
Sechon 4.2
Determmants
119
This proves the result for fi X " matrices if adjacent rows are interchanged. To sec that it holds fo r arbit rary row interchanges, we need only note tha t, for example. rows ra nd s, whe re r< s, can be swapped by perform ing 2($ - r) - 1 interchanges of adjacen t rows (see Exercise 67). Since the num ber of interchanges is odd and each one cha nges the sign of the de term inant, the n et effect is a change of sign , as desired. The proof for col um n interchanges is a nalogo us, except tha t we expand alo ng row I instead of alo ng column 1. We can now prove the Laplace Expansio n Th eorem.
Prool 01 Theorem 4.1 Let B be the m atrix obtained by moving row i of A to the top, using i - I intercha nges o f adjacent rows. By Lemma 4.14,det B = ( - I ),- 1det A. But 11., = a" and B., :::: A" fo r j = I •... , n. . .. ·..
'" det B = a._ I.• ti, ,,, !. I
·. .
·. . ·..
,..
...
11,.1.;
.. .
" i I.j
11, + I .~
••
· ..
...
"""
Thus,
" L (1)1·'II det ,., " " = ( _ I},-I L (- I )HJa'J det A,) = L (- I ) I ~jag det A"
det A "" ( - I ),- 1det 1J = ( - I ),- 1
,.,
IJ
B IJ
,.,
which gIVes the formula for cofactor expansio n along row I. T he proof for column expansion is similar, invoki ng Lemma 4.13 so that we can ~ use col umn expansion instead of row expansion (sec Exercise 68 ).
A Briel Hlslor, 01 DOlerminanlS As noted at the beginnmg of this sect ion. the his to ry of determ inan ts predates that of matrices. Indeed, determinants we re firs t introduced, independent ly, by SekJ in 1683 and Leibn lz in 1693. In 1748, determinants appeared in Maclaurin's TreatIse 011 Algebra, which included a treat ment of Cramer's Rule up to the 4 X4 case. In 17SO, Cram er hUl1Self proved the general case of his rule, a pplying it to curve fit ling, and III 1772, Laplace gave a proof of his expansion theo rem. The term determillalU was not coined until 1801, when it was used by Gauss. Cauchy made the first usc of determinants in the modern sense in 1812. Cauchy, III fact, was respo nsible fo r developing much of the early theory of determi nants. including several importa nt results that we have me ntioned: the prod uc t rule for determina nts, the characteris tic polynomial. and the notion of a diagon:dizable ITwtrix. Determinants d id no t become widely known until 1841, when Jacobi popularized them, albeit in the context of fu nctions o f several variables, such as are encountered In a multivariable calculus course. (These types of dctermillants were called " Jacobi:l1ls" by Sylvester around 1850, a term that isslill used today. )
281
Chapter 4
Eigenvalues and Eigen\'cctors
Gottfried Wilhelm von Leibllil. ( J646- 1716) W;lS born in Leipzig ~nd studied bw, theology, pnilosophy. and mathe m at i c,~. He is probably best known for developing (with Newton, mdependently) the main ide:15 of differential and intcgnll calculus. However. his contributions to other branches of mathematu:s are also im pressl\'e. 1·le develop·ed the notion of a determinant, knew versions of Cramer's Rule and the Laplace EKpansion Theorem Ix:fore others were gi\'en credit for them, and laid the foundat ion for mat nx theory through wo rk he did on quadratic forms. Leibnil. also was the I1 rSllO develop the binary system of arithmelic He believed in the importance of good notation and, along wi th the f.1miliar nOlal ion for derh-a tivcs and integrals, introduced a form of subscript notation fo r the coefficients of [\ Imear system Inal is essent ially the t1ol:11ion we use loday.
I•
Charles LutwidgC' Dodgson (18J2-
1898) is much beller known by his pen name. Lewis Carroll, under which he wrote Alice's Adwmurts III Wimderland and Through the Lookmg Glass He also wrote several mathematICS books and collections of logic puules.
By the late 19th century, the theory of determ ina nts had developed to the stage that entire books were d evoted 10 it, including Dodgson's 1\/1 Elementary Tlleory of Determinants in 1867 and Thomas Muir's mon umental five-volume wo rk, which appeared in the early 20th cen tury. While their history is fasci nating, today determ inants are of theoretical mo re than pract ical mterest. C r:lmer's Rule is a hopelessly inefficient method for solving system of linear equations, and numerical mel hods have re placed any use of determ inants in the computation of eigenvalues. Determinants are used. however, to give students an initial understanding of the characteristic polynomial (as in Sections 4.1 and 4.3).
Exercises 4.2

Compute the determinants in Exercises 1-6 using cofactor expansion along the first row and along the first column.

[Exercises 1-6: numerical 3×3 determinants]

Compute the determinants in Exercises 7-15 using cofactor expansion along any row or column that seems convenient.

[Exercises 7-15: numerical and symbolic determinants, including entries such as a, b, c, d and sin θ, cos θ, tan θ]

In Exercises 16-18, compute the indicated 3×3 determinants using the method of Example 4.9.

16. The determinant in Exercise 6
17. The determinant in Exercise 8
18. The determinant in Exercise 11

19. Verify that the method indicated in (2) agrees with equation (1) for a 3×3 determinant.

20. Verify that definition (4) agrees with the definition of a 2×2 determinant when n = 2.

21. Prove Theorem 4.2. (Hint: A proof by induction would be appropriate here.)

In Exercises 22-25, evaluate the given determinant using elementary row and/or column operations and Theorem 4.3 to reduce the matrix to row echelon form.

22. The determinant in Exercise 1
23. The determinant in Exercise 9
24. The determinant in Exercise 13
25. The determinant in Exercise 14

In Exercises 26-34, use properties of determinants to evaluate the given determinant by inspection. Explain your reasoning.

[Exercises 26-34: numerical determinants]

Find the determinants in Exercises 35-40, assuming that

det [ a  b  c ]
    [ d  e  f ] = 4
    [ g  h  i ]

[Exercises 35-40: determinants obtained from the matrix above by elementary row operations]

41. Prove Theorem 4.3(a).
42. Prove Theorem 4.3(f).
43. Prove Lemma 4.5.
44. Prove Theorem 4.7.

In Exercises 45 and 46, use Theorem 4.6 to find all values of k for which A is invertible.

[Exercises 45-46: 3×3 matrices whose entries involve k]

In Exercises 47-52, assume that A and B are n×n matrices with det A = 3 and det B = -2. Find the indicated determinants.

47. det(AB)
48. det(A^2)
49. det(B⁻¹A)
50. det(2A)
51. det(3B^T)
52. det(AA^T)

In Exercises 53-56, A and B are n×n matrices.

53. Prove that det(AB) = det(BA).
54. If B is invertible, prove that det(B⁻¹AB) = det(A).
55. If A is idempotent (that is, A^2 = A), find all possible values of det(A).
56. A square matrix A is called nilpotent if A^m = O for some m > 1. (The word nilpotent comes from the Latin nil, meaning "nothing," and potere, meaning "to have power." A nilpotent matrix is thus one that becomes "nothing," that is, the zero matrix, when raised to some power.) Find all possible values of det(A) if A is nilpotent.

In Exercises 57-60, use Cramer's Rule to solve the given linear system.

57. x + y = 1
    x - y = 2
58. 2x - y = 5
    x + 3y = -1
59. 2x + y + 3z = 1
    y + z = 1
    [third equation]
60. x + y - z = 1
    x + y + z = 2
    x - y     = 3

In Exercises 61-64, use Theorem 4.12 to compute the inverse of the coefficient matrix for the given exercise.

61. Exercise 57
62. Exercise 58
63. Exercise 59
64. Exercise 60

65. If A is an invertible n×n matrix, show that adj A is also invertible and that

(adj A)⁻¹ = (1/det A) A = adj(A⁻¹)

66. If A is an n×n matrix, prove that

det(adj A) = (det A)^(n-1)

67. Verify that if r < s, then rows r and s of a matrix can be interchanged by performing 2(s - r) - 1 interchanges of adjacent rows.

68. Prove that the Laplace Expansion Theorem holds for column expansion along the jth column.

69. Let A be a square matrix that can be partitioned as

A = [ P  Q ]
    [ O  S ]

where P and S are square matrices. Such a matrix is said to be in block (upper) triangular form. Prove that

det A = (det P)(det S)

(Hint: Try a proof by induction on the number of rows of P.)

70. (a) Give an example to show that if A can be partitioned as

A = [ P  Q ]
    [ R  S ]

where P, Q, R, and S are all square, then it is not necessarily true that

det A = (det P)(det S) - (det Q)(det R)

(b) Assume that A is partitioned as in part (a) and that P is invertible. Let

B = [  P⁻¹    O ]
    [ -RP⁻¹   I ]

Compute det(BA) using Exercise 69 and use the result to show that

det A = det P det(S - RP⁻¹Q)

(The matrix S - RP⁻¹Q is called the Schur complement of P in A, after Issai Schur (1875-1941), who was born in Belarus but spent most of his life in Germany. He is known mainly for his fundamental work on the representation theory of groups, but he also worked in number theory, analysis, and other areas.)

(c) Assume that A is partitioned as in part (a), that P is invertible, and that PR = RP. Prove that det A = det(PS - RQ).
Geometric Applications of Determinants

This exploration will reveal some of the amazing applications of determinants to geometry. In particular, we will see that determinants are closely related to area and volume formulas and can be used to produce the equations of lines, planes, and certain other curves. Most of these ideas arose when the theory of determinants was being developed as a subject in its own right.
The Cross Product

Recall from Exploration: The Cross Product in Chapter 1 that the cross product of u = [u1; u2; u3] and v = [v1; v2; v3] is the vector u × v defined by

u × v = [ u2v3 - u3v2 ]
        [ u3v1 - u1v3 ]
        [ u1v2 - u2v1 ]

If we write this cross product as (u2v3 - u3v2)e1 - (u1v3 - u3v1)e2 + (u1v2 - u2v1)e3, where e1, e2, and e3 are the standard basis vectors, then we see that the form of this formula is

u × v = det [ e1  u1  v1 ]
            [ e2  u2  v2 ]
            [ e3  u3  v3 ]

if we expand along the first column. (This is not a proper determinant, of course, since e1, e2, and e3 are vectors, not scalars; however, it gives a useful way of remembering the somewhat awkward cross product formula. It also lets us use properties of determinants to verify some of the properties of the cross product.)

Now let's revisit some of the exercises from Chapter 1.
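To see the mnemonic in action, here is a short Python sketch (not from the text; it assumes NumPy is available, and the vectors u and v are arbitrary sample values) that performs the cofactor expansion along the first column and checks the result against NumPy's built-in cross product.

```python
import numpy as np

def cross_via_cofactors(u, v):
    # Expand det [[e1, u1, v1], [e2, u2, v2], [e3, u3, v3]] along the first column.
    # Each 2x2 minor is an ordinary determinant; the cofactor signs alternate +, -, +.
    u1, u2, u3 = u
    v1, v2, v3 = v
    return np.array([
        u2 * v3 - u3 * v2,      # coefficient of e1
        -(u1 * v3 - u3 * v1),   # coefficient of e2 (note the minus sign)
        u1 * v2 - u2 * v1,      # coefficient of e3
    ])

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
print(cross_via_cofactors(u, v))   # [-3.  6. -3.]
print(np.cross(u, v))              # same result, computed by NumPy
```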
1. Use the determinant version of the cross product to compute u × v.

[parts (a)-(d): pairs of vectors u and v in R³]

2. If u = [u1; u2; u3], v = [v1; v2; v3], and w = [w1; w2; w3], show that

u · (v × w) = det [ u1  u2  u3 ]
                  [ v1  v2  v3 ]
                  [ w1  w2  w3 ]

3. Use properties of determinants (and Problem 2 above, if necessary) to prove the given property of the cross product.

(a) v × u = -(u × v)
(b) u × 0 = 0
(c) u × u = 0
(d) u × kv = k(u × v)
(e) u × (v + w) = u × v + u × w
(f) u · (u × v) = 0 and v · (u × v) = 0
(g) u · (v × w) = (u × v) · w (the triple scalar product identity)
Area and Volume

We can now give a geometric interpretation of the determinants of 2×2 and 3×3 matrices. Recall that if u and v are vectors in R³, then the area A of the parallelogram determined by these vectors is given by A = ||u × v||. (See Exploration: The Cross Product in Chapter 1.)

4. Let u = [u1; u2] and v = [v1; v2]. Show that the area A of the parallelogram determined by u and v is given by

A = | det [u v] |

(Hint: Write u and v as [u1; u2; 0] and [v1; v2; 0].)

[Figure 4.9: a parallelogram with vertices (0, 0), (a, b), (c, d), and (a + c, b + d), inscribed in a rectangle]

5. Derive the area formula in Problem 4 geometrically, using Figure 4.9 as a guide. (Hint: Subtract areas from the large rectangle until the parallelogram remains.) Where does the absolute value sign come from in this case?
6. Find the area of the parallelogram determined by u and v.

[Figure 4.10: a parallelepiped determined by u, v, and w, with the height h and the vector v × w shown]

Generalizing from Problems 4-6, consider a parallelepiped, a three-dimensional solid resembling a "slanted" brick, whose six faces are all parallelograms with opposite faces parallel and congruent (Figure 4.10). Its volume is given by the area of its base times its height.

7. Prove that the volume V of the parallelepiped determined by u, v, and w is given by the absolute value of the determinant of the 3×3 matrix [u v w] with u, v, and w as its columns. [Hint: From Figure 4.10 you can see that the height h can be expressed as h = ||u|| cos θ, where θ is the angle between u and v × w. Use this fact to show that V = |u · (v × w)| and apply the result of Problem 2.]

[Figure 4.11: a tetrahedron determined by u, v, and w]

8. Show that the volume V of the tetrahedron determined by u, v, and w (Figure 4.11) is given by

V = (1/6)|u · (v × w)|

[Hint: From geometry, we know that the volume of such a solid is V = (1/3)(area of the base)(height).]
Now let's view these geometric interpretations from a transformational point of view. Let A be a 2×2 matrix and let P be the parallelogram determined by the vectors u and v. We will consider the effect of the matrix transformation T_A on the area of P. Let T_A(P) denote the parallelogram determined by T_A(u) = Au and T_A(v) = Av.

9. Prove that the area of T_A(P) is given by |det A|(area of P).

10. Let A be a 3×3 matrix and let P be the parallelepiped determined by the vectors u, v, and w. Let T_A(P) denote the parallelepiped determined by T_A(u) = Au, T_A(v) = Av, and T_A(w) = Aw. Prove that the volume of T_A(P) is given by |det A|(volume of P).

The preceding problems illustrate that the determinant of a matrix captures what the corresponding matrix transformation does to the area or volume of figures upon which the transformation acts. (Although we have considered only certain types of figures, the result is perfectly general and can be made rigorous. We will not do so here.)
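The following Python sketch (illustrative only; the vectors and the matrix are arbitrary choices, not taken from the text) checks Problem 9 numerically: the area of the image parallelogram equals |det A| times the area of the original.

```python
import numpy as np

u = np.array([3.0, 1.0])
v = np.array([1.0, 2.0])
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# By Problem 4, the area of the parallelogram determined by two vectors in R^2 is |det [u v]|.
def parallelogram_area(u, v):
    return abs(np.linalg.det(np.column_stack((u, v))))

area_before = parallelogram_area(u, v)
area_after = parallelogram_area(A @ u, A @ v)

print(area_before)                           # 5.0
print(area_after)                            # 30.0
print(abs(np.linalg.det(A)) * area_before)   # 30.0, agreeing with Problem 9
```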
Lines and Planes

Suppose we are given two distinct points (x1, y1) and (x2, y2) in the plane. There is a unique line passing through these points, and its equation is of the form

ax + by + c = 0

Since the two given points are on this line, their coordinates satisfy this equation. Thus,

ax1 + by1 + c = 0
ax2 + by2 + c = 0

The three equations together can be viewed as a system of linear equations in the variables a, b, and c. Since there is a nontrivial solution (i.e., the line exists), the coefficient matrix

[ x   y   1 ]
[ x1  y1  1 ]
[ x2  y2  1 ]

cannot be invertible, by the Fundamental Theorem of Invertible Matrices. Consequently, its determinant must be zero, by Theorem 4.6. Expanding this determinant gives the equation of the line.

The equation of the line through the points (x1, y1) and (x2, y2) is given by

| x   y   1 |
| x1  y1  1 | = 0
| x2  y2  1 |
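As a quick illustration (a SymPy sketch, not part of the text; the two points are sample values), expanding the boxed determinant symbolically produces an equation ax + by + c = 0 that both points satisfy.

```python
from sympy import Matrix, symbols, expand

x, y = symbols('x y')
x1, y1 = 1, 2   # sample points; any two distinct points work
x2, y2 = 4, 3

# The boxed determinant: it vanishes exactly when (x, y) lies on the line.
M = Matrix([[x,  y,  1],
            [x1, y1, 1],
            [x2, y2, 1]])

line = expand(M.det())
print(line)                       # -x + 3*y - 5
print(line.subs({x: x1, y: y1}))  # 0  (the point (1, 2) satisfies the equation)
print(line.subs({x: x2, y: y2}))  # 0  (so does (4, 3))
```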
11. Use the method described above to find the equation of the line through the given points.

(a) [two points given in the text]   (b) (1, 2) and (4, 3)

12. Prove that the three points (x1, y1), (x2, y2), and (x3, y3) are collinear (lie on the same line) if and only if

| x1  y1  1 |
| x2  y2  1 | = 0
| x3  y3  1 |

13. Show that the equation of the plane through the three noncollinear points (x1, y1, z1), (x2, y2, z2), and (x3, y3, z3) is given by

| x   y   z   1 |
| x1  y1  z1  1 |
| x2  y2  z2  1 | = 0
| x3  y3  z3  1 |

What happens if the three points are collinear? [Hint: Explain what happens when row reduction is used to evaluate the determinant.]
14. Prove that the four points (x1, y1, z1), (x2, y2, z2), (x3, y3, z3), and (x4, y4, z4) are coplanar (lie in the same plane) if and only if

| x1  y1  z1  1 |
| x2  y2  z2  1 |
| x3  y3  z3  1 | = 0
| x4  y4  z4  1 |
Curve Fitting

When data arising from experimentation take the form of points (x, y) that can be plotted in the plane, it is often of interest to find a relationship between the variables x and y. Ideally, we would like to find a function whose graph passes through all of the points. Sometimes all we want is an approximation (see Section 7.3), but exact results are also possible in certain situations.

15. From Figure 4.12 it appears as though we may be able to find a parabola passing through the points A(-1, 10), B(0, 5), and C(3, 2). The equation of such a parabola is of the form y = a + bx + cx^2. By substituting the given points into this equation, set up a system of three linear equations in the variables a, b, and c. Without solving the system, use Theorem 4.6 to argue that it must have a unique solution. Then solve the system to find the equation of the parabola in Figure 4.12.

[Figure 4.12: the points A, B, and C plotted in the plane, with a parabola through them]

16. Use the method of Problem 15 to find the polynomials of degree at most 2 that pass through the following sets of points.

(a) A(1, -1), B(2, 4), C(3, 3)
(b) A(-1, -3), B(1, -1), C(3, 1)
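Here is a small SymPy sketch of the setup in Problem 15 (an illustration, not part of the text); the points A(-1, 10), B(0, 5), C(3, 2) are those given above, and the final solve is included only as a check of the hand computation.

```python
from sympy import Matrix, symbols, solve

a, b, c = symbols('a b c')
points = [(-1, 10), (0, 5), (3, 2)]   # A, B, C from Problem 15

# Substituting each point into y = a + b*x + c*x^2 gives one linear equation.
equations = [a + b * px + c * px**2 - py for px, py in points]

# The coefficient matrix has nonzero determinant, so the solution is unique
# (this is the Theorem 4.6 argument asked for in Problem 15).
coeff = Matrix([[1, px, px**2] for px, _ in points])
print(coeff.det())                    # 12, which is nonzero

print(solve(equations, [a, b, c]))    # {a: 5, b: -4, c: 1}, i.e. y = 5 - 4x + x^2
```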
17. Generalizing from Problems 15 and 16, suppose a1, a2, and a3 are distinct real numbers. For any real numbers b1, b2, and b3, we want to show that there is a unique quadratic with equation of the form y = a + bx + cx^2 passing through the points (a1, b1), (a2, b2), and (a3, b3). Do this by demonstrating that the coefficient matrix of the associated linear system has determinant

| 1  a1  a1^2 |
| 1  a2  a2^2 | = (a2 - a1)(a3 - a1)(a3 - a2)
| 1  a3  a3^2 |

which is necessarily nonzero. (Why?)

18. Let a1, a2, a3, and a4 be distinct real numbers. Show that

| 1  a1  a1^2  a1^3 |
| 1  a2  a2^2  a2^3 |
| 1  a3  a3^2  a3^3 | = (a2 - a1)(a3 - a1)(a4 - a1)(a3 - a2)(a4 - a2)(a4 - a3) ≠ 0
| 1  a4  a4^2  a4^3 |

For any real numbers b1, b2, b3, and b4, use this result to prove that there is a unique cubic with equation y = a + bx + cx^2 + dx^3 passing through the four points (a1, b1), (a2, b2), (a3, b3), and (a4, b4). (Do not actually solve for a, b, c, and d.)

19. Let a1, a2, ..., an be n real numbers. Prove that

| 1  a1  a1^2  ⋯  a1^(n-1) |
| 1  a2  a2^2  ⋯  a2^(n-1) |
| ⋮   ⋮    ⋮          ⋮     | = ∏ (aj - ai),  the product taken over 1 ≤ i < j ≤ n
| 1  an  an^2  ⋯  an^(n-1) |

where ∏ (aj - ai) means the product of all terms of the form (aj - ai), where i < j and both i and j are between 1 and n. [The determinant of a matrix of this form (or its transpose) is called a Vandermonde determinant, named after the French mathematician A. T. Vandermonde (1735-1796).] Deduce that for any n points in the plane whose x-coordinates are all distinct, there is a unique polynomial of degree n - 1 whose graph passes through the given points.
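A short SymPy check of the 3×3 case in Problem 17 (a sanity check only, not a proof; the symbols a1, a2, a3 are those of the problem).

```python
from sympy import Matrix, symbols, factor

a1, a2, a3 = symbols('a1 a2 a3')

V = Matrix([[1, a1, a1**2],
            [1, a2, a2**2],
            [1, a3, a3**2]])

# Factoring the 3x3 Vandermonde determinant recovers the product in Problem 17.
# The result equals (a2 - a1)*(a3 - a1)*(a3 - a2), though SymPy may arrange
# the factors and signs differently.
print(factor(V.det()))
```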
Eigenvalues and Eigenvectors of n×n Matrices

Now that we have defined the determinant of an n×n matrix, we can continue our discussion of eigenvalues and eigenvectors in a general context. Recall from Section 4.1 that λ is an eigenvalue of A if and only if A - λI is noninvertible. By Theorem 4.6, this is true if and only if det(A - λI) = 0. To summarize:
The eigenvalues of a square matrix A are precisely the solutions λ of the equation

det(A - λI) = 0
When we expand det(A - λI), we get a polynomial in λ, called the characteristic polynomial of A. The equation det(A - λI) = 0 is called the characteristic equation of A. For example, if A = [ a  b; c  d ], its characteristic polynomial is

det(A - λI) = | a - λ    b   |
              |   c    d - λ | = (a - λ)(d - λ) - bc = λ^2 - (a + d)λ + (ad - bc)

If A is n×n, its characteristic polynomial will be of degree n. According to the Fundamental Theorem of Algebra (see Appendix D), a polynomial of degree n with real or complex coefficients has at most n distinct roots. Applying this fact to the characteristic polynomial, we see that an n×n matrix with real or complex entries has at most n distinct eigenvalues.

Let's summarize the procedure we will follow (for now) to find the eigenvalues and eigenvectors (eigenspaces) of a matrix.
Let A be an n×n matrix.

1. Compute the characteristic polynomial det(A - λI) of A.
2. Find the eigenvalues of A by solving the characteristic equation det(A - λI) = 0 for λ.
3. For each eigenvalue λ, find the null space of the matrix A - λI. This is the eigenspace E_λ, the nonzero vectors of which are the eigenvectors of A corresponding to λ.
4. Find a basis for each eigenspace.
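The four steps translate directly into a short SymPy sketch (shown here, as an illustration only, for the matrix of Example 4.18 below).

```python
from sympy import Matrix, symbols, solve, eye

lam = symbols('lam')
A = Matrix([[0, 1, 0],
            [0, 0, 1],
            [2, -5, 4]])        # the matrix of Example 4.18

# Step 1: the characteristic polynomial det(A - lam*I)
p = (A - lam * eye(3)).det()
print(p.factor())               # -(lam - 2)*(lam - 1)**2

# Step 2: the eigenvalues are the roots of the characteristic equation
eigenvalues = solve(p, lam)
print(eigenvalues)              # [1, 2]

# Steps 3 and 4: for each eigenvalue, a basis for the null space of A - lam*I
for ev in eigenvalues:
    basis = (A - ev * eye(3)).nullspace()
    print(ev, [list(v) for v in basis])   # 1: [1, 1, 1];  2: [1/4, 1/2, 1]
```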
Example 4.18

Find the eigenvalues and the corresponding eigenspaces of

A = [ 0   1   0 ]
    [ 0   0   1 ]
    [ 2  -5   4 ]
Solution  We follow the procedure outlined above. The characteristic polynomial is

det(A - λI) = | -λ    1     0   |
              |  0   -λ     1   |
              |  2   -5   4 - λ |

            = -λ | -λ     1   |  -  1 | 0     1   |  +  0
                 | -5   4 - λ |       | 2   4 - λ |

            = -λ(λ^2 - 4λ + 5) - (-2)
            = -λ^3 + 4λ^2 - 5λ + 2

To find the eigenvalues, we need to solve the characteristic equation det(A - λI) = 0 for λ. The characteristic polynomial factors as -(λ - 1)^2(λ - 2). (The Factor Theorem is helpful here; see Appendix D.) Thus, the characteristic equation is -(λ - 1)^2(λ - 2) = 0, which clearly has solutions λ = 1 and λ = 2. Since λ = 1 is a multiple root and λ = 2 is a simple root, let us label them λ1 = λ2 = 1 and λ3 = 2.

To find the eigenvectors corresponding to λ1 = λ2 = 1, we find the null space of

A - I = [ -1   1   0 ]
        [  0  -1   1 ]
        [  2  -5   3 ]

Row reduction produces

[A - I | 0] = [ -1   1   0 | 0 ]      [ 1   0  -1 | 0 ]
              [  0  -1   1 | 0 ]  →   [ 0   1  -1 | 0 ]
              [  2  -5   3 | 0 ]      [ 0   0   0 | 0 ]

(We knew in advance that we must get at least one zero row. Why?) Thus, x = [x1; x2; x3] is in the eigenspace E1 if and only if x1 - x3 = 0 and x2 - x3 = 0. Setting the free variable x3 = t, we see that x1 = t and x2 = t, from which it follows that

E1 = { [t; t; t] } = { t[1; 1; 1] } = span( [1; 1; 1] )

To find the eigenvectors corresponding to λ3 = 2, we find the null space of A - 2I by row reduction:

[A - 2I | 0] = [ -2   1   0 | 0 ]      [ 1   0  -1/4 | 0 ]
               [  0  -2   1 | 0 ]  →   [ 0   1  -1/2 | 0 ]
               [  2  -5   2 | 0 ]      [ 0   0    0  | 0 ]

So x = [x1; x2; x3] is in the eigenspace E2 if and only if x1 = (1/4)x3 and x2 = (1/2)x3. Setting the free variable x3 = t, we have

E2 = { [t/4; t/2; t] } = span( [1/4; 1/2; 1] ) = span( [1; 2; 4] )

where we have cleared denominators in the basis by multiplying through by the least common denominator 4. (Why is this permissible?)
Remark  Notice that in Example 4.18, A is a 3×3 matrix but has only two distinct eigenvalues. However, if we count multiplicities, A has exactly three eigenvalues (λ = 1 twice and λ = 2 once). This is what the Fundamental Theorem of Algebra guarantees. Let us define the algebraic multiplicity of an eigenvalue to be its multiplicity as a root of the characteristic equation. Thus, λ = 1 has algebraic multiplicity 2 and λ = 2 has algebraic multiplicity 1. Next notice that each eigenspace has a basis consisting of just one vector. In other words, dim E1 = dim E2 = 1. Let us define the geometric multiplicity of an eigenvalue λ to be dim E_λ, the dimension of its corresponding eigenspace. As you will see in Section 4.4, a comparison of these two notions of multiplicity is important.
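To make the two notions concrete, here is a brief SymPy sketch (an illustration only, not part of the text) that reports both multiplicities for the matrix of Example 4.18; compare λ = 1, where they differ.

```python
from sympy import Matrix, eye

A = Matrix([[0, 1, 0],
            [0, 0, 1],
            [2, -5, 4]])   # the matrix of Example 4.18

# eigenvals() returns each eigenvalue together with its algebraic multiplicity.
for ev, alg_mult in A.eigenvals().items():
    geo_mult = len((A - ev * eye(3)).nullspace())  # dimension of the eigenspace
    print(ev, alg_mult, geo_mult)   # 1: algebraic 2, geometric 1;  2: 1 and 1
```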
Example 4.19

Find the eigenvalues and the corresponding eigenspaces of

A = [ -1   0   1 ]
    [  3   0  -3 ]
    [  1   0  -1 ]
Solution  The characteristic equation is

0 = det(A - λI) = | -1 - λ    0      1    |
                  |    3     -λ     -3    |
                  |    1      0   -1 - λ  |

                = -λ | -1 - λ     1    |        (expanding along the second column)
                     |    1    -1 - λ  |

                = -λ(λ^2 + 2λ) = -λ^2(λ + 2)
Hence, the eigenvalues are λ1 = λ2 = 0 and λ3 = -2. Thus, the eigenvalue 0 has algebraic multiplicity 2 and the eigenvalue -2 has algebraic multiplicity 1. For λ1 = λ2 = 0, we compute

[A - 0I | 0] = [A | 0] = [ -1   0   1 | 0 ]      [ 1   0  -1 | 0 ]
                         [  3   0  -3 | 0 ]  →   [ 0   0   0 | 0 ]
                         [  1   0  -1 | 0 ]      [ 0   0   0 | 0 ]

from which it follows that an eigenvector x = [x1; x2; x3] in E0 satisfies x1 = x3. Therefore, both x2 and x3 are free. Setting x2 = s and x3 = t, we have

E0 = { s[0; 1; 0] + t[1; 0; 1] } = span( [0; 1; 0], [1; 0; 1] )
For λ3 = -2,

[A - (-2)I | 0] = [A + 2I | 0] = [ 1   0   1 | 0 ]      [ 1   0   1 | 0 ]
                                 [ 3   2  -3 | 0 ]  →   [ 0   1  -3 | 0 ]
                                 [ 1   0   1 | 0 ]      [ 0   0   0 | 0 ]

so x3 = t is free, and x1 = -x3 = -t and x2 = 3x3 = 3t. Consequently,

E-2 = { [-t; 3t; t] } = { t[-1; 3; 1] } = span( [-1; 3; 1] )

It follows that λ1 = λ2 = 0 has geometric multiplicity 2 and λ3 = -2 has geometric multiplicity 1. (Note that the algebraic multiplicity equals the geometric multiplicity for each eigenvalue.)
In some situations, the eigenvalues of a matrix are very easy to find. If A is a triangular matrix, then so is A - λI, and Theorem 4.2 says that det(A - λI) is just the product of the diagonal entries. This implies that the characteristic equation of a triangular matrix is

(a11 - λ)(a22 - λ) ⋯ (ann - λ) = 0

from which it follows immediately that the eigenvalues are λ1 = a11, λ2 = a22, ..., λn = ann. We summarize this result as a theorem and illustrate it with an example.

Theorem 4.15
The eigenvalues of a triangular matrix are the entries on its main diagonal.
Example 4.20

The eigenvalues of

A = [ 2   0   0    0 ]
    [ -1  1   0    0 ]
    [ 3   0   3    0 ]
    [ 5   7   4   -2 ]

are λ1 = 2, λ2 = 1, λ3 = 3, and λ4 = -2, by Theorem 4.15. [Indeed, the characteristic polynomial is just (2 - λ)(1 - λ)(3 - λ)(-2 - λ).]
Note that diagonal matrices are a special case of Theorem 4.15. In fact, a diagonal matrix is both upper and lower triangular. Eigenvalues capture much important information about the behavior of a matrix. Once we know the eigenvalues of a matrix, we can deduce a great many things without doing any more work. The next theorem is one of the most important in this regard.
Theorem 4.16
A square matrix A is invertible if and only if 0 is not an eigenvalue of A.

Proof  Let A be a square matrix. By Theorem 4.6, A is invertible if and only if det A ≠ 0. But det A ≠ 0 is equivalent to det(A - 0I) ≠ 0, which says that 0 is not a root of the characteristic equation of A (i.e., 0 is not an eigenvalue of A).
We can now extend the Fundamental Theorem of Invertible Matrices to include results we have proved in this chapter.
Theorem 4.17
The Fundamental Theorem of Invertible Matrices: Version 3

Let A be an n×n matrix. The following statements are equivalent:

a. A is invertible.
b. Ax = b has a unique solution for every b in Rⁿ.
c. Ax = 0 has only the trivial solution.
d. The reduced row echelon form of A is I_n.
e. A is a product of elementary matrices.
f. rank(A) = n
g. nullity(A) = 0
h. The column vectors of A are linearly independent.
i. The column vectors of A span Rⁿ.
j. The column vectors of A form a basis for Rⁿ.
k. The row vectors of A are linearly independent.
l. The row vectors of A span Rⁿ.
m. The row vectors of A form a basis for Rⁿ.
n. det A ≠ 0
o. 0 is not an eigenvalue of A.
Proof  The equivalence (a) ⇔ (n) is Theorem 4.6, and we just proved (a) ⇔ (o) in Theorem 4.16.

There are nice formulas for the eigenvalues of the powers and inverses of a matrix.
Theorem 4.18
Let A be a square matrix with eigenvalue λ and corresponding eigenvector x.

a. For any positive integer n, λ^n is an eigenvalue of A^n with corresponding eigenvector x.
b. If A is invertible, then 1/λ is an eigenvalue of A⁻¹ with corresponding eigenvector x.
c. If A is invertible, then for any integer n, λ^n is an eigenvalue of A^n with corresponding eigenvector x.
Proof  We are given that Ax = λx.

(a) We proceed by induction on n. For n = 1, the result is just what has been given. Assume the result is true for n = k. That is, assume that, for some positive integer k, A^k x = λ^k x. We must now prove the result for n = k + 1. But

A^(k+1) x = A(A^k x) = A(λ^k x)

by the induction hypothesis. Using property (d) of Theorem 3.3, we have

A(λ^k x) = λ^k (Ax) = λ^k (λx) = λ^(k+1) x

Thus, A^(k+1) x = λ^(k+1) x, as required. By induction, the result is true for all integers n ≥ 1.

(b) You are asked to prove this property in Exercise 13.

(c) You are asked to prove this property in Exercise 14.
Example 4.21

Compute [ 0  1; 2  1 ]^10 [ 5; 1 ].

Solution  Let A = [ 0  1; 2  1 ] and x = [ 5; 1 ]; then what we want to find is A^10 x. The eigenvalues of A are λ1 = -1 and λ2 = 2, with corresponding eigenvectors v1 = [ 1; -1 ] and v2 = [ 1; 2 ]. That is,

Av1 = -v1   and   Av2 = 2v2

(Check this.) Since {v1, v2} forms a basis for R² (why?), we can write x as a linear combination of v1 and v2. Indeed, as is easily checked, x = 3v1 + 2v2. Therefore, using Theorem 4.18(a), we have

A^10 x = A^10 (3v1 + 2v2) = 3(A^10 v1) + 2(A^10 v2)
       = 3(λ1^10)v1 + 2(λ2^10)v2
       = 3(-1)^10 [ 1; -1 ] + 2(2^10) [ 1; 2 ]
       = [ 3 + 2^11; -3 + 2^12 ]
       = [ 2051; 4093 ]

This is certainly a lot easier than computing A^10 first; in fact, there are no matrix multiplications at all!
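A quick numerical check of Example 4.21 (a NumPy sketch, not from the text): compute A^10 x directly and via the eigenvector decomposition x = 3v1 + 2v2.

```python
import numpy as np

A = np.array([[0, 1],
              [2, 1]])
x = np.array([5, 1])
v1 = np.array([1, -1])   # eigenvector for lambda = -1
v2 = np.array([1, 2])    # eigenvector for lambda = 2

direct = np.linalg.matrix_power(A, 10) @ x
via_eigen = 3 * (-1) ** 10 * v1 + 2 * 2 ** 10 * v2   # Theorem 4.18(a) applied to x = 3v1 + 2v2

print(direct)      # [2051 4093]
print(via_eigen)   # [2051 4093]
```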
When it can be used, the method of Example 4.21 is quite general. We summarize it as the following theorem, which you are asked to prove in Exercise 42.

Theorem 4.19
Suppose the n×n matrix A has eigenvectors v1, v2, ..., vm with corresponding eigenvalues λ1, λ2, ..., λm. If x is a vector in Rⁿ that can be expressed as a linear combination of these eigenvectors, say,

x = c1v1 + c2v2 + ⋯ + cmvm

then, for any integer k,

A^k x = c1λ1^k v1 + c2λ2^k v2 + ⋯ + cmλm^k vm

Warning  The catch here is the "if" in the second sentence. There is absolutely no guarantee that such a linear combination is possible. The best possible situation would be if there were a basis of Rⁿ consisting of eigenvectors of A; we will explore this possibility further in the next section. As a step in that direction, however, we have the following theorem, which states that eigenvectors corresponding to distinct eigenvalues are linearly independent.
Theorem 4.20
Let A be an n×n matrix and let λ1, λ2, ..., λm be distinct eigenvalues of A with corresponding eigenvectors v1, v2, ..., vm. Then v1, v2, ..., vm are linearly independent.
Proof  The proof is indirect. We will assume that v1, v2, ..., vm are linearly dependent and show that this assumption leads to a contradiction.

If v1, v2, ..., vm are linearly dependent, then one of these vectors must be expressible as a linear combination of the previous ones. Let vk+1 be the first of the vectors vi that can be so expressed. In other words, v1, v2, ..., vk are linearly independent, but there are scalars c1, c2, ..., ck such that

vk+1 = c1v1 + c2v2 + ⋯ + ckvk     (1)

Multiplying both sides of equation (1) by A from the left and using the fact that Avi = λivi for each i, we have

λk+1 vk+1 = Avk+1 = A(c1v1 + c2v2 + ⋯ + ckvk)
          = c1Av1 + c2Av2 + ⋯ + ckAvk
          = c1λ1v1 + c2λ2v2 + ⋯ + ckλkvk     (2)

Now we multiply both sides of equation (1) by λk+1 to get

λk+1 vk+1 = c1λk+1 v1 + c2λk+1 v2 + ⋯ + ckλk+1 vk     (3)

When we subtract equation (3) from equation (2), we obtain

0 = c1(λ1 - λk+1)v1 + c2(λ2 - λk+1)v2 + ⋯ + ck(λk - λk+1)vk

The linear independence of v1, v2, ..., vk implies that

c1(λ1 - λk+1) = c2(λ2 - λk+1) = ⋯ = ck(λk - λk+1) = 0

Since the eigenvalues λi are all distinct, the terms in parentheses (λi - λk+1), i = 1, ..., k, are all nonzero. Hence, c1 = c2 = ⋯ = ck = 0. This implies that

vk+1 = c1v1 + c2v2 + ⋯ + ckvk = 0v1 + 0v2 + ⋯ + 0vk = 0

which is impossible, since the eigenvector vk+1 cannot be zero. Thus, we have a contradiction, which means that our assumption that v1, v2, ..., vm are linearly dependent is false. It follows that v1, v2, ..., vm must be linearly independent.
Exercises 4.3

In Exercises 1-12, compute (a) the characteristic polynomial of A, (b) the eigenvalues of A, (c) a basis for each eigenspace of A, and (d) the algebraic and geometric multiplicity of each eigenvalue.

[Exercises 1-12: 2×2, 3×3, and 4×4 matrices]

13. Prove Theorem 4.18(b).
14. Prove Theorem 4.18(c). [Hint: Combine the proofs of parts (a) and (b) and see the fourth Remark following Theorem 3.9 (p. 167).]

In Exercises 15 and 16, A is a 2×2 matrix with eigenvectors v1 and v2 corresponding to eigenvalues λ1 = 1/2 and λ2 = 2, respectively, and x is a given vector in R².

15. Find A^10 x.
16. Find A^k x. What happens as k becomes large (i.e., k → ∞)?

In Exercises 17 and 18, A is a 3×3 matrix with eigenvectors v1 = [1; 0; 0], v2 = [1; 1; 0], and v3 = [1; 1; 1] corresponding to eigenvalues λ1 = -1/3, λ2 = 1/3, and λ3 = 1, respectively, and x is a given vector in R³.

17. Find A^10 x.
18. Find A^k x. What happens as k becomes large (i.e., k → ∞)?

19. (a) Show that, for any square matrix A, A^T and A have the same characteristic polynomial and hence the same eigenvalues.
(b) Give an example of a 2×2 matrix A for which A^T and A have different eigenspaces.

20. Let A be a nilpotent matrix (that is, A^m = O for some m > 1). Show that λ = 0 is the only eigenvalue of A.

21. Let A be an idempotent matrix (that is, A^2 = A). Show that λ = 0 and λ = 1 are the only possible eigenvalues of A.

22. If v is an eigenvector of A with corresponding eigenvalue λ and c is a scalar, show that v is an eigenvector of A - cI with corresponding eigenvalue λ - c.

23. (a) Find the eigenvalues and eigenspaces of

A = [a 2×2 matrix]

(b) Using Theorem 4.18 and Exercise 22, find the eigenvalues and eigenspaces of A⁻¹, A - 2I, and A + 2I.

24. Let A and B be n×n matrices with eigenvalues λ and μ, respectively.
(a) Give an example to show that λ + μ need not be an eigenvalue of A + B.
(b) Give an example to show that λμ need not be an eigenvalue of AB.
(c) Suppose λ and μ correspond to the same eigenvector x. Show that, in this case, λ + μ is an eigenvalue of A + B and λμ is an eigenvalue of AB.

25. If A and B are two row equivalent matrices, do they necessarily have the same eigenvalues? Either prove that they do or give a counterexample.

Let p(x) be the polynomial

p(x) = x^n + a_(n-1) x^(n-1) + ⋯ + a1 x + a0

The companion matrix of p(x) is the n×n matrix

C(p) = [ -a_(n-1)  -a_(n-2)  ⋯  -a1  -a0 ]
       [    1          0     ⋯    0    0 ]
       [    0          1     ⋯    0    0 ]     (4)
       [    ⋮          ⋮           ⋮    ⋮ ]
       [    0          0     ⋯    1    0 ]

26. Find the companion matrix of p(x) = x^2 - 7x + 12 and then find the characteristic polynomial of C(p).

27. Find the companion matrix of p(x) = x^3 + 3x^2 - 4x + 12 and then find the characteristic polynomial of C(p).

28. (a) Show that the companion matrix C(p) of p(x) = x^2 + ax + b has characteristic polynomial λ^2 + aλ + b.
(b) Show that if λ is an eigenvalue of the companion matrix C(p) in part (a), then [λ; 1] is an eigenvector of C(p) corresponding to λ.

29. (a) Show that the companion matrix C(p) of p(x) = x^3 + ax^2 + bx + c has characteristic polynomial -(λ^3 + aλ^2 + bλ + c).
(b) Show that if λ is an eigenvalue of the companion matrix C(p) in part (a), then [λ^2; λ; 1] is an eigenvector of C(p) corresponding to λ.

30. Construct a nontriangular 2×2 matrix with eigenvalues 2 and 5. (Hint: Use Exercise 28.)

31. Construct a nontriangular 3×3 matrix with eigenvalues -2, 1, and 3. (Hint: Use Exercise 29.)

32. (a) Use mathematical induction to prove that, for n ≥ 2, the companion matrix C(p) of p(x) = x^n + a_(n-1) x^(n-1) + ⋯ + a1 x + a0 has characteristic polynomial (-1)^n p(λ). [Hint: Expand by cofactors along the last column. You may find it helpful to introduce the polynomial q(x) = (p(x) - a0)/x.]
(b) Show that if λ is an eigenvalue of the companion matrix C(p) in equation (4), then an eigenvector corresponding to λ is given by

[ λ^(n-1); λ^(n-2); ...; λ; 1 ]

If p(x) = x^n + a_(n-1) x^(n-1) + ⋯ + a1 x + a0 and A is a square matrix, we can define a square matrix p(A) by

p(A) = A^n + a_(n-1) A^(n-1) + ⋯ + a1 A + a0 I

An important theorem in advanced linear algebra says that if c_A(λ) is the characteristic polynomial of the matrix A, then c_A(A) = O (in words, every matrix satisfies its characteristic equation). This is the celebrated Cayley-Hamilton Theorem, named after Arthur Cayley (1821-1895) and Sir William Rowan Hamilton (see page 2). Cayley proved this theorem in 1858. Hamilton discovered it, independently, in his work on quaternions, a generalization of the complex numbers.

33. Verify the Cayley-Hamilton Theorem for the given 2×2 matrix A. That is, find the characteristic polynomial c_A(λ) of A and show that c_A(A) = O.

34. Verify the Cayley-Hamilton Theorem for the given 3×3 matrix A.

The Cayley-Hamilton Theorem can be used to calculate powers and inverses of matrices. For example, if A is a 2×2 matrix with characteristic polynomial c_A(λ) = λ^2 + aλ + b, then A^2 + aA + bI = O, so

A^2 = -aA - bI

and

A^3 = AA^2 = A(-aA - bI) = -aA^2 - bA = -a(-aA - bI) - bA = (a^2 - b)A + abI

It is easy to see that by continuing in this fashion we can express any positive power of A as a linear combination of I and A. From A^2 + aA + bI = O, we also obtain A(A + aI) = -bI, so

A⁻¹ = -(1/b)A - (a/b)I

provided b ≠ 0.

35. For the matrix A in Exercise 33, use the Cayley-Hamilton Theorem to compute A^2, A^3, and A^4 by expressing each as a linear combination of I and A.
36. For the matrix A in Exercise 34, use the Cayley-Hamilton Theorem to compute A^3 and A^4 by expressing each as a linear combination of I, A, and A^2.
37. For the matrix A in Exercise 33, use the Cayley-Hamilton Theorem to compute A⁻¹ and A⁻² by expressing each as a linear combination of I and A.
38. For the matrix A in Exercise 34, use the Cayley-Hamilton Theorem to compute A⁻¹ and A⁻² by expressing each as a linear combination of I, A, and A^2.

39. Show that if the square matrix A can be partitioned as

A = [ P  Q ]
    [ O  S ]

where P and S are square, then the eigenvalues of A are the eigenvalues of P together with the eigenvalues of S.

40. Let λ1, λ2, ..., λn be a complete set of eigenvalues (repetitions included) of the n×n matrix A. Prove that

det(A) = λ1λ2⋯λn   and   tr(A) = λ1 + λ2 + ⋯ + λn

[Hint: The characteristic polynomial of A factors as

det(A - λI) = (-1)^n (λ - λ1)(λ - λ2)⋯(λ - λn)

Find the constant term and the coefficient of λ^(n-1) on the left and right sides of this equation.]

41. Let A and B be n×n matrices. Prove that the sum of all the eigenvalues of A + B is the sum of all the eigenvalues of A and B individually. Prove that the product of all the eigenvalues of AB is the product of all the eigenvalues of A and B individually. (Compare this exercise with Exercise 24.)

42. Prove Theorem 4.19.
Similarity and Diagonalization

As you saw in the last section, triangular and diagonal matrices are nice in the sense that their eigenvalues are transparently displayed. It would be pleasant if we could relate a given square matrix to a triangular or diagonal one in such a way that they had exactly the same eigenvalues. Of course, we already know one procedure for converting a square matrix into triangular form, namely, Gaussian elimination. Unfortunately, this process does not preserve the eigenvalues of the matrix. In this section, we consider a different sort of transformation of a matrix that does behave well with respect to eigenvalues.
Similar Matrices

Definition  Let A and B be n×n matrices. We say that A is similar to B if there is an invertible n×n matrix P such that P⁻¹AP = B. If A is similar to B, we write A ~ B.

Remarks
• If A ~ B, we can write, equivalently, that A = PBP⁻¹ or AP = PB.
• Similarity is a relation on square matrices in the same sense that "less than or equal to" is a relation on the integers. Note that there is a direction (or order) implicit in the definition. Just as a ≤ b does not necessarily imply b ≤ a, we should not assume that A ~ B implies B ~ A. (In fact, this is true, as we will prove in the next theorem, but it does not follow immediately from the definition.)
• The matrix P depends on A and B. It is not unique for a given pair of similar matrices A and B. To see this, simply take A = B = I, in which case I ~ I, since P⁻¹IP = I for any invertible matrix P.
Example 4.22

Let A = [ 1  2; 0  -1 ] and B = [ 1  0; -2  -1 ]. Then A ~ B, since

[ 1   2 ] [ 1  -1 ]   [ 1  -1 ] [  1   0 ]
[ 0  -1 ] [ 1   1 ] = [ 1   1 ] [ -2  -1 ]

Thus, AP = PB with P = [ 1  -1; 1  1 ]. (Note that it is not necessary to compute P⁻¹. See the first Remark above.)
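A quick NumPy check of Example 4.22 (an illustrative sketch, not part of the text): with the P shown there, AP = PB, or equivalently P⁻¹AP = B.

```python
import numpy as np

A = np.array([[1, 2],
              [0, -1]])
B = np.array([[1, 0],
              [-2, -1]])
P = np.array([[1, -1],
              [1, 1]])

print(A @ P)                      # equals P @ B, so AP = PB
print(P @ B)
print(np.linalg.inv(P) @ A @ P)   # recovers B, confirming A ~ B
```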
Theorem 4.21
Let A, B, and C be n×n matrices.

a. A ~ A.
b. If A ~ B, then B ~ A.
c. If A ~ B and B ~ C, then A ~ C.

Proof  (a) This property follows from the fact that I⁻¹AI = A.

(b) If A ~ B, then P⁻¹AP = B for some invertible matrix P. As noted in the first Remark above, this is equivalent to PBP⁻¹ = A. Setting Q = P⁻¹, we have Q⁻¹BQ = (P⁻¹)⁻¹BP⁻¹ = PBP⁻¹ = A. Therefore, by definition, B ~ A.

(c) You are asked to prove property (c) in Exercise 30.
Remark  Any relation satisfying the three properties of Theorem 4.21 is called an equivalence relation. Equivalence relations arise frequently in mathematics, and objects that are related via an equivalence relation usually share important properties. We are about to see that this is true of similar matrices.
Theorem 4.22
Let A and B be n×n matrices with A ~ B. Then

a. det A = det B
b. A is invertible if and only if B is invertible.
c. A and B have the same rank.
d. A and B have the same characteristic polynomial.
e. A and B have the same eigenvalues.
Proof  We prove (a) and (d) and leave the remaining properties as exercises. If A ~ B, then P⁻¹AP = B for some invertible matrix P.

(a) Taking determinants of both sides, we have

det B = det(P⁻¹AP) = (det P⁻¹)(det A)(det P) = (1/det P)(det A)(det P) = det A

(d) The characteristic polynomial of B is

det(B - λI) = det(P⁻¹AP - λI)
            = det(P⁻¹AP - λP⁻¹IP)
            = det(P⁻¹AP - P⁻¹(λI)P)
            = det(P⁻¹(A - λI)P)
            = det(A - λI)

with the last step following as in (a). Thus, det(B - λI) = det(A - λI); that is, the characteristic polynomials of B and A are the same.
Remark  Two matrices may have properties (a) through (e) (and more) in common and yet still not be similar. For example, A = [ 1  1; 0  1 ] and B = [ 1  0; 0  1 ] both have determinant 1 and rank 2, are invertible, and have characteristic polynomial (1 - λ)^2 and eigenvalues λ1 = λ2 = 1. But A is not similar to B, since P⁻¹BP = P⁻¹IP = I ≠ A for any invertible matrix P.

Theorem 4.22 is most useful in showing that two matrices are not similar, since A and B cannot be similar if any of properties (a) through (e) fails.

Example 4.23

(a) The matrices A and B given in the text are not similar, since det A = -3 but det B = 3.

(b) The matrices A and B given in the text are not similar, since the characteristic polynomial of A is λ^2 - 3λ - 4 while that of B is λ^2 - 4. (Check this.) Note that A and B do have the same determinant and rank, however.
Diagonalization

The best possible situation is when a square matrix is similar to a diagonal matrix. As you are about to see, whether a matrix is diagonalizable is closely related to the eigenvalues and eigenvectors of the matrix.

Definition  An n×n matrix A is diagonalizable if there is a diagonal matrix D such that A is similar to D, that is, if there is an invertible n×n matrix P such that P⁻¹AP = D.
Example 4.24

The matrix A of Example 4.23(b) is diagonalizable, since if P is the matrix given in the text and D = [ 4  0; 0  -1 ], then P⁻¹AP = D, as can be easily checked. (Actually, it is faster to check the equivalent statement AP = PD, since it does not require finding P⁻¹.)

Example 4.24 begs the question of where the matrices P and D came from. Observe that the diagonal entries 4 and -1 of D are the eigenvalues of A, since they are the roots of its characteristic polynomial, which we found in Example 4.23(b). The origin of matrix P is less obvious, but, as we are about to demonstrate, its entries are obtained from the eigenvectors of A. Theorem 4.23 makes this connection precise.
Theorem 4.23
Let A be an n×n matrix. Then A is diagonalizable if and only if A has n linearly independent eigenvectors.

More precisely, there exist an invertible matrix P and a diagonal matrix D such that P⁻¹AP = D if and only if the columns of P are n linearly independent eigenvectors of A and the diagonal entries of D are the eigenvalues of A corresponding to the eigenvectors in P in the same order.
Proof  Suppose first that A is similar to the diagonal matrix D via P⁻¹AP = D or, equivalently, AP = PD. Let the columns of P be p1, p2, ..., pn, and let the diagonal entries of D be λ1, λ2, ..., λn. Then

AP = A[p1 p2 ⋯ pn] = [p1 p2 ⋯ pn] [ λ1  0  ⋯  0 ]
                                   [ 0  λ2  ⋯  0 ]     (1)
                                   [ ⋮   ⋮       ⋮ ]
                                   [ 0   0  ⋯  λn ]

or

[Ap1 Ap2 ⋯ Apn] = [λ1p1 λ2p2 ⋯ λnpn]     (2)

where the right-hand side is just the column-row representation of the product PD. Equating columns, we have

Ap1 = λ1p1, Ap2 = λ2p2, ..., Apn = λnpn

which proves that the column vectors of P are eigenvectors of A whose corresponding eigenvalues are the diagonal entries of D in the same order. Since P is invertible, its columns are linearly independent, by the Fundamental Theorem of Invertible Matrices.

Conversely, if A has n linearly independent eigenvectors p1, p2, ..., pn with corresponding eigenvalues λ1, λ2, ..., λn, respectively, then

Ap1 = λ1p1, Ap2 = λ2p2, ..., Apn = λnpn

This implies equation (2) above, which is equivalent to equation (1). Consequently, if we take P to be the n×n matrix with columns p1, p2, ..., pn, then equation (1) becomes AP = PD. Since the columns of P are linearly independent, the Fundamental Theorem of Invertible Matrices implies that P is invertible, so P⁻¹AP = D; that is, A is diagonalizable.
Example 4.25

If possible, find a matrix P that diagonalizes

A = [ 0   1   0 ]
    [ 0   0   1 ]
    [ 2  -5   4 ]

Solution  We studied this matrix in Example 4.18, where we discovered that it has eigenvalues λ1 = λ2 = 1 and λ3 = 2. The eigenspaces have the following bases:

For λ1 = λ2 = 1, E1 has basis [ 1; 1; 1 ].
For λ3 = 2, E2 has basis [ 1; 2; 4 ].

Since all other eigenvectors are just multiples of one of these two basis vectors, there cannot be three linearly independent eigenvectors. By Theorem 4.23, therefore, A is not diagonalizable.
Example 4.26

If possible, find a matrix P that diagonalizes

A = [ -1   0   1 ]
    [  3   0  -3 ]
    [  1   0  -1 ]

Solution  This is the matrix of Example 4.19. There, we found that the eigenvalues of A are λ1 = λ2 = 0 and λ3 = -2, with the following bases for the eigenspaces:

For λ1 = λ2 = 0, E0 has basis p1 = [ 0; 1; 0 ] and p2 = [ 1; 0; 1 ].
For λ3 = -2, E-2 has basis p3 = [ -1; 3; 1 ].

It is straightforward to check that these three vectors are linearly independent. Thus, if we take

P = [p1 p2 p3] = [ 0   1  -1 ]
                 [ 1   0   3 ]
                 [ 0   1   1 ]

then P is invertible. Furthermore,

P⁻¹AP = [ 0   0   0 ]
        [ 0   0   0 ] = D
        [ 0   0  -2 ]

as can be easily checked. (If you are checking by hand, it is much easier to check the equivalent equation AP = PD.)
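The computation in Example 4.26 can be checked in a few lines of NumPy (a sketch only; the columns of P are the eigenvectors found above).

```python
import numpy as np

A = np.array([[-1, 0, 1],
              [3, 0, -3],
              [1, 0, -1]])
P = np.array([[0, 1, -1],
              [1, 0, 3],
              [0, 1, 1]])

D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))
# [[ 0.  0.  0.]
#  [ 0.  0.  0.]
#  [ 0.  0. -2.]]

# Equivalently, AP = PD, which avoids computing the inverse by hand.
print(np.allclose(A @ P, P @ D))   # True
```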
Remarks
• When there are enough eigenvectors, they can be placed into the columns of P in any order. However, the eigenvalues will come up on the diagonal of D in the same order as their corresponding eigenvectors in P. For example, if we had chosen

P = [p1 p3 p2] = [ 0  -1   1 ]
                 [ 1   3   0 ]
                 [ 0   1   1 ]

then we would have found

P⁻¹AP = [ 0   0   0 ]
        [ 0  -2   0 ]
        [ 0   0   0 ]

• In Example 4.26, you were asked to check that the eigenvectors p1, p2, and p3 were linearly independent. Was it necessary to check this? We knew that {p1, p2} was linearly independent, since it was a basis for the eigenspace E0. We also knew that the sets {p1, p3} and {p2, p3} were linearly independent, by Theorem 4.20. But we could not conclude from this information that {p1, p2, p3} was linearly independent. The next theorem, however, guarantees that linear independence is preserved when the bases of different eigenspaces are combined.
Theorem 4.24
Let A be an n×n matrix and let λ1, λ2, ..., λk be distinct eigenvalues of A. If Bi is a basis for the eigenspace E_λi, then B = B1 ∪ B2 ∪ ⋯ ∪ Bk (i.e., the total collection of basis vectors for all of the eigenspaces) is linearly independent.
Proof  Let Bi = {vi1, vi2, ..., vi n_i} for i = 1, ..., k. We have to show that

B = {v11, ..., v1n_1, v21, ..., v2n_2, ..., vk1, ..., vkn_k}

is linearly independent. Suppose some nontrivial linear combination of these vectors is the zero vector, say,

(c11v11 + ⋯ + c1n_1 v1n_1) + (c21v21 + ⋯ + c2n_2 v2n_2) + ⋯ + (ck1vk1 + ⋯ + ckn_k vkn_k) = 0     (3)

Denoting the sums in parentheses by x1, x2, ..., xk, we can write equation (3) as

x1 + x2 + ⋯ + xk = 0     (4)

Now each xi is in E_λi (why?) and so either is an eigenvector corresponding to λi or is 0. But, since the eigenvalues λi are distinct, if any of the vectors xi is an eigenvector, they are linearly independent, by Theorem 4.20. Yet equation (4) is a linear dependence relationship; this is a contradiction. We conclude that equation (3) must be trivial; that is, all of its coefficients are zero. Hence, B is linearly independent.

There is one case in which diagonalizability is automatic: an n×n matrix with n distinct eigenvalues.

Theorem 4.25
If A is an n×n matrix with n distinct eigenvalues, then A is diagonalizable.

Proof  Let v1, v2, ..., vn be eigenvectors corresponding to the n distinct eigenvalues of A. (Why could there not be more than n such eigenvectors?) By Theorem 4.20, v1, v2, ..., vn are linearly independent, so, by Theorem 4.23, A is diagonalizable.
Example 4.27

The matrix

A = [ 2  -3   7 ]
    [ 0   5   1 ]
    [ 0   0  -1 ]

has eigenvalues λ1 = 2, λ2 = 5, and λ3 = -1, by Theorem 4.15. Since these are three distinct eigenvalues for a 3×3 matrix, A is diagonalizable, by Theorem 4.25. (If we actually require a matrix P such that P⁻¹AP is diagonal, we must still compute bases for the eigenspaces, as in Example 4.19 and Example 4.26 above.)
The final theorem of this section is an important result that characterizes diagonalizable matrices in terms of the two notions of multiplicity that were introduced following Example 4.18. It gives precise conditions under which an n×n matrix can be diagonalized, even when it has fewer than n distinct eigenvalues, as in Example 4.26. We first prove a lemma that holds whether or not a matrix is diagonalizable.

Lemma 4.26
If A is an n×n matrix, then the geometric multiplicity of each eigenvalue is less than or equal to its algebraic multiplicity.
Proof  Suppose λ1 is an eigenvalue of A with geometric multiplicity p; that is, dim E_λ1 = p. Specifically, let E_λ1 have basis B1 = {v1, v2, ..., vp}. Let Q be any invertible n×n matrix having v1, v2, ..., vp as its first p columns, say,

Q = [v1 ⋯ vp v_(p+1) ⋯ vn]

or, as a partitioned matrix,

Q = [U | V]

Let

Q⁻¹ = [ C ]
      [ D ]

where C is p×n. Since the columns of U are eigenvectors corresponding to λ1, AU = λ1U. We also have

I = Q⁻¹Q = [ C ] [U | V] = [ CU  CV ]
           [ D ]           [ DU  DV ]

from which we obtain CU = I_p, CV = O, DU = O, and DV = I_(n-p). Therefore,

Q⁻¹AQ = [ C ] A[U | V] = [ CAU  CAV ] = [ λ1CU  CAV ] = [ λ1I_p  CAV ]
        [ D ]            [ DAU  DAV ]   [ λ1DU  DAV ]   [   O    DAV ]

By Exercise 69 in Section 4.2, it follows that

det(Q⁻¹AQ - λI) = (λ1 - λ)^p det(DAV - λI)     (5)

But det(Q⁻¹AQ - λI) is the characteristic polynomial of Q⁻¹AQ, which is the same as the characteristic polynomial of A, by Theorem 4.22(d). Thus, equation (5) implies that the algebraic multiplicity of λ1 is at least p, its geometric multiplicity.
Theorem 4.27  The Diagonalization Theorem

Let A be an n×n matrix whose distinct eigenvalues are λ1, λ2, ..., λk. The following statements are equivalent:

a. A is diagonalizable.
b. The union B of the bases of the eigenspaces of A (as in Theorem 4.24) contains n vectors.
c. The algebraic multiplicity of each eigenvalue equals its geometric multiplicity.

Proof  (a) ⇒ (b) If A is diagonalizable, then it has n linearly independent eigenvectors, by Theorem 4.23. If n_i of these eigenvectors correspond to the eigenvalue λi, then Bi contains at least n_i vectors. (We already know that these n_i vectors are linearly independent; the only thing that might prevent them from being a basis for E_λi is that they might not span it.) Thus, B contains at least n vectors. But, by Theorem 4.24, B is a linearly independent set in Rⁿ; hence, it contains exactly n vectors.
(b) ⇒ (c) Let the geometric multiplicity of λi be di = dim E_λi and let the algebraic multiplicity of λi be mi. By Lemma 4.26, di ≤ mi for i = 1, ..., k. Now assume that property (b) holds. Then we also have

d1 + d2 + ⋯ + dk = n

But m1 + m2 + ⋯ + mk = n, since the sum of the algebraic multiplicities of the eigenvalues of A is just the degree of the characteristic polynomial of A, namely, n. It follows that d1 + d2 + ⋯ + dk = m1 + m2 + ⋯ + mk, which implies that

(m1 - d1) + (m2 - d2) + ⋯ + (mk - dk) = 0     (6)

Using Lemma 4.26 again, we know that mi - di ≥ 0 for i = 1, ..., k, from which we can deduce that each summand in equation (6) is zero; that is, mi = di for i = 1, ..., k.

(c) ⇒ (a) If the algebraic multiplicity mi and the geometric multiplicity di are equal for each eigenvalue λi of A, then B has d1 + d2 + ⋯ + dk = m1 + m2 + ⋯ + mk = n vectors, which are linearly independent, by Theorem 4.24. Thus, these are n linearly independent eigenvectors of A, and A is diagonalizable, by Theorem 4.23.
Example 4.28

(a) The matrix

A = [ 0   1   0 ]
    [ 0   0   1 ]
    [ 2  -5   4 ]

from Example 4.18 has two distinct eigenvalues, λ1 = λ2 = 1 and λ3 = 2. Since the eigenvalue λ1 = λ2 = 1 has algebraic multiplicity 2 but geometric multiplicity 1, A is not diagonalizable, by the Diagonalization Theorem. (See also Example 4.25.)

(b) The matrix

A = [ -1   0   1 ]
    [  3   0  -3 ]
    [  1   0  -1 ]

from Example 4.19 also has two distinct eigenvalues, λ1 = λ2 = 0 and λ3 = -2. The eigenvalue 0 has algebraic and geometric multiplicity 2, and the eigenvalue -2 has algebraic and geometric multiplicity 1. Thus, this matrix is diagonalizable, by the Diagonalization Theorem. (This agrees with our findings in Example 4.26.)

We conclude this section with an application of diagonalization to the computation of the powers of a matrix.

Example 4.29

Use diagonalization to compute A^10 for the matrix A = [ 0  1; 2  1 ] of Example 4.21.

Solution  In Example 4.21, we found that this matrix has eigenvalues λ1 = -1 and λ2 = 2, with corresponding eigenvectors v1 = [ 1; -1 ] and v2 = [ 1; 2 ]. It follows (from any one of a number of theorems in this section) that A is diagonalizable and P⁻¹AP = D, where

P = [v1 v2] = [  1   1 ]   and   D = [ -1   0 ]
              [ -1   2 ]             [  0   2 ]

Solving for A, we have A = PDP⁻¹, which makes it easy to find powers of A. We compute

A^2 = (PDP⁻¹)(PDP⁻¹) = PD(P⁻¹P)DP⁻¹ = PDIDP⁻¹ = PD^2 P⁻¹

and, generally, A^n = PD^n P⁻¹ for all n ≥ 1. (You should verify this by induction. Observe that this fact will be true for any diagonalizable matrix, not just the one in this example.) Since

D^n = [ -1   0 ]^n = [ (-1)^n    0  ]
      [  0   2 ]     [    0     2^n ]

we have

A^n = PD^n P⁻¹ = [  1   1 ] [ (-1)^n    0  ] [ 2/3  -1/3 ]
                 [ -1   2 ] [    0     2^n ] [ 1/3   1/3 ]

    = [ (2(-1)^n + 2^n)/3          ((-1)^(n+1) + 2^n)/3     ]
      [ (2(-1)^(n+1) + 2^(n+1))/3  ((-1)^(n+2) + 2^(n+1))/3 ]

Since we were only asked for A^10, this is more than we needed. But now we can simply set n = 10 to find

A^10 = [ (2(-1)^10 + 2^10)/3   ((-1)^11 + 2^10)/3 ]   [ 342  341 ]
       [ (2(-1)^11 + 2^11)/3   ((-1)^12 + 2^11)/3 ] = [ 682  683 ]
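Here is the same computation in NumPy (an illustrative sketch, not part of the text): form D^10 and conjugate by P.

```python
import numpy as np

A = np.array([[0, 1],
              [2, 1]])
P = np.array([[1, 1],
              [-1, 2]])

D10 = np.diag([(-1) ** 10, 2 ** 10])        # D^10, since D = diag(-1, 2)
A10 = P @ D10 @ np.linalg.inv(P)            # P D^10 P^{-1}

print(np.round(A10))                        # [[342. 341.] [682. 683.]]
print(np.linalg.matrix_power(A, 10))        # same matrix, computed directly
```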
Exercises 4.4

In Exercises 1-4, show that A and B are not similar matrices.

[Exercises 1-4: pairs of 2×2 and 3×3 matrices A and B]

In Exercises 5-7, a diagonalization of the matrix A is given in the form P⁻¹AP = D. List the eigenvalues of A and bases for the corresponding eigenspaces.

[Exercises 5-7: diagonalizations P⁻¹AP = D written out explicitly]

In Exercises 8-15, determine whether A is diagonalizable and, if so, find an invertible matrix P and a diagonal matrix D such that P⁻¹AP = D.

[Exercises 8-15: 2×2, 3×3, and 4×4 matrices]

In Exercises 16-23, use the method of Example 4.29 to compute the indicated power of the matrix.

[Exercises 16-23: powers of 2×2 matrices]

In Exercises 24-29, find all (real) values of k for which A is diagonalizable.

[Exercises 24-29: matrices whose entries involve k]

30. Prove Theorem 4.21(c).
31. Prove Theorem 4.22(b).
32. Prove Theorem 4.22(c).
33. Prove Theorem 4.22(e).
34. If A and B are invertible matrices, show that AB and BA are similar.
35. Prove that if A and B are similar matrices, then tr(A) = tr(B). (Hint: Find a way to use Exercise 45 from Section 3.2.)

In general, it is difficult to show that two matrices are similar. However, if two similar matrices are diagonalizable, the task becomes easier. In Exercises 36-39, show that A and B are similar by showing that they are similar to the same diagonal matrix. Then find an invertible matrix P such that P⁻¹AP = B.

[Exercises 36-39: pairs of 2×2 and 3×3 matrices A and B]

40. Prove that if A is similar to B, then A^T is similar to B^T.
41. Prove that if A is diagonalizable, so is A^T.
42. Let A be an invertible matrix. Prove that if A is diagonalizable, so is A⁻¹.
43. Prove that if A is a diagonalizable matrix with only one eigenvalue λ, then A is of the form A = λI. (Such a matrix is called a scalar matrix.)
44. Let A and B be n×n matrices, each with n distinct eigenvalues. Prove that A and B have the same eigenvectors if and only if AB = BA.
45. Let A and B be similar matrices. Prove that the algebraic multiplicities of the eigenvalues of A and B are the same.
46. Let A and B be similar matrices. Prove that the geometric multiplicities of the eigenvalues of A and B are the same. (Hint: Show that, if B = P⁻¹AP, then every eigenvector of B is of the form P⁻¹v for some eigenvector v of A.)
47. Prove that if A is a diagonalizable matrix such that every eigenvalue of A is either 0 or 1, then A is idempotent (that is, A^2 = A).
48. Let A be a nilpotent matrix (that is, A^m = O for some m > 1). Prove that if A is diagonalizable, then A must be the zero matrix.
49. Suppose that A is a 6×6 matrix with characteristic polynomial

c_A(λ) = (1 + λ)(1 - λ)^2 (2 - λ)^3

(a) Prove that it is not possible to find three linearly independent vectors v1, v2, v3 in R⁶ such that Av1 = v1, Av2 = v2, and Av3 = v3.
(b) If A is diagonalizable, what are the dimensions of the eigenspaces E_-1, E_1, and E_2?
50. Let A = [ a  b; c  d ].
(a) Prove that A is diagonalizable if (a - d)^2 + 4bc > 0 and is not diagonalizable if (a - d)^2 + 4bc < 0.
(b) Find two examples to demonstrate that if (a - d)^2 + 4bc = 0, then A may or may not be diagonalizable.
Iterative Methods for Computing Eigenvalues
In 1824, the Norwegian mathematician Niels Henrik Abel (1802-1829) proved that a general fifth-degree (quintic) polynomial equation is not solvable by radicals: that is, there is no formula for its roots in terms of its coefficients that uses only the operations of addition, subtraction, multiplication, division, and taking nth roots. In a paper written in 1830 and published posthumously in 1846, the French mathematician Evariste Galois (1811-1832) gave a more complete theory that established conditions under which an arbitrary polynomial equation can be solved by radicals. Galois's work was instrumental in establishing the branch of algebra called group theory; his approach to polynomial equations is now known as Galois theory.
At this point, the only method we have for computing the eigenvalues of a matrix is to solve the characteristic equation. However, there are several problems with this method that render it impractical in all but small examples. The first problem is that it depends on the computation of a determinant, which is a very time-consuming process for large matrices. The second problem is that the characteristic equation is a polynomial equation, and there are no formulas for solving polynomial equations of degree higher than 4 (polynomials of degrees 2, 3, and 4 can be solved using the quadratic formula and its analogues). Thus, we are forced to approximate eigenvalues in most practical problems. Unfortunately, methods for approximating the roots of a polynomial are quite sensitive to roundoff error and are therefore unreliable. Instead, we bypass the characteristic polynomial altogether and take a different approach, approximating an eigenvector first and then using this eigenvector to find the corresponding eigenvalue. In this section, we will explore several variations on one such method that is based on a simple iterative technique.
The Power Method

The power method applies to an n×n matrix that has a dominant eigenvalue λ1, that is, an eigenvalue that is larger in absolute value than all of the other eigenvalues. For example, if a matrix has eigenvalues -4, -3, 1, and 3, then -4 is the dominant eigenvalue, since 4 = |-4| > |-3| ≥ |3| ≥ |1|. On the other hand, a matrix with eigenvalues -4, -3, 3, and 4 has no dominant eigenvalue. The power method proceeds iteratively to produce a sequence of scalars that converges to λ1 and a sequence of vectors that converges to the corresponding eigenvector v1, the dominant eigenvector. For simplicity, we will assume that the matrix A is diagonalizable. The following theorem is the basis for the power method.
Theorem 4.28
Let A be an n×n diagonalizable matrix with dominant eigenvalue λ1. Then there exists a nonzero vector x0 such that the sequence of vectors xk defined by

x1 = Ax0, x2 = Ax1, x3 = Ax2, ..., xk = Axk-1, ...

approaches a dominant eigenvector of A.
Proof  We may assume that the eigenvalues of A have been labeled so that
|λ₁| > |λ₂| ≥ |λ₃| ≥ ⋯ ≥ |λₙ|
Let v₁, v₂, …, vₙ be the corresponding eigenvectors. Since v₁, v₂, …, vₙ are linearly independent (why?), they form a basis for ℝⁿ. Consequently, we can write x₀ as a linear combination of these eigenvectors, say,
x₀ = c₁v₁ + c₂v₂ + ⋯ + cₙvₙ
Now x₁ = Ax₀, x₂ = Ax₁ = A(Ax₀) = A²x₀, x₃ = Ax₂ = A(A²x₀) = A³x₀, and, generally, xₖ = Aᵏx₀ for k ≥ 1. As we saw in Example 4.21,
Aᵏx₀ = c₁λ₁ᵏv₁ + c₂λ₂ᵏv₂ + ⋯ + cₙλₙᵏvₙ = λ₁ᵏ(c₁v₁ + c₂(λ₂/λ₁)ᵏv₂ + ⋯ + cₙ(λₙ/λ₁)ᵏvₙ)        (1)
where we have used the fact that λ₁ ≠ 0. The fact that λ₁ is the dominant eigenvalue means that each of the fractions λ₂/λ₁, λ₃/λ₁, …, λₙ/λ₁ is less than 1 in absolute value. Thus,
(λ₂/λ₁)ᵏ, (λ₃/λ₁)ᵏ, …, (λₙ/λ₁)ᵏ
all go to zero as k → ∞. It follows that
xₖ = Aᵏx₀ → λ₁ᵏc₁v₁   as k → ∞        (2)
Now, since λ₁ ≠ 0 and v₁ ≠ 0, xₖ is approaching a nonzero multiple of v₁ (that is, an eigenvector corresponding to λ₁) provided c₁ ≠ 0. (This is the required condition on the initial vector x₀: It must have a nonzero component c₁ in the direction of the dominant eigenvector v₁.)
Example 4.30
Approximate the dominant eigenvector of A = [1 1; 2 0] using the method of Theorem 4.28.

Solution  We will take x₀ = [1; 0] as the initial vector. Then
x₁ = Ax₀ = [1 1; 2 0][1; 0] = [1; 2]
x₂ = Ax₁ = [1 1; 2 0][1; 2] = [3; 2]
We continue in this fashion to obtain the values of xₖ in Table 4.1.
Table 4.1
k      0       1       2       3       4        5        6        7        8
xₖ    [1;0]   [1;2]   [3;2]   [5;6]   [11;10]  [21;22]  [43;42]  [85;86]  [171;170]
rₖ             0.50    1.50    0.83    1.10     0.95     1.02     0.99     1.01
lₖ     1.00    3.00    1.67    2.20    1.91     2.05     1.98     2.01
Figure 4.13
Figure 4.13 shows what is happening geometrically. We know that the eigenspace for the dominant eigenvalue will have dimension 1. (Why? See Exercise 46.) Therefore, it is a line through the origin in ℝ². The first few iterates xₖ are shown along with the directions they determine. It appears as though the iterates are converging on the line whose direction vector is [1; 1]. To confirm that this is the dominant eigenvector we seek, we need only observe that the ratio rₖ of the first to the second component of xₖ gets very close to 1 as k increases. The second line in the body of Table 4.1 gives these values, and you can see clearly that rₖ is indeed approaching 1. We deduce that a dominant eigenvector of A is [1; 1].
Once we have found a dominant eigenvector, how can we find the corresponding dominant eigenvalue? One approach is to observe that if xₖ is approximately a dominant eigenvector of A for the dominant eigenvalue λ₁, then
xₖ₊₁ = Axₖ ≈ λ₁xₖ
It follows that the ratio lₖ of the first component of xₖ₊₁ to that of xₖ will approach λ₁ as k increases. Table 4.1 gives the values of lₖ, and you can see that they are approaching 2, which is the dominant eigenvalue.
There is a drawback to the method of Example 4.30: The components of the iterates xₖ get very large very quickly and can cause significant roundoff errors. To avoid this drawback, we can multiply each iterate by some scalar that reduces the magnitude of its components. Since scalar multiples of the iterates xₖ will still converge to a dominant eigenvector, this approach is acceptable. There are various ways to accomplish it. One is to normalize each xₖ by dividing it by ‖xₖ‖ (i.e., to make each iterate a unit vector). An easier method, and the one we will use, is to divide each xₖ by the component with the maximum absolute value, so that the largest component is now 1. This method is called scaling. Thus, if mₖ denotes the component of xₖ with the maximum absolute value, we will replace xₖ by yₖ = (1/mₖ)xₖ.
We illustrate this approach with the calculations from Example 4.30. For x₀, there is nothing to do, since m₀ = 1. Hence,
y₀ = x₀ = [1; 0]
We then compute x₁ = [1; 2] as before, but now we scale with m₁ = 2 to get
y₁ = (1/2)[1; 2] = [0.5; 1]
Now the calculations change. We take
x₂ = Ay₁ = [1 1; 2 0][0.5; 1] = [1.5; 1]
and scale to get
y₂ = (1/1.5)[1.5; 1] = [1; 0.67]
The next few calculations are summarized in Table 4.2. You can now see clearly that the sequence of vectors yₖ is converging to [1; 1], a dominant eigenvector. Moreover, the sequence of scalars mₖ converges to the corresponding dominant eigenvalue λ₁ = 2.

Table 4.2
k      0       1        2         3         4            5         6            7         8
xₖ    [1;0]   [1;2]    [1.5;1]   [1.67;2]  [1.83;1.67]  [1.91;2]  [1.95;1.91]  [1.98;2]  [1.99;1.98]
yₖ    [1;0]   [0.5;1]  [1;0.67]  [0.83;1]  [1;0.91]     [0.95;1]  [1;0.98]     [0.99;1]  [1;0.99]
mₖ     1       2        1.5       2         1.83         2         1.95         2         1.99
This method, called the power method, is summarized below.

The Power Method
Let A be a diagonalizable n × n matrix with dominant eigenvalue λ₁.
1. Let x₀ = y₀ be any initial vector in ℝⁿ whose largest component is 1.
2. Repeat the following steps for k = 1, 2, …:
   (a) Compute xₖ = Ayₖ₋₁.
   (b) Let mₖ be the component of xₖ with the largest absolute value.
   (c) Set yₖ = (1/mₖ)xₖ.
For most choices of x₀, mₖ converges to the dominant eigenvalue λ₁ and yₖ converges to a dominant eigenvector.
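The steps above are easy to carry out on a computer. The following is a brief sketch in Python with NumPy; it is not part of the text, the function name and stopping rule are our own, and it is applied here to the matrix of Example 4.30.

import numpy as np

def power_method(A, x0, num_iter=20):
    # Scaled power method: m_k approaches the dominant eigenvalue and
    # y_k approaches a dominant eigenvector whose largest component is 1.
    y = np.array(x0, dtype=float)
    m = 1.0
    for _ in range(num_iter):
        x = A @ y                        # step 2(a): x_k = A y_{k-1}
        m = x[np.argmax(np.abs(x))]      # step 2(b): component of largest absolute value
        y = x / m                        # step 2(c): scale so the largest component is 1
    return m, y

A = np.array([[1.0, 1.0],
              [2.0, 0.0]])               # the matrix of Example 4.30
m, y = power_method(A, [1.0, 0.0])
print(m, y)                              # m is near 2 and y is near [1, 1]

Running this reproduces the behaviour of Tables 4.1 and 4.2: the scalars settle near 2 and the scaled iterates near the dominant eigenvector [1; 1].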
Example 4.31
Use the power method to approximate the dominant eigenvalue and a dominant eigenvector of
A = [0  5  −6; −4  12  −12; −2  −2  10]
Solution  Taking as our initial vector
x₀ = [1; 1; 1]
we compute the entries in Table 4.3. You can see that the vectors yₖ are approaching [0.5; 1; −0.5] and the scalars mₖ are approaching 16. This suggests that they are, respectively, a dominant eigenvector and the dominant eigenvalue of A.
Table 4.3
k     0          1                2                      3                    4                    5                    6                    7
xₖ   [1;1;1]    [−1;−4;6]        [−9.33;−19.33;11.67]   [8.62;17.31;−9.00]   [8.12;16.25;−8.20]   [8.03;16.05;−8.04]   [8.01;16.01;−8.01]   [8.00;16.00;−8.00]
yₖ   [1;1;1]    [−0.17;−0.67;1]  [0.48;1;−0.60]         [0.50;1;−0.52]       [0.50;1;−0.50]       [0.50;1;−0.50]       [0.50;1;−0.50]       [0.50;1;−0.50]
mₖ               6               −19.33                  17.31                16.25                16.05                16.01                16.00
John William Strutt (1842–1919), Baron Rayleigh, was a British physicist who made major contributions to the fields of acoustics and optics. In 1871, he gave the first correct explanation of why the sky is blue, and in 1895, he discovered the inert gas argon, for which discovery he received the Nobel Prize in 1904. Rayleigh was president of the Royal Society from 1905 to 1908 and became chancellor of Cambridge University in 1908. He used Rayleigh quotients in an 1873 paper on vibrating systems and later in his book The Theory of Sound.

Remarks
• If the initial vector x₀ has a zero component in the direction of the dominant eigenvector v₁ (i.e., if c₁ = 0 in the proof of Theorem 4.28), then the power method will
not converge to a dominant eigenvector. However, it is quite likely that during the calculation of the subsequent iterates, at some point roundoff error will produce an xₖ with a nonzero component in the direction of v₁. The power method will then start to converge to a multiple of v₁. (This is one instance where roundoff errors actually help!)
• The power method still works when there is a repeated dominant eigenvalue, or even when the matrix is not diagonalizable, under certain conditions. Details may be found in most modern textbooks on numerical analysis. (See Exercises 21-24.)
• For some matrices the power method converges rapidly to a dominant eigenvector, while for others the convergence may be quite slow. A careful look at the proof of Theorem 4.28 reveals why. Since |λ₂/λ₁| ≥ |λ₃/λ₁| ≥ ⋯ ≥ |λₙ/λ₁|, if |λ₂/λ₁| is close to zero, then (λ₂/λ₁)ᵏ, …, (λₙ/λ₁)ᵏ will all approach zero rapidly. Equation (2) then shows that xₖ = Aᵏx₀ will approach λ₁ᵏc₁v₁ rapidly too. As an illustration, consider Example 4.31. The eigenvalues are 16, 4, and 2, so λ₂/λ₁ = 4/16 = 0.25. Since 0.25⁷ ≈ 0.00006, by the seventh iteration we should have close to four-decimal-place accuracy. This is exactly what we saw.
• There is an alternative way to estimate the dominant eigenvalue λ₁ of a matrix A in conjunction with the power method. First, observe that if Axₖ ≈ λ₁xₖ, then
((Axₖ)·xₖ)/(xₖ·xₖ) ≈ ((λ₁xₖ)·xₖ)/(xₖ·xₖ) = λ₁(xₖ·xₖ)/(xₖ·xₖ) = λ₁
The expression R(x) = ((Ax)·x)/(x·x) is called a Rayleigh quotient. As we compute the iterates xₖ, the successive Rayleigh quotients R(xₖ) should approach λ₁. In fact, for symmetric matrices, the Rayleigh quotient method is about twice as fast as the scaling factor method. (See Exercises 17-20.)
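As an illustration of this last remark, here is a small Python/NumPy sketch (ours, not from the text) that tracks the Rayleigh quotients of the power-method iterates for the matrix of Example 4.31; they settle on the dominant eigenvalue 16.

import numpy as np

A = np.array([[ 0.0,  5.0,  -6.0],
              [-4.0, 12.0, -12.0],
              [-2.0, -2.0,  10.0]])      # the matrix of Example 4.31

x = np.array([1.0, 1.0, 1.0])
for k in range(1, 9):
    x = A @ x
    x = x / np.max(np.abs(x))            # rescale; Rayleigh quotients are unaffected by scaling
    r = (A @ x) @ x / (x @ x)            # R(x_k) = ((A x_k) . x_k)/(x_k . x_k)
    print(k, round(r, 4))                # approaches the dominant eigenvalue 16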
The Shifted Power Method and the Inverse Power Method
The power method can help us approximate the dominant eigenvalue of a matrix, but what should we do if we want the other eigenvalues? Fortunately, there are several variations of the power method that can be applied. The shifted power method uses the observation that, if λ is an eigenvalue of A, then λ − α is an eigenvalue of A − αI for any scalar α (Exercise 22 in Section 4.3). Thus, if λ₁ is the dominant eigenvalue of A, the eigenvalues of A − λ₁I will be 0, λ₂ − λ₁, λ₃ − λ₁, …, λₙ − λ₁. We can then apply the power method to compute λ₂ − λ₁, and from this value we can find λ₂. Repeating this process will allow us to compute all of the eigenvalues.
Example 4.32
Use the shifted power method to compute the second eigenvalue of the matrix A = [1 1; 2 0] from Example 4.30.

Solution  In Example 4.30, we found that λ₁ = 2. To find λ₂, we apply the power method to
A − 2I = [−1 1; 2 −2]
We take x₀ = [1; 0], but other choices will also work. The calculations are summarized in Table 4.4.
Table 4.4
k      0       1          2          3          4
xₖ    [1;0]   [−1;2]     [1.5;−3]   [1.5;−3]   [1.5;−3]
yₖ    [1;0]   [−0.5;1]   [−0.5;1]   [−0.5;1]   [−0.5;1]
mₖ     1       2          −3         −3         −3
Our choice of x₀ has produced the eigenvalue −3 after only two iterations. Therefore, λ₂ − λ₁ = −3, so λ₂ = λ₁ − 3 = 2 − 3 = −1 is the second eigenvalue of A.
Recall from property (b) of Theorem 4.18 that if A is invertible with eigenvalue λ, then A⁻¹ has eigenvalue 1/λ. Therefore, if we apply the power method to A⁻¹, its dominant eigenvalue will be the reciprocal of the smallest (in magnitude) eigenvalue of A. To use this inverse power method, we follow the same steps as in the power method, except that in step 2(a) we compute xₖ = A⁻¹yₖ₋₁. (In practice, we don't actually compute A⁻¹ explicitly; instead, we solve the equivalent equation Axₖ = yₖ₋₁ for xₖ using Gaussian elimination. This turns out to be faster.)
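A small Python/NumPy sketch of the inverse power method (ours, not the text's) makes the same point: each iteration solves Axₖ = yₖ₋₁ rather than forming A⁻¹.

import numpy as np

def inverse_power_method(A, y0, num_iter=15):
    # m_k approaches 1/lambda, where lambda is the eigenvalue of A of
    # smallest magnitude; y_k approaches a corresponding eigenvector.
    y = np.array(y0, dtype=float)
    m = 1.0
    for _ in range(num_iter):
        x = np.linalg.solve(A, y)        # solve A x_k = y_{k-1} by elimination
        m = x[np.argmax(np.abs(x))]      # scaling factor
        y = x / m
    return 1.0 / m, y                    # smallest-magnitude eigenvalue and its eigenvector

A = np.array([[1.0, 1.0],
              [2.0, 0.0]])               # the matrix of Examples 4.30 and 4.33
print(inverse_power_method(A, [0.0, 1.0]))   # eigenvalue near -1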
Example 4.33
Use the inverse power method to compute the second eigenvalue of the matrix A = [1 1; 2 0] from Example 4.30.

Solution  We start with x₀ = y₀ = [0; 1]. To solve Ax₁ = y₀, we use row reduction:
[A | y₀] = [1 1 | 0; 2 0 | 1] → [1 0 | 0.5; 0 1 | −0.5]
Hence, x₁ = [0.5; −0.5], and, by scaling, we get y₁ = [1; −1]. Continuing, we get the values shown in Table 4.5, where the values mₖ are converging to −1. Thus, the smallest eigenvalue of A is the reciprocal of −1 (which is also −1). This agrees with our previous finding in Example 4.32.
Table 4.5
k      0       1            2            3            4           5            6            7            8
xₖ    [0;1]   [0.5;−0.5]   [−0.5;1.5]   [0.5;−0.83]  [0.5;−1.1]  [0.5;−0.95]  [0.5;−1.02]  [0.5;−0.99]  [0.5;−1.01]
yₖ    [0;1]   [1;−1]       [−0.33;1]    [−0.6;1]     [−0.45;1]   [−0.52;1]    [−0.49;1]    [−0.51;1]    [−0.50;1]
mₖ     1       0.5          1.5          −0.83        −1.1        −0.95        −1.02        −0.99        −1.01
The Shifted Inverse Power Method
The most versatile of the variants of the power method is one that combines the two just mentioned. It can be used to find an approximation for any eigenvalue, provided we have a close approximation to that eigenvalue. In other words, if a scalar α is given, the shifted inverse power method will find the eigenvalue λ of A that is closest to α. If λ is an eigenvalue of A and α ≠ λ, then A − αI is invertible if α is not an eigenvalue of A, and 1/(λ − α) is an eigenvalue of (A − αI)⁻¹. (See Exercise 45.) If α is close to λ, then 1/(λ − α) will be a dominant eigenvalue of (A − αI)⁻¹. In fact, if α is very close to λ, then 1/(λ − α) will be much bigger in magnitude than the next eigenvalue, so (as noted in the third Remark following Example 4.31) the convergence will be very rapid.
Example 4.34
Use the shifted inverse power method to approximate the eigenvalue of
A = [0  5  −6; −4  12  −12; −2  −2  10]
that is closest to 5.

Solution  Shifting, we have
A − 5I = [−5  5  −6; −4  7  −12; −2  −2  5]
Now we apply the inverse power method with
x₀ = y₀ = [1; 1; 1]
We solve (A − 5I)x₁ = y₀ for x₁:
[A − 5I | y₀] = [−5  5  −6 | 1; −4  7  −12 | 1; −2  −2  5 | 1] → [1  0  0 | −0.61; 0  1  0 | −0.88; 0  0  1 | −0.39]
Table 4.6
k     0         1                     2                     3                     4                     5                     6                     7
xₖ   [1;1;1]   [−0.61;−0.88;−0.39]   [−0.41;−0.69;−0.35]   [−0.47;−0.89;−0.44]   [−0.49;−0.95;−0.48]   [−0.50;−0.98;−0.49]   [−0.50;−0.99;−0.50]   [−0.50;−1.00;−0.50]
yₖ   [1;1;1]   [0.69;1;0.45]         [0.59;1;0.51]         [0.53;1;0.50]         [0.51;1;0.50]         [0.50;1;0.50]         [0.50;1;0.50]         [0.50;1;0.50]
mₖ              −0.88                 −0.69                 −0.89                 −0.95                 −0.98                 −0.99                 −1.00
This gives
x₁ = [−0.61; −0.88; −0.39],   m₁ = −0.88,   and   y₁ = (1/m₁)x₁ = (1/(−0.88))[−0.61; −0.88; −0.39] = [0.69; 1; 0.45]
We continue in this fashion to obtain the values in Table 4.6, from which we deduce that the eigenvalue of A closest to 5 is approximately 5 + 1/m₇ ≈ 5 + 1/(−1) = 4, which, in fact, is exact.
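The computation of Example 4.34 can be repeated in a few lines of Python with NumPy (our own sketch, not part of the text); it applies the inverse power method to A − αI and then recovers λ from mₖ ≈ 1/(λ − α).

import numpy as np

def shifted_inverse_power_method(A, alpha, y0, num_iter=10):
    B = A - alpha * np.eye(A.shape[0])
    y = np.array(y0, dtype=float)
    m = 1.0
    for _ in range(num_iter):
        x = np.linalg.solve(B, y)        # solve (A - alpha I) x_k = y_{k-1}
        m = x[np.argmax(np.abs(x))]
        y = x / m
    return alpha + 1.0 / m, y            # m_k ~ 1/(lambda - alpha), so lambda ~ alpha + 1/m_k

A = np.array([[ 0.0,  5.0,  -6.0],
              [-4.0, 12.0, -12.0],
              [-2.0, -2.0,  10.0]])
lam, v = shifted_inverse_power_method(A, 5.0, [1.0, 1.0, 1.0])
print(lam, v)                            # eigenvalue 4, eigenvector near [0.5, 1, 0.5]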
The power method and its variants represent only one approach to the computation of eigenvalues. In Chapter 5, we will discuss another method based on the QR factorization of a matrix. For a more complete treatment of this topic, you can consult almost any textbook on numerical methods.
We owe this theorem to the Russian mathematician S. Gerschgorin, who stated it in 1931. It did not receive much attention until 1949, when it was resurrected by Olga Taussky-Todd in a note she published in the American Mathematical Monthly.
Gerschgorin's Theorem
In this section, we have discussed several variations on the power method for approximating the eigenvalues of a matrix. All of these methods are iterative, and the speed with which they converge depends on the choice of initial vector. If only we had some "inside information" about the location of the eigenvalues of a given matrix, then we could make a judicious choice of the initial vector and perhaps speed up the convergence of the iterative process. Fortunately, there is a way to estimate the location of the eigenvalues of any matrix. Gerschgorin's Disk Theorem states that the eigenvalues of a (real or complex) n × n matrix all lie inside the union of n circular disks in the complex plane.
Definition  Let A = [aᵢⱼ] be a (real or complex) n × n matrix, and let rᵢ denote the sum of the absolute values of the off-diagonal entries in the ith row of A; that is, rᵢ = Σⱼ≠ᵢ |aᵢⱼ|. The ith Gerschgorin disk is the circular disk Dᵢ in the complex plane with center aᵢᵢ and radius rᵢ. That is,
Dᵢ = {z in ℂ : |z − aᵢᵢ| ≤ rᵢ}
Olga Taussky-Todd (1906–1995) was born in Olmütz in the Austro-Hungarian Empire (now Olomouc in the Czech Republic). She received her doctorate in number theory from the University of Vienna in 1930. During World War II, she worked for the National Physical Laboratory in London, where she investigated the problem of flutter in the wings of supersonic aircraft. Although the problem involved differential equations, the stability of an aircraft depended on the eigenvalues of a related matrix. Taussky-Todd remembered Gerschgorin's Theorem from her graduate studies in Vienna and was able to use it to simplify the otherwise laborious computations needed to determine the eigenvalues relevant to the flutter problem. Taussky-Todd moved to the United States in 1947, and ten years later she became the first woman appointed to the California Institute of Technology. In her career, she produced over 200 publications and received numerous awards. She was instrumental in the development of the branch of mathematics now known as matrix theory.
Example 4.35
Sketch the Gerschgorin disks and the eigenvalues for the following matrices:

Solution  (a) The two Gerschgorin disks are centered at 2 and −3 with radii 1 and 2, respectively. The characteristic polynomial of A is λ² + λ − 8, so the eigenvalues are
λ = (−1 ± √(1² − 4(−8)))/2 ≈ 2.37, −3.37
Figure 4.14 shows that the eigenvalues are contained within the two Gerschgorin disks.
(b) The two Gerschgorin disks are centered at 1 and 3 with radii |−3| = 3 and 2, respectively. The characteristic polynomial of A is λ² − 4λ + 9, so the eigenvalues are
λ = (4 ± √((−4)² − 4(9)))/2 = 2 ± i√5 ≈ 2 + 2.24i, 2 − 2.24i
Figure 4.15 plots the location of the eigenvalues relative to the Gerschgorin disks.
Figure 4.14
Figure 4.15
As Example 4.35 suggests, the eigenvalues of a matrix are contained within its Gerschgorin disks. The next theorem verifies that this is so.
Theorem 4.29  Gerschgorin's Disk Theorem
Let A be an n × n (real or complex) matrix. Then every eigenvalue of A is contained within a Gerschgorin disk.
Proof  Let λ be an eigenvalue of A with corresponding eigenvector x. Let xᵢ be the entry of x with the largest absolute value, and hence nonzero. (Why?) Then Ax = λx, the ith row of which is
aᵢ₁x₁ + aᵢ₂x₂ + ⋯ + aᵢᵢxᵢ + ⋯ + aᵢₙxₙ = λxᵢ
Rearranging, we have
λ − aᵢᵢ = (Σⱼ≠ᵢ aᵢⱼxⱼ)/xᵢ
because xᵢ ≠ 0. Taking absolute values and using properties of absolute value (see Appendix C), we obtain
|λ − aᵢᵢ| = |Σⱼ≠ᵢ aᵢⱼxⱼ/xᵢ| ≤ Σⱼ≠ᵢ |aᵢⱼ|(|xⱼ|/|xᵢ|) ≤ Σⱼ≠ᵢ |aᵢⱼ| = rᵢ
because |xⱼ| ≤ |xᵢ| for j ≠ i. This establishes that the eigenvalue λ is contained within the Gerschgorin disk centered at aᵢᵢ with radius rᵢ.
Remarks
• There is a corresponding version of the preceding theorem for Gerschgorin disks whose radii are the sums of the absolute values of the off-diagonal entries in the jth column of A.
• It can be shown that if k of the Gerschgorin disks are disjoint from the other disks, then exactly k eigenvalues are contained within the union of these k disks. In particular, if a single disk is disjoint from the other disks, then it must contain exactly one eigenvalue of the matrix. Example 4.35(a) illustrates this.
• Note that in Example 4.35(a), 0 is not contained in a Gerschgorin disk; that is, 0 is not an eigenvalue of A. Hence, without any further computation, we can deduce that the matrix A is invertible by Theorem 4.16. This observation is particularly useful when applied to larger matrices, because the Gerschgorin disks can be determined directly from the entries of the matrix.
Example 4.36
Consider the matrix
A = [2  1  0; ½  6  ½; 2  0  8]
Gerschgorin's Theorem tells us that the eigenvalues of A are contained within three disks centered at 2, 6, and 8 with radii 1, 1, and 2, respectively. See Figure 4.16(a). Because the first disk is disjoint from the other two, it must contain exactly one eigenvalue, by the second remark after Theorem 4.29. Because the characteristic polynomial of A has real coefficients, if it has complex roots (i.e., eigenvalues of A), they must occur in conjugate pairs. (See Appendix D.) Hence there is a unique real eigenvalue between 1 and 3, and the union of the other two disks contains two (possibly complex) eigenvalues whose real parts lie between 5 and 10.
On the other hand, the first remark after Theorem 4.29 tells us that the same three eigenvalues of A are contained in disks centered at 2, 6, and 8 with radii 5/2, 1, and ½, respectively. See Figure 4.16(b). These disks are mutually disjoint, so each contains a single (and hence real) eigenvalue. Combining these results, we deduce that A has three real eigenvalues, one in each of the intervals [1, 3], [5, 7], and [7.5, 8.5]. (Compute the actual eigenvalues of A to verify this.)
Figure 4.16
In Exercises 1-4, a matrix A is given along with an iterate x₅, produced as in Example 4.30. (a) Use these data to approximate a dominant eigenvector whose first component is 1 and a corresponding dominant eigenvalue. (Use three-decimal-place accuracy.) (b) Compare your approximate eigenvalue in part (a) with the actual dominant eigenvalue.
1. A =
2. A =
3. A
=
12. A = [ 3.5 1.5 13. A =
9 4
14. A =
8 3 1
4] =[ 78"]
- 3904 [-: 4] I'xs = ['489 [~ ']
-\ '~
2.0
0.5] [ 60.625 ] 3.0 ,xs = 239.500
In Exercises 5-8, a matrix A is given along with an iterate xₖ, produced using the power method, as in Example 4.31. (a) Approximate the dominant eigenvalue and eigenvector by computing the corresponding mₖ and yₖ. (b) Verify that you have approximated an eigenvalue and an eigenvector of A by comparing Ayₖ with mₖyₖ.
5. A =
[_! ~~J.
6. A =
[~ _~J. XIO = [~:!~~]
8. A =
=[', !lXo = [~ J. k = 6
[ 4443] ' '] [ 5 4 ' xS = 11109
4. A = [ 1.5
7.A =
I I. A
Xs = [
4 0 6 - [ 3 r 6 0 4 \ \
~~: ~~~]
, x~=
1.5] y- = [ ' ] k=6
- 0.5
4
V
0'
8 - 4 ,Xu =
15
-,
9
1 0 3 1 1 3
0
"
,Xjl =
1 1 ,k = 6 1
In Exercises 15 and 16, use the power method to approximate the dominant eigenvalue and eigenvector of A to two-decimal-place accuracy. Choose any initial vector you like (but keep the first Remark on page 312 in mind!) and apply the method until the digit in the second decimal place of the iterates stops changing.
15. A =
4
1
3
0
2
0
,
,
2
16. A
0.00 1
19. Exercise 13
20. Exercise 14
10.000
- 3, XIO =
2.9 14
o -\
I
- 1.207
In Exercises 9-14, use the power method to approximate the dominant eigenvalue and eigenvector of A. Use the given initial vector x₀, the specified number of iterations k, and three-decimal-place accuracy.
-6
to. A = [ 8
= [ 'I] '
6 0 6
- 6 - 2 12
Rayleigh quotients are described in the fourth Remark on page 313. In Exercises 17-20, to see how the Rayleigh quotient method approximates the dominant eigenvalue more rapidly than the ordinary power method, compute the successive Rayleigh quotients R(xᵢ) for i = 1, …, k for the matrix A in the given exercise.
18. Exercise 12
3.4 15
Xo
=
12 2 -6
17. Exercise 11
- 2
" 3' ]
1 ,k = 5 1
10.000
2 \
9. A = [ 145
1
k= 5 24. A
=
1
1
In Exercises 25-28, the power method does not converge to the dominant eigenvalue and eigenvector. Verify this, using the given initial vector x₀. Compute the exact eigenvalues and eigenvectors and explain what is happening.
25. A = [ −1
28. A =
42. p{x) = xl - x - 3.0' := 2 43. p(x) = x} - 2xz + 1,0' = 0 44. p(x )= xl - 5x2+ x+ 1,0'=5 45. Let A be an eige nvalue of A with corresponding eigen vector x.If 0' *" A a nd cr is not an eigenvalue of A, show that I /( A - cr) is an eigenvalue of {A - a I) - I with co rresponding eigenvector x. ( Why must A - 0'1 be invertible?)
- I
- 5
1
0 7
4
7 0
1
- 5
27. A =
321
lIerative Meth ods for Computing Eigenvalu es
1
- I
0
1
1
0
1
- I
1
1 , ~
1
=
1 1
. ~
=
47.
29. Exercise 9
30. Exerctse 10
31. Exercise 13
32. Exe rcise 14
III Exercises 33-36, apply the !IIverse power method to approximate, for the IIIMrix A III rite given exercise, the eigen-
value that is sll/allest in ml/gll/tutle. Use the given mitial vector xo' k iterations, ami tftree-decimnl-pl(/ce accuracy. 34. Exercise 10 1
I •k
=
5
-I
36. Exe rcise 14
/" Exercises 37-40, lise tlte sl"fied Im'erse power me/hotl to approximate, for tlte matrix A in the given exercise, the eigenwl/lle closest to 0'. 37. Exercise 9, cr "" 0 39. Exercise 7, a
=S
,,
1
0
4
1
1
0
5
1
III Exercises 29-32, apply the slrified power method to approximate tile second eigenvaille of the matrix A ill tile given exemse. Use the given illllial vectorx o• k iterations, ami I/tree-decimal-plnce accumcy.
35. Exe rcise 7, Xo =
In ExeTCIses 47-50, draw the Cerschgorill disks for the given matrIX.
1 1
33. Exe rcise 9
46. If A has a d o m inant eigenvalue AI' prove that the eigenspace EA, IS one-dimensional.
38. E.'xercise 12, a :::: 0 40. Exercise 13, cr "" - 2
Exercise 32 in Section 4.3 demonstrates that every polynomial is (plus or minus) the characteristic polynomial of its own companion matrix. Therefore, the roots of a polynomial p are the eigenvalues of C(p). Hence, we can use the methods of this section to approximate the roots of any polynomial when exact results are not readily available. In Exercises 41-44, apply the shifted inverse power method to the companion matrix C(p) of p to approximate the root of p closest to α to three decimal places.
41. p(x) = x² + 2x − 2, α = 0
,
49.
1
+
,
. , + - ,.
, •
- 2i
,!
, 0, 4 ,
0 0
0
1
6
1
0
!
2
0
1
•
•
-,
5
0
2; 1 + ;
1
0
- I
1
50.
48. •
4 - 3i
,
2
- 2i
1
2
-2
0
0
+ 6i
2;
2;
-S - 5i
•8
51 . A square matrix is striclly diagollally dominant if the a bsolute value of each diagonal entry is greater tha n the sum of the abso lute val ues of the remaining entries in Ihat row. (Sec Section 2.5. ) Use Gerschgorin's Dis k Theorem to prove that a strictly diagonally dominan t matTlX must be in vertible. [Hint: See Ihe third remark after Theore m 4.29.) 52. If A is an /I X /I matrix, let II A II denote the maxim um of th e sum s of the absol ute values of the rows of Aj tha t is, IIAII = max ( I S.S n
±la,jl
).(see Sectio n 7.2.) Prove
J"' J
that if A is an eigenv'llue of A, then 1A1 s
II A 11 .
53. Let A be an eigenvalue of a stochastic matrix A (see Seclion 3.7). Prove tha t IAI s I. I Him: Apply Exercise 52 to AT.] 54. Prove that the eigenval ues o f A =
0
1
2
5 0 0
!
,
0
0
0
0
0
,
3
!
;
7
•
ace
all real, a nd locate each of these eigenvalues Within a closed in terval on the real line.
4.6 Applications and the Perron-Frobenius Theorem
In this section, we will explore several applications of eigenvalues and eigenvectors. We begin by revisiting some applications from previous chapters.
Markov Chains Section 3.7 introduced Markov chains and made several observations about the transition (stochastic) matrices associated with them. In particular, we observed that if P is the transition matrix of a Markov chain, then P has a steady state vector x. T hat is, there is a vector x such that Px = x. This is equivalent to saying that P has I as an e igenvalue. We are now in a position to prove this fact .
Theorem 4.30
If P is the n × n transition matrix of a Markov chain, then 1 is an eigenvalue of P.
Prool
Recall that every transition matrix is stochastic; hence, each of its columns sums to I. Therefore, if j isa row vector consisting of n Is, then jP = j. (See Exercise 13 i n SectIOn 3.7.) Taking transposes, we have
pTf
=
(jp) T = jT
which implies that f is an eigenvector of pT with corresponding eigenvalue 1. By Exercise 19 in Section 4.3, Pand p Thave the same eigenvalues, so I is also an eigenvalue of P. In fact, much mo re is true. For most transition matnces, evcryeigenvalue A satisfies IAI < 1 and the eigenvalue 1 is dominant; tha t is, if A =t- 1, then IAI < I. We need the following two definitions: A matrix is called positive if all of its entries are positive, and a square matrix is called regular if some power of it is positive. For example,
A = [I
•
Theorem 4.31
[~ ~J
is pOSitive but B =
[~ ~ ]
is not. However, B is regular, since B2 =
~ ~] IS positive.
Let P be an nX n transitio n matrix with eigenvalue A.
a·IAJ S I b. If P is regular and
A =t-
I, then IAI
<
I.
Prool
As in Theorem 4.30, the trick to proving this theorem is to use the fact that pT has the same eigenvalues as P.
(a) Let x be an eigenvector of pT corresponding to A and let Xk be the component of x with the largest absolute value m. Then Ixil S Ixlj = m for i = 1, 2, .. . , n. Comparing the kth components of the equation pTX = Ax, we have PlkXI
+
P2k~
+ .. + P"kX"
=
Ax,
Se<:tion 4.6
Applications and the Perron- Frobenius Theorem
323
(Remember that the rows of pT are the colum ns of P. ) Taking absolute values, we obtain
IAlm= IAllx,1= IAxll = IPax, + P2l~ + ... + !,,,,x,,1 <: IPIIXIi + IPHXJI + ... + Ip"*x", = p' llxd + PztlXll + ." + p",lx,,[
(I )
:s Pit'" + Pa nl + .. + p .. ,m =
(PI!: + Pll. + ."
+ p",)111 "" m
The first inequality follows from the Triangle Inequality in R, lind the last eq uality comes from the fact that the rows of pT sum to I. Thus, IAlm:S m. After dividing by 111, we have IAI :!f 1, as desired.
eb) We will prove thc equivalent implication: If lAI
then A = I. First, we show that it is true when P {and therefore p T) is a positive matrix. If IAI = 1, then all o f the inequalities in equations ( I) arc actually equal ilies. In particu lar, Plkix , l
+ PHlx21 + ,.. + p"kix,,1
= \,
+ Pu m + ... + p~km
= PIA'"
Equivalent ly,
PIl(m - Ixd) + PH( m - IXlI) + .. + P"I;(III -
i x~1) = 0
(2)
Now, since P is positive,p,t > 0 for i = 1,2, . . . , II. Also, III - IxJ~ 0 fo r i = t, 2, ... , II. Therefore, eac h summand in equation (2) must be zero, and this can happen o nly if IxJ= In fo r , = t, 2, ... , II. Furt hermore, we gct equality in the Triangle Inequality III IR if and on ly if all of the summands are positive or all are negative; in other words, the P,t X, 's all have the same sign. This implies that
x
=
m
- 111
m
-
-
mf
0'
x-
= _ mjT
- m
m
)
III
where j is a row vecto r of n Is, as in Theorem 4.30. Thus, in either case, the eigenspace of pT co rresponding to A is EA = span(jT ). But, using th e proof of Theorem 4. 30, we see that { = pTj T = A{, and, co mparing components, we find that A = 1. This handles the case where P is positive. If Pis regular, then some power of P,S posi.tive-say, pk.1t follows that pHI must also be positive. (Why?) Since Ai and At> I are eigenvalues of pi- and j Y< + I, respectively, by Theorem 4.1 8, we have just proved that AI: = AHI = 1. Therefore, ,.\l(A - I) = 0, which implies that). = I, since A= 0 is impossible if IAI = I. We can now explain some of the behavior of Markov chains that we observed in Chapter 3. In Example 3.64, we saw that for the transition matrix p=
[0.7 0.2] 0.3
0.8
anc1 '''1 IOllia state vector Xu = [0.6]h , te state vecto rs
0'] [ .
0.6
0..
XI
converge to the vector x =
,a steady state vecto r for P ( i. e., Px = xl. We arc going to prove that for regular
324
Chapter 4 EIgenvalues and Eigenvc(tors
Markov chains, this always ha ppens. Indeed, we will prove much more. Reca ll that the state vectors x*satisfy x k = P"x o' Let's invest igate Whlll happens to th e powers p l as P becomes large.
Example 4.3J
0.2]h' . .equallon . [00.3.7 0.8 as c laractcnstlc
The transition matrix P ""
0"" det (P- AI) ""
0.7 - A 0.3
0.2
,
(
0.8 _ A = A - l. SA + 0. 5 = A - 1)(A - 0.5)
so its eigenvalues arc AI "" 1 and A2 "" 0.5. (Note lhat, thanks to Theorems 4. 30 and 4.31, we knew in advance that 1 would be an eigenvalue and th e other eigenvalue would be less than \ in absolute vlllue. However, we sti ll needed to compute A1.) The . clgenspaces are
and
.
So, taking 0 =
[2
3
_ : ] ,we knowlhat O- IPO ""
[~ ~.5 ]
= D.r:rom themethod
used in Example 4.29 in Section 4.4, we have
']-'
-I
Now, as k --+ 00, (O.5 )k --+ 0, so
1][' 0][2 1]-'
,nd
- \00
3-\
=
[OA ~A l 0.60.6
(Observe that the columns of this "limit matrix" are identical and each is a steady state vector for P.) Now let x₀ = [a; b] be any initial probability vector (i.e., a + b = 1). Then
xₖ = Pᵏx₀ → [0.4 0.4; 0.6 0.6][a; b] = [0.4a + 0.4b; 0.6a + 0.6b] = [0.4; 0.6]
Not only does this explain what we saw in Example 3.64, it also tells us that the state vectors xₖ will converge to the steady state vector x = [0.4; 0.6] for any choice of x₀!
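The limiting behaviour is easy to see numerically. The following Python/NumPy sketch (ours, not from the text) raises the transition matrix of Example 4.37 to a high power and also extracts the steady state vector as an eigenvector for the eigenvalue 1.

import numpy as np

P = np.array([[0.7, 0.2],
              [0.3, 0.8]])               # transition matrix of Example 4.37

print(np.linalg.matrix_power(P, 50))     # every column is close to [0.4, 0.6]

vals, vecs = np.linalg.eig(P)
v = vecs[:, np.argmin(np.abs(vals - 1))] # eigenvector belonging to the eigenvalue nearest 1
print(v / v.sum())                       # normalized to a probability vector: [0.4, 0.6]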
There is nothing special about Example 4.37. The next theorem shows that this type of behavior always occurs with regular transition matri(es. Before we can presem the theorem, we need the followin g lemma.
Lell •• 4.32
" lei P be a regular fiX II transition matrix. If P is diagonalizable, then the dominant eigenvalue AI = I has algebraic multiplicity 1.
3':5
Section 4.6 Applications and the Perron-Frobenius Theorem
P,OOf
The eigenvalues of P and p1' arc the same. From the proof of Theorem 4.3 1(b), AI = 1 hasgeomctric multipl icity 1 as an eigenvalue of pT. Since P isdiagonalizable, so is pT, by Exercise 41 in Section 4.4. Therefore, the eigenvalue = I has algebraic multiplicity I, by the Dlagona lization Theorem.
"I
Theorem 4.33
,I
Let P be regular nX II transition matrix. Then as k --+ 00, pi approaches an "Xn matrix r. whose col umns are identical, each equal to the same vector x. This vector X is a steady state probability vector fo r P.
P,oof See Finite MarkOl' Chains by J. G. Kemeny and J. L Snell (New York: Springer-Verlag, 1976).
l b simplify the proof, we will consider only the case where P ISdiagonalizable.
The theorem is true, however, withou t this assumption. We diagonalize Pas Q- IPQ = D or, equivalently, P = QDQ-I , where
A, 0 0 A,
D~
0
•• •
...
0
0 0
A.
From Theo rems 4.30 and 4.31 we know that each eigenvalue A, ei ther is 1 or satisfies 1"-,1 < I. Hence, as k ---}- 00, A1approaches I or 0 for i = 1, . .. , n. It follows that rf approaches a diagonal matrix- say, D*-each of whose diagonal en tries is 1 or O. Thus, pk = QdQ- 1 approaches L = QD* Q-I . WC write Iimpl
,\'I'e are taking some liberties with the notion of a limit. Nevertheless,
these steps should be intuitively d ear. Rigorous proofs follow from the properties of limits, which rou may have ellcoulltered in a calculus course. Rather than get sidetra cked with a disc ussion of matrix limi~ we will omi t the proofs.
= L
Observe that PL= P li m~= hmPp'< = lim ph i = L k -oo
'->00
A-ooo
Thereforc, each column of L is an eigenvector of P correspondlllg to AI = I. To see that each of these columns is a probability vector (i.e., L is a stochastic matrix), we need only observe that, if j is the row vector with fi l s, then = klim iP = k--->oo limj = j ;L = ;limpt 1_0() .....o
for scalars
'I' '", ... ,'".Then, by the boxed comment following Example 4.21 , '*
By Lemma 4.32, A} I forj oF- I, SQ, by Theorcm4.3 I(b). IA)1 A}k ---}- 0 as k --+ 00, for j I. It follows that
*"
Le· = lim pke = 'I VI ,
k->oo
'
<
I for j 'I: I. Hence,
U,
Cha pter 4
Eigenvalues and Eigenveclol'$
I n other words, column j of L is an eigenvecwr corresponding 10 A, "" 1. But we have shown that the columns of LaTe probability vectors. so Le, is the I1l1iqlle mulliple x of V I whose componen ts su m to I . Since this is true for each column of L, it implies that all of the columns of L 3re identical. each equal to this vector x.
.... ark
Since L is a stochastic matrix, we can interpret it as the long range transi· tion matrix of the Markov chain . Th;lI is, Lij represents the probabi lit y of being in slate ;, having started from state j , if tile lmnsilions were to conlinlle intlefin itely. The fa ct that the columns of L 3re identical says that the starting state ,Ioes IIOt matler, as the next example illustrates.
hample 4.38
Recall the rat in a box from Example 3.65. The transition matrix was p ""
, ,
O
!
j
~
0
i
i i
0
We determined that the steady state probability vector was j
x ""
•
,•• l
Hcnce, the powers of Papproach
,• ,• ,• ,• ,• ,• • • • j
L=
j
!
=
0.250 0.375 0.375
0.250 0.250 0.375 0.375 0.375 0.375
from which we can see that the rat will evefllllally spend 25% of its time in comparl· men! J and 37.5% of its time in each of the other two co mpartments.
We conclude our discussion of regular Markov chains by proving thaI the steady state vector x is independen t of the initial slate. The proof is easIly adapted to cover Ihc case of state vectors whose components sum to an arbitrary constant- say, s. In I he exercises, you arc asked 10 prove somt" other properties o r regular Markov chains.
Theore. 4.34
lei Pbc a regular n X 11 transition mat rix, wi th x the steady sta te probabilit y vecto r
ror P. as in Theorem 4.33. Then . for any initial probability vector x(l> the sequence of iterates Xl' approaches x.
PrDol
Let
x, x,
x.
Section 4.6
where XI
+ x 2 + .. + xn ""
321
ApplicatIons and the Perron-Frobenius Theorem
I. Since xi; = pk" o' we must show that lim pk " 0 = x. NO\v, ,~
by Theorem 4.33, the long range tra nsition matrix is L = [x x . . . x ] and limP" = ,~
L Therefore.
x,
= {x x
• •
x" "" XI X "" ( Xl
+ X2X + ... + In " + X I + .. + Xn) X =
X
Population Growt. We return to the Leslie model o f population g rowth. which we fi rst exp lored in Section 3.7. In Example 3.66 in that section, we saw that fo r the Leslie malrix
043 L=
0.5
o
0 0.25
0 0
iterates o f the population vectors bega n to approach a mult iple of the vector
18
x-
6 I
In o ther words, the three age classes o f th is population eventually ended up in the ratiO 18: 6: I . Moreover, o nce this state is reached, it is stable, since the ratios for tile fo llowing year are given by
o Lx =
0.5
o
4 0 0.25
3
18
o
6
0
I
27
=
1.5x
9 1.5
and the components are still in the ratio 27: 9: 1.5 = 18 :6: 1 . Observe that 1.5 repre· sents the growth rate of this population when it has reached its steady statc. We can now recognize that x is an cigenvector o f L co rrespo nding to the eigenvalue A = 1.5. Thus, the steady state growth ra te is a positive eigenvalue of L. and an eigenvecto r correspond ing to this eigenvalue represents the relative sizes of the age classes when the steady state has been reached. We can compute these di rectly, witho ut having to iterate as we d id before.
Example 4.39
Fmd the steady state growth rate and the corresponding ratios between the age classes for the lLslie matrix Labove.
SII.tll. We need to fi nd all positive eigenvalues and correspond ing eigenvectors o f L The characteristic polynomial of L is
d
- A
4
3
0.5
- A 0.25
o ""
o
-A
_ AJ
+ 2A + 0.375
321
Chapter 4
Eigenvalues and Eigenvecto rs
so we must solve - A) + 2A ing, we have
+ 0.375 =
0 or, equivalemly, 8A J
-
16.-\ - 3 = O. Facto r-
(2) - 3)(4A' + 6A + I ) ~ 0 (See Append ix D.) Since the second fac to r has only the roots ( - 3 + VS)/4 "'" - 0.19 and ( - ) - VS)/4 "'" -1 .31, the only positive root of this eqwl tio n is A = = 1.5. The corresponding eigenvectors arc 111 Ihe null space of L - 1. 5[, which we find by row reduc tion:
i
- 1.5 [ L - 1. 5 / 1 0 J ~
0.5
o
4 3 0 - 1.5 0 0 0.25 - 1.5 0
,
I
0
o o
I
0
- 18 0 - 6 0 00
Thus, if x = [x₁; x₂; x₃] is an eigenvector corresponding to λ = 1.5, it satisfies x₁ = 18x₃ and x₂ = 6x₃. That is,
E₁.₅ = span([18; 6; 1])
Hence, the steady state growth rate is 1.5, and when this rate has been reached, the age classes are in the ratio 18 : 6 : 1, as we saw before.
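The same conclusion can be reached numerically. Here is a short Python/NumPy sketch (ours, not the text's) that computes the eigenvalues of the Leslie matrix L of Example 4.39 and scales the eigenvector of the positive eigenvalue so that its last component is 1.

import numpy as np

L = np.array([[0.0,  4.0,  3.0],
              [0.5,  0.0,  0.0],
              [0.0,  0.25, 0.0]])        # Leslie matrix of Example 4.39

vals, vecs = np.linalg.eig(L)
i = np.argmax(vals.real)                 # the unique positive eigenvalue is the largest real one here
print(vals[i].real)                      # steady state growth rate 1.5
v = vecs[:, i].real
print(v / v[-1])                         # age-class ratios 18 : 6 : 1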
In Example 4.39, there was o nly one candidate for the ste'ldy state growth ra te: the unique positive eigenvalue of L But what would we have do ne if Lhad had m ore than one positive eigenvalue or none? We were also appa rently fortu na te that the re was a corresponding eigenvector all of whose components were positive, which allowed us to rcltlte these components to the size of the populat ion. We can prove that th is situa li on is not acciden tal; thai i.~, every Leslie matrix has exactl y one posilive eigenvalue a nd a corresponding eigenvector with positive comp onents. Recall that the form o f a Leslie matrix is
L-
b, b, b,
U,,_I
b"
0
0
0
0
"
0
0
0
0
"
0
0
00
0
"0 0
(3)
Since the entries sJ represent survival probabilities, we will assume that they are all nonzero (otherwise, the population would rapidly die out), We will also assume Ihal at least one of the birth pa rameters b, is nonzero (otherwise, there would be no births and , again, the population would die out). With these smndmg assumpt ions. we can now prove the assertion we made above as a theorem .
Theore. 4.35
Every Leslie matrix has a unique positive eigenvalue and a correspond ing eigenvector with positive com ponents.
.,
Section 4.6
Proof
329
Applications and the Perron-Frobenius Theorem
Let L be as in eqwltion (3). The characteristic polynom ial of L is
CL(A) = det(L - AI)
"" (- I)"(An
-
b,A" - ' - b2sI A"- Z - bj S1S1A,,-J - ... -
b"s ,~
...
5,, _ 1)
= ( - I )"[ (A) (Yo u are asked to prove this in Exercise 16.) The eigenvalues of L are therefo re the roots off{A ). Since at le3st one of the birth panlluclers b, is positive and all orl he survival probabilities 51 arc posi tive, the coefficients of I (A) change sign exactly once. By Descartes's Rule of Signs (Appendix D), therefore, [ ( A) has e:'(actly o ne posi tive root. Let us call it AI" By direct calculation, we can check that an eigenvector correspondi ng 10 AI is
1
sdA, s,sd Ai 5 l s2sJ/ Ai
(You are asked to prove this in Exemse IS.) C learly, 311 of the com ponents of X I are I'ositive.
In fact, more is true. With the add itional requirement th3t two COllSCClltive birth parameters b,. and 1'1-1 are positive, it tur ns out that the unique positive eigenval ue Al o f L is dOllllt/atlt; tha t is, every other (real or co mplex) eigenvalue A of L satisfies lAI < AI ' (It is beyond the scope of th is book to prove this resuit , but a partial proof is outlined in Exercise 27 for readers who :lrc fam iliar wi th the algebra o f complex nu m bers. ) This explains why we get convergence to a steady state vector when we itera te the population vecto rs: It IS just the pO\"er method working for us!
Ue PerrOn-JrObenias neorem In the previous two applications, Markov chains and Leslie matrices, we saw that the eigenval ue of interest was positive and domina n t. Moreover, there was a corresponding eigenvector with positive components. It t urns out that a remarkable theo rem guarantees that this WIll be the case for a l H, A <: H, and so on .) Thus, a positive vecto r x satisfies x > O. Le t us define IAI = 11(/,,11 to be the matrix of the absolute values o f the entries of A.
330
Chapter 4
Eigenvalues and F.lgenvectors
Theore.. 4.36
•
Perron's Theorem Let A be a positive nX n matrix. The n A has a real eigenvalue A. with the fo llowing properties: a . AI> O b. AI has a correspond ing positive eigenvector. c. If A is any other eigenvalue of A, then IAI ~ AI"
,
In tuitively, we can see why the first two statements should be true. Consider the case of a 2 X 2 positive mat rix A. The corresponding matrix transfo rmatio n ma ps the first quadrant of the plane properly into Itself, since all com ponents are positive. If we repeatedly allow A to act on the images we get, they necessarily converge toward som e ray in the first quad rant (Figure 4.17). A direction vector for this ray will be a positive vector x, wh ich must be mapped into some positive multiple of itsclf (say, AI)' since A leaves the ra y fixed. In other wo rds, Ax = Alx, with x and A. both positive.
Proof
for some nonzero vectors x , Ax ?: Ax for some scalar A. When this happens, lhen A(kx) ~ A(kx ) fo r all k > 0; thus, we need only co nsider unit vectors x. In Chapter 7, we will see that A m aps the set of all unit vectors in R" (the IHIII sphere) into a "generalized ellipSOid." So, as x ranges over the nonnegative vectors on th is un it sphere, there will be a maximum value of A suc h lhat Ax 2: Ax. (See Figure 4. 18.) Denote this number by AI and the corresponding unit vector by XI' y
y
y
y
, -+---~x
-+----+ x -+----+ x -l' '-----
Figure 4.11 y
Figure 4.18
-+ x
Sed.ion 4.6
Applications and t he Perron-Frobcnius Theorem
331
We nO\\l show that Ax l = Alx l. If not, then Axl > A1x l, and, applying A agai n, we obtain A(Ax l) > A(Alx l ) = A1(Ax I) where the inequality is preserved, since A IS positive. (See Exercise 40.) But then y = ( 1/II Axllj)Ax l is a unit vector that satisfi es Ay > Aly, so there will be so me A. > AI such that Ay 2: A2y . This contradicts the fact tha t AI was the maxi mum val ue wit h th is property. Consequently, it must be the case that A X I = Alx l; thai is, AI 's an eigenvalue of A. Now A is positive and X I is positive, so A,x l = Ax, > O. This means that AI > 0 and XI> 0, which completes the proof of (a) a nd (b). To prove (c). suppose A is any other (real or complex ) eigenvalue of A with co rrespondlllg eigenvector z. Then Az == Az, and, taking absolute values, we have (4)
where the middle inequality fo llows [rom the Triangle Ineq ual ity. (See Exercise 40.) Since jzI > 0, the unit vector u III the d ireCtIon of Izi is also positive and sa tisfies Au :;> IAlu. By the maximality of AI from the first part of thiSproof, we must have IAI:$: A,. In fact, more is true. It turns out that A, is dominant, so IAI < A, for any eigenvalue A AI. It is also the case thai AI has algebraic, and hence geometric, mult iplici ty L We will not prove these facls. Perron's Theorem can be generalized from positIVe to certain nonnegative matrices. Frobeni us d id so in 191 2. The resuit requires a technical condition o n the malrlx. A S(luare matrix A is called reducible if, subject 10 some permutation of the rows and the same permutation of the columns. A can be written It1 I>lock form as
"*
where Band D arc square. Equivalently, A is reducible matrix Psuch that
,r there is some permutatio n
(See page 185.) For eX:lm ple, the mat rix 2 4
A=
I
6 I
0 2 2 0 0
0
I
3
I
5
5
7 0 0
3
0
2
I
7
2
is reducible, since jnterchangmg rows I and 3 and then col umns I and 3 produces 72 i l 30 • 2 i •••. 4 ...... 5 -----5 -_I.. _--_.+ o O •i 2 I 3 o Oj 6 2 I o O i l 72
,
332
Chapter 4 EIgenvalue; and Eigenvectors
(This is just PApT, where
p-
0 0
0
I
I
I
0
0
0
0
0
0 0 0 0
0 0 0 I
0 0 0 0
0
I
Check Ihis!) A square matrix A that is not reducible is called irreducible. If Al > 0 for some k, then A is called primitive. For example, every regular Markov chain has a primitive transition matrix, by definition. It IS not hard to show that every prtmitive matrix is irreducible. (Do you see why? Try showi ng the cont rapositive of this.)
Theora. 4.31
The Perron-Frobenius Theorem Let A be an irreducible nonnegative
nX n
matrix. Then A has a real eigenvalue Al
with the following properties: a. Al > 0 b. Al has a corresponding positive eigenvector. c. If A is any other eigenvalue of A, then .A !SO AI' If A is primitive, then this inequality is strict. d. If A is an eigenvalue of A such that A = AI' then A is a (complex) root o f the equa tion An - A ~ = O. c. Al has algebraic multiplicity I .
S« Matrix Alwlysis by R. A. Horn and C. R. Johnson (Cambridge,
England: Cambridge Uruve~ity Pre$$, 1985).
The interested reader can filld a proof of the Perron-Froheni us Theorem in many texts on nonnegative matrices or matrix analysis. The eigenvalue AI is often calted the Perron root of A, and a corresponding probability eigenvector (which is necessarily unique) is called the Perron eigenvector of A.
linear Recarrence Relations The Fibonacci numbers are the numbers in the sequence 0, 1, 1. 2, 3, 5, 8, 13, 21 , ... , where, after the fi rSI two terms, each new term is obtained by summing the two terms preceding it. If we denote the nth Fibonacci number by f.. then this sequence is completely defined by the equations fo = 0, It = 1, and. for n 2. 2,
This last equation is an example of a linea r recurrence relation. We will return to the Fibonacci numbers, but first we will consider linear recurrence relations somewhat more generally.
Section 4.6
Applicatio ns and t he Perron-Frobenius Theorem
an
I.eonardo of PiS
Definition
Let (x,,) = (.\Q,XI'~, ... ) be a sequence ofnumbcrs that is defined
as follows:
I. A1l = "0, x, = a,•... , x* , = at_i. where no, a, •...• (j~-l are sca lars. 2. For all 11 ;:: k, Xn = C' X n _ 1 + CzX~ -2 + ... + ctx" ' b where c,. ' 2' •..• Cl arc scalars.
"*
If ' k 0, the equation in (2) is called a linear recurrence relation of order Ie. The equations in ( I) are refe rred to as the inilial cotlditiollS of the recllrrence.
Thus, the Fibonacci num bers satiSfY a linear recurre nce rela tion o f order 2.
He.lrU • If, in order to define the ni h term in a recurrence relation. we requi re the (11- k)th term but no term before it. then the recurrence relation has order k. • T he n um ber of ini tial condi tions is the order of the recurrence relation. • It is no t necessary that the first term of the sequence be called Xo . We could Slart at xl Qr anywhere else. • It is possible to have even mo re general linear recurrence rela tio ns by allowing the coeffici ents Ci to be functions ra ther than sca lars and by allowi ng an extra, isolated coefficient, which may also be a funct Ion. An example would be the recu rrence x" =
2x..-, - ,1'x,,_2 + - x,,-3 +
We will not consider such recurrences here.
lnmple 4.40
1
II
"
Consider the sequence (x~) defincd by the mitial condi tions x, = I, Xz == 5 and the recurrencc relation x" == 5x,,_, - 6X~_ l fo r tI :> 2. Wnte out the first five ter ms o f this sequence.
Salullaft
We are given the first two terms. We usc the recurrence relation to calculate the next three terms. We have XJ = 5x2 - 6xI == 5 · 5 - 6 · 1 = 19 ~ =
Xs
5xJ - 6X2 == 5 -1 9 - 6 ·5 = 65
= 5x~ - 6X3 = 5 · 65 - 6-19
so the sequence begins 1,5, 19,65,21 1, ....
= 2 11
es and Eigenvcctors
Clearly, if we were interested in, say, the lOOth term of the sequence in Example 4.40, then th e approach used there would be rather ted ious, since we would have to appl y the recurrence relation 98 times. It would be nice If we could fi nd an explicit formukl fo r x" as" fun ction of n. We refer to find ing such a fo rm ula as solving the recurrence relatio n. We will illust rate the process with the sequence from Example 4.40. To begin, we rewrite the recurrence relation as a matrix equation. Let
and int roduce vectors
XII
= [
x" ] for
II :>
2. Thus,
Xl
=
X,,_ I
[x,] ['], x I XI
J
=
[x,] ~ x2
[ l:] , x~ "" [~] : : [~~J,and so on. Nowobserve that' forn~2,wehave Ax,,_, =
[5 I
- 6][X,,_,] = [5X
o _, -
0
X,, _ 2
6X
o _,]
= [ Xo ] ~
Xn - l
x.
X,, _ l
Notice that this is the same Iype of eq uatio n we encountered With Markov chains and Leslie mat ri ces. As in those cases, we can write X ,, --
Ax ,,-
\ --
A' x,,-2 --
" ' -
AO- ' x2
We now use the techn ique of Example 4.29 to compute the powers of A. The characteristic equa tion o f A is ,\2 _
5,\
+6
= 0
from which we find that the eigenvalues arc AI = 3 a nd ..\2 = 2. (Notice that the fo rm of the characteristic equation fo llows that of the recurrence relatio n. If we write the recurrence as x" - 5 X n _ 1 + 6X,, _ 1 = 0, it is appa rent that the coeffi cients arc exactly the same!) The corresponding eigenspaces are
SettingP =
[ ~ ~], weknowthatfTIAP = 0
=
[~ ~]. ThenA = POp -l a nd
[: ~] [3~
2~] [:
[: ~W~
2~] [ - : - 2(3 k+ l )
+ 3(2 h l )] - 2(3') + 3(2' )
1t now follows that
Sect iol1 4.6 Applications and the Perron-Frohenius Theorem
335
from whtch we read off the solution x~ = 3 ~ - 2". (To check our work, we could plug in" = 1,2, . .. ,5 to verify that Ihis fo rmula gives the same terms that we calculated usmg the recurrence relation. Try it!) Observe that x is a linear combination of powers of the eigenvalues. This is nee· cssarily the Case as long as the eigenvalues arc distinct las Theorem 4.38( a) will make explicit). Using this observation, we can 5.1Ve ourselves some work. Once we have computed the eigenvalues AI = 3 and Az = 2, we can immediately write lt
where 'I and c2 are [0 be determined. Using [he illlt131 conditions, we have I =
XL
= cL3 1 + '12 1 = 3c\
+
5=
X2
= c132 + c22 2 = 9c
+ 4'1
2'1
when n = I and
when
fI
I
= 2. \Ve now solve the syMem
3cI
+
2c2
1
9c l + 4c2 =5
for CI and cl \0 obtain (I "" I and ':2 = -1. Th us, x" = 3" - 2", as before. This is the method we will use in practice. We now illustrate its use to find an explicit fo rmula for the FIbonacci numbers.
Example 4.41
Solve the Fibonacci recurrence f₀ = 0, f₁ = 1, and fₙ = fₙ₋₁ + fₙ₋₂ for n ≥ 2.

Jacques Binet (1786–1856) made contributions to matrix theory, number theory, physics, and astronomy. He discovered the rule for matrix multiplication in 1812. Binet's formula for the Fibonacci numbers is actually due to Euler, who published it in 1765; however, it was forgotten until Binet published his version in 1843. Like Cauchy, Binet was a royalist, and he lost his university position when Charles X abdicated in 1830. He received many honors for his work, including his election, in 1843, to the Académie des Sciences.

Solution  Writing the recurrence as fₙ − fₙ₋₁ − fₙ₋₂ = 0, we see that the characteristic equation is λ² − λ − 1 = 0, so the eigenvalues are
λ₁ = (1 + √5)/2   and   λ₂ = (1 − √5)/2
It follows from the discussion above that the solution to the recurrence relation has the form
fₙ = c₁λ₁ⁿ + c₂λ₂ⁿ
for some scalars c₁ and c₂. Using the initial conditions, we have
0 = f₀ = c₁ + c₂   and   1 = f₁ = c₁λ₁ + c₂λ₂
Solving for c₁ and c₂, we obtain c₁ = 1/√5 and c₂ = −1/√5. Hence, an explicit formula for the nth Fibonacci number is
fₙ = (1/√5)((1 + √5)/2)ⁿ − (1/√5)((1 − √5)/2)ⁿ        (5)
Formula (5) is a remarkable formula, because it is defined in terms of the irrational number √5, yet the Fibonacci numbers are all integers! Try plugging in a few values of n to see how the √5 terms cancel out to leave the integer values fₙ. Formula (5) is known as Binet's formula.
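For readers who want to do that plugging in by machine, here is a tiny Python sketch (ours, not from the text) that compares formula (5) with the recurrence itself.

from math import sqrt

def binet(n):
    s = sqrt(5.0)
    return ((1 + s) / 2) ** n / s - ((1 - s) / 2) ** n / s   # formula (5)

f = [0, 1]
for n in range(2, 11):
    f.append(f[-1] + f[-2])              # the recurrence f_n = f_{n-1} + f_{n-2}

print([round(binet(n)) for n in range(11)])   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(f)                                      # the same values, from the recurrence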
The method we have just outlined works for any second order linear recurrence relation whose associated eigenvalues are all distinct. When there is a repeated eigenvalue, the technique must be modified, since the diagonalization method we used may no longer work. The next theorem summarizes both situations.

Theorem 4.38
Let x" = (lX~_ 1 + bx.._ ! be a recu rrence relation that is satisfied by a sequence (x") . Let Al and A2 be the eigenvalues of the associated characteristic equation A2 aA - b = O.
*"
a. If Al A2, then x" = CI A; + ~A~ for some scalars c1 and S. h. If AI = Al = A, then x" = cIA" + C;>IIA ~ for some scalars c1 and '1. In eit her case, ci and S can be determined using the initial conditions.
Prool
(a) Genera lizing our discussio n above, we can write the recu rrence as x.. =
Axn_ 1> where
'
n
X. [ X,, _ I
=
1
and
A =
[~ ~]
Since A has d istinct eigenva lues, it can be di3gon3lizcd . The rest of the de tails are left fo r ExeTCIse 51. (b ) \Ve will show that x" = CI An + Cl IIA" satisfies the recurrence relation x" = aX.. _1 + bx.._1 or, equivalentl y, (6)
if A2 - tlA - b = O. Since X .. _ 1
=
, ,,-I + " (1/
CIA
x,, - 2 =
and
-
CI A "- ~
+ C2(II - 2) A,,- 2
substitution into equa tion (6) yields
x" - aX,,_1 - bx.._2 = ( cI A ~ + " IIA") - (/(CIA"- I + ,,(II - I ) A,,- I) - b( cI A,,- 2 + ~ (II - 2) A,,- 2) (I( An
-
aA"- 1 - !JA"-I)
+
~(/l A " -
a( /1 - I ) A"- I
- b( n - 2) , ··- ' )
= ( IA"- 2(A2 - aA - IJ) + C211A,,- 2(A2 - aA - b) + ~ A" - 2(aA + 2b) = cI A,,-2(0) + " I1A,,- 2(0) + = c1A"- 1( aA
~A " - 2 (aA
+ 2b)
+ 2b)
=
=
But since A is a double root o f ,\2 - (IA - b 0, we m ust have Ql + 4b = 0 and A a/2, using the quad ratic (ormula. Consequently, aA + 2b = cr/2 + 2b = - 4b/ 2 + 21J = 0, so
SeCtio n 4.6
331
Apphcatio ns and the Perro n-Frobenius Theorem
Suppose the in itial conditions are XV = r and x, "" s. Then, in either (a) or (b ) there is a unique soluti on for and '1- (Sec Exercise 52. )
'I
Ixample 4.42
Solve Ihe recurrence relatio n XV = I. x, = 6, and
x~
= 6x.. _, - 9xn_l fo r n 2: 2.
The characteristic equation is,\2 - 6A + 9 = 0, which has A = 3 as a dou ble root. By Theorem 4.38(b), we must have X n == e13" + ':zu3" = ( e. + ~ 1I ) 3 ". Since I = XV = c1 tl nd 6 = X I = ( ' I + ez)3, we fi nd that '2:: I,SO
SOlllilon
x" = ( I + /1)3" The techniques outlmed in Theorem 4.38 can be extend ed to higher o rder recurrence relations. We slale, without proof, the general result.
Theorem 4.39
Let x" = a ," _ \x~_ 1 + a .. _2x~_2 + "'" + ~x" '" be a recurrence relatio n of order III that is sa tisfied by a sequence (XII) ' Suppose the (lssoci:lIed characteristic polyno mial
' " - a", _ I A, ,,,-I_ a",_1ft• ...-2_ ... _
•
A
factors as (A - A1)"" (A - A2)"'; '" (A - AA)"", where 1111 + Then x~ has the form X,, :::: (cll A ~ + c12 nA ~ + c13 u2A7 + ... + cl",n",,-I An + ...
m.,:'"'.~..,F'mL
::
111.
+ (Ckl Aj; + cu /lAi: + cul12AI' + ... + Ckm/,m" -IAl)
SYSlemS 01 linear D111erenllaIIQualions In calculus, you learn that if x = x ( t) is a diffe rentiable fu nction satisfyi ng a differential equation of the fo rm x' :::: h, where k is a constant, then the genenll solut ion is x = ee b , where C is a constant, If an initial cond ition x (O) = ~ is specifi ed, then, by substitut ing I = 0 in the general solution, we fi nd that C = ~. Hence, the uniq ue solution to the differential equation that s(ltisfi es the ini tial conditio n is
Suppose we have n differen tiable fun ctio ns of I-say, x" X:z, .. . I x,,- that sallsfy a system of differential equations
x; =
a l 1x 1
+
+ ... +
" l n X ..
xi =
(l2 I X .
+ (ln X 2 + ... +
(l2" X ..
{11 2Xi
We C(l 1l wflle this system in matrix for m as x'
x(I)
~
XI( t) x,( I) •
x,,( I)
X'( I)
~
x;( I) .<,(1) x;,( I)
=:
Ax, where
• , nd
A~
il l I
a"
(Il l
an
a" I
(/,,2
Now we can use mat rix methods to help us fin d the sol ution.
a l II ••
. ..
al It
{/,,"
First, we make a useful observation. Suppose we want to solve the following system of differential equations:
$$\begin{aligned} x_1' &= 2x_1\\ x_2' &= 5x_2 \end{aligned}$$
Each equation can be solved separately, as above, to give
$$x_1 = C_1 e^{2t} \qquad\text{and}\qquad x_2 = C_2 e^{5t}$$
where $C_1$ and $C_2$ are constants. Notice that, in matrix form, our equation $\mathbf{x}' = A\mathbf{x}$ has a diagonal coefficient matrix
$$A = \begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix}$$
and the eigenvalues 2 and 5 occur in the exponentials $e^{2t}$ and $e^{5t}$ of the solution. This suggests that, for an arbitrary system, we should start by diagonalizing the coefficient matrix, if possible.
Example 4.43
Solve the following system of differential equations:
$$\begin{aligned} x_1' &= x_1 + 2x_2\\ x_2' &= 3x_1 + 2x_2 \end{aligned}$$

Solution  Here the coefficient matrix is $A = \begin{bmatrix} 1 & 2 \\ 3 & 2 \end{bmatrix}$, and we find that the eigenvalues are $\lambda_1 = 4$ and $\lambda_2 = -1$, with corresponding eigenvectors $\mathbf{v}_1 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$, respectively. Therefore, $A$ is diagonalizable, and the matrix $P$ that does the job is
$$P = [\mathbf{v}_1 \;\; \mathbf{v}_2] = \begin{bmatrix} 2 & -1 \\ 3 & 1 \end{bmatrix}$$
We know that
$$P^{-1}AP = D = \begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix}$$
Let $\mathbf{x} = P\mathbf{y}$ (so that $\mathbf{x}' = P\mathbf{y}'$) and substitute these results into the original equation $\mathbf{x}' = A\mathbf{x}$ to get $P\mathbf{y}' = AP\mathbf{y}$ or, equivalently,
$$\mathbf{y}' = P^{-1}AP\mathbf{y} = D\mathbf{y}$$
This is just the system
$$\begin{aligned} y_1' &= 4y_1\\ y_2' &= -y_2 \end{aligned}$$
whose general solution is $y_1 = C_1 e^{4t}$, $y_2 = C_2 e^{-t}$. To find $\mathbf{x}$, we just compute
$$\mathbf{x} = P\mathbf{y} = \begin{bmatrix} 2 & -1 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} C_1 e^{4t} \\ C_2 e^{-t} \end{bmatrix} = \begin{bmatrix} 2C_1 e^{4t} - C_2 e^{-t} \\ 3C_1 e^{4t} + C_2 e^{-t} \end{bmatrix}$$
so $x_1 = 2C_1 e^{4t} - C_2 e^{-t}$ and $x_2 = 3C_1 e^{4t} + C_2 e^{-t}$. (Check that these values satisfy the given system.)

Remark  Observe that we could also express the solution in Example 4.43 as
$$\mathbf{x} = C_1 e^{4t}\begin{bmatrix} 2 \\ 3 \end{bmatrix} + C_2 e^{-t}\begin{bmatrix} -1 \\ 1 \end{bmatrix} = C_1 e^{4t}\mathbf{v}_1 + C_2 e^{-t}\mathbf{v}_2$$
This technique generalizes easily to $n \times n$ systems where the coefficient matrix is diagonalizable. The next theorem, whose proof is left as an exercise, summarizes the situation.

Theorem 4.40
Let $A$ be an $n \times n$ diagonalizable matrix and let $P = [\mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \cdots \;\; \mathbf{v}_n]$ be such that
$$P^{-1}AP = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$
Then the general solution to the system $\mathbf{x}' = A\mathbf{x}$ is
$$\mathbf{x} = C_1 e^{\lambda_1 t}\mathbf{v}_1 + C_2 e^{\lambda_2 t}\mathbf{v}_2 + \cdots + C_n e^{\lambda_n t}\mathbf{v}_n$$
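Theorem 4.40 translates directly into a small computation: find the eigenvalues and eigenvectors of $A$, then fit the constants $C_i$ to an initial condition by solving a linear system. The following Python/NumPy sketch is an illustration, not part of the text; it uses the matrix of Example 4.43 together with a hypothetical initial condition $\mathbf{x}(0) = (1, 0)$ chosen only for the demonstration.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])          # coefficient matrix from Example 4.43
evals, evecs = np.linalg.eig(A)     # columns of evecs are eigenvectors

x0 = np.array([1.0, 0.0])           # hypothetical initial condition x(0)
C = np.linalg.solve(evecs, x0)      # constants C_i in x(0) = sum C_i v_i

def x(t):
    # General solution of Theorem 4.40: x(t) = sum_i C_i e^{lambda_i t} v_i
    return evecs @ (C * np.exp(evals * t))

# Check that x(t) satisfies x' = A x by comparing a centered finite-difference
# derivative with A x(t) at a sample time.
t, h = 0.7, 1e-6
deriv = (x(t + h) - x(t - h)) / (2 * h)
print(np.allclose(deriv, A @ x(t), atol=1e-4))   # True
```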
The next example involves a biological model in which two species live in the same ecosystem. It is reasonable to assume that the growth rate of each species depends on the sizes of both populations. (Of course, there are other factors that govern growth, but we will keep our model simple by ignoring these.) If $x_1(t)$ and $x_2(t)$ denote the sizes of the two populations at time $t$, then $x_1'(t)$ and $x_2'(t)$ are their rates of growth at time $t$. Our model is of the form
$$\begin{aligned} x_1'(t) &= ax_1(t) + bx_2(t)\\ x_2'(t) &= cx_1(t) + dx_2(t) \end{aligned}$$
where the coefficients $a$, $b$, $c$, and $d$ depend on the conditions.
Example 4.44
Raccoons and squirrels inhabit the same ecosystem and compete with each other for food, water, and space. Let the raccoon and squirrel populations at time $t$ years be given by $r(t)$ and $s(t)$, respectively. Competition between the two species slows the growth of each, and the populations are governed by the system
$$\begin{aligned} r'(t) &= 2.5\,r(t) - s(t)\\ s'(t) &= -0.25\,r(t) + 2.5\,s(t) \end{aligned}$$
Initially there are 60 raccoons and 60 squirrels in the ecosystem. Determine what happens to these two populations.
Solution  Our system is $\mathbf{x}' = A\mathbf{x}$, where
$$\mathbf{x} = \mathbf{x}(t) = \begin{bmatrix} r(t) \\ s(t) \end{bmatrix} \qquad\text{and}\qquad A = \begin{bmatrix} 2.5 & -1.0 \\ -0.25 & 2.5 \end{bmatrix}$$
The eigenvalues of $A$ are $\lambda_1 = 3$ and $\lambda_2 = 2$, with corresponding eigenvectors $\mathbf{v}_1 = \begin{bmatrix} -2 \\ 1 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$. By Theorem 4.40, the general solution to our system is
$$\mathbf{x}(t) = C_1 e^{3t}\mathbf{v}_1 + C_2 e^{2t}\mathbf{v}_2 = C_1 e^{3t}\begin{bmatrix} -2 \\ 1 \end{bmatrix} + C_2 e^{2t}\begin{bmatrix} 2 \\ 1 \end{bmatrix} \tag{7}$$
The initial population vector is $\mathbf{x}(0) = \begin{bmatrix} r(0) \\ s(0) \end{bmatrix} = \begin{bmatrix} 60 \\ 60 \end{bmatrix}$, so, setting $t = 0$ in equation (7), we have
$$C_1\begin{bmatrix} -2 \\ 1 \end{bmatrix} + C_2\begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 60 \\ 60 \end{bmatrix}$$
Solving this equation, we find $C_1 = 15$ and $C_2 = 45$. Hence,
$$\mathbf{x}(t) = 15e^{3t}\begin{bmatrix} -2 \\ 1 \end{bmatrix} + 45e^{2t}\begin{bmatrix} 2 \\ 1 \end{bmatrix}$$
from which we find $r(t) = -30e^{3t} + 90e^{2t}$ and $s(t) = 15e^{3t} + 45e^{2t}$. Figure 4.19 shows the graphs of these two functions, and you can see clearly that the raccoon population dies out after a little more than 1 year. (Can you determine exactly when it dies out?)
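The extinction time asked about here can be pinned down exactly: $r(t) = -30e^{3t} + 90e^{2t} = 30e^{2t}(3 - e^{t})$ vanishes when $e^t = 3$, that is, at $t = \ln 3 \approx 1.099$ years. A brief numerical check (illustrative only, using SciPy's standard root finder):

```python
import numpy as np
from scipy.optimize import brentq

def r(t):
    # Raccoon population from Example 4.44
    return -30 * np.exp(3 * t) + 90 * np.exp(2 * t)

t_extinct = brentq(r, 0.5, 2.0)      # root of r(t) between 0.5 and 2 years
print(t_extinct, np.log(3))          # both approximately 1.0986
```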
Figure 4.19  Raccoon and squirrel populations

We now consider a similar example, in which one species is a source of food for the other. Such a model is called a predator-prey model. Once again, our model will be drastically oversimplified in order to illustrate its main features.
Example 4.45
Robins and worms cohabit an ecosystem. The robins eat the worms, which are their only source of food. The robin and worm populations at time $t$ years are denoted by $r(t)$ and $w(t)$, respectively, and the equations governing the growth of the two populations are
$$\begin{aligned} r'(t) &= w(t) - 12\\ w'(t) &= -r(t) + 10 \end{aligned} \tag{8}$$
If initially 6 robins and 20 worms occupy the ecosystem, determine the behavior of the two populations over time.
Solution  The first thing we notice about this example is the presence of the extra constants, $-12$ and $10$, in the two equations. Fortunately, we can get rid of them with a simple change of variables. If we let $r(t) = x(t) + 10$ and $w(t) = y(t) + 12$, then $r'(t) = x'(t)$ and $w'(t) = y'(t)$. Substituting into equations (8), we have
$$\begin{aligned} x'(t) &= y(t)\\ y'(t) &= -x(t) \end{aligned} \tag{9}$$
which is easier to work with. Equations (9) have the form $\mathbf{x}' = A\mathbf{x}$, where $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. Our new initial conditions are
$$x(0) = r(0) - 10 = 6 - 10 = -4 \qquad\text{and}\qquad y(0) = w(0) - 12 = 20 - 12 = 8$$
so $\mathbf{x}(0) = \begin{bmatrix} -4 \\ 8 \end{bmatrix}$.

Proceeding as in the last example, we find the eigenvalues and eigenvectors of $A$. The characteristic polynomial is $\lambda^2 + 1$, which has no real roots. What should we do? We have no choice but to use the complex roots, which are $\lambda_1 = i$ and $\lambda_2 = -i$. The corresponding eigenvectors are also complex: namely, $\mathbf{v}_1 = \begin{bmatrix} 1 \\ i \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 1 \\ -i \end{bmatrix}$. By Theorem 4.40, our solution has the form
$$\mathbf{x}(t) = C_1 e^{it}\mathbf{v}_1 + C_2 e^{-it}\mathbf{v}_2 = C_1 e^{it}\begin{bmatrix} 1 \\ i \end{bmatrix} + C_2 e^{-it}\begin{bmatrix} 1 \\ -i \end{bmatrix}$$
From $\mathbf{x}(0) = \begin{bmatrix} -4 \\ 8 \end{bmatrix}$, we get
$$C_1\begin{bmatrix} 1 \\ i \end{bmatrix} + C_2\begin{bmatrix} 1 \\ -i \end{bmatrix} = \begin{bmatrix} -4 \\ 8 \end{bmatrix}$$
whose solution is $C_1 = -2 - 4i$ and $C_2 = -2 + 4i$. So the solution to system (9) is
$$\mathbf{x}(t) = (-2 - 4i)e^{it}\begin{bmatrix} 1 \\ i \end{bmatrix} + (-2 + 4i)e^{-it}\begin{bmatrix} 1 \\ -i \end{bmatrix}$$
Calvin and Hobbes © 1988 Watterson. Reprinted with permission of Universal Press Syndicate. All rights reserved.

What are we to make of this solution? Robins and worms inhabit a real world, yet our solution involves complex numbers! Fearlessly proceeding, we apply Euler's formula
$$e^{it} = \cos t + i\sin t$$
(Appendix C) to get $e^{-it} = \cos(-t) + i\sin(-t) = \cos t - i\sin t$. Substituting, we have
$$\begin{aligned}
\mathbf{x}(t) &= (-2 - 4i)(\cos t + i\sin t)\begin{bmatrix} 1 \\ i \end{bmatrix} + (-2 + 4i)(\cos t - i\sin t)\begin{bmatrix} 1 \\ -i \end{bmatrix}\\
&= \begin{bmatrix} (-2\cos t + 4\sin t) + i(-4\cos t - 2\sin t) \\ (4\cos t + 2\sin t) + i(-2\cos t + 4\sin t) \end{bmatrix} + \begin{bmatrix} (-2\cos t + 4\sin t) + i(4\cos t + 2\sin t) \\ (4\cos t + 2\sin t) + i(2\cos t - 4\sin t) \end{bmatrix}\\
&= \begin{bmatrix} -4\cos t + 8\sin t \\ 8\cos t + 4\sin t \end{bmatrix}
\end{aligned}$$
This gives $x(t) = -4\cos t + 8\sin t$ and $y(t) = 8\cos t + 4\sin t$. Putting everything in terms of our original variables, we conclude that
$$r(t) = x(t) + 10 = -4\cos t + 8\sin t + 10$$
and
$$w(t) = y(t) + 12 = 8\cos t + 4\sin t + 12$$
Figure 4.20  Robin and worm populations

Figure 4.21
So our solution is real after all! The graphs of $r(t)$ and $w(t)$ in Figure 4.20 show that the two populations oscillate periodically. As the robin population increases, the worm population starts to decrease, but as the robins' only food source diminishes, their numbers start to decline as well. As the predators disappear, the worm population begins to recover. As its food supply increases, so does the robin population, and the cycle repeats itself. This oscillation is typical of examples in which the eigenvalues are complex. Plotting robins, worms, and time on separate axes, as in Figure 4.21, clearly reveals the cyclic nature of the two populations.
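As a quick independent check (not part of the text), the closed-form populations can be verified against the original system (8) numerically, for example with NumPy and a finite-difference derivative:

```python
import numpy as np

def r(t):
    return -4 * np.cos(t) + 8 * np.sin(t) + 10   # robins

def w(t):
    return 8 * np.cos(t) + 4 * np.sin(t) + 12    # worms

t = np.linspace(0.0, 16.0, 200)
h = 1e-6
r_prime = (r(t + h) - r(t - h)) / (2 * h)
w_prime = (w(t + h) - w(t - h)) / (2 * h)

# System (8): r' = w - 12 and w' = -r + 10
print(np.allclose(r_prime, w(t) - 12, atol=1e-5))   # True
print(np.allclose(w_prime, -r(t) + 10, atol=1e-5))  # True
```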
We conclude this section by looking at what we have done from a different point of view. If $x = x(t)$ is a differentiable function of $t$, then the general solution of the ordinary differential equation $x' = ax$ is $x = ce^{at}$, where $c$ is a scalar. The systems of linear differential equations we have been considering have the form $\mathbf{x}' = A\mathbf{x}$, so if we simply plowed ahead without thinking, we might be tempted to deduce that the solution would be $\mathbf{x} = \mathbf{c}e^{At}$, where $\mathbf{c}$ is a vector. But what on earth could this mean? On the right-hand side, we have the number $e$ raised to the power of a matrix. This appears to be nonsense, yet you will see that there is a way to make sense of it. Let's start by considering the expression $e^A$. In calculus, you learn that the function $e^x$ has a power series expansion
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$
that converges for every real number $x$. By analogy, let us define
$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots$$
The right-hand side is defined just in terms of powers of $A$, and it can be shown that it converges for any real matrix $A$. So now $e^A$ is a matrix, called the exponential of $A$. But how can we compute $e^A$ or $e^{At}$? For diagonal matrices, it is easy.
Example 4.46
Compute $e^{Dt}$ for $D = \begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix}$.

Solution  From the definition, we have
$$\begin{aligned}
e^{Dt} &= I + Dt + \frac{(Dt)^2}{2!} + \frac{(Dt)^3}{3!} + \cdots\\
&= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 4t & 0 \\ 0 & -t \end{bmatrix} + \frac{1}{2!}\begin{bmatrix} (4t)^2 & 0 \\ 0 & (-t)^2 \end{bmatrix} + \frac{1}{3!}\begin{bmatrix} (4t)^3 & 0 \\ 0 & (-t)^3 \end{bmatrix} + \cdots\\
&= \begin{bmatrix} 1 + 4t + \tfrac{1}{2!}(4t)^2 + \tfrac{1}{3!}(4t)^3 + \cdots & 0 \\ 0 & 1 + (-t) + \tfrac{1}{2!}(-t)^2 + \tfrac{1}{3!}(-t)^3 + \cdots \end{bmatrix}\\
&= \begin{bmatrix} e^{4t} & 0 \\ 0 & e^{-t} \end{bmatrix}
\end{aligned}$$
The matrix exponential is also nice if $A$ is diagonalizable.
Example 4.47
Compute $e^A$ for $A = \begin{bmatrix} 1 & 2 \\ 3 & 2 \end{bmatrix}$.

Solution  In Example 4.43, we found the eigenvalues of $A$ to be $\lambda_1 = 4$ and $\lambda_2 = -1$, with corresponding eigenvectors $\mathbf{v}_1 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$, respectively. Hence, with
$$P = [\mathbf{v}_1 \;\; \mathbf{v}_2] = \begin{bmatrix} 2 & -1 \\ 3 & 1 \end{bmatrix}$$
we have $P^{-1}AP = D = \begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix}$. Since $A = PDP^{-1}$, we have $A^k = PD^kP^{-1}$, so
$$\begin{aligned}
e^A &= I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots\\
&= PIP^{-1} + PDP^{-1} + \frac{1}{2!}PD^2P^{-1} + \frac{1}{3!}PD^3P^{-1} + \cdots\\
&= P\left(I + D + \frac{D^2}{2!} + \frac{D^3}{3!} + \cdots\right)P^{-1}\\
&= Pe^DP^{-1}\\
&= \begin{bmatrix} 2 & -1 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} e^4 & 0 \\ 0 & e^{-1} \end{bmatrix}\cdot\frac{1}{5}\begin{bmatrix} 1 & 1 \\ -3 & 2 \end{bmatrix}\\
&= \frac{1}{5}\begin{bmatrix} 2e^4 + 3e^{-1} & 2e^4 - 2e^{-1} \\ 3e^4 - 3e^{-1} & 3e^4 + 2e^{-1} \end{bmatrix}
\end{aligned}$$
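For a concrete check, the diagonalization route above can be compared with a library matrix exponential. The following NumPy/SciPy sketch is illustrative only (SciPy's `expm` uses its own general-purpose algorithm, not diagonalization); it reproduces the matrix just computed.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])

# Route 1: e^A = P e^D P^{-1} using the eigendecomposition of A
evals, P = np.linalg.eig(A)
eA_diag = P @ np.diag(np.exp(evals)) @ np.linalg.inv(P)

# Route 2: SciPy's general-purpose matrix exponential
eA_scipy = expm(A)

print(np.allclose(eA_diag, eA_scipy))   # True
print(eA_diag)   # approximately (1/5)[[2e^4 + 3/e, 2e^4 - 2/e], [3e^4 - 3/e, 3e^4 + 2/e]]
```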
We are now in a position to show that our bold (and seemingly foolish) guess at an "exponential" solution of $\mathbf{x}' = A\mathbf{x}$ was not so far off after all!

Theorem 4.41
Let $A$ be an $n \times n$ diagonalizable matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Then the general solution to the system $\mathbf{x}' = A\mathbf{x}$ is $\mathbf{x} = e^{At}\mathbf{c}$, where $\mathbf{c}$ is an arbitrary constant vector. If an initial condition $\mathbf{x}(0)$ is specified, then $\mathbf{c} = \mathbf{x}(0)$.

Proof  Let $P$ diagonalize $A$. Then $A = PDP^{-1}$ and, as in Example 4.47,
$$e^{At} = Pe^{Dt}P^{-1}$$
Hence, we need to check that $\mathbf{x}' = A\mathbf{x}$ is satisfied by $\mathbf{x} = e^{At}\mathbf{c} = Pe^{Dt}P^{-1}\mathbf{c}$. Now, everything is constant except for $e^{Dt}$, so
$$\mathbf{x}' = \frac{d\mathbf{x}}{dt} = \frac{d}{dt}\bigl(Pe^{Dt}P^{-1}\mathbf{c}\bigr) = P\,\frac{d}{dt}\bigl(e^{Dt}\bigr)\,P^{-1}\mathbf{c} \tag{10}$$
If
$$D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}
\qquad\text{then}\qquad
e^{Dt} = \begin{bmatrix} e^{\lambda_1 t} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2 t} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda_n t} \end{bmatrix}$$
Taking derivatives, we have
$$\frac{d}{dt}\bigl(e^{Dt}\bigr) = \begin{bmatrix} \lambda_1 e^{\lambda_1 t} & 0 & \cdots & 0 \\ 0 & \lambda_2 e^{\lambda_2 t} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n e^{\lambda_n t} \end{bmatrix}
= \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}\begin{bmatrix} e^{\lambda_1 t} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2 t} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda_n t} \end{bmatrix}
= De^{Dt}$$
Substituting this result into equation (10), we obtain
$$\mathbf{x}' = PDe^{Dt}P^{-1}\mathbf{c} = PDP^{-1}Pe^{Dt}P^{-1}\mathbf{c} = (PDP^{-1})(Pe^{Dt}P^{-1})\mathbf{c} = Ae^{At}\mathbf{c} = A\mathbf{x}$$
as required. The last statement follows easily from the fact that if $\mathbf{x} = \mathbf{x}(t) = e^{At}\mathbf{c}$, then
$$\mathbf{x}(0) = e^{A\cdot 0}\mathbf{c} = e^{O}\mathbf{c} = I\mathbf{c} = \mathbf{c}$$
since $e^{O} = I$. (Why?)

In fact, Theorem 4.41 is true even if $A$ is not diagonalizable, but we will not prove this. For example, see Linear Algebra by S. H. Friedberg, A. J. Insel, and L. E. Spence (Englewood Cliffs, NJ: Prentice-Hall, 1979). Computation of matrix exponentials for nondiagonalizable matrices requires the Jordan normal form of a matrix, a topic that may be found in more advanced linear algebra texts. Ideally, this short digression has served to illustrate the power of mathematics to generalize and the value of creative thinking. Matrix exponentials turn out to be very important tools in many applications of linear algebra, both theoretical and applied.
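As the remark above notes, the matrix exponential is defined (and Theorem 4.41 still holds) even when $A$ cannot be diagonalized, and a general-purpose routine such as SciPy's `expm` does not rely on diagonalization. A small illustrative sketch follows; the defective matrix $A = \begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix}$ and the initial condition are chosen here only as an example and are not from the text.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[3.0, 1.0],
              [0.0, 3.0]])   # defective: only one independent eigenvector

t = 0.5
x0 = np.array([1.0, 2.0])    # arbitrary initial condition, for illustration
x_t = expm(A * t) @ x0       # x(t) = e^{At} x(0), as in Theorem 4.41

# For this particular A, e^{At} = e^{3t} [[1, t], [0, 1]], so we can verify:
expected = np.exp(3 * t) * np.array([[1.0, t], [0.0, 1.0]]) @ x0
print(np.allclose(x_t, expected))   # True
```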
Discrete Linear Dynamical Systems
We conclude this chapter as we began it, by looking at dynamical systems. Markov chains and the Leslie model of population growth are examples of discrete linear dynamical systems. Each can be described by a matrix equation of the form
$$\mathbf{x}_{k+1} = A\mathbf{x}_k$$
where the vector $\mathbf{x}_k$ records the state of the system at "time" $k$ and $A$ is a square matrix. As we have seen, the long-term behavior of these systems is related to the eigenvalues and eigenvectors of the matrix $A$. The power method exploits the iterative nature of such dynamical systems to approximate eigenvalues and eigenvectors, and the Perron-Frobenius Theorem gives specialized information about the long-term behavior of a discrete linear dynamical system whose coefficient matrix $A$ is nonnegative.

When $A$ is a $2\times 2$ matrix, we can describe the evolution of a dynamical system geometrically. The equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$ is really an infinite collection of equations. Beginning with an initial vector $\mathbf{x}_0$, we have
$$\mathbf{x}_1 = A\mathbf{x}_0, \quad \mathbf{x}_2 = A\mathbf{x}_1, \quad \mathbf{x}_3 = A\mathbf{x}_2, \quad \ldots$$
The set $\{\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots\}$ is called a trajectory of the system. (For graphical purposes, we will identify each vector in a trajectory with its head so that we can plot it as a point.) Note that $\mathbf{x}_k = A^k\mathbf{x}_0$.
Example 4.48
Let $A = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.8 \end{bmatrix}$. For the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$, plot the first five points in the trajectories with the following initial vectors: (a) $\mathbf{x}_0 = \begin{bmatrix} 5 \\ 0 \end{bmatrix}$, together with the three further initial vectors whose trajectories are marked (b), (c), and (d) in Figure 4.22.

Solution  (a) We compute
$$\mathbf{x}_1 = A\mathbf{x}_0 = \begin{bmatrix} 2.5 \\ 0 \end{bmatrix}, \quad \mathbf{x}_2 = A\mathbf{x}_1 = \begin{bmatrix} 1.25 \\ 0 \end{bmatrix}, \quad \mathbf{x}_3 = A\mathbf{x}_2 = \begin{bmatrix} 0.625 \\ 0 \end{bmatrix}, \quad \mathbf{x}_4 = A\mathbf{x}_3 = \begin{bmatrix} 0.3125 \\ 0 \end{bmatrix}$$
These are plotted in Figure 4.22, and the points are connected to highlight the trajectory. Similar calculations produce the trajectories marked (b), (c), and (d) in Figure 4.22.

Figure 4.22
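Trajectories like these are easy to generate by repeated matrix-vector multiplication. A minimal NumPy sketch (illustrative only) that reproduces the points in part (a):

```python
import numpy as np

A = np.array([[0.5, 0.0],
              [0.0, 0.8]])

def trajectory(x0, steps):
    # Return [x_0, x_1, ..., x_steps] for the system x_{k+1} = A x_k.
    points = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        points.append(A @ points[-1])
    return points

for p in trajectory([5.0, 0.0], 4):
    print(p)      # [5, 0], [2.5, 0], [1.25, 0], [0.625, 0], [0.3125, 0]
```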
In Example 4.48, every trajectory converges to $\mathbf{0}$. The origin is called an attractor in this case. We can understand why this is so from Theorem 4.19. The matrix $A$ in Example 4.48 has eigenvectors $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ corresponding to its eigenvalues 0.5 and 0.8, respectively. (Check this.) Accordingly, for any initial vector
$$\mathbf{x}_0 = c_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
we have
$$\mathbf{x}_k = A^k\mathbf{x}_0 = c_1(0.5)^k\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2(0.8)^k\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
Because both $(0.5)^k$ and $(0.8)^k$ approach zero as $k$ gets large, $\mathbf{x}_k$ approaches $\mathbf{0}$ for any choice of $\mathbf{x}_0$. In addition, we know from Theorem 4.28 that because 0.8 is the dominant eigenvalue of $A$, $\mathbf{x}_k$ will approach a multiple of the corresponding eigenvector $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ as long as $c_2 \neq 0$ (the coefficient of $\mathbf{x}_0$ corresponding to $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$). In other words, all trajectories except those that begin on the $x$-axis (where $c_2 = 0$) will approach the $y$-axis, as Figure 4.22 shows.
Example 4.49
Discuss the behavior of the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ corresponding to the matrix $A = \begin{bmatrix} 0.65 & -0.15 \\ -0.15 & 0.65 \end{bmatrix}$.

Solution  The eigenvalues of $A$ are 0.5 and 0.8, with corresponding eigenvectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$, respectively. (Check this.) Hence, for an initial vector $\mathbf{x}_0 = c_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} -1 \\ 1 \end{bmatrix}$, we have
$$\mathbf{x}_k = c_1(0.5)^k\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2(0.8)^k\begin{bmatrix} -1 \\ 1 \end{bmatrix}$$
Once again the origin is an attractor, because $\mathbf{x}_k$ approaches $\mathbf{0}$ for any choice of $\mathbf{x}_0$. If $c_2 \neq 0$, the trajectory will approach the line through the origin with direction vector $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$. Several such trajectories are shown in Figure 4.23. The vectors $\mathbf{x}_0$ with $c_2 = 0$ are on the line through the origin with direction vector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, and the corresponding trajectory in this case follows this line into the origin.
Figure 4.23

Example 4.50
Discuss the behavior of the dynamical systems $\mathbf{x}_{k+1} = A\mathbf{x}_k$ corresponding to the following matrices:
(a) $A = \begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix}$   (b) $A = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$

Solution  (a) The eigenvalues of $A$ are 5 and 3, with corresponding eigenvectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$, respectively. Hence, for an initial vector $\mathbf{x}_0 = c_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} -1 \\ 1 \end{bmatrix}$, we have
$$\mathbf{x}_k = c_1 5^k\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 3^k\begin{bmatrix} -1 \\ 1 \end{bmatrix}$$
As $k$ becomes large, so do both $5^k$ and $3^k$. Hence, $\mathbf{x}_k$ tends away from the origin. Because the dominant eigenvalue of 5 has corresponding eigenvector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, all trajectories for which $c_1 \neq 0$ will eventually end up in the first or the third quadrant. Trajectories with $c_1 = 0$ start and stay on the line $y = -x$, whose direction vector is $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$. See Figure 4.24(a).
(b) In this example, the eigenvalues are 1.5 and 0.5, with corresponding eigenvectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$, respectively. Hence
$$\mathbf{x}_k = c_1(1.5)^k\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2(0.5)^k\begin{bmatrix} -1 \\ 1 \end{bmatrix}$$
If $c_1 = 0$, then
$$\mathbf{x}_k = c_2(0.5)^k\begin{bmatrix} -1 \\ 1 \end{bmatrix} \to \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad\text{as } k \to \infty$$
But if $c_1 \neq 0$, then the term $c_1(1.5)^k\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ eventually dominates, and such trajectories asymptotically approach the line $y = x$. See Figure 4.24(b).

Figure 4.24
In Example 4.50(a), all points that start out near the origin become increasingly large in magnitude because $|\lambda| > 1$ for both eigenvalues; $\mathbf{0}$ is called a repeller. In Example 4.50(b), $\mathbf{0}$ is called a saddle point because the origin attracts points in some directions and repels points in other directions. In this case, one eigenvalue satisfies $|\lambda| < 1$ and the other satisfies $|\lambda| > 1$. The next example shows what can happen when the eigenvalues of a real $2\times 2$ matrix are complex (and hence conjugates of one another).
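The classification just described depends only on the magnitudes of the eigenvalues, so it is easy to automate. A small NumPy helper (an illustration, not from the text) that labels the origin for a 2×2 system, applied to the matrices used above:

```python
import numpy as np

def classify_origin(A, tol=1e-12):
    # Classify the origin for x_{k+1} = A x_k using the eigenvalue magnitudes.
    mags = np.abs(np.linalg.eigvals(A))
    if np.all(mags < 1 - tol):
        return "attractor"
    if np.all(mags > 1 + tol):
        return "repeller"
    if mags.min() < 1 - tol and mags.max() > 1 + tol:
        return "saddle point"
    return "none of these"

print(classify_origin(np.array([[0.5, 0.0], [0.0, 0.8]])))   # attractor    (Example 4.48)
print(classify_origin(np.array([[4.0, 1.0], [1.0, 4.0]])))   # repeller     (Example 4.50(a))
print(classify_origin(np.array([[1.0, 0.5], [0.5, 1.0]])))   # saddle point (Example 4.50(b))
```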
Example 4.51
Plot the trajectory beginning with $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ for the dynamical systems $\mathbf{x}_{k+1} = A\mathbf{x}_k$ corresponding to the following matrices:
(a) $A = \begin{bmatrix} 0.5 & -0.5 \\ 0.5 & 0.5 \end{bmatrix}$   (b) $A = \begin{bmatrix} 0.2 & -1.2 \\ 0.6 & 1.4 \end{bmatrix}$

Solution  The trajectories are shown in Figure 4.25(a) and (b), respectively. Note that (a) is a trajectory spiraling into the origin, whereas (b) appears to follow an elliptical orbit.

Figure 4.25
The following theorem explains the spiral behavior of the trajectory in Example 4.51(a).

Theorem 4.42
Let $A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$. The eigenvalues of $A$ are $\lambda = a \pm bi$, and if $a$ and $b$ are not both zero, then $A$ can be factored as
$$A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
where $r = |\lambda| = \sqrt{a^2 + b^2}$ and $\theta$ is the principal argument of $a + bi$.

Proof  The eigenvalues of $A$ are
$$\lambda = \tfrac{1}{2}\bigl(2a \pm \sqrt{-4b^2}\bigr) = a \pm |b|i = a \pm bi$$
by Exercise 35(b) in Section 4.1. Figure 4.26 displays $a$, $b$, $r$, and $\theta$.

Figure 4.26

It follows that
$$A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = r\begin{bmatrix} a/r & -b/r \\ b/r & a/r \end{bmatrix} = \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
Geometrically, Theorem 4.42 implies that when $A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \neq O$, the linear transformation $T(\mathbf{x}) = A\mathbf{x}$ is the composition of a rotation
$$R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
through the angle $\theta$ followed by a scaling
$$S = \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix}$$
with factor $r$ (Figure 4.27). In Example 4.51(a), the eigenvalues are $\lambda = 0.5 \pm 0.5i$, so $r = |\lambda| = \sqrt{1/2} \approx 0.707 < 1$, and hence the trajectories all spiral inward toward $\mathbf{0}$. The next theorem shows that, in general, when a real $2\times 2$ matrix has complex eigenvalues, it is similar to a matrix of the form $\begin{bmatrix} a & -b \\ b & a \end{bmatrix}$.
Figure 4.27  A rotation followed by a scaling

For a complex vector $\mathbf{x} = \begin{bmatrix} z \\ w \end{bmatrix} = \begin{bmatrix} a + bi \\ c + di \end{bmatrix}$, we define the real part, $\operatorname{Re}\mathbf{x}$, and the imaginary part, $\operatorname{Im}\mathbf{x}$, of $\mathbf{x}$ to be
$$\operatorname{Re}\mathbf{x} = \begin{bmatrix} \operatorname{Re} z \\ \operatorname{Re} w \end{bmatrix} = \begin{bmatrix} a \\ c \end{bmatrix} \qquad\text{and}\qquad \operatorname{Im}\mathbf{x} = \begin{bmatrix} \operatorname{Im} z \\ \operatorname{Im} w \end{bmatrix} = \begin{bmatrix} b \\ d \end{bmatrix}$$

Theorem 4.43
Let $A$ be a real $2\times 2$ matrix with a complex eigenvalue $\lambda = a - bi$ (where $b \neq 0$) and corresponding eigenvector $\mathbf{x}$. Then the matrix $P = [\operatorname{Re}\mathbf{x} \;\; \operatorname{Im}\mathbf{x}]$ is invertible and
$$A = P\begin{bmatrix} a & -b \\ b & a \end{bmatrix}P^{-1}$$

Proof  Let $\mathbf{x} = \mathbf{u} + \mathbf{v}i$, so that $\operatorname{Re}\mathbf{x} = \mathbf{u}$ and $\operatorname{Im}\mathbf{x} = \mathbf{v}$. From $A\mathbf{x} = \lambda\mathbf{x}$, we have
$$A\mathbf{u} + A\mathbf{v}i = A\mathbf{x} = \lambda\mathbf{x} = (a - bi)(\mathbf{u} + \mathbf{v}i) = a\mathbf{u} + a\mathbf{v}i - b\mathbf{u}i + b\mathbf{v} = (a\mathbf{u} + b\mathbf{v}) + (-b\mathbf{u} + a\mathbf{v})i$$
Equating real and imaginary parts, we obtain
$$A\mathbf{u} = a\mathbf{u} + b\mathbf{v} \qquad\text{and}\qquad A\mathbf{v} = -b\mathbf{u} + a\mathbf{v}$$
Now $P = [\mathbf{u} \;\; \mathbf{v}]$, so
$$P\begin{bmatrix} a & -b \\ b & a \end{bmatrix} = [\mathbf{u} \;\; \mathbf{v}]\begin{bmatrix} a & -b \\ b & a \end{bmatrix} = [\,a\mathbf{u} + b\mathbf{v} \;\;|\; -b\mathbf{u} + a\mathbf{v}\,] = [\,A\mathbf{u} \;|\; A\mathbf{v}\,] = A[\,\mathbf{u} \;|\; \mathbf{v}\,] = AP$$
To show that $P$ is invertible, it is enough to show that $\mathbf{u}$ and $\mathbf{v}$ are linearly independent. If $\mathbf{u}$ and $\mathbf{v}$ were not linearly independent, then it would follow that $\mathbf{v} = k\mathbf{u}$ for some (nonzero complex) scalar $k$, because neither $\mathbf{u}$ nor $\mathbf{v}$ is $\mathbf{0}$. Thus
$$\mathbf{x} = \mathbf{u} + \mathbf{v}i = \mathbf{u} + k\mathbf{u}i = (1 + ki)\mathbf{u}$$
Now, because $A$ is real, $A\mathbf{x} = \lambda\mathbf{x}$ implies that
$$A\bar{\mathbf{x}} = \overline{A\mathbf{x}} = \overline{\lambda\mathbf{x}} = \bar{\lambda}\bar{\mathbf{x}}$$
so $\bar{\mathbf{x}} = \mathbf{u} - \mathbf{v}i$ is an eigenvector corresponding to the other eigenvalue $\bar{\lambda} = a + bi$. But
$$\bar{\mathbf{x}} = \overline{(1 + ki)\mathbf{u}} = (1 - \bar{k}i)\mathbf{u}$$
because $\mathbf{u}$ is a real vector. Hence, the eigenvectors $\mathbf{x}$ and $\bar{\mathbf{x}}$ of $A$ are both nonzero multiples of $\mathbf{u}$ and therefore are multiples of one another. This is impossible because eigenvectors corresponding to distinct eigenvalues must be linearly independent, by Theorem 4.20. (This theorem is valid over the complex numbers as well as the real numbers.) This contradiction implies that $\mathbf{u}$ and $\mathbf{v}$ are linearly independent and hence $P$ is invertible. It now follows that
$$A = P\begin{bmatrix} a & -b \\ b & a \end{bmatrix}P^{-1}$$
Theorem 4.43 serves to explain Example 4.51(b). The eigenvalues of
$$A = \begin{bmatrix} 0.2 & -1.2 \\ 0.6 & 1.4 \end{bmatrix}$$
are $0.8 \pm 0.6i$. For $\lambda = 0.8 - 0.6i$, a corresponding eigenvector is
$$\mathbf{x} = \begin{bmatrix} -1 - i \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} + \begin{bmatrix} -1 \\ 0 \end{bmatrix}i$$
From Theorem 4.43, it follows that for
$$P = \begin{bmatrix} -1 & -1 \\ 1 & 0 \end{bmatrix} \qquad\text{and}\qquad C = \begin{bmatrix} 0.8 & -0.6 \\ 0.6 & 0.8 \end{bmatrix}$$
we have $A = PCP^{-1}$ and $P^{-1}AP = C$.

For the given dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$, we perform a change of variable. Let
$$\mathbf{x}_k = P\mathbf{y}_k \qquad(\text{or, equivalently, } \mathbf{y}_k = P^{-1}\mathbf{x}_k)$$
Then
$$\mathbf{y}_{k+1} = P^{-1}\mathbf{x}_{k+1} = P^{-1}A\mathbf{x}_k = P^{-1}AP\mathbf{y}_k = C\mathbf{y}_k$$
Now $C$ has the same eigenvalues as $A$ (why?) and $|0.8 \pm 0.6i| = 1$. Thus the dynamical system $\mathbf{y}_{k+1} = C\mathbf{y}_k$ simply rotates the points in every trajectory in a circle about the origin, by Theorem 4.42. To determine a trajectory of the dynamical system in Example 4.51(b), we iteratively apply the linear transformation $T(\mathbf{x}) = A\mathbf{x} = PCP^{-1}\mathbf{x}$. The transformation can be thought of as the composition of a change of variable ($\mathbf{x}$ to $\mathbf{y}$), followed by the rotation determined by $C$, followed by the reverse change of variable ($\mathbf{y}$ back to $\mathbf{x}$). We will encounter this idea again in the application to graphing quadratic equations in Section 5.5 and, more generally, as "change of basis" in Section 6.3. In Exercise 96 of Section 5.5, you will show that the trajectory in Example 4.51(b) is indeed an ellipse, as it appears to be from Figure 4.25(b).

To summarize, then: If a real $2\times 2$ matrix $A$ has complex eigenvalues $\lambda = a \pm bi$, then the trajectories of the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ spiral inward if $|\lambda| < 1$ ($\mathbf{0}$ is a spiral attractor), spiral outward if $|\lambda| > 1$ ($\mathbf{0}$ is a spiral repeller), and lie on a closed orbit if $|\lambda| = 1$ ($\mathbf{0}$ is an orbital center).
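These classifications are easy to check numerically. The sketch below (illustrative only) computes $r = |\lambda|$ for the two matrices of Example 4.51 and, for part (b), verifies the factorization $A = PCP^{-1}$ with the $P$ and $C$ found above.

```python
import numpy as np

A_a = np.array([[0.5, -0.5], [0.5, 0.5]])
A_b = np.array([[0.2, -1.2], [0.6, 1.4]])

for name, A in [("(a)", A_a), ("(b)", A_b)]:
    r = np.abs(np.linalg.eigvals(A)[0])   # r = |lambda| for either complex eigenvalue
    print(name, r)   # (a) ~0.707 -> spiral attractor, (b) 1.0 -> orbital center

P = np.array([[-1.0, -1.0], [1.0, 0.0]])
C = np.array([[0.8, -0.6], [0.6, 0.8]])
print(np.allclose(A_b, P @ C @ np.linalg.inv(P)))   # True
```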
Ranking Sports Teams
In any competitive sports league, it is not necessarily a straightforward process to rank the players or teams. Counting wins and losses alone overlooks the possibility that one team may accumulate a large number of victories against weak teams, while another team may have fewer victories but all of them against strong teams. Which of these teams is better? How should we compare two teams that never play one another? Should points scored be taken into account? Points against? Despite these complexities, the ranking of athletes and sports teams has become a commonplace and much-anticipated feature in the media. For example, there are various annual rankings of U.S. college football and basketball teams, and golfers and tennis players are also ranked internationally. There are many copyrighted schemes used to produce such rankings, but we can gain some insight into how to approach the problem by using the ideas from this chapter.

To establish the basic idea, let's revisit Example 3.68. Five tennis players play one another in a round-robin tournament. Wins and losses are recorded in the form of a digraph in which a directed edge from $i$ to $j$ indicates that player $i$ defeats player $j$. The corresponding adjacency matrix $A$ therefore has $a_{ij} = 1$ if player $i$ defeats player $j$ and $a_{ij} = 0$ otherwise. For this tournament,
$$A = \begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}$$
We would like to associate a ranking $r_i$ with player $i$ in such a way that $r_i > r_j$ indicates that player $i$ is ranked more highly than player $j$. For this purpose, let's require that the $r_i$'s be probabilities (that is, $0 \leq r_i \leq 1$ for all $i$, and $r_1 + r_2 + r_3 + r_4 + r_5 = 1$) and then organize the rankings in a ranking vector
$$\mathbf{r} = \begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \end{bmatrix}$$
Furthermore, let's insist that player $i$'s ranking should be proportional to the sum of the rankings of the players defeated by player $i$. For example, player 1 defeated players 2, 4, and 5, so we want
$$r_1 = \alpha(r_2 + r_4 + r_5)$$
where $\alpha$ is the constant of proportionality. Writing out similar equations for the other players produces the following system:
$$\begin{aligned}
r_1 &= \alpha(r_2 + r_4 + r_5)\\
r_2 &= \alpha(r_3 + r_4 + r_5)\\
r_3 &= \alpha(r_1 + r_4)\\
r_4 &= \alpha r_5\\
r_5 &= \alpha r_3
\end{aligned}$$
Observe that we can write this system in matrix form as $\mathbf{r} = \alpha A\mathbf{r}$:
$$\begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \end{bmatrix} = \alpha\begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \end{bmatrix}$$
Equivalently, we see that the ranking vector $\mathbf{r}$ must satisfy $A\mathbf{r} = \frac{1}{\alpha}\mathbf{r}$. In other words, $\mathbf{r}$ is an eigenvector of $A$ corresponding to the eigenvalue $\frac{1}{\alpha}$. Furthermore, $A$ is a primitive nonnegative matrix, so the Perron-Frobenius Theorem guarantees that there is a unique ranking vector $\mathbf{r}$. In this example, the ranking vector turns out to be
$$\mathbf{r} = \begin{bmatrix} 0.29 \\ 0.27 \\ 0.22 \\ 0.08 \\ 0.14 \end{bmatrix}$$
so we would rank the players in the order 1, 2, 3, 5, 4.

By modifying the matrix $A$, it is possible to take into account many of the complexities mentioned in the opening paragraph. However, this simple example has served to indicate one useful approach to the problem of ranking teams.

The same idea can be used to understand how an Internet search engine such as Google works. Older search engines used to return the results of a search unordered. Useful sites would often be buried among irrelevant ones. Much scrolling was often needed to uncover what you were looking for. By contrast, Google returns search results ordered according to their likely relevance. Thus, a method for ranking websites is needed. Instead of teams playing one another, we now have websites linking to one another. We can once again use a digraph to model the situation, only now an edge from $i$ to $j$ indicates that website $i$ links to (or refers to) website $j$. So whereas for the sports team digraph, incoming directed edges are bad (they indicate losses), for the Internet digraph, incoming directed edges are good (they indicate links from other sites). In this setting, we want the ranking of website $i$ to be proportional to the sum of the rankings of all the websites that link to $i$. Using the same digraph to represent just five websites, we have
$$r_4 = \alpha(r_1 + r_2 + r_3)$$
for example. It is easy to see that we now want to use the transpose of the adjacency matrix of the digraph. Therefore, the ranking vector $\mathbf{r}$ must satisfy $A^T\mathbf{r} = \frac{1}{\alpha}\mathbf{r}$ and will thus be the Perron eigenvector of $A^T$. In this example, we obtain
$$A^T = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 \end{bmatrix} \qquad\text{and}\qquad \mathbf{r} = \begin{bmatrix} 0.14 \\ 0.08 \\ 0.22 \\ 0.27 \\ 0.29 \end{bmatrix}$$
so a search that turns up these five sites would list them in the order 5, 4, 3, 1, 2. Google actually uses a variant of the method described here and computes the ranking vector via an iterative method very similar to the power method (Section 4.5).
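The Perron eigenvector mentioned above can be approximated exactly as the last sentence suggests, by power iteration (Section 4.5). The following NumPy sketch is illustrative only; it applies the iteration to the web-ranking matrix $A^T$ above, normalizing each iterate so that its entries sum to 1, so the limit is the ranking vector directly.

```python
import numpy as np

AT = np.array([[0, 0, 1, 0, 0],
               [1, 0, 0, 0, 0],
               [0, 1, 0, 0, 1],
               [1, 1, 1, 0, 0],
               [1, 1, 0, 1, 0]], dtype=float)

r = np.ones(5) / 5                 # start from the uniform distribution
for _ in range(100):               # power iteration
    r = AT @ r
    r = r / r.sum()                # keep the entries summing to 1

print(np.round(r, 2))              # approximately [0.14, 0.08, 0.22, 0.27, 0.29]
print(np.argsort(-r) + 1)          # ranking order: [5, 4, 3, 1, 2]
```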
17. If all of the sur".ival rates s, are nonzero, let
l. [~~] 3.
5.
U~]
4.
0.1
0
0.5
0. 5
I
0
OA 0 0.5
p =
• !
0
I
0
0
0
I
0
•
0.5
I
0
6. 0.5
0
I
0
0
0
0
0
0
"0
0 5152
0 0
o
0
o
0
2. [~ i] j
0
I
Which of the stochastic matrices in Exercises 1-6 are regular?
. .. • ••
Compute p - I LPand usc it to fi nd the characterist ic polynomial of L. ( Him: Refer 10 Exercise 32 in Section 4.3.) 18. Verify that an I."rgenvC(lor of L correspond ing to A, is I
III Exerci$c$ 7-9, P is the trmlS/tlOli maf";x of(I reglilM Markov cll"ill. Filld the /o/lg rallge lrilllsitiofllll(l/rlX L of P. 8. P ""
7. P ""
sdA,
., .
1
1
1
StsJ A i
1
!
1
SI~$J A t
2
1
J
o ! ! 0.2 0.3 0.4 9. P = 0.6 0.1 0.4 0.2 0.6 0.2 10. Prove that the steady state probabili ty vector o f a regular Markov chain is unique. ( Hint: Use Theorem 4.33 or Theorem 4.34. )
P8,1I111I •• Gr.wlll In Exerci$es //- /4, calm/ale the positive elgellvalue tII,(1 a correspolUlmg lwsiuve eigellvector of ti,e Les/re matrix L.
11. [00.5 0']
12. [I05 01.5] L =
t =
074 13. L =
05
0
0
14. L =
15
3
~
0
0
00.50 o i o 15. If a Leslie matrix has a unique positive eigenvalue A,. \"hat IS the significance fo r the populalion if A, > I? A1 < I?A 1 = 1? 16. Verify that the charaCieristic polynomIal of the Leslie matrix L in equation ( 3) is
CtP.) = (-I )"(A~ - b,A~ - 1 - b2$I A ~-z - bJ$,~ A "-J - ... - b,.s t~" 5,._,) (/'Ji"t: Usc mathematical induction and expand dc t( L - AI) along the last column. )
(Hint: Combine Exercise 17 above with Exercise 32 in Section 4.3 and Exercise 46 in Section 4.4.)

In Exercises 19-21, compute the steady state growth rate of the population with the Leslie matrix L from the given exercise. Then use Exercise 18 to help find the corresponding distribution of the age classes.
19. Exercise 19 in Section 3.7
20. Exercise 20 in Section 3.7
21. Exercise 24 in Section 3.7
22. Many species of seal have suffered from commercial hunting. They have been killed for their skin, blubber, and meat. The fur trade, in particular, reduced some seal populations to the point of extinction. Today, the greatest threats to seal populations are decline of fish stocks due to overfishing, pollution, disturbance of habitat, entanglement in marine debris, and culling by fishery owners. Some seals have been declared endangered species; other species are carefully managed. Table 4.7 gives the birth and survival rates for the northern fur seal, divided into 2-year age classes. [The data are based on A. E. York and J. R. Hartley, "Pup Production Following Harvest of Female Northern Fur Seals," Canadian Journal of Fisheries and Aquatic Science, 38 (1981), pp. 84-90.]

Table 4.7
Age (years)   Birth Rate   Survival Rate
0-2           0.00         0.91
2-4           0.02         0.88
4-6           0.70         0.85
6-8           1.53         0.80
8-10          1.67         0.74
10-12         1.65         0.67
12-14         1.56         0.59
14-16         1.45         0.49
16-18         1.22         0.38
18-20         0.91         0.27
20-22         0.70         0.17
22-24         0.22         0.15
24-26         0.00         0.00

(a) Construct the Leslie matrix $L$ for these data and compute the positive eigenvalue and a corresponding positive eigenvector.
(b) In the long run, what percentage of seals will be in each age class and what will the growth rate be?

Exercise 23 shows that the long-run behavior of a population can be determined directly from the entries of its Leslie matrix.
23. The net reproduction rate of a population is defined as
$$r = b_1 + b_2 s_1 + b_3 s_1 s_2 + \cdots + b_n s_1 s_2\cdots s_{n-1}$$
where the $b_i$ are the birth rates and the $s_j$ are the survival rates for the population.
(a) Explain why $r$ can be interpreted as the average number of daughters born to a single female over her lifetime.
(b) Show that $r = 1$ if and only if $\lambda_1 = 1$. (This represents zero population growth.) [Hint: Let
$$g(\lambda) = \frac{b_1}{\lambda} + \frac{b_2 s_1}{\lambda^2} + \cdots + \frac{b_n s_1 s_2\cdots s_{n-1}}{\lambda^n}$$
Show that $\lambda$ is an eigenvalue of $L$ if and only if $g(\lambda) = 1$.]
(c) Assuming that there is a unique positive eigenvalue $\lambda_1$, show that $r < 1$ if and only if the population is decreasing and $r > 1$ if and only if the population is increasing.

A sustainable harvesting policy is a procedure that allows a certain fraction of a population (represented by a population distribution vector x) to be harvested so that the population returns to x after one time interval (where a time interval is the length of one age class). If h is the fraction of each age class that is harvested, then we can express the harvesting procedure mathematically as follows: If we start with a population vector x, after one time interval we have Lx; harvesting removes hLx, leaving
$$L\mathbf{x} - hL\mathbf{x} = (1 - h)L\mathbf{x}$$
Sustainability requires that
$$(1 - h)L\mathbf{x} = \mathbf{x}$$
24. If $\lambda_1$ is the unique positive eigenvalue of a Leslie matrix $L$ and $h$ is the sustainable harvest ratio, prove that $h = 1 - 1/\lambda_1$.
25. (a) Find the sustainable harvest ratio for the woodland caribou in Exercise 24 in Section 3.7.
(b) Using the data in Exercise 24 in Section 3.7, reduce the caribou herd according to your answer to part (a). Verify that the population returns to its original level after one time interval.
26. Find the sustainable harvest ratio for the seal in Exercise 22. (Conservationists have had to harvest seal populations when overfishing has reduced the available food supply to the point where the seals are in danger of starvation.)
27. Let $L$ be a Leslie matrix with a unique positive eigenvalue $\lambda_1$. Show that if $\lambda$ is any other (real or complex) eigenvalue of $L$, then $|\lambda| \leq \lambda_1$. [Hint: Write $\lambda = r(\cos\theta + i\sin\theta)$ and substitute it into the equation $g(\lambda) = 1$, as in part (b) of Exercise 23. Use De Moivre's Theorem and then take the real part of both sides. The Triangle Inequality should prove useful.]
The Perron-Frobenlus Tbeorem
(b ) Show that if A is primitive, then the o ther eigen-
values are aU less than k in absolu te value. (Hmt: Adapt Theorem 4.31.)
/11 Exercises 28- 31, find the Perron root and tile correspondillg PerrOIl eigenvector of A, 28, A ::
30, A ::
[~ ~l
29, A =
0
I
I
I
0
I
I
I
0
31. A ::
[;
39. Explain the results of yo ur exploratio n in Sectio n 4,0 III light of Exercises 36-38 and Se
~l
2
I
I
I
I
0
I
0
I
40. Let A, B, C, and 0 be PIX /I matrices, x be m IR", and c be a scalar. Prove the follow ing lTlatrix inequalities:
lal IcA I ~ IeIIAI 1<1 IAxI " IAllxl
lise I/US crilerion 10 tielermllle whether rile malrix A 15 Irredll clhle. If A is redllcible, find a permutfltioll of ils rows and colllll1l1S t/wtl'lIlS A illlo the block form
linear Rlcurrinci .,lltI.l. /n Exercises 4J- 44, wrile oul tire first six terms of tile
seql/('tIce defined by lire recurrence relal io/! wilh the given imlial conditions.
W~l
34. A =
41. Xu = I, Xn == 2x,,_1 fo r 11 i?:: I
0
0
I
0
0
0
I
0
0
0
0
I
0
0
I
I
I
0
0
0
I
I
0
0
33. A =
(dl IABI < IAII BI
(e) lf A > B > Qand ei?:: D 2: 0, then AC 2: BD i?:: 0,
It call be sllOwl/ that a nonnegatil1e /I X /I mat rix is irretillCilJle if and Dilly if ( / + A) ,,-I > 0. b. Exercises 32-35,
32. A ~
Ibl IA + BI siAl + 181
42. a l
= 128, an = a n_ I / 2
43, Yo = 0,11 ... I, y" =
for
II
2:
2
Y,,-2 for N 2' 2
Yn-l -
0
I
0
0
I
0
0
0
0
I
0
0
0
0
I
0
0
0
0
0
I
0
I
0
0
0
0
I
I
0
I
0
I
I
0
0
0
I
hi Exercises 45-50, solve Ihe recllrrence relatioll IVillr tile givell Irlilial cOlldilions,
0
0
I
I
0
0
0
I
0
0
45. Xc!
I
0
0
0
0
0
0
0
I
I
46, Xu = 0,
35. A ::
44. /)0
36. (a) If A is the adjacency matrtx of n graph G, show that A is Irreducible if a nd only if G is connected . (A graph is c0111lected if there is a path between every pair of vertices.) (b) Which of the graphs in Section 4.0 have an c irreducible adjacency matrix? \Vhich have a prrm itive adjacency matrix? 37. Let G be a bipa rtite graph with adjacency matrix A. (a) Show that A is no t prim itive. (b) Show that if A is an eigenvalue of A, so is - A, \ Hil1l: Use Exercise 60 in Section 3.7 and partition an eigenvector fo r A so that it is compatible with this partitioning of A. Use this partitioning to fi nd an eigenvector fo r - A. I 38. A graph is called k-regl,{ar If k edges meet at each vertex. Let G be a k-rcgular graph. (a) Show that the adjacency matrix A of G has A = k as an eigenva lue. (1'/1111: Adapt Theorem 4.30.)
= I , /)1 = 1, b
n
= 0, x 1 = XI
S,x"
49.
"
= 3x
n_
1
= I, x" = 4xn_ 1
47. YI = \ 'Y2 = 6,y" 48. ('0
= 2bn _ 1 + b"_2 for II 2: 2
+ 4X n_l -
fOTIi
>
2
3X,,_2 for /I i?:: 2
=
4Yn_1 - 4Y,,_2 for 1/ i?:: 3
= 4, " I = I, a" =
a,,_1 - a,,_z/4 for II i?:: 2
bo = 0, bl
= I, b" = 2b n _ 1
+ 2b,,_2 for"
2:
2
50. The recu rrence relation in Exercise 43. Show that your solut ion agrees with the answer to Exercise 43, 5 1. Complete the proof of Theorem 4.38(a ) by showing that jf the recurrence relation x" = ax"_ 1 + bX,,_2has distlilct eigenvalues Al '\2' then the solution will be
'*
of the form
( Hilll: Show that the method of Example 4.40 wo rks in general.)
52. Show that for any choice of mltial conditio ns Xu = r and x, = S, the scalars c i and C:! can be fo und, as stated jn Theorem 4,38(a) and (b).
Section 4.6
Applications and the Perron-Frobenius Theorem
T he area of the square is 64 square u mls, but the rectangle's area is 65 square u nits! Where did the extra square coille fro m? (Him: What does this have to do wi th the Fibonacci sequence?)
53. The Fibonacci recurrence f" = /,,- 1+ /,,-2 has the associated matrix equation x ~ = Ax n _ p where
(a) With fo = 0 and f.. = I, use mathematical ind uction to prove that
A"
~ [f"+' f.
f,
/~- I
359
54. You have a supply of three kmds of tiles: two ki nds of 1 X2 tiles and one kind of I X 1 tile, as shown in Figure 4.29.
1 figure 4.29
for ,11111 <::!: I. (b) Using part (a), prove that
Let twbe the number of different ways to cover a I X " rectangle with these tiles. For example, FIgure 4.30 shows that 13 = 5. fo r all II <::!: I. [This is called Cassini's Identity, after the astronomer G iovanni Domenico Cassini ( 1625-17 12). Cassini was born in haly but, on the invitation of Louis XIV, moved in 1669 to France, where he became director o f the Paris Observatory. He became a French citizen and adopted the French version of his name: Jean- Dominique Cassini. Mathematics was one of his many interests o ther than astronomy. Cassini's Iden tity was published in 1680 in a paper submitted to the Royal Academy of Sciences in Paris.] (c) An SXS checkerboard can be dissected as shown in Figure 4.28(a ) and the pieces reassembled to form the 5X 13 rectangle in Figure 4.2S(b).
I
-
-
t :-r ,
-+
I-
tt -
Jl ?'\ ,
I I
~
I
-I 1
55_ You have a supply of I X2 domi noes with which to cover a 2 X n rectangle. Let dn be the number of different ways to cover the rectangle. For example, Figure 4.31 shows that d3 = 3.
+--
•
(b)
fllure 4.Z8
flgur. 4.30 The five ways 10 tile a I x 3 rectangle
1 1-
(Does to make ally sense? If so, what is it?) (b) Set up a second o rd er recu rrence relation for tn' (c) Using II and t1 as the initial conditions. sol ve the recurrence relation in part (b ). Check your answer against the d ata in part (a).
, -,---
i
+ .
~
,
I
. ..
(a) Find tl" ..• ' 5'
-
•
....;.-
-
~
(a) Find d p . . . , ds. (Does ~ make any sense? If so, what is it?) (b) Sel up a second order recu rrence relation fo r dn' (e) Using d l a nd d2 as the initial conditions, solve the recu rre nce relation in part (b) . Check you r answer against the data in part (a).
3611
Chapter 4
Eigenll(llues and eIgenvectors
The two bacteria compete for food and space but do not feed on each other. I f x == x { t) and y "" y( t) are the numbers of the str;lins at time I days, the growth rates of the two populations arc given by the system
1
x' ""
y' = -O.2x
The thrt'c ways to (01lt'f (I 2X3 rcct3ngIe WIth I X2 domInoes
56. In Example 4041 , find eigenvectors V I and v1 corresponding to AI = " k ::;:
[tJ
2
and Az =
1 - VS 2
• With
verify formula 2 in Section 4.5. That is.
show that, for some scalar cl'
...
In Exercises 57-62. fintl tile general wilitlOlI to the gIven system of tli/fercmitl/ et/rmtiolls. Tlren find tire specific sollltion that mtisftes tire initial c01rditiottS. (Consider all /llIIctiotlS to be /liller/otIS of t.j x = 2x =
+ 3r, x(0) - 0 + 2y. y(O) - 5
58. x' - 2x - y, )1 = - x+ 2y, 59, i l = XI
xl
+ "
y(O) x1 (0)
= I ===
60. '" ::;: YI - Y2, Yl::;: YI + )'l '
y-
y'=x+ z'=x+y,
62. x'::;: x + y' = x - 2y
z' = 3x+
x(0) -
z,
:], b- [=:~]. X(O) -[~~] -1
1
y(O) - o ,(0) - - 1
+ z,
to tire two populations for the given A andb lind Illitial con di tIOns x(O). (Flfst show t/ral there are cOll stants (/ and b such that rlre substitutions x = u + a ami y = v + b convert the system i llto an equivaiem aile with 110 CO/lSIan/ terms.)
66. A ::;: [ -1
Y2(0) == I z,
" "" [;] and b is a constant vector. Determine wlwt happens
I
YI(O) == I ~
Exercises 65 and 66, species X preys 011 species Y. The sizes of tile populations /Ire represented by x = x( t) a"d y = yet) . "f/ie growtlr rate of etlch population is govemed by tire system of dijferelllia/ equ(rtiolls ,,' ::;:; Ax + b, where III
65_ A =
X2. X2(0) "" 0
= XI -
61. x'
x(0) - 1
O.4x - 0.2y
Determine what happens to these two populations.
C.V.
Sut,.s olll.,.r DIII,rllll.1 Eallllols
57. x' y'
= -O.8x + OAy
y' -
. x,
hm k = A,
1.5y
64. Two species, X and Y, live in a symbIOtic relationship. That is. neither species can survive on its own and each depends o n the other for its survival. Imtially, there are 15 of X and 10 ofY. lf x = x(t) and y = yet) arc the sizes of the populations at time t months. the growth rates of the two populatio ns are given by the system x'
Moo
+
(a) Determine what happens to these two populations by solving the system of differential equations. (b) Explore the effect of changing the in itial popu lations by letting x(0) = a and ){O) = b. Describe what happens 10 the popuJations in terms of a and b.
figure 4.31
I +VS
I.2x - 0.2y
x(0) - 2 y(0) - 3 z(O) - 4
63. A scientist places two strains of bacteria, X and Y, in a petri dish. Initially, there arc 400 of X and 500 ofY.
67. Let $x = x(t)$ be a twice-differentiable function and consider the second order differential equation
$$x'' + ax' + bx = 0 \tag{11}$$
(a) Show that the change of variables $y = x'$ and $z = x$ allows equation (11) to be written as a system of two linear differential equations in $y$ and $z$.
(b) Show that the characteristic equation of the system in part (a) is $\lambda^2 + a\lambda + b = 0$.
68. Show that there is a change of variables that converts the $n$th order differential equation
$$x^{(n)} + a_{n-1}x^{(n-1)} + \cdots + a_1 x' + a_0 x = 0$$
into a system of $n$ linear differential equations whose coefficient matrix is the companion matrix $C(p)$ of the polynomial $p(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0$. [The notation $x^{(k)}$ denotes the $k$th derivative of $x$. See Exercises 26-32 in Section 4.3 for the definition of a companion matrix.]
solution of the give" eql/{ltiofl. 69. x" - Sx' + 6x = 0 70. x" + 4x' +3x=O
L.5
-,
79. A = [
0.'
80. A = [ 0.5
0.2 81. A = [ - 0.2
0.4] 0.8
82. A =[ O
1.2
3&1
0.9]
0.5
-1.5 ] 3.6
1/1 Exercises 83-86, the givell lIlatrix is of the form
a
A = [b
-b]a ' I" each case,
A call be factored liS the
product of a scaling matrix alld a rotatiot! matrix. Fmd the scaling/actor r and the allgle 0 of rotatiOIl. Sketch the first fo ur PO;fHS of the trajectory for the dYllamical system Xu r "" Axt
with Xo = [ : ] "nd classify tlle origm (IS a spIral
III Exercises 71-74, wIve tile system of differential equations m the given exerCIse using TI/Core", 4 41.
atrractor, spiral repeller, or orVital center.
71. Exercise 57
72. Exercise 58
83. A :: [:
73. Exercise 61
74. Exercise 62
84. A =
-: ]
v'3,3]
Dlscr •• e Linear DJnamlCal SVS ....II
[- °05 0.5] 0
_ [-v'3/2 - '/2 ] 86. A '/2 -v'3/2
III Exercises 75-82, cOllsider the dynamical System In ExerCIses 87-90, find all illvertible matrix P and a
" h i = AXt ·
(a) Compute ami plot Xo, " I '
X 2, X J
(b) Compute aud /llot Xo-> X I> x2, X l for Xo =
[' 3']
-!]
sl/cli (lwI A = PCp - I.
Sketch the first six points of t/ie traJlxtory for tire dynamical
[~].
system Xk+ I =
(c) Usiug eigenvailles alld eigellvectors, classify the origin as (HI anTactor, repeller, s(I(ldle poillt, or Hone of these. (tl) Sketch several typicnl trajectories of the system. 75. A=0
[~
matrix C of the form C =
for Xo = [:].
[0.5 -0.5] 0.5
76.A =O
- 4 78.A = [ I
Ax~ with "tl =
a spirol (ltlraC/or, spiral repeller, or orbital cellter.
1 22 0]
87.A == [ 0.1 - 002 ] 0.1 0.3
88.A = [
89. A=[: -~ ]
90.A== [~~]
..
~-
R
[:] (llIti cltlssify the origill (IS
~
- ,"
.....
-
-
',~,
.
Key Definitions and Concepts
adjoint of a matrix, 275
algebraic multiplicity of an eigenvalue, 291
characteristic equation, 289
characteristic polynomial, 289
cofactor expansion, 265
Cramer's Rule, 273-274
determinant, 262-264
diagonalizable matrix, 300
eigenvalue, 253
eigenvector, 253
eigenspace, 255
Fundamental Theorem of Invertible Matrices, 293
geometric multiplicity of an eigenvalue, 291
Gerschgorin's Disk Theorem, 318
Laplace Expansion Theorem, 265
power method (and its variants), 308-316
properties of determinants, 268-273
similar matrices, 298
362
Chapter 4
Eigenvalues and Eigenvectors
Review Questions I. Mark each of the following statements true or false:
(a) For all square matrices A, d et( - A) = - de t A. (b ) If A and B are 11 x II matrices, then det(AB) = d et ( BA). (c) If A and B are nXn matrices whose columns are the same but III different o rders, then det B = - det A. (d) If A is invertible, then det(A- I) = d et AT. (e) If 0 is the only eIgenvalue of a square matrix A, then A is the zero matrix. (0 Two eigenvecto rs co rresponding to the same eigenvalue must be linearly dependent. (g) If an n X PI matrix has n distinct eigenval ues, then it m ust be diagonalizable. (h ) If an "X II matrix is diagonalizable, then it must have 11 distinct eigenvalues. (i) Similar matrices have the sam e eigenvectors. (j) If A and B are two 11 X n matrices with the sam e red uced row echelon form, then A is SImilar to B.
2. LetA =
I )
5
3 7
7
5 9
11
(a) Compu te det A by cofactor expansion alo ng any
row or column. (b) Comp ute de t A by fi rst reducing A to triangula r form.
3d 2, - 4f f " b , 3. If d e f = 3, find 3(1 2b - 4c c gir l 3g 211 - 4i j 4. Let A and B be 4 X4 mat rices with d et A = 2 and de t 8 = - i. Find d et C for the indicated mat rix C: (a) C = (A8) - '
(b) C= A'B(3A')
5. If A is a skew-symmetric /I X PI matri x and
n is odd,
prove that det A = O.
6. Fi nd all values of k for wh ich
1
- 1
I 2
I 4
2
k = O. Ii'
111 Questlolls 7 and 8, sholll that x is all eigcllvector of A and
filld the corrcspolldillg eigellvalile.
7. X = [aA =[~ 8. x =
3 - I ,A =
2
10
- 6
3
3
4
- 3
o
0 -2
9. Let A =
(a) Find the characteristic polyno mial of A. (b) Find all of the eigenvalues of A.
(e) Find a basis for each of the eigenspaces of A. (d) Determine whethe r A is diagonalizable. If A is not diagona lizable, explain why not. If A is d iagonallzable, find an invert ible matrix P and a d iagonal matrix Dsuch tha t P- IA P "" D.
10. If A is a 3X3 d iago nalizable m at rix wit h eigenvalues - 2,3, and 4, fi nd det A. 11. If A is a 2X2 mat rix with eigenvalues Al =
and correspo nding eigenvectors V I
find A-S[ ~].
t,Al ""
- ],
=[:].v,=[_:].
12. If A is a diago nalizable matm and all of its eigenval ues satisfy ~ A~ < I, prove that A" approaches the- zero matrix as n gets large.
111 Questions 1~ 1 5, determine, with reasons, whether A IS similar to B. If A - B, give an invertible matrix P such tlrat p- 1AP=B. 13. A = 14. A =
15. A =
[~ ~]. B =[~ ~] [~ ~].B= [~ ~] 1
1
0
1
I
0
0
I
1
, B= 0
1
0
0
0
I
0
0
1
16. Let A =
[~ ~]. Find all values of k fo r which:
(a) A has eigenvalues 3 and - \. (b) A has an eigenvalue with algebraic multiplicity 2. (e) A has no feal eigenval ues.
17. If A3 = A, what are the possible eigenvalues of A? 18. If a square matrix A has two equal rows, why must A
have 0 as one of its eigenvalues?
~l \3 - 5
- 5
19. If x is an eigenvector o f A with eigenvalue A = 3, show
- 60
- 45
18 - 40
15
-32
that x IS also an eigenvector of A l the correspond ing eige nvalue?
-
SA
+
21. What is
20. If A is similar to B with P- IAP = B and x is an eigenvector of A, show that p - IX is an eigenvecto r of B.
rlho
5  Orthogonality

... that sprightly Scot of Scots, Douglas, that runs a-horseback up a hill perpendicular
William Shakespeare, Henry IV, Part I, Act II, Scene IV
5.0 Introduction: Shadows on a Wall
In this chapter, we will extend the notion of orthogonal projection that we encountered first in Chapter 1 and then again in Chapter 3. Until now, we have discussed only projection onto a single vector (or, equivalently, the one-dimensional subspace spanned by that vector). In this section, we will see if we can find the analogous formulas for projection onto a plane in $\mathbb{R}^3$. Figure 5.1 shows what happens, for example, when parallel light rays create a shadow on a wall. A similar process occurs when a three-dimensional object is displayed on a two-dimensional screen, such as a computer monitor. Later in this chapter, we will consider these ideas in full generality.

Figure 5.1  Shadows on a wall are projections

To begin, let's take another look at what we already know about projections. In Section 3.6, we showed that, in $\mathbb{R}^2$, the standard matrix of a projection onto the line through the origin with direction vector $\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$ is
$$P = \begin{bmatrix} d_1^2/(d_1^2 + d_2^2) & d_1 d_2/(d_1^2 + d_2^2) \\ d_1 d_2/(d_1^2 + d_2^2) & d_2^2/(d_1^2 + d_2^2) \end{bmatrix}$$
Hence, the projection of the vector $\mathbf{v}$ onto this line is just $P\mathbf{v}$.

Problem 1  Show that $P$ can be written in the equivalent form
$$P = \begin{bmatrix} \cos^2\theta & \cos\theta\sin\theta \\ \cos\theta\sin\theta & \sin^2\theta \end{bmatrix}$$
(What does $\theta$ represent here?)

Problem 2  Show that $P$ can also be written in the form $P = \mathbf{u}\mathbf{u}^T$, where $\mathbf{u}$ is a unit vector in the direction of $\mathbf{d}$.

Problem 3  Using Problem 2, find $P$ and then find the projection of $\mathbf{v} = \begin{bmatrix} 3 \\ -4 \end{bmatrix}$ onto the lines with the following unit direction vectors:

Problem 4  Using the form $P = \mathbf{u}\mathbf{u}^T$, show that (a) $P^T = P$ (i.e., $P$ is symmetric) and (b) $P^2 = P$ (i.e., $P$ is idempotent).
Problem 5  Explain why, if $P$ is a $2\times 2$ projection matrix, the line onto which it projects vectors is the column space of $P$.

Now we will move into $\mathbb{R}^3$ and consider projections onto planes through the origin. We will explore several approaches. Figure 5.2 shows one way to proceed. If $\mathcal{P}$ is a plane through the origin in $\mathbb{R}^3$ with normal vector $\mathbf{n}$ and if $\mathbf{v}$ is a vector in $\mathbb{R}^3$, then $\mathbf{p} = \operatorname{proj}_{\mathcal{P}}(\mathbf{v})$ is a vector in $\mathcal{P}$ such that $\mathbf{v} - c\mathbf{n} = \mathbf{p}$ for some scalar $c$.

Figure 5.2  Projection onto a plane
Problem 6  Using the fact that $\mathbf{n}$ is orthogonal to every vector in $\mathcal{P}$, solve $\mathbf{v} - c\mathbf{n} = \mathbf{p}$ for $c$ to find an expression for $\mathbf{p}$ in terms of $\mathbf{v}$ and $\mathbf{n}$.

Problem 7  Use the method of Problem 6 to find the projection of
$$\mathbf{v} = \begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}$$
onto the planes with the following equations:
(a) $x + y + z = 0$   (b) $x - 2z = 0$   (c) $2x - 3y + z = 0$
Another approach to the problem of finding the projection of a vector onto a plane is suggested by Figure 5.3. We can decompose the projection of $\mathbf{v}$ onto $\mathcal{P}$ into the sum of its projections onto the direction vectors for $\mathcal{P}$. This works only if the direction vectors are orthogonal unit vectors. Accordingly, let $\mathbf{u}_1$ and $\mathbf{u}_2$ be direction vectors for $\mathcal{P}$ with the property that
$$\mathbf{u}_1\cdot\mathbf{u}_2 = 0 \qquad\text{and}\qquad \|\mathbf{u}_1\| = \|\mathbf{u}_2\| = 1$$

Figure 5.3

By Problem 2, the projections of $\mathbf{v}$ onto $\mathbf{u}_1$ and $\mathbf{u}_2$ are
$$\mathbf{p}_1 = (\mathbf{u}_1\mathbf{u}_1^T)\mathbf{v} \qquad\text{and}\qquad \mathbf{p}_2 = (\mathbf{u}_2\mathbf{u}_2^T)\mathbf{v}$$
respectively. To show that $\mathbf{p}_1 + \mathbf{p}_2$ gives the projection of $\mathbf{v}$ onto $\mathcal{P}$, we need to show that $\mathbf{v} - (\mathbf{p}_1 + \mathbf{p}_2)$ is orthogonal to $\mathcal{P}$. It is enough to show that $\mathbf{v} - (\mathbf{p}_1 + \mathbf{p}_2)$ is orthogonal to both $\mathbf{u}_1$ and $\mathbf{u}_2$. (Why?)

Problem 8  Show that $\mathbf{u}_1\cdot(\mathbf{v} - (\mathbf{p}_1 + \mathbf{p}_2)) = 0$ and $\mathbf{u}_2\cdot(\mathbf{v} - (\mathbf{p}_1 + \mathbf{p}_2)) = 0$. (Hint: Use the alternative form of the dot product, $\mathbf{x}\cdot\mathbf{y} = \mathbf{x}^T\mathbf{y}$, together with the fact that $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal unit vectors.)
It follows from Problem 8 and the comments preceding it that the matrix of the projection onto the subspace $\mathcal{P}$ of $\mathbb{R}^3$ spanned by orthogonal unit vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ is
$$P = \mathbf{u}_1\mathbf{u}_1^T + \mathbf{u}_2\mathbf{u}_2^T \tag{1}$$
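Equation (1) is easy to experiment with numerically. The sketch below is an illustration, not part of the exploration; it builds $P = \mathbf{u}_1\mathbf{u}_1^T + \mathbf{u}_2\mathbf{u}_2^T$ for the plane $x + y + z = 0$ from Problem 7(a), using one convenient choice of orthonormal direction vectors (an assumption made here, not necessarily the pair given in Problem 9), and projects $\mathbf{v} = (1, 0, -2)$.

```python
import numpy as np

u1 = np.array([1, -1, 0]) / np.sqrt(2)      # two orthonormal direction vectors
u2 = np.array([1, 1, -2]) / np.sqrt(6)      # for the plane x + y + z = 0

P = np.outer(u1, u1) + np.outer(u2, u2)     # equation (1)
v = np.array([1.0, 0.0, -2.0])
p = P @ v

n = np.array([1.0, 1.0, 1.0])               # normal vector of the plane
print(p)                                    # the projection of v onto the plane
print(np.isclose(p @ n, 0.0), np.allclose(P @ P, P))   # p lies in the plane; P is idempotent
```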
Problem 9  Repeat Problem 7, using the formula for $P$ given by equation (1). Use the same $\mathbf{v}$, and use $\mathbf{u}_1$ and $\mathbf{u}_2$ as indicated below. (First, verify that $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal unit vectors in the given plane.)
o
-2/ V6 (a) x
+y+
Z
=
0
wi th
Ul
[/ v'6
=
andul =
1/ V6
o
' /VS (b) x - 2z :: 0 with U 1 =
0
and U 1
I/ VS
""
1
0
I/VS (c) 2x - 3y +
~ =
0 with u \ =
1/0 - 1/ 0
- L/ 0 1/v3
2/V6 andU2 =
1/ V6 - 1/ V6
Probl•• 11 Sho w that a projection matrix given by equal ion ( I ) satisfi es propertIes (a) and (b ) of Problem 4. Problem 11 Show that the matrix Pof a projc<:tion onto a plane in R) can be expressed as
£1 = /\A T fo r some 3 X 2 matrix A. [H im: Show that equalion ( I) is an o uter product expansion.] Probl•• 12 Show that if P is the matrix of a projection on to a plane in [R3, then rank (P) = 2. In this chapter, we will look at the concepts of o rt hogonality and o rt hogonal projection in greater detail. We will see that the ideas introduced in thiS section can be generalized and thilt they have many important applicatio ns.
Orthogonalltv in
~.
In this section, we will generalize the no tion of orthogonality of vectors in R" from two vectors to sets of vectors. In doi ng so, we will see that two properties make the standard basis Ie ., e2•• . •• e,,1 of R" casy to work with: Fi rst, any two distinct vectors
3&&
Chapter 5
O rthogo nality
in the set are o rthogo nal. Second, eac h vector in the set is a unit vector. Th ese ~vo properties lead us to the notio n of o rthogon:l l bases and orthono rnw l basesconcepts that we will be able to fr uit fully apply to:l va riety o f applications.
Ortbogoaal aDd Orthonormal Sets 01 Vectors
Delioitioo
A set o f vecto rs {vI' V z" .. , vd in H~ is called an orthogotlal sct if all pairs of dis tinct vectors in the set are o rthogonal- that is, if Vj 'Vj"" O
whenever
i*j
fori ,j= I,2, ... ,k "
The sta ndard basis le i' ez•.. . , e~ 1 o f lR~ is an o rt hogonal set, as is a ny subset of it. As the fi rst example iII usl r3les, there a rc man y olhe r possibilities.
Example 5.1
Show that
Ivi• V~ , v,l is an o rthogonal set in H' if
o
2 VI
=
I,
V2 -
- I
Solullon
I ,
1 VJ -
- I
I
1
We must show that every pair of vectors from this sel is o rt hogonal. Th is is
true, since
Y, 'Y, Y, ' Y, y
Y , ' Y,
~ ~ ~
'(0 ) + 1(1) + (- 1)( 1) ~ 0 0(1) + 1(- 1) + (1 )(1) ~ 0 2(1 ) + 1(-1} + (-I}(I) ~ 0
Geome trically, Ihe vecto rs in Example 5. 1 a re mUlually perpendicular, as Figure 5.4 s hows.
flgur.5 .4 An orthogonal set of vectors
One of the main ad va ntages of working wi th o rthogonal selS of \'ccto rs is Ihal they a re necessarily li nearly independent, as Theore m 5.1 shows.
Theorem 5.1
I f l V I' vZ"
vtl isan orthogonal set of nonzero vectors in H", lhen these \'eCIOrs are linearly independent.
Prool
'"
If c..... , Ct a rc scalars such that (CIVI
+ ... +
(I V I
+ ... +
(tVt
= 0, then
CtVt) -v, = O- v, = 0
o r, equivalen tly, (I)
Since l VI' v z• _.. , vtl is an orthogonal set, all of the d ot products in equation (I) arc zero, except V, ' v,_Thus, equation ( I) reduces to
(,{v" v,) = 0
Section S. l
361
Orthogonality in R"
'*
Now, v, ' V, 0 because v, '" 0 by hypothcsls. So we must have (, = O. The fact that this is true for all j = I , ... ,k implies that lv l • v2•••• , vd is a linearly indcpendem sel.
a•• ."
Thanks (0 Theorem 5.1, we know Ih ,l\ if a set of vectors IS orthogonal,
it IS automatically linearly independent. For example. we call immediately deduce th31 the three vectors in Example 5.1 are linearly independent. Contrast this approach wi th the work nceded to establish their linear independence directly!
DelialUoD
An orthogonal basis for a subspace W of Rn is a basis of Wlhal is
an o rl hogonal set.
Example 5.1,
The vectors
o
2
1
- I
1
from Example 5.1 arc orthogonal and, hence, linearly independent. SIIlCC any three
linearly independent veclOrs in R' form a basis for Rl, by the Fundamental Theorem of Invertible Mat rices, it follows that Ivl' v2' vJI is an orthogonal basis for Rl .
v,
In Exa mple 5.2, suppose only the ort hogonal vectors V I and were given and you were asked to find a third vecto r vJ to make {vI' v" vJI an orthogonal basis for RJ. One way to do this is to remember that in iR', the cross product of two vectors V I and v1 is orthogonal to each of them. (See Exploration: The Cross Product in Chapter I.) Hence we may take •••• rll
o
2
x
I - I
I
2
=
-2
2
1
Note that the" resulting vector is a multiple of the vector vJ in Example 5.2, as it must be.
Example 5.3
Find an orthogonal basis for the subspace Wa f R' given by x y:x-y+2z::Q
w ~
,
S.111I,. Section 5.3 gives a general procedure fo r problems of this sort. For now, we will find the orthogonal baSIS by brute force. The subspace W is a plane through the origin III Rl. From the equation of the plane. we have x :: y - 2z, so W consists of vectors of the for m
y - 2z
y
,
-2 y 1 +, 0 1
~
o
1
368
Cha pter 5
Orthogonality
-2
1 It fo llows that u
=
I
and v
o
=
o
are a basis fo r W, but they are
1101
onhogo-
1
n al. II suffices to find another nonzero vector in Wthat is o rt hogonal to either o ne of these. x Suppose w ::::
y is a veclOr in W that is orthogonal to u . T hen x - y + 2% = 0,
,
since w is in the plane 11': Since u ' w = 0, we also have x system
+ )' = O. Solving the li near
x - y +2z = 0 x +y
:::: 0
we fi nd that x = - z and y = z. (Check this.) Thus, ally nonzero vector w of the fo rm
-, z
w=
, -1
will do. To be specific, we could take w =
I . lI.s easy to check that lu , w } is an 1
orthogo nal set in W and, hence, an orthogonal basis fo r W, since dim W = 2.
Another advantage o f working with an orthogonal basis is that the coo rdinates o f a vector wi th respect to such a baSIS are easy to compu te. Indeed . there is a formula fo r these coordinates, as the fo llowing theorem establishes.
Theorem 5.2
Let Iv I' v2' ••• , vk} be an orthogonal basis for a subspace W of RW and let w be any vector in W. Then the un ique scalars ' I" .. , ,~such thai w =
' IVI
+ ... + '.V.
a re given by
c, =
W'v
v' ' , v,
for i = I, ... , k
Prool
Since {v I' v ~, ...• vi} is a basis for W, we know that there are umque scalars 'I " . . , 'l such that w ::: 'I V. + .. + 'lV1 (from Theorem 3.29). To establish the for mula for 'i' we take the dot product of this linear com bination with v, to obtain
::: c\{v l 'v,)
+ ... + c,{v ,' v,) + .. . +
Ci(Vk ' V,)
= c,( V,' Vi)
since VI ' Vi = 0 for j sired result.
'* i. Si nce v, *" 0,
Vi ' V,
'* O. Divid ing by v, ' v" we obtain the de--=-
Example 5.4

Find the coordinates of w = [1, 2, 3] with respect to the orthogonal basis B = {v1, v2, v3} of Examples 5.1 and 5.2.

Solution  Using Theorem 5.2, we compute

c1 = (w . v1)/(v1 . v1) = (2 + 2 - 3)/(4 + 1 + 1) = 1/6
c2 = (w . v2)/(v2 . v2) = (0 + 2 + 3)/(0 + 1 + 1) = 5/2
c3 = (w . v3)/(v3 . v3) = (1 - 2 + 3)/(1 + 1 + 1) = 2/3

Thus,

w = c1 v1 + c2 v2 + c3 v3 = (1/6) v1 + (5/2) v2 + (2/3) v3

(Check this.) With the notation introduced in Section 3.5, we can also write this equation as

[w]_B = [1/6, 5/2, 2/3]
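For readers following along with a computer, here is a minimal NumPy sketch (an illustration, not part of the text) of the formula in Theorem 5.2 applied to Example 5.4:

```python
import numpy as np

# Orthogonal basis from Examples 5.1 and 5.2 (columns are v1, v2, v3)
V = np.array([[ 2, 0,  1],
              [ 1, 1, -1],
              [-1, 1,  1]], dtype=float)
w = np.array([1, 2, 3], dtype=float)

# Theorem 5.2: c_i = (w . v_i) / (v_i . v_i)
coeffs = np.array([w @ V[:, i] / (V[:, i] @ V[:, i]) for i in range(3)])
print(coeffs)                      # [0.1666..., 2.5, 0.6666...]
print(np.allclose(V @ coeffs, w))  # True: w = c1*v1 + c2*v2 + c3*v3
```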
Compare the procedure in Example 5.4 with the work required to find these coordinates directly, and you should start to appreciate the value of orthogonal bases. As noted at the beginning of this section, the other property of the standard basis in R^n is that each standard basis vector is a unit vector. Combining this property with orthogonality, we have the following definition.

Definition  A set of vectors in R^n is an orthonormal set if it is an orthogonal set of unit vectors. An orthonormal basis for a subspace W of R^n is a basis of W that is an orthonormal set.

Remark  If S = {q1, ..., qk} is an orthonormal set of vectors, then q_i . q_j = 0 for i != j and ||q_i|| = 1. The fact that each q_i is a unit vector is equivalent to q_i . q_i = 1. It follows that we can summarize the statement that S is orthonormal as

q_i . q_j = 0 if i != j,  and  q_i . q_j = 1 if i = j
Example 5.5

Show that S = {q1, q2} is an orthonormal set in R^3 if

q1 = [1/sqrt(3), -1/sqrt(3), 1/sqrt(3)]  and  q2 = [1/sqrt(6), 2/sqrt(6), 1/sqrt(6)]

Solution  We check that

q1 . q2 = 1/sqrt(18) - 2/sqrt(18) + 1/sqrt(18) = 0
q1 . q1 = 1/3 + 1/3 + 1/3 = 1
q2 . q2 = 1/6 + 4/6 + 1/6 = 1
If we have an orthogonal set, we can easily obtain an orthonormal set from it: We simply normalize each vector.

Example 5.6

Construct an orthonormal basis for R^3 from the vectors in Example 5.1.

Solution  Since we already know that v1, v2, and v3 are an orthogonal basis, we normalize them to get

q1 = (1/||v1||) v1 = (1/sqrt(6))[2, 1, -1] = [2/sqrt(6), 1/sqrt(6), -1/sqrt(6)]
q2 = (1/||v2||) v2 = (1/sqrt(2))[0, 1, 1] = [0, 1/sqrt(2), 1/sqrt(2)]
q3 = (1/||v3||) v3 = (1/sqrt(3))[1, -1, 1] = [1/sqrt(3), -1/sqrt(3), 1/sqrt(3)]
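A quick NumPy check of this normalization (an illustrative sketch, not part of the text):

```python
import numpy as np

# Orthogonal vectors from Example 5.1 (one per row)
vs = np.array([[2, 1, -1],
               [0, 1,  1],
               [1, -1, 1]], dtype=float)

# Normalizing each vector yields an orthonormal set (Example 5.6)
qs = vs / np.linalg.norm(vs, axis=1, keepdims=True)

# The matrix of dot products of an orthonormal set is the identity
print(np.allclose(qs @ qs.T, np.eye(3)))  # True
```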
Since any orthonormal set of vectors is, in particular, orthogonal, it is linearly independent, by Theorem 5.1. If we have an orthonormal basis, Theorem 5.2 becomes even simpler.

Theorem 5.3

Let {q1, q2, ..., qk} be an orthonormal basis for a subspace W of R^n and let w be any vector in W. Then

w = (w . q1) q1 + (w . q2) q2 + ... + (w . qk) qk

and this representation is unique.

Proof  Apply Theorem 5.2 and use the fact that q_i . q_i = 1 for i = 1, ..., k.
Orthogonal Matrices

Matrices whose columns form an orthonormal set arise frequently in applications, as you will see in Section 5.5. Such matrices have several attractive properties, which we now examine.
Theorem 5.4

The columns of an m x n matrix Q form an orthonormal set if and only if Q^T Q = I_n.

Proof  We need to show that

(Q^T Q)_ij = 0 if i != j,  and  (Q^T Q)_ij = 1 if i = j

Let q_i denote the ith column of Q (and, hence, the ith row of Q^T). Since the (i, j) entry of Q^T Q is the dot product of the ith row of Q^T and the jth column of Q, it follows that

(Q^T Q)_ij = q_i . q_j        (2)

by the definition of matrix multiplication. Now the columns of Q form an orthonormal set if and only if

q_i . q_j = 0 if i != j,  and  q_i . q_j = 1 if i = j

which, by equation (2), holds if and only if (Q^T Q)_ij has exactly these values. This completes the proof.

Orthogonal matrix is an unfortunate bit of terminology. "Orthonormal matrix" would clearly be a better term, but it is not standard. Moreover, there is no term for a nonsquare matrix with orthonormal columns.

If the matrix Q in Theorem 5.4 is a square matrix, it has a special name.

Definition  An n x n matrix Q whose columns form an orthonormal set is called an orthogonal matrix.

The most important fact about orthogonal matrices is given by the next theorem.
Theorem 5.5

A square matrix Q is orthogonal if and only if Q^-1 = Q^T.

Proof  By Theorem 5.4, Q is orthogonal if and only if Q^T Q = I. This is true if and only if Q is invertible and Q^-1 = Q^T, by Theorem 3.13.
Example 5.7

Show that the following matrices are orthogonal and find their inverses:

A = [ 0  1  0
      0  0  1
      1  0  0 ]

and

B = [ cos t  -sin t
      sin t   cos t ]

Solution  The columns of A are just the standard basis vectors for R^3, which are clearly orthonormal. Hence, A is orthogonal and

A^-1 = A^T = [ 0  0  1
               1  0  0
               0  1  0 ]
For B, we check directly that

B^T B = [  cos t  sin t ] [ cos t  -sin t ]
        [ -sin t  cos t ] [ sin t   cos t ]

      = [ cos^2 t + sin^2 t            -cos t sin t + sin t cos t ]
        [ -sin t cos t + cos t sin t    sin^2 t + cos^2 t         ]

      = [ 1  0
          0  1 ]

Therefore, B is orthogonal, by Theorem 5.5, and

B^-1 = B^T = [  cos t  sin t
               -sin t  cos t ]
The word isometry literally means "length preserving," since it is derived from the Greek roots isos ("equal") and metron ("measure").

Remark  Matrix A in Example 5.7 is an example of a permutation matrix, a matrix obtained by permuting the columns of an identity matrix. In general, any n x n permutation matrix is orthogonal (see Exercise 25). Matrix B is the matrix of a rotation through the angle t in R^2. Any rotation has the property that it is a length-preserving transformation (known as an isometry in geometry). The next theorem shows that every orthogonal matrix transformation is an isometry. Orthogonal matrices also preserve dot products. In fact, orthogonal matrices are characterized by either one of these properties.

Theorem 5.6

Let Q be an n x n matrix. The following statements are equivalent:
a. Q is orthogonal.
b. ||Qx|| = ||x|| for every x in R^n.
c. Qx . Qy = x . y for every x and y in R^n.
Proof  We will prove that (a) implies (c), (c) implies (b), and (b) implies (a). To do so, we will need to make use of the fact that if x and y are (column) vectors in R^n, then x . y = x^T y.

(a) implies (c): Assume that Q is orthogonal. Then Q^T Q = I, and we have

Qx . Qy = (Qx)^T Qy = x^T Q^T Q y = x^T I y = x^T y = x . y

(c) implies (b): Assume that Qx . Qy = x . y for every x and y in R^n. Then, taking y = x, we have Qx . Qx = x . x, so ||Qx|| = sqrt(Qx . Qx) = sqrt(x . x) = ||x||.

(b) implies (a): Assume that property (b) holds and let q_i denote the ith column of Q. Using Exercise 49 in Section 1.2 and property (b), we have

x . y = (1/4)(||x + y||^2 - ||x - y||^2)
      = (1/4)(||Q(x + y)||^2 - ||Q(x - y)||^2)
      = (1/4)(||Qx + Qy||^2 - ||Qx - Qy||^2)
      = Qx . Qy

for all x and y in R^n. [This shows that (b) implies (c).] Now if e_i is the ith standard basis vector, then q_i = Qe_i. Consequently,

q_i . q_j = Qe_i . Qe_j = e_i . e_j = 0 if i != j, and = 1 if i = j

Thus, the columns of Q form an orthonormal set, so Q is an orthogonal matrix.
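As an informal numerical check of Theorems 5.4 through 5.6, here is a short NumPy sketch (not part of the text); the angle and test vector are arbitrary choices:

```python
import numpy as np

theta = 0.7  # any angle
B = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Theorems 5.4/5.5: B^T B = I and B^{-1} = B^T
print(np.allclose(B.T @ B, np.eye(2)))        # True
print(np.allclose(np.linalg.inv(B), B.T))     # True

# Theorem 5.6(b): multiplication by B preserves lengths
x = np.array([3.0, -1.0])
print(np.isclose(np.linalg.norm(B @ x), np.linalg.norm(x)))  # True
```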
Looking at the orthogonal matrices A and B in Example 5.7, you may notice that not only do their columns form orthonormal sets; so do their rows. In fact, every orthogonal matrix has this property, as the next theorem shows.
Theorem 5.7

If Q is an orthogonal matrix, then its rows form an orthonormal set.

Proof  From Theorem 5.5, we know that Q^-1 = Q^T. Therefore,

(Q^T)^-1 = Q = (Q^T)^T

so Q^T is an orthogonal matrix. Thus, the columns of Q^T, which are just the rows of Q, form an orthonormal set.

The final theorem in this section lists some other properties of orthogonal matrices.
Theorem 5.8

Let Q be an orthogonal matrix.
a. Q^-1 is orthogonal.
b. det Q = +1 or -1.
c. If lambda is an eigenvalue of Q, then |lambda| = 1.
d. If Q1 and Q2 are orthogonal n x n matrices, then so is Q1 Q2.

Proof  We will prove property (c) and leave the proofs of the remaining properties as exercises.

(c) Let lambda be an eigenvalue of Q with corresponding eigenvector v. Then Qv = lambda v, and, using Theorem 5.6(b), we have

||v|| = ||Qv|| = ||lambda v|| = |lambda| ||v||

Since ||v|| != 0, this implies that |lambda| = 1.

Remark  Property (c) holds even for complex eigenvalues. The matrix

[ 0  -1
  1   0 ]

is orthogonal with eigenvalues i and -i, both of which have absolute value 1.
Exercises 5.1 /11 Exercises /-6, de/erm ine widell sets of veClQ r$ {Ire ortl wgo nal. - 3 2 \ - \ 4 2 I. \ • 4 • - \ 2. 2 \ 2
3.
•
2
\
3
- \
\ - \
•
2 2 • - 2 \
•
-5
2
4
5 4.
0 \
3 • -2 • \
5.
\
2 3 \
- \
6.
2 3 -\
•
- 2
- 4
\
-6
- \
•
2 7
4
0
\
0
\
- \
0
- \
\
0
- \
\
\
\
\
\
0
2
•
Chapter 5
314
Orthogonality
In Exercises 7-10, show that the given vectors form an orthogonal basis for R^2 or R^3. Then use Theorem 5.2 to express w as a linear combination of these basis vectors. Give the coordinate vector [w]_B of w with respect to the basis B = {v1, v2} of R^2 or B = {v1, v2, v3} of R^3.
7.v,~ [_;l v,~ [:lw ~ [-:l
8.
0
I
2
=
, V2
=
I
;w =
I
I
I
I
I
I
I
I
I
;w =
- 2
0
I
2 3
In Exercises 11-15, determine whether the given orthogonal set of vectors is orthonormal. If it is not, normalize the vectors to form an orthonormal set. 11.
[l]. [-!] ,!,
j
13.
o
1/2 1/ 2 , - 1/2 1/ 2
[l]. [-ll •,,
2 -~
o
,
I
,• . , ' l
15.
12
14.
,
_1
j
•
,• • 1 ,
26. If Q IS an orthogonal matriX, prove that any matrix obtained by rearranging the rows of Q is also orthogonal. 27. Let Q be an orthogonal 2X2 mat ri x and let x and y be vecto rs in RI . If 0 is the angle between x and y, prove that th e angle between Qx and Qy IS also 8. (This proves that the linear transformations defi ned by o rt hogonal m atrices arc Grlgle-presavillg in Gt 2, a fact that is true in generaL) 28. (a) Prove tha t an orthogonal2X2 matrrx m ust ha ....e the form
,, _1 •, • -,,
where [ : ] is a unit vector.
j o o v'3/ 2 o V6/3 - v'3/6 1/ V6 ' v'3/6 ' 1/0. 1/ 0. - 1/V6 - v'3/6
(b) Using part (a), show that every o rthogonal 2 x 2 matrix is of the fo rm COS [
orthogonal. If It is, find its inverse.
oil ,,
19.
cosOsin 8 COSI O
- cos ()
. '0 - sm-
sin 0
-cos 8 si n 0
Sin 0
0
cosO
,,
20.
,,
1 , , , , • _., 0 1 1
18.
[ 1/ 0. 17. - 1/ 0.
,1 _1 ,, ,
,, 1, •, ,1, j _ 1, , _.1, •, •, , j
0
sin 0
cos O
-SinO ]
[ si n 0
cos 0
0]
,in - cas 8
where 0 < 0 < 27t_ (e) Show that every orthogo nal2X2 matrix corre· sponds to either a rotatio n or a reflec tion in R2. (d ) Show that an orthogonal 2X2 matrix Qcorresponds to a rotatio n in [R2 if det Q = 1 and a reflection in IR:l if de t Q = - I.
III Exercises 16-21, determine whether rlzegiven matrix is
16. [ 0I
2/ 3 1/0. 1/V6 - 2/3 1/0. - 1/ Y6 0 1/3 1/ 0.
25, Prove that every permutation matrix is orthogonal.
- I
- I , Vj =
0
1/ V6
24. Prove Theorem 5.8{d ).
- I I , v2 =
0
0
23. Prove Theorem S.8{b).
I , vJ
21.
0
0
22_ Prove Theorem 5.8(a ).
v,~ [;]. v,~ [-~ l w ~ [:l I
I
1/ 0.] 1/ 0.
In Exercises 29-32, use Exercise 28 to determine whether the given orthogonal matrix represents a rotation or a reflection. If it is a rotation, give the angle of rotation; if it is a reflection, give the line of reflection. 29.
1/ 0. - 1/ 0.] [ 1/ 0. 1/ 0.
- 1/ 2 V3/2 ] 31. [ v'3/ 2 1/2
[ - 1/ 2 v'3/ 2] 30. _ v'3/ 2 - 1/ 2
-,
32.
[-i
Section 5.2
33. Let A and B be n x n orthogonal matrices. (a) Prove that A(A^T + B^T)B = A + B. (b) Use part (a) to prove that, if det A + det B = 0, then A + B is not invertible. 34. Let x be a unit vector in R^n. Partition x as
x,
......
x,
x-
-[~ l
115
Orthogonal Complements and Orthogonal Projections
with a prescribed first vector x, a construction that is frequently useful in applications.) 35. Prove that if an upper triangular matrix is orthogonal, then it must be a diagonal matrix. 36. Prove that if n > m, then there is no m x n matrix A such that ||Ax|| = ||x|| for all x in R^n. 37. Let B = {v1, ..., vn} be an orthonormal basis for R^n. (a) Prove that, for any x and y in R^n,
x"
s'y = (X 'YI)(Y ' V1 )
+
(x'Y!)(Y'V2)
Let
+ ...
+ Ix· v,,)ly· v,,) :!I.i..............! ~............ _
, ( I )yyT
y j 1:
I - XI
(This identity is called Parseval's ldetllily.) (b) What does Parseval's Identity imply about the relationshIp betw~ n the dot products X· yand
Prove that Q IS orthogonal. (This proced ure gives a qUIck method for finding an orthonormal basis for R~
Ix18 ' [Yl.6?
Orthogonal Complements and Orthogonal Projections

In this section, we generalize two concepts that we encountered in Chapter 1. The notion of a normal vector to a plane will be extended to orthogonal complements, and the projection of one vector onto another will give rise to the concept of orthogonal projection onto a subspace.
W-perp (written W with a superscript perpendicular symbol) is pronounced "W perp."

Orthogonal Complements

A normal vector n to a plane is orthogonal to every vector in that plane. If the plane passes through the origin, then it is a subspace W of R^3, as is span(n). Hence, we have two subspaces of R^3 with the property that every vector of one is orthogonal to every vector of the other. This is the idea behind the following definition.

Definition  Let W be a subspace of R^n. We say that a vector v in R^n is orthogonal to W if v is orthogonal to every vector in W. The set of all vectors that are orthogonal to W is called the orthogonal complement of W, denoted W-perp. That is,

W-perp = { v in R^n : v . w = 0 for all w in W }

Figure 5.5  W and W-perp

Example 5.8

If W is a plane through the origin in R^3 and L is the line through the origin perpendicular to W (i.e., parallel to the normal vector n to W), then every vector v on L is orthogonal to every vector w in W; hence, L = W-perp. Moreover, W consists precisely of those vectors w that are orthogonal to every v on L; hence, we also have W = L-perp. Figure 5.5 illustrates this situation.
In Example 5.8, the orthogonal complement of a subspace turned out to be another subspace. Also, the complement of the complement of a subspace was the original subspace. These properties are true in general and are proved as properties (a) and (b) of Theorem 5.9. Properties (c) and (d) will also be useful. (Recall that the intersection A intersect B of sets A and B consists of their common elements. See Appendix A.)
Theorem 5.9

Let W be a subspace of R^n.
a. W-perp is a subspace of R^n.
b. (W-perp)-perp = W
c. W intersect W-perp = {0}
d. If W = span(w1, ..., wk), then v is in W-perp if and only if v . w_i = 0 for all i = 1, ..., k.

Proof  (a) Since 0 . w = 0 for all w in W, 0 is in W-perp. Let u and v be in W-perp and let c be a scalar. Then

u . w = v . w = 0   for all w in W

Therefore,

(u + v) . w = u . w + v . w = 0 + 0 = 0

so u + v is in W-perp. We also have

(cu) . w = c(u . w) = c(0) = 0

from which we see that cu is in W-perp. It follows that W-perp is a subspace of R^n.
(b) We will prove this property as Corollary 5.12.
(c) You are asked to prove this property in Exercise 23.
(d) You are asked to prove this property in Exercise 24.
We can now express some fundamental relationships involving the subspaces associated with an m x n matrix.

Theorem 5.10

Let A be an m x n matrix. Then the orthogonal complement of the row space of A is the null space of A, and the orthogonal complement of the column space of A is the null space of A^T:

(row(A))-perp = null(A)   and   (col(A))-perp = null(A^T)

Proof  If x is a vector in R^n, then x is in (row(A))-perp if and only if x is orthogonal to every row of A. But this is true if and only if Ax = 0, which is equivalent to x being in null(A), so we have established the first identity. To prove the second identity, we simply replace A by A^T and use the fact that row(A^T) = col(A).

Thus, an m x n matrix has four subspaces: row(A), null(A), col(A), and null(A^T). The first two are orthogonal complements in R^n, and the last two are orthogonal complements in R^m. The m x n matrix A defines a linear transformation from R^n into R^m whose range is col(A). Moreover, this transformation sends null(A) to 0 in R^m. Figure 5.6 illustrates these ideas schematically. These four subspaces are called the fundamental subspaces of the m x n matrix A.

Figure 5.6  The four fundamental subspaces
Example 5.9

Find bases for the four fundamental subspaces of

A = [ 1   1  3   1   6
      2  -1  0   1  -1
     -3   2  1  -2   1
      4   1  6   1   3 ]

and verify Theorem 5.10.

Solution  In Examples 3.45, 3.47, and 3.48, we computed bases for the row space, column space, and null space of A. We found that row(A) = span(u1, u2, u3), where

u1 = [1, 0, 1, 0, -1],  u2 = [0, 1, 2, 0, 3],  u3 = [0, 0, 0, 1, 4]

Also, null(A) = span(x1, x2), where

x1 = [-1, -2, 1, 0, 0]  and  x2 = [1, -3, 0, -4, 1]

To show that (row(A))-perp = null(A), it is enough to show that every u_i is orthogonal to each x_j, which is an easy exercise. (Why is this sufficient?)

The column space of A is col(A) = span(a1, a2, a4), where

a1 = [1, 2, -3, 4],  a2 = [1, -1, 2, 1],  a4 = [1, 1, -2, 1]

We still need to compute the null space of A^T. Row reduction produces

[A^T | 0] = [ 1   2  -3  4 | 0          [ 1  0  0  1 | 0
              1  -1   2  1 | 0            0  1  0  6 | 0
              3   0   1  6 | 0    ->      0  0  1  3 | 0
              1   1  -2  1 | 0            0  0  0  0 | 0
              6  -1   1  3 | 0 ]           0  0  0  0 | 0 ]

So, if y is in the null space of A^T, then y1 = -y4, y2 = -6y4, and y3 = -3y4. It follows that

null(A^T) = { [-y4, -6y4, -3y4, y4] } = span([-1, -6, -3, 1])

and it is easy to check that this vector is orthogonal to a1, a2, and a4.
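A short SymPy sketch (not part of the text) that checks Theorem 5.10 for the matrix of Example 5.9:

```python
from sympy import Matrix

A = Matrix([[ 1,  1, 3,  1,  6],
            [ 2, -1, 0,  1, -1],
            [-3,  2, 1, -2,  1],
            [ 4,  1, 6,  1,  3]])

row_basis  = A.rowspace()    # basis for row(A)
null_basis = A.nullspace()   # basis for null(A)

# Theorem 5.10: every row-space vector is orthogonal to every null-space vector
print(all((u * x)[0] == 0 for u in row_basis for x in null_basis))  # True

# null(A^T) is the orthogonal complement of col(A)
print(A.T.nullspace())       # [Matrix([-1, -6, -3, 1])]
```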
The method of Example 5.9 is easily adapted to other situatio ns.
Example 5.10

Let W be the subspace of R^5 spanned by

w1 = [1, -3, 5, 0, 5],  w2 = [-1, 1, 2, -2, 3],  w3 = [0, -1, 4, -1, 5]

Find a basis for W-perp.

Solution  The subspace W spanned by w1, w2, and w3 is the same as the column space of

A = [ 1  -1   0
     -3   1  -1
      5   2   4
      0  -2  -1
      5   3   5 ]

Therefore, by Theorem 5.10, W-perp = (col(A))-perp = null(A^T), and we may proceed as in the previous example. We compute

[A^T | 0] = [ 1  -3  5   0  5 | 0          [ 1  0  0  3  4 | 0
             -1   1  2  -2  3 | 0    ->      0  1  0  1  3 | 0
              0  -1  4  -1  5 | 0 ]          0  0  1  0  2 | 0 ]

Hence, y is in W-perp if and only if y1 = -3y4 - 4y5, y2 = -y4 - 3y5, and y3 = -2y5. It follows that

W-perp = { [-3y4 - 4y5, -y4 - 3y5, -2y5, y4, y5] } = span([-3, -1, 0, 1, 0], [-4, -3, -2, 0, 1])

and these two vectors form a basis for W-perp.
,
Orthogonal Prolee"ons Recall that, in Rl, the projection o f a vecto r v on lO a no nzero vector u is given by )
projo(v)
v)
U· = ( U· U
u
Furthermo re, the vector perpg(v) = v - proj,,( v) is orthogo nal to proju(v ), and we can decompose v as
v = proj.(v) + perpu{v) as shown in rigurc 5.7. If we leI W = span (u ), then w = proj., ( v) is in Wand w'" = perp.( v) is In Wi , We therefore have a way of "decomposing" v into the sum of two vectors, one from Wand the other orthogonal to IV- namely, v = w + W i . We now generalize this idea to R~.
Definition
Let Wbe a subspace of R~ and le t {u l , . . . , uJ.} be an orthogonal basis for W. For any vector v in R~, the ortlJogonal projection of v Ollt o W is defi ned as
The component ofv orthogonal to W is the vector
Each sum mand in the defi nition o f proj I~ V) is also a projectio n onto a single vecto r (o r, equivalently, the one-d imensional subspace span ned by it- in our p revIO Us sense). Therefore, with the notation of the preceding defin ition, we can write
p roj W
+ ... +
proj...(v)
•••
Figure 5.8  p = p1 + p2

Since the vectors u_i are orthogonal, the orthogonal projection of v onto W is the sum of its projections onto one-dimensional subspaces that are mutually orthogonal. Figure 5.8 illustrates this situation with W = span(u1, u2), p = proj_W(v), p1 = proj_{u1}(v), and p2 = proj_{u2}(v).

As a special case of the definition of proj_W(v), we now also have a nice geometric interpretation of Theorem 5.2. In terms of our present notation and terminology, that theorem states that if w is in the subspace W of R^n, which has orthogonal basis {v1, v2, ..., vk}, then

w = proj_{v1}(w) + ... + proj_{vk}(w)

Thus, w is decomposed into a sum of orthogonal projections onto mutually orthogonal one-dimensional subspaces of W.

The definition above seems to depend on the choice of orthogonal basis; that is, a different orthogonal basis {u1', ..., uk'} for W would appear to give a "different" proj_W(v) and perp_W(v).
Example 5.11

Let W be the plane in R^3 with equation x - y + 2z = 0, and let v = [3, -1, 2]. Find the orthogonal projection of v onto W and the component of v orthogonal to W.

Solution  In Example 5.3, we found an orthogonal basis for W. Taking

u1 = [1, 1, 0]  and  u2 = [-1, 1, 1]

we have

u1 . v = 2,  u1 . u1 = 2,  u2 . v = -2,  u2 . u2 = 3

Therefore,

proj_W(v) = ((u1 . v)/(u1 . u1)) u1 + ((u2 . v)/(u2 . u2)) u2
          = (2/2)[1, 1, 0] + (-2/3)[-1, 1, 1]
          = [5/3, 1/3, -2/3]

and

perp_W(v) = v - proj_W(v) = [3, -1, 2] - [5/3, 1/3, -2/3] = [4/3, -4/3, 8/3]

It is easy to see that proj_W(v) is in W, since it satisfies the equation of the plane. It is equally easy to see that perp_W(v) is orthogonal to W, since it is a scalar multiple of the normal vector [1, -1, 2] to W. (See Figure 5.9.)

Figure 5.9  v = proj_W(v) + perp_W(v)
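A minimal NumPy sketch of this computation (an illustration, not part of the text), assuming an orthogonal basis is already at hand:

```python
import numpy as np

def proj_onto_subspace(v, basis):
    """Project v onto span(basis), assuming the basis vectors are orthogonal."""
    return sum((u @ v) / (u @ u) * u for u in basis)

v  = np.array([3.0, -1.0, 2.0])
u1 = np.array([1.0, 1.0, 0.0])    # orthogonal basis for the plane x - y + 2z = 0
u2 = np.array([-1.0, 1.0, 1.0])

p    = proj_onto_subspace(v, [u1, u2])   # [5/3, 1/3, -2/3]
perp = v - p                             # [4/3, -4/3, 8/3]
print(p, perp)
print(np.isclose(perp @ u1, 0), np.isclose(perp @ u2, 0))   # True True
```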
The next theorem shows that we can always find a decomposition of a vector with respect to a subspace and its orthogonal complement.

Theorem 5.11

The Orthogonal Decomposition Theorem

Let W be a subspace of R^n and let v be a vector in R^n. Then there are unique vectors w in W and w' in W-perp such that

v = w + w'

Proof  We need to show two things: that such a decomposition exists and that it is unique.

To show existence, we choose an orthogonal basis {u1, ..., uk} for W. Let w = proj_W(v) and let w' = perp_W(v). Then

w + w' = proj_W(v) + perp_W(v) = proj_W(v) + (v - proj_W(v)) = v
Clearly, w = proj_W(v) is in W, since it is a linear combination of the basis vectors u1, ..., uk. To show that w' is in W-perp, it is enough to show that w' is orthogonal to each of the basis vectors u_i, by Theorem 5.9(d). We compute

u_i . w' = u_i . perp_W(v)
         = u_i . (v - proj_W(v))
         = u_i . (v - ((u1 . v)/(u1 . u1)) u1 - ... - ((uk . v)/(uk . uk)) uk)
         = u_i . v - ((u1 . v)/(u1 . u1))(u_i . u1) - ... - ((uk . v)/(uk . uk))(u_i . uk)
         = u_i . v - ((u_i . v)/(u_i . u_i))(u_i . u_i)
         = u_i . v - u_i . v = 0

since u_i . u_j = 0 for j != i. This proves that w' is in W-perp and completes the existence part of the proof.

To show the uniqueness of this decomposition, let's suppose we have another decomposition v = w1 + w1', where w1 is in W and w1' is in W-perp. Then w + w' = w1 + w1', so

w - w1 = w1' - w'

But since w - w1 is in W and w1' - w' is in W-perp (because these are subspaces), we know that this common vector is in W intersect W-perp = {0} [using Theorem 5.9(c)]. Thus,

w - w1 = 0 = w1' - w'

so w1 = w and w1' = w'.
Example 5.11 illustrated the Orthogonal Decomposition Theorem. When W is the subspace of R^3 given by the plane with equation x - y + 2z = 0, the orthogonal decomposition of v = [3, -1, 2] with respect to W is v = w + w', where

w = proj_W(v) = [5/3, 1/3, -2/3]  and  w' = perp_W(v) = [4/3, -4/3, 8/3]

The uniqueness of the orthogonal decomposition guarantees that the definitions of proj_W(v) and perp_W(v) do not depend on the choice of orthogonal basis.

The word corollary comes from the Latin word corollarium, which refers to a garland given as a reward. Thus, a corollary is a little extra reward that follows from a theorem.

Corollary 5.12

If W is a subspace of R^n, then

(W-perp)-perp = W

Proof  If w is in W and x is in W-perp, then w . x = 0. But this now implies that w is in (W-perp)-perp. Hence, W is contained in (W-perp)-perp. To see that we actually have equality here, suppose the contrary. Then there is a vector v in (W-perp)-perp that is not in W. By Theorem 5.11, we can write v = w + w' for (unique) vectors w in W and w' in W-perp. But now

0 = v . w' = (w + w') . w' = w . w' + w' . w' = 0 + w' . w' = w' . w'

so w' = 0. Therefore, v = w + w' = w, and thus v is in W, which is a contradiction. We conclude that (W-perp)-perp = W, as required.
There is also a nice relationship between the dimensions of W and W-perp, expressed in Theorem 5.13.

Theorem 5.13

If W is a subspace of R^n, then

dim W + dim W-perp = n

Proof  Let {u1, ..., uk} be an orthogonal basis for W and let {v1, ..., vl} be an orthogonal basis for W-perp. Then dim W = k and dim W-perp = l. Let B = {u1, ..., uk, v1, ..., vl}. We claim that B is an orthogonal basis for R^n. We first note that, since each u_i is in W and each v_j is in W-perp,

u_i . v_j = 0   for i = 1, ..., k and j = 1, ..., l

Thus, B is an orthogonal set and, hence, is linearly independent, by Theorem 5.1. Next, if v is a vector in R^n, the Orthogonal Decomposition Theorem tells us that v = w + w' for some w in W and w' in W-perp. Since w can be written as a linear combination of the vectors u_i and w' can be written as a linear combination of the vectors v_j, v can be written as a linear combination of the vectors in B. Therefore, B spans R^n also and so is a basis for R^n. It follows that k + l = dim R^n, or

dim W + dim W-perp = n

As a lovely bonus, when we apply this result to the fundamental subspaces of a matrix, we get a quick proof of the Rank Theorem (Theorem 3.26), restated here as Corollary 5.14.
Corollary 5.14

The Rank Theorem

If A is an m x n matrix, then

rank(A) + nullity(A) = n

Proof  In Theorem 5.13, take W = row(A). Then W-perp = null(A), by Theorem 5.10, so dim W = rank(A) and dim W-perp = nullity(A). The result follows.

Note that we get a counterpart identity by taking W = col(A) [and therefore W-perp = null(A^T)]:

rank(A) + nullity(A^T) = m

Sections 5.1 and 5.2 have illustrated some of the advantages of working with orthogonal bases. However, we have not established that every subspace has an orthogonal basis, nor have we given a method for constructing such a basis (except in particular examples, such as Example 5.3). These issues are the subject of the next section.
Exercises 5.2 III exercises 1-6, firullhe ort/rogollal complemellt w J. of W alld give tI htlsis for \VJ. . J.
•
111 Exercises 11- 14, leI W be tl,e subspace spall ned by the given vectors. find a basis for WJ. . 2
w "'" {[;]:2X- r =0 }
11.w,=
4 O
l , w,=
-2 2. W = { [;]:3X+ 4Y = 0}
1
o
- I
1
,
3
-2
-2
1
x
2
- I
2
- I
2
5
-3 -2
6
x
3. W =
4. W =
y:x+y - z = O
y:2x - y + 3z = 0
13. w, =
6
r :x ""' t,Y ""'- t, z = 3r
, x
6. W =
y :x - 2t,y = 2t,z =- t
,
'" Exercises 7 alld 8, fi"d 1)(I$C5 for lire rolV space and null space of A. Verify tlml every vcctor 11/ rolV(A) is orthogollal to every \lutor ill/wl/(A).
7. A ""
8. A ""
1
- I
3
5
2
1
0
1
-2
- I -I
1
1
-2 2
0 2
- 2
-3
- I
3
1
2
6
2
2
o , wJ
"'"
0 4
2
=
1
-I
-3
2
/11 exerCIses 15-18, jiml the ort/JOgollal projection of v O/lto t/'e subspace W spwllle(/ by the vectors u ,. ( YOII mayassl/me d lllt tire vectors u , Me ort/rogonat.)
[_:].u,~ [:] 3
2 4
1
4
14. w, "'" - 1 , w l 1 - I
l5.y ~
- I 0 2
1
1
16. v "'"
, U,
=
1
1
1
, u,. = - I
4
-2
1
0
1
1
2
- I
5
17.v ""
In ExerCIses 9 (/lui 10, fi r/{I bases for tire COII/ IIIII space of
A (Illd the 111111 space of ATfor tire givell exercise. Verify thlll every vector ill co/ fA) is orlll080llal 10 every vector in IIIIII(A T). 9. Exercise 7
, W,"'" .
3
x 5. W =
1
10. Exercise 8
18. v =
2 , u, ,. 3 3 - 2 4
-3
, U,
-2 , u,. =
1
1
-=
4 1
1
0
1
- I - I
0 1
0 0
, U~
=
1
, Uj "'"
1
Section 5.3 The Gram-Schmidt Process Jnd the QR Fac tori1.ation
In Exercises 19-22, [md the orthogonal decomposition of v with respect to H~
20. v =
21. v =
- 2 , \.V = span
2
3
1
4
1
- 2 , \¥ = span 3
22.
\I
=
1
,
1
2 • 1
2
1 , \¥ = span
- I
3
that v = w + w'. Is it necessarily true that w ' is in W l.? Either prove that it IS true or find a cou nterexample. 26. Let {v I' •.. , v.\ be an o rthogonal basis for [R" and let W = span (v., ... , v.l) . Is it necessarily true that W l = span(vhl"" , v ,,)? Either prove that it IS true or find a counterexample.
1
4
. 385
-I
111 ExerCIses 27- 29, let W be a sl/bspaceoJ IR", and let x be a vector /II R~. 27. Prove that x is in W if :lIld only if proj w(x) == x.
1
28. Prove that x is orthogonal to W if and o nly if proJw(x) = O.
0 1
29. Prove that proj\V(proj,,,(x)) = proh... (x).
1 •
1
0
1
30. Let 5 = {V I"'" Vi} bean orthono rmal set IIllR", and let x be a vector in R". (a) Prove that
Ixl12 >
23. Prove Theorem S.9(c).
24. Prove Theorem 5.9(d).
2 IX'v 11 +
Ix,vl + ... + IX 'V~ 2
(T his inequality is Gl lled Bessel's "Iequality. )
25. Let Wbe a subspace of R" and v a vector in JR.". Suppose that w and w' :Irc orthogonal vectors wi th w in Wand
(b) Prove that Bessel's Inequality is an equality If and o nly If x is in span( 5).
The Gram-Schmidt Process and the QR Factorization

In this section, we present a simple method for constructing an orthogonal (or orthonormal) basis for any subspace of R^n. This method will then lead us to one of the most useful of all matrix factorizations.

The Gram-Schmidt Process

We would like to be able to find an orthogonal basis for a subspace W of R^n. The idea is to begin with an arbitrary basis {x1, ..., xk} for W and to "orthogonalize" it one vector at a time. We will illustrate the basic construction with the subspace W from Example 5.3.
Example 5.12

The subspace W from Example 5.3 has basis vectors

x1 = [1, 1, 0]  and  x2 = [-2, 0, 1]

Construct an orthogonal basis for W.

Solution  Starting with x1, we get a second vector that is orthogonal to it by taking the component of x2 orthogonal to x1 (Figure 5.10).

Figure 5.10  Constructing v2 orthogonal to x1

Algebraically, we set v1 = x1, so

v2 = perp_{x1}(x2) = x2 - proj_{x1}(x2)
   = x2 - ((x1 . x2)/(x1 . x1)) x1
   = [-2, 0, 1] - (-2/2)[1, 1, 0]
   = [-1, 1, 1]

Then {v1, v2} is an orthogonal set of vectors in W. Hence, {v1, v2} is a linearly independent set and therefore a basis for W, since dim W = 2.

Remark  Observe that this method depends on the order of the original basis vectors. In Example 5.12, if we had taken x1 = [-2, 0, 1] and x2 = [1, 1, 0], we would have obtained a different orthogonal basis for W. (Verify this.)

The generalization of this method to more than two vectors begins as in Example 5.12. Then the process is to iteratively construct the components of subsequent vectors orthogonal to all of the vectors that have already been constructed. The method is known as the Gram-Schmidt Process.
Theorem 5.15

The Gram-Schmidt Process

Let {x1, ..., xk} be a basis for a subspace W of R^n and define the following:

v1 = x1
v2 = x2 - ((v1 . x2)/(v1 . v1)) v1
v3 = x3 - ((v1 . x3)/(v1 . v1)) v1 - ((v2 . x3)/(v2 . v2)) v2
...
vk = xk - ((v1 . xk)/(v1 . v1)) v1 - ... - ((v_{k-1} . xk)/(v_{k-1} . v_{k-1})) v_{k-1}

and let W_i = span(x1, ..., xi) for i = 1, ..., k.

Then for each i = 1, ..., k, {v1, ..., vi} is an orthogonal basis for W_i. In particular, {v1, ..., vk} is an orthogonal basis for W.
Jorgen Pedersen Gram (1850-1916) was a Danish actuary (insurance statistician) who was interested in the science of measurement. He first published the process that bears his name in connection with least squares problems.

Stated succinctly, Theorem 5.15 says that every subspace of R^n has an orthogonal basis, and it gives an algorithm for constructing such a basis.

Proof  We will prove by induction that, for each i = 1, ..., k, {v1, ..., vi} is an orthogonal basis for W_i.

Since v1 = x1, clearly {v1} is an (orthogonal) basis for W1 = span(x1). Now assume that, for some i < k, {v1, ..., vi} is an orthogonal basis for W_i. Then

v_{i+1} = x_{i+1} - ((v1 . x_{i+1})/(v1 . v1)) v1 - ... - ((v_i . x_{i+1})/(v_i . v_i)) v_i

By the induction hypothesis, {v1, ..., vi} is an orthogonal basis for span(x1, ..., xi) = W_i. Hence,

v_{i+1} = x_{i+1} - proj_{W_i}(x_{i+1}) = perp_{W_i}(x_{i+1})

So, by the Orthogonal Decomposition Theorem, v_{i+1} is orthogonal to W_i. By construction, v1, ..., v_{i+1} are all in W_{i+1}, so {v1, ..., v_{i+1}} is an orthogonal (hence linearly independent) set in W_{i+1}, and it is a basis for W_{i+1}, since dim W_{i+1} = i + 1. This completes the proof.

If we require an orthonormal basis for W, we simply need to normalize the orthogonal vectors produced by the Gram-Schmidt Process. That is, for each i, we replace v_i by the unit vector q_i = (1/||v_i||) v_i.
Example 5.13

Apply the Gram-Schmidt Process to construct an orthonormal basis for the subspace W = span(x1, x2, x3) of R^4, where

x1 = [1, -1, -1, 1],  x2 = [2, 1, 0, 1],  x3 = [2, 2, 1, 2]

Solution  First we note that {x1, x2, x3} is a linearly independent set, so it forms a basis for W. We begin by setting v1 = x1. Next, we compute the component of x2 orthogonal to W1 = span(v1):

v2 = x2 - ((v1 . x2)/(v1 . v1)) v1 = [2, 1, 0, 1] - (2/4)[1, -1, -1, 1] = [3/2, 3/2, 1/2, 1/2]

For hand calculations, it is a good idea to "scale" v2 at this point to eliminate fractions. When we are finished, we can rescale the orthogonal set we are constructing to obtain an orthonormal set; thus, we can replace each v_i by any convenient scalar multiple without affecting the final result. Accordingly, we replace v2 by

v2' = 2 v2 = [3, 3, 1, 1]

We now find the component of x3 orthogonal to

W2 = span(x1, x2) = span(v1, v2) = span(v1, v2')

using the orthogonal basis {v1, v2'}:

v3 = x3 - ((v1 . x3)/(v1 . v1)) v1 - ((v2' . x3)/(v2' . v2')) v2'
   = [2, 2, 1, 2] - (1/4)[1, -1, -1, 1] - (15/20)[3, 3, 1, 1]
   = [-1/2, 0, 1/2, 1]

Again, we rescale and use v3' = 2 v3 = [-1, 0, 1, 2].

We now have an orthogonal basis {v1, v2', v3'} for W. (Check to make sure that these vectors are orthogonal.) To obtain an orthonormal basis, we normalize each vector:

q1 = (1/||v1||) v1  = (1/2)[1, -1, -1, 1]        = [1/2, -1/2, -1/2, 1/2]
q2 = (1/||v2'||) v2' = (1/sqrt(20))[3, 3, 1, 1]  = [3 sqrt(5)/10, 3 sqrt(5)/10, sqrt(5)/10, sqrt(5)/10]
q3 = (1/||v3'||) v3' = (1/sqrt(6))[-1, 0, 1, 2]  = [-sqrt(6)/6, 0, sqrt(6)/6, sqrt(6)/3]

Then {q1, q2, q3} is an orthonormal basis for W.
5.3
The Gra m-SchmIdt Process and the QR ""aclori1.3tion
319
One of the important uses of the Gram-Sch midt Process IS to construct an orthogonal basis that contains a specified vecto r. The n ext example ill ustrates this application.
Example 5.14
Find an orthogonal basis for R^3 that contains the vector

v1 = [1, 2, 3]

Solution  We first find any basis for R^3 containing v1. If we take

x2 = [0, 1, 0]  and  x3 = [0, 0, 1]

then {v1, x2, x3} is clearly a basis for R^3. (Why?) We now apply the Gram-Schmidt Process to this basis to obtain

v2 = x2 - ((v1 . x2)/(v1 . v1)) v1 = [0, 1, 0] - (2/14)[1, 2, 3] = [-1/7, 5/7, -3/7]

which we rescale to v2' = 7 v2 = [-1, 5, -3], and finally

v3 = x3 - ((v1 . x3)/(v1 . v1)) v1 - ((v2' . x3)/(v2' . v2')) v2'
   = [0, 0, 1] - (3/14)[1, 2, 3] - (-3/35)[-1, 5, -3]
   = [-3/10, 0, 1/10]

Then {v1, v2', v3} is an orthogonal basis for R^3 that contains v1.

Similarly, given a unit vector, we can find an orthonormal basis that contains it by using the preceding method and then normalizing the resulting orthogonal vectors.
Remark  When the Gram-Schmidt Process is implemented on a computer, there is almost always some roundoff error, leading to a loss of orthogonality in the vectors q_i. To avoid this loss of orthogonality, some modifications are usually made. The vectors v_i are normalized as soon as they are computed, rather than at the end, to give the vectors q_i, and as each q_i is computed, the remaining vectors x_j are modified to be orthogonal to q_i. This procedure is known as the Modified Gram-Schmidt Process. In practice, however, a version of the QR factorization is used to compute orthonormal bases.
The QR Factorization

If A is an m x n matrix with linearly independent columns (requiring that m >= n), then applying the Gram-Schmidt Process to these columns yields a very useful factorization of A into the product of a matrix Q with orthonormal columns and an upper triangular matrix R. This is the QR factorization, and it has applications to the numerical approximation of eigenvalues, which we explore at the end of this section, and to the problem of least squares approximation, which we discuss in Chapter 7.

To see how the QR factorization arises, let a1, ..., an be the (linearly independent) columns of A and let q1, ..., qn be the orthonormal vectors obtained by applying the Gram-Schmidt Process to A with normalizations. From Theorem 5.15, we know that, for each i = 1, ..., n,

W_i = span(a1, ..., ai) = span(q1, ..., qi)

Therefore, there are scalars r_1i, r_2i, ..., r_ii such that

a_i = r_1i q1 + r_2i q2 + ... + r_ii q_i   for i = 1, ..., n

That is,

a1 = r_11 q1
a2 = r_12 q1 + r_22 q2
...
an = r_1n q1 + r_2n q2 + ... + r_nn qn

which can be written in matrix form as

A = [a1 a2 ... an] = [q1 q2 ... qn] [ r_11  r_12  ...  r_1n
                                        0   r_22  ...  r_2n
                                       ...                 
                                        0     0   ...  r_nn ]  = QR

Clearly, the matrix Q has orthonormal columns. It is also the case that the diagonal entries of R are all nonzero. To see this, observe that if r_ii = 0, then a_i would be a linear combination of q1, ..., q_{i-1} and, hence, would be in W_{i-1}. But then a_i would be a linear combination of a1, ..., a_{i-1}, which is impossible, since a1, ..., a_i are linearly independent. We conclude that r_ii != 0 for i = 1, ..., n. Since R is upper triangular, it follows that it must be invertible. (See Exercise 23.) We have proved the following theorem.

Theorem 5.16

The QR Factorization

Let A be an m x n matrix with linearly independent columns. Then A can be factored as A = QR, where Q is an m x n matrix with orthonormal columns and R is an invertible upper triangular matrix.

Remarks

• We can also arrange for the diagonal entries of R to be positive. If any r_ii < 0, simply replace q_i by -q_i and r_ii by -r_ii.

• The requirement that A have linearly independent columns is a necessary one. To prove this, suppose that A is an m x n matrix that has a QR factorization, as in Theorem 5.16. Then, since R is invertible, we have Q = AR^-1. Hence, rank(Q) = rank(A), by Exercise 55 in Section 3.5. But rank(Q) = n, since its columns are orthonormal and, therefore, linearly independent. So rank(A) = n too, and consequently the columns of A are linearly independent, by the Fundamental Theorem.

• The QR factorization can be extended to arbitrary matrices in a slightly modified form. If A is m x n, it is possible to find a sequence of orthogonal matrices Q1, ..., Q_{m-1} such that Q_{m-1} ... Q2 Q1 A is an upper triangular m x n matrix R. Then A = QR, where Q = (Q_{m-1} ... Q2 Q1)^-1 is an orthogonal matrix. We will examine this approach in Exploration: The Modified QR Factorization.
Example 5.15

Find a QR factorization of

A = [ 1   2  2
     -1   1  2
     -1   0  1
      1   1  2 ]

Solution  The columns of A are just the vectors from Example 5.13. The orthonormal basis for col(A) produced by the Gram-Schmidt Process was

q1 = [1/2, -1/2, -1/2, 1/2],  q2 = [3 sqrt(5)/10, 3 sqrt(5)/10, sqrt(5)/10, sqrt(5)/10],  q3 = [-sqrt(6)/6, 0, sqrt(6)/6, sqrt(6)/3]

so

Q = [q1 q2 q3] = [  1/2   3 sqrt(5)/10   -sqrt(6)/6
                   -1/2   3 sqrt(5)/10       0
                   -1/2     sqrt(5)/10    sqrt(6)/6
                    1/2     sqrt(5)/10    sqrt(6)/3 ]

From Theorem 5.16, A = QR for some upper triangular matrix R. To find R, we use the fact that Q has orthonormal columns and, hence, Q^T Q = I. Therefore,

Q^T A = Q^T QR = IR = R

and we compute

R = Q^T A = [ 2      1          1/2
              0   sqrt(5)   3 sqrt(5)/2
              0      0       sqrt(6)/2 ]
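For comparison, NumPy's built-in QR routine reproduces this factorization up to the signs of the columns of Q; this is a quick check, not part of the text:

```python
import numpy as np

A = np.array([[ 1., 2., 2.],
              [-1., 1., 2.],
              [-1., 0., 1.],
              [ 1., 1., 2.]])

Q, R = np.linalg.qr(A)                   # reduced QR factorization
print(np.allclose(Q @ R, A))             # True
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: orthonormal columns
print(np.round(R, 4))                    # upper triangular; |diagonal| = [2, sqrt(5), sqrt(6)/2]
```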
I" Exercises 1-4, the givell vectors form a basis for iR 2 or R.I. Apply the Gram-Schmidl Process to oblain all orl/lOgO/IUI basis. Tllell IIormalize this basis to oblaill an orthonormal
2.
x,~ [_:].x,~ [:] I
btuis.
Lx,~ [:].x,~ [;l
3.
Xl
=
0
- I , x2 =
3
- 1
3
• Xl
=
3 2 4
392
Chapter 5 Orihogomailly
4. x, =
/
/
/ • x~ =
/
/
0
/
, xJ =
0
14. Q =
0
I" Exemses 5 a"d6, ' he gil'en l'eelo1"$ form a baSIS for a SIIbsp(lce IV of IItl or IItt. Apply Ihe Gram-Sc/HI/idt Process to obtain an orthogonal basis for w: 2 3
of. x, --
/
3
/ • x,= • 0
4
- /
6. XI =
/
2
- / , Xl =
2
0 4
In Exercises 7 and 8, fimllile orthogonal decomposition of v
lVith respect to IIII! SUbs/Hlce w: 4 - 4 , \Vas in
7. v =
17. A = Exe rci~
5
, Was 111 Exercise 6
I I
I
10.
I
0
I
I
I
I
- I
2
- I
I
I
5
0 I
11. Fmd an orthogonal baSIS for R J that contains the
3 vector
-2
-2
I
I
3
2
4
-I
- I
•Q
=
-~
•Q =
1/ V. 2/ V. - I/ V.
o
~
I / YJ
o I / YJ I / YJ
of A. 20. Prove that A is invertible if and only if A = QR, where Q is o rthogonal and R is upper triangular with nonzero emries o n its diago nal.
III Exercistj Z J (md 12, lise tile metllod slIggested by Exercise 20 to compule A - I for Ihe lIlatrix II III the givell
exercise. 2 1. Exercise: 9
I
5 12. Find an o rthogonal basis for R 4 tha t cont:lins the veclOrs I 2 I
0 - I
and
0
3 2
In Excmses /3 and 14, fill ill tIll! missillg entries of Q /0 IIwke Q rill ortllogo/l(l/ matrix.
13. Q =
- I
,, -,,, ,
19. Ir A is an orthogonal matriX, find a QR ractorization
I
0 I
I
I
, ,! !, 1 ,
1
2
8 7
o
US/! tlte Gmlll-Schmidt Process 10 find an orlhoJ,'OI/allxlSis for the colllmll sp(lces of the ttlmrices in Exercises 9 {md 10.
9.
16. Exercise 10
In Exerrises 17 "",I 18, the colllmm of Q lVe{e obtained by applyillS tIle Grnm-Sdllllitlt Process to the colllmns of A. Fi"d the upper ,{i(IIIS"/(lr 1Ilf//rix R SlIell that A = OR 2
2
0
1/ v'I4 0
15. Exercise 9
18. A =
o
1/ 2 1/ 2
• • • • • • • •
III Exerases 15 alld 16, filld {j QR faerorizatioll of the matrix ill tile givell exerciSe.
/
8. v =
2/ v'I4
1/2 - 3/v'14
3 4
1/2
1/'./2
I/ YJ
o
I / YJ
-I/ V,
I/ YJ
• •
•
22. Exercise 15
23. Let A be an //IX II mat rix wi th linearly independent columns. Gi\'e an alternative proof that the upper triangular matrix R in a QR factorization of II must be invertible, usi ng properly (c) of the Fundamental Theorem. 24. Let II be an m X " matrix with linearly independent columns and let A = QR be a QR fa ctonzaliOll of A. Show that II and Q have the same column space.
The Modified QR Factorization

When the matrix A does not have linearly independent columns, the Gram-Schmidt Process as we have stated it does not work and so cannot be used to develop a generalized QR factorization of A. There is a modification of the Gram-Schmidt Process that can be used, but instead we will explore a method that converts A into upper triangular form one column at a time, using a sequence of orthogonal matrices. The method is analogous to that of LU factorization, in which the matrix L is formed using a sequence of elementary matrices. The first thing we need is the "orthogonal analogue" of an elementary matrix; that is, we need to know how to construct an orthogonal matrix Q that will transform a
y
FIIIUI 5.11
given column of A-call II x- into the corresponding column of R-call it y. By Theorcm 5.6, it will be neCess.1ry that Il x ~ = ~ Qxl = ~ yl . Flgurc 5. 11 suggests a way to proceed: We can reflect x in a line perpcndicular to x - y. If
!
I
f Alston Houschold~r (1904- 1993) was one of Ihe pionccrs in Ihe field of numerical linear algebro. He W:t.~ the nrsllo pre~ nt ;1 systematic trealment of algoTlthms for solving
problems ul\'olvi ng IlIleaTsystems. In addition to IIltroduCtng th~ widely used Uouseholder trnnsformaflons that bear hiS name, he was one of the nrst to advocate the systemallc use of norms in hncar algebra. HIS 1964 book nle 1111'ory of Motrkn in Mlllluiall Alla/ysis IS m,,, a classIC.
d,,,,,,
is the unit vecto r in the direction of x - y, then u.l. = [
-~:J is orthogonal to u. and
we can use Exercise 26 in Section 3.6 10 find the standard matrix Q of the reflection In the line through Ihe origm in the direction of u.l. . I.
2.
I - 2d l
Show that Q :: [
, "
-2(1( 2
Co mpute Q for
We can generalize Ihe defi nition of Q as fo llows. If u is any unit vector in R", we define an /IX" matrix Q as
Q :: J - 2uu T
...
Such a matrix is called a HouS('holder matrix (or an elementary "i!edor).
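Here is a brief NumPy sketch of this construction (an illustration only, not part of the exploration); the vectors x and y below are sample values chosen so that ||x|| = ||y||:

```python
import numpy as np

def householder(u):
    """Householder matrix Q = I - 2 u u^T for a unit vector u."""
    u = u / np.linalg.norm(u)          # make sure u is a unit vector
    return np.eye(len(u)) - 2.0 * np.outer(u, u)

# Reflector that sends x to y: take u in the direction of x - y
x = np.array([3.0, 4.0])
y = np.array([5.0, 0.0])               # any y with ||y|| = ||x||
Q = householder(x - y)

print(np.allclose(Q @ x, y))           # True
print(np.allclose(Q.T @ Q, np.eye(2))) # True: Q is orthogonal (and symmetric)
```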
3.
Prove that every Householder matrix Q satisfies the following properties: (3) Q is symmetric.
ll ,
(b) Q is orthogonal.
(c) Q l = I
4. Prove thai if Q is a Ho useholder matrix corresponding to the unit vector then Qv = {
- v v
ifv is in span(u) if v'u = O
I
5. Compute Q for u ==
- I and verify Problems 3 and 4.
2 Let x *- y with Ilxll == Iyl and sct u == ( If Ix - yl)(x - y). Prove that the correspond ing H ouseholder mat rix Q satisfies Ox = y. (Hi"t: Ap ply Exercise 51 in Section 1.2 to the result in Problem 4.)
6.
7.
Fi nd Qand verify Problem 6 for
x=
I 2
y =
and
3 0
o
2
\Ve are now read y to perform the triangubrization of an m X " matrix A, column by column. 8.
Let x be the first co lum n of A and lei
o Show that if 0 1 is the Householder mat rix given by Problem 6, thcn QIA is a mat rix with the block form
whereA 1 is (m - I)X (n - 1).
If we repeat Problem 80n the ma trix AI' we usea Householder matrix P1 such that
•
where A2 is ( m - 2)X( n - 2) . 9.
Set Q 2 = [ I
o
0 ]. Sho w that Q1 P,
IS
an orthogonal matrix and that
• • • •
...
10. Show that we c:m contmue in this fash ion to find a sequence o f orthogo nal matrices QI" ' " Q"'_I such th.1t 0",- , '" Q20lA = R is ;ln upper tri:mgular mX n matrix (i.e., r'J = 0 if i > )). ! I.
Deduce that A = QRwith Q = Q I Q2
12.
Use the method of this explorauo n LO find a OR factorization of (, ) A _ [
3
- 4
3
0 ",-1 o rthogonal.
1
;]
9
.
(b ) A -
2 2
3 - 4
3
2
1
1
-s
- \
- 2
Approximating Eigenvalues with the QR Algorithm
See G. H. Golub and C. F. \'an Loan, Matrix Comp"tt/tions
(Bahimore: Johns Hopkins Ul1lver~ity Press, 1983).
One of the best (and most widely used) methods for numerically approx imating the eigenvalues o f a matrix makes use o f the OJ? fa ctorization . The purpose of th is exploration is 10 introduce this method, the QR algorithm, and to show it at work In a few examples. For a more complete treatment of l his tOP'C, consult any good text on numcricallinear algebra. (You will find it helpfu l to usc a CAS to perform the calculations in the problems below.) Given a square matrix A, the first step is to factor it as A =< OR (using whichever method is appropn ;1 te) . Then we define A, = RO. I.
First prove Ihm Al is simi lar to A. Then prove that AI has the same eigenval-
[ ues asA. 2. If A = :
]
~,fi nd AI and verify th at it has the same eigenvalues as A.
Continuing the algorithm, we factor Al as A, ... Q,R, and set A] "" RIQI' Then we factor A2 == Q) R2 and set Aj ::: Rl O l , and so 011. That is, for k?: I , we compute A, = QtR, and then set AU l == I~., Q,. 3.
Prove that At is similar to A for all k ?: I. A~,
4. Continuing Problem 2, compute A!, A ), A4 , and accuracy. What do you notice?
using two-decimal-place
It can be shown that jf the eigenvalues of A are all real and have distinct absolute values, then the m3lrices At ap proach an upper triangular mat rix U.
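A small NumPy sketch of the iteration just described (an illustration, not the CAS commands the text has in mind); the sample matrix below is chosen only for demonstration:

```python
import numpy as np

def qr_algorithm(A, iterations=50):
    """Iterate A_{k+1} = R_k Q_k, where A_k = Q_k R_k (unshifted QR algorithm)."""
    Ak = np.array(A, dtype=float)
    for _ in range(iterations):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak

A = np.array([[2., 1.],
              [1., 3.]])                    # sample symmetric matrix (illustration only)
print(np.round(qr_algorithm(A), 4))          # nearly upper triangular; diagonal ~ eigenvalues
print(np.round(np.linalg.eigvals(A), 4))     # compare
```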
5.
What will be true of the diagona l entnes o f this matrix U?
6. ApproXImate the eigenvalues of the following mat rices by applying the QR algorithm. Use two-decimal- place accuracy and perfo rm at least five iterations.
(, )
[~
:] 1
I
(e)
1 2
- 4
7. Why?
Apply the
0
0
(b ) [; - \
1
I
:] I
(d )
\
- \
0 2
0
- 2
OJ? algorithm to the matrix A =
4 [
2 2
-I
3]. What happens?
- 2
39.
8.
Shift the eigenvalues of the matrix in Problem 7 by replacing A wi th B = A + 0.91. Apply the OR algor it hm to B and then shift back by subtracting 0.9 from the (approximate) eigcnvalues of B. Verify thaI this method approx.imatcs the eigenvalues of A. 9.
Let 0 1)
=0
and RI)
= R. First show that
~O l
... O,,_]A k = AOOOI ... Ok - l
for all k::! 1. Then show that
(000 1'" Ol)(R~ '" R1RO) = A(OOOI "Q"_I)(R"_1 ". RlI?O)
I Hint: Repeatedly use the same approach
used for the fi rst equation, working from th e "inside out."] Finally, deduce that (00 0 1 ..• Ok)( R,, ' " R]Ro) IS the OR factoriza tion of Al+l .
Orthogonal Diagonalization of Symmetric Matrices

We saw in Chapter 4 that a square matrix with real entries will not necessarily have real eigenvalues. Indeed, the matrix

[ 0  -1
  1   0 ]

has complex eigenvalues i and -i. We also discovered that not all square matrices are diagonalizable. The situation changes dramatically if we restrict our attention to real symmetric matrices. As we will show in this section, all of the eigenvalues of a real symmetric matrix are real, and such a matrix is always diagonalizable.

Recall that a symmetric matrix is one that equals its own transpose. Let's begin by studying the diagonalization process for a symmetric 2 x 2 matrix.
Example 5.16

If possible, diagonalize the matrix

A = [ 1   2
      2  -2 ]

Solution  The characteristic polynomial of A is lambda^2 + lambda - 6 = (lambda + 3)(lambda - 2), from which we see that A has eigenvalues lambda1 = -3 and lambda2 = 2. Solving for the corresponding eigenvectors, we find

v1 = [1, -2]  and  v2 = [2, 1]

respectively. So A is diagonalizable, and if we set P = [v1 v2], then we know that

P^-1 A P = [ -3  0
              0  2 ]  =  D

However, we can do better. Observe that v1 and v2 are orthogonal. So, if we normalize them to get the unit eigenvectors

u1 = [1/sqrt(5), -2/sqrt(5)]  and  u2 = [2/sqrt(5), 1/sqrt(5)]

and then take

Q = [u1 u2] = [  1/sqrt(5)   2/sqrt(5)
                -2/sqrt(5)   1/sqrt(5) ]

we have Q^-1 A Q = D also. But now Q is an orthogonal matrix, since {u1, u2} is an orthonormal set of vectors. Therefore, Q^-1 = Q^T, and we have Q^T A Q = D. (Note that the checking is easy, since computing Q^-1 only involves taking a transpose!)
The situation in Example 5.16 is the one that interests us. It is important enough to warrant a new definition.

Definition  A square matrix A is orthogonally diagonalizable if there exists an orthogonal matrix Q and a diagonal matrix D such that Q^T A Q = D.

We are interested in finding conditions under which a matrix is orthogonally diagonalizable. Theorem 5.17 shows us where to look.
Theorem 5.17

If A is orthogonally diagonalizable, then A is symmetric.

Proof  If A is orthogonally diagonalizable, then there exists an orthogonal matrix Q and a diagonal matrix D such that Q^T A Q = D. Since Q^-1 = Q^T, we have Q^T Q = I = Q Q^T, so

A = Q D Q^T

But then

A^T = (Q D Q^T)^T = (Q^T)^T D^T Q^T = Q D Q^T = A

since every diagonal matrix is symmetric. Hence, A is symmetric.

Remark  Theorem 5.17 shows that the orthogonally diagonalizable matrices are all to be found among the symmetric matrices. It does not say that every symmetric matrix must be orthogonally diagonalizable. However, it is a remarkable fact that this is indeed true! Finding a proof for this amazing result will occupy us for much of the rest of this section.

We next prove that we don't need to worry about complex eigenvalues when working with symmetric matrices with real entries.
Theorem 5.18

If A is a real symmetric matrix, then the eigenvalues of A are real.

Recall that the complex conjugate of a complex number z = a + bi is the number conj(z) = a - bi (see Appendix C). To show that z is real, we need to show that b = 0. One way to do this is to show that conj(z) = z, for then bi = -bi (or 2bi = 0), from which it follows that b = 0. We can also extend the notion of complex conjugate to vectors and matrices by, for example, defining conj(A) to be the matrix whose entries are the complex conjugates of the entries of A; that is, if A = [a_ij], then conj(A) = [conj(a_ij)]. The rules for complex conjugation extend easily to matrices; in particular, conj(AB) = conj(A) conj(B) for compatible matrices A and B.

Proof  Suppose that lambda is an eigenvalue of A with corresponding eigenvector v. Then Av = lambda v, and, taking complex conjugates, we have conj(Av) = conj(lambda v). But then

A conj(v) = conj(A) conj(v) = conj(Av) = conj(lambda v) = conj(lambda) conj(v)

since A is real. Taking transposes and using the fact that A is symmetric, we have

conj(v)^T A = conj(v)^T A^T = (A conj(v))^T = (conj(lambda) conj(v))^T = conj(lambda) conj(v)^T

Therefore,

lambda (conj(v)^T v) = conj(v)^T (lambda v) = conj(v)^T (Av) = (conj(v)^T A) v = (conj(lambda) conj(v)^T) v = conj(lambda) (conj(v)^T v)

or (lambda - conj(lambda))(conj(v)^T v) = 0. Now if v = [a1 + b1 i, ..., an + bn i], then

conj(v)^T v = (a1^2 + b1^2) + ... + (an^2 + bn^2) != 0

since v != 0 (because it is an eigenvector). We conclude that lambda - conj(lambda) = 0, or lambda = conj(lambda). Hence, lambda is real.

Theorem 4.20 showed that, for any square matrix, eigenvectors corresponding to distinct eigenvalues are linearly independent. For symmetric matrices, something stronger is true: Such eigenvectors are orthogonal.
Theorem 5.19

If A is a symmetric matrix, then any two eigenvectors corresponding to distinct eigenvalues of A are orthogonal.

Proof  Let v1 and v2 be eigenvectors corresponding to the distinct eigenvalues lambda1 != lambda2, so that Av1 = lambda1 v1 and Av2 = lambda2 v2. Using A^T = A and the fact that x . y = x^T y for any two vectors x and y in R^n, we have

lambda1 (v1 . v2) = (lambda1 v1) . v2 = (Av1) . v2 = (Av1)^T v2
                  = (v1^T A^T) v2 = (v1^T A) v2 = v1^T (Av2)
                  = v1^T (lambda2 v2) = lambda2 (v1^T v2) = lambda2 (v1 . v2)

Hence, (lambda1 - lambda2)(v1 . v2) = 0. But lambda1 - lambda2 != 0, so v1 . v2 = 0, as required.

Example 5.17

Verify the result of Theorem 5.19 for

A = [ 2  1  1
      1  2  1
      1  1  2 ]

Solution  The characteristic polynomial of A is -lambda^3 + 6 lambda^2 - 9 lambda + 4 = -(lambda - 4)(lambda - 1)^2, from which it follows that the eigenvalues of A are lambda1 = 4 and lambda2 = 1. The corresponding eigenspaces are

E4 = span([1, 1, 1])  and  E1 = span([-1, 0, 1], [-1, 1, 0])

(Check this.) We easily verify that

[1, 1, 1] . [-1, 0, 1] = 0  and  [1, 1, 1] . [-1, 1, 0] = 0

from which it follows that every vector in E4 is orthogonal to every vector in E1. (Why?)

Remark  Note that

[-1, 0, 1] . [-1, 1, 0] = 1 != 0

Thus, eigenvectors corresponding to the same eigenvalue need not be orthogonal.
We can now prove the main result of this section. It is called the Spectral Theorem, since the set of eigenvalues of a matrix is sometimes called the spectrum of the matrix. (Technically, we should call Theorem 5.20 the Real Spectral Theorem, since there is a corresponding result for matrices with complex entries.)

Theorem 5.20

The Spectral Theorem

Let A be an n x n real matrix. Then A is symmetric if and only if it is orthogonally diagonalizable.

Proof  We have already proved the "if" part as Theorem 5.17. To prove the "only if" implication, we proceed by induction on n. For n = 1, there is nothing to do, since a 1 x 1 matrix is already in diagonal form. Now assume that every k x k real symmetric matrix with real eigenvalues is orthogonally diagonalizable. Let n = k + 1 and let A be an n x n real symmetric matrix with real eigenvalues. Let lambda1 be one of the eigenvalues of A and let v1 be a corresponding eigenvector. Then v1 is a real vector (why?) and we can assume that v1 is a unit vector, since otherwise we can normalize it and we will still have an eigenvector corresponding to lambda1. Using the Gram-Schmidt Process, we can extend v1 to an orthonormal basis {v1, v2, ..., vn} of R^n. Now we form the matrix

Q1 = [v1  v2  ...  vn]
Spectrum is a Latin word meaning "image." When atoms vibrate, they emit light. And when light passes through a prism, it spreads out into a spectrum, a band of rainbow colors. Vibration frequencies correspond to the eigenvalues of a certain operator and are visible as bright lines in the spectrum of the light that is emitted. Thus, we can literally see the eigenvalues of the atom in its spectrum, and for this reason, it is appropriate that the word spectrum has come to be applied to the set of eigenvalues of a matrix.

In a lecture he delivered at the University of Gottingen in 1905, the German mathematician David Hilbert (1862-1943) considered linear operators acting on certain infinite-dimensional vector spaces. Out of this lecture arose the notion of a quadratic form in infinitely many variables, and it was in this context that Hilbert first used the term spectrum to mean a complete set of eigenvalues. The spaces in question are now called Hilbert spaces. Hilbert made major contributions to many areas of mathematics, among them integral equations, number theory, geometry, and the foundations of mathematics. In 1900, at the Second International Congress of Mathematicians in Paris, Hilbert gave an address entitled "The Problems of Mathematics." In it, he challenged mathematicians to solve 23 problems of fundamental importance during the coming century. Many of the problems have been solved (some were proved true, others false) and some may never be solved. Nevertheless, Hilbert's speech energized the mathematical community and is often regarded as the most influential speech ever given about mathematics.

Then Q1 is orthogonal, and

Q1^T A Q1 = [v1 v2 ... vn]^T [Av1  Av2  ...  Avn] = [v1 v2 ... vn]^T [lambda1 v1  Av2  ...  Avn] = B

where the first column of B is [lambda1, 0, ..., 0],
since v1^T (lambda1 v1) = lambda1 (v1^T v1) = lambda1 (v1 . v1) = lambda1 and v_i^T (lambda1 v1) = lambda1 (v_i^T v1) = lambda1 (v_i . v1) = 0 for i != 1, because {v1, v2, ..., vn} is an orthonormal set.

But

B^T = (Q1^T A Q1)^T = Q1^T A^T (Q1^T)^T = Q1^T A Q1 = B

so B is symmetric. Therefore, B has the block form

B = [ lambda1  0
         0     A1 ]

and A1 is symmetric. Furthermore, B is similar to A (why?), so the characteristic polynomial of B is equal to the characteristic polynomial of A, by Theorem 4.22. By Exercise 39 in Section 4.3, the characteristic polynomial of A1 divides the characteristic polynomial of A. It follows that the eigenvalues of A1 are also eigenvalues of A and, hence, are real. We also see that A1 has real entries. (Why?) Thus, A1 is a k x k real symmetric matrix with real eigenvalues, so the induction hypothesis applies to it. Hence, there is an orthogonal matrix P2 such that P2^T A1 P2 is a diagonal matrix, say D1. Now let

Q2 = [ 1   0
       0   P2 ]

Then Q2 is an orthogonal (k + 1) x (k + 1) matrix, and therefore so is Q = Q1 Q2. Consequently,

Q^T A Q = (Q1 Q2)^T A (Q1 Q2) = Q2^T (Q1^T A Q1) Q2 = Q2^T B Q2
        = [ lambda1     0
               0     P2^T A1 P2 ]  =  [ lambda1  0
                                            0    D1 ]

which is a diagonal matrix. This completes the induction step, and we conclude that, for all n >= 1, an n x n real symmetric matrix with real eigenvalues is orthogonally diagonalizable.
Example 5.18

Orthogonally diagonalize the matrix

A = [ 2  1  1
      1  2  1
      1  1  2 ]

Solution  This is the matrix from Example 5.17. We have already found that the eigenspaces of A are

E4 = span([1, 1, 1])  and  E1 = span([-1, 0, 1], [-1, 1, 0])

We need three orthonormal eigenvectors. First, we apply the Gram-Schmidt Process to

[-1, 0, 1]  and  [-1, 1, 0]

to obtain

[-1, 0, 1]  and  [-1/2, 1, -1/2]

The new vector, which has been constructed to be orthogonal to [-1, 0, 1], is still in E1 (why?) and so is orthogonal to [1, 1, 1]. Thus, we have three mutually orthogonal vectors, and all we need to do is normalize them and construct a matrix Q with these vectors as its columns. We find that

Q = [ 1/sqrt(3)  -1/sqrt(2)  -1/sqrt(6)
      1/sqrt(3)      0        2/sqrt(6)
      1/sqrt(3)   1/sqrt(2)  -1/sqrt(6) ]

and it is straightforward to verify that

Q^T A Q = [ 4  0  0
            0  1  0
            0  0  1 ]
The Spectral Theorem allows us to write a real symmetric matrix A in the form A = Q D Q^T, where Q is orthogonal and D is diagonal. The diagonal entries of D are just the eigenvalues of A, and if the columns of Q are the orthonormal vectors q1, ..., qn, then, using the column-row representation of the product, we have

A = Q D Q^T = [q1 q2 ... qn] diag(lambda1, ..., lambdan) [q1 q2 ... qn]^T
  = lambda1 q1 q1^T + lambda2 q2 q2^T + ... + lambdan qn qn^T

This is called the spectral decomposition of A. Each of the terms lambda_i q_i q_i^T is a rank 1 matrix, by Exercise 56 in Section 3.5, and q_i q_i^T is actually the matrix of the projection onto the subspace spanned by q_i. (See Exercise 25.) For this reason, the spectral decomposition is sometimes referred to as the projection form of the Spectral Theorem.
Example 5.19
Find the spectral decomposition of the matrix A from Example 5.18.

Solution   From Example 5.18, we have
q₁ = [1/√3; 1/√3; 1/√3],   q₂ = [−1/√2; 0; 1/√2],   q₃ = [−1/√6; 2/√6; −1/√6]

with λ₁ = 4 and λ₂ = λ₃ = 1. Therefore,

q₁q₁ᵀ = [ 1/3  1/3  1/3
          1/3  1/3  1/3
          1/3  1/3  1/3 ]

q₂q₂ᵀ = [ 1/2   0  −1/2
           0    0    0
         −1/2   0   1/2 ]

q₃q₃ᵀ = [ 1/6  −1/3  1/6
         −1/3   2/3 −1/3
          1/6  −1/3  1/6 ]

and the spectral decomposition of A is

A = λ₁q₁q₁ᵀ + λ₂q₂q₂ᵀ + λ₃q₃q₃ᵀ = 4q₁q₁ᵀ + q₂q₂ᵀ + q₃q₃ᵀ

which can be easily verified. In this example, λ₂ = λ₃, so we could combine the last two terms λ₂q₂q₂ᵀ + λ₃q₃q₃ᵀ to get

q₂q₂ᵀ + q₃q₃ᵀ = [ 2/3  −1/3  −1/3
                 −1/3   2/3  −1/3
                 −1/3  −1/3   2/3 ]

The rank 2 matrix q₂q₂ᵀ + q₃q₃ᵀ is the matrix of the orthogonal projection onto the two-dimensional eigenspace E₁. (Compare Exercise 26.)
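The following short NumPy sketch (an illustration added here, not part of the original text) rebuilds A from its spectral decomposition; numpy.linalg.eigh returns orthonormal eigenvectors for a symmetric matrix, so summing the rank 1 terms λᵢqᵢqᵢᵀ recovers A.

import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])
eigenvalues, Q = np.linalg.eigh(A)           # columns of Q are orthonormal eigenvectors
terms = [lam * np.outer(q, q) for lam, q in zip(eigenvalues, Q.T)]
print(np.allclose(sum(terms), A))            # True: the projections rebuild A
print([np.linalg.matrix_rank(t) for t in terms])  # each term has rank 1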
Observe that the spectral decomposition expresses a symmetric matrix A explicitly in terms of its eigenvalues and eigenvectors. This gives us a way of constructing a matrix with given eigenvalues and (orthonormal) eigenvectors.
Example 5.20
Find a 2×2 matrix with eigenvalues λ₁ = 3 and λ₂ = −2 and corresponding eigenvectors

v₁ = [3; 4]   and   v₂ = [−4; 3]

Solution   We begin by normalizing the vectors to obtain an orthonormal basis {q₁, q₂}, with

q₁ = [3/5; 4/5]   and   q₂ = [−4/5; 3/5]

Now we compute the matrix A whose spectral decomposition is

A = λ₁q₁q₁ᵀ + λ₂q₂q₂ᵀ
  = 3[ 9/25  12/25 ; 12/25  16/25 ] − 2[ 16/25  −12/25 ; −12/25  9/25 ]
  = [ −1/5  12/5
       12/5   6/5 ]

It is easy to check that A has the desired properties. (Do this.)
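A brief sketch in NumPy (an illustration; it simply restates the construction above in matrix form A = QDQᵀ) confirms that the resulting matrix has the prescribed eigenvalues and eigenvectors.

import numpy as np

v1 = np.array([3.0, 4.0])
v2 = np.array([-4.0, 3.0])
Q = np.column_stack([v1 / np.linalg.norm(v1), v2 / np.linalg.norm(v2)])
D = np.diag([3.0, -2.0])
A = Q @ D @ Q.T                            # spectral decomposition in matrix form
print(A)                                   # [[-0.2, 2.4], [2.4, 1.2]]
print(np.allclose(A @ v1, 3 * v1), np.allclose(A @ v2, -2 * v2))  # True True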
Orthogonally diagonalize the matrices in Exercises 1-10 by finding an orthogonal matrix Q and a diagonal matrix D such that QᵀAQ = D.
I. A= [:~ ] 3. A =
2. A ::::[- ~ _ n
[~~]
4. A =
500 5. A=O
7. A
=
9. A =
I
3
03
1
2
6.A =
I
0
- I
0
I
0
-I
0
1
I 1 0 1 I 0 001 001
0 0 I 1
B.A =
-!]
[_~ 3
0
32
4
o
2
4
0, o rthogonally dlagonalize A ::::
0
a 0
b 0 a
13. Let A and B be orthogonally diagonalizable n×n matrices and let c be a scalar. Use the Spectral Theorem to prove that the following matrices are orthogonally diagonalizable:
(a) A + B   (b) cA   (c) A²
15. If A and B are o rthogonally diagonalizable and AB = BA, show that AB is orthogonally diagonalizable.
0 0 1 o 1 0 o o 0 1 o 1 0 0 2
[~
*"
b
14. If A is an invertible matrix that is orthogonally diagonalizable, show that A⁻¹ is orthogonally diagonalizable.
1 2 2 2 1 2 2 2 1
11. If b -i= 0, orthogonally diagonalize A ""
12. If b
(a) A
2
to. A ::::
a 0
~l
16. If A is a symmetric matrix, show that every eigenvalue of A is nonnegative if and only if A = B² for some symmetric matrix B.
In Exercises 17-20, find a spectral decomposition of the matrix in the given exercise.
17. Exercise 1   18. Exercise 2   19. Exercise 5   20. Exercise 8
1n Exercises 21 (lluI 22,jilld a symmetric 2X2 matrix with eigenvalues A, (wd A2 alltl corresponding orthogonal eigenvectors v, (wd v 2• 21. A,
=
- 1, A2
::
2, v ,
=:
22. Al = 3, Al = - 3, v, =
[:], V2
=: [ _ : ]
[ ~J. v
1 :: [
-n
In Exercises 23 and 24,fiud Ilsymmetric 3X3 matriX with eigelll'(l /ues A" A2, and AJ and corresponding orthogonal clgel/vectors v,, v~, mid V J •
25. Let q be a unit vector in Rⁿ and let W be the subspace spanned by q. Show that the orthogonal projection of a vector v onto W (as defined in Sections 1.2 and 5.2) is given by proj_W(v) = (qqᵀ)v and that the matrix of this projection is thus qqᵀ. [Hint: Remember that, for x and y in Rⁿ, x · y = xᵀy.]
26. Let {q₁, ..., q_k} be an orthonormal set of vectors in Rⁿ and let W be the subspace spanned by this set.
(a) Show that the matrix of the orthogonal projection onto W is given by P = q₁q₁ᵀ + ··· + q_k q_kᵀ.
(b) Show that the projection matrix P in part (a) is symmetric and satisfies P² = P.
(c) Let Q = [q₁ ··· q_k] be the n×k matrix whose columns are the orthonormal basis vectors of W. Show that P = QQᵀ and deduce that rank(P) = k.
27. Let A be an n×n real matrix, all of whose eigenvalues are real. Prove that there exist an orthogonal matrix Q and an upper triangular matrix T such that QᵀAQ = T. This very useful result is known as Schur's Triangularization Theorem. [Hint: Adapt the proof of the Spectral Theorem.]
28. Let A be a nilpotent matrix (see Exercise 56 in Section 4.2). Prove that there is an orthogonal matrix Q such that QᵀAQ is upper triangular with zeros on its diagonal. [Hint: Use Exercise 27.]
Applications

Dual Codes
There are many ways of constructing new codes from old ones. In this section, we consider one of the most important of these. First, we need to generalize the concepts of a generator and a parity check matrix for a code. Recall from Section 3.7 that a standard generator matrix for a code is an n×k matrix of the form

G = [ I_k
      A   ]

and a standard parity check matrix is an (n − k)×n matrix of the form

P = [ B | I_{n−k} ]

Observe that the form of these matrices guarantees that the columns of G are linearly independent and the rows of P are linearly independent. (Why?) In proving Theorem 3.34, we showed that G and P are associated with the same code if and only if A = B, which is equivalent to requiring that PG = O. We use these properties as the basis for the following definition.
Definition   For n > k, an n×k matrix G and an (n − k)×n matrix P (with entries in Z₂) are a generator matrix and a parity check matrix, respectively, for an (n, k) binary code C if the following conditions are all satisfied:
1. The columns of G are linearly independent.
2. The rows of P are linearly independent.
3. PG = O
Notice that property (3) implies that every column v of G satisfies Pv = 0 and so is a code vector in C. Also, a vector y is in C if and only if it is obtained from the generator matrix as y = Gu for some vector u in Z₂ᵏ. In other words, C is the column space of G.
To understand the relationship between different generator matrices for the same code, we only need to recall that, just as elementary row operations do not affect the row space of a matrix (by Theorem 3.20), elementary column operations do not affect the column space. For a matrix over Z₂, there are only two relevant operations: interchange two columns (C1) and add one column to another column (C2). (Why are these the only two elementary column operations on matrices over Z₂?) Similarly, elementary row operations preserve the linear independence of the rows of P. Moreover, if E is an elementary matrix and c is a code vector, then

(EP)c = E(Pc) = E0 = 0

It follows that EP is also a parity check matrix for C. Thus, any parity check matrix can be converted into another one by means of a sequence of row operations: interchange two rows (R1) and add one row to another row (R2).
We are interested in showing that any generator or parity check matrix can be brought into standard form. There is one other definition we need. We will call two codes C₁ and C₂ equivalent if there is a permutation matrix M such that C₂ = {Mc : c is in C₁}.
In other words, if we permute thc ent ries of the vectors in C 1 (all in the same way), we can obl'ai n Cz. For examplc,
o
o
I I 0 0,0,0,0
0,1,1 ,
0
o
o
o
o
I
I
0
o
I
o
1
001 are eq uivale nt via the permutation matrix M =
I
0
0 . Permu ting the en trirs of
0
1 0
code vectors correspo nds to permuting the rows of a generato r matrix and permu ting the columns of a parity check matrix for the code. ( Why?)
We can bring any generator matrix for a code into standard form by means of operations C1, C2, and R1. If R1 has not been used, then we have the same code; if R1 has been used, then we have an equivalent code. We can bring any parity check matrix for a code into standard form by means of operations R1, R2, and C1. If C1 has not been used, then we have the same code; if C1 has been used, then we have an equivalent code. The following examples illustrate these points.
Example 5.21
(a) Bring the generator matrix
G =
1
0
I
0
o
1
into standard form and find an associated parity check matrix.
(b) Bring the parity check matrix
p
= [' 0
o
1
o 1
:l
into standard form and find an associated generator matrix.

Solution
(a) We can bring the generator matrix G into standard form as follows:
G=
1 0
1
I
0
o
o
1
1
(Do you see why it is not possible to obtain standard form without using R1?) Hence, A = [1 0], so
P = [A | I] = [1 0 1]
is an associated parity check matrix, by Theorem 3.34.
(b) We use elementary row operations to bring P into standard form, keeping in mind that we want to create an identity matrix on the right, not on the left as in Gauss-Jordan elimination. We compute P =
[~
Thus, A
o o
1
1
1
1
1
o o
o
0
1
P'
= [:
r.
-i
In part (a), it is instructive to verify that G and G′ generate equivalent, but not identical, codes. Check that this is so by computing {Gx : x in Z₂²} and {G′x : x in Z₂²}. We now turn our attention to the main topic of this section, the notion of a dual code.
Definition   Let C be a set of code vectors in Z₂ⁿ. The orthogonal complement of C is called the dual code of C and is denoted C⊥. That is,

C⊥ = {x in Z₂ⁿ : c · x = 0 for all c in C}

The dot product in Z₂ⁿ behaves like the dot product in Rⁿ, with one important exception: Property (d) of Theorem 1.2 is no longer true. In other words, in Z₂ⁿ, a nonzero vector can be orthogonal to itself! As an example, take x = [1; 1] in Z₂². Then x · x = 1 + 1 = 0.

Example 5.22
Find the dual code of the code C in Example 5.21(b).
Solution   The code C is C = {Gx : x in Z₂²}
~ H~H~H~].c[: ]} ~
0 0 0
0
0
0
I
•
I
0
I
I
0
I
I
[Alternatively, C = {c in Z₂⁴ : Pc = 0} = null(P). Check that this really does give the same code.]
To find C⊥, we need those vectors in Z₂⁴ that are orthogonal to all four vectors in C. Since there are only 16 vectors altogether in Z₂⁴, we could proceed by trial and error, but here is a better method. Let y = [y₁ y₂ y₃ y₄]ᵀ be in C⊥. Since y · c = 0 for each c in C, we have four equations, one of which we can ignore, since it just says 0 = 0. The other three are:

y₂ + y₃ = 0
y₁ + y₃ + y₄ = 0
y₁ + y₂ + y₄ = 0

Solving this system, we obtain
o
I
I
0 0
1 0
1
1 0
1 0
1 0
I
(Check this.) It follows that y₁ = y₃ + y₄ and y₂ = y₃, so

C⊥ = { [y₃ + y₄; y₃; y₃; y₄] : y₃, y₄ in Z₂ }
   = { [0;0;0;0], [1;1;1;0], [1;0;0;1], [0;1;1;1] }
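For a computational cross-check, here is a small brute-force sketch in Python (an illustration only; the list C below is the code of this example as reconstructed above, so treat the specific codewords as an assumption). It finds every vector of Z₂⁴ orthogonal to all codewords.

from itertools import product

def dual_code(C, n):
    # all length-n binary vectors whose dot product with every codeword is 0 mod 2
    return [y for y in product((0, 1), repeat=n)
            if all(sum(yi * ci for yi, ci in zip(y, c)) % 2 == 0 for c in C)]

C = [(0, 0, 0, 0), (1, 0, 1, 1), (0, 1, 1, 0), (1, 1, 0, 1)]
print(dual_code(C, 4))   # the four vectors of the dual code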
We now examine the relationship between the generator and parity check matrices of a code and its dual.

Theorem 5.21
If C is an (n, k) binary code with generator matrix G and parity check matrix P, then C⊥ is an (n, n − k) binary code such that
a. Gᵀ is a parity check matrix for C⊥.
b. Pᵀ is a generator matrix for C⊥.

Proof   By definition, G is an n×k matrix with linearly independent columns, P is an (n − k)×n matrix with linearly independent rows, and PG = O. Therefore, the rows of Gᵀ are linearly independent, the columns of Pᵀ are linearly independent, and

GᵀPᵀ = (PG)ᵀ = Oᵀ = O

This shows that Gᵀ is a parity check matrix for C⊥ and Pᵀ is a generator matrix for C⊥. Since Pᵀ is n×(n − k), C⊥ is an (n, n − k) code.
Example 5.23
Find generator and parity check matrices for the dual code C⊥ from Example 5.22.

Solution   There are two ways to proceed. We will illustrate both approaches.
Method 1: According to Theorem 5.21(b), a generator matrix G⊥ for C⊥ is given by
This matrix
IS
o
0
I
I
in standard form wit h A =
I
o
o o
I
I
I
I
J
[~
: SO a parity check matrix for C.i
IS
0] o
I
J
I
I
by Theorem 3.34 .
I
Method 2: Using Theorem 5.2 1(a) and referring to Example 5.21(b), we obtai n a parity check mat rix pi- for C- as foll ows:
This matrix is not illo
p J. =
JIl
l
OT
o
I
I
J
J
I
J
0
~]
standard form, so we use element
[~
o
I
J
I
~]
I I
0] : o I
J
[A'• I] : Pi
Now we can use Theorem 3.34 to obtain a generator matrix G.I. for C.I. :
,
o , o ,
G~ =[l-] =
Example 5.24
Let C be the code with generator matrix

G = [ 1  0
      0  1
      1  0
      0  1 ]

List the vectors in C and C⊥.

Solution   The code C is

C = {Gx : x in Z₂²} = { [0;0;0;0], [1;0;1;0], [0;1;0;1], [1;1;1;1] }

(Note that C is a double repetition code: it encodes vectors from Z₂² as vectors in Z₂⁴ by writing the entries twice.) By Theorem 3.34, a parity check matrix for C has the form P = [A | I] with A = I, that is,

P = [ 1  0  1  0
      0  1  0  1 ]

so, by Theorem 5.21(b), a generator matrix for C⊥ is

G⊥ = Pᵀ = [ 1  0
            0  1
            1  0
            0  1 ] = G

Hence, C⊥ has the same generator matrix as C, so C⊥ = C!
A code C with the property that C⊥ = C is called self-dual. We can check that the code in Example 5.24 is self-dual by showing that every vector in C is orthogonal to all the vectors in C, including itself. (Do this.) You may have noticed that in the self-dual code in Example 5.24, every vector in C has an even number of 1s. We will prove that this is true for every self-dual code. The following definition is useful.
F. Jessie MacWilliams (1917-1990) was one of the pioneers of coding theory. She received her B.A. and M.A. from Cambridge University in 1938-39, following which she studied in the United States at Johns Hopkins University and Harvard University. After marrying and raising a family, MacWilliams took a job as a computer programmer at Bell Laboratories in Murray Hill, New Jersey, in 1958, where she became interested in coding theory. In 1961, she returned to Harvard for a year and obtained a Ph.D. Her thesis contains one of the most powerful theorems in coding theory. Now known as the MacWilliams Identities, this theorem relates the weight distribution (the number of codewords of each possible weight) of a linear code to the weight distribution of its dual code. The MacWilliams Identities are widely used by coding theorists, both to obtain new theoretical information about error-correcting codes and to determine the weight distributions of specific codes. MacWilliams is perhaps best known for her book The Theory of Error-Correcting Codes (1977), written with N. J. A. Sloane of Bell Labs. This book is often referred to as the "bible of coding theory." In 1980, MacWilliams gave the inaugural Emmy Noether Lecture of the Association for Women in Mathematics.
Definition   Let x be a vector in Z₂ⁿ. The weight of x, denoted w(x), is the number of 1s in x.

For example, w([1 1 0 1 0 0 1]ᵀ) = 4. If we temporarily think of x as a vector in Rⁿ, then we can give the following alternative descriptions of w(x). Let 1 denote the vector (of the same length as x) all of whose entries are 1. Then w(x) = x · 1 and w(x) = x · x. We can now prove the following interesting facts about self-dual codes.
Theorem 5.22
If C is a self-dual code, then
a. Every vector in C has even weight.
b. 1 is in C.

Proof   (a) A vector x in Z₂ⁿ has even weight if and only if w(x) = 0 in Z₂. But, computing in Z₂,

w(x) = x · x = 0

since C is self-dual (every vector in C is orthogonal to itself).
(b) Using property (a), we have 1 · x = w(x) = 0 in Z₂ for all x in C. This means that 1 is orthogonal to every vector in C, so 1 is in C⊥ = C, as required.
Quadratic Forms
An expression of the form

ax² + by² + cxy

is called a quadratic form in x and y. Similarly,

ax² + by² + cz² + dxy + exz + fyz

is a quadratic form in x, y, and z. In words, a quadratic form is a sum of terms, each of which has total degree two in the variables. Therefore, 5x² − 3y² + 2xy is a quadratic form, but x² + y² + x is not.
We can represent quadratic forms using matrices as follows:

ax² + by² + cxy = [x  y] [ a    c/2 ] [x
                          [ c/2  b   ]  y]

and

ax² + by² + cz² + dxy + exz + fyz = [x  y  z] [ a    d/2  e/2 ] [x
                                               [ d/2  b    f/2 ]  y
                                               [ e/2  f/2  c   ]  z]

(Verify these.) Each has the form xᵀAx, where the matrix A is symmetric. This observation leads us to the following general definition.
Definition   A quadratic form in n variables is a function f : Rⁿ → R of the form

f(x) = xᵀAx

where A is a symmetric n×n matrix and x is in Rⁿ. We refer to A as the matrix associated with f.
Example 5.25
What is the quadratic form with associated matrix A = [ 2  −3 ; −3  5 ]?

Solution   If x = [x₁; x₂], then

f(x) = xᵀAx = [x₁  x₂] [ 2  −3 ; −3  5 ] [x₁; x₂] = 2x₁² + 5x₂² − 6x₁x₂
Observe that the off-diagonal entries a₁₂ = a₂₁ = −3 of A are combined to give the coefficient −6 of x₁x₂. This is true generally. We can expand a quadratic form in n variables xᵀAx as follows:

xᵀAx = a₁₁x₁² + a₂₂x₂² + ··· + aₙₙxₙ² + Σ_{i<j} 2a_{ij} x_i x_j

Thus, if i ≠ j, the coefficient of x_i x_j is 2a_{ij}.
Example 5.26
Find the matrix associated with the quadratic form

f(x₁, x₂, x₃) = 2x₁² − x₂² + 5x₃² + 6x₁x₂ − 3x₁x₃

Solution   The coefficients of the squared terms x_i² go on the diagonal as a_{ii}, and the coefficients of the cross-product terms x_i x_j are split between a_{ij} and a_{ji}. This gives

A = [  2    3   −3/2
       3   −1    0
     −3/2   0    5  ]
so that

f(x₁, x₂, x₃) = [x₁  x₂  x₃] [  2    3   −3/2
                               3   −1    0
                             −3/2   0    5  ] [x₁; x₂; x₃]

as you can easily check.
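As a quick numerical sanity check (an illustration, not part of the text), the sketch below assembles the matrix just found and confirms that xᵀAx reproduces the quadratic form at a sample point; the point (1, −2, 3) is an arbitrary choice.

import numpy as np

A = np.array([[ 2.0,  3.0, -1.5],
              [ 3.0, -1.0,  0.0],
              [-1.5,  0.0,  5.0]])

def f(x1, x2, x3):
    return 2*x1**2 - x2**2 + 5*x3**2 + 6*x1*x2 - 3*x1*x3

x = np.array([1.0, -2.0, 3.0])
print(f(*x), x @ A @ x)    # both print 22.0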
In the case of a quadratic form f(x, y) in two variables, the graph of z = f(x, y) is a surface in R³. Some examples are shown in Figure 5.12. Observe that the effect of holding x or y constant is to take a cross section of the graph parallel to the yz or xz planes, respectively. For the graphs in Figure 5.12, all of these cross sections are easy to identify. For example, in Figure 5.12(a), the cross sections we get by holding x or y constant are all parabolas opening upward, so f(x, y) ≥ 0 for all values of x and y. In Figure 5.12(c), holding x constant gives parabolas opening downward and holding y constant gives parabolas opening upward, producing a saddle point.

[Figure 5.12: Graphs of quadratic forms f(x, y); panel (c) shows z = 2x² − 3y², which has a saddle point.]
What makes this type of analysis quite easy is the fact that these quadratic forms have no cross-product terms. The matrix associated with such a quadratic form is a diagonal matrix; for example, the form 2x² − 3y² has associated matrix [ 2  0 ; 0  −3 ].
In general, the matrix of a quadratic form is a symmetric matrix, and we saw in Section 5.4 that such matrices can always be diagonalized. We will now use this fact to show that, for every quadratic form, we can eliminate the cross-product terms by means of a suitable change of variable.
Let f(x) = xᵀAx be a quadratic form in n variables, with A a symmetric n×n matrix. By the Spectral Theorem, there is an orthogonal matrix Q that diagonalizes A; that is, QᵀAQ = D, where D is a diagonal matrix displaying the eigenvalues of A. We now set

x = Qy

Substitution into the quadratic form yields

xᵀAx = (Qy)ᵀA(Qy) = yᵀQᵀAQy = yᵀDy

which is a quadratic form without cross-product terms, since D is diagonal. Furthermore, if the eigenvalues of A are λ₁, ..., λₙ, then Q can be chosen so that D = diag(λ₁, ..., λₙ). If y = [y₁ ··· yₙ]ᵀ, then, with respect to these new variables, the quadratic form becomes

yᵀDy = λ₁y₁² + ··· + λₙyₙ²

This process is called diagonalizing a quadratic form. We have just proved the following theorem, known as the Principal Axes Theorem. (The reason for this name will become clear in the next subsection.)

Theorem 5.23   The Principal Axes Theorem
Every quadratic form can be diagonalized. Specifically, if A is the n×n symmetric matrix associated with the quadratic form xᵀAx and if Q is an orthogonal matrix such that QᵀAQ = D is a diagonal matrix, then the change of variable x = Qy transforms the quadratic form xᵀAx into the quadratic form yᵀDy, which has no cross-product terms. If the eigenvalues of A are λ₁, ..., λₙ and y = [y₁ ··· yₙ]ᵀ, then

xᵀAx = yᵀDy = λ₁y₁² + ··· + λₙyₙ²
Example 5.27
Find a change of variable that transforms the quadratic form

f(x₁, x₂) = 5x₁² + 4x₁x₂ + 2x₂²

into one with no cross-product terms.

Solution   The matrix of f is

A = [ 5  2
      2  2 ]

with eigenvalues λ₁ = 6 and λ₂ = 1. Corresponding unit eigenvectors are

q₁ = [2/√5; 1/√5]   and   q₂ = [1/√5; −2/√5]

(Check this.) If we set

Q = [ 2/√5   1/√5
      1/√5  −2/√5 ]   and   D = [ 6  0
                                  0  1 ]

then QᵀAQ = D. The change of variable x = Qy, where x = [x₁; x₂] and y = [y₁; y₂], converts f into

f(y) = f(y₁, y₂) = [y₁  y₂] [ 6  0 ; 0  1 ] [y₁; y₂] = 6y₁² + y₂²

The original quadratic form xᵀAx and the new one yᵀDy (referred to in the Principal Axes Theorem) are equal in the following sense. In Example 5.27, suppose we want to evaluate f(x) = xᵀAx at x = [−1; 3]. We have

f(−1, 3) = 5(−1)² + 4(−1)(3) + 2(3)² = 11

In terms of the new variables,

y = [y₁; y₂] = Qᵀx = [ 2/√5   1/√5 ; 1/√5  −2/√5 ] [−1; 3] = [1/√5; −7/√5]

so

f(y₁, y₂) = 6y₁² + y₂² = 6(1/√5)² + (−7/√5)² = 55/5 = 11
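The same diagonalization can be carried out numerically. The sketch below (an illustration only) uses numpy.linalg.eigh, which returns an orthogonal matrix of eigenvectors, so the substitution y = Qᵀx removes the cross-product term; the sample point (−1, 3) matches the hand computation above.

import numpy as np

A = np.array([[5.0, 2.0],
              [2.0, 2.0]])
eigenvalues, Q = np.linalg.eigh(A)     # eigenvalues come back in ascending order: [1, 6]
x = np.array([-1.0, 3.0])
y = Q.T @ x
print(x @ A @ x)                                              # 11.0, with the cross term
print(sum(lam * yi**2 for lam, yi in zip(eigenvalues, y)))    # 11.0, no cross term needed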
exactly as before.
The Principal Axes Theorem has some interesting and important consequences. We will consider two of these. The first relates to the possible values that a quadratic form can take on.

Definition   A quadratic form f(x) = xᵀAx is classified as one of the following:
1. positive definite if f(x) > 0 for all x ≠ 0.
2. positive semidefinite if f(x) ≥ 0 for all x.
3. negative definite if f(x) < 0 for all x ≠ 0.
4. negative semidefinite if f(x) ≤ 0 for all x.
5. indefinite if f(x) takes on both positive and negative values.
A symmetric matrix A is called positive definite, positive semidefinite, negative definite, negative semidefinite, or indefinite if the associated quadratic form f(x) = xᵀAx has the corresponding property.

The quadratic forms in parts (a), (b), (c), and (d) of Figure 5.12 are positive definite, negative definite, indefinite, and positive semidefinite, respectively. The Principal Axes Theorem makes it easy to tell if a quadratic form has one of these properties.

Theorem 5.24
Let A be an n×n symmetric matrix. The quadratic form f(x) = xᵀAx is
a. positive definite if and only if all of the eigenvalues of A are positive.
b. positive semidefinite if and only if all of the eigenvalues of A are nonnegative.
c. negative definite if and only if all of the eigenvalues of A are negative.
d. negative semidefinite if and only if all of the eigenvalues of A are nonpositive.
e. indefinite if and only if A has both positive and negative eigenvalues.

You are asked to prove Theorem 5.24 in Exercise 49.
Example 5.28
Classify f(x, y, z) = 3x² + 3y² + 3z² − 2xy − 2xz − 2yz as positive definite, negative definite, indefinite, or none of these.

Solution   The matrix associated with f is

[  3  −1  −1
  −1   3  −1
  −1  −1   3 ]

which has eigenvalues 1, 4, and 4. Since all of these eigenvalues are positive, f is a positive definite quadratic form.
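The eigenvalue test of Theorem 5.24 is easy to mechanize. Below is a hypothetical helper (an illustration; the function name and the small tolerance standing in for "zero" are my own choices) that classifies a symmetric matrix by the signs of its eigenvalues and reproduces the conclusion of this example.

import numpy as np

def classify(A, tol=1e-10):
    lams = np.linalg.eigvalsh(A)           # eigenvalues of a symmetric matrix
    if np.all(lams > tol):   return "positive definite"
    if np.all(lams < -tol):  return "negative definite"
    if np.all(lams >= -tol): return "positive semidefinite"
    if np.all(lams <= tol):  return "negative semidefinite"
    return "indefinite"

A = np.array([[ 3.0, -1.0, -1.0],
              [-1.0,  3.0, -1.0],
              [-1.0, -1.0,  3.0]])
print(np.round(np.linalg.eigvalsh(A), 6))   # approximately [1, 4, 4]
print(classify(A))                          # positive definite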
If a quadratic form f(x) = xᵀAx is positive definite, then, since f(0) = 0, the minimum value of f(x) is 0 and it occurs at the origin. Similarly, a negative definite quadratic form has a maximum at the origin. Thus, Theorem 5.24 allows us to solve certain types of maxima/minima problems easily, without resorting to calculus. A type of problem that falls into this category is the constrained optimization problem. It is often important to know the maximum or minimum values of a quadratic form subject to certain constraints. (Such problems arise not only in mathematics but also in statistics, physics, engineering, and economics.) We will be interested in finding the extreme values of f(x) = xᵀAx subject to the constraint that ||x|| = 1.
In the case of a quadratic form in two variables, we can visualize what the problem means. The graph of z = f(x, y) is a surface in R³, and the constraint ||x|| = 1 restricts the point (x, y) to the unit circle in the xy-plane. Thus, we are considering those points that lie simultaneously on the surface and on the unit cylinder perpendicular to the xy-plane. These points form a curve lying on the surface, and we want the highest and lowest points on this curve. Figure 5.13 shows this situation for the quadratic form and corresponding surface in Figure 5.12(c).
[Figure 5.13: The intersection of z = 2x² − 3y² with the cylinder x² + y² = 1.]
In this case, the maximum and minimum values of f(x, y) = 2x² − 3y² (the highest and lowest points on the curve of intersection) are 2 and −3, respectively, which are just the eigenvalues of the associated matrix. Theorem 5.25 shows that this is always the case.

Theorem 5.25
Let f(x) = xᵀAx be a quadratic form with associated n×n symmetric matrix A. Let the eigenvalues of A be λ₁ ≥ λ₂ ≥ ··· ≥ λₙ. Then the following are true, subject to the constraint ||x|| = 1:
a. λ₁ ≥ f(x) ≥ λₙ
b. The maximum value of f(x) is λ₁, and it occurs when x is a unit eigenvector corresponding to λ₁.
c. The minimum value of f(x) is λₙ, and it occurs when x is a unit eigenvector corresponding to λₙ.
Proof   As usual, we begin by orthogonally diagonalizing A. Accordingly, let Q be an orthogonal matrix such that QᵀAQ is the diagonal matrix D = diag(λ₁, ..., λₙ). Then, by the Principal Axes Theorem, the change of variable x = Qy gives xᵀAx = yᵀDy. Now note that y = Qᵀx implies that

yᵀy = (Qᵀx)ᵀ(Qᵀx) = xᵀ(Qᵀ)ᵀQᵀx = xᵀQQᵀx = xᵀx

since Qᵀ = Q⁻¹. Hence, using ||x||² = xᵀx, we see that ||y|| = ||x|| = 1. Thus, if x is a unit vector, so is the corresponding y, and the values of xᵀAx and yᵀDy are the same.
(a) To prove property (a), we observe that if y = [y₁ ··· yₙ]ᵀ, then

f(x) = xᵀAx = yᵀDy = λ₁y₁² + λ₂y₂² + ··· + λₙyₙ²
     ≤ λ₁y₁² + λ₁y₂² + ··· + λ₁yₙ²
     = λ₁(y₁² + y₂² + ··· + yₙ²)
     = λ₁||y||² = λ₁

Thus, f(x) ≤ λ₁ for all x such that ||x|| = 1. The proof that f(x) ≥ λₙ is similar. (See Exercise 59.)
(b) If q₁ is a unit eigenvector corresponding to λ₁, then Aq₁ = λ₁q₁ and

f(q₁) = q₁ᵀAq₁ = q₁ᵀλ₁q₁ = λ₁(q₁ᵀq₁) = λ₁

This shows that the quadratic form actually takes on the value λ₁, and so, by property (a), it is the maximum value of f(x) and it occurs when x = q₁.
(c) You are asked to prove this property in Exercise 60.
Example 5.29
Find the maximum and minimum values of the quadratic form f(x₁, x₂) = 5x₁² + 4x₁x₂ + 2x₂² subject to the constraint x₁² + x₂² = 1, and determine values of x₁ and x₂ for which each of these occurs.

Solution   In Example 5.27, we found that f has the associated eigenvalues λ₁ = 6 and λ₂ = 1, with corresponding unit eigenvectors

q₁ = [2/√5; 1/√5]   and   q₂ = [1/√5; −2/√5]

Therefore, the maximum value of f is 6, occurring when x₁ = 2/√5 and x₂ = 1/√5. The minimum value of f is 1, occurring when x₁ = 1/√5 and x₂ = −2/√5. (Observe that these extreme values each occur twice, in opposite directions, since −q₁ and −q₂ are also unit eigenvectors for λ₁ and λ₂, respectively.)
Graphing Quadratic Equations
The general form of a quadratic equation in two variables x and y is

ax² + by² + cxy + dx + ey + f = 0

where at least one of a, b, and c is nonzero. The graphs of such quadratic equations are called conic sections (or conics), since they can be obtained by taking cross sections of a (double) cone (i.e., slicing it with a plane). The most important of the conic sections are the ellipses (with circles as a special case), hyperbolas, and parabolas. These are called the nondegenerate conics. Figure 5.14 shows how they arise. It is also possible for a cross section of a cone to result in a single point, a straight line, or a pair of lines. These are called degenerate conics. (See Exercises 81-86.) The graph of a nondegenerate conic is said to be in standard position relative to the coordinate axes if its equation can be expressed in one of the forms in Figure 5.15.
[Figure 5.14: The nondegenerate conics, obtained as cross sections of a double cone: circle, ellipse, parabola, hyperbola.]

[Figure 5.15: Nondegenerate conics in standard position.
Ellipse or circle: x²/a² + y²/b² = 1 with a, b > 0 (a circle when a = b).
Hyperbola: x²/a² − y²/b² = 1 or y²/b² − x²/a² = 1 with a, b > 0.
Parabola: y = ax² or x = ay², a > 0.]
Example 5.30
If possible, write each of the following quadratic equations in the form of a conic in standard position and identify the resulting graph.
(a) 4x² + 9y² = 36   (b) 4x² − 9y² + 1 = 0   (c) 4x² − 9y = 0

Solution
(a) The equation 4x² + 9y² = 36 can be written in the form

x²/9 + y²/4 = 1

so its graph is an ellipse intersecting the x-axis at (±3, 0) and the y-axis at (0, ±2).
(b) The equation 4x² − 9y² + 1 = 0 can be written in the form

y²/(1/9) − x²/(1/4) = 1

so its graph is a hyperbola, opening up and down, intersecting the y-axis at (0, ±1/3).
(c) The equation 4x² − 9y = 0 can be written in the form

y = (4/9)x²

so its graph is a parabola opening upward.
If a quadratic equation contains too many terms to be written in one of the forms in Figure 5.15, then its graph is not in standard position. When there are additional terms but no xy term, the graph of the conic has been translated out of standard position.
Example 5.31
Identify and graph the conic whose equation is

x² + 2y² − 6x + 8y + 9 = 0

Solution   We begin by grouping the x and y terms separately to get

(x² − 6x) + (2y² + 8y) = −9
(x² − 6x) + 2(y² + 4y) = −9

Next, we complete the squares on the two expressions in parentheses to obtain

(x² − 6x + 9) + 2(y² + 4y + 4) = −9 + 9 + 8
(x − 3)² + 2(y + 2)² = 8

We now make the substitutions x′ = x − 3 and y′ = y + 2, turning the above equation into

(x′)² + 2(y′)² = 8
or
(x′)²/8 + (y′)²/4 = 1
This is the equation of an ellipse in standard position in the x′y′ coordinate system, intersecting the x′-axis at (±2√2, 0) and the y′-axis at (0, ±2). The origin in the x′y′ coordinate system is at x = 3, y = −2, so the ellipse has been translated out of standard position 3 units to the right and 2 units down. Its graph is shown in Figure 5.16.

[Figure 5.16: A translated ellipse.]
If a quadratic equation contains a cross-product term, then it represents a conic that has been rotated.

Example 5.32
Identify and graph the conic whose equation is

5x² + 4xy + 2y² = 6
Solution   The left-hand side of the equation is a quadratic form, so we can write it in matrix form as xᵀAx = 6, where

A = [ 5  2
      2  2 ]

In Example 5.27, we found that the eigenvalues of A are 6 and 1, and a matrix Q that orthogonally diagonalizes A is

Q = [ 2/√5   1/√5
      1/√5  −2/√5 ]

Observe that det Q = −1. In this example, we will interchange the columns of this matrix to make the determinant equal to +1. Then Q will be the matrix of a rotation, by Exercise 28 in Section 5.1. It is always possible to rearrange the columns of an orthogonal matrix Q to make its determinant equal to +1. (Why?) We set

Q = [  1/√5  2/√5
      −2/√5  1/√5 ]

instead, so that D = [ 1  0 ; 0  6 ]. The change of variable x = Qx′ converts the given equation into the form (x′)ᵀDx′ = 6 by means of a rotation. If x′ = [x′; y′], then this equation is just

(x′)²/6 + (y′)² = 1

which represents an ellipse in the x′y′ coordinate system.
To graph this ellipse, we need to know which vectors play the roles of e₁ = [1; 0] and e₂ = [0; 1] in the new coordinate system. (These two vectors locate the positions of the x′ and y′ axes.) But, from x = Qx′, we have

Qe₁ = [  1/√5  2/√5 ; −2/√5  1/√5 ] [1; 0] = [1/√5; −2/√5]
and
Qe₂ = [  1/√5  2/√5 ; −2/√5  1/√5 ] [0; 1] = [2/√5; 1/√5]

These are just the columns q₁ and q₂ of Q, which are the eigenvectors of A! The fact that these are orthonormal vectors agrees perfectly with the fact that the change of variable is just a rotation. The graph is shown in Figure 5.17.

[Figure 5.17: A rotated ellipse.]
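A short numerical companion to this example (an illustration only): eigh diagonalizes the coefficient matrix, and a column swap enforces det Q = +1 so that the change of variable is a rotation, exactly as done above.

import numpy as np

A = np.array([[5.0, 2.0],
              [2.0, 2.0]])
eigenvalues, Q = np.linalg.eigh(A)
if np.linalg.det(Q) < 0:                   # make Q a rotation, as in the example
    Q[:, [0, 1]] = Q[:, [1, 0]]
    eigenvalues = eigenvalues[[1, 0]]
print(eigenvalues)                         # rotated equation: lam1*(x')^2 + lam2*(y')^2 = 6
print(np.allclose(Q.T @ A @ Q, np.diag(eigenvalues)))   # True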
You can now see why the Principal Axes Theorem is so named. If a real symmetric matrix A arises as the coefficient matrix of a quadratic equation, the eigenvectors of A give the directions of the principal axes of the corresponding graph. It is possible for the graph of a conic to be both rotated and translated out of standard position, as illustrated in Example 5.33.

Example 5.33
Identify and graph the conic whose equation is

5x² + 4xy + 2y² − (28/√5)x − (4/√5)y + 4 = 0
Solution   The strategy is to eliminate the cross-product term first. In matrix form, the equation is xᵀAx + Bx + 4 = 0, where

A = [ 5  2
      2  2 ]   and   B = [−28/√5  −4/√5]

The cross-product term comes from the quadratic form xᵀAx, which we diagonalize as in Example 5.32 by setting x = Qx′, where

Q = [  1/√5  2/√5
      −2/√5  1/√5 ]

Then, as in Example 5.32,

xᵀAx = (x′)ᵀDx′ = (x′)² + 6(y′)²

But now we also have

Bx = BQx′ = [−28/√5  −4/√5] [  1/√5  2/√5 ; −2/√5  1/√5 ] [x′; y′] = −4x′ − 12y′

Thus, in terms of x′ and y′, the given equation becomes

(x′)² + 6(y′)² − 4x′ − 12y′ + 4 = 0

To bring the conic represented by this equation into standard position, we need to translate the x′y′ axes. We do so by completing the squares, as in Example 5.31. We have

((x′)² − 4x′ + 4) + 6((y′)² − 2y′ + 1) = −4 + 4 + 6 = 6
or
(x′ − 2)² + 6(y′ − 1)² = 6

This gives us the translation equations

x″ = x′ − 2   and   y″ = y′ − 1

In the x″y″ coordinate system, the equation is simply

(x″)² + 6(y″)² = 6

which is the equation of an ellipse (as in Example 5.32). We can sketch this ellipse by first rotating and then translating. The resulting graph is shown in Figure 5.18.

[Figure 5.18: A rotated and translated ellipse.]

The general form of a quadratic equation in three variables x, y, and z is

ax² + by² + cz² + dxy + exz + fyz + gx + hy + iz + j = 0

where at least one of a, b, ..., f is nonzero. The graph of such a quadratic equation is called a quadric surface (or quadric). Once again, to recognize a quadric we need to
"Y
y
r2 Hyperbolo1d of twO shee ts' ~
~,2
+ h!
Elliptic p:lr3boIOld:
,
y
figure 5.19 Quadric surfaces
-
Z2 ~
I
Section 5.5
Applications
425
put it into standard position. Some quadrics in standard position are shown in Figure 5.19; others are obtained by permuting the variables.

Example 5.34
Identify the quadric surface whose equation is

5x² + 11y² + 2z² + 16xy + 20xz − 4yz = 36
Solution   The equation can be written in matrix form as xᵀAx = 36, where

A = [  5   8  10
       8  11  −2
      10  −2   2 ]

We find the eigenvalues of A to be 18, 9, and −9, with corresponding orthogonal eigenvectors

[2; 2; 1],   [1; −2; 2],   and   [2; −1; −2]

respectively. We normalize them to obtain

q₁ = [2/3; 2/3; 1/3],   q₂ = [1/3; −2/3; 2/3],   and   q₃ = [2/3; −1/3; −2/3]

and form the orthogonal matrix

Q = [q₁ q₂ q₃] = [ 2/3   1/3   2/3
                   2/3  −2/3  −1/3
                   1/3   2/3  −2/3 ]

Note that in order for Q to be the matrix of a rotation, we require det Q = 1, which is true in this case. (Otherwise, det Q = −1, and swapping two columns changes the sign of the determinant.) Therefore,

QᵀAQ = D = diag(18, 9, −9)

and, with the change of variable x = Qx′, we get xᵀAx = (x′)ᵀDx′ = 36, so

18(x′)² + 9(y′)² − 9(z′)² = 36
or
(x′)²/2 + (y′)²/4 − (z′)²/4 = 1

From Figure 5.19, we recognize this equation as the equation of a hyperboloid of one sheet. The x′, y′, and z′ axes are in the directions of the eigenvectors q₁, q₂, and q₃, respectively. The graph is shown in Figure 5.20.
[Figure 5.20: A hyperboloid of one sheet in nonstandard position.]

We can also identify and graph quadrics that have been translated out of standard position using the complete-the-squares method of Examples 5.31 and 5.33. You will be asked to do so in the exercises.
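For Example 5.34, the eigenvalue computation itself is the only numerically delicate step, and it is easy to confirm with a few lines (an illustration only): two positive eigenvalues and one negative eigenvalue, with a positive constant on the right-hand side, signal a hyperboloid of one sheet.

import numpy as np

A = np.array([[ 5.0,  8.0, 10.0],
              [ 8.0, 11.0, -2.0],
              [10.0, -2.0,  2.0]])
print(np.round(np.linalg.eigvalsh(A), 6))   # approximately [-9, 9, 18]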
Exercises 5.5

Dual Codes
In Exercises 1-4, G is a generator matrix for a code C. Bring G into standard form and determine whether the corresponding code is equal to C.
0 0
0
0
I
I
I
I
0
0
0
I
I
I. G =
3. G =
I
0
I
I
I
0
2. G =
0
0
0
I
0
I
0
I
I
I
I
I
4. G =
I
I
I
I
I
0
0
0 0
I
I
I
01
6.P= [:
I
I
0
I
I
0
0
I
0
0
I
I
I
8.P =[~
I
0 0
7. p =
0
:1
0 0 0 • I 0 0
10. C =
I 0 I
I
In Exemses 9-12, find the dual code c.l of tile code C.
In Exercises 5-8, P is a parity check matrix for a code C. Brmg P into standard form and determine wlletller tile corresponding code is equal to C.
5. P =[I
0
I
:1
I 0 I 0 0 • I • 0 • I I 0 0 I
0
II. C =
0
0
I 0 0 • 0 •
I
0 0
0
• 0
0
I
I
0
Section 5.5
12. C =
I
0
-3
0
2
]
I
-3
I
I
I
0
3 -3
0
2
]
0
0
I
I
0
I
0
I
0 , I , 0 0 I 0 0
]
]
0
26. A
27. A =
III Exercises 13-16, eitlu"11 gcnerator matrix G or a parity
check matrix P IS givef/ for a code C Fm(1 a gCl/erator matrix
G.1 and a parity check
13. C =
I
]
]
]
I
0
] 0 14. G=
16. P =
1
0
I
I
o
I
]
0
]
o o
I
o
0
29.
x~
+
33. 5x,2
I
0 ] 0
o
,
x
~
2
0
3
I
, 2
- 3 I 3 2 2 0 2 0 I ,x = ]
y
- ]
I I
2x}
+ 6x rx l
30.
X1Xl
Y 32. xf - xl + 8x,X2 xi + 2xf + 2x,x1 - 4xrx3 + 4X~l 3/ + i - 4xz
-
34. 2X2 -
6X! Xl
]
]
The even parity code Eₙ is the subset of Z₂ⁿ consisting of all vectors with even weight. The n-times repetition code Repₙ is the subset of Z₂ⁿ consisting of just the two vectors 0 and 1 (all zeros and all 1s, respectively).
18. (a) Find generator and parity check matrices for E₃ and Rep₃.
(b) Show that E₃ and Rep₃ are dual to each other.
19. Show that Eₙ and Repₙ are dual to each other.
20. If C and D are codes and C ⊆ D, show that D⊥ ⊆ C⊥.
,x =
31. 3Xl - 3xy -
17. Find generator and panty check matrices for the dual o f the (7, 4) Hamm ing code in Example 3.70.
21. Show that if C IS a code wi th (C.l. ).l.:: C.
x
llI lixemses 29-34, find tile symmetric IIwtrix A associated with the given qrurdratic fortl l.
o ]
]
I
28. A =
lIIatrix p.l fo r the d//(// code ofC
o ]
1 5.P =[~
~
UJ
Applications
gene rator matrix, the n
22. Find a self dual code of length 6.
Quadratic F.,., In Exercises 23-28, evaluate the qrmdrMic form f( x) = x T Ax for the given A alUl x.
Dwgollaiize the qlladmtie fomls ill ExerCISes 35-40 by findillg all orthogollal matrrx Q slIch tllm the challge of variable x = Q y t,"lISforms tIle given form illto OtiC with /10 cross-prodllct terms. Give Q and the nelY qrmdmtic form.
+ 5X2 - 4X,X2 36. x-' + 8xy + f.l 37. 7xr2 + xi + xl + 8XrX2 + 8x,xJ - 16x2X3 38. xr1 + x; + 3xj - 4XrXl 39. X l + ~2 - 2xy + 2yz 40. 2xy + 2.1.'2 + 2)'2 35. 2x,"
Classify each of the quadratic forms in Exercises 41-48 as positive definite, positive semidefinite, negative definite, negative semidefinite, or indefinite.
+ xi - 2XrX2 43. -2x 2 - 2/ + 2xy 44. X2 + / + 4xy 45. 2xr2 + 2x; + 2x5 + 2x,x 1 + 2x i x J + 2X1Xl 46. x,! + xI + xi + 2x 1xJ 47. + xi - xj + 4xrX2
41. x~
+ 2xf
42. x f
xr
48.
_ Xl -
/
-
Z2 -
2xy - 2.1.'2- 2yz
49. Prove Theorem 5.24. 50. Let A
~ [~
!]
be a symmetric 2X2 matrix. Prove
thaI A is positive defi nite if and o nly if a > 0 and de t A > O. !Him: ru? + 2bxy + d/ = 24. A
~ [~
25. A ~ [3 -2
a(x+ ~yr + (d - ~ll )y2. J n
51. Let be an invertible matrix. Show that A = 8T B is positive defi nite.
UI
Chapter 5 Orthogonahty
52. let A bt a positive definite symmet ric matrix. Show that there exists an invertible matrix Bsuch thaI A = 8 TB. ( Hh1t: Use the Spectral Theorem to write A = QDCt. Then show that D can be fact ored as er e for some invertible matrix C) 53. let A and 8 be positive definit e symmetric n X n matri-
77. 3x1 - 4xy + 3/ - 28 V2x + 22 V2y + 84 - 0 78. 6x 1 - 4xy + 9yl - 20x - lOy - 5 = 0 79. 2xy +2V2x - 1 = 0
ces and let c be a posit ive scalar. Sho w that the foll owing matrices arc positive defi nite.
80. x 2
(a) cA (b ) Al (e) A + B (d ) A- I (First show that A is necessa rily invert ible.)
Sometimes the graph of a quadratic equation is a straight line, a pair of straight lines, or a single point. We refer to such a graph as a degenerate conic. It is also possible that the equation is not satisfied for any values of the variables, in which case there is no graph at all and we refer to the conic as an imaginary conic. In Exercises 81-86, identify the conic with the given equation as either degenerate or imaginary and, where possible, sketch the graph.
54. Let A be a positive definite sym metric matrix. Show that there is a posItive definite symmetric matnx 8 such that A = If. (Such a matrix B is called a $quare
root or A.)
In Exercises 55-58, find the maximum and minimum values of the quadratic form f(x) in the given exercise, subject to the constraint ||x|| = 1, and determine the values of x for which these occur.
55. Exercise 42
56. Exercise 44
57. Exercise 45
58. Exercise 46
59. Finish the proof of T heorem 5.25(a).
60. Prove Theorem 5.25(c).
Graphing Quadratic Equations
In Exercises 61-66, identify the graph of the given equation.
61. x² + 5y² = 25   62. x² − y² − 4 = 0   64. 2x² + y² − 8 = 0
63. r - y - l = 0
66.x = - 2/
In Exercises 67-72, use a translation of axes to put the conic in standard position. Identify the graph, give its equation in the translated coordinate system, and sketch the curve.
+ I - 4x - 4y + 4 = 0 68.4x 2 + 2/ - 8x + 12y + 6 = 0 69. 9~ - 4l - 4y = 37 70. x 2 + lOx - 3y = - 13 71. 2/ + 4x + 8y = 0 72. 2/ - 3Xl - 18x - lOy + II - 0 67.
III Exercises 77-80, idemify tlte conic lVitlt tlte given equatlOIl aud gu-e its equation i" standard form.
Xl
In Exercises 73- 76, use a rotation ofa xes to put tlte conic in standard position. Identify the gmp/l, give its equation in the rotated coordinate system, and sketch the Cllrvt. 73. x 2 + xy +
75. 4x 2
l
74. 4Xl + lOxy + 4/ = 9 = 5 76. 3x1 - 2.xy + 3yl = 8
= 6
+ 6xy - 4l
2xy
-
+ / + 4V2x - 4 = 0
82 . x'+ 2y'+ 2 ~ 0
81. x' - y'= 0
+l =o 84. Xl + 2).)1+ l ::: 0 85. x 2 - 2xy + l + 2v'2x - 2V2y = 0 86. 2xl + 2xy + 2l + 2V2x - 2V2y + 6 = 0 83.3x1
87. lei A be a symmetric 2 X2 matrix and let kbe a scalar. Prove that the graph of the quadratic equatio n xTAx = kis (a) a hyperbola if k 'I< 0 and det A < 0 (b) an ellipse, circle, or imaginary conic if k 'I< 0 and det A > O (c) a pair of straight lines o r an imaginary conic if k 'l
tioll alltl give its equation hI statld{lrtl form.
88. 4r + 89. x~
41
+ 4Z1 + 4xy + 4.xz + 4yz = 8
+ l + Z1 - 4yz
90. - x 2 91. 2xy
-
/
+z=
-
Z2
== 1
+ 4xy + 4xz + 4yz =
12
0
92. 16x1 + IOOy2 + 9z 2 - 24x.z - 60x - 80 z = 0 93. x 2
+
r - 2r
+ 4xy - 2.xz + 2yz - x + Y + z = 0 94. IOXl + 25/ + JOZl - 40xz + 20Vzx + SOy + 20 v'iz = 15 95. ll ~ + III + 14z2 + 2.xy + 8xz - 8yz - 12x 12y + I2z = 6
+
96. Let A be a real 2×2 matrix with complex eigenvalues λ = a ± bi such that b ≠ 0 and |λ| = 1. Prove that every trajectory of the dynamical system x_{k+1} = Ax_k lies on an ellipse. [Hint: Theorem 4.43 shows that if v is an eigenvector corresponding to λ = a − bi, then the matrix P = [Re v  Im v] is invertible and

A = P [ a  −b
        b   a ] P⁻¹

Set B = (PPᵀ)⁻¹. Show that the quadratic xᵀBx = k defines an ellipse for all k > 0, and prove that if x lies on this ellipse, so does Ax.]
Chapter Review

Key Definitions and Concepts
fundamental subspaces of a matrix, 377
Gram-Schmidt Process, 386
orthogonal basis, 367
orthogonal complement of a subspace, 375
Orthogonal Decomposition Theorem, 381
orthogonal matrix, 371
orthogonal projection, 379
orthogonal set of vectors, 366
orthogonally diagonalizable matrix, 397
orthonormal basis, 369
orthonormal set of vectors, 369
properties of orthogonal matrices, 373
QR factorization, 390
Rank Theorem, 383
spectral decomposition, 402
Spectral Theorem, 400
Review Questions
1. Mark each of the following statements true or false:
(a) Every orthonormal set of vectors is linearly independent.
(b) Every nonzero subspace of Rⁿ has an orthogonal basis.
(c) If A is a square matrix with orthonormal rows, then A is an orthogonal matrix.
(d) Every orthogonal matrix is invertible.
(e) If A is a matrix with det A = 1, then A is an orthogonal matrix.
(f) If A is an m×n matrix such that (row(A))⊥ = Rⁿ, then A must be the zero matrix.
(g) If W is a subspace of Rⁿ and v is a vector in Rⁿ such that proj_W(v) = 0, then v must be the zero vector.
(h) If A is a symmetric, orthogonal matrix, then A² = I.
(i) Every orthogonally diagonalizable matrix is invertible.
(j) Given any n real numbers λ₁, ..., λₙ, there exists a symmetric n×n matrix with λ₁, ..., λₙ as its eigenvalues.
2. Find all values of (/ and b such that I
2. 3
4
•
I , b - 2 3
is an o rthogonal set of vectors.
3. Find the coordmate vector [V]6 of v =
- 3 with 2
respect to the orthogonal basis I
I
0,
I ,
I
- I
- I
2
of R'.
1
4. The coordina te vector of a veClOr v with respect to an
orthonorm:llbasis 6 "" {v l, vl}of R l is [V]6 = If VI =
J/5] [
[ l/~] '
4/ 5, find allposslblevectorsv.
6! 7 2! 7 3! 7 5. Show that - I!V, o 2! V, 4! 7Vs - 15!7V, 2! 7V,
•
IS an
o rthogonal matTix. 6. If
[ 1~2
:] is an o rthogonal matrix, fi nd all possible
values of a, h, and c. 7. If Qis an orthogonal " X II matrix and {VI> ...• v(} is an orthonormal sct In Rn, prove that {Q v l , . . . , QVt} is an o rthonormal sct.
431
Chapter 5 Orthogonali ty
8. If Q IS an " X " mat rix such that the angles L (Qx, Qy) and L (x , y ) a re eq ual for all vectors x and y in lQ:", prove that Q IS an orthogonal matrix .
(b) Use the result of part (a) to find a OR factorization
o f A ""
In QlleJtlO1U 9-12. find Il basis lor IV J.. 9. W is the line 111 H2 with general equation 2x - Sy = O. X=I
11. W "" span
\
0
- \ ,
\
,
\
vectors
A=
3
2
\
-2
\
4
8
9
- 5
6
- \
7
2 3
IS. Let A =
- \
: XI
El
I
- 1
I - I
2 I
I 2
-=
span
20. If {V I' V 2• 0
,
\
=
\
\
\
0
\
\
\
\ \
, Xl
=
\
, X)
0
to fi nd a n o rthogonal basis for
=
\ \
W
.•
\
,
\
\
\
,E _l = span
- \
o
v,,} is an o rthonormal basis for R d a nd
prove that A IS a sym m e tric m a trix wi th eigenvalues cl' ':!' . .. • c,. and corresponding eigenvectors VI' v 1• • . , v .....
- 2
15. (a) Apply the G ram · Schmidt Process to
XI
of W.
\
- \
\
= 0
3
\
,
\
o
with rcsp«t to
0
+ Xi + X, + ~
2
2
\
R~ that contains the
\
\
0 - \
\
\
19. Find asymmetric ma trix wit h eigenvalues Al = Az = I, AJ = - 2 and eigenspaces
\
W = span
\ 0
(a) O rthogonally diagonahze A. (b) Give the spectral decomposition of A.
14. Find the orthogonal decompositio n of
v =
\
17. Find an ort hogo nal basis for the subspace
\
- \
\
2
2 - 2
- \
\
"d
2
13. Find bases for each o f the four fundame ntal subspaces of \
\
\
x,
- \
\
o
2
\
\
0
\
\
0
\
x, x, x,
\ \
- I
-3
4
12. W = span
=
\
16. Fi nd an orthogonal basis for
10. Wis the line in W wi th parametric equations y = 21. Z
\
= span{xl'
X l ' Xl }'
ctor
Algebra is generous; she often gives more than is asked of her. - Jean le Rond d'Alembert
6.0 Introduction: Fibonacci in (Veclor) Space The Fibonacci sequence was introduced in Section 4.6. It is the sequence
( 17 17- 1783)
In Carl B. Boyer A Hisrory of MII/h emafies \'Viley, 1968, p. 481
0, I, 1,2,3,5,8, 13, ...
of no nnegative integers with the property that after the fi rst IwO terms, each term is the sum of the two terms preceding it. Thus 0 + 1 = 1, 1 + 1 = 2, J + 2 "" 3, 2 + 3 = 5,and soon. If we denote the terms of the Fibonacci sequence by ~, h., ... , then the entire sequence is completely determined by specifyi ng that
to,
fo = Q,!;
= I
and
in=
In- I
+ i"-2 fo r II 2: 2
By analogy with vector notation, let's write a sequence .\("
XI'
X:!' x3• '
••
as
x = [Xo,XI' ~,x3" " )
The Fibonacci sequence then becomes f = [Io,!"!,,!,,. .. ) = [0, I, 1, 2,. .. )
We now general ize this notion.
Definition
A Fibonacci-type sequence is any sequence x = (Xu, xI' X 2, Xl" such that Xu and X I are real numbers and xn = xn _ 1 + xn_2 for n > 2. For example, [ I, sequence.
Vi. I + V2. 1 + 2 V2. 2 + 3 v'2.... )
••
is a Fibonacci-type
Problell1 Write down the first five terms of three more Fibonacci-t ype sequences. By analogy with vecto rs agai n. let's defi ne the $11111 of two seq uences x = [At). xI' X l > . . . ) and y = [Yo. Y» Y2' .. . ) to be the sequence
x + Y = [41
+ Yo,xl + YI'X2 + Yl.·· ·)
If c is a scalar, we can likewise define the scalar multiple of a sequence by
•
431
taZ
Chapter 6
Vector Spaces
'r,lIle.2 (a) Using your examples from Problem 1 or other examples, compute thf! sums of various pairs of Fibonacci-type sequences. Do the resulting sequences appear to be Fibonacci-type? (b ) Com pute va rious scalar multiples of your Fibonacci-type sequences from Problem I . Do t he resulting sequences appear to be Fibonacci-type? Probl •• 3 (a) Prove that if x and y arc Fibonacci-type sequences, then so is x + y. (b ) Prove that if x is a Fibonacci-type sequence and c is a scalar, then ex is also a
Fibonacci-type sequence. Let's denote the set of all Fibonacci-type sequences by Fib. Problem 3 shows Ihat, like R~, Fib is closed under addition and scalar multiplication. The next exercises show that Fib has much more in common with R". 'robl •• 4 Review the algebraic properties of vectors in Theorem 1. 1. Does Pib satisfy all of these properties? What Fibonacci-type sequence plays the role of O? For a Fibonacci-type sequence x, what is - x? Is - x also a Fibonacci-type sequence? 'robl •• 5 In An, we have the standard basis vecto rs el' e1•... • eft' The Fibonacci sequence f = [0. [, I, 2, . .. ) can be thought of as the analogue of e~ because its fi rst two terms arc 0 and l. Whal sequence e in Fib plays the role of c l? What about el , e~ •. .. ? Do these vectors have analogues in Fib? 'rolll•• 6 Let x = [;.;;., xl'~ ' ... ) be a Fibonacci- type sequence. Show that x is a linear combination of e and f. J Show that e and f arc linearly independent. (That is, show that if ce + df = O,then c '" (1 = 0. ) Problel! 8 Given your answers to Problems 6 and 7, what would be a sensible value to assign to the "'dimension" of Fib? Why? ProbleUl 9 Are there any geometric sequences in Fib? That is. if
'r.III,.
Il,r, r1, rJ , ••
.)
is a Fibonacci-type sequence, what arc the possible values of ~ 'r,bl •• 11 Find a "baSIS" for Fib consisting of geometric Fibonacci-type sequences. ",lIle. 11 Using your answer to Problem 10, give an alternative derivation of Biflet's fomlllfil Iformula ( 5) in Section 4.6 1:
I (I + v'S)" _ I (I - v'S )" "v'S 2 v'S 2
f, _
for the terms orthe Fibonacci sequence f = the basiS from Problem to.) The Lucas sequence is named after Edouard lucas (see pagf! 333).
1fo.J; ./,., . . . ). ( Hint: Express f in terms of
The Luctu seq uence is the Fibonacci-type sequence 1 =[ ~,11'12 ,13 , ·· · ) - [ 2, 1 ,3,4,
. .. )
Problea 12 Use the basis from Problem 10 to find an analogue of Binet's formula fo r the nth term f" of the Lucas seq uenCl~~. Proble. 13 Prove that the Fibonacci and Lu cas sequen c~ are related by the identity
{,. -I + f~" 1 = I~ [H im; The fibona cci-type sequences r-
for tl
2:.
1
"" 11, I, 2, 3, ... ) and f'"
= [ I, 0, I, I, ... )
fo rm a basis for Fib. (Why?)] In this Introduction, we have seen that the collection Fib of all Fibonacci -type sequences ~ha ves in many respects like H2. even though the "vectors" are actually infinite sequencrs. This useful analogy leads to the general notion of a vector space that is the subject of this chapter.
5«-tion 6. 1 Vector Spaces and Subspaces
ua
Vector Spaces and Subspaces In Chapters 1 and 3, we saw that lhe algebra of vectors and the algebra of matrices are similar in many respects. In particular, we can add both vC(:tors and matrices, and we can multiply both by scalars. The properties that result from these two operations (Theorem 1.1 and Theorem 3.2) are identICa l in bot h settings. In th IS section, we usc these properties to define generalized "vectors" tha t arise in a wide variety of exam ples. By proving general theorems about these "vectors," we will therefore sim ultaneo usly be provlllg results about all of these examples. ThiS is lhe real po\Y"er of algebra: its ability to take properties from a concrete setting, like RM, and (lbstmct them into a general setting.
,, Let Vbe a set 011 wh ich two operations, called ndditiol1 and 5calar ; have been defi ned. If u and v arc in V, the 511m of u and v is denoted by u + v, and if c is a scalar, the scalar multiplc of u by c is denoted by cu. If the following axioms hold for all u, v, and w in Vand for aU scaJars cand d. then V is called a vector space and its elements are called vectOni. The German mathematiCIan Hermann Grassmann ( 18091877) is generally credited with first Introducing the idea of a vector space (although he did not can it that) in 1844 Unfortu · nately, his work was very difficult to read and did not receive the attention it deserved. One person who did study it was the Italian mathematician Giuseppe !'eano ( [8 58~ 1932). In his 1888 book C(llcolo GeomctncQ, Peano clarified Grassmann's e;lrlier work and laid down the axioms for a vector space as we know them today. Pea no's book is also remarkable for introducing operations on sets. His notations U, n , and E (for "union," "inler· section,Mand "is an dement of") are the ones we still use, although they were nOI immcdlaldy accepted by other mathematici(1I1S. Peano's axiomatic defini· tion of a vector space 111so had vcry little mfluence for many years. Acceplance came in 1918, after Hermann Weyl ( 18851955) repeated it 111 his book Space, Time, Mmler, 1111 introduction to Einstcl11's general theory of relativity.
l. u + v lsinV. 2. u + v = v + u
under addition Commutativity 3. ( u + v) + w = u + (v + w ) M!,ociati\-il\' 4. There ex ists an element 0 in v, called a %ero vector, such that u + 0 = u. 5. Fo r each u in V, there is an clement - u in V such that u + (- u ) == o. 6. culs inV. Clo~urc under !oCalar muJtipJi";,1lion 7. c( u + v) = co + CV Diwibutivity 8. (c + d) u = ru + du D i~tributivi t y 9. c(tlu ) = (cd )u 10.1u = u Ch)~ure
Re • • ," • By "scalars" we will usually mea n the real numbers. Accordingly, we should refer to Vas a rC(l1 vector space (or a vector space over tile Tenlmlmbers) . It IS also possible fo r scalars to be complex numbers o r to belong to Zp' where p is prime. In these Cllses, V is called a complex vector SplICe o r a vector space over Zp' respectively. Most of our examples will be real vector spaces, so we will usually o mit the adjective " real." If something is referred to as a "vector space," assume that we arc working over the real number system. In fact, the scalars can be chosen from any num ber system in which, roughly speakmg, we can add, subtract, multiply, and divide according to the usual laws of arit hmetic. In abstract algebra, such a number system is called a field. • The definition of a vector space does not specity what the set V consists of. Neither docs it specify what the operations called "addition" and "scalar multiplication" look like. Often, they will be fam ilia r, but they necd not he. Sec Example 6 below and ExerCises 5-7,
\Ve will now look at several examples of vector spaces. In each case, we need to specify the set Vand the operations of addition and· scalar multiphcation and to verify axioms 1 th rough 10. We need to pay particular attention to axioms 1 and 6
434
Chapler 6
Veclor Spaces
(closu re), axiom 4 (the existence o f a zero vector V must have a negative in V).
In
V), and axiom 5 (each vector in
Ixample 6.1
For any 1/ i2: 1, IRn is a vector space with the us ual op erations of addition and scalar m ultiplication. Axio ms I and 6 follow from the defi n itions o f these operations. and the remaining axioms foll ow from Theorem 1.1.
lKample 6.2
The set of all 2X3 matrices is a vecto r space with the usual operations of matrix addition and m atrix scalar multiplication. Here the "vectors" are actually matrices. We know that the sum of 1\\10 2X3 matrices is also a 2X3 matrix and that multiplying a 2X3 matrix by a scalar gives anothe r 2X3 mat r ix; hence, we have closure. The remaming aXIO ms follow from Theorem 3.2. In particular. the zero vector 0 is the 2X3 "lero matrix, and the negative of a 2x3 matrix A is just the 2x3 matri x - A. There IS noth ing special about 2X3 matrices. For any positive integers m and n, th e set of all //I X tI mat rices fo rms a vector space with the usual operatio ns of m atri x add ition and matrix scalar multi plication. This vector space is denoted M m".
IxampleJt
Let ~ 1 denote the set of all polynomials o f degree 2 or less with real coefficients. Define addition and sca lar multiplication in the usua l way. (See Appendix D.) If
p(x)
= flo + alx + al~
and
(Ax)
= bo + b,x + blxl
are in f!J' 2' then
p(x)
+ q(x) ""
(110
+ Vo) + (a l + vl)x + (al + b2 )X2
has degree at most 2 and so is in r;p 2' If c is a scalar, then
cp(x) "" ctlo + calx + cal;; is also in qp 2' This verifies axioms 1 and 6. The zero vector 0 is the zero po lynom ial- that is, the polyno mial all of whose coefficients are zero. The negati ve of a polynom ial p(x) = flo + (/IX + (l lX2 is the polyn om ial -p{x) "" -flo - (l 1X - a2x 2. lt is now easy to verify the remaining axio m s. We will check axiom 2 and Icave the ot hers for Exercise 12. With p{x) and q(x) as above, we have
P(x) + (Kx) = (ao + a\x + a2;;) + (~ + blx + blxl) = (flo
+ bo) + (a\ + b,)x + (a2 +
b2 )Xl
+ (b\ + al)x + (b2 + ( 2)x2 = (bo + b\x + b2xl) + ('10 + (l\X + (l2xl) = q(x) + p(x) = (bo + (10)
where the third equality follows fro m the fac t that addition o f real nu m bers is comm utative.
Section 6. 1 Vector Spaces and Subspaces
435
In general, for any fixed t/ :> 0, the set Cjp" o f all polynomials of degree less than or equal to " is a vector space, as is the set g> of all polynomials.
Example 6.4
Let ℱ denote the set of all real-valued functions defined on the real line. If f and g are two such functions and c is a scalar, then f + g and cf are defined by

(f + g)(x) = f(x) + g(x)  and  (cf)(x) = cf(x)

In other words, the value of f + g at x is obtained by adding together the values of f and g at x [Figure 6.1(a)]. Similarly, the value of cf at x is just the value of f at x multiplied by the scalar c [Figure 6.1(b)]. The zero vector in ℱ is the constant function f₀ that is identically zero; that is, f₀(x) = 0 for all x. The negative of a function f is the function −f defined by (−f)(x) = −f(x) [Figure 6.1(c)]. Axioms 1 and 6 are obviously true. Verification of the remaining axioms is left as Exercise 13. Thus, ℱ is a vector space.
[Figure 6.1: The graphs of (a) f, g, and f + g; (b) f, 2f, and −3f; and (c) f and −f.]
In Example 6.4, we could also have considered only those functions defined on some closed interval [a, b] of the real line. This approach also produces a vector space, denoted by ℱ[a, b].
Example 6.5
The set ℤ of integers with the usual operations is not a vector space. To demonstrate this, it is enough to find that one of the ten axioms fails and to give a specific instance in which it fails (a counterexample). In this case, we find that we do not have closure under scalar multiplication. For example, the multiple of the integer 2 by the scalar 1/3 is (1/3)(2) = 2/3, which is not an integer. Thus, it is not true that cx is in ℤ for every x in ℤ and every scalar c (i.e., axiom 6 fails).
Example 6.6
Let V = ℝ² with the usual definition of addition but the following definition of scalar multiplication:

$c\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} cx \\ 0 \end{bmatrix}$

Then, for example,

$1\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \neq \begin{bmatrix} 1 \\ 1 \end{bmatrix}$

so axiom 10 fails. [In fact, the other nine axioms are all true (check this), but we do not need to look into them, because V has already failed to be a vector space. This example shows the value of looking ahead, rather than working through the list of axioms in the order in which they have been given.]
Example 6.7

Let ℂ² denote the set of all ordered pairs of complex numbers. Define addition and scalar multiplication as in ℝ², except here the scalars are complex numbers. For example,

$\begin{bmatrix} 1+i \\ 2-3i \end{bmatrix} + \begin{bmatrix} -3+2i \\ 4 \end{bmatrix} = \begin{bmatrix} -2+3i \\ 6-3i \end{bmatrix}$

and

$(1-i)\begin{bmatrix} 1+i \\ 2-3i \end{bmatrix} = \begin{bmatrix} (1-i)(1+i) \\ (1-i)(2-3i) \end{bmatrix} = \begin{bmatrix} 2 \\ -1-5i \end{bmatrix}$

Using properties of the complex numbers, it is straightforward to check that all ten axioms hold. Therefore, ℂ² is a complex vector space.
In general, ℂⁿ is a complex vector space for all n ≥ 1.

Example 6.8

If p is prime, the set ℤₚⁿ (with the usual definitions of addition and multiplication by scalars from ℤₚ) is a vector space over ℤₚ for all n ≥ 1.
Before we consider further examples, we state a theorem that contains some useful properties of vector spaces. It is important to note that, by proving this theorem for vector spaces in general, we are actually proving it for every specific vector space.
Theorem 6.1

Let V be a vector space, u a vector in V, and c a scalar.
a. 0u = 0
b. c0 = 0
c. (−1)u = −u
d. If cu = 0, then c = 0 or u = 0.
Proof  We prove properties (b) and (d) and leave the proofs of the remaining properties as exercises.
(b) We have
c0 = c(0 + 0) = c0 + c0
by vector space axioms 4 and 7. Adding the negative of c0 to both sides produces
c0 + (−c0) = (c0 + c0) + (−c0)
which implies
0 = c0 + (c0 + (−c0))    By axioms 5 and 3
  = c0 + 0               By axiom 5
  = c0                   By axiom 4
(d) Suppose cu = 0. To show that either c = 0 or u = 0, let's assume that c ≠ 0. (If c = 0, there is nothing to prove.) Then, since c ≠ 0, its reciprocal 1/c is defined, and
u = 1u                   By axiom 10
  = ((1/c)c)u
  = (1/c)(cu)            By axiom 9
  = (1/c)0
  = 0                    By property (b)
We will write u − v for u + (−v), thereby defining subtraction of vectors. We will also exploit the associativity property of addition to unambiguously write u + v + w for the sum of three vectors and, more generally,
c₁v₁ + c₂v₂ + ⋯ + cₖvₖ
for a linear combination of vectors.
Subspaces

We have seen that, in ℝⁿ, it is possible for one vector space to sit inside another one, giving rise to the notion of a subspace. For example, a plane through the origin is a subspace of ℝ³. We now extend this concept to general vector spaces.
Definition  A subset W of a vector space V is called a subspace of V if W is itself a vector space with the same scalars, addition, and scalar multiplication as V.
As in ℝⁿ, checking to see whether a subset W of a vector space V is a subspace of V involves testing only two of the ten vector space axioms. We prove this observation as a theorem.
Theorem 6.2

Let V be a vector space and let W be a nonempty subset of V. Then W is a subspace of V if and only if the following conditions hold:
a. If u and v are in W, then u + v is in W.
b. If u is in W and c is a scalar, then cu is in W.
Proof  Assume that W is a subspace of V. Then W satisfies vector space axioms 1 to 10. In particular, axiom 1 is condition (a) and axiom 6 is condition (b).
Conversely, assume that W is a subset of a vector space V satisfying conditions (a) and (b). By hypothesis, axioms 1 and 6 hold. Axioms 2, 3, 7, 8, 9, and 10 hold in W because they are true for all vectors in V and thus are true in particular for those vectors in W. (We say that W inherits these properties from V.) This leaves axioms 4 and 5 to be checked. Since W is nonempty, it contains at least one vector u. Then condition (b) and Theorem 6.1(a) imply that 0u = 0 is also in W. This is axiom 4. If u is in W, then, by taking c = −1 in condition (b), we have that −u = (−1)u is also in W, using Theorem 6.1(c).
Remark  Since Theorem 6.2 generalizes the notion of a subspace from the context of ℝⁿ to general vector spaces, all of the subspaces of ℝⁿ that we encountered in Chapter 3 are subspaces of ℝⁿ in the current context. In particular, lines and planes through the origin are subspaces of ℝ³.
Example 6.9

We have already shown that the set 𝒫ₙ of all polynomials with degree at most n is a vector space. Hence, 𝒫ₙ is a subspace of the vector space 𝒫 of all polynomials.

Example 6.10

Let W be the set of symmetric n×n matrices. Show that W is a subspace of Mₙₙ.
Solution  Clearly, W is nonempty, so we need only check conditions (a) and (b) in Theorem 6.2. Let A and B be in W and let c be a scalar. Then Aᵀ = A and Bᵀ = B, from which it follows that
(A + B)ᵀ = Aᵀ + Bᵀ = A + B
Therefore, A + B is symmetric and, hence, is in W. Similarly,
(cA)ᵀ = cAᵀ = cA
so cA is symmetric and, thus, is in W. We have shown that W is closed under addition and scalar multiplication. Therefore, it is a subspace of Mₙₙ, by Theorem 6.2.
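The closure computations in Example 6.10 are easy to spot-check numerically. The following Python sketch is an illustration of ours, not part of the text; the matrix size and the scalar are arbitrary choices.

```python
import numpy as np

def random_symmetric(n, rng):
    # B + B^T is always symmetric
    B = rng.standard_normal((n, n))
    return B + B.T

rng = np.random.default_rng(0)
A = random_symmetric(3, rng)
B = random_symmetric(3, rng)
c = 2.5  # arbitrary scalar

# Closure under addition and scalar multiplication (conditions (a) and (b) of Theorem 6.2)
print(np.allclose(A + B, (A + B).T))   # True
print(np.allclose(c * A, (c * A).T))   # True
```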
Example 6.11
Let 𝒞 be the set of all continuous real-valued functions defined on ℝ and let 𝒟 be the set of all differentiable real-valued functions defined on ℝ. Show that 𝒞 and 𝒟 are subspaces of ℱ, the vector space of all real-valued functions defined on ℝ.

Solution  From calculus, we know that if f and g are continuous functions and c is a scalar, then f + g and cf are also continuous. Hence, 𝒞 is closed under addition and scalar multiplication and so is a subspace of ℱ. If f and g are differentiable, then so are f + g and cf. Indeed,
(f + g)′ = f′ + g′  and  (cf)′ = cf′
So 𝒟 is also closed under addition and scalar multiplication, making it a subspace of ℱ.

It is a theorem of calculus that every differentiable function is continuous. Consequently, 𝒟 is contained in 𝒞 (denoted by 𝒟 ⊂ 𝒞), making 𝒟 a subspace of 𝒞. It is also the case that every polynomial function is differentiable, so 𝒫 ⊂ 𝒟, and thus 𝒫 is a subspace of 𝒟. We therefore have a hierarchy of subspaces of ℱ, one inside the other:
𝒫 ⊂ 𝒟 ⊂ 𝒞 ⊂ ℱ
This hierarchy is depicted in Figure 6.2.
[Figure 6.2: The hierarchy of subspaces of ℱ.]
There are other subspaces of ℱ that can be placed into this hierarchy. Some of these are explored in the exercises. In the preceding discussion, we could have restricted our attention to functions defined on a closed interval [a, b], obtaining the analogous hierarchy of subspaces of ℱ[a, b].
Example 6.12
Let S be the set of all functions that satisfy the differential equation
f″ + f = 0    (1)
[That is, S is the solution set of equation (1).] Show that S is a subspace of ℱ.
Solution  S is nonempty, since the zero function clearly satisfies equation (1). Let f and g be in S and let c be a scalar. Then
(f + g)″ + (f + g) = (f″ + g″) + (f + g) = (f″ + f) + (g″ + g) = 0 + 0 = 0
which shows that f + g is in S. Similarly,
(cf)″ + cf = cf″ + cf = c(f″ + f) = c0 = 0
so cf is also in S. Therefore, S is closed under addition and scalar multiplication and is a subspace of ℱ.

The differential equation (1) is an example of a homogeneous linear differential equation. The solution sets of such equations are always subspaces of ℱ. Note that in Example 6.12 we did not actually solve equation (1) (i.e., we did not find any specific solutions, other than the zero function). We will discuss techniques for finding solutions to this type of equation in Section 6.7.

As you gain experience working with vector spaces and subspaces, you will notice that certain examples tend to resemble one another. For example, consider the vector spaces ℝ⁴, 𝒫₃, and M₂₂. Typical elements of these vector spaces are, respectively,
$\mathbf{u} = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}$,  p(x) = a + bx + cx² + dx³,  and  $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$

Any calculations involving the vector space operations of addition and scalar multiplication are essentially the same in all three settings. To highlight the similarities, in the next example we will perform the necessary steps in the three vector spaces in parallel. (In the words of Yogi Berra, "It's déjà vu all over again.")

Example 6.13

(a) Show that the set W of all vectors of the form (a, b, −b, a)ᵀ is a subspace of ℝ⁴.
(b) Show that the set W of all polynomials of the form a + bx − bx² + ax³ is a subspace of 𝒫₃.
(c) Show that the set W of all matrices of the form $\begin{bmatrix} a & b \\ -b & a \end{bmatrix}$ is a subspace of M₂₂.
Solution
(a) W is nonempty because it contains the zero vector 0. (Take a = b = 0.) Let u and v be in W, say,
u = (a, b, −b, a)ᵀ  and  v = (c, d, −d, c)ᵀ
Then
u + v = (a + c, b + d, −(b + d), a + c)ᵀ
so u + v is also in W (because it has the right form). Similarly, if k is a scalar, then
ku = (ka, kb, −kb, ka)ᵀ
so ku is in W. Thus, W is a nonempty subset of ℝ⁴ that is closed under addition and scalar multiplication. Therefore, W is a subspace of ℝ⁴, by Theorem 6.2.

(b) W is nonempty because it contains the zero polynomial. (Take a = b = 0.) Let p(x) and q(x) be in W, say,
p(x) = a + bx − bx² + ax³  and  q(x) = c + dx − dx² + cx³
Then
p(x) + q(x) = (a + c) + (b + d)x − (b + d)x² + (a + c)x³
so p(x) + q(x) is also in W (because it has the right form). Similarly, if k is a scalar, then
kp(x) = ka + kbx − kbx² + kax³
so kp(x) is in W. Thus, W is a nonempty subset of 𝒫₃ that is closed under addition and scalar multiplication. Therefore, W is a subspace of 𝒫₃, by Theorem 6.2.

(c) W is nonempty because it contains the zero matrix O. (Take a = b = 0.) Let A and B be in W, say,
$A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$  and  $B = \begin{bmatrix} c & d \\ -d & c \end{bmatrix}$
Then
$A + B = \begin{bmatrix} a+c & b+d \\ -(b+d) & a+c \end{bmatrix}$
so A + B is also in W (because it has the right form). Similarly, if k is a scalar, then
$kA = \begin{bmatrix} ka & kb \\ -kb & ka \end{bmatrix}$
so kA is in W. Thus, W is a nonempty subset of M₂₂ that is closed under addition and scalar multiplication. Therefore, W is a subspace of M₂₂, by Theorem 6.2.
Example 6.13 shows that it is often possible to relate examples that, on the surface, appear to have nothing in common. Consequently, we can apply our knowledge of ℝⁿ to polynomials, matrices, and other examples. We will encounter this idea several times in this chapter and will make it precise in Section 6.5.
Example 6.14

If V is a vector space, then V is clearly a subspace of itself. The set {0}, consisting of only the zero vector, is also a subspace of V, called the zero subspace. To show this, we simply note that the two closure conditions of Theorem 6.2 are satisfied:
0 + 0 = 0  and  c0 = 0  for any scalar c
The subspaces {0} and V are called the trivial subspaces of V.
An examination of the proof of Theorem 6.2 reveals the following useful fact:

If W is a subspace of a vector space V, then W contains the zero vector 0 of V.

This fact is consistent with, and analogous to, the fact that lines and planes are subspaces of ℝ³ if and only if they contain the origin. The requirement that every subspace must contain 0 is sometimes useful in showing that a set is not a subspace.
Example 6.15
Let W be the set of all 2×2 matrices of the form
$\begin{bmatrix} a & a+1 \\ b & c \end{bmatrix}$
Is W a subspace of M₂₂?

Solution  Each matrix in W has the property that its (1, 2) entry is one more than its (1, 1) entry. Since the zero matrix does not have this property, it is not in W. Hence, W is not a subspace of M₂₂.
Example 6.16

Let W be the set of all 2×2 matrices with determinant 0. Is W a subspace of M₂₂? (Since det O = 0, the zero matrix is in W, so the method of Example 6.15 is of no use to us.)

Solution  Let
$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$  and  $B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$
Then det A = det B = 0, so A and B are in W. But
$A + B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
so det(A + B) = 1 ≠ 0, and therefore A + B is not in W. Thus, W is not closed under addition and so is not a subspace of M₂₂.
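The counterexample is easy to confirm numerically. A short numpy sketch, our own illustration using the matrices A and B as reconstructed above:

```python
import numpy as np

A = np.array([[1, 0],
              [0, 0]])
B = np.array([[0, 0],
              [0, 1]])

print(np.linalg.det(A), np.linalg.det(B))   # 0.0 0.0  (both are in W)
print(np.linalg.det(A + B))                 # 1.0      (A + B is not in W)
```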
Spanning Sets

The notion of a spanning set of vectors carries over easily from ℝⁿ to general vector spaces.
Definition  If S = {v₁, v₂, ..., vₖ} is a set of vectors in a vector space V, then the set of all linear combinations of v₁, v₂, ..., vₖ is called the span of v₁, v₂, ..., vₖ and is denoted by span(v₁, v₂, ..., vₖ) or span(S). If V = span(S), then S is called a spanning set for V and V is said to be spanned by S.
Example 6.17
Show that the polynomials 1, x, and x² span 𝒫₂.

Solution  By its very definition, a polynomial p(x) = a + bx + cx² is a linear combination of 1, x, and x². Therefore, 𝒫₂ = span(1, x, x²).
Example 6.17 can clearly be generalized to show that 𝒫ₙ = span(1, x, x², ..., xⁿ). However, no finite set of polynomials can possibly span 𝒫, the vector space of all polynomials. (See Exercise 44 in Section 6.2.) But, if we allow a spanning set to be infinite, then clearly the set of all nonnegative powers of x will do. That is, 𝒫 = span(1, x, x², ...).
Example 6.18
Show that M₂₃ = span(E₁₁, E₁₂, E₁₃, E₂₁, E₂₂, E₂₃), where
$E_{11} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, $E_{12} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, $E_{13} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$,
$E_{21} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$, $E_{22} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$, $E_{23} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
(That is, Eᵢⱼ is the matrix with a 1 in row i, column j and zeros elsewhere.)

Extending this example, we see that, in general, Mₘₙ is spanned by the mn matrices Eᵢⱼ, where i = 1, ..., m and j = 1, ..., n.
Example 6.19
In 𝒫₂, determine whether r(x) = 1 − 4x + 6x² is in span(p(x), q(x)), where
p(x) = 1 − x + x²  and  q(x) = 2 + x − 3x²

Solution  We are looking for scalars c and d such that cp(x) + dq(x) = r(x). This means that
c(1 − x + x²) + d(2 + x − 3x²) = 1 − 4x + 6x²
Regrouping according to powers of x, we have
(c + 2d) + (−c + d)x + (c − 3d)x² = 1 − 4x + 6x²
Equating the coefficients of like powers of x gives
c + 2d = 1
−c + d = −4
c − 3d = 6
which is easily solved to give c = 3 and d = −1. Therefore, r(x) = 3p(x) − q(x), so r(x) is in span(p(x), q(x)). (Check this.)
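Small linear systems like the one in Example 6.19 can also be handed to a computer algebra system. A possible sympy sketch (the variable names are ours):

```python
from sympy import symbols, Eq, solve

c, d = symbols('c d')
# Match the coefficients of 1, x, x^2 in c*p(x) + d*q(x) = r(x)
equations = [
    Eq(c + 2*d, 1),     # constant term
    Eq(-c + d, -4),     # coefficient of x
    Eq(c - 3*d, 6),     # coefficient of x^2
]
print(solve(equations, [c, d]))   # {c: 3, d: -1}
```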
Example 6.20
In ℱ, determine whether sin 2x is in span(sin x, cos x).

Solution  We set c sin x + d cos x = sin 2x and try to determine c and d so that this equation is true. Since these are functions, the equation must be true for all values of x. Setting x = 0, we have
c sin 0 + d cos 0 = sin 0  or  c(0) + d(1) = 0
from which we see that d = 0. Setting x = π/2, we get
c sin(π/2) + d cos(π/2) = sin(π)  or  c(1) + d(0) = 0
giving c = 0. But this implies that sin 2x = 0(sin x) + 0(cos x) = 0 for all x, which is absurd, since sin 2x is not the zero function. We conclude that sin 2x is not in span(sin x, cos x).
Remark  It is true that sin 2x can be written in terms of sin x and cos x. For example, we have the double angle formula sin 2x = 2 sin x cos x. However, this is not a linear combination.
Example 6.21

In M₂₂, describe the span of
$A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, and $C = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

Solution  Every linear combination of A, B, and C is of the form
$cA + dB + eC = \begin{bmatrix} c+d & c+e \\ c+e & d \end{bmatrix}$
This matrix is symmetric, so span(A, B, C) is contained within the subspace of symmetric 2×2 matrices. In fact, we have equality; that is, every symmetric 2×2 matrix is in span(A, B, C). To show this, we let $\begin{bmatrix} x & y \\ y & z \end{bmatrix}$ be a symmetric 2×2 matrix. Setting
$\begin{bmatrix} x & y \\ y & z \end{bmatrix} = \begin{bmatrix} c+d & c+e \\ c+e & d \end{bmatrix}$
and solving for c, d, and e, we find that c = x − z, d = z, and e = −x + y + z. Therefore,
$\begin{bmatrix} x & y \\ y & z \end{bmatrix} = (x-z)\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} + z\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + (-x+y+z)\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
(Check this.) It follows that span(A, B, C) is the subspace of symmetric 2×2 matrices.
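One way to carry out the "check this" step is numerically. The following numpy sketch is our own illustration, using the matrices A, B, and C as reconstructed above and an arbitrary symmetric matrix:

```python
import numpy as np

A = np.array([[1, 1], [1, 0]])
B = np.array([[1, 0], [0, 1]])
C = np.array([[0, 1], [1, 0]])

x, y, z = 4.0, -2.0, 7.0          # entries of the symmetric matrix [[x, y], [y, z]]
c, d, e = x - z, z, -x + y + z    # coefficients found in Example 6.21

S = np.array([[x, y], [y, z]])
print(np.allclose(c*A + d*B + e*C, S))   # True
```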
As was the case in ℝⁿ, the span of a set of vectors is always a subspace of the vector space that contains them. The next theorem makes this result precise. It generalizes Theorem 3.19.
Theorem 6.3

Let v₁, v₂, ..., vₖ be vectors in a vector space V.
a. span(v₁, v₂, ..., vₖ) is a subspace of V.
b. span(v₁, v₂, ..., vₖ) is the smallest subspace of V that contains v₁, v₂, ..., vₖ.
Proof  (a) The proof of property (a) is identical to the proof of Theorem 3.19, with ℝⁿ replaced by V.
(b) To establish property (b), we need to show that any subspace of V that contains v₁, v₂, ..., vₖ also contains span(v₁, v₂, ..., vₖ). Accordingly, let W be a subspace of V that contains v₁, v₂, ..., vₖ. Then, since W is closed under addition and scalar multiplication, it contains every linear combination c₁v₁ + c₂v₂ + ⋯ + cₖvₖ of v₁, v₂, ..., vₖ. Therefore, span(v₁, v₂, ..., vₖ) is contained in W.
ExerCises 6.1 In Exercises 1-11, determine whether thegivell set, together with the specified operatiolls of additioll and scalar mulriplicatiOlI, is a vector space. If It is 1I0t, list all of t ile axioms !lrat fail to hold 1. The set of all vectors in
R2of the
form
[:J.
with the
usual vcrtor addiuon and scalar multiplication
2. The set of all vectors [ ;] in Rl with x C!: 0, Y 2: 0 (i.e., the first quadrant), with the us ual vector addition and scalar multiplication
3. The set of all vectors [;] in 1R2 with xy C!: 0 (i.e., the union of the fi rst and third quadrants), with the usual vector addition and scalar multiplication 4. The set of all vectors [ ;] in R2 with x
~ y, with the
usual vector addition and scalar multiplication
5. IR', with the usual addition but scalar multiplication defi ned by
6. 1R 2, with the usual scala r multiplication but addition defi ned by
$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 + x_2 + 1 \\ y_1 + y_2 + 1 \end{bmatrix}$
7. The set of all positive real numbers, with addition ⊕ defined by x ⊕ y = xy and scalar multiplication ⊙ defined by c ⊙ x = xᶜ
8. The set of all rat ional numbers, wi th the usual additIOn and multi plication 9. The set of all uppe r triangular 2X2 matrices, with the usual matm additio n and scalar multiplication 10. The set of all 2 X 2 matrices of the form [:
: ].
where ad :::: 0, with the usual matrix addition and scalar multiplicat ion
11. The set of all skew-symmetric 71X n matrices, with the usual matflx additio n and sca lar multiplication (see Exercises 3.2). 12. Fin ish veri fying tha t qp l is a vector space (see Exampie 6.3) . 13. Finish verifying that ~ is a vector space (see Example 6.4).
•• ~ III Exercises 14- / 7, delt:rl/lll1e whether the gIven set, toge/I, er w;th the specified operntiollS of (uldition (wd scalar multipUcmioll, is a complex vector space. If it is nor, list all of the axioms thnt fnil 10 /IO/d. 14. The set of all vectors in C 1 o f the for m
[~], with the
usual vector add ition and scalar multiplication 15. The sct M",~(C ) o f all m X " comple x matrices, wi th the usual ma trix addi tion and scalar multiplication 16. The set
el, with
the usual vector addit ion but scalar
multiplication defin ed by 17.
c[::] = [~~]
Rn, with the usual vector add ition and scalar multiplicat ion
III Exercises 18-2 1, determille whether the give" set, together wirh tlJe specified operatiolls of mld,t,oll (lml scatt" multipU-
alliorl, IS a vector space over tlJe illdicated Z,.. If it IS IIOt, Ust rill of lite rlxiOIllS II/at fat! to Itold.
18. The set of aU vectors in Z; with an tvt'n numocr of I s, over Zz with the usual vector additio n and scalar multiplication 19. The set of all vectors in Zi with an odd number o f Is, over Z, with the usual VC1:tor addition and scalar multiplication 20. The set M"",(Z,J of all m X " mat rices With entries from Zp> over Zp with the usual ma trix addition and scalar multipl icatio n
21. 1 6 , over ill with the usual additio n and multiplicatio n (Think this o ne Ihrough carefu lly!) 22. ProveTheorem6.1 (a).
23. PrQ\'e Theorem 6.1 (c).
In Exercises 24-45, lIse Theorem 6.2 to determine whether W is a subspace ofY.
27. V = Rl, W =
• b
I. I
28. V=M n ,W = {[:
2~)}
29. V=M n ,W = { [ :
~] :ad2bc}
30. V = Mn~' W = lAin M",, : det A = I} 31. V = M".., W is the set o f diagonal
"X" mat rices
32. V = M"", W is the set o f idem potent nXn matrices 33. V = At"", \V = IA in M",, : AB = BA}, where B IS a given (fixed ) matrix
34. V ~ ~" W = {bx+ d} 35. V = CJ>:z, W= fa + bx+ a 1:u + b+ c= O} 36. V=~" W = {.+ Itr+ d ,abc=O} 37. V =
~,
W is the set o f all polynomials o f degree 3
38. V= '§, W = {n n '§'f(- x) = f(x))
39. V = 1/', IV = (f ;,, 1/" f( - x) = - f(x))
'0. V = S;, IV = (f; n 1/' , f(O) = I) 41. V = :1', IV = 1f;":I', f(O) = O} 42. V = '§, IV is the set o f all llliegrable fu nctions 43. V = 9i, IV = {fin ~: r ( x) ~ 0 for all x} 44. V = ,§, w = (€ (l), the sct of all fu nctions with continuous second derivatives ~ 45. V =
,-,
1/', IV = (f h' 1/', Um f(x) = 00)
46. leI Vbe a vector space with subspaces U and W Prove that u n W IS a subspace of V. 47. Let Vbe a vector space wit h subspaces U and HI. Give an example wit h V "" Rl to show that U U W need nOI be a subspace of V. 48. Le t Vbe a vecto r space with subspaces U and \V. Define the slim of U t,ml W 10 be
U+ W = lu + w : u isin U, w is in W]
25. V = R', W=
•
-. 2.
26. V = Rl, W=
a b a+b+1
(a) If V = IR:J, U is the x-axis, and W is the y-axis, what is U + W ? (b) If U and Wa re subspaces of a vector space V, p rove Ih:1I U + W /s a subspace of V.
49. If U and Yare vector spaces, define the Cartesian product of U and V to be U X V = leu , v) : u isin Uand v isi n VI
Prove that U X V is a vector space.
50. Let W be a subspace of a vector space V. Prove that Δ = {(w, w) : w is in W} is a subspace of V × V.
In Exercises 51 (lnd 52, let A = [ 8 =
\ -I] [ 1
\ - \
5I. C=[~!]
52.C =[~
-5] - \
xl-
54. sex) = I
58. hex)
= situ
:].[ ~ ~]. [: ~]t
-~}
60. Is M22 spanned by [ :
~].[ : ~].[: :].[~
-~]?
62. IsttP 2 spannedbyi
span(p(x). q{x). r( x)).
56. h(x) = cos 2x
59. ISM21SpannedbY [~
61. Is (jJ>1 spanned by I
In Exercises 53 011(/ 54. let p(x) = 1 - 2x, q(x) = x - X l , alld r(x) = - 2 + 3x+ x 2. Determine whether s(x) IS in 53. s(x) = 3 - 5x -
=I 57. h (x) = sin 2x
55. h(x)
\ ] and \
O · Determine whether C is ill span (A, 8 ).
en
I.mear Independence, BasIs, and Dimension
+ x,x + xl, 1 + Xl? +x+ 2x",2 + x+ 2X 2,
-1+ x+2x 2?
63. Prove tha t every vector space has a unique zero vector.
+ x + xl-
64. Prove that for every vector v in a vector space V. there is a unique ,,' in V such that v + v' = o.
In Exercises 55-58, let f (x) = sin 2x ami g(x) = cos 2x. Defermine wlle/ller II(X) is ill spaf/(f (x), g(x)).
Linear Independence, Basis, and Dimension

In this section, we extend the notions of linear independence, basis, and dimension to general vector spaces, generalizing the results of Sections 2.3 and 3.5. In most cases, the proofs of the theorems carry over; we simply replace ℝⁿ by the vector space V.
Linear Independence

Definition  A set of vectors {v₁, v₂, ..., vₖ} in a vector space V is linearly dependent if there are scalars c₁, c₂, ..., cₖ, at least one of which is not 0, such that
c₁v₁ + c₂v₂ + ⋯ + cₖvₖ = 0
A set of vectors that is not linearly dependent is said to be linearly independent.

As in ℝⁿ, {v₁, v₂, ..., vₖ} is linearly independent in a vector space V if and only if
c₁v₁ + c₂v₂ + ⋯ + cₖvₖ = 0  implies  c₁ = 0, c₂ = 0, ..., cₖ = 0
We also have the following useful alternative formulation of linear dependence.
Theorem 6.4

A set of vectors {v₁, v₂, ..., vₖ} in a vector space V is linearly dependent if and only if at least one of the vectors can be expressed as a linear combination of the others.

Proof  The proof is identical to that of Theorem 2.5.

As a special case, note that a set of two vectors is linearly dependent if and only if one of the vectors is a scalar multiple of the other.
Example 6.22

In 𝒫₂, the set {1 + x + x², 1 − x + 3x², 1 + 3x − x²} is linearly dependent, since
2(1 + x + x²) − (1 − x + 3x²) = 1 + 3x − x²

Example 6.23

In M₂₂, let A, B, and C be matrices with A + B = C. Then the set {A, B, C} is linearly dependent.

Example 6.24

In ℱ, the set {sin²x, cos²x, cos 2x} is linearly dependent, since
cos 2x = cos²x − sin²x
Example 6.25

Show that the set {1, x, x², ..., xⁿ} is linearly independent in 𝒫ₙ.

Solution 1  Suppose that c₀, c₁, ..., cₙ are scalars such that
c₀ · 1 + c₁x + c₂x² + ⋯ + cₙxⁿ = 0
Then the polynomial p(x) = c₀ + c₁x + c₂x² + ⋯ + cₙxⁿ is zero for all values of x. But a polynomial of degree at most n cannot have more than n zeros (see Appendix D). So p(x) must be the zero polynomial, meaning that c₀ = c₁ = c₂ = ⋯ = cₙ = 0. Therefore, {1, x, x², ..., xⁿ} is linearly independent.
Solution 2  We begin, as in the first solution, by assuming that
p(x) = c₀ + c₁x + c₂x² + ⋯ + cₙxⁿ = 0
Since this is true for all x, we can substitute x = 0 to obtain c₀ = 0. This leaves
c₁x + c₂x² + ⋯ + cₙxⁿ = 0
Taking derivatives, we obtain
c₁ + 2c₂x + 3c₃x² + ⋯ + ncₙxⁿ⁻¹ = 0
and setting x = 0, we see that c₁ = 0. Differentiating 2c₂x + 3c₃x² + ⋯ + ncₙxⁿ⁻¹ = 0 and setting x = 0, we find that 2c₂ = 0, so c₂ = 0. Continuing in this fashion, we find that k!cₖ = 0 for k = 0, ..., n. Therefore, c₀ = c₁ = c₂ = ⋯ = cₙ = 0, and {1, x, x², ..., xⁿ} is linearly independent.
Example 6.26
In 𝒫₂, determine whether the set {1 + x, x + x², 1 + x²} is linearly independent.

Solution  Let c₁, c₂, and c₃ be scalars such that
c₁(1 + x) + c₂(x + x²) + c₃(1 + x²) = 0
Then
(c₁ + c₃) + (c₁ + c₂)x + (c₂ + c₃)x² = 0
This implies that
c₁ + c₃ = 0
c₁ + c₂ = 0
c₂ + c₃ = 0
the solution to which is c₁ = c₂ = c₃ = 0. It follows that {1 + x, x + x², 1 + x²} is linearly independent.
Remark  Compare Example 6.26 with Example 2.23(b). The system of equations that arises is exactly the same. This is because of the correspondence between 𝒫₂ and ℝ³ that relates
1 + x ↔ (1, 1, 0)ᵀ,  x + x² ↔ (0, 1, 1)ᵀ,  1 + x² ↔ (1, 0, 1)ᵀ
and produces the columns of the coefficient matrix of the linear system that we have to solve. Thus, showing that {1 + x, x + x², 1 + x²} is linearly independent is equivalent to showing that {(1, 1, 0)ᵀ, (0, 1, 1)ᵀ, (1, 0, 1)ᵀ} is linearly independent. This can be done simply by establishing that the matrix
$\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$
has rank 3, by the Fundamental Theorem of Invertible Matrices.
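The rank computation in this Remark is a one-liner in numpy. The sketch below is our own illustration:

```python
import numpy as np

# Columns are the coordinate vectors of 1 + x, x + x^2, and 1 + x^2
# with respect to the standard basis {1, x, x^2} of P2.
A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])

print(np.linalg.matrix_rank(A))   # 3, so the three polynomials are linearly independent
```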
Example 6.27

In ℱ, determine whether the set {sin x, cos x} is linearly independent.

Solution  The functions f(x) = sin x and g(x) = cos x are linearly dependent if and only if one of them is a scalar multiple of the other. But it is clear from their graphs that this is not the case, since, for example, any nonzero multiple of f(x) = sin x has the same zeros, none of which are zeros of g(x) = cos x.
This approach may not always be appropriate to use, so we offer the following direct, more computational method. Suppose c and d are scalars such that
c sin x + d cos x = 0
Setting x = 0, we obtain d = 0, and setting x = π/2, we obtain c = 0. Therefore, the set {sin x, cos x} is linearly independent.
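Yet another computational route uses the Wronskian, which is introduced in Exercises 15 and 16 of this section. A possible sympy sketch, our own illustration:

```python
from sympy import symbols, sin, cos, Matrix, simplify

x = symbols('x')
# Wronskian of sin x and cos x
W = Matrix([[sin(x),         cos(x)],
            [sin(x).diff(x), cos(x).diff(x)]]).det()

print(simplify(W))   # -1, which is never zero, so {sin x, cos x} is linearly independent
```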
Although the definitions of linear dependence and independence are phrased in terms of finite sets of vectors, we can extend the concepts to infinite sets as follows:

A set S of vectors in a vector space V is linearly dependent if it contains finitely many linearly dependent vectors. A set of vectors that is not linearly dependent is said to be linearly independent.

Note that for finite sets of vectors, this is just the original definition. Following is an example of an infinite set of linearly independent vectors.
Example 6.28

In 𝒫, show that S = {1, x, x², ...} is linearly independent.

Solution  Suppose there is a finite subset T of S that is linearly dependent. Then some nontrivial linear combination of the powers of x in T is zero; allowing zero coefficients if necessary, we may write it as
cₙxⁿ + cₙ₊₁xⁿ⁺¹ + ⋯ + cₘxᵐ = 0
But, by an argument similar to that used in Example 6.25, this implies that cₙ = cₙ₊₁ = ⋯ = cₘ = 0, which is a contradiction. Hence, S cannot contain finitely many linearly dependent vectors, so it is linearly independent.
Bases

The important concept of a basis now can be extended easily to arbitrary vector spaces.

Definition  A subset B of a vector space V is a basis for V if
1. B spans V and
2. B is linearly independent.
Example 6.29

If eᵢ is the ith column of the n×n identity matrix, then {e₁, e₂, ..., eₙ} is a basis for ℝⁿ, called the standard basis for ℝⁿ.

Example 6.30

{1, x, x², ..., xⁿ} is a basis for 𝒫ₙ, called the standard basis for 𝒫ₙ.

Example 6.31

The set E = {E₁₁, ..., E₁ₙ, E₂₁, ..., E₂ₙ, ..., Eₘ₁, ..., Eₘₙ} is a basis for Mₘₙ, where the matrices Eᵢⱼ are as defined in Example 6.18. E is called the standard basis for Mₘₙ. We have already seen that E spans Mₘₙ. It is easy to show that E is linearly independent. (Verify this!) Hence, E is a basis for Mₘₙ.

Example 6.32

Show that B = {1 + x, x + x², 1 + x²} is a basis for 𝒫₂.
Solution  We have already shown that B is linearly independent, in Example 6.26. To show that B spans 𝒫₂, let a + bx + cx² be an arbitrary polynomial in 𝒫₂. We must show that there are scalars c₁, c₂, and c₃ such that
c₁(1 + x) + c₂(x + x²) + c₃(1 + x²) = a + bx + cx²
or, equivalently,
(c₁ + c₃) + (c₁ + c₂)x + (c₂ + c₃)x² = a + bx + cx²
Equating coefficients of like powers of x, we obtain the linear system
c₁ + c₃ = a
c₁ + c₂ = b
c₂ + c₃ = c
which has a solution, since the coefficient matrix
$\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$
has rank 3 and, hence, is invertible. (We do not need to know what the solution is; we only need to know that it exists.) Therefore, B is a basis for 𝒫₂.
Remark  Observe that the matrix
$\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$
is the key to Example 6.32. We can immediately obtain it using the correspondence between 𝒫₂ and ℝ³, as indicated in the Remark following Example 6.26.
Example 6.33

Show that B = {1, x, x², ...} is a basis for 𝒫.

Solution  In Example 6.28, we saw that B is linearly independent. It also spans 𝒫, since clearly every polynomial is a linear combination of (finitely many) powers of x.
Example 6.34

Find bases for the three vector spaces in Example 6.13:
(a) W₁, the set of all vectors of the form (a, b, −b, a)ᵀ
(b) W₂, the set of all polynomials of the form a + bx − bx² + ax³
(c) W₃, the set of all matrices of the form $\begin{bmatrix} a & b \\ -b & a \end{bmatrix}$

Solution  Once again, we will work the three examples in parallel to highlight the similarities among them. In a strong sense, they are all the same example, but it will take us until Section 6.5 to make this idea perfectly precise.

(a) Since
(a, b, −b, a)ᵀ = a(1, 0, 0, 1)ᵀ + b(0, 1, −1, 0)ᵀ
we have W₁ = span(u, v), where u = (1, 0, 0, 1)ᵀ and v = (0, 1, −1, 0)ᵀ. Since {u, v} is clearly linearly independent, it is also a basis for W₁.

(b) Since
a + bx − bx² + ax³ = a(1 + x³) + b(x − x²)
we have W₂ = span(u(x), v(x)), where u(x) = 1 + x³ and v(x) = x − x². Since {u(x), v(x)} is clearly linearly independent, it is also a basis for W₂.

(c) Since
$\begin{bmatrix} a & b \\ -b & a \end{bmatrix} = a\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$
we have W₃ = span(U, V), where $U = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $V = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. Since {U, V} is clearly linearly independent, it is also a basis for W₃.
Coordinates

Section 3.5 introduced the idea of the coordinates of a vector with respect to a basis for subspaces of ℝⁿ. We now extend this concept to arbitrary vector spaces.
Theorem 6.5

Let V be a vector space and let B be a basis for V. For every vector v in V, there is exactly one way to write v as a linear combination of the basis vectors in B.

Proof  The proof is the same as the proof of Theorem 3.29. It works even if the basis B is infinite, since linear combinations are, by definition, finite.
The converse of Theorem 6.5 is also true. That is, if B is a set of vectors in a vector space V with the property that every vector in V can be written uniquely as a linear combination of the vectors in B, then B is a basis for V (see Exercise 30). In this sense, the unique representation property characterizes a basis. Since representation of a vector with respect to a basis is unique, the next definition makes sense.
Definition  Let B = {v₁, v₂, ..., vₙ} be a basis for a vector space V. Let v be a vector in V, and write v = c₁v₁ + c₂v₂ + ⋯ + cₙvₙ. Then c₁, c₂, ..., cₙ are called the coordinates of v with respect to B, and the column vector
[v]_B = (c₁, c₂, ..., cₙ)ᵀ
is called the coordinate vector of v with respect to B.

Observe that if the basis B of V has n vectors, then [v]_B is a (column) vector in ℝⁿ.
Example 6.35

Find the coordinate vector [p(x)]_B of p(x) = 2 − 3x + 5x² with respect to the standard basis B = {1, x, x²} of 𝒫₂.

Solution  The polynomial p(x) is already a linear combination of 1, x, and x², so
[p(x)]_B = (2, −3, 5)ᵀ
This is the correspondence between 𝒫₂ and ℝ³ that we remarked on after Example 6.26, and it can easily be generalized to show that the coordinate vector of a polynomial
p(x) = a₀ + a₁x + a₂x² + ⋯ + aₙxⁿ in 𝒫ₙ
with respect to the standard basis B = {1, x, x², ..., xⁿ} is just the vector
[p(x)]_B = (a₀, a₁, ..., aₙ)ᵀ in ℝⁿ⁺¹

Remark  The order in which the basis vectors appear in B affects the order of the entries in a coordinate vector. For example, in Example 6.35, assume that the
standard basis vectors are ordered as B′ = {x², x, 1}. Then the coordinate vector of p(x) = 2 − 3x + 5x² with respect to B′ is
[p(x)]_{B′} = (5, −3, 2)ᵀ
Example 6.36

Find the coordinate vector [A]_B of $A = \begin{bmatrix} 2 & -1 \\ 4 & 3 \end{bmatrix}$ with respect to the standard basis B = {E₁₁, E₁₂, E₂₁, E₂₂} of M₂₂.

Solution  Since
A = 2E₁₁ − E₁₂ + 4E₂₁ + 3E₂₂
we have
[A]_B = (2, −1, 4, 3)ᵀ
This is the correspondence between M₂₂ and ℝ⁴ that we noted before the introduction to Example 6.13. It too can easily be generalized to give a correspondence between Mₘₙ and ℝᵐⁿ.
Example 6.37

Find the coordinate vector [p(x)]_C of p(x) = 1 + 2x − x² with respect to the basis C = {1 + x, x + x², 1 + x²} of 𝒫₂.

Solution  We need to find c₁, c₂, and c₃ such that
c₁(1 + x) + c₂(x + x²) + c₃(1 + x²) = 1 + 2x − x²
or, equivalently,
(c₁ + c₃) + (c₁ + c₂)x + (c₂ + c₃)x² = 1 + 2x − x²
As in Example 6.32, this means we need to solve the system
c₁ + c₃ = 1
c₁ + c₂ = 2
c₂ + c₃ = −1
whose solution is found to be c₁ = 2, c₂ = 0, c₃ = −1. Therefore,
[p(x)]_C = (2, 0, −1)ᵀ
(Since this result says that p(x) = 2(1 + x) − (1 + x²), it is easy to check that it is correct.)
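The system in Example 6.37 is exactly the kind of computation a numerical routine handles directly. A possible numpy sketch, our own illustration:

```python
import numpy as np

# Columns correspond to 1 + x, x + x^2, 1 + x^2; rows to the coefficients of 1, x, x^2.
A = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0, -1.0])    # coefficients of p(x) = 1 + 2x - x^2

print(np.linalg.solve(A, b))      # [ 2.  0. -1.], the coordinate vector of p(x) with respect to C
```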
The next theorem shows that the process of forming coordinate vectors is compatible with the vector space operations of addition and scalar multiplication.
Theorem 6.6

Let B = {v₁, v₂, ..., vₙ} be a basis for a vector space V, let u and v be vectors in V, and let c be a scalar. Then
a. [u + v]_B = [u]_B + [v]_B
b. [cu]_B = c[u]_B

Proof  We begin by writing u and v in terms of the basis vectors, say, as
u = c₁v₁ + c₂v₂ + ⋯ + cₙvₙ  and  v = d₁v₁ + d₂v₂ + ⋯ + dₙvₙ
Then, using vector space properties, we have
u + v = (c₁ + d₁)v₁ + (c₂ + d₂)v₂ + ⋯ + (cₙ + dₙ)vₙ  and  cu = (cc₁)v₁ + (cc₂)v₂ + ⋯ + (ccₙ)vₙ
so
[u + v]_B = (c₁ + d₁, ..., cₙ + dₙ)ᵀ = (c₁, ..., cₙ)ᵀ + (d₁, ..., dₙ)ᵀ = [u]_B + [v]_B
and
[cu]_B = (cc₁, ..., ccₙ)ᵀ = c(c₁, ..., cₙ)ᵀ = c[u]_B
An easy corollary to Theorem 6.6 states that coordinate vectors preserve linear combinations:
[c₁u₁ + c₂u₂ + ⋯ + cₖuₖ]_B = c₁[u₁]_B + c₂[u₂]_B + ⋯ + cₖ[uₖ]_B    (1)
You are asked to prove this corollary in Exercise 31.
The most useful aspect of coordinate vectors is that they allow us to transfer information from a general vector space to ℝⁿ, where we have the tools of Chapters 1 to 3 at our disposal. We will explore this idea in some detail in Sections 6.3 and 6.6. For now, we have the following useful theorem.
Theorem 6.7

Let B = {v₁, v₂, ..., vₙ} be a basis for a vector space V and let u₁, ..., uₖ be vectors in V. Then {u₁, ..., uₖ} is linearly independent in V if and only if {[u₁]_B, ..., [uₖ]_B} is linearly independent in ℝⁿ.

Proof  Assume that {u₁, ..., uₖ} is linearly independent in V and let
c₁[u₁]_B + ⋯ + cₖ[uₖ]_B = 0
in ℝⁿ. But then we have
[c₁u₁ + ⋯ + cₖuₖ]_B = 0
using equation (1), so the coordinates of the vector c₁u₁ + ⋯ + cₖuₖ with respect to B are all zero. That is,
c₁u₁ + ⋯ + cₖuₖ = 0v₁ + 0v₂ + ⋯ + 0vₙ = 0
The linear independence of {u₁, ..., uₖ} now forces c₁ = c₂ = ⋯ = cₖ = 0, so {[u₁]_B, ..., [uₖ]_B} is linearly independent.
The converse implication, which uses similar ideas, is left as Exercise 32. Observe that, in the special case where uᵢ = vᵢ, we have vᵢ = 0v₁ + ⋯ + 1vᵢ + ⋯ + 0vₙ, so [vᵢ]_B = eᵢ.
Dimension

The definition of dimension is the same for a vector space as for a subspace of ℝⁿ: the number of vectors in a basis for the space. Since a vector space can have more than one basis, we need to show that this definition makes sense; that is, we need to establish that different bases for the same vector space contain the same number of vectors. Part (a) of the next theorem generalizes Theorem 2.8.
Theorem 6.8

Let B = {v₁, v₂, ..., vₙ} be a basis for a vector space V.
a. Any set of more than n vectors in V must be linearly dependent.
b. Any set of fewer than n vectors in V cannot span V.
Proof  (a) Let {u₁, ..., uₘ} be a set of vectors in V, with m > n. Then {[u₁]_B, ..., [uₘ]_B} is a set of more than n vectors in ℝⁿ and, hence, is linearly dependent, by Theorem 2.8. This means that {u₁, ..., uₘ} is linearly dependent as well, by Theorem 6.7.
(b) Let {u₁, ..., uₘ} be a set of vectors in V, with m < n. Then S = {[u₁]_B, ..., [uₘ]_B} is a set of fewer than n vectors in ℝⁿ. Now span(u₁, ..., uₘ) = V if and only if span(S) = ℝⁿ (see Exercise 33). But span(S) is just the column space of the n×m matrix
A = [ [u₁]_B  [u₂]_B  ⋯  [uₘ]_B ]
so dim(span(S)) = dim(col(A)) ≤ m < n. Hence, S cannot span ℝⁿ, so {u₁, ..., uₘ} does not span V.

Now we extend Theorem 3.23.
Theorem 6.9  The Basis Theorem

If a vector space V has a basis with n vectors, then every basis for V has exactly n vectors.
Proof  The proof of Theorem 3.23 also works here, virtually word for word. However, it is easier to make use of Theorem 6.8. Let B be a basis for V with n vectors and let B′ be another basis for V with m vectors. By Theorem 6.8, m ≤ n; otherwise, B′ would be linearly dependent. Now use Theorem 6.8 with the roles of B and B′ interchanged. Since B′ is a basis of V with m vectors, Theorem 6.8 implies that any set of more than m vectors in V is linearly dependent. Hence, n ≤ m, since B is a basis and is, therefore, linearly independent. Since n ≤ m and m ≤ n, we must have n = m, as required.
The following definition now makes sense, since the number of vectors in a (finite) basis does not depend on the choice of basis.
Definition  A vector space V is called finite-dimensional if it has a basis consisting of finitely many vectors. The dimension of V, denoted by dim V, is the number of vectors in a basis for V. The dimension of the zero vector space {0} is defined to be zero. A vector space that has no finite basis is called infinite-dimensional.
Since the standard basis for ℝⁿ has n vectors, dim ℝⁿ = n. In the case of ℝ³, a one-dimensional subspace is just the span of a single nonzero vector and thus is a line through the origin. A two-dimensional subspace is spanned by its basis of two linearly independent (i.e., nonparallel) vectors and therefore is a plane through the origin. Any three linearly independent vectors must span ℝ³, by the Fundamental Theorem. The subspaces of ℝ³ are now completely classified according to dimension, as shown in Table 6.1.

Table 6.1
dim V    V
  3      ℝ³
  2      Plane through 0
  1      Line through 0
  0      {0}

Example 6.39

The standard basis for 𝒫ₙ contains n + 1 vectors (see Example 6.30), so dim 𝒫ₙ = n + 1.
Example 6.40

The standard basis for Mₘₙ contains mn vectors (see Example 6.31), so dim Mₘₙ = mn.

Example 6.41

Both 𝒫 and ℱ are infinite-dimensional, since they each contain the infinite linearly independent set {1, x, x², ...} (see Exercise 44).
Example 6.42

Find the dimension of the vector space W of symmetric 2×2 matrices (see Example 6.10).

Solution  A symmetric 2×2 matrix is of the form
$\begin{bmatrix} a & b \\ b & c \end{bmatrix} = a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + c\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$
so W is spanned by the set
$S = \left\{\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right\}$
If S is linearly independent, then it will be a basis for W. Setting
$a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + c\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$
we obtain
$\begin{bmatrix} a & b \\ b & c \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$
from which it immediately follows that a = b = c = 0. Hence, S is linearly independent and is, therefore, a basis for W. We conclude that dim W = 3.
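The dimension count in Example 6.42 can also be checked by vectorizing the spanning matrices and computing a rank, mirroring the coordinate-vector idea of this section. A possible numpy sketch, our own illustration:

```python
import numpy as np

# The three spanning matrices of W, flattened into coordinate vectors in R^4
S = [np.array([[1, 0], [0, 0]]),
     np.array([[0, 1], [1, 0]]),
     np.array([[0, 0], [0, 1]])]

M = np.column_stack([A.reshape(-1) for A in S])
print(np.linalg.matrix_rank(M))   # 3, so S is linearly independent and dim W = 3
```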
The dimension of a vector space is its "magic number." Knowing the dimension of a vector space V provides us with much information about V and can greatly simplify the work needed in certain types of calculations, as the next few theorems and examples illustrate.
Theorem 6.10

Let V be a vector space with dim V = n. Then
a. Any linearly independent set in V contains at most n vectors.
b. Any spanning set for V contains at least n vectors.
c. Any linearly independent set of exactly n vectors in V is a basis for V.
d. Any spanning set for V consisting of exactly n vectors is a basis for V.
e. Any linearly independent set in V can be extended to a basis for V.
f. Any spanning set for V can be reduced to a basis for V.
Proof  The proofs of properties (a) and (b) follow from parts (a) and (b) of Theorem 6.8, respectively.
(c) Let S be a linearly independent set of exactly n vectors in V. If S does not span V, then there is some vector v in V that is not a linear combination of the vectors in S. Inserting v into S produces a set S′ with n + 1 vectors that is still linearly independent (see Exercise 54). But this is impossible, by Theorem 6.8(a). We conclude that S must span V and therefore be a basis for V.
(d) Let S be a spanning set for V consisting of exactly n vectors. If S is linearly dependent, then some vector v in S is a linear combination of the others. Throwing v away leaves a set S′ with n − 1 vectors that still spans V (see Exercise 55). But this is impossible, by Theorem 6.8(b). We conclude that S must be linearly independent and therefore be a basis for V.
(e) Let S be a linearly independent set of vectors in V. If S spans V, it is a basis for V and so consists of exactly n vectors, by the Basis Theorem. If S does not span V, then, as in the proof of property (c), there is some vector v in V that is not a linear combination of the vectors in S. Inserting v into S produces a set S′ that is still linearly independent. If S′ still does not span V, we can repeat the process and expand it into a larger, linearly independent set. Eventually, this process must stop, since no linearly independent set in V can contain more than n vectors, by Theorem 6.8(a). When the process stops, we have a linearly independent set S* that contains S and also spans V. Therefore, S* is a basis for V that extends S.
(f) You are asked to prove this property in Exercise 56.

You should view Theorem 6.10 as, in part, a labor-saving device. In many instances, it can dramatically decrease the amount of work needed to check that a set of vectors is linearly independent, a spanning set, or a basis.
Example 6.43

In each case, determine whether S is a basis for V.
(a) V = 𝒫₂, S = {1 + x, 2 − x + x², 3x − 2x², −1 + 3x + x²}
(b) V = M₂₂, S consisting of three matrices in M₂₂
(c) V = 𝒫₂, S = {1 + x, x + x², 1 + x²}

Solution  (a) Since dim(𝒫₂) = 3 and S contains four vectors, S is linearly dependent, by Theorem 6.10(a). Hence, S is not a basis for 𝒫₂.
(b) Since dim(M₂₂) = 4 and S contains three vectors, S cannot span M₂₂, by Theorem 6.10(b). Hence, S is not a basis for M₂₂.
(c) Since dim(𝒫₂) = 3 and S contains three vectors, S will be a basis for 𝒫₂ if it is linearly independent or if it spans 𝒫₂, by Theorem 6.10(c) or (d). It is easier to show that S is linearly independent; we did this in Example 6.26. Therefore, S is a basis for 𝒫₂. (This is the same problem as in Example 6.32, but see how much easier it becomes using Theorem 6.10!)
Example 6.44

Extend {1 + x, 1 − x} to a basis for 𝒫₂.

Solution  First note that {1 + x, 1 − x} is linearly independent. (Why?) Since dim(𝒫₂) = 3, we need a third vector, one that is not linearly dependent on the first two. We could proceed, as in the proof of Theorem 6.10(e), to find such a vector using trial and error. However, it is easier in practice to proceed in a different way. We enlarge the given set of vectors by throwing in the entire standard basis for 𝒫₂. This gives
S = {1 + x, 1 − x, 1, x, x²}
Now S is linearly dependent, by Theorem 6.10(a), so we need to throw away some vectors, in this case two. Which ones? We use Theorem 6.10(f), starting with the first vector that was added, 1. Since 1 = ½(1 + x) + ½(1 − x), the set {1 + x, 1 − x, 1} is linearly dependent, so we throw away 1. Similarly, x = ½(1 + x) − ½(1 − x), so {1 + x, 1 − x, x} is linearly dependent also. Finally, we check that {1 + x, 1 − x, x²} is linearly independent. (Can you see a quick way to tell this?) Therefore, {1 + x, 1 − x, x²} is a basis for 𝒫₂ that extends {1 + x, 1 − x}.
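The same extension can be carried out mechanically by row reducing the matrix of coordinate vectors and keeping the pivot columns. A possible sympy sketch, our own illustration:

```python
from sympy import Matrix

# Columns: coordinate vectors of 1+x, 1-x, 1, x, x^2 in the standard basis {1, x, x^2}
M = Matrix([[1,  1, 1, 0, 0],
            [1, -1, 0, 1, 0],
            [0,  0, 0, 0, 1]])

_, pivots = M.rref()
print(pivots)   # (0, 1, 4): keep 1 + x, 1 - x, and x^2
```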
In Example 6.42, the vector space W of symmetric 2×2 matrices is a subspace of the vector space M₂₂ of all 2×2 matrices. As we showed, dim W = 3 < 4 = dim M₂₂. This is an example of a general result, as the final theorem of this section shows.

Theorem 6.11

Let W be a subspace of a finite-dimensional vector space V. Then
a. W is finite-dimensional and dim W ≤ dim V.
b. dim W = dim V if and only if W = V.

Proof  (a) Let dim V = n. If W = {0}, then dim W = 0 ≤ n = dim V. If W is nonzero, then any basis B for V (containing n vectors) certainly spans W, since W is contained in V. But B can be reduced to a basis B′ for W (containing at most n vectors), by Theorem 6.10(f). Hence, W is finite-dimensional and dim W ≤ n = dim V.
(b) If W = V, then certainly dim W = dim V. On the other hand, if dim W = dim V = n, then any basis B for W consists of exactly n vectors. But these are then n linearly independent vectors in V and, hence, a basis for V, by Theorem 6.10(c). Therefore, V = span(B) = W.
hI Exercises 1-4, test tile sets of matrices for /illear i"depe,,-
dence it, Mll' For those that art finearly dependent, express on~ of the matrices as a linear combination of the others.
1.{[~
_:].[:
2. {[ :
-~ ]. [;
3. {[
4.
{[~
: ].[:
~].[~
: ].[ :
~]}
III Exercises 5-9, test tile sets of polYllomjnls for li'lcur indc· pel/dellce. For t/lOse IIIat are linearly dependent, express one of the poiYllomials as a lillear combillatioll of the oll'ers.
=; ;].[~ 0][ 0 2] [-I ~]} 1 ' - 3
1 ' - 1
5. \x. I + xl in !Jl , 6. {I + x, I + x 2• I - x + x 21 III '21'2 7_ tx, 2x - x 2, 3x + 2Xl} in !Jl 2
Section 6.2
8. {2x.x -
Xl,
I + xl,2 -
X2
+ x'i in eP]
9. {I - 2x, 3x+x 2 - x' , 1 + x 2 +2Xl.3 + 2x+ 3X' ] i n ~l
LineM Independcncc, Basis, and DimenSIOn
20. V = Mll,6 = {[~ ~],[~ ~]. [~
:]. [ :
481
~]}
21. V = Mw
In Exermes 10-1 4, lestlhe sets of fi mctions for /rl/ ear imlependellce ill :y;. ror tllOse that are lil/eMly depel/dem, express one of the fimcriOPu as a linear combmation of tire otlrers. 10. {I,sin:c.cosx] II- {1,sin 1x,cos!xl 12. {e"', e-.rl 13. {I , In (2x).ln (x 1)1
sin 2x, sin 3xl p . Iffand gare in «5('), the vector space of all functions 14. {sin x,
with cominuous derivatives, then the determinant
W(x) = f(x) ['(x)
g(x) g'(x)
W(x) =
·• · fl'- ' ~ x)
[,(x) fi(x) ·• •
J1 ' - ' ~ x)
•• •
•
f.(x) f;(x) • • •
... r.'- '~ x)
and h, .. . ,f~ are linearly independent, provided W(x) is not identically zero. Repeat Exercises 10-14 using the Wronskian test. 17. Let lu, v, wI be a linearly independent SCt of vectors in (1 vector space V.
(a) Is lu + v, v + w, u +
wi linearly independen t?
Either prove that it is or give a counterexample to show that it is not. (b) Is ju - v, v - w , u - w i li nearl y independen t? Either prove that it is or give a counterexample to show that it is not.
III Exercises 18-25. detem.ine whether ,I.e wt 6 is a basis for the vector space V 18. V = M21 ,6 =
22. V = ~ l ' B = {x, I + x. x - x 2} 23. V = f!t ~.B = (I - x, l - x 2,x - x 2 } 24. V = Cfl> 2' B
IE
{I, I
+ 2x + 3x 2}
25. V = ([P2,13 = {1,2 - x,3 - x 2,x
+ 2X2}
26. Find the coordinate v«tor of A =
[~
!J
with
respect to the basis 6 = {Eu • E,., En . El .} of Mn.
.s called the Wronskian of f and g (named aft er the Polish- French mathernatician J6sef Maria Ifacne· Wronski ( 1776-1853), who worked on the theory of determinants and the philosophy of mathematicsl. Show that land g are linearly independent if their Wronskian is not identicaUy zero [that is, if there is somexsuch that W(x ) *- 01. ~ 16. In general, the Wronskian of j; •. .. ,j" in ~ (" - I l is the determinan t
f,(x) f :(x)
a=W:l[: ;l[-; :J,[~ :l[; ~]}
{[~ : J. [~ -~J.[:
1 9. V = M21.6 = {[~ ~J.[~ -~],[ :
_:]}
:].[: _:]}
27 . Fmd the coordinate vtttor of A =
[~
:] with respect
to theb
:]}
of Mu ' 28. Find the coordinate vector of p(x) = 2 - x + 3x 2 with respect 10 the baSIS B = {I + x, I - x., X l } of '1/' l'
29. Find the coordinal'e vector of p (x ) "" 2 - x
respect to the basis 6 = {I, I +
x,
+ 3X l with
- I + Xl} of~l'
30. Let 8 be a set of vectors in a vector space V wi th the property that every vector in V can be written uniquely a5 a linear combination of the vectors in B. Prove that 8 is a basis for V. 3 1. Let B be a basis for a vector space
be vectors in
V,
and let C] '
. •. , ck
v, let U, ' ... ' UL
be scnlars. Show that
[CIU , + .. . + C1 U.]S = c.[ulls + . . +
CL[ U k}l:I'
32. Finish the proof of Theorem 6.7 by showing Ihatlf
{[ u,Jlf> ' " ,(U ~] B} is linearly independent in R- then {UI" .. , Uk} is linearly independent in V. 33. Let {u ,•. . .• u",} be a set of vectors in an
II-dimensional vector space Vand let 6 be a basis for V Let S = {( U,]B•. .. , ( u",] s} be the set of coord inate vectors of (UI> . . . , u'"} wit h respect to 6. Prove that span( u ,' . .. , u"') = V if and only if span(S) = R". /11 Exerciws 34- 39, fi nd the dill/ellsiOI/ of tI/e I'ector splice V (md give (I
hllsis for V
34. V = {p(x); n !'i', 0 P(O) = 0) 35. V = {p(xj;n!'i', op(l ) = 0) . V - {p(x) ;n !'i', 0 xp'( x) = p(x))
482
C hapter 6
Vector Spaces
37. Vo: {A in Mn:A is upper lriangular}
54. Let S = {Vi•...• v.} ~ a linearly independent set in a vector space V. Show that if v is a vector in V that is not in spa n{S), then S' " {VI" '" v~, v} is sl'iJi linea riy independent.
38. V = {A in M12: A is skew-symmetric}
39. V= {A inMn:AB = BA}.where B =
[~
:]
40. Find a formula for the dimension of the vector space of symmetrtc nXn matrices. 41. Find a form ula for the dimension of the vector space
of skew·sym metric
II X II
mat rices.
v,,} be a basis (or a vector space Vand let c... . . , en be nonzero scalars. Prove that {clv" ... , e"v,,}
57. Let { VI"'"
42. Let U and W be subspaces of a fin ite-dimensional
vector space V. Prove Grassmtum's Identity:
is also a basis for V.
dim(U+ W ) == dimU + dimW - dim(Un W )
[Hi"t: The subspace U + W is defined in Exercise 48 of Sect ion 6.1. Let 8 == Jv,. ...• v*' be a basis for un w. Extend l3 10 a basis C of U a nd a basis V of W. Prove that CuD is a basis for U + w. J 43. Let U and Vbe fi nite-dimensional vector spaces.
(a) Find a formula for dim(U X V ) in terms of dim U is infin ite-dimensionaL ( Him: Suppose it has a finite basis. Show that there is
some polynomial that is not a linear combination of thiS basis.) 45. Extend I I + x. I + x + 46. Extend {[ 47. Extend
~
: ].
[~
Xl )
to a basis for ra> 2'
:]} to a basis for M12•
{[~ ~]. [~ ~]. [ ~ -~] } to abasisfor M~.
48. Extend { [
~ ~], [ ~ ~] } to a basis for the vector
space of symmetric 2 X2 m3trices. 49. rind a basis fo r span( l . I + x, 2x) in 1P 1. 50. Fi ndab3sisfor span (1 - 2x,2x- Xl , 1- x 2, 1 + x 2 ) in (jJ> l ' 51. Findabasisfor span (J - x,x- X l , I - x 2 , 1 - 2x+ X l ) in f!J>2' 52. Fi nd a basis for
span ([~ ~]. [~
'] [-' ']
O·
[ - 1, - I' ]) ;nM"--.
53. Find a baSIS for span(sin1x, cos 2x, cos 2x) in ~.
55. Let S == {VI" ..• v,,} be a spanning set for a vector space V. Show that ifv" IS in span (v l •• .. , V,,_ I)' then S' = {VI" .. • v n- I} is still a spann ing set for V. 56. Prove Theorem 6. IO(f).
I
- I '
58. Let {Vi' ...• v ..} be a basis fora vector space V. Prove thai
{VI' VI + Vl, VI + v1 + V,' ...• VI + ... + v,,} is also a basis for V.
Let (Ie, Ill>' • . ,(I" be n + I dis/mel rea/ nllmbers. Defil1c polynomials pJ"x), plx), . . .• p.(x) by .I ) _
(x - IlO) ". (x - a' _I)(x - a,+I)'" (x - a. )
p" .' - (a, -
"0) . . (a, - (1, _1)( (I,
-
a,+I)' " (a, - an)
These are all/ell tlte lAgrange polY'lOmials associate,' with tIo, Ill" •. , an' IJoseph·wllis Lagmllse ( 1736- 1813) was hom i,1 ffaly bllt spent most of his life ill GermallY ami Frallce. He made important cOllf riblltioll$ to mel! fields as /lumber theory, algebra, astronomy. mechanics, and the calculus of variatiOllS. I" 1773, lAgrnnge WtlS tile first to give the volume iflterpreltl tioll of a determinant (see Chapter 4).1 59. (a) Compute the Lagrange polynomials associ31cd With a." = l ,u 1 = 2, ° 2 :: 3. (b) Show, in general, that
p,(a,) =
t
ifi "'} if j = j
60. (a) Prove that the set 13 = {Alx), plx), .. . , p,,(x)} of Lagrange polynom ials is linearly independent III ~,..IHjm:Se t GJAix) + ... + c"p,,(x) == Oand use Exercise 59(b).] (b) Deduce that B IS a basis fo r flI'". 6 1. If q(x) is an arbit rary polynomial in qp M it follows from Exercise 6O(b) that
q(x) = '>Po(x ) + ... + "p,(x) for some sca l ars~, ... , c,..
(l )
(a) Show that c, == q(a,) for i = 0•. .. , n. and deduce th .. q(x) = q(a,)p,(x) + ... + q(a. )pJx) ;sth, unique represen tation of q(x) with respect to the basis B.
Settion 6.2
(b ) Show that fo r ;lny n + I points (Ug. Co), (al' c, ), . .. , ( tI~, cn) with distinct first components, the (unction q(x) defined by equation ( I) is the unique po lynomial of degree a t most I1 lha l passes th rough all of
the points. This formula is known as the Lagrange ;"'erpolation formula. (Compare this formula with Problem 19 in E..xploration: Geometric Applications of Determinants in Chapler 4.) (cl Usc the L1grangc interpolation for mula to fin d the polynomial of degree at most 2 that passes through the points
tmear Independenct. Bas,s, and Dimension
461
(i) (1. 6). (2. -i) .,nd (3. - 2) (ii) ( -I, 1O), (O,S), and {3,2)
62. Use the Lagrange interpolation for mula to show that if a polynomial in <jp" has /I + I zeros, then it must be the zero polynom ial. 63. Find a formula for the number of invertible matTices
in M",,(lL p )' [Hint: This IS the smue as determini ng the number of different bases for Z;' (Why?) Count the number of ways to construct a basis for one vector at a lime.]
Z;,
Magic Squares

The engraving shown on page 465 is Albrecht Dürer's Melancholia I (1514). Among the many mathematical artifacts in this engraving is the chart of numbers that hangs on the wall in the upper right-hand corner. (It is enlarged in the detail shown.) Such an array of numbers is known as a magic square. We can think of it as a 4×4 matrix
$\begin{bmatrix} 16 & 3 & 2 & 13 \\ 5 & 10 & 11 & 8 \\ 9 & 6 & 7 & 12 \\ 4 & 15 & 14 & 1 \end{bmatrix}$
Observe that the numbers in each row, in each column, and in both diagonals have the same sum: 34. Observe further that the entries are the integers 1, 2, ..., 16. (Note that Dürer cleverly placed the 15 and 14 adjacent to each other in the last row, giving the date of the engraving.) These observations lead to the following definition.
Definition
An n×n matrix M is called a magic square if the sum of the entries is the same in each row, each column, and both diagonals. This common sum is called the weight of M, denoted wt(M). If M is an n×n magic square that contains each of the entries 1, 2, ..., n² exactly once, then M is called a classical magic square.
1. If M is a classical n×n magic square, show that
wt(M) = n(n² + 1)/2
(Hint: Use Exercise 45 in Section 2.4.)
2. Find a classical 3×3 magic square. Find a different one. Are your two examples related in any way?
3. Clearly, the 3×3 matrix with all entries equal to 1/3 is a magic square with weight 1. Using your answer to Problem 2, find a 3×3 magic square with weight 1, all of whose entries are different. Describe a method for constructing a 3×3 magic square with distinct entries and weight w for any real number w.
Let Magₙ denote the set of all n×n magic squares, and let Magₙ⁰ denote the set of all n×n magic squares of weight 0.
4. (a) Prove that Magₙ is a subspace of Mₙₙ.
(b) Prove that Magₙ⁰ is a subspace of Magₙ.
5. Use Problems 3 and 4 to show that if M is a 3×3 magic square with weight w, then we can write M as
M = M₀ + kJ
where M₀ is a 3×3 magic square of weight 0, J is the 3×3 matrix consisting entirely of 1s, and k is a scalar. What must k be? (Hint: Show that M − kJ is in Mag₃⁰ for an appropriate value of k.)
Now let
M = [ a b c ]
    [ d e f ]
    [ g h i ]
be a magic square with weight 0. The conditions on the rows, columns, and diagonals give rise to a system of eight homogeneous linear equations in the variables a, b, ..., i.
6. Write out this system of equations and solve it. (Note: Using a CAS will facilitate the computations.)
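For Problems 6 and 7, a CAS or NumPy can do the row reduction. The sketch below (our own setup, assuming NumPy is available) writes the eight conditions as a homogeneous 8×9 system in (a, b, ..., i) and reads off the dimension of the solution space from the rank:

```python
import numpy as np

# Variables ordered a, b, c, d, e, f, g, h, i (row by row of the 3x3 square).
A = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0, 0],   # a + b + c = 0
    [0, 0, 0, 1, 1, 1, 0, 0, 0],   # d + e + f = 0
    [0, 0, 0, 0, 0, 0, 1, 1, 1],   # g + h + i = 0
    [1, 0, 0, 1, 0, 0, 1, 0, 0],   # a + d + g = 0
    [0, 1, 0, 0, 1, 0, 0, 1, 0],   # b + e + h = 0
    [0, 0, 1, 0, 0, 1, 0, 0, 1],   # c + f + i = 0
    [1, 0, 0, 0, 1, 0, 0, 0, 1],   # a + e + i = 0
    [0, 0, 1, 0, 1, 0, 1, 0, 0],   # c + e + g = 0
], dtype=float)

print("dim Mag3^0 =", 9 - np.linalg.matrix_rank(A))   # the dimension asked for in Problem 7
```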
7. Find the dimension of Mag₃⁰. [Hint: By doing a substitution, if necessary, use your solution to Problem 6 to show that M can be written in the form
M = [  s    −s+t   −t  ]
    [ −s−t    0    s+t ]
    [  t     s−t   −s  ]  ]
8. Find the dimension of Mag₃. [Hint: Using Problems 5 and 7, show that every 3×3 magic square has the form
[ r+s     r−s+t   r−t   ]
[ r−s−t     r     r+s+t ]
[ r+t     r+s−t   r−s   ]  ]
9. Can you find a direct way of showing that the (2, 2) entry of a 3×3 magic square with weight w must be w/3? (Hint: Add and subtract certain rows, columns, and diagonals to leave a multiple of the central entry.)
10. Let M be a 3×3 magic square of weight 0, obtained from a classical 3×3 magic square as in Problem 5. If M has the form given in Problem 7, write out an equation for the sum of the squares of the entries of M. Show that this is the equation of a circle in the variables s and t, and carefully plot it. Show that there are exactly eight points (s, t) on this circle with both s and t integers. Using Problem 8, show that these eight points give rise to eight classical 3×3 magic squares. How are these magic squares related to one another?
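Problem 10 can also be explored computationally. Assuming the parametrization from Problem 7 and k = w/3 = 5 from Problem 5, the sum of the squares of the entries of M₀ works out to 6(s² + t²), and equating this to (1−5)² + (2−5)² + ⋯ + (9−5)² = 60 gives the circle s² + t² = 10. A hedged Python sketch (our own code) enumerates its integer points and the resulting classical squares:

```python
def is_classical_3x3(M):
    entries = sorted(x for row in M for x in row)
    if entries != list(range(1, 10)):
        return False
    sums = [sum(r) for r in M]
    sums += [sum(M[i][j] for i in range(3)) for j in range(3)]
    sums += [M[0][0] + M[1][1] + M[2][2], M[0][2] + M[1][1] + M[2][0]]
    return len(set(sums)) == 1

squares = []
for s in range(-4, 5):
    for t in range(-4, 5):
        if s * s + t * t == 10:                                   # integer points on the circle
            M0 = [[s, -s + t, -t], [-s - t, 0, s + t], [t, s - t, -s]]   # Problem 7 form
            M = [[x + 5 for x in row] for row in M0]              # add 5J, as in Problem 5
            if is_classical_3x3(M):
                squares.append(M)

print(len(squares))   # 8: each is a rotation or reflection of the others
```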
Change of Basis
In many applications, a problem described using one coordinate system may be solved more easily by switching to a new coordinate system. This switch is usually accomplished by performing a change of variables, a process that you have probably encountered in other mathematics courses. In linear algebra, a basis provides us with a coordinate system for a vector space, via the notion of coordinate vectors. Choosing the right basis will often greatly simplify a particular problem. For example, consider the molecular structure of zinc, shown in Figure 6.3(a). A scientist studying zinc might wish to measure the lengths of the bonds between the atoms, the angles between these bonds, and so on. Such an analysis will be greatly facilitated by introducing coordinates and making use of the tools of linear algebra. The standard basis and the associated standard xyz-coordinate axes are not always the best choice. As Figure 6.3(b) shows, in this case {u, v, w} is probably a better choice of basis for ℝ³ than the standard basis, since these vectors align nicely with the bonds between the atoms of zinc.
[Figure 6.3: (a) the molecular structure of zinc; (b) the basis {u, v, w} aligned with the bonds]
Change-of-Basis Matrices
Figure 6.4 shows two different coordinate systems for ℝ², each arising from a different basis. Figure 6.4(a) shows the coordinate system related to the basis B = {u₁, u₂}, while Figure 6.4(b) arises from the basis C = {v₁, v₂}, where u₁, u₂, v₁, and v₂ are as shown in Figure 6.4. The same vector x is shown relative to each coordinate system. It is clear from the diagrams that the coordinate vectors of x with respect to B and C are
[x]_B = [1; 3]   and   [x]_C = [6; −1]
respectively. It turns out that there is a direct connection between the two coordinate vectors. One way to find the relationship is to use [x]_B to compute x = u₁ + 3u₂ explicitly.
[Figure 6.4: the vector x relative to (a) the basis B and (b) the basis C]
Then we can find [x]_C by writing x as a linear combination of v₁ and v₂. However, there is a better way to proceed, one that will provide us with a general mechanism for such problems. We illustrate this approach in the next example.
Example 6.45
Using the bases B and C above, find [x]_C, given that [x]_B = [1; 3].
Solution  Since x = u₁ + 3u₂, writing u₁ and u₂ in terms of v₁ and v₂ will give us the required coordinates of x with respect to C. From Figure 6.4, we find that
u₁ = −3v₁ + 2v₂   and   u₂ = 3v₁ − v₂
so
x = u₁ + 3u₂ = (−3v₁ + 2v₂) + 3(3v₁ − v₂) = 6v₁ − v₂
This gives
[x]_C = [6; −1]
in agreement with Figure 6.4(b).
This method may not look any easier than the one suggested prior to Example 6.45, but it has one big advantage: We can now find [y]_C from [y]_B for any vector y in ℝ²
with very little additional work. Let's look at the calculations in Example 6.45 from a different point of view. From x = u₁ + 3u₂, we have
[x]_C = [u₁ + 3u₂]_C = [u₁]_C + 3[u₂]_C
by Theorem 6.6. Thus,
[x]_C = [ [u₁]_C  [u₂]_C ][1; 3] = [ −3  3 ; 2  −1 ][1; 3] = P[x]_B
where P is the matrix whose columns are [u₁]_C and [u₂]_C. This procedure generalizes very nicely.
Definition
Let B = {u₁, ..., uₙ} and C = {v₁, ..., vₙ} be bases for a vector space V. The n×n matrix whose columns are the coordinate vectors [u₁]_C, ..., [uₙ]_C of the vectors in B with respect to C is denoted by P_{C←B} and is called the change-of-basis matrix from B to C. That is,
P_{C←B} = [ [u₁]_C  [u₂]_C  ⋯  [uₙ]_C ]
Think of B as the "old" basis and C as the "new" basis. Then the columns of P_{C←B} are just the coordinate vectors obtained by writing the old basis vectors in terms of the new ones. Theorem 6.12 shows that Example 6.45 is a special case of a general result.
Theorem 6.12
Let B = {u₁, ..., uₙ} and C = {v₁, ..., vₙ} be bases for a vector space V and let P_{C←B} be the change-of-basis matrix from B to C. Then
a. P_{C←B}[x]_B = [x]_C for all x in V.
b. P_{C←B} is the unique matrix P with the property that P[x]_B = [x]_C for all x in V.
c. P_{C←B} is invertible and (P_{C←B})⁻¹ = P_{B←C}.
Proof
(a) Let x be in V and let
[x]_B = [c₁; ⋯; cₙ]
That is, x = c₁u₁ + ⋯ + cₙuₙ. Then
[x]_C = [c₁u₁ + ⋯ + cₙuₙ]_C = c₁[u₁]_C + ⋯ + cₙ[uₙ]_C = [ [u₁]_C  ⋯  [uₙ]_C ][c₁; ⋯; cₙ] = P_{C←B}[x]_B
for all x in V.
(b) Suppose that P is an n×n matrix with the property that P[x]_B = [x]_C for all x in V. Taking x = uᵢ, the ith basis vector in B, we see that [x]_B = [uᵢ]_B = eᵢ, so the ith column of P is
pᵢ = Peᵢ = P[uᵢ]_B = [uᵢ]_C
which is the ith column of P_{C←B}, by definition. It follows that P = P_{C←B}.
(c) Since {u₁, ..., uₙ} is linearly independent in V, the set {[u₁]_C, ..., [uₙ]_C} is linearly independent in ℝⁿ, by Theorem 6.7. Hence, P_{C←B} = [ [u₁]_C  ⋯  [uₙ]_C ] is invertible, by the Fundamental Theorem. For all x in V, we have P_{C←B}[x]_B = [x]_C. Solving for [x]_B, we find that
[x]_B = (P_{C←B})⁻¹[x]_C
for all x in V. Therefore, (P_{C←B})⁻¹ is a matrix that changes bases from C to B. Thus, by the uniqueness property (b), we must have (P_{C←B})⁻¹ = P_{B←C}.
Remarks
• You may find it helpful to think of change of basis as a transformation (indeed, it is a linear transformation) from ℝⁿ to itself that simply switches from one coordinate system to another. The transformation corresponding to P_{C←B} accepts [x]_B as input and returns [x]_C as output; (P_{C←B})⁻¹ = P_{B←C} does just the opposite. Figure 6.5 gives a schematic representation of the process.
[Figure 6.5: Change of basis — multiplication by P_{C←B} sends [x]_B to [x]_C, and multiplication by P_{B←C} = (P_{C←B})⁻¹ sends [x]_C back to [x]_B]
• The columns of P_{C←B} are the coordinate vectors of one basis with respect to the other basis. To remember which basis is which, think of the notation C ← B as saying "B in terms of C." It is also helpful to remember that P_{C←B}[x]_B is a linear combination of the columns of P_{C←B}. But since the result of this combination is [x]_C, the columns of P_{C←B} must themselves be coordinate vectors with respect to C.
Example 6.46
Find the change-of-basis matrices P_{C←B} and P_{B←C} for the bases B = {1, x, x²} and C = {1 + x, x + x², 1 + x²} of 𝒫₂. Then find the coordinate vector of p(x) = 1 + 2x − x² with respect to C.
Solution  Changing to a standard basis is easy, so we find P_{B←C} first. Observe that the coordinate vectors for C in terms of B are
[1 + x]_B = [1; 1; 0],   [x + x²]_B = [0; 1; 1],   [1 + x²]_B = [1; 0; 1]
(Look back at the Remark following Example 6.26.) It follows that
P_{B←C} = [ 1 0 1 ]
          [ 1 1 0 ]
          [ 0 1 1 ]
To find P_{C←B}, we could express each vector in B as a linear combination of the vectors in C (do this), but it is much easier to use the fact that P_{C←B} = (P_{B←C})⁻¹, by Theorem 6.12(c). We find that
P_{C←B} = (P_{B←C})⁻¹ = [  1/2  1/2 −1/2 ]
                        [ −1/2  1/2  1/2 ]
                        [  1/2 −1/2  1/2 ]
It now follows that
[p(x)]_C = P_{C←B}[p(x)]_B = [  1/2  1/2 −1/2 ] [ 1 ]   [ 2 ]
                             [ −1/2  1/2  1/2 ] [ 2 ] = [ 0 ]
                             [  1/2 −1/2  1/2 ] [−1 ]   [−1 ]
which agrees with Example 6.37.
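A quick numerical check of Example 6.46 can be done with NumPy (our own script, not part of the text): the columns of P_{B←C} are the coordinate vectors of C in the standard basis {1, x, x²}, and inverting it gives P_{C←B}.

```python
import numpy as np

# Coordinates of C = {1 + x, x + x^2, 1 + x^2} with respect to B = {1, x, x^2}
P_B_from_C = np.array([[1, 0, 1],
                       [1, 1, 0],
                       [0, 1, 1]], dtype=float)
P_C_from_B = np.linalg.inv(P_B_from_C)      # Theorem 6.12(c)

p_B = np.array([1, 2, -1], dtype=float)     # p(x) = 1 + 2x - x^2 in the basis B
print(P_C_from_B @ p_B)                      # [ 2.  0. -1.], as in the example
```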
Remark  If we do not need P_{C←B} explicitly, we can find [p(x)]_C from [p(x)]_B and P_{B←C} using Gaussian elimination: row reduction of the augmented matrix [P_{B←C} | [p(x)]_B] produces [I | [p(x)]_C]. (See the next subsection on using Gauss-Jordan elimination.)
It is worth repeating the observation in Example 6.46: Changing to a standard basis is easy. If E is the standard basis for a vector space V and B is any other basis, then the columns of P_{E←B} are the coordinate vectors of B with respect to E, and these are usually "visible." We make use of this observation again in the next example.
Example 6.47
In M₂₂, let B be the basis {E₁₁, E₂₁, E₁₂, E₂₂} and let C be the basis {A, B, C, D}, where
A = [ 1 0 ; 0 0 ],   B = [ 1 1 ; 0 0 ],   C = [ 1 1 ; 1 0 ],   D = [ 1 1 ; 1 1 ]
Find the change-of-basis matrix P_{C←B} and verify that [X]_C = P_{C←B}[X]_B for X = [ 1 2 ; 3 4 ].
Solution 1  To solve this problem directly, we must find the coordinate vectors of B with respect to C. This involves solving four linear combination problems of the form X = aA + bB + cC + dD, where X is in B and we must find a, b, c, and d. However, here we are lucky, since we can find the required coefficients by inspection. Clearly, E₁₁ = A, E₂₁ = −B + C, E₁₂ = −A + B, and E₂₂ = −C + D. Thus,
[E₁₁]_C = [1; 0; 0; 0],   [E₂₁]_C = [0; −1; 1; 0],   [E₁₂]_C = [−1; 1; 0; 0],   [E₂₂]_C = [0; 0; −1; 1]
so
P_{C←B} = [ 1  0 −1  0 ]
          [ 0 −1  1  0 ]
          [ 0  1  0 −1 ]
          [ 0  0  0  1 ]
If X = [ 1 2 ; 3 4 ], then [X]_B = [1; 3; 2; 4] and
P_{C←B}[X]_B = [ 1  0 −1  0 ] [ 1 ]   [ −1 ]
               [ 0 −1  1  0 ] [ 3 ] = [ −1 ]
               [ 0  1  0 −1 ] [ 2 ]   [ −1 ]
               [ 0  0  0  1 ] [ 4 ]   [  4 ]
This is the coordinate vector with respect to C of the matrix
−A − B − C + 4D = −[ 1 0 ; 0 0 ] − [ 1 1 ; 0 0 ] − [ 1 1 ; 1 0 ] + 4[ 1 1 ; 1 1 ] = [ 1 2 ; 3 4 ] = X
as it should be.
Solution 2  We can compute P_{C←B} in a different way, as follows. As you will be asked to prove in Exercise 21, if E is another basis for M₂₂, then P_{C←B} = P_{C←E}P_{E←B} = (P_{E←C})⁻¹P_{E←B}. If E is the standard basis, then P_{E←B} and P_{E←C} can be found by inspection. We have
P_{E←B} = [ 1 0 0 0 ]            P_{E←C} = [ 1 1 1 1 ]
          [ 0 0 1 0 ]    and               [ 0 1 1 1 ]
          [ 0 1 0 0 ]                      [ 0 0 1 1 ]
          [ 0 0 0 1 ]                      [ 0 0 0 1 ]
(Do you see why?) Therefore,
P_{C←B} = (P_{E←C})⁻¹P_{E←B}
        = [ 1 −1  0  0 ] [ 1 0 0 0 ]   [ 1  0 −1  0 ]
          [ 0  1 −1  0 ] [ 0 0 1 0 ] = [ 0 −1  1  0 ]
          [ 0  0  1 −1 ] [ 0 1 0 0 ]   [ 0  1  0 −1 ]
          [ 0  0  0  1 ] [ 0 0 0 1 ]   [ 0  0  0  1 ]
which agrees with the first solution.
Remark  The second method has the advantage of not requiring the computation of any linear combinations. It has the disadvantage of requiring that we find a matrix inverse. However, using a CAS will facilitate finding a matrix inverse, so in general the second method is preferable to the first. For certain problems, though, the first method may be just as easy to use. In any event, we are about to describe yet a third approach, which you may find best of all.
The Gauss-Jordan Method for Computing a Change-of-Basis Matrix
Finding the change-of-basis matrix to a standard basis is easy and can be done by inspection. Finding the change-of-basis matrix from a standard basis is almost as easy, but requires the calculation of a matrix inverse, as in Example 6.46. If we do it by hand, then (except for the 2×2 case) we will usually find the necessary inverse by Gauss-Jordan elimination. We now look at a modification of the Gauss-Jordan method that can be used to find the change-of-basis matrix between two nonstandard bases, as in Example 6.47.
Suppose B = {u₁, ..., uₙ} and C = {v₁, ..., vₙ} are bases for a vector space V and P_{C←B} is the change-of-basis matrix from B to C. The ith column of P_{C←B} is
[uᵢ]_C = [p₁ᵢ; ⋯; pₙᵢ]
so uᵢ = p₁ᵢv₁ + ⋯ + pₙᵢvₙ. If E is any basis for V, then
[uᵢ]_E = [p₁ᵢv₁ + ⋯ + pₙᵢvₙ]_E = p₁ᵢ[v₁]_E + ⋯ + pₙᵢ[vₙ]_E
This can be rewritten in matrix form as
[ [v₁]_E  ⋯  [vₙ]_E ][p₁ᵢ; ⋯; pₙᵢ] = [uᵢ]_E
which we can solve by applying Gauss-Jordan elimination to the augmented matrix
[ [v₁]_E  ⋯  [vₙ]_E | [uᵢ]_E ]
There are n such systems of equations to be solved, one for each column of P_{C←B}, but the coefficient matrix [ [v₁]_E ⋯ [vₙ]_E ] is the same in each case. Hence, we can solve all the systems simultaneously by row reducing the n×2n augmented matrix
[ [v₁]_E  ⋯  [vₙ]_E | [u₁]_E  ⋯  [uₙ]_E ] = [C | B]
Since {v₁, ..., vₙ} is linearly independent, so is {[v₁]_E, ..., [vₙ]_E}, by Theorem 6.7. Therefore, the matrix C whose columns are [v₁]_E, ..., [vₙ]_E has the n×n identity matrix I for its reduced row echelon form, by the Fundamental Theorem. It follows that Gauss-Jordan elimination will necessarily produce
[C | B] → [I | P]
where P = P_{C←B}. We have proved the following theorem.
Theorem 6.13
Let B = {u₁, ..., uₙ} and C = {v₁, ..., vₙ} be bases for a vector space V. Let B = [ [u₁]_E ⋯ [uₙ]_E ] and C = [ [v₁]_E ⋯ [vₙ]_E ], where E is any basis for V. Then row reduction applied to the n×2n augmented matrix [C | B] produces
[C | B] → [I | P_{C←B}]
If E is a standard basis, this method is particularly easy to use, since in that case B = P_{E←B} and C = P_{E←C}. We illustrate this method by reworking the problem in Example 6.47.
Example 6.48
Rework Example 6.47 using the Gauss-Jordan method.
Solution  Taking E to be the standard basis for M₂₂, we see that
B = P_{E←B} = [ 1 0 0 0 ]            C = P_{E←C} = [ 1 1 1 1 ]
              [ 0 0 1 0 ]    and                   [ 0 1 1 1 ]
              [ 0 1 0 0 ]                          [ 0 0 1 1 ]
              [ 0 0 0 1 ]                          [ 0 0 0 1 ]
Row reduction produces
[C | B] = [ 1 1 1 1 | 1 0 0 0 ]       [ 1 0 0 0 | 1  0 −1  0 ]
          [ 0 1 1 1 | 0 0 1 0 ]   →   [ 0 1 0 0 | 0 −1  1  0 ]
          [ 0 0 1 1 | 0 1 0 0 ]       [ 0 0 1 0 | 0  1  0 −1 ]
          [ 0 0 0 1 | 0 0 0 1 ]       [ 0 0 0 1 | 0  0  0  1 ]
(Check.) It follows that
P_{C←B} = [ 1  0 −1  0 ]
          [ 0 −1  1  0 ]
          [ 0  1  0 −1 ]
          [ 0  0  0  1 ]
as we found before.
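Row reducing [C | B] to [I | P] amounts to solving the matrix equation CP = B, which NumPy can do directly. A short sketch of Example 6.48 (our own variable names, assuming NumPy):

```python
import numpy as np

# With E the standard basis {E11, E12, E21, E22} of M22:
B = np.array([[1, 0, 0, 0],     # columns: [E11]_E, [E21]_E, [E12]_E, [E22]_E
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)
C = np.array([[1, 1, 1, 1],     # columns: [A]_E, [B]_E, [C]_E, [D]_E
              [0, 1, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)

P = np.linalg.solve(C, B)       # same P_{C<-B} produced by row reducing [C | B]
print(P)

x_B = np.array([1, 3, 2, 4], dtype=float)   # [X]_B for X = [1 2; 3 4]
print(P @ x_B)                               # [-1. -1. -1.  4.] = [X]_C
```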
Exercises 6.3
In Exercises 1–4: (a) Find the coordinate vectors [x]_B and [x]_C of x with respect to the bases B and C, respectively. (b) Find the change-of-basis matrix P_{C←B} from B to C. (c) Use your answer to part (b) to compute [x]_C, and compare your answer with the one found in part (a). (d) Find the change-of-basis matrix P_{B←C} from C to B. (e) Use your answers to parts (c) and (d) to compute [x]_B, and compare your answer with the one found in part (a).
3. x =
1
1
o
0 ,6 =
0 , 0
1 , 0
-1 1 c~
1
,
1
0
0
1
, 0
1
1
3
4.x =
I ,8
5 c ~
1
I ,
o,
0
0
1
0
0
I ,
I , 0
0
1
Pco- B
I
in
~j
1
=[_: -~ J
16. Let Band C be bases forQJ> 2. If 6 = {x, I + X, I - x + xl} and t he change-of-basis matrix from B to C is 100 021 - I
I
I
find C. Xl,
X + x 2},
C= {I, 1 + x,xl}in QJ>2
III calmlus, you leam that a Taylor polynomial of degree n
In Exercises 9 and 10, follow the instructiolls for Exercises 1-4 IIsing A instead of x.
~ {[~
{ [ ~], [ ~]} and
find 6 .
8.p(x) = 4 - 2x- x 2,6= {x, 1+
C
III Exercises 11 and 12,jollow the Im tructions for Exercises 1-4 IIsingf(x) ills/ead of x. 11. f(x) = 2 sin x - 3 cos x, 6 = {sin x + cos X, cos xl, C = {sin x, cos x} in span(sin x, cos x) 12. f(x) = sin x, B = {sin x + cos X, cosx},C = {cosx sin x, sin x + cos x} in span (sin x, cos x)
the change-of-basis matrix from B to C is
,
C={I,x,x Z}in~2
o
~ {[~ ~]. [~ C ~ {[~ :J. [ :
B
15. U=I Band C be bases for R'. If C =
III Exercises~. follow the imtructions for Exercises 1-4 IIsing p(x) instead of x. 5. p(x) = 2 - x,6 = {1,x},C = {x, I + x} in 9J'1 6.p(x) = I +3x,6 = {l +x,1 - x},C={2x,4}in Q/'1 2 2 7 . p(x) = I + x , 6 = {I + x + X l, X + X Z, x },
9. A = [ 4
:].
14. Repeat Exercise 13 with θ = 135°.
in W
o
1
,
1
0 ~
13. Rotate the xy-axes in the plane counterclockwise through an angle θ = 60° to obtain new x′y′-axes. Use the methods of this section to find (a) the x′y′-coordinates of the point whose xy-coordinates are (3, 2) and (b) the xy-coordinates of the point whose x′y′-coordinates are (4, −4).
0
o
IO. A=[ :
about a is a polYllomial of the form
2 ], 6 = the standard basis,
ao + aj(x - a) + a,(x - a)l + ... + a,,(x where an *" o. In other words, it is a polynomial/hat has
-:J. [:
beell expanded ill terms of powers of x - a illstead of powers of x. Taylor polYllomials are very useful for approximating futlc/ions that are "well behaved" lIear x = a.
- 1
~J. [~
:J. [~
~]} in M"
p(x)
=
at
The set B = {1, x − a, (x − a)², ..., (x − a)ⁿ} is a basis for 𝒫ₙ for any real number a. (Do you see a quick way to show this? Try using Theorem 6.7.) This fact allows us to use the techniques of this section to rewrite a polynomial as a Taylor polynomial about a given a.
17. Express p(x) = 1 + 2x − 5x² as a Taylor polynomial about a = 1.
18. Express p(x) = 1 + 2x − 5x² as a Taylor polynomial about a = −2.
19. Express p(x) = x³ as a Taylor polynomial about a = −1.
20. Express p(x) = x³ as a Taylor polynomial about a = 1/2.
21. Let B, C, and D be bases for a finite-dimensional vector space V. Prove that
P_{D←B} = P_{D←C}P_{C←B}
22. Let V be an n-dimensional vector space with basis B = {v₁, ..., vₙ}. Let P be an invertible n×n matrix and set
uᵢ = p₁ᵢv₁ + ⋯ + pₙᵢvₙ
for i = 1, ..., n. Prove that C = {u₁, ..., uₙ} is a basis for V and show that P = P_{B←C}.
Linear Transformations
We encountered linear transformations in Section 3.6 in the context of matrix transformations from ℝⁿ to ℝᵐ. In this section, we extend this concept to linear transformations between arbitrary vector spaces.
Definition
A linear transformation from a vector space V to a vector space W is a mapping T: V → W such that, for all u and v in V and for all scalars c,
1. T(u + v) = T(u) + T(v)
2. T(cu) = cT(u)
It is straightforward to show that this definition is equivalent to the requirement that T preserve all linear combinations. That is, T: V → W is a linear transformation if and only if
T(c₁v₁ + ⋯ + cₖvₖ) = c₁T(v₁) + ⋯ + cₖT(vₖ)
for all v₁, ..., vₖ in V and scalars c₁, ..., cₖ.
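The two defining conditions are easy to spot-check numerically. The following Python sketch (our own test harness, assuming NumPy) probes a map with random vectors; it can refute linearity but, of course, cannot prove it:

```python
import numpy as np

def looks_linear(T, dim, trials=100, tol=1e-9):
    """Spot-check T(u + v) = T(u) + T(v) and T(c u) = c T(u) on random inputs."""
    rng = np.random.default_rng(0)
    for _ in range(trials):
        u, v = rng.standard_normal(dim), rng.standard_normal(dim)
        c = rng.standard_normal()
        if not np.allclose(T(u + v), T(u) + T(v), atol=tol):
            return False
        if not np.allclose(T(c * u), c * T(u), atol=tol):
            return False
    return True

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(looks_linear(lambda x: A @ x, 2))     # True: a matrix transformation
print(looks_linear(lambda x: x + 1.0, 2))   # False: translation fails both conditions
```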
Example 6.49
Every matrix transformation is a linear transformation. That is, if A is an m×n matrix, then the transformation T_A: ℝⁿ → ℝᵐ defined by
T_A(x) = Ax   for x in ℝⁿ
is a linear transformation. This is a restatement of Theorem 3.30.
Example 6.50
Define T: Mₙₙ → Mₙₙ by T(A) = Aᵀ. Show that T is a linear transformation.
Solution  We check that, for A and B in Mₙₙ and scalars c,
T(A + B) = (A + B)ᵀ = Aᵀ + Bᵀ = T(A) + T(B)
and
T(cA) = (cA)ᵀ = cAᵀ = cT(A)
Therefore, T is a linear transformation.
Example 6.51
Let D be the differential operator D: 𝒟 → ℱ defined by D(f) = f′. Show that D is a linear transformation.
Solution  Let f and g be differentiable functions and let c be a scalar. Then, from calculus, we know that
D(f + g) = (f + g)′ = f′ + g′ = D(f) + D(g)
and
D(cf) = (cf)′ = cf′ = cD(f)
Hence, D is a linear transformation.
In calculus, you learn that every continuous function on [a, b] is integrable. The next example shows that integration is a linear transformation.
Example 6.52
Define S: 𝒞[a, b] → ℝ by S(f) = ∫ₐᵇ f(x) dx. Show that S is a linear transformation.
Solution  Let f and g be in 𝒞[a, b]. Then
S(f + g) = ∫ₐᵇ (f + g)(x) dx = ∫ₐᵇ (f(x) + g(x)) dx = ∫ₐᵇ f(x) dx + ∫ₐᵇ g(x) dx = S(f) + S(g)
and
S(cf) = ∫ₐᵇ (cf)(x) dx = ∫ₐᵇ cf(x) dx = c ∫ₐᵇ f(x) dx = cS(f)
It follows that S is linear.
Example 6.53
Show that none of the following transformations is linear:
(a) T: M₂₂ → ℝ defined by T(A) = det A
(b) T: ℝ → ℝ defined by T(x) = 2ˣ
(c) T: ℝ → ℝ defined by T(x) = x + 1
Solution  In each case, we give a specific counterexample to show that one of the properties of a linear transformation fails to hold.
(a) Let A = [ 1 0 ; 0 0 ] and B = [ 0 0 ; 0 1 ]. Then A + B = [ 1 0 ; 0 1 ], so
T(A + B) = det(A + B) = 1
But
T(A) + T(B) = det A + det B = 0 + 0 = 0
so T(A + B) ≠ T(A) + T(B), and T is not linear.
(b) Let x = 1 and y = 2. Then
T(x + y) = T(3) = 2³ = 8 ≠ 6 = 2¹ + 2² = T(x) + T(y)
so T is not linear.
(c) Let x = 1 and y = 2. Then
T(x + y) = T(3) = 3 + 1 = 4 ≠ 5 = (1 + 1) + (2 + 1) = T(x) + T(y)
Therefore, T is not linear.
Remark  Example 6.53(c) shows that you need to be careful when you encounter the word "linear." As a function, T(x) = x + 1 is linear, since its graph is a straight line. However, it is not a linear transformation from the vector space ℝ to itself, since it fails to satisfy the definition. (Which linear functions from ℝ to ℝ will also be linear transformations?)
There are two special linear transformations that deserve to be singled out.
Example 6.54
(a) For any vector spaces V and W, the transformation T₀: V → W that maps every vector in V to the zero vector in W is called the zero transformation. That is,
T₀(v) = 0   for all v in V
(b) For any vector space V, the transformation I: V → V that maps every vector in V to itself is called the identity transformation. That is,
I(v) = v   for all v in V
(If it is important to identify the vector space V, we may write I_V for clarity.) The proofs that the zero and identity transformations are linear are left as easy exercises.
Properties of Linear Transformations
In Chapter 3, all linear transformations were matrix transformations, and their properties were directly related to properties of the matrices involved. The following theorem is easy to prove for matrix transformations. (Do it!) The full proof for linear transformations in general takes a bit more care, but it is still straightforward.
Theorem 6.14
Let T: V → W be a linear transformation. Then
a. T(0) = 0
b. T(−v) = −T(v) for all v in V.
c. T(u − v) = T(u) − T(v) for all u and v in V.
Proof  We prove properties (a) and (c) and leave the proof of property (b) for Exercise 21.
(a) Let v be any vector in V. Then T(0) = T(0v) = 0T(v) = 0, as required. (Can you give a reason for each step?)
(c) T(u − v) = T(u + (−1)v) = T(u) + (−1)T(v) = T(u) − T(v)
Remark  Property (a) can be useful in showing that certain transformations are not linear. As an illustration, consider Example 6.53(b). If T(x) = 2ˣ, then T(0) = 2⁰ = 1 ≠ 0, so T is not linear, by Theorem 6.14(a). Be warned, however, that there are lots of transformations that do map the zero vector to the zero vector but that are still not linear. Example 6.53(a) is a case in point: The zero vector is the 2×2 zero matrix O, so T(O) = det O = 0, but we have seen that T(A) = det A is not linear.
The most important property of a linear transformation T: V → W is that T is completely determined by its effect on a basis for V. The next example shows what this means.
Example 6.55
Suppose T is a linear transformation from ℝ² to 𝒫₂ such that
T[1; 1] = 2 − 3x + x²   and   T[2; 3] = 1 − x²
Find T[−1; 2] and T[a; b].
Solution  Since B = {[1; 1], [2; 3]} is a basis for ℝ² (why?), every vector in ℝ² is in span(B). Solving
[−1; 2] = c₁[1; 1] + c₂[2; 3]
we find that c₁ = −7 and c₂ = 3. Therefore,
T[−1; 2] = T(−7[1; 1] + 3[2; 3]) = −7T[1; 1] + 3T[2; 3] = −7(2 − 3x + x²) + 3(1 − x²) = −11 + 21x − 10x²
Similarly, we discover that
[a; b] = (3a − 2b)[1; 1] + (b − a)[2; 3]
so
T[a; b] = (3a − 2b)T[1; 1] + (b − a)T[2; 3] = (3a − 2b)(2 − 3x + x²) + (b − a)(1 − x²) = (5a − 3b) + (−9a + 6b)x + (4a − 3b)x²
(Note that by setting a = −1 and b = 2, we recover the solution T[−1; 2] = −11 + 21x − 10x².)
The proof of the general theorem is quite straightforward.
Theorem 6.15
Let T: V → W be a linear transformation and let B = {v₁, ..., vₙ} be a spanning set for V. Then T(B) = {T(v₁), ..., T(vₙ)} spans the range of T.
Proof  The range of T is the set of all vectors in W that are of the form T(v), where v is in V. Let T(v) be in the range of T. Since B spans V, there are scalars c₁, ..., cₙ such that
v = c₁v₁ + ⋯ + cₙvₙ
Applying T and using the fact that it is a linear transformation, we see that
T(v) = T(c₁v₁ + ⋯ + cₙvₙ) = c₁T(v₁) + ⋯ + cₙT(vₙ)
In other words, T(v) is in span(T(B)), as required.
Theorem 6.15 applies, in particular, when B is a basis for V. You might guess that, in this case, T(B) would then be a basis for the range of T. Unfortunately, this is not always the case. We will address this issue in Section 6.5.
Composition of Linear Transformations
In Section 3.6, we defined the composition of matrix transformations. The definition extends to general linear transformations in an obvious way.
Definition
If T: U → V and S: V → W are linear transformations, then the composition of S with T is the mapping S ∘ T, defined by
(S ∘ T)(u) = S(T(u))
where u is in U. (S ∘ T is read "S of T.")
Observe that S ∘ T is a mapping from U to W (see Figure 6.6). Notice also that for the definition to make sense, the range of T must be contained in the domain of S.
[Figure 6.6: Composition of linear transformations — u in U is sent to T(u) in V, which is sent to S(T(u)) = (S ∘ T)(u) in W]
Example 6.56
Let T: ℝ² → 𝒫₁ and S: 𝒫₁ → 𝒫₂ be the linear transformations defined by
T[a; b] = a + (a + b)x   and   S(p(x)) = xp(x)
Find (S ∘ T)[3; −2] and (S ∘ T)[a; b].
Solution  We compute
(S ∘ T)[3; −2] = S(T[3; −2]) = S(3 + (3 − 2)x) = S(3 + x) = x(3 + x) = 3x + x²
and
(S ∘ T)[a; b] = S(T[a; b]) = S(a + (a + b)x) = x(a + (a + b)x) = ax + (a + b)x²
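Since composition is just function composition, the maps of Example 6.56 can be coded directly, with a polynomial a₀ + a₁x + ⋯ stored as the coefficient list [a₀, a₁, ...] (a small sketch of our own):

```python
def T(a, b):
    # T[a; b] = a + (a + b)x
    return [a, a + b]

def S(p):
    # S(p(x)) = x p(x): shift every coefficient up by one degree
    return [0] + p

def compose(f, g):
    return lambda *args: f(g(*args))

S_after_T = compose(S, T)
print(S_after_T(3, -2))   # [0, 3, 1], i.e. 3x + x^2, as computed in the example
```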
Chapter 3 showed that the composition of two matrix transformations was another matrix transformation. In general, we have the following theorem.
Theorem 6.16
If T: U → V and S: V → W are linear transformations, then S ∘ T: U → W is a linear transformation.
Proof  Let u and v be in U and let c be a scalar. Then
(S ∘ T)(u + v) = S(T(u + v)) = S(T(u) + T(v))   since T is linear
              = S(T(u)) + S(T(v))               since S is linear
              = (S ∘ T)(u) + (S ∘ T)(v)
and
(S ∘ T)(cu) = S(T(cu)) = S(cT(u))   since T is linear
            = cS(T(u))              since S is linear
            = c(S ∘ T)(u)
Therefore, S ∘ T is a linear transformation.
The algebraic properties of linear transformations mirror those of matrix transformations, which, in turn, are related to the algebraic properties of matrices. For example, composition of linear transformations is associative: R ∘ (S ∘ T) = (R ∘ S) ∘ T, provided these compositions make sense. The proof of this property is identical to that given in Section 3.6.
The next example gives another useful (but not surprising) property of linear transformations.
Example 6.57
Let S: U → V and T: V → W be linear transformations and let I: V → V be the identity transformation. Then for every v in V, we have
(T ∘ I)(v) = T(I(v)) = T(v)
Since T ∘ I and T have the same value at every v in their domain, it follows that T ∘ I = T. Similarly, I ∘ S = S.
Remark  The method of Example 6.57 is worth noting. Suppose we want to show that two linear transformations T₁ and T₂ (both from V to W) are equal. It suffices to show that T₁(v) = T₂(v) for every v in V.
Further properties of linear transformations are explored in the exercises.
Inverses of Linear Transformations
Definition
A linear transformation T: V → W is invertible if there is a linear transformation T′: W → V such that
T′ ∘ T = I_V   and   T ∘ T′ = I_W
In this case, T′ is called an inverse for T.
Remarks
• The domain V and codomain W of T do not have to be the same, as they do in the case of invertible matrix transformations. However, we will see in the next section that V and W must be very closely related.
• The requirement that T′ be linear could have been omitted from this definition. For, as we will see in Theorem 6.24, if T′ is any mapping from W to V such that T′ ∘ T = I_V and T ∘ T′ = I_W, then T′ is forced to be linear as well.
• If T′ is an inverse for T, then the definition implies that T is an inverse for T′. Hence, T′ is invertible too.
Example 6.58
Verify that the mappings T: ℝ² → 𝒫₁ and T′: 𝒫₁ → ℝ² defined by
T[a; b] = a + (a + b)x   and   T′(c + dx) = [c; d − c]
are inverses.
Solution  We compute
(T′ ∘ T)[a; b] = T′(a + (a + b)x) = [a; (a + b) − a] = [a; b]
and
(T ∘ T′)(c + dx) = T(T′(c + dx)) = T[c; d − c] = c + (c + (d − c))x = c + dx
Hence, T′ ∘ T = I_{ℝ²} and T ∘ T′ = I_{𝒫₁}. Therefore, T and T′ are inverses of each other.
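With respect to the standard bases of ℝ² and 𝒫₁, the maps of Example 6.58 are represented by 2×2 matrices, and the inverse relationship can be confirmed numerically (a hedged sketch; the matrices below are simply read off from the formulas):

```python
import numpy as np

M_T  = np.array([[1, 0],     # T[a; b] = a + (a + b)x : columns are T(e1), T(e2) in the basis {1, x}
                 [1, 1]])
M_Tp = np.array([[ 1, 0],    # T'(c + dx) = [c; d - c] : columns are T'(1), T'(x)
                 [-1, 1]])

print(M_Tp @ M_T)   # identity matrix: T' o T = I on R^2
print(M_T @ M_Tp)   # identity matrix: T o T' = I on P_1
```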
As was the case for invertible matrices, inverses of linear transformations are unique if they exist. The following theorem is the analogue of Theorem 3.6.
Theorem 6.17
If T is an invertible linear transformation, then its inverse is unique.
Proof  The proof is the same as that of Theorem 3.6, with products of matrices replaced by compositions of linear transformations. (You are asked to complete this proof in Exercise 31.)
Thanks to Theorem 6.17, if T is invertible, we can refer to the inverse of T. It will be denoted by T⁻¹ (pronounced "T inverse"). In the next two sections, we will address the issue of determining when a given linear transformation is invertible and finding its inverse when it exists.
Exercises 6.4
16. Let T: t!P2 -+ 112 be a linear transformation for which
III Exercises 1- 12, de/ermine w/re/lrer T is a/illear /mrrsformlllio". I. 'f': Mll -+
M 22
defi ned by Find T(6
'It ~] = [a~ b c~ d] 2. '1':
M 22
-+
M 22
17. Let T: t!P1 -+ CJ>1 be a hnear transformatio n fo r whICh T( I
defined by
1: ~] -[ b~ C
+ x - 4x 2 ) and T(/:I + hx + exl ) .
+ x)
T(I
(r~ d]
Find 1'(4 - x
3. T : M~~ -+ M~~ defi ned by T(A ) = AB, where B is a matrix fixed 4, T : M~~ -+ Mw~ defined by T(A) = AB - HA, where B is a fixed matrix
8. T :fJll -+~2 definedbyT(tI + bx+ (b+ I )x+ (c+ l ) x 2
1(
2
. T[ 41 F1t1d
+
J]2
and T
r[~
:] = 4
["c bd ].
19. Let T: Mll -+ R be a linear transfo rmation. Show that there arc scalars (I, b, c, and d such that
T[;
10. r, ~ --+ ~ d,finN by 1-en = f(x'l
:] = lIlV +bX+CY+ dZ
(f(x»'
12. T: ~ -+ R defi ned by T(n = f(c) , where cis a fixed scalar 13. Sho\'" that the transformat ions S an d T In Example 6.56 are both linear.
for all [;
:] inM21 •
20. Show that there is no linear transformation T: IR' -+ qpl such that
14. Let T: IJl2 -+ W be a linear transformation for which
,
'1'1
1
2 - I
1~ ~] = 2,
I,
T[: oI] = 3.
)=(0+ 1)+
9. T :~t-+fJl2 d efined by T (a + bx + all = n b(x+ 1)+ b(x+ 1)2
+ 3x l ) and T(a + bx + ex l ) .
1~ ~] =
"x"
5, 'f': M"rI -+ 1Il defined by T( A ) = tr{ A) 6, T: M",, -+ IR defined by T(A) = 1l1,(l U ' . (/"" 7. T : M~ .. -+ lR defined by T(A) = r:mk(A)
X l,
18. Let T: MIl -+ Ii be a linear transforma tion for which
"X"
II. r, ~--+~d,finNbyT( f) =
+ Xl, T(x+ x 2 ) = x + X l ) = 1 + x + x2
= I
1
+ x,
o
.nd
3 TO
'" 2 - x
= -2
+ 2x2
,
+ Xl,
o T
6
-8 2 1. Prove Theorem 6.14(b) . 15. Let T : W -+ CJ>2 be a linear transformatio n for which
'I'[:]= I- lx
and
T[_ ~] = X+2K
22. Let { VI" ' " v~} be a basis for a vector space Vand let T ; V-+ V be a linea r transformation. Prove thaI if T (v l ) " V I' T (vl ) '" vl ... , T(v.) - V .> then Tis the identity transformation on V. ~ 23. leI
T : ~~ -+ ~ ~ be a linear transformation such that T(x'-) = kx'- I for k = 0, I, ... , II. Show that Tmust be the differential operator D.
24. Let VI " ' " V Ol be vectors in a vector space Vand let T: V ---+ W be a li near transformation.
(a) If {T(vl ) , ... , T(v
is linearly independent in W, show that { VI " '" v,,} is linearly independent in V. (b) Show that the converse of part (a) is false. That is, it is not necessa rily true that if {v1, . .. , v..l is linearly Independent in V, then {T( v , ), ... , r(v~)} is linearly independent in W. Illustrate this with an example 1': 1R2 ---+ (f.i2.
In ExerCIses 29 and 30, verify tllat S and T are inverses. 29. S:1R: 2 ---+1R 2 definedb Y
lt } }
25. Defi ne linear transfor mations S: T: [Rl ---+ [R2 by
Compute (5
0
g;r ---+ Mn and
T) [~ ] and (So T)[;].canyou
compute (T 0 5) [;
26. Defi ne linear transfo rmatio ns S: IlJ>I ---+ rJl'l and T : g>-l ---+ W'I by
and
=
a + (a + b)x + 2bx 1
+ 2x - .i-) and
(5 " T )(a + bx + exl ). Can you compute (T c 5)(a + I!x)? If so, compute it. ~ 27. Define linear transformations 5: IlJ> .. ---+ '!J'1t and T: ~ It ---+ IlJ> It by
S(p(x)) - p(x + 1) ,nd
T(I~ x))
- p'(x)
Find (S 0 T)(p(x» and (To S)(p(x)}. ( H mt: Remember the Cham Rule.) ~28. Defin e linear transfo rmation s 5:
'lP .. ---+ 'lP and It
T : IlJ> It ---+ IlJ> It by
S(p(x)) - p(x + 1) ,nd F; nd (5 0
T)~x))
"i,p(x)) - xp'(x)
, nd (T o 5)(p(x)).
y
y
3x
x- y - 3x+4y
y
and T :1R1---+1R2
1
30. S: Q/' I ---+ eJ> I d efined by S( a + bx) = ( - 4a and T:
T(a + bx)
=
+ b) + 2ax
b/2 + (a + 2b)x
3 1. Prove Theorem 6. [7. 32. Let J' : V ---+ V be a linear transfo rmation such that T o T = I.
(a) Show that {v, T(v)} is linearly depe nden t if and only if T(v ) = ::t.v. (b) Give an exa m ple of such a linear transformation w1th V = Rl.
(a) Show that {v, T(v)} IS lin early dependent if and only if T(v) = v or T(v) = o. (b ) Give an exa mple of such a linear transfo rmation with V = R2.
The set of (Ill/inear lrallsformatlOllS from (I vector sp(lce V to a vector space W is denoted by !e( V, W ). IfS and Tare in!e( V, W ), we can define the sum S + T ofS and T by
T(a + bx + cx 1 )= b +2cx
Compute (S" T)(3
d,fin,d by T[xl- [
5[xl -[4X ++ rl
33. Let T: V ---+ V be a linear transfo rmation such that T o T = T.
]? If so, compute it .
S(a + bx)
(5
+ T)(v) - S(v) + T(v)
for all v in V. If c is a scalar, we define tile scalar multiple cT ofTbycro be ('T)(v) - ,T(v) for all v in V. TIIen S + T and cT are both trarl$formatioll$ from V to W 34. Prove that S + T and cT arc linear transformations. 35. Prove that ~( V, W ) IS a vector space with Ihis addition and scalar multiplication. 36. Let R, S, and Tbe linear transformations such that the following operations make sense. Prove that:
(a) R"( S+ 1) = R oS + R o T (b) c(R 0 S) = (cR) 0 5 = R" (c5) for an y scalar c
The Kernel and Range of a Linear Transformation
The null space and column space are two of the fundamental subspaces associated with a matrix. In this section, we extend these notions to the kernel and range of a linear transformation.
The word kernel is derived from the Old English word cyrnel, a form of the word corn, meaning "seed" or "grain." Like a kernel of corn, the kernel of a linear transformation is its "core" or "seed" in the sense that it carries information about many of the important properties of the transformation.
Definition
Let T: V → W be a linear transformation. The kernel of T, denoted ker(T), is the set of all vectors in V that are mapped by T to 0 in W. That is,
ker(T) = {v in V : T(v) = 0}
The range of T, denoted range(T), is the set of all vectors in W that are images of vectors in V under T. That is,
range(T) = {T(v) : v in V} = {w in W : w = T(v) for some v in V}
Example 6.59
Let A be an m×n matrix and let T = T_A be the corresponding matrix transformation from ℝⁿ to ℝᵐ defined by T(v) = Av. Then, as we saw in Chapter 3, the range of T is the column space of A. The kernel of T is
ker(T) = {v in ℝⁿ : T(v) = 0} = {v in ℝⁿ : Av = 0} = null(A)
In words, the kernel of a matrix transformation is just the null space of the corresponding matrix.
Example 6.60
Find the kernel and range of the differential operator D: 𝒫₃ → 𝒫₂ defined by D(p(x)) = p′(x).
Solution  Since D(a + bx + cx² + dx³) = b + 2cx + 3dx², we have
ker(D) = {a + bx + cx² + dx³ : D(a + bx + cx² + dx³) = 0} = {a + bx + cx² + dx³ : b + 2cx + 3dx² = 0}
But b + 2cx + 3dx² = 0 if and only if b = 2c = 3d = 0, which implies that b = c = d = 0. Therefore,
ker(D) = {a + bx + cx² + dx³ : b = c = d = 0} = {a : a in ℝ}
In other words, the kernel of D is the set of constant polynomials. The range of D is all of 𝒫₂, since every polynomial in 𝒫₂ is the image under D (i.e., the derivative) of some polynomial in 𝒫₃. To be specific, if a + bx + cx² is in 𝒫₂, then
a + bx + cx² = D(ax + (b/2)x² + (c/3)x³)
Example 6.61
Let S: 𝒫₁ → ℝ be the linear transformation defined by
S(p(x)) = ∫₀¹ p(x) dx
Find the kernel and range of S.
Solution  In detail, we have
S(a + bx) = ∫₀¹ (a + bx) dx = [ax + (b/2)x²]₀¹ = (a + b/2) − 0 = a + b/2
Therefore,
ker(S) = {a + bx : S(a + bx) = 0} = {a + bx : a + b/2 = 0} = {a + bx : a = −b/2} = {−b/2 + bx}
Geometrically, ker(S) consists of all those linear polynomials whose graphs have the property that the area between the line and the x-axis is equally distributed above and below the axis on the interval [0, 1] (see Figure 6.7).
[Figure 6.7: If y = −b/2 + bx, then ∫₀¹ y dx = 0]
The range of S is ℝ, since every real number can be obtained as the image under S of some polynomial in 𝒫₁. For example, if a is an arbitrary real number, then
∫₀¹ a dx = [ax]₀¹ = a − 0 = a
so a = S(a).
Example 6.62
Let T: M₂₂ → M₂₂ be the linear transformation defined by taking transposes: T(A) = Aᵀ. Find the kernel and range of T.
Solution  We see that
ker(T) = {A in M₂₂ : T(A) = O} = {A in M₂₂ : Aᵀ = O}
But if Aᵀ = O, then A = (Aᵀ)ᵀ = Oᵀ = O. It follows that ker(T) = {O}. Since, for any matrix A in M₂₂, we have A = (Aᵀ)ᵀ = T(Aᵀ) (and Aᵀ is in M₂₂), we deduce that range(T) = M₂₂.
In all of these examples, the kernel and range of a linear transformation are subspaces of the domain and codomain, respectively, of the transformation. Since we are generalizing the null space and column space of a matrix, this is perhaps not surprising. Nevertheless, we should not take anything for granted, so we need to prove that it is not a coincidence.
Theorem 6.18
Let T: V → W be a linear transformation. Then
a. The kernel of T is a subspace of V.
b. The range of T is a subspace of W.
Proof  (a) Since T(0) = 0, the zero vector of V is in ker(T), so ker(T) is nonempty. Let u and v be in ker(T) and let c be a scalar. Then T(u) = T(v) = 0, so
T(u + v) = T(u) + T(v) = 0 + 0 = 0   and   T(cu) = cT(u) = c0 = 0
Therefore, u + v and cu are in ker(T), and ker(T) is a subspace of V.
(b) Since 0 = T(0), the zero vector of W is in range(T), so range(T) is nonempty. Let T(u) and T(v) be in the range of T and let c be a scalar. Then T(u) + T(v) = T(u + v) is the image of the vector u + v. Since u and v are in V, so is u + v, and hence T(u) + T(v) is in range(T). Similarly, cT(u) = T(cu). Since u is in V, so is cu, and hence cT(u) is in range(T). Therefore, range(T) is a nonempty subset of W that is closed under addition and scalar multiplication, and thus it is a subspace of W.
Figure 6.8 gives a schematic representation of the kernel and range of a linear transformation.
[Figure 6.8: The kernel and range of T: V → W]
In Chapter 3, we defined the rank of a matrix to be the dimension of its column space and the nullity of a matrix to be the dimension of its null space. We now extend these definitions to linear transformations.
Definition
Let T: V → W be a linear transformation. The rank of T is the dimension of the range of T and is denoted by rank(T). The nullity of T is the dimension of the kernel of T and is denoted by nullity(T).
Example 6.63
If A is a matrix and T = T_A is the matrix transformation defined by T(v) = Av, then the range and kernel of T are the column space and the null space of A, respectively, by Example 6.59. Hence, from Section 3.5, we have
rank(T) = rank(A)   and   nullity(T) = nullity(A)
Example 6.64
Find the rank and the nullity of the linear transformation D: 𝒫₃ → 𝒫₂ defined by D(p(x)) = p′(x).
Solution  In Example 6.60, we computed range(D) = 𝒫₂, so
rank(D) = dim 𝒫₂ = 3
The kernel of D is the set of all constant polynomials: ker(D) = {a : a in ℝ} = {a·1 : a in ℝ}. Hence, {1} is a basis for ker(D), so
nullity(D) = dim(ker(D)) = 1
Example 6.65
Find the rank and the nullity of the linear transformation S: 𝒫₁ → ℝ defined by
S(p(x)) = ∫₀¹ p(x) dx
Solution  From Example 6.61, range(S) = ℝ and rank(S) = dim ℝ = 1. Also,
ker(S) = {−b/2 + bx : b in ℝ} = {b(−1/2 + x) : b in ℝ} = span(−1/2 + x)
so {−1/2 + x} is a basis for ker(S). Therefore, nullity(S) = dim(ker(S)) = 1.
Example 6.66
Find the rank and the nullity of the linear transformation T: M₂₂ → M₂₂ defined by T(A) = Aᵀ.
Solution  In Example 6.62, we found that range(T) = M₂₂ and ker(T) = {O}. Hence,
rank(T) = dim M₂₂ = 4   and   nullity(T) = dim{O} = 0
In Chapter 3, we saw that the rank and nullity of an m×n matrix A are related by the formula rank(A) + nullity(A) = n. This is the Rank Theorem (Theorem 3.26). Since the matrix transformation T = T_A has ℝⁿ as its domain, we could rewrite the relationship as
rank(A) + nullity(A) = dim ℝⁿ
This version of the Rank Theorem extends very nicely to general linear transformations, as you can see from the last three examples:
rank(D) + nullity(D) = 3 + 1 = 4 = dim 𝒫₃    (Example 6.64)
rank(S) + nullity(S) = 1 + 1 = 2 = dim 𝒫₁    (Example 6.65)
rank(T) + nullity(T) = 4 + 0 = 4 = dim M₂₂   (Example 6.66)
Theorem 6.19  The Rank Theorem
Let T: V → W be a linear transformation from a finite-dimensional vector space V into a vector space W. Then
rank(T) + nullity(T) = dim V
In the next section, you will see how to adapt the proof of Theorem 3.26 to prove this version of the result. For now, we give an alternative proof that does not use matrices.
Proof  Let dim V = n and let {v₁, ..., vₖ} be a basis for ker(T) (so that nullity(T) = dim(ker(T)) = k). Since {v₁, ..., vₖ} is a linearly independent set, it can be extended to a basis for V, by Theorem 6.28. Let B = {v₁, ..., vₖ, vₖ₊₁, ..., vₙ} be such a basis. If we can show that the set C = {T(vₖ₊₁), ..., T(vₙ)} is a basis for range(T), then we will have rank(T) = dim(range(T)) = n − k and thus
rank(T) + nullity(T) = (n − k) + k = n = dim V
as required.
Certainly C is contained in the range of T. To show that C spans the range of T, let T(v) be a vector in the range of T. Then v is in V, and since B is a basis for V, we can find scalars c₁, ..., cₙ such that
v = c₁v₁ + ⋯ + cₙvₙ
Since v₁, ..., vₖ are in the kernel of T, we have T(v₁) = ⋯ = T(vₖ) = 0, so
T(v) = T(c₁v₁ + ⋯ + cₙvₙ) = c₁T(v₁) + ⋯ + cₖT(vₖ) + cₖ₊₁T(vₖ₊₁) + ⋯ + cₙT(vₙ) = cₖ₊₁T(vₖ₊₁) + ⋯ + cₙT(vₙ)
This shows that the range of T is spanned by C.
To show that C is linearly independent, suppose that there are scalars cₖ₊₁, ..., cₙ such that
cₖ₊₁T(vₖ₊₁) + ⋯ + cₙT(vₙ) = 0
Then T(cₖ₊₁vₖ₊₁ + ⋯ + cₙvₙ) = 0, which means that cₖ₊₁vₖ₊₁ + ⋯ + cₙvₙ is in the kernel of T and is, hence, expressible as a linear combination of the basis vectors v₁, ..., vₖ of ker(T), say,
cₖ₊₁vₖ₊₁ + ⋯ + cₙvₙ = c₁v₁ + ⋯ + cₖvₖ
But now
c₁v₁ + ⋯ + cₖvₖ − cₖ₊₁vₖ₊₁ − ⋯ − cₙvₙ = 0
and the linear independence of B forces c₁ = ⋯ = cₙ = 0. In particular, cₖ₊₁ = ⋯ = cₙ = 0, which means C is linearly independent. We have shown that C is a basis for the range of T, so, by our comments above, the proof is complete.
We have verified the Rank Theorem for Examples 6.64, 6.65, and 6.66. In practice, this theorem allows us to find the rank and nullity of a linear transformation with only half the work. The following examples illustrate the process.
Example 6.67
Find the rank and nullity of the linear transformation T: 𝒫₂ → 𝒫₃ defined by T(p(x)) = xp(x). (Check that T really is linear.)
Solution  In detail, we have
T(a + bx + cx²) = ax + bx² + cx³
It follows that
ker(T) = {a + bx + cx² : T(a + bx + cx²) = 0} = {a + bx + cx² : ax + bx² + cx³ = 0} = {a + bx + cx² : a = b = c = 0} = {0}
so we have nullity(T) = dim(ker(T)) = 0. The Rank Theorem implies that
rank(T) = dim 𝒫₂ − nullity(T) = 3 − 0 = 3
Remark  In Example 6.67, it would be just as easy to find the rank of T first, since {x, x², x³} is easily seen to be a basis for the range of T. Usually, though, one of the two (the rank or the nullity of a linear transformation) will be easier to compute; the Rank Theorem can then be used to find the other. With practice, you will become better at knowing which way to proceed.
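The transformation of Example 6.67 also has a simple matrix on coefficient vectors, so the Rank Theorem can be checked numerically (our own sketch, assuming NumPy):

```python
import numpy as np

T = np.array([[0, 0, 0],    # multiplication by x: [a, b, c] -> [0, a, b, c]
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

rank = np.linalg.matrix_rank(T)
nullity = T.shape[1] - rank
print(rank, nullity, rank + nullity == 3)   # 3, 0, True: rank + nullity = dim P_2
```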
Example 6.68
Let W be the vector space of all symmetric 2×2 matrices. Define a linear transformation T: W → 𝒫₂ by
T[ a b ; b c ] = (a − b) + (b − c)x + (c − a)x²
(Check that T is linear.) Find the rank and nullity of T.
Solution  The nullity of T is easier to compute directly than the rank, so we proceed as follows:
ker(T) = { [ a b ; b c ] : T[ a b ; b c ] = 0 }
       = { [ a b ; b c ] : (a − b) + (b − c)x + (c − a)x² = 0 }
       = { [ a b ; b c ] : (a − b) = (b − c) = (c − a) = 0 }
       = { [ a b ; b c ] : a = b = c }
       = { [ a a ; a a ] } = span( [ 1 1 ; 1 1 ] )
Therefore, { [ 1 1 ; 1 1 ] } is a basis for the kernel of T, so nullity(T) = dim(ker(T)) = 1. The Rank Theorem and Example 6.42 tell us that rank(T) = dim W − nullity(T) = 3 − 1 = 2.
One-to-One and Onto Linear Transformations
We now investigate criteria for a linear transformation to be invertible. The keys to the discussion are the very important properties one-to-one and onto.
Definition
A linear transformation T: V → W is called one-to-one if T maps distinct vectors in V to distinct vectors in W. If range(T) = W, then T is called onto.
Remarks
• The definition of one-to-one may be written more formally as follows:
T: V → W is one-to-one if, for all u and v in V, u ≠ v implies that T(u) ≠ T(v)
The above statement is equivalent to the following:
T: V → W is one-to-one if, for all u and v in V, T(u) = T(v) implies that u = v
Figure 6.9 illustrates these two statements.
[Figure 6.9: (a) T is one-to-one; (b) T is not one-to-one]
• Another way to write the definition of onto is as follows:
T: V → W is onto if, for all w in W, there is at least one v in V such that w = T(v)
In other words, given w in W, does there exist some v in V such that w = T(v)? If, for an arbitrary w, we can solve this equation for v, then T is onto (see Figure 6.10).
[Figure 6.10: (a) T is onto; (b) T is not onto]
Example 6.69
Which of the following linear transformations are one-to-one? onto?
(a) T: ℝ² → ℝ³ defined by T[x; y] = [2x; x − y; 0]
(b) D: 𝒫₃ → 𝒫₂ defined by D(p(x)) = p′(x)
(c) T: M₂₂ → M₂₂ defined by T(A) = Aᵀ
Solution
(a) Let T[x₁; y₁] = T[x₂; y₂]. Then
[2x₁; x₁ − y₁; 0] = [2x₂; x₂ − y₂; 0]
so 2x₁ = 2x₂ and x₁ − y₁ = x₂ − y₂. Solving these equations, we see that x₁ = x₂ and y₁ = y₂. Hence, [x₁; y₁] = [x₂; y₂], so T is one-to-one. T is not onto, since its range is not all of ℝ³. To be specific, there is no vector [x; y] in ℝ² whose image under T is [0; 0; 1]. (Why not?)
(b) In Example 6.60, we showed that range(D) = 𝒫₂, so D is onto. D is not one-to-one, since distinct polynomials in 𝒫₃ can have the same derivative. For example, x³ ≠ x³ + 1, but D(x³) = 3x² = D(x³ + 1).
(c) Let A and B be in M₂₂ with T(A) = T(B). Then Aᵀ = Bᵀ, so A = (Aᵀ)ᵀ = (Bᵀ)ᵀ = B. Hence, T is one-to-one. In Example 6.62, we showed that range(T) = M₂₂. Hence, T is onto.
"*
It turns o ut that there is a very simple criterion for determining whether a linear transform ation is o ne-Io-one.
Theorem 6.20
A linear transfo rmation T : V -+ W is one-to-one if and on ly ifkcr(T) = {OJ.
Proof  Assume that T is one-to-one. If v is in the kernel of T, then T(v) = 0. But we also know that T(0) = 0, so T(v) = T(0). Since T is one-to-one, this implies that v = 0, so the only vector in the kernel of T is the zero vector.
Conversely, assume that ker(T) = {0}. To show that T is one-to-one, let u and v be in V with T(u) = T(v). Then T(u − v) = T(u) − T(v) = 0, which implies that u − v is in the kernel of T. But ker(T) = {0}, so we must have u − v = 0 or, equivalently, u = v. This proves that T is one-to-one.
Example 6.70
Show that the linear transformation T: ℝ² → 𝒫₁ defined by
T[a; b] = a + (a + b)x
is one-to-one and onto.
Solution  If [a; b] is in the kernel of T, then
0 = T[a; b] = a + (a + b)x
It follows that a = 0 and a + b = 0. Hence, b = 0, and therefore [a; b] = [0; 0]. Consequently, ker(T) = {[0; 0]}, and T is one-to-one, by Theorem 6.20.
By the Rank Theorem,
rank(T) = dim ℝ² − nullity(T) = 2 − 0 = 2
Therefore, the range of T is a two-dimensional subspace of 𝒫₁, and hence range(T) = 𝒫₁. It follows that T is onto.
For linear transformations between two n-dimensional vector spaces, the properties of one-to-one and onto are closely related. Observe first that for a linear transformation T: V → W, ker(T) = {0} if and only if nullity(T) = 0, and T is onto if and only if rank(T) = dim W. (Why?) The proof of the next theorem essentially uses the method of Example 6.70.
Theorem 6.21
Let dim V = dim W = n. Then a linear transformation T: V → W is one-to-one if and only if it is onto.
Proof  Assume that T is one-to-one. Then nullity(T) = 0 by Theorem 6.20 and the remark preceding Theorem 6.21. The Rank Theorem implies that
rank(T) = dim V − nullity(T) = n − 0 = n
Therefore, T is onto.
Conversely, assume that T is onto. Then rank(T) = dim W = n. By the Rank Theorem,
nullity(T) = dim V − rank(T) = n − n = 0
Hence, ker(T) = {0}, and T is one-to-one.
In Section 6.4, we pointed out that if T: V → W is a linear transformation, then the image of a basis for V under T need not be a basis for the range of T. We can now give a condition that ensures that a basis for V will be mapped by T to a basis for W.
Theorem 6.22
Let T: V → W be a one-to-one linear transformation. If S = {v₁, ..., vₖ} is a linearly independent set in V, then T(S) = {T(v₁), ..., T(vₖ)} is a linearly independent set in W.
Proof  Let c₁, ..., cₖ be scalars such that
c₁T(v₁) + ⋯ + cₖT(vₖ) = 0
Then T(c₁v₁ + ⋯ + cₖvₖ) = 0, which implies that c₁v₁ + ⋯ + cₖvₖ is in the kernel of T. But, since T is one-to-one, ker(T) = {0}, by Theorem 6.20. Hence,
c₁v₁ + ⋯ + cₖvₖ = 0
But, since {v₁, ..., vₖ} is linearly independent, all of the scalars cᵢ must be 0. Therefore, {T(v₁), ..., T(vₖ)} is linearly independent.
Corollary 6.23
Let dim V = dim W = n. Then a one-to-one linear transformation T: V → W maps a basis for V to a basis for W.
Proof  Let B = {v₁, ..., vₙ} be a basis for V. By Theorem 6.22, T(B) = {T(v₁), ..., T(vₙ)} is a linearly independent set in W, so we need only show that T(B) spans W. But, by Theorem 6.15, T(B) spans the range of T. Moreover, T is onto, by Theorem 6.21, so range(T) = W. Therefore, T(B) spans W, which completes the proof.
Example 6.71
Let T: ℝ² → 𝒫₁ be the linear transformation from Example 6.70, defined by
T[a; b] = a + (a + b)x
Then, by Corollary 6.23, the standard basis E = {e₁, e₂} for ℝ² is mapped to a basis T(E) = {T(e₁), T(e₂)} of 𝒫₁. We find that
T(e₁) = T[1; 0] = 1 + x   and   T(e₂) = T[0; 1] = x
It follows that {1 + x, x} is a basis for 𝒫₁.
We can now determine which linear transformations T: V → W are invertible.
Theorem 6.24
A linear transformation T: V → W is invertible if and only if it is one-to-one and onto.
Proof  Assume that T is invertible. Then there exists a linear transformation T⁻¹: W → V such that
T⁻¹ ∘ T = I_V   and   T ∘ T⁻¹ = I_W
To show that T is one-to-one, let v be in the kernel of T. Then T(v) = 0. Therefore,
T⁻¹(T(v)) = T⁻¹(0)  ⟹  (T⁻¹ ∘ T)(v) = 0  ⟹  I(v) = 0  ⟹  v = 0
which establishes that ker(T) = {0}. Therefore, T is one-to-one, by Theorem 6.20.
To show that T is onto, let w be in W and let v = T⁻¹(w). Then
T(v) = T(T⁻¹(w)) = (T ∘ T⁻¹)(w) = I(w) = w
which shows that w is the image of v under T. Since v is in V, this shows that T is onto.
Conversely, assume that T is one-to-one and onto. This means that nullity(T) = 0 and rank(T) = dim W. We need to show that there exists a linear transformation T′: W → V such that T′ ∘ T = I_V and T ∘ T′ = I_W. Let w be in W. Since T is onto, there exists some vector v in V such that T(v) = w. There is only one such vector v, since, if v′ is another vector in V such that T(v′) = w, then T(v) = T(v′); the fact that T is one-to-one then implies that v = v′. It therefore makes sense to define a mapping T′: W → V by setting T′(w) = v. It follows that
(T′ ∘ T)(v) = T′(T(v)) = T′(w) = v   and   (T ∘ T′)(w) = T(T′(w)) = T(v) = w
so T′ ∘ T = I_V and T ∘ T′ = I_W. Now we must show that T′ is a linear transformation. To this end, let w₁ and w₂ be in W and let c₁ and c₂ be scalars. As above, let v₁ = T′(w₁) and v₂ = T′(w₂), so that T(v₁) = w₁ and T(v₂) = w₂. Then
T′(c₁w₁ + c₂w₂) = T′(c₁T(v₁) + c₂T(v₂)) = T′(T(c₁v₁ + c₂v₂)) = (T′ ∘ T)(c₁v₁ + c₂v₂) = c₁v₁ + c₂v₂ = c₁T′(w₁) + c₂T′(w₂)
Consequently, T′ is linear, so, by Theorem 6.17, T′ = T⁻¹.
The words isomorphism and isomorphic are derived from the Greek words isos, meaning "equal," and morph, meaning "shape." Thus, figuratively speaking, isomorphic vector spaces have "equal shapes."
Isomorphisms of Vector Spaces
We now are in a position to describe, in concrete terms, what it means for two vector spaces to be "essentially the same."
Definition
A linear transformation T: V → W is called an isomorphism if it is one-to-one and onto. If V and W are two vector spaces such that there is an isomorphism from V to W, then we say that V is isomorphic to W and write V ≅ W.
Example 6.72
Show that 𝒫ₙ₋₁ and ℝⁿ are isomorphic.
Solution  The process of forming the coordinate vector of a polynomial provides us with one possible isomorphism (as we observed already in Section 6.2, although we did not use the term isomorphism there). Specifically, define T: 𝒫ₙ₋₁ → ℝⁿ by T(p(x)) = [p(x)]_E, where E = {1, x, ..., xⁿ⁻¹} is the standard basis for 𝒫ₙ₋₁. That is,
T(a₀ + a₁x + ⋯ + aₙ₋₁xⁿ⁻¹) = [a₀; a₁; ⋯; aₙ₋₁]
Theorem 6.6 shows that T is a linear transformation. If p(x) = a₀ + a₁x + ⋯ + aₙ₋₁xⁿ⁻¹ is in the kernel of T, then
T(p(x)) = [a₀; a₁; ⋯; aₙ₋₁] = [0; 0; ⋯; 0]
Hence, a₀ = a₁ = ⋯ = aₙ₋₁ = 0, so p(x) = 0. Therefore, ker(T) = {0}, and T is one-to-one. Since dim 𝒫ₙ₋₁ = dim ℝⁿ = n, T is also onto, by Theorem 6.21. Thus, T is an isomorphism, and 𝒫ₙ₋₁ ≅ ℝⁿ.
Example 6.73
Show that Mₘₙ and ℝᵐⁿ are isomorphic.
Solution  Once again, the coordinate mapping from Mₘₙ to ℝᵐⁿ (as in Example 6.36) is an isomorphism. The details of the proof are left as an exercise.
In fact, the easiest way to tell if two vector spaces are isomorphic is simply to check their dimensions, as the next theorem shows.
Theorem 6.25
Let V and W be two finite-dimensional vector spaces. Then V is isomorphic to W if and only if dim V = dim W.
Proof  Let n = dim V. If V is isomorphic to W, then there is an isomorphism T: V → W. Since T is one-to-one, nullity(T) = 0. The Rank Theorem then implies that
rank(T) = dim V − nullity(T) = n − 0 = n
Therefore, the range of T is an n-dimensional subspace of W. But, since T is onto, W = range(T), so dim W = n, as we wished to show.
Conversely, assume that V and W have the same dimension, n. Let B = {v₁, ..., vₙ} be a basis for V and let C = {w₁, ..., wₙ} be a basis for W. We will define a linear transformation T: V → W and then show that T is one-to-one and onto. An arbitrary vector v in V can be written uniquely as a linear combination of the vectors in the basis B, say,
v = c₁v₁ + ⋯ + cₙvₙ
We define T by
T(v) = c₁w₁ + ⋯ + cₙwₙ
It is straightforward to check that T is linear. (Do so.) To see that T is one-to-one, suppose v is in the kernel of T. Then
c₁w₁ + ⋯ + cₙwₙ = T(v) = 0
and the linear independence of C forces c₁ = ⋯ = cₙ = 0. But then
v = c₁v₁ + ⋯ + cₙvₙ = 0
so ker(T) = {0}, meaning that T is one-to-one. Since dim V = dim W, T is also onto, by Theorem 6.21. Therefore, T is an isomorphism, and V ≅ W.
Example 6.74
Show that ℝⁿ and 𝒫ₙ are not isomorphic.
Solution  Since dim ℝⁿ = n ≠ n + 1 = dim 𝒫ₙ, ℝⁿ and 𝒫ₙ are not isomorphic, by Theorem 6.25.
Example 6.75
Let W be the vector space of all symmetric 2×2 matrices. Show that W is isomorphic to ℝ³.
Solution  In Example 6.42, we showed that dim W = 3. Hence, dim W = dim ℝ³, so W ≅ ℝ³, by Theorem 6.25. (There is an obvious candidate for an isomorphism T: W → ℝ³. What is it?)
Remark  Our examples have all been real vector spaces, but the theorems we have proved are true for vector spaces over the complex numbers ℂ or over ℤₚ, where p is prime. For example, the vector space M₂₂(ℤ₂) of all 2×2 matrices with entries from ℤ₂ has dimension 4 as a vector space over ℤ₂, and hence M₂₂(ℤ₂) ≅ ℤ₂⁴.
Exercises 6.5 (a) Which, if any, of the fo llowing polynomials are in kcr( T)? (i) 2 (ii) X l (iii) 1 - x (b) Which, if any, of the polynomia ls in parI (a ) are in
1. Let T: Mn ~ M22 be the linear transformation
defined by
range(T)? (c) Describe ker( T) and range(T).
(a) Wh ich, If any, of the following matrices are in ker( T)?
(i) [
I - I
'] 3
(ii)
[~ ~]
(iii)
[3o 0] -3
(b) Which. if any, of the matrices in part (a ) are in range( T)? (cl Describe ker ( T) and ra nge( T) . 2. Let T: Mn ~ IR be the linea r transforma tion defi ned by 'I\A) ~ " (A), (a) Which. if any, of the followmg matrices are in ker( T)? (i) [_ : : ]
(ii)
[~~]
(iii)
[ ~ _~]
(b) Which, if any, of the following scalars arc in range( T)? (i) 0
(ii) - 2
III Exercises 5-8, filld bases for the keme/ (///(/ mnge of the linear tralls!ormatlOn s T ill the indicated exercises. In each case, state the IIl1/1ity CHId r(wk ofT alld verify tile Rank Th eorem. 5. Exerose 1
6. Exercise 2
7. Exercise 3
8. Exercise 4
III Exercises 9-14,fHlci either the 1I11!lJty or the milk ofT and then li se the Rall k Theorem to find the other. 9. 10,
B~[
3, Let T: '1J>2 ~ [R2 be the linear t.ransformation defined by
+
bx
+ a! }
=
c ["b -+ b]
B ~[
[~l
- I
-I] 1
1-I]
- I
1
~ 13.
(ii) x - x 2
(iii) I
+x-
x2
(b) Which, if any, of thc following vecto rs are in range( T)? (i)
1
12. T: M22 ~ MZ2 defined by T(A} = AB - BA, \",here
(a) Which, if any, of the following polynomials arc in ker(T)? (i ) 1 + x
T ~,-+R' d'finedbyT(p(x» ~ [~~~n
11. T: M22 ~ M12 defined by 'It A) "" AB, where
(iii) I/ Vi
(cl Describe ker(T) and range( T).
'{{a
T:Mu _ W definedbYT[~ ~] = [~ = ;]
(ii)
[~l
(iii)
[~l
(c) Describe ker(T) and range(T). 4. Let T: 'lP 1 -+ 'lP 1 be the linear t ransformatIon defined by T(p(x» ~ xp'(x),
T : 1J>2_IR defi ned by T(p(x)) = p'(O) 14. T:MJJ ~MjJ de fi nedby 'J1A} = A - AT
Itt Exercises 15-20, determit1e wlletller I/Ie linear transfor-
mallOn T is (a) aile-la-one alld (b) 011/0. 15. T: 1H z _ [Rl defined by T [ x] = Y
[2X - Y] x + 2y
588
Cha pter 6 Vector Sp:lces
ee[0, 2]. 32, Show that (€[a, bJ - C€[ c, d] fo r all a < ba nd c < d. 31. Sho w that C(6[ 0, I J -
x - 2y 3x
+Y
x +y 2a - b
a + b - 3c
18. H I" --+11' dcfi ncd by 'I11'(x» = n
19. T : llt~ M22 d efi n edby T U
,
[;~~n
a+b+ [ b - 2c
C
2']
,
(a) Prove that if 5 and T are both o ne-to -one, so IS SoT. (b ) Prove that if 5 and T are both on to, so is SoT.
34. Let 5: V -+ Wand T: U --'" V be linear tra nsformations.
__ ["" +bb bb -+ ,,]
(a) Prove that If 5 0 T is o ne-to -one, so is T (b ) Prove that If S o T is onto, so is S. 35. Let T: V --'" W be a linear tra nsfo rmatio n between two fi nite-d im ensional vector spaces.
a 20. T: R J --'" W defi ned by T b
33. Let 5: V ~ W and T : U --'" V be linear transfo rm ations.
=
b, where W is the vector space o f a- ,
all symmet ric 2 X2 matrices
(a) Prove that if di m V < dim W, then Tcan not be onto. (b) Prove that if dim V> d im W, then 1'can not be one- to-o ne. 36, Let no, (1, •••• , a" be n + I distinct real n umbe r~. Defin e T: W'" -+ IR:"-t l b y
In Exercises 2/-26, determine whether Vand Wa re
T(p(x) =
isomorphIC. If they are, gIVe an explicit IsomQrphism T: V~ W. 22. V = 53 (sym metric 3 X 3 matrices) , W = UJ (upper t riangu lar 3 X 3 mat rices)
53 (skcw-
24. V = !j>,. IV = (P(x) in !j>" P(O ) = 0) •• 101
25. V = C, W = R2 26. V = {A in M" , u(A) = 0). W =
Il'
27. Show that T:~n ~ ~n defi ned by T{p(x» = p(x) p'(x) is an isomorphism. 28. Show that T:'lP n --'" 'lP n d efined by is an isomorphism. 29.
+
T(p(x» = p(x - 2)
~how that T:~n--"' 'lPn defined by T(p(x»)
=
xnp(; )
IS an Isomorphism. 30. (a) Show that (£[0, I ] - '{; [2, 3]. [ Hint: Define T: C€ (0, 11--'" C€ [2,3] by letti ng T(f) be the functio n whose value at x IS (T(f))(x) = f(x - 2) for x in
[2.3[.[ (b ) Show that <"€{O, I]
== <€{a, a + I) for all a.
• •
p(a.)
21. V = DJ (d lagonal 3 X3 matrices) , W = IR: J
23. V = 5j (symmetric 3X3 matrices) , W "" symmetric 3X3 matrices)
p(a,) p(a,)
Prove that T is an isom orphism. 37. If Vis a fini te-dimensional vector space and T : V --", V is a linear transfo rmation such that ra nk( T) = rank(T 2), prove that range{T)n ker(T) = \01. [Hint: T2 de notes ToT. Use the Rank Theorem to help show that the kernels of T a nd '1'2 arc thc same. ] 38. Let U and W be subs paces of a fmi te-dimensional vector space V. Define '1": U X W -+ V b y T{u , w ) = u - w. (a) Prove tha t Tis a linear transformation. (b ) Show that range(T) = U + W. (c:) Show that ker( T) OO! un w. [Hint: See Exercise 50 in Section 6. 1.] (d ) Prove Grassmann's Identity: d lm ( U + W ) = di m U+ dim W - dim( Un W)
1Hint: Apply th e Rank Theorem , using results (a) and (b) and Exercise 43(b) in Section 6.2. }
St-ction 6.6 The ~Ialrix of a Linea r Transfo rmation
511
The Matrix of a Linear Transformalion Theorem 6.15 showed that a linear transfor mation T: V ~ W is completely determined by its effect on a spa nning sct for V. [n particular, if we know how Tacts on a basis for V, then we can co mpute T (v ) for any vector v in V. Example 6.55 ill ustrated the process. We implicitly used this important property of linear transformations in Theorem 3.31 to help us compute the standard matrix of a Linear transformat ion T: Rn ~ R"'. In this section, we witl show that every linear transformatIOn between finite-d imensional vecto r spaces can be represented as a matrix transformation. Suppose that V is an tI-dimensional vector space, W is an m-dimensional vector space, and T: V ~ W is a linear transformatio n. lei Band C be bases for Vand W, respectively. Then the coordinate vector mapping R(v) = [vJs defines an isomorphism R: V ~ Rn. AI the same time, we have an isomorphism 5: W ~ R'" given by S{w ) = [w l p which allows us to associate the image T {v ) With the vector IT(v)jc in R"'. Figure 6.11 illustrates the relationships.
,
v
T
•
• s
H
------. So To R1
•
Fllure 6.11 SlIlce R is an isomorphism, il is invertible, so we may form the composite mapping
50
T o R- I : IR" ~ R'"
which maps [v]s to [T(vHc. Since th is mapping goes from R" to R"', we know fro m Chapter 3 that it is a matrix transformation . What, then, is the standard matrix of 5 0 T o R- I? We would like to fi nd the m X n matrix A such th,lt A[vJlI == (5 0 T o W i )([" J8)' Or, since (5 0 T o R- l )([ V]8) = ( T(v)Jc. we require
A[' J.
=
[T(, )lc
11 turns o ut to be surprisi ngly easy to find. The basic idea is thai of Theorem 3.3 1. The columns of A are the images of the standard basis vectors for R~ under S o T o R I But, if B = {VI' " ., vn} is a basis fo r V, then
=
I
o
= ,,
_ nhcntry
51Z
OJapler 6
Vector Spaces
so R- 1(e,) = v j • Therefo re, the Ilh column o f the matrix A we seek is given by
(S 0 T o R- ')(e,)
~
S( T (R- '(. ,» )
~
S(T(Y,)l [T(Y,)]e
~
which IS the coordinate vcctor of T(v;J with respect to the basis C of W. We su m marize this discussion as a theorem.
Thaoram 6.26
Let Vand Wbe two fi n ite-di mensional vector spaces with bases Band C, respecti vely, where 6:: {VI"'" v,,}. If T: V -)0 W is;J linear transformation, then the m X" mat rix A defined by
A ~ [[ T(v,) ]c I [T(y,) ], 1"' 1[T(y.l lcl satisfies
A[Yl.
~
[T(yl ir
for every vecto r v in V. . T he matrix A in Theorem 6.26 is called the matrix
0/ T with respect to the bases B
and C. The relationship is illustrated below. (Recall that T" denotes multiplication by A.)
,,
y
J,
[vl.
T(y) J,
'. • A[Yl.
~
[T(Yl le
Remarill • The matrix o f a linear tra nsformation Twith respecl to bases 6 and C is someti mes denoted by [71c_6' Note the directIOn of the arrow: right-to-left (not left -toright, as for T : V -+ W ). With Ihis nOlatlOn, the fi nal equation in Theorem 6.26 becomes
[Tle_.[y]. - [T(Y)le O bserve that the 6s in the subscri pts appear side by side and appear to "cancel" each other. In words. th is equatio n says "The matrix for T times the coordinate vector for v gives the coord inate vector for n v)." In the special case where V:= Wand 13 := C. we write [7']6 (instead of {7']6_8)' Theorem 6.26 then states that ~
[Tl.[v].
[T(v)l.
• The mat ri x of a linear tra nsformation with respect to given bases is u n iqu~ . That is, fo r every vector v in V, there is only one mat rix A with the property specified by Theorem 6.26-narnely,
A[Yl.
~
[T(yl ]C
( You are asked 10 prove this in Exercise 39,)
Section 6.6
The Matrix of a Linear Transformation
503
The diagram Ih,\1 follows Theorem 6.26 is sometimes called a comllllllmivc diagmm because we can start in the upper left -hand corner with the vector v and get to [Ttv) Ie in the lower right-hand corner in two d ifferent, but equivalent, ways. If, as before, we denote thecoordinatc mappings that map v to [V]8 and w to [w l c by Rand 5, respectively, then we can summarize this "commutativity" by •
SoT= T"oR The reaso n for the term commutative becomes dearer when V = Wand B = C, fo r then R = S 100, and we have
R o T = T"o R suggesting that the coordinate mapping R co mmutes with the linear transformation T (p rovided we use the matrix version o f T- namcly, Til. = 1[l}..-where it is required). • The matrix (T]C<-B depends on the order o f the vectors in the bases Band C. Rearrangmg the vectors within either basis wi ll affect the matrix [TJC+-B' (See Example 6.77(b ).1
Example 6.16
leI T: R' ~ 1R2 be the li near transformation defi ned by
x [ x- 2y 1 , x+ y - 32
T Y -
and let B = { e l'~' C3} and C = !e2, ed be bases for R J and IRl, respectively. Find the I matrix o f TWlth respect to Band C and verify Theorem 6.26 for v = 3
-2
Solution First, we compute
~ [;J.
T(e,)
T(e,) -
[-n
T(e,) -
[ -~l
Next we need their coordinate vectors with respect to C. Since
[ -~ ]= ~ - 2el' [_~] = -3~ +Oe l
[:]=e!+e h
we have
( T(e,)J,
~
[:J.
[T(e, )J, -
[-;J.
Therefore, the mat rix of Twith respect 10 Band C is
I = [:
- 2
[ T( e,) J, -
[-~l
514
Chapu:r 6
Vector Spaces
To verity Theorem 6.26 for v, we fi rst compute
T(v1
,
~
3 -2
[v].
Then
, 3 -2
~ [-5] ~ [ 10]
[ T(v)],
and
~
~ [ ~~]
10 c
- 5
(Check these.) Using all o f these fac ts, we confir m that
A[v].
Example 6.11
~
'[ 1 -3] 3, I
- 2
0
- 2
Let D; CJ>l --+ \jll be the d ifferen tial opera tor D( p(x)) "" p'(x). Let B = {I, oX. x ' , Xl } and C "" !1, x, xl) h e bases for ~3 and '!P2' respectively. (a) Find the matrix A of D with respect to Band C.
(b) r ind the matrix A' of D With respect to B' and C, where B' = {x', Xl,
+
ec) Usi ng pa rt (a), compute 0(5 - x Theorem 6.26.
SolulioD Fi rst note that D(a ple 6.60.)
+
bx
2K) and D(a
+ a! +
+
dx}) = b
bx
+
+
2ex
cx 2
+
X,
I}.
+ dx' ) to verify
3dx 2• (See Exam-
(a) Since the images of the basis B unde r Dare D( I) = 0, D(x) = I, D( xZ) = 2x, and D(x J ) = 3Xl , their coordinate vecto rs with respect to Care
[D( ll ]c ~
o
I
0 , [ D(x1 ]c ~
0,
o
o
0
[D(x') ],
~
2,
0
[D(x' )],
~
o
Consequently,
A ~ [D],_.
~
[[ D(1l], I [D(xllc i [ D(.'l ], I [D(x')] , ]
o =
0
1 0
0 2
0 0
o
0
0
3
( b) Since the basis 8 ' is just 8 in the reverse order, we see that
A' ~ [D], _G' ~ [[ D(x'l ],
I [ D(.') ]c I [ D(x1]c I [ D( ll ]cl
o o
0
I
0
2
0
0
3
0
0
0
0 3
Section 6.6 The Matrix of a Linear Transformati on
505
(This shows that the order o f the vectors in the bases Band C affects the matrix of a transformation with respect to these bases.)
(c) First we compute 0(5 - x
+ 2Xl)
=
- 1 + 6x 2 di rectly, getting the coordinate
vector - \
[D{S - x + 2xJ)]c = [ - 1 + 6x 2 ]c
0
=
6 On the other hand,
; - \
o 2
A[S - x+ 2X)J8 =
which
0
\
0
0
0
2 0
0
0
0
5
0
- \
- \
=
0
3
= ( D(5 - x + 2x')Je
0 6
2
'g"" w"h Thcoc,m 6.26. We I"y, proof of \h, gen",1 G'''' " , n
'''': t
Sin ce the linear tra nsformat ion in Example 6.77 is easy to usc di rectly, there is rc· ally no advantage to using the matrix of this transfo rmation 10 do calculat ions. However, in other cxamples-cspecially large o nes-the matrix app roach may be simpler, as it is very well-suited to compu ter impleme ntation. Example 6.78 illustrates the basic idea behind this indirect approach.
Example 6.18
Let T : ~l ---+ '!f2 be the linear transformat ion defined by
'f(p(x» -
P(2x - \)
(a) Find the mat rix of Twith respect to E = {I. x, xl}. (b ) Compute T (3
SoluUon
+ 2x
- x 2 ) indlfectly, using part (a).
(a) Wc sce thaI
T(\ )
= I,
= 2x -
T(x)
I,
T(x 2 ) = (2x - 1)2 = I - 4x + 4x~
so the coordina te vecto rs arc \
(T(I)J, =
- \
I
0 • ( T(x)J, =
2 • ( T(x' )J, -
o
o
- 4 4
Therefore,
(T],
= ([T( I)], I (T(x) J, I (,[,(x')J,] =
\
- \
\
0
2
- 4
o
o
4
51&
Chapter 6
Vector Spaces
(b ) We apply Th eorem 6.26 as follows: The coordinate vector of p(x) = 3 with respect to [: is
+ 2x - xl
3
[p(xl ], =
2 - ]
Therefore, by Theorem 6.26,
[T(3 + 2x - x' l ], = [ T(p(x)) ], = [ T],[p(xl ], = It follows that 7(3 + 2x computing T{3 + 2x- x 2 )
I
- I
I
3
0
2
- 4
2
o
0
4
- I
0
=
8 - 4
+ 8-x - 4_x 2 = 8x - 4x 2. (Verify this
Xi )
= 0- 1
= 3
+ 2(2x -
by
I) - (2x - I)l directly.)
The matrIX of a linear tra nsformation can sometimes be used in surpn smg ways. Example 6.79 shows its applicn tion to a traditional ca lculus problem.
Example 6.19
~
Let Qb be the vector space o f all diffe rentiable funct ions. ConSider the subspace IV of Qb given by IV ::: spa n(e JX , xe)K, x 2e3K ). Since the set 13 = f e-'x, xc' ''' x 2e IS linearly independent (why?), it is a basis for IV.
'x }
(a) Show that the d ifferenti al operator D ma ps IV into itself. (b) Find the ma trix of D wi th respect to 13.
(c) Compute the derivative o f Sel X + 2xelK- xle lx ind irectly, using Theorem 6.26, ilnd verify II using part (a).
50lulloo
(.1) Applying D 10 a general element of W, we see that
D(aelx + bxelx + cx 2e '<)
=
(3a + b)e-'.< + (3b + 2c)xc'.< + 3CX 1C lx
(check this), whICh is again in W. (b) Using the form ula in part (a), we see that
D(e-'X) "" 3~'"
D(xC
X ):::
e3x
+ 3xeh , D{x 2eh )
=
c
2xe-'x + 3x 2 x
so
[D(,>'l],
=
3 0 ,
o
]
[D(xe" l ],
=
0
3 , [D(x',"l ], = 2 0
3
It follows that
3
[D],
=
[[D( ," l] B![ D(xc")]Bl[ D(x'C'l ]B] =
]
0
0 3 2 003
Se<:tion 6.6
5a1
The Matrix of a Lmear Transfo rmatIon
«) For [(x) '" 5e):O: + lxt?" - x 2en , we see by instxCtion that 5 2 - I
If(x)J. = Hence, by Theorem 6.26, we have
[D(f(x»
J. = [DJ.[f(x )J. =
3 0 0
0 3 2 0 3 1
which, in turn, implies that f (x) = D{[(x») = 17c):O: with th e formula In part (a).
5 2 - I
17
4
-3
+ 4xe l .. - 3x 2el .., in agreement
RIma,.
The point of Example 6.79 is not that this method is easier than direct differentiation. Indeed , o nce the formula in pa rt (a ) has been established . there IS little to do. What is significant is that matrix: methods can be used at all in what apJ>f=ars, on the surface, to be a calculus proble m. We will explore this idea further in Example 6.83.
Example 6.80
Let V be an ,,·dimensional vector space and let I be tne identity transformation on V. What is the matrix of I with respc<:t to bases B and C of V if 13 = C (including the order of the basis vectors)? if B C?
*'
SOlulioll
Let 8 = {VI•...• v,,}. Then I(v l ) = vI" .. , l(v,,) = v". so
[1(v, )]. =
1
0
0
1
·•
[1(v,)J. =
= el,
=
~.
... , [1(v.)J. =
•
0
0
0 0 • • •
= .•
1
and. if B = C,
[IJ,
B
[[1(v, )]. I [1(v,) J, I ...
[1(v.)].J
=[' , I ' , I · I · .J = /
•
the nX II Identity mat rix. (This is what you expected, Isn't it?) In the case 13 *- C, we have
so
[1JC_B
=
[[v,Jc 1"' 1[v.Jd
". P _ C B
the change-of-basis matrix from l3 to C.
Matrices of Composlle aad fnverse linear Transformallons We now generalize Theorems 3.32 and 3.33 to get a theorem that will allow us to easily find the inverse of a linear transfor mation between fini te-dimensional vector spaces (if It exists).
50B
Chapter 6 Vector Spaces
Theorem 6.21
Let U, V, and W be finite- di mensional vector spaces with bases 8, C, and 12. respectively. Let T: U -+ Vand 5: V -+ Wbe linear transformations. Then
Remar~s
•
.
In words, this theorem says "The matnx of the composite is the product of the
.
matnces. • Notice how the "inner subscripts" C must match and appear to cancel each other out, leaving the "ou ter subscripts" in the form V *"- 8 .
Prool We will show that corresponding columns of the matrices [5 0
Tlv ....ti and
[S]v .._d T]C....Bare the same. Let v, be the ith basIs vector in 8. Then the ith column of [5" T]V<-6 is [(So T)(v,) l v
~
[S(1{v,)lv ~ [SlD.-d 1{v,) lc ~ [Slp--c[ Tlc ~B [ v,l,
by two applications of Theorem 6.26. But [v ,l6 = e, (why?), so
is the ith column of the matrix [S ]v+-c [ T] c;-ti. Therefore, the jlh columns of [5" T]V ....B and [S ]v+-c[ T] c<-ti are the same, as we wished to prove.
Example 6.81
Use matrix methods to compu te (5 0
T) [:] for the linear transformations Sand Tof
Example 6.56.
Solullon Recall that T: U;f -+ CZi'] and S: '!P ] -+ '!P 2 are defined by 1: ] =a + {Il+b)X
and
S(a + bx}=ax + bx 2
Choosing the standard bases [, [', and £" for 1R2, W" I' and '!P 2, respectively, we see that • , ~c ~ [ 1 [rl 1
~l
and
[S]e+-c
o
0
I
0 1
=
o
(Verify these. ) By Theorem 6.27, the matrix of SQ TWlth respect to [and [" is
o
0
1
0
1
1
Section 6.6
The Mat rIX of a Lmear Transformation
589
Thus, by The orem 6.26 ,
D[:]L
[(SO
= [(S o
=
Con sequen lly, (So
T) [ ~]
=
(lX
D],.-<[:l,
o
0
0
I
0 [: ] =
a
I
I
+ (a + b)x z.
a+b whi ch ag rees with the solution to
Exa m ple 6.56.
In The orem 6.24 . we proved that a line ar tran sfor mat ion is invertib le if and only if it is one -to- one and onto (i.e., if it is an isom orp h ism ). Wh en the vect or spaces in volved are fi nite -dim ensi onal. we can use the mat rix met hod s we hav e develo ped 10 fi nd the inverse of such a linear tran sfo rma tion .
Thea, .. &.21
ut T: V ...... \V be a linear tran sfor mat ion betw een /I-dimensional ve<: lOr spaces V and Wa nd let Ban d C be bases for Van d W. respecti\'ely. The n T is invertible if and only if the mat rix (7']C..-B is invertible. In this case,
Prool
Obs erve that the mat rices of Tan d '1'- ' (if it exis ts) are then y-I 0 T = IV' App lyin g The orem 6.27, we have I. =
[Iv].
= [y- ' =
0
"x ".If Tis invertible,
TI.
[T- ' ]o_ c[Tle_ o
Th is shows that f TI C... 8 is inve rtib le and that ( [ 1']C.... 8)-1 := [y- ' ]8 ....C. Conversely, assu me that A = i Tl ..... is invertible. To sho w that Tis inve rtib le. it c s is eno ugh to sho w that ker( T) := {O\. (Why?) To this end , let v be in the kern el of T. The n T( v) = 0, so
A[v ].
=
[Tle_ o[ v].
=
[T( v)]e
=
[Ole
= 0
wh ich mea ns that (vJI'! is in the null space of the inve rtib le mat rix A. By the Fun damenta l The orem . th is imp lies that (VJ8 = O. which. in turn , imp lies that v = 0, as required.
Elample 6.82
In Example 6.70 , [he linear tran sfor mal ion T: (R1 -)o ~l defined by
1:] =a+(a+b)X was shown to be one· to-o ne and ont o and hen ce invertible. Find
,(- I.
511
Chapler 6
Veclor Spaces
5Dlull •• In Example 6.81, we found the matrix of Twilh respeclto the standard bases £ and £' for Rl and ~ l' fC.'Spectively, 10 be
By Theorem 6.28, il follows that the matrix of
r -1 with respect to £' and E is 0] - '=[ 1 1
-1
~]
By Theorem 6.26,
[r'(a + bx) ],
=
[r-' ], _da + bx]'.
~][~]
= [- : =
[b ~ a]
This means that
rl(a + bx) :: (lei + (b - cl)el :: [
1 b-a a
(Note thai the choice of the standard basis makes this last calculat ion virtually irrelevant.)
The next exa mple, a contin uation o f Example 6.79, shows that matrices can be used in certain integration prob lems in calculus. The specific Integral we consider is usually evaluated in a calculus cou rse by me,lns of two appl ications of in tegration by parlS. Contrast this approach with our method.
Ex ••p'e 6.83
Show that the differential operator, rest ricted to the subspace W ::: spa n (c~, x(l", x1e-'1 of£tl, is invertible, and usc this fact to find the integral
J"'<"
$.I.tI.. B=
In Example 6.79, we found the matrix of D wit h respect to the basis
{~,xrr,xle'1 ofWt o be
310 [ D]. =
0
3 2
o
0
3
By Theorem 6.28, therefo re, 0 is Invcrtlble on W, and the matrix of D 3
[D-' ]. = ([D].f' =
I
0 -1
0 3 2 003
1
o o o
t;
-,•
1
I
is
S«lion 6.6 The Malrix of a Lmear T~a n s formallon
511
Since integration is untidif[erenti(ltion, this is the matrix corresponding to integration on W. We want to integrate the fun ction x1e,;r whose coordi nate vector is
o [X' .."J8 -
0 I
Consequently, by Theorem 6.26,
[I x1e);r dxL = =
( D- I(x~e}'<) lll' (D-I1u[x 2e)'<]B
,, _1, o 1• o
o o
o
I
It follows that
(To be fu lly correct, we need to add a consta nt of integration. It does not show up here because we are working with linear tran sformations, which must send zero vectors to zero vectors, forci ng the constant o f in tegration to be zero as well.)
WarnIng In general, differentiation is no/ an invertible transformation. (See Exercise 22.) What the preceding example shows i.~ that, suitably restricted, it sometimes is. Exercises 27- 30 explore this idea fu rther.
Change 01 Basis and Similarity Suppose T: V-+ V is a linear transformatio n and Ba nd C are two different bases fo r V. lt is natural to wonder how, if at all , the matrices [T l /j and [T lc are related. It turns out that the answer 10 this question is quite satisfying and relates to some questions we fi rst considered in Chapter 4. Figure 6.1 2 suggests one way to address this problem. Chasmg the arrows around the diagram from the upper left-hand corner 10 the lower right-hand corner in two different, but equivalen t, ways shows that l o T = T o I, something we already knew, since both are equal to T. However, if the "upper" version of T is with respect to the
v } baSIS C ,j
~
~J
I
T
flp,. &.12 / o 1'=T,, /
V
•
. T( ,.)
} basis B
512
Chapter 6
Veclor Spaces
basis C and the "lower" version IS with respect to 8 , then T = l o T = T o I is with respect 10 C in its domain and With respect to 8 in its codomain. Thus. the matrix o f T in this case is [T]g....c. But
.od
I TI.-c = IT' 1).-c = I TI.-.I fj • ...c T herefore, [llB .....d T]c-c = (T]S_8( 1]8-e· From Example 6.80, we know thai [1]8-e = Pa -e, the (IOvertlb le) change-ofbasis matrix fro m C 10 B. If we d enote Ih is matrix by P, then we also have 1'1 = (Pg -ct l = Pc_a
W ith this notation,
Thus, the mat rices [ TJ8 and [TJ c are similar, in the terminology of Section 4.4. We summarize the foregoing d iscussion as a theorem.
I
Theorem 6.29
Let V be a finite-d imensional vector space with basesJ3. alld.c..andJeLL;jI'~.... be a linea r transformation. Then
where P is the change-of-basis matrix from C to 8.
--~~~~~----~=, Ramarll
As an aid in rememberi ng that P must be the change-of-basis matrix from C to 8, and not 8to C. it is instr uctive to look at what Theorem 6.29 says when written in fu ll de tail. As shown below, th e " inner subscripts" must be the same (all Bs) nnd must appear to cancel, leaving the "outer subscr ipts," wh ich are both Cs.
Same
Theorem 6.29 is o ften used when we a re trying to find a basis wit h respect to which the matrix of a linear transformat ion is partic ularly si mple. For example, we can ask whether the re is a basis C of V such that the matrix [ T] c of T: V .... V is a d iagonal matrix. Example 6.84 ill ustrates this application.
Example 6.84
leI T:
1R 2 ....
1R 2 be defined by x + 3y 1 X l [ [ T Y = 2x + 2y
If possible, fin d a basis C for R2 such that the matrix of Twith respecllO C is diagonal.
Section 6.6 The Matrix of a Lmear Transformation
80lu110n
513
The matrix of T with respect to the s tandard basis C is
[11, =
[;
:]
This mat rix is diagonalizable, as we saw in Example 4.24. Indttd, if
P=['I
3] ,nd
- 2
D=[40 - 0]I
then P- L[ T]c P "" D. If we let C be the basis of IR 2consist ing of the columns of P, then P is the change-of-basis matrix Pc.... , from C to C. By Theorem 6.29,
[71c = p-' [11 .. p =
D
~
so the matrix of T with respect to the basis C = { [ ], [ _
~] } is diagonal.
He • .,,, • It is easy to chcr:k that the solution above is correct by computing directly. We fi nd that
ITlc
Thus, the coordinate vectors that fo rm the columns of [ Tl c are
io agreement with our solution above. • The general procedure for a problem like Example 6.84 is to take the 5t'andare! matrix I T]! and determine whether it is diagonali7.able by findin g bases for its eigcnspaces, as in Chapter 4. The solution then proceeds exactly as in the preceding example. Exam ple 6.84 motivates the following definition .
Definition
lei Vbe a finite-dimensional vector space and let T: v~ Vbea mcar trans ormation. Then Tis caJled diagotltdizable if there is a basis C for V such that the matrix [T Ic is a diagonal matrix. It is not hard to show that If 6 is allY basis for V, then Tis diagonahz..1ble if and only if the matrix [1'].6 is diagonalizable. This is essentially what we dId , for a special case, in the last example. You are asked to prove this result in general in Exercise 42. Sometimes it lS easiest to write down the matrix of a Imear transformation with respect to a "nonstandard" basis. We can then reverse the process of Exa mple 6.84 to find the standard matrix. We illustrate this idea by revisiting Example 3.59.
Enmple 6.85
Let
ebe the line through the origin 10 R2 with direction vector d = [~:]. Find the
standard matrix of the projection onto
t.
514
Chapter 6
Vector Spa(:es
Solullon
Let T d enote the p rojection. There is no harm in assuming that d is a unit vector (i.e. , df + di = I) , sim:e any nonzero multiple o f d can scrve as a direction
vecto r for C. Let d ' = [
-d~2] so that d and d' are orthogonaL Since d ' is also a unit
vector, the set V = {d , d '} is an Orl honorma\ basis for Rl. As Figure 6. 13 shows, T( d ) = d and T(d ') == 0, Therefo re,
1 [ T(d)Jv = [0] , ,,d [T(d ') Jv =
[~]
" 7\,, d
__
~'-:- '
_ T(y) " . ,::::::-'(s.;T,:('I'----. --\7 "'"7rd', ,, ' 'T( d l= d x ,, , -¥
y
figure &.13 Projection onto
e
[71. =
[~ ~]
The change-of- basis matrix fro m V to the standard basis [ is
d, Pi ....'D = [ d
-
l
so the change-or-basis matrix from [to V is
-d1]-1=
By Theorem 6,29, then, the standard matnx of Tis
which agrees with parI (b) of Example 3.59.
d,] d,
Section 6.6
Example 6.86
The Mat rix of a Linear Transforma tion
515
Let T : 'lP 2 ~ f/J' 2 be the linear transfo rmation defined by T( p(x»
~
p(2x - I )
(a ) Find Ihe malrix o f T with respect to the basis B == {I
+
X2} of z. (b ) Show that Tis d iagonalizable and fi nd a basis C for qy 2 such Ihal I T Jc is a diagoX,
I -
X,
nal matrix.
Solution
(a) In Example 6.78, we found that the matrix of T wit h respect to the standard basis £ = {I, X, Xl} is
1 [71,~
- \
0
1
2-' 0 ,
o The change-of-ba sis ma trix fro m B to £ is 1
P = PCo-- 8
==
1 0
1
-I
0
o
0
1
It foll ows thai the matrix of Twl1h respect 10 {3 is
[ T).
~
~
~
!"[ TV
,, 1, , _1,
1
0
1
- I
1
1
0
2 0
-4
1
0
0 1
4
0
0
1 0 -I 2
0
-,,,
0
0
1 0 - I 0
0
1
,
4
(Check this.)
(b ) The eigenvalues of [Tlc are 1,2, and 4 (why?), so we kn ow from Theorem 4.25 that ['[1,.: is diagonalizable. Eigenvectors corresponding to these eigenvalues are I
- 1
0,
1,
o
0
1
-2
1
respectively. Therefore, setti ng 1 p ~
nc
- I
1
0
1 -2
o
o
1
and
D ==
1 0 0 020 o 0 ,
p = D. Furthermore, P is the change-of-basis matrix from a basis we h[lve p- I[ C to £, and the col um ns o f P are th us the coordinate vectors of C in terms o f E. It follows that
C == { I, - I and l1') c == D.
+ x, 1 - 2x + Xl}
516
Chapter 6 Vector Spaces
The preceding ideas can be generalized to relate the matrices [1]c0-6 and [TJeo-B' of a linear transformation T : V --+ IV, where 8 and 6' are bases for Va nd C and C' a re bases fo r W. (Sec Exercise 44. ) We conclude this section by revisiting the Fundamental Theorem of Invertible Matrices and incorpo rating some results fro m this chapter. 1&.
Theora. 6.30
T he Fundamental Theorem of Invertible Matrices: Version 4 Let A be an /IX II matrix and let T: V --+ IV be a linear transformation whose mat rix [J]co-u with respect to bases 6 and C of V and W, res pectively, is A. The following statements are equivalent: a. A is invertible. b. Ax :::: b has a unique solution for every b in R". c. Ax :::: 0 has o nly the trivial solution. d. The reduced row echelon for m of A is 1..e. A is a prod uct of elemen tary matrices. f. rank(A) = /I g. nullity(A) = 0 h. The column vectors of A are li nearly independent. i. The column vectors of A span IR". j . The colum n vectors of A fo rm a basis for H". k. The row vectors of A are linearly independen t. I. The row vectors of A span IR ". m . The row vectors of A form a basis for R". n . det A,#O o. 0 is not an eigenvalue of A. p. T is invertible. q. Tis one·to· o ne. r. T is onto. s. ker(T)::::! OI t . range( T) :::: W
Prool The equ ivalence (q) ¢:;> (s) is Th eorem 6.20, and (r) ¢:;> (t) is the defini tion of o nto. Since A is /l Xn, we must have dim V :::: dim W = n. From Theorems 6.21 and 6.24, we get (p) ¢:;> (q) ¢:;> ( r). Finally, we connect the last fi ve statemen ts to the others by Theorem 6.28, which implies that (a ) ~ (p).
In Exercises 1- 12,jilld the matrix [Tb_6 of the linear transformation T: V --+ IV with respect to the bases 6 alld C of V and W, respectively. Verify Theorem 6.26 for the vector v by computing T( v) directly and using the theorem. I. T:
bx) = b - ax, B = C = {I. x}, v = p(x} = 4 + 2x
2. T : 9/'1 _ 9/'1 defmed by T(a + bx) = b - ax, 6 = 11 + X, 1 - x},C = {l,x}, v :::: p(x) :::: 4 + 2x 3. T: 'lJ> Z--+!J>l defm ed by T ( p{x) = p{x + 2), B = {i. >; x' }. C = {I. x + 2. (x + 2)' }. v =
p(x ) = a + bx + cx 2
Section 6.6
4. T:~ ! --+!J1 defined by T( p(x» = p{x + 2),
B - {I, x + 2, (x + 2)'), e - {I, x, x' }. v "" p(x) "" a
+
bx
2
(a) Show that the d ifferential operator D maps Winto itself. (b ) Find the mal rixof Dwith respect to B = {fL<,e- h'J . (c) Compute the de rivative o f [(x) = C-" - 3e- h" indirectly, uSIng Theorem 6.26, and verify that it agfl~es with r(x) as computed directly.
[:~~n,
5 "" (I, x, Xl }, C = {e l, el } . v = p(x) ". a + bx + cx1
[~~~~].
6. T !J>, -+ R' d,fin, d by T( p(x» B - {x' ,x, I},e V ""
7. T:
p(x) = a + bx
-,
{:] -
th" cos x. t h" Sin x ).
, B-
{ [;].[_~]},
b
I I o, I , I I 0 0
8. Repeat Exercise 7 with v
,
v -
[-~]
= [:].
9. T: M Z1 -+ M12 defi ned by T(A) = AT, B = C = {E' l' Eu.
~l' ~l}' V = A = [:
!]
10. Repeat Excrcise 9 w ith B "" (Ell' Ell ' Eu. Ell} and C = {Ell> E2l , E22, Ell}' II. T: MI.2 -+ M22 defined by T(A ) = All - BA, where
B- [
I -I], B _ e - {E" , E E", E,,}, - I I
v - A - [:
(a) Find the matrix of D with respect to B = {th , eh" cos X, eh" sin x}. (b) Computethederivativeoff(x) = 3t h - tUcosx+ Zth" sin x indirectly. using Theorem 6.Z6, and verify that it agrees with 3S computed d irectly.
+ (X l
I
e -
~ 15. ConSider the subspace IV of~, given by W = span (t h",
W].[:]},
R2 --+ RJ d efi ned by a + 2b
rex)
~ 16.
Consider Ihe subspace Wof 9:J, given by W = span (cos x, sin x, xcos x, x si n x). (a) Find Ihe mat rix of Dwith res pect 10 5 = {cos X, sin x, x cos X, xsin x}. (b) Compute the d erivative of f(x) = cos x + Zxcos x ind irectly. using Theorem 6.26, and verify Ihat it agrees wilh f (x) as computed directl y.
III Exercises 17 alld 18, T: U -+ V and 5 : V-+ Ware Ii/lear tra/l sformatlOlIS (Illd 5, C. mul V are bases for V. V. lIIld W, respectively. Compute [S 0 T):D ..... 8 ill two ways: (a) by finding So T directly and thell compl/ting Its matrix and (b) by finding the matrices ofS anti T separately and using Theorem 6.27. 17. T: @I, -+ R 2 defi n ed by T(p(x)) =
u,
:]
d'finodby
[~~~]. S: RI-+ RI
J.Jl a]_ [a 2b],B _ {I,x}, 2a - b b
C = V = {e l . e2}
12. T:M21 -+M11 definedbyT(A) "" A - AT,B =
c= {Ell,E12'~l>~l}' V =
511
1l. 14. Consider the subspace W o f 2b, given by W - span (C'", e- 1., "),
+d
5. T ;@>l --+ R d efinedbyT(p(x» =
The MatriJ( of a Linea r TransformatIon
A = [:
~]
13. Consider the subspace Wof 2'b. given by W = span (slll X, cos x). (a) Show that the differential operator D maps IV into itself. (b) Find the matrix of 0 with respect to B = {sin x, cos x}. (c) Compute the d erivative of fix) "" 3 sin x - 5 cos x indireclly. using Theorem 6.26, and verify that it agrees with r ex) as computed directly.
18. T: ~' -+ ~l definedbyT(P(x» = p(x+ I), S:~l-+~ldefi n edbyS{p(x» - p(x+ I),
B: {I, x},e - V - {I,x,x'}
III Exercises 19-26, determine wller/ler tire lillear trfillsformatioll T is invertible by considering its matrix witll respect to the standard bases. 1f T is invertible. lise Theorem 6.28 and tile metlrod of Example 6.82 /0 fiml T- ' . 19. Tin Exercise 1
20. Tin Exercise 5
21. Tin ExeTClse3 22. T: (jJ> I -+ '!P 2 defi ned by T(p(x» = p' (xl . T: ~l-+'!Pl defined by T(p(x»
= p(x) + p' (x)
Chapler 6 Vector Spaces
51.
24. T: Mn --+ MZ2 defi ned by T(A ) = AU, when!:
it to compute the o rthogonal projection of v o nto W, where 3 v -
25. T in Exercise II
2
26. T In Exercise 12
C()1l1pare your answer with Example 5.11. I Him: Find an orthogonal decomposition oflVas W = w + W1. using an o rthogonal baSIs for W. See Example 5.3.1 39. Let T: V--+- Wbe a li near transform.llion between finite- dimensional vecto r spaces and let Band C be bases fo r Vand W, respectively. Show that the matnx of Twith respect to Ba nd C is un ique. That is, If A IS a matrix such that A[ v]o = [ T(v)] c for all v in V, then A = [1'Jc_B' {Hi"t: Find values of v that will show this. one column at a time.]
~ 11, ExerCIses 27-30, use the method of Example 6 83 to eVal'lnte the given inlrgml.
f 28. f
27.
(si n x - 3 cos x )llx (See Exercise 13.) Se- z.. (ix (Sce Exercise 14. )
29.
J (;S cos x -
30.
f
2i-" si n x) dx(See Exercise 15.)
(xcos x + xsin x ) dx(See Exercise 16.)
III Exercises 31-36, a lillear Ir(msfor/1lf1liml T: V--+ Vis give" If possil1le, find a 1n,sis C for V S'ld, IIJat the matrix [T1- ofT will, respect 10 C is dlagollal.
'1. T: R2 -t Rl definedbyJal ~ [ - 4b 1 Jl a +5b
33.
1:]
=
(tl
41. Show that rank(T) = rank(A).
[: +~]
T :gJI, --+ ~, d e finedbyT( a + bx) = (4a
42. If V = Wand lJ = C, show that Tis diagonalizable if and only if A is diagonalizable. + 2b)
+
+ 3b)x
34. T: @I, --+ ~ Zdcfined by T(p(x )) = p{x + I) l.&.35. T :@I,--+gJIl defined by T(p(x» = p(x) + xp'(x) 36. T : t~\ --+ ~2 definedby T(p(x)) = p(3x+ 2) 37. Let
ebe the line thro ugh the o n gin in R' with dire<:tion
[ ~:]. Use the method o f Example 6.85 to find the standard matrix of a reflectio n in e. vecto r d
III Exercises 40-45, let T: V --+ W be ,,/inear tran sformatio" belweetlfi"ite-dime"s;OIw/ vector splices V (IIlti ~v. LeI t3 (IIuJ C be bases for Vand \v, respectiw!iy, and let A = (T'Jc ...B'
40. Show that Ilullity( T ) .... nullity{A ).
b
32. T: R2 --+R2 defined by
- 1
=
38. Let IV be the plane in W with equation x - y + 2,z ::: O. Use the method of Exampl e 6.85 to fi nd the standard matTix of an orthogonal projectio n onto W. Verify that your answer is correct by usmg
43. Use th e results of this section to give a matrixbased proof of the Rank Theorem (Theo rem 6.19). 44. If 6 ' and C' are also bases for Vand W, respectively, what IS the relationsh ip between [ Tk "'B and (TJC' ...tI. Prove your assertion. 45. If dim V = "and dim W = m, prove that !i( V; W) iii ""'....,. (See the exerCises for Sectio n 6.4.) [I'I;"t: Let B and C be bases fo r Vand \V, respectively. Show that the mapping
Tilings, Lattices, and the Crystallographic Restriction Repeating pallerns are frequently found in na ture and in art. The molecular structure of crystals often exh ibits re petit ion, as do the tilings and mosa ics found in the artwork of many cultures. Tiling (or fessef/mion) is covering o f a plane by shapes that do not overlap and leave no gaps. The DUlch artist M. C. Escher (1898- 1972) produced many wo rks in which he exp lored the possibility of tiling a plane using fanci -
ful shapes (Figure 6.1 4).
, ! ~
•
nil"
'.1.
M. C. Escher's "Sym mclry Drawing EI03"
511
fI,.,. &.15
fI,.r. &.11
Invariance under translation M. C. Escher's "Symmetry Drawing EI 03~
Rotational symmetry !l.I. C. Escher's "Symmetry Drawing E103"
•
•
•
• • •
•
•
•
•
•
•
•
•
•
• •
•
•
•
•
"
fI,.,. &.1& A laUice
•
•
•
In this exploration, we wi ll be Interested in patterns such as those in Figure 6. 14 , whICh we assume to be infinite and repeating in all di rections of the plane. Such a patte rn has the properly that it can be shifted (or mil/slated ) in at least two directions (corresponding to two linearl y independent vectors) so that it appears not to have been moved at all. We say that the pattern is illl'arimrl under translations and has translational symmetry in these directions. For exam ple, the pattern in Figure 6.14 has translational sym metry in the d irections shown in Figure 6. 15. If a pattern has translational symmetry in two directions, it has translational symmetry in infinitely many directions. I. Let the two vectors shown in Figure 6. 15 be denoted by u and v. Show that the pattern in Figure 6.14 is invarian t under transltllion by any imeger linear combination of u and v- that is, by any vector of the form au + by, where a and b are integers. For any two linea rly independent wctors u and v in Rl, the sct of points determined by an integer linear combinations of u and v is called a lattice. Figure 6.16 shows an example of a lattice. 2.
I
Draw the lattice corresponding to the vectors u and v of Figure 6.15.
Figure 6.14 also exhibits rotatio"al symmetry, That is, it is possible to rOlate the entire pattern about some poi nt and ha\'e It appear unchanged. We say that it is invariatlt under such a rotation. ror example. the pattern of FIgure 6.14 is Invariant under a rotation of 120" about Ihe pOInt 0, as shown in FIgure 6.17. We call 0 a center of rotational sym metry (or a rotatio" ce"ter). Note that if a pattern is based 011 an underlyjng la ttice, then any symmetries the pattern must also be possessed by the lattice .
or
...
3. Explain why, if a point 0 is a rotation center through an angle 0, then it is a rota tion cen ter th rough every integer multiple of O. Deduce that if 0 < 0 <: 360°, then 360/ 0 must be an integer. (If 360/ 8 = II, we say the pattern or lattice has n-fold rOlational symmetry.) 4. What is the smallest positive angle of rotational symmetry for the lattice in Problem 2? Does the pattern in Figure 6.14 also have rotational symmetry through this angle? 5. Take various values of 0 such that 0 < 0 :$ 360" and 360/ 0 is an integer. Try to draw a lattice that has rota tional sym metry th rough the angle O. In partICula r, can you draw a lattice With eight-fold rotational symmetry? We will show Ihat values of 0 thai are possible angles of ro tatlonal symmetr y for a lattice are severely restricted. The techniq ue we will use is to consider rotatIO n Iransformati o ns in te rm s o f diffe rent bases. Accordingly, let Re deno te a rotation about the origin through an angle (J and let £ be the standard basis for R2. Then the sta ndard matrix of R, is
[ Re1£ -6.
1 cos O
[COS9 . Sill
-sin 0
8
Referring to Problems 2 and 4, ta ke the origin to be at the tails of u and v. (a) What is the actual (Le., numerical) value of [Role in this case? ( b) Let B be the basis !u, vi. Compute the matrix [Ro J6'
7. In general, let u and v be an y two linearly indepe ndent vectors in Rl and suppose that the lattice de termined by u and v is invariant under a ro ta tion through an angle O. If B = {u , v}, show tha t the matrix of R ~ with respect to B lUust have the form
a
[R. le = [ ,
I,]d
where a, b, c, a nd d are integers. 8. In the terminology and notation of Problem 7, show thai 2 cos 0 must be an integer. rHmt: Use Exercise 35 in Section 4.4 and Theorem 6.29. ] 9. Using Problem 8, make a list of ;111 possible values of 0, With 0 < fJ :$ 360", that can be angles of rota tional sym metry of a lattICe. Record the corresponding values of n, where n = 360/ 8 , to show that a lattice can have u-fold ro tational symmetry if and o nly If n "" 1,2, 3,4, o r 6. This resuit , known as the crysta ffograpllic restriction, was first proved by W. Barlow in 1894 . 10. In the library o r o n the Internet , see whether you can find an Escher tiling fo r each of the five possible types of ro tational symmetry- that is, where the smallest angle of rotational symmetry of the pattern is one of those speCified by the crystallographic restriction.
521
5%2
Chapter 6
Vecto r Spaces
,
Applications HOIIIOge8eOUS lIRear Dlfferenllal (quallons In Exercises 67-70 In Section 4.6, we showed that if y "" y( r) is a twice-d ifferen tiable function that satisfies the d ifferential equal ion y~+(/y'+by = O
(I)
then yis of the form
y "" CI~ I' + ~~" if "\ 1 and A2 arc distillct roots of the associated characteristic equation Al + aA + b "" O. (The case where ,.\ I = A2 was left unresolved.) Example 6.12 and Exercise 20 in this section show that the set of solutions 10 equation ( I ) forms a subspace of f!j, the vector space of functions. In this section, we pu rsue these ideas further, paying pa rticular attention to the role played by vector spaces, bases, and dimension. To set the stage, we consider a simpler class of examples. A differential equation of the form
)" +ay=O
(2)
is called a first-order. homogeneous, linear diffuelltial equation. ("First -order" refers to the fac t that the highest derivative that is involved is a fi rst derivative, and " homogeneous" means thai the right-hand side is zero. Do you see why the equation is "linear"?) A solution to equation (2) is a differentiable function y = y et) that satisfies equation (2) for all values of t. It is easy 10 check that one solution 10 equation (2) is y = e- ·'. (Do il.) However, we would like to describt all solulions--and th is is where vector spaces come in. We have the following theorem.
Theall. 6.31
The set S of all solutions to y'
+ ay =
0 is a subspace of ~.
• PrOD' Since the zero fu nction certainly satisfies equation (2), S is nonempty. Let x and y be two differe ntiable functio os of t that are in S and let c be a scalar. Then
x'+ax=O
and
y'+oy=O
so, using rules fo r differentiation, we have
(x
+ y)' + a(x + y) =
x + )"
+ ax + ay = (x' + ax) + ()" + ay)
::=
0
+0
"" 0
.nd
(",), + a(cy) = cy' + c(ay) = c(y' + ay) = c' 0 = 0 Hence, x
+ yand cy arc also in S, so 5 is a subspace of 30.
Now we will show that S is a one-dimensional subspace of30 and that {[-I is a basis. To this end, let x = x (r) be in S. Then, for all r,
x'(r) + ax (r) = 0 or x'( r) - -ax( r)
Section 6.7
ApplICat ions
523
Define a new function z ( t} = x( t)e-r. Then, by the Chain Rule for differentiation,
z'( r) = x(r)ae" + x'(r)e" = ax (r)'" - ax (r)'" = 0
Since z' mea ns tha t
IS
identicall y zero, z must be a constan t fu nction-say, z (t) = k. Hu t th is
x (t}e"r = z( t) = k
for all t
so x (t) :: kc- ·'. Therefore, all sol ut ions to equation (2) are scalar multiples of the single solution y = e-·'. We have proved the following theorem.
Theorem 6.32
If S is the solution space of y' + ay = D. then dim S = I and {c· ., I is a basis for S.
O ne model for population growth assumes that the growth rate of the population is proportional to the size of the population. This model works well if there are few restrictions (such as limited space, food , or the like) on growth. If the SIZe of the populatIon at time t is p(t), thcn the growth rate, or rate of change of the population, is its derivative 1" (t). Our assumption that the growth rate of the population is proportional to ils size can be written as
p'(r) = kp (r) where k is the proportionality conSlant. Thu s, p satisfies the differenti31 equation p' - kp :: 0, so, by Theorem 6.32,
for some scal3r c. The constan ts c and k 3re determined uSing experimental data.
Ixample 6.81
The bacterium Esc/,cr;clJ;n coli (or £ col" for short) is commonly found in the intestines of humans and other mammals. It poses severe health risks if it escapes into the environment. Under laboratory conditions, each cell of thc baClen UIll divides into two every 20 min utes. If we start with a smgle E. coli cell , how Ill.my will there be after I day?
SGlullol We do not need to use differential equations to solve this problem, but we E. coli is mentioned in Michael
Crichton's novel The Allrirolfll'ria Slmin (New York: DC'II, 1969), altho ugh the' '''·illain- in that novel was supposedly an aliC'n virus. In real life, E. coli contaminated the town wdter supply of WalkC'rlOn. Ontario, in 2000. resulting in sn·en deaths and C1using hundrC'ds 0 people to bec;:ome.scriously ill.
will, in order to illustrate the basic method. To determine c and k, we use the data given in the statement of the problem. If we take I Ullit of time to be 20 minutes, then we are given that P(O) = I and p( I) = 2. Therefore. C"'"
c · 1 = ce l ·O = I
and
2::
ce t ' l =
t
It follows that k :: In 2, so
p(r} =
e " nt = e lnt ' = 21
Afler I day, t = 72, so the number of bacteria cells will be p( 72) = 271 (sec Figure 6. 18 ).
"..
4.72 X 1021
5U
Chapter 6 Vector Spaces
p(1) 5 X 1()21 4 X
4.72 X 10 21
---------------- ----
102 1
3 X 10 21
2
,, ,, ,, ,,
X 102 1
I X 1()21
o o
flgur.
'.1.
I
10 20
30 40
50 60
70\ 72
Exponential growth
me,)
Radioactive substances dec:ly by emitt ing radiation. If d enotes the mass of the substance at time /, then th e rate of decay is 11/'( f). Physicists have found that the rate of decay of a substance is p roportional to its mass; that is,
m'{ I) :: km(f)
or
m' - km
=
0
where k is a nf:'gative constant. Applying Theorem 6.32, we have
m(l) = ei' for some constant c. The time required for half of a radioactive substance to decay is called its half-Ii/I!.
Example 6.88
After 5.5 days, a t OO-mg sample o f rado n-222 decayed to 37 mg. (a ) Find a fo rmula for m(l), th e mass remaining after I days. ( b) What is the half- life of radon-222? (c) When will only 10 mg remain?
Sallll,.
(a) From
me,)"" d', we have 100 "" m(O) = ce k ' O = c · I = c
met) = lOOe k' W ith time measured in da ys, We are given that m(5.5) ::: 37. Therefore,
IOOe ut = 37 e~'Sl
so
= 0.37
Solving fo r k. we find 5.5k k=
Therefore. met ) = looe- o 18'.
= In(0.37)
In(0.37) 5.5
- - 0. 18
Se
Appl icatio ns
525
met)
100
80 60 40
,,, , ,, ,,
' 50
20
3.85 :
-1-4~-4~'~4-4-4-~4-~-' 2
4
6
8
10
Figure 6.19 Rad ioactive decay
(b ) To find the half-li fe o f radon-222, we need the value of t fo r wh ich met) = 50. Solving this eq uation. we fi nd
lOOe-()'lsr = 50 e- O.lsr = 0.50
so Hence,
- 0.181 = Inm = - In 2
,nd
t=
In 2 0.18
= 3.85
Thus, radon-222 has a half-life of approximately 3.85 days. (See Figure 6.19.) (c) We need to determine the value o f t such that met) = 10. That is, we must solve the equation
100e- o.18r
=
10 or
e- O. 18 r = 0.1
Taking the natural logarithm of both sides yiel d s - 0 . 181 = In 0.1 . Thus,
In O. l = 12.79 - 0.18
so 10 mg of the sample will remain after approximately 12.79 days.
See U,umr Algebra by S. H. Friedberg, A. J. Inse], and L. E. Spence (Englewood Cliffs, N/: Prentice-Hall,1979).
The solutio n set 5 of the second -orde r d ifferential equation y ~ + uy' + by = 0 is also a subspace of,* (Exercise 20), and it tu rns out that the dimension of 5 is 2. Pa rt (a) of Theorem 6.33, which extends Theorem 6.32, is implied by Theo rem 4.40. Our approach here is to use the power of vector spaces; doi ng so allows us to obtain part (b) of Theorem 6.33 as well, a result that we cou ld not obtain with our previous methods.
.,,' 13
•
tel S be the solution space of y~
+ II)" +
by'" 0
and let A. and A1 be the roots of the characteristic equation 11."1
+ (1'\ + b =
O.
a. If A. ¢ A2• then {e A" , e.4,~ is a basis for S. b. If A, ". A l l then teA"~. re"'-'} is a basis for S.
Rlmarlls • Observe that what the theorem says, in other words, is that the solutions of y" + ay' + by"" 0 arc of the form
in the first case and
in the second case. • Compare Theorem 6.33 with Theorem 4.38. Linear differential equal ions a nd linear recurrence relations have much in (ommon. Allhough Ihe former belong to CO III;IIUOUS mathematics and the latter to discrete mathematics, there arc many
pamllels.
'raal (3) We first show that {eA", eA,,} is contained in S. Let A be any root of the characteristic equation and let f( t) = C r, Then
and r( t) = A2eAr
f( t) = Ae'" from which it fo llows that
r + af + bl= A e
2 Ar
_ ( ,\2
+
+
A
/lAe ' + be aA+ b)e'"
Al
= O·e A• = 0
Therefore, fis in S. But, since AI and A2are roots of the characteristiC equation, this means thai e A" and eA,. are in S. The set {e A" , eA,.} is also linearly independent, since if CleM
+
'2 eA" = 0
then, setung t = 0, we have
Next, we set t = 1 to obtain cl e A, - ele A, ;;;: 0
°
or
el(e A, - eA,)
=0
But e A, - e A, "" 0, since eA, - e A, = implies that e A, = eA" which is clearly impossible if At A2• (See Figure 6.20.) We deduce that e. ;;;: 0 and, hence, C7 = 0, so {col. ,·. eA, ,} is linearly independent . Since dim S = 2, teA", eA,,} must be a basis for S.
*"
Section 6.7
Applications
52'
)'
------------ ,
,, ,, ,, ,,
- ---
Flgur. &.20
(b ) You are asked to p rove this property in Exercise 21.
Example 6.89
r:ind all solutio ns of y. - 5y'
+ 6y =
O.
III.U,. The characteristic equation is Al - SA + 6 :::: (A - 2) (A - 3) = O. Thus. the roots are 2 and 3, so le2', ell} IS a basis for the solution space. It follows that the so lutions to the given equation are of the form
The constants C1 and ~ can be de termined If additional equations, called bou"d· ary conditions, are specified.
Example 6.90
r:i nd the solut io n of y.
+ 6y' + 9y "
0 that satisfies r(0) = 1, y'(O) = O.
Sa1111 •• The characterist ic equation is A< + 6A + 9 = (A + 3)2 = 0,50 - 3 isa re· peated root. Therefore, le- Jr , te- lr ) is a basis for the solution space, and the general solution is of the form
The first boundar y co ndition gives
1 = y(0) == so y =
(, -J '
l O cle- '
+0
= c.
+ ~ te-l'. Differentiating, we have y' = - 3e-)r
+ (2( -31e- lr + e- J ')
so the second boundary condition gives
0= )"(0) = - 3e- 3"!l + 'i(0 + e- l ' U) = - 3 +
(2
Chapler 6 Vector Spaces
528
~
0'
= 3
Therefore, the required solution is
y
=e
J,
+ 3Ie- 1, = (1
-+
Theorem 6.33 includes the case in wh ich the roots of the characteristic equation are complex. If A = P + qi is a complex root of the equation,\2 + aA + b "" 0, then so IS its conjugate'\ ::= p - qi. (See Ap~n d ictS C and D.) By Theorem _6.33(a), the solution space S of the differential equation y" + ay + by ::= 0 has Ie-I', I '} as a basis. Now eA. = e(p... .,.) ' "" el"e'<'" "" el"(cos ql
and
so
eA' ::= e(p- .,."}r ""
e!" cos ql =
eP'1--.) ""
e-I' + e-I' 2
+ isin ql)
eP'(cos ql - i sin qt)
e-I' - e-I'
and eI" sin ql =
2,
It foll ows that leI" cos qt, eP' sin qt\ is contained in span(~', i') = S. Since el" cos ql and eI" sin ql are linearly independen t (see Exercise 22) and dim S :: 2, le P' cos ql,
Sin ql} is also a basis for S. Thus, when its chara cteristic equation has a complex root p + qi. the differential equation y" + ay' + by ::= 0 has solutions of thc for m e l"
y - (l ei" cos (II + 0.e!" sin ql
Example 6.91
Find all solutions of y" - 2y'
+ 4 = o.
Solallol The characteristic equalion is ,.\2 - 2,.\ + 4 = 0 with roots I ± iv3. The foregoing discussion tells us that the general solutio n to the given differential eq uation is
bam pIe 6.92
A mass is atlached to the end of a vertical spring (Figure 6.21). If the mass IS pulled
d ownward and released. it will oscillate up and down. 1'''''''0 laws of physics govern this situation. The fi rst, Hooke's Law, states tha t if the sprmg IS stretched (or compressed) x units, the force F nceded to restore It to its origll1al position is proportIOnal to x: F = - kx
.,>
~
G1
~
0 x
flaur • •.21
-<~
where k is a positive constant (called the spring constant ). Newton's Second Law of Mafia" states that fo rce equals mass times acceleration. Since x = X(I) represents diStance, or displacement , of the spring at time I, X gives its velocity and x" lis acceleration. Thus, we h3ve
mx" :: -kx o r x!" +
(!.)x
= 0
'"
Since both k and m 3re posit ive, so is K = kIm, and o ur differential equation has the form x " + Kx. "" 0, where K is positive. The characteristic equatio n is ,.\ l + K = 0 wit h roots ±iYK. Therefore, the general solution to the differential equatio n of the osci llating spring is x =
'I cos vKt + 0. sin YKt
Sectioll 6.7
Applications
529
Suppose the spring is at rest (x = 0) at tim e I = 0 seconds and is stretched as far as possible, to a length of 20 em , before it is released. Then
0= x(O) =
( I
cosO
+ 'i sin O::=
(I
so X= S sin -vKI. Since the maximum value o f the sine fu nction is I, we must have S = 20 (occurring fo r the first time when t ::: ,"/2 VK), giving us the solut io n x = 20 sin W I
See Figure 6.22. x
20
x - 20 sin
Vii
10
-t-+- ~' - 10
- 20
Flllr.6.22
Of course. this is an idealized solution, since It neglecls any for m of resistance and predicts that the spring will oscillate forever. It is possible to lake damping effect s (such as friction) into account, but this simple model has served to introduce an im portant application of differential eq uatio ns and the techniq ues we have developed .
--tlinear Codes \'I'e now turn our attentIOn to Ihe most important, and most Widely used, class of codes: linear codes. In fact, many of the examples we have al ready looked at fa ll inlo this category. NASA has made extensive use of Imear codes to transmit pICtures from ou ter space. In 1972, the Mariner 9 spacecr
Definition
A p-ary linear code is a subspace C of Z;. , Ie.
530
Chapter 6
V«tor Spaces
Figure 6.23
Figure 6.24
The southern polar cap of Mars
Jupiter's red spot and the rings of Sa!Urn
As usual, ou r main interest is the case p = 2, the binary linear codes. Checking to see whether a subset cor l.l is a subspace involves showing that Csatisfies the condi tions of Theorem 6.32. Since in l.l the only scalars are 0 and I, checking to see whether C is closed under scalar multiplication o nly involves showlIlg that C conta in s the zero vector. All that remains is to check that C is closed under addition,
(Kample 6.93 Are C I
-
0 0 , 0 0
0 0 I I
,
I
I
I
I
0 0
,
and ~ =
I I
I 0 I 0 , 0 , 0 I 0 0
(binary)
linea r codes?
SolallOI
CI clearly contains 0 and is closed under addition, so it is a linear code. is not closed unde r add ition , since it does not contain I
I
0
o + o
0
0
I
I
c;
H ence, Cl is not linea r,
For the remainder of this section, we will d ispense wi th the adjective " binary," since all of the codes we will be consid ering will be binary codes. If a linear code C is a k-dimensio nal subspace of l.~, then we say that C is an (n, k ) code.
ExamDle 6.94
(a) The code CI in Exam ple 6.93 is a subspace o f l.~ and has dime nsion 2, since
o o , I
I
I I
o o
Section 6.7 Applications
531
is a basis for CI • (In fact. CI has exactly three d ifferent two-clemen t bases. What are the other two? See Exercise 31. ) Hence, C I is a (4, 2) code. (b) The (7.4) Hamming code H introduced in Section 3.7 is a (7. 4) linea r code (fortunately!), in our new terminology. It is linear because il has a generator matrix G, so its vectors are all the vectors of the form Gx, where x is in But this is just the column space of the 7X4 matrix Cand so is a subspace of Since the four colulllils of C are linearly independent (why?), they form a basis fo r H. Therefore, H is a (7, 4) code.
l;.
l ;.
(c) The codes c ~
o o, o
I
o
I
0,1,0,\
I
o
1
1
o
I
0 I
arc d ual codes. It is easy to see that each of these is a linear code, that d im C = \, and that dim CJ. = 2. (Check these claims.) Therefore, C is a (3. I) code and CJ. is a (3. 2) code. The fact that 3 = I + 2 is not an accident, as the next theo rem shows.
Theorem 6.34
Let Cbe an (n, k) linear code. a. The dual code CJ. is an ( /I, /I - k) linear code. b. C contains 21 vectors, and CJ. contains 2n - 1 vectors.
Prool (a ) Since C is an (II, k) linear code, il is a k-d imensional subspace of l2' Its dual CJ. is Ihe orthogonal complement of Cand so is also a subs pace of l~, by Th eorem 5.9(a}. Thus, CJ. is a linear code. Now we can apply Theorem 5. 13 to show that dim
C-
=
/I -
dim C =
/I -
k
(Note: Theorems 5.9 and 5.13 arc true if Rn is replaced by l2' This is the case fo r most of the nongeometric results about o rthogonality.) It follows that CJ. is an ( II, /I - k) code. (b) Let {vl"'" form
vd be a basis for C. Then the veclors in v = ' IV I
+
C2V 2
+ ... +
Care all the vectors of th e
c~vk
where each c, is either 0 or I . Therefore, there are two possibilities for ' . and, fo r each of these, two possibilities for "', and so on, making the total number of possibil ities for v 2X2X"'X2 = 2k ~
The Reed-MuHer codes are named
after Ihe computer scientist.s Irving S. Reed and David E. Mu ll er, who
...... 110m.,.
~
Thus, C contains exactly 2" vectors. Applylllg this formula to its (/I, /I - k) d ual code, we see that CJ. has 2 n - k vectors.
published papers, independently, about these codes in 1954.
We now construct one of the oldest famili es of linea r codes, the Reed-Muller codes. As mentio ned in the introduction to this section, this is the type of code fhat
531
Chapter 6
Vecto r Spaces
Recall that the binary, or base two, representation of a number arises frOIll writing it as a sum of distinct powers of two. If n = b.' 2' + ... + bl • 2 + boo where each b, is 0 or I, then in base two n is represented as ' 1 = b;· ·· blbo- For example, 25 16 + 8 + I = I' 2· + 1 · 2' + 0. 21 +0.2+ l,sothebinary representation of25 is 11001.
was used by the Marmer 9 spacecraft ("Q transmit pictures of Mars. In order to be transmitted, each photograph had to be broken down into picture elements, or pixels. This was done by overlaying the photograph with a 700X832 pixel grid and then assigning to each pixel one of 64 shades of gray, ranging from white (0) to black (63). Since 64 = 26, we can use binary arithmetic to represent each oflhese shades: white is 00000o and black is 111 111. We ca n then rewri te these 64 binary numbers as vectors in ~ and encode [helll using a code that corrects as many errors as possible. The code that was chosen for use by Mariner 9 belongs to a large fam ily of codes that are most easily defined inductively.
Definition
The (first-order) Reed-Muller codes R" are defined inductively as
fo llows: I. For
2. For
Ro
= 0, = /I 2: I , R~
/I
Zl = 10, l\. is the subspace of Zr whose basis consists of all vectors of
the form
where U IS a basis vector in R. _I' 0 is the zeTOvector in Z2r' ,lind I is the vector of is in Z12·-' • To get a sense of what vectors these codes contain, let's use the definition to constrUCt RI and R2• A basis for Ro = Zl is just III. so a basis for R[ is
{[:]. [~]} Thus, by closure under addition, RI must also con tain the vectors
[:1+ [~l [~l
[: 1+ [:1 [:1
. nd
It is easy to check that no other vectors can be obtained by addition, so
Sim ilarly, a basis for R2 is I
0
I
I
I
,
I
0
,
I
0 0 I I
and, by closure under addition, it is easy to check that the 8 = 23 vectors in R2 are
R2 =
0 0 , 0 0
0
0
0
I
I
I
I
0
I
I
0
0
I
I
I I
,
0 I
,
I
0
,
I
0
,
0 I
,
0 0
,
I I
$eelion 6.7
Applicallons
518
NotICe that in Rl every code vecto r except 0 and I has weight I, and in Rl every code vector except 0 and I has weight 2. This is a general property of the Reed-Muller codes, and we prove it as part of the next theorem. But first, note that the complement of a vector x in lj is the vecto r x obtained by changing all the zeros to Is and vice versa. For example, J
x=
J
°
<=> x =
J
Observe that x "" x
Theoll. 6.35
+
° ° ° J
I, where I is the vector co nsisting enti rely of [5.
+
Fo r tI 2:. 1, the Reed-Muller code R" is a (r, tI vector except 0 and I has weight 2"· I.
I) linear code in which every code
Prool We will prove this theorem by induction on 11. For n "" I, we have already seen that Rl = Z~ is a (2, 2) = (2 1, I + 1) linear code in which every code vecto r except 0 and I has weight 1 = 21- 1• Assume that the result is true for 17 "" k; that IS,assume that Rl is a (2", k + I) linear code in which every code vecto r except 0 and I has weight 2k- l. Now consider Rk... 1• By constructio n, RH 1 has a basis consisting of vectors of the for m in Rl , together with the vector
[ ~] , where u is
[ ~ ] . By the inductio n hypothesis, the vectors u, 0, and
zf" . Mo reover, the dimenSIOn of Ri is k + I. so there are k + I vectors of the fo rm [ ~] and a ile more, [ ~ ]. It fo llows I are in
zf; hence, the basis vcctors for
R l +1
are in
that the dimension of R1+. is k + 2, and therefore Rk + 1 is a (2HI, k + 2) linear code. For the fin al assertio n, note that the vecto rs in R l+l are obtamed as linear combinations of the baSIS vectors and so are of the fo rm V =C I
["'J
+"'+C1+1
U1
["'' J
+C1+2
Uu I
[0] I
zi.
where lUI'" ., Uk ~l} is a basis for Ri' 0 and 1 arc in and each c, is 0 o r 1. Suppose v :l 0, 1 and I~t u = 'l U I + .. + ' HI Ul+I' (Hence, u is 111 Rk .) If Ck+ l = 0, then u :I 0, I, so, by the induction hypothesis, u has weight 21 - 1• !lut then v has weight 2 . 21 - 1 = 2~. l f "+l = I, then v has the form
v~ [:] + [~J = [u~ .] = [~J where u is in Rt . Sin ce
w(ii) = 2' - w(" ) (why?), we have w(v) = w(u )
+ w(U)
as requ ired. This completes the induction, and for all /I 2: 1.
w~
=
2l
conclude that the theorem is true
Chapter 6 Vector Spaces
534
As noted. Mariner 9 required a code with 64 = 26 vectors. By Theorem 5. the Reed-Muller code f4, has dimension 6 over 1 2, As you will see in the next chapter, it is also capable of detecting and correcting multiple errors. Thai is why the Reed-Muller code was the one that NASA used for the transmission of the Mariner photographs. Exercises 3S-38 explore further aspects of this Important class of codes.
1A HOmOgeneOuS linear Dlllerentlal Equations /n Exercises /- /2, find Iile so/ulioll of Ille differential equation Iilal SlItisfies the givell bOlllldcl f Y condilion(s}. \. ( - 3y = O,y(I ) = 2 2. x'
+ x = O,x( l )
3.)'" - 7(
4, x"
+ x'
+
= 1
12y = O,y(O) = y( 1) = I
- 12x
= 0, .«0) = 0, x'(0)
5. r - r - /= 0,/(0 ) -
=
I
0,f( 1) = I 6. (, - 2g = O,g(O ) = 1,g(1) = 0 7. y" - 2(
+ Y = O,y(O)
= y( 1) = I
8, x" + 4x' + 4x = 0, .«0)
*
= I,
x'(0 ) = I
•. y" - I'y = 0, k O,y(O) = ( 0) = I 10.y· - 2k( + I'y= O,k O,y(O) = l,y(l) = 0 II. 2f' + 5/ = 0,/(0) = 1,/(,,/ 4) = 0 12. /I" - 4/r' + Sir = 0, h(O) = 0, 1r'(0 ) = - I 13. A strain of bacteria has a growth rate that is proportional to the size of the population. Initially, there are 100 bacteria; after 3 hours, there arc 1600. (a) If p( t) denotes the number of bacteria after t hours, fi nd a formula for p{ t). (b) How long docs it take for the population to double? (c) When will the population reach one million?
r-
CAl
*
14. Table 6.2 gives the population of the United States at IO-year intervals for the years 1900-2000. (a) Assuming an exponential growth model, use the dal'a for 1900 and 1910 to find a for mula for p(t), the population in year t. ( H im: Let t = 0 be 1900 and let t "" I be 1910.) How accurately does you r formula calculate the U.S. population in 2000? (b) Repeat part (a), but use the data for the years 1970 and 1980 to solve for p( t). Does this approach give a better approximation for the year 2000? (c) What can you conclude about U.S. population growth?
Tabla 6,2 Year
Population (in millions)
1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000
76 92 106 123 131 ISO 179 203 227 250 281
Soli'''': u.s.
Bur~au O( lht Censu$.
15. The half-life of radium -226 is 1590 yea rs. Suppose we start wi th a sample of radium-226 whose mass is 50 mg.
(a) Find a formu ltl for the mass 11/(1) remaining after t years and use this formula to predict the mass remaining after 1000 years. (b) When will only 10 mg remain?
16. Radiocarbon datmg is a method used by scientists to estimate the age of ancient objects that were once living matter. such as bone, leather. ",'00<1, or paper. All of these con tain carbon, a proportion of which is carbon14, a radioactive isotope that is continuously being formed in the upper at mosphere. Since living organisms take up radioacti\'e ca rbon along with other carbon atoms, the ratio between the two for ms remaillS constant. However, when an organ ism dies. the carbon- 14 in its cells decays and is not replaced. Carbon- 14 has a known half-life of 5730 years, so by measuring the concentration of carbon- 14 in an object, scient ists can determine its approximate age.
Section 6.7 Applic;ltions
535
lei 0 = O( t) be the angle of the pendulum from the vertICal. It can be shown that if there is no resistance, then when 0 is small it satisfies the differential equation 0"
+ 8(J = O L
where 8 is the constant of acceleration due to gravity, approximatcly9.7 Ill/5 2. Suppose that L = I III and that the pendulum is at rest (i.e., 0 = 0) at ti me I = 0 second. The bob is then drawn to the right at an angle of 01 radians and released.
nllul6.25
(a) Find the period of the pendulum. (b) Does the period depend on the angle 0 1 at which the pendulum is released? This question was posed and answered by Galileo in 1638. [Gali leo Galiki (1564-]642 ) studied medicine as a studen t at the University of Pisa, but his real interest was always mathematics. In 1592, Galileo was appointed professor of mathematICS at the University of Padua in Venice, where he taught prima rily geomct ry and astronomy. He was the first to usc a telescope to look at the stars and planets, and in so doing, he produced experimental data in support of the Copernican view that the planets rcvolve around the sun and not the earth. Fo r this, Galileo was summoned before the Inquisition, placed under house arrest, and for bidden to publish his results. While under house arrest, he was able to write: up his research on falling objects and pendulums. H IS notes were smuggled out of Italy and published as Discourses OIl Two New Sciences in 1638.1
Stonehenge
One of the most successful applications of radiocarbon dating has been to determine the age of the Stonehenge monument in England (Figure 6.25) . Samples taken from the remains of wooden posts were found 10 have a concentratio n of carbon- 14 that was 45% of that found in livi ng material. What is the estimated age of these posts? 17. A mass is attached to a spring, as in Example 6.92. At time t = 0 second, the spring is stretched to a length of
to cm below I(S position at rest. The spring is released, and its length 10 seconds later is observed to be 5 cm. Find a fo rmula for the length of the spring at ti me t seconds. 18. A 50-g mass is attached to a spring, as in Exam-
ple 6.92. If the period of oscillation is 10 seconds, find the spring constant. 19. A pendulum consists of a mass, ca lled a bob, that is affixed to the end of a string of length L (sec figure 6.26). When the bob is moved from its rest position and released, it swings back and forth . Th e time it takes the pendulum to swing from its fa rthest right position to its farthest left position and back to its next farthest right positio n IS called the period of the pendulum.
20. Show that the solution set S of the second-order differential equatIOn y" + ar' + by = 0 is a subspace of ':!P. 21. Prove Theorem 6.33(b) . 22. Show that epr cos qt and cpr sin qt are linea rly independent.
linear Codes Wllie" of the codes in Exercises 23-30 are lim!tlr codes?
,,
,,
,, ,
23,
c-
{[~ltl'[;]}
24,
c-
{ [~l'[:]}
,
25. C = Fl •• ,. &.26
o o, o
I
I
0 , I , I
o
0 I
o
536
Chapter 6
26. C =
27. C
=
I 0 0 0 0 , 0 , I , 0 0 0 0 I
34. If C is an ( n, k ) linear code that is self dual, prove that II must be even. I Hinr: Use the analogue in Z'i of Theorem 5.13. J
0
35. Write oul the vectors in the Reed-Muller code Ry
0
0
,
0
,
I
0
28. C =
Vtctor Spaces
I
0
,
I
0
36. Defi ne:l fa m ily o f matrices Inductively as fo llows: Go = [ II and,for IIi:!: I ,
0 I
0
I
0
I
0
I
0
0
I
I
0
I
0
,
0
I
0
,
0 I
,
0
,
0
I
,
I I
I
29. T he even parity code En (Sec Exercise 18 in Section 5.5.) 30. The odd parity code O~ consisting of all vectors in Zi' with odd weigh t 3 1. Fi nd the othe r IwO bases for the code C 1 in Exam ple 6.94. 32.
Cal
If a (9, 4) linear code has generator matrix G and parity check matrix P, what arc the dimensio ns of
Gand P?
(b) Repeat part (a) for an ( n, k) linear code. 33. For a linear code C, show thai (C.l ) L = C witho ut usi ng matrices.
G, ~ [G", [O j G.. _ I t :
where 0 is a zero vector and I is a vector consisting entirely of Is. (a) Writ~ o ut GI , G., and Gl . (b ) Using inductio n, prove that fo r all n ~ 0, Gn is a
generator matrix for the Reed-MuUer code R,.. 37. Find a parity check ma trix for R2• 38. Find a parity check matrix for Rj •
39. Prove that, for a linear code
C. either all the code
vectors have even weight or exactly half of them do. ( Hint: Let E be the set of vectors in C with even weight and 0 the: sel of vectors in C with odd weight. I f 0 is not empty, let c" be in 0 and consider 0 ' "" Ie. + e : e in fl. Show that 0' "" 0.1
lew Definlllons and basis, 450 Basis Theorem, 457 change-of-basis matrix, 469 composi tion of linear transformat io ns, 48 1 coordinate vector, 453 dtagonalizable linear transformation, 5 13 dimension. 457 Fundamental Theorem o f I nvcniblc Matrices, 516 identity transformatIon, 478 invertible linear transformatio n, 482
isomorphism, 497 kernel of a linear transformation, 486 linear combin::u ion of vectors, 437 linear transformation, 476 linearly dependent vectors, 450 linearly independent vectors, 450 matrix of a linear transformation. 502 nullity of a linear transformation, 488 one- lo-one, 492 o nto, 492
range of a li near transformation, 486 rank of a linear transformation, 488 Rank Theorem, 490 span of a set of vectors, 442 standard basis, 451 subspace, 438 trivial subspace, 44 1 veclor, 433 vector space, 433 zero subspace, 44 1 zero transformalion, 478
Review Quesllons I . Mark each of the following statements tr ue or false;
(a) If V = span(v l •.•• , vn ), then every spanning set fo r V contains at least 11 vectors.
(b ) If lu, v, wi is a linearly independent set of vectors,
then so is lu + v, v + w, u + w}. (c) M21 has a baSIS consisling of invertible matrices.
531
Chap ter Review
(d)
M2 2 IS
has a basis consisting o f m atrices whose trace
zero.
(e) The transfor mation T: IR~ --+ IR defin ed by T(x) =: Ixl is a linear tran sfo rmatio n. (f) If T: V --+ W is a linear transfo rmation and dim V '* di m W, the n T can no t be both one-to-o ne and onto. (g) If T: V --+ W is a linear transfo rmation and ke r( T) = V, then W= {Ol . (h) If T: M n --+ ~4 is a linear transformation and nullity( T ) = 4, then Tis onto. (i) T he vector space V =: Ip(x) in Ql'4 : p{I ) =: OJ is isomorphic to @I,. If I: V --+ V is the identity transfo rmatio n, then the matrix [l] e.... /) is the identity matrix fo r any bases Band C o f v.
m
In QllestiollS 2-5, determine whether W is a subspace ofY. 2. V = 1R2, W=
U;]
:x
3. V = M 21 • W = {[:
2
+ 3l
=
~] : a + b
o} :=
c+ d
w~
(p(x) ;n '.J>" x'P(I / x) ~ p(x)) 5. V = fJ. W= {fin ?i;f(x + 17 ) = f(x) for all x}
'.J>"
6. Determ ine whe ther 11, cos 2x, 3sin 2xJ is linearly dependent or independent. 7. Let A and B be nonzero nX n matrices such that A is sym metric and B is skew-symmetric. Pro ve that lA , Bj is linea rl y independen t.
III QllcStiOIlS 8 (lnd 9,fi"d (I basis for Wand state the dimensioll ofW
8.W ~ {[: :] a+d ~ b+<} 9. W ~ (p(x) ;n '.J>" p( - x) ~ p(x))
III QrlestiollS 11- 13, determine whether T is a linear trmlsformation. II . T:
R2 --+ JRl d efined by "/1x) = yxTy, where y =
1 [2 ]
12. T : Mnn --+ M nn defined by T( A ) = A TA 13. T: Ql'n -+9Pndefi ned by T(p(x)) = p(2x- 1)
14. If T: W' l --+ M21 is a linear transfo rmation such that
T(I) ~[ ~ nT(I +X)~ [~ T( J + x + x 2 ) =
0 -I] [ I
:],nd
O. fi nd T(5 - 3x + 2x2 ).
15. Find the null ity of th e linea r transformation T: M m , --+ IR defin ed by T(A) = tr(A). 16. Let W be the vector space of upper triangular 2 X2
m atrices.
=a+ c=b+d } 4. V ~
I O. Find the change-of- basis mat rices Pc..../) and PB.....c with respect to the bases B = I 1, 1 + x, 1 + x + x"} and C = Ii + x,x+ xl, I + xl} of~ 2 '
(a) Find a linear transformation T : Mn --+ M21 such that ker(T) = W. (b ) Find a linear transform ation T : M12 --+ Mn s uch tha t range( T ) = W.
17. Find the matrix I T lc....a o f the linear transformation T in Question 14 with respect to the standa rd bases B = {I, x, Xl) ofQJl 2 and C = {Ell' E12 , ~ L> !;,2} of M n· 18. Let 5 = {VI' ... , v n} be a set of vectors in a vector space V with the property that every vecto r in V can be written as a linear combination of V I' .•. , V n in exactly one way. Prove that 5 is a basis for V. 19. If T: U --+ V and 5: V --+ Ware li nea r transformations such that ra n ge(T) C ker(S), wha t can be deduced about So T? 20. Let T: V --+ V be a linear transformatio n, and let {VI' ... , v n } be a basis fo r V such tha t {T(v , ), ... , T(v n )} is also a basis fo r V. Prove that Tis inve rtible.
iSlanc I
A stralght/ine may be the shortest dislmlce betwun two points, but il
is by no lIlea/lS the most . . mteresrmg. -Doctor Who In ''The Time Monster" By Robert Sloman BBC,1972
A/though Illis may seem a pnradox, all exact sCIence is dominated by the idea of approximation. - Bertrand Russell In W. H. Auden and L. Kronenberger, eds. Tile Vikillg Book of Aphorisms Viking, 1962, p. 263
B
A
Fluurll.1 Taxicab distance
538
Ii
1.0 Introduction: Taxicab Geometrll We live in a three-dime nsional Euclid ean wo rld , and therefore, concepts fro m Euclidean geometry govern our way of looking at the world. In particular, imagine stopping people on the street and asking them to fill in the blank in the following ." They will a lmost sen tence: "The shortest distance between two points is a certainly respond \vith "straigh t line." There a re, however, other equall y sensible and intuitive notions of d istance. By allowing ourselves to think of "distance" in a more flexible way, we will open the door to the possibility o f having a "distance" between polynomials, funct ions, mat rices, and many other objects that arise in li near algebra. In this section, you will dIscover a type of "distance" that is every bit as real as the straight-line distance you are used to from Euchdean geometry (the one that is a consequence of Pythagoras' Theorem). As you'll see, this new type o f "distance" still behaves in som e fam iliar ways. Suppose you are standing at an mtersection lt1 a city, trying to get 10 a restaurant a l anolher intersection . If you ask someone how far il is to the restaurant, that person is unlikely to measure distance "as the c row flies " (i .e., usmg the Euclidean version of distance). Instead, the response will be someth ing like " It's five blocks away." Since thIS is the way taxicab drivers measure dis tance, we will refer to this notion of "dIstance" as taxicab distance, Figure 7.1 shows a n exam ple of taxicab d istance. The sho rtest path from A to B req uires traversing the Sides of five city blocks. Notice that although there is more than one route from A to B, all shortest ro utes requ ire th ree horizontal moves and two ve n ical moves, where a "move" corresponds to the SIde of one city block. (How many shortest routes are there from A to B?) Therefore, the taxicab distance from A to B is 5. Idealizmg thIS situation, we will assume that all blocks a re unit squares, and we WIll use the notatIon d,(A, B) for the taxicab distance from A to B.
Problell 1 Find the taxicab distance between the followrng pairs of points:
(,) ( 1,2)ood(S,5)
(b) (2,4),nd (3, - 2)
(e) (0,0) ,nd (- 4, - 3)
(d) (- 2,3) ,nd (I, 3)
(e) (I, D and{ -L D
(f) (2.5,4 .6)and(3 . 1,1.5)
Section 7.0
Introduction: Taxicab Geometry
539
Proble. 2 Which of the following is the correct formula for the taxicab distance d ,(A, 8 ) between A = (a l • a2) and B = ( hI> b2)?
(,) d,(A, B) ~ (a, - b,) + (a, - b,) (b) d,(A, B) ~ (la,1- lb,l) + (I.,I- lb,l) «) d,(A, B) ~ la, - & ,1 + la, - b,1 We can d efi ne the taxicab " orm of a ve
Ilvll, ~ d,(v, 0) Problem 3 Find
~ v ll ,
fo r the followi ng vectors:
[-!l
~ [ -~ l
(b)
F
«)V ~[ =~l
(d)
v~
(.) v
[;l
Problem 4 Show that Theorem 1.3 is true for the taxicab norm. Problell 5 Verify the Triangle Inequality (Theorem 1.5), using the taxicab norm and the fo llowing pairs of vccto rs:
Problem 6 Show that the Triangle Inequality is true, in general, fo r the taxicab norm. In Euclidean geometry, we can define a circle of radius r, centered at the origin, as the set of all xsuch that ~ x ll : : : r. Analogously, we can defme a taxicab cirde of radius r, centered at the origin, as the set of all xsuch that Il xll, = r.
Problem 1 Draw taxicab circles centered at the ongin with the following radii: Proble. 8 In Euclidean geomet ry, the value of 7T is half the circumference of a un it circle (a circle of radius I ). Let's define taxicab pi to be the num ber 7T, that is half the circumference of a taxicab uni t circle. What is the value of 7T,? In Euclidea n geometry, the perpendicular bisector of a line segment A8 can be d efined as the set of all points that arc eq uidistant from A and B. If we use taxicab diStance instead o f Euclidean dis tance, it is reasonable to ask what the perpendicular bisector of ~i ne segment now looks like. To be precise, the taxicab p erpendicular bisector of AB is the set of all points X such that
d,(X, A)
~
d,(X, B)
Problem 9 Draw the taxicab perpendicular bisector of A B for the follow ing pairs of points:
(. )
A =( 2 ,1), B~ (4 , 1 )
«)
A = (I , 1), B
~
(5,3)
(b) A (d) A
= ( -J,3), B~( - I , - 2) ~
(I,
1),Jl ~
(5,5)
As these problems illustrate, taxicab geometry shares some properties with Euclidean geometry, but it also differs in some striking ways. In this chapter. we will
Chapter 7
Distance and AI'Proxim;lIion
encounter several other types of distances and norms, each of which is useful In lts own way. We will try to discover what they have in common and use these common properties to our advantage. We will also explore a variety of approximallon problems Hl whICh the notion of "distance" plays an important role.
Inner Produci Spaces In Chapter I, we defined the dot product U ' v of vectors u and v in IA:~, and we have made repeated use of this operation throughou t this book. In this section , we will use the properties of the dot product as a means of defining the general notion of an illllcr product. In the next sec tion, we wili show that inner products can be used to define analogues of "length" and "distance" ;n vector spaces other than R~. The following definition is our starting point; it is based on the properties of the dot product proved in Theorem 1.2.
DannUlon
An inner product on a vector space Vis an operation that assigns to every pair of vectors u and "II in Va real number (u, v) such that the following properties hold for all vectors U, "II, and w in Vand ali scalars c l. (u, v) = (v. u) 2. (u. v + w) :::: (u, v) + (u, w) 3. (ro, v) = q u, v) 4. (u, u) <2: 0 and (u , u) :::: 0 if and only if u = O.
A vector space wit h an inner product is called an inner product space.
•.••r.
l echnically, thISdefinition defines a real inner product space, since it assumes that Vis a real vector space and since the inner product of two vectors is a real number. There are complex inner product spaces tOO, but their defi nition IS somewhat different. (See Exploration: Vectors and Matrices with Complex En tries at the end of this sect ion. )
Elample 1.1
IR~
is an inner product space with (u, v) :::: u ' v. Properties (I) through (4) were verified as Theorem 1.2.
The dot product is not the only inner product that can be defined on R-.
Elample 1.2
1
Let u :::: [ .. , and v :::: III
[v'1be two vectors in 1R. V2
defines an inner prod uct.
2 • Show
that
Sectio n 7.1 Inner Product Spaces
SallUOI
Ul
We must verify properties ( I) th ro ug h (4). Property ( 1) holds because
Next, let w
=::
[::J.
We check that
+ 211l Wl +
=: 214 l Vl
= =:
3ul v!
+ 3Ul Vl ) + (u, v) + (u, w) (211 1V I
+
(2111WI
311 l Wl
+
3ul wl )
which proves property (2). If e is a scalar, then
(eu, v) = 2(eu.)v1 + 3(CI42 )V1 =
c(2 11 jVI
+
3u Zv1)
= ({u, v) which verifies p roperty (3). Finally,
and it is dear that (u , u) = zU: + 314 = 0 if and o nly if II , = III = 0 (that is, if and only if u = 0). This verifies property (4), completing the proof that (u, v), as defined. is an in ner product.
Example 7.2 can be generalized to show that if w •• ... , w~are poSItive scalars and
u, • •
and
v =
u. are vecto rs in
"
'.
Rn, then
defines an inner product o n An, called a weighted dot product. If any of the weights w, is negat ive o r zero, then equation (I ) does not define an inner product. (Set Exercises 13 and 14 .) Recall that the dot product can be expressed as u' v = u Tv. Observe that we can wnte the weighted dot product in equation ( 1) as (u. v) = uTWv
5U
Chapter 7
Distance and Approxi mation
where W is the nX n diagonal matrix
IV -
w,
0
0
0
0
0
0
0
o
0
0
w.
The next example fu rther generali1.t:s this Iype of inner product.
EIIIIPle 1.3
Lt:t A bt: a symmetric, positive defi nite /I X n matrix (see Section 5.5) and let u and v be vectors in Rn. Show that
defines an inner product.
SollllOl
We check that (u, v) ;; uTAv ;; u ·Av = Av· u = ATv·u _ (vTA)T . u = vTAu = (v, u)
Also,
and
Finally, since A is positive definite, (u, u) = uTAu > 0 for all u U TAu = 0 if and only if u = o. This establishes the last property.
To illustrate Example 7.3, lei A = [
4
-2
'* 0, so (u , u)
:2
-2]
7 . Then
The matrix A is posi l'ive defi nite, by Theorem 5.24, since Its eigenvalues are 3 and 8. I-Ience. {u, v) defin es an inner product on H2, We now defi ne some inner products on vector spaces other than IRn,
EllmPle 1.4
In ~2, letp(x) = llo+ (l IX+ (l2x2andq(x) = bo +
(p(x). q(x)) = nobo +
b,x + /'2X1, Show that
(llb l
+
(l2b l
defin es an inner product on £1>2' (For example, if p(x) = I - 5x + 6 + 2x then (p(x), q(x» = 1· 6 + (- 5) . 2 + 3 · ( - 1) "" -7.)
r,
3r and q(x)
E:
SOlutiOD Since (jp 2 is isomorphic to H), we need only show that the dot product in R~ is an inner product, which we have already established.
Section 7.1
Example 1.5
Inner Product Spaces
543
Let f and g be in «5 [a, h] , the vector space of all continuous fun ctions on the closed interval [a, bJ. Show that
(f, g) defines an inner product on '€ [a,
Solution
We have
(f, g)
~
r
[( x)g(x) dx
•
hI.
r
~
~
[(x)g(x) dx
•
r
g(x)[(x ) dx
~ (g,f)
•
Also, if Ii is in '€ la, hI , then
(f, g + h) =
r r
f(x)(g(x) + h(x)) dx
•
([(x)g(x) + [(x)h(x)) dx
•
r
[(x)g(x) dx +
•
= (f,g) + (f, h) If c is a scalar, then
(of, g)
~
r
r•
[(x)h( x) dx
,[(x)g(x) dx
•
~
,r
[(x)g(x) dx
•
~
Finally,if, f) =
f
clj. g)
b(f(X»2 dx 2: 0, and it follows from a theorem of calculus that, since f
• is continuous.lj.f) =
r
(f(X» l dx = 0 if and on ly if f is the zero fun ction . T he refore,
•
(f, g) is an in ner product o n '€ [n, hI.
Example 7.5 also defines an inner product o n any subspaceoftfl. [a, bJ. For example, we could res trict our attent ion to polynom ials defined o n the interval [a, bJ. Suppose we consider '!J> [0, II> the vector space o f all polynomials o n the interval [0, 1J . Then, using the inner product of Example 7.5, we have
{x 2,1 +
~ = (x2(1 + x) dx = ( X2+ x') dx o
=
0
[Y! + x4jl = .!.+.!.=:~ 3
4
(I
3
4
12
544
Chapter 7
Distance and Ap proxi mation
Properties ollner Produc.s The following theorem summarizes some additional properties that follow from the definition of inner product.
I
Theare. 1.1
Let u, v, and w be vectors in 3n inner product s.E.3ce Vand Jet c be a scalar: a. (u+v,w) = (U,W/+ {V,W) b. (u, cv) = C(U,VI c. lu , O)~IO,v)~ O
i
We prove property (a) , leaving the proof of properties (b) and (c) as Exercises 23 and 24. Referring to the definition of inner product, we have
ProOf
(u + v, w) = (w, u +
VI
By (1)
"" (w, ul + (w, VI
By (2)
= (u, W I + {v, WI
By (I)
lengtl, Distance, aDd OrtbOgOD8111V In an inne r product space, we can defi ne the length of a vector, distance between vec\ors, and orthogonal vec!ors,just as we did in Section 1.2. We sim ply have to replace every usc of the dOl product u . v by the more general inner product (u, v). Ii
DeHnlllon
Let u and v be vectors in an inner product s ace
I. The length (or "orm ) of v is ~ v~ = V (v, v). 2. The distance between u and v is d( u, v) = IIu - v~. 3. u and v are orthogonal if (u, VI = o.
Note that I v~ is always defin ed, since (v, v) 2 0 by t he definition of inner product, so we can take the square root of this nonnegative quan tity. As in Rn, a vector of length I is called a u,lit vector. The unit sphere in V IS the set S of all untt vectors in V.
Example 1.6
ConsIder the inner product on <eto, IJ given in Example 75. If f(x) = x and g(x) = 3x - 2, find
(,) II!I
(b) d (f,g)
Soll11al
(a ) We find that If, f)~
'" Ifl
«) If,g)
~ VIf,f) ~ 1/ V3
( ' p{x ) dx = ( IX2dX= xJjl =..!. Jo Jo 3 0 3
Section 7.1
(b) $; nce d(f, g) ~
U-
if -
5_5
~I ~ VI] ~
[ (x ) - 8(x) we ha ve
Inner Product Spaces
g,f - g) , nd x - (3x - 2) ~ 2 - 2x
g,f - g):: ((f(X) - g(x}f dx
=
,
r,
~
2( 1 - x)
4(1 - 2x + Xl ) dx
:: 4[X _X2+fL : Combining these fact s, we see that d (f,g ) :: (c) We compute
(f,g) =
r,
f(x)g(x) dx =
r,
V4fj = 2/v'3..
x(3x - 2) dx = ((3X 2 - 2x) dx = (x 3
,
-
X2J~ =
0
Thus,f and g are o rthogonal. It is impo rtan t to rem ember that the "d ist ance" between f and g in Example 7.6 does 1I0t refer to any m easurem ent related to the graphs of these functions. Neither does the fac t that [and g aTe orthogonal mean that their graphs intersect at right angles. We aTe sim ply appl ying the defimtion of a particular inner p roduct. However, in d oing so, we shou ld b e guided by the corresponding not ions in R2 and 1R1, where the inner product is the do t product. The geometr y of Euclidean space ca n still guide us here, even though we cannot visualize things in the same way.
EKample 1.1
Using the inner p roduct o n R2 defi ned in Example 7.2, draw a sketch of the unit sphere (circle).
SDlnllon If X = [ ; ], then (x, x) = 2xl + 3r·. Since the uni t sphere {circle} consists of all
x such that Ixll 1=
= I,wehave
I x~ = V{x, x) = \12x!- + 31
or
2X 2
+ 3y2 =
Th is is the equation of an ellipse, and its gra ph is shown in Figure 7.2.
y 1
y'j
-+- -HH-+-+-+- -I-l-l- -I-+-I-l-~x 1 1 -v'2 \ii
ngure 1.2 A unit circle thaI is an ellipse
I
546
Chapter 7
Distance and Approximation
We will discuss properties o f length. d istance, and o rthogonality in the next section and in the exercises. One result tha t we will need III this section is the general ized version of Pythagoras' Theorem. which e" tends Theorem J .6. ,
, e
i
Pythagoras' Theorem let u and v be vectors in an in ner product space V. Then u and v and only if
Proof
As you will be asked to prove
lu + v ~2
-
Exercise 32, we have
+ v, u +
(u
11 foUows immediately th:lI l u
In
+ vl1 1 =
v) = l u l 2
l ul2
+
2(u , v) + ~ v112
+ Ilvll if and o nl y if(u , v} ::: O.
on.ogonal Prolecllons and Ihe Gram-Scbmldl Process In C hapter 5, we disc ussed orthogonality in R~. Most of this material generalizes nicely to general inner product spaces. Fo r example. an orthogonal set of vectors in an inner product space V is a sct {V I' ... , vll o f vectors from V such that (v•• v) = 0 whenever v, :1= v}' An orthonormal sd of vectors is then an o rthogonal set of unit vecto rs. An orthogonal btuis for a s ubspace Wof V is just a basis fo r W tha t is an orthogonal set; sim ilarly, an orthonormal basis for a subspace Wof V is a basis for W that is an o rt ho norm al set. In IR". the G ram-Schmidt Process (Theorem 5.15) shows that every subspace has an o rt hogonal basis. We can mimic the construction of the Gram -Schmidt Process to show that every fin ite-dimensional subspace of an inner product space has an orthogo nal basis-all We need to do is replace the do t product by the more general inner product. We illustrate this approach wit h an example. (Compare the steps he re with those in Example 5.1 3.)
~
Example 1.8
Construct an o rthogonal basis for ~2 with respect to the inner product
(f, g) -
r-,
f(x)g(x) dx
by applying the Gram -Schmidt Process to the basis
SallUIi
l..etx l = I, x l = x, andx , =
\1. x. xl \.
xl. We begin
by sctting
VI
=
Xl
=
I.
Next we
comp ute
(VI, VI) =
II
dx = X]I
-I
- I
= 2
and
(V I' X ~ =
' xdx = X-,], =0 J 2 _I
-1
Se
541
Inner Product Spaces
Therefore,
•
To find vJ ' we first compute
Ji
J_]'
J
'J'
2
x2 dx =~
ij
• Adri!'n Marie Legendre ( 1752- 1833) was a French mathematician who worked in ast ronomy, number theory, and elliptic functi ons. He was involv(d in several heated disputes with Gauss. Legendre gave
the first pu blished statement of the law o f quadratic reciprocity in number theory in 1765. Gauss, however, gave the first rigorous proof of this result in 1801 and
claimed credit for the resul t, prompting understandable outrage
"
3 _]
(v1• x~ =
xl dx = ~'J' ' J _]
4
_]
=
0.
T hen
It follows that {v ]. v 1 • v31 is an orthogonal bas is for polynomials I.
1!J'2
on the in terval [ - I, I I. The
x,
are the first th ree Legetldre polytlomials. If we divide each of these polynomials by its length relative to the same inner produc t, we obtain normalized Legendre polynomials (see Exercise 4 1).
from Legendre. Then in 1806, Legendre gave the first published
apphcation of the method of least squares in a book on the orbits of co mets. Gauss published on the same topic in ] 809 but claimed he had been using the method since 1795, once again in fll riating Legendre.
Just as we did in Section 5.2, we can define the orthogonal projection proj w( v) of a vector v onto a subspace W of an inner product space. If !u ], . .. , U k} is an orthogonal basis for W, then
Then the compotlent olv orthogonal to W is the vector
, perpw(v)
fluure 1.3
As in the Orthogonal DecompositIOn Theorem (Theorem 5. 11 ), proj w( v) and perpw( v) are o rt hogonal (see Exercise 43), and so, schematically, we have the situation illustrated in Figure 7.3. We will make use of these fo rmulas in Sections 7.3 and 7.5 when we consider approximation p roblems-in particular, the problem of how best to approximate a
541
Chapter 7
Distance and Approx imation
given fun ction by "nice" funct ions. Consequently, we will defer any examples until then, when Ihey wi ll make more sense. Our Immediate use of orthogonal projection wi ll be to prove an Inequality thai we first encountered In Chapter I.
The Cauchy-Schwarz and Triangle Inequalities The proofs of Iden ti ties and inequalities involving the dot prod uct in IR" are easily adapted to give corresponding results in general in ner product spaces. Some of these .are given in Exercises 3 1-36. In Section 1.2, we stated without proof the CauchySchwarl lnequa lity, which is important in many branches of mathematics. We now give ;\ proof of this result.
Theor•• 1.3
The Cauchy-Schwarz Inequality Let u and v be vectors in an inner product space V. Then
with equality holding if and only if u and v are scalar multiples of each other.
Proof
If u
0, then the Illcqualit y is actually an equal ity, since
=
ko.vj = 0 = IIOIH This inequali ty was d iscovered by several different mathemat icia ns., in several different contexts. It is no surprise that the name of th e prolific Cauchy is attached to I t. The second name associa ted with this result is that of Karl Herman Amandus Schwdrz ( 1843- 192 1), a Ge rm an mathema tician who taught at the Unh'ersity of Berlin. His version of the inequali ty that bea rs his name was pubhshed In 1885 in a paper that used lIltegral equations to study surfaces o f minimal area. A third name also associated wi th this important result is that of the RUS$ian mathema tician Viktor YakovievilCh Bu nyako~")' ( 1804- 1889). Bunyakovsky published the Inequahty in 1859, a full quartercentury before Schwarz's work on the same subjec t. Hence, it is more proper to refer to the result as the Cauchy-Hunyakovsky-Schwar1. Inequal it y.
(u. v) If u 0 , then let Wbe the subspace of V span ned by u. Since proj w(v) = ( ) u and u.u perp wv = v - proh,,( v) are orthogonal, we can appl y Pythagoras' Theorem to obtain
"*
~ vf = I proh,.(v) =
+ (v - prohv(v)) 1 2 = Il projw(v) + perpw(v)11 2
I projw(v W + i pcQ)w(v) 1
(2)
2
It follows that ! proiw(v)11 2 :s Ivf Now
. , I proJw(v)1
=
/ (u. v) (u. v) ) \ (u, u) u, (u, uJ u
(,u,c...v"c )' (u. ' )) ' (u. u) = .,. ( (u. u) (u. u)
so we have
Taking square roots, we obtain ku,v~ < l u lll v~. Clearly this last inequality is an equality if and only if ~ p rojw(v W = ~vl2. By equation (2) this IS true if and only if perp\,,( v) = 0 or, equivalently,
.
(u. v)
v = proJw(v) = ( )u u. u
Section 7.1
Inner Product Spaces
If this is so, then v is a scalar multiple of u. Conversely, if v perpw(v) = v - pro; w(v) = ell -
549
= cu, then
(u, cu)
c(u, u)
( ) u = cu U, u
( )u = 0 U, u
so equality holds in the Cauchy-Schwarz Inequality. For an alte rnative p roof of this inequality, see Exercise 44. We will investig'lte some interesting consequences of the Cauchy·Schwan.: Inequality and related in· equalities in Ex ploratio n: Geomet ric Inequalit ies and Optimization Problems, which follows this section. For the moment, we use it to prove a generalized version of the Tria ngle Inequality (Theorem 1.5).
Theore. 1.4
The Triangle Inequality Let u and v be vectors in a n inner product space V. Then
lu + vi " lull + I vl Proal
Starting with the equality you will be as ked to prove in Exercise 32, we have ~u
+ vf
I u02 + 2{u, v) + Dvl1 " Il u~' + 21(u, v)1+ Ilvi' 2 :S lul1 + 211 ullllvil + ~ v112 = (lui + ivi)' =
By Cauchy. Schwarl.
Taking square roots yields the result.
1M Exercises 1 tllld 2, leI u = [ _ (0)
(u,v)
Ib)
lull
~ J and v = Ie)
[!].
Compute
6. (P(x), q(x» is the inner prod uct of Example 7.5 o n the vector space '!J'2 [0, II .
d( u,v) ~
I. (u , v) is the inner product of Example 7.2. 2. (u, v) is the inner product of Example 7.3 with A =
4 -2]. [ - 2
5. (p(x), q(x» is the inner product of Example 7.4.
7. In Exercise 5, find a nonzero vector orthogonal to p(x). 8. In Exercise 6, find a nonzero vector orthogonal to p(x) .
Exercises 9 alld 10, let j(x) = sill x alld g(x) = sill X + cos x in the vector space'€ [0, 21T ].
~ 111
7
3. In Exercise I, find a nonzero vecto r orthogonal to u. 4. [n Exercise 2, fi nd a nonzero vector orthogonal to u .
9. Compute (a)
In Exercises 5 and 6, lei p(x) = 2 - 3x + 1 - 3x2. Compute la) (pi x), q(x»
Ib) Ip(xl i
x2
alld q(x )
Ie) d(p(x), q(x))
(f, g)
(b)
If I
(e) d( f,g )
10. Find a nonze ro vector orthogonal to f :::0
I L Let a, b, and c be dis tinct real numbers. Show that
(" x), q(x » = p (alq(o)
+ p(b)q(b) + p(elq(e)
550
Chapter 7
DIstance and App roximallon
defines an inner product on 'lP 1. [Hlnt:You will need the fact that a polynomial of degree 1/ has at most n zeros. See Appendix D.l
26. (2v - w, 3u
+ 2WI
27. ~ u + v~
28.
12. Repeat Exercise 5 using the inner product of Exe rcise II with a = 0, b = I, c= 1.
1 2u -
3v
+ wll
29. Show that u + v = w. [Hmt: How can you use the properties of inner product to verify that
u+v-w = O?j Tn Exercises 13-18, determine which of the four inner product aXIOms do not hold. GIve a specific example ill each case. 13. Let u =
[~J and v = [ ~J
14. Let u =
["' Jand v [v'J =
142
(u,
VI =
15. Let u =
(u , VI =
II I VI
l~J
III
v!
III
R2. Define (u, v) = u,v).
III Exercises 31- 36,(u, v) is an inner product. In Exercises 3134, prove that the given statement is l/fl Identity.
Jt1
H2. Define
31. (u +
v, u - vi ~ Ilull' - lvii' 32· llu + vI' ~ 1IuI ' + 2(u, v) + Ilvll'
V2
ulv!.
-
l~:J
and v =
+
33. ~ ull l III [Rl. Define
35.
112 Vl'
V
=
112
VI defilles all illner product 011 [R:l,
[vv!'J. Fmd a symmetric matrix A
silch that (u, VI = u TAv. 19. (u, v) = 20. (u,
VI =
411 1VI
III VI
+
II I v!
+
IIlV I
+
+ lll I V! + lulvi +
2L (u,v) = 22. (u, v) =
+ ! 411 1VI + II I VI
=
SU1V1
[ '"',' J.."nd v
V~l v~ if and only if u and v
+ ~ V~2 if and only ifu
In Exercises 37--40, apply the Gmm-Schmldt Process to the basis 13 to obtaillllll orthogollal baSIS for the mner product space V relatIve to the given mner product. 37. V = [Rl, 13 =
{[~l [: ]}, with the ltlner product in
= { [
~ J, [:J},with the inner product
39. V= 'lP1,B = {I, I + x, I product in Example 7.4
~ [Vv,' J·
+ x + Xl} , with the inner
40. V = 'lP 2 [0, I], [3 = {\, I + x, I + x + x"}, with the U2V1 II I V2
inner product in Example 7.5
+
Il l VI
+
41' l V2
~ 41. (a) Compute t he first three normalized Legendre
polynomials. (See Example 7.8.) (b) Use the Gram-Sch midt Process to find the fourth normalized Legendre polynomial.
Tn Exercises 25-29, suppose thai u, v, alldw are vectors inner product space such that (u,v)~ I,
~ I,
(u, w)
=
5,
(v, w)
=
0
Ivl
~
0, H I ~
2
ill
all
42. If we multiply the Legendre polynomial of degree n by an appropriate scalar we can obtaltl a polynomial L"(xl such that L"( I) = 1.
(a) F;nd r,,(x), L,(x), L,(x), and L,(x). (b) It can be shown that L"(x) satisfies the recurrence relation
Eva/llate the expressiolls ill ExerCISes 25- 28.
(u + w, v - w)
- V~ 2
immediatel y following Example 7.3
24. Prove Theorem 7.I(c).
25.
+ ~ l lu
+ v~ ' - iIIu Prove that ~ u + vii = II u -
3S. V = IR', B
23. Prove Theo rem 7.1 (b).
Ilul
V ~2
Example 7.2
4U l V 1
/11 ExerCISes 21 alld 22, sketch the IInit circle ill R2for the
givell inner product, where u
+
36. Prove that d(u, v) = V~ u 112 and v are orthogonaL
IS. In M)2> define {A, B) = det(AB).
[u,Jand
~ vf = ~~ u
are orthogonal.
17. In 0'" defin' (p(x), q(x)) ~ p( I)q( I).
III Exercises 19 and 20, (u,
+
34. (u , v) = i llu
16. In 0'" d'fine(p(x), q(x)) ~ p(O)q(O).
where u =
30. Show that, in an inner product space, there can not be unitvectorsuandvwith(u,v) < - I.
Lix) =
2n - \
n
xL"_ I(x) -
11 -
n
Section 7.1
for all ,, ! 2. Verify this recurrence for ~(x) and L)(x). Then usc it to compute ~ (x) and 4(x). 43. Verify that if W is a subspace of an inner product space Vand v is in V, then perp ",(v) is orthogonal to all w in W. 44, Let u and v be vectors in an inner product space V. Prove the Cauchy-Schwarz Inequality fo r u 0 as follows: (a) Let t be a real scalar. Then (t u + v, lu + 11 ! 0 for all values of t. Expand this inequality to obtain
*"
Inner Product Spaces
551
a quadratic inequality of the form
at1+bt+c!O What arc a, b, and c in terms of u and v ? (b) Use your knowledge of quadratic equations and their graphs to obtain a condition on 11, b, and c for which the inequality in part (a) is true. (c) Show thai, in terms of u and v , your condition in part (b) is eq Uivalent 10 the Cauchy-Schwarz Inequality.
.....---.. •,
--
.,.~
l
.....
.-
.
Vectors and Matrices with Complex Entries In this book, we have developed the theory and applications of real vector spaces, the most basic example of wh ich is ~H. We have liiso explored the fi nite vector spaces Z; a nd their applications. T he set C" of /I· tu ples of complex numbers is also a vector
space, with the complex numbers C as scalars. The vc(:tor space a..aoms (Section 6.1) a ll ho ld for C~, and co ncepts such as linear independence, basis. and dimension carry oyer from IRH without d iffi culty. The fi rst notable di fference between R~ and C~ is in the defi nit ion o f dot product. Ir we defi ne the dot product in
C~ as in Rn, then fo r the no nzero vector v :;;: [ :] we
have
Ivl = Vv· v This is dearl y an undesira ble situation (3 nonzero vector whose length is zero) and v iolates Theorems 1.2( d ) and 1.3. We now generalize the real dOl product to C" in a way that avoids this type o f difficu lty.
Dellnltlon
If u =
", and v = ",
arc vectors in
e" , then
'.
dOl product of u and v is defined by
Th e norm (or length) o f a complex vecto r v is defined as in the real case: i vii - V v ' v. LIkewise, the distance between two complex vecto rs u and v is still d efi ned as d{u, v ) = Il u- vl.
m
v, I.
Show that, for v = v.
2.
land v = [2 - 3;]. Find : I 1 + 5;
Let u = [ ;
(,) u· v
(b ) lui (e) Ilvl (d) dI U. v ) (e) A nonzero vecto r orthogonal to u (f) A n onzero vector o rthogonal to v
The complex dot product is an example o f the morc general notion of a complex inner product. which satisfies the same conditio ns as a real mner product with two exceptions. Problem 3 provides a summa ry. 3. Prove that t he complex dot product satisfies the followi ng properties for all vecto rs u, v, and w in en and all complex scalars. (a) u-v = v 'u (b) u ' (v + w) = u ' v (c) (e u) · v :::c( u · v)
(d) u' u
~
0
and
+
u .w
and
u'
U ".
u · (cv)=c{u · v) 0 if and only if u = o.
For matrices wi th complex en tries, addit ion, mul ti plication by complex scalars, transpose, and matrix multi plication are all defined exactl y as we did for real matrices in Section 3. 1, and the algebraic properties o f these operations still hold . (Sec Section 3.2.) Likewise, we have the notion o f the inverse and detcrmmalll o f a square complex matrix just as in the real case, and the techniques and pro pe rt ies all carry ove.r to the complex case. (See Sections 3.3 and 4.2. ) The notion of transpose is, however, less useful in the complex case than in the real case. The fo llowing definition provides an alternative.
Dellnltloa
If A is a complex matrix, then the conjllgat~ rmnspose of A is the matrix A· ciined by
-
In the preceding definition, A refers to the matrix whose entries are the complex conjugates of the corresponding ent ries of A; that is, if A = [a'J]' then X = [ a~]. 4. (a) A =
Find the conjugate transpose A· o f the given matrix:
[ ;Uj _i
2- i (e) A = [ 4
(b) A =
3
+
o
3i
- 2 ] 3 - 4i
(d )
A=
[ +' 5- 2'] 5
2;
- I
3;
0
1- i
4
1
+
,.
,.
- ,. 1+ ; 0 Properties of the complex conjugate (Appendix C) extend to matrices. as the next problem shows.
5. Let A and /J be complex mat rices, and let c be a complex scalar. Prove the followin g properties:
(, ) A = A (c) CA = A
c
(e)
(b) A
-
+
B= A
- -
+
B
(d) AB = A B
(A"jT = (iiT) 55.
The properties in Problem 5 can be used to establish the following properties of the conjugate transpose, which are analogous to the properties of the transpose for real matrices (Theorem 3.4 ).
6. Let A and B be complex matrices, a nd let c be a complex scalar. Prove the fol· lowing properties: (b) (A + B)" = A" (d) (A 8 )* = 8"A"
(a) (A")" = A (e) (cA)' ~ ""
+ 8*
7. Show that fo r vectors u and v in C·, the complex dot prod uct satisfi es U'v = u"v. (This result is why we defined the complex dot product as we did. It gives us the analogue of the form ula u • v = U T v for vectors in IR:n.) For real matrices, we have seen the importance of symmetric matrices, especially in our study of diagonalization. Recall that a real mat rix A issymmetrtc if AT = A. For complex matrices, the foll owing definiti on is the correct generaliza tion. Hermitian malrlces are na med after the French mathematician Charles "Iermite ( 1822- 190 1). Hermite IS best known for his proof thll.t the numb« t is triA nscendental, but he also was the first to use the term orlilogoltal mll t rlus, and he proved tha I symmet ric (and Hermitian ) matrices have real eigenvalues.
Dellnltlon
A square complex matrix A is called
Hermitian if A" = A-that
tS, i it is equal to its own conjugate transpose. .m
8.
Prove that the diagonal entr ies of a Hermitia n matrix must be reaL
9. Which of the following matrices are Hermitian? ( a) A
=
(e) A =
( e) A
=
L
(b) A
2 i
= [ - I
2 - 3; I
-3 [I
(d) A
~
- 5i
I - 4; 3
o
3
2
- 3
0
- I
- 2
I
0
(f) A=
I
+
+
4;
2
3 - ; .
,
-,
o
•
j
3
0
- 2
0
2
1
- 2
I
5
10. Prove that the eigenvalues of a Hermitian matrix are real numbers. I Hint: The proof of Theorem 5. 18 can be adapted by making use of the conjugate transpose operation.] Ii. Prove that if A is a HermItian matrix, then eigenvectors corresponding to distinct eigenvalues of A a re orthogonal. {Hint: Adapt the proof of Theorem 5.19 using u >v = u" v instead of u · v = uTv.] Recall that a real square matrix Q is orthogonal if Q-I p rovides the complex analogue.
Dellnlllin
=
QT. The next defi nition
A square complex matrix V is called unitary if V -I
=
lust as for orthogonal matr ices, in practice it is not necessary 10 compute You need only show that V'" U = I to verify that V is unitary.
55.
l?
u=1
directly.
12. Which of lhe following matrices are unitary? For those that are uni tary, give their inver~s.
il V> - il V>] (. ) [ il V> il V> 3/ 5 (c) [ 4i/ 5
-4/5] 3;/ 5
(b)
1+; [
1+1,.]
1- i
- ]
i)/ v.
0
o
1
(I + (d )
+
(- I -
i)/ 0
0
Un ita ry matrices behave in most respects like orthogonal matriccs. The following problem gives som e alternative characterizations of unitary matnccs.
13. Prove that the followin g statements are eqUivalent for a square complex matrix U: (a) U is unitary. (b ) The colum ns of U form an orthonormal set In C· with respect to the complex dot product. (c) The rows of U form an orthonormal set in C" with respect to the complex d ot p roduct. (d) IIUx ll = II xll fo"",,,. in C", (e) Ux.·Uy = x·yforevery xandyi n C".
( HIIII: Adapt the proofs of Theorems 5.4-5.7.) 14. Repeat Problem 12, this time by appl ying the criterion in part (b ) or part (c) of Problem 13. The next definitio n is the natural generalization of orthogonal diagonahzability to complex matrices.
Definition
A square com plex matrix A is called ,mitt/rUy diagollt/lizable i( the re exists a unitary mat rix U and a diagonal matrix Dsuch that U*AU- D
The process for diagonalizi ng a unitarily diago nalizablc /I X /I matrix A mimics the real case. The colu mns of U musl form an ortho norm al basis for C" consisting of eigenvectors of A. Therefore, we ( I) compute the eigellvalues of A, (2) find a basis fo r each eigenspace, (3) ensu re tha t e,lCh eigenspace basis consists o f orl honorm,ll vec10rs (usi ng the Gram-Schm idt Process, with the co mplex do t product. if necessary), (4) form the matrix U whose columns are thc orthonorm al eigenvectors just fo u nd. Then U· AU w!Ube a diagonal matrix Dwhose dl3gonal en tries are the eigenvalues of A, arranged in the same order as the corresponding eigenvecto rs in the co lumns of U. 15. In each of the fo llowi ng, fi nd a ullltary matrix U and a diagonal matrix D such that ~AU = D.
(. ) A =
(c) A =
( b) A =[ ~ (d ) A
=
1
o
o
0
2
I - ;
o
1+ i
3
...
Ser Linear A/gebra Wllh APP/,calja n s by S_J. Lton (Upper SadJJe River, NJ: Prentice- Hall, 2002).
The matrices ltl (a), (c) , and (d) of the preceding problem are all Hermitian . It turns out that every Hermitian matrix is unltanly diagonalizable. (This is the Complex Spectral Theorem, which ca n be proved by adapting the proof of Theorem 5.20.) At this point you probably sllspect that the converse of this result must also be truenamely, that every unitarily diagonalizablc matrix must be Hermitian. But unfortunately Ihis is false! (Can you see where the complex analogue of the proof of Theore m 5.17 breaks down?) For a spe<:inc counterexample, take the matrix in part (b) of Problem 15. 1t is not Hermitian, but it IS unitarily diagonalizable. It turns out that the correct characterization of unitary diagonalizability is the following theorem, the proof of which can be found in more advanced textbooks.
A squa re complex matrix A is ullitarily tliagotUlliz.tlble if and only if
A*A = AA*.
A matrix A for which A*A ::: AA· is called normal.
Show that every Herm it ian matrix, every unitary matrix, and every skew· Hermitiall matrix (A* = - A) is normal. (Note that III the real case, this result refers to symmetric, orthogonal, and skew-symmetric matrices, respectively.) 16.
17. Prove that if a square complex matrix is unitarily diagonalizable, then it must be normal.
Geometric Inequalities and Optimization Problems This 6ploration will introduce some powerfu l (and perhaps surprising) applications of various inequalities, such as the Cauchy·Schwarz Inequality. As you will see, certain maximization/mini mization problems (optimization problems) that typically arise in a calculus course can be solved without using calculus at all! Recall that the Cauchy· Schwarz Inequality in RN states that for all vectors u and v,
with equality if and only if u and v are scalar multiples of each other. If u ::: [XI • . . Xn J r and v = [Yl ... yN JT, the above inequality is equivalent to
+ ... + x,.y 1-< Ixy I I ~-
\lx Il + ... +
V + ... + y'N
X" l y2 I
Squaring both Sides and using summation notation, we have
(±x.y,)' " ( ±xi) ( ±y,' ) ,- I
"1,1
Equality holds if and only if ther~ is some scalar k such that y, :=: h , for i = I •...• n. Let's begin by using Cauchy-Schwarz to derive a special case of one of the most useful of all inequalities. \.
Let x and y be non negative real numbers. Apply the Cauchy-Schwa rz In-
equality to u
""
[~] and v = [ 1] to show that x +y
(1)
2 with equality if and only if x = y.
(a) Prove inequality ( I ) di rectly. [1-11111: Square both sides.] (b) Figure 7.4 shows a ej rcle with cenler 0 and diameter AB :=: AC + CB = x + y. The segment CD is perpendicular to AB. Prove that CD = and use this result to deduce Inequality ( I ). (Him: Use simIlar triangles.] 2.
..;x:y
The right-hand side of incqualit y ( 1) is the familiar arithmetic mea" (or (Ivcrage) of the numbers x and y. The lefl -hand side shows the less familiar geometric m ean of x and y. Accordingly, inequality ( I) is known as the Arithmetic Mea,,-Geometric Mean I"equality (AMGM). It holds more generally; for /I nonnegativt variables x l" .. , x.' it stales
fl,." U
•
"vxIXZ' ''x"s xO'~+.:...;x~,_+;;-._.'~+=x='J
"
with equality if and only if Xl =
~
- ... = x n' "
,
In words, the AM GM Inequality says that the geometric mean of a set of nonnegative numbers is always less than or equal to their arithmetic mean, and Ihe two arc the same precisely when all of the numbers are Ihe same. (For the general proof, sec Appendix B.) We now explore how such an mequa lity can be applied to oplimization probl ems. Here is a tYPIcal calculus problem,
111m pie 1.9
Prove thai among all rectangles whose penmeler is 100 units. the square has the largest area.
y
,
$.1111t1 If we let X and y be the dimensions of the rectangle (see Figure 7.5), then the area we want to maxim ize is gIven by A = xy
y
We arc given that the perimeter satisfies 2x
+ 2y
= 100 551
which is the same as x Inequality:
.~ v xy
~
+y
,
=:
x+ y
50. We can relate xy and x
+y
using the AMGM
or, equivalently,
Since x + Y "" 50 is a constant (and this is the key), we see that the maximum value o f A = xy is 502/4 = 62 5 and it occurs when x = y = 25.
Not a derivative in sight! Isn't that impressive? Notice that in th is maximization problem , the crucial step was showing that the right-hand side of the AM GM Inequality was cOlI$tant.ln a si milar fashio n, we may be able to apply the inequality to a minimization problem i f we can arrange for the left-hand side to be constant.
Example 1.10
Prove that amo ng all rectangular prisms with volume 8 m), the cube has the mini m um surface area.
Solution
As shown in Figure 7.6, if the dimensions of such a prism are x, y, and z, then its vol ume is given by
v
=
xyz
Thus, we are given that xyz = 8. The surface area to be minimized is
s=
2xy + 2yz + 2zx
Since this is a three-variable p roblem, the obvious thing to try is the version of the AMGM inequality for n = 3-namely, x+ y+ z vxyz S 3
43r-::
Figure 1.6
Unfortunately, the expression fo r S does not appear here. However, the AMGM in equality also implies that s = 2xy + 2yz + 2zx 3 3
" ~('xy)(2yz)(2zx) = '\Y( x1')'
=2~= 8 which is equivalent to S curs when
:>
24. Therefore, the minimum value of S is 24, and it oc2xy = 2yz = 2zx
(Why?) This implies that x = y = z = 2 (i.e., the rectangular prism is a cube) .
.4-
3. Prove that among all rectangles with area 100 square units, the square has the sm allest perimeter. 4.
. 558
What is the minimum value of f(x) = x
1
+-
x
fo r x
> O?
S. A cardboard box with a square base and an open top is to be constructed from a square o f cardboard 10 em on a side by cutting out four squares at the corners and folding up the sides. What should the dimensions of the box be in order to make the enclosed volume as large as possible? 6. Find the minimum value of !(x,y, z) = (x + y) (y + z) (z + x) if x, y, and zare positive real numbers such that xyz = I. 8 7. For x> y > 0, find the minimum value of x + ""7(-"-~).' [H im : A substitution might help. ] Yx Y The Cauchy-Schwarz Ineq uality itself can be ap plied to similar problems, as the next example illustrates.
Example 1.11
Find the maximum value of the functio n [(x, y, z) = 3x + y + 2z subject 10 the constraint X- + :l + :? = 1. Where does the maximum value occur?
Solullon T his sort of problem
usually handled by techniques covered in a multivariable calculus course. Here's how to use the Cauchy-Schwarz Inequality. The func tion 3x + y + 22 has the form of a dot p roduct, so we let IS
x y
3 u =
and
1
v =
z
2
Then the componentwise fo rm of the Cauchy-Schwarz Inequality gives
(3x + y + 2Z) 2 :s (3 2 + I} + 22)(K + / + Z2) = 14 Thus, the maximum value of our fun ction
IS
VT4, and it occurs when
x
3
y
= k 1
z
2
Therefore, x = 3k, Y = k, and z = 2k, so 3(3k) + k k = 1/VT4, and hence x 3/V14
9.
VT4. It follows that
1/V14 2/ V14
y
8.
+ 2(2k) =
Find thel11aXmlUlll valueofJ(x,y,z) = x + 2y + 4z subjecttox Find the minimum value of!(x,y,z) = Xl
10.
Find the maximum value of sin
II.
Find the po int o n the li ne x
(I
z'
2
...i
+ 2/ + Z 2 = L
+ 1+ '2 subject to x + Y + z =
10.
+ cos O.
+ 2y =
5 that is closest to the origin.
There are many other inequalities that can be used to solve optimization problems. The quadratic mean of the numbers xl' . . . , xn is defin ed as
+ x2
X,2+
"
"
559
If xl'" . ,x" are no nzero, their harmonic mean is given by
l/ x1
+
l/x2
"+ ... + I Jx~
It turns out that the quadratic. arit hmetic. geometric, and harmonic means are all related. 12.
Let x and y be positive real numbers. Show that _ -:-'_
2
x >
vx;
+y 2
"
,
xy~ I/ x+ l / y
with equality If and o nly if x = y. (The middle Inequality is j ust AMGM. so you need only establish the fi rst and third inequalities.) y
/
13. Find the area of th e largest rectangle that can be inscribed in a sem ici rcle o f radius r (Figure 7.7). ' ' (x. )')
14.
Find the minimum value of the function
y - r ,
£I,." 1.1
• 2x
•
,
(x + y)2 x
[(x,y) =
xy
forx,y > O. (Nim :(x+ y)2Ixy~ (x+ y)( l lx+ Ily)! 15. Let x and y be positive real n umbers with x mum value of
+y=
is if, and delennine the values of x and y for which it occurs.
I
-
a S61
I . Show that the mini-
Section 7.2
Norms 31ld Distance Functions
511
Norms and Dlslance Funcllons In the last section, you saw that it is possible to defi ne length and distance in an inner product spac~. As you will see shortly, there a re also some versions of these two concepts that are not defined in terms of an inner product. To begin, we need to specify the propert ies that we want a "length fu nction" to have. The followi ng definition docs this, using as Its baSIS Theorem 1.3 and the Triangle Inequality. --------------------------_.~.~=_ .E~~
DBnaltioa
A norm on a vector space Vis a mapping Ihat associates wit h each
vector v a real number IIv ~, called the nonn of v, such that the following pro arc satisfied for all vectors u and v and all scalars c: I.
Ivl 2: O, and Ivl ::: Oifand onlyifv = O.
2.
I
J.
rties
A vector space with a norm is called a nontJl!d linear spacr..
(umple 1.12
Show that in an inner product space, ftvft 501111'1
- V(v, v) defines a norm.
Clearly, V(v, v) 2: O. Moreover, V{v. \I} :::
o <=> (v, v) =
O<=> v = 0
by the definition of inner product. This proves propert)' I. For property 2, we only need to note that
levi = V(
-t
We now look at so me examples of norms that are not defined in terms of an inner product. Example 7. 13 is the mathematical generalization to R" of the taxicab norm that we explored in the introduction to this chapter.
lxallple 1.13
The sum " orm Ivl. of a vector v Jil 1\1:" is the sum of the absolute values of ils components. That is, if v - I VI •• v,,1 T, then
Show that the sum norm is a norm.
S.IIUII Clearly. l vl. = IVII + ... + lv"l 2: 0, and the only way toachieV(' equality is if lvd = .. . ::: I Y~ 1 ::: O. But this is so if and only if YJ = ... = Yo = 0 or, equivalently, v = 0, proving property J. For property 2, \VC see that ( V ::: ICY I ... CY" f, so
levi, = iev.l + ... + i" .i
=
kiO' .1 + ... + i,.1) = ki~ v ~ ,
562
Chapter 7 Distance and Approximation
Finally, the Triangle Inequality holds, because if u lI u
+ v~.
= <
~
=
.
[u l · ·· u JTthen ,
+ vI I + ... + IUn + vnl (Iud + 1', 1) + ... + (lu.1 + 1,,,1) (lu,1+ ... + lu.1i + (I'd + ... + 1'.11 lUI
~
M , + I vii ,
The sum norm is also known as the I-norm and is often denoted by Il vlll. On 1R2, it is the same as the taxicab norm. As Example 7. 13 shows, it is possible to have several norms on the same vector space. Example 7.1 4 ill ustrates another norm o n IR".
Example 1.14
The max norm ~ v~ ", of a vector v In IR" is the largest n umber among the absolute values of its components. That is, if v = [VI .. . v" J T, then
Show that the max norm is a norm .
Salulloa Again, it is clear that ~v ll m:> O.lf Il v~m = O. then the largest of Ivl l, .. . , I v~1 is zero, and so they all are. Hence, VI = .. . = Next, we observe that for any scalar c, ~ cv ll m ~
m"" {k,, I,···, I"'.I)
Finally, fo r u = [1,11 Ilu
+ v~",
=
..
1,1"
~
v~
= 0, so v = O. This verifi es property 1.
klmox {jvd,···, Iv.I} ~
l 'l~ v ll m
I T, we have
+ vii.···, 11,1" + v"i}
max{lul
s max{ lud + Ivd , · ··, 11,1"1 + Iv" l}
s max{ lu, I,· ··, I".I} + m",,{I' d,···, I'.I} ~ Il u!m+ Il vllm (Why is the second inequality true?) This verifies the Triangle Inequality.
The max norm is also known as the oo- norm or uniform norm and is often denoted by Il vllco. In general, it is possible to define a norm ~ vll p on IR" by
Ilvll , ~ (I' d' + ... + 1'.1')'/' fo r any real n umber p:> 1. Fo r p = 1, Il vll l = Ilvll., justifying the term I-norm. For p ~ 2, I vl1 2 = ( Ivd 2
+ ... + Iv"12 )1/2 =
Vvf + ... + ~
which is just the fam iliar no rm on [f;\! " obtained fro m the dot prod uct. Called the 2-norm or Euclidean norm, it is often denoted by ~ vIIE . As p gets large, it can be shown using calculus that ~ v~papp roaches the max nOfm ~v ll m. This justifies the use of the alternative notatio n Il v~co fOf th is norm.
Example 1.15
For a vector v in 1:2, defi ne norm.
I V~H
to be w( v ), the weight of v. Show that it defin es a
Section 7.2
563
Norms and DIstance Functions
Solullaa
Certainly. l vll H = w(v) ~ 0, and the only vector whose weight is zero is the zero vector. Therefore. property I is true. Sin ce the only ca nd idates for a scalar ca re 0 and 1, property 2 is Immediate. To verify the Triangle Inequality, first observe tha t if u and v are vectors in III then w( u + v) counls the number of places in which u and v differ. (For exam ple, if u ,", [1
10
Or
1
and
v =[ O
1
1
IV
I
Ij T,so w(u + v) = 3,inagrccIllcnt with thefae! that u and v di ffer ill exactly three posItIOns.) Suppose that both u and v have zeros in ' '0 positions and Is in ", P0511 1011S, Uhas a 0 and v has a I In ''0. positions. and u has a I and v has a 0 in " In p OSi t ions. ([n the example above, " 0 = 0, " I = 2, '~l = 2, and IIHl = I.) then u
+ v = II
0
I
0
Now
w(u) =
"I
+
" '0'
w(v) =
II ,
+
"0"
and
lI'(u + v) =
11 10
+
lit)!
Therefore,
lu + V~f/ =
w(u + v)
= 11 ,0
+
I~ I
+ "eo) + (tie + s: (II I + " 10) + (II I + (til
= lI'(u) T he norm
+ w(v)
=
t~)I)
- 2nl
t~)I)
lulH+ 1v\l 1I
Ivlll is called the Ha mming lIorm.
DIstance Ilnctlons For any no rm , we can d efine a distance funct ion just as we did in the last sectionIl
d(u, v) =
I
Let u = [
_~] and v =
[ - :
l
ru- vi
Compule d ( u , v) relative 10 (a) the Euclidean norm,
( b) the sum no rm, and (c) the max norm.
Solullan
Each calculation req uires knowing that u - v = [ _ ; ].
(a) As is by now quite familiar,
de( u, v) = Il u - v~, =
\I,' + (-3)' =
v'25
= 5
(bl d,( u, vi = Ilu - vii, = 141 + 1-31 = 7 (el d,.(u, vi = Ilu - vIm= ",,,\14 1, 1-3 11 = 4 The distance fu nction on Zi determined by the Hamming no rm is called the Ha mming distance. We will explore its use in error-correcting codes in Section 7.5. Example 7. 17 provides an illustration of the H amming dis tance.
58C
Chapler 7
Distan(l:' and Appmximalio n
ham pie 1.11
Find the Hamming distance between u
SOlutlol
= [I
1 0
I
Or
and
Since we are wo rking over 2 2, u dn( u , v) =
v V
Il u + V~ H
= [0
=
U
I
I
I
If
+ v. Bul
= w{u
+
v)
As we noted in Exam pl ~ 7.15, this is just the number of positions in which u and v differ. The given vectors are the same ones used in thai example; Ih~ calculatio n is therefo re exactl y the same. Hence, dH(u , v) = 3.
Theorem 7.5 summarizes the most important properties of a distance functi on. I
Theure. 1.5
j
Let d be a distance function defin ed on a normed linear space V. The following properties hold for all vectors u, v, and w in V: a. d ( u , v) ~ O,andd ( u , v) "" Oif and o nly ifu = v. b. d ( u ,v) = d ( v, u ) c. d ( u , w)< d ( u , v}+ d (v, w} , .
• n
Proal
(a) Using property ( I ) from the defi nition of a norm, it is easy to check that d( u , v) = ~ u - vi ~ 0, with equality ho lding if and only if u - v = 0 or, equivalentl y, u
= v.
(b) You are asked to prove property b in Exercise 19. (e)
w~
apply the Triangle Inequality to obtain
d(u. v) + d(v. w ) = l u - vi + Iv - wi « I(u - v ) + (v - w)1
= l u - wi
=
d(u, w)
A function d satisfying the three properties of Theorem 7.5 is also called a metric. and a vecto r space that possesses such a fun ction is called a metric spau. These are very important in many b ranches of mathematics and ar~ studied in d~l ail in more advanced courses.
Mallil Norms We can define norms (o r matrices exactl y as we d efined norms fo r vectors in R~. After aU,lhe vector space Mmw of all mXn matrices is isomorphic to Rmn, so this is no t d iffi cult 10 do. Of course, pro perties ( I), (2), and (3) o f a norm will also hold in the setting o f matrices, It turns out that, (or matrices, the norms thai are most use(ul satisfy an addi tional p roperty. (We will restrict our attention to square matrices. but it is possible to generalize ev~rything to arbitrary matrices.)
Seclion 7.2
Norms and
Di sla nc~
515
Functions
Definition
A matrix norm on M~" is a mapping that associates with each n X n mat rix A fl real number ~ A~, called the lIorm of A, such that the fo llowing ~ropcr tics arc satisfied for all 'I X '1 matric~ A and B and all scalars c. I. I A ~ 2: 0 and ~ A~ = 0 jf and only ir A = 0.
I
2.
4. I A~I
"! AUIll
is said to be compatible with a vector no rm (or all ll X" matrices A and all vectors x in R", we have
A matrix norm on
M~~
IAxI s IA
I x~
R" ir,
x 77
I
on
•
=
The Frobellius norm IAII I' 0f a matrix A is obtained by stringing ou t th~ entries of the matrix into a vecto r and then taking the Eucl idean norm. In other words, IAllF is just the squa re root of the sum or the squares or the entries of A. So, if A = (a,) , then
( a) Find the Frobenius no rm of
A=[! -~] (b ) Show that the Frobenius no rm is compatible with the Euclidean norm.
(c) Show that
th ~
frobenius norln is a matrix no rm.
Beforewecon l i nue,obs~rve th3 tlfA , =
vcclo rsof A, lhen IA,IE = V3 2
+ (-
[3
- 1] andA j
1)2and I AlIE=
z::::
[2 4] are the row
'Ill + 42• Thus,
»A'F= V»A,U + IA21i Sim ilarly, if 3, =
[ ~] and al =
[-
~] are the column vectors of A, lhen
IAII F =
Vla,lt + la21i
It is easy 10 se~ Ihal these facts extend to 'l X " matrices in general. We will use these observations \0 solve parlS (b ) and (c).
A, (b ) Write
A,
and Approximation
T hen
A,x II AxIl,=
:!f
·• •
,
v'II Alnill xlli + ...
+
A"lIill xlH
= (VII A,II}+ ... + II A.II!)lI xll, = I AMxh
)
wh ere the inequality arises from the Cauchy-Schwarz Inequality applied to the dot p roducts o f the row vectors A, with the column vector x. ( Do yo u see how CauchySchwarz has been applied?) Hence, the Frobenius norm is compatible with the Euclidean norm. (c ) Let b, denote the ilh column of B. Using the matrix-column representatio n of the product AB, we have
II ABII, = II lAb, ... Ab.] II, =
VII Ab,lI\ + ... + IIAbJli
"vIIA lIlllb,1I1 + .. + II AlIlllb.1I1 IIAII,vll b,1I1 + ... + II b.1I1 lAM nil, which proves property 4 of the defin itio n of a matrix norm. Properties (I) through (3) are true, since the Frobemus no rm is derived from the Euclidean norm, wh ich sa tisfies these properties. Therefore, the Frobenius no rm is a matrix norm.
For many applications, the Frobeni us matrix norm is not the best (or the easiest) one \ 0 usc. The most uscfultypes of Illatrix norms arise fro m considering the effect of the matrix transformation corresponding to the square matrix A.. This transformation maps a vector x into Ax. O ne way to measure the "size" o f A is to compare Ixl and IAxI uSing any convenient (vecto r) norm. Let's think ahead. Whatever definition of I AI we amve at, we know we are gomg to want It to be compatible with the vector norm we are usi ng; that is, we will need for x :;: 0
T he exp ressio n
I;x~1
measures the "stretching capability" of A. If we normalize each
nonlero vector x by d ivid ing it by its norm, we get unit vectors
i - I ~I x and thus
Section 7.2
Norms and Distance Functions
561
4 4
2
- 4
4
-2
- 4
flg.r.l .8
If x rangcs over all nonzero vectors in IR:", then i ranges over allllfllt ,'ectors (Le., the unit sphere) and the set of all vectors Ai d etermines some curve in R". For example, Figure 7.8 shows how Ihe m
[ ~ ~ ] affects the unit circle in [R2-
it maps il
illlo an ellipse. With the Euclidean norm, the maximum value of ~ Ai ~ is clearly just half the length of the principal a.'{is-in this case, 4 units. We express this by wntmg r~lax a Ai ~ = 4. W"
In Section 7.4, we will sec that this is not an isolated phenomcnon . That is,
I Axl max i II .'-'0 x
•
= l'!,m:IIA x~ I "I
always exists, and there is a particula r unit vcelory for wh ich lAy) is maximum. Now we prove that ~ A II = max ~ Axl defines a matrix no rm. IKI- I
I
Theore. J.6
•
Iii
If ~ xl is a vector no rm on IR~, then ~Ai = max~ Ax~ defines a matrix n orm on M" .'
"I
that is compatible wi th the vector norm th:1t induces il.
°
Proof (I) Certainly, dAx! ~ for all vectors x, so, in particular, this inequality is true if ~ x~ == J. Hence, ~ AI = max ~ Ax l ~ 0 also. If I A ~ = 0, then we must have 1_1=1 ~ Ax ~ = O--and, hence, Ax = O-for a1\ x with ft xl = I. In pa rticular, At, = 0 for each o f the standard basis vectors c , in [R". But ACt is just the it h column o f A, so we must have A = O. Conversely, if A = 0, it is cle
1 ~ 1~ 1
Iel m"II A'~ Ixl-
I
IeIIAII
lnd Approximation
(3) Let B be an nX " matrix and let y be a unit vecto r fo r which
IA+ au
~
m" I(A+ 8)xl - I(A+ 8)YI
I .,
IA+ BI
Then
~
I(A + 8)YI
~ ~ Ay
+
Hy~
s 1AYI+ 18 s IAI + II~I
)
(Where does the second inequality come from? ) Next, we show that our defi nition is compatible with the vector norm [ pro perty (5)1 and then use this fact to complete the proof that we have a matrix norm.
(5) If x = 0, then the inequality IAxl < IAll xl is tr ue, sillce both sides arc zero. If x 0, then from the commen ts p receding this theorem,
'*
IAxl1 IAxl Ix I ~ IAI ". xI s mox "'" "fence, I Axl :s:
IAllx,.
(4 ) Let z be a unit vector such that I A ~ == max l( AB} I ~I- I
IABI
=
IABzl. Then
IAB'I - IA(B, )I By properly 5 s IIAII B'I By propert y 5 s IAIIBlI" ~ IAII BI! Th is com pletes the proof that IAII = max I A x ~ defines a matrix norm on M M that is ~
I ~I- t
compatible wi th the vector norm that induces it.
Dennlllon
The matrix norm Induced. by the vector norm Ixl.
' AI in Theorem 7.6 is called the operator norm
The term operator IIorm reflects Ihe fact that a mat rix tnlllsforma tton arising from a square matrix IS also called. a Imearoperator. This norm is therefore a measure of the stretching capability of a linear operator. The three most commonly used operato r norms are those induced by the sum no rm, the Euclidean norm, and the max norm-nam ely,
respectively. The fi rst and last of these tu rn out to have especially nice formulas that make them very easy to computt.
5«tion 7.2
Theor •• 1.1
Norms and Distance Functions
Let A be an fiX fI matrix with column vectors a, and row vectors A, fo r
a. IAIII "" 1"'11 .~ Uajll.} . ..
=
j -
519
I, . .. , II.
;-~.J~ la"l}
b. IAloo = ,.~."UA,1I.l = ,.'i""'Jt, IU"I} In other wo rds, IAiI is the largest absolu te column sum , and IAi,., is the largest absolute row sum. Before we prove the theorem , let's look at an example to see how easy it is to use.
Example 1.19
Let I
-3
4
- I
2 -2
-5
I
3
A= Find ~ Al l and
511111'1
IAn.,.,.
Clearly, the largest absolute column s um is in the first colum n, so
IAI, = la,l. = II I + 1' 1+ I-51= 10 Th e thi rd row has the largest absolute row sum, so
With refere nce to the definition ~ A ~ I = max ~ AxI .. we see that the maximum 1- 1,= 1
value of 10 is actually achieved when we take x "" e l' for then ! Ae , ~ .
ror I Aloo = 1_,."' max1~ Ax ~ ""
- ~ a , l. ~ 10 = I A ~ ,
if we ta ke - I
x=
I I
we obtain
II Ax il '"
= =
I
-3
4
- I
2 -2
-5 I 3 m,, {I-21, 1-71, 191}
- I I
1
=
•
9 = IAloo
We will use these observations In provlllg Theorem 7.7.
-2 -7 9 •
511
Cha pter 7 Distance and Approximation
Prool 01 Theorem 1.1
The strategy is the same in the case o f both the colum n sum and the row sum. If M represents the maximum value, we show that I Ax ~ S M for all unit vectors x. Then we find a specific unit vector x for which equality occurs. It is imporlant 10 remember that for property (a) the vector norm is the sum norm whereas fo r p roperty (b ) it is the max norm. (a) To prove (a), let M =
) - 1, .
Ixl.:::
I. T hen lxl1 + ..
Ua)II .), the maximum abso lute col u m n sum, and let
ma.x ,/P
+ Ix,,1=
I Ax ~. =
1,50
ixla l + .. ' +
x"a ,,~.
" Ix'\lo.ll. + ... + Ix.II ••I . S IxdM + ... + Ix"I M = (Ix,\ + . + Ix.IlM = I · M = M If the maximum absolute col umn sum occurs in column k, then with x = e k \\Ie obtain
I A.,I, = I.J. = M Therefore, ~ A II
max ~ A x l l , = M =
111,- 1
max {~ alL} , as reqUi red.
)-1 .
,n
(b) The p roof of property (b) is left as Exercise 32 . In Section 7.4, we will discover a fo rmula fo r the operator no rm is not as com p utatio nally feasi ble as the formula fo r IAII. or ~ AIOO'"
lAb although it
The Condition Number 01 a Matrl. In Explo ra tion: Lies My Computer Told Me in Chapter 2, we encountered the notion of an ifI-con(iit;o"cd system of linear equations. '·Iere is the defi nition as it applies to matrices. co
Definition
A matrix A is i/f-co"ditio"ed if slllall changes in its en tries can produce large changes in the solutions to Ax"" b. If small changes in the entries of A produce only sma ll changes in the solutions to Ax = b , then A is called welJ·
conditioned.
Although the defin itio n applies to arbitrary mat rices, we will restrict our allen· lion to square matrices.
Inmple 1.20
Show that A = [ :
Solulloa
: .0005] is ill-condit io ned,
If we ta ke b =
[~.oo 10 ]. then the solution to Ax =
ever, if A changes to A' = [ :
:.0010]
b is x =
[ ~ l How-
Section 7.2
:)
then the sol ution changes to x ' =
Norms and DIstance Functions
511
[~ ]. (Check these assertions.) Therefore, a relative
change of 0.0005/1.0005 = 0.0005, or about 0.05%, causes a change of (2 - 1)/ I = I, or 100%, in XI and (1 - 2)/2 = -0.5, or - 50%, in X;.. Hence, A is Ill-conditioned.
-4 We can use matrix norms to give a more precise way of determmlJ1g when a matrix is ill· conditioned. Thmk of the change from A to A' as an error 6A that, in turn, introduces an error 6x in the solution x to Ax = b. Then A' = A + 6A and x ' = x + 6x. In Example 7.20,
T hen, since Ax = b and A'x' = b , we have (A canceling off Ax = b , we obtain
A(a,) + (aA), + (aA)(a,)
~
0
+
a A) (x
+
m A(ax)
ax) ~
= h. Expanding and
-aA(x + ax)
Since we are assuming that Ax = b has a solution, A must be invertible. Therefore, we can rewrite the last equation as
Taking norms of both sides (using a matrix norm that is compatible with a vector norm), we have
Ilaxll
~ II - A - '(aA)" ~ ~
iA- '(aA)x'l
" W'(aA)lllx'i " IW 'IIII M llllx'l )
(What is the justification for each step?) Therefo re,
l axl " W' I IM I ~ (lW'IIAllll aAII 1" 1 I Ai The expression ~ A-ll~ A~ is called the condition number of A and is deno ted by cond( A). If A is not invertible, we define cond( A) = 00. What are we to make o f the inequality j ust above? The ra tio ! aAII/~ AII is a measure o f the re/ative cl/ange in the matrix A, which we are assuming to be small. Similarly, 116xll/llx' II is a measure o f the relative error created in the solution to Ax = b (allhough, in this case, the error is measured relative to the /lew solution x ', not the original one x) . Thus, the inequality
(I)
gives an upper bound on how large the relative error in the solution can be in terms of the relative error in the coefficient matrix. The larger the condition number, the more ill-conditioned the matrix, since there IS mo re "room" for the error to be large relatIve to the solution.
512
Chapter 7
Distance and Approximation
R8111rlll • T he condition number of a matrix depends on the choice of nor m. The most commonly used norms are the operator norms II A ~ I and ~ A lloo' • For any 1l0rm,cond ( A) ~ 1. (Sec Exercise 45. )
Ixample 1.21
1 ] relatIVe . to the oo-norm. 1.0005
Find the condition number of A = [ :
Solution
We first compute A-I = [
- 2000] 2000
2001 - 2000
T herefore, in the oo -no rm (maximum absolute row sum ),
I Aloo =
1
'0 co"d,,(A) =
+
1.0005 = 2.0005
IW'iJIAloo =
l A- II"", = 2001
and
+ 1-
20001 = 4001
4001 (2.0005) = 8004.
It turns out that if the condition number is large relative to o ne compatible matrix norm, it will be large relative to any compatible matrix norm. For exam ple, it can be shown that for matrix A in Exa mples 7.20 and 7.21, cond l( A) "" 8004, cond 2 (A) ... 8002 (relative to the 2-norm ), and cond F( A ) = 8002 (relat ive to the Frobenius norm).
The Convergence o'"era'ive MethodS I n Sectio n 2.5, we explored two iterative methods for solving a system of linear equat ions: Jacobi's met hod and the Gauss-Seidel method. In Theorem 2.9, we stated without proo f that if A is a strictly diagonally dominant nX n matrix, then both o f these m ethods converge to the solution o f Ax = b . We are now in a position to prove this t heorem. Indeed, one of the important uses o f matrix no rms is to establish the con vergence properties of various iterative methods. We will deal only with 1acobi's method here. (The Gauss-Seidel method can be handled using si m ilar tech niques, b ut it requi res a bit m ore care.) T he key is to rewrite the iterative process in terms of matrices. Let's rev isit Example 2.36 with th is in mind. The system of equations is
7xI -
3xI
'0
A= [;
-
~
=
5
(2)
5xl = - 7
--5I] ,,,d b=
[
' ]
-7
Section 7.2
Norms and Distance Functions
513
We rew ro te equation (2) as
X, = x2
=
5
+
Xl
7
7
+
(3)
3xI
5
which is equivalent to 7xI =
x2
-5x 2 = -3x 1
+
5 (4)
-
7
or, m ter ms of matrices, (5)
Study equation ( 5) carefully: The matrix on the left-ha nd side contains the diagonal entries of A, while on the right- ha nd side we see the negative of the off-diagonal entries of A and the vector b. So, if we d ecompose A as
A = [;
-I] o = L + D+U
-']=[00]+[7 - 5
3
0
0
then equat ion (5) can be wri tten as
Dx = - (L
+ U) x + b
o r, equivalentl y,
(6) since the ma trix D is invertible. Equation (6) is the ma trix version of equa tion (3). It is easy to see thaI we ca n do this in general: An /I X " matrix A can be written as A = L + D + U, whe re D is the diagonal part o f A a nd Land U a re, respectively, the portions of A below and above the diagonal. The system Ax = b can then be writte n in the form of equation (6), provided D is invertible-which it is if A is strictl y diagonally domina nt. ( Why?) To s implify the no tat ion, let 's let M = -D- I (L + U) and c = D- Ib so that equation (6) becomes
x = Mx
+c
(7)
Recall how we use this equation In Jacobi's method. We start wit h an imtial vecto r Xo and plug it into the righ t- hand side of equation (7) to get the first iterate XI-tha t is, XI = Mxo + c. Th en we plug X I into the right · hand side o f equation (7) to gel the second iterate Xl = MI l + c. In general, we have X HI = MIl:
+C
fo r k 2: O. For Exam ple 2.36, we have
M
= - D- '( L
and
+ U)
=
_[7o -50]-'[03 -I] 0
(8)
5JC
Chapter 7
Distance and ApproxImation
so
l][0.714] + [!] ~ [0.914] o 1.400 S 1.829 and so on. (These arc exactly the same calculations we did in Example 2.36, but written in matrix form. ) 10 show that Jacobi 's me thod will converge, we need to show that the iterates x,. app roach the act ual solution x of Ax - b. It is e no ugh to show that the error vectors x k - x approac h the zero vector. From our calculations above, Ax = b is equivalent to x = Mx + c. USlOg equation (8) , we the n have X HI -
x = Mx, + c - (Mx + c) = M(Xl - X)
Now we take the norm of both sides of this equation . (At this point , it IS not tmportant whICh norm we use as long as we choose a matrix no rm that is coml>atible with a vector no rm.) We have
Ix,. , -
x~ ~
!M(x, - x)1 s IMllx, - xii
( 9)
If we can show that I M ~ < I, then we will ha ve ~ X l + I - xl < ~ Xl - xl for all k > 0, and it follows tha t ~ Xt - xl approacheszcrQ, so th e error vectors x~ - x approach the zero vector. The fac t that stric t diagonal dominance is defined in te rms of the absolute values of the e ntnes in the rows o f a malrix s uggests that the oo-norm of a matrix (Ihe oper'Hor norlll induced by the max no rm ) is the one to c hoose. If A "" (a 9], then
M ~
°
- alJ all
- (l2l/ ~2
°
- (IIII/ a ""
- ad II""
...
- al .. / au - (/1 .. / (/12
• ••
°
( verify this), so, by Theorem 7.i, ~ Mlloo is the maxim um absolute row sum of M. Suppose it occurs in the hh row. Then
+ .. . +
since A is stric tly diagonally dominant. Thus, wished to show.
+ ... +
IMloo <
I, so
IXl-
xl --+O, as we
Compute IMloo 111 Example 2.36 and usc this value to find the number of iteratio ns required 10 approximate the sol utio n 10 three-dectlnal -place accuracy (after roundIng) If the initial vector is Xo "" O.
Section 7.2
Solution
Norms and Distance Functions
[1
We have already computed M =
~].
so IMII"", =
~=
5'5
0.6
<
I
(implying that Jaco bi's method converges in Example 2.35, as we saw). The approximate solution xt will be accurate to th ree decim al places if the error vector '" - x has the property that each of its components is less than 0.0005 in absolu te value. (Why?) Thus, we need only guarantee that th e maxlm l lnl absolute component o f Xl - X is less than 0.0005. In o ther words, we need to find the smallest value of I.: such tha t
IXl - x ~m
< 0.0005
Using equat ion (9) above, we see tha t
11,,*- xl", :s; IMI"",lIxH - xl ... <
I M I~ x J:- 2
- xl... :s" " IMI!Jx, - xl.
0. 7 1'] [ 1.400
IMI!,lx, - xl. -
...
= 1.4 ,so
(0.6)~ I.' )
(If we knew the exact solution in advance, we cou ld use it instead of X I' In practice, this is no t the case,so we usc an approximation to the solution, as we have done here.) Therefore , we need to find k such that (0.6)~ I.')
< 0.0005
We can solve this ineq ualit y by taking logari th m s (base 10) of both sides. We have loglO«0 .6) *(1.4»
<
logl0 (5 X 10- 4 ):::;. kiog lo (0.6) ;::} -0.2221.:
:::;. I.:
>
+ IOSIO ( 1.4} <
+ 0. 146 <
108105 - 4
-3.301
15.5
Since I.: must be an integer, we can therefo re conclude that I.: = 16 will work and that 16 iterations of Jacobi's method will give us three-decimal-place accuracy in this example. (In fact , it appears from our calculations in Example 2.36 that we get this d egree of accuracy sooner, but our goal here was only to come up with an estima te.)
- 1
In Exercises /-3, let u ==
4 (lnd v =
-5
2
-2
°
I. Compute the Euclidean no rm, the sum no rm, and the max norm o f u .
4. (a) What does d, (u , v) measure? (b) \Vha t does d",( u. v) measure?
frl Exercises 5 (lm1 6, let u = ( I v =[O
I
IOI
I
0
1
1 0 0
1JT (l nd
I JT,
2. Compute the Euclidean no rm, the sum no rm, and the max norm of v.
5. Compute the Hamming norms of u and v.
3. Compute d ( u , v) relative to the Euclidean norm, the su m norm, and the max norm.
7. (a) For which vecto rs v is Ivle = ~ vl ...? Explain your answe r.
6. Compute th e Hamming d istance between u and v.
5'6
Chapler 7
Distance and App roximation
(b) For which vectors vis ~ vll , answer. (c) For which vectors v is ~ vl l , yo ur answer.
= I lv~ m?
Ilvll",
=
Explain your = ~ V~E?
Explain
8. (a) Under what conditions on u and v is l u + vilE = ~ u l l !i + ~ vl l f? Explain your answer. (b) Under what conditions on u and v is Il u + vI,:;;; ~ u ll , +~ v ll '? Explain your answer. (e) Under Whlll conditions on u and v is nu + vii.. = l ull .. +llvll ",? Explain your answer. 9. Show that for all v in W, Ivll m
::5
IvllE'
10. Show that for all v in RN, ~ vh
<:
Ivlr
1 I. Show that for all v in Rn, l vll,
<:
nllvl ....
12. Show that for all v in Rn, Il v ~E
<:
In Exercise5 26-JI, filld vectors xand y lVillt lxi, = 1 and Irl. ~ I ,",h IAI, ~ IAxI. and IAioo ~ ! A m' wllere A is /he matnx HI ti,e given exerCIse.
"m'
26. Exercise 20
27. Exercise 21
28. Exercise 22
29. Exercise 23
30. Exercise 24
31. ExerCIse 25
32. Prove Theorem 7.7(b). 33. (a) If IAI is an operator norm, prove that ~ I ~ = 1, where J is an identity matrix. (b) Is there a vector norm that induces the Frobenius norm as an operator norm? Why or why not? 34. Let ~ A II be a mat rix norm that is compatible with a vector norm Il xll . Prove that ~ A II >- IAI for every eigenva lue Aof A.
Vnlvl,,.,.
13. Draw the unit circles in W relative to the sum norm and the max norm. 14. By showi ng that the identity of Exercise 33 in Section 7. 1 fa ils, show that the sum norm does not arise from any inner product.
/n exerCIses 35-40,filul cOlldj(A) and con4x,(A) . Stale wlre/lrer lire givelllllatrix is ill-condit ioned.
35.A = [ !~] 1 01·99 ]
37. A = [ I
/11 Exeroses IS- IS, prove liIat II I defines (llIorm olilhe vector sp(lce V. 15. V = Rl, 16. V ~
[:J :; ;
Mm", I AI
~
41- utA
",,,""I I",I }
f
If(x) ldx
21. A :;;;
24. A =
- 5
23. A =
[-~ 2 I I
I
2 -3
- 4
3
6
5 0
= [ :
40. A =
0
~].
I I
j
j
, , • !, ! !, • j
I Ax ~ I Abl Ixll '" 'Qod(A) Ibl
III Exercises 20-25, complile JAIl'> I A ~ I ' and I AIOO"
0 3 - 4
5
1
42. Consider the linear system Ax = b. where A IS Inver! ible. Suppose an error {).b changes b 10 b ' = b + db. Let x' be the solutIOn to the new system; that IS, Ax' = b' . Let x' = x + dx so that {).X represents the resulting error III the solution of the system. Show that
19. Prove Theorem 75(b).
22.A =[ -21 -~]
I
[ 3001 150 4002 200]
(a) Find a fo rmula for con
"
:]
38. A =
1 1 I
max{ 12al, l3bl}
~ 17. V ~ ,€[o, IJ, If i ~
20. A =[~
39. A =
-!]
36. A= [_~
25. A =
4 0 3
-~] I
I
3 I
2 3
for any compatible matrix norlll.
43. LetA = [ ~~ 1~ ] andb = [I ~~J. (a) Compute cond.x,(A).
- 2
- I
- I
2
-3
0
(b) Suppose A is changed to A ' =
[ :~ : ~].HOW
large a relative cha nge can this cha nge produce in the solution to Ax = b? [Use inequality ( I) from this section.]
Sectio n 7.3
IS
45. Show that if A is an invertible matrix, then cond(A):
changed to b ' = [ 100 ]. How large a
lOl relative change can this change p roduce in the solution to Ax = b? (Use Exercise 42.) (e) Solve the systems using band b ' and determin e the actual relative error. I
44. LetA =
I
I
46. Show that if A and B are Invertible m atrices, then cond(AB) :S con d ( A)cond ( B) with respect to any matrix norm.
47. Let A be an invertible matrix and lei AI and An be the eigenvalues wit h the largest and smallest absolute values, respectively. Show that
I
5 0 andb =2 I - 1 2 3 (a) Compute (and .(A). 2
(b ) Suppose A is changed to A' =
I
I
I
I
5
0 . How
I
- I
mnd (A)
3 relative change can this change p roduce in the solulion to Ax = b? (Use Exerc.lse 42.)
I· ,I '" I. j
[Hint:See Exercise 34 and Theorem 4.18(b) in Section4.3.]
2
large a relative change can this change produce in the solution to Ax = b ? [Usc inequality ( I) from th is section.] (e) Solve the systems using A and A' and dete rm ine the actual rela tive error. I (d) Suppose b is changed to b ' = I . How large a
511
(e) Solve the system s using band b' and de te rm ine the act ual rela tive error.
(c) Solve the systems usi ng A and At and determine the act ual relative error. (d ) Suppose b
Least Squares Approximation
-
CA6
In Exercises 48- 5 1, write the given system ill the form of equation (7). Then lise the method of Exmllp/e 7.22 to estimate the number of iterations of Jacobi's methotl that will be llceded to approxlnJate the solmion to three·decimal-place accumcy. (Use~ = 0.) COItllUlre your answer with the solutiOlI computed ill t/le given exercise from SectioIl2.5. 48. Exercise I, Section 2.5
49. Exercise 3, Section 2.5
50. Exercise 4, Section 25
5 1. Exercise 5, SectIO n 25
least Squares Approximation In many branches o f science, experi mental da ta are used to infer a mathematical relationsh ip among the variables being meas ured. For example, we might measure the height of a tree at variOus points in time and t ry to deduce a function that expresses the tree's height h in term s of time I. Or, we m ight measure the size p of a population over time and try 10 find a rule that relates p to t. ReI
518
Chapter 7
Distance and Approximation
y
y
•
--r--r--~-+--+--+--+--+-+ X
y
•
figure 1.9 Curves of '"best fit"
Roger Cotes (1682- 1716) was an English mathematKian who, while a fellow at Cambridgt:, edited the second edition of Newton's Prill' ClpW. Although he published little, he made important discoveries in the theory of loga rith illS, Integral calculus, and nUlllerical methods.
The m ethod of least squares, which we are abo ut to consider, is att ributed to Ga uss. A new as tero id, Ceres, was d iscovered o n New Year's Day, 1801, b ut it d isappeared behind the sun short ly after it was observed . As tronomers predicted when a nd where Ceres wo uld reappear, but their calc ulat ions diffe red greatly from those do ne, independentl y, by Ga uss Ceres reappeared on December 7, 1801, almost exa ctl y where Gauss had pred icted it would be. Although he did not disclose h is methods at the time, Gauss had used his least squares approximation method, which he described in a paper in 1809. The same method was actually known earlier; Cotes a nticipated the m ethod III the early 18th cent ury, and Legend re published a paper on it in 1806. Nevertheless, Gauss is genera lly given credit fo r the m ethod of least squares approxlll13tion. We begin our exploration of approximation with a more general result.
Section 7.3
519
Least Squar(S Approximatio n
ne Best IpproKlmalion Theorem In the sciences, there are many problems that can be ph rased generally as "Whllt is the best approximation to X of type Y?" X might be a SC I of data points, a fu nction, a vector, or many o ther things, while Ym igh t be a pa rticu l
Definition
If Wis a subspace of a normed linear space Vand if v is a vector in V, then the best approximatio" to vi" Wis the vector v in W such that
Iv - VI < iv - wi for every vector w in \V d ifferent from v. 7
"
In Rl o r RJ , we arc used to thinking of "shortest distance" as correspo nding to "perpendicular distance." In algebraic term inology, "sho rtest distance" relates to the notion of orthogo nal projection: If Wis a subspace of R~ and v is a ve<:tor In R~2 then we expect pro; w ( v) to be the vecto r in HI that is closest to v (Figure 7. 10). Si nce o rthogonal projection can be defined in any inner product space, we have the follow ing theorem.
w
w
figure 1.10 Ify '" projw(Y), then
lI y - vII < h - wll for all w
.1
-
¢
y
The Best Approximation Theorem If HI is a fi nite-dimensional subspace of an inner product space Vand if v is a""«tor in V, then proh.. (v) is the best approximation to v in W.
Pr••' Let w be a vector in W different from proJw(v). Then projlV
+ ' proj w{v)
- wl 2 = I(v - projw(v)} = ~ v - wl l
+ (prolw(v)
- wW
580
Chnptcr 7
Distance and ApprOXllJlation
as Figure 7. 10 illustrates. However, ~proj 1l'(v) - w l1 2
Iv -
proj1l'(vW
<
Iv - proj ll'(v ) 112
> 0,
+ Ilpro; w{v )
sillce W =f- projll'(v ), so
- w r = ~ v - w l12
or, eq uivalently,
I (Kample 1.23 Let U I =
2
, U2
=
- \
p laneW =s an
3 2 . f ind th e best app roximation to v in the
5
\
U
u
- 2 , andy ::
r 5 and find the Eucl idean di slanc, fr 0 mvto W.
Solulloll
The \'ector in Wthat best approximates v is pro) 1I'(v). Since orthogonal,
\
,1
+"
2
"
- \
5 - 2
3
\
1
" I
and
U2
arc
, s s
The d istance from v to Wis tile dist3nce from v to tile point in W dosesl to v. But til is distance is just l perp ll'( v ) ~ = ~ v - pro; 11'( v) I . We compute v - proj ll'(v) =
so
Iv -
proh ..(v)1 = Y 0 2 +
3 2
3
5
s
, s ,
Ci )! + (~4) 2
-
0 J.i
s
Ii
s
y710 _ 12Vs/ 5
"
-
which is the distance Cra m v to W.
In Section 7.5, we will look al olher examples of the Best Approximation Theorem when we explore the problem of approximating fun ctions.
He.'fll The orthogonal project ion of a vecto r v onto a subspace W is defined in te rms of an orthogonal basis for W. T he Best Approximatio n Theorem gives us an a lternative p roof that proj ll'(v) does not depend on the choice of this basis, since there can be only one vector in W that is closest to v- namely, p ro;w( v).
Least Squares Approximation We now turn to the p roblem offi ndillg a curve that " best fi ts" a sct o f d ata POlllts. Befo re we can proceed , however, we need to defi ne what we mean by "best fit." Suppose the data po ints (1,2), (2, 2), and (3, 4) have arisen from measurements taken during some experiment. Also suppose we have reason to believe that the x and yval ues are related by a linear fun ction; that is, we expect the points to lie on some lme With equatio n y = a + bx. lf o ur measurements were accurate, all three points would satisfy thiS equa tion and we wou ld have
2 = a+/) · 1
2 =(I + b'2
4 = (1 + /) · 3
Section 7.3
Least Squares App rO)(,UnatlOn
581
Th is is a system o f three linear equations in two varia bles:
a+ b= 2 a + 2b = 2 or a + 3b = 4
I
2
I
2
1
4
Unforl u nately, this system is inconsistent (slIlce the three poi nts do not lie on a straIght line). So we will settle for a line that comes "as close 3S possible" to passi ng th rough our points. For any line, we will measure the verlical d istance from each data poi nt 10 the line (represe nting the errars in the y-di rection), and then we will try to choose the line th3t min imizes the toMI error. Figu re 7.11 illustrates the situat ion.
y
6 5 (3. 4) 4
\· - I,+b.~
. ,{
3
2
( I . 2)
., {
}., (2. 2)
1
x - I
1
2
3
4
5
6
- I
figure J. t1 Finding the line thaI minim;l.es e~ + e~ +
e;
If the errors are denoted by e l , e1• and e 3• then we can for m the error vector
We want e to be as small as possible, so lei must be as close to zero as possible. Which no rm should we use? It tu rns o ut that the familiar Euclidean norm IS the best choice. (The su m no rm would 31so be a sensible choice, since l eU. '" + 1£21+ Is)1 is the actual sum of the errors in Figure 7. 11. However, the absolute value signs are hard to work with, and, as yo u wi ll soon see, the cho ice of the Euclidean norm leads to some very nice form ulas.) So we arc going to minimize
lell
~ e~ =
VE~ + e~ + e;
or, equi v31ently,
!ef '" e~ + e~ + e j
This is where th e term " least sq ua res" comes fro m: We need to find the smallest sum of squares, in th e sense of the fo regoing equation. The number l e ~ IS called th e least squares error of the approximation.
582
Chapter 7
Distance and ApprOXlillation
From Figure 7.11 , we also obtain the followmg form ulas for example:
lumple 1.24
8 1' 8 ",
and 8 j in our
Which of the following hnt's gives the smallest least squares error for the data points ( 1, 2), (2. 2), and (3, 4)?
+x (b ) y=- 2 + 2x (c) y = ~ + x (a l y = I
Salullal
Table
Table 7.1 shows the mxcssary calculations.
J.1 y= - 2+2x
y= l+x
e,
2 -(1+1)= 0 2 -( 1 +2)=- 1
t;j
4 - (I
e,
t::f + d+d Ilel
+ 3)
0 2 +( _ 1)2 +0 2 =
0 I
2 -(- 2 + 2 -(- 2 + 4 -(-2+ 22 +0 2 +
y=i
2 -(~ + 1 )=~ 2 - (j + 2) = -;
2)= 2 4) =0 6)= 0 02 = 4
4 -(~+ 3 )=! +
(D1 + (-w
'-'1- 0.816
2
I
+x
,. I' = l +.r
. I
6
,
;+..-
(3. 4)
4
J (I. 2)
2
,
-,
-,
(2. 2) y = - 2+ 2..2
J
4
,
x
6
flgU" 1.12
We see that the Ime y = ~ + x produces the smallest least squares error among these three lines. Figure 7. 12 shows the data points and all three lines.
Least Squares Approximatio n
Section 7.3
583
i
It turns o ut that th e Ime y = + x in Exam ple 7.24 gives the smallest least squares erro r o f any line, cven though it passes through /lOlle of the given points. The rest of this section is devoted to illustrating why this is !>O. In general, suppose we have 11 data poin ts ( XI' YI)'" . , (x", Y") and a line Y = a + bx. Our erro r vector is
" • •
where e, = y, - (a + bx,). The line y = a + bx that m inimizcs d + .. + e~ is called the leastsquares approximating line (or the fin e o/best fit ) fo r the poin ts (Xl' Yl ), ..• , (x", y"). As noted prior to Example 7.24, we can express this p roblem in matrix form. If the given poin ts were actually on the line y = a + bx, then the n linear equat ions
a + bx" = y" would all be true (i.e., the system would be consistent ). O ur interest is in the case where the poin ts are not collinear, in which case the system is inCO/lsistelil. In matrix form, we have
x,
I
y, y,
I
y.
I
which is of the form Ax = b, where I
A=
y,
x,
I
• x = I
[ ~J.
b =
x.
y,
Y.
The error vector e is j ust b - Ax (check th is), and we wan t to m inimize lel2 or,equivalently, Il e li . We ca n therefo re reph rase our problem in te rms of matrices as follows.
DeHnUlan
If A is an mXn matrix and b is in Ax = b is a vector i in IR" such that
lib for all x in RI!.
AX ~
R'", a least squara solution of
s Ib - Axil
584
Chapter 7
Distance and ApproxllllatlOn
SolulioA 01 the least Squares Problem Any vecto r of the form Ax is in the colum n space of A, and as x varies over all vectors in JR ", Ax varies over all vectors in col(A). A least sq ua res solution of Ax = b is therefore equivalen t to a vecto r r in col (A) such that
lib - rll
<
lib - rll
fo r all y in col (A). In other wo rds, we need the closest vector in col(A ) to b. By the Best Ap p ro xi mation Theorem, the vector we want IS the o rthogo nal projection o f b onto col(A ). Thus, if x is a least squares solution of Ax = b, we have
Ax = p roj~A)(b )
(I )
In order to find x, it would appear that we need to first compute projcol(A)(b ) and then solve the system ( 1). However, there IS a better way to proceed . We kno w that b - Ax = b - p roj ,o~AJ
a;(b - Ax)
~
a,· (b - Ax)
~
0
ThIs IS true If and only if
A' (b - Ax) ~ [a,
a"i'(b - Ax)
a',
ai(b - Ax )
o
( b - Ax) ~
a' " wh ich,
III
~ O
o
turn, is equi valent to
Th is rep resents a system of equations known as the normal equations for x. We have just established that the solutions of the normal equations for x are precisely the least sq uares solutio ns of Ax = b. Th is proves the first part of the follow ing theorem.
Theorem J.9
The Least Squares Theorem Let A be an m X n matrix and leI b be in least squares solution x. Moreover,
Rm. Then Ax. = b always has at least one
a. x is a least squares solu tion o f Ax = b if and on ly if x is a sol ution of the normal equatio ns A TAx = ATb. b. A has linearly independe nt columns if and only if A TA is invert ible. In this case, the least squares sol ution of Ax = b is unique and is given by
Section 7.3
Least Squares Ap proximation
585
Prool
We have already established pro perty (a). rOT property (b), we note that the II columns of A are linearly independent if and only if rank( A) = n. But this is true if and only if A l'A is invertible, by Theorem 3.28. If ATA is inverti ble, then the unique solution of A1'Ax = Ar b is dearly x "" (ATA) -l Ar b. ~
Example 1.25
Find a least squares solution to the inconsistent system Ax = b. where
Solu1l01
\
5
2
- 2
- \
\
and
3 2 5
b =
We compute
2
-2
-:]
5
[~ 3~]
-2 \
2
and
- 2
The normal equations ATAx = A rb arc just
whic h yield i =
UJ.
The fact that this solution
IS
unique was gua ranteed by
Theorem 7.9(b), si nce the colum ns of A arc cl early linearly independent.
Remark
We could have phrased Example 7.25 as fo llows: Find the best approximation 10 b in the col umn space of A. The resulting eq uations give the system Ax = b whose least squares solution we just found . (Ver ify this.) In this case, the components of x arc the coefficiellts of that linear combinatio n o f the columns o f A that produces the best approximation to b-namely,
,
!
\
+
2
185
- \
5
3
-2
-I I
\
This is exactly the result of Example 7.23. Compare the two approac hes.
Example 1.26
Find the least squares a pproximating line fo r the data points ( 1,2), (2, 2), and (3,4) from Example 7.24. SOllltiOI
We have already seen that the corresponding system Ax = b is \
\
\
: [ :] =
\
2 2 4
lnd Approximation
where y = a + bx is the line we seek. Since thc colum ns of A .1fC dearly linea rly independent, there will be a unique least squares solution . by part (b) of the Least Squares Theorem. We compute
I
2
I] I 3
:
I
I~J
2
3
Alb
and
r: ~] ~ I
=
2
4
Hence, we can solve the normal equations ATAx = ATb, using Gaussian elimination to obta in
[~
[A'A I A'b] =
So x -
[:J.
6 '] 14 18
(rom which wc sec that a =
squares approximating line: y =
L b "'"
I are the coefficients of thc least
i + x.
The line we just found is the line in Example 7.24 (c), so we have justified our claim that this line produces the smallest least squares error (or thc d:lta poilus ( I, 2), (2,2), and (3, 4). Notice thatrf x is a least squares solution of Ax ... b, we may compute the least squares error as
Since Ax = projcol(A)(b), this is just the length of perPdl,4)(b}-that is, the distance from b to the column space of A. In Example 7.26, we had e = b-AX =
2 2
I
; U]
4
I
3
I
\
,
_1
=
I
so, as in Example 2(c). we have a least squares error of lei =
Vi .. 0.816.
RI"'lrk
Note that the colu mns of A in Example 7.26 are linearly independent, so (A TA) - I exists. and we could calculate x as i = (A TA) - rA Tb. However, it is almost always easier to solve the normal equations using Gaussian elimination (or to let your CAS do it fo r you!). 11 is interesting to look at Example 7.26 from two different geometric points of view. On the one hand, we have the least squares approximating line y = + x, with corresponding errors £ 1 = L £ ] and l:] as shown in Figure 7.1 3(a). Equivalently. we have the projection of b onto the column space of A, as shown in Figure 7.I3(b). Here,
= - ;,
p = projcol("J
1
=!.
I
I
I
2 3
I
rn
=
,, •,
¥
Section 7.3
Least Squares Approximation
581
,.
e, and the least sq uares error vector is e = if the data points werecollincar? l
Example 1.21
£2 . [What \~ould Figure 7.13(b ) look like e:J
Find the least sq uares ap proximating line and the least sq uares error fo r the points (1, I ), (2, 2), (3, 2), and (4) J),
Let Y = a + bx be the eq uation of the line we seek. Then, substituting the four points mto this equatio n, we obtain
SOIUtiOB
11+ b = I a + 2b = 2
a + 3b = 2 a + 4b = 3
0'
I
I
I
2
I
2
I
3
4
So we wan t the least squares solution o f Ax = b, where
A~
I
I
I I
2 3
I
4
I
and
b ~
2 2 3
Since the colu mns of A are linearl y independem, the solution we want is
!
L
(Check this calculation.) Therefore, we take a = and b = producing the least squares approxi mating line y = + ~ x, as sho wn in Figure 7.14.
i
518
Chapter 7
DIstance and Appro)[lmation
y 5 4
Since
e = b - Ai =
th e [cast squares error is ! ell
~
1
1
1
2
1
2
2
1
3
3
1
4
VS/S -
-.I.
-'"-
[l]
" " .I. "
_.l.
0.447.
We can use the method of [eas t squares to approximate data points by curves other than straight lines.
Example 1.28
Find the parabola tha t gives the best least squares approximation to the points ( - 1, 1), (0 , - I), (I , 0), and (2,2). T he equation of a parabola is a quadratic y = a + bx th e given points into this quadratic, we obtain the lin ear system
SalullDn
a- b+ a
a+ b+
, ,
~
a+2b+4c
1
1
- I
1
- I
1
0 1 2
0 1
0'
0 2
1
1
4
a b
,
1 - I ~
Thus, we want the least squares approximation of Ax = b , where
A ~
1
- I
1
1
0
0
1
1
1
1
2 4
and
b ~
1 - I 0
2
+ d.
0 2
Substituting
Section 7.3
Least Squares Approximation
589
We compute
2 2 6 6 8
6 8 18
4
ATA =
,nd
ATb ""
2 3 9
so the normal equations are given by
2 6 8
4
2 6 whose solution
6 8 x= 18
2 3 9
IS
_L
-x = -,'", I
Thus. the least squares approximatlllg parabola has the eq uation
"" ,x + x-'
y = - - -
as shown
In
Figure 7. 15.
4
3 (2, 2)
2 ( -1.1 )
-3
-2
- I
I
\ - I
7
2
3
(0. - I )
-2 filII" 1.15 A least squares approximat mg pa rabola
One of the important uses of least squares approximation IS to estimate constants associated With va rious p rocesses. The next example illustrates th is application III the context of population growth. Recall fmm Section 6.7 that a population that is growing (or decaying) expo nen tially satisfies an equation of the form p(t) = al'. where p( t) is the size of the population al lime t and cand kare constants. Clearly, ("" P(O), but k is not so easy to de termine. It is easy to see that
p'( t) k = Pit) wh ich explains why k is sometimes referred to as the relative gro lVllI rate of the populalion: It IS the ratio of the growth ra te p' (t) to the size of lhe populatio n p(r).
590
Chapter 7 Distance and Approximation
CA5
- Example 1.29
Table 1.2 Year 1950 1960 1970 1980 1990 2000
Population (in billions) 2.56
3.04 3.71 4.46 5.28 6.08
Source' u.s. Bureau of th e Census, Inlerna tional Data Base
Table 7.2 gives the population of the world at 10-year mterv
Solliliol Let's agree to measure time t in lO-year intervals so that 1 = 0 is 1950, t = 1 is 1960, and so on. Si nce c = p(O ) = 2.56, the equation for the growth rate of the populat ion is
p ~ 2.56/' How can we use the method of least squares on this equation? If we take the natural logarithm of bot h sides, we convert the equation into a linear one: In p = In(2.56/') =
In 2.56
= 0.94
+
+ In( Ct) kl
Plugging in the values of tand p from Table 7.2 yields the fo llowing system (where we have rounded calculations to three decimal places); 0.94
=
0.94
k ~ O.172
2k = 0.371 3k = 0.555
4k = 0.724 Sk = 0.865 We can ignore the fi rst equation (i t just corresponds to the initial condition c = p (O) = 2.56). The rema in ing equ.llions correspond to a system Ax = b, with I A ~
2 3
,nd
b ~
4
5
0.1 72 0.37 1 0.555 0.724 0.865
Since ATA = 55 and ATb = 9.80, the corresponding normal equations are just the single eq uation
55i = 9.80 Therefore, k = x = 9.80/ 55 '" 0.178. Consequen tly, the least squares solutio n has t he form p = 2.56e'l.I7sJ(see Figure 7.16). The world's populatIOn in 2010 corresponds to t = 6, from whICh we obtain
p( 6) = 2.56Jl.I1a(6) ... 7.448 Thus, if our model is accurate, there will be approximately 7.45 billion people on earth in the year 20 I O. (The U.S. Census Bureau estimates that the global population will be "only" 6.82 billion in 20 [0. Why do you think our estimate is higher?)
Section 7.3
Least Squares Approximation
591
IJ(I)
7 ~
•c
.-o
6
5 4
-
E
3
I
-+~~~~+-~~+, 2 3 4 5 6 Decades smce 1950
flgl"1 .16
leaSI Squales via Ibe OR faclOllzalioo It is often the case that the normal equations for a least squares problem arc ill -
conditioned. Therefore, a small numerical error in performing Gaussia n elimination will result in a large error in the least squares solulion. Consequen tly, in practice, other methods are usually used to compute least squares approximations. It turns out that the QR factor izadon of A yields a more reliable way of computing the least squares approximation of Ax = b.
Theorem J.10
,
Lei A be an mX n matrix with linearly independent columns and let b be in Jim. If A "" OR is a OR factorization of A, then the unique least squares solution i of Ax = bis
Proof Recall from Theorem 5. 16 that the QR factorIZation A = QR involves an m X " matrix Q wit h orthonormal columns and an invertible upper tri.mgular matrix R. From the Least Squares Theorem, we have ATAx = ATb 0:>
(QR )'QRx ~ (QR) ' b
=> RTOTQRx = RTQTb => RTRx = RTQTb since QTQ = I. (Why?) Since R IS Invertible, so is R 1~ and hence we have RX = QTb
or, equivalently, x
=
R- IQTb
-----
Remark Since R is upper triangular, in practice it is easier to solve Ri = QTb directly than 10 Invert R and compute R- ' QTb.
592
Chapter 7
Distance and App roximation
Example 1.30
Use the QR factorization to find a least squares solution of Ax = b. where
1 22
Solullon
- 1
I
2
- \
0
1
I
I
2
2 and
- 3
b =
-2
o
From Example 5.15,
A = QR =
1/ 2 3Vs/IO -V6/6 -1 / 2 3Vs/ IO o - 1/ 2 Vs/ IO V6/6 1/ 2 Vs/ IO V6/3
1/2 o Vs 3Vs/2 o 0 V6/2 2
I
We have
-1 / 2 - 1/2 1/ 2 1/2 3Vs/ 1O 3Vs/IO Vs/ IO Vs/ IO
o
- V6/6
V6/3
V6/6
2
- 3 -2
o
7/2
-Vs/2 - 2V6/3
so we require the solution to Rx = QTb, or
2
I
o Vs 3Vs/ 2 -
x ~
o o
7/2
1/2
V6/2
- Vs/2 - 2V6/3
Back substitutIOn quickly yields x =
4/ 3 3/ 2
- 4/ 3
Orthogonal Projection Hevlslled One of the nice byproducts of the least squares method is a new formu la for the orthogonal projection of a vector onto a subspace of [R:"'.
Theorem 1.11
Let W be a subspace of [R:m and let A be an mX It matrix whose columns fo rm a basis for W. If v is a ny vector in Rm, then the orthogonal projection of v onto W is the vector proj w(V) = A(ATA ) - IATv The linear transformation P : ]Rm ---cI' ]Rm that projects R'" onto Whas A(A TA) - I AT as its standard matrix.
Prool
Given the way we have constructed A, ils col umn space is W. Since the columns of A are linearly independent, the Least Squares Theorem guarantees that there is a unique least squares solution to Ax = v given by x = (ATA)-IATv
Section 7.3
Least Squares Approximation
593
By equation ( I), p roj ~,tlv ) =
Ai =
projw(v )
prohv{v) "" A((ATA)- 'ATv ) = (A(A TAt'Al)v
Therefo re,
as required. Since th is equation holds for all v in A"', the last statcmenl of the theorem follows immediately. We will ill ustrate Theore m 7.11 by revisiting Exam ple 5.1 1.
EKample 1.31
3 - I
Find the orthogonal projection of v =
o nto the plane W in R' with equat ion
2 x -
y + 2z == 0, and give the standard matrix of the ort hogonal projection transfor-
mation onto W.
Solullon
As in Example 5. 11 , we will take as a basis for W the set - I
1
1 1
1 • 0
We fo rm the matrix
A=
1
- I
1
1
0
1
with these basis vectors as its columns. Then ATA = [
I
- I
1 1
1
~l
1
1
0
1
=
[~ ~l
[~ ~l
(ATA)-I =
so
-I
By Theorem 7.11, the standard matrix of the orthogonal projeCtion transfo rmat ion
onto Wis A(ATA r lAT = A =
1
- I
1
1
0
1
[~
m-:
1 1
~l
,,
=
so the orthogonal projection of v onto W is
, 1 , •,1 ,,• 1, , _1 ! , , , - j
proj ll'{v) = A(ATA ) - IA1"v
=
which agrees with our solution to Example 5.1l.
3 - I
2
,,1 _1, ! I, • ! _1• ! , , ,
,,
_.,, 1
514
Chaptt'r 7
Distance and Approoimallo n
Remark
Since the projection of a vector onto a subspace Wis unique, the standard matrix o f this linear transfo rmation (as given by Theorem 7. 11 ) cannot d epend on the cho ice of basis for W In other words, with a d ifferent basis for W, we have a different matrix A, but the matrix A(ATA ) - IA T will be the same! (You arc asked to verify this in Exercise 43.)
The pseudolnverse 01 a Matrix If A is an /I X /I matrix with linearly indepe ndent colum ns, then it is invertible, and the un ique solution to Ax = b is x = A- lb. If m > "and A is m X n with linearly independent columns, then Ax = b has no exact solution, but the best approxi mation is given by the unique least squares solution x = (ATA ) - IATb. The matrix (ATA) - I AT th erefore plays the role of an "inverse of A" in this situatIOn.
•
Definition
If A is a matrix with linearly independent columns, then the
pseudoinverse of A is the matrix A + defined by A+ = (ATArIA T
• O bserve that if A is mX n, then A + is nX 111.
Example 1.32
, , Find the pseudoinverse o f A =
1 2 , 3
Solation
We have already done most of the calculati ons in Example 7.26. Using our previous work, we have
[ - I1 - ':][' 1 I
,
2
'] ['-~
3 =
1
o
The pseudoinvcrsc is a conven ient shorthand notation fo r some of the concepts we have been explo ring. For example, if A is mX n wit h linea rly independent col umns, the least squa res solution o f Ax ::: b is given by
and th e standard matrix o f the orthogonal projectio n Pfrom IR:'" onto col(A) is
[PJ - AA' If A is actually a square matri x, then it is easy to show that A+ ::: A-I (see Exercise 53). In this case, the least squares solution of Ax = b is the exact solution, since
Section 7.3 Least Squares Apprmamation
595
The projection mat rix becomes !P) = AA + = AA- 1 = I. (What is the geometric interpretation of this equality?) Theorem 7.1 2 summarizes the key properties of the pseudoinverse of a matrix. (Before reading the proof of this theorem, verify these properties for the matrix in Example 7.32.)
Theorem 1.12
,
Let A be a matrl.'{ with linearly independent columns. Then the pseudoinverse A+ of II satisfies the following properties. called the Penrose conditions for A: a.AA+A = A
h. A " AA ~ = A + c. AA* and A+ A are sym metric.
Proal We prove cond ition (a) and half of condition (c) and leave the proofs of the remaining conditions as Exercises 54 and 55. (a) We compute AA+ A = A«(ATA )- lAT)A
= A( ATA ) - r(ATA ) "" AI = A (c) By Theorem 3.4, A TA is symmetric. Therefore, ( A 1'A j - 1 is also symmetric, by
Exercise 46 in Section 3.3. Taking the transpose of AA ~ , we have
(M · )' - (A(A' Ar 'A')' - (A')'«A'Ar')' A' = A(/l TAt l AT = AA+
Exercise 56 explores furt her properties of the pseudoi nverse. In the next section, we will see how to extend the defin ition of A ~ to handle all matrices, whether or not the columns of A are linearly independent.
Exercises 7.3

In Exercises 1–3, consider the data points (1, 0), (2, 1), and (3, 5). Compute the least squares error for the given line. In each case, plot the points and the line.
1. y = −2 + 2x    2. y = −3 + 2x    3. y = −3 + (5/2)x

In Exercises 4–6, consider the data points (−5, 3), (0, 3), (5, 2), and (10, 0). Compute the least squares error for the given line. In each case, plot the points and the line.
4. y = 2 − x    5. y = 5/2 − x/5    6. y = 2 − x/4

In Exercises 7–14, find the least squares approximating line for the given points and compute the corresponding least squares error.
7. (1, 0), (2, 1), (3, 5)
8. (1, 5), (2, 3), (3, 2)
9. (0, 4), (1, 1), (2, 0)
10. (0, 2), (1, 2), (2, 5)
11. (−5, −1), (0, 1), (5, 2), (10, 4)
12. (−5, 3), (0, 3), (5, 2), (10, 0)
13. (1, 1), (2, 3), (3, 4), (4, 5), (5, 7)
14. (1, 10), (2, 8), (3, 5), (4, 3), (5, 0)

In Exercises 15–18, find the least squares approximating parabola for the given points.
15. (1, 1), (2, −2), (3, 3), (4, 4)
16. (1, 8), (2, 7), (3, 5), (4, 2)
17. (−2, 4), (−1, 7), (0, 3), (1, 0), (2, −1)
18. (−2, 0), (−1, −11), (0, −10), (1, −9), (2, 8)

In Exercises 19–22, find a least squares solution of Ax = b by constructing and solving the normal equations.
19.–22. [The matrices A and vectors b for these exercises are not reproduced here.]

In Exercises 23 and 24, show that the least squares solution of Ax = b is not unique and solve the normal equations to find all the least squares solutions.
23.–24. [The matrices A and vectors b for these exercises are not reproduced here.]

In Exercises 25 and 26, find the best approximation to a solution of the given system of equations.
25. x + y − z = 2
    −y + 2z = 6
    3x + 2y − z = 11
    −x + 2y + z = 0
26. 2x + 3y + z = 21
    x + y + z = 7
    −x + y − z = 14
    z = 0

In Exercises 27 and 28, a QR factorization of A is given. Use it to find a least squares solution of Ax = b.
27.–28. [The factorizations A = QR and vectors b for these exercises are not reproduced here.]

29. A tennis ball is dropped from various heights, and the height of the ball on the first bounce is measured. Use the data in Table 7.3 to find the least squares approximating line for bounce height b as a linear function of initial height h.

Table 7.3
h (cm)   20     40   48   60     80
b (cm)   14.5   31   36   45.5   59

30. Hooke's Law states that the length L of a spring is a linear function of the force F applied to it. (See Figure 7.17 and Example 6.92.) Accordingly, there are constants a and b such that L = a + bF. Table 7.4 shows the results of attaching various weights to a spring.

Table 7.4
F (oz)   2     4     6      8
L (in.)  7.4   9.6   11.5   13.6

(a) Determine the constants a and b by finding the least squares approximating line for these data. What does a represent?
(b) Estimate the length of the spring when a weight of 5 ounces is attached.

31. Table 7.5 gives life expectancies for people born in the United States in the given years.

Table 7.5
Year of Birth              1920   1930   1940   1950   1960   1970   1980   1990
Life Expectancy (years)    54.1   59.7   62.9   68.2   69.7   70.8   73.7   75.4
Source: World Almanac and Book of Facts. New York: World Almanac Books, 1999.

(a) Determine the least squares approximating line for these data and use it to predict the life expectancy of someone born in 2000.
(b) How good is this model? Explain.

32. When an object is thrown straight up into the air, Newton's second law of motion states that its height s(t) at time t is given by
s(t) = s₀ + v₀t + ½gt²
where v₀ is its initial velocity and g is the constant of acceleration due to gravity. Suppose we take the measurements shown in Table 7.6.

Table 7.6
Time (s)     0.5   1    1.5   2    3
Height (m)   11    17   21    23   18

(a) Find the least squares approximating quadratic for these data.
(b) Estimate the height at which the object was released (in m), its initial velocity (in m/s), and its acceleration due to gravity (in m/s²).
(c) Approximately when will the object hit the ground?

33. Table 7.7 gives the population of the United States at 10-year intervals for the years 1950–2000.

Table 7.7
Year                        1950   1960   1970   1980   1990   2000
Population (in millions)    150    179    203    227    250    281
Source: U.S. Bureau of the Census

(a) Assuming an exponential growth model of the form p(t) = ceᵏᵗ, where p(t) is the population at time t, use least squares to find the equation for the growth rate of the population. [Hint: Let t = 0 be 1950.]
(b) Use the equation to estimate the U.S. population in 2010.

34. Table 7.8 shows average major league baseball salaries for the years 1970–2000.

Table 7.8
Year                                   1970   1975   1980    1985    1990    1995     2000
Average Salary (thousands of dollars)  29.3   44.7   143.8   371.6   597.5   1110.8   1895.6
Source: Major League Baseball Players Association

(a) Find the least squares approximating quadratic for these data.
(b) Find the least squares approximating exponential for these data.
(c) Which equation gives the better approximation? Why?
(d) What do you estimate the average major league baseball salary will be in 2010?

35. A 200-mg sample of radioactive polonium-210 is observed as it decays. Table 7.9 shows the mass remaining at various times. Assuming an exponential decay model, use least squares to find the half-life of polonium-210. (See Section 6.7.)

Table 7.9
Time (days)   0     30    60    90
Mass (mg)     200   172   148   128

36. Find the plane z = a + bx + cy that best fits the data points (0, −4, 0), (5, 0, 0), (4, −1, 1), (1, −3, 1), and (−1, −5, −2).

In Exercises 37–42, find the standard matrix of the orthogonal projection onto the subspace W. Then use this matrix to find the orthogonal projection of v onto W.
37.–42. [The subspaces W and vectors v for these exercises are not reproduced here.]

43. Verify that the standard matrix of the projection onto W in Example 7.31 (as constructed by Theorem 7.11) does not depend on the choice of basis. Take [the alternative basis given in the text] as a basis for W and repeat the calculations to show that the resulting projection matrix is the same.

44. Let A be a matrix with linearly independent columns and let P = A(AᵀA)⁻¹Aᵀ be the matrix of orthogonal projection onto col(A).
(a) Show that P is symmetric.
(b) Show that P is idempotent.

In Exercises 45–52, compute the pseudoinverse of A.
45.–52. [The matrices A for these exercises are not reproduced here.]

53. (a) Show that if A is a square matrix with linearly independent columns, then A⁺ = A⁻¹.
(b) If A is an m×n matrix with orthonormal columns, what is A⁺?

54. Prove Theorem 7.12(b).

55. Prove the remaining part of Theorem 7.12(c).

56. Let A be a matrix with linearly independent columns. Prove the following:
(a) (cA)⁺ = (1/c)A⁺ for all scalars c ≠ 0.
(b) (A⁺)⁺ = A if A is a square matrix.
(c) (Aᵀ)⁺ = (A⁺)ᵀ if A is a square matrix.

57. Let n data points (x₁, y₁), …, (xₙ, yₙ) be given. Show that if the points do not all lie on the same vertical line, then they have a unique least squares approximating line.

58. Let n data points (x₁, y₁), …, (xₙ, yₙ) be given. Generalize Exercise 57 to show that if at least k + 1 of x₁, …, xₙ are distinct, then the given points have a unique least squares approximating polynomial of degree at most k.
The Singular Value Decomposition

In Chapter 5, we saw that every symmetric matrix A can be factored as A = PDPᵀ, where P is an orthogonal matrix and D is a diagonal matrix displaying the eigenvalues of A. If A is not symmetric, such a factorization is not possible, but as we learned in Chapter 4, we may still be able to factor a square matrix A as A = PDP⁻¹, where D is as before but P is now simply an invertible matrix. However, not every matrix is diagonalizable, so it may surprise you that we will now show that every matrix (symmetric or not, square or not) has a factorization of the form A = PDQᵀ, where P and Q are orthogonal and D is a diagonal matrix! This remarkable result is the singular value decomposition (SVD), and it is one of the most important of all matrix factorizations.
In this section, we will show how to compute the SVD of a matrix and then consider some of its many applications. Along the way, we will tie up some loose ends by answering a few questions that were left open in previous sections.
The Singular Values of a Matrix

For any m×n matrix A, the n×n matrix AᵀA is symmetric and hence can be orthogonally diagonalized, by the Spectral Theorem. Not only are the eigenvalues of AᵀA all real (Theorem 5.18), they are all nonnegative. To show this, let λ be an eigenvalue of AᵀA with corresponding unit eigenvector v. Then
0 ≤ ‖Av‖² = (Av)·(Av) = (Av)ᵀAv = vᵀAᵀAv = vᵀλv = λ(v·v) = λ‖v‖² = λ
It therefore makes sense to take (positive) square roots of these eigenval ues.
Definition  If A is an m×n matrix, the singular values of A are the square roots of the eigenvalues of AᵀA and are denoted by σ₁, …, σₙ. It is conventional to arrange the singular values so that σ₁ ≥ σ₂ ≥ ⋯ ≥ σₙ.

Example 7.33
Find the singular values of
A = [1 1; 1 0; 0 1]

Solution  The matrix
AᵀA = [2 1; 1 2]
has eigenvalues λ₁ = 3 and λ₂ = 1. Consequently, the singular values of A are σ₁ = √λ₁ = √3 and σ₂ = √λ₂ = 1.
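The computation in Example 7.33 is easy to check numerically. The sketch below (not from the text; NumPy assumed) forms AᵀA, takes square roots of its eigenvalues, and compares the result with the singular values NumPy computes directly.

```python
import numpy as np

# Singular values of A as the square roots of the eigenvalues of A^T A.
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

eigs = np.linalg.eigvalsh(A.T @ A)            # eigenvalues of A^T A, ascending: [1, 3]
sigma_from_eigs = np.sqrt(eigs)[::-1]         # decreasing order: [sqrt(3), 1]
sigma_direct = np.linalg.svd(A, compute_uv=False)

print(sigma_from_eigs)   # [1.732..., 1.0]
print(sigma_direct)      # the same values
```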
To understand the significance of the singular values of an m×n matrix A, consider the eigenvectors of AᵀA. Since AᵀA is symmetric, we know that there is an orthonormal basis for ℝⁿ that consists of eigenvectors of AᵀA. Let {v₁, …, vₙ} be such a basis corresponding to the eigenvalues of AᵀA, ordered so that λ₁ ≥ λ₂ ≥ ⋯ ≥ λₙ. From our calculations just before the definition,
λᵢ = ‖Avᵢ‖²
Therefore,
‖Avᵢ‖ = √λᵢ = σᵢ
In other words, the singular values of A are the lengths of the vectors Av₁, …, Avₙ.
Geometrically, this result has a nice interpretation. Consider Example 7.33 again. If x lies on the unit circle in ℝ² (i.e., ‖x‖ = 1), then
‖Ax‖² = (Ax)·(Ax) = (Ax)ᵀ(Ax) = xᵀAᵀAx = [x₁ x₂][2 1; 1 2][x₁; x₂] = 2x₁² + 2x₁x₂ + 2x₂²
which we recognize is a quadratic form. By Theorem 5.25, the maximum and minimum values of this quadratic form, subject to the constraint ‖x‖ = 1, are λ₁ = 3 and λ₂ = 1, respectively, and they occur at the corresponding eigenvectors of AᵀA; that is, when x = v₁ = [1/√2; 1/√2] and x = v₂ = [−1/√2; 1/√2], respectively. Since
‖Avᵢ‖ = √(vᵢᵀAᵀAvᵢ) = √λᵢ = σᵢ
for i = 1, 2, we see that σ₁ = ‖Av₁‖ = √3 and σ₂ = ‖Av₂‖ = 1 are the maximum and minimum values of the lengths ‖Ax‖ as x traverses the unit circle in ℝ². Now, the linear transformation corresponding to A maps ℝ² onto the plane in ℝ³ with equation x − y − z = 0 (verify this), and the image of the unit circle under this transformation is an ellipse that lies in this plane. (We will verify this fact in general shortly; see Figure 7.18.) So σ₁ and σ₂ are the lengths of half of the major and minor axes of this ellipse, as shown in Figure 7.19.
We can now describe the singular value decomposition of a matrix.
Figure 7.18  Multiplication by A transforms the unit circle in ℝ² into an ellipse in ℝ³

Figure 7.19
The Singular Value Decomposition
We want to show that an m×n matrix A can be factored as
A = UΣVᵀ
where U is an m×m orthogonal matrix, V is an n×n orthogonal matrix, and Σ is an m×n "diagonal" matrix. If the nonzero singular values of A are
σ₁ ≥ σ₂ ≥ ⋯ ≥ σᵣ > 0
and σᵣ₊₁ = σᵣ₊₂ = ⋯ = σₙ = 0, then Σ will have the block form
Σ = [D O; O O]    (1)
where
D = diag(σ₁, σ₂, …, σᵣ)
and each matrix O is a zero matrix of the appropriate size. (If r = m or r = n, some of these will not appear.) Some examples of such a matrix Σ with r = 2 are
[sample matrices Σ not reproduced here]
(What is D in each case?)
To construct the orthogonal matrix V, we first find an orthonormal basis {v₁, …, vₙ} for ℝⁿ consisting of eigenvectors of the n×n symmetric matrix AᵀA. Then
V = [v₁ v₂ ⋯ vₙ]
is an orthogonal n×n matrix.
For the orthogonal matrix U, we first note that {Av₁, …, Avₙ} is an orthogonal set of vectors in ℝᵐ. To see this, suppose that vⱼ is the eigenvector of AᵀA corresponding to the eigenvalue λⱼ. Then, for i ≠ j, we have
(Avᵢ)·(Avⱼ) = (Avᵢ)ᵀAvⱼ = vᵢᵀAᵀAvⱼ = vᵢᵀλⱼvⱼ = λⱼ(vᵢ·vⱼ) = 0
since the eigenvectors vᵢ are orthogonal. Now recall that the singular values satisfy σᵢ = ‖Avᵢ‖ and that the first r of these are nonzero. Therefore, we can normalize Av₁, …, Avᵣ by setting
uᵢ = (1/σᵢ)Avᵢ   for i = 1, …, r
This guarantees that {u₁, …, uᵣ} is an orthonormal set in ℝᵐ, but if r < m it will not be a basis for ℝᵐ. In this case, we extend the set {u₁, …, uᵣ} to an orthonormal basis {u₁, …, uₘ} for ℝᵐ. (This is the only tricky part of the construction; we will describe techniques for carrying it out in the examples below and in the exercises.) Then we set
U = [u₁ u₂ ⋯ uₘ]
All that remains to be shown is that this works; that is, we need to verify that with U, V, and Σ as described, we have A = UΣVᵀ. Since Vᵀ = V⁻¹, this is equivalent to showing that AV = UΣ. We know that
Avᵢ = σᵢuᵢ  for i = 1, …, r   and   Avᵢ = 0  for i = r + 1, …, n
Hence,
AV = A[v₁ ⋯ vₙ] = [Av₁ ⋯ Avₙ] = [σ₁u₁ ⋯ σᵣuᵣ 0 ⋯ 0] = [u₁ ⋯ uₘ][D O; O O] = UΣ
as required. We have just proved the following extremely important theorem.
Theorem 7.13  The Singular Value Decomposition
Let A be an m×n matrix with singular values σ₁ ≥ σ₂ ≥ ⋯ ≥ σᵣ > 0 and σᵣ₊₁ = σᵣ₊₂ = ⋯ = σₙ = 0. Then there exist an m×m orthogonal matrix U, an n×n orthogonal matrix V, and an m×n matrix Σ of the form shown in equation (1) such that
A = UΣVᵀ

A factorization of A as in Theorem 7.13 is called a singular value decomposition (SVD) of A. The columns of U are called left singular vectors of A, and the columns of V are called right singular vectors of A. The matrices U and V are not uniquely determined by A, but Σ must contain the singular values of A, as in equation (1). (See Exercise 25.)
Example 7.34
Find a singular value decomposition for the following matrices:
(a) A = [1 1 0; 0 0 1]    (b) A = [1 1; 1 0; 0 1]

Solution  (a) We compute
AᵀA = [1 0; 1 0; 0 1][1 1 0; 0 0 1] = [1 1 0; 1 1 0; 0 0 1]
and find that its eigenvalues are λ₁ = 2, λ₂ = 1, and λ₃ = 0, with corresponding eigenvectors
[1; 1; 0],  [0; 0; 1],  [−1; 1; 0]
(Verify this.) These vectors are orthogonal, so we normalize them to obtain
v₁ = [1/√2; 1/√2; 0],  v₂ = [0; 0; 1],  v₃ = [−1/√2; 1/√2; 0]
The singular values of A are σ₁ = √2, σ₂ = √1 = 1, and σ₃ = √0 = 0. Thus
V = [1/√2 0 −1/√2; 1/√2 0 1/√2; 0 1 0]   and   Σ = [√2 0 0; 0 1 0]
To find U, we compute
u₁ = (1/σ₁)Av₁ = (1/√2)[1 1 0; 0 0 1][1/√2; 1/√2; 0] = [1; 0]
and
u₂ = (1/σ₂)Av₂ = (1/1)[1 1 0; 0 0 1][0; 0; 1] = [0; 1]
These vectors already form an orthonormal basis (the standard basis) for ℝ², so we have
U = [1 0; 0 1]
This yields the SVD
A = [1 1 0; 0 0 1] = [1 0; 0 1][√2 0 0; 0 1 0][1/√2 1/√2 0; 0 0 1; −1/√2 1/√2 0] = UΣVᵀ
which can be easily checked. (Note that V had to be transposed. Also note that the singular value σ₃ does not appear in Σ.)

(b) This is the matrix of Example 7.33, so we already know that the singular values are σ₁ = √3 and σ₂ = 1, corresponding to
v₁ = [1/√2; 1/√2]   and   v₂ = [−1/√2; 1/√2]
So
V = [1/√2 −1/√2; 1/√2 1/√2]   and   Σ = [√3 0; 0 1; 0 0]
For U, we compute
u₁ = (1/σ₁)Av₁ = (1/√3)[1 1; 1 0; 0 1][1/√2; 1/√2] = [2/√6; 1/√6; 1/√6]
and
u₂ = (1/σ₂)Av₂ = (1/1)[1 1; 1 0; 0 1][−1/√2; 1/√2] = [0; −1/√2; 1/√2]
This time, we need to extend {u₁, u₂} to an orthonormal basis for ℝ³. There are several ways to proceed; one method is to use the Gram-Schmidt Process, as in Example 5.14. We first need to find a linearly independent set of three vectors that contains u₁ and u₂. If e₃ is the third standard basis vector in ℝ³, it is clear that {u₁, u₂, e₃} is linearly independent. (Here, you should be able to determine this by inspection, but a reliable method to use in general is to row reduce the matrix with these vectors as its columns and use the Fundamental Theorem.) Applying Gram-Schmidt (with normalization) to {u₁, u₂, e₃} (only the last step is needed), we find
u₃ = [−1/√3; 1/√3; 1/√3]
so
U = [2/√6 0 −1/√3; 1/√6 −1/√2 1/√3; 1/√6 1/√2 1/√3]
and we have the SVD
A = [1 1; 1 0; 0 1] = [2/√6 0 −1/√3; 1/√6 −1/√2 1/√3; 1/√6 1/√2 1/√3][√3 0; 0 1; 0 0][1/√2 1/√2; −1/√2 1/√2] = UΣVᵀ
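The hand computation in Example 7.34(b) can be verified numerically. The following sketch (not part of the text; NumPy assumed) builds U, Σ, and Vᵀ exactly as above and confirms that they multiply back to A and that U and V are orthogonal.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

U = np.array([[2/np.sqrt(6),            0.0, -1/np.sqrt(3)],
              [1/np.sqrt(6), -1/np.sqrt(2),   1/np.sqrt(3)],
              [1/np.sqrt(6),  1/np.sqrt(2),   1/np.sqrt(3)]])
Sigma = np.array([[np.sqrt(3), 0.0],
                  [0.0,        1.0],
                  [0.0,        0.0]])
Vt = np.array([[ 1/np.sqrt(2), 1/np.sqrt(2)],
               [-1/np.sqrt(2), 1/np.sqrt(2)]])

print(np.allclose(U @ Sigma @ Vt, A))     # True: U * Sigma * V^T reproduces A
print(np.allclose(U.T @ U, np.eye(3)))    # True: U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(2)))  # True: V is orthogonal
```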
There is another form of the singular value decomposition, analogous to the spectral decomposition of a symmetric matrix. It is obtained from the SVD by an outer product expansion and is very useful in applications. We can obtain this version of the SVD by imitating what we did to obtain the spectral decomposition, using block multiplication and the column-row representation of the product. The following theorem summarizes the process for obtaining this outer product form of the SVD.

Theorem 7.14  The Outer Product Form of the SVD
Let A be an m×n matrix with singular values σ₁ ≥ σ₂ ≥ ⋯ ≥ σᵣ > 0 and σᵣ₊₁ = σᵣ₊₂ = ⋯ = σₙ = 0. Let u₁, …, uᵣ be left singular vectors and let v₁, …, vᵣ be right singular vectors of A corresponding to these singular values. Then
A = σ₁u₁v₁ᵀ + ⋯ + σᵣuᵣvᵣᵀ

Remark  If A is a positive definite, symmetric matrix, then Theorems 7.13 and 7.14 both reduce to results that we already know. In this case, it is not hard to show that the SVD generalizes the Spectral Theorem and that Theorem 7.14 generalizes the spectral decomposition. (See Exercise 27.)
The SVD of a matrix A contains much important information about A, as outlined in Theorem 7.15, which is crucial.
Theorem 7.15
Let A = UΣVᵀ be a singular value decomposition of an m×n matrix A. Let σ₁, …, σᵣ be all the nonzero singular values of A. Then
a. The rank of A is r.
b. {u₁, …, uᵣ} is an orthonormal basis for col(A).
c. {uᵣ₊₁, …, uₘ} is an orthonormal basis for null(Aᵀ).
d. {v₁, …, vᵣ} is an orthonormal basis for row(A).
e. {vᵣ₊₁, …, vₙ} is an orthonormal basis for null(A).

Proof  (a) By Exercise 55 in Section 3.5, we have
rank(A) = rank(UΣVᵀ) = rank(ΣVᵀ) = rank(Σ) = r
(b) We already know that {u₁, …, uᵣ} is an orthonormal set. Therefore, it is linearly independent, by Theorem 5.1. Since uᵢ = (1/σᵢ)Avᵢ for i = 1, …, r, each uᵢ is in the column space of A. (Why?) Furthermore, r = rank(A) = dim(col(A)). Therefore, {u₁, …, uᵣ} is an orthonormal basis for col(A), by Theorem 6.10(c).
(c) Since {u₁, …, uₘ} is an orthonormal basis for ℝᵐ and {u₁, …, uᵣ} is a basis for col(A), by property (b), it follows that {uᵣ₊₁, …, uₘ} is an orthonormal basis for the orthogonal complement of col(A). But (col(A))⊥ = null(Aᵀ), by Theorem 5.10.
(e) Since Avᵣ₊₁ = ⋯ = Avₙ = 0, the set {vᵣ₊₁, …, vₙ} is an orthonormal set contained in the null space of A. Therefore, {vᵣ₊₁, …, vₙ} is a linearly independent set of n − r vectors in null(A). But dim(null(A)) = n − r by the Rank Theorem, so {vᵣ₊₁, …, vₙ} is an orthonormal basis for null(A), by Theorem 6.10(c).
(d) Property (d) follows from property (e) and Theorem 5.10. (You are asked to prove this in Exercise 32.)
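Theorem 7.15 translates directly into a recipe for reading off orthonormal bases of the four fundamental subspaces from a full SVD. The sketch below (not part of the text; NumPy assumed) does this for the matrix of Example 7.34(a).

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

U, s, Vt = np.linalg.svd(A)          # full SVD: U is 2x2, Vt is 3x3
r = int(np.sum(s > 1e-10))           # number of nonzero singular values (here 2)

col_A   = U[:, :r]       # orthonormal basis for col(A)
null_At = U[:, r:]       # orthonormal basis for null(A^T) (empty here, since r = m)
row_A   = Vt[:r, :].T    # orthonormal basis for row(A)
null_A  = Vt[r:, :].T    # orthonormal basis for null(A)

print(r, null_A.shape)               # 2 (3, 1)
print(np.allclose(A @ null_A, 0))    # True: the last right singular vector lies in null(A)
```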
The SVD provides new geometric insight into the effect of matrix transformations. We have noted several times (without proof) that an m×n matrix transforms the unit sphere in ℝⁿ into an ellipsoid in ℝᵐ. This point arose, for example, in our discussions of Perron's Theorem and of operator norms, as well as in the introduction to singular values in this section. We now prove this result.

Theorem 7.16
Let A = UΣVᵀ be a singular value decomposition of an m×n matrix A with rank r. Then the image of the unit sphere in ℝⁿ under the matrix transformation that maps x to Ax is
a. the surface of an ellipsoid in ℝᵐ if r = n.
b. a solid ellipsoid in ℝᵐ if r < n.

Proof  Let the left and right singular vectors of A be u₁, …, uₘ and v₁, …, vₙ, respectively. Since rank(A) = r, the singular values of A satisfy σ₁ ≥ ⋯ ≥ σᵣ > 0 and σᵣ₊₁ = ⋯ = σₙ = 0, by Theorem 7.15(a). Let x = [x₁; …; xₙ] be a unit vector in ℝⁿ. Now, since V is an orthogonal matrix, so is Vᵀ, and hence Vᵀx is a unit vector, by Theorem 5.6. Since
Vᵀx = [v₁ᵀx; v₂ᵀx; …; vₙᵀx]
we have (v₁ᵀx)² + ⋯ + (vₙᵀx)² = 1. By the outer product form of the SVD, we have A = σ₁u₁v₁ᵀ + ⋯ + σᵣuᵣvᵣᵀ. Therefore,
Ax = σ₁u₁v₁ᵀx + ⋯ + σᵣuᵣvᵣᵀx = (σ₁v₁ᵀx)u₁ + ⋯ + (σᵣvᵣᵀx)uᵣ = y₁u₁ + ⋯ + yᵣuᵣ
where we are letting yᵢ denote the scalar σᵢvᵢᵀx.
(a) If r = n, then we must have n ≤ m and Ax = Uy, where y = [y₁; …; yₙ; 0; …; 0]. Therefore, ‖Ax‖ = ‖Uy‖ = ‖y‖, again by Theorem 5.6, since U is orthogonal. But
(y₁/σ₁)² + ⋯ + (yₙ/σₙ)² = (v₁ᵀx)² + ⋯ + (vₙᵀx)² = 1
which shows that the vectors Ax form the surface of an ellipsoid in ℝᵐ. (Why?)
(b) If r < n, the only difference in the above steps is that the equation becomes the inequality
(y₁/σ₁)² + ⋯ + (yᵣ/σᵣ)² ≤ 1
since we are missing some terms. This inequality corresponds to a solid ellipsoid in ℝᵐ.
Example 7.35
Describe the image of the unit sphere in ℝ³ under the action of the matrix
A = [1 1 0; 0 0 1]

Solution  In Example 7.34(a), we found the following SVD of A:
A = [1 0; 0 1][√2 0 0; 0 1 0][1/√2 1/√2 0; 0 0 1; −1/√2 1/√2 0]
Since r = rank(A) = 2 < 3 = n, the second part of Theorem 7.16 applies. The image of the unit sphere will satisfy the inequality
y₁²/(√2)² + y₂²/1² ≤ 1,   or   y₁²/2 + y₂² ≤ 1
relative to y₁y₂-coordinate axes in ℝ² (corresponding to the left singular vectors u₁ and u₂). Since u₁ = e₁ and u₂ = e₂, the image is as shown in Figure 7.20.

Figure 7.20

In general, we can describe the effect of an m×n matrix A on the unit sphere in ℝⁿ in terms of the effect of each factor in its SVD, A = UΣVᵀ, from right to left. Since Vᵀ is an orthogonal matrix, it maps the unit sphere to itself. The m×n matrix Σ does two things: the zero diagonal entries σᵣ₊₁ = σᵣ₊₂ = ⋯ = σₙ = 0 collapse n − r of the dimensions of the unit sphere, leaving an r-dimensional unit sphere, which the nonzero diagonal entries σ₁, …, σᵣ then distort into an ellipsoid. The orthogonal matrix U then aligns the axes of this ellipsoid with the orthonormal basis vectors u₁, …, uᵣ in ℝᵐ. (See Figure 7.21.)

Figure 7.21

Applications of the SVD
The singular value decomposition is an extremely useful tool, both practically and theoretically. We will look at just a few of its many applications.
Rank
Until now, we have not worried about calculating the rank of a matrix from a computational point of view. We compute the rank of a matrix by row reducing it to echelon form and counting the number of nonzero rows. However, as we have seen, roundoff errors can affect this process, especially if the matrix is ill-conditioned. Entries that should be zero may end up as very small nonzero numbers, affecting our ability to accurately determine the rank and other quantities associated with the matrix. In practice, the SVD is often used to find the rank of a matrix, since it is much more reliable when roundoff errors are present. The basic idea behind this approach is that the orthogonal matrices U and V in the SVD preserve lengths and thus do not introduce additional errors; any errors that occur will tend to show up in the matrix Σ.

Example 7.36
Let
A = [8.1650 −0.0041 −0.0041; 4.0825 −3.9960 4.0042; 4.0825 4.0042 −3.9960]   and   B = [8.17 0 0; 4.08 −4 4; 4.08 4 −4]
The matrix B has been obtained by rounding off the entries of A to two decimal places. If we compute the ranks of these two approximately equal matrices, we find that rank(A) = 3 but rank(B) = 2. By the Fundamental Theorem, this implies, among other things, that A is invertible but B is not.
The explanation for this critical difference between two matrices that are approximately equal lies in their SVDs. The singular values of A are 10, 8, and 0.01, so A has rank 3. The singular values of B are 10, 8, and 0, so B has rank 2.
In practical applications, it is often assumed that if a singular value is computed to be close to zero, then roundoff error has crept in and the actual value should be zero. In this way, "noise" can be filtered out. In this example, if we compute A = UΣVᵀ and replace
Σ = [10 0 0; 0 8 0; 0 0 0.01]   by   Σ̃ = [10 0 0; 0 8 0; 0 0 0]
then UΣ̃Vᵀ = B. (Try it!)
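The noise-filtering idea of Example 7.36 is easy to automate: count only the singular values above a tolerance, and zero out the rest before reassembling the matrix. Here is a sketch (not from the text; NumPy assumed; the tolerance and the randomly generated test matrix are arbitrary choices of mine).

```python
import numpy as np

def numerical_rank(A, tol=1e-8):
    """Count singular values that are not negligibly small relative to the largest."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def filter_noise(A, tol=1e-8):
    """Zero out tiny singular values and rebuild the matrix, as in Example 7.36."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s = np.where(s > tol * s[0], s, 0.0)
    return U @ np.diag(s) @ Vt

A = np.random.rand(5, 3) @ np.random.rand(3, 5)   # an exactly rank-3 5x5 matrix
A_noisy = A + 1e-10 * np.random.rand(5, 5)        # simulate roundoff "noise"
print(numerical_rank(A_noisy))                    # 3, despite the perturbation
```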
Matrix Norms and the Condition Number
The SVD can provide simple formulas for certain expressions involving matrix norms. Consider, for example, the Frobenius norm of a matrix. The following theorem shows that it is completely determined by the singular values of the matrix.

Theorem 7.17
Let A be an m×n matrix and let σ₁, …, σᵣ be all the nonzero singular values of A. Then
‖A‖_F = √(σ₁² + ⋯ + σᵣ²)

The proof of this result depends on the following analogue of Theorem 5.6: If A is an m×n matrix and Q is an m×m orthogonal matrix, then
‖QA‖_F = ‖A‖_F    (2)
To show that this is true, we compute
‖QA‖²_F = ‖[Qa₁ ⋯ Qaₙ]‖²_F = ‖Qa₁‖² + ⋯ + ‖Qaₙ‖² = ‖a₁‖² + ⋯ + ‖aₙ‖² = ‖A‖²_F

Proof of Theorem 7.17  Let A = UΣVᵀ be a singular value decomposition of A. Then, using equation (2) twice, we have
‖A‖²_F = ‖UΣVᵀ‖²_F = ‖ΣVᵀ‖²_F = ‖(ΣVᵀ)ᵀ‖²_F = ‖VΣᵀ‖²_F = ‖Σᵀ‖²_F = σ₁² + ⋯ + σᵣ²
which establishes the result.

Example 7.37
Verify Theorem 7.17 for the matrix A in Example 7.18.

Solution  The matrix A of Example 7.18 has singular values 4.5150 and 3.1008. We check that
√(σ₁² + σ₂²) = √(4.5150² + 3.1008²) ≈ 5.48
which agrees with the value of ‖A‖_F computed in Example 7.18.
In Section 7.2, we commented that there is no easy formula for the operator 2-norm of a matrix A. Although that is true, the SVD of A provides us with a very nice expression for ‖A‖₂. Recall that
‖A‖₂ = max over ‖x‖ = 1 of ‖Ax‖
where the vector norm is the ordinary Euclidean norm. By Theorem 7.16, for ‖x‖ = 1, the set of vectors Ax lies on or inside an ellipsoid whose semi-axes have lengths equal to the nonzero singular values of A. It follows immediately that the largest of these is σ₁, so
‖A‖₂ = σ₁
This provides us with a neat way to express the condition number of a (square) matrix with respect to the operator 2-norm. Recall that the condition number (with respect to the operator 2-norm) of an invertible matrix A is defined as
cond₂(A) = ‖A⁻¹‖₂‖A‖₂
As you will be asked to show in Exercise 28, if A = UΣVᵀ, then A⁻¹ = VΣ⁻¹Uᵀ. Therefore, the singular values of A⁻¹ are 1/σ₁, …, 1/σₙ (why?), and so
cond₂(A) = ‖A⁻¹‖₂‖A‖₂ = σ₁/σₙ

Example 7.38
Find the 2-condition number of the matrix A in Example 7.36.

Solution  Since σ₁ = 10 and σ₃ = 0.01,
cond₂(A) = σ₁/σ₃ = 10/0.01 = 1000
This value is large enough to suggest that A may be ill-conditioned and we should be wary of the effect of roundoff errors.
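These two formulas are simple to use in practice. The sketch below (not from the text; NumPy assumed) computes ‖A‖₂ = σ₁ and cond₂(A) = σ₁/σₙ for a matrix manufactured to have the singular values 10, 8, and 0.01 quoted above.

```python
import numpy as np

def two_norm_and_cond2(A):
    s = np.linalg.svd(A, compute_uv=False)   # singular values, largest first
    return s[0], s[0] / s[-1]

# Build a matrix with prescribed singular values 10, 8, 0.01 (any orthogonal Q works).
Q, _ = np.linalg.qr(np.random.rand(3, 3))
A = Q @ np.diag([10.0, 8.0, 0.01]) @ Q.T

norm2, cond2 = two_norm_and_cond2(A)
print(norm2, cond2)                                   # approximately 10.0 and 1000.0
print(np.linalg.norm(A, 2), np.linalg.cond(A, 2))     # NumPy's built-ins agree
```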
The Pseudoinverse and Least Squares Approximation
In Section 7.3, we produced the formula A⁺ = (AᵀA)⁻¹Aᵀ for the pseudoinverse of a matrix A. Clearly, this formula is valid only if AᵀA is invertible, as we noted at the time. Equipped with the SVD, we can now define the pseudoinverse of any matrix, generalizing our previous formula.

E. H. Moore (1862–1932) was an American mathematician who worked in group theory, number theory, and geometry. He was the first head of the mathematics department at the University of Chicago when it opened in 1892. In 1920, he introduced a generalized matrix inverse that included rectangular matrices. His work did not receive much attention because of his obscure writing style.

Definition  Let A = UΣVᵀ be an SVD for an m×n matrix A, where Σ = [D O; O O] and D is an r×r diagonal matrix containing the nonzero singular values σ₁ ≥ σ₂ ≥ ⋯ ≥ σᵣ > 0 of A. The pseudoinverse (or Moore-Penrose inverse) of A is the n×m matrix A⁺ defined by
A⁺ = VΣ⁺Uᵀ
where Σ⁺ is the n×m matrix
Σ⁺ = [D⁻¹ O; O O]

Example 7.39
Find the pseudoinverses of the matrices in Example 7.34.

Solution  (a) From the SVD
A = [1 1 0; 0 0 1] = [1 0; 0 1][√2 0 0; 0 1 0][1/√2 1/√2 0; 0 0 1; −1/√2 1/√2 0] = UΣVᵀ
we form
Σ⁺ = [1/√2 0; 0 1; 0 0]
Then
A⁺ = VΣ⁺Uᵀ = [1/√2 0 −1/√2; 1/√2 0 1/√2; 0 1 0][1/√2 0; 0 1; 0 0][1 0; 0 1] = [1/2 0; 1/2 0; 0 1]
(b) We have the SVD
A = [1 1; 1 0; 0 1] = [2/√6 0 −1/√3; 1/√6 −1/√2 1/√3; 1/√6 1/√2 1/√3][√3 0; 0 1; 0 0][1/√2 1/√2; −1/√2 1/√2] = UΣVᵀ
so
Σ⁺ = [1/√3 0 0; 0 1 0]
and
A⁺ = VΣ⁺Uᵀ = [1/√2 −1/√2; 1/√2 1/√2][1/√3 0 0; 0 1 0][2/√6 1/√6 1/√6; 0 −1/√2 1/√2; −1/√3 1/√3 1/√3] = [1/3 2/3 −1/3; 1/3 −1/3 2/3]

One of those who was unaware of Moore's work on matrix inverses was Roger Penrose (b. 1931), who introduced his own notion of a generalized matrix inverse in 1955. Penrose has made many contributions to geometry and theoretical physics. He is also the inventor of a type of nonperiodic tiling that covers the plane with only two different shapes of tile, yet has no repeating pattern. He has received many awards, including the 1988 Wolf Prize in Physics, which he shared with Stephen Hawking. In 1994, he was knighted for services to science. Sir Roger Penrose is currently the Emeritus Rouse Ball Professor of Mathematics at the University of Oxford.

It is straightforward to check that this new definition of the pseudoinverse generalizes the old one, for if the m×n matrix A = UΣVᵀ has linearly independent columns, then direct substitution shows that (AᵀA)⁻¹Aᵀ = VΣ⁺Uᵀ. (You are asked to verify this in Exercise 50.) Other properties of the pseudoinverse are explored in the exercises.
We have seen that when A has linearly independent columns, there is a unique least squares solution x̄ of Ax = b; that is, the normal equations AᵀAx̄ = Aᵀb have the unique solution
x̄ = (AᵀA)⁻¹Aᵀb = A⁺b
When the columns of A are linearly dependent, then AᵀA is not invertible, so the normal equations have infinitely many solutions. In this case, we will ask for the solution x̄ of minimum length (i.e., the one closest to the origin). It turns out that this time we simply use the general version of the pseudoinverse.
Theorem 7.18
The least squares problem Ax = b has a unique least squares solution x̄ of minimal length that is given by
x̄ = A⁺b

Proof  Let A be an m×n matrix of rank r with SVD A = UΣVᵀ (so that A⁺ = VΣ⁺Uᵀ). Let y = Vᵀx and let c = Uᵀb. Write y and c in block form as
y = [y₁; y₂]   and   c = [c₁; c₂]
where y₁ and c₁ are in ℝʳ. We wish to minimize ‖b − Ax‖ or, equivalently, ‖b − Ax‖². Using Theorem 5.6 and the fact that Uᵀ is orthogonal (because U is), we have
‖b − Ax‖² = ‖Uᵀ(b − Ax)‖² = ‖Uᵀ(b − UΣVᵀx)‖² = ‖Uᵀb − UᵀUΣVᵀx‖² = ‖c − Σy‖²
          = ‖[c₁; c₂] − [D O; O O][y₁; y₂]‖² = ‖[c₁ − Dy₁; c₂]‖² = ‖c₁ − Dy₁‖² + ‖c₂‖²
The only part of this expression that we have any control over is y₁, so the minimum value occurs when c₁ − Dy₁ = 0 or, equivalently, when y₁ = D⁻¹c₁. So all least squares solutions x̄ are of the form
x̄ = Vy = V[D⁻¹c₁; y₂]
Set
x̄ = V[D⁻¹c₁; 0]
We claim that this x̄ is the least squares solution of minimal length. To show this, let's suppose that
x̄′ = Vy′ = V[D⁻¹c₁; y₂]
is a different least squares solution (hence, y₂ ≠ 0). Then
‖x̄‖ = ‖Vy‖ = ‖y‖ < ‖y′‖ = ‖Vy′‖ = ‖x̄′‖
as claimed. We still must show that x̄ is equal to A⁺b. To do so, we simply compute
x̄ = Vy = V[D⁻¹c₁; 0] = V[D⁻¹ O; O O][c₁; c₂] = VΣ⁺Uᵀb = A⁺b

Example 7.40
Find the minimum length least squares solution of Ax = b, where
A = [1 1; 1 1]   and   b = [0; 1]

Solution  The corresponding equations
x + y = 0
x + y = 1
are clearly inconsistent, so a least squares solution is our only hope. Moreover, the columns of A are linearly dependent, so there will be infinitely many least squares solutions, among which we want the one with minimal length. An SVD of A is given by
A = [1 1; 1 1] = [1/√2 1/√2; 1/√2 −1/√2][2 0; 0 0][1/√2 1/√2; 1/√2 −1/√2] = UΣVᵀ
(Verify this.) It follows that
A⁺ = VΣ⁺Uᵀ = [1/√2 1/√2; 1/√2 −1/√2][1/2 0; 0 0][1/√2 1/√2; 1/√2 −1/√2] = [1/4 1/4; 1/4 1/4]
and so
x̄ = A⁺b = [1/4 1/4; 1/4 1/4][0; 1] = [1/4; 1/4]
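The answer in Example 7.40 can be checked numerically. The sketch below (not part of the text; NumPy assumed) uses NumPy's pseudoinverse, and also np.linalg.lstsq, which returns the minimum-norm least squares solution when A is rank-deficient.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.0, 1.0])

x_bar = np.linalg.pinv(A) @ b
print(x_bar)                                   # [0.25, 0.25], as computed above

x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_bar, x_lstsq))             # True: same minimal-length solution
```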
You can see that the minimum length least squares solution in Example 7.40 satisfies x + y = 1/2. In a sense, this is a compromise between the two equations we started with. In Exercise 49, you are asked to solve the normal equations for this problem directly and to verify that this solution really is the one closest to the origin.

The Fundamental Theorem of Invertible Matrices
It is appropriate to conclude by revisiting the Fundamental Theorem of Invertible Matrices one more time. Not surprisingly, the singular values of a square matrix tell us when the matrix is invertible.

Theorem 7.19  The Fundamental Theorem of Invertible Matrices: Final Version
Let A be an n×n matrix and let T : V → W be a linear transformation whose matrix [T] with respect to bases B and C of V and W, respectively, is A. The following statements are equivalent:
a. A is invertible.
b. Ax = b has a unique solution for every b in ℝⁿ.
c. Ax = 0 has only the trivial solution.
d. The reduced row echelon form of A is Iₙ.
e. A is a product of elementary matrices.
f. rank(A) = n
g. nullity(A) = 0
h. The column vectors of A are linearly independent.
i. The column vectors of A span ℝⁿ.
j. The column vectors of A form a basis for ℝⁿ.
k. The row vectors of A are linearly independent.
l. The row vectors of A span ℝⁿ.
m. The row vectors of A form a basis for ℝⁿ.
n. det A ≠ 0
o. 0 is not an eigenvalue of A.
p. T is invertible.
q. T is one-to-one.
r. T is onto.
s. ker(T) = {0}
t. range(T) = W
u. 0 is not a singular value of A.

Proof  First note that, by the definition of singular values, 0 is a singular value of A if and only if 0 is an eigenvalue of AᵀA.
(a) ⇒ (u): If A is invertible, so is Aᵀ, and hence AᵀA is as well. Therefore, property (o) implies that 0 is not an eigenvalue of AᵀA, so 0 is not a singular value of A.
(u) ⇒ (a): If 0 is not a singular value of A, then 0 is not an eigenvalue of AᵀA. Therefore, AᵀA is invertible, by the equivalence of properties (a) and (o). But then rank(A) = n, by Theorem 3.28, so A is invertible, by the equivalence of properties (a) and (f).
Digital Image Compression
Among the many applications of the SVD, one of the most impressive is its use in compressing digital images so that they can be efficiently transmitted electronically (by satellite, fax, Internet, or the like). We have already discussed the problem of detecting and correcting errors in such transmissions. The problem we now wish to consider has to do with reducing the amount of information that has to be transmitted, without losing any essential information.
In the case of digital images, let's suppose we have a grayscale picture that is 340×280 pixels in size. Each pixel is one of 256 shades of gray, which we can represent by a number between 0 and 255. We can store this information in a 340×280 matrix A, but transmitting and manipulating these 95,200 numbers is very expensive. The idea behind image compression is that some parts of the picture are less interesting than others. For example, in a photograph of someone standing outside, there may be a lot of sky in the background, while the person's face contains a lot of detail. We can probably get away with transmitting every second or third pixel in the background, but we would like to keep all the pixels in the region of the face.
It turns out that the small singular values in the SVD of the matrix A come from the "boring" parts of the image, and we can ignore many of them. Suppose, then, that we have the SVD of A in outer product form
A = σ₁u₁v₁ᵀ + ⋯ + σᵣuᵣvᵣᵀ
Let k ≤ r and define
Aₖ = σ₁u₁v₁ᵀ + ⋯ + σₖuₖvₖᵀ
Then Aₖ is an approximation to A that corresponds to keeping only the first k singular values and the corresponding singular vectors. For our 340×280 example, we may discover that it is enough to transmit only the data corresponding to the first 20 singular values. Then, instead of transmitting 95,200 numbers, we need only send 20 singular values plus the 20 vectors u₁, …, u₂₀ in ℝ³⁴⁰ and the 20 vectors v₁, …, v₂₀ in ℝ²⁸⁰, for a total of
20 + 20·340 + 20·280 = 12,420
numbers. This represents a substantial saving!
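The compression scheme amounts to truncating the outer product expansion after k terms. The sketch below (not from the text; NumPy assumed; the random matrix merely stands in for a real image) shows the bookkeeping.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Keep only the first k terms of A = sigma_1 u_1 v_1^T + ... + sigma_r u_r v_r^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.rand(340, 280)        # stand-in for a 340 x 280 grayscale image
A20 = rank_k_approximation(A, 20)

print(340 * 280)                    # 95,200 numbers to store A itself
print(20 + 20 * 340 + 20 * 280)     # 12,420 numbers for the rank-20 approximation
print(np.linalg.norm(A - A20, 'fro') / np.linalg.norm(A, 'fro'))   # relative error
```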
flIU" 1.22
The pICture of the mat hematician Gauss in Figure 7.22 is a 340X 280 pixel image. It has 256 shades of gray, so the correspo nding matr ix A is 340X 280, with e n! ries between 0 and 255. It turns ou t that the matrix A has rank 280. If we approximate A by Al • as described above, we get an image thai corresponds to the first k singular values of A. Figure 6 shows several of these images for values of k from 2 to 256. At first, the Image is ve ry blurry, but fai rly quickly it takes shape. Notice that A J2 already gives a p relly good approximation to the act ual image (which COmes from A = A,fI(l' as shown in the upper left-hand corner o f Figure 7.23). Some of the singular values of A are {T I "" 49,096, {T I6 = 22,589, u n = 10, 187, 0"64 = 484, u m = 182, U l 56 = 5, and u 28D - 0.5. T he sm aller singular values contribute very lillie to the image, which is why the approximations quickly look so dose to the o riginal.
Orignia1. k = r - 280
k= 4
k=8
k = 16
k = 32
k = 64
k = 128
k = 256
flilr. U3
6"
618
Chapter 7
Distance and Approximation
In Exercises I-10, find the singular values of the given matrix.
[~ ~] 3. A = [~ ~]
2. A =
I. A =
5. A =
0 0 0 3 - 2 0
[~
[0 ~] 0
60 A ~ [3
[:]
7. A =
9. A =
4. A =
[~ ~]
0 2
29. Show that if A :: UI V T is an SV D of A, then the left singular vectors are eigenvectors of AA T. 30. Show that A and AT have the same singular values.
4) 1 0 0 1
8. A =
31. Let Q be an orthogonal matrix such that QA makes sense. Show tha t A and QA have the same singular
- 2 2
~]
1 10. A =
0 I
0 1 - 3 0 0 1
1" Exercises 11- 20, find MI SVD of tire mdlcated matrix. II. A in Exercise 3
13. A = [
0 -2]
- 3
0
12. A = [ - 0' 14. A =
[ II
00] - 11 ]
15. A in Exercise 5
16. A in Exercise 6
17. A in Exercise 7
18. A in Exercise 8
19. A in Exercise 9
20.A=[:
1
1
(b) Show that, for a positive defin ite, symmetric matrix A, Theorem 7. 14 gives the spectral decomposition of A. 28. If A is an invertible ma trix with SVOA = u~ V T, show that I is invertible and that A- 1 :: vt - 1 U T is an SV D of A - I.
:]
hI Exercises 2 /- 24, find tire outer proc/llet form of tile SVD
values. 32. Prove Theorem 7. 15(d). 33. What is the image of the uni t ci rcle in 1R2 under the action orthe matrix in Exercise 3?
34. What is the image of the unit circle in R2 under the action of the mat rix in Exercise 7? 35. What is the image of the unit sphere in n3 under the action of the matrix in Exercise 9? 36. What is the image of the unit sphere in R J under the action of the matrix in Exercise \O?
In Exercises 37-40, compute (a) I A ~2 and (b) cond2(A ) for the indicated matrix. 37. A in Exercise 3
39. A
=
38. A in Exercise 8
009] 1 [ 1
10
100
1
for tIre matrix ill tI,e given exercises. 21. Exercises 3 and 11 22. Exercise 14 23. Exercises 7 and 17 24. Exercises 9 and 19 25. Show that the matrices U and Vin the SVO are not uniquely determined. ( Hint: Find an example in which it wou ld be possible to make di fferent choices in the construction of these mat rices.) 26. Let A be a symmetric mat rix. Show that the singular values of A arc: (a) the absolute values of the eigenvalues of A. (b) the eigenvalues of A if A is positive defini te. 27. (a) Show that, for a positive definite, symmetric matrix A, Theorem 7.13 gives the orthogonal diagonalization of A, as guaranteed by the Spectral Theorem.
~]
In Exercises 41-44, compute the pscudoinverseA + of A in tire gil'en exercise. 4 t. Exercise 3 42. Exercise 8 43. Exercise 9
44. Exercise \0
In Exercises 45-48. find A + and use it to compute the minimallength least squares solutIOn to Ax = b.
45. A= [ ~
!].b=[!]
46.A =[~ ~ ~J. b=[~] 47.A=
1
1
1
1
1
3
1 I ,b= 2
Sect ion i.S
48. A =
1 0
1
o
1
o
I
0
1
56. Let Q be an orthogonal matr ix such that QA makes sense. Show that ( QA ) + ::: A I QT.
1
, b ""
619
Applications
1
57. Prove that if A is a positive defi nite matrix with SVD A = UIVT,then U = v.
I
49. (a) Set up and solve the normal equations for the system of eq uations in Example 7.40. (b) Find a paramet ric expressio n for the length of a solution vector in part (a). (c) Find the solution vector of minimal length and verify that it is the o ne prod uced by the method of Example 7.40. I Hint: Recall how to fi nd the coordinates of the vertex of a parabola.]
50. Verify that when A has linearly independent colum ns, the definitio ns of pseudoinverse in this section and in Section 7.3 arc the same.
58. Prove that fo r a diago nal matrix, the 1-,2-, and oo-norms are the same.
59. Prove that for any square matrix A, IIAII~ ::5 ~ A ~ II AI"". I /·/illl: ~ Al i is the square of the largest singular value of A and hence is eq ual to the largest eigenvalue of ATA. Now use Exercise 34 in Section 7.2.}
".w
51. Verify that the pseudoinverse (as defi ned in this section) satisfies the Penrose conditions for A (Theorem 7.1 2 in Section 7.3). 52. Show that A + is the only matrix that satisfies the Penrose conditions for A. To do this, assume that A' is a matrix satisfyi ng the Penrose condi tions: (a) AA' A"" A, (b) A' AA' = A', and (c) AA' and A' A are sym metric. Prove that A' = A +. [Him: Use the Penrose conditions for A + and A' to show that A + = A' AA + a nd A ' = A' AA ... . 1t is helpfu l to note that condition (c) can be written as AA' = (A') TAT and A' A = ArCA') 1', with similar versions for A+.J
53. Show that (A +) + := A. [Hil!t: Show that A satisfies the Penrose conditions for A.... By Exercise 52, A must therefo re be ( A +)+ .J 54. Show that (A +) T ::: (A T )+ . I Hilll: Show that (A+ ) T satisfies the Penrose cond itions for AT. By Exercise 52, { A+ )T must therefore be (AT)+.J 55. Show that if A is a symmetnc, idempotent matrix, then A+ = A.
Every complex lI umber call be written in polar form as z = re"', where r = Izi is a 1I00!1!egative rea/number ,mdO is ilsargllment, with Ittl = I. (See Appelldix C.) Th lls, z has been decom posed illiO a stretching factor r alld a rotalioll factor ctJ. There is all "I/,,/ogolls decomposilioll A = RQ for square matrices, ((II/ed Ihe polar decomposition. 60. Show that every square matrix A can be factored as A ::: RQ, where R is sym metric, positive sem idefinite and Q is orthogonal. [Hilll: Show that the SV D can be rewritten to give A ~ U~ VT ~ U ~ ( U"U) V T ~ (U~ U')( UVT) Then show that R righ t properties.]
= U!
U T and Q
= UVT have the
Filld a polar decomposilioll of the matrices ill Exercises 61-64. 61. A in Exercise 3
63. A=[ -3I
-~l
62. A in Exercise 14
-2
2 2
-3 6
4
- I
6
4
64. A ""
Applications

Approximation of Functions
In many applications, it is necessary to approximate a given function by a "nicer" function. For example, we might want to approximate f(x) = eˣ by a linear function g(x) = c + dx on some interval [a, b]. In this case, we have a continuous function f, and we want to approximate it as closely as possible on the interval [a, b] by a function g in the subspace 𝒫₁.
The general problem can be phrased as follows: Given a continuous function f on an interval [a, b] and a subspace W of 𝒞[a, b], find the function "closest" to f in W.
The problem is analogous to the least squares fitting of data points, except now we have infinitely many data points, namely, the points on the graph of the function f. What should "approximate" mean in this context? Once again, the Best Approximation Theorem holds the answer. The given function f lives in the vector space 𝒞[a, b] of continuous functions on the interval [a, b]. This is an inner product space, with inner product
⟨f, g⟩ = ∫ₐᵇ f(x)g(x) dx
If W is a finite-dimensional subspace of 𝒞[a, b], then the best approximation to f in W is given by the projection of f onto W, by Theorem 7.8. Furthermore, if {u₁, …, uₖ} is an orthogonal basis for W, then
proj_W(f) = (⟨u₁, f⟩/⟨u₁, u₁⟩)u₁ + ⋯ + (⟨uₖ, f⟩/⟨uₖ, uₖ⟩)uₖ

Example 7.41
Find the best linear approximation to f(x) = eˣ on the interval [−1, 1].

Solution  Linear functions are polynomials of degree 1, so we use the subspace W = 𝒫₁[−1, 1] of 𝒞[−1, 1] with the inner product
⟨f, g⟩ = ∫₋₁¹ f(x)g(x) dx
A basis for 𝒫₁[−1, 1] is given by {1, x}. Since
⟨1, x⟩ = ∫₋₁¹ x dx = 0
this is an orthogonal basis, so the best approximation to f in W is
g(x) = proj_W(eˣ) = (⟨1, eˣ⟩/⟨1, 1⟩)·1 + (⟨x, eˣ⟩/⟨x, x⟩)·x = (e − e⁻¹)/2 + 3e⁻¹x ≈ 1.18 + 1.10x
where we have used integration by parts to evaluate ∫₋₁¹ xeˣ dx. (Check these calculations.) See Figure 7.24.

Figure 7.24  The graphs of f(x) = eˣ and g(x) ≈ 1.18 + 1.10x on [−1, 1]

The error in approximating f by g is the one specified by the Best Approximation Theorem: the distance ‖f − g‖ between f and g relative to the inner product on 𝒞[−1, 1]. This error is just
‖f − g‖ = √(∫₋₁¹ (f(x) − g(x))² dx)
and is often called the root mean square error. With the aid of a CAS, we find that the root mean square error in this case is
‖eˣ − (½(e − e⁻¹) + 3e⁻¹x)‖ = √(∫₋₁¹ (eˣ − ½(e − e⁻¹) − 3e⁻¹x)² dx) ≈ 0.23

The root mean square error can be thought of as analogous to the area between the graphs of f and g on the specified interval. Recall that the area between the graphs of f and g on the interval [a, b] is given by
∫ₐᵇ |f(x) − g(x)| dx
(See Figure 7.25.) Although this area is a sensible measure of the "error" between f and g, the absolute value sign makes it hard to work with. The root mean square error is easier to use and therefore preferable. The square root is necessary to "compensate" for the squaring and to keep the unit of measurement the same as it would be for the area between the curves. For comparison purposes, the area between the graphs of f and g in Example 7.41 is
∫₋₁¹ |eˣ − ½(e − e⁻¹) − 3e⁻¹x| dx ≈ 0.28

Figure 7.25  The area between the graphs of f and g on [a, b]
Example 7.42
Find the best quadratic approximation to f(x) = eˣ on the interval [−1, 1].

Solution  A quadratic function is a polynomial of the form g(x) = a + bx + cx² in W = 𝒫₂[−1, 1]. This time, the standard basis {1, x, x²} is not orthogonal. However, we can construct an orthogonal basis using the Gram-Schmidt Process, as we did in Example 7.8. The result is the set of Legendre polynomials
{1, x, x² − 1/3}
Using this set as our basis, we compute the best approximation to f in W as g(x) = proj_W(eˣ). The linear terms in this calculation are exactly as in Example 7.41, so we only require the additional calculations
⟨x² − 1/3, eˣ⟩ = ∫₋₁¹ (x² − 1/3)eˣ dx = ∫₋₁¹ x²eˣ dx − (1/3)∫₋₁¹ eˣ dx = (2/3)e − (14/3)e⁻¹
and
⟨x² − 1/3, x² − 1/3⟩ = ∫₋₁¹ (x⁴ − (2/3)x² + 1/9) dx = 8/45
Then the best quadratic approximation to f(x) = eˣ on the interval [−1, 1] is
g(x) = (e − e⁻¹)/2 + 3e⁻¹x + (15(e − 7e⁻¹)/4)(x² − 1/3)
     = 3(11e⁻¹ − e)/4 + 3e⁻¹x + (15(e − 7e⁻¹)/4)x²
     ≈ 1.00 + 1.10x + 0.54x²
(See Figure 7.26.)

Figure 7.26  The graphs of f(x) = eˣ and g(x) ≈ 1.00 + 1.10x + 0.54x² on [−1, 1]

Notice how much better the quadratic approximation in Example 7.42 is than the linear approximation in Example 7.41. It turns out that, in the quadratic case, the root mean square error is
‖f − g‖ = √(∫₋₁¹ (f(x) − g(x))² dx) ≈ 0.04
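The projection coefficients in Examples 7.41 and 7.42 can also be obtained by numerical integration. The sketch below (not from the text; SciPy assumed; the helper name inner is mine) reproduces the quadratic approximation and its root mean square error.

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g):
    """Inner product <f, g> = integral of f(x) g(x) over [-1, 1]."""
    return quad(lambda x: f(x) * g(x), -1, 1)[0]

f = np.exp
basis = [lambda x: 1.0, lambda x: x, lambda x: x**2 - 1/3]   # Legendre basis for P2[-1, 1]

coeffs = [inner(u, f) / inner(u, u) for u in basis]
g = lambda x: sum(c * u(x) for c, u in zip(coeffs, basis))

rms = np.sqrt(quad(lambda x: (f(x) - g(x))**2, -1, 1)[0])
print(coeffs)   # approx [1.175, 1.104, 0.537]; expanding gives 1.00 + 1.10x + 0.54x^2
print(rms)      # approximately 0.04
```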
In general, the higher the degree of the approximating polynomial, the smaller the error and the better the approximation.
In many applications, functions are approximated by combinations of sine and cosine functions. This method is particularly useful if the function being approximated displays periodic or almost periodic behavior (such as that of a sound wave, an electrical impulse, or the motion of a vibrating system). A function of the form
p(x) = a₀ + a₁cos x + a₂cos 2x + ⋯ + aₙcos nx + b₁sin x + b₂sin 2x + ⋯ + bₙsin nx    (1)
is called a trigonometric polynomial; if aₙ and bₙ are not both zero, then p(x) is said to have order n. For example,
p(x) = 3 − cos x + sin 2x + 4 sin 3x
is a trigonometric polynomial of order 3.
Let's restrict our attention to the vector space 𝒞[−π, π] with the inner product
⟨f, g⟩ = ∫₋π^π f(x)g(x) dx
The trigonometric polynomials of the form in equation (1) are linear combinations of the set
B = {1, cos x, …, cos nx, sin x, …, sin nx}
The best approximation to a function f in 𝒞[−π, π] by a trigonometric polynomial of order n will therefore be proj_W(f), where W = span(B). It turns out that B is an orthogonal set and, hence, a basis for W. Verification of this fact involves showing that any two distinct functions in B are orthogonal with respect to the given inner product. Example 7.43 presents some of the necessary calculations; you are asked to provide the remaining ones in Exercises 17–19.

Example 7.43
Show that sin jx is orthogonal to cos kx in 𝒞[−π, π] for j, k ≥ 1.

Solution  Using a trigonometric identity, we compute as follows: If j ≠ k, then
∫₋π^π sin jx cos kx dx = ½∫₋π^π [sin(j + k)x + sin(j − k)x] dx
                       = −½[cos(j + k)x/(j + k) + cos(j − k)x/(j − k)] evaluated from −π to π
                       = 0
since the cosine function is periodic with period 2π. If j = k, then
∫₋π^π sin kx cos kx dx = [sin²kx/(2k)] evaluated from −π to π = 0
since sin kπ = 0 for any integer k.

In order to find the orthogonal projection of a function f in 𝒞[−π, π] onto the subspace W spanned by the orthogonal basis B, we need to know the squares of the norms of the basis vectors. For example, using a half-angle formula, we have
⟨sin kx, sin kx⟩ = ∫₋π^π sin²kx dx = ½∫₋π^π (1 − cos 2kx) dx = ½[x − sin 2kx/(2k)] evaluated from −π to π = π
In Exercise 20, you are asked to show that ⟨cos kx, cos kx⟩ = π and ⟨1, 1⟩ = 2π. We now have
proj_W(f) = a₀ + a₁cos x + ⋯ + aₙcos nx + b₁sin x + ⋯ + bₙsin nx    (2)
where
a₀ = ⟨1, f⟩/⟨1, 1⟩ = (1/2π)∫₋π^π f(x) dx
aₖ = ⟨cos kx, f⟩/⟨cos kx, cos kx⟩ = (1/π)∫₋π^π f(x) cos kx dx    (3)
bₖ = ⟨sin kx, f⟩/⟨sin kx, sin kx⟩ = (1/π)∫₋π^π f(x) sin kx dx
for k ≥ 1. The approximation to f given by equations (2) and (3) is called the nth-order Fourier approximation to f on [−π, π]. The coefficients a₀, a₁, …, aₙ, b₁, …, bₙ are called the Fourier coefficients of f.
Example 7.44
Find the fourth-order Fourier approximation to f(x) = x on [−π, π].

Solution  Using formulas (3), we obtain
a₀ = (1/2π)∫₋π^π x dx = (1/2π)[x²/2] evaluated from −π to π = 0
and, for k ≥ 1, integration by parts yields
aₖ = (1/π)∫₋π^π x cos kx dx = (1/π)[(x/k) sin kx + (1/k²) cos kx] evaluated from −π to π = 0
and
bₖ = (1/π)∫₋π^π x sin kx dx = (1/π)[−(x/k) cos kx + (1/k²) sin kx] evaluated from −π to π
   = (1/π)(−(π/k) cos kπ − (π/k) cos(−kπ))
   = −(2/k) cos kπ
   = −2/k if k is even, 2/k if k is odd
   = 2(−1)^(k+1)/k
It follows that the fourth-order Fourier approximation to f(x) = x on [−π, π] is
2(sin x − ½ sin 2x + ⅓ sin 3x − ¼ sin 4x)
Figure 7.27 shows the first four Fourier approximations to f(x) = x on [−π, π].

Figure 7.27  The Fourier approximations of orders n = 1, 2, 3, and 4 to f(x) = x on [−π, π]

Jean-Baptiste Joseph Fourier (1768–1830) was a French mathematician and physicist who gained prominence through his investigation into the theory of heat. In his landmark solution of the so-called heat equation, he introduced techniques related to what are now known as Fourier series, a tool widely used throughout science and engineering. He was a favorite of Napoleon, accompanying him on his Egyptian campaign in 1798. Later Napoleon appointed Fourier Prefect of Isère, where he oversaw many important engineering projects. In 1808, Fourier was made a baron. He is commemorated by a plaque on the Eiffel Tower.

You can clearly see the approximations in Figure 7.27 improving, a fact that can be confirmed by computing the root mean square error in each case. As the order of the Fourier approximation increases, it can be shown that this error approaches zero. The trigonometric polynomial then becomes an infinite series, and we write
f(x) = a₀ + Σ (from k = 1 to ∞) (aₖ cos kx + bₖ sin kx)
This is called the Fourier series of f on [−π, π].
Error-Correcting Codes
Consider the triple repetition code C = {c₀, c₁}, where
c₀ = [0; 0; 0]   and   c₁ = [1; 1; 1]
If one or two errors occur in the transmission of either of these code vectors, the resulting vector cannot be another vector in C, so the error will be detected. For example, suppose that errors occur in the first two entries of c₀, so that the vector
c′ = [1; 1; 0]
is received. However, the receiver has no way of correcting the error, since c′ would also result if a single error occurred during the transmission of c₁. But any single error can be corrected, since the resulting vector can have arisen in only one way. For example, if
c″ = [0; 1; 0]
is received and we know that at most one error has occurred, then the original vector must have been c₀, since c″ cannot arise from c₁ via a single error.
We will now generalize these ideas. As you will see, the notion of Hamming distance plays a crucial role in the definition.
Definition  Let C be a (binary) code. The minimum distance of C is the smallest Hamming distance between any two distinct vectors in C. That is,
d(C) = min{d_H(x, y) : x ≠ y in C}

Clearly, the minimum distance of the triple repetition code C above is 3.
Example 7.45
Find the minimum distance of the code C = {c₀, c₁, c₂, c₃}, where
c₀ = [0; 0; 0; 0],  c₁ = [1; 0; 1; 0],  c₂ = [0; 1; 0; 1],  c₃ = [1; 1; 1; 1]

Solution  We need to compute the Hamming distance between each pair of distinct vectors. (There are four vectors, so there are (4 choose 2) = 6 pairs.) We find that
d_H(c₀, c₁) = 2    d_H(c₀, c₂) = 2    d_H(c₀, c₃) = 4
d_H(c₁, c₂) = 4    d_H(c₁, c₃) = 2    d_H(c₂, c₃) = 2
Therefore, d(C) = 2.
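Minimum distance is simple to compute by brute force for small codes. The sketch below (not from the text; plain Python; the helper names are mine) checks the code of Example 7.45 as listed above.

```python
from itertools import combinations

def hamming(x, y):
    """Number of positions in which two binary vectors differ."""
    return sum(a != b for a, b in zip(x, y))

def min_distance(code):
    """Smallest Hamming distance between any two distinct code vectors."""
    return min(hamming(x, y) for x, y in combinations(code, 2))

C = [(0, 0, 0, 0), (1, 0, 1, 0), (0, 1, 0, 1), (1, 1, 1, 1)]
print(min_distance(C))   # 2, so d(C) = 2 as in Example 7.45
```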
It is possible to picture the notions of minimum distance and error correction geometrically. In the case of the triple repetition code C, we have a subset (actually, a subspace) of ℤ₂³. We can represent the vectors in ℤ₂³ as the vertices of a unit cube, as shown in Figure 7.28(a). The Hamming distance between any two vectors x and y is just the number of edges in a shortest path from x to y. The code C corresponds to two of these vertices, c₀ and c₁. The fact that d(C) = 3 corresponds to the fact that c₀ and c₁ are three units apart, as shown in Figure 7.28(b). If a received vector x is within one unit of either of these code vectors and we know that at most one error has occurred, we can correctly decode x as the nearest code vector. In Figure 7.28(b), x would be decoded as c₀, and y would be decoded as c₁. This agrees with the fact that C can correct single but not double errors.

Figure 7.28  (a) The vectors of ℤ₂³ as the vertices of a unit cube  (b) The code vectors c₀ and c₁ are three units apart

In Exercise 41, you are asked to draw a picture that illustrates the situation in Example 7.45. In general, we cannot draw pictures of ℤ₂ⁿ, but a Euclidean analogy is helpful. If a code can correct up to k errors, think of the code vectors as the centers of spheres of radius k. The code vectors themselves are separated by at least d units. Then, if a received vector x is inside one of these spheres, it will be decoded as the vector corresponding to the center of that sphere. In Figure 7.29, x will be decoded as c₀. This process is known as nearest neighbor decoding.
Figure 7.29 suggests that if a code is able to correct k errors, then the "spheres" centered at the code vectors cannot touch or overlap; that is, we must have d > 2k. This turns out to be correct, as we now make precise. A code is said to detect k errors if, for each code vector c and each vector c′ obtained by changing up to k of the entries of c, c′ is not a code vector. A code is said to correct k errors if, for each code vector c and each vector c′ obtained by changing up to k of the entries of c, nearest neighbor decoding of c′ produces c.

Theorem 7.20
Let C be a (binary) code with minimum distance d.
a. C detects k errors if and only if d ≥ k + 1.
b. C corrects k errors if and only if d ≥ 2k + 1.

Proof  (a) Assume that d ≥ k + 1 and let c be a vector in C. If up to k errors are introduced into c, then the resulting vector c′ has the property that d_H(c, c′) ≤ k. But then c′ cannot be a code vector, since if it were, we would have
k + 1 ≤ d ≤ d_H(c, c′) ≤ k
which is impossible. Conversely, if C can detect up to k errors, then the minimum distance between any two code vectors must be greater than k. (Why?) It follows that d ≥ k + 1.
(b) Assume that d ≥ 2k + 1 and let c be a vector in C. As in the proof of property (a), let c′ be a vector such that d_H(c, c′) ≤ k. Let b be another vector in C. Then d_H(c, b) ≥ d ≥ 2k + 1, so, by the Triangle Inequality,
d_H(c′, b) ≥ d_H(c, b) − d_H(c, c′) ≥ 2k + 1 − d_H(c, c′) ≥ 2k + 1 − k = k + 1 > k ≥ d_H(c′, c)
So c′ is closer to c than to b, and nearest neighbor decoding correctly decodes c′ as c.
Conversely, assume that C can correct up to k errors. We will show that if d < 2k + 1 (i.e., d ≤ 2k), then we obtain a contradiction. To do this, we will find a code vector c and a vector c′ such that d_H(c, c′) ≤ k yet nearest neighbor decoding decodes c′ as the wrong code vector b ≠ c. Let b and c be any code vectors in C such that d_H(b, c) = d. There is no harm in assuming that these d differences occur in the first d entries of b and c. (Otherwise, we can just permute the entries of all the vectors until this is true.) Assuming that the code vectors in C have length n, we construct a vector c′ in ℤ₂ⁿ as follows: Make c′ agree with b in the first k entries, agree with c in the next d − k entries (why is d ≥ k?), and agree with both b and c in the last n − d entries. In other words, the entries of c′ satisfy
c′ᵢ = bᵢ ≠ cᵢ  if i = 1, …, k
c′ᵢ = cᵢ ≠ bᵢ  if i = k + 1, …, d
c′ᵢ = bᵢ = cᵢ  if i = d + 1, …, n
Now d_H(c, c′) = k and d_H(c′, b) = d − k ≤ k. (Why?) Therefore, d_H(c′, b) ≤ d_H(c′, c), so either we have equality, and it is impossible to decide whether c′ should be decoded as b or c, or the inequality is strict, and c′ will be incorrectly decoded as b. In either case, we have shown that C cannot correct k errors, which contradicts our hypothesis. We conclude that d ≥ 2k + 1.
Some books call such a code an (n, 2ᵏ, d) code or, more generally, an (n, M, d) code, where n is the length of the vectors, M is the number of code vectors, and d is the minimum distance.

In the case of a linear code, we have the following notation: If an (n, k) linear code has minimum distance d, we refer to it as an (n, k, d) code. For example, the code in Example 7.45 is a (4, 2, 2) code. Linear codes have the advantage that their minimum distance can be easily determined. In Exercise 42, you are asked to show that the minimum distance of a linear code is the same as the minimum weight of a nonzero code vector. It is also possible to determine d(C) by examining a parity check matrix for C.

Theorem 7.21
Let C be an (n, k) linear code with parity check matrix P. Then the minimum distance of C is the smallest integer d for which P has d linearly dependent columns.

Proof  Assume that d(C) = d. The parity check matrix P is an (n − k)×n matrix with the property that, for any vector x in ℤ₂ⁿ, Px = 0 if and only if x is in C. As you will be asked to show in Exercise 42, C contains a vector c of weight d. Then Pc is a linear combination of exactly d columns of P. But, since Pc = 0, this implies that some set of d columns of P is linearly dependent. On the other hand, suppose some set of d − 1 columns of P is linearly dependent, say
p_{i₁} + p_{i₂} + ⋯ + p_{i_{d−1}} = 0
Let x be the vector in ℤ₂ⁿ with 1s in positions i₁, …, i_{d−1} and zeros elsewhere. Then x is a vector of weight d − 1 such that Px = 0. Hence, x is a code vector of weight d − 1 < d = d(C). This is impossible, by Exercise 42, so we deduce that no set of d − 1 columns of P is linearly dependent.
Conversely, assume that any d − 1 columns of P are linearly independent but some set of d columns of P is linearly dependent. Since Px is a linear combination of those columns of P corresponding to the positions of the 1s in x, Px ≠ 0 for any nonzero vector x of weight d − 1 or less. Therefore, there are no nonzero code vectors of weight less than d. But some set of d columns of P is linearly dependent, so there exists a vector x of weight d such that Px = 0. Hence, this x is a code vector of weight d. By Exercise 42 again, we deduce that d(C) = d.
Example 1.46
Show that the Hamming codes all have minimum distance 3.
Sohlll,.
Recall that the ( n, k) Hamming code has an ( n - k)X npan ty check matrix Pwhose columns arc all of the nonzero vectors of Z2- t , arranged so that the Identity matrix occupies the last n - k columns. For example, the (7, 4) Hamming code has panty check mat rix p=
I
1
0
1
1
0
I
I
o
I
I
I
100 010 o 0 I
We can always fi nd three linearly dependent columns: Just take the columns corresponding to (' I' ('1' and (' I + ('2' ( In the matrix above, these would be columns 5, 6, and I, respectively.) But any two columns are linearly indepe ndent. By Theorem 7.2 1, Ihis means the Hamming codes have minimu m distance 3.
130
Chaple:r 7
Distance and ApproKimahon
Example 7.46, combined with Theorem 7.20, teUs us that the Hamming codes are all single error-
Example 7.47
Show that the Reed-Muller code R~ has minimum distance r
-
I
fo r n :> I .
Selutlol By Theorem 6.35, every vector in R" except 0 and 1 has weight r - L. Since I has weight r , this means that the minimum weight of a nonuro code \'ector in Rn is r - 1• Hence, d(R") :::: r -1, by Exercise 42.
Mariner 9 used the Reed-Muller code Rs, whose mlllllTIum distance IS 24 :::: 16. By Theorem 1, this code can correct kerrors, where 2k + 1 !SO 16. The largest value of k for which this inequality is true is k = 7. Thus, 14 not only contains exactly the right number of code vectors for transmitting 64 shades of gray but also is capable of correcting up to 7 errors, making it quite reliable. This expla ins why the images transmitted by Mariner 9 wcrc so sharp!
~ .ppro.I •• tlOA Of f •• CtlOAS
Exercises 1-4, find tile best linear llpproximlltioll to f the imervtll [ - I, I]. I. [(x) : x' 2. f(x} :::: Xl + 2x 3. [(x ) : x' 4. [ (x) : ,in{m2)
III
011
J.
10
8. Apply the Gram-Schmidt Process to the basis {I, X. x 2} 10 construct an onhogonal basis for ffp 2 [0, 11. III Exercises 9- 12, filld tile best li"ear approximation to fO Il tile intervt/I !0, 11.
[(x) - x' II. [(x) : .. 9.
10. f(x) :
Vx
15. [(x)
x'
: ,'
111 Exercises 21 alld 22, fiud the tltird-order FOlmer approxrmn tioll to f 0 11 [ - 17, 17 ] . 22. [(x) : x' 2 1. [(x) : Ixi III Exercises 23-26, find ti,e FOlmer coefficients a", a l • and bk offOil [ - 'IT, 17).
o 23. [(x) : { I
if - l7 :S X
24. [(x) : { - I 25. fix) =
17 -
if - 'IT ~ x < 0 ifo :s x !SO 'IT 26. [(x ) :
X
Ixl
12. [(x) : ,in("x/2)
In Exercises 13-16, find Ihe besl quadratic approximntioll to fOil IIIe illterval [0, I].
\3. [(x) :
17 , 17]
19. Show thai sin jx is orlhogonallo si n kx in eel -17 . 17 ] forj* k.j.k ~ I . 20. Showthat lW = 217 and Icos .bjl = l7in
In Exercises 5 and 6, find tile best quadratic approximlltioll to f Oil tire imenml [ - I, I
5. [ (x) - Ixl 6. [(x) : co~ "x/2) 7. Apply Ihe Gram-Schmidt Process to the basis ! I, xl conSlruct an orthogonal basis (or ~ 110, 1).
18. Show that COS )Xis orthogonal to cos kx in
14. [(x) :
Recall that afullction f is n il evell f unction if f( - x) :::: f(x } for nil x;f 1$ called all odd flmctiolJ if f( - x) = -f(x) for all x-
Vx
16. [(x) : ~n( "x/2 )
17. Show that I IS orthogonal 10 cos kx and sin kx in cel- l7, l7l fork?1.
27. (a) Prove that
I-
f (x}dx ::::
a iff is an odd fun ction.
- <
(b) Prove that the Fourier coefficien ts a (. are aUzero if f is odd.
831
Chapler Rev l('W
28. (a) PrOllet hat f"/(X)dX= 2 f'f(X)dX if fisuneven fu nction.
III Exercises 35 flml 36, compute rile ",immum distance of tile code C fllld decode rile veelors u, v, alld w usillg neareSI IIeiglibor decoding.
(b ) Prove thut the Fourier coefficients btare al l zero if f is even.
1
I
o
I
, 0 , I
35. C ""
29. C ""
30. C=
0 0
,
I
I ,
I
0
0
0
I
I
0
0
I
0
I
I
,
I
0
,
,
I
0
0
0
33. The code with parity check m atrix P = [I I A], where I
I
0
I
I
0
I
I
0
I
I
I
I
0 0
I
I
I
0
0
0 0
0
0
I
I
I
I
1
0, w -
0
o o
o
v
~
I
I
I
0
I
0 0
I
I
0
I
I
0
I
I
I
I
I
I
0
I
,
v
~
0 ,
w
~
0
0 0 0
I I I
I
I
0
°
I
I
1
I
1 0 0
I
37./I = 8, k= l,d ""8
38. // = 8,k = 2, d =8
39. n = 8,« = 5,d- 5 "0. II "" 8./0: = ".d = 4 41. Draw a picture (si milar to Figure 7.28) to illust rate" Example 7.45. 42. Let C be a line"ar code. Show that the minimu m distance of Cis equal to the minimum weight of a
nonzero code \'ec IOt.
34. The code With parity check matrix
P=
I
o o
In Exercises 37-40, eOllSlruet II /ille"r (II, /0:, d ) code or prove 11111/110 such code CXi$IS.
Rep ~
32. The II-times repetitIOn code
matrix 0 0
I
I
o
o o
0 I 0 0 ,u = 0
G~
31. The" eyen panty code E,.
A ~
36. C has generator I 0 0 I 0 0
1
= 0 ,
I
F",d Ille millimum (lislflllce of rlre codes m Exercises 29-34.
0
,u
I
o o
I
lnor-CanICtl.1 Codes 0
o o o 1
43, Show that d - 1 < " - k for any linear (II, /0:, d )-code. 44. Let Cbe a linear (11, k, d l code with pa rity check matri x P. Prove that (/ = /I - k + I if and onl y if every " - k columns of P are linearly independent.
8e
,
.......
~
.
.....
~ ."
.
.ew Dellnl1lons and Best Approxi mation Theorem, 579 Cauchy-Schwarz Inequality. 548 condition number of a matrix. 570 distance. 538 Eucl idean norm (2-no rm ), 562 Frobenius norm, 565 Funda mental Theorem of InvertIble Matrices, 614
Hamming distance. 563 Hamming no rm, 563 ill-conditioned matrix, 570 lOner product, 540 inner product space, 540 least squares error, 58 1 least squares solution. 583,613 Least Squa res Theorem, 584
matnx norm, 565 max norm(oo- no rm, Uniform norm ), 562 norm, 561 normed linear space, 56 1 operator norm . 568 o rt hogonal basis, 546 orthogonal projection, 547.592
632
Cha pler 7
Distance and App roximatio n
orthogonal (sct of) vectors, 546 orthonormal basis, 546 orthonormal set of vectors, 546 pseudoinversc of a matrix, 594,611
singular val ue deco mposi tion (SVD), 602 singular values, 599 singular vectors, 602 sum norm ( I-norm ), 56 1
Triangle Inequality, 54 9 unit sphere. 544 unit vector, 544 well ·co nditioned matrix, 570
Review Quesllons 6, d(x, x') ,f (p(x), q(x» ~ J; p(x)q(x) dx
I . Mark each of the fol lowing statements true or false:
(a) lf u =
[~J and" =
[ : :]. then(U, V) =
II I V.
+
zdefines an inner prcxluct on 1R2.
71'112V
(b) Ifu = 2 111v2 -
[ ~:J and Y = 2 112 VI
[ ::J, then (U, V) = 4u l vl
-
lI u + v ll~5 . (e) The sum norm , mllx norm, :md Eucl idean norm
(g) (h) (i )
(j)
on IR" are all equal to the absolute value function whenn = 1. If a matrix A is well -conditioned. then cond(A) is small. If cond(A) is small. then the mat rix A is wellconditioned. Every linear system has a unique least squares solution . If A is a matrix with o nhonormal columns, then the standard matrix of an orthogonal projection onto the column space of A is P = AAT. If A is a symmetric matrix. then the singular values of A are the $.;1me as the eigenvalues of A.
In Qllestio/ls 2-4, determille wlletller tile dejim lioll gives an inner product. " (p(x), q(x» 3. {A, B} 4, fj, g)
=
~
~
P(O)q(I)
([: J.[ ~]}if(U' V} = UT Av'WhereA = [:
tr(ATB) fo r A, 8 in Mn
, m", [(x))' m.x g(x)) fo, f,g; n '(l10, II \l S .r"'1
+ alb] + a2b2
~
8. II, x, x'/ ;f(p(x). q(x»
"oS.rS'
:J
fp(X)q(X) dx
" III Questioll s 9 (m el 10, determine whelher the definitioll .
'I
gIves 1I0rlll. 9. Uv U = vTv {orv inRn
JO. Ilp(x)1I ~ 1p(0) I + 1p( 1) - P(O)I foq,x) ;n!Y, I O. I O. II
II. Show that the matrix A =
0. 11
0.1 O. I I
0.111
O. I I I
OJIII
•
"
ill-conditioned. J2. Prove that if Q is an orthogonal fi X /I matrix. then its Frobenius norm is II QIIf = v;,. 13. Find the line of best fi t through the points ( I. 2), ( 2,3 ), (3, 5), and (4, 7)_ 14. Find the least squares sQlution of 1
2
~ -~ [~J = o
5
I
-
~ 3
+ p( I )q(O) fo, p(x). q(x)
In QuestIOns 5 and 6, compule Ill e indicated qlumtity usirlg the specified inner product. 5. III + x + x 211 if(tlo + '1 1X + a2x 2, bo + blx + b1xl) = tlobo
7.
+ 4112 V2 defines an inner prcxluct
on R2. (e) (A. B) = tr(A) + tr( B) defines an inner prcxluct o n M n. (d ) If u and v arc vectors in an inner prcxluct space with ll u ll = 4.11 ,,11 = Vs,and (u, v) = 2. then
(f)
111 Ques /iom 7 and 8, COII$tT[lCt an orthogollal set of vectors by applyillg the Gram-Schmidt Process 10 tile given set of vectors usillg III I' specIfied illller product.
I
15. Find the orthogonal projectio n of x =
column space of A =
I
I
0
I
2 o nto the 3
I 0 J6. If u and v are orthonormal vectors, show thai P = U U T + vv T is the standard matrix of an orthogonal projection onto span (u, v). [ Him: Show that P = A(A TA) -I A T fo r some matrix A. I
Chaple r Review
In Questions 17 and IS. find (aJ the singular values. (bJ a singular value decomposition, and (c) the pseudoinverse of the matrix A. 17. A =
1
1
0
0
1
- I
18. A
[:
1
1
-I] - I
633
19. If P and Q are orthogonal matrices for which PAQ is defined, p rove that PAQ has the same singular values as A.
20. If A is a square mat rix fo r which A2 = 0, prove that (A +)2=O.
,
- -.
-
'.
,
ix A Mathematical Notation and Methods of Proof
Anyone who mrderstal1ds Illgebruic 110tariorl reads al 0 glana: ill 1111 t(jllotio'l 1'tSilits reached arithmetiCIIllyonly
In this book, an effort has been made to use "math ematical English" as much as pos· sible, keeping mathematical notation to a minimum. However, mathemal ical nOla· tion is a convenient shorthand that can greatly simplify the amount of writing we have 10 do. Moreover, it is commonly used in every branch of mathematics, so the ability to read and write mathematical notal ion is an essential ingred ient of mathe· matical understanding. Finall y, there arc some theo rems whose proofs become "obvious" if the right notation is used. Proving theorems in mathematics is as much an art as a science. For the beginner, it is oft en hard 10 know what approach to use in proving a theorem ; there are many approaches. anyone of which might turn out to be the best. To become profiCient at proofs, It is important to stud y as many examples as possible and to gct plenty of
wtrh grrel r Inoour elnd pel ins
practice.
please, $Ir. I want some more. -Oli....er Charles Dickens, Oli.-er TWIst
-Augustin Cournot Researrhn illlo tIle Mat/rema/lml Pri,rciples of Ihe Theory olWealth Trnnslatcd by Nathaniel T. Bacon Macmillan, 1891, p. 4
Th is appendix summarizes basic mathematical notation applied to sets. Notation specific to linear algebra is introduced as required. The appendix also discuSS6 sum· mation notation, a useful shorthand for dealing wilh sums. Finally, some approaches to proofs are illustrated with generic examples.
Set "olallon A $et is a collection of objects, called the elemelft$ (or members) of the sel. Examples of sets incl ude the set of all words in this text, the set of all books in your college library, the SC I of positive integers, and the set of al l 'lectors in the plane whose equa· tionis2x+ 3y- %=0. It is often possible to list the clements of a set, in which case it is conven tional to enclose the list within braces. For example, we have
{1,2,3),
{a, I, x, ,), {2,4,6" .. , 100),
211" ~ 4... 511"} 4 '5'2'7"'·'6
To {
Note that ellipses (... J denote clemen ts omitted when a pattern IS present. (Wha t is Ihe pattern III the lasllwo examples?> Infinite sets arc often expressed using ellipses. For example, the set of positive in tegers is usually denoted by N or Z "', so
1\1
= Z> = (I ,U,.,.)
The set of all integers is denolt.'
Z={ ... ,-2,-I,Q, ],2, ... }
.3.
Appendix A Mathematical NOlation and Methods of Proof
&35
Two sets a re considered to tx equal if Ih('y contain exactly the same elements. The order in which elemen ts arc listed does not malter, and repetitions arc not counted. Thus.
{I.2.3)
= {2.1.3) = {I.U. I)
The symbol E means "is an el('mcnl of" or "is in," and the symbol negation- that is, "is not an element of" or "is not in." For example,
5 E Z·
but 0
e denotes the
e Z·
It is o flcn more convenient to describe a set in terms o f a rule satisfied by all o f its
elements. In such cases, set blliftler 1IOtat;0'1 is appropriate. The fo rmat is {x: x satisfies I>}
where P represe nts a property or a collection of properties that the element x must satisfy. The colon is prono unct
{II:
fI
E Z. n
>
O}
is read as "the set of all tI such that 1/ is an integer and 11 is greater than zero." Th is is just another way of describing the positive integers Z' . (We could also write Z+ = lIlE Z :n>OI·) The empty set is the set with no clements. It is denolt
Example 1.1
Describe in words the followi ng sets: (a) A=ltI:n=2k.kEZ} ( c) C = Ix E ll: 4X l - 4x - 3
Solution
*"
(b) B = !m/lI:m.nEZ.II 0) (d) D= !xEZ:4r- 4x -3= 01
= 0)
(a) A is the set of numbers is the sel of all even IOtegers.
fI
thnt are integer nm ltiples of 2. Therefore. A
(b) B is the sel of all expressions of the form 111/ II. where m and II are integers and /I is no nzero. Th is is the set of ratio/w / numuers. usually denoted by Q . (Note that this way of describi ng 0 produces many repetitions; however, o ur conventiOn , as noted above. is that we include only one occurrence of each element. Thus, this expression preciseJy describes the set of all rational numbers. ) (c) C is the set of all real solutions of the equation 4x1
-
4x - 3 = O. By factoring or
using the q uadratic fo rmula, we find that the roots of this equation are -~ and (Verify this. ) Therefo re.
i.
c={ -t.H (d ) From the solution to (c) we see that there a re 110 solutions to 4x 1 - 4x - 3 = 0 in R that are integers. Therefore, D is the empty set, which we can express by writing D =0. John Venn ( 1104- 192J ) was an
English mathema tician who studied at Cambridge University and late r lectured Ih('f('. He workfit primarily In mathematical logic and is best known (or ;m"e:nting Venn diagrams.
If every clement of a sel A is also an element of a sel B. then A is called a subset of B, denoted A ~ B. We can represen t this sit ualion schematIcally using a Veml diagram, as shown in Figu re A.1. (The rectangle represen ts the universal sel, ,I set large eno ugh to COntain all of the other sets in question- lli this case, A and B. )
636
Appendix A Mathemaltcal Notation and Methods of Proor
/
'\
, A
B
" ,: : : fllUri • . 1 A C B
Example •. 2
(a ) \l. 2.3}C\l, 2,3.4.5} (b) Z ' C Z C R (c) Let A be the set of all positille integers whose laSI two d igits are 24 and let B be the set of all positiw integers that arc evenly diV Isible by 4. Then if 11 is in A, it is of the form 11 =
lOOk
+
24
for some integer Ie. (For example, 36524 = 100· 365 II ::
lOOk
+
24 = 4(25k
so 11/ 4 = 25k + 6, which is an integer. 1·lence, Therefore, A ~ IJ
/I
+ 24.)
But then
+ 6)
is evenly divisible by 4, so it is in B.
We can show that two sets A and B are equal by showing that each is a subset of the other. This strategy is particularl y useful If the sets are defined abstractl y o r ifit is not easy to list and compare their elements.
EKample '.3
Let A be the set of all positive integers whose last two digits form a number tha t is evenly dillisible by 4. In the case of a o ne-digit number, we take its tens digit to be O. Let B be the set of all positive intege rs that are evenly divisible by 4. Show tha t A = B.
SGlullGa As in Example A.2(c), it is easy to see that A C B. If 1/ is in A, then we ca n split off the number
III
formed by its last two digits by wfllms
n=lOok + m for some integer k. But, since m is divisible by 4, we have Therefore. 11
= lOO k
III ""
4r fo r some integer r.
+ m = lOOk + 4r = 4(25k + r)
is also evenly d ivisible by 4. Hence, A C B. '10 show that B C A, let 11 be in B. Thai is, 11 is evenly divisible by 4. Let's say that /I = 4s, where s is an integer, If In IS the numbe r formed by the last two digits o f 11, then, as above, ,, = lOOk + In for some integer k. Bul now
so
/I
m "" n -
lOOk = 4s - lOOk = 4(5 - 25k)
which implies th.1I m is evenly d ivisible by 4, since is in A, and we have shown that B C A. Since A ~ JJ and B C A, we must have A = B.
5 -
25k is an IIlteger. Therefore, 11
Appendix A
Mathematical Notation and Methods of Proof
631
The intersection of sets A and B is denoted by A n B and consists of the elements that A and B have in common. That is, A n B = {x : xE A
and
xE B}
Figure A.2 shows a Venn diagram of this case. The union of A and B is denoted by AU B and consists of the elements that are in either A or B (or both). That is, AUB = {x:xEA
or
x EB}
See Figure A.3.
,\
Example A.4
'\ An "
8
A
figure • .2
figure A.3
AnB
AUB
B
letA = I '~ : n E ;l ' . 1 :5 /1 :5 4} and let B = {n E lr : 11 :5 Wand n is odd !. Find AnBandAUB.
Solullan We see that /
,
A = {11.21 . 3z, 42} = {I, 4. 9, 16}
and
B = {1,3.5. 7, 9}
Thcrcfore,An B = {I , 9} andA U B = {l,3, 4, 5,7,9,161·
8
figure 1.4 Disjoint sets
If A n B = 0, then A and B are called disjoint sets. (See Figure A.4.) For example, the sct of even integers and the set of odd integers are disjoint.
Summation NotalioD Sum mation notation is a convenient shorthand
[0
use to write out a sum such as
1 + 2+3+" ' + 100
~
is the capital Gret'k letter sigma< corresponding to S (for "sum"). Summat ion notation was In troduced by Fourier in 1820 an was quickl y adopted by the mathematical community.
where we want to leave out all but a few terms. As in sct notation, ellipses (... ) con vey that we have established a pattern and have simply left OU I some in termediate terms. In the above example, readers are expected to recognize that we are summ ing all of the positive integers from 1 to 100. However, ell ipses can be ambiguous. For example. what should one make of the sum 1 + 2 + "'+64?
Is this the sum of all positive integers from 1 to 64 or just the powers of two, I + 2 + 4 + 8 + 16 + 32 + 64? It is ofte n clearer (and shorter) to use summation notation (or sigma flotation ).
&31
Appendix A MathematICal NotatlOn and Methods of Proof
We can abbreviate a sum of lhe for m
(I ,+ a2 +"'+ a"
(I )
"
,-,La,
( 2)
which tells us to sum the terms a. over all integers k ranging from I to n. An alternative version of this expression is
The subscript k is called the index of slIItunatiotl. It is a "dummy variable" in the sense that it does not appear in the actual sum in expression ( I ). Therefore, we can use any leBer we like as the index of summation (as long as it doesn't already appear somewhere else in the expressions we arc slimming). Thus. expression (2) can also be written as
•
L a . .-, The index of summation need not start at I. The su m
(11
+ (14 + . . +
ay."
becomes
" a, L ,., although we am arrange for the index 10 begm at I by rewriting Ihe expression as
",-,a L
h l·
The key to using summation notation cffKtivt:ly is being able to rccogfllze patterns.
(IImpl.
'.5
Write the following sums usmg summation notation. (a) I
+ 2 + 4 + .'. + 64
Solution
(b) I
+ 3 + 5 + . + 99
(c) 3
+ 8 + 15 + ". + 99
(3) We recognize Ihis expression 3S a sum of powers of 2: 1 + 2+ 4 ++ M -r+ 2'+~++r
Therefore, the index of summation appears as the exponen t, and we have
• 2t, L ,-,
(b) This expression is the sum of all the odd Integers from I to 99. Every odd integer IS of the form 2k + I . SO the sum IS
1 +3+5+ '+99 - (2·0+ 1)+(2'1+1)+ (2 · 2+ 1)+' + (2'49 + 1)
.
- ~ (2k+ I )
,-,
(c) The pattern here IS less clear, but a lillie reflection re\'eals that each term is I less than
a perfect square:
3+8+ 15+ " ' + 99 - (2' - I) + (3' - I) + (4' - I ) + ... + (10' - I)
- ,-, L" (k' -
I)
Ap~nd,,(
Example A.6
&38
A Mathematical Notalion and Methods of Proof
Rewrite each of the sums In Example A.5 so that the index of summation starts at I. SOIIU.. (a) If we usc the change of variable, = k + I, then, as k goes from 0 to 6, i goes from I 10 7. Since k "" I - I, we obtain
. .,
(b) Using the same substItution as in part (a). we get ~
L (2 k+ 1) = L (2(i - 1) + 1)
(c) The substitution j - k - 2 will work (try it), but il is easier jU5110 add a lerm spondmg \0 k = I, since 11 - J = O. Therefore,
corn~·
Multi ple summat ions arise when there is m orc than one index o f summation, as there is wit h a matrix. The notation (3)
means to sum the terms a,} as i and j each range independently from I to in expression (3) is equivalent to either
II.
T he sum
where we sum fi rst over j and then over, (we always work from the inside ou t), or
where the order of summation
Example 1.1
, Write out L:
jl
IS
reversed.
usi ng both possible o rders of sum mat io n.
~I" I
SIIIUIi
= (i I
- (I
+
12
+ J') + (21 + 22 + 23 ) + (3 1 + 32 + 33 )
+ 1 + I ) + (2 + 4 + 8) + (3 + 9 + 27)
= 56
It.
Appendix A Mathemallcal Notation and Methods of Proof
and flow to Sob'l: It is the title of a book by the IlUthematician George P61ya (1887-1985). Sina' iu publication in 1945, How to 5011'(' It has ~Id over ~ million copies and has been translated into 17 languages. P61ya was born in Hungary. but b«ause of the political situation in Europe. he ITIOVed to the United States in 1940. He subsequently
taught at Brown and Stanford Unil'ersities, where he did mathematical research and developed a wdl--deserved rq>lIution u an outstanding teacher. The Polya Prize is awarded annually by the Society for Industrial and Applied Mathematics for major contributions to artaS of rnathenutio cI
to those on which POIy-a worked. The Mathematical Aswciation of Amerie;1annually award~ PO!)"il I.ectureships to mathematicians demon:.tl3.ting the high-quality expo<,ition for which P6Jya was known.
"
2:(1' + 2; + 3' )
,. , "" ( I I
+ 21 + J l) + ( 12 + 22 + J2) + ( 13 + 2' + 3')
- (J + 2 + 3) + (J + 4 + 9) + ( I + 8 + 27) - 56
Rlmarl! Of course, the value of the sum in Example A.7 is the same no matter which order of su mmatIOn we choose, because the s um is fim le. It is also possIble to consider mfil/ite Slims (known as il/fillite series in calculus), but such sums do not always have a value and great care must be taken when rearrangi ng o r manipulating their terms. Fo r example, suppose we let ~
S - 2:2'
.·0
Then
5= 1+ 2+ 4 +8+'"
+ 2(1 + 2 + 4 + ... ) 1 + 25
"" 1
fro m which it follows that 5 "" - I. Th is is clearly nonsense, since 5 is a sum of ' 10'1r1eglltive terms! (Where is the erro r?)
Methods 01 Prool T he notion of proof is at the very heart of mathematICs. It is one thing to know W/!(lt lS true; 11 is quite another to know why it is true and to be able to demonstrate its truth by means of a logically connected sequence of statements. The intention here is not to try to teach yo u how to do proofs; you will become bener at do ing proofs by studylIlg examples and by practictng-something you should do often as you work th rough this text. The intentio n of this brief section is simply to provide a fewelementary examples o f some types of p roofs. The proofs of theorems in the text W Ill provide further ill ustratio ns o f "how to solve It," Roughly speaking, mathematical proofs fall mto two categories: direct proofs :md indirec, proofs. Ma ny theorems have the structu re " If P, then Q ," where P (the hypothesIS. or premise) and Q (the conelllsio,,) are statemen ts that arc eit her true or false. We denote such an implication by P ::::> Q. A direct proof proceeds by estabhshIIlg a chain of implications
leading directly from P to Q.
Example A.8
Prove that any two consecutive perfect squares differ by an odd number. This instruction can be rephrased as " Prove that if (I and b arc consecutive perfect squares, then a - b is odd." Hence. it has the form 1' ::::> Q. with Pbcing "a and b are consecutive perfect squares" and Q being " a - b is odd."
Appendix A Mathematical Notation and Methods of Proof
6C1
Assume that a and b are consecuti ve perfect squares. [We may as well assume that a > h, since a - b = -( b - a), and if o ne of these is odd, they both are.]
SolutIon
Then
a=(n+ 1)2 and fo r some integer
fl.
b=,l
But now
a - b = (n + 1)2 - ,r
r?- + 2n +
=
= 2n
+
I
r?-
I
son- bisodd .
There are two types of indirect p roofs that can be used to establish a conditional statement of the form P=*, Q. A proo/ by contrad iction assumes that the hypothesis Pis tTue, j ust as in a d irttt proof, but then supposes that the conclusion Q is false.. The strategy then is to show that this is not possible (i.e., to Tule o u! the possibility that the co nclusion IS fa lse) by finding a contradiction to the truth of P. 1t then follows that Q must be Irue.
Example A.9
~
Let /I be a positive in teger. Prove thaI if'; is even , so is n. (Take a few minutes to try to fi nd a dircct p roof of this asse rtion; it will help you to appreciate the indirect proof tha t follows.)
Solution
Assume that n is a positive integer su ch that '; is even. Now suppose that n is not even. Then n is odd , so
1! = 2k+! for some integer k. But if so, we have If' = (2k
+
1)2 = 4~
+ 4k +
!
so ,; is odd, since it is I more than the even n um ber 4J.:2 + 4k. This co ntrad icts o ur hypothesis that ,1 is even . We conclude that o u r supposition that II was riot even m ust have been false; in other wo rds, /I must be even .
Closely related to the method of proof by contradict ion is proof by cO'ltraposilive. The negative o f a statemen t Pis the statem ent " it is not the case tha t P," abbreV Iated symbolically as -'P and pronounced "not P." For example, if P is "/I is even," then -,p is "it is not the case that /I is evenn_in o the r words, " /I is odd." The contraposilive of the statement P=* Q is the statement -. Q::::} "' P. A conditional statement P:::;. Q and its con trapositlve "'Q => ..,p are logically equivalent in the sense that they are either both true o r both falsc. ( For example, if P=* Q is a theorem, then so is -.Q:::;. "' P. To see this, notc that if the hypothesis..,Q is true, then Q is false. The conclusion ..,p can no t be false, for if it were, then P would be true and our known theorem p :::;. Q wo uld imply the tr uth of Q, givlllg us a contradiction. It
642
Appendix A
MathematICal Notation and Methods of Proof
foHows that -, p is true and we have proved ---.Q::::} ,P.) Here is a contrapositive proof of the assertion in Example A.9.
Example A.l0
let
11
be a posit ive integer. Prove that if
Solullon
/1
2
is even, so is
11.
The contrapositive of the given statement is " If n is not even, then ,,2 is not even"
or
" If
II
is odd, so is
,,2 "
To prove Ihis comrapositive, assume that II is odd. Then II = 2k + I for some integer k. As before, this means that n 2 = (2k + 1)2 = 4k 2 + 4k + I is odd, whICh completes the proof of the cont rapositive. Since the con trapositive is true, so is the original statement.
Although we do not require a new method of proof to handle it, we will briefly consider how to prove an "if and only if" theorem. A statement o f the form" P if and only if Q" signals a double implicatio/1, which we denote by P..:::;. Q. To prove such a statement, we must prove P::::} Q and Q::::} P. To do so, we can use the techniques desen bed above, where approp riate. It is important to no tice that the "If" part of P<::> Q IS "p If Q," which IS Q =? P; the "only if" part of P ~ Q is "p o nly if Q," meaning P::::} Q. The implication P=> Q is sometimes read as "p is sufficient fo r Q" or"Q is necessary fo r P"; Q => P is read" Q is sufficient fo r P" o r " P is necessary for Q." Taken together, they are P ~ Q, or " P is necessary and sufficient for Q" and vice versa.
Example A.ll
A pawn is placed o n a chessboard and IS allowed to move one square at a time, either horizontally or vertically. A pawn's tour o f a chessboard is a path taken by a pawn, moving as described, that visits each squa re exactly once, startm g and ending on the same square. Prove tha t there is a pawn's tour of an /IX n chessboard if and only if /I IS even.
1" "t
Solullon
"t .f-
[ ..;::: 1 ("if") Assume that
is even. It is easy to see that the strategy illustrated in Figure A.S for a 6 X6 chessboard wIll always give a pawn's tour. [ ::::} 1 Conly if") Suppose that there is a pawn's tour of an fiX 11 chessboard. We will give a proof by contradiction that fI must be even. To this end, let's assume that n is odd. At each move, the pawn moves to a square of a di fferent color. The total number of moves in its tour is Ii, which is also an odd num ber, according to the proof in Example A.IO. Therefore, the pawn must end up on a square of the opposite color from that of the square on which it started. (Why?) This is impossible, since the pawn ends where it started, so we have a contradiction. It follows that II cannot be odd; hence, 11 is even and the proof is complete. /1
figure • .5 Some theorems assert that several statements are equivalent. This means that each is true if and only if all of the others are true. Showll1g that n statemen ts are equivalent requires (
,,)
t(
n'
)'
n' - n
"if and only if" proofs. I n practice, however, 2 2. /I - 2 . 2 it is often easier to establish a "ring" of /1 implications that llllks all o f the statements. T he proof of the Fundamental Theorem of InvertIble Matrices provides an excellent example o f this approach. =
Appen xB Mathematical Induction The ability to spot patterns is one of the keys to success in mat hematical problem solving. Consider the following pattern: I = I
1+ 3 = 4
Grear jleas /Jail/! little jleas upon tlieir backs to bite 'em, Amllittle jlens halle lesser jlCf/S, ami so ad mfi nitum . -Augustus Dc Morgan A Budget of Pa ra doxcs Longma ns, Green, and Company. 1872, p 377
1+ 3 + 5= 9 L + 3 + 5 + 7 = 16 1 + 3 + 5+7 +9=25 The su ms are all perfect squares: 12,22, 32, 42,52. It seems reasonable to conjecture that th is pattern will con tin ue 10 hold; that is, the sum of consecu tive odd numbers, sta rt ing at 1, wil! always be a perfect square. Let's try to be m o re p recise. If the sum is It-, then the last odd num ber in the su m is 21! - I. (Check this in the fi ve cases above.) In symbols, ou r conject ure becomes 1 + 3 +5+"'+(2n - I)=,r fora ll ,,~ 1
(I )
NotICe th.11 form ula ( I ) is really an mjitlite collection of statements, o ne for each value of " ~ I. Although our conjectu re seem s reasonable, we cannot assu me that the pattern contmues-we need to p rove It. T his is where mathematical induction • co mes m.
First Principle of Mathematical Induction Let S( II) be a statement abo ut th e positive integer 11. If I. S{ I) is true and 2. fo r all k 2:: 1, the tru th o f S( k) implies the truth of S( k
+
1)
then 5(11) is true for aU" > I.
Verifyi ng that 5( 1) is tr ue is called the basis su p. The assumptIOn that 5( k) is true fo r so me k 2:: 1 IS called the i"duction hypotllesis. Using the mduction hypothesis to prove that S(k + I) is th en true is called the indllction step. Mathe matical induction has been refe rred to as the dom ino principle beca use it is analogous to showing that a li ne o f do m inoes will fall down if ( I) the fi rst d o m ino can be knocked down (the basis step) and (2) knocking down any d om ino (the induction hypotheSIS) will knock over the next domino (the induction step). See Figu re B.l. We now use the principle of mathematical ind uction to prove fo rmula ( I).
643
Appendix B
Mathematicallndllction
_____ :_-_-3• If the li o;t domino fall s. and .
each domino th at fall s knocks down the nelll one.
then all the domlrlOCs can be made 10 fall by pushlllg over the first one.
Fig., ••. 1
Example 8.1
Use mathematical inductio n to prove that
I + 3 + 5 + . + (2" - I) ,.. for all
II ~
,, 2
I.
Solutio.
For" = I , the sum on the left-hand side is just I, while the right -hand side is 12. Si nce 1 = 12, th IS completes the basis step. Now assume that the formula is true for some integer k <:!: I. That is, assume that
1 + 3 + 5 + ... + (2k - I ) = k 2 ( This is the induction hypothesis.) The induction step consists of proving that the formula is true when" = k + J. We Stt that when /I = k + I, the left -hand side of formula ( I ) is
1 +3+5+
'+(2(k+I) - I) - J+3 +5+" '+(2k + l ) ~ 1+3+5 + "'+(2k - 1)+(2k+ 1) .. ... ~
- (k+ I) '
+ 2k +
I ~ By the induc tion hn)(llhesis
which is the righ t-hand side of formula (1) when /I = k + I. Thiscompletes the induction step, and we conclude that formula ( 1) is true for all It ~ 1, by the principle of math ematical induction.
645
AppendIx B Mathemat lcallnd uction
T he next example gives a proof of a useful formula for the sum of the fi rst n positive integers. The for m ula appears several times in the text; for exam ple, see the solutio n to Exercise 45 in Section 2.4.
Example B.2
Prove that
1 +2+"+11= fo r all
II
lI(n+ I) 2
2: I.
Solution The formula is true for /I = I, since I ~ C'I(,,-Icc+-lco) 2 Assume that the fo rmula is t rue fo r
k; that is,
II =
I + 2 + ... + k We need to show that the formula is true when ~
I + 2 + .. . + (k + I)
=
II =
k(k + I) = '-c--'-'2
k + I; that is, we must prove that
(k+ I )[(k + 1)+ IJ
"'--'--"=;:-'"---'--"2
But we see that
\ + 2 + . .. + (k
+ \)
= =
(i + 2 + ... + k) + (k + I)
k(k + I) 2
+ (k +
I)
By the induction
hypoth~sis
k(k+ I ) + 2(k+l) 2
k 2 + 3k + 2 2
(k+I)(k + 2) 2
(k+ I) [(k+ I) + IJ 2 which is what we needed to show. This completes the induct ion step, and we conclude that the form ula is true fo r all n 2: \, by the principle o f mathematical induction .
In a similar vein, we can p rove that the sum of the squares of the firs t in tegers satisfies the fo rm ula 12 + 22
for all
tl
+ Y + .. . +
> I. (Verify this for yourself.)
1/(1/
+ 1)(2" + I)
n2 = - - ' - - - " - - - - ' 6
1/
positive
1t6
Appendix B
Mathematical lnJuclion
Example B.3
The basis step need not be for
/I
Prove that n! > 2n for all integers
II
SOlltll1 slIlce
= 1, as the next two examples illustrate.
> 4.
The basis step here is when
/I
= 4. The lIlequality is clearly true in this case,
41 = 24
>
16 = 2~
Assume that k! > 2k for some integer Ie 2: 4. Then
(k + I)!
~
(k+ I)k!
>(k +1)2k 2:
5·2 l
>
2_21 = 2h l
By the induction hypothesis Since k > 4
which veri fies the inequality fo r II "" k + I and com pletes the induction step. We conclude thai n! > 2" fo r all integers /I 2: 4, by the principle of mathema tical induction.
If a is a nonzero real number and n 2: 0 is an integer, we can give a recursive definition of the power a" that is compatible With mathematical inductio n. We define aO = I and, for II 2: 0,
""Inn (ThiS for m aVOids the ellipses used in the version a" "" ~.) We can now usc mathematicallllduction to verify a fami liar property of exponents.
Example B.4
Prove that (I'''a'' "" am ..-" for all integers m, n
2:
O.
Solution At first glance, it 15 not clear how to proceed, since there are two variables, m and n. But we simply need to keep one of them fixed and perform our ind uct ion using the other. So, let /II 2: 0 be a fixed lllteger. When II = 0, we have
n'nJl = a"' . I = a'" =
a "' ~o
using the defin itio n tf = I . Hence, the basis step is true. Now assume that the formula holds when II = k. where k 2: O. Then a/nat = am+1'. For 11 = k + I, using our recu rsive definition and the fact that adclitlo n and multipl ication are associative, we see that a"'ff~ 1
"" a"'(aka)
By Jcfinilion
- (d"J )a By the induction hypo thesis By definition
=
a"'..-(lH )
Appendix B Mathematicallnduction
641
Therefore, the formula is true for II = k + I , and the ind uction step is complete. We conclude that a"'a~ = a'" + n for all intege rs III, /I 2: 0, by the principle of math ematical induction.
In Examples B. I th rough 804 , the use of the inductIOn hypothesis dUring the induction step is rela tively straightforward. However, this is not always the case. An alternative verSIon of the pri nciple of mathematical induction is often more useful.
Second Principle of Mathematical Induction Let S( n) be a statement about the positive integer II. If I. SO ) is true and
2. the t ruth of S( I), S(2), ... , S(k) implies the truth o[ Sfk + I) then
S( n)
is true for all
II
~
I.
The only difference between the two principles of mathematical ind uction is in the induction hypothesis: The firs t version assumes that S( k) is t rue, whereas the second version assumes that all of S( I), S(2), . . . , S( k) are true. This makes the second principle seem weaker than the first,since we need to assume more in order to prove S( k + 1) (although, paradoxically, the second principle is so metimes called strong induction). In fact, however, the two principles are logIcally equ ivalent: Each one implies the other. (Can you see why?) The next example presents an instance in wh ich the second principle of mathematical induction is easier to use than the first. Recall that a prime number is a positive integer whose only positive integer factors are 1 and itself.
Example 1.5
Prove that every positive integer n uct of primes.
2:
2 either is prime or can be factored into a prod-
50lullon The result is clearly true when n = 2,since 2 is pri me. Now assume that for all integers /I between 2 and k, /I either is prime or can be factored into a prod uct of primes. Let II = k + I. If k + 1 is prime, we are done. Otherwise, it must factor into a product of two smaller integers- say,
k + l = ab Since 2 :S a, b :S k (why?), the ind uction hypothesis applies to a and b. Therefore,
a=Pl '''p, and b=ql"'qj where the p's and q's are all prime. Then
ab = PI .. . P,ql .. . q, gives a factorization of ab into primes, completing the induct ion step. We conclude Ihal the result is true for all integers n i:: 2, by the seeond principle of mathematical ind uction.
641
Appendix B Mathelllatical iliduction
Do you see why the fi rst principle of mathemat ical mduction would have been difficult to use here? We conclude with a highly nontrivial example that involves a combination of induction and btlckward induction. The result is the Arithmetic Mean-Geometric Mean Inequality, d iscussed III Chapter 7 in Exploration: Geometric Inequalities and Optimization Problems. The clever proof in Example B.6 is due to Cauchy.
Elample •. 6
Let X I "
"
, x" be nonnegative real numbers. Prove thaI
fo r all in tegers
/I
> 2.
SOliliol For /I = 2, the inequality becomes Vii :s (x + y)/ 2. You arc asked to verify th is in Problems I and 2 of the Exploration mentioned above. If Sen) IS the stated inequality, we Will prove that S( k) implies S(2k). Assume that S( k) is true; that is,
...v
X I +X1 +
k
X I X2"' X, :5
for all no nnegat ive real numbers X l"
.• ,
" ' + x,
xl- Let ... ,
x, -
YU- I + Yu 2
T hen
{}'YI '" Yll
= :5
.. Ya =
1(YI: Y2) .. (Y2H 2+ Yn)
lSyS(2)
~XI''' XI'
_ --,X "'"+ .,.._·'_'_+--,x", s X",_+ k
I~y
S(k)
Y' + ") + .. . + (Y'H2+ Y a) 2 (
- -'--------'.-...,.--'' - - --'k
YI + ··+Y2J. 2k which verifies S(2k) . Thus, the Arithmetic Mean- Geometric Mean 1nequaJity is true for" = 2, 4, 8, ...- the powers o f 2. In order to complete the proof, we need to " rill in the gaps." We will use backwa rd induction to prove that S(k) implies S(k - 1). Assuming S(k) is true, lei XI
xt =
+ X2 + . . + Xt_ 1 k- l
Appendix B Mathematical Ind uction
Then
xI +x.,+···+
X I + X 2+"' + X
&41
l_' )
( k_ I s----------~~~~~--~
+ X2 + ... + Xl-I ) k- I
k
= :: kx,,'_+:...::kx::;,7.+__..".,..+:...::kx ""'='-_' k(k - l) XI
+ X2 + ... + xk_ 1 k- l
Equivalently,
Taking the ( k - I )th root of both s id~ yields S( k - I ). The two inductions, taken together, show thaI the ArithmetIc Mean-Geometflc Mean Inequal ity is true for aU /I ;2! 2.
••••,11
Al tho ugh ma thematical inductio n is a powerful and indispensable tool, it cannot work miracles. That IS, it can no t prove that a pattern or form ula holds If JI does not. ConSider the diagrams In Figure 8.2, which show the maximum number of regions R( rJ) into which a circle can be subdivided by /I straight lines.
R(O) - 1 = 20
R( I ) = 2 "" 21
R(2) - 4 = 22
fllllr.I.2 Based o n the evidence in Figure B.2, we might conjecture that R{ u) = r for n 2: 0 and try to prove this conjecture using mathema tical induction. \\'e would not succeed, since this formula is not correct! If we had considered one mOre case, we would have discovered that R(3 ) = 7 '" 8 :: 21, thereby demolishmg ou r conjecture. In fac t, the correct formula turns o ut to be
R()
"
-
n 1+ 11+2
2
which Ciln be verified by induction. (Can you do it?) For other examples in wh ich a pattern appears to be true, on1y to disappear when enough casn are considered, ~ Rich3rd K. Cuy's delightful article "The Strong Law of Small Numbers" in the Americtm Mm/rematiC(lI M OIl/hly, Vol. 95 ( 1988). pp. 697-712.
Appe
xC
Complex Numbers
{Thel extension of the numlJer concept to inc/utle the irmtioual, atld we wIll at Ofrce add, the imag;rrlrry, IS the greatest forward step wh ICh pure rrrarhema tics has e\l£r raul!.
A complex number is a number of the form a + bi, where a and b are real numbers and i is a symbol with the property that jl = - I. The real number a is conSIdered to be a special type of complex number, since a = a + Oi. If z = a + hi is a complex number, then the "al part of z, denoted by Re z, is (/, and the imagi,mry part of z, denoted by 1m z. is h. Two complex numbers a + b, and c + di are equal If their real pa rts arc equal and their imaginary parts are equal -that is, if a = c and b = d. A complex number 11+ bican be identified wit h the point (a, h) and ploned in the plane (called the complex plane, o r the Argand plane), as shown in Figure Co l . In the COlllplex plane, the horizontal axis is called the real ax;sand the vertical axis is called the .. .
,maglllary axIS.
1m
- Hermann Hankel Tlroone der Complexetl lalr/rmsysteme
6'
Lelpzig, 1867, p.60
-4
There is nothing " imagi n a ry"iL..~ about complex numbeu-they art just as ~ real" as the real numbers. The- term Imagimuyarose from study of polynomial equations such as X l + I = 0, whose solutions are /lot "real" (i.e., real numbers). It is worlh. remembering lh. at one time negati~ numbC"rs we lh.ought of as "imaginary" t'OQ~
4,
+ 3i
•
3
2i
---+-6
C-I---f-
-4
- 3 - 2;
•
---+-+-1- ....1- - +-+-_ R,
-2
•
+ 2;
2
4
6
- 2,
•
I - 4;
- 6i
flgur. C.l The complC"X plane Jean-RobC"rt Argand ( 1768-1822) was a French accountant and amateur mathematician. HIS geometric interpretation of complex numbers appeared in 1806 in a book lh.at he- publishC"d privaldy. He was nO!, however, the first to gi~ such an interpretation. The NorwegianDanish surveyor Caspar Wessel (1745-1818) gave the same version of the- complex plane in 1787, but his paper was not noticC"d by themathematical community until after his death.
'"
Oper811011 01 COIIPII. NUllbers The sum of the complex numbers a + bi and c + di is defined as
(a + bi) + (c + di)
=
(a + c) + ( b + d) i
Notice that, wi th the identification of a + b, with ( a, b), c + di with ( c, d ), and (a + c) + (b + d) i with (0 + c, b + d ), addition of complex numbers is the same as vector addition. The product of a + bi and c + di is
(a + bi)(c + (Ii) = a(c + di) + bi(c + d,) = ac + adi + bri + blb 1
651
Appendix C Complex Numbers
Since ;2 = - ] , Ihis expression sim pli fi es to (a r - btl)
+ (ad + be);. Thus, we have
(t/ + bi)(e + di) = (ae - bd) + {ad + be)i O bserve Ihat, as a special ( 3SC, a(e + d i) = ae + adl, so the negative of e + dl IS -(e + di ) = (- 1)«( + di) = - e - di. This fact allows us to compute the differetlce ofa + biandc+ d;as
(a + hi) - (e + di) = (a + hi) + (- I)(e + di ) ~ (a + ( - e» + (b + (-
Example C.l
Find the sum, d ifference, and product of 3 - 4; and - I
SOlillol
+ 2i.
The sum is
(3 - 4,) + (- I + 2,)
~
(3 - I) + (-4 + 2) ,
~
2 - 2;
T he difference is
(3 - ,;) - (- I + 2;)
~
(3 - (- I)) + (-, - 2) ;
~,
- 6.
The prod uct is 1m
(3 - 4i)( - ] + 2/) ;:' = (I+hi
hi
, .L ,, T
=
- 3 + 6; + 4i - 8il
=
- 3+ IO/ -8{- I )= S+!Oi
The conjugate o[ 2 = Il + hIts the complex number
-t--,." - +o<
,
z = a-bi
,
T
- hi
=
I
a - hi
flgur. C.2 Compl('x conjugates
Example C.2
(z is prono unced "z bar.") Figure C.2 gives the geometric inlerprel
- 1 + 2i Express 3 + 4; in the for m a + bi.
Salvllall
We multiply the n umerato r and deno minator by 3 Example C. I , we obtain - I 3
+ 2/ + 41
+ 21 • 3 - 4; 3 - 4; 3 + 4;
- I
+ 41
= 3 - 4i. Using
5+ 1015+10; 1 2 . = = - +- 1 2 3~ + 4 2S S 5
Below is a summary of some of the properties of conjugates. The p roofs fo llow from the defi nition of conjugate; yo u should verify them fo r yo urself.
55!
Appendix C
Compln; Numbers
I. z =z 2.z+w =z+ w 3. zw = XI\! 4. Ifz ,* O, thcn (w/z) = w/z. 5. z is real if a nd only if X = z.
1m
-•
b.
a+bi
The absolute value (or modulus) 1:1 of a complex number z :: a + bl is its distance from the on gin . As Figure C.3 shows, Pythagoras' Theorem gives
Observe that
b
zz.: (a
-I"-------I-"-- Ro "
Figwr. C.3
+ hi)(a - hI) =
(, 2 -
(l b,
+ hoi _ b2i2 =
(1 2
+ b2
Hence, This gives us an alternative way of describing the division process fo r the quotient of two complex numbers. If wand z +- are two comp lex numbers, then
°
- w,- w,. - = - '"
w
w ,
z
z z
- - -
zz
Izl 2
Below is a summa ry of some of the propert ies of absolute val ue. You should try 10 prove these using the defimtlon of absolute value and other properties of complex nu mbers.
,.
It! = 0 if and only if z = 2. 1'1~ 1'1 3. Izwl ~ 1 '11"1 f.
4. Ifz-:#O , then
-,I
5· 1'+wI< l~
+ Iwi
O.
I
~
--
1,1'
polar form
1m (I
+ bi
As yOll have seen, the complex number X = a + bi can be represen ted geometrically by the point (a, b). This point can also be expressed in terms of pola r coordinates ( r, 9), where r ;:: 0, as shown in Figure CA. We have
a = r cos O and
b
so
z= a
+ hi =
h = nin O
r cos O + ( rsin O)i
Th us. any complex number can be written in the polar fo rm -Ir---~a~---L---+R '
nlur' C.4
z = r(cos8
:t isin 0)
Ap~ndix
C
Complex Numbers
151
where r = 1:1 = V a2 + II and tan 0 = bf a. T he angle 0 is called an argument of z and is denoted by arg z. Observe that arg z is not unique: Adding or subtracting any integer multiple of 27T gives another argument of z. However. there is only one argument 0 that satisfies - Tr < I) S 7r
This IS called the principal argumellt of z and
Example C.3
IS
denoted by Arg z.
Write the following complex numbers in polar form using their principal arguments: (a) z = I +;
SIIIIII.
(b)w = I - V3,
(a) \lIe compute 1
tanO == - = 1
1m
1
~
Therefore, Arg z = (J
1
= -4
(= 4S0) . and we have
R, ~
3
- I
2
,, ,, ,, ,,
as shown in Figure C.S.
1
(b) We ha\'e
V,
-v, =-V'3
tanO =
1
, ,
, -2
•
I - \ 3;
Smce w lies in the fou rth quadrant. we must have Arg Therefore,
z
=
0
1T
= - "'3 (= - 60°).
figure C.5
See Figure C.5.
The polar fo rm of complex numbers can be used to give geometric interpretations of multiplication and division. Let ZI
=
'I(COSO I
+ isin OI ) and
Z:!
= ' lcos O.
+ isin( 2)
Multiplying, we obtain ZIZ:!
= "'icos9.
+ isin( 1)(cos92 + i sin(2 )
= '.'l[(COS 0, cos 01
-
sin 0, sin ( 1 ) + i(sin OJ cos OJ + cos 01 sin ( 2)]
Using the Ingonomelric identities cos(O, + O! ) "" cos 8, cosO! - sin 8. sin O2 sin{O, + 1)1) = sll18 1 cos 01 +
COS
O. sin 81
654
Appendix C
Complex Numbers
we obtain
1m
(I)
+0,
which is the polar form of a complex number wi th absol ule val ue 8 1 + O!, T his shows that
f l f2
and argument
Formula ( I) says Ihat to multiply two complex /lumbers, we /IIultiply their (ibsoilite vt/Iues and add their arguments. See Figure C.6. Similarly, using the sub trac tion Identities for sine and cosine, we can show that
figure C.6
(Verify this.) Therefore. 1m •
•
a nd we see that to dIvide two complex numbers. we divide their absolute vailies and sub-
tract their urgulllellts.
c--,-l-----+ R~
As a special case of the last result, we obtain a for mula fo r the recip rocal of a complex num ber in polar for m. Seuing ZI = I (and the refo re 0 1 = 0) and ~ = Z (and th erefore Ol "" 0). we obttlin the followmg:
If Z
"'"
r(cos 8 I
+ j sin 0) is nonzero, then I
-z "" -(cOft..8 - isinO ) ,
Flgur, C.l See Figure C.7.
Enmple C.4
Fmd the product o f I
Solution I
+
j ""
+
i a nd I -
Y3i in polar form.
From Exa mple C.3, we have
VZ( cos : +
isin : ) and I -
V3i =
l(
cos(- ; )
+
ISin ( - ; ) )
T herefore,
(I + ;)( \ - V3 i)
See Figure e.a.
=
2V2 [ cos(:
-
~)
+
I
Sin(: - ; ) ]
Appendix C
Complex Numbers
655
1m I
+i
R, 2
2
-I
{I (I
I -
n -12
3
+ i)( l - V3il = + v'3) + i(l - V3)
V3i
figure C.I
Rellarli must have
Since (I
+
i)(I - Y3i) = (J
+
\13)
+
i(l - \13) (check th is), we
(Why?) This implies that 1
+ v3
Zv,
and
.(n)
Sill
-
12
=
V3 -1 ZV2
We therefore have a method for finding the sine and cosine of an angle such as 71'/12 that is not a special angle but that can be obtained as a sum or diffe rence of special angles,
De MOivle'S Theolem If n tS a positive integer and z = r( cos () yields for mulas for the powers of z:
Z2 = r2(cos 28 Z3
Abraham De Moivre ( 1667- (754) was a French mathematiCian who made important contributions to trigonom et ry, analytic geometry, probability, and statistics
=
ZZ 2
+ i sin 0 ), then
repeated use of formula (I)
+ i sin 28 )
= r\cos38
l = ZZ3 = r4(cos40
+
isin30)
+ iSJJl40)
In general, we have the following result, known as De Mo;vre's Theorem.
&5&
Appe ndix C Complex Numbers
,
Theoll. Col
.
De Moivre's Theorem If z = r( cos 0
+
i sin 0) and n is a positive inleger, then
zIt = r"(cos nO
+
j sin
110 )
Stated differendy, we have I %~I
= I zl~ and arg(z ~) - Flargz
In wo rds, De Moivre's Theorem says that to take the tUh power of a complex number,
we take the 711 h power of its absolute value alld multiply its argumem by II.
Example Co5
Fmd (1 SoJulion
+
;)6.
From Example C.3(a), we have 1
+ i=
v'2(cos:
+
Isin : )
Hence, De Moivre's Theorem gives
6w
(1 + ;)6 = (\12)6 ( cos 4 + 1m
(
=8 cos
- 2 + 2i
, -, I
+i Ro
See Figure C.9,which shows I
0
6W) 4
.. 3W) 23w + 1510 2
~
8(0 + ;(-1))
+
1,(1
+
o
I S ill
, )2, (1
~
+
-8;
+
;)1 •... ,(1
i)6.
We can also use De Moivre's Theorem to fi nd nth roots of complex numbers. An 11th roo t of the complex number z is any com plex nu mber w such that
- 4 - 4;
w~
= z
In polar form, we have
- 8;
w = s(COSIp
+ isin Ip)
and
z = r(cosO
+
isin 0)
so , by De Moivre's Theorem,
Flglrl
S"{COSll
e.9
Powers of 1 + i
Equating the absolute values, we see that
s"= r o r We must also have cos
nip
= cos 0
and
sin ncp = sin 0
(Why?) Since the si ne and cosine functions each have period 211', these equat ions imply that "'P and (} differ by an integer multiple of 211'; that is,
rnp=(}+2Jm
or
Ip
=
8 + 2k1r
"
Appendix C Coml)lex Nu m bers
651
where k is an integer. Therefore,
[ (8+21m) + (0 +21m)]
w == ,-1/" cos
n
i sin
n
describes the possible nth roots of z as k ranges over the mtegers. It is not hard to show that k == 0, I, 2, ... , n - J produce distinct values of w, so there are exactly II di fferent nth roo ts of z = , (cos 0 + i sin 0). W e summarize this result as follows:
Let z = , (cos (J + ; sin (J ) and let ti nct nth roots given by
11
be a positive integer. Then z has exactl y n dis-
+2 1m) (0 + 2 k~ )] 0 + [ (
, II" cos
fo r k == 0, 1, 2, . . . ,
Example C.6
11 -
i sin
tI
(2)
"
I.
Find the three cube roo\s o f - 27.
Salulloll
in polar fo rm, -27 = 27(cos 7T + i sin 1T). 1t follows that the cube roots o f - 27 are given by
[
(_27) 'tl = 27ltl cos Using formula (2) with
/I
( ~ +321m) + jsin (~ +3 21m)]
for k == 0, 1,2
== 3, we obtai n
1m
2i /'[ cos ;
'", -'-..--+-++Re -3 3
3 = -+
+ Isin ; ]
3V3 .
2
+ [ ( ~+2~) ("+2w)] [ ("+ ''') + ( ,,+, ~ ) ] (5"3
2
271/l cos
3
+ j sin
3
- 3(cos 1T
i sm 11 ) == -3
27 1/ l cos
3
I sin
3
= 3 cOS
+ ; SII1
I
5 ") 3
~-+~3Y3\3 .
-,, -- ,, - '
-
flgur, C.l0 The cu be roo ts of - 27
-
As Figure C.IO shows, the th ree cube roots o f - 27 are equally spaced 211/3 radi:lI1s ( 120") apart aro und a circle of radius J centered at the origin.
In general, formula (2) implies that the 11t h roots of z == r (cos 0 + i sin 8) will lie on a circle o f radius ,-I'" centered at the o rigin. Moreover, they \"ill be equally spaced 21TI" radians (360hf) apart. (Verify this. ) Th us, if we can find one IIIh root of z, the remaining 11th root s of z can be obtained by rota ting the fi rSI roOI thro ugh successive Illcremen ts of 21T/ n radians. Had we known Ihis in Example C.6, we could have used the fact Ihat the rea l cube root of - 27 is - 3 and then rotaled il twice th ro ugh an angle of 271'/3 radians ( 120") to get the olher two cube rOOIS.
658
Appe ndix C
Complex Num~ rs
~nhard
Euler ( 1707- 1783) was the most prolific mat hematician of ail lime. He has over 900 publications to his na me. and hiSrolJectw works fill over 70 volumes. There are so many results aunbuted to him Ihat-Eule r's formula~ or " Euler's Theorem" can mean many differen t thinS$> dqlending on the contcxt. Euler worked In so many areas of ma th ~mali cs, it is difficult to list them all His cont ributions to calculus and analYSIS, d irr~re nll~1 C"qualions. number thwry. gwme-try. topology, mechanics, and Qther areas of applied ma thematics continue to be innuential. He also introduced much of th~ notattOn we currently usc, including Tr, t. I, l: for summation, .1 for difference, and [ (xl for a fu nctIOn, and was the first to treat sine and cosine as function s. Euler was born in Switzerland but spent most of his mathematkal life in RUSSia and Germany. In 1727, he joined the St. Pc-tmburg Aca&my of Sciences. whkh had been foonded by Cathenne I, Ihe Wife of I)eter Ihe Great. He wenl 10 Berlm in 1741 at the Invitation of Prederick the Great, but relUTned in 1766 10 SI. Pttersburg, where h ~ remained until his death. When he was )·oun8, he 10s1 the Vision m one eye as the result of an illness. and by 1776 he had ~I the vision In the other tye ~ n d was totally blind. Remarkably. hiS mathematICal output did not dimin ish, and he continued to be producti~ un til the day he died.
laler's formula In calculus, yo u learn tbal
th~
fu nction
c' = I
e~
has a power series expansion Z2
Zl
2!
3!
+Z+- + - +···
th at co nve rges for every real number z. It can be s hown that this expansion also works when z is a co mplex numbe r and Ihat the com plex exponential functio n e" obtys the usual rules fo r exponen ts. The sine and cosine functio ns also have po....'Cr • • series expanSions:
;t? x' sinx = x - - + - - - +-··· 3! 5! 7! xJ
cos x = I
r X4 x' - - +-- -+ _ .. 2!
41
6!
If we let z = ix, whcre x is a real number, then we have
e' "" e'" Using the fact that j2 length 4, we see thai
e'"
= I
=-
I,
(ix) '
::=
(ix)' 1 + ix + ':'f- + + .. 2! 3!
jl = - ', ,. ""
I, i ~
= i, a nd so on, repeatmg in a cycle o f + - - ...
+
= cos
This
x + isin x
r~markabl e
result is Know n as Euler's formula.
.. .
)
Appendix C Complex Numbers
Theorem C.2
659
."
Euler's Formula For any real n umber x,
ea: = cos x
+ isin x
Using Euler's fo rmula , we see that the polar form of a com plex number can be written more compactly as z = r(cos O + i sinO) = re itl For example, from Example C.3(a), we have
I
+
i =
V2(
cos:
+
i sin
~)
= V2e'''/4
We can also go in the other d irection and convert a complex exponential back IIltO polar or standard form.
Ixample C.l
Write the fo llowing in the fo rm a + hi: (a ) e'"
(b) e Hnr / 4
Solution (a) USing Eu ler's fo rmula, we have e'" = COS7f + Isin7f = - ] + i · 0 = - ] (If we write this eq uatio n as e irr + I = 0, we obtain what is surely o ne o f the most re markable eq uations in mathematics. It co ntains the fundamen tal operations of additio n, m ultiplicatio n, and exponentiation; the additIve identity 0 and the multiplicative Ident ity I; the two most important transcenden tal numbers, 7f and e; and the co mplex un it i-all in o ne equation!) (b) Using rules for exponents together with Euler's formula, we obtain
If z = re ll = r (cos 8
+
i sin O), then
(3)
z = r(cos 8 - ism O) The trigonometric iden tities cos( - O)
= cosO
and sln( - O)
=-
sinO
&&1
Appendix C Complex Numbers
allow us to rewrite equation (3) as
z=
r(coo( - O) + isin( - O» = re.i;-I)
This gives the following useful formula for the conjugate: If z
Mole
rr4l, then
Euler's formula gIVes a (luick, one-line proof of Dc Moivre's Theorem: [r(cos 8
+ IsmO) ]" = (rc oll )" = r"c,,06 = r"{cos Iii + isin nO )
."
..
~.-"
Appe
. ..
-, ~~
xD
Polynomials A polynomial is a function p of a single variable x Ihal ca n be wrillen in the form (I)
Eiller gave the most algebraIC of the proofs of the existence of the roots of fa polynomial} eqllallon . . .. l regard il as Ut/jIlSI 10 a5Cribe this proof exclusively 10 Gauss, who merely added the jinishillg toudlC5. ----Georg Frobenius, 1907 Quoted on the MacTutor History of Mathematics archIve,
where 110, al' ... , a" are consta nts (a" =1= 0), called th e coefficiems of p. With the con vention that ~ = I, we can use su mmation 110tatioll to write p as
p(x) =
2:" akx '
1;_U
The integer n is called the degree of p, which is denoted by writing deg p = nom ial of degree zero IS called a cotlStant polynomial.
II.
A poly-
http://wwv.'-history.mcs. st -and.ac.uklhistory/
lump Ie 0.1
Which of the followi ng are polynomials?
(h) 2 -
(d) In
e (2"")
(g) c05(2
(,)
lx
1 3x '
x 2 -Sx+6
x-2
(f)
Vx
(h) ,.
COS - IX)
SOlulloD (a) Th is is the only one that is obviously a polynomial. (b) A polynomial of the form shown 10 equation (I ) cannot become infinite as x 2 approaches a finite value [lim p(x) + 001, whereas 2 - 1/ 3x approaches - 00 as x
-,
"*
approaches zero. Hence, it is not a polynomial. (c) We have
which is equal to V2x when x 2: 0 and to - V2x when x < O. Therefore, this expresSiOIl IS formed by "splicing together" two polynomials (a piecewise polynomial), but it IS not a polynomial itself. 661
(d) Using properties of exponents and logarithms, we have

ln(2e^(5x²)/e^(3x)) = ln(2e^(5x²−3x)) = ln 2 + ln(e^(5x²−3x)) = ln 2 − 3x + 5x²

so this expression is a polynomial.
(e) The domain of this function consists of all real numbers x ≠ 2. For these values of x, the function simplifies to

(x² − 5x + 6)/(x − 2) = (x − 2)(x − 3)/(x − 2) = x − 3

so we can say that it is a polynomial on its domain.
(f) We see that this function cannot be a polynomial (even on its domain x ≥ 0), since repeated differentiation of a polynomial of the form shown in equation (1) eventually results in zero and √x does not have this property. (Verify this.)
(g) The domain of this expression is −1 ≤ x ≤ 1. Let θ = cos⁻¹ x so that cos θ = x. Using a trigonometric identity, we see that

cos(2 cos⁻¹ x) = cos 2θ = 2 cos²θ − 1 = 2x² − 1

so this expression is a polynomial on its domain.
(h) Analyzing this expression as we did the one in (f), we conclude that it is not a polynomial.
Two polynomials are equal if the coefficients of corresponding powers of x are all equal. In particular, equal polynomials must have the same degree. The sum of two polynomials is obtained by adding together the coefficients of corresponding powers of x.
Example D.2

Find the sum of 2 − 4x + x² and 1 + 2x − x² + 3x³.

Solution We compute

(2 − 4x + x²) + (1 + 2x − x² + 3x³) = (2 + 1) + (−4 + 2)x + (1 + (−1))x² + (0 + 3)x³ = 3 − 2x + 3x³

where we have "expanded" the first polynomial by giving it an x³ coefficient of zero.
We define the difference of two polynomials analogously, subtracting coefficients instead of adding them. The product of two polynomials is obtained by repeatedly using the distributive law and then gathering together corresponding powers of x.
Example D.3

Find the product of 2 − 4x + x² and 1 + 2x − x² + 3x³.

Solution We obtain

(2 − 4x + x²)(1 + 2x − x² + 3x³)
   = 2(1 + 2x − x² + 3x³) − 4x(1 + 2x − x² + 3x³) + x²(1 + 2x − x² + 3x³)
   = (2 + 4x − 2x² + 6x³) + (−4x − 8x² + 4x³ − 12x⁴) + (x² + 2x³ − x⁴ + 3x⁵)
   = 2 + (4x − 4x) + (−2x² − 8x² + x²) + (6x³ + 4x³ + 2x³) + (−12x⁴ − x⁴) + 3x⁵
   = 2 − 9x² + 12x³ − 13x⁴ + 3x⁵
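The sum and product of Examples D.2 and D.3 can be computed mechanically from coefficient lists: addition is termwise, and each product aᵢxⁱ·bⱼxʲ contributes to the coefficient of x^(i+j). A minimal Python sketch (an illustration, not part of the text):

def poly_add(p, q):
    # Add two polynomials given as coefficient lists [a0, a1, ...]
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_mul(p, q):
    # Multiply two polynomials given as coefficient lists
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

p = [2, -4, 1]          # 2 - 4x + x^2
q = [1, 2, -1, 3]       # 1 + 2x - x^2 + 3x^3
print(poly_add(p, q))   # [3, -2, 0, 3]            -> 3 - 2x + 3x^3
print(poly_mul(p, q))   # [2, 0, -9, 12, -13, 3]   -> 2 - 9x^2 + 12x^3 - 13x^4 + 3x^5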
Observe that for two polynomials p and q, we have

deg(pq) = deg p + deg q

If p and q are polynomials with deg q ≤ deg p, we can divide q into p, using long division to obtain the quotient p/q. The next example illustrates the procedure, which is the same as for long division of one integer into another. Just as the quotient of two integers is not, in general, an integer, the quotient of two polynomials is not, in general, another polynomial.
Example D.4

Compute (3x³ − x² + 2x + 1)/(x² − 4x + 2).

Solution We will perform long division. It is helpful to write each polynomial with decreasing powers of x. Accordingly, we have

x² − 4x + 2 ) 3x³ − x² + 2x + 1

We begin by dividing x² into 3x³ to obtain the partial quotient 3x. We then multiply 3x by the divisor x² − 4x + 2, subtract the result, and bring down the next term from the dividend (3x³ − x² + 2x + 1):

                     3x
x² − 4x + 2 ) 3x³ −  x² +  2x + 1
              3x³ − 12x² +  6x
                    11x² −  4x + 1

Then we repeat the process with 11x², multiplying 11 by x² − 4x + 2 and subtracting the result from 11x² − 4x + 1. We obtain

                     3x + 11
x² − 4x + 2 ) 3x³ −  x² +  2x +  1
              3x³ − 12x² +  6x
                    11x² −  4x +  1
                    11x² − 44x + 22
                            40x − 21

We now have a remainder 40x − 21. Its degree is less than that of the divisor x² − 4x + 2, so the process stops, and we have found that

3x³ − x² + 2x + 1 = (x² − 4x + 2)(3x + 11) + (40x − 21)

or

(3x³ − x² + 2x + 1)/(x² − 4x + 2) = 3x + 11 + (40x − 21)/(x² − 4x + 2)
Example D.4 can be generalized to give the following result, known as the division algorithm.

Theorem D.1    The Division Algorithm

If f and g are polynomials with deg g ≤ deg f, then there are polynomials q and r such that

f(x) = g(x)q(x) + r(x)

where either r = 0 or deg r < deg g.
In Example D.4,

f(x) = 3x³ − x² + 2x + 1,   g(x) = x² − 4x + 2,   q(x) = 3x + 11,   and   r(x) = 40x − 21
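The long-division procedure of Example D.4 (and the division algorithm of Theorem D.1) translates directly into code: repeatedly divide leading coefficients, subtract, and stop once the remainder's degree drops below that of the divisor. The following Python sketch assumes coefficient lists written in ascending powers and is an illustration only.

def poly_divmod(f, g):
    # Divide f by g (coefficient lists [a0, a1, ..., an], highest degree last).
    # Returns (q, r) with f = g*q + r and deg r < deg g.
    f = f[:]                          # work on a copy of the dividend
    q = [0] * (len(f) - len(g) + 1)
    while len(f) >= len(g) and any(f):
        shift = len(f) - len(g)
        factor = f[-1] / g[-1]        # divide the leading coefficients
        q[shift] = factor
        for i, c in enumerate(g):     # subtract factor * x^shift * g(x)
            f[i + shift] -= factor * c
        f.pop()                       # the leading term is now zero
    return q, f

# Example D.4: (3x^3 - x^2 + 2x + 1) / (x^2 - 4x + 2)
q, r = poly_divmod([1, 2, -1, 3], [2, -4, 1])
print(q)   # [11.0, 3.0]   -> 3x + 11
print(r)   # [-21.0, 40.0] -> 40x - 21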
In the division algorithm, if the remainder is zero, then

f(x) = g(x)q(x)

and we say that g is a factor of f. (Notice that q is also a factor of f.) There is a close connection between the factors of a polynomial and its zeros. A zero of a polynomial f is a number a such that f(a) = 0. [The number a is also called a root of the polynomial equation f(x) = 0.] The following result, known as the Factor Theorem, establishes the connection between factors of a polynomial and its zeros.
Theorem D.2    The Factor Theorem

Let f be a polynomial and let a be a constant. Then a is a zero of f if and only if x − a is a factor of f(x).
Proof By the division algorithm,

f(x) = (x − a)q(x) + r(x)

where either r(x) = 0 or deg r < deg(x − a) = 1. Thus, in either case, r(x) = r is a constant. Now,

f(a) = (a − a)q(a) + r = r

so f(a) = 0 if and only if r = 0, which is equivalent to

f(x) = (x − a)q(x)

as we needed to prove.
There is no method that is guaranteed to find the zeros of a given polynomial. However, there are some guidelines that are useful in special cases. The case of a polynomial with integer coefficients is particularly interesting. The following result, known as the Rational Roots Theorem, gives criteria for a zero of such a polynomial to be a rational number.
Theorem D.3    The Rational Roots Theorem

Let

f(x) = a₀ + a₁x + ··· + aₙxⁿ

be a polynomial with integer coefficients and let a/b be a rational number written in lowest terms. If a/b is a zero of f, then a₀ is a multiple of a and aₙ is a multiple of b.
Proof If a/b is a zero of f, then

a₀ + a₁(a/b) + ··· + aₙ₋₁(a/b)ⁿ⁻¹ + aₙ(a/b)ⁿ = 0

Multiplying through by bⁿ, we have

a₀bⁿ + a₁abⁿ⁻¹ + ··· + aₙ₋₁aⁿ⁻¹b + aₙaⁿ = 0        (1)

which implies that

a₀bⁿ + a₁abⁿ⁻¹ + ··· + aₙ₋₁aⁿ⁻¹b = −aₙaⁿ        (2)

The left-hand side of equation (2) is a multiple of b, so aₙaⁿ must be a multiple of b also. Since a/b is in lowest terms, a and b have no common factors greater than 1. Therefore, aₙ must be a multiple of b. We can also write equation (1) as

−a₀bⁿ = a₁abⁿ⁻¹ + ··· + aₙ₋₁aⁿ⁻¹b + aₙaⁿ

and a similar argument shows that a₀ must be a multiple of a.
Example D.5

Find all the rational roots of the equation

6x³ + 13x² − 4 = 0        (3)

Solution If a/b is a root of this equation, then 6 is a multiple of b and −4 is a multiple of a, by the Rational Roots Theorem. Therefore,

a ∈ {±1, ±2, ±4}    and    b ∈ {±1, ±2, ±3, ±6}

Forming all possible rational numbers a/b with these choices of a and b, we see that the only possible rational roots of the given equation are

±1, ±2, ±4, ±1/2, ±1/3, ±2/3, ±4/3, ±1/6

Substituting these values into equation (3) one at a time, we find that −2, −2/3, and 1/2 are the only values from this list that are actually roots. As we will see shortly, a polynomial equation of degree 3 cannot have more than three roots, so these are not only all the rational roots of equation (3) but also its only roots.
We can improve on the trial-and-error method of Example D.5 in various ways. For example, once we find one root a of a given polynomial equation f(x) = 0, we know that x − a is a factor of f(x), say f(x) = (x − a)g(x). We can therefore divide f(x) by x − a (using long division) to find g(x). Since deg g < deg f, the roots of g(x) = 0 [which are also roots of f(x) = 0] may be easier to find. In particular, if g(x) is a quadratic polynomial, we have access to the quadratic formula. Suppose

ax² + bx + c = 0

(We may assume that a is positive, since multiplying both sides by −1 would produce an equivalent equation otherwise.) Then, completing the square, we have

a(x² + (b/a)x + b²/(4a²)) = b²/(4a) − c

(Verify this.) Equivalently,

a(x + b/(2a))² = (b² − 4ac)/(4a)    or    (x + b/(2a))² = (b² − 4ac)/(4a²)

Therefore,

x = (−b ± √(b² − 4ac))/(2a)
Let's revisit the equation from Example D.5 with the quadratic formula in mind.
Example D.6

Find the roots of 6x³ + 13x² − 4 = 0.

Solution Let's suppose we use the Rational Roots Theorem to discover that x = −2 is a rational root of 6x³ + 13x² − 4 = 0. Then x + 2 is a factor of 6x³ + 13x² − 4, and long division gives

6x³ + 13x² − 4 = (x + 2)(6x² + x − 2)

(Check this.) We can now apply the quadratic formula to the second factor to find that its zeros are

x = (−1 ± √(1² − 4(6)(−2)))/(2·6) = (−1 ± √49)/12 = (−1 ± 7)/12 = 6/12 or −8/12

or, in lowest terms, 1/2 and −2/3. Thus, the three roots of equation (3) are −2, 1/2, and −2/3, as we determined in Example D.5.
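The strategy of Example D.6, deflating by a known linear factor and then using the quadratic formula, needs only a few lines of code. The quadratic factor here has a positive discriminant, so the real-arithmetic Python sketch below suffices (illustration only):

import math

def quadratic_roots(a, b, c):
    # Roots of ax^2 + bx + c = 0 via the quadratic formula (real roots assumed)
    disc = math.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

# 6x^3 + 13x^2 - 4 = (x + 2)(6x^2 + x - 2); solve the quadratic factor
print(quadratic_roots(6, 1, -2))   # (0.5, -0.666...), i.e., 1/2 and -2/3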
Remark The Factor Theorem establishes a connection between the zeros of a polynomial and its linear factors. However, a polynomial without linear factors may still have factors of higher degree. Furthermore, when asked to factor a polynomial, we need to know the number system to which the coefficients of the factors are supposed to belong. For example, consider the polynomial

p(x) = x⁴ + 1

Over the rational numbers Q, the only possible zeros of p are 1 and −1, by the Rational Roots Theorem. A quick check shows that neither of these actually works, so p(x) has no linear factors with rational coefficients, by the Factor Theorem. However, p(x) may still factor into a product of two quadratics. We will check for quadratic factors using the method of undetermined coefficients. Suppose that

x⁴ + 1 = (x² + ax + b)(x² + cx + d)

Expanding the right-hand side and comparing coefficients, we obtain the equations

a + c = 0
b + ac + d = 0
bc + ad = 0
bd = 1

If a = 0, then c = 0 and d = −b. This gives −b² = 1, which has no solutions in Q. Hence, we may assume that a ≠ 0. Then c = −a, and we obtain d = b. It now follows that b² = 1, so b = 1 or b = −1. This implies that a² = 2 or a² = −2, respectively, neither of which has solutions in Q. It follows that x⁴ + 1 cannot be factored over Q. We say that it is irreducible over Q. However, over the real numbers R, x⁴ + 1 does factor. The calculations we have just done show that
x⁴ + 1 = (x² + √2 x + 1)(x² − √2 x + 1)
(Why?) To see whether we can factor further, we apply the quadratic formula. We see that the first factor has zeros

x = (−√2 ± √(2 − 4))/2 = (√2/2)(−1 ± i) = −1/√2 ± (1/√2)i
which are in C but not in R. Hence, x² + √2 x + 1 cannot be factored into linear factors over R. Similarly, x² − √2 x + 1 cannot be factored into linear factors over R. Our calculations show that a complete factorization of x⁴ + 1 is possible over the complex numbers C. The four zeros of x⁴ + 1 are

a = −1/√2 + (1/√2)i,   ā = −1/√2 − (1/√2)i,   −a = 1/√2 − (1/√2)i,   −ā = 1/√2 + (1/√2)i

which, as Figure D.1 shows, all lie on the unit circle in the complex plane. Thus, the factorization of x⁴ + 1 is

Figure D.1

x⁴ + 1 = (x − a)(x − ā)(x + a)(x + ā)
The preceding Remark illustrates several important properties of polynomials. Notice that the polynomial p(x) = x⁴ + 1 satisfies deg p = 4 and has exactly four zeros in C. Furthermore, its complex zeros occur in conjugate pairs; that is, its complex zeros can be paired up as

{a, ā}    and    {−a, −ā}

These last two facts are true in general. The first is an instance of the Fundamental Theorem of Algebra (FTA), a result that was first proved by Gauss in 1797.
Theorem D.4    The Fundamental Theorem of Algebra

Every polynomial of degree n with real or complex coefficients has exactly n zeros (counting multiplicities) in C.
This important theorem is sometimes stated as "Every polynomial with real or complex coefficients has a zero in C." Let's call this statement FTA′. Certainly, FTA implies FTA′. Conversely, if FTA′ is true and we have a polynomial p of degree n, then p has a zero a in C. The Factor Theorem then tells us that x − a is a factor of p(x), so

p(x) = (x − a)q(x)

where q is a polynomial of degree n − 1 (also with real or complex coefficients). We can now apply FTA′ to q to get another zero, and so on, making FTA true. This argument can be made into a nice induction proof. (Try it.) It is not possible to give a formula (along the lines of the quadratic formula) for the zeros of polynomials of degree 5 or more. (The work of Abel and Galois confirmed this; see page 308.) Consequently, other methods must be used to prove FTA. The proof that Gauss gave uses topological methods and can be found in more advanced mathematics courses. Now suppose that
p(x) = a₀ + a₁x + ··· + aₙxⁿ

is a polynomial with real coefficients. Let a be a complex zero of p, so that p(a) = 0. Then, using properties of conjugates, we have

p(ā) = a₀ + a₁ā + ··· + aₙāⁿ

Because each coefficient aᵢ is real, this expression is the conjugate of a₀ + a₁a + ··· + aₙaⁿ = p(a) = 0, and the conjugate of 0 is 0. Thus, ā is also a zero of p. This proves the following result:

The complex zeros of a polynomial with real coefficients occur in conjugate pairs.
Descartes stated this rule in his 1637 book La Géométrie, but did not give a proof. Several mathematicians later furnished a proof, and Gauss provided a somewhat sharper version of the theorem in 1828.
In some situations, we do not need to know what the zeros of a polynomial are, only where they are located. For example, we might only need to know whether the zeros are positive or negative (as in Theorem 4.35). One theorem that is useful in this regard is Descartes' Rule of Signs. It allows us to make certain predictions about the number of positive zeros of a polynomial with real coefficients based on the signs of these coefficients. Given a polynomial a₀ + a₁x + ··· + aₙxⁿ, write its nonzero coefficients in order. Replace each positive coefficient by a plus sign and each negative coefficient by a minus sign. We will say that the polynomial has k sign changes if there are k places where the coefficients change sign. For example, the polynomial 2 − 3x + 4x² + x⁴ − 7x⁵ has the sign pattern

+ − + + −

so it has three sign changes, as indicated.
Theorem D.5    Descartes' Rule of Signs

Let p be a polynomial with real coefficients that has k sign changes. Then the number of positive zeros of p (counting multiplicities) is at most k.

In words, Descartes' Rule of Signs says that a real polynomial cannot have more positive zeros than it has sign changes.
Example D.7

Show that the polynomial p(x) = 4 + 2x² − 7x⁴ has exactly one positive zero.

Solution The coefficients of p have the sign pattern + + −, which has only one sign change. So, by Descartes' Rule of Signs, p has at most one positive zero. But p(0) = 4 and p(1) = −1, so there is a zero somewhere in the interval (0, 1). Hence, this is the only positive zero of p.
We can also use Descartes' Rule of Signs to give a bound on the number of negative zeros of a polynomial with real coefficients. Let

p(x) = a₀ + a₁x + a₂x² + ··· + aₙxⁿ

and let b be a negative zero of p. Then b = −c for c > 0, and we have

0 = p(b) = a₀ + a₁b + a₂b² + ··· + aₙbⁿ

But b = −c, so

0 = a₀ − a₁c + a₂c² − ··· + (−1)ⁿaₙcⁿ

so c is a positive zero of p(−x). Therefore, p(x) has exactly as many negative zeros as p(−x) has positive zeros. Combined with Descartes' Rule of Signs, this observation yields the following:

Let p be a polynomial with real coefficients. Then the number of negative zeros of p is at most the number of sign changes of p(−x).
Example D.8

Show that the zeros of p(x) = 1 + 3x + 2x² + x⁵ cannot all be real.

Solution The coefficients of p(x) have no sign changes, so p has no positive zeros. Since p(−x) = 1 − 3x + 2x² − x⁵ has three sign changes among its coefficients, p has at most three negative zeros. We note that 0 is not a zero of p either, so p has at most three real zeros. Therefore, it has at least two complex (non-real) zeros.
Answers to Selected Odd-Numbered Exercises

Answers are easy. It's asking the right questions [that's] hard.
-Doctor Who, "The Face of Evil," by Chris Boucher, BBC, 1977
Chapter 1
(d)
,
Exercises 1.1 I.
y
In
In
19.
y
«I
y
2 4 -+-~X
-2
6
-2
2
1I.w =-2u + 4v y
y
(d)
,
6
2
4
,,.
, -+-+--~_ x
246
,,
, ,,
-2
--+--+~x
6
7. a+b - [5,3 ]
Exercises 1.2
1. −1
Y 4
7.
3.11
¥S. [ - :j~]
2
++--+-+ x 6
17.
9. d - < = 15. - 51 )'
,
4
21. Acu te
23. Acute
25.60°
27 . .... 88. 10°
29. =1 4.34°
31. Since AB· AC'"
I
1·
d
1;a
33. If we take the cube 10 be a unit cube (as in f igure 1.31 ), the fou r diagonals (Ire given by the vectors
v= [-
V;j:].
u
- I
+v=
+ V31(2] (I + 31(2
1
(I - V3)('] u _ v = [( I
[ (V3 - 1)(2 • 15 . Sa
17. x = 3a
= 0, L BAC is a righl
an gle.
2. 31
13. u - [
1
-3
- 1
-4
13. -
19. Acute
-4
-2
11.
1( Vi< 9. Vi<. 2( Vi< 3( Vi<
II. v'6. [ I( v'6. IfV3. I( v'2. 0j 13. V17 15. v'6
b
4
5. 2
I
'*
I , d~=
- [ I
Since d,' dJ *" 0 for all i j (six poSSIbilities), no two dl3gonals are perpendICular.
35. [
,,
-n
37.
41. A = V45/ 2
39.
- 0.30 1 0.033 - 0.252
(c) Perpendicular
21. [;] =[J +1[:] x
43. k =- 2,3 23.
61. The Cauchy-Schwarz Inequality would be violated.
Exercises 1. 3 (b)3x+2y = 0
1. (a) [:] -[;] = 0 3. (a) [;] =
[ ~] + .[ - ; ]
(b) x = 1 -
I
y=
31 I
z
x 3 2 • Y = 2 (b ) 3x+ 2y+ z= 2 -3
2
Y =s I
9. (a)
+
z 2 (b) x =2s- 3t y = S + 2t Z
=
25
+
=
+,
15. (a); : - I
+
- I
/til
- I.
°
2
3 9. I
0 0 0 I
0 0 2 3
0 0
2 0 3 2
3
2 I
13. 2, 0,3
19. 3,2
23. No solution
21. x = 2 25. x = 3
27. No solution
29. x
=2
'*
(b )a = 1,5 33. (a l AJJ a 0 (el tl and III can h ave no commo n fac tors other than 1 ILe., the greatest common divisor (gcd ) of a and 111 is I].
- I
+
1
0
- 2
3: [;] = [_~] + {~] "'2
if and only if d 1 and d z are o rthogonal. But d • . d 2 = if and only if 1 + "'11112 :::: 0 o r, equivalently, m l m~ -
I
0
2
I
35. [I , 1,0, I, 1, 0)
37. Erro r
39. No error
41. d = 3
43. d = 7
45. d = 8
47. (a) Since
I ] and d l = [ I ]. The lines are perpendicular
=: [
3
= I 0 I
17. [ I , 1, 0)
17. Direction vectors for the two lines arc given by d,
37. i
3I. x = l ,orx = 5
I
I
35. 1Sv'i3/ 13
15.5
3
I
33· (1, 1, ~ ) 43. -78.9~
2
,
I
13. Y z
3l. o,n
II.
II. [;] = [_:] + I[ ~] X
29. 2V3j3
3 7.0
z
x
27. 3V2j 2
3. u + v =[0, I, O, O], u ' v • 5. + 0 I 2 3 0 0 I 2 3 0 I I 2 3 0 I 2 2 2 3 0 I
1 (b )Y =- 1 , = 41
4
I
y=
Exercises 1.4
x=
y = 1 - I
7. (a)
- \
l.u + v = [~J. U'V = l
X
5. (a)
-)
O+t 3 - I 3 25. (a) x = O,x= l. y = O, y= l,z = O, z= I (c) x + y - Z = 0 (b) x - y "" 0
k[~al, where kis a scalar.
45. v IS of the fo rm
(b ) Parallel (d) Perpendicular
19. (a) Perpend icula r
-,,, , -,,
°
c· u = (3 , I, 3, I, 3, I, 3, 1,3, 1,3, 1] . [0, 4 , 6,9, 5, 6, I, 8, 2, 0, 1, 5) = 7
'* 0
an error has occurred, SO the UPC is incorrect. (b) )0, 4, 7,9,5, 6,1,8, 2,0, 1,5 ) 51. d - 7
31.
Review QllestioflS
1. (a) T
('J T
(c) F
8 3.X =[1 1]
(i) T
- 2/Vs 7. I/ Vs
5. IW
o II. V6/ 2
9. 2x+3y - z = 7
13.
(g) F
The Cauchy-Schwarz Inequality would be violated.
17. x =2
15.2 V6/3
19. d = 7
y + z=l x- Y I 2x - y + z = J
33. !I,I ]
35. [4,-1[ 39. (a) 2x 4x 41. Let
37. No solution
+ y= + 2y =
6
= ..!.. and v
II
(b)x =f-1 s y= s
3
x
= ~. The solution isx = },y = y
,.
43. Let II = tan x, l' = sin y, w = cos z. One solutio n is x == 71'/4, Y = - 71'/6, z = 71'/3 . (There arc infini tely
Chapter 2
many solutions. )
Exercises 2.2
Exercises 2.1
1. Linear
3. Not linear because of the
5. Linear
7. 2x + 4y = 7
9.x + y = 4(x,y
'* 0) 4 - 2. - 3t
II {[2,']}
13.
X- I
term
1. No
3. Reduced row echelon form
5. No
7. No
9. (a)
;
,
I
I
I
0
1
1
o
0
I
I
0
- I
0
1
- I
13. (bl
15. Unique solution, x = 3, Y = - 3
1 0 II. (b)
0
1
o
0
o
0 o 15. Perform elementary row operations in the order R4 + 29R J , 8R J , R4 - 3R1, Rl +-+ RJ , R4 - Rp RJ + 2Rp and, finally, Rl + 2R,. 17. One possibility is 10 perform elementary row operations on A in the order Rl - 3R" i R1, R, + 2R1, R2 + 3R" R, +-+ Rl • 19. Hint: Pick a random 2X2 matrix and try thiscarefully!
-+-+-+ x
-2 -2
4
-4
21. Th is 's really two elementary row operations combined: 3R1 and R2 - 2R ,.
17. No solution y
23. Exercise I: 3; Exercise 3: 2; Exercise 5: 2; Exercise 7: 3 2
25.
/'
,
2
-2 31.
[2 I - ):, J 3' 3'
19. [7, 3[
23. [5, - 2, I , I] 27.
[~
-I I
~l
21. 25. (2, - 7, - 32[ 29.
1
27. t
I 5 - I - I I -5 2 4
4
1
- I
I
X
,Y
2 5 24
6
- 10
- 2
0
+
r
o o
J
o o
o
12
6 -6 +50 +t 0 I
o
o
1
33. No solution 35. Unique solution 37. Infinitely many solutions
39. NIIII: Show that if ad - be,* 0, the rank of [ :
'*
:J
is 2. (There arc two cases: a = 0 and a 0.) Use the Rank Theorem to deduce that the given system Illust have a unique solution. 41. (a) Nosolutioni f k =- 1 (b) A unique solution if k + I (c) Infinitely many solutions if k = I
If-urther row operations yield x = ( n + IJ)/2,
y
~ (a -
1
II. We need to show that the vector equation
'* '*
o
45.
Y
,
-]
+
49. No in tersection
x, x 1 are Inc solutions of the
51. The required vectors x =
x, homogeneous system with augmented matrix [
0]
UI
III
II J
VI
Y1
V3 0
*"
I
112V3 -
113 V2
53.
[ ~]
IIJ VI -
lil Y)
+z I
o
1
II 1Y2 -
112 v I
1
57. [~]
0
o
=
,
lOa
o ]
l b . Row red uction yields
I
O
l e
1
lOa
o o
1
b
I
, from which we can see
0 2b+c - 1l that there is a (unique) solution. IFurther row operat ions yield x = (a - b + c)f2, Y = (a + b - ,)/2. F ( - a + b + ,)/ 2.1
Hence, W = sp..1n
vector [
o
0,
1 I ,
1
o
1
I
- ~]
(b) The line with general equat ion 2x + y = 0 15. (a) The plane through the origin with direction vectors
1
3
2,
2
o
-I
(b) The plane with general equation 2x - y + 4% == 0
Exercises 2.3 I. Yes
1
a b has a solution fo r all values
13. (a) The line through the o rigin with direction
But a direct check shows that th ese arc still solutions even if II I = Oand/orII 1 Y2 - 112v I = o. 55.
o
1
Uy Theorem 3, there are infi nitely many solutions. If 14, 0 and II , V2 - Uzv l 0, the solutions are given by
*"
1
]
-7
1
+
of a, b, and c. This vector equation is equivalent to the linear system whose augmented matrix is
9 - ]0
I
0 1
43. (a) No solution if k = ] (b) A unique solution if k - 2, I (c) Infinitely many solutions if k = - 2
x
>I"". R' = ",.n( [:].[ _: ])
b)/2.J
17. Substitution yields the linear system 3. No
5. Yes
7. Yes
9. We need to show that the vecto r equation
11
x[:] +
r[ _: ] = [ ~] has a solution for all values of aand b. ThiS vector equation is eqUivalent to the linear system whose augmented matrix is [ :
_ ::]. Row
reduction yields [ I I 1. a ]. from which we can 0 - 2u - {1 see that there is a (unique) solution.
- 11
+ 3c= O + b - 3c = 0 -3
whose solution is I
O. It follows that there are 1
infini tely many solutions, the simples! perhaps being
a = - 3,b = O,c = I. 19. u = u + O{U + v ) + O{U + v + w) v = (- I)u + (u + v) + O(u + v + w) w = Ou + (-I )(u + v) + (u + v + w)
21. (cl We must show that span(epe" e]) = span(e l , e l + e2 , e l + e 2 + e). We know that span(e p e l + e" e l + e, + e3 ) C IR:) = span(el' e " e, ). From Exercise 19, e l' e2, and e] alJ belong to span( e p c I + e2, e l + e, + e3 ). Therefore, by Exercise 21(b), span(e p " 2' e3 ) = span(e l' e l + " 2," 1 + e, + eJ)' 23. Linearly independent
o 25. Linearly dependent, -
+
1
2
2
2
1
0
3
\
21. (a) 1 = \0 amps, II = 15 = 6 amps, I) = 14 = 4 amps, I} = 2 amps
(b) Riff = ~ ohms
(e) Yes; change it to 4 ohms. 23. (a) Yes; push switches 1, 2,and 3 or swilChes 3, 4, and 5. (b) No 25. The states that can be obtained are represented by those vectors X,
x,
27. Linearly dependent, since the set con tains the zero vector
x,
29. Linearly independent
x,
31. Llllearly dependent,
3
- \
- \
\
- \
3
-\
\
\
3
3
\
+
\
+
\
- \
- \
in .l~ for whic h X l + such possibtlities.)
Exercises 2.4 1.
XI
= 160, x ,
= 120,x) = 160
3. two small, three medium, four large 5. 65 bags of ho use blend, 30 bags of special blend, 45 bags of gourmet blend
7. 4FeS,
+
9. 2C4 H w II.
) 2Fep]
1102
+
2~H lI OI-I
1302
+
1502
l SCOl
13. Na 2C03 + 4C + N2 15. (a) j; = 30 - I f 2 = - 10+t
l
+
2NaCN
IOC02
+ 3CO
(b) f l = 15,f} = 15
= t 0 < f l < 20
O S f2<20 10 S f , S 30 (d) Negative flow would mean that water was flowing backward, against the direction of the arrow. 17. (a) j; = - 200 + 5 + t A = 300 - 5 - 1 h= 5 h = 150 t is= t
A=
(b) 200 <
h<
300
O,then fs = 1 > 200(fromthei; equation), but is = 1 ~ 150 (from the equatIOn ). This IS a contrad iction .
(c) If
5 =
+
Xs
= O. (There arc 16
\
\
0
0
0 2
\
\
0
0 0
1 1
\
\
0 \ 0 2
0
\
\
\
\
0
0
0
\
\
2
\
\
2 \ 0 2
\
0
0 0 0
I
0 0
0
\
0 0 0
0
0
\
\ 2
0
0
0
0
0 0
TJllS yields the solutions
fJ (e)
.\4
which reduces over.l] to
+ 850) + IOHP
12HP
l
+
27. If 0 = off, 1 = light blue, and 2 = dark blue, then the Imear system that arises has augmented matrix
(b) No
43. (a) Yes
Xl
r.
(d) 50 ~ h<300
19.11 = 3 amps, 12 = 5 amps, I, = 2 amps
X,
\
2
X,
\
\
x, x,
2 + 10 2 2
X,
\
where t is in 1:3" Hence, there are exactly three solutions: I
o
2
I
2
0
2 , 2 , 2 2 \ o
o
\
2
where each entry indicates the number of times the corresponding switch should be pushed.
In
Answers 10 Selectt>d Odd - Number~ E.xerd~ 29. (a) Push squllres 3 lind 7. (b) The 9x9 coefficient mat rix A is row cquivalcntto Zl' SO for any bi n ZI. Ax :c: b has a unique solut ion.
37. Ca) No solution (b) la,b,o,d",!1 = 14,5,6, -3 , - 1,01 [1 - 1, - 1, - 1,1,1,1\ 39. (a) y =
31. Grace is 15, and Hans is 5.
xl - 2x + 1
+
(b) y =r+ 6x + 10
41. A = \,8 = 2
33. 1200 and 600 squan.. yards
43. A =
35. (a) a = 4 - d,b = 5 - d,e = - 2 + (J,disarbitrary (b) No solutio n
-L IJ = J, C = O,D =
--h ,E =-i
45.a= f ,b =!, e = 0
Exercises 2.5 Ion
0
1
2
3
4
5
XI
0 0
0.857 1 0.8000
0.971 4 0.9714
0.9959 0.9943
0.999 1 0.9992
0.9998 0.9998
Xl
Exact solution: XI
3. n
x, x,
o o
o
-
1, X l
1
=
1
2
3
4
5
6
0.2222
0.2539 0.3492
0.26 10 0.3582
0.2620
0.2622 0.3606
0.2623 0.3606
0.2857
0.3603
Exact solution (to fou r decimal places): XI = 0.2623, x 2 = 0.3606
5.
PI
o
1
2
3
4
5
6
7
8
x,
o
0.3333 0.2500 0.3333
0.2500 0.0834
0.3055 0. 1250 0.J055
0.2916 0.0972 0.29 16
0.3009 0. 1042 0.3009
0.2986
0.3001 0.1008 0.3001
0.2997
X,
x,
o
o
0.2500
0.0996 0.2986
0.1000
0.2997
Exact solution: XI = 0.3, Xl = O. J, Xj = 0.3
7. "
x, x,
o o
o
1
2
3
4
0.8571 0.9714
0.9959
0.9998
0.9992
1.0000
1.0000 1.0000
After three iterations, the Gauss-Seidel method is within 0.001 of the exact solution. Jacobi's method took four iterations to reach the same accuracy.
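Iteration tables like the ones in this section are easy to reproduce in code. The Python sketch below runs the Gauss-Seidel iteration (each updated component is used immediately) on a small diagonally dominant system chosen only for illustration; it is not the system from the exercise.

def gauss_seidel(A, b, x0, iterations):
    # Gauss-Seidel: update each component in place, using new values immediately
    n = len(b)
    x = x0[:]
    history = [x[:]]
    for _ in range(iterations):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
        history.append(x[:])
    return history

# Hypothetical system (illustration only): 4x1 + x2 = 6, 2x1 + 5x2 = 12
A = [[4.0, 1.0], [2.0, 5.0]]
b = [6.0, 12.0]
for row in gauss_seidel(A, b, [0.0, 0.0], 4):
    print(row)   # iterates converge toward the exact solution x1 = 1, x2 = 2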
9. n
x, x,
o o o
1
2
3
4
0.2222 0.3492
0.2610 0.3603
0.2622
0.2623 0.3606
0.3606
After three iterations, the Gauss-Seidel method is within 0.001 of the exact solution. Jacobi's method took four iterations to reach the same accuracy.
II.
"
0
I
2
3
X,
0 0 0
0.3333 0.1667
0.2777 0.1112
0.2962 0.1020
0.2777
0. 2962
0.2993
X,
x,
• 0.2993 0.1004 0.2998
13.
5
6
0.2998
0.3000
0.1000 0.3000
0.1000
.1:2
I
0.3000
0.5
After four iterations, the Gauss-Seidel method is within 0.001 of the exact solution. Jacobi's method took seven iterations to reach the same accuracy.
1 5. ~'~'____0~__~1____~2____~3~_____4C-
o
X,
3 -4
o
x,
-5 8
19 -28
-53
80
If the equations are interchanged and the Gauss-Seidel method is applied to the equivalent system

3x₁ + 2x₂ = 1
x₁ − 2x₂ = 3

we obtain
o o o
n X,
x,
I
2
3
•
5
6
7
8
0.3333 - 1.3333
1.2222 - 0.8889
0.9260 - 1.0370
1.0247 -0.9876
0.9918 - 1.0041
1.0027 - 0.9986
0.9991 - 1.0004
1.0003 - 0.9998
After seven iterations, the process has converged to within 0.001 of the exact solution x₁ = 1, x₂ = −1.
X,
17.
I•.
10 10
20
x,
n
0
I
2
3
4
5
6
x,
0 0 0
-1.6 25.9 - 10.35
14.97 11.408 -9.3 11
8.550 14.051 - 11.200
10.740 11.615 - 11.322
9.839 11.718 - 11.72 1
10.120 11 .249 - 11.816
x,
x,
- 10 -20
n
7
8
9
10
II
12
- 30
x,
9.989 11.187 - 11.91 2
10.022 11.082 - 11.948
10.002 11 .052 - 11.973
10.005 11.026 - 11.985
10.00 1 11.0 15 - 11.992
10.00 1 11.008 - 11.996
x,
x,
After 12 iterations, the Gauss-Seidel method has converged to within 0.001 of the exact solution x₁ = 10, x₂ = 11, x₃ = −12.
21.
13
" x,
10.0004
X,
11 .0043 - 11.9976
x,
I'
15
16
10.()()()3 11.0023 - 11.9986
IO.()()()I 11.0014 - 11.9993
1O.()()()1 1L()()()7 -1l.9996
23. The Gauss-Seidel mcthod produces
"
0
I
x, x, x, x,
0 0 0 0
0 0 50
12.5 18.75 68.7 5 71 .875
62.5
Thc ex-act solution is X I 25. Th~ Gauss-Seidd
,"
0
I, I, I, I, I,
0 0 0 0 0 0
.
•
5
6
7
8
9
24.219 24.609 74.609 74.805
24.805 24.902 74.902 74.95 1
24.95 1 24.976 74.976 74.988
24.988 24.994 74.994 74.997
24.997 24.998 74.998 74.999
24.999 24.999 74.999 75.000
2
21.875 2 1.4 38 73.438 74.21 9
= 25. X2 = 25, X, = 75, x4 =
m~thod
produces the following I t~rates: 2
3
•
21.25 11.25 24.6094 5.8594 14.6289 24.9072
22.8 125 13.3203 26.9873 8.2373 16.2829 25.3207
23.3301 14.6386 27.7303 8.9804 16.7578 25.4394
I
20 5 21.25
2.5 7. 1875 23.0469
75.
5
6
23.6596 15.0926 27.9626 9.2126 16.9036
23.7732 15.2732 28.0352 9.2852 16.949 1
25.4759
25.4873
"
7
8
9
10
\I
12
I,
23.8093 15.2824 28.0579 9.3079 16.%33 25.4908
23.8206 15.2966 28.0650 9.3150 16.9677 25.49 19
23.8242 15.3010 28JJ671 9.3172 16.%90 25.4922
23.82 52 15.3024 28.0678 9.31 78 16.%95 25.4924
23.8256 15.3029 28.068 1 9.3 181 16.9696 25.4924
23.8257 15.3029 28.068 1 9.3181 \ 6.9696 25.4924
t, I, I, I, I,
27. (a)
" "
x,
0
I
2
3
0
0
!
,!
I
,
,
1
1
,
, •
x,
•
5
6
~
-'-
n M
1.2
II n
II
0.8
",
•
"
"
0.4
02
0.4
+
= I x,+2x2 "" I
(b ) h ,
X2
n
0
1
2
3
4
5
6
7
x, x,
0 1
0 0.5
0.25 0.375
0.3 125 0.3438
0.328 1 0.3360
0.3320 0.3340
0.3330 0.3335
0.3332 0.3334
(c)
[Col um ns 1.2, and 3 of this table are the odd-numbered columns 1,3, and 5 from the table in part (a).J The iterates arc converging to XI "" X, = 0.3333. (d) XI = X, :::
!
21.
[~
o 3.
(, ) T
(g) T
5. [~]
2
(' ) F
II .x -2y+z= 0
25.
27. BA =
13. (a) Yes
15.lor2
17. If cl(u + v) + c2(u - v) ::: 0, then (c] + cJ u + (c i - c2 )v = O. Unear independence of u and v impliesc i + Cl:: Oandc] - C2 = O.Solvingthis system, we gct C I ::: C2 = O. Hence u + v and u - v arc linearly indepcndcn t. 19. Their ranks must be equal.
-6 12
I! ]
' · l I O)
0 13.
17.
0 0
[~
19
II.
0 0 0
0 0 0
2 - 12 - 8 0 0 0 0 6 0 + I - I I + - I -6 0 0 0 I 6 0 l A , + 3Al (where A, is the /th row AI - A2 + AJ - AI + 6A2 + 4AJ
,
3
-,
[-,
IS. [
2~ ]
1 2! 2 0 3 4 5 3 .. ..·· .. ·· 1.......... • ..
-!J
27
12~]
1.75
1.50
2.00 ] , BA = [ 650.00 1.00
675.00
i
33.
5
1
0 :1
2
o
1 :0
- I
~].AJ = [-~ -~J.
- I [ - I
A' [0 -I] A' [I A'::[_~ :] =
(b ) A"" = 37.
1.00
l
0
35. (a) A2 =
8
- 4'
•
I' 0
• .................
o
~]
19. B = [ 1. 50
- \
3. Not possible 7. [ 3
+ 6a J
29. If bᵢ is the ith column of B, then Abᵢ is the ith column of AB. If the columns of B are linearly dependent, then there are scalars c₁, . . . , cₙ (not all zero) such that c₁b₁ + ··· + cₙbₙ = 0. But then c₁(Ab₁) + ··· + cₙ(Abₙ) = A(c₁b₁ + ··· + cₙbₙ) = A0 = 0, so the columns of AB are linearly dependent.
31.
-~]
-,
x,
320
Exercises 3.1
[ 12
[~]
""
of A )
Chapter 3
s.
X2
-, ,
-6
9. {0,3, 1)
7.k = -\
- I
L[ 3 -5
-5
2
«) F
3]
23. AS = [2a l + a 2 - a j 3a l - a 1 (where a i is the ith column of A )
Review Questions
1. (a) F
x,
-2
A~
46250 ] . 406.25
Column i corresponds to warehouse i, row 1 contains the costs of shipping by truck, and row 2 contains the costs of shipping by train.
39. (a)
""
1
=
- \ '
I
[-~ _~]
1 [0 'I'] I
- I
I
- I
- I
I
- I
I
I
- I
\
- \
- I
I
- I
1
«)
o
0
0
0
I
I
I
I
2 3
4 9
8 27
16 81
Exercises 3.2
af(a' + b' ) 9. [ - hl( a! + bl )
L X = [~;]
b/(a' + b' )] al(a 2 + hI)
x,~ [_;lx,~ [ -~j. x, ~ [ _~]
13 (.)
-I
0 I
13. Linearly independent
15. Lmea rly independent
23_ a = d,e :::: 0
29. E=
25_ 3b = 2c,a = d-c
27_ a = d, b = c = 0 29. Let A =: [ ",,] and B = (b,,] be upper triangular tl X tl matrices and let I > j. Then, by the defin ition of an upper triangular matrix, a il :::: a iZ "" ••. = (1 .. , _ \ = 0 and b i) = b i+ \,j "" ... = b"l = 0
Now let C :::: An. Then e'l = alibI} + aj2b1} +
... a,. •_,b'_l.j + + ... + amb~J
+ a,.,. \ b ,t!.)
a,;b'l
(lm'O = 0
43. (b)
4 5 7 8
6
-
9
3 5
5 7
7
9
3
5
+
0 I
2
[ _~
-:J
5. Not invertible
- 1.6 0.3
-2.8 ] I
0 0
0j. A- ' - 2
37.
~
I
o o
0
lie
0
o o
I
-m: ~]
[I 0
+
1 = 0 as A(2 f - A ) = f.
- I
-2
0 I
- I
[1/«((-+ I)
51. 4 (((
s
53. Not invertible 0 I/ a 55.
- I/ d
I/O'
0
59.
0 0
I/ a
,a
+ 1)
¢
- a/(d +
-2
5
- 4
4
I
5
I
2 2
9
2
-2 - 2 - 4
0
3
0
0 I
0 0
0 0
0
0
I
0
- Il/d - bid
- c/ d II ()
1)]
I/ (d + I)
- il ti I/a
- II
I
3. Not invertible 7. [
35.
0 2 I
0 I 0
43. (a) If A is invertible, then BA = CA ⇒ (BA)A⁻¹ = (CA)A⁻¹ ⇒ B(AA⁻¹) = C(AA⁻¹) ⇒ BI = CI ⇒ B = C.
Exercises 3.3 I.
I
39. A=[ - II ~][~
57.
47. Hi,.,: Use the trace.
0
[~ ~ ]
10
41. Either A or 8 (or both ) must be the zero matri x. I
o
[J ~]
3 1. 0
49. [11]
"'*
2 3
I
47. If AB is invertible, then there exists a matrix X such that (AB)X = I. But then A(BX) = I too, so A is invertible (with inverse BX).
from which it follows that C is upper triangular.  35. (a) A, B symmetric ⇒ (A + B)ᵀ = Aᵀ + Bᵀ = A + B, so A + B is symmetric.  37. Matrices (b) and (c)
0
0 2 I
45. Him: Rewrite A2 - 2A
"" 0-b1j+ 0· hl/ +· _·Q·b,_ I., + a,, · O
+ a,.'+ I'O + ... +
33.
681
,d ,* 0
4
6 1. Not invertible
6
4
4
63.532
27.
065
69.
0 10 0 • o I *i ........ 0 0. ............... - 2 - 3 1[ 0 - \ - 2:0 I
-2
0 1• \
- 1
\
- \ !
1
0
I . Subspace
- 3/ 2 - 2
13.
5.
2
!] 3
-1
- I
- 3 3
2
I
1
I
o
1
o
\
- 2
0
o o
3
3
1
0
0
5
[-!
,\] A- I =
21. {[ I 0
{
[-5/12 1/12]
~ ,
0
1/6
0
I
0 001
o
0
o
I
000
010 23.100 o 0 1
o 25.
0
o
o o
000 010
1
I
0 o I - 1 5
I 0 0 I 0 o 0
o
0
0
001
I
o 1 o o o
I
I
0,
I ,
I
o
I
is a basis
1
I
-I J, [ I
1 l ]} isabasisforrow(A)j
[~]. [~]} ;" b,,;, focco l(A)
23. {( I
I 0
o
- I
0 0 I
2
I
o I o o
4
000
1]. [0
- I
0
0
1 0
o o
0
o
- I
1
ww(A )= {(a and {[
I
- I IJ. [0 1 - I - Ill ;' 100 0, 1 , 0 is a basis fo r 0 0 1
b
-a+2bll Both{[ : ].[~]}
~]. [ ~ ] } are linearly independent spanmng
setsforcol{A} = R2. I
-1 6
I
0
0
o
0
- I
25. Both {( I 0 -1]. [0 I 2ll.nd{[ 1 0 -I J. [ 1 I I ]} arc linearly independent spanning sets for
1 0
1 0
is a basis fo r col(A);
is a basis fo r null(A).
a basis fo r row(A)j coleA)
1 0 0 1 0 o 0 o 0 1 0
o
I
1/6
00 1 0 1 0 19. 0 1 0 I 0 0 100001
0
- 2 I
for row(A);
0
15. C' = [11 0]\ • V-I =
21.
I
I •. {(I 0 I OJ.[O I - I OJ. [O 0 0 ' ll ;"b,,;,
1 23 0 - 3 - 6 0 0 3 0 0 1 2 0 0 o 2 1 0 o 0 - 2 1 o 0
o
1
I 2]}isa basisforrow(A)j
for null(A).
1
o
0
7. Not a subspace not in row(A).
{[: J. [~]} is a basis fo r coleA);
- 2
100 1 0 0 1
17. {[ I 0 - IJ,[O
- IS
-1
0
6
15. No
- 7
- I
0
3. Subspace
5. Subspace
0
ExerCises 3.4
1 00 9. 4 1 0 8 3 \ 1 0 2 1 11. o 3
I
-I]
Exercises 3.5
II. b is in col(A), W IS
7. [ _~ ~] [~
- \
0] [ '
~
,
3.
I 0 ] [-2
I
• o 1 j - \ 0 ............... ............. o \ :• 0 0
71.
31. [
- I
27. I - I
o
I I
2 3
2 •• {( I
I
I
31. {(2
004
- I
- I ,
0
o o
I
OJ. [O I
OJ. [0 0 III - 3 1]. [1 - I OJ. [4 - 4
III
35. rank(A ) = 2, nullity(A) = I
15. [F)-
37. rank(A) = 3, null ity(A) = I 39. If A IS }X5, then rank(A) :s 3, so th('re cannot be more than tnr('(' lill early inclepend('nt colum ns.
19.
-I 0 ] [ 0
[~ ~]
17. [ D) =
I
[~ ~] stretches or contracts
III
the x-direction (com-
[~ ~]
4 I. nullity(A}:: 2,3,4,or 5
bined with a reflection in thc y-axis if k
43. If a = -1, lhen rank(A) = I; ifa = 2. then - 1,2, rank(A) = 3. rank{A) = 2; for (l
stretches or contrncls in the y-dirtttion (combined
"*
45. Yes 47. Yes 49. W IS in span(l3) if and only if the linear system ,,,ilh augmented matrix [6 1wi is consistent, which is true in thiS case, sinct I 0 3 I I I [B lw) - 2 o 6 - .... 0 1 - 2
o
with a reflection in the x-axis If k
[-~l
example. J' )'
(0. I),
5 1. ra nk(A) - 2, nulJity(A) = 1
t----"(I.
,
II
55. Let A₁, . . . , Aₘ be the row vectors of A so that row(A) = span{A₁, . . . , Aₘ}. If x is in null(A), then, since Ax = 0, we also have Aᵢ · x = 0 for i = 1, . . . , m, by the row-column definition of matrix multiplication. If r is in row(A), then r is of the form r = c₁A₁ + ··· + cₘAₘ. Therefore, r · x = (c₁A₁ + ··· + cₘAₘ) · x  57. (a) If a set of columns of AB is linearly independent, then the corresponding columns of B are linearly independent (by an argument similar to that needed to prove Exercise 29 in Section 3.1). It follows that the maximum number k of linearly independent columns of AB [i.e., k = rank(AB)] is not more than the maximum number r of linearly independent columns of B [i.e., r = rank(B)]. In other words, rank(AB) ≤ rank(B).  59. (a) From Exercise 57(a), rank(UA) ≤ rank(A) and rank(A) = rank((U⁻¹U)A) = rank(U⁻¹(UA)) ≤ rank(UA). Hence, rank(UA) = rank(A).
Exercises 3.6
(I. 0)
(0.0)
(0. k )
- I I
> 0)
.,
( 1.0)
(0.0)
( 1. 0)
[ 0 /2 1/ 2] 21. -I / 2 0 / 2 0
( I. k)
[; :], (k
- I
(1.0)
" (I. I)
25. [
.,
(0.0)
J'
-t -:J [-,i n
23. [
-~]
27.
3Q5 . T] =
[-84 ~l
33. [5 ' T) =
[~
35, [5 ' T] -
1. 11") = L~].11V) = [~]
[~
(k+l.l)
,
= ' I(A I ' x) + ... + ' ..(A ... · x) = 0
13.
(1.'. 1)
(k> 0)
53. rank{A) = 3, nullity(A) = 1
II. [ :
[~ ~] ISa
0]. . t he y- d lfectlOn. ' . F'or I IS a SIlear III
· . [ k' x- d Ifecllon;
0 0 From this reduced row echelon form , it is also clear
Ih" [w). =
0);
0);
[~ ~] is a shear in Ihe
refl ection in the li ne y = x;
o
-I 2
<
<
-~]
6
-2
I - I
0
- I
I
0
0
- ]
I
[ - 0 /2 1/ 2] 37. 1/ 2 V3/2
[ - 0/ 2 - 1/ 2] 39. 1/2 -V3/ 2
45. In vector form, II'I the parallcl lines be given by x = P + rd and x' = p ' + rd. Their images are
T(x ) ~ ' I P + td ) ~ T(p) + (r(d ) , nd T(x' ) ~ T(p' + td ) ~ T(p') + tT(d ). S' ppose T(d ) O. lf T( p' ) - '1'(p) is parallel to T(d ), then the images rep-
*
resell! the same linc; o therwise the images represent distinct parallel lines. On the other hand, if T(d ) = 0, then the images represent two distin ct points if T(p') +- T(p ) and single point o therwise. 47.
II'
-
(2. 31
...
0.08 0.09 0. 11 II. (a ) P = 0.07 0. 11 0.05 0.85 0.80 0.84 (b) 0.08, 0.1062,0.1057. 0.1 057, 0.1057 (c) 10.6% good, 5.5% fair, 83.9% poor
15.4
..,1
17. 9.375 500
19. x L =
'"
of
720
70
, X2
=
350 , Xl
50
49.
I
(2. - 3) X7
('" ~) ,
x.D
XI= [~ ]. Xl = [:~].
=
l3~~]. x. = [~:]. ~ = [I~:].
10240] = [10240 '
(b) The first population oscillates between two Slales, while the second approaches a stead y state.
'---~ x
23. The population oscillates through a cycle of three slates (fo r the relative po pulation): [f O. 1 < s :S I, the actual population is growing; if s = 0. 1. the actual population goes through a cycle of length 3; and if o ::$ s < 0. 1, the actual population is d~cli n in g (and will ev~ nt ua ll y die o ut).
H, -:l y
0 25. A =
29.
I
0
I
I
0
I
0
0
I
0
I
I
0
I
0
27. A =
5. x l
=
0.6'
x '" 1
ISO 120 , x z 120
0.62
155 120 11 5
3.64%
7.
fi
I
I
I
I
0
I
I
I
0
0 0 I 0
I
0 I 0 0
°0
",
3 1.
Exercises 3.7
0.4 ] [0 . 3 8] [
0
I
I
I.x = •
11 75 50. 175
Xl= [2~J. x. = [ ::~J. xs == [:J.~ = [::~J.
y
5 1.
=
35
2 1. (a) Fo r Ll' we have
( - 2. - 3)
(b) 0.353
13. Thc entries of the vecto r j P are just the column sums of the matrix P. $0 P is stochastic if and only if jP - j.
y ( - 2, 3)
9. (a) P = [ 0.662 0.250 ] 0.338 0.750 (c) 42 .5% wet, 57.5% d ry
'"
I
I
I
33. A
~
0
I
I
0
0
0
0
0
0
I
0
I
I
0
0
0
35. A
~
0
I
0
I
0
I
0
0
I
0
55. (AAᵀ)ᵢⱼ counts the number of vertices adjacent to both vertex i and vertex j.
I
I
0
0
0
57. Bipartite
I
0
0
0
I
I
0
0
0
0
61. A single error could change the code vector c₂ = [0, 1, 0, 1] into c′ = [1, 1, 0, 1]. However, c′ could also be obtained from the code vector [1, 1, 1, 1] via a single error, so the error cannot be corrected.
,>
37.
59. Bipartite
I
I
I
I
0
I I
63. 0
",
"
65.
0
39.
I
I
I
I
I
67. Pc ' =
is the second colu mn of P, so the error is
0 I
in the second componenl of c' . T he correct message vector (from the first four componen ts o f the corrected I
o o
vector) is the refo re x =
o
"2
41. 2
45. 0
43. J
69. ( ')P ~[ 1
47. 3
51. If we usc direct wins only, PI is in first place; Pl' p~, and P6 tIe for second place; and P I and Ps lie fo r third place. If we combine direct and indirect wins, the players rank as follows: P2 in first place, followed by P6' p., PJ • PSt and PI'
53. (a)
0 0
8,"
A~
0 I 0
I
I
I
I
IJ
I(}(}OOO
49. (a) Vertex i is not adjacent to any other vertices.
Ann
I
0
I
0
I
0
I
I
0
0
0
0
I
0
I
0
0
I
0
0
0
Carl3
(b) two steps; all of the off-diagonal entries of the second row of A + A² are nonzero. (d) If the graph has n vertices, check the (i, j)-entry of the powers Aᵏ for k = 1, . . . , n − 1. Vertex i is connected to vertex j by a path of length k if and only if (Aᵏ)ᵢⱼ ≠ 0.
o
(bl G
~
1 00
0 0 0 0 1 00()
0
0
0
o
0 0 0 1 0
I
0
0
o
0 00 0 I 111111 (c) The columns of Pare no t d istmct.
o 71. (a)
o o
0
I
I
I
I
00110011 o 1 0 10 1 01 0 ' 0'0 '0' 1'1'1' 1 00
1
11100
o
1
0
I I I
0 I 0
o
1 1 I 0 correcting code.
o
(b) P ~
0 I
0 0
°
0 1 0; no t an erro rI
73. One set of candidates for P and G IS 11 p=
1 11
0
1 01
1
I
I 0 1 0 0
1 011011010 010
1 000
00001
1
1 0 0 0
o
I
1 111
o
0 0 0 0 o I 000 0 o 0 o 0 I o 0 o o 0 000 I o 0 0 0
0 0 0
0 0 0 0
0 0 0 0
000 I o 0 0 o 0 o o 0 0 0 I o 0 o o 000000 I o 0 0 o 0 0 0 0 0 0 1 o 0
0 0 0 0 0
0
0 0 0 0
o
G=
o o o
111
I
0
I
I
o
o o
I
0
I
000
010
11 I
o
I
000
I
o
1
I
I
I
I
I
I
I
I I
17. Because A has n linearly independent columns, rank(A) = n. Hence rank(AᵀA) = n by Theorem 3.28. Because AᵀA is n×n, this implies that AᵀA is invertible, by the Fundamental Theorem of Invertible Matrices. AAᵀ need not be invertible. For example, take A =
19.
I
o
I
A =
I
0 0 0 0 0 0 0 I o 0 0 0 0 0 0 0 o I o 0 0 0 0 0 0 0 0 0 I
11
[~ ~].
1 10010
1 0
and I
15. An invertible matrix has a trivial (zero) null space. If A is invertible, then so is Aᵀ, and so both A and Aᵀ have trivial null spaces. If A is not invertible, then A and Aᵀ need not have the same null space. For example, take
[~].
- 1/5v1 - 3/5v1]
." [ 2/ 5v2
. '"2 6/5v
Chapter 4

Exercises 4.1

1. Av =
[ ~] = 3v,A == 3
3. Av = [
-!]
= - 3v, A = - 3
6 nC I/jew
Questions
1. (a)T
5. Av =
(el F
(g) T 11
(f!) T
5. [ _~
3. ImpossIble
(i) T
~i]
o" -9
10] 2S
9.
2
4
I
-6
- 3
= 3v, A = 3
3 7.
1
[~]
II.
I - I
11. Because (I − A)(I + A + A²) = I − A³ = I − O = I, (I − A)⁻¹ = I + A + A².
13. A basis for row(A) is II I. -2,0.- 1, 01,10,0, 1,2,01, 2 5 5 [0,0, 0, 0, II I; a basis for col(A) is I , 2 , 1 436 (or the standard basis for Rl); and a baSIS fo r null(A) is 2 I I 0
o o o
- 2 I
0
15. A = 0,
~ = span( [~]} A =
I, £1
= span( [~])
17.A =
2,~ = span ([~]} A = 3,E} = span ([~])
19. v =
[~]. A =
1; v =
[~]. A =
v1] A= 2; v 21. v = [ 1/ 1/v1'
=
2
[-I1//v1 ] A= 0 v1'
~ = span([;]} A =
23. A == 2,
Exercises 4.3 3, Ej = span( [:] )
I. (a) ,\2 - 7A
y
1<) 4
2x
2
+
(b) '\ == 3, 4
12
E,~ ,p"n( [:]); E, =,p'n ( [ : ])
(d ) The algebraic and geometric multiplicities 3re 311 I. 3. (a) _ AJ + 2,\ 2 + 5A - 6 Ib ) A ~ -2. I.J
J,
,
I
1
o '--t'-+--+--+_ 4 2
- 3
1<) E_2 = span
x
;E,=span
0 0
0
~ = span( [~])
25. A :: 2,
&11
Ex~rcises
; E,
=
I
Span
2 10 (d ) The algebraic and geometric mul tiplici ties arc
).
all I. 5.(a) - A'+ A2
(b ) A == O, 1
2 1
'" 2
,
1<)
~
= span
I,
E, +. = span ( [ : ] } ,\ = 1 - i,
£,-. =
+ I,E,+, =
span([:]} .\ "'"
1 - i,El _' =
Exercises 4.2
3. 0
17. 7
°
31. 39. -8
5. - 18
7.6
11. li b + aEf
13.4
I S. alnlg
25. 8
27. - 24
29. 0
33. -24
35.8
37. -4
45.k * O,2
47. - 6
49.
:=
(det A)( det 8) "" (det
1<) & = span
59.x = - I, y=O,z= I
, _1,
-I
63.
0
I
- I
0
0
I
61.
+
27
0
0 •
I
I
0
multiplicity 2. 9. (a) A4 - 6A' + 9Al + 4..\. - 12 Ib)A = - 1.2.3
m(det A) "" det( BA)
[t -1]
o o
(el L, = span
-2 I
-f
57.x = ~'Y""-}
55. 0, 1
!
I
- I
I
;
~
= sll;\n
- I
o o
•
•
o
51. (-2)3" 53. del(A 13)
°
0
(d ) A = 3 has algebraic multipliCIty 3 and geometric
33. A = 4
3 1. A == 1,2
9. -12
I
7. (a) _ AJ + 9 Al - 27.\ Ib ) A = 3
,p,,([-:]) 1. 16
; £ , == sp'H1
multiplicity I.
,p'n( [~J) 29. .\ = I
- I
h:ls algebra ic multipl ici ty 2 and geometric multiplicity 1; A = 1 has algebraic and geometric
(d) A =
27. A == 1 +
I
E, ""
span
o 2 I
(d ) A "" - J and A = 3 have algebraic and geometric
multlpheit y I; A = 2 has algebraic multipliCIty 2 and geometric multipliCity 1.
688
Answers 10 Selected Odd-Nu mbered Exercises
II. (a) A~ - 4A 3 + 2A2 + 4A - 3 (b)A = - I , I ,3
Exercises 4.4 I. The characteristic polynomia l of A is A2 - SA + 1,
but that of S is Al - 2A
0 0
(cl £_1 = span
EI
5. AI = 4,
- 2 0
= span
3. The eigenvalues of A are A = 2 and A = 4, but those of Bare A = I and A = 4.
,•
0 I
,
I
-2 2 0
3
.,
7. AI = 6,
27.
1
0
o
I
I
I
- 2
_!]}
0 0 0
15. P =
A = 5, E5 =
35839 17. [ - 11605
-i, E_I/2 =
span ( [
_~] } A = L EI/5 =
,p,,{[ ,p,n([:]) E"
=
_:]),A = 7,
0 I
0 0
0 0
I
,_ A3
-
0
I ,D =
0
- I
0
0
0
0 I
3).2
- I
0 ,D = 0 I
2 0 0 2 0 0 0 0
0 0
0 0
-2
0
0
-2
- 69630 ] 24234
100 0 I 0
o
0
I
23. (2' - ( - 3)')/ 5 (-5 + 210 , + (-3)')/10] (2'" - 2( - 3)')/5 (2' + 4( - 3 )')/5 (2' " - 2( - J )1)/5 [ (-5 + 2 '~' + (-3)')/10 (2' - (- 3)')/5 (5 + 2" / + (-3) ' )/10
(5 + 2'" + (-3)')/ 10
+
4A - 12
35. A2 = 4A - 51, A3 = 11 A - 201
A4 = 24A - 551 4 II , A -· = - -A + -1 5 ' 25 25
+ -4 /
0
0
25.k = O
o o
2
{3' + 3( - 1)')/4 {3'" - 3(-1 )')/ 4] 19. [ (3' - (- 1)')/4 (3'" + (- 1)')/ 4 2 1.
- 12
I 1 37. A- = - - A 5
I
- I
I
'P'"([ :]) -3 4
I
(2 . 3 10 - 1)/3!O
'P'"([ :])
'" =
11. P =
I
13. Not diagonalizable
23. (a) A = -2,E_1 = span([
(iii) A = 0,
0 - I
2 2
(b) (i) A =
; A1 = - 2' £ _2 =
9. Not diagonalizable
17.
3.210
2 I
0 I , - I
span
(d) A = - 1 and A=::3 have algebraic and geometric multiplicity 1; A =:: I has algebraic and geometric multiplicity 2.
15.
= span
3
I
[ -2-' +
~
3
2
T'+ J.21O]
£~ = span([ : ]}A1 = 3' £ 3 = span ([~]) 3
0 0
& = span
+ I.
27. k = 0
29. All real values of k =I: - 2 35. If A - B, then there is a n invertible matrix P such that B = f' lAP. Therefort, we have
,,(B) = "pr ' AP) = "pr '(AP)) = t«(APW ') = tr(App- l) = tr(Al) = tr(A)
usi ng Exercise 45 in Section 3.2. 37. P =
[710 -'] -3
,, -,, ,, ,, -~ -,,
39. P ==
0
3. (a) [' ],2.6IS 0.6 1S (b) A, ~ (3 + Vs) / 2 - 2.6 18
1
0
49. (b) dim E_I = l , dimE. ""
2, d im~
= 3
5. (a) 1115 = I I.0DI,ys = [- 0.333] 1.000
Exercises 4.5 I. (a) [1
2.5
1 7. (a)
] ,6.000
11/8
= 10.000, Ya =
0
1
(b)A I = 6
9. k
0
1
2
3
4
5
x,
[:]
[2:]
[17.692] 5.923
[I S.0IS ] 6.004
[17.999J 6.000
[ IS.000 J 6.000
y,
[:]
[~.30S
]
[~.333 ]
[~.333 ]
IS.0IS
17.999
IS.OOO
Ink
[~.335 J [~. 333
]
26
1
Therefore, ..1. 1 = IS, V I
17.692
....
[I
D.333
l
0
1
2
3
4
5
6
x,
[~ ]
[~]
[ 7571] 2.S57
[7755 ] 3.132
[ 7808 ] 3.2 12
[ 7823] 3.234
[7 827] 3.240
y,
[~]
[~.2S6
[~.377]
[~4~]
[~.411 ]
[~.4 13]
[~.414]
/11 k
1
7.57 1
7.755
7.808
7.823
7.827
11. k
]
7
The refore, AI .. 7.827, V I = [ 1 ]. 0.4 14
13. k
x,
y. /I1t
o
1
2
3
4
5
1
21
16.809
17.0 11
16.999
17.000
1
15
12.238
12.37 1
12.363
12.363
1 1
13 1
10.71 4 1
10.824 1
10.818 1
10.8 18 1
1
0.7 14
0.728
0.727
0.727
0.727
1
0.619
0.637
0.636
0.636
0.636
1
21
16.809
17.0 11
16.999
17.000
1
Therefore, AI .... 17, VI =
0.727 0.636
1 15. A) = 5,v 1 =
0 0.333
17. k
x.
0
1
2
3
4
5
6
[~ ]
[~]
[757 1] 2.857
[7.755] 3.132
[ 7808] 3.2 12
[ 7823] 3.234
[ 7.827] 3.240
7
7.755
7.823
7.828
7.828
7.828
7.828
R(x.J
Y.
[~ ] [~.286 ] [~.377 ] [~.404 ] [~.4 11 ] [~.413] [~.4 14] o
1
2
3
4
5
1
21
16.809
17.011
16.999
17.000
1
15
12.238
12.371
12.363
12.363
1 16. 333 1
13 16.998 1
10.714 17.000 1
10.824 17.000 1
10.818 17.000 1
10.8 18 17.000 1
1
0.714
0.728
0.727
0.72 7
0.727
1
0.619
0.637
0.636
0.636
0.636
19. k
x. R( Xk)
Yo
2 1. k
x, y,
m,
0
1
[:] [:] [:] ~8] [~.667
[4 8] 3.2
[
1
5
Since Al = A2 = 4, V I =
3
4
5
6
7
8
[ 4.667 ] 2.667
[ 4.57 1 ] 2.286
[ 4500] 2.000
[ 4.444 ] 1.778
[ 4400 ] 1.600
[ 4 364 ] lASS
[~A44 ] [~AOO ] [~.364 ]
[~.333 ]
2
4.8
]
[~.571 ] [~500 ] 4.667
4.57 1
4.500
4.444
4.400
4.364
[~ ], mk is converging slowly to
the exact answer.
23. k
x,
Y. "'t
o
I
2
3
4
5
6
7
8
1
4.2
4.048
4.012
4.003
4.001
1
5 4
3.2
3.048
3.012
3.003
3.001
4.000 3.000
4.000 3.000
1
1
0.2
0.048
0.012
0.003
0.001
0.000
0.000
1
I
I
1
1
1
1
I
1
1
0.8
0.762
0.753
0.75 1
0.750
0.750
0.750
0.750
I
0.2 5
0.01 2 4.048
0.003
1
0.048 4.2
0.001 4.003
0 4.001
0 4.000
0 4.000
4.0 12
\
o, \ o 0
In th is case, Al = A2 = 4 and E4 "" span Clearly,
lIIi LS
converging to 4 and
Yi lS
0
converging to a
\ vector in the eigenspace E4- namely, 0
0
+ 0.75
o
o
\
l:] l:]
[~] [~]
25. k
y,
\
I 0
J
2
4
5
[~] [~] - \
\
[~] [~ ] - \
\
\
The exact eigenvalues are complex (i and −i), so the power method cannot possibly converge to either the dominant eigenvalue or the dominant eigenvector if we start with a real initial iterate. Instead, the power method oscillates between two sets of real vectors.
27. k
x, y, lilt
o
\
2
J
4
5
\
3
2.500
2.250
2.125
2.063
\
4
4.000
4.000
4.000
4.000
I
3
2.500
2.250
2. 125
2.063
\
0.750
0.625
0.562
0.53 1
0.516
\
\
\
\
\
\
\
0.750 4
0.625 4
0.562 4
0.531 4
0.5 16 4
\
The eigenval ues are AI "" - 12, A2 = 4, A) = 2,with corresponding eigenvectors \ \ \ VI
=
2 , v, =
0
I
\
Since x₀ is a linear combination of v₂ and v₃ only, the initial vector x₀ has a zero component in the direction of the dominant eigenvector, so the power method cannot converge to the dominant eigenvalue/eigenvector. Instead, it converges to a second eigenvalue/eigenvector pair, as the calculations show.
29. Apply the power method to A - 181 =
0
k
y,
m,
- I~]
[ -~8]
[
x,
3
152] - 19
[
-~ 8 ]
[
- 10
I
- 15 .
l52] - 19
-~8] - 19
- 19
T hus, - 19 is the dominant eigenval ue o f A - 181, ,md A.) = - 19 + 18 = -I is the second eigenvalue of A.
-8 31.ApplythepowermethodtoA - 17/-=
4
8
4 -2 - 4
8 -4 -8
33.
k
x, y, /Ilk
0
I
[:] [:] I
k
0
I
2
3
I
4 - 2 -4
- 18
- 18 9 18
I - 0.5
I
I
- 0.5
- 0.5
- I
- I
- 18 - 18
- 18 - 18
I I
[
[
5
2
I
[:] [:]
[-. "]
[ - O.J 0=]
I
y,
I I I - 0.667
m, R(x,)
- I 4
- 18
In this case, there is no dominant eigenvalue. (We could choose either 18 or −18 for mₖ, k ≥ 2.) However, the Rayleigh quotient method (Exercises 17-20) converges to −18. Thus, −18 is the dominant eigenvalue of A − 17I, and λ₂ = −18 + 17 = −1 is the second eigenvalue of A.
2
3
[ - 0833]
[ 0.798 ]
1.056
9 18
5
4
[
- 0.997
[-:]
[ - ~.7S9 ]
[ - ~.SOI]
0.5
1.056
- 0.997
[
0800 ]
[
0800]
- 1.000
- 1.000
- ~.SOO]
[ - ~.800 ]
- 1.000
- 1.000
Thus, the eigenvalue of A that is smallest in magnitude is 1/(−1) = −1.
35.
k
o
5
O. I I 1 -0.500
0.500 0.259 - 0.500
0.500 0.160 - 0.500
- 1.000 -0.667
- 1.000 - 0.222
- 1.000 - 0.5IS
- 1.000 -0.321
1.000
1.000
1.000
1.000
- 0.500
- 0.500
- 0.500
- 0.500
2
I
- 0.500
I - I
0.000 0.500
0.500 0.333 - 0.500
I
- 1.000
I
0.000 1.000
x,
y,
- I /ti t
•
I
I
-0.500
Clearly, mₖ converges to −0.5, so the smallest eigenvalue of A is 1/(−0.5) = −2.
3 0.500
Exercises 4. 6
37. The calculations are the same 3S for Exercise 33.
39. We apply the inverse power m ethod to A - 51 = - I 0 6 1 - I - 2 1 • Taking Xo "" I ,we have
•
k
o
- I
3. Regular
5. Not regular
7. L ""
I
0
,
I
0.200
- 0.080
I
- 0.500
- 0.500
I
0.200
- 0.080
- 0.500 0.032
I
- 0.400
0. 160
- 0.064
I
I
9. L ""
3
I
I
I. Not regular
0.0321
J
I
I
- 0.400
0.160
- 0.064
I
- 0.500
- 0.500
- 0.500
11.
0.304
0.304
0.304
0.354
0.354
0.354
0.342
0.342
0.34 2 I.
I '[~]
13. 2,
17.
Clearly, mₖ converges to −0.5, so the eigenvalue of A closest to 5 is 5 + 1/(−0.5) = 5 − 2 = 3.
47.
)'
,
<
I 15. T he population is inc reasing, decreasing. and constant, respecti vely.
I
41 .0.732
Un
o
0
o
0
..
o.
o
o
o
o.
•
• •
o o
o
o
I
o
The characteristic polynomial of L is (λⁿ − b₁λⁿ⁻¹ − b₂s₁λⁿ⁻² − b₃s₁s₂λⁿ⁻³ − ··· − bₙs₁s₂···sₙ₋₁)(−1)ⁿ.
19." - 1.746, p -
0.660 0.264 0.076
0.535 0. 14 7
-2
0.094 21. A .. 1.092, P =
0.078 0.064
49.
0.053
0.029 25. (a) I. "" 0.082
,•. 3. [!]
, j
, ,
31. 3, !
33. Reducible
j
51. H int: Show that 0 IS not contained in any Gerschgorin d isk and then apply Theore m 4.16.
53. Exercise 52 implies that lAI is less than or equal to all of the column sums of A for every eigenvalue A. But for a SlOchastic matriX, all co lum n sums are 1. Hence
IAI s
I.
•
35. Irreducible
41.1 ,2,4,8, 16
43. 0, 1, 1,0, - I
45. x~ "" 4" - (- 1)"
47. Yn = (II -
4'. b"
~
Dr
,':""((1 + 0)" - ( I -
0)"J
55. (a) tl. = I.d~ = 2. d) "" 3,d~ = S, ds :;;: 8 (b) d~ = tl"_1 + d,._: (e) d.=
~[( I
+,v'S)"" - (1-,v'S)""]
7. Ax ""
57. The general solution is x(t) "" - 3C.c- ' + G.!c·',
y( t) :;;:
2e.e- ' +
-3 c '
+ 3t',y( t) ""
C:e4 '. The specific sol ution is x( r) 2c- '
5. Since Aᵀ = −A, we have det A = det(Aᵀ) = det(−A) = (−1)ⁿ det A = −det A by Theorem 4.7 and the fact that n is odd. It follows that det A = 0.
=
+ 3l'.
61. The general solutio n is x( t) = - C 1 + Clc- ', y( t) = C1 + C2c' - Cle- ', z( t) :;;: C1 + C 2e'. The specific solution is x( t) = 2 - c- ', y( t) = - 2 + c' + c-I, Z( I) = -2 + c'. 63. (a) X(I) = -120t tf5 + 520~ 1"IO.Y(I) == 240e"tf5 + 260 111/ 10• Strain X dies out aft er a pproximately 2.93 da ys; strain Y contm ues to grow.
65. (/ - 10, b = 20;x( l) = lOe'(cost + sin t ) + iO, y( t) = JOe'( cos I - sin () + 20. Species Y dies out when t = 1.22. 69. X(I) = C, C11 + (ie' 75. (a)
[:}[:][:][~;J
(c) repeller
77. (a)
[:}[:}[:}[:J
(e) neither
79. ( [IJI ' [0.5 J[1.75}[3.125J - 1 ' -0.5 - 1.75 81. [IJ[0.6}[0.36}[0.216J a)
(a )
0.6
1
83. r ""
Vi,o =
0.36
0.216
I
0
0.1
3. - 18
158
I -I , ° ° 2
, £_2 = span
I
15. not similar
not sim ilar
17.0, I, or - 1 19. If Ax = Ax, then (A 2 - SA + 2I) x = A~ - SA x + 2x = }2X - S(3x ) + 2x "" -4x.
Chapter 5

Exercises 5.1

1. Orthogonal
[wi. =
3. Not orthogonal
[-:J
J8 = ° i
9. [w
5. Orthogonal
II. Orthonormal
I
1!3 2!v'S 2!3v'S 13. 2!3, -1!V5 , 4!3V5 2!3 ° -5!3v'S 15. Orthonormal
(e) anraClor
cos 0 sin fJ 19. Orthogo nal, - cos O - S1l\1 0
cos2 0
sin 0
sin 0 - cosO sinO
cosO
°
21. Not o rthogonal
(Qx) · (Qy) '7. ,",(L (Q x, Qy)) = IQxll QYI
- O. IJ spiral attracto r
0.2
o rbital cen ter
(e) F
162J [
17. O rthogonal, [ 1/ Y2
89.P = [1!12 -Vi!2} C =[ 1!2 -Vi/2} ° Vi!2 1!2 Review Questions I. (a) F (e) F
II.
(c) saddle poi nt
45°, spiral repeller
- I} C = [0.2
I -I ° 13.
I!V>
85. r = 2,0 :;;: - 600, spiral re peller 87. p = [ - 1
A)
-
(e) EI = span
+ V2)ev'l '/ 4 +
0V""!4, xl I) = 0,/i'!< - 0,-Vi'!4.
(2 -
= 5x, A .. 5
9. (a) 4 - 3A 2
59. The general solution is xl(t) = (1 + v1)C1eVir + ( I - v1)G.![ v'l " x2(t) = CleVi , + C I e- Vi ,. The specific solutio n is xl ( / ) = (2
L5o]
(g) T
(i) F
(Qx)'Qy = -:-F~B.~.,.. V(Qx)'Q x V(Qy)'Qy
29. Rotation , 0 33. (a) A(AT
= 45
0
31. Reflectio n, y
+ nT) B = AATB + ABTB =
= V3;.;-
18
25. No
+ AI =
Exercises 5.3
B+ A = A+B det(A + 8} = det(A(A' + 8')8}
1
"" det A det( AT + 8 1)det B = det A det« A + 8 )1')det B = del A det(A + B)det B
3. v, ·
Assume that det A + det B = 0 (so that det B = −det A) but that A + B is invertible. Then det(A + B) ≠ 0, so 1 = det A det B = det A(−det A) = −(det A)². This is impossible, so we conclude that A + B cannot be invertible.
ql =
I
I I
I
3
9. y
,
: X . '.y=I,Z= - 1 . 8 1 ",.
0 I , 3 0 I
,
I
7. mw(A),1[ 1 0 11.10 1 -'II. null (A), 1 5 0 • - I
2
1
,
- [
• null(A T ) :
I
- 1
1 - 10
19. v = [
13.
- 1/\/2 1/\/2
,,
_ .1:
7. v =
•
••
2
I
,
_1
•
,
!
,I,
+
," -"Jl, •
-I _ll
_1.
•" _1 ",
•
"o , •
1/ 0 I/V6 -2/...,16 13. Q = 0 1/ 0 1/ ...,16 - I/ Vi 1/ 0 0 Vi I/ Vi I/Vi 2/ ...,16 1/ 0 0 3/ V. 1/ ...,16 15. I/ Vi -1/ ...,16 1/ 0 0 ,/ 0 I/Vi 1/ ...,16 -I/0 0 3 9 0 6 0 0
,,1 , ,
I
1/ V2 - 1/ -.16 -1/2'113 o 2/ V. - 1/20 o 0 3/20
-3 1 0 1 0 • 3 0
3
, - 2 ,,
2 1.
\I
==
, 0 _1,
5.4
1
I
+
0 2/ V(,
1/ 0
1/\/2 1/ v'6 -1/V. 1/0 - 1/0 l/ Vi
23. Let Rx = 0. Then Ax = QRx = Q0 = 0. Since Ax represents a linear combination of the columns of A (which are linearly independent), we must have x = 0. Hence, R is invertible, by the Fundamental Theorem.
,1
17.
=!] + [ -ll
- I/ YJ
o
, ,1
•
- 1/ '113,
; q l=
I
_1
I/ Vi
- 4
-,,
[1]
- I
I/YJ
19.A = AI 21. A-I = {QR) -I = R- 1Q I _ R-I QT =
- 4
15.
•
17. R =
-5 1 0 1 2 • -7 0 1 11 .
\
5
I
y :x - y+3z - 0 ,8.1·
9. ,o[(AI'
- I
I •
11 .
- I
X
5. W-'-=
1 , vJ'"
0 0
X
0
- \ , v2 -
I I •
5.
Exercises 5.2
2
2/V. 1/\I6, q, 1/\16
"*
3. \\1.1 =
[-1I//2] [ 1/ \/2] [- 1 1// V2 \/2] Z; ql= I/ Vi, q2""
I] I.v l =[I ' v2 -
(b) From part (a),
I
0]
_ [ I/ Vi I/ Vi] D _ [5 . Q - I/ Vi - I/ Vi ' - 0 3
'/V3] [2 -21 V(, , 0 = 0
2/V(,
3. Q = [ I/V3 I
100
o
0
5 0 0 2
o 110 o 1/ 0
- 1/0,0= 1/0 - 1/0 0 1/0
5. Q =
o
7. Q =
I
o 1/0 o 1/ 0 o -1/0 o 1/ 0
110 1/ 0 1/ 0
9. Q =
,D=
0
o o
o
1/0
0 4 0 0
o o
3. G' =
o o - 2 0
7. P' =
010 000
o o
I 0
1 0
I 0
0 0 1 0, C is equivalent to Chut
o
I
0
0
o
1/0 ' -1 /0
9. Cl. =
0b]= D
,-
13. (a) If A and B are orthogonally diagonalizable, then each is symmetric, by the Spectral Theorem. Therefore, A + B is symmetric, by Exercise 35 in Section 3.2, and so is orthogonally diagonalizable, by the Spectral Theorem.
19. A =
21. [
-t
0 0 0
0
0
0
-!]
+
0 0 0
2
23.
I
0
0
I
I
0
+
0 0 0
0
0
- I
I
,, -,, -,, , ,
2
, _1,
, 1, ,1 I
I
-I
I
0
r
I
o
0
0
0
0'
1 '0'
o
o
I
0
o
I
1
I
I
I
I
0
o
I
I
0
I
I
23. 2X2 + 6xy
o
, 1" =
[I
, P' =
[I
1
I
1 I I
+ 4y 2
-l]
1101000 I 0 I 0 1 0 0 0110010 11
10001
25. 123 29.
- I
o
0 0
27. -5
3 1. [ _~
1
~]
[~ 5
I
-2
I
- I
2
- 2
2
2
33.
1/ v'S] , , v'S 35. Q = [2 11 / v'S -2/ Vs ' YI + 6Y2
ExerCISes 5.5 I. G' =
0 0 2 2
0
o
1 0 o I I I
+
5 0 0
I
0 1 0 00 1 1 1 0 , pi=
i] [-t -n
[~
I
0,0,0,0
I
15. If A and B are orthogonally diagonalizable, then each is symmetric, by the Spectral Theorem. Since AB = BA, AB is also symmetric, by Exercise 36 in Section 3.2. Hence, AB is orthogonally diagonalizable, by the Spectral Theorem.
r
0
o
0 10][,
17. A =
r
C"'C
o 2 0 0 0 = o 0 0 0 o 0 0 0 / , [1 1 b] 11. Q AQ 1/0 - 1/0 b a . 1/0] = [. + b -1/0 0
0, C′ is equivalent to C but C′ ≠ C
0 0 0   5. P′ = [1 0 1], C′ is equivalent to C but C′ ≠ C
2 00 0
[ 11// 00
I 0
,C'::C
2/ v'S 2/ v'S - 1/3 37. Q = 0 I Iv'S 2/3 1/v'S 0 2/3
, 9yf + 9y} - 9y;
l/ V(, I/ V; l/ Vi 2/ V(, ,2(x' )' + -lt V; o -lt V; l/ Vi - lt V(,
39. Q -
y. y •
(I)' - (r)'
3
41. Positive definite
43. Negative definite
45. Positive definite
47. Indefinite
51. For any vector x, we have xᵀAx = xᵀBᵀBx = (Bx)ᵀ(Bx) = ‖Bx‖² ≥ 0. If xᵀAx = 0, then ‖Bx‖² = 0, so Bx = 0. Since B is invertible, this implies that x = 0. Therefore, xᵀAx > 0 for all x ≠ 0, and hence A = BᵀB is positive definite.
53. (a) Every eigenvalue of cA is of the form cλ for some eigenvalue λ of A. By Theorem 5.24, λ > 0, so cλ > 0, since c is positive. Hence, cA is positive definite, by Theorem 5.24. (c) Let x ≠ 0. Then xᵀAx > 0 and xᵀBx > 0, since A and B are positive definite. But then xᵀ(A + B)x = xᵀAx + xᵀBx > 0, so A + B is positive definite.
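These positive-definiteness facts are easy to check numerically; the matrices below are arbitrary examples, not the ones from the exercises.

import numpy as np

B = np.array([[2.0, 1.0], [0.0, 3.0]])      # invertible (det = 6)
A = B.T @ B                                  # A = B^T B should be positive definite

print(np.linalg.eigvalsh(A))                 # all eigenvalues positive
x = np.array([1.0, -2.0])                    # an arbitrary nonzero vector
print(x @ A @ x, np.linalg.norm(B @ x)**2)   # equal and strictly positive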
-3
71. Parabola, x′ = x − 2, y′ = y
Y
1/ 0
•
- ,-4
73. Ellipse, (x′)²/4 + (y′)²/12 = 1
1/√3   57. The maximum value of f(x) is 4 when x = ±
y'
-2
l/ Vi]
[ ~j~J.
Y
j
l/V;
4
the minimum value of f(x) is 1 when x =
l/ Vi
o
+
or '::
- l/ Vi
61. Ellipse
-l/ Vi 1/ V2 0
63. Parabola
- 4
4
,
~/ .
65 . Hyperbola
X
-4
67. Circle, x′ = x − 2, y′ = y − 2, (x′)² + (y′)² = 4
75. Hyperbola, (x′)² − (y′)² = 1
+ 2. x'
2
55. The maximum value of f(x) is 2 when x = ±[−1/√2
the minimum value of f(x) is 0 when x = ±
.. x'
=
- icy')l
+ (y")'/IO = I 79. Hyperbola, (x' )' - (y")' = I
7. Theorem 5.6(c) shows that if v, · v} = 0, then
77. Elllp".(x'·)'/50
Qv,' Qv) = O. Theorem 5.6(b) shows that {Qv!, . ..• QVk} consists of unit vectors, because {VI> ... • vJ does. Hence, {Qv!• .. . ,QVk} is an orthonormal set. -I
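A quick numerical check of this fact (editorial; the rotation angle and the orthonormal set are arbitrary):

import numpy as np

theta = 0.4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthogonal
V = np.eye(2)                                     # columns form an orthonormal set
QV = Q @ V

print(np.allclose(QV.T @ QV, np.eye(2)))   # True: {Qv1, Qv2} is still orthonormal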
81. Degenerate (two lines) y
9
-2
{[-m
I!.
I
13. mw(A), {[I
2
-2
ool(A),
0
- I
2
I -5
-2
y
0 I , 0 0
null(A)' 2
--+--t--+-+-+- ,--+-+ x 2
-5 -3
null(A,),
-3 -2 0 , I 0
y
15. (a)
1
I
• 1
I
1
•
-,••
I
19. 89. Hyperboloid of one sheet, (X' )2 - (1)2
+ 3(i)2
= I
+ (y'? v3(1)2 + V3(Z)2
, 1, 1,
_1
0
,
o
-I 0
,,
-I ! -! o
- 4 - I 0 0 I
1
I
o
2 III
I
-I
17.
0
2 0 I
I
I
I
-I
0 85. Degenerate (two lines)
4J, [ 0
2
3
-2
2 3
I -I
83. Degenerate (a point)
- 2
3
o
0 0 I
91. Hyperbolic paraboloid, z = _(X')2
93. Hyperbolic paraboloid, x′ =   95. Ellipsoid, 3(x″)² + (y″)² + 2(z″)² = 4
Review Questions   1. (a) T  (c) T  (e) F  (g) F  (i) F
9/ 2 3.
2/3 - ll/6
5. Verify that QᵀQ = I.
Chapter 6   Exercises 6.1   1. Vector space   3. Not a vector space; axiom 1 fails.   5. Not a vector space; axiom 8 fails.   7. Vector space   9. Vector space   11. Vector space
15. Complex vector space
17. Not a complex vector space; axiom 6 fails.   19. Not a vector space; axioms 1, 4, and 6 fail.   21. Not a vector space; the operations of addition and multiplication are not even the same.   25. Subspace   27. Not a subspace   29. Not a subspace   31. Subspace   33. Subspace   35. Subspace   37. Not a subspace
39. Subspace
41. Subspace
43. Not a subspace
47. Take U to be the x-axis and W the y-axis, for example. Then [1, 0]ᵀ and [0, 1]ᵀ are in U ∪ W, but [1, 1]ᵀ = [1, 0]ᵀ + [0, 1]ᵀ is not.   51. No
53. Yes; s(x) = (3 + 2t)p(x) + (1 + t)q(x) + t r(x) for any scalar t.
55. Yes; h(x) = f(x) + g(x)   59. No
37. dim V = 3   39. dim V = 2   41. (n² − n)/2   43. (a) dim(U × V) = dim U + dim V  (b) Show that if {w₁, ..., wₙ} is a basis for W, then {(w₁, w₁), ..., (wₙ, wₙ)} is a basis for Δ.
'7.
{[~ ~]. [~ ~],[~ -~]. [~ ~]}
49. {1, 1 + x}   51. {1 − x, x − x²}   53. {sin² x, cos² x}   59. (a) p₀(x) = ½x² − (5/2)x + 3, p₁(x) = −x² + 4x − 3, p₂(x) = ½x² − (3/2)x + 1   61. (c) (i) 3x² − 16x + 19  (ii) x² − 4x + 5
ix
1'"-')
Exercises 6.3
61. Yes 1.
Exercises 6.2   1. Linearly independent   3. Linearly dependent;
- I
7
=
-2
2
3. [xl. =
Ps-c
19. Basis
21. Not a basis
23. Not a basis
25. Not a basis
-I -I 4
29. [pIx) l. =
0 ,[xJc -I I 0 0 I I 0 I
5. [p(x)].
13. Linearly dependent; ln(x²) = −2 ln 2 · 1 + 2 · ln(2x)   17. (a) Linearly independent  (b) Linearly dependent
-I
:c:
+ b! = 7x - 2(a - r)
11. Linearly dependent; 1 = sin² x + cos² x
PB-c =
7.
I
3
~
-I
,Pc_~
=
00 I 0•
I -I
0 -I
-I
~ [ _ ~]. [p(x) k = [ -~J. p,_. = [ -:
[~
[p(x)l. -
Ps-c =
I
:] I -I
I -I
0
I
I
I
6 -I
Pc_~ = [t
[_:].
I
5. Linearly independent 7. Linearly dependent; 3x 9. Linearly independent
{xl~ = [ ~]. [xlc =
[: -:]
[-I 0] 4[-I I] +
[~ ~]-2(_~ ~]
[Al. =
1 - Xl}
63. (pⁿ − 1)(pⁿ − p)(pⁿ − p²)⋯(pⁿ − pⁿ⁻¹)
57. No
27.
X,
45. {1 +x,l +:c+r,l }
45. Not a subspace
Then
35. dim V = 2, B = {I -
...
I
,[p(x) l, =
00 I 0 -1
I
0 I
• PC_8
=
~].
I I
00 I 0•
I
I
I
lt1
1
4
2 • [AJc = 0 - 1 0 -1
9. [AJ. =
,,
0
-3
-,,
0 -1
1 1
0 1 -1
,
1
11. (f(x) J. = [
. PC_
B=
,
_1
1 2 0 -1
_!], (f(x)]c = [_: ]. P'_B
2 1 1 1 1 0 1 0 0 0 1 1 = [:
P _[ 1 0] 8_C -
- I
~].
b [Z + ZV3]_ [5.464] () 2v3 - 2 1.464
=:H!n
+ 3(x + 1) - 3{x + 1)2 + (x + l )3
Exercises 6.4   1. Linear transformation   3. Linear transformation   5. Linear transformation   7. Not a linear transformation   9. Linear transformation   11. Not a linear transformation   13. We have
S(p(x) + q(x)) = S((p + q)(x)) = x((p + q)(x)) = x(p(x) + q(x)) = xp(x) + xq(x) = S(p(x)) + S(q(x)) and
S(cp(x))
15.
J-7 l ]=(a: J
9
(ka ) + (ka + kb)x
+ (a + b)x)
5 - 14x - 8.:C, Ja] IIb =
7b)x + (a
= kT( [:])
(a+3b) 4 -
~ b )x'
17. T(4 - x+ 3x 2 ) = 4
+ 3x + 5r, T(a + bx+ 0: 2)
(3a- zb - c).,?
1) [~ ] = [~ -~].(So 1)[;]= [~
-y
(T ∘ S)[x; y] does not make sense.   27. (S ∘ T)(p(x)) = p′(x + 1), (T ∘ S)(p(x)) = (p(x + 1))′ = p′(x + 1)   29. (S ∘
1)[;] s( r[;]) s( [-:X-:4J) =
=
Therefore, S is linear. Similarly,
T( [:]+ [~]) - T[::;]
=
[4(x − y) + (−3x + 4y); 3(x − y) + (−3x + 4y)] = [x; y]
(ToS)[;] = T(t]) T([~:;]) = =
[(4x + y) − (3x + y); −3(4x + y) + 4(3x + y)] = [x; y]. Therefore, S ∘ T = I and T ∘ S = I, so S and T are inverses.
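Reading the two formulas above as matrices, the claim that S and T are inverses can be confirmed directly (an editorial check, with the matrices taken from the formulas in this answer):

import numpy as np

T = np.array([[4.0, 1.0], [3.0, 1.0]])     # (x, y) -> (4x + y, 3x + y)
S = np.array([[1.0, -1.0], [-3.0, 4.0]])   # (x, y) -> (x - y, -3x + 4y)

print(S @ T)   # identity matrix, so S o T = I
print(T @ S)   # identity matrix, so T o S = I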
=
+ 0) + «a + 0) + (b + d))x (a + (a + b)x) + (c + (0 + dlx)
=
r([~] ) + r( [~])
= (a
Exercises 6.5   1. (a) Only (ii) is in ker(T). (b) Only (iii) is in range(T). (c)
k,~1) = ([ ~ ~
},rnnge(1)
3. (a) Only (iii) is in ker(T). (b) All of them are in range(T).
=
1
2x + Zy .
= S((cp)(x)) = x((cp)(x)) = x(cp(x)) = cxp(x) = cS(p(x))
=
19. Hint: Let a = T(E₁₁), b = T(E₁₂), c = T(E₂₁), d = T(E₂₂).   23. Hint: Consider the effect of T and D on the standard basis for 𝒫ₙ.
25. (So
17. -2 - 8(x - 1) - 5(x - 1)' 19. -l
T[ ~l =
Therefore, T is linear.
a + ex +
1
(3 + 2v3)/2] [ 3.232 ] 13. (a) [ (-3v3 + 2)/2 - 1.598
15. 6= ([
=
= k(a
l
0 , P8 o-C = 1
-2
+[:])
and
{[~ ~]}
(c) ker(T) = {a + bx + cx² : a = −c, b = −c} = span{1 + x − x²}, range(T) = ℝ²
•
S. A basis for ker( T) is {[
~ ~ J. [~ ~]}, and a basis for
100
o o
~]. [~ m,''''k(T) ~ nullity(T)
unge(T)is { [ :
r}. and a basis for ''''get T) is {[: J. [ ~]}, ,ank( T ) ~ 2. nullity( T) ~
21. Isomorphic.,
0
o
b 0 0 c
b c
[:J
Hint: Define T : C[0, 1] → C[0, 2] by letting T(f) be the function whose value at x is (T(f))(x) = f(x/2) for x in [0, 2].
33. (a) Let v₁ and v₂ be in V and let (S ∘ T)(v₁) = (S ∘ T)(v₂). Then S(T(v₁)) = S(T(v₂)), so T(v₁) = T(v₂), since S is one-to-one. But now v₁ = v₂, since T is one-to-one. Hence, S ∘ T is one-to-one.
35. (a) By the Rank Theorem, rank(T) + nullity(T) = dim V. If T is onto, then range(T) = W, so rank(T) = dim(range(T)) = dim W. Therefore, dim V < dim W ≤ dim W + nullity(T) = rank(T) + nullity(T) =
6
4
-3
-2
2
- I
~ [ -I0 0I]. [T),_,[4 + 2xJ, ~
[_~ :][~]- [ _~] ~ [2 - <xJ, ~ [1{4 + a )J,
4
-3
-2
2
-I
.[1],_,[-~], ~ 0 0 7
[-~] ~ 0 0
1
9.
[T),_, ~ 0
~
0
0
1
0 0
0 1 0 0 0 0 1 0 0 0 0 0 1 0 b 0 1 0 0 0 0 0 1 d
•
,
[[:
7 7 7
,
~
7 7 7
,
~
. [TJ,_,[AJ, ~
1
-
•, ~
b d
~]L~ [T(A)1o -I
1
0
0
1
1
0 0
-I
0
1
0 -1
0 -I
II. [T Io_, =
1
0 0
- I
,
0
1
-I
0
d
-I
1
0 1
[[' - b db .]], - [AB _, a -d
. [TJ,_.JAJ,-
0
a b
- 1 0 0
0
Eurcists 6.6
6
[~ -~lL
dim V, so nullity(T) < 0, which is impossible. Therefore, T cannot be onto.
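The Rank Theorem used here can be checked concretely for matrix transformations (an editorial example, unrelated to the exercise):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank           # dimension of the null space
print(rank, nullity, rank + nullity)  # 1 2 3 = number of columns (dim of the domain)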
1. [T),_"
b ::: [a+b{x+ 2)+ c(x+ ,
a+b ' O+C ' OZ] [ a+ b'I + ,.1 2 c = [T(a + bx+ exl ))c
7. [1'),-, ~
23. Not isomorphic   25. Isomorphic; T(a + bi) =
•
[TJ,_, ~ [: ~ ~ J. [TJ,_,[' + bx + o<'J, ~
= a
=
1
[ ~ ~ ~] :, =[a+:+J
9. rank(T) = nullity(T) = 2   11. rank(T) = nullity(T) = 2   13. rank(T) = 1, nullity(T) = 2   15. One-to-one and onto   17. Neither one-to-one nor onto   19. One-to-one but not onto
0
2)')J, ~ [T(. + bx + a')k 5.
3 "'" dim @ll'
O. [T]co-B[ a+bx+of ]s=
•
7. A basis for ker(T) is {I + x -
=
1
1 0 0 010 b = o 0 I,
= 2, and rank(T) + nullity(T) = 4 = dim M₂₂.
= 1, and rank(T) + nullity(T)
~
,dab-
b a d ,
-
BAJ, - [T(A)J,
-
13. (b )
(c)
~ [~ -~]
[Dl.
29. Linear
[DJB[3SinX -5 COSX]B = [~ -~][
[!] =(3cosx+
-:]~
0
I
0
31.
I I
SsinX]B=
l S. (a)
[Dl.
~
17. [S o TJV<---E =
0
0
2 -I
I
[ -I 1
19. Invertible, y - l(a
35. 11., ..
2 - 2] - I
+ bx)
= -b
37.
+ ax
21. Invertible, T⁻¹(p(x)) = p(x − 2)
+ 2c) + r'(p(x» ~ p(x} - p' (x} + p"(x)
23. Invertible, rl(a + bx
(b - 2c}x +
",' 0 '
25. Not invertible
+ crl) = (a -
b
27. −3 sin x − cos x + C
29. t~cosx -~e2M5in x+ C
3J. C=
{[ _ ~ ],[-~ ]} 33.C ={ I-~2+x}
37. [T]_𝒞 = [ (d₁² − d₂²)/(d₁² + d₂²)   2d₁d₂/(d₁² + d₂²) ; 2d₁d₂/(d₁² + d₂²)   (d₂² − d₁²)/(d₁² + d₂²) ]
Exercises 6.7 I. y(' } ~
2."/e' 3. y(t} ~ « I - ,'}, " + (, ' - I}l'}/(e' - ,' ) 5. [(t) ::
(~Vs-l)/2)[~I+ V5)'/2 eV5 - 1
_ e!1 -V5)1/2]
I. (a) F
. (
"~K)
Sin IOvl\
19. (b) No
23. Not linear
25. Not linear
27. Linear
+
IOcos(VKr}
I
(e) F
OlT
(g) F
5. Subspace
X4 },
dun W = 3
11. Linear transformation
13. Linear transformation
17.
1
0
-1
0
I
-2
0
0
I
0
1 - I
IS. "'- 1
19. S ∘ T is the zero transformation.
1. (a )
°
(b )
v'TT
(0)
3. Any nonzero scalar multiple of v =
VTi
[!]
(b)
vI4
(c)
v'26
(b)
y:;;
(0)
y:;;
13. Axiom (4) fails: U =
[~]
≠ 0, but ⟨u, u⟩ = 0.
15. Axiom (4) fails: u =
[ ~]
≠ 0, but ⟨u, u⟩ = 0.
S. (a) - I
7.1 -2x' 9. (a)
sin(VKr}
I
7. Let c₁A + c₂B = O. Then c₁A − c₂B = c₁Aᵀ + c₂Bᵀ = (c₁A + c₂B)ᵀ = O. Adding, we have 2c₁A = O, so c₁ = 0 because A is nonzero. Hence c₂B = O, and so c₂ = 0. Thus, {A, B} is linearly independent.
Exercises 7.1
17. x(t} ~
,
0
3. Subspace
9. ,{t} ~ « k + I}<" + (k - IV"}/2k
5 - 10 cos(lOv'K)
0
(c) T
Chapter 7
(b) After 3691.9 years
I
Review Questions
7. y(t) = eᵗ − (1 − e⁻¹)teᵗ
11. y(t) = eᵗ cos(2t)   13. (a) p(t) = 100e^{(ln 16)t/3} ≈ 100e^{0.924t}  (b) 45 minutes  (c) In 9.968 hours   15. (a) m(t) = 50e^{−ct}, where c = ln 2/1590 ≈ 4.36 × 10⁻⁴; 32.33 mg remain after 1000 years
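The arithmetic in answer 15(a) can be reproduced directly (an editorial check of the stated numbers):

import math

c = math.log(2) / 1590           # decay constant from the 1590-year half-life
print(c)                         # about 4.36e-4
print(50 * math.exp(-c * 1000))  # about 32.3 mg remaining after 1000 years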
I
[I 1 1 1)
9. {I, x 2,
35 .C ~ (1, x}
I I
,
I
[D(3sinx - SC05X)JB 2 0 0
I
,
I
7T
,It
17. Axiom (4) fails: p(x) = 1 − x is not the zero polynomial, but ⟨p(x), p(x)⟩ = 0.
7. (a) At most one component of v is nonzero.
9. Suppose ‖v‖m = |vᵢ|. Then ‖v‖E = √(v₁² + ⋯ + vᵢ² + ⋯ + vₙ²) ≥ √(vᵢ²) = |vᵢ| = ‖v‖m.
11. Suppose ‖v‖m = |vₖ|. Then |vᵢ| ≤ |vₖ| for i = 1, ..., n, so ‖v‖s = |v₁| + ⋯ + |vₙ| ≤ |vₖ| + ⋯ + |vₖ| = n|vₖ| = n‖v‖m.
19. A = [~ ~]
21.
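The inequalities in answers 9 and 11 can be spot-checked with NumPy, whose inf-, 2-, and 1-norms correspond to the max, Euclidean, and sum norms used here (editorial example with an arbitrary vector):

import numpy as np

v = np.array([3.0, -1.0, 2.0, -2.0])
n = v.size
max_norm = np.linalg.norm(v, np.inf)
euclidean_norm = np.linalg.norm(v, 2)
sum_norm = np.linalg.norm(v, 1)
print(max_norm <= euclidean_norm)    # True
print(sum_norm <= n * max_norm)      # True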
13. x
2
-2
y
y
1
1
-2T 25. -8
27. √6
29. ‖u + v − w‖² = ⟨u + v − w, u + v − w⟩ = ⟨u, u⟩ + ⟨v, v⟩ + ⟨w, w⟩ + 2⟨u, v⟩ − 2⟨u, w⟩ − 2⟨v, w⟩ = 1 + 3 + 4 + 2 − 10 − 0 = 0
21. ‖A‖F = √19, ‖A‖₁ = 4, ‖A‖∞ = 6   23. ‖A‖F = √31, ‖A‖₁ = 6, ‖A‖∞ = 6   25. ‖A‖F = 2√21, ‖A‖₁ = 7, ‖A‖∞ = 7
Therefore, ‖u + v − w‖ = 0, so, by axiom (4), u + v − w = 0, or u + v = w.   31. ⟨u + v, u − v⟩ = ⟨u, u⟩ − ⟨u, v⟩ + ⟨v, u⟩ − ⟨v, v⟩ =
‖u‖² − ⟨u, v⟩ + ⟨u, v⟩ − ‖v‖² = ‖u‖² − ‖v‖²   33. Using Exercise 32 and a similar identity for
lu -
,,
27. x =
we have
1
‖u + v‖² + ‖u − v‖² = ⟨u + v, u + v⟩ + ⟨u − v, u − v⟩ = ‖u‖²
=
+ 2(u, v) +
~ v U'
31.x =
+ v ~ ==
#
37.
R U ~2 -
2(u, v) +
maxFxl
~ vj l
1
1
- I - J
Exercises 7. 2 10, !un. = 5
3. dt(u, v) == V7(i, d,(u, v) = 14, d..,(u, v) = 6
:a
~III
:;::
m axa x ~ = 1. Isl-!
37. cond₁(A) = cond∞(A) = 400; ill-conditioned
39. cond∞(A) = 128; moderately ill-conditioned
41.
l/ Vi, vlxl Vi,V5(3x' - 1)/ 2Vi
5. ‖u‖H = 4, ‖v‖H = 5
1
35. condl(A) = con
2{u, v) = -2(u, v) # (u , v) = 0
{[~].[~]} lui, = \142, ion, =
0 ,y -
Isl-!
41. la)
I.
0 , Y=
33. (a) By the definition of an operator norm,
~ u - v~ <=> Uu + V~ l = j u - V~ 2 # I U ~ 2 + 2(u, v) + Ivl 2 =:
I
1
o
+ ‖u‖² − 2⟨u, v⟩ + ‖v‖² = 2‖u‖² + 2‖v‖²
Dividing by 2 yields the identity we want.   35. ‖u
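The resulting identity (the parallelogram law) can be verified numerically for any pair of vectors; the vectors below are arbitrary (editorial check):

import numpy as np

u = np.array([1.0, 2.0, -1.0])
v = np.array([0.0, 3.0, 4.0])
lhs = np.linalg.norm(u + v)**2 + np.linalg.norm(u - v)**2
rhs = 2 * np.linalg.norm(u)**2 + 2 * np.linalg.norm(v)**2
print(np.isclose(lhs, rhs))   # True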
[~]. y = [-:] 29. x =
o
w nd,,(A) =
(max{
k~ l
(maxllkl + +
I, 2) '
k ~ I' k=l})
43. (a) cond∞(A) = 40  (b) At most 400% relative change
45. Using Exercise 33(a), we have cond(A) = ‖A‖‖A⁻¹‖ ≥ ‖AA⁻¹‖ = ‖I‖ = 1.
49. k ≥ 6   51. k ≥ 10
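Condition numbers like those above are available directly in NumPy (an editorial illustration with an arbitrary nearly singular matrix, not the exercise's matrix):

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.01]])
print(np.linalg.cond(A, 1))        # large, so A is ill-conditioned
print(np.linalg.cond(A, np.inf))
print(np.linalg.cond(np.eye(2)))   # 1, the smallest possible condition number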
ExercJ5ts 7. 3
‖e‖ = √2 ≈ 1.414   5. ‖e‖ = √7 ≈ 2.646
I '~ ~
3.
J.
Vo/ 2 • 1.225
IS.A =
0
ix. lq = 1.225 9 . y -J} - 2x, lei - 0.816
J7. A =
7.y = -3 +
4 + ~x, ~ e~ "'" 0.447 13. Y = -! + ix. ~ en ~ 0.632 19.i
-If x + 7!
[!J
=
21.
+
4
23.%
17. Y =
I
41.
[Ill. [!] .,
_1
47.
1
A~ =
!
•,1
•
I
49.
~
0
-',
- 1 1
1
2 2
0 [ 10]
+!14 So 1 (b) cond,(A) =
39. (a) ‖A‖₂ ≈ 1.95  (b) cond₂(A) = 38.11
- :] 6 1.
, ,
[
63. [
1_1
43. A* ""'
Hi~ [052]
[2;
00
,I
0
0
1
1
0
,
1.04
25
U•• lJ. x~ [:]
47. A+ =
-! i I i
11
37. (. ) IAU, ~ 0
"
I]
A~ = [ ~
35 . The solid ellipse.rl 5
45. A". =
_1
•
1]+ 2
- I
41. A*=[t ~]
2
i
~
45 . A ~ =[!
U-1-U
51.A*=
~
i 1!.
,
o
33. The line segment [ - 1,
! i
5
l/ Yz] (Exercise 3)
o
29. y = 0.92 + 0.73x
39.
Ijv'i] + O[~]
23. (Exercise 7) A = 3 I [0
35. 139 days
l ! I
0 [ ~][I/0 o
31. (a) If we let the year 1920 correspond to t = 0, then y = 56.6 + 2.9t; 79.9 years   33. (a) p(t) = 150e
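Least squares lines like the one in answer 31 are computed by solving the normal equations; NumPy's lstsq does this directly (editorial sketch with made-up data, not the exercise's table):

import numpy as np

t = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
y = np.array([56.0, 60.1, 62.9, 66.2, 68.0])
A = np.column_stack([np.ones_like(t), t])            # design matrix [1  t]
coeffs, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
a, b = coeffs
print(a, b)                                          # intercept and slope of y = a + b t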
~
21. A
[ - 1/ 0
""
[-n
-,•,
~][ ~
[~
-
[~ ~]
0 0] 2/v'S 0 1/v'S 0 0 I 2 o 1/ v'S 0 - 2/v'S
19.A =
- !7!
tf"
25.
I
37
-
g
- 5 - 2t
27. i =
~x
-1]
x= [
- 5- t
=
If -
0 I 3 0 0 0 0 2 I 0 0 0
I
0
It. y ::
IS . Y ::: 3
[i -lJ[ ~][ l)
1
1
0o][ - 1/0 1/ 0
0 0
1/ 0 ] 1/ 0
-1][ 0 I]
2 - I
3
- I
0
53. (a) If A is invertible, so is Aᵀ, and we have A⁺ = (AᵀA)⁻¹Aᵀ = A⁻¹(Aᵀ)⁻¹Aᵀ = A⁻¹.
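This reduction of the pseudoinverse to the ordinary inverse is easy to confirm (editorial check with an arbitrary invertible matrix):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                       # invertible
pinv_formula = np.linalg.inv(A.T @ A) @ A.T      # (A^T A)^{-1} A^T
print(np.allclose(pinv_formula, np.linalg.inv(A)))        # True
print(np.allclose(np.linalg.pinv(A), np.linalg.inv(A)))   # True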
Exercises 7.5
Exercises 7.4   1. 2, 3
9.
3.
0.0
5. 5
v'S. 2. 0
11. A =[~ 13.A =
7. 2,3
I. g(x)=i
3. g(x) = i x
5 . g() x -- )6'+~ 16 X '
7. {1, x − ½}
9. g(x) = x - ~
11. g(x) = (4e − 10) + (18 − 6e)x ≈ 0.87 + 1.69x
0][0 I 0
0][ 1/ 0 0 1/ 0
0][- 1 0] [~ o1][3 0 2 O-}
1/ 0 ] - 1/ 0
13.
g(x)
=
i -
~x
+ ~xl
15. g(x) = (39e − 105) + (588 − 216e)x + (210e − 570)x² ≈ 1.01 + 0.85x + 0.84x²
2
"
9. Not a norm
9
23.
110
= 1, at = 0, bk =
25.
Il(j
=
71', ak :c
1
'--"--'''-
1m 2( - 1)' 0, b. = k
37. C = {0, 1}, where 0 is the zero vector and 1 is the vector of all 1s in ℤ₂ⁿ.   39. A parity check matrix P for such a code is (8 − 5) × 8 = 3 × 8. Therefore, rank(P) ≤ 3, so any four columns of P must be linearly dependent. Hence, the smallest integer d for which there are d linearly dependent columns satisfies d ≤ 4. By Theorem 7.21, this means that d(C) ≤ 4, so an (8, 5, 5) linear code does not exist.
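The minimum distance of a binary linear code equals the minimum weight of its nonzero codewords, which can be found by brute force for small codes (editorial sketch; the generator matrix G below is an arbitrary example, not the code from the exercise):

import itertools
import numpy as np

G = np.array([[1, 0, 0, 1, 1],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1]])   # generator matrix of a small (5, 3) binary code

codewords = [np.mod(np.array(m) @ G, 2) for m in itertools.product([0, 1], repeat=3)]
d = min(int(c.sum()) for c in codewords if c.any())   # minimum nonzero weight
print(d)   # 2 for this example code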
•
«) F
It. con
I - (- I )'
29. d(C) = 1   31. d(C) = 2   33. d(C) = 3   35. d(C) = 3; if C = {c₁, c₂, c₃, c₄}, then u decodes as c₂, v cannot be decoded, and w decodes as c₃.
I. (a) T
s. V3
3. Inner product
21. 71' _ ~(COSx + COS3X)
(e) T
(g) T
(i) T
,I,
13. Y = 1.7x
15.
17. (a) v1, v'2 1/ v1
1/ v1 0
(b) A =
o
0
I
1/ v1 - 1/ v'2 0 (c) A + =
[ii
0 0
l]
v'2
o
o v1 o o
[~ ~]
19. The singular values of PAQ are the square roots of the eigenvalues of (PAQ)ᵀ(PAQ) = QᵀAᵀPᵀPAQ = Qᵀ(AᵀA)Q. But Qᵀ(AᵀA)Q is similar to AᵀA because Qᵀ = Q⁻¹, and hence it has the same eigenvalues as AᵀA. Thus PAQ and A have the same singular values.
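A numerical spot-check of this invariance (editorial; P and Q are random orthogonal matrices obtained from QR factorizations):

import numpy as np

A = np.array([[1.0, 2.0], [0.0, 3.0], [4.0, -1.0]])
P, _ = np.linalg.qr(np.random.rand(3, 3))   # random orthogonal 3x3
Q, _ = np.linalg.qr(np.random.rand(2, 2))   # random orthogonal 2x2

print(np.round(np.linalg.svd(A, compute_uv=False), 6))
print(np.round(np.linalg.svd(P @ A @ Q, compute_uv=False), 6))   # same singular values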
Index
I've got a little list. -Sir W. S. Gilbert, "They'll None of 'Em Be Missed," from The Mikado, 1885
Abel, Niels Henrik, 308, 668  Absolute value, 652  Addition of complex numbers, 650; of matrices, 138; closure under, 190, 433; of polynomials, 662; of vectors, 5, 9, 433  Adjacency matrix, 236, 238  Adjoint (adjugate) of a matrix, 275
Algebraic multiplicity, 291  Algorithm, 87  Al-Khwarizmi, Abu Ja'far Muhammad ibn Musa, 87  Allocation of resources, 101  Altitude of a triangle, 30
Angle between vectors, 21  Argand, Jean-Robert, 650
Argand plane, 650
Argument of a complex number. 653 Arithmetic mean, 557 Arithmetic Mean-Geometric Mean
Inequality, 557 Assoc iativity, 10, 152, 156,221,433 Attractor,347 Augmented matrix, 62, 68
Axioms inner produCI space, 540 vector space, 433 Back substitution. 62
Balanced chemical equation. 103 Basis, 196, 450 change of, 467-476 coo rdinates with
respect to, 206, 453 orthogonal, 367, 546
orthonormal, 369, 546 standard, 196,451 Basis Theorem, 200, 457 Best approximation, to a vector, 579 Best Approximation Theorem, 579 Binary code, 48 Binary representation of a num~r, 532 Binary vector, 47 Binet, Jacques, 335 Binet's formula, 336, 432 Block,143 Block multiplication, 146 Block triangular form, 282 Bunyakovsky, Viktor Yakovlevitch, 548 ~,
439 C", 436, 552 Carroll, Lewis, 280 Cassini, Giovanni Domenico, 359 Cassini's identity, 359 Cauchy, Augustin-Louis, 272. 279,548 Cauchy-Schwarz Inequality, 19,548 Cayley, Arthur, 297 Cayley-Hamilton Theorem. 297 Centroid of a triangle, 29 Change of basis, 467-476 Characteristic equation. 289 Characteristic polynomial, 289 Ch«k digit, 49 Circuit, 237 Circumccnter of a triangle, 30 Oosure under addition, 190,433 under linear combinations, 190 under scalar multiplication, 190,433 Codabar system. 55
Code(s) binary, 48 dimension of. 241 dual, 408 equivalent. 406 error-correcting, 240--243, 627 error-det« ting, 48-50, 627 Hamming, 242 length of, 241 linear, 529 minimum distance of, 626 parity ch«k, 49 Reed-Muller, 532 vector, 48, 240 Codomain, 210 Coefficient(s) Fourier, 624 of a linear combination, 12, 152 of a linear equation, 59 matTix,68 method of undeternuned, 667 of a polynomial, 661 Cofactor. 265 Cofactor expansion, 265-268 Column matrix, 136 Column-row representa tion of a matrix product, 145 Column space, 193 Column vector, 3, 136 Commutativity, 10, 16, 152,433 Companion matrix, 296 Complement of a binary vector. 533 Complex dot product, 552 Complex numbers, 650-660 absolute value of, 652 addition of, 650 argument of, 653 conjugate of, 651
Diagonal entnes of a matrix, 137 Diagonal matrix, 137 Diagonalizab1e linear transformation, 513 Diagonahzable matrix, 300 o rthogonally, 397 unitarily, 555 Diagonalization, 300-306 orthogonal,397-404 Diagonalization Theorem, 304 Diagonalizing a quadratic form, 414 Diagonally dominant matrix, 126,321 Difference of complex numbers, 65 1 of matrices, 138 of polynomials, 662 of ve<:tors, 8, 437 Differentia! equation{s), 337, 360, 440,522 bo undary conditIOns for, 527 homogeneous, 440, 522-529 inmal conditions for, 337,340, 341,360 solution of, 522 system of linear, 337-345 Differential operator, 477 Digital image i;Qmpression, 616-617 Digraph,238 DimenSion, 201,457 of a code, 241 , 530 Direction vtctor, 32, 36 Disjoint sets, 637 Distance Hamming, 563 from a point to a line, 38-40 from a point 10 a plane, 40-41 taxicab, 538 between vectors, 20, 544 Distance functions, 563-564 Distribu\lvity.lO, 16, [52, 156,433 Division algOrithm, 664 Dodgson, Charles Lutwidge, 280 Domain, 210 Dominant eigenvalue, 308 Dominant eigenvector, J08 Dot product, 15 complex, 552 weighted, 541 Dual code, 408 Dual space. 51S Dynamical system, 252, 345-352 trajectory of, 346
Echelon form of a matrix redu ced row, 76 row, 68 Edge of a graph, 236 Eigenspace, 255 Elgenvalue(s),253 algebraic multiplicity of, 291 dominant, 308 geometric multiplicity of, 291 inverse power method for computing,314-315 power method for computing, 30S-3l3 shifted inverse power method for computing, 315-316 shifted power method for computing,313-314 Eigenvector{s),253 domman!,308 orthogonal,399 Electrical network. \06-\09 Elementary matrix, 168 Elementary reflector, 394 Elementary row operations, 69- 70 Elimination Gauss-Jordan, 76-78 Gaussian, 72-76 Empty set, 635 Eq uality of complex numbers, 650 of matnces, 137 of polynomials, 662 of selS, 635 of vectors, 4 Equati on(s) li near, 59 normal,584 system of linear, 60 EqUivalence relation, 299 Equivalent codes, 406 Enor-co rrecting code. 240-24J,627 Error-detecring code, 48-50,627 Er ror vector, 581 Euclidean norm, 562 Euler, leonhard, 658 Euler's formula, 659 Even function, 630 ExpanSion by cofactors, 265-268 Exponential of a matrix, 343 '3',435 Factor Theorem, 664
".
Index
Factorization LU,178-18-4 modified QR, 39; Q R, 389-39 1 Fibonacci,333 Fibonacci numbers, 332 Fm it~-dimensiona! vect. Finite linear gamts, \()9. Floatmg point form, 66 Fouri«, kan-Baptiste Je Fo urier approximation, Fourier ooefficients, 624 Fourier ~ries, 626 Free variable, 75 Frobenius, Geo rg, 202 Fr0b4.-nius norm, 565 Fundamenta1 subspaces Fundamental Theorem Fundamental Thcore-m , Matrice5, 170,204,2'
Grassmann, Hermann, 433 Grassmann's Identity, 462, 500 Half-life, 524 HamIlton, Wi\!iam Rowan, 2, 297 Hamming, Richard Wesley, 243 Hamming code, 242 Hamming distance, 563 Hamming norm, 563 Harmonic mean, 560 Head of a veclOr, 3 Head-to-tail rule, 6 Hermitia n matrix, 554 Hilbrrt, DaVId, 400 Homogeneous linea r differential equatio ns, 522-529 Homogeneous linear system, 80 Hooke's law, 528 Householder, Alston Scott, 393 Householder mal ri x, 394 Hyperplane, 37
I nverlible Imear transformation, 219-220.482 Invertible matrix, 161 Irreducible matrix. 332 Irreducible polynomial, 667 Isometry, 372 Isomorphism, 497 !ccratlve methodes) convergence of, 123,309-313, 572-575 Gauss-Scidd method, 122- 129 inverse power method, 314-315 Jacob,'s method, 122- 129 power method, 308-3 i3 shifted inverse power method, 315-316 shifted power method. 313-31 4 Jacobi, Carl Gustav, 122 '~ cobi's method. 122- 129 Jo rdan, Wilhelm, 76
Galil~i,
GaJileo, 535 Galois, Evariste, 308. 66Gauss, Carl Friedrich, 7 578,668 Gauss-Jordan eliminali' Gauss-Seidel method, I Gaussian elimination, 7 Generi1! form of the eqt line, 31. 33, 38 General form of the eqt plane, 35, 38 Ge nerator ma trix, 241" Geometric mean, 557 Geom~tric multiplicity, Gcrschgorin disK, 316 Gerschgo rin's Disk The Global P05itioning Syst, (GPS), Jl9-121 Google, 354-355 Gram, Jorgen Pedersen, Gram-Schmidt Process. Grap h, 236, 252-253 adjacency malri) bipartite, 248, complete, 252 complete bipnti connected,358 cycle, 253 d ir~ted (dlgrapi edges of, 236 k-regular,358 path in a, 237 ~tersen, 253 vertices of. 236
,650 Idempotent ma trix, 177 Idenllty matrix, 137 Identity transformation, 219. 478 iU-conditioned line~r system, 67 IU-condltiontd matrix, 570 Image, 210 Imaginary Ui5, 650 Imagmary conic, 428 Imaginary pari of a complex numbe r, 650 Inconsistent lineilr system, 61 Indefinite matrix, 416 quadrat ic form of, 4J6 Index of summation, 638 Infinite-dimensional vector space, 457 Initial pomt of a V«\or, 3 Inner product, 540 Inner product space, 5<10 and Cauchy-Schwarz and Triangle InequalitIes, 548-54.9 distance between vectors in, 544 length of vectors in, 5<14 orthogonal vectors 10, 544 proptrtles of, 544 Inlernational Sta ndard Book Number (ISBN), 53 Intersection of sets, 637 Inverse of a linear transformation, 219-220,482 of a matn x, 161
Kernel,486 Kirchhoff's Laws, 106 Lagrange, Joseph Louis. 462 Lagrange interpolation form ula, 463 Lagrange polynomials, 462 Laplace, Pierre Simon, 265 Laplace Expansion Theorem, 265, 279 Lattice, 520 Leading entry, 68 Leading 1,76 Leading variable, 75 Least squares approximation, 577-578. 58D-591 Btst ApprOXImation Theorem and,579-580 and orthogonal prOJection, 592-594 and the pseudoinversc of a matrix, 594-595 via the OR factori1..ation, 59 1-592 via the singular value decomposition, 612-614 Least squares approxunating line, 583 Least squaTts erro r, 581 Least squares soluilon, 583 of minimal length, 612 Least Squares Theorem. 584 Legendre, Adnen Mane, 5<17 Legendre polynomials, 547 Leibniz, Gottfned \\filhelm von, 280 Lemma, 270
MacWilliams, Florence Jessie Collinson,
Frobelllus,565 operator, 568 Norm of a vector, 17,544,561 1-,562 2-,562 ""-,562 Euclidean, 562 Hamming, 563 max, 562 sum, 561 taxicab, 539 uniform, 562 Normal equations, 584 Normal form of the equation of a hne, 3 \, 33, 38 Normal form of the equation of a plane, 35, 38 Normal matrix, SS6 Normal vector, 31. 35 Normalizing a vector, 18 Normed linear space, 561 Nutl space, 195 Nulhty o f a linear transform ation, 488 of a matrix, 202 Odd function, 630 Ohm's Law, 106 One-to-one, 492 Onto, 492 Optimization constrained, 416-418 geometric inequalities and, 556-560 Orbital cenler, 352 Ordered "-tuple, 9 Ordered pair, 3 Ordered triple, 8 Orthocenter of a triangle, 30 Orthogonal basis, 367, 546 Orthogonal complement, 375 Orthogonal Decomposition Theorem, 38 1 Orthogonal diagonaliUltion, 397-404 Orthogonal matrix, 371 Orthogonal proj«1ion, 379, 547 Orthogonal set of vectors. 366, 546 Orthogonal vectors. 23, 544 Orthonormal basis, 369, 546 Orthonormal set of vectors, 369, 546 Outer product, 145 Outer product expansion, 145 OUler product form of the SVD,605
~, 435
~ ... 435
Paralldograrn rule, 6 Parameter, 33 Parametric equation of a line, 33, 38 of a plane, 36, 38 Parny,49 Parity ch«k code, 49 Parity check matrix, 241, 406 Partial fraCiions, 118 Parlial pIVoting, 86 Partitioned matrix, 143-147 Path(s) c-,237 length of, 237 nurnbt"r of, 237-239 si mple, 237 Pcano, Giuseppe, 433 Penrose, Roger, 612 Penrose conditions, 595 Permutation matrix, ISS Perpendicula r bisector, 30 Perron, Oskar, 329 Perron eigenvector, 332 Perron-Frobcnius Theorem, 332 Perron root, 332 Perron's Tneorem, 330 Petenen graph, 253 PIVot , 70 Pivellng.70 partlal,86 Plane, 35-38 Argand,65O complex, 650 equatIOn of, 35, 36, 38 Polar decomposition, 619 Polar fo rm of a co mplex number, 652 P6l ya, George, 640 Polynomial,661-670 characterlSt ic,289 d~ree of, 661 IrredUCible, 667 lagrange. 462 legendre, 547 Taylor, 475 trigonometric, 623 zero of, 664 Population distribu tion vector. 234 Population growth, 233-235, 327-329 Positive definite matrIX, 4]6 quadratic form of, 4]6 Positive semidefinite matrix, 416 quadratic form of, 416
Rank of a linear transformatIon, 488 of a matrix, 75, 202 Rank Theorem, 75, 203, 383, 490 Ranking vector, 353 Rational Roots Theorem, 665 RayleIgh, Baron, 313 Rayleigh quotient, 313 Real axis, 650 Real part of a complex number, 650 Recurrence relation, 333 solution of, 334 Reduced row echelon form, 76 Reed-Muller code, 532 ReflectIon, 213, 222 Regular graph, 358 Repeller, 349 Robotics, 224-227 Root, of a polynomial equatIon, 664 Root mean square error, 621 Rotation,214- 216 center of, 520 Rotational symmetry, 520 Roundoff error, 66 Row echelon form, 68 Row equivalent matrices, 72 Row matrix, 136 Row-matrix representation of a matrix product, 144 Row reduction, 70 Row space, 193 Row vector, 3, 136 Saddle point, 349 Scalar, 8 Scalar matru, 137 Scalar multiplication, 7, 9, 138,433 closure under, 190,433 Scaling,311 Schmidt, Erhardt, 387 Schur, lssai, 282 Schur complement, 282 Schur's Triangularization Theorem, 405 Schwarz, Karl Herman Amandus, 548 Seidel, Philipp Ludwig, 123 Seki Kowa, Takakazu, 279 Self-dual code, 410 Ser(s),634--637 dIsjoint, 637 elemen ts of, 634 empty, 635 intersection of, 637 subset of, 635 union of, 637
Shannon, Claude Elwood, 47 Similar matrices, 298 Simple path, 237 Size of a matrix, 136 Singular valuc5, 599 Singular vectors, 602 Singular value decomposition (SVD ), 601--608 applications of, 608--614 and condition number, 611 and least squares apprOXImatIOn, 612--614 and matnx norms, 609--611 outer product form of, 605 and polar decomposition, 619 and pseudoinverse, 611--612 and rank, 609 Skew lines, 79 Skew-symmetric matrix, 160 SolutIOn of a dIfferential equation, 522 least squares, 583 of a linear system, 60 mimmum length least squares, 612 of a recurrence relation, 334 of a system of differential equations, 3,37- 339 Span, 92, 154, 191,442 Spectral decomposition, 402 Spectral Theorem, 400 projectIOn form of, 402 Spectrum, 400 Spiral attractor, 352 Spiral repeller, 352 Square matrix, 137 Square root of a matTiX, 428 Standard basis, 196,451 Standard generator matrix,241 Standard matrix, 214 Standard parit y check matrix,241 Standard position, 4 Standard unit vectors, 19 State vector, 229 Steady-state vector, 231 Stochastic matrix, 230 Strutt, John William, 313 Subset, 635 Subspace(s), 190,438 fundamental,377 spanned by a set of vectors, 190-191,445
Triple scalar product identity, 46, 284 Turing, Alan Mathison, 179 Union of sets, 637 Unit circle, 18 Unit lower triangular matrix, 179 Unit sphere, 544 Unit vector, 18,544 Unitarily diagonalizable matrix, 555 Unitary matrix, 554 Universal Product Code (UPC), 52 Upper triangular matrix, 160 block,282 Vandermonde, A1exandre-Theophtle, 288 Vandermonde determinant, 288 Vector(s), 3, 9, 433 addition of, 5, 9, 433 angle between, 21- 23 bmary,47 code, 48, 240 column, 3, 136 complex, 433, 436, 552- 553 complex dot product of, 552 components of, 3 coordinate, 206, 453 cross product of, 45-46, 283- 284 direction, 32, 36 distance between, 20, 544 dot product of, 15 equality of, 3 inner product of, 540 length of, 17,544 Imear combmation of, 12,437 linearly dependent, 95, 447 Imearly Independent, 95, 447 m-ary,5J norm of, 17,544,561 normal, 31, 35 orthogonal, 23, 366, 544, 546 orthonormal, 369, 546 parallel,8 population distribution, 234 probability, 229 rank ing, 353 row, 3, 136 scalar multlphcatiOn of, 7, 9, 433 span of, 92, 442 state, 229
steady-state, 231 ternary, 51 urnt, 18, 544 zero, 4, 433 Vector form of the equation '" of a line, 33, 38 Ve<:tor form of the equation of a plane, 36, 38 Ve<:tor space(s), 433 basis fo r, 450 complex, 433, 436, 552-553 dimension of, 457 finite-dimensional, 457 infinite-dimensional,457 isomorphic,497 subspace of, 438 over Zp' 433, 436 Venn, John, 635 Venn diagram, 635 Vertex of a graph, 236 Weight of a bmary vector, 4 I 1 of a magiC square, 464 Weighted dot product, 541 Well-conditiOned matrix, 570 Weyl, Hermann, 433 Whea tstone bridge circuit, 107- 108 Wronski, J6sef Maria Hoene-, 461 Wronskian, 461 x-axis, 3 xy-plane,8 xz-plane,8 y-axis, 3 yz-plane,8 Z l,47 Z;,47 Z m, 51
ℤₘⁿ, 51  z-axis, 8  Zero matrix, 139  Zero of a polynomial, 664  Zero subspace, 441  Zero transformation, 478  Zero vector, 4, 433