Linear Algebra
and Geometry KAM-TIM LEUNG
HONG KONG UNIVERSITY PRESS
The Author
Dr K.T. Leung took his doctorate in Mathematics in 1957 at the University of Zurich.
From 1958 to 1960 he taught at Miami University and the University of Cincinnati in
the U.S.A. Since 1960 he has been with the University of Hong Kong, where he is now Senior Lecturer in Mathematics and Dean of the Faculty of Science.
He is the author (with Dr Doris L.C. Chen)
of Elementary set theory, Parts I and II,
also published by the Hong Kong University Press.
© Copyright 1974 Hong Kong University Press ISBN 0-85656-111-8 Library of Congress Catalog Card Number 73-89852
Printed in Hong Kong by EVERBEST PRINTING CO., LTD 12-14 Elm Street, Kowloon, Hong Kong
PREFACE
Linear algebra is now included in the undergraduate curriculum of most universities. It is generally recognized that this branch of algebra, being less abstract and directly motivated by geometry, is easier to understand than some other branches and that because of the
wide applications it should be taught as early as possible. The present book is an extension of the lecture notes for a course in algebra and geometry given each year to the first-year undergraduates of mathematics and physical sciences in the University of Hong Kong since 1961. Except for some rudimentary knowledge in the language
of set theory the prerequisites for using the main part of this book do not go beyond Form VI level. Since it is intended for use by beginners, much care is taken to explain new theories by building up from intuitive ideas and by many illustrative examples, though the general level of presentation is thoroughly axiomatic. The book begins with a chapter on linear spaces over the real and the complex field at a leisurely pace. The more general theory of linear
spaces over an arbitrary field is not touched upon since no substantial gain can be achieved by its inclusion at this level of instruction. In §3 a more extensive knowledge in set theory is needed for formulating and proving results on infinite-dimensional linear spaces. Readers who are not accustomed to these set-theoretical ideas may omit the entire section. Trying to keep the treatment coordinate-free, the book does not follow the custom of replacing any space by a set of coordinates, and then forgetting about the space as soon as possible. In this spirit linear transformations come (Chapter II) before matrices (Chapter V).
While using coordinates students are reminded of the fact that a particular isomorphism is given preference. Another feature of the book is the introduction of the language and ideas of category theory (§8) through which a deeper understanding of linear algebra
can be achieved. This section is written with the more capable students in mind and can be left out by students who are hard pressed for time or averse to a further level of abstraction. Except for
a few incidental remarks, the material of this section is not used explicitly in the later chapters. v
Geometry is a less popular subject than it once was and its omission in the undergraduate curriculum is lamented by many mathe-
maticians. Unlike most books on linear algebra, the present book contains two substantial geometrical chapters (Chapters III and IV) in which affine and projective geometry are developed algebraically and in a coordinate-free manner in terms of the previously developed algebra. I hope this approach to geometry will bring out clearly the interplay of algebraic and geometric ideas. The next two chapters cover more or less the standard material on matrices and determinants. Chapter VII handles eigenvalues up to the
Jordan forms. The last chapter concerns itself with the metric properties of euclidean spaces and unitary spaces together with their linear transformations. The author acknowledges with great pleasure his gratitude to Dr
D.L.C. Chen who used the earlier lecture notes in her classes and made several useful suggestions. I am especially grateful to Dr C.B. Spencer who read the entire manuscript and made valuable suggestions for its improvement both mathematically and stylistically. Finally I thank Miss Kit-Yee So and Mr K.W. Ho for typing the manuscript.
K. T. Leung
University of Hong Kong
January 1972
CONTENTS

PREFACE

Chapter I  LINEAR SPACE
§1  General Properties of Linear Space
    A. Abelian groups  B. Linear spaces  C. Examples  D. Exercises
§2  Finite-Dimensional Linear Space
    A. Linear combinations  B. Base  C. Linear independence  D. Dimension  E. Coordinates  F. Exercises
§3  Infinite-Dimensional Linear Space
    A. Existence of Base  B. Dimension  C. Exercises
§4  Subspace
    A. General properties  B. Operations on subspaces  C. Direct sum  D. Quotient space  E. Exercises

Chapter II  LINEAR TRANSFORMATIONS
§5  General Properties of Linear Transformation
    A. Linear transformation and examples  B. Composition  C. Isomorphism  D. Kernel and image  E. Factorization  F. Exercises
§6  The Linear Space Hom(X, Y)
    A. The algebraic structure of Hom(X, Y)  B. The associative algebra End(X)  C. Direct sum and direct product  D. Exercises
§7  Dual Space
    A. General properties of dual space  B. Dual transformations  C. Natural transformations  D. A duality between … and …  E. Exercises
§8  The Category of Linear Spaces
    A. Category  B. Functor  C. Natural transformation  D. Exercises

Chapter III  AFFINE GEOMETRY
§9  Affine Space
    A. Points and vectors  B. Barycentre  C. Linear varieties  D. Lines  E. Base  F. Exercises
§10  Affine Transformations
    A. General properties  B. The category of affine spaces

Chapter IV  PROJECTIVE GEOMETRY
§11  Projective Space
    A. Points at infinity  B. Definition of projective space  C. Homogeneous coordinates  D. Linear variety  E. The theorems of Pappus and Desargues  F. Cross ratio  G. Linear construction  H. The principle of duality  I. Exercises
§12  Mappings of Projective Spaces
    A. Projective isomorphism  B. Projectivities  C. Semilinear transformations  D. The projective group  E. Exercises

Chapter V  MATRICES
§13  General Properties of Matrices
    A. Notations  B. Addition and scalar multiplication of matrices  C. Product of matrices  D. Exercises
§14  Matrices and Linear Transformations
    A. Matrix of a linear transformation  B. Square matrices  C. Change of bases  D. Exercises
§15  Systems of Linear Equations
    A. The rank of a matrix  B. The solutions of a system of linear equations  C. Elementary transformations on matrices  D. Parametric representation of solutions  E. Two interpretations of elementary transformations on matrices  F. Exercises

Chapter VI  MULTILINEAR FORMS
§16  General Properties of Multilinear Mappings
    A. Bilinear mappings  B. Quadratic forms  C. Multilinear forms  D. Exercises
§17  Determinants
    A. Determinants of order 3  B. Permutations  C. Determinant functions  D. Determinants  E. Some useful rules  F. Cofactors and minors  G. Exercises

Chapter VII  EIGENVALUES
§18  Polynomials
    A. Definitions  B. Euclidean algorithm  C. Greatest common divisor  D. Substitutions  E. Exercises
§19  Eigenvalues
    A. Invariant subspaces  B. Eigenvectors and eigenvalues  C. Characteristic polynomials  D. Diagonalizable endomorphisms  E. Exercises
§20  Jordan Form
    A. Triangular form  B. Hamilton-Cayley theorem  C. Canonical decomposition  D. Nilpotent endomorphisms  E. Jordan theorem  F. Exercises

Chapter VIII  INNER PRODUCT SPACES
§21  Euclidean Spaces
    A. Inner product and norm  B. Orthogonality  C. Schwarz's inequality  D. Normed linear space  E. Exercises
§22  Linear Transformations of Euclidean Spaces
    A. The conjugate isomorphism  B. The adjoint transformation  C. Self-adjoint linear transformations  D. Eigenvalues of self-adjoint transformations  E. Bilinear forms on a euclidean space  F. Isometry  G. Exercises
§23  Unitary Spaces
    A. Orthogonality  B. The conjugate isomorphism  C. The adjoint  D. Self-adjoint transformations  E. Isometry  F. Normal transformation  G. Exercises

Index
Leitfaden - summary of the interdependence of the chapters (a diagram relating Linear space, Linear transformations, Affine geometry, Projective geometry, Matrices, Multilinear forms, Eigenvalues and Inner product spaces).
CHAPTER I LINEAR SPACE
In the euclidean plane E, we choose a fixed point O as the origin, and consider the set X of arrows or vectors in E with the common initial point O. A vector a in E with initial point O and endpoint A is by definition the ordered pair (O, A) of points. The vector a = (O, A) can be regarded as a graphical representation of a force acting at the origin O, in the direction of the ray (half-line) from O through A with the magnitude given by the length of the segment OA. Let a = (O, A) and b = (O, B) be vectors in E. Then a point C on E is uniquely determined by the requirement that the midpoint of the segment OC is identical with the midpoint of the segment AB. The different cases which can occur are illustrated in fig. 1.
The sum a + b of the vectors a and b is defined as the vector c = (O, C). In the case where the points O, A and B are not collinear, OC is the diagonal of the parallelogram OACB. Clearly our construction of the sum of two vectors follows the parallelogram method of constructing the resultant of two forces in elementary mechanics.
Addition, i.e. the way of forming sums of vectors, satisfies the familiar associative and commutative laws. This means that, for any three vectors a, b and c in the plane E, the following identities hold:
(A1) (a + b) + c = a + (b + c);  (A2) a + b = b + a.
Moreover, the null vector 0 = (O, O), that represents the force whose magnitude is zero, has the property
(A3) 0 + a = a for all vectors a in E.
Furthermore, to every vector a = (O, A) there is associated a unique vector a' = (O, A') where A' is the point such that O is the midpoint of the segment AA'. The vectors a and a' represent forces with equal magnitude but in opposite directions. The resultant of such a pair of forces is zero; for the vectors a and a' we get
(A4) a' + a = 0.
In an equally natural way, scalar multiplication of a vector by a real number is defined. Let λ be a real number and a = (O, A) a vector in the plane E. On the line that passes through the points O and A there is a unique point B such that (i) the length of the segment OB is |λ| times the length of the segment OA and (ii) B lies on the same (opposite) side of O as A if λ is positive (negative). The product λa of λ and a is the vector b = (O, B). If a represents a force F, then λa represents the force in the same or opposite direction of F according to whether λ is positive or negative, with the magnitude equal to |λ| times the magnitude of F. The following three rules of calculation can be easily seen to hold good. For any real numbers λ and μ and any vectors a and b,
(M1) λ(μa) = (λμ)a,
(M2) λ(a + b) = λa + λb and (λ + μ)a = λa + μa,
(M3) 1a = a.
To summarize, (1) there is defined in the set of all vectors in the
plane E an addition that satisfies (A1) to (A4) and (2) for each vector a and each real number λ, a vector λa, called the product, is defined such that (M1) to (M3) are satisfied. At the same time, similar operations are defined for the forces acting at the origin O such that the same sets of requirements are satisfied. Moreover, results of
a general nature on vectors (with respect to these operations) can
be related, in a most natural way, to those on forces. Similar operations are also defined for objects from quite different branches
of mathematics. Consider, for example, a pair of simultaneous linear equations:
(*)  4X1 + 5X2 - 6X3 = 0
     X1 - X2 + 3X3 = 0.
Solutions of this pair of equations are ordered triples (a1, a2, a3) of real numbers such that when ai is substituted for Xi (i = 1, 2, 3) in the equations we get
4a1 + 5a2 - 6a3 = 0 and a1 - a2 + 3a3 = 0.
Thus (1, -2, -1), (-1, 2, 1) are two of the many solutions of the pair of equations (*). Let us denote by S the set of all solutions of the equations (*) and define for any two solutions a = (a1, a2, a3) and b = (b1, b2, b3) and for any real number λ the sum as
a + b = (a1 + b1, a2 + b2, a3 + b3)
and the product as
λa = (λa1, λa2, λa3).
Then we can easily verify that both a + b and λa are again solutions of the equations (*); therefore addition and scalar multiplication are defined in S. It is straightforward to verify further that (A1) to (A4) and (M1) to (M3) are satisfied. This suggests that it is desirable to have a unified mathematical theory, called linear algebra by mathematicians, which can be applied suitably to the study of vectors in the plane, forces acting at a point, solutions of simultaneous linear equations and other quite dissimilar objects. The mathematician's approach to this problem is to lay down
a definition of linear space that is adaptable to the situations discussed above as well as many others. Therefore a linear space should be a set of objects, referred to as vectors, together with an addition and a scalar multiplication that satisfy a number of axioms similar to
(A1), (A2), (A3), (A4) and (M1), (M2), (M3). Once such a definition is laid down, the main interest will lie in what there is that we can do with vectors and linear spaces. It is to be emphasized here that the physical character of the vectors is of no importance to the theory of linear algebra - in the same way that results of arithmetical calcula-
tions are independent of any physical meaning the numbers may have for the calculator.
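As a concrete illustration of the closure of S under these operations, the following small Python check can be made; it is only a sketch (the particular solutions and the scalar 7 are arbitrary choices, and the helper names are ours): it verifies that the sum of two solutions of (*), and a scalar multiple of a solution, satisfy both equations again.

# Illustrative check: solutions of the system (*)
#     4X1 + 5X2 - 6X3 = 0
#      X1 -  X2 + 3X3 = 0
# are closed under the addition and scalar multiplication defined above.

def is_solution(v):
    """True when the triple v satisfies both equations of (*)."""
    v1, v2, v3 = v
    return 4*v1 + 5*v2 - 6*v3 == 0 and v1 - v2 + 3*v3 == 0

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(lam, a):
    return tuple(lam * x for x in a)

a = (1, -2, -1)                       # a solution quoted in the text
b = (-1, 2, 1)                        # another solution quoted in the text
assert is_solution(a) and is_solution(b)
assert is_solution(add(a, b))         # a + b solves (*) again
assert is_solution(scale(7, a))       # 7a solves (*) again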
§1. General Properties of Linear Space

A. Abelian groups

From the preliminary discussion, we see that it is necessary for us
to know more about the nature of the operations of addition and scalar multiplication. These are essentially definite rules that assign sums and products respectively to certain pairs of objects and satisfy a number of requirements called axioms. To formulate these more precisely, we employ the language of set theory. Let A be a set. Then an internal composition law in A is a mapping
τ: A × A → A. For each element (a, b) of A × A, the image τ(a, b) under τ is called the composite of the elements a and b of A and is usually denoted by a τ b. Thus addition of vectors, addition of forces and addition of solutions discussed earlier are all examples of internal composition laws. Analogously, an external composition law in A between elements of A and elements of another set B is a mapping σ: B × A → A. Similarly,
scalar multiplications of vectors, forces and solutions discussed earlier are all examples of external composition laws. Now we say that an algebraic structure is defined on a set A if in A we have one or more (internal or external) composition laws that satisfy some specific axioms. These axioms are not just chosen arbitrarily; on the contrary, they are well-known properties shared by composition laws that we encounter in the applications, such as commutativity and associativity. Abstract algebra is then the mathematical theory of these algebraic structures.
With this in mind, we introduce the algebraic structure of the abelian group and study the properties thereof.
DEFINITION 1.1. Let A be a set. An internal composition law τ: A × A → A is said to define an algebraic structure of abelian group on A if and only if the following axioms are satisfied.
[G1] For any elements a, b and c of A, (a τ b) τ c = a τ (b τ c).
[G2] For any elements a and b of A, a τ b = b τ a.
[G3] There is an element 0 in A such that 0 τ a = a for every element a of A.
[G4] Let 0 be a fixed element of A satisfying [G3]. Then for every element a of A there is an element -a of A such that (-a) τ a = 0.
In this case, the ordered pair (A, τ) is called an abelian group.
It follows from axiom [G3] that if (A, τ) is an abelian group, then A is a non-empty set. We note that the non-empty set A is just a part of an abelian group (A, τ) and it is feasible that the same set A may be part of different abelian groups; more precisely, there might be two different internal composition laws τ1 and τ2 in the non-empty set A, such that (A, τ1) and (A, τ2) are abelian groups. Therefore we should never use a statement such as "an abelian group is a non-empty set on which an internal composition law exists satisfying the axioms [G1] to [G4]" as a definition of an abelian group. In fact, it can be proved that on every non-empty set such an internal composition law always exists, and furthermore if the set in question is not a singleton, then more than one such internal composition law is possible. For this reason, care should be taken to distinguish the underlying set A from the abelian group (A, τ). However, when there is no danger of confusion, then we shall denote, for convenience, the abelian group (A, τ) simply by A and say that A constitutes an abelian group (with respect to the internal composition law τ). In this case, the set A is the abelian group A stripped of its algebraic structure, and by a subset (an element) of the abelian group A, we mean a subset (an element) of the set A.
The most elementary example of an abelian group is the set Z of all integers together with the usual addition of integers. In this case, axioms [G1] to [G4] are well-known properties of ordinary arithmetic. In fact, many other well-known properties of ordinary arithmetic of integers have their counterparts in the abstract theory of abelian groups. In the remainder of this section, §1A, we shall use the abelian group Z as a prototype of an abelian group to study the general properties of abelian groups.
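For small finite examples the axioms of Definition 1.1 can also be checked mechanically. The following Python sketch is only an illustration: it assumes the example of the set {0, 1, ..., n-1} with addition modulo n (a choice of ours, not one discussed in the text) and verifies [G1] to [G4] by brute force.

# Brute-force check of [G1]-[G4] for the set {0, ..., n-1} under addition mod n.

def is_abelian_group(elements, law):
    elements = list(elements)
    # [G3]: look for a neutral element
    neutrals = [e for e in elements if all(law(e, a) == a for a in elements)]
    if not neutrals:
        return False
    zero = neutrals[0]
    associative = all(law(law(a, b), c) == law(a, law(b, c))
                      for a in elements for b in elements for c in elements)   # [G1]
    commutative = all(law(a, b) == law(b, a)
                      for a in elements for b in elements)                     # [G2]
    inverses = all(any(law(x, a) == zero for x in elements)
                   for a in elements)                                          # [G4]
    return associative and commutative and inverses

n = 6
assert is_abelian_group(range(n), lambda a, b: (a + b) % n)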
For convenience of formulation, we shall use the following notations and abbreviations.
(i) The internal composition law τ of (A, τ) is referred to as the addition of the abelian group A.
(ii) The composite a τ b is called the sum of the elements a and b of the abelian group A and denoted by a + b; a, b are called summands of the sum a + b. The particular notations that we use are not essential to our theory, but if they are well chosen, they will not only simplify the otherwise clumsy formulations but will also help us to handle the calculations efficiently.
(iii) A neutral element of the abelian group A is an element 0 of A satisfying [G3] above, i.e., 0 + a = a for every a ∈ A.
(iv) For any element a of the abelian group A, an additive inverse of a is an element -a of A satisfying [G4] above, i.e., a + (-a) = 0.
(v) As a consequence of the notations chosen above, the abelian
group A is called an additive abelian group or simply an additive group.
Now we shall turn our attention to the general properties of additive groups. To emphasize their importance in our theory, we formulate some of them as theorems. In deriving these properties, we
shall only use the axioms of the definition and properties already established. Therefore, all properties shared by additive groups are possessed by virtue of the definition only.
THEOREM 1.2. For any two elements a and b of an additive group A, there is one and only one element x of A such that a + x = b.
PROOF. It is required to show that there exists (i) at most one and (ii) at least one such element x of A; in other words, we have to show the uniqueness and the existence of x. For the former, let us assume that we have elements x and x' of A such
that a + x = b and a + x' = b. From the first equation, we get -a + b = -a + (a + x) = (-a + a) + x = 0 + x = x. Similarly, -a + b = x'.
Therefore x = x', and this proves the uniqueness. For the existence, we need only verify that the element -a + b of A satisfies the condition that x has to satisfy. Indeed, we have a + (-a + b) = (a + (-a)) + b = 0 + b = b. Our proof of the theorem is now complete.
Another version of the above theorem is this: given any elements a and b of an additive group A, the equation a + x = b admits a unique solution x in A. This solution will be denoted by b -a,
and is called the difference of b and a. Using this notation, we get a -a = 0 and 0 -a = -a for all elements a of A. Consequently, we have COROLLARY 1.3. In an additive group, there is exactly one neutral element and for each element a there is exactly one additive inverse -a of a.
Here is another interesting consequence of 1.2 (or 1.3). In an additive group A, for each element x of A, x = 0 if and only if a + x = a for some a of A. In particular, we get -0 = 0. Let us now study more closely the axioms [G1] and [G2]; these are called the associative law and the commutative law of addition respectively. The equation (a + b) + c = a + (b + c) means that the element
of A obtained by repeated addition is independent of the position of the brackets. Therefore it is unambiguous to write this sum of three
elements as a + b + c. Analogously, we can write a + b + c + d = (a + b + c) + d for the sum of any four elements of an additive group A. In general, if a1, ..., aN are elements of an additive group A then, for any positive integer n such that 0 < n < N, we have the recursive definition:
a1 + a2 + ... + an + an+1 = (a1 + a2 + ... + an) + an+1.
The associative law [G1] can be generalized into
[G1'] (a1 + a2 + ... + am) + (am+1 + am+2 + ... + am+n) = a1 + a2 + ... + am+n.
The proof of [G1'] is carried out by induction on the number n. For n = 1, [G1'] follows from the recursive definition. Under the induction assumption that [G1'] holds for an n ≥ 1, we get
(a1 + ... + am) + (am+1 + ... + am+n+1)
= (a1 + ... + am) + [(am+1 + ... + am+n) + am+n+1]
= [(a1 + ... + am) + (am+1 + ... + am+n)] + am+n+1
= (a1 + ... + am+n) + am+n+1
= a1 + ... + am+n+1.
This establishes the generalized associative law [G1'] of addition.
A simple consequence of the generalized associative law is that we can now write a multiple sum a + ... + a (n times) of an element a of an additive group A as na. The commutative law [G2] of addition means that the sum a + b is independent of the order in which the summands a and b appear.
In other words, we can permute the summands in a sum without changing it. Generalizing this, we get
[G2'] For any permutation (φ(1), φ(2), ..., φ(n)) of the n-tuple (1, 2, ..., n),
aφ(1) + aφ(2) + ... + aφ(n) = a1 + a2 + ... + an,
where a permutation is a bijective mapping of the set {1, 2, ..., n} onto itself.
The statement [G2'] is trivially true for n = 1. We assume that this is true for n-1 ≥ 1, and let k be the number such that φ(k) = n. Then the (n-1)-tuple obtained from (φ(1), ..., φ(n)) by deleting the number φ(k) is a permutation of the (n-1)-tuple (1, ..., n-1). From the induction assumption, we get
aφ(1) + ... + aφ(n) = a1 + ... + an-1,
where the summand aφ(k) is deleted from the left-hand side. Now
aφ(1) + ... + aφ(n) = (aφ(1) + ... + aφ(n) with aφ(k) deleted) + aφ(k) = (a1 + ... + an-1) + an = a1 + ... + an.
The generalized commutative law [G2'] of addition therefore holds. Taking [G1'] and [G2'] together we can now conclude that the sum of a finite number of elements of an additive group is independent of (i) the way in which the brackets are inserted and (ii) the order in which the group elements appear.
It is convenient to use the familiar summation sign Σ to write the sum a1 + a2 + ... + an as Σ_{i=1}^{n} ai or Σ{ai : i = 1, ..., n}. In particular, whenever the range of summation is clear from the context, we also write a1 + a2 + ... + an as Σ_i ai or simply Σai. The elements ai (i = 1, ..., n) are called the summands of the sum Σ_{i=1}^{n} ai. Using this notation, we can handle double summations with
more ease. Let aij be an element of an additive group A for each i = 1, ..., m and j = 1, ..., n. We can then arrange these mn group elements in a rectangular array:
a11   a12   ...   a1j   ...   a1n
a21   a22   ...   a2j   ...   a2n
 .     .           .           .
ai1   ai2   ...   aij   ...   ain
 .     .           .           .
am1   am2   ...   amj   ...   amn.
A natural way of summing is to get the partial sums of the rows first and then the final sum of these partial sums. Making use of the summation sign Σ, we write:
(a11 + a12 + ... + a1n) + (a21 + a22 + ... + a2n) + ... + (ai1 + ai2 + ... + ain) + ... + (am1 + am2 + ... + amn) = Σ_{i=1}^{m} (Σ_{j=1}^{n} aij).
On the other hand, we can also get the partial sums of the columns first and then the final sum of these partial sums. Thus we write
(a11 + a21 + ... + am1) + (a12 + a22 + ... + am2) + ... + (a1j + a2j + ... + amj) + ... + (a1n + a2n + ... + amn) = Σ_{j=1}^{n} (Σ_{i=1}^{m} aij).
Applying [G1'] and [G2'], we get
Σ_{i=1}^{m} (Σ_{j=1}^{n} aij) = Σ_{j=1}^{n} (Σ_{i=1}^{m} aij).
Therefore this sum can be written unambiguously as
Σ{aij : i = 1, ..., m; j = 1, ..., n},
or simply Σ_{i,j} aij when no danger of confusion about the ranges of i and j is possible. Triple and multiple summations can be handled similarly. Finally we remark that there are many other possible ways of getting Σ_{i,j} aij besides the two ways given above.
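As a small numerical illustration of this conclusion (the array of integers below is an arbitrary choice of ours), summing the rows first or the columns first of a rectangular array gives the same result, as [G1'] and [G2'] guarantee.

# Row-wise and column-wise double sums agree.
a = [[1, 2, 3],
     [4, 5, 6]]                     # m = 2 rows, n = 3 columns of integers

row_first = sum(sum(a[i][j] for j in range(3)) for i in range(2))
col_first = sum(sum(a[i][j] for i in range(2)) for j in range(3))
assert row_first == col_first == 21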
Another important consequence of the laws [G1'] and [G2'] is that we can define summations over certain families of elements of an additive group. More precisely, we say that a family (ai)i∈I of elements of an additive group A is of finite support if all but a finite
number of terms ai of the family are equal to the neutral element 0 of A. The sum Σ_{i∈I} ai of the family (ai)i∈I is defined as the sum of the neutral element 0 and all the terms ai of the family that are distinct from 0. It follows from [G1'] and [G2'] that the sum Σ_{i∈I} ai is well-defined for any family (ai)i∈I of finite support of elements of an additive group. In particular, when I = ∅, i.e., for the empty family, we get Σ_{i∈∅} ai = 0. Moreover, if I = {1, ..., n}, then
Σ_{i∈I} ai = a1 + a2 + ... + an = Σ_{i=1}^{n} ai.
B. Linear spaces
With the experience gained in dealing with additive groups, we now have no difficulty in laying down a definition of linear space.
DEFINITION 1.4. Let X be an additive group and R the set of all real numbers. An external composition law σ: R × X → X in X is said to define an algebraic structure of real linear space on X if and only if the following axioms are satisfied:
[M1] for any elements λ, μ of R and any element x of X, λσ(μσx) = (λμ)σx;
[M2] for any elements λ, μ of R and any elements x, y of X, (λ + μ)σx = λσx + μσx and λσ(x + y) = λσx + λσy;
[M3] for all elements x of X, 1σx = x.
In this case, the ordered pair (X, σ) is called a real linear space, a linear space over R, a real vector space or a vector space over R.
Here again, the additive group X and hence also the set X are only parts of a real linear space (X, σ). However, when the addition and the external composition law σ are clear from the context, we denote by X the real linear space (X, σ). In this way, the letter X represents the set X, the additive group X, or the real linear space X, as the case may be. The external composition law σ is called the scalar multiplication of the real linear space X and the addition of the additive group X is called the addition of the linear space X. The axioms [M1] and [M2] are called the associative law of scalar multiplication and the distributive laws of scalar multiplication over addition respectively. The composite λσx is called the product of λ and x or a multiple of x and is denoted by λx. Elements of R are usually
referred to as scalars and elements of X as vectors; in particular the neutral element 0 of X is called the nullvector or the zero vector of the linear space X. Algebraic structure of complex linear space and complex linear space are similarly defined. In fact, we need only replace in 1.4 the set R of all real numbers by the set C of all complex numbers. In the general theory of linear spaces, we only make use of the ordinary
properties of the arithmetic of the real numbers or those of the complex numbers; therefore our results hold good in both the real and the complex cases. For simplicity, we shall use the terms "X is a linear space over Λ" and "X is a Λ-linear space" to mean that X is a real linear space or X is a complex linear space according to whether Λ = R or Λ = C.
Now that we have laid down the definition of linear space, the main interest will lie in what we can do with vectors and linear spaces. We emphasize again, that the physical character of vectors is of no importance to the theory of linear algebra, and that the results of the theory are all consequences of the definition.
Here are some immediate consequences of the axioms of linear spaces.
THEOREM 1.5. Let X be a linear space over Λ. Then, for any λ ∈ Λ and x ∈ X, λx = 0 if and only if λ = 0 or x = 0.
PROOF. If λ = 0 or x = 0, then λx = (λ + λ)x = λx + λx, or λx = λ(x + x) = λx + λx respectively. Therefore in both cases, we get λx = 0. Conversely, let us assume that λx = 0 and λ ≠ 0. Then we get x = 1x = ((1/λ)λ)x = (1/λ)(λx) = (1/λ)0 = 0.
From the distributive laws, we get (-λ)x = λ(-x) = -(λx) for each vector x and each scalar λ. In particular, we have (-1)x = -x. We can use arguments, similar to those we had in §1A, to prove the generalized distributive laws:
[M2'] (λ1 + ... + λn)a = λ1a + ... + λna and λ(a1 + ... + an) = λa1 + ... + λan.
C. Examples
Before we study linear space in detail, let us consider some examples of linear spaces.
EXAMPLE 1.6. A most trivial example of a linear space over Λ is the zero linear space 0 consisting of a single vector which is necessarily the zero vector and is hence denoted by 0. Addition and scalar multiplication are given as follows: 0 + 0 = 0 and λ0 = 0 for all scalars λ of Λ.
EXAMPLE 1.7. The set V2 of all vectors in the euclidean plane E with common initial point at the origin O constitutes a real linear space with respect to the addition and the scalar multiplication defined at the beginning of this chapter. The linear space V3 of all vectors in the euclidean space or (ordinary) 3-dimensional space is defined in an analogous way. Similarly, the set of forces acting at the origin O of E also constitutes a real linear space with respect to the addition and the scalar multiplication defined earlier.
EXAMPLE 1.8. For the set Rn of all ordered n-tuples of real numbers, we define the sum of two arbitrary elements x = (ξ1, ..., ξn) and y = (η1, ..., ηn) of Rn by
x + y = (ξ1 + η1, ..., ξn + ηn)
and the scalar product of a real number λ and x by
λx = (λξ1, ..., λξn).
It is easily verified that the axioms of linear space are satisfied. In particular 0 = (0, ..., 0) is the zero vector of Rn and -x = (-ξ1, ..., -ξn) is the additive inverse of x. With respect to the addition and the scalar multiplication above Rn is called the n-dimensional arithmetical real linear space. The n-dimensional arithmetical complex linear space Cn is similarly defined.
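The arithmetical space Rn is easy to model on a computer. The sketch below is only an illustration (the particular triples, and the axioms spot-checked, are our own choices): vectors of R3 are represented as tuples, with the componentwise addition and scalar multiplication of Example 1.8.

# Componentwise operations of the arithmetical real linear space R^n.

def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def scale(lam, x):
    return tuple(lam * a for a in x)

x = (1.0, -2.0, 0.5)
y = (3.0, 4.0, -1.0)
zero = (0.0, 0.0, 0.0)

# spot-checks of (A2), (A3) and one distributive law of (M2) for these vectors
assert add(x, y) == add(y, x)
assert add(zero, x) == x
assert scale(2.0, add(x, y)) == add(scale(2.0, x), scale(2.0, y))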
EXAMPLE 1.9. The set of all ordered n-tuples of complex numbers also constitutes a real linear space with respect to the addition and the scalar multiplication defined by
(ξ1, ..., ξn) + (η1, ..., ηn) = (ξ1 + η1, ..., ξn + ηn)
λ(ξ1, ..., ξn) = (λξ1, ..., λξn)
where λ is a real number and ξi, ηi are complex numbers for i = 1, ..., n. This real linear space shall be denoted by RC2n.
Note that the set RC2n and the set Cn are equal, but the real linear
space RC2n and the complex linear space Cn are distinct linear spaces.
At this juncture, the reader may ask why the superscript 2n is used here instead of n. Until we have a precise definition of dimension (see §2D) we have to ask the reader for indulgence to accept that the linear space RC2n is a 2n-dimensional real linear space while the linear space Cn, with the same underlying set, the same addition but a different scalar multiplication, is an n-dimensional complex linear space.
EXAMPLE 1.10. Let R[T] be the set of all polynomials with real coefficients in the indeterminate T. R[T] is then a real linear space with respect to the usual addition of polynomials and the usual multiplication of a polynomial by a real number.
EXAMPLE 1.11. The set F of all real valued functions f defined on the closed interval [a, b] = {t ∈ R : a ≤ t ≤ b} of the real axis is a real linear space with respect to the following addition and scalar multiplication: (f + g)(t) = f(t) + g(t) and (λf)(t) = λ(f(t)) for all t ∈ [a, b].
fined on the closed interval [a, b I, which satisfy
a SCHRODINGER'S
differential equation d2 dt2f-Af
for a fixed real number A. This set constitutes a real linear space with respect to addition and scalar multiplication of functions as defined in 1.11.
EXAMPLES 1.13. (a) Let S = {s1, ..., sn} be a finite non-empty set. Consider the set FS of all functions f : S → R. With respect to addition and scalar multiplication of functions as defined in 1.11, FS constitutes a real linear space called the free real linear space generated by S or the free linear space generated by S over R. If for every element si ∈ S we denote by fi : S → R the function defined by
fi(sj) = 1 if i = j and fi(sj) = 0 if i ≠ j,
then every vector f of FS can be written uniquely as f = f(s1)f1 + ... + f(sn)fn. It is convenient to identify each si ∈ S with the corresponding fi ∈ FS and consequently for every f ∈ FS we get
f = f(s1)s1 + ... + f(sn)sn.
(b) If S is an infinite set, some slight modification of the method given above is necessary for the construction of a free linear space. At the end of §1A we saw that the natural generalization of a finite sum of vectors is a sum of a family of vectors of finite support. Therefore in order to arrive at a representation similar to f = f(s1)f1 + ... + f(sn)fn above, we only consider functions f : S → R for which the subset {t ∈ S : f(t) ≠ 0} of S is finite. A function f : S → R with this property is called a function of finite support. Let FS be the set of all functions f : S → R of finite support. Then with respect to addition and scalar multiplication as defined in 1.11, FS constitutes a real linear space called the free linear space generated by S. For every t ∈ S we again denote by ft : S → R the function defined by
ft(x) = 1 if x = t and ft(x) = 0 if x ≠ t.
Then ft ∈ FS. If f ∈ FS, then the family (f(t))t∈S of scalars is of finite support since f is of finite support. Therefore the family (f(t)ft)t∈S of vectors of FS is also of finite support. Hence Σ_{t∈S} f(t)ft is a vector of FS and f = Σ_{t∈S} f(t)ft. Again, for convenience, we identify each t ∈ S with the corresponding ft ∈ FS and consequently for every vector f ∈ FS we get f = Σ_{t∈S} f(t)t.
The free complex linear space generated by S is similarly defined.
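A convenient computer model of the free linear space FS is to store a function of finite support as a table of its non-zero values only. The sketch below does this with Python dictionaries; the representation and the helper names are an illustration of ours, not a construction given in the text.

# Functions S -> R of finite support, stored as {t: f(t)} for the t with f(t) != 0.

def add(f, g):
    h = dict(f)
    for t, value in g.items():
        h[t] = h.get(t, 0) + value
        if h[t] == 0:
            del h[t]                 # keep the support minimal
    return h

def scale(lam, f):
    return {} if lam == 0 else {t: lam * value for t, value in f.items()}

def f_t(t):
    """The function f_t of the text: 1 at t and 0 elsewhere."""
    return {t: 1}

# the vector 3*f_a - 2*f_b of finite support, over the infinite set S of all strings
f = add(scale(3, f_t('a')), scale(-2, f_t('b')))
assert f == {'a': 3, 'b': -2}
assert add(f, scale(-1, f)) == {}    # f + (-1)f is the zero function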
EXAMPLE 1.14. The restriction to functions of finite support imposed on FS in 1.13(b) is necessary for the representation of vectors of FS by a sum f = Σ_{t∈S} f(t)t. To provide another type of linear space we drop this restriction. Let S be a (finite or infinite) non-empty set. We consider the set RS = Map(S, R) of all mappings S → R and define sum and product by
(f + g)(t) = f(t) + g(t) and (λf)(t) = λ(f(t)).
Then RS is a real linear space with respect to the addition and scalar multiplication above. Every vector f ∈ RS is uniquely represented by the family (f(t))t∈S of scalars. Moreover the representation is compatible with the algebraic structure of RS in the following sense: if f and g are represented by (f(t))t∈S and (g(t))t∈S respectively, then f + g and λf are represented by the families (f(t) + g(t))t∈S and
(λf(t))t∈S respectively. If S is finite, then RS = FS. If S is infinite, then RS and FS are essentially different linear spaces. If S = [a, b], then RS is identical with the linear space F of 1.11.
These few examples show that the theory of linear space can be applied to various kinds of dissimilar objects. On the other hand, these examples can always be used to illustrate definitions and theorems throughout the course. The reader will find it most helpful to resort to these and other examples when he has difficulty in understanding an abstract definition or a complicated theorem.
D. Exercises
1. Let A be an additive group. For any elements a and b of A, we denote by a - b the unique element x of A such that x + b = a. Show that
(i) -(a + b) = (-a) - b.
(ii) a - (b - c) = (a - b) + c.
2. Let A be an arbitrary non-empty set. Prove that there exists an internal composition law τ: A × A → A so that (A, τ) is an additive group. Prove also that more than one such internal composition law exists if A is not a singleton.
3. In the set T = {0, 1} we define an internal composition law τ by:
0 τ 0 = 0, 1 τ 0 = 0 τ 1 = 1, 1 τ 1 = 0.
Show that (T, τ) is an additive group. Find another internal composition law σ in T so that (T, σ) is an additive group.
4. If a1, ..., an are fixed complex numbers, show that the set of all ordered n-tuples x = (x1, ..., xn) of complex numbers such that a1x1 + ... + anxn = 0 is a linear space under the usual addition and scalar multiplication.
5. Find out whether the following addition and scalar multiplication in R × R define an algebraic structure of linear space on R × R:
(ξ1, ξ2) + (η1, η2) = (ξ1 + η1, ξ2 + η2 + ξ1η1)
λ(ξ1, ξ2) = (λξ1, ...).
6. Let R+n be the set of all ordered n-tuples of positive real numbers. Show that R+n constitutes a real linear space with respect to addition and scalar multiplication defined by
(α1, ..., αn) + (β1, ..., βn) = (α1β1, ..., αnβn)
λ(α1, ..., αn) = (α1^λ, ..., αn^λ).
7. Show that the set X = R × R becomes a complex linear space under the composition laws:
(x1, x2) + (y1, y2) = (x1 + y1, x2 + y2) and
(α + iβ)(x, y) = (αx - βy, αy + βx).
8. Let a = (1, 0, 2, 3), b = (2, -1, 4, 7) and c = (3, 1, 5, -3) be vectors of R4. Evaluate
2a - 3b + 5c
-a - b + c
a + 3b + 8c
and find the scalars λ, μ, ν so that
λa + μb + νc = (1, -1, 1, -7).
9. Let A and B be two linear spaces over the same Λ. Prove that A × B is a linear space over Λ with respect to the addition and scalar multiplication defined by
(a, b) + (a', b') = (a + a', b + b') and λ(a, b) = (λa, λb).
The linear space A × B is called the cartesian product of the linear spaces A and B.
10. Let (Ai)i∈I be a non-empty family of linear spaces all over the same Λ and let A = Π_{i∈I} Ai be the cartesian product of the sets Ai (i ∈ I). For any elements x = (xi)i∈I and y = (yi)i∈I (where xi ∈ Ai, yi ∈ Ai for every i ∈ I) of A and any scalar λ ∈ Λ, we define
x + y = (xi + yi)i∈I and λx = (λxi)i∈I.
(a) Show that A is a linear space over Λ with the addition and the scalar multiplication defined above. A is called the cartesian product of the linear spaces Ai (i ∈ I).
(b) Show that the subset B of A consisting of all elements x = (xi)i∈I with all but a finite number of xi equal to 0 is a subspace of A. B is called the direct sum of the linear spaces Ai (i ∈ I).
§2. Finite-Dimensional Linear Space
It follows from 1.5 that the underlying set of a linear space X is always an infinite set unless X = 0, which is an entirely uninteresting case. But, as we have emphasized before, the linear space X is not just the underlying set X alone. To study the linear space X we must make full use of its algebraic structure. In the present §2 we shall go
into the question whether a number of "key vectors" of X can be selected in such a way that all vectors of X can be "reached" from these "key vectors" by an "algebraic mechanism".
A. Linear combinations
Consider the linear space V2 of all vectors in the euclidean plane E with common initial point at the origin O. To start with, let us pick an arbitrary non-zero vector a of V2 and see what other vectors of V2 can be obtained by repeated applications of addition and scalar multiplication to it. Clearly vectors obtained this way are all of the form λa where λ is a real number; in other words, the endpoints of these vectors all lie on one and the same straight line on E passing through O. This shows that (i) a large number of vectors of V2 can be "reached" by this "algebraic mechanism" from a single vector, and
(ii) there are vectors of V2 which cannot be "reached" this way. Now a simple geometric consideration shows that if a and b are two non-zero and non-collinear vectors, i.e., their endpoints and initial points are not collinear (fig. 4), then repeated applications of addition and scalar multiplication on them will yield vectors of the form λa + μb. But every vector of V2 can be put into the form λa + μb; thus a and b can be regarded as a
pair of "key vectors" of V2.
Let us now clarify our position by giving precise definitions to those terms between inverted commas. Let X be a linear space over Λ, x and y two vectors of X, and λ and μ two scalars of Λ. Then the scalar multiplication of X allows us to form in X the multiples λx and μy of x and y respectively; and the addition of X allows us to form in X the sum λx + μy. We call the vector λx + μy of X a linear combination of the vectors x and y. The concept of linear combination is a very important one in the theory of linear space, for the concepts of sum and product, which are fundamental in the algebraic structure of linear space, are special cases of it. Consider a linear combination λx + μy of vectors x and y. Substituting special values for λ and μ, we see that the vectors 0, x and y, the sum x + y and the products λx and μy are all linear combinations of the vectors x and y. On iteration, we have, for the scalars λ1, ..., λn and the vectors x1, ..., xn, a linear combination λ1x1 + ... + λnxn = Σλixi. More generally, let (λi)i∈I be a family of scalars and (xi)i∈I be a family of vectors. If (xi)i∈I is of finite support or (λi)i∈I is of finite support (i.e., all but a finite number of terms are zero), then the family (λixi)i∈I of vectors is, by 1.5, of finite support. Therefore Σ_{i∈I} λixi is a vector of the linear space X, called a linear combination of the vectors of the family (xi)i∈I.
Clearly, if x is a linear combination of vectors of a family (xi)i∈I such that for each i ∈ I, xi is a linear combination of vectors of a family (yj)j∈J, then x is also a linear combination of the vectors of the family (yj)j∈J.
B. Base
In the last section, we have given a precise definition to the "algebraic mechanism" mentioned in the introductory remarks. In this section, we try to do the same to the "key vectors" through the concept of base.
It follows from the definition of linear combination that each vector of a linear space X is a linear combination of vectors of X. This result, trivial though it is, leads to some important definitions and questions in our theory. DEFINITION 2.1. Let X be a linear space and (x;),Ela family of vectors of X. The family (x;),Elis said to generate the linear space X
if and only if each vector x of X is a linear combination of the vectors of the family. In this case (x,),E1is called a family of generators of the linear space X.
Since Σ_{i∈∅} xi = 0, we see that the empty family generates the zero linear space 0. In general, the family of all non-zero vectors of a linear space X generates the linear space X. Clearly, we can remove from this family a great deal of vectors so that the reduced family still generates X. For example, we can take away, for a fixed non-zero vector x, all scalar products λx with λ ≠ 1, and then for a fixed non-zero vector y ≠ x of this reduced family all λy with λ ≠ 1, and so forth. Our aim is then to remove all the redundant vectors and get a minimal family of generators of X. The advantage of dealing with such a minimal family is obvious.
DEFINITION 2.2. A base of a linear space X is a family (xi)i∈I of generators of X such that no proper subfamily of (xi)i∈I generates X.
Now we can put forward a most important question. Does there always exist a base for each linear space? We shall show in §3A that
a base always exists in a linear space, thus answering the above question in the affirmative. For the present, let us consider some special cases.
In the real linear space V2 of 1.7, any two non-zero vectors (O, A) and (O, B), where O, A and B are not collinear, form a base of V2.
In the real (complex) linear space Rn (Cn) the family (ei)i=1,...,n of vectors of Rn (Cn), where
e1 = (1, 0, ..., 0), e2 = (0, 1, 0, ..., 0), ..., en = (0, ..., 0, 1),
forms a base of Rn (Cn), called the canonical base of Rn (Cn). For the real linear space RC2n we find that the family
a1 = (1, 0, ..., 0), a2 = (0, 1, 0, ..., 0), ..., an = (0, ..., 0, 1),
b1 = (i, 0, ..., 0), b2 = (0, i, 0, ..., 0), ..., bn = (0, ..., 0, i)
of vectors of RC2n forms a base for RC2n. The family (pk)k=0,1,... of polynomials where pk = T^k forms an infinite but countable base of the linear space R[T]. The free linear space generated by a set S over Λ admits the family (ft)t∈S as a base.

C. Linear independence
In this and the following sections, we shall study properties of bases of a linear space so as to prepare ourselves for the proof of the theorem on the existence of base. Let (x1, ..., xn) be any finite subfamily of a base B of a linear space X. Then none of these vectors is a linear combination of the other n-1 vectors; for otherwise, we would obtain a proper subfamily
of B that generates X by removing one of these n vectors from the given base B. This is an important property of vectors of a base and an equivalent formulation of it is given in the following theorem.
THEOREM 2.3. Let (x1, ..., xn) be a family of n (n > 0) vectors of a linear space X over Λ. Then the following statements are equivalent:
(i) none of the vectors x1, ..., xn is a linear combination of the others;
(ii) if, for any scalars λ1, ..., λn of Λ, λ1x1 + ... + λnxn = 0, then λ1 = λ2 = ... = λn = 0.
PROOF. (ii) follows from (i). Assume that λi ≠ 0 and λ1x1 + ... + λnxn = 0. Then we would get
xi = (-λ1/λi)x1 + ... + (-λn/λi)xn,
where the i-th summand on the right-hand side is deleted. This would mean that the vector xi is a linear combination of the others, contradicting (i).
(i) follows from (ii). Assume that xi is a linear combination of the other vectors. Then we would get xi = λ1x1 + ... + λnxn, where the summand λixi is deleted. Therefore
λ1x1 + ... + λixi + ... + λnxn = 0 with λi = -1, contradicting (ii).
DEFINITION 2.4a. A finite family (x1, ..., xn) of vectors of a linear space X over Λ is said to be linearly independent if and only if it satisfies the conditions of 2.3; otherwise it is said to be linearly dependent.
It follows that the empty family, every finite subfamily of a base of X and every subfamily of a linearly independent family are linearly independent. Furthermore, by 1.5, a family (x) consisting of a single vector x is linearly independent if and only if the vector x is non-zero. For a family (x, y) consisting of a pair of vectors to be linearly independent, it is necessary and sufficient that both x and y are non-zero and x is not a multiple of y. Moreover, no two terms of a linearly independent family (x1, ..., xn) are equal, i.e., xi ≠ xj for i ≠ j. Therefore if the family (x1, ..., xn) is linearly independent, then we can say that the set {x1, ..., xn} of vectors is linearly independent or that the vectors xi (i = 1, ..., n) are linearly independent. In other words, a finite family of linearly independent vectors is essentially the same as a finite set of linearly independent vectors.
A necessary and sufficient condition for a family (y1, ..., ym) of vectors being linearly dependent is that there is a family (λ1, ..., λm) of scalars such that λ1y1 + ... + λmym = 0 and λi ≠ 0 for some i = 1, ..., m. Clearly, a family (y1, ..., ym) of vectors is linearly dependent if one of its terms is equal to 0, or if yi = yj for some i ≠ j. Finally, for a linearly dependent family (y1, ..., ym) of vectors, the set {y1, ..., ym} may or may not be linearly dependent. For example, if b ≠ 0, then the family (b, b) is linearly dependent and the set {b, b} = {b} is linearly independent. On the other hand, both the family (0, b) and the set {0, b} are linearly dependent. For this reason, care should be taken to distinguish a linearly dependent family (y1, ..., ym) from the set {y1, ..., ym}.
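For vectors of the arithmetical space Rn, condition (ii) of Theorem 2.3 can be tested mechanically: the family is linearly independent exactly when Gaussian elimination applied to the vectors produces no null row. The sketch below is only an illustration of this standard test (the elimination routine and the sample families are our own); it uses exact rational arithmetic to avoid rounding.

from fractions import Fraction

def linearly_independent(vectors):
    rows = [[Fraction(x) for x in v] for v in vectors]
    if not rows:
        return True                      # the empty family is linearly independent
    n = len(rows[0])
    pivot_col = 0
    for r in range(len(rows)):
        # look for a pivot in some column at or after pivot_col, in rows r, r+1, ...
        while pivot_col < n:
            pivot = next((i for i in range(r, len(rows)) if rows[i][pivot_col] != 0), None)
            if pivot is not None:
                break
            pivot_col += 1
        if pivot_col == n:
            return False                 # a non-trivial combination gives the null vector
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][pivot_col] / rows[r][pivot_col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_col += 1
    return True

assert linearly_independent([(1, 0, 0), (0, 1, 0), (0, 0, 1)])    # the canonical base of R^3
assert not linearly_independent([(1, 2, 3), (2, 4, 6)])           # the second is a multiple of the first
assert not linearly_independent([(0, 0, 0), (1, 2, 3)])           # a family containing the null vector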
THEOREM 2.5a. Let (x1, ..., xn) be a linearly independent family of vectors of a linear space X. Then
λ1x1 + ... + λnxn = μ1x1 + ... + μnxn
if and only if λi = μi for all i = 1, ..., n.
PROOF. If λ1x1 + ... + λnxn = μ1x1 + ... + μnxn, then we get
0 = (λ1x1 + ... + λnxn) - (μ1x1 + ... + μnxn) = (λ1 - μ1)x1 + ... + (λn - μn)xn.
Since the family is linearly independent, we get λi - μi = 0 for all i = 1, ..., n. Therefore λi = μi for all i = 1, ..., n. The converse is trivial.
From Theorem 2.5a it follows that there is a one-to-one correspondence between the set of all linear combinations of vectors x1, ..., xn and the set of all ordered n-tuples (λ1, ..., λn) of scalars. By means of this correspondence, a process of assigning coordinates (similar to those used in analytic geometry) to linear combinations is possible.
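As a concrete instance of this correspondence, take the linearly independent family x1 = (1, 1, 1), x2 = (0, 1, 1), x3 = (0, 0, 1) in R3 (a family chosen here purely for illustration). Because each component of a vector b involves one more of these vectors than the previous component, the coordinates λ1, λ2, λ3 can be read off one after another, as the sketch below shows.

# Coordinates of b with respect to x1 = (1,1,1), x2 = (0,1,1), x3 = (0,0,1):
# the "triangular" shape of the family lets us solve component by component.

x1, x2, x3 = (1, 1, 1), (0, 1, 1), (0, 0, 1)
b = (2, 5, 9)

lam1 = b[0]                     # only x1 contributes to the first component
lam2 = b[1] - lam1              # x1 and x2 contribute to the second component
lam3 = b[2] - lam1 - lam2       # all three contribute to the third component

combination = tuple(lam1*u + lam2*v + lam3*w for u, v, w in zip(x1, x2, x3))
assert combination == b
assert (lam1, lam2, lam3) == (2, 3, 4)   # by 2.5a these coordinates are unique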
We now generalize 2.4a. A generalized definition of linear independence is needed for the characterization of a base given in 2.6 below and it is also needed for the proof of existence of bases in §3.
DEFINITION 2.4. A family (xi)i∈I of vectors of a linear space X is said to be linearly independent if and only if all its finite subfamilies are linearly independent; otherwise it is said to be linearly dependent.
2.4 agrees with 2.4a when the family (xi)i∈I in question is finite. Similarly a necessary and sufficient condition for a family (xi)i∈I of vectors to be linearly independent is that if Σ_{i∈I} λixi = 0 for a family (λi)i∈I of finite support of scalars, then λi = 0 for all i ∈ I. Again because no two vectors of a linearly independent family are equal, we need not distinguish a family of linearly independent vectors from a set of linearly independent vectors. However a family of linearly dependent vectors is essentially different from a set of linearly dependent vectors for reasons similar to those given above. Theorem 2.5a can be generalized as:
THEOREM 2.5. Let (xi)i∈I be a linearly independent family of vectors of a linear space X. Then Σ_{i∈I} λixi = Σ_{i∈I} μixi if and only if λi = μi for all i ∈ I.
The proof of 2.5, which is a straightforward rewording of that of 2.5 a, is left to the reader.
Our next theorem gives two necessary and sufficient conditions for a family of vectors to be a base of a linear space.
THEOREM 2.6. Let B = (xi)i∈I be a family of vectors of a linear space X. Then the following statements on the family B are equivalent.
(i) B is a base of X.
(ii) B is a maximal family of linearly independent vectors of X, i.e., B is not a proper subfamily of any family of linearly independent vectors of X.
(iii) B is a linearly independent family of generators of X.
PROOF. (ii) follows from (i). Let B be a base. Then B is clearly a linearly independent family of vectors. It remains to prove that any family B' of vectors which has B as a proper subfamily cannot be linearly independent. Indeed such a family must contain xi for each i ∈ I and, besides these, at least one vector y of X distinct from each xi (i ∈ I). Since B is a base, y is a linear combination of the vectors xi. Therefore at least one of the vectors of B' is a linear combination of other vectors of B' and hence B' is linearly dependent.
(iii) follows from (ii). Let B be a maximal family of linearly independent vectors. For each vector x of X, we have scalars λ and λi such that λx + Σ_{i∈I} λixi = 0 where λ ≠ 0 or λi ≠ 0 for some i ∈ I. Assume that λ = 0. Then we get Σ_{i∈I} λixi = 0 where λi ≠ 0 for some i ∈ I, contradicting the linear independence of B. Therefore λ ≠ 0 and x = Σ_{i∈I} (-λi/λ)xi is a linear combination of vectors of B. Hence B is a linearly independent family of generators of X.
(i) follows from (iii). Let B be a linearly independent family of generators of X. By the linear independence of B, no vector of B is a linear combination of the other vectors of B. Therefore no proper subfamily of B can generate X. Hence B is a base of X.
D. Dimension
In the examples of §2B, we have seen that some linear spaces have finite bases whereas others do not. We say that a linear space is
finite-dimensional if it possesses a finite base. We propose to study here the bases of a finite-dimensional linear space X.
Earlier in §2B, we have put forward the question of whether a base always exists for each linear space. With the definition just given
above it seems that we have neatly "defined away" the problem partly. This is, to a certain extent, quite true; but even in the case of a finite-dimensional linear space, where a finite base exists by definition, there are some interesting and important problems to be solved. First, we shall show that every base of X is finite; this means that we can conclude from the finiteness of one base of X the finiteness of every other base of X. Finally, we shall show that all bases of X have the same number of vectors. This number will be called the dimen-
sion of X. After we have successfully completed this work and become accustomed to dealing with finite bases, we shall tackle the general case in §3.
LEMMA 2.7. Let x1, ..., xp be linearly independent vectors of a linear space X. If a vector xp+1 of X is not a linear combination of the vectors x1, ..., xp, then the vectors x1, ..., xp, xp+1 are linearly independent.
PROOF. Let λ1, ..., λp, λp+1 be scalars such that λ1x1 + ... + λpxp + λp+1xp+1 = 0. Then λp+1 = 0, for otherwise xp+1 would be a linear combination of x1, ..., xp, contradicting the assumption. Therefore we get λ1x1 + ... + λpxp = 0. By the linear independence of these vectors, we get λ1 = ... = λp = 0. Therefore λ1 = λ2 = ... = λp = λp+1 = 0, proving the linear independence of the vectors x1, ..., xp, xp+1.
Lemma 2.7 gives a sufficient condition for enlarging a linearly independent family (x1, ..., xp) by the adjunction of a vector xp+1 to a linearly independent family (x1, ..., xp, xp+1). Our next step is to study the replacement of a vector in a linearly independent family (x1, ..., xp) by another vector y without changing the set of vectors which the family generates. In the statement of the next lemma, we again use the symbol ^ to indicate that the expression under ^ is to be deleted. From now on the symbol ^ will be used exclusively for this purpose.
LEMMA 2.8. Let x1, ..., xp be linearly independent vectors of a linear space X and y = λ1x1 + ... + λpxp a linear combination. Suppose i ∈ {1, ..., p} is such that λi ≠ 0; then
(a) the vectors y, x1, ..., x̂i, ..., xp are linearly independent, and
(b) the vector xi is a linear combination of the vectors y, x1, ..., x̂i, ..., xp.
PROOF. Without loss of generality, we may assume that i = 1. Let μ1, ..., μp be scalars such that μ1y + μ2x2 + ... + μpxp = 0. After substitution, we get
μ1λ1x1 + (μ1λ2 + μ2)x2 + ... + (μ1λp + μp)xp = 0.
By the linear independence of the vectors x1, ..., xp and the inequality λ1 ≠ 0, we get μ1 = μ2 = ... = μp = 0. Therefore the vectors y, x2, ..., xp are linearly independent. Moreover, x1 = (1/λ1)y - (λ2/λ1)x2 - ... - (λp/λ1)xp. Therefore (a) and (b) hold.
We are now in a position to prove the following supplementation or replacement theorem which has many important applications in the theory of linear spaces. This theorem is essentially a generalization of 2.8 which allows us to replace a number of vectors of a linearly independent family (x1, ..., xp) by certain given linearly independent vectors y1, ..., yq, without changing the set of vectors which the family generates.
THEOREM 2.9. Let x1, ..., xp be p linearly independent vectors and let y1, ..., yq be q linearly independent vectors of a linear space X. If each yj (j = 1, ..., q) is a linear combination of the vectors x1, ..., xp, then q ≤ p and there exist q vectors among the vectors x1, ..., xp, the numbering being chosen so that these vectors are x1, ..., xq, such that
(a) the p vectors y1, ..., yq, xq+1, ..., xp are linearly independent, and
(b) for each j = 1, ..., q, the vector xj is a linear combination of the p vectors y1, ..., yq, xq+1, ..., xp.
PROOF. The theorem is trivial if q = 0. For q ≠ 0, we proceed to replace one vector at a time. Since the vectors y_1, ..., y_q are linearly independent, we have y_1 ≠ 0. Therefore in the unique representation y_1 = λ_1 x_1 + ... + λ_p x_p, at least one of the p scalars λ_1, ..., λ_p is non-zero. Renumbering x_1, ..., x_p and the corresponding scalars if necessary, we can ensure that λ_1 ≠ 0. Then it follows from 2.8 that
(a_1) y_1, x_2, ..., x_p are linearly independent, and
(b_1) x_1 is a linear combination of y_1, x_2, ..., x_p.
Similarly y_2 ≠ 0 and, by (b_1) above, y_2 is a linear combination of y_1, x_2, ..., x_p. In the unique representation y_2 = μ_1 y_1 + μ_2 x_2 + ... + μ_p x_p, at least one among the p-1 scalars μ_2, ..., μ_p is non-zero, for otherwise y_1 and y_2 would be linearly dependent, contradicting the hypothesis of the theorem. Again renumbering if necessary, we can ensure that μ_2 ≠ 0. Then it follows from 2.8 that
(a_2) y_1, y_2, x_3, ..., x_p are linearly independent, and
(b_2) x_2 and hence also x_1 are linear combinations of the vectors y_1, y_2, x_3, ..., x_p.
The theorem is proved if this procedure can be carried out up to the q-th step, at which we obtain:
(a_q) y_1, ..., y_q, x_{q+1}, ..., x_p are linearly independent, and
(b_q) x_1, ..., x_q are linear combinations of y_1, ..., y_q, x_{q+1}, ..., x_p.
In other words, it is sufficient to prove that q ≤ p. This means that the vectors x_i are not used up before the vectors y_j are. Let us assume to the contrary that q > p. Then we can carry out our procedure of replacement up to the p-th step and get
(a_p) y_1, ..., y_p are linearly independent, and
(b_p) x_1, ..., x_p are linear combinations of y_1, ..., y_p.
But the vector y_{p+1} is a linear combination of x_1, ..., x_p and therefore, by (b_p), it is a linear combination of y_1, ..., y_p, contradicting the linear independence of y_1, ..., y_q. The proof is now complete.
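The replacement procedure used in this proof can be imitated numerically; the following Python fragment (a sketch assuming the NumPy library, with matrix rank used to test linear independence) trades one x for each y exactly as in the q steps above:

    import numpy as np

    def replace_vectors(xs, ys):
        """The replacement procedure of 2.9: each y displaces one of the
        original x's while the whole family stays linearly independent."""
        family = [np.asarray(x, dtype=float) for x in xs]
        is_original_x = [True] * len(family)
        for y in ys:
            y = np.asarray(y, dtype=float)
            for i in range(len(family)):
                if not is_original_x[i]:
                    continue                      # never displace a y already placed
                candidate = family[:i] + [y] + family[i + 1:]
                if np.linalg.matrix_rank(np.array(candidate)) == len(candidate):
                    family[i] = y                 # Lemma 2.8: x_i is replaced by y
                    is_original_x[i] = False
                    break
        return family

    xs = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]        # p = 3 independent vectors
    ys = [(1, 1, 0), (1, -1, 0)]                  # q = 2 independent, in span of xs
    print(replace_vectors(xs, ys))                # both y's present, independence kept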
COROLLARY 2.10. Let x_1, ..., x_p be p linearly independent vectors and F = (y_i)_{i∈I} a family of linearly independent vectors of a linear space X. If, for each i ∈ I, y_i is a linear combination of x_1, ..., x_p, then F is finite and the number of vectors of F is not greater than p.
We can now apply the supplementation theorem and its corollary to prove some basic results on bases of a finite-dimensional linear space.
It follows from 2.10 that every infinite family of vectors of a finite-dimensional linear space is linearly dependent. Therefore every base of a finite-dimensional linear space is finite. If x_1, ..., x_p and y_1, ..., y_q form two bases of a linear space X, then the vectors x_1, ..., x_p are linear combinations of the vectors y_1, ..., y_q and conversely the vectors y_1, ..., y_q are linear combinations of the vectors x_1, ..., x_p. Therefore by 2.9 or 2.10, p = q. Hence the following theorem on the invariance of dimensionality of a finite-dimensional linear space is proved.
THEOREM 2.11. Let X be a finite-dimensional linear space which has
a base consisting of n vectors. Then every base of X consists of n vectors and conversely, every family of n linearly independent vectors of X is a base of X.
Therefore the following definition is justified.
DEFINITION 2.12. The dimension of a finite-dimensional linear space X over A is the number n (n ≥ 0) of vectors of any base of X and is denoted by dim_A X, or simply dim X.
For a linear space X whose dimension is n, we also say that X is an n-dimensional linear space. The linear spaces of the examples 1.6 to 1.9 have dimensions as follows:
dim 0 = 0; dim V^2 = 2; dim V^3 = 3; dim R^n = n; dim C^n = n; and dim RC^{2n} = 2n.
The linear spaces of 1.10 to 1.12 are not finite-dimensional. These are called infinite-dimensional linear spaces.
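For the arithmetical spaces, the dimension of a subspace generated by finitely many vectors can be checked numerically; a minimal sketch assuming the NumPy library:

    import numpy as np

    # The subspace of R^4 generated by the three vectors below has dimension 2,
    # since the third vector is the sum of the first two; any two of them
    # already form a base of that subspace.
    vectors = np.array([[1, 0, 1, 0],
                        [0, 1, 1, 0],
                        [1, 1, 2, 0]], dtype=float)
    print(np.linalg.matrix_rank(vectors))   # prints 2 = dimension of the span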
E. Coordinates
In the discussion following 2.5a, we have indicated the possibility of assigning coordinates to vectors of a linear space. This section will concern itself with this problem. Let X be a linear space over A (here A = R or A = C as the case may be) of dimension n and (x_1, ..., x_n) a base of X. Then, by theorem 2.5a, every vector x of X is a unique linear combination x = λ_1 x_1 + ... + λ_n x_n. The scalars λ_1, ..., λ_n, which are uniquely
determined by the vector x, are called the coordinates of the vector x relative to the base (x_1, ..., x_n). Addition and scalar multiplication of vectors can be expressed in terms of coordinates. If x and y have coordinates λ_1, ..., λ_n and μ_1, ..., μ_n, respectively, relative to the base (x_1, ..., x_n), then, from
x = λ_1 x_1 + ... + λ_n x_n and y = μ_1 x_1 + ... + μ_n x_n
we get
x + y = (λ_1 + μ_1) x_1 + ... + (λ_n + μ_n) x_n and
ηx = (ηλ_1) x_1 + ... + (ηλ_n) x_n for any scalar η.
Thus the coordinates of the sum of two vectors are the sums of the corresponding coordinates of the summands, and the coordinates of the product of a vector by a scalar are the products of the coordinates of the vector by the scalar in question. Using these results, we get a bijective mapping Φ: X → A^n (here A^n = R^n or A^n = C^n as the case may be) by putting for every x ∈ X
Φ(x) = (λ_1, ..., λ_n) where x = λ_1 x_1 + ... + λ_n x_n.
This mapping Φ has furthermore the property that
Φ(x + y) = Φ(x) + Φ(y) and Φ(λx) = λΦ(x)
for all x, y ∈ X and λ ∈ A. Mappings with this property will be the subject of our investigation in the next chapter.
In the linear space V^3, a base consists of three vectors x = (O, A), y = (O, B) and z = (O, C), where O, A, B and C are distinct non-coplanar points. We call the straight line passing through O and A the x-axis; the y-axis and the z-axis are similarly defined. The coordinates (λ, μ, ν) of a vector p = (O, P) relative to the base (x, y, z) are obtained as follows. Construct a plane F through P and parallel to the plane spanned by the y-axis and the z-axis. Then F intersects the x-axis at a point A' and λ is the ratio of the directed segment OA' to the directed segment OA, i.e., (O, A') = λ(O, A). The coordinates μ and ν are similarly obtained. In this way our definition of the coordinates of the vector p = (O, P) coincides with the definition of the coordinates of the point P in a parallel (but not necessarily rectangular cartesian) coordinate system.
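The passage from a vector to its coordinates can also be carried out numerically by solving a system of linear equations; the sketch below (assuming the NumPy library, with a hypothetical base of R^3) also checks the two properties of the mapping Φ noted above:

    import numpy as np

    # Coordinates relative to the base x1 = (1,1,0), x2 = (0,1,1), x3 = (0,0,1):
    # the base vectors are the columns of B, so the coordinate vector of x is
    # the solution of B @ l = x.
    B = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]]).T

    def coords(x):
        return np.linalg.solve(B, np.asarray(x, dtype=float))

    x, y = np.array([2.0, 1.0, 1.0]), np.array([0.0, 1.0, 2.0])
    assert np.allclose(coords(x + y), coords(x) + coords(y))   # Φ(x+y) = Φ(x)+Φ(y)
    assert np.allclose(coords(3 * x), 3 * coords(x))           # Φ(λx) = λΦ(x)
    print(coords(x))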
F. Exercises
1. Consider four vectors x = (1, 0, 0), y = (0, 1, 0), z = (0, 0, 1) and u = (1, 1, 1) of R^3. Show that the four vectors are linearly dependent, whereas any three of them are linearly independent.
2. (a) Find a necessary and sufficient condition for which the vectors (1, ξ) and (1, η) of R^2 are linearly dependent.
(b) Find a necessary and sufficient condition for which the vectors (1, ξ, ξ^2), (1, η, η^2) and (1, ζ, ζ^2) of R^3 are linearly dependent.
3. Is it true that if x, y and z are linearly independent vectors then so also are the vectors x + y, y + z and z + x?
4. Test whether the following families of vectors of R^4 are linearly independent.
(i) a_1 = (1, 0, 2, 3), a_2 = (1, 6, -16, -21), a_3 = (4, 3, -1, 0).
(ii) a_1 = (1, 2, -1, 0), a_2 = (0, 1, 3, 1), a_3 = (0, 0, 4, 3), a_4 = (0, 0, 0, 6).
(iii) a_1 = (1, 1, 0, 0), a_2 = (1, 0, -1, 0), a_3 = (0, 1, 0, 1), a_4 = (1, 0, 0, 1), a_5 = (3, 2, -1, 2).
5. Find maximal subfamilies of linearly independent vectors of the following families:
(i) a_1 = (1, 0, 1), a_2 = (3, 0, 0), a_3 = (2, 0, 1) of R^3.
(ii) b_1 = 1, b_2 = √2, b_3 = √2 - √3, b_4 = √6, b_5 = 2 + √3 of R^1.
(iii) c_1 = 1, c_2 = X, c_3 = X^2, c_4 = X^2 - X, c_5 = X + 1 of R[X].
6. Consider the canonical base (e_1, e_2, ..., e_n) of R^n. Show that the vectors
f_1 = e_1, f_2 = e_1 + e_2, ..., f_n = e_1 + ... + e_n
form a base of R^n and so also the vectors
g_1 = f_1, g_2 = f_1 + f_2, ..., g_n = f_1 + f_2 + ... + f_n.
Express e_n as a linear combination of f_1, ..., f_n and as a linear combination of g_1, ..., g_n.
7. Show that the vector x = (6, -7, 3) is a linear combination of the vectors a = (2, -3, 1), b = (3, -5, 2), c = (1, 1, 0). Do the vectors a, b, c form a base of R^3?
8. Let a = (1, 1, 1), b = (1, 1, 2), c = (1, 2, 3), d = (2, 1, -3), e = (3, 2, -5), f = (1, -1, -1). Show that B = (a, b, c) and C = (d, e, f) are bases of R^3. Find the coordinates of the vectors x = (6, 9, 14), y = (6, 2, -7), z = (0, 0, 0) relative to B and to C.
9. Let F be the real linear space of all real-valued functions defined on the closed interval [a, b] (a ≠ b). Are the families (…) and (1, sin^2 t, cos^2 t) linearly independent?
10. Let a and b be linearly independent vectors of a real linear space X and let u = c_11 a + c_12 b, v = c_21 a + c_22 b. Show that u, v are linearly independent if and only if c_11 c_22 - c_12 c_21 ≠ 0.
11. Let W be the set of all (x_1, ..., x_5) in R^5 which satisfy
2x_1 - x_2 + 4x_3 - x_4 = 0
x_1 + 3x_3 - x_5 = 0
9x_1 - 3x_2 + 6x_3 - 3x_4 - 3x_5 = 0.
Find a finite set of vectors which generates W.
12. Let x = (α_1, α_2) and y = (β_1, β_2) be vectors of R^2 such that α_1 β_1 + α_2 β_2 = 0 and α_1^2 + α_2^2 = β_1^2 + β_2^2 = 1. Prove that B = (x, y) is a base for R^2.
13. Find two bases of C^4 that have no vector in common so that one of them contains the vectors (1, 0, 0, 0) and (1, 1, 0, 0) and the other contains the vectors (1, 1, 1, 0) and (1, 1, 1, 1).
14. Determine the dimension of the linear space R^+ of Exercise 6 of §1.
15. Determine the dimension of the complex linear space of Exercise 7 of §1.
16. Let X be the linear space of all real-valued functions of a real variable t. Prove that the functions e^t and e^{2t} are linearly independent.
17. Vectors x_1, ..., x_p of R^n are said to be in echelon if the number of consecutive zero components in x_i, starting from the first component, increases strictly with i. Prove that non-zero vectors x_1, ..., x_p are linearly independent when they are in echelon.
18. Let α_1, ..., α_m be distinct real numbers. Show that the vectors
x_1 = (1, α_1, α_1^2, ..., α_1^n)
x_2 = (1, α_2, α_2^2, ..., α_2^n)
.......................
x_m = (1, α_m, α_m^2, ..., α_m^n)
are linearly independent vectors of R^{n+1} if n ≥ m-1.
19. Let R[T]_n be the linear space of all polynomials in the indeterminate T with real coefficients and of degree ≤ n-1.
(a) Prove that g_1 = 1, g_2 = T, ..., g_n = T^{n-1} form a base of R[T]_n.
(b) Prove that if α_1, ..., α_n are n distinct real numbers, then the polynomials
f_i = (T - α_1) ... (T - α_{i-1})(T - α_{i+1}) ... (T - α_n) (i = 1, 2, ..., n)
form a base of R[T]_n.
(c) Express each f_i as a linear combination of g_1, ..., g_n.
(d) Express each g_i as a linear combination of f_1, ..., f_n.
20. Let X be a real vector space. Consider the set X_C = {(x, y): x, y ∈ X}. Show that X_C is a complex linear space with respect to the addition and scalar multiplication below:
(x, y) + (x', y') = (x + x', y + y')
(λ + iμ)(x, y) = (λx - μy, λy + μx).
Show also that dim_R X = dim_C X_C. X_C is called the complexification of X.

§ 3. Infinite-Dimensional Linear Spaces
The study of infinite-dimensional linear spaces occupies nowadays an important place in mathematics. Its development has been largely influenced by demands of functional analysis and theoretical physics,
where applications make use of the notion of topological linear space. Here we shall not touch upon notions of topology and restrict ourselves to purely algebraic results. In formulating and proving the theorems below, extra set-theoretical knowledge will be needed. Readers who are not accustomed to these set-theoretical ideas and study linear algebra for the first time
are advised to omit the present §3 entirely. Only the finite-dimensional counterparts of the theorems 3.1, 3.2 and 3.3 will be used in the main development of the theory in the later chapters.
A. Existence of base
We shall prove in this section the basic result that every linear space has a base. For the proof of this, we shall make use of ZORN's lemma. We remark that ZORN's lemma, which is essentially an axiom in set theory, is a very powerful and convenient tool for handling infinite sets. Let A be a set and 𝒞 a non-empty set of subsets of A. A subset 𝒦 of 𝒞 is called a chain in 𝒞 if C ⊂ D or D ⊂ C for any C, D ∈ 𝒦. An upper bound of 𝒦 in 𝒞 is any U ∈ 𝒞 such that C ⊂ U for all C ∈ 𝒦. Any M ∈ 𝒞 that is not a proper subset of any element of 𝒞 is called a maximal element of 𝒞. In this setting, ZORN's lemma is formulated as follows.
If every chain 𝒦 of 𝒞 has an upper bound in 𝒞, then 𝒞 has a maximal element.
We have seen in the discussion following definition 2.4 that a family of linearly independent vectors is essentially the same as a set of linearly independent vectors. Therefore a base of a linear space X is a maximal set of linearly independent vectors of X. THEOREM 3.1. Every linear space has a base.
PROOF. Let X be a linear space. If X is finite-dimensional, then by definition, X has a finite base and there is nothing more to be said. Assume now X is infinite-dimensional and let 𝒞 be the set of all subsets of linearly independent vectors of X. By our assumption on X, we get X ≠ 0. Therefore 𝒞 is non-empty; for instance, {x} ∈ 𝒞 for every non-zero vector x of X. We shall show that every chain in 𝒞 has an upper bound in 𝒞. Let 𝒦 be a chain in 𝒞. Then we consider the union U of all C ∈ 𝒦. Since 𝒦 is a chain, for any n (n ≠ 0) vectors x_1, ..., x_n of U there is a member C of 𝒦 such that x_i ∈ C for i = 1, ..., n. Since C belongs to 𝒞, the vectors x_1, ..., x_n are linearly independent. This means that U is a set of linearly independent vectors of X and therefore U ∈ 𝒞 and U is an upper bound of the chain 𝒦. By ZORN's lemma, 𝒞 has a maximal element M. M is then a maximal set of linearly independent vectors of X, and hence a base of X.
We can now generalize 2.10 to the supplementation theorem below.
THEOREM 3.2. Let B be a base of a linear space X and S a set of linearly independent vectors of X. Then there exists a subset B' of B such that S ∩ B' = 0 and S ∪ B' is a base of X.
PROOF. Let 𝒞 be the set of subsets of S ∪ B such that C belongs to 𝒞 if and only if S ⊂ C and C is linearly independent. Following an argument similar to that used in the proof of 3.1, we have no difficulty in proving that 𝒞 has a maximal element M. M is clearly a linearly independent set of generators of X and hence a base of X. Therefore B' = M \ S satisfies the requirements of the theorem.
B. Dimension
The general theorem on the invariance of dimensionality of linear
space does not follow immediately from the supplementation theorem as in the finite-dimensional case. For the formulation and
the proof of this theorem, certain results of set theory are necessary. To each set S there is associated a unique set Card(S) called the cardinal number of S in such a way that for any two sets S and T, Card(S) = Card(T) if and only if S and T are equipotent, i.e., there is a bijective mapping between them. When S is equipotent to a subset of T, or equivalently when there is a surjective mapping of T onto S, then we write Card(S) ≤ Card(T). The well-known SCHRÖDER-BERNSTEIN Theorem states that if Card(S) ≤ Card(T) and Card(T) ≤ Card(S), then Card(S) = Card(T). In what follows, we shall also make use of the following theorem: If A is an infinite set and 𝒯 a set of finite subsets of A such that every element x of A belongs to some element S of 𝒯, then Card(A) = Card(𝒯). The general theorem on the invariance of dimensionality is given as follows.
THEOREM 3.3. Let X be a linear space. Then for any two bases B and C of X, Card(B)= Card(C).
PROOF. For a finite-dimensional linear space X, this theorem clearly holds. Therefore we may assume X to be infinite-dimensional. In this case, B and C are both infinite sets. For the sets ℱ(B) and ℱ(C) of all finite subsets of B and C respectively, we get Card(ℱ(B)) = Card(B) and Card(ℱ(C)) = Card(C). For every S ∈ ℱ(B), we construct T_S = {c ∈ C: c is a linear combination of vectors of S}. By 2.10, we see that T_S is finite, hence T_S ∈ ℱ(C). Therefore a mapping Φ: ℱ(B) → ℱ(C) is defined by the relation that Φ(S) = T_S for all S ∈ ℱ(B). Denoting the direct image of Φ by 𝒯, we obtain Card(𝒯) ≤ Card(ℱ(B)). Since B is a base of X, every vector c of C is a linear combination of a finite number of vectors of B. This means that every vector c of C belongs to some T_S of 𝒯, therefore Card(C) = Card(𝒯) ≤ Card(ℱ(B)) = Card(B). By symmetry, we obtain also Card(B) ≤ Card(C). Therefore Card(B) = Card(C) by the SCHRÖDER-BERNSTEIN Theorem.
DEFINITION 3.4. The dimension of a linear space X over A, denoted by dim_A X or simply by dim X, is equal to Card(B) where B is any base of X.
C. Exercises
1. Show that in the linear space R[T] of polynomials, P = (p_0, ..., p_k, ...) and Q = (q_0, ..., q_k, ...), where p_k = T^k and q_k = (T - λ)^k, λ ≠ 0 being a constant, are two bases. Express the vectors q_k explicitly in terms of the p_k.
2. Prove that the linear space F of all real-valued functions defined on the closed interval [a, b], where a ≠ b, is an infinite-dimensional linear space.
3. Let a be any cardinal number. Prove that there is a linear space A over A such that dim A = a.
§ 4. Subspace
A. General properties
Many linear spaces are contained in larger linear spaces. For instance the real linear space R^n is a part of the real linear space RC^{2n}. Again, the linear space V^2 of vectors in a plane is a part of the linear space V^3 of vectors of a euclidean space. In the linear space R[T] of all polynomials with real coefficients in an indeterminate T, the set of all polynomials of degree less than a fixed positive integer n constitutes an n-dimensional linear space. These examples suggest the concept of a subspace. A subset Y of a linear space X over A is called a subspace of X if Y is itself a linear space over the same A with respect to the addition and the scalar multiplication of X. More precisely, Y is a subspace of X if x + y and λx belong to Y for every x, y ∈ Y and every scalar λ ∈ A, and the axioms [G1] to [G4] and [M1] to [M3] are satisfied. It follows immediately from this definition that if Y is a subspace of a linear space X, then the subset Y of the set X is non-empty. For any linear space X, the subspace 0 = {0} consisting of the zero vector alone is the smallest (in the sense of inclusion) subspace of X and X itself is the largest (in the sense of inclusion) subspace of X. We shall see presently that if dim X > 1, then X has subspaces other than 0 and X.
THEOREM 4.1. Let X be a linear space over A. Then Y is a subspace of X if and only if Y is a non-empty subset of X and λx + μy belongs to Y for any vectors x, y of Y and any scalars λ, μ of A.
PROOF. If Y is a subspace, then clearly Y is non-empty and contains all linear combinations of any two vectors of Y. Conversely, since x + y and λx are linear combinations of the vectors x and y of Y, they belong to Y. The axioms [G1], [G2], [M1], [M2] and [M3] are obviously satisfied by Y since they are satisfied by X. Moreover, 0
and -x for every x of Y are linear combinations of vectors of Y. Therefore [G3] and [G4] are also satisfied, and hence Y is a subspace of X.
EXAMPLE 4.2. Let X be a linear space and S any subset (or any family of vectors) of X. Then the set Y of all linear combinations of vectors of S is obviously a subspace of X. Y is called the subspace of X generated or spanned by S. In particular 0 is the subspace generated by 0.
The following theorems give information on the dimension of a subspace.
THEOREM 4.3. If Y is a subspace of a linear space X, then dim Y ≤ dim X.
PROOF. By 3.1, X and Y have bases, say B and C respectively. By 3.3 and 3.4, we have dim X = Card B and dim Y = Card C. Since every vector of C is a linear combination of vectors of B, we have Card C ≤ Card B from the proof of 3.3. Therefore dim Y ≤ dim X.
For finite-dimensional linear spaces we have a more precise result.
THEOREM 4.4. Let Y be a subspace of a finite-dimensional linear space X. Then dim Y ≤ dim X. Furthermore, dim Y = dim X if and only if Y = X.
PROOF. The first part of the theorem is a special case of 4.3. However, we shall give a new proof for this special case without resorting to the results of §3. For this, it is sufficient to show that Y has a base. If Y = 0, then the empty set 0 is a base of Y. For Y ≠ 0, we can find a non-zero vector x_1 ∈ Y. The set S_1 = {x_1} is then linearly independent. If S_1 generates Y, then S_1 is a base of Y; otherwise, we can find a vector x_2 of Y that is not a linear combination of vectors of S_1. By 2.7, the set S_2 = {x_1, x_2} is linearly independent. If S_2 generates Y, then S_2 is a base of Y; otherwise we can proceed to find x_3, and so forth. Now this procedure has to terminate after no more than n = dim X steps, since every set of n+1 vectors of X is linearly dependent. Therefore Y has a base of no more than n vectors. The first part of the theorem is established. The second part of the theorem follows immediately from 2.11.
B. Operations on subspaces
The set ℒ(X) of all subspaces of a linear space X over A is (partially) ordered by inclusion. Thus for any three subspaces Y', Y'' and Y''' of X, we get
if Y' ⊃ Y'' and Y'' ⊃ Y''', then Y' ⊃ Y''';
Y' = Y'' if and only if Y' ⊃ Y'' and Y'' ⊃ Y'.
We shall introduce two operations in ℒ(X). For any two subspaces Y' and Y'' of X, the intersection
Y' ∩ Y'' = {x ∈ X: x ∈ Y' and x ∈ Y''}
is, by 4.1, also a subspace of X. In the sense of inclusion, Y' ∩ Y'' is the largest subspace of X that is included in both Y' and Y''. By this, we mean that (i) Y' ⊃ Y' ∩ Y'' and Y'' ⊃ Y' ∩ Y'', and (ii) if Y' ⊃ Z and Y'' ⊃ Z for some subspace Z of X, then Y' ∩ Y'' ⊃ Z.
Dual to the intersection of subspaces, we have the sum of subspaces. In general, the set-theoretical union Y' ∪ Y'' of two subspaces Y' and Y'' of X fails to be a subspace of X. Consider now the subspace of X generated by the subset Y' ∪ Y'' of X. This subspace, denoted by Y' + Y'' and called the sum or the join of Y' and Y'', is clearly the smallest subspace of X that includes both Y' and Y''. By this, we mean that (iii) Y' + Y'' ⊃ Y' and Y' + Y'' ⊃ Y'', and (iv) if Z ⊃ Y' and Z ⊃ Y'' for some subspace Z of X, then Z ⊃ Y' + Y''. It is not difficult to see that
Y' + Y'' = {z ∈ X: z = x + y for some x ∈ Y' and y ∈ Y''}.
The following rules hold in ℒ(X). For any three elements Y', Y'' and Y''' of ℒ(X),
(a) (Y' ∩ Y'') ∩ Y''' = Y' ∩ (Y'' ∩ Y''') and (Y' + Y'') + Y''' = Y' + (Y'' + Y''');
(b) Y' ∩ Y'' = Y'' ∩ Y' and Y' + Y'' = Y'' + Y';
(c) Y' ∩ 0 = 0, Y' ∩ X = Y' and Y' + 0 = Y', Y' + X = X;
(d) if Y' ⊃ Y''', then Y' ∩ (Y'' + Y''') = (Y' ∩ Y'') + Y'''.
On the dimensions of the intersection and the sum of subspaces we have the following useful theorem.
THEOREM 4.5. Let Y' and Y'' be subspaces of a finite-dimensional linear space X. Then dim Y' + dim Y'' = dim(Y' + Y'') + dim(Y' ∩ Y'').
PROOF. If x_1, ..., x_r form a base of Y' ∩ Y'', then these vectors can be supplemented by vectors y_1, ..., y_s of Y' to form a base of Y', and by vectors z_1, ..., z_t of Y'' to form a base of Y''. The theorem follows if we prove that the vectors x_1, ..., x_r, y_1, ..., y_s, z_1, ..., z_t form a base of the sum Y' + Y''. Clearly these vectors generate the subspace Y' + Y'', and therefore it remains to be proved that they are linearly independent. Assume that
(1) λ_1 x_1 + ... + λ_r x_r + μ_1 y_1 + ... + μ_s y_s + ν_1 z_1 + ... + ν_t z_t = 0;
then the vector
(2) x = λ_1 x_1 + ... + λ_r x_r + μ_1 y_1 + ... + μ_s y_s ∈ Y'
(3) x = -ν_1 z_1 - ... - ν_t z_t ∈ Y''
is a vector of Y' ∩ Y''. Since the vectors x_1, ..., x_r form a base of Y' ∩ Y'', we get from (2) μ_1 = μ_2 = ... = μ_s = 0. Hence (1) becomes
λ_1 x_1 + ... + λ_r x_r + ν_1 z_1 + ... + ν_t z_t = 0.
But the vectors x_1, ..., x_r, z_1, ..., z_t form a base of Y'', therefore λ_1 = ... = λ_r = ν_1 = ... = ν_t = 0. Hence the r+s+t vectors in question are linearly independent.
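Theorem 4.5 can be verified numerically for subspaces of R^n given by generators; the following sketch (assuming the NumPy library; the null-space computation via the singular value decomposition is just one way to reach the intersection) illustrates the dimension formula:

    import numpy as np

    def nullspace(M, tol=1e-10):
        """Orthonormal base of the kernel of M, read off from the SVD."""
        _, s, vh = np.linalg.svd(M)
        rank = int(np.sum(s > tol))
        return vh[rank:].T

    def dim(generators):
        return np.linalg.matrix_rank(np.array(generators, dtype=float))

    # Two subspaces of R^4, each given by a list of generators (rows).
    Y1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]], dtype=float)
    Y2 = np.array([[0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)

    dim_sum = dim(np.vstack([Y1, Y2]))          # Y1 + Y2 is generated by all rows

    # Intersection: v = Y1.T @ a = Y2.T @ b, so (a, b) runs over the kernel of M.
    M = np.hstack([Y1.T, -Y2.T])
    N = nullspace(M)
    a_part = N[:Y1.shape[0], :]
    dim_int = np.linalg.matrix_rank(Y1.T @ a_part) if N.size else 0

    assert dim(Y1) + dim(Y2) == dim_sum + dim_int     # Theorem 4.5
    print(dim(Y1), dim(Y2), dim_sum, dim_int)         # 3 2 4 1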
C. Direct sum
We have seen that for any two subspaces Y' and Y'' of a linear space X we have
Y' + Y'' = {z ∈ X: z = x + y for some x ∈ Y' and some y ∈ Y''}.
The representation z = x + y of a vector of the sum by a sum of vectors of the summands is not unique in general. Take, for instance, the case where Y' = Y'' = X. Let us assume that for a vector z ∈ Y' + Y'', we have
z = x + y = x' + y'
where x, x' are vectors of Y' and y, y' are distinct vectors of Y''. Then the non-zero vector
t = x - x' ∈ Y', t = y' - y ∈ Y''
belongs to the intersection Y' ∩ Y''. Conversely, if t is a non-zero vector of Y' ∩ Y'', then for each z = x + y of Y' + Y'', where x ∈ Y' and y ∈ Y'', we get z = (x + t) + (y - t) = x + y. Thus Y' ∩ Y'' = 0 is a necessary and sufficient condition for the unique representation of each vector of Y' + Y'' as a sum of a vector of Y' and a vector of Y''. This state of affairs suggests the following definition.
DEFINITION 4.6.
Let Y' and Y" be subspaces of a linear space X.
Then the sum Y' + Y" is called a direct sum and is denoted by Y'® Y"
if for every vector x of Y' + Y" there is one and only one vector y'
of Y' and there is one and only one vector y" of Y" such that x = y' + y". It follows from the discussion above that Y' + Y" is a direct sum if and only if Y' fl Y" = 0. Furthermore for a direct sum Y' a 1"', the union B' U B" of any base B' of Y' and any base B" of Y" is a base of Y'9 Y". Therefore dim Y' e Y" = dim Y'+ dim Y". A straightforward extension of the idea of direct sum to that of more than two subspaces is as follows:
DEFINITION 4.7. Let Y_i (i = 1, ..., p) be subspaces of a linear space X. Then the sum Y_1 + ... + Y_p is called a direct sum and is denoted by Y_1 ⊕ ... ⊕ Y_p if for every vector x of Y_1 + ... + Y_p there is one and only one vector y_i of Y_i (i = 1, ..., p) such that x = y_1 + ... + y_p.
Similarly for Y_1 ⊕ ... ⊕ Y_p, the union B_1 ∪ ... ∪ B_p of bases B_i of Y_i is a base of Y_1 ⊕ ... ⊕ Y_p. We remark here that the direct sum Y_1 ⊕ ... ⊕ Y_p requires Y_j ∩ (Y_1 + ... + Ŷ_j + ... + Y_p) = 0 for j = 1, ..., p, and not just Y_i ∩ Y_j = 0 for i ≠ j. In V^2, let L_1, L_2 and L_3 be subspaces generated by the non-zero vectors (O, A_1), (O, A_2) and (O, A_3) where the points O, A_i, A_j are not collinear for i ≠ j. Then the sum L_1 + L_2 + L_3 is clearly not a direct sum but L_i ∩ L_j = 0 for i ≠ j.
The importance of direct sum is much enhanced by the possibility that, for any subspace Y' of a linear space X, we can always find a subspace Y'' of X, called a complementary subspace of Y' in X, such that X = Y' ⊕ Y''. Indeed, if B' is a base of Y', then, by the supplementation theorem 3.2 or 2.9 for the finite-dimensional case, we have a set B'' of linearly independent vectors of X such that B' ∩ B'' = 0 and B' ∪ B'' is a base of X. The subspace Y'' generated by B'' clearly satisfies the required condition that X = Y' ⊕ Y''. We have therefore proved the following theorem.
THEOREM 4.6. For any subspace Y' of a linear space X, there is a subspace Y'' of X, called complementary subspace of Y' in X, such that X = Y' ⊕ Y''.
It follows that if Y'' is a complementary subspace of Y' in X then Y' is a complementary subspace of Y'' in X. Furthermore dim X = dim Y' + dim Y''; in particular, when X is finite-dimensional,
dim Y'' = dim X - dim Y'.
We remark that a complementary subspace is not unique (except in trivial cases). Indeed from the example above, we see that L_1 ⊕ L_2 = V^2, L_1 ⊕ L_3 = V^2 and L_2 ⊕ L_3 = V^2. Finally if X = Y_1 ⊕ ... ⊕ Y_p, then we can study the subspaces Y_i individually to obtain information on X. Therefore the subspaces Y_i can be regarded, in a certain sense, as linearly independent subspaces of X, and the formation of direct sum as putting these independent components together to form X.
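The construction of a complementary subspace by supplementing a base, as in the argument above, can be imitated numerically; a sketch assuming the NumPy library, in which the supplementing vectors are drawn from the canonical base of R^n:

    import numpy as np

    def complement(generators, n):
        """Supplement the subspace spanned by `generators` to the whole of R^n
        by adjoining canonical base vectors one at a time; the adjoined
        vectors generate a complementary subspace."""
        vectors = [np.asarray(v, dtype=float) for v in generators]
        added = []
        for e in np.eye(n):
            if np.linalg.matrix_rank(np.array(vectors + [e])) > \
               np.linalg.matrix_rank(np.array(vectors)):
                vectors.append(e)
                added.append(e)
        return np.array(added)

    Y1 = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]    # a 2-dimensional subspace of R^3
    Y2 = complement(Y1, 3)
    print(Y2)                                  # one vector; span(Y2) is a complement
    print(np.linalg.matrix_rank(np.vstack([Y1, Y2])))   # 3 = dim R^3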
D. Quotient space
Similar to the calculation of integers modulo a fixed integer, we have the arithmetic of vectors of a linear space modulo a fixed subspace. Let Y be a subspace of a linear space X over A. Then we say that two vectors x and y of X are congruent modulo Y if and only if x - y ∈ Y. In this case, we write
x ≡ y (mod Y).
The congruence modulo Y is easily seen to be reflexive, symmetric and transitive: for any three vectors x, y and z of X,
(a) x ≡ x (mod Y);
(b) if x ≡ y (mod Y), then y ≡ x (mod Y);
(c) if x ≡ y (mod Y) and y ≡ z (mod Y), then x ≡ z (mod Y).
Furthermore, the congruence modulo Y is compatible with addition and scalar multiplication in X. By this, we mean that if x ≡ y (mod Y) and s ≡ t (mod Y), then
(d) x + s ≡ y + t (mod Y),
(e) λx ≡ λy (mod Y) for all scalars λ ∈ A.
Therefore, by (a), (b) and (c), the congruence modulo Y defined in X is an equivalence relation. With respect to this equivalence relation, we have a partition of X into equivalence classes
[x] = {a ∈ X: x ≡ a (mod Y)} where x ∈ X.
In the set X/Y of all equivalence classes [x] (x ∈ X), an addition and a scalar multiplication are defined by
[x] + [s] = [x + s] and λ[x] = [λx].
By (d) and (e), these two composition laws are well-defined. Moreover, it is readily verified that the set X/Y constitutes a linear space over A called quotient space of X by the subspace Y. For any y ∈ Y, and in particular for y = 0, the equivalence class [y] is the zero vector of the quotient space X/Y; similarly the equivalence class [-x] is the additive inverse of the equivalence class [x] for any x ∈ X.
Consider the linear space V^2 of all vectors on the ordinary plane with common initial point at the origin O. If we represent graphically every vector of V^2 by its endpoint, then a 1-dimensional subspace is represented by a straight line passing through the origin, and conversely every straight line passing through the origin represents in this way a 1-dimensional subspace of V^2. Let L be a 1-dimensional subspace represented by the straight line l. Then it is easily seen that elements of the quotient space V^2/L are represented by straight lines parallel to l, and that the algebraic structure on V^2/L corresponds to that of the 1-dimensional subspace G represented by the straight line g passing through the origin and perpendicular to l.
It follows from the definition that the quotient space X/0 is essentially the same as X itself whereas X/X = 0. In a finite-dimensional linear space X, we get dim X/Y = dim X - dim Y. To see this we use a base (x_1, ..., x_p, x_{p+1}, ..., x_n) such that (x_1, ..., x_p) is a base of Y. Then ([x_{p+1}], ..., [x_n]) is easily seen to form a base of X/Y.
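Congruence modulo a subspace and the dimension formula for X/Y can be illustrated numerically; a minimal sketch assuming the NumPy library:

    import numpy as np

    # Quotient of X = R^3 by the subspace Y spanned by (1, 1, 0).
    Y = np.array([[1.0, 1.0, 0.0]])

    def congruent_mod_Y(x, y):
        """x ≡ y (mod Y) iff x - y lies in the span of Y."""
        d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
        return np.linalg.matrix_rank(np.vstack([Y, d])) == np.linalg.matrix_rank(Y)

    x = np.array([2.0, 3.0, 5.0])
    print(congruent_mod_Y(x, x + 4 * Y[0]))                    # True: same class [x]
    print(congruent_mod_Y(x, x + np.array([0.0, 0.0, 1.0])))   # False

    dim_X, dim_Y = 3, np.linalg.matrix_rank(Y)
    print(dim_X - dim_Y)                                       # dim X/Y = 2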
Exercises
1.
In R4, L1
is the
subspace
generated by (1,2,0,1),
(-1, 2, 0,-3) and (1, 0, 0, 2), and L2 is the subspace generated
42
1
LINEAR SPACE
2.
by (0, 2, 0,-1), (0, 1, 0, 1) and (2, 5, 0, 3). Find the dimensions of L1, L2, L, n L2 and L, +L2In C', Let H, be the subspace generated (1, 0, 0) and (1, 1, 0) and let H2 be the subspace generated by (1, 0,-1) and (1, 1, 1). Find the intersection H, n H2 and determine its dimension.
3. Let a = (-1, 1, 0), b = (1, 0, 1), c = (0, 1, 1) be vectors of R^3. Show that the subspace X generated by a, b, c has dimension 2. Find a general expression of vectors of X using 2 independent parameters s and t.
4. Let X be a linear space and S a subset of X. Show that the following statements are equivalent:
(i) Y is the subspace of X generated by S.
(ii) Y is the intersection of all subspaces of X that include S as a subset.
(iii) Y is the smallest (in the sense of inclusion) subspace of X that includes S as a subset.
5. Let A' and A'' be subspaces of a linear space A. Prove that if A = A' ∪ A'' then A = A' or A = A''.
6. Let A' and A'' be subspaces of a linear space A. Prove that A' ∪ A'' is a subspace of A if and only if A' ⊃ A'' or A' ⊂ A''.
7. Let X_1, ..., X_n be proper subspaces of a linear space X. Show that there is at least one vector of X which belongs to each X_i and there is at least one vector of X which does not belong to any X_i.
8. Let X = X_1 ⊕ X_2 and Y = Y_1 ⊕ Y_2. Show that X × Y = (X_1 × Y_1) ⊕ (X_2 × Y_2).
9. Let A and B be subspaces of the n-dimensional complex arithmetical linear space C^n. Show that
(a) A ∩ R^n is a subspace of the n-dimensional real arithmetical linear space R^n.
(b) 2 dim_C A - n ≤ dim_R (A ∩ R^n) ≤ dim_C A.
(c) If dim_C A = dim_R (A ∩ R^n), dim_C B = dim_R (B ∩ R^n) and A ∩ R^n = B ∩ R^n, then A = B.
10. Let A be a linear space and (X_i)_{i∈I} a family of subspaces of A.
(a) Prove that the intersection X = ∩_{i∈I} X_i = {x ∈ A: x ∈ X_i for all i ∈ I} is a subspace of A.
(b) What is X if I = 0?
11. Let (x_i)_{i∈I} be a family of vectors of a linear space A and X_i the 1-dimensional subspace of A generated by x_i for each i ∈ I. Prove that the family (x_i)_{i∈I} is linearly independent if and only if (i) x_i ≠ x_j for i ≠ j and (ii) for each j ∈ I, X_j ∩ Y_j = 0 where Y_j is the subspace of A generated by all x_i where i ∈ I, i ≠ j.
12.
Let A be a linear space. Show that for any subspaces A_1, A_2 and A_3 of A the following equations hold.
(i) (A_1 + A_2) + A_3 = A_1 + (A_2 + A_3)
(ii) A_1 + A_2 = A_2 + A_1
(iii) A_1 ∩ (A_2 + A_3) = (A_1 ∩ A_2) + (A_1 ∩ A_3) if A_1 ⊃ A_2.
Let A' be a subspace of a linear space A. Prove that (a) dim A'+ dim A/A' = dim A. (b) if B" is any base of any complementary subspace A" of A'
in A, then the set { [x"] : x"eB"} is a base of A/A'. (c)
If A' 0 0 and A' 0 A, then A' has more than one com-
plementary subspace in A. (d) all complementary subspaces A" of A' in A have the same dimension. 14.
Let X1 and X2 be subspaces of a linear space A such that x, n X2 = 0. Prove that B1 U B2 is a base of X, e X2 for any base B1 of X1 and any base B2 of X2 .
15.
Let X_1, ..., X_n be subspaces of a finite-dimensional linear space A and X = X_1 + ... + X_n their sum. Show that the following statements are equivalent.
(i) every x ∈ X has a unique representation as x = x_1 + ... + x_n where x_i ∈ X_i for i = 1, 2, ..., n.
(ii) X_j ∩ (X_1 + ... + X̂_j + ... + X_n) = 0 for j = 1, ..., n.
(iii) dim X = dim X_1 + ... + dim X_n.
(iv) X_1 ∩ X_2 = 0, (X_1 + X_2) ∩ X_3 = 0, (X_1 + X_2 + X_3) ∩ X_4 = 0, ..., (X_1 + X_2 + ... + X_{n-1}) ∩ X_n = 0.
16. Let A be a linear space and (X_i)_{i∈I} a family of subspaces of A. We define X = Σ_{i∈I} X_i as the subspace of A generated by the subset ∪_{i∈I} X_i of A.
(a) Show that each vector x of X can be written as x = Σ_{i∈I} x_i, where (x_i)_{i∈I} is a family of finite support of vectors of A such that x_i ∈ X_i for every i ∈ I.
(b) Show that the representation x = Σ_{i∈I} x_i in (a) is unique for every x ∈ X if and only if X_j ∩ (Σ_{i≠j} X_i) = 0 for each j ∈ I.
17.
Let X_1, ..., X_r be subspaces of an n-dimensional linear space X. Suppose dim X_i ≤ k for i = 1, ..., r, where k < n. Show that there exists an (n-k)-dimensional subspace Y of X such that X_i ∩ Y = 0 for all i = 1, ..., r.
18. Let X_1, ..., X_r be subspaces of a linear space X. Suppose Y is a subspace of X which is not contained in any of the subspaces X_i. Show that there is a vector y in Y which is not contained in any of the subspaces X_i.
CHAPTER II LINEAR TRANSFORMATIONS
At the beginning of the last chapter, we gave a brief description of abstract algebra as the mathematical theory of algebraic systems and,
in particular, linear algebra as the mathematical theory of linear spaces. These descriptions are incomplete, for we naturally want to
find relations among the algebraic systems in question. In other words, we also have to study mappings between algebraic systems compatible with the algebraic structure in question.
A linear space consists, by definition, of two parts: namely a non-empty set and an algebraic structure on this set. It is therefore natural to compare one linear space with another by means of a mapping of the underlying sets on the-one hand. On the other hand, such a mapping should also take into account the algebraic struc-
tures. In other words we shall study mappings between the underlying sets of linear spaces that preserve linearity. These mappings are called linear transformations or homomorphisms and they will be the chief concern of the present chapter.
§ 5. General Properties of Linear Transformation
A. Linear transformation and examples
In assigning coordinates to vectors of an n-dimensional linear space X over A relative to a base (x_1, ..., x_n), we obtained in §2E a mapping Φ: X → A^n such that, for every vector x of X, Φ(x) = (λ_1, ..., λ_n) where x = λ_1 x_1 + ... + λ_n x_n. As a mapping of the set X into the set A^n, Φ is bijective. Relative to the algebraic structure of the linear space X and of the linear space A^n, Φ has the following properties:
(i) Φ(x + y) = Φ(x) + Φ(y),
(ii) Φ(λx) = λΦ(x),
for any two vectors x and y of X and any scalar λ of A. Note that x + y
and λx are formed according to the addition and the scalar multiplication of the linear space X, while Φ(x) + Φ(y) and λΦ(x) according to those of the linear space A^n. Since the algebraic structure of a linear space is defined by its addition and scalar multiplication, the relations (i) and (ii) express that the mapping Φ is compatible with the algebraic structure of linear space. Therefore Φ is an example of the type of mapping that we are looking for.
DEFINITION 5.1. Let X and Y be linear spaces over the same A. A mapping φ of the set X into the set Y is called a linear transformation of the linear space X into the linear space Y if and only if for any vectors x and y of X and any scalar λ of A, the equations
(i) φ(x + y) = φ(x) + φ(y),
(ii) φ(λx) = λφ(x)
hold.
Note that the domain and the range of a linear transformation must be linear spaces over the same A. In other words, we do not consider as a linear transformation any mapping of a real linear space into a complex linear space even if (i) holds and (ii) also holds for all AER. Therefore, whenever we say: `4: X -> Y is a linear transformation' we mean that X and Y are linear spaces over the same A and 0 is a linear transformation of the linear space X into the linear space Y.
Since sums and scalar products are expressible as linear combinations, we can replace conditions (i) and (ii) by a single equivalent condition
(iii) φ(λx + μy) = λφ(x) + μφ(y) for any x, y ∈ X and λ, μ ∈ A.
Property (iii) is called the linearity of φ. For linear spaces over A, linear transformation, linear mapping, A-homomorphism, and homomorphism are synonymous.
EXAMPLE 5.2. Let X be a linear space over A and Y a subspace of X.
Then the inclusion mapping ι: Y → X is clearly an injective linear transformation. In particular, the identity mapping i_X: X → X is a bijective linear transformation.
EXAMPLE 5.3. Let X and Y be two linear spaces over the same A. Then the constant mapping 0: X -> Y, such that 0(x) = 0 for every xEX, is a linear transformation called the zero linear transformation. It is easily seen that the zero linear mapping is the only constant linear transformation of the linear space X into the linear space Y.
EXAMPLE 5.4. Let Y be a subspace of a linear space X and X/Y the quotient space of X by Y. Then the natural surjection η: X → X/Y defined by η(x) = [x] for every x ∈ X is a surjective linear transformation.
EXAMPLE 5.5. If φ: X → Y is a linear transformation and X_1 is a subspace of X then the restriction φ_1 = φ|X_1 of the mapping φ to the subset X_1 is clearly a linear transformation φ_1: X_1 → Y of linear spaces.
EXAMPLE 5.6. In §4C we have seen that if X = X_1 ⊕ X_2, then the direct summands X_i can be regarded as independent components of X which are put together to form X by a direct sum formation. Following this idea, we consider a direct sum decomposition X = X_1 ⊕ X_2 of the domain of a linear transformation φ: X → Y. By 5.5 above, we get restrictions φ_1: X_1 → Y and φ_2: X_2 → Y of φ. On the other hand for each vector x of X we can write x = x_1 + x_2 with unique x_i ∈ X_i. Now
φx = φ(x_1 + x_2) = φx_1 + φx_2 = φ_1 x_1 + φ_2 x_2.
In other words φ can be recovered from the restrictions φ_i. Conversely if ψ_i: X_i → Y are linear transformations, then we define a mapping ψ: X → Y by
ψx = ψ(x_1 + x_2) = ψ_1 x_1 + ψ_2 x_2.
If y = y_1 + y_2 with y_i ∈ X_i, then for arbitrary scalars λ and μ, λx + μy = (λx_1 + μy_1) + (λx_2 + μy_2) is the unique representation of λx + μy as a sum of vectors of the direct summands. Consequently
ψ(λx + μy) = ψ_1(λx_1 + μy_1) + ψ_2(λx_2 + μy_2)
= λψ_1 x_1 + μψ_1 y_1 + λψ_2 x_2 + μψ_2 y_2
= λ(ψ_1 x_1 + ψ_2 x_2) + μ(ψ_1 y_1 + ψ_2 y_2)
= λψx + μψy.
Therefore ψ is a linear transformation. We have: If X = X_1 ⊕ X_2, then there is a one to one correspondence between linear transformations φ: X → Y and pairs of linear transformations φ_i: X_i → Y (i = 1, 2).
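Example 5.6 can be made concrete in the arithmetical case: writing R^4 = X_1 ⊕ X_2 with X_1 = span(e_1, e_2) and X_2 = span(e_3, e_4), a linear transformation into R^2 is a 2 × 4 matrix and its restrictions are the two 2 × 2 blocks. A sketch assuming the NumPy library:

    import numpy as np

    # A linear transformation R^4 -> R^2 as a matrix, and its restrictions to
    # the two direct summands X1 = span(e1, e2) and X2 = span(e3, e4).
    phi = np.array([[1.0, 2.0, 0.0, 1.0],
                    [0.0, 1.0, 3.0, 0.0]])
    phi1, phi2 = phi[:, :2], phi[:, 2:]

    x = np.array([1.0, -1.0, 2.0, 5.0])
    x1, x2 = x[:2], x[2:]                    # unique decomposition x = x1 + x2
    assert np.allclose(phi @ x, phi1 @ x1 + phi2 @ x2)   # φx = φ1 x1 + φ2 x2
    print(phi @ x)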
&LE 5.7. Following through the idea of Example 5.6, we can further write X as a direct sum of 1-dimensional sub-spaces. Analogously we see that 0 can be recovered from its restrictions on these 1-dimensional direct summands of X. This prompts us to consider linear transformations of 1-dimensional linear spaces. Let L be a
1-dimensional linear space and π: L → Y be a linear transformation. For a fixed non-zero vector z of L, let y = πz be its image under π. Since L has dimension 1, (z) is a base of L and every t ∈ L can be written as t = λz with a unique scalar λ. Now πt = π(λz) = λ(πz) = λy. Thus π is completely determined by linearity and its value at a non-zero vector.
Conversely let u be any non-zero vector of L and v any vector of Y. Then we obtain a linear transformation τ: L → Y through an extension by linearity of associating v with u: for t = μu of L, define τ(t) = μv. Linearity of τ is easily verified.
EXAMPLES 5.8.
(a) Let X be an m-dimensional linear space over A with base (x_1, ..., x_m) and Y any linear space over the same A. Then each family (y_1, ..., y_m) of m vectors of Y determines uniquely a linear transformation φ: X → Y such that φ(x_i) = y_i for i = 1, ..., m. Indeed, for each vector x ∈ X, we have a unique representation x = λ_1 x_1 + ... + λ_m x_m. In putting φ(x) = λ_1 y_1 + ... + λ_m y_m, we obtain a linear transformation φ: X → Y with the required property that φ(x_i) = y_i for every i = 1, ..., m. If φ': X → Y is another such linear transformation, then
φ'(x) = λ_1 φ'(x_1) + ... + λ_m φ'(x_m) = λ_1 y_1 + ... + λ_m y_m = φ(x).
Therefore φ' = φ and φ is unique.
(b) In particular, if both X and Y are finite-dimensional linear spaces, with bases (x_1, ..., x_m) and (y_1, ..., y_n) respectively, then each family (α_ij), i = 1, ..., m; j = 1, ..., n, of m × n scalars of A determines uniquely a linear transformation φ: X → Y such that
φ(x_i) = α_i1 y_1 + ... + α_in y_n for i = 1, ..., m.
Furthermore, every linear transformation of the linear space X into the linear space Y can be determined in this way. The family (α_ij), i = 1, ..., m; j = 1, ..., n, of scalars is usually written as
α_11 ......... α_1n
α_21 ......... α_2n
..................
α_m1 ......... α_mn
and referred to as the matrix of φ relative to the bases (x_1, ..., x_m) and (y_1, ..., y_n). (See §14A.)
(c) Using the method outlined in Example 1.13, we can construct for any two sets S and T the free linear spaces F_S and F_T generated over A by the sets S and T respectively. If φ: S → T is a mapping of sets, then we can extend φ by linearity to a linear transformation Φ: F_S → F_T by defining
Φ(s) = φs for every s ∈ S and Φ(f) = Σ_{s∈S} f(s)·φs for every f = Σ_{s∈S} f(s)·s of F_S.
EXAMPLE 5.9.
Let R[T] be the linear space of all polynomials in an indeterminate T with real coefficients. For each polynomial f of R[T], we define φ(f) = df/dT; then φ: R[T] → R[T] is a linear transformation.
EXAMPLE 5.10. Let F be the linear space of all continuous functions defined on the closed interval [a, b]. For each f of F, we define
f̄(x) = ∫_a^x f(t) dt for all x ∈ [a, b];
then φ: F → F such that φ(f) = f̄ is a linear transformation.
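Examples 5.8(b) and 5.9 can be combined: the differentiation map, restricted to polynomials of degree ≤ 3, has a matrix relative to the base (1, T, T^2, T^3). The sketch below (assuming the NumPy library, and using the now common convention in which coordinate vectors are columns rather than the row arrangement of 5.8(b)) computes with that matrix:

    import numpy as np

    # Matrix of d/dT on polynomials of degree <= 3, relative to (1, T, T^2, T^3):
    # d/dT sends T^k to k*T^(k-1), which fixes one column per base vector.
    n = 4
    D = np.zeros((n, n))
    for k in range(1, n):
        D[k - 1, k] = k

    p = np.array([5.0, 0.0, -2.0, 1.0])   # coordinates of 5 - 2T^2 + T^3
    print(D @ p)                          # coordinates of -4T + 3T^2: [0, -4, 3, 0]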
These few examples provide some of the most frequently used methods of constructing linear transformations; they also illustrate the usefulness of the concept of linear transformations. It follows immediately from the linearity that, for every linear transformation ¢: X-+ Y, (a) 0(0) = 0 and therefore (b) if x1, . . . , xp are linearly dependent vectors of X then their images 0(x1), ... , ¢(xp) are linearly dependent vectors of Y. We observe that the converse of (b) does not hold generally. In the following theorem, we shall see that the validity of the converse of (b) characterizes injective linear transformation.
THEOREM 5.11. Let φ: X → Y be a linear transformation. Then the
following statements are equivalent.
(i) φ is injective.
(ii) If x ∈ X is such that φ(x) = 0, then x = 0.
(iii) If x_1, ..., x_p are linearly independent vectors of X, then φ(x_1), ..., φ(x_p) are linearly independent vectors of Y.
PROOF. Clearly (ii) follows from (i).
(i) follows from (ii). If φ(x) = φ(x'), then φ(x - x') = 0, by the linearity of φ. Therefore, by (ii), x - x' = 0, proving (i).
(iii) follows from (ii). Let x_1, ..., x_p be linearly independent vectors of X and λ_1 φ(x_1) + ... + λ_p φ(x_p) = 0 for some scalars λ_1, ..., λ_p. Then φ(λ_1 x_1 + ... + λ_p x_p) = 0 and therefore λ_1 x_1 + ... + λ_p x_p = 0, by (ii). Since the vectors x_1, ..., x_p are linearly independent, we get λ_1 = ... = λ_p = 0. Therefore the vectors φ(x_1), ..., φ(x_p) are linearly independent.
(ii) follows from (iii). Let x ∈ X be such that φ(x) = 0. This means that the vector φ(x) of Y is linearly dependent, therefore by (iii) the vector x of X is also linearly dependent. Hence x = 0.
B. Composition
We recall that for any two sets X and Y, the mappings of the set X into the set Y form a set Map(X, Y) and if Z is an arbitrary set, then
for any φ ∈ Map(X, Y) and any ψ ∈ Map(Y, Z) a unique composite ψ∘φ ∈ Map(X, Z) is defined by
ψ∘φ(x) = ψ(φ(x)) for every x ∈ X.
It follows immediately from the definition that all linear transformations of a linear space X over A into a linear space Y over the same A constitute a non-empty set. Denote this set by Hom_A(X, Y) or simply Hom(X, Y) if no danger of confusion about A is possible. Moreover, if φ ∈ Hom(X, Y) and ψ ∈ Hom(Y, Z), then the composite ψ∘φ ∈ Hom(X, Z). Indeed, for any x, y ∈ X and scalars λ, μ of A, we get
ψ∘φ(λx + μy) = ψ(φ(λx + μy)) = ψ(λφ(x) + μφ(y))
= λψ(φ(x)) + μψ(φ(y)) = λψ∘φ(x) + μψ∘φ(y).
The composition of linear transformations is associative, that is, if φ: X → Y and ψ: Y → Z and ξ: Z → T are linear transformations, then
(ξ∘ψ)∘φ = ξ∘(ψ∘φ).
On the other hand, the composition is not commutative. By this, we mean that in general ψ∘φ ≠ φ∘ψ even if the composites ψ∘φ and φ∘ψ are defined. Take, for instance, the linear transformations φ and ψ of a 2-dimensional linear space X into X defined by
φ(x_1) = 0, φ(x_2) = x_2; ψ(x_1) = x_2, ψ(x_2) = x_1;
where x_1, x_2 form a base of X. Then ψ∘φ(x_1) = 0, whereas φ∘ψ(x_1) = x_2, and therefore ψ∘φ ≠ φ∘ψ.
The identity mapping i_X: X → X is a neutral element with respect to the composition, in the sense that for any pair of linear transformations φ: X → Y and ψ: Y → X,
φ∘i_X = φ and i_X∘ψ = ψ.
Let 0: Y - Z be a linear transformation. Then the
following statements are equivalent. (a)
0 is injective.
(b) For any linear space X and any linear transformations a, $3
of X into YifOoa=¢o$, then a=(3. PROOF.
(b) follows from (a). Let a, 0: X -+ Y be linear trans-
formations such that 0oa = Oo (3. Then for everyxeX,we get 0(a(x)) = 0(13(x)). Since 0 is injective, a(x) = $3(x) for every xEX. Therefore a=$3.
(a) follows from (b). Let y be a vector of Y such that 0(y) = 0. If y 0 0, then, by the supplementation theorem 3.2 or 2.9 in the case
where Y is finite-dimensional, we have a base {y} U S of Y. By 5.8(a), a linear transformation a: Y - Y is uniquely defined by
a(y) = y and a(s) = 0 for all seS.
52
11
LINEAR TRANSFORMATIONS
Then 0° a = 0 °0 where 0: Y - Y is the zero linear transformation. By (b), we get a = 0, contradicting the definition of a. Therefore y. = 0 and 0 is injective, by 5.8. THEOREM 5.13.
Let ¢: X -+ Y be a linear transformation. Then the
following statements are equivalent. (a)
0 is surjective.
(b)
For any linear space Z and any pair of linear transformations a,(3of Yinto Z if a0/3°0, then a=(3.
PROOF. (b) follows from (a). Let a, 0: Y -> Z be linear transformations such that a°4) = (3o 0. Since 0 is surjective, for every vector y of
Y, we get a vector x of X such that ¢(x) = y. Therefore a(y) _ a(¢(x)) = a° 4)(x) = R ° 4)(x) = (3(0(x)) = (3(y), proving a = (3. (a) follows from (b). Since 0 is a linear transformation, we see that
Im 0 is a subspace of Y. Suppose Im 0 * Y, then by 4.6 there is a subspace Y' of Y such that Y'* 0 and Y = Y' ® Im 0. Now a linear transformation a: Y -; Y is uniquely defined by
a(y' + y") = y' for all y' E Y' and y"E Im¢. Then a ° 0 = 0 ° 0 where 0: Y - Y is the zero linear transformation. By (b), we get a = 0, which is impossible, and therefore Im 4) = Y. Hence 0 is surjective.
Theorems 5.12 and 5.13 state that injective (surjective) linear transformations are left (right) cancellable.
In fact the conditions (b) of these theorems can be taken as a definition of injective (surjective) linear transformation in terms of composition of linear transformations alone. C.
Isomorphism
Bijective linear transformations are called isomorphisms. If 0: X -+ Y is an isomorphism, then the inverse mapping 0-1 of the set Y into the set X defined by
0-1(y) = x where xEX is such that 0(x) = y, is also an isomorphism called the inverse isomorphism of 0. Clearly 0-1 is bijective; therefore it remains to be shown that 0-1 is a linear
transformation. For y and y' of Y, let x and x' be the unique vectors of X such that O(x) = y and 0(x') = y'. Then for any scalars A and A' we get 0(Ax + A'x') = Ay + A'y'. Therefore 0-' (Ay '+ A'y') = Ax+ A'x' and hence 0-' is linear. = X0_' (y) + The formation of inverse isomorphisms has the following properties. (a) (b)
o O = ix and 000-' = iY. If 0: X --> Y and 0: Y -> Z are isomorphisms, then
0-1
Two linear spaces X and Y are said to be isomorphic linear spaces if and only if there exists an isomorphism between them. In this case,
we write X = Y. From the abstract point of view, two isomorphic linear spaces X and Y are essentially indistinguishable because we are not interested in the nature of the vectors of these spaces but only in their composition laws and these are essentially the same.
Through the notion of isomorphism, the notion of dimension of linear spaces gains its prominence. This state of affairs is formulated in the following important theorem.
THEOREM 5.14. Two linear spaces X and Y are isomorphic if and only if dim X = dim Y.
PROOF. Let us consider the case where X and Y are finite-dimensional. Assume that dim X = dim Y = n. For a fixed base x1 , ... , x of X and a fixed base y1, ... , yn of Y, we define a linear transformation 0: X -> Y such that 0(x1) = y; for all i = 1, ... , n. Then 0 is evidently an isomorphism.
Conversely, let 0: X i Y be an isomorphism and (x1, ... , xn) a base of X. Then, by 5.11, 0(x 1), ... , 0(xn) are linearly independent vectors of Y. Therefore, by the supplementation theorem 2.9, we can find vectors y 1, ... , yn, (m > 0) such that
(0(x1), ... is
,
O (xn ), Y 1, ... , Ym )
base of Y. Applying the inverse isomorphism 0-' of 0 this base, we get n + m linearly independent vectors
a
to X11 ... , xn , 0-1 (y 1), hence dim X = Y.
... , 0'1 (ym) of X. Therefore m = 0, and
Thus, the above theorem holds in the finite-dimensional case. The proof for the general case, being a straightforward generalization of the above one, is left to the reader.
Consequently, every n-dimensional linear space over A is An. In this isomorphic to the n-dimensional arithmetical linear space sense, An can be regarded as the prototype of n-dimensional linear space over A. However this does not imply that from now on, when dealing with finite-dimensional linear spaces, we shall restrict ourselves to the study of arithmetical linear spaces. The main objection to doing this is that each isomorphism (D of an n-dimensional linear space X onto A" defines a unique coordinate system in X. Therefore if we study A" instead of X, we are committing ourselves to a particular coordinate system of X. This would mean that with each definition and each theorem it is necessary to show the independence of the choice of the coordinate system in which the definition or the theorem is formulated. Such a proof is usually very tedious and uninteresting.
We observe that X = Y means by definition that there exists one isomorphism between the linear spaces in question. In some specific
case, we are concerned with the existence of isomorphisms that express certain relations between the linear spaces X and Y (see § 8C and § 22A). D. Kernel and image
Let φ: X → Y be a linear transformation. Then, in a most natural way, an equivalence relation ~ is defined in the non-empty set X by
x_1 ~ x_2 if and only if φ(x_1) = φ(x_2),
for every pair of vectors x_1 and x_2 of X. Consequently, the set X is partitioned into mutually disjoint equivalence classes. The class that contains the zero vector of X deserves our special attention. This is called the kernel of the linear transformation φ and is defined by
Ker φ = {z ∈ X: φ(z) = 0}.
Firstly, we shall see that the kernel of φ is a subspace of the linear space X. Since 0 ∈ Ker φ and for any z_1 and z_2 of Ker φ and any scalars λ_1 and λ_2, we have φ(λ_1 z_1 + λ_2 z_2) = λ_1 φ(z_1) + λ_2 φ(z_2) = 0, Ker φ is a subspace of X.
Moreover, due to the algebraic structure of X and the linearity of
the behaviour of 0 in relation to vectors of X is completely determined by Ker 0. More precisely: for any vectors x, and x2 of X, 0(x1) = 0(x2) if and only if xI - x2 belongs to Ker 0. In particular, a linear transformation 0 is injective if and only if Ker = 0, and 0 = 0 if and only if Ker 0 = X.
Let us now consider some subspaces of the range Y of a linear transformation 0: X -> Y. For every subspace X' of X, the direct image 0[X'] = {yEY: ¢(x) = y for some xeX'} of X under 0 is a subspace of the linear space Y. Indeed, for any two vectors yI and y2 of 0[X'], we have vectors xI and x2 of X such that 0(x1) = Yl and 0(x2) = Y21 therefore for any scalars A, and A2 , we get 0(X1 XI + A2x2) = XI0(X1) + A20(x2) = AIYL + A2y2. Since 0[X'] is not empty, 0[X'] is a subspace of Y. In particular, the image Im 0 _ 0[X] of 0 is a subspace of Y. From the discussions above, we see that the subspaces Ker 0 and Im 0 show to what extent the linear transformation 0 in question differs from an isomoprhism. NOETHER's isomorphism theorem 2.17 in § 5E will give full information on this matter.
A useful relation between the dimensions of Ker 0 and Im 0 is given in the theorem below.
THEOREM 5.15. Let X be a finite-dimensional linear space and 0: X -> Y a linear transformation. Then dim X = dim (Ker 0) + dim (im 0). PROOF. Let ( x 1 ,
. . .
,
xp) be a base of the subspace Ker 0. By the
supplementation theorem, we can extend this base of Ker 0 to a base (XI ,- .. , xp) xp+ I .... , xn) of the entire linear space X. The theorem is proved if we can show that (0xp+I ) ... , Oxn) form a
base of Im Y. Every vector of Im 0 is of the form ¢x with xeX.
+ ApXp + Ap+Ixp+I + + Apxn, Therefore if x = A1x, + + An ¢xn ; hence (Oxp+I, ... , On) then Ox = Ap+, 0xp+, + +µn¢xn = 0 for some scalars j. generates Im0.Ifpp+I Oxp+I + + Anxn)= 0. n) then by linearity 0(µp+1xp+1 + (i = p + 1, ... , Therefore the vector µp+Ixp+I + .. + Anxn belong to Ker 0. Our
choice of the base (xI,...) x,, xp+I,...,xp) shows thatAp+, = ... = An = 0. Therefore (0xp+ I , ... , On) is linearly independent and the theorem is proved.
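Theorem 5.15 can be verified numerically for a transformation given by a matrix; a sketch assuming the NumPy library, in which the dimension of the kernel is read off from the singular values:

    import numpy as np

    def nullity(A, tol=1e-10):
        """dim Ker φ for the transformation x -> A @ x, via the singular values."""
        s = np.linalg.svd(A, compute_uv=False)
        return A.shape[1] - int(np.sum(s > tol))

    A = np.array([[1.0, 2.0, 3.0, 4.0],
                  [0.0, 1.0, 1.0, 1.0],
                  [1.0, 3.0, 4.0, 5.0]])      # a map from R^4 (dim X = 4) to R^3

    dim_X = A.shape[1]
    dim_ker = nullity(A)
    dim_im = np.linalg.matrix_rank(A)
    assert dim_X == dim_ker + dim_im          # Theorem 5.15
    print(dim_X, dim_ker, dim_im)             # 4 2 2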
When the domain X of a linear transformation 0: X --> Y is finitedimensional linear spaces, a numerical rank of 0, denoted by r(¢), is defined as
r(o) = dim(Im ). Since Im 0 is a subspace of Y, we get r(o) 5 dim Y.
On the other hand we get from linearity of p
r(0) < dim X. E. Factorization Let X, Y and Z be three linear spaces over the same A. Then for any linear transformations 0: X - Y and >!l: Y Z, we get . = Ooo:
X -+ Z. In other words, we can fill into the following diagram
a
unique linear transformation t so as to make it commutative.
Now we shall investigate a problem of the following nature. Given
linear transformations t : X -- Z and 0: X - Y, does there exist a linear transformation >!l: Y - Z such that t = Y,o0? In other words, can we fill into the following diagram some linear transformation so as to make it commutative?
X
0
y
In general, there exists no such linear transformation y'. Take, for instance, t * 0 and 0 = 0. Therefore we shall only consider a special case of this problem in which a restrictive condition is imposed on 0.
§5
GENERAL PROPERTIES OF LINEAR TRANSFORMATION
57
Let us assume that φ: X → Y is surjective. In this case, if a linear transformation ψ: Y → Z exists so that ξ = ψ∘φ, then ψ is unique, by Theorem 5.13. Moreover, we have Ker ξ ⊃ Ker φ. Therefore Ker ξ ⊃ Ker φ is a necessary condition for the solvability of the special problem. Now we shall show that this is sufficient. Indeed, under the assumption that Ker ξ ⊃ Ker φ, we get

    if φ(x) = φ(x') for some x, x' of X, then ξ(x) = ξ(x').

For the equation φ(x) = φ(x') implies that x - x' belongs to Ker φ, and hence to Ker ξ; therefore ξ(x) = ξ(x'). On the other hand, since φ is surjective, we get for each y∈Y some x∈X such that φ(x) = y. Therefore a mapping ψ of the set Y into the set Z is uniquely determined by the condition

    for every y∈Y, ψ(y) = ξ(x) where x∈X and φ(x) = y.

Obviously ξ = ψ∘φ. Therefore it remains to be shown that ψ: Y → Z is a linear transformation. For any two vectors y_1 and y_2 of Y, let x_1 and x_2 be vectors of X such that φ(x_1) = y_1 and φ(x_2) = y_2. Then, by the linearity of φ, we get φ(λ_1 x_1 + λ_2 x_2) = λ_1 y_1 + λ_2 y_2 and therefore ψ(λ_1 y_1 + λ_2 y_2) = ξ(λ_1 x_1 + λ_2 x_2) = λ_1 ξ(x_1) + λ_2 ξ(x_2) = λ_1 ψ(y_1) + λ_2 ψ(y_2), and hence ψ is a linear transformation. We state our result as part of the following theorem.

THEOREM 5.16. Let φ: X → Y be a surjective linear transformation. Then for any linear transformation ξ: X → Z, a linear transformation ψ: Y → Z exists such that ξ = ψ∘φ if and only if Ker ξ ⊃ Ker φ. In this case, the linear transformation ψ is unique. Furthermore, if ξ is surjective then ψ is surjective; and if Ker ξ = Ker φ, then ψ is injective.

PROOF. The first two statements are proved above. If ξ is surjective, then for every z∈Z, we have some x∈X such that ξ(x) = z. By definition of ψ, ψ(y) = z where y = φ(x). Therefore ψ is surjective. If Ker ξ = Ker φ, then for any x, x' of X, φ(x) = φ(x') is equivalent to ξ(x) = ξ(x'). Therefore ψ(y) = ψ(y') implies y = y' and hence ψ is injective.
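For matrices the construction of ψ can be carried out explicitly. The sketch below is an illustration and not the book's notation; it assumes numpy, takes a surjective φ and a ξ whose kernel contains Ker φ, and recovers ψ with ξ = ψ∘φ by solving the linear system.

```python
import numpy as np

# phi: R^3 -> R^2 (surjective), xi: R^3 -> R^2 with Ker xi containing Ker phi.
phi = np.array([[1., 0., 1.],
                [0., 1., 1.]])
psi_true = np.array([[2., -1.],
                     [0.,  3.]])
xi = psi_true @ phi                      # guarantees Ker xi contains Ker phi

# Solve psi @ phi = xi for psi; exact because phi is surjective.
psi = np.linalg.lstsq(phi.T, xi.T, rcond=None)[0].T
assert np.allclose(psi @ phi, xi)        # xi = psi o phi
```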
As a corollary of Theorem 5.16, we have NOETHER's isomorphism theorem, which expresses an important relation between the image and the kernel of a linear transformation.

COROLLARY 5.17. Let φ: X → Y be a linear transformation. Then X/Ker φ ≅ Im φ.
It follows from 5.17 that for a linear transformation φ: X → Y whose domain X and range Y are finite-dimensional linear spaces, we get
r(φ) = dim X - dim(Ker φ).

F. Exercises
1. Determine which of the mappings below are linear transformations.
(a) φ: R → R² such that φ(t) = (e^t, t).
(b) ψ: R → R² such that ψ(t) = (t, 2t).
(c) ξ: R² → R² such that ξ(x, y) = (2x, 3y).
(d) ζ: R² → R² such that ζ(x, y) = (xy, y).
(e) η: R² → R² such that η(x, y) = (e^x cos y, e^x sin y).

2. Let ψ: Y → Z and φ: X → Y be linear transformations. Prove that
(a) if ψ∘φ is surjective, then ψ is surjective,
(b) if ψ∘φ is injective, then φ is injective,
(c) if ψ∘φ is surjective and ψ is injective, then φ is surjective,
(d) if ψ∘φ is injective and φ is surjective, then ψ is injective.

3. Let ξ: X → Z be a linear transformation and ψ: Y → Z an injective linear transformation. Prove that a linear transformation φ: X → Y exists so that ξ = ψ∘φ if and only if Im ψ ⊃ Im ξ.

4. Let α, β, γ, δ be fixed complex numbers.
(a) Prove that φ: C² → C² such that φ(x, y) = (αx + βy, γx + δy) is a linear transformation.
(b) Prove that φ is bijective if and only if αδ - βγ ≠ 0.
5. Let X = C[a, b] be the space of continuous real-valued functions defined on [a, b], and let φ: X → X be defined by
(φf)(t) = ∫_a^t f(x) dx.
(i) Show that φ is a linear transformation.
(ii) Is φ injective?

6. Let X_1 be a proper subspace of a linear space X. Let Y be a linear space and y∈Y. Suppose x_0∈X\X_1. Prove that there exists a linear transformation φ: X → Y such that φ(x_0) = y and φ(x) = 0 for all x∈X_1.
7. Let φ be an endomorphism of a linear space X, i.e. a linear transformation of X into itself. Suppose that for every endomorphism ψ of X, ψ∘φ = 0 implies ψ = 0. Prove that φ is surjective.

8. Let X be an n-dimensional linear space over A. Prove that if f: X → A is a linear transformation different from the constant mapping 0, then dim(Ker f) = n-1.

9. Let X and Y be two n-dimensional linear spaces and φ: X → Y a linear transformation. Show that if φ is injective or surjective, then φ is an isomorphism.

10. Show that if X = X_1 ⊕ X_2, then X_2 ≅ X/X_1.

11. Let X_1 and X_2 be subspaces of a linear space X.
(a) Show that (X_1 + X_2)/X_2 ≅ X_1/(X_1 ∩ X_2).
(b) Show that if X_1 ⊃ X_2, then X/X_1 ≅ (X/X_2)/(X_1/X_2).

12. Let X be a linear space and X_1, X_2 and X_3 subspaces of X such that X_3 ⊂ X_1. Show that
(X_1 + X_2)/(X_3 + X_2) ≅ X_1/(X_3 + (X_1 ∩ X_2)).

13. Let ψ: Y → Z and φ: X → Y be linear transformations where X, Y and Z are all finite-dimensional linear spaces. Prove that
(a) if ψ is injective, then r(ψ∘φ) = r(φ),
(b) if φ is surjective, then r(ψ∘φ) = r(ψ).
14. Let φ: A → B be a linear transformation and X, X' two arbitrary subspaces of A. Prove that
(a) φ[X] ⊂ φ[X'] if X ⊂ X',
(b) φ[X + X'] = φ[X] + φ[X'],
(c) φ[X ∩ X'] ⊂ φ[X] ∩ φ[X'].
Find subspaces X and X' of A so that φ[X ∩ X'] ≠ φ[X] ∩ φ[X'].
15. Let φ: A → B be a linear transformation. For any subset Y of B, define
φ⁻¹[Y] = {x∈A : φ(x)∈Y}.
Prove that for all subspaces Y, Y' of B and every subspace X of A,
(a) φ⁻¹[Y] is a subspace of A,
(b) dim(φ⁻¹[Y]) = dim(Y ∩ Im φ) + dim(Ker φ),
(c) φ⁻¹[Y] ⊂ φ⁻¹[Y'] if Y ⊂ Y',
(d) φ⁻¹[Y + Y'] ⊃ φ⁻¹[Y] + φ⁻¹[Y'],
(e) φ⁻¹[Y ∩ Y'] = φ⁻¹[Y] ∩ φ⁻¹[Y'],
(f) Y ⊃ φ[φ⁻¹[Y]],
(g) X ⊂ φ⁻¹[φ[X]].

16. A sequence of linear spaces and linear transformations
... → X_{n+1} --φ_{n+1}--> X_n --φ_n--> X_{n-1} → ...
is said to be exact if and only if for each integer n, Im φ_{n+1} = Ker φ_n.
(a) Show that a linear transformation φ: X → Y is injective if and only if 0 → X --φ--> Y is exact.
(b) Show that a linear transformation φ: X → Y is surjective if and only if X --φ--> Y → 0 is exact.
(c) Show that for each subspace X' of a linear space X, the sequence
0 → X' --ι--> X --π--> X/X' → 0,
where ι is the inclusion mapping and π the natural surjection, is exact.
(d) Show that if a sequence
0 → X' → X → X'' → 0
is exact, then X' can be identified with a subspace of X and X'' with X/X'.

17. The cokernel of a linear transformation φ: X → Y is defined as Coker φ = Y/Im φ.
(a)
Show that if the diagram of linear spaces and linear transformations

    X' ------> X
    |          |
    φ'         φ
    v          v
    Y' ------> Y

is commutative, then the horizontal linear transformations define linear transformations Ker φ' → Ker φ, Im φ' → Im φ and Coker φ' → Coker φ.
(b) Let

    X' ------> X ------> X''
    |          |          |
    φ'         φ          φ''
    v          v          v
    Y' ------> Y ------> Y''

be a commutative diagram of linear spaces and linear transformations.
(i) Show that if the linear transformation Y' → Y in question is injective, then the sequence
Ker φ' → Ker φ → Ker φ''
is exact.
(ii) Show that if the linear transformation X → X'' is surjective, then the sequence
Coker φ' → Coker φ → Coker φ''
is exact.

18.
Let A, B be real linear spaces and ℒ(A), ℒ(B) the sets of all subspaces of A and of B respectively. A mapping Φ: ℒ(A) → ℒ(B) is called a projectivity of A onto B if and only if (i) Φ is bijective and (ii) for any X, X' ∈ ℒ(A), X ⊃ X' if and only if Φ(X) ⊃ Φ(X'). Show that for any projectivity Φ of A onto B and any subspaces X, X' of A
(a) Φ(X + X') = Φ(X) + Φ(X'),
(b) Φ(X ∩ X') = Φ(X) ∩ Φ(X'),
(c) Φ(0) = 0 and Φ(A) = B,
(d) dim X = dim(Φ(X)).

§6. The Linear Space Hom(X, Y)

A. The algebraic structure of Hom(X, Y)

In §5B, we saw that for any two linear spaces X and Y over
the same A, the set Hom(X, Y) of all linear transformations of X into Y is non-empty. We shall introduce appropriate composition laws into this set so as to make it a linear space over the same A in question. Let φ, ψ ∈ Hom(X, Y). We define a mapping φ + ψ of the set X into the set Y by the equation
(1) (φ + ψ)(x) = φ(x) + ψ(x) for every x∈X.
We take note that the + sign on the left refers to the addition to be defined and the + sign on the right comes from the given addition of the linear space Y. Thus addition of mappings X → Y is defined in terms of addition of vectors of the range space Y alone. Since
(φ + ψ)(λx + μy) = φ(λx + μy) + ψ(λx + μy)
= λφ(x) + μφ(y) + λψ(x) + μψ(y)
= λ(φ(x) + ψ(x)) + μ(φ(y) + ψ(y))
= λ(φ + ψ)(x) + μ(φ + ψ)(y),
the mapping φ + ψ is a linear transformation of X into Y.
We shall show that the ordered pair (Hom(X, Y), +) is an additive group. For the associative law and the commutative law, we have to verify that for any three linear transformations φ, ψ and ξ of Hom(X, Y), the equations
(φ + ψ) + ξ = φ + (ψ + ξ) and φ + ψ = ψ + φ
hold. On both sides of these equations, we have mappings with identical domain and identical range, and moreover for any x∈X, we get
((φ + ψ) + ξ)(x) = (φ + ψ)(x) + ξ(x) = φ(x) + ψ(x) + ξ(x) = φ(x) + (ψ + ξ)(x) = (φ + (ψ + ξ))(x),
(φ + ψ)(x) = φ(x) + ψ(x) = ψ(x) + φ(x) = (ψ + φ)(x),
and therefore the above equations of mappings are proved. The zero mapping defined in 5.3 is easily seen to be the neutral element of the present addition. For φ ∈ Hom(X, Y), the mapping -φ of X into Y defined by
(-φ)(x) = -φ(x) for every x∈X,
is clearly a linear transformation and as such it is the additive inverse of φ with respect to the present addition. Therefore (Hom(X, Y), +) is an additive group.
For every φ ∈ Hom(X, Y) and every scalar λ of A, we define a mapping λφ of the set X into the set Y by the equation
(2) (λφ)(x) = λ(φ(x)) for every x∈X.
Clearly λφ ∈ Hom(X, Y), for (λφ)(μx + νy) = (λμ)φ(x) + (λν)φ(y) = μ((λφ)(x)) + ν((λφ)(y)). Similarly, we verify that the additive group Hom(X, Y) together with the scalar multiplication above forms a linear space over A.
We remark here that the algebraic structure on Hom(X, Y) is given by that on Y alone.
THEOREM 6.1. Let X and Y be linear spaces over A. Then the set Hom(X, Y) of all linear transformations of X into Y is a linear space
over A with respect to the addition and the scalar multiplication defined in (1) and (2).
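In coordinates the composition laws (1) and (2) are entrywise operations on representing matrices. The following small check, an illustration with numpy and not part of the text, confirms that (φ + ψ)(x) = φ(x) + ψ(x) and (λφ)(x) = λ(φ(x)) when φ and ψ are given by matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = rng.standard_normal((3, 4))   # phi, psi in Hom(R^4, R^3)
psi = rng.standard_normal((3, 4))
x = rng.standard_normal(4)
lam = 2.5

assert np.allclose((phi + psi) @ x, phi @ x + psi @ x)   # definition (1)
assert np.allclose((lam * phi) @ x, lam * (phi @ x))     # definition (2)
```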
Relative to addition and scalar multiplication of the linear space
Hom(X, Y), composition of linear transformations has the following property. For all φ, ψ ∈ Hom(X, Y)
(a) (φ + ψ)∘α = φ∘α + ψ∘α and (λφ)∘α = φ∘(λα) = λ(φ∘α) for any linear transformation α: X' → X;
(b) β∘(φ + ψ) = β∘φ + β∘ψ and β∘(λφ) = (λβ)∘φ = λ(β∘φ) for any linear transformation β: Y → Y'.
This property is referred to as bilinearity of the composition of linear transformations. Therefore for any linear transformations α: X' → X and β: Y → Y' the mapping Hom(α, β): Hom(X, Y) → Hom(X', Y') defined by Hom(α, β)(φ) = β∘φ∘α for every φ∈Hom(X, Y) is a linear transformation. Thus Hom(α, β)(λφ + μψ) = λ Hom(α, β)(φ) + μ Hom(α, β)(ψ).
Finally let us consider the dimension of the linear space Hom(X, Y) for the case where X and Y are finite-dimensional. We shall use the method of 5.8 to construct a base of Hom(X, Y) and show that dim Hom(X, Y) = dim X · dim Y. Let (x_1, ..., x_m) be a base of X and (y_1, ..., y_n) a base of Y. By 5.8, we need only specify the images of the x_i and extend by linearity to obtain a linear transformation. Using y_1 and the zero vector 0 of Y as images we get a linear transformation φ_11: X → Y which sends x_1 to y_1 and all other x_i to zero. Similarly a linear transformation φ_21: X → Y is determined by specifying φ_21(x_2) = y_1 and φ_21(x_i) = 0 for i ≠ 2. In this way m linear transformations φ_j1: X → Y (j = 1, ..., m) are obtained such that
φ_j1(x_i) = y_1 if i = j and φ_j1(x_i) = 0 if i ≠ j (i = 1, ..., m).
Using y_2 instead of y_1, we get another m linear transformations φ_12, ..., φ_m2 such that
φ_j2(x_i) = y_2 if i = j and φ_j2(x_i) = 0 if i ≠ j (i = 1, ..., m).
Carrying out similar constructions for all y_k (k = 1, ..., n) we obtain mn linear transformations φ_jk (j = 1, ..., m; k = 1, ..., n) such that
φ_jk(x_i) = y_k if i = j and φ_jk(x_i) = 0 if i ≠ j (i = 1, ..., m).
At this juncture, we introduce a notation that will help us simplify the complicated formulations above before we proceed to prove that the linear transformations φ_jk form a base of Hom(X, Y). For any pair of positive integers (or more generally, of elements of an index set I) i and j, the Kronecker symbol δ_ij is defined by
δ_ij = 1 if i = j, and δ_ij = 0 if i ≠ j.
Using these symbols we can write
φ_jk(x_i) = δ_ij y_k (i, j = 1, ..., m; k = 1, ..., n).
If φ: X → Y is a linear transformation, then
φ(x_i) = a_{i1} y_1 + ... + a_{in} y_n (i = 1, ..., m)
for some scalars a_{ik} (i = 1, ..., m; k = 1, ..., n). Consider the linear combination ψ = Σ_{j,k} a_{jk} φ_jk. Then
ψ(x_i) = (Σ_{j,k} a_{jk} φ_jk)(x_i) = Σ_{j,k} a_{jk} φ_jk(x_i) = Σ_{j,k} a_{jk} δ_ij y_k.
For the last summation, if we sum over the index j first, then for every k = 1, ..., n, we have a partial sum Σ_j a_{jk} δ_ij y_k. But of the m summands only the one corresponding to j = i can be different from zero since δ_ij = 0 for i ≠ j; therefore
Σ_j a_{jk} δ_ij y_k = a_{ik} y_k.
Hence
ψ(x_i) = a_{i1} y_1 + ... + a_{in} y_n = φ(x_i)
and ψ = φ by 5.8(a). Therefore the linear transformations φ_jk generate Hom(X, Y). Moreover if
Σ_{j,k} λ_{jk} φ_jk = 0
for some scalars λ_{jk} (j = 1, ..., m; k = 1, ..., n), then for each i = 1, ..., m, we get
0 = Σ_{j,k} λ_{jk} φ_jk(x_i) = λ_{i1} y_1 + ... + λ_{in} y_n.
Since the vectors y_1, ..., y_n are linearly independent, we get λ_{i1} = ... = λ_{in} = 0 for i = 1, ..., m. Therefore the linear transformations φ_jk (j = 1, ..., m; k = 1, ..., n) form a base of Hom(X, Y), proving dim Hom(X, Y) = dim X · dim Y.
THEOREM 6.2. If X and Y are finite-dimensional linear spaces over A, then dim Hom(X, Y) = dim X dim Y.
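When X and Y are arithmetical spaces with their standard bases, the transformations φ_jk are represented by the matrix units, matrices with a single entry 1. The sketch below is illustrative only (it assumes numpy and standard bases) and decomposes an arbitrary matrix over this base, confirming dim Hom(X, Y) = m·n.

```python
import numpy as np

m, n = 4, 3                                  # dim X = 4, dim Y = 3

def unit(j, k):
    """Matrix of phi_jk: sends the j-th base vector of X to the k-th base vector of Y."""
    E = np.zeros((n, m))
    E[k, j] = 1.0
    return E

rng = np.random.default_rng(2)
A = rng.standard_normal((n, m))              # an arbitrary element of Hom(X, Y)
B = sum(A[k, j] * unit(j, k) for j in range(m) for k in range(n))
assert np.allclose(A, B)                     # A is a combination of the m*n maps phi_jk
```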
B. The associative algebra End(X)
Linear transformations of a linear space X into itself are also called endomorphisms of X. Consequently the linear space Hom(X, X) is also denoted by End(X). In this linear space we can define one more internal composition law, called multiplication, as follows:
ψφ = ψ∘φ for every φ, ψ of End(X).
It follows from associativity and bilinearity of composition that for any ξ, ψ, φ of End(X) and λ, μ of A,
(ξψ)φ = ξ(ψφ),
ξ(ψ + φ) = ξψ + ξφ,
(ξ + ψ)φ = ξφ + ψφ,
(λψ)(μφ) = (λμ)(ψφ).
The linear space End(X) together with multiplication as defined above constitutes an example of what is known as an associative A-algebra.
When the multiplication of endomorphisms is our chief concern, then instead of the associative A-algebra, we consider the algebraic system which consists of the set End(X) and its multiplication. In
this case, the subset Aut(X) of all automorphisms of X, i.e., isomorphisms of X onto itself, is closed under the multiplication. By this we mean that for any φ, ψ ∈ Aut(X) the product ψ∘φ belongs to Aut(X). Furthermore, we have
[G1] ξ(ψφ) = (ξψ)φ for any three elements ξ, ψ, φ of Aut(X);
[G3] there is an element i_X of Aut(X) such that i_X φ = φ for every φ of Aut(X);
[G4] to each φ of Aut(X) there is an element φ⁻¹ of Aut(X) such that φ⁻¹φ = i_X.
Using the terminology of abstract algebra, we say that the algebraic system consisting of the set Aut(X) and its multiplication is a group. This group is called the group of automorphisms of the linear space X over A, and has played an important role in the development of the theory of groups. C. Direct sum and direct product
The idea of direct sum formation which was described earlier as fitting together independent components of a linear space has been repeatedly used to give useful results. On the other hand linear transformations have been shown to be indispensable tools in the study of linear spaces. Here we shall try to formulate direct sum decomposition of a linear space in terms of linear transformations so as to link up these two important ideas. Let X = X 1 ® X2 be a direct sum decomposition of a linear space
X. Then we have a pair of inclusion mappings ι_1: X_1 → X and ι_2: X_2 → X which are linear transformations. On the other hand every vector x of X is uniquely expressible as a sum x = x_1 + x_2 with x_1∈X_1 and x_2∈X_2. Therefore a pair of mappings π_1: X → X_1 and π_2: X → X_2 are defined by π_1(x) = x_1 and π_2(x) = x_2. It is easily seen that π_1 and π_2 are linear transformations and they will be called the
projections of the direct sum X = X_1 ⊕ X_2. From these two pairs of linear transformations, we get composites
π_k∘ι_j: X_j → X_1 ⊕ X_2 → X_k (j, k = 1, 2)
and
ι_j∘π_j: X_1 ⊕ X_2 → X_j → X_1 ⊕ X_2 (j = 1, 2).
It is easy to see that
(a) π_k∘ι_j = δ_jk i_{X_k}, and
(b) ι_1∘π_1 + ι_2∘π_2 = i_{X_1⊕X_2}.
Using these notations, we can reformulate the result of 5.6 as follows: If X = X_1 ⊕ X_2, then a one-to-one correspondence between Hom(X_1 ⊕ X_2, Y) and Hom(X_1, Y) × Hom(X_2, Y) is given by φ ↦ (φ∘ι_1, φ∘ι_2) and its inverse is given by (ψ_1, ψ_2) ↦ ψ_1∘π_1 + ψ_2∘π_2. Conversely we have the following theorem whose proof is left to the reader as an exercise.
THEOREM 6.3. Let X_1 and X_2 be subspaces of a linear space X and let ι_j: X_j → X be the inclusion mappings. If there exist linear transformations p_j: X → X_j (j = 1, 2) such that
(a) p_j∘ι_k = δ_jk i_{X_k}, and
(b) ι_1∘p_1 + ι_2∘p_2 = i_X,
then X = X_1 ⊕ X_2 and the linear transformations p_1 and p_2 are the projections of the direct sum.
Consider two linear spaces X_1 and X_2 over the same A and the cartesian product X_1 × X_2 of the sets X_1 and X_2. In this set we define addition and scalar multiplication by
(x_1, x_2) + (y_1, y_2) = (x_1 + y_1, x_2 + y_2),
λ(x_1, x_2) = (λx_1, λx_2)
for all x_1, y_1 ∈ X_1, x_2, y_2 ∈ X_2 and λ∈A. Clearly X_1 × X_2 constitutes a linear space over A with respect to these composition laws. We call this linear space the cartesian product of the linear spaces X_1 and X_2. The cartesian product X_1 × X_2 × ... × X_r of a finite number of linear spaces X_1, X_2, ..., X_r over the same A is similarly defined.
In particular, we have Aⁿ = A × ... × A (n times). More interesting and more useful in the applications is to find a necessary and sufficient condition for an arbitrary linear space X to be isomorphic to the cartesian product X_1 × X_2 of two linear spaces X_1 and X_2. Consider the pair of mappings called the canonical projections of the cartesian product
p_1: X_1 × X_2 → X_1 and p_2: X_1 × X_2 → X_2,
and the pair of mappings called the canonical injections of the cartesian product
i_1: X_1 → X_1 × X_2 and i_2: X_2 → X_1 × X_2,
defined by
p_1(x_1, x_2) = x_1, p_2(x_1, x_2) = x_2
and
i_1(x_1) = (x_1, 0), i_2(x_2) = (0, x_2).
Clearly all four mappings are linear transformations and between them the following equations hold:
p_1∘i_1 = i_{X_1}, p_2∘i_2 = i_{X_2}, p_1∘i_2 = 0, p_2∘i_1 = 0,
i_1∘p_1 + i_2∘p_2 = i_{X_1×X_2}.
On the other hand, if a linear space X is isomorphic to X_1 × X_2, then from an isomorphism ζ: X → X_1 × X_2 we get the following linear transformations: σ_j = ζ⁻¹∘i_j and ρ_j = p_j∘ζ for j = 1, 2. These linear transformations clearly satisfy the following equations:
(c) ρ_j∘σ_k = δ_jk i_{X_k},
(d) σ_1∘ρ_1 + σ_2∘ρ_2 = i_X.
Conversely if there exist linear transformations
σ_j: X_j → X and ρ_j: X → X_j (j = 1, 2)
such that the equations (c) and (d) are satisfied, then the mapping Φ: X → X_1 × X_2 defined by
Φ(x) = (ρ_1(x), ρ_2(x)) for every x∈X
is an isomorphism.
THEOREM 6.4. Let X, X_1, X_2 be linear spaces. Then X is isomorphic to X_1 × X_2 if and only if there are linear transformations σ_j: X_j → X and ρ_j: X → X_j (j = 1, 2) for which the equations
ρ_j∘σ_k = δ_jk i_{X_k},
σ_1∘ρ_1 + σ_2∘ρ_2 = i_X
hold.
We say that the linear transformations σ_j, ρ_j yield a representation of X as a direct product of the linear spaces X_1, X_2.
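For X = X_1 × X_2 itself, the canonical injections and projections can be written as matrices and the two identities of Theorem 6.4 checked directly. A small numpy sketch, purely illustrative (X_1 = R², X_2 = R³):

```python
import numpy as np

d1, d2 = 2, 3
p1 = np.hstack([np.eye(d1), np.zeros((d1, d2))])   # rho_1: X -> X_1
p2 = np.hstack([np.zeros((d2, d1)), np.eye(d2)])   # rho_2: X -> X_2
i1 = p1.T                                          # sigma_1: X_1 -> X
i2 = p2.T                                          # sigma_2: X_2 -> X

assert np.allclose(p1 @ i1, np.eye(d1)) and np.allclose(p2 @ i2, np.eye(d2))
assert np.allclose(p1 @ i2, 0) and np.allclose(p2 @ i1, 0)   # rho_j o sigma_k = 0 for j != k
assert np.allclose(i1 @ p1 + i2 @ p2, np.eye(d1 + d2))       # sigma_1 rho_1 + sigma_2 rho_2 = i_X
```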
From 6.4, it follows that a direct sum is a special case of a direct product where the linear transformations σ_j: X_j → X are the inclusion mappings.

D. Exercises

1. Let φ be an endomorphism of a finite-dimensional linear space X and let Y be a subspace of X. Prove that
dim φ[Y] + dim(Y ∩ Ker φ) = dim Y.

2. Let X = X_1 + X_2. Define φ: X_1 × X_2 → X by
φ(x_1, x_2) = x_1 + x_2 for all x_1∈X_1, x_2∈X_2.
Show that φ is a linear transformation. Find Ker φ. Show also that φ is an isomorphism if and only if X = X_1 ⊕ X_2.

3.
Let α and β be endomorphisms of a finite-dimensional linear space X. Suppose α + β and β - α are automorphisms of X. Prove that for every pair of endomorphisms γ and δ there exist endomorphisms φ and ψ such that
αφ + βψ = γ,
βφ + αψ = δ.

4.
Let φ and ψ be endomorphisms of an n-dimensional linear space. Prove that
r(φ∘ψ) ≥ r(φ) + r(ψ) - n.

5. Let φ and ψ be endomorphisms of a finite-dimensional linear space A. Prove that
r(φ + ψ) ≤ r(φ) + r(ψ).

6.
Let X be an n-dimensional linear space over A and let φ∈End(X). For every polynomial
f(T) = a_m T^m + ... + a_1 T + a_0
in the indeterminate T with coefficients a_i in A we denote by f(φ) the endomorphism
f(φ) = a_m φ^m + ... + a_1 φ + a_0 i_X.
Prove that
(i) There exists a non-zero polynomial f(T) of degree ≤ n² such that f(φ) = 0.
(ii) If φ is an automorphism, then there exists a polynomial f(T) with nonzero constant term such that f(φ) = 0.

7.
Prove that if for an endomorphism φ of a linear space X the equation φ∘ψ = ψ∘φ holds for every ψ ∈ End(X), then φ = λi_X for some scalar λ.

8.
An endomorphism σ of a linear space X is called a projection of X iff σ² = σ. Two projections σ and τ of X are said to be orthogonal to each other iff σ∘τ = τ∘σ = 0.
(a) Show that if σ is a projection of X, then σ and i_X - σ are orthogonal projections.
(b) Show that if σ is a projection of X, then Ker σ = Im(i_X - σ), Im σ = Ker(i_X - σ) and X = Ker σ ⊕ Im σ.
(c) Show that if σ_1, ..., σ_n are mutually orthogonal projections of X and i_X = σ_1 + ... + σ_n, then X = Im σ_1 ⊕ ... ⊕ Im σ_n.
9.
ψ ∈ End(X) is called an involution if ψ² = i_X.
(a) Prove that if φ ∈ End(X) is a projection then ψ = 2φ - i_X is an involution.
(b) Prove that if X is a real linear space and if ψ ∈ End(X) is an involution, then φ = ½(ψ + i_X) is a projection.
10.
Let ψ be an endomorphism of a 2-dimensional space X. Prove that if ψ ≠ i_X is an involution, then X = X_1 ⊕ X_2 where X_1 = {x∈X: ψx = x} and X_2 = {x∈X: ψx = -x}.

11.
Let φ and ψ be endomorphisms of a linear space X.
(a) Prove that if φ∘ψ - ψ∘φ = i_X, then φ^m∘ψ - ψ∘φ^m = m φ^{m-1} for every m > 1.
(b) Prove that if X is finite-dimensional then φ∘ψ - ψ∘φ ≠ i_X.
(c) Find endomorphisms φ and ψ of R[T] that satisfy the equation φ∘ψ - ψ∘φ = i_{R[T]}.
12. Let φ be an endomorphism of a linear space X. Denote for each i = 1, 2, ...
K_i = Ker(φ^i) and I_i = Im(φ^i).
Prove that
(a) K = ∪_i K_i and I = ∩_i I_i are subspaces of X,
(b) if there exists an n so that K_n = K_m for all m > n, then K ∩ I = 0,
(c) if there exists an n so that I_n = I_m for all m > n, then K + I = X,
(d) if X has finite dimension, then X = K ⊕ I.
13. Let X_j → X → X_j (j = 1, ..., n) be a representation of X as direct product of X_1, ..., X_n, with injections ι_j: X_j → X and projections π_j: X → X_j.
(a) Show that the linear transformations ι_j: X_j → X (j = 1, ..., n) satisfy the following condition:
[I] for any linear space Y and any system of linear transformations ψ_j: X_j → Y (j = 1, ..., n), a unique linear transformation φ: X → Y exists so that ψ_j = φ∘ι_j (j = 1, ..., n).
(b) Show that the linear transformations π_j: X → X_j satisfy the following condition:
[T] for any linear space Z and any system of linear transformations ρ_j: Z → X_j (j = 1, ..., n), a unique linear transformation ψ: Z → X exists so that ρ_j = π_j∘ψ (j = 1, ..., n).
14. Show that if a system of linear transformations ι_j: X_j → X (j = 1, ..., n) satisfies the condition [I] of Exercise 13, then a unique system of linear transformations π_j: X → X_j (j = 1, ..., n) exists so that the linear transformations X_j → X → X_j (j = 1, ..., n) form a representation of X as direct product of X_1, ..., X_n.
15. Show that if a system of linear transformations π_j: X → X_j (j = 1, ..., n) satisfies the condition [T] of Exercise 13, then a unique system of linear transformations ι_j: X_j → X (j = 1, ..., n) exists so that the linear transformations X_j → X → X_j (j = 1, ..., n) form a representation of X as direct product of X_1, ..., X_n.
16. Let X be an n-dimensional real linear space.
(a) If x_0 is a non-zero vector of X, prove that {φ∈End(X): φx_0 = 0} is an (n²-n)-dimensional subspace of End(X).
(b) Let Y be an m-dimensional subspace of X. Prove that {φ∈End(X): φ[Y] = 0} is an n(n-m)-dimensional subspace of End(X).

17. Let φ be an endomorphism of an n-dimensional linear space X.
(a) Prove that the set F(φ) = {ψ∈End(X): ψ∘φ = 0} is a subspace of End(X).
(b) Find φ_1, φ_2 and φ_3 such that dim F(φ_1) = 0, dim F(φ_2) = n and dim F(φ_3) = n². What other possible values can dim F(φ) attain?

18. Let φ_1, ..., φ_s be s distinct endomorphisms of a linear space X. Show that there exists a vector x∈X such that the s vectors φ_1 x, ..., φ_s x are distinct.

19. Let φ and ψ be projections of a linear space X. Show that
(i) Im φ = Im ψ if and only if φ∘ψ = ψ and ψ∘φ = φ, and
(ii) Ker φ = Ker ψ if and only if φ∘ψ = φ and ψ∘φ = ψ.

§7. Dual Space
In Examples 5.5 to 5.8, the idea of direct sum decomposition of the domain X of a linear transformation 0: X -> Y has led us to study
linear transformations with 1-dimensional domain. As a result, different ways of constructing linear transformations were found. We now want to know if a similar operation on the range Y would serve useful purposes, and in particular if there is a case for studying linear transformations with 1-dimensional range. Let Y = Y_1 ⊕ Y_2 be a direct sum and let ι_j and π_j be the canonical injections and projections of the direct sum. Then by an argument similar to that used in §6C we see that a one-to-one correspondence between the set Hom(X, Y_1 ⊕ Y_2) and the set Hom(X, Y_1) × Hom(X, Y_2) is given by φ ↦ (π_1∘φ, π_2∘φ) and its inverse is given by (ψ_1, ψ_2) ↦ ι_1∘ψ_1 + ι_2∘ψ_2. Following through this idea, we can further decompose Y = Z_1 ⊕ ... ⊕ Z_r as a direct sum of 1-dimensional subspaces. Then φ can be recovered from the linear transformations π_i∘φ. This, in a way, motivates the study of linear transformations with 1-dimensional range. As a prototype of such linear transformations, we take a linear transformation whose range is the 1-dimensional arithmetical linear space A¹ over A.
Let X be a linear space over A and denote by A the 1-dimensional
arithmetical linear space A¹ over A. Then by the results of §6A, X* = Hom(X, A) is a linear space over A. We call X* the dual space or the conjugate space of the linear space X. Elements of X* are called linear forms, linear functions, linear functionals or covectors of X. It follows from 6.2 that dim X = dim X* for a finite-dimensional linear space X.
EXAMPLE 7.1. Let X be a finite-dimensional linear space and B = (x_1, ..., x_n) a fixed base of X. Then the coordinates λ_i of a vector x∈X relative to the base B are determined by the equation
x = λ_1 x_1 + ... + λ_n x_n.
By mapping x to its first coordinate λ_1, we obtain a linear function f_1∈X*. f_2, ..., f_n are similarly defined. The linear function f_i is called the i-th coordinate function of X relative to B.
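Concretely, if the base vectors x_1, ..., x_n are taken as the columns of an invertible matrix B, the coordinate functions f_i are the rows of B⁻¹. A hedged numpy sketch (the matrix here is an arbitrary example, not taken from the text):

```python
import numpy as np

B = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])      # columns form a base (x_1, x_2, x_3) of R^3
F = np.linalg.inv(B)              # row i of F is the coordinate function f_i

x = np.array([2., -1., 3.])
coords = F @ x                    # lambda_1, lambda_2, lambda_3
assert np.allclose(B @ coords, x)             # x = sum lambda_i x_i
assert np.allclose(F @ B, np.eye(3))          # f_i(x_j) = delta_ij
```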
Since covectors are linear transformations, the methods of constructing linear transformations outlined in Examples 5.8 are applicable here. Likewise the kernel and the image of a covector f of X are defined. Since the range of a covector f is the 1-dimensional linear space A, and Im f is a subspace of A, we have either Im f = 0 or Im f = A. It is easily seen that Im f = 0 if and only if f = 0. For the kernel of a covector, we have the following theorem.
THEOREM 7.2. Let X be a linear space and f a covector of X. If f is not the zero covector, then dim X = 1 + dim(Ker f).
PROOF. The theorem follows from 5.17, or more explicitly it can be proved as follows. Since f is non-zero, there exists a non-zero vector y of X such that y ∉ Ker f. If Y is the 1-dimensional subspace of X generated by y, then Y and Ker f are complementary subspaces of X. Indeed Y ∩ Ker f = 0; on the other hand, for every x∈X, we can write x = λy + z where λ = f(x)/f(y) and z ∈ Ker f. Therefore dim X = 1 + dim(Ker f).
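For example, a non-zero covector on R³ has a 2-dimensional kernel, in agreement with the theorem. A quick numerical check (illustrative only, using numpy's singular values to compute the kernel dimension):

```python
import numpy as np

f = np.array([[1., 2., 3.]])          # a non-zero covector f on R^3, as a 1x3 matrix
s = np.linalg.svd(f, compute_uv=False)
ker_dim = f.shape[1] - int(np.sum(s > 1e-10))
assert ker_dim == f.shape[1] - 1      # dim X = 1 + dim(Ker f)
```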
From the definition of the zero-vector 0EX*, we see that a covector f of X is zero if and only if f (x) = 0 for every xEX. Dual to this, we have the following theorem.
THEOREM 7.3. Let X be a linear space. Then a vector x of X is the
zero-vector of X if and only if f(x) = 0 for every covector f of X.
PROOF. If x = 0, then obviously f(x) = 0 for all f∈X*. Conversely, let f(x) = 0 for all f∈X*. If x ≠ 0, then x generates a 1-dimensional subspace X' of X. If X'' is a complementary subspace of X' in X, then every vector z of X can be written uniquely as z = λx + y where λ∈A and y∈X''. A covector f of X is defined by f(z) = λ for all z = λx + y where y∈X''. But then f(x) = f(1·x + 0) = 1 ≠ 0. Therefore the assumption that x ≠ 0 is necessarily false.
For finite-dimensional linear spaces, we can extend the duality to a relation between the bases of X and the bases of X*. We say that a base (x_1, ..., x_n) of X and a base (f_1, ..., f_n) of X* are dual to each other if
(1) f_j(x_i) = δ_ij for i, j = 1, ..., n,
where δ_ij are the Kronecker symbols. Given any base (x_1, ..., x_n) of X, the equations (1) uniquely define n vectors f_1, ..., f_n of X* that form a base of X*, as has been shown in the proof of Theorem 6.2. Thus for every base of X there is a unique base of X* dual to it. From 7.1, we see that the base (f_1, ..., f_n) of X* dual to the base (x_1, ..., x_n) of X consists of the coordinate functions of X relative to the base (x_1, ..., x_n) of X. It is also true that for every base of X* there is a unique base of X dual to it; this will be shown in §7C.
B. Dual transformations

Let us now consider linear transformations. For any linear trans-
formation φ: X → Y, we get a linear transformation Hom(φ, i_A), which shall be denoted by φ*. Therefore φ*: Y* → X* is defined by
φ*(g) = g∘φ for every g∈Y*,
or diagrammatically, φ*(g) is the composite X --φ--> Y --g--> A.
The linear transformation φ* is called the dual transformation of φ.

EXAMPLE 7.4. It follows from 5.8(b) that relative to a base B = (x_1, ..., x_m) of X and a base C = (y_1, ..., y_n) of Y, a linear transformation φ: X → Y is completely determined by the scalars a_ij defined as follows:
φ(x_i) = a_{i1} y_1 + ... + a_{in} y_n (i = 1, ..., m).
Denote by (f_1, ..., f_m) the dual base of B and by (g_1, ..., g_n) the dual base of C. Then it is easily seen that
φ*(g_j) = a_{1j} f_1 + ... + a_{mj} f_m (j = 1, ..., n),
or φ*(g_j)(x_i) = a_{ij} (i = 1, ..., m; j = 1, ..., n).
The formation of dual transformations has the following properties:
(a) for any linear space X, (i_X)* = i_{X*};
(b) for any pair of linear transformations ψ: Y → Z and φ: X → Y, (ψ∘φ)* = φ*∘ψ*.
Since, for every f∈X*, (i_X)*(f) = f∘i_X = f = i_{X*}(f), (a) holds. We observe that (ψ∘φ)*: Z* → X* and for every h∈Z*
(ψ∘φ)*(h) = h∘(ψ∘φ) = (h∘ψ)∘φ = (ψ*(h))∘φ = φ*(ψ*(h)) = φ*∘ψ*(h);
therefore (b) holds.
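Relative to dual bases, Example 7.4 says that the matrix of φ* is the transpose of the matrix of φ; in particular r(φ) = r(φ*), which is Theorem 7.8 below. A small numpy sketch, assuming the standard bases of arithmetical spaces and representing covectors by coefficient vectors:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5))            # matrix of phi: R^5 -> R^3
A_star = A.T                               # matrix of phi*: (R^3)* -> (R^5)*

g = rng.standard_normal(3)                 # a covector g on R^3
x = rng.standard_normal(5)
# phi*(g) applied to x equals g applied to phi(x), i.e. (g o phi)(x)
assert np.allclose((A_star @ g) @ x, g @ (A @ x))
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A_star)   # r(phi) = r(phi*)
```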
C. Natural transformations

The operation of forming the dual space X* of a linear space X is not just an operation on a single linear space X but an entire collection of such operations, one for each linear space; in other words it is an operation D on the whole class of linear spaces. Similarly, the operation of forming the dual transformation φ*: Y* → X* of a linear transformation φ: X → Y can be regarded as an operation Δ on the whole class of linear transformations. This and other pairs of similar operations will be the subject matter of the present §7C.
X, X*, X**, X***, ...
Suppose X is a finite-dimensional linear space; then each linear space of the sequence is finite-dimensional and has the same dimension as X. Thus X ≅ X*, X ≅ X**, ... . There does not exist, however, any "particular" isomorphism from X to X*. We may be tempted to think that the isomorphism Φ which takes the vectors x_i of a base (x_1, ..., x_n) to the corresponding covectors f_i of the dual base (f_1, ..., f_n) is a "particular" one. But Φ behaves quite differently with respect to other pairs of dual bases; for example, it does not take vectors of the base (x_1 + x_2, x_2, x_3, ..., x_n) to the corresponding vectors of the dual base (f_1, f_2 - f_1, f_3, ..., f_n). In other words, Φ depends on the choice of a pair of dual bases. But there is a particular isomorphism t_X, one which distinguishes itself from all the others, of X onto its second dual X**. It turns out that if (x_1, ..., x_n) is any base of X, (f_1, ..., f_n) the base of X* dual to (x_1, ..., x_n) and (F_1, ..., F_n) the base of X** dual to (f_1, ..., f_n), then this particular isomorphism t_X takes x_i into F_i. Moreover t_X is seen to be just one of a collection t of such isomorphisms, one for each finite-dimensional linear space and its second dual. Similarly repeated applications of the operation Δ to a linear transformation φ: X → Y give rise to a sequence of linear transformations
φ: X → Y, φ*: Y* → X*, φ**: X** → Y**, φ***: Y*** → X***, ...
It is natural to ask if a similar comparison between φ and φ** can be
made, and it turns out that the pair of isomorphisms t_X and t_Y can be used for this purpose as well. Let X be an arbitrary linear space over A. For every element x of X, consider the mapping F_x: X* → A defined by F_x(f) = f(x) for every f∈X*. For any f, g ∈ X* and any λ, μ∈A, we get F_x(λf + μg) = (λf + μg)(x) = λF_x(f) + μF_x(g); therefore F_x is a linear transformation. Hence F_x∈X** for every x∈X.

THEOREM 7.5.
For every linear space X, the mapping t_X: X → X** defined by
t_X(x) = F_x, where F_x(f) = f(x) for every f∈X*,
has the following properties:
(i) t_X is an injective linear transformation;
(ii) for every linear transformation φ: X → Y, φ**∘t_X = t_Y∘φ, i.e. the diagram

    X  ---φ--->  Y
    |            |
    t_X          t_Y
    v            v
    X** --φ**--> Y**

is commutative;
(iii) t_X is an isomorphism if X is a finite-dimensional linear space.
In this case t_X is called the natural isomorphism between X and X**.
PROOF. That t_X is injective follows from Theorem 7.3. Therefore for (i) it remains to be proved that t_X is a linear transformation. Let x, x'∈X and λ, μ∈A. Then for every f∈X*, we get
(λF_x + μF_{x'})(f) = λF_x(f) + μF_{x'}(f) = λf(x) + μf(x') = f(λx + μx') = F_{λx+μx'}(f);
therefore t_X is a linear transformation, proving (i). For (ii) we observe that both φ**∘t_X and t_Y∘φ are linear transformations of X into Y**. For every x∈X we get
(φ**∘t_X)(x) = φ**(F_x) = F_x∘φ* and (t_Y∘φ)(x) = t_Y(φ(x)) = F_{φ(x)}.
Therefore it remains to be proved that F_x∘φ* and F_{φ(x)} are identical elements of Y**. But elements of Y** are linear transformations of Y* into A, so we have to show (F_x∘φ*)(g) = F_{φ(x)}(g) for every g∈Y*. Now
(F_x∘φ*)(g) = F_x(φ*(g)) = F_x(g∘φ) = (g∘φ)(x) = g(φ(x)) and F_{φ(x)}(g) = g(φ(x)).
Therefore (ii) is proved. Since t_X is a linear transformation, Im t_X is a subspace of X**. By (i) t_X is injective; therefore dim(Im t_X) = dim X. On the other hand dim X = dim X** for any finite-dimensional linear space X. Therefore dim(Im t_X) = dim X** and hence by 4.4 Im t_X = X**, proving (iii).
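The map t_X is nothing more than evaluation: t_X(x) sends a covector f to the scalar f(x), independently of any choice of base. In code this is simply a closure; the sketch below is illustrative only, with vectors as tuples and covectors as Python functions.

```python
def t(x):
    """Natural embedding of a vector x into the second dual: F_x(f) = f(x)."""
    return lambda f: f(x)

x = (2.0, 5.0)                        # a vector of R^2
f = lambda v: 3.0 * v[0] - v[1]       # a covector on R^2
F_x = t(x)
assert F_x(f) == f(x) == 1.0
```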
REMARKS. In §8 suitable terminology and notation for handling such "large" mathematical objects as the operations D, Δ and t above are introduced. In the categorical language there, t is a natural transformation, the pair D and Δ constitutes a functor, and the domain on which this functor operates, i.e., the class of all linear spaces and the class of all linear transformations, is a category. By the first two parts of Theorem 7.5, we can always identify every linear space X with a subspace of its second dual X**, and as a result of this identification each linear transformation φ is the restriction of its second dual φ**, i.e., φ = φ**|X. Analogous relations between X*, φ* and X***, φ*** are similarly defined. By 7.5(iii), each finite-dimensional linear space X can be identified with its second dual X**, and consequently φ = φ** for each linear transformation φ: X → Y of finite-dimensional linear spaces. Therefore in the sequences
X, X*, X**, X***, ... and φ, φ*, φ**, φ***, ...
we need only consider the first pairs of terms X, X* and φ, φ*; the remaining ones, being naturally isomorphic copies, can be identified with them. The natural isomorphism t_X: X → X** of a finite-dimensional linear space X onto its second dual X** will have many applications later on. We observe that from the equality dim X = dim X** alone it follows that X and X** are isomorphic linear spaces, i.e., there exists an isomorphism φ: X → X**. However this unspecified isomorphism φ may not satisfy the condition (φ(x))(f) = f(x) for all x∈X and f∈X*.
At the end of §7A, we have seen that every base (x_1, ..., x_n) of a finite-dimensional linear space X determines a unique dual base (f_1, ..., f_n) of X*. Conversely let (g_1, ..., g_n) be an arbitrary base of X*. Then this base determines a dual base (G_1, ..., G_n) of X**. By means of the isomorphism t_X: X → X**, we get a base (y_1, ..., y_n) of X where t_X(y_i) = G_i for i = 1, ..., n. Now g_j(y_i) = (t_X(y_i))(g_j) = G_i(g_j) = δ_ij for i, j = 1, ..., n. Therefore the base (g_1, ..., g_n) of X* and the base (y_1, ..., y_n) of X are dual to each other. It follows from 7.3 that the base (y_1, ..., y_n) is also unique. We have thus proved the following corollary.

COROLLARY 7.6.
Let X be an n-dimensional linear space. Then
every base of X determines a unique dual base of X* and conversely every base of X* determines a unique dual base of X.
D. A duality between ℒ(X) and ℒ(X*)

Between the set ℒ(X) of all subspaces of a finite-dimensional linear space X and the set ℒ(X*) of all subspaces of the dual space X* of X there is a duality from which the well-known duality principle of projective geometry can be derived (see §11H). We express this duality in terms of two mappings AN: ℒ(X) → ℒ(X*) and an: ℒ(X*) → ℒ(X). For every subspace Y of X and every subspace Z of X*, we define the annihilator of Y and the annihilator of Z respectively as
AN(Y) = {f∈X*: f(x) = 0 for all x∈Y} and an(Z) = {x∈X: f(x) = 0 for all f∈Z}.
It is easily verified that the annihilator AN(Y) of Y is a subspace of X* and the annihilator an(Z) of Z is a subspace of X. In terms of the annihilators, the mappings AN: ℒ(X) → ℒ(X*) and an: ℒ(X*) → ℒ(X) are defined and their properties are formulated in the following theorem.
THEOREM 7.7. Let X be a finite-dimensional linear space, Y, Y_1 and Y_2 subspaces of X and Z, Z_1 and Z_2 subspaces of the dual space X* of X. Then the following statements hold.
(i) dim Y + dim AN(Y) = dim X.
(ii) an(AN(Y)) = Y.
(iii) AN(Y_1) ⊂ AN(Y_2) iff Y_1 ⊃ Y_2.
(iv) AN(Y_1 + Y_2) = AN(Y_1) ∩ AN(Y_2).
(v) AN(Y_1 ∩ Y_2) = AN(Y_1) + AN(Y_2).
(i)* dim Z + dim an(Z) = dim X*.
(ii)* AN(an(Z)) = Z.
(iii)* an(Z_1) ⊂ an(Z_2) iff Z_1 ⊃ Z_2.
(iv)* an(Z_1 + Z_2) = an(Z_1) ∩ an(Z_2).
(v)* an(Z_1 ∩ Z_2) = an(Z_1) + an(Z_2).
PROOF. (i). By the supplementation theorem, we can find a base (x_1, ..., x_p, x_{p+1}, ..., x_n) of X such that (x_1, ..., x_p) is a base of Y. If (f_1, ..., f_p, f_{p+1}, ..., f_n) is the dual base of the base of X in question, then from the equations f_j(x_i) = δ_ij for i, j = 1, ..., n it follows that the covectors f_{p+1}, ..., f_n all belong to the annihilator AN(Y). On the other hand, if a covector f = λ_1 f_1 + ... + λ_n f_n belongs to the annihilator AN(Y), then for each i = 1, ..., p we get
0 = f(x_i) = λ_1 f_1(x_i) + ... + λ_n f_n(x_i) = λ_1 δ_i1 + ... + λ_n δ_in = λ_i.
Therefore the covectors f_{p+1}, ..., f_n form a base of AN(Y) and
hence (i) is established. The proof of (i)* is similar. (ii). It follows from definition that Y is a subspace of an(AN(Y)).
On the other hand, we get dim Y = dim X - dim AN(Y) and dim AN(Y) = dim X* - dim an(AN(Y)) from (i) and (i)*. Therefore dim Y = dim an(AN(Y)) and hence Y = an(AN(Y)). The proof of (ii)* is similar.
(iii) and (iii)* follow immediately from the definition of annihilators and (ii), (ii)* above.
(iv). From (iii) it follows that AN(Y_1 + Y_2) ⊂ AN(Y_1) ∩ AN(Y_2). Conversely if f ∈ AN(Y_1) ∩ AN(Y_2), then f(x_1 + x_2) = f(x_1) + f(x_2) = 0 for all x_1 + x_2 ∈ Y_1 + Y_2. Therefore (iv) is proved and (iv)* is proved similarly.
(v). It follows from (ii) and (ii)* that the mappings AN and an are inverses of each other; therefore we can find subspaces Z_1 and Z_2 of X* such that AN(Y_i) = Z_i and an(Z_i) = Y_i for i = 1, 2. Using (iv) and (iv)* above, we get AN(Y_1 ∩ Y_2) = AN(an(Z_1) ∩ an(Z_2)) = AN(an(Z_1 + Z_2)) = Z_1 + Z_2 = AN(Y_1) + AN(Y_2). This proves (v), and (v)* is similarly proved. As an important application of 7.7 we shall show that
AN(Im φ) = Ker φ* and an(Im φ*) = Ker φ
for any linear transformation φ: X → Y of finite-dimensional linear spaces and its dual transformation φ*: Y* → X*. Indeed, if g∈Ker φ*, then g∘φ = 0 and hence g(φ(x)) = 0 for all x∈X. Therefore g ∈ AN(Im φ). Conversely, if g∈AN(Im φ), then g(φ(x)) = 0 for all x∈X. Therefore (φ*(g))(x) = (g∘φ)(x) = g(φ(x)) = 0 for all x∈X, and hence g∈Ker φ*. This proves the first equation and the second equation is proved similarly. From these relations between kernels and images we get the following theorem on the rank of a linear transformation.

THEOREM 7.8. For any linear transformation φ: X → Y of finite-dimensional linear spaces the equality r(φ) = r(φ*) holds.

PROOF. The rank r(φ) of φ is defined as dim(Im φ); therefore r(φ) = dim Y - dim(AN(Im φ)) = dim Y - dim(Ker φ*). But for the dual transformation φ*: Y* → X* we get dim Y* = dim(Ker φ*) + r(φ*). Since Y is finite-dimensional, dim Y* = dim Y, and hence r(φ) = r(φ*).

E. Exercises

1.
Let X be a linear space and let f_1, ..., f_p be p covectors of X. Prove that a covector f of X is a linear combination of f_1, ..., f_p if and only if Ker f ⊃ ∩_{i=1}^p Ker f_i.
2.
Let X be an n-dimensional linear space over A, (x 1, ...
, xn) a base of X and (f1, ... , fn) a base of X* dual to (x1, ... 1X'). (a) Show that (x 1 + X2, X2, ... , xn) forms a base of X. Find the base of X* dual to (x1 +x2i x2, ... , (b) Show that (Xx 1, X2, ... , xn) forms a base of X for any nonzero scalar X of A. Find the base of X* dual to
(Xx1, x2, ..., X10(c) Show that the base (x1, x2 - X2x1, ... , x -
of X
and thebase (f1 +X2f2 + ...+Anf,,,f2i ...,fn)ofX*
3.
(a)
are dual to each other. show that every linear homogeneous polynomial aX + 13Y + yZ defines a linear function f on R3 such that
f(x, y, z) = ax + ay + yz for every (x, y, z) ER3. Show also that every linear function on R3 can be defined in this way.
(b) Determine the base of (R3 )* dual to the base ((2, 3, 4), (1, 0, 2), (4, -1, 0)). (c) Determine the base of R3 dual to the base
(X+2Y+Z, 2X+3Y+3Z, 3X+7Y+Z). Let X be an n-dimensional linear space over A with base
4.
(x1,... , xn ). Define e1 : A - X by ti(X) = Xx1 for every Ac=A.
Prove
fn) is the base of X* dual to , Xn), then A - X 4 A is a representation of X as a
that if (fl,
... , Li
(X 1,
5.
direct sum of n copies of A. Let X be a finite-dimensional linear space with base (x1, ... , xn ). Let (fl, . . . , fn) be a base of X* dual to (x1 , ... , xn ). If ipE End(X) is such that
ipXi =i
n
1,...,n,
express p * f i (i = 1, ... , n) as a linear combination of f1, .. . , fn . 6.
Let X be an n-dimensional linear space and f 1 , ... , fn r= X *. Prove that ( 1 , ... , fn) is a base of X* if and only if n
nKerfi = 0.
i=1
7.
Let X be an n-dimensional linear space and fl, ... , fn EX*. Show that if there are n vectors x 1, ... , xn of X such that f i x = 8i/ for all i, j = 1, 2, ... , n, then (f1, ... , fn) is a base of X* and (x1, ... , xn) is a base of X.
8.
Let X, Y be n-dimensional linear spaces and let gyp: X -4 Y,
X* -> Y* be linear transformations. Show that if fx for all fEX* and all xEX, then p is an
isomorphism and i = (lp*)-1. 9.
Let X be an infinite-dimensional linear space. Show that dim(X*) > dim X.
10.
Let X be a linear space. Show that X and X* have the same dimension if and only if they have finite-dimensions.
84 11.
11
LINEAR TRANSFORMATIONS
Let gyp: X -> Y be a linear transformation and p*: Y* -+ X* the dual transformation of gyp. Prove that if p is an isomorphism then 0* is an isomorphism and (gyp*)-' = (gyp -') *.
12.
Let X, Y be linear spaces and X*, Y* their dual spaces. Prove that (X x Y)* _ X* x Y*.
13.
Let p: X -> Y be a linear transformation whose domain X and range Y are both finite-dimensional linear space and pp*: Y* -> X* the dual transformation of gyp. Prove that an(Imp*) and Ker,p* = AN(Im gyp) (a) Im p= an(Ker np*) and Im p* = AN(Ker gyp) (b) (c) r(,p) =
14.
Let X be a finite-dimensional linear space and Y a subspace of X. Prove that (a) AN (Y) a5 (XI Y)* and (b) X*/AN (Y) = Y*. Show thaf (a) and (b) hold also for an arbitrary linear space if AN (Y) is defined as the subspace of all covectors f of X such that f(x) = 0 for all xEX.
§8. The Category of Linear Spaces
The `modern algebra' of the 1930s concerned itself mainly with
the structures of linear spaces, groups, rings, fields and other algebraic systems. These algebraic systems have been studied axiomatically and structurally ; furthermore, homomorphisms have
been defined for each type of algebraic system. In the last three decades, in which algebraic topology and algebraic geometry have undergone a rapid development, it has become clear that the formal properties of the class of all homomorphisms repay independent
study. This leads to the notion of category, functor and natural transformation first introduced by S. EILENBERG and S. MACLANE in 1945, and to a new branch of mathematics called category theory.
Recent years have seen more and more mathematics being formu-
lated in the language of category theory - an event somewhat similar to what took place earlier in this century when an effort was made to formulate mathematics in the language of set theory. In this section we shall acquaint ourselves with some basic notions of category theory. It is not just for the sake of.following a fashionable trend that we introduce such sophisticated notions here. The
reason for our effort is that some methods of linear algebra cannot be made clear just by the study of a single linear space or a single linear transformation in isolation; for a deeper understanding of the subject we have to study sets of linear transformations between linear spaces and the effect of linear transformations upon other constructions on linear spaces. To do this effectively, we need the language of category theory. If set theory and linear algebra as given in the previous sections are
thought to be at the first level of abstraction from mathematics taught at the schools, then the material of the present section is at a second level of abstraction. Readers who have not quite got used to the first level of abstraction are adviced to omit this section. This will
in no way impede their progress in this course of study since the language of category theory will only be used in occasional summary remarks. A. Category
The immediate aim of §8 is to give a definition of natural transformation so that the results of §7C can be organized in a more systematic way and viewed in a better perspective. This, however, makes a definition of functor necessary, since functors are the things
that natural transformations operate on. Again in order to define functor, we must first define category since functors are the tools for comparing categories.
We shall first try to arrive by way of a leisurely discussion at a definition of the category of sets as a working model of the definition of category. To study sets in a unified fashion, we first accommodate all the sets in a collection. This collection s/of all sets,
as we know, is no more a set. The mathematical term for such a collection is class. Thus we begin our study with a classYMembers of this class are called objects, and in the present case they are sets. While membership of sets is a chief concern of classical set theory,
this has to be of only secondary interest to the category of sets, which concerns itself mainly with relations among sets and other con-
structions on sets in general. Thus together with the class.Jof all sets, we study the class of sets of mappings Map(X, Y), one for each pair of sets. Something similar to a structure on these two classes is given by composition of mappings i.e. a mapping Map(X, Y) x Map(Y, Z) -> Map(X, Z) for every triple of sets X, Y and Z which takes every pair (0, ) to the composite OoO. Thus we
find ourselves in a situation rather similar to that when we first considered abelian groups earlier in § IA, when we were dealing with a non-empty set together with an internal composition law satisfying certain well-known conditions called axioms. Similarly we single out
certain well-known and useful properties as axioms. These are as follows: (1) Map(X, Y) and Map(Z, T) are disjoint for X * Z or Y 0 T. (2) If 4)E Map(X, Y), >, c= Map(Y,Z) and t e Map(Z,T), then
t. (0-0) = (o4)o0. (3) For each set X, there exists iyE Map(X,X) such that 0 o ix = ¢ for each 0 E Map(X, Y) ix oo = t i for each OE Map(Z,X). and
Before we proceed to give a definition of category, we remark that a
number of important concepts of set theory can be actually recovered from (1), (2) and (3) alone. For example the empty set 0 is characterized by the condition that Map(O, X) is a set with exactly one element for each X. Similarly 0 is injective if and only if it is left cancellable, i.e. l; = 1 whenever 4ot = Oo%j/; and 0 is surjective if and only if it is right cancellable.
Clearly the basic properties (1), (2) and (3) are common to the class of all linear spaces over A and the class of all linear transformations of these linear spaces together with composition of these linear transformations. Furthermore these properties are also common to different classes of algebraic systems. This therefore motivates the following definition. DEFINITION 8.1. A category ro consists of (i) a class, the elements of which are called objects of (ii) a function which to each pair (X, Y) of objects of Wassigns a. set Mor W(X, Y), (iii) a function
Morq(X,Y) x for every triple of objects X, Y and Z of The function given in (iii) is called composition; its effect on a pair is denoted by 004). The above data are subject to the following axioms: [C11
MorV(X, Y) and Mor'(Z,T) are disjoint if X* Z or Y * T.
OeMorc(Y,Z)andteMorC(Z,T), then
[C21
to(`Yo0) = Q. 0) - 0. For each object X there exists iXE Mor W(X,X) such that
[C3 J
Oo ix =
and iIoo =
for each OE MorW(X, Y)
for each OE Mor c(Z,X).
Some terminology and notation is convenient for our formulation. The elements of Mor'(X, Y) are called morphisms. Often we write Mor(X, Y) for MorW(X, Y). Instead of Or= Mor?,(X, Y) we frequently
write 0: X -+ Y or X
Y. Moreover, we denote
X = D(¢) = demain of 0, Y = R (0) = range of 0. It follows from the definition that the composite Y,o0 is defined if and only if R(¢). In this case D(iJ/o0) = D(O) and R(io¢) _
R(0). The element ix postulated in [C3] is easily seen to be unique; it is called the identity of X. A morphism 0: X -> Y is called an isomor-
phism if there exists 0: Y - X such that Woo = ix and 0o/ = iy. In ; this is also an isomorthis case >V is unique, and is denoted by (0-1) 1 = 0. In this case we also say that the objects X phism and and Y are equivalent. If 0: X -- Y and iJi: Y -+ Z are isomorphisms, then so are oO and (lJ,o0)' = 0-'o1 -'. We leave to the interested 0-1
reader the proof of these statements. In this way, the category Iof sets is the category where (i) objects are all the sets, (ii) Mor (A, B) = Map(A, B), (iii) composite of morphisms has the usual meaning, (iv) iA : A -> A is the identity mapping.
The isomorphisms of this category are the bijective mappings of sets. The category j of additive groups is the category where (i) objects are all the additive groups, (ii) a morphism 0: A -> B is a mapping such that O(x + y) = O(x) + O(y) for all X, YEA, (iii) composite of morphisms has the usual meaning, (iv) the identity morphism iA : A - A is the identity mapping.
Finally, the category AA) of linear spaces over A is a category where (i)
objects are all the linear spaces over A, (ii) a morphism ¢. X - Y is a linear transformation, (iii) composite of morphisms has the usual meaning, (iv)
the identity morphism i1: X - X is the identity linear transformation of X.
Here isomorphisms of .1'(A) are the isomorphisms (i.e., bijective linear transformations) of linear spaces over A. The category .,Y(A) of finite-dimensional linear spaces over A is similarly defined.
The guiding principle is therefore this: whenever a new type of algebraic system is introduced, we should also introduce the type of morphism appropriate to that system as well as the composition of morphisms. (p The categories _V, Y. (A) and x (A) carry some extra structure inherited from the algebraic structures of their objects and properties of their morphisms; they are special cases of additive category defined as follows.
A category his called an additive category if [AC 1 ] for any two objects A, B of r,, Mor (A,B) is an additive group. [AC2 ] for any three objects A, B and C, the mapping Mor(A, B) x Mor(B, C) -> Mor(A, C) defined by (0, y) -> o¢ is biadditive, i.e., for ¢, E Mor (A, B) and >y, i' E Mor(B, C) DEFIMTION 8.2.
jio(4 + 0) = >yo¢ + I/so¢' and (, + Vi') oO = t/loO + 0'4 .
Additive categories are of special interest to us, since most categories that appear in this course are additive categories.
Finally we remark that a category as defined in 8.1 is not neces-
sarily a very "large" mathematical object. For example, a very "small" category ,/' is defined by specifying that (i)
_q has two distinct objects 1 and 2;
(ii)
Mor (1,1), Mor (2, 2) and Mor(1, 2) are singleton sets and Mor(2, 1) _ 0;
(iii) composition in .9 has the obvious meaning.
The category .f can be presented graphically as an arrow:
1
Therefore
2
' is called the arrow category.
Functor We have remarked earlier that one of the guiding principles in category theory is that whenever a new type of mathematical object is introduced, we should also introduce the appropriate morphisms B.
between these objects. Thus we ought to specify morphisms of categories, so that relations between categories can be formulated; these are called functors. It is natural that given two categories wand 9we should compare objects of dwith objects of ,won the one hand and morphisms of z/with morphisms of on the other hand in such a way that the structures given by compositions should be preserved. This state of affairs is expressed by the definition below. .
Let d and , be categories. A covariant functor T: d -*.Wconsists of a pair of functions (both denoted by T). (i) The first assigns to each object X in z/an object T(X) in ,W, DEFINITION 8.3.
(ii) The second assigns to each morphism 0: X -> Y in s/a morphism T(O): T(X) -* T(Y) in M.
The following conditions must be satisfied. [CO 1 l
T(i°O) = T(i)°T(O) if D(,) = R(O).
[CO 2]
T(iX) = iT(X).
Given two covariant functors T: -Vl -.W and S:-W-+ W, the composite SoT:.S -* ?is defined by S°T(X)= S(T(X)) and S° T(¢) _ S(T(¢)); S°T is also a covariant functor.
For every category ,', the identity functor I: ,j -
is the
covariant functor defined by IM = X and 1(0) = ¢. The forgetful functor F: 2(A) -> .q is an example of a covariant functor. This assigns (i) to each linear space its underlying set and (ii) to each linear transformation its underlying mapping. The effect of
the forgetful functor on $f(A) is, of course, to lose entirely the algebraic structure on the linear spaces and the algebraic properties of the linear transformations.
Another example of a covariant functor is the functor G of Yinto (A) that assigns (i) to each set S the linear space FS generated by the set S, and (ii) to each mapping 0: S - S' the linear transformation ¢: FS -> F. defined in 5.8(c). In general FG(S) contains S as a proper subset and GF(X) is also different from X; therefore they are not the inverse of each other. The pair of functors F and G are related to each other in a way that is most interesting from the categorical point of view; they are said to form a pair of adjoint functors (see exercise 7).
For any set X, we can define a covariant functor TX: .y-* Y by assigning (i) to each set Y the set Map(X, Y) and (ii) to each mapping 0: Y - Y' the mapping (D: Map(X, Y) -+ Map(X, Y') such that 4(r) =
oo for every : X - * Y. The functor TX is usually denoted by Map (X, ).
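In code, the functor Map(X, -) simply post-composes: a mapping ψ: Y → Y' is sent to the mapping ξ ↦ ψ∘ξ on Map(X, Y). A minimal sketch, illustrative only, with Python functions standing in for mappings of sets:

```python
def map_functor(psi):
    """Map(X, -) applied to psi: Y -> Y'; sends xi: X -> Y to psi o xi: X -> Y'."""
    return lambda xi: (lambda x: psi(xi(x)))

xi = lambda n: n + 1          # xi: X -> Y
psi = lambda y: 2 * y         # psi: Y -> Y'
assert map_functor(psi)(xi)(10) == psi(xi(10)) == 22
```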
Let T be a covariant functor of the arrow category q into the
category
.
Then T is completely determined by T(1) = X1, T(2) = X2
and the image of 0: X1 - X2 of the only non-identity morphism 1 -> 2. In other words the effect of T amounts to selecting a mapping 0: X 1 --> X 1 of sets, or simply that T is a mapping.
The operation D which assigns (i) to each linear space X its dual
X*, and (ii) to each linear transformation 0: X --> Y its dual ¢*: Y* - X* is not a covariant functor because, in reversing the order of composition, it takes the domain and the range of 0 to the range
and the domain of O* respectively. D is, however, a functor of another type, called contravariant functor. DEFINITION 8.4. Let .S/and R be categories. A con travarian t functor
U: / -> M consists of a pair of functions (both denoted by U). (i) The first assigns to each object X of _W an object U(X)
of M.
(ii)
The second assigns to each morphism 0: X -+ Y in
.
a morphism U(O): U(Y) -> U(X) in The following conditions must be satisfied.
[CON 1 l U(04) = U(O)o UM if D(>G) = D(O). [CON 2] U(iX) = iU(x) D above is easily seen to be a contravariant functor of 2' (A) into 9(A); more on this functor will be said in § 8C. Another example of a
contravariant functor is the functor U: ,y' -> ,' for a fixed Y which
assigns (i) to each set X the set Map(X, Y), and (ii) to each mapping φ: X -> X' the mapping Φ: Map(X', Y) -> Map(X, Y) such that Φ(ξ) = ξ∘φ for every ξ: X' -> Y. The functor U_Y is usually denoted by Map( , Y). The covariant functor Map(X, ) and the contravariant functor Map( , Y) suggest that Map( , ) can be regarded as a functor in two arguments, both in S, with values in S. Functors of this type are called bifunctors; we leave the exact formulation of a definition of bifunctor to the interested reader. Finally, for additive categories, we also consider a special type of (covariant or contravariant) functor.
DEFINITION 8.5. A functor T: 𝒜 -> ℬ of additive categories is called an additive functor if T(φ + ψ) = T(φ) + T(ψ) for any morphisms φ and ψ of 𝒜 such that D(φ) = D(ψ) and R(φ) = R(ψ).
Following the method of constructing Map(X, ) and Map( , Y), we can easily set up functors Hom(X, ): L(A) -> L(A) and Hom( , Y): L(A) -> L(A). By bilinearity of composition of linear transformations, it follows that Hom(X, ) is a covariant additive functor and Hom( , Y) a contravariant additive functor.
C. Natural transformation
We have seen that D: L(A) -> L(A), which assigns to each linear space X its dual space X* and to each linear transformation φ: X -> Y its dual transformation φ*: Y* -> X*, is a contravariant functor called the dual functor. The composite D2 = D∘D: L(A) -> L(A), called the second dual functor, is then a covariant functor such that D2(X) = X** and D2(φ) = φ**: X** -> Y**.
In §7C we have constructed for each linear space X an injective linear transformation tX: X -> X**, and we have seen that this construction, unlike the methods of construction in 5.8, is independent of the choice of a base of X. In fact tX is defined by tX(x) = F_x for every x ∈ X, where F_x(f) = f(x) for every f ∈ X*. Moreover for any linear transformation φ: X -> Y, the corresponding transformations tX and tY on the domain and range of φ are such that tY ∘ φ =
φ** ∘ tX, i.e. the diagram below

    X -----tX-----> X**
    |               |
    φ               φ**
    v               v
    Y -----tY-----> Y**

is commutative. Denoting by I: L(A) -> L(A) the identity (covariant) functor of L(A), we can rewrite the diagram above as

    I(X) ----tX----> D2(X)
    |                |
    I(φ)             D2(φ)
    v                v
    I(Y) ----tY----> D2(Y)
Thus we see that the left column of the diagram expresses the effect of the functor I on the morphism φ: X -> Y in L(A) and the right column that of the functor D2 on the same morphism φ; therefore the collection t of morphisms tX (one for each object X of L(A)) in L(A) provides a comparison between the functors I and D2. In other words, t can be regarded as a morphism t: I -> D2 of functors. A morphism of functors is called a natural transformation; the general definition is as follows.
DEFINITION 8.6. Given covariant functors T, S: 𝒜 -> ℬ, a natural transformation Φ: T -> S is a function which to each object X in 𝒜 assigns a morphism Φ(X): T(X) -> S(X) in ℬ in such a way that for each morphism φ: X -> Y in 𝒜 the diagram

    T(X) ---Φ(X)---> S(X)
    |                |
    T(φ)             S(φ)
    v                v
    T(Y) ---Φ(Y)---> S(Y)

is commutative, i.e. S(φ)∘Φ(X) = Φ(Y)∘T(φ).
If Φ: T -> S and Ψ: S -> R are natural transformations of functors, then (Ψ∘Φ)(X) = Ψ(X)∘Φ(X) defines a natural transformation Ψ∘Φ: T -> R. With the categories 𝒜 and ℬ kept fixed, we may regard functors T: 𝒜 -> ℬ as "objects" of a category and natural transformations Φ: T -> S as its morphisms, since the composition of natural transformations defined above satisfies the axioms [C1]-[C3] of a category. Consequently a natural transformation Φ: T -> S is an isomorphism in this category, or a natural isomorphism, if and only if each Φ(X) is an isomorphism in ℬ, in which case Φ⁻¹(X) = (Φ(X))⁻¹. Restricting consideration to the category of finite-dimensional linear spaces over A and denoting by I, D the identity and the dual functor of this category, we see that t: I -> D2 is a natural isomorphism of functors.
Finally we leave the formulation of natural transformation between contravariant functors to the interested reader, and conclude
this section with a remark that in general we do not compare a covariant functor with a contravariant functor.
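As a concrete illustration of the natural transformation t: I -> D2 (our own sketch, not the book's; vectors are modelled as pairs of floats and functionals as Python functions), the fragment below realizes tX(x) as the evaluation map f -> f(x) and verifies the naturality condition tY ∘ φ = φ** ∘ tX on a sample vector and functional.

    # t_X(x) is the element of X** that evaluates each functional at x.
    def t(x):
        return lambda f: f(x)

    # A linear transformation phi: R^2 -> R^2 ...
    def phi(x):
        return (x[0] + x[1], 2.0 * x[0])

    # ... and its second dual phi**: X** -> Y**, F |-> (g |-> F(g o phi)).
    def phi_second_dual(F):
        return lambda g: F(lambda v: g(phi(v)))

    x = (3.0, -1.0)                         # a vector of X = R^2
    g = lambda y: 5.0 * y[0] + 7.0 * y[1]   # a functional on Y = R^2

    # Naturality: t_Y(phi(x)) and phi**(t_X(x)) agree on every functional g.
    assert t(phi(x))(g) == phi_second_dual(t(x))(g)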
D. Exercises
1. Let C be a category. An object T of C is called a terminal object if for every object X of C the set Mor(X, T) is a singleton. Prove that any two terminal objects of C are equivalent. What are the terminal objects of the categories S, G and L(A)?
2. Let C be a category. An object I of C is called an initial object if for every object Y of C the set Mor(I, Y) is a singleton. Prove that any two initial objects of C are equivalent. What are the initial objects of the categories S and L(A)?
3. Let C be a category and A, B two fixed objects of C.
(a) Prove that a category D is defined by specifying that (i) objects of D are ordered pairs (X -> A, X -> B) of morphisms of C, and (ii) morphisms (X -> A, X -> B) -> (Y -> A, Y -> B) of D are morphisms X -> Y of C such that the composites X -> Y -> A and X -> Y -> B are the given morphisms X -> A and X -> B respectively, and (iii) composition has the same meaning as in C.
(b) A terminal (final) object of D is called a product of A and B in C. Prove that products of any two objects exist in the categories S, G and L(A). Compare the category-theoretical definition of product with the set-theoretical definition of product in each of these three cases.
4.
Let C be a category and A, B two fixed objects of C.
(a) Prove that a category D is defined by specifying that (i) objects of D are ordered pairs (A -> X, B -> X) of morphisms of C, and (ii) morphisms (A -> X, B -> X) -> (A -> Y, B -> Y) of D are morphisms X -> Y of C such that the composites A -> X -> Y and B -> X -> Y are the given morphisms A -> Y and B -> Y respectively, and (iii) composition has the same meaning as in C.
(b) An initial object of D is called a coproduct of A and B in C. Prove that coproducts of any two objects exist in the categories S, G and L(A). What is the corresponding set-theoretical concept in each of these three cases?
5. Let C and D be two categories. Prove that a category C x D is defined by specifying that (i) objects of C x D are ordered pairs (A, B) where A ∈ C and B ∈ D, (ii)
morphisms (A, B) -> (A', B') of C x D are ordered pairs (A -> A', B -> B') of morphisms, and (iii) composition in C x D is obtained by forming compositions of components. C x D is called the product of C and D.
6. Let S: C -> C' and T: D -> D' be covariant functors. Prove that S x T: C x D -> C' x D', defined by
(S x T)(A, B) = (S(A), T(B)) for every object (A, B),
(S x T)(f, g) = (S(f), T(g)) for every morphism (f, g),
is a covariant functor.
7.
Let F: L(A) -> S be the forgetful functor and let G: S -> L(A) be the functor defined in §8B which takes every set S to the free linear space F_S generated by S over A. Prove that for every X ∈ L(A) and every S ∈ S,
t_(X,S): Map(S, F(X)) -> Hom(G(S), X),
which takes every mapping φ to its linear extension F_S -> X, is a bijective mapping. Prove also that the following diagrams (with the vertical arrows induced by a mapping S' -> S and by a linear transformation X -> X' respectively) are commutative:

    Map(S, F(X)) -----> Hom(G(S), X)
        |                    |
        v                    v
    Map(S', F(X)) ----> Hom(G(S'), X)

    Map(S, F(X)) -----> Hom(G(S), X)
        |                    |
        v                    v
    Map(S, F(X')) ----> Hom(G(S), X')
CHAPTER III AFFINE GEOMETRY
To define the basic notions of geometry, we can follow the so called synthetic approach by postulating geometric objects (e.g. points, lines and planes) and geometric relations (e.g. incidence and betweenness) as primitive undefined concepts and proceed to build up the geometry from a number of axioms which are postulated to govern and regulate these primitive concepts. No doubt this approach is most satisfactory from the aesthetic as well as from the logical point of view. However it will take quite a few axioms to develop the subject beyond a few trivial theorems, and the choice of a system of axioms which are both pedagogically suitable and mathematically interesting does present some difficulty at this level. One alternative to the synthetic approach is the so called analytic
(or algebraic) approach which puts the geometry on an algebraic basis by defining geometric objects in terms of purely algebraic concepts. This approach has the advantage of having enough properties implicitly built into the definitions, so that the axioms of the synthetic geometry become theorems in the analytic geometry. Thus
the initial difficulties encountered in the synthetic approach are circumvented. This approach is also more appropriate to the present mainly algebraic course since it lends itself to bringing out clearly the interplay of algebraic and geometric ideas.
§9. Affine Space
As a prototype affine space, we shall take the ordinary plane E. Here we shall concern ourselves chiefly with incidence and parallelism in E and not too much with its metric properties, i.e. properties that involve length, angle measurement and congruence.
To study the geometry of E algebraically, we can first choose a pair of rectangular coordinate axes and then proceed to identify the
points of E with elements of the set R X R. This is the familiar method followed by secondary school coordinate geometry and we have seen that geometrical theories have been greatly assisted by the use of coordinates. Unfortunately the success of these tools has often overshadowed the main interest of the geometry itself which lies
only in results which are invariant under changes of the coordinate systems.
It is therefore desirable to begin our study of affine geometry with
a coordinate-free definition of affine space. We have seen at the beginning of Chapter 1, that by choosing an arbitrary point 0 as the origin, we obtain a one-to-one correspondence between the set E and the linear space V of all vectors of E with common initial point 0. To identify points of E with vectors of V would be just as unsatisfactory as with vectors of R2 since this would give undue prominence
to the point 0 which is to be identified with the zero vector of V. Consider now the linear space V' of all vectors of E with common initial point P, where P is an arbitrary point of E, and proceed to compare the linear spaces V and V' geometrically. In the affine geometry
of E, not only are points regarded as equivalent among themselves but also parallel vectors are regarded as equivalent, since a vector can be taken to a parallel vector by an allowable transformation. Now this relation between parallel vectors can be formulated by means of an isomorphism φ_P: V' -> V defined as follows. For any vector x = (P, Q) of V', we define φ_P(x) = (O, R) as the vector of V such that the equation x + (P, O) = (P, R) holds in V', or in other words, PQRO is a parallelogram in the ordinary plane E.
The relation between ordered pairs of points of E and vectors of V can now be formulated by means of a mapping τ: E x E -> V defined by
τ(P, Q) = φ_P(P, Q) for all P, Q ∈ E.
We note that the dependence of V on the choice of the origin O can be regarded as merely a matter of notation so far as the mapping τ is concerned. The mapping τ is easily seen to satisfy the following properties:
(1) τ(P, Q) + τ(Q, R) = τ(P, R) for all P, Q, R ∈ E;
(2) for each P ∈ E, the mapping τ_P: E -> V, defined by τ_P(Q) = τ(P, Q) for all Q ∈ E, is bijective.
A. Points and vectors Following the intuitive discussion above, we shall begin our study of affine geometry with a formal definition of affine space.
DEFINITION 9.1. Let X be a linear space over A. An affine space attached to X is an ordered pair (A, τ) where A is a non-empty set and τ: A² -> X is a mapping such that the following conditions are satisfied:
[Aff 1] τ(P, Q) + τ(Q, R) = τ(P, R) for any three elements P, Q and R of A;
[Aff 2] for each element P of A the mapping τ_P: A -> X, such that τ_P(Q) = τ(P, Q) for all Q ∈ A, is bijective.
It follows from the definition that an affine space (A, r) attached to a linear space X over A consists of a non-empty set A, a linear space X over A and a mapping r that satisfies the axioms [Aff 1 ] and [Aff 2] ; however, if there is no danger of misunderstanding, we shall simply use A as an abbreviation of (A, r) and say that A is an affine
space. The dimension dim A of an affine space A attached to the linear space X is defined as dim X. It is convenient to refer to elements of the set A as points of the affine space A and elements of the linear space X as vectors of the affine space A, while elements of A are called scalars as usual. For any two points P and Q of the affine space A we write PQ = τ(P, Q) and call P and Q the initial point and the endpoint of the vector PQ respectively. Under these notations, we can give the axioms [Aff 1] and [Aff 2] a more geometric formulation:
[Aff 1] PQ + QR = PR for any three points P, Q and R of the affine space A.
[Aff 2] For any point P and any vector x of the affine space A there is one and only one point Q of A such that PQ = x.
By the second axiom above, we can denote by P + x the unique point Q of A such that PQ = x. Let us now verify some basic properties of points and vectors of an affine space.
THEOREM
9.2. Let A be an affine space and P, Q any two points of
A. Then
(i) PP = 0, (ii) PQ = -QP, and
(iii) P = Q if PQ = 0.
PROOF. (i) and (ii) follow immediately from the substitution Q = P and the substitution R = P in [Aff 1] respectively. To prove (iii), we observe that PQ = 0 = PP. Therefore P = Q follows from [Aff 2].
Before we study some examples of affine spaces, let us observe that the mapping τ: A² -> X which defines the structure of an affine space on the set A is not a composition law in the sense of §1A, and hence the affine space (A, τ) is not an algebraic system in the sense of §1A. However, for each affine space A we can define an external composition law σ: A x X -> A in the following way:
σ(P, x) = P + x for all P ∈ A and all x ∈ X.
This composition law σ obviously satisfies the following requirements:
(a) for any P and Q in A there exists x ∈ X such that σ(P, x) = Q;
(b) σ(P, x + y) = σ(σ(P, x), y) for all P ∈ A and x, y ∈ X;
(c) for any x ∈ X, x = 0 if and only if σ(P, x) = P for all P ∈ A.
In this way we get an algebraic system (A, σ) associated to the affine space A.
Conversely if (A, σ) is an algebraic system where A is a non-empty set, X is a linear space over A and σ: A x X -> A satisfies the requirements (a), (b) and (c), then we verify by straightforward calculation that for each P ∈ A the mapping x -> σ(P, x) is a bijective mapping of X onto A. In other words, for any two elements P and Q of A there is a unique vector x of X such that Q = σ(P, x). Therefore we get a mapping τ: A² -> X such that
τ(P, Q) = x where Q = σ(P, x).
It is easy to verify that (A, τ) is an affine space attached to X and furthermore σ(P, x) = P + x for all P ∈ A and x ∈ X.
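The passage between τ and σ can be checked mechanically on a small example. The following Python sketch (our illustration, taking A = X = R² with points and vectors stored as pairs) defines τ(P, Q) = Q - P and σ(P, x) = P + x and verifies [Aff 1], [Aff 2] and requirement (b) on sample data.

    # A concrete model: the canonical affine space attached to R^2.
    def tau(P, Q):                 # the vector PQ
        return (Q[0] - P[0], Q[1] - P[1])

    def sigma(P, x):               # the point P + x
        return (P[0] + x[0], P[1] + x[1])

    def add(x, y):
        return (x[0] + y[0], x[1] + y[1])

    P, Q, R = (1.0, 2.0), (4.0, -1.0), (0.0, 5.0)
    # [Aff 1]: PQ + QR = PR
    assert add(tau(P, Q), tau(Q, R)) == tau(P, R)
    # [Aff 2]: P + PQ recovers Q
    assert sigma(P, tau(P, Q)) == Q
    # requirement (b): sigma(P, x + y) = sigma(sigma(P, x), y)
    x, y = (2.0, 2.0), (-1.0, 3.0)
    assert sigma(P, add(x, y)) == sigma(sigma(P, x), y)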
EXAMPLE 9.3. For each linear space X over A we define a mapping τ: X² -> X by setting τ(x, y) = y - x for all x, y ∈ X.
The axioms [Aff 1 ] and [Aff 2] are easily verified, and therefore
(X, τ) is an affine space attached to X. We call this the canonical affine space attached to X and denote it by Aff(X). In particular, the affine spaces Aff(R^n) and Aff(C^n) are the affine spaces we study in ordinary coordinate geometry.
EXAMPLE 9.4. Consider a subspace Y of a linear space X over A. If z is a fixed vector of X, then the subset A = {x ∈ X: x - z ∈ Y} of X is an element of the quotient space X/Y. Moreover A does not, in general, constitute a subspace unless z = 0, and in this case A = Y. If we define τ: A² -> Y by τ(x, y) = y - x for all x, y ∈ A, then (A, τ) is an affine space attached to Y. We can illustrate this state of affairs by taking X = R², Y = {(α1, α2): α1 = α2} and A = {(α1, α2): α2 = α1 - 1}.
Fig. 5
EXAMPLE 9.5. Let X be a linear space over A and A an affine space attached to X. Unlike the zero vector 0, which plays a distinguished role in the algebraic structure of X, no particular point of the affine space A distinguishes itself from the other points of A. However if
we choose a fixed point P of A as a point of reference, then by means of the bijective mapping τ_P: A -> X (τ_P(Q) = PQ) we can identify the set A with the set X. As a result of this identification, A
becomes a linear space over A with respect to the following addition and multiplication:
Q + R = S where PS = PQ + PR,
λQ = T where PT = λPQ.
We call this linear space A the linear space obtained by choosing P as the origin. The origin P now plays the role of the zero vector and the mapping τ_P is now an isomorphism.
B. Barycentre
Corresponding to the concept of linear combination of vectors of
a linear space, we have here in affine geometry the concept of barycentre of points of an affine space. With the aid of this concept we shall be able to formulate most of our results in affine geometry in terms of points of the affine space alone. Let us consider k (k > 0) points P1, P2, ..., Pk of an affine space A and k scalars λ1, λ2, ..., λk of A such that λ1 + λ2 + ... + λk = 1.
Choose an arbitrary point P of A and consider the linear combination λ1PP1 + λ2PP2 + ... + λkPPk. By axiom [Aff 2], there exists a unique point R of the affine space A such that
PR = λ1PP1 + λ2PP2 + ... + λkPPk.
We want to show that the point R is uniquely determined by the points Pi and the scalars λi, and is actually independent of the choice of the point P. Indeed, for any point Q of A, we get
QR = QP + PR
   = (λ1 + λ2 + ... + λk)QP + (λ1PP1 + λ2PP2 + ... + λkPPk)
   = λ1(QP + PP1) + λ2(QP + PP2) + ... + λk(QP + PPk)
   = λ1QP1 + λ2QP2 + ... + λkQPk.
Therefore the point R does not depend on the choice of the point P; in particular, by choosing P = R, we see that the point R is uniquely determined by the equation
λ1RP1 + λ2RP2 + ... + λkRPk = 0.
Hence the following definition is justified.
DEFINITION 9.6. Let X be a linear space over A and A an affine space attached to X. For any k (k > 0) points P1, P2, ..., Pk of A and any k scalars λ1, λ2, ..., λk such that λ1 + λ2 + ... + λk = 1, the unique point R of A such that λ1RP1 + λ2RP2 + ... + λkRPk = 0 is called the barycentre of the points P1, P2, ..., Pk corresponding to the weights λ1, λ2, ..., λk. In this case we write
R = λ1P1 + λ2P2 + ... + λkPk.
We observe that in writing R = λ1P1 + λ2P2 + ... + λkPk, i.e., R is the barycentre of the Pi corresponding to the λi, the condition λ1 + λ2 + ... + λk = 1 must be fulfilled, as it is required by the definition. The following example will illustrate this state of affairs. In the ordinary plane, let P1 and P2 be two distinct points. If λ1 = λ2 = 1 (so that λ1 + λ2 ≠ 1) and if we take two distinct points P and P' as points of reference, then the points R and R' determined respectively by
PR = PP1 + PP2 and P'R' = P'P1 + P'P2
are also distinct.
Fig. 6
Hence the process does not give a unique point corresponding to P1 and P2, so that we cannot use it to attach a meaning to P1 + P2. On the other hand if λ1 = λ2 = 1/2 (so that λ1 + λ2 = 1), the point M = (1/2)P1 + (1/2)P2
has a definite meaning, i.e., the barycentre of P1 and P2 corresponding to the weights 1/2 and 1/2. In fact M is the midpoint of the points P1 and P2 in the ordinary sense. In general, for any k points P1, P2, ..., Pk of an affine space, the barycentre
C = (1/k)P1 + (1/k)P2 + ... + (1/k)Pk
is called the centroid of the points P1, P2, ..., Pk. For k = 3, we get
C = (1/3)P1 + (1/3)P2 + (1/3)P3
and therefore
P3C = (1/3)(P3P1 + P3P2).
Hence, if P1, P2, P3 are three points on the ordinary plane then the centroid C is the centroid of the triangle P1P2P3 in the ordinary sense.
Fig. 7
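The equation PR = λ1PP1 + ... + λkPPk gives a direct recipe for computing barycentres once a reference point has been chosen. The Python sketch below (our illustration, working in the plane with points stored as pairs of floats) does exactly this and confirms that the result is independent of the reference point; the centroid of a triangle is the special case of equal weights.

    def barycentre(points, weights, ref=(0.0, 0.0)):
        # R = ref + sum of weights[i] * (points[i] - ref); the weights must sum to 1.
        assert abs(sum(weights) - 1.0) < 1e-12
        vx = sum(w * (p[0] - ref[0]) for p, w in zip(points, weights))
        vy = sum(w * (p[1] - ref[1]) for p, w in zip(points, weights))
        return (ref[0] + vx, ref[1] + vy)

    P1, P2, P3 = (0.0, 0.0), (6.0, 0.0), (0.0, 3.0)
    # the midpoint of P1 and P2 does not depend on the reference point:
    assert barycentre([P1, P2], [0.5, 0.5], ref=(0.0, 0.0)) == \
           barycentre([P1, P2], [0.5, 0.5], ref=(7.0, -2.0))
    # the centroid of the triangle P1 P2 P3:
    print(barycentre([P1, P2, P3], [1/3, 1/3, 1/3]))   # (2.0, 1.0)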
For formal calculations with barycentres, a distributive law holds:
(*)  Σ_{i=1..p} μi (Σ_{j=1..q} λij Pj) = Σ_{j=1..q} (Σ_{i=1..p} μi λij) Pj,
where, for all i = 1, ..., p and j = 1, ..., q, Pj is a point of an affine space A and μi, λij are scalars such that
Σ_i μi = 1 and Σ_j λij = 1 for all i = 1, ..., p.
We observe that Qi = Σ_j λij Pj is a barycentre for each i = 1, ..., p, and hence on the left-hand side of (*) we get the barycentre Q = Σ_i μi Qi, which is characterized by
Σ_i μi QQi = 0.
The assumption on the scalars μi and λij implies that Σ_{i,j} μi λij = 1, and hence on the right-hand side of (*) we get the barycentre P = Σ_j (Σ_i μi λij) Pj, which is characterized by
Σ_j (Σ_i μi λij) PPj = 0.
Therefore, by the distributive law of linear combinations of vectors, we get
PQ = Σ_i μi PQi = Σ_i μi (Σ_j λij PPj) = Σ_j (Σ_i μi λij) PPj = 0,
and hence P = Q, proving the distributive law (*) of barycentres.
It is now straightforward to extend Definition 9.6 of barycentre to a definition of barycentre of an arbitrary non-empty family of points of an affine space. Let (Pi)_{i∈I} be a non-empty family of points of an affine space A and (λi)_{i∈I} a family of scalars of finite support such that Σ_{i∈I} λi = 1. Then there exists a unique point R of the affine space A such that
PR = Σ_{i∈I} λi PPi for all P ∈ A,
or equivalently
Σ_{i∈I} λi RPi = 0.
R is then called the barycentre of the points Pi corresponding to the weights λi (i ∈ I), and we write
R = Σ_{i∈I} λi Pi.
C.
Linear varieties
With the aid of the concept of barycentre, which plays a role in affine geometry similar to that played by the concept of linear com-
binations in linear algebra, we are in a position to generalize the ordinary notions of "straight line" and "plane" of elementary geometry.
DEFINITION 9.7. Let X be a linear space over A and A an affine space attached to X. A subset V of A is a linear variety of the affine space A if for any k (k > 0) points P1, ..., Pk of V and any k scalars λ1, ..., λk of A such that λ1 + ... + λk = 1, the barycentre λ1P1 + ... + λkPk belongs to V.
It follows from the definition that the empty set ∅ and the set A itself are linear varieties of the affine space A. The intersection of any family of linear varieties of A is clearly also a linear variety of A. However, the union of linear varieties is, in general, not a linear variety.
EXAMPLE 9.8.
Let A be an affine space and S a subset of A. It
follows immediately from the distributive law of barycentres that the
subset V of A consisting of all barycentres P of points of S (i.e., P = Σ_{i∈I} λi Pi where Pi ∈ S and (λi)_{i∈I} is a family of finite support such that Σ_{i∈I} λi = 1) constitutes a linear variety of the affine space A. We call V the linear variety of A generated or spanned by the points of S. Clearly V is the smallest linear variety (in the sense of inclusion) that includes the given set S. In particular the empty linear variety ∅ is generated by the empty set ∅, and the linear variety generated by a singleton subset {P} is {P} itself. For practical purposes, we usually make no distinction between the linear variety {P} and the point P.
Let us now examine the formal relations between the linear varieties of an affine space A and the subspaces of the linear space X to which A is attached. THEOREM 9.9. Let X be a linear space over A and A an affine space
attached to X. For any non-empty subset V of A the following properties are equivalent: (i) V is a linear variety of A;
(ii) for any fixed point P of V the set of all vectors PQ where Q ∈ V is a subspace of X;
(iii) there exists a point P of V such that the set of all vectors PQ where Q ∈ V is a subspace of X.
PROOF. (i) ⇒ (ii). Since PP = 0, the set of vectors in question is
clearly non-empty. It remains to be shown that for any two points Q, R of V and any two scalars λ, μ of A there exists a point S in V such that PS = λPQ + μPR. Since P, Q and R belong to V and V is a linear variety, the barycentre S = (1 - λ - μ)P + λQ + μR belongs to V and satisfies the equation PS = λPQ + μPR.
The implication (ii) ⇒ (iii) is trivial.
(iii) ⇒ (i). Let P1, ..., Pk be k points of V and λ1, ..., λk k scalars of A such that λ1 + ... + λk = 1. We have to show that the barycentre R = λ1P1 + ... + λkPk belongs to V. By (iii) there exists a point Q in V such that PQ = λ1PP1 + ... + λkPPk. On the other hand, PR = λ1PP1 + ... + λkPPk; therefore PR = PQ. Hence R = Q belongs to V.
If P and P' are any two points of a non-empty linear variety V of A, then by 9.9 we get two subspaces Y = {PQ: Q ∈ V} and Y' = {P'Q: Q ∈ V} of X. Now it follows from the equation
PQ = P'Q - P'P
that Y ⊂ Y'. Similarly Y' ⊂ Y, and hence Y' = Y. Moreover it follows that Y = {PQ: P ∈ V and Q ∈ V}; in other words Y is the subspace of X consisting of all vectors whose initial points and endpoints both belong to V. Therefore the non-empty linear variety V can be regarded as
an affine space attached to the linear space Y. Consequently the non-empty linear varieties of an affine space A can be regarded as the "subspaces" of the affine space A.
We define the direction of a non-empty linear variety V of an affine space A as the linear space Y = {PQ: P ∈ V and Q ∈ V}. Thus the
direction of a point is the zero linear space 0 while the direction of the entire affine space A is the linear space X. It is easily verified that given a subspace Y of X and a point P of A,
there exists a unique linear variety V with direction Y and passing through (i.e. containing) P. In particular if A = Aff(X), then the
linear varieties of A with a fixed direction Y are precisely the elements of the quotient space X/Y. With the aid of directions, we can define parallel linear varieties and the dimension of a linear variety. Two non-empty linear varieties V1 and V2, with directions Y1 and Y2 respectively, are said to be
parallel if Y1 ⊃ Y2 or Y1 ⊂ Y2. The dimension of a non-empty variety V (notation: dim V) is defined to be the dimension of its direction. For the empty variety ∅ we put dim ∅ = -1. By Theorem 4.4, we get the following theorem.
THEOREM 9.10. If a linear variety V is included in a linear variety W, then dim V ≤ dim W. Furthermore dim V = dim W if and only if V = W.
In accordance with the usual notation in elementary geometry, we call 1-dimensional linear varieties lines and 2-dimensional linear
varieties planes. An (n-1)-dimensional linear variety of an n-dimensional affine space A is called a hyperplane of A. In general if A is an affine space attached to X and V is a linear variety of A with direction Y and if dim(X/Y) = 1, then we say that V is a hyperplane of A.
D.
Lines We now wish to see whether the points and lines of an affine space
have the usual properties of the "points" and "lines" in elementary geometry. As in elementary geometry, for any point P and any line L of an affine space A we say that P lies on L or L passes through P if PEL. By definition the direction of a line contains non-zero vectors; therefore on a line of an affine space there lie at least two distinct points.
If P and Q are two distinct points of an affine space A, then the linear variety L generated by P and Q consists of all barycentres R = ( 1 - X) P + XQ. Thus P i t = XP ; but on the other hand P? * 0 and therefore the direction of L has the dimension 1. Hence L is a line passing through P and Q. By 9.10, L is the only line passing through
P and Q. Therefore through any two distinct points of an affine space there passes one and only one line. Consequently it follows that two lines of an affine space are equal if and only if they have two distinct points in common.
Similarly, we can prove that given a line L and a point P of an affine space there passes through P one and only one line parallel to L.
In Theorem 9.9 we have an algebraic characterization of linear varieties in terms of vectors; the next theorem gives a geometric characterization of linear varieties in terms of lines.
THEOREM 9.11. A subset V of an affine space A is a linear variety of
A if and only if for any two different points P and Q of V the line passing through them is included in V.
PROOF. The condition is clearly necessary for V to be a linear
variety. To prove the sufficiency, we may assume V to be nonempty, for otherwise the theorem is true trivially. Let P be a point V. Then by 9.9 we need only show that the vectors PQ, where Q E V,
form a subspace of the linear space X to which A is attached. In other words, we prove that for any two points Q and R of V and for any scalar A
(i) PS = λPQ for some S ∈ V,
(ii) PT = PQ + PR for some T ∈ V.
Now, if Q = P, then (i) is trivial. Suppose Q ≠ P; then S = (1 - λ)P + λQ lies on the line passing through P and Q. Therefore S belongs to V and satisfies λPQ = PS. To prove (ii), let T = 2M - P where M = (1/2)Q + (1/2)R
is the midpoint of Q and R. T belongs to V; furthermore PT = 2PM = 2((1/2)PQ + (1/2)PR) = PQ + PR. This completes the proof of the theorem.
E. Base
Consider k + 1 points P0, P1, ..., Pk of an affine space, where k ≥ 0. If V is the linear variety generated by the points Pi (i = 0, 1, ..., k) and Y is the direction of V, then the linear space Y is generated by the k vectors P0P1, P0P2, ..., P0Pk. By definition dim V = dim Y; therefore dim V ≤ k; furthermore dim V = k if and only if the k vectors P0P1, P0P2, ..., P0Pk are linearly independent. These results lead us naturally to a definition of linear independence of points of an affine space.
DEFINITION 9.12. A family (P0, P1, ..., Pk) of k + 1 points of an affine space where k ≥ 0 is linearly independent if the linear variety generated by the points Pi (i = 0, 1, ..., k) has dimension k; otherwise, it is linearly dependent.
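Definition 9.12 can be tested numerically: the family (P0, P1, ..., Pk) is linearly independent exactly when the k vectors P0P1, ..., P0Pk have rank k. A short numpy sketch (our illustration, for points of Aff(R³)):

    import numpy as np

    def affinely_independent(points):
        """True iff (P0, ..., Pk) is linearly independent in the sense of 9.12."""
        pts = np.asarray(points, dtype=float)
        diffs = pts[1:] - pts[0]                 # the vectors P0Pi
        k = len(diffs)
        return k == 0 or np.linalg.matrix_rank(diffs) == k

    print(affinely_independent([[0, 0, 0], [1, 1, 0], [2, 2, 0]]))  # False: collinear
    print(affinely_independent([[0, 0, 0], [1, 0, 0], [0, 1, 0]]))  # True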
In the usual way, an infinite linearly independent family of points can be defined. Similar to the situation in §2B, we can also
say that the points P0, P1, ..., Pk are linearly independent if the family (P0, P1, ..., Pk) of points is linearly independent. Thus a single point P0 is always linearly independent, and so are two distinct points P0 and P1. Three points P0, P1 and P2 are linearly dependent if and only if they are collinear, i.e., they lie on one and the same line. Finally it is easy to verify that subfamilies of a linearly independent family of points are linearly independent. With the aid of the notion of linear independence, the concept of a base of a linear variety is defined, modelled on Theorem 2.6.
DEFINITION 9.13. A family (P0, P1, ..., Pk) of points of a linear variety V is a base of V if it is linearly independent and the points Pi (i = 0, 1, ..., k) generate V. Bases of infinite-dimensional linear varieties are defined similarly.
With respect to a base, coordinates can be assigned uniquely to points of a linear variety. This state of affairs is dealt with in the following theorem.
THEOREM 9.14. Let (P0, P1, ..., Pk) be a family of points of a linear variety V. The following statements are equivalent:
(i) the family (P0, P1, ..., Pk) is a base of V;
(ii) every point Q of V has a unique representation
Q = λ0P0 + λ1P1 + ... + λkPk, where λ0 + λ1 + ... + λk = 1,
as a barycentre of the points P0, P1, ..., Pk.
PROOF. (i) ⇒ (ii). Since the points Pi generate V, every point Q of V is a barycentre Q = λ0P0 + λ1P1 + ... + λkPk. The points P0, P1, ..., Pk are linearly independent; therefore the vectors P0P1, ..., P0Pk are linearly independent. Hence it follows from the equation P0Q = λ1P0P1 + ... + λkP0Pk that the k scalars λ1, ..., λk are uniquely determined by Q. But so is λ0, since λ0 + λ1 + ... + λk = 1.
(ii) ⇒ (i). Since every point Q of V is a barycentre of the points P0, P1, ..., Pk, the k + 1 points Pi generate V. Hence the k vectors P0P1, ..., P0Pk generate the direction Y of V, and it remains to be proved that these vectors are linearly independent. Suppose μ1P0P1 + ... + μkP0Pk = 0. Then putting μ0 = 1 - (μ1 + ... + μk), we get μ0P0P0 + μ1P0P1 + ... + μkP0Pk = 0, and hence P0 = μ0P0 + μ1P1 + ... + μkPk. By (ii), μ0 = 1 and μ1 = μ2 = ... = μk = 0. Therefore the vectors P0P1, ..., P0Pk are linearly independent.
Consequently, with respect to a base (P0, P1, ..., Pk) of the linear variety V, every point Q determines a unique family of scalars (λ0, λ1, ..., λk) such that
Q = λ0P0 + λ1P1 + ... + λkPk and λ0 + λ1 + ... + λk = 1,
and vice versa. We can therefore call the family (λ0, λ1, ..., λk) of k + 1 scalars the barycentric coordinates of the point Q relative to the base (P0, P1, ..., Pk). We note here that the number k + 1 of barycentric coordinates of a point of V is by 1 greater than the dimension dim V = k of V, and that the barycentric coordinates are subject to the condition that their sum is equal to 1.
In the ordinary plane E, any three non-collinear points P0, P1 and P2 form a base (P0, P1, P2) of E. Relative to this base the points P0, P1, P2 have barycentric coordinates (1, 0, 0), (0, 1, 0), (0, 0, 1) respectively. The midpoint M of P1 and P2 has barycentric coordinates (0, 1/2, 1/2) while the centroid C of the triangle P0P1P2 has barycentric coordinates (1/3, 1/3, 1/3).
Let A be again an affine space and (P0, ..., Pk) a base of A. If Q is an arbitrary point of A, then the parallel coordinates of the point Q of the affine space A relative to the base (P0, ..., Pk) are defined to be the family (μ1, ..., μk) of k scalars such that
P0Q = μ1P0P1 + ... + μkP0Pk.
Here the point P0 plays the role of the origin of the coordinate system. Clearly this method is essentially the same as that of secondary school coordinate geometry. For example, for the points P0, P1, P2 and M of E above, we get (1/2, 1/2) as the parallel coordinates of M relative to (P0, P1, P2).
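The two coordinate systems can be related by a short computation. In the sketch below (our illustration; the base points P0, P1, P2 of E are given arbitrary ordinary coordinates) a point is built from its barycentric coordinates, and its parallel coordinates are then read off by solving P0Q = μ1 P0P1 + μ2 P0P2.

    import numpy as np

    P0, P1, P2 = np.array([1.0, 1.0]), np.array([4.0, 1.0]), np.array([1.0, 3.0])

    def from_barycentric(l0, l1, l2):
        assert abs(l0 + l1 + l2 - 1.0) < 1e-12
        return l0 * P0 + l1 * P1 + l2 * P2

    def parallel_coordinates(Q):
        # solve P0Q = mu1 * P0P1 + mu2 * P0P2 for (mu1, mu2)
        M = np.column_stack([P1 - P0, P2 - P0])
        return np.linalg.solve(M, Q - P0)

    M_mid = from_barycentric(0.0, 0.5, 0.5)      # midpoint of P1 and P2
    print(parallel_coordinates(M_mid))           # [0.5 0.5]
    C = from_barycentric(1/3, 1/3, 1/3)          # centroid of the triangle
    print(parallel_coordinates(C))               # approximately [1/3 1/3]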
F. Exercises
1. Four points P, Q, R, S in an affine space are said to form a parallelogram PQRS if QP = RS. Show that if PQRS is a parallelogram, then the midpoint of P and R and the midpoint of Q and S coincide.
2. Let T, U, V, W be four points in an affine space. Let the points P, Q, R, S be the midpoints of T and U, U and V, V and W, W and T respectively. Prove that PQRS is a parallelogram.
3. Let (P0, P1, ..., Pn) be a base of an n-dimensional affine space. Let (x1, ..., xn) denote the parallel coordinates of an arbitrary point P of the affine space relative to this base. Show that the set of points whose parallel coordinates satisfy the linear equations
(*)  α11 x1 + ... + α1n xn = β1, ..., αr1 x1 + ... + αrn xn = βr
is a linear variety. Conversely, prove that if L is a linear variety then scalars αjk and βj can be found such that the points of L are precisely those points whose parallel coordinates satisfy linear equations of the form (*).
4.
Let L be a linear variety of an affine space and P a point not on L. (a)
5.
Consider the set of all lines passing through P and a point of L. Let M be the set of all points on these lines. Is M a
linear variety? (b) Let A be a fixed scalar. Let N be the set of all points S such that PT = APQ for some point Q of L. Is N a linear variety? Determine all pairs (p, q) of natural numbers for which a pair L, M of linear varieties of an n-dimensional affine space exists such
that dim L = p, dimM=q andLr)M=Q. 6.
Let L and M be linear varieties of an affine space and let N be the set of barycentres AP + pQ wherePEL and QeM. Show that N contains every line joining a point of L to a point of M. Show that N is not necessarily a linear variety.
7.
Let S and T be skew lines in a 3-dimensional affine space. Show
that there exist a unique plane PS containing S and parallel to T, and a unique plane PT containing T and parallel to S. Show also that PS and PT are parallel. 8.
Let X be a linear space and xieX for i = 1, ... , r. Denote by Y
the subspace of X generated by the r-1 vectors xi - x,
... , r). Denote by L the linear variety of Aff(X) spanned by the points x 1 , . . . , x, of Aff(X) . Show that the points of L are ( i = 2,
precisely those vectors of X of the form x, + y where y E Y. Deduce that Y is the direction of L. 9.
Let P, , ... , P,, Pr+, , ... , PS be points of an affine space.
Show that the line through the centroid of P, , ... , P, and the centroid
of P,+, , ... , Ps passes through the centroid of
P, , ... , PS . Consider the different cases for s = 4. 10. Let P and Q be two points of an affine space. By the segment PQ we understand the set of points of the form Al' + µQ where A, µ > 0 and A + p = 1. A subset S of an affine space is said to be convex if for any two points P, Q of S all the points of the segment PQ belong to S. Give examples of convex subsets and examples of subsets which are not convex.
11. Show that if S is a subset of an affine space, then there is a smallest convex subset S containing S. Show that S is unique. (S is called the convex closure of S).
12.
Let P, , . . . , P, be r points of an affine space prove that the convex closure of the subset (PI , , P,) consists of all points of the form XIPI + ... + X P, where X, > 0 and Al + ... + A, = 1.
13.
Let L be a 2-dimensional linear variety in a 3-dimensional affine
space A. Show that the points of A \L are partitioned into two disjoint subsets called half spaces such that every pair of points of one and the same half space can be joined by a segment without passing through L. 14. Let L be a 2-dimensional linear variety in a 4-dimensional affine space A. Show that if P and Q are any two points of A\L then they can be joined by a series of consecutive segments PT I , T I T2, . . . , T -I T,,, T Q without passing through L.
§ 10. Affine Transformations
A. General properties
Let X and X' be linear spaces over A, and let (A, r) and (A',r')
be affine spaces attached to X and X' respectively. We are now interested in mappings 4' of the set A into the set A' which are compatible with the affine structures defined on the sets A and A'; these mappings are called affine transformatiQns of the affine space
A into the affine space A'. The requirements that an affine transformation 4' has to satisfy can be expressed in terms of vectors of X
and X as follows. Every mapping 4': A - A' of sets gives rise to a mapping 4' x 4': A x A --)- A' x A' defined by
Φ x Φ(P, Q) = (Φ(P), Φ(Q)) for all P, Q ∈ A.
We verify without much difficulty that a unique mapping φ: X -> X' exists such that τ'∘(Φ x Φ) = φ∘τ, i.e., the diagram

    A x A ----τ----> X
      |              |
    Φ x Φ            φ
      v              v
    A' x A' ---τ'--> X'
is commutative, if and only if the following condition is satisfied:
(i) for any four points P, Q, R, S of A, Φ(P)Φ(Q) = Φ(R)Φ(S) if PQ = RS.
In this case φ: X -> X' is given by φ(PQ) = Φ(P)Φ(Q) for all P, Q ∈ A.
Now we say that Φ: A -> A' is an affine transformation if the condition (i) is satisfied and
(ii) the unique mapping φ: X -> X' such that φ(PQ) = Φ(P)Φ(Q) is a linear transformation.
This is a good but rather complicated definition of an affine trans-
formation; we shall now formulate an equivalent geometric definition, i.e., in terms of points and barycentres.
We observe first that the equations of vectors involved in the conditions (i) and (ii) can be expressed as equations of points and barycentres. In fact for any six points P, Q, R, S, T, U of an affine space A the following statements hold:
(a) PQ = RS if and only if S = Q + R - P;
(b) PQ + RS = PT if and only if T = Q + S - R;
(c) λPQ = PU if and only if U = (1 - λ)P + λQ.
Therefore we can write the condition (i) as
(iii) Φ(S) = Φ(Q) + Φ(R) - Φ(P) if S = Q + R - P;
and, similarly, the condition (ii) as
(iv) Φ(T) = Φ(Q) + Φ(S) - Φ(R) if T = Q + S - R; Φ(U) = (1 - λ)Φ(P) + λΦ(Q) if U = (1 - λ)P + λQ.
Thus we see that (iii), (iv) and hence also (i), (ii) are satisfied if Φ takes barycentres into barycentres with the corresponding weights unaltered. This leads to the following geometric definition of an affine transformation:
DEFINITION 10.1. Let A and A' be affine spaces attached to the linear spaces X and X' over A, respectively. A mapping Φ: A -> A' of the set A into the set A' is an affine transformation of the affine space A into the affine space A' if
Φ(λ1P1 + ... + λkPk) = λ1Φ(P1) + ... + λkΦ(Pk)
for any k (k > 0) points P1, ..., Pk of A and any k scalars λ1, ..., λk of A such that λ1 + ... + λk = 1.
Clearly any affine transformation in the sense of Definition 10.1 satisfies (i) and (ii); therefore we get a linear transformation φ: X -> X' defined by
φ(PQ) = Φ(P)Φ(Q) for all P, Q ∈ A.
φ is called the associated linear transformation of the affine transformation Φ. To show that any mapping fulfilling (i) and (ii) is an affine transformation, we first prove the following theorem.
THEOREM 10.2. Let A and A' be affine spaces attached to the linear
spaces X and X' over A respectively. If φ: X -> X' is a linear transformation and if P and P' are points of A and A' respectively, then there exists an affine transformation Φ: A -> A' such that Φ(P) = P' and φ is the associated linear transformation of Φ.
4)(Q) = P' + 0(PQ), for all QEA,
i.e., O(P7) = P'4)( . Clearly 4)(P) = P' and if 4) is an affine transformation, then 0 is its associated linear transformation. Therefore it
remains to be shown that Q = X 1P1 + .. of 4), we get
4)
is an affine transformation. Let
+ XkPk, where X1 + ..
P'cb Q = O(PQ) = 0(x1 P
Xk = 1. Then by definition
i + ... + Xkph)
= X1P'tF(P, + ... + akP'4)(Pk . Therefore 4)(Q) _ X 14)(P3) + . . + Xk'F(Pk) and hence 4) is an affine transformation. -
Finally let 4): A -+ A' be a mapping fulfilling conditions (i) and (ii). Thus 0: X -* X' is the linear transformation defined by ,0 (PQ) = 4)(P) (D (Qj for all P, QEA.
Comparing 4) with the affine transformation of Theorem 10.2 determined by 0 and the points P and cF(P), we see that they are identical. Therefore 4) is an affine transformation.
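Theorem 10.2 is in effect a construction: an affine transformation is determined by its associated linear transformation together with the image of one point. The Python sketch below (our illustration in the plane, with the linear part given as a matrix) builds Φ(Q) = P' + φ(PQ) and checks that a barycentre is sent to the barycentre of the images with the same weights.

    import numpy as np

    def make_affine(phi, P, P_prime):
        """The affine transformation with linear part phi sending P to P_prime."""
        phi, P, P_prime = np.asarray(phi), np.asarray(P), np.asarray(P_prime)
        return lambda Q: P_prime + phi @ (np.asarray(Q) - P)

    Phi = make_affine([[2.0, 1.0], [0.0, 1.0]], P=[1.0, 1.0], P_prime=[5.0, -2.0])

    Q1, Q2, Q3 = np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([1.0, 3.0])
    w = np.array([0.2, 0.5, 0.3])                    # weights summing to 1
    R = w[0] * Q1 + w[1] * Q2 + w[2] * Q3            # the barycentre
    assert np.allclose(Phi(R), w[0] * Phi(Q1) + w[1] * Phi(Q2) + w[2] * Phi(Q3))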
B. The category of affine spaces Let 4): A - A' and 4)': A' -+ A" be affine transformations. Then it is easy to verify that the usual composite 4)'o4): A -- A" is an affine transformation and, furthermore, its associated linear transformation
is /'o( where 0 and 0' are, respectively, the associated linear transformations of (D and V.
The category AMA) of all affine spaces over A is then the category where
(i) objects are affine spaces attached to linear spaces over A; (ii) morphisms are affine transformations; (iii) composite of morphisms has the usual meaning; and (iv) iA : A -* A is the identity mapping.
A covariant functor F: Af(A) -> L(A) of the category of affine spaces over A into the category of linear spaces over A is given by defining F(A) to be the linear space X to which the affine space A is attached and F(Φ) to be the associated linear transformation φ of the affine transformation Φ. Another covariant functor Aff: L(A) -> Af(A) is defined by putting Aff(X) to be the canonical affine space attached to X (see Example 9.3) and Aff(φ) = φ. Then the composite F∘Aff of these two functors is the identity functor of the category L(A), whereas the composite Aff∘F is different from the identity functor of the category Af(A). The isomorphisms of the category Af(A) are bijective affine transformations. To see this, we prove the following theorem.
THEOREM 10.3. Let Φ: A -> A' be a bijective affine transformation
of the affine space A into the affine space A' and let 0 be the associated linear transformation of 4). Then the following statements hold: (a) 0 is an isomorphism, (b) 4-' is an affine transformation, and (c) 07' is the associated linear transformation of 4)-1.
PROOF. (a). Let A and A' be attached to X and X' respectively. For each x'EX' we can find P', Q' r= A' such that P'Q'= x'. If P, Q EA are such that 4) (P) = P' and 4)(Q) = Q', then 0(P Q) = 4)(P) 0 (Q = x'.
Therefore 0 is surjective. Let R, SEA be such that 0(RS) = 0. Then 4(R) 4 (S) = 0 and hence 4'(R) = 1(S). This means that R = S and hence RS = 0. Therefore 0 is injective. This proves statement (a). Statements (b) and (c) are immediate consequences of (a) and the definitions.
A bijective affine transformation is also called an affinity. It follows from Theorem 10.3 that the set of all affinities of an affine
space A onto itself constitutes a group with respect to the usual composition law of mappings. This group is called the affine group of the affine space A. Finally we remark that the affinities treated above are special cases of a more general type of mappings of affine spaces. The interested reader is referred to § 12A, B and C.
CHAPTER IV PROJECTIVE GEOMETRY
§ 11. Projective Space
A. Points at infinity
Let A and A' be two distinct planes in the ordinary space, and let O be a point which is neither on A nor on A'. The central projection p of A into A' with respect to the centre of projection O is defined as follows: for any Q on A we set p(Q) = Q' if the points Q, Q' and O are collinear (i.e. on a straight line). If A and A' are parallel planes, then p is an affinity of the 2-dimensional affine space A onto the 2-dimensional affine space A'. In particular, p is a bijective mapping taking lines into lines, intersecting lines into intersecting lines and parallel lines into parallel lines.
Fig. 8
Consider now the case where A and A' are not parallel. Here two lines, one on each plane, deserve our special attention. The plane which passes through O and is parallel to A' intersects A in a straight line L, and the plane which passes through O and is parallel to A intersects A' in a straight line L'. It is clear that the points on L have no image points on A' under p, and the points on L' are not image points of points on A under p. Therefore we have to exclude these exceptional points in order to obtain a well-defined bijective mapping p: A\L -> A'\L'. The situation in relation to lines is equally unsatisfactory. Take a line G on A and suppose G ≠ L. Then the image points of G will lie on a line G' of A' different from L', since G and G' are on one and the same plane passing through O. Here the set of exceptional points is G ∩ L on L and G' ∩ L' on L'. For G = L, we do not have a corresponding line G' on A'.
Fig. 9
It is now no more true that intersecting lines correspond to intersecting lines and parallel lines correspond to parallel lines. To see
this, consider two lines G, and G2 of A, neither of which is parallel
to the line L. If G, and G2 intersect at a point of L, then the
corresponding G', and G'2 will be parallel; if G, and G2 are parallel, then G', and G'2 will intersect at a point of L'. In order to have a concise theory without all these awkward ex-
ceptions, we can - and this is a crucial step towards projective geometry - extend the plane A (and similarly the plane A') by the adjunction of a set of new points called points at infinity. More
precisely, we understand by a point at infinity of A the di-
rection of a straight line of A and by the projective extension P of A the set of all points of A together with all points at infinity of A. For convenience, we refer to elements of P as POINTS. Furthermore we
define a LINE as either a subset of P consisting of all points of a straight line of A together with its direction or the subset of all points at infinity of A. The LINE consisting of points at infinity is called the line at infinity of A. Thus the projective extension of a plane is obtained by adjoining to the plane the line at infinity of the plane.
We have no difficulty in proving that in the projective extension P of the plane A the following rules are true without exception:
(α) Through any two distinct POINTS there is exactly one LINE.
(β) Any two distinct LINES have exactly one POINT in common.
We observe that (α) holds also if one or both POINTS are at infinity, and (β) holds also if one of the LINES is at infinity. These two incidence properties of the projective extension P stand out in conciseness and simplicity when they are compared with their counterparts (A1), (A2) and (B) of the plane A:
(A1) Through any two points there is exactly one line.
(A2) Through any point there is just one line of any complete set of mutually parallel lines.
(B) Two distinct lines either have exactly one point in common or they are parallel, in which case they have no point in common.
We can now also extend the mapping p to a bijective mapping π: P -> P' by requiring (i) π(L) = I', (ii) π(I) = L', and (iii) that π preserves intersection of LINES, where I and I' denote the LINES at infinity of A and A' respectively. Instead of going into a detailed proof of the existence and uniqueness of π, we illustrate this state of affairs by considering the following particular case.
Let ξ0, ξ1, ξ2 be cartesian coordinates in the ordinary 3-dimensional space. Let A and A' be the planes ξ0 = 1 and ξ1 = 1 respectively and let the centre of projection O be the point with coordinates (0, 0, 0). Then the exceptional lines L and L' of the projection p are given by the equations
ξ0 = 1, ξ1 = 0   and   ξ0 = 0, ξ1 = 1
respectively. Under the projection p, a point Q with coordinates (1, ξ1, ξ2) outside of L (i.e. ξ1 ≠ 0) is taken to the point p(Q) with coordinates (1/ξ1, 1, ξ2/ξ1). Suppose the extensions of A to P, A' to P', and p to π are carried out according to the description given above. For the point Q above, we get π(Q) = p(Q). A point S on L with coordinates (1, 0, α) will be taken to the direction of the line passing through (0, 1, 0) and (1, 1, α). If T is a point at infinity of A (say T is the direction of the line passing through (1, 0, 0) and (1, 1, α)), then π(T) is the point on L' with coordinates (0, 1, α). It is easy to verify that π is a bijection of P onto P' taking LINES into LINES and compatible with incidence.
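The coordinate formula for this particular projection is easy to reproduce. In the sketch below (our illustration) a point (1, ξ1, ξ2) of the plane A is rescaled by 1/ξ1 so that it lands on the plane ξ1 = 1, which is exactly the projection from the centre O = (0, 0, 0); points with ξ1 = 0 lie on the exceptional line L and have no image under p.

    def central_projection(Q):
        """Project Q = (1, xi1, xi2) from O = (0, 0, 0) onto the plane xi1 = 1."""
        xi0, xi1, xi2 = Q
        if xi1 == 0:
            raise ValueError("Q lies on the exceptional line L")
        return (xi0 / xi1, 1.0, xi2 / xi1)

    print(central_projection((1.0, 2.0, 6.0)))   # (0.5, 1.0, 3.0)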
The illustrative example above also suggests that POINTS of the projective extension have ordered triples (subject to a certain restriction) rather than ordered pairs of real numbers as coordinates. In fact in the projective extension P of the plane A (ξ0 = 1) the points of A have coordinates of the form (1, ξ1, ξ2) and the points at infinity can be given coordinates of the form (0, 1, ξ2), i.e. the coordinates of their images under π. Removing the condition that one of the coordinates is 1, we may allow each POINT of the projective extension P to have more than one set of coordinates by requiring that (ξ0, ξ1, ξ2) and (η0, η1, η2) are coordinates of one and the same POINT of P if they have the same ratio, i.e. ξi = λη_i (i = 0, 1, 2) for some λ ≠ 0. This now suggests a preliminary algebraic definition of the projective plane as follows.
Every ordered triple (ξ0, ξ1, ξ2) of real numbers, not all 0, represents a POINT, and every POINT may be represented in this fashion. The ordered triples (ξ0, ξ1, ξ2) and (η0, η1, η2) represent the same POINT if and only if there exists a number λ ≠ 0 such that
ξi = λη_i for i = 0, 1, 2.
cular system of coordinates, a defect that will be removed in due course. B.
Defmition of projective space We now give a definition of a projective space in the language of
linear algebra. In this definition we shall not make use of affine spaces, nor shall we distinguish between ordinary points and points at infinity.
Let X be a linear space over A. An equivalence relation A in the complement X\(0} of (0) in X is defined by the DEFINITION 11.1.
following requirement: xAy if and only if there exists a scalar A 0 0 such that y = Ax. The set of equivalence classes, i.e., the quotient set P(X) = (X\(O))/i is called the projective space derived from X.
A point (i.e., an element) of the projective space P(X) is, by definition, the set of all non-zero vectors of a 1-dimensional subspace of the linear space X. Consequently we can identify the projective space P(X) with the set of all 1-dimensional subspaces of the linear space X. If we denote by a: X \ 10 1 -+ P(X) the canonical surjection
(that maps every vector x of the domain into the equivalence class π(x) determined by x), then every point of the projective space P(X) can be represented by π(x) for some non-zero vector of X. Clearly π(x) = π(y) if and only if y = λx for some non-zero scalar λ of A. If S is a subset of X, we shall write S° for S\{0}; thus π: X° -> P(X) under this notation.
EXAMPLE 11.2. When X = A^(n+1), we write P^n(A) instead of P(A^(n+1)) and call P^n(A) the arithmetical projective space of dimension n. The points of this projective space can be represented by (n+1)-tuples (α0, α1, ..., αn) of scalars of A such that (i) not all αi = 0, and (ii) (α0, α1, ..., αn) and (β0, β1, ..., βn) represent the same point of P^n(A) if and only if αi = λβi (i = 0, 1, ..., n) for some non-zero scalar λ of A.
We observe that the projective extension of an ordinary plane considered in § 11A is a projective space according to Definition 11.1.
The dimension of the projective space P(X), to be denoted by dim P(X ), is defined as dim X -I if dim X is finite, and it is the same as dim X if dim X is infinite. Thus a projective space of dimension -1 is
empty and a projective space of dimension 0 consists of a single point. As usual we call projective space of dimension 1 (respectively 2) projective line (respectively projective plane). EXAMPLE 11.3. Let A be an affine space attached to a linear space X over A. Following the idea outlined in § 11A we can define points at
infinity of A to be the 1-dimensional subspaces of X and the projective extension P of A to be the set of all points and points at infinity of A. However a more useful and precise definition is desirable. Consider the linear space Y = A x X whose vectors are ordered pairs (λ, x) where λ ∈ A and x ∈ X. Then the projective space P(Y) derived from Y has the same dimension as the affine space A. We wish to show that P(Y) and P are essentially equivalent and we may then adopt P(Y) rather than P as the projective extension of A. For this purpose we construct a bijective mapping Φ: P(Y) -> P. Choose an arbitrary point Q of A as a point of reference. For any vector y = (λ, x) of Y, where λ ≠ 0, we put φ(y) to be the point R of A such that QR = (1/λ)x. For any vector y' = (0, x') where x' ≠ 0 we
put φ(y') to be the 1-dimensional subspace X' of X generated by x'. Thus φ(y) is a point of A and φ(y') is a point at infinity of A; in either case we get an element (or a POINT) of P. Moreover the mapping φ: Y\{0} -> P satisfies the following condition: φ(y1) = φ(y2) if y1 = μy2 for some μ ≠ 0. Therefore φ induces a mapping Φ: P(Y) -> P such that Φ(π(y)) = φ(y) for all y ≠ 0. It is easy to verify that Φ is bijective. Hence the projective space P(Y) can be taken as the projective extension of the affine space A. Figure 11 below illustrates the mapping φ, where A is identified with the subset {1} x X of Y as follows. Choose an arbitrary point Q of A as a point of reference and identify every point R of A with the vector (1, QR) of {1} x X.
Fig. 11
C. Homogeneous coordinates
We saw in the preliminary definition of the projective plane in § 1IA and also in Example 11.2, that points of a projective space admit representation by coordinates subject to certain conditions. In general, let X be a linear space of dimension n+1 over A and P(X) the projective space of dimension n derived from X. We denote, as before, by ir: X \ ( 0) - P(X) the canonical surjection. If
(x0, x1, ..., xn) is a base of the linear space X and x = α0x0 + α1x1 + ... + αnxn is a non-zero vector of X, then we say that relative to the base (x0, x1, ..., xn) the point π(x) of P(X) admits a representation by homogeneous coordinates (α0, α1, ..., αn). We verify immediately the following properties of homogeneous coordinates:
(i) relative to the base (x0, x1, ..., xn), every point π(x) of P(X) has a representation by homogeneous coordinates (α0, α1, ..., αn), where not every αi is zero;
(ii) every ordered (n+1)-tuple (α0, α1, ..., αn) of scalars of A such that not every αi is zero is homogeneous coordinates, relative to the base (x0, x1, ..., xn), of a point π(x) of the projective space P(X); and
(iii) two such (n+1)-tuples (α0, α1, ..., αn) and (β0, β1, ..., βn) are homogeneous coordinates, relative to the base (x0, x1, ..., xn), of one and the same point of P(X) if and only if there is a non-zero scalar λ of A such that αi = λβi for i = 0, 1, ..., n.
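Property (iii) gives a simple computational test for equality of points given by homogeneous coordinates. A Python sketch (our illustration, for the real projective plane):

    def same_projective_point(a, b, eps=1e-12):
        """True iff the nonzero tuples a and b differ by a nonzero scalar factor."""
        if all(abs(t) < eps for t in a) or all(abs(t) < eps for t in b):
            raise ValueError("(0, ..., 0) does not represent a point")
        i = max(range(len(b)), key=lambda j: abs(b[j]))   # a coordinate with b[i] != 0
        lam = a[i] / b[i]
        return abs(lam) > eps and all(abs(a[j] - lam * b[j]) < eps for j in range(len(a)))

    assert same_projective_point((1.0, 2.0, -3.0), (-2.0, -4.0, 6.0))
    assert not same_projective_point((1.0, 0.0, 0.0), (1.0, 1.0, 0.0))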
D. Linear variety
Let X be a linear space over A, P(X) the projective space derived from X, and π: X° -> P(X) the canonical surjection. For any subspace Y of the linear space X, we call the direct image π[Y°] = L a linear variety of the projective space P(X). Now the subspace Y is itself a linear space over A; therefore we can construct the projective space P(Y) = Y°/Δ' according to Definition 11.1. Since the equivalence relation Δ' is induced by the equivalence relation Δ, we
can identify the linear variety L with the projective space P(Y). By means of this identification, we define the dimension of the linear variety L as the dimension of the projective space P(Y). Thus if L is a linear variety and L = P(Y), then dim L = dim Y - 1. Lines, planes and hyperplanes will have the usual obvious meaning. It is customary to assign the dimension-1 to the empty linear variety. The one-to-one correspondence between the linear varieties of a projective space P(X) and the subspaces of X enables us to translate results on subspaces into results on linear varieties. For example the intersection of any family of linear varieties of P(X) is a linear variety of P(X). Consequently for any subset S of P(X) there is a smallest
linear variety (in the sense of the inclusion) that includes S as a
subset; we call this linear variety the join of S or the linear variety generated by S. It is easy to see that the correspondence between linear
varieties and subspaces is compatible with the formation of intersection and join. In particular if L1 and L2 are linear varieties of a projective space P(X) and Li = P(Yi) (i = 1, 2), then L1 ∩ L2 = P(Y1 ∩ Y2) and L1 + L2 = P(Y1 + Y2), where L1 + L2 denotes the join of L1 ∪ L2. Therefore it follows from Theorem 4.5 that for any two finite-dimensional linear varieties L1 and L2 of P(X)
dim L1 + dim L2 = dim(L1 ∩ L2) + dim(L1 + L2).
Following a similar line of argument, we can define linear independence of points of the projective space P(X). Let Q0, Q1, ..., Qr be r + 1 points of P(X). Then we say that the points Q0, Q1, ..., Qr are linearly independent if their join Q0 + Q1 + ... + Qr is an r-dimensional linear variety. It follows from the correspondence between linear varieties and subspaces that, if Qi = π(xi) with xi ∈ X° for i = 0, 1, ..., r, then the linear independence of the points Q0, Q1, ..., Qr and the linear independence of the vectors x0, x1, ..., xr are equivalent. The following theorem also is a direct consequence of the definition and the correspondence mentioned above.
THEOREM 11.4. r + 1 distinct points Q0, Q1, ..., Qr of a projective
space P(X) are linearly independent if and only if for each i = 0, 1, .... r the point Q. does not belong to the join of the other r points.
Similarly a point R of P(X) is called a linear combination of the
points Q0, Q1, ... , Q, of P(X) if R belongs to the join of Q0,
... Qr. It is easy to see that if R = 7r(y) and Q; = ir(Xi), then the point R is a linear combination of the points Q0, Q1, . . . , Q, if and Q1,
only if the vector y is a linear combination of the vectors x X11 ... ,x,. Finally, similar to Theorem 9.11, we have the following characterization of linear varieties in terms of joins.
THEOREM 11.5. Let P(X) be a projective space. Then a subset L of
P(X) is a linear variety of P(X) if and only if for any two distinct points Q and R of L the line joining Q and R is on L. E. The theorems of Pappus and Desargues Let us now prove two well-known classical theorems of geometry.
-
§11
PROJECTIVESPACE
127
In the sequel, we use the expression LAB to denote the line generated by two distinct points A and B. THEOREM 11.6. (PAPPUS). Let L and L' be two distinct lines of a
projective plane P(X). If Q, R, S are three distinct points on L and Q', R', S' three distinct points on L', then the points of intersection A of LRS' and LR'S ,B of LQS' and LQ'S and Cof LQR, and LQ'R are collinear (i.e., they are on one and the same line). L'
Fig. 12
PROOF. We may assume that the point D of intersection of the lines
L and L' is different from any of the six points Q, R, S, Q', R', S', for otherwise the theorem is trivial.
Let x' and z be vectors of X such that ir(x') = Q and 7r(z) = D. Since R is on L = LQD there are scalars a and (3 such that 7r(ax' +(3z)
= R. Now a * 0 and 9 0 0 for otherwise R = D or R = Q. Therefore 7r(ax' + z) = R. Setting x = fix' we get Q = ir(x), D = 7r(z) and R = 7r(x + z).
Using a similar argument we get a scalar A * 0, 1 such that S = 7r(Xx + z).
Analogously we get a vector y of X and a scalar p * 0,1 of A such that S' = 7r(y), R' = 7r(y + z) and Q' = a(py + z). Since A is a linear combination of S = 7r(Ax + z) and R' = 7r(y + z),
128
IV
PROJECTIVE GEOMETRY
and at the same time a linear combination of S' = tr(y) and R = 1r(x + z),
from the equation
(fix+z)+(X- I) (y+z)=(a- 1)y+X(x+z) we get
A = tr(a) where a = Xx + (X - 1)y + Xz. Analogously we get
B=tr(b) where b=Xx -py, and
C = tr(c) where c = (p - 1)x + py +µz.
Between the vectors a, b and c we have X c + b - pa = O; Therefore the points A, B and C are linearly dependent and hence collinear.
Three points Q1, Q2, Q3 of a projective space are said to form a triangle Q 1 Q2 Q3 if they are linearly independent. In this case the
lines 11 = LQ2Q3, 12 = LQ3Q1 and 13 = LQ1Q2 are called the sides
of the triangle Q1 Q2 Q3. We say two triangles Q1 Q2 Q3 and R1 R2 R3 are centrally perspective (or in perspective) if the lines LQ.R. (i = 1, 2, 3) are concurrent, i.e. if they intersect at a point (Fig 13).
Fig. 13 We say two triangles with sides 11, 12, 13 and g 1, 92, g3 are axially
perspective if each pair of the corresponding sides i and g; intersect at a point and these points of intersection are collinear (Fig 14).
§11
129
PROJECTIVE SPACE
(Q3
i
\
Q2 \
There are degenerate configurations that deserve some attention, i.e. configurations in which the triangles assume very special position; for example R, = QI or R2, R3, Q2, Q3 are collinear. We will exclude all these cases and suppose that the configurations in question are not too special. THEOREM 11.7. (DESARGUES) If two triangles are centrally perspective,
then they are axially perspective. QI
130 PROOF. Let
IV
triangles
PROJECTIVE GEOMETRY
Q1 Q2 Q3
and
R1 R2 R3
be
centrally
perspective and denote by T the point of intersection of the lines LQ1R1(i = 1, 2, 3). Let t, x1, y1 (i = 1, 2, 3) be vectors of X such that T = 7r(t) , Q1 =7r(x1) and R; =
(y1).
The assumption on the triangles implies that t = Al x1 + I11Y1 = X2x2 + 1-12Y2 = X3x3 + µ3Y3
for some scalars X1 and µ1. Hence
X2 X2 -X3x3 -µ3Y3 -µ2Y2 X3x3 -X1x1 =µ1Y1 -µ3Y3 A1x1 -'X2x2 =µ2Y2 -µ1Y1
From the first equation above we get a point C1 = ir(X2x2 - X3x3) = ir(113Y3 - µ2Y2)
which is on both 11 and g1 ; therefore C1 is the point of intersection of this pair of corresponding sides. Similarly/ C2 = Tr(X3x3 - X1 x1 ), C3 = (A1 x1 - A2x2)
are the points of intersection of the other pairs of corresponding sides. Since
(X2x2 -X3x3)+(X3x3 -X1x1)+(X1x1 -X2x2)=0 the points C1 , C2 and C3 are linearly dependent and hence they are collinear.
REMARK 11.8. Both the theorem of PAPPUS and the theorem of DESARGUES play an important role in the synthetic treatment of projective geometry. With certain obvious modifications, they also hold in an affine space. F. Cross ratio In classical projective geometry, the concept of cross ratio occupies a central position in the theory of geometrical invariants.
Let Q, R, S, T be four distinct points on a line L of a projective space P(X). We saw in the proof of the theorem of PAPPUS 11.6. that there are vectors x, y of X such that Q = 7r(x), R = 7r(y) and S = 7r (x + y).
§11
131
PROJECTIVESPACE
For the fourth point T we obtain a unique (relative to the vectors x and y of X) scalar A such that
T= 7r(x+Ay). Next we want to show that the scalar A is actually independent of the choice of the vectors x and y. Let x', y' be vectors of X and A' be a scalar of A such that Q = 7r(x'), R = 7r(y'), S = 7r(x' + y') and T = rr(x' + A'y'),
Then x' = ax and y' = Qy for some a * 0 and 0 * 0 of A. The equality 7r(x + y) = 7r(x' + y') implies that a = P. Consequently it follows from 7r(x + Ay) _ 7r(x' + A'y') that A = V. Therefore we can introduce the following definition. DEFINITION 11.9. Let Q, R, S, T be four distinct points of a line in a projective space P(X) derived from the linear space X over A. Then the cross ratio
lQ
R ] is the scalar A such that
Q = rr(x), R = 7r(y), S = 7r (x + y) and T = rr(x + Ay) for some vectors x and y of X. We observe that the cross ratio I Q
is different from 0 and 1,
TI for otherwise we would have T = Q or T = S. Conversely for any three distinct points Q, R, S on a line L and any scalar A * 0, 1, there is a unique point T on L such that
[QTJ
= A.
The cross ratio has certain symmetries: Q]
R]
IL
- IQ
R]-1
S [Q and
IQ
1-
T]
[R
Q
S]
In classical projective geometry, we say that the quadruple (Q, R, S, T) of points constitutes a harmonic quadruple if
R]=_ S [Q
132 REMARK
IV
PROJECTIVE GEOMETRY
11.10. In definition 11.9, the four points Q, R, S, T are
supposed to be distinct. In classical projective geometry, we allow a
certain freedom to the fourth point T. For example, if we allow
Rgi l gives the value 0 or
T= Q or T = S, then the cross ration SCQ
TJ
I
respectively.
In this way, a one-to-one correspondence between the punctured line L\ {R } and the set A of all scalars is obtained by associating the
T]to the point TEL\{R } . If furthermore we are agreeable to the introduction of the "value" oro, then we can also allow scalar) Q
T = R in which case the cross ratio
fQ
T]
is
co. We observe
that in all cases the points Q, R, S should be distinct. By and large we
find it more convenient to insist that all four points Q, R, S, T are distinct.
The cross ratio defined in 11.9 is invariant under perspectivity: more precisely, if L and L' are two lines on a projective plane and the
point quadruples Q, R, S, T on L and Q', R', S', T' on L' are in perspective (see Figure 16),
§11
1
then
Q T]
=
T,J
133
PROJECTIVE SPACE
.
To see this let Q = Tr(x), R = ir(y),
S = ,r(x + y), T =7r(x + ay). Then for the centre of perspectivity Z and the point Q' on LQZ we can choose a vector z such that Z = zr(z) and Q' = 7r(z + x). Moreover we can choose a non-zero scalar A such
that R' = ir(Az + y). Then for the intersections S' = L' n LSZ and T' = L' n LTZ we obtain S' = 7r [ (z + x) + (Az +y)] *and
Q' R'1
T'=7r[(z+x)+a(Az+y)]. Therefore S'
T'
I = a.
Linear construction Let L be a line on a projective plane. With three distinct points Q, R, S on L we can set up a one-to-one correspondence T: A - L\ (R 1 G
Tot] where T(a) = Tot denotes the point T., on L such that [SQ R
a =
We now wish to ask whether the algebraic structure of A makes itself
manifest in any fairly direct manner in the geometry of the plane. More precisely we wish to know whether given two points Ta and TR
on L the points Ta+R and To can be obtained by geometric constructions in the projective plane that involve only the following two basic operations: 1.
2.
Passing a line through two distinct points. Forming the intersection of two distinct lines.
We call such constructions linear constructions.
We observe that our linear constructions here correspond to the geometrical constructions in an affine plane that involve three basic operations: (i) Passing a line through two distinct points. (ii) Forming the intersection of two intersecting lines. (iii) Passing a line through a point parallel to a given line. As a lead up to the linear construction of Ta+(3 in the projective plane P(X), we consider the following situation in an affine plane
134
IV
PROJECTIVE GEOMETRY
where Q, Ta, TT are points of a line 1 such that QTa = ax and QTa
fix. Then the point T on line 1 such that QT = (a + (3)x is found as follows.
D
'C
J' C
-R R
Fig. 17
We choose an arbitrary point A such that A (t 1. Through this point A we pass lines g = LA Q, h = LA T. and finally a line 1' parallel to 1.
Next we pass through Ta a line g' parallel to g and get a point
B = g' n 1'. Finally we pass through B a line h' parallel to h and get the required point T = h' n 1. Imagine now that the affine plane is extended to its projective extension, then the pair of extended LINES L, L' of 1, 1' intersects at a point R at infinity. Similarly the extended LINES G, G' of g, g' intersect at C and the extended LINES H and H' of h, h' intersect at D. Furthermore these POINTS R, C and D are collinear since they lie on the line at infinity. Using this as a model we make a linear construction in the projective plane as follows. First, we choose two auxiliary points A and C in such a way that A does not lie on L and C lies on LA Q but is different from A and Q. Then we obtain, by appropriate linear constructions, the points of intersections D = LCR n LA Ta, B = LA R n LCTR and T = L n LDB.
§It
PROJECTIVESPACE
135
D=7r(x+ay+z)
= 7r(x) Ta = 7r(x+ay) TR=7r(x+(3y) T= 7r(x+(a+j3)y)
Fig. 18
We assert that T = Ta+R. Let x, y and z be vectors of X such that Q = ir(x), R = 7r(y), S = 7T (x + y), Ta = 7r (x + ay), TR = 7r (x + j3y), A = 7r(z) and C = 7r(y + z). Since D lies on the lines LA Ta and LCR , we find
D = 7r(x + ay + z). Similarly B lies on the lines LA R and LCTO; we find
B = 7r(z - 3y). Finally T lies on the lines LBD and LQR; we find T = 7r(.x + (a + (3)y).
Therefore T = Ta + R and we have thus proved the following theorem.
1 1.11. Let Q, R and S be three distinct points on a line L of a projective plane P(X). Furthermore let Tot and TR be points on L
THEOREM
such that [ Q L
Ta] = a and rL Q
such that r Q R
LS Ta+
struction.
1
1 = 3. Then the point To, + a TQ
= a + 0 can be found by a linear con-
136
IV
PROJECTIVE GEOMETRY
Using similar arguments and the following figures we have no
difficulty in proving the following theorem. THEOREM
that
IQ
R = a(y)
11.12. Notations as in 11.11. The point Tag on L such aR
a
1 = Up can he found by a linear construction.
T«Q= 7(x+aQY)
S=lr(x+y)Q=ir(x) Fig. 20
REMARKS 11.13. The basic operations 1 and 2 above correspond to two familiar incidence axioms in the synthetic approach to projective
§11
PROJECTIVE SPACE
137
geometry. The results of this section
is therefore useful in establishing coordinates in a projective plane defined by incidence axioms, since they will allow us to regard the punctured line L\ {R } as a field with R and S playing the roles of zero and identity of the
field.
The principle of duality In § 11 .D we have made extensive use of the one-to-one correspondence between the subspaces of a linear space X and the linear varieties H.
of the projective space P(X) derived from X to translate algebraic theorems on X into geometric theorems on P(X ). Assume X to be of
finite dimension n + 1. Then the mapping AN: 2' (X) - Y(X*) defined in §7.D gives us a one-to-one correspondence between the subspaces of X and the subspaces of X*. We recall that for any subspace Y of X, the annihilator AN(Y) of Y is defined as the subspace AN(Y) = ( fcX*: f(x) = 0 for all xe Y)) of X* and that the mapping AN satisfies the following conditions: (i)
dim Y+dimAN(Y) =n+ 1,
(ii)
AN(Y1) C AN(Y2) if and only if Y, D Y2, (iii) AN(Y1 + Y2) = AN(Y,) fl AN(Y2), (iv) AN(Y1 fl Y2) = AN(YI) + AN(Y2).
Therefore the mapping AN provides us with a link between the geometry of P(X) and the geometry of P(X*) which associates to each r-dimensional linear variety of P(X) a (n-l-r)-dimensional linear variety of P(X*) and reverses the inclusion relation. In particular to a point of P(X) is assigned a hyperplane ofP(X*) and to a hyperplane
of P(X) a point of P(X*). Therefore given a theorem couched in terms of linear varieties and inclusion relations, we can obtained another theorem, called its dual, by suitably changing dimensions and reversing inclusions (e.g. interchange the words point and hyperplane, intersection and join). Since the truth of a theorem implies the truth of its dual, this so-called principle of duality essentially
"doubles" the theorems at our disposal without our having to do extra work. We now enter into detailed study of the principle of duality.
By a theorem of n-dimensional projective geometry over A we mean a statement which is meaningful and holds in every n-dimensional projective spaceP(X) derived from an (n+l )-dimensional linear
138
IV
PROJECTIVE GEOMETRY
space X over A. Suppose that Th is a theorem in n-dimensional projective geometry that involves only linear varieties and inclusion relations between them. Th is then usually formulated in terms of intersections, joins and dimensions. We define the dual theorem Th*
to be the statement obtained from Th by interchanging C and D throughout and hence replacing intersection, join and dimension r by join, intersection and dimension n-l-r respectively. Similarly if F is a configuration of n-dimensional geometry over A, then the dual configuration F* is obtained from F by reversing all inclusion signs and replacing dimension r by dimension n-1-r. For example, if F is a complete quadrangle in a projective plane, i.e. a plane configuration consisting of four points, no three of which are collinear and the six lines joining them in pairs, then the dual configuration is a complete quadrilateral and consists of four lines, no three of which are concurrent, and their six points of intersections.
Fig. 21 Complete quadrangle
Fig. 22 Complete quadrilateral
11.14. (The principle of duality). If Th is a theorem of n-dimensional projective geometry over A involving only linear THEOREM
varieties and inclusion relations among them, then Th * is a theorem of n-dimensional projective geometry. PROOF. Let P(X) be an n-dimensional projective space derived from an
(n+1)-dimensional linear space X over A. Suppose that the premise of Th* is satisfied in P(X). Then making use of the mapping AN and the correspondence between subspaces and linear varieties we see that the premise of Th** = Th is satisfied in P(X*). Since Th holds true in every n-dimensional projective space by hypothesis,
the conclusion of Th is true in P(X*). Applying AN and the
PROJECTIVE SPACE
§11
139
correspondence to the conclusions of Th, we see that the conclusion of Th * holds true in P(X **) = P(X). The proof is complete.
Applying the principle of duality we see that the converse of DESARGUES'theorem in the plane is true. The dual of PAPPUs' theorem is as follows.
Let Q and Q' be two distinct points of a projective plane. If g, h, l are three distinct lines concurrent at Q and g', h', l' are three distinct lines concurrent at Q', then the lines LRR., Lss, and LTT, are concurrent where
R = gnh' R'=g'nh
S = g nI' S'=g'n I
T = hn1' T'= h'n I.
T
1.
Exercises Prove Theorem 1 1.5.
2.
Let Q0, Q 1
1.
... , Q, be r + I
linearly independent points of an n-dimensional projective space. Denote by Li the linear variety spanned by the points Q; with i : j Show that Lo n L1 n ... n L,-
= 0. Does this hold for the set of k-dimensional linear
varieties (k being fixed and 0 < k < r-2) each of which is generated by k + 1 points among the points 0;?
140 3.
IV
PROJECTIVE GEOMETRY
Let Q0, Qr , ... , Q, be r + I linearly independent points of an r-dimensional linear variety L. Show that for every k-dimensional linear subvariety L' of L (i.e. L' is a linear variety of dimension k and L' C L) there are r-k points among the Qi which span a linear variety L" skew to L'.
4.
If R is a point of a 3-dimensional projective space and L, M are
skew lines not containing R, prove that there exists a unique line through R intersecting L and M. 5.
Let L, M, N be mutually skew lines of a 4-dimensional projective space not lying in a hyperplane. Prove that L intersects M + N in a point. Show also that there is a unique line intersecting L, M and N.
6.
Draw a complete quandrangle and join the pairs of diagonal
points. Which are the harmonic point quadruples in your diagram. 7.
Under the hypothesis of Theorem 11.6 (Pappus), show that the line through A, B and C passes through the point of intersection of L and L' if and only if QRS and Q'R'S' are in perspective.
8.
Let Q 1
,
Q2, Q3, Q4 be four distinct collinear points of a 1Q0(I) Qa(2) in terms of
projective space. Express Qa(3)
9.
Qr
Q2
Q3
Q4
Qa(3 )
for all permutations a.
Let Q, R, S be three distinct collinear points on a projective
plane. Find the point T such that
=-1 by a linear
construction. 10.
Let Q, R, S and Ta be collinear points on a projective plane such that
QR
Is
Tot
= a # 0. Find T_a and T, lot by linear
§ 12
constructions such that
Q
IS 11.
141
MAPPINGS OF PROJECTIVE SPACES
= -a and
R
=1/a . R T,1a
Q
IS
T- -a
Let Q1 Q2 Q3 be a triangle on a projective plane. Let R1, R2, R 3 be points on the sides LQ 2 Q 3, LQ 1 Q 3, LQ 1 Q 2 respectively
such that LR IQ,, LR 2 Q 2 and LR 3 Q 3 intersect at a point. Show that the fourth harmonic points S1, S2, S3 for which Q1
Q2
Q2
Q3
R3
S3
R1
S1
=
Q3
Q1
R2
S2
_ -1
are collinear. Dualize this result.
12. The configuration of all points on a line of a projective space is called a range of points. Describe the 2-dimensional dual and the 3-dimensional dual of a range of points (called a pencil of lines and a pencil of planes respectively). Find the 3-dimensional dual of the configuration consisting of all points on a plane. 13.
Let Q and R be two points in the n-dimensional
real
arithmetical projective space P (R). We say that n + 1 real-valued
continuous functions fo (t), [0,
11
define a
. . . ,
path f in
fn (t) on the unit interval
Pn (R)
from Q to R if (i)
(fo(0), fl(0), . . . , fn(0)) and Uo(1), f1(1), ... , fn(1)) are homogeneous coordinates of Q and R respectively and (ii) (fo(t), f1(t), ... , fn(t)) is different from (0, . . , 0) for all values of t. In this case points of Pn(R) with homogeneous coordinates (fo(t), f1(t), ... , fn(t)) are called points of the path f. Show that if H is a hyperplane in Pn (R), then for any two points S on T not on H, there exists a path f from S to T .
which does not pass through H. (In other words P, ,(R) is not separated by any hyperplane).
§ 12 Mappings of Projective Spaces
Throughout the present § 12 we assume all projective spaces that enter into our discussions to be of finite dimension not less than two.
Let P(X) and P(Y) be two projective spaces over A. Here we are
142
IV
PROJECTIVE GEOMETRY
interested in a class of bijective mappings of P(X) onto P(Y) which fulfil certain geometric invariance conditions. A. Projective isomorphism DEFINITION
12.1. A mapping c of a projective space P(X) into a
projective space P(Y) is called a projective isomorphism if for every linear variety L of P(X), (i) 4'[L] is a linear variety of P(Y) and (ii)
dim L =dim 4'[L]. It is clear that a projective isomorphism of projective spaces is a bijective mapping which preserves joins, intersections and linear independence. In particular both it and its inverse take any three distinct collinear points into three distinct collinear points. The following theorem shows that this property alone characterises projective isomorphism.
THEOREM 12.2 A bijective mapping 4': P(X) -> P(Y) is a projective
isomorphism if and only if both it and its inverse take any three distinct collinear points into three distinct collinear points.
In the sequel, we shall denote by Q' the image 4'(Q) of any point Q of P(X) and similarly by M' the direct image 4'[M] of any subset M of P(X). We first prove LEMMA 12.3. If L is a line, then L' is a line.
PROOF. Let Q and R be two distinct points on the line L. Then the points Q and R generate L, i.e., L = LQR . Since 4' is bijective, the points Q' and R' are distinct. We show that L' = LQ-R'. It follows from the definition that L' C LQ.R.. Since dim P(Y) 3 2, for each point T of L .R . we can find two points A' and B' of P(Y) such that T = LQ .R . A LAB.. Then for the point S = LQR n LAB we get 4'(S) = T, proving L' = LQ .R ..
Therefore it follows from 11.4 and 11.5 that 4) takes linear varieties of P(X) into linear varieties of P(Y) and that 4) preserves linear independence. Hence 4' preserves dimensions. This completes the proof of 12.2. We recall that a linear construction in a projective plane involves only
two types of basic constructions, namely (i) forming the join of two
points and (ii) forming the intersection of two lines. Let us now think of a linear construction as being carried out within a 2-dimensional linear variety L of P(X). We then apply a projective isomorphism 4' of P(X) to all the points and lines of the construction. It
§ 12
143
MAPPINGS OF PROJECTIVE SPACES
follows from the results above, that the image of this figure as a whole will lie in the plane L'. Furthermore, the image of a line joining two points will be the line joining the images of the two points in question and the image of the point of intersection of two lines will be the point of intersection of the images of the two lines in question. Therefore the figure of the linear construction maps under (P into an exactly analogous figure. In this sense, we say that a projective isomorphism carries a linear construction into an exactly analogous linear construction. B. Projectivities As a special type of projective isomorphism, we consider mappings of
a projective space P(X) into a projective space P(Y) that arise from
isomorphisms of the linear space X onto the linear space Y. Let >y: X - Y be an isomorphism and let 7r denote the canonical projections 0
0
X - P(X) and Y - P(Y). Then i' has the property, that for any two non-zero vectors x, y of X, 7r(x) = 7r(y) if and only if 7r(ox) = 7r(qiy). Therefore >Ji induces a bijective mapping 'Y: P(X) -+ P(Y) such that 0
*(7rx) = 7r(ox) for all xEX. In general we call a mapping : P(X) --> P(Y) a projectivity if there exists an isomorphism is X -> Y such that 'Y(7rx) = 7r((ix) for all 0 xEX.It is clear that a projectivity is a bijective mapping that preserves linear dependence and independence; therefore every projectivity is a projective isomorphism. We shall see later that projectivity and projective isomorphism are not equivalent concepts, i.e. there are projective isomorphisms which are not projectivities. For the moment we show that different isomorphisms of linear spaces can induce one and the same projectivity. THEOREM 12.4. Two isomorphisms 0, iji: X -+ Y of linear spaces induce the same projectivity c = 4': P(X) -+ P(Y) if and only if there exists a non-zero scalar A such that ly = A. PROOF. If
0
= A0, then 7r(>yx) = 7r(Acbx) = 7r(ox) for everyxEX;there-
fore'=TY. Conversely, suppose (D
then for every xeX we get a non-zero
scalar AX, possibly depending on x, such that X .,,Ox ='x (since 0 4(7rx) _'(7rx)). We now show that Ax = Ay for all x,yEX,whence the theorem follows.
144
IV
PROJECTIVE GEOMETRY
Assume first that x and y are linearly independent. Then all three vectors x, y and x + y are different from the zero vector. It follows
from ix + 41y ='(x + y) that ax¢x + Oy = Xx+y¢(x + y). But on the other hand ¢x + 4y = O(x + y) an the vectors Ox and Oy are linearly independent; therefore Xx = Xy = X + . Assume next that the non-zero vectors x and y are linearly dependent. x The assumption that the linear space X has a dimension not less than two implies that there exists a non-zero vector z such that the two pairs of vectors x and z, y and z are both linearly independent. It follows, from what has already been shown in the linearly independent case, that X = Xz Xy. The proof is now complete.
We now turn to the problem of constructing projectivities under certain geometric specifications.
Let P(X) and P(Y) be both n-dimensional projective spaces. It follows from the definition that given n + I linearly independent points CO, C 1 , . points D0, D1 i .
. . ,
C of P(X) and n + 1 linearly independent
, D. of P(Y), there exists a projectivity (D such that (D(Ci) = D. for i = 0, . . . , n. In fact if x; and y, are vectors of X and Y respectively such that 7rxi = C; and iryi = Di for i = 0, 1, . . . , n, then (xo, x, , ... , and (yo, y, , ... , are bases of X and Y . .
respectively and the unique isomorphism 0: X -+ Y, such that Di ¢xi = y, (i = 0, ... , n), will induce a projectivity (D such that (I = 0, ... , n). But the isomorphism 0 and hence also the projectivity
4) depends on the choice of the bases (x0,.. .... ,x,,) and (yo, ,yn) In other words it is possible that with another choice of bases we may obtain another projectivity ' satisfying the same requirement that 'I'(C1) =Di (i = 0, . . . , n). Take, for instance, the base (2y yl, . . . , yn ) of Y. We have ir(2yo) = Do and 7r(yi) =Di (i = 1, ... , n) and conse-
quently for the isomorphism i, such that fix, = 2y, and >lix, = yi (i = 1, ... , n), we obtain a projectivity ': P(X) -> P(Y) such that + x,,) '(Ci) = Di (i = 0, 1, . . ., n). However for the point U = 7r(xo +
we have 4)(U) 0 *(U) since rr(yo +
+yn) 0 ir(2yo + y, +
+ y ).
This discussion leads us to the following definitions. Let P(X) be an n-dimensional projective space. A simplex of P(X)
is a family (Co, C1, ..., Cn) of n + I linearly independent points of , . . . , Cn) is a simplex of P(X) and U is a point of P(X) which does not belong to the join of any n points among the points C,, then we say that the family (Co, C1, ... , Cn 10 forms a frame of reference for P(X) with unit point U and simplex P(X). If (C0, C 1
(C0,C1, ...,C,).
§12
MAPPINGS OF PROJECTIVE SPACES
145
THEOREM 12.5 Let P(X) and P(Y) be projective spaces. If (Co, C1, ... , C I U) and (Do, D1, .. . , D I V) are frames of reference
f o r P(X) and P(Y) respectively, then there exists a unique projectivity 4 ) o f P(X) onto P(Y) such that 4'(C,) = D, f o r i = 0, 1, ... , n
and 4'(U)=V. We first prove LEMMA 12.6. If (Ca , C1, ... , C I U) is a frame of reference for a projective space P(X), then, up to a common scalar factor, there is a unique choice of vectors xo, x1 , ... , x of X such that 7rxi = C; for
i=0, 1,
. . . ,
nand7r(xo+x1+ ... +xn)=U.
PROOF. Let ao, a 1I,. , .
, an and u be vectors of X such that 7rai = C; + An an for i = 0, 1, ... , n and 7ru = U. Then u = Aoao + A 1 a 1 + . Since U is not a linear combination of any n vectors among ai, all Ai
(i = 0, 1, ... n) must be different from zero. Therefore xi = Aiai (i = 0, 1, ... , n) satisfy the requirement of the lemma. Suppose bo, b1, ... , b are vectors of X such that 7rb, = C; for i = 0, 1, ... , n and 7r(bo + b1 + ... + bn) = U. Then there are non-zero +x scalars A and µi such that A(bo + b l +.... +b,,)= xo + and p.b, = xi for i = 0, 1, ... , n. The linear independence of the vectors xo , x11 ... , xn
implies that µo = ... = yn ; therefore
x0, x1, ... , x, are uniquely determined up to a common scalar factor. PROOF OF THEOREM 11.5. Let xo , x 1, ... , xn and yo ,
yl, ... , yn be
vectors of X and Y respectively such that 7rx, = Ci, iryi = Di and 7r(xo + xl + ... + xn) = U, 7r(yo + yl + . + yn) = V. Then the unique isomorphism ¢: X -> Y such that Oxi = y, induces a projectivity
4' such that 4'(Ci) = Di and 4'(U) = V. Suppose 4;: X -* Y is an isomorphism which induces a projectivity 4 satisfying the requirement of the theorem. Then 7r( Jx,) = D; and 7r(>'xo + 41x l + ... + 1 f txn )
= V. By Lemma 11.5 there exists a non-zero scalar A such that i,xi = Ayi for i = 0, ... , n; therefore = A0. Hence 4' = 'I' by 12.4.
REMARKS. The frames of reference for a projective space are similar to the bases of a linear space in more than one way. For instance, homogeneous coordinates enjoying properties similar to (i),
(ii) and (iii) of § 11C can be asrigned to points of P(X) relative to any frame of reference (Co , 01, . . . , C, J U) as follows. We
146
IV
choose vectors x0, x, 1,
... , n
and0 ir(xo
PROJECTIVE GEOMETRY
, ... , X. of X such that 9rx1 = C; for i = 0, + x, + .. + xn) = U. If x = aoxo + a, x,
. . . + an x, EX, then we say that relative to (C0, CI , ... , C I U ) the point lrx admits a representation by homogeneous coordinates (ao , a, , . . . , an)- In particular (1, 0, . . . , 0),(0, 1, 0, ... , 0), ... , (0, ... , 0, 1) and (1, 1, ... , 1) are homogeneous coordinates of the points Co , C1, . . . , C and U respectively. The basic properties similar to (i), (ii) and (iii) of § 11C are now easily formulated and verified. Moreover it follows from lemma 12.6 that the homogeneous
+
coordinates do not depend on the choice of the base (x3, x, , .... , xn ) of X. Theorem 12.5 plays an important role in the synthetic approach to the classical projective geometry and is usually called the fundamental theorem of projective geometry. (See also remarks 12.12)
C. Semi-linear transformations We shall turn to the problem of determining if there exist projective isoniorphisms which are not projectivities. For this purpose we recall that an isomorphism 0: X - Y is a bijective mapping that fulfills the following conditions: for all xEX and XEA; and (i) O(Xx) = X (x) (ii) ¢(x + y) = O(x) + ¢(y) for all x,yeX.
In the discussion above, we derive from property (i) that ¢ induces a well-defined mapping (P: P(X) --)- P(Y) which furthermore is bijective
since 0 is bijective, whereas property (ii) further ensures that the induced mapping c is a projective isomorphism. However if the bijective mapping 0: X - X satisfies instead of (i) the following weaker condition:
(iii) to each non-zero scalar X there exists an non-zero scalar a(X) such that 0(kx) = a(X)O(x) for all xr=-X,
then ¢ will also induce a well-defined mapping (b: P(X) -- P(Y). This leads us to introduce the following definitions. DEFINITION 12.7. A bijective mapping a: A - A is called an automor-
phism of A (A = R or A = C) if a(X + y) = a(X) + a(µ) and a(X) _ u(X)a(p) for X,,u of A.
§12
MAPPINGS OF PROJECTIVE SPACES
147
We have no difficulty in verifying the following properties of an automorphism a of A: (a)
a(0) = 0;
(b) (c) (d)
u(A - p) = a(A) - a(p); and
a(1)1;
a( '. = a(X where p * 0.
DEFINITION 12.8. Let X and Y be linear spaces over A and a an automorphism of A. A mapping 0: X - Y is a semi-linear transformation relative to a if the following conditions are satisfied: for all X of A and x of X; (i) O(Ax) = a(A)O(x) (ii) O(x + y) = ¢(x) + 0(y) for all x and y of X. Since the identity mapping of A is an automorphism of A, a linear transformation is always a semi-linear transformation relative to this automorphism- The next theorem shows that every semi-linear transformation of a real linear space is a linear transformation. THEOREM
12.9. For the field R of all real numbers, the identity
mapping of R is the only automorphism of R.
PROOF. Let a be an automorphism of R. Then it follows from property (b) above that a(n) = n for all non-negative integers n. From
(d) it follows that a(r) = r for all rational numbers r. Let X be an arbitrary positive real number. Then there exists a real number p (e.g., p = %/A) such that X = p2. Therefore a(X) = a(µ2) = a(p)2 and hence a(A) > 0. Similarly if A < 0, then a(A) < 0. Assume that there exists a real number a such that a *a(a). Without loss of generality of the following argument, we may assume that a(a) < a. Let r be a rational number such that a (a) < r < a. Then it follows from a (r) = r
that a(a - r) = a(a) - r < 0. But this is impossible since a - r >.0. Therefore the assumption that a(a) : a leads to a contradiction and the theorem is proved.
We observe that there exist automorphisms of the field C of all
complex numbers distinct from the identity mapping of C; for example the automorphism a: C -> C such that a(a) = a where a is the complex conjugate of a. In fact there are an infinite number of automorphisms of C. Let 0: X -> Y be a semi-linear isomorphism of linear spaces relative 0
to an automorphism a of A. Putting 4 (7rx) =7r(¢x) for every xeX, we
148
IV
PROJECTIVE GEOMETRY
obtain a well-defined bijective mapping 4': P(X) - P(Y) of projective spaces. Suppose Q, R and S are three distinct collinear points in P(X) with Q =Tr(x), R =Tr(y) and S=Tr(z). Then z =Ax + My and hence Oz = (aA)Ox + (aµ)0y, i.e. 4'(Q), (D(R) and 4'(S) are collinear points of P(Y). Therefore every semi-linear isomorphism X -> Y induces a projective isomorphism P(X) -+ P(Y). We now prove the converse of this
statement. Thus we shall have the following algebraic characterization of the geometric concept of projective isomorphism. The projective isomorphisms are precisely those mappings which are induced by semi-linear isomorphisms. THEOREM 12.10.
Let X and Y be linear spaces over A both of
dimension not less than 3. If 4' is a projective isomorphism of P(X) onto P(Y), then there exists an automorphism a of A and a semilinear isomorphism 0: X -> Y relative to a which induces 4'. PROOF. Let (x0 , x1, ... , xn) be a base of the linear space X. The
theorem is proved if we can find an automorphism a of A and a base
,Yn) of Ysuch that ifx =Aoxo +A1x1 + ... +Anxn is (Yo,Yi, a non-zero vector of X, then 4'(7r(x)) = ir(y) where y = a(A0)yo + a(A1)y1 + .. + a(An)yn . In the sequel we shall denote for each point Q of P(X) the image 4'(Q) of Q under 4) by Q'. Thus for the base point Q1 = Tr(x;) i = 0, 1, ... , n, the images are denoted by Q',
i=0, 1, ...,n.
First we choose an arbitrary non-zero vector yo of Y such that Q'0 = Tr(yo). Our next step is to find a vector y1 of Y such that Q'1 = Tr(Y 1) and U'01 = Tr(Yo + y1) where Uo l = Tr(xo + x 1).
The vector y1 obviously exists and it is uniquely determined by these
two properties. Moreover the vectors yo and yl are linearly independent since the points Q'0 and Q'1 are linearly independent. Consider now the two ordered triples (Q0, Q1, U01) and
(Qo, Q'1, U'01) of collinear points. For each scalar a of A, we denote by Ta the point on the line through Q0, Q1 such that Qo
Q1
a =
Uo1 Ta
A mapping a: A -+ A is then defined by putting Q'o Q11 a(a) _
UO1T'a
149
MAPPINGS OF PROJECTIVE SPACES
§ 12
for all By the one-to-one correspondence between cross ratios and points on a line, the mapping a is bijective. On the other hand,
for any scalars a and (3 of A, the points Ta+p and Tag can be obtained by linear constructions which map under the projective isomorphism 1 into exactly analogous linear constructions. Therefore it follows from
a+(3 =
Qo
Q1
Qo
Q1
Uo1
TR
E-i
Uo1 Ta Qo
Q1
Qo
Q1
Qo
QI
Uo 1
Ta
Uo1
T(3
Q,1
Ta (3
Q'o
Q 'I
and a(3 =
a(a+(3) =
Ta+(3
Uo 1
Q10
Q'1
Uo1 Ta+/3 Qo
Q'1
U01 T'a [Q 'o
Q' 1
and a(a(3) =
Wo 1 Tag
Q'1
Q10
+
_
Uo 1 T 'a
= a(a) + a((3)
Uo1 T(3 Q'o
11
a(a)a((3)
U'o1 T'(3
Hence we have proved that a is an automorphism of A. Up to this point of the proof, we have found an automorphism a of A and two linearly independent vectors yo and y1 such that Q'o = ir(Yo), Q'1 = 'r(Y1) and
4(7r(Aoxo + A1x1)) = 7r(a(Ao)Yo + a(AI)Y1)
Let us now consider the next base point Q2 = 7r(x2) and its image Q'2. A unique vector y2 of Y can be found such that Q2=7r(y2)and U',12=7r(Yo +Y1 +y2) where Uo12 =7r(xo+x1 +x2).
Then vectors yo, y1i Y2 are then linearly independent. Furthermore, the point U1 2 = 7r(x1 +x2) can be obtained from the points Qo, Q1, Q2 and Uo 12 by a linear construction as indicated by the following figure.
150
IV
PROJECTIVE GEOMETRY
U. 12 = 7r(Xo +X1 +x2)
= 7r(xo)
Q1 = 7r(x1)
Fig. 24
Therefore U12' _ 7r(Y1 + Y2) Similarly, for each we get the point 7r(xo + axe) by a linear construction from the points Qo, Q1, Q2, U1 2 and 7r(xo - ax1) as follows: 7r(xo + aX2)
7r(xo- ax1)
Fig. 25
From this it follows that c(7r(xo +ax2)) =7r(yo + 0042) and hence '(7r(Xoxo +X2x2)) = 7r(a(Xo)yo +0r(X2)y2). Similarly we prove that (D(7r(X1X1 + X2x2)) = 7r(o(X1)Y1 + a(X2)Y2). Finally the point 7r(xo + ax1 + (3x2) is obtained by the linear construction: 7r(xo + Qx2)
Q1
Fig. 26
7r(xo +ax1)
§12
151
MAPPINGS OF PROJECTIVE SPACES
Therefore 4)(a(xo+ax1 +13x2))=7r(ya +v(a)y1 + "(0)YO and hence 4)(ir(Xoxo +X1x1 +X2x2)) =,r(a(X0)Yo + a(X1)Y1 +o(X2)Y2) for all A0,X1,X2 of A.
Finally, by an induction on the dimension n, we obtain the required vectors yo, y1 ,
..
Y1 together with the automorphism a of
A.
The following discussion shows the necessity of the condition dim X > 3 in Theorem 12.10. The case where dim X is REMARKS 12.11.
0 or
1
is, of course, uninteresting. If dim X = 2, then P(X) is a
projective line. Consequently every bijective mapping of P(X) onto P(Y) is a projective isomorphism, by Theorem 12.2. On the other hand, if 0: X -> Y is a semi-linear isomorphism, then the induced projective isomorphism 4) has the property that it takes harmonic quadruples of points into harmonic quadruples of points, i.e., 4)(Q)
4)(S)
D(R) V(T)
_ -1 if
Q
R
S
T
= -1.
This is clearly not a property of every bijective mapping of P(X) onto itself. Therefore not every projective isomorphism of a projective line is induced by a semi-linear isomorphism.
REMARKS 12.12. We note that Definition 12.1 is given entirely in geometric terms; therefore we can speak of projective isomorphism between projective spaces defined over different fields and we can use the expression "projectively equivalent projective spaces" to mean
that there exists a projective isomorphism between them. Then Theorem 12.10 gives rise to a Projective Structure Theorem: A projective space P(X1) derived from a linear space X1 over Al and a projective space P(X2) derived from a linear space X2 over A2 are projectively equivalent if and only if (i) dim Xl = dim X2 and (ii) dim X1 = dim X2 = 1; or dim Xl = dim X2 = 2 and Al and A2 have the same number of elements; or dim (Xl) = dim (X2) a 3 and Al and A2 are isomorphic fields. We leave the proof of this theorem to the interested reader. Theorem 12.10 stipulates that the geometric structure of a projective space is completely determined by the algebraic structure of its underlying linear space. Thus it is possible to construct from an abstractly given projective geometry (for example by a system of
152
IV
PROJECTIVE GEOMETRY
suitable axioms) a linear space which gives rise to a projective geometry equivalent to the abstractly given one.
It follows from 12.10 that all projective isomorphisms of real projective spaces are projectivities since there is no automorphism of the real field other than the identity mapping. We now show that for complex projective spaces the concept of projective isomorphism does
not coincide with the concept of projectivity. For this we need the following lemma which is a straightforward generalization of 12.4.
LEMMA 12.13. Let X and Y be linear spaces over A and let 0 and ' be semi-linear isomorphisms of X onto Y relative to automorphisms a and T of A respectively. Then 0 and induce the same projective isomorphism if and only if there exists a non-zero scalar A such that >y = A0. In this case a = T.
PROOF. The proof of the first part of the lemma is entirely similar to the proof of 12.4. For the second part, suppose = A¢ for a none
zero scalar A. For any a e A and xEX, we then have six = AOx and T(a)Vix = 4i(ax) =A4(ax) =Aa(a)Ox. Therefore T(a)Acbx=Aa(a)Ox. Since both A and ¢x are non-zero, we must have r (a) = a(a). Therefore a =,r.
The existence of projective isomorphisms of complex projective spaces other than projectivities now follows from 12.13, since a semi-linear isomorphism which induces a projectivity must necessarily be an isomorphism. In particular if (xo, ... , xn) is a base of X, then the semi-linear automorphism 0 defined by cb(aoxo + ... + anxn) = aoxo + .. + anxn cannot induce a pro jectivity of P(X) onto P(X). D. The projective group
Given a projective space P(X), the projective automorphisms of P(X) constitute a group n with respect to composition. This group n
can be represented as follows. Let G be the set of all semi-linear automorphisms of the linear space X over A. If 0 and >y are semilinear automorphisms of X relative to the aiffomorphisms a and r of
A respectively. Then it is easy to verify that 0*0 is a semi-linear automorphism of X relative to the automarphism aor of A and that (G, o) is a group. We define an equivalence relation R in the set G by putting OR Vi if and only if 0 = Any for a non-zero scalar A. Then by
§ 12
153
MAPPINGS OF PROJECTIVE SPACES
12.13 two elements of G induce one and the same element of II if and only if they are R-related, and by 12.7 we can identify the set IT with the quotient set G/R. Moreover the group structure in n is then
the same as the quotient group structure of G/R defined by [0] c [0] = [0 u Jil where the bracket [ ] denotes the equivalence class formation.
A projectivity of P(X) onto itself is called a collineation of P(X).
The subset IIo of all collineations of P(X) is easily seen to be a subgroup of II and is called the projective group of P(X). A representation of 110 is obtained in a similar way. E. 1.
Exercises
Let 4): P(X) - P(Y) be a projectivity and let Q,R,S,T be four Q
collinear points such that L
f
4'(Q) D(R) 1 4'(S) 4)(fl I
2.
S
is defined and
TJ
is defined. Prove that
QR S
T
4)(Q) 4)(R) 4)(S)
4)(R)
A collineation of a projective plane is called a perspectivity if it
has a line of fixed points. Let 4) be a perspectivity of a projective plane P different from the identity map. Prove that there
exists a unique point E such that for every point Q of P the three points E, Q and 4)(Q) are collinear. E is called the centre of perspectivity and the line L of fixed points is called the axis of perspectivity. 3.
Let L be a line on a projective plane P and let Q, Q' be two distinct points not on L. If E is a point on LQQ distinct from Q
and Q' prove that there exists a unique perspectivity 4) with 4. 5.
centre E and axis L such that 4)(Q) = Q'. Use the result of the previous exe cise to deduce DESARGUES' theorem. Let 4): P(X) ->P(X) be a projective automorphism. Prove that 4' is a collineation if it has a line of fixed points, i.e. there exists a line L in P(X) such that (D (R) = R for all ReL. Use an example to show that this is not a necessary condition for 4) to be a collineation.
154 6.
7.
PROJECTIVE GEOMETRY
IV
A collineation is called a central perspectivity if it has a hyperplane of fixed points. Show that a projective automorphism is a collineation if and only if it is a product of perspectivities. Let X and Y be linear spaces over A and let 4): P(X) ->P(Y) be a
projective isomorphism. If Z is a subset or a point of P(X), denote by Z' the direct image of Z under (D. (a) Let L be a line onP(X) and Q, R, S three distinct points on L. Show that a mapping rL (Q, R, S) : A -+ A is defined by
1QR
TL(Q,R,S)
=
[SI
T
S
Q'
R'
T'
Show also that TL (Q, R, S) is an automorphism of A. (b)
Show that if X, Y, Z are any three distinct points on L, then TL (x,Y, z) = TL(Q, R, S) can be denoted by TL.
Therefore this automorphism
(c)
Show that if L 1 and L 2 are two lines in P(X), then TL 1 = TL 2.
(d)
Hence show that for every projective isomorphism 4): P(X) -+ P(Y) there is a unique automorphism T of A such that
r (e) 8.
QR] S
T
=
[ Q' R' S'
T'
Show that if p is a semi-linear isomorphism relative to an automorphism a and if p induces 4), then a = r.
Let a be an automorphism of A. Let (CO, ... , C. I U) and (Do,
... , D I V) be frames of reference of the projective spaces
P(X) and P(Y) respectively. Show that there is a unique projective isomorphism 4):P(X) ->P(Y) such that (i) 4)(Cj) =D; for i = 0, ... , n, (ii) It (U) = Y,
() a
QR S
T
=
4)(Q) 4)(R) 4)(S)
4)(T)
CHAPTER V MATRICES
In Examples 5.8, we gave some effective methods of constructing linear transformations. Among others, we saw that for any finitedimensional linear spaces X and Y over A with bases (x 1, ... , xm ) and (yI , ... , yn) respectively a unique linear transformation ¢: X -> Y is determined by a family (aij)i=1, m; j=1, ..., n of scalars in such a way that
0(xi)=a,1Y1 +
...
+ ainYn for i= 1, ..., m.
Conversely, let >y: X -+ Y be an arbitrary linear transformation. In writing each i (x i) as a linear combination of the base vectors yj of Y, i.e.,
(x1)=A-1Y1 + ... +(3inyn for i = 1, ..., m, we obtain a family ((3,j),=1, ... ,m; j= 1, .... n of scalars. Thus relative to the bases (x 1, . . . , xm) and (y, . . . , yn) of X and Y respectively, each linear transformation 0: X -+ Y is uniquely characterized by a family ( a 1 1 ) 1 = 1 ,
. .
. ,m/1,... ,,, of mn scalars of A.
This therefore suggests the notion of a matrix as a doubly indexed family of scalars. Matrices are one of the most important tools in the study of linear transformations on finite-dimensional linear spaces. However, we need not overestimate their importance in the theory of linear algebra since the matrices play for the linear transformations
only a role that is analogous to that played by the coordinates for the vectors.
§ 13. General Properties of Matrices A.
Notations
Let p and q be two arbitrary positive integers. 13.1. A real (p, q)-matrix is a doubly indexed family M = (I1ij),=1, ... , p; q of real numbers. 1, DEFINITION
A complex (p, q)-matrix is similarly defined. Here again we use 155
156
V
MATRICES
the term "M is a matrix over A" to mean that M is a real matrix or a complex matrix according to whether A = R or A = C.
It is customary to write a (p,q) -matrix M = (pil)i° 1 , I 1 , . . , q in the form of a rectangular array thus:
.
,p;
.
M= I
1111
1112 .....µ1q
1121
1122 .....µ2q
pp 1
AP 2 .....11p q J .
From the definition, it follows that two matrices
M=
and
N=
I
11111
1112 .....111q
1121
1122 .....112q
11p1
11p2 ..... 11pq
vI1
v12 ..... vls
v2 1
v2 2 ..... v2 s
I
................ I
yr 1
yr 2
..... vrs
I
are equal if and only if (i) p = r and q = s and (ii)11il = v;l for every pair of indices i and j.
We introduce now the following notations and abbreviations to facilitate references and formulations. (a) The ordered pair (p, q) is called the size of the matric M. If
this is clear from the context, then we shall also write M = p**. For practical purposes, we do not distinguish the (1, 1)-matrix (X) from the scalar X. (b) The scalar 11ii is called a term of the matrix M and the
indices i and j are respectively called the row index and column index of the term pig.
(c) For every row index i =
1,
.
.
.
,
q, the family pi*
§ 13
= (µ;j) j=1,
GENERAL PROPERTIES OF MATRICES
157
... , q is called the i-th row of M. Clearly
pt*is a (1, q)-matrix over A. On the other hand pt* is also a vector of the arithmetical linear space Aq ; therefore we may also refer to pr* as the i-th row vector of M.
(d) For every column index j =
1,
.
. .
,
q, the family
p*j = (pjj)1=1, ... , p is called the j-th column of M Clearly p*j is a (p, 1)-matrix over A.On the other hand p*; is also a vector of the arithmetical linear space AP ; therefore we may also refer to p*j as the j-th column vector of M. (e) The term pi/ is therefore said to belong to the i-th row and the j-th colomn of M. (f) The diagonal ofM is the ordered p-tuple (i 1,1222 , , ppP) if p < q and it is the ordered q-tuple (121 I , µ 2 2 , ... 4Lgq )
ifp>q.
Consider a rotation in the ordinary plane about the origin 0 by an angle 0. The point P with cartesian coordinates (x, y) is taken into the point P' with cartesian coordinates (x', y'), where EXAMPLE 13.2.
x'=xcos0 -ysin0 Y' = x sine +Y cos 0. We call cos B
sin 0
-
sin 0 cos B
the (2,2)-matrix of rotation
TY
P,=(x"Y')
0
= (X, Y)
_x
158
V
MATRICES
Let M = p** be a (p, q)-matrix. Consider the
EXAMPLE 13.3
(q, p)-matrix Mt = M'** where µi j = ,2 t for all i = 1, ... , p and j = 1, ... , q. The matrix Mt is called the transpose of the matrix M and is obtained by turning M around about the diagonal.
Al\ 1112 ........ µ1q #21 \#22 ........ #2q M=
`µI
\\
µ21
lap1
A12 #22
Ap2
....................
µP1
1P2 ...#PP\.. Apq
I
.
\\
.
.
Mt =
.
App
1111q
12q
.
Apq I
Between the row vectors and the column vectors we have yj* =At *j and A*j = ptj *. Moreover, for any matrix M, (Mt )t = M. 13.4. Let X be an n-dimensional linear space over A with a base (x1 i ... , x ). For every (n, n)-matrix a** over A, we can define a function F: X2 -* A by EXAMPLE
F(x, Y) =
Ea`jt'?j
wherex=tlx1 + ... +tnxn andy=rilxl+... +rinx, Then
the function F is called a bilinear form of the linear space X i.e., a function of X2 into A that satisfies the following conditions:
(i) F(x + x', y) = F(x, y) + F(x', y); (ii)
F(x, y + y') = F(x, y) + F(x, y') and F(Xx, y) = F(x, Xy) _ W(x,Y)
for any vectors x, x', y, y' of X and any scalar A of A. In other words, F is linear in each of its two arguments. Conversely, if G: X2 -- A is any bilinear form of X, then an (n, n)-matrix P** over A is defined by Q,j = G(xi,Yj)
for every i, j = 1, ... , n. In terms of this matrix R**, we can
§ 13
calculate the value of G at x = S 1 x 1 + ... + t,, xn and y = i 77n
159
GENERAL PROPERTIES OF MATRICES
xn by
1x1
+.
+
G(x, y) = EoilEi11,.
Thus there is a one-to-one correspondence between the bilinear forms of X and the (n, n)-matrices. For each fixed base of X we can determine such a correspondence. Addition and scalar multiplication of matrices
B.
For any pair of positive integers p and q, the set A(v, v) of all (p, q)-matrices over A is clearly non-empty. We shall introduce now appropriate composition laws into this non-empty set so as to make it a linear space over the same A in question. Taking into account that there is a one-to-one correspondence between linear transforma-
tions and matrices, we shall therefore define addition and scalar multiplication of matrices in such a way that these composition laws of matrices should correspond to those of linear transformations (see § 14A).
For any two (p, q) -matrices M' = p' * * and M" =;L"** over A, we define their sum M' + M" as the (p, q)-matrix M = p** whose terms are Ail
= p'ii +
p "ii
for all i=1,...,pandj=1, ...,q. Note that the sum M' + M" is defined if and only if the matrices M' and M" have the same size. If the i-th row p'i* of M' and the i-th row p"i* of M" are both regarded as (1, q)-matrices, then their sum p'i* + p",* is also a (1, q) -matrix and is equal to the i-th row pi* of the sum M' + M". Similarly p'*i + p "*/ = p *i for the columns. For any (p, q)-matrix M' = p'** and any scalar A, we define their product AM' as the (p, q)-matrix M= p** whose terms are
pii = Ap'il for all i = 1, ... , p and j = 1, ... , q. Clearly the i-th row p i* of the product AM' is equal to the product Xp i*. Similarly A*, = Au'*1 for the columns. It is easy to see that the set A(M) of all (p, q)-matrices over A form a linear space over A with respect to the addition and the scalar multiplication defined above. In particular, the zero (p, q)-matrix 0,
160
V
MATRICES
whose terms are all zero, is the zero vector of the linear space A(p, q).
Relative to the formation of transpose (see Example 13.3), addition and scalar multiplication have the following properties:
(i) (M' + M")' =
M"', and
(ii) (X M'), = X M't .
Finally let us consider the dimension of the linear space A(p q).
We use the Kronecker 8-symbols to define a (p, q)-matrix E, . , q for each r = 1, ... , p and each p;1=1,
s = 1, ... , q. All terms of this matrix E,, are zero except the one that belongs to the r-th row and the s-th column which is equal to 1. This family of pq matrices is obviously a base of A(p, q) called the canonical base of A(P.q). Therefore dim A(p,q) = qp. For example, the canonical base of A(2' 3) consists of the matrices: 1
E11
E21
0
0
0
0
0
0
0
6
0
1
0
0
0
0
0
0
0
1
0
E22 = 1
0
0
0
E12 =
E 13
_ 0
0
E23 = 0
0
1
0
0
0
0
0
1
Product of matrices We define now a multiplication of matrices that will correspond in a way to the composition of linear transformations. Let M'= µ'** be C.
a (p, q)-matrix and M" = A"** a (q, r)-matrix both over A. We define their product M' M" as the (p, r)-matrix M = µ** whose terms are
Ail = 141A"11 + /l12;1"2/ + ... + µjgµ'gj
foralli=l, .. ,pandallj=l, ...,r.
Note that the product M'M" is defined if and only if the number of columns of M' is equal to the number of rows of M" and that the
product M%" has the same number of rows as M' and the same number of columns as M". This state of affairs is therefore analogous
to the fact that the composite Ooo of two linear transformations is defined if and only if the range of 0 is identical with the domain of
161
GENERAL PROPERTIES OF MATRICES
§13
and that the domain and the range of the composite ioo are identical respectively with the domain of 0 and the range of 4,. If the i-th row p';* of M' is regarded as a (1, q)-matrix and the j-th column
µ"*j of M" is regarded as a (q, l)-matrix, then their product
is a (1, 1)-matrix whose only term is identical with the term µ;j of the product M' M ". Therefore
rµ 1 lj
µ;j = µ *µ,.* j or l.Lij = (41µ'72 ... µ';q )
µ
2j
µ"qj
t
r, and hence for all i = 1, ...,pandj= 1, M'M"= (1L';*µ"*j);=..., p;i=1, ...,r. Take the matrices
EXAMPLE 13.5.
a A =
c
b
d
and
B=
e
f
h
g
Then we obtain the products AB and BA as follows:
AB =
ae + bg of + bh ea + fc eb + fd and BA = ce + dg cf + dh ga + he gb + hd
.
The usual associative law and distributive laws can be verified by straightforward calculation. However, this will involve a heavy use of
indices. For this reason, we shall use the correspondence between matrices and linear transformations to prove these laws in the following § 14.
Meanwhile we note that, in general, multiplication of matrices is not commutative. Take for instance the matrices
162
V
MATRICES
We obtain 0
1
0
1
0
0
0
0
0
M'M" =
and M"M' =
0
0
0
0
0
1
0
1
0
and therefore M'M" : M "M'. Note also that the product M'M" of two matrices can be a zero matrix while the matrices M' and M" themselves are both distinct from the zero matrices. For instance,
if M'=
0 0
1
0
and M" =[I
0 0
0
then M'M"=
,
0
0
0
0
Relative to the formation of transpose, multiplication of matrices has the property that (M'M")t = (M"t) (M"), i.e., the transpose of a product is the product of the transposes but in the reverse order. Although this can be proved by straightforward calculation, we shall prove it using a different method (see § 14A). In the multiplicative theory of matrices, the identity matrices play an important role. These matrices are defined as follows. For any positive integer p, the identity matrix of order p is the (p,p)-matrix Ip = (Si/)i,/ = 1, ... , p. Thus all terms of Ip are zero except those on the diagonal which are all equal to 1; thus
1100.
0)
.
0
0
1
0
.
.
0
.
.
0
1
0
l0
.
.
.
0
1
For any (p,q)-matrix M = µ**, we get
µi/ = Silµlj + ... + Siiµt/ + ... + 6ipµpl µi/ = µt161/ + . . . + µtj6j/ + ... + µig1gl
foralli= 1, ...,p andj= 1, ...,q. Therefore IpM = MIq = M.
§ 13
D.
Exercises
1.
Let A1=
GENERAL PROPERTIES OF MATRICES
L32 1 ,A1= I1
1
and A3 =
0 0 0
1
3
2
1
1
1
1
2
1
-1
2
10 2 3
7
3
5
1
1
5
1
0 2 -2
2
1
163
10
0 6
500
5 4 6 0
Evaluate Al (A2A3) and A3A3. 2.
Show that the matrices which commute with 0
1
10 0 10 0
0
0
1
0
0
0 0
0
a 0
7
S
0
are of the form
1
0 a 0 7 0 0 a P 0 0 0 a 3.
An (n,n) real matrix (a,) is a Markov matrix if
0 < at/ 4 1 and E
4.
1
ail=1
1
1,
2, . . . , n .
Prove that the product of Markov matrices is a Markov matrix. a 0 b and N I c 0 . Show that Let M = d e 0
7a
M3 - (a + c)M2 + (ac - bd)M - (be - bcd)I3 = 0 and
N2 - (a + S)N + (aS - f7)I2 = 0.
164 5.
V
MATRICES
Let E;j be the canonical base of the linear space A(n,n) of all square matrices of order n and A = (a**) E A(n,n)
A, then ak; = 0 for k = i and a jk = 0 for k j. Hence show that if A commutes with all matrices of A(n,n )
If AE, j =
6. 7.
8.
then A = aId . Find all square matrices A of order 2 such that A 2 = 0. Find all square matrices B of order 2 such that B2 =12 . Show that with respect to multiplication 11
0
0 1
-1 0
1
-1 0 0 -1
0
0 -1
011,
form a group.
A is a (n,n)-matrix whose entries are all 0, 1 or -1 and in each row or column there is exactly one entry which is not zero. Prove that A, A2, A3 ... are of the same type and hence show that Ah = In for some positive integer h. 10. Let H be the set of all complex square matrices of order two of the form
9
M(x,y,z, t) =
I x+yi
z+t
(a) Show that the product and the sum of any two matrices of
H are again matrices of H. (b) Show H is an associative algebra. (c) For any matrix M(x, y, z, t) distinct from the zero matrix find the matrix M(x', y', z', t') such that
x2+y2+z2+t2 M(x, y, z, t)M (x', y', z', t') =
0
(d) Find all the matrices M of H such that M2
0
x2+y2+z2+t2 I.
11. If A and B are square matrices of the same order we denote by
[A, B] the matrix AB - BA. Let
§ 13
Ml
_
0
0
_
0
M2 1
0
0
i
M4
165
GENERAL PROPERTIES OF MATRICES
_
1
0
M
_
[
0
0
i
0
i
C
Ms
0
1
?
]
M6
0 -i
be complex matrices where i is the imaginary unit. Evaluate all
[M;, Mil and determine all matrices A which commute with each M; (i.e. AM; = M;A). 12.
Let M =
(a)
be a (2,2)-matrix where Q* 0.
I
Find a necessary and sufficient condition for
B= to commute with M. (b) Find the dimension of the subspace of A(2,2) consisting of all matrices commuting with M. 13.
Prove that the set of (n,n)-matrices which commute with a
fixed (n,n)-matrix form a subspace of the linear space A(p.n). 14. Let A = (ai)r,i =1, 2, n be a square matrix. The trace Tr(A) of A is defined by
Tr(A) = all + a22 +
+ app .
Show that Tr(A + B) = Tr(A) + Tr(B) and Tr(AB) = Tr(BA). Give an example to show that Tr(AB) * Tr(A)Tr(B). Show also that for any matrices A and B, AB - BA 0 I. 15.
Let X be the linear space of all square matrices of order n. i.e.
matrices of size (n,n). Show that for every feX* there is a unique A feX such that for every MeX
f(M) = Tr(AfM). Show also that f(MN) = f(NM) holds for all M NEX if and only if the matrix A f is a scalar multiple of the identity matrix.
166
V
MATRICES
16. A real square matrix A of order n is said to be positive definite
if XAXt is a positive real number for every nonzero (l,n)matrix X. If A and B are symmetric square matrices of order n we define A < B to mean that B-A is positive definite. Show
that if A
§14. Matrices and Linear Transformations
Let us now pick up the loose threads; we must study firstly the correspondence between matrices and linear transformations in more detail, i.e., the formal relationship between the linear spaces
Hom (X,Y) and A(P.q), and secondly the effect on the matrices arising from a change of bases in X or in Y.
Matrix of a linear transformation Let X be a p-dimensional linear space and Y a q-dimensional linear space both over A. Relative to a fixed base B = (x1, . . . , xp) of X and a fixed base C = (y 1 , ... yq) of Y, every linear transformation 0: X -> Y determines a unique (p,q)-matrix MBC(cb) = a** over A in such a way that A.
O(xi) = ai1y1 + ... + aigyq
for each i = 1, ...,p. We call MBC(¢) the matrix of the linear trans-
formation ¢: X -* Y relative to the bases B = (x1, ... , x,)and C = (y1, ... , yq ). If the bases in question are clear from the context, we use the abbreviationM(O) for MBC(¢).
Assigning to each ¢e Hom (X, Y) its matrix M(O) relative to the fixed bases B and C in question, we obtain a surjective mapping MBC : Hom(X, Y) - A(P.) of sets. This mapping M is obviously a linear transformation of linear spaces; it is also injective, for if MBC(¢) is the zero matrix, then 0 must be the zero linear transformation. Thus, for each pair of fixed bases B and C of X and Y respectively, we obtain an isomorphism MBC : Hom (X, Y) -+ A( q) . We see now that the algebraic structure of the linear space A(P, q) de-
fined in § 13B corresponds to the algebraic structure of the linear space Hom (X, Y) defined in § 7A in a most natural way. For this reason, the linear space A(P,q) can be regarded as an arithmetical
§ 14
MATRICES AND LINEAR TRANSFORMATIONS
167
realization of the linear spaces Hom (X, Y) in the same way as AP is an arithmetical realization of any p-dimensional linear space X. We now justify the statement at the beginning of § 13C that the multiplication of matrices corresponds in a way to the composition of linear transformations. Let Y, Y, Z be three linear spaces all over A and B = (x 1, ... xp ), Yq ), D = (z1, ... , zr) bases of X, Y, Z respectively. C= (y, If 0: X -* Y and 0: Y - Z are linear transformations, then we con-
sider the matrices MBC(cb) =a**, MCB(0) = R** and MBD (l / -0) = 7* * .
These matrices have size (p, q), (q, r) and (p, r) respectively; their terms are respectively defined by
4(xi) = a11Y1 + ... + a;gyq for i = 1, ... , p, (3irzr for j = 1, ... , q, 'P(Yi) = =7i1z1+... + 7irZr for i = 1, ..., p. oO(xi) Now 7* * and the product (a**) (/3*,,) have the same size (p, r). Since
( o0) (xi) = >fi (O(xi)) for each i = 1, ... , p, we get 7j1 Z1 + ... + 7ikZk + ... + 7irZr = ko(xi)) = ip (a11 Y1 + ... + aiiyj + ... + aiq yq) (Eaii(3ik)Zk + ... + (EaiiIir)zr =
=(ai:l+.1)z1 + ... + (ai.R.k)zk + ... + (ai-(3.r)zrTherefore 7ik = ai*(3*k for all i= 1, . . . , p and all k= 1..., r, i.e., MBD MBC(cb)MCD or simply M(i,oO) = M(O)M(i)
So the matrix of a composition of two linear transformations is equal
to the product (in the reverse order) of the matrices of the linear transformations. Consequently, the associative law and the distribu-
tive laws of multiplication of matrices follow immediately from those of the composition of linear transformations.
Earlier in Example 13.3, we introduced the transpose Mt of a matrix M. We study now the relationship between the transpose
of a matrix and the dual of a linear transformation. The base
.. , x ) of X and the base C- (yl , ... ,yq) of Y determine uniquely dual uses B* = (fl, . . . , fp) of the dual space X* of X and
B = (x1,
... , gq) of the dual space Y* of Y. We want to know the relationship between the matrices MBC
C* _ (gI ,
()
168
V
MATRICES
= a** and Mc*B * The matrix P** is a (q,p)-matrix whose terms Pji are determined by the equations
0*(gi)=P11f, +...+(31Pff forallj=l, ...,q. By the definition of dual transformation, we get O*(gi) = gjoo for all
j = 1, ... , q. Therefore from the equations
(0*(g1))(xi) = P/ If, (xi) + ... + Pjifi(xi) + ... +Plpfp(xi)
=Pj1s1i+ ..+Piisii+ ... +P/P8Pi=iji and
(81'0(x;) = g1(O(x1)) = gl(ai1y1 + ... + ailyj + ... + aiq yq)
=ai1g,(y1)+... +ai1gi(yi) +... +ajggj(yq) = ai 1 Sit + ... + aij Sii + ... + aiP SjP =al, we get the equations iji = ail for all i = 1, ..., p and all j = 1, ... , q In other words, the matrices a** and P** are transposes of each other; therefore MBc (0)` = Mc*B*(0*) or simply M(g)t = M(O*),
i.e., relative to two fixed pairs of dual bases, the transpose of the matrix of a linear transformation is the matrix of the dual transformation.
Finally it follows from (M(4)M(iy))t = M(,Jlo4)t = M((Joo)*) = M(O*o0*) =M(iJi*)M(c*) =^,)tM(Of that (MN)t = NtMt for any matrices M and N if their product MN is defined. Thus the transpose of the product of two matrices is the product in the reverse order of their transposes. B.
Square matrices
Let us now consider matrices of endomorphisms of a finitedimensional linear space X. If X has dimension p , then relative to any base of X the matrices of endomorphisms of X all have the size (p,p). These matrices are called square matrices of order p. Since
the product of two square matrices of order p is again a square matrix of order p , the linear space A(P,P) of all square matrices of order p together with multiplication is easily seen to be an
§ 14
MATRICES AND LINEAR TRANSFORMATIONS
169
associative algebra. Relative to any base B of X, the isomorphism MBB : End (X) -> A(P,P) has the following property: MBB(0o0) = MBB ()MBB (0)
In other word under this one-to-one correspondence, the composite of two endomorphisms corresponds to the product of their matrices in the reverse order. In this sense, the associative algebra A(P' P) can be regarded as in an arithmetical realization of the associative algebra End (X). Furthermore under this correspondence, the identity linear trans-
formation ix corresponds to the identity matrix Ip, i.e., MBB (ix) = Ip, since ix(xi) = xi for all vectors x; of the base B = (x1, ... , If 0 is an automorphism of X (i.e., a bijective endomorphism of X) and 0-' the inverse of ¢, then it follows from 0 o 0-' _ 0-' o 0 = ix that MBB (0-')MBB (0) = MBB (0) MBB (0_= 1) MBB (ix) = Ip . This
suggests the notion of invertible matrix. A square matrix of order p is said to be invertible or non-singular and only if there exists a matrix M' such that MM = MM' = Ip. In this case, M' is necessarily a square
matrix of order p; furthermore M' is uniquely determined by the requirement M'M = MM' = 1p. Indeed if M" is such that M"M = MM"
= Ip, then it follows from M'(MM") = (M'M)M" that M' M". The matrix M' is called the inverse of the invertible matrix M and is usually denoted by M-' .
Conversely if the matrix MBB ((¢) of an endomorphism 0 is an invertible matrix M, then M determines uniquely an endo-
morphism / such that MBB (0) = M-' . It follows oO_ MBB (0)MBB (0) = MBB (VG)MBB (0) = MBB (ix) that
from o
= ix. Therefore 0 is an automorphism and 0 = 0-1. Summarizing, we obtain the following result: relative to any base
B of X, the automorphisms of X and the invertible matrices correspond to one another under the one-to-one correspondence MBB : End (X) -+ A(P,P). Furthermore under this correspondence, for every 0 e Aut (X) MBB (O)-' = MBB (¢'') and M(0 *)-' = M(O-' )t .
Consequently for every invertible matrix M
(M'' )-' = M and (Ml )-l _ (M-' )t Finally the set of all invertible matrices over A of order p together with multiplication constitutes a group, called the general linear
170
V
MATRICES
group of degree p over A and denoted by GL(p). Just as A(p.p) is an arithmetical realization of End(X), so also is GL(p) an arithmetical realization of the group Aut(X) of automorphism of X (see §6B). C.
Change of bases
So far, in the discussion on matrices of linear transformations, we have had to choose a fixed frame of reference determined by a base of the domain and a base of the range linear space. We must now study the effect on such matrices arising from a change of base in the domain and a change of base in the range linear space. This state of
affairs is similar to the transformation of coordinate-systems in analytic geometry. Let B = (x 1i ... , xp) and B' = (x'1,
... , xp) be any two bases of a linear space X over A. In writing each vector of B' as a linear combination of vectors of B:
x'1 _'y,1x1 +...+yipxp for i = 1, ...,p, we obtain a square matrix G = y** of order p over A, called the matrix of change of the base B to the base B'. Relative to the base B
(used both in the domain and the range linear spaces), G is the matrix of the automorphism r: X -+ X defined by T(x1) = xi for . . . , p, i.e., G = MBB(T). Therefore the matrix G of the change of B to B' is an invertible square matrix of order p; its inverse matrix G'1 is the matrix of the change of the base B' to the base B. On the other hand, if B' is used as the base of the domain and B is used as the base
i = 1,
of the range linear spaces, then the same matrix G is the matrix of the identity linear transformation ix: X -+ X relative to these bases, i.e., MB.B(ix) = G.
We shall now establish the relationship between the matrices of one and the same linear transformation 0: X - Y relative to two pairs of bases in terms of the matrices of changes of bases. Let B = (x1, ... , xp) and B' _ (x'1, ... , x'p) be two bases of the linear space X, and G be the matrix of the change of the base B to the base B'. Similarly, let C = (y r , ... , Yq) and C = (y 1 , . . . , y'q) be two bases of the linear space Y, and P be the matrix of the change of the base C to the base C. Any linear transformation ¢: X -> Y, can be factorized into 0 = iyoooif. We calculate their matrices relative to the bases as indicated in the following diagram:
§ 14
MATRICES AND LINEAR TRANSFORMATIONS
X (with base B')
-
171
Ix) X (with base B)
Y(with base C)
Y(with base C').
Since G and P-1 are the matrices of the linear transformations at the ends we get MB.C.(q)=MB.C.(iyo oiX)=MB.B(iX)MBC(4)Mcc.(iy) = GMBC(O)P-1.
Therefore the matrix of a linear transformation changes according to the rule: MB-C.(cb) = GMBC(O)P_1.
As a result of the change of the base B to B', the dual B* of B changes to the dual base B'* of B'. From the result we have just obtained, the matrix of the change from B* to B'* is given by MB.,B,(iX,). Since
MB.,B,(iX,) = MBB.(iX)r =
(MB.B(iX)r)-1,
the matrix G =-,y** of the change from B* to B'* is the inverse of the
transpose of the matrix G = 7** of the change from B to B' i.e., (7= (G')'1; for the terms of these matrices we get ?
k 17ik'1')k = St/
for i, j = 1, ... , p.
D.
Exercises
1.
Write down the matrices of the following linear transformation relative to the canonical bases.
(i)
: C3 - C 2 such that
p1(a1,a2, a3) = (0, a1+a2+a3) (ii) p2 : C4 - C4 such that 'P2(0(11 a2,a3,a4)=013,a4,a1,a3)(iii) 'p3 : C2 -* C4 such that p3(a1, a2) _ (a1, a1+a2, a1"a2, a2)
172 2.
V
MATRICES
The matrix of a linear transformation gyp: C4 -> C3 relative to the canonical bases is
1
2
4
-1
2
-2
3j
2
Find the images of the following vector under gyp:
(2 tom, -3 F-1, 4, 0); (1, 1, 1, 1); (0, 0, 0, 3.
)
.
Let p be an endomorphism of R4 whose matrix relative to the canonical base is as follows 11
1
-1
1
-1
-1 -1 -1 -1 1
1
1
-1 -1 -1
4.
Show that x + y + z + t = 0 is a necessary and sufficient condition for a vector (x, y, z, t) of R4 to belong to Imp. Find dim For each real number 0, let Pe denote the endomorphism of R2 whose matrix relative to the canonical base is
Show that (a) 5.
3
cosh
--sing
sine
COs,
`pe+©,
and (b) cpe 1 = p--6
Find the matrix of change of the base B to the base B' in each of the following cases. (a)
B = ((1, 1, 0), (0, 1, 2), (-1, 1, 1)' B'= ((0,0, 1), (-1, 1, 1), (2, 1, 1))
(b) B= ((10,-2,5), (1,1,2), (3,2,1)) B'= ((0, 1 ,
1 ),
(4, - 1, 2), (1, 2, -1))
§ 14
6.
MATRICES AND LINEAR TRANSFORMATIONS
173
Let p be the endomorphism of R[T] which takes each polynomial f(T) into the polynomial f(T+ 1) - f(T) find the matrix of p relative to the base
T(T-1)
T(T-1) (T-2)
2!
7.
3!
T(T-1) ... (T-n+2)) (n-1)!
Let X, Y be linear spaces with bases B and C respectively. Let gyp: X- Y be a linear transformation. If,xEX, denote by MB (x) the
row vector of coordinates of x relative to the base B. Show that MM 4x) = MB(x)MBC4) 8.
Let X be an n-dimensional linear space over A with base B. Let p be an endomorphism of X. Show there exists a base C = (yl,
. . . ,
y,)such thatfori= 1, ...,n
tipy, = X;y;for a scalar X1EA
if and only if there exists an invertible square matrix P of order n over A such that the matrix P-' MBB('P)P 9.
has zero entries off the diagonal. Let X be a linear space with base B = (x 1, X2, x3, x4) . Suppose pp is an endomorphism of X such that
MBB(O)_
(a)
1
0
2
I
-1
2
1
3
1
2
5
5
2
2
1 -2
Find Mcc(tp) where C= (X1 -2X2 +X4, 3X2 -X3 -X4, X3 +X4, 2X4),
(b) Find Keep and Imp. (c) Select a base in extend it to a base of X and find the matrix of p relative to this base. (d) Select a base in Imp extend it to a base of X and find the matrix of p relative to this base. 10.
Let X and Y be linear space over A of dimensions p and q respectively. Show that if gyp: X -+ Y is a linear transformation of
MATRICES
V
174
rank r, then there exist a base B of X and a base C of Y such that 0
Hence show that for any (p, q)-matrix M of rank r there exist UE GL(q,A) and VE GL(p,A) such that UMV =
Cf' -0
0J. O
I
11. Let X be a linear space, p an endomorphism of X and x a vector
of X. Suppose pk -'x
0 and pkx = 0 for k > 0.
pk-'x are linearly independent. (b) Show that if k = dim X, then there exists a base B of X such that (a)
Show that ipx, pzx,
. . .
,
MBB (sP) =
OJ 1
where all the unspecified entries are zero. 12. Let X be a linear space and let B, C be two bases of X. Show that if o E End (X), then
(See Exercise 14 of paragraph § 13 for the definition of Tr). Define Trip = Tr(M B(op)) for all pE End (X) and show that for gyp, 0 E End (X) and XeA
Tr ('P. VI) = Tr(41 o,P),
Tr(?up) = X Trip and
Tr(ix) = dim X.
§15
175
SYSTEMS OF LINEAR EQUATIONS
§ 15. Systems of Linear Equations
A system of equations
(E)
a1 1X1 + ... + a1 nXn = a1
n+1
a2 1X1 + . .. + a2 n Xn = a2
n+1
am1X1 + ... +amnXn =am n+1
with known scalars at1 of A and indeterminates X1
,
... , X, is
called a system of m linear equations in n indeterminates with coefficients in A. The problem is to solve these equations, i.e., to find all possible n-tuples (X1 i ... , An) of scalars of A so that after substituting each Xi for each Xi in (E), we get
allX1 + ... + aInXn = a1 a2 111 +
n+1 + a2 n Xn = a2 n + 1
........................... am 1X1 +. .. + amnXn = am n + 1 .
Not every system of linear equations can be solved; take for instance the system
X1+X2=0 X1+X2=1. We now wish to find a necessary and sufficient condition for the solvability of the system (E) and we propose to study this problem in
terms of matrices and linear transformations. Associated with the system (E) of linear equations are two matrices
A0=
Ia11
.
.
.
.
aln
a21
.
.
.
.
a2n
.
.
.
.
amn I
fml
and A =
I all
.
.
.
a21
.
.
. a2n a2 n+I
.
.
. amn am n+ lJ .
`I aml
a1n a1 n+1
The system (E) admits a solution if and only if there is a linear relation X 1 a* 1 + ... + X n a*n = a* n +I among the column vectors of the matrix A. In other words, the system (E) can be solved if and
176
V
MATRICES
only if the subspace of A"1 generated by the column vectors of A is the same as that generated by the column vectors of A.. We enter now into detail discussion of the problem. A.
The rank of a matrix
Using the same notation as in the previous paragraph, we see that the subspace generated by the column vectors of A0 is a subspace of the subspace generated by the column vectors of A. Thus, to com-
pare these two subspaces, it is sufficient to consider the maximal numbers of linearly independent column vectors of Ao and of A. It therefore suggests the notion of column rank of a matrix. Let M = µ** be any (p, q)-matrix over A. The column rank c(M) of the matrix M is the maximal number of linearly independent column vectors of M, i.e., the dimension of the subspace of AP generated by the column vectors of M. Similarly, the row rank r(M) of the matrix M is the maximal number of linearly independent row vectors of M
i.e., the dimension of the subspace of A4 generated by the row vectors of M.
We prove now that r(M) = c(M). Let 0: AP --> A4 be the linear transformation whose matrix relative to the canonical bases of AP and A4 is equal to M. Then the image Im 0 of 0 is generated by the row vectors µ1*, . . . , µp* of M; hence for the rank r (0) of the linear transformation 0 (see §5D) we get r(0) = r(M). Since the transpose Mt of M is the matrix of the dual transformation O* of 0 we get r(O*) = r(MI) = c(M). From 7.8, the required equation r(M) = c(M) follows. Thus the distinction between the row rank and the column rank of a matrix disappears. DEFINITION
15.1. The rank of a matrix M, to be denoted by r(M), is
the maximal number of linearly independent row (or column) vectors of M.
Under this notation, we obtain the following criterion for the solvability of a system of linear equations. THEOREM 15.2. A system
a11X1 + . .. + a1,Xn = a1 n+
1
amtXi + ... +amnXn =am n+1
§ 15
177
SYSTEMS OF LINEAR EQUATIONS
of linear equations is solvable if and only if the matrices
all ...a1n A. =
all ... aln a1 n+1 A =
.........
amt .amn
...............
am1... amn am n+1
have the same rank. B.
The solutions of a system of linear equations
It follows immediately from 15.2 that a system of homogeneous linear equations
a .11X1 +...+a1nXn = 0 (E0)
am 1 X, + ... + am,Xn = 0,
that is a system of linear equations in which the constant terms are all zero, is always solvable. This also is evident from the fact that the n-tuple (0, .. , 0) of zeros is a solution of (E0). We analyse now the
solutions of (E0). Each equation of the system (E0) gives rise to a covector or linear function f, of the linear space A" defined by
f(x)
aj1X1 + ... + atnXn
for each vector x = (X1 , . , X,,) of A. Thus the set of all solutions of the system (E0) is identical with the annihilator of the subspace generated by covectors f ... , fm of AP, i.e., the set of all xeA" such that Ji(x) = 0 for i = 1 , ... , m. It follows from 7.7 or by direct verification that the set of all solutions of (E0) is a subspace of An of dimension n-r(A0), since the rank r(A0) of the coefficient matrix A0 of the system (E0) is the dimension of the subspace generated by the covectors theorem.
We have therefore proved the following
THEOREM 15.3. The set of all solutions of a system of homogeneous
linear equations (E0) in the indeterminates X1,
.
.
.
X,,
is a
subspace So of the arithmetical linear space A". Furthermore dim So = n - r(A0) where r(A0) is the rank of the coefficient matrix A0 of the system (E0).
178
V
MATRICES
For a general system of linear equations
allXL + . .. + amnXn = a1
n+1
(E)
am, X 1
+- +amnXn = am n+ i
we call the system of homogeneous linear equations a11X1 +
+ amnXn = 0
(E0)
am1X1 +...+amnXn = 0
the homogeneous part of the system (E). We have seen that the system (E) is solvable if and only if r(A) = r(A0). Suppose this is the case and let S denote the set of all solution of (E). Using notations in the proof of 15.3 above, we see that xEA" belongs to S if and only if f.(x) = ain+ 1 for i = 1 , ... , m. It follows that if x and y belong to S, then f (x - y) = f(x) - f(y) = 0, i.e., x - y belongs to the solution set So of the homogeneous part (Eo) of (E). Conversely if xES and
z) = f(x) + f(z) = ain+ 1; therefore x + zES. In other words the solution set S of (E) is an element of the quotient space An/S(,; or in geometric terms, S is a linear variety of the z c= So, then
n-dimensional affine space Aff(A"). C.
Elementary transformations on matrices
We study now an effective method of determining the rank of a matrix. The idea of this method is to find in the subspace generated
by the row vectors 111*, ... , pp* of a (p,q)-matrix I's
1111 1112 .... µ1q
M=
1121 1122.... 1124
Q11p1 1p2 .... lpgJ
a family (p' 1*'
...
p'p*) of generators so that the maximal
number of linearly independent vectors among them can be read off easily. We shall see that this can be achieved by a series of elementary row transformations. These are operations on a matrix that change the rows but do not change the rank of a matrix. There are three kinds of elementary row transformations namely:
§15
SYSTEMS OF LINEAR EQUATIONS
179
the first kind that consists in a transposition of the i-th and j-th rows, (notation: R(i:j)), the second kind that consists in multiplying the i-th row by a non-zero scalar X, (notation: R(Ai)), the third kind that consists in adding a scalar multiple of A and the j-th row to the i-th row, where i 0 j,(notation: R(i + Aj)). For example, by an elementary row transformation of the first kind that interchanges the positions of the first and the second rows, i.e., R(1:2), the matrix M is taken into the matrix 11211222....µ2q 1211 1112 ....111q
l µp1 µp2 .... 1.Lpq J
Similiarly,by the elementary row transformation R(A1) of the second kind and the elementary row transformation R(1 + X2) 6f the third kind, the matrix M can be taken into the matrix A1211 NA12 .... AµIq
I
1221
µ22 ....
µp 1
µp 2 .... µp
122q
... µ1q+ 1222 ........1hq
11+Aµ21 µI2+Aµ22 1221
and
....................... I
µp1
µpa
.......
12pq j
respectively. Clearly the rank of a matrix remains unchanged after the matrix in
question has undergone an elementary row transformation since the subspace generated by the row vectors remains unchanged. The elementary column transformations C(i j), C(Ai) and C(i + Xj) are similarly defined and they can be also used for the same purpose of finding effectively the rank of a matrix.
180
V
MATRICES
We show now that any (p,q)-matrix M = p** can be transformed into an echelon matrix
l ....................
CO ... 0 1 v1
0 ............. 0 1 v2 /2+ 1
......... ..
v1g )
. v2q
....................................
N=
0 ..................... 0 1 v, /,+ I ... vq
0 ........................... .....0
O .................................0 J
by a finite number of elementary row transformations. More precisely, the matrix M can be so transformed by a finite number of elementary row transformations into a matrix N = v** for which non-negative integers r, j I, j2, ... , j, exist so that
(i) 0
r ;
(iv) v;/;=I ifi
0
µ1/1
0
AP/1
M=
l0
pp )
§ 15
181
SYSTEMS OF LINEAR EQUATIONS
(b) If necessary, apply an appropriate elementary row transformation of the first kind and an appropriate one of the second kind to take M into a matrix M' = µ'** of the following form
11211
M' = 0
...
µ2q
...
µ' n J (c) If necessary, add an appropriate scalar multiple of the first row 11'p1 1
of M' to each of the other rows of M' (i.e., an elementary row transformation of the third kind) to take M' into a matrix M" = µ"** of the following form 0
0
1
0
0
0
0
0
0
I'1/i+1
... 11'2a
M" =
1"'pjl+l
...
1i'p J
SECOND STEP: Consider the (p-1, q-jl)-matrix
M1 = `A'p,1+.1.
µ,P11+2......
µ'p11
obtained from M" by deleting its first row and its first jl columns. If
this matrix M1 is the zero matrix, then we get r = 1 and need not proceed any further. Otherwise, we can apply similar operations as (a), (b) and (c) above to the matrix M1 without affecting the result on the first row and the first jl columns achieved by the first step of the procedure.
But this procedure must terminate after no more than p steps since the matrix has only p rows and with each step we transform (at least) one row into the required form. Therefore we have succeeded
182
V
MATRICES
in bringing the matrix M into an echelon matrix N = v** by a series
of elementary row transformations. Similarly, we can use the elementary column transformations alone to take the matrix M into an echelon matrix. Let us apply the method we have just learnt to find the rank of the following real (3,4)-matrix A = a** EXAMPLE 15.4.
A =
1
-2
1
Al
2
1
1
A2
0
5
-1
A3
According to the procedure, we apply an elementary row transformation R(2-2 1) and get 11
-2
0
5
0
5
1
A,
-1 -2A,+ A2 A3 -1
Multiplying the second row of this matrix by the non-zero scalar 5 , i.e., R(s 2) 1
-2
1
A,
0
1
-5
5 (-2A, + A2 )
0
5
-1
A3
Now apply an elementary row transformation by subtracting from the third row five times the second row i.e., by R(3-s2) and get an echelon matrix
11
-2
0
1
-5
0
0
0
1
A,
5(-2A,
+ A2)
2A, - A2 + A3
.
From this echelon matrix, we obtain for the rank r(A) of the matrix A
r(A) -
2 if 2A, - A2 + A3 = 0 3 if 2A, - X2 + A3 * 0.
§ 15
SYSTEMS OF LINEAR EQUATIONS
183
Parametric representation of solutions
D.
When the coefficient matrix A of a system of linear equations
a11X1 + ... + alfXf = al n+I
in
am l X l + ... + amn
= am n + 1
is taken into a matrix A' = a'** by an elementary row transformation, then the system of linear equations
a',,Xl +...+ a'1nXn - a1 n+1 .........................
a11X1 + ... +amn4n - am n+1 whose coefficient matrix is A', has clearly the same solutions as the system (E). Therefore the elementary row transformations can also be used to find the solutions of a system of linear equations. It is obvious that elementary column transformations can not be used for this purpose. Assume now that the system (E) is solvable, then the coefficient
matrix M of (E) can be brought by a series of elementary row transformations into an echelon matrix N:
I-
1
0...01 P11 1 ........................ v1n+1 0.........'.+...
.............. P2 n+1 ....................... 0 1 P212+1
......................................... 0 O l vri'+1 ... Prn+1 0 .................................... 0 ...................................... 0.....i.........i......... I..........0
- r-th row
J
jl-th column
j2-th
column
jr-th column
where jr * n+l; for otherwise r(A0) < r(A) and the system (E) would
not be solvable, contradicting the assumption. We can further simplify the matrix N by elementary row transformations so that on
the j;-th-column (i = I
... , r) all terms are zero except the one
belonging to the i-th row. Clearly this can be done by using appropriate elementary row transformations of the third kind alone.
184
V
MATRICES
This means that the coefficient matrix M = a** can be brought by a series of elementary row transformations into an echelon matrix B: 10.
.
.
0
0 .......Q1
1+1
... 0 .......132
n+1
1014' 1 ... Rlj2-1 0 01j2+1 ..
0
.............. 0
0
............................ 0 1 13rjr+1... Or n+ 1
1132j2+1
0 ......................................0 0 ......................................0
jl-th
/2-th
jr -th
column
column
column
J
Therefore the system of linear equations XjI+Q111+1X11+1+
.........................+131,X1 =01 n+1
X12+13212+1 X12+1+ ............... +Q21X1 =92 n+1 (E')
XY1r+Rrlr+IXlr+1 + ... +jnXn
whose coefficient matrix is B, has the same solutions as the system (E). From the matrix B, we see that among the n coordinates of the solution (X1 i . . . , Xn) of the system (E'),we can choose n-r arbitrary parameters and express the rest of the r coordinates in terms of the coefficients Qij and these parameters. Therefore the solutions of the system of linear equations (E) can be given in a parametric representation: )j is arbitrary for j Oil, ... , jr; X111 = Pi n+1 - (Qi ji+1 Aji+1 + ... + Pin),,) for i = 1,
..., r.
Finally it is emphasized that only elementary row (and never column) transformations are used in the above method of solving
linear equations.
SYSTEMS OF LINEAR EQUATIONS
§ 15
EXAMPLE 15.5.
equations
185
Let us find the solution of the system of linear X, - 2X2 + X3 = 1 2X1 + X2 + X3 =
(E)
1
5X2-X3 =-IThe coefficient matrix of the system (E) is 01 1
A=
-2
2
1
0
5
1
1
1
1
-1 -1
which is equal to the matrix A of Example 15.4 with A, = 1, A2 = 1 and A3 = -1. Therefore A can be brought into the echelon matrix.
N=
1
-2
0
1
0
0
1
1
0
0
--i-
from which we see that the system (E) is solvable. According to the proceedure, our next step is to bring the second column into he form
10 1
0
by some appropriate elementary row transformations of the third kind. In this case, we have only to add to the first row two times the second row and get a matrix.
A'
1
0
0
1
0
0
3
3
5
5
0
0
-15 _15
as well as a system of the linear equations 3
X1
(E') X2
+35X3=5 _5X3 1 1
5
The solutions (A1i A2 , A3) of (E') and of (E) have therefore the parametric representation
V
186 Al
MATRICES
3
_ 3 A3
5
5
+
X2 =
(S)
5
A3
5
A3
is arbitrary.
To make sure that we have made no calculation mistake, we check our result by substituting (S) into (E):
3 5
2
3 5
3 A3 5
-2
3 A3 5
+ C 5 + 5
)
(--L5
+
5
A3/ A3/
5
Two interpretations of elementary transformations on matrices There are two ways of interpreting an elementary row or column transformation on matrix A. One way of doing this is to regard it as a E.
multiplication of the matrix A by certain types of special square matrices, the other way is to regard it as a transformation of the matrix of a linear transformation as a result of a change of base in the domain or range linear space. We call a square matrix of order n an elementary matrix if it is of one of the following forms 11
1
0
1
1
0
- i-th row
E _
- j-th row 1
1) i-th
j-th
column column
§ 15
SYSTEMS OF LINEAR EQUATIONS
187
Cl
- i-th row
A
E' =
L
1J i-th
column
A
- i-th row
E" =
1
j-th column
where the unspecified terms are the same as those of the identity matrix I of order n.
The first of these matrices can be obtained from the identity matrix I. by an elementary row transformation R(i: j), the second by
R(Xi) and the third by R(i + Xj). Left multiplication of a (p, q) -matrix A by elementary matrices of order p results in:
EA = R(i:j)A, E'A = R (Ai)A,
E"A =R(i+Aj)A.
188
V
MATRICES
Similarly the elementary matrix E can be obtained from the identity matrix I by an elementary column transformation C(i: j), E' by C(Ai) and E" by CO + Xi). Right multiplication of a (p, q)-matrix A by elementary matrices of order q results in:
AE = C(i:j)A, AE' = C(Xi)A, AE" = CO + Xi)A.
Thus we get a one-to-one correspondence between the left (right)
multiplication by elementary matrices and the elementary row (column) transformations on matrices.
Let us give the second interpretation. If 0: X -+ Y is a linear transformation such that its matrix MBC(O) relative to a pair of fixed
bases B=(x1, ...,xi, ...)x1,
....
...,
xP)ofXandC=(yl,...,Yt,
y, yq) of Y is identical with the given (p, q)-matrix A, then relative to the base B' = (x 1, . . . , x1, ... , x,, ... , x p), B" (x1, Axi, ..., xi, ..., xp)andB"'_(x1, ..., xr+Axe,
...)
x1) ..., xp) of X we get
MB,C(cb) = R(i: j) A,
MB..C(¢) = R(Xi)A, MB...c(O) = R(i + Xj)A. Similarly, relative to the bases C' _ (yl , - ... , y1,
. . , yq ), C" = (YI, ... , Ayr, ... , y1, ... , yq) and C,., = (YI, ... , .yr, ... , yi+Ayr, ..., yq)ofYweget MBC-(O) = C(i:j)A,
MBC-.(O) =
. . . ,
yr,
C(1i)A,
MBc...(0) = C(i - Aj)A
.
Thus an elementary row (column) transformation on A corresponds to a change of base in the domain (range) linear space X (Y). The possibility of bringing a given matrix A into echelon from can be therefore interpreted as follows. Let 0: X - Y be a linear transformation. Then a base B = (, ... , xp) of X and a base C of (yl , ... , yq) of Y can be found so that if r is the rank of 0, then 05Fr = yr for i = 1, ... , r
0xr = 0 for i = r + 1, ... , p.
§15
189
SYSTEMS OF LINEAR EQUATIONS
Finally we study an effective method of fording the inverse matrix of an invertible matrix. Let A = a * * be an invertible matrix of order
p. Then the rank of A is p; therefore by a finite number of
elementary row transformations R 1, ... , Rm we can bring A into echelon form which, in the present case, is the form of the identity matrix Ip. Thus
Ip = Rm (Rm -1 ... (R2 (RI (A))) ...). By way of the first interpretation of elementary transformation, each R; corresponds to the left multiplication of an elementary matrix. Therefore there exist elementary matrices E1i . . . , Em such that
Ip = EmEm -1
... E2 EI A.
Therefore multiplying both sides of the last matrix equation on the right by A-1 , we obtain A-1 = Em Em-I
..
El Ip.
Reinterpreting left multiplication by an elementary matrix as an elementary row transformation, we get
A-' = Rm (Rm -I
...
(R I (Ip )) ...) .
Therefore to obtain the inverse matrix A-' of an invertible matrix A, we apply consecutively and in the same order the row transformations on Ip which are performed on A to bring A into echelon form.
Let its illustrate this method by the following examples. 15.6.
EXAMPLE 1
0
0
1
To take the matrix A =
1
1
2
1
into the
we apply to it consecutively the elementary trans-
formations R(2-21), R(- 12) and R(1-2). Thus 1
2
1
1
JR(2-zl
A= 1
1
1
0-1
R(-12
0
1_1
0
R(1-2)
1
L0
I2. 1
190
V
MATRICES
Therefore we apply to 12 consecutively the elementary transformations R(2-21), R(- i2) and R(1-2) to obtain A-. Thus:
- 1 - -.- -1 1 01
R(2- 21)
12 =
0
-2
1
15.7.
EXAMPLE
-Is
19 7 6 A = 11 1 2
11 1 1
R(3-91)
1
_ R(1:3)
R(1-2)
1
1
1
1
1
2
976
1
111
1
001
0-2-3
R(2:3)
1
R(-? 2)
(1
01 i
001
1
1
012
001
0 0 1
0 2
C1
=A-'.
R(1 2)
2 -1
0-2-3
>
12-01
R(-i2)
1 0 0RR
01
a
001
R(2--13)1
2
Therefore
001
1 0 0 13
010 0 0
R(1:3)
>
0 1-1 1
R(2:3)
0-9 -10
001 1 0-9 0 1-1
0 1-1
R(2-1)
100
1
0 0 1
R(3-91)
0 1 0
001
L100 r-
1*1
001
X-1
-2 0 2 0 1-1 .10
1
=A-1
1
1-4
2
2
R(2-23) ?
2
1-4
22
2
R(1-2)
2
202
202
Lo 1-i j
0 1-1
Exercises
1.
Find the rank of the following matrices.
10
)
-1
2
2 -2 -2 0 -1 -1
0
1
1
0 (a)
0 1-1
rl -1
2
1
2-2 4 -2 (b)
0
1
1
3
0
6 -1
1
0
1
-1
0
3
0
0
1
1
14
12
6
8
2
I1
0
1
0
0
6 104
21
9
17
1
1
0
0
0
1
0
1
1
0
0
0
0
1
1
0
1
0
1
1
7
.6
3
4
35
30
15
20
(d)
5J
Lo
1
Let M and N be square matrices of order n. Show the inequalities of Sylvester:
r (M) + r(N) - n < r (MN) < min [ r (M), r(N) l . If I = 10,
O
,
find all (2,2)-matrices X such that
X2 -4X + 31 = 0. 4.
0
Li (c)
3.
6
.
F.
2.
191
SYSTEMS OF LINEAR EQUATIONS
§ 15
Solve
2X1-X2 +3X3 = 4 3X1 - 5X2 + X3 = 3 6X1 - 8X2 + 4X3 = 2
192
5.
V
MATRICES
Solve
X1 + 2X2 + 2X3 = 3 2X1 + 3X2 + 5X3 = 7 4X1 + 9X2 + 6X3 =10 6.
Solve
+X4= 0
X1+2X2+
2X1 + 3X2 - 7X3 = 0 -X1 + 4X3 - 2X4 = -2 7.
Solve
- 3af3X3 = 2$3yX1 + 2ayX2 + aPX3 =
4f3yX 1
+ ayX2
$3yX1 - 8ayX2
-
af3X3
0
4y
= 8a$3y
where a(3y * 0. 8.
Solve
aX2 + X3 = 2 2X1 + 5X2
-2X1 +
=1
X2 + j3X3 = 3
and discuss the solutions in relation to the values of a and 3. 9.
Solve
X2 + X3 = a2 + 3a X1 + (a+1)X2 + X3 = a3 + 3a2 X1 + X2 + (a+l)X3 = a4 + 3a3
(a+ 1)X1 +
and discuss the solutions in relation to the value of a.
10. The solutions of
Ot1 I X1 + ... + a,, Xn = a1 n+ I (i)
amlXl + ... + amnXn = am n+1
SYSTEMS OF LINEAR EQUATIONS
§15
193
form a linear variety L in the n-dimensional affine space Aff(A"), and the solutions of a11X1 + .. . + GYInXn - a1 n+1Xn+1 = 0 .....................................
(ii}
+ ... + amnXn - am n+1Xn+1 = 0
am1X1
form a linear variety M in the n-dimensional projective space Pn (A). Discuss the relation between L and M. 11.
Find a necessary and sufficient condition for the points , (an , an , 'rn) of Aff(R3) to lie R2 , 72 ), on a line and find a necessary and sufficient condition for them to lie on a plane.
(«I , R1, 71 ), (a2 ,
12.
Let M be a real square matrix of order n. Suppose r(M) = I. Prove that (a)
a1
I
1
M=
(61,
. . . ,
/3n)
forsome a;,(31eR
a")
(b) M2 =AM foraXeR. 13.
Find the inverse of the following matrices. (a)
y
where a S- yQ
s
1
2
0
0
3
7
2
3
2
5
1
2
(c)
0.
194
V
MATRICES
114
(d)
2
1
0
0
0
0
2
1
0
0
0
0
2
1
0
0
0
0
2
1
0
0
0
0
2 IN
0
al
0
as
(e)
Lan
0
1
where ar. 0 0 for i = 1, ... , n and all unspecified entries are zero. 14.
Find the inverse of 1
0
1
1
1
00
00 15.
0
A
B
0
Let M =
1
1
0
1
. Find M-' in terms of A-1 and B-'
.
§ 15
SYSTEMS OF LINEAR EQUATIONS
195
16. A square matrix M is said to be nilpotent if there exists a positive
integer r such that Mr = 0. Show that if M is nilpotent, then 1-M is invertible and
(1-M)-' = 1 +M+M2 +M3 +... Apply this result to find the inverse of the matrix
rl
2
4
6 81
0
1
2
4
6
0
0
1
2
4
0
0
0
1
2
0
0
0
0
1
CHAPTER VI MULTILINEAR FORMS
Linear transformations studied in Chapter II are, by definition, vector-valued functions of one vector variable satisfying a certain algebraic requirement called linearity. When we try to impose similar
conditions on vector-valued functions of two (or more) vector variables, two different points of view are open to us. To be more precise, let us consider a mapping 0: X x Y - Z where X, Y and Z are all linear spaces over the same A. Now the domain X x Y can be either (i) regarded as the cartesian product of linear spaces and thus
as a linear space in its own right or (ii) taken just as the cartesian product of the underlying sets of the linear spaces. If we take the first point of view then we are treating a pair of vectors xEX and yEY
as one vector (x, y) of the linear space X x Y; therefore we are treating, 0: X x Y -> Z essentially as a mapping of one linear space into another and in this case linearity is the natural requirement. As a
linear transformation ¢ can then be studied with the aid of the canonical injections and projections of the product space X x Y as well as other results of Chapter II. If we take the second point of view and if at the same time we take into consideration the algebraic structures of X, Y and Z separately, then it is reasonable to impose
bilinearity on 0, i.e. to require 0 to be linear in each of its two arguments. Such a. mapping is called a bilinear mapping. We have, in
fact, encountered one such mapping before in §6A and §8A we studied composition of linear transformations Hom (A, B) x Hom (B, C) - Hom (A, Q. The most interesting and important examples of these mappings are the bilinear forms on a linear space X, i.e. bilinear mappings X x X -> A (where A is the 1-dimensional arithmetical linear space fl. The natural generalization of bilinear mapping and of bilinear form are n-linear mapping and
when
n-linear form which are also called multilinear mapping and multilinear
form respectively. The study of multilinear mappings constitutes an extensive and important branch of mathematics called multilinear algebra. In this course we shall only touch upon general properties of multilinear mappings on a linear space (§ 16) and go into considerable detail with determinant functions (§ 17) and much later in § 21 and in § 24 we shall study inner product in real linear spaces and hermitian 196
GENERAL PROPERTIES OF MULTILINEAR MAPPINGS
§16
197
product in complex linear spaces which are important types of bilinear forms.
§ 16 General Properties of Multilinear Mappings A. Bilinear mappings
We begin our discussion by laying down a formal definition of bilinear mapping on a linear space.
DEFINITION 16.1. Let X and Z be linear spaces over the same A. A bilinear mapping on X with values in Z is a mapping 0: X x X -* Z such that O(A1x1 +X2x2 ,x)=X10(X1,X)+X20(X2,X) O(x,A1x1 + A2x2) = Al (x,x1)+A20(x,x2) for all x, x1i x2 eX and X,, A2 EA. We call 0 a bilinear form on X if its range Z is identical with the arithmetical linear space A. A bilinear mapping 0 on X is said to be symmetric if O(x1,x2) _ ¢(x2 , x1) for all x1 , x2EX and it is said to be skew-symmetric (or antisymmetric) if l (X 1 , x2) = -O(x2, x 1) for all x1 , x2FX.
For each pair of vectors x = (A1, A2 , A3) and y = (µ1,µ2, µ3) of the real 3-dimensional arithmetical linear space R3 , we define their exterior product (or vector product) as the vector EXAMPLE 16.2.
xAY=(A2p3 -A3µ2,A3µ1 -A1p3,A1µ2 -A2µ1) of R3. The mapping 0: R3 x R3 -> R3 defined by ¢(x,y) = x A y is then a skew-symmetric bilinear mapping on X. Moreover the exterior product satisfies JACOBI'S identity:
(xAy)Az+(yAz)Ax+(zAx)Ay=0. EXAMPLES 16.3. For each pair of vectors x = (A1i A2 , ... , An) and Y = (µ1 , 1-12, ... , µ,) of the real arithmetical linear space R", we
define their inner product (or scalar product) as the scalar WY) = Al µI + A2µ2 +
+ An An
of A. The mapping 0: R" x R" - R defined by 0(xy) = (xly) is a symmetric bilinear form on R. For n = 2, this is (xly) =AIM + A2µ2
198
MULTILINEAR FORMS
VI
which is a familiar formula in coordinate geometry. On A2, a bilinear form is defined by A1
'P(x,Y) = A1µ2 - A2µ1 = I
µ1
µ2 Azl
for x = (A 1 i A2) and y = (p 1 , µ2 ). It is easily seen that 0 is a skewsymmetric bilinear form.
EXAMPLE 16.4. Consider the real linear space V2 of all vectors on
the ordinary plane with common initial point at the origin 0. The inner product of any two vectors a = (0, P) and b = (0, Q) of V2 is defined as the real number (alb) = pq cos 0
where p is the length of the segment OF, q is the length of the segment OQ and 0 = 4POQ. The value (alb) is also the product of q and the length of the perpendicular projection of the segment OP on the line passing through 0 and Q. Therefore the mapping 0: V2 X V2 -> R defined by O(a, b) = (a I b) is linear in the first argument; by
symmetry, it is also linear in the second argument. Hence 0 is a symmetric bilinear form. If we choose a suitable cartesian coordinate system in the plane with the origin at 0, and if P and Q have coordinates (xl , y1) and (x2, y2) respectively, then we obtain from the well-known cosine law the equation (alb) = XI X2 + Y1Y2 EXAMPLE 16.5.
The inner product of vectors of the complex
n-dimensional arithmetical linear space C" is defined as follows. We denote, as usual, by l the complex conjugate of a complex number A; we then define the complex number
O(x,Y) = A1µ1 + A2µ2 + ... + A"µ"
as the inner product of the vectors x = (XI, A2 , ... , A") and of C. The mapping 0: C" x C" -> C fail to be a Y = (µ 1, 92, .. , bilinear form since it is not linear in the second argument. However the equations (A1x1 + A2x21Y) = A1(x11'.IY) + A2(x21Y)
(xIA1Y1 + A2Y2) = A1(xly1) + A2(x(y2)
hold. For this reason, we called 0 a sesquilinear (i.e. 11/2-linear) form.
§16
GENERAL PROPERTIES OF MULTILINEAR MAPPINGS
199
On the non-empty set B(X, Z) of all bilinear mappings 0: X x X -- Z,
addition and multiplication are defined in the natural way as follows. (01 + 02) (x,Y) = 01 (x,Y) + 02 (x,Y) (A4S) (x,Y) = X (x,Y)
With respect to these composition laws, B(X, Z) is a linear space over A.
For a finite dimensional linear space X there is a o"e-to-one correspondence between bilinear forms on X and matrices ol a certain
size. Let (x1i
... , x,n) be a base of X. Then every bilinear form
0 on X determines a unique (n, n)-matrix a** over A, whose terms are a;l = O(x1, xl)
such that if x = A1x1 +
(i,l = 1, ... , n) . + Anx and y = 1A1z1 + ... + µn then
4(xaY) = E EAiµl¢(xt,xl) = EAfµ/afl This equation cant be written conveniently as a matrix equation:
( L11......... a1n- (µ1 Xx,Y)_(A1, ..., An) ant
aJ µn_)
Conversely, every (n,n)-matrix a** over A defines uniquely a bilinear
form 0 that satisfies the equation above. We have therefore proved the following theorem.
THEOREM 16.6. If X is an n-dimensional linear space over A with base (x1, ... , xn ), and a** is any (n, n)-matrix over A, then there is one and only one bilinear form 0 on X such that O(x1, xl) = ail. We can therefore call a** the matrix of the bilinear form 0 relative
to the bases (x1, ... , xn) of X. It is clear that addition and
scalar multiplication of bilinear forms correspond respectively to addition and scalar multiplication of matrices. From this it follows that dimB(X, A) = (dim X)2. It is also clear that if 0 is a symmetric bilinear form on X, then the matrix of 0 relative to any base B of X is a symmetric matrix.
200
VI
MULTILINEAR FORMS
Conversely if A = as * is a symmetric matrix and B = (x 1i is a base of X, then the bilinear form defined by Cal
... , x )
I ......... alnn 1µl1
O(x,Y)=(X1,...,Xn)
1 pn1 ......... ann ) µn ) for x = Xlxl + ... + An xn and y = µ1x1 i
... , + µn xn is a symmetric bilinear form. Indeed if we denote by L and M the (1, n)-matrices (X1 , ... , X,) and (µl , ... , µn) respectively, then o(x,y) = L A Mt. Then it follows from the definition that
q(y, x) = M A L'. Since L A Mt = (L A Mt)t = (Mt)t At Lt = MALt, O(x, y) = 0(y, x); hence 0 is symmetric.
Analogous to the situation in § 14C, a change of base in the space X will give rise to a change of the matrix of a bilinear form. Let B = (xl, ... , xn) and B' = (x'1 i ... , x',) be linear
bases of the linear space X and G =,y** be the matrix of the change from B to B', thus:
x'i=ytlxl+ ...
.. ., n. If a,1=O(x1,x1) anda11 =O(x;,x,)for i = 1,...,nand +'Yinx,
i = 1,
n, then it follows from bilinearity that a'1, =
k,I
tkyilakl.
Therefore, for the matrix A = a** of the bilinear form 0 relative to B
and the matrix A' = a'* * of the same bilinear form 0 relative to B', we have the following rule of transformation:
A' = GAGt. We note that the transformation of the matrix of a bilinear form arising from a change of bases is quite different from the way the matrix of a linear transformation changes in a corresponding situation. The interesting case where they do coincide is treated in § 22E.
§16
GENERAL PROPERTIES OF MULTILINEAR MAPPINGS
201
B. Quadratic forms
Let 0 be a bilinear form on X. Then the restriction of 0 on the diagonal {(x, x): xEX} of the product X x X defines in a natural way
a function F of X into A. If 0 is further assumed to be symmetric, then the bilinear form 0 itself can be recovered from the function F by the properties of F inherited from the bilinearity of 0. Take, foi example, 0 to be the bilinear form defined by the inner product of vectors of An as given in Example 16.3. In this case 0(X' Y) = (x jy) _ X vu 1 + ... + Xnµn
, ... ,
for the vectors x = (XI, ... , X,) and y = (µI Then F: A" - A is given by
p,) of A".
F(x)=(xix)= Xi + ... +An and we easily verify that
cb(x, y) = 2 { F(x + y) - F(x) - F(y) }
.
DEFINITION 16.7. Let X be a finite-dimensional linear space over A and let 0 be a symmetric bilinear form on X. By the quadratic form
determined by 0, we mean the function F: X -+ A such that F(x) _ 4(x, x). It follows from the definition that for a quadratic form F on X F(x) = F(-x) for every xeX.
A simple calculation using bilinearity shows that 0 can be recovered from F through the following formula: O(x, y) = 2 {F(x + y) - F(x) - F(y) }
.
These two properties of quadratic forms turn out to be properties
that can be used to define quadratic form intrinsically. More precisely we have the following theorem.
THEOREM 16.8. Let X be a finite dimensional linear space over A. Let G: X - A be a function such that (a) G(x) = G(-x) for every xeX, and (b) the function 1y: X x X -- A defined by (x, y) = {G(x + y) - G(x) - G(y) } is a bilinear 2form on X. Then 0 is symmetric and G is the quadratic form determined by >G.
202
VI
MULTILINEAR FORMS
PROOF. It is immediate from the definition that Vi is symmetric. To show that G(x) = qi(x, x) it is enough to verify that G(2x) = 4G(x). By substituting 0 = x = y in (b), we get G(0) = 0. For arbitrary t, u and v of X, it follows from iy(t,u + v) =
(t, u) + ,y(t, v)
and the property (b) of G that G(t+ u + v) - G(t) - G(u + v) = G (t + u) - G(t) - G(u) + G(t + v) - G(t) - G(v)
or
G(t+u+v)-G(t+u)-G(u+v)-G(t+v) + G(t) + G(u) + G(v) = 0.
Substituting t = u = x, v = -x in the last equation and taking the condition (a) into consideration, we obtain 4G(x) - G(2x) = 0. The proof is now complete. If A = x** is the (symmetric) matrix of the symmetric bilinear
form 0 relative to the base B = (x1, ... , x,,) of X and if F is the quadratic form determined by 0, then by a straightforward calculation we obtain
tall
aln
I
I
X,
F(x)=Ea;,X1X1=(X1,... Xn and
annJ
\
'In
J
for every vector of x = XI xj + ... +)\n xn. Therefore we also say: A is the matrix of the quadratic form F. Thus the value of a quadratic form F at a vector x of X is given by a homogeneous quadratic (i.e.
of degree 2) expression in the coordinates X1, C.
... , X,i of x.
Multilinear forms
The definition 16.1 of bilinear form is easily generalized into the following definition.
DEFINITION 16.9. Let X be a linear space over A. A mapping of the
n-fold cartesian product X x ... x X (n times) into A is called a n-linear (or simply multilinear) form on X if for all i = 1, ... , n and
§16
GENERAL PROPERTIES OF MULTILINEAR MAPPINGS
203
all vectors aj E X (j * i) the mapping of X into A defined by x -- 0(a1, ... , ai_ 1) x, ai+ 1, ... , a,) is a linear form on X. In other words, 0 is a multilinear form iff it satisfies the following conditions: for i = 1, . . . , n
0(x1,...)Xi-l,xi+x i,xi+1,...,xn =O(xl,..., xi-l,xi,xi+1) ..., Xn i,xi+1, ..., xn 0(x1, ... ,x1_1,Axi,xi+1) ..., xn)
and
=AO(xl,
..., xi-1, xi,xi+1, ...,
;
xn).
EXAMPLE 16.10. For any three vectors x = (A1, A2, A3), y = (111 , 112, 113 )
and z = (v1, v2 , v3) of the real arithmetical linear space R3 , we define their mixed product as the real number (xlyl z) _ (xiyAz),
where y A z and (x It) are formed according to Example 16.2 and 16.3. A trilinear (i.e., 3-linear) form is then defined on R3 in terms
of the mixed product. It follows from the symmetry of inner product and the skew-symmetry of the exterior product that (xIYIz) = (ylzlx) = (zlx1Y) = -(xlzly) = - (ylxlz) = -(zlylx). Explicit calculation shows that (xlylz) has the same value as the 3 x 3 determinant Al
A2
A3
Al
112
113
V1
V2
V3
EXAMPLE 16.11. Let X be a linear space over A. If for each i = 1, . . , n, ui is a linear form of X, then we define their tensor product as the n-linear form ul ® ... ® un: X x . . . x X -> A .
such that
(ul ® ... ®un) (xl, ... , xn) = u1(x1) ... un(xn) for all x,EX (i = 1, ... , n). More generally if 0 is a p-linear form and 0 is a q-linear form on X, then we define their tensor product $ Q0 as the (p + q)-linear form such that xp,xp+1, ...,xp+q)=0(x1,...,xp)0(xp+1, ...,Xp+q).
204
VI
MULTILINEAR FORMS
The tensor product of multilinear forms is associative, thus:
(000)©t = 0©( © ), but in general it is not commutative. Take for instance two linear forms f and g of a linear space X, then
f©g(x,Y) =f(x)g(Y) g®f(x,Y) = g(x)f (Y) These two expressions need not be identical. D.
Exercises
1.
Show that if B is a bilinear form and p is an endomorphism of a linear space X, then C defined by for all x, yEX C(x, y) = B is a bilinear form of X.
2.
Let X be a linear space and B a symmetric bilinear form of X. Denote by N the set of all z E X, such that B (z, y) = 0 for every y e X. N is called the radical of B. (a)
Show that N is a subspace of X.
(b)
For every pair of elements [x], [y] of XIN define
C([x 1, [y ]) = B (x,y). Show that C is a bilinear form of X/N. (c) Show that the radical of C is 0. 3.
Denote vectors of R2 by (x, y). Show that the mapping F defined by F(x,y) = 2x2 + 3xy + y2 for all (x,y)ER2
is a quadratic form on R2. Find the bilinear form p which determines F. Find also the matrix of p relative to the canonical base.
4.
Let X be a finite-dimensional linear space over A and let p be a bilinear form on X. (a)
If xGX, denote by cpx : X -* A the mapping defined by
px(y) _ ,p(x,y) for all yEX. Show that'px EX *.
§16
GENERAL PROPERTIES OF MULTILINEAR MAPPINGS
205
(b) Prove that the mapping F: X - X* defined by
F(x) _ px for all xEX is a linear transformation. (c) Show that F is an isomorphism if and only if the matrix of pp relative to a base of X is invertible. In this case both p and F are said to be non-degenerate. 5.
Let X be a three-dimensional linear space. Let tp be a symmetric bilinear form on X which determines a non-degenerate quadratic form (gyp itself is said to be non-degenerate in this case). Show that if >y is a endomorphism of X such that pp(Jx, iyy) = p(x,y) for all then the matrix A of p relative to any base of X has the property that its determinant det A = + 1.
6.
Let X be a finite-dimensional linear space. Show that every bilinear form is a sum of a symmetric and a skew-symmetric bilinear form, and that these latter are uniquely determined.
7.
Let X be a finite-dimensional linear space. Show that if the
dimension of X is odd, then there does not exist a nondegenerate skew-symmetric bilinear form X. 8.
9.
A quadratic form F on a real linear space X is said to be positive definite if F(x) > 0 for all x * 0 of X. If F and G are quadratic forms on X, then we define F < G to mean that G-F is positive definite. Show that if F < G and G < H, then F
p(xj,x1)=0fori:jand
(ii)
cp(x,,xi) = 1 or p(x1,xi) _ -1 or p(xi,xi) = 0. (a) Show that X has an orthonormal base. (b) If X0 is the subspace of X consisting of all vectors xEX such that 4p(x, y) = 0 for all yEX, then dim X. is precisely the number of vectors xi in any orthonormal base of X such that p(x1, xi) = 0 (dim X. is called the index of nullity of gyp).
206
Vi
(c)
MULTILINEAR FORMS
There exists an integer r > 0 such that for every orthonormal base (x 1, ... , of X, there are precisely r vectors among xi such that p(xi, xi) = I. (r is called the index of positivity of gyp).
10.
Let X be a linear space and let f, g and h be linear forms on X. (a) Define f®g®h by f®g®h (x,y,z) = f(x)g(y)h(z) for all z, y,zeX. Show that f ®g®h is a trilinear form of X.
(b) DefinefAgAhby
fngnh= f®g®h +g®h®f+ h®f®g
- f®h®g-g®f®h-h®g®f . Show that f n g n h is a trilinear form on X. 11. Let f, g, h be linear functions on a linear space X. (a)
Show that for x, y, z EX,
f(x) f(y) f(z) (fAgnh)(x,y,z) _ (b) Let (x1
,
... ,
g(x) g(y)
g(z)
h(x) h(y)
h(z)
be a base of X and denote by 'pijk the
value (fngnh) (xi, X j, xk) for i, j, k = 1, ... , n. Show that if x = Eaixi, y = E0iyi and z = 1,yiyi, than ai
(fAgAh)(x,y,z) __
I <j
Pi 7i
§ 17 Determinants
Determinants provide an effective (though not always very efficient) computational tool for various purposes; in particular they
are useful to have for determining when vectors are linear inde-
§17
DETERMINANTS
207
pendent or when an endomorphism is an automorphism. They only play an auxiliary role in the subsequent chapters. Thus the reader who does not wish to see the theory, and knows how to compute
determinants can omit this section, or read only the statements concerning properties of determinants. A.
Determinants of order 3 We have seen in Example 16.10 that the determinant
all
a12
a13
a21
a22
a23
all a22 a33 + a12 a23 a31 + a13 a21 a32
a31
a 32
a 33
- all a23 a32 - a12 a21 a33 - a13 a22 a31
of order 3 is the value (a11 az1 a3) of a trilinear form on A3 at the vectors a1 = (all, a12, a13), a2 = (a21, a22, a23) and a3 =(a31, a32, a33) If A = a** is a square matrix of order n, it would be possible to define its determinant IA I or det A by a sum of products of the terms
aii. However, this is obviously a rather tedious task; moreover to start from such a complicated definition it would be difficult to study properties of determinants. Instead we shall try to find out some characteristic properties of the trilinear form (alla2la3) and define the determinant of a square matrix by corresponding properties. Besides being (i)
a non-zero 3-linear form on a 3-dimensional linear space, the form (alla2l%) further enjoys the following properties: (ii) (a11a21a3)=0ifa,=a1 fori*j,and (iii) (e11 eel e3) = 1 for the canonical base (e 1, e2, e3) of A3 .
We now claim that the properties (i), (ii) and (iii) determine (alla2la3) completely. To see this, we first show that from (i) and (ii) it follows that (alla2l%) = -(a21a11a3). Indeed, from (ii) we have (a1 + a21 a1 + a21 a3) = 0 and from (i) we have (a11a11a3) + (a11a21a3) + (a2la11a3) + (a2la21a3) = 0.
Since (a1la11a3) = (a21a2la3) = 0 we get (alla2l%) = -(a21a1la3) Similar argument shows that (iv) (alla2l%) = -(a2Ia11a3); (a11a21a3) = -(a11a3la2); (a11a21a3) = -(a31a21a1); i.e. the value (alla2l%) is multiplied by -1 if two arguments a; and al interchange their positions.
208
VI
MULTILINEAR FORMS
We can now show that (i) - (iii) imply that
(a11a21a2) =
all
a12
a13
a21
a22
a23
a31
a32
a33
for al =--(all, a1 2, a,13),% = (a21, a22, a23) and a3 = (a31, a32, a33). Making use of property (i) we get
(a11a21a2) _ (EalietIEa2jejI ka3kek) = Ealia2ia3k (eil ejI ek) where the last summation is taken over all i,j,k = 1,2,3.
Using property (i), we see that of the 27(i.e. 33) summands in last sum only 6 (i.e. 3!) of them, in which the indices i,j,k are distinct, can be different from zero. Therefore (a11a21a3)=
all a22 a33 (e11e21e3)+a12 a23 a31 (e21e3Ie1) + a13 a21 a32 (e31 e11 e2) + all a23 a32 (e1Ie31e2)
+a12 a21 a33 (e21e11e3)+a13 a22 all (e31e21e1)
Rewriting the factors (eil ejI ek) as i (el 1e21 e3) according to the rule given in (iv) and applying (iii), we finally get
(a11a21a3) =
all
a12
a13
a21
a22
a23
a3l
a32
a33
From the proof above we also see that the properties (i) and (ii) essentially determine the determinant while the property (iii) can be regarded as a "normalization" requirement. While (iii) can only be formulated through the arithmetical linear space, (i) and (ii) can be
adapted to a more general setting. Following the principle of this course to keep the theory coordinate-free as long as possible, at this
stage we shall only make use of (i) and (ii) and introduce the following definition. DEFINITION 17.1. Let X be a linear space over A of dimension n(n > 1).
A determinant function on X is a non-zero n-linear form A such that 0 whenever xi = xi for some i 0 j. A(xl, ... ,
§17
209
DETERMINANTS
The method used in the proof above will help us prove the main result on determinant functions: determinant functions exist and two determinant functions on one and the same linear space differ only by a non-zero scalar (see 17.5). By virtue of this result and with the aid of certain "normalization" requirements we arrive at a viable definition of determinant of endomorphisms and of square matrices. As we have seen in the example of (a 1 1 a21 a3) some working knowledge on permutation (see § 17B) will be necessary for formulating properties similar to (iv) above. To conclude the section
we prove a property of determinant functions which is independent of permutation. THEOREM 17.2. If 0 is a determinant function on a linear space X, then O(x1, ... , xn) = 0 for every linearly dependent family (x1i ... , xn) of vectors of X. PROOF. It follows immediately from the linear dependence of (x1 i
... ,
xn) that an index i exists such that Xi = Xi+1Xi+1 +. ... + XnXn
(Notice that if i = n, then the right hand side of the equation is the zero vector of X.) By n-linearly, we obtain O(xi, ...,x,)= Xj+10(x1, ..., xi-1,
xi+l,xi+1, ...,Xn)
+ Xi+2 0(x1, ... , xi-1,xi+2,Xi+1i ... , xn) ....................................
+ Xn O(x1, ...,xi-i,xn,xi+1, ...,Xn). Therefore ¢(x1,
... , xn) = 0, since each summand at the right hand
side is zero.
Permutations Let Z,, = 11, 2, ... , n }. A permutation of the set Zn is a bijective mapping of Zn onto itself. The composite roo of any two per-
B.
mutations a and r of Zn under the usual composition law of mappings is again a permutation of Z,. The algebraic system (Sn, o), where Sn denotes the set of all permutations of Zn, satisfies the axioms of a group; this group is called the symmetric group of degree
n and is denoted by Sn. It follows from the theory of elementary combinatorics that Sn has exactly n! elements. We remark here that in the definition of the symmetric group Sn, only the number n of elements of Zn is essential, whereas the nature of the elements of Zn themselves is inessential.
210
VI
MULTILINEAR FORMS
Any permutation a of Z, can be represented conveniently by the following notation:
3 ...
2
1
n)
n-1
a(2) a(3) a(n-1) a(n) a For example the cyclic permutation y, defined by y(i) = i+l for i = 1, ... , n-1 and y(n) = 1, is written as y= 2 3 .... n-1 n) rl 3 Q 1)
4 .... n
For the permutations a and T of Z4 1
2
3
2
3
1
4 4
and
1
2
3
3
4
2
1
for example, their two composites can be calculated by a
1
2
3
4
1
1
1
1
2
3
1
4
1
1
4
1
2
1 3
1
2
3
4
4
2
3
1
T
a
1
1
2
3
4
1
1
1
1
3
4
2
1
1
1
1
4
3
1 2
1
2
3
4
1
4
3
2
1
and therefore TOO =
and aor=
We are now going to classify the permutations of Zn into even and odd permutations. Consider the polynomial P= (Xi - Xk)
fl
1GE
in the indeterminates X1, every
... ,
Xn with integral coefficients. For
a polynomial
up= 7 (Xc(i)- Xa(k)). 1
aP = ±P.
§ 17
DETERMINANTS
21 1
We now define the sign of a permutation a of Z,, to be denoted by sgn(a), by the equation aP = sgn(o)P;
and according to sgn(a) = 1 or sgn(a) = - 1, we say that the permutation a is an even or an odd permutation. For any pair of permutations a and r of Z , we have (roa)P = T(aP) and therefore sgn(Toa) = sgn(r)sgn(a). In particular, Toa is an even permutation if and only if a and r have
the same sign, i.e., they are both even permutations or both odd permutations.
Any permutation r of Z that leaves invariant exactly n-2 elements is called a transposition of Z . For a transposition r, there are exactly two elements r and s of Z such that (i)
r(i)=i for all i Or and i *s.
(ii)
r(r) = s and T(s) = r.
Conversely given any pair of distinct elements r and s of Z , we have exactly one transposition r satisfying (i) and (ii) above. Therefore we can write r = (r:s). Clearly for a transposition r, r = T-1. The main properties of transpositions which will be useful later are formulated in the following theorem.
THEOREM 17.3. (a) Every transposition is an odd permutation and (b) every even (odd) permutation is a composition of an even (odd) number of transpositions.
PROOF. (a) Let r = (r:s). We may assume that 1 < r < s < n and partition the set Z into mutually disjoint subsets:
Z =LU{r}UMU{s}UN where L ={i: 1
P=
TT (Xi - Xk) 1
212
VI
MULTILINEAR FORMS
as the product of all factors of the form where i
(Xr - XS).
Now factors of the first four types are left invariant by T, while -(X, - XS). Therefore TP = -P and 7 is an odd (XT(r) - XT(S) permutation. (b) Since sgn(Toa) = sgn(r)sgn(a), we need only prove that every
permutation of Z is a composite of transpositions of Z . If a is a permutation that leaves more than n-2 elements of Z invariant, then
a is the identity mapping which, is the composite of any two identical transpositions. If a is a permutation that leaves p elements of Z invariant and displaces the element r, i.e., a(r) * r, then a also
displaces a(r) and a' _ (a(r): r)oa is a permutation that leaves r invariant and does not displace any element of Z that is left invariant by a. Therefore a' leaves invariant at least p+l elements and a = (a(r): r)oa'. Further factorization of a' leads to the desired result. C. Determinant functions We are now in a position to study determinant functions in greater
detail. The property of determinant functions similar to (iv) at the beginning of § 17A can be formulated as follows. THEOREM 17.4. Every determinant function A is antisymmetric, i.e.
...,
A(xi, Xn)=sgn (a)A(xa(1), ..., for every aESn and every x,EX (i = 1, ... ,n). PROOF. It follows from 17.3 that it is sufficient to show that
A(xi, ... , xn) +A(xr(1), ... , XT(n)) = 0 for every transposition rrSn . Let r = (i : j) where 1 < i < j < n. Then
0(xi, ..., xn)+A(xT(i), ..., XT(n)) =A(X1,..., x,, ..., xi, ..., xn)+A(xi, ..., x,, ...,x,, ...,xn) + A(xi, ... , x,, ... , xi) ... xn) + A(xi, ... xi, .. . , xi, ... , NO =A(xi, .. ., x;+xj, ..., x;+xY, ..., xn)=0.
§ 17
213
DETERMINANTS
We have seen in 17.2 that A(x1 , ... , xn) = 0 if x, , ... ,x, are linearly dependent vectors of X. In the following we show that A is completely determined by its values at a base (y1, ... , yn) of X. Indeed for any n vectors xi = aj1yl + - + ajnyn U = 1, ... , n) of X, we get, by the n-linearity of A, .
0(x,, ...,xn)= E{ali1a(Yi,,x2, ...,xn):i1,EZn} = E{ali1a2;2 A(Yi1,Yi2,x3, ..., xn):i1,i2(-=Zn}
F{a,i1
...
anin0(Yi1,...,Yin):r1,12,...,inEZn}
Since A is a determinant function, in the last multiple summation above, we need only consider those summands a,i, . anin A(yil, .. - , Yin)
where (i1, ... , in) is a permutation of (1, 2, ... , n). Therefore we obtain A(x,, ... , xn) ,0(1) ... ana(n)(Ya(1), ... ,YO(n)). sn
Now A is antisymmetric; therefore we can further simplify the last equation and get l., A(X,, ... , Xn) _ sgn(a)al0(1) ... anU(n) A(Y1, ... ,Yn), oESn
proving our contention. Since a determinant function is, by definition, a non-zero function, it follows from the last result that if A is a determinant function on X and if (y1, ... , yn) is a base, then A(y1, ... , Yn)
# 0. Consequently if A, and A2 are two determinant functions on X, then A, = X02 where X = A, (y1, ... , Yn )/02 (Yi , , Yn) for any -
base (Y1,
.
,
Yn) of X.
It now remains to show that for every n-dimensional linear space X determinant functions exist.
Let 0 be an n-linear form on X and let us denote by a(x) the family (xa(,), ... , XU(,)) for any family x = (x1, .. , xn) of The mapping ;T Xn - A, called the vectors of X and any antisymmetrization of ¢ and defined by fi(x) =Q2; sgn(a)0(a(x))
is clearly an n-linear form of X since each summand on the right hand side is such. If xi = x, for 1 < i < j < n, then the permutation T = (i: j) has the property that T(x) = 7;-1 (X) = x
214
VI
MULTILINEAR FORMS
and aor(x) = a(x) for all aES,,. Pair off each aE 1& with ao1ESn and write consequently fi(x) as a summation of all matching summands of the form
sgn(a)¢(a(x)) + sgn(aor)b(aor(x)) _ (sgn(a) + sgn(uor))0(a(x))
Therefore f (x) = 0 since sgn(a) = - sgn (oor). The existence of determinant function on X will follow if we can exhibit an n-linear form such that its antisymmetrization is non-zero. Let (y1,. , yn) be a base of X and (g1 , ... , gn) the base of X* dual
to(y1,..., y,),i.e.
gi(yj) = sij
1,1= 1, ..
Define 0 to be the tensor product g1®
,
n.
® g, , i.e.
Xx 1 , ... , xn) = 81(x1) ... gn (xn) for all Then the antisymmetrization 45 of 0 is a determinant function on X since
Ryl , ... , y,) = E sgn(a)g1(ya(1)) ... gn(ya(n)) = E sgn(o)S10(1) ... Sno(n)
= 1. We summarize our results in the following theorem.
THEOREM 17.5. For any n-dimensional linear space X over A (n > 1) there exist non-zero determinant functions. If A. is a non-zero deter-
minant function of X then A = X4 (A * 0) for all determinant ... , xn) 0 0 if and only if
functions A of X. Furthermore A0(x1 i x1, ... , xn are linearly independent. D. Determinants
Let X be an n-dimensional linear space and 0: X -> X an endomor-
phism. If A is a determinant function of X, then A': Xn -+ A, defined by
A'(xl , ... , Xn) = 00(x1), ... , O(X,)) for all xi eX is clearly again a determinant function of X. In view of 17.5, there exists a unique scalar A that depends on 0 and A such that A' = XA, i.e., O(xn )) = Xl&(x 1 , ... , Xn )
§ 17
215
DETERMINANTS
for all x;EX. We claim that the dependence of X on A is only apparent. Let H be any determinant function of X. Then we get a non-zero scalar p such that H = p A and therefore
H0(x1), ... , ¢(x, )) = AA(0(x1), ... , O(xn = pXA(x1, ..., x,)_ XH(x1, ..., xn). To summarize, we have
THEOREM and DEFINITION 17.6. Let X be an n-dimensional linear
space over A (n > 1). Then for any endomorphism 0 of X, there exists a unique scalar det(o) of A such that
00(x1), ... , O(xn )) = det($)A(x1, ... , x,) for any x;EX and for any determinant function A of X. This scalar det(o) is called the determinant of ¢. Thus we have defined determinant as a scalar-valued function of endomorphisms; classically it is defined as a scalar-valued function of matrices. Let (x 1, ... , xn) be a base of X, A = a** a square matrix of order n. If 0 is the endomorphism of X such that
Xx1) = aI x I + ... + afn xn
for i = 1, 2, ... , n
then for any non-zero determinant function A of X
A0(x1), ... ,O(xn)) = A(Ea,jxt, ... , Eanlxl) = E sgn (a)alv(1) ... an a(n)A(xI , ... , xn a
n
Comparing this equation with 17.6, we obtain det(O)
; sgn(o)ala(I) ... ano(n) .
vesn
We now define the determinant of a square matrix A = a** of order n by
det(A) = E sgn(u)a1 a(I) QESn
ano(n)
Therefore the connection between the two approaches to the concept of determinant is given by the following equation det (0) = det (M(4))
for any 4eEnd(X)
where M(O) is the matrix of the endomorphism 0 relative to any base of X (see § 14A).
216
VI
MULTILINEAR FORMS
Using the customary rectangular arrangement of matrices, we also write a1 1
a1n
a2 1
a2 n
an 1
an n
det(A) =
For matrices of small sizes, we have explicitly
Ial1 = a11;
all
a12
= a11a22 - a12a21;
a21 a22
all a12 a13
a11a22a33 + a12a23a31 + a13a21a32 a21 a22 a23 a31 a32 a33
-a12a21a33 - a11a23a32 - a13a22a31
For the last equation, there is a convenient way of determining the sign of the six summands; this is indicated by the following diagrams.
all_ a21
a31
a12
'/ /' a3 2
a{
al 1
.. a12
a23
a21
a2z
.
a33a31a32
+\al1
a12
a13
alf
a12
a13
a21
a22
a23
a21
a22
a23
a31
a32
a33
i a31
i P32' a33
_
§17
217
DETERMINANTS
The rule of signs above applies only to determinants of order 3; for determinants of higher order we do not have simple rules.
Finally let us derive some special properties of determinants (of endomorphisms and of matrices).
THEOREM 17.7. Let X be an n-dimensional linear space over A and A(n,n) the matrix algebra of all square matrices of order n over A. Then for any endomorphisms 0 and >G of X and any matrices M and N of A(n,n) the following statements hold. (a) det(O) = 1 if 0 = ix, the identity endomorphism of X. (a') det(M) = 1 if M = In , the identity matrix of order n. (b) det(X¢) = Xndet(O) for any Ac=A. (b') det(XM) _ An det(M) for any XEA. (c)
det(lf/oo) =
(c') det(MN) = det(M)det(N). (d) det(¢) * 0 if and only if 0 is an automorphism and in this case det(O-1) = 1/det(0). (d') det(M) * 0 if and only if M is an invertible matrix, and in this case det(M'') = 1 /det(M). (e) det(¢) = det(0*) where 0*is the dual endorphism of 0. (e') det(M) = det(Mt) where Mt is the transpose of M. Proof. (a), (b) and (c) are direct consequences of the definition. The corresponding statements follow from the relation det(0) = det(M(O)). (d) If ¢ is an automorphism, then it follows from (a) and (c) that 1 = det(¢)det(¢-' ). Therefore det(0) * 0 and det(O"') = 1 /det(O). Conversely
if det(¢) * 0, then for a determinant function A
and a base (x,,
xn) of X, the equation
A(0(xl), ... , 0(x, )) = det(O)A(x
,
... , xn)
holds. Since both factors on the right-hand side are non-zero, we get
A(Xxl ),
... ,
¢(xn ))
0 0,
which means that the vectors
0(x1), ... , O(xn) are linearly independent. Hence 0 is an automorphism. Clearly (d') follows (d).
(e') Let M = (llij)i,j=j, ... , n and Mt = (i'tj)t,t=1, for all indices i and j. Now det(M) = Qn sgn(a)i. 6(1) ... µnu(t )
µi j
... ,
n. Then
218
VI
and
MULTILINEAR FORMS
det(Mt) = E sgn(T)p'i r(j) TES,
µ'nT(n)
_ ; sgn(T)ELT(i)i ... Ar(n)n.
res,
Now for each summand sgn(v)µiv(i) ... µnv(n) of det(M) there corresponds a unique summand sgn(T)MT(1)i Wr(n)n, where
T = a ; and vice versa. For these corresponding summands, we have AT(n)n
sgn((7) = sgn(T) and µT(1)1
therefore det(M) = det(MI). (e) Since M(O*) = (M(O)), we get
= 1 ia(1) ...
An
det(O) = det(M(o)) = det(M(0)1) = det(M(O*)) =det(O*). E.
Some useful rules The determinant of a square matrix all a12...ain
A=
a21 a22 ... a2n
ant ant
ann
was defined in the last subsection as
det(A) = Zsgn(a)aia(1) ... ano(n). R
This is however capable of another interpretation. Consider the ndimensional arithmetical linear space An together with its canonical base (e1, ... , en ). By 17.9, there is a unique determinant function A of An such that A(e1,
..
,
en) = 1;
therefore det(A) = A(a1 *,
where al* _ (ail
... ,
an*)
, at2i ... , atn) is the i-th row vector of A for each
i = 1, 2, ... , n. Hence the determinant det(A) of a matrix A is a
determinant function of the row vectors of A. From this certain rules of practical computation of determinants follow:
219
DETERMINANTS
§17
(a) (b)
det(A) = 0 if A has two equal rows. det(A) _ -det(A') if A' is the matrix obtained from A by a transposition of two rows.
(c)
det(A) = det(A") if A" is the matrix obtained from A by adding to one of its rows a linear combination of the other rows.
(d) X det(A) = det(A"') if A"' is the matrix obtained from A by the multiplication of one of its rows by a scalar X.
Similarly, det(A) is a determinant function of the column vectors of A and from this similar rules are derived.
Finally we see that employing appropriate elementary transformations (see § 15C) we can bring the matrix A into a triangular form
612 .....Eln 622 .....62n E _
0
l 0...........Enn J where all terms below the diagonal are zero, such that
det(A) = det(E) = E11 E22 For example, we have 1
-2
4
2
1
1
0
5
2
E-i
1
-2
0
5
0
5
.. Enn
1
-2
1
-7
0
5
-7
2
0
0
9
1
= 45.
Using a similar argument, we see that if a square matrix A of order n is in the form B C A = O D
where B is a square matrix of order p, D is a square matrix of order q, C is a (p, q)-matrix and 0 is the zero (q, p)-matrix, then
det(A) = det(B)-det(D).
220 F.
VI
MULTILINEAR FORMS
Cofactors and minors
Another way of evaluating the determinant of a matrix A = a** is provided by expansion by a row. This method is effective in the sense that it works; it is not efficient for determinants of order 5 or higher, because of the amount of arithmetic involved. The same is true for CRAMER's rule described below. Let i be a fixed row index and A the determinant function of An such that A(e1 i . , e,) = 1 at the canonical base (e1 , ... , en) of An . Then for each j = 1, ... , n, we have
Aij = A(al...... at-1.) e1,
all ai-1
1
ai+1*,
...
a1 j-1
a1
...
ai-1 j-1
ai-1
, an.)
a1 j+1
j
j
ai-1 j+ 1 0
a1 n
...
ai-1 n
...
0
0
...
ai+i i
...
at+i j-i
ai+1 j
a,+1 j+1 ...
ai+1 n
an i
...
an j-1
an j
an j+1
an n
all
... ai j-i
0
ai /+1
...
ai-i 1
...
0
ai-1 j+1
... ai-i n
0
...
1
0
...
0
ai+1 I
...
ai+1 j-1
0
ai+1 j+1
...
ai+1 n
an j-i
0
an j+1
an 1
0
ai-1 j-I 0
1
ai n
an n
§17
a, ,
1
ai , ai+,
... ... ...
an 1
= A(a*1,
... ,
221
DETERMINANTS
a1 j-1
0
a, j+,
...
ai-1 j-1
0
ai-1 j+1
...
ai j-1
1
ai+1 j-1
0
ai j+, ... ai+, j+1 ...
an j-1
0
an j+1
ei, a*j+,, ... )
a1 n
an n
asn)-
We call A; j the cofactor of the term aij. A; j is the determinant of the matrix obtained from A when we replace the i-th row ai* of A by ej and the j-th column a,j of A by ei, i.e., all terms of the i-th row and
of the j-th column are changed to 0 except the term aij which is to 1. In terms of cofactors, we get for each row index i n
det(A) = EaijAij j=i
,
for det(A) = 0(a, *, .. , an * ) A(a,*, ... , ai-i*, E aijej, ai+,*, ... , an*) n n Z.aijA(aj*, ... , ai-1*, ei,ai+1s, ... , an*) = EiaijAij. .
This is called the expansion of det (A) by the i-th row. The determinant Aij of order n is essentially a determinant of order n-1. By shifting the rows and columns in Aij we get a, ,
...
a1 j-,
0
a, /+1
ai-1 1
... ...
ai-1 j-1
0
ai-1 j+1
ai+1 ,
...
ai+, j-1
0
ai+, j+1
an ,
...
an i-i
0
an i+,
Aij=I 0
0
1
... a, n ai-1 n
0
0
... .
ai+1 n
an n
222
VI
MULTILINEAR FORMS
0 .................................0
1
all
0 = (-1)` +j
ai-1
... ...
1
ai+1 1
a1 i+I
ai-1 j-1
ai_1 j+1
ai+1 j_1
ai+1 j+ 1
an 1
...
ai+11 ...
...
an 1
... ...
ai-1 n ai+1 n
an n
an j+1
all
ai-11
a1 n
a1 j-1
a1 j_1
ai j+1
...
a1 n
a1-1 j-1
ai-1 j+1
...
ai-1 n
aJ+1 j_1 ai+1 i+1
...
ai+1 n
...
an n
an j_1
an j+1
where Mij is called the minor of the term ai, which is the determinant of the square matrix of order n-1 obtained from A when we delete
the i-th row and the j-th column of A. Consequently we have an expansion of det(A) by its i-th row: n
det(A) = I (-1)ai1Mij. j=1
For example a
R
7
S
a'
a R 7q am
a'
-0
=a
7'
a" ,r a"'
6.,
S'
61,
s'n
0N, ,1 m 6' at
+7
all
a°'
0'
5'
got f 11?
s,n
at
(3'
7'
§ 17
223
DETERMINANTS
Similarly the expansion det(A) by the i-th column is:
det(A) = E (-1)i+jajiMji .
Pi
The minors and cofactors are also useful in the evaluation of an inverse matrix. For each square matrix of order n, A = a**, we define
the adjoint matrix ad(A) _ (3** as the square matrix of order n whose terms are :
Qij =(-1)i+1Mji = Aji
(i, / = 1,
. . .
,
n).
In other words, ad(A) is the transpose of the matrix (Aij)f, j = 1, ... , n
of the cofactors of A. It follows from the expansion of det(A) that for each pair of indices i and k, we have n
n
n
EaijOik = EaijAkj = E aij0(al *+ ... , ak-1*, ej, ak+ 1*, ... , an *)
j=1
1=1
/=1
_ A(a1*, ... , ak-1*,
= A(a1*, ... , ak-1 and hence
j=1
aijej, ak+l*, ... , an
ai*, ak+l*, ... , an*) = det(A)Sik,
A - ad(A) = det (A )In
.
Similarly, using the expansion of det(A) by its columns, we get
ad(A) A = det(A)II. From this we derive that if the matrix A is invertible, then det(A) 0 0 and
A-1 = det(A)-' ad(A).
Finally we shall derive the well-known CRAMERS rule for the solution of systems of linear equations. Any system of linear equations
(E)
a11X1 + ... + a1nXn t31 a21X1 + ... + a1nXn = Q2 ........................
a., X1 +...+a.nXn =On
224
MULTILINEAR FORMS
VI
can be written as a matrix equation
AX* = Q* where
a1 n \ a2 l
A=
,an i
(X1 )
a2n
X* =
X2 R*
`Xn / ann J on) If (E) has a unique solution, then A has a rank equal to n and hence .
.
.
it is invertible. Therefore
X* = det(A)-nad(A)(i*
or
det (A) X, =
iI1Ai1Pi
On the other hand Aii = A(a*1 , ... , a*1_1 , ei, a*1+1, .. , a*n) where (el , ... , en) is the canonical base of A" and A is the
determinant function of An such that A(el, ... , en) = 1. By the n-linearity of A, we get n
iEQiAii = 0(a*1
,
a*i-i , R*, a*i+
and hence the solution of (E) is given by f i all ... al i-1 01 al i+1 ... a21
a2i-1
R2
a2 i+1
,
... ,
a*n )
ain
... a2n
...............................
and
ani-1
Rn
ani+i
ann
i=1,.. ,n
Xi =
all ... ai n all ... a2n and ... ann
225
DETERMINANTS
§17
where the denominator is the determinant of the coefficient matrix A while the numerator is the determinant of the matrix obtained from A on replacing the i-th column a*, by the column matrix /3*. These equations are know as CRAMER s rule.
G. Exercises 1. Determine the parity of the following permutations (a)
(b)
(1
2
3
4
5
6
7
8
9)
1
3
4
7
8
2
6
9
5
/1
2
3
4
5
6
7
8
9)
2
1
7
9
8
6
3
5
4
1
2
.
(c)
2.
.
.
.
.
.
n-1
nl
2
1
1
2
.
.
.
.
.
.
.
.
n-l
n
2
3
.
.
.
.
.
.
.
.
n
1
/ /
Find i, j, k, m such that (a)
(b)
3.
.
n-1........
n
(d)
.
2
3
4
5
6
7
8
9
2
7
4
i
5
6
j
9
/1
2
3
4
5
6
7
8
l
k
2
5
m4
91 is odd.
8
9
7
ll(1
Let a = ( (a)
1
. is even,
23456
2 3
1
4 6 5 ) be an element of S6.
Find the least positive integer n such that a" is the identity (n is called the order of a).
(b) Let G = {a, a2, a3,
...
, all }. Show that G is a group with
respect to composition. (c)
Find mutually disjoint subsets J1 , J2, J3 of 16 = 11,2,..., 6) such that (i) JI UJ2UJ3 = 16, (ii) TV,) = J1 for all rEGand i = 1,2,3.
226
VI
MULTILINEAR FORMS
Evaluate
4.
(a)
1
2
6
-2
7
4
-1
19
3
(c)
-3
5
a+6 203
(b)
14
2
a+7
3a 0+2a
0+6
6+i3 'y-j3
7-6
y-26
a+36
6
9-6 7-6 a+7
1
1
1
1
6
-5
3
5
4
2
0
7
3
-8
2
4
2
0
-1
2
3
-1
3
0
0
5
2
1
8
-2 -4
-2 -4
2
-3
6
(d)
-2 -4
0
-5
4
a+6
-2 -4
4
4
2
4
2
0
2
0
-2
4
Prove that
5.
a+(3
a/3
a+(3
1
0
0 a13
a+13
1
=
an+1 - Rn-1
a- R
1
I
a+Q
where all the unspecified entries are zero. 6.
Evaluate
(a)
246
427
327
1014
543
443
-324
721
621
(b)
l+x
1
1
1
1
lx
I
I
1
1
1
1
l+y 1
1
ly
§17
7.
Prove that
I all ...
Cj1.
aii Cll
711...71]
17k 1
8.
... 7kj
0--- 0
0...0 RI1
ail.
131 k
R11...MIk
a1;
akI...akk
...
a71
13k I... Rkk
Evaluate det(a. j) in each of the following cases. (a)
at j = 1 for i = j ± 1 and at j = 0 otherwise.
(b) a = I (c) 9.
227
DETERMINANTS
for i < j and for i = j+l ; a= j = 0 for i = j and for
a1j =0fori=janda1j= 1 fori#j.
Let a1 = a, a2= a+S, a3= a + 28, ... , an= a + (n-1)S. Show that the value of the cyclic determinant a1
a2
a2 a3
Ian is
an-1
an
a3
an
a1
a4
a1
a2
011
(-1)n(n-1)12(ns)n-1(a+
.
.
.
.
. an-2
an -1
i 6).
10. Show that the value of the Vandermonde determinant 1
aI
a21
l
a2
a2
an-1 1 .
.
.
a2-I
is
1 aq
ann-1
f(xt -xj) i>i
.
228
MULTILINEAR FORMS
VI
11. Let M,,. .
.
, Mm
be square matrices. Show that the determinant
of the matrix M, M2
M =
M) consisting of blocks M1, . . . , M, on the diagonal and blocks of zero matrics off the diagonal is equal to the product
detM=detM1 .... detMm. 12.
Let X be the linear space of all square matrices of order n over A and let AEX. Define LA : X -+ X to be the mapping such that LA (M) = AM. Show that LA is a linear transformation and that
detLA =(detA)". 13.
If a(t), b(t), c(t), d(t) are functions of R into R, we can form the determinant
a(t) b(t) c(t)
d(t)
just as with numbers. (a)
Let f(t), g(t) be functions having derivatives of all orders. Let -p(t) be the function obtained by taking the determinant
p(t) =
f(t) f'(t)
g(t)
f(t)
g(t)
g'(t)
Show that
'(t) _
f"(t) g"(t)
(b) Generalize (a) to the 3 x 3 case, and then to the n x n case.
§17
14.
229
DETERMINANTS
Let al , ... , an be distinct non-zero real numbers. Show that the function
ea,t, ea2t, ... , eant are linear independent over R. [Hint: make use of the method in Exercise 13.1 15.
Solve the following systems of linear equations by CRAMER'S rule.
2X, - X2 + 3X3 + 2X4 = 4
(a)
3X1 - 3X2 + 3X3 + 2X4 = 6
3X1 - X2 - X3 + 2X4 = 6 3X1 - X2 + 3X3 - X4 = 6. (b)
=0
5X1+6X2 X1 + 5X2 + 6X3 X2 + 5X3 + 6X4
X3+ 5X4+ 6X5 X4 + 5X5
=0 =0 =0 = 1.
16. Show that the solutions of a system of r homogeneous linear equations
a1ixi+ .. . +amXn=O ai 1 X 1 +
.
.
.
.
+
Xn = 0
of rank r can be obtained by the following method due to FROBENIUS. Augment the coefficient matrix to a square matrix , n such that det A 0 0. Then the vectors A = (ai1)i,1= 1, Ar+i,n) (i = 1, ..., n-r) where Ai1 (Ar+i,1, Ar+i,2, denotes the cofactor of ail are n-r linearly independent solutions of the system.
...,
CHAPTER VII
EIGENVALUES
Given a single endomorphism a of a finite-dimensional linear space X, it is desirable to have a base of X relative to which the matrix of a takes up a form as simple as possible. We shall see in this
chapter that some endomorphisms can be represented (relative to
certain bases) by matrices of diagonal form; while for every endomorphism of a complex linear space we can find a base relative to which the matrix of the endomorphism is of JORDAN form. In § 18
we give a rudimentary study on polynomials needed in the sequel. Characteristic polynomials will be studied in § 19. Finally we prove the JORDAN Theorem in §20.
§ 18 Polynomials
A. Definitions Let A denote again the set R of all real numbers or the set C of all complex numbers. As before elements of A are referred to as scalars. Consider the set S of all infinite sequences f = (ao, a1 , a2,
. .
.) = (a1),= 0,1,2, .. .
of scalars of finite support, i.e. sequences for each of which an index
n exists such that a; = 0 for all i > n. For any two elements
f = (ao, a1 i a2, ...) and g = (00, (31 i R2' ..) of S and any scalar X of A,
we define the sum f + g as the sequence
f+g=(ao +(30,a1 +31,a2 +R2,.. .) and the scalar multiple Nf as the sequence
Xf=(Map,Xal,Xa2,...). Then it is easy to see that both f + g and Xg are again elements of S and that S is an infinite-dimensional linear space over A with respect to the composition laws defined above. The zero vector of the linear space S is then the sequence 0 = (0, 0, 0, ...) and the additive inverse
-f off is the sequence -f = (-ao, -a1, -a2, ...). 230
§18
POLYNOMIALS
231
Let us now introduce another internal composition law in S, called multiplication. For f and g of S, we define their productfg as the sequence fg = (7o , 71 , y2, . ) where
7k= E a1
for all k= 0,1,2,...
1+j=k
Thus the terms 7, of the product fg are defined as follows: 7o = ao go 71 = a0R1 + a1 go 72 = a0(32 + a1 a1 + a2 go
73 = a°a3 +a1 Q2 +a2 Q1 +a3ao
Obviously fg is an element of S. It is readily verified that S satisfies, besides the usual axioms of a linear space, the following properties: (i)
f(gh) = (fg)h;
(ii)
(an (lig) = (X 1) (fg) ;
(iii)
fg = gf; (iv) f(g + h) = fg + fh. Following the customary terminology in algebra, we say that S together with the three composition laws constitute a commutative A-algebra. The sequence 1
= (1,0,0, ...)
is the unit element of S, by which we mean to say that If = fl = f for all f eS. Moreover, for any two sequences f and g, fig = 0 if and only
iff=0org=0.
Consider the sequence T = (0, 1, 0, 0, ...) of S and its powers: T°= (1, 0, 0, 0, ... )
T'= (0, 1,0,0, ...) T2= (0, 0, 1, 0, ...)
Let f = (a°, a1 i a2,. ..). If a, = 0 for all i > n, then we can write fin the more familiar form:
f=a"T" +an-1T"-1 +...+a2T2 +a1T' +a°T°.
232
VIl
EIGENVALUES
In this notation, we write S = A[T] and call it the commutative A-algebra of all polynomials in the indeterminate T with coefficients in A. Addition and multiplication can be expressed as follows:
EaiT`+ E(3iT4 = E(ai +Oi) T`; AEaiT` = EXaiT`;
(EaiT`)(E(3iT`) = E7kTk where 7k
E aigi
i+j=k
It is the custom to write simply T for T1 ; they are just different notations for one and the same polynomial (0, 1, 0, . . .). Consider now the subspace A[ T] o of all polynomials of the form a° T°. Then it is easily seen that A[T] o is isomorphic to A both as linear spaces and as A-algebras under the correspondence a. To --> ao. Moreover
(aoT°) ((3,T" +... +(31 T+(30T°) = ao(an T" +- -+(3T+(3oT°). Therefore we can further write ao in place of ao T °; hence every polynomial in A[ T] has the form
anTn +an_1T"-' +...+a1 T+ao. B. Euclidean algorithm Let f = (a0, a1, (X2, . . .) be a polynomial different from zero. Then by definition there exists a unique non-negative integer n such that ai = 0 for all i > n and an : 0. This integer n is called the degree of
the polynomial f and it is denoted by deg f. A polynomial f of degree n can be written in the simplest form as
f=anT" +an_1Tn"1 +...+a1T+ao where the non-zero coefficient an is called the leading coefficient of f and the last term a° is called the constant term of f. The degree of a product and a sum satisfies the following equality and inequality: (i) (ii)
deg (fg) = deg f + deg g deg(f +g) < max(deg f, deg g).
We observe that the degree of the zero polynomial is not defined. We may find it convenient to set the degree of 0 to be the symbol -00 for which we make the convention that -oc + n = -°°, -°° < n for all integers n, and -o + (-o) = -o, -. o < Under this convention, statements (i) and (ii) hold for all polynomials of A[T].
POLYNOMIALS
§18
233
For any two non-zero polynomials f and g of A[ T] we say that f is divisible by g, g divides f, f is a multiple of g, g is a factor of f or g is a divisor off if f = gh for some polynomial h. If f is divisible by g then deg g < deg f. The converse of this is not true. THEOREM 18.1. (Euclidean algorithm) Let f and g be two non-zero polynomials of A[T] . Then there exists polynomials s and r of A[T] such that f = sg + r and deg r < deg g. PROOF. If deg f < deg g, then f = Og + f satisfies the requirement of the theorem. Therefore we assume that deg f > deg g. +(31T+go +a1T+a0 andg=/3mTm+ Letf=an T" + T"-,, where n > m > 0 and an * 0, gm 0. If s 1 = In then the degree of the polynomial
r1 =f-s1g clearly satisfies deg r1 < deg f. If deg r1 < m, then the desired result holds since
f=slg+r1. Otherwise we can apply the above operation to the polynomial r1 and g and get a polynomial s2 of A[ T] such that the degree of the polynomial r2 = r1 - s2g
satisfies the inequality deg r2 < deg r1. Then
f = (s1 + s2)g + r2 and so if deg r2 < m the desired result holds. Otherwise we can repeat the process and arrive after no more than n-m steps at r3 = r2 sag
rk = rk-1 skg where deg rk < m. Putting s = s1 +
f = sg+r and so the theorem is proved.
.. + Sk and rk = r we get
234
VII
EIGENVALUES
Let us now study an application of the euclidean algorithm. For n . . . , fn of A [Ti, we consider the subset F consisting of all polynomials of A[ T] of the form arbitrary polynomials f 1, f2 ,
r1f1 + r2f2 + . .. + rn fn
where r, are arbitrary polynomials of A[T] . F is called the ideal of A[ T] generated by the polynomials f1 , f2, ... , fn and it satisfies the following properties: (i) (ii)
if f and g belong to F, then f -g belong to F. if f belongs to F, then rf belongs to F for all polynomials r of A[T].
In general any non-empty subset F of A[T] that satisfies (i) and (ii) above is called an ideal of A[T] .
Clearly the subset 0 consisting of the zero polynomial alone is an ideal of A[T], and A [ T] itself is an ideal of A[T]. The subset of all multiples of a fixed polynomial f of A [ T] is also an ideal of A [ T1. The next theorem shows that every ideal of A [ T] can be so obtained. THEOREM 18.2. Every ideal F of A[ T] can be generated by a single polynomial g of F, i.e. F = {sg: seA[T] }. PROOF. If F = 0, then the theorem is trivial. Suppose now that F contains non-zero polynomials. In F there exists a polynomial g such that
0 < degg<degf for all non-zero polynomial for F. The theorem is proved if we show f = sg for all non-zero feF. By 18.1, there exist polynomials r and s of
A[ T] such that f = sg + r and deg r < deg g. Since F is an ideal, it follows that r belongs to F. Hence deg r = -00, i.e., r = 0, proving the theorem. C. Greatest common divisor
Let f1, f2, ... , fn be polynomials in A [ T1. A polynomial h of A[T] is called a greatest common divisor of the polynomials
f1, f2, ... , fn if (i) h divides each of f1i f2, ... , fn and (ii) any polynomial that divides each of f1 , f2, ... , fn divides h. If h and if are the two greatest common divisors of the same polynomials f1, f2, . . . , fn , then it follows from the definition that h = ph' and h' = qh for some non-zero polynomials p and q of A[T]. Since
§ 18
235
POLYNOMIALS
h = pqh, it follows that both p and q are non-zero constants. There-
fore the greatest common divisor of the non-zero polynomials f, , f2, ... , f,, is uniquely determined up to a non-zero constant factor. We say that the non-zero polynomials f, , f2, . . . , f are relatively prime if they have I as their greatest common divisor. We
shall now use Theorem 18.2 to prove that the greatest common divisor h of the non-zero polynomials f, , f2, . . , f can be written as h = r, f, + r2f2 + . + r f for some polynomials ri of A[T]. .
Indeed, let F be the ideal of A [ T] generated by f,, f2, h a polynomial of F that generates F. Then h = r, f, + some
f, and + r f for
. ,
and h is obviously a greatest common divisor of In particular if f, , ..., f are relatively prime, then
1 = r, f, + ... + rn f for some r;
.
D. Substitutions In the linear space A[T], the family (1, T, T2, ...) constitutes a base. Therefore for each scalar A of A a linear transformation OA: A[ T] -> A is uniquely defined by
OX(T) = A' i = 0, 1, 2, .. . where T °= 1 and A° = 1. For a polynomial f = a T" +
+ a, T + a°,
we shall write OA(f) = f (X), i.e.
f(X) = a A" + ... + a,A+a°. Since OA is a linear transformation, we get for f, geA[T] and A, µcA
(f +g)(A) = f(A) + g(A) (µ1)(A) = A(Ax))-
By straightforward calculation, we can further verify that (fg)(X) = f(A)g(A)
Each polynomial in Ker¢A is said to have A as a root or zero; in other words is a root of feA[T] if f(X) = 0. The following theorem characterizes roots of a polynomial in terms of divisibility. THEOREM 18.3. Let f be a non-zero polynomial of A[ T1. A scalar A of A is a root off if and only if f is divisible by the polynomial T -A.
236
V11
EIGENVALUES
PROOF. If f is divisible by T - X, then f = (T- X)gfor some polynomial gin A [ T] . Since f(A) = (X - X)g(X) = 0, X is by definition a root of f. Conversely if X is root of f, then deg f > 1, since a non-zero constant
polynomial has no root. Applying the euclidean algorithm to f and T - X, we get polynomials s and r of A [ T] such that f = s(T - X) + r and deg r < 1. Since X is a root of f, we get 0 = f(X) = s(X)(X - X) + r(X).
That means X is a root of the constant polynomial r; therefore r = 0. Hence f is divisible by T - X.
Consequently a polynomial fin A[ T] of degree n > 1 can have at most n roots. A root X off is said to have the multiplicity m (m > 1) if m is the largest integer such that f is divisible by the polynomial (T - X)m . Accordingly a root of multiplicity 1 (respectively > 1) is called a simple (respectively multiple) root. A polynomial with real coefficients may fail to have real roots; for
instance, the polynomial TZ + I has no real roots. On roots of polynomials with complex coefficients, we have the following information: every polynomial f in C[T] of degree n > I has exactly n roots in C when each root is counted by its multiplicity. This is the so-called fundamental theorem of algebra. We shall make use of this theorem in a few places in this and the following chapter. The reader is asked to accept this theorem whose proof requires the use of results in topology or analysis. Recall that for every linear space X over A we have an associative algebra End(X) of all endomorphisms of X, in which the product ra of any two endomorphisms r and a is defined as the endomorphism To v.
For any endomorphism a of X, we define a linear transformation >ya: A[ T] --> End(X) such that
ya(l) = ix and l1a(T`) = ai for i = 1, 2, ... . + a1T + ao we get an Hence for each polynomial f = a" T" + endomorphism ip(f) of X which we write as .
f(a) = a, a' +.. +a1a+aoix. If there is no danger of confusion we shall write the last term aoix in + a 1 a + ao . f (a) simply as ao ; thus f (a) = a" v" + Analogously for each square matrix M over A of order r and each polynomial f of A[ T] we get a square matrix of order r:
f(M) =
a1M + aol,.
§18
POLYNOMIALS
237
Here again, we may delete the identity matrix Ir from the term aolr + a1M + ao. if it is clear from the context; thus f(M) = aOW + . REMARK 18.4.
In our definition of polynomial, we only make
use of some very fundamental algebraic properties of A (A = R or A = Q. A definition of polynomial with coefficients in the algebra
End(X) or in the matrix algebra A(',') can be given with a few obvious modifications. For instance, a polynomial with coefficients in End(X) is a family
f =(00,01,02 ...)=(0i)i=0, 1, 2, of endomorphisms of X for which an index n exists such that ai = 0 for all i > n. If we put
T = (0, ix, 0, ...), then write f as
... f=anTn + an-1Tn Addition and multiplication are defined similarly. We notice that since multiplication in End(X) is not commutative, for two polynomials f and g with endomorphisms as coefficients, fg may be different from gf. Finally if f = an Tn + ... + a1 T + ao is a poly1+
nomial in the indeterminate T with coefficients in End(X) and T is an endomorphism of X, then f(T) = anTn + ... + a1T + a0
is an endomorphism of X. In particular we say that r annuls f if f(r) = 0. Polynomials with matrix coefficients are analogously defined. E. Exercises
1. Find in each of the following cases q and r in R[T] such that f = gq + r with deg r < deg g.
f = T 3 -3T2-T- 1, (b) f = T 4 -2T+5, (a)
(c) f=3T4 -2T3 - T+ 1,
g=3T2 -2T+ 1; g=T2 +T+2; g=3T2 - T+ 2.
2. In each of the following cases, find necessary and sufficient conditions on a, 13, such that (a)
T3 + (3T + 1 is divisible by T 2 + a T - 1,
(b)
T4 + RT2 + 1 is divisible by T2 + aT + 1.
238
VII
EIGENVALUES
3. In each of the following cases find the greatest common divisor (f, g) in R [ T1. Find also u and v in R [ T] such that of + vg = (f, g).
(a) f=T4 +2T3 -T2-4T-2,g=T4 +T3 -T2-2T-2; (b) f=3T3 -2T2 +T+2, g=T2 -T+ 1; g=(1 -T)4. (c) f=T4 is said to be irreducible if it cannot be written as a product p =fg with f,geR[T],degf <degp and degf<degp.
4. p E R [ T]
(a)
Let p, f, g e R[T] and let p be irreducible. Prove that if fg is
divisible by p, then f is divisible by p or g is divisible by p. (b) Prove that every f E R [ Ti can be written as a product of irreducible polynomials:
f=P1P2 (c)
. . .
Pr.
Prove that if .f =P1P2 ... Pr = gig2 ... qs
where all p, and q, are irreducible polynomials of degrees >1,
then (i) r = s and (ii) after suitable renumbering p, = c;q; for i = 1,
... ,
r where c, is a non-zero constant.
5. Let f r= R[T]. Suppose f has degree n. Prove that f has at most n roots.
6. Let f, g E R[T] be of degree < n. Suppose for n+l distinct real numbers al, a2, . . . , an + 1 , Pa,) = g(a,) for i = 1, 2, ... , n+l. Prove that f = cg where c is a non-zero constant.
7. Let P, (i, j = 1, ... , n) be polynomials in one indeterminate T. Let c,i (0 0) be the leading coefficient of Pip Assume the deg P11 = d; for all i, j = 1, . . . , n. Show that P11
Pin
= cTd + terms or degree < d. I Pn1
...
An I
where c=det(cii)andd=d1+ ... +dn.
§ 19
239
EIGENVALUES
§ 19. Eigenvalues
In the rest of this chapter, we shall concern ourselves with a more detailed study of endomorphisms of a finite-dimensional linear space.
Intuitively speaking, we shall take an endomorphism apart to see how its components work. Technically, we decompose the linear space X in question into a direct sum of invariant subspaces upon
each of which the endomorphism a operates in a way that is relatively easy for us to handle. A. Invariant subspaces Let a be an endomorphism of a linear space X and Y a subspace of X. We say that Y is invariant under a if a[ Y] C Y. If Y is invariant under
a, then a defines an endomorphism a': Y -+ Y where a' (y) = a(y) for all y E Y. Trivial invariant subspaces are the zero subspace 0 and the entire space X. Moreover, both Ker a and Im a are invariant under a. EXAMPLE 19.1. Let a be the endomorphism of R2 defined by a(e1) = X1 e1 and a(e2) = X2e2 where (e1 , e2) is the canonical base. If X1 * X2, then the only 1-dimensional invariant subspaces are those generated by e1 and by
e2. In the case where X1 = X2, every subspace of R2 is invariant under a. EXAMPLE 19.2. Let a be an endomorphism of a linear space X over A.
The existence of a 1-dimensional subspace invariant under a is equivalent to the existence of (i) a non-zero vector x of X and (ii) a scalar X of A such that a(x) = Xx. That means the existence of a non-zero vector x in Ker(a - X) for some scalar X of A. Consider now the endomorphism a: R2 --+ R2 defined by
a(e1)= -e2 and a(e2)=e1. Then for any real number X the endomorphism a - X is an automorphism. Therefore R2 and 0 are the only subspaces of R2 invariant under a. EXAMPLE 19.3. Let
be the real linear space of all polynomials of R [ T] of degree < n and D the differential operator, i.e.
D(a" T" + ... + a1 T + a0) =
na"T"-1
+ ... + 2a2 T+a1.
Then for every k < n, the subspace Pk+ 1 of P. + 1 of all polynomials
240
V11
EIGENVALUES
of R[T] of degree < k is invariant under D. Furthermore, these are the only subspaces of Pn+ invariant under D. 1
If X is a finite-dimensional linear space over A and Y is a subspace
invariant under an endomorphism a of X, then relative to some bases of X, the matrix of a takes up a simple form. For instance, if B = (x 1, . . ., x n) is a base of X such that B'= (x 1, ... , x,.) is a base of Y, then
a(xi)=aiiXi +...+airxr and
a(xj) = aj1 x1 + ... + ajrxr + aj r+i xr+ i + ... + ajnxn
for i = 1, 2, ... , r and j = r +1, ... , n. Therefore the matrix MBB of a relative to the base B is in block form:
all
art
MBB (a) =
.
.
.
a1r
.
.
.
arr
ar+ii .
.
. ar+1r
0 ................ 0 .................... .................... .................... 0 ................ 0
ar+i r+1 ........ ar+1 n ....................
ant ...... anr
an r+1 ........... an n
We observe that the block at the lower left corner need not be zero generally. Suppose the block at the lower left corner is zero. Then a(xj) = ajr+1,Xr+ 1 + ... + ainxn
for all j = r+l, ... , n. Consequently the subspace Y' generated by the vectors xr+ 1, ... , xn is invariant under a, and X = Y ® Y' is a direct sum of two invariant subspace.
§ 19
241
EIGENVALUES
Conversely suppose X = Z1 ® Z2 where Z1 and Z2 are invariant under a. If B 1 = (z 1 , ... , z,) and B2 =(Z,-I," .. , zn) are bases of Z1 and Z2 respectively then a(z1) = oil Z1 + .
1 = 1, ... , r
Ri,Z,
a(Z1) =gir+1Zr+1+
.. +ginZn
1
r+l,..., n.
Therefore the matrix MBB(a) of a relative to the base B = (z1, ... , z,,
Zr+1,..., Zn)is
7 a11............ Qir ..................
0,1 ............Orr 0 .............. 0 .................. .................. .................. 0 .............. 0
0 ............... 0 .................. .................. .................. 0 ............... 0
Or+
lr+1 ...... Q,+ 1 n
Onr+1.......... Qnn
where the upper right and lower left blocks are both zero. It is easily recognized that the upper left block is the matrix MB1 B1 (al) and the lower right block isMB2B2 (a2) where a1 : Z1 -+Z2 and a2: Z2 -> Z2
are the endomorphisms defined by a on the invariant subspaces. Thus r-
MB1B1(a1)
0
0
MB2B2 (a2)
MBB (a) =
----------
It follows that if Z1 is further decomposed into a direct sum
242
VII
EIGENVALUES
Z, = UI ® U2 of subspaces invariant under a, (and hence also under a) and Z2 is decomposed into a direct sum Z2 = U3 ® U4 of subspaces
invariant under a2 (and hence also under a), then relative to a suitable base of X the matrix of a will have the form
where A; are the matrices of endomorphism on U; defined by a and all unspecified terms are zero. The above discussion shows that decomposition of the linear space X into a direct sum of subspaces each of which is invariant under the
endomorphism in question is a suitable method for tackling the problem posed at the beginning of this chapter. Taking this as our point of departure, we shall branch out in different directions in the remainder of this chapter. B. Eigenvectors and eigenvalues
We shall now study in more detail 1-dimensional invariant subspaces. This will lead to the central concepts of this chapter: namely, eigenvectors and eigenvalues. Let Y be a 1-dimensional subspace of a linear space X over A and
x be a non-zero vector of Y. Then Y is invariant under an endomorphism a of X if and only if a(x) belongs to Y, i.e. a(x)= Ax for some scalar X of A. In this case a(y)=Ay for all vectors y of Y. 19.4. Let X be a linear space over A and a an endomorphism of X. An eigen vector of a is a non-zero vector x of X such that o(x) = Ax for some scalar A of A. An eigen value of a is a scalar A of A that satisfies an equation a(x) = Ax for some non-zero vector x of X. DEFINITION
Eigenvectors are also called proper vectors or characteristic vectors; eigenvalues are also similarly called proper values or characteristic values.
243
EIGENVALUES
§ 19
It follows from the above discussion that if x is an eigenvector of
an endomorphism a, then x generates a 1-dimensional subspace invariant under a. Conversely, every non-zero vector of a 1-dimensional
subspace invariant under a is an eigenvector of a. From Examples 19.1, 19.2 and 19.3 we see that some endomorphisms possess eigenvectors whereas others do not. For any fixed scalar A of A, the existence of a non-zero vector x of
X such that a(x) = X x is equivalent to the existence of a non-zero
vector x of X such that (a - A)(x) = 0. Therefore we obtain the following useful theorem.
THEOREM 19.5. Let X be a finite-dimensional linear space over A
and a an endomorphism of X. Then for every scalar A of A the following statemen's are equivalent: (a) (b) (c)
A is an eigenvalue of a. Ker(a - A) 0 det(a - A) = 0.
C. Characteristic polynomials
Theorem 19.5 suggests that it is worthwhile to investigate the expression det(a - A). We have seen in § 17D that we can evaluate the determinant det(a - A) of the endomorphism a -'X with the aid of any base B = (x1, ... , xn) of X, i.e. det(a - A) is equal to the determinant
of the matrix MBB (a - A) of a - A with respect to any base
B = ( X-.1,
, Xn) of X. Thus if MBB(a) = µ* *, then we get
-A
A13
.
.
1121 µ22-A µ23
.
.
µ1 1
1112
.
.
.
.
.
µ1n
.
1A2n
det(a - A) =
llnl
.............: /Ann-A
244
VII
EIGENVALUES
If we replace X in this expression by an indeterminate T, we obtain a polynomial #11-T 1A12
A13
.
.
.
.
.
111n
A21 1122-T 112 3
.
.
.
.
.
µ2n
pa =
Ant
............. '1Mn-T
in the indeterminate T with coefficients in A. This polynomial pa that depends only on the endomorphism a (and does not depend on
the base B of X) is called the characteristic polynomial of the endomorphism a. It follows from the definition that the degree of the characteristic
polynomial pa is n, which is the dimension of the linear space X. The leading coefficient of pa is (-1)n and the constant term of pa is det(a). Thus pa = (-l)nTn + a.,t_1 Tn-1 + ... + al T + det(a). is called the trace of the endomorphism and is denoted by tr(a). Except in the exercises, we shall not make use of the trace of an endomorphism in the sequel. The most important property of the characteristic polynomial pa of an endomorphism a is that any scalar X of A is an eigenvalue of a if and only if it is a root of the characteristic polynomial pa The scalar (-1)n-lan_1 = (µl 1 +
-
-
of a, i.e. pa(X) = 0. Thus by means of determinants we have successfully reduced the geometric problem of the existence of 1-dimensional subspaces invariant under an endomorphism to an algebraic problem of the existence of roots of the characteristic polynomial. This, to some extent, is the justification for introducing determinants in § 17. Since polynomials with real coefficients do not always have real
roots, endomorphisms of a real linear space need not have (real) eigenvalues. Consequently, we may not be able to decompose the real linear space in question into a direct sum of 1-dimensional invariant subspaces. In the complex case, the situation is more promising. By the so-called fundamental theorem of algebra, every polynomial with complex coefficients has complex roots. But, as we
§19
245
EIGENVALUES
shall see later, even for endomorphisms of a complex linear space, it is not always possible for us to obtain such a simple decomposition. The characteristic polynomial pM of a square matrix M =p** over A of order n is similarly defined as the polynomial 1113
.
.
.
.
.
Pin
1121 1122-T 1123
.
.
.
.
'
92n
1111-T
1112
PM =
I
pnI
11nn-T
I
of A[T] . Eigenvalues of the matrix M are then the roots (in A) of pM. Therefore an endomorphism a of X and its matrix MBB(a) relative to any base B of X have the same eigenvalues. D. Diagonalizable endomorphisms
Let X be an n-dimensional linear space over A. Then we call an endomorphism a of X a diagonalizable endomorphism or semi-simple
endomorphism if a has n linearly independent eigenvectors. Diagonalizable endomorphisms are among the simplest of the endomorphisms. Indeed if x1i ... , xn are n linearly independent eigenvectors of a diagonalizable endomorphism a of X, then these eigenvectors form a base of X such that a(x;) = );x; for i = 1, 2, ... , n. Therefore the matrix of a relative to this base is a diagonal matrix
where all terms outside of the diagonal are zero. Furthermore the endormorphism a is completely determined by its eigenvalues X1, X2 ,
246
VII ,
EIGENVALUES
An ; by this we mean that a(x) = A 1 a 1 x 1 +
+ A,; a xn for
+anxn.
Conversely if the matrix of an arbitrary endomorphism a of X is a diagonal matrix relative to some base (x 1, x2 , ... , xn) of X, then the vectors x1 are linearly independent eigenvectors of a. Hence a is diagonalizable.
By means of the correspondence between endomorphisms and matrices, we can define diagonalizable matrix in the following way. We say that a square matrix over A of order n is diagonalizable if there exists an invertible square matrix P over A of order n such that the matrix PMP-1 is a diagonal matrix. It follows from results of
§14C that an endomorphism is diagonalizable if and only if its matrix relative to any base is diagonalizable.
Let us study one important case in which an endomorphism is certain to be diagonalizable. We lead up to this case by proving the following theorem. THEOREM 19.6. I f x1 , x2i ... , XP are eigenvectors of an endomorphism and the corresponding eigen values A1, A2 , then x1 , x2, ... , xp are linearly independent.
... ,
XP are distinct,
PROOF. For p = 1, the proposition is trivial since eigenvectors are non-zero vectors. We assume the validity of the proposition for p - I eigenvectors and proceed to prove for the case of p eigenvectors. If x1i X2, ... , xp are linearly dependent, then by induction assumption, we get
xp = a1 x1 + a2x2 +
.. + ap-1 xp_1.
Applying a to both sides of this equation, we get Apxp = a1 Al x1 + a2A2x2 +
. + ap_1Ap_1xp_1.
Subtracting from this the first equation multiplied by Ap, we are led to the relation: 0 = a1(A1- Ap)x1 + ... + ap-1 (Ap-1- Ap)xp-1
Since all the eigenvalues are distinct, we have A; - XP * 0 for i = 1, ... , p-1. The linear independence of x1, ... , xp_1 therefore implies that a1 = a2 = = ap _1 = 0 and hence xp = 0. But this contradicts the fact that xp is an eigenvector of a and therefore the assumption that the eigenvectors x 1i X2, ... , XP are linearly dependent is false.
From the theorem above, we derive a sufficient condition for an endomorphism to be diagonalizable.
§ 19
247
EIGENVALUES
COROLLARY 19.7. Let X be an n-dimensional linear space over A and
a an endomorphism of X. If the characteristic polynomial pa of a has n distinct roots in A, then a is diagonalizable.
In the language of the theory of equations, we can rephrase Corollary 19.7 as follows. If the characteristic polynomial pa of an endomorphism a of a linear space X over A has n simple roots where n is the dimension of X, then the endomorphism a is diagonalizable.
This, however, is not a necessary condition for a to be diagonalizable. For instance, consider the identity endomorphism ix of X. The characteristic polynomial of ix is (1 - 7)"; thus the n roots of this polynomial are equal to 1. But ix is a diagonalizable endomorphism.
Finally let us consider an example of a non-diagonalizable endo-
morphism whose characteristic polynomial has a root of multiplicity n. EXAMPLE 19.8. Let P be the n-dimensional real linear space of all polynomials of R[T1 of degree
+2a2T+a,. D(c,,_1T"-`+...+a,T+a0) = (n-1)ai_1Tn,+ Using the base (1, T, ... , T'-') of P we find the characteristic polynomial to be PD = (-1)n with multiplicity n, therefore all eigenvalues of the Tn.
0 is a root of PD endomorphism D are equal to 0. Hence a polynomial f of P is an eigenvector of the endomorphism D if and only if f is a non-zero constant polynomial. D is therefore non-diagonalizable since the maximal number of linearly independent eigenvectors of D is 1. However relative to the base 1, T, 21 T 2, T3, ... , (n 11 T" ' the matrix of the endomorphism D takes up the following form
1o
l OJ where all terms are zero except those immediately below the diagonal which are all equal to 1.
248
VII
EIGENVALUES
E. Exercises 1.
Find the characteristic polynomial and the eigenvalues of the following matrices. (a)
Ia1
0
0
0
a2
0
-0
0
...
an
J 1
(b)
2.
a21
a22
ant
2
0
ann
j
The following matrices are matrices of endomorphisms of complex linear spaces. Find their eigenvalues and their conesponding eigenvectors. (i)
(iii)
(v)
3
4
5
2
(ii)
1
1
1
I1
1
-1
-1
1
-1
1
-1
1
-1
-1
1
0
1
0
L_a
11
(-0
a
ro
0
11
1
0
0
Oj
(iv)
1
5
6
-3
-1
0
1
1
2
-1J
,
(vi)
CO
2
1l
-2
0
3
-1
-3
0)
§ 19
(vii)
r1
i
i
-2
(ix)
(viii)
1
i
-i
1
2i
11
2
L0 3.
249
EIGENVALUES
Show that the matrix
cos9 -sin cos9
sing
has no eigenvector in R2 if 9 is not a multiple of 7r. 4.
Find eigenvectors of cos9
sing
sing
-cos9
in RZ.
5. Which of the following matrices are diagonalizable over C? -'s
e-
el
0
0
1
0
0
1
0
0
0
1
0
0
1
0
0
0
1
-loo
0
0
1
10
0
0
0
OJ
0, -1 -.0
1
What about over R? 6.
Prove that all eigenvalues of a projection are 0 or 1.
7. Prove that all eigenvalues of an involution are +1 or-1. 8.
Let X be an n-dimensional linear space and p an endomorphism of X. Show that if tp has n distinct eigenvalues, then p has 2" distinct invariant subspaces.
9. Let p be an endomorphism of a complex linear space. Suppose gypm is the identity mapping for a positive integer m. Prove that p is diagonalizable.
250 10.
VII
EIGENVALUES
Let p be an endomorphism of an n-dimensional complex linear
space and let Al , ... , An be the eigenvalues of p. Find the eigenvalues of p2 (i) , (ii) ,p'' (if p is further assumed to be an automorphism), (iii) f(op) where f(T) is a polynomial.
11.
Let p be an endomorphism of a linear space X. Prove that if every non-zero vector of X is an eigenvector of gyp, then %p is a scalar multiple of the identity mapping.
12.
Let p and >y be endomorphisms of a complex linear space X. Suppose,poi = do gyp. Prove that
(i) if A is an eigenvalue of 1 then the eigensubspace XA consisting of all vectors x of X such that px = Ax is an invariant subspace of ,, (ii) p and >V have at least one eigenvector in common, (iii) there exists a base of X relative to which the matrices of p and , are both triangular matrices.
(iv) Generalize (iii) to a statement in which more than two endomorphism are involved. 13.
Let p be an endomorphism of a linear space X over A and let f e A[T] . Show that if A is an eigenvalue of gyp, then f(A) is an eigenvalue of f (gyp).
14.
Let p and > i be endomorphisms of a linear space X. Prove that tpo 1 and 4iotp have the same characteristic polynomial.
§20. Jordan Form In this section, we shall concern ourselves with endomorphisms of complex linear spaces exclusively. Our problem is still the same as in the previous § 19; that is finding a base relative to which the matrix of the endomorphism in question has the simplest form possible. We
formulate our result here as Theorem 20.12. The reason for the restriction to complex linear spaces is that we can make use of the fundamental theorem of algebra to ensure that an endomorphism of
an n-dimensional linear space has n (not necessarily all distinct) eigenvalues.
§20
JORDAN FORM
251
A. Triangular form
A triangular matrix (or a matrix in triangular form) is a square matrix A = a** of the form all a12
.
.
.
a22
.
.
.
ain
a2n
ann)
where all terms below the diagonal are zero, or of the form
Call a21
l_ an i
an!)
where all terms above the diagonal are zero.
In the following theorem we show that the matrix of any endomorphism of a complex linear space can be "reduced" to a triangular matrix with the eigenvalues forming the diagonal. THEOREM 20.1. To every endomorphism a of a complex linear space X, there is a base (x1 , ... , xn) of X such that a(x1)=a11x1 a(x2) = a21x1 + a22x2 COO - a31 x1 + a32x2 + a33x3
a(xn)=an1x1 +an2X2 +an3x3+... +annxn-
252
V11
EIGENVALUES
In other words, there is a base (x1, ... , xn), relative to which the matrix of a is in triangular form. Moreover the characteristic polynomial pa is given by
pa = (a11- 7)(a22-T) ... (ann- T). PROOF. We prove this theorem by induction on the dimension of X. For dim(X) = 1, the theorem is trivial. We assume the validity of the
theorem for any complex linear space of dimension < n-1 and proceed to prove the theorem in the case where dim(X) = n. Since we are dealing with complex linear spaces, we may apply the fundamen-
tal theorem of algebra to the characteristic polynomial pa. Let all
be an eigenvalue of a and x1 a corresponding eigenvector, i.e. a(x 1) = a, 1 x 1. Denote by Y the 1-dimensional subspace generated by x1 and by Z a complementary subspace of Y in X, i.e. X = Y ® Z. If we denote by 7r: X -> Z the projection of X onto its direct summand Z defined by
7r(x) = z for x = y + z where yEY, zEZ, then an endomorphism p of Z is defined by p(z) = Tr(a(z)) for all zEZ.
Since dim(Z) = n-1, the induction assumption is applicable to the complex linear space Z. Hence there exists a base (x2, x3, ... , xn ) of Z such that p(x2) = a22x2 p(x3) = a32x2 + a33x3
p(xn) =an2x2 + an3X3 + ... +annXn.
On the other hand it follows from the definition of the endomorphism p of Z that for each vector z of Z
a(z) = ax, + p(z). Therefore we get for the base (x1i x2, ... , xn) of X a(x1)= a11x1 o(x2) = a21 x1 + p(x2) a(xn) = an 1 x 1 + p (xn ).
§20
253
JORDAN FORM
This proves the first part of the theorem. The second part follows then immediately from the definition of the characteristic polynomial.
The matrix formulation of the above theorem is given in the following corollary.
COROLLARY 20.2. For every complex square matrix A of order n there is an invertible complex square matrix P of the same order such that the matrix PAP-' is in triangular form.
B. HAMILTON-CAYLEY theorem
As a first step towards the main result on JORDAN forms, we prove the famous HAMILTON-CAYLEY theorem on characteristic polynomials:
THEOREM 20.3. If PA is the characteristic polynomial of a square
matrix A over A (where A = R or A = C) of order n > 1, then PA (A)=0. PROOF. Let PA = 7nTn +
+ 7I T + yo. Then PA = det(B) where B = A - TIn. In evaluating the determinant det(B), we can make use of the results of § 17F. Thus
det(B)II = where ad(B) is the adjoint matrix of B which is the transpose of the matrix of cofactors B11 of B. We recall that B,j is (-1)`+' times the determinant of the (n-1,n-l)-matrix obtained from B by deleting its i-th row and j-th column. Therefore B.1 is a polynomial in T of degree
n-l. Consequently ad(B) is an (n, n)-matrix whose entries are all polynomials in T of degree n-1 and hence we can write ad(B) as a polynomial in T with matrix coefficients: ad(B) = Bn-1
+ ... + B,T + Bo
Tn-1
where the coefficients B; are (n, n)-matrices. It follows from
det(B)II = B. ad(B) that
(7nI,)Tn + ... + (7iln)T + (7oln) _
(A-TIn)(Bn-,Tn-i
+ ... +B,T+B0)
... + = Bn-,Tn + (ABn-1- Bn-2) (AB2 -B, )T2 + (AB, -B0)T + AB,. Tn-, +
254
VII
EIGENVALUES
Comparing coefficients, we obtain 7nln =
- Bn-1
7n-IIn = AB,-, - Bn-2 711, = AB 1
- Bo
7o I, = AB,
If we multiply the first equation by A', the second by An 1 , the third by An-2, ... , the last by In and add up the resulting equations we get ,ynAn +
7n-1An-1
+ ... + 71A + 7oln = 0,
and hence pA(A) = 0. This completes the proof of the theorem.
The endomorphism formulation of 20.3 is given in the following corollary. COROLLARY 20.4. If pa is the characteristic polynomial of an endomorphism a of a linear space X over A, then pa(a) = 0. Besides the HAMILTON-CAYLEY theorem, we shall also make use of the following theorem on factorization of characteristic polynomials.
THEOREM 20.5. Let a be an endomorphism of a linear space X over A of dimension n > 1 and Ya subspace of X invariant under a. If a'
is the endomorphism of Y defined by a, i.e. such that a'(y) = a(y) for all yeY, then the characteristic polynomial p.,a of a is divisible by the characteristic polynomial pa, of a'. -ROOF. Let B = (x1 ,
... , x xr+1 , ... , x,) be a base of X such
that B' = (x1i ... , Xr) is a base of Y. Then the matrix M of a relative to B has the block form o1
EI
§20
255
JORDAN FORM
where C is the matrix of a' relative to B', D is an (n-r, r)-matrix and E (n-r, n-r)-matrix. Consequently, if T is an indeterminate, then
M - TIn=
C-TIr D
0
E - TIn- r
and
det(M - TIn) = det(C - TIr) det(E - TIn_r) . Therefore pM = pCpE ; but pyr = pa and pc = po'. Hence pa is divisible by pa-. C.
Canonical decomposition Consider an arbitrary endomorphism 0 of a complex linear space
X. By the fundamental theorem of algebra we can factorize the characteristic polynomial of 0 into linear factors; thus
P = (XI - T)ml (X2 - T)M2 ... (Xq - T)mq where the eigenvalues X1, . . . , X. are all distinct. For each j = 1, ... , q, we define Xi = Ker(( - Xjy" i. It follows from (¢ -), )mi
_ (0 - Xi `/ 0 that Xi is a subspace invariant under 0. Consider now each of these invariant subspaces Xi separately. On Xj 0 defines an endomorphism 0j (0j(x) = Ox for all x(=-Xj); in this way the behaviour of 0 on vectors of Xi is manifested by that of 0j. We are now going to show that ci is the sum of a diagonalizable (or semi-simple) endomorphism aj and a nilpotent endomorphism vi. In general an endomorphism v of a linear space is said to be nilpotent if Ps = 0 for some positive integers. We define aj and vi as follows aj(x) = X jx and vi(x) = 0ix - Aix
for all x E Xi .
Then aj = XiiX/ (or a, = X,) is clearly semi-simple and vi = Oi - Ai is nilpotent since Xi = Ker(0 - X1)mi and vim/ = 0. Thus 0i = ai + vi is the sum of a semi-simple and a nilpotent endomorphism. We shall
256
Vii
EIGENVALUES
see in Theorem 20.11 that relative to a suitable base of Xi the matrix of the nilpotent endomorphism vi takes the simple form
(0. E2
'
.
EP
where the terms c, immediately below the diagonal are either 0 or I and all other terms are 0. Consequently the matrix of Oj relative to such a base takes the form
rX, E2
Al =
EP
XJ
In other words, if we "localize" 0 to the invariant subspace Xj, it can
be represented by the matrix Al which is in simple enough form. Clearly this result simplifies the study of 0 a great deal; however the
importance of this result is much enhanced if we can furthermore show that X = X 1 ®
® XQ . Becuase this will mean that a base of
X can be found relative to which the matrix of 0 takes the so-called JORDAN form
J
§20
257
JORDAN FORM
This leads us to prove the following theorem. THEOREM 20.6.
Let 0 be an endomorphism of a complex linear
space of dimension n > I and let
P0 = (A, - T)m' (A2 - T)m2
...
(Aq - T)mq
be the characteristic polynomial of 0, where the eigenvalues A. are all distinct from each other. Then for each i = 1, 2, , q, . .
.
X; = Ker(O-A,)mi is an m;-dimensional subspace of X invariant under 0 and X is the direct sum of these invariant subspaces.
PROOF. For convenience of formulation, we introduce the following abbreviations
for i = 1,
f = (Xi - T)mi and gi = f1f2 ... ti ... fq . , q, where fi under the symbol A is to be deleted. Our
proof will be given in three parts. X=X1 + .. +Xq. Part I.
Since the eigenvalues Ai are all distinct from each other, the greatest common divisor of the polynomials g1 , ... , gq is the con-
stant polynomial 1. Consequently, as we saw in § 18C, there are polynomials h,E C[T] such that (1)
g1h, + g2h2 + ... + gghq = 1.
The 3q polynomials f , gi and h; determine 3q endomorphisms of X: V i = fi(0), ti = g, (O) and 1h1(0) for From equation (1) we get ti- J 1 + t2 ' J 2 + . . + tq o tq = iX ;
i= 1, ... , q.
hence
Im( (2) On the other hand we have
p0 = fig;
f o r i = 1 , 2,
... , q;
and therefore, by the HAMILTON-CAYLEY theorem,
of-t; = f Mgi (0) = Po (0) = 0. C Ker 01 = X,. Therefore Consequently O,0t,0t, = 0 and X = X1 + X2 + + Xq follows from (2).
258
EIGENVALUES
V11
Part II. dim(X,) < mi. Since both and ; are polynomials of the same endomorphism 0,
we get Oo Oi = >)/10O. Therefore X; = Ker f is a subspace of X invariant under 0. Let dim(X,) = r and consider the endomorphism Oi of X1 defined by 0, i.e. O;(x) = ¢(x) for all x(=-X1. It follows from 20.1, that a base B of X; exists such that relative to B the matrix of /, is in triangular form
1 a21 a22
('arl
.
.
.
.
.
.
.
arr J
We contend that the terms akk on the diagonal are all equal to X,. To see this, we recall that X; = Ker(tp - A,)mi and hence O'; = (O; - X,)'"i
= 0. Therefore the matrix of 4', relative to the base B is the zero matrix. But the terms on the diagonal of this matrix are (al 1 -Xr)ml, (a22 -Xj)"1t, ... , (arr - X1 )m1, therefore a1 I = a22 = ... arr = X,. Hence for the characteristic polynomial of 0i we get PP
On the other hand, we know from 20.6 that the characteristic polynomial pp of 0 is divisible by ppi . Therefore r < m; .
Part III. X=X1 e X2 a ... eXy anddim(X,) = m;. Since pp _ (X1 - T)mI (A2 - T)m2 ... (Xq - T)mq and deg(Po) = n, we get
On the other hand the sum X = Xl + .. + Xq yields the inequality dim(X1) + dim(X2) + . . + dim(Xq) > n.
Therefore it follows from dim(X,) < m; and dim(X1) > 0 that dim(X1) = mi. Suppose x1, y, eX; are such that
x1 + ... + xq =Y1 + ... + yq.
§20
JORDAN FORM
259
Then the vector x 1- y 1 = (Y2 _X2) + ... + (yq - xq) belongs to Y= X1 n (x2 + ... + Xq). It follows from the dimension theorem that dim X 1 + dim(X2 + ... + Xq) = dim X + dim Y. On the other hand
m2 + ... + mq > dim(X2 + ... + X ); therefore dim Y < 0 and
hence x 1 = yl. Similarly we can prove that x2 = Y2, ... , xq = yq . Hence X = X1 ®
9 Xq. The proof is now complete.
The above theorem will be used in § 20D for the proof of the JORDAN theorem 20.12 which also needs other results on nilpotent endomorphisms. For the present we generalize the result discussed at the beginning of this section to the following theorem.
THEOREM 20.7. For each endomorphism 0 of a complex linear space X there exist a semi-simple endomorphism a and a nilpotent endomorphism v of X such aov = voo and 0 = a + v; moreover a and v are uniquely determined by these conditions. PROOF. Using the same notations as in the proof of 20.6, we define
for each x = x1 + ... + xq with xj eX
ax=X1x1+ ... + Xgxq and
v x = (cbx 1 - X 1 x 1 ) + ... + (bxq - ?q xq ).
Then clearly a and v satisfy the conditions of the theorem. Suppose a' and v' are another pair that satisfies the conditions. Since a' commutes with v', it commutes with 0; hence with (O, - X,)mi. Therefore a'Xj C X1. Since 0 - a' is nilpotent, its eigenvalues are all zero; therefore the eigenvalues of a' on Xi are the same as those of 0, i.e. those of 0j. Since Oi has the unique eigenvalue Jy and a' is semi-
simple, it follows that the restriction of a' to X. is given by scalar multiplication by Aj. Therefore a' = a and hence V= - a' = - a = P. The unique decomposition 0 = a + v satisfying the conditions of 20.7 is called the canonical decomposition of 0. By theorem 20.7 the study of endomorphisms in general is now reduced to the study of semi-simple endomorphisms which we have done in § 19D and the study of nilpotent endomorphism which we shall carry out in § 20D below.
D. Nilpotent endomorphisms We recall that an endomorphism v of a linear space over A is said
to be nilpotent if vs = 0 for some positive integer s. The index of nilpotence of a nilpotent endomorphism v is defined to the least positive integers such that Ps = 0.
260
V11
EIGENVALUES
By Theorem 20.7, our problem is now reduced to finding a suitable base of each X; so that relative to these bases the matrices of the nilpotent endomorphisms 0, - 'X, and hence also the matrices of the endomorphism O, themselves take up the simplest possible form. This
we shall do in theorem 20.11 for the proof of which the following three lemmas on nilpotent endomorphisms are needed.
LEMMA 20.8. Let v be a nilpotent endomorphism of an arbitrary linear space X with index of nilpotence s and let K; = Ker vi for
i=0,1,...,s,then (i) (ii) (iii)
v[K,] CK1 _1 fori=1,2, ...,s;and Kt-1 is a proper subspace of K, for i = 1, ... , s.
PROOF. (i) follows immediately from the definition of K, where v° is, by the usual convention, the identity endomorphism ix of X. (ii) follows from the equation vi = pi-' o P.
()
Suppose Ki-I = Ki for a fixed index j = 1, ... , s. Then it follows from (ii) that
v[X] CKs-1,v2[X] CKs-2
,
...,vsf[X] CKi,
hence vs i[X] C Ki-1 by our assumption. Applying vi-1 to both sides
of this inclusion, we get vs-' [X] = 0. Therefore Ps-1 = 0, contradicting the definition of index of nilpotence. LEMMA 20.9.
Using the same notations as in 20.8, let Y be a
subspace of X such that y n Ki = O for some j = 1, 2, ... , s-1. Then (i) v[Y] n K,.1 = 0 and (ii) v induces an isomorphism of Y onto v[Y].
PROOF. (i) Let v(y) E v[Y] n Ki-1 for some y E Y. Then vi(y) _
P/-' (v(y)) = 0 and hence y E Y n Ki. Since y n Ki = 0, we get y = 0. Therefore v(y) = 0. (ii). It is sufficient to show that if v(y) = 0 for some yEY, then y = 0. Since y r= Ker P = K1, by 20.8(i) we get y e Ki. Therefore
yEYnKiandhencey=0.
LEMMA 20.10. Using the same notations as in 20.8. there exist s subspaces Y1, Y2, ... , Ys of X such that (i)
K;=Ki_1 9 Y, fori=l,2, ...,s;
§20
(ii)
JORDAN FORM
v induces an injective linear transformation of Y, into for each i = 2, 3, ... , s; and
261 Yi-i
(iii)
PROOF. Let Ys be a subspace of Ks = X complementary to K,-,.
Then Ks = Ks-1 ® Ys. Since Y n Ks_, = 0, by 20.9(i) we get v[YS] n Ks-2 = 0. Furthermore v[Ys] C Ks-1 by 20.8(ii). Hence there exists a subspace Ys_, of Ks_, complementary to Ks_2 such that v[Y5] C Ys-1. Thus Ks-1 = Ks-2 ® Ys-1 and v induces injective linear transformation of Ys into Ys-1 by 20.9(u). Again we get
Ys_, n Ks_2 = 0 and v[Ys_,] C Ks_2, therefore we can proceed to find subspaces Ys-2, Ys-a, , Y, = K, that satisfy condition (i) and (ii). Statement (iii) follows readily from (i) and (ii).
THEOREM 20.11. If v is a nilpotent endomorphism of an arbitrary linear space of dimension n > 1, then there exists a base of X relative to which the matrix of v is of the form
EZ
where the terms e, immediately below the diagonal are either 0 or 1 and all other terms are 0. PROOF. Let Y, , Y2, ... , Y. be s subspaces of X satisfying the conditions of Lemma 20.10. Let (y 1, ... , ya) be a base of Y.. By condition (ii) of 20.10, we can get a base of Ys_, that includes the images of the base of Ys. Let v(y1), . . .,V(Ya);za+1, ...,zb be such a base of Ys_1. Similarly there is a base
v2(y1), ...,v2(ya);v(Za+1), ..., v(Zb);tb+1, ...,tc
262
VII
EIGENVALUES
of Ys-2 . Obviously we can continue this procedure until we find a base VS-I(YI), ... , Ys-I(ya);vs-2(za+1), ... , vs-2(zb);
vs-3 (tb+0,
... ,
VS-3 (to ); .... Ud
of YI = KI . (Observe. that the vectors of this base are taken into zero by a since they belong to KI = Ker v.)By 20.20(iii), X is the direct sum of Y1, Y2, .. , YS ; therefore the vectors
), ... , pS-1 (Ya); vs-2(za+ 1), . , vS-2 (Yr), ... , vs-2(Ya); vs-3(za+1), ... , . .
vs-1 (y1
vs-2(zb); VS-3 (tb+
vS-3(zb);
I
), ... ;
.......... ;
...........................................
v2(Y1)....., v2(ya); v(Y1)....... v(Ya); Y1,
.....
v(Za+I)....... v(zb);tb+1
.,tc;
za+1,......., zb;
, ya.
form a base of X. Rearranging these vectors column by column from the left into P'-1(Y1), . . . , v(YI), YI; vs-1 (Y2), , v(Y2), Y2;
... ya;
vs-2(za+1),
such that
... , zb; ... Ud, we get a base (x1 , x2, ... ,
x,1)
either v(x1) = 0 or v(x1) = x;-1.
Therefore the matrix of v relative to this base has the required form. E. JORDAN theorem
We have encountered diagonal matrices and triangular matrices in
our previous endeavour; lying between them in the order of "simplicity" are the reduced matrices or the elementary JORDAN matrices. A square matrix of the form.
l AJ
§20
263
JORDAN FORM
where all terms on the diagonal are equal to a fixed scalar X, all terms
immediately below the diagonal are equal to 1 and all other terms are 0 is called a reduced matrix or an elementary JORDAN matrix with eigenvalue X. For example the following three matrices
Ia
Ix
01
1 t
x
L0
0
01
X
0
1
XJ
are reduced matrices of orders 1, 2 and 3 respectively. In particular, the zero square matrix of order 1 (0)
is also a reduced matrix.
We are now in the position to formulate the main result on endomorphisms of finite-dimensional complex linear spaces. THEOREM 20.12. Let 0 be an endomorphism of a complex linear space
X of dimension n > 1. There exists a base of X relative to which the matrix of 0 is of the JORDAN form
(Al
AJ where each A. is a reduced complex matrix and all other terms are 0. PROOF. Since the characteristic polynomial po of 0 belongs to C[T], we may suppose po = (XI -
T)" ()2 - T )m2 ... (Xq - T)mq
where the eigenvalues Xi are all distinct from each other. According to Theorem 20.6, X is the direct sum of the subspaces Xi = Ker(¢ - I\i)'i
for j = 1, 2,
. . . ,
q, each of which is invariant under 0. If for
each X. a base B. can be found relative to which the matrix of the
endomorphism 0,- of Xi induced by 0 is of the JORDAN form, then the
bases of the subspaces Xi constitutes a base of X relative to which
264
VII
EIGENVALUES
the matrix of 0 is of the JORDAN form. Therefore we need only examine the endomorphisms 0i of Xi. The endomorphism vi = 0i - Ai
of Xi is nilpotent, therefore by Theorem 20.11 a base Bi of Xi can be found relative to which the matrix of vi is of the form (0 E2
Er
J
where Ei is either 0 or 1. Therefore relative to the same Bi the matrix of ¢i is of the form
rXi E2
which is of the required form. For example, if this matrix is
10
0
l
oil
10 L
--1 ---
§20
265
JORDAN FORM
then we can write this as 0
0
0
0
A2
0
0
0
0
A3
0
L0
0
0
A4)
where 0
X/
A, =
1
0 0
0
1
Aj ;
1
A2 = A3 = (X,); A4 =
0 0 Xj
This proves Theorem 20.12. Finally we note that in Example 19.8, the matrix of the differential endomorphism D of P, relative to the base
... ,
(1, T, 2 . T 2 ,
(n 11) i T'-') is in JORDAN form.
F. Exercises
1. Find the Jordan form of the following matrics.
(i)
(iii)
(v)
3
0
8
0
3
-1
6
-1
-2
0
-5
1
-1
1
2
0
0
2
-2
- 2
3
7
-3
1-2
- 5
2
-3
- 3
3
-4
-10
3
-2
-2
2
0
3
3
3
1
0
0
1
8
6
-4
-1
0
0
- 1 4 -10
7
1
2
1
2
(ii )
(iv )
(vi )
1
I-7
- 6
-1
0
266
VII
(vii) 1
(ix)
EIGENVALUES
1
2
3
4
0
1
2
3
0
0
1
2
0
0
0
1
( 0
-3
0
3
-2
6
0
13
0
-3
-1
2
(viii)
1
1
1
3
0
8
i
0
where all the unspecifies terms are zero.
2. Prove that a nilpotent linear transformation of a finite dimensional vector space has trace zero. 3. Let gyp, be endomorphisms of a linear space X. Prove that ix + po41 - > Io p is not nilpotent.
4. Let p be an endomorphism of a complex n-dimensional linear space X. Suppose the matrix of p relative to a base B = (x I, x2 , ... , x,) is in elementary JORDAN form. Prove that (a) X is the only invariant subspace of 'p that contains x ; (b) all invariant subspaces of 'p contain x, ;
(c) X is not a direct sum of two non-trivial invariant subspaces.
CHAPTER VIII INNER PRODUCT SPACES
We began in Chapter I by considering certain properties of vectors in the ordinary plane. Then we used the set V2 of all such vectors together with the usual addition and multiplication as a prototype linear space to define general linear spaces. So far we have entirely neglected the metric aspect of the linear space V2 ; this means that we have only studied the qualitative concept of linearity and have omitted from consideration the quantitative concepts of length and angle of a linear space. Now we shall fill this gap in the present chapter. We use the 2-dimensional arithmetical real linear space R2 as a realization of V2 and consider the bilinear form on R2 (in Example 16.3) defined by the inner product. (xly)=a1131 + a2132 wherex=(a1,a2) andy=(81,fs2)
Then according to the usual distance formula of plane analytic geometry, the distance between x and y is given by the positive root (a1 - R1 )2 + (a2 - R2 )2 . Therefore it can be expressed in terms of the inner product as
x-yx-y.
11X-Y11 =
It turns out that the cosine of the angle between x and y can also be expressed in terms of the inner product as suggested in Example 16.4. A
Fri
11X11
a1, a2)
a2
Fig 27
Fig 28 267
268
VIII
INNER PRODUCT SPACES
Let 0 (respectively w) be the angle between x (respectively y) and the first coordinate-axis. Then we get for the sine and cosine of these angles the following expressions: cos 0
sin 0 =
=
sin w=
cos w
Hence the cosine of the angle w - 0 between x and y is given by +a,292
cos(w - 0) = cos w cos 0 + sin w sin 0 = al R1 Ilxll IIYII
_
(x 1Y)
Iixll Ilvll
We shall now use R2 with its inner product as a model and enter into detailed study of linear spaces with inner product. The real and the complex cases are now treated separately. In §21 and §22 we study
euclidean space where the underlying linear space is a real linear space. In §23 we study the unitary space whose underlying linear space is a complex linear space.
§21. Euclidean Spaces Inner product and norm Let X be a real linear space and 4): X 2 -* R a bilinear form. We say that (i) (P is symmetric if 4) (x, y) = 4) (y, x) for all vectors x and A.
y of X and that (ii) 4) is positive definite if 4)(x, x) > 0 for all
non-zero vectors x of X. The bilinear forms of Examples 16.3 and 16.4 are both symmetric and positive definite. A euclidean space is an ordered pair (X, (D) consisting of a finitedimensional real linear space X and a positive definite symmetric bilinear form 4) of X. When there is no danger of confusion about the bilinear form 4), we shall simply denote the euclidean space (X, (P)
by X. Vectors of the euclidean space X are then vectors of the underlying linear space X. For 4)(x, y) we write (xly) and call it the inner product of the vectors x and y of the euclidean space X.
§21
EUCLIDEAN SPACES
269
Under these abbreviated notations, a euclidean space is a finitedimensional real linear space X together with a real valued function that assigns to each pair x and y of vectors of X a real number (x1 y) such that the following axioms are verified for all vectors x, y and z of X and all real numbers A:
[El]
(xly) = (ylx);
[E2] (x+y Iz) = (xIz) + (yIz); [E3] (Axly) = A(xly); and [E4] (xlx) > 0 ifx*0. It follows that if X is a euclidean space and Y is a subspace of the
linear space X, then Y is also a euclidean space with the inner product defined in the obvious way. For any vector x of a euclidean space X, the non-negative square root (x x) is called the norm or length of the vector x and is de-
noted by llxll in the sequel. It follows from axiom [E4] that, for any vector x of a euclidean space X, IIxit = 0 if and only if x = 0. In other words 0 is the only vector with norm equal to 0. Moreover we can verify that for any two vectors x and y of a euclidean space X and any real number A the following equations hold: (a)
IlXxll = IAI Ilxll
(b) (c)
IIx + y112 = IIXII2 + I1y112 + 2(xly)
Ilx + yll2 + llx -y112 = 2(11x112 + llyll2)
In (a) the expression IA I is the absolute value of the real number A.
Equation (b) is called the cosine law from which it follows that the
inner product of two vectors can be expressed in terms of their norms. Equation (c) is called the parallelogram identity. Before we study euclidean space in more detail let us pause here to consider some examples.
ExAMPLE 21.1. We consider the real linear space V2 of all vectors
on the ordinary plane with common initial point 0. We defined earlier in Example 16.4 for any two vectors x = (0, P) and y = (0, Q) of V2 their inner product as (x ly) = p q cos 0 where p and q are the lengths of the fine segments OP and OQ and 0 = 4POQ.
This inner product obviously verifies the axioms [El ] to [E4]. Thus
the real linear space V2 endowed with this inner product is a euclidean space.
270
INNER PRODUCT SPACES
VIII
EXAMPLE 21.2. In the n-dimensional real arithmetical linear space R" , we define for vectors x = (a1, ... , an) and y On) their inner product as (x 1y) = Q1 91 + ... + anon which satisfies the axioms [E1 ] to [E4] . R" endowed with this inner product is a euclidean space. EXAMPLE 21.3.
If X is a euclidean space and (x1i
... , xn) is a base
of the linear space X, then we obtain a square matrix M = p** of order n whose terms are pij = (xi I xj)
i, 1 = 1,
... , n.
It follows from [E1 ] that the matrix is symmetric, i.e. M =MI. For
vectors x =a, x, + ... +anxn andy =131x1 + ... +Onxn ofXwe get from [El ], [E2] and [E3] that
= i,j1pijaiIj
(xly)
Furthermore it follows from [E4] that the matrix M is positive definite, i.e. E pi jaiaj > 0 for any non-zero n-tuple (a1,
fore relative to a base (x1,
. . . ,
. . .
, an ). There-
xn) the inner product can be ex-
pressed by means of a symmetric, positive definite matrix. Conversely if X is a real linear space, (x 1i . . . , x,) a base of X
and M = p** a symmetric positive definite matrix, then we can define an inner product by (x ly) = + anxn and y = QIx1 + + Qnxn . In this way X becomes a euclidean space. In particular, if X1, 12 , ... , Xn are positive real numbers, then X endowed with the inner product defined by
where x = uix1 +
(x 1y) = X1a1(31 + ... + XnanQn
where x = aix1 +
+ anxn and y = 81x1 +
+ Rnxn is a
euclidean space. EXAMPLE 21.4.
Consider the four-dimensional real arithmetical
linear space R4 . A symmetric bilinear form, called LORENTZ'S form, is defined by (xly) = a1/31 + a2(32 + a3f33 -c2a4R4
§21
271
EUCLIDEAN SPACES
where x = (a1, a2, a3, a4) and y = 01
,
(32
,
(33
, Q4) are vectors of R4
and c is a constant (in applications to special relativity, c is the velocity of light). Lorentz's form fails to be positive definite. We say that the real linear space R4 together with LORENTZ'S form constitute a pseudo-euclidean space, which is very useful in the theory of relativity. EXAMPLES 21.5. (a) In the infinite-dimensional linear space F of all
real-valued continuous functions defined on the closed interval [a, b I, where a < b, we define an inner product by: b
(f 19) =
J I,
f(t)g(t)dt.
It is not difficult to verify that the axioms [E1 I - [E4] hold for the inner product (f Ig). (b) Consider the set H of all sequences x = (xi)i= 1, 2, numbers such that the series.
of real
,
Ix1I + Ix21 + Ix31 + ...
is convergent. For x = (x1)t = 1, 2, . . and for any real number A, we define and
.
and y = (yi)i=1,2,
x + y = (xi + Y0i=1, 2, .. Ax = (Axi)i=1, 2, .. .
,
, .
of H
.
Then with respect to the addition and multiplication above, H is an infinite-dimensional real linear space. Furthermore we define an inner product by (xiY) _n im0(x1Y1 + ... + This inner product satisfies the axioms [E 1 ] - [E4]. However both F and H fail to be euclidean spaces since they are not finite-dimensional. They are examples of Hilbert spaces, which are studied in functional analysis and have many important applications in the theory of quantum mechanics. B.
Orthogonality
The most important and interesting relation between vectors of a euclidean space is orthogonality, by virtue of which we can express
272
VIII
INNER PRODUCT SPACES
the metric properties of the euclidean space in a most convenient form. Two vectors x and y of a euclidean space X are said to be orthogonal or perpendicular (notation: x 1 y) if (xly) = 0. It follows from the positive definiteness of the inner product that the zero vector 0 of X is the only vector of X that is orthogonal to itself. Consider now a family (x1, . . . , xp) of vectors of a euclidean space X. If .(x, lxi) = 0 for all i 0 j, i.e. if the vectors of the family are pairwise orthogonal, then we say that the family (x1 , ... , xp) is an orthogonal family. It follows from the cosine law that a generalized Pythagoras theorem: IIx1
+ ... + XP 112 = 11x1112 + ... + II xp 112
holds for an orthogonal family (x1 ,
... , xp ).
If the vectors of an orthogonal family F = (x1 , ... ,xp) are all non-zero vectors, then the family F is linearly independent. For if X1xI + ... + Apxp = 0, then 0 = (X1xl + ... + Apxp lxi) = X1(xl 1x1) + ... +ap(xplxi) = X.(xiIx,)
0 0. Hence X. = 0 for all for all i = 1, 2, ... , p. Since x{ * 0, i = 1, 2, ... , p, proving the linear independence of the family F.
Replacing each vector x; of the family by y, = xi/l1xill, we get an
orthogonal family F' _ (y I, ... , yn) of vectors of unit length, i.e. 11y; ll = 1 for all i = 1, 2, ... , p. For this family the equation (yi l y1) = S,j holds for all i and j.
In general we say that a family (y1, yp) of vectors of a euclidean space X is an orthonormal family if (y,Iy1) = S,, for all i, j = 1, . . , p. It follows that orthonormal families are linearly independent. If an orthonormal family (y1, ... , yn) is a base of the euclidean space X, then we call it an orthonormal base of X. The advantage of using an orthonormal base (y1, ... , yn) of X as a base .
of reference is self-evident since for vectors x = aI y I + + Rnyn of X we get and y = Ply, + and
'
+ an y,
(xly) = a, A, + ... +anjn
IIxU= ale+...+an2 Let us now show that in every euclidean space there are always "enough" orthogonal vectors to operate within comfort.
EUCLIDEAN SPACES
§21
273
THEOREM 21.6. In every euclidean space there exists an orthonormal base.
PROOF. We prove this theorem by a constructive method known as the GRAM-SCHMIDT orthogonalization method. Since the underlying real linear space of a euclidean space X is a finite-dimensional linear space, we can assume the existence of a base (z1 i . . . ) z,) of X and proceed to find vectors y1 , ... , y of X successively such that they form an orthogonal base of X. To begin the construction we put
y1 =z1. Then y1 : 0. Next we want to find a real number A such that the vector Y2 =22 +Xy1
is orthogonal to the vector yl. This amounts to solving for X in the equation (z2IY1)+X(Y1IY1)=0. Since y = 0, we get X = -(z2 iy1)/(yl Iy1). Moreover y2 * 0, for otherwise z1 and z2 would be linearly dependent. 1
Fig 29
The third step of the construction is to find real numbers y and v such that the vector
Y3' Z3+µy1 +vy2
274
VIII
INNER PRODUCT SPACES
is orthogonal to both y, and Y2. This amounts to finding real numbers p and v such that (z3 [l'1) + µ(Y1 IY1) = 0 and
(z3IY2) + v(Y2IY2) = 0.
Since both y 1 and y2 are non-zero vectors, these equations can be solved and we get p = - (z31Y1)/(Y11Y1) and v = - (z3IY2)/(Y21Y2) Moreover the linear independence of vectors z1 , z2, and z3 implies
that y3 * 0. Carrying out this process to the n-th step, we finally get
an orthogonal family (y1, Y2, ... ,
of n non-zero vectors.
Therefore the vectors
x, =
Yi 11y111
i=1,2, ...,n
form an orthonormal base (x 1i ... , x,,) of X. One of the nice features of the GRAM-SCHMIDT orthogonalization
method is that the vectors xi of the orthonormal base are constructed successively from the first i vectors z1, z2, ... , zi of the given base. Consequently a strong version of the supplementation theorem holds: if Y is a subspace of a euclidean space X, then every orthonormal base (x1, ... , xp) of Y can be augmented to form an orthonormal base (x 1i ... , xp , xp+ 1 , ... , of XA closer inspection of the proof of the above theorem reveals that if we denote by Yi the subspace generated by the vectors z 1, z2, . . . , zi,
then the construction of the i-th vector yi consists in finding a non-zero vector in Yi fl (Yi_ 1)1 where (Yi_ 1)1 is the set of all vectors of X that are orthogonal to each vector of Y1_1.
In general we consider, for an arbitrary subset U of a euclidean space X, the subset U1of all vectors of X that are orthogonal to each vector of U, i.e. the set U1= {xEX: (x(y) = 0 for all yEU} .
We verify without difficulty that U1 is a subspace of X, and moreover if Y is the subspace of X generated by the subset U, then Y1 = U1. We call Y1= U1 the orthogonal complement in X of the subspace Y or of the subset U. Corresponding to the resolutionof a force into a sum of perpendicular components in elementary
machanics we have the following theorem which also justifies the term orthogonal complement.
§21
EUCLIDEAN SPACES
275
THEOREM 21.7. If Y is a subspace of a euclidean space X, then
X=Y®YI. PROOF. Since for the case where Y = 0 the theorem is trivial, we can suppose Y * 0. Any vector of the intersection y n Yl is, by definit-
ion, orthogonal to itself; therefore it must be the zero vector of X. Hence Y fl Yl = 0. To show that X = Y + Yl, we first select an orthonormal base (yl, ... , yp) of Y. For an arbitrary vector x of X, let y = Xlyl + ... + Xpyp where A, = (xlyt) for i = 1, 2, ... ' p. Then z = x - y is a vector of Yl since (z ly,) = 0 for all i = 1, 2, ... , p. Hence the vector x of X can be written as x = y + z where y and z are vectors of Y and Yl respectively.
For any vector x of X the unique vector y of the subspace Y of X such that x = y + z where yEY and is
called the orthogonal projection on Y of the vector x. By
Pythagoras' theorem, we get Ilxll = Ily112 + IIz112
and consequently we have proved the following corollary. COROLLARY 21.8. Let X be a euclidean space and Y a subspace of
X. Then for each vector x of X and its orthogonal projection y on Y BESSEL s inequality:
IIx11 3 Ilvll
holds. Moreover the equality sign holds if and only if x = y. C. ScuwARz's inequality
In order to define the cosine of the angle between two vectors x and y of a euclidean space X by the expression (xly) Ilxll hull
as we have done at the beginning of this chapter, we have first to verify that its absolute value is not greater than one. This we do in the next theorem
276
Vill
INNER PRODUCT SPACES
THEOREM 21.9. Let X be a euclidean space. Then for any vectors x and y of X, SCHWARZ'S inequality
I(xIv)I < IIxII IIYII holds. Furthermore, the equality sign holds if and only if x and y are linearly dependent.
PROOF. If y = 0 then the theorem is trivially true. Suppose now that y is a non-zero vector. Then, by 21.7, we can write x = Xy + z where A = (xly)/Ilyll2 and z is orthogonal to y. By Bessel's inequality 21.8, we get IIXYII < IIxII. Hence I (xly)I/IIYII < 11X II
and SCHWARZ'S inequality follows. For the second statement of the
theorem, we observe that by 21.8 the equality I(xly)I = IIxII 11II holds if and only if x = Xy i.e. if and only if x and y are linearly dependent vectors.
Let us now study some important consequences of SCHWARZ'S inequality. For any non-zero vectors x and y of a euclidean space X the expression (x1Y)/IIxII 1111 is defined and satisfies the following inequalities:
(xly) -1
IIxII IIYII
< 1.
Therefore we can formulate the following definition. DEFINITION
21.10 Let X be a euclidean space and x, y two
non-zero vectors df X. Then the cosine of the angle 0 (0 < 0 < 7r) between x and y is (xly) cos 9 = IIxII IIYII
We notice that'xly if and only if the angle between x and y is it/2
and that x and y are linearly dependent if and only if the angle between them is 0 or a. The cosine law of the inner product, Ilx +y112 = IIxI12 + I1yll2 + 2(xIy),
can now be written as follows: Ilx +y112 = 11x112 + Ilyll2 + 211x11 11Y 11 cos 0.
§21
EUCLIDEAN SPACES
277
In the case where X is the euclidean space V2 or R2 , the above definition of angle coincides with the usual one in elementary geometry and the cosine law above coincides with the usual cosine formula of a triangle ABC in trigonometry: c2 = a2 + b2 - 2ab cos C.
'.\ -------x+y
The difference in signs is due to using C = 7r - 0 instead of 0 in considering the angle between two vectors. Another important geometric consequence of SCHWARZ'S inequality is the well-known triangle inequality.
COROLLARY 21.11. Let X be a euclidean space. Then for any vectors x and y of X, the triangle inequality IIx +y11 < 11X 11 + Ily ll
holds. Furthermore, the equality sign holds if and only if x = 0 or y = 0 or x = Xy for some positive number X.
PROOF. It follows from the cosine law and Schwarz's inequality that IIx +yl12 = IIxl12 + Ilyl12 + 2(xiy) < 11x112 + I1y112 + 2lix11 Ilyii
= (11x11 + Ilyll)2.
Hence the triangle inequality follows. To discuss the equality sign, we may exclude the trivial case where x = 0 or y = 0. If IIx + y II = Ilxll + ilyll, then, by the cosine law, we get 11x112 + IIyI12 + 2(x1y) = IIXII2 + Ilyll2 + 21lxil Ilyll,
and hence
(xly) = IIxllllyll.
278
VIII
INNER PRODUCT SPACES
By 21.9, the non-zero vectors x and y are linearly dependent, and hence x = Xy for some real number A. Therefore it yields
X(yly) = IAlilyll2. Hence A = IXI, i.e. A is a positive real number. Conversely if x = Xy for a positive real number A, then
Ilx+yll = ii(X+ l)yll = (A+ 1)IIyAI = AIIYII + Ilyii = Ilxll + IIYII.
Then proof of the theorem is now complete. D. Normed linear space
Lying between the concept of linear space and the concept of euclidean space in the order of generality, there is the concept of normed linear space. A normed linear space is an ordered pair (X,n) where X is a real (finite-dimensional) linear space and n : X -> R is a mapping such that
(i) n(x) > 0, and n(x) = 0 if and only ifx=0 (ii)
n(Ax) = lXln(x)
(iii)
n(x +y) 5 n(x) + n(y).
The non-negative real number n(x) is called the norm of the vector x. If X is a euclidean space and if we define n(x) = Ilxll, then (X,n) is a normed space. In other words, every euclidean space is a normed
linear space. The converse of this is not necessarily true: given a normed linear space (X,n) there does not always exist a positive definite symmetric bilinear form (xly) such that Ilxll = n(x). A necessary condition for (X,n) to be a euclidean space is that the norm function n satisfies the parallelogram identity: n(x + y)2 + n(x - y)2 = 2(n(x)2 + n(y)2 ).
It turns out that this condition is also sufficient. For the interested readers we refer to JOHN VON NEUMANN, Functional Operators, Chap-
ter XII (Princeton University Press).
E. Exercises 1. In R4 find a unit vector orthogonal to. (1, 1,-1, 1), (1, (2, 1, 1, 3).
1),
§21
279
EUCLIDEAN SPACES
2. Find an orthonormal base of the solution space of
3X1-X2-X3+X4-2X5=0 X1+X2-X3 + X5=0 3. Let B = (x1
, ... , xn) be an orthonormal base of a euclidean
space X. Apply the GRAM-SCHMIDT orthogonalization process + Xn ) , ... , x1 + X2 + to get an orthonormal base C. Determine the matrix of the
to the base (x1 , x1 + X21 x1 + x z + x3
change from B to C. 4. Let M be a real (n,n)-matrix such that det M 0 0. Show that
there exist unique real (n, n)-matrices Q and T such that
(i) M = QT (ii) the row vectors of Q form an orthonormal base of Rn (iii) all terms of T below the diagonal are zero (iv) all diagonal terms of T are possitive. 5.
Let Y be a subspace and x an arbitrary vector of a euclidean space X. Suppose (x1, ... , Xm) is an orthonormal base of Y. Prove that the orthogonal projection of x is (xIx1 )x1 + ... + (XIXm )Xm .
6. Let el , (i) (ii)
... , e,j be n vectors of a euclidean space X such that 1k111 = I for all i = 1, ... 11e1- e111 = 1
,
n, and
for all i * j.
Find the angle between e; and ej and the angle between s - ei and s + ei where
+en)/(n+1). Find a geometric interpretation of the vectors for n = 2 and n = 3.
s=(e1+
7.
In the linear space of all polynomials in T with real coefficient of degree < 4 an inner product is defined by
and
(f18) =,f Find an orthonormal base.
if(x)g(x)dx.
280 8.
VIII
INNER PRODUCT SPACES
Let x1, ... , xm be m vectors of an n-dimensional euclidean space and
'(x1Ix1) (x1Ix2) . .. . (xllxm) 0 =
(xmlX1) (xmIx2) .... (xmlxm) I
.
Prove that A * 0 if and only if x1, ... , xm are
linear
independent. 9.
10.
The distance d(x, U) from a vector x to a subset U of a euclidean space is given by d(x, U) = min { IIx - u II : uE U} . Show that if U is the orthogonal complement of a unit vector e, then d(x, U) = I(xle)l.
Let X be a euclidean space. For every ip e End(X) define the max {llpxli : IlxII = 1). Prove that of p by 11 L11 and 114, 11 Il'p+ 'II < Let X be a euclidean space and z a fixed non-zero vector of X.
norm IIoII 11.
Determine the set of all vectors x of X such that x-z is orthogonal to x + z. § 22. Linear Transformations of Euclidean Spaces A. The conjugate isomorphism
Let X be a euclidean space. For each vector x of X, we denote by ax the covector of the linear space X defined by ax (y) = (y Ix) for all y EX.
A linear transformation a: X - X * of the linear space X into its dual space X* is then defined by
a(x) = ax for all xEX. By the positive definiteness of the inner product, the linear transformation a is injective. On the other hand, since X is a finitedimensional real linear space, the dual space X* of X has the same dimension as X. Therefore a is also surjective and hence it is an isomorphism. Thus we have proved the following theorem.
§22
LINEAR TRANSFORMATIONS OF EUCLIDEAN SPACES
281
THEOREM 22.1. Let X be a euclidean space and X* the dual space of
the linear space X. Then the linear transformation a: X -> X* such that
a(x) = ax and ax(y) = (ylx) for all x, y of X is an isomorphism.
As a consequence of the theorem, an inner product in the dual space X* is defined by
(fig) = (a-' (g) la' (f)) for all f, and the dual space X* becomes a euclidean space itself. We call the euclidean space X * the dual space of the euclidean space X. The isomorphism a of Theorem 22.1 is called the conjugate isomorphism of the euclidean space X onto its dual space. Another consequence of Theorem 22.1 is that we may regard, by
means of the conjugate isomorphism a, the euclidean space X as 'self-dual'. It is easy to see that if (x, , ... , x,) is an orthonormal base of X and (fl, ... , f,) is the base of X* dual to (xl , ... , x1) then a(x;) = fi for i = 1, ... , n. Hence any orthonormal base may be regarded as 'self-dual'. However we shall not press this point of view any further here. Finally we observe that the reverse of the order of f and g in the equation (f 1g) = (a'(g)Ia'(f)) is deliberate,
so that the same formula is also used for the complex unitary space (see §24B).
REMARKS 22.2. In his famous treatise, The Principles of Quantum Mechanics, P.A.M. DIRAc uses the ket vector Iy> to denote a vector y of a euclidean space X and the bra vector (xl to denote the covector
ax of X. In this way, every ket vector determines uniquely a bra vector, and vice versa; a complete bracket (xiy) is then the inner product ax(y) = (xly). Analogously, we have a conjugate isomorphism a': X* -+ X** of the euclidean space X* onto its dual. It is now interesting to compare the composite linear transformation a'oa: X - X** of the conjugate
isomorphisms with the functorial isomorphism tX: X -+ X** of Theorem 8.3 which is defined by
tX(x) = Fx and Fx(f) = f(x) for all xEX and feX*. We contend that a'oa = tX. Before proving this let us observe that for any xEX and feX*
f(x) = (xla'(f)).
282
VIII
INNER PRODUCT SPACES
Now for any xEX and any
we get
((a'oa) (x)) (f) = (a(x)If) _ (xI d-' (f)) = f(x), proving FX = (a'o a) (x) and hence tX = a'o a. To summarize: Everything we say about a euclidean space X holds
for the euclidean space X*; and X is also in a functorial conjugate isomorphic relation with its second dual space X**.
B. The adjoint transformation
The conjugate isomorphism a: X -> X* of a euclidean space onto
its dual space (or rather its inverse a') enables us to interprete various properties of certain covectors of X as properties of certain vectors of X. For example the properties of the annihilator AN(Y) of a subspace Y of X given in Theorem 8.5 can be interpreted as properties of the orthogonal complement Yl of Y. Another application of the conjugate isomorphism leads us to the very important concept of the adjoint of a linear transformation. We
recall that the dual transformation 0* of a linear transformation 0: X -+ Y is defined as a linear transformation 0*: Y*
X* such that
0*(g) = goo for allgEY*, or diagrammatically
X-Z0---+ Y
R Suppose now that both X and Y are euclidean spaces. If we denote by a: X X* and 0: Y Y* the conjugate isomorphisms, then for each linear transformation 0: X -> Y of linear spaces, we get a linear
-
transformation ¢ = a' oo*o$3, called the adjoint of the linear transformation ¢. * Y*
- X*
Y --------'X
§22
LINEAR TRANSFORMATIONS OF EUCLIDEAN SPACES
283
Now for each pair of vectors xEX and yeY, we have an inner product (xl¢(y)) of the euclidean space X and an inner product (t(x)Iy) of the euclidean space Y. We shall prove that the adjoint ¢ of the linear transformation 0 is characterized by the equality of these two inner products:
(O(x)ly) = (x1(y)) for all xEX and yeY. Indeed
(xla `o$*oj3(Y)) = (xla1(0*(ly))) _ 0*(Qy)(x) = (QyoO)(x) = Qy(O(x)) _ (0(x)IY).
If 4j, and 4/2 are two linear transformations of Y into X such that (O(x)lv)=(xliP,(y))and(O(x)ly)=(xI02(y)), then xl(#1-02)(y)) = 0 for all xeX; therefore (P1 - 412)(y) = 0 for all yEY. Hence if/, = 1'/2. We formulate our results in the following theorem:
THEOREM 22.3. Let X and Y be euclidean spaces and 0: X - Y a linear transformation of linear spaces. Then there exists a unique
linear transformation $: Y -> X, called the adjoint of the linear transformation 0 such that
(0(x)Iy) = (xI(Y)) for all vectors x of X and y of Y.
The formation of the adjoint satisfies the following properties: (a) lx = ix ; (b)
0.0 _o
These follow from
(ix(x)iy) = (xlix(y)) forx,yEX,and (I,o0(x)Iz) = WO(x)AZ) = (0(x)I RZ)) = (x*Rz)) = (xIOokZ)) Let us now examine the formation of the adjoint in the language of categorical algebra. If we put (i) A(X) = X for every euclidean space X, and (ii) A(0) = for every linear transformation 0: X -* Y, then we get a contravariant functor A of the category of all euclidean spaces (where morphisms are linear transformations and composition
284
VIII
INNER PRODUCT SPACES
has the usual meaning) into itself. Moreover for all x c= X and Y E Y, we get
('(x)ly) = (yl(x)) = a(y)Ix) = (xl(y)) = (cb(x)Iy) Therefore (c)
This means that the functor A is idempotent, i.e. A2 is equal to the identity functor of the category of euclidean spaces. Equally self-evident are the following equations: (d) aO- = X (e)
0+
;
and +
Finally let us compare the functor A with the contravariant functor D of §8B defined by (i) D(X) = X* for every euclidean space
X, and (ii) D(¢) = ¢* for every linear transformation of euclidean
spaces. A natural isomorphism u: A - D is defined by putting u(X): A(X) - D(X) to be the conjugate isomorphism a: X -> X*. The requirement that u, as a natural transformation, has to satisfy is that the diagram
u (X)
A(X) -
) D(X) D (0)
A (0)
A(Y)
D(Y) U(Y)
is commutative for every linear transformation 0: X -> Y. But this follows from the definition of the adjoint A (¢) = y5 of 0.
x
a
Y0
X*
Y*
No wonder that the theory of the adjoints runs so parallel to that of the dual transformations!
§22
LINEAR TRANSFORMATIONS OF EUCLIDEAN SPACES
285
C. Self-adjoint linear transformations Since a euclidVan space X is a finite-dimensional real linear space with an inner product, the endormorphisms of the linear space X can
be classified according to certain properties relative to the inner product. A most important class of endomorphisms of X is that of the self-adjoint transformations; these are defined as follows:
Let X be a euclidean space. Then an endomorphism 0 of the linear space X is a self-adjoint (or symmetric) transformation of the euclidean space X if _ or equivalently if DEFINITION 22.4.
(0 (x)ly) _ (xjO(y)) for all x,
Let us now study the matrix of a self-adjoint transformation relative to an orthonormal base. THEOREM 22.5. If 0 is an endomorphism of a euclidean space X and
(x, , ... , xn) is an orthonormal base of X, then the matrix M = a** of 0 relative to this base is given by
ai = (0(xi)Ix1)
i, j = 1, 2, ... , n.
PROOF. If follows from the orthonormality of the base (x1, ... , xn that if x = X1 x1 + .. + Xnxn is a vector of X, then X. = (xlxi) for i = 1, ... , n. Therefore we get
0(x1) _ (O(xi)lxl )x1 + ... + (O(x1)Ixn)xn for i = 1, 2,
... , n
and the theorem follows.
COROLLARY 22.6. Let X be a euclidean space, B = (x 1, ... , x ) an orthonormal base of X. Then for any endomorphism 0 of X and its adjoint , the following statements hold (a)
MBB () = MBB (ct)l
(b) det $ = det 0.
Let X be a euclidean space, and $ an endormorphism of the linear space X. If 0 is a self-adjoint transformation of the euclidean space X, then the matrix of 0 relative to any orthonormal base of X is a symmetric matrix. Conversely, if the matrix of 0 relative to some orthonormal base of X is a symmetric matrix, then 0 COROLLARY 22.7.
is a self-adjoint transformation of the euclidean space X.
286
VIII
INNER PRODUCT SPACES
PROOF. The first statement of the corollary follows from Corollary 22.6. Let B = (x1, ... , xn) be an orthonormal base of X, such that MBB(cb) = a** is a symmetric matrix. Then ' for any vectors x = Al x1 + .. + Anxn and y = µ1 x1 + ... + we get
(4)(x)ly) = (Ea,,XIxil Eµkxk) = Ea11X iiii and
O(y)) = (EAkxk I Eailµix;) WOW)
=
Eai,A;pi.
Since a.1 = air, (4)(x)ly) _ (xI4)(y)). Hence 0 is a self-adjoint trans-
formation of the euclidean space X. D. Eigenvalues of self adjoint transformations
Our main result is that every self-adjoint transformation of a euclidean space is a diagonalizable endomorphism. Let us first prove the following important lemma on invariant subspaces of an endomorphism of a linear space. LEMMA 22.8. Let X be a finite-dimensional real linear space and 0 an
endomorphism of X. Then there exists a 1-dimensional or a 2dimensional subspace of X invariant under ¢. PROOF. Let (x1,
... ,
xn) be an arbitrary base of X and M = a* * the
matrix of 0 relative to the base (x1, ... , xn ). Then the existence of an eigenvector x of 0 is equivalent to the existence of a non-zero n-tuple (p1 , . . , µn) of real numbers that is a solution of the .
following system of linear equations: a1 1 µ1
+ ... + an 1 An =
a12µ1 + ..
.
Aµ1
+ an2An = AIA2
a1ngi + ... + annµn = Aµn for some real number A. But this is the case if and only if A is a root of the characteristic polynomial p4) of OA. Now p4) is a polynomial with real coefficients; regarding it as a polynomial with complex coefficients and applying to it the fundamental theorem of algebra, we see that P4) always has a complex root A. We may therefore consider two cases: (i) A is real and (ii) A is not real.
§22
287
LINEAR TRANSFORMATIONS OF EUCLIDEAN SPACES
In the case (i), A is a eigenvalue of 0. Therefore there exists a non-zero vector x of X such that 0(x) = ?.x and the 1-dimensional subspace generated by x is invariant under 0. In the case (ii), we write A = (3 + iy where both Q, y are real and
i2 = -1, y * 0. Consider the linear transformation ': C" -- C" such that relative to the canonical base of C" the matrix of 41 is equal to M.
Then the linear transformations 0 and 4 as well as the matrix M all
have the same characteristic polynomial. Therefore, the above system of linear equations has a non-zero complex solution, say
(pt +iv,, ... , An +ivn): + an,(pn+ivn)=(I3+iy)(2i +iv,) a11(2i +iv,) + at2(pt + iv,) + ... +
an 2 (An + ivn) _ (R + iy) (p2 + iv2 )
...............................................
a,n(21 + iv1) + ... + a,,,, (pn +ivn)=(Q+iy)(pn +iv"). Separating the real and imaginary parts, we get
a, t p, + ... + an t An = apt - 7vt a, 2 p1 + ... + an 2 pn = Op2 _ yv2
ainp1 + ... + annpn = Qpn - '1'vn and
+ an i vn = 3vi + -nut + an2vn = (3v2 + ?'p2
a12v1 +
atnvn + ... + annvn = (vn + ypn where all terms are real numbers. From these equations, it follows that the vectors
X=21x1 + .
-
+ pnXn and y = v1x,
of the real linear space X satisfy the following equations
O(x) = Ox - yy and m(y) = yx + PY.
vn Xn
288
VIII
INNER PRODUCT SPACES
Therefore the subspace Y generated by the vectors x and y is is invariant under 0. Moreover since (pi + ivl , ... , P, + non-zero, the vectors x and y cannot be both zero. Therefore the invariant subspace Y has either the dimension 1 or 2.
For self-adjoint transformations of a euclidean space we are able to obtain a better result. Let ¢ be a self-adjoint transformation of a euclidean space X. Then there always exists a 1-dimensional subspace of X invariant under 0. COROLLARY 22.9.
PROOF. Obviously we need only show that the case (ii) of the proof of 22.8 is impossible under the additional assumption that 0 is self-
adjoint. If case (ii) presents itself then we get two vectors x and y of X which are not both zero together with two real numbers (3 and 7 * 0 such that
O(x)_(3x-7Y and 0(y)='yx+(3y. From these equations it follows that (O(x)IY) = Q(xly) - 7(YIY) (xIO(Y)) = 7(xlx) + R(xly) . Since 0 is self-adjoint, (¢(x)ly) = (xIO(y)). Therefore and
7[(xlx) + (Yly)] = 0. Hence x = 0 and y = 0, or y = 0, contradicting the assumption. Starting out from the very promising corollary 22.9 we proceed to prove our main result: THEOREM 22.10.
For every self-ad joint transformation 0 of a eucli-
dean space X, there exists an orthonormal base of X consisting of eigenvectors of 0.
PROOF. We shall prove this theorem by induction on the dimension of X. For dim(X) = 1, the theorem is trivial. Assume now that
the theorem holds for any n-dimensional euclidean space. Let dim(X) = n+ 1 and 0: X -> X be self-adjoint. Then by 22.9 there
is an eigenvector x of 0 with real eigenvalue X. For any vector y of X orthogonal to x, the vector 0(y) is again orthogonal to x, for (xlO(Y)) = (O(x)ly) = (Axly) = X(xl y) = 0.
§22
289
LINEAR TRANSFORMATIONS OF EUCLIDEAN SPACES
Consequently if Y is the orthogonal complement of the subspace
generated by the eigenvector x, then the linear transformation ¢': Y -+ Y, such that 0'(y) = 0(y), is a self-adjoint transformation of the n-dimensional euclidean space Y. By the induction assumption,
there is an orthonormal base (yl
,
... , yn) of Y consisting of
eigenvectors of 0'. Then (yl, . . . , yn, yn +1) , where yn + is an orthonormal base of X consisting of eigenvectors of 0.
x/II x II
In terms of matrices, we can formulate Theorem 22.10 as: every real symmetric matrix is diagonalizable. A more precise result is given in Corollary 22.13. E. Bilinear forms on a euclidean space The study of a bilinear form on a euclidean space is equivalent to
the study of a pair of bilinear forms on a real linear space, one of which is further known to be symmetric and positive definite. Here the conjugate isomorphism a: X -+ X* again plays a crucial role as in the study of endomorphisms. THEOREM 22.11. Let X be a euclidean space and let 41 be a bilinear
.form on the linear space X. Then there exists a unique endomorphism ' of X such that 1P (x,y) = (xIjy) for all x,yEX. PROOF. The uniqueness of 4 is an immediate consequence of the positive definiteness of the inner product. To prove the existence of we consider for each x EX the linear form T.,: X -+ R defined by 41, (y) = 4'(x,y) for ally r=-X.
Denoting by a: X - X * the conjugate isomorphism of §22A (i.e. (a(x))(y) = (ylx) for all x, y E X), we claim that the mapping >': X - X defined by IJi(x) = a 1 (4X) for all
is an endomorphism of X which satisfies the requirement of the theorem. If xl and x2 are two vectors of X, then it follows from
`I'(xI + x2, y) = 'I'(xl ,y) + `1'(x2, y) that
*X 1 +y2 = *X I + *X2
+ x2) = a1(4'1+2) = a' ('YXI + *X2) t (xl = a-l(*XI) + a-'(41 X2) _ ,i(xl) + Iy(x2). Similarly 1'!(Xx) = A1j/(x), Therefore
proving the linearity of ψ. For any two vectors x and y of X, we obtain

(ψ(x)|y) = (a⁻¹(Ψx)|y) = Ψx(y) = Ψ(x, y).

Therefore Ψ(x, y) = (ψ(x)|y) for all x, y ∈ X. This proves the theorem.

Conversely every endomorphism ψ of X determines a bilinear form Ψ on X by the relation Ψ(x, y) = (ψ(x)|y) for all x, y ∈ X. Therefore there exists a one-to-one correspondence between the endomorphisms of X and the bilinear forms on X. If φ corresponds to Ψ under this correspondence, then we say that φ represents Ψ. By means of this correspondence, we can formulate properties of endomorphisms as properties of the bilinear forms that they represent. For example, a vector x of X is called an eigenvector of a bilinear form Ψ if it is an eigenvector of the endomorphism φ which represents Ψ. More precisely, x is an eigenvector of Ψ with corresponding eigenvalue λ if and only if

Ψ(x, y) = λ(x|y) for all y ∈ X.

A further property of the correspondence is that every symmetric bilinear form is represented by a self-adjoint endomorphism and vice versa. Consequently the diagonalization theorem 22.10 gives rise to a bilinear form counterpart as follows:

THEOREM 22.12. For every symmetric bilinear form Ψ on a euclidean space X, there exists an orthonormal base B = (x1, ... , xn) of X and a family (λ1, ... , λn) of real numbers such that

Ψ(xi, xj) = λi(xi|xj) = λi δij for all i, j = 1, ... , n.

In other words, the matrix of Ψ relative to the base B is the diagonal matrix diag(λ1, ... , λn).
Hence if x = α1x1 + ... + αnxn and y = β1x1 + ... + βnxn are vectors of X, then

Ψ(x, y) = λ1α1β1 + ... + λnαnβn.

The diagonalization theorem 22.12 can also be paraphrased as follows. Let Φ and Ψ be two symmetric bilinear forms on a finite-dimensional real linear space X. If Φ is positive definite, then Φ and Ψ can be simultaneously diagonalized. More precisely there exists a
base B of X relative to which the matrix of Φ and the matrix of Ψ are respectively the diagonal matrices

diag(1, ... , 1)  and  diag(λ1, ... , λn).
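This simultaneous diagonalization can be sketched numerically as a generalized symmetric eigenproblem; in the following illustration (the matrices PHI and PSI are arbitrary examples, not taken from the text) scipy.linalg.eigh produces a base that reduces the positive definite form to the identity and the second form to a diagonal matrix.

```python
import numpy as np
from scipy.linalg import eigh

# PHI: symmetric positive definite (plays the role of the inner product).
PHI = np.array([[2.0, 1.0],
                [1.0, 2.0]])
# PSI: an arbitrary symmetric bilinear form.
PSI = np.array([[0.0, 3.0],
                [3.0, 1.0]])

# Generalized symmetric eigenproblem: PSI v = lambda * PHI v.
lams, B = eigh(PSI, PHI)        # columns of B are the base vectors

# Relative to the base B, PHI becomes the identity and PSI becomes diag(lams).
assert np.allclose(B.T @ PHI @ B, np.eye(2))
assert np.allclose(B.T @ PSI @ B, np.diag(lams))
```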
We leave the formulation of the corresponding theorem on quadratic forms to the reader.

F. Isometry

There is another important class of endomorphisms of a euclidean
space that consists of all endomorphisms which leave invariant the inner product of the euclidean space in question. These are characterized by the equivalent conditions in the following theorem.
THEOREM 22.13. Let X be a euclidean space and φ an endomorphism of X. Then the following statements are equivalent.
(i) ‖φ(x)‖ = ‖x‖ for all x ∈ X.
(ii) (φ(x)|φ(y)) = (x|y) for all x, y ∈ X.
(iii) φ is invertible and φ⁻¹ = φ̃.
(iv) For any orthonormal family (x1, ... , xp), the family (φ(x1), ... , φ(xp)) is orthonormal.
(v) For some orthonormal base (x1, ... , xn) of X, the family (φ(x1), ... , φ(xn)) is an orthonormal base of X.

PROOF. (i) ⟹ (ii). This follows from the cosine law: 2(x|y) = ‖x + y‖² − ‖x‖² − ‖y‖².
(ii) ⟹ (iii). If φ(y) = 0, then (y|y) = (φ(y)|φ(y)) = 0 and therefore
y = 0. Thus φ is injective and hence invertible. For any x, y of X, the condition (ii) yields that

(x|y) = (φ(x)|φ(y)) = (x|φ̃φ(y)).

Therefore φ̃ ∘ φ = iX and φ⁻¹ = φ̃.
(iii) ⟹ (iv). If (x1, ... , xp) is orthonormal, then

(φ(xi)|φ(xj)) = (xi|φ̃φ(xj)) = (xi|xj) = δij.

Therefore (φ(x1), ... , φ(xp)) is orthonormal.
(iv) ⟹ (v). This is obvious.
(v) ⟹ (i). Let (x1, ... , xn) be an orthonormal base of X for which (v) holds. Then for each x = λ1x1 + ... + λnxn, we get

‖x‖² = λ1² + ... + λn² = ‖φ(x)‖².

The proof of the theorem is complete.

We call an endomorphism φ of a euclidean space X an isometry or
an orthogonal transformation of X if any one of the equivalent
conditions of 22.13 is satisfied. It follows from (iii) of 22.13 that for any isometry φ of X

det(φ) = ±1.

An isometry φ is called a proper rotation if det φ = 1 and an improper rotation if det φ = −1.
EXAMPLE 22.14. Let X be a euclidean plane (i.e. a 2-dimensional euclidean space) and (x1, x2) an orthonormal base of X. Then the identity map iX of X is clearly a proper rotation of X, and so also is the isometry −iX. The isometry φ defined by

φ(x1) = x2,  φ(x2) = x1

has determinant −1 and is therefore an improper rotation. To perform the improper rotation φ "physically" one would have to lift up the plane, turn it over and then put it down again, whereas both iX and −iX can be executed without lifting.

[Figure: the improper rotation φ interchanging the base vectors x1 and x2.]
The set of all orthogonal transformations of a euclidean space X together with composition forms a group O(X), called the orthogonal group of the euclidean space X, which is a subgroup of the general linear group GL(X) of all automorphisms of the linear space X. Similarly the proper rotations of X form a group SO(X), called the special orthogonal group of the euclidean space X, which is a subgroup of O(X). These groups play an important role in euclidean geometry.

A real square matrix A is called an orthogonal matrix if AᵗA = AAᵗ = I, i.e. if Aᵗ = A⁻¹. In other words the row vectors (column vectors) of A form an orthonormal base of the arithmetical euclidean space.
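As a quick numerical sketch of this definition (an illustration added here; the matrix and the helper name are arbitrary), one can test AᵗA = I directly and read off the determinant, which must be ±1 for an isometry.

```python
import numpy as np

def is_orthogonal(A, tol=1e-10):
    """Return True if the real square matrix A satisfies A^t A = I."""
    A = np.asarray(A, dtype=float)
    return np.allclose(A.T @ A, np.eye(A.shape[0]), atol=tol)

# Its rows form an orthonormal base, so this matrix is orthogonal.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(is_orthogonal(A))     # True
print(np.linalg.det(A))     # -1.0: an improper rotation (cf. Example 22.14)
```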
Between orthogonal matrices and isometries we have the following relation.

THEOREM 22.15. Let φ be an endomorphism of a euclidean space X. If φ is an isometry, then the matrix of φ relative to any orthonormal base of X is an orthogonal matrix. Conversely if relative to some orthonormal base of X the matrix of φ is an orthogonal matrix, then φ is an isometry of X.
Consequently the matrix of the change of one orthonormal base to another orthonormal base is an orthogonal matrix. Combining this with Theorem 22.10, we obtain:

COROLLARY 22.16. For any real symmetric square matrix A of order n, there is an orthogonal matrix P of the same order, such that the matrix PAP⁻¹ is a diagonal matrix.
EXAMPLE 22.17. The orthogonal matrices of order 2 have a rather simple form. Let A be an orthogonal matrix of order 2 with det A = 1. Then a simple calculation in trigonometry shows that A has the form

[ cos θ   −sin θ ]
[ sin θ    cos θ ]

for some real number θ. If φ is a proper rotation of a euclidean plane X and B is an orthonormal base of X, then the matrix of φ relative to B has the above form. We call θ the angle of rotation of φ, and it is easily seen that θ does not depend on the choice of the orthonormal base B. Clearly iX and −iX have as angles of rotation 0 and π respectively. It is easily verified that if a proper rotation φ leaves invariant a single non-zero vector of X, then φ = iX.
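A small numerical sketch of Example 22.17 (added as an illustration; the helper name is of course arbitrary): build the matrix of a proper rotation, confirm it is orthogonal with determinant 1, and recover the angle from its first column.

```python
import numpy as np

def rotation_matrix(theta):
    """Matrix of the proper rotation with angle theta relative to an orthonormal base."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

R = rotation_matrix(np.pi / 3)
assert np.allclose(R.T @ R, np.eye(2))      # orthogonal
assert np.isclose(np.linalg.det(R), 1.0)    # proper rotation

# The angle can be read off from the first column of the matrix.
theta = np.arctan2(R[1, 0], R[0, 0])
print(theta)                                 # ~ pi/3
```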
EXAMPLE 22.18. Let Y be a 3-dimensional euclidean space and ρ a proper rotation of Y different from iY. Then the characteristic polynomial Pρ is a polynomial of degree 3 with leading coefficient equal to −1 and constant term equal to 1. It follows from the elementary theory of equations that Pρ has at least one positive real root λ; therefore ρ has eigenvectors. Let a be an eigenvector of ρ with eigenvalue λ. Then it follows from ‖a‖ = ‖ρa‖ that λ = ±1. Therefore λ = 1.

If A is the 1-dimensional invariant subspace generated by a, then ρ leaves each vector of A fixed. We claim that ρx ≠ x for all x ∉ A. Suppose to the contrary that x ∉ A is such that ρx = x. Since ρ leaves fixed all vectors of the plane generated by a and x, we may assume without loss of generality that a ⊥ x. Applying the strong version of the supplementation theorem, we can find a vector y of Y such that (a, x, y) is an orthonormal base of Y. Since ρ is an orthogonal transformation, either ρy = y or ρy = −y. The first case is impossible since ρ ≠ iY; the second case is impossible since det ρ = 1. Therefore A consists of all vectors left fixed by ρ; A is called the axis of rotation of ρ. It follows from 22.13 (ii) that X = A⊥ is invariant under ρ. Denoting by φ the endomorphism of X defined by ρ, we see easily that φ is a proper rotation of X. Applying the result of 22.17, we see that the matrix of ρ relative to any orthonormal base (e1, e2, e3) with e3 ∈ A is of the form
[ cos θ   −sin θ   0 ]
[ sin θ    cos θ   0 ]
[   0        0     1 ]

where θ is called the angle of rotation of ρ.
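The axis and angle described in Example 22.18 can be computed numerically. The sketch below (an added illustration; the function name and test matrix are arbitrary) finds the axis as an eigenvector for the eigenvalue 1, and recovers the angle from the trace, using the identity trace = 1 + 2 cos θ that is read off from the matrix form above.

```python
import numpy as np

def axis_and_angle(R):
    """Axis (unit eigenvector for eigenvalue 1) and angle of a 3x3 proper rotation R."""
    eigvals, eigvecs = np.linalg.eig(R)
    k = np.argmin(np.abs(eigvals - 1.0))         # eigenvector belonging to eigenvalue 1
    axis = np.real(eigvecs[:, k])
    axis /= np.linalg.norm(axis)
    # from the matrix form above, trace(R) = 1 + 2 cos(theta)
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    return axis, angle

# rotation by pi/2 about the third base vector
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
print(axis_and_angle(R))   # axis ~ (0, 0, 1), angle ~ pi/2
```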
G. Exercises

1. Show that if φ is a self-adjoint linear transformation of a euclidean space X, then there exists a unique self-adjoint linear transformation ψ of X such that ψ³ = φ.

2. Let φ be a self-adjoint linear transformation of a euclidean space X. φ is said to be positive definite if (φ(x)|x) > 0 for all non-zero x ∈ X. Prove that
(i) φ is positive definite if and only if all eigenvalues of φ are positive;
(ii) for every positive definite φ there is a unique positive definite ψ such that ψ² = φ.
3. Let φ be a mapping of a euclidean space X into itself. Prove that if (φx|φy) = (x|y) for all x, y ∈ X, then φ is an isometry.

4. Let a be a unit vector in a euclidean space X. Define φ: X → X by putting φ(x) = x − 2(a|x)a. Prove that (i) φ is an isometry (such an isometry is called a reflexion); (ii) det φ = −1.

5. Let φ be an isometry of an n-dimensional euclidean space X. Prove that if 1 is an eigenvalue of φ and if all eigenvectors with eigenvalue 1 form an (n−1)-dimensional subspace of X, then φ is either a reflexion or the identity.

6. Let a and b be two distinct unit vectors of a euclidean space X. Prove that there exists a reflexion φ such that φa = b.

7. Show that every isometry of a euclidean space is a product of reflexions.

8. A linear transformation φ of a euclidean space X is said to be skew self-adjoint if
(φx|y) = −(x|φy) for all x, y ∈ X.
Prove that
(i) φ is skew self-adjoint if and only if the matrix of φ relative to an orthonormal base of X is anti-symmetric;
(ii) if φ is skew self-adjoint and if Y is an invariant subspace of φ, then Y⊥ is an invariant subspace of φ.

9. Prove that if λ is an eigenvalue of a skew self-adjoint linear transformation, then λ = 0.
10. Show that for every skew self-adjoint linear transformation φ of a euclidean space, there exists a base B such that MBB(φ) is block-diagonal, with 2 × 2 blocks of the form

[ 0   −βi ]
[ βi    0 ]

along the diagonal and all remaining terms equal to 0.
11. Give an example of two self-adjoint linear transformations whose product is not self-adjoint.

12. Let φ and ψ be self-adjoint linear transformations. Prove that φ∘ψ + ψ∘φ is self-adjoint and φ∘ψ − ψ∘φ is skew. What happens if both φ and ψ are skew? What happens if one of φ and ψ is self-adjoint and the other one is skew?

13. For each pair of complex numbers α and β, define
(α|β) = Re(αβ̄).
(a) Show that RC is a 2-dimensional euclidean space with the inner product defined above.
(b) For each complex number γ, define φγ: RC → RC by φγ(α) = γα. Show that φγ is a linear transformation of the real linear space RC. Find the matrix of φγ relative to the base (1, i) of RC.
(c) For what values of γ is φγ an isometry?

14. Show that an isometry φ is self-adjoint if φ² is the identity mapping.

15. Let (e1, ... , en) be an orthonormal base of eigenvectors of a self-adjoint linear transformation φ with corresponding eigenvalues λ1, ... , λn. Define for every real number α
φα = φ − α.
(a) Prove that if α is different from all eigenvalues λ1, ... , λn of φ, then
φα⁻¹(x) = Σi (x|ei)/(λi − α) ei.
(b) Suppose α is an eigenvalue of φ. Prove that the equation
φα(x) = y
is solvable if and only if y is orthogonal to the eigenspace Xα. Find a solution of the equation.
16. Bring the following quadratic forms by appropriate orthogonal transformations into diagonal form:
(a) x1² + 2x2² + 3x3² − 4x1x2 − 4x2x3;
(b) x1² − 2x2² − 2x3² − 4x1x2 + 4x1x3 + 8x2x3;
(c) 2x1x2 + 2x3x4;
(d) x1² + x2² + x3² + x4² − 2x1x2 + 6x1x3 − 4x1x4 − 4x2x3 + 6x2x4 − 2x3x4.
17. Find the eigenvalues of the quadratic form
φ(x) = 2 Σi<j xixj.
§23. Unitary Spaces
In the present §23 we study unitary spaces, the complex counterpart of euclidean spaces. We shall use in the sequel the customary representation of complex numbers. The letter i is used exclusively to denote the imaginary unit; for any complex number λ we write λ = α + iβ, where the real numbers α and β are the real and the imaginary parts of λ respectively. The complex conjugate of λ is the complex number λ̄ = α − iβ, and the modulus of λ is the non-negative real number |λ| = √(α² + β²).
A. Orthogonality
Let X be a complex linear space. A sesquilinear form on X is a mapping Φ: X² → C such that

Φ(x1 + x2, y) = Φ(x1, y) + Φ(x2, y)
Φ(λx, y) = λΦ(x, y)
Φ(x, y1 + y2) = Φ(x, y1) + Φ(x, y2)
Φ(x, λy) = λ̄Φ(x, y)

for any complex number λ and any vectors x, x1, x2, y, y1, y2 of X. In other words Φ is linear in its first argument and semi-linear (relative to the conjugate automorphism of C) in its second argument; one might say, Φ is one-and-a-half linear.
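To make the definition concrete, here is a small numerical check (an added illustration, with an arbitrary choice of vectors) that the standard form Φ(x, y) = Σ xi ȳi on C² is linear in its first argument and only semi-linear in its second.

```python
import numpy as np

def phi(x, y):
    """Standard sesquilinear form on C^n: linear in x, conjugate-linear in y."""
    return np.sum(x * np.conj(y))

rng = np.random.default_rng(0)
x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
y = rng.standard_normal(2) + 1j * rng.standard_normal(2)
lam = 2.0 - 3.0j

assert np.isclose(phi(lam * x, y), lam * phi(x, y))            # linear in x
assert np.isclose(phi(x, lam * y), np.conj(lam) * phi(x, y))   # semi-linear in y
```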
A sesquilinear form Φ on a complex linear space X is a hermitian form if, for all x, y ∈ X, Φ(x, y) is the complex conjugate of Φ(y, x). It follows that if Φ is a hermitian form, then Φ(x, x) is a real number for all vectors x of X. Hence we have the following definition: a hermitian form Φ of a complex linear space X is positive definite if Φ(x, x) > 0 for all non-zero vectors x of X. If Φ is a positive definite hermitian form on X, then the conjugate Φ̄ of Φ, defined by letting Φ̄(x, y) be the complex conjugate of Φ(x, y) for all x, y ∈ X, is again a positive definite hermitian form on X.

EXAMPLE 23.1. Let X be a complex linear space of finite dimension n ≥ 1 and (x1, ... , xn) a base of X. If Φ is a sesquilinear form on X, then for any vectors x = λ1x1 + ... + λnxn and y = μ1x1 + ... + μnxn of X we get
Φ(x, y) = Σi,j λi μ̄j Φ(xi, xj).

Therefore the sesquilinear form Φ is completely determined by the complex square matrix A = (αij) of order n whose terms are

αij = Φ(xi, xj),  i, j = 1, 2, ... , n.

Conversely every complex square matrix A = (αij) of order n determines uniquely a sesquilinear form Φ relative to the base (x1, ... , xn) by

Φ(x, y) = Σi,j λi αij μ̄j,  where x = Σi λi xi and y = Σj μj xj.

Under this one-to-one correspondence between matrices and sesquilinear forms, the hermitian forms Φ correspond to the hermitian matrices A = (αij), i.e. the matrices whose terms satisfy the equations αij = ᾱji for i, j = 1, ... , n.
A unitary space is an ordered pair (X, Φ) that consists of a finite-dimensional complex linear space X and a positive definite hermitian form Φ. When there is no danger of confusion about the hermitian form Φ in question, we shall simply denote the unitary space (X, Φ) by X. Furthermore for any vectors x and y of X we write (x|y) for
Φ(x, y) and call it the inner product in the unitary space X of the vectors x and y. Thus a unitary space is a finite-dimensional complex linear space X together with a complex-valued mapping that assigns to each pair x, y of vectors of X a complex number (x|y), called their inner product, such that for each complex number λ and all vectors x, y and z of X the following conditions are satisfied:

[U1] (x|y) is the complex conjugate of (y|x);
[U2] (x + y|z) = (x|z) + (y|z);
[U3] (λx|y) = λ(x|y);
[U4] (x|x) > 0 if x ≠ 0.
It follows that if (X, Φ) is a unitary space, then (X, Φ̄), where Φ̄ is the conjugate of Φ, is a unitary space called the conjugate unitary space of the unitary space (X, Φ). The conjugate of the conjugate of a unitary space (X, Φ) is identical with (X, Φ) itself.

EXAMPLE 23.2. Let Cⁿ be the complex arithmetical linear space. An inner product in Cⁿ is defined by

(x|y) = α1β̄1 + ... + αnβ̄n,

where x = (α1, ... , αn) and y = (β1, ... , βn). Another inner product in Cⁿ is defined by

(x|y) = λ1α1β̄1 + ... + λnαnβ̄n,

where the λi are arbitrary positive real numbers. These inner products satisfy the axioms [U1]–[U4] and turn Cⁿ into unitary spaces.

If X is a unitary space and x is a vector of X, then the norm ‖x‖ of the vector x is again defined as the non-negative square root

‖x‖ = √(x|x).

Vectors of X with norm equal to 1, i.e. ‖x‖ = 1, are called unit vectors of the unitary space X. Orthogonality and its related concepts are formulated similarly to those given in §21B. In particular, using similar arguments, we can prove the following theorems.
THEOREM 23.3. In every unitary space there exists an orthonormal base.
THEOREM 23.4. Let X be a unitary space and Y a subspace of the linear space X. Then the orthogonal complement

Y⊥ = {x ∈ X : (x|y) = 0 for all y ∈ Y}

is a subspace of the linear space X. Furthermore X = Y ⊕ Y⊥.
As a consequence of 23.4 we obtain the fundamental BESSEL's inequality: if x = y + z where y ∈ Y and z ∈ Y⊥, then ‖x‖ ≥ ‖y‖. It follows from BESSEL's inequality that for any vectors x and y of a unitary space the following inequalities hold:

SCHWARZ's inequality: ‖x‖ ‖y‖ ≥ |(x|y)|;
the triangle inequality: ‖x‖ + ‖y‖ ≥ ‖x + y‖.
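A quick numerical sanity check of these two inequalities for complex vectors (an added illustration with arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)

inner = np.sum(x * np.conj(y))                        # (x|y)
norm = lambda v: np.sqrt(np.sum(v * np.conj(v)).real)

assert norm(x) * norm(y) >= abs(inner)                # Schwarz's inequality
assert norm(x) + norm(y) >= norm(x + y) - 1e-12       # triangle inequality
```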
B. The conjugate isomorphism
The results obtained in §22A can also be adapted for unitary spaces. Thus if X is a unitary space, then the conjugate isomorphism a: X → X* of the linear space X onto its dual space X* is defined by

a(x) = ax  and  ax(y) = (y|x) for all x, y ∈ X.

By means of the conjugate isomorphism, a positive definite hermitian form on X* is defined by

(f|g) = (a⁻¹(g)|a⁻¹(f)), which is the complex conjugate of (a⁻¹(f)|a⁻¹(g)), for all f, g ∈ X*.

Then the dual space X* of the linear space X endowed with the inner product defined above is a unitary space called the dual space of the unitary space X. Now the conjugate isomorphism a: X → X* of the linear spaces can be regarded as an "isomorphism" between the dual space X* of the unitary space X and the conjugate space of the unitary space X (see §23E).

In exactly the same way we prove that a′∘a is identical with the natural isomorphism ιX: X → X**, where a′: X* → X** is the conjugate isomorphism.
C. The adjoint
Let X and Y be unitary spaces. Then the adjoint of a linear trans-
formation φ: X → Y of the linear space X into the linear space Y is defined as the linear transformation φ̃: Y → X such that

(φ(x)|y) = (x|φ̃(y)) for all x ∈ X and y ∈ Y.

The existence and uniqueness of φ̃ can be proved in exactly the same way as in §22B. Furthermore the results of §22B can be adapted for unitary spaces, except the formula (d), which is modified here as: the adjoint of λφ is λ̄φ̃.
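In matrix terms the adjoint corresponds to the conjugate transpose. The following sketch (an added illustration; the matrix and vectors are arbitrary) checks the defining relation numerically for the standard inner product on Cⁿ.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

inner = lambda u, v: np.sum(u * np.conj(v))   # (u|v): linear in u, semi-linear in v

A_adj = A.conj().T                            # conjugate transpose = adjoint
assert np.isclose(inner(A @ x, y), inner(x, A_adj @ y))
```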
D. Self-adjoint transformations
Let X be a unitary space. An endomorphism φ of the linear space X is called a self-adjoint transformation of the unitary space X if φ̃ = φ, or equivalently

(φ(x)|y) = (x|φ(y)) for all x, y ∈ X.

Self-adjoint transformations of a unitary space X are related to (complex) hermitian matrices in the same way as self-adjoint transformations of a euclidean space are related to (real) symmetric matrices (see Corollary 22.6).
In the present case, self-adjoint transformations are characterized by a third condition.

THEOREM 23.4. Let X be a unitary space and φ an endomorphism of the linear space X. Then φ is a self-adjoint transformation of the unitary space X if and only if the inner product (x|φ(x)) is real for all x of X.

PROOF. If φ is self-adjoint, then (x|φ(x)) = (φ(x)|x), which is the complex conjugate of (x|φ(x)); therefore (x|φ(x)) is real. Conversely, assume that (x|φ(x)) is real for all x of X and consider

(x + y|φ(x + y)) = (x|φ(x)) + (y|φ(y)) + (x|φ(y)) + (y|φ(x)),
(x + iy|φ(x + iy)) = (x|φ(x)) + (y|φ(y)) − i(x|φ(y)) + i(y|φ(x)),

where the left hand sides as well as the first two terms on the right hand sides of these equations are all real. Therefore (y|φ(x)) is the complex conjugate of (x|φ(y)), and hence

(φ(x)|y) = (x|φ(y)).
An important property of self-adjoint transformations of a unitary space is formulated below:

THEOREM 23.5. The eigenvalues of a self-adjoint transformation of a unitary space are real numbers.
PROOF. Let λ be an eigenvalue of a self-adjoint transformation φ of a unitary space X and x ≠ 0 a corresponding eigenvector. Then φ(x) = λx, and

(x|φ(x)) = (φ(x)|x) = λ(x|x).

It follows from 23.4 that (x|φ(x)) is real; on the other hand (x|x) is also real and non-zero. Therefore λ is a real number.

Consequently all eigenvalues of a hermitian matrix are real numbers.
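A numerical illustration of this consequence (added here, with an arbitrary hermitian matrix): the eigenvalues of a hermitian matrix, even when computed by a general-purpose complex eigenvalue routine, have vanishing imaginary parts.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = B + B.conj().T                        # H is hermitian by construction

eigenvalues = np.linalg.eigvals(H)        # general routine, returns complex numbers
assert np.allclose(eigenvalues.imag, 0.0) # ... yet they are all real
```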
E. Isometry

Let X be a unitary space. An endomorphism φ of the linear space X is called an isometry, an isometric transformation or a unitary transformation of the unitary space X if

(x|y) = (φ(x)|φ(y)) for all x, y ∈ X.

The results of §22F are adaptable to the present case without major modification. We observe here that whenever the cosine law of euclidean space is used in §22, we have to substitute for it its counterpart:

2(x|y) = ‖x + y‖² + i‖x + iy‖² − (1 + i)(‖x‖² + ‖y‖²).
F. Normal transformation

Let X be a unitary space. We call an endomorphism φ of the linear space X a normal transformation of the unitary space X if

φ ∘ φ̃ = φ̃ ∘ φ,

or equivalently if

(φ(x)|φ(y)) = (φ̃(x)|φ̃(y)) for all x, y ∈ X.

Thus unitary transformations and self-adjoint transformations of the unitary space X are normal transformations of the unitary space
X. The following is an example of a normal transformation which is neither unitary nor self-adjoint.

EXAMPLE 23.6. Let (x1, x2) be an orthonormal base of a 2-dimensional unitary space X. Consider the linear transformation φ of X defined by

φ(x1) = ix1,  φ(x2) = 0.

Then the adjoint φ̃ of φ is such that

φ̃(x1) = −ix1,  φ̃(x2) = 0.

φ is therefore a normal transformation of X, but it is neither self-adjoint nor unitary.

In §19, we defined a diagonalizable endomorphism φ of a linear
space X over Λ as an endomorphism of X for which a base of X exists that consists of eigenvectors of φ. Here we are interested in endomorphisms φ of a unitary space X for which an orthonormal base of X exists that consists of eigenvectors of φ. These are then the nicest endomorphisms of the unitary space X, and we are tempted to call them the orthonormally diagonalizable endomorphisms of the unitary space X. However, this is unnecessary, since it turns out that
they are just the normal transformations defined above. Clearly orthonormally diagonalizable endomorphisms are normal transformations; the converse will be proved in 23.8.

LEMMA 23.7. Let φ be a normal transformation of a unitary space X. Then for any complex number λ and any non-zero vector x of X,

φ(x) = λx if and only if φ̃(x) = λ̄x.
PROOF. Since φ is a normal transformation, for any complex number λ the endomorphism φ − λ is also a normal transformation, and by the defining property of normal transformations ‖ψ(x)‖ = ‖ψ̃(x)‖ for every normal ψ and every x. On the other hand the adjoint of φ − λ is φ̃ − λ̄. Therefore for all non-zero vectors x of X

‖φ(x) − λx‖ = ‖φ̃(x) − λ̄x‖.

Hence the lemma follows.
Our main result on normal transformations is as follows:
THEOREM 23.8. Let X be a unitary space. If φ is a normal transformation of X, then there exists an orthonormal base of X consisting of eigenvectors of φ.
PROOF. We prove this theorem by induction on the dimension of X. For dim X = 1, the theorem is self-evident. Assume that the theorem holds for unitary spaces of dimension n. Let X be a unitary space of dimension n + 1 and φ a normal transformation of X. Since φ is an endomorphism of a complex linear space, there exist (complex) eigenvalues. Let λ be an eigenvalue of φ and x ≠ 0 a corresponding eigenvector. If Y is the orthogonal complement of the subspace generated by x, then Y is an n-dimensional unitary space. Now for all vectors y of Y, we have by 23.7

(φ(y)|x) = (y|φ̃(x)) = (y|λ̄x) = λ(y|x) = 0.

Therefore φ(y) ∈ Y for all y ∈ Y. Hence φ defines an endomorphism φ′ of Y by

φ′(y) = φ(y) for all y ∈ Y.

Since φ is a normal transformation of the unitary space X, φ′ is a normal transformation of the unitary space Y. By the induction assumption there exists an orthonormal base (y1, ... , yn) of Y consisting of eigenvectors of φ′. Therefore (y1, ... , yn, yn+1), where yn+1 = x/‖x‖, is an orthonormal base of X consisting of eigenvectors of φ.
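As a numerical companion to Theorem 23.8 (an added sketch, not a statement from the text), the matrix of the normal transformation of Example 23.6, diag(i, 0), commutes with its conjugate transpose, and its complex Schur form, computed with scipy, is already diagonal with unitary (orthonormal) columns of eigenvectors.

```python
import numpy as np
from scipy.linalg import schur

# Matrix of the normal transformation of Example 23.6 relative to (x1, x2).
N = np.array([[1j, 0.0],
              [0.0, 0.0]])

# Normal: it commutes with its adjoint (conjugate transpose) ...
assert np.allclose(N @ N.conj().T, N.conj().T @ N)
# ... but it is neither hermitian nor unitary.
assert not np.allclose(N, N.conj().T)
assert not np.allclose(N @ N.conj().T, np.eye(2))

# For a normal matrix the complex Schur form T is diagonal, and the
# columns of the unitary Q form an orthonormal base of eigenvectors.
T, Q = schur(N, output='complex')
assert np.allclose(Q @ Q.conj().T, np.eye(2))
assert np.allclose(T, np.diag(np.diag(T)))
```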
G. Exercises

1. Let A = (αij) be a hermitian matrix. Show that Aᵗ and Ā = (ᾱij) are hermitian. If A is invertible, show that A⁻¹ is hermitian.

2. Let A and B be hermitian matrices of the same size. Show that A + B is hermitian. If AB = BA, show that AB is hermitian.

3. Let X be a unitary space and φ ∈ End(X). Show that if (φ(x)|x) = 0 for all x ∈ X, then φ = 0. Show that the above statement is false for linear transformations of euclidean spaces.
4. Let X be a unitary space and φ ∈ End(X). Prove that the following conditions are equivalent.
(a) φ is a normal transformation of X.
(b) ‖φx‖ = ‖φ̃x‖ for every x ∈ X.
(c) There exist self-adjoint linear transformations α and β such that φ = α + iβ and α∘β = β∘α.

5. Let X be a unitary space and φ ∈ End(X) a self-adjoint linear transformation. Show that iX + iφ and iX − iφ are automorphisms.

6. Show that the eigenvalues of a unitary transformation are all of the form e^(iθ) with real θ.

7. Show that if
αnTⁿ + αn−1Tⁿ⁻¹ + ... + α1T + α0
is the characteristic polynomial of an endomorphism φ of a unitary space X, then
ᾱnTⁿ + ᾱn−1Tⁿ⁻¹ + ... + ᾱ1T + ᾱ0
is the characteristic polynomial of the adjoint φ̃. From this, what can you say about the characteristic polynomial of a self-adjoint linear transformation?

8. Show that if φ is an automorphism of a unitary space X, then there exist a unitary transformation φ1 and a self-adjoint transformation φ2 such that φ = φ1 ∘ φ2.
INDEX abelian group 4 abstract algebra 4 addition 6 additive group 6 additive inverse 6 adjoint 300 adjoint functors 90 adjoint transformation 282 affine group 117 affine space 98 affine transformation 114 affinity 117 algebraic structure 4 algebraic structure of abelian group 5 angle of rotation 293, 294 annihilator 80 antisymmetric linear mapping 197 antisymmetrization 213 arithmetical linear space 12 arithmetical projective space 123 arrow category 89 associative A- algebra 66 associative law 7, 9 automorphism 66, 146 axially perspective 128 axioms 4 axis of perspectivity 153 axis of rotation 294
barycentre 102 barycentric coordinates 110 base 19 Bessel's inequality 275, 300 bifunctor 91 bilinear form 158, 197 bilinear mapping 197 bilinearity 64 canonical affine space 100 canonical decomposition 259 canonical injection 68 canonical projection 68 cardinal number 34 cartesian product 17, 68 category 79, 86 category of additive groups 87 category of linear spaces 88
central perspectivity 154 centrally perspective 128 centre of perspectivity 153 characteristic polynomial 244, 245 characteristic value 242 characteristic vector 242 class 85
cofactor 221 cokernel 61 collinear 109 collineation 153 column index 156 column vector 157
commutative A-algebra 231 commutative law 7 complementary subspace 39 complete quadrangle 138 complete quadrilateral 138 composite 4 composition 86 conjugate 298 conjugate isomorphism 281 conjugate space 74 contravariant functor 90 convex 112 convex closure 112 coordinates 28 coproduct 94 cosine law 269 covariant functor 89 covector 74 Cramer's rule 223 cross ratio 131 cyclic permutation 210
determinant 215 determinant function 208 diagonal matrix 245 diagonal of a matrix 157 diagonalizable endomorphism 245 difference 7 dimension 27, 34 direct product 69 direct sum 17, 39 direction 106 distributive law 9 double summation 8
dual configuration 138 dual functor 91 dual space 74 dual theorem 138 dual transformation 76 echelon matrix 180 eigenspace 250 eigenvalue 242, 245 eigenvector 242 elementary Jordan matrix 263 elementary matrix 186 elementary transformation 178 endomorphism 59,66 equipotent 34 equivalent objects 87 Euclidean algorithm 233 Euclidean space 268 even permutation 210 exact sequence 60 expansion by a column 223 expansion by a row 221 extension by linearity 48 exterior product 197 external composition law 4
family of finite support 9 family of generators 19 finite-dimensional linear space 24 free linear space 13, 14 functor 79, 89 fundamental theorem of algebra 236 fundamental theorem of projective geometry
ideal 234 identity 87 identity functor 89 identity matrix 162 image 55 improper rotation 292 index of nullity 205 index of positivity 206 infinite-dimensional linear space 27 initial object 93 inner product 197, 299 internal composition law 4 invariant subspace 239 inverse isomorphism 52 inverse matrix 169 invertible matrix 169 involution 71 irreducible polynomial 238 isometric transformation 302 isometry 292, 302 isomorphic linear spaces 53 isomorphism 52, 87
Jacobi's identity 197 join 37 Jordan form 263 kernel 54 Kronecker symbol 65
length 269 line 107 line at infinity 120 linear combination 18 146 linear construction 133 linear dependence 21 general linear group 169 linear form 74 generalized associative law 8 linear function 74 generalized commutative law 8 linear functional 74 generalized distributive law 11 linear independence 21 generate 19 linear mapping 46 Gram-schmidt orthogonalization method 273 linear space 9, 11 greatest common divisor 234 linear transformation 46 group of automorphism 67 linear variety 105 linearity 46 Hamilton-Cayley theorem 253 linearly dependent 108 harmonic quadruple 131 linearly independent 108 hermitian form 298 hermitian matrix 298 Markov matrix 163 Hilbert space 271 matrix 155 homogeneous coordinates 125 matrix of a bilinear form 199 homogeneous part of a system of equations matrix of a change of bases 170 178 matrix of a linear transformation 166 homomorphism 46 maximal element 32
midpoint 103
minor 222 mixed product 203 morphism 87 multilinear form 202 multiple 9 multiple root 236 multiplicity of a root 236
projective structure theorem 151 projectivity 62 proper rotation 292 pseudo-euclidean space 271 Pythagoras theorem 272 quadratic form 201 quotient space 41
natural isomorphism 93 natural transformation 79, 92 neutral element 6 nilpotent 255 non-singular matrix 169 norm 269, 299 normal transformation 302 normed linear space 278 nullvector 11
radical of a bilinear form 204 range of points 141 rank 56 rank of a matrix 176 reduced matrices 263 reflexion 295 relatively prime polynomials 235 replacement theorem 25 root of a polynomial 235 row index 156
object 85
scalar 10 scalar multiplication 9 scalar product 197 Schroder-Bernstein theorem 34 Schwarz's inequality 276, 300 segment 112 self-adjoint transformation 285, 301 semi-linear transformation 147 semi-simple endomorphism 245 sesquilinear form 198, 297 sign of a permutation 211 simple root 236 size of a matrix 156 skew self-adjoint 295 skew-symmetric bilinear mapping 197 special orthogonal group 293 subspace 35 substitution 235 sum 6, 37 summand 6, 8 summation sign 8 supplementation theorem 25, 33 symmetric bilinear mapping 197 symmetric transformation 285 system of homogeneous linear equations 177 system of linear equations 175
odd permutation 210 orthogonal 272 orthogonal complement 274 orthogonal endomorphisms 71 orthogonal family 272 orthogonal group 293 orthogonal matrix 293 orthogonal projection 275 orthogonal transformation 292 orthonormal base 272 orthonormal family 272 orthonormally diagonalizable 303 parallel coordinates 110 parallelogram identity 269 pencil of lines 141 pencil of planes 141 permutation 209 perpendicular 272 perspectivity 153 point at infinity 120 polynomial 232 positive definite 298 positive definite matrix 166 positive definite quadratic form 205 principle of duality 138 product 9, 94 projection 67, 71 projective group 153 projective isomorphism 142 projective line 123 projective plane 123 projective space 122
tensor product 203 term of a matrix 156 terminal object 93 theorem of Desargues 129 theorem of Pappus 127 trace 16S trace of an endomorphism 244
transpose of a matrix 158 transposition 211 triangle inequality 300 triangular matrix 251 trilinear form 203 unit vector 299 unitary transformation 302 upper bound 32 vector 110
vector product 197 vector space 9 weight 102
zero linear transformation 46 zero of a polynomial 235 zero vector 11 Zorn's lemma 32