DIFFERENTIAL GEOMETRY AND THE CALCULUS OF VARIATIONS
This is Volume 49 in MATHEMATICS IN SCIENCE AND ENGINEERING, a series of monographs and textbooks edited by RICHARD BELLMAN, University of Southern California. A complete list of the books in this series appears at the end of this volume.
DIFFERENTIAL GEOMETRY AND THE CALCULUS OF VARIATIONS Robert Hermann UNIVERSITY OF CALIFORNIA SANTA CRUZ, CALIFORNIA
ACADEMIC PRESS New York and London
1968
COPYRIGHT © 1968, BY ACADEMIC PRESS INC.
ALL RIGHTS RESERVED. NO PART OF THIS BOOK MAY BE REPRODUCED IN ANY FORM, BY PHOTOSTAT, MICROFILM, OR ANY OTHER MEANS, WITHOUT WRITTEN PERMISSION FROM THE PUBLISHERS.
ACADEMIC PRESS INC.
111 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by ACADEMIC PRESS INC. (LONDON) LTD. Berkeley Square House, London W.1
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 68-14664
PRINTED IN THE UNITED STATES OF AMERICA
Preface

Differential geometry has radically changed in the last twenty years: A “global” approach based on the theory of manifolds and inspired, at least in part, by progress in the sister field of topology has replaced the traditional methods. However, unlike topology (and like, say, analysis) the problems have not really changed, and the student who ignores the history of the subject cuts himself off from the richest sources of intuition. In fact, one might say that the new methods are just a systematization of viewpoints that have always been inherent in the subject, at least in the work of such masters as Lie, Darboux, Cartan, Levi-Civita, and Carathéodory. (These are the men from the classical period of differential geometry whose work will appear often in this book.)

This volume is meant to serve a variety of functions. It was originally planned to show to mathematically inclined engineers and physicists how differential forms and vector fields could be used in the calculus of variations and Hamilton-Jacobi theory, i.e., in the mathematics of classical mechanics. However, over the years in which the book has been written its scope has widened, and now differential geometry itself is emphasized. Hopefully, enough of the applied flavor remains to interest the audience for whom it was originally intended.

Half of the book is an exposition of the geometric side of the classical one-independent-variable calculus of variations and Hamilton-Jacobi theory, corresponding to the classical treatises “Leçons sur les Invariants Intégraux” by E. Cartan and “Variationsrechnung” by C. Carathéodory. Now, this material has been in a complete form for at least 50 years. The reasons for giving it such prominence are (a) I like it, feel it has great beauty, and deplore that it has virtually disappeared from mathematical education, and (b) I think that its combination of qualitative geometric reasoning and detailed computation is very useful as a model for training in differential geometry and mathematical physics.
Especially important, the student can really learn how vector fields and differential forms, the building blocks of the subject, are used.

However, this is not a systematic general treatment of the calculus of variations. (The excellent treatise by Gelfand and Fomin is highly recommended here.) In differential geometry today the most important variational structures are Riemannian metrics. Accordingly, I have gone as far as seemed profitable in a study of general variational structures, then switched to Riemannian geometry, which has a different flavor because of the intervention of the theory of affine connections. This switch causes a discontinuity in the nature of the material presented in the book. In the first two parts the reader will find the material presented along more classical lines, while in the third and fourth parts we change gears in order to present to the reader the different outlook of contemporary differential geometry.

Although in principle the only prerequisites would be a good course in advanced calculus and possibly some vector and/or tensor analysis, it probably would be best for the reader to be familiar with the introduction to differential forms given in the book by H. Flanders [1] and with the introduction to Lie groups and the vector field concept given in the book by Auslander and MacKenzie [1]. Spivak’s book [1] is recommended as preparation in calculus, and we have referred to it occasionally for proofs. Abraham’s book [1] can be consulted for an alternate treatment of many topics, as well as an introduction to the advanced parts of classical mechanics.

The beginner in differential geometry will find that the matter of notations is the most annoying obstacle to grasping the fundamental ideas. In fact, there is an amusing definition of modern differential geometry as “the study of invariance under change of notation.” I believe that the situation is not really this bad, and that there is a reasonably optimal notation available, namely what can be called the differentiable manifold-vector field-differential form notation.
However, one must be prudent in using the big guns of modern mathematics, and the reader will notice that various currently fashionable bits of jargon have been exorcised from the treatment. The aim is not to construct a quasi-algebraic apparatus to tackle a few central problems, but rather to achieve a synthesis of algebraic, analytical, and topological techniques to cover a variety of topics. While it would be desirable in the abstract to say that the “global” problems are the central ones, it is usually not possible to make a decisive distinction between “local” and “global.”

This is the first in what might be a two-volume work. Several items in this volume may seem isolated from the rest of the book. Some were put in for their own sake as interesting side points, but others are planned as introductions to topics that will be covered more systematically in the second volume.

This book was begun at the Lincoln Laboratory of Massachusetts Institute of Technology; I am greatly indebted to my colleagues there. A grant of support for one year from the Mathematics Division of the Air Force Office of Scientific Research enabled me to extend the scope of the book; it was completed at Argonne National Laboratory. I am, of course, indebted to many colleagues for conversations and ideas and I would like to thank them. I shall attempt a partial listing: W. Ambrose, L. Auslander, M. Berger, S. S. Chern, J. M. Cook, R. Crittenden, B. Friedman, P. Griffiths, S. Helgason, R. Kalman, W. Klingenberg, N. Kuiper, C. C. Moore, J. Moser, R. Palais, R. Prosser, S. Smale, I. M. Singer, D. C. Spencer, S. Sternberg, and H. C. Wang. J. Moyal and Harley Flanders have read part of the manuscript and made many suggestions.

April, 1968
R. HERMANN
Contents

PREFACE

Part 1. Differential and Integral Calculus on Manifolds
1. Introduction
2. Tangent Vector-Vector Field Formalism
3. Differential Forms
4. Specialization to Euclidean Spaces: Differential Manifolds
5. Mappings, Submanifolds, and the Implicit Function Theorem
6. The Jacobi Bracket and the Lie Theory of Ordinary Differential Equations
7. Lie Derivation and Exterior Derivative; Integration on Manifolds
8. The Frobenius Complete Integrability Theorem
9. Reduction of Dimension When a Lie Algebra of Vector Fields Leaves a Vector-Field Invariant
10. Lie Groups
11. Classical Mechanics of Particles and Continua

Part 2. The Hamilton-Jacobi Theory and Calculus of Variations
12. Differential Forms and Variational Problems
13. Hamilton-Jacobi Theory
14. Extremal Fields and Sufficient Conditions for a Minimum
15. The Ordinary Problems of the Calculus of Variations
16. Groups of Symmetries of Variational Problems: Applications to Mechanics
17. Elliptic Functions
18. Accessibility Problems for Path Systems

Part 3. Global Riemannian Geometry
19. Affine Connections on Differential Manifolds
20. The Riemannian Affine Connection and the First Variation Formula
21. The Hopf-Rinow Theorem; Applications to the Theory of Covering Spaces
22. The Second Variation Formula and Jacobi Vector Fields
23. Sectional Curvature and the Elementary Comparison Theorems
24. Submanifolds of Riemannian Manifolds
25. Groups of Isometries
26. Deformation of Submanifolds in Riemannian Spaces

Part 4. Differential Geometry and the Calculus of Variations: Additional Topics in Differential Geometry
27. First-Order Invariants of Submanifolds and Convexity for Affinely Connected Manifolds
28. Affine Groups of Automorphisms. Induced Connections on Submanifolds. Projective Changes of Connection
29. The Laplace-Beltrami Operator
30. Characteristics and Shock Waves
31. The Morse Index Theorem
32. Complex Manifolds and Their Submanifolds
33. Mechanics on Riemannian Manifolds

BIBLIOGRAPHY
SUBJECT INDEX
Part 1
DIFFERENTIAL AND INTEGRAL CALCULUS ON MANIFOLDS
1 Introduction
We begin by recalling the main principles of ordinary three-dimensional vector analysis. The underlying space is the space of three real variables $x = (x_1, x_2, x_3)$. A scalar field is a real-valued function $f(x_1, x_2, x_3)$. A vector field, denoted by $X$, say, is an ordered triplet $(A_1(x), A_2(x), A_3(x))$ of scalar functions. (In the usual geometric representation of a vector as a directed line segment in Euclidean space, they are just the components along the three coordinate axes. However, we shall try to avoid using the Euclidean properties of three-dimensional number space.) Gibbsian vector analysis, as commonly used in the physical sciences, is concerned with the rules of calculation and the physicogeometric interpretation of the six basic operations on vector and scalar fields. The three basic algebraic operations are:

(a) Multiplication of a scalar $f$ by a vector $X = (A_1, A_2, A_3)$:
$$fX = (fA_1, fA_2, fA_3).$$

(b) Dot or inner product of vector fields. If
$$X = (A_1, A_2, A_3), \qquad Y = (B_1, B_2, B_3),$$
then $X \cdot Y$ is the scalar field $A_1B_1 + A_2B_2 + A_3B_3$.

(c) Vector or cross product of vector fields $X$ and $Y$:
$$X \times Y = (A_2B_3 - B_2A_3,\; B_1A_3 - A_1B_3,\; A_1B_2 - B_1A_2).$$
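These operations really involve only the algebraic properties of the range spaces of the scalar and vector fields. However, when we turn to the following three basic operations, which involve differentiation as well, it is clear that the domain spaces play a vital role also:

(a) Gradient. If $f$ is a scalar field, grad $f$ is the vector field:
$$\left(\frac{\partial f}{\partial x_1},\; \frac{\partial f}{\partial x_2},\; \frac{\partial f}{\partial x_3}\right).$$

(b) Divergence. If $X = (A_1, A_2, A_3)$ is a vector field, div $X$ is the scalar field:
$$\frac{\partial A_1}{\partial x_1} + \frac{\partial A_2}{\partial x_2} + \frac{\partial A_3}{\partial x_3}.$$

(c) Curl. If $X$ is a vector field, curl $X$ is the vector field:
$$\left(\frac{\partial A_3}{\partial x_2} - \frac{\partial A_2}{\partial x_3},\; \frac{\partial A_1}{\partial x_3} - \frac{\partial A_3}{\partial x_1},\; \frac{\partial A_2}{\partial x_1} - \frac{\partial A_1}{\partial x_2}\right).$$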
These operations involve, in the last analysis, writing out everything in terms of three components, which is cumbersome. However, there are a few simple rules of combination that, when proved once and for all, enable one to calculate problems of physics, differential geometry, and others without referring back to the components at each stage. For example:
$$\operatorname{curl}(\operatorname{grad} f) = 0,$$
$$X \times (Y \times Z) = (X \times Y) \times Z + Y \times (X \times Z),$$
$$X \times Y = -Y \times X,$$
$$X \cdot Y = Y \cdot X.$$
Of course these operations also have a simple physical or geometric interpretation, but we shall not consider either in detail at this time. As examples:

(1) grad $f$ is the direction of steepest ascent of the function $f$; or, which says the same thing in different words, grad $f$ is perpendicular to the level surfaces of $f$.

(2) $\Delta f = \operatorname{div}(\operatorname{grad} f) = 0$ expresses the absence of sources and sinks of a flow with a potential $f$.
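The purely algebraic rules of combination quoted above can be spot-checked in components. The following is a minimal sketch, not part of the original text: the helper names `cross`, `dot`, `add`, and `neg` and the sample vectors are illustrative assumptions, and a check at one point is of course no substitute for a proof.

```python
def cross(x, y):
    """Cross product X x Y of two 3-component vectors (formula (c) of the text)."""
    a1, a2, a3 = x
    b1, b2, b3 = y
    return (a2 * b3 - b2 * a3, b1 * a3 - a1 * b3, a1 * b2 - b1 * a2)


def dot(x, y):
    """Dot product X . Y."""
    return sum(a * b for a, b in zip(x, y))


def add(x, y):
    """Componentwise sum of two vectors."""
    return tuple(a + b for a, b in zip(x, y))


def neg(x):
    """Componentwise negative of a vector."""
    return tuple(-a for a in x)


# Hypothetical sample vectors with integer components, so all checks are exact.
X, Y, Z = (1, 2, 3), (-4, 0, 5), (2, -7, 1)

# X x (Y x Z) = (X x Y) x Z + Y x (X x Z)
lhs = cross(X, cross(Y, Z))
rhs = add(cross(cross(X, Y), Z), cross(Y, cross(X, Z)))
print(lhs == rhs)

# X x Y = -(Y x X)  and  X . Y = Y . X
print(cross(X, Y) == neg(cross(Y, X)), dot(X, Y) == dot(Y, X))
```

Integer components are used so that the comparisons are exact rather than subject to floating-point rounding.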
However, the simplicity and beauty of this scheme when applied to everyday problems of the physical sciences almost inevitably forces difficulties and awkwardness when problems involving change of coordinates are encountered. This awkwardness can often be circumvented by a clever combination of physical and mathematical reasoning, but there are limits to what can be done because these operations depend on the flat or Euclidean structure of the underlying space. One might suppose that tensor analysis offers a way out of this dilemma by bringing the requirements of invariance under change of coordinates to the foreground, but the geometric properties, at least, of the fundamental objects and operations are often hidden in a maze of indices and conventions. The great advantages of tensor analysis are that some of the formal simplicity of ordinary vector analysis is retained and that it is the supreme tool in subjects where extensive computations must be made. However, in this book we shall develop and use what we call the formalism of vector-fields-differential forms in n-dimensional spaces and manifolds. Despite the fact that this formalism may, as justly as tensor analysis, be regarded as the direct generalization of the ordinary vector analysis outlined above, it has up to now occupied almost no place in the mathematical arsenal of the theoretical physicist or engineer. We shall attempt to provide the reader with a link
between his presumed knowledge of ordinary vector analysis and/or tensor analysis by defining as explicitly as possible the notion of vector field and differential form on the Euclidean n-space; further, we shall motivate this notion by applications to the theory of ordinary differential equations and the calculus of variations.
2 Tangent Vector-Vector Field Formalism
It will be assumed that the reader knows the rudiments of finite dimensional vector-space theory and point-set topology. For the latter, this should include familiarity with such notions as “compactness,” “continuity,” “Hausdorff,” and “topological space,” and with the elementary general theorems interrelating these notions. However, in order to correlate these principles with material that the reader may have encountered in physics or engineering (for example, tensor analysis), we shall show in the next chapter how the notions introduced in this chapter assume a more familiar form for Euclidean spaces. Most of our work in Parts 1 and 2 will be concerned with them.

The aim of our formalism is to be able to carry over differential calculus from Euclidean spaces to more general (finite or infinite dimensional) spaces. Now, the primitive notion in calculus is the idea of a derivative of a real-valued function $t \to f(t)$ of a real variable $t$:
$$\frac{df}{dt}(t) = f'(t) = \lim_{\Delta t \to 0} \frac{f(t + \Delta t) - f(t)}{\Delta t}. \tag{2.1}$$
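This is extended to real-valued functions $f(x_1, \dots, x_n)$ of $n$ real variables by defining the partial derivatives:
$$\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}.$$
However, this is just a special case of (2.1):
$$\frac{\partial f}{\partial x_1}(x_1, \dots, x_n) \text{ is the derivative of the function } t \to f(x_1 + t, x_2, \dots, x_n) \text{ at } t = 0. \tag{2.2}$$
The coordinates $(x_1, \dots, x_n)$ play a special role in this definition, since if $x_1'(x), \dots, x_n'(x)$ are new coordinates, the
$$\frac{\partial f}{\partial x_1'}, \dots, \frac{\partial f}{\partial x_n'}$$
are quite different from the
$$\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}.$$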
However, let us analyze what is done in (2.2). We take the curve $t \to (x_1 + t, \dots, x_n)$ in $R^n$, restrict the function to the curve, and differentiate it by the rule (2.1).† However, notice that we have used only curves of a special type here; namely, those that are coordinate lines. When we change coordinates, naturally we change this system of curves. Thus we can encompass all possible changes of coordinates by considering this “directional derivative” process applied to all‡ curves in $R^n$. We shall now describe the mathematical structure (the “tangent bundle”) to which this leads.

Suppose $t \to x(t)$ is such a curve, with $t$ running over the interval $0 \le t \le 1$. Let us pick any point in the curve. For illustrative purposes, suppose this is the point $t = 0$. Consider the mapping
$$f \to \frac{d}{dt} f(x(t))\Big|_{t=0} \tag{2.3}$$
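from the vector space of real-valued functions of $x$ to real numbers. There are two algebraic rules satisfied by this process:

(a) It is linear.

(b) If $f(x)$ and $g(x)$ are functions,
$$\frac{d}{dt}\big[f(x(t))\,g(x(t))\big]\Big|_{t=0} = \frac{d}{dt} f(x(t))\Big|_{t=0}\, g(x(0)) + f(x(0))\, \frac{d}{dt} g(x(t))\Big|_{t=0}. \tag{2.4}$$

Explicitly, of course, if $x(t) = (x_1(t), \dots, x_n(t))$, and if
$$v_1 = \frac{dx_1}{dt}(0), \;\dots,\; v_n = \frac{dx_n}{dt}(0),$$
then
$$\frac{d}{dt} f(x(t))\Big|_{t=0} = \sum_{i=1}^{n} v_i\, \frac{\partial f}{\partial x_i}(x(0)). \tag{2.5}$$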
† $R^n$ denotes the set of n-tuples of real numbers considered, say, as a vector space over the real numbers. We shall also use a vector notation $x = (x_1, \dots, x_n)$ when it is convenient; no confusion is likely.
‡ Of course we are not making precise questions of differentiability of curves and functions that are necessary to enable the derivatives to make sense. This will be made more precise in the next chapter; in general, we assume everything is differentiable infinitely many times.

We are accustomed to interpreting $(v_1, \dots, v_n)$ as the “tangent vector” to the curve $t \to x(t)$ at $t = 0$. But (2.5) tells us that by making use of the Euclidean structure of $R^n$ and by using its ordinary coordinate system, the linear mapping (2.3) can be essentially identified with the “tangent vector” to the curve. This suggests in general that we can say that two curves $t \to x(t)$ and $t \to y(t)$ “have the same tangent vector” or “have contact to first order” at $t = 0$ if the functional defined by (2.3) is the same. Also (2.5) tells us that two such curves must meet at $t = 0$. Thus we can say that the set of “tangent vectors” to a point $x(0)$ of $R^n$ is the set of mappings of the form (2.3). This is an idea that is obviously independent of coordinates. Let us use it to define analogous notions for more general spaces.

Let $M$ be a space. The set of real-valued functions on $M$ forms a ring. Such functions can be added and multiplied in the usual way. Let $F(M)$ be a subring of the ring of all functions that we shall regard as the “basic” functions on $M$. For example, if $M$ is $R^n$, we shall want $F(M)$ to be the ring of all functions that depend on the underlying variables $x_1, \dots, x_n$ in an infinitely differentiable ($C^\infty$) way. In this chapter we shall not be so precise about this ring, but shall regard it as given. We shall denote points of $M$ by such letters as $p$, $q$, and elements of $F(M)$ by such letters as $f, g, \dots$, occasionally confusing the function $f$ with its values $f(p)$, and abbreviating $F(M)$ to $F$ if $M$ is fixed in the discussion.
Definition

Let $p$ be a point of $M$. A tangent vector to $p$ is a linear mapping, typically denoted by $v$, of $F(M)$ to $R$, such that
$$v(fg) = v(f)\,g(p) + v(g)\,f(p) \qquad \text{for } f, g \in F. \tag{2.6}$$

(Note that (2.6) corresponds to (2.4b).) The set of all such linear mappings is called the tangent space to $M$ at $p$, denoted by $M_p$. Since two such mappings defined at the same point $p$ can be added and multiplied by real scalars, the tangent vectors at $p$ form a real vector space. The union $\bigcup_{p \in M} M_p$ of tangent spaces to all points of $M$ forms a new space called the tangent bundle to $M$, denoted by $T(M)$.

We can consider a curve in $M$ as a mapping (typically denoted by such letters as $\sigma$ or $\gamma$) of an interval of real numbers into points of $M$. Throughout this book $t$ will denote such a real parameter. For simplicity, we normalize the interval to $0 \le t \le 1$. (Occasionally, $s$ will also serve to denote a real parameter.) For each point of the interval we define the tangent vector to $\sigma$ at $t$, denoted by $\sigma'(t)$, as follows:
$$\sigma'(t)(f) = \frac{d}{dt} f(\sigma(t)) \qquad \text{for } f \in F(M). \tag{2.7}$$
(Of course $\sigma$ cannot be an arbitrary continuous curve, since the derivative in (2.7) must exist, but we assume that the reader has enough experience from advanced calculus to formulate the correct differentiability hypotheses.) Thus
$$\sigma'(t) \in M_{\sigma(t)} \qquad \text{for } 0 \le t \le 1.$$
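To make the definition concrete, the tangent vector to a curve can be realized as an operator on functions and the Leibniz rule (2.6) checked exactly. The following is a minimal sketch under stated assumptions: $M = R^2$, the functions and the curve are polynomials, and the derivative is computed with "dual numbers" $a + b\varepsilon$, $\varepsilon^2 = 0$. The class `Dual`, the helper `tangent_vector`, and the sample curve and functions are all hypothetical illustrations, not constructions from the text.

```python
from fractions import Fraction


class Dual:
    """Number a + b*eps with eps**2 = 0; the eps-part carries a first derivative."""

    def __init__(self, val, der=0):
        self.val = Fraction(val)
        self.der = Fraction(der)

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = Dual._lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual._lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return Dual._lift(other) - self

    def __mul__(self, other):
        o = Dual._lift(other)
        # Product rule: (a + b eps)(c + d eps) = ac + (ad + bc) eps.
        return Dual(self.val * o.val, self.val * o.der + self.der * o.val)

    __rmul__ = __mul__


def tangent_vector(curve, t0):
    """Return sigma'(t0) as the map f -> d/dt f(sigma(t)) at t = t0."""
    def v(f):
        point = curve(Dual(t0, 1))  # the curve evaluated at t0 + eps
        return f(*point).der
    return v


def sigma(t):
    """Hypothetical polynomial curve in R^2."""
    return (t * t + 1, 3 * t - 2)


def f(x, y):
    return x * y + 2 * x


def g(x, y):
    return y * y - x


def fg(x, y):
    return f(x, y) * g(x, y)


t0 = Fraction(1, 2)
v = tangent_vector(sigma, t0)
p = sigma(t0)  # the point to which v is attached

# Leibniz rule (2.6): v(fg) = v(f) g(p) + v(g) f(p)
leibniz_rhs = v(f) * g(*p) + v(g) * f(*p)
print(v(fg) == leibniz_rhs)
```

Exact rational arithmetic is used so that the two sides of (2.6) can be compared with `==`; with floating-point numbers a tolerance would be needed.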
At this general level, it is not possible to assert that, for each point $p \in M$ and each $v \in M_p$, there is a curve passing through $p$ whose tangent vector is $v$. Thus a “tangent vector” is not necessarily, as was found for Euclidean space, an equivalence class of curves passing through $p$, with two curves “identified” if they meet at $p$ to “first order of contact.” The definition we have adopted is the appropriate one if we want tangent vectors to form a vector space.

Regard a curve as a mapping $\sigma: [0, 1] \to M$. Its tangent vector $t \to \sigma'(t)$ then defines a mapping $\sigma': [0, 1] \to T(M)$. It is a cross section in the sense that applying the “projection mapping” $T(M) \to M$ (which assigns to each vector $v \in T(M)$ the point to which it is “attached”) gives back $\sigma$. Here we touch on the theory of fiber bundles. It will repay our investment to pause and explain the geometric idea of a “vector bundle,” which is a special case of a fiber bundle.

Definition

Consider a mapping $\pi$ of a space $E$ onto a space $M$. It is called a vector bundle if, for each $p \in M$, the inverse image $\pi^{-1}(p)$ (the fiber above $p$) is a real vector space. (In the case $E = T(M)$, the fiber above $p$ is just $M_p$, the tangent space to $M$ at $p$.)

The vector-bundle concept permeates all modern mathematics, and is relevant to many ideas in physics, particularly in quantum field theory. Intuitively, the space $M$ (the base space of the bundle) is “nonlinear,” while the fibers are linear: $E$ is “nonlinear” horizontally and is “linear” vertically. (See Fig. 1.)

FIGURE 1
In this book we shall not consider vector bundles very extensively, although some of the language and naive geometric intuition built up for their study will be helpful. (For example, the book by Auslander and MacKenzie [1] is recommended for its more extensive treatment of many ideas mentioned here.)

Associated with a vector bundle $\pi: E \to M$ is the concept of cross section, denoted when general vector bundles are being discussed by $\psi$. It is a map $M \to E$ such that
$$\pi\psi(p) = p \qquad \text{for all } p \in M. \tag{2.8}$$
The set of cross sections is denoted by $\Gamma(E)$.

Although the points of $E$ that are in different fibers cannot be added, cross sections can be:
$$(\psi_1 + \psi_2)(p) = \psi_1(p) + \psi_2(p) \qquad \text{for } p \in M,\; \psi_1, \psi_2 \in \Gamma(E). \tag{2.9}$$
Notice that (2.8) guarantees that $\psi_1(p)$ and $\psi_2(p)$ lie in the same fiber $\pi^{-1}(p)$, and the basic postulate of a vector bundle allows us to add $\psi_1(p)$ and $\psi_2(p)$. Further, $\psi \in \Gamma$ can be multiplied by a function $f \in F(M)$:
$$(f\psi)(p) = f(p)\psi(p) \qquad \text{for } p \in M. \tag{2.10}$$
In algebraic jargon, $\Gamma(E)$ forms a module over the ring $F(M)$. As we shall see, many concepts in differential geometry take on optimally elegant analytical form when expressed in this module language. For example, let us look at the case $E = T(M)$. A cross section, which we denote in this case by such letters as $X, Y, \dots$, can be geometrically described, then, as a vector field, since to each $p \in M$ it assigns a tangent vector $X(p)$ lying in $M_p$. In this case we denote $\Gamma(E)$ by $V(M)$.

The “module” description of $V(M)$ will be very convenient for us. Let $X \in V(M)$, that is, $X$ assigns a tangent vector $X(p) \in M_p$ to each $p \in M$. Thus, for $f \in F(M)$, $p \to X(p)(f)$ defines another function on $M$. Assume that it too belongs to $F(M)$, and denote it by $X(f)$. $X$ then defines a linear mapping $F(M) \to F(M)$, called the Lie derivative operation. $X(f)$ is called the Lie derivative of $f$ by $X$. By definition,
$$X(f)(p) = X(p)(f). \tag{2.11}$$
The basic property (2.6) of tangent vectors then allows us to note a property of $X$ as a mapping $F(M) \to F(M)$:
$$X(fg) = X(f)\,g + f\,X(g) \qquad \text{for } f, g \in F(M), \tag{2.12}$$
that is, $X$ is a derivation of the ring $F(M)$. This property enables us in favorable cases (for example, if $M$ is a differentiable manifold and $F(M)$ is the ring of $C^\infty$ functions) to define a vector field as a derivation of $F(M)$. In local coordinates, we shall see that such a derivation is nothing but a first-order, linear differential operator. Conversely, suppose $X: F(M) \to F(M)$ satisfies (2.12). Then $X$ defines a cross section $p \to X(p) \in M_p$ of the tangent bundle to $M$:
$$X(p)(f) = X(f)(p).$$
For the purposes of this chapter, we have not made any distinction between the objects $X$ defined in either of two ways, either “geometrically” as cross sections of $T(M)$ or “algebraically” as derivations of $F(M)$.
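The "algebraic" picture of a vector field as a derivation can be tried out on a small ring of functions. The sketch below is an illustration under stated assumptions, not the book's construction: $M = R^2$, $F(M)$ is taken to be the ring of polynomials in $x, y$ with integer coefficients (stored as dictionaries mapping exponent pairs to coefficients), and $X = a\,\partial/\partial x + b\,\partial/\partial y$ with polynomial coefficients $a, b$. All function names and sample data are hypothetical.

```python
from collections import defaultdict


def p_add(p, q):
    """Sum of two polynomials stored as {(i, j): coeff} for the monomial x^i y^j."""
    out = defaultdict(int)
    for poly in (p, q):
        for mono, c in poly.items():
            out[mono] += c
    return {m: c for m, c in out.items() if c != 0}


def p_mul(p, q):
    """Product of two polynomials."""
    out = defaultdict(int)
    for (i, j), c in p.items():
        for (k, l), d in q.items():
            out[(i + k, j + l)] += c * d
    return {m: c for m, c in out.items() if c != 0}


def d_dx(p):
    """Partial derivative with respect to x."""
    return {(i - 1, j): i * c for (i, j), c in p.items() if i > 0}


def d_dy(p):
    """Partial derivative with respect to y."""
    return {(i, j - 1): j * c for (i, j), c in p.items() if j > 0}


def vector_field(a, b):
    """X = a d/dx + b d/dy acting on polynomials: X(f) = a f_x + b f_y."""
    def X(f):
        return p_add(p_mul(a, d_dx(f)), p_mul(b, d_dy(f)))
    return X


# Hypothetical data: X = -y d/dx + x d/dy, f = x^2 + y^2, g = x*y + 3.
a = {(0, 1): -1}
b = {(1, 0): 1}
f = {(2, 0): 1, (0, 2): 1}
g = {(1, 1): 1, (0, 0): 3}

X = vector_field(a, b)

# Derivation property (2.12): X(fg) = X(f) g + f X(g), as an identity in F(M).
lhs = X(p_mul(f, g))
rhs = p_add(p_mul(X(f), g), p_mul(f, X(g)))
print(lhs == rhs)
```

Because the comparison is between polynomials rather than between values at a point, this checks (2.12) as an identity in the ring, for this one choice of $X$, $f$, and $g$.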
3 Differential Forms
Whenever we are given a vector bundle $\pi: E \to M$, we can construct “new” vector bundles by the operations of tensor algebra on the fibers. As an example, we mention the dual space and the skew-symmetric multilinear forms.

Let $V$ be a real vector space whose elements we denote by $v$ and call “vectors.” A “covector” is a linear mapping, typically denoted by $\omega$, of $V$ into $R$, the real numbers. The space $V^*$ of covectors forms a new vector space called the dual space to $V$. If $V$ is finite dimensional, $V^*$ has the same dimension as $V$. In fact, suppose $v_1, \dots, v_n$ is a basis for $V$; that is, every element $v \in V$ can be written in the unique form
$$v = a_1 v_1 + \cdots + a_n v_n.$$
The coefficients $a_1, \dots, a_n$ in this expansion depend linearly on $v$, and hence define linear forms on $V$, that is, elements of $V^*$, which we denote by $\omega_1, \dots, \omega_n$. One can prove (as an exercise) that $\omega_1, \dots, \omega_n$ forms a basis for $V^*$, called the dual basis to the given basis $(v_1, \dots, v_n)$ of $V$. It can also be characterized by the condition†
$$\omega_i(v_j) = \delta_{ij}, \qquad 1 \le i, j \le n.$$
† $\delta_{ij}$ is the “Kronecker delta” symbol; it is zero except when $i = j$, when it is 1.

An r-covector on $V$ is a mapping
$$(v_1, \dots, v_r) \to \omega(v_1, \dots, v_r),$$
with domain the r-tuples of elements of $V$ and with values in the real numbers. We can indicate this by the notation $\omega: V \times \cdots \times V \to R$. We require that $\omega$ be multilinear in the sense that it is linear in each of the variables $v_1, \dots, v_r$ when all others are held fixed. In addition, we require that it be skew-symmetric in the sense that $\omega(v_1, \dots, v_r)$ changes sign when neighboring arguments are permuted; that is,
$$\omega(v_1, \dots, v_r) = -\omega(v_2, v_1, v_3, \dots, v_r),$$
$$\omega(v_1, v_2, v_3, \dots, v_r) = -\omega(v_1, v_3, v_2, v_4, \dots, v_r),$$
and so forth. Again, the set of these r-covectors forms a vector space, which we shall denote by $V^{*r}$. If $\omega \in V^{*r}$, $v \in V$, we shall define $v \lrcorner\, \omega$, the contraction of $\omega$ by $v$, or the inner product of $\omega$ with $v$, in the following way:
$$(v \lrcorner\, \omega)(v_1, \dots, v_{r-1}) = \omega(v, v_1, \dots, v_{r-1}).$$
Thus $v \lrcorner\, \omega$ is the element of $V^{*\,r-1}$ resulting from holding fixed one of the $r$ arguments of $\omega$. We shall find the mapping from $V^{*r}$ to $V^{*\,r-1}$ convenient for proving facts about r-covectors by induction on $r$. If $r = 1$, of course $v \lrcorner\, \omega$ coincides with $\omega(v)$, the value of the linear form $\omega$ on $v$. We shall use both notations interchangeably.

If $\omega_1$ is an r-covector and $\omega_2$ an s-covector, a “product” form denoted by $\omega_1 \wedge \omega_2$ can be defined as an $(r + s)$-covector. It is called the exterior product of $\omega_1$ and $\omega_2$. Roughly, it is obtained in the following way: Consider $(r + s)$ vectors $v_1, \dots, v_{r+s}$. One can assign to them the number
$$\omega_1(v_1, \dots, v_r)\,\omega_2(v_{r+1}, \dots, v_{r+s}).$$
However, this assignment does not depend skew-symmetrically on all variables. It can be made so by permuting the variables and adding up the results, with appropriate signs. For example, if $r = s = 1$,
$$(\omega_1 \wedge \omega_2)(v_1, v_2) = \omega_1(v_1)\omega_2(v_2) - \omega_1(v_2)\omega_2(v_1).$$
Notice that from this formula follows
$$v \lrcorner\, (\omega_1 \wedge \omega_2) = \omega_1(v)\,\omega_2 - \omega_2(v)\,\omega_1 = (v \lrcorner\, \omega_1) \wedge \omega_2 - \omega_1 \wedge (v \lrcorner\, \omega_2).$$
(If $c$ is a constant, $c \wedge \omega$ is taken to be just $c\omega$.) The general formula can be guessed as
$$v \lrcorner\, (\omega_1 \wedge \omega_2) = (v \lrcorner\, \omega_1) \wedge \omega_2 + (-1)^r\, \omega_1 \wedge (v \lrcorner\, \omega_2). \tag{3.1}$$
Now we turn things around and use (3.1) to define the exterior product $\omega_1 \wedge \omega_2$ by induction on $r + s$. Suppose it is defined for covectors whose degree adds up to a number less than $r + s$, with (3.1) true. Define $\omega_1 \wedge \omega_2$ by the formula:
$$(\omega_1 \wedge \omega_2)(v_1, \dots, v_{r+s}) = \big(v_1 \lrcorner\, (\omega_1 \wedge \omega_2)\big)(v_2, \dots, v_{r+s}), \tag{3.2}$$
where $v_1 \lrcorner\, (\omega_1 \wedge \omega_2)$ is given by the right-hand side of (3.1). We must show that $\omega_1 \wedge \omega_2$ as defined really depends skew-symmetrically on the variables $(v_1, \dots, v_{r+s})$. That it depends skew-symmetrically on the variables $v_2, \dots, v_{r+s}$ follows from our inductive hypotheses that the right-hand side of (3.1) is a genuine $(r + s - 1)$-covector. We must check that it changes sign when $v_1$ and $v_2$ are permuted:
$$(\omega_1 \wedge \omega_2)(v_1, \dots, v_{r+s}) = \big(v_2 \lrcorner\, (v_1 \lrcorner\, (\omega_1 \wedge \omega_2))\big)(v_3, \dots, v_{r+s}).$$
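But
$$v_2 \lrcorner\, \big(v_1 \lrcorner\, (\omega_1 \wedge \omega_2)\big) = v_2 \lrcorner\, \big((v_1 \lrcorner\, \omega_1) \wedge \omega_2 + (-1)^r\, \omega_1 \wedge (v_1 \lrcorner\, \omega_2)\big)$$
$$= \big(v_2 \lrcorner\, (v_1 \lrcorner\, \omega_1)\big) \wedge \omega_2 + (-1)^{r-1}\,(v_1 \lrcorner\, \omega_1) \wedge (v_2 \lrcorner\, \omega_2) + (-1)^r\,(v_2 \lrcorner\, \omega_1) \wedge (v_1 \lrcorner\, \omega_2) + (-1)^{2r}\, \omega_1 \wedge \big(v_2 \lrcorner\, (v_1 \lrcorner\, \omega_2)\big).$$
It is now clear that this changes sign when $v_1$ and $v_2$ are permuted. We have shown that $\omega_1 \wedge \omega_2$ is defined as an $(r + s)$-covector, satisfying (3.1). The “bilinear” rules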
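$$(\omega_1 + \omega_1') \wedge \omega_2 = \omega_1 \wedge \omega_2 + \omega_1' \wedge \omega_2, \tag{3.3a}$$
$$\omega_1 \wedge (\omega_2 + \omega_2') = \omega_1 \wedge \omega_2 + \omega_1 \wedge \omega_2', \tag{3.3b}$$
are easy to prove by induction also, and are left to the reader. The “anticommutativity” of the exterior product, namely,
$$\omega_1 \wedge \omega_2 = (-1)^{rs}\, \omega_2 \wedge \omega_1 \tag{3.4}$$
will also be proved by induction on $r + s$. Notice that an identity such as (3.4) between $(r + s)$-forms holds if and only if the identity resulting from applying $v \lrcorner$ to both sides holds for all $v \in V$. But
$$v \lrcorner\, (\omega_1 \wedge \omega_2) = (v \lrcorner\, \omega_1) \wedge \omega_2 + (-1)^r\, \omega_1 \wedge (v \lrcorner\, \omega_2)$$
$$= (-1)^{(r-1)s}\, \omega_2 \wedge (v \lrcorner\, \omega_1) + (-1)^{r + (s-1)r}\, (v \lrcorner\, \omega_2) \wedge \omega_1$$
(assuming (3.4) is true for forms whose sum of degrees is $< r + s$)
$$= (-1)^{rs}\big[(-1)^s\, \omega_2 \wedge (v \lrcorner\, \omega_1) + (v \lrcorner\, \omega_2) \wedge \omega_1\big]$$
$$= v \lrcorner\, \big((-1)^{rs}\, \omega_2 \wedge \omega_1\big).$$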
Then (3.4) is proved.

THEOREM 3.1

Suppose $\omega_1, \dots, \omega_n$ is a basis of $V^*$. Then, for each $r$, the r-covectors
$$\omega_{i_1} \wedge \cdots \wedge \omega_{i_r}, \qquad 1 \le i_1 < i_2 < \cdots < i_r \le n \tag{3.5}$$
form a basis for $V^{*r}$. In particular, each $\omega \in V^{*r}$ admits a unique expansion of the form
$$\omega = \sum a_{i_1 \cdots i_r}\, \omega_{i_1} \wedge \cdots \wedge \omega_{i_r},$$
with coefficients $(a_{i_1 \cdots i_r})$ depending skew-symmetrically on the indices. Then, in the language of tensor analysis, $V^{*r}$ is the space of skew-symmetric (covariant) tensors on $V$.

Proof

Again we proceed by induction on $r$, using (3.1) to reduce the case $r$ to case $r - 1$. First let us prove that the elements (3.5) are linearly independent. Suppose that there is a linear dependence of the form
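$$\sum_{1 \le i_1 < \cdots < i_r \le n} a_{i_1 \cdots i_r}\, \omega_{i_1} \wedge \cdots \wedge \omega_{i_r} = 0. \tag{3.6}$$
Let $v_1, \dots, v_n$ be a basis of $V$ dual to $\omega_1, \dots, \omega_n$; that is, $\omega_i(v_j) = \delta_{ij}$. Apply $v_1 \lrcorner$ to (3.6). The result is a relation of the form
$$\sum_{2 \le i_2 < \cdots < i_r \le n} a_{1 i_2 \cdots i_r}\, \omega_{i_2} \wedge \cdots \wedge \omega_{i_r} = 0$$
(since over the indicated range of indices, only $i_1$ can have the value 1). By induction hypotheses, $a_{1 i_2 \cdots i_r} = 0$. Then (3.6) takes the form
$$\sum_{2 \le i_1 < \cdots < i_r \le n} a_{i_1 \cdots i_r}\, \omega_{i_1} \wedge \cdots \wedge \omega_{i_r} = 0.$$
Apply $v_2 \lrcorner$ to this relation. Similarly, one obtains $a_{2 i_2 \cdots i_r} = 0$. Continuing in this way, we see that eventually all coefficients are zero, which proves linear independence of the elements (3.5).

We must show that every element $\omega$ of $V^{*r}$ can be written as the sum of elements of the form (3.5). By induction hypotheses, $v_i \lrcorner\, \omega$ has an expansion of the form
$$\sum b_{i_2 \cdots i_r}\, \omega_{i_2} \wedge \cdots \wedge \omega_{i_r} \qquad \text{for } i = 1, \dots, n.$$
Since $v_1 \lrcorner\, (v_1 \lrcorner\, \omega) = 0$, this expansion takes the form
$$v_1 \lrcorner\, \omega = \sum_{2 \le i_2 < \cdots < i_r \le n} b_{i_2 \cdots i_r}\, \omega_{i_2} \wedge \cdots \wedge \omega_{i_r}.$$
Thus
$$\omega' = \omega - \sum_{2 \le i_2 < \cdots < i_r \le n} b_{i_2 \cdots i_r}\, \omega_1 \wedge \omega_{i_2} \wedge \cdots \wedge \omega_{i_r}$$
has the property that applying $v_1 \lrcorner$ to it gives zero. Continue in a similar way with $v_2 \lrcorner$ applied to $\omega'$, etc. Eventually one obtains an element of $V^{*r}$ that is a difference between $\omega$ and a sum of elements of the form (3.5), which has zero contraction with every element of the basis of $V$. This element must be identically zero; that is, $\omega$ itself has an expansion in terms of elements of the form (3.5). Q.E.D.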
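The inductive definition of the exterior product by (3.1) and (3.2) translates almost literally into a recursive procedure. The following is a minimal sketch under stated assumptions: an r-covector is represented as a pair `(r, function of r vectors)`, vectors are tuples of numbers, a 0-covector is a constant, and $\omega \wedge c$ is taken to be $c\,\omega$ as well as $c \wedge \omega$ (the text states only the latter; the former agrees with (3.4) for $s = 0$). The names `covector`, `contract`, `scale`, and `wedge` are hypothetical.

```python
def covector(coeffs):
    """1-covector on R^n with the given components: v -> sum_i coeffs[i] * v[i]."""
    return (1, lambda v: sum(c * x for c, x in zip(coeffs, v)))


def contract(v, form):
    """Contraction v _| omega: hold v fixed as the first argument (needs r >= 1)."""
    r, w = form
    return (r - 1, lambda *vs: w(v, *vs))


def scale(c, form):
    """Multiply a covector by the constant c."""
    r, w = form
    return (r, lambda *vs: c * w(*vs))


def wedge(f1, f2):
    """Exterior product, defined by induction on r + s via (3.1) and (3.2)."""
    r, w1 = f1
    s, w2 = f2
    if r == 0:  # c ^ omega is just c * omega
        return scale(w1(), f2)
    if s == 0:  # assumption: omega ^ c is also c * omega
        return scale(w2(), f1)

    def value(v1, *rest):
        # (3.2): evaluate v1 _| (w1 ^ w2) on the remaining arguments,
        # with v1 _| (w1 ^ w2) expanded by the right-hand side of (3.1).
        first = wedge(contract(v1, f1), f2)[1](*rest)
        second = wedge(f1, contract(v1, f2))[1](*rest)
        return first + (-1) ** r * second

    return (r + s, value)


# Dual basis of the standard basis of R^3 (the coordinate functions).
w1, w2, w3 = covector((1, 0, 0)), covector((0, 1, 0)), covector((0, 0, 1))

# Hypothetical sample vectors.
a, b, c = (1, 2, 3), (0, 1, 4), (5, 6, 0)

two_form = wedge(w1, w2)
top_form = wedge(two_form, w3)

print(two_form[1](a, b), wedge(w2, w1)[1](a, b))  # opposite signs, as in (3.4)
print(top_form[0], top_form[1](a, b, c))          # degree and value on (a, b, c)
```

On these sample vectors the value of $\omega_1 \wedge \omega_2 \wedge \omega_3$ coincides with the determinant of the matrix whose rows are `a`, `b`, `c`. The recursion re-expands the product at every evaluation, so it is meant only to mirror the definition, not to be efficient.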
Notice that Theorem 3.1 has as consequence that there is but one linearly independent element in $V^{*n}$ (namely, $\omega_1 \wedge \cdots \wedge \omega_n$) and that $V^{*r}$ is zero if $r > n$. This plays an important role in linear algebra (leading to the properties of the determinant) and in differential geometry (leading to the concept of differential forms as “volume elements”).
Now we must examine the properties of the $r$-covectors when $V$ is subjected to a linear transformation. Suppose that $\alpha: V \to V'$ is a linear transformation from $V$ to a vector space $V'$. The associated transformation on linear forms goes backward, from $V'$ to $V$. Precisely, if $\omega' \in V'^*$, $\alpha^*(\omega')$ is the linear form on $V$ given by

$$\alpha^*(\omega')(v) = \omega'(\alpha(v)) \qquad \text{for } v \in V.$$

$\alpha^*$ is a linear mapping $V'^* \to V^*$, called the dual mapping to $\alpha$. $\alpha^*$ can be extended to a mapping $V'^{*r} \to V^{*r}$ by the rule

$$\alpha^*(\omega')(v_1, \dots, v_r) = \omega'(\alpha(v_1), \dots, \alpha(v_r)).$$
It is easily seen that $\alpha^*(\omega_1' \wedge \omega_2') = \alpha^*(\omega_1') \wedge \alpha^*(\omega_2')$. Suppose $(v_1, \dots, v_n)$ and $(v_1', \dots, v_m')$ are respectively bases of $V$ and $V'$. Then

$$\alpha(v_i) = \sum_j a_{ji}\, v_j'.$$

The $m \times n$ matrix $(a_{ij})$ is called the matrix associated with the linear transformation and the bases. Let us compute the matrix associated with the dual $\alpha^*$. Let $(\omega_i)$ and $(\omega_j')$ be the dual bases of $V^*$ and $V'^*$, and write

$$\alpha^*(\omega_j') = \sum_i a^*_{ij}\, \omega_i.$$

Then $\alpha^*(\omega_j')(v_i) = a^*_{ij}$. But it also equals

$$\omega_j'(\alpha(v_i)) = a_{ji},$$

whence the matrix of $\alpha^*$ relative to the dual bases is the transpose of the matrix of $\alpha$ relative to the two given bases of $V$ and $V'$.

In general, this passing back and forth between linear transformations and the matrices defining them is rather confusing. It is more in the spirit of modern geometry to work in basis-free terms, but occasionally it throws fresh light on a problem to regard it in both "pictures." The following facts are readily proved, and are left to the reader. If $\alpha: V \to V'$ and $\beta: V' \to V''$ are linear transformations, with $\beta\alpha: V \to V''$ the composite transformation, then

$$(\beta\alpha)^* = \alpha^*\beta^*. \qquad (3.7)$$
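Relation (3.7) is easy to check numerically. The sketch below is an illustration under stated assumptions, not the author's construction: linear maps are stored as matrices acting on column vectors, covectors are stored as lists of components, and the helper names `apply`, `compose`, and `pullback` are hypothetical.

```python
def apply(A, v):
    """Apply the matrix A (a list of rows) to the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def compose(B, A):
    """Matrix of the composite map: first A, then B."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]


def pullback(A, w):
    """Dual map: components of the covector v -> w(A v)."""
    rows, cols = len(A), len(A[0])
    return [sum(w[i] * A[i][j] for i in range(rows)) for j in range(cols)]


def pair(w, v):
    """Evaluate the covector w on the vector v."""
    return sum(a * b for a, b in zip(w, v))


alpha = [[1, 2], [0, 1], [3, -1]]   # V (dim 2) -> V' (dim 3)
beta = [[2, 0, 1], [1, 1, 0]]       # V' (dim 3) -> V'' (dim 2)
w = [5, -2]                         # a covector on V''
v = [1, 4]                          # a vector in V

lhs = pullback(compose(beta, alpha), w)    # (beta alpha)^* w
rhs = pullback(alpha, pullback(beta, w))   # alpha^* (beta^* w)
print(lhs, rhs)

# Defining property of the dual map: (alpha^* w')(v) = w'(alpha(v)).
w_prime = pullback(beta, w)
print(pair(pullback(alpha, w_prime), v), pair(w_prime, apply(alpha, v)))
```

In this convention the components of the pulled-back covector are obtained by multiplying with the transposed matrix, which is the statement made above about the matrix of $\alpha^*$.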
Part 1. Calculus on Manifolds

Choose bases for $V$, $V'$, and $V''$. The matrix associated with $\beta\alpha$ is then the product (in the usual sense of matrix multiplication) of those associated with $\beta$ and $\alpha$. (It is to get this correspondence that we chose $\alpha(v_i) = \sum_j a_{ji} v_j'$ rather than $\sum_j a_{ij} v_j'$.) We see that

$\alpha$ is 1-1, that is, has zero kernel,† if and only if $\alpha^*$ is onto, that is, $\alpha^*(V'^*) = V^*$,    (3.8)

and

$\alpha$ is onto $V'$ if and only if $\alpha^*$ is 1-1.    (3.9)

(In matrix language, (3.8) and (3.9) translate into statements about the rank and nullity of a matrix and its transpose.)

Now suppose that $\alpha$ is a linear transformation of a vector space $V$ into itself, that is, $V' = V$. Let $\omega_1, \dots, \omega_n$ be a basis for $V^*$. Then, since $V^{*n}$ is one-dimensional, $\alpha^*(\omega_1 \wedge \cdots \wedge \omega_n)$ is a scalar multiple of $\omega_1 \wedge \cdots \wedge \omega_n$. Call this scalar multiple the determinant of $\alpha$, denoted by $\det \alpha$. Thus

$$\alpha^*(\omega_1 \wedge \cdots \wedge \omega_n) = (\det \alpha)\, \omega_1 \wedge \cdots \wedge \omega_n.$$

Suppose that $\beta$ is another linear transformation: $V \to V$. Then

$$\beta^*(\alpha^*(\omega_1 \wedge \cdots \wedge \omega_n)) = (\alpha\beta)^*(\omega_1 \wedge \cdots \wedge \omega_n) = \det(\alpha\beta)\, \omega_1 \wedge \cdots \wedge \omega_n.$$

It also equals

$$\beta^*(\det(\alpha)\, \omega_1 \wedge \cdots \wedge \omega_n) = \det(\alpha)\det(\beta)\, \omega_1 \wedge \cdots \wedge \omega_n,$$

that is, $\det(\alpha\beta) = \det(\alpha)\det(\beta)$, so that the determinant of the product of two linear transformations is the product of the determinants.

We can use this remark to show that $\det \alpha$ is independent of the basis chosen. Suppose that $\omega_1', \dots, \omega_n'$ is another basis for $V^*$. There is then a linear transformation $\beta: V \to V$ such that

$$\beta^*(\omega_i) = \omega_i', \qquad i = 1, \dots, n.$$

Also, then,

$$(\beta^{-1})^*(\omega_i') = \omega_i, \qquad i = 1, \dots, n.$$

Then

$$\alpha^*(\omega_1' \wedge \cdots \wedge \omega_n') = \alpha^*(\beta^*(\omega_1) \wedge \cdots \wedge \beta^*(\omega_n)) = \alpha^*\beta^*(\omega_1 \wedge \cdots \wedge \omega_n) = \det(\beta\alpha)\, \omega_1 \wedge \cdots \wedge \omega_n.$$
† The kernel of a linear transformation $\alpha$ is the set of $v \in V$ such that $\alpha(v) = 0$, that is, $\alpha^{-1}(0)$.
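Also,

$$(\det \alpha)(\omega_1' \wedge \cdots \wedge \omega_n') = (\det \alpha)\, \beta^*(\omega_1 \wedge \cdots \wedge \omega_n) = (\det \alpha)(\det \beta)\, \omega_1 \wedge \cdots \wedge \omega_n.$$

Since $\det(\beta\alpha) = (\det \alpha)(\det \beta) = \det(\alpha\beta)$, we see that

$$\alpha^*(\omega_1' \wedge \cdots \wedge \omega_n') = (\det \alpha)(\omega_1' \wedge \cdots \wedge \omega_n'),$$

that is, the determinant does not depend on the basis chosen to compute it.

We shall not carry out all the details necessary to recover the traditional definition of the determinant in terms of matrices. Suppose $(a_{ij})$ is the matrix of $\alpha$. Then

$$\alpha^*(\omega_i) = \sum_j a_{ij}\, \omega_j.$$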
Using the rules for exterior multiplication would give the correspondence: $\det \alpha = \det(a_{ij})$. Suppose we verify this only for $n = 2$:

$$\alpha^*(\omega_1 \wedge \omega_2) = (a_{11}\omega_1 + a_{12}\omega_2) \wedge (a_{21}\omega_1 + a_{22}\omega_2)$$
$$= a_{11}a_{22}\, \omega_1 \wedge \omega_2 + a_{12}a_{21}\, \omega_2 \wedge \omega_1$$
$$= (a_{11}a_{22} - a_{12}a_{21})\, \omega_1 \wedge \omega_2.$$

That is,

$$\det \alpha = a_{11}a_{22} - a_{12}a_{21},$$

as required.

Let us now apply this brief excursion into multilinear algebra to the tangent bundle $T(M)$ to a space $M$. For each $p \in M$, $M_p^*$ is the dual space to $M_p$ (that is, the space of covectors at $p$), while $M_p^{*r}$ is the space of $r$-covectors at $p$. Let

$$T^*(M) = \bigcup_{p \in M} M_p^*; \qquad T^{*r}(M) = \bigcup_{p \in M} M_p^{*r}$$

be the corresponding vector bundles. A differential form (of degree $r$) on $M$ is now a cross section of the vector bundle $T^{*r}(M)$. That is, it is a mapping (which we again denote by $\omega$) that assigns to each point $p \in M$ a multilinear, skew-symmetric form on the tangent vectors to $M$ at $p$: $(v_1, \dots, v_r) \to \omega(v_1, \dots, v_r)$. But $\omega$ also defines a multilinear map on vector fields. For $X_1, \dots, X_r \in V(M)$, assign the function $p \to \omega(X_1(p), \dots, X_r(p))$, which we denote by $\omega(X_1, \dots, X_r)$. Notice that this assignment is $F(M)$-multilinear; that is,

$$\omega(fX_1, X_2, \dots, X_r) = f\,\omega(X_1, \dots, X_r) \qquad \text{for } f \in F(M).$$
Now it will turn out that the spaces we study (for example, differentiable manifolds) will have the property that any $F(M)$-multilinear, skew-symmetric map $V(M) \times \cdots \times V(M) \to F(M)$ comes from a cross section of the bundle $T^{*r}(M)$. We can then interchangeably regard a differential form "geometrically" as a cross section or "algebraically," using in this way the module structure of $V(M)$ over the ring $F(M)$. We denote by $F^r(M)$ the set of $r$th degree differential forms, with $F^0(M)$ identical to $F(M)$; $F^r(M)$ of course is also an $F(M)$-module.

The differential forms and the vector fields are the main objects of our "differential calculus" on manifolds. So far, of course, we have not talked about differentiation except to motivate the original definition of the tangent bundle, but it appears with the introduction of the three principal, coordinate-free, "natural" differential operators: exterior differentiation, Jacobi bracket, and Lie derivative. The full development will be given in the next chapter. Here, we introduce exterior differentiation $d$ on functions.

The operator $d$ of exterior differentiation will ultimately be defined as a linear (but not an $F(M)$-linear) map: $F^r(M) \to F^{r+1}(M)$ for all $r$. For the moment, we define it only for $r = 0$, that is, on functions. Let $f \in F(M)$; $df$ is a first-degree differential form defined as

$$df(X) = X(f) \qquad \text{for } X \in V(M),$$

or alternatively,

$$df(v) = v(f) \qquad \text{for } v \in T(M).$$
Note that $d(fg) = g\,df + f\,dg$ for $f, g \in F(M)$.

Proof: $d(fg)(X) = X(fg) = X(f)g + X(g)f = (g\,df + f\,dg)(X)$ for $X \in V(M)$.
We have said that $d$ is a "natural" differential operator. What is meant by this is that $d$ has particularly simple properties when the space $M$ is subject to a mapping. Let $\phi$ be a mapping of $M$ into a space $M'$. Suppose $M'$ also has a ring of functions $F(M')$ on it, used to define tangent vectors, differential forms, etc. Now, if $f'$ is a real-valued function on $M'$, we can define a function $\phi^*(f')$ in the following way:

$$\phi^*(f')(p) = f'(\phi(p)) \qquad \text{for } p \in M.$$

We shall assume that $\phi$ satisfies the following condition: $\phi^*(F(M')) \subset F(M)$. (In case $M$ and $M'$ are differentiable manifolds, and $F(M)$, $F(M')$ are the rings of $C^\infty$ functions, this condition means that $\phi$ is a $C^\infty$ mapping.) Notice that the mapping $\phi^*$ induced on functions goes backward from the mapping of $\phi$ itself on points. This is a typical "dual" situation; in tensor analysis one says that this indicates that scalar functions transform like covariant tensors.

Now $\phi$ also induces a map on tangent vectors, which goes in the same "direction" as $\phi$ itself, namely, from $T(M)$ to $T(M')$. For $p \in M$, $v \in M_p$, let $\phi_*(v)$ be the following tangent vector to $M'$ at $\phi(p)$:

$$\phi_*(v)(f) = v(\phi^*(f)) \qquad \text{for } f \in F(M').$$

$\phi_*$ is a linear mapping from $M_p$ to $M'_{\phi(p)}$.

Proof. $\phi_*(v_1 + v_2)(f) = (v_1 + v_2)(\phi^*(f)) = v_1(\phi^*(f)) + v_2(\phi^*(f)) = \phi_*(v_1)(f) + \phi_*(v_2)(f)$.

$\phi_*$ is occasionally called the differential of $\phi$. The geometric interpretation of $\phi_*$ is very natural: Suppose $t \to \sigma(t)$ is a curve in $M$. Let $t \to \phi\sigma(t) = \sigma_1(t)$ be the image curve in $M'$. Then

$$\phi_*(\sigma'(t)) = \sigma_1'(t);$$

that is, the tangent vector of the image curve is the image under $\phi_*$ of the tangent vector to the original curve in $M$.

Proof.

$$\phi_*(\sigma'(t))(f) = \sigma'(t)(\phi^*(f)) = \frac{d}{dt} f(\sigma_1(t)) = \sigma_1'(t)(f).$$
Notice that $\phi$ does not have to be either 1-1 or onto in order that $\phi_*$ can be defined on tangent vectors. However, the situation with regard to vector fields is not so simple. Suppose that $X \in V(M)$. Then $p \to \phi_*(X(p))$ is a mapping $M \to T(M')$. One cannot associate a vector field on $M'$ with this mapping in an unambiguous way unless $\phi^{-1}$ exists. (Then one can define an "image" vector field $\phi_*(X)$ as $p' \to \phi_*(X(\phi^{-1}(p'))) = \phi_*(X)(p')$.)

However, differential forms do admit a simple law of transformation under $\phi$: This is one main reason for their usefulness in differential geometry. (The other is that they serve as "volume elements" for integration.) Recall that $\phi^*$ maps $F(M')$ into $F(M)$, that is, maps $F^0(M')$ into $F^0(M)$. We shall now extend this to a map of $F^r(M')$ to $F^r(M)$ for all $r$, that is, to a map sending a differential form $\omega'$ on $M'$ back to a differential form $\phi^*(\omega')$ on $M$. Recall that the linear map $\phi_*: M_p \to M'_{\phi(p)}$ induces a dual map $\phi^*$ from covectors on $M'_{\phi(p)}$ back to covectors on $M_p$, that is, $\phi^*: M'^{*r}_{\phi(p)} \to M_p^{*r}$. Regard $\omega'$ as a cross section $p' \to \omega'(p') \in M'^{*r}_{p'}$ of $T^{*r}(M')$. Then $\phi^*(\omega')$ is by definition the cross section $p \to \phi^*(\omega'(\phi(p)))$; that is,

$$\phi^*(\omega')(v_1, \dots, v_r) = \omega'(\phi_*(v_1), \dots, \phi_*(v_r)) \qquad \text{for } v_1, \dots, v_r \in T(M). \qquad (3.10)$$
Now we have the following very nice property of the exterior derivative operation:

$$\phi^*(df) = d\,\phi^*(f) \qquad \text{for } f \in F(M'). \qquad (3.11)$$

(When we extend $d$ to higher-degree differential forms, this property will also extend.)

Proof. For $v \in T(M)$,

$$\phi^*(df)(v) = df(\phi_*(v)) = \phi_*(v)(f) = v(\phi^*(f)) = d(\phi^*(f))(v).$$

Notice also that the definition of $\phi^*$ would be impossible if differential forms were defined solely in terms of the $F(M)$-module structure of $V(M)$ and were not the same as those defined as cross sections of the covector bundles.
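Property (3.11) can be tested numerically for maps between Euclidean spaces. The following is a minimal sketch, assuming a particular smooth map `phi` and function `f` chosen only for illustration, and using central differences in place of exact derivatives. None of the names below come from the text.

```python
import math


def phi(x, y):
    """A sample smooth map R^2 -> R^2 (hypothetical choice)."""
    return (x * y, x + math.sin(y))


def f(u, w):
    """A sample smooth function on the target space (hypothetical choice)."""
    return u * u + math.cos(w)


def directional(g, p, v, h=1e-6):
    """Central-difference approximation of v(g) at p, for scalar g on R^2."""
    plus = g(p[0] + h * v[0], p[1] + h * v[1])
    minus = g(p[0] - h * v[0], p[1] - h * v[1])
    return (plus - minus) / (2 * h)


def pushforward(p, v, h=1e-6):
    """Central-difference approximation of phi_*(v) at p."""
    plus = phi(p[0] + h * v[0], p[1] + h * v[1])
    minus = phi(p[0] - h * v[0], p[1] - h * v[1])
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))


p, v = (0.7, -0.3), (1.0, 2.0)

# Left side of (3.11): phi^*(df)(v) = df(phi_*(v)), taken at the point phi(p).
lhs = directional(f, phi(*p), pushforward(p, v))
# Right side of (3.11): d(phi^* f)(v) = v(f o phi), taken at the point p.
rhs = directional(lambda x, y: f(*phi(x, y)), p, v)
print(lhs, rhs)
```

The two printed values should agree up to the finite-difference error, which is the chain rule in disguise.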
Exercises

1. Suppose $v_1, \dots, v_n$ is a basis of a vector space $V$. Define $\omega_1, \dots, \omega_n \in V^*$ such that $v = \omega_1(v)v_1 + \cdots + \omega_n(v)v_n$ for all $v \in V$. Prove that $\omega_1, \dots, \omega_n$ are a basis of $V^*$.

2. Prove (3.7).

3. Prove the traditional explicit formula for the determinant of a matrix, using the definition of the determinant of a linear transformation given in the text.

4. Prove (3.11) in an explicit way for mappings between Euclidean spaces.
4
Specialization to Euclidean Spaces: Differential Manifolds
Let $R^n$ be the space of $n$ real variables $(x_1, \dots, x_n)$. Thus, a point of $R^n$ is an ordered $n$-tuple of real numbers; such an $n$-tuple will be denoted by $(x_j)$, $1 \le j \le n$, or by a vector notation $x$ when no confusion is likely. Also, $x_j$, without parentheses, will denote the real-valued function on $R^n$ that assigns the $j$th coordinate to each point of $R^n$.† Consider a domain $D$ in $R^n$; say, for simplicity, that it is convex.‡ Let $F(D)$ be the ring of real-valued functions that depend in a $C^\infty$ way on the points of $D$, that is, such that the partial derivatives of all orders exist. If $f \in F(D)$, then $\partial f/\partial x_i$, $\partial^2 f/(\partial x_i\, \partial x_j)$, etc., denote the partial derivatives with respect to the indicated variables. We shall now show how the objects such as "tangent vector," "vector field," and "differential form," which were introduced rather abstractly in Chapter 3, take a very familiar form here.

THEOREM 4.1
Let $X \in V(D)$. Then, there are uniquely determined functions $A_1, \dots, A_n \in F(D)$ such that§

$$X(f) = A_i \frac{\partial f}{\partial x_i} \qquad \text{for } f \in F(D). \qquad (4.1)$$

† The reader should note that much of the notational confusion in undergraduate differential calculus is caused by not making this distinction.

‡ A domain in $R^n$ is an open, connected set of points. It is convex if it contains, for any two points $x^0$ and $x^1$ in it, the whole line segment $t x^0 + (1 - t)x^1$, for $0 \le t \le 1$.

§ We shall use the summation convention from now on. The general rules (as far as they can be formalized) are that when two indices occur in expressions multiplied together, they are to be summed over their "natural range of values" (which, presumably, has already been specified). We do not use lower and upper indices as in tensor analysis; where upper indices are used, usually they will be ordinary counting indices. The same index should not occur three or more times; this always indicates that a mistake has been made. Occasionally it will be required that indices occurring together not be summed, but this will be stated explicitly. Part of the convention requires that one index standing "alone" take on all values from its natural range; for example, $y_{ijk} p_j$ means $\sum_j y_{ijk} p_j$ for $i$ and $k = 1, 2, \dots, n$.

Proof. Let $x^0 = (x_i^0)$ be a fixed (for the moment) point of $D$. Then, given any $C^\infty$ function $f(x)$ in $D$, Taylor's formula implies that $f$ has a representation of the form

$$f(x) = f(x^0) + \frac{\partial f}{\partial x_i}(x^0)(x_i - x_i^0) + g_{ij}(x)(x_i - x_i^0)(x_j - x_j^0). \qquad (4.2)$$

(The functions $g_{ij}(x)$ are those obtained from any of the classical formulas for the remainder; they are $C^\infty$ if $f$ is also.) Calculate $X(f)$, using the linearity and (2.12), and evaluate it at $x^0$ to obtain $X(f)(x^0)$. Notice, for example, that $X(f(x^0)) = 0$, since $X$ applied to any constant function is, from (2.12), zero. The result is

$$X(f)(x^0) = \frac{\partial f}{\partial x_i}(x^0)\, X(x_i)(x^0).$$

(The third terms drop out when $x^0$ is substituted.) Now $x_i$ is merely another function on $D$: Call $X(x_i) = A_i$. Thus, letting $x^0$ vary also, we have

$$X(f) = A_i \frac{\partial f}{\partial x_i} = A_i \frac{\partial}{\partial x_i}(f),$$

whence the theorem. Q.E.D.
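Theorem 4.1 can be illustrated on a small scale with dual numbers, which compute directional derivatives of polynomial expressions exactly. The sketch below is an assumption-laden illustration, not the author's method: the class `Dual`, the helper `make_field`, and the sample components `A` are all hypothetical. It builds $X = A_i\, \partial/\partial x_i$, recovers the components as $A_i = X(x_i)$, and checks the derivation (Leibniz) property at one point.

```python
class Dual:
    """Number a + b*eps with eps**2 = 0; b carries a directional derivative."""

    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.a + o.a, self.b + o.b)

    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        return Dual(self.a - o.a, self.b - o.b)

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        o = self._lift(other)
        return Dual(self.a * o.a, self.a * o.b + self.b * o.a)

    __rmul__ = __mul__


def make_field(A):
    """Vector field X = A_i d/dx_i; A maps a point to its list of components."""
    def X(f):
        def Xf(x):
            args = [Dual(xi, ai) for xi, ai in zip(x, A(x))]
            out = f(*args)
            return out.b if isinstance(out, Dual) else 0.0
        return Xf
    return X


A = lambda x: (x[1], -x[0])          # hypothetical components A_1, A_2
X = make_field(A)
f = lambda x, y: x * x * y + 3 * y
g = lambda x, y: x - y * y
fg = lambda x, y: f(x, y) * g(x, y)
x0 = (2.0, 5.0)

# The components are recovered by applying X to the coordinate functions.
coords = [X(lambda x, y: x)(x0), X(lambda x, y: y)(x0)]
# Derivation property at x0: X(fg) = X(f) g + f X(g).
leibniz_gap = X(fg)(x0) - (X(f)(x0) * g(*x0) + f(*x0) * X(g)(x0))
print(coords, X(f)(x0), leibniz_gap)
```

The class supports only addition, subtraction, and multiplication, so the sample functions are restricted to polynomials.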
Theorem 4.1 can be interpreted in the following way: Consider the differential operators $\partial/\partial x_i$, $i = 1, \dots, n$, which map $f$ into $\partial f/\partial x_i$. They satisfy (2.12) and hence define elements of $V(D)$ (if we define $V(D)$ as derivations of $F(D)$). Theorem 4.1 then asserts that they form a basis for the module $V(D)$ over the ring $F(D)$. Now we can state precisely how vector fields are defined as cross sections of the tangent bundle.

THEOREM 4.2
Let $x^0$ be a point of $D$, and let $v \in D_{x^0}$. Then

$$v(f) = v(x_i)\, \frac{\partial f}{\partial x_i}(x^0) \qquad \text{for } f \in F(D);$$

that is,

$$v = v(x_i)\, \frac{\partial}{\partial x_i}(x^0). \qquad (4.3)$$

In particular, we see that the values of the vector fields $\partial/\partial x_i$ at each point of $D$ form a basis of the tangent space.

Proof: Applying $v$ to (4.2) gives the result, using (2.6). Q.E.D.
Theorem 4.2 tells us that $T(D)$ itself is parametrized by a domain $D' \subset R^{2n}$, for any $v \in D_x$ admits a unique representation of the form

$$v = a_i\, \frac{\partial}{\partial x_i}(x):$$

The assignment $v \to (x, a_i)$ defines the correspondence. We can use this to require that a cross section map $X: D \to T(D)$ be $C^\infty$-differentiable. The condition for this is clearly that the functions $x^0 \to X(x_j)(x^0) = A_j(x^0)$ be elements of $F(D)$. We see from Theorem 4.2 that $X$ considered as a cross section arises from the derivation $A_i(\partial/\partial x_i)$ of $F(D)$: There is equivalence of the two possible definitions of vector field.

Let us now examine differential forms. The coordinate functions $x_j$ are elements of $F(D)$. Their differentials $dx_j$ are both $F(D)$-linear functions on $V(D)$ and cross sections of the bundle $T^*(D)$:

$$dx_j(X) = X(x_j) \qquad \text{for } X \in V(D),$$
$$dx_j(v) = v(x_j) \qquad \text{for } v \in T(D).$$

Suppose $\omega$ is any 1-differential form on $D$ in the sense that it is an $F(D)$-linear map of $V(D) \to F(D)$. Then

$$\omega(X) = \omega\!\left(X(x_i)\, \frac{\partial}{\partial x_i}\right) = X(x_i)\, \omega\!\left(\frac{\partial}{\partial x_i}\right) = \omega\!\left(\frac{\partial}{\partial x_i}\right) dx_i(X);$$

that is, $\omega = \omega(\partial/\partial x_i)\, dx_i$. In particular, every such form arises from a cross section of the bundle $T^*(D)$, and the particular "coordinate" differential forms $dx_j$ form a basis for the module $F^1(D)$. A similar remark holds for $r$-degree forms. Once we know that the $dx_j$ form a basis, arguments identical to those used in Theorem 3.1 imply that the $dx_{j_1} \wedge \cdots \wedge dx_{j_r}$, $1 \le j_1 < j_2 < \cdots < j_r \le n$, form a basis for all forms of degree $r$, whether they are defined as $F(D)$-multilinear skew-symmetric maps $V(D) \times \cdots \times V(D) \to F(D)$ or as cross sections of the bundle of $r$-covectors. In particular, every $r$-form admits a unique expansion of the form $\omega = a_{i_1 \cdots i_r}\, dx_{i_1} \wedge \cdots \wedge dx_{i_r}$, with skew-symmetric coefficients $a_{i_1 \cdots i_r}$.

We shall now switch from domains in Euclidean space, denoted by $D$, $D'$, etc., to general differentiable manifolds (of differentiability class $C^\infty$), denoted by $M$, $N$, etc. A differentiable manifold carries several sorts of structures, since intuitively it should be considered as a space that locally looks like a convex domain in Euclidean space, with all the local Euclidean structures tied together globally by the topology of the space. (Think of a closed surface in 3-space, say a sphere or a torus.)
Definition
A space $M$ is said to be a differentiable manifold (of class $C^\infty$) of dimension $n$ if it carries the following structures:

(a) $M$ is a Hausdorff topological space and is the union of a countable number of compact subsets.

(b) $M$ has a covering by a family of open subsets, the typical ones denoted by $U, U', \dots$, and each open subset $U$ of the family has associated with it a convex domain $D$ in $R^n$ and a homeomorphism $\phi$ of $U$ with $D$ such that, whenever two open sets $U$, $U'$ of the covering intersect, the associated transition mapping of $\phi(U \cap U')$ with $\phi'(U \cap U')$ is a map of differentiability class $C^\infty$ in the usual sense for Euclidean space. (Explicitly, this map assigns to an $x \in D$ such that $\phi(p) = x$, for $p \in U \cap U'$, the point $\phi'(p) \in D'$.)

In addition, for differential-geometric purposes, it is convenient to suppose that the underlying topological space is connected (hence arcwise connected). We shall suppose that this is so implicitly, noting any exceptions explicitly. Each of these admissible homeomorphisms of an open set $U$ with a subset $D$ of $R^n$ will be called a chart. $U$ itself will be called a coordinate neighborhood. The collection of all these charts will be called the atlas that defines the manifold structure. Notice that any open subset of $R^n$ has a manifold structure: The covering can be taken as that defined by its open, convex subsets.

A mapping $\phi: M \to M'$ between two manifolds will be said to be of differentiability class $C^\infty$ if, whenever referred back via charts for $M$ and $M'$, it defines a $C^\infty$ map in the usual sense for $R^n$. As in the earlier part of this work, we shall deal mainly with $C^\infty$ maps; hence we shall not so specify every time one is introduced. Non-$C^\infty$ maps (usually piecewise $C^\infty$, though) will appear later, but we shall identify them explicitly. In particular, a real-valued $C^\infty$ function on a manifold $M$ is a well-defined concept, that is, just a $C^\infty$ map $M \to R$. The set of these functions will be denoted by $F(M)$.

We base the definitions of the fundamental concepts of tangent vector, vector field, differential form, and similar terms on the properties of $F(M)$, as described before. A few details must be settled before proceeding further, to assure that everything works as smoothly as for domains in $R^n$. First, the manifold structure on the tangent and cotangent bundles should be made precise. This can be done in the following way: Let $U$ be a coordinate neighborhood with a chart giving a correspondence with a domain $D \subset R^n$. Then, under this correspondence, the tangent vectors above points of $U$ correspond in a 1-1 way to the tangent bundle $T(D)$, which we have seen to be homeomorphic to a domain in $R^{2n}$. These "charts" for $T(M)$ can be used to give it a manifold structure, and the vector fields are defined as the $C^\infty$ cross-section maps $M \to T(M)$. It can be verified (exercise) that a cross-section map $X: M \to T(M)$ is $C^\infty$ if and only if the function $p \to X(p)(f)$ is $C^\infty$ for each $f \in F(M)$. Then one shows the identity between vector fields as cross sections of $T(M)$ or as derivations of $F(M)$ by relating the proof back to the special case where $M$ is a domain in $R^n$. We leave the details of this as exercises.

Let $M$ be a manifold. A set of functions $x_1, \dots, x_n$ defined on an open subset $U$ of $M$ is a coordinate system for $U$ if the map $p \to (x_1(p), \dots, x_n(p))$ of $U \to R^n$ is a diffeomorphism.† Then the differentials $dx_1, \dots, dx_n$ form a basis for differential forms in $U$. We define $\partial/\partial x_j$ as the dual basis of vector fields in $U$; that is, $(\partial/\partial x_j)(f)$ are the coefficients in the expansion:

$$df = \frac{\partial f}{\partial x_j}\, dx_j. \qquad (4.4)$$
Conversely, suppose $(x_i)$ are functions such that the $dx_j$ form a basis for the differential forms in $U$. Do they form a coordinate system for $U$, at least if $U$ is a sufficiently small open set of $M$? The answer is "yes," by the use of the implicit function theorem. Suppose that $\phi$ is a chart diffeomorphism from $U$ to a domain $D$ of $R^n$, with coordinates on this $R^n$ denoted by $y_1, \dots, y_n$. Transferring back to $D$, the $x_1, \dots, x_n$ become functions of the $y$. By (4.4),

$$dx_i = \frac{\partial x_i}{\partial y_j}\, dy_j. \qquad (4.5)$$

Thus the $(dx_1, \dots, dx_n)$ form a basis for differential forms if and only if the Jacobian matrix $(\partial x_i/\partial y_j)$ is nonsingular, that is, has nonzero determinant. If this is so, the implicit function theorem (see Chapter 5) asserts that the mapping $y \to x$ is a local diffeomorphism, that is, a diffeomorphism if $U$ is sufficiently small.

Classical tensor analysis works by describing geometric objects completely in terms of such local coordinate systems. Suppose we regard $(x_i)$ and $(y_i)$ as two different local coordinate systems for the same neighborhood $U$ of $M$. Then (4.5) enables us to express the transformation law of components of covariant tensors (like differential forms) from one coordinate system to the other. For example, if $\omega = a_i\, dx_i = b_i\, dy_i$, then

$$b_j = a_i\, \frac{\partial x_i}{\partial y_j},$$

which is the characteristic transformation law. The transformation law for contravariant tensors (like vector fields) is most readily derived from (4.5).
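As a small worked instance of the covariant transformation law, the sketch below takes Cartesian coordinates as the $x_i$ and polar coordinates $(r, \theta)$ as the $y_j$, and transforms the sample form $-y\,dx + x\,dy$. The choice of form and the helper names `jacobian` and `transform_covariant` are assumptions made for illustration.

```python
import math


def jacobian(r, theta):
    """Entries dx_i/dy_j for x = (r cos t, r sin t) and y = (r, t)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]


def transform_covariant(a, J):
    """New components b_j = a_i * dx_i/dy_j, summed over i."""
    return [sum(a[i] * J[i][j] for i in range(len(a)))
            for j in range(len(J[0]))]


r, theta = 2.0, 0.6
x, y = r * math.cos(theta), r * math.sin(theta)

a = [-y, x]   # components of the sample form -y dx + x dy at this point
b = transform_covariant(a, jacobian(r, theta))
print(b)      # components with respect to dr and dtheta
```

Carrying out the sum by hand gives $b_r = 0$ and $b_\theta = r^2$, so in polar coordinates this form is $r^2\, d\theta$; the numerical output should match that up to rounding.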
† A map $\phi$ from manifold $M$ to manifold $M'$ is a diffeomorphism if the inverse map $\phi^{-1}: M' \to M$ exists and is $C^\infty$. (A map may be 1-1 and onto, with inverse mapping not $C^\infty$; for example, the map $x \to x^3$ from $R \to R$.)
Recall that $\partial/\partial x_i$ and $\partial/\partial y_i$ are defined as the vector fields dual to the $dx_i$ and $dy_i$. Suppose

$$\frac{\partial}{\partial y_i} = A_{ij}\, \frac{\partial}{\partial x_j}.$$

Then, by (4.5),

$$A_{ij} = dx_j\!\left(\frac{\partial}{\partial y_i}\right) = \frac{\partial x_j}{\partial y_k}\, dy_k\!\left(\frac{\partial}{\partial y_i}\right) = \frac{\partial x_j}{\partial y_i}.$$

That is,

$$\frac{\partial}{\partial y_i} = \frac{\partial x_j}{\partial y_i}\, \frac{\partial}{\partial x_j}. \qquad (4.6)$$

Then (4.6) is a differential-geometric version of the chain rule for differentiation.

However, this restriction to the use of "flat" bases of differential forms and vector fields is the major defect of classical tensor analysis. Many geometric structures (for example, Riemannian metrics) take a very awkward form when their description is forced into this mold. E. Cartan, in his work on differential geometry and Lie groups (which is the foundation of all "modern" differential geometry), worked with a formalism that is halfway between tensor analysis and the formalism used today. Roughly, he worked with bases $(\omega_1, \dots, \omega_n)$ of the 1-differential forms in neighborhoods of $M$ that are not necessarily just the differentials of coordinate functions. He used the greater freedom to choose the "moving frames" to reduce many geometric problems to much simpler form than was possible in classical language.
Exercises

1. If $M$ is a manifold and $T(M) = \bigcup_{p \in M} M_p$ is its tangent bundle, prove that the procedure sketched in the text defines a manifold structure for $T(M)$. Prove that a cross-section map $X: M \to T(M)$ is $C^\infty$ if and only if $X(f)$ is $C^\infty$ for each $f \in F(M)$.

2. Let $M$ be a manifold, with $p_0 \in M$. Show that there exist two open neighborhoods $U_1$, $U_2$ of $p_0$ and a function $f \in F(M)$ such that:
(a) The closure of $U_1$ is contained in $U_2$.
(b) $f(p) = 1$ for $p \in U_1$.
(c) $f(p) = 0$ for $p \in M - U_2$.

3. Show that a vector field on a manifold $M$ can be defined either as a derivation of $F(M)$ or as a $C^\infty$ cross section of the tangent bundle $T(M) \to M$.

4. Similarly, show that a differential form can be defined either as a $C^\infty$ cross section of the covector bundle, or as an $F(M)$-multilinear form on $V(M)$.

5. Prove (4.2).

6. Suppose $\phi: M \to M'$ is a $C^\infty$ map between manifolds. Suppose $X$, $X'$ are vector fields on $M$ and $M'$. Let us say that they are $\phi$-related if

$$\phi^*(X'(f')) = X(\phi^*(f')) \qquad \text{for each } f' \in F(M').$$

Prove: If $X, Y \in V(M)$ are $\phi$-related to $X', Y' \in V(M')$, then $[X, Y]$ is $\phi$-related to $[X', Y']$. Prove this abstractly, and then by using coordinate systems.

7. Suppose $\Delta$ is the Laplace operator in the plane:

$$\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}.$$

Work out its expression in polar coordinates, $x = r\cos\theta$, $y = r\sin\theta$, using (4.6).
5
Mappings, Submanifolds, and the Implicit Function Theorem
We now develop the implicit function theorem and its consequences for the theory of mappings between manifolds, based on the "inverse function theorem" in the version given by Spivak [1, p. 35]. In fact, this result takes the following form for manifolds.

THEOREM 5.1
Suppose $M$ and $M'$ are manifolds of the same dimension, and $\phi: M \to M'$ is a map between them. Let $p$ be a point of $M$, $p' = \phi(p)$. Suppose that $\phi_*(M_p) = M'_{p'}$. Then there is an open subset $U$ containing $p$ such that:
(a) $\phi(U)$ is an open subset of $M'$.
(b) $\phi$ is a diffeomorphism between $U$ and $\phi(U)$, that is, there is a map $\pi: \phi(U) \to U$ such that

$$\phi\pi(p') = p' \quad \text{for } p' \in \phi(U), \qquad \pi\phi(p) = p \quad \text{for } p \in U.$$

For example, suppose $M$ and $M'$ are identified with convex subsets $D$ and $D'$ of $R^n$. One often wants to regard $\phi$ as defining a "new coordinate system" for $D$. For example, suppose $x \in D$ is of the form $(x_1, \dots, x_n)$, with $\phi(x) = (\phi_1(x), \dots, \phi_n(x)) = (x_1', \dots, x_n')$. Then, regarding $x_1', \dots, x_n'$ as real-valued functions on $M'$, we have $\phi^*(x_i') = \phi_i(x)$. Now

$$\phi^*(dx_i') = d\phi_i = \frac{\partial \phi_i}{\partial x_j}\, dx_j.$$

Since $(\partial \phi_i/\partial x_j)$ is the matrix of $\phi_*$ with respect to the natural bases for the tangent spaces of $M$ and $M'$ defined by their being in $R^n$, the condition $\phi_*(M_p) = M'_{p'}$ is equivalent to any one of the following conditions:
(a) $\det(\partial \phi_i/\partial x_j) \neq 0$.
(b) $d\phi_1, \dots, d\phi_n$ are linearly independent at every point.
(c) $d\phi_1 \wedge \cdots \wedge d\phi_n \neq 0$.

We can now turn to the case of mapping between manifolds of different dimensions.
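Condition (a) is easy to probe numerically, and doing so shows why Theorem 5.1 is only a local statement. The sketch below assumes a particular sample map (squaring of complex numbers, written in real coordinates) and estimates the Jacobian determinant by central differences. The names `phi` and `jacobian_det` are illustrative and not taken from the text.

```python
def phi(x1, x2):
    """Sample map R^2 -> R^2 (hypothetical example): complex squaring."""
    return (x1 * x1 - x2 * x2, 2.0 * x1 * x2)


def jacobian_det(x1, x2, h=1e-6):
    """Central-difference estimate of det(d phi_i / d x_j) at (x1, x2)."""
    fx_p, fx_m = phi(x1 + h, x2), phi(x1 - h, x2)
    fy_p, fy_m = phi(x1, x2 + h), phi(x1, x2 - h)
    d11 = (fx_p[0] - fx_m[0]) / (2 * h)
    d21 = (fx_p[1] - fx_m[1]) / (2 * h)
    d12 = (fy_p[0] - fy_m[0]) / (2 * h)
    d22 = (fy_p[1] - fy_m[1]) / (2 * h)
    return d11 * d22 - d12 * d21


# Away from the origin the determinant 4(x1^2 + x2^2) is nonzero.
print(jacobian_det(1.0, 0.5))
# At the origin the determinant vanishes, so the hypothesis fails there.
print(jacobian_det(0.0, 0.0))
# phi identifies x and -x, so it is not 1-1 on the whole punctured plane.
print(phi(1.0, 0.5) == phi(-1.0, -0.5))
```

For this map the hypothesis of Theorem 5.1 holds at every point except the origin, yet the map is two-to-one there, so the neighborhood $U$ in the theorem really must be taken small.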
5. Implicit Function Theorem for Mappings
THEOREM 5.2
Let $\phi: D \to D'$ be a map of domains, $D \subset R^n$, $D' \subset R^m$, $m \le n$. Suppose that $\phi$ satisfies the following condition:

$$\phi_*(D_x) = D'_{\phi(x)} \qquad \text{for all } x \in D.$$

($\phi$ is then said to be a maximal rank mapping.) Then, if $D$ is sufficiently small, it can be changed by a diffeomorphism so that $\phi$ is just the standard projection of $R^n$ onto $R^m$. Schematically, there is a diagram of maps as follows:

$$D \longrightarrow R^n \xrightarrow{\ \text{projection}\ } R^m, \qquad \text{with composite } \phi: D \to R^m.$$

(The map $R^n \to R^m$ is that which assigns, say, to the point $(x_1, \dots, x_n) \in R^n$ the point $(x_1, \dots, x_m) \in R^m$. (Recall that $m \le n$.))
Proof. Suppose coordinates for $D$ are $x_1, \dots, x_n$; coordinates for $D'$ are $y_1, \dots, y_m$. Then the map is defined by $\phi(x_1, \dots, x_n) = (\phi_1(x), \dots, \phi_m(x)) = (y_1, \dots, y_m)$. The $n \times m$ matrix

$$\left(\frac{\partial \phi_a}{\partial x_i}\right), \qquad 1 \le a \le m, \quad 1 \le i \le n,$$

is the matrix of the linear transformation $\phi_*: D_x \to D'_{\phi(x)}$ with respect to the natural bases of these vector spaces. To say that $\phi_*$ is onto is to say that the rank of this matrix is maximal, that is, $m$. After possibly relabeling coordinates and shrinking $D$, we may then suppose that

$$\det\left(\frac{\partial \phi_a}{\partial x_b}(x)\right) \neq 0, \qquad 1 \le a, b \le m. \qquad (5.1)$$

Consider the functions $\phi_1, \dots, \phi_m, x_{m+1}, \dots, x_n$ on $D$. We want to show that they form a new coordinate system for $D$. (This is precisely what is required to prove the theorem, since the diffeomorphism $D \to R^n$ is defined by $(x_1, \dots, x_n) \to (\phi_1(x), \dots, \phi_m(x), x_{m+1}, \dots, x_n)$.) We must then show that the 1-forms $d\phi_1, \dots, d\phi_m, dx_{m+1}, \dots, dx_n$ are linearly independent. Suppose that there is a linear relation of the form

$$\sum_{a=1}^{m} \lambda_a\, d\phi_a + \sum_{i=m+1}^{n} \lambda_i\, dx_i = 0, \qquad (5.2a)$$

or

$$\sum_{a,b=1}^{m} \lambda_a \frac{\partial \phi_a}{\partial x_b}\, dx_b + \cdots = 0 \qquad (5.2b)$$

(the terms $\cdots$ involve $dx_{m+1}, \dots, dx_n$), forcing

$$\sum_{a=1}^{m} \lambda_a \frac{\partial \phi_a}{\partial x_b} = 0 \qquad \text{for } 1 \le b \le m;$$

hence $\lambda_a = 0$ by (5.1); hence $\lambda_{m+1} = 0 = \cdots = \lambda_n$ by (5.2), which completes the proof.
This result can be rephrased in another way that is often useful in practice. Suppose $M$ is a manifold, and $f_1, \dots, f_m$ is a set of real-valued functions on $M$. Suppose also that $df_1, \dots, df_m$ are linearly independent at every point of $M$. Then if $(x_1, \dots, x_n)$ is a coordinate system for $M$, the map $(x_1, \dots, x_n) \to (f_1(x), \dots, f_m(x))$ of a domain in $R^n$ into $R^m$ is of maximal rank. We conclude that in a neighborhood of each point of $M$, a coordinate system of functions can be introduced for which $f_1, \dots, f_m$ are the first $m$ elements.

Now we study submanifold maps.

Definition
Let $N$ and $M$ be manifolds, $\phi: N \to M$ a map. $\phi$ is said to define an immersion of $N$ in $M$ if the following condition is satisfied:

For each $p \in N$, the tangent map $\phi_*: N_p \to M_{\phi(p)}$ is 1-1.    (5.3)

If, in addition, $\phi$ is 1-1, it is said to be a submanifold map of $N$ in $M$, or to define $N$ as a submanifold of $M$.

Remarks. Strictly speaking, a submanifold consists of the ordered triple $(N, M, \phi)$ satisfying these two conditions. It is often convenient and customary to relax this precise statement and regard the submanifold as $\phi(N)$, when no confusion is likely.
THEOREM 5.3
Let $\phi: N \to M$ be an immersion map, with $n = \dim N$ and $m = \dim M$. Then, each point $p \in N$ has a neighborhood $U$ such that:
(a) $\phi$ restricted to $U$ is a submanifold map.
(b) $\phi(p)$ has a neighborhood $V$ with a coordinate system $z_1, \dots, z_m$ for $V$ such that $\phi(U) \subset V$, and $\phi(U)$ is the set of all points of $V$ on which the functions $z_{n+1}, \dots, z_m$ are zero.

Proof. Since $\phi_*: N_p \to M_{\phi(p)}$ is 1-1, the dual map $\phi^*: M^*_{\phi(p)} \to N_p^*$ on covectors is onto, by (3.8). Then we can find a coordinate system $y_1, \dots, y_m$ valid in a neighborhood of $\phi(p)$ such that:

The values of the 1-forms $\phi^*(dy_1), \dots, \phi^*(dy_n)$ in a neighborhood of $p$ form a basis for the 1-covectors.
Then, by Theorem 5.1, in a neighborhood of $p$ the functions $\phi^*(y_1), \dots, \phi^*(y_n)$ form a coordinate system of $N$. Now, the functions $\phi^*(y_{n+1}), \dots, \phi^*(y_m)$ are functionally dependent on the $\phi^*(y_1), \dots, \phi^*(y_n)$, say,

$$\phi^*(y_{n+1}) = F_{n+1}(\phi^*(y_1), \dots, \phi^*(y_n)),$$
$$\vdots$$
$$\phi^*(y_m) = F_m(\phi^*(y_1), \dots, \phi^*(y_n)).$$

We may suppose without loss of generality that the $F$ are defined over all $R^n$. Consider the following functions on the neighborhood of $\phi(p)$:

$$y_1, \dots, y_n, \quad y_{n+1} - F_{n+1}(y_1, \dots, y_n), \quad \dots, \quad y_m - F_m(y_1, \dots, y_n).$$
Their differentials are linearly independent in this neighborhood and hence there is a new coordinate system, say, $z_1, \dots, z_m$, for a possibly smaller neighborhood of $\phi(p)$ such that $\phi^*(z_1), \dots, \phi^*(z_n)$ is a coordinate system for $U$, and $\phi^*(z_{n+1}) = 0 = \cdots = \phi^*(z_m)$. These properties imply (a) and (b) required for the theorem.

Next, we inquire about the intersection of two immersed submanifolds. Here we must make the distinction between the case where the two submanifolds intersect "in general position" and where they do not. Intuitively, two such submanifolds are not in "general position" when they can be deformed slightly to change the dimension of the intersection, although this will not be the precise definition.

First we shall deal with a problem in linear algebra. Let $V$ be a vector space over the real numbers, and let $V'$, $V''$ be linear subspaces. Construct the direct sum vector space $V' \oplus V''$. We can map this into $V$, sending $v' \oplus v''$ into $v' + v''$. The kernel of this linear map is isomorphic to $V' \cap V''$; the range is $V' + V'' \subset V$. Thus we have the relation

$$\dim V' + \dim V'' - \dim(V' \cap V'') = \dim(V' + V'') \le \dim V,$$

or

$$\dim(V' \cap V'') \ge \dim V' + \dim V'' - \dim V.$$

This inequality suggests that we make the following definition:

Definition
The linear subspaces $V'$ and $V''$ of $V$ are in general position if:
(a) $\dim(V' \cap V'') = \dim V' + \dim V'' - \dim V$ for the case $\dim V' + \dim V'' \ge \dim V$.
(b) $\dim(V' \cap V'') = 0$ for the case $\dim V' + \dim V'' \le \dim V$.
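The definition can be tested mechanically for subspaces of $R^n$ given by spanning sets. The sketch below is an illustration under stated assumptions: subspaces are lists of spanning row vectors, ranks are computed by exact Gaussian elimination over the rationals, and the function names `rank` and `in_general_position` are hypothetical. The dimension of the intersection is obtained from the relation $\dim(V' \cap V'') = \dim V' + \dim V'' - \dim(V' + V'')$.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of row vectors, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][col] / m[rk][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def in_general_position(span1, span2, dim_v):
    """Check cases (a) and (b) of the definition for subspaces of R^dim_v."""
    d1, d2 = rank(span1), rank(span2)
    d_sum = rank(span1 + span2)      # dim(V' + V'')
    d_int = d1 + d2 - d_sum          # dim(V' n V'')
    return d_int == max(0, d1 + d2 - dim_v)


xy_plane = [(1, 0, 0), (0, 1, 0)]
yz_plane = [(0, 1, 0), (0, 0, 1)]
line_in_xy = [(1, 1, 0)]
z_axis = [(0, 0, 1)]

print(in_general_position(xy_plane, yz_plane, 3))    # planes meeting in a line
print(in_general_position(xy_plane, xy_plane, 3))    # coincident planes
print(in_general_position(xy_plane, line_in_xy, 3))  # line lying in the plane
print(in_general_position(xy_plane, z_axis, 3))      # line crossing the plane
```

Using `max(0, ...)` merges cases (a) and (b) into one test, since the two cases agree when $\dim V' + \dim V'' = \dim V$.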
Roughly, we may say that $V'$ and $V''$ are in general position if $V' \cap V''$ has the minimal dimension compatible with the above inequality. Notice that, in case (a), $\dim(V' + V'') = \dim V$; that is, $V' + V'' = V$. The crucial geometric property of this definition can now be stated.

THEOREM 5.4
If $(V_t', V_t'')$ are two families of linear subspaces of $V$, depending continuously on a parameter $t$ (say, $0 \le t \le 1$), and if $(V_0', V_0'')$ are in general position, then $(V_t', V_t'')$ are in general position for $t$ sufficiently small.

Proof. The general position condition is equivalent to the condition that the linear map $V' \oplus V'' \to V$, constructed above, when made into a matrix by means of bases, have maximal rank; that is, a subdeterminant of maximal order must be nonzero. Since the determinant must vary continuously, this subdeterminant remains nonzero for sufficiently small $t$.
THEOREM 5.5 Let $N$, $N'$, $M$ be manifolds, and let $\phi: N \to M$, $\phi': N' \to M$ be immersion maps. Suppose $p \in N$, $p' \in N'$ are points such that $\phi(p) = \phi'(p')$, and $\phi_*(N_p)$ and $\phi'_*(N'_{p'})$ are in general position within $M_{\phi(p)}$. Then there are neighborhoods $U$ of $p$, $U'$ of $p'$, in $N$ and $N'$ such that $\phi(U) \cap \phi'(U')$ is a submanifold of $M$ whose dimension is equal to $\dim N + \dim N' - \dim M$.

Proof. Let $n = \dim M - \dim N$. By Theorem 5.3, there is a neighborhood $V$ of $\phi(p)$, a neighborhood $U$ of $p$, and a maximal rank map $\psi: V \to R^n$ such that

$$\psi(\phi(p)) = 0, \qquad \phi(U) = \psi^{-1}(0), \qquad \phi_*(N_p) = \psi_*^{-1}(0).$$

Let $U' = \phi'^{-1}(V)$. Consider the map $\psi\phi': U' \to R^n$. In case $\dim N + \dim N' = \dim M$, $(\psi\phi')_*: N'_{p'} \to R^n_0$ must be 1-1; hence $\psi\phi'$ is an immersion map if $U'$ is sufficiently small. But

$$(\psi\phi')\bigl(\phi'^{-1}(\phi(U) \cap \phi'(U'))\bigr) = 0,$$

which shows that $\phi(U) \cap \phi'(U') = \phi'(p')$ only, as required. Consider the case that $\dim N + \dim N' > \dim M$. Then $(\psi\phi')_*$ must map $N'_{p'}$ onto $R^n_0$; that is, $\psi\phi'$ is a maximal rank mapping if everything is sufficiently small. Then, again,

$$\phi'^{-1}(\phi(U) \cap \phi'(U')) = (\psi\phi')^{-1}(0).$$

But $(\psi\phi')^{-1}(0)$ can be represented by an immersion map of the required dimension by Theorem 5.3, which finishes the proof.
Finally we remark that all these different versions of the implicit function theorem may be intuitively summarized by saying that arbitrary $C^\infty$ mappings satisfying maximal rank conditions behave locally just as linear mappings of vector spaces. Thus there is a good technical reason why a thorough knowledge of linear algebra is one of the most important prerequisites for the study of differential geometry!
Exercises
1. Suppose $\phi: M \to N$ is a maximal rank mapping of manifolds (that is, $\phi_*(M_p) = N_{\phi(p)}$ for all $p \in M$). Prove that for $p \in \phi(M)$, the "fiber" $\phi^{-1}(p)$ is an embedded submanifold of $M$.
2. Prove that the intersection of two embedded submanifolds which always meet in general position is a submanifold. (Determine whether they are embedded or immersed.) What is the global structure of the general-position intersection of two immersed submanifolds ?
6
The Jacobi Bracket and the Lie Theory of Ordinary Differential Equations
Jacobi Bracket
So far, the basic objects, vector fields and differential forms, have been considered in order to set up a formalism equivalent to ordinary differential calculus which, in addition, is independent of the choice of local coordinates. However, these ideas were first developed by S. Lie, not only for formal reasons but also for expressing in geometric form the many subtle and interesting relations between the theory of ordinary differential equations and the theory of Lie groups. Although the modern theory of Lie groups emerged from this work, it is an area that seems to be neglected in modern research. Here, we shall indicate only a few of the simpler ideas, partly for their own sake and partly to motivate the introduction of the Jacobi bracket operation on vector fields, which will play a major role in our work.
We shall work with domains $D$ in $R^n$ with coordinates $x_i$. The extension to general manifolds will usually be evident. Recall that a vector field on $D$ is defined as a derivation, say $X$, of $F(D)$. It was proved in Chapter 4 that $X$ takes the form of a first-order partial differential operator:

$$X = A_i \frac{\partial}{\partial x_i}.$$

If $X$ and $Y$ are vector fields, $XY$ is not, since as an operator it is a second-order differential operator. Explicitly, for $f, g \in F(D)$,

$$XY(fg) = X\bigl(Y(f)g + fY(g)\bigr) = XY(f)\,g + Y(f)X(g) + X(f)Y(g) + f\,XY(g).$$

However, note that the "bad" middle terms cancel out if $YX(fg)$ is subtracted from $XY(fg)$; that is, $f \to XY(f) - YX(f)$ is a derivation of $F(D)$, and hence defines another vector field that we call the Jacobi bracket of $X$ and $Y$, and denote by $[X, Y]$. The following formal laws follow from direct (although occasionally tedious) computations, which we leave to the reader:
(6.1e)
($X, X_1, \ldots, Y, Z, \ldots$ denote vector fields, $c_1, c_2, \ldots$ denote real constants, $f \in F(D)$.) Properties (6.1a), (6.1b), and (6.1c) express the fact that $V(D)$ is, as a real vector space, also a Lie algebra. (A Lie algebra is a vector space with a "multiplication" $(X, Y) \to [X, Y]$ defined for any two elements $X$ and $Y$ satisfying (6.1a), (6.1b), and (6.1c).)
A curve $\sigma(t)$, $a \le t \le b$, is said to be an integral curve of a vector field $X$ if $\sigma'(t) = X(\sigma(t))$ for $a \le t \le b$. Suppose the expression in coordinates for $X$ is

$$X = A_i \frac{\partial}{\partial x_i}.$$

Now

$$\frac{d}{dt} x_i(\sigma(t)) = \sigma'(t)(x_i) = X(\sigma(t))(x_i) = A_j(\sigma(t)) \frac{\partial}{\partial x_j}(x_i) = A_i(\sigma(t)).$$

Thus we see that $\sigma$ is an integral curve of $X$ if and only if its coordinate functions $x_i(t) = x_i(\sigma(t))$ satisfy the system of first-order differential equations:

$$\frac{d}{dt} x_i(t) = A_i(x(t)). \qquad (6.2)$$
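System (6.2) can also be integrated numerically when no closed form is at hand. The sketch below is an illustration only: it uses the classical fourth-order Runge-Kutta scheme (a standard numerical method, not one discussed in the text), and the function name `rk4_integral_curve` and the sample field $A(x) = (x_2, -x_1)$ are choices made here. For this field the integral curve through $(x_1, x_2)$ is known in closed form, $(x_1 \cos t + x_2 \sin t,\ -x_1 \sin t + x_2 \cos t)$, which gives something to compare against.

```python
import math

def rk4_integral_curve(A, x0, t_end, steps=1000):
    """Approximate sigma(t_end) for the integral curve of dx/dt = A(x) with sigma(0) = x0."""
    h = t_end / steps
    x = list(x0)
    for _ in range(steps):
        k1 = A(x)
        k2 = A([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = A([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = A([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h * (a + 2 * b + 2 * c + d) / 6
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x

def A(x):
    # Coefficients A_i of the sample field x_2 d/dx_1 - x_1 d/dx_2.
    return [x[1], -x[0]]

x0 = [1.0, 0.0]
t = 1.0
numeric = rk4_integral_curve(A, x0, t)
exact = [x0[0] * math.cos(t) + x0[1] * math.sin(t),
         -x0[0] * math.sin(t) + x0[1] * math.cos(t)]
```

The two results should agree to many decimal places; the step count is a tunable assumption, and a stiff or rapidly growing field would need a smaller step.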
Invoking the existence theorem for systems of ordinary differential equations, we have:
THEOREM 6.1 Suppose $X$ is a vector field (of class $C^\infty$) in the domain $D$.† Then
(a) Two integral curves $\sigma: [a, b] \to D$ and $\sigma_1: [a_1, b_1] \to D$ of $X$ that coincide at just one common point $t_0$ of their domains of existence must coincide in their entire common domain.
(b) Given $x^0 \in D$, there is a number $a > 0$ and an integral curve $t \to \sigma(t, x^0)$, $0 \le t \le a$, of $X$ with $\sigma(0, x^0) = x^0$. This function depends in a $C^\infty$ way on $x^0$. In addition, $a$ depends on $x^0$, but can be chosen independent of $x^0$ over any compact subset of $D$.
(c) If $\sigma(t)$, $a \le t \le b$, is an integral curve of $X$, so is the curve $\sigma_1(t) = \sigma(t + c)$, $a - c \le t \le b - c$, obtained by translating the "time" parametrization of $\sigma$.
† All these statements will hold for manifolds also.
All these statements are the geometric analogs of fundamental analytical properties of systems of differential equations of the type (6.2). For example, condition (a) of the theorem is just the uniqueness of solutions of (6.2); (b) follows from the usual Picard iteration method of solving (6.2); (c) follows from the uniqueness and the fact that (6.2) does not contain $t$ explicitly on the right-hand side (that is, it is a so-called autonomous system).
These properties of integral curves enable us to try to "continue" solutions of (6.2) so that we can obtain integral curves defined over maximal domains of $t$. For example, start off with an integral curve $\sigma(t)$, $0 \le t \le a_1$, with $\sigma(0) = x^0$. Find an integral curve $\sigma_1(t)$, $a_1 \le t \le a_2$, with $\sigma_1(a_1) = \sigma(a_1)$. By uniqueness, the two curves can be fitted together (without corners) to obtain an integral curve over $0 \le t \le a_2$. Repeat the process beyond $a_2$ and also in the negative direction. Although we do not want to go into the details here, we can say that the process will succeed in proving the existence of an integral curve defined over $(-\infty, \infty)$ unless "barriers" are met in the form of two numbers $\alpha$, $\beta$, $\alpha < 0$, $\beta > 0$, such that:

There is an integral curve $\sigma(t)$, $\alpha < t < \beta$, with $\sigma(0) = x^0$, but there is no such integral curve on any interval properly containing $(\alpha, \beta)$.

One reason for the existence of these "barriers" may be that the integral curve wants to escape from $D$ into the remainder of $R^n$. However, barriers may occur even if $D = R^n$; for example, suppose that $n = 1$, $D = R^1$, $X = x^2(d/dx)$, and (6.2) becomes

$$\frac{dx}{dt} = x^2,$$

whose solution with initial value $x(0) = x^0$ is $x(t) = x^0/(1 - x^0 t)$. Intuitively, the curve wants to escape to $\infty$ at $t = 1/x^0$. One might think that one way of remedying this would be to "add" a point at $\infty$ to $R^1$. This can be done successfully, but it leads to a differentiable manifold. (In this case, the manifold is the circle.)
With these warnings in mind, let us, for the sake of understanding the geometric meaning of the Jacobi bracket, suppose that $X$ and $Y$ are vector fields defined in $D$, all of whose integral curves can be extended over $(-\infty, \infty)$. Suppose $\sigma(t)$, $0 \le t \le a$, is an integral curve of $X$. For each $t$ construct the curves $s \to \sigma(t, s)$, $0 \le s \le b$, such that:
(a) $\sigma(t, 0) = \sigma(t)$ for $0 \le t \le a$.
(b) For each $t$, the curve $s \to \sigma(t, s)$ is an integral curve of $Y$.
We ask: If we hold $s$ fixed, and consider the curve $t \to \sigma(t, s)$, when will this curve become an integral curve of $X$ for each such $s$? In terms of the coordinates $(x_i)$ for $R^n$, suppose that

$$X = A_i \frac{\partial}{\partial x_i}, \qquad Y = B_i \frac{\partial}{\partial x_i}, \qquad x_i(t, s) = x_i(\sigma(t, s)), \qquad x_i(t) = x_i(\sigma(t)).$$

Then our constructions translate into the conditions

$$x_i(t, 0) = x_i(t), \qquad \frac{d}{dt} x_i(t) = A_i(x(t)), \qquad \frac{\partial x_i}{\partial s}(t, s) = B_i(x(t, s)).$$

Put

$$C_i(t, s) = \frac{\partial}{\partial t} x_i(t, s) - A_i(x(t, s)).$$
Now
THEOREM 6.2 If $[X, Y] = 0$, then for each $s$ the curve $t \to \sigma(t, s)$ is an integral curve of $X$. Intuitively, knowing one integral curve of $X$ and all integral curves of $Y$ starting on this curve, a whole family of integral curves of $X$ can be obtained.
Proof. $[X, Y] = 0$ if and only if

$$A_j \frac{\partial B_i}{\partial x_j} - B_j \frac{\partial A_i}{\partial x_j}$$

is identically zero. Then $C_i(t, s)$ satisfies the equations

$$\frac{\partial C_i}{\partial s}(t, s) = \frac{\partial B_i}{\partial x_j}(x(t, s))\, C_j(t, s),$$
which comprise a system of linear homogeneous first-order ordinary differential equations for $C_i$ in $s$ (with $t$ held fixed). $C_i(t, 0) = 0$, since $\sigma(t)$ is an integral curve of $X$. Then we take it as known from the theory of ordinary differential equations (uniqueness!) that $C_i(t, s)$ is identically zero, which implies that for each $s$, $t \to \sigma(t, s)$ is an integral curve of $X$. Q.E.D.
We now turn to the interpretation of Theorem 6.2 and the integral curves of a vector field in terms of the theory of groups (assuming always that those integral curves of vector fields that we shall be considering can be extended indefinitely within $D$). Suppose $t \to \sigma(t; x^0)$ is the integral curve for $X$, $-\infty < t < \infty$, such that $\sigma(0; x^0) = x^0$. Since $t \to \sigma(t + a; x^0)$ is also an integral curve, which takes on the value $\sigma(a; x^0)$ at $t = 0$, we must (by the uniqueness of integral curves, that is, by part (a) of Theorem 6.1) have

$$\sigma(t + a; x^0) = \sigma(t; \sigma(a; x^0)).$$

For each $t \in (-\infty, \infty)$ define a transformation $T_t$ of $D$ into itself as follows:

$$T_t(x^0) = \sigma(t; x^0) \qquad \text{for each } x^0 \in D.$$

Now $x^0 \to T_t(x^0)$ is a transformation of $D$ into itself (of differentiability class $C^\infty$, by the fundamental existence theorem for ordinary differential equations). Also,

$$T_{t+a}(x^0) = \sigma(t + a; x^0) = \sigma(t; \sigma(a; x^0)) = T_t(T_a(x^0)).$$

Since this holds for each $x^0 \in D$, we have $T_{t+a} = T_t T_a$. In particular, $T_0$ is the identity transformation and $T_t T_{-t} = T_{-t} T_t = T_0$; that is, $T_{-t}$ is the inverse of $T_t$ and $T_t$ is an invertible transformation of $D$ into itself (a "diffeomorphism"). The property $T_{t+a} = T_t T_a$ tells us that the family $\{T_t : -\infty < t < \infty\}$ of transformations forms a one-parameter group of transformations of $D$ into itself. This is the one-parameter group generated by $X$. $X$ can be reconstructed from $T_t$, since the curves $t \to T_t(x^0)$ are integral curves of $X$ for each $x^0 \in D$.
Suppose that $Y$ is another such vector field, with $[X, Y] = 0$, and that $Y$ generates the one-parameter group $S_s$, $-\infty < s < \infty$. Transcribing Theorem 6.2 into group language, we have several equivalent statements:

(a) For each integral curve $\sigma(t)$ of $X$, the transform by each $S_s$, $t \to S_s(\sigma(t))$ (which, in the notation of Theorem 6.2, $= \sigma(t, s)$) is also an integral curve of $X$. Thus the one-parameter group $S_s$ permutes the integral curves of $X$, or leaves invariant the differential equations giving the integral curves of $X$. This interpretation is basic to the Lie theory of ordinary differential equations.
(b) For each $x^0 \in D$, each $s, t \in (-\infty, \infty)$, $S_s(T_t(x^0))$ must equal $T_t(S_s(x^0))$, since the curve $t \to S_s(T_t(x^0))$ is an integral curve of $X$ starting at
$S_s(x^0)$ when $t = 0$. Since this is true for each $x^0 \in D$, we have $S_s T_t = T_t S_s$; that is, the one-parameter groups generated by $X$ and $Y$ commute.
Suppose now that $\phi$ is a diffeomorphism of a domain $D$ onto the domain $D'$. Thus $\phi^*$ sets up an isomorphism of $F(D')$ and $F(D)$. Hence vector fields on $D$ and $D'$ correspond. Given $X \in V(D)$, define $\phi_*(X) \in V(D')$ as follows:

$$\phi_*(X)(f') = \phi^{-1*}\bigl(X(\phi^*(f'))\bigr) \qquad \text{for } f' \in F(D'). \qquad (6.3)$$

This is equivalent to the property

$$\phi_*(X)(x') = \phi_*\bigl(X(\phi^{-1}(x'))\bigr) \qquad \text{for } x' \in D'. \qquad (6.4)$$

Proof. Using (6.3),

$$\phi_*(X)(x')(f') = \phi_*(X)(f')(x') = \phi^{-1*}\bigl(X(\phi^*(f'))\bigr)(x') = X(\phi^*(f'))(\phi^{-1}(x')).$$

However, this is just the right-hand side of (6.4) applied to $f'$. This mapping $X \to \phi_*(X)$ has two main properties:

$$\phi_*([X, Y]) = [\phi_*(X), \phi_*(Y)] \qquad \text{for } X, Y \in V(D); \qquad (6.5)$$

that is, $\phi_*$ is a Lie algebra isomorphism.
(c) If $t \to \sigma(t)$ is an integral curve of $X$, then the image curve $t \to \phi(\sigma(t))$ is an integral curve of $\phi_*(X)$. Conversely, if the image curves of all integral curves of $X$ under $\phi$ are integral curves of $Y \in V(D')$, then

$$Y = \phi_*(X). \qquad (6.6)$$
The proofs of these statements are straightforward, and are therefore left to the reader. To provide some practice with this formalism, we discuss the local canonical form theorem for nonsingular vector fields and give some indications of its importance in the Lie theory of ordinary differential equations.
THEOREM 6.3 Suppose that $X$ is a vector field in a domain $D$ of $R^n$, of coordinates $x = (x_i)$, and $x^0$ is a point of $D$ with $X(x^0) \ne 0$ (that is, if $X = A_i(\partial/\partial x_i)$, then not all $A_i(x^0)$ are zero). Then, if $D$ is small enough, there is an invertible transformation $\phi: D \to D'$, where $D'$ is a domain in the space of variables $y_i$, such that

$$\phi_*(X) = \frac{\partial}{\partial y_1}.$$
Proof. We shall give a geometric proof, but shall leave verification of certain analytical details to the reader.
At most reordering the coordinates, we can suppose that $A_1(x^0) \ne 0$. Suppose that $x^0 = 0$. Construct a mapping of $(y_1, \ldots, y_n)$-space into $D$ by mapping $(y_1, \ldots, y_n)$ into $(x_1(y_1), \ldots, x_n(y_1))$, where $t \to (x_i(t))$ is the integral curve of $X$ which is equal at $t = 0$ to $(0, y_2, \ldots, y_n)$. This mapping is invertible (this is left to the reader to verify); hence the $y_1, \ldots, y_n$ can be introduced as coordinates in a neighborhood of $x^0$. When one follows an integral curve of $X$ in these new coordinates, $y_1$ increases linearly while the $y_2, \ldots, y_n$ remain the same. This, however, is just the condition that $X$ in these new coordinates is $\partial/\partial y_1$. Q.E.D.
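A small worked instance may make the construction in this proof concrete. The vector field below is chosen here purely for illustration and does not come from the text: take $n = 2$, $x^0 = 0$, and $X = \partial/\partial x_1 + x_1\,\partial/\partial x_2$, so that $A_1(x^0) = 1 \ne 0$. Following the proof, one integrates from the point $(0, y_2)$ for a time $y_1$ and then inverts the resulting map:

```latex
\begin{align*}
\frac{dx_1}{dt} &= 1, \qquad \frac{dx_2}{dt} = x_1, \qquad (x_1(0), x_2(0)) = (0, y_2)
  && \Longrightarrow \quad x_1 = y_1, \quad x_2 = y_2 + \tfrac{1}{2} y_1^2, \\
y_1 &= x_1, \qquad y_2 = x_2 - \tfrac{1}{2} x_1^2
  && \Longrightarrow \quad X(y_1) = 1, \quad X(y_2) = x_1 - x_1 = 0 .
\end{align*}
```

Since $X(y_1) = 1$ and $X(y_2) = 0$, the field is $\partial/\partial y_1$ in the new coordinates. Note that finding $y_2$ required solving the differential equations of the integral curves first.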
Theorem 6.3 is the simplest of the "canonical form" theorems that play a central role in the modern theory of ordinary differential equations. Notice that actually putting $X$ into this canonical form is more or less equivalent to "solving" the differential equations defining the integral curves.
Applications to the Lie Theory of Ordinary Differential Equations
The "Lie theory" is merely an interplay between the geometric interpretation of a system,

$$\frac{dx_i}{dt} = A_i(x_1(t), \ldots, x_n(t)), \qquad (6.7)$$

of ordinary differential equations as the integral curves of the vector field $X = A_i(\partial/\partial x_i)$ and the interpretation of $X$ as a generator of a one-parameter group of transformations. Actually, we have been using "Lie theory" all along. However, one may consider the Lie theory in the more restricted sense as the discussion of those parts of the general theory that have relevance to the sort of problem one faces in "explicitly" solving differential equations. As a first remark: Suppose $X$ is a vector field, $f$ is a function with $X(f) = 0$. Suppose $\sigma(t)$ is an integral curve of $X$. Then

$$\frac{d}{dt} f(\sigma(t)) = \sigma'(t)(f) = X(f)(\sigma(t)) = 0;$$

that is, $f$ is constant along all the integral curves of $X$. Classically, the function $f$ is called an integral of $X$, or of the system (6.7) defining the integral curves. Conversely, functions having this property satisfy $X(f) = 0$; that is, $f$ satisfies the first-order partial differential equation

$$A_i \frac{\partial f}{\partial x_i} = 0.$$
One may interpret the problem of "explicitly" solving the system (6.7) as that of finding $(n - 1)$ functionally independent integral functions, say, $f_2, \ldots, f_n$, for then the submanifolds $f_2 = \text{constant}, \ldots, f_n = \text{constant}$, in $x$-space, are one-dimensional and in fact are the sets of points described by each integral curve of (6.7). Thus, "explicitly solving" involves some formal process that converts a set of integral functions into a possibly larger set of interest. To see an example of such a formal process, suppose we are given another vector field $Y$ on $D$ such that $[Y, X] = gX$ for some $g \in F(D)$. If $f$ is an integral of $X$, so is $Y(f)$.

Proof. $X(Y(f)) = Y(X(f)) + [X, Y](f) = 0 - gX(f) = 0$.
Thus, Lie derivation by $Y$ is the formal process generating (possibly) "new" integrals. Let us now see what the condition $[Y, X] = gX$ means geometrically. First we ask whether there is a function $h$ in $D$ such that the vector field $X' = hX$ satisfies $[Y, X'] = 0$. Since $[Y, hX] = (Y(h) + hg)X$, $h$ must satisfy:

$$Y(h) + hg = 0 \qquad \text{or} \qquad Y(\log h) = -g.$$

Thus $\log h$ (and hence $h$) can be found locally if, for example, $Y(x) \ne 0$ for $x \in D$, for then we can suppose, in view of Theorem 6.3, that $Y = \partial/\partial x_1$; hence $\log h$ can be found by a simple quadrature:

$$\frac{\partial}{\partial x_1} \log h = -g(x_1, \ldots, x_n) \qquad \text{or} \qquad \log h = -\int g(x_1, \ldots, x_n)\, dx_1.$$
Now notice that the integral curves of $X' = hX$ and $X$ differ only by a change in parametrization, provided $h(x) \ne 0$ for $x \in D$. (We may say that two curves $\sigma(t)$, $a \le t \le b$, and $\sigma_1(t)$, $a_1 \le t \le b_1$, differ only by change in parametrization if there is a map $\alpha: [a, b] \to [a_1, b_1]$, that is, between the intervals of parametrization, such that $d\alpha/dt$ is always $\ne 0$ and such that $\sigma(t) = \sigma_1(\alpha(t))$ for $a \le t \le b$. If $d\alpha/dt$ is always $> 0$ (resp.
(using the ordinary chain rule for differentiating composite functions of one variable)
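hence

$$\sigma_1'(t) = \sigma'(\alpha(t)) \frac{d\alpha}{dt}.$$

Thus, $\sigma_1(t)$ will be an integral curve of $hX = X'$, provided

$$\sigma'(\alpha(t)) \frac{d\alpha}{dt} = \sigma_1'(t) = X'(\sigma_1(t)) = h(\sigma_1(t)) X(\sigma_1(t)).$$

But $\sigma'(\alpha(t)) = X(\sigma(\alpha(t)))$. Thus $\alpha(t)$ must satisfy

$$\frac{d\alpha}{dt} = h(\sigma(\alpha(t))).$$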
This is simply an ordinary differential equation for $\alpha(t)$ (of the type solvable by separation of variables); hence there is no trouble showing that for each $x \in D$, there are integral curves of $X$ and $X'$ passing through $x$, differing by construction only by a change in parametrization; one may paraphrase this by saying that "integral curves of $X$ and $X'$ are the same up to a change of parametrization."
Suppose we return to the case $[Y, X] = gX$. At least locally, $X' = hX$, with $h(x) \ne 0$ and $[Y, X'] = 0$. As we have seen, the one-parameter group generated by $Y$ maps integral curves of $X'$ into integral curves. Combining this with the remark about the integral curves of $X$ and $X'$, we see that:
The one-parameter group generated by $Y$ permutes the integral curves of $X$ with a change of parametrization if $[Y, X] = gX$ for some function $g \in F(D)$.
Suppose now that $[Y, X] = 0$. The coordinate system may be chosen so that $Y = \partial/\partial x_1$. Further, looking through the details of Theorem 6.3, we see that this coordinate system may be easily found when the explicit equations of the one-parameter group determined by $Y$ are known. If we write $X = B_i(\partial/\partial x_i)$, we must have

$$0 = [X, Y] = -\frac{\partial B_i}{\partial x_1} \frac{\partial}{\partial x_i};$$

hence the $B_i$ are functions $B_i(x_2, \ldots, x_n)$ not depending on $x_1$. The equations
of the integral curves of $X$ are then

$$\frac{dx_i}{dt} = B_i(x_2(t), \ldots, x_n(t)), \qquad i = 2, \ldots, n, \qquad \text{(i)}$$

$$\frac{dx_1}{dt} = B_1(x_2(t), \ldots, x_n(t)), \qquad \text{(ii)}$$

that is,

$$x_1(t) = \int B_1(x_2(t), \ldots, x_n(t))\, dt.$$

Thus the system (i) of order $(n - 1)$ can be solved first, and then $x_1(t)$ can be found by "quadrature," that is, by an integration. The order of the differential equations defining the integral curves of $X$ has been essentially reduced by 1. If $n = 2$, this is ideal, since the system (i) can also be solved by "quadrature." These observations constitute Lie's main contribution to the classical problem of solving differential equations in the plane. If

$$\frac{dy}{dx} = \frac{P(x, y)}{Q(x, y)}$$

is such a differential equation, the solution curves, when written in parametric form, are the integral curves of

$$X = Q \frac{\partial}{\partial x} + P \frac{\partial}{\partial y}.$$

Lie observed that all the classical tricks for "solving" this equation by quadrature were associated, in the way we described above, with a one-parameter group of transformations in the plane.
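Whether a given field $Y$ satisfies the symmetry condition $[Y, X] = 0$ can be checked numerically from the coordinate formula $[X, Y]_i = A_j\,\partial B_i/\partial x_j - B_j\,\partial A_i/\partial x_j$. The sketch below is only an illustration: the function `bracket` and the three sample fields are hypothetical names chosen here, and derivatives are approximated by central differences, so the results are accurate only up to a small discretization error.

```python
def bracket(X, Y, p, h=1e-5):
    """Components of [X, Y] at the point p, by central finite differences.

    X and Y map a point (list of floats) to the list of coefficients A_i, B_i
    of d/dx_i; the result is A_j dB_i/dx_j - B_j dA_i/dx_j (summed over j).
    """
    n = len(p)
    A, B = X(p), Y(p)
    result = [0.0] * n
    for j in range(n):
        plus = list(p)
        plus[j] += h
        minus = list(p)
        minus[j] -= h
        dA = [(a - b) / (2 * h) for a, b in zip(X(plus), X(minus))]
        dB = [(a - b) / (2 * h) for a, b in zip(Y(plus), Y(minus))]
        for i in range(n):
            result[i] += A[j] * dB[i] - B[j] * dA[i]
    return result

def rotation(p):
    # Y = y d/dx - x d/dy, the generator of rotations of the plane.
    x, y = p
    return [y, -x]

def symmetric_field(p):
    # A rotationally symmetric field: r^2 (x d/dx + y d/dy) - y d/dx + x d/dy.
    x, y = p
    r2 = x * x + y * y
    return [x * r2 - y, y * r2 + x]

def constant_field(p):
    # d/dx, which is not invariant under rotations.
    return [1.0, 0.0]

point = [0.7, -0.3]
commuting = bracket(rotation, symmetric_field, point)
non_commuting = bracket(constant_field, rotation, point)
```

Here `commuting` should be numerically zero, reflecting that rotations permute the integral curves of the symmetric field, while `non_commuting` should be close to $(0, -1)$, that is, $[\partial/\partial x, Y] = -\partial/\partial y$.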
Exercises
1. Suppose $X = B_i(\partial/\partial x_i)$, where all the $B_i(x)$ are homogeneous of degree 1; that is, $B_i(\lambda x) = \lambda B_i(x)$ for each $\lambda > 0$. Show that for each $s$, the transformation $x \to e^s x$ permutes the integral curves of $X$. Deduce that $[X, Y] = 0$, where $Y = x_i(\partial/\partial x_i)$. Now verify this directly.
2. In the $(x, y)$ plane, consider the vector field

$$Y = y \frac{\partial}{\partial x} - x \frac{\partial}{\partial y}.$$

Show that the one-parameter group it generates is the group of rotations:

$$(x, y) \to (x \cos t + y \sin t,\ -x \sin t + y \cos t).$$

Let $X$ be another vector field

$$A \frac{\partial}{\partial x} + B \frac{\partial}{\partial y}$$

in the plane. Find the condition: The one-parameter group generated by $Y$ permutes the integral curves of $X$ up to a change in parameter.
3. Find the coordinate system in which the infinitesimal generator of the one-parameter group of rotations in the plane has its canonical form. If

$$X = A \frac{\partial}{\partial x} + B \frac{\partial}{\partial y}$$

is such that $[Y, X] = fX$ (that is, if the problem is rotationally symmetric), find the "explicit" formulas for the integral curves of $X$.
4. Consider the space of one variable $x$, and on this space the three vector fields

Compute the Jacobi brackets. Show that the one-parameter group generated by any linear combination of these three vector fields with constant coefficients is contained in the group $x \to (ax + b)/(cx + d)$ of linear fractional transformations.
5. Suppose $X = A_i(\partial/\partial x_i)$ and $Y = B_i(\partial/\partial x_i)$ are vector fields such that $[X, Y] = Y$ and such that the $n$-vectors $(A_i(x^0))$ and $(B_i(x^0))$ are linearly independent. Show that the coordinate system can be chosen so that $x^0 = 0$, and about this point,

where $\alpha$ is some function of the indicated variables. Suppose that $Z = C_i(\partial/\partial x_i)$ is such that $0 = [X, Z] = [Y, Z]$. Show that the integral curves of $Z$ in this coordinate system can be found by solving a system of order $n - 2$, followed by quadratures.
6. Suppose $X = A_i(\partial/\partial x_i)$, $Y = B_i(\partial/\partial x_i)$, $Z = C_i(\partial/\partial x_i)$ are vector fields such that $[X, Y] = Y$, $[X, Z] = -Z$, $[Y, Z] = 2X$ and such that the vectors $(A_i(x^0))$, $(B_i(x^0))$, $(C_i(x^0))$ are linearly independent. Show that the coordinate system can be chosen so that $x^0 = 0$, and about this point,

where $\alpha$, $\beta$, $\gamma$ are functions of the indicated variables. Show that the problem of finding the integral curves of any vector field $W$ that satisfies $0 = [X, W] = [Y, W] = [Z, W]$ in these coordinates can be reduced to solving a system of differential equations of order $n - 3$, quadratures, and a Riccati equation. (A Riccati equation is one of the form $dx/dt = a(t) + b(t)x + c(t)x^2$.)
7. Prove (6.5) and (6.6).
7
Lie Derivation and Exterior Derivative; Integration on Manifolds
Let us return to the study of differential forms on a manifold $M$. We have described the Jacobi bracket operation $(X, Y) \to [X, Y]$ on vector fields, the Lie derivative $f \to X(f)$ of a function by a vector field, and exterior derivative $f \to df$ of a function. We shall now extend the latter two operations from functions $f$ (that is, differential forms of degree zero) to differential forms of any degree. Note first that the definition of $[X, Y]$ can be rewritten as

$$X(Y(f)) = Y(X(f)) + [X, Y](f).$$

The key idea is that $X$ acting on $Y(f)$ acts first on $Y$, leading to $[X, Y]$, and then on $f$, leading to $X(f)$. Suppose now that $\omega$ is an $r$th degree differential form. For $X_1, \ldots, X_r \in V(M)$, $\omega(X_1, \ldots, X_r)$ is then a function. Let $X$ be another vector field. Let us apply $X$ to this function and write

$$X(\omega(X_1, \ldots, X_r)) = X(\omega)(X_1, \ldots, X_r) + \omega([X, X_1], X_2, \ldots, X_r) + \cdots + \omega(X_1, \ldots, [X, X_r]). \qquad (7.1)$$

Now, we use (7.1) as the definition of the $r$-form $X(\omega)$, and call it the Lie derivative of $\omega$ by $X$.
We must verify that $X(\omega)$ is well defined by (7.1). That it depends skew-symmetrically on $X_1, \ldots, X_r$ should be obvious. The only nontrivial point is that it is $F(M)$-multilinear, that is,

$$X(\omega)(fX_1, X_2, \ldots, X_r) = fX(\omega)(X_1, \ldots, X_r) \qquad \text{for } f \in F(M). \qquad (7.2)$$

(Here we use the algebraic "module" definition of differential forms. It is much more convenient for the purpose of doing things in a coordinate-free way than is the vector bundle definition.) Now

$$\begin{aligned}
X(\omega)(fX_1, \ldots, X_r) &= X(\omega(fX_1, \ldots, X_r)) - \omega([X, fX_1], X_2, \ldots, X_r) - \omega(fX_1, [X, X_2], \ldots, X_r) - \cdots \\
&= X(f)\,\omega(X_1, \ldots, X_r) + fX(\omega(X_1, \ldots, X_r)) - \omega(X(f)X_1, \ldots, X_r) - \omega(f[X, X_1], X_2, \ldots, X_r) - \cdots.
\end{aligned}$$

Notice that the first and third terms now cancel, as required in order to prove (7.2).
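For $\omega \in F^r(M)$, $X \in V(M)$, define the contraction of $\omega$ by $X$, $X \lrcorner\, \omega$,† as the $(r - 1)$-form given by

$$(X \lrcorner\, \omega)(X_1, \ldots, X_{r-1}) = \omega(X, X_1, \ldots, X_{r-1}). \qquad (7.3)$$

It is readily verified that (7.1) gives the rule

$$X(Y \lrcorner\, \omega) = [X, Y] \lrcorner\, \omega + Y \lrcorner\, X(\omega) \qquad \text{for } X, Y \in V(M). \qquad (7.4)$$

Using this, we can prove that

$$X(\omega_1 \wedge \omega_2) = X(\omega_1) \wedge \omega_2 + \omega_1 \wedge X(\omega_2) \qquad \text{for } X \in V(M). \qquad (7.5)$$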
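Proof. Suppose degree $\omega_1 = r$, degree $\omega_2 = s$. Then (7.5) is true for $r = s = 0$. Proceed to prove (7.5) by induction on $r + s$. Let $Y$ be another vector field:

$$\begin{aligned}
Y \lrcorner\, X(\omega_1 \wedge \omega_2) &= \text{(using (7.4))}\ X(Y \lrcorner\, (\omega_1 \wedge \omega_2)) - [X, Y] \lrcorner\, (\omega_1 \wedge \omega_2) \\
&= X\bigl((Y \lrcorner\, \omega_1) \wedge \omega_2 + (-1)^r \omega_1 \wedge (Y \lrcorner\, \omega_2)\bigr) - ([X, Y] \lrcorner\, \omega_1) \wedge \omega_2 - (-1)^r \omega_1 \wedge ([X, Y] \lrcorner\, \omega_2) \\
&= \text{(using (7.4) again and the induction hypothesis)} \\
&\quad ([X, Y] \lrcorner\, \omega_1) \wedge \omega_2 + (Y \lrcorner\, X(\omega_1)) \wedge \omega_2 + (Y \lrcorner\, \omega_1) \wedge X(\omega_2) \\
&\quad + (-1)^r \bigl(X(\omega_1) \wedge (Y \lrcorner\, \omega_2) + \omega_1 \wedge ([X, Y] \lrcorner\, \omega_2) + \omega_1 \wedge (Y \lrcorner\, X(\omega_2))\bigr) \\
&\quad - ([X, Y] \lrcorner\, \omega_1) \wedge \omega_2 - (-1)^r \omega_1 \wedge ([X, Y] \lrcorner\, \omega_2).
\end{aligned}$$

When the cancellations are made, we get $Y \lrcorner$ applied to the right-hand side of (7.5). Since $Y$ is an arbitrary vector field, (7.5) holds for forms of this degree also.

If $f \in F(M)$, then

$$X(df) = dX(f). \qquad (7.6)$$

Proof. For $Y \in V(M)$,

$$Y \lrcorner\, X(df) = X(Y \lrcorner\, df) - [X, Y] \lrcorner\, df = X(Y(f)) - [X, Y](f) = Y(X(f)).$$

But,

$$Y \lrcorner\, dX(f) = YX(f).$$

Q.E.D.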
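As a check on these rules, here is a small worked computation; the form and the vector field are chosen here for illustration and are not from the text. On $R^2$ take $\omega = x_1\,dx_2$ and $X = x_1\,\partial/\partial x_2$. The Lie derivative can be found directly from the definition (7.1), using $[X, \partial/\partial x_1] = -\partial/\partial x_2$ and $[X, \partial/\partial x_2] = 0$, or from the product rule (7.5) together with (7.6):

```latex
\begin{align*}
X(\omega)\Bigl(\tfrac{\partial}{\partial x_1}\Bigr)
  &= X\Bigl(\omega\bigl(\tfrac{\partial}{\partial x_1}\bigr)\Bigr)
     - \omega\Bigl(\bigl[X, \tfrac{\partial}{\partial x_1}\bigr]\Bigr)
   = X(0) + \omega\Bigl(\tfrac{\partial}{\partial x_2}\Bigr) = x_1, \\
X(\omega)\Bigl(\tfrac{\partial}{\partial x_2}\Bigr)
  &= X\Bigl(\omega\bigl(\tfrac{\partial}{\partial x_2}\bigr)\Bigr)
     - \omega\Bigl(\bigl[X, \tfrac{\partial}{\partial x_2}\bigr]\Bigr)
   = X(x_1) - 0 = 0, \\
X(x_1\, dx_2) &= X(x_1)\, dx_2 + x_1\, d\bigl(X(x_2)\bigr) = 0 + x_1\, dx_1 .
\end{align*}
```

Both routes give $X(\omega) = x_1\,dx_1$, as they must.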
The geometric meaning of the Lie derivative by X is not so evident in this formal treatment: It will become clearer when we deal with Lie groups.
† In some differential geometry books, $X \lrcorner\, \omega$ is denoted by $i(X)(\omega)$ and $X(\omega)$ is denoted by $\theta(X)(\omega)$ or $L_X(\omega)$.
Roughly, $X(\omega)$ is a measure of the extent to which $\omega$ is invariant under the one-parameter group generated by $X$. Suppose, for example, that we examine the geometric consequences of the condition $X(\omega) = 0$. If $X$ is identically zero, it means nothing, since $X(\omega)$ is always zero. Pick a point $p$ at which $X(p) \ne 0$, and introduce a coordinate system $x_i$ about $p$ in which $X = \partial/\partial x_1$. Now $\omega$ admits an expansion in this neighborhood of the form

$$\omega = \sum a_{j_1 \cdots j_r}\, dx_{j_1} \wedge \cdots \wedge dx_{j_r}.$$

Using the rules developed above,

$$0 = X(\omega) = \sum X(a_{j_1 \cdots j_r})\, dx_{j_1} \wedge \cdots \wedge dx_{j_r} + a_{j_1 \cdots j_r}\, d(X(x_{j_1})) \wedge \cdots \wedge dx_{j_r} + \cdots = \sum X(a_{j_1 \cdots j_r})\, dx_{j_1} \wedge \cdots \wedge dx_{j_r},$$

since $d(X(x_j))$ is always zero. Hence

$$\frac{\partial a_{j_1 \cdots j_r}}{\partial x_1} = 0; \qquad (7.7)$$

that is, $a_{j_1 \cdots j_r}$ is a function of $x_2, \ldots, x_n$ alone. The one-parameter group $t \to \phi_t$ generated by $X$ then leaves $x_2, \ldots, x_n$ alone and increases $x_1$ linearly: $x_1 \to x_1 + t$. We see that (7.7) is the condition that $\omega$ be invariant under each of these transformations; that is,

$$\phi_t^*(\omega) = \omega \qquad \text{for all } t.$$
This holds in a neighborhood of $p$; however, the set of points where it holds is open and closed in $M$, and since we are assuming $M$ to be connected, it holds everywhere on $M$.
Now we turn to extending $d$ to forms of all degree, sending an $r$-form $\omega$ onto an $(r + 1)$-form $d\omega$. We shall want the following basic formula relating $d$ to Lie derivative to hold:

$$X(\omega) = X \lrcorner\, d\omega + d(X \lrcorner\, \omega) \qquad \text{for } X \in V(M). \qquad (7.8)$$

Let us take advantage of the fact that we have already defined $X(\omega)$, and can assume that $d(X \lrcorner\, \omega)$ is defined by induction, to define $d\omega$ for forms of degree $r$, assuming it is defined (and satisfies (7.8)) for forms of degree less than $r$. Explicitly,

$$d\omega(X_1, \ldots, X_{r+1}) = X_1(\omega)(X_2, \ldots, X_{r+1}) - d(X_1 \lrcorner\, \omega)(X_2, \ldots, X_{r+1}). \qquad (7.9)$$

Now, as in the earlier definition of $X(\omega)$, we must verify that (7.9) really
defines $d\omega$ as a skew-symmetric, $F(M)$-multilinear function of $X_1, \ldots, X_{r+1}$. This is similar to the earlier computation and is left as an exercise. The three remaining important properties of $d$ are

$$X(d\omega) = dX(\omega), \qquad (7.10)$$

$$d(\omega_1 \wedge \omega_2) = d\omega_1 \wedge \omega_2 + (-1)^r \omega_1 \wedge d\omega_2 \qquad (7.11)$$

(if $\omega_1$ is an $r$-form), and

$$d(d\omega) = 0 \qquad \text{for all forms } \omega. \qquad (7.12)$$

All three can be proved by the technique we have used already; namely, we assume that they are true for forms of lower degree and apply the inner product $Y \lrcorner$ to both sides, for an arbitrary vector field $Y$. As an example, we prove (7.12) with this technique:

$$\begin{aligned}
Y \lrcorner\, d(d\omega) &= Y(d\omega) - d(Y \lrcorner\, d\omega) \\
&= Y(d\omega) - d\bigl(Y(\omega) - d(Y \lrcorner\, \omega)\bigr) \\
&= Y(d\omega) - dY(\omega) \quad \text{(the third term vanishes by induction hypotheses)} \\
&= 0 \quad \text{(by (7.10))}.
\end{aligned}$$
In principle, (7.9) can be worked out to give a noninductive, explicit definition of $d\omega$. However, this is never used in practice. Either (7.8) is used or $d$ can be calculated very simply in local coordinates. Suppose $x_i$ are coordinate functions on a neighborhood of $M$. Then

$$\omega = \sum a_{j_1 \cdots j_r}\, dx_{j_1} \wedge \cdots \wedge dx_{j_r}.$$

Using (7.11) and (7.12), we see that

$$d\omega = \sum da_{j_1 \cdots j_r} \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_r}. \qquad (7.13)$$
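A short worked computation shows how (7.13) is used and how it fits with (7.8) and (7.12). The example is chosen here for illustration and is not from the text: on $R^3$ take $\omega = x_1 x_2\,dx_3$ and $X = \partial/\partial x_1$.

```latex
\begin{align*}
d\omega &= d(x_1 x_2) \wedge dx_3 = x_2\, dx_1 \wedge dx_3 + x_1\, dx_2 \wedge dx_3, \\
d(d\omega) &= dx_2 \wedge dx_1 \wedge dx_3 + dx_1 \wedge dx_2 \wedge dx_3 = 0, \\
X \lrcorner\, d\omega &= x_2\, dx_3, \qquad
d(X \lrcorner\, \omega) = d(0) = 0, \\
X(\omega) &= \frac{\partial (x_1 x_2)}{\partial x_1}\, dx_3 = x_2\, dx_3
  = X \lrcorner\, d\omega + d(X \lrcorner\, \omega).
\end{align*}
```

The last line uses the fact that, for $X = \partial/\partial x_1$, the Lie derivative simply differentiates the coefficient functions with respect to $x_1$, since $d(X(x_j)) = 0$ for every coordinate function.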
(Notice that (7.13) requires (7.12) only for zero forms, where it is easily proved directly, and (7.11). But (7.10) for forms of degree greater than zero is an easy consequence of (7.5) and (7.10) for forms of degree zero. Thus (7.10) could be proved quite simply by using (7.13) instead of the formal method indicated above as an exercise.)
We find that (7.13) has as consequence the basic rule for the behavior of $d$ under mappings between manifolds. Suppose $\phi$ is a map: $M' \to M$ of manifolds. Then we have

$$\phi^*(d\omega) = d\phi^*(\omega) \qquad \text{for each differential form } \omega \text{ on } M. \qquad (7.14)$$

Proof. We have already verified (7.14) for zero forms. But it suffices to verify (7.14) in each coordinate patch, and (7.13) obviously enables us to do this, since

$$\phi^*(\omega) = \sum \phi^*(a_{j_1 \cdots j_r})\, d\phi^*(x_{j_1}) \wedge \cdots \wedge d\phi^*(x_{j_r}),$$

$$d\phi^*(\omega) = \sum \phi^*(da_{j_1 \cdots j_r}) \wedge \phi^*(dx_{j_1}) \wedge \cdots \wedge \phi^*(dx_{j_r}) = \phi^*(d\omega).$$
Integration on Manifolds
Our main concern in this book is with differential calculus on manifolds. Since we shall also occasionally need some of the basic facts of integral geometry, we now present a short survey of what we need. The reader can refer to Spivak [1] for a fuller treatment of the integration theory. Let $M$ be a manifold (always assumed to be $C^\infty$ and representable as a countable union of compact sets). Suppose $\dim M = n$.
Definitions
$M$ is orientable if it admits at least one differential form $\omega$ of degree $n$ whose value is nonzero at each point of $M$. Two such forms $\omega$ and $\omega'$ are equivalent (for the purposes of orientation) if $\omega = f\omega'$, with $f \in F(M)$, $f(p) > 0$, for all $p \in M$. An orientation of $M$ is just an equivalence class of such forms. It is readily verified that if $M$ is connected, it admits either no orientation or two. (For if an $n$-form $\omega$ defines an orientation, $-\omega$ defines an orientation in a different class. If $\omega'$ also defines an orientation, $\omega'$ must equal $f\omega$ for some everywhere nonzero $f \in F(M)$. Since $M$ is connected, either $f > 0$ or $f < 0$ everywhere on $M$. In the former case, $\omega'$ is equivalent to $\omega$; in the latter, to $-\omega$.) If $M$ is disconnected, fixing an orientation on each connected component of $M$ defines an orientation for $M$ in an obvious way.
A coordinate system $(x_1, \ldots, x_n)$ valid in a connected open set $U$ of $M$ is positively (respectively, negatively) oriented with respect to an orientation defined by an $n$-form $\omega$ if

$$dx_1 \wedge \cdots \wedge dx_n = f\omega,$$

with $f > 0$ in $U$ (respectively, $f < 0$ in $U$).
A partition of unity for $M$ is a sequence $f_1, f_2, \ldots$ of functions from $F(M)$ such that
(a) $\sum_{j=1}^{\infty} f_j(p) = 1$ and $f_j(p) \ge 0$ for all $p \in M$, $j = 1, 2, \ldots$.
(b) Each function $f_j$ has assigned to it an open subset $U_j$ of $M$ such that $f_j$ vanishes outside $U_j$, and each $U_j$ meets only a finite number of the other sets of the sequence.
(c) Each set $U_j$ is a coordinate neighborhood of the manifold, with a coordinate system of functions $x_1, \ldots, x_n$ valid in $U_j$.
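For a concrete picture of conditions (a), (b), and (c), one can write down a partition of unity on the real line. The sketch below is an illustration under assumptions made here, not a construction from the text: $M = R$, the sets are $U_j = (j - 1, j + 1)$ indexed by all integers $j$ (rather than by $1, 2, \ldots$), the coordinate on each $U_j$ is $x$ itself, and the names `bump` and `partition_function` are hypothetical.

```python
import math

def bump(x):
    """Smooth function that is positive on (-1, 1) and zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def partition_function(j, x):
    """Value at x of the function f_j, which vanishes outside U_j = (j - 1, j + 1)."""
    k0 = math.floor(x)
    # Only bumps centred within distance 1 of x can be nonzero there, so a short
    # window of integer centres around x is enough to form the normalizing sum.
    total = sum(bump(x - k) for k in range(k0 - 1, k0 + 3))
    return bump(x - j) / total

sample_points = [-2.3, 0.0, 0.5, 3.999]
sums = [sum(partition_function(j, x)
            for j in range(math.floor(x) - 2, math.floor(x) + 4))
        for x in sample_points]
```

Each $U_j$ meets only $U_{j-1}$ and $U_{j+1}$, every value is nonnegative, and at each sample point the values add up to 1, which is the content of conditions (a) and (b).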
It follows from (a) and (b) that $M$ is the union of the $U_j$. Conversely, it can be proved that any open covering $U_1, U_2, \ldots$ of $M$ such that each set meets only a finite number of the other sets has a set of functions $f_1, f_2, \ldots$ associated with it satisfying (a) and (b). The proof can be found in Helgason [1, p. 8] or Spivak [1, p. 63]. Now, let $M$ be orientable, with a fixed orientation. Let $\theta$ be an $n$-form on $M$ and let $f$ be a continuous real-valued† function on $M$. We define the integral of $f$ over $M$ with respect to $\theta$, denoted by $\int_M f\theta$, as follows:
Case 1
$f$ vanishes outside a coordinate neighborhood $U$.
Let $\phi: D \to U$ be a diffeomorphism of $U$ with a convex domain $D$ in $\mathbf{R}^n$, chosen so that the coordinate system defined by $\phi$ on $U$ is positively oriented. Let $(x_1, \ldots, x_n)$ be coordinates on $D$, and let $\phi^*(\theta) = g\, dx_1 \wedge \cdots \wedge dx_n$. Then put
$$\int_M f\theta = \int_D \phi^*(f)\, g\; dx_1 \cdots dx_n,$$
where the integral on the right-hand side is the ordinary‡ Riemann integral for the function $\phi^*(f)g$ on the domain $D$. We pause to show that this is independent of the choice made of $\phi$ and $D$. Suppose then that $\phi': D' \to U$ is another diffeomorphism of $U$ with a convex domain $D'$ in $\mathbf{R}^n$, also positively oriented, with $\phi'^*(\theta) = g'\, dx_1 \wedge \cdots \wedge dx_n$. Let $\psi$ be the mapping $D' \to D$ defined as $\psi = \phi^{-1}\phi'$. Then
$$\psi^*(dx_1 \wedge \cdots \wedge dx_n) = \phi'^*\phi^{-1*}(dx_1 \wedge \cdots \wedge dx_n).$$
But $\psi^*(dx_1 \wedge \cdots \wedge dx_n)$ also equals $J\, dx_1 \wedge \cdots \wedge dx_n$, where $J$ is the Jacobian of the mapping $\psi$ from $D'$ to $D$. Now we take, as known from advanced calculus, the behavior of the Riemann integral under a diffeomorphism $\psi: D' \to D$, namely,
$$\int_D h\; dx_1 \cdots dx_n = \int_{D'} \psi^*(h)\, |J|\; dx_1 \cdots dx_n \qquad \text{for all } h \in C(D).$$
But $J = g'/\psi^*(g)$, which is positive, since both $\phi$ and $\phi'$ define coordinate systems that are oriented positively. Now
$$\int_M f\theta = \int_D \phi^*(f)\, g\; dx_1 \cdots dx_n = \int_{D'} \psi^*(\phi^*(f)\, g)\, |J|\; dx_1 \cdots dx_n = \int_{D'} \phi'^*(f)\, g'\; dx_1 \cdots dx_n,$$
so both choices give the same value.

† Of course the theory can be trivially extended to complex functions by separating them into real and imaginary parts.
‡ When we write $dx_1 \cdots dx_n$ with no wedge products, we just mean the ordinary, unoriented Riemann integral. Thus, to be pedantic, it should as a symbol be distinguished from $\int_D \phi^*(f)\, g\; dx_1 \wedge \cdots \wedge dx_n$, although, of course, it is equal to it as a number.
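As a sanity check on the independence argument of Case 1, here is a small worked instance. The region $U$, the two parametrizations, and the names $r$, $t$ are illustrative assumptions, not taken from the text.

```latex
% Assumed setup (illustrative only):
%   U = \{(x,y) : x > 0,\ y > 0\} \subset \mathbf{R}^2, \qquad \theta = dx \wedge dy.
%   Chart 1: \phi = identity on D = U, so g = 1.
%   Chart 2: \phi'(r,t) = (r\cos t,\ r\sin t) on the convex domain D' = (0,\infty) \times (0,\pi/2).
\begin{align*}
\phi'^*(dx \wedge dy)
  &= (\cos t\,dr - r\sin t\,dt) \wedge (\sin t\,dr + r\cos t\,dt)
   = r\,dr \wedge dt, \qquad\text{so } g' = r, \\
J &= g'/\psi^*(g) = r > 0 \qquad (\psi = \phi^{-1}\phi' = \phi'), \\
\int_U f\,\theta
  &= \int_D f(x,y)\,dx\,dy
   = \int_{D'} f(r\cos t,\ r\sin t)\; r\,dr\,dt .
\end{align*}
```

Both charts are positively oriented for $dx \wedge dy$, which is why $J$ comes out positive and the absolute value in the change-of-variables rule can be dropped.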
Case 2
$f$ vanishes outside a compact subset of $M$.
Let $\{f_1, f_2, \ldots\}$ be functions on $M$ defining a partition of unity for $M$. Then $f$ can be written as
$$f = \sum_{j=1}^{\infty} f f_j .$$
The sum on the right-hand side is really finite, for by property (b) of a partition of unity, the compact subset outside of which $f$ vanishes meets only a finite number of the elements of the covering of $M$ associated with the partition of unity. Since each $f f_j$ vanishes outside a coordinate neighborhood, by case 1, $\int_M f f_j\theta$ is defined. Now let us define
$$\int_M f\theta = \sum_{j=1}^{\infty} \int_M f f_j\theta .$$
(Again, this is really only a finite sum.) We must prove that this is independent of the partition of unity chosen. Suppose, then, that $\{f_1', f_2', \ldots\}$ is another partition of unity. Now $\{f_j f_k' : 1 \le j, k < \infty\}$ is also a partition of unity, since†
$$\sum_{j,k=1}^{\infty} f_j f_k'(p) = \sum_{j=1}^{\infty} f_j(p) \sum_{k=1}^{\infty} f_k'(p) = 1 .$$
† Some analysis is needed to prove that the double summation can be broken up, but the justification (left to the reader) readily follows from theorems on convergence of infinite series, since all terms are nonnegative.

Now consider $\sum_{j,k=1}^{\infty} \int_M f f_j f_k'\theta$. Since this is really a finite sum, it can be split up into
$$\sum_{j=1}^{\infty} \Big( \sum_{k=1}^{\infty} \int_M f f_j f_k'\theta \Big).$$
For fixed $j$, $f f_j f_k'$ vanishes outside a coordinate neighborhood; hence the additivity of the Riemann integral implies that
$$\sum_{k=1}^{\infty} \int_M f f_j f_k'\theta = \int_M f f_j\theta .$$
Performing the double summation in the reverse order, we see that
$$\sum_{j=1}^{\infty} \int_M f f_j\theta = \sum_{k=1}^{\infty} \int_M f f_k'\theta ,$$
which proves invariance.
Case 3
The General Case. Let $C_c(M)$ be the vector space of continuous, real-valued functions on $M$ that vanish outside a compact subset of $M$. We can sum up what we have proved above in the following way: An $n$-differential form $\theta$ on $M$ defines a linear functional $f \to \int_M f\theta$ on $C_c(M)$ with the following property: Given a compact set $K$ of $M$, there exists a number $a > 0$ such that, whenever $f \in C_c(M)$ vanishes outside $K$ and is everywhere bounded in absolute value by 1, $\left| \int_M f\theta \right| \le a$.
(Tracing through case 2 and referring back to the properties of the Riemann integral in bounded domains in $\mathbf{R}^n$, we see that $a$ is fixed, once a particular finite set of coordinate neighborhoods covering $K$ is chosen.) Now, this is just the property needed to extend $\int_M f\theta$ to functions $f$ on $M$ that are merely Borel measurable. (Following the pattern established in extending the Riemann integral from continuous functions on closed intervals to a Lebesgue integral over the whole real line, roughly, one approximates $f$ by sequences from $C_c(M)$, while defining the integral by a limiting operation.) As usual in measure theory, we say that such an $f$ is integrable if $\int_M |f|\theta$ is finite. We have all the general tools of integration theory that are available on, say, the real line for the Lebesgue integral. However, all immediate work in this chapter is concerned with continuous functions, so that we are really dealing only with the analog of the Riemann integral. It must be emphasized, however, that one should not go too far in trying to regard this from the point of view of functional analysis. The main aims in integral geometry are computation of such integrals, or at least statements of
theorems showing how such computations can be reduced to computation of geometric invariants, and of theorems about the behavior of the integrals under mappings. In addition, in the future the problem of singularities of such integrals will be increasingly important. In such problems it is important to be able to utilize the intuitive tricks of integrating that are learned in integral calculus. In these computations, our standard notation is sometimes awkward,† and more intuitive notations are desirable. For example, for $\int_M f\theta$ we sometimes write
and so forth. The third notation is useful when one form $\theta$ is fixed throughout the discussion; it can be called $dp$. Physicists have devised a useful notation for integrating over domains in Euclidean space of variables $x_1, \ldots, x_n$, which goes something like this: $\int f(x)\, d^n x$. We now turn to the question of behavior under mappings, which is really the main problem of integral geometry. The most immediate concern is behavior under diffeomorphisms, which follows more or less from the definitions.
THEOREM 7.1
Let $\phi: M \to M'$ be a diffeomorphism between manifolds, and let $\theta'$ be a volume element‡ form on $M'$, $f' \in F(M')$. Then
$$\int_{M'} f'\theta' = \int_M \phi^*(f')\,\phi^*(\theta').$$
Proof. When $f'$ vanishes outside a coordinate neighborhood of $M'$, this is inherent in the proof given above that the defining Riemann integral does not depend on the choice of coordinate system. The general case can be obtained from this one by using a partition of unity.

† Although, conversely, sometimes problems in integral calculus become much more amenable when written in a more abstract than usual notation. This is particularly true of "change of variable" arguments; writing down explicitly the mappings involved and using the general formalism often clears up much confusion.
‡ By this we mean a form of the same degree as the dimension of the space. We shall state things for $C^\infty$ functions $f$, but of course everything that does not involve differentiation of $f$ usually extends to at least continuous $f$. In addition, unless there is a possibility of confusion, we shall not state explicitly whether we are working with a particular orientation on each manifold.

Next, suppose that $M$ is a manifold, that $\omega$ is a $p$-form on $M$, that $N$ is a $p$-dimensional manifold, and that the map $\phi: N \to M$ defines $N$ as a submanifold of $M$. Then $\phi^*(\omega)$ is a volume element form with respect to $N$; hence we
can define $\int_N \phi^*(\omega)$ as above. As usual, it is often convenient to suppress mention of $\phi$ and write this simply as $\int_N \omega$. Defining the integral of a $p$-form over a $p$-dimensional manifold is very natural and easy, since differential forms can be "pulled back" under mappings. Now let us suppose that $\phi: M \to B$ is a map of manifolds and that $\omega$ is a volume element differential form on $M$. Of course it makes no geometric sense to think of defining an object like $\phi(\omega)$, since differential forms are covariant objects; that is, they behave under mappings as functions rather than as points. However, $\omega$ does define a linear form on functions, namely, $f \to \int_M f\omega$, which, as the dual of a covariant object, is contravariant. Thus it may be expected that the linear functional defined by $\omega$ does get "pushed" by $\phi$ to define a linear functional on functions of $B$. We shall denote this functional by $\phi^{-1*}(\omega)$. Thus, as definition, $\phi^{-1*}(\omega)$ is a linear map $C_c(B) \to \mathbf{R}$, defined by
$$\phi^{-1*}(\omega)(f) = \int_M \phi^*(f)\,\omega \qquad \text{for } f \in C_c(B).$$
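To see what this definition produces in the simplest situation, consider a coordinate projection. The map, the density $\rho$, and the integrability assumption below are illustrative choices and do not come from the text.

```latex
% Assumed setup (illustrative only):
%   M = \mathbf{R}^2,\quad B = \mathbf{R},\quad \phi(x,y) = x,
%   \omega = \rho(x,y)\,dx \wedge dy \text{ with } \rho \text{ continuous and integrable over } \mathbf{R}^2.
\begin{align*}
\phi^{-1*}(\omega)(f)
  &= \int_{\mathbf{R}^2} f(x)\,\rho(x,y)\,dx\,dy
   = \int_{\mathbf{R}} f(x) \Big( \int_{\mathbf{R}} \rho(x,y)\,dy \Big) dx
   \qquad (f \in C_c(\mathbf{R})).
\end{align*}
% Hence the pushed-forward functional is represented by the 1-form g(x)\,dx on B,
% where g(x) = \int_{\mathbf{R}} \rho(x,y)\,dy is the integral of \rho along the fiber over x.
```

Here the degree drops from 2 on $M$ to 1 on $B$, and the coefficient on $B$ is obtained by integrating along the fibers of $\phi$.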
Now there is one immediate difficulty with this definition, namely, $\phi^*(f)$ may not be integrable with respect to the measure defined by $\omega$. However, there are two main cases where this difficulty does not arise:
(a) All continuous functions on $M$ are integrable with respect to $\omega$.
(b) $\phi$ is a proper map; that is, the inverse image of every compact subset of $B$ under $\phi$ is a compact subset of $M$.
For the moment, the reader can assume that we are working with one of these assumptions; there are various devices available for weakening them. To justify this unorthodox notation, namely, $\phi^{-1*}(\omega)$, note that by Theorem 7.1 (in case $\phi^{-1}$ exists; that is, $\phi$ is a diffeomorphism) $\phi^{-1*}(\omega)$ as a functional agrees with the functional defined by the form $(\phi^{-1})^*(\omega)$. In addition, if $\psi: B \to C$ is another map, the reader can easily check that $(\psi\phi)^{-1*}$ is just $\psi^{-1*}\phi^{-1*}$, so that the notation will not lead to inconsistency on iteration of mappings. We also believe that the notation has some intuitive geometric content. If $\phi^{-1}$ exists as a map, $\phi^{-1*}(\omega)$ is of the same degree as $\omega$. If not (for example, if $B$ is of lower dimension than $M$), then $\phi^{-1*}(\omega)$ is something like a volume element form on $B$, so that $\phi^{-1*}$ "collapses" the degree of $\omega$. In fact, we shall see below that $\phi^{-1*}$ acts by "collapsing" the component of $\omega$ along the fibers of $\phi$ by a process of "integration over the fibers."
There is another mapping associated with $\phi$ that carries measures on $B$ back into measures on $M$. Let $\theta$ be any fixed volume element form on $B$. Then any $f \in F(B)$ defines a measure on $B$, namely, that associated with the volume element form $f\theta$. But we can pull back $f$ to $\phi^*(f)$, then multiply by $\omega$ to get a volume element $\phi^*(f)\omega$, and hence get a measure on $M$. We can thus
regard this as a mapping of $F(B)$ into the space of measures on $M$, or more generally, into the space of linear functionals on $C_c(M)$. Thus we have the possibility of extending this map, defined by $\phi$ and $\omega$, of $C(B)$ into† $C_c(M)^*$, from certain generalized functions‡ on $B$ into certain generalized functions on $M$; we shall continue to use the notation $\phi^*$ for this mapping. The most important such generalized function is the Dirac delta function. Suppose, then, that $b$ is a point of $B$, that $d_1, d_2, \ldots$ are a sequence of elements of $C_c(B)$ that converge to the Dirac delta function $\delta_b$; that is,
$$\lim_{j \to \infty} \int_B d_j f\theta = f(b) \qquad \text{for all } f \in C_c(B).$$
Then, according to our definition, $\phi^*(\delta_b)$ is the linear functional on $C_c(M)$ such that
$$\phi^*(\delta_b)(f) = \lim_{j \to \infty} \int_M f\,\phi^*(d_j)\,\omega \qquad \text{for } f \in C_c(M).$$
Physicists use the notation
$$\phi^*(\delta_b) = \delta_{\phi^{-1}(b)},$$
regarding $\delta_{\phi^{-1}(b)}$ as the "delta function" corresponding to the set $\phi^{-1}(b)$, just as the usual Dirac functions are associated with a set consisting of one point. Using this bit of formalism, we can write the relation between $\phi^{-1*}$ and $\phi^*$ in an interesting, but purely formal, way. Suppose the measure $\phi^{-1*}(\omega)$ on $B$ is defined by a volume element differential form on $B$; for example, by one of the forms $g\theta$. Thus
$$\int_M \phi^*(f)\,\omega = \int_B f(b)\,g(b)\,db \qquad \text{for } f \in C_c(B),$$
where $db$ is a suggestive shorthand for $\theta$. Now, formally, $g$ can be written as
$$g(b) = \int_B \delta_b(b')\,g(b')\,db';$$
thus, formally again,
$$\int_M \phi^*(f)\,\omega = \int_B \int_B f(b)\,\delta_b(b')\,g(b')\,db'\,db .$$
Using the relation $\phi^{-1*}(\omega) = g\theta$ again, we have (purely formally, of course)
$$\int_M \phi^*(\delta_b)\,\omega = \int_B \delta_b(b')\,g(b')\,db' .$$

† This just denotes the set of real-valued linear forms on $C_c(M)$.
‡ We refer to Gelfand-Silov [1] for the notion of "generalized function."
Finally,
$$\int_M f\omega = \int_B \Big( \int_M f\,\phi^*(\delta_b)\,\omega \Big)\, db \qquad \text{for all } f \in F(M). \tag{7.15}$$
This is one of the two main formulas of integral geometry (the other is Stokes' formula); it describes how an integration over $M$ can be decomposed into an integration over the fibers of $\phi$, and then into an integration over $B$. (It is a generalization of the formula
$$\int\!\!\int f(x, y)\,dx\,dy = \int \Big( \int f(x, y)\,dx \Big)\,dy ,$$
which is well known in integral calculus. However, (7.15) contains a good deal more information, since it holds in cases where the map $\phi$ is much more complicated than the simple projection map associated with a Cartesian product.) At any rate, one of the most urgent tasks of integral geometry is to find the broadest conditions under which (7.15) holds, that is, in which the formal tricks we used can be justified. We shall now mention one general theorem that is relevant.
Let $M$ be a manifold, $\phi: M \to B$ a map of $M$ onto $B$, with $\dim B \le \dim M$ and with both $M$ and $B$ orientable. We say that a point $p \in M$ is a nonsingular point of the mapping if $\phi_*(M_p) = B_{\phi(p)}$, and we then say that a point $b \in B$ is a nonsingular image point of the mapping if $\phi^{-1}(b)$ consists only of nonsingular points. Let $\psi$ be an everywhere nonzero volume element form on $B$. A theorem of Sard tells us that the singular image points of $\phi$ form a subset of measure zero on $B$. (By "measure zero" we mean relative to that measure defined by $\psi$ on $B$. We must refer to the exposition given by Sternberg [1, p. 47].) Now, if $b \in B$ is a nonsingular image point of $\phi$, it follows from the implicit function theorem that $\phi^{-1}(b)$ is a submanifold of $M$ whose dimension is equal to $(\dim M - \dim B)$. Now, let $f \in C_c(B)$ and let $\omega$ be a differential form that vanishes outside a compact subset of $M$ and whose degree is equal to $(\dim M - \dim B)$. Thus
$$\theta = \omega \wedge \phi^*(f\psi)$$
is a volume element form on $M$; hence $\int_M \theta$ is well defined. We want to express this integral in terms of an integration over the fibers of $\phi$ and over the base manifold $B$.
THEOREM 7.2
With the above notations,
$$\int_M \omega \wedge \phi^*(f\psi) = \int_B f(b) \Big( \int_{\phi^{-1}(b)} \omega \Big)\, \psi . \tag{7.16}$$
A word of explanation of the notations inherent in this formula is necessary. If $b$ is a singular image point, we must regard $\int_{\phi^{-1}(b)} \omega$ as undefined. If $b$ is a nonsingular image point, we define it as follows: $\omega$ restricted to $\phi^{-1}(b)$ is a volume element form for the submanifold. The points of the submanifold where this restricted form is nonzero form an open subset of the submanifold (hence also a submanifold of $M$), and $\int_{\phi^{-1}(b)} \omega$ is defined as the integral of $\omega$ over this submanifold. Thus $b \to f(b) \int_{\phi^{-1}(b)} \omega$ is a real-valued function defined except for a set of measure zero on $B$; hence the right-hand side of (7.16) is defined as the integral of this function over $B$. The proof of formula (7.16) can be found in all generality in a paper by Federer [1], although it is expressed there in a different language. We shall not give the proof of (7.16) in full here, but only in the case where $\phi$ has no singular points, that is, where $\phi$ is a maximal rank, onto mapping. Actually, the fact that (7.16) can allow singularities is its most interesting and delicate point, but the "nonsingular" version we prove here is adequate for most of the applications we have in mind.
Since (7.16) is linear in $\omega$, notice first that it suffices to prove it in case $\omega$ vanishes outside a coordinate neighborhood of $M$. By the implicit function theorem, $M$ can be covered by coordinate neighborhoods $U$ having coordinate systems $x_1, \ldots, x_n$ with the following properties:
(a) $0 < x_1, \ldots, x_n < 1$.
(b) $\phi(U)$ is a coordinate neighborhood for $B$, with coordinate system $y_1, \ldots, y_m$ such that $\phi^*(y_1) = x_1, \ldots, \phi^*(y_m) = x_m$.
Suppose that in $\phi(U)$, $f\psi$ is given by a form $h(y)\, dy_1 \wedge \cdots \wedge dy_m$. We can suppose without loss of generality that $\omega$ is of the form $k(x)\, dx_{m+1} \wedge \cdots \wedge dx_n$ (since any of the factors involving $dx_1, \ldots, dx_m$ will not affect either side of (7.16)). Thus the left-hand side of (7.16) is
$$\int_0^1 \cdots \int_0^1 h(x_1, \ldots, x_m)\, k(x)\; dx_1 \cdots dx_n .$$
The right-hand side is
$$\int_0^1 \cdots \int_0^1 h(y) \Big( \int_0^1 \cdots \int_0^1 k(y_1, \ldots, y_m, x_{m+1}, \ldots, x_n)\; dx_{m+1} \cdots dx_n \Big)\, dy_1 \cdots dy_m ,$$
and the two are then equal by the property of the Riemann integral that asserts that multiple integrals can be evaluated by iterated partial integrals. Q.E.D.
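The fiber-integration formula of Theorem 7.2 can be illustrated by integrating a radial function over concentric spheres. Everything specific below (the manifold, the angle names $\vartheta$, $\chi$, and the forms) is an illustrative assumption; in particular the form $\omega$ chosen here is not compactly supported on $M$, so the example only shows the shape of the formula rather than a literal application of the theorem.

```latex
% Assumed setup (illustrative only):
%   M = \mathbf{R}^3 \setminus \{0\},\quad B = (0,\infty),\quad \phi(x) = |x| = r,\quad \psi = dr,
%   \omega = r^2 \sin\vartheta\; d\vartheta \wedge d\chi
%   (the area form of the sphere of radius r, in spherical coordinates (r,\vartheta,\chi)),
%   f \in C_c((0,\infty)).
\begin{align*}
\omega \wedge \phi^*(f\psi)
  &= f(r)\, r^2 \sin\vartheta\; d\vartheta \wedge d\chi \wedge dr
   = f(r)\, r^2 \sin\vartheta\; dr \wedge d\vartheta \wedge d\chi
   = f(|x|)\; dx_1 \wedge dx_2 \wedge dx_3 , \\
\int_{\phi^{-1}(b)} \omega
  &= b^2 \int_0^{2\pi}\!\!\int_0^{\pi} \sin\vartheta\; d\vartheta\, d\chi = 4\pi b^2 , \\
\int_{\mathbf{R}^3} f(|x|)\; dx_1\, dx_2\, dx_3
  &= \int_0^{\infty} f(b)\, 4\pi b^2\; db .
\end{align*}
```

Moving $dr$ past the 2-form $d\vartheta \wedge d\chi$ costs no sign, which keeps the orientation bookkeeping out of the way in this example.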
COROLLARY TO THEOREM 7.2
Suppose that $\phi^*(\psi) \wedge \omega$ is a volume element form for $M$ that is nonzero on $\phi^{-1}(b)$, with $b$ a nonsingular image point for $\phi$. Then $\phi^*(\delta_b) = \delta_{\phi^{-1}(b)}$, the "Dirac delta function" of the fiber $\phi^{-1}(b)$, is just $\omega$ in the sense that
$$\phi^*(\delta_b)(f) = \int_{\phi^{-1}(b)} f\omega \qquad \text{for } f \in C_c(M).$$
To make this more explicit we give an example: Suppose that $M$ is $\mathbf{R}^n$ itself and that $\phi$ is a map of $\mathbf{R}^n \to \mathbf{R}$; that is, $\phi$ is just a real-valued function on $\mathbf{R}^n$, say, of the form $x \to \phi(x)$.
Suppose, say, that $b = 0$. The condition that $b$ be a regular image point is then just $d\phi(x) \ne 0$ for each $x$ such that $\phi(x) = 0$. Let $\operatorname{grad}\phi$ be the vector field
$$\sum_{i=1}^{n} \frac{\partial\phi}{\partial x_i}\, \frac{\partial}{\partial x_i} .$$
Then $\operatorname{grad}\phi(x) \ne 0$ for each $x$ such that $\phi(x) = 0$. Suppose the form $\psi$ on $\mathbf{R}$ is just the Riemann integral form. Then $\phi^*(\psi) = d\phi$. Let us then try to find an $(n-1)$-form $\omega$ such that
$$\theta = dx_1 \wedge \cdots \wedge dx_n = d\phi \wedge \omega .$$
Apply the inner product of $\operatorname{grad}\phi$ to both sides:
$$\operatorname{grad}\phi \,\lrcorner\, \theta = \operatorname{grad}\phi(\phi)\,\omega - d\phi \wedge (\operatorname{grad}\phi \,\lrcorner\, \omega).$$
Now $d\phi$ is zero when restricted to the fibers of $\phi$, so on the fibers only the first term survives. Its coefficient is
$$\operatorname{grad}\phi(\phi) = \sum_{i=1}^{n} \Big( \frac{\partial\phi}{\partial x_i} \Big)^2 ,$$
which, anticipating the notation to be introduced in Part 3, we write as $\|\operatorname{grad}\phi\|^2$. Thus, we see that $\delta_{\phi^{-1}(0)}$ can be represented by the $(n-1)$-form $(\operatorname{grad}\phi \,\lrcorner\, \theta)/\|\operatorname{grad}\phi\|^2$, in the sense that $\delta_{\phi^{-1}(0)}$ applied to a function $f \in C_c(M)$ is just the integral of $f$ over the hypersurface $\phi^{-1}(0)$ with respect to this form. At this point we make contact with the material in the first volume of a treatise by Gelfand and Silov [1] on "generalized functions," and we refer the reader to that discussion for more detail and for the fascinating applications to partial differential equations.
We have now completed our admittedly fragmentary remarks about the general facts concerning the behavior of measures defined by differential
forms under mappings. We now turn to the second basic general fact about integration on manifolds, namely, "Stokes' formula." Now, just as the behavior of measures under mappings can ultimately be reduced (if things are not too pathological) to the very simple theorem that a multiple Riemann integral can be reduced to iterated one-dimensional Riemann integrals, so can "Stokes' formula" be reduced to the fact that the integral of a derivative of a function is the function itself. Here, again, it is rather difficult to state and prove precisely a version of Stokes' formula that is comprehensive for all geometric applications (at least without detouring into considerable technicalities). We shall compromise again by stating it in reasonable generality and by proving it under simple hypotheses.
Let $M$ be an oriented manifold that will be fixed throughout the discussion. Let $\omega$ be a form of degree equal to $(\dim M - 1)$, and let $D$ be an open subset of $M$. Let $D'$ be the boundary of $D$ in $M$ (that is, $D' = \overline{D} - D$, where $\overline{D}$ is the closure of $D$ in $M$). Now, of course, $D'$ can be quite pathological. However, nice domains will have boundaries that can be exhibited as the union of a "large piece" that is a submanifold of $M$ of codimension 1 (that is, a hypersurface) and various smaller pieces of lower dimension. Suppose this hypersurface is orientable. One of the two possible orientations can be chosen as follows: Let $p \in D'$ be a point on the hypersurface boundary $N$ of $D$, and let $v \in M_p$ be a vector such that sufficiently small curves with tangent vector $v$ point inside $D$. Then a basis $v_1, \ldots, v_{n-1}$ ($n = \dim M$) for $N_p$ is positively oriented if $(v_1, \ldots, v_{n-1}, v)$ is a positively oriented basis of $M_p$ (in terms of the given orientation on $M$; that is, if $\theta$ is the everywhere nonzero volume element form on $M$, then $\theta(v_1, \ldots, v_{n-1}, v) > 0$). This orientation of $N$ will be called the positive orientation of $N$ relative to $D$.
We denote by $\partial D$ this hypersurface on the boundary of $D$ (possibly nonconnected, of course) with the orientation described above. "Stokes' formula" then states that
$$\int_D d\omega = \int_{\partial D} \omega . \tag{7.17}$$
(Of course, under suitable conditions, one can allow $\partial D$ to be the whole boundary of $D$ if the proper precautions are taken to orient the hypersurface part of the boundary and the convention adopted is that the integral of $\omega$ over a subspace of lower dimension is zero; but our procedure is more in line with $\partial D$ as it is defined in topology.) Now we can prove one simple, but adequate, version of Stokes' formula:
THEOREM 7.3
Let $D$ be an open subset of an oriented manifold $M$, and let $\partial D$ be an oriented hypersurface of $M$ that lies on the boundary of $D$ and is positively
oriented relative to $D$. Suppose that $U_1, U_2, \ldots$ is a sequence of open subsets of $M$ which covers $D \cup \partial D$ and such that, in each $U_j$, $j = 1, 2, \ldots$, there is a function $f_j$ such that
$$\partial D \cap U_j = \{ p \in U_j : f_j(p) = 0 \}, \tag{7.18a}$$
$$df_j \ne 0 \quad \text{in } U_j , \tag{7.18b}$$
$$D \cap U_j = \{ p \in U_j : f_j(p) > 0 \}, \tag{7.18c}$$
and such that each $U_j$ meets only a finite number of the others. (7.19)
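Then Stokes' formula holds in $D$ for each form $\omega$ that is defined and smooth in a neighborhood of $D \cup \partial D$.
Proof. Using a partition of unity, it suffices to deal with the case where $\omega$ vanishes outside one of the sets of the covering having the properties described in (7.18). If necessary, making the covering smaller, we can suppose that each of the sets carries a coordinate system $x_1, \ldots, x_n$ such that $x_1$ is just the function $f_j$. Thus we can reduce to the case where:
(a) $D$ is the subset $\{ (x_1, \ldots, x_n) \in \mathbf{R}^n : x_1 > 0 \}$.
(b) $\partial D$ is the subset $\{ (x_1, \ldots, x_n) \in \mathbf{R}^n : x_1 = 0 \}$.
(c) $\omega$ vanishes outside a compact subset of $D \cup \partial D$.
Suppose, for example, that
$$\omega = g_1(x)\, dx_2 \wedge \cdots \wedge dx_n - g_2\, dx_1 \wedge dx_3 \wedge \cdots \wedge dx_n + \cdots \pm g_n\, dx_1 \wedge \cdots \wedge dx_{n-1} .$$
Since (7.17) is linear in $\omega$, it suffices to deal with essentially just two cases, namely,
$$\omega = g_1(x)\, dx_2 \wedge \cdots \wedge dx_n \qquad \text{or} \qquad \omega = g_2(x)\, dx_1 \wedge dx_3 \wedge \cdots \wedge dx_n .$$
In both cases, $g_1$ and $g_2$ vanish outside a compact set. Then
$$d\omega = \frac{\partial g_1}{\partial x_1}\, dx_1 \wedge \cdots \wedge dx_n \qquad \text{or} \qquad d\omega = -\frac{\partial g_2}{\partial x_2}\, dx_1 \wedge \cdots \wedge dx_n ,$$
respectively. In the first case, integrating first with respect to $x_1$, we find that
$$\int_0^{\infty} \frac{\partial g_1}{\partial x_1}\, dx_1 = -g_1(0, x_2, \ldots, x_n),$$
so that $\int_D d\omega$ reduces to an integral of $g_1(0, x_2, \ldots, x_n)$ over $(x_2, \ldots, x_n) \in \mathbf{R}^{n-1}$, which, with the sign accounted for by the orientation of $\partial D$ relative to $D$, is $\int_{\partial D} \omega$.
In the second case,
$$\int_D d\omega = -\int_{x_1 > 0} \frac{\partial g_2}{\partial x_2}\; dx_1 \cdots dx_n = 0,$$
after integrating with respect to $x_2$ and remembering that $g_2$ vanishes at infinity. But $\int_{\partial D} \omega$ also vanishes, since $dx_1 = 0$ on $\partial D$. Q.E.D.
Remark. In both Theorems 7.2 and 7.3 we have used the same trick, namely, find a formula that is expressed in completely geometric, coordinate-free language. To prove the formula, we first verify it in the simplest possible cases, where it reduces to a well-known property of the Riemann integral, and then extend it, using "partition of unity" tricks, to more complicated situations that are built up from the simple ones. In fact the whole procedure is the prototype for many of the ideas of algebraic topology.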
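A quick check of Stokes' formula in the plane may help fix the orientation convention. The domain, the form, and the parametrization below are illustrative assumptions, not taken from the text.

```latex
% Assumed setup (illustrative only):
%   M = \mathbf{R}^2 with orientation dx \wedge dy,\quad D = \{x^2 + y^2 < 1\},\quad \omega = x\,dy.
%   Near the boundary take f = 1 - x^2 - y^2: then D = \{f > 0\}, \partial D = \{f = 0\}, df \ne 0 there.
% Orientation of \partial D at p = (1,0): the inward vector is v = (-1,0); the tangent v_1 = (0,1) gives
%   (dx \wedge dy)(v_1, v) = 0\cdot 0 - 1\cdot(-1) = 1 > 0,
% so the positive orientation relative to D is the counterclockwise one.
\begin{align*}
\int_D d\omega &= \int_D dx \wedge dy = \pi , \\
\int_{\partial D} \omega
  &= \int_0^{2\pi} \cos t \cdot \cos t\; dt = \pi
  \qquad (x = \cos t,\ y = \sin t,\ 0 \le t \le 2\pi).
\end{align*}
```

Both sides agree, and in two dimensions the convention "inward vector last" reproduces the familiar counterclockwise boundary orientation of Green's theorem.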
Exercises
1. Prove that $d\omega$ as defined by (7.9) depends skew-symmetrically on its arguments.
2. Work out $d\omega(X, Y)$ and $d\theta(X, Y, Z)$ explicitly if $\omega$ and $\theta$ are 1- and 2-forms. Guess and prove the general formula for $d\omega(X_1, \ldots, X_{r+1})$ for an $r$-form.
3. Prove (7.10) and (7.11), two ways: first, using local coordinates; then completely intrinsically.
4. In the proof of (7.14) given, show why "it suffices to verify (7.14) in each coordinate patch."
5. Suppose $M$ is an orientable manifold, and $N$ is an embedded submanifold of $M$ of one less dimension. Suppose that a function $f \in F(M)$ is identically zero on $N$, and $df \ne 0$ at each point of $N$. Show that $N$ is orientable.
6. Show that the classical Gauss, Green, and Stokes' theorems (proved in vector analysis) are specializations of the general Stokes' theorem.
8 The Frobenius Complete Integrability Theorem
Let $M$ be a manifold, and let $V(M)$ be the set of its vector fields. Originally, we defined an $X \in V(M)$ as a cross section of the tangent bundle to $M$. However, we have established that $X$ can be alternatively defined as a linear mapping $F(M) \to F(M)$ such that
$$X(fg) = X(f)\,g + f\,X(g) \qquad \text{for } f, g \in F(M).$$
This property can be described algebraically by saying that a vector field is a derivation of the ring $F(M)$. Now, given $X, Y \in V(M)$, we established (in Chapter 6) that the vector field $[X, Y]$, the Jacobi bracket of $X$ and $Y$, can be defined by the rule
$$f \to [X, Y](f) = X(Y(f)) - Y(X(f)).$$
Formula (6.1) gave some of the algebraic properties of this bracket operation. In particular, they showed that $V(M)$ is a Lie algebra (over the real numbers) and established a connection with the theory of Lie groups, which will be explained in more detail in Chapter 10. In this section, we establish a connection between this algebraic structure and certain geometric facts.
A set $H$ of vector fields on $M$ is said to define a vector-field system on $M$ if it is an $F(M)$-submodule of $V(M)$; that is, if
$$fX + gY \in H \qquad \text{for } X, Y \in H,\ f, g \in F(M).$$
We shall suppose that such a vector-field system is given on $M$. For $p \in M$, define $H_p$, the "value" of $H$ at $p$, as the set of all vectors of $M_p$ of the form $X(p)$, for $X \in H$. $H_p$ is a linear subspace of $M_p$. Its dimension is called the rank of $H$ at $p$ and is denoted by $r(p)$; the point $p$ is said to be a maximal point for $H$ if $r(p) \ge r(q)$ for all $q \in M$.
LEMMA 8.1
$r(p) \le r(q)$ for all points $q$ sufficiently close to $p$. In particular, the set of maximal points of $H$ is an open set of points in $M$.
Proof. Let $X_a$, $1 \le a \le r(p)$, be elements of $H$ such that the $(X_a(p))$ are a basis of $H_p$. To prove the lemma it suffices to show that the $(X_a(q))$ are linearly independent elements of $H_q$ whenever $q$ is sufficiently close to $p$. This is indeed a general fact. Suppose that, in a local coordinate system $(x_i)$ about $p$ (with summation over the repeated index $i$ understood),
$$X_a = A_{ai}\, \frac{\partial}{\partial x_i} .$$
Since the values of the vector fields $(\partial/\partial x_i)$ form a basis of the vector space of tangent vectors at each point, the dimension of the subspace of $M_q$ spanned by the $X_a(q)$ is equal to the rank of the $r \times n$ matrix $(A_{ai}(q))$; that is, it is equal to the number of rows of the largest square submatrix whose determinant is nonzero. Since the determinant of a matrix of continuous functions is a continuous function, the $r \times r$ subdeterminant, which is $\ne 0$ at $q = p$ (since the rank is $r$ at $p$, by construction), will remain $\ne 0$ when $q$ varies in some neighborhood of $p$. This proves the lemma.
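A one-dimensional instance shows why the rank can only jump upward near a point and why the maximal points form an open set. The particular vector field below is an illustrative choice, not one used in the text.

```latex
% Assumed setup (illustrative only):
%   M = \mathbf{R},\quad H = \{\, f(x)\, x\, \partial/\partial x : f \in F(\mathbf{R}) \,\},
%   the F(M)-module generated by the single field X = x\, \partial/\partial x.
\begin{align*}
H_0 &= \{0\}, & r(0) &= 0, \\
H_q &= M_q \quad (q \ne 0), & r(q) &= 1 .
\end{align*}
% So r(0) \le r(q) for q near 0, in agreement with Lemma 8.1, and the set of maximal points
% is \mathbf{R} \setminus \{0\}, which is open. The rank drops only on the closed set \{0\}.
```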
LEMMA 8.2
Let $p$ be a maximal point for $H$. Then there is a neighborhood $U$ of $p$ and a set of elements $(X_a)$, $1 \le a \le r$, in $H$ such that
(a) $(X_a(q))$ is a basis for $H_q$ for all $q \in U$.
(b) Each $X \in H$ can be written in the form $X = f_a X_a$, with $f_a \in F(U)$.
(Such a set of vector fields is called a basis for $H$ in $U$.)
The proof is a corollary to the argument of Lemma 8.1. The vector fields $(X_a)$ chosen for that proof are linearly independent at every point $q$ sufficiently close to $p$; hence they form a basis for $H_q$, since $\dim H_q = \dim H_p$ ($p$ being a maximal point). This proves (a). To prove (b), choose an $X \in H$. $X(q)$ can be written as $f_a(q) X_a(q)$ for $q$ sufficiently close to $p$. The assignment $q \to (f_a(q))$ defines the functions $f_a$. It remains only to show that they are $C^\infty$. This can be done, as in the proof of Lemma 8.1, by writing the $X_a$ in terms of local coordinates.
Definition
A mapping $\phi: N \to M$ of a manifold $N$ into $M$ is called an integral map of $H$ if $\phi_*(N_p) \subset H_{\phi(p)}$ for all $p \in N$. A function $f \in F(M)$ is called an integral function of $H$ if $X(f) = 0$ for all $X \in H$.
Notice that this notion generalizes ideas we have already discussed for the case where H has a basis consisting of a single vector field. In that case, the integral maps that are submanifolds are one-dimensional, that is, are determined locally by ordinary differential equations. As we shall see, integral maps of more general vector-field systems are determined locally by partial differential equations.
Suppose for the rest of this chapter that all points of $M$ are maximal points for $H$. Notice that an integral function $f \in F(M)$ is constant along an integral submanifold map $\phi: N \to M$. For suppose $t \to \sigma(t)$ is a curve in $N$. Notice that
$$df(H_p) = 0 \qquad \text{for all } p \in M. \tag{8.1}$$
Since $\phi_*(\sigma'(t)) \in H_{\phi(\sigma(t))}$, we see that
$$\frac{d}{dt}\, f(\phi(\sigma(t))) = 0,$$
which shows that $f$ is constant along $N$ (if it is connected, of course). Now, we may ask: When are the integral submanifolds determined by the integral functions? Precisely, we mean the following: For $p \in M$, let $H_p'$ be the set of all vectors $v \in M_p$ such that
$$df(v) = 0 \qquad \text{for all integral functions } f \text{ defined in a neighborhood of } p.$$
By (8.1), we have $H_p \subset H_p'$. If
$$H_p = H_p' \qquad \text{for all } p \in M, \tag{8.2}$$
we say that the integral functions determine the integral submanifolds. In fact, suppose $f_1, \ldots, f_s$ are a maximal set of functionally independent integral functions of $H$ defined in a neighborhood of $p$. Consider the submanifolds defined locally about $p$ by setting these functions all equal to constants. One obtains submanifolds locally defined about $p$ that will also be integral submanifolds of $H$ if (8.2) is satisfied; that is, if $r = \dim M - s$. However, $H$ cannot be an arbitrary vector-field system if this condition is satisfied. For then $H$ can locally be defined as the set of all $X \in V(M)$ such that $df_1(X) = 0 = \cdots = df_s(X)$. Thus, if $X, Y \in H$,
$$df_1([X, Y]) = 0 = \cdots = df_s([X, Y]),$$
that is,
$$[H, H] \subset H. \tag{8.3}$$
Algebraically, this means that $H$ is a Lie subalgebra of the Lie algebra $V(M)$. Geometrically, (8.3) is an "integrability condition," as we shall now prove.
THEOREM 8.3 (FROBENIUS COMPLETE INTEGRABILITY THEOREM, LOCAL VERSION)
Suppose $H$ is a vector-field system on $M$ which satisfies the integrability condition (8.3). Suppose that $p$ is a maximal point for $H$, with $r = \dim H_p$.
Then $p$ has a neighborhood $U$ and a coordinate system $(y_i)$, $1 \le i \le n$, defined in $U$ such that
(a) The $\partial/\partial y_a$, $1 \le a \le r$, form a basis for $H$ in $U$.
(b) The $y_u$, $r + 1 \le u \le n$, form a basis for integral functions of $H$ in $U$ (in the sense that any integral function $f$ can, in this coordinate system, be written as a function of the $y_u$ alone).
(c) The submanifolds $y_u = \text{constant}$ are integral submanifolds for $H$.
(A coordinate system with these properties is called a flat coordinate system for $H$.) The proof will proceed by induction on $n$; this induction will involve repeated application of Theorem 6.3, and the following trick.
LEMMA 8.4
If $M$ is sufficiently small, there exists a basis $(X_a)$ for $H$ in $M$ so that $[X_a, X_b] = 0$.
Proof. We can first suppose that $M$ is sufficiently small so that there are elements $X_a' \in H$ forming a basis of $H$ in $M$. Suppose that
$$X_a' = A_{ai}\, \frac{\partial}{\partial x_i}$$
in terms of any local coordinate system $(x_i)$. Thus, $\operatorname{rank}(A_{ai}(q)) = r$ for all $q \in M$. By at most relabeling the coordinate system and possibly choosing $M$ smaller, we can suppose that
$$\det(A_{ab}(q)) \ne 0 \qquad \text{for } q \in M \quad (1 \le a, b \le r).$$
Let $(B_{ab})$ be the inverse matrix to $(A_{ab})$; that is, $B_{ab} A_{bc} = \delta_{ac}$. If $X_a = B_{ab} X_b'$, then
$$X_a = \frac{\partial}{\partial x_a} + B_{ab} A_{bu}\, \frac{\partial}{\partial x_u} \qquad (r + 1 \le u \le n).$$
The $(X_a)$ also form a basis for $H$ in $M$, since $(B_{ab})$ is everywhere a nonsingular matrix. Thus $[X_a, X_b]$ must be a linear combination of the $X_c$; that is, $[X_a, X_b] = f_{abc} X_c$ for some functions $f_{abc} \in F(M)$. Note, however, from the form of $X_a$ given above, that $[X_a, X_b]$ does not have any terms involving $\partial/\partial x_c$, $1 \le c \le r$. This forces $f_{abc} = 0$. Q.E.D.
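Suppose now that $(X_a)$ is a basis for $H$ in $M$ satisfying $[X_a, X_b] = 0$. Using Theorem 6.3, choose a new coordinate system $(y_i)$ for $M$ (if necessary, making $M$ even smaller) so that
$$X_1 = \frac{\partial}{\partial y_1} .$$
Suppose $X_a = C_{ai}(\partial/\partial y_i)$. Then
$$0 = [X_1, X_a] = \frac{\partial C_{ai}}{\partial y_1}\, \frac{\partial}{\partial y_i} ,$$
that is, $\partial C_{ai}/\partial y_1 = 0$, and hence the $C_{ai}$ are functions of $y_2, \ldots, y_n$ alone. Suppose
$$X_a' = \sum_{i=2}^{n} C_{ai}\, \frac{\partial}{\partial y_i} , \qquad 2 \le a \le r.$$
Then, for $2 \le a, b \le r$,
$$0 = [X_a, X_b] = [C_{a1} X_1 + X_a',\ C_{b1} X_1 + X_b'] = [X_a', X_b'] + \text{terms containing } \frac{\partial}{\partial y_1} \text{ alone}.$$
Since $[X_a', X_b']$ has no component along $\partial/\partial y_1$, also $[X_a', X_b'] = 0$, $2 \le a, b \le r$. The $\partial/\partial y_1 = X_1$ and the $X_a'$, $2 \le a \le r$, form a new basis for $H$ in $M$. But $(X_a')$, $2 \le a \le r$, is a basis for a completely integrable vector-field system in a domain of the space of variables $(y_2, \ldots, y_n)$. Part (a) of Theorem 8.3 thus follows by induction on $n$. Parts (b) and (c) are then obvious consequences of (a). For example, let us prove (b). Let $f(y)$ be an integral function expressed in terms of these coordinates $(y_i)$. Since $(\partial/\partial y_a) \in H$ for $1 \le a \le r$, we have $\partial f/\partial y_a = 0$; that is, $f$ is a function of $y_{r+1}, \ldots, y_n$ alone. Q.E.D.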
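The conclusion of Theorem 8.3 can be seen concretely on a small system in $\mathbf{R}^3$. The vector fields and coordinates below are illustrative assumptions chosen so that everything can be checked by hand; they do not appear in the text.

```latex
% Assumed setup (illustrative only): M = \mathbf{R}^3 with coordinates (x,y,z), and H spanned by
%   X_1 = \partial/\partial x + y\, \partial/\partial z, \qquad
%   X_2 = \partial/\partial y + x\, \partial/\partial z .
\begin{align*}
[X_1, X_2] &= X_1(x)\, \frac{\partial}{\partial z} - X_2(y)\, \frac{\partial}{\partial z}
            = \frac{\partial}{\partial z} - \frac{\partial}{\partial z} = 0,
  \qquad\text{so (8.3) holds with } r = 2. \\
\intertext{Flat coordinates: $y_1 = x$, $y_2 = y$, $y_3 = z - xy$, i.e.\ $z = y_3 + y_1 y_2$. Then}
\frac{\partial}{\partial y_1} &= \frac{\partial}{\partial x} + y_2\, \frac{\partial}{\partial z} = X_1,
\qquad
\frac{\partial}{\partial y_2} = \frac{\partial}{\partial y} + y_1\, \frac{\partial}{\partial z} = X_2, \\
X_1(z - xy) &= y - y = 0, \qquad X_2(z - xy) = x - x = 0 .
\end{align*}
% The integral submanifolds are the surfaces z - xy = \text{constant}.
% By contrast, for the system spanned by \partial/\partial x and \partial/\partial y + x\,\partial/\partial z,
% the bracket is \partial/\partial z, which is not in the system, so (8.3) fails.
```

This matches parts (a), (b), and (c) of the theorem: the first two coordinate fields span $H$, and $y_3$ generates the integral functions.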
Remarks. Theorem 8.3 provides us with r-dimensional integral submanifolds locally defined about each point. The global form of the Frobenius theorem provides (if every point is a maximal point) a unique, maximal, connected integral manifold passing through each point. The intuitive idea in its proof is to take the piece of an integral submanifold provided by the local version (that is, Theorem 8.3) and “ analytically continue” it. For example, the process we described earlier for finding integral curves of a vector field
defined over maximal intervals of real numbers is a special case. We shall give more details below.

There is a dual description of vector-field systems that is also useful. Suppose $\omega_{r+1}, \ldots, \omega_n$ are 1-differential forms on $M$. Define $H$ as the set of vector fields $X$ such that
\[ 0 = \omega_{r+1}(X) = \cdots = \omega_n(X). \]
Suppose $(x_i)$ is a coordinate system for $M$, and $\omega_u = a_{ui}\,dx_i$. If the rank of the $(n - r) \times n$ matrix $(a_{ui}(p))$ is $n - r$ at every point $p$ of $M$, then every point of $M$ is a maximal point, and $r = \operatorname{rank} H$. Suppose, for example, that
\[ \omega_u = dx_u + a_{ua}\,dx_a. \tag{8.4} \]
Now an integral submanifold $\phi\colon N \to M$ of $H$ satisfies
\[ \phi^*(\omega_u) = 0, \tag{8.5} \]
since the tangent vectors to $\phi(N)$ lie in $H$, on which the $\omega_u$ vanish. If the $\omega_u$ are given by (8.4), we can attempt to define integral manifolds of $H$ by giving $x_u$ as a function $f_u(x_a)$. Then (8.5) requires that the following system of differential equations be satisfied:
\[ \frac{\partial f_u}{\partial x_a} = -a_{ua}\bigl(x_b, f_v(x_b)\bigr). \]
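As a small illustration of this system, consider the case $n = 3$, $r = 2$ with a single form. The two choices of coefficients below are assumptions made only for this example and do not come from the text. Writing the candidate integral surface as $x_3 = f(x_1, x_2)$, condition (8.5) becomes a pair of first-order equations for $f$, and equality of mixed partial derivatives decides whether they can be solved.

```latex
% Illustrative example (assumed, not from the text): n = 3, r = 2.
\[
\omega = dx_3 + a_1\,dx_1 + a_2\,dx_2, \qquad
\phi(x_1, x_2) = (x_1, x_2, f(x_1, x_2)),
\]
\[
\phi^*(\omega)
  = \Bigl(\frac{\partial f}{\partial x_1} + a_1\Bigr)dx_1
  + \Bigl(\frac{\partial f}{\partial x_2} + a_2\Bigr)dx_2 = 0
\iff
\frac{\partial f}{\partial x_1} = -a_1, \quad
\frac{\partial f}{\partial x_2} = -a_2 .
\]
% Case 1: a_1 = -x_2, a_2 = -x_1, so \omega = d(x_3 - x_1 x_2).
\[
\frac{\partial f}{\partial x_1} = x_2, \quad
\frac{\partial f}{\partial x_2} = x_1
\;\Longrightarrow\; f = x_1 x_2 + c .
\]
% Case 2: a_1 = -x_2, a_2 = 0, so \omega = dx_3 - x_2\,dx_1.
\[
\frac{\partial f}{\partial x_1} = x_2, \quad
\frac{\partial f}{\partial x_2} = 0
\;\Longrightarrow\;
\frac{\partial^2 f}{\partial x_2\,\partial x_1} = 1 \neq 0
  = \frac{\partial^2 f}{\partial x_1\,\partial x_2} .
\]
```

In the first case the surfaces $x_3 = x_1x_2 + c$ are two-dimensional integral manifolds through every point. In the second case the equations are incompatible, so no two-dimensional integral manifold exists.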
The integrability conditions (8.3) can also be expressed in terms of differential forms. In fact, we have the following result, which we leave to the reader.

LEMMA 8.5 $H$ is completely integrable, that is, satisfies (8.3), if and only if $d\omega_u$ can be written in the form $\theta_{uv} \wedge \omega_v$, for some choice of 1-forms $\theta_{uv}$; that is, the $d\omega_u$ belong to the “ideal” (in the Grassmann algebra of forms) generated by the $\omega_u$.

Now we turn to the global form of the Frobenius theorem. Let $H$ be a vector-field system on $M$; suppose that every point of $M$ is a maximal point of $H$ (we then say that $H$ is nonsingular), and the integrability condition (8.3) is satisfied. Recall that an integral curve of $H$ is a $C^\infty$ map $t \to \sigma(t)$ of an interval $[a, b] \to M$ such that
\[ \sigma'(t) \in H_{\sigma(t)} \quad\text{for } a \le t \le b. \]
Let us extend this notion to define an integral path of H as a continuous image of an interval of real numbers that is composed of a finite number of pieces of integral curves. For p E M , let Lp denote the set of points of M that
can be joined to $p$ by an integral path. $L_p$ is called the leaf of $H$ which passes through the point $p$.

THEOREM 8.6 (GLOBAL VERSION OF THE FROBENIUS COMPLETE INTEGRABILITY THEOREM) Each $L_p$ can be made into a submanifold of $M$ so that it is a maximal connected integral submanifold of $H$, and so that
\[ (L_p)_q = H_q \quad\text{for all } q \in L_p. \tag{8.6} \]
We can sketch the proof of this theorem from the local version, Theorem 8.3. First of all, recall that a function $f$ defined on an open set of $M$ is an integral of $H$ if
\[ X(f) = 0 \quad\text{for all } X \in H. \]
This condition is satisfied by $f$ if and only if it is constant along all integral paths of $H$. Let $q \in L_p$. By the local version, there is a neighborhood $U$ of $q$ on $M$ and a coordinate system $x_1, \ldots, x_n$ for $U$ such that $x_{r+1}, \ldots, x_n$ are integrals of $H$ ($r = \dim H_q$). Such a coordinate system will be said to be a flat one with respect to $H$. Any other $f \in F(U)$ that is an integral of $H$ can be written as a function of the $x_{r+1}, \ldots, x_n$.
A basis for the open sets of $L_p$ will be obtained as follows: For $q \in L_p$, let $U$ be an open set containing $q$ and carrying a flat (with respect to $H$) coordinate system $x_1, \ldots, x_n$. Since $x_{r+1}, \ldots, x_n$ are constant on $L_p$, the map
\[ q' \to (x_1(q'), \ldots, x_r(q')) \]
defines a 1-1 correspondence of the subset
\[ \{q' : x_{r+1}(q') = x_{r+1}(q), \ldots, x_n(q') = x_n(q)\} \]
of $L_p \cap U$ with an open subset of $R^r$. We call this subset of $L_p$ a slice of $L_p$ with respect to the flat coordinate system. All such slices will be taken as the basis for open sets of the topology of $L_p$. The slices determine a system of coordinate systems, with open subsets of $R^r$, for a possible manifold structure on $L_p$. The transition map between two such coordinate systems is $C^\infty$, since it is given by functions inherited from the transition maps for coordinate systems of the manifold structure of $M$. What is not obvious a priori is that, with the topology so defined, $L_p$ can be covered with a countable number of open sets. In fact, this is a rather deep fact, whose proof we shall not give here, but for which we shall refer the reader to Chevalley [1]. Let $\phi\colon L_p \to M$ be the inclusion map. It is clearly $C^\infty$ and 1-1; that is, it is a submanifold map.
It should be obvious from its construction that (8.6) is satisfied. However, what is meant by a “maximal” integral manifold? Suppose that the inclusion map $\phi\colon L_p \to M$ can be written as $\phi = \phi'\psi$, where $\psi$ is a map $L_p \to N$ and $\phi'\colon N \to M$ is an integral submanifold map of $H$, with $N$ connected. Since $\phi'\psi$ is a submanifold map, so is $\psi$. Now $\phi'(N)$ is contained in $L_p$, since every point in $\phi'(N)$ can be joined to $p$ by an integral path of $H$; hence $\psi(L_p) = N$. Then $\psi$ is a 1-1 onto submanifold map and hence must be a diffeomorphism. It is in this sense that $L_p$ is a “maximal” integral submanifold. This finishes the proof of Theorem 8.6. However, as a by-product of the proof we obtain the following theorem, which plays a very important role in the theory of Lie groups.
THEOREM 8.7 Let $p \in M$, and let $L_p$ be the leaf through $p$ of the nonsingular completely integrable vector-field system $H$. Suppose that $\phi\colon N \to M$ is a map of manifolds such that $\phi(N) \subset L_p$. Then $\phi$ can be factored through a differentiable map $\psi\colon N \to L_p$; that is, $\psi$ followed by the inclusion map $L_p \to M$ is $\phi$.
Proof. Evidently, there is a point-set map $\psi\colon N \to L_p$ with this property, but it is not obvious that it is a $C^\infty$ map. Suppose, then, that $f \in F(L_p)$. We must show that $\psi^*(f) \in F(N)$. Since $L_p$ is a manifold, $f$ can be written as the sum of functions that vanish outside of slices of $L_p$. To see this, note that we have taken over the proof given by Chevalley [1] that $L_p$ can be covered by a countable number of slices by flat coordinate systems of $H$. The existence of a “partition of unity” (see Chapter 7) for this covering of $L_p$ then guarantees this property, since $f$ can be written
\[ f = \sum_{j=1}^{\infty} f_j f, \]
where $f_1, f_2, \ldots$ is the “partition of unity,” with each of its elements vanishing outside a slice-coordinate neighborhood of $L_p$. Now any such function can be written as $F(x_1, \ldots, x_r)$. This function can evidently be extended to a $C^\infty$ function in a neighborhood in $M$ surrounding the slice. Thus $\psi^*(f)$ is obtained by pulling back via $\phi$ a $C^\infty$ function on $M$, and hence is $C^\infty$ on $N$. Q.E.D.
Theorem 8.7 guarantees that the submanifolds defined as leaves do not have one kind of possible pathology. However, there is another sort of pathology that they might have, namely, they may not be regularly embedded in the sense of the following definition:
Definition

Let $\phi\colon N \to M$ be a submanifold of the manifold $M$. It is said to be regularly embedded if $\phi$ is a homeomorphism of $N$ with $\phi(N)$, that is, if the map $\phi^{-1}\colon \phi(N) \to N$ (which exists in the point-set sense, since $\phi$ is assumed 1-1) is continuous.

In fact, $N$ is regularly embedded if and only if the following property is satisfied: Every point $p \in \phi(N)$ has a neighborhood $U$ such that $\phi^{-1}(U \cap \phi(N))$ consists of one connected coordinate neighborhood of $N$. Thus, if we think of a curve on a space winding around it an infinite number of times, coming nearer and nearer to a given point each time, it is not regularly embedded. We leave the discussion of the global properties of completely integrable systems at this point, with an apology to the reader for lack of details and examples concerning this rich subject, which deserves a book of its own. Our immediate aim here is to do only enough to use the results as a tool in Lie group theory.

Exercises
1. Suppose $M$ is a manifold and $X_1, \ldots, X_r$ are vector fields such that $[X_i, X_j] = 0$ for $1 \le i, j \le r$, and such that the $(X_i(p))$ are linearly independent at a point $p \in M$. Show that there is a coordinate system $(x_1, \ldots, x_n)$ valid in a neighborhood of $p$ such that
\[ X_i = \frac{\partial}{\partial x_i} \quad\text{for } 1 \le i \le r. \]
2. The torus is defined as the space obtained from $R^2$ by identifying two points $x = (x_1, x_2)$, $x' = (x_1', x_2')$ whose coordinates differ by integers. Consider the system of parallel lines in $R^2$ whose slope is a given vector $a = (a_1, a_2)$. Show that this, projected down to the torus, is a one-dimensional foliation† whose leaves are those lines. Find the conditions on $a$ that the leaves be nonregularly embedded, or dense, or both. Also examine the question of the existence of global integrals of the foliation. (Approached directly, one probably has to use facts from number theory, which can be found if necessary in appropriate texts. There are other indirect proofs using Lie or topological group theory or both. It would be instructive to compare the two approaches.)

† The system of leaf-submanifolds defined by a nonsingular, completely integrable vector-field system is called a foliation.

3. There is another proof (given by Cartan [1]) of the local existence of leaves that starts from (8.4). Construct the functions $x_u = f_u(x_a)$ by finding ordinary differential equations for the functions $t \to f_u(tx_a)$, with $(x_a)$ regarded as a “parameter.” Work this out as a problem, and show directly that the resulting functions actually do define an integral submanifold.
4. Suppose H is a vector-field system on M , with dim H(p) = n for all p E M . Suppose that each point of M has an n-dimensional integral submanifold of H passing through it. Must H be completely integrable?
9
Reduction of Dimension when a Lie Algebra of Vector Fields Leaves a Vector-Field Invariant
As we have said, the Lie theory of ordinary differential equations is concerned with discussing the interrelation between a set of differential equations and a group of its “ symmetries,” with particular emphasis on the question of how various properties of the group help in the practical problems connected with the differential equations. We shall now examine one typical situation. Let M be a manifold, X E V ( M ) a vector field on M , and L a linear set of vector fields on M such that
\[ [L, L] \subset L, \tag{9.1a} \]
\[ [L, X] = 0. \tag{9.1b} \]
(Condition (9.1a) means that $L$ forms a Lie algebra of vector fields.) Let $p$ be a point of $M$. We shall suppose that in a neighborhood $U$ of $p$, there are elements $Y_1, \ldots, Y_r \in L$ such that each $Y \in L$ can be written uniquely in the form
\[ Y = f_a(Y)\,Y_a + f(Y)\,X, \tag{9.2} \]
with functions $f_a(Y), f(Y) \in F(U)$. (Choose indices $1 \le a \le r$ and the summation convention.) Using condition (9.1b), we have
\[ 0 = [X, Y] = X(f_a(Y))\,Y_a + X(f(Y))\,X. \]
Hence
\[ X(f_a(Y)) = 0 \quad\text{for } 1 \le a \le r,\ Y \in L. \tag{9.3} \]
This means that all the $f_a(Y)$ are integrals of $X$. Since we are trying to “solve” $X$, that is, find as many integrals as possible, note that any function $f$ that can be expressed as a polynomial in the $f_a(Y)$ is an integral. Designate by $\Omega$ the set of integrals obtained in this way. (In other words, $\Omega$ is the smallest algebra (over the real numbers) of functions defined in $U$ containing all the $f_a(Y)$, $1 \le a \le r$, $Y \in L$.)

LEMMA 9.1 If $Z \in L$, $f \in \Omega$, then $Z(f) \in \Omega$.

Proof. It suffices to prove this when $f$ is one of the generators of $\Omega$, that is, one of the $f_a(Y)$, for $Y \in L$. Since $[Z, Y] \in L$ and $[Z, X] = 0$, we have, using (9.2),
\[ [Z, Y] = f_b([Z, Y])\,Y_b + f([Z, Y])\,X, \]
\[ [Z, Y] = Z(f_a(Y))\,Y_a + f_a(Y)\,[Z, Y_a] + Z(f(Y))\,X = \bigl(Z(f_b(Y)) + f_a(Y)\,f_b([Z, Y_a])\bigr)Y_b + \bigl(Z(f(Y)) + f_a(Y)\,f([Z, Y_a])\bigr)X. \]
Comparing these two expressions, we see that $Z(f_a(Y)) \in \Omega$, as required. Let $N$ be the subset of $U$ consisting of the points $q$ such that
\[ f(q) = f(p) \quad\text{for all } f \in \Omega. \]
We shall suppose that $N$ is a submanifold of $M$. Note that the vector field $X$ is tangent to $N$; hence we can reduce the problem of finding its integral curves to finding the integral curves of $X$ restricted to the submanifold $N$. (This process can be repeated about every point of $M$, of course.) Let $L_N$ consist of those vector fields $Y$ in $L$ that are tangent to $N$, that is, $Y(q) \in N_q$ for $q \in N$. Then $L_N$ is a Lie subalgebra of $L$, that is, $[L_N, L_N] \subset L_N$. Let $X_N$ be the vector field $X$ restricted to $N$, so that, also,
\[ [L_N, X_N] = 0. \]
Then the process can be iterated. Let us examine this. Suppose the $Y_1, \ldots, Y_r \in L$ were chosen so that $Y_1, \ldots, Y_s \in L_N$, and $Y_{s+1}, \ldots, Y_r$ are linearly independent mod $L_N$; that is, no linear combination of them lies in $L_N$. If $Y \in L_N$, then
\[ Y = f_a(Y)\,Y_a + f(Y)\,X. \]
Hence
\[ Z = \sum_{a=s+1}^{r} f_a(Y)\,Y_a \tag{9.4} \]
is tangent to $N$. Let us suppose that $\Omega$ has a certain number of functions that, in the neighborhood of $p$, are functionally independent, and such that every other function in $\Omega$ can be written as a function of them. We can suppose these functions, $x_1, \ldots, x_n$, are part of a coordinate system $x_1, \ldots, x_m$ for $M$. Choose indices $1 \le i \le n$. Then we can suppose without loss of generality that $N$ is determined by the equations $x_i = 0$, and that $p$ is the point $0$ of $R^m$.

LEMMA 9.2 If $Y \in L$ satisfies $Y(p) \in N_p$ for one point $p \in N$, then $Y \in L_N$.
Proof. We know that the $Y(x_i)$, for $1 \le i \le n$, are functions $F_i(x_1, \ldots, x_n)$, and $Y$ is tangent to $N$ if $F_i(0) = 0$ for $1 \le i \le n$. But this is so if and only if $Y(p) \in N_p$. Q.E.D.
We see from this lemma that $Z$ defined by (9.4) is identically zero on $N$. For otherwise there is a point $q \in N$ such that $\sum_{a=s+1}^{r} f_a(Y)(q)\,Y_a \in L_N$, which is a contradiction. Thus, we have for $Y \in L_N$,
\[ Y = \sum_{a=1}^{s} f_a(Y)\,Y_a + f(Y)\,X, \]
restricted to $N$. Now the $f_a(Y)$ are constant on $N$, since they belong to $\Omega$. There are then essentially two cases:

Case 1. $f(Y) = 0$ on $N$ for all $Y \in L_N$.

In this case, notice that $Y \in L_N$ is everywhere nonzero on $N$ if it is nonzero as an element of the vector space $L$. A Lie algebra of vector fields with this property is said to act simply. Thus we may say that the reduction process reduces the general case to the case where the given Lie algebra of vector fields acts simply.

Case 2. $f(Y) \ne 0$ for some $Y \in L_N$.

Then $Y - \sum_{a=1}^{s} f_a(Y)\,Y_a$ and $X$ are vector fields whose integral curves differ only by a change in parametrization. However, the former vector field is an element of $L$; hence its integral curves may be considered as “known.” Thus the integral curves of $X$ are “known” also by a simple quadrature, once $f(Y)$ is known.

Now we consider another method for reducing dimension when a known Lie algebra $L$ of vector fields commutes with a given vector field $X$, that is,
\[ [X, L] = 0. \]
Recall that a function $f \in F(M)$ is called an integral of $L$ if
\[ Y(f) = 0 \quad\text{for all } Y \in L. \]
Note that: If $f$ is an integral of $L$, so is $X(f)$.

Suppose that $(x_a)$, $1 \le a \le m$, is a functionally independent basis for the integrals of $L$; that is, any integral function $f$ on $M$ is a function of the $x_1, \ldots, x_m$ alone. Let $(y_i)$, $1 \le i \le n$, be a set of functions on $M$ such that $(x_a, y_i)$ forms a coordinate system for $M$. Now $X$ can be written in the form
\[ X = A_a(x)\frac{\partial}{\partial x_a} + A_i(x, y)\frac{\partial}{\partial y_i}. \]
Let $X'$ be the vector field in $x$-space defined by
\[ X' = A_a(x)\frac{\partial}{\partial x_a}. \]
Then the integral curves $(x(t), y(t))$ for $X$ can be obtained by solving two lower dimensional systems:
\[ \frac{dx_a}{dt} = A_a(x(t)), \tag{9.5a} \]
\[ \frac{dy_i}{dt} = A_i(x(t), y(t)). \tag{9.5b} \]
Thus (9.5a) can be solved first for $x(t)$, which is then substituted in the right-hand side of (9.5b) to be solved for $y(t)$. For example, we may be able to change coordinates for $x$-space so that $X' = \partial/\partial x_1$. (In fact, this is what is meant by “solving” (9.5a).) Then (9.5b) takes the form
\[ \frac{dy_i}{dt} = A_i(t + c_1, c_2, \ldots, c_m, y(t)) \]
for a choice of constants $c_1, \ldots, c_m$. Continuing on a general level, suppose that $L$ and $L'$ are Lie algebras of vector fields, that $L \subset L'$, and $[L', X] = 0$. Suppose, as above, that the coordinate system $(x_a, y_i)$ is chosen so that $(\partial/\partial y_i)$ is a basis for the vector-field system defined by $L$. Then
We do not assume that $[L', L] \subset L$, so that the elements of $L'$ not contained in $L$ do not “pass to the quotient” to define vector fields on $x$-space leaving $X'$ invariant. Thus they are no help in the problem of integrating $X'$. However, once $X'$ is “solved,” they can be of use in solving (9.6). As an explanation, suppose that the coordinate system $(x_a)$ is chosen so that $X' = \partial/\partial x_1$. Then
\[ X = \frac{\partial}{\partial x_1} + A_i\frac{\partial}{\partial y_i}, \]
and $x_2, \ldots, x_m$ are integrals of $X$. Hence the $Y(x_2), \ldots, Y(x_m)$, $Y_1Y_2(x_2), \ldots, Y_1Y_2(x_m), \ldots$ are integrals of $X$ for all $Y, Y_1, Y_2, \ldots \in L'$. We need $(n - m)$ integrals of $X$ that are independent of $x_2, \ldots, x_m$ in order to say that $X$ has been completely “solved.” The point is: “Purely algebraic” conditions can be given for $L$ and $L'$ which guarantee that this is so. We now turn to the following more explicit example.
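Matrix-Riccati Systems

Change notations slightly. Let $i, j, \ldots$ range between 1 and $n$. The underlying space is that of the variables $(t, x_{ij})$, a space of dimension $n^2 + 1$. Consider a vector field
\[ X = \frac{\partial}{\partial t} + a_{ij}(t)\,x_{jk}\frac{\partial}{\partial x_{ik}}. \]
Thus the parametrization of the integral curves of $X$ is precisely the given $t$, and the integral curves are determined by the following system of linear homogeneous ordinary differential equations:
\[ \frac{dx_{ik}}{dt} = a_{ij}(t)\,x_{jk}. \]
If $b = (b_{ij})$ is a constant matrix, let
\[ X_b = x_{ik}b_{kj}\frac{\partial}{\partial x_{ij}}. \]
Then
\begin{align*}
[X_b, X] &= \Bigl[x_{ik}b_{kj}\frac{\partial}{\partial x_{ij}},\; \frac{\partial}{\partial t} + a_{hl}x_{lm}\frac{\partial}{\partial x_{hm}}\Bigr] \\
&= a_{hl}\,x_{ik}b_{kj}\,\delta_{ij,lm}\frac{\partial}{\partial x_{hm}} - a_{hl}x_{lm}\,\delta_{ik;hm}\,b_{kj}\frac{\partial}{\partial x_{ij}} \\
&= a_{hl}x_{lk}b_{km}\frac{\partial}{\partial x_{hm}} - a_{hl}x_{lm}b_{mj}\frac{\partial}{\partial x_{hj}} = 0.
\end{align*}
If $b' = (b'_{ij})$, then
\begin{align*}
[X_b, X_{b'}] &= \Bigl[x_{ik}b_{kj}\frac{\partial}{\partial x_{ij}},\; x_{hl}b'_{lm}\frac{\partial}{\partial x_{hm}}\Bigr] \\
&= x_{hk}b_{kl}b'_{lm}\frac{\partial}{\partial x_{hm}} - x_{il}b'_{lk}b_{kj}\frac{\partial}{\partial x_{ij}}.
\end{align*}
Thus,
\[ [X_b, X_{b'}] = X_{bb' - b'b}, \]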
and the collection of the $X_b$ forms a Lie algebra of vector fields that leaves $X$ invariant. Hence the above theory can be applied. However, the set of all $X_b$ is too big, since the vector-field system it determines is the set of all vectors on $x_{ij}$-space. Thus we look for a subalgebra of such vector fields to which to apply the theory. There is an obvious advantage in choosing the subalgebra as large as possible, since then the system (9.5a) will be as small as possible. Rather than go any further here into the general algebraic details, we shall deal only with a special choice.† Divide the indices $1 \le i, j, \ldots \le n$ into two groups: $1 \le a, b, \ldots \le m$; $m + 1 \le u, v, \ldots \le n$. Consider the set of all matrices $b = (b_{ij})$ such that
\[ b_{au} = 0. \tag{9.7} \]
If $b$ and $b'$ satisfy (9.7), so do $bb'$ and $b'b$; hence so does $bb' - b'b$. To see this,
\[ (bb')_{au} = b_{ai}b'_{iu} = b_{ab}b'_{bu} = 0. \]
Let $L$ be the set of vector fields $X_b = x_{ij}b_{jk}(\partial/\partial x_{ik})$ such that $b$ satisfies (9.7). Let $L'$ be the set of all vector fields of the form $X_b$, with $b$ an arbitrary matrix. According to the general theory described above, the next step is to solve the completely integrable vector-field system defined by $L$. This can be done explicitly in a certain neighborhood $U$ of the identity matrix $I = (\delta_{ij})$‡; let $U$ be the subset of matrices $(x_{ij})$ such that $\det(x_{uv}) \ne 0$. For each $n \times n$ matrix $x \in U$, let $y(x)_{uv}$ be the functions of $x$ such that $(y(x)_{uv})$ is the inverse matrix of $(x_{uv})$; that is,
\[ x_{uv}\,y(x)_{vw} = \delta_{uw}. \tag{9.8} \]
Let $f(x)_{au} = x_{av}\,y(x)_{vu}$ be the indicated $m(n - m)$ functions defined for each $n \times n$ matrix $x \in U$. We shall show that the $f_{au}$ are integrals of the vector fields of the form $X_b$, where $b$ satisfies (9.7). First suppose that $X$ is any vector field. Apply $X$ to both sides of (9.8):
\[ X(y_{uv}) = -y_{uw}\,X(x_{wx})\,y_{xv}, \tag{9.9a} \]
\[ X(f_{av}) = X(x_{au})\,y_{uv} - x_{au}\,y_{ux}\,X(x_{xw})\,y_{wv}. \tag{9.9b} \]

† Algebraically, the set of all $X_b$ is isomorphic as a Lie algebra to the linear Lie algebra of all $n \times n$ matrices. The subalgebras we now describe are maximal subalgebras.

‡ Note that the vector fields $X_b$ are independent of $t$.
But if $X = X_b = x_{ik}b_{kj}(\partial/\partial x_{ij})$, with $b_{au} = 0$, we have
\[ X(x_{au})\,y_{uv} = x_{ak}b_{ku}\,y_{uv} = x_{aw}b_{wu}\,y_{uv}, \]
\begin{align*}
x_{au}\,y_{ux}\,X(x_{xw})\,y_{wv} &= x_{au}\,y_{ux}\,x_{xk}b_{kw}\,y_{wv} = x_{au}\,y_{ux}\,x_{xy}b_{yw}\,y_{wv} \\
&= x_{au}\,\delta_{uy}\,b_{yw}\,y_{wv} = x_{ay}b_{yw}\,y_{wv}.
\end{align*}
Hence we see that $X_b(f_{au}) = 0$ for all $b$ satisfying (9.7), that is, for all $X_b \in L$. It is also easily seen that the functions $f_{au}$ in $U$ are functionally independent; that is, the $df_{au}$ are everywhere independent. In fact the functions $(f_{au}, x_{uv}, x_{ab}, x_{ua})$ form a new coordinate system for $U$. According to the general theory, the next step is to calculate $X(f_{au})$, for then
\[ X' = \frac{\partial}{\partial t} + X(f_{au})\frac{\partial}{\partial f_{au}}. \]
THEOREM 9.3 Consider a system of linear homogeneous differential equations:
\[ \frac{dx_{ij}}{dt} = a_{ik}(t)\,x_{kj}, \quad 1 \le i, j, k \le n;\ \ 1 \le a, b, \ldots \le m;\ \ m + 1 \le u, v, w, x, y, \ldots \le n. \tag{9.10} \]
Consider $U$, the open set in the space of all $n \times n$ matrices $x = (x_{ij})$ determined by the condition $\det(x_{uv}) \ne 0$. Let $(y(x)_{uv})$ be the inverse matrix of $(x_{uv})$. Introduce a space of variables $z_{au}$ and on this space a system of ordinary, time-dependent differential equations (a matrix-Riccati system):
\[ \frac{dz_{au}}{dt} = a_{au} + a_{ab}z_{bu} - z_{av}a_{vu} - z_{av}a_{vb}z_{bu}. \tag{9.11} \]
Consider the map $\phi$ from $U$ to this $z$-space which assigns to each $x \in U$ the point $\phi(x) = z = (z_{au})$, with $z_{au} = x_{av}\,y(x)_{vu}$.

(a) If $x(t)$ is a solution of (9.10) that lies completely in $U$, then $\phi(x(t)) = z(t)$ is a solution of (9.11).

(b) Suppose $z(t) = (z_{au}(t))$ is a solution of (9.11). Suppose that $x^0 = (x^0_{ij}) \in U$ is such that $\phi(x^0) = z(a)$. Let $x(t)$ be the solution of (9.10) such that $x(a) = x^0$. If $x(t)$ lies in $U$, then $\phi(x(t)) = z(t)$.
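The reduction in Theorem 9.3 can be checked numerically in the smallest case $n = 2$, $m = 1$. The sketch below is only an illustration under stated assumptions: the linear system is read as $dx_{ij}/dt = a_{ik}(t)x_{kj}$, the Riccati equation for $z = x_{12}/x_{22}$ is the one obtained by differentiating that quotient, and the coefficient matrix `a(t)`, the step count, and all function names are hypothetical choices, not taken from the text.

```python
import math

def a(t):
    # Hypothetical time-dependent coefficient matrix a_ij(t), chosen for illustration.
    return [[0.1, 0.5 + t], [-0.3, 0.2 * math.cos(t)]]

def linear_rhs(t, x):
    # State x = [x11, x12, x21, x22]; assumed linear system dx_ij/dt = a_ik(t) x_kj.
    m = a(t)
    return [m[0][0] * x[0] + m[0][1] * x[2],
            m[0][0] * x[1] + m[0][1] * x[3],
            m[1][0] * x[0] + m[1][1] * x[2],
            m[1][0] * x[1] + m[1][1] * x[3]]

def riccati_rhs(t, z):
    # Scalar Riccati equation for z = x12 / x22, derived from the linear system:
    # dz/dt = a12 + a11 z - z a22 - z a21 z.
    m = a(t)
    w = z[0]
    return [m[0][1] + m[0][0] * w - w * m[1][1] - w * m[1][0] * w]

def rk4(rhs, state, t0, t1, steps):
    # Classical fourth-order Runge-Kutta integration from t0 to t1.
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = rhs(t, state)
        k2 = rhs(t + h / 2, [s + h / 2 * k for s, k in zip(state, k1)])
        k3 = rhs(t + h / 2, [s + h / 2 * k for s, k in zip(state, k2)])
        k4 = rhs(t + h, [s + h * k for s, k in zip(state, k3)])
        state = [s + h / 6 * (p + 2 * q + 2 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4)]
        t += h
    return state

# Start the linear system at the identity matrix, so z(0) = x12/x22 = 0.
x_end = rk4(linear_rhs, [1.0, 0.0, 0.0, 1.0], 0.0, 1.0, 2000)
z_end = rk4(riccati_rhs, [0.0], 0.0, 1.0, 2000)
z_from_x = x_end[1] / x_end[3]
gap = abs(z_from_x - z_end[0])
print(z_from_x, z_end[0], gap)
```

The two printed values of $z$ at $t = 1$ should agree up to integration error, which is the content of part (a) of the theorem in this special case. The check is meaningful only while $x_{22}(t)$ stays away from zero, that is, while $x(t)$ remains in $U$.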
10
Lie Groups
It will be assumed that the reader is acquainted with the elementary algebraic properties of groups. Recall that a group denoted typically by $G$, with elements $g$, $g_1$, $g'$, etc., has associated with it a multiplication operation $(g, g_1) \to gg_1$, satisfying the rules:
\[ g(g_1g_2) = (gg_1)g_2 \quad\text{(associative law)}. \]
There is an identity element $e \in G$ such that
\[ eg = ge = g \quad\text{for all } g \in G. \tag{10.1} \]
For each $g \in G$, there is an inverse element $g^{-1} \in G$ such that
\[ g^{-1}g = e = gg^{-1}. \]
Definition

Let $G$ be a group and let $\alpha\colon G \times G \to G$ be the map that assigns $g_1g_2^{-1}$ to the pair $g_1, g_2 \in G$. If $G$ in addition has a topological structure so that $\alpha$ is a continuous map, we call it a topological group. If, further, $G$ has a manifold structure so that $\alpha$ is a differentiable (that is, $C^\infty$) map, we speak of a Lie group.

It can be proved that two such manifold structures that give rise to the same topological structure must coincide, so for most practical purposes we think of the manifold structure as determined by the group structure. Historically, groups arose as transformation groups on spaces. Some typical examples would be: the group of permutations of a finite set; the group of linear or affine transformations of a vector space; the group of canonical transformations in classical mechanics; the group of unitary transformations in quantum mechanics; the group of Lorentz transformations in the theory of special relativity.

Definition

Let $G$ be a group and let $M$ be a space. An action of $G$ by transformations on $M$ is defined by a map $G \times M \to M$, $(g, p) \to gp$, such that
\[ g_1(g_2p) = (g_1g_2)p \quad\text{for } g_1, g_2 \in G,\ p \in M, \]
\[ ep = p \quad\text{for } p \in M, \]
where $gp$ is thought of as the transform of $p$ by the transformation $g$. If $M$ is a manifold and $G$ is a Lie group, we speak of $G$ as a Lie transformation group if this map $G \times M \to M$ is differentiable.

Let us study the simplest type of transformation group, where the group $G$ acts as a group of linear transformations on a real vector space $V$. We assume that to each element $g \in G$ is assigned a linear transformation $\rho(g)\colon V \to V$, and the map $G \times V \to V$, $(g, v) \to \rho(g)(v)$, satisfies the transformation-group conditions described above. We call $\rho$ a linear representation of $G$. For certain applications, to be described later, $V$ must be infinite dimensional. We shall suppose that it is a topological vector space, that is, the concept of the limit $\lim_{n\to\infty} u_n = u$ of a sequence of elements of $V$ is well defined, and the sum of limits equals the limit of sums, that is,
\[ \lim_{n\to\infty}(u_n + v_n) = u + v \quad\text{if } \lim_{n\to\infty} u_n = u,\ \lim_{n\to\infty} v_n = v. \]
This enables us to define the derivative $du(t)/dt$ of curves $t \to u(t)$ in $V$:
\[ \frac{du}{dt} = \lim_{\Delta t \to 0}\frac{u(t + \Delta t) - u(t)}{\Delta t}, \]
with the usual rules of differential calculus satisfied. Suppose that $t \to g(t)$ defines a one-parameter subgroup of $G$; that is, a map $R \to G$ is given such that
\[ g(t_1 + t_2) = g(t_1)g(t_2) \quad\text{for } t_1, t_2 \in R. \]
As we have already seen, such objects play a very important role in the application of group-theoretic ideas to differential equations. Lie group theory (as opposed to abstract group theory) is concerned with studying a group by means of its one-parameter subgroups.

Definition

Let $\rho$ be a linear representation of a Lie group $G$ by linear transformations on a vector space $V$, with $t \to g(t)$ a one-parameter subgroup of $G$. A linear transformation $A\colon V \to V$ is called the infinitesimal generator of the one-parameter group $t \to \rho(g(t))$ of linear transformations if
\[ A(v) = \text{(by definition)}\ \lim_{t \to 0}\frac{\rho(g(t))v - v}{t} \quad\text{for } v \in V. \]
We shall suppose that each one-parameter subgroup of $G$ has in this sense an infinitesimal generator. (If $V$ is finite dimensional and if $G$ acts as a Lie transformation group on $V$, then this condition is obviously satisfied. Certain infinite dimensional $V$, to be described below, also satisfy it.) Let us also suppose that the mapping $G \times V \to V$, $(g, v) \to \rho(g)(v)$, is continuous in the sense of mapping convergent sequences into convergent sequences, with limits mapped into limits. Then $\rho(g(t))$ is determined by a linear differential equation involving $A$ (this is the reason for the terminology “infinitesimal generator”):
\[ \frac{d}{dt}\rho(g(t))(v) = \lim_{\Delta t \to 0}\frac{\rho(g(\Delta t))\rho(g(t))(v) - \rho(g(t))(v)}{\Delta t} = A(\rho(g(t))(v)); \]
that is, the “orbit” $t \to \rho(g(t))(v) = u(t)$ of the one-parameter group satisfies the linear differential equation
\[ \frac{d}{dt}u(t) = A(u(t)); \quad u(0) = v. \tag{10.2} \]
Conversely, these steps are reversible: If $A$ is a given linear transformation $V \to V$ and if (10.2) has a unique solution, then a one-parameter group of linear transformations, denoted by $\exp(tA)$, is defined by the rule $\exp(tA)(v) = u(t)$. The motivation for this notation is that $\exp(tA)(v)$ is defined by the power series
\[ \exp(tA)(v) = v + tA(v) + \frac{t^2}{2!}A^2(v) + \cdots \tag{10.3} \]
within its domain of convergence. For example, if $V$ is finite dimensional, (10.3) always converges. (See Chevalley [1, Chap. 1].) Such operator-power series can indeed be handled much as real or complex power series, provided one handles the possible noncommutativity of operators with care. (See Exercise 2.)

Having associated linear transformations with one-parameter subgroups, we may ask for the relation between the algebraic operations possible on linear transformations and the properties of the one-parameter subgroups. For example, operators may be added:
\[ (A + B)(v) = A(v) + B(v) \quad\text{for } v \in V, \]
multiplied by (real) scalars:
\[ (cA)(v) = cA(v), \]
and the commutator $[A, B] = AB - BA$ of two operators may be defined. The addition and scalar multiplication imply that the operators form a vector space (over the real numbers). The commutator operation defines it then as a Lie algebra in the sense of the following definition.

Definition

A real Lie algebra, typically denoted by $\mathbf{G}$, with elements denoted by $X, Y, \ldots$, is defined by requiring that:
(a) $\mathbf{G}$ is a vector space (over the real numbers).

(b) A bilinear multiplication operation $(X, Y) \to [X, Y]$, $\mathbf{G} \times \mathbf{G} \to \mathbf{G}$, is defined for elements of $\mathbf{G}$, satisfying the following law, called the Jacobi identity:
\[ [X, [Y, Z]] = [[X, Y], Z] + [Y, [X, Z]] \quad\text{for } X, Y, Z \in \mathbf{G}. \]

(c) $[X, Y] = -[Y, X]$ for $X, Y \in \mathbf{G}$.
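For the commutator $[X, Y] = XY - YX$ of linear operators, the Jacobi identity in the form stated in (b) can be checked by direct expansion. The following short computation is offered only as a worked verification of that special case; it uses nothing beyond associativity of operator composition.

```latex
% Direct check of the Jacobi identity for [X, Y] = XY - YX.
\begin{align*}
[X, [Y, Z]] &= XYZ - XZY - YZX + ZYX, \\
[[X, Y], Z] &= XYZ - YXZ - ZXY + ZYX, \\
[Y, [X, Z]] &= YXZ - YZX - XZY + ZXY, \\
[[X, Y], Z] + [Y, [X, Z]] &= XYZ + ZYX - YZX - XZY = [X, [Y, Z]].
\end{align*}
```

Read in this form, the identity says that the map $Y \to [X, Y]$ acts as a derivation with respect to the bracket.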
Of course, on this purely algebraic level, the real numbers are not sacred: The definition makes sense for an arbitrary field of scalars, for example, for the complex numbers, the rational numbers, and the integers mod a prime number. However, Lie algebras over a field of nonzero characteristic have certain unpleasant features. The most interesting cases are the real, complex, and rational numbers, and accordingly one speaks of a real, complex, or rational Lie algebra. For the purposes of Lie group theory, the real case is by far the most important, and when we talk about a Lie algebra without mentioning the scalars, we shall always mean a real one.

It is readily verified that the commutator definition $(A, B) \to [A, B] = AB - BA$ makes the linear operators on $V$ into a Lie algebra. How does this translate back into terms of one-parameter groups? Explicitly, we ask the following question: Suppose $t \to g_i(t)$, $i = 1, 2, 3, 4$, are one-parameter subgroups of $G$, with
\[ \rho(g_1(t)) = \exp(tA), \quad \rho(g_2(t)) = \exp(tB), \quad \rho(g_3(t)) = \exp(t(A + B)), \quad \rho(g_4(t)) = \exp(t[A, B]); \]
how are $g_3(t)$ and $g_4(t)$ related to $g_1(t)$ and $g_2(t)$? To answer this question, let us work formally for the moment.
LEMMA 10.1 If $A$ is a linear operator, $V \to V$, then formally:
\[ \exp(tA) = \lim_{n \to \infty}\Bigl(I + \frac{tA}{n}\Bigr)^n. \tag{10.4} \]
Proof. There are two approaches. First, purely as an operator equation, we have, using the binomial expansion,
\[ \Bigl(I + \frac{tA}{n}\Bigr)^n = I + At + \frac{(At)^2}{2!}\,\frac{n - 1}{n} + \frac{(At)^3}{3!}\,\frac{(n - 1)(n - 2)}{n^2} + \cdots. \]
Formally, as $n \to \infty$, this goes over to the power series for $\exp(tA)$. Another approach would be to work with the differential equation (10.2). Set
\[ v_n(t) = \Bigl(I + \frac{tA}{n}\Bigr)^n(v) \quad\text{for } v \in V. \]
Then
\[ \frac{dv_n}{dt} = A\Bigl(I + \frac{tA}{n}\Bigr)^{n-1}(v). \]
Hence
\[ v_n(t) - v = \int_0^t A\Bigl(I + \frac{sA}{n}\Bigr)^{n-1}(v)\,ds. \tag{10.5} \]
Then, if $\lim_{n\to\infty} v_n(t)$ exists and equals, say, $v(t)$, and if the formal limiting operations in (10.5) are justified, we have
\[ v(t) - v = \int_0^t A(v(s))\,ds, \]
which is the integral equation form of (10.2). We shall not get involved with the material in functional analysis necessary to justify these formal limits, since it would take us too far afield. (See Yosida [1].) However, these results will be very useful to us as intuitive motivation.
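In finite dimensions formula (10.4) is easy to test numerically. The sketch below is an illustration only: the matrix `A` (the generator of plane rotations), the value of `t`, and the helper names are assumptions chosen for this example. Taking $n = 2^k$ lets the $n$th power be computed by $k$ successive squarings.

```python
import math

def matmul(p, q):
    # Product of two 2x2 matrices stored as nested lists.
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def euler_power(t, a, doublings):
    # Compute (I + t*a/n)^n with n = 2**doublings by repeated squaring.
    n = 2 ** doublings
    result = [[(1.0 if i == j else 0.0) + t * a[i][j] / n for j in range(2)]
              for i in range(2)]
    for _ in range(doublings):
        result = matmul(result, result)
    return result

# Illustrative choice: A generates rotations, so exp(tA) is rotation by angle t.
A = [[0.0, -1.0], [1.0, 0.0]]
t = 1.0
exact = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

errors = []
for doublings in (4, 8, 12, 16):
    approx = euler_power(t, A, doublings)
    errors.append(max(abs(approx[i][j] - exact[i][j])
                      for i in range(2) for j in range(2)))
print(errors)
```

The printed errors should shrink roughly in proportion to $1/n$, consistent with the factors $(n-1)/n$, $(n-1)(n-2)/n^2$, ... appearing in the binomial expansion.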
LEMMA 10.2 If $A$, $B$ are linear operators, $V \to V$, we have formally:
\[ \exp(t(A + B)) = \lim_{n \to \infty}\Bigl[\exp\Bigl(\frac{tA}{n}\Bigr)\exp\Bigl(\frac{tB}{n}\Bigr)\Bigr]^n, \tag{10.6} \]
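\[ \exp(t[A, B]) = \lim_{n \to \infty}\Bigl[\exp\Bigl(\frac{tA}{n}\Bigr)\exp\Bigl(\frac{B}{n}\Bigr)\exp\Bigl(-\frac{tA}{n}\Bigr)\exp\Bigl(-\frac{B}{n}\Bigr)\Bigr]^{n^2}. \tag{10.7} \]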
Proof. We prove only (10.6). Equation (10.7) is similar, and is left as an exercise. Set $C(t) = \exp(tA)\exp(tB)$. Then
\[ \frac{dC}{dt}(0) = A + B. \]
(Since $(d/dt)\exp(tA)$ is, formally, $A\exp(tA)$, $\exp(0A) = I$, the identity operator, and the product law for differentiation holds as long as the order of the operators is respected.) Suppose Taylor's expansion holds. Then
\[ C(t) = I + (A + B)t + t^2A_2(t), \]
where $A_2(t)$ is a well-behaved function in the neighborhood of $t = 0$. Then
\[ \Bigl[\exp\Bigl(\frac{t}{n}A\Bigr)\exp\Bigl(\frac{t}{n}B\Bigr)\Bigr]^n = \Bigl[I + \frac{(A + B)t}{n} + \frac{t^2}{n^2}A_2\Bigl(\frac{t}{n}\Bigr)\Bigr]^n. \]
Note that the right-hand side will not be affected by the third term as $n \to \infty$, since it has an $n^2$ in the denominator and the product involves $n$ terms. Then the limit as $n \to \infty$ is
\[ \lim_{n \to \infty}\Bigl[I + \frac{(A + B)t}{n}\Bigr]^n = \exp(t(A + B)), \]
which proves (10.6) formally.

Equations (10.6) and (10.7) are the key formulas connecting Lie algebras and Lie groups. They suggest the following ideas. The “Lie algebra” of a Lie group denoted by $G$ should be defined as the set of one-parameter subgroups of $G$. The algebraic operations necessary to define a Lie algebra can be, intuitively, presented as follows: If $t \to g(t)$ is a one-parameter subgroup, and if $c \in R$, then the “scalar product” of $c$ with the subgroup is the subgroup
\[ t \to g(ct). \tag{10.8} \]
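If $t \to g_1(t)$ and $t \to g_2(t)$ are one-parameter subgroups, the “sum” of the two is the one-parameter subgroup $t \to g_3(t)$ such that
\[ g_3(t) = \lim_{n \to \infty}\Bigl[g_1\Bigl(\frac{t}{n}\Bigr)g_2\Bigl(\frac{t}{n}\Bigr)\Bigr]^n. \tag{10.9} \]
The “bracket” one-parameter subgroup $t \to g_4(t)$ is “defined” by the formula
\[ g_4(t) = \lim_{n \to \infty}\Bigl[g_1\Bigl(\frac{t}{n}\Bigr)g_2\Bigl(\frac{1}{n}\Bigr)g_1\Bigl(-\frac{t}{n}\Bigr)g_2\Bigl(-\frac{1}{n}\Bigr)\Bigr]^{n^2}. \tag{10.10} \]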
Now, so far there is no guarantee that these limits exist or that they satisfy the identities needed to express a “Lie algebra.” However, this suggests such a direct and intuitive approach toward defining the Lie algebra of a Lie group that we shall do it anyway. Use the symbol $X$ to denote the one-parameter subgroup $t \to g(t)$, and write $g(t) = \exp(tX)$. If $\exp(tX) = g_1(t)$, $\exp(tY) = g_2(t)$, define $X + Y$ and $[X, Y]$ so that
\[ \exp(t(X + Y)) = g_3(t), \quad \exp(t[X, Y]) = g_4(t). \]
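For matrix groups the “sum” and “bracket” limits can be tried out numerically. The sketch below is an illustration under explicit assumptions: the matrices `A` and `B` are hypothetical choices with $A^2 = B^2 = 0$, so that $\exp(sA) = I + sA$ exactly; the sum limit is taken as $[\exp(tA/n)\exp(tB/n)]^n$; and the bracket limit is taken in one standard form, $[\exp(tA/n)\exp(B/n)\exp(-tA/n)\exp(-B/n)]^{n^2}$, which may differ in detail from the display used in the text. All helper names are invented for this example.

```python
import math

def matmul(p, q):
    # Product of two 2x2 matrices stored as nested lists.
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def power_by_squaring(p, doublings):
    # Return p raised to the power 2**doublings.
    for _ in range(doublings):
        p = matmul(p, p)
    return p

def exp_nilpotent(s, a):
    # exp(s*a) for a 2x2 matrix with a*a = 0: the series stops at I + s*a.
    return [[(1.0 if i == j else 0.0) + s * a[i][j] for j in range(2)]
            for i in range(2)]

def max_abs_diff(p, q):
    return max(abs(p[i][j] - q[i][j]) for i in range(2) for j in range(2))

# Illustrative generators: A + B = [[0,1],[1,0]] and [A, B] = diag(1, -1).
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]

def sum_formula(t, doublings):
    # [exp(tA/n) exp(tB/n)]^n with n = 2**doublings.
    n = 2 ** doublings
    step = matmul(exp_nilpotent(t / n, A), exp_nilpotent(t / n, B))
    return power_by_squaring(step, doublings)

def bracket_formula(t, doublings):
    # [exp(tA/n) exp(B/n) exp(-tA/n) exp(-B/n)]^(n*n) with n = 2**doublings.
    n = 2 ** doublings
    step = matmul(matmul(exp_nilpotent(t / n, A), exp_nilpotent(1.0 / n, B)),
                  matmul(exp_nilpotent(-t / n, A), exp_nilpotent(-1.0 / n, B)))
    return power_by_squaring(step, 2 * doublings)

t = 1.0
exact_sum = [[math.cosh(t), math.sinh(t)], [math.sinh(t), math.cosh(t)]]
exact_bracket = [[math.exp(t), 0.0], [0.0, math.exp(-t)]]
err_sum = max_abs_diff(sum_formula(t, 14), exact_sum)
err_bracket = max_abs_diff(bracket_formula(t, 10), exact_bracket)
print(err_sum, err_bracket)
```

Both errors should be small and should decrease as the number of doublings grows. This is only a finite-dimensional sanity check and says nothing about the existence of the limits in a general Lie group.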
Since Helgason does in fact prove [1, Chap. 2] that the necessary limits exist, we shall adopt this definition of the Lie algebra $\mathbf{G}$. We may say then that we have shown, if the formal steps can be made rigorous, that this “notation” for $\mathbf{G}$ suggests an algebraic interpretation of our work on linear transformations and infinitesimal generators. Suppose $\rho$ is a linear representation of $G$ by operators on $V$. To each one-parameter group denoted by the “symbol” $X$, that is, the group is $t \to \exp(tX) = g(t)$, associate the infinitesimal generator $A\colon V \to V$:
\[ \rho(\exp(tX)) = \exp(tA). \]
Let $A = \rho(X)$. Regard $\rho$ as a mapping of $\mathbf{G} \to$ (Lie algebra of linear operators on $V$). Then Lemma 10.2 asserts that $\rho$ is a Lie algebra homomorphism; that is,
\[ \rho(X + Y) = \rho(X) + \rho(Y), \quad \rho([X, Y]) = [\rho(X), \rho(Y)]. \]
Since the foundations of Lie group theory are not our main concern, we shall leave the development of this general approach at this point and turn to more geometric material. Suppose $G$ acts as a transformation group on a manifold $M$. For topology, adopt that of pointwise convergence; that is, a sequence $(f_n)$ of functions converges to $f$ if
\[ \lim_{n \to \infty} f_n(p) = f(p) \quad\text{for all } p \in M. \]
The linear representation $\rho$ of $G$ by transformations on $V = F(M)$ is defined as follows:
$$\rho(g)(f)(p) = f(g^{-1}p) \quad \text{for } f \in F(M),\ p \in M. \tag{10.11}$$
Let $\mathbf{G}$ be the collection of differentiable (say, $C^\infty$) one-parameter subgroups $t \to g(t) = \exp(tX)$ of $G$. Then, if $\rho(X)$ denotes the infinitesimal generator of the one-parameter group $t \to \rho(g(t))$ of linear transformations on $F(M)$, we have
$$\rho(X)(f)(p) = \frac{d}{dt} f(\exp(-tX)p)\Big|_{t=0}. \tag{10.12}$$
Also,
$$\rho(X)(f_1 f_2) = \rho(X)(f_1)\, f_2 + f_1\, \rho(X)(f_2).$$
This shows that $\rho(X)$ is a vector field on $M$, that is, an element of $V(M)$. We can sum up these ideas in the following theorem.
THEOREM 10.3 Suppose the Lie group $G$ is a transformation group on a manifold $M$. Equation (10.11) defines a representation of $G$ by linear transformations in $F(M)$, and (10.12) defines a mapping of $\mathbf{G}$, the set of all one-parameter subgroups of $G$, into $V(M)$. Suppose $t \to g(t)$ is a one-parameter subgroup of $G$ and $X \in V(M)$ is the vector field on $M$ defined by (10.12). Then each orbit $t \to g(t)p$ of the one-parameter group is an integral curve of the vector field $-X$.
Proof. The first part of the statement is evident. To prove the second, suppose $\sigma(t) = g(t)p$, $f \in F(M)$:
$$\frac{d}{dt} f(\sigma(t)) = \lim_{\Delta t \to 0} \frac{f(g(\Delta t)g(t)p) - f(g(t)p)}{\Delta t} = -X(f)(g(t)p).$$
Q.E.D.
There are three standard ways to make $G$ act on $G$ itself (strictly speaking, $G$ acts on $M$, where $M$ is the underlying manifold structure on the space of points making up $G$), namely:
(i) Left translation: Given $g \in G$, $L_g$ denotes the diffeomorphism $h \to gh$ of $G$.
(ii) Right translation: Given $g \in G$, $R_g$ denotes the diffeomorphism $h \to hg^{-1}$ of $G$.
(iii) Adjoint action: Given $g \in G$, $\operatorname{Ad} g$ denotes the diffeomorphism $h \to ghg^{-1}$.
Notice that for fixed $g \in G$, $L_g$ and $R_g$ commute, and $\operatorname{Ad} g = L_g R_g$.
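The three actions and the relations between them can be checked on concrete group elements. The sketch below assumes, purely for illustration, that the group consists of invertible $2 \times 2$ rational matrices; the helper names `left`, `right`, `ad` and the sample matrices are hypothetical.

```python
from fractions import Fraction

def mat(a, b, c, d):
    """A 2x2 matrix with exact rational entries."""
    return ((Fraction(a), Fraction(b)), (Fraction(c), Fraction(d)))

def mul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return ((a[1][1] / det, -a[0][1] / det), (-a[1][0] / det, a[0][0] / det))

def left(g):
    """L_g: h -> g h."""
    return lambda h: mul(g, h)

def right(g):
    """R_g: h -> h g^{-1} (the inverse makes g -> R_g a left action)."""
    return lambda h: mul(h, inv(g))

def ad(g):
    """Ad g: h -> g h g^{-1}."""
    return lambda h: mul(mul(g, h), inv(g))

g, k, h = mat(2, 1, 1, 1), mat(1, 3, 0, 1), mat(1, 2, 3, 5)

left_right_commute = left(g)(right(k)(h)) == right(k)(left(g)(h))
ad_is_left_times_right = ad(g)(h) == left(g)(right(g)(h))
right_is_an_action = right(mul(g, k))(h) == right(g)(right(k)(h))
```

The last check shows why right translation is defined with $g^{-1}$: without the inverse, $g \to R_g$ would reverse the order of products.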
Definition

A vector field $X$ on $G$ is left (right) invariant if
$$L_g^*(X(f)) = X(L_g^*(f)) \quad \text{for all } f \in F(G), \text{ all } g \in G$$
(respectively,
$$R_g^*(X(f)) = X(R_g^*(f)) \quad \text{for all } f \in F(G), \text{ all } g \in G).$$
For each one-parameter subgroup $X \in \mathbf{G}$, let $X_L$ be the vector field on $G$ that is the infinitesimal generator of the one-parameter group $t \to R_{\exp(tX)}$. Thus $X_L$ is a left-invariant vector field on $G$. Similarly, let $X_R$ be the infinitesimal generator of the one-parameter group $t \to L_{\exp(tX)}$. It is right invariant.
THEOREM 10.4 The mappings $X \to X_L$ and $X \to X_R$ are 1-1 onto maps from the set of one-parameter subgroups to the set of left- and right-invariant vector fields on $G$.

Proof. We shall work with the left-invariant vector fields. The proof for the right-invariant fields is similar. Suppose first that two one-parameter subgroups $t \to g_1(t)$ and $t \to g_2(t)$ give rise to the same element $X_L$. Now both $t \to g_1(t)$ and $t \to g_2(t)$ are integral curves of the vector field $-X_L$ (by Theorem 10.3). Since both begin at $e$, they must coincide, that is, $g_1(t) = g_2(t)$ for all $t$. This proves that $X \to X_L$ is 1-1. To show it is onto, proceed as follows: Let $Y$ be a left-invariant vector field on $G$. We shall first show that the integral curve of $Y$ beginning at $e$ can be extended over $(-\infty, \infty)$. Suppose otherwise, that is, that $\sigma: (a, b) \to G$ is an integral curve of $Y$ which cannot be extended over a larger interval. Now the following geometric property of left-invariant vector fields is inherent in this definition: If $t \to \gamma(t)$ is an integral curve of $Y$, if $g \in G$, then $t \to g\gamma(t) = L_g(\gamma(t))$ is an integral curve of $Y$.
Thus, for $t_0 \in (a, b)$, the curve $t \to \sigma(t_0)^{-1}\sigma(t)$ is an integral curve of $Y$, which is equal to $e$ for $t = t_0$. By uniqueness of integral curves,
$$\sigma(t_0)^{-1}\sigma(t) = \sigma(t - t_0) \quad \text{for } a \le t \le b.$$
This shows that the size of the neighborhood of $t_0$ in which there exists a solution of the differential equations defining the integral curves of $Y$ remains bounded away from zero as $t_0$ approaches $b$, which gives the desired contradiction.
Then let $\sigma(t)$, $-\infty < t < \infty$, be the curve in $G$ that is the integral curve of $Y$ beginning at $e$ for $t = 0$. Since the $t_0$ used above can be any real number,
$$\sigma(t_0 + t) = \sigma(t) \cdot \sigma(t_0);$$
that is, $t \to \sigma(t)$ is a one-parameter subgroup of $G$ and hence defines an element of $\mathbf{G}$, which we call $X$. The left invariance of $Y$ then proves that
$$-X_L = Y.$$
Q.E.D.
Remarks. It is more customary to define the Lie algebra of a Lie group as the set of left-invariant vector fields. (For example, this is the procedure adopted by Chevalley [1] and Helgason [1].) While this is most convenient for the purpose of proving the main theorems in the foundations of Lie group theory, it is slightly awkward when considering Lie groups as transformation groups, since the identification of the Lie algebra with the set of one-parameter subgroups is better adapted to the geometric intuition. At any rate, Theorem 10.4 and Exercise 6 show that this is compatible with the definition we have chosen.

There is an action of $G$ on the underlying vector space of $\mathbf{G}$ that is also called the adjoint action of $G$. (Strictly speaking, it should be called the infinitesimal version of the adjoint action of $G$ on $G$, but it is customary to confuse this point.) It can be most readily defined as follows: For $g \in G$, $X \in \mathbf{G}$, the one-parameter subgroup represented by $\operatorname{Ad} g(X)$ is just
$$t \to \operatorname{Ad} g(\exp(tX)) = g \exp(tX) g^{-1}.$$
It is readily verified that for each $g \in G$, $\operatorname{Ad} g$ considered as a mapping $\mathbf{G} \to \mathbf{G}$ is a Lie algebra isomorphism. Thus “Ad” also stands for a linear representation of $G$ by automorphisms of $\mathbf{G}$. This can be symbolized by the relation:
$$\operatorname{Ad} g(\exp tX) = \exp(t(\operatorname{Ad} g(X))) \quad \text{for } g \in G,\ X \in \mathbf{G},\ -\infty < t < \infty.$$
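For a matrix group this relation can be read off from the power series of the exponential. The following is a sketch under the assumption, not made in the text, that $G$ is a group of invertible $n \times n$ matrices and that $\exp$ is the matrix exponential, so that $X$ is itself an $n \times n$ matrix:

```latex
g\,\exp(tX)\,g^{-1}
  = g\Bigl(\sum_{k \ge 0} \frac{t^k X^k}{k!}\Bigr) g^{-1}
  = \sum_{k \ge 0} \frac{t^k\,(gXg^{-1})^k}{k!}
  = \exp\bigl(t\,gXg^{-1}\bigr),
\qquad\text{so that}\qquad
\operatorname{Ad} g(X) = gXg^{-1}.
```

The middle step uses only $gX^kg^{-1} = (gXg^{-1})^k$. In this special case the linearity of $X \to \operatorname{Ad} g(X)$ and the identity $\operatorname{Ad} g([X, Y]) = [\operatorname{Ad} g(X), \operatorname{Ad} g(Y)]$ are immediate.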
If $G$ acts on a manifold $M$, we then also have the important formula
$$g \cdot \exp(tX) \cdot p = g \cdot \exp(tX) \cdot g^{-1} \cdot gp = \operatorname{Ad} g(\exp(tX)) \cdot gp = \exp(t(\operatorname{Ad} g(X))) \cdot gp.$$
It seems to be inevitable in Lie group theory that each symbol has at least two possible meanings. For example, we have seen the two meanings of “Ad.” So far, we have been working with one fixed meaning of “exp.” However, there is another related meaning.
Definition
Let $G$ be a Lie group and let $\mathbf{G}$ denote its Lie algebra. The exponential mapping, denoted also by “exp,” is the mapping $\mathbf{G} \to G$ defined as follows: For $X \in \mathbf{G}$, $\exp(X)$ is the value at $t = 1$ of the one-parameter subgroup of $G$ determined by $X$.

This completes our listing of the general facts relating a single Lie group to its Lie algebra. However, for geometric purposes it is most important to know the relation between the Lie subgroups of a Lie group and the Lie subalgebras of its Lie algebra.

Definition

Let $G$ be a Lie group. A Lie subgroup of $G$ is defined by a pair (typically denoted by $(H, \phi)$) such that $H$ is a Lie group and $\phi$ is a submanifold map $H \to G$ that is a homomorphism of the group structures on $H$ and $G$.

As usual in the theory of submanifolds, it is often convenient to suppress explicit mention of the map $\phi$, for the sake of notational simplicity, and write $H \subset G$. However, it is quite important to keep in mind that the topology that makes $H$ into a Lie group is not necessarily the topology induced from $G$. In addition, we shall usually say “subgroup” instead of “Lie subgroup,” since subgroups in the purely algebraic sense will not be considered.

A subgroup $H$ of $G$ defines a (Lie) subalgebra, denoted typically by $\mathbf{H}$, of $\mathbf{G}$. For, every one-parameter subgroup of $H$ can be regarded as a one-parameter subgroup of $G$, defining the inclusion $\mathbf{H} \subset \mathbf{G}$. It is obvious from the definition that all Jacobi brackets of elements of $\mathbf{H}$ again lie in $\mathbf{H}$, so that it is a subalgebra; we say that $\mathbf{H}$ corresponds to $H$. One of the main theorems of Lie theory is Theorem 10.5.
THEOREM 10.5 Let $G$ be a connected Lie group. The correspondence $H \to \mathbf{H}$ sets up a 1-1 correspondence between connected subgroups of $G$ and the subalgebras of $\mathbf{G}$.

Proof. First we shall show that every subalgebra $\mathbf{H}$ of $\mathbf{G}$ arises in this way from a connected subgroup of $G$. For this purpose it is most convenient to regard the Lie algebra of $G$ as the set of left-invariant vector fields on $G$. Thus $\mathbf{H}$ can be regarded as a subalgebra of $V(G)$; hence it defines a completely integrable vector-field system on $G$. This system is invariant under left translation by $G$; hence it is everywhere nonsingular. Let $H$ be its maximal connected integral submanifold passing through the identity element of $G$.
Next we show that $H$ is a subgroup of $G$ in the purely algebraic sense. Let $h \in H$. We want to prove that $h^{-1} \in H$. From Exercise 11 [part (a)], we know that $h$ can be written at least one way in the form
$$h = \exp(X_1) \cdots \exp(X_r) \quad \text{for some choice } X_1, \dots, X_r \in \mathbf{H}.$$
Now $\exp(X)^{-1} = \exp(-X)$; hence $h^{-1} \in H$. Next we prove that $h_1 h_2 \in H$ if $h_1, h_2 \in H$. By left invariance, $L_{h_1}(H)$ is an integral submanifold of $\mathbf{H}$ passing through $h_1$; hence it must be contained in $H$, whence $h_1 h_2 \in H$.

Consider the map $H \times H \to G$ defined by $(h_1, h_2) \to h_1^{-1} h_2$. Its image is contained in $H$. By Theorem 9.4 it can be factored through a mapping $H \times H \to H$; that is, the map $(h_1, h_2) \to h_1^{-1} h_2$ is differentiable in terms of the manifold structure on $H$. This defines it as a Lie group. Clearly, the submanifold map $H \to G$ defines it as a Lie subgroup of $G$, and by its very construction the corresponding subalgebra is $\mathbf{H}$.

Suppose $H_1$ is another connected subgroup of $G$ whose corresponding Lie subalgebra of $\mathbf{G}$ is $\mathbf{H}$. Using the fact that a connected Lie group is generated by any neighborhood of the identity, we see that $H_1$ and $H$ are identical as point-sets in $G$. Using Theorem 9.4 again, we see that the identity map $H_1 \to H$ is differentiable. Turning the argument around, the identity map $H \to H_1$ is differentiable; that is, $H$ and $H_1$ are identical as Lie subgroups of $G$. Q.E.D.
Thus we begin to see how the algebraic properties of the Lie algebra reflect the algebraic properties of the groups. There is a group of useful theorems giving sufficient conditions that subgroups in the algebraic sense be Lie subgroups:

Let $H$ be a subgroup in the algebraic sense of a Lie group $G$ such that every element of $H$ can be joined to the identity element by a (broken) $C^\infty$ path lying completely within $H$. Then $H$ is a Lie subgroup of $G$. For the proof, we refer to Kobayashi-Nomizu [1, p. 275].

Let $H$ be a subgroup in the algebraic sense of a Lie group $G$ that is a closed subset of $G$. Then $H$ is a Lie subgroup of $G$. Further, the topology on $H$ is that induced from $G$, and $H$ is a regularly embedded submanifold of $G$. For the proof, we refer to Helgason [1, Theorem 2.3, p. 605].

These subgroups are the most important in the geometric applications. We shall call them closed subgroups. They arise geometrically in the following
way: Suppose that a Lie group $G$ acts as a transformation group on a manifold $M$. Let $p \in M$, and let $H$ be defined as follows:
$$H = \{g \in G : gp = p\},$$
where $H$ is called the isotropy subgroup of $G$ at $p$. That it is a closed subgroup of $G$ follows from the fact that the mapping $G \times M \to M$ is continuous. The subset $Gp = \{gp : g \in G\} \subset M$ is called the orbit of $G$ at $p$. It can be identified with the space of left cosets of $G$ by $H$, which is denoted by $G/H$ and defined as follows: An element of $G/H$ is a subset of the form $gH$ for one choice of $g \in G$. $G/H$ is also called the homogeneous space of $G$ with isotropy subgroup $H$. The coset $eH$ is called the origin of $G/H$. The map $G \to G/H$ sending $g \in G$ into $gH$ is called the projection of $G$ into $G/H$. Each $g_0 \in G$ induces a transformation of $G/H$ as follows: $g_0 \cdot (gH) = (g_0 g)H$. (In other words, $G$ acts on $G/H$ in such a way that the projection map $G \to G/H$ commutes with the action of $G$ on itself by left translation and on $G/H$.) Notice that $H$ is then the set of $g \in G$ such that the transformation defined by $g$ on $G/H$ leaves the origin invariant. We shall state the basic theorems concerning these ideas, referring again to Helgason [1, Chap. 2, Sects. 3-4] for the proofs.
Theorem. Let $H$ be a closed subgroup of a Lie group $G$. $G/H$ can be made into a manifold so that the projection map $G \to G/H$ is a maximal rank, onto mapping. The fibers† of the projection are the left cosets of $H$, and they are also integral manifolds of the left-invariant vector-field system on $G$ determined by $\mathbf{H}$. (In fact, $G \to G/H$ is a principal fiber bundle with $H$ as structure group, referring to Auslander-Mackenzie [1] for this notion.)

Theorem. Suppose a Lie group $G$ acts as a transformation group on a manifold $M$, and that the subgroup $H$ is the isotropy subgroup of $G$ at a point $p \in M$. Then the mapping $G/H \to Gp$, which assigns to $gH$ the point $gp$,‡ defines $G/H$ as a submanifold of $M$ whose set of points is the orbit at $p$. (Thus, when we speak of the orbits as “submanifolds,” we mean their manifold structure inherited from $G/H$.)

† If $\phi: M \to B$ is a map of $M$ into $B$, the inverse image of a point of $B$ is called the fiber above that point.
‡ Notice that this map is well defined, since if $g$ and $g_1$ define the same coset, $g^{-1}g_1 \in H$; hence $gp = g_1 p$.

As a bonus from these general theorems, we obtain a way of proving that various spaces are manifolds, without the necessity of going through the details of exhibiting an atlas of coordinate systems. One such important example is the Grassmann manifolds, which we do as an illustration. Let $V$ be
a real, finite-dimensional vector space. Let $A(V)$ be the group of linear automorphisms of $V$. It can be easily proved that $A(V)$ is a Lie group such that the mapping $A(V) \times V \to V$ defines it as a Lie transformation group on $V$. (For example, choosing a basis for $V$ identifies $V$ with $R^n$ ($n = \dim V$) and identifies $A(V)$ with $GL(n, R)$, the group of all $n \times n$ invertible real matrices. In the exercises, we go over the Lie group generalities for $GL(n, R)$ and the other “classical” groups.) For each integer $p$, $0 < p < n = \dim V$, let $G^p(V)$ be the set of $p$-dimensional linear subspaces of $V$ (the Grassmann manifold). $A(V)$ acts on $G^p(V)$ in an obvious algebraic way: If $W$ is a $p$-dimensional subspace of $V$, that is, a “point” of $G^p(V)$, and if $\alpha \in A(V)$, then $\alpha W$ is just the subspace $\alpha(W)$. It is quite simple linear algebra to prove that $A(V)$ acts transitively on $G^p(V)$. Let $W_0$ be a fixed element of $G^p(V)$. Then the isotropy subgroup of $A(V)$ at $W_0$ is $A(V, W_0)$, defined as
$$A(V, W_0) = \{\alpha \in A(V) : \alpha(W_0) = W_0\}.$$
It is clearly a closed subgroup of $A(V)$, identifying $G^p(V)$ with $A(V)/A(V, W_0)$, hence giving it a manifold structure. Of course a similar procedure would be followed for vector spaces over the complex numbers. We shall leave the general theory of Lie groups at this point.
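The identification of $G^p(V)$ with $A(V)/A(V, W_0)$ also gives its dimension. The count below is a sketch that assumes a basis $e_1, \dots, e_n$ of $V$ chosen so that $W_0$ is spanned by $e_1, \dots, e_p$ (a choice made here only for illustration), and uses the fact that $\dim G/H = \dim G - \dim H$ for a closed subgroup $H$:

```latex
A(V, W_0) \;\cong\;
\left\{ \begin{pmatrix} a & b \\ 0 & d \end{pmatrix} :
  a \in GL(p, R),\ d \in GL(n-p, R),\ b \text{ an arbitrary } p \times (n-p) \text{ matrix} \right\},
\\[4pt]
\dim A(V, W_0) = p^2 + (n-p)^2 + p(n-p),
\\[4pt]
\dim G^p(V) = \dim A(V) - \dim A(V, W_0)
            = n^2 - p^2 - (n-p)^2 - p(n-p) = p(n-p).
```

For $p = 1$ this gives $n - 1$, the dimension of the projective space of lines in $V$.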
Exercises

1. Show that the solution of (10.2) does indeed define $\exp(tA)$ as a one-parameter group of linear transformations on $V$.

2. Show that $\exp(A)\exp(B) = \exp(A + B) = \exp(B)\exp(A)$ if $AB = BA$, with $\exp(A)$ as defined by (10.2). If $[A, B]$ merely commutes with $A$ and $B$, work out the formula connecting $\exp(A + B)$, $\exp(A)$, $\exp(B)$.

3. Prove (10.4) if $V$ is finite dimensional.

4. Work out the formal details of (10.7).

5. Show that (10.6) and (10.7) hold if $A$ and $B$ are finite-dimensional operators, that is, carry out the needed estimates. (In fact, the same estimates hold if $A$ and $B$ are bounded operators in a Hilbert space.)
6. Let $G$ be a Lie group, and let $X \to X_L$ be the isomorphic mapping of $\mathbf{G}$, the set of one-parameter subgroups of $G$, onto the set of left-invariant vector fields. Suppose $\rho$ is a representation of $G$ by linear transformations on a vector space $V$. For each $X \in \mathbf{G}$, let $\rho(X)$ be the infinitesimal generator of the one-parameter group of linear transformations $t \to \rho(\exp(tX))$. Suppose that $X, Y \in \mathbf{G}$. Prove directly that
$$\rho(X_L + Y_L) = \rho(X) + \rho(Y), \qquad [\rho(X), \rho(Y)] = \rho([X_L, Y_L]).$$
This result can be interpreted as follows: Using Theorem 10.4, $\mathbf{G}$ can be identified with $\mathbf{G}_L$, the set of left-invariant vector fields on $G$. Now $\mathbf{G}_L$ is a subalgebra of $V(G)$. Use this identification to make $\mathbf{G}$ into a Lie algebra. Then the exercise shows that $X \to \rho(X)$ is a Lie algebra homomorphism.

7. Show that $\dim \mathbf{G} = \dim G$.

8. Prove that, for $X \in \mathbf{G}$, the vector field on $G$ that is the infinitesimal generator of the one-parameter group $t \to \operatorname{Ad}(\exp(tX))$ is $X_L + X_R$.

9. Prove that, for $X \in \mathbf{G}$, $X_L(e) = -X_R(e)$, where $e$ is the identity element of $G$.

10. For $X \in \mathbf{G}$, prove that $\exp(X)$ is the value at $t = 1$ of the integral curve of the vector fields $-X_L$ or $X_R$ that begins at $e$.

11. Prove the following facts:
(a) exp, considered as a map $\mathbf{G} \to G$, is differentiable.
(b) $\exp_*$, its differential, is an isomorphism $\mathbf{G}_0 \to G_e$. (If $V$ is a neighborhood of $0$ in $\mathbf{G}$ such that exp restricted to $V$ is a diffeomorphism, then $U = \exp(V)$ is called a canonical neighborhood of $e$ in $G$. The coordinate system in $U$, obtained by pulling back via $\exp^{-1}$ a Euclidean coordinate system for $\mathbf{G}$, is called a canonical coordinate system for $U$.)
(c) If $G$ is connected and $-X \in V$ for all $X \in V$, then every element of $G$ can be written as the product of a finite number of elements chosen from $U = \exp(V)$.
(d) If $G$ is connected, it is Abelian if and only if $[X, Y] = 0$ for all $X, Y \in \mathbf{G}$; that is, $\mathbf{G}$ is an Abelian Lie algebra.

12. Suppose that $X_1, \dots, X_n$ is a basis for the vector space $\mathbf{G}$. Define a map $\phi: \mathbf{G} \to G$ as follows: If $X = x_1 X_1 + \dots + x_n X_n$, then $\phi(X) = \exp(x_1 X_1) \cdots \exp(x_n X_n)$. Prove that $\phi$ is also a diffeomorphism in a neighborhood of $X = 0$. The coordinate system obtained in this way for the corresponding neighborhood of $e$ is called a canonical coordinate system of the second kind.

13. The Lie algebra $\mathbf{G}$ is said to be nilpotent if, for $n$ sufficiently large, $\operatorname{Ad} X_1 \cdots \operatorname{Ad} X_n = 0$ for any $n$-tuple $X_1, \dots, X_n$ of elements of $\mathbf{G}$. Prove that $\exp: \mathbf{G} \to G$ and the map $\phi$ of Exercise 12 have everywhere nonzero Jacobian in this case.
14. If $G$ is connected, show that every element $g \in G$ can be written in the form $\exp(Y_1) \cdots \exp(Y_r)$ for some choice $Y_1, \dots, Y_r$ of elements of $\mathbf{G}$.

The following exercises will elucidate the “classical groups.”

15. Let $V$ be a finite-dimensional real vector space. Prove that the set of invertible linear transformations is a Lie group, denoted by $GL(V)$. Its Lie algebra is $E(V)$, the space of all linear operators $V \to V$. Similarly, if $V$ is a complex vector space, with $GL(V, C)$ the group of all complex-linear isomorphisms $V \to V$, its Lie algebra is $E(V, C)$, the complex-linear operators $V \to V$. If $V = R^n$, then $GL(V)$ becomes $GL(n, R)$, the group of $n \times n$ real invertible matrices. Similarly, if $V = C^n$, $GL(V, C)$ is $GL(n, C)$, the group of $n \times n$ complex invertible matrices.

16. Suppose $V$ is a vector space and $B(\ ,\ )$ is a bilinear, scalar-valued form on $V$. Let $G$ be the group of isomorphisms $g: V \to V$ such that $B(gu, gv) = B(u, v)$ for all $u, v \in V$; that is, $G$ is the subgroup of $GL(V)$ that preserves the form $B$. Prove that $\mathbf{G}$ consists of the operators $A: V \to V$ such that
$$B(Au, v) + B(u, Av) = 0.$$
The other “classical” matrix groups (in addition to $GL(n, R)$, $GL(n, C)$) can be obtained in this way. First suppose that $V$ is a real vector space of dimension $n$, and $B(\ ,\ )$ is a symmetric, positive definite form. $G$ is then essentially $O(n, R)$, the real orthogonal matrices. Show that its Lie algebra consists of the skew-symmetric $n \times n$ real matrices.

If $V$ is a complex vector space and $B(\ ,\ )$ a nondegenerate symmetric complex-linear form, $G$ is essentially $O(n, C)$, the $n \times n$ complex-orthogonal matrices.

If $V$ is a real vector space of dimension $n$, and if $B(\ ,\ )$ is a symmetric bilinear form that, when brought to “normal form,” has $p$ plus and $(n - p)$ minus signs, $G$ is denoted by $O(p, n - p)$.

If $B$ is a nondegenerate skew-symmetric form, then $G$ is denoted by $Sp(n, R)$ or $Sp(n, C)$, according to whether the form is real or complex. Determine its Lie algebra.

Suppose $V$ is a complex vector space, but the form $B(\ ,\ )$ is Hermitian bilinear; that is, it is bilinear as a real form, and in addition,
$$B(cu, v) = cB(u, v), \qquad B(u, cv) = \bar{c}B(u, v) \quad \text{for } u, v \in V,\ c \in C.$$
If $B(u, u) > 0$ for all nonzero $u \in V$, the group of complex-linear automorphisms preserving $B$ is denoted by $U(B)$. In terms of matrices, it can be identified with $U(n)$, the group of $n \times n$ complex unitary matrices. Determine its Lie algebra.
If $V$ is a direct sum of complex subspaces $V_1 \oplus V_2$ with
$$B(V_1, V_2) = 0; \quad B(u_1, u_1) > 0; \quad B(u_2, u_2) < 0 \quad \text{for nonzero } u_1 \in V_1,\ u_2 \in V_2,$$
and complex $\dim V_1 = p$, $\dim V_2 = q$, then $U(B)$, when realized as a matrix group, is denoted by $U(p, q)$. Determine its Lie algebra.

17. Let $SL(n, R)$ and $SL(n, C)$ be the subgroups of determinant 1 matrices. Show that their Lie algebras are the $n \times n$ matrices of trace zero. Define:
$$SO(n, R) = O(n, R) \cap SL(n, R), \qquad SO(n, C) = O(n, C) \cap SL(n, C), \qquad SU(n) = U(n) \cap SL(n, C).$$

18. Prove that $SL(n, R)$, $GL(n, C)$, $SL(n, C)$, $SO(n, R)$, $SO(n, C)$, $U(n)$, $SU(n)$ are all connected. However, $GL(n, R)$, $O(n, R)$, $O(n, C)$ are disconnected and have two components. Find the formula for the dimension of each of these groups.
11 Classical Mechanics of Particles and Continua

Mechanics of a Single Particle

Our aim in this chapter is to give a survey of some topics in mechanics that are of interest from a geometer's point of view. Let us start off at the most elementary level, with Newton's law of motion for a particle moving in Euclidean 3-space, which we denote by $R^3$. Conforming with the notations in books on mechanics, points of $R^3$ are denoted by $\mathbf{r}$. The Euclidean dot product is $\mathbf{r} \cdot \mathbf{r}'$. It is a symmetric, positive-definite bilinear form. The vector, or cross, product is $\mathbf{r} \times \mathbf{r}'$. It will be assumed that the reader is familiar with the rules of this Euclidean vector algebra and analysis. Consider a particle moving with time $t$, analytically given by a curve $t \to \mathbf{r}(t)$. If $m$ is its mass and if $\mathbf{F}(\mathbf{r}, \dot{\mathbf{r}}, t)$ is its force law, Newton's equation of motion is
$$m \frac{d^2\mathbf{r}}{dt^2} = \mathbf{F}\Bigl(\mathbf{r}, \frac{d\mathbf{r}}{dt}, t\Bigr). \tag{11.1}$$
This force law $\mathbf{F}$ is then essentially a mapping $T(R^3) \times R \to R^3$, which must be prescribed by the physical theory with which one is involved. The main physical theories (for example, gravitation, electromagnetism, and fluid mechanics) provide a distinctive way of prescribing a force law. Note how strongly the vector-space structure of $R^3$ enters into the equation. First, the solutions for the force-free case are straight lines. Second, to equate both sides of (11.1), we are relying on the fact that the tangent vector to the curve, $d\mathbf{r}/dt$, can be identified with a vector in $R^3$ itself, a characteristic property of vector spaces. Notice that the theory is “covariant” under the group of affine transformations of $R^3$; that is, if $\phi: R^3 \to R^3$ is of the form
$$\phi(\mathbf{r}) = A(\mathbf{r}) + \mathbf{a},$$
where $A$ is a linear transformation $R^3 \to R^3$, $\mathbf{a} \in R^3$, and if $\mathbf{r}'(t) = \phi(\mathbf{r}(t))$ is the transformed motion, then
$$m \frac{d^2\mathbf{r}'}{dt^2} = \mathbf{F}'(\mathbf{r}', \dot{\mathbf{r}}', t)$$
with
$$\mathbf{F}'(\mathbf{r}', \dot{\mathbf{r}}', t) = A\,\mathbf{F}(\phi^{-1}(\mathbf{r}'), A^{-1}\dot{\mathbf{r}}', t). \tag{11.2}$$
Here, “covariance” is a rather vague term of course: If the solutions of (11.1) are subjected to an arbitrary diffeomorphism of $R^3$, (11.1) is transformed into some second-order differential equation, but of a considerably more complicated type than (11.2). Thus the criterion of “covariance” can be regarded as esthetic; one could pinpoint the affine group by requiring that the diffeomorphism preserve the force laws that are linear in the indicated variable. After the equations are recast into the form of the Euler equations of the calculus of variations, or the Hamilton equations of Hamilton-Jacobi theory, there will be revealed a more subtle “covariance” with respect to a much larger group. (Perhaps one can look upon this as the physical reason for introducing these mathematical elaborations into mechanics.)

The metric structure of $R^3$ so far has played no role. It enters, for example, with the idea of “energy.” The kinetic energy $T$ is defined by
$$T = \frac{1}{2} m \left|\frac{d\mathbf{r}}{dt}\right|^2, \tag{11.3}$$
with $|d\mathbf{r}/dt|^2$ defined as $(d\mathbf{r}/dt) \cdot (d\mathbf{r}/dt)$. Then
$$\frac{dT}{dt} = \mathbf{F} \cdot \frac{d\mathbf{r}}{dt}. \tag{11.4}$$
Suppose that $\mathbf{F}$ is independent of $\dot{\mathbf{r}}$ and $t$. Equation (11.4) suggests that we use $\mathbf{F}$ to define a 1-differential form $\omega$ on $R^3$ by requiring that
$$\omega(\mathbf{u}) = \mathbf{F}(\mathbf{r}) \cdot \mathbf{u}$$
for each $\mathbf{r} \in R^3$, each $\mathbf{u} \in R^3_{\mathbf{r}}$, which is identified with $R^3$ itself. Then, if $\omega = -dV$, where $V \in F(R^3)$, we see that
$$\frac{d}{dt}\bigl(T + V(\mathbf{r}(t))\bigr) = 0;$$
that is, the “total energy” $E = T + V$ is constant along the motion or, as the physicists say, is a “conserved quantity.” The (local) condition for the existence of such a potential-energy function $V$ is, of course, that $d\omega = 0$, which means in the language of Euclidean vector analysis that the curl of the vector field $\mathbf{F}$ is zero.
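Conservation of $E = T + V$ can be observed numerically. The following is a minimal sketch, assuming an illustrative potential $V(x, y, z) = \tfrac{1}{2}(x^2 + 2y^2 + 3z^2)$, a mass of 2, and a velocity-Verlet integrator; none of these choices comes from the text.

```python
# Sketch: a particle of mass M moving under F = -grad V, with the
# assumed potential V(x, y, z) = (x^2 + 2 y^2 + 3 z^2) / 2.

M = 2.0
WEIGHTS = (1.0, 2.0, 3.0)

def potential(r):
    return 0.5 * sum(w * x * x for w, x in zip(WEIGHTS, r))

def force(r):
    """F(r) = -grad V(r); independent of velocity and time."""
    return tuple(-w * x for w, x in zip(WEIGHTS, r))

def total_energy(r, v):
    kinetic = 0.5 * M * sum(x * x for x in v)   # T = (1/2) m |dr/dt|^2
    return kinetic + potential(r)

def verlet_step(r, v, dt):
    """One velocity-Verlet step for m r'' = F(r)."""
    a = tuple(f / M for f in force(r))
    r_new = tuple(x + dt * vx + 0.5 * dt * dt * ax
                  for x, vx, ax in zip(r, v, a))
    a_new = tuple(f / M for f in force(r_new))
    v_new = tuple(vx + 0.5 * dt * (ax + ax2)
                  for vx, ax, ax2 in zip(v, a, a_new))
    return r_new, v_new

r, v = (1.0, 0.0, 0.5), (0.0, 1.0, 0.0)
e_start = total_energy(r, v)
for _ in range(5000):
    r, v = verlet_step(r, v, 1e-3)
energy_drift = abs(total_energy(r, v) - e_start)
```

The residual `energy_drift` reflects only the discretization error of the integrator; for the exact motion the derivation above gives a strictly constant $E$.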
The conservation “laws” of momentum and angular momentum are also readily introduced:
$$\mathbf{p} = m \frac{d\mathbf{r}}{dt} \ \text{(the linear momentum)}, \qquad \mathbf{L} = \mathbf{r} \times \mathbf{p} \ \text{(the angular momentum)}.$$
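Systems of Particles

Suppose we are given $s$ particles, each moving in 3-space, subject to given external forces and to mutual interaction forces. Suppose $\mathbf{r}_1(t), \dots, \mathbf{r}_s(t)$ are their paths of motion, with masses $m_1, \dots, m_s$. Introduce indices $1 \le a, b, \dots \le s$. Newton's equations of motion then look something like this:
$$m_a \frac{d^2\mathbf{r}_a}{dt^2} = \mathbf{F}_a\Bigl(\mathbf{r}_1, \dots, \mathbf{r}_s, \frac{d\mathbf{r}_1}{dt}, \dots, \frac{d\mathbf{r}_s}{dt}, t\Bigr).$$
The total kinetic energy $T$ is given by
$$T = \frac{1}{2} \sum_a m_a \left|\frac{d\mathbf{r}_a}{dt}\right|^2.$$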
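Again a potential function $V(\mathbf{r}_1, \dots, \mathbf{r}_s)$ can be introduced by the condition
$$\frac{d}{dt} V(\mathbf{r}_1(t), \dots, \mathbf{r}_s(t)) = -\sum_a \mathbf{F}_a \cdot \frac{d\mathbf{r}_a}{dt}$$
for each curve $t \to (\mathbf{r}_1(t), \dots, \mathbf{r}_s(t))$ in $R^{3s}$. Conservation of total energy $E = T + V$ will result. Total momentum
$$\mathbf{p} = \mathbf{p}_1 + \dots + \mathbf{p}_s = \sum_a m_a \frac{d\mathbf{r}_a}{dt}$$
and total angular momentum
$$\mathbf{L} = \mathbf{L}_1 + \dots + \mathbf{L}_s = \sum_a \mathbf{r}_a \times \mathbf{p}_a$$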
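can now be introduced. These quantities are most useful if the forces $\mathbf{F}_a$ are of a special type, namely,
$$\mathbf{F}_a = \sum_b \mathbf{F}_{ab},$$
where $\mathbf{F}_{ab}$, for $b \ne a$, is interpreted as the force that the $b$th particle exerts on the $a$th particle, and $\mathbf{F}_{aa}$ is interpreted as the action of the external forces on the $a$th particle. Then
$$\frac{d\mathbf{p}}{dt} = \sum_{a, b} \mathbf{F}_{ab}. \tag{11.5}$$
This suggests the simplifying condition
$$\mathbf{F}_{ab} = -\mathbf{F}_{ba} \quad \text{for } a \ne b. \tag{11.6}$$
(It is just the quantitative version of Newton's law: “Action = reaction.”) With this condition,
$$\frac{d\mathbf{p}}{dt} = \sum_a \mathbf{F}_{aa}.$$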
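(11.7) The right-hand side is the sum of the external forces. In particular, total momentum is conserved if the system is isolated, that is, if there are no external forces. Turn now to total angular momentum:
$$\frac{d\mathbf{L}}{dt} = \sum_{a, b} \mathbf{r}_a \times \mathbf{F}_{ab} = \frac{1}{2} \sum_{a \ne b} (\mathbf{r}_a - \mathbf{r}_b) \times \mathbf{F}_{ab} + \sum_a \mathbf{r}_a \times \mathbf{F}_{aa}.$$
This suggests the simplifying condition
$$(\mathbf{r}_a - \mathbf{r}_b) \times \mathbf{F}_{ab} = 0 \quad \text{for } a \ne b, \tag{11.8}$$
which means that the forces of interaction $\mathbf{F}_{ab}$ are directed along the line joining $\mathbf{r}_a$ to $\mathbf{r}_b$. Again the underlying geometry enters in a strong way at this point. With (11.7) satisfied, we have
$$\frac{d\mathbf{L}}{dt} = \sum_a \mathbf{r}_a \times \mathbf{F}_{aa}. \tag{11.9}$$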
Again the right-hand side involves only the external forces. Equations (11.7) and (11.9) then express precisely the desirable features of linear and angular momentum. In turn, they arise from Lie groups acting on Euclidean space, in a way we shall describe later.
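The cancellation of the interaction terms can be verified exactly on a small example. The sketch below assumes hypothetical spring-like interaction forces $\mathbf{F}_{ab} = k_{ab}(\mathbf{r}_b - \mathbf{r}_a)$ with $k_{ab} = k_{ba}$, which satisfy both simplifying conditions; the positions and couplings are arbitrary integers chosen for illustration.

```python
# Sketch: with F_ab = -F_ba directed along the line joining r_a and r_b,
# the interaction forces contribute nothing to dp/dt or dL/dt.

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def scale(c, u):
    return tuple(c * a for a in u)

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

POSITIONS = [(0, 0, 0), (1, 0, 0), (0, 2, 1), (3, -1, 2)]   # assumed sample data

def interaction(a, b):
    """F_ab: force exerted on particle a by particle b (hypothetical spring law)."""
    coupling = 1 + a + b                  # symmetric in a and b
    return scale(coupling, sub(POSITIONS[b], POSITIONS[a]))

total_force = (0, 0, 0)
total_torque = (0, 0, 0)
n = len(POSITIONS)
for a in range(n):
    for b in range(n):
        if a != b:
            f_ab = interaction(a, b)
            total_force = add(total_force, f_ab)
            total_torque = add(total_torque, cross(POSITIONS[a], f_ab))
```

Because the arithmetic is done in integers, both sums vanish exactly, so only external forces can change the total linear and angular momentum in such a model.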
Constrained Motion

As in the preceding section, consider a system of particles moving according to the laws of motion, (11.5). The motion of the individual particles, $\mathbf{r}_1(t), \dots, \mathbf{r}_s(t)$, can be consolidated into a single curve $\mathbf{r}(t) = (\mathbf{r}_1(t), \dots, \mathbf{r}_s(t))$ in $R^{3s}$. Suppose we regard the particles as being constrained to be on a submanifold $N$ of $R^{3s}$. Again the problem is indeterminate until we prescribe, from other physical reasoning, how this affects the force law. “D'Alembert's principle” gives the relevant information. (See Goldstein [1] for a fuller discussion of this topic.) Suppose $\mathbf{F}_1, \dots, \mathbf{F}_s$ would be the forces if there were no constraint. Then, consider the vectors:
$$m_1 \frac{d^2\mathbf{r}_1}{dt^2} - \mathbf{F}_1, \quad \dots, \quad m_s \frac{d^2\mathbf{r}_s}{dt^2} - \mathbf{F}_s.$$
They would be zero if there were no constraints. Thus they should be interpreted as the “forces of the constraint.” They can be considered together as giving a vector in $R^{3s}$. D'Alembert's principle now asserts that this vector, considered as a tangent vector at the point of $R^{3s}$ given by $\mathbf{r}(t) = (\mathbf{r}_1(t), \dots, \mathbf{r}_s(t))$, is perpendicular to the submanifold of constraint $N$. Again, Euclidean geometry enters into one of the basic assumptions of dynamics. (We keep emphasizing this point because it is just the understanding of the basic “principles of interaction” that constitutes the main unsolved problem of elementary particle physics.)

This, in turn, suggests that we reformulate the equations (11.5) so as to put the Euclidean geometry of $R^{3s}$ more in the foreground. Consider $\mathbf{r} = (\mathbf{r}_1, \dots, \mathbf{r}_s)$ and $\dot{\mathbf{r}} = (\dot{\mathbf{r}}_1, \dots, \dot{\mathbf{r}}_s)$ as vectors of $R^{3s}$. Consolidate the force function appearing on the right-hand side of (11.5) into a function $\mathbf{F}: R^{3s} \times R^{3s} \times R \to R^{3s}$:
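$$\mathbf{F}(\mathbf{r}, \dot{\mathbf{r}}, t) = (\mathbf{F}_1(\mathbf{r}_1, \dots, \mathbf{r}_s, \dot{\mathbf{r}}_1, \dots, \dot{\mathbf{r}}_s, t), \dots).$$
Then (11.5) takes the form
$$m \frac{d^2\mathbf{r}}{dt^2} = \mathbf{F}\Bigl(\mathbf{r}, \frac{d\mathbf{r}}{dt}, t\Bigr), \tag{11.10}$$
where $m$, instead of being a scalar as for the motion of a single particle, is an operator $R^{3s} \to R^{3s}$; that is,
$$m(\mathbf{r}_1, \dots, \mathbf{r}_s) = (m_1 \mathbf{r}_1, \dots, m_s \mathbf{r}_s).$$
The momentum $\mathbf{p}$ is now the vector $m(d\mathbf{r}/dt)$. The kinetic energy $T$ is now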
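given by
$$T = \frac{1}{2}\, m\Bigl(\frac{d\mathbf{r}}{dt}\Bigr) \cdot \frac{d\mathbf{r}}{dt},$$
where the dot product $(\cdot)$ is that which is usual for $R^{3s}$; that is,
$$(\mathbf{r}_1, \dots, \mathbf{r}_s) \cdot (\mathbf{r}_1', \dots, \mathbf{r}_s') = \mathbf{r}_1 \cdot \mathbf{r}_1' + \dots + \mathbf{r}_s \cdot \mathbf{r}_s'.$$
D'Alembert's principle then requires that the vector
$$m \frac{d^2\mathbf{r}}{dt^2} - \mathbf{F}\Bigl(\mathbf{r}, \frac{d\mathbf{r}}{dt}, t\Bigr) \tag{11.11}$$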
is perpendicular to the tangent subspace $N_{\mathbf{r}(t)}$ to $N$ in $R^{3s}$. There are various techniques available for finding explicitly the equations of motion implicit in (11.11). One method proceeds via the calculus of variations: If the equations of motion, (11.10), can be interpreted as the Euler equations of a variational problem, the problem can be regarded as a special case of a “Lagrange problem” obtained by putting constraints on the curves (that is, that they lie on $N$) that are in competition to give a stationary value to the Lagrangian. However, in the simplest case, where the forces are derivable from a potential $V(\mathbf{r})$, there is a more direct method available for writing down the explicit equations of motion as a system of Hamiltonian equations. We shall present this method, leaving the details of the calculation as an exercise.

Suppose the submanifold $N$ is locally describable by functions $\mathbf{r}(q_1, \dots, q_n)$, where $q = (q_1, \dots, q_n)$ is a point in a domain $D$ of $R^n$. Choose indices $i, j, k, \dots$ running from 1 to $n$. Suppose the potential function $V$, when restricted to $N$, becomes a function $V(q_1, \dots, q_n)$ of these variables. The curve $t \to \mathbf{r}(t)$ defines a curve $t \to q(t)$ such that
$$\mathbf{r}(t) = \mathbf{r}(q(t)).$$
Now $T$ takes the form
$$T = \frac{1}{2} \sum_{i, j} G_{ij}(q) \frac{dq_i}{dt} \frac{dq_j}{dt},$$
where $G_{ij}(q)$ are functions of $q_1, \dots, q_n$. Further, the $n \times n$ matrix function of $q = (q_1, \dots, q_n)$, $G = (G_{ij})$, is nonsingular; that is, $G^{-1}$ exists. We introduce new functions of $t$:
$$p_i = \sum_j G_{ij}(q) \frac{dq_j}{dt}. \tag{11.12}$$
These variables, $p = (p_1, \dots, p_n)$, are defined as the momenta of the particles defined by the constraints. Consider $E$, the total energy $T + V$. It is a function
of the variables $q_1, \dots, q_n, (dq_1/dt), \dots, (dq_n/dt)$. Equation (11.12) enables us to describe it also as a function of $q_1, \dots, q_n, p_1, \dots, p_n$. Explicitly,
$$E = \frac{1}{2} \sum_{i, j} (G^{-1}(q))_{ij}\, p_i p_j + V(q_1, \dots, q_n). \tag{11.13}$$
Now one can verify by straightforward but tedious calculations that equations (11.11) are equivalent to the following Hamiltonian equations:
$$\frac{dq_i}{dt} = \frac{\partial E}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial E}{\partial q_i}. \tag{11.14}$$
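The equivalence between D'Alembert's principle and Hamilton's equations can be checked on the simplest constrained system. The sketch below assumes a planar pendulum (one particle, $N$ a circle of radius $\ell$, gravity as the only applied force) and the normalization $T = \tfrac{1}{2} G \dot{q}^2$, $p = G\dot{q}$ with $G = m\ell^2$; the numerical values and function names are illustrative, and the text's own normalization of $G_{ij}$ may differ.

```python
import math

M, ELL, GRAV = 2.0, 1.5, 9.81   # hypothetical mass, rod length, gravitational acceleration

def energy(q, p):
    """E = (1/2) G^{-1} p^2 + V(q), with G = M*ELL**2 and V = -M*GRAV*ELL*cos(q)."""
    return p * p / (2.0 * M * ELL ** 2) - M * GRAV * ELL * math.cos(q)

def hamilton_rhs(q, p):
    """(dq/dt, dp/dt) = (dE/dp, -dE/dq)."""
    return p / (M * ELL ** 2), -M * GRAV * ELL * math.sin(q)

def rk4_step(q, p, dt):
    k1q, k1p = hamilton_rhs(q, p)
    k2q, k2p = hamilton_rhs(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = hamilton_rhs(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = hamilton_rhs(q + dt * k3q, p + dt * k3p)
    return (q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6.0,
            p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0)

def constraint_residual(q, p):
    """Component of m r'' - F along the tangent of N, for r(q) = ELL*(sin q, -cos q)."""
    qdot, pdot = hamilton_rhs(q, p)
    qddot = pdot / (M * ELL ** 2)
    tangent = (ELL * math.cos(q), ELL * math.sin(q))        # dr/dq
    second = (-ELL * math.sin(q), ELL * math.cos(q))        # d2r/dq2
    accel = tuple(tangent[i] * qddot + second[i] * qdot ** 2 for i in range(2))
    applied = (0.0, -M * GRAV)                              # gravity
    reaction = tuple(M * accel[i] - applied[i] for i in range(2))
    return reaction[0] * tangent[0] + reaction[1] * tangent[1]

q, p = 1.0, 0.0
e_start = energy(q, p)
max_residual = 0.0
for _ in range(2000):
    q, p = rk4_step(q, p, 1e-3)
    max_residual = max(max_residual, abs(constraint_residual(q, p)))
energy_drift = abs(energy(q, p) - e_start)
```

Along the computed trajectory the force of constraint has no tangential component up to rounding, which is D'Alembert's condition, and $E$ is conserved up to the integrator's error.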
We shall investigate such equations in more detail in Part 2 and find a deeper reason why (11.13) and (11.14) take this simple form.

Continuum Mechanics

Suppose we have $s$ particles of mass $m_1, \dots, m_s$ moving in $R^3$. Let $\mathbf{r}_1(t), \dots, \mathbf{r}_s(t)$ be the position curves as these particles evolve in time. Suppose $G$ is a group of transformations acting on $R^3$. Suppose that there is a curve $t \to g(t)$ in $G$ such that
$$\mathbf{r}_1(t) = g(t)\mathbf{r}_1(0), \quad \dots, \quad \mathbf{r}_s(t) = g(t)\mathbf{r}_s(0).$$
Suppose also that
$$g\,\mathbf{r}_a(0) = \mathbf{r}_a(0) \quad \text{for } 1 \le a \le s$$
implies that $g$ is the identity element of $G$. Then the different possible positions of the particles may be identified with the group $G$ itself. As Arnold [2] has recently emphasized, this enables one to interpret certain ideas in mechanics in terms of the differential geometry of the underlying manifold of $G$. In this section, we carry out a part of this program for the case of fluid motion. Here one thinks of a particle at every point of $R^3$, and $G$ as the group of all diffeomorphisms of $R^3$. We shall aim to derive the standard “Eulerian” description of continuum mechanics (see Prager [1]).

In fact it is useful to be even more general. Suppose $M$ is a manifold and $t \to g(t)$ is a one-parameter family of diffeomorphisms of $M$. For each value of $t$, define a vector field $X_t \in V(M)$ as follows: For $p \in M$, $X_t(p)$ is the tangent vector to the curve
$$u \to g(u)g(t)^{-1}p \quad \text{at } u = t.$$
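Physically, $X_t$ is the “Eulerian” velocity field corresponding to the “fluid motion” defined by $g(t)$; that is, $X_t(p)$ is the velocity vector of the particle that, at time $t$, is at the point $p$. Let us compute $X_t$ in terms of its action on functions on $M$. For $f \in F(M)$,

$$X_t(f)(p) = \frac{\partial}{\partial u} f\bigl(g(u)g(t)^{-1}p\bigr)\Big|_{u=t},$$

or

$$X_t(f)(p) = \frac{\partial}{\partial u}\bigl(g(u)^*(f)\bigr)\bigl(g(t)^{-1}p\bigr)\Big|_{u=t},$$

or

$$X_t(f) = g(t)^{-1*}\,\frac{\partial}{\partial t}\bigl(g(t)^*(f)\bigr) \quad \text{for } f \in F(M). \qquad (11.15b)$$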
On the other hand, one can form the “Lagrangian” velocity field $Y_t$: $Y_t(p)$ is the “velocity” vector of the particle that at time $t = 0$ was at $p$. Analytically, for $f \in F(M)$,

$$Y_t(f)(p) = \frac{\partial}{\partial t}\bigl(g(t)^*(f)(p)\bigr).$$
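As an illustration of the two velocity fields, the following is a worked sketch under assumptions that are not in the text: $M = R$ and the hypothetical one-parameter family $g(t)x = e^{t}x$. It only checks the definitions of $X_t$ and $Y_t$ on this one example.

```latex
% Assumed example (not from the text): M = R, g(t)x = e^{t}x, so g(t)^{-1}x = e^{-t}x.
\begin{align*}
X_t(p) &= \frac{\partial}{\partial u}\bigl(e^{u}e^{-t}p\bigr)\Big|_{u=t} = p,
  \qquad\text{so } X_t = x\,\frac{\partial}{\partial x}, \\
Y_t(f)(p) &= \frac{\partial}{\partial t}\, f(e^{t}p) = e^{t}p\,f'(e^{t}p), \\
g(t)^*\bigl(X_t(f)\bigr)(p) &= (x f')(e^{t}p) = e^{t}p\,f'(e^{t}p) = Y_t(f)(p).
\end{align*}
```

In this example the Eulerian field is independent of $t$, while the particle that started at $p$ moves with velocity $e^{t}p$; the last line shows the two descriptions agree after pulling back by $g(t)$.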
We can express the “conservation of mass” in this form. Suppose $\omega$ is a fixed volume-element differential form on $M$, and $\rho(p, t)$ is the density function of mass; that is, if $D$ is a domain in $M$, then $\int_D \rho_t\,\omega$ is the total mass of the “fluid” in the region $D$ at time $t$, where $\rho_t$ is the function $p \to \rho(p, t)$ on $M$. Then “conservation of mass” in integral form is the relation

$$\int_{g(t)D} \rho_t\,\omega = \int_D \rho_0\,\omega. \qquad (11.18)$$

From Chapter 7, we see that the left-hand side of this relation is equal to

$$\int_D g(t)^*(\rho_t\,\omega). \qquad (11.19)$$

Since the right-hand side of (11.18) is independent of $t$, we can differentiate (11.19) with respect to $t$ and set the result equal to zero:

$$\int_D \frac{\partial}{\partial t}\bigl(g(t)^*(\rho_t\,\omega)\bigr) = 0. \qquad (11.20)$$

Now, if $\theta$ is a differential form,

$$\frac{\partial}{\partial t}\bigl(g(t)^*(\theta)\bigr) = Y_t(\theta) = g(t)^*(X_t(\theta)), \qquad (11.21)$$

where $Y_t(\theta)$ denotes the Lie derivative of the form $\theta$ by the vector field $Y_t$. (See the exercises for the proof of this. Note that (11.16) and (11.17) assert this when $\theta$ is a 0-form.) Hence

$$\frac{\partial}{\partial t}\bigl(g(t)^*(\rho_t\,\omega)\bigr) = g(t)^*\Bigl(X_t(\rho_t\,\omega) + \frac{\partial \rho_t}{\partial t}\,\omega\Bigr),$$

and (11.20) takes the form

$$\int_D g(t)^*\Bigl(X_t(\rho_t\,\omega) + \frac{\partial \rho_t}{\partial t}\,\omega\Bigr) = 0,$$

which equals

$$\int_{g(t)D} \Bigl(X_t(\rho_t\,\omega) + \frac{\partial \rho_t}{\partial t}\,\omega\Bigr).$$

Since $D$ is an arbitrary open set, this can be true only if the integrand is zero; that is,

$$X_t(\rho_t\,\omega) + \frac{\partial \rho_t}{\partial t}\,\omega = 0. \qquad (11.22)$$

This is the equation of continuity. We leave to the exercises the task of writing out what this means for $M = R^3$, and show there that it reduces to the form of the continuity equation that one finds in books on fluid mechanics (for example, Prager [1]).

Suppose now that we want to write down the “equations of motion” for the fluid. In Prager's book [1] one finds a clear description of how to do this for Euclidean space; that is, $M = R^3$. Since, as an exercise in differential geometry, we would like to continue to work with a general manifold, let us adopt one simple way of freeing the argument from Euclidean geometry.
Let $\theta$ be a 1-differential form on $M$. Suppose that $W$ is a vector field on $M$. Let $\omega$ continue to be a volume-element differential form on $M$. Then, if $D$ is a domain in $M$, $\int_D \theta(W)\,\omega$ can be considered as the “$\theta$-component of the total effect of the vector field $W$ on the domain $D$.” Suppose that $F$ is a vector field representing the “volume” forces on a domain $D$; that is, $\int_D \rho\,\theta(F)\,\omega$ is the “$\theta$-component of the volume forces acting on the domain $D$.” ($\rho$ is a function on $M$ giving the density of mass. We are working here at a fixed time. In essence, we are analyzing the “kinematics” for forces.)

The discipline of continuum mechanics can also encounter forces that come from the region surrounding a bit of fluid, expressed analytically by the “stress tensor.” We shall not present the analysis of the physical idea, but only state how this can be phrased in differential geometric language. Let $T$ be a tensor field on $M$ representing this stress. At $p \in M$, $T$ is a skew-symmetric $(m-1)$-multilinear mapping $M_p \times \cdots \times M_p \to M_p$ ($m = \dim M$). Thus, for $p \in M$, $v_1, \dots, v_{m-1} \in M_p$, $\theta(T(v_1, \dots, v_{m-1}))$ is an $(m-1)$-covector on $M$. We can then consider the symbol $\theta(T)$ defining, for each $\theta$, an $(m-1)$-differential form on $M$. (In fact we can also regard $T$ as defined by an $F(M)$-linear map $F^1(M) \to F^{m-1}(M)$ of 1-forms into $(m-1)$-forms.) Physically, $T(v_1, \dots, v_{m-1})$ should be thought of as the “stress” force exerted on the piece of hypersurface that is tangent to the vectors $v_1, \dots, v_{m-1}$. Thus, if $D$ is a domain in $M$ with boundary $\partial D$, $\int_{\partial D} \theta(T)$ should be thought of as the total $\theta$-component of the stress force on the domain $D$. This is equal, by Stokes' theorem, to $\int_D d(\theta(T))$. Thus

$$\int_D \bigl(\rho\,\theta(F)\,\omega + d(\theta(T))\bigr) \qquad (11.23)$$
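should represent the $\theta$-component of the force acting on the domain $D$ at a fixed time. Now, we apply Newton's law of motion in the form:

“Rate of change of $\theta$-component of momentum = forces acting.”

Suppose a group of particles making up the fluid starts out at $t = 0$ to occupy the domain $D$. At time $t$, they will be in domain $g(t)D$. Their $\theta$-component of total momentum will be

$$\int_{g(t)D} \theta(X_t)\,\rho_t\,\omega \qquad (11.24)$$

$$= \int_D g(t)^*\bigl(\theta(X_t)\,\rho_t\,\omega\bigr). \qquad (11.25)$$

Equation (11.21) tells us how to differentiate this with respect to $t$:

$$\frac{d}{dt}\int_D g(t)^*\bigl(\theta(X_t)\,\rho_t\,\omega\bigr) = \int_{g(t)D} \Bigl( X_t\bigl(\theta(X_t)\,\rho_t\,\omega\bigr) + \theta\Bigl(\frac{\partial X_t}{\partial t}\Bigr)\rho_t\,\omega + \theta(X_t)\,\frac{\partial \rho_t}{\partial t}\,\omega \Bigr).$$

Equating this to the expression (11.23) for the force acting on this bunch of particles gives the identity:

$$\int_{g(t)D} \bigl(\rho_t\,\theta(F_t)\,\omega + d(\theta(T_t))\bigr) = \int_{g(t)D} \Bigl( X_t\bigl(\theta(X_t)\,\rho_t\,\omega\bigr) + \theta\Bigl(\frac{\partial X_t}{\partial t}\Bigr)\rho_t\,\omega + \theta(X_t)\,\frac{\partial \rho_t}{\partial t}\,\omega \Bigr). \qquad (11.26)$$

Since this is to hold for all domains $D$, we have

$$\theta(F_t)\,\rho_t\,\omega + d(\theta(T_t)) = X_t\bigl(\theta(X_t)\,\rho_t\,\omega\bigr) + \theta\Bigl(\frac{\partial X_t}{\partial t}\Bigr)\rho_t\,\omega + \theta(X_t)\,\frac{\partial \rho_t}{\partial t}\,\omega. \qquad (11.27)$$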
This is the coordinate-free version of Euler's equations of fluid motion. Let us make it explicit for $M = R^3$, with coordinates $(x_i)$, $i = 1, 2, 3$. Suppose

$$X_t = u_i(x, t)\,\frac{\partial}{\partial x_i}, \qquad \omega = dx_1 \wedge dx_2 \wedge dx_3.$$

Suppose $T_{ik}$ is defined by the condition

where $\epsilon_{ijk}$ is the 3-index, skew-symmetric tensor (with $\epsilon_{123} = 1$). Now

$$X_t(\omega) = \frac{\partial u_i}{\partial x_i}\,\omega.$$

Hence (11.27) becomes

(11.29)

This is the fundamental dynamical equation of continuum mechanics. Proceeding further requires the development of additional physical ideas concerning the relation between the stress tensor and the kinematic data. We shall leave it at this point.
Exercises

1. Prove (11.21).

2. Work out the explicit form for the equation of continuity, (11.22), for the case of fluid motion in $R^3$.

3. Work out the equations of motion of a rigid body (that is, a system of particles constrained to move in such a way that the distances between individual particles remain constant) by using d'Alembert's principle or the equation (11.9) for rate of change of angular momentum, or both. Compare with the more indirect approach that uses the calculus of variations and “moving frames” given in Chapters 16 and 33.
Part 2

THE HAMILTON-JACOBI THEORY AND CALCULUS OF VARIATIONS

12 Differential Forms and Variational Problems
Although we are concerned in this book only with the relatively simple theory of variational problems with one independent variable, in this chapter we shall briefly describe a general method for dealing with the multiple integral problems. It follows Cartan's general approach to geometric problems; the idea is to “prolong” a variational problem given on a space to one defined by differential forms on a space sitting over the given one. This “standard” type of variational problem will be in optimally simple form; hence it will be called a canonical variational problem. We shall see that Stokes' formula can be used in a very systematic manner to derive the first and second variation formulas for this sort of variational problem.

To describe a canonical variational problem, we should be given the following objects:

(a) A manifold $M$.

(b) An $r$-differential form $\theta$ on $M$.

(c) A set of differential forms, denoted by $I$, on $M$ which forms a differential ideal; that is, an ideal in the algebraic sense with respect to the exterior algebra structure on all differential forms of $M$ and, in addition, which is closed under exterior differentiation $d$.

(d) A manifold $N$ with boundary $\partial N$. The dimension of $N$ is $r$. $N$ should be oriented, and should induce an orientation on $\partial N$ so that $r$-forms can be integrated over $N$. (We shall be using in this chapter the theory of integration on manifolds described in Chapter 11 of Part 1. Thus $\partial N$ does not have to be the whole point-set boundary of $N$, but just the $(r-1)$-dimensional part so that Stokes' formula is valid.)

Let $E(N, M)$ be the space of submanifold maps of $N$ onto $M$. Since $N$ and $M$ will be fixed throughout the discussion, we abbreviate $E(N, M)$ to $E$. Let $E(I)$ be the set of maps of $E$ that are integral maps of the ideal $I$, that is, the set of submanifold maps $\phi: N \to M$ such that

$$\phi^*(\omega) = 0 \quad \text{for all } \omega \in I.$$

The $r$-form $\theta$ defines a real-valued function $L$ on $E$ as follows: If $\phi$ is a submanifold map $N \to M$, then

$$L(\phi) = \int_N \phi^*(\theta).$$
The “problem” is to study the “critical points” of $L$ restricted to $E(I)$. The advantage of studying variational problems of this special type is that the first and second variation formulas are obtained very simply from Stokes' formula.

Now $E$ should be considered as an infinite dimensional manifold. This can be done in a formal way by introducing a ring of functions on $E$. Rather than trace through the general formalism, we simply indicate the obvious geometric meaning of the “tangent space” to $E$. This is the only feature of its manifold structure that we shall actually use. For $\phi \in E$, the “tangent space” to $E$ at $\phi$, denoted by $E_\phi$, is defined as follows: An element of $E_\phi$ is a map, typically denoted by $v$, of $N \to T(M)$ such that $v(p) \in M_{\phi(p)}$ for all $p \in N$. Thus, as in Fig. 2, $v$ is a vector field along the mapping, pointing outward into $M$; that is, $v$ is an “infinitesimal deformation” of $\phi$.

FIGURE 2

To make this precise, suppose that we are given a deformation of $\phi$, that is, a one-parameter family of maps, $t \to \phi_t$, of $N \to M$, with $\phi_0 = \phi$. Define the infinitesimal deformation as an element $v \in E_\phi$ by the following formula: For $p \in N$, $v(p)$ = the tangent vector to the curve $t \to \phi_t(p)$ at $t = 0$. There is another geometric way of looking at this: $t \to \phi_t$ defines a “curve” in $E$, and $v \in E_\phi$ is the “tangent vector” to this curve at $t = 0$.
The following formula (12.1) is the basic tool of this chapter, since it tells us how differential forms on $M$ behave to the first order when they are dragged along by a deformation of $\phi$. The reason that it is desirable to express geometric objects in terms of differential forms (rather than, say, other types of tensor fields) is that they do behave so simply.

$$\frac{\partial}{\partial t}\,\phi_t^*(\omega)\Big|_{t=0} = \phi^*(v \lrcorner d\omega) + d\phi^*(v \lrcorner \omega) \qquad (12.1)$$

for each differential form $\omega$ on $M$. There is a notational convention inherent in this formula. For $p \in N$, $v(p) \lrcorner \omega$ is the contraction of the form $\omega(\phi(p))$ by the tangent vector $v(p)$; hence it is a form on the tangent space to $M$ at $\phi(p)$ of one lower degree than $\omega$. $\phi^*(v(p) \lrcorner \omega)$ is the pullback of this form to give a form on the tangent space to $N$ at $p$; as $p$ varies over $N$, this gives a differential form on $N$ (which we denote by $\phi^*(v \lrcorner \omega)$), to which $d$ may be applied. This is just the second term on the right-hand side of (12.1). The first term on the right-hand side is defined similarly, with $d\omega$ replacing $\omega$. Note the similarity of this formula and the one involving $\omega$, a vector field $X$, and the Lie derivative of $\omega$ by $X$, namely,

$$X(\omega) = X \lrcorner d\omega + d(X \lrcorner \omega). \qquad (12.2)$$
In fact this formula may be considered as a special case of (12.1); it is obtained when we choose $N = M$, $\phi$ as the identity mapping, and $t \to \phi_t$ as the one-parameter group of $M$ generated by $X$.

Now we turn to the proof of (12.1). First (12.1) must be proved when $\omega$ is a 0-form, that is, a function, which we shall call $f$ to keep things straight. The second term on the right-hand side of (12.1) is zero, since a form of degree $-1$ is, by convention, zero. The value of the right-hand side at a point $p \in N$ is

$$\phi^*(v \lrcorner df)(p),$$

which is

$$df(v(p)) = v(p)(f),$$

which is, by the definition of $v(p)$,

$$\frac{\partial}{\partial t}\, f(\phi_t(p))\Big|_{t=0},$$

which is the left-hand side.

The next step in the proof of (12.1) is to show that if it is true for $\omega$, it is true for $d\omega$. The left-hand side is now

$$\frac{\partial}{\partial t}\,\phi_t^*(d\omega)\Big|_{t=0} = d\Bigl(\frac{\partial}{\partial t}\,\phi_t^*(\omega)\Big|_{t=0}\Bigr).$$

The right-hand side is $d\phi^*(v \lrcorner d\omega)$. They are equal if (12.1) is true for $\omega$. The final step is to show that if (12.1) is true for $\omega_1$ and $\omega_2$, it is true for $\omega_1 \wedge \omega_2$. We leave it to the reader to do this tedious but straightforward calculation. Finally (12.1) is true in general, since any form can be built up from 0-forms by applying $d$ and exterior product.

Return now to the variational problem defined by the differential ideal $I$ of differential forms and the $r$-form $\theta$. To study the behavior of $\phi \to L(\phi) = \int_N \phi^*(\theta)$, let us suppose that $t \to \phi_t$ is a deformation of $\phi$ by a one-parameter family of maps of $N \to M$. Let $v \in E_\phi$ be the corresponding infinitesimal deformation. Then

$$\frac{d}{dt}\,L(\phi_t)\Big|_{t=0} = \int_N \frac{\partial}{\partial t}\,\phi_t^*(\theta)\Big|_{t=0} = \int_N \phi^*(v \lrcorner d\theta) + \int_N d\phi^*(v \lrcorner \theta)$$

$$= \text{(using Stokes' formula)} \quad \int_N \phi^*(v \lrcorner d\theta) + \int_{\partial N} \phi^*(v \lrcorner \theta). \qquad (12.3)$$
This is the “first variation formula.” Now we are interested in the “critical points” of $L$ restricted to $E(I)$. If $\phi \in E(I)$, that is, if $\phi^*(\omega) = 0$ for all $\omega \in I$, we must find the subspace of $E_\phi$ corresponding to deformations that also lie in $E(I)$. But the condition for this is also obvious from (12.1), namely,

$$\phi^*(v \lrcorner d\omega) + d\phi^*(v \lrcorner \omega) = 0 \quad \text{for all } \omega \in I. \qquad (12.4)$$

The set of $v \in E_\phi$ satisfying (12.4), the “linear variational equations” of the partial differential equations defining the integral submanifolds of $I$, can be thought of as the “tangent space to $E(I)$ at the point” $\phi$, and will accordingly be denoted by $E(I)_\phi$. We can then say that a map $\phi: N \to M$ is an extremal of the variational problem if

$$\int_N \phi^*(v \lrcorner d\theta) = 0 \quad \text{for all } v \in E(I)_\phi. \qquad (12.5)$$

We shall say that $\phi$ is an extremal of the first kind if (12.5) is true for all $v \in E_\phi$; that is, if

$$\phi^*(X \lrcorner d\theta) = 0 \quad \text{for all vector fields } X \text{ on } M, \qquad (12.6)$$

that is, if $\phi$ is an integral manifold of the system of forms spanned by the $X \lrcorner d\theta$, with $X \in V(M)$.
An extremal of the first kind, then, is one that is also an extremal of the variational problem obtained by retaining $\theta$ but throwing away $I$. As we shall see, it is the extremals of the first kind that admit the simplest type of a second variation formula.

We can now read off from the first variation formula the definition of the second basic idea, namely, that of transversality. The vector field $v \in E_\phi$ on $M$ is transversal to $\phi$ at the boundary (with respect to $\theta$) if

$$\phi^*(v \lrcorner \theta)(p) = 0 \quad \text{for all } p \in \partial N. \qquad (12.7)$$

Hence we see that the first variation is zero if $\phi$ is an extremal and if $v$ is transversal to $\phi$ at the boundary.

To show that there are situations where all extremals are of the first kind (hence that the theory of the second variation to be developed below is nonvacuous), let us look at the example that will be our main concern in this book.
Simple Integral Variational Problems

“Simple integral” means that dimension $N = 1$. Therefore, we may as well choose $N$ as an interval of real numbers, which we shall normalize to $[0, 1]$. Then $t$ will usually denote the parameter on this interval. In keeping with our earlier notation, maps of $N$ into manifolds, that is, curves, will usually be denoted by $\sigma$. Let $B$ be a manifold of dimension $n$ with indices $1 \le i, j, \dots \le n$. Let $T(B)$ be the tangent bundle of $B$, and let $M$ be the Cartesian product $T(B) \times R$. A Lagrangian for $B$ is a real-valued function on $M$, denoted usually by $L(v, t)$. Associated with a curve $t \to \sigma(t)$ in $B$, we define the number $L(\sigma)$ as

$$L(\sigma) = \int_0^1 L(\sigma'(t), t)\,dt. \qquad (12.8)$$

The “ordinary” problem is to study the critical points of this real-valued function on the space of paths of $B$. In (classical) Lagrange variational problems, we are also given an additional set $g_1(v, t), \dots, g_m(v, t)$ of real-valued functions on $M$, and we study the critical points of $L$ restricted to the set of curves $t \to \sigma(t)$ in $B$ that satisfy the “constraints”:

$$g_a(\sigma'(t), t) = 0 \quad \text{for } 0 \le t \le 1. \qquad (12.9)$$

More generally, we can prescribe a constraint subset $C$ of $M$ (not necessarily defined by setting functions equal to zero; for example, in optimal control theory, subsets defined by inequalities appear) and consider the critical points of $L$ restricted to curves $t \to \sigma(t)$ in $B$ that satisfy

$$(\sigma'(t), t) \in C \quad \text{for } 0 \le t \le 1. \qquad (12.10)$$

Following Cartan's idea [1], such variational problems can be “prolonged” to canonical variational problems on $M$. It is most convenient to do this by using a local coordinate system $x_j$ for $B$: It is usually readily verified that the objects defined are independent of the coordinates used. Define functions $\dot{x}_j$ on $T(B)$:

$$\dot{x}_j(v) = dx_j(v).$$

Thus the $\dot{x}_j$ are just the differentials of the coordinates relabeled and regarded as real-valued functions on $T(B)$; hence, also on $M = T(B) \times R$. We shall often use a vector notation:

$$x = (x_j); \qquad \dot{x} = (\dot{x}_j).$$
$L$ and the $g_a$ then become functions $L(x, \dot{x}, t)$ and $g_a(x, \dot{x}, t)$. Define

$$\theta(L) = \frac{\partial L}{\partial \dot{x}_j}\,dx_j - H\,dt, \qquad (12.11)$$

with

$$H = \frac{\partial L}{\partial \dot{x}_j}\,\dot{x}_j - L,$$

where $\theta(L)$ is called the Cartan form defined by the Lagrangian $L$. Consider (for the ordinary variational problem) the ideal $I$ of forms generated by the differential forms $dx_j - \dot{x}_j\,dt$. Suppose we have a curve $t \to (x(t), \dot{x}(t), t) = \phi(t)$ in $M$. Suppose it annihilates the forms in $I$. Then

$$\frac{dx_j}{dt} = \dot{x}_j(t),$$

that is, $t \to \phi(t)$ is the tangent vector curve $t \to (\sigma'(t), t)$ to the “projected” curve $t \to x(t)$ in $B$. Conversely, the tangent vector curve to any curve in $B$ annihilates the forms in $I$. For such a curve, notice that

$$\theta(L)(\sigma'(t), t) = \frac{\partial L}{\partial \dot{x}_j}\,(dx_j - \dot{x}_j\,dt)(\sigma'(t), t) + L(\sigma'(t), t)\,dt;$$

that is,

$$\phi^*(\theta(L)) = L(\sigma'(t), t)\,dt.$$

Thus, under the projection map $M \to B$, the space $E(I)$ and the curves in $B$ correspond in a 1-1 way, and the real-valued function $L$ corresponds to the real-valued function on $E(I)$ obtained by integrating $\theta(L)$.

Let us see what condition (12.6) means. First, $d\theta(L)$ is a 2-form, which we call $\omega$. Notice it means only that

$$\phi'(t) \lrcorner \omega = 0 \quad \text{for } 0 \le t \le 1. \qquad (12.12)$$

Recall that it is the condition that $t \to \phi(t) = (\sigma'(t), t)$ be an extremal of the first kind. As we shall describe in the next chapter, this means that $t \to \phi(t)$ is a characteristic curve for $\omega$, and will also lead to a description of $t \to \phi(t)$ in certain local coordinates for $M$, the Hamilton equations.
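To make the Cartan form concrete, here is a worked sketch for one assumed example that is not in the text: $B = R$ with the hypothetical Lagrangian $L(x, \dot{x}, t) = \tfrac{1}{2}\dot{x}^2 - V(x)$. It computes $\theta(L)$ and $\omega = d\theta(L)$ from (12.11) and evaluates condition (12.12) on a tangent vector curve.

```latex
% Assumed example (not from the text): B = R, L = (1/2)\dot{x}^2 - V(x).
\begin{align*}
H &= \frac{\partial L}{\partial \dot{x}}\,\dot{x} - L = \tfrac{1}{2}\dot{x}^2 + V(x), \\
\theta(L) &= \dot{x}\,dx - \bigl(\tfrac{1}{2}\dot{x}^2 + V(x)\bigr)\,dt, \\
\omega = d\theta(L) &= d\dot{x}\wedge dx - \dot{x}\,d\dot{x}\wedge dt - V'(x)\,dx\wedge dt \\
  &= \bigl(d\dot{x} + V'(x)\,dt\bigr)\wedge\bigl(dx - \dot{x}\,dt\bigr).
\end{align*}
% Along \phi(t) = (x(t), dx/dt, t) the tangent vector is
% \phi'(t) = \dot{x}\,\partial_x + \ddot{x}\,\partial_{\dot{x}} + \partial_t,
% and (dx - \dot{x}\,dt)(\phi'(t)) = 0, so
\[
\phi'(t)\lrcorner\omega = \bigl(\ddot{x} + V'(x)\bigr)\,\bigl(dx - \dot{x}\,dt\bigr).
\]
```

In this example (12.12) holds exactly when $\ddot{x} = -V'(x)$, which is Newton's equation for a particle of unit mass in the potential $V$.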
Now we shall work out the conditions for (12.12) in terms of the $(x, \dot{x}, t)$ coordinate system for $M$. Suppose that $t \to (x(t), \dot{x}(t), t)$ describes $\phi$ in these coordinates. Put

$$L_j = \frac{\partial L}{\partial x_j}, \qquad L_{n+j} = \frac{\partial L}{\partial \dot{x}_j}.$$

Then

$$\omega = dL_{n+j} \wedge dx_j - d(L_{n+j}\,\dot{x}_j - L) \wedge dt$$

$$= dL_{n+j} \wedge (dx_j - \dot{x}_j\,dt) + dL \wedge dt - L_{n+j}\,d\dot{x}_j \wedge dt$$

$$= dL_{n+j} \wedge (dx_j - \dot{x}_j\,dt) + L_j\,dx_j \wedge dt$$

$$= (dL_{n+j} - L_j\,dt) \wedge (dx_j - \dot{x}_j\,dt). \qquad (12.13)$$

Suppose that $t \to x(t)$ is a curve in $B$. Then its tangent vector curve $t \to (x(t), (dx/dt), t)$ annihilates the forms $dx_j - \dot{x}_j\,dt$. Thus (12.12) is satisfied if and only if

$$\frac{d}{dt}\,L_{n+j}\Bigl(x(t), \frac{dx}{dt}, t\Bigr) = L_j\Bigl(x(t), \frac{dx}{dt}, t\Bigr). \qquad (12.14)$$

These are just the classical Euler-Lagrange equations, and will be familiar to the reader if he has looked at more traditional treatments (for example, Gelfand and Fomin [1]). The solutions of (12.14) are called the extremals of the variational problem. We have then shown that:

The extremal curves in $B$ are precisely those curves whose tangent vector curves in $M = T(B) \times R$ are characteristic curves of the 2-form $\omega$.

The Second Variation for Extremals of the First Kind

Let us return to the study of general variational problems. Now let $\phi: N \to M$ be a map in $E(I)$ that is an extremal of the first kind for the canonical variational problem defined by $\theta$ and $I$. Let $t \to \phi_t$ be a deformation of $\phi$. We want to compute

$$\frac{d^2}{dt^2}\,L(\phi_t)\Big|_{t=0}.$$
This will give us the second variation formula. Now it is clear geometrically that we are not interested in considering deformations for which the infinitesimal deformation $v \in E_\phi$ is tangent to $\phi(N)$. Suppose then that

$$v(p) \notin \phi_*(N_p) \quad \text{for all } p \in N.$$

Then, for sufficiently small $t$, we can find a vector field $X \in V(M)$ such that $t \to \phi_t(p)$ is an integral curve of $X$. (Strictly speaking, we shall be calculating the second variation for deformations of this type, and then verifying that the result is independent of the actual choice of $X$.) Then, we can apply (12.1) again to obtain

$$\frac{d}{dt}\,L(\phi_t) = \int_{\partial N} \phi_t^*(X \lrcorner \theta) + \int_N \phi_t^*(X \lrcorner d\theta).$$

Hence, differentiating again and using (12.1), we have

$$\frac{d^2}{dt^2}\,L(\phi_t)\Big|_{t=0} = \int_{\partial N} \phi^*\bigl(X \lrcorner d(X \lrcorner \theta)\bigr) + \int_N \phi^*\bigl(X \lrcorner d(X \lrcorner d\theta)\bigr). \qquad (12.15)$$
We must now verify that the separation of the second variation formula into the two terms given by (12.15) has an intrinsic geometric meaning, that is, that it is independent of the extension of $v$ to a vector field on $M$, provided $\phi$ is an extremal of the first kind and $v$ is transversal to $\phi$ at the boundary. Let us deal with the first term, that is, the boundary term. Consider the form

$$(X, Y) \to \phi^*\bigl(X \lrcorner d(Y \lrcorner \theta) + Y \lrcorner d(X \lrcorner \theta)\bigr), \qquad (12.16)$$

defined for vector fields that are transversal to $\phi$ at the boundary. To show that it depends only on the values of $X$ and $Y$ on $\phi(\partial N)$, it suffices to note what happens when $fX$ is substituted for $X$, where $f$ is a function on $M$:

$$(fX, Y) \to \phi^*(f)\,\phi^*\bigl(X \lrcorner d(Y \lrcorner \theta)\bigr) + \phi^*(f)\,\phi^*\bigl(Y \lrcorner d(X \lrcorner \theta)\bigr) + \phi^*\bigl(Y \lrcorner (df \wedge (X \lrcorner \theta))\bigr).$$

Working out the third term, we have

$$\phi^*(Y(f))\,\phi^*(X \lrcorner \theta) - \phi^*(df) \wedge \phi^*(Y \lrcorner X \lrcorner \theta).$$

Now, since $X$ is transversal to $\phi$ at the boundary,

$$\phi^*(X \lrcorner \theta) = 0.$$

If $f$ is constant on the submanifold, $\phi^*(df) = 0$. This shows that the form (12.16) depends only on the values of $X$ and $Y$ on the submanifold $\phi$. Let us perform the analogous computation for the form

$$(X, Y) \to \phi^*\bigl(X \lrcorner d(Y \lrcorner d\theta) + Y \lrcorner d(X \lrcorner d\theta)\bigr). \qquad (12.17)$$

Now

$$(fX, Y) \to \phi^*(f)\,\phi^*\bigl(X \lrcorner d(Y \lrcorner d\theta) + Y \lrcorner d(X \lrcorner d\theta)\bigr) + \phi^*\bigl(Y \lrcorner (df \wedge (X \lrcorner d\theta))\bigr)$$

$$= \phi^*(f)\,\phi^*\bigl(X \lrcorner d(Y \lrcorner d\theta) + Y \lrcorner d(X \lrcorner d\theta)\bigr) + \phi^*(Y(f))\,\phi^*(X \lrcorner d\theta) - \phi^*(df) \wedge \phi^*(Y \lrcorner X \lrcorner d\theta).$$

Now $\phi^*(X \lrcorner d\theta) = 0$, since $\phi$ is an extremal of the first kind; hence we see again that the form (12.17) depends only on the values of $X$ and $Y$ on the submanifold $\phi$.

In summary, we have split up the second variation formula into two terms, one involving integration over $\partial N$, the other involving integration over $N$. Both integrands involve first-order differential operators assigning differential forms on $\partial N$ and $N$ to pairs of elements of $E_\phi$. (However, if $\dim N = 1$, $\phi^*(df) = 0$ on $\partial N$, indicating that the boundary term depends only on the values of $v$ at the points of $\partial N$. This is one reason for the simplicity of the one-independent-variable calculus of variations in comparison with the multiple integral problems.)

Finally, we mention another reason to be confident that the second variation formula always takes the form given by (12.15), even when $t \to \phi_t$ is not necessarily of the assumed form. Computation in local coordinates such as can be found, for example, in the classical literature, convinces one that the second variation can be written as the sum of two terms, one involving an integration over $N$, the other an integration over the boundary, and that each of the terms depends only on the values of $v$.
Hence, in computing $(d^2/dt^2)L(\phi_t)|_{t=0}$, we can choose any deformation $t \to \phi_t$ whose infinitesimal deformation is $v$: In particular, if $v$ can be exhibited as the restriction to $\phi(N)$ of a vector field $X$ on $M$, we can choose $\phi_t$ as the image under the one-parameter group generated by $X$. We shall leave this approach to the calculus of variations and the second variation formula, and turn to more classical material.
13 Hamilton-Jacobi Theory

Classically, Hamilton-Jacobi theory is the study of the formal properties of the solutions of ordinary differential equations of the Hamilton type, and of first-order partial differential equations of the Hamilton-Jacobi type:

$$\frac{\partial S}{\partial t}(x, t) = -H\Bigl(x, \frac{\partial S}{\partial x}, t\Bigr),$$

and of the “duality” between them. We shall interpret Hamilton-Jacobi theory in the wider sense as the study of the characteristic curves and maximal integral submanifolds of a closed 2-differential form. As we saw in Chapter 12, this leads to the study of extremal curves of simple variational problems. Let $M$ be a manifold with a 2-differential form $\omega$ such that $d\omega = 0$.
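For orientation, the following sketch shows how equations of the Hamilton type arise from a closed 2-form in the simplest case. The notation is an assumption made for this illustration only: $M = R^3$ with coordinates $(x, y, t)$, a function $H(x, y, t)$, and a curve whose tangent vector annihilates $\omega$ by contraction.

```latex
% Assumed notation (not from the text): coordinates (x, y, t) on R^3, H = H(x, y, t).
\begin{align*}
\theta &= y\,dx - H\,dt, \qquad
\omega = d\theta = dy\wedge dx - H_x\,dx\wedge dt - H_y\,dy\wedge dt, \\
v &= \dot{x}\,\partial_x + \dot{y}\,\partial_y + \partial_t, \\
v\lrcorner\omega &= (\dot{y} + H_x)\,dx + (H_y - \dot{x})\,dy - (H_x\,\dot{x} + H_y\,\dot{y})\,dt.
\end{align*}
% All three coefficients vanish exactly when
\[
\frac{dx}{dt} = \frac{\partial H}{\partial y}, \qquad
\frac{dy}{dt} = -\frac{\partial H}{\partial x}.
\]
```

The $dt$ coefficient vanishes automatically once the first two do, so in this example the curves annihilating $\omega$ are the solutions of the familiar Hamilton equations.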
Definition

A tangent vector $v \in T(M)$ is a characteristic vector for $\omega$ if $v \lrcorner \omega = 0$. A vector field $X \in V(M)$ is a characteristic vector field if

$$X \lrcorner \omega = 0;$$

that is, if $X(p)$ is a characteristic vector for all $p \in M$. A curve $t \to \sigma(t)$ in $M$ is a characteristic curve of $\omega$ if $\sigma'(t)$ is a characteristic vector for all $t$; that is, if $\sigma'(t) \lrcorner \omega(\sigma(t)) = 0$ for all $t$.

Let us study the algebraic properties of the characteristic vectors at a point $p \in M$. They form a subspace of $M_p$. Let $v_1, \dots, v_m$ be a basis for $M_p$ such that $v_1, \dots, v_n$ form a basis for the characteristic vectors. Let $\omega_1, \dots, \omega_m$ be a dual basis for 1-covectors; that is,

$$\omega_i(v_j) = \delta_{ij} \quad \text{for } 1 \le i, j \le m.$$

Then

$$\omega = a_{ij}\,\omega_i \wedge \omega_j \quad \text{at } p,$$

with $a_{ij} = -a_{ji}$, and

$$0 = v_a \lrcorner \omega = 2a_{aj}\,\omega_j \quad \text{for } 1 \le a \le n,$$

or

$$0 = a_{aj}\,\omega_j;$$

hence, $a_{aj} = 0$, or

$$\omega = \sum_{i, j > n} a_{ij}\,\omega_i \wedge \omega_j.$$

Now the determinant of the matrix $(a_{ij})_{n+1 \le i, j \le m}$ must be nonzero. Otherwise there would be a characteristic vector of $\omega$ that is a combination of the $v_{n+1}, \dots, v_m$. But this matrix is real and skew-symmetric; hence its eigenvalues are pure imaginary and come in nonzero complex-conjugate pairs. We conclude that $m - n$ is an even number, say, equal to $2r$. Then

$$\omega^r = c\,\omega_{n+1} \wedge \cdots \wedge \omega_m \ne 0,$$

where $c$ is a nonzero constant multiple of the Pfaffian of $(a_{ij})_{n+1 \le i, j \le m}$, whose square is the determinant of that matrix. ($\omega^r$ means the exterior product of $r$ copies of the 2-covector $\omega$.) Also,

$$\omega^{r+1} = 0.$$

Summing up, we have proved the following theorem.
THEOREM 13.1 The dimension of the space of characteristic vectors at a point $p \in M$ is equal to $m - 2r$, where $m = \dim M$ and $r$ is the greatest integer such that $\omega^r \ne 0$. Suppose $\omega_1, \dots, \omega_m$ is a basis for covectors at $p$, and

$$\omega = \sum_{i, j = 1}^{m} a_{ij}\,\omega_i \wedge \omega_j.$$

Then $2r$ = rank of the matrix $(a_{ij})_{1 \le i, j \le m}$.
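The statement of Theorem 13.1 can be checked on small examples. The following is a minimal sketch, not a method from the text: the function names `rank` and `characteristic_dimension` are hypothetical, and the coefficient matrices are made-up illustrations. It computes the rank of a skew-symmetric coefficient matrix exactly with rational arithmetic and returns $m - 2r$.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix with rational entries, by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        # Look for a nonzero entry in this column at or below row r.
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear the entries below the pivot.
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def characteristic_dimension(a):
    """Dimension m - 2r of the characteristic space of omega = a_ij w_i ^ w_j."""
    m = len(a)
    for i in range(m):
        for j in range(m):
            if a[i][j] != -a[j][i]:
                raise ValueError("matrix is not skew-symmetric")
    return m - rank(a)


# Illustrative coefficients: omega = 2 w1^w2 + 4 w3^w4 on a 5-dimensional space.
A = [
    [0, 1, 0, 0, 0],
    [-1, 0, 0, 0, 0],
    [0, 0, 0, 2, 0],
    [0, 0, -2, 0, 0],
    [0, 0, 0, 0, 0],
]

# A generic skew-symmetric 3 x 3 matrix, which is necessarily singular.
B = [
    [0, 1, 2],
    [-1, 0, 3],
    [-2, -3, 0],
]
```

In these examples the rank is even in both cases, so `characteristic_dimension` returns `m` minus an even number, in line with the theorem.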
Now we show how the characteristic vector fields and the characteristic curves of $\omega$ can be used to construct integral submanifolds for $\omega$.

Definition

A submanifold $\phi: N \to M$ is said to be an integral submanifold for $\omega$ if

$$\phi^*(\omega) = 0;$$

that is, if $\omega$ restricted to the submanifold is zero.

Now suppose that $\phi: N \to M$ is an integral submanifold for $\omega$ and that $X \in V(M)$ is a characteristic vector field for $\omega$; that is, its integral curves are characteristic curves of $\omega$. For $p \in N$, $t \in R$, define $\delta(p, t)$ by requiring that:

(a) The curve $t \to \delta(p, t)$ is an integral curve of $X$.

(b) $\delta(p, 0) = \phi(p)$.

In another notation,

$$\delta(p, t) = \exp(tX)\,\phi(p).$$

Suppose that $\phi(N)$ is small enough so that this can be defined uniformly for $0 \le t \le a$. Then $\delta$ defines a map $N \times [0, a] \to M$. It is a submanifold map (for $a$ and $N$ sufficiently small) if

$$X(\phi(p)) \notin \phi_*(N_p) \quad \text{for all } p \in N,$$

that is, if $X$ is not tangent to $\phi(N)$.
THEOREM 13.2 With these conditions satisfied, $\delta$ is an integral submanifold of $\omega$. Geometrically, by passing through each point of the given integral submanifold the integral curve of $X$ starting at that point, we obtain an integral submanifold of one greater dimension passing through the given submanifold.

Proof. The theorem is obviously of a purely local nature. We can suppose that coordinates $(x_i)$, $1 \le i, j \le m$, are chosen so that $X = \partial/\partial x_1$:

$$X(\omega) = X \lrcorner d\omega + d(X \lrcorner \omega) = 0.$$

Suppose that $\omega = a_{ij}\,dx_i \wedge dx_j$. Then

$$0 = X(a_{ij})\,dx_i \wedge dx_j = \frac{\partial a_{ij}}{\partial x_1}\,dx_i \wedge dx_j;$$

hence,

$$\frac{\partial a_{ij}}{\partial x_1} = 0.$$

Also, $X \lrcorner \omega = 0$; that is, $a_{1j}\,dx_j = 0$ or $a_{1j} = 0$. Then

$$\omega = \sum_{i, j \ge 2} a_{ij}(x_2, \dots, x_m)\,dx_i \wedge dx_j.$$

From the explicit form $X = \partial/\partial x_1$, we see that

$$\delta^*(x_i) = \phi^*(x_i) \quad \text{for } i \ge 2$$

(since the $x_i$, $i \ge 2$, remain constant on integral curves of $X$); hence, obviously, $\delta^*(\omega) = \phi^*(\omega) = 0$.

THEOREM 13.3 Let $\omega$ be a closed 2-form on $M$ whose rank is the same at every point of $M$. Then each point $p \in M$ has a neighborhood $U$ and a coordinate system $(y_i)$, $1 \le i, j, \dots \le m$, valid in $U$ such that $\omega$ takes the following canonical form in $U$:

$$\omega = dy_1 \wedge dy_2 + \cdots + dy_{2r-1} \wedge dy_{2r}.$$
Proof. We proceed by induction on the dimension $m$ of the space $M$. It is certainly (if vacuously) true for $m = 1$. Suppose, in the original coordinate system $(x_i)$, that

$$\omega = a_{ij}\,dx_i \wedge dx_j.$$

Case 1 ($2r < m$, where $2r$ is the rank of $\omega$)

Since the rank is constant, there is at least one nonzero characteristic vector field $X$. If the coordinate system is changed so that $X = \partial/\partial x_1$, then, as in the proof of Theorem 13.2,

$$\omega = \sum_{i, j \ge 2} a_{ij}(x_2, \dots, x_m)\,dx_i \wedge dx_j;$$

that is, $\omega$ is a form in $(m - 1)$ variables and the induction hypothesis can be applied.

Case 2 ($2r = m$)

Then $\det(a_{ij}(p)) \ne 0$. We may suppose, if $U$ is small enough, that $\det(a_{ij}(q)) \ne 0$ for all $q \in U$. Let $f$ be any function in $U$ such that $df$ is never zero in $U$. Since the linear equations $a_{ij}(x)\lambda_j = c_i$ can be solved for $(\lambda_j)$ whatever the $(c_i)$, and the rank is constant when $x$ varies, there is a vector field $X$ in $U$ such that $X \lrcorner \omega = df$. Suppose that the coordinate system is chosen so that $X = \partial/\partial x_1$:

$$X(\omega) = X \lrcorner d\omega + d(X \lrcorner \omega) = d(df) = 0 = \frac{\partial a_{ij}}{\partial x_1}\,dx_i \wedge dx_j;$$

hence,

$$\frac{\partial a_{ij}}{\partial x_1} = 0.$$

Also, $a_{1j}\,dx_j = \tfrac{1}{2}\,df$; hence

$$\omega = dx_1 \wedge df + \sum_{i, j \ge 2} a_{ij}(x_2, \dots, x_m)\,dx_i \wedge dx_j.$$

Hence, by at most changing variables in $(x_2, \dots, x_m)$ space, we may suppose that $f = x_2$; that is,

$$\omega = dx_1 \wedge dx_2 + \sum_{i, j \ge 2} a_{ij}(x_2, \dots, x_m)\,dx_i \wedge dx_j.$$

Now $\omega' = \sum_{i, j \ge 2} a_{ij}\,dx_i \wedge dx_j$ is closed also. Its rank is everywhere $\le m - 1$; since $m$ is even, it is $\le m - 2$. It cannot be less than $m - 2$ at any point, for otherwise the value of $\omega$ at that point could be written in terms of at most $(m - 1)$ 1-covectors, which would contradict that $\omega^{m/2}$ is everywhere $\ne 0$. Thus $\omega'$ is a form in a space of $(m - 1)$ variables $(x_2, \dots, x_m)$ always of rank $(m - 2)$. By the induction hypothesis, there are $(m - 2)$ functionally independent functions $y_1, \dots, y_{m-2}$ of the variables $(x_2, \dots, x_m)$ such that

$$\omega' = dy_1 \wedge dy_2 + \cdots + dy_{m-3} \wedge dy_{m-2}.$$

At no point can $dx_2$ be written in terms of $dx_1, dy_1, \dots, dy_{m-2}$, for otherwise $\omega$ would not be of rank $m$ at that point. Thus the functions $(x_1, x_2, y_1, \dots, y_{m-2})$ are such that their differentials $dx_1, dx_2, dy_1, \dots, dy_{m-2}$ are everywhere linearly independent; hence they form a new coordinate system in which $\omega$ obviously has the desired canonical form. Q.E.D.

THEOREM 13.4 Let $\omega$ be a closed 2-form on a manifold $M$. Suppose that $\omega$ is of rank $2r$; that is, $\omega_p^r \ne 0$ for all $p \in M$, but $\omega^{r+1} = 0$. Then:

(a) An integral submanifold of $\omega$ can at most be of dimension $m - r$. Let us say that an integral submanifold is maximal if it is of dimension $m - r$, that is, if it is of maximal dimension.

(b) If $\phi: N \to M$ is a maximal integral submanifold of $\omega$, then there is a local coordinate system $(x_i)$, $1 \le i, j, \dots \le m$, for $M$ such that

$$\omega = \sum_{i=1}^{r} dx_i \wedge dx_{r+i};$$

that is, $\omega$ is in canonical form and $\phi(N)$ is defined locally by $0 = x_1 = \cdots = x_r$.

(c) If $\phi: N \to M$ is any maximal integral submanifold of $\omega$, then there is a unique vector-field system $H$ on $N$, with $\dim H_p = n - r$ for all $p \in N$ (where $n = \dim N = m - r$), having the property that if $t \to \sigma(t)$ is an integral curve of $H$, $t \to \phi(\sigma(t))$ is a characteristic curve of $\omega$.

Proof. Let $C$ be the vector-field system on $M$ consisting of the characteristic vector fields of $\omega$; that is,

$$C = \{X \in V(M) : X \lrcorner \omega = 0\}.$$
Since the dimension of the characteristic vectors of o at a point is constant on
13. Hamilton-Jacobi Theory
127
M , equal to rn - 2r, C i s a nonsingular vector-field system; that is, every point p E M is a maximal point, and dim C, = m - 2r
for all p E M .
Notice that [C, C ] c C; that is, C is a completely integrable vector-field system. Suppose we first choose the coordinate system (xi), using the Frobenius integrability theorem, so that the vector fields ( d / d x Z r + J.,..,(djdx,,,) are a base for C (possibly making M smaller, of course). For X E C, X(o)= X A dw d(X w ) = 0. Suppose then that
+
Then
o=a,jdxiAdxj. a.. rJ = 0
aa,,
_-0 ax,
+ 1, for k 2 2r + 1 , all i, j . for i,j 2 2r
Hence o can be written as a form in the variables x i , . . ., xZr; that is, 0
=
i, j < Z r
uij(xl, . ..,x,) dxi A d x j .
This form in $2r$ variables can have no characteristic vectors; hence $\omega^r \ne 0$. By Theorem 13.2, any integral submanifold $\phi : N \to M$ of maximal dimension is obtained locally by taking a maximal integral submanifold of this form in $2r$ variables and "adding" the variables $(x_{2r+1}, \ldots, x_m)$ to it. Thus, to prove part (a), it suffices to suppose that $m = 2r$, which we shall proceed to do. Now, looked at more abstractly, $\phi : N \to M$ is an integral submanifold if and only if
$$\omega_{\phi(p)}(u, v) = 0 \quad \text{for all } u, v \in \phi_*(N_p), \text{ all } p \in N.$$
The fact that $m = 2r$, that is, that there are no characteristic vectors, implies that for each $p \in M$, the skew-symmetric bilinear form $(u, v) \to \omega_p(u, v)$ on $M_p$ is nondegenerate. Hence the fact that $\dim \phi_*(N_p) \le r$ will follow from the following lemma from linear algebra.

LEMMA 13.5 Let $V$ be a real vector space of dimension $2r$, and let $(u, v) \to \omega(u, v)$ be a nondegenerate skew-symmetric bilinear form on $V$. Let $T$ be a linear subspace of $V$ such that $\omega(u, v) = 0$ for all $u, v \in T$. Then $\dim T \le r$.

Proof. Let $(u, v) \to \langle u, v \rangle$ be any positive-definite symmetric bilinear form on $V$. For fixed $v \in V$, there must then be a vector $J(v) \in V$ such that
$$\omega(u, v) = \langle u, J(v) \rangle \quad \text{for all } u \in V.$$
(Since the linear form $u \to \omega(u, v)$ belongs to the dual space of $V$, and it is known that $\langle\ ,\ \rangle$ can be used to set up an isomorphism between $V$ and its dual space.) Clearly, $v \to J(v)$ is a linear transformation of $V$ into itself that must be an isomorphism, lest $\omega$ be degenerate. Suppose $\dim T \ge r + 1$. Then $T \cap J(T)$ would be a nonzero subspace of $V$; say $v \in T \cap J(T)$, $v \ne 0$, and $v = J(u)$ for some $u \in T$. Then
$$0 \ne \langle v, v \rangle = \langle v, J(u) \rangle = \omega(v, u) = 0,$$
contradiction.
Q.E.D.
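The lemma can be illustrated on a small case. The following is a minimal sketch, assuming the standard skew form $dx_1 \wedge dx_3 + dx_2 \wedge dx_4$ on $R^4$ (so $r = 2$) and the Euclidean inner product as the positive-definite form; the names `omega`, `inner`, `J`, and the grid of test vectors are our own illustrative choices, not notation from the text. It checks that $T = \mathrm{span}(e_1, e_2)$ is isotropic, that $\omega(u, v) = \langle u, J(v) \rangle$, and that no test vector outside $T$ is $\omega$-orthogonal to $T$.

```python
from itertools import product

def omega(u, v):
    # Assumed example: the standard skew form dx1^dx3 + dx2^dx4 on R^4 (r = 2).
    return u[0] * v[2] + u[1] * v[3] - u[2] * v[0] - u[3] * v[1]

def inner(u, v):
    # Euclidean inner product, playing the role of the positive-definite form.
    return sum(a * b for a, b in zip(u, v))

def J(v):
    # The map of the proof: omega(u, v) = <u, J(v)> for all u.
    return (v[2], v[3], -v[0], -v[1])

T_basis = [(1, 0, 0, 0), (0, 1, 0, 0)]      # T = span(e1, e2), dim T = r = 2
grid = list(product((-1, 0, 1), repeat=4))  # a small set of integer test vectors

is_isotropic = all(omega(u, v) == 0 for u in T_basis for v in T_basis)
j_represents_omega = all(omega(u, v) == inner(u, J(v)) for u in grid for v in grid)

# Any test vector that is omega-orthogonal to T already lies in T, so T
# cannot be enlarged to an isotropic subspace of dimension r + 1.
orthogonal_to_T = [v for v in grid if all(omega(u, v) == 0 for u in T_basis)]
is_maximal = all(v[2] == 0 and v[3] == 0 for v in orthogonal_to_T)

print(is_isotropic, j_represents_omega, is_maximal)
```

Here $J(T)$ is spanned by $-e_3$ and $-e_4$, so $T \cap J(T) = 0$, in agreement with the dimension count in the proof.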
Turn to (b) now. In a similar way as in (a), it suffices to prove (b) in case $m = 2r$; that is, there are no characteristics. We proceed in a way very similar to that used in the proof of Theorem 13.3, by induction on $m$. Suppose that $f$ is a function on $M$ such that $f = 0$ on $\phi(N)$, but that $df \ne 0$ in $M$ (if necessary making $M$ smaller). As in the proof of Theorem 13.3, there is a vector field $X$ such that $X \lrcorner\, \omega = df$; hence $X(\omega) = 0$. Choose the coordinate system for $X$ so that $X = \partial/\partial x_1$. If $\omega = a_{ij}\, dx_i \wedge dx_j$, hence,
$$\omega = dx_1 \wedge d(f/2) + \sum_{i,j \ge 2} a_{ij}\, dx_i \wedge dx_j.$$
As before, we can suppose that $x_2 = f/2$; hence
$$\omega = dx_1 \wedge dx_2 + \sum_{i,j > 2} a_{ij}\, dx_i \wedge dx_j.$$
Now $\phi^*(x_2) = 0 = \phi^*(f)$, by construction; hence $\phi$ is also an integral submanifold of the form $\sum_{i,j > 2} a_{ij}\, dx_i \wedge dx_j$. As in the proof of Theorem 13.3, this form must everywhere in $M$ have rank $m - 2$; hence the induction hypothesis may be applied.

Part (c) now follows from (b): For if
$$\omega = \sum_{i=1}^{r} dx_i \wedge dx_{r+i},$$
with $x_{r+1} = 0 = \cdots = x_{2r}$ defining $\phi(N)$ locally, then the vector fields $\partial/\partial x_{2r+1}, \ldots, \partial/\partial x_m$ span $C$, the characteristic vector-field system of $\omega$. Now, by inspection, the vector fields $\partial/\partial x_{2r+1}, \ldots, \partial/\partial x_m$ are tangent to the submanifold $x_{r+1} = 0 = \cdots = x_{2r}$; that is, their integral curves starting at one point of the submanifold remain on the submanifold. Abstractly, this means that
$$C_{\phi(p)} \subset \phi_*(N_p) \quad \text{for all } p \in N.$$
$C$ "by restriction" defines the vector-field system $H$ on $N$ required for the proof of (c).

THEOREM 13.6 Let $\omega$ be a closed 2-form on the manifold $M$, and let $\theta$ be a 1-form such that $d\theta = \omega$. Suppose that $\phi : N \to M$ is a submanifold of $M$ that is also an integral submanifold of $\theta$, that is, $\phi^*(\theta) = 0$. Let $X$ be a characteristic vector field of $\omega$ that is not tangent to $\phi(N)$ at any point, and such that
$$X(\theta) = f\theta \quad \text{for some } f \in F(M).$$
Let $\delta : N \times [0, a] \to M$ be the submanifold map constructed above; that is, for fixed $p$, $t \to \delta(p, t)$ is an integral curve of $X$, reducing to $\phi(p)$ at $t = 0$. Let $\phi_t$ be the submanifold map $p \to \delta(p, t) = \phi_t(p)$. Then:
(a) $\delta$ is an integral submanifold of $\omega$.
(b) For each $t$, $\phi_t$ is an integral submanifold of $\theta$.

Proof. (a) is a consequence of Theorem 13.2, since $\phi$ is also, of course, an integral submanifold of $\omega$. Now $X(\theta) = X \lrcorner\, d\theta + d(X \lrcorner\, \theta) = d(X \lrcorner\, \theta)$, since $X$ is characteristic for $d\theta = \omega$. It suffices to prove (b) locally: We can then suppose that the coordinate system $(x_i)$, $1 \le i, j, \ldots \le n$, has been chosen so that $X = \partial/\partial x_1$. Then there exists a $g \in F(M)$ such that $X(g\theta) = 0$ and such that $g$ is everywhere $\ne 0$. For $g$ must satisfy
$$X(g) + fg = 0,$$
and it is evident that (locally) such a $g$ can be found. Suppose $g\theta = a_i\, dx_i$. Hence $\partial a_i/\partial x_1 = 0$. We see, using the explicit form of the integral curves of $X$, that
$$\phi_t^*(x_i) = \phi^*(x_i) \quad \text{for } i > 1, \qquad \phi_t^*(x_1) = \phi^*(x_1) + t.$$
Thus, holding $t$ constant,
$$\phi_t^*(g\theta) = \phi_t^*(a_i(x_2, \ldots, x_n))\, d(\phi_t^*(x_i)) = \phi^*\Big(a_1(x_2, \ldots, x_n)\, d(x_1 + t) + \sum_{i > 1} a_i(x_2, \ldots, x_n)\, dx_i\Big) = \phi^*(g\theta) = 0.$$
Q.E.D.
Now we shall indicate the relation of this general theory to Hamilton-Jacobi theory in the usual sense, namely, the study of the Hamilton ordinary differential equations and the Hamilton-Jacobi partial differential equations.

We change notations: $M$ will be a domain in $R^{2n+1}$, $1 \le i, j, \ldots \le n$. A point of $M$ will be denoted by $(x, y, t)$, $x = (x_i)$, $y = (y_i)$. In physical problems, the $x_i$ are coordinates of configuration space, the $y_i$ the coordinates of momentum space, $t$ the coordinate of time. (The reader will more frequently in physics books see $x$ and $y$ replaced respectively by $q$ and $p$.) Suppose that $H(x, y, t)$ is a real-valued function on $M$. A closed 2-form $\omega$ in $M$ is said to be in Hamiltonian form with Hamiltonian $H$ if
$$\omega = dy_i \wedge dx_i - dH \wedge dt.$$
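The following special notation is useful in the computations in Hamilton-Jacobi theory and the calculus of variations:
$$H_i = \frac{\partial H}{\partial x_i}, \quad H_{n+i} = \frac{\partial H}{\partial y_i}, \quad H_t = \frac{\partial H}{\partial t}, \quad H_{i,j} = \frac{\partial^2 H}{\partial x_i\, \partial x_j}, \quad H_{i,n+j} = \frac{\partial^2 H}{\partial y_j\, \partial x_i}, \quad \text{etc.}$$
Thus,
$$dH = H_i\, dx_i + H_{n+i}\, dy_i + H_t\, dt.$$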
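THEOREM 13.7 Let $\omega$ be the 2-form given by
$$\omega = dy_i \wedge dx_i - dH \wedge dt.$$
Then:
(a) Rank $\omega = 2n$, dimension of characteristic vectors $= 1$.
(b) There is exactly one characteristic vector field $X$ such that $X(t) = 1$, namely,
$$X = \frac{\partial}{\partial t} + H_{n+i}\, \frac{\partial}{\partial x_i} - H_i\, \frac{\partial}{\partial y_i}. \tag{13.1}$$
$X$ also satisfies: $X(H) = H_t$.
(c) A curve in $M$ that is a characteristic curve of $X$ can be written in the form $(t, x(t), y(t))$. The functions $x(t)$, $y(t)$ are determined as solutions of the Hamilton equations with Hamiltonian $H$:
$$\frac{dx_i}{dt} = H_{n+i}(x(t), y(t), t), \qquad \frac{dy_i}{dt} = -H_i(x(t), y(t), t). \tag{13.2}$$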
Proof. Suppose $v \in T(M)$ is a nonzero tangent vector to $M$ that is a characteristic tangent vector to $\omega$. Then
$$0 = v \lrcorner\, \omega = v(y_i)\, dx_i - v(x_i)\, dy_i - v(H)\, dt + v(t)\,(H_i\, dx_i + H_{n+i}\, dy_i + H_t\, dt).$$
First we must have $v(t) \ne 0$: for otherwise, $0 = v(y_i) = v(x_i)$; hence $v$ is identically zero. Thus $v$ can be normalized so that $v(t) = 1$; hence
$$v(x_i) = H_{n+i}, \qquad v(y_i) = -H_i;$$
that is, $v$ is uniquely determined; hence the space of characteristic vectors is one-dimensional. Working backward, we see that (13.1) provides an everywhere nonzero characteristic vector field. That (13.2) gives the integral curves of $X$ follows, of course, from the very definition of a vector field.

Now, we turn to a similar geometric interpretation of the Hamilton-Jacobi partial differential equation with Hamiltonian $H$:
$$\frac{\partial S}{\partial t} + H\Big(x, \frac{\partial S}{\partial x}, t\Big) = 0,$$
and to an explanation of the "duality" between this single equation and the Hamilton system (13.2).
THEOREM 13.8 Let $\omega$ be the 2-form $dy_i \wedge dx_i - dH \wedge dt$. Suppose, for simplicity, that $H$ is defined over all $(x, y, t)$-space so that $\omega$ is defined over all of $R^{2n+1}$. Let $\pi : R^{2n+1} \to R^{n+1}$ be the projection map of $(x, y, t)$ on $(x, t)$. Let $D$ be a convex domain of $R^{n+1}$, and $S(x, t)$ a function defined on $D$. Associate with $S$ the map $\phi_S : D \to R^{2n+1}$, defined as follows:
$$\phi_S(x, t) = \Big(x, \frac{\partial S}{\partial x}(x, t), t\Big).$$
(Thus $\phi_S^*(x_i) = x_i$, $\phi_S^*(y_i) = \partial S/\partial x_i$, $\phi_S^*(t) = t$.) Then
(a) $\phi_S^*(\omega) = 0$; that is, $\phi_S$ is an integral submanifold of $\omega$ if
$$\frac{\partial S}{\partial t} + H\Big(x, \frac{\partial S}{\partial x}, t\Big) = 0; \tag{13.3}$$
that is, $S(x, t)$ is a solution of the Hamilton-Jacobi partial differential equation.
(b) Conversely, any map $\phi : D \to R^{2n+1}$ that is a cross-section map for $\pi$, that is, satisfies $\pi\phi = $ identity, and that is an integral submanifold of $\omega$ arises in this way as $\phi_S$ for some function $S$ on $D$ that is a solution of (13.3).
(c) If $S(x, t)$ is a solution of (13.3) defined in $D$, consider the vector field
$$Y = \frac{\partial}{\partial t} + \phi_S^*(H_{n+i})\, \frac{\partial}{\partial x_i}. \tag{13.4}$$
The integral curves of $Y$ are of the form $(t, x(t))$, where $x(t)$ is a solution of the system of ordinary differential equations:
$$\frac{dx_i}{dt} = H_{n+i}\Big(x, \frac{\partial S}{\partial x}(x, t), t\Big). \tag{13.5}$$
Each of these integral curves is the projection under $\pi$ of a characteristic curve of $\omega$.† Explicitly, for a solution $x(t)$ of (13.5), the curve $t \to (x(t),\ y_i(t) = (\partial S/\partial x_i)(x(t), t),\ t)$ is a characteristic curve of $\omega$, that is, is a solution of the Hamilton equations. Equivalently, we may say that $\phi_S$ maps every integral curve of $Y$ into an integral curve of $X$.
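Proof. Let $\theta = y_i\, dx_i - H\, dt$, so that $\omega = d\theta$. Then
$$\phi_S^*(\theta) = \frac{\partial S}{\partial x_i}\, dx_i - \phi_S^*(H)\, dt = dS - \Big(\frac{\partial S}{\partial t} + \phi_S^*(H)\Big)\, dt.$$
Thus, if $S$ is a solution of (13.3), that is, $(\partial S/\partial t) + \phi_S^*(H) = 0$, then
$$\phi_S^*(d\theta) = d\, dS = 0.$$
This proves (a). Conversely, suppose that $\phi : D \to R^{2n+1}$ is a cross-section map such that $\phi^*(\omega) = 0$. Now $\omega = d\theta$; hence $d(\phi^*(\theta)) = 0$. Suppose $S(x, t)$ is a function on $D$ such that
$$dS = \phi^*(\theta) = \phi^*(y_i)\, dx_i - \phi^*(H)\, dt.$$
(A cross-section map $D \to R^{2n+1}$ is characterized by the conditions $\phi^*(x_i) = x_i$, $\phi^*(t) = t$.) Hence $\phi^*(y_i) = \partial S/\partial x_i$, that is, $\phi = \phi_S$. This proves (b). Turn to (c) and note that
$$0 = X \lrcorner\, \omega = X \lrcorner\, d\theta = X(\theta).$$
Since proving (c) involves a general relation between mappings and vector fields, it is worth our while to put down the general condition.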
† For the Hamiltonians arising from the calculus of variations, the integral curves of $Y$ form what is known as an "extremal field."
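LEMMA 13.9 Let $\alpha : M \to M'$ be a mapping between manifolds. Let $X$ and $X'$ be vector fields on, respectively, $M$ and $M'$. Then $\alpha$ maps every integral curve of $X$ into an integral curve of $X'$ if and only if
$$\alpha^*(X'(f)) = X(\alpha^*(f)) \quad \text{for all } f \in F(M'), \tag{13.6a}$$
or
$$\alpha_*(X(p)) = X'(\alpha(p)) \quad \text{for all } p \in M. \tag{13.6b}$$

Proof. Suppose $\sigma(t)$, $0 \le t \le b$, is an integral curve of $X$, and let $\sigma_1(t) = \alpha\sigma(t)$. Suppose that $\sigma_1$ is an integral curve of $X'$. Then $\sigma_1'(t) = X'(\sigma_1(t))$; that is, for $f \in F(M')$,
$$\frac{d}{dt} f(\sigma_1(t)) = X'(\sigma_1(t))(f) = X'(f)(\sigma_1(t)).$$
But
$$\frac{d}{dt} f(\sigma_1(t)) = \frac{d}{dt}\, \alpha^*(f)(\sigma(t)) = X(\alpha^*(f))(\sigma(t)) = \alpha_*(X(\sigma(t)))(f).$$
Since $\sigma(0)$ can be any point of $M$, we get condition (b), and also condition (a), as
$$X(\alpha^*(f))(p) = \alpha^*(X'(f))(p) \quad \text{for all } p \in M.$$
The steps we have gone through are reversible, to prove the converse. Q.E.D.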
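Returning to $X$ given by (13.1) and $Y$ given by (13.4), we verify the condition (13.6) (it suffices to verify (13.6) when $f$ varies over any basis for the functions):
$$\phi_S^*(X(x_i)) = \phi_S^*(H_{n+i}) = Y(x_i) = Y(\phi_S^*(x_i)).$$
$$Y(\phi_S^*(y_i)) = Y\Big(\frac{\partial S}{\partial x_i}\Big) = \frac{\partial^2 S}{\partial t\, \partial x_i} + H_{n+j}\Big(x, \frac{\partial S}{\partial x}, t\Big) \frac{\partial^2 S}{\partial x_j\, \partial x_i}$$
(using the fact that $S$ satisfies (13.3))
$$= -H_i\Big(x, \frac{\partial S}{\partial x}, t\Big) - H_{n+j}\Big(x, \frac{\partial S}{\partial x}, t\Big) \frac{\partial^2 S}{\partial x_i\, \partial x_j} + H_{n+j}\Big(x, \frac{\partial S}{\partial x}, t\Big) \frac{\partial^2 S}{\partial x_j\, \partial x_i} = -\phi_S^*(H_i) = \phi_S^*(X(y_i)).$$
$$\phi_S^*(X(t)) = \phi_S^*(1) = 1 = Y(t) = Y(\phi_S^*(t)).$$
Q.E.D.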
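Theorem 13.8 can be checked numerically on a simple case. The following is a minimal sketch, assuming the one-dimensional free particle $H = y^2/2$ and the function $S(x, t) = (x - c)^2/(2t)$ for $t > 0$; the constant `C`, the function names, and the step sizes are hypothetical choices made for illustration only. The code verifies by finite differences that $S$ satisfies (13.3), integrates the equation (13.5) for the vector field (13.4), and checks that the lifted momentum $y = \partial S/\partial x$ stays constant along the curve, as the Hamilton equations $dy/dt = -H_x = 0$ require.

```python
def H(x, y, t):
    # Assumed example: free particle in one dimension, H = y^2 / 2.
    return 0.5 * y * y

C = 0.3  # hypothetical parameter labelling the solution S

def S(x, t):
    # Candidate solution of the Hamilton-Jacobi equation (13.3) for t > 0.
    return (x - C) ** 2 / (2.0 * t)

def S_x(x, t):
    # Partial derivative of S with respect to x.
    return (x - C) / t

def hj_residual(x, t, h=1e-5):
    # dS/dt by a central difference, then dS/dt + H(x, dS/dx, t).
    S_t = (S(x, t + h) - S(x, t - h)) / (2.0 * h)
    return S_t + H(x, S_x(x, t), t)

def integrate_Y(x0, t0, t1, steps=1000):
    # RK4 for dx/dt = H_{n+1}(x, S_x(x, t), t); for this H that is S_x(x, t).
    dt = (t1 - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        k1 = S_x(x, t)
        k2 = S_x(x + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = S_x(x + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = S_x(x + dt * k3, t + dt)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        t += dt
    return x

residual = max(abs(hj_residual(x, t))
               for x in (-1.0, 0.0, 2.0) for t in (0.5, 1.0, 3.0))
x0, t0, t1 = 1.0, 1.0, 2.0
x1 = integrate_Y(x0, t0, t1)
y0, y1 = S_x(x0, t0), S_x(x1, t1)  # momentum along the lifted curve
print(residual < 1e-6, abs(y1 - y0) < 1e-9)
```

In this example the integral curves of $Y$ are the straight lines $x - c = (x_0 - c)\, t/t_0$, and the lifted curve has $dx/dt = y$ with $y$ constant, which is the free-particle solution of (13.2).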
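Let us now turn to the problem of actually solving the Hamilton-Jacobi equation
$$\frac{\partial S}{\partial t} + H\Big(x, \frac{\partial S}{\partial x}, t\Big) = 0. \tag{13.7}$$

THEOREM 13.10 Suppose that the domain $D \subset R^{2n+1}$ is sufficiently small, that $S^0(x)$ is a function defined in a domain $D'$ of $R^n$ (that is, $x$-space) such that the point $(x,\ y = (\partial S^0/\partial x_i)(x),\ t = 0)$ belongs to $D$ whenever $x \in D'$. Then, if $D'$ is sufficiently small, there exists an $a > 0$ and a unique solution $S(x, t)$ of (13.7), defined for $x \in D'$, $0 \le t \le a$, such that $S(x, 0) = S^0(x)$. Further, for $x^0 \in D'$, let $(x(t), y(t))$, $0 \le t \le a$, be the solution of the Hamilton equations:
$$\frac{dx_i}{dt} = H_{n+i}(x(t), y(t), t), \qquad \frac{dy_i}{dt} = -H_i(x(t), y(t), t), \tag{13.8}$$
with $x(0) = x^0$, $y(0) = (\partial S^0/\partial x_i)(x^0)$. Then $(x(t), y(t))$ must also satisfy
$$\frac{dx_i}{dt} = H_{n+i}\Big(x(t), \frac{\partial S}{\partial x}(x(t), t), t\Big), \qquad y_i(t) = \frac{\partial S}{\partial x_i}(x(t), t). \tag{13.9}$$
That is, $t \to (x(t), t)$ is an integral curve of the vector field defined by (13.4) on $(x, t)$-space. For $T \in [0, a]$,
$$S(x(T), T) = S^0(x^0) + \int_0^T \Big[y_i(t)\, \frac{dx_i}{dt} - H(x(t), y(t), t)\Big]\, dt. \tag{13.10}$$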
Proof. This theorem is really nothing more than a realization of Theorem 13.2. Let $\phi^0$ be the mapping $D' \to D$ that assigns $(x,\ y = ((\partial S^0/\partial x_i)(x)),\ 0)$ to $x \in D'$. Then
$$\phi^{0*}(y_i\, dx_i - H\, dt) = \frac{\partial S^0}{\partial x_i}\, dx_i = dS^0;$$
hence $\phi^0$ defines $D'$ as an integral submanifold of the 2-form
$$\omega = dy_i \wedge dx_i - dH \wedge dt.$$
We have seen that the characteristic curves of $\omega$ are essentially defined as the solutions of the Hamilton equations (13.9). By Theorem 13.2 there is an integral submanifold $\delta : D' \times [0, a] \to D$ such that $\delta(x, 0) = \phi^0(x)$ for $x \in D'$, and such that for each $x^0 \in D'$, $t \to \delta(x^0, t)$ is a characteristic curve of $\omega$. Let $p : D \to R^{n+1}$ be the "standard" projection that assigns $(x, t)$ to the point $(x, y, t) \in D$. Consider the map $p\delta : D' \times [0, a] \to R^{n+1}$. If $a$ is sufficiently small, this map has nonzero Jacobian. Suppose $D'$ and $a$ are chosen small enough so that there is a domain $D'' \subset R^{n+1}$ and an inverse map $(p\delta)^{-1} : D'' \to D' \times [0, a]$. Let $\phi = \delta(p\delta)^{-1}$, a map $D'' \to R^{2n+1}$. Then
$$p\phi = p\delta(p\delta)^{-1} = \text{identity map};$$
that is, $\phi$ is a cross-section map $D'' \to R^{2n+1}$, and is an integral submanifold of $\omega$. Then $d\phi^*(y_i\, dx_i - H\, dt) = 0$; hence
$$\phi^*(y_i)\, dx_i - \phi^*(H)\, dt = dS \tag{13.11}$$
for some function $S(x, t)$ defined for $(x, t) \in D''$. ($\phi^*(x_i) = x_i$ and $\phi^*(t) = t$ because of the cross-section property of $\phi$; that is, $p\phi = $ identity map.) Thus
$$\frac{\partial S}{\partial x_i} = \phi^*(y_i), \qquad \frac{\partial S}{\partial t} = -\phi^*(H) = -H\Big(x, \frac{\partial S}{\partial x}, t\Big);$$
that is, $S$ satisfies the Hamilton-Jacobi equation. We now show that $S(x, 0) = S^0(x) + $ constant. We see that $\phi(x, 0) = \phi^0(x)$. Thus
$$d(S(x, 0) - S^0(x)) = 0.$$
We have already seen in Theorem 13.7 that (13.9) does define a characteristic curve of $\omega$ starting at $(x^0,\ y(0) = ((\partial S^0/\partial x_i)(x^0)),\ 0) = \phi^0(x^0)$; hence it must agree with the curve $t \to \delta(x^0, t)$. Finally we prove (13.10). By (13.11),
$$S(x(T), T) - S(x^0, 0) = \int_0^T \frac{d}{dt}\, S(x(t), t)\, dt = \int_0^T \Big[y_i(t)\, \frac{dx_i}{dt} - H(x(t), y(t), t)\Big]\, dt.$$
Q.E.D.
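Formula (13.10) lends itself to a direct numerical check. The following is a minimal sketch, assuming the harmonic oscillator $H = (x^2 + y^2)/2$ in one dimension with initial function $S^0 = 0$, for which $S(x, t) = -\tfrac{1}{2} x^2 \tan t$ solves (13.7) for $|t| < \pi/2$; the function names and the values of `x0`, `T`, and `steps` are our own illustrative choices. The Hamilton equations (13.8) are integrated by a fourth-order Runge-Kutta scheme together with the action integrand $y\, dx/dt - H$, and the result is compared with $S(x(T), T)$.

```python
import math

def rhs(state):
    # Hamilton equations for the assumed H = (x^2 + y^2)/2, augmented with
    # the action integrand y * dx/dt - H as a third component.
    x, y, _ = state
    return (y, -x, y * y - 0.5 * (x * x + y * y))

def rk4(state, dt, steps):
    # Classical fourth-order Runge-Kutta integration of the augmented system.
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(s + dt * (a + 2 * b + 2 * c + d) / 6.0
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state

def S(x, t):
    # Solution of the Hamilton-Jacobi equation with S(x, 0) = 0, |t| < pi/2.
    return -0.5 * x * x * math.tan(t)

x0, T, steps = 1.2, 1.0, 2000
# Initial momentum is dS0/dx = 0; the accumulated action starts at 0.
xT, yT, action = rk4((x0, 0.0, 0.0), T / steps, steps)

lhs = S(xT, T)                   # left-hand side of (13.10)
rhs_value = S(x0, 0.0) + action  # right-hand side of (13.10)
print(abs(lhs - rhs_value) < 1e-9, abs(yT + xT * math.tan(T)) < 1e-9)
```

The second comparison checks that the momentum along the Hamilton trajectory agrees with $\partial S/\partial x$ evaluated on it, as asserted in the theorem.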
Equation (13.10) has an interpretation in terms of the "action" that has a certain importance in physics. In general, if $(x(t), y(t))$, $0 \le t \le T$, is a curve in $(x, y)$-space, the number
$$\int_0^T \Big[y_i(t)\, \frac{dx_i}{dt} - H(x(t), y(t), t)\Big]\, dt \tag{13.12}$$
is called the action along the curve. In problems in mechanics, $x_i(t)$ and $y_i(t)$ describe, respectively, how the position and momentum coordinates of a particle (or system of particles) change in time. The function $H$ gives the value of the energy, so that (13.12) assigns a definite number to each possible trajectory of the physical system. The "principle of least action" requires that the trajectory actually followed is one that minimizes this value of the action. It is easily seen that these curves are contained among the solutions of the Hamilton equations (13.8). In quantum mechanics (according to the viewpoint of Feynman) there is a "smearing out" of this one definite trajectory that minimizes the action, and possible trajectories are given weights determined by the action, and by Planck's constant $h$, in such a way that as $h \to 0$, this smearing peaks up to concentrate at the one definite minimizing trajectory. This would give a marvelously geometric picture of the "correspondence principle" (that is, of the sense in which quantum mechanics reduces to classical mechanics as Planck's constant goes to zero) if Feynman's ideas could be made more rigorous.

This finishes our discussion of the part of Hamilton-Jacobi theory that can be stated in terms of precise theorems. Much more material, harder to formulate as theorems, has been developed in the long history of the subject. This is mainly motivated by the role that Hamilton-Jacobi theory has played in physical applications, particularly in celestial mechanics, geometrical optics, and in the foundations of quantum mechanics. It is almost an obligatory task for anyone interested in these physical applications to read this supplementary material. We shall limit ourselves here to several remarks that fit in particularly well with the differential-form point of view.

First we ask what is meant by "solving" the Hamilton equations.
We have seen in Part 1 that solving a system of ordinary differential equations may be interpreted geometrically as finding a change of coordinates so that the vector field whose integral curves solve the differential equations is in these new coordinates a vector field whose integral curves are in some sense "known." We can make a similar interpretation of the problem of "solving" the Hamilton equations. Suppose that $\omega = dy_i \wedge dx_i - dH \wedge dt$. If a new coordinate system $(x_i', y_i', t')$ can be found so that
$$\omega = dy_i' \wedge dx_i', \tag{13.13}$$
we can say that $\omega$ is in canonical form in the coordinate system. Since $\omega$
remains the same, the characteristic curves of $\omega$ are geometrically the same, but the coordinates of these curves in the "new" coordinate system satisfy the Hamilton equations with Hamiltonian zero. That is, the $(x_i', y_i')$ are constant along the integral curves; hence, when expressed in terms of the "old" coordinates, they are a set of $2n$ functionally independent integral functions of the original Hamilton equations. Hence these original Hamilton equations can be regarded as "solved." We can then say that solving them is more or less equivalent to throwing $\omega$ into canonical form.

In Theorem 13.3 we have one method for writing $\omega$ in canonical form. However, this is not a really practical method for doing so: For example, it applies to any 2-form of constant rank and does not really take advantage of the fact that $\omega$ is initially in a relatively simple form. There is another method, due to Jacobi, for doing this. This method works by solving the Hamilton-Jacobi partial differential equation, and it seems particularly well adapted (at least as well as anything else) to the equations of celestial mechanics, particularly the 2-body problem. We now explain this method.

We have seen that a single solution $S(x, t)$ of
$$\frac{\partial S}{\partial t} + H\Big(x, \frac{\partial S}{\partial x}, t\Big) = 0 \tag{13.14}$$
determines an $n$-parameter family of solutions of the Hamilton equations, obtained by finding the integral curves of the vector field
$$\frac{\partial}{\partial t} + H_{n+i}\Big(x, \frac{\partial S}{\partial x}, t\Big) \frac{\partial}{\partial x_i}$$
in $(x, t)$-space. Thus an $n$-parameter family $S(x, t; a_1, \ldots, a_n)$ of solutions of (13.14) that depends "essentially" on the parameters, that is, satisfies
$$\det\Big(\frac{\partial^2 S}{\partial x_i\, \partial a_j}\Big) \ne 0, \tag{13.15}$$
should in principle determine the full $2n$-parameter family of solutions of the Hamilton equations. An $n$-parameter family of solutions of (13.14) satisfying (13.15) is called a complete solution of (13.14). Jacobi's trick consists in the observation that, given a complete solution, the reduction of $\omega$ to canonical form can be made directly without solving any differential equations, as follows: Introduce a "new" space $R^{4n+1}$ of variables $(x, y, t, a, b)$, $a = (a_i)$, $b = (b_i)$, etc. Consider the form
$$\Omega = dy_i \wedge dx_i - dH \wedge dt + db_i \wedge da_i = d(y_i\, dx_i - H\, dt + b_i\, da_i).$$
Consider $S(x, t; a)$ as a function of all the indicated $2n + 1$ variables. Then
$$dS(x, t; a) = \frac{\partial S}{\partial x_i}\, dx_i + \frac{\partial S}{\partial t}\, dt + \frac{\partial S}{\partial a_i}\, da_i.$$
Consider the submanifold defined by the relations
$$y_i - \frac{\partial S}{\partial x_i} = 0, \qquad b_i - \frac{\partial S}{\partial a_i} = 0. \tag{13.16}$$
Notice first that the differentials of the functions on the left-hand side of relations (13.16) are everywhere linearly independent; hence the relations do define a bona fide $(2n + 1)$-dimensional submanifold of $R^{4n+1}$. We next show that the projection of this submanifold on $(x, y, t)$-space has nonzero Jacobian. For this it suffices to show that every form on the submanifold can be written in terms of the $dx_i$, $dy_i$, and $dt$.† Clearly, this is an integral submanifold of $\Omega$. Now, on the submanifold,
$$db_i = \frac{\partial^2 S}{\partial x_j\, \partial a_i}\, dx_j + \frac{\partial^2 S}{\partial t\, \partial a_i}\, dt + \frac{\partial^2 S}{\partial a_i\, \partial a_j}\, da_j,$$
$$dy_i = \frac{\partial^2 S}{\partial x_i\, \partial a_j}\, da_j + \frac{\partial^2 S}{\partial x_i\, \partial x_j}\, dx_j + \frac{\partial^2 S}{\partial x_i\, \partial t}\, dt.$$
By (13.15), $da_i$, restricted to the submanifold, can then be expressed in terms of $dy$, $dx$, and $dt$, and thus so can $db_i$. This shows that the projection on $(x, y, t)$-space has nonzero Jacobian. Then, locally, the submanifold defined by (13.16) can be written as the graph of a mapping $\phi : (x, y, t) \to (a(x, y, t), b(x, y, t))$ of $R^{2n+1} \to R^{2n}$. We have
$$\phi^*(da_i \wedge db_i) = \omega;$$
hence $\phi$ is the required mapping, sending $\omega$ into canonical form (if $a_i$ and $b_i$ are redefined as $x_i'$, $y_i'$). Of course one point to all this is that (at least for some of the Hamiltonians occurring in the simpler problems of classical mechanics) a complete solution
† If $\psi : V \to V'$ is a linear transformation between vector spaces of the same dimension, to prove that $\psi$ is an isomorphism it suffices to prove that the dual map $\psi^*$ of linear forms: $V'^* \to V^*$ is onto.
of the Hamilton-Jacobi equation can be found (after a possible change in variables of $x$-space) in the additive form
$$S(x_1, \ldots, x_n; a_1, \ldots, a_n) = S_1(x_1, a_1) + \cdots + S_n(x_n, a_n)$$
by the method of separation of variables, so useful in elementary theoretical physics. However, the Hamilton-Jacobi equation is highly nonlinear, and solutions of this type seem to be even more accidental than are the solutions that can be obtained from the usual linear partial differential equations of theoretical physics.

Our second remark is concerned with "perturbation theory" in the special sense that it is used in celestial mechanics. However, a few remarks about perturbation theory in general might be in order. Let $D$ be a domain in an $R^n$, and let $X$ be a vector field in $D$. Let $X^\varepsilon$ be a one-parameter family of vector fields in $D$ reducing to $X$ when $\varepsilon = 0$. Suppose also that, for simplicity, the coefficients of $X^\varepsilon$ depend on $\varepsilon$ in a real analytic way. The simplest example would be $X^\varepsilon = X + \varepsilon Y$, where $Y$ is another vector field. In general, "perturbation theory" is concerned with studying how the integral curves of $X^\varepsilon$ are related to those of $X$ as $\varepsilon \to 0$. In terms of the nineteenth century, pre-Poincaré view of differential equations, this was a purely computational problem, limited only by one's ability to compute formal expansions in $\varepsilon$. Research since Poincaré has shown that this is a hopelessly naive view. (It is of a different order of difficulty, for example, from the results in divergent series; modern research has shown here that, by and large, the formal classical work can be cleaned up.) Existence theorems for the type of expansions one is trying to find must be proved and are usually quite difficult, at least for physically realistic situations. The typical problem of this type is that of perturbing periodic integral curves. An integral curve $\sigma(t)$, $-\infty < t < \infty$, is periodic or closed with period $T$ if $\sigma(t + T) = \sigma(t)$ for all $t$. It suffices, in view of the uniqueness theorem for integral curves, to show that $\sigma(T) = \sigma(0)$ (but $\sigma(t) \ne \sigma(0)$ for $0 < t < T$).
Given such a periodic integral curve, does there exist a periodic curve $\sigma_\varepsilon$ with period $T(\varepsilon)$ for $\varepsilon$ sufficiently close to 0, reducing to the $\sigma$ and $T$ as $\varepsilon \to 0$?
Let us return now to Hamilton-Jacobi theory. Suppose we have a one-parameter family of Hamiltonian functions, say, $H(x, y, t) = H^0(x, y, t) + \varepsilon H'(x, y, t)$. Suppose the Hamilton equations with Hamiltonian $H^0$ can be "solved," say by the method of Jacobi given above, and $H'$ is regarded as a small perturbing "energy" applied to the "known" system with Hamiltonian $H^0$. The typical example in celestial mechanics is that where $H^0$ describes the motion of the sun and earth, the "solvable" 2-body problem, and where $H'$ describes the perturbation on the earth, by, say, Venus. (Or, to be "modern," replace sun by earth, earth by satellite, Venus by the Moon.) Intuitively, the effect can be visualized as a "slow" change in the elliptical orbit of the earth; that is, the perturbation causes the "parameters" describing the orbit, which would be constant were there no perturbation, to change slowly with time. Analytically, this can be interpreted as follows (absorbing the factor $\varepsilon$ into $H'$):
$$dy_i \wedge dx_i - dH \wedge dt = (dy_i \wedge dx_i - dH^0 \wedge dt) - dH' \wedge dt.$$
Being able to solve the system with Hamiltonian $H^0$ means that we can find functions $(x_i', y_i')$ on $(x, y, t)$-space such that
$$dy_i \wedge dx_i - dH^0 \wedge dt = dy_i' \wedge dx_i'.$$
Now $(x_i', y_i', t)$ forms a new coordinate system; for, holding $t = $ constant, we have $dy_i' \wedge dx_i' = dy_i \wedge dx_i$; hence the Jacobian, for fixed $t$, of the map going from $(x, y)$ to $(x'(x, y, t), y'(x, y, t))$ is 1. Thus
$$dy_i \wedge dx_i - dH \wedge dt = dy_i' \wedge dx_i' - dH' \wedge dt.$$
When $H'$ is expressed as a function of the new coordinates, the characteristic curves of the form, that is, the Hamilton equations with "total" Hamiltonian $H$, are described by Hamilton equations in the primed coordinates with Hamiltonian $H'$: The effects of the unperturbed system have been completely taken into account, and the variations of the "constants" $(x_i', y_i')$ of the unperturbed motion due to the "perturbation" $H'$ are very simply taken into account.

Our third topic will be to discuss in more generality some of the underlying "transformation" properties of the Hamilton equations that we have used in the first two topics. Suppose, then, that we are given two separate systems. Consider two spaces of variables $(x_i, y_i, t)$ and $(x_i', y_i', t')$ with Hamiltonians $H$ and $H'$ given on both. A mapping $\phi$ from the unprimed to the primed space such that
$$\phi^*(dy_i' \wedge dx_i' - dH' \wedge dt') = dy_i \wedge dx_i - dH \wedge dt \tag{13.17}$$
will have the following property:
Given a curve $(x(t), y(t))$ in unprimed space that is a solution of the Hamilton equations with Hamiltonian $H$, the image curve $\phi(x(t), y(t))$ will, after reparametrization by the level surface of the function $t'$, be a solution of the Hamilton equations with Hamiltonian $H'$. Thus a $\phi$ satisfying (13.17) sets up a correspondence between solutions of the two Hamilton equations; however, the sense of "time" may not be preserved in this transformation, and the Hamiltonian $H'$ may bear a quite complicated relation to $H$.
Now, clearly of special importance in all this are the transformations of $(x, y)$ to $(x', y')$-space above, such that
$$\phi^*(dy_i' \wedge dx_i') = dy_i \wedge dx_i. \tag{13.18}$$
Such a transformation is called a canonical transformation. If $H'(x', y', t)$ and $H(x, y, t)$ are functions such that $\phi^*(H') = H$, by (13.18) $\phi$ carries solutions of the Hamilton equations with Hamiltonian $H$ into solutions for the Hamiltonian $H'$. If we regard $\phi$ as a mapping of the same space onto itself, the canonical transformations with an inverse† form a group. This group may be regarded as permuting the Hamiltonian systems. In classical language, a Hamiltonian system of ordinary differential equations is a Lie system with respect to the group of canonical transformations (just as a linear system of differential equations defined by a vector field of the form
$$\frac{\partial}{\partial t} + A_{ij}(t)\, x_j\, \frac{\partial}{\partial x_i}$$
is a Lie system with respect to the group of all transformations of $(x, t)$-space of the form $(x, t) \to (a(t)x, t)$, where $a(t) = (a_{ij}(t))$ is an $n \times n$ matrix function of $t$). We defer further study of the group of canonical transformations until later.

Exercises
1. Work out the solution of the Hamilton-Jacobi equation for the case where $H(x, y)$ is of the form $\tfrac{1}{2} y_i y_i + V(r)$ with $r^2 = x_i x_i$ $(1 \le i \le 3)$. Work out the solution of the Kepler problem in celestial mechanics (that is, the case $V(r) = 1/r$) in as complete a form as possible.
2. Let $H(x, y)$, $H'(x, y)$ be two Hamiltonians. Show by means of Jacobi's complete solution method that locally there is a canonical transformation taking one into the other. If, for example, $H'$ is a function of $y$ alone, discuss what this means for the problem of "solving" the Hamilton equations for Hamiltonian $H$. Can you think of any "global" reasons why this canonical transformation may not exist globally? For example, discuss the Kepler problem from this global point of view. Also, discuss globally the simple case where $x$ and $y$ are one-dimensional vectors.

† Since $\phi^*(dx_1' \wedge dy_1' \wedge \cdots \wedge dx_n' \wedge dy_n') = dx_1 \wedge dy_1 \wedge \cdots \wedge dx_n \wedge dy_n$, a canonical transformation always has Jacobian equal to 1; hence it has at least a local inverse.
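Jacobi's complete-solution method from this chapter can be tried out on a small case. The following is a minimal sketch, assuming the one-dimensional Hamiltonian $H = y^2/2 + Gx$ (a particle in a uniform field) and the separated complete solution $S(x, t; a) = -at - (2(a - Gx))^{3/2}/(3G)$ on the branch $y > 0$; the constant `G`, the helper names, and the step sizes are hypothetical. Solving $y = \partial S/\partial x$, $b = \partial S/\partial a$ gives $a = y^2/2 + Gx$ and $b = -t - y/G$, and the code checks that these new coordinates are constant along a numerically integrated Hamilton trajectory and that the defining relations hold.

```python
G = 1.0  # hypothetical constant field strength

def H(x, y):
    # Assumed example: H = y^2/2 + G*x.
    return 0.5 * y * y + G * x

def S(x, t, a):
    # Complete solution by separation of variables (branch y > 0); a is the energy.
    return -a * t - (2.0 * (a - G * x)) ** 1.5 / (3.0 * G)

def new_coordinates(x, y, t):
    # Solution of y = dS/dx, b = dS/da for (a, b) in terms of (x, y, t).
    return 0.5 * y * y + G * x, -t - y / G

def hamilton_rhs(x, y):
    # dx/dt = H_y, dy/dt = -H_x.
    return y, -G

def rk4_step(x, y, dt):
    k1 = hamilton_rhs(x, y)
    k2 = hamilton_rhs(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
    k3 = hamilton_rhs(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
    k4 = hamilton_rhs(x + dt * k3[0], y + dt * k3[1])
    return (x + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            y + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)

def partial(f, args, k, h=1e-6):
    # Central difference of f in its k-th argument.
    up, dn = list(args), list(args)
    up[k] += h
    dn[k] -= h
    return (f(*up) - f(*dn)) / (2.0 * h)

x, y, t, dt = 0.0, 2.0, 0.0, 0.01
a0, b0 = new_coordinates(x, y, t)
drift = 0.0     # largest change of (a, b) along the trajectory
mismatch = 0.0  # largest violation of y = S_x, b = S_a, S_t + H = 0
for _ in range(100):
    x, y = rk4_step(x, y, dt)
    t += dt
    a, b = new_coordinates(x, y, t)
    drift = max(drift, abs(a - a0), abs(b - b0))
    mismatch = max(mismatch,
                   abs(partial(S, (x, t, a), 0) - y),
                   abs(partial(S, (x, t, a), 2) - b),
                   abs(partial(S, (x, t, a), 1) + H(x, y)))
print(drift < 1e-9, mismatch < 1e-6)
```

For this example one can also verify by hand that $da \wedge db = dy \wedge dx - dH \wedge dt$, so that $(a, b)$ plays the role of the canonical coordinates $(x', y')$ with Hamiltonian zero.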
14
Extremal Fields and Sufficient Conditions for a Minimum
Let $B$ be a manifold. We have said that a Lagrangian for $B$ is a real-valued function $L : T(B) \times R \to R$, which enables us to define a real-valued function
$$\sigma \to L(\sigma) = \int_0^1 L(\sigma'(t), t)\, dt$$
on the space of curves of $B$. For theoretical purposes, it is most convenient to study homogeneous, time-independent Lagrangians. This means that
$$L \text{ is a function } L(v) \text{ of } v \in T(B) \text{ alone}; \tag{14.1a}$$
$$L(\lambda v) = \lambda L(v) \quad \text{for } \lambda > 0. \tag{14.1b}$$
Condition (14.1b) guarantees that the function $\sigma \to L(\sigma)$ is independent of the parametrization of the curves. In the development of this chapter (we are following Carathéodory's ideas [2], as modified slightly by Hermann [1]), we shall consider such Lagrangians, indicating later how nonhomogeneous ones are handled also. Introduce local coordinates $(x_i)$ for $B$ and $(x_i, \dot{x}_i)$ for $T(B)$ as explained in Chapter 12. Then $L$ becomes a function $L(x, \dot{x})$, satisfying the homogeneity condition
$$L(x, \lambda\dot{x}) = \lambda L(x, \dot{x}). \tag{14.2}$$
Differentiating both sides of this relation gives Euler's relations:
$$L(x, \dot{x}) = L_{n+j}(x, \dot{x})\, \dot{x}_j, \tag{14.3}$$
$$L_{n+i,n+j}(x, \dot{x})\, \dot{x}_j = 0, \qquad L_{n+j}(x, \lambda\dot{x}) = L_{n+j}(x, \dot{x}). \tag{14.4}$$
(Recall that
$$L_{n+j} = \frac{\partial L}{\partial \dot{x}_j}, \qquad L_j = \frac{\partial L}{\partial x_j}, \quad \text{etc.)}$$
Now
$$\theta(L) = L_{n+j}\, dx_j - (L_{n+i}\, \dot{x}_i - L)\, dt = L_{n+j}\, dx_j;$$
using (14.2), $\theta(L)$ becomes a form on $T(B)$ alone. Hence, in dealing with such homogeneous Lagrangians, we can take $M$ as $T(B)$ and omit explicit "time" dependence. When considering constraints, it is appropriate to consider homogeneous ones also. Thus $K$ will be a subset of $T(B)$ such that
$$\lambda v \in K \quad \text{for all } v \in K, \text{ all } \lambda > 0.$$
In the traditional versions of the Lagrange variational problem, $K$ is defined by equations on $T(B)$:
$$g_a(v) = 0 \quad \text{for } 1 \le a \le m,$$
with
$$g_a(\lambda v) = \lambda g_a(v) \quad \text{for } \lambda > 0.$$
Definition An extremal field for the homogeneous variational problem is defined by a pair $(W, X)$ consisting of a real-valued function $W$ on $B$ and a vector field $X \in V(B)$ such that
$$X(W) = 1 = L(X(b)) \quad \text{for all } b \in B. \tag{14.5a}$$
That is,
$$\frac{d}{dt}\, W(\sigma(t)) = L(\sigma'(t))$$
along any integral curve $t \to \sigma(t)$ of $X$. (This is just a normalizing condition.)
$$X(b) \in K \quad \text{for all } b \in B. \tag{14.5b}$$
For each $b \in B$, $L(X(b))$ is a relative minimum of the function $v \to L(v)$, where $v$ varies over the vectors $v \in B_b$ satisfying
$$v \in K, \qquad v(W) = 1. \tag{14.5c}$$
THEOREM 14.1 Let $\sigma(t)$, $0 \le t \le a$, be an integral curve of $X$. If $\sigma_1(t)$, $0 \le t \le a$, is any nearby† curve to $\sigma(t)$, with $W(\sigma(0)) = W(\sigma_1(0))$, $W(\sigma_1(a)) = W(\sigma(a))$, whose tangent vector curve belongs to the constraint set, that is, $\sigma_1'(t) \in K$ for $0 \le t \le a$, then
$$L(\sigma) = \int_0^a L(\sigma'(t))\, dt \le L(\sigma_1) = \int_0^a L(\sigma_1'(t))\, dt.$$
† The precise meaning of "nearby" will be made clear in the proof.
If, further, for each $b \in B$, $X(b)$ is the only minimal point of $L(v)$, with $v$ subject to the conditions listed in (14.5c), then $L(\sigma) < L(\sigma_1)$, unless $\sigma_1$ differs from $\sigma$ only by a change of parametrization.

Proof. We can suppose without loss of generality that $W(\sigma(0)) = 0$. Then, by (14.5a), $W(\sigma(t)) = t$; that is, $\sigma$ is parametrized by the value of $W$ on the level surface of $W$ it lies on. (Think of the successive level surfaces of $W$ as "wave fronts," and the integral curves of $X$ as the "rays" corresponding to these wave fronts.) If $(d/dt)W(\sigma_1(t)) > 0$, the parametrization of $\sigma_1$ can be changed so that $W(\sigma_1(t)) = t$ also, that is, so that $\sigma_1$ is parametrized by the level surface of $W$ it lies on. Let us then make precise the condition that $\sigma_1$ be "nearby" to $\sigma$ by requiring that:
(a) $(d/dt)W(\sigma_1(t)) > 0$; that is, the function $W$ is always strictly increasing on $\sigma_1$. Thus the values of $W$ can be introduced as a parametrization for $\sigma_1$; that is, we can suppose that $W(\sigma_1(t)) = t$, $0 \le t \le a$.
(b) For each $t \in [0, a]$, $\sigma_1'(t)$ (which now satisfies $\sigma_1'(t)(W) = 1$ and $\sigma_1'(t) \in K$) is sufficiently close to $X(\sigma_1(t))$ so that property (14.5c) holds; that is, $L(X(\sigma_1(t))) \le L(\sigma_1'(t))$.
Thus,
$$L(\sigma_1'(t)) \ge L(X(\sigma_1(t))) = 1 = L(X(\sigma(t))) = L(\sigma'(t)),$$
whence
$$L(\sigma_1) = \int_0^a L(\sigma_1'(t))\, dt \ge \int_0^a 1\, dt = \int_0^a L(\sigma'(t))\, dt = L(\sigma).$$
Q.E.D.
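The inequality of Theorem 14.1 can be seen in the simplest unconstrained case. The following is a minimal sketch, assuming $B = R^2$, the Euclidean length $L(v) = |v|$, $W(x_1, x_2) = x_1$, and $X = \partial/\partial x_1$, which satisfy (14.5a) and (14.5c); the comparison curves $\sigma_1(t) = (t, \varepsilon \sin \pi t)$, the function names, and the values of $\varepsilon$ are hypothetical choices. Each comparison curve is already parametrized by the value of $W$, and its action is computed by the midpoint rule.

```python
import math

def L(v):
    # Assumed example: Euclidean length, homogeneous of degree one in v.
    return math.hypot(v[0], v[1])

def curve_velocity(eps, t):
    # sigma_1(t) = (t, eps * sin(pi t)); W = x1 increases at unit rate along it.
    return (1.0, eps * math.pi * math.cos(math.pi * t))

def action(eps, steps=2000):
    # Midpoint rule for the integral of L(sigma_1'(t)) over [0, 1].
    dt = 1.0 / steps
    return sum(L(curve_velocity(eps, (k + 0.5) * dt)) for k in range(steps)) * dt

# eps = 0 is the integral curve sigma(t) = (t, 0) of X itself.
values = {eps: action(eps) for eps in (0.0, 0.05, 0.1, 0.2)}
for eps, value in values.items():
    print(eps, value)
```

The value for $\varepsilon = 0$ is the action of the ray itself, and every perturbed curve joining the same two level surfaces of $W$ gives a strictly larger value, as the theorem predicts when $X(b)$ is the unique minimizer in (14.5c).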
Thus we have found a method (Carathéodory's) for proving that certain curves give a minimum to a given variational problem by solving an infinite succession (parametrized by the level surfaces of $W$) of finite-dimensional minimization problems. Actually, the method is an abstraction of a method that had been implicit from the beginning of the calculus of variations. Carathéodory simply stood the classical reasoning on its head and put the motivation into, first, making the method clear; and second, into carrying out the analytical details necessary to show that there is a "plentiful" (local) supply of such extremal fields, and to find the conditions that a single curve be embeddable as an integral curve of an extremal field.

At every point of the domain, we have a vector $X(b)$; these vectors are the "rays" corresponding to the wave fronts $W = $ constant. $X(b)$ represents the "optimal" direction to go when a curve is at point $b$. Since $\sigma$ is an integral curve of $X$, it is always going optimally; (14.5c) expresses this "optimality." Any other curve going from $\sigma(0)$ to the surface $W = W(\sigma(a))$ would at some point violate this optimality; hence it would give a larger value to $L(\sigma_1)$.

Let us now work out the conditions (14.5) more explicitly in the case of the constraint set $K$ defined by equations: $g_a(x, \dot{x}) = 0$. Introduce the abbreviations:
$$g_{a,i} = \frac{\partial g_a}{\partial x_i}, \qquad g_{a,n+i} = \frac{\partial g_a}{\partial \dot{x}_i}, \qquad W_i = \frac{\partial W}{\partial x_i}, \quad \text{etc.}$$
Suppose that $W(x)$ and $X = A_i(x)(\partial/\partial x_i)$ define an extremal field. Fix $x^0 \in D$. We shall carry out the minimization of (14.5c), using in the usual way the Lagrange multiplier rule. Introduce real constants $\lambda$, $\lambda_a$; take the function $\dot x \to L(x^0, \dot x)$, add the constraint functions multiplied by the multiplier constants,
$$\dot x \to L(x^0, \dot x) + \lambda\,(W_i(x^0)\dot x_i - 1) + \lambda_a g_a(x^0, \dot x),$$
and express the fact that $\dot x_i = A_i(x^0)$ is to be a critical point for this unconstrained function of $\dot x$; that is,
$$L_{n+i}(x^0, A(x^0)) + \lambda W_i(x^0) + \lambda_a g_{a,n+i}(x^0, A(x^0)) = 0.$$
Now we want $x^0$ to vary. It is not too unreasonable to suppose that the $\lambda$ and $\lambda_a$ will also vary with $x$; that is,
$$L_{n+i}(x, A(x)) + \lambda(x) W_i(x) + \lambda_a(x)\, g_{a,n+i}(x, A(x)) = 0. \tag{14.6}$$
By (14.5a), we have $W_i A_i = 1$. The Euler relations for homogeneous functions give
$$L_{n+i}(x, \dot x)\dot x_i = L(x, \dot x), \qquad g_{a,n+i}(x, \dot x)\dot x_i = g_a(x, \dot x).$$
By (14.5b), $g_a(x, A(x)) = 0$; hence $g_{a,n+i}(x, A(x))A_i(x) = g_a(x, A(x)) = 0$. Multiplying (14.6) by $A_i(x)$ and adding, using the Euler relations, we have
$$0 = L(x, A(x)) + \lambda(x) = 1 + \lambda(x),$$
so that $\lambda(x) = -1$. Thus (14.6) simplifies to
$$\frac{\partial W}{\partial x_i}(x) = L_{n+i}(x, A(x)) + \lambda_a(x)\, g_{a,n+i}(x, A(x)),$$
or
$$dW = L_{n+i}(x, A(x))\,dx_i + \lambda_a(x)\, g_{a,n+i}(x, A(x))\,dx_i. \tag{14.7}$$
We want to describe this relation more geometrically. Now $\theta(L) = L_{n+i}\,dx_i$. Consider $(\lambda_a)$ as the coordinates of a space $R^m$, and consider the space of the variables $(x_i, \dot x_i, \lambda_a)$ as $T(B) \times R^m$. Then $X$ and the functions $(\lambda_a(x))$ together define a cross-section mapping $\Phi: B \to K \times R^m$. Notice that (14.7) can be rewritten as
$$dW = \Phi^*(\theta(L) + \lambda_a \theta(g_a)). \tag{14.8}$$
Applying $d$ to both sides, we have
$$0 = \Phi^*[d(\theta(L) + \lambda_a \theta(g_a))]. \tag{14.9}$$
Thus $\Phi$, considered as a map of $B \to K \times R^m$, defines a submanifold of $K \times R^m$ such that the 2-form $\omega = d(\theta(L) + \lambda_a \theta(g_a))$ on $K \times R^m$ is identically zero when restricted to this submanifold. Thus the (partial) differential equations defining the extremal field can be defined in this "geometric" way.

Conversely, consider an $n$-dimensional submanifold of $K \times R^m$ such that $\omega$ is zero on this submanifold and such that the forms $dx_1, \ldots, dx_n$ are independent on the submanifold. Notice now that if $B$ is sufficiently small, the cross-section map $\Phi: B \to K \times R^m \subset T(B) \times R^m$ and the function $W$ on $D$ satisfying (14.8) can be reconstructed. For suppose that the submanifold is realized as a map $\phi: B' \to K \times R^m$, where $B'$ is a domain of $R^n$, such that
$$\phi^*(\omega) = 0, \qquad \phi^*(dx_1 \wedge \cdots \wedge dx_n) \ne 0.$$
Then the composite map $B' \to K \times R^m \to B$ has nonzero Jacobian (if the coordinates of $B'$ are $t_1, \ldots, t_n$, $\phi$ satisfies
$$\phi^*(dx_1 \wedge \cdots \wedge dx_n) = J(t)\,(dt_1 \wedge \cdots \wedge dt_n)).$$
Hence, if $B'$ is small enough, an inverse map exists; that is, we can identify $B'$ with $B$ and suppose that $\phi$ is a cross-section map $x \to (X(x), \lambda_a(x))$, where $X(x) \in K$ and $(\lambda_a(x)) \in R^m$. Then
$$0 = \phi^*(\omega) = \phi^*(d(\theta(L) + \lambda_a \theta(g_a))) = d\,\phi^*(\theta(L) + \lambda_a \theta(g_a));$$
hence there is a function $W$ with
$$dW = \phi^*(\theta(L) + \lambda_a \theta(g_a));$$
that is, if $\phi$ is identified with $\Phi$, we have just (14.8). This discussion may be summed up by saying:

There is a 1-1 correspondence between extremal fields of the variational problem defined by the homogeneous Lagrangian $L(x, \dot x)$ and the homogeneous constraint functions $g_a(x, \dot x) = 0$ and $n$-dimensional integral submanifolds of the 2-form $d(\theta(L) + \lambda_a \theta(g_a))$ on $K \times R^m$ (= subset of $(x_i, \dot x_i, \lambda_a)$ defined by $g_a = 0$) which have the property that $dx_1 \wedge \cdots \wedge dx_n$ is nonzero on the submanifold.
Of course one source of imprecision in this statement is that, in working our way backward to define $X$, $X(x)$ is only a critical point, not necessarily a relative minimum, of the function $v \to L(v)$ when $v$ runs over those $v \in D_x$ such that $v \in K$, $v(W) = 1$. It is usual in classical treatments to impose a priori conditions on the second derivatives $L_{n+i,n+j}$, $g_{a,n+i,n+j}$ of $L$ and $g_a$, guaranteeing that the Hessian form of this function is positive. Such a condition is usually called a Legendre condition, but there is really no point in our writing it down explicitly here.

THEOREM 14.2

If $\Phi: x \to (x, \dot x = A(x), \lambda_a(x))$ is an integral map of $B \to K \times R^m$ such that $\Phi^*(d(\theta(L) + \lambda_a \theta(g_a))) = 0$, and if $X$ is the associated vector field, that is, $X = A_i(\partial/\partial x_i)$, then an integral curve $x(t)$ of $X$ has the following property: The curve $t \to (x(t), (d/dt)x(t) = \dot x(t), \lambda_a(x(t)))$ is a characteristic curve of the 2-form $d(\theta(L) + \lambda_a \theta(g_a))$. A necessary condition that a given curve $t \to x(t)$ in $B$ be embeddable in an extremal field is then that there be functions $(\lambda_a(t))$, called Lagrange multipliers, so that the curve
$$t \to (x(t), \dot x(t), \lambda_a(t))$$
is a characteristic curve of this 2-form.
Proof. Let $\sigma(t)$, $0 \le t \le 1$, be an integral curve of $X$, and let $\sigma_1(t) = \Phi(\sigma(t)) = (\sigma'(t), \lambda_a(\sigma(t)))$ be the image curve in $K \times R^m$. Let
$$\theta = L_{n+i}\,dx_i + \lambda_a g_{a,n+i}\,dx_i = \theta(L) + \lambda_a \theta(g_a).$$
Taking $\partial/\partial \dot x_j$ of (14.2), we have
$$L_{n+i,n+j}(x, \dot x)\dot x_i + L_{n+j} = L_{n+j},$$
or
$$L_{n+i,n+j}(x, \dot x)\dot x_i = 0.$$
We remark now that $\sigma_1'(t) \,\lrcorner\, d\theta$, as a 1-covector at the point $\sigma_1(t)$, contains nonzero terms only in the $dx_i$-terms. For since $\sigma(t)$ is an integral curve of $X = A_i(\partial/\partial x_i)$,
$$\frac{d}{dt}x(t) = A(x(t)).$$
The coefficient of $\sigma_1'(t) \,\lrcorner\, d\theta$ involving $d\dot x_j$ is then
$$-(L_{n+i,n+j} + \lambda_a g_{a,n+i,n+j})\dot x_i = 0$$
by the Euler relations. The coefficient of $d\lambda_a$ is
$$-g_{a,n+i}\dot x_i = -g_a(\sigma'(t)) = 0,$$
since $\sigma'(t) \in K$; that is, $g_a(\sigma'(t)) = 0$.

Now we remark that the 1-covector $\sigma_1'(t) \,\lrcorner\, d\theta$ is zero when pulled back to $B$ by $\Phi^*$; that is, $\Phi^*(\sigma_1'(t) \,\lrcorner\, d\theta) = 0$. To prove this, let $v \in B_{\sigma(t)}$. Then
$$\Phi^*(\sigma_1'(t) \,\lrcorner\, d\theta)(v) = (\sigma_1'(t) \,\lrcorner\, d\theta)(\Phi_*(v)) = d\theta(\sigma_1'(t), \Phi_*(v)) = d\theta(\Phi_*(\sigma'(t)), \Phi_*(v)) = \Phi^*(d\theta)(\sigma'(t), v) = 0.$$
These two remarks clearly force $\sigma_1'(t) \,\lrcorner\, d\theta = 0$; that is, $t \to \sigma_1(t) = \Phi(\sigma(t))$ is a characteristic curve of the 2-form $d(\theta(L) + \lambda_a \theta(g_a))$, as required.

The fact that $t \to (x(t), (dx/dt), \lambda_a(t))$ is a characteristic curve of $d(\theta(L) + \lambda_a \theta(g_a))$ leads to an interpretation of the Lagrange variational rule: Construct the Lagrangian
$$L'(x, \dot x, t) = L(x, \dot x) + \lambda_a(t)\, g_a(x, \dot x).$$
Notice that the curves $t \to (x(t), (dx/dt), \lambda_a(t))$ that are characteristic curves of $d(\theta(L) + \lambda_a \theta(g_a))$ and satisfy the constraints $g_a = 0$ are also characteristic curves of $d\theta(L')$; that is, they are the extremals of the ordinary variational problem defined by $L'$ that happen to satisfy the constraints.

Linear Lagrangians and Convex Inequality Constraints

Suppose now that $L: v \to L(v)$ is a linear function on $T(B)$ with $B$ a domain of $R^n$. Thus, in terms of coordinates $(x_i, \dot x_i)$ for $T(B)$, $1 \le i, j, \ldots \le n$, $L(x, \dot x)$ must have the form $L(x, \dot x) = a_i(x)\dot x_i$. Further, we shall suppose that $K$ is defined as the set of $(x, \dot x) \in T(B)$ such that $g_a(x, \dot x) \le 0$, $1 \le a, b, \ldots \le m$, where, for each $x \in B$, the Hessian matrix $(g_{a,n+i,n+j}(x, \dot x))$ is positive semidefinite. (Thus, the $g_a$ are convex functions when restricted to each tangent space.)

Suppose that $W$ and $X = A_i(\partial/\partial x_i)$ are, respectively, a function and a vector field in $D$ that define an extremal field in $B$ for the variational problem. By (14.5a),
$$X(W) = 1 = W_i(x)A_i(x), \qquad g_a(x, A(x)) \le 0.$$
For each $x \in B$, $\dot x = A(x)$ gives the minimum value of the function $\dot x \to a_i(x)\dot x_i$ when $\dot x$ varies in a neighborhood of $A(x)$, subject to the conditions
$$W_i(x)\dot x_i = 1, \qquad g_a(x, \dot x) \le 0.$$
Note first that we cannot have $g_a(x, A(x)) < 0$ for all $a$; that is, $A(x)$ cannot be in the interior of the constraint set. For otherwise the function $\dot x \to L(x, \dot x)$ would be a linear function that has a relative minimum on an open subset of a space $R^{n-1}$ (after the constraint $W_i(x)\dot x_i = 1$ is taken into account), which is impossible. Suppose, then, that $g_a(x, A(x)) = 0$ for $1 \le a \le p$, but that $g_a(x, A(x)) < 0$ for $p + 1 \le a \le m$.

Let us use a geometric terminology. Let $K^x$ be the subset of $\dot x$ such that $(x, \dot x) \in K$. Thus $K^x$ is that part of the constraint set lying over the point $x$. It is a convex subset of $B_x$. Think of the subset of $K^x$ consisting of those $\dot x$ satisfying
$$g_a(x, \dot x) = 0, \quad 1 \le a \le p; \qquad g_a(x, \dot x) < 0, \quad p + 1 \le a \le m,$$
as a face of the convex set $K^x$. We also call the face nondegenerate if the rank of the matrix $(g_{a,n+i}(x, \dot x))$ is, for all $(x, \dot x)$ on the face and for $1 \le a \le p$, $1 \le i \le n$, equal to $p$, that is, is of maximal rank. We shall then say that this is an $(n - p)$-dimensional face of the convex set $K^x$.

Continue to regard $x \in B$ as fixed. We now show that the linear forms
$$\dot x \to W_i(x)\dot x_i, \qquad \dot x \to g_{a,n+i}(x, A(x))\dot x_i, \quad 1 \le a \le p, \tag{14.10}$$
are linearly independent. Suppose otherwise: By the hypothesis that the matrix $(g_{a,n+i}(x, A(x)))$ has maximal rank, the forms $\dot x \to g_{a,n+i}(x, A(x))\dot x_i$ are linearly independent. There must then be a relation of the form
$$W_i(x) = \sum_{a=1}^{p} c_a\, g_{a,n+i}(x, A(x)).$$
Then
$$1 = W_i(x)A_i(x) = \sum_{a=1}^{p} c_a\, g_{a,n+i}(x, A(x))A_i(x) = \sum_{a=1}^{p} c_a\, g_a(x, A(x)) = 0,$$
which is a contradiction.
Let $v \in B_x$, and consider the line segment $t \to X(x) + tv$, $0 \le t \le 1$. If the following conditions are satisfied, the whole segment lies in $K^x$, for $t$ sufficiently close to zero, and satisfies the normalizing constraint $W_i(x)\dot x_i = 1$:
$$W_i(x)\dot x_i(v) = 0, \tag{14.11a}$$
$$\frac{d}{dt}\, g_a(X(x) + tv)\Big|_{t=0} = g_{a,n+i}(x, A(x))\dot x_i(v) \le 0 \quad \text{for } 1 \le a \le p. \tag{14.11b}$$
The minimal property of $X(x) = (x, A(x))$ requires that
$$L(v) = a_i(x)\dot x_i(v) \ge 0$$
for all vectors $v \in B_x$ satisfying (14.11). The first condition this imposes is that the form $\dot x \to a_i(x)\dot x_i$, or $v \to L(v)$, can be written as a linear combination of the forms in (14.10); that is,
$$a_i(x) = \sum_{a=1}^{p} \lambda_a\, g_{a,n+i}(x, A(x)) + \lambda W_i(x). \tag{14.12b}$$
For the forms of (14.10) are linearly independent: If they are extended to a basis of linear forms, if the form $\dot x \to a_i(x)\dot x_i$ is expressed in terms of this basis, and if the coefficients of the forms other than those in (14.10) were nonzero, there would be a $v \in B_x$ satisfying (14.11) (in fact, the expressions in (14.11b) could be zero) for which $L(v)$ could have arbitrary sign, which would give a contradiction. Using the Euler homogeneity relations again, we see that the $\lambda$ occurring in (14.12b) must be 1. Thus, finally, the minimization property requires that the $\lambda_a$, $1 \le a \le p$, occurring in (14.12) be $\le 0$. We can thus extend the $\lambda_a$ to $1 \le a \le m$ by setting $\lambda_a = 0$ for $p + 1 \le a \le m$, so that $\lambda_a \le 0$ for all $a$.

All this has been for a fixed value $x \in B$. If, when $x$ varies, $X(x)$ always lies on a nondegenerate $(n - p)$-dimensional face, the $\lambda_a$ can be chosen as functions of $x$, $\lambda_a(x)$, $1 \le a \le m$, and we then have
$$dW = a_i\,dx_i - \sum_{a=1}^{m} \lambda_a(x)\, g_{a,n+i}(x, A(x))\,dx_i.$$
Note that $\theta(L) = L_{n+i}\,dx_i = a_i\,dx_i$. Again this means that the mapping $\Phi: x \to (X(x), -\lambda_a(x))$ of $B \to K \times R^m$ is an integral submanifold of
$$d(\theta(L) + \lambda_a \theta(g_a)),$$
so that the extremal fields are determined by the same sorts of differential equations as those for the variational problems (14.5a) involving equality constraints, but in addition the integral submanifolds of this 2-form must satisfy an inequality condition, namely: The functions $\lambda_a$ on $K \times R^m$ must be $\ge 0$ on the integral submanifold.

This may be described as the "non-singular" part of the theory. One obvious sort of singularity may happen when $x \to X(x)$ lies on faces of different dimensions as $x$ varies over $B$. In problems of the theory of optimal control, this phenomenon has great importance, since it corresponds to "switching," but its general theory seems to be much more difficult.

It is traditional in treatises on the calculus of variations to spend considerable effort in developing the various necessary and sufficient conditions for extremals of variational problems to give minima. However, this is not a subject of active interest in differential geometry (except in the special case of a Riemannian metric, which we shall develop later with other methods), and we shall not go into more details here. Our aim in this chapter has been only to present Carathéodory's brilliant idea in as clear a form as possible. In the next chapter, we shall present further material of a classical nature that is useful in classical mechanics.
Exercises

1. Prove (14.2) through (14.4).

2. Suppose $L$ is a Lagrangian on a manifold $M$, and $N$ is a submanifold. Suppose the constraint set $K$ consists of the $v \in T(M)$ that are tangent to $N$. What does the Lagrange variational rule say about the extremals?

3. Show that Newton's equations of motion for a system of particles (Chapter 11), for forces derivable from a potential, at least, can be written directly as the Euler equations for a Lagrangian. What is the relation of d'Alembert's principle to the Lagrange variational rule?

4. Discuss the case of "nonholonomic" constraints in classical mechanics (for example, the case of a friction-free sphere rolling on a plane) from the point of view of the Lagrange variational rule. Is there a generalization of d'Alembert's principle to this case?

5. For a simple variational problem (that is, with no constraints) work out the Legendre condition.

6. Consider a simple case where such a Legendre condition is not satisfied in a uniform way; for example, the case of a pseudo-Riemannian metric in the plane (see Part 3). Discuss the possible minimizing properties of geodesics (that is, extremals), using Carathéodory's method.
15 The Ordinary Problems of the Calculus of Variations
In this chapter, we present the more classical approach to the most important special case of the general variational problem, whose theory has been already outlined. One may regard much of this material as constituting the mathematical content of classical mechanics; some will be a repetition (for the sake of clarity) of material already given.

Let $D$ be a domain in $R^n$, the space of variables $x_i$, $1 \le i, j, \ldots \le n$. For $x \in D$, let $D_x$ be the tangent vector space to $D$ at $x$. Let
$$T(D) = \bigcup_{x \in D} D_x$$
be the tangent bundle to $D$, considered as a domain in $R^{2n}$ with coordinates $(x_i, \dot x_i)$, where the $\dot x_i$ are the functions on $T(D)$ defined as
$$\dot x_i(v) = v(x_i) \quad \text{for } v \in T(D).$$
Consider $R$, the real numbers, as parametrized by $t$. A Lagrangian on $D$ is a real-valued function $L$ on $T(D) \times R$. If $\sigma: [a, b] \to D$,
$$L(\sigma) = \int_a^b L(\sigma'(t), t)\,dt.$$
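The function $\sigma \to L(\sigma)$ defines a function on the space of curves in $D$, and the extremal curves of $L$, in general, are the curves that in some sense are critical "points" for this function. However, in this simplest case, the first variation formula gives a rationale for a more explicit definition of the extremals as solutions of the Euler equations.

Let us derive the Euler equations and the first variation formula in the standard way. Suppose $\sigma(t)$ is defined in coordinates by $x(t) = (x_i(t))$. Then
$$L(\sigma) = \int_a^b L\Big(x(t), \frac{dx}{dt}(t), t\Big)\,dt.$$
Suppose that $\sigma_s$, $0 \le s \le 1$, is a deformation of $\sigma$, that is, a one-parameter family of curves with $\sigma_0 = \sigma$. If, in coordinates, $\sigma_s = x(t, s) = (x_i(t, s))$, and if $v(t) \in D_{\sigma(t)}$ is, for each $t \in [a, b]$, the tangent vector to the curve $s \to \sigma_s(t)$ at $s = 0$, then
$$\dot x_i(v(t)) = \frac{\partial x_i}{\partial s}(t, 0).$$
The tangent vector field $t \to v(t)$ along $\sigma$ is called the infinitesimal deformation corresponding to the given deformation $s \to \sigma_s$ of $\sigma$. Put $v_i(t) = (\partial x_i/\partial s)(t, 0)$.† Then
$$\frac{d}{ds}L(\sigma_s) = \int_a^b \Big(L_i\,\frac{\partial x_i}{\partial s} + L_{n+i}\,\frac{\partial^2 x_i}{\partial s\,\partial t}\Big)\,dt,$$
and after the second term on the right-hand side is integrated by parts,
$$\frac{d}{ds}L(\sigma_s) = \int_a^b \Big(L_i - \frac{d}{dt}L_{n+i}\Big)\frac{\partial x_i}{\partial s}\,dt + L_{n+i}\,\frac{\partial x_i}{\partial s}\Big|_{t=a}^{t=b}.$$
Hence
$$\frac{d}{ds}L(\sigma_s)\Big|_{s=0} = \int_a^b \Big(L_i - \frac{d}{dt}L_{n+i}\Big)v_i\,dt + L_{n+i}v_i\Big|_{t=b} - L_{n+i}v_i\Big|_{t=a}. \tag{15.1}$$
The first variation formula leads us to consider curves that satisfy the following system of second-order differential equations, the Euler equations:
$$\frac{d}{dt}\,L_{n+i}\Big(x(t), \frac{dx}{dt}, t\Big) - L_i\Big(x(t), \frac{dx}{dt}, t\Big) = 0. \tag{15.2}$$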
We depart slightly from the general point of view and regard an extremal of the variational problem as a curve satisfying (15.2); thus (15.2) is to be regarded as a system of differential equations to be investigated for its own sake. Of course the relation of the two notions of "extremal curve" is obvious from (15.1). For example, if $\sigma_s(a) = \sigma_0(a)$, $\sigma_s(b) = \sigma_0(b)$, that is, if $\sigma_s$ is a deformation with fixed end points, then $v_i(a) = 0 = v_i(b)$; hence, when $\sigma_0$ satisfies (15.2), $(d/ds)L(\sigma_s)|_{s=0} = 0$. Conversely, if $(d/ds)L(\sigma_s)|_{s=0} = 0$ is true for every deformation with fixed end points, then (15.2) is satisfied (see any classical book on the calculus of variations). However, (15.1) contains information about deformations that may not have fixed end points: For example, if $\sigma$ satisfies (15.2) and if each of the "boundary terms" (that is, the last two terms on the right-hand side of (15.1)) vanishes, then the left-hand side vanishes. We shall see a little later on what the vanishing of these last two terms means geometrically.

† It is convenient to use a subscript notation for partial derivatives: $L_i = \partial L/\partial x_i$, $L_{n+i} = \partial L/\partial \dot x_i$, $L_t = \partial L/\partial t$.

After these classical comments, we proceed to a more intrinsic, "geometric" characterization of the solution of the Euler equations as the projection into $D$ of the characteristic curves of a certain closed 2-form on $T(D) \times R$. Define the Cartan 1-form $\theta(L)$ associated with $L$ as a 1-form on $T(D) \times R$ as
$$\theta(L) = L_{n+i}(dx_i - \dot x_i\,dt) + L\,dt.$$
Given a curve $\sigma(t)$, $a \le t \le b$, in $D$, we can consider its extended curve $\hat\sigma(t) = (\sigma'(t), t)$ in $T(D) \times R$: In other words, $\hat\sigma$ is the graph of the tangent vector curve of $\sigma$. $\theta(L)$, as a 1-differential form on $T(D) \times R$, defines a Lagrangian on $T(D) \times R$.

(a) $L(\sigma) = \theta(L)(\hat\sigma)$; that is, the value of the function defined by the Lagrangian $L$ on the curve $\sigma$ is equal to the value of the Lagrangian $\theta(L)$ defined on $T(D) \times R$ on the curve $\hat\sigma$. Explicitly:
$$\theta(L) = y_i\,dx_i - H\,dt, \quad \text{where } y_i = \frac{\partial L}{\partial \dot x_i},$$
and $H$ is the function $(\partial L/\partial \dot x_i)\dot x_i - L$ on $T(D) \times R$.

(b) The curve $\sigma$ is a solution of the Euler equations (15.2), that is, is an extremal of $L$, if and only if
$$\hat\sigma = \Big(x(t), \frac{dx}{dt}, t\Big)$$
is a characteristic curve of the 2-form $d\theta(L)$ on $T(D) \times R$. Thus, since $\theta(L) = y_i\,dx_i - H\,dt$, if the functions $(x_i, y_i, t)$ form a new coordinate system for $T(D) \times R$, that is, if
$$\det(L_{n+i,n+j}) \ne 0, \tag{15.5}$$
then $\sigma$ is an extremal of $L$ if and only if its extended curve, when written in the new coordinates as $(x_i(t), y_i(t), t)$, is a solution of the Hamilton equations with Hamiltonian $H$:
$$\frac{dx_i}{dt} = \frac{\partial H}{\partial y_i}, \qquad \frac{dy_i}{dt} = -\frac{\partial H}{\partial x_i}.$$
All except the remark about the Euler equations being equivalent to Hamilton's has been proved in Chapter 12. Note that
$$dy_i = L_{n+i,n+j}\,d\dot x_j + L_{n+i,j}\,dx_j + L_{n+i,t}\,dt.$$
Now the $(x_i, y_i, t)$ define a new coordinate system (at least locally) for $T(D) \times R$ if and only if the $d\dot x_j$ can be expressed in terms of the $dx_i$, $dy_i$, and $dt$. This is so if and only if (15.5) is verified. Given this condition, in the new coordinate system $d\theta(L)$ is in Hamiltonian form $dy_i \wedge dx_i - dH \wedge dt$, with Hamiltonian function $H = L_{n+i}\dot x_i - L$ when expressed in the new coordinates.† We can now refer to our work in Chapter 13 to complete the description of the extremals as solutions of the Hamilton equations.

A Lagrangian $L$ satisfying (15.5) is said to define a regular, nonparametric (or nonhomogeneous) variational problem. In such problems, the extremal curves come with their "own" parametrization, in the sense that the parametrization cannot be freely changed without destroying the extremal curve property. Thus this sort of variational problem is the appropriate one for defining the time evolution of physical systems in Newtonian physics, where "time" is "given" absolutely for the whole system, from the "outside," as it were.

A brief indication of at least the formal relation with Newton's laws of motion might be in order. The type of Lagrangian satisfying (15.5) for which the Euler equations are as simple as possible is one of the following form (assuming no explicit time dependence in $L$):
$$L(x, \dot x) = \tfrac{1}{2} m\,\dot x_i \dot x_i - V(x), \tag{15.6}$$
where $m$ is a constant, $V(x)$ is a function on $D$, and $n = 1$, 2, or 3. Then
$$\frac{d}{dt}\Big(L_{n+i}\Big(x(t), \frac{dx}{dt}\Big)\Big) - L_i = m\,\frac{d^2 x_i}{dt^2} + \frac{\partial V}{\partial x_i}.$$
† In other words, $H$ may be regarded as a function $H(x_i, y_i, t)$ of the indicated $2n + 1$ variables such that, identically in $(x, \dot x, t)$,
$$H(x_i, L_{n+i}(x, \dot x, t), t) = L_{n+j}(x, \dot x, t)\dot x_j - L(x, \dot x, t).$$

Recall Newton's law for motion of a particle: Force "vector" = mass $\times$ "acceleration vector." This will be formally identical with the Euler equations if we identify $m$ with mass, $(d^2 x_i/dt^2)$ with the acceleration vector, and $(-\partial V/\partial x_i)$ with the "force" vector. Thus $V(x)$ is to be regarded as the potential function giving rise to the "force" vector. Newton's laws are implicitly based on the Euclidean nature of the underlying space, in particular on the "unnatural" identification between vector fields and differential forms. It seems more natural, then, to regard the "force" vector field as a differential form, namely as $-dV$.

Continuing, we see that the $\dot x_i$ are to be regarded as the coordinates of the velocity of the particle. $L$ then has the form: (kinetic energy $-$ potential energy). Then $y_i = L_{n+i} = m\dot x_i$ are the coordinates of the linear momentum of the particle:
$$\dot x_i = \frac{y_i}{m},$$
$$H = L_{n+i}\dot x_i - L = m\,\dot x_i \dot x_i - \tfrac{1}{2} m\,\dot x_i \dot x_i + V(x) = \tfrac{1}{2} m\,\dot x_i \dot x_i + V(x) = (\text{potential} + \text{kinetic})\ \text{energy} = \text{total energy}. \tag{15.7}$$
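The identification with Newton's law can be checked numerically in the simplest case. The following is only a sketch under assumptions of our own: $n = 1$ and the quadratic potential $V(x) = \tfrac{1}{2}kx^2$ with hypothetical constants $m$ and $k$, neither of which comes from the text. It evaluates the Euler expression $(d/dt)L_{n+1} - L_1$ along the known solution $x(t) = \cos \omega t$ of $m\ddot x = -kx$, and confirms that $H$ of (15.7) is constant along it.

```python
import math

# Assumed data for illustration only: one degree of freedom, V(x) = k*x**2/2.
m, k = 2.0, 8.0
omega = math.sqrt(k / m)

def x(t):
    """Exact solution of m*x'' = -k*x with x(0) = 1, x'(0) = 0."""
    return math.cos(omega * t)

def xdot(t):
    return -omega * math.sin(omega * t)

def lagrangian(pos, vel):
    """L = kinetic energy - potential energy, as in (15.6)."""
    return 0.5 * m * vel * vel - 0.5 * k * pos * pos

def hamiltonian(pos, mom):
    """H = kinetic + potential energy in the variables (x, y), as in (15.7)."""
    return mom * mom / (2.0 * m) + 0.5 * k * pos * pos

def euler_residual(t, h=1e-4):
    """(d/dt) L_{n+1} - L_1 along the curve; d/dt by central differences."""
    momentum = lambda s: m * xdot(s)            # L_{n+1} = m * xdot
    d_momentum = (momentum(t + h) - momentum(t - h)) / (2.0 * h)
    L_1 = -k * x(t)                             # L_1 = -dV/dx
    return d_momentum - L_1

times = [0.1 * j for j in range(20)]
max_residual = max(abs(euler_residual(t)) for t in times)
energies = [hamiltonian(x(t), m * xdot(t)) for t in times]
energy_spread = max(energies) - min(energies)

# H agrees with y*xdot - L when y = m*xdot.
legendre_gap = max(
    abs(hamiltonian(x(t), m * xdot(t))
        - (m * xdot(t) * xdot(t) - lagrangian(x(t), xdot(t))))
    for t in times
)
```

The residual is zero up to the finite-difference error, and the energy values all equal $k/2$ for this initial condition.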
Thus the formal study of regular, nonparametric variational problems may be regarded as a generalization of the material that goes under the name of "classical mechanics." This has the great advantage of replacing the Newtonian theory, which is really "covariant" only under the group of orthogonal transformations, by an apparatus that is covariant under the much bigger group of all transformations of the underlying space $D$.

We now want to make precise this "covariance" of the Euler equations under arbitrary transformations of the domain $D$. We shall be able to do even more, namely, to develop covariance with respect to arbitrary mappings of one domain $D \subset R^n$ into another $D' \subset R^m$. Suppose, then, that $D$ is a domain in $R^n$ with coordinates $(x_i)$, $1 \le i, j, \ldots \le n$, that $D'$ is a domain in $R^m$ with coordinates $(z_a)$, $1 \le a, b, \ldots \le m$, and that $\phi$ is a map: $D \to D'$. Suppose $\phi_a(x) = z_a$ are the functions defining the mapping; that is,
$$\phi^*(z_a) = \phi_a.$$
There is a mapping $\phi_*: T(D) \to T(D')$ of the tangent bundles assigning $\phi_*(v) \in D'_{\phi(x)}$ to each $v \in D_x$. If $(x_i, \dot x_i)$ and $(z_a, \dot z_a)$ are the standard coordinates on $T(D)$ and $T(D')$, respectively, then recall that
$$(\phi_*)^*(z_a) = \phi_a, \qquad (\phi_*)^*(\dot z_a) = \frac{\partial \phi_a}{\partial x_i}\,\dot x_i. \tag{15.8}$$
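THEOREM 15.1

Let $D$ and $D'$ be domains in, respectively, $R^n$ and $R^m$, $\phi: D \to D'$ a map between them, $\phi_*: T(D) \to T(D')$ the prolonged map to tangent vectors. Consider $\phi_*$ as a map also of $T(D) \times R \to T(D') \times R$ by mapping
$$\phi_*(v, t) = (\phi_*(v), t) \quad \text{for } v \in T(D),\ t \in R;$$
that is, $(\phi_*)^*(t) = t$. Let $L$ and $L'$ be, respectively, functions on $T(D) \times R$ and $T(D') \times R$ defining Lagrangians on $D$ and $D'$ such that $(\phi_*)^*(L') = L$. Then:

(a) If $\sigma(t)$, $a \le t \le b$, is a curve in $D$, and if $\sigma_1(t) = \phi(\sigma(t))$ is the transformed curve in $D'$ under $\phi$, then $L(\sigma) = L'(\sigma_1)$.

(b) $(\phi_*)^*(\theta(L')) = \theta(L)$.

Proof. (a) Follows more or less from the definitions. The basic geometric property of $\phi_*$ is
$$\sigma_1'(t) = \phi_*(\sigma'(t));$$
hence,
$$L'(\sigma_1) = \int_a^b L'(\sigma_1'(t))\,dt = \int_a^b L'(\phi_*(\sigma'(t)))\,dt = \int_a^b (\phi_*)^*(L')(\sigma'(t))\,dt = \int_a^b L(\sigma'(t))\,dt = L(\sigma).$$

(b) This requires a little more computation:
$$dL = L_{n+i}\,d\dot x_i + L_i\,dx_i + L_t\,dt = (\phi_*)^*(dL') = (\phi_*)^*(L'_{m+a}\,d\dot z_a + L'_a\,dz_a + L'_t\,dt)$$
$$= (\phi_*)^*(L'_{m+a})\,d\Big(\frac{\partial \phi_a}{\partial x_j}\,\dot x_j\Big) + (\phi_*)^*(L'_a)\,d\phi_a + (\phi_*)^*(L'_t)\,dt$$
$$= (\phi_*)^*(L'_{m+a})\,\frac{\partial \phi_a}{\partial x_j}\,d\dot x_j + \cdots$$
(the unwritten terms involve $dx_i$ and $dt$). Hence,
$$L_{n+j} = (\phi_*)^*(L'_{m+a})\,\frac{\partial \phi_a}{\partial x_j}.$$
Also,
$$(\phi_*)^*(dz_a - \dot z_a\,dt) = d\phi_a - \frac{\partial \phi_a}{\partial x_j}\,\dot x_j\,dt = \frac{\partial \phi_a}{\partial x_j}\,dx_j - \frac{\partial \phi_a}{\partial x_j}\,\dot x_j\,dt = \frac{\partial \phi_a}{\partial x_j}\,(dx_j - \dot x_j\,dt).$$
Therefore
$$(\phi_*)^*(\theta(L')) = (\phi_*)^*\big(L'_{m+a}(dz_a - \dot z_a\,dt) + L'\,dt\big) = (\phi_*)^*(L'_{m+a})\,\frac{\partial \phi_a}{\partial x_j}\,(dx_j - \dot x_j\,dt) + L\,dt$$
$$= L_{n+j}(dx_j - \dot x_j\,dt) + L\,dt = \theta(L).$$
Q.E.D.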
COROLLARY 1

If $\phi_*(T(D)) = T(D')$, then $\phi$ carries an extremal of $L$ into an extremal of $L'$.

Proof. The proof consists in putting together three general remarks.

(a) Suppose $E$ and $E'$ are domains and $\varphi: E \to E'$ is a map such that $\varphi_*(T(E)) = T(E')$; that is, $\varphi$ is a maximal rank mapping. If $\omega'$ is a closed 2-form on $E'$, and if $v \in T(E)$ is a characteristic vector of $\varphi^*(\omega')$, then $\varphi_*(v)$ is a characteristic vector of $\omega'$.

For the proof, suppose that $v \in E_y$, $y \in E$. Let $u' \in E'_{\varphi(y)}$. We must show that $\omega'(\varphi_*(v), u') = 0$. But by hypothesis there exists a vector $u \in E_y$ such that $\varphi_*(u) = u'$. Then
$$\omega'(\varphi_*(v), u') = \omega'(\varphi_*(v), \varphi_*(u)) = \varphi^*(\omega')(v, u) = 0.$$

(b) If $\phi_*(T(D)) = T(D')$, then $(\phi_*)_*: T(T(D)) \to T(T(D'))$ is onto; that is, $\phi_*$ is a maximal rank mapping of $T(D)$ onto $T(D')$.

We shall use the standard coordinates $(x_i, \dot x_i)$, $(z_a, \dot z_a)$ for, respectively, $T(D)$ and $T(D')$, $1 \le i, j, \ldots \le n$; $1 \le a, b, \ldots \le m$. To show that a mapping of vector spaces is onto is equivalent to showing that the dual mapping on covectors is 1-1. Thus, suppose that there is a covector on $T(D')$ mapped into zero by $(\phi_*)^*$; say,
$$0 = (\phi_*)^*(\lambda_a\,dz_a + \lambda_{m+a}\,d\dot z_a) = \lambda_a\,d\phi_a + \lambda_{m+a}\,d\Big(\frac{\partial \phi_a}{\partial x_i}\,\dot x_i\Big)$$
$$= \lambda_a\,\frac{\partial \phi_a}{\partial x_i}\,dx_i + \lambda_{m+a}\,\frac{\partial^2 \phi_a}{\partial x_j\,\partial x_i}\,\dot x_i\,dx_j + \lambda_{m+a}\,\frac{\partial \phi_a}{\partial x_i}\,d\dot x_i.$$
Now that $\phi$ is maximal rank means that the rank of the matrix $(\partial \phi_a/\partial x_i)$ is everywhere $m$. Then
$$\lambda_{m+a}\,\frac{\partial \phi_a}{\partial x_i}\,d\dot x_i = 0; \quad \text{hence,} \quad \lambda_{m+a}\,\frac{\partial \phi_a}{\partial x_i} = 0, \quad \text{hence,} \quad \lambda_{m+a} = 0;$$
$$\text{hence,} \quad \lambda_a\,\frac{\partial \phi_a}{\partial x_i}\,dx_i = 0, \quad \text{hence,} \quad \lambda_a\,\frac{\partial \phi_a}{\partial x_i} = 0, \quad \text{hence,} \quad \lambda_a = 0.$$

(c) If $\sigma(t)$ is a curve in $D$, and if $\sigma_1(t) = \phi(\sigma(t))$ is the transformed curve under $\phi$, then the image of the curve $t \to (\sigma'(t), t)$ under the map $\phi_*$ is the curve $t \to (\sigma_1'(t), t)$.

This follows from the very definition of the map
$$\phi_*: T(D) \times R \to T(D') \times R.$$
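COROLLARY 2

Let $X = A_i(\partial/\partial x_i)$ be a vector field on $D$. Define the prolonged vector field $\hat X$ on $T(D) \times R$ as
$$\hat X = A_i\,\frac{\partial}{\partial x_i} + \frac{\partial A_i}{\partial x_j}\,\dot x_j\,\frac{\partial}{\partial \dot x_i} + \frac{\partial}{\partial t}.$$
Then
$$\hat X(\theta(L)) = \theta(\hat X(L)). \tag{15.9}$$

Proof. This could be verified by a similar direct computation, but it is more constructive to reduce it to Theorem 15.1 by a geometric argument. Suppose, for simplicity, that $X$ generates a one-parameter group of transformations of $D$, $s \to T_s$; that is, for $x^0 \in D$, $s \to T_s(x^0)$ is an integral curve of $X$. In particular,
$$\frac{\partial}{\partial s}\,T_s^*(x_i) = X(T_s^*(x_i)) = A_j\,\frac{\partial}{\partial x_j}\,(T_s^*(x_i)).$$
Now, from the geometric meaning, $s \to T_{s*}$ is also a one-parameter transformation group on $T(D)$. Extend $T_{s*}$ to $T(D) \times R$ by
$$(T_{s*})^*(t) = t + s.$$
Finally, then, $s \to T_{s*}$ is a one-parameter group of transformations of $T(D) \times R$. We want to show that $\hat X$ is the infinitesimal generator of this group:
$$\frac{\partial}{\partial s}\,(T_{s*})^*(x_i)\Big|_{s=0} = A_i, \qquad \frac{\partial}{\partial s}\,(T_{s*})^*(\dot x_i)\Big|_{s=0} = \frac{\partial A_i}{\partial x_j}\,\dot x_j;$$
hence,
$$\frac{\partial}{\partial s}\,(T_{s*})^*(t)\Big|_{s=0} = 1.$$
This verifies the indicated form of $\hat X$. Finally, to prove (15.9),
$$\hat X(\theta(L)) = \frac{\partial}{\partial s}\,(T_{s*})^*(\theta(L))\Big|_{s=0} = \theta\Big(\frac{\partial}{\partial s}\,(T_{s*})^*(L)\Big|_{s=0}\Big) = \theta(\hat X(L)).$$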
As useful as these nonparametric Lagrangians are for understanding classical mechanics from a "higher" point of view, for theoretical purposes it is more convenient (as we saw in Chapter 14) to have Lagrangians whose extremal curves can be freely reparametrized. In physics this corresponds to giving up the Newtonian picture of "time" as an "independent" variable with a different status from the "dependent" space and velocity coordinates. We first show that the extremals of a Lagrangian $L$ on $D$ are independent of the parametrization if

(a) $L$ is a function on $T(D)$ alone; that is, $L$ is time-independent;

(b) $L(\lambda v) = \lambda L(v)$ for $\lambda > 0$.

Proof. Let $\sigma(t)$, $a \le t \le b$, and $\sigma_1(\tau)$, $\alpha \le \tau \le \beta$, be curves differing only by change in parametrization. By definition, this means that there is a real-valued function $t(\tau)$, $\alpha \le \tau \le \beta$, such that $t'(\tau) > 0$, $t(\alpha) = a$, $t(\beta) = b$, and $\sigma_1(\tau) = \sigma(t(\tau))$ for $\alpha \le \tau \le \beta$. Then
$$L(\sigma_1) = \int_\alpha^\beta L(\sigma_1'(\tau))\,d\tau = \int_\alpha^\beta L(\sigma'(t(\tau))\,t'(\tau))\,d\tau = \int_\alpha^\beta L(\sigma'(t(\tau)))\,t'(\tau)\,d\tau = \int_a^b L(\sigma'(t))\,dt = L(\sigma).$$
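The invariance just established can be observed numerically. The sketch below uses choices of our own that do not appear in the text: the homogeneous Lagrangian $L(v) = (v_1^2 + 4v_2^2)^{1/2}$ on the plane, the curve $\sigma(t) = (t, t^2)$ on $[0, 1]$, and the hypothetical reparametrization $t(\tau) = \tfrac{1}{2}(\tau + \tau^2)$, which satisfies $t'(\tau) > 0$, $t(0) = 0$, $t(1) = 1$.

```python
import math

def L(v):
    """Assumed Lagrangian, positively homogeneous of degree 1 in v."""
    return math.sqrt(v[0] ** 2 + 4.0 * v[1] ** 2)

def sigma_prime(t):
    """Tangent vector of sigma(t) = (t, t**2)."""
    return (1.0, 2.0 * t)

def t_of_tau(tau):
    """Assumed change of parameter; increasing, maps [0, 1] onto [0, 1]."""
    return 0.5 * (tau + tau * tau)

def t_prime(tau):
    return 0.5 + tau

def sigma1_prime(tau):
    """Chain rule: sigma_1'(tau) = sigma'(t(tau)) * t'(tau)."""
    s = sigma_prime(t_of_tau(tau))
    return (s[0] * t_prime(tau), s[1] * t_prime(tau))

def midpoint(f, lo, hi, n=2000):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (j + 0.5) * h) for j in range(n))

value_sigma = midpoint(lambda t: L(sigma_prime(t)), 0.0, 1.0)
value_sigma1 = midpoint(lambda tau: L(sigma1_prime(tau)), 0.0, 1.0)

# Closed form of the integral of sqrt(1 + 16 t^2) over [0, 1], for comparison.
exact = 0.5 * math.sqrt(17.0) + math.asinh(4.0) / 8.0
```

The two quadratures agree with each other and with the closed form up to the discretization error; replacing $L$ by a function that is not homogeneous of degree 1 would break the agreement.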
On the other hand, we can start with an arbitrary time-dependent Lagrangian $L(x_1, \ldots, x_n, \dot x_1, \ldots, \dot x_n, t)$; by introducing another pair of dependent variables $x_{n+1} = t$, $\dot x_{n+1} = \dot t$, and a Lagrangian $L_1$ by the formula
$$L_1(x_1, \ldots, x_{n+1}, \dot x_1, \ldots, \dot x_{n+1}) = L\Big(x_1, \ldots, x_n, \frac{\dot x_1}{\dot x_{n+1}}, \ldots, \frac{\dot x_n}{\dot x_{n+1}}, x_{n+1}\Big)\,\dot x_{n+1},$$
we convert the time-dependent, parametrization-dependent variational problem in $n$ variables to a time-independent, parametrization-independent problem in $(n + 1)$ variables. To verify that this formula actually does this, notice that $L_1$ so defined is homogeneous, and that $L_1(\sigma_1) = L(\sigma)$, where $\sigma = (x_i(t))$, and $\sigma_1 = (x_i(t), t)$ is the "graph" of $\sigma$ in $(n + 1)$-space.

We shall work from now on with such a time-independent, homogeneous Lagrangian $L(x_i, \dot x_i)$. Then $L$ satisfies the Euler homogeneous function relations, which will play an important role. To derive them, start with
$$L(x, \lambda \dot x) = \lambda L(x, \dot x). \tag{15.10a}$$
Differentiate with respect to $\lambda$:
$$L_{n+i}(x, \lambda \dot x)\dot x_i = L(x, \dot x). \tag{15.10b}$$
Differentiate again with respect to $\lambda$ and set $\lambda = 1$:
$$L_{n+i,n+j}(x, \dot x)\dot x_i \dot x_j = 0. \tag{15.10c}$$
Differentiate (15.10a) with respect to $\dot x_i$: $L_{n+i}(x, \lambda \dot x)\lambda = \lambda L_{n+i}(x, \dot x)$, or
$$L_{n+i}(x, \lambda \dot x) = L_{n+i}(x, \dot x). \tag{15.10d}$$
Applying $\partial/\partial \lambda$ to (15.10d) and setting $\lambda = 1$, we have
$$L_{n+i,n+j}(x, \dot x)\dot x_j = 0. \tag{15.10e}$$
Then
$$\theta(L) = L_{n+i}\,dx_i - (L_{n+i}\dot x_i - L)\,dt = L_{n+i}\,dx_i - 0,$$
by (15.10b). Thus $\theta(L)$ is a form on $T(D)$ alone, and we can in effect ignore the additional explicit "time" variable and consider extremals as characteristic curves of the 2-form $d\theta(L)$ on $T(D)$. In addition, note that by (15.10e), $\det(L_{n+i,n+j}) = 0$; hence the functions $y_i = L_{n+i}$ are not functionally independent. However, if $(n - 1)$ of them are functionally independent, that is, if
$$\operatorname{rank}(L_{n+i,n+j}(v)) = n - 1 \quad \text{for all } v \in T(D), \tag{15.11a}$$
then we say that $L$ defines a regular homogeneous variational problem.
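The Euler relations and the rank condition (15.11a) can be checked on a concrete example. The following is a minimal sketch, assuming (our choice, not the text's) that $L$ is the Euclidean length in $n = 3$ variables, for which $L_{n+i}$ and $L_{n+i,n+j}$ are known in closed form; the particular vectors are arbitrary.

```python
import math

def L(v):
    """Euclidean length: positively homogeneous of degree 1 in v."""
    return math.sqrt(sum(c * c for c in v))

def grad(v):
    """L_{n+i}(v) = v_i / |v|."""
    r = L(v)
    return [c / r for c in v]

def hessian(v):
    """L_{n+i,n+j}(v) = (delta_ij - v_i v_j / |v|^2) / |v|."""
    r = L(v)
    n = len(v)
    return [[((1.0 if i == j else 0.0) - v[i] * v[j] / r ** 2) / r
             for j in range(n)] for i in range(n)]

def apply(matrix, vec):
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in matrix]

v = [1.0, 2.0, -2.0]                      # |v| = 3
H = hessian(v)

# (15.10b): L_{n+i} xdot_i = L.
euler_b = sum(g * c for g, c in zip(grad(v), v)) - L(v)

# (15.10d): L_{n+i} is homogeneous of degree 0.
euler_d = [a - b for a, b in zip(grad([2.0 * c for c in v]), grad(v))]

# (15.10e): the Hessian annihilates xdot itself.
euler_e = apply(H, v)

# Rank n - 1: on two independent vectors orthogonal to v the Hessian
# acts as multiplication by 1/|v|, so its null space is spanned by v alone.
w1, w2 = [2.0, -1.0, 0.0], [2.0, 0.0, 1.0]
Hw1, Hw2 = apply(H, w1), apply(H, w2)
```

Here the null space of the Hessian is exactly the line through $\dot x$, which is the content of the regularity condition for this particular $L$.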
Another way of putting this condition is as follows: Whenever $(b_j)$ are numbers such that
$$L_{n+i,n+j}(x, \dot x)\,b_j = 0, \tag{15.11b}$$
then we must have $b_j = a\dot x_j$ for some real number $a$. For (15.11b) just expresses the fact that the nullity of the matrix $(L_{n+i,n+j})$ is 1, equivalent to the fact that the rank is $n - 1$.

Suppose from now on that $L$ satisfies this regularity condition. The general theory of homogeneous variational problems expounded in Chapter 14 is applicable, with no constraints; that is, $K = T(B)$. Modifying slightly the general theory (since we may want to also consider extremals that do not minimize $L$), we say that an extremal field for the variational problem is defined by a vector field $X: D \to T(D)$ such that
$$X^*(d\theta(L)) = 0, \qquad X^*(L) > 0 \ \text{at all points of } D. \tag{15.12}$$
(We are using the characterization of vector fields as mappings $D \to T(D)$ such that $X(x) \in D_x$ for all $x \in D$, that is, as cross-section maps.) Equation (15.10b) implies that:

If $X \in V(D)$ defines an extremal field, so does $fX$, for each positive function $f$ on $D$. (15.13)

Suppose that $X = A_i(\partial/\partial x_i)$, and that $W$ is a function on $D$ such that $X^*(\theta(L)) = dW$; that is,
$$dW = L_{n+i}(x, A(x))\,dx_i. \tag{15.14}$$
Then
$$X(W)(x) = L_{n+i}(x, A(x))A_i(x) = L(x, A(x)) = X^*(L)(x) \quad \text{for } x \in D.$$
Hence we can normalize $X$ by multiplying by a positive function so that
$$X(W) = 1 = L(X(x)) \quad \text{for all } x \in D.$$
We shall call such a function (defined up to an additive constant) the characteristic function associated with the extremal field.

THEOREM 15.2

Suppose that $X$ is a vector field on $D$ that defines an extremal vector field for the variational problem defined by a homogeneous regular Lagrangian $L$ on $D$; that is, $X$ satisfies (15.12). Suppose that $W$ is a function on $D$ such that $X^*(\theta(L)) = dW$. Then:

(a) The integral curves of $X$ are extremals of $L$.

(b) Let $\sigma$ and $\sigma_1$ be curves beginning and ending on the same level surfaces of $W$ whose tangent vector curves are sufficiently close together. Suppose further that $\sigma$ is an integral curve of $X$ and that the following condition is satisfied:

The symmetric matrix $(L_{n+i,n+j}(\sigma'(t)))$ is positive semidefinite. (15.15)

Then $L(\sigma) \le L(\sigma_1)$. Equality holds only if $\sigma_1$ is also an integral curve of $X$.

Proof. Part (a) is a consequence of Theorem 15.1. To prove Part (b) we will show that, for each $x \in D$, $L(v) > L(X(x))$ for each $v \in D_x$ that is sufficiently close to $X(x)$ and that satisfies $v(W) = 1$, $v \ne X(x)$. If, say,
$$v = v_i\,\frac{\partial}{\partial x_i},$$
then
$$\frac{d}{dt}\,L(x, A(x) + t(v - A(x)))\Big|_{t=0} = L_{n+i}(x, A(x))\,(v_i - A_i(x)),$$
$$\frac{d^2}{dt^2}\,L(x, A(x) + t(v - A(x)))\Big|_{t=0} = L_{n+i,n+j}(x, A(x))\,(v_i - A_i(x))(v_j - A_j(x)).$$
We know that $(d/dt)L(x, A(x) + t(v - A(x)))|_{t=0} = 0$, from the very definition of extremal vector field. Now, $(d^2/dt^2)L(x, A(x) + t(v - A(x)))|_{t=0} \ge 0$. We want to prove that it is $> 0$. Suppose otherwise; that is, it $= 0$. Then, by (15.11b), $v_i - A_i(x) = \rho A_i(x)$. The condition
$$1 = \frac{\partial W}{\partial x_i}(x)\,v_i = \frac{\partial W}{\partial x_i}(x)\,A_i(x)$$
forces $\rho = 0$; contradiction. Q.E.D.
We see that extremal vector fields are very useful if they can be found. Now we indicate how the general method given in Chapter 13 for finding integral manifolds can be adapted to finding integral manifolds of the special 2-form $d\theta(L)$. Let $\sigma(t)$, $a \le t \le b$, be a curve in $D$, and let us look for necessary conditions that $\sigma$ be the integral curve of an extremal vector field $X$. The first condition that comes to mind is that $\sigma$ is to be an extremal itself, that is, a solution of the Euler equations. Let $W$ be the function on $D$ such that $X^*(\theta(L)) = dW$. After adding a constant to $W$ if necessary, we can suppose that $W(\sigma(a)) = 0$. Suppose that $\phi: D' \to D$ is a submanifold of $D$ such that $W$ is equal to zero on $\phi(D')$. Then, for $x \in D$,

$$dW = L_{n+i}(X(x))\,dx_i.$$

Hence, for $x \in \phi(D')$ and for a tangent vector $v \in D_x$ that is tangent to $\phi(D')$ at $x$, we have

$$v(W) = L_{n+i}(X(x))\,v(x_i) = 0.$$
Forgetting for the moment how we arrived at this relation, let us make a general definition.

Definition. Let $L: T(D) \to R$ define a homogeneous, regular Lagrangian on $D$. Suppose that $u$ and $v$ are tangent vectors at a point $x \in D$; $u$ is said to be perpendicular to $v$ (with respect to the Lagrangian $L$) if

$$L_{n+i}(u)\,\dot x_i(v) = 0. \tag{15.16}$$

If $V$ is a subspace of $D_x$, we say that $u$ is perpendicular to $V$ (with respect to $L$) if $u$ is perpendicular to all vectors $v \in V$. If $\phi: D' \to D$ is a submanifold of $D$ such that $x \in \phi(D')$, we say that $u$ is perpendicular to the submanifold if $u$ is perpendicular to the tangent space of the submanifold at $x$. If $\chi$ is a vector field on the submanifold $\phi$, that is, $\chi$ is a map assigning a vector $\chi(x) \in D_x$ to each $x \in \phi(D')$, we say that $\chi$ is a perpendicular vector field to $\phi$ if $\chi(x)$ is perpendicular to the tangent space to $\phi(D')$ at each point $x \in \phi(D')$.

This relation is a generalization of the ordinary relation of perpendicularity for vectors in Euclidean spaces. However, notice that this relation is not necessarily symmetric, as it is for Euclidean geometry; that is, if $u$ is perpendicular to $v$, $v$ is not necessarily perpendicular to $u$. It is also easily seen that this definition is independent of the coordinate system.

Returning now to the extremal vector field $X$, its associated characteristic function $W$, and its integral curve $\sigma(t)$ with $W(\sigma(0)) = 0$, we see that $\sigma'(0) = X(\sigma(0))$ is perpendicular to any submanifold $\phi: D' \to D$ on which $W$ is zero, and the vector field on the submanifold obtained by restricting $X$ is also perpendicular to the submanifold. Note also that the map $D' \to T(D)$, which assigns $X(\phi(x'))$ to each $x' \in D'$, is an integral submanifold of the 1-form $\theta(L)$. Now we are prepared to apply the general theory of Chapter 13 concerning integral submanifolds of 1- and 2-forms to reverse this reasoning and give a condition that an isolated extremal curve of $L$ can be embedded as an integral curve of an extremal vector field.
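The asymmetry of the perpendicularity relation (15.16) can be seen in a small numerical sketch. The Lagrangian below, $L(v) = |v| + b \cdot v$ on the plane with a constant covector $b$, is an illustrative choice and does not come from the text; it is homogeneous of degree 1 and its fiber Hessian has rank $n - 1 = 1$. The names `B`, `fiber_gradient`, and `perp_pairing` are hypothetical helpers.

```python
import math

# Hypothetical constant covector b with |b| < 1 (an assumption for illustration).
B = (0.5, 0.0)

def lagrangian(v):
    """L(v) = |v| + b.v, homogeneous of degree 1 in v."""
    return math.hypot(v[0], v[1]) + B[0] * v[0] + B[1] * v[1]

def fiber_gradient(u):
    """The partial derivatives L_{n+i}(u) = u_i/|u| + b_i."""
    norm = math.hypot(u[0], u[1])
    return (u[0] / norm + B[0], u[1] / norm + B[1])

def perp_pairing(u, v):
    """Left side of (15.16): L_{n+i}(u) * v_i; zero means u is perpendicular to v."""
    g = fiber_gradient(u)
    return g[0] * v[0] + g[1] * v[1]

u = (1.0, 0.0)
v = (0.0, 1.0)
print(perp_pairing(u, v))  # u is perpendicular to v
print(perp_pairing(v, u))  # but v is not perpendicular to u
print(perp_pairing(u, u), lagrangian(u))  # Euler relation: L_{n+i}(u) u_i = L(u)
```

When $b = 0$ the Lagrangian is the Euclidean length and the pairing becomes symmetric up to normalization, which recovers ordinary perpendicularity.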
THEOREM 15.3 Let $D$ be a domain of $R^n$, with a homogeneous, regular Lagrangian $L$. Let $\phi: D' \to D$ be a submanifold of $D$ of dimension $n - 1$, and let $\sigma: [0, 1] \to D$ be an extremal curve such that $\sigma(0)$ lies on $\phi(D')$, such that its tangent vector $\sigma'(0)$ is perpendicular there to $\phi(D')$, and such that $L(\sigma'(0)) = 1$. Then, if $D$ is sufficiently small, there is a unique extremal vector field $X$ such that:
(a) $\sigma$ is an integral curve of $X$.
(b) If $W$ is the function on $D$ such that $X^*(\theta(L)) = dW$, then $W = 0$ on $\phi(D')$.
Thus, an isolated extremal curve of $L$ can be embedded (locally) in an extremal field in many ways: every choice of a hypersurface to which it is initially perpendicular defines such a field. We shall arrange our proof of Theorem 15.3 in a series of lemmas so that the reader can see at least the beginning steps of attempting to embed $\sigma$ in an extremal field if its initial perpendicular submanifold is of lower dimension than $n - 1$. We shall not complete this project here, partly because we do not know the answer (except for Riemannian manifolds, where it can be done).

LEMMA 15.4 Let $D$ be a domain with a homogeneous regular Lagrangian $L$. Let $\phi: D' \to D$ be a submanifold of dimension $m$ of $D$. Let $P(\phi)$ be the set of tangent vectors $v \in T(D)$ to points $x \in \phi(D')$ such that $L(v) = 1$ and $v$ is perpendicular to $\phi(D')$. Then $P(\phi)$ is a submanifold of $T(D)$ of dimension $(n - 1)$, and is an integral submanifold of the 1-form $\theta(L)$. A vector $v \in P(\phi)$ is not tangent to $\phi(D')$.
Proof. It suffices to prove the lemma in the case that $D$ is as small as we please. Let $X_a = A^a_i\,(\partial/\partial x_i)$, $1 \le a, b, \ldots \le m$, be everywhere linearly independent vector fields in $D$ such that the $X_a$ are tangent to $\phi(D')$ and such that their values at each point of $\phi(D')$ define a basis for its tangent space. A vector $v = (x, \dot x)$ is perpendicular to $\phi(D')$ at $x$ if

$$L_{n+i}(x, \dot x)\,A^a_i(x) = 0.$$

We add the equation $L(x, \dot x) = 1$, and must show that these equations are of maximal rank. Thus we must show that the rank of the $n \times (m + 1)$ matrix

$$\begin{pmatrix} L_{n+i,n+j}(x, \dot x)\,A^a_i(x) & L_{n+j}(x, \dot x) \end{pmatrix}$$

is equal to $m + 1$. Suppose, then, that there is a relation of the form

$$\lambda L_{n+j}(x, \dot x^0) + \lambda_a L_{n+i,n+j}(x, \dot x^0)\,A^a_i(x) = 0.$$

Multiplying by $\dot x_j$ and using the Euler relations (15.10b) and (15.10c), we have $\lambda = 0$. Since now $L_{n+i,n+j}(x, \dot x)(\lambda_a A^a_i(x)) = 0$, and since $L$ is regular, we must have relations of the form $\lambda_a A^a_i(x) = \mu \dot x_i$, whence

$$0 = L_{n+i}(x, \dot x)\,\lambda_a A^a_i(x) = \mu L_{n+i}(x, \dot x)\,\dot x_i = \mu.$$

Finally, then, $\lambda_a X_a(x) = 0$, whence $\lambda_a = 0$.

Now suppose $v \in P(\phi) \cap D_x$ is tangent to $\phi(D')$. We must then have relations of the form

$$v = y_a X_a(x), \qquad L_{n+i}(x, \dot x(v))\,A^a_i(x) = 0;$$

hence,

$$0 = L_{n+i}(x, \dot x(v))\,A^a_i(x)\,y_a = L_{n+i}(x, \dot x(v))\,\dot x_i(v) = L(x, \dot x(v)) = 1,$$

contradiction.
LEMMA 15.5 Let $L$ be a regular homogeneous Lagrangian on a domain $D$ and let $v \in D_{x_0}$, for $x_0 \in D$, satisfy $L(v) = 1$. Then, if $a$ is sufficiently small, there is a unique extremal curve $\sigma(t)$, $0 \le t \le a$, satisfying $\sigma(0) = x_0$, $\sigma'(0) = v$, $L(\sigma'(t)) = 1$.
Proof. If $x(\sigma(t)) = x(t)$, it must satisfy

$$\frac{d}{dt} L_{n+i}\Big(x(t), \frac{dx}{dt}\Big) = L_{n+i,j}\,\frac{dx_j}{dt} + L_{n+i,n+j}\,\frac{d^2x_j}{dt^2} = L_i,$$

$$\frac{d}{dt} L\Big(x(t), \frac{dx}{dt}\Big) = L_j\,\frac{dx_j}{dt} + L_{n+j}\,\frac{d^2x_j}{dt^2} = 0.$$

We must show that these equations are equivalent to a system of second-order differential equations in normal form, that is, solved for the second derivatives $d^2x_j/dt^2$. To do this, it suffices to prove that the $n \times (n + 1)$ matrix

$$\begin{pmatrix} L_{n+i,n+j} & L_{n+j} \end{pmatrix}$$

has rank $n$ at every point $(x, \dot x)$ where $L(x, \dot x) = 1$. Suppose, then, that there is a relation of the form

$$\lambda L_{n+j} + \lambda_i L_{n+i,n+j} = 0.$$

Proceed as in the proof of Lemma 15.4. Multiply by $\dot x_j$, and use the Euler relations to infer that $\lambda = 0$. Then, since $\operatorname{rank}(L_{n+i,n+j}) = n - 1$, there is a relation of the form $\lambda_i = \mu \dot x_i$. In particular, the nullity is 1; hence the rank is $n$.
LEMMA 15.6 Let $L$ be a regular, homogeneous Lagrangian on a domain $D$. Then the 2-form $d\theta(L)$ restricted to the submanifold $L = 1$ has rank $2(n - 1)$. In particular, at each $v \in T(D)$ with $L(v) = 1$, there is a unique (up to scalar multiple) tangent vector at $v$ that is a characteristic vector of the 2-form $d\theta(L)$ and is tangent to the hypersurface $L = 1$.
Proof.

$$d\theta(L) = L_{n+i,n+j}\,d\dot x_j \wedge dx_i + L_{n+i,j}\,dx_j \wedge dx_i.$$

We shall prove that

$$(L_{n+i,n+j}\,d\dot x_j \wedge dx_i)^{n-1} \wedge dL \ne 0. \tag{15.17}$$

This will obviously prove that $(d\theta(L))^{n-1} \ne 0$ when restricted to $L = 1$, as required, since the second term for $d\theta(L)$ will contribute no term in $d\dot x_j$. Note that when $d\dot x_j$ and $dx_j$ are subject to the same linear transformations with constant coefficients, $(L_{n+i,n+j}(v))$ changes as the matrix of a quadratic form. Hence, in trying to prove (15.17), there is no loss of generality in supposing that the symmetric matrix $(L_{n+i,n+j}(v))$ is diagonal, with eigenvalues $(\lambda_1, \ldots, \lambda_{n-1}, 0)$. (There is only one zero eigenvalue, since we know that $\operatorname{rank}(L_{n+i,n+j}) = n - 1$.) Then

$$(L_{n+i,n+j}\,d\dot x_j \wedge dx_i)^{n-1} = (\lambda_1\,d\dot x_1 \wedge dx_1 + \cdots + \lambda_{n-1}\,d\dot x_{n-1} \wedge dx_{n-1})^{n-1} = (n-1)!\,\lambda_1 \cdots \lambda_{n-1}\,d\dot x_1 \wedge dx_1 \wedge \cdots \wedge d\dot x_{n-1} \wedge dx_{n-1}.$$
Now $dL = L_i\,dx_i + L_{n+i}\,d\dot x_i$. Now $L_{n+i,n+j}(v)\,\dot x_j(v) = 0 = \lambda_i \dot x_i(v)$ (no summation), whence $\dot x_i(v) = 0$ for $i < n$. But $1 = L_{n+i}(v)\,\dot x_i(v) = L_{2n}(v)\,\dot x_n(v)$. This proves (15.17) and hence the lemma.

Now we can turn to the proof of Theorem 15.3. Let $x_0 = \sigma(0) \in \phi(D')$, and let $v_0 = \sigma'(0)$. By hypothesis, $v_0$ is perpendicular to $\phi(D')$. By Lemma 15.4, if $D$ is sufficiently small, there is a mapping $\chi: \phi(D') \to T(D)$ such that for each $x \in \phi(D')$, $\chi(x) \in D_x$ is a vector that is perpendicular to $\phi(D')$ and such that $L(\chi(x)) = 1$. Thus, the mapping $\chi$ is an integral manifold of $\theta(L)$; hence also an integral manifold of $d\theta(L)$. By Lemma 15.4, $\chi(x)$ is not tangent to $\phi(D')$ for $x \in \phi(D')$. By Lemma 15.5, the characteristic curve of $d\theta(L)$ beginning at $\chi(x)$ and lying on $L = 1$ is the tangent vector curve to the extremal of $L$ beginning at $x$ and tangent there to $\chi(x)$. In particular, the unique (up to a scalar multiple) characteristic vector field of $d\theta(L)$ is not tangent to the submanifold of $T(D)$ defined by $\chi$; hence Theorem 13.2 can be applied to construct an $n$-dimensional integral manifold of $d\theta(L)$ passing through the one already defined by $\chi$. Project this integral submanifold down to $D$: This amounts to constructing, for $a$ and $D$ sufficiently small, a map $\delta: D' \times [0, a] \to D$ having the following properties:
(a) $\delta(x', 0) = \phi(x')$ for $x' \in D'$.
(b) For each $x'$, the curve $t \to \delta(x', t)$ is an extremal of $L$. The tangent vector to this curve at $t = 0$ is $\chi(\phi(x'))$. $L$ has the value 1 on its tangent vector field.
(c) Construct the mapping $X: D' \times [0, a] \to T(D)$ by assigning to each $(x', t') \in D' \times [0, a]$ the tangent vector to the curve $t \to \delta(x', t)$ at $t'$. Then $X$ is an integral manifold of $d\theta(L)$.

Notice now, since $\chi(x)$ is not tangent to $\phi(D')$ for $x \in \phi(D')$, that the mapping $\delta$ has nonzero Jacobian; that is, no nonzero tangent vector to $D' \times [0, a]$ can be mapped into 0 by $\delta$ (if $a$ is sufficiently small). Hence we can suppose, if $D$ is taken small enough, that $\delta(D' \times [0, a]) = D$, and that $\delta$ has an inverse map. Having made this identification, it should be clear that the map $X$ is the extremal vector field of $L$ that we are looking for.

As a bonus from this proof, we obtain the following geometric-physical picture of how the extremal field is constructed: For $t \in [0, a]$, let $\phi_t: D' \to D$ be the map assigning $\delta(x', t)$ to each $x' \in D'$. We construct the map $\chi_t: \phi_t(D') \to T(D)$ by assigning $X(x', t)$ to each $\phi_t(x')$. By Theorem 13.6, each map $\chi_t$ is an integral submanifold of $\theta(L)$; that is, for $x \in \phi_t(D')$, $\chi_t(x)$ is perpendicular to $\phi_t(D')$. The mapping $t \to \phi_t$ defines a "wave" traveling through $D$, guided along by the "rays" that are the integral curves of $X$.
If, as in geometrical optics defined by Fermat's principle, $L(\sigma)$, for a curve $\sigma$, represents the time a light ray takes to move along the curve, the surfaces $\phi_t(D')$ represent the wave fronts at time $t$ for light rays starting at $t = 0$ on $\phi(D')$. $W$, the characteristic function of the extremal field, is defined by the condition that its value on $\phi_t(D')$ is $t$, that is, the time needed for the beam of light rays to get there from the initial surface. Thus, inverting the historical order, we arrive at the picture that guided Hamilton in discovering "Hamilton-Jacobi theory."
Exercises

1. Prove that Euler's equations (15.2) are necessary conditions for an extremal.

2. Take a variational problem of minimal complexity (for example, a Lagrangian $L(x, \dot x, t)$, where $x$ is a real number). Discuss carefully the differentiability properties of extremals. One interesting question: If the curve is only $C^1$, the notion of extremal is well defined. However, Euler's equation (15.2) requires it and $L$ to be $C^2$. How does one modify things? Under what conditions does a $C^1$ extremal have to be $C^2$? What are the conditions that an extremal be piecewise $C^2$?
16
Groups of Symmetries of Variational Problems: Applications to Mechanics
The role played by transformation groups that leave a variational system invariant can often be best understood from the more abstract, manifold point of view. Hence we shall pause to show how the basic machinery of the calculus of variations can be formulated in coordinate-free terms for manifolds. As a bonus, we shall obtain a useful way of developing the differential equation for extremals in terms of an arbitrary basis of differential forms, which has great advantages in certain types of calculations.

Let $M$ be a manifold; $F(M)$ denotes its ring of ($C^\infty$) real-valued functions, $V(M)$ the set of its vector fields, $T(M) = \bigcup_{p \in M} M_p$ its tangent bundle. A Lagrangian on $M$ is just a real-valued function on $T(M) \times R$, usually denoted by a letter such as $L$. ($R$ = the real numbers, usually parametrized by $t$.) Such a function enables one to define a real-valued function $\sigma \to L(\sigma)$ on curves $\sigma$ in $M$. If $\sigma$ is a map $[a, b] \to M$, then

$$L(\sigma) = \int_a^b L(\sigma'(t), t)\,dt,$$

where $t \to \sigma'(t) \in M_{\sigma(t)}$ is the tangent vector field to $\sigma$. As in Chapter 12, one defines the extremals of $L$ as the curves $\sigma$ that have the property that $\sigma$ is a "critical point" of $L$ restricted to the space of all curves going from $\sigma(a)$ to $\sigma(b)$. (For simplicity, we shall consider only ordinary variational problems in this chapter; that is, we shall impose no additional constraints that the tangent vector field to $\sigma$ must satisfy.)

To give the Euler equations for the extremals in coordinate-free form, introduce the Cartan 1-form $\theta(L)$, a differential 1-form on $T(M) \times R$. This can be defined by using a coordinate system, as in Chapter 15. The reader may find it useful to see a more general definition in terms of an arbitrary basis $(\omega_1, \ldots, \omega_n)$ of differential 1-forms in an open set $U$ of $M$. Adopt the following range of indices and summation convention:

$$1 \le i, j, \ldots \le n = \dim M.$$

Since no confusion is likely, let $\omega_i$ also denote the differential forms on the open subset of $T(M) \times R$ lying above $U$, obtained by pulling back via the projection map. Of course we must then denote by $y_i$ the real-valued functions defined by $\omega_i$ on $T(M)$; hence also on $T(M) \times R$:

$$y_i(v) = \omega_i(v) \quad \text{for } v \in T(M).$$
At any rate, $(\omega_i, dy_i, dt)$ forms a local basis for differential forms on $T(M) \times R$. Suppose, then, that

$$dL = L_i\,\omega_i + L_{n+i}\,dy_i + L_t\,dt.$$

Now

$$\theta(L) = L_{n+i}\,\omega_i - (L_{n+i}\,y_i - L)\,dt.$$
One immediately sees that this reduces to the 1-form introduced in Chapter 15 by specializing to the case where $\omega_i = dx_i$, with $(x_1, \ldots, x_n)$ functions on $U$ defining a coordinate system. As an independent check on this fact, let us verify that $\theta(L)$ remains unchanged when a different basis $(\omega_i')$ of 1-forms for $U$ is used. Suppose that $\omega_i' = A_{ij}\,\omega_j$. Then also

$$y_i' = A_{ij}\,y_j,$$

$$dL = L_i'\,\omega_i' + L_{n+i}'\,dy_i' + L_t'\,dt \equiv L_{n+i}'\,A_{ij}\,dy_j \quad (\text{modulo } \omega_i,\ dt).$$

Hence,

$$L_{n+j} = L_{n+i}'\,A_{ij},$$

$$\theta'(L) = L_{n+i}'\,\omega_i' - (L_{n+i}'\,y_i' - L)\,dt = L_{n+i}'\,A_{ij}\,\omega_j - (L_{n+i}'\,A_{ij}\,y_j - L)\,dt = L_{n+j}\,\omega_j - (L_{n+j}\,y_j - L)\,dt = \theta(L),$$

which expresses the invariance of $\theta(L)$. We state another property of $\theta(L)$: If $t \to \sigma(t)$, $a \le t \le b$, is a curve in $M$, and if $t \to (\sigma'(t), t) = \gamma(t)$ is its extended curve in $T(M) \times R$, then

$$\int_a^b L(\sigma'(t), t)\,dt = \int_a^b \theta(L)(\gamma'(t))\,dt.$$
Thus, by "lifting" to the space $T(M) \times R$ sitting over $M$, we have converted a "nonlinear" Lagrangian $L$ to a "linear" one, $\theta(L)$. (This procedure of lifting to a higher dimensional space to simplify the structure of a geometric object is typical of Cartan's entire approach to differential geometry.) We now recognize that a curve $t \to \sigma(t)$ in $M$ is an extremal for $L$, that is, it satisfies the Euler equations, if and only if its extended curve $t \to (\sigma'(t), t) = \gamma(t)$ in $T(M) \times R$ is a characteristic curve for the differential form $d\theta(L)$, that is, satisfies $\gamma'(t) \lrcorner\, d\theta(L) = 0$.
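The identity relating the integral of $L$ along $\sigma$ to the integral of $\theta(L)$ along the extended curve holds pointwise: $\theta(L)$ evaluated on the tangent vector of the extended curve equals $L$. A minimal sketch, assuming $M = R$ with coordinates $(x, \dot x, t)$ on $T(M) \times R$; the Lagrangian and the test curve are illustrative choices, not taken from the text, and the curve is deliberately not an extremal.

```python
import math

def L(x, xdot, t):
    """Illustrative nonhomogeneous Lagrangian (an assumption): xdot^2/2 - x^2/2."""
    return 0.5 * xdot ** 2 - 0.5 * x ** 2

def L_xdot(x, xdot, t):
    """Partial derivative of L with respect to the velocity coordinate."""
    return xdot

def theta(point, tangent):
    """Cartan form theta(L) = L_xdot dx - (L_xdot * xdot - L) dt on a tangent vector."""
    x, xdot, t = point
    dx, dxdot, dt = tangent
    p = L_xdot(x, xdot, t)
    return p * dx - (p * xdot - L(x, xdot, t)) * dt

# Hypothetical test curve sigma(t) = sin(2t), with its first two derivatives.
def sigma(t):
    return math.sin(2 * t)

def sigma_dot(t):
    return 2 * math.cos(2 * t)

def sigma_ddot(t):
    return -4 * math.sin(2 * t)

def extended(t):
    """Point gamma(t) = (sigma, sigma', t) and its tangent gamma'(t) = (sigma', sigma'', 1)."""
    point = (sigma(t), sigma_dot(t), t)
    tangent = (sigma_dot(t), sigma_ddot(t), 1.0)
    return point, tangent

samples = [0.1 * k for k in range(32)]
max_gap = max(abs(theta(*extended(t)) - L(*extended(t)[0])) for t in samples)
print(max_gap)  # agreement up to rounding error
```

On a tangent vector that is not of the form $\gamma'(t)$, for example one whose $dx$ component differs from $\dot x\,dt$, the two values generally differ; the identity is special to extended curves.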
As we have seen in Chapter 13, the study of the characteristic curves of a closed 2-form is more or less identical with Hamilton-Jacobi theory. Let us work out these conditions explicitly:

$$\begin{aligned}
d\theta(L) &= d\big(L_{n+i}(\omega_i - y_i\,dt) + L\,dt\big) \\
&= dL_{n+i} \wedge (\omega_i - y_i\,dt) + L_{n+i}(d\omega_i - dy_i \wedge dt) + L_i\,\omega_i \wedge dt + L_{n+i}\,dy_i \wedge dt \\
&= dL_{n+i} \wedge (\omega_i - y_i\,dt) + L_{n+i}\,d\omega_i + L_i\,\omega_i \wedge dt \\
&= dL_{n+i} \wedge (\omega_i - y_i\,dt) + L_i\,\omega_i \wedge dt + L_{n+i}\,C_{jki}\,\omega_j \wedge \omega_k,
\end{aligned}$$

where the $(C_{jki})$ are the functions such that

$$d\omega_i = C_{jki}\,\omega_j \wedge \omega_k, \qquad C_{jki} + C_{kji} = 0.$$

Now

$$(\omega_i - y_i\,dt)(\gamma'(t)) = \omega_i(\sigma'(t)) - y_i(\gamma(t)) = 0, \qquad dL_{n+i}(\gamma'(t)) = \frac{d}{dt} L_{n+i}(\sigma'(t), t), \quad \text{etc.}$$

Hence the conditions that $\gamma(t) = (\sigma'(t), t)$ be a characteristic curve of $d\theta(L)$ are

$$\frac{d}{dt} L_{n+i}(\sigma'(t), t)\,(\omega_i - y_i\,dt) - L_i(\sigma'(t), t)\,\omega_i + L_i\,y_i\,dt + L_{n+i}\,C_{jki}\,y_j\,\omega_k = 0.$$

Now $\omega_i$ and $dt$ are independent differential forms; hence these equations imply that the coefficient of $\omega_i$ is zero, that is:

$$\frac{d}{dt} L_{n+i}(\sigma'(t), t) - L_i(\sigma'(t), t) + L_{n+k}\,C_{jik}(\sigma'(t), t)\,\omega_j(\sigma'(t)) = 0.$$

These differential equations for $\sigma$ are the Euler(-Lagrange) equations with respect to the basis $(\omega_1, \ldots, \omega_n)$. Notice, in case the $\omega_1, \ldots, \omega_n$ are the differentials of a set of coordinate functions $x_1, \ldots, x_n$ for $M$, that they reduce (since $C_{jik} = 0$) to the classical Euler equations

$$\frac{d}{dt}\frac{\partial L}{\partial \dot x_i} - \frac{\partial L}{\partial x_i} = 0$$
that were derived in Chapter 15. The more general equations are very useful in certain mechanical problems (for example, in rigid body dynamics) where a basis for differential forms can be found more readily than a "natural" coordinate system.

If the variational problem is nonhomogeneous, that is, if extremals cannot be freely reparametrized, regularity of the variational problem is determined by the condition

$$(d\theta(L))^n = \underbrace{d\theta(L) \wedge \cdots \wedge d\theta(L)}_{n \text{ factors}} \ne 0.$$

Alternately, this can be expressed by the condition $\det(L_{n+i,n+j}) \ne 0$, where

$$dL_{n+i} \equiv L_{n+i,n+j}\,dy_j \quad (\text{modulo } \omega_i,\ dt).$$

If the variational problem is time-independent and homogeneous, that is, if

$$L(\lambda v) = \lambda L(v) \quad \text{for } \lambda > 0,$$

then $(L_{n+i}\,y_i - L) = 0$; hence $\theta(L)$ can be considered as a 1-form on $T(M)$. Regularity is now determined by the condition: Dimension of the space of characteristic vectors of $d\theta(L)$ = 2, or $\operatorname{rank}(L_{n+i,n+j}) = (n - 1)$.

An extremal vector field for the variational problem can now be defined as a cross-section mapping $\Phi: M \times R \to T(M) \times R$ assigning to each pair $(p, t) \in M \times R$ a tangent vector $\Phi(p, t) \in M_p$ such that

$$\Phi^*(d\theta(L)) = 0.$$

A function $S(p, t)$ on $M \times R$ such that

$$dS = \Phi^*(\theta(L))$$

is a solution of the Hamilton-Jacobi partial differential equation associated with the variational problem. The rays of an extremal field are the curves $\sigma(t)$ that are solutions of the system of ordinary differential equations:

$$\sigma'(t) = \Phi(\sigma(t), t).$$
They are extremals of the variational problem, as described above; such $n$-parameter families of extremals play an important role in proving that extremals really do minimize, and are very important in physics (for example, in describing the wave-particle duality).

Let $\phi: N \to M \times R$ be a submanifold of $M \times R$. A "vector field" on $N$ is a mapping $v: N \to T(M)$ such that $v(p) \in M_{\phi(p)}$ for $p \in N$. Such a vector field is perpendicular to $N$ if $v^*(\theta(L)) = 0$. If $\Phi$ is an extremal vector field, with $S$ the associated solution of the Hamilton-Jacobi equation, and $N$ is a submanifold defined by $S$ = constant, then one sees that $\Phi$ restricted to $N$ is such a perpendicular vector field.

Now suppose that $L$ and $L'$ are Lagrangians on manifolds $M$ and $M'$, that $\phi: M' \to M$ is a map, and $\phi_*: T(M') \to T(M)$ is the differential of $\phi$. We shall also use $\phi_*$ to denote the map $T(M') \times R \to T(M) \times R$ which acts identically on $R$. Suppose that

$$\phi_*^*(L) = L'.$$

Then, also, as we have proved in Chapter 15 (Theorem 15.1),

$$\phi_*^*(\theta(L)) = \theta(L').$$

For example, if $\phi$ is a diffeomorphism between $M$ and $M'$, it is clear that this implies that $\phi$ maps an extremal of $L'$ into an extremal of $L$. If $M = M'$, $L = L'$, such a $\phi$ can be regarded as a symmetry of the variational problem. A Lie group $G$ acting on $M$ as diffeomorphisms of $M$ can be regarded as a group of symmetries if each individual transformation is a symmetry. Now, the action of $G$ on $M$ can be prolonged to an action of $G$ on $T(M) \times R$ by sending $\phi \in G$ into $\phi_*$. (It is left to the reader to verify that this actually defines an action of $G$ on $T(M) \times R$.) This action of $G$ preserves $\theta(L)$; hence, also $d\theta(L)$. Now the Lie algebra of $G$ also acts as a Lie algebra of vector fields on $T(M) \times R$, since the action of a Lie group on a manifold gives rise to an action of its Lie algebra as vector fields. For example, if an element $X$ in the Lie algebra of $G$ is realized as a vector field

$$X = A_i\,\frac{\partial}{\partial x_i}$$

on $M$ (using a local coordinate system $x_1, \ldots, x_n$ for $M$), the prolonged vector field is

$$\tilde X = A_i\,\frac{\partial}{\partial x_i} + \frac{\partial A_i}{\partial x_j}\,\dot x_j\,\frac{\partial}{\partial \dot x_i}$$

on $T(M) \times R$, as in Chapter 15.† Thus we shall automatically have

$$\tilde X(\theta(L)) = 0, \qquad \tilde X(d\theta(L)) = 0.$$
These basic geometric facts suggest that we split up the problem of discussing groups of symmetries into its "Hamiltonian" and "Lagrangian" components.

Symmetry Groups from the Point of View of Hamilton-Jacobi Theory
We shall now change our point of view. We regard Hamilton-Jacobi theory as the study of the characteristic curves of a closed 2-form, independently of whether the 2-form arises from a variational problem.
† With one slight modification, that is, we leave off $+(\partial/\partial t)$ in the definition of $\tilde X$, thus regarding $\tilde X$ as acting trivially on the time coordinate.
Let $P$ be a manifold. (By choosing the letter $P$, we mean to emphasize that the typical case is that where $P$ is the "phase space" of a mechanical problem.) Let $\omega$ be a closed 2-form on $P$. Recall that a vector $v \in T(P)$ or vector field $X \in V(P)$ is characteristic for $\omega$ if

$$v \lrcorner\, \omega = 0, \qquad X \lrcorner\, \omega = 0.$$

We suppose that the dimension of the space of characteristic vectors (or, equivalently, the rank of $\omega$) is constant on $P$. A vector field $Y \in V(P)$ leaves $\omega$ invariant if $Y(\omega) = 0$.
THEOREM 16.1 Let $\omega$ be a closed differential 2-form of constant rank on $P$. Let

$$C = \{X \in V(P): X \lrcorner\, \omega = 0\}, \qquad I = \{Y \in V(P): Y(\omega) = 0\}.$$

Then

$$[I, I] \subset I, \qquad [C, I] \subset C.$$

In other words, $C$ is an ideal in the Lie algebra $I$. The one-parameter transformation group generated by a $Y \in I$ permutes the characteristic curves of $\omega$.

Proof. Most of this follows trivially from the rules of operation for vector fields and differential forms. For example, we prove $[C, I] \subset C$. For $X \in C$, $Y \in I$,

$$[X, Y] \lrcorner\, \omega = Y(X \lrcorner\, \omega) - X \lrcorner\, Y(\omega) = 0;$$

hence $[X, Y] \in C$. To prove $C \subset I$, suppose $X \in C$. Then $X(\omega) = X \lrcorner\, d\omega + d(X \lrcorner\, \omega) = 0$. We leave the last remark as an exercise.

Now, in accordance with general group-theoretical principles, one should regard $I$ as the Lie algebra of the transformation group of all diffeomorphisms $\phi$ of $P$ that preserve $\omega$, that is, that satisfy $\phi^*(\omega) = \omega$. The reader should be warned, however, that this group cannot be regarded as a Lie group. (Roughly, it cannot be described by a finite number of parameters.) Hence this relation between the group and Lie algebra will remain in the background as intuitive motivation.

Let $Y \in I$. Using another of the rules of operation concerning vector fields and forms, we have

$$0 = Y(\omega) = Y \lrcorner\, d\omega + d(Y \lrcorner\, \omega),$$

hence $d(Y \lrcorner\, \omega) = 0$. Thus the mapping $Y \to Y \lrcorner\, \omega$ sends $I$ into the set of closed 1-forms on $P$. The kernel is $C$. Since $C$ is an ideal, the image in the set of closed 1-forms inherits the Lie algebra structure associated with $I/C$.
What precisely is this image in the set of closed 1-forms? Notice that $\theta = Y \lrcorner\, \omega$ satisfies

$$X \lrcorner\, \theta = 0 \quad \text{for all } X \in C. \tag{16.1}$$

Note another useful fact: If $\psi$ is a 1-form satisfying $d\psi = \omega$, and if $Y(\psi) = 0$, then

$$d(Y \lrcorner\, \psi) = -Y \lrcorner\, \omega. \tag{16.2}$$
Before proceeding further in these abstract directions, it may be helpful for the reader to see an example of what this means in more classical language.
EXAMPLE (THE POISSON BRACKET)

Suppose $\dim P = 2n$, with coordinates $(q_1, \ldots, q_n, p_1, \ldots, p_n)$, and

$$\omega = dp_i \wedge dq_i \quad (1 \le i, j, \ldots \le n; \text{ summation convention in force}).$$

We choose this way of labeling coordinates on $P$ so as to suggest the usual terminology in classical mechanics. $P$ is "phase space," the $q$ are coordinates of "configuration space," and the $p$ are coordinates of "momentum space." Let $Y \in I$. Then

$$Y \lrcorner\, \omega = Y(p_i)\,dq_i - Y(q_i)\,dp_i.$$

Since $d(Y \lrcorner\, \omega) = 0$, and $P$ is a Euclidean space, by the Poincaré lemma there is a function $f \in F(P)$ such that

$$\frac{\partial f}{\partial q_i} = Y(p_i), \qquad \frac{\partial f}{\partial p_i} = -Y(q_i), \tag{16.3a}$$

or

$$Y = -\frac{\partial f}{\partial p_i}\frac{\partial}{\partial q_i} + \frac{\partial f}{\partial q_i}\frac{\partial}{\partial p_i}. \tag{16.3b}$$
Suppose that we turn this around and, given $f \in F(P)$, define $Y_f$ as a vector field on $P$ by the formula (16.3). Then

$$Y_f(\omega) = d(Y_f \lrcorner\, \omega) + Y_f \lrcorner\, d\omega = d(df) + 0 = 0.$$

Thus, for $g \in F(P)$,

$$Y_f(g) = \frac{\partial f}{\partial q_i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q_i}. \tag{16.4}$$

The function on the right-hand side of (16.4) is classically called the Poisson bracket of the functions $f$ and $g$, and denoted by $\{f, g\}$. Now we have defined it so that

$$df = Y_f \lrcorner\, \omega. \tag{16.3c}$$

Then

$$d\{f, g\} = d(Y_f(g)) = Y_f(dg) = Y_f(Y_g \lrcorner\, \omega) = [Y_f, Y_g] \lrcorner\, \omega.$$

Thus,

$$Y_{\{f, g\}} = [Y_f, Y_g]. \tag{16.5}$$
Equation (16.5) suggests that we regard $F(P)$ as some sort of algebra under the Poisson bracket operation $(f, g) \to \{f, g\}$, and (16.5) then says that the mapping $f \to Y_f$ is an algebra homomorphism of $F(P)$ onto the Lie algebra $I$. In fact, we shall show that $F(P)$ under $\{\ ,\ \}$ is itself a Lie algebra, so that $f \to Y_f$ is a Lie algebra homomorphism. Skew symmetry, $\{f, g\} = -\{g, f\}$, of the Poisson bracket is obvious from (16.4). It remains only to prove the Jacobi identity:

$$\begin{aligned}
\{f, \{g, h\}\} &= Y_f(\{g, h\}) = Y_f(Y_g(h)) \\
&= [Y_f, Y_g](h) + Y_g(Y_f(h)) \\
&= Y_{\{f, g\}}(h) + \{g, \{f, h\}\} = \{\{f, g\}, h\} + \{g, \{f, h\}\},
\end{aligned}$$

which is precisely the Jacobi identity.

There is an alternative definition of Poisson bracket in terms of exterior multiplication. Note that $\omega$ is a 2-form of maximal rank $2n$. Hence $\omega^n \ne 0$. ($\omega^n$ = exterior product of $n$ copies of $\omega$.) For $f, g \in F(P)$, $\omega^{n-1} \wedge df \wedge dg$ is a $2n$-form that then must be a multiple of $\omega^n$, say,

$$\omega^{n-1} \wedge df \wedge dg = h\,\omega^n.$$

Let us find $h$ by applying $Y_f \lrcorner$ to both sides:

$$Y_f \lrcorner\, \omega^{n-1} = (n - 1)\,\omega^{n-2} \wedge df, \qquad Y_f \lrcorner\, \omega^n = n\,\omega^{n-1} \wedge df$$

(since $df \wedge \omega = (-1)^2\,\omega \wedge df = \omega \wedge df$), and

$$Y_f \lrcorner\, df = Y_f(f) = \{f, f\} = 0, \qquad Y_f \lrcorner\, dg = Y_f(g) = \{f, g\}.$$

Finally, then,

$$-\{f, g\}\,\omega^{n-1} \wedge df = h \cdot n\,\omega^{n-1} \wedge df.$$

This suggests that we try to prove that $h = -\{f, g\}/n$. Since they are functions, it suffices to prove that there is point-by-point equality. Now, if $df = 0$ at a point, clearly both sides must be zero; hence, equality. If $df \ne 0$, $df \wedge \omega^{n-1}$ is not zero (exercise in exterior algebra; left to the reader); hence, equality again. Finally, then,

$$\omega^{n-1} \wedge df \wedge dg = -\frac{\{f, g\}}{n}\,\omega^n. \tag{16.6}$$
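The algebraic properties just derived (skew symmetry and the Jacobi identity) can be checked exactly on polynomials. The following is a minimal sketch for $n = 1$, assuming the sign convention $\{f, g\} = f_q g_p - f_p g_q$ that follows from applying (16.3b) to $g$. The dictionary representation of polynomials and all function names are illustrative choices, not anything from the text.

```python
from collections import defaultdict

def poly(terms):
    """Polynomial in (q, p) stored as {(deg_q, deg_p): coeff}; zero terms are dropped."""
    return {k: c for k, c in terms.items() if c != 0}

def add(f, g, sign=1):
    """Return f + sign * g."""
    out = defaultdict(int)
    for k, c in f.items():
        out[k] += c
    for k, c in g.items():
        out[k] += sign * c
    return poly(out)

def mul(f, g):
    """Return the product f * g."""
    out = defaultdict(int)
    for (a, b), c in f.items():
        for (a2, b2), c2 in g.items():
            out[(a + a2, b + b2)] += c * c2
    return poly(out)

def diff(f, var):
    """Partial derivative; var = 0 means d/dq, var = 1 means d/dp."""
    out = defaultdict(int)
    for k, c in f.items():
        if k[var] > 0:
            new = list(k)
            new[var] -= 1
            out[tuple(new)] += c * k[var]
    return poly(out)

def bracket(f, g):
    """Poisson bracket {f, g} = f_q g_p - f_p g_q (sign convention assumed as above)."""
    return add(mul(diff(f, 0), diff(g, 1)), mul(diff(f, 1), diff(g, 0)), sign=-1)

q = {(1, 0): 1}
p = {(0, 1): 1}
f = add(mul(q, q), mul(p, p))   # q^2 + p^2
g = mul(mul(q, q), p)           # q^2 p
h = add(mul(q, mul(p, p)), q)   # q p^2 + q

# Jacobi identity in the form used in the text.
lhs = bracket(f, bracket(g, h))
rhs = add(bracket(bracket(f, g), h), bracket(g, bracket(f, h)))
print(lhs == rhs)
print(bracket(q, p))
```

Because the coefficients are integers, the comparison is exact rather than approximate; the same functions extend to $n > 1$ by lengthening the exponent tuples and summing over coordinate pairs.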
Notice now that we have used the coordinate system $(p_i, q_i)$ only to get the classical formula for the Poisson bracket. Then we can sum up our results in coordinate-free form as follows:

THEOREM 16.2 Let $P$ be a manifold of dimension $2n$, and let $\omega$ be a closed differential 2-form on $P$ of rank $2n$. Let $I$ be the Lie algebra of vector fields on $P$ that preserve $\omega$. There is a linear onto mapping $F(P) \to I$, denoted by $f \to Y_f$, satisfying (in fact, defined by) (16.3c). Defining the Poisson bracket of $f, g \in F(P)$, denoted by $\{f, g\}$, by (16.4) makes $F(P)$ into a Lie algebra such that the mapping onto $I$ is a Lie algebra homomorphism with kernel the constant functions. Equation (16.6) provides an alternate definition for the Poisson bracket in terms of exterior algebra.

The geometric properties of the Poisson bracket operation can be summed up as follows: Each $f \in F(P)$ gives rise (modulo the usual difficulties in extending integral curves of vector fields, which we shall ignore here) to a one-parameter group of diffeomorphisms on $P$ that preserves the form $\omega$. The condition $\{f, g\} = 0$ means that $g$ is constant on the orbits of the group, that is, that $g$ is an integral of the ordinary differential equations defining the orbits. If $(q_i, p_i)$ is a coordinate system for $P$ such that

$$\omega = dp_i \wedge dq_i$$

(that is, if $\omega$ is in canonical form with respect to the coordinate system), then the differential equations for the orbits of the group generated by $f$ are just

$$\frac{dq_i}{dt} = -\frac{\partial f}{\partial p_i}, \qquad \frac{dp_i}{dt} = \frac{\partial f}{\partial q_i}.$$
These, it will be recognized, are just the Hamilton equations with Hamiltonian $f$.

Let us rephrase some of these results in a more informal way, which is useful in physics. We are given a "phase space" $P$ and a 2-form $\omega$ of maximal rank on $P$. The functions on $P$ are the observables on phase space. With the help of $\omega$, the observables can be made into a Lie algebra. There is a mapping that assigns to each observable a one-parameter group of $\omega$-preserving transformations on $P$. (Classically, an $\omega$-preserving diffeomorphism is called a canonical transformation.) This mapping is not quite 1-1, but the kernel is unimportant (for classical mechanics); it consists of just the constant functions. In physics, choosing a particular mechanical system with phase space $P$ amounts to choosing a distinguished observable $H$, to be called the Hamiltonian (or energy). The corresponding one-parameter transformation group on $P$ is to be regarded as determining the evolution of the mechanical system with time. A one-parameter group of symmetries of the mechanical system with Hamiltonian $H$ is determined by an "observable" $f$ with $\{H, f\} = 0$.
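The statement that an observable $f$ with $\{H, f\} = 0$ is constant along the evolution generated by $H$ can be illustrated numerically. The sketch below assumes $n = 2$, the Hamiltonian $H = \tfrac12(|p|^2 + |q|^2)$, and the observable $f = q_1 p_2 - q_2 p_1$; these, together with the initial state, the step size, and the Runge-Kutta integrator, are illustrative choices and not part of the text. The orbit equations are written with the sign convention $dq_i/dt = -\partial H/\partial p_i$, $dp_i/dt = \partial H/\partial q_i$ that corresponds to (16.3b).

```python
def grad_H(q, p):
    """Gradients (dH/dq, dH/dp) of the assumed H = (|p|^2 + |q|^2) / 2."""
    return list(q), list(p)

def vector_field(state):
    """Y_H at state = [q1, q2, p1, p2]: dq/dt = -dH/dp, dp/dt = dH/dq."""
    q, p = state[:2], state[2:]
    dHdq, dHdp = grad_H(q, p)
    return [-dHdp[0], -dHdp[1], dHdq[0], dHdq[1]]

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(s, k, c):
        return [si + c * ki for si, ki in zip(s, k)]
    k1 = vector_field(state)
    k2 = vector_field(shifted(state, k1, h / 2))
    k3 = vector_field(shifted(state, k2, h / 2))
    k4 = vector_field(shifted(state, k3, h))
    return [s + h / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def observable(state):
    """f = q1 p2 - q2 p1, which satisfies {H, f} = 0 for the H above."""
    q1, q2, p1, p2 = state
    return q1 * p2 - q2 * p1

state = [1.0, 0.0, 0.3, 0.7]   # hypothetical initial point of phase space
f0 = observable(state)
for _ in range(1000):          # integrate up to t = 10
    state = rk4_step(state, 0.01)
drift = abs(observable(state) - f0)
print(drift)  # stays small although the state itself has moved far from its start
```

An observable that is not in involution with $H$, such as $q_1$ alone, changes visibly over the same time interval.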
The main point to keep in mind is this correspondence between "observables," that is, functions on phase space, and certain one-parameter groups of diffeomorphisms of phase space, called "canonical transformations." It is this structure that quantum mechanics has in common with classical mechanics, but in quantum theory the phase space $P$ must be regarded as being infinite dimensional.

As an application of these ideas, we shall prove two simple theorems concerning the global properties of Hamiltonian systems. As above, let $P$ be an even dimensional manifold, carrying a closed 2-form $\omega$ of maximal rank. For $f \in F(P)$, let $Y_f$ be the vector field such that $df = Y_f \lrcorner\, \omega$. Then the Poisson bracket of $f$ and $g$ is

$$Y_f(g) = \{f, g\}.$$

Two functions $f$ and $g$ are said to be in involution† if $\{f, g\} = 0$. Thus, if $f$ and $H \in F(P)$, $f$ and $H$ are in involution if and only if $f$ is an integral of $Y_H$, that is, if and only if $f$ is constant along the solutions of the Hamilton equations with Hamiltonian $H$. The next theorem, due to Arnold [1], gives a good qualitative picture of the global conditions that a given Hamiltonian system must satisfy in order that it admit a "large" number of integrals that are in involution.
THEOREM 16.3 Suppose $\dim P = 2n$, and that $n$ functions $f_1, \ldots, f_n$ are given on $P$ that are in involution with each other and with $H$. Let $Q$ be a connected component of

$$\{p \in P: f_1(p) = 0 = \cdots = f_n(p)\}.$$

Suppose that:
(a) $Q$ is compact.
(b) The forms $df_1, \ldots, df_n$ are linearly independent at each point of $Q$.
Then $Q$ is diffeomorphic to a torus, in such a way that $Y_H$ goes over into a vector field on the torus generated by a one-parameter subgroup.‡
Proof. Suppose for the moment that just (b) is satisfied. Then $Y_{f_1}, \ldots, Y_{f_n}$ are vector fields on $P$ that are tangent to $Q$ and are linearly independent at every point of $Q$. Further, (b) guarantees that $Q$ is a submanifold of $P$. Then $Y_{f_1}, \ldots, Y_{f_n}$ define a basis for vector fields on $Q$. Further, they commute, as is obvious from the condition that the $f_1, \ldots, f_n$ are all in involution. Then $Y_H$ can be written in $Q$ in the form

$$Y_H = g_1 Y_{f_1} + \cdots + g_n Y_{f_n},$$

with $g_1, \ldots, g_n \in F(Q)$. Now the condition that $Y_H$ commute with $Y_{f_1}, \ldots, Y_{f_n}$ forces $g_1$ = constant, ..., $g_n$ = constant. Now, if $Q$ is compact, the $Y_{f_1}, \ldots, Y_{f_n}$ generate a global connected Abelian Lie group of diffeomorphisms of $Q$; hence $Q$ is the underlying manifold of a compact, connected Abelian Lie group, that is, a torus. Q.E.D.

† This is the classical terminology, which probably should be changed because it is confusing to the modern reader. It comes from the classical theory of partial differential equations.
‡ That is, regarding the torus as the underlying space of a compact Abelian Lie group. Thus, either all integral curves of $Y_H$ on $Q$ are closed or they behave in the same way as do the one-parameter subgroups going off at an irrational angle.
Now let $P$ continue to be a manifold of even dimension, with a closed 2-form $\omega$ of maximal rank. Let $D$ be an open subset of $P$ whose boundary in $P$ consists of a number of submanifolds of $P$. Let $Y$ be a vector field defined in a neighborhood of $D$ such that
(a) $Y$ is tangent to the submanifolds constituting the boundary of $D$;
(b) $Y(\omega) = 0$.

Suppose, then, that there exists a function $f$ such that $df = Y \lrcorner\, \omega$. (Recall that $d(Y \lrcorner\, \omega) = 0$, so there are certain topological restrictions to the global existence of this $f$.) Then the critical points of $f$ that occur inside $D$ are also zero points of $Y$. This can be exploited (see, for example, Theorem 16.4) to show the existence of zero points for $Y$ if one knows for some a priori reason that the maxima and minima of $f$ do not occur on the boundary. This can be made explicit as follows: Let $p$ be a point on the boundary of $D$ in $P$. Let us extend the notion of tangent vector to $D$ by saying that a tangent vector $v \in P_p$ belongs to $D_p$ if there is a curve $t \to \sigma(t)$, with $\sigma(0) = p$, defined for sufficiently small $t$ and lying in the closure of $D$, such that $\sigma'(0) = v$. Then, if such a $p$ is also a critical point of $f$ restricted to the closure of $D$,

$$v(f) = 0 = \omega(Y(p), v) \quad \text{for all } v \in D_p.$$

This condition can, under suitable hypotheses on $Y$, often be used to conclude that $Y(p) = 0$. Now, we specialize.

THEOREM 16.4 Let $D$ be the region between two concentric circles in the $(x, y)$-plane. Let $t \to \phi_t$ be a one-parameter semigroup of area-preserving diffeomorphisms of $D$ such that: Each $\phi_t$ rotates the inner and outer boundary of $D$ in opposite senses, with no fixed points on the boundary. Then the semigroup has at least two fixed points inside $D$.
16. Symmetries of Variational Problems
Proof. Let $Y$ be the vector field in $D$ which is the infinitesimal generator of $t \to \phi_t$. The area-preserving condition requires that $Y(\omega) = 0$, where $\omega = dx \wedge dy$. Hence also, $d(Y \lrcorner\, \omega) = 0$. We want to assert the existence of a function $f$ in $D$ such that
$$df = Y \lrcorner\, \omega.$$
Now $D$ is not simply connected, but the condition for the existence of $f$ is seen to be that the integral of $Y \lrcorner\, \omega$ around a boundary circle be zero, or that the line integral of the vector field
$$X = -B\,\frac{\partial}{\partial x} + A\,\frac{\partial}{\partial y}, \qquad \text{where } Y = A\,\frac{\partial}{\partial x} + B\,\frac{\partial}{\partial y},$$
around the boundary circle is zero. Now $X$ and $Y$ are perpendicular vector fields. Our assumptions on each $\phi_t$ guarantee that $Y$ has the behavior indicated in Fig. 3, where $D$ is the region between the two circles. The full arrows indicate $Y$, which is tangent to the two boundary circles. Hence $X$, represented by the dotted arrows, is perpendicular to the boundary circles, the line integral of $X$ around each is zero, and such an $f$ exists.
FIGURE 3
Now let $p$ be a critical point of $f$ restricted to the closure of $D$. Then
$$\omega(Y, X) = (Y \lrcorner\, \omega)(X) = A^2 + B^2.$$
At most changing $Y$ to $-Y$, we can suppose that $X$ always points into $D$ on the boundary. Then $p$ cannot be on the boundary, for otherwise
$$0 = X(f)(p) = df(X)(p) = (Y \lrcorner\, \omega)(X)(p) = (A^2 + B^2)(p),$$
which contradicts that $t \to \phi_t$ has no fixed points on the boundary. Hence, $f$ has at least two critical points (namely, its maximum and minimum) inside $D$, which then must be fixed points for the semigroup. Q.E.D.
Remark. The famous Poincaré-Birkhoff fixed-point theorem asserts that a single area-preserving homeomorphism of $D$ that rotates the boundary circles of $D$ in opposite directions must have at least two fixed points. Thus, Theorem 16.4 is the infinitesimal, differentiable version of their theorem. The proof we have given of Theorem 16.4 is considerably simpler than any existing proof of the stronger theorem. (Notice that, even if the transformation is differentiable, Theorem 16.4 cannot be applied to a single one, since there is no reason to expect that it can be embedded in a whole semigroup. Perhaps this is true, but no doubt proving it would be considerably harder than proving the Poincaré-Birkhoff theorem.) It remains a challenge to topologists to formulate and prove a general fixed-point theorem including the Poincaré-Birkhoff theorem, whose proof now uses very special techniques.
This completes our discussion of the case where $P$ is even dimensional, and where $\omega$ is of rank equal to the dimension of $P$. Let us now return to the general case, where $\omega$ is a closed 2-form on $P$ whose rank is constant. As we have seen, if $Y \in V(P)$ satisfies $Y(\omega) = 0$, then $d(Y \lrcorner\, \omega) = 0$. Now, if the Poincaré lemma applies, there is a function $f \in F(P)$ such that $df = Y \lrcorner\, \omega$. At any rate, we shall suppose that such a function exists. For example, this is so if the first Betti number of $P$ is zero. Of course $f$ is only really defined up to an additive constant, but this is not particularly bothersome.
THEOREM 16.5 Let $\omega$ be a closed 2-form of constant rank on a manifold $P$; let $Y \in V(P)$ and $f \in F(P)$ satisfy $Y(\omega) = 0$, $df = Y \lrcorner\, \omega$. Then $f$ is constant along all characteristic curves of $\omega$.
Proof. If $X$ is a characteristic vector field for $\omega$, that is, if $X \lrcorner\, \omega = 0$, then
$$X(f) = df(X) = \omega(Y, X) = -(X \lrcorner\, \omega)(Y) = 0.$$
Q.E.D.
We see the general setting for the relation between one-parameter groups of symmetries and functions on phase space which are integrals of motion that is typical of all Hamiltonian mechanics. (Physicists usually know this as “ E. Noether’s theorem.”)
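The relation between a symmetry and its conserved function can be checked numerically in a simple case. The following is only a minimal sketch under stated assumptions: the central potential $V(r) = -1/r$, the unit mass, the initial data, and the leapfrog integrator are illustrative choices that do not appear in the text. The function $f = x p_y - y p_x$ is taken as the function conjugate (up to sign convention) to the rotation field of the plane.

```python
import math

def acceleration(x, y):
    """Force per unit mass for the assumed central potential V(r) = -1/r."""
    r_cubed = math.hypot(x, y) ** 3
    return -x / r_cubed, -y / r_cubed

def conjugate_function(state):
    """f = x*p_y - y*p_x, the function conjugate to the rotation symmetry."""
    x, y, px, py = state
    return x * py - y * px

def integrate(state, dt, steps):
    """Leapfrog (velocity Verlet) integration of Hamilton's equations, unit mass."""
    x, y, px, py = state
    trajectory = [state]
    ax, ay = acceleration(x, y)
    for _ in range(steps):
        px += 0.5 * dt * ax          # half kick
        py += 0.5 * dt * ay
        x += dt * px                 # drift
        y += dt * py
        ax, ay = acceleration(x, y)
        px += 0.5 * dt * ax          # half kick
        py += 0.5 * dt * ay
        trajectory.append((x, y, px, py))
    return trajectory

trajectory = integrate((1.0, 0.0, 0.0, 1.2), dt=1e-3, steps=5000)
f_values = [conjugate_function(s) for s in trajectory]
# Largest deviation of f from its initial value along the computed curve.
f_drift = max(abs(f - f_values[0]) for f in f_values)
# For contrast: p_x is conjugate to x-translation, which is NOT a symmetry here.
px_change = max(abs(s[2] - trajectory[0][2]) for s in trajectory)
print(f_drift, px_change)
```

For a central force each kick is parallel to the position and each drift is parallel to the momentum, so this particular integrator leaves $f$ unchanged up to rounding, while $p_x$ varies freely.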
THEOREM 16.6 Let $\omega$, $P$, $Y$, and $f$ be as in Theorem 16.5. Then a point $p \in P$ is a critical point of $f$ if and only if the integral curve of $Y$ beginning at $p$ is a characteristic curve of $\omega$.
Proof. If $Y(p) = 0$, this is true (since a point curve must be considered as a characteristic curve of $\omega$). Suppose, then, that $Y(p) \neq 0$. Since the theorem is a local one, we may suppose coordinates $(x_1, \ldots, x_r)$ have been chosen for $P$ so that
$$Y = \frac{\partial}{\partial x_1}, \qquad x_i(p) = 0.$$
Thus, if
$$\omega = \sum_{1 \le i,j \le r} a_{ij}\, dx_i \wedge dx_j,$$
$Y(\omega) = 0$ forces
$$\frac{\partial a_{ij}}{\partial x_1} = 0.$$
Hence, $df(p) = Y(p) \lrcorner\, \omega$ is zero if and only if $a^*_{1j}(0, \ldots, 0) = 0$, where $a^*_{ij}(\ , \ldots, \ )$ is the function on $R^r$ such that $a^*_{ij}(x_1(p), \ldots, x_r(p)) = a_{ij}(p)$. But if $\sigma'(t) = Y(\sigma(t))$, $\sigma(0) = p$, then
$$x_i(\sigma(t)) = \begin{cases} 0 & \text{if } i > 1; \\ t & \text{if } i = 1, \end{cases}$$
$$\sigma'(t) \lrcorner\, \omega = \sum_{1 \le i,j \le r} a_{ij}(\sigma(t))\, dx_i(\sigma'(t))\, dx_j = \sum_{2 \le j \le r} a^*_{1j}(t, 0, \ldots, 0)\, dx_j.$$
But $\partial a_{1j}/\partial x_1 = 0$ forces $a^*_{1j}(t, 0, \ldots, 0) = a^*_{1j}(0, \ldots, 0)$; hence $\sigma'(t) \lrcorner\, \omega = 0$ for all $t$ if and only if $df(p) = Y(p) \lrcorner\, \omega = 0$. Q.E.D.

Return to the Calculus of Variations

$M$ is a manifold of dimension $n$; $L$ is a Lagrangian on $M$, that is, a real-valued function on $T(M)$; and $P = T(M) \times R$. (For simplicity, we consider only time-independent Lagrangians on $M$.) We may as well work with coordinates. Suppose $x_i$ ($1 \le i, j, \ldots \le n = \dim M$; summation convention) is a coordinate system for $M$. Then $(x_i, \dot x_i, t)$ forms a coordinate system for $P$, with $\dot x_i(v) = dx_i(v)$, $t$ the coordinate on $R$. The $x_i$ are just the original $x_i$ on $M$ pulled up to $T(M)$ with no change in notation. The Cartan 1-form associated with $L$ is then
$$\theta(L) = L_{n+i}\, dx_i - (L_{n+i}\dot x_i - L)\, dt;$$
$\omega$ is then $d\theta(L)$. A first source of symmetry vector fields of $\omega$ is obtained by taking a vector field on $M$, whose associated one-parameter group preserves the extremals of $L$, and prolonging to $P$:
$$X = A_i(x)\,\frac{\partial}{\partial x_i} \;\to\; X^T = A_i\,\frac{\partial}{\partial x_i} + A_{i,j}\,\dot x_j\,\frac{\partial}{\partial \dot x_i}.$$
We know that
$$X^T(\theta(L)) = \theta(X(L)).$$
We have seen earlier that $\exp(tX)$ permutes the extremals of $L$ if and only if $X(L) = 0$. Thus, if $X$ is a vector field on $M$ that generates "symmetries" of $L$, we are in position to apply (16.2), and Theorems 16.3 and 16.4. If
$$f_X = \theta(L)(X^T) = L_{n+i}A_i,$$
then $f_X$ is constant along the characteristic curves of $d\theta(L)$; that is, by Theorem 16.1,
$$f_X\!\left(x(t), \frac{dx}{dt}\right) = \text{constant}$$
for any curve $t \to x(t)$ in $M$ that solves the Euler equations. We call $f_X$ the function on $P$ that is conjugate to the vector field $X$ on $M$. Now we can discuss where Theorem 16.4 is applicable. We must examine $df_X = d(L_{n+i}A_i)$ for critical points. But
$$d(L_{n+i}A_i) = L_{n+i,n+j}A_i\, d\dot x_j + (L_{n+i,j}A_i + L_{n+i}A_{i,j})\, dx_j.$$
This is equal to zero at a point and forces
$$L_{n+i,n+j}A_i = 0, \tag{16.7a}$$
$$L_{n+i,j}A_i + L_{n+i}A_{i,j} = 0. \tag{16.7b}$$
These equations admit an immediate geometric interpretation in case $L$ is a homogeneous regular Lagrangian, since then $\operatorname{rank}(L_{n+i,n+j}) = n - 1$, and $(A_j)$ must be a multiple of $(\dot x_j)$, say, $\lambda \dot x_j$. The second equation becomes
$$0 = L_{n+i,j}(x, \dot x)\,\lambda \dot x_i + L_{n+i}(x, \dot x)\,\frac{\partial A_i}{\partial x_j}.$$
The Euler homogeneous function relations give
$$0 = \lambda L_j(x, \dot x) + L_{n+i}(x, \dot x)\,\frac{\partial A_i}{\partial x_j} = \frac{\partial}{\partial x_j}\, L(x, A(x)).$$
Returning to coordinate-free notations, these conditions mean that the point is a critical point of the function $q \to L(X(q))$ on $M$. There is a similarly simple geometric answer to the question in case $L$ is a nonhomogeneous, regular Lagrangian. In this case, however, (16.7) gives essentially a trivial answer. For regularity means that
$$\det(L_{n+i,n+j}) \neq 0;$$
hence (16.7a) forces $A_j = 0$, that is, the point is a zero point of $X$. To get a more interesting answer, we must replace $X$ by the vector field $X + \partial/\partial t$; that is, we must just subtract from $f_X$ the Hamiltonian function. The critical point conditions for this function become
$$L_{n+i,n+j}A_i = L_{n+i,n+j}\dot x_i, \tag{16.8a}$$
$$\frac{\partial}{\partial x_j}\,(L_{n+i}A_i - L_{n+i}\dot x_i + L) = 0. \tag{16.8b}$$
But (16.8a) and regularity force $A_i = \dot x_i$; hence (16.8b) condenses to
$$\frac{\partial}{\partial x_j}\, L(x, A(x)) = 0.$$
Hence, geometrically, we get essentially the same answer as in the homogeneous case. We can sum these computations up in the following coordinate-free form.
THEOREM 16.7
Let $L$ be a regular Lagrangian on a manifold $M$, and let $X$ be a vector field on $M$ that generates a one-parameter group on $M$ permuting the extremals of $L$; that is, $X$ generates symmetries of $L$. Then the integral curve of $X$ beginning at $q_0$ is an extremal of $L$ if and only if $q_0$ is a critical point of the function $q \to L(X(q))$ on $M$.
A famous example of this theorem is provided by the particular solution given by Lagrange of the Newtonian 3-body problem, where the three bodies rotate uniformly at the vertices of an equilateral triangle.
Newtonian mechanics. A glance at a book on classical mechanics will convince the reader that most Newtonian problems correspond to Lagrangians of the form
$$L(x, \dot x) = \tfrac{1}{2} g_{ij}(x)\dot x_i \dot x_j + a_i(x)\dot x_i - V(x). \tag{16.9}$$
(As above, $x_1, \ldots, x_n$ continue to denote coordinates for $M$, usually the "configuration space.") In fact the $g_{ij}(x)$ are determined by the mass-distributions, the $(a_i(x))$ are components of the vector potentials, the $V(x)$ the scalar potential, of any force fields that may be present. (For example, problems of motion of charged particles usually involve the $a_i(x)$ in a nontrivial way, while in gravitational or electrostatic problems, only $V(x)$ is present.) In fact this
possibility of separating the Lagrangian into the sum of terms involving, respectively, inertial masses, scalar potentials, and vector potentials seems to be characteristic of Newtonian physics and accounts for its simplicity by comparison, say, to Einsteinian physics. Now let us compute the Hamiltonian function for the Lagrangian given by (16.9). First,
$$L_{n+i,n+j} = g_{ij}.$$
Thus the variational problem (which is nonhomogeneous) is regular if and only if $\det(g_{ij}) \neq 0$, a condition that we shall suppose is verified. Let us compute the Hamiltonian function. Put
$$y_i = L_{n+i} = g_{ij}\dot x_j + a_i,$$
$$\begin{aligned} H &= L_{n+i}\dot x_i - L \\ &= g_{ij}\dot x_i \dot x_j + a_i \dot x_i - L \\ &= \tfrac{1}{2} g_{ij}\dot x_i \dot x_j + V(x) \\ &= \tfrac{1}{2} G_{ij}(y_i - a_i)(y_j - a_j) + V(x), \end{aligned}$$
where $(G_{ij}(x))$ is the inverse matrix to $(g_{ij}(x))$. Again, this simplicity of the Hamiltonian (that is, total energy) is characteristic of Newtonian physics. A word as to why $H$ is to be regarded as the total energy might be of interest. Most naively, $\tfrac{1}{2} g_{ij}\dot x_i \dot x_j$ is to be regarded as the "kinetic energy" ($\tfrac{1}{2}$ mass $\times$ (velocity)$^2$ in terms of elementary physics), and $V(x)$ is the potential energy, so that their sum is to be regarded as the "total energy." Of course, in elementary physics, $V$ and hence $H$ is defined only up to an additive constant. Now the Lagrangian $L$ is time-independent; hence $\partial/\partial t$ is to be regarded as generating a symmetry group on $P$. The function
$$-\theta(L)\!\left(\frac{\partial}{\partial t}\right) = H$$
is then to be regarded as the function that is "conjugate" to the symmetry $\partial/\partial t$. Another more precise way of looking at this is to regard $t$ as another dependent variable, say, $t = x_{n+1}$, and replace the Lagrangian $L(x_1, \ldots, x_n, \dot x_1, \ldots, \dot x_n)$ by the equivalent homogeneous Lagrangian
$$L'(x_1, \ldots, x_{n+1}, \dot x_1, \ldots, \dot x_{n+1}) = L\!\left(x_1, \ldots, x_n, \frac{\dot x_1}{\dot x_{n+1}}, \ldots, \frac{\dot x_n}{\dot x_{n+1}}\right)\dot x_{n+1}.$$
Then
$$L'_{n+1+i} = L_{n+i} \quad \text{for } 1 \le i \le n, \qquad L'_{2n+2} = -\sum_{j=1}^{n} L_{n+j}\,\frac{\dot x_j}{\dot x_{n+1}} + L,$$
so that
$$\theta(L') = L_{n+i}\, dx_i - \left(L_{n+j}\,\frac{\dot x_j}{\dot x_{n+1}} - L\right) dx_{n+1},$$
which is just $\theta(L)$ when we identify $x_{n+1}$ with $t$, and $\dot x_{n+1}$ with $dt/dt$, that is, with 1. Considered as a group on the configuration space of variables $(x_1, \ldots, x_n, t)$, the group generated by $\partial/\partial t$ is genuinely a group of symmetries for the Lagrangian $L'$, and $H$ is genuinely its conjugate function on phase space. This identification of the total energy with the function that is conjugate to time translation is most useful in relativistic and quantum mechanics, where the "elementary," operational ways of defining energy are not available in obvious form. Of course this remark can be turned around and can be used to convince one that $\tfrac{1}{2} g_{ij}\dot x_i \dot x_j$ and $V(x)$ should be regarded as, respectively, the kinetic and potential energy. This indicates that the case where $V = 0 = a_i$ is to be regarded as the "free particle" case. Let $X$ be a vector field on $M$ that generates a group of symmetries of the "free particle" Lagrangian $L = \tfrac{1}{2} g_{ij}\dot x_i \dot x_j$. Thus, if $X = A_i(x)(\partial/\partial x_i)$, the conjugate function $g_{ij}A_i(x)\dot x_j$ can be regarded as the "momentum" function defined by $X$. For example, suppose that
$$g_{ij} = \delta_{ij}\, m, \qquad n = 3;$$
that is, $L$ is the free Lagrangian appropriate to a particle of mass $m$ moving in 3-space, $R^3$. It is easily seen that the generators of symmetries are of the form
$$X = (\alpha_i + \beta_{ij}x_j)\,\frac{\partial}{\partial x_i},$$
where $\alpha_i$, $\beta_{ij}$ are constants, and $(\beta_{ij})$ is a skew-symmetric matrix. Let us take, say, translation in the $x_1$-direction:
$$X = \frac{\partial}{\partial x_1}.$$
The conjugate momentum function is then $m\dot x_1$, that is, just the classical linear momentum in the corresponding direction.
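A short numerical sketch may make the role of the skew-symmetry condition concrete. It evaluates the conjugate function $m A_i(x)\dot x_i$, with $A_i = \alpha_i + \beta_{ij}x_j$, along a free (straight-line) motion. The mass, the initial data, and the helper names `conjugate_momentum` and `free_motion` are illustrative assumptions, not notation from the text.

```python
def conjugate_momentum(mass, alpha, beta, x, v):
    """Value of m * A_i(x) * v_i, where A_i = alpha_i + beta_ij * x_j."""
    n = len(x)
    A = [alpha[i] + sum(beta[i][j] * x[j] for j in range(n)) for i in range(n)]
    return mass * sum(A[i] * v[i] for i in range(n))

def free_motion(x0, v, t):
    """Extremal of the free Lagrangian: uniform motion x(t) = x0 + v*t."""
    return [x0[i] + v[i] * t for i in range(len(x0))]

mass = 2.0
x0 = [1.0, 2.0, 3.0]
v = [0.5, -1.0, 2.0]
zero = [[0.0] * 3 for _ in range(3)]

translation = ([1.0, 0.0, 0.0], zero)                       # X = d/dx1
rotation = ([0.0] * 3, [[0.0, 1.0, 0.0],                    # X = x2 d/dx1 - x1 d/dx2
                        [-1.0, 0.0, 0.0],
                        [0.0, 0.0, 0.0]])
dilation = ([0.0] * 3, [[1.0, 0.0, 0.0],                    # beta symmetric, not skew:
                        [0.0, 1.0, 0.0],                    # NOT a symmetry generator
                        [0.0, 0.0, 1.0]])

for name, (alpha, beta) in [("translation", translation),
                            ("rotation", rotation),
                            ("dilation", dilation)]:
    values = [conjugate_momentum(mass, alpha, beta, free_motion(x0, v, t), v)
              for t in (0.0, 0.5, 1.0, 2.5)]
    print(name, values)
```

The first two lists are constant in $t$, as the theory predicts for symmetry generators; the third, built from a symmetric $\beta$, is not.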
Suppose now that
$$X = x_2\,\frac{\partial}{\partial x_1} - x_1\,\frac{\partial}{\partial x_2}.$$
Thus $X$ generates the group of rotations about the $x_3$-axis. The conjugate momentum function is $m(x_2\dot x_1 - x_1\dot x_2)$, which is just the classical angular momentum. These facts explain from our higher point of view the role that these momentum functions play in elementary physics. Further, giving these functions such a group-theoretic interpretation enables one to define the corresponding functions in the generalizations of classical mechanics; for example, Einsteinian and quantum mechanics.

The Equivalence of Homogeneous and Nonhomogeneous Lagrangians. The Principle of Maupertuis

Let $(x_1, \ldots, x_n)$ be a coordinate system for the manifold $M$, and let $L(x, \dot x)$ be a time-independent homogeneous, regular Lagrangian for $M$. We now ask: How can the differential equations for the extremals of $L$ be written in Hamiltonian form? One way is to give up the symmetry between all variables, and choose one, say, $x_n$, to parametrize the curves of $M$. The effect of this is to construct the Lagrangian
$$L'(x_1, \ldots, x_{n-1}, t, \dot x_1, \ldots, \dot x_{n-1}) = L(x_1, \ldots, x_{n-1}, t, \dot x_1, \ldots, \dot x_{n-1}, 1),$$
obviously leading both to rather awkward formulas and to difficulties from the global point of view. There is an alternate procedure that keeps the symmetry between the dependent variables; hence it is preferable both for esthetic reasons and because it carries over to manifolds. The extremals of $L$ are always the curves of $M$ whose tangent vector curve in $(x, \dot x)$-space is a characteristic curve for $d\theta(L)$, where
$$\theta(L) = L_{n+i}\, dx_i.$$
Now the obstacle in the way of writing the equations of the characteristic curves in Hamiltonian form is that the functions $L_{n+1}, \ldots, L_{2n}$ are not functionally independent. Indeed, since
$$\operatorname{rank}(L_{n+i,n+j}) = n - 1$$
(by definition of regularity), we know by the implicit function theorem that there is a function $H(x_1, \ldots, x_n, y_1, \ldots, y_n)$ of $2n$ variables such that
$$H(x_1, \ldots, x_n, L_{n+1}(x, \dot x), \ldots, L_{2n}(x, \dot x)) = 0$$
identically.
Define a mapping $\phi$ of $(x, \dot x, t)$-space to $(x, y, t)$-space by
$$\phi^*(x_i) = x_i, \qquad \phi^*(y_i) = L_{n+i}(x, \dot x), \qquad \phi^*(t) = t.$$
Then, since by construction of $H$, $\phi^*(H) = 0$, we have
$$\phi^*(y_i\, dx_i - H\, dt) = \theta(L).$$
Thus $\phi$ carries a characteristic curve of $d\theta(L)$ into a characteristic curve of $d(y_i\, dx_i - H\, dt)$ restricted to the submanifold $H = 0$. We know from Chapter 13 that the characteristic curves of $d(y_i\, dx_i - H\, dt)$ are, after parametrization by $t$, just the solutions of the Hamilton equations with Hamiltonian $H(x, y)$. Hence the differential equations for an extremal curve $x(t)$ of $L$ can be written in Hamiltonian form:
$$\frac{dx_i}{dt} = \frac{\partial H}{\partial y_i}, \qquad \frac{dy_i}{dt} = -\frac{\partial H}{\partial x_i}, \tag{16.10}$$
subject to the subsidiary condition
$$H(x(t), y(t)) = 0. \tag{16.11}$$
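The functional dependence of the momenta of a homogeneous Lagrangian can be seen in a small numerical sketch. We assume, purely for illustration, the Lagrangian $L = \sqrt{g_{ij}\dot x_i \dot x_j}$ on the plane with the conformally flat metric $g_{ij} = (1 + x_1^2 + x_2^2)\delta_{ij}$; for this choice one admissible function is $H(x, y) = G_{ij}y_i y_j - 1$. Neither the metric nor the function names below come from the text.

```python
import math

def metric_factor(x):
    """Assumed conformal factor: g_ij(x) = s(x) * delta_ij."""
    return 1.0 + x[0] ** 2 + x[1] ** 2

def lagrangian(x, v):
    """Homogeneous (degree one in v) Lagrangian L = sqrt(g_ij v_i v_j)."""
    return math.sqrt(metric_factor(x) * (v[0] ** 2 + v[1] ** 2))

def momenta(x, v):
    """y_i = dL/dv_i = g_ij v_j / L, computed analytically."""
    s = metric_factor(x)
    L = lagrangian(x, v)
    return [s * v[0] / L, s * v[1] / L]

def H(x, y):
    """H(x, y) = G_ij y_i y_j - 1, with G the inverse of g."""
    s = metric_factor(x)
    return (y[0] ** 2 + y[1] ** 2) / s - 1.0

x = [0.3, -0.7]
for v in ([1.0, 0.0], [0.2, -1.5], [-3.0, 4.0]):
    y = momenta(x, v)
    y_scaled = momenta(x, [3.0 * v[0], 3.0 * v[1]])
    # H vanishes on the image of (x, v) -> (x, y); y is unchanged by scaling v.
    print(H(x, y), y, y_scaled)
```

The momenta depend only on the direction of the velocity, which is why they satisfy one relation $H = 0$ and why the parametrization of an extremal is not determined by $L$ alone.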
Now suppose that $L'(x, \dot x)$ is another Lagrangian on $M$, in fact nonhomogeneous and regular. Suppose in addition that this same function $H$ is the Hamiltonian for $L'$; that is,
$$H\!\left(x, \frac{\partial L'}{\partial \dot x}\right) = L'_{n+i}\dot x_i - L'.$$
We also know that (16.10), without the subsidiary condition, describes the differential equations for the extremals of $L'$. We can immediately draw several conclusions.
(a) Every extremal curve of $L$ can be parametrized so that it is an extremal of $L'$. This parametrization $t \to x(t)$ is determined by the condition
$$L'_{n+i}\!\left(x(t), \frac{dx}{dt}\right) = L_{n+i}\!\left(x(t), \frac{dx}{dt}\right).$$
(b) Every extremal curve $x(t)$ of $L'$ that satisfies
$$H\!\left(x(t), L'_{n+i}\!\left(x(t), \frac{dx}{dt}\right)\right) = 0$$
is also an extremal curve for $L$.
We now want to turn these remarks around and suppose that $L'$ is given. We know that $H(x, y)$ can be constructed according to the usual rules and that every extremal curve $t \to x(t)$ of $L'$ satisfies (16.10), with
$$y_i(t) = L'_{n+i}\!\left(x(t), \frac{dx}{dt}\right)$$
and
$$H(x(t), y(t)) = \text{constant}.$$
Suppose now that we can find a one-parameter family $e \to L^e$ of homogeneous, regular Lagrangians on $x$-space such that
$$H\!\left(x, \frac{\partial L^e}{\partial \dot x}\right)(x, \dot x) = e$$
identically on $(x, \dot x)$. We conclude:
(a') For each choice of $e$, the extremals of $L^e$ can be parametrized so as to be extremals of $L'$.
(b') If $t \to x(t)$ is an extremal of $L'$, with
$$H(x(t), y(t)) = e,$$
then it is also an extremal of $L^e$.
This reduction of the problem of finding the extremals of a nonhomogeneous Lagrangian to the problem of finding the extremals of a one-parameter family of homogeneous Lagrangians is sometimes known as the isoenergetic reduction. (Notice that if $L'$ is the Lagrangian of a Newtonian-mechanical problem, $H$ is just the energy.) This circle of ideas forms what may be called the Principle of Maupertuis. Let us descend from these generalities to find examples of Lagrangians that actually occur in Newtonian physics. Let us consider a Lagrangian of the form
$$L' = \sqrt{V(x)\, g_{ij}(x)\dot x_i \dot x_j} + a_i \dot x_i.$$
The condition for regularity is seen to be
$$\det(g_{ij}) \neq 0, \qquad V \neq 0,$$
and we shall suppose without further comment that these conditions are fulfilled.
$$L'_{n+i} = \frac{\sqrt{V}\, g_{ij}\dot x_j}{\sqrt{g_{kl}\dot x_k \dot x_l}} + a_i.$$
Let $H(x, y) = G_{ij}(x)(y_i - a_i)(y_j - a_j) - V(x)$, where $(G_{ij})$ is the inverse matrix to $(g_{ij})$. Then
$$H(x, L'_{n+1}(x, \dot x), \ldots, L'_{2n}(x, \dot x)) = 0.$$
As we have seen above, this function $H(x, y)$ is the Hamiltonian for the following nonhomogeneous Lagrangian:
$$L = \tfrac{1}{4} g_{ij}\dot x_i \dot x_j + V(x) + a_i \dot x_i.$$
Thus an extremal of $L'$, say, $t \to x(t)$, that satisfies
$$0 = L_{n+i}\,\frac{dx_i}{dt} - L$$
or
$$\tfrac{1}{4} g_{ij}\,\frac{dx_i}{dt}\frac{dx_j}{dt} - V(x(t)) = 0$$
is an extremal of $L$. For example, if $a_i = 0$, $V = \tfrac{1}{2}$, this means that the extremals of $L'$ that satisfy
$$L'\!\left(x, \frac{dx}{dt}\right) = 1$$
are extremals of $L'^2$. Now, in this case $L'$ defines a Riemannian metric on $M$ (see Part 3). The condition $L'(x, (dx/dt)) = 1$ means that the curve $t \to x(t)$ is parametrized by arc length. (This equivalence between the variational problems defined by $L'$ and $L'^2$ is used as a simplifying technique by Milnor [1]. Notice, however, that it is quite special and is linked to the quadratic nature of the Lagrangian.) Conversely, suppose that we start off with $L$ given. The time independence of $L$ implies that, for each extremal curve $t \to x(t)$:
$$\tfrac{1}{4} g_{ij}\,\frac{dx_i}{dt}\frac{dx_j}{dt} - V = \text{constant}.$$
This constant can be absorbed in $V$. The result is that the integral curves of $L$ for which this constant has the value $e$ are integral curves of the homogeneous Lagrangian
$$L^e = \sqrt{(V + e)\, g_{ij}\dot x_i \dot x_j} + a_i \dot x_i.$$
The most interesting case is where $a_i = 0$. Then this "isoenergetic reduction process" tells us that all integral curves with a given value of "energy" are geodesics (that is, extremals) of a certain Riemannian metric on configuration space. As we shall see in Part 3, the theory of geodesics on a Riemannian manifold has a much richer geometric background than a random variational problem.
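One way to see why the reduction works is to compare momenta on a fixed energy surface. The sketch below assumes a nonhomogeneous Lagrangian of the form $L = \tfrac14 g_{ij}\dot x_i\dot x_j + V + a_i\dot x_i$ and the homogeneous family $L^e = \sqrt{(V+e)g_{ij}\dot x_i\dot x_j} + a_i\dot x_i$, with an arbitrarily chosen conformal metric, potential, and vector potential (all hypothetical). It checks that $\partial L/\partial\dot x_i$ and $\partial L^e/\partial\dot x_i$ agree at points whose energy $\tfrac14 g_{ij}\dot x_i\dot x_j - V$ equals $e$, and disagree otherwise.

```python
import math

def g_factor(x):
    """Assumed metric g_ij(x) = s(x) * delta_ij on the plane."""
    return 1.0 + x[0] ** 2 + x[1] ** 2

def V(x):
    """Assumed scalar term (illustrative choice)."""
    return 2.0 + x[0] ** 2

def a(x):
    """Assumed vector-potential components (illustrative choice)."""
    return [0.3 * x[1], -0.1 * x[0]]

def quad(x, v):
    """g_ij v_i v_j."""
    return g_factor(x) * (v[0] ** 2 + v[1] ** 2)

def newtonian_momenta(x, v):
    """y_i = dL/dv_i for L = (1/4) g_ij v_i v_j + V + a_i v_i."""
    s, ai = g_factor(x), a(x)
    return [0.5 * s * v[i] + ai[i] for i in range(2)]

def energy(x, v):
    """L_{n+i} v_i - L = (1/4) g_ij v_i v_j - V."""
    return 0.25 * quad(x, v) - V(x)

def reduced_momenta(x, v, e):
    """y_i = dL^e/dv_i for L^e = sqrt((V + e) g_ij v_i v_j) + a_i v_i."""
    s, ai = g_factor(x), a(x)
    scale = math.sqrt(V(x) + e) / math.sqrt(quad(x, v))
    return [scale * s * v[i] + ai[i] for i in range(2)]

x = [0.4, -1.1]
v = [1.3, 0.6]
e = energy(x, v)
print(newtonian_momenta(x, v))        # momenta of L
print(reduced_momenta(x, v, e))       # momenta of L^e on the surface energy = e
print(reduced_momenta(x, v, e + 1.0)) # off that surface the momenta differ
```

Since both Lagrangians then determine the same curve in $(x, y)$-space, this agreement is the computational core of statements (a') and (b') above, in this assumed example.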
The Transition from Newtonian to Einsteinian† Mechanics via the Calculus of Variations

The aim of this section is to show how some standard facts of special relativity theory can be derived (following rather closely Levi-Civita's ideas [1]) from the formalism of the calculus of variations. To give the essentials of the method, it suffices to suppose that $M$ is one-dimensional, so that the Newtonian picture is of a particle of mass $m$ moving on a line with coordinate $x$ and potential energy $V(x)$. The Newtonian Lagrangian is, of course, just
$$L = \tfrac{1}{2} m\dot x^2 - V(x).$$
This Lagrangian is of the nonhomogeneous, regular type, so that its extremals come with their own parametrization. This parameter, of course, is identified with the physical time. Now $L$ defines a whole class of Lagrangians, as $V(x)$ runs over the class of suitable functions. Let $\phi$ be a transformation of $(x, t)$-space into itself which "preserves" the class in the following sense: Given $L$ and $\phi$, there is a Lagrangian $L'$ of the same class, that is,
$$L' = \tfrac{1}{2} m\dot x^2 - V'(x),$$
so that if $t \to (x(t), t)$ is an extremal of $L$, $t \to \phi(x(t), t)$ is an extremal of $L'$. Now it is easily seen that this requires that $\phi^*(dt) = dt$. That is, $\phi^*(t) = t + \text{constant}$: $(\phi_*)^*(L') = L$. To determine the explicit possibilities for $\phi$, suppose $\phi^*(x) = f(x, t)$. Then
$$\tfrac{1}{2} m\left(\frac{\partial f}{\partial x}\dot x + \frac{\partial f}{\partial t}\right)^2 - V'(f(x, t)) = \tfrac{1}{2} m\dot x^2 - V(x),$$
or
$$\frac{\partial f}{\partial x} = \pm 1 \qquad \text{or} \qquad f = \pm x + g(t).$$
† We believe it would be more accurate from the geometric point of view to replace the standard name "relativistic mechanics" by "Einsteinian mechanics." For example, the beginner often is led to believe (in the numerous bad expositions of the theory) that classical Newtonian mechanics is not "relativistic." However, it is covariant under a perfectly good group, namely, the Galilean group. In fact, the main effect of the Einsteinian revision is to replace covariance under this group by covariance under another group, the Lorentz group. (Of course Einstein himself in his own popularizations is always very good on this point.)
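forces $g(t) = \text{constant}$. Finally, then, we see that $\phi$ must be of the form
$$\phi(x, t) = (\pm x + \alpha,\ t + \beta).$$
Hence the symmetry group is the group of "rigid motions" of the real line. We also see that the symmetry group permuting this class of Lagrangians is the symmetry group of the free Lagrangian. Now let us look at the symmetry group for the extremals of the free Lagrangian. We are looking for maps $\phi$ of $(x, t)$-space into itself which permute the extremals of
$$L = \tfrac{1}{2}\dot x^2.$$
$\phi$ is of the form
$$\phi(x, t) = (f(x, t),\ t + \beta).$$
The extended transformation of $\phi$ is
$$\phi_*: (x, \dot x, t) \to \left(f(x, t),\ \frac{\partial f}{\partial x}\dot x + \frac{\partial f}{\partial t},\ t + \beta\right).$$
Now $\theta(L) = \dot x\, dx - \tfrac{1}{2}\dot x^2\, dt$; hence
$$(\phi_*)^*(\theta(L)) = \left(\frac{\partial f}{\partial x}\dot x + \frac{\partial f}{\partial t}\right)\left(\frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial t}\, dt\right) - \frac{1}{2}\left(\frac{\partial f}{\partial x}\dot x + \frac{\partial f}{\partial t}\right)^2 dt.$$
Setting the coefficient of $d\dot x \wedge dt$ in $(\phi_*)^*(d\theta(L)) - d\theta(L)$ equal to zero gives
$$\left(\frac{\partial f}{\partial x}\right)^2 \dot x = \dot x.$$
This implies, since $(x, \dot x, t)$ are independent variables,
$$\frac{\partial f}{\partial x} = \pm 1, \qquad f = \pm x + g(t).$$
Hence,
$$(\phi_*)^*(\theta(L)) = \left(\pm\dot x + \frac{dg}{dt}\right)(\pm dx + dg) - \frac{1}{2}\left(\pm\dot x + \frac{dg}{dt}\right)^2 dt.$$
Setting the coefficient of $dx \wedge dt$ in $d(\phi_*)^*(\theta(L)) - d\theta(L)$ equal to zero gives
$$\frac{d^2 g}{dt^2} = 0 \qquad \text{or} \qquad g(t) = \gamma t + \alpha.$$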
Finally, then, $\phi$ is of the form
$$(x, t) \to (\pm x + \gamma t + \alpha,\ t + \beta).$$
This is a Galilean transformation. Its defining property can be put more physically in the following way:
If $s \to (x(s), t(s))$ is a curve in space-time, let $(dx/ds)/(dt/ds)$ be the velocity of the curve. The Galilean transformations permute the curves of constant velocity. The coefficient $\gamma$ is the increment given to the velocity. The new coordinates for space-time introduced by a Galilean transformation then represent physically a coordinate system moving at constant velocity with respect to the old. The Einstein modification of this scheme is an attempt to allow transformations of space-time that permit a more thorough "mixing" of the space and time variables, that is, that dethrone time from the absolute position it holds in Newtonian physics. Now an arbitrary transformation on $(x, t)$-space would send a Lagrangian of the form
$$L = \tfrac{1}{2} m\dot x^2 - V(x)$$
into a Lagrangian on $(x, t)$-space in the form $L'(x, t, \dot x, \dot t)$, but clearly it would be of a rather unrecognizable type. Following Levi-Civita [1, p. 292], we can modify the Lagrangian $L$ to get a Lagrangian $L^*$ that does have a more reasonable transformation law under space-time transformations. Consider a given extremal of $L$, and let $c$ be a positive real number that is very large in comparison with the velocity of the given extremal. Let us first modify $L$ to
$$L - mc^2 = \tfrac{1}{2} m\dot x^2 - V(x) - mc^2.$$
This is harmless, since it does not affect the extremals of $L$. In a neighborhood of the tangent vector field $t \to (t, x(t), (dx/dt))$ of the given extremal,
$$1 - \frac{\dot x^2}{2c^2} + \frac{V(x)}{mc^2} \quad \text{is close to} \quad \sqrt{1 - \frac{\dot x^2}{c^2} + \frac{2V(x)}{mc^2}}.$$
Thus, if we replace $L$ by
$$L^* = -mc^2\sqrt{1 - \frac{\dot x^2}{c^2} + \frac{2V(x)}{mc^2}},$$
we can be confident that the extremal of $L^*$ having the same end points as $t \to x(t)$ will be "close" to $t \to x(t)$.† What is the significance of the Lagrangian $L^*$? Notice that $L^*$ is in fact closely related to the homogeneous Lagrangian
$$L^{**} = \sqrt{\left(c^2 + \frac{2V(x)}{m}\right)\dot t^2 - \dot x^2}.$$
In fact the extremals of $L^{**}$, which are a priori of the form
$$s \to (x(s), t(s)),$$
are, when reparametrized by $t$, just the extremals of $L^*$. Now $L^{**}$ is a Lagrangian whose extremals are geodesics (extremals) of a pseudo-Riemannian metric (of Lorentz type) on space-time. Such Lagrangians have a very simple transformation law under changes of variable in space-time. Thus, at the expense of introducing this new sort of "Lorentzian" or (which is more just historically) Minkowskian geometry for space-time, we have "geometrized" the problem of allowing "mixing" transformations of space and time. Notice another fact that is of interest for the later extension to general relativity: The coefficients of the metric determined by $L^{**}$ depend, in case $V(x) \neq 0$, on the mass of the particle. Crudely, the massiveness of the particle actually affects the geometry. Let us return for a moment to $L^*$ and compare it to $L$. We know how to define the energy and momentum of the Newtonian system by applying the variational formalism to $L$; namely,
$$\text{energy} = -\theta(L)\!\left(\frac{\partial}{\partial t}\right), \qquad \text{momentum} = \theta(L)\!\left(\frac{\partial}{\partial x}\right).$$
This also makes sense with $L$ replaced by $L^*$. Let us compute and see what happens.
$$\frac{\partial L^*}{\partial \dot x} = \frac{mc\,\dot x}{\sqrt{c^2 - \dot x^2 + 2V(x)/m}},$$
$$\theta(L^*) = \frac{mc\,\dot x}{\sqrt{c^2 - \dot x^2 + 2V(x)/m}}\, dx - H\, dt,$$
† The analogy is with the following fact about extrema of functions in finite dimensional spaces: If $q_0$ is a critical point of the real-valued function $q \to F(q)$, and if the critical point is nondegenerate, then for small $\varepsilon$, the function $q \to F(q) + \varepsilon G(q)$ will have a critical point that is close to $q_0$. In this finite dimensional situation, this can be proved by use of the implicit function theorem, but the matter is more delicate for functions on infinite dimensional spaces such as those that occur in the calculus of variations.
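with
$$H = \dot x\,\frac{\partial L^*}{\partial \dot x} - L^*.$$
Thus the "energy" is
$$\frac{mc^2 + 2V(x)}{\sqrt{1 - (\dot x^2/c^2) + (2V(x)/mc^2)}}$$
and the momentum is
$$\frac{m\dot x}{\sqrt{1 - (\dot x^2/c^2) + (2V(x)/mc^2)}}.$$
Several famous conclusions follow from these calculations: First notice that if $V(x) = 0$, the energy differs by an additive constant from what it is in the Newtonian case; namely,
$$\text{energy} = mc^2.$$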
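These formulas can be checked numerically against their Newtonian counterparts. The sketch below assumes the modified Lagrangian in the form $L^* = -mc^2\sqrt{1 - \dot x^2/c^2 + 2V/(mc^2)}$, treats $V$ as a number (its value at the point considered), and uses arbitrary illustrative values of $m$, $c$, $\dot x$, and $V$; the function names are hypothetical.

```python
import math

def root(v, V, m, c):
    """sqrt(1 - v^2/c^2 + 2V/(m c^2)), the factor appearing throughout."""
    return math.sqrt(1.0 - v * v / (c * c) + 2.0 * V / (m * c * c))

def lagrangian_star(v, V, m, c):
    """Assumed modified Lagrangian L* = -m c^2 * root."""
    return -m * c * c * root(v, V, m, c)

def momentum_star(v, V, m, c):
    """dL*/dv = m v / root."""
    return m * v / root(v, V, m, c)

def energy_star(v, V, m, c):
    """H = v * dL*/dv - L*."""
    return v * momentum_star(v, V, m, c) - lagrangian_star(v, V, m, c)

m, c = 1.0, 1000.0      # c is large compared with the velocity below
v, V = 2.0, 3.0

newtonian_energy = 0.5 * m * v * v + V
print(energy_star(v, V, m, c) - m * c * c, newtonian_energy)
print(momentum_star(v, V, m, c), m * v)

# Effective "mass" momentum/velocity near the limiting velocity (V = 0).
print(momentum_star(0.999 * c, 0.0, m, c) / (0.999 * c))
```

For small $\dot x/c$ and $V/mc^2$ the first two lines print nearly equal pairs, while the last line shows the ratio of momentum to velocity growing well beyond $m$ as $\dot x$ approaches $c$.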
Further notice that if $t \to x(t)$ is an extremal, then $\dot x(t) = dx/dt$, the velocity of the particle in the usual sense. If (as we shall see in a moment) $c$ should be identified with the velocity of light in a vacuum, then as long as the velocity of the particle is small compared with this velocity of light and $V(x)$ is small compared with $mc^2$, there should be no substantial difference from the Newtonian energy (except for the additive constant); hence the motion should also not differ substantially from the Newtonian motion. (Of course this is built into our construction, but it is nice to see precisely how it is reflected in the equations of motion.) Finally note that if one wants to write
$$\text{momentum} = \text{mass} \times \text{velocity},$$
the mass of the particle must be identified not with its Newtonian $m$, but with
$$\frac{m}{\sqrt{1 - (\dot x^2/c^2) + (2V(x)/mc^2)}}.$$
Thus the mass of a moving particle may vary with time; in particular, if $V(x)$ is small compared with $mc^2$, and $(dx/dt)/c$ is close to 1, the mass blows up. Since the energy must be a constant of motion, we see that a particle moving by this Lagrangian never can (if $V(x)$ is small compared with $mc^2$) approach
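the velocity of light (from below). Finally, let us write the Euler-Lagrange equations of motion:
$$\frac{d}{dt}\frac{\partial L^*}{\partial \dot x} = \frac{\partial L^*}{\partial x},$$
or
$$\frac{d}{dt}\left(\frac{mc\,\dot x}{\sqrt{c^2 - \dot x^2 + 2V(x)/m}}\right) = -\frac{c\,(dV/dx)}{\sqrt{c^2 - \dot x^2 + 2V(x)/m}},$$
or
$$\frac{d}{dt}\left(\frac{m\dot x}{\sqrt{1 - (\dot x^2/c^2) + (2V(x)/mc^2)}}\right) = -\frac{dV/dx}{\sqrt{1 - (\dot x^2/c^2) + (2V(x)/mc^2)}}.$$
Notice that the Newtonian law of motion,
$$\text{mass} \times \text{acceleration} = \text{force},$$
makes no kind of sense. But, if the Newtonian law is rewritten in the form derivative of momentum = force, it does make sense in Einsteinian mechanics, with the force
$$\frac{-(dV/dx)}{\sqrt{1 - (\dot x^2/c^2) + (2V(x)/mc^2)}}.$$
Now $-(dV/dx)$ is the Newtonian force. We shall leave Einsteinian mechanics at this point, having described how the basic laws of elementary physics might be modified. Let us now return to the homogeneous Lagrangian $L^{**}$ in $(x, t)$-space whose extremals, when parametrized by $t$, are those of $L^*$. We shall consider only the force-free Lagrangian, so
$$L^{**} = \sqrt{c^2\dot t^2 - \dot x^2}.$$
Let us look for the symmetries of $L^{**}$, that is, the transformations $\phi$ of space $\times$ time into itself such that $(\phi_*)^*(L^{**}) = L^{**}$. If
$$\phi^*(x) = f(x, t), \qquad \phi^*(t) = g(x, t),$$
then
$$(\phi_*)^*(\dot x) = \frac{\partial f}{\partial x}\dot x + \frac{\partial f}{\partial t}\dot t, \qquad (\phi_*)^*(\dot t) = \frac{\partial g}{\partial x}\dot x + \frac{\partial g}{\partial t}\dot t;$$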
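hence,
$$c^2\left(\frac{\partial g}{\partial x}\dot x + \frac{\partial g}{\partial t}\dot t\right)^2 - \left(\frac{\partial f}{\partial x}\dot x + \frac{\partial f}{\partial t}\dot t\right)^2 = c^2\dot t^2 - \dot x^2.$$
Thus,
$$c^2\left(\frac{\partial g}{\partial x}\right)^2 - \left(\frac{\partial f}{\partial x}\right)^2 = -1, \qquad c^2\left(\frac{\partial g}{\partial t}\right)^2 - \left(\frac{\partial f}{\partial t}\right)^2 = c^2, \qquad c^2\,\frac{\partial g}{\partial x}\frac{\partial g}{\partial t} - \frac{\partial f}{\partial x}\frac{\partial f}{\partial t} = 0.$$
Differentiating the first relation with respect to $t$ and the second with respect to $x$,
$$c^2\,\frac{\partial g}{\partial x}\frac{\partial^2 g}{\partial x\,\partial t} - \frac{\partial f}{\partial x}\frac{\partial^2 f}{\partial x\,\partial t} = 0, \qquad c^2\,\frac{\partial g}{\partial t}\frac{\partial^2 g}{\partial x\,\partial t} - \frac{\partial f}{\partial t}\frac{\partial^2 f}{\partial x\,\partial t} = 0.$$
Thus, either
$$\frac{\partial^2 f}{\partial x\,\partial t} = 0 = \frac{\partial^2 g}{\partial x\,\partial t} \qquad \text{or} \qquad \frac{\partial f}{\partial x}\frac{\partial g}{\partial t} - \frac{\partial f}{\partial t}\frac{\partial g}{\partial x} = 0.$$
The second possibility is impossible: For example, the three relations above give
$$c^2\left(\frac{\partial f}{\partial x}\frac{\partial g}{\partial t} - \frac{\partial f}{\partial t}\frac{\partial g}{\partial x}\right)^2 = c^2,$$
so that the vanishing of this determinant leads to $c^2 = 0$. The first possibility leads to
$$f(x, t) = h_1(x) + k_1(t), \qquad g(x, t) = h_2(x) + k_2(t).$$
Substituting this back in, leads to
$$c^2\left(\frac{dh_2}{dx}\right)^2 - \left(\frac{dh_1}{dx}\right)^2 = -1, \qquad c^2\left(\frac{dk_2}{dt}\right)^2 - \left(\frac{dk_1}{dt}\right)^2 = c^2, \qquad c^2\,\frac{dh_2}{dx}\frac{dk_2}{dt} - \frac{dh_1}{dx}\frac{dk_1}{dt} = 0.$$
Using the first two gives an identity which forces
$$\frac{dk_1}{dt} = \text{constant},$$
or $k_1$ is a linear function of $t$.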
Similarly, working on the rest of the relations, we see that $f$ and $g$ are linear functions of $x$ and $t$, say,
$$\phi^*(x) = a_{11}x + a_{12}t + a, \qquad \phi^*(t) = a_{21}x + a_{22}t + b.$$
The relations the constants must satisfy are found by substituting back in:
$$c^2 a_{21}^2 - a_{11}^2 = -1, \qquad c^2 a_{22}^2 - a_{12}^2 = c^2, \qquad c^2 a_{21}a_{22} - a_{11}a_{12} = 0.$$
These conditions define the group of such affine linear transformations as the Lorentz group.† Several very important physical facts can be read off from the properties of the Lorentz transformations. First let us examine what happens as $c \to \infty$. Let us suppose that the matrix
$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$$
depends on $c$ and that each element goes to a finite limit as $c \to \infty$. Let
$$\begin{pmatrix} a'_{11} & a'_{12} \\ a'_{21} & a'_{22} \end{pmatrix}$$
be the limit matrix. It follows from the first and second relation that $a'_{21} = 0$, $a'_{22} = \pm 1$. Now it is readily seen that the determinant of every Lorentz transformation is $\pm 1$. This relation also holds in the limit:
$$1 = (a'_{11}a'_{22} - a'_{21}a'_{12})^2 = a'^{\,2}_{11}\,a'^{\,2}_{22} = a'^{\,2}_{11}.$$
Finally, then, we see that the "limit" of a Lorentz transformation as $c \to \infty$ is one of the form
$$x \to \pm x + ut + a \quad (\text{with } u = a'_{12}), \qquad t \to \pm t + b.$$
Notice that this is a Galilean transformation! Hence we may say that the "limit" as $c \to \infty$ of the Lorentz group, the symmetry group of Einsteinian physics, is the Galilean group, the symmetry group of Newtonian physics. Thus we have a more sophisticated group-theoretical way of describing the transition. A variant of the same sort of reasoning can be used to describe the transition from quantum to classical mechanics. Now, let $s \to (x(s), t(s))$ be a curve in space-time. We agreed earlier to call
$$\frac{dx/ds}{dt/ds} = v(s)$$
the velocity of the curve. We saw that the Galilean group was characterized by
† Of course "space" is usually three-dimensional, so that what is usually called the Lorentz group is the analogous group acting on $(x_1, x_2, x_3, t)$-space.
the property of giving a constant increment u to the velocity of all curves in space-time. Let us examine a Lorentz transformation from the same point of view. Now the transformed curve is
+
s -+ (allx(s> a I 2t ( s )
+ a, ~ ~ ~ x+( a22 s ) t(s) + b).
Hence the transformed velocity is a I l(dx/ns> + a 1 2 ( W s ) - a 114 s ) + a 1 2 a,,(dx/ds) + a22(dt/ds) - a214s) + a22’
Notice that if $v(s) = 0$, then the transformed curve has velocity $a_{12}/a_{22}$. This number, then, should be an interesting invariant of the Lorentz transformation; in fact, it would be physically just the velocity of the new coordinate system defined by the Lorentz transformation with respect to the old. Let $\beta = a_{12}/a_{22}$ be this invariant. Now $\beta$ seems completely to determine the transformation. (This version of the Lorentz group is just one-dimensional.) One finds explicitly that
$$a_{11} = \frac{\pm 1}{\sqrt{1 - (\beta^2/c^2)}}, \qquad a_{21} = \frac{\pm\beta}{c^2\sqrt{1 - (\beta^2/c^2)}}.$$
(The sign $\pm$ is determined by the sign of the determinant $a_{11}a_{22} - a_{12}a_{21}$.) Substituting this back into the expression for the transformed velocity, we see that it is
$$\frac{(a_{11}/a_{22})v(s) + \beta}{(a_{21}/a_{22})v(s) + 1} = \frac{\pm v(s) + \beta}{\pm(v(s)\beta/c^2) + 1}.$$
Thus we see explicitly how the transformation law for velocities in Newtonian physics must be modified. Notice also that the condition that the transformation be real is $\beta^2 < c^2$. Then
$$\frac{v + \beta}{v(\beta/c^2) + 1}$$
cannot be greater than or equal to $c$ if $v < c$ and $\beta < c$. For then there would be (by continuity) a $v < c$ such that
$$\frac{v + \beta}{v(\beta/c^2) + 1} = c,$$
and solving this equation for $v$ gives $v = c$, a contradiction. Hence it is impossible to take a velocity less than $c$ and transform it to be greater than $c$. In effect, $c$ is an upper bound for the velocities possible in our
world. Further, the same argument (when reversed) shows that the result of applying a Lorentz transformation to a motion of velocity $c$ is again a motion of velocity $c$. These motions (physically, they are the paths of light rays, as we shall see in a moment) then have a distinguished role, both as a limiting possibility for the motions of velocity less than $c$ and as possessing the property that their velocity is invariant under Lorentz transformation. (This latter property is just the mathematical statement of the result of the famous Michelson-Morley experiment, which initiated the “relativistic” revolution in physics.) Another way of labeling these motions serves to introduce us to the notions of general relativity. Change our previous notations slightly, and let $L$ be the following Lagrangian on $(x, t)$-space:
$$L = c^2\dot{t}^2 - \dot{x}^2.$$
In terms of the jargon to be introduced in Part 3, $L$ defines a pseudo-Riemannian metric of Lorentz type, or a Lorentz metric, for short, on $(x, t)$-space. The curves whose velocities are less than $c$ are then just those on whose tangent vectors $L$ has a positive value. Such curves are also said to be timelike. The curves on whose tangent vectors $L$ has a negative value are spacelike, while those for which $L$ has the value zero are lightlike. Since the Lorentz group preserves $L$, it is geometrically obvious that it permutes timelike, spacelike, or lightlike curves.
The route we have taken in developing the “Special Theory of Relativity” by generalizing Newtonian mechanics is not historically the way it was discovered, nor is it even the most important from the general physical point of view. Actually, the main clue in the minds of the discoverers of the theory (Einstein, Lorentz, and Poincaré, listed in alphabetical order) was that the Lorentz group, not the Galilean group, permuted the solutions of Maxwell’s equations of electromagnetism. In fact, in Whittaker’s judgment [2], Poincaré rather than Einstein deserves most of the credit, since he saw most clearly that it was just this property of the Lorentz transformation that was involved. (Of course, since Poincaré was one of the two or three greatest figures in mathematics, indeed in all of science, in the nineteenth century, this is not surprising. As in so many other things, Poincaré was far ahead of his time, for this group-invariance point of view has only recently been absorbed into the mainstream of physics.) While it would be too great a detour to describe Maxwell’s equations here, perhaps it is worthwhile to give a primitive, one-dimensional version of the argument.
First, we must describe what is meant by a wave. Restricting ourselves to one space dimension $x$, it may be described as a real-valued function $S(x, t)$. For a fixed $t$ and constant $a$, the points $S(x, t) = a$ may be called the wave fronts at time $t$. A curve $t \to x(t)$ is a ray if
$$S(x(t), t) = \text{constant for all } t.$$
Thus the ray is a curve following along the wave front. (In our one-dimensional situation, of course it is more or less uniquely determined.) The velocity $(dx/dt)(t)$ of the ray may be called the velocity of the wave at time $t$ and point $x(t)$. Now the curves describing ordinary light waves in a vacuum are those satisfying the wave equation:
$$\frac{\partial^2 S}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2 S}{\partial t^2} = 0.$$
Suppose $\phi$ is a Lorentz transformation of $(x, t)$-space into itself. We leave the verification to the reader, but it is seen that: If $S$ is a solution of the wave equation, so is $\phi^*(S)$. This is what is meant by the Lorentz transformation “permuting” the solutions of the wave equation or, more loosely, preserving the wave equation. Suppose now that $t \to x(t)$ is a ray associated with a given wave $S(x, t)$. Then
$$\frac{\partial S}{\partial x}(x(t), t)\frac{dx}{dt} + \frac{\partial S}{\partial t}(x(t), t) = 0.$$
First we assume without proof the uniqueness theorem for the wave equation: If $S(x, t)$ is a solution, and $S(x, 0) = 0 = (\partial S/\partial t)(x, 0)$ for all $x$, then $S$ is identically zero. (See Courant-Hilbert [1, p. 441] for a simple proof.) Thus, if $f(\ )$ and $g(\ )$ are functions of one variable such that
$$S(x, 0) = f(x) + g(x), \qquad \frac{\partial S}{\partial t}(x, 0) = c(f'(x) - g'(x)),$$
then
$$S(x, t) = f(x + ct) + g(x - ct).$$
(Clearly, such functions can be found.) Now this “general” solution represents a superposition of waves traveling to the left and right on the $x$-axis. Clearly, only waves traveling strictly in one direction will possess a genuine “wave velocity” and a system of rays. Suppose, for example, that $S(x, t) = f(x - ct)$. Then the ray condition becomes
$$f'(x - ct)\frac{dx}{dt} - cf'(x - ct) = 0,$$
or
$$\frac{dx}{dt} = c,$$
or
$$x = ct + a.$$
Thus we have completed our limited discussion of the connection between $c$ as the “wave velocity” of light waves and as the constant occurring as an upper bound for velocities (and in $E = mc^2$) in Einsteinian mechanics.
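The statements of this section about the velocity law can be checked numerically. The following is only a minimal sketch under stated assumptions: the function names are ours, the determinant is taken to be $+1$, and units with $c = 1$ are used in the checks purely for convenience. It evaluates the transformed velocity $(v + \beta)/(v\beta/c^2 + 1)$, the matrix with $a_{12}/a_{22} = \beta$, and the Lagrangian $L = c^2\dot{t}^2 - \dot{x}^2$ on a tangent vector.

```python
import math


def transform_velocity(v, beta, c):
    """Transformed velocity (v + beta) / (v*beta/c**2 + 1) for parameter beta."""
    return (v + beta) / (v * beta / c**2 + 1.0)


def lorentz_matrix(beta, c):
    """Matrix ((a11, a12), (a21, a22)) with a12/a22 = beta and determinant +1."""
    g = 1.0 / math.sqrt(1.0 - beta**2 / c**2)
    return ((g, g * beta), (g * beta / c**2, g))


def apply(matrix, dx, dt):
    """Apply the linear transformation to a tangent vector (dx, dt)."""
    (a11, a12), (a21, a22) = matrix
    return a11 * dx + a12 * dt, a21 * dx + a22 * dt


def lagrangian(dx, dt, c):
    """L = c^2 * tdot^2 - xdot^2 evaluated on the tangent vector (dx, dt)."""
    return c**2 * dt**2 - dx**2


if __name__ == "__main__":
    c = 1.0
    # A velocity below c stays below c; the velocity c itself is unchanged.
    print(transform_velocity(0.9, 0.9, c))
    print(transform_velocity(c, 0.5, c))
    # L takes the same value before and after the transformation.
    dx2, dt2 = apply(lorentz_matrix(0.6, c), 0.3, 1.0)
    print(lagrangian(0.3, 1.0, c), lagrangian(dx2, dt2, c))
```

For very large $c$ the same function returns values close to $v + \beta$, which is the Galilean law recovered in the limit.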
Special Relativity and Lie Group Theory
In the preceding section we developed the elements of what is usually known as the Theory of Special Relativity from the point of view of generalizing Newtonian mechanics so as to replace the symmetry group of classical mechanics (the Galilean group) by another group (the Lorentz group), which permits some mixing between space and time. It is appropriate to call this and its later extensions to general relativity, which generalize the Newtonian theory even further, “Einsteinian mechanics.” In this way of developing the theory, Lie group theory plays only a subsidiary role, and mechanics (or the calculus of variations) is basic. We shall now present an alternate approach based more directly on Lie group theoretical considerations. We shall still consider a space-time which, as used throughout the Theory of Special Relativity, is a manifold covered with a single coordinate system, diffeomorphic to Euclidean space. For the moment, we shall continue to suppose that space is only one-dimensional and that $(x, t)$ are the coordinates on this space-time manifold. As in the preceding section, the velocity of a curve $s \to (x(s), t(s))$ in space-time is, as a function of its parameter,
$$s \to \frac{dx/ds}{dt/ds} = v(s).$$
We saw that both the Galilean and Lorentz groups had the property that they permuted the curves with constant velocity. We shall now show that these are essentially the only possibilities if we want this and a reasonable physical property to hold. Suppose $\phi$ is a diffeomorphism of space-time, with
$$\phi^*(x) = f(x, t), \qquad \phi^*(t) = g(x, t).$$
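Then, if $s \to \phi(x(s), t(s))$ is the transformed curve, its coordinates are the values of $f$ and $g$ along the original curve. Thus we see that the transformed curve is
$$s \to (f(x(s), t(s)),\; g(x(s), t(s))).$$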
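The velocity of the transformed curve is then
$$\frac{(d/ds)f(x(s), t(s))}{(d/ds)g(x(s), t(s))} = \frac{(\partial f/\partial x)(dx/ds) + (\partial f/\partial t)(dt/ds)}{(\partial g/\partial x)(dx/ds) + (\partial g/\partial t)(dt/ds)} = \frac{(\partial f/\partial x)v(s) + (\partial f/\partial t)}{(\partial g/\partial x)v(s) + (\partial g/\partial t)}.$$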
Hence, if we want this velocity to be constant whenever $v(s)$ is constant, it is clear that the coefficients must be constant; that is, $f$ and $g$ must be linear† functions of $x$ and $t$, say,
$$\phi^*(x) = a_{11}x + a_{12}t, \qquad \phi^*(t) = a_{21}x + a_{22}t,$$
so that the velocity of the transform of a curve moving with constant velocity $v$ is
$$v' = \frac{a_{11}v + a_{12}}{a_{21}v + a_{22}}.$$
As before, it is convenient to put $\beta = a_{12}/a_{22}$, the velocity of the transform of a curve with zero velocity, so that
$$v' = \frac{(a_{11}/a_{22})v + \beta}{(a_{21}/a_{22})v + 1}.$$
Now, when $\beta$ approaches zero, one expects (for physical reasons, if none other) that $v'$ approaches $v$. This indicates that the coefficients of $v$ in the numerator and denominator should be‡ functions of $\beta$, say,
$$v' = \frac{\alpha(\beta)v + \beta}{\gamma(\beta)v + 1},$$
with $\alpha(0) = 1$, $\gamma(0) = 0$. We now determine $\alpha$ and $\gamma$ by imposing the condition that the transformations we are considering form a group, which again is obviously physical. Suppose that $\beta_1$ is the parameter of another such transformation and that the result of composing the two transformations is a third transformation characterized by parameter $\beta_2$. A direct computation expresses $\beta_2$, $\alpha(\beta_2)$, and $\gamma(\beta_2)$ in terms of $\beta$ and $\beta_1$. First notice that
$$\beta_2 = \alpha(\beta_1)\beta + \beta_1.$$
Comparing the other two terms in the product, we have
$$\alpha(\alpha(\beta_1)\beta + \beta_1) = \alpha(\beta_1)\alpha(\beta) + \beta_1, \qquad \gamma(\alpha(\beta_1)\beta + \beta_1) = \gamma(\beta_1)\alpha(\beta) + \gamma(\beta).$$
† For convenience, we shall consider only homogeneous transformations.
‡ More accurately, one expects that the transformations belong to a group. The condition that the transformations approach the identity as $\beta \to 0$ requires that the group be one-dimensional.
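Differentiate these relations with respect to $\beta$ and then set $\beta = 0$:
$$\alpha'(\beta_1)\alpha(\beta_1) = \alpha(\beta_1)\alpha'(0), \qquad \gamma'(\beta_1)\alpha(\beta_1) = \gamma(\beta_1)\alpha'(0) + \gamma'(0).$$
Also differentiate the first relation with respect to $\beta_1$, and then set $\beta_1 = 0$:
$$\alpha(\beta)(\alpha'(0)\beta + 1) = \alpha'(0)\alpha(\beta) + 1.$$
From this last relation, we have, after setting $\beta = 0$,
$$1 = \alpha'(0) + 1 \quad \text{or} \quad \alpha'(0) = 0.$$
Now changing $\beta_1$ to $\beta$, we have the following differential equations determining $\alpha$ and $\gamma$:
$$\alpha'(\beta)\alpha(\beta) = 0, \qquad \gamma'(\beta)\alpha(\beta) = \gamma'(0).$$
Now $\alpha(\beta) = 0$ is ruled out, since $\alpha(0) = 1$. Hence
$$\alpha'(\beta) = 0 \quad \text{or} \quad \alpha(\beta) = 1.$$
Thus, putting $\gamma'(0) = 1/c^2$, we have†
$$\gamma(\beta) = \frac{\beta}{c^2}.$$
This gives the law of transformation for velocities that we obtained earlier for the Lorentz group:
$$v' = \frac{v + \beta}{(\beta/c^2)v + 1}.$$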
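Now, we must determine the whole matrix
$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}.$$
As before, the conditions that this belong to a group and that it approach the identity as $\beta \to 0$ require that the coefficients be functions of $\beta$ also. Now, from the fact that $\alpha(\beta) = 1$, that is, $\beta_2 = \beta_1 + \beta$, we see that
$$\beta \to \begin{pmatrix} a_{11}(\beta) & a_{12}(\beta) \\ a_{21}(\beta) & a_{22}(\beta) \end{pmatrix}$$
is a one-parameter group of $2 \times 2$ real matrices.
† We include the possibility $\gamma'(0) = 0$ by possibly allowing $c = \infty$.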
One more relation is needed to determine the matrix elements completely. This can be obtained by the following considerations: Let $D(\beta) = a_{11}a_{22} - a_{21}a_{12}$ be the determinant of the matrix. As for any one-parameter group of matrices, we have
$$D(\beta + \beta_1) = D(\beta)D(\beta_1);$$
hence,
$$D'(\beta) = D'(0)D(\beta).$$
Hence,
$$D(\beta) = e^{D'(0)\beta}.$$
We want to conclude that $D'(0) = 0$. We must impose an additional physical condition to do so: Let $s \to (x(s), t(s))$ be any curve in space-time. The “time interval” along the curve, say over the interval $0 \le s \le 1$, is just $t(1) - t(0)$. The transformed curve is
$$s \to (a_{11}x(s) + a_{12}t(s),\; a_{21}x(s) + a_{22}t(s)).$$
The “time interval” along this curve,† say, for $0 \le s \le 1$, is then just
$$a_{21}x(1) + a_{22}t(1) - a_{21}x(0) - a_{22}t(0).$$
Now this is in general not the same time interval as the original curve. Of course, as $\beta \to 0$, the time intervals become the same, namely,
$$a_{21}x(1) - a_{21}x(0) + t(1)(a_{22} - 1) - t(0)(a_{22} - 1) \to 0 \quad \text{as } \beta \to 0.$$
However, it is reasonable to suppose even more: namely, that this difference divided by $\beta$ approaches zero as $\beta \to 0$. This requires, of course, a condition on the derivatives of the coefficients at $\beta = 0$. It is readily verified that this condition leads to $D'(0) = 0$; hence
$$D(\beta) = 1 = a_{11}a_{22} - a_{12}a_{21} = a_{22}^2 - \frac{\beta^2}{c^2}a_{22}^2,$$
or
$$a_{22} = \frac{1}{\sqrt{1 - (\beta^2/c^2)}}.$$
† Mathematically, “time” is just a real-valued function on our manifold so that the “time interval” for the end points of a curve is just the difference of the values of this “time function” at the end points.
By reversing the reasoning of the preceding section, it is seen that these conditions show that $\phi$ is a Lorentz transformation. Thus we have succeeded in characterizing the Lorentz transformations by means of reasonable conditions. This method bypasses mechanics. We can now reintroduce it in the following way. Suppose a particle of mass $m$ moves along a curve in space-time. Its Newtonian energy and momentum are, respectively,
$$E(s) = \frac{m}{2}\left(\frac{dx/ds}{dt/ds}\right)^2, \qquad M(s) = m\,\frac{dx/ds}{dt/ds}.$$
Considered as functions on velocity space, with the coordinate $v$,
$$E(v) = \tfrac{1}{2}mv^2, \qquad M(v) = mv.$$
Now a Galilean transformation $\phi$ of parameter $\beta$ introduces the translation $v \to v + \beta$ on velocity space. Thus
$$\phi^*(E)(v) = \frac{m}{2}(v + \beta)^2 = \frac{mv^2}{2} + mv\beta + \frac{m\beta^2}{2},$$
$$\phi^*(M)(v) = mv + m\beta = M(v) + m\beta.$$
Thus,
$$\phi^*(E) = E + \beta M + \frac{m\beta^2}{2}, \qquad \phi^*(M) = M + m\beta.$$
This computation indicates the following group-theoretic interpretation of energy and momentum: The transform of any one of the functions $E$, $M$ and the constant function under a Galilean transformation is a linear combination with constant coefficients of the functions themselves. Thus the mapping
$$\beta \to \begin{pmatrix} 1 & \beta & m\beta^2/2 \\ 0 & 1 & m\beta \\ 0 & 0 & 1 \end{pmatrix}$$
defines a linear representation of the Galilean group by $3 \times 3$ real matrices. Perhaps it is worthwhile to pause and describe the general background of this sort of phenomenon. Suppose that a Lie group $G$ acts on a manifold $P$. The space $F(P)$ of all real-valued $C^\infty$ functions on $P$ forms a vector space
under pointwise addition and multiplication by real scalars. The action of $G$ on $P$ defines a linear representation of $G$ into the group of linear transformations on $F(P)$. For $f \in F(P)$, $g \in G$, the transform of $f$ by $g$ is defined just as $g^*(f)$. Now $F(P)$ is, of course, infinite dimensional. However, there may be linear subspaces of $F(P)$ that are finite dimensional and invariant under $G$ and hence define a finite dimensional representation of $G$. The case where $P$ is the velocity space used above, $G$ is the Galilean group (which is just the translation group on velocity space), and the subspace of $F(P)$ is spanned by $E$, $M$, and $1$ is the case considered above. Now, in general, it is a very difficult problem to decompose completely the representation of $G$ on $F(P)$. (Such a decomposition would be known as the “Plancherel theorem” for the action of $G$ on $P$.) However, the cases where it can be accomplished are very important: For example, if $P$ is the group $G$ itself, with a compact $G$ acting on itself by left translation, then the resulting decomposition is the Peter-Weyl theorem. The case where $P$ is the 2-sphere in 3-space, with $G$ the group $SO(3, R)$ of rotations, is very important in quantum mechanics. The irreducible finite dimensional subspaces of $F(P)$ are generated by letting $G$ act on the spherical harmonics.
One further general remark will be useful to us in extending these considerations to special relativity. Let $f_1, \ldots, f_n$ be linearly independent functions on $P$ that are transformed among themselves by the action of $G$. Explicitly,
$$g^*(f_i) = a_{ij}(g)f_j, \qquad 1 \le i, j, \ldots \le n \quad \text{(summation convention)}.$$
The mapping $g \to (a_{ij}(g))$ then just defines a linear representation of $G$ by $n \times n$ real matrices. (This is just the matrix representation obtained by choosing the basis $f_1, \ldots, f_n$ in the space of functions spanned by $f_1, \ldots, f_n$.) We know from general Lie theory that there is then at the “infinitesimal” level a corresponding linear representation of the Lie algebra of $G$ by $n \times n$ real matrices (Lie bracket going into the commutator $ab - ba$). This can be obtained explicitly as follows: Recall that an element of the Lie algebra of $G$ is a one-parameter subgroup of $G$, say, $t \to g(t)$. It, acting on $P$, has an infinitesimal generator $X$, which is a vector field on $P$. This correspondence defines a Lie algebra homomorphism of the Lie algebra of $G$ into $V(P)$. Explicitly,
$$X(f) = \frac{d}{dt}\,g(t)^*(f)\Big|_{t=0}.$$
We then see that
$$X(f_i) = a_{ij}(X)f_j, \qquad a_{ij}(X) = \frac{d}{dt}\,a_{ij}(g(t))\Big|_{t=0}.$$
The mapping $X \to (a_{ij}(X))$ is the desired linear representation of the Lie algebra of $G$ by $n \times n$ real matrices.
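For the Galilean case these general remarks can be made concrete. The sketch below is an illustration under our own assumptions, not part of the original development: the function names are hypothetical, and the matrices are written in the basis $(E, M, 1)$ using the transformation laws $\phi^*(E) = E + \beta M + m\beta^2/2$ and $\phi^*(M) = M + m\beta$ found above. It lets one check that the matrices compose additively in $\beta$ and that their derivative at $\beta = 0$ is the infinitesimal matrix.

```python
def galilean_matrix(beta, m):
    """Matrix of phi* in the basis (E, M, 1) for the translation v -> v + beta."""
    return [
        [1.0, beta, 0.5 * m * beta**2],
        [0.0, 1.0, m * beta],
        [0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def basis_values(v, m):
    """Values of the functions E = m v^2 / 2, M = m v and 1 at velocity v."""
    return [0.5 * m * v**2, m * v, 1.0]


def transformed_values(beta, v, m):
    """Values of phi*(E), phi*(M), phi*(1) at v, computed through the matrix."""
    a = galilean_matrix(beta, m)
    f = basis_values(v, m)
    return [sum(a[i][j] * f[j] for j in range(3)) for i in range(3)]


def generator_matrix(m):
    """Derivative of galilean_matrix at beta = 0: X(E) = M, X(M) = m, X(1) = 0."""
    return [[0.0, 1.0, 0.0], [0.0, 0.0, m], [0.0, 0.0, 0.0]]


if __name__ == "__main__":
    m, v, beta = 2.0, 0.3, 0.5
    print(transformed_values(beta, v, m))   # matrix acting on (E, M, 1) at v
    print(basis_values(v + beta, m))        # the same functions at v + beta
```

The two printed lists agree up to rounding, which is just the statement that the span of $E$, $M$, and $1$ is invariant under the translation group on velocity space.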
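Returning to the case where $P$ is velocity space, $G$ the Galilean group, we see that these infinitesimal relations take the form
$$X(E) = \frac{\partial E}{\partial v} = M, \qquad X(M) = \frac{\partial M}{\partial v} = m, \qquad X(1) = 0.$$
Note that we can separate out the roles played by $E$ and $M$ by additional relations:
$$X^2(M) = 0 = X^3(E), \qquad X^2(E) \ne 0.$$
Now turn to the Lorentz group. It, too, acts on velocity space:
$$v \to \frac{v + \beta}{(\beta/c^2)v + 1}.$$
Its Lie algebra is also one-dimensional: The infinitesimal generator is then
$$X = \left(1 - \frac{v^2}{c^2}\right)\frac{\partial}{\partial v}.$$
Now, in the Galilean case,
$$X\begin{pmatrix} E \\ M \\ 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & m \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} E \\ M \\ 1 \end{pmatrix}.$$
The trick now is to replace this matrix by
$$\begin{pmatrix} 0 & 1 & 0 \\ 1/c^2 & 0 & m \\ 0 & 0 & 0 \end{pmatrix}.$$
It is not possible here to explain in detail why this is the correct modification. However, it is related to the fact that in higher dimensions the Lorentz group is a semisimple Lie group, while the Galilean group is not. (Notice that the matrix
$$\begin{pmatrix} 0 & 1 & 0 \\ 1/c^2 & 0 & m \\ 0 & 0 & 0 \end{pmatrix}$$
has three distinct eigenvalues, namely, $\lambda = 0$, $\lambda = \pm(1/c)$, while
$$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & m \\ 0 & 0 & 0 \end{pmatrix}$$
has only $\lambda = 0$ as a multiple eigenvalue. The effect of the perturbation by $c$ is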
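to “split apart” these eigenvalues.) Now $E$ and $M$ satisfy the following conditions:
$$X(E) = M, \qquad X(M) = \frac{1}{c^2}E + m.$$
Hence $X^2(M) = (1/c^2)M$. This enables us to determine $M$ explicitly by a change of variable:
$$u = \frac{c}{2}\log\left(\frac{c + v}{c - v}\right).$$
Notice that
$$\left(1 - \frac{v^2}{c^2}\right)\frac{\partial}{\partial v}\left(\frac{c}{2}\log\left(\frac{c + v}{c - v}\right)\right) = 1.$$
Let $M^*(u)$, $E^*(u)$ be the functions such that
$$M^*(u(v)) = M(v), \qquad E^*(u(v)) = E(v).$$
Thus, by this change of variable, $X$ goes over to $\partial/\partial u$, and $M^*$, $E^*$ satisfy
$$\frac{d^2M^*}{du^2} = \frac{1}{c^2}M^*, \qquad \frac{dE^*}{du} = M^*.$$
Hence,
$$M^*(u) = a_1e^{u/c} + a_2e^{-u/c}, \qquad E^*(u) = b + c(a_1e^{u/c} - a_2e^{-u/c}).$$
Now
$$e^{u/c} = \left(\frac{c + v}{c - v}\right)^{1/2}.$$
Now we clearly want $M(0) = 0 = a_1 + a_2$; hence,
$$M(v) = \frac{2a_1v/c}{[1 - (v^2/c^2)]^{1/2}}.$$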
Now
$$E^*(u) = b + ca_1(e^{u/c} + e^{-u/c}),$$
or
$$E(v) = b + ca_1\left[\left(\frac{c + v}{c - v}\right)^{1/2} + \left(\frac{c - v}{c + v}\right)^{1/2}\right] = b + ca_1\frac{2c}{(c^2 - v^2)^{1/2}} = b + \frac{2a_1c}{[1 - (v^2/c^2)]^{1/2}}.$$
We can determine the integration constant $b$ by using the relation
$$X(M)(0) = \frac{1}{c^2}E(0) + m = \frac{1}{c^2}(b + 2a_1c) + m = \frac{\partial M}{\partial v}(0) = \frac{2a_1}{c},$$
or $b = -mc^2$. To determine $a_1$, it seems necessary to impose an additional condition: For example, it seems reasonable that as $v \to 0$, relativistic effects should subside and that $\partial M/\partial v$ should approach its Newtonian value $m$. But $(\partial M/\partial v)(0) = 2a_1/c$. Finally, then,
$$M(v) = \frac{mv}{[1 - (v^2/c^2)]^{1/2}}, \qquad E(v) = -mc^2 + \frac{mc^2}{[1 - (v^2/c^2)]^{1/2}}.$$
Now we have determined $M$ and $E$ partly by requiring that they reduce to the Newtonian values as $c \to \infty$. However, there is nothing sacred about $E = \tfrac{1}{2}mv^2$ in the Newtonian case; $\tfrac{1}{2}mv^2 + \text{constant}$ would serve just as well. However, since no particular constant serves to simplify anything, we usually are content to let it be zero. It is quite different, however, in the relativistic case. Notice that redefining $E$ as $E'$, where
$$E' = \frac{mc^2}{[1 - (v^2/c^2)]^{1/2}},$$
gives the following law of transformations under infinitesimal Lorentz transformations:
$$X(M) = \frac{1}{c^2}E', \qquad X(E') = M.$$
This is obviously a considerably simpler transformation law than our original choice of $E$. Notice, for example, that the transformation law no longer involves $m$, but is determined completely by the underlying geometry. In fact,
let us compare this transformation law to the transformation law satisfied by the functions $x$ and $t$ on space-time. We have seen that a Lorentz transformation on space-time can be written in terms of $\beta$ as follows:
$$x \to \frac{x + \beta t}{[1 - (\beta^2/c^2)]^{1/2}}, \qquad t \to \frac{\beta x}{c^2[1 - (\beta^2/c^2)]^{1/2}} + \frac{t}{[1 - (\beta^2/c^2)]^{1/2}}.$$
Hence the infinitesimal generator is the vector field (which we shall also call $X$, since it, too, is identified with the generator of the Lie algebra of the Lorentz group):
$$X = t\frac{\partial}{\partial x} + \frac{x}{c^2}\frac{\partial}{\partial t}.$$
Hence the functions $x$ and $t$ transform under a Lorentz transformation precisely in the same way as do $M$ and $E'$. This suggests the following geometric construction: Let $s \to (x(s), t(s)) = \sigma(s)$ be a curve in space-time, with velocity function $v(s) = (dx/ds)/(dt/ds)$. Define a vector field along $\sigma$ by assigning to $\sigma(s)$ the tangent vector
$$E'(v(s))\frac{\partial}{\partial t} + M(v(s))\frac{\partial}{\partial x}.$$
This is the “momentum-energy” vector field along the curve. This vector field has the following “covariance” property: If curves $\sigma$ and $\sigma_1$ correspond under a Lorentz transformation, the momentum-energy vector fields also correspond under the same Lorentz transformation. Notice that this behavior of energy and momentum together has no analog in Newtonian physics!
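The way $M$ and $E'$ mix under a Lorentz transformation can be tested numerically. The sketch below rests on an assumption that is not spelled out in the text: integrating the infinitesimal law $X(M) = E'/c^2$, $X(E') = M$ gives the finite law $M \to \gamma(M + \beta E'/c^2)$, $E' \to \gamma(E' + \beta M)$ with $\gamma = [1 - (\beta^2/c^2)]^{-1/2}$. All function names are ours, chosen for illustration.

```python
import math


def gamma(u, c):
    """The factor 1 / sqrt(1 - u^2/c^2)."""
    return 1.0 / math.sqrt(1.0 - u * u / (c * c))


def momentum(v, m, c):
    """M(v) = m v / sqrt(1 - v^2/c^2)."""
    return m * v * gamma(v, c)


def energy(v, m, c):
    """E'(v) = m c^2 / sqrt(1 - v^2/c^2)."""
    return m * c * c * gamma(v, c)


def boost_velocity(v, beta, c):
    """Action of the Lorentz group on velocity space."""
    return (v + beta) / (v * beta / c**2 + 1.0)


def boost_momentum_energy(M, E, beta, c):
    """Assumed finite form of the law X(M) = E'/c^2, X(E') = M."""
    g = gamma(beta, c)
    return g * (M + beta * E / c**2), g * (E + beta * M)


if __name__ == "__main__":
    m, c, v, beta = 2.0, 1.0, 0.3, 0.5
    v2 = boost_velocity(v, beta, c)
    # Evaluate M and E' at the transformed velocity ...
    print(momentum(v2, m, c), energy(v2, m, c))
    # ... and compare with the linear transformation of the original pair.
    print(boost_momentum_energy(momentum(v, m, c), energy(v, m, c), beta, c))
```

For small $v$ the difference $E'(v) - mc^2$ is close to the Newtonian value $\tfrac{1}{2}mv^2$, in line with the remark about the choice of additive constant.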
Variational Problems Admitting Given Groups of Symmetries
As we remarked in Part 1, there is often a connection between being able to solve a given system of differential equations “by quadratures,” and the differential equations admitting a symmetry group of a certain algebraic structure, although it is difficult to make this precise. The differential equations arising from variational problems have a special structure, and this leads to a
further interesting relation to possible symmetry groups. Lacking a general theory, we shall restrict ourselves to sufficiently illustrative remarks. Let $M$ be a manifold with a Lagrangian $L$ given on $M$. Let $\theta(L)$ be the Cartan 1-form on $T(M) \times R = P$. A vector field $Y$ on $P$ generates a symmetry group of $L$ if $Y(d\theta(L)) = 0$. Suppose that $Y_1, \ldots, Y_r$ are vector fields on $P$ satisfying this condition. Choose functions $f_1, \ldots, f_r$ on $P$ such that
$$df_\alpha = -Y_\alpha \,\lrcorner\, d\theta(L) \quad \text{for } \alpha = 1, \ldots, r.$$
Then the $f_\alpha$ are constant on the characteristic curves of $d\theta(L)$; that is, they are “integrals of motion,” in the classical sense, of the extremals of $L$. For each choice $c = (c_1, \ldots, c_r)$ of real constants, consider the “submanifold” of $P$ defined by
$$P^c = \{p \in P : f_1(p) = c_1, \ldots, f_r(p) = c_r\}. \tag{16.12}$$
(Of course this need not be a submanifold. Most of our discussion will be only “generic,” ignoring the possible singularities that may arise, and is intended only to cover the high points of the theory.) Thus the problem of finding the characteristic curves of $d\theta(L)$ can be “reduced” to the problem of finding the characteristic curves of $d\theta(L)$ restricted to each of the submanifolds $P^c$. Since this is a manifold of lower dimension, we have succeeded in reducing the difficulty involved in solving the differential equations that define the characteristic curves of $d\theta(L)$. However, this remark is independent of the algebraic structure of the Lie algebra generated by $Y_1, \ldots, Y_r$. Do certain algebraic structures lead to a further simplification? Choose indices $1 \le \alpha, \beta, \ldots \le r$, and the summation convention. Then
$$[Y_\alpha, Y_\beta] = k_{\alpha\beta\gamma}Y_\gamma.$$
The $k_{\alpha\beta\gamma}$ are the structure constants of the Lie algebra generated by the $Y_1, \ldots, Y_r$. If $Y$ is a vector field, then
$$Y(f_\alpha) = df_\alpha(Y) = d\theta(L)(Y, Y_\alpha).$$
If $Y(d\theta(L)) = 0$, and if $Y(f_\alpha)$ is expressible as a linear combination of $(f_1 - c_1), \ldots, (f_r - c_r)$, then $Y$ is tangent to the submanifold $P^c$ and hence provides an additional symmetry for the characteristic curves of $d\theta(L)$ that lie on $P^c$.
In particular, the elements $Y$ in the Lie algebra generated by $Y_1, \ldots, Y_r$ that are tangent to $P^c$ form a subalgebra that can be computed in a purely algebraic fashion. (For example, if the algebra generated by $Y_1, \ldots, Y_r$ is Abelian, this subalgebra is the whole algebra.) This subalgebra acts as symmetries on the differential equations determining the characteristic curves
of $d\theta(L)$ that lie on $P^c$. The whole algorithm can then be iterated, with the subalgebra acting on $P^c$ instead of the whole algebra acting on $P$. If the process ends with a problem of finding the characteristic curves of a 2-form on a two-dimensional submanifold of $P$, we shall have succeeded in “integrating the characteristic curves of $d\theta(L)$ (hence the extremal curves of $L$) by quadratures,” in the classical terminology. As an illustration, let us consider an example of structure for the Lie algebra. First let us suppose $r = 2$:
$$[Y_1, Y_2] = kY_2.$$
(The Lie algebra is then solvable.) We shall also suppose that $Y_1(f_2) = kf_2$. Suppose $Y = a_1Y_1 + a_2Y_2$:
$$Y(f_1) = -a_2kf_2, \qquad Y(f_2) = a_1kf_2.$$
Hence $Y$ is tangent to $P^c$ if $c_2 = 0$. Thus, we see in the non-Abelian case ($k \ne 0$) that we can expect that only a one-parameter family of the submanifolds $P^c$ will admit a further group of symmetries. All this applies if the $Y$ are prolongations of vector fields on $M$ that generate groups of symmetries of the extremals of $L$. However, the technique of finding normal forms for the vector fields is more useful for practical purposes in this case. Suppose, then, that $X_1, \ldots, X_r$ are vector fields on $M$ such that
$$X_\alpha(L) = 0, \qquad \alpha = 1, \ldots, r.$$
First suppose that
$$[X_\alpha, X_\beta] = 0, \qquad 1 \le \alpha, \beta \le r,$$
and that $X_1(q), \ldots, X_r(q)$ are linearly independent for all $q \in M$. Then, it is easily seen by an extension of the argument used in Chapter 8, that coordinates $(x_1, \ldots, x_n)$ can be chosen for the open set of $M$ that we are working in, so that
$$X_\alpha = \frac{\partial}{\partial x_\alpha}, \qquad \alpha = 1, \ldots, r.$$
Then, also,
$$\frac{\partial L}{\partial x_\alpha} = 0.$$
We conclude that $L$ in the coordinates is a function $L(x_{r+1}, \ldots, x_n, \dot{x}_1, \ldots, \dot{x}_n)$. In classical language, the coordinates $(x_1, \ldots, x_r)$ are ignorable.
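A concrete case may help to show what ignorability buys. The sketch below is our own illustration, not an example from the text: we assume the Lagrangian $L = \tfrac{1}{2}(\dot{r}^2 + r^2\dot{\theta}^2) - \tfrac{1}{2}r^2$ of a plane harmonic oscillator in polar coordinates, in which $\theta$ is ignorable. The momentum conjugate to $\theta$ is held at a constant $c_\theta$, the reduced Hamilton equations in $(r, p_r)$ are integrated by a hand-written Runge-Kutta step, and $\theta$ is recovered by integrating $\partial H/\partial p_\theta = c_\theta/r^2$ alongside. All names are hypothetical.

```python
import math


def reduced_rhs(state, c_theta):
    """Right-hand side for H = p_r^2/2 + c_theta^2/(2 r^2) + r^2/2."""
    r, p_r, theta = state
    dr = p_r                        # dH/dp_r
    dp_r = c_theta**2 / r**3 - r    # -dH/dr
    dtheta = c_theta / r**2         # dH/dp_theta: quadrature for the ignorable coordinate
    return (dr, dp_r, dtheta)


def rk4_step(state, h, c_theta):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(s, k, scale):
        return tuple(si + scale * ki for si, ki in zip(s, k))

    k1 = reduced_rhs(state, c_theta)
    k2 = reduced_rhs(shifted(state, k1, h / 2), c_theta)
    k3 = reduced_rhs(shifted(state, k2, h / 2), c_theta)
    k4 = reduced_rhs(shifted(state, k3, h), c_theta)
    return tuple(
        s + h / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(r0, p_r0, theta0, c_theta, t_end, steps):
    """Integrate the reduced system together with the quadrature for theta."""
    state = (r0, p_r0, theta0)
    h = t_end / steps
    for _ in range(steps):
        state = rk4_step(state, h, c_theta)
    return state


def hamiltonian(r, p_r, c_theta):
    """The reduced Hamiltonian, itself an integral of motion."""
    return 0.5 * p_r**2 + c_theta**2 / (2 * r**2) + 0.5 * r**2


if __name__ == "__main__":
    r, p_r, theta = integrate(1.0, 0.0, 0.0, 0.5, 1.0, 1000)
    # In Cartesian coordinates this motion is x = cos t, y = 0.5 sin t.
    print(r * math.cos(theta), math.cos(1.0))
    print(r * math.sin(theta), 0.5 * math.sin(1.0))
```

The point of the design is that only the two-variable system in $(r, p_r)$ is a genuine differential equation; the ignorable coordinate is obtained afterwards by an integration, as in the general reduction described next.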
Now
$$\theta(L) = L_{n+i}\,dx_i - H\,dt, \quad \text{with } H = L_{n+i}\dot{x}_i - L.$$
We conclude that $\theta(L)(X_1) = L_{n+1}, \ldots, \theta(L)(X_r) = L_{n+r}$ are constant along the characteristic curves of $d\theta(L)$. $H$ is a function $H(x_{r+1}, \ldots, x_n, y_1, \ldots, y_n)$. Then put $y_i = L_{n+i}$, and set $y_1 = c_1, \ldots, y_r = c_r$. The Hamilton equations for $x_{r+1}(t), \ldots, x_n(t), y_{r+1}(t), \ldots, y_n(t)$ are then, for $i = r + 1, \ldots, n$,
$$\frac{dx_i}{dt} = H_{n+i}(x_{r+1}(t), \ldots, x_n(t), c_1, \ldots, c_r, y_{r+1}(t), \ldots, y_n(t)),$$
$$\frac{dy_i}{dt} = -H_i(x_{r+1}, \ldots, x_n, c_1, \ldots, c_r, y_{r+1}, \ldots, y_n).$$
If $(x_{r+1}(t), \ldots, x_n(t), y_{r+1}(t), \ldots, y_n(t))$ is a solution of this Hamiltonian system in $2(n - r)$ variables (for each value of $(c_1, \ldots, c_r)$), then $x_1(t), \ldots, x_r(t)$ are determined by
$$x_i(t) = \int H_{n+i}(x_{r+1}(t), \ldots, x_n(t), c_1, \ldots, c_r, y_{r+1}(t), \ldots, y_n(t))\,dt$$
for $i = 1, \ldots, r$. Of course this is ideal if $(n - r) = 2$ because the resulting reduced Hamiltonian system can, of course, be further solved “by quadratures,” since the Hamiltonian $H$ is still available to us as an integral of motion. For each value of $c = (c_1, \ldots, c_r)$ is there a Lagrangian $R^c$ on $(x_{r+1}, \ldots, x_n)$-space whose extremal curves are the curves in $(x_{r+1}, \ldots, x_n)$-space that occur as the projection of solutions of the reduced Hamiltonian system? We mention the classical procedure for finding such an $R^c$: $R^c(x_{r+1}, \ldots, x_n, \dot{x}_{r+1}, \ldots, \dot{x}_n)$ is the function such that
$$R^c = L - c_\alpha\dot{x}_\alpha \qquad (1 \le \alpha \le r),$$
where $\dot{x}_1, \ldots, \dot{x}_r$ are eliminated by means of the relations $L_{n+\alpha} = c_\alpha$.
$R^c$ is called the Routhian. We refer to Whittaker [1] for a fuller discussion and for the solution of many problems using Routh's method.
Integrals of the Type of “Total Angular Momentum”
Let $M$ be a manifold on which a Lagrangian is given. We have discussed the general principle that the integrals of motion are associated with vector fields on $T(M)$, leaving $d\theta(L)$ invariant. One source of such vector fields that we have exploited is obtained by taking a vector field on $M$, which generates a group on $M$ permuting the extremals of $L$, and prolonging it to $T(M)$. However, there is the possibility of groups acting on $T(M)$, leaving $d\theta(L)$ invariant, which do not arise as prolongations of vector fields on $M$. In this section, we
present one method of obtaining such symmetries. It will also give us an opportunity to illustrate our earlier statement, namely, that it is often useful to work with a basis for differential forms on an $M$ that does not arise from a coordinate system. Suppose, then, that $\omega_i$ ($1 \le i, j, \ldots \le n = \dim M$; summation convention) is a basis for 1-forms on $M$, with
$$d\omega_i = c_{jki}\,\omega_j \wedge \omega_k.$$
As before, we consider $\omega_i$ as forms on $T(M)$, pulling them up with the dual of the projection map $T(M) \to M$ without any special notation. Then $y_i$ denotes the functions $v \to y_i(v) = \omega_i(v)$ on $T(M)$. If $L \in F(T(M))$ is the Lagrangian, $dL = L_i\omega_i + L_{n+i}\,dy_i$, then
$$\theta(L) = L_{n+i}\omega_i - H\,dt, \quad \text{with } H = L_{n+i}y_i - L.$$
Let $X$ be a vector field on $T(M) \times R$ such that
$$X(t) = 0 = X(H) = X(L_{n+i}).$$
Then
$$X \,\lrcorner\, d\theta(L) = -\omega_i(X)\,dL_{n+i} + L_{n+i}c_{jki}\,\omega_j(X)\,\omega_k.$$
$X$ will generate symmetries of the characteristic curves of $d\theta(L)$ if $d(X \,\lrcorner\, d\theta(L)) = 0$. There is one important case (to which we shall restrict ourselves) where such a choice can be made:
$$\omega_j(X) = L_{n+j}.$$
Suppose, for example, we assume that $(c_{jki})$ is skew-symmetric in all three indices, and that $L$ is a nonhomogeneous, regular Lagrangian; that is, $\det(L_{n+i,n+j}) \ne 0$. Thus $X \,\lrcorner\, d\theta(L) = -L_{n+i}\,dL_{n+i}$. Hence, with these assumptions, $I = L_{n+i}L_{n+i}$ is an integral of the characteristic curves of $d\theta(L)$. We have not yet established that such an $X$ exists. Now
$$0 = X(H) = X(L_{n+i})y_i + L_{n+i}X(y_i) - L_i\omega_i(X) - L_{n+i}X(y_i) = -L_iL_{n+i}.$$
Thus the relations
$$L_iL_{n+i} = 0$$
must be considered as the conditions for the existence of this new integral of motion, which in the classical rotating rigid body problem is the integral of “total angular momentum.”
Rigid Bodies Treated Group-Theoretically
The configuration space for a rigid body in Euclidean 3-space, with one point fixed, is just the group of $3 \times 3$ real orthogonal matrices of determinant 1. (For consider the fixed point of the body as the center of the coordinate system, and consider a fixed orthogonal coordinate system. The position of the body is evidently determined by the position of another orthogonal coordinate system fixed in the body. Two such coordinate systems are related by a $3 \times 3$ orthogonal matrix. Since the moving coordinate system can be deformed into its original fixed position, and an orthogonal matrix always has determinant $\pm 1$, it is clear that we are interested only in orthogonal matrices of determinant 1.) The traditional treatment of rigid-body dynamics usually is designed to mask the fact that the configuration space is the manifold of $3 \times 3$ real orthogonal matrices, that is, the underlying manifold of a Lie group. To restore some balance to this situation, we shall treat things strictly from the group-theoretical point of view, purposely looking for variational problems that can be solved easily by using symmetry principles. We shall mention only very briefly the relation to the traditional rigid-body problems. Let $G$ be a Lie group. We have defined the Lie algebra of $G$, usually denoted by $\mathbf{G}$, as the set of one-parameter subgroups of $G$, and have justified the name “Lie algebra” by showing how the sum and Jacobi bracket of two one-parameter groups may be defined. If $G$ acts as a group of diffeomorphisms on a manifold $M$, we have also seen that, as the “infinitesimal version” of the action, $\mathbf{G}$ acts as a Lie algebra of vector fields on $M$. The two most obvious examples of such an action are the action of $G$ on itself by left and right translation.
If a basis for $\mathbf{G}$ is chosen, the corresponding vector fields on $G$ defined by these two actions form two bases for vector fields (“absolute parallelisms”) on $G$, and are, respectively, right- and left-invariant vector fields on $G$. The basis of 1-forms dual to the basis of left- (or right-) invariant vector fields on $G$ is a basis for the vector space of left- (or right-) invariant differential 1-forms on $G$.† Conversely, giving a basis of left- (or right-) invariant 1-forms on $G$ fixes the basis of left- (or right-) invariant vector fields by duality. Most Lie groups can be realized simply as subgroups of $GL(n, R)$, the group of all real $n \times n$ invertible matrices. Hence, a basis for its left- or right-invariant vector fields can usually be most easily found by finding a basis for left- or right-invariant vector fields on $GL(n, R)$, and then finding the subspace of these vector fields that are tangent to $G$. The general theory then tells us that
† A differential form on a Lie group that is invariant under left or right translation is often called a Cartan-Maurer form.
this subspace defines a basis for the left- or right-invariant vector fields on $G$. We must compute the left- and right-invariant differential forms on $GL(n, R)$. Choose the following range of indices and summation conventions, $1 \le i, j, \ldots \le n$; $x_{ij}$ will denote the functions on $GL(n, R)$ which assign the entry in the $i$th row and $j$th column to a matrix in $GL(n, R)$. Thus the whole matrix is $(x_{ij})$. The functions that assign the $(i, j)$th entry to the inverse matrix of an element of $GL(n, R)$ are denoted $x^{-1}_{ij}$. Thus
$$x_{ij}x^{-1}_{jk} = \delta_{ik} = x^{-1}_{ij}x_{jk}.$$
Now the following forms define a basis for left-invariant forms,
$$\omega_{ij} = x^{-1}_{ik}\,dx_{kj}, \tag{16.13}$$
while the following forms define a basis for right-invariant ones,
$$\bar{\omega}_{ij} = dx_{ik}\,x^{-1}_{kj}. \tag{16.14}$$
The orthogonal group, O(n, R), consists of all matrices whose inverse is equal to their transpose, and is thus a closed subgroup of GL(n, R). SO(n, R), the rotation group, consists of all orthogonal matrices of determinant 1. (Recall that the orthogonality condition requires that the determinant be ±1.) Hence, from the general theory of Lie groups sketched in Chapter 10, one deduces that: (a) O(n, R) is a closed Lie subgroup of GL(n, R). (b) SO(n, R) is the connected component containing the identity of O(n, R); hence it is an invariant closed Lie subgroup of O(n, R).
(Prove these facts directly, as an exercise!) Now O(n, R), as a submanifold of GL(n, R), is determined by the relations
\[ x_{ki}\,x_{kj} = \delta_{ij}. \]
Differentiating these relations, we have, on SO(n, R),
\[ dx_{ki}\,x_{kj} + x_{ki}\,dx_{kj} = 0. \]
Restricting to SO(n, R) (with $\omega_{ij}$ and $\omega'_{ij}$ given by (16.13) and (16.14)),
\[ \omega_{ij} = x^{-1}_{ik}\,dx_{kj} = x_{ki}\,dx_{kj} = -x_{kj}\,dx_{ki} = -\omega_{ji}. \]
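The skew-symmetry just derived can be checked numerically. The following is a minimal sketch, not part of the original argument: it takes a hypothetical test curve in SO(3) (built from two elementary rotations chosen only for illustration), approximates $dx_{kj}/dt$ by a central difference, and evaluates $x_{ki}\,dx_{kj}/dt$ along the curve. The names `curve` and `omega_along_curve` are assumptions of this sketch.

```python
import math

def rot3(t):
    # Rotation about the x3-axis.
    c, s = math.cos(t), math.sin(t)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def rot1(t):
    # Rotation about the x1-axis.
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def curve(t):
    # Hypothetical test curve in SO(3); any smooth curve would do.
    return matmul(matmul(rot3(0.7 * t), rot1(1.3 * t)), rot3(t * t))

def omega_along_curve(t, h=1e-5):
    # Approximates x_ki * dx_kj/dt with a central difference in t.
    x, xp, xm = curve(t), curve(t + h), curve(t - h)
    dx = [[(xp[k][j] - xm[k][j]) / (2 * h) for j in range(3)] for k in range(3)]
    return [[sum(x[k][i] * dx[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

w = omega_along_curve(0.4)
# Size of the symmetric part; it should vanish up to discretization error.
max_symmetric_part = max(abs(w[i][j] + w[j][i])
                         for i in range(3) for j in range(3))
print(max_symmetric_part)
```

The printed number is only as small as the finite-difference step allows; an exact check would require symbolic differentiation.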
Similarly, one proves that $(\omega'_{ij})$ are skew-symmetric in i and j. Now it is readily verified by counting that
\[ \dim SO(n, R) = \frac{n(n-1)}{2}, \]
that is, just the number of pairs (i, j) with $1 \le i < j \le n$. We conclude: The $\omega_{ij}$ (resp. $\omega'_{ij}$) with $1 \le i < j \le n$, when restricted to SO(n, R) from GL(n, R) (where they were originally defined by (16.13), (16.14)), form a basis for the left-invariant (resp. right-invariant) 1-forms on SO(n, R).

Let us now return to the case n = 3. We have seen that SO(3, R) can be identified with the configuration space of a rigid body moving in three-dimensional Euclidean space with one point fixed, say, the origin. Recall that this is obtained by choosing a fixed orthonormal basis $(e_1^0, e_2^0, e_3^0)$ of $R^3$, and choosing one orthonormal basis $(e_1, e_2, e_3)$ that is fixed in the body. There is a unique orthogonal matrix $(x_{ij})$ such that $e_i = x_{ij}e_j^0$. This allows us to identify the given “configuration” of the body with the matrix $(x_{ij})$. Now a physical motion of the body as a function of time is determined by functions of time: $e_i(t)$. This allows us to define the curve $(x_{ij}(t))$ in SO(3, R) by $e_i(t) = x_{ij}(t)e_j^0$.

What sort of Lagrangians on SO(3, R) are suited to describe the (Newtonian) dynamics of rigid-body motion with no external forces? By general principles (or perhaps, more truthfully, because it is the type we are used to), the Lagrangian should be, at each point q of configuration space, a quadratic form on the tangent vectors to that point. Further, the general symmetry of Newtonian mechanics under rigid motions tells us that this Lagrangian should be invariant under left translation on SO(3, R), since applying a left translation just means rotating each element of the rigid body by the same rotation. These two conditions severely restrict the choice of Lagrangian. To express this analytically, let us take advantage of the fact that
\[ \dim SO(n, R) = \frac{n(n-1)}{2} = n \quad (\text{only for } n = 3). \]
Thus the tangent space to SO(3, R) at a point, say, the identity element, can be identified with $R^3$. To do this explicitly, put
\[ \omega_1 = \omega_{12}, \qquad \omega_2 = \omega_{13}, \qquad \omega_3 = \omega_{23}. \]
Then $(\omega_1, \omega_2, \omega_3)$ form a basis for left-invariant forms on SO(3, R). Let $(y_1, y_2, y_3)$ be these forms regarded as functions on T(SO(3, R)). Evidently, then, the Lagrangian L representing a possible force-free rigid-body motion must be of the form
\[ L = I_{ij}\,y_i\,y_j, \]
with constants $I_{ij}$ forming a symmetric matrix. Of course this matrix is nothing but the moment of inertia matrix of the body. We can obviously exploit the freedom to choose the fixed basis $(e_1^0, e_2^0, e_3^0)$ by choosing it so that the matrix $(I_{ij})$ takes a diagonal form; that is,
\[ I_{ij} = \begin{cases} 0 & \text{if } i \neq j, \\ I_i & \text{if } i = j. \end{cases} \]
The axes moving with the body are then the principal axes of the body, and L takes the form
\[ L = I_1y_1^2 + I_2y_2^2 + I_3y_3^2. \]
In accordance with general principles, the Lagrangian for a motion under forces derived from a potential function V defined on configuration space is then just
\[ L = \tfrac{1}{2}(I_1y_1^2 + I_2y_2^2 + I_3y_3^2) - V. \]
Then a curve $\sigma(t)$ is an extremal of L if it satisfies the general Euler equations (with respect to the basis of 1-forms) that were derived in the beginning of this chapter, namely,
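The first step in making these more explicit is to find $d\omega_i$:
\[ d\omega_{ij} = d(x_{ki}\,dx_{kj}) = dx_{ki} \wedge dx_{kj} = dx_{ki} \wedge \delta_{kk'}\,dx_{k'j} = dx_{ki}\,x_{kj'}\,x_{k'j'} \wedge dx_{k'j} = \omega_{ij'} \wedge \omega_{j'j}. \]
Now
\[ d\omega_1 = d\omega_{12} = \omega_{13} \wedge \omega_{32} = -\omega_{13} \wedge \omega_{23} = -\omega_2 \wedge \omega_3, \]
\[ d\omega_2 = d\omega_{13} = \omega_{12} \wedge \omega_{23} = \omega_1 \wedge \omega_3, \]
\[ d\omega_3 = d\omega_{23} = \omega_{21} \wedge \omega_{13} = -\omega_1 \wedge \omega_2. \]
This gives the following values of the nonzero components of $(c^i_{jk})$, the structure constants of the Lie group SO(3, R):
\[ c^3_{12} = -1, \qquad c^2_{13} = 1, \qquad c^1_{23} = -1. \]
Suppose $dV = V_i\,\omega_i$. Then
\[ L_i = -V_i, \qquad L_{n+i} = I_i\,y_i \quad (\text{no summation}). \]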
If yi(t) = o,(o'(t)),the Euler equations take the form (16.15a) 1 2dY2 -= dt
- y I Y3 v 3 -
(16.15b)
(16.15c) These equations are given in every textbook on rigid-body mechanics. (Notice, then, that the configuration space variables do not appear except in the potential.) The problem of finding the extremals $\sigma(t)$ can be divided into two parts: First find $y_i(t)$ by solving the three-dimensional system (16.15); then find $\sigma(t)$ as solution of the second three-dimensional system:
\[ \omega_i(\sigma'(t)) = y_i(t). \]
Except in the most trivial cases, solving (16.15) involves the theory of elliptic functions. (In fact, the applications to rigid-body problems were the principal impetus to the development of the theory of elliptic functions in the first half of the nineteenth century.) Let us suppose that V = 0, and inquire about the possible existence of integrals of motion of the system (16.15) that arise because of the underlying group invariance. First, the Hamiltonian
\[ H = L_{n+i}\,y_i - L = \tfrac{1}{2} I_i\,y_i^2 = L \]
is such an integral. (It is also the total energy in the case of rotating rigid bodies.) We shall now look systematically for more integrals of the characteristic curves of $d\theta(L)$ that are functions of $y_1, \ldots, y_n$ alone. Now let us look for a function f of $(y_1, \ldots, y_n)$ alone, and a vector field Y on T(SO(3, R)) such that
\[ df = Y \,\lrcorner\, d\theta(L) = Y \,\lrcorner\, (dL_{n+i} \wedge \omega_i + L_{n+i}\,c^i_{jk}\,\omega_j \wedge \omega_k - dH \wedge dt), \]
and $Y(t) = 0$. As conditions we have (16.16a) (16.16b) (16.16c)
Now, if Y satisfies these conditions,
\[ Y(d\theta(L)) = d(Y(\theta(L))) = d\bigl(Y \,\lrcorner\, d\theta(L) + d(\theta(L)(Y))\bigr) = ddf + dd(\theta(L)(Y)) = 0, \]
so that Y generates a one-parameter group that permutes the characteristic curves of $d\theta(L)$. Conversely, if a vector field Y on T(SO(3, R)) satisfies $Y(t) = 0$, (16.16a), (16.16b), and
\[ dL_{n+i} \wedge d(\omega_i(Y)) = 0, \tag{16.17} \]
then such an f exists, is an integral of the characteristic curves of $d\theta(L)$, and is a function of $y_1, \ldots, y_n$ alone. Writing out the condition of (16.17) in more detail, we have
\[ L_{n+i,\,n+j}\,dy_j \wedge d(\omega_i(Y)) = 0 \tag{16.18} \]
(since L is a function of $y_1, \ldots, y_n$ alone). Now L is a regular, nonhomogeneous Lagrangian; that is, $\det(L_{n+i,\,n+j}) \neq 0$. The conditions then become: $\omega_i(Y)$ is a function of $y_1, \ldots, y_n$ alone.
We shall not go into a deep analysis of these conditions here. Consider the simplest choice, namely,
\[ \omega_i(Y) = a_{ij}\,y_j, \text{ with constant } a_{ij}; \qquad Y(y_i) = 0. \]
The condition on Y then becomes $L_{n+i,\,n+j}\,a_{ik}\,dy_j \wedge dy_k = 0$. Now
\[ L_{n+i,\,n+j} = \begin{cases} 0 & \text{if } i \neq j, \\ I_i & \text{if } i = j. \end{cases} \]
Thus,
\[ \sum_{i,k} I_i\,a_{ik}\,dy_i \wedge dy_k = 0 \]
or
\[ I_i\,a_{ik} = I_k\,a_{ki} \quad (\text{no summation}). \]
Condition (16.16a) requires that
Now $(c^i_{jk})$ is skew-symmetric in all three indices. Then this condition can be realized by choosing
\[ a_{ij} = I_i\,\delta_{ij} \quad (\text{no summation}). \]
It is then clear that all desired conditions are satisfied.† Then
\[ \omega_i(Y) = I_i\,y_i \quad (\text{no summation}); \]
hence,
\[ df = \sum_i I_i^2\,y_i\,dy_i \]
or
\[ 2f = (I_1y_1)^2 + (I_2y_2)^2 + (I_3y_3)^2. \tag{16.19} \]
This is obviously an integral of (16.15) independent of the energy integral $I_1y_1^2 + I_2y_2^2 + I_3y_3^2$ (unless, of course, $I_1 = I_2 = I_3$, which is the trivial case, since the right-hand side of (16.15) is identically zero anyway). Physically, it is the integral of total angular momentum for the rigid body, and the reader will readily verify that it is the integral found by more general arguments in the preceding section. These two integrals enable one to reduce (16.15) (remember that V = 0) to three separated first-order differential equations for $y_1, y_2, y_3$, which can be solved with elliptic functions. In fact, as we show later, following Tricomi [1], these equations can be used to define the Jacobi elliptic functions and derive their principal properties.
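The two integrals can be observed numerically. The sketch below is illustrative only: it assumes the force-free equations in the form $I_1\,dy_1/dt = (I_3 - I_2)y_2y_3$ and its cyclic companions (the form that appears as the Pfaffian system (17.9) in the next chapter), integrates them with a classical fourth-order Runge-Kutta step, and measures how far the energy $\sum I_iy_i^2$ and the squared angular momentum $\sum (I_iy_i)^2$ drift. The moments of inertia, initial data, and step size are arbitrary choices of this sketch.

```python
def euler_rhs(y, inertia):
    # Assumed force-free form: I1*y1' = (I3 - I2)*y2*y3, and cyclically.
    i1, i2, i3 = inertia
    y1, y2, y3 = y
    return ((i3 - i2) * y2 * y3 / i1,
            (i1 - i3) * y1 * y3 / i2,
            (i2 - i1) * y1 * y2 / i3)

def rk4_step(y, inertia, h):
    # One classical Runge-Kutta step of size h.
    def shifted(k, a):
        return tuple(yi + a * ki for yi, ki in zip(y, k))
    k1 = euler_rhs(y, inertia)
    k2 = euler_rhs(shifted(k1, h / 2), inertia)
    k3 = euler_rhs(shifted(k2, h / 2), inertia)
    k4 = euler_rhs(shifted(k3, h), inertia)
    return tuple(yi + h * (a + 2 * b + 2 * c + d) / 6
                 for yi, a, b, c, d in zip(y, k1, k2, k3, k4))

def energy(y, inertia):
    return sum(i * v * v for i, v in zip(inertia, y))

def momentum_squared(y, inertia):
    return sum((i * v) ** 2 for i, v in zip(inertia, y))

inertia = (1.0, 2.0, 3.0)   # hypothetical principal moments
y = (0.3, 1.0, 0.5)         # hypothetical initial data
e0, m0 = energy(y, inertia), momentum_squared(y, inertia)
for _ in range(2000):
    y = rk4_step(y, inertia, 0.005)
energy_drift = abs(energy(y, inertia) - e0)
momentum_drift = abs(momentum_squared(y, inertia) - m0)
print(energy_drift, momentum_drift)
```

Runge-Kutta does not conserve quadratic integrals exactly, so the drifts are small rather than zero; they shrink rapidly as the step size is reduced.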
The Euler Angles for a Rotating Rigid Body

We have seen that the Euler equations for the extremals of a left-invariant variational problem on SO(3, R) (or any Lie group for that matter) split up into two parts: To find an extremal curve $\sigma(t)$, first one solves for $y_i(t) = \omega_i(\sigma'(t))$, for i = 1, 2, 3 ($\omega_1, \omega_2, \omega_3$ a convenient basis for left-invariant forms), then finds $\sigma$ itself. Of course the question arises exactly how to describe curves on SO(3, R) in explicit terms, since it is a compact manifold and hence cannot be covered by a single coordinate system. At least two methods can be used. First, we have defined the left-invariant forms as
\[ \omega_{ij} = x_{ki}\,dx_{kj} \quad (1 \le i, j, k, \ldots \le 3; \text{ summation convention}), \]
where $x_{ij}$ are the functions on SO(3, R) which to every matrix assign its (i, j)th entry. Of course the functions $x_{ij}$ are bound by the orthogonality conditions $x_{ij}x_{kj} = \delta_{ik}$. In principle, these relations could be solved for three independent functions to define a coordinate system for a piece of the manifold, but we can be certain that this would be too awkward to be of much value. However, if the “momentum functions” $y_{ij}(t) = \omega_{ij}(\sigma'(t))$ have already
† In making these choices, we must confess that we have been guided by knowing the answer via the analogy with rigid-body dynamics. The reader is invited to try to work out the necessary conditions to a conclusion.
been found, the functions $x_{ij}(\sigma(t))$ that actually describe the extremal are obtained as solutions of
\[ \frac{d}{dt}\,x_{ij}(\sigma(t)) = x_{ik}(\sigma(t))\,y_{kj}(t). \]
Now these form a system of linear, ordinary, time-dependent differential equations for the functions $t \to x_{ij}(\sigma(t))$, so there are certainly methods available to solve them, although these, too, are probably not very practical for computations or for predicting the qualitative properties of the extremals.

The second method proceeds by introducing a coordinate system for a piece of SO(3, R) that is well adapted to describing the group structure of SO(3, R) (hence is also well adapted to the physics, since the physics and the group theory more or less coincide), namely, the Euler angles. They can be described group-theoretically. Consider the set of matrices:
\[ A(\theta_1) = \begin{pmatrix} \cos\theta_1 & \sin\theta_1 & 0 \\ -\sin\theta_1 & \cos\theta_1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad -\infty < \theta_1 < \infty. \tag{16.20} \]
They form a one-parameter subgroup of SO(3, R); in fact, each one just represents a rotation about the $x_3$-axis of angle $\theta_1$. Similarly, consider the one-parameter group of rotations about the $x_1$-axis:
\[ B(\theta_2) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_2 & \sin\theta_2 \\ 0 & -\sin\theta_2 & \cos\theta_2 \end{pmatrix}, \qquad -\infty < \theta_2 < \infty. \tag{16.21} \]
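The one-parameter subgroup property of the matrices in (16.20) and (16.21) is easy to confirm numerically. The following minimal sketch (helper names are assumptions of the sketch) checks that $A(s)A(t) = A(s+t)$ and $B(s)B(t) = B(s+t)$, and that a product of three such rotations is again orthogonal with determinant 1; the sample angles are arbitrary.

```python
import math

def A(t):
    # Rotation about the x3-axis, as in (16.20).
    c, s = math.cos(t), math.sin(t)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def B(t):
    # Rotation about the x1-axis, as in (16.21).
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

IDENTITY = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# One-parameter subgroup property for both families.
subgroup_err = max(max_abs_diff(matmul(A(0.4), A(0.5)), A(0.9)),
                   max_abs_diff(matmul(B(0.4), B(0.5)), B(0.9)))

# A product of three rotations about the two axes stays in SO(3).
x = matmul(matmul(A(0.4), B(1.1)), A(-0.9))
orthogonality_err = max_abs_diff(matmul(transpose(x), x), IDENTITY)
det_x = det3(x)
print(subgroup_err, orthogonality_err, det_x)
```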
It is seen now that each orthogonal matrix (of determinant 1) can be written as a product, $A(\theta_1)B(\theta_2)A(\theta_3)$, of three rotations about the two axes. It can be verified that this representation is unambiguous for a certain open subset of SO(3, R) and for $\theta_1, \theta_2, \theta_3$ suitably restricted. Thus $\theta_1$, $\theta_2$, and $\theta_3$ serve as a coordinate system for a piece of SO(3, R). In fact, a suitable (but tedious) calculation shows that
\[ \omega_1 = \sin\theta_2 \sin\theta_3\,d\theta_1 + \cos\theta_3\,d\theta_2, \tag{16.22a} \]
\[ \omega_2 = \sin\theta_2 \cos\theta_3\,d\theta_1 - \sin\theta_3\,d\theta_2, \tag{16.22b} \]
\[ \omega_3 = \cos\theta_2\,d\theta_1 + d\theta_3. \tag{16.22c} \]
Notice, for example, that these forms are not independent for $\theta_1 = 0 = \theta_2 = \theta_3$, so the Jacobian of the map
\[ (\theta_1, \theta_2, \theta_3) \to A(\theta_1)B(\theta_2)A(\theta_3) \]
is zero at $\theta_1 = 0 = \theta_2 = \theta_3$. Now, the Lagrangian of the left-invariant variational problem is just
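\[ L = I_1(\sin\theta_2 \sin\theta_3\,\dot\theta_1 + \cos\theta_3\,\dot\theta_2)^2 + I_2(\sin\theta_2 \cos\theta_3\,\dot\theta_1 - \sin\theta_3\,\dot\theta_2)^2 + I_3(\cos\theta_2\,\dot\theta_1 + \dot\theta_3)^2. \]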
Thus $\theta_1$ does not appear explicitly in L. In fact, this is just the coordinate system chosen so that the infinitesimal generator of left translation by $A(\theta)$ is just $\partial/\partial\theta_1$. Note that the infinitesimal generator of right translation by $A(\theta)$ is just $\partial/\partial\theta_3$. Thus the condition that the rigid body be symmetric about the $x_3$-axis is that the Lagrangian not depend on $\theta_3$ either. Clearly, the condition for this is $I_1 = I_2$.

Let us now compute the Hamiltonian for the Lagrangian $L = I_1y_1^2 + I_2y_2^2 + I_3y_3^2$. Suppose the $L_{n+i}$ and $L'_{n+i}$ are such that
\[ dL = L_{n+i}\,dy_i + \cdots; \qquad dL = L'_{n+i}\,d\dot\theta_i + \cdots. \]
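Now $L_{n+i} = 2I_iy_i$ (no summation). The $\dot\theta_i$ are related to the $y_i$ by solving (16.22) and making the substitutions $\omega \to y$, $d\theta \to \dot\theta$:
\[ \dot\theta_1 = \frac{\sin\theta_3\,y_1 + \cos\theta_3\,y_2}{\sin\theta_2}, \qquad \dot\theta_2 = \cos\theta_3\,y_1 - \sin\theta_3\,y_2, \qquad \dot\theta_3 = y_3 - \cot\theta_2\,(\sin\theta_3\,y_1 + \cos\theta_3\,y_2). \]
Thus,
\[ L_{n+1} = L'_{n+1}\,\frac{\sin\theta_3}{\sin\theta_2} + L'_{n+2}\cos\theta_3 - L'_{n+3}\cot\theta_2 \sin\theta_3, \]
\[ L_{n+2} = L'_{n+1}\,\frac{\cos\theta_3}{\sin\theta_2} - L'_{n+2}\sin\theta_3 - L'_{n+3}\cot\theta_2 \cos\theta_3, \]
\[ L_{n+3} = L'_{n+3}. \]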
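The inversion of (16.22) can be sanity-checked by a round trip. The sketch below is illustrative only: it maps a hypothetical set of angular rates to $(y_1, y_2, y_3)$ using (16.22) with $\omega \to y$, $d\theta \to \dot\theta$, then maps back with the solved formulas, away from the singular locus $\sin\theta_2 = 0$. Function names and sample values are assumptions of the sketch.

```python
import math

def y_from_rates(theta, rate):
    # (16.22) with omega -> y and d(theta) -> theta-dot; theta1 does not enter.
    _, t2, t3 = theta
    r1, r2, r3 = rate
    return (math.sin(t2) * math.sin(t3) * r1 + math.cos(t3) * r2,
            math.sin(t2) * math.cos(t3) * r1 - math.sin(t3) * r2,
            math.cos(t2) * r1 + r3)

def rates_from_y(theta, y):
    # The solved form; requires sin(theta2) != 0.
    _, t2, t3 = theta
    y1, y2, y3 = y
    u = math.sin(t3) * y1 + math.cos(t3) * y2
    return (u / math.sin(t2),
            math.cos(t3) * y1 - math.sin(t3) * y2,
            y3 - u * math.cos(t2) / math.sin(t2))

theta = (0.3, 1.2, -0.7)   # hypothetical Euler angles
rate = (0.5, -1.1, 2.0)    # hypothetical angular rates
recovered = rates_from_y(theta, y_from_rates(theta, rate))
roundtrip_err = max(abs(a - b) for a, b in zip(rate, recovered))
print(roundtrip_err)
```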
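Now, if $p_i = L_{n+i}$, the Hamiltonian H is just L written in terms of the p (since L is a quadratic Lagrangian):
\[ H = \frac{p_1^2}{4I_1} + \frac{p_2^2}{4I_2} + \frac{p_3^2}{4I_3}. \]
If $p_i' = L'_{n+i}$, we know from earlier work that the Hamiltonian for L in the $\theta_1, \theta_2, \theta_3$ coordinate system is obtained by simply substituting in the values of $p_i = L_{n+i}$ in terms of $p_i' = L'_{n+i}$:
\[ H = \frac{1}{4I_1}\left(p_1'\,\frac{\sin\theta_3}{\sin\theta_2} + p_2'\cos\theta_3 - p_3'\cot\theta_2 \sin\theta_3\right)^2 + \frac{1}{4I_2}\left(p_1'\,\frac{\cos\theta_3}{\sin\theta_2} - p_2'\sin\theta_3 - p_3'\cot\theta_2 \cos\theta_3\right)^2 + \frac{p_3'^2}{4I_3}. \]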
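For example, look again at the case $I_1 = I_2$; that is, suppose that the variational problem is invariant under left and right translation by $A(\theta)$. Then
\[ H = \frac{1}{4I_1}\left[\frac{(p_1' - p_3'\cos\theta_2)^2}{\sin^2\theta_2} + p_2'^2\right] + \frac{p_3'^2}{4I_3}. \]
As predicted by the general theory, the Hamiltonian does not depend on $\theta_1$ and $\theta_3$, so $p_1'$ and $p_3'$ are constants, say, $C_1$ and $C_3$. Then the Hamilton equations for $\theta_2$, $p_2'$ give
\[ \frac{d\theta_2}{dt} = \frac{1}{2I_1}\,p_2'. \]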
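Also, we know from “conservation of energy” (that is, the fact that the Hamiltonian is a constant of motion) that
\[ \frac{1}{4I_1}\left(\frac{C_1^2}{\sin^2\theta_2} + 4I_1^2\left(\frac{d\theta_2}{dt}\right)^2 + C_3^2\cot^2\theta_2 - \frac{2C_1C_3\cos\theta_2}{\sin^2\theta_2}\right) + \frac{C_3^2}{4I_3} = \text{constant} = E, \]
or
\[ \left(\frac{d\theta_2}{dt}\right)^2 = -\frac{1}{4I_1^2\sin^2\theta_2}\,(C_1^2 + C_3^2\cos^2\theta_2 - 2C_1C_3\cos\theta_2) + \frac{E - (C_3^2/4I_3)}{I_1}. \]
Change variables to $x = \cos\theta_2$. Then
\[ I_1^2\left(\frac{dx}{dt}\right)^2 = \left(E - \frac{C_3^2}{4I_3}\right)I_1(1 - x^2) - \tfrac{1}{4}(C_1 - C_3x)^2, \]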
which can certainly be solved in elementary terms without elliptic functions. The most important point is to compute the roots of the second-degree polynomial on the right-hand side. The solution will then oscillate between these limits. If our rigid body is a top, $\theta_2$ will measure the angle between its $x_3$-axis (in a coordinate system fixed in the top) and the fixed space $x_3$-axis. This leads to the typical rising and falling motion of the top. In fact, we see that we can add a potential-energy term of the form $V(\cos\theta_2)$ to our Lagrangian without affecting this qualitative picture of the motion. By the general principles, this merely adds $V(\cos\theta_2)$ to the Hamiltonian: $p_1'$ and $p_3'$ remain constants of motion, and x(t) is again determined by an equation of the form
\[ \left(\frac{dx}{dt}\right)^2 = f(x). \]
Hence the solution will oscillate between two roots of f(x) if it starts out between them. For example, if $V(\cos\theta_2) = a\cos\theta_2$ (corresponding to the example of the “heavy symmetrical top,” which is found in every textbook on rigid-body mechanics; for example, Goldstein [1]), the effect is to make f(x) a cubic polynomial in x, requiring the introduction of elliptic functions. Once we have found $x(t) = \cos\theta_2(t)$, the other Euler angles $\theta_1$, $\theta_3$ can be found by a quadrature from the Hamilton equations:
\[ \frac{d\theta_3}{dt} = \frac{\partial H}{\partial p_3'} = \frac{1}{4I_1}\left(\frac{2C_3x^2 - 2C_1x}{1 - x^2}\right) + \frac{C_3}{2I_3}. \]
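For the force-free symmetric case, the limits of the oscillation are the roots of a quadratic. The sketch below is illustrative only and assumes the reduced equation $I_1^2(dx/dt)^2 = (E - C_3^2/4I_3)\,I_1(1 - x^2) - \tfrac{1}{4}(C_1 - C_3x)^2$, which is what energy conservation gives for the Lagrangian normalized as $L = \sum I_iy_i^2$ with $I_1 = I_2$; the numerical constants are hypothetical.

```python
import math

def reduced_rhs(x, i1, i3, c1, c3, energy):
    # Right-hand side of the assumed reduced equation for x = cos(theta2).
    kappa = i1 * (energy - c3 * c3 / (4 * i3))
    return kappa * (1 - x * x) - (c1 - c3 * x) ** 2 / 4

def turning_points(i1, i3, c1, c3, energy):
    # Roots of the quadratic a*x^2 + b*x + c obtained by expanding reduced_rhs.
    kappa = i1 * (energy - c3 * c3 / (4 * i3))
    a = -(kappa + c3 * c3 / 4)
    b = c1 * c3 / 2
    c = kappa - c1 * c1 / 4
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real turning points for these constants")
    r = math.sqrt(disc)
    lo, hi = sorted(((-b - r) / (2 * a), (-b + r) / (2 * a)))
    return lo, hi

params = dict(i1=1.0, i3=2.0, c1=1.0, c3=0.5, energy=2.0)  # hypothetical values
x_min, x_max = turning_points(**params)
midpoint_value = reduced_rhs((x_min + x_max) / 2, **params)
print(x_min, x_max, midpoint_value)
```

Since the right-hand side is nonpositive at $x = \pm 1$, real roots always lie in the interval $[-1, 1]$, and the motion takes place between them where the right-hand side is positive.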
We can now summarize the qualitative features of our discussion: We have considered variational problems of Newtonian type on a manifold M that admits a relatively large group, namely, left and right translation by SO(3, R). However, the largest Abelian subgroups of this group are just two-dimensional. The Euler angles define a coordinate system in which one of these two-dimensional Abelian subgroups takes its normal form, this form being the natural coordinate system for discussing variational problems that admit the Abelian groups as symmetry groups. (All these two-dimensional Abelian groups are conjugate within the big group, so the seemingly arbitrary choice of one of them really does not matter. In terms of the theory of compact Lie groups, these two-dimensional Abelian subgroups are Cartan subgroups of SO(3, R) x SO(3, R).) These variational problems are “integrable by quadratures,” and in fact form most of the classical problems of rigid-body mechanics that have been found to be “integrable by quadratures” (except for the case discovered by S. Kovalewska, which does not seem to be explicable group-theoretically; see Golubev [1] for a full discussion). The case of a variational problem admitting left translation by SO(3, R) as a symmetry group (the rigid body with no external forces) seems to be a typical problem admitting a non-Abelian group of symmetries, a class of problems on which more research needs to be done. (The maximal Abelian subgroups of SO(3, R) are just one-dimensional, that is, the one-parameter subgroups, so that left invariance provides only a one-dimensional Abelian group of symmetries, which is not enough for “integrability by quadratures.”) In fact, the group-theoretic properties of SO(3, R) are connected with the basic properties of the elliptic functions, as we try to show in Chapter 17, but it is not yet possible to put this connection into definitive form.
Finally, we want to describe how the parametrization of SO(3, R) by the Euler angles fits into the general theory of Lie groups, particularly the theory of symmetric spaces. (In this paragraph, we shall be using Helgason’s book [1]
as a basic reference, and shall assume that the reader is familiar with the general notions found there.) Let G be a connected Lie group, and let $s: G \to G$ be an automorphism of G such that $s^2$ = identity. (s is then called an involutive automorphism.) Let
\[ K = \{g \in G : s(g) = g\}. \]
Then K is a closed subgroup of G, called a symmetric subgroup of G, and G/K is called a symmetric homogeneous space. We shall deal here with the case: K compact. (G/K is then called a Riemannian symmetric homogeneous space.) Now s defines an automorphism of $\mathbf{G}$, the Lie algebra of G, that will also be denoted by s. For example, this can be seen by identifying $\mathbf{G}$ with the set of one-parameter subgroups of G. If $t \to g(t)$ is such a one-parameter subgroup, its transform by s is the one-parameter subgroup $t \to s(g(t))$. We see, then, that
\[ s^2(X) = X \quad \text{for all } X \in \mathbf{G}. \]
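Since s is a linear transformation of $\mathbf{G}$, we can split $\mathbf{G}$ as the direct sum $\mathbf{K} \oplus \mathbf{P}$, with
\[ \mathbf{K} = \{X \in \mathbf{G} : s(X) = X\}, \qquad \mathbf{P} = \{X \in \mathbf{G} : s(X) = -X\}. \]
From the fact that s is an automorphism of $\mathbf{G}$, we see that
\[ \operatorname{Ad} K(\mathbf{P}) \subset \mathbf{P}, \qquad [\mathbf{P}, \mathbf{P}] \subset \mathbf{K}. \]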
Thus Ad K induces a linear representation on $\mathbf{P}$ (which is essentially equivalent to the linear isotropy group of the homogeneous space G/K). Let $\mathbf{A}$ be a maximal Abelian subalgebra of $\mathbf{P}$. One basic theorem of the theory of symmetric spaces is that $\operatorname{Ad} K(\mathbf{A}) = \mathbf{P}$. $\mathbf{A}$ is called a Cartan subalgebra of the symmetric space G/K. (All maximal Abelian subalgebras of $\mathbf{P}$ are conjugate under Ad K.) Now let $P = \exp(\mathbf{P}) \subset G$. The exponential map of $\mathbf{G} \to G$ usually has singularities. Thus it is a remarkable fact that P can be shown to be a closed submanifold of G. In fact P is the connected component containing the identity of
\[ \{g \in G : s(g) = g^{-1}\}. \]
Elements of P are called transvections of the symmetric space G/K. Let $A = \exp(\mathbf{A}) \subset P$. Again it can be shown that A is a closed submanifold of P (diffeomorphic to a multidimensional torus if G is compact, to a Euclidean space if G is noncompact, and if K is a maximal compact subgroup of G). In a sense, this “flat” submanifold A is the “core” of the symmetric space; many
of the important geometric and group-theoretic facts about G/K can be reconstructed from knowledge of A and a certain finite group acting on A, the Weyl group. (The Weyl group can be defined as follows: Let N(A, K) and C(A, K) be, respectively, the normalizer and centralizer of A in K; that is,
\[ N(A, K) = \{k \in K : \operatorname{Ad} k(A) = A\}, \qquad C(A, K) = \{k \in K : kak^{-1} = a \text{ for all } a \in A\}. \]
Then C(A, K) and N(A, K) are closed subgroups of K, C(A, K) is a normal subgroup of N(A, K), and the Weyl group is the quotient group N(A, K)/C(A, K). A more geometric way of looking at this is to notice that each $k \in N(A, K)$ induces a transformation, namely, Ad k, on A, and C(A, K) is just the subgroup of those elements that act trivially on A.) Now the relation $\operatorname{Ad} K(\mathbf{A}) = \mathbf{P}$ implies $\operatorname{Ad} K(A) = P$. Further, it can be proved that
\[ G = P \cdot K. \]
Thus,
\[ G = KAK^{-1}K = KAK, \]
that is, the map $\alpha: K \times A \times K \to G$ such that $\alpha(k, a, k') = kak'$ for $k, k' \in K$, $a \in A$ is onto G. Now C(A, K) can be made to act as a transformation group on $K \times A \times K$ as follows:
\[ c \cdot (k, a, k') = (kc^{-1}, a, ck') \quad \text{for } c \in C(A, K),\ k, k' \in K,\ a \in A. \]
Notice that
\[ \alpha(c \cdot (k, a, k')) = \alpha(kc^{-1}, a, ck') = kc^{-1}ack' = kac^{-1}ck' = \alpha(k, a, k'); \]
that is, $\alpha$ maps each orbit of C(A, K) into a point. Also, C(A, K) acts on $K \times A \times K$ in such a way that no element except the identity transformation has a fixed point. Hence the orbit space $C(A, K) \backslash K \times A \times K$ is a manifold; $\alpha$ passes to the quotient to define a map of the orbit space onto G, which we shall denote by $\bar\alpha$. Now the orbit space and G have the same dimension: $\bar\alpha$ is not quite a diffeomorphism (this would be impossible topologically, if for no other reason), but the points of G that are regular with respect to $\bar\alpha$† are sufficiently plentiful in the sense that their complement in G is the union of a finite number of submanifolds of lower dimension.
† If $\phi: M \to M'$ is a map of manifolds of the same dimension, a point $p' \in M'$ is regular with respect to $\phi$ if $\phi$ has nonzero Jacobian at each point of $\phi^{-1}(p')$. If $\phi$ is onto, a basic general theorem on the theory of manifolds says that the complement in $M'$ of the set of regular elements is of measure zero.
As an example, we can apply this construction to the case G = SO(3, R), K = one-parameter group of rotations about the $x_3$-axis. Explicitly,
\[ K = \left\{ A(\theta) = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} : 0 \le \theta < 2\pi \right\}. \]
We want to exhibit the involutive automorphism of G that exhibits K as a symmetric subgroup and G/K as a symmetric homogeneous space.† Since K is a one-parameter group, a reasonable choice is just Ad of an element of K of order 2; namely,
\[ s(g) = A(\pi)\,g\,A(-\pi) \quad \text{for } g \in SO(3, R). \]
We shall leave it to the reader to show that this choice does the job. Let us compute P:
\[ P = \{g \in G : A(\pi)\,g\,A(-\pi) = g^{-1}\}. \]
Consider for the moment that matrices define linear transformations on 3-vectors, a 3-vector being denoted by v. Any rotation in 3-space admits one and, up to a constant multiple, only one invariant vector, say, $gv = v$. Then
\[ A(\pi)v = A(\pi)gv = g^{-1}A(\pi)v \quad \text{or} \quad g(A(\pi)v) = A(\pi)v. \]
Thus, $A(\pi)v = \pm v$.

Case 1. $A(\pi)v = v$. Then v is the same invariant vector as that of the whole one-parameter group $\theta \to A(\theta)$; hence g commutes with each $A(\theta)$, or $g^2 = A(-2\pi)$ = identity matrix.

Case 2. $A(\pi)v = -v$. Then v is perpendicular to the invariant vector of the one-parameter group $\theta \to A(\theta)$, that is, lies in the $(x_1, x_2)$-plane. Since we are interested only in the connected component of P containing the identity element, we see that Case 2 is the only relevant one, and P can be
† In fact, G/K is just the two-sphere (that is, a Riemannian manifold of constant curvature). This fits in with the alternate geometric definition of a Riemannian symmetric space (modulo certain global complications) as a Riemannian manifold whose sectional curvatures are invariant under parallel translation.
considered as the set of rotations about axes lying in the $(x_1, x_2)$-plane. In particular, it contains the one-parameter group
\[ \theta \to B(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix} \]
of rotations about the $x_1$-axis. Now the centralizer of A in K is the identity; hence the map $(\theta_1, \theta_2, \theta_3) \to A(\theta_1)B(\theta_2)A(\theta_3)$, which defined the Euler angle parametrization of SO(3, R), is essentially just the construction $\alpha: K \times A \times K \to G$ outlined for the general symmetric case.† (Another more qualitative way of putting this is to say that in this case K and A turn out to be one-dimensional (in fact, circles); hence $K \times A \times K$ can be described by three angular parameters. The specific choices we made are unimportant, since any two choices are related by a conjugacy.)
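The claims about the automorphism $s(g) = A(\pi)\,g\,A(-\pi)$ can be spot-checked numerically. The sketch below is illustrative only (helper names are assumptions): it confirms, for sample angles, that s is involutive, that s fixes the rotations $A(\theta)$ making up K, and that s sends $B(\theta)$ to its inverse, so that $B(\theta)$ lies in the set $\{g : s(g) = g^{-1}\}$.

```python
import math

def A(t):
    # Rotation about the x3-axis.
    c, s = math.cos(t), math.sin(t)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def B(t):
    # Rotation about the x1-axis.
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

def s(g):
    # Conjugation by the order-2 element A(pi) of K.
    return matmul(matmul(A(math.pi), g), A(-math.pi))

g = matmul(matmul(A(0.4), B(1.1)), A(-0.9))   # a sample rotation
involution_err = max_abs_diff(s(s(g)), g)      # s(s(g)) = g
fixes_K_err = max_abs_diff(s(A(0.8)), A(0.8))  # s fixes K pointwise
# For a rotation the inverse is the transpose, so this tests s(B) = B^{-1}.
inverts_B_err = max_abs_diff(s(B(0.8)), transpose(B(0.8)))
print(involution_err, fixes_K_err, inverts_B_err)
```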
Exercises

1. Prove the last statement of Theorem 16.1.

2. Suppose $\phi$ is a diffeomorphism of $R^{2n}$ such that $\phi^*: F(R^{2n}) \to F(R^{2n})$ is a Lie algebra homomorphism relative to Poisson bracket. Prove that $\phi$ is a canonical transformation.

3. Investigate (using Theorem 16.6 and (16.7)) the solutions of the two- and three-body problems of celestial mechanics that are also orbits of one-parameter groups of symmetries.

4. Verify the formula given for the “Routhian.”

5. Suppose G is a Lie group and $(\omega_i)$, $1 \le i \le n$, are a basis for the left-invariant forms on G. Let $y_i$ be the functions on T(G) such that $y_i(v) = \omega_i(v)$ for $v \in T(G)$. Let $L = I_iy_i^2$ define a Lagrangian and a variational problem on G. Work out the general conditions that a polynomial $f(y_1, \ldots, y_n)$ be an integral of motion. (Hint: Is there a relation with the Casimir operators of the universal-enveloping algebra of the Lie algebra of G? See Hermann [8] for the notion.)

6. Prove Formula (16.22).

† I owe these remarks concerning the general setting of the Euler angle construction to C. C. Moore.
17 Elliptic Functions

Unaccountably, the theory of elliptic functions has virtually disappeared from recent mathematics or physics literature, despite the fact that it is amazingly rich in structure, theorems, and mathematical or physical intuition. Of course we cannot hope to give the subject the systematic treatment it needs, and shall limit ourselves to some properties that follow from the fact that they can be defined as the functions describing rigid-body motion. Our treatment by means of differential equations then follows up an idea briefly sketched by Tricomi in his book on differential equations [1, pp. 19-26]. The most readily accessible treatment of elliptic functions along classical lines can be found in Whittaker and Watson [1], although they neglect the geometric side of the theory.

Recall that the problem of motion of a rotating rigid body with no external forces leads to differential equations of the form:
\[ I_1\,\frac{dy_1}{dt} = (I_3 - I_2)\,y_2y_3, \tag{17.1a} \]
\[ I_2\,\frac{dy_2}{dt} = (I_1 - I_3)\,y_1y_3, \tag{17.1b} \]
\[ I_3\,\frac{dy_3}{dt} = (I_2 - I_1)\,y_1y_2. \tag{17.1c} \]
We have seen that the underlying rigid-body problem has two algebraic integrals, namely, those of “energy” and “total angular momentum”:
\[ I_1y_1^2 + I_2y_2^2 + I_3y_3^2 = c \quad (= \text{constant}), \tag{17.2} \]
\[ I_1^2y_1^2 + I_2^2y_2^2 + I_3^2y_3^2 = m \quad (= \text{constant}). \tag{17.3} \]
It can easily be verified directly from (17.1) that (17.2) and (17.3) are indeed integrals; that is, they are constant along solutions of (17.1). We shall suppose that $I_1 \neq I_2 \neq I_3 \neq 0$. We have already seen that if this were not satisfied (for example, if two of the I were equal), then (17.1) could be solved in terms of sines and cosines. If, on the other hand, one of the I is zero, it is clear that (17.1) can be solved in terms of exponentials. Finally, we are not necessarily assuming that the I are positive (as they are in the rigid-body problem). One of the variables can be eliminated from (17.2) and (17.3) to obtain algebraic relations among the other two:
\[ (I_1I_2 - I_2^2)\,y_2^2 + (I_1I_3 - I_3^2)\,y_3^2 = I_1c - m, \tag{17.4} \]
\[ (I_2I_1 - I_1^2)\,y_1^2 + (I_2I_3 - I_3^2)\,y_3^2 = I_2c - m, \tag{17.5} \]
\[ (I_3I_1 - I_1^2)\,y_1^2 + (I_3I_2 - I_2^2)\,y_2^2 = I_3c - m. \tag{17.6} \]
These can be substituted into (17.1) to actually “solve” (17.1). For example,
\[ \left(\frac{dy_1}{dt}\right)^2 = (\alpha y_1^2 + \beta)(\gamma y_1^2 + \delta) \]
for suitable constants $\alpha, \beta, \gamma, \delta$, or
\[ \int \frac{dy_1}{\sqrt{(\alpha y_1^2 + \beta)(\gamma y_1^2 + \delta)}} = t + \text{constant}. \]
The integral on the left is an “elliptic integral,” so this solution does us little good in practice. In fact we are usually interested in the reverse process, namely, inverting an elliptic integral to make it part of a system of the type of (17.1). The remarkable property of system (17.1) is that any system of solutions $(y_1(t), y_2(t), y_3(t))$ satisfies an algebraic “addition formula” that is independent of $I_1, I_2, I_3$, namely,

(17.7)

Two similar identities are obtained by permuting $y_1$, $y_2$, and $y_3$. Further, (17.4) through (17.6) can be used to obtain an algebraic formula connecting, say, $y_2(s + t)$ to $y_i(t)$ and $y_i(s)$, for i = 1, 2, 3. To prove (17.7), one has only to apply the differential operator $(\partial/\partial t) - (\partial/\partial s)$ to the left-hand side of (17.7) to verify by direct computation that it vanishes when combined with (17.1) through (17.6); hence it is a function of (s + t). The function given on the right-hand side is obtained by setting t = 0. The solutions of (17.1) with special choices of the adjustable parameters have explicit names, the Jacobian elliptic functions:
\[ y_1(t) = \operatorname{sn} t, \qquad y_2(t) = \operatorname{cn} t, \qquad y_3(t) = \operatorname{dn} t. \tag{17.8} \]
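for
\[ \frac{I_3 - I_2}{I_1} = 1, \qquad \frac{I_1 - I_3}{I_2} = -1, \qquad \frac{I_2 - I_1}{I_3} = -k^2, \qquad y_1(0) = 0, \quad y_2(0) = 1, \quad y_3(0) = 1. \]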
Putting these values in (17.2) through (17.7) gives the classical addition formulas for the Jacobian elliptic functions, the treatment of which can be found in complete detail in all the reference books. For example,
\[ \operatorname{sn}(s + t) = \frac{\operatorname{sn}(s)\operatorname{cn}(t)\operatorname{dn}(t) + \operatorname{cn}(s)\operatorname{sn}(t)\operatorname{dn}(s)}{1 - k^2\operatorname{sn}^2(s)\operatorname{sn}^2(t)}, \]
where k is a free parameter, so really the Jacobian functions depend on t and k, but it is customary not to express this explicitly. Let us return to the study of system (17.1).
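The addition formula for sn can be tested directly against the differential equations. The following sketch is illustrative only: it uses the standard system $\operatorname{sn}' = \operatorname{cn}\operatorname{dn}$, $\operatorname{cn}' = -\operatorname{sn}\operatorname{dn}$, $\operatorname{dn}' = -k^2\operatorname{sn}\operatorname{cn}$ with initial values (0, 1, 1), integrates it with a Runge-Kutta scheme (the function name `jacobi` and the step count are assumptions of the sketch), and compares both sides of the formula for sample values of s, t, and k.

```python
def jacobi(t, k, steps=4000):
    # Returns approximations to (sn t, cn t, dn t) for modulus k by
    # integrating sn' = cn*dn, cn' = -sn*dn, dn' = -k^2*sn*cn from t = 0.
    def rhs(y):
        return (y[1] * y[2], -y[0] * y[2], -k * k * y[0] * y[1])

    def shifted(y, d, a):
        return tuple(yi + a * di for yi, di in zip(y, d))

    y = (0.0, 1.0, 1.0)
    h = t / steps
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(shifted(y, k1, h / 2))
        k3 = rhs(shifted(y, k2, h / 2))
        k4 = rhs(shifted(y, k3, h))
        y = tuple(yi + h * (a + 2 * b + 2 * c + d) / 6
                  for yi, a, b, c, d in zip(y, k1, k2, k3, k4))
    return y

k, s, t = 0.6, 0.7, 0.5   # hypothetical sample values
sn_s, cn_s, dn_s = jacobi(s, k)
sn_t, cn_t, dn_t = jacobi(t, k)
lhs = jacobi(s + t, k)[0]
rhs_value = ((sn_s * cn_t * dn_t + cn_s * sn_t * dn_s)
             / (1 - k * k * sn_s ** 2 * sn_t ** 2))
addition_err = abs(lhs - rhs_value)
print(addition_err)
```

The same integrator also exhibits the algebraic relations $\operatorname{sn}^2 + \operatorname{cn}^2 = 1$ and $\operatorname{dn}^2 + k^2\operatorname{sn}^2 = 1$, which are the counterparts of (17.4) through (17.6) for this choice of parameters.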
LEMMA 17.1 If any two functions among those constituting a solution $(y_1(t), y_2(t), y_3(t))$ of (17.1) vanish for one value of t, then the three functions are constant.

Proof. Let us suppose, say, that $y_1(t_0) = 0 = y_2(t_0)$. Define $(y_1^*(t), y_2^*(t), y_3^*(t))$ as follows:
\[ y_1^*(t) = 0, \qquad y_2^*(t) = 0, \qquad y_3^*(t) = y_3(t_0). \]
Notice that they define a solution of (17.1) which satisfies the same initial conditions at $t = t_0$ as does our original solution. By the uniqueness theorem for ordinary differential equations, they must coincide. Q.E.D.

In studying the properties of a system of differential equations such as (17.1), it is often a good practice to start by finding how the system behaves when transformed by various groups of transformations of the underlying space. We shall now do so, considering only the simplest group that seems interesting. (A more systematic treatment would be very interesting, but would carry us too far afield.) Let us begin by rewriting the differential equations (17.1) as a Pfaffian system:
\[ 0 = \omega_1 = I_1\,dy_1 - (I_3 - I_2)\,y_2y_3\,dt, \tag{17.9a} \]
\[ 0 = \omega_2 = I_2\,dy_2 - (I_1 - I_3)\,y_1y_3\,dt, \tag{17.9b} \]
\[ 0 = \omega_3 = I_3\,dy_3 - (I_2 - I_1)\,y_1y_2\,dt. \tag{17.9c} \]
We shall consider only the group of linear transformations of $(y_1, y_2, y_3, t)$-space that are dilations, that is, that multiply the coordinates by constants. Thus, if $\phi$ is such a transformation,
\[ \phi^*(t) = \lambda t, \qquad \phi^*(y_i) = \lambda_iy_i \quad \text{for } i = 1, 2, 3 \ (\text{no summation}). \]
Consider another system of the same form as (17.9):
\[ \omega_1' = I_1'\,dy_1 - (I_3' - I_2')\,y_2y_3\,dt, \tag{17.10a} \]
\[ \omega_2' = I_2'\,dy_2 - (I_1' - I_3')\,y_1y_3\,dt, \tag{17.10b} \]
\[ \omega_3' = I_3'\,dy_3 - (I_2' - I_1')\,y_1y_2\,dt. \tag{17.10c} \]
The condition that $\phi$ carry the integral curves of (17.10) into integral curves of (17.9) is that the $\phi^*(\omega_i)$ be linear combinations of the $\omega'$, that is, that we have a relation of the form
\[ \phi^*(\omega_i) = \sum_{j=1}^{3} a_{ij}\,\omega_j' \quad \text{for } i = 1, 2, 3. \]
Comparing the coefficients of the dy, we have
\[ a_{ij} = 0 \ \text{if } i \neq j, \qquad \lambda_iI_i = a_{ii}I_i', \]
or
\[ a_{ii} = \frac{\lambda_iI_i}{I_i'} \quad (\text{no summation}). \]
Thus,
\[ I_1'(I_3 - I_2)\,\lambda_2\lambda_3\lambda = \lambda_1I_1(I_3' - I_2'), \tag{17.11a} \]
\[ I_2'(I_1 - I_3)\,\lambda_1\lambda_3\lambda = \lambda_2I_2(I_1' - I_3'), \tag{17.11b} \]
\[ I_3'(I_2 - I_1)\,\lambda_1\lambda_2\lambda = \lambda_3I_3(I_2' - I_1'). \tag{17.11c} \]
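Notice that if the I and I' are prescribed, one of the $\lambda_1, \lambda_2, \lambda_3, \lambda$ can be prescribed arbitrarily. If $I_1 = I_1'$, $I_2 = I_2'$, $I_3 = I_3'$, then (17.11) holds if and only if
\[ \lambda_2\lambda_3\lambda = \lambda_1, \qquad \lambda_1\lambda_3\lambda = \lambda_2, \qquad \lambda_1\lambda_2\lambda = \lambda_3. \tag{17.12} \]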
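Two simple dilations of this kind can be tried out numerically. With all $I_i' = I_i$, comparing coefficients in (17.11) requires $\lambda_1 = \lambda\lambda_2\lambda_3$ together with its two companions; these hold for $\lambda_1 = \lambda_2 = \lambda_3 = \mu$, $\lambda = 1/\mu$, and for the sign change $\lambda = \lambda_1 = 1$, $\lambda_2 = \lambda_3 = -1$. The sketch below is illustrative only (names, constants, and the Runge-Kutta integrator are assumptions): it checks that $t \to \mu\,y(\mu t)$ and $t \to (y_1, -y_2, -y_3)(t)$ are again solutions of (17.1), taken in the form given by (17.9).

```python
def solve(y0, inertia, t, steps=4000):
    # Integrates I1*y1' = (I3 - I2)*y2*y3 (and cyclically) from 0 to t.
    i1, i2, i3 = inertia

    def rhs(y):
        return ((i3 - i2) * y[1] * y[2] / i1,
                (i1 - i3) * y[0] * y[2] / i2,
                (i2 - i1) * y[0] * y[1] / i3)

    def shifted(y, d, a):
        return tuple(yi + a * di for yi, di in zip(y, d))

    y, h = tuple(y0), t / steps
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(shifted(y, k1, h / 2))
        k3 = rhs(shifted(y, k2, h / 2))
        k4 = rhs(shifted(y, k3, h))
        y = tuple(yi + h * (a + 2 * b + 2 * c + d) / 6
                  for yi, a, b, c, d in zip(y, k1, k2, k3, k4))
    return y

inertia = (1.0, 2.0, 3.0)   # hypothetical constants
y0 = (0.3, 1.0, 0.5)        # hypothetical initial data
mu, T = 1.7, 0.8

# Scaling: the solution starting at mu*y0 equals mu * y(mu * T).
direct = solve(tuple(mu * c for c in y0), inertia, T)
scaled = tuple(mu * c for c in solve(y0, inertia, mu * T))
scaling_err = max(abs(a - b) for a, b in zip(direct, scaled))

# Sign change: flipping y2 and y3 maps solutions to solutions.
flipped = solve((y0[0], -y0[1], -y0[2]), inertia, T)
plain = solve(y0, inertia, T)
flip_err = max(abs(flipped[0] - plain[0]),
               abs(flipped[1] + plain[1]),
               abs(flipped[2] + plain[2]))
print(scaling_err, flip_err)
```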
Now, obviously system (17.1) is preserved under time translation. Thus, if one function of a triple $(y_1, y_2, y_3)$ that solves (17.1) vanishes at some value of t, then combining a permutation of $y_1, y_2, y_3$, a transformation of type (17.11), and a time translation will send the given solution into the Jacobian elliptic functions (possibly needing complex values for the parameters of the transformation). Thus, the first problem is to find those solutions.
LEMMA 17.2 Let $(y_1(t), y_2(t), y_3(t))$ be a nonconstant solution of (17.1) defined for $a \le t \le 2b$ such that $y_1(a) = y_1(b) = 0$. Then, both $y_2$ and $y_3$ must have a zero in the interval $a < t < 2b$.

Proof. Suppose otherwise: For example, suppose that $y_2(t) \neq 0$ for $a \le t \le 2b$. Without loss in generality we may suppose that $a = 0$. Then (17.7) takes the form

$$\frac{y_1(s + t)\,y_3(0)}{y_2(s + t) + y_2(0)} = \frac{y_1(t)\,y_3(s) + y_3(t)\,y_1(s)}{y_2(t) + y_2(s)}.$$
By Lemma 17.1, $y_3(0) \neq 0$. Since the denominators are nonzero, $y_1(2b) = 0$. Hence, putting $s = 2b - t$, we have

$$y_1(t)\,y_3(2b - t) + y_3(t)\,y_1(2b - t) = 0.$$

Equation (17.6) gives a relation of the form $y_3^2 = \alpha y_1^2 + \beta$. Then

$$y_1(t)^2\,\bigl(\alpha\,y_1(2b - t)^2 + \beta\bigr) = \bigl(\alpha\,y_1(t)^2 + \beta\bigr)\,y_1(2b - t)^2,$$
or $\beta\,\bigl(y_1(t)^2 - y_1(2b - t)^2\bigr) = 0$. Hence $\beta = 0$, since $y_1$ is nonconstant. But then $y_3(0) = 0$, and Lemma 17.1 forces $y_1$ constant; contradiction.

LEMMA 17.3 Suppose that $(y_1(t), y_2(t), y_3(t))$ is a nonconstant solution of (17.1) defined over $-\infty < t < \infty$. Then at least one of the components of this solution must vanish at least once on $-\infty < t < \infty$.

Proof. Suppose otherwise: Then the derivatives of the components must also be everywhere nonzero. Since $I_1$, $I_2$, and $I_3$ are nonequal, at most one of the right-hand sides of (17.4) through (17.6) can vanish. Let us suppose, then, that two of the constants $I_i c - m$ are nonzero. Then

$$\left(\frac{dy_1}{dt}\right)^2 = (\alpha y_1^2 - \beta)(\gamma y_1^2 - \delta) \quad \text{with } \beta \neq 0,\ \delta \neq 0.$$

At most transforming the system by equations of the type of (17.11), we can suppose that $y_1(0) > 0$ and $(dy_1/dt)(0) < 0$.
Thus $y_1(t) > 0$ and $(dy_1/dt)(t) < 0$ for $0 \le t < \infty$; hence

$$t = \int_{y_1(t)}^{y_1(0)} \frac{dx}{\sqrt{(\alpha x^2 - \beta)(\gamma x^2 - \delta)}} < \int_{0}^{y_1(0)} \frac{dx}{\sqrt{(\alpha x^2 - \beta)(\gamma x^2 - \delta)}}.$$

But this integral converges, since at each possible singularity the integrand has a singularity of order $-\tfrac{1}{2}$ (since $\beta \neq 0$, $\delta \neq 0$), which gives the contradiction.

Hence, in studying a nonconstant solution $(y_1(t), y_2(t), y_3(t))$, we can suppose (after making a time translation and a permutation) that

$$y_1(0) = 0. \tag{17.13}$$
(At this point a further transformation can be made by throwing the solution onto the one defining the Jacobian elliptic functions, but we shall not be particularly concerned with that here.) By Lemma 17.1, and (17.4) and (17.5), the two corresponding constants of the form $I_i c - m$ are nonzero. Also, if (17.12) is satisfied,

$$y_1(-t) = -y_1(t); \qquad y_2(-t) = y_2(t); \qquad y_3(-t) = y_3(t). \tag{17.14}$$

To prove this, one can choose $A, A_1, A_2, A_3$ to make a change of variables such that

$$A_1 = -1 = A, \qquad A_2 = A_3 = 1.$$

By (17.11), $I_1 = I_1'$, $I_2 = I_2'$, $I_3 = I_3'$. Thus the new functions

$$z_1(t) = -y_1(-t), \qquad z_2(t) = y_2(-t), \qquad z_3(t) = y_3(-t)$$

satisfy the same system (17.1) as the old, with the same initial condition; whence, (17.14). Clearly we can use a change of variable of the type of (17.12) to suppose in addition that

$$y_2(0) > 0, \qquad y_3(0) > 0. \tag{17.15}$$
So far we have been proceeding without assumptions on the signs of $I_1$, $I_2$, and $I_3$. Now the global behavior of the solutions is radically different, depending on whether or not all the signs are the same. The case of like signs is the one that occurs in the force-free rigid-body problem; the case of unlike signs will be left as an exercise. We can suppose without any essential loss in generality that

$$I_1 > 0, \qquad I_2 > 0, \qquad I_3 > 0. \tag{17.16}$$
We shall now show that we can suppose (at most permuting $y_2$ and $y_3$) that

$$\frac{dy_2}{dt}(t) < 0 \quad \text{for sufficiently small } t > 0. \tag{17.17}$$

Suppose otherwise: Then,

$$I_2 \frac{dy_2}{dt} = (I_1 - I_3)\,y_1 y_3(t) > 0, \qquad I_3 \frac{dy_3}{dt} = (I_2 - I_1)\,y_1 y_2(t) > 0 \quad \text{for sufficiently small } t > 0.$$

Case 1. $(dy_1/dt)(0) > 0$. Then $y_1(t) > 0$ for small $t > 0$, and $I_3 > I_2$. Then, also, $I_1 > I_3$ and $I_2 > I_1$, which is a contradiction.

Case 2. $(dy_1/dt)(0) < 0$. Then $I_3 < I_2$. Hence, also, $I_1 < I_3$, $I_2 < I_1$, which is again a contradiction.

Now, let $K > 0$ be the first positive real zero of $y_2$ or $y_3$. By the mean-value theorem,

$$y_1(t) \neq 0 \quad \text{for } 0 < t \le K. \tag{17.18}$$

Further, if $(dy_3/dt)(t) > 0$ for $t > 0$ sufficiently small, then

$$y_2(K) = 0. \tag{17.19}$$
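We can now show how to compute $K$. For example, suppose that

$$y_2(K) = 0. \tag{17.20}$$

Then $(dy_2/dt)(t) < 0$ for $0 < t \le K$. We can solve (17.4) and (17.6) in the form

$$\frac{dy_2}{dt} = -\sqrt{(\alpha_2 y_2^2 - \beta_2)(\gamma_2 y_2^2 - \delta_2)} \quad \text{for } 0 \le t \le K.$$

Thus,

$$t = -\int_{y_2(0)}^{y_2(t)} \frac{dx}{\sqrt{(\alpha_2 x^2 - \beta_2)(\gamma_2 x^2 - \delta_2)}} \quad \text{for } 0 \le t \le K.$$

Hence,

$$K = \int_{0}^{y_2(0)} \frac{dx}{\sqrt{(\alpha_2 x^2 - \beta_2)(\gamma_2 x^2 - \delta_2)}}. \tag{17.21}$$

In particular, notice that $K < \infty$.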
Once we have found a $K$ that is a zero of $y_2(t)$, what can be done with it? The addition formula (17.7) enables us to extract considerably more information, as described in the following theorem.

THEOREM 17.4 Let $(y_1, y_2, y_3)$ be a nonconstant solution of (17.1) (with no assumptions on $I_1, I_2, I_3$ other than $I_1 \neq 0$, $I_2 \neq 0$, $I_3 \neq 0$); let $K \neq 0$ be such that $y_1(0) = 0 = y_2(K)$. Then

$$y_1(t + 2K) = -y_1(t), \qquad y_2(t + 2K) = -y_2(t), \qquad y_3(t + 2K) = y_3(t),$$

for all $t$ for which these make sense. In particular, $y_1$ and $y_2$ are periodic with period $4K$, while $y_3$ is periodic with period $2K$.

Proof. Let us rewrite (17.7) in the form
$$\bigl(y_2(t) + y_2(s)\bigr)\bigl(y_1(t + s)\,y_3(0)\bigr) = \bigl(y_2(t + s) + y_2(0)\bigr)\bigl(y_1(t)\,y_3(s) + y_3(t)\,y_1(s)\bigr).$$

Put $t = K = s$. Now either

$$y_2(2K) + y_2(0) = 0 \quad \text{or} \quad y_1(t)\,y_3(2K - t) + y_1(2K - t)\,y_3(t) = 0. \tag{17.22}$$

We shall rule out this second identity. Now (17.5) can be solved in the form

$$y_3^2 = \frac{I_2 c - m - (I_1 I_2 - I_1^2)\,y_1^2}{I_2 I_3 - I_3^2}.$$

Substituting in this would give $I_2 c = m$. In particular, $y_3(0) = 0$, which would contradict the nonconstancy of the solution. Now we can use (17.7) again, with a permutation of $y_1$, $y_2$, and $y_3$:

$$\bigl[y_1(t) + y_1(s)\bigr]\bigl[y_2(t + s)\,y_3(0) + y_3(t + s)\,y_2(0)\bigr] = \bigl[y_1(t + s) + y_1(0)\bigr]\bigl[y_2(t)\,y_3(s) + y_3(t)\,y_2(s)\bigr].$$

Putting $t = s = K$ makes the right-hand side zero; hence

$$0 = y_2(2K)\,y_3(0) + y_3(2K)\,y_2(0),$$

whence, using (17.22), we have

$$y_3(2K) = y_3(0). \tag{17.23}$$

Similarly, playing with the other permutations proves that

$$y_1(2K) = 0. \tag{17.24}$$

Now put
$$z_1(t) = -y_1(t), \qquad z_2(t) = -y_2(t), \qquad z_3(t) = y_3(t).$$

Notice that $(z_1(t), z_2(t), z_3(t))$ is also a solution of (17.1), which has the same initial conditions at $t = 2K$ as does $(y_1(t), y_2(t), y_3(t))$ at $t = 0$. Thus

$$-y_1(t) = y_1(t + 2K), \qquad -y_2(t) = y_2(t + 2K), \qquad y_3(t) = y_3(t + 2K),$$

which proves the theorem.

Remark. Notice that Theorem 17.4 is purely formal and holds for the complex variable case as well. However, to consider this extension to complex variables would take us too far afield, and we must refer to Whittaker and Watson [1].
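The half-period relations of Theorem 17.4 can be observed numerically. The sketch below is only an illustration under stated assumptions: the values $I_1 = 1$, $I_2 = 2$, $I_3 = 3$ and the initial data $(0, 1, 1)$ are chosen for convenience and are not taken from the text, and the formula used for $K$ is specific to this choice (here $y_1^2 + y_2^2 = 1$ and $y_3^2 - y_1^2/3 = 1$ are conserved, so one may write $y_1 = \sin\theta$, $y_2 = \cos\theta$ with $d\theta/dt = y_3$).

```python
import math

I1, I2, I3 = 1.0, 2.0, 3.0   # illustrative values, an assumption of this sketch


def rhs(y):
    """Right-hand side of (17.1): I1 y1' = (I3 - I2) y2 y3, and cyclically."""
    y1, y2, y3 = y
    return ((I3 - I2) * y2 * y3 / I1,
            (I1 - I3) * y1 * y3 / I2,
            (I2 - I1) * y1 * y2 / I3)


def rk4(y, t_end, steps):
    """Integrate from t = 0 to t_end with the classical Runge-Kutta scheme."""
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs([a + 0.5 * h * k for a, k in zip(y, k1)])
        k3 = rhs([a + 0.5 * h * k for a, k in zip(y, k2)])
        k4 = rhs([a + h * k for a, k in zip(y, k3)])
        y = [a + h * (p + 2 * q + 2 * r + s) / 6
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y


def first_zero_of_y2(n=2000):
    """K for the initial data (0, 1, 1), by Simpson's rule (n must be even):
    K = integral from 0 to pi/2 of d(theta) / sqrt(1 + sin(theta)^2 / 3)."""
    g = lambda th: 1.0 / math.sqrt(1.0 + math.sin(th) ** 2 / 3.0)
    h = (math.pi / 2) / n
    total = g(0.0) + g(math.pi / 2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3


Y0 = [0.0, 1.0, 1.0]          # y1(0) = 0, as in (17.13)
K = first_zero_of_y2()
y_at_K = rk4(Y0, K, 3000)     # second component should vanish here
y_a = rk4(Y0, 0.3, 600)       # solution at t = 0.3
y_b = rk4(Y0, 0.3 + 2 * K, 6000)   # solution at t = 0.3 + 2K
```

Comparing `y_a` and `y_b` shows the first two components changing sign and the third returning to its value, in line with the theorem; the sample time 0.3 is arbitrary.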
18 Accessibility Problems for Path Systems
General Remarks

From a higher point of view, the calculus of variations should be regarded as the "theory" of a real-valued function on an infinite dimensional space, namely, the space of curves on the underlying configuration space. One might think that it would be possible to eliminate the confusion and ambiguity that afflicts the subject by treating it consistently from this point of view. However, in searching for such a panacea, one must have respect for the fact that the calculus of variations has the longest history of any currently active branch of mathematics; in addition, this basic insight into the calculus of variations has been explicit since Volterra's pioneering work on "infinite dimensional manifolds" in the 1880's. In this chapter, we hope to demonstrate that this insight is useful for developing intuition into the mathematical structure of the subject. However, the foundations are still unsettled; there is no point in committing to print a full-scale exposition.

Let $M$ be a manifold and let $P(M)$ denote the space of paths of $M$. (Recall that a path is a continuous map of an interval $[a, b]$ of real numbers into $M$ that is piecewise $C^\infty$.) By a path system on $M$ we mean a subset (denoted, say, by $\pi$) of $P(M)$ having the property that if a path $\sigma$ belongs to $\pi$, all paths obtained from $\sigma$ by changing the parametrization of $\sigma$ and by restricting $\sigma$ to a subinterval of its domain of definition also belong to $\pi$. Now, our basic intuition is that $P(M)$ is to be regarded as an "infinite dimensional manifold" and that we shall be considering path systems obtained by setting a set of real-valued functions on $P(M)$ (usually uncountably infinite in number) equal to zero. Thus, in a sense, we are to regard a path system $\pi$ as a "submanifold" of $P(M)$.

Let $\pi$ be a given path system on $M$. Since paths can be freely parametrized, let us suppose they are defined over the interval $[0, 1]$, with the parameter denoted by $t$.
For $p \in M$, let $\pi(p)$ be the set of all points of $M$ that can be joined to $p$ by a path in $\pi$. We can look at this in the following way: Consider the subspace of paths in $\pi$ that begin at $p$, denoted say by $\pi_p$, and map $\pi_p \to M$ by sending a path into its end point. The "accessibility" problem, in full generality, is to describe $\pi(p)$ in terms of the differential geometric invariants of $\pi$.
We see the analogy with a problem for ordinary finite dimensional manifolds. Suppose $A$ and $B$ are two such spaces, and $\phi: A \to B$ is a map, with $\dim B \le \dim A$. We are interested in finding conditions that guarantee that $\phi$ is onto $B$. One such condition, of a local nature, is given by the implicit function theorem: For a point $a \in A$, $\phi_*$ is a linear map $A_a \to B_{\phi(a)}$. If it is onto, $\phi$ covers a small neighborhood of $\phi(a)$. If $\phi_*(A_a) = B_{\phi(a)}$ for all $a \in A$, then it is easy to see that this local fact can be "expanded out" globally, provided there is some condition of uniformity for the norm of $\phi_*$ as $a$ varies on $A$.

Considered from the opposite point of view, the "critical points" for the discussion of onto-ness are the points $a \in A$ such that $\phi_*(A_a) \neq B_{\phi(a)}$. Dually, such a critical point would be a point having the following property: There exists a function $f \in F(B)$ with $df \neq 0$ at $\phi(a)$ but $d(\phi^*(f)) = 0$ at $a$. Now, a map may be onto, even though it has critical points. Consider the maps $\phi_1$ and $\phi_2: R \to R$, defined by $\phi_1(x) = x^2$ and $\phi_2(x) = x^3$ for $x \in R$. Both have critical points at $x = 0$; the first is not onto; the second is. (As a side point, we may mention that the key fact is that the Jacobian $\phi_1'(x)$ of $\phi_1$ changes sign, whereas the second does not.)

We shall not pursue the discussion of the type of singularities of mappings that are sufficient to guarantee that a map be onto. This would invoke a delicate and still largely unknown field. (See paper by Hartman and Nirenberg [1], for a beginning in this direction, at least for the case where $A$ and $B$ are of equal dimension.) However, this simple example does suggest one such invariant: Suppose $f \in F(B)$ satisfies $df \neq 0$ at $\phi(a)$, but $d\phi^*(f) = 0$ at $a$. Thus, $\phi^*(f)$ has a critical point in the ordinary sense; hence we can define its Hessian, which is a quadratic form on $A_a$. Suppose the Hessian is positive semidefinite. Then $a$ has a neighborhood $U$ in which

$$\phi^*(f)(a') \ge \phi^*(f)(a) \quad \text{for all } a' \in U,$$

that is, $f(\phi(a')) \ge f(\phi(a))$. But then $\phi(U)$ cannot cover a neighborhood of $\phi(a)$, for otherwise $f$ would have a critical point at $\phi(a)$.

Another simple remark suggests itself: Consider the set of all functions $f \in F(B)$ such that $\phi^*(f) = \text{constant}$. Obviously, a necessary condition that $\phi$ be onto is that any such function be constant on $B$. It is also plausible that in some cases (for example, if the image set $\phi(A)$ is not too wildly behaved) this condition will be sufficient.

Let us return to the case where we have the end-point mapping $\pi_p \to M$ of the space of paths from a given path system $\pi$ on a manifold $M$ beginning at the point $p$. Pursuing the analogy in the preceding paragraph, we can ask for functions on $M$ that are constant when pulled back to $\pi_p$ under the mapping. Such functions obviously have the geometric property of being constant along those paths in $\pi$ beginning at $p$. As we shall see below, for common
types of path systems defined by systems of ordinary differential equations, functions on $M$ which are constant on paths of the system must satisfy certain systems of partial differential equations. Thus, if we can prove that these partial differential equations have no nonconstant solutions, we can hope to turn this around and actually prove that $\pi(p) = M$.

To make more explicit the program sketched in the last paragraph, let us make more definitions. Let $U$ be an open subset of $M$. A function $f \in F(U)$ is said to be an integral of the path system $\pi$ if

$$f(\sigma(t)) = f(\sigma(0)) \quad \text{for } 0 \le t \le 1,$$

for every path $\sigma: [0, 1] \to U$ that belongs to $\pi$.
The set of all such integrals will be denoted by $I(\pi, U)$. Since they can be multiplied and added, they form a ring of functions. We use these rings of functions to define a new path system, denoted by $\pi^*$, called the completion of $\pi$, in the following way: A path $\sigma: [0, 1] \to M$ is in $\pi^*$ if and only if for every open subset $U \subset M$, every $f \in I(\pi, U)$, $f(\sigma(t)) = \text{constant}$ for those values of $t$ for which $\sigma(t) \in U$. Clearly, $\pi \subset \pi^*$. ($\pi^*$ is something like the dual of the dual of a vector space.) We say that the path system $\pi$ satisfies the duality principle if

$$\pi^*(p) = \pi(p) \quad \text{for all } p \in M. \tag{18.1}$$

Intuitively, we may summarize (18.1) by saying that "all points are accessible from a point $p$ along paths in $\pi$ that are not obviously inaccessible." For the existence of a nonconstant $f \in I(\pi, U)$ sets up a priori limitations on the accessible points, since all paths in $\pi$ must lie on the hypersurfaces $f = \text{constant}$. There are several other weaker versions of the duality principle that may be formulated, but the one we have just given should be sufficient to indicate the idea.

Now, there does not seem to be as yet any general theorem describing necessary and sufficient conditions that a path system satisfy the duality principle. This seems to be an important subject for further research. The closest thing to such a general theorem is a theorem due to Chow [1], which may be interpreted as proving the duality principle in the case where $\pi$ consists of all the integral curves of a nonsingular vector-field system whose derived vector-field system is also nonsingular. In fact this is quite typical of the general situation, namely, that in order to prove the duality principle, certain local nonsingularity conditions must be imposed. In turn, Chow's theorem is a generalization of a famous theorem due to Carathéodory [1] that gives a geometric condition that a Pfaffian equation admit an integrating factor. It is interesting that Carathéodory's theorem
arose from accessibility conditions. It asserts that the second law of thermodynamics, in its form postulating the nonexistence of a perpetual motion machine of the "second kind," can be formulated as requiring that certain points of the "phase space" of a physical system be nonaccessible along adiabatic paths, that is, on integral paths of a Pfaffian form $\theta$ in phase space representing "heat."

It is possible to prove the Lagrange multiplier rule for extremals of the Lagrange variational problem, using accessibility ideas. (This is due to Bliss [1] and Radon [1].) We shall now explain this approach.

Suppose that $D$ is a convex domain in $R^n$ with coordinates $x_1, \ldots, x_n$. A Lagrange variational problem in homogeneous form will be supposed given on $T(D)$. Thus we are given functions $g_1(x, \dot{x}), \ldots, g_m(x, \dot{x})$ and $L(x, \dot{x})$ on $T(D)$ that are homogeneous in $\dot{x}$. Recall that a curve $x(t)$, $0 \le t \le 1$, in $D$ is an extremal of the Lagrange variational problem if

$$g_a\!\left(x(t), \frac{dx}{dt}\right) = 0 \quad \text{for } 0 \le t \le 1,\ 1 \le a \le m, \tag{18.2a}$$
$$\int_0^1 L\!\left(x(t), \frac{dx}{dt}\right) dt \le \int_0^1 L\!\left(\hat{x}(t), \frac{d\hat{x}}{dt}\right) dt \tag{18.2b}$$

for all paths $\hat{x}(t)$, $0 \le t \le 1$, such that

(i) $\hat{x}(0) = x(0)$, $\hat{x}(1) = x(1)$; that is, $\hat{x}(t)$ has the same end points as $x(t)$.
(ii) $g_a(\hat{x}(t), (d\hat{x}/dt)) = 0$ for $0 \le t \le 1$, $1 \le a \le m$; that is, the path $\hat{x}(t)$ satisfies the constraints given by (18.2a).

Bliss' idea is to convert this Lagrange variational problem into a Mayer variational problem. Since we do not want to go into full detail here explaining the Mayer problem,† we shall simply describe the situation explicitly without attaching special labels to the construction. Introduce another real variable, labeled $y$, to our variables $x_1, \ldots, x_n$. Let $D^*$ be the following convex set in $R^{n+1}$:

$$D^* = \{(x_1, \ldots, x_n, y) : (x_1, \ldots, x_n) \in D,\ -\infty < y < \infty\}.$$

Consider the following path system $\pi$ in $D^*$: A path $(x(t), y(t))$, $0 \le t \le 1$, in $D^*$ lies in $\pi$ if and only if

$$g_a\!\left(x(t), \frac{dx}{dt}\right) = 0, \qquad \frac{dy}{dt} = L\!\left(x(t), \frac{dx}{dt}\right) \tag{18.3}$$
† The Mayer and Lagrange problems are equivalent in the sense that one can be transformed into the other, at least locally. We are concentrating on the Lagrange problem in this book because we think it is more natural from a differential-geometric point of view, although it seems that the Bliss school of the calculus of variations considered the Mayer problem to be more basic.
for $1 \le a \le m$, $0 \le t \le 1$. Let $x(t)$, $0 \le t \le 1$, be an extremal curve of the Lagrange variational problem defined by $g_1, \ldots, g_m$ and $L$, that is, a curve satisfying (18.2). Construct $y(t)$ so that

$$y(t) = \int_0^t L\!\left(x(s), \frac{dx}{ds}\right) ds.$$

Then the curve $(x(t), y(t))$ in $D^*$ belongs to $\pi$, that is, satisfies (18.3). We now show that the fact that $x(t)$ is an extremal, that is, satisfies (18.2), implies that the point $(x(1), y(1) - \varepsilon)$ in $D^*$ is inaccessible with respect to $\pi$ from the point $(x(0), 0)$ in $D^*$, for any positive number $\varepsilon$. Suppose otherwise: that is, $(\hat{x}(t), \hat{y}(t))$, $0 \le t \le 1$, is a path in $\pi$ joining these two points. Then, from (18.3), $g_a(\hat{x}(t), (d\hat{x}/dt)) = 0$; that is, $\hat{x}(t)$ satisfies the constraints, and $\hat{x}(0) = x(0)$; $\hat{x}(1) = x(1)$. Then

$$\frac{d\hat{y}}{dt} = L\!\left(\hat{x}(t), \frac{d\hat{x}}{dt}\right);$$

hence,

$$\int_0^1 L\!\left(\hat{x}(t), \frac{d\hat{x}}{dt}\right) dt = \hat{y}(1) - \hat{y}(0) = y(1) - \varepsilon;$$

hence,

$$\int_0^1 L\!\left(\hat{x}(t), \frac{d\hat{x}}{dt}\right) dt < y(1) = \int_0^1 L\!\left(x(t), \frac{dx}{dt}\right) dt,$$

contradicting that $x(t)$ is an extremal.

Apply the duality principle now. If $D$ is sufficiently small, which we shall suppose is true, there is a function $f \in I(\pi, D^*)$, that is, a function $f(x, y)$ such that

$$\frac{d}{dt} f(x(t), y(t)) = 0 \quad \text{for every curve } (x(t), y(t)) \text{ in } \pi;$$

hence,

$$\frac{\partial f}{\partial x_i}\frac{dx_i}{dt} + \frac{\partial f}{\partial y}\,L\!\left(x(t), \frac{dx}{dt}\right) = 0.$$

Now these conditions on $f$ are implied by the following conditions:

$$\frac{\partial f}{\partial x_i}(x, y)\,\dot{x}_i + \frac{\partial f}{\partial y}(x, y)\,L(x, \dot{x}) = 0 \tag{18.4}$$

for every $(x, \dot{x}, y)$ satisfying $g_a(x, \dot{x}) = 0$. These conditions are in turn implied by the following conditions: There exist functions $\lambda_a(x, y, \dot{x})$ such that

$$\frac{\partial f}{\partial x_i}(x, y)\,\dot{x}_i + \frac{\partial f}{\partial y}(x, y)\,L(x, \dot{x}) = \lambda_a(x, y, \dot{x})\,g_a(x, \dot{x}) \tag{18.5}$$

identically in $(x, y, \dot{x})$. (We are using the summation convention, with the following range of indices: $1 \le i, j, \ldots \le n$; $1 \le a, b, \ldots \le m$.) Now, conversely, (18.5) is implied by the preceding two conditions if suitable local regularity conditions are satisfied and if again $D$ is sufficiently small. (This is just a matter of applying the implicit function theorem.) In this semi-intuitive account, we shall short-circuit the matter by assuming that (18.5) is true. Then, applying $\partial/\partial \dot{x}_i$ to both sides of (18.5), we have
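$$\frac{\partial f}{\partial x_i} + \frac{\partial f}{\partial y}\frac{\partial L}{\partial \dot{x}_i} = \frac{\partial \lambda_a}{\partial \dot{x}_i}\,g_a + \lambda_a \frac{\partial g_a}{\partial \dot{x}_i},$$

or multiplying by $dx_i$ and summing, we have

$$df = \theta(\lambda_a)\,g_a + \lambda_a\,\theta(g_a) - \frac{\partial f}{\partial y}\,\theta(L),$$

where

$$\theta(L) = \frac{\partial L}{\partial \dot{x}_i}\,dx_i, \qquad \theta(g_a) = \frac{\partial g_a}{\partial \dot{x}_i}\,dx_i, \qquad \theta(\lambda_a) = \frac{\partial \lambda_a}{\partial \dot{x}_i}\,dx_i.$$

Thus $\theta(L)$ and $\theta(g_a)$ are the Cartan 1-forms associated with the Lagrangian functions. (Recall that $L$ and $g_a$ are assumed to be homogeneous in $\dot{x}$.) We can now eliminate $df$ by applying $d$ to both sides:

$$d(\theta(\lambda_a)\,g_a) = d(\lambda\,\theta(L) - \lambda_a\,\theta(g_a)), \quad \text{where } \lambda = \frac{\partial f}{\partial y}.$$

Now let $(x(t), y(t))$ be the curve in $D^*$ satisfying (18.3), with $x(t)$ the given extremal of the Lagrange variational problem. Let $\sigma(t)$ be the curve $(x(t), y(t), (dx/dt))$ in $(x, y, \dot{x})$-space. Since $g_a(x, (dx/dt)) = 0$, we see that $\sigma(t)$ is a characteristic curve of $d(\lambda\,\theta(L) - \lambda_a\,\theta(g_a))$. Working out the explicit conditions that this be so, we see that $x(t)$ satisfies the equation

$$\frac{d}{dt}\,\frac{\partial}{\partial \dot{x}_i}\bigl(\lambda(t) L + \lambda_a(t) g_a\bigr) = \frac{\partial}{\partial x_i}\bigl(\lambda(t) L + \lambda_a(t) g_a\bigr),$$

where $\lambda(t) = \lambda(x(t), y(t))$, $\lambda_a(t) = \lambda_a(x(t), y(t))$. This says, however, that with this choice of the "Lagrange multipliers" $(\lambda(t), \lambda_a(t))$, the curve $x(t)$ in $D$ is an extremal of the time-dependent, ordinary,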
unconstrained variational problem whose Lagrangian is $\lambda(t) L + \lambda_a(t) g_a$. This, however, is just the "Lagrange multiplier rule" for finding the extremals of the Lagrange variational problem with which we started.

In summary, we may say that we have shown that the accessibility problem is really the basic one for the calculus of variations, underlying all the classical variational problems. We now turn to a treatment of the accessibility problem for vector-field systems, and in particular, to the proof of Chow's theorem.
The Accessibility Problem for Vector-Field and Pfaffian Systems

Since the material in this section will be geometric and independent of coordinate systems, it will be convenient to work with a differentiable manifold $M$ instead of a domain $D$ in $R^n$. Recall the relevant notations: $F(M)$ denotes the ring of functions on $M$. (All functions, maps, vector fields, curves, etc., will be of differentiability class $C^\infty$ unless mentioned otherwise.) $V(M)$ denotes the set of vector fields on $M$. Each $X \in V(M)$ is by definition a map $F(M) \to F(M)$ that defines a derivation of $F(M)$. For $f \in F(M)$, $X(f)$, the image of $f$ under this map, is the Lie derivative of $f$ under $X$. (Alternatively, $X$ can be thought of as a first-order partial differential operator on functions on $M$, and $X(f)$ is then the result of applying the operator symbolized by $X$ to $f$.) Geometrically, if $\sigma(t)$, $a \le t \le b$, is an integral curve of $X$, then

$$\frac{d}{dt} f(\sigma(t)) = X(f)(\sigma(t)) \quad \text{for all } f \in F(M). \tag{18.6a}$$
For $p \in M$, $M_p$ is the tangent space to $M$ at $p$. Each $v \in M_p$ is by definition an $R$-linear map $F(M) \to R$ such that

$$v(fg) = v(f)\,g(p) + f(p)\,v(g) \quad \text{for all } f, g \in F(M).$$

Each $X \in V(M)$ determines a value $X(p) \in M_p$ at each $p \in M$:

$$X(p)(f) = X(f)(p) \quad \text{for } f \in F(M).$$

If $\sigma(t)$, $a \le t \le b$, is a curve, $\sigma'(t)$, the tangent vector to $\sigma$ at $t$, may be defined as that element of $M_{\sigma(t)}$ such that

$$\left.\frac{d}{d\lambda} f(\sigma(\lambda))\right|_{\lambda = t} = \sigma'(t)(f) \quad \text{for all } f \in F(M).$$

Thus (18.6a) may also be expressed by the condition

$$\sigma'(t) = X(\sigma(t)) \quad \text{for } a \le t \le b. \tag{18.6b}$$
Let $X \in V(M)$. From the existence theorem for systems of ordinary differential equations (when (18.6) is expressed in local coordinates, it is seen to be equivalent to such a system for the coordinates of $\sigma(t)$), we see that, given $p \in M$, there is a unique integral curve $\sigma(t; p)$ of $X$ equal to $p$ when $t = 0$, and defined for $t$ sufficiently small, say $-\varepsilon \le t \le \varepsilon$. This $\varepsilon$ can be chosen uniformly when $p$ ranges over a compact set, and $\sigma$ then depends in a $C^\infty$ way on $p$ and $t$. This local solution can be continued with, say, $\sigma(\varepsilon; p)$ replacing $p$. By uniqueness of the integral curves and the fact that $t \to \sigma(t + c)$, for constant $c$, is an integral curve of $X$ whenever $t \to \sigma(t)$ is an integral curve, we see that for each $p \in M$, there are numbers $a(p)$ and $b(p)$, $a(p) < 0 < b(p)$, such that $\sigma(t; p)$ can be defined over $a(p) < t < b(p)$, but over no larger interval.

If $a(p) = -\infty$, $b(p) = +\infty$ for all $p \in M$, we say that $X$ generates a one-parameter group of diffeomorphisms of $M$,† denoted by $\exp(tX)$. For $-\infty < t < \infty$, $\exp(tX)$ is the diffeomorphism of $M$ such that $\exp(tX) \cdot p = \sigma(t, p)$, for $p \in M$. The group property

$$\exp(t_1 X) \cdot \exp(t_2 X) = \exp((t_1 + t_2) X)$$

follows from the fact mentioned above, that the integral curves of $X$ are preserved under translation of $t$. Of course not every vector field generates such a one-parameter group. (For example, $M = R$, $X = x^2 (\partial/\partial x)$.) However, we shall proceed as if this were so, since it may be easily checked by the reader that the proofs can be suitably modified.‡

Suppose now that $X$ is a vector field on $M$ and that $\phi: M \to M$ is a diffeomorphism. We have defined the transformed vector field $\phi_*(X)$ by any one of the following equivalent rules:
$\phi_*(X(p)) = \phi_*(X)(\phi(p))$ for all $p \in M$.
$\phi^*(X(f)) = \phi_*^{-1}(X)(\phi^*(f))$ for all $f \in F(M)$.
$\phi$ maps integral curves of $X$ into integral curves of $\phi_*(X)$.
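The vector field $X = x^2\,\partial/\partial x$ on $R$ shows concretely what goes wrong when integral curves cannot be continued for all time. The sketch below uses the closed-form solution $x(t) = x_0/(1 - x_0 t)$ of $dx/dt = x^2$; the function names `flow` and `maximal_interval` are illustrative and not notation from the text.

```python
import math


def flow(x0, t):
    """Integral curve of X = x^2 d/dx through x0: solves dx/dt = x^2, x(0) = x0."""
    denom = 1.0 - x0 * t
    if denom <= 0.0:
        raise ValueError("integral curve is not defined at this time")
    return x0 / denom


def maximal_interval(x0):
    """The interval a(p) < t < b(p) on which the curve through p = x0 exists."""
    if x0 > 0:
        return (-math.inf, 1.0 / x0)
    if x0 < 0:
        return (1.0 / x0, math.inf)
    return (-math.inf, math.inf)


# The group property holds wherever both sides are defined.
two_steps = flow(flow(1.0, 0.2), 0.3)
one_step = flow(1.0, 0.5)
```

Starting from $x_0 = 1$ the curve escapes to infinity as $t \to 1$, so $\exp(tX)$ is not a diffeomorphism of $R$ for any $t \neq 0$, although the composition rule still holds on the common domain.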
Suppose now that $Y$ is a vector field on $M$ that generates a one-parameter group $t \to \phi_t = \exp(tY)$ of diffeomorphisms of $M$. Let $p \in M$. Then $s \to \exp(sX) \cdot p$ is the integral curve of $X$ beginning at $p$. Thus $s \to \exp(tY) \exp(sX) \cdot p$ is an integral curve of $(\phi_t)_*(X)$; hence we have the basic formula

$$\exp(tY) \exp(sX) \cdot p = \exp\bigl(s\,(\exp tY)_*(X)\bigr) \exp(tY) \cdot p. \tag{18.7}$$

† In the general case, $X$ is said to generate a one-parameter pseudogroup of diffeomorphisms, but we shall not try to make the pseudogroup concept precise here.
‡ In addition, one can prove the following lemma, which shows that in arguments where the parametrization of integral curves does not matter, the possible noncompletability of integral curves does not matter: Let $M$ be a manifold and let $X$ be a vector field on $M$. Then there exists an everywhere-positive function $f$ on $M$ such that $(fX)$ generates a global one-parameter group of diffeomorphisms of $M$.

We know already that the one-parameter family $t \to (\exp tY)_*(X) = Z^t$ of vector fields is determined as the solution of the following system of partial differential equations:

$$\frac{\partial}{\partial t} Z^t = [Y, Z^t]. \tag{18.8}$$

If everything is real-analytic, they can be "solved" by Lie series:

$$Z^t = \sum_{n=0}^{\infty} \frac{t^n}{n!}\,\mathrm{Ad}^n\,Y(X), \tag{18.9}$$

where $\mathrm{Ad}\,Y(X) = [Y, X]$, $\mathrm{Ad}^2\,Y(X) = [Y, [Y, X]]$, etc. Now we can prove the Chow [1] accessibility theorem.
THEOREM 18.1 Let $H$ be a real subspace of $V(M)$, the space of all vector fields on a manifold $M$. Let $\pi(H)$ be the following path system on $M$ defined by $H$: A path $\sigma(t)$, $0 \le t \le 1$, is in $\pi(H)$ if and only if $\sigma'(t) \in H_{\sigma(t)}$† for $0 \le t \le 1$. Let $D(H)$ be the smallest completely integrable vector-field system on $M$ containing $H$.‡ If $D(H) = V(M)$, then $\pi(p) = M$ for all $p \in M$.

Proof. If $[H, H] \subset H$, we are finished, since then $M_p = D(H)_p = H_p$ for all $p \in M$. Otherwise, there exists a pair $X, Y \in H$ with $[X, Y] \notin H$. For at least one value of $t$, say, $t = 1$, we must have, by (18.9), $(\exp tY)_*(X) \notin H$. Let $H_1$ be the vector-field system spanned by $H$ and $\exp(Y)_*(X)$. By (18.7), every point that can be reached on an integral path of $H_1$ can be reached by an integral path of $H$. If $[H_1, H_1] \subset H_1$, we are finished, since then $H_1 = D(H) = V(M)$. If $[H_1, H_1] \not\subset H_1$, choose $X_1, Y_1 \in H_1$ so that $\exp(Y_1)_*(X_1) \notin H_1$; let $H_2$ be spanned by $H_1$ and $\exp(Y_1)_*(X_1)$, etc. This process must eventually come to an end, which proves the theorem.
† Recall that $H_p = \{X(p) : X \in H\}$ for $p \in M$, that is, $H_p$ is the set of values of elements of $H$ at $p$.
‡ $D(H)$ is called the derived system of $H$, since it is obviously generated by sums of iterated Jacobi brackets of elements of $H$ with functions on $M$ as coefficients.
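The mechanism behind Theorem 18.1 can be seen in a small explicit case. The following sketch is an illustration only: the vector fields $X = \partial/\partial x$ and $Y = \partial/\partial y + x\,\partial/\partial z$ on $R^3$ are an example chosen here (they are not taken from the text), for which $[X, Y] = \partial/\partial z$, so that $D(H)$ is everything although $H$ is two-dimensional. Both flows are known in closed form, and a four-leg loop made entirely of integral curves of $H$ produces a net displacement in the missing $z$ direction.

```python
def exp_X(s, p):
    """Flow of X = d/dx for time s."""
    x, y, z = p
    return (x + s, y, z)


def exp_Y(s, p):
    """Flow of Y = d/dy + x d/dz for time s (x stays constant along this flow)."""
    x, y, z = p
    return (x, y + s, z + x * s)


def commutator_loop(t, p):
    """Follow X, Y, -X, -Y for time t each; every leg is tangent to H = span{X, Y}."""
    p = exp_X(t, p)
    p = exp_Y(t, p)
    p = exp_X(-t, p)
    p = exp_Y(-t, p)
    return p


# Starting at the origin, the loop returns to x = y = 0 but raises z by t**2.
end_point = commutator_loop(0.5, (0.0, 0.0, 0.0))
```

Traversing the legs in the order $Y$, $X$, $-Y$, $-X$ lowers $z$ instead, so every point of the $z$-axis, and with further legs every point of $R^3$, is reachable along paths of $\pi(H)$.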
Corollaries

(a) Suppose that $H \subset V(M)$ is a vector-field system on $M$, that $p \in M$, and that $N$ is a submanifold of $M$ that contains all integral paths of $D(H)$ starting at $p$, and satisfies: $N_q = D(H)_q$ for all $q \in N$. Then $N$ is the set of points of $M$ that can be reached from $p$ along integral paths of $H$.
(b) Suppose that $D(H)$ is a nonsingular vector-field system; that is, $\dim D(H)_p = \text{constant}$ for $p \in M$. Let $\pi$ be the path system consisting of the integral paths of $H$. Then $\pi^*$, as defined before, consists of the space of integral paths of $D(H)$; hence the "duality principle" for the accessibility problem is satisfied.

For the proof of (a), one should notice that both $H$ and $D(H)$ are tangent to $N$; hence, restricting to $N$ reduces to the situation stated in the theorem. For (b), notice that an $f \in F(M)$ is constant along integral paths of $H$ if and only if

$$X(f) = 0 \quad \text{for all } X \in H.$$

But then this is true when $X$ is composed of iterated Jacobi brackets of elements of $H$; hence $f$ is an integral of $D(H)$. Since $D(H)$ is nonsingular, the Frobenius integrability theorem assures us that locally the integral manifolds of $D(H)$ are obtained by setting all such functions equal to constants, which is precisely the "duality principle."
Some Examples

Example 1

The following problem arises in the theory of optimal control: Suppose $A_{ab}(t)$, $v_{au}(t)$ are given functions of $t$, $1 \le a, b, \ldots \le m$; $m + 1 \le u, v, \ldots \le n$. Consider the system of differential equations:

$$\frac{dx_a}{dt} = A_{ab}(t)\,x_b + v_{au}(t)\,y_u, \tag{18.10}$$

where the $x_a(t)$ are the unknown functions and the $y_u(t)$ are functions that can be chosen arbitrarily.†

THE PROBLEM For given $x_a^0$ (for example, $x_a^0 = 0$), what are the set of points $x_a$ that can be reached on solutions of (18.10) starting at $x^0$ for all choices of the functions $y_u(t)$?

† Not necessarily even continuously. To make sure there are no difficulties with existence of solutions, suppose, for example, that the allowable $y_u(t)$ are piecewise differentiable.

In physical problems, the $x_a$ may be the position-velocity coordinates of a system, and $y_u$ may be the forces. The problem posed is usually to get from initial point $x_a^0$ to final point $x_a$ along a solution of (18.10), minimizing some performance criterion associated with the curve $(x_a(t), y_u(t))$. Clearly, an important preliminary problem, which we deal with here, is to know what points $x_a$ can be reached from $x_a^0$ along any solution of (18.10) at all.

To convert the problem into one covered by Theorem 18.1, introduce $t$ as a new variable and the 1-forms

$$\omega_a = dx_a - (A_{ab}\,x_b + v_{au}\,y_u)\,dt$$

on $(x, y, t)$-space. Let $H$ be the dual vector-field system on this space:

$$X \in H \iff \omega_a(X) = 0.$$

Then, if a solution of (18.10) is regarded as a curve in $(x, y, t)$-space, it is an integral curve of $H$ unless the $y_u(t)$ have discontinuities. To deal with this case also, notice that a curve along which

$$dt = dx_a = 0$$

(that is, the $x_a$ and $t$ are constant while the $y_u(t)$ are arbitrary) is also an integral curve of $H$. Therefore solutions to (18.10) with simple jump discontinuities in the $y_u(t)$ give rise to continuous integral curves of $H$. For example, if $(x_a(t), y_u(t))$, $0 \le t \le 1$, is a solution of (18.10), with $\lim_{t \to 1/2-} y_u(t) = y_u(\tfrac12 -) \neq y_u(\tfrac12 +)$, and $t = \tfrac12$ is the only discontinuity of $y$, define
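$(\hat{x}_a(s), \hat{y}_u(s), \hat{t}(s))$, $0 \le s \le 2$, as follows:

For $0 \le s < \tfrac12$: $\hat{x}_a(s) = x_a(s)$, $\hat{y}_u(s) = y_u(s)$, $\hat{t}(s) = s$, with $\hat{y}_u(\tfrac12) = y_u(\tfrac12 -)$.
For $\tfrac12 \le s \le \tfrac32$: $\hat{x}_a(s) = x_a(\tfrac12)$, $\hat{t}(s) = \tfrac12$, and $\hat{y}_u(s)$ is any curve such that $\hat{y}_u(\tfrac32) = y_u(\tfrac12 +)$.
For $\tfrac32 \le s \le 2$: $\hat{x}_a(s) = x_a(s - 1)$, $\hat{y}_u(s) = y_u(s - 1)$, $\hat{t}(s) = s - 1$.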
Conversely, an integral curve of $H$, that is, a curve $x(s), y(s), t(s)$, $0 \le s \le 1$, such that $\omega_a(x(s), y(s), t(s)) = 0$, gives rise to a solution of (18.10) with possibly discontinuous $y(s)$. For $0 \le s \le 1$ can be divided into intervals in which $dt \neq 0$ or $dt = 0$; that is, $t$ changes or remains constant. In the former intervals, $(x(s), y(s), t(s))$ can be reparametrized by $t$ so as to be a solution of (18.10). In the latter type of interval, $t$ (hence $x_a$) remains constant, and the
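effect is for $y_u$ to "jump" from one value to another while $x_a$ and $t$ remain constant.

We now turn to computing $D(H)$. Introduce the following vector fields in $(x, y, t)$-space:

$$X = \frac{\partial}{\partial t} + (A_{ab}\,x_b + v_{au}\,y_u)\frac{\partial}{\partial x_a}, \qquad Y_u = \frac{\partial}{\partial y_u}. \tag{18.11}$$

They span $H$, that is, satisfy $\omega_a(X) = 0 = \omega_a(Y_u)$. Using (18.11), we have

$$[X, Y_u] = -v_{au}\frac{\partial}{\partial x_a}.$$

Let $Z_u = -v_{au}(\partial/\partial x_a)$. Then $[Z_u, Z_v] = 0$. Suppose now that $Z$ is any vector field of the form $z_a(t)(\partial/\partial x_a)$. Then $\partial Z/\partial t$ will denote the field $[(\partial/\partial t), Z] = (d/dt) z_a(t)(\partial/\partial x_a)$. Then,

$$[Z, X] = A_{ab}\,z_b \frac{\partial}{\partial x_a} - \frac{\partial Z}{\partial t}.$$

If $z = (z_a)$ denotes the $m \times 1$ vector function of $t$, $A = (A_{ab})$ the $m \times m$ matrix function of $t$, define $\nabla z$ as the $m \times 1$ vector function such that

$$\nabla z = A z - \frac{dz}{dt}.$$

Then

$$[Z, X] = (\nabla z)_a \frac{\partial}{\partial x_a}, \qquad [[Z, X], X] = (\nabla(\nabla z))_a \frac{\partial}{\partial x_a} = (\nabla^2 z)_a \frac{\partial}{\partial x_a}, \quad \text{etc.}$$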
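When $A$ and the vectors $v_u$ do not depend on $t$, repeated bracketing with $X$ acts on a constant vector $z$ as multiplication by $A$ (up to sign), so the directions generated from $v_u$ are $v_u, A v_u, A^2 v_u, \ldots$. The sketch below computes the dimension of their span; it assumes this constant-coefficient special case, which in control theory is known as the Kalman rank condition, and the helper names and the two sample systems are illustrative rather than taken from the text. By the Cayley-Hamilton theorem, powers of $A$ beyond $m - 1$ add nothing new.

```python
from fractions import Fraction


def mat_vec(A, v):
    """Product of a square matrix (list of rows) with a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def rank(vectors):
    """Rank of a list of vectors by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def bracket_span_dimension(A, vs):
    """Dimension of span{A^k v : v in vs, 0 <= k < m} for a constant m x m matrix A."""
    m = len(A)
    generated = []
    for v in vs:
        w = list(v)
        for _ in range(m):
            generated.append(w)
            w = mat_vec(A, w)
    return rank(generated)


# Double integrator x1' = x2, x2' = y: the span is all of R^2.
dim_full = bracket_span_dimension([[0, 1], [0, 0]], [[0, 1]])
# x1' = x1 + y, x2' = x2: the control never influences x2.
dim_deficient = bracket_span_dimension([[1, 0], [0, 1]], [[1, 0]])
```

A span of full dimension $m$ is the situation in which every point can be reached; a deficient span signals directions that no choice of the $y_u(t)$ can influence.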
Combining these calculations with Theorem 18.1, and our remarks above about the relation between solutions of (18.10) and integral curves of $H$, we have proved:

THEOREM 18.2 For each $u$, define the $m \times 1$ vector function of $t$, $v_u = (v_{au})$. $D(H)$ is then spanned by the vector fields $X$, $Y_u$ (defined by (18.11)), and the fields

$$(\nabla^k v_u)_a \frac{\partial}{\partial x_a}, \quad k = 0, 1, 2, \ldots.$$

Suppose that for each $t$, the dimension of the space of $m \times 1$ vectors spanned by $v_u(t)$, $\nabla v_u(t)$, $\nabla^2 v_u(t)$, etc., is $m$. Then, given points $x^0$, $x$ and a real number $T > 0$, there is a solution $(x(t), y(t))$ of (18.10), $0 \le t \le T$, such that

$$x(0) = x^0, \qquad x(T) = x.$$
The ~ ( t are ) continuous, piecewise smooth, but the y ( t ) may possibly have simple jump discontinuities. Example 2 We work now in a space of variables (xi), 1 s i, j , . . . I n. Let w = a i dxi be a 1-form. Let H be the vector-field system annihilating o,that is, the set of vector fields X = A , dxi satisfying O = o ( X ) = A i a i . To avoid questions of singularities, we suppose that o is everywhere nonzero. Then H i s everywhere of dimension n - 1. Let F ( H ) be the set of all vector fields X E H such that [X, H ] c H. X E F ( H ) if and only if w ( X ) = 0, and X J do =fo, for some function$ F ( H ) is a completely integrable vector-field system, that is, [ F ( H ) ,F ( H ) ]c F ( H ) , and f X E F ( H ) for any function f , all X E F(H). Proof. Suppose first that w ( X ) = 0 and X A do =fo.Then, for Y E H,
$$0 = (X \lrcorner\, d\omega)(Y) = d\omega(X, Y) = X(\omega(Y)) - Y(\omega(X)) - \omega([X, Y]) = -\omega([X, Y]),$$
that is, $[X, Y] \in H$. This reasoning can be reversed to prove the converse. If $X \in F(H)$, $f$ is a function, $Y \in H$,
$$[fX, Y] = -Y(f)X + f[X, Y] \in H,$$
that is, $fX \in F(H)$. If $X_1, X_2 \in F(H)$, $Y \in H$,
$$[[X_1, X_2], Y] = [X_1, [X_2, Y]] - [X_2, [X_1, Y]] \in H,$$
that is, $[X_1, X_2] \in F(H)$.
Definition
The form $\omega$ is said to be nonsingular if $F(H)$ is a nonsingular vector-field system. A form $\omega$ is said to admit an integrating factor (that is, function) $f$ if $d(f\omega) = 0$. Then $f\omega = dg$, and $X \in H$, that is, $\omega(X) = 0$, if and only if $X(g) = 0$.†
† Of course the “classical” assumption must always be made that $f$ is nowhere zero.
In particular, $F(H) = H$. Suppose conversely that $F(H) = H$. We can then choose the coordinate system $(x_i)$ so that $\omega(\partial/\partial x_i) = 0$ for $i > 1$. Hence, if $\omega = a_i\,dx_i$, then $a_i = 0$ for $i > 1$ and $\omega = a_1\,dx_1$, that is, $\omega$ admits an integrating factor. We say that the Pfaffian equation $\omega = 0$ is completely integrable. This, combined with Theorem 18.1, proves the following theorem of Carathéodory [1], historically the first “accessibility theorem.”
THEOREM 18.3 If $\omega$ is a 1-form, with $H$ the vector-field system defined by the Pfaffian equation $\omega = 0$, and if $F(H)$ is nonsingular, then $\omega$ admits an integrating factor if and only if the following geometric condition is satisfied: For each point $x^0$ of the space, there are points $x$ arbitrarily close to $x^0$ that are inaccessible from $x^0$ along curves satisfying $\omega = 0$, that is, that are integral curves of $H$.
This theorem is the foundation for Carathéodory's “axiomatization”† of the second law of thermodynamics. (Recall that this is the law asserting the existence of entropy.) To see the connection, imagine that each point of space represents a state of a given physical system and that the form $\omega$ represents “heat,” that is, the curves along which $\omega = 0$ represent changes of state in which no heat is added. Then, if all states could be reached from a given state without adding heat, clearly a “perpetual motion machine” could be constructed. For example, suppose our system were composed of a gas in a box, and consider a state in which all molecules with a velocity above a certain number are in one part and all molecules with velocity below that number are in the other part. If such a state could be reached without adding heat, the faster (that is, hotter) molecules could be used to perform free work. Since nothing is free, there must be inaccessible points; hence, by Theorem 18.1, $\omega$ must admit an integrating factor; that is, $\omega$ can be written in the form $T\,dS$, where $T$ is the “temperature” and $S$ the “entropy.” Of course $T$ and $S$ are not uniquely defined in this way. However, their other properties can be obtained by taking them as known for simple systems (for example, ideal gases) and postulating how they behave when systems are combined.
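As a concrete illustration of an integrating factor in the sense just described, consider the following sketch. It is not taken from the text: it assumes one mole of an ideal gas with constant heat capacity $C_V$, state variables $(T, V)$ with $T, V > 0$, and the equation of state $pV = RT$. The symbols $C_V$, $R$, $p$, $V$ are assumptions introduced only for this example.

```latex
% Heat form of the assumed ideal gas in the state variables (T, V):
\omega = C_V\,dT + p\,dV = C_V\,dT + \frac{RT}{V}\,dV .
% The form itself is not closed:
d\omega = \frac{R}{V}\,dT \wedge dV \neq 0 .
% Multiplying by f = 1/T produces a closed (indeed exact) form:
f\omega = \frac{C_V}{T}\,dT + \frac{R}{V}\,dV
        = d\bigl(C_V \log T + R \log V\bigr) = dS ,
% so that
\omega = T\,dS, \qquad S = C_V \log T + R \log V + \text{const}.
```

Curves with $\omega = 0$ stay on a level set of $S$, so states with a different value of $S$ are inaccessible along them. With only two state variables every nowhere-zero 1-form admits an integrating factor locally, so the real content of the theorem lies in systems with more variables.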
Example 3 Monge Systems
A Monge system is, classically, defined by a system of ordinary differential equations:
$$g_a(x, \dot{x}, t) = 0; \qquad x = (x_1, \ldots, x_n) = (x_i), \quad \dot{x} = \frac{dx}{dt}.$$
† It would be better to call this a “geometrization” of thermodynamics.
(We adopt the range of indices $1 \le i, j, \ldots \le n$; $1 \le a, b, \ldots \le m \le n$ and the summation convention.) A formulation in terms of manifolds would postulate: (a) a manifold $M$; (b) a subset $C \subset T(M) \times R$; (c) a curve $\sigma: [a, b] \to M$ satisfies the system if $(\sigma'(t), t) \in C$ for $a \le t \le b$. However, we shall work with the more explicit formulation in terms of coordinates. Let a point $x^0$ and an initial time, say, $t = 0$, be given. Our problem is to decide what is the set of pairs $(x, T)$ such that there exists a solution, continuous and piecewise $C^\infty$, of the system; say, $t \to x(t)$, with $0 \le t \le T$ and $x(T) = x$. We want to show that the “duality principle” for accessibility is satisfied here if appropriate regularity conditions are satisfied. The first such assumption is that the rank of the matrix $(\partial g_a/\partial \dot{x}_i)$ is maximal, that is, is equal to $m$. By the implicit function theorem, the coordinate system can be chosen so that the system of differential equations defining the curves in the system takes the form
$$\frac{dx_a}{dt} = h_a\Bigl(x(t), \frac{dx_{m+1}}{dt}, \ldots, \frac{dx_n}{dt}, t\Bigr) \quad \text{for } a = 1, 2, \ldots, m.$$
Thus we can choose the functions $x_{m+1}(t), \ldots, x_n(t)$ at will, and determine the rest by solving these differential equations. However, consider the following Pfaffian system in the space of variables $(x_1, \ldots, x_n, \dot{x}_{m+1}, \ldots, \dot{x}_n, t)$:
$$dx_a - h_a\,dt = 0, \qquad dx_u - \dot{x}_u\,dt = 0 \quad \text{for } 1 \le a \le m;\ m + 1 \le u \le n.$$
Notice that a curve $t \to (x(t), \dot{x}_{m+1}(t), \ldots, \dot{x}_n(t), t)$ in these variables that is an integral curve of this Pfaffian system gives a solution of the Monge system. In effect, we have “prolonged” the original Monge system to a Pfaffian system on the space of variables $(x, \dot{x}, t)$, having the property that the integral curves of the Pfaffian system and solution curves of the Monge system correspond under the projection. Consider the dual vector-field system $H$, that is, the space of vector fields $X$ that satisfy $(dx_a - h_a\,dt)(X) = 0 = (dx_u - \dot{x}_u\,dt)(X)$. A function $f(x, \dot{x}_u, t)$ is an integral of this vector-field system if $df$ is a linear combination of the forms $(dx_a - h_a\,dt)$ and $(dx_u - \dot{x}_u\,dt)$. These forms do not contain $d\dot{x}_u$; hence $\partial f/\partial \dot{x}_u$ must be zero, that is, $f$ is independent of $\dot{x}_u$. Such a function, then, must be an integral of the initial Monge system; thus the process of prolongation has not added any new integrals. Hence, if the duality principle holds for the prolonged Pfaffian system (which it will, by the Carathéodory-Chow theorem, if $D(H)$ is nonsingular), it will also hold for the original Monge system. Returning to the connection established earlier between the accessibility problem for Monge systems and the validity of the Lagrange multiplier rule for the Lagrange variational problem, we see that these arguments prove
the multiplier rule with an absolute minimum of technical apparatus, at the expense of making very strong assumptions about the local regularity of the data. There is another approach to these accessibility questions due to Bliss [1], which needs much less stringent regularity conditions. Bliss' approach has recently been revived by Pontryagin and his coworkers [1], and extended to treat the nonclassical variational problems with inequality constraints that are involved in the theory of optimal control. This work has been extended and clarified by Halkin [1] and Roxin [1]. Since the book by Pontryagin et al. [1] presents an excellent exposition of this approach, we shall limit ourselves to several qualitative comments.
Return to considering a general path system $\pi$ on a manifold $M$, and let $\pi_p$ again denote the set of all paths in the system that start at a point $p$ of $M$. The type of accessibility provided by the Carathéodory-Chow approach is really “global” in nature. By this we mean that even though two points $q$ and $q'$ of $M$ that are close together can both be joined to $p$ by paths from $\pi$, we do not necessarily know that these paths are close together. To study this question, we must study the local properties of the end-point mapping $\pi_p \to M$; that is, we must construct its “differential.” First, of course, we must say what is meant by the “tangent space” to $\pi_p$ at a “point,” that is, a path $\sigma$. Let us restrict ourselves to a simple case, namely: suppose that $\sigma: [0, 1] \to M$ is a curve, with $\sigma(0) = p$. A “curve” in $\pi_p$ should be considered as a one-parameter family $s \to \sigma_s$ of curves in $M$, each of which belongs to $\pi_p$. Thus, if $\sigma_s = \sigma$ for $s = 0$, we may think of this as a deformation of the curve $\sigma$. We can construct the “infinitesimal deformation” corresponding to this deformation as a vector field $v: t \to v(t) \in M_{\sigma(t)}$ along the curve $\sigma$ as follows: For $0 \le t \le 1$, $v(t)$ is the tangent vector to the curve $s \to \sigma_s(t)$ at $s = 0$.
The condition that $\sigma_s(0) = p$ for all $s$ obviously requires $v(0) = 0$. Thus it is reasonable to define the “tangent space” to $\pi_p$ “at” a curve $\sigma$ as the set of all vector fields $t \to v(t)$ along $\sigma$ that arise in this way as infinitesimal deformations corresponding to curves in $\pi_p$. (See Fig. 4.) Denote this space by $\pi_p^\sigma$. Obviously, now, the “differential” of the end-point mapping $\pi_p \to M$ at a point $\sigma \in \pi_p$ is the mapping that assigns the end vector $v(1)$ to each such vector field $t \to v(t)$ along $\sigma$.
FIGURE 4
Suppose now that $N$ is an ordinary finite dimensional manifold and that $\phi: N \to \pi_p$ is a mapping. Explicitly, then, to each point $a \in N$, we are given a curve $\sigma_a$ of $\pi_p$. We shall consider that this mapping is “smooth” if the mapping $(t, a) \to \sigma_a(t)$ from $N \times [0, 1] \to M$ is smooth (that is, $C^\infty$) in the ordinary sense. This enables us to define a linear mapping from $N_a$ to $\pi_p^{\sigma_a}$. Composing this with the differential of the end-point mapping $\pi_p^{\sigma_a} \to M_{\sigma_a(1)}$ gives us a linear mapping $N_a \to M_{\sigma_a(1)}$, which is the differential of the mapping $N \to M$ that assigns $\sigma_a(1)$ to $a$. We conclude from the ordinary implicit function theorem for finite dimensional manifolds that if the mapping $N \to \pi_p$ can be chosen so that the linear mapping $N_a \to M_{\sigma_a(1)}$ is onto, then every point in a neighborhood of $\sigma_a(1)$ can be reached by curves from $\pi_p$ that are “close” to the initial curve $\sigma_a$. We shall not go further into this approach here. It is done quite simply and naturally in a classic paper by J. Radon [1]. Indeed, since this paper is one of the clearest and most elegant in the entire history of the calculus of variations, we prefer to suggest to the reader that he consult it directly, rather than attempt to reproduce its contents here. As a bonus, Radon also gives an exposition of the theory of the second variation, a subject we shall not touch on except in the context of Riemannian geometry, Part 3.
Part
3
GLOBAL RIEMANNIAN GEOMETRY
19
Affine Connections on Differential Manifolds
Introduction
Let $M$ be a manifold. Recall that a homogeneous, ordinary variational problem on $M$ is defined by giving a real-valued function $v \to L(v)$ on $T(M)$, called a Lagrangian on $M$, such that $L(\lambda v) = \lambda L(v)$ for $\lambda > 0$. Suppose that $L$ satisfies the following condition:
$$L(u + v)^2 + L(u - v)^2 = 2L(u)^2 + 2L(v)^2 \quad \text{for } u, v \in T(M).$$
We can then define, for $u, v \in T(M)$,
$$\langle u, v \rangle = \tfrac{1}{2}\bigl(L(u + v)^2 - L(u)^2 - L(v)^2\bigr),$$
and verify easily that on each tangent space $M_x$, $x \in M$, the mapping $(u, v) \to \langle u, v \rangle$ defines a symmetric, bilinear form such that
$$L(u) = \langle u, u \rangle^{1/2} \quad \text{for all } u \in M_x.$$
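The relation between $L$ and the bilinear form can be checked directly. The following is a short sketch that uses only the homogeneity $L(2u) = 2L(u)$ and the parallelogram condition stated above; nothing else is assumed.

```latex
% Recovering L from the form: put v = u in the definition and use L(2u) = 2L(u).
\langle u, u \rangle
  = \tfrac{1}{2}\bigl(L(2u)^2 - 2L(u)^2\bigr)
  = \tfrac{1}{2}\bigl(4L(u)^2 - 2L(u)^2\bigr)
  = L(u)^2 .
% Polarization: the parallelogram condition gives
%   L(u - v)^2 = 2L(u)^2 + 2L(v)^2 - L(u + v)^2 ,
% and therefore
\tfrac{1}{4}\bigl(L(u + v)^2 - L(u - v)^2\bigr)
  = \tfrac{1}{2}\bigl(L(u + v)^2 - L(u)^2 - L(v)^2\bigr)
  = \langle u, v \rangle .
```

The second identity is the familiar polarization formula, which shows that the form is determined by $L$ alone.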
We also write: $L(u) = \|u\|$. If, in addition, this bilinear form is positive definite, we say that $L$ defines a Riemannian metric on $M$; the study of the geometric properties of the extremal curves of this Lagrangian (called, in this special case, geodesics) constitutes Riemannian geometry. Riemannian metrics are the simplest general class of variational problems and have been studied more extensively than more general variational problems; therefore they appear naturally in a great many contexts in mathematics and physics. Although some of the basic general theorems of Riemannian geometry can be considered as special cases of theorems about general variational problems, there are many more special results, giving a deeper insight into the structure of the extremals. One general reason for this is that there is an additional element of structure, an affine connection, associated with a Riemannian metric, enabling one to study the differential equations for the extremals in a more detailed way, and leading to the concept of curvature. There has recently been a resurgence of interest on the part of differential geometers in the study of global Riemannian geometry. We aim in this chapter to provide at least an introduction to this aspect of current research.
Definition
An affine connection on a manifold $M$ is a map $V(M) \times V(M) \to V(M)$, denoted by $(X, Y) \to \nabla_X Y$ for $X, Y \in V(M)$, such that
$$\nabla_{X_1 + X_2}(Y_1 + Y_2) = \nabla_{X_1} Y_1 + \nabla_{X_1} Y_2 + \nabla_{X_2} Y_1 + \nabla_{X_2} Y_2 \qquad (19.1a)$$
for all $X_1, X_2, Y_1, Y_2 \in V(M)$. (Bilinearity.)
$$\nabla_{fX} Y = f \nabla_X Y, \qquad (19.1b)$$
$$\nabla_X (fY) = X(f) Y + f \nabla_X Y \qquad (19.1c)$$
for $f \in F(M)$; $X, Y \in V(M)$.
Intuitively, an affine connection is a law of “covariant differentiation” of vector fields ($\nabla_X Y$ is described as the covariant derivative of the vector field $Y$ by the vector field $X$). The reader may protest at this point that we already have a method of “differentiating” $Y$ by $X$, namely, the Jacobi bracket $[X, Y]$. Note, however, the difference in the formal properties of these two operations, particularly $[fX, Y] = -Y(f)X + f[X, Y]$. To get a distinctive feeling for the difference, let us look at the situation in a typical manifold, namely, a convex domain $D$ in $R^n$ with coordinate functions $x_i$, $1 \le i, j, \ldots \le n$. The functions $\Gamma_{ijk}$ in $D$ such that
are called the components of the affine connection in the coordinate system $(x_i)$. (In classical tensor analysis, they are more or less the Christoffel symbols.) On the one hand, they completely characterize the affine connection, since if
$$X = A_i \frac{\partial}{\partial x_i}, \qquad Y = B_j \frac{\partial}{\partial x_j},$$
we have, using the laws (19.1),
(19.2)
On the other hand, it is readily verified that on changing to a different coordinate system, these components do not have a “tensorial” law of transformation. (The tipoff that this is so is that (19.1c) is postulated instead of the “tensorial” postulate $\nabla_X (fY) = f \nabla_X Y$.) This nontensorial character of an affine connection creates great difficulties in classical tensor analysis;
a triumph of the manifold point of view is that an affine connection can be described by the simple postulates (19.1). (They were first given by J. L. Koszul.) Equation (19.2) shows us in greater detail the difference between Jacobi bracket and covariant derivative. For example, in $\nabla_X Y$, only the components of $Y$ are differentiated, and $\nabla_X Y(p) = 0$ if $X(p) = 0$.
An affine connection on $M$ gives rise to a method of “parallel translation” of tangent vectors of $M$ along any curve in $M$. First we shall describe in qualitative terms what we want to mean by this. Suppose that $\sigma: [a, b] \to M$ is a curve in $M$. (By a curve we mean $C^\infty$ in the sense that there exists an $\epsilon > 0$ such that $\sigma$ can be extended as a $C^\infty$ map to $(a - \epsilon, b + \epsilon)$.) The tangent spaces to two points of $M$ are both $n$-dimensional real vector spaces; hence they are isomorphic when considered as abstract vector spaces. However, there is no unique way of describing such an isomorphism. Of course, if $M$ is Euclidean space itself, the tangent spaces are isomorphic with $M$ (hence with each other), but we have been emphasizing that this must be ignored if one is interested in nonlinear phenomena. By parallel translation along $\sigma$ we mean some method of setting up consistently and smoothly an isomorphism between $M_{\sigma(a)}$ and $M_{\sigma(t)}$ for $a \le t \le b$. It is to be expected, however, that the isomorphism one derives between $M_{\sigma(a)}$ and $M_{\sigma(b)}$ will depend on the choice of curve joining $\sigma(a)$ to $\sigma(b)$. We shall not worry here about the more general scheme for accomplishing this goal, but shall describe how an affine connection on $M$, that is, a covariant differentiation law $\nabla$ satisfying (19.1), gives such a process. Suppose the curve $\sigma$ is an integral curve for a vector field $X$ on $M$; that is, $\sigma'(t) = X(\sigma(t))$ for $a \le t \le b$. Of course not every curve can be so described, but those that can be are sufficiently plentiful for our purposes. Suppose that $Y$ is another vector field on $M$. Consider the condition
$$\nabla_X Y(\sigma(t)) = 0 \quad \text{for } a \le t \le b. \qquad (19.3a)$$
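To make the contrast between covariant derivative and Jacobi bracket concrete, here is a sketch of the coordinate computation from the postulates (19.1). It assumes the index convention $\nabla_{\partial/\partial x_i}(\partial/\partial x_j) = \Gamma_{ijk}\,\partial/\partial x_k$ for the components; that ordering of indices is an assumption made for this illustration, and with a different ordering the indices on $\Gamma$ below permute accordingly.

```latex
% X = A_i \partial/\partial x_i,  Y = B_j \partial/\partial x_j  (summation convention).
\nabla_X Y
  = A_i\,\nabla_{\partial/\partial x_i}\Bigl(B_j \frac{\partial}{\partial x_j}\Bigr)
      % by (19.1a) and (19.1b)
  = A_i\Bigl(\frac{\partial B_j}{\partial x_i}\frac{\partial}{\partial x_j}
      + B_j\,\Gamma_{ijk}\frac{\partial}{\partial x_k}\Bigr)
      % by (19.1c)
  = A_i\Bigl(\frac{\partial B_k}{\partial x_i} + \Gamma_{ijk} B_j\Bigr)
      \frac{\partial}{\partial x_k} .
% For comparison, the Jacobi bracket differentiates both sets of components:
[X, Y] = \Bigl(A_i \frac{\partial B_k}{\partial x_i}
      - B_i \frac{\partial A_k}{\partial x_i}\Bigr)\frac{\partial}{\partial x_k} .
```

Only the $B_k$ are differentiated in the first expression, and every term carries an undifferentiated factor $A_i$, which is why $\nabla_X Y(p)$ vanishes wherever $X(p) = 0$.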
If $Y$ satisfies this condition, we say that its covariant derivative along $\sigma$ is zero. Let us for the moment look at this condition in a domain $D \subset R^n$ with coordinates $(x_i)$. Suppose that $\sigma(t) = (x_i(t)) = x(t)$. Using (19.2) and the relations
we have
Thus we see that (19.3a) is equivalent to the relations
As a first observation, notice that this condition on $X$ and $Y$ involves only the values of $X$ and $Y$ on the curve $\sigma(t)$. Thus, if $x(t) = (x_i(t))$, $a \le t \le b$, is an arbitrary curve in $D$, and if $v: [a, b] \to T(D)$ is a vector field along $x(t)$, that is, $v(t) \in D_{x(t)}$ for $a \le t \le b$, with components $v_i(t) = dx_i(v(t))$; that is, if
we may define another vector field, denoted by $\nabla v(t)$, along the curve $x(t)$ by the following formula and have reasonable expectations that the operator $v \to \nabla v$ on vector fields along the curve is of interest. ($\nabla v$ is called the covariant derivative of $v$ along the curve.)
(19.4)
Note that $\nabla v(t) = 0$ if and only if
(19.5)
As soon as $x(t)$ is given, (19.5) may be regarded as a system of first-order, linear, homogeneous differential equations for the components $v_k(t)$ of $v$. Thus we see that, given a $v^a \in D_{x(a)}$, there is a unique vector field $v(t)$ along the curve such that $\nabla v = 0$ and $v(a) = v^a$, and that if $u^a$ is another element of $D_{x(a)}$, and if the vector field $u(t)$ along $x(t)$ satisfies $\nabla u = 0$ and $u(a) = u^a$, then
$$\nabla(u(t) + v(t)) = 0 \quad \text{and} \quad u(a) + v(a) = u^a + v^a.$$
Thus the correspondence $v^a \to v(t)$ sets up an isomorphism between the vector spaces $D_{x(a)}$ and $D_{x(t)}$ for each $t \in [a, b]$. This is the desired “parallel translation” of tangent vectors along the curve. We can sum up what we have proved for domains of $R^n$ in terms of an arbitrary manifold, in the form of the following theorem. The proof for an arbitrary manifold can be done by referring pieces of the manifold back to $R^n$ via charts, and will be left to the reader.
THEOREM 19.1 Let $M$ be a manifold with an affine connection described by a covariant differentiation operation $(X, Y) \to \nabla_X Y$ satisfying (19.1). Let $\sigma: [a, b] \to M$ be a curve in $M$. A vector field on $\sigma$ is a mapping, usually denoted by $v$, assigning a tangent vector $v(t) \in M_{\sigma(t)}$ to each $t \in [a, b]$. Then there is an operation assigning a new vector field $\nabla v$ to each vector field $v$ on $\sigma$, with the following properties:
(a) If $\alpha(t)$ is a real-valued function of $t$, $a \le t \le b$, then $\nabla(\alpha(t)v(t)) = \dfrac{d\alpha}{dt}\,v(t) + \alpha(t)\nabla v(t)$.
(b) If $u$ and $v$ are vector fields along $\sigma$, then $\nabla u + \nabla v = \nabla(u + v)$.
(c) If $X$ and $Y$ are vector fields on $M$, with $\sigma'(t) = X(\sigma(t))$, $v(t) = Y(\sigma(t))$ for $a \le t \le b$, then $\nabla v(t) = \nabla_X Y(\sigma(t))$.
(d) If $v_a$ is a given tangent vector in $M_{\sigma(a)}$, there is a unique vector field $v$ along $\sigma$ with $v(a) = v_a$ and $\nabla v = 0$. The correspondence $v_a \to v(t)$, for each $t \in [a, b]$, is linear, and sets up an isomorphism between $M_{\sigma(a)}$ and $M_{\sigma(t)}$, called the parallel translation of tangent vectors along $\sigma$.
Any vector field $v$ on $\sigma$ such that $\nabla v = 0$ is said to be self-parallel. If $M = R^n$, a special affine connection can be defined by requiring that $\nabla_{\partial/\partial x_i}(\partial/\partial x_j) = 0$; if $v(t) = v_i(t)(\partial/\partial x_i)(\sigma(t))$, the condition that $\nabla v = 0$ reduces to $dv_i/dt = 0$, that is, $v_i(t) = v_i(a)$. Thus the parallelism idea does not really depend on the curve, and the isomorphism between tangent spaces is that obtained by identifying the tangent space to Euclidean space at a point with Euclidean space itself. The straight lines in Euclidean space are those whose tangent vectors at different points are parallel, that is, curves $\sigma(t)$ satisfying
$$\nabla \sigma'(t) = 0. \qquad (19.6a)$$
Thus we are justified in calling the curves satisfying (19.6a), in a space with a general affine connection, the straight lines or self-parallel curves of the space. Let us look at the conditions by referral via a chart back to a domain with coordinates $(x_i)$ in $R^n$. If $\sigma(t) = (x_i(t))$, the components of the vector field $\sigma'(t)$ along $\sigma$ are $(dx_i/dt)$, and, using (19.5), (19.6a) becomes
$$\frac{d^2 x_k(t)}{dt^2} = -\Gamma_{ijk}(x(t))\,\frac{dx_i}{dt}\,\frac{dx_j}{dt}. \qquad (19.6b)$$
These differential equations are, of course, nonlinear and of second order. The most one can say is that there is a unique solution with $x(a)$ and $(dx/dt)(a)$, that is, with $\sigma(a)$ and $\sigma'(a)$, prescribed; that the solution to this initial value problem exists if $b$ is sufficiently close to $a$; and that the obtained solution is a $C^\infty$ function of the initial conditions $x(a)$ and $(dx/dt)(a)$. Of course we also have the following homogeneity condition:
If $t \to \sigma(t)$ is a solution of (19.6a), so is the curve $t \to \sigma(\lambda t)$, where $\lambda$ is a real constant. (19.7)
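The homogeneity condition (19.7) can be checked in coordinates by the chain rule. The sketch below assumes that the self-parallel equation has the form $d^2x_k/dt^2 = -\Gamma_{ijk}\,(dx_i/dt)(dx_j/dt)$; the placement of the indices on $\Gamma$ is an assumption, but the argument uses only the fact that the right-hand side is quadratic in the velocities.

```latex
% Let x(t) solve the self-parallel equation and put y(t) = x(\lambda t).
\frac{dy_k}{dt}(t) = \lambda\,\frac{dx_k}{dt}(\lambda t),
\qquad
\frac{d^2 y_k}{dt^2}(t) = \lambda^2\,\frac{d^2 x_k}{dt^2}(\lambda t)
  = -\Gamma_{ijk}\bigl(x(\lambda t)\bigr)\,
     \Bigl(\lambda\frac{dx_i}{dt}(\lambda t)\Bigr)
     \Bigl(\lambda\frac{dx_j}{dt}(\lambda t)\Bigr)
  = -\Gamma_{ijk}\bigl(y(t)\bigr)\,\frac{dy_i}{dt}(t)\,\frac{dy_j}{dt}(t) .
% Flat case: if all \Gamma_{ijk} vanish, the equation reads d^2 x_k / dt^2 = 0, so
x(t) = x(a) + (t - a)\,\frac{dx}{dt}(a) ,
% the straight line through x(a) with the prescribed initial velocity.
```

The same computation shows that a general change of parameter $t \to \phi(t)$ preserves the equation only when $\phi'' = 0$, so self-parallel curves carry a parametrization determined up to affine changes.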
Now we turn to the description of the torsion and curvature tensor fields associated with an affine connection on a manifold $M$ given by a covariant differentiation operation satisfying (19.1). We shall mainly restrict ourselves to formal considerations, since our goal is to sketch the theory of affine connections in as efficient a manner as possible for use as a tool in Riemannian geometry. The torsion and curvature tensors are, respectively, maps $T: V(M) \times V(M) \to V(M)$ and $R: V(M) \times V(M) \times V(M) \to V(M)$, defined as follows for $X, Y, Z \in V(M)$:
$$T(X, Y) = \nabla_X Y - \nabla_Y X - [X, Y]. \qquad (19.8a)$$
$$R(X, Y)(Z) = \nabla_X(\nabla_Y Z) - \nabla_Y(\nabla_X Z) - \nabla_{[X, Y]} Z. \qquad (19.8b)$$
We write $R(X, Y)(Z)$ because we mean to suggest that $R$ should be interpreted as a law assigning to each pair $X, Y \in V(M)$ a mapping $Z \to R(X, Y)(Z)$ of $V(M)$ into itself. Certain algebraic properties of $T$ and $R$ should be evident. First, both $T$ and $R$ are skew-symmetric in $X$ and $Y$. Second, both are obviously $R$-multilinear in their arguments. Less obviously, however (since $\nabla$ is not $F(M)$-multilinear), they are both $F(M)$-multilinear with respect to multiplication of $X$, $Y$, or $Z$ by functions from $F(M)$. For example, using (19.1), for $f \in F(M)$:
$$T(fX, Y) = \nabla_{fX} Y - \nabla_Y(fX) - [fX, Y] = f\nabla_X Y - Y(f)X - f\nabla_Y X + Y(f)X - f[X, Y] = fT(X, Y).$$
$$R(fX, Y)(Z) = f\nabla_X \nabla_Y Z - \nabla_Y(f\nabla_X Z) - \nabla_{[fX, Y]} Z = f\nabla_X \nabla_Y Z - Y(f)\nabla_X Z - f\nabla_Y \nabla_X Z + Y(f)\nabla_X Z - f\nabla_{[X, Y]} Z = fR(X, Y)(Z).$$
That $f$ pulls through with no differentiations acting on it when the other arguments are multiplied by it is verified similarly. This property indicates the tensorial character of $T$ and $R$. We can use this property to define the values of $T$ and $R$ at each point $p \in M$ in a way similar to that we used earlier to define the value of a differential form at a point. The value of $T$ at $p$, denoted by $T_p$, is to be a bilinear mapping $M_p \times M_p \to M_p$. For $u, v \in M_p$, choose vector fields $X, Y \in V(M)$ with $X(p) = u$, $Y(p) = v$, and define
$$T_p(u, v) = T(X, Y)(p).$$
The value of $R$ at $p$, denoted by $R_p$, is to be a multilinear mapping $M_p \times M_p \times M_p \to M_p$ defined as follows: For $u_1, u_2, v \in M_p$, choose $X, Y, Z \in V(M)$ with $X(p) = u_1$, $Y(p) = u_2$, $Z(p) = v$, and
$$R_p(u_1, u_2)(v) = R(X, Y)(Z)(p).$$
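The “similar” verifications for the remaining arguments can be sketched as follows, using only the postulates (19.1), the definitions (19.8), and the bracket rule $[X, fY] = X(f)Y + f[X, Y]$. Skew-symmetry then covers the cases not written out.

```latex
% Torsion, second argument:
T(X, fY) = \nabla_X(fY) - \nabla_{fY} X - [X, fY]
         = X(f)Y + f\nabla_X Y - f\nabla_Y X - X(f)Y - f[X, Y]
         = f\,T(X, Y).
% Curvature, third argument. First expand the second derivatives:
\nabla_X \nabla_Y (fZ)
   = X(Y(f))\,Z + Y(f)\nabla_X Z + X(f)\nabla_Y Z + f\nabla_X \nabla_Y Z ,
\nabla_Y \nabla_X (fZ)
   = Y(X(f))\,Z + X(f)\nabla_Y Z + Y(f)\nabla_X Z + f\nabla_Y \nabla_X Z ,
\nabla_{[X, Y]}(fZ) = [X, Y](f)\,Z + f\nabla_{[X, Y]} Z .
% Subtracting, the terms X(Y(f))Z - Y(X(f))Z - [X,Y](f)Z cancel, leaving
R(X, Y)(fZ) = f\bigl(\nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X, Y]} Z\bigr)
            = f\,R(X, Y)(Z).
```

The cancellation in the last step is exactly the reason the term $\nabla_{[X, Y]} Z$ appears in the definition (19.8b).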
Of course we must verify that these definitions make sense, that is, are independent of the vector fields chosen to extend the tangent vectors at $p$. One can show that it suffices to verify this for a neighborhood of each point $p$. Using charts, it suffices, then, to suppose that $M$ is a convex domain of $R^n$, with coordinates $(x_i)$. Thus, for example, if
$$u = u_i \frac{\partial}{\partial x_i}(p), \quad v = v_i \frac{\partial}{\partial x_i}(p), \quad X = A_i \frac{\partial}{\partial x_i}, \quad Y = B_i \frac{\partial}{\partial x_i},$$
with $A_i(p) = u_i$, $B_i(p) = v_i$,
and if
(the $T_{ijk}$ are the components of $T$ with respect to the coordinate system $(x_i)$), then
This shows quite explicitly that defining $T_p(u, v) = T(X, Y)(p)$ is legitimate, since it is independent of how $u$, $v$ are extended to vector fields. Similar considerations hold also for the curvature tensor, of course.
Consider now a manifold $M$ and a two-parameter family $\delta(s, t)$ in $M$, $a \le t \le b$, $0 \le s \le 1$; that is, $\delta$ is a map $[0, 1] \times [a, b] \to M$. The geometric interpretation of $\delta$ is mainly a matter of taste, of course, but the following picture will be most useful to us: For $s$ held fixed, the curve $t \to \delta(s, t)$ is a curve in $M$, denoted by $\sigma_s$, say. Thus $s \to \sigma_s$ can be considered as a one-parameter family of curves, or as a curve in the space of curves of $M$. $\delta(s, t)$ can be thought of as a homotopy or deformation of the curve $t \to \sigma(t) = \sigma_0(t) = \delta(0, t)$. For notational convenience, we shall usually normalize the parametrization of $t$ to be also the unit interval $[0, 1]$. A vector field on $\delta$ is a map, denoted by, say, $v$, of $[0, 1] \times [0, 1] \to T(M)$, assigning to $s, t \in [0, 1]$ a tangent vector $v(s, t) \in M_{\delta(s, t)}$. The vector fields $\partial_s \delta$, $\partial_t \delta$ on $\delta$ are defined as follows:
For $0 \le s, t \le 1$, $\partial_s \delta(s, t)$ and $\partial_t \delta(s, t)$ are, respectively, the tangent vectors to the curves $u \to \delta(u, t)$ and $u \to \delta(s, u)$ at $u = s$ and $u = t$. Thus $\partial_s \delta$ and $\partial_t \delta$ are the tangent vector fields to the curves obtained by holding, respectively, $t$ and $s$ constant and varying $s$ and $t$. (19.9)
Similarly, if $v: (s, t) \to v(s, t)$ is a vector field on $\delta$, and if $\nabla$ is a fixed affine connection on $M$, define $\nabla_s v$ and $\nabla_t v$ as vector fields on $\delta$, the covariant derivatives of $v$ in, respectively, the $s$- and $t$-directions, as follows:
For $0 \le s, t \le 1$, $\nabla_s v(s, t)$ and $\nabla_t v(s, t)$ are, respectively, the values of the covariant derivative vector fields at $u = s$ and $u = t$ of the vector fields $u \to v(u, t)$ and $u \to v(s, u)$ along the curves $u \to \delta(u, t)$ and $u \to \delta(s, u)$. (19.10)
The reader will find it easier to keep these definitions in mind by referring to a special case, namely: Suppose that $X$ and $Y$ are vector fields on $M$ such that for each $s$, $t \to \delta(s, t)$ is an integral curve of $X$, and for each $t$, $s \to \delta(s, t)$ is an integral curve of $Y$. Then, from (19.9),
$$\partial_t \delta(s, t) = X(\delta(s, t)), \qquad \partial_s \delta(s, t) = Y(\delta(s, t)). \qquad (19.11)$$
Of course not all homotopies can be written in this form, but usually this type of homotopy is sufficiently general so that results proved for them extend to arbitrary homotopies. If also there is a vector field $Z$ on $M$ so that $Z(\delta(s, t)) = v(s, t)$, (19.10) takes the form
$$\nabla_s v(s, t) = \nabla_Y Z(\delta(s, t)), \qquad (19.12a)$$
$$\nabla_t v(s, t) = \nabla_X Z(\delta(s, t)). \qquad (19.12b)$$
Note that if $X, Y \in V(M)$ satisfy (19.11), then
$$[X, Y](\delta(s, t)) = 0. \qquad (19.13)$$
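Proof. It suffices to prove this in case $\delta$ is contained in a sufficiently small open set of $M$. We may as well suppose, then, that we are in a domain of $R^n$ with coordinates $(x_i)$. Suppose that
$$X = A_i \frac{\partial}{\partial x_i}, \qquad Y = B_i \frac{\partial}{\partial x_i},$$
and that
$$\delta(s, t) = (x_i(s, t)) = x(s, t).$$
Then (19.11) reduces to
$$\frac{\partial x_i}{\partial t} = A_i(x(s, t)), \qquad \frac{\partial x_i}{\partial s} = B_i(x(s, t)).$$
Thus,
$$0 = \frac{\partial^2 x_i}{\partial s\,\partial t} - \frac{\partial^2 x_i}{\partial t\,\partial s} = \frac{\partial A_i}{\partial x_j}\frac{\partial x_j}{\partial s} - \frac{\partial B_i}{\partial x_j}\frac{\partial x_j}{\partial t} = \frac{\partial A_i}{\partial x_j} B_j - \frac{\partial B_i}{\partial x_j} A_j,$$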
whence (19.13).
Now we are prepared to state the fundamental formulas connecting this “covariant derivative along curves” concept with the curvature and torsion tensor fields: Suppose that $\delta(s, t)$, $0 \le s, t \le 1$, is a homotopy of curves in a manifold $M$ with affine connection $\nabla$, and that $v: (s, t) \to v(s, t) \in M_{\delta(s, t)}$ is a vector field on $\delta$. Then,
$$\nabla_s \partial_t \delta(s, t) - \nabla_t \partial_s \delta(s, t) = T_{\delta(s, t)}(\partial_s \delta(s, t), \partial_t \delta(s, t)), \qquad (19.14a)$$
$$\nabla_s \nabla_t v(s, t) - \nabla_t \nabla_s v(s, t) = R_{\delta(s, t)}(\partial_s \delta(s, t), \partial_t \delta(s, t))(v(s, t)), \qquad (19.14b)$$
where, for each $p \in M$, $T_p$ and $R_p$ are respectively the values, as defined above, of the torsion and curvature tensors of the affine connection.
Proof. We shall give the proof only in the special case where there are vector fields $X$, $Y$ on $M$ related to $\delta$ via (19.11), and a vector field $Z$ on $M$ such that $v(s, t) = Z(\delta(s, t))$. The proof in complete generality must be done by a straightforward but tedious computation in local coordinates, which we leave to the reader as an exercise. By (19.13),
$$[X, Y](\delta(s, t)) = 0.$$
Hence
$$\nabla_s \partial_t \delta(s, t) - \nabla_t \partial_s \delta(s, t) = \nabla_Y X(\delta(s, t)) - \nabla_X Y(\delta(s, t)) - [Y, X](\delta(s, t)) = T(Y, X)(\delta(s, t)) = T_{\delta(s, t)}(\partial_s \delta(s, t), \partial_t \delta(s, t)),$$
whence (19.14a). By (19.12),
$$\nabla_t v(s, t) = \nabla_X Z(\delta(s, t)), \qquad \nabla_s v(s, t) = \nabla_Y Z(\delta(s, t)).$$
Then applying (19.12) again, we have
$$\nabla_s \nabla_t v(s, t) = \nabla_Y \nabla_X Z(\delta(s, t)),$$
$$\nabla_t \nabla_s v(s, t) = \nabla_X \nabla_Y Z(\delta(s, t)) = [\nabla_Y \nabla_X Z + \nabla_{[X, Y]} Z + R(X, Y)(Z)](\delta(s, t)).$$
Now $R(X, Y)(Z)(\delta(s, t)) = R_{\delta(s, t)}(X(\delta(s, t)), Y(\delta(s, t)))(Z(\delta(s, t)))$, by definition of the “value” of $R$ at a point. We have seen that $[X, Y](\delta(s, t)) = 0$ implies that $\nabla_{[X, Y]}(Z)(\delta(s, t)) = 0$. Subtraction, together with the skew-symmetry of $R$ in its first two arguments, now gives (19.14b).
As a first application of (19.14), we have:
THEOREM 19.2 Suppose that $M$ is a manifold with an affine connection. If, for any curve $\sigma: [a, b] \to M$, parallel translation of $M_{\sigma(a)}$ to $M_{\sigma(b)}$ along $\sigma$ does not actually depend on the curve joining $\sigma(a)$ to $\sigma(b)$, then the curvature tensor of the affine connection is identically zero. Conversely, if the curvature tensor is identically zero and if $M$ is simply connected, then parallel translation of tangent vectors along curves joining any two points of $M$ really does not depend on the curve joining the two points.
Proof. Suppose that $\delta(s, t)$, $0 \le s, t \le 1$, is a homotopy of curves with fixed end points; that is,
$$\delta(s, 0) = \delta(0, 0), \qquad \delta(s, 1) = \delta(0, 1) \quad \text{for } 0 \le s \le 1.$$
Let $v_0$ be an element of $M_{\delta(0, 0)}$, and define a vector field $v(s, t)$ on $\delta$ as follows: $v(s, 0) = v_0$, and the vector field $t \to v(s, t)$ on the curve $t \to \delta(s, t)$ is self-parallel. Analytically, this means that $\nabla_t v = 0$. Applying (19.14), we have
$$\nabla_s \nabla_t v = 0 = \nabla_t \nabla_s v + R(\partial_s \delta, \partial_t \delta)(v).$$
Now $v(s, 1) \in M_{\delta(0, 1)}$ is the parallel translate of $v_0$ along the curve $t \to \delta(s, t)$. Thus, if parallel translation is independent of the curve, we must have
$$v(s, 1) = v(0, 1),$$
whence $\nabla_s v(s, 1) = 0$, whence
$$R_{\delta(0, 1)}(\partial_s \delta(0, 1), \partial_t \delta(0, 1))(v(0, 1)) = 0.$$
Since $v_0$ and $\delta$ can be chosen completely arbitrarily, we must obviously have $R = 0$. Conversely, if $R = 0$, suppose $\sigma(t)$ and $\sigma_1(t)$, $0 \le t \le 1$, are two curves joining $\sigma(0)$ and $\sigma(1)$. If $M$ is simply connected, we can find a fixed end-point homotopy $\delta(s, t)$ such that
$$\delta(0, t) = \sigma(t), \qquad \delta(1, t) = \sigma_1(t).$$
Let $v$ be a vector field along $\delta$ such that $\nabla_t v = 0$ and $v(s, 0) = v_0$. Since $R = 0$, $\nabla_t \nabla_s v = \nabla_s \nabla_t v = 0$. Since $v(s, 0) = v_0$, also $\nabla_s v(s, 0) = 0$.
Hence, for fixed $s$, $\nabla_s v(s, t)$ is the parallel translate of $\nabla_s v(s, 0)$ along the curve $t \to \delta(s, t)$; hence $\nabla_s v(s, t)$ must also be zero. In particular, $\nabla_s v(s, 1) = 0$. Since $\delta(s, 1) = \delta(1, 1)$, this forces $v(s, 1) = v(0, 1)$ for all $s$; in particular $v(1, 1) = v(0, 1)$. But $v(1, 1)$ and $v(0, 1)$ are, respectively, the parallel translates of the vector $v_0$ along the curves $\sigma_1$ and $\sigma$. Q.E.D.
20
The Riemannian Affine Connection and the First Variation Formula
Suppose that $M$ is a manifold with a Riemannian Lagrangian $L \in F(T(M))$. As explained in the Introduction, this means that, for each $p \in M$, there is a positive definite symmetric bilinear form $(u, v) \to \langle u, v \rangle$ on each tangent space $M_p$, such that
$$L(v) = \langle v, v \rangle^{1/2} = \|v\|.$$
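If $X$, $Y$ are vector fields on $M$, the inner product of $X$ and $Y$, denoted by $\langle X, Y \rangle$, can be defined as
$$\langle X, Y \rangle(p) = \langle X(p), Y(p) \rangle.$$
This inner product has the properties:
$$\langle X, Y \rangle \in F(M) \quad \text{for } X, Y \in V(M). \qquad (20.1a)$$
$$\langle X, Y \rangle = \langle Y, X \rangle. \qquad (20.1b)$$
$$\langle f_1 X_1 + f_2 X_2, Y \rangle = f_1 \langle X_1, Y \rangle + f_2 \langle X_2, Y \rangle \quad \text{for } f_1, f_2 \in F(M),\ X_1, X_2, Y \in V(M). \qquad (20.1c)$$
$$\langle X, X \rangle \ge 0; \quad \langle X, X \rangle(p) = 0 \text{ implies } X(p) = 0. \qquad (20.1d)$$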
Clearly, a Riemannian metric can be just as well defined by an inner product satisfying (20.1). If (20.1d) is replaced by the weaker definiteness condition,
$$\langle X, Y \rangle = 0 \text{ for all } Y \in V(M) \text{ implies } X = 0, \qquad (20.1d')$$
then the inner product is said to define a pseudo-Riemannian structure on $M$. Many of the elementary formal properties carry over from Riemannian to pseudo-Riemannian situations. However, since the deeper global properties of the geodesics do not carry over in a routine way, we shall restrict ourselves to the Riemannian case. Of course the pseudo-Riemannian case is very important for the theory of relativity, but we must refer for the moment to Lichnerowicz' book [1] for an account of this aspect.
THEOREM 20.1
Suppose that $M$ is a manifold, with a Riemannian metric defined by an inner product for vector fields satisfying (20.1). Then there is a unique affine connection $\nabla$ on $M$ such that
(a) $\nabla_X Y = \nabla_Y X + [X, Y]$ for $X, Y \in V(M)$; that is, the torsion tensor is identically zero.
(b) $Z(\langle X, Y\rangle) = \langle \nabla_Z X, Y\rangle + \langle X, \nabla_Z Y\rangle$ for $X, Y, Z \in V(M)$.
This affine connection is called the Riemannian (or Levi-Civita) connection.
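As a quick sanity check on conditions (a) and (b), consider the simplest case, which is our illustration rather than part of the text: $M = R^n$ with the usual inner product and coordinates $(x_j)$. The sketch below takes componentwise differentiation as the candidate connection and verifies both conditions; the component notation $X_j$, $Y_j$ is assumed only for this example.

```latex
% Euclidean space R^n, with X = \sum_j X_j \,\partial/\partial x_j, and similarly Y, Z.
% Candidate connection (an assumption of this example):
\[
  \nabla_X Y = \sum_j X(Y_j)\,\frac{\partial}{\partial x_j}.
\]
% (a) Zero torsion:
\[
  \nabla_X Y - \nabla_Y X
    = \sum_j \bigl( X(Y_j) - Y(X_j) \bigr)\frac{\partial}{\partial x_j}
    = [X, Y].
\]
% (b) Compatibility with \langle X, Y\rangle = \sum_j X_j Y_j (product rule):
\[
  Z(\langle X, Y\rangle)
    = \sum_j \bigl( Z(X_j)\,Y_j + X_j\,Z(Y_j) \bigr)
    = \langle \nabla_Z X, Y\rangle + \langle X, \nabla_Z Y\rangle.
\]
```

By the uniqueness asserted in the theorem, componentwise differentiation is then the Riemannian connection of Euclidean space.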
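Proof. We prove uniqueness first. Rewrite (b) as
$$\begin{aligned}
\langle \nabla_Z X, Y\rangle &= Z\langle X, Y\rangle - \langle X, \nabla_Z Y\rangle \\
&= \text{(using (a))}\quad Z\langle X, Y\rangle - \langle X, \nabla_Y Z\rangle - \langle X, [Z, Y]\rangle \\
&= \text{(using (b) again)}\quad Z\langle X, Y\rangle - \langle X, [Z, Y]\rangle - Y\langle X, Z\rangle + \langle \nabla_Y X, Z\rangle \\
&= \text{(using (a))}\quad Z\langle X, Y\rangle - \langle X, [Z, Y]\rangle - Y\langle X, Z\rangle + \langle \nabla_X Y, Z\rangle + \langle [Y, X], Z\rangle \\
&= \text{(using (b))}\quad Z\langle X, Y\rangle - \langle X, [Z, Y]\rangle - Y\langle X, Z\rangle + \langle [Y, X], Z\rangle + X\langle Y, Z\rangle - \langle Y, \nabla_X Z\rangle \\
&= \text{(using (a))}\quad Z\langle X, Y\rangle - \langle X, [Z, Y]\rangle - Y\langle X, Z\rangle + \langle [Y, X], Z\rangle + X\langle Y, Z\rangle - \langle Y, \nabla_Z X\rangle - \langle Y, [X, Z]\rangle.
\end{aligned}$$
Finally, then, we have
$$\langle \nabla_Z X, Y\rangle = \tfrac12\bigl( Z\langle X, Y\rangle - \langle X, [Z, Y]\rangle - Y\langle X, Z\rangle + \langle [Y, X], Z\rangle + X\langle Y, Z\rangle - \langle Y, [X, Z]\rangle \bigr). \tag{20.2}$$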
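Since the right-hand side does not now involve $\nabla$, uniqueness of $\nabla$ is proved. We can also use this formula to define $\nabla_Z X$ if it is verified (left to the reader) that when $fZ$ or $fY$ is substituted for $Z$ or $Y$ (for $f \in F(M)$), the function $f$ pulls out to multiply everything on the right. Alternately, we work out $\nabla$ in terms of a coordinate system $(x_i)$. Suppose
$$g_{ij} = \Bigl\langle \frac{\partial}{\partial x_i}, \frac{\partial}{\partial x_j} \Bigr\rangle.$$
The rules (20.1) imply that for each $p$, $(g_{ij}(p))$ is a positive definite, symmetric matrix. The $g_{ij}$ are the components of the metric with respect to the coordinate system. The Lagrangian $L(v) = \|v\|$ can then be written, for $v = \sum_i \dot x_i\, \partial/\partial x_i$, as
$$L(v) = \Bigl( \sum_{i,j} g_{ij}\, \dot x_i \dot x_j \Bigr)^{1/2}. \tag{20.3}$$
Let $(g^{-1}_{ij})$ be the inverse matrix to $(g_{ij})$. Finally, then, applying (20.2) to the coordinate vector fields, whose Jacobi brackets vanish,
$$\nabla_{\partial/\partial x_i} \frac{\partial}{\partial x_j} = \frac12 \sum_{k,l} g^{-1}_{kl} \Bigl( \frac{\partial g_{jl}}{\partial x_i} + \frac{\partial g_{il}}{\partial x_j} - \frac{\partial g_{ij}}{\partial x_l} \Bigr) \frac{\partial}{\partial x_k}. \tag{20.4}$$
This formula can serve to define $\nabla$ in each coordinate patch. The proven uniqueness guarantees that the $\nabla$ defined in each patch agrees in the overlap when two patches intersect; hence it defines a $\nabla$ operator globally on $M$.

We can use the Riemannian metric to define a length function for curves. If $\sigma: [a, b] \to M$, the length of $\sigma$ is
$$\int_a^b \|\sigma'(t)\|\, dt = \int_a^b L(\sigma'(t))\, dt = L(\sigma). \tag{20.5}$$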
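To see the coordinate description of $\nabla$ and the length function at work, the following worked computation (our illustration, not taken from the text) uses the Euclidean plane minus the origin in polar coordinates $(x_1, x_2) = (r, \theta)$, so that $g_{11} = 1$, $g_{12} = 0$, $g_{22} = r^2$. It applies (20.2) to the coordinate vector fields, whose Jacobi brackets vanish, and then computes the length of a circle; the abbreviations $\partial_r$, $\partial_\theta$ and the radius $R$ are assumed names for this example.

```latex
% (20.2) applied to coordinate vector fields \partial_i = \partial/\partial x_i:
\[
  \langle \nabla_{\partial_i}\partial_j,\ \partial_l \rangle
    = \tfrac12\Bigl( \frac{\partial g_{jl}}{\partial x_i}
        + \frac{\partial g_{il}}{\partial x_j}
        - \frac{\partial g_{ij}}{\partial x_l} \Bigr).
\]
% Polar coordinates: the only nonconstant component is g_{22} = r^2, so
\[
  \nabla_{\partial_r}\partial_r = 0, \qquad
  \nabla_{\partial_r}\partial_\theta = \nabla_{\partial_\theta}\partial_r
     = \frac{1}{r}\,\partial_\theta, \qquad
  \nabla_{\partial_\theta}\partial_\theta = -r\,\partial_r .
\]
% Length of the circle \sigma(t) = (R, 2\pi t), 0 \le t \le 1:
\[
  \|\sigma'(t)\| = \bigl( g_{22}\,(2\pi)^2 \bigr)^{1/2} = 2\pi R,
  \qquad
  L(\sigma) = \int_0^1 2\pi R \, dt = 2\pi R .
\]
```

Note that this circle is parametrized proportionally to arc length, since $\|\sigma'(t)\|$ is constant.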
Of course the general theory of homogeneous, regular, ordinary variational problems developed in Chapter 14 applies in this special case, but it is instructive to go over the same ground, using the Riemannian affine connection, which is not available when discussing more general variational problems. Certain normalizations of the parametrizations of curves are convenient. First we say that the curve $\sigma$ is parametrized by arc length if $\langle \sigma'(t), \sigma'(t)\rangle = 1$ for $a \le t \le b$. In this case, $b - a$ is the length of the curve. We say that $\sigma$ is parametrized proportionally to arc length if
$$a = 0, \qquad b = 1, \qquad \langle \sigma'(t), \sigma'(t)\rangle = \langle \sigma'(0), \sigma'(0)\rangle \quad \text{for } 0 \le t \le 1.$$
Then
$$\text{length } \sigma = \langle \sigma'(0), \sigma'(0)\rangle^{1/2}.$$
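Consider a homotopy $\delta(s, t)$, $0 \le s, t \le 1$, with $\delta(0, t) = \sigma(t)$; that is, the homotopy defines a deformation of $\sigma$. For $0 \le s \le 1$, let $\sigma_s$ be the curve $\sigma_s(t) = \delta(s, t)$. Let $L(\sigma_s)$ = length of $\sigma_s$. We are interested in computing $(d/ds)L(\sigma_s)$. For this purpose, we shall use freely the ideas of covariant differentiation of vector fields along curves and homotopies developed in Chapter 19, always referring to the Riemannian connection given by Theorem 20.1. We can suppose without any loss in generality that each of the curves $t \to \delta(s, t)$ is parametrized proportionally to arc length. Now
$$L(\sigma_s) = \int_0^1 \langle \partial_t \delta(s, t), \partial_t \delta(s, t)\rangle^{1/2}\, dt.$$
Parametrization proportional to arc length implies
$$\langle \partial_t \delta(s, t), \partial_t \delta(s, t)\rangle = L(\sigma_s)^2.$$
Hence
$$\begin{aligned}
\frac12 \frac{d}{ds}\bigl(L(\sigma_s)^2\bigr) &= \frac12 \frac{d}{ds} \int_0^1 \langle \partial_t \delta, \partial_t \delta\rangle\, dt = \int_0^1 \langle \nabla_s \partial_t \delta, \partial_t \delta\rangle\, dt \\
&= \text{(using (19.14a); that is, the fact that the torsion tensor is zero)}\quad \int_0^1 \langle \nabla_t \partial_s \delta, \partial_t \delta\rangle\, dt \\
&= \int_0^1 \Bigl( \frac{\partial}{\partial t}\langle \partial_s \delta, \partial_t \delta\rangle - \langle \partial_s \delta, \nabla_t \partial_t \delta\rangle \Bigr) dt \\
&= \langle \partial_s \delta(s, 1), \partial_t \delta(s, 1)\rangle - \langle \partial_s \delta(s, 0), \partial_t \delta(s, 0)\rangle - \int_0^1 \langle \partial_s \delta, \nabla_t \partial_t \delta\rangle\, dt.
\end{aligned}$$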
Consider now the vector field $t \to v(t) = \partial_s \delta(0, t)$ along the initial curve $\sigma(t) = \delta(0, t)$. It may be thought of as the infinitesimal deformation corresponding to the deformation $s \to \sigma_s$ of $\sigma$. In terms of this vector field, the first variation formula is
$$\frac12 \frac{d}{ds}\bigl(L(\sigma_s)^2\bigr)\Big|_{s=0} = \langle v(1), \sigma'(1)\rangle - \langle v(0), \sigma'(0)\rangle - \int_0^1 \langle v(t), \nabla\sigma'(t)\rangle\, dt. \tag{20.6}$$
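The first variation formula can be checked by hand on a simple family of straight segments in the Euclidean plane. The computation below is a sketch under our own assumptions: the family $\delta(s, t)$ and the constants $a$, $b$ are chosen for illustration and do not come from the text.

```latex
% Deformation of the segment \sigma(t) = (t, 0) in the Euclidean plane;
% a and b are arbitrary real constants (assumed for this example):
\[
  \delta(s, t) = \bigl( t(1 + as),\ bst \bigr), \qquad 0 \le s, t \le 1 .
\]
% Each curve t -> \delta(s, t) is a straight segment with constant velocity
% (1 + as, bs), hence parametrized proportionally to arc length, and
\[
  L(\sigma_s)^2 = (1 + as)^2 + b^2 s^2, \qquad
  \frac12 \frac{d}{ds}\bigl( L(\sigma_s)^2 \bigr)\Big|_{s=0} = a .
\]
% Right-hand side of (20.6): v(t) = \partial_s\delta(0, t) = (at, bt),
% \sigma'(t) = (1, 0), and \nabla\sigma'(t) = 0 for a straight line, so
\[
  \langle v(1), \sigma'(1)\rangle - \langle v(0), \sigma'(0)\rangle
    - \int_0^1 \langle v(t), \nabla\sigma'(t)\rangle\, dt
  = a - 0 - 0 = a .
\]
```

Only the component of $v(1)$ along $\sigma'(1)$ contributes: moving the end point sideways (the constant $b$) does not change the length to first order.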
Now, if $\sigma$ is to be considered as an extremal of the variational problem, it is clear from (20.6) that it should satisfy
$$\nabla\sigma'(t) = 0; \tag{20.7}$$
that is, the extremals, or geodesics, are the self-parallel curves or straight lines of the associated Riemannian affine connection. Of course it could be verified that (20.7) is just the Euler equation for the variational problem, but there is no real need to do this here explicitly. Having obtained the geodesics, we would expect next, following the general theory, to try to prove that, at least locally, they are minimizing by using extremal fields. However, again the general theory can be circumvented by use of the Riemannian connection (although the reader will notice that the "extremal field" idea appears in disguised form).
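For readers who want the geodesic condition $\nabla\sigma'(t) = 0$ in coordinates, here is a short sketch. The symbols $\Gamma^k_{ij}$ (the coefficients of $\nabla$ with respect to the coordinate vector fields) are an assumed notation for this example, and the polar-coordinate case is our own illustration.

```latex
% Coefficients of the connection in a coordinate system (x_i) (assumed notation):
\[
  \nabla_{\partial/\partial x_i} \frac{\partial}{\partial x_j}
    = \sum_k \Gamma^k_{ij}\, \frac{\partial}{\partial x_k} .
\]
% For a curve \sigma(t) = (x_1(t), \dots, x_n(t)), the equation \nabla\sigma'(t) = 0 reads
\[
  \frac{d^2 x_k}{dt^2}
    + \sum_{i,j} \Gamma^k_{ij}\, \frac{dx_i}{dt}\, \frac{dx_j}{dt} = 0,
  \qquad k = 1, \dots, n .
\]
% Euclidean plane in polar coordinates (r, \theta):
% \Gamma^r_{\theta\theta} = -r, \ \Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r, others zero:
\[
  \ddot r - r\,\dot\theta^2 = 0, \qquad
  \ddot\theta + \frac{2}{r}\,\dot r\,\dot\theta = 0 .
\]
% Check on the straight line x = 1, y = t, i.e. r = (1 + t^2)^{1/2}, \theta = \arctan t:
% \dot r = t/r, \ \ddot r = 1/r^3, \ \dot\theta = 1/r^2, \ \ddot\theta = -2t/r^4,
% so both equations hold.
```

This is a system of second-order ordinary differential equations, so a geodesic is determined by its initial point and initial tangent vector.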
THEOREM 20.2
Let $M$ be a Riemannian manifold and let $p_0$ be a point of $M$. For each $r > 0$, let $B(r)$ be the set of tangent vectors $v \in M_{p_0}$ whose length is less than $r$, that is, satisfying $\langle v, v\rangle < r^2$. (Thus, $B(r)$ is the open ball of radius $r$ with center 0 in the vector space $M_{p_0}$, considered as a Euclidean space with the inner product $\langle\ ,\ \rangle$.) Then:
(a) If $r$ is sufficiently small, there is a mapping denoted by exp: $B(r) \to M$ such that, for $v \in B(r)$, $t \to \exp(tv)$, $0 \le t \le 1$, is the geodesic of $M$ starting at $p_0$ and tangent there to $v$.
(b) The Jacobian of the map exp at the point $0 \in B(r)$ is nonzero.
(c) If $r$ is sufficiently small, exp is a diffeomorphism of $B(r)$ with an open neighborhood $B(r, p_0)$ of $p_0$, called an open geodesic ball about $p_0$ of radius $r$.† As $p_0$ varies over any compact subset of $M$, the radius $r$ of a geodesic ball can be chosen uniformly.
Proof. We have seen that, given $v \in M_{p_0}$, there is an $\varepsilon > 0$ and a geodesic $\sigma(t; v)$ defined for $0 \le t \le \varepsilon$, which is equal to $p_0$ for $t = 0$ and tangent at $t = 0$ to $v$. Since, in a coordinate system about $p_0$, $\sigma$ is determined by a system of second-order ordinary differential equations, with $p_0$ and $v$ determining the initial conditions, we see from the existence theorem on solutions of ordinary differential equations that $\sigma$ also varies in a $C^\infty$ way when $p_0$ and $v$ vary over the tangent bundle to $M$. Further, $\varepsilon$ can be chosen uniformly when $p_0$ and $v$ vary over a compact subset of $T(M)$. We also have derived the homogeneity condition: For $\lambda > 0$, $t \to \sigma(\lambda t; v)$ is a geodesic beginning at $p_0$, defined for $0 \le t \le \varepsilon/\lambda$, tangent there to $\lambda v$. By uniqueness, we have
$$\sigma(\lambda t; v) = \sigma(t; \lambda v).$$
Thus we can normalize $\varepsilon = 1$ at the expense of making $v$ small; hence we can arrange that $\sigma(t; v)$ is defined for $0 \le t \le 1$, $\langle v, v\rangle < r^2$, when $r$ is sufficiently small. This $r$ will do for part (a), since we can then define
$$\exp(v) = \sigma(1; v) \quad \text{when } v \in B(r).$$
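The homogeneity condition and the definition of exp can be made concrete on the unit sphere. The sketch below is our illustration and relies on the standard fact, not proved in this section, that the geodesics of the unit sphere $S^2 \subset R^3$ (with the metric induced from $R^3$) are the great circles traversed at constant speed.

```latex
% Unit sphere S^2 in R^3; p_0 is a unit vector and v \ne 0 is a tangent vector
% at p_0, i.e. v is perpendicular to p_0. Assumed (standard) form of the geodesic:
\[
  \sigma(t; v) = \cos(t\|v\|)\, p_0 + \sin(t\|v\|)\, \frac{v}{\|v\|} .
\]
% Homogeneity: for \lambda > 0, \|\lambda v\| = \lambda\|v\| and \lambda v / \|\lambda v\| = v/\|v\|, so
\[
  \sigma(t; \lambda v)
    = \cos(\lambda t\|v\|)\, p_0 + \sin(\lambda t\|v\|)\, \frac{v}{\|v\|}
    = \sigma(\lambda t; v) .
\]
% Exponential map at p_0:
\[
  \exp(v) = \sigma(1; v) = \cos\|v\|\, p_0 + \sin\|v\|\, \frac{v}{\|v\|},
  \qquad \exp(0) = p_0 .
\]
```

In this example exp is one-to-one on $B(r)$ exactly when $r \le \pi$: every $v$ with $\|v\| = \pi$ is sent to the antipodal point $-p_0$.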
To prove part (b), since $M_{p_0}$ and $M$ are the same dimension, it suffices to prove that $\exp_*$ maps the tangent space to $M_{p_0}$ at 0 onto $M_{p_0}$. But, if $v \in M_{p_0}$, $t \to \exp(rt \cdot (v/r))$ is a curve in $M$ beginning at $p_0$ and tangent there to $v$. Hence, $\exp_*$ maps the tangent vector to the curve $t \to rt \cdot (v/r)$ onto $v$. Part (c) follows now from the implicit function theorem.

† If $\phi: N \to N'$ is a ($C^\infty$) map of manifolds, and if $U$ is an open subset of $N$, we say that $\phi$ establishes a diffeomorphism of $U$ with its image $\phi(U)$ if: (a) $\phi(U)$ is open in $N'$, and (b) $\phi$ restricted to $U$ has a ($C^\infty$) inverse map $\phi(U) \to U$.

Notice that an open geodesic ball of radius $r$ can also be characterized as an open subset $U$ of $M$ containing $p_0$ such that:
(a) Each $p \in U$ can be joined to $p_0$ by a unique geodesic of length $< r$, and each geodesic of length less than $r$ beginning at $p_0$ lies in $U$.
(b) The map $\exp^{-1}: U \to B(r)$ (defined because, by (a), exp restricted to $B(r)$ is 1-1 and onto $U$) is $C^\infty$.
Condition (b) is just a technical point; that is, (a) describes the intuitive geometric meaning of a geodesic ball, but (b) seems to be necessary, since there are of course 1-1 onto $C^\infty$ maps of manifolds that have no $C^\infty$ inverses. For example, the map $x \to x^3$ of $R \to R$.

Another simple consequence of the first variation formula is: Suppose $\delta(s, t)$, $0 \le s, t \le 1$, is a deformation of curves such that
(a) the length of each curve $t \to \delta(s, t)$ is the same;
(b) $t \to \sigma(t) = \delta(0, t)$ is a geodesic;
(c) $t \to v(t) = \partial_s \delta(0, t)$ is the infinitesimal deformation.
Then, $\langle v(1), \sigma'(1)\rangle = \langle v(0), \sigma'(0)\rangle$. This result is known as Gauss' lemma.

THEOREM 20.3
Let $M$ be a manifold with a Riemannian metric and let $p_0$ be a point of $M$. Suppose that $B(r, p_0)$ is an open geodesic ball about $p_0$ of radius $r$. Suppose that $p \in B(r, p_0)$ and that the length of the unique geodesic of length less than $r$ joining $p$ to $p_0$ is $W(p, p_0)$. Then the length of any path† joining $p_0$ to $p$ is greater than $W(p, p_0)$ except if the path actually is the geodesic of length less than $r$ joining $p$ to $p_0$.

Proof. The function $p \to W(p, p_0)$ is a $C^\infty$ function on $B(r, p_0) - (p_0)$, since it is just the carry over via $(\exp)^{-1}$ of the function $v \to \|v\|$ on $B(r) - (0)$, which is known to be $C^\infty$. Now, in general, given a function $f$ defined in an open set $U$ of a Riemannian manifold, we can define the gradient vector field in $U$, denoted by grad $f$, by either of the following equivalent conditions:
$$\langle \operatorname{grad} f, Y\rangle = df(Y) = Y(f) \quad \text{for all } Y \in V(U). \tag{20.8a}$$
$$\langle \operatorname{grad} f(p), v\rangle = v(f) \quad \text{for all } p \in U, \text{ all } v \in U_p. \tag{20.8b}$$

† Recall that a path is a continuous image of an interval of the real numbers, which is made up by putting a finite number of $C^\infty$ curves end to end.
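In a coordinate system the gradient defined by (20.8) can be written out explicitly. The following sketch uses the components $g_{ij}$ of the metric and the inverse matrix $(g^{-1}_{ij})$; the polar-coordinate case is our own illustration.

```latex
% Coordinate expression of the gradient:
\[
  \operatorname{grad} f
    = \sum_{i,j} g^{-1}_{ij}\, \frac{\partial f}{\partial x_j}\, \frac{\partial}{\partial x_i} .
\]
% Check of (20.8a) on a coordinate vector field Y = \partial/\partial x_k:
\[
  \Bigl\langle \operatorname{grad} f,\ \frac{\partial}{\partial x_k} \Bigr\rangle
    = \sum_{i,j} g^{-1}_{ij}\, \frac{\partial f}{\partial x_j}\, g_{ik}
    = \frac{\partial f}{\partial x_k} .
\]
% Euclidean plane in polar coordinates (g_{11} = 1, g_{12} = 0, g_{22} = r^2):
\[
  \operatorname{grad} f
    = \frac{\partial f}{\partial r}\, \partial_r
      + \frac{1}{r^2}\, \frac{\partial f}{\partial \theta}\, \partial_\theta,
  \qquad
  \langle \operatorname{grad} f, \operatorname{grad} f\rangle
    = \Bigl( \frac{\partial f}{\partial r} \Bigr)^2
      + \frac{1}{r^2} \Bigl( \frac{\partial f}{\partial \theta} \Bigr)^2 .
\]
```

In particular, for $f = r$, the distance from the origin, grad $f = \partial_r$ has length 1.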
LEMMA 20.4
The function $W$ defined in $B(r, p_0) - (p_0)$ satisfies the following condition:
$$\langle \operatorname{grad} W, \operatorname{grad} W\rangle = 1. \tag{20.9}$$
Further, any geodesic $\sigma(t)$, $0 \le t \le 1$, of length less than $r$, parametrized proportionally to arc length and beginning at $p_0$, satisfies the following condition:
$$\sigma'(t) = \frac{W(\sigma(t))}{t} \operatorname{grad} W(\sigma(t)). \tag{20.10}$$
In particular, $\sigma$ is, after a change in parametrization, an integral curve of grad $W$ for $0 < t \le 1$.

Proof. Let $\gamma(s)$, $0 \le s \le 1$, be a curve in $B(r, p_0) - (p_0)$, with $\gamma(0) = p$, $\gamma'(0) = v_1$. By definition of $B(r, p_0)$, there is a $C^\infty$ homotopy $\delta(s, t)$, $0 \le s, t \le 1$, such that:
(a) $\delta(s, 1) = \gamma(s)$.
(b) For each $s$, $t \to \delta(s, t)$ is the unique geodesic of length less than $r$ joining $p_0$ to $\gamma(s)$, parametrized proportionally to arc length. Thus the length of this curve is equal to $W(\gamma(s), p_0)$.
Let $v(t) = \partial_s \delta(0, t)$ be the corresponding infinitesimal deformation. Since $\delta(s, 0) = p_0$, we have $v(0) = 0$. Also, $v(1) = v_1$. Thus the first variation formula (20.6) can be rephrased as follows:
$$\frac12 \frac{d}{ds} W(\gamma(s), p_0)^2 \Big|_{s=0} = \langle v_1, \sigma'(1)\rangle.$$
But the left-hand side of this relation is
$$W(p, p_0)\, v_1(W) = W(p, p_0) \langle \operatorname{grad} W(p), v_1\rangle.$$
Since this must be true for all $v_1 \in M_p$, we have
$$W(p, p_0) \operatorname{grad} W(p) = \sigma'(1). \tag{20.11}$$
But, since the curve $t \to \sigma(t) = \delta(0, t)$ is parametrized proportionally to arc length, we have
$$\langle \sigma'(1), \sigma'(1)\rangle = (\text{length } \sigma)^2 = W(p, p_0)^2.$$
Also
$$\langle \sigma'(1), \sigma'(1)\rangle = W(p, p_0)^2 \langle \operatorname{grad} W(p), \operatorname{grad} W(p)\rangle.$$
Canceling, we have (20.9).
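To prove (20.10), introduce for each $\lambda \in (0, 1]$ the geodesic
$$\sigma_\lambda(t) = \sigma(\lambda t), \quad 0 \le t \le 1.$$
Then $\sigma_\lambda(1) = \sigma(\lambda)$. Substituting $\sigma_\lambda$ for $\sigma$ in (20.11) gives
$$\sigma_\lambda'(1) = W(\sigma(\lambda), p_0) \operatorname{grad} W(\sigma(\lambda)).$$
But, $\sigma_\lambda'(1) = \lambda\sigma'(\lambda)$. Combining these two equations and changing $\lambda$ back to $t$ gives (20.10). Q.E.D.

LEMMA 20.5
Let $U$ be an open subset of the Riemannian manifold $M$, and let $F$ be a closed subset of $U$. Let $W$ be a continuous nonnegative-valued function in $U$ such that:
(a) $F = \{p \in U: W(p) = 0\}$.
(b) $W$ is $C^\infty$ in $U - F$, and there $\langle \operatorname{grad} W, \operatorname{grad} W\rangle = 1$.
Then any path in $U$ joining a point $p_0 \in F$ to a point $p \in U - F$ must have length $\ge W(p)$, and equality can hold only if the path actually is a geodesic.

Proof. Let $\sigma(t)$ be a path joining $p_0$ to $p$, $0 \le t \le 1$. We can suppose without loss of generality that $\sigma(t) \in U - F$ for $0 < t \le 1$. Let $\varepsilon$ be a number $> 0$. Then
$$\begin{aligned}
W(\sigma(1)) - W(\sigma(\varepsilon)) &= \int_\varepsilon^1 \frac{d}{dt} W(\sigma(t))\, dt = \int_\varepsilon^1 \langle \operatorname{grad} W(\sigma(t)), \sigma'(t)\rangle\, dt \\
&\le \int_\varepsilon^1 |\langle \operatorname{grad} W(\sigma(t)), \sigma'(t)\rangle|\, dt \\
&\le \text{(using the Schwarz inequality† and condition (b))}\quad \int_\varepsilon^1 \langle \sigma'(t), \sigma'(t)\rangle^{1/2}\, dt.
\end{aligned}$$
As $\varepsilon \to 0$, the left-hand side approaches, by continuity of $W$, $W(\sigma(1))$, while the right-hand side approaches the length of $\sigma$.

† The Schwarz inequality asserts that in a real vector space $V$ with a positive definite symmetric bilinear form $\langle\ ,\ \rangle$, for all $u, v \in V$: $|\langle u, v\rangle| \le \langle u, u\rangle^{1/2} \langle v, v\rangle^{1/2}$. Equality can hold only if $u$ and $v$ are proportional.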
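Suppose that
$$W(\sigma(1)) = \int_0^1 \langle \sigma'(t), \sigma'(t)\rangle^{1/2}\, dt.$$
Choose $\varepsilon$ sufficiently small that $W(\sigma(\varepsilon)) < W(\sigma(1))$. Then
$$\int_0^1 \langle \sigma'(t), \sigma'(t)\rangle^{1/2}\, dt = W(\sigma(1)) \le \int_\varepsilon^1 |\langle \operatorname{grad} W(\sigma(t)), \sigma'(t)\rangle|\, dt + W(\sigma(\varepsilon)).$$
Now,
$$\int_0^\varepsilon \langle \sigma'(t), \sigma'(t)\rangle^{1/2}\, dt \ge W(\sigma(\varepsilon));$$
hence
$$\int_0^1 \langle \sigma'(t), \sigma'(t)\rangle^{1/2}\, dt \le \int_\varepsilon^1 |\langle \operatorname{grad} W(\sigma(t)), \sigma'(t)\rangle|\, dt + \int_0^\varepsilon \langle \sigma'(t), \sigma'(t)\rangle^{1/2}\, dt.$$
Hence,
$$\int_\varepsilon^1 \langle \sigma'(t), \sigma'(t)\rangle^{1/2}\, dt \le \int_\varepsilon^1 |\langle \operatorname{grad} W(\sigma(t)), \sigma'(t)\rangle|\, dt$$
or
$$0 \le \int_\varepsilon^1 \Bigl( |\langle \operatorname{grad} W(\sigma(t)), \sigma'(t)\rangle| - \langle \sigma'(t), \sigma'(t)\rangle^{1/2} \langle \operatorname{grad} W(\sigma(t)), \operatorname{grad} W(\sigma(t))\rangle^{1/2} \Bigr) dt.$$
By the Schwarz inequality, this is possible only if the integrand is zero, forcing $\sigma'(t)$ and grad $W(\sigma(t))$ to be proportional; that is, $\sigma$ is, perhaps after a change in parametrization, an integral curve of grad $W$. The proof of Lemma 20.5 will be completed after proving the following lemma.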
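Lemmas 20.4 and 20.5 can be tested in the most familiar situation. The sketch below is our own illustration in Euclidean space $R^n$ with $p_0 = 0$, so that $W(p) = \|p\|$, $F = \{0\}$ and $U = R^n$; the particular semicircular path is an assumed example.

```latex
% Euclidean space R^n, p_0 = 0, W(p) = \|p\| (distance from the origin):
\[
  \operatorname{grad} W(p) = \frac{p}{\|p\|}, \qquad
  \langle \operatorname{grad} W, \operatorname{grad} W\rangle = 1
  \quad \text{for } p \ne 0 .
\]
% Condition (20.10) for the geodesic \sigma(t) = tu, 0 \le t \le 1:
\[
  \frac{W(\sigma(t))}{t}\, \operatorname{grad} W(\sigma(t))
    = \frac{t\|u\|}{t} \cdot \frac{tu}{t\|u\|} = u = \sigma'(t) .
\]
% The estimate of Lemma 20.5 for an arbitrary path \sigma from 0 to p:
\[
  \frac{d}{dt} W(\sigma(t))
    = \frac{\langle \sigma(t), \sigma'(t)\rangle}{\|\sigma(t)\|}
    \le \|\sigma'(t)\|,
  \qquad\text{so}\qquad
  \|p\| = W(\sigma(1)) \le \int_0^1 \|\sigma'(t)\|\, dt .
\]
% Example in the plane: the semicircle of radius 1 about (1, 0) joining 0 to p = (2, 0)
% has length \pi, which exceeds W(p) = 2.
```

Equality in the Schwarz step requires $\sigma'(t)$ to be a multiple of $\sigma(t)$, which singles out the straight segment from 0 to $p$.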
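LEMMA 20.6
Let $W$ be a $C^\infty$ function in an open set $U$ of a Riemannian manifold such that $\langle \operatorname{grad} W, \operatorname{grad} W\rangle = 1$. Then any integral curve of grad $W$ is a geodesic.

Proof. For $X \in V(U)$, we have
$$\begin{aligned}
0 = X\langle \operatorname{grad} W, \operatorname{grad} W\rangle &= 2\langle \nabla_X \operatorname{grad} W, \operatorname{grad} W\rangle \\
&= 2\langle \nabla_{\operatorname{grad} W} X, \operatorname{grad} W\rangle + 2\langle [X, \operatorname{grad} W], \operatorname{grad} W\rangle \\
&= 2 \operatorname{grad} W(\langle X, \operatorname{grad} W\rangle) - 2\langle X, \nabla_{\operatorname{grad} W} \operatorname{grad} W\rangle + 2\langle [X, \operatorname{grad} W], \operatorname{grad} W\rangle.
\end{aligned}$$
Now suppose that $0 = \langle \operatorname{grad} W, X\rangle = X(W)$. Then
$$\langle [X, \operatorname{grad} W], \operatorname{grad} W\rangle = [X, \operatorname{grad} W](W) = X(\operatorname{grad} W(W)) - \operatorname{grad} W(X(W)) = X(\langle \operatorname{grad} W, \operatorname{grad} W\rangle) - 0 = 0.$$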
Thus we have
$$\langle X, \nabla_{\operatorname{grad} W} \operatorname{grad} W\rangle = 0.$$
Since $X$ can be an arbitrary vector field perpendicular to grad $W$, for each $p \in U$, $\nabla_{\operatorname{grad} W} \operatorname{grad} W(p)$ must be a scalar multiple of grad $W(p)$. On the other hand,
$$0 = \operatorname{grad} W(\langle \operatorname{grad} W, \operatorname{grad} W\rangle) = 2\langle \nabla_{\operatorname{grad} W} \operatorname{grad} W, \operatorname{grad} W\rangle.$$
Hence this scalar multiple must be zero. Thus we have finally
$$\nabla_{\operatorname{grad} W} \operatorname{grad} W = 0.$$
Suppose that $\sigma(t)$, $0 \le t \le 1$, is an integral curve of grad $W$; that is,
$$\sigma'(t) = \operatorname{grad} W(\sigma(t)).$$
By the very definition of the covariant derivative of a vector field along a curve, we have
$$\nabla\sigma'(t) = \nabla_{\operatorname{grad} W} \operatorname{grad} W(\sigma(t)) = 0;$$
hence $\sigma$ is a geodesic.

Return to the proof of Theorem 20.3. The lemmas prove that for $p \in B(r, p_0)$, $W(p, p_0)$ is less than the length of any curve in $B(r, p_0)$ joining $p$ to $p_0$ other than the geodesic of length $W(p, p_0)$ joining $p$ to $p_0$. The only other possibility is a path $\sigma(t)$, $0 \le t \le 1$, joining $p$ to $p_0$ that does not completely lie in $B(r, p_0)$. Since $M - B(r, p_0)$ is closed in $M$, there must be a first instant $t_0 \in (0, 1)$ at which $\sigma(t_0) \notin B(r, p_0)$. We must then have: length $\sigma \ge r > W(p, p_0)$. For otherwise the length of the curve is, say, $r - \varepsilon$. Suppose, for example, that $\sigma(t) = \exp(v(t))$ for $0 \le t < t_0$, where $t \to v(t)$ is a curve in $B(r)$. But as we have seen, $v(t)$ must then lie in $B(r - \varepsilon)$; hence there is a sequence $(t_\alpha)$ of real numbers, $1 \le \alpha < \infty$, converging to $t_0$ with $v(t_\alpha) \to v_0 \in B(r)$ as $\alpha \to \infty$. Then, by continuity of the exp mapping and $\sigma$, $\exp(v(t_\alpha))$ converges to both $\exp(v_0)$ and $\sigma(t_0)$ as $\alpha \to \infty$; hence, $\sigma(t_0) \in B(r, p_0)$, a contradiction. This finishes the proof of Theorem 20.3.

COROLLARY 1
Let $M$ be a Riemannian manifold and define a real-valued function $d: M \times M \to R$ as follows: For $p, q \in M$, $d(p, q)$ is the greatest lower bound of the length of all paths joining $p$ to $q$. Then $d$ satisfies all the axioms necessary to define a "distance function" on $M$, that is, to define a metric on $M$ in the sense of point-set topology. The topology on $M$ defined by this metric agrees with that already given by the manifold structure.

If $B(r, p_0)$ is an open geodesic ball defined about $p_0$, and if $p \to W(p, p_0)$ is the function defined above, then
$$d(p, p_0) = W(p, p_0) \quad \text{if } p \in B(r, p_0). \tag{20.12}$$
Further, if $\sigma(t)$, $0 \le t \le 1$, is a path in $M$ such that $d(\sigma(0), \sigma(1))$ = length $\sigma$, then $\sigma$ (after possible reparametrization) is an unbroken geodesic curve joining $\sigma(0)$ to $\sigma(1)$.

Proof. To show that $d$ is a bona fide distance function in the sense of point-set topology, we must show that
(a) $d(p, q) = d(q, p)$ (symmetry).
(b) $d(p, q) \le d(p, p_0) + d(p_0, q)$ (triangle inequality).
(c) $d(p, q) = 0$ implies $p = q$.
(a) and (b) follow immediately from the definition of $d$. To prove (c), note first that (20.12) follows from the theorem. Hence, if $r$ is the radius of a geodesic ball about $p$, and if $d(p, q) < r$, then $q$ must lie in this geodesic ball. Then, further, if $d(p, q) = 0$, also $W(p, q) = 0$, and we see that $p = q$. That the topologies agree is merely a fancy way of saying that a sequence $(p_\alpha)$, $1 \le \alpha < \infty$, of points of $M$ converges to $p$ as $\alpha \to \infty$ if and only if $d(p, p_\alpha) \to 0$ as $\alpha \to \infty$. This follows similarly by circling $p$ with a geodesic ball.

To prove the last remark, notice that the additivity of the length, when paths are placed end to end, forces $d(\sigma(a), \sigma(b))$ = length of $\sigma$ for $a \le t \le b$, whenever $a, b \in [0, 1]$, if it is true for $a = 0$, $b = 1$. Let $r$ be a uniform radius of geodesic balls about all points of $\sigma$. We can suppose without loss of generality that $d(\sigma(0), \sigma(1)) = 1$ = length $\sigma$, and that $\sigma$ is parametrized by arc length; that is, the length of $\sigma$, $0 \le t \le t_0$, is $t_0$. Thus
$$\sigma(t_0 \pm \varepsilon) \in B(\sigma(t_0), r) \quad \text{if } 0 \le \varepsilon < r.$$
By the theorem, $\sigma(t)$ must be an unbroken geodesic for $t_0 \le t \le t_0 + \varepsilon$ and $t_0 - \varepsilon \le t \le t_0$. There is an a priori possibility of the two geodesic segments meeting in a corner at $t = t_0$, but the fact that $t_0$ can be any element of $[0, 1]$ rules this out also. Progressing by "continuous induction"† on $t$, $0 \le t \le 1$, proves that $\sigma(t)$ is an unbroken geodesic for $0 \le t \le 1$.

† Ordinary induction proves statements about functions $k \to f(k)$, whose domain is the set of all nonnegative integers, by proving the statement for $k = 0$, then by showing that its truth for $k$ implies its truth for $k + 1$. Continuous induction, very useful in differential geometry, proves statements about maps $t \to f(t)$ defined over an interval, say, $0 \le t \le 1$, by proving it for $t = 0$, then by showing that its truth for $t_0 \in [0, 1]$ implies its truth for $t_0 + \delta$, where $\delta$ is some small number independent of $t_0$. Its basis is, of course, the fact that any interval of real numbers is connected, and the remark that verifying this condition verifies that the set of all $t$ for which the statement is true is open and closed. When, as above, we think it is routine to carry out this process, we shall leave it to the reader as an exercise.
COROLLARY 2
Suppose that $N$ is a submanifold of a Riemannian manifold $M$, that $\sigma: [0, 1] \to M$ is a geodesic in $M$ such that
(a) $\sigma(1) \in N$.
(b) $d(p, \sigma(0)) \ge$ length $\sigma$ for all $p \in N$; that is, $\sigma$ realizes the shortest distance from $\sigma(0)$ to any point of $N$.
Then
$$\langle \sigma'(1), v\rangle = 0 \quad \text{for all } v \in N_{\sigma(1)};$$
that is, $\sigma$ is perpendicular to $N$ at its point of contact.

Proof. The first variation formula tells us that if $s \to \sigma_s$ is any deformation of $\sigma$ with $v: t \to v(t)$ the infinitesimal deformation vector field along $\sigma$, then, since $\sigma$ is a geodesic,
$$\frac12 \frac{d}{ds}\bigl(L(\sigma_s)^2\bigr)\Big|_{s=0} = \langle v(1), \sigma'(1)\rangle - \langle v(0), \sigma'(0)\rangle.$$
If such a deformation has its end point fixed at $t = 0$, and if $s \to \sigma_s(1)$ lies in $N$, then the left-hand side must be zero, $v(0) = 0$, which forces
$$\langle v(1), \sigma'(1)\rangle = 0.$$
Of course $v(1)$ must then be in $N_{\sigma(1)}$, but we do not know a priori that any $v \in N_{\sigma(1)}$ can be exhibited as $v(1)$ of such a deformation. However, it is clear that this is so if $\sigma(0)$ is in a sufficiently small neighborhood of $\sigma(1)$. Now use the fact that $\sigma(t)$, $t_0 \le t \le 1$, must be a minimizing curve from $\sigma(t_0)$ to $\sigma(1)$ for each $t_0 \in [0, 1)$, to finish the job.
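The simplest instance of Corollary 2 is the distance from a point to a straight line in the Euclidean plane. The following check is our own illustration; the point $(a, h)$ and the constants $b$, $c$ are assumed names.

```latex
% M = R^2 with the Euclidean metric, N = the x-axis, \sigma(0) = (a, h) with h \ne 0.
% The nearest point of N is (a, 0), reached by the geodesic
\[
  \sigma(t) = \bigl( a,\ h(1 - t) \bigr), \qquad \sigma'(1) = (0, -h) .
\]
% The tangent space N_{\sigma(1)} is spanned by (1, 0), and
\[
  \langle \sigma'(1), (1, 0)\rangle = 0 .
\]
% Direct check with the deformation \sigma_s(t) = (a + cst,\ h(1 - t)), whose end
% points (a + cs, 0) lie in N and whose infinitesimal deformation is v(t) = (ct, 0):
\[
  L(\sigma_s)^2 = c^2 s^2 + h^2, \qquad
  \frac12 \frac{d}{ds}\bigl( L(\sigma_s)^2 \bigr)\Big|_{s=0} = 0
    = \langle v(1), \sigma'(1)\rangle .
\]
% By contrast, the segment from (a, h) to another point (b, 0) of N, b \ne a, has
% \sigma'(1) = (b - a, -h), and \langle \sigma'(1), (1, 0)\rangle = b - a \ne 0.
```

So among the segments from $(a, h)$ to the line, only the perpendicular one can be minimizing, in agreement with the corollary.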
21
The Hopf-Rinow Theorem. Applications to the Theory of Covering Spaces
The results in Chapter 20 are basically local in nature, but we shall see in this chapter that they can be put together to prove interesting and nontrivial results concerning the global properties of the geodesics of a Riemannian manifold. While the main difference between "modern" and "classical" differential geometry is this concern with global results, this is also a good illustration of the maxim that there is often in practice not a great deal of difference between the two sorts of problems.
THEOREM 21.1 (HOPF-RINOW)
Let $M$ be a Riemannian manifold and let $p_0$ be a point of $M$ such that all geodesics beginning at $p_0$ can be indefinitely extended. Then:
(a) Any point of $M$ can be joined to $p_0$ by a geodesic whose length is the distance between it and $p_0$.
(b) As a metric space in the sense of topology, $M$ is complete in the sense that any Cauchy sequence of points has a limit. Further, every bounded closed subset of $M$ is compact.
Conversely, if (b) is true, then a geodesic beginning at any point of $M$ can be indefinitely extended. In particular, any two points of $M$ can be joined by at least one geodesic realizing the distance.

Proof. Let $B(r, p_0)$ and $S(r, p_0)$ be the set of all points $p \in M$ such that, respectively,
$$d(p, p_0) < r \quad \text{and} \quad d(p, p_0) = r.$$
Let exp: $M_{p_0} \to M$ be the map that assigns to each $v \in M_{p_0}$ the point $\exp(v) = \sigma(1, v)$, where $t \to \sigma(t, v)$ is the geodesic beginning at $p_0$ at $t = 0$, tangent there to $v$. (Since geodesics can be parametrized over $(-\infty, \infty)$, there is no necessity of assuming that $v$ is small in length.) We shall prove by continuous induction on $r$ that all points in $B(r, p_0)$ can be joined to $p_0$ by a geodesic realizing the distance. Now, if $r$ is sufficiently small, this is so by Theorem 20.3. Suppose that $r_0$ is the least upper bound of numbers having this property. If we suppose that $r_0 < \infty$, we shall derive a contradiction.
Clearly, $r_0$ itself has the property. First we show that all elements of $S(r_0, p_0)$ can be joined to $p_0$ by a geodesic of length $r_0$. For if $p \in S(r_0, p_0)$, there is at least one sequence $(p_\alpha)$, $1 \le \alpha < \infty$, in $B(r_0, p_0)$, converging to $p$ as $\alpha \to \infty$. Then there is a sequence $v_\alpha \in M_{p_0}$ with $\|v_\alpha\| \le r_0$ and $p_\alpha = \exp(v_\alpha)$. By at most taking subsequences, we can suppose that $v_\alpha$ itself converges to $v$, say, as $\alpha \to \infty$. We must then have: $\|v\| \le r_0$, $\exp(v_\alpha) \to \exp(v)$ as $\alpha \to \infty$ by the continuity of the exp mapping; hence $p = \exp v$. If $\|v\| < r_0$, we would have $p \in B(r_0, p_0)$, which is not so; hence $t \to \exp tv$, $0 \le t \le 1$, is a geodesic of length $r_0$ joining $p_0$ to $p$. Note further that $S(r_0, p_0)$ is the image under exp of a compact set; hence it is compact.

Second, the definition of $r_0$ as a least upper bound implies that there is a sequence $(r_\alpha)$ of real numbers, all $> r_0$, converging to $r_0$ as $\alpha \to \infty$, and a new sequence of points $p_\alpha \in S(r_\alpha, p_0)$, each of which cannot be joined to $p_0$ by a geodesic of length $r_\alpha$. Let $q_\alpha$ be a point of $S(r_0, p_0)$ such that
$$d(p_\alpha, q_\alpha) \le d(p_\alpha, q) \quad \text{for all } q \in S(r_0, p_0).$$
(Such a point exists, since $S(r_0, p_0)$ is compact.) Again, since $S(r_0, p_0)$ is compact, we can suppose, after possibly taking subsequences, that $q_\alpha \to q_0 \in S(r_0, p_0)$, as $\alpha \to \infty$. Then, using the triangle inequality, we see that $p_\alpha \to q_0$ as $\alpha \to \infty$. In particular, if $\alpha$ is sufficiently large, $p_\alpha$ lies in a geodesic ball about $q_\alpha$. The geodesics realizing the distance from $p_\alpha$ to $q_\alpha$ and from $p_0$ to $q_\alpha$ cannot meet with a corner at $q_\alpha$: For otherwise the corner could be "cut across" to define a path joining $p_\alpha$ to $p_0$ of length less than $r_0 + d(p_\alpha, q_\alpha)$, which would provide a curve joining $p_\alpha$ to $S(r_0, p_0)$ of length less than $d(p_\alpha, q_\alpha)$. (See Fig. 5.)

FIGURE 5
For similar reasons, $r_0 + d(p_\alpha, q_\alpha) = d(p_\alpha, p_0)$. But, the geodesics meeting without a corner at $q_\alpha$ define an unbroken geodesic of length $r_0 + d(p_\alpha, q_\alpha) = d(p_\alpha, p_0)$ joining $p_0$ to $p_\alpha$, which is the desired contradiction.

To prove that all bounded closed sets are compact, note that any such set is contained as a closed set in $B(r, p_0) \cup S(r, p_0)$ for $r$ sufficiently large, which we know is compact. Completeness as a metric space follows.

To prove the converse, we shall show, assuming that $M$ as a metric space is complete, that any geodesic $\sigma(t)$ defined for $0 \le t < 1$ can be extended to be defined over $0 \le t \le 1 + \varepsilon$, for some $\varepsilon > 0$. That any geodesic segment can be extended over $(-\infty, \infty)$ is then proved by continuous induction on $t$. Geodesics are parametrized proportionally to arc length. Thus, if $0 \le s, t < 1$, then $d(\sigma(s), \sigma(t)) \le |s - t| \cdot \|\sigma'(0)\|$; hence, by completeness, $\sigma(t)$ converges to a point $p_1$ of $M$ as $t \to 1$. But then the union of the set of points of $\sigma$ with $p_1$ is compact. Applying the fundamental existence theorem for geodesics, there exists an $\varepsilon > 0$ such that, uniformly in $t_0 \in [0, 1)$, there exists a geodesic starting at $\sigma(t_0)$, tangent there to $\sigma'(t_0)$ and defined over $0 \le t \le \varepsilon$. Except for a translation of parameter, this constructed geodesic segment must agree with $\sigma$; hence $\sigma$ can be extended beyond 1 by taking $t_0$ sufficiently close to 1. This finishes the proof of the Hopf-Rinow theorem, one of the basic theorems in global Riemannian geometry.

We shall now discuss some useful applications of this theorem to the theory of covering spaces.

Definition
Let $M$ and $N$ be connected topological spaces; $\phi: M \to N$ a continuous map of $M$ onto $N$. $\phi$ is called a covering map if each point $p \in N$ has an open neighborhood $U$ such that each component of $\phi^{-1}(U)$ is homeomorphic to $U$ under $\phi$.

This purely topological definition is most useful when combined with differentiability assumptions. We shall limit ourselves to proving here the following sufficient condition that a differentiable map be a covering map. It will have useful applications later on.

THEOREM 21.2
Let $M$ and $N$ be manifolds, $\phi: M \to N$ a ($C^\infty$) map with everywhere nonzero Jacobian. Then $\phi$ is a covering map of $M$ onto $N$ if and only if the following condition is satisfied: There is a pair $(p_0, q_0)$ of points, $p_0 \in M$, $q_0 \in N$, with $\phi(p_0) = q_0$, such that, for every curve $\sigma: [0, 1] \to N$ with $\sigma(0) = q_0$, there is a lifted curve $\gamma: [0, 1] \to M$, with
$$\gamma(0) = p_0 \quad \text{and} \quad \phi(\gamma(t)) = \sigma(t) \ \text{for } 0 \le t \le 1. \tag{21.1}$$
The proof will consist of a series of lemmas.
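Before turning to the lemmas, it may help to see condition (21.1) in the most familiar example. The sketch below is our own illustration: it takes the map of the real line onto the unit circle and then restricts it to a bounded interval to show how the lifting condition can fail. The angle function $\theta(t)$ is an assumed name.

```latex
% The map \phi : R -> S^1 (unit circle in the plane):
\[
  \phi(t) = (\cos t, \sin t), \qquad \phi'(t) = (-\sin t, \cos t) \ne 0 .
\]
% Covering property: for q = \phi(t_0), take the open arc
% U = \{ \phi(t) : |t - t_0| < \pi \}. Then
\[
  \phi^{-1}(U) = \bigcup_{k \in \mathbf{Z}} \bigl( t_0 + 2\pi k - \pi,\ t_0 + 2\pi k + \pi \bigr),
\]
% and \phi maps each of these intervals homeomorphically onto U.
% Lifting (21.1) with p_0 = 0, q_0 = (1, 0): a curve
% \sigma(t) = (\cos\theta(t), \sin\theta(t)), with \theta continuous and \theta(0) = 0,
% has the lift \gamma(t) = \theta(t).

% Failure of (21.1): restrict \phi to the open interval M = (0, 3\pi).
% It is still onto S^1 with nonzero Jacobian, but for any p_0 in M the curve
\[
  \sigma(t) = \phi(p_0 + 3\pi t), \qquad 0 \le t \le 1,
\]
% can only be lifted by \gamma(t) = p_0 + 3\pi t, which leaves M before t = 1.
```

In the restricted example no pair $(p_0, q_0)$ satisfies (21.1), and indeed the restriction is not a covering map: a small arc about $\phi(0)$ has the component $(0, \varepsilon)$ in its inverse image, which is mapped onto only part of the arc.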
LEMMA 21.3
Let $M$ and $N$ be Riemannian manifolds, $\phi: M \to N$ a map that is a local isometry; that is, it satisfies
$$\langle \phi_*(u), \phi_*(v)\rangle = \langle u, v\rangle \quad \text{for all } p \in M, \text{ all } u, v \in M_p.$$
Suppose in addition that $M$ is complete† as a Riemannian manifold. Then $\phi$ is a covering map.

Proof. It is easily seen that $\phi$ carries geodesics of $M$ into geodesics of $N$. Let $q \in N$, and let $B(r, q)$ be an open geodesic ball about $q$. We shall show that every component of $\phi^{-1}(B(r, q))$ is mapped diffeomorphically by $\phi$ onto $B(r, q)$. Of course it follows from the local isometry condition that $\phi$ has nonzero Jacobian; hence, by the implicit function theorem, it maps open sets into open sets.

Let $p \in \phi^{-1}(q)$. A point of $M$ whose distance from $p$ is less than $r$ can be joined to $p$ by a geodesic whose length is that distance, whose projection under $\phi$ lies then in $B(r, q)$. Thus, if $B(r, p)$ is the set of points of $M$ whose distance from $p$ is less than $r$, then $\phi(B(r, p)) = B(r, q)$. (It is onto $B(r, q)$, since if $\sigma(t)$, $0 \le t \le 1$, is a geodesic of $N$ starting at $q$ of length less than $r$, and if $\sigma_1(t)$, $0 \le t \le 1$, is the geodesic of $M$ such that $\sigma_1(0) = p$, $\sigma_1'(0) = \phi_*^{-1}(\sigma'(0))$, then $\phi(\sigma_1(t)) = \sigma(t)$, by uniqueness of geodesics in $N$.)

To prove: $\phi$ restricted to $B(r, p)$ is 1-1. Suppose that $p_1, p_2 \in B(r, p)$, with $\phi(p_1) = \phi(p_2)$. $p_1$ and $p_2$ are endpoints of geodesics of $M$ beginning at $p$ of length less than $r$. Their projections lie in $B(r, q)$ and meet at beginning and at end points; hence they coincide. Then $p_1$ must equal $p_2$; otherwise, $\phi_*$ could not be 1-1. Thus, $\phi$, as a 1-1, onto, nonzero Jacobian map $B(r, p) \to B(r, q)$, must be a diffeomorphism.

We now show that $B(r, p)$ is a connected component of $\phi^{-1}(B(r, q))$. Suppose that $\gamma(s)$, $0 \le s \le 1$, is a curve in $\phi^{-1}(B(r, q))$, with $\gamma(0) = p$. We must show that the distance of $\gamma(s)$ to $p$ is less than $r$ for $0 \le s \le 1$, that is, that $\gamma(s) \in B(r, p)$. Now $N$ is also a complete Riemannian manifold, for a small geodesic segment of $N$ can be lifted via $\phi$ to $M$, extended to $(-\infty, \infty)$, using completeness of $M$, and the projection down to $N$ must by uniqueness be the extension to $(-\infty, \infty)$ of the small geodesic segment with which we started.
Thus the exp map of $N$ can be extended over all of $N_q$.

† A Riemannian manifold is said to be complete if the conditions of the Hopf-Rinow theorem are satisfied, for example, the condition that all geodesics can be indefinitely extended. In this case, the map exp can be defined over the whole tangent space at each point.

By the definition of $B(r, q)$ as a geodesic ball, there is a curve $v(s) \in N_q$ with
$$\langle v(s), v(s)\rangle < r^2 \quad \text{for } 0 \le s \le 1; \qquad \exp(v(s)) = \phi(\gamma(s)); \qquad v(0) = 0.$$
Let $u(s)$ be the curve in $M_p$ with $\phi_*(u(s)) = v(s)$ for $0 \le s \le 1$. Clearly, then, $\phi \exp(u(s)) = \phi(\gamma(s))$. Since both curves $s \to \exp(u(s))$ and $\gamma(s)$ have the same image under $\phi$ and coincide for $s = 0$, they must coincide for all $s$ (otherwise $\phi_*$ could not be 1-1). In particular, the distance of $\gamma(s)$ to $p$ is majorized by $\|u(s)\|$, that is, by $r$; hence $\gamma(s) \in B(r, p)$, which is what we had to prove.

Finally, it should be clear that the collection of $B(r, p)$, where $p$ runs through $\phi^{-1}(q)$, exhausts the connected components of $\phi^{-1}(B(r, q))$. Since $q$ is arbitrary in $N$, this proves that $\phi$ is a covering map.

LEMMA 21.4
(NOMIZU-OZEKI)
Let $M$ be a manifold with a Riemannian metric. Then $M$ has a new Riemannian metric conformal to the old, which is in addition complete.

Proof. Two Riemannian metrics $(u, v) \to \langle u, v\rangle$ and $(u, v) \to \langle u, v\rangle'$ are said to be conformal if there is a positive-valued $C^\infty$ function $f$ on $M$ such that
$$\langle u, v\rangle' = f(p)^2 \langle u, v\rangle \quad \text{whenever } u, v \in M_p.$$
For $p, q \in M$, let $d(p, q)$ and $d'(p, q)$ be, respectively, the distance between $p$ and $q$ with respect to the old and new metrics. For $r > 0$, let $B(r, p)$ and $B'(r, p)$ be the set of $q \in M$ such that, respectively, $d(p, q) < r$ and $d'(p, q) < r$. Let $c(p)$ be the least upper bound of the set of numbers $r$ such that the closure of $B(r, p)$ is compact. Then the closure of $B(c(p), p)$ must be noncompact. Now, if $c(p_0) = \infty$ for some $p_0$, we would be finished, for bounded closed subsets in the old metric would be compact. Thus we can suppose that $0 < c(p) < \infty$. We now prove
$$|c(p) - c(q)| \le d(p, q) \quad \text{for all } p, q \in M. \tag{21.2}$$
By symmetry, it suffices to prove that
$$c(q) \ge c(p) - d(p, q).$$
Suppose, then, that $(p_\alpha)$, $1 \le \alpha < \infty$, is a sequence of points of $M$ such that $d(q, p_\alpha) \le r < c(p) - d(p, q)$. Then $d(p, p_\alpha) \le d(p, q) + d(q, p_\alpha) \le d(p, q) + r < d(p, q) + c(p) - d(p, q) = c(p)$. Thus a subsequence of the $(p_\alpha)$ converges, since the closure of $B(d(p, q) + r, p)$ is compact. Also, then, the closure of $B(r, q)$ is compact; hence $c(q) \ge c(p) - d(p, q)$.

Expression (21.2) shows that $c$ is a continuous function of $p$; hence, so is $1/c$. Let the function $f$ used to define the new metric be any $C^\infty$ function such that $f(p) > 1/c(p)$ for all $p \in M$. We shall show that
$$B'(\tfrac13, p) \subset B(\tfrac12 c(p), p) \quad \text{for all } p \in M. \tag{21.3}$$
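A one-dimensional illustration may make the function $c$ and the conformal factor $f$ more tangible. The example below is ours, not the author's: it takes $M$ to be the open half line with its usual (incomplete) metric and assumes the choice $f(p) = 2/p$, which satisfies $f > 1/c$.

```latex
% M = (0, \infty) with the usual metric d(p, q) = |p - q|.
% The closure of B(r, p) in M is compact exactly when r < p, so
\[
  c(p) = p, \qquad |c(p) - c(q)| = |p - q| = d(p, q) .
\]
% Conformal factor (an assumed choice with f > 1/c):
\[
  f(p) = \frac{2}{p}, \qquad
  d'(p, q) = \Bigl| \int_p^q \frac{2}{x}\, dx \Bigr| = 2\, \bigl| \ln (q/p) \bigr| .
\]
% The map p -> 2 \ln p carries (M, d') isometrically onto the real line,
% so the new metric is complete. Moreover
\[
  B'(\tfrac13, p) = \bigl( p\, e^{-1/6},\ p\, e^{1/6} \bigr)
    \subset \bigl( \tfrac12 p,\ \tfrac32 p \bigr) = B(\tfrac12 c(p), p),
\]
% since e^{1/6} \approx 1.18 < 1.5 and e^{-1/6} \approx 0.85 > 0.5.
```

The factor $f$ blows up near the missing end point 0, which pushes that end point infinitely far away in the new metric.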
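In particular, the closure of $B'(\tfrac13, p)$ in $M$ is compact. For the proof, let $\sigma(t)$, $0 \le t \le 1$, be any path in $M$ starting at $p$ whose length in the new metric is less than 1/3. Suppose that $\sigma$ is parametrized proportionally to arc length in the old metric, and that the length of $\sigma$ in the old metric is $L$. Then
$$\int_0^1 f(\sigma(t)) \langle \sigma'(t), \sigma'(t)\rangle^{1/2}\, dt = \text{length of } \sigma \text{ in new metric} < \tfrac13.$$
Since $\langle \sigma'(t), \sigma'(t)\rangle^{1/2} = L$ and $f > 1/c$, this gives
$$L \int_0^1 \frac{dt}{c(\sigma(t))} < \tfrac13.$$
By (21.2),
$$c(\sigma(t)) \le c(\sigma(0)) + d(\sigma(0), \sigma(t)) \le c(p) + L.$$
Combining these inequalities, we have
$$\frac{L}{c(p) + L} < \tfrac13, \quad \text{that is,} \quad L < \tfrac12 c(p).$$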
This proves that $\sigma(1) \in B(c(p)/2, p)$; hence it proves (21.3). That $M$ is complete in the new metric is now a consequence of the Hopf-Rinow theorem, since the "uniform radius" for compact sets makes it easy to show that geodesics can be indefinitely extended.

We can now prove Theorem 21.2. That (21.1) is satisfied if $\phi$ is a covering map is easy to see, and will be left to the reader as an exercise. Conversely, make $N$ into a complete Riemannian manifold, using Lemma 21.4. One can make $M$ into a Riemannian manifold so that $\phi$ is a local isometry by setting
$$\langle u, v\rangle = \langle \phi_*(u), \phi_*(v)\rangle \quad \text{for } u, v \in M_p,\ p \in M.$$
(We refer to this as pulling back the metric on $N$ via $\phi$.)
To infer that 4 is a covering map, it suffices, in view of Lemma 21.3, to show that this metric on M is complete. By the Hopf-Rinow theorem (Theorem 21.1), it suffices to show that geodesics of A4 starting at p o can be indefinitely extended. Then let o(t), 0 5 t I E , be a small geodesic segment of M starting at p o . Let a be the least upper bound of the numbers to such that the geodesic 0 can be extended over 0 I t I t o . Thus 0 can be extended over 0 5 t < a , but no farther. However, 4(o(t))is a geodesic of N , for 0 i t 5 a. If a < 00, this geodesic can be extended over 0 I t < a + E , by completeness of N . By (21.11, it can be lifted to a curve of M starting at p o , which clearly must be a geodesic of M . Since 4 has nonzero Jacobian, this lifting must agree with o for 0 i t < a ; hence, also, 0 I t i a + E . This is a contradiction: hence a = 00 and c can be indefinitely extended.
COROLLARY Let $M$ be a complete Riemannian manifold, $p_0$ a point of $M$. Suppose that the map $\exp: M_{p_0} \to M$ has an everywhere nonzero Jacobian. (In terms of a later definition, this condition means that $p_0$ has no conjugate points.) Then this map is a covering map. In particular, if $M$ is simply connected, the map $\exp$ is a diffeomorphism.
Proof. Lemma 21.3 applies, since the pullback of the metric on $M$ via $\exp$ has the straight lines $t \to tv$ as the geodesics through the origin; hence it is complete by the Hopf-Rinow theorem. That $\exp$ is a diffeomorphism if $M$ is simply connected is a standard fact about covering maps. (The proofs are left to the reader as an exercise.)
22 The Second Variation Formula and Jacobi Vector Fields
Suppose that $\delta(s, t)$, $0 \le s, t \le 1$, is a homotopy in the Riemannian manifold $M$. For fixed $s$, let $\sigma_s$ be the curve $t \to \delta(s, t)$, assumed to be parametrized proportionally to arc length. Let $\sigma = \sigma_0$, and let $v(t) = \partial_s \delta(0, t)$ be the vector field on $\sigma$ defining the corresponding infinitesimal deformation. Suppose that $\sigma$ is a geodesic. If $L(\sigma_s)$ is the length of the curve $\sigma_s$, we have, from the first variation formula (20.6),
This is the second variation formula of arc length. We shall be mainly interested in the case where the homotopy $\delta$ has fixed end points; that is, $\delta(s, 0) = \sigma(0)$, $\delta(s, 1) = \sigma(1)$. In this case the first two terms on the right-hand side of (22.2) vanish. Since, in addition, $v(0) = 0 = v(1)$, the first term in the integrand can be integrated by parts, to give in this case,
(22.3)
This is the most useful form of the second variation formula. For example, if $L(\sigma)$ = distance from $\sigma(0)$ to $\sigma(1)$ (in this case we say that $\sigma$ is a minimizing geodesic), the left-hand side of (22.3) must be $\ge 0$. It is clear, qualitatively at least, that this will involve nontrivial conditions on the quantities (in particular, the curvature tensor $R$) on the right-hand side. Our goal is to work out these conditions in detail. Note for future reference that the second variation formula applies also to homotopies $\delta(s, t)$ such that $t \to \delta(s, t)$ is, for each $s$, merely a path, but where, for each $t$, $s \to \delta(s, t)$ is a $C^\infty$ curve. Conversely, given a vector field $v(t)$, $0 \le t \le 1$, along a geodesic $\sigma(t)$ whose length $\|v(t)\|$ is sufficiently small (if $M$ is complete, this condition is unnecessary) and whose components in local coordinate systems are continuous, piecewise $C^\infty$ functions of $t$, we can define such a deformation $\delta(s, t)$ with $\partial_s \delta(0, t) = v(t)$ by the formula
$$\delta(s, t) = \exp(s\,v(t)); \tag{22.4}$$
that is, for each $t$, $s \to \delta(s, t)$ is the geodesic starting at $\sigma(t)$ and tangent there to $v(t)$. If $L(s)$ is the length of the path $t \to \delta(s, t)$, $0 \le t \le 1$, it should be clear that the first and second variation formulas hold. For example, if $v(0) = 0 = v(1)$, then $\delta(s, 0) = \sigma(0)$, $\delta(s, 1) = \sigma(1)$, $\frac{dL}{ds}(0) = 0$, and $\frac{d^2L}{ds^2}(0)$ is given by the integral in (22.3).
In particular, if the integral is $< 0$ or $> 0$, each of the paths $t \to \delta(s, t)$ is respectively, for sufficiently small $s$, shorter or longer than $\sigma$.
THEOREM 22.1 If $\sigma(t)$, $0 \le t \le 1$, is a geodesic of a Riemannian manifold $M$, if $v: t \to v(t) \in M_{\sigma(t)}$ is a vector field along $\sigma$ that is the infinitesimal deformation corresponding to a deformation $s \to \sigma_s$ of $\sigma$, with each $\sigma_s$ a geodesic of $M$ (called a geodesic deformation of $\sigma$), then $v(t)$ satisfies the following Jacobi differential equation
$$\nabla\nabla v(t) + R_{\sigma(t)}(v(t), \sigma'(t))(\sigma'(t)) = 0. \tag{22.5}$$
Any vector field satisfying these (linear, second order, homogeneous ordinary) differential equations is called a Jacobi field along $\sigma$. If, further, $\sigma_s$ is a geodesic deformation with fixed end points, then length$(\sigma_s)$ is constant in $s$. Proof. Equation (22.5) follows from (22.1), on remarking that the explicit condition that $\sigma_s(t) = \delta(s, t)$ define a geodesic deformation is: $\nabla_t \partial_t \delta(s, t) = 0$.
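To make the Jacobi equation concrete, here is a minimal numerical sketch under a simplifying assumption that is not part of the text: $M$ has constant sectional curvature $K$, and the Jacobi field has the form $v(t) = f(t)e(t)$ with $e$ a parallel unit field normal to the geodesic. Equation (22.5) then reduces to the scalar equation $f'' + K\|\sigma'\|^2 f = 0$. The function name and parameters below are hypothetical, chosen only for illustration.

```python
import math


def first_zero_of_jacobi_field(curvature, speed=1.0, t_max=10.0, steps=100_000):
    """Integrate f'' + curvature * speed**2 * f = 0, f(0) = 0, f'(0) = 1.

    Assumed model: constant sectional curvature, Jacobi field normal to the
    geodesic. Uses the classical fourth-order Runge-Kutta scheme and returns
    the first t > 0 at which f changes sign, or None if there is no sign
    change on (0, t_max].
    """
    k = curvature * speed ** 2
    h = t_max / steps
    t, f, df = 0.0, 0.0, 1.0

    def rhs(value, slope):
        # First-order system: (f, f')' = (f', -k f).
        return slope, -k * value

    for _ in range(steps):
        k1f, k1d = rhs(f, df)
        k2f, k2d = rhs(f + 0.5 * h * k1f, df + 0.5 * h * k1d)
        k3f, k3d = rhs(f + 0.5 * h * k2f, df + 0.5 * h * k2d)
        k4f, k4d = rhs(f + h * k3f, df + h * k3d)
        f_new = f + h * (k1f + 2 * k2f + 2 * k3f + k4f) / 6
        df_new = df + h * (k1d + 2 * k2d + 2 * k3d + k4d) / 6
        if f > 0 and f_new <= 0:
            # Locate the sign change by linear interpolation inside the step.
            return t + h * f / (f - f_new)
        t, f, df = t + h, f_new, df_new
    return None


if __name__ == "__main__":
    print(first_zero_of_jacobi_field(1.0))   # close to pi
    print(first_zero_of_jacobi_field(0.0))   # None
    print(first_zero_of_jacobi_field(-1.0))  # None
```

In this model, positive curvature makes a Jacobi field that starts at zero return to zero, while zero or negative curvature does not; this is the behavior that the definition of conjugate points captures.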
Definition Let $\sigma(t)$, $0 \le t \le 1$, be a geodesic of $M$. The point $\sigma(1)$ is said to be conjugate to $\sigma(0)$ with respect to $\sigma$ if there is a nonzero Jacobi vector field $v$ along $\sigma$ such that $v(0) = 0 = v(1)$.
THEOREM 22.2 Let $M$ be a complete Riemannian manifold and let $p$ be a point of $M$. Let $\exp: M_p \to M$ be the usual exponential map that assigns to each $v \in M_p$ the end point of the geodesic starting at $p$ and tangent to $v$. Let $\sigma(t)$, $0 \le t \le 1$, be a geodesic of $M$ with $\sigma(0) = p$. Then $\sigma(1)$ is conjugate to $\sigma(0)$ with respect to $\sigma$ if and only if the map $\exp$ has a zero Jacobian at the point $\sigma'(0) \in M_p$.
Proof. Suppose first that $\exp$ has a zero Jacobian at $\sigma'(0)$. Let $\gamma(s)$, $0 \le s \le 1$, be a curve in $M_p$ such that (a) $\gamma(0) = \sigma'(0)$; (b) $\exp_*(\gamma'(0)) = 0$, but $\gamma'(0) \ne 0$. Let $\delta$ be the homotopy
$$\delta(s, t) = \exp(t\gamma(s)), \quad 0 \le s, t \le 1.$$
If $v(t) = \partial_s \delta(0, t)$, then $v(0) = 0$, since $\delta$ has the $t = 0$ end point fixed. But $v(1) = \partial_s \delta(0, 1)$ = tangent vector to the curve $s \to \exp(\gamma(s))$ at $s = 0$, which is $\exp_*(\gamma'(0)) = 0$. Now
$$\nabla_t \partial_s \delta(0, 0) = \nabla_s \partial_t \delta(0, 0).$$
Since $\partial_t \delta(s, 0) = \gamma(s)$,
$$\nabla v(0) = \nabla_s \partial_t \delta(0, 0) = \frac{d\gamma}{ds}\Big|_{s=0} \ne 0;$$
hence $v(t)$ cannot be identically zero. The converse will follow from the following more general result.
LEMMA 22.3 Let $v(t)$ be any Jacobi vector field on $\sigma$, let $\gamma(s)$, $0 \le s \le 1$, be any curve in $M$ with $\gamma(0) = p$, $\gamma'(0) = v(0)$. Let $u(s)$ be any vector field along $\gamma$ such that $\nabla u(0) = \nabla v(0)$,† $u(0) = \sigma'(0)$. Let $\delta(s, t) = \exp(t\,u(s))$. Then $\delta$ is a geodesic deformation of $\sigma$ whose infinitesimal deformation vector field is precisely $v$.
† Of course $\nabla u$ is the covariant derivative of $s \to u(s)$ along $s \to \gamma(s)$, while $\nabla v$ is the covariant derivative of $t \to v(t)$ along $t \to \sigma(t)$.
Proof. $\partial_s \delta(0, 0) = v(0)$, since $\gamma(s) = \exp(0 \cdot u(s))$.
$$\nabla_t \partial_s \delta(0, 0) = \nabla_s \partial_t \delta(0, 0).$$
$\partial_t \delta(s, 0) = u(s)$; hence $\nabla_t \partial_s \delta(0, 0) = \nabla u(0) = \nabla v(0)$. Thus $t \to v_1(t) = \partial_s \delta(0, t)$ and $t \to v(t)$ are two Jacobi vector fields along $\sigma$ such that $v_1(0) = v(0)$, $\nabla v_1(0) = \nabla v(0)$. By the uniqueness of solution of the Jacobi differential equations (22.5), $v(t) = v_1(t) = \partial_s \delta(0, t)$ for all $t$, which proves the lemma.
Return to the theorem. Suppose that $t \to v(t)$ is a Jacobi vector field along $\sigma$ such that $v(0) = 0 = v(1)$. Apply the lemma with $\gamma(s) = \sigma(0)$ and $u(s) = s\nabla v(0) + \sigma'(0)$ for $0 \le s \le 1$. Then $0 = v(1) = \partial_s \delta(0, 1)$, where
$$\delta(s, t) = \exp(t(\sigma'(0) + s\nabla v(0))).$$
Thus the image under $\exp_*$ of the tangent vector to the curve $s \to \sigma'(0) + s\nabla v(0)$ at $s = 0$ is zero; that is, $\exp_*$ is not 1-1 on the tangent space to $M_p$ at $\sigma'(0)$. Q.E.D.
We now turn to the main theorem, due to Jacobi, concerning the significance of conjugate points in the calculus of variations.
THEOREM 22.4 Let $M$ be a Riemannian manifold and let $\sigma(t)$, $0 \le t \le 1$, be a geodesic of $M$. Suppose that for some $t_0 \in (0, 1)$, $\sigma(t_0)$ is a conjugate point of $\sigma(0)$ along $\sigma$ for $0 \le t \le t_0$. Then $\sigma$ is not a minimizing geodesic between $\sigma(0)$ and $\sigma(1)$, since there exists a continuous, piecewise $C^\infty$ vector field $v^*$ vanishing at the end points of $\sigma$ with $H(v^*) < 0$. Also, of course, one deduces that if $H(v^*) \ge 0$ for all piecewise $C^\infty$ vector fields vanishing at the end points of $\sigma$, then $\sigma(t)$ cannot be a conjugate point of $\sigma(0)$ for $0 < t < 1$.
LEMMA 22.5 Let $M$ be a Riemannian manifold, let $\sigma: [0, 1] \to M$ be a geodesic such that $\sigma(1)$ is not a conjugate point of $\sigma(0)$ with respect to $\sigma$. Then, given an element $v_1 \in M_{\sigma(1)}$, there is a unique Jacobi vector field $v(t)$, $0 \le t \le 1$, along $\sigma$ such that $v(0) = 0$, $v(1) = v_1$.
Proof. We see from the Jacobi differential equations that a Jacobi vector field $v$ is determined uniquely by its values $v(0)$ and $\nabla v(0)$. Thus, if $v(0) = 0$, a $v$ exists, given an arbitrary $\nabla v(0)$; that is, the space of all such Jacobi fields is a vector space of the same dimension as $M$. The map $v \to v(1)$ must then be an isomorphism of vector spaces, since it has kernel zero.
LEMMA 22.6 Let $M$ be a Riemannian manifold; $\sigma: [0, 1] \to M$ a geodesic. Let $v^s$ be a continuous one-parameter family of Jacobi fields along $\sigma$ such that
$$\lim_{s \to 0} v^s(t) = 0 \quad \text{for } 0 \le t \le 1.$$
Then
$$\lim_{s \to 0} \nabla v^s(t) = 0 \quad \text{for } 0 \le t \le 1.$$
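Proof. As we remarked above, the linear space of all Jacobi vector fields has $2 \times \dim M$ as its dimension. Suppose $v_i$, $1 \le i \le 2n$, forms a basis for this vector space. Thus $v^s = a_i(s)v_i$ (summation over $i$ understood).
We must show that $\lim_{s \to 0} v^s(t) = 0$ for all $t$ implies that $\lim_{s \to 0} a_i(s) = 0$. This will finish the proof, since then $\nabla v^s = a_i(s)\nabla v_i$. We can suppose the $v_i$ chosen so that $v_i(0) = 0$ for $1 \le i \le n$, but $(v_i(0))$ for $n + 1 \le i \le 2n$ forms a basis of $M_{\sigma(0)}$. Then
$$\lim_{s \to 0} v^s(0) = \lim_{s \to 0} \Big(\sum_{i \ge n+1} a_i(s)v_i(0)\Big) = 0,$$
forcing $\lim_{s \to 0} a_i(s) = 0$ for $n + 1 \le i \le 2n$. Now, an interval of $\sigma$, say, $0 \le t \le \varepsilon$, is free of conjugate points. Then
$$\lim_{s \to 0} v^s(\varepsilon) = \lim_{s \to 0} \Big(\sum_{i \ge n+1} a_i(s)v_i(\varepsilon) + \sum_{i \le n} a_i(s)v_i(\varepsilon)\Big) = 0.$$
Since $\sigma(\varepsilon)$ is not a conjugate point, the $(v_i(\varepsilon))$, $1 \le i \le n$, are linearly independent elements of $M_{\sigma(\varepsilon)}$. This forces
$$\lim_{s \to 0} \Big(\sum_{i \le n} a_i(s)v_i(\varepsilon)\Big) = 0;$$
hence $\lim_{s \to 0} a_i(s) = 0$ for all $i$.
If $\sigma(t)$ is a geodesic in $M$ and $v$ is a continuous, piecewise $C^\infty$ vector field on $\sigma$, define
$$H(v) = \int_0^1 \big[\langle \nabla v(t), \nabla v(t)\rangle - \langle v(t), R(v(t), \sigma'(t))(\sigma'(t))\rangle\big]\,dt; \tag{22.6}$$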
that is, $H(v)$ is an abbreviation for the second variation formula in the fixed end-point case. Thus, if $H(v) < 0$ for some such vector field vanishing at $t = 0$ and $t = 1$, $\sigma$ is not a minimizing geodesic joining $\sigma(0)$ to $\sigma(1)$. Further, if $v$ is a Jacobi vector field, we see, after integrating by parts the first term of the integrand of (22.6), that
$$H(v) = \langle \nabla v(1), v(1)\rangle - \langle \nabla v(0), v(0)\rangle. \tag{22.7}$$
Return to the proof of the Jacobi theorem. Let $v(t)$, $0 \le t \le 1$, be a nonzero Jacobi vector field on the geodesic $\sigma(t)$, $0 \le t \le 1$, such that $v(0) = 0 = v(t_0)$ for some $t_0 \in (0, 1)$. To prove $\sigma$ is not minimizing, there is no loss in generality in supposing that for $0 < s < t_0$ and $t_0 - s$ sufficiently small, no conjugate points of $\sigma(s)$ are along $\sigma(t)$, $s \le t \le 1$. For each such $s$, let $u^s(t)$ be a Jacobi field such that
$$u^s(1) = 0, \quad u^s(s) = v(s).$$
Since $\lim_{s \to t_0} v(s) = 0$, we see, by an argument similar to that used in Lemma 22.6, that $\lim_{s \to t_0} u^s(t) = 0$ for $0 \le t \le 1$; thus, by Lemma 22.6 itself,
$$\lim_{s \to t_0} (\nabla u^s)(t) = 0 \quad \text{for } 0 \le t \le 1. \tag{22.8}$$
Let $v^s(t)$ be the continuous, piecewise $C^\infty$ vector field on $\sigma$ such that
$$v^s(t) = v(t) \ \text{for } 0 \le t \le s, \quad v^s(t) = u^s(t) \ \text{for } s \le t \le 1.$$
Using (22.7) and the fact that $u^s(1) = 0$, we have
$$H(v^s) = \langle \nabla v(s), v(s)\rangle - \langle \nabla u^s(s), u^s(s)\rangle = \langle \nabla v(s) - \nabla u^s(s), v(s)\rangle.$$
Now $H(v^s) \to 0$ as $s \to t_0$. We then try to find the first derivative of $s \to H(v^s)$ at $s = t_0$:
$$\frac{H(v^s)}{s - t_0} = \Big\langle \nabla v(s) - \nabla u^s(s), \frac{v(s)}{s - t_0}\Big\rangle.$$
But $\lim_{s \to t_0} v(s)/(s - t_0) = \nabla v(t_0)$, and $\lim_{s \to t_0} \nabla u^s(s) = 0$, by (22.8). Thus
$$\lim_{s \to t_0} \frac{H(v^s)}{s - t_0} = \langle \nabla v(t_0), \nabla v(t_0)\rangle > 0.$$
Hence $H(v^s) < 0$ for $s < t_0$ and $(t_0 - s)$ sufficiently small, which proves that there is a shorter path than $\sigma$ joining $\sigma(0)$ to $\sigma(1)$. Q.E.D.
Having shown that geodesics are not minimizing beyond the first conjugate point, we may ask: What if the geodesic has no conjugate points? (The case where the end point $\sigma(1)$ is the first conjugate point is the borderline case that can go either way.) It is unreasonable to expect that the absence of conjugate points, without any further assumptions, would imply that the geodesic was minimizing, since a geodesic's being minimizing is a global property, whereas the absence of conjugate points is a local condition. Thus the reasonable expectation is that the absence of conjugate points will imply that the length of the geodesic will be less than that of "nearby" paths joining the same end points. One way of making this precise is by requiring that $H(v) > 0$ for all nonzero, continuous, piecewise differentiable vector fields on the geodesic vanishing at the end points. We shall now show that this (and more) is true.
THEOREM 22.7 Let $\sigma(t)$, $0 \le t \le 1$, be a geodesic of a Riemannian manifold $M$. Suppose that $\sigma(t)$ is not a conjugate point of $\sigma(0)$ with respect to $\sigma$ for $0 < t \le 1$. (We shall say in this case that $\sigma$ contains no conjugate points.) Let $v$ be a Jacobi vector field on $\sigma$ such that $v(0) = 0$. If $u$ is any other continuous, piecewise differentiable vector field along $\sigma$ with $u(0) = 0$, $u(1) = v(1)$, then $H(u) \ge H(v)$. Equality holds only if $u = v$.
Proof. Suppose $\dim M = n$. Let $v_i$, $1 \le i, j, \ldots \le n$, be Jacobi vector fields on $\sigma$ such that $v_i(0) = 0$ and so that the vectors $\nabla v_i(0)$ form a basis of $M_{\sigma(0)}$. By Lemma 22.5, $v_i(t)$ forms a basis for $M_{\sigma(t)}$ for $0 < t \le 1$. Suppose that $u(t) = a_i(t)v_i(t)$ for $0 < t \le 1$:
$$H(u) = \lim_{\varepsilon \to 0} \int_\varepsilon^1 \big[\langle \nabla u(t), \nabla u(t)\rangle - \langle u(t), R(u(t), \sigma'(t))(\sigma'(t))\rangle\big]\,dt = \lim_{\varepsilon \to 0} H_\varepsilon(u),$$
where $H_\varepsilon(u)$ is the integral. We now evaluate $H_\varepsilon(u)$:
$$\nabla u = \frac{da_i}{dt}v_i + a_i\nabla v_i.$$
Substituting,
$$H_\varepsilon(u) = \int_\varepsilon^1 \Big[\frac{da_i}{dt}\frac{da_j}{dt}\langle v_i, v_j\rangle + 2a_j\frac{da_i}{dt}\langle v_i, \nabla v_j\rangle + a_ia_j\big(\langle \nabla v_i, \nabla v_j\rangle - \langle v_i, R(v_j, \sigma')(\sigma')\rangle\big)\Big]\,dt.$$
We now prove that
$$\frac{d}{dt}\big(\langle \nabla v_i(t), v_j(t)\rangle - \langle \nabla v_j(t), v_i(t)\rangle\big) = 0. \tag{22.9}$$
(This holds for any pair of Jacobi vector fields, independently of the initial conditions.) The left-hand side of (22.9) is
$$\langle \nabla\nabla v_i, v_j\rangle + \langle \nabla v_i, \nabla v_j\rangle - \langle \nabla\nabla v_j, v_i\rangle - \langle \nabla v_j, \nabla v_i\rangle = -\langle R(v_i, \sigma')(\sigma'), v_j\rangle + \langle R(v_j, \sigma')(\sigma'), v_i\rangle = 0,$$
using the identities for the curvature tensor to be provided in Chapter 23.
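In our case, where $v_i(0) = 0 = v_j(0)$, note that (22.9) implies that
$$\langle \nabla v_i, v_j\rangle = \langle \nabla v_j, v_i\rangle. \tag{22.10}$$
Returning to $H_\varepsilon(u)$, using (22.10), integrating by parts the third term, and applying Jacobi equations, we have
$$H_\varepsilon(u) = \int_\varepsilon^1 \frac{da_i}{dt}\frac{da_j}{dt}\langle v_i, v_j\rangle\,dt + a_i(1)a_j(1)\langle \nabla v_i(1), v_j(1)\rangle - a_i(\varepsilon)a_j(\varepsilon)\langle \nabla v_i(\varepsilon), v_j(\varepsilon)\rangle.$$
Now $v(1) = u(1)$ implies that $v(t) = a_i(1)v_i(t)$. Thus,
$$H(v) = a_i(1)a_j(1)\langle \nabla v_i(1), v_j(1)\rangle;$$
hence,
$$H_\varepsilon(u) = H(v) + \int_\varepsilon^1 \Big\langle \frac{da_i}{dt}v_i, \frac{da_j}{dt}v_j\Big\rangle\,dt - \langle a_i(\varepsilon)\nabla v_i(\varepsilon), u(\varepsilon)\rangle.$$
Now
$$a_i\nabla v_i = \nabla u - \frac{da_i}{dt}v_i.$$
Define a vector field $t \to w(t)$ along $\sigma$ for $0 < t \le 1$ by
$$w(t) = \frac{da_i}{dt}(t)\,v_i(t).$$
Then $u$ is a Jacobi field if and only if $w(t) = 0$ for $0 < t \le 1$. Substituting in $H_\varepsilon(u)$, we have
$$H_\varepsilon(u) = H(v) + \int_\varepsilon^1 \langle w(t), w(t)\rangle\,dt - \langle \nabla u(\varepsilon), u(\varepsilon)\rangle + \langle w(\varepsilon), u(\varepsilon)\rangle.$$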
We suppose that $w(t)$ is not identically zero. To finish the proof of the theorem, we must show that $H(u) - H(v) > 0$. Suppose otherwise: Since $u(\varepsilon) \to 0$ as $\varepsilon \to 0$, we must clearly have
$$\lim_{\varepsilon \to 0} \langle w(\varepsilon), w(\varepsilon)\rangle = \infty;$$
hence, also
$$\lim_{\varepsilon \to 0} \langle w(\varepsilon), u(\varepsilon)\rangle = -\infty.$$
Now we can choose numbers $\varepsilon$ arbitrarily close to zero for which
$$\langle w(\varepsilon), w(\varepsilon)\rangle = \max_{\varepsilon \le t \le 1} \langle w(t), w(t)\rangle.$$
Using the Schwarz inequality,
$$|\langle w(\varepsilon), u(\varepsilon)\rangle| \le \|w(\varepsilon)\|\,\|u(\varepsilon)\|.$$
Hence, as $\varepsilon \to 0$ through these specially chosen values, $\int_\varepsilon^1 \langle w(t), w(t)\rangle\,dt$ grows much faster than $|\langle w(\varepsilon), u(\varepsilon)\rangle|$; hence the inequality
$$\lim_{\varepsilon \to 0} H_\varepsilon(u) - H(v) \le 0$$
is impossible. Q.E.D.
The proof of Theorem 22.7 is due to Ambrose [1], and has the great advantage of being direct and explicit. The theorem was actually originally proved (Morse [1]) by exploiting the basic "extremal field" idea of the calculus of variations. For example, when one tries to find vector fields along $\sigma$ with end points at $t = 0$ and $t = 1$ fixed, which minimize $H$ (which, from (22.6), is merely an ordinary, nonhomogeneous variational problem), it is seen that a vector field $v$ that does minimize $H$ must satisfy the Jacobi equations; that is, the Jacobi equations are the Euler equations of the second variation formula! We now turn to the simplest important geometric application of these "second variation" ideas, the theorem of S. Myers.
THEOREM 22.8 Let $M$ be a complete Riemannian manifold of dimension $n$. Suppose $\alpha$ is a positive number such that the Ricci curvature of all tangent vectors of $M$ (see below for the definition) is $\ge \alpha > 0$. Then $M$ is compact. Further, every geodesic of $M$ of length $\ge \sqrt{n/\alpha}\,\pi$ must contain at least one conjugate point, and $M$ has diameter $\le \sqrt{n/\alpha}\,\pi$.
Proof. Let $\sigma(t)$, $0 \le t \le 1$, be a geodesic of $M$ that is free of conjugate points. Let $n = \dim M$, and let $v_i$, $1 \le i, j, \ldots \le n$, be self-parallel vector fields (that is, $\nabla v_i = 0$) along $\sigma$ whose values at each $\sigma(t)$ form a basis of $M_{\sigma(t)}$, and with $\langle v_i(t), v_j(t)\rangle = \delta_{ij}$.
If $v(t) = a_i(t)v_i(t)$, with $a_i(0) = 0 = a_i(1)$, we must have, by Theorem 22.7, $H(v) \ge 0$. But
$$H(v) = \int_0^1 \big[\langle \nabla v, \nabla v\rangle - \langle v, R(v, \sigma')(\sigma')\rangle\big]\,dt = -\int_0^1 \big[\langle \nabla\nabla v, v\rangle + \langle v, R(v, \sigma')(\sigma')\rangle\big]\,dt$$
(after integrating by parts the first term and using $v(0) = v(1) = 0$).
Suppose we try to find such a $v$ satisfying
$$\nabla\nabla v = -\lambda^2 v \quad \text{for some constant } \lambda > 0.$$
Working out the conditions $v$ would have to satisfy, we see that the simplest vector fields of this type would be $u_i(t) = \sin(\pi t)\,v_i(t)$.
Then
$$H(u_i) = \pi^2 \int_0^1 \sin^2(\pi t)\,dt - \int_0^1 \sin^2(\pi t)\,\langle v_i(t), R_{\sigma(t)}(v_i(t), \sigma'(t))(\sigma'(t))\rangle\,dt.$$
Definition Let $p$ be a point of a Riemannian manifold $M$. Let $v$ be a unit tangent vector at $p$. The Ricci curvature of $v$, denoted by $R(v)$, is the trace of the linear transformation $u \to R_p(u, v)(v)$ of $M_p$. If $v$ is any nonzero tangent vector, define
$$R(v) = R(v/\|v\|).$$
Thus we have
$$\sum_{i=1}^n \langle v_i(t), R(v_i(t), \sigma'(t))(\sigma'(t))\rangle = R(\sigma'(t))\,(\text{length } \sigma)^2,$$
since these are the diagonal elements in the matrix of the linear transformation $u \to R(u, \sigma'(t))(\sigma'(t))$ with respect to the basis $(v_i(t))$ of $M_{\sigma(t)}$. Finally, then,
$$0 < \sum_{i=1}^n H(u_i) = n\pi^2 \int_0^1 \sin^2(\pi t)\,dt - (\text{length } \sigma)^2 \int_0^1 \sin^2(\pi t)\,R(\sigma'(t))\,dt \le \int_0^1 \sin^2(\pi t)\,dt\,\Big(n\pi^2 - \min_{0 \le t \le 1} R(\sigma'(t)) \cdot (\text{length } \sigma)^2\Big).$$
Thus, if $\min_{0 \le t \le 1} R(\sigma'(t)) \ge \alpha > 0$, and if $\sigma(t)$, $0 \le t \le 1$, contains no conjugate points, then
$$\text{length } \sigma < \sqrt{n/\alpha}\,\pi. \tag{22.11}$$
To finish the proof of the theorem, suppose that $p$, $q$ are arbitrary points of $M$. Since the metric is complete, there is a geodesic $\sigma(t)$, $0 \le t \le 1$, joining $p$ to $q$, with $d(p, q) = \text{length } \sigma$. By the Jacobi theorem (Theorem 22.4), $\sigma(t)$ is not a conjugate point of $\sigma(0)$ for $0 < t < 1$ (else $\sigma$ could not be minimizing for $0 \le t \le 1$), whence length $\sigma \le \sqrt{n/\alpha}\,\pi$, by (22.11). Thus $d(p, q) \le \sqrt{n/\alpha}\,\pi$ and the diameter of $M$ is
$$\sup_{p, q \in M} d(p, q) \le \sqrt{n/\alpha}\,\pi.$$
By the Hopf-Rinow theorem, M is also compact.
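The role of the test fields $\sin(\pi t)v_i(t)$ can be illustrated numerically. The sketch below assumes (this is not in the text) that $M$ has constant sectional curvature $K$, that the geodesic is parametrized on $[0, 1]$ with speed equal to its length, and that $e(t)$ is a parallel unit field normal to it. The integrand of $H$ for $u(t) = \sin(\pi t)e(t)$ is then $\pi^2\cos^2(\pi t) - K\,(\text{length})^2\sin^2(\pi t)$. The function name is hypothetical.

```python
import math


def index_form_of_sine_field(curvature, length, samples=20_000):
    """Midpoint-rule value of H(u) for u(t) = sin(pi t) e(t).

    Assumed model: constant sectional curvature, e(t) a parallel unit field
    normal to a geodesic parametrized on [0, 1] with speed `length`, so that
    <grad u, grad u> = pi^2 cos^2(pi t) and
    <u, R(u, sigma')(sigma')> = curvature * length^2 * sin^2(pi t).
    """
    h = 1.0 / samples
    total = 0.0
    for i in range(samples):
        t = (i + 0.5) * h
        grad_sq = (math.pi * math.cos(math.pi * t)) ** 2
        curv_term = curvature * length ** 2 * math.sin(math.pi * t) ** 2
        total += (grad_sq - curv_term) * h
    return total


if __name__ == "__main__":
    # On the unit sphere (curvature 1) the sign changes at length pi.
    for length in (2.0, 3.0, 3.5, 4.0):
        print(length, index_form_of_sine_field(1.0, length))
```

In this model the integral equals $(\pi^2 - K\,\text{length}^2)/2$, so the test field has negative $H$ exactly when the geodesic is longer than $\pi/\sqrt{K}$. This is the mechanism behind the length bound in the proof.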
23 Sectional Curvature and the Elementary Comparison Theorems
Definition
Let $M$ be a Riemannian manifold, with $R(\ ,\ )(\ )$ its curvature tensor. If $p \in M$ and $u, v \in M_p$ are orthogonal tangent vectors of unit length, and if $\Gamma$ is the two-dimensional subspace of $M_p$ spanned by $u$ and $v$ (a tangent plane), then
$$K(\Gamma) = \langle u, R_p(u, v)(v)\rangle$$
is called the sectional curvature of the plane $\Gamma$.
We shall now collect the algebraic formulas concerning the curvature tensor and sectional curvature that will be needed. As the definition, recall that for $X, Y, Z \in V(M)$,
$$R(X, Y)(Z) = \nabla_X\nabla_Y Z - \nabla_Y\nabla_X Z - \nabla_{[X, Y]}Z. \tag{23.1}$$
This is called the Ricci identity, and expresses the noncommutativity of iterated covariant derivatives. Suppose we assume that $X$, $Y$, and $Z$ commute; that is, $[X, Y] = 0 = [Y, Z] = [X, Z]$. We get an identity for the curvature tensor when we apply $\nabla_Y$ to the torsion-free identity $\nabla_X Z = \nabla_Z X$, namely,
$$\begin{aligned}
\nabla_Y(\nabla_X Z) = \nabla_Y\nabla_Z X &= \nabla_Z\nabla_Y X + R(Y, Z)(X) \\
&= \nabla_Z\nabla_X Y + R(Y, Z)(X) \\
&= \nabla_X\nabla_Z Y + R(Z, X)(Y) + R(Y, Z)(X) \\
&= \nabla_X\nabla_Y Z + R(Z, X)(Y) + R(Y, Z)(X) \\
&= \nabla_Y\nabla_X Z + R(X, Y)(Z) + R(Z, X)(Y) + R(Y, Z)(X);
\end{aligned}$$
hence,
$$R(X, Y)(Z) + R(Z, X)(Y) + R(Y, Z)(X) = 0. \tag{23.2}$$
This is called the permutation identity for the curvature tensor. Because of the tensorial character of $R$, it holds for arbitrary $X, Y, Z \in V(M)$. We remember it in the following way: Consider the function $(X, Y, Z) \to$ (the sum of the six permutations of $R(X, Y)(Z)$, each permutation multiplied by its signature). Equation (23.2) expresses the fact that this function is identically zero. (Since $R(X, Y) = -R(Y, X)$, the six terms collapse to three.)
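Now suppose that $X, Y, Z, W \in V(M)$ commute:
$$\begin{aligned}
\langle R(X, Y)(Z), W\rangle &= \langle \nabla_X\nabla_Y Z - \nabla_Y\nabla_X Z, W\rangle \\
&= X(\langle \nabla_Y Z, W\rangle) - \langle \nabla_Y Z, \nabla_X W\rangle - Y(\langle \nabla_X Z, W\rangle) + \langle \nabla_X Z, \nabla_Y W\rangle \\
&= XY(\langle Z, W\rangle) - X(\langle Z, \nabla_Y W\rangle) - Y(\langle Z, \nabla_X W\rangle) + \langle Z, \nabla_Y\nabla_X W\rangle \\
&\quad - YX(\langle Z, W\rangle) + Y(\langle Z, \nabla_X W\rangle) + X(\langle Z, \nabla_Y W\rangle) - \langle Z, \nabla_X\nabla_Y W\rangle \\
&= \langle Z, \nabla_Y\nabla_X W - \nabla_X\nabla_Y W\rangle.
\end{aligned}$$
Hence,
$$\langle R(X, Y)(Z), W\rangle = -\langle Z, R(X, Y)(W)\rangle. \tag{23.3}$$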
Again, because of the tensorial nature of $R$, this identity holds for arbitrary $X, Y, Z, W$; hence, also for the value of $R$ on tangent vectors. It expresses the skew-symmetry of the linear transformation $R(X, Y)$ with respect to the form $\langle\ ,\ \rangle$. Finally an identity can be proved by combining (23.2) and (23.3) in a rather complicated way, following the proof given by Helgason [1, p. 69]. First write (23.2) in the form
$$\langle R(X, Y)(Z), W\rangle + \langle R(Z, X)(Y), W\rangle + \langle R(Y, Z)(X), W\rangle = 0.$$
Permute $W$ with $X$, $Y$, and $Z$ in turn to obtain three more similar identities:
$$\langle R(W, Y)(Z), X\rangle + \langle R(Z, W)(Y), X\rangle + \langle R(Y, Z)(W), X\rangle = 0,$$
$$\langle R(X, W)(Z), Y\rangle + \langle R(Z, X)(W), Y\rangle + \langle R(W, Z)(X), Y\rangle = 0,$$
$$\langle R(X, Y)(W), Z\rangle + \langle R(W, X)(Y), Z\rangle + \langle R(Y, W)(X), Z\rangle = 0.$$
Now add these four identities, noting that many terms cancel by (23.3) and the identity $R(X, Y) = -R(Y, X)$, giving
$$2\langle R(X, W)(Z), Y\rangle + 2\langle R(W, Z)(X), Y\rangle + 2\langle R(W, Y)(Z), X\rangle = 0,$$
or
$$\langle R(X, W)(Z) + R(W, Z)(X), Y\rangle + \langle R(W, Y)(Z), X\rangle = 0;$$
that is (using (23.2) again),
$$-\langle R(Z, X)(W), Y\rangle + \langle R(W, Y)(Z), X\rangle = 0.$$
This leads to our third basic identity for the curvature tensor, which we write in the more easily remembered form
$$\langle R(X, Y)(Z), W\rangle = \langle R(Z, W)(X), Y\rangle. \tag{23.4}$$
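The three identities (23.2), (23.3), and (23.4) can be spot-checked on a simple algebraic model. The sketch below assumes the standard curvature tensor of a space of constant sectional curvature $K$, written in the sign convention of this chapter as $R(X, Y)(Z) = K(\langle Y, Z\rangle X - \langle X, Z\rangle Y)$; this model and the helper names are illustrative assumptions, not taken from the text.

```python
import random


def dot(a, b):
    """Euclidean inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def curvature(k, x, y, z):
    """Model tensor R(X, Y)(Z) = k (<Y, Z> X - <X, Z> Y) (assumed example)."""
    yz, xz = dot(y, z), dot(x, z)
    return [k * (yz * xi - xz * yi) for xi, yi in zip(x, y)]


def identity_residuals(k=2.5, dim=4, seed=0):
    """Return the absolute residuals of (23.2), (23.3), (23.4) on random vectors."""
    rng = random.Random(seed)
    x, y, z, w = ([rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(4))
    # (23.2): R(X,Y)(Z) + R(Z,X)(Y) + R(Y,Z)(X) = 0
    cyclic = [a + b + c for a, b, c in zip(curvature(k, x, y, z),
                                           curvature(k, z, x, y),
                                           curvature(k, y, z, x))]
    permutation = max(abs(c) for c in cyclic)
    # (23.3): <R(X,Y)(Z), W> = -<Z, R(X,Y)(W)>
    skew = abs(dot(curvature(k, x, y, z), w) + dot(z, curvature(k, x, y, w)))
    # (23.4): <R(X,Y)(Z), W> = <R(Z,W)(X), Y>
    pair = abs(dot(curvature(k, x, y, z), w) - dot(curvature(k, z, w, x), y))
    return permutation, skew, pair


if __name__ == "__main__":
    print(identity_residuals())  # three values at rounding-error level
```

For orthonormal $u$, $v$ this model gives $\langle u, R(u, v)(v)\rangle = K$, so it is also consistent with the definition of sectional curvature given at the start of the chapter.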
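Now we can show that $K(\Gamma)$ really does not depend on the choice of orthonormal vectors generating $\Gamma$. Suppose that
$$u' = \cos\theta\,u + \sin\theta\,v, \quad v' = -\sin\theta\,u + \cos\theta\,v.$$
Then
$$R(u', v') = (\cos^2\theta + \sin^2\theta)R(u, v) = R(u, v)$$
by skew-symmetry of $(u, v) \to R(u, v)$, and
$$\begin{aligned}
\langle u', R(u', v')(v')\rangle &= \langle u', R(u, v)(v')\rangle \\
&= \langle \cos\theta\,u + \sin\theta\,v, R(u, v)(-\sin\theta\,u + \cos\theta\,v)\rangle \\
&= (\cos^2\theta + \sin^2\theta)\langle u, R(u, v)(v)\rangle = K(\Gamma),
\end{aligned}$$
by (23.3). We can easily derive the formulas for $K(\Gamma)$ in case vectors $u$, $v$ span $\Gamma$ but are not orthonormal. Put
$$u' = \frac{u}{\|u\|}, \quad v' = \frac{v - \dfrac{\langle v, u\rangle}{\|u\|^2}u}{\Big\|v - \dfrac{\langle v, u\rangle}{\|u\|^2}u\Big\|}.$$
$(u', v')$ are orthonormal generators of $\Gamma$ (the Gram-Schmidt orthonormalization process). Then $K(\Gamma) = \langle u', R(u', v')(v')\rangle$, and by (23.3) and the skew-symmetry of $(u, v) \to R(u, v)$, the terms involving $R(u, u)$ and $\langle u, R(u, v)(u)\rangle$ drop out, so that
$$K(\Gamma) = \frac{\langle u, R(u, v)(v)\rangle}{\|u\|^2\,\Big\|v - \dfrac{\langle v, u\rangle}{\|u\|^2}u\Big\|^2}.$$
Now
$$\Big\|v - \frac{\langle v, u\rangle}{\|u\|^2}u\Big\|^2 = \|v\|^2 - \frac{\langle u, v\rangle^2}{\|u\|^2} = \|v\|^2\sin^2\theta,$$
where $\theta$ is the angle between $u$ and $v$. Thus
$$K(\Gamma) = \frac{\langle u, R(u, v)(v)\rangle}{\|u\|^2\|v\|^2\sin^2\theta}. \tag{23.5}$$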
Geometrically, of course, the denominator is just the square of the area of the parallelogram determined by $u$ and $v$. It will be recognized at once from the preceding definition that essentially it is the sectional curvature of the plane $\Gamma$, that is, $K(\Gamma) = \langle u, R_p(u, v)(v)\rangle$, that appears in the second term of the second variation formula (22.3). As a first observation, the sign of the curvature plays a crucial role. For example, if the sectional curvature is always $\le 0$, if $\sigma(t)$, $0 \le t \le 1$, is any geodesic, if $v$ is any continuous piecewise $C^\infty$ vector field on $\sigma$ vanishing at the end points, then $H(v) > 0$ unless $v = 0$ identically. By Theorem 22.4, $\sigma$ can have no conjugate points. We deduce from the corollary to Lemma 21.4:
THEOREM 23.1 (HADAMARD AND CARTAN)
Suppose that $M$ is a complete Riemannian manifold whose sectional curvature is always nonpositive. Let $p \in M$. Then, $\exp: M_p \to M$ is a covering map.
As a second observation notice from the definition the following property of the Ricci curvature of a unit tangent vector $v \in M_p$: With respect to any orthonormal basis $(v_i)$, $1 \le i \le n$, of $M_p$ chosen so that, say, $v = v_1$, the Ricci curvature is the sum of the sectional curvatures in the planes spanned by $v$ and $v_i$, $2 \le i \le n$. Thus, if the sectional curvatures of $M$ are bounded from below by a positive number, so are the Ricci curvatures; hence, by Theorem 22.8, if $M$ is complete it is also compact, and its diameter is bounded from above, depending only on this positive number. The ultimate purpose of the Rauch type of comparison theorem is to refine this result by giving more information about the topology of such a Riemannian manifold, but in this chapter we shall restrict ourselves to developing fundamental analytical tools. First, however, we discuss the more classical geometric interpretations of the sectional curvature.
THEOREM 23.2
Let $p$ be a point of a Riemannian manifold and $\Gamma$ a plane in $M_p$, $K(\Gamma)$ the sectional curvature of that tangent plane. For each $r > 0$, let $\gamma_r$ be the circle in $\Gamma$ about the origin of radius $r$. Let $L(r)$ be the length of the curve $\exp(\gamma_r)$ in $M$ (assuming that $r$ is sufficiently small so that the exp mapping is defined in some convex ball of $M_p$ about $0$ containing $\gamma_r$). Then
$$\frac{dL}{dr}(0) = 2\pi, \quad \frac{d^2L}{dr^2}(0) = 0, \quad \frac{d^3L}{dr^3}(0) = -6\pi K(\Gamma).$$
Thus, taking the Taylor expansion of $L(r)$, we have a new geometric definition of $K(\Gamma)$:
$$K(\Gamma) = \lim_{r \to 0} \frac{\text{length } \gamma_r - \text{length } \exp(\gamma_r)}{\pi r^3}. \tag{23.6}$$
Proof. Since this is purely a local result about $p$, we can (and will), for simplicity of notation only, suppose that $M$ is complete. Then, let $\delta(s, t)$, $0 \le s, t \le 1$, be a homotopy such that:
(a) $\delta(s, 0) = p$.
(b) $s \to \partial_t \delta(s, 0)$ is the circle of unit radius in $\Gamma$, that is, $\gamma_1$, in its arc-length-proportional parametrization.
(c) $\delta(s, t) = \exp(t\,\partial_t \delta(s, 0))$, that is, $t \to \delta(s, t)$ is the geodesic starting from $p$ tangent there to $\gamma_1(s)$.
Then,
$$L(t) = \int_0^1 \langle \partial_s \delta(s, t), \partial_s \delta(s, t)\rangle^{1/2}\,ds.$$
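LEMMA 23.3 Let $\delta(s, t)$, $0 \le s, t \le 1$, be a homotopy with $\delta(s, 0) = \delta(0, 0)$, but $\nabla_t \partial_s \delta(s, 0) \ne 0$. Then
$$\lim_{t \to 0} \frac{\partial_s \delta(s, t)}{\|\partial_s \delta(s, t)\|} = \frac{\nabla_t \partial_s \delta(s, 0)}{\|\nabla_t \partial_s \delta(s, 0)\|}.$$
Proof.
$$\lim_{t \to 0} \frac{\|\partial_s \delta(s, t)\|^2}{t^2} = \|\nabla_t \partial_s \delta(s, 0)\|^2$$
by L'Hôpital's rule (since $\|\partial_s \delta(s, 0)\| = 0$). Q.E.D.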
Applying Lemma 23.3,
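Now $\nabla_t \nabla_t \partial_s \delta = \nabla_s \nabla_t \partial_t \delta + R(\partial_t \delta, \partial_s \delta)(\partial_t \delta)$, and $t \to \delta(s, t)$ is a geodesic; hence $\nabla_t \partial_t \delta = 0$. Then
$$\frac{d^2L}{dt^2} = \int_0^1 \frac{\langle R(\partial_t \delta, \partial_s \delta)(\partial_t \delta), \partial_s \delta\rangle + \|\nabla_t \partial_s \delta\|^2}{\|\partial_s \delta\|}\,ds - \int_0^1 \frac{\langle \nabla_t \partial_s \delta, \partial_s \delta\rangle^2}{\|\partial_s \delta\|^3}\,ds.$$
First, note by Gauss' lemma that $\langle \partial_s \delta, \partial_t \delta\rangle = 0$. Let $\Gamma(s, t)$ be the plane in $M_{\delta(s, t)}$ spanned by $\partial_s \delta(s, t)$ and $\partial_t \delta(s, t)$. Since, by Lemma 23.3, $\partial_s \delta(s, t)/\|\partial_s \delta(s, t)\|$ tends to $\nabla_t \partial_s \delta(s, 0)/\|\nabla_t \partial_s \delta(s, 0)\|$ as $t \to 0$, and $\partial_t \delta(s, 0)$, $\nabla_t \partial_s \delta(s, 0) \in \Gamma$, we see that $K(\Gamma(s, t)) \to K(\Gamma)$ as $t \to 0$.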
(v a 6 ”>‘+(v ’ IlaS6ll
a6
- 2K(T)IIVSd,6(s,
A) v a 6
’ IIV,a,~ll
as t + O
0)II =
as t -+0.
IIV,d,6(s, 0)ll.
Putting the calculations together, we see that
$$\frac{d^2L}{dt^2}(0) = 0,$$
$$\frac{d^3L}{dt^3}(0) = \lim_{t \to 0} \frac{d^2L/dt^2}{t} = -3K(\Gamma)\int_0^1 \|\nabla_t \partial_s \delta(s, 0)\|\,ds = -6\pi K(\Gamma).$$
Q.E.D.
COROLLARY 1 Let $M$ and $N$ be Riemannian manifolds, $\phi: N \to M$ a map such that
$$\|\phi_*(v)\| = \|v\| \quad \text{for all } v \in T(N)$$
(that is, $\phi$ is an isometric immersion of $N$ in $M$). Let $p_0 \in N$ and let $\Gamma$ be a plane in $N_{p_0}$ such that, for all $u \in \Gamma$, $t \to \phi(\exp(tu))$ is a geodesic of $M$. Then $K_N(\Gamma) = K_M(\phi_*(\Gamma))$, where $K_N$ and $K_M$ are the sectional curvatures with respect to the metrics on $N$ and $M$.
Proof. Let $\gamma_r$ be the circle of radius $r$ in $\Gamma$. Since $\phi$ is isometric, the lengths of $\phi(\exp(\gamma_r))$ and $\exp(\gamma_r)$ are the same. But $\phi(\exp(\gamma_r)) = \exp(\phi_*(\gamma_r))$, and $\phi_*(\gamma_r)$ is the circle of radius $r$ in $\phi_*(\Gamma)$. The result follows from (23.6), applied to $\Gamma$ and $\phi_*(\Gamma)$.
This corollary leads to another geometric interpretation of the sectional curvature of a Riemannian manifold $M$. Suppose first that $\dim M = 2$. Since $\dim M_p = 2$, there is only one sectional curvature associated with each point. The function thus defined over $M$ is called the Gaussian curvature of $M$. Now suppose that $\dim M > 2$; let $p \in M$, and let $\Gamma$ be a two-dimensional subspace of $M_p$. The union of all geodesics of $M$ starting at $p$ and tangent there to $\Gamma$, at least locally about $p$, determines a two-dimensional submanifold $N(\Gamma)$ about $p$. Now any submanifold of a Riemannian manifold has on it an induced Riemannian metric obtained by restricting the inner product to the tangent spaces to the submanifold. (The resulting metric on the submanifold is such that the inclusion map of the submanifold into the big manifold is an isometric immersion.) If we apply this to $N(\Gamma)$, we see from Corollary 1 that the Gaussian curvature of $N(\Gamma)$ at $p$ is precisely the sectional curvature of $\Gamma$. Of course this fact can also be taken as a geometric definition of sectional curvature. Another remark resulting from Corollary 1 is that if $\dim M = \dim N$ (that is, if $\phi$ is a local isometry of $M$ on $N$), $\phi$ preserves the sectional curvature.
This remark is also, of course, implicit in the fact that the Riemann curvature tensor is invariantly attached to the metric. We shall present another useful corollary after a preliminary definition.
Definition Let $M$ and $N$ be Riemannian manifolds and let $\phi: M \to N$ be a map such that $\phi_*(M_p) = N_{\phi(p)}$ for all $p \in M$. ($\phi$ is then called a maximal rank mapping of $M$ into $N$. It follows from the implicit function theorem that $\phi$ is then an open mapping.) For $p \in M$, let $F_p = \{v \in M_p : \phi_*(v) = 0\}$. $F_p$ is called the space of vertical vectors with respect to $\phi$. Let $F_p^\perp = \{v \in M_p : v$ is perpendicular to $F_p$; that is, $\langle v, u\rangle = 0$ for all $u \in F_p\}$. $\phi$ is said to be a compatible mapping between the Riemannian structures on $M$ and $N$ if
$$\|\phi_*(v)\| = \|v\| \quad \text{for all } v \in F_p^\perp, \text{ all } p \in M. \tag{23.7}$$
COROLLARY 2
Let $\phi: M \to N$ be a compatible, maximal rank mapping between Riemannian manifolds. Let $p \in M$ and let $\Gamma$ be a two-dimensional subspace of $F_p^\perp$. Then,
$$K_M(\Gamma) \le K_N(\phi_*(\Gamma)). \tag{23.8}$$
Proof. One can show that if $v \in F_p^\perp$, if $\sigma(t)$, $0 \le t \le 1$, is a geodesic starting at $p$ and tangent there to $v$, then (a) $\sigma'(t) \in F_{\sigma(t)}^\perp$ for all $t \in [0, 1]$; and (b) $t \to \phi(\sigma(t))$ is a geodesic of $N$. If $v$ is an arbitrary vector of $M_p$, $v = v_1 + v_2$, with $v_1 \in F_p$, $v_2 \in F_p^\perp$, $\langle v_1, v_2\rangle = 0$. Thus $\|v\|^2 = \|v_1\|^2 + \|v_2\|^2$. But $\|v_2\| = \|\phi_*(v_2)\| = \|\phi_*(v)\|$. Hence $\|v\| \ge \|\phi_*(v)\|$. In particular, if $\gamma$ is any curve of $M$, length $\gamma \ge$ length $\phi(\gamma)$. Let $\gamma_r$ be the circle of radius $r$ about $0$ in $\Gamma$. Then
$$\text{length } \exp(\gamma_r) \ge \text{length } \phi(\exp(\gamma_r)).$$
But, by (b) and (23.7), $\phi(\exp(\gamma_r)) = \exp(\phi_*(\gamma_r))$, and $\phi_*(\gamma_r)$ is the circle of radius $r$ in $\phi_*(\Gamma)$.
Using (23.6), we now get (23.8). Corollary 2 is very useful in proving that certain spaces have nonnegative curvature. The sectional curvature also has a geometric interpretation in terms of the comparative size of geodesic triangles in the Riemannian and Euclidean spaces. The following theorem is representative of this type of result.
THEOREM 23.4 Let $M$ be a Riemannian manifold, $p \in M$, $v_0, v_1 \in M_p$. For $t \in [0, 1]$, let $D(t)$ = distance from $\exp(tv_0)$ to $\exp(tv_1)$, $\theta$ = angle between $v_0$ and $v_1$. Then $D(t)^2$ has a Taylor expansion of the form
$$D(t)^2 = \|v_1 - v_0\|^2 t^2 + K(\Gamma)\|v_0\|^2\|v_1\|^2\sin^2\theta\,t^4 + \cdots. \tag{23.9}$$
Proof. It is most convenient at this point to use Taylor's formula for covariant derivatives.
LEMMA 23.5 Let $\sigma: [0, 1] \to M$ be a curve in an affinely connected manifold $M$. Let $v(t)$, $0 \le t \le 1$, be a vector field along $\sigma$. Then $v$ admits a Taylor expansion:
$$v(t) = v(0) + \nabla v(0)\,t + \nabla^2 v(0)\frac{t^2}{2!} + \cdots + \nabla^N v(0)\frac{t^N}{N!} + u(t)\,t^{N+1}. \tag{23.10}$$
(Literally, (23.10) makes no sense, since $v(t) \in M_{\sigma(t)}$, while the vector on the right-hand side belongs to $M_{\sigma(0)}$. However, we mean implicitly that they are compared by parallel-translating the left-hand side along $\sigma$ from $\sigma(t)$ to $\sigma(0)$.)
Proof. Of course, if $\sigma(t)$ is constant, $v(t)$ is just an ordinary vector-valued function in $M_{\sigma(0)}$; hence this is just the usual Taylor's formula. We can reduce to this case, however, by the following trick: For $t \in [0, 1]$, let $w(t)$ be the vector in $M_{\sigma(0)}$ which parallel-translates along $\sigma$ to give $v(t)$. It is easily seen that $(dw/dt)(t)$ parallel-translates along $\sigma$ to give $\nabla v(t)$. Thus the classical Taylor expansion for $w(t)$ parallel-translates along $\sigma$ to give (23.10).
Return to Theorem 23.4. Let $\delta(s, t)$, $0 \le s \le 1$, $0 \le t \le t_0$, be a homotopy in $M$ such that:
(a) $\delta(s, 0) = p$.
(b) $\delta(0, t) = \exp(v_0 t)$, $\delta(1, t) = \exp(v_1 t)$ for $0 \le t \le t_0$.
(c) For each $t$, $s \to \delta(s, t)$ is a geodesic parametrized proportionally to arc length.
(If $t_0$ is sufficiently small, obviously such a homotopy can be constructed.) For notational convenience we shall assume that $t_0 = 1$. Note that
$$D(t)^2 = \int_0^1 \|\partial_s \delta(s, t)\|^2\,ds.$$
Condition (c) implies that $s \to \partial_t \delta(s, t)$ is a Jacobi vector field along the geodesic $s \to \delta(s, t)$. From (a), the Jacobi equations reduce at $t = 0$ to
$$\nabla_s \nabla_s \partial_t \delta(s, 0) = 0.$$
Thus, in view of the conditions $\partial_t \delta(0, 0) = v_0$, $\partial_t \delta(1, 0) = v_1$ derived from (b), we have
$$\partial_t \delta(s, 0) = (1 - s)v_0 + sv_1,$$
$$\nabla_t \partial_s \delta(s, 0) = \nabla_s \partial_t \delta(s, 0) = v_1 - v_0.$$
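Thus the Taylor expansion of $\partial_s \delta(s, t)$ about $t = 0$ is
$$\partial_s \delta(s, t) = (v_1 - v_0)t + \cdots + (\text{higher-order terms in } t).$$
$$\nabla_t \nabla_t \partial_s \delta(s, t) = \nabla_t \nabla_s \partial_t \delta(s, t) = \nabla_s \nabla_t \partial_t \delta + R(\partial_t \delta, \partial_s \delta)(\partial_t \delta).$$
Hence,
$$D(t)^2 = \int_0^1 \Big\|(v_1 - v_0)t + \tfrac{1}{2}\big(\nabla_s \nabla_t \partial_t \delta(s, t) + R(\partial_t \delta(s, t), \partial_s \delta(s, t))(\partial_t \delta(s, t))\big)t^2 + \cdots\Big\|^2\,ds.$$
Now
$$\begin{aligned}
\int_0^1 \langle (v_1 - v_0)t, \nabla_s \nabla_t \partial_t \delta(s, 0)\rangle\,ds &= \int_0^1 \frac{\partial}{\partial s}\langle (v_1 - v_0)t, \nabla_t \partial_t \delta(s, 0)\rangle\,ds \\
&= \langle (v_1 - v_0)t, \nabla_t \partial_t \delta(1, 0)\rangle - \langle (v_1 - v_0)t, \nabla_t \partial_t \delta(0, 0)\rangle \\
&= 0 \quad \text{(by condition (b))}.
\end{aligned}$$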
We now prove
Integrating by parts and taking into account condition (b), we have (left-hand side (23.11)) =
1 - (V, V, V, a, 6(s, 0),V, 8,6(s, 0)) ds. 1
0
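$$\nabla_t\nabla_t\nabla_t\,\partial_s\delta = \nabla_t\bigl(\nabla_s\nabla_t\,\partial_t\delta + R(\partial_t\delta, \partial_s\delta)(\partial_t\delta)\bigr)$$
$$= \nabla_s\nabla_t\nabla_t\,\partial_t\delta + R(\partial_t\delta, \partial_s\delta)(\nabla_t\,\partial_t\delta) + (\nabla_t R)(\partial_t\delta, \partial_s\delta)(\partial_t\delta) + R(\nabla_t\,\partial_t\delta, \partial_s\delta)(\partial_t\delta) + R(\partial_t\delta, \nabla_t\,\partial_s\delta)(\partial_t\delta) + R(\partial_t\delta, \partial_s\delta)(\nabla_t\,\partial_t\delta).$$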
Finally we see that V,V,V, Now
a, 6(s,
0) = 0. This proves (23.11).
+ N u ,, u1 - u0>(u1)s2 + “0 =
j(u13
R(u0, ud(u0))
> u1
+ K U l ? W u , , - uo>(uo)>
+ 3< - D o N u , , -uo)(ud) + $< - 00 9
=
N u , ,%)(DO))
- s)s> ds
- uo)(u1)(1
=K
m 11uo 112
7
R(u0 ud(v1)) 9
llu1 112 sin2 0,
where 0 is the angle between uo and u l , and is the plane of M , spanned by uo and u1 ; whence (23.9). Note that the first term on the right-hand side of (23.9) gives the “law of cosines,” or basically, the Pythagorean theorem, for triangles in Euclidean geometry. Note also that tlluoll ~ ~ lsin u 01l t 2~ is~ the Euclidean area of the triangle. Thus (23.8) shows that the sectional curvature gives the deviation of the law of cosines for small triangles. Many other formulas of trigonometry on Riemannian spaces can also be derived, using (23.9) as a replacement for the law of consines, or independently in a similar way. So far, we have been comparing to a certain infinitesimal order the geometric entities in a Riemannian space with the corresponding entitites in a Euclidean space. These are very classical results (mostly due to Riemann); only recently have results of this type been proved that are global and that enable one to compare goemetric entities in two, possibly both, non-Euclidean, Riemannian spaces. The following comparison theorem, due to Rauch, is the foundation of much of the work.
THEOREM 23.6
Let $M$ be a manifold, with two Riemannian metrics defined on it. For $p \in M$, $u, v \in M_p$, let $\langle u, v\rangle$ and $\langle u, v\rangle^*$ be the inner product in the two metrics. Let $R$ and $R^*$ be the Riemann curvature tensors of the unstarred and starred metrics. For $u, v \in M_p$, let $K(u, v)$ and $K^*(u, v)$ be the sectional curvature in the unstarred and starred metrics of the plane spanned by $u$ and $v$. Let $p_0$ be a point of $M$ such that the inner products on $M_{p_0}$ agree. Let $\sigma: [0, 1] \to M$ be a curve beginning at $p_0$, which is a geodesic in both metrics, which has the same length in both metrics, and such that for $0 < t \le 1$, $\sigma(t)$ is not a conjugate point of $\sigma(0)$ (with respect to $\sigma$) in the unstarred metric. Let $v(t)$ be a vector field along $\sigma$ which vanishes at $t = 0$, which is a Jacobi field with respect to both metrics, and such that
$$\langle v(t), \sigma'(t)\rangle = 0 = \langle v(t), \sigma'(t)\rangle^* \quad \text{for } 0 \le t \le 1.$$
For $t \in [0, 1]$, let $u(t) \in M_{\sigma(t)}$ be defined as follows: Parallel-translate $v(t)$ to $\sigma(1)$ along $\sigma$, using the starred affine connection; then parallel-translate this vector at $\sigma(1)$ back to $\sigma(t)$ along $\sigma$, using the unstarred affine connection. The result is $u(t)$. Then
$$\le \int_0^t [K^*(v(\lambda), \sigma'(\lambda)) - K(u(\lambda), \sigma'(\lambda))]\,\langle v(\lambda), v(\lambda)\rangle^*\,d\lambda.$$ (23.12)
Equality holds if and only if $v(\lambda) = u(\lambda)$ for $0 \le \lambda \le t$.

Proof. It suffices to prove (23.12) in case $t = 1$ and to suppose that length $\sigma = 1$. Let $\dim M = n$. Choose the following indices and summation convention: $1 \le i, j, \ldots \le n$. Let $(w_i(t))$, $(z_i(t))$ be vector fields along $\sigma$ such that:

($\nabla$ and $\nabla^*$ denote covariant differentiation with respect to the unstarred and starred affine connection, respectively.) Suppose that $v(t) = a_i(t)w_i(t) = b_i(t)z_i(t)$. Writing out the Jacobi equations for $v$ for both metrics, using (b) and (c), we have
$$\frac{d^2 b_i(t)}{dt^2} + b_j\,\langle z_i(t), R^*(z_j(t), \sigma'(t))(\sigma'(t))\rangle^* = 0.$$ (23.13)
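From (a), $a_i(1) = b_i(1)$. Hence $u(t) = b_i(t)w_i(t)$. Since $u(0) = v(0) = 0$, $u(1) = v(1)$, we have, by Theorem 22.7,
$$= \int_0^1 \Bigl[\frac{db_i}{dt}\frac{db_i}{dt} - b_i b_j\,\langle w_i, R(w_j, \sigma'(t))(\sigma'(t))\rangle\Bigr]\,dt$$
$$= (\text{after integrating by parts and using (23.13)})\quad b_i(1)\frac{db_i}{dt}(1) + \int_0^1 \bigl[b_j b_i\,\langle z_i, R^*(z_j, \sigma')(\sigma')\rangle^* - \langle u, R(u, \sigma')(\sigma')\rangle\bigr]\,dt.$$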
COROLLARY
Suppose that $M$ is a Riemannian manifold with two Riemannian metrics $\langle\ ,\ \rangle$ and $\langle\ ,\ \rangle^*$. Let $p_0 \in M$ satisfy the following conditions:
(a) The geodesics beginning at $p_0$ in the two metrics coincide and have the same length.
(b) If $\sigma: [0, 1] \to M$ is any geodesic beginning at $p_0$, having no conjugate points of $p_0$, then
$$K^*(v, \sigma'(t)) \le K(u, \sigma'(t)) \quad \text{for all } t \in [0, 1], \text{ all } u, v \in M_{\sigma(t)}.$$
Let $p$ be any point of $M$ lying on a geodesic from $p_0$ having no conjugate points of $p_0$. Then
$$\langle u, u\rangle \le \langle u, u\rangle^* \quad \text{for all } u \in M_p.$$
Intuitively, inside the "conjugate locus" of $p_0$ the starred metric is bigger than the unstarred one.

Proof. In view of the relation between Jacobi vector fields and geodesic deformations, the Jacobi fields of both metrics that are zero at $p_0$ must coincide. Thus (b) implies that
$$\frac{d}{dt}\bigl(\langle v(t), v(t)\rangle - \langle v(t), v(t)\rangle^*\bigr) \le 0$$
for each Jacobi field that is zero at $p_0$; hence
$$\langle v(t), v(t)\rangle \le \langle v(t), v(t)\rangle^*.$$
If $\sigma(1)$ is not a conjugate point of $\sigma(0) = p_0$, the values at $t = 1$ of all Jacobi vector fields that are zero at $p_0$ span $M_{\sigma(1)}$. The corollary then follows.

The following more qualitative comparison theorem is due to Morse and Schoenberg. Both comparison theorems may be considered as generalizations of the classical Sturm comparison theorem.

THEOREM 23.7
Let $M$ be a Riemannian manifold, and let $\sigma: [0, 1] \to M$ be a geodesic of $M$ such that $\sigma(1)$ is the first conjugate point of $\sigma(0)$ along $\sigma$. Suppose that $c_1$ and $c_2$ are positive real numbers such that
$$c_1 \le K(v, \sigma'(t)),$$ (23.14a)
$$K(v, \sigma'(t)) \le c_2.$$ (23.14b)
Then
$$\text{(a)}\ \ \operatorname{length}\sigma \le \pi/\sqrt{c_1}, \qquad \text{(b)}\ \ \pi/\sqrt{c_2} \le \operatorname{length}\sigma.$$ (23.15)
Proof. Suppose for the moment that $\sigma$ is an arbitrary geodesic of $M$. Let $u_i(t)$, $0 \le t \le 1$, $2 \le i \le n$ (summation convention), be vector fields along $\sigma$ such that $\nabla u_i(t) = 0$, $\langle u_i(t), u_j(t)\rangle = \delta_{ij}$, $\langle u_i(t), \sigma'(t)\rangle = 0$. Suppose that $v$ is a vector field along $\sigma$ of the form $v(t) = a_i \sin(k\pi t)\,u_i(t)$. Thus
$$\|v(t)\|^2 = \sin^2(k\pi t)\, a_i a_i, \qquad \nabla v = k\pi a_i \cos(k\pi t)\,u_i(t),$$
$$H(v) = \int_0^1 [\langle \nabla v, \nabla v\rangle - \langle v, R(v, \sigma')(\sigma')\rangle]\,dt = a_i a_i \int_0^1 [k^2\pi^2 \cos^2(k\pi t) - \sin^2(k\pi t)\,\|\sigma'(t)\|^2 K(\sigma'(t), v(t))]\,dt,$$
where $K(\sigma'(t), v(t))$ is the sectional curvature in the plane spanned by $\sigma'(t)$ and $v(t)$. By (23.14), the integrand is no greater than
$$k^2\pi^2 \cos^2(k\pi t) - c_1 \sin^2(k\pi t)\,\|\sigma'(t)\|^2.$$
But $\|\sigma'(t)\| = \operatorname{length}\sigma$.
Take $k = 1$. Note that $\int_0^1 \cos^2(\pi t)\,dt = \int_0^1 \sin^2(\pi t)\,dt$ and $v(0) = 0 = v(1)$. If there are no conjugate points on $\sigma$, we must have $H(v) \ge 0$. This forces $\operatorname{length}\sigma \le \pi/\sqrt{c_1}$, and hence proves (23.15a), since the same inequality obviously holds if $\sigma(1)$ is the first conjugate point of $\sigma(0)$.

We turn to proving (23.15b). Let $v(t)$, $0 \le t \le 1$, be a continuous, piecewise $C^\infty$ vector field along $\sigma$, with $v(0) = 0 = v(1)$, and $\langle \sigma'(t), v(t)\rangle = 0$. Using a Fourier series expansion of the components of $v$, we can write
$$v(t) = \sum_{k=1}^{\infty} a_{ik} \sin(k\pi t)\,u_i(t).$$
Since $v$ is piecewise $C^\infty$, the Fourier series for $\nabla v$ also converges and is
$$\nabla v(t) = \sum_{k=1}^{\infty} a_{ik}\,k\pi \cos(k\pi t)\,u_i(t).$$
Suppose that $\sigma(1)$ is a conjugate point of $\sigma(0)$. We can then choose the vector field $v$ so that $H(v) = 0$. Then
$$\|v(t)\|^2 = \sum_{k,l=1}^{\infty} a_{ik}a_{il} \sin(k\pi t)\sin(l\pi t).$$
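$$\int_0^1 \cos(k\pi t)\cos(l\pi t)\,dt = \int_0^1 \cos((k + l)\pi t)\,dt + \int_0^1 \sin(k\pi t)\sin(l\pi t)\,dt$$
$$= \frac{\sin((k + l)\pi t)}{(k + l)\pi}\Big|_0^1 + \int_0^1 \sin(k\pi t)\sin(l\pi t)\,dt = \int_0^1 \sin(k\pi t)\sin(l\pi t)\,dt.$$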
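Thus,
$$0 = H(v) \ge \sum_{k,l=1}^{\infty} a_{ik}a_{il}\,\bigl(\pi^2 kl - (\operatorname{length}\sigma)^2 c_2\bigr)\int_0^1 \sin(k\pi t)\sin(l\pi t)\,dt.$$
Suppose now that (23.15b) is not true; that is,
$$(\operatorname{length}\sigma)^2 c_2 < \pi^2 \le \pi^2 kl \quad \text{for } k, l \ge 1.$$
But $\int_0^1 \sin(k\pi t)\sin(l\pi t)\,dt = \tfrac12\delta_{kl}$, so the right-hand side of the first inequality would then be strictly positive for $v \ne 0$. These inequalities are thus contradictory, whence (23.15), and the theorem is proved. This theorem can also be proved by using the Rauch comparison theorem, that is, Theorem 23.6.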
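As an illustration that is not part of the original argument, the two bounds of Theorem 23.7 can be checked on the round sphere of radius $r$, where every sectional curvature equals $1/r^2$. The radius $r$, the abbreviation $L$ for length $\sigma$, and the parallel unit normal field $e(t)$ are notation assumed only for this sketch.

```latex
% Sketch (assumed example): Theorem 23.7 on the sphere of radius r.
% Every plane has K = 1/r^2, so one may take c_1 = c_2 = 1/r^2 in (23.14).
\begin{align*}
  \text{(23.15a):}\quad & \operatorname{length}\sigma \le \frac{\pi}{\sqrt{c_1}} = \pi r, \\
  \text{(23.15b):}\quad & \pi r = \frac{\pi}{\sqrt{c_2}} \le \operatorname{length}\sigma .
\end{align*}
% Together: the first conjugate point lies at distance exactly \pi r.
%
% Direct check with the test field of the proof (k = 1), L = length(sigma),
% e(t) a parallel unit field normal to sigma'(t):  v(t) = \sin(\pi t)\, e(t).
\begin{align*}
  H(v) = \int_0^1 \Bigl[\pi^2 \cos^2(\pi t) - \frac{L^2}{r^2}\,\sin^2(\pi t)\Bigr]\,dt
       = \frac{1}{2}\Bigl(\pi^2 - \frac{L^2}{r^2}\Bigr).
\end{align*}
% H(v) >= 0 exactly when L <= \pi r, and H(v) = 0 exactly when L = \pi r.
```

Because the two constants coincide here, the theorem pins the conjugate distance down completely; for a metric with $c_1 < c_2$ it only brackets it between $\pi/\sqrt{c_2}$ and $\pi/\sqrt{c_1}$.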
24
Submanifolds of Riemannian Manifolds
Throughout this chapter, let $M$ be a Riemannian manifold. Thus, each $M_p$, for $p \in M$, has a positive definite inner product $\langle\ ,\ \rangle$ defined by the metric. $M$ carries the Riemannian affine connection $(X, Y) \to \nabla_X Y$. Now recall that technically a submanifold must be considered as a pair $(N, \phi)$ consisting of another manifold $N$ and a mapping $\phi: N \to M$ such that:
(a) For $p \in N$, $\phi_*: N_p \to M_{\phi(p)}$ is 1-1.
(b) $\phi$ itself is 1-1.
If (a) is satisfied, but not necessarily (b), the pair is called an immersed submanifold: By the implicit function theorem, every point of $N$ has a neighborhood so that $\phi$ restricted to this neighborhood is a submanifold. Intuitively, an immersed submanifold is locally a submanifold, but may have "self-intersections." However, many differential geometric facts proved about submanifolds carry over with little difficulty to immersed submanifolds, so we shall restrict attention here to submanifolds. If $(N, \phi)$ is a submanifold, it is customary to suppress explicit reference to $\phi$, to identify $N$ with the subset $\phi(N)$ of $M$ and each $N_p$ with the subspace $\phi_*(N_p)$ of $M_{\phi(p)}$. When there is little possibility of confusion, we shall do so.

Let $N$ be a submanifold of $M$. Since each $N_p$ is identified with a subspace of $M_p$, the given inner product $\langle\ ,\ \rangle$ on $M_p$ can be restricted to $N_p$ to define a positive definite inner product there also: Thus $N$ inherits a Riemannian metric from its embedding, called the induced metric. Our first job is to compute the affine connection and the curvature for the induced metric. For $p \in N$, let $N_p^\perp$ be the orthogonal complement of $N_p$ in $M_p$ with respect to the form $\langle\ ,\ \rangle$. An element $v \in N_p^\perp$ (satisfying $\langle v, w\rangle = 0$ for all $w \in N_p$) is called a normal vector to $N$. Define
$$N^\perp = \bigcup_{p \in N} N_p^\perp,$$
the normal vector bundle to $N$. It is readily verified that $N^\perp$ is a submanifold of $T(M)$, the tangent bundle to $M$, whose dimension is equal to that of $M$. A vector field $X \in V(M)$ is said to be tangent or normal to $N$ if, respectively, $X(p) \in N_p$ or $X(p) \in N_p^\perp$ for all $p \in N$. Suppose that $X$ and $Y$ are tangent to $N$, while $Z$ is normal. Note that, for $f \in F(M)$:
$$\langle \nabla_X(fY), Z\rangle(p) = \langle f\nabla_X Y, Z\rangle(p) + \langle X(f)Y, Z\rangle(p) = f(p)\langle \nabla_X Y, Z\rangle(p), \quad \text{since } \langle Y, Z\rangle(p) = 0,$$
$$\langle \nabla_{fX} Y, Z\rangle(p) = f(p)\langle \nabla_X Y, Z\rangle(p) = \langle \nabla_X Y, fZ\rangle(p).$$
These identities are the tipoff that the mapping of vector fields into functions: $(X, Y, Z) \to \langle \nabla_X Y, Z\rangle$ possesses a "value" at each $p \in N$; that is, if $u \in N_p^\perp$, $v, w \in N_p$, choose $X, Y, Z \in V(M)$ such that $X$ and $Y$ are tangent to $N$, $Z$ normal to $N$, so that $X(p) = v$, $Y(p) = w$, $Z(p) = u$, and define
$$S_u(v, w) = \langle \nabla_X Y, Z\rangle(p) = \langle \nabla_X Y(p), Z(p)\rangle.$$ (24.1)
Considered as a function in the indicated subset of $T(M) \times T(M) \times T(M)$, $S$ is called the second fundamental form of $N$. (This is the classical terminology; the first fundamental form is just the inner product $\langle\ ,\ \rangle$ on $T(M)$ restricted to $T(N)$.) The symmetric bilinear form $(v, w) \to S_u(v, w)$ defined on $T(N)$ is called the value of the second fundamental form on $u$. It must be verified that this is independent of the extension of $u$, $v$, $w$ to vector fields, but we shall do this in a moment after computing in terms of a local basis of vector fields. As algebraic properties, note from (24.1) that $S_u(v, w)$ varies linearly when $u$, $v$, or $w$ are varied separately; that is, $S$ as a function $N_p^\perp \times N_p \times N_p \to R$ is multilinear. A less automatic property is symmetry, namely,
$$S_u(v, w) = S_u(w, v).$$ (24.2)
Proof. First note that if $X$ and $Y \in V(M)$ are tangent to $N$, so is $[X, Y]$. To prove this, let $p \in N$. Revert to explicit mention of the map $\phi: N \to M$ defining the submanifold. For $u \in N_p$, $f \in F(M)$, $\phi_*(u)(f) = u(\phi^*(f))$. Hence $\phi_*(u)(f) = 0$, provided $\phi^*(f) = 0$. Thus, eliminating $\phi$ again from the notation,
$$N_p \subset \{v \in M_p : v(f) = 0 \text{ for all } f \in F(M) \text{ that vanish on } N\}.$$
Conversely, it is seen (using the implicit function theorem, which is left as exercise) that the set on the right-hand side has the same dimension (as a vector space) as $N_p$; hence equality holds. Now suppose that $f \in F(M)$ vanishes on $N$ and that $X, Y \in V(M)$ are tangent to $N$. Thus
$$0 = X(p)(f) = X(f)(p) = Y(p)(f) = Y(f)(p) \quad \text{for all } p \in N.$$
$$[X, Y](p)(f) = [X, Y](f)(p) = X(Y(f))(p) - Y(X(f))(p) = 0,$$
since $Y(f)$ and $X(f)$ are functions vanishing on $N$. Thus $[X, Y](p) \in N_p$ for all $p \in N$. Returning to (24.2), suppose that $X, Y \in V(M)$ are tangent to $N$, and $Z \in V(M)$ is normal to $N$. If $u = Z(p)$, $v = X(p)$, $w = Y(p)$,
$$S_u(v, w) = \langle \nabla_X Y, Z\rangle(p) = \langle \nabla_Y X + [X, Y], Z\rangle(p) = \langle \nabla_Y X, Z\rangle(p) + \langle [X, Y](p), Z(p)\rangle = S_u(w, v),$$
since $[X, Y](p) \in N_p$, $Z(p) \in N_p^\perp$. This proves (24.2).

We must learn how to compute the second fundamental form in terms of a local basis for vector fields.
LEMMA 24.1
Let $p$ be a point of $N$, let $(v_i)$, $1 \le i, j, \ldots \le m = \dim M$, be an orthonormal basis of $M_p$ such that:
(a) $(v_i)$, for $1 \le i \le n = \dim N$, is a basis for $N_p$.
(b) $(v_i)$, $n + 1 \le i \le m$, is a basis for $N_p^\perp$.
Then there is an open set $U$ of $M$ containing $p$ and a basis $(X_i)$ of vector fields in $U$ such that
$$\langle X_i, X_j\rangle = \delta_{ij} \quad \text{for } 1 \le i, j \le m.$$
(Any basis of vector fields satisfying this condition is called an orthonormal basis.) $X_i$, for $1 \le i \le n$, is tangent to $N \cap U$; $X_i(p) = v_i$ for $1 \le i \le m$; $[X_i, X_j]$, for $1 \le i, j, \ldots \le n$, is expressible, in $N \cap U$, in terms of $X_1, \ldots, X_n$.
Proof. Since $N$ is a submanifold of $M$, $U$ can first be chosen so that it carries a coordinate system of functions $(x_i)$ so that:
$$U \cap N = \{q \in U : x_{n+1}(q) = 0 = \cdots = x_m(q)\}$$
(exercise in the implicit function theorem). From $(\partial/\partial x_i)(x_j) = 0$ for $1 \le i \le n$, $n + 1 \le j \le m$, it follows that the vector fields $(\partial/\partial x_1), \ldots, (\partial/\partial x_n)$ are tangent to $N$. Recall the Gram-Schmidt orthogonalization process of linear algebra for constructing an orthonormal basis from a given basis, and apply it, in order, to the vector fields $(\partial/\partial x_1), \ldots, (\partial/\partial x_m)$ at each point of $U$. The construction is such that the vector fields $X_i'$ obtained form an orthonormal basis of vector fields so that the $X_1', \ldots, X_n'$ are also tangent to $N$, while the $X_{n+1}', \ldots, X_m'$ are therefore normal to $N$. Thus we have expressions of the form
$$v_i = \sum_{j=1}^{n} c_{ij}X_j'(p) \quad \text{for } 1 \le i \le n,$$
$$v_i = \sum_{j=n+1}^{m} c_{ij}X_j'(p) \quad \text{for } n + 1 \le i \le m.$$
Each of the matrices occurring in these relations is an orthogonal matrix. If now we define
$$X_i = \sum_{j=1}^{n} c_{ij}X_j', \quad 1 \le i \le n, \qquad X_i = \sum_{j=n+1}^{m} c_{ij}X_j', \quad n + 1 \le i \le m,$$
the vector fields $(X_1, \ldots, X_m)$ will do the required job.
Q.E.D.
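The Gram-Schmidt step invoked in the proof of Lemma 24.1 can be written out explicitly. The following is only a sketch, under the assumption that the process is applied in order to the coordinate vector fields on $U$; the symbols $Y_i$ and $Z_k$ are introduced here for illustration and do not appear in the text.

```latex
% Sketch (assumed notation): Y_i = \partial/\partial x_i on U, with the norm
% \|Y\| = \langle Y, Y \rangle^{1/2} taken in the Riemannian metric of M.
\begin{align*}
  X_1' &= \frac{Y_1}{\|Y_1\|}, \\
  Z_k  &= Y_k - \sum_{j<k} \langle Y_k, X_j' \rangle\, X_j', \qquad
  X_k'  = \frac{Z_k}{\|Z_k\|}, \qquad 2 \le k \le m .
\end{align*}
% For k <= n, X_k' is a combination of Y_1, ..., Y_k with function coefficients,
% hence tangent to N at the points of U \cap N.
% For k > n, X_k' is orthogonal to X_1', ..., X_n', which span N_q at each
% q in U \cap N, hence X_k' is normal to N there.
```

Processing the tangent coordinate fields first is what makes the first $n$ output fields tangent; applying the process in a different order would not, in general, split the frame along $N_p$ and $N_p^\perp$.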
Let us say that a basis of vector fields having the same properties as the bases $X_1, \ldots, X_m$ constructed in Lemma 24.1 is a local moving frame† for the submanifold geometry of $N$. Lemma 24.1 can then be interpreted as asserting the existence of a plentiful supply of local moving frames. Suppose now that we work with any such local moving frame $X_1, \ldots, X_m$ defined in $U$. Since the indices $1 \le i, j, \ldots \le m$ must systematically be split into two parts to account for $N_p$ and $N_p^\perp$, it is convenient to introduce the following further ranges of indices, with the corresponding summation conventions in force:
$$1 \le i, j, \ldots \le m; \qquad 1 \le a, b, \ldots \le n; \qquad n + 1 \le \alpha, \beta, \ldots \le m.$$ (24.3)
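Suppose that $\nabla_{X_i} X_j = \Gamma_{ijk} X_k$ (that is, the $\Gamma_{ijk}$ are the components of the Riemannian affine connection with respect to the basis $(X_i)$). The $\Gamma_{ab\alpha}$ determine the second fundamental form, since
$$\langle \nabla_{X_a} X_b, X_\alpha\rangle(p) = \Gamma_{ab\alpha}(p) = S_{X_\alpha(p)}(X_a(p), X_b(p)),$$ (24.4a)
$$\langle \nabla_{X_a} X_b, X_\alpha\rangle = -\langle X_b, \nabla_{X_a} X_\alpha\rangle = -\Gamma_{a\alpha b} \quad \text{for } p \in N.$$ (24.4b)
Note that
$$0 = X_i(\langle X_j, X_k\rangle) = \Gamma_{ijk} + \Gamma_{ikj},$$ (24.5a)
$$0 = \nabla_{X_i} X_j - \nabla_{X_j} X_i - [X_i, X_j] \quad \text{(torsion zero condition)}.$$ (24.5b)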
† This, translated into our language, is what E. Cartan meant (specialized to this geometric situation) by a "repère mobile."
Hence,
$$0 = \Gamma_{ijk}X_k - \Gamma_{jik}X_k - [X_i, X_j].$$ (24.5c)
Conversely, the $\Gamma_{ijk}$ are uniquely determined by these conditions, since obviously another set of $\Gamma$ satisfying (24.5a) and (24.5c) would determine the Riemannian affine connection, which we know is unique.

LEMMA 24.2
Consider the $X_a$ as vector fields $X_a^*$ in $N \cap U$ by restriction; that is, $X_a^*(p) = X_a(p) \in N_p$ for $p \in U \cap N$. Let $\nabla^*$ be the Riemannian affine connection associated with the induced Riemannian metric on $N$. Then
$$\nabla^*_{X_a^*} X_b^* = \Gamma^*_{abc} X_c^*,$$ (24.6)
where the $\Gamma^*_{abc}$ are the $\Gamma_{abc}$ restricted to $N \cap U$.
Proof. From (24.5a), we have $\Gamma^*_{abc} + \Gamma^*_{acb} = 0$. Also, the fact that $[X_a, X_b]$ is tangent to $N$ forces, using (24.5b), $\Gamma_{ab\alpha} - \Gamma_{ba\alpha} = 0$. (In view of (24.4), this is just the symmetry of the second fundamental form.) Hence, from (24.5b) again,
$$0 = \Gamma^*_{abc}X_c^* - \Gamma^*_{bac}X_c^* - [X_a^*, X_b^*].$$
By the preceding remarks leading to (24.5) (repeated for the metric on $N$), defining $\nabla^*$ by (24.6) gives an affine connection on $U \cap N$, which satisfies the two conditions needed to prove its identity with the Riemannian connection. Q.E.D.
Another way of stating (24.6) is
$$\nabla_{X_a} X_b(p) - \nabla^*_{X_a^*} X_b^*(p) = S_{X_\alpha(p)}(X_a(p), X_b(p))\,X_\alpha(p).$$ (24.7)
To find the relation between the curvature of the metrics on $M$ and $N$ and the second fundamental form, it is necessary to apply covariant derivatives to both sides of (24.7) and use the various identities we have developed. However, there is a much neater way of doing this, developed by E. Cartan, using a dual differential form point of view. It will repay our investment to detour a moment to develop this approach.

LEMMA 24.3
Let $U$ be an open set of a Riemannian manifold $M$ of dimension $n$ ($1 \le i, j, \ldots \le n$) which has an orthonormal basis $(X_i)$ of vector fields; that is, $\langle X_i, X_j\rangle = \delta_{ij}$. Let $\omega_i$ be the dual basis of differential forms; that is, $\omega_i(X_j) = \delta_{ij}$. Let $\Gamma_{ijk}$ be the functions in $U$ such that
$$\nabla_{X_i} X_j = \Gamma_{ijk} X_k,$$
and let $\omega_{ij}$ be the 1-forms defined by
$$\omega_{ij} = \Gamma_{kij}\,\omega_k.$$ (24.8)
Then
$$\text{(a)}\ \ \omega_{ij} + \omega_{ji} = 0, \qquad \text{(b)}\ \ d\omega_i = \omega_{ij} \wedge \omega_j.$$ (24.9)
Conversely, any set of forms $\omega_{ij}$ satisfying (24.9a) and (24.9b) is uniquely determined and given by (24.8). Let the 2-forms $\Omega_{ij}$ be defined by
$$\Omega_{ij} = d\omega_{ij} - \omega_{ik} \wedge \omega_{kj}.$$ (24.10)
Then the curvature tensor is determined as follows:
$$\langle R(X, Y)(X_i), X_j\rangle = \Omega_{ij}(X, Y) \quad \text{for } X, Y \in V(U).$$ (24.11)
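As an orientation before the proof, the structure equations of Lemma 24.3 can be worked out by hand on the unit 2-sphere. This is an assumed example, not taken from the text; the coordinates $(\theta, \varphi)$ and the frame below are chosen only for illustration, and the wedge product is normalized so that $(\alpha \wedge \beta)(X, Y) = \alpha(X)\beta(Y) - \alpha(Y)\beta(X)$.

```latex
% Sketch (assumed example): unit sphere, colatitude \theta in (0, \pi), longitude \varphi.
% Orthonormal frame: X_1 = \partial_\theta,  X_2 = (\sin\theta)^{-1} \partial_\varphi.
\begin{align*}
  \omega_1 &= d\theta,  & \omega_2 &= \sin\theta \, d\varphi, \\
  d\omega_1 &= 0,       & d\omega_2 &= \cos\theta \, d\theta \wedge d\varphi .
\end{align*}
% Conditions (a) \omega_{12} + \omega_{21} = 0 and (b) d\omega_i = \omega_{ij} \wedge \omega_j force
\begin{align*}
  \omega_{12} = -\omega_{21} = \cos\theta \, d\varphi ,
\end{align*}
% check: \omega_{12} \wedge \omega_2 = 0 = d\omega_1,
%        \omega_{21} \wedge \omega_1 = \cos\theta \, d\theta \wedge d\varphi = d\omega_2 .
% Curvature form: the terms \omega_{1k} \wedge \omega_{k2} vanish since \omega_{11} = \omega_{22} = 0, so
\begin{align*}
  \Omega_{12} = d\omega_{12} = -\sin\theta \, d\theta \wedge d\varphi, \qquad
  \Omega_{12}(X_1, X_2) = -1 .
\end{align*}
% Reading this through the curvature identity of the lemma:
%   <R(X_1, X_2)(X_1), X_2> = -1, equivalently <X_1, R(X_1, X_2)(X_2)> = +1,
% which is sectional curvature 1 for the plane spanned by X_1, X_2.
```

The point of the example is that the connection form is found purely by solving two algebraic conditions on differential forms, with no Christoffel symbols computed in coordinates.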
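Proof. Equation (24.9a) is equivalent to (24.5a). We show that (24.9b) is equivalent to (24.5b):
$$(d\omega_i - \omega_{ij} \wedge \omega_j)(X_k, X_l) = X_k(\omega_i(X_l)) - X_l(\omega_i(X_k)) - \omega_i([X_k, X_l]) - \omega_{ij}(X_k)\omega_j(X_l) + \omega_{ij}(X_l)\omega_j(X_k)$$
$$= -\omega_i([X_k, X_l]) - \Gamma_{kij}\delta_{jl} + \Gamma_{lij}\delta_{jk} = -\omega_i([X_k, X_l]) + \Gamma_{kli} - \Gamma_{lki}.$$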
This shows that (24.9b) is equivalent to (24.5b); hence, also that the $\omega_{ij}$ are uniquely determined by (24.9), since they determine the unique Riemannian connection. Note, for example, that for $X, Y \in V(U)$,
$$\nabla_X Y = [X(\omega_k(Y)) + \omega_j(Y)\,\omega_{jk}(X)]\,X_k.$$ (24.12)
In particular,
$$\nabla_{X_i} X_j = \omega_{jk}(X_i)\,X_k \quad \text{and} \quad \omega_k(\nabla_X X_j) = \omega_{jk}(X).$$
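From (24.10),
$$d\omega_{ij}(X, Y) = X(\omega_{ij}(Y)) - Y(\omega_{ij}(X)) - \omega_{ij}([X, Y])$$
$$= X(\langle X_j, \nabla_Y X_i\rangle) - Y(\langle X_j, \nabla_X X_i\rangle) - \langle X_j, \nabla_{[X, Y]} X_i\rangle$$
$$= \langle \nabla_X X_j, \nabla_Y X_i\rangle - \langle \nabla_Y X_j, \nabla_X X_i\rangle + \langle X_j, R(X, Y)(X_i)\rangle$$
$$= \omega_k(\nabla_X X_j)\,\omega_k(\nabla_Y X_i) - \omega_k(\nabla_Y X_j)\,\omega_k(\nabla_X X_i) + \langle X_j, R(X, Y)(X_i)\rangle$$
$$= \omega_{jk}(X)\,\omega_{ik}(Y) - \omega_{kj}(Y)\,\omega_{ki}(X) + \langle X_j, R(X, Y)(X_i)\rangle,$$
whence (24.11), using (24.10).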
This finishes the proof of Lemma 24.3. The forms $\omega_{ij}$ satisfying (24.9) are called the connection forms of the Riemannian affine connection with respect to the orthonormal basis $(X_i)$. The 2-forms $\Omega_{ij}$ are called the curvature forms.

Suppose now that we return to the case where $X_a$, $1 \le a, b, \ldots \le n$, are tangent to $U \cap N$. Then, since $\omega_\alpha(X_a) = 0$ for $n + 1 \le \alpha, \beta, \ldots \le m$, the 1-forms $\omega_\alpha$, when restricted to $U \cap N$, are zero. If $\omega$ is a differential form on $U$, let $\omega^*$ denote that form restricted to $U \cap N$ (that is, pulled back via the map $U \cap N \to U$ defining the submanifold structure of $U \cap N$). Thus the forms $(\omega_a^*)$ are a basis of forms on $U \cap N$ dual to the orthonormal basis $(X_a^*)$ of vector fields on $U \cap N$. But
$$d\omega_a^* = (\omega_{ai} \wedge \omega_i)^* = \omega_{ab}^* \wedge \omega_b^*.$$
Since clearly $\omega_{ab}^* + \omega_{ba}^* = 0$, the forms $\omega_{ab}^*$ are the corresponding connection forms on $U \cap N$. (This is the dual statement to (24.6).) The corresponding curvature forms $\Omega_{ab}^*$ can be easily computed:
$$\Omega_{ab}^* = d\omega_{ab}^* - \omega_{ac}^* \wedge \omega_{cb}^* = \Omega_{ab}\big|_{U \cap N} - \Gamma_{ca\alpha}\,\omega_c^* \wedge \Gamma_{db\alpha}\,\omega_d^*.$$
Now using the relations (24.11) and (24.4) between the curvature forms, Christoffel symbols and the curvature, we have Theorem 24.4.

THEOREM 24.4
Let $N$ be a submanifold of a Riemannian manifold $M$. Let $p \in N$, $u, v, w, w_1 \in N_p$, and let $(u_\alpha)$, $n = \dim N < \alpha \le m = \dim M$, be any orthonormal basis for $N_p^\perp$. Then
$$\langle w, R(u, v)(w_1)\rangle - \langle w, R_N(u, v)(w_1)\rangle = \sum_{\alpha=n+1}^{m} \bigl[S_{u_\alpha}(w_1, u)\,S_{u_\alpha}(w, v) - S_{u_\alpha}(w_1, v)\,S_{u_\alpha}(w, u)\bigr],$$
where $R(\ ,\ )(\ )$ and $R_N(\ ,\ )(\ )$ are respectively the curvature tensors of $M$ and $N$, and where $S_{(\ )}(\ ,\ )$ is the second fundamental form of $N$. In particular, if $u$, $v$ are unit orthonormal vectors of $N_p$, then
$$K_N(u, v) - K(u, v) = \sum_{\alpha=n+1}^{m} \bigl[S_{u_\alpha}(u, u)\,S_{u_\alpha}(v, v) - S_{u_\alpha}(u, v)^2\bigr],$$ (24.14)
where $K(u, v)$ and $K_N(u, v)$ are respectively the sectional curvatures of $M$ and $N$ in the plane spanned by $u$ and $v$.
COROLLARY
Suppose that $N$ is a hypersurface in $M$; that is, $\dim M = \dim N + 1$. Let $u_n$ be a generating vector of $N_p^\perp$. Then, for $u, v \in N_p$,

$K_N(u, v) - K(u, v)$ = the product of the eigenvalues of the quadratic form $S_{u_n}(\ ,\ )$ restricted to the plane spanned by $u$ and $v$. (These eigenvalues are called the principal curvatures of the plane.)
To prove the corollary, recall the following facts from linear algebra:
(i) A vector $u \in N_p$ is an eigenvector with eigenvalue $\lambda$ of $S_{u_n}(\ ,\ )$ if
$$S_{u_n}(u, v) = \lambda\langle u, v\rangle \quad \text{for all } v \in N_p.$$
(ii) The set of all eigenvectors corresponding to a given eigenvalue is a linear subspace of $N_p$.
(iii) Eigenvectors corresponding to different eigenvalues are perpendicular with respect to $\langle\ ,\ \rangle$.
Hence, to compute (for $u, v \in N_p$) the sectional curvature of the plane spanned by $u$ and $v$, we can choose $u$ and $v$ so that they are eigenvectors for eigenvalues $\lambda_1$ and $\lambda_2$ of $S_{u_n}(\ ,\ )$ restricted to the plane, and satisfy
$$\langle u, v\rangle = 0, \qquad \langle u, u\rangle = 1, \qquad \langle v, v\rangle = 1.$$
Then
$$S_{u_n}(u, v) = \lambda_1\langle u, v\rangle = 0, \qquad S_{u_n}(u, u) = \lambda_1\langle u, u\rangle = \lambda_1, \qquad S_{u_n}(v, v) = \lambda_2\langle v, v\rangle = \lambda_2,$$
whence, from (24.14),
$$K_N(u, v) - K(u, v) = \lambda_1\lambda_2.$$
Q.E.D.
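A quick sanity check of the hypersurface corollary, offered here as an assumed example rather than as part of the text: take $N$ to be the sphere of radius $r$ centered at the origin of $M = R^{n+1}$ with the Euclidean metric, so that $K = 0$ on $M$. The symbol $\nu$ for the outward unit normal is notation introduced only for this sketch.

```latex
% Sketch (assumed example): N = sphere of radius r in Euclidean R^{n+1}, K = 0 on M.
% Normal field: Z(x) = x / r, so Z(p) = \nu(p) is the outward unit normal at p in N,
% and \nabla_X Z = X / r for every vector field X (flat connection).
% For X, Y tangent to N with X(p) = v, Y(p) = w, using <Y, Z> = 0 along N:
\begin{align*}
  S_{\nu}(v, w) = \langle \nabla_X Y, Z \rangle(p)
                = -\langle Y, \nabla_X Z \rangle(p)
                = -\frac{1}{r}\,\langle v, w \rangle .
\end{align*}
% Every tangent vector is an eigenvector with eigenvalue -1/r, so for any
% orthonormal pair u, v in N_p the principal curvatures are
% \lambda_1 = \lambda_2 = -1/r, and the corollary gives
\begin{align*}
  K_N(u, v) - K(u, v) = \lambda_1 \lambda_2 = \frac{1}{r^2},
  \qquad\text{hence}\qquad K_N(u, v) = \frac{1}{r^2}.
\end{align*}
```

The sign of the individual eigenvalues depends on the choice of normal (inward or outward), but their product, and therefore the curvature of the sphere, does not.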
Now we turn to the second variation formula in a more general form than that considered in Chapter 22, namely, when we consider homotopies whose end points are not necessarily fixed, but which lie on two submanifolds of $M$. Explicitly, suppose that:
(a) $\delta(s, t)$, $0 \le s, t \le 1$, is a homotopy in $M$, with each curve $t \to \delta(s, t)$ parametrized proportionally to arc length. $N$ and $N'$ are submanifolds of $M$: $\delta(s, 0) \in N$ and $\delta(s, 1) \in N'$ for $0 \le s \le 1$; that is, the end points of the homotopy lie on $N$ and $N'$. Further, $\sigma(t) = \delta(0, t)$; $v(t) = \partial_s\delta(0, t) \in M_{\sigma(t)}$. Hence, $t \to v(t)$ is the vector field on $\sigma$ representing the infinitesimal deformation of $\sigma$.
(b) $R(\ ,\ )(\ )$ is the curvature tensor of $M$; $S_{(\ )}(\ ,\ )$ and $S'_{(\ )}(\ ,\ )$ are the second fundamental forms of $N$ and $N'$, respectively. $L(s)$ = length of the curve $t \to \delta(s, t)$, $0 \le t \le 1$.
From (20.6) we have
$$L(0)\,\frac{d}{ds}L(s)\Big|_{s=0} = \langle v(t), \sigma'(t)\rangle\Big|_{t=0}^{t=1} - \int_0^1 \langle v(t), \nabla\sigma'(t)\rangle\,dt.$$
This is the first variation formula. It vanishes if

$v(0) \in N_{\sigma(0)}$, $v(1) \in N'_{\sigma(1)}$, that is, $v(0)$ and $v(1)$ are tangent to, respectively, $N$ and $N'$. (24.15a)

$\sigma'(0) \in N^\perp_{\sigma(0)}$, $\sigma'(1) \in N'^\perp_{\sigma(1)}$, that is, $\sigma$ is perpendicular to $N$ and $N'$ at, respectively, $t = 0$ and $t = 1$. (24.15b)

$\sigma(t)$ is a geodesic, that is, $\nabla\sigma'(t) = 0$. (24.15c)

Formula (24.15a) is, of course, implied by our assumptions that $\delta(s, 0) \in N$ and $\delta(s, 1) \in N'$. Let us suppose further that the remaining conditions are satisfied. We can now carry out the differentiation in the first term of the right-hand side of (22.2), with the result that
Let us examine the term of the form, say, $\langle \nabla_s\,\partial_s\delta(s, 0), \sigma'(0)\rangle$, which at first sight does not have a familiar form. However, we shall as usual assume that $\delta(s, t)$ is of a special form, namely, that there exists a vector field $X \in V(M)$ that is tangent to $N$ such that
$$\partial_s\delta(s, t) = X(\delta(s, t)).$$
Then
$$\langle \nabla_s\,\partial_s\delta(s, 0), \sigma'(0)\rangle = \langle \nabla_s X(\delta(s, 0)), \sigma'(0)\rangle = S_{\sigma'(0)}(v(0), v(0)) \quad \text{for } s = 0.$$
Since this formula is independent of the $X$ chosen, we can be reasonably confident that it holds in general. (Explicit verification is left to the reader!) Finally,
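then, we can write the second variation formula in several forms:
$$= S_{\sigma'(t)}(v(t), v(t))\Big|_{t=0}^{t=1} + \int_0^1 \bigl[\langle \nabla v(t), \nabla v(t)\rangle - \langle v(t), R(v(t), \sigma'(t))(\sigma'(t))\rangle\bigr]\,dt$$
$$= S_{\sigma'(t)}(v(t), v(t))\Big|_{t=0}^{t=1} + \int_0^1 \bigl[\|\nabla v(t)\|^2 - K(v(t), \sigma'(t))\,\|v(t)\|^2 L(0)^2 \sin^2\theta(t)\bigr]\,dt.$$ (24.16)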
The second form is obtained from the first by integrating by parts the first term in the integrand. The third is obtained from the second by applying the Gram-Schmidt process to the vectors $v(t)$, $\sigma'(t)$ ($\theta(t)$ is the angle between $v(t)$ and $\sigma'(t)$; $L(0)$, the length of the curve $t \to \delta(0, t) = \sigma(t)$, is equal to $\|\sigma'(t)\|$, since $\sigma$ is parametrized proportionally to arc length). There is an obvious interest in the first two terms on the right-hand side of (24.16), particularly in knowing geometric conditions that they be zero. The following theorem will give us such conditions.
THEOREM 24.5
Let $M$ be a complete Riemannian manifold, let $N$ be a submanifold of $M$, let $\delta(s, t)$, $0 \le s, t \le 1$, be a homotopy in $M$ such that $\delta(s, 0) \in N$. Let $\sigma(t) = \delta(0, t)$, and $v(t) = \partial_s\delta(0, t)$. Conclusion: If, for each $s$, the curve $t \to \delta(s, t)$ is perpendicular to $N$ at $t = 0$, then
$$\langle \nabla v(0), u\rangle = -S_{\sigma'(0)}(v(0), u) \quad \text{for all } u \in N_{\sigma(0)}.$$ (24.17)
Conversely, if $\sigma(t)$, $0 \le t \le 1$, is a curve in $M$, with $\sigma(0) \in N$, $\sigma'(0) \in N^\perp_{\sigma(0)}$, and if $u_0 \in N_{\sigma(0)}$ and $u_1 \in M_{\sigma(0)}$ satisfy
$$\langle u_1, u\rangle = -S_{\sigma'(0)}(u, u_0) \quad \text{for all } u \in N_{\sigma(0)},$$
then there is at least one homotopy $\delta(s, t)$, $0 \le s, t \le 1$, such that
$$\delta(s, 0) \in N, \qquad \partial_t\delta(s, 0) \in N^\perp_{\delta(s, 0)}.$$ (24.18)
For each $s$, $t \to \delta(s, t)$ is a geodesic; $\delta(0, t) = \sigma(t)$. If $v(t) = \partial_s\delta(0, t)$, then $t \to v(t)$ is a Jacobi vector field along $\sigma$ satisfying $v(0) = u_0$, $\nabla v(0) = u_1$. In particular, $v$ satisfies the initial conditions (24.17).
Proof. It suffices to prove this theorem locally, that is, to suppose that $N$ is contained in an open set $U$ that has defined on it a basis of vector fields $X_1, \ldots, X_m \in V(U)$ such that
$$\langle X_i, X_j\rangle = \delta_{ij} \quad \text{for } 1 \le i, j \le m = \dim M \text{ (summation convention in force)},$$
so $X_1, \ldots, X_n$ are tangent to $N$. Then let $A_i(s, t)$, $B_i(s, t)$ be the functions such that
$$\partial_t\delta(s, t) = A_i(s, t)\,X_i(\delta(s, t)), \qquad \partial_s\delta(s, t) = B_i(s, t)\,X_i(\delta(s, t)).$$
Our assumptions about $\delta$ are equivalent to the conditions
$$A_i(s, 0) = 0 \quad \text{for } 1 \le i \le n; \qquad B_i(s, 0) = 0 \quad \text{for } n + 1 \le i \le m.$$
Now
$$\nabla v(0) = \nabla_t\,\partial_s\delta(0, 0) = \nabla_s\,\partial_t\delta(0, 0) = \nabla_s\Bigl(\sum_{n+1 \le i \le m} A_i X_i\Bigr).$$
Hence
$$\langle \nabla v(0), X_j(\sigma(0))\rangle = \sum A_i(0, 0)\,B_k(0, 0)\,\langle \nabla_{X_k} X_i, X_j\rangle(\sigma(0)) \quad \text{for } 1 \le j \le n,$$
$$= -\langle \sigma'(0), \nabla_{v(0)} X_j\rangle = -S_{\sigma'(0)}(v(0), X_j(\sigma(0))),$$
which proves (24.17).

Now we deal with the converse. Let $\gamma(s)$, $0 \le s \le 1$, be any curve in $N$ with $\gamma'(0) = u_0$. We show that there exists a vector field $s \to w(s) \in M_{\gamma(s)}$ along $\gamma$ such that
$$w(s) \in N^\perp_{\gamma(s)} \quad \text{for } 0 \le s \le 1, \qquad \nabla w(0) = u_1, \qquad w(0) = \sigma'(0).$$
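To do this (again it suffices to work locally), we can choose again the orthonormal bases $X_i$ for vector fields such that $X_i$, $1 \le i \le n$, is tangent to $N$. Suppose that
$$\gamma'(s) = \sum_{1 \le i \le n} B_i(s)\,X_i(\gamma(s)).$$
Let us look for $w(s)$ of the form
$$w(s) = \sum_{n+1 \le j \le m} a_j(s)\,X_j(\gamma(s)).$$
Then
$$\nabla w(s) = \frac{da_j}{ds}\,X_j + a_j B_i\,\nabla_{X_i} X_j.$$
This suggests that we choose the $a_j(s)$, hence $w(s)$, so that
$$a_j(0) = \langle X_j, \sigma'(0)\rangle \quad \text{for } n + 1 \le j \le m,$$
$$\frac{da_k}{ds}(0) = \langle u_1, X_k\rangle - \sum_{\substack{n+1 \le j \le m \\ 1 \le i \le n}} B_i(0)\,a_j(0)\,\langle \nabla_{X_i} X_j, X_k\rangle(\sigma(0)) \quad \text{for } n + 1 \le k \le m.$$
To show that $w(s)$ so defined satisfies the required conditions, it remains only to check that
$$\langle \nabla w(0), X_k\rangle = \langle u_1, X_k\rangle \quad \text{for } 1 \le k \le n.$$
But, for $1 \le k \le n$,
$$\langle \nabla w(0), X_k\rangle = \sum B_i(0)\,a_j(0)\,\langle \nabla_{X_i} X_j, X_k\rangle(\sigma(0)) = -\sum B_i(0)\,a_j(0)\,\langle \nabla_{X_i} X_k, X_j\rangle(\sigma(0))$$
$$= -S_{\sigma'(0)}(u_0, X_k(\sigma(0))) = \langle u_1, X_k\rangle,$$
as required.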
Now that we have verified the existence of a vector field $s \to w(s)$ along the curve $s \to \gamma(s)$, we can proceed to the proof of the converse. Choose the homotopy $\delta(s, t)$ so that
$$\delta(s, t) = \exp(t\,w(s)) \quad \text{for } 0 \le s, t \le 1.$$
It should be clear that $\delta$ satisfies all conditions of (24.18) except possibly the initial conditions, which we now verify:
$$\partial_t\delta(s, 0) = w(s) \quad \text{and} \quad \partial_s\delta(s, 0) = \gamma'(s),$$
by the definition of the exponential map. Thus
$$v(0) = \partial_s\delta(0, 0) = \gamma'(0) = u_0,$$
$$\nabla v(0) = \nabla_t\,\partial_s\delta(0, 0) = \nabla_s\,\partial_t\delta(0, 0) = \nabla w(0) = u_1,$$
which completes the proof.
This theorem suggests several definitions. First, if $\sigma(t)$, $0 \le t \le 1$, is a geodesic that is perpendicular to $N$ at $t = 0$, we say that a Jacobi field $t \to v(t)$ along $\sigma$ is transversal to $N$ if it satisfies
$$v(0) \in N_{\sigma(0)},$$ (24.19a)
$$\langle \nabla v(0), u\rangle = -S_{\sigma'(0)}(v(0), u) \quad \text{for all } u \in N_{\sigma(0)}.$$ (24.19b)
The second part of the theorem asserts that such a vector field arises as the infinitesimal deformation of at least one geodesic deformation of $\sigma$ such that each geodesic of the deformation is initially perpendicular to $N$. Let us say that a point $\sigma(t_0)$ of $\sigma$, $0 < t_0 \le 1$, is a focal point of $N$ with respect to $\sigma$ if a nonzero transversal Jacobi field exists which is zero at $t_0$. The dimension of the space of all such Jacobi fields (notice that $v(t_0) = 0$ and (24.19) are linear homogeneous conditions) is called the index of the focal point. Let $N^\perp = \bigcup_{p \in N} N_p^\perp$ be the normal bundle to $N$, and let $\exp: N^\perp \to M$ be the map such that, for $v \in N_p^\perp$, $t \to \exp(tv)$ is the geodesic starting at $p$ which is tangent there to $v$. We then have:
COROLLARY TO THEOREM 24.5
If $u \in N^\perp$, $p = \exp(u)$, then $p$ is a focal point of $N$ with respect to the geodesic $t \to \exp(tu)$ if and only if $\exp_*: (N^\perp)_u \to M_p$ is not 1-1; that is, if and only if $\exp$ has a zero Jacobian at $u$.

Proof. Suppose first that $\exp_*$ is not 1-1. Let $\gamma(s)$, $0 \le s \le 1$, be a curve in $N$, $w(s) \in N^\perp_{\gamma(s)}$ be a vector field on $\gamma$ such that $s \to \exp(w(s))$ has a zero tangent vector at $s = 0$, $w(0) = u$. If $\delta(s, t) = \exp(t\,w(s))$, $v(t) = \partial_s\delta(0, t)$, then $v(t)$ is a Jacobi vector field along $t \to \exp(tu)$ that is transversal to $N$ and vanishes at $t = 1$; that is, $p = \exp(u)$ is a focal point with respect to the geodesic $t \to \exp(tu)$, according to the above definition. Conversely, if $t \to v(t)$ is a Jacobi vector field along the geodesic $t \to \exp(tu)$ that vanishes at $t = 1$ and that is transversal to $N$, by Theorem 24.5 we can construct a geodesic deformation $\delta(s, t)$ satisfying (24.18), such that $\partial_s\delta(0, 0) = v(0)$, $\nabla_t\,\partial_s\delta(0, 0) = \nabla v(0)$. Then $t \to \partial_s\delta(0, t)$ would be a Jacobi field along $t \to \exp(tu)$ satisfying the same initial conditions at $t = 0$ as $v(t)$; hence, must coincide with $v(t)$. But then
$$\delta(s, 1) = \exp(\partial_t\delta(s, 0));$$
hence, $s \to \delta(s, 1)$ is a curve starting at $\exp(u)$, which is the image under $\exp$ of a curve in $N^\perp$, and which has a zero tangent vector at $s = 0$. Hence, $\exp_*$ is not 1-1 at $u$, as required to finish the proof of the corollary.

The corollary provides us with important qualitative information about the distribution of focal points. For example, if $u \in N^\perp$, and if $\exp(u)$ is not a focal point (with respect to $t \to \exp(tu)$), then $\exp(u')$ is not a focal point with respect to $t \to \exp(tu')$ for all $u' \in N^\perp$ that are sufficiently close to $u$. Further, by Sard's theorem, the set of points $p \in M$ that are focal points with respect to some geodesic joining $p$ to $N$ and perpendicular to $N$ is of measure zero. Note further that in case $N$ is a point, say $N = \{p_0\}$, $N^\perp = M_{p_0}$, and the focal points are just conjugate points in the sense defined in Chapter 22. Many of the results proved in Chapters 22 and 23 concerning conjugate points can be generalized to apply to focal points. The Jacobi theorem (Theorem 22.4) is the prime example, and takes the following form:

THEOREM 24.6
Let $N$ be a submanifold of a complete Riemannian manifold $M$, let $u \in N^\perp$ be such that $\exp(u)$ is a focal point of $N$ with respect to the geodesic $t \to \exp(tu)$. Then, for $a > 1$, $t \to \exp(tu)$ is not the geodesic of minimal length joining $\exp(au)$ to $N$.

The proof is similar to the proof of Theorem 22.4, and is left as an exercise.

Now we turn to various elementary geometric applications of these concepts. The first one we have in mind is concerned with the following situation: $N$ is a submanifold of a Riemannian manifold $M$, $p_0$ is a point of $M$, $q_0$ is a point of $N$. We ask: What are sufficient conditions that guarantee that the real-valued function $q \to d(p_0, q)$, for $q \in N$, cannot have a relative maximum at $q = q_0$? It will turn out that this is a question that unifies many isolated geometric questions concerning Riemannian spaces, and whose answer falls out in a natural way from the second variation formula. The basic theorem is:
THEOREM 24.7
Let $N$ be a submanifold of a complete Riemannian manifold $M$. Let $p_0$ be a point of $M$, $u \in M_{p_0}$ such that $\exp(u) = p \in N$, and such that the geodesic $t \to \exp(tu) = \sigma(t)$ (this defines $\sigma$) is perpendicular to $N$ at $t = 1$. Suppose further that $\sigma$ satisfies the following condition: For every nonzero Jacobi vector field $t \to v(t)$ along $\sigma$ such that $v(0) = 0$,
$$\langle \nabla v(t), v(t)\rangle > 0 \quad \text{for } 0 < t \le 1.$$ (24.20)
Suppose in addition that $N$ satisfies any one of the following conditions:

$N$ is a minimal submanifold of $M$ (24.21)

or

$\dim M \le 2(\dim N) - 1$, and $K_N(u_1, u_2) \le K(u_1, u_2)$ for all $u_1, u_2 \in T(N)$. (24.22)
Part 3. Global Riemannian Geometry
Then there is at least one geodesic deformation $\delta(s, t)$, $0 \le s, t \le 1$, with $\delta(0, t) = \sigma(t)$, $\delta(s, 0) = p_0$, $\delta(s, 1) \in N$, and with the length of $t \to \delta(s, t)$, $0 \le t \le 1$, actually greater than the length of $\sigma$, if $s$ is sufficiently small, but not zero.

Intuitively, $p$ cannot be a relative maximum of the function $q \to d(p_0, q)$ on $N$, although this is not strictly true (unless $N$ lies within a geodesic ball about $p_0$ and $\sigma$ is a minimizing geodesic), so we have stated the theorem in this complicated and more precise form.

A word of definition is needed for the terms used in the statement of the theorem. First, a submanifold $N$ of a Riemannian manifold $M$, in general, is said to be a minimal submanifold of $M$ if for all $p \in N$, all $w \in N_p^\perp$,
\[ \lambda_1(w) + \cdots + \lambda_n(w) = 0, \tag{24.23} \]
where $\lambda_1(w), \ldots, \lambda_n(w)$ are the eigenvalues (counted according to multiplicity) of the symmetric bilinear form $S_w(\ ,\ )$ on $N_p$ ($n = \dim N$). The geometric interpretation in terms of $N$ minimizing "surface area" will be explained in a second volume. (In case $M = R^3$, with the Euclidean metric, $\dim N = 2$, this definition gives the classical one, that is, soap bubbles.)

The proof of Theorem 24.7 consists of a series of lemmas.
LEMMA 24.8 $\sigma$ contains no conjugate points of $\sigma(0)$ with respect to $\sigma$.

The proof is almost obvious:
\[ \frac{d}{dt}\langle v(t), v(t)\rangle = 2\langle \nabla v(t), v(t)\rangle > 0; \]
hence $v(t)$ cannot vanish for $0 < t \le 1$ because $v(0) = 0$; hence, no conjugate points.
LEMMA 24.9 If $v_1 \in N_{\sigma(1)}$ is such that $S_{\sigma'(1)}(v_1, v_1) = 0$, then there is a geodesic homotopy $\delta(s, t)$ such that
\[ \delta(s, 0) = p_0, \quad \delta(s, 1) \in N, \quad \delta(0, t) = \sigma(t), \quad \partial_s\delta(0, 1) = v_1, \]
and $t \to \delta(s, t)$ is of greater length than $t \to \sigma(t)$ if $s$ is $> 0$, but sufficiently small.

Proof. The fact that $\sigma(1)$ is not a conjugate point of $\sigma(0) = p_0$ with respect to $\sigma$ implies, we know, that exp is 1-1 in a neighborhood of $u$ in $M_{p_0}$, where $u \in M_{p_0}$ is such that $\sigma(t) = \exp(tu)$.
Thus there exists a curve $\gamma(s)$ in $M_{p_0}$, starting at $u$, such that $s \to \exp(\gamma(s))$ is any curve in $N$, in particular, chosen to be tangent to $v_1$ at $s = 0$. Now define
\[ \delta(s, t) = \exp(t\gamma(s)), \qquad L(s) = \text{length of } t \to \delta(s, t). \]
If $v(t) = \partial_s\delta(0, t)$, we see from the second variation formula (24.16) that

since $v(t)$ is a Jacobi vector field, and $v(1) = v_1$. By (24.20), this is greater than zero; whence, the lemma.

Now, in case $N$ is a minimal submanifold of $M$ according to the definition (24.23), $S_{\sigma'(1)}(\ ,\ )$ must have at least one nonpositive and one nonnegative eigenvalue; hence there must be at least one nonzero $v_1 \in N_{\sigma(1)}$ with $S_{\sigma'(1)}(v_1, v_1) = 0$. This suffices to prove the theorem in case condition (24.21) is satisfied. Condition (24.22) is more difficult to handle. The tool is the following lemma, conjectured by Chern and Kuiper [1], but proved by Otsuki [1].
LEMMA 24.10 Let $W$ be a vector space over the real numbers of dimension $d$. Let $Q_1(\ ,\ ), \ldots, Q_{d-1}(\ ,\ )$ be symmetric, bilinear forms over $W$ such that
\[ \sum_{i=1}^{d-1} \bigl[ Q_i(w_1, w_1)\,Q_i(w_2, w_2) - Q_i(w_1, w_2)^2 \bigr] \le 0 \tag{24.24} \]
for all choices of vectors $w_1, w_2 \in W$. Then there is at least one nonzero vector $w \in W$ such that
\[ Q_1(w, w) = 0 = \cdots = Q_{d-1}(w, w). \]
For the proof, we must refer to Otsuki's paper [1]. To apply this to the theorem, we choose $W = N_p$, $Q_1 = S_{\sigma'(1)/\|\sigma'(1)\|}(\ ,\ )$, and $Q_2, \ldots, Q_{d-1}$ are the second fundamental forms $S_{u_2}(\ ,\ ), \ldots, S_{u_{d-1}}(\ ,\ )$, where $(\sigma'(1)/\|\sigma'(1)\|, u_2, \ldots, u_{d-1})$ is an orthonormal basis of $N_p^\perp$. That (24.24) is satisfied is a consequence of the assumptions made in (24.22), namely, that $K_N(\ ,\ ) \le K(\ ,\ )$, and the fundamental formula (24.14) relating $K_N(\ ,\ ) - K(\ ,\ )$ and the second fundamental forms. Now, to have Lemma 24.10 apply to give the vector $v_1 \in N_p$ needed to satisfy $S_{\sigma'(1)}(v_1, v_1) = 0$, we must have
\[ \dim M - \dim N = d - 1 \quad \text{and} \quad d \le \dim N, \]
whence
\[ \dim M \le 2(\dim N) - 1, \]
which is precisely the condition postulated in (24.22). Q.E.D.
For $p_0 \in M$, $r > 0$, recall that
\[ B(p_0, r) = \{p \in M : d(p_0, p) < r\}, \]
that is, $B(p_0, r)$ is the ball of radius $r$ about $p_0$.
COROLLARY TO THEOREM 24.7 Suppose $r > 0$ is such that, for all $u \in M_{p_0}$ with $\|u\| < r$, the geodesic $t \to \exp(tu)$ satisfies (24.20) for $0 < t \le 1$, and such that $d(\exp(u), p_0) = \|u\|$. Then $B(p_0, r/2)$ is geodesically convex in the sense that, for $p, q \in B(p_0, r/2)$, any geodesic of shortest length joining $p$ to $q$ must be completely in $B(p_0, r/2)$. In particular, there is a geodesically convex ball about each point of $M$.
Proof. By the triangle inequality for the distance function $d(\ ,\ )$ we have $d(p, q) \le d(p, p_0) + d(p_0, q) < r$. Let $\gamma(s)$, $0 \le s \le 1$, be a geodesic of length $d(p, q)$ joining $p$ to $q$. Then $d(p_0, \gamma(s)) < r$ for $0 \le s \le 1$; hence $\gamma(s) \in B(p_0, r)$ for $0 \le s \le 1$. But, by Theorem 24.7, the function $s \to d(p_0, \gamma(s))$ cannot have a maximum for $0 < s < 1$, since $s \to \gamma(s)$ is a geodesic of $M$; hence $d(p_0, \gamma(s)) \le \max(d(p_0, p), d(p_0, q)) < r/2$. That is, $\gamma(s) \in B(p_0, r/2)$ for $0 \le s \le 1$.

To show that each point $p_0$ has a convex ball, it remains only to show that such a positive real number $r$ exists. Suppose, then, that $\sigma(t)$, $0 \le t \le 1$, is a geodesic of $M$, with $\sigma(0) = p_0$, and that $t \to v(t)$ is a Jacobi vector field along $\sigma$ that vanishes at $t = 0$. Then

which is obviously greater than zero, provided $\|\sigma'(0)\|$ is sufficiently small, and $\|\nabla v(0)\|$ is bounded, say, $\|\nabla v(0)\| \le 1$ and $\|\sigma'(0)\| \le r$. But if this holds for all such $v(t)$ with $\|\nabla v(0)\| \le 1$, clearly $\langle \nabla v(t), v(t)\rangle > 0$. Now this $r$ might vary as $\|\sigma'(0)\|$ varies. But again the infimum of such $r$ is positive as $\sigma'(0)$ varies in direction about $p_0$. Q.E.D.
Note an additional fact that follows from this argument: $(d/dt)\langle \nabla v(t), v(t)\rangle$ is always $> 0$ if
\[ \langle v(t), R(v(t), \sigma'(t))(\sigma'(t))\rangle \le 0. \]
This condition is automatically implied by the condition: The sectional curvature of $M$ is nonpositive. This condition, together with the condition that $M$ be simply connected, implies (Theorem 23.1) that $\exp: M_{p_0} \to M$ is a diffeomorphism. With these conditions we conclude that:

The geodesic balls $B(p_0, r)$, for any $r > 0$, are geodesically convex, if the curvature is nonpositive and if $M$ is simply connected. (24.25)
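The positivity claims in the proof of the corollary and in the remark leading to (24.25) rest on a single differentiation. The following is a minimal sketch of it, assuming that the Jacobi equation of Chapter 22 reads $\nabla\nabla v = -R(v, \sigma')(\sigma')$, in the sign convention for which nonpositive sectional curvature gives $\langle v, R(v, \sigma')(\sigma')\rangle \le 0$. That convention is an assumption here, inferred from the remark above.

```latex
\begin{align*}
\frac{d}{dt}\langle \nabla v(t), v(t)\rangle
  &= \langle \nabla\nabla v(t), v(t)\rangle + \langle \nabla v(t), \nabla v(t)\rangle \\
  &= \|\nabla v(t)\|^{2} - \langle v(t), R(v(t),\sigma'(t))(\sigma'(t))\rangle .
\end{align*}
```

Under that assumption both terms on the right are nonnegative whenever $\langle v, R(v, \sigma')(\sigma')\rangle \le 0$. Since $\langle \nabla v(0), v(0)\rangle = 0$ and $\nabla v(0) \ne 0$ for a Jacobi field that vanishes at $t = 0$ without vanishing identically, integrating from $0$ to $t$ gives $\langle \nabla v(t), v(t)\rangle > 0$ for $t > 0$, which is condition (24.20).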
Another result of this type is:

If $u \in M_{p_0}$, $\|u\| = r$, if (24.20) is satisfied along the geodesic $t \to \exp(tu) = \sigma(t)$, then
\[ \exp\{w \in M_{p_0} : \|w\| = r\} \]
is a submanifold about $\exp(u)$, and its second fundamental form $S_{-\sigma'(1)}(\ ,\ )$ is positive definite there. (24.26)
Proof. That it is a submanifold follows from the implicit function theorem, since $\exp(u)$ is not a conjugate point of $p_0$ (Lemma 24.8). If $v_1 \in M_{\exp(u)}$ is tangent to the submanifold, there is a Jacobi field $t \to v(t)$ with $v(0) = 0$, $v(1) = v_1$, and a geodesic deformation $t \to \delta(s, t)$ with $v(t) = \partial_s\delta(0, t)$, $\|\partial_t\delta(s, 0)\| = r$, $\delta(s, 0) = p_0$. From the second variation formula, we have
\[ S_{\sigma'(1)}(v_1, v_1) + \langle \nabla v(1), v(1)\rangle = 0; \]
whence (24.26).

Reversing the arguments, the converse of (24.26) is also true, and gives a geometric interpretation of (24.20), namely:

If $\sigma(t)$, $0 \le t \le 1$, is a geodesic of $M$ with $\sigma(0) = p_0$, containing no conjugate points of $p_0$, if, for $0 < r \le \|\sigma'(0)\|$, the second fundamental form of $\exp\{w \in M_{p_0} : \|w\| = r\}$ in the direction $-\sigma'(r/\|\sigma'(0)\|)$ is positive definite, then $\sigma$ satisfies condition (24.20). (24.27)
Calculation of the Second Fundamental Form of Hypersurfaces

Let $f$ be a real-valued function on a Riemannian manifold $M$. We want to see how the second fundamental form, hence also the curvature, of the hypersurface $f$ = constant can be computed in terms of $f$. Construct the gradient (vector) field of $f$, an element of $V(M)$, denoted by $\operatorname{grad} f$ and defined by
\[ \langle \operatorname{grad} f, X\rangle = X(f) \quad \text{for all } X \in V(M). \]
Thus if $p \in M$ is not a critical point for $f$, that is, if $df \ne 0$ at $p$, then
\[ f - f(p) = 0 \]
defines a hypersurface† in a neighborhood of $p$, and $\operatorname{grad} f(p)$ is perpendicular to this hypersurface. Hence $(\operatorname{grad} f/\|\operatorname{grad} f\|)(p)$ is the unit normal to the hypersurface, and the second fundamental form is

for $X, Y$ tangent to the hypersurface, that is, satisfying
\[ X(f) = 0 = Y(f). \]
Since
\[ \langle X, \operatorname{grad} f\rangle = 0 = \langle Y, \operatorname{grad} f\rangle, \]
this can be rewritten as
Let us compute this in terms of an orthonormal moving frame. Let $U$ be an open set of $M$ containing $p$, with a basis $(\omega_i)$ of 1-differential forms that is dual to an orthonormal basis $X_i$ of vector fields in $U$. ($1 \le i, j, \ldots \le m = \dim M$; summation convention in force.) Suppose that
\[ df = f_i\omega_i, \qquad df_i = f_{ij}\omega_j. \]
Then we see that
\[ \operatorname{grad} f = f_iX_i, \qquad \|\operatorname{grad} f\|^2 = f_if_i. \]
Let $(\omega_{ij})$ be the connection forms corresponding to the given orthonormal basis. Then, for $X \in V(M)$, by (24.12),
\[ \nabla_X \operatorname{grad} f = [X(f_i) + f_j\omega_{ji}(X)]X_i. \]
Hence
\[ \langle \nabla_X \operatorname{grad} f, Y\rangle = (X(f_i) + f_j\omega_{ji}(X))\,\omega_i(Y). \]
† A hypersurface of a manifold is a submanifold of one lower dimension (that is, of codimension 1).
Thus the eigenvalues of the (normalized) second fundamental form are precisely those of the quadratic form (in $\lambda_i$):

restricted by the condition $f_i\lambda_i = 0$.

Example

$M$ = Euclidean space, with the flat Euclidean metric, $f = \tfrac{1}{2}a_{ij}x_ix_j$, where $(x_1, \ldots, x_m)$ are the Euclidean coordinate system, $(a_{ij})$ is a symmetric constant matrix. Put $\omega_i = dx_i$. Then $df = a_{ij}x_j\,dx_i$; hence
\[ f_i = a_{ij}x_j, \qquad f_{ij} = a_{ij}. \]
Since $\omega_{ij} = 0$, the above quadratic form reduces to

These formulas plainly indicate how the second fundamental form is to be computed in principle in terms of the algebraic properties of the matrix $(a_{ij})$. Let us carry this out explicitly for the simplest case, namely: Suppose $a_{ij} = \delta_{ij}$, $f = \tfrac{1}{2}x_ix_i$; hence $f = r^2$ determines a sphere of radius $\sqrt{2}\,r$. Suppose, then, that $f(x) = r^2$. Then
\[ f_i = x_i. \]
Hence,
\[ f_if_i = x_ix_i = 2f = 2r^2, \qquad \|\operatorname{grad} f\| = \sqrt{2}\,r. \]
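The quantities in this example can be checked numerically. The sketch below assumes that, for $f = \tfrac{1}{2}a_{ij}x_ix_j$ in flat Euclidean space, the normalized second fundamental form of a level set is, up to the sign convention for $S$, the quadratic form $a_{ij}\lambda_i\lambda_j/\|\operatorname{grad} f\|$ restricted to $f_i\lambda_i = 0$. The function name `level_set_curvatures` and the sample matrices are illustrative choices, not part of the text.

```python
import numpy as np

def level_set_curvatures(A, x):
    """Eigenvalues of lam -> (lam^T A lam) / ||grad f|| on the hyperplane grad f . lam = 0,
    where f(x) = 0.5 * x^T A x in flat Euclidean space, so grad f = A x and Hess f = A."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    grad = A @ x
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        raise ValueError("x is a critical point of f")
    n = grad / norm
    # Full QR of the column vector n: the first column of Q is +-n, and the
    # remaining columns form an orthonormal basis of the tangent hyperplane.
    Q, _ = np.linalg.qr(n.reshape(-1, 1), mode="complete")
    T = Q[:, 1:]
    restricted = T.T @ A @ T / norm
    return np.linalg.eigvalsh(restricted)

# Sphere: a_ij = delta_ij and f = r^2, that is, |x| = sqrt(2) * r.
r = 3.0
rho = np.sqrt(2.0) * r
sphere = level_set_curvatures(np.eye(3), [rho, 0.0, 0.0])

# Ellipsoid x^2 + 4 y^2 + 9 z^2 = 1, evaluated at the end of its longest axis.
ellipsoid = level_set_curvatures(np.diag([1.0, 4.0, 9.0]), [1.0, 0.0, 0.0])
print(sphere, ellipsoid)
```

For the sphere every eigenvalue comes out as the reciprocal of the radius $\sqrt{2}\,r$, and the product of two of them is the reciprocal of the radius squared, in line with the conclusion drawn next. The ellipsoid case illustrates the remark that in general the answer is governed by the algebraic properties of $(a_{ij})$.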
The quadratic form is then $\lambda_i\lambda_i/(\sqrt{2}\,r)$, whence: The (normalized) second fundamental form of a sphere of radius $r$ has all eigenvalues equal to $1/r$. Thus, it has constant sectional curvature equal to $1/r^2$.

Totally Geodesic Submanifolds
Definition

A submanifold $N$ of a Riemannian manifold $M$ is said to be geodesic at a point $p \in N$ if each sufficiently small geodesic of $M$ beginning at $p$ and tangent there to $N$ lies in $N$ completely. $N$ is said to be totally geodesic if it is geodesic at each of its points.
Of course this is the geometric definition of a totally geodesic submanifold, designed to generalize the concept of "plane" in Euclidean geometry. We now want to show that this definition is equivalent to several others and that this equivalence is reasonably nontrivial and useful.
THEOREM 24.11 A submanifold $N$ of a Riemannian manifold $M$ is totally geodesic if and only if its second fundamental form is identically zero.
Proof. Let $p \in N$, and suppose that $N$ is geodesic at $p$. Let $\sigma(t)$, $0 \le t \le 1$, be a curve in $N$, beginning at $p$, which is also a geodesic of $M$. Set $v = \sigma'(0) \in N_p$. Pick a $u \in N_p^\perp$. Then, almost by definition,
\[ S_u(v, v) = \langle u, \nabla v(0)\rangle = \langle u, \nabla\sigma'(0)\rangle = 0. \]
Hence, if $N$ is geodesic at $p$, $S_u(\ ,\ )$ is identically zero for all $u \in N_p^\perp$. This proves one part of Theorem 24.11. Turn to the converse; suppose that $S(\ ,\ )$ is identically zero. By (24.7), we see that
\[ \nabla_X Y = \nabla^*_X Y, \]
for any pair $X, Y$ of vector fields of $M$ that are tangent to $N$. ($\nabla^*$ denotes covariant differentiation in the induced metric on $N$.) In particular, we have proved:

LEMMA 24.12 If the second fundamental form of $N$ is identically zero, then every curve on $N$ that is a geodesic in the induced metric on $N$ is a geodesic of $M$ also.

But this property of $N$ clearly implies that it is geodesic at each point, by the uniqueness of geodesics. Q.E.D.

Another useful geometric characterization of total geodesity is Theorem 24.13.

THEOREM 24.13 A submanifold $N$ of a Riemannian manifold $M$ is totally geodesic in $M$ if and only if the following condition is satisfied:
Each sufficiently small geodesic of M whose end points lie on N must lie completely in N .
(24.29)
Proof. Suppose $N$ is totally geodesic. Let $\sigma(t)$, $0 \le t \le 1$, be a geodesic of $M$ with $\sigma(0)$ and $\sigma(1) \in N$. If length $\sigma$ is sufficiently small, $\sigma(1)$ lies in a geodesic ball about $\sigma(0)$, and $\sigma(1)$ can be joined to $\sigma(0)$ by a geodesic in the induced metric on $N$: By Lemma 24.12 and the uniqueness of geodesics, these must coincide.

Conversely, suppose that (24.29) is satisfied. Let $\sigma(t)$, $0 \le t \le 1$, be a geodesic of $N$ in the induced metric. If length $\sigma$ is sufficiently small, $\sigma(1)$ can be joined to $\sigma(0)$ by a geodesic of $M$. By (24.29), this geodesic must also lie in $N$; hence, by the length-minimizing property† of geodesics, it must also be a geodesic of $N$, which by uniqueness of geodesics on $N$ must equal $\sigma$. Thus every geodesic of $N$ is a geodesic of $M$; hence $N$ is totally geodesic in $M$. Q.E.D.
This completes our study of the more or less superficial properties of totally geodesic submanifolds. We now go a little deeper and investigate the relation between the properties of totally geodesic submanifolds of $M$ and its curvature tensor.

THEOREM 24.14 Let $N$ be a totally geodesic submanifold of a Riemannian manifold $M$. Then, for $p \in N$,
\[ R(N_p, N_p)(N_p) \subset N_p, \qquad R(N_p; N_p, N_p)(N_p) \subset N_p, \qquad \text{etc.}, \]
where $R(\ ;\ ,\ )$, $R(\ ;\ ;\ ,\ )(\ )$, etc., denote the successive covariant derivatives of the curvature tensor.

Proof. First we must define the covariant derivative of the curvature tensor. It is to be an $F(M)$-multilinear mapping $V(M) \times V(M) \times V(M) \times V(M) \to V(M)$, denoted by
\[ X, Y, Z, W \to R(X; Y, Z)(W), \]
and defined by
\[ R(X; Y, Z)(W) = \nabla_X(R(Y, Z)(W)) - R(\nabla_X Y, Z)(W) - R(Y, \nabla_X Z)(W) - R(Y, Z)(\nabla_X W). \tag{24.30} \]
It is easily verified that this formula does actually define an $F(M)$-multilinear mapping; hence it forms what is classically known as a "tensor field" on $M$. Our earlier discussion of how to define the values at a point of such tensor fields as differential forms and vector fields can be extended to show that all these covariant derivatives of the curvature tensor possess "values" at points of $M$. For example, the value at $p$ is a multilinear map $M_p \times M_p \times M_p \times M_p \to M_p$, denoted by
\[ (u, v_1, v_2, v_3) \to R(u; v_1, v_2)(v_3). \]
For $X, Y, Z, W \in V(M)$,
\[ R(X; Y, Z)(W)(p) = R(X(p); Y(p), Z(p))(W(p)). \]
This definition can be iterated to define the higher covariant derivatives of the curvature tensor.

Now we know that $N$ totally geodesic is equivalent to $\nabla_X Y$ tangent to $N$, for any $X, Y \in V(M)$ tangent to $N$. Hence $\nabla_X\nabla_Z Y$ and $\nabla_Z\nabla_X Y$ are tangent to $N$, for $X, Y, Z$ tangent to $N$. By the Ricci identity connecting iterated covariant derivatives, we see that $R(X, Z)(Y)$ is tangent to $N$. This leads to the statement:
\[ R(N_p, N_p)(N_p) \subset N_p. \]

† Notice that up to this point we have been using only the self-parallel property of geodesics.
Further covariant derivation leads to the analogous statement for the covariant derivatives. Q.E.D. Theorem 24.14 tells us that the tangent spaces to totally geodesic submanifolds cannot be arbitrary. The following theorem tells us what Riemannian manifolds have a maximal number of totally geodesic submanifolds. One feels intuitively that a “generic” Riemannian manifold can have very few totally geodesic submanifolds, but research in this direction is not very advanced.
THEOREM 24.15 Let $M$ be a Riemannian manifold of dimension $\ge 3$ such that each two- and three-dimensional tangent subspace is tangent to, respectively, a two- and three-dimensional totally geodesic submanifold. Then $M$ has constant sectional curvature.

Proof. Let $p \in M$, and let $N_p$ be any two-dimensional subspace of $M_p$. Then $R(N_p, N_p)(N_p) \subset N_p$. Let $v \in M_p \cap N_p^\perp$. Now
\[ \langle R(N_p, N_p)(v), N_p\rangle = -\langle v, R(N_p, N_p)(N_p)\rangle = 0, \]
since each linear transformation $R(u_1, u_2)$ is skew-symmetric. $R(N_p, N_p)(v)$ must belong to $N_p + (v)$, since $N_p + (v)$ is tangent to a three-dimensional totally geodesic submanifold. By skew-symmetry of $R(\ ,\ )$, these two relations are compatible only if $R(N_p, N_p)(v) = 0$. Since $v$ is arbitrary in $N_p^\perp$,
\[ R(N_p, N_p)(N_p^\perp) = 0. \tag{24.31} \]
Let $v_1, v_2$ be orthonormal vectors in $N_p$. Then, since $N_p$ is an arbitrary two-dimensional subspace, there are relations of the form
\[ R(v_1, v_2)(v_1) = -av_2, \quad R(v_1, v_2)(v_2) = av_1, \quad R(v_1, v)(v_2) = 0, \quad R(v_1, v)(v) = bv_1, \quad R(v_1, v_2)(v) = 0. \]
We want to prove that $a = b$. Since (24.31) holds for arbitrary two-dimensional subspaces,
\[ R(v_1, v + v_2)(v - v_2) = 0. \]
But also
\[ R(v_1, v + v_2)(v - v_2) = R(v_1, v)(v) - R(v_1, v_2)(v_2) = (b - a)v_1, \]
which implies $a = b$. This implies that the sectional curvatures of all two-dimensional subspaces of $M_p$ are the same.

We now show that they remain constant when $p$ varies over $M$. Let $X_i$ ($1 \le i, j, \ldots \le m = \dim M$; summation convention) be an orthonormal basis for vector fields on an open subset of $M$, let $\omega_i$ be a dual basis of 1-forms, and let $\Omega_{ij} = R_{ijkl}\,\omega_k \wedge \omega_l$ be the corresponding curvature forms. Let $p \to K(p)$ be the function on $M$ whose value at each point $p$ is the common value of the sectional curvatures at this point. Then, by (24.31),
This implies that
\[ \Omega_{ij} = K\omega_i \wedge \omega_j. \tag{24.32} \]
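The Bianchi identities for the curvature tensor are
\[ d\Omega_{ij} = \omega_{ik} \wedge \Omega_{kj} - \Omega_{ik} \wedge \omega_{kj}, \]
where $(\omega_{ij})$ are the connection forms. From (24.32),
\[ d\Omega_{ij} = K\omega_{ik} \wedge \omega_k \wedge \omega_j - K\omega_i \wedge \omega_k \wedge \omega_{kj}. \]
But also
\[ d\Omega_{ij} = dK \wedge \omega_i \wedge \omega_j + K\omega_{ik} \wedge \omega_k \wedge \omega_j - K\omega_i \wedge \omega_{jk} \wedge \omega_k. \]
Combining these two different ways of computing $d\Omega_{ij}$, we have
\[ dK \wedge \omega_i \wedge \omega_j = 0. \]
Here is where we use the fact that $m \ge 3$. The 1-form $dK$ can have zero exterior product with all the 2-forms $\omega_i \wedge \omega_j$ only if it is zero; that is, $K$ = constant. Q.E.D.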
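The last step of the proof can be spelled out as follows. This is only a sketch; the coefficient functions $K_k$, defined by $dK = K_k\omega_k$, are notation introduced here for illustration and are not used elsewhere in the text.

```latex
\begin{align*}
dK &= K_k\,\omega_k , \\
0 = dK \wedge \omega_i \wedge \omega_j
  &= \sum_{k \neq i,\,j} K_k\, \omega_k \wedge \omega_i \wedge \omega_j
  \qquad (i \neq j \text{ fixed}).
\end{align*}
```

For fixed $i \ne j$ the 3-forms $\omega_k \wedge \omega_i \wedge \omega_j$, $k \ne i, j$, are linearly independent, so $K_k = 0$ for every $k$ different from $i$ and $j$. When $m \ge 3$, each index $k$ is different from some pair $i \ne j$, so all the $K_k$ vanish and $dK = 0$. When $m = 2$ there is no such $k$, and the identity gives no information.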
25
Groups of Isometries
One theme pervading mathematics for at least a hundred years is the emphasis on the reciprocity between a geometric structure and its group of automorphisms. This attitude pervades physics: For example, we may say that the whole point of the Theory of Special Relativity is to replace the automorphism group of Newtonian mechanics (the Galilean group) by the Lorentz group. Thus our study of Riemannian manifolds must take into account their groups of automorphisms. Since a complete development would involve us in the technicalities of Lie group theory, we shall limit our treatment to several topics that will give the flavor of what may be called "group-theoretical geometry," trying to get along with a minimum of Lie group theory. As usual in mathematics, the subject is rich and attractive precisely because it involves the interaction of two seemingly different disciplines, but this creates difficulties in exposition.

Let $M$ be a Riemannian manifold, supposed, for simplicity, to be complete. A diffeomorphism $\phi: M \to M$ is said to be an isometry of $M$ if
\[ \|\phi_*(u)\| = \|u\| \quad \text{for all } u \in T(M), \]
where $u \to \|u\| = \langle u, u\rangle^{1/2}$ is the length function defined by the metric. Then $\phi$ preserves the length of curves; hence it also preserves distances between points. (It is an interesting fact that, conversely, a distance-preserving homeomorphism is an isometry.) Since obviously the product of two isometries is an isometry, as is the inverse, the set of all isometries forms a group†.

Now, the first general result of interest is that the group of all isometries forms a Lie group. Let $I(M)$ be the group of isometries of $M$. As a start, we shall take over the following theorem without proof, which can be found in Helgason's book [1].
† It is assumed that the reader is familiar with the definition and elementary algebraic properties of groups, as well as certain standard notations.

THEOREM 25.1 Let $M$ be a Riemannian manifold. $I(M)$ can be made into a Lie group so that:

(a) The map $I(M) \times M \to M$ which assigns $\phi(p)$ to each pair $(\phi, p) \in I(M) \times M$ is differentiable.
(b) If $p \in M$, and $\phi_1, \phi_2, \ldots$ is a sequence in $I(M)$ so that $\lim_{j \to \infty} \phi_j(p)$ exists, then at least one subsequence of the $\phi_j$ converges to an element of $I(M)$.

The first topic to be studied concerns the relation between the orbits and isotropy groups of a closed subgroup $G$ of $I(M)$. It is known that $G$ itself is a Lie group that acts, in the manner by which it is defined, as a differentiable transformation group on $M$. For $p \in M$, the isotropy subgroup, denoted by $G^p$, of $G$ at $p$, is defined by
\[ G^p = \{g \in G : gp = p\}. \]
By (b) of Theorem 25.1, $G^p$ is compact. The orbit of $G$ at $p$, denoted by $Gp$, is defined by $Gp = \{gp : g \in G\}$. (It is convenient to simplify the notation $g(p)$, that is, the transform of $p$ by the diffeomorphism of $M$ which is $g$, to $gp$ when no confusion is likely.) Assertion (b) of Theorem 25.1 implies that each orbit is a closed subset of $M$. Further, each orbit is a submanifold. For the coset space $G/G^p$ is a manifold and the map $G/G^p \to M$ obtained by passing to the quotient from the map $g \to g \cdot p$ of $G \to M$ is a submanifold map. It is even a regularly embedded submanifold, since part (b) of Theorem 25.1 implies that a convergent sequence in $Gp$ must also converge when considered as a sequence in $G/G^p$. Now we can state the main general result concerning the structure of the orbits and isotropy subgroups of a closed group of isometries.

THEOREM 25.2 Let $G$ be a closed group of isometries of a complete Riemannian manifold $M$. Let $p \in M$, and let $N = Gp$, the orbit of $G$ at $p$. Then there is an open set $U$ of $M$ containing $N$ such that:

(a) $GU = U$, that is, $U$ is the union of orbits of $G$.
(b) Every $q \in U$ can be joined to $N$ by exactly one geodesic whose length is $d(q, N)$.
(c) For $q \in U$, $G^q$ is conjugate within $G$ to a subgroup of $G^p$.
(d) $U$ is dense in $M$.

The main part of the proof is in the following lemma.

LEMMA 25.3 Let $N$ be a closed, regularly embedded submanifold of a Riemannian manifold $M$. Let $N^\perp$ be the normal tangent vector bundle to $N$. Consider $N$ as a submanifold of $N^\perp$, via the zero cross section.† Define:
\[ V = \{v \in N^\perp : \text{there are no focal points of } N \text{ along the geodesic } t \to \exp(tv),\ 0 < t \le 1, \text{ and this geodesic is the only geodesic of length} \le \|v\| \text{ joining } \exp(v) \text{ to } N\}. \]
Then $V$ is an open subset of $N^\perp$ which contains $N$. exp restricted to $V$ is a diffeomorphism of $V$ with $\exp(V)$.

Proof. Suppose that $v_0 \in V$ and that any neighborhood of $v_0$ in $N^\perp$ contains points of $N^\perp$ not lying in $V$. Now, since $v_0$ is not a focal point, that is, the Jacobian of $\exp: N^\perp \to M$ is nonsingular at $v_0$, a neighborhood of $v_0$ in $N^\perp$ contains no focal points. Thus there are two sequences $u_1, u_2, \ldots$; $v_1, v_2, \ldots$ of elements of $N^\perp$ with

(a) $\lim_{j \to \infty} v_j = v_0$.
(b) $\exp(u_j) = \exp(v_j)$ for $1 \le j < \infty$, but $u_j \ne v_j$.
(c) $\|u_j\| \le \|v_j\|$ for $1 \le j < \infty$.
(d) The geodesics $t \to \exp(tu_j)$ and $t \to \exp(tv_j)$ contain no focal points.

Suppose that $u_j \in N_{p_j}^\perp$. We see that all points $p_j$ lie at a bounded distance from $p$, where $p$ is the point such that $v_0 \in N_p^\perp$. Since the metric on $M$ is complete and $N$ is a closed regularly embedded submanifold of $M$, we can assume without loss of generality that
\[ \lim_{j \to \infty} p_j = q \in N, \qquad \lim_{j \to \infty} u_j = u \in N_q^\perp. \]
Then $\exp(u) = \exp(v_0)$, $\|u\| \le \|v_0\|$, which implies that $u = v_0$ by the definition of $V$. This, however, contradicts the fact that there is a neighborhood of $v_0$ in $N^\perp$ on which exp is 1-1 when restricted. This shows that $V$ is open in $N^\perp$. Now, by its definition, exp is 1-1 when restricted to $V$. Since it also has nonzero Jacobian at each point of $V$, exp is a diffeomorphism of $V$ with $\exp(V)$.

Return to the case where $N$ is the orbit $G \cdot p$ of a closed subgroup $G$ of $I(M)$. If $V \subset N^\perp$ is as described in Lemma 25.3, it should be clear that‡
\[ g_*(v) \in V \quad \text{for all } v \in V. \]
Hence,
\[ g(\exp V) = \exp V; \]
† That is, $p \in N$ is identified with the zero element of $N_p^\perp$.
‡ If $g$ denotes both the element of $G$ and the diffeomorphism of $M$ derived from $G \subset I(M)$, $g_*$ denotes the linear extension of $g$ to tangent vectors. $g \to g_*$ defines an action of $G$ on $N^\perp$. Since $g$ sends a geodesic of $M$ into a geodesic, the actions of $G$ on $M$ and $N^\perp$ commute with the map $\exp: N^\perp \to M$.
hence the $U \subset M$ required for the theorem can be chosen as $\exp(V)$. This will, at any rate, satisfy (a) and (b). To prove (c), let $q \in U$, and let $\sigma(t)$, $0 \le t \le 1$, be the geodesic of minimal length joining $q$ to $N$. We have $G^q \subset G^{\sigma(0)}$. For otherwise there is a $g \in G^q$ such that $g \notin G^{\sigma(0)}$; that is, $g\sigma(0) \ne \sigma(0)$, but $gq = q$. Then $\sigma$ and $g\sigma$ would be distinct geodesics of minimal length joining $q$ to $N$, contradicting that $q \in U$. But $\sigma(0) \in N = Gp$; hence $\sigma(0) = gp$ for some $g \in G$. Then one checks easily that
\[ G^{\sigma(0)} = G^{gp} = gG^pg^{-1} = \operatorname{Ad} g(G^p). \]
That is, $\operatorname{Ad} g^{-1}(G^q) \subset G^p$.
To show that $U$ is dense in $M$, suppose $q \in M$ and $\sigma(t)$, $0 \le t \le 1$, is a geodesic of length $d(q, N)$ joining $q$ to $N$. Then $\sigma(t) \in U$ for $0 \le t < 1$. For otherwise there would be another geodesic $\gamma$ joining $\sigma(t_0)$ to $N$, $\gamma$ perpendicular to $N$ at $\gamma(1)$. The corner between $\sigma$ and $\gamma$ at $\sigma(t_0)$ could be cut across to give a curve of shorter length than $\sigma$ joining $q$ to $N$; contradiction. This finishes the proof of Theorem 25.2.

Remarks

(A) Theorem 25.2 may be regarded as providing a local structure theorem for a group of isometries, asserting that in the neighborhood of an orbit the action of a closed isometry group is, in a sense, built up from the action of a transitive isometry group, namely, $G$ on $Gp$, and a linear action of the isotropy subgroup, namely, $G^p$ on $N_p^\perp$.

(B) Let us say that a point $p \in M$ is a maximal point for the action of $G$ on $M$ if
\[ \dim G^p \le \dim G^q \quad \text{for all } q \in M, \]
or, equivalently,
\[ \dim Gp \ge \dim Gq \quad \text{for all } q \in M. \]
Let us say that a point $p \in M$ is a principal point for the action of $G$ on $M$ if $p$ is a maximal point, and if the number of connected components† of $G^p$ is no greater than the number of connected components of $G^q$, for any other maximal point $q \in M$.

† Recall that $G^p$ is a compact topological group; hence, as a topological space, it has only a finite number of connected components. As in any topological group, the component containing the identity element is an invariant subgroup of $G^p$.

Thus Theorem 25.2 guarantees that if $p$ is a principal point or maximal point for the action of $G$, so are all points of $U$. In particular, the sets of all principal and of all maximal points are both open and dense in $M$. In general, if $p \in M$, $g \in G^p$, $g_*$ maps $N_p^\perp$ into $N_p^\perp$. This defines a homomorphism of $G^p$ into the group of linear transformations on $N_p^\perp$. (This is the linear action referred to in (A) above.) Notice that:
If $p$ is a principal point for the action of $G$ on $M$, for each $g \in G^p$, $g_*: N_p^\perp \to N_p^\perp$ is the identity map.

To prove this remark, suppose $u \in N_p^\perp$. We may suppose that $\|u\|$ is sufficiently small so that $\exp(tu) \in U$ for $0 \le t \le 1$. Thus $G^{\exp(u)} \subset G^p$. Since $p$ is a principal point, $G^{\exp(u)} = G^p$; hence $g\exp(u) = \exp(u)$. But $g\exp(u) = \exp(g_*(u))$; hence $\exp(u) = \exp(g_*(u))$, forcing $u = g_*(u)$.

Hence the action of $G$ in a neighborhood of a principal orbit is "trivial" in the sense that a neighborhood is the product of the orbit by a cell of Euclidean space, and the group action on the Euclidean cell is trivial, that is, every element of the group acts as the identity. Another way of putting this is to say that, at the principal points, the structure of isometry groups is just that determined by the extreme types, namely, the transitive groups and the trivial groups.

Now we would like to get some idea of the structure of the action of $G$ at points that do not lie on principal orbits. Let us say that two points $p, q \in M$ lie in the same orbit class if the isotropy subgroups $G^p$ and $G^q$ are conjugate within $G$.
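The notions of orbit, isotropy subgroup, principal point, and orbit class can be made concrete on a small example. The sketch below is not taken from the text: it assumes $G$ is the finite group of the eight symmetries of a square (the dihedral group), acting by isometries on the Euclidean plane, with group elements stored as integer $2 \times 2$ matrices. All function names are hypothetical helpers chosen for illustration.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def act(g, p):
    """The transform gp of the point p by the isometry g."""
    return (g[0][0] * p[0] + g[0][1] * p[1], g[1][0] * p[0] + g[1][1] * p[1])

IDENTITY = ((1, 0), (0, 1))

def generate(generators):
    """Close a set of generators under multiplication (enough for a finite group)."""
    group, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = matmul(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group

def inverse(group, g):
    return next(h for h in group if matmul(g, h) == IDENTITY)

def orbit(group, p):
    """Gp = {gp : g in G}."""
    return {act(g, p) for g in group}

def isotropy(group, p):
    """G^p = {g in G : gp = p}."""
    return frozenset(g for g in group if act(g, p) == p)

def conjugate(group, g, subgroup):
    """g H g^{-1}."""
    g_inv = inverse(group, g)
    return frozenset(matmul(matmul(g, h), g_inv) for h in subgroup)

def same_orbit_class(group, p, q):
    """True if G^p and G^q are conjugate within G."""
    target = isotropy(group, q)
    return any(conjugate(group, g, isotropy(group, p)) == target for g in group)

ROT90 = ((0, -1), (1, 0))     # rotation through a right angle
REFLECT = ((1, 0), (0, -1))   # reflection in the first coordinate axis
D4 = generate([ROT90, REFLECT])

for p in [(0, 0), (1, 0), (1, 1), (2, 1)]:
    print(p, len(orbit(D4, p)), len(isotropy(D4, p)))
```

In this example every isotropy subgroup has dimension zero, so every point is a maximal point, and the principal points are those with trivial isotropy subgroup, such as $(2, 1)$. Points on a coordinate axis and points on a diagonal have isotropy subgroups of the same order that are not conjugate within $G$, so they lie in different orbit classes.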
THEOREM 25.4 Let $G$ be a closed group of isometries of a complete Riemannian manifold $M$. Let $C$ be a compact subset of $M$. Then there are only a finite number of orbit classes among the points of $C$.

Proof. Proceed by induction on $\dim M$. If $M$ is zero dimensional, $G$ must be a finite group; hence the statement is obvious. Suppose it is not true for $M$, but is true for all manifolds of lower dimension. Let $(p_j)$, $1 \le j < \infty$, be a sequence of points of $C$ such that the isotropy subgroups $G^{p_j}$ are all nonconjugate within $G$. Since $C$ is compact, we can suppose that $\lim_{j \to \infty} p_j = p$. Let $N = Gp$, the orbit of $G$ at $p$, and let $N^\perp$ be the normal tangent vector bundle to $N$. Define
\[ S = \{v \in N^\perp : \|v\| = 1\}. \]
$S$ is a manifold of one less dimension than $M$. $G$ acts on $S$: A given $g \in G$ gives, by definition, a diffeomorphism of $M$. Its differential $g_*$ is a diffeomorphism of $T(M)$, and the correspondence $g \to g_*$ defines an action of $G$ on $T(M)$. (Exercise: Prove this.) $S$ is a submanifold of $T(M)$ and clearly each $g_*$ maps $S$ into itself; hence it defines an action of $G$ on $S$ as a transformation group. We want to apply our induction hypothesis to this action. To do this we must know that $S$ admits a Riemannian metric having the property that $G$ acts as a group of isometries. This will follow from a lemma.

LEMMA 25.5 Let $M$ be a Riemannian manifold. Then $T(M)$ admits a Riemannian metric having the property that the group of isometries of $M$, when extended to an action on $T(M)$, acts as an isometry group on $T(M)$.

We leave the proof of this lemma as an exercise. At any rate, we suppose that there are only a finite number of orbit classes of the action of $G$ on $S$. Suppose that $q$ is a point of $M$ that is close to $p$, that is, so that $q = \exp u$, for some $u \in V$, where $V$ is the subset of $N^\perp$ that is described in Lemma 25.3. Note that
\[ G^q = G^{u/\|u\|}, \]
where the right-hand side is the isotropy group of the action of $G$ on $S$. For if $g_*(u) = u$, then
\[ gq = g\exp(u) = \exp(g_*(u)) = \exp(u) = q. \]
That is, $G^{u/\|u\|} \subset G^q$. If $gq = q$, then $g_*(u) = u$, for otherwise $\exp(u) = \exp(g_*(u))$, contradicting the definition of $V$, whence $G^q \subset G^{u/\|u\|}$. Hence there are only a finite number of orbit classes among the orbits of points near $p$, which contradicts the fact that $\lim_{j \to \infty} p_j = p$, and that the orbit classes among the $(p_j)$ are distinct.

In studying the distribution of the various orbits of $G$, it is convenient to consider the set of orbits as itself forming a space.

Definition

Let $G$ be a closed group of isometries of a Riemannian space $M$. The orbit space of the action of $G$ on $M$ is a space, denoted by $G\backslash M$, abstractly constructed as follows: A point of $G\backslash M$ is an orbit of the action of $G$ on $M$. $G\backslash M$ is made into a metric space as follows: For $p, q \in M$, the "distance" between the orbits $Gp$ and $Gq$ is just the minimal distance $d(Gp, Gq)$ between the subsets, defined as usual by using the given Riemannian metric on $M$. Define the projection mapping $\phi: M \to G\backslash M$ by assigning $\phi(p) = Gp$ to each $p \in M$.
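The distance on the orbit space can be computed directly in a simple case. The sketch below is an illustration under stated assumptions, not a construction from the text: $G$ is taken to be the finite group of rotations of the Euclidean plane through multiples of $2\pi/n$, and the helper names (`rotations`, `orbit_distance`, and so on) are hypothetical. It computes $d(Gp, Gq)$ as the minimal distance between the two orbits, and also by moving only one of the two points, which gives the same value because every $g$ is an isometry.

```python
import math

def rotations(n):
    """The n rotations through multiples of 2*pi/n, stored as (cos, sin) pairs."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

def act(g, p):
    """The transform gp of the point p by the rotation g."""
    c, s = g
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def dist(p, q):
    """Euclidean distance in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def orbit_distance(group, p, q):
    """d(Gp, Gq): the minimal distance between the orbits as subsets of the plane."""
    return min(dist(act(g, p), act(h, q)) for g in group for h in group)

def orbit_distance_one_sided(group, p, q):
    """The same quantity, using d(gp, hq) = d(p, g^{-1}hq) for isometries g, h."""
    return min(dist(p, act(h, q)) for h in group)

G = rotations(4)
print(orbit_distance(G, (1.0, 0.0), (0.0, 3.0)))
```

The one-sided form reflects the fact, used in the proof of the next theorem, that for a given $p$ a point $q$ can be chosen on any orbit so that $d(p, q) = d(Gp, Gq)$.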
THEOREM 25.6 The orbit space $G\backslash M$ with the distance function defined as above is a well-defined metric space. The projection mapping $\phi: M \to G\backslash M$ defined above is an open, continuous mapping. Let $M^0$ be the set of $p \in M$ such that $Gp$ is a principal orbit of $G$, and let $(G\backslash M)^0$ be the set of principal orbits, that is, the image under $\phi$ of $M^0$. Then $(G\backslash M)^0$ can be made into a manifold so that $\phi: M^0 \to (G\backslash M)^0$ is a maximal rank mapping, in fact a principal fiber bundle with structure group $G$.

Proof. Given $p \in M$, one can choose a point $q$ on any given orbit of $G$ so that
\[ d(p, q) = d(Gp, Gq) \]
(since $M$ is complete and the orbits of $G$ are all closed in $M$). This fact suffices to show that the metric space axioms are satisfied for $G\backslash M$:
\[ d(Gp, Gq) = 0 \Rightarrow d(p, q) = 0 \Rightarrow p = q \Rightarrow Gp = Gq. \]
\[ d(Gp, Gq) = d(p, q) = d(q, p) = d(Gq, Gp). \]
To show the triangle inequality, that is,
\[ d(Gp, Gr) \le d(Gp, Gq) + d(Gq, Gr), \]
choose $q$ and $r$ on their respective orbits so that
\[ d(Gp, Gq) = d(p, q), \qquad d(Gq, Gr) = d(q, r). \]
Then $d(Gp, Gr) \le d(p, r) \le d(p, q) + d(q, r)$, whence the triangle inequality.

Continuity of $\phi$ also follows easily from this property of the orbits. Suppose that $U$ is an open subset of $M$, that $p \in U$, and that $d(Gq, Gp)$ is sufficiently small. As remarked above, we can choose $q$ so that $d(Gq, Gp) = d(p, q)$. Hence $q \in U$ if $d(Gq, Gp)$ is sufficiently small, and $\phi(U)$ is open in $G\backslash M$. There is another way of stating this result: $\phi^{-1}(\phi(U))$ is open. But $\phi^{-1}(\phi(U))$ is the saturation of $U$ with respect to $G$, that is, the union of all orbits of $G$ that touch $U$.

Turn to the last statement of the theorem. We have seen that $M^0$ is open in $M$ (Theorem 25.2). Let $p \in M^0$ and $N = Gp$. If $V$ is the subset of $N^\perp$ defined in Lemma 25.3, we have seen that $\exp(V)$ is an open subset of $M^0$, and that $\exp(V \cap M_p)$ intersects each orbit of $G$ only once. Thus $V \cap M_p \to \phi\exp(V \cap M_p)$ provides a homeomorphism of an open subset of $G\backslash M$ with an open subset of a Euclidean space. It is readily verified that these homeomorphisms combine in the right way to provide a manifold structure for $(G\backslash M)^0$, which has almost by definition the property that $\phi$ is a maximal rank mapping.
The fact that $\phi: M^0 \to \phi(M^0)$ is a principal fiber bundle mapping follows now at once from the definition of principal fiber bundle, which goes as follows:

Definition Let E and B be topological spaces, let $\phi: E \to B$ be a continuous mapping, and let G be a topological group that acts on E. This setup is said to define a principal fiber bundle with structure group G and projection map $\phi$, denoted by $(E, B, \phi, G)$, if each point $b \in B$ has a neighborhood $U \subset B$ and a mapping $\psi: U \times G \to E$, such that:
(a) $\psi(U \times G)$ is an open subset of E, and $\psi$ is a homeomorphism with this subset.
(b) For $b' \in U$, $g, g' \in G$, $g\psi(b', g') = \psi(b', gg')$.

Roughly, we may say that a principal fiber bundle is determined by the action of a topological group G on a space E so that (i) G acts simply on E, that is, $g \in G$, $e \in E$, $ge = e$ implies $g$ = identity; (ii) each point of E has a neighborhood invariant under G which is isomorphic to the simplest type of action of G, namely, G acting on a product $U \times G$ by leaving each element of $U$ fixed and acting on G by left translations.

Return to the case where G is a closed group of isometries acting on a complete Riemannian manifold M. Now the principal orbits of G are dense in G\M and hence have a manifold structure. This fact, together with some computations of orbit classes in special cases, suggests that an orbit space be regarded as a sort of "generalized manifold," with the distance function on it that we have used above to define a "generalized Riemannian structure." We proceed then with a geometric study of the orbit space G\M, based on the fact that it is a metric space in the natural way described above.

LEMMA 25.7 Let G be a Lie group of isometries of a Riemannian manifold M. Let $\sigma: [0, 1] \to M$ be a geodesic of M that is perpendicular to one orbit of G, say, to $G\sigma(0)$. Then $\sigma$ is perpendicular to $G\sigma(t)$ for $0 \le t \le 1$.
Proof. Let $g(s)$, $0 \le s \le 1$, be a curve in G, with $g(0)$ = the identity element. Let $\delta(s, t) = g(s)\sigma(t)$, $0 \le s, t \le 1$. $\delta$ is a homotopy in M having the property that, for fixed $s$, the curve $t \to \delta(s, t)$ is a geodesic whose length is equal to the length of $\sigma$ (since each transformation $g(s)$ is an isometry of M and hence maps geodesics of M into geodesics). Let $v(t) = \partial_s \delta(0, t)$; that is, $t \to v(t)$ is the vector field on $\sigma$ representing the infinitesimal deformation. The first variation formula implies that

\[ \langle v(t), \sigma'(t)\rangle = \langle v(0), \sigma'(0)\rangle \quad\text{for } 0 \le t \le 1. \]

Now $v(0) \in (G\sigma(0))_{\sigma(0)}$; hence $\langle v(0), \sigma'(0)\rangle = 0$, implying that $\langle v(t), \sigma'(t)\rangle = 0$. But, as $g$ varies over all such curves in G, $v(t)$ fills up the tangent space to the orbit of G at $\sigma(t)$; hence $\sigma'(t)$ is perpendicular to the orbit, as required.

Say that a geodesic of M is transversal to the action of G if it is perpendicular to each orbit of G that it touches. Lemma 25.7 then asserts that there is a plentiful supply of such transversal geodesics. Let $\sigma(t)$, $0 \le t \le 1$, be one of them. We say that a Jacobi vector field $t \to v(t)$ along $\sigma$ is transversal to the action of G if there is a geodesic deformation $\delta(s, t)$ of $\sigma$ whose infinitesimal deformation vector field is $v$, that is, so that $v(t) = \partial_s\delta(0, t)$. A curve of G\M is a geodesic of G\M if it is equal to the projection under the projection map $M \to G\backslash M$ of a geodesic of M that is transversal to the action of G. (The justification for this simplification is that it can be shown that these are precisely the curves that locally minimize arc length, as arc length is defined in the general theory of metric spaces.)
THEOREM 25.8 Let G be a closed group of isometries of a complete Riemannian manifold M. Let L be a closed subgroup of G, and put $M(L) = \{p \in M : G^p \text{ is conjugate within } G \text{ to } L\}$. Then $M(L)$ and $G\backslash M(L)$ are manifolds and the projection map $M(L) \to G\backslash M(L)$ is a fiber space map. Further, $G\backslash M(L)$ is a totally geodesic subset of G\M.

Proof. Obviously, if two points of M lie on the same orbit of G, then the isotropy subgroups of these two points are conjugate in G. Hence $M(L)$ contains the entire orbit of any point it touches. By $G\backslash M(L)$ we mean the subset of the orbit space consisting of those orbits that lie in $M(L)$. Let $p \in M$, let $N$ be the orbit of G at $p$, and let $U$ be the neighborhood of $N$ having the properties listed in Theorem 25.2. We have seen that, for any $q \in U$, $G^q$ is conjugate to a subgroup of $G^p$. Suppose that $G^p = L$, that is, $p \in M(L)$. Any $q \in U$ can then be transformed by an element of G so that $G^q \subset G^p$, and so that the unique geodesic of shortest length joining $q$ to $N$ ends at $p$. Suppose $t \to \exp(tv)$, $0 \le t \le 1$, $v \in N_p^\perp$, is this geodesic. Then $q \in M(L)$ if and only if $G^q = G^p$. This fact and the geometric properties of $U$ force the following conclusion: $G^q = G^p$ if and only if $g_*(v) = v$ for all $g \in G^p$.
This suggests the following way to parametrize points of $M(L)$ close to $p$. Define $N_p^\perp(L) = \{v \in N_p^\perp : g_*(v) = v \text{ for all } g \in L\}$. There is a map $G/L \times N_p^\perp(L) \to M$ defined as follows: For $g \in G$, $v \in N_p^\perp(L)$, map $(g, v)$ into $\exp(g_*(v))$. Since, for $\ell \in L$, $\exp((g\ell)_*(v)) = \exp(g_*\ell_*(v)) = \exp(g_*(v))$, this map passes to the quotient to define the desired map of $G/L \times N_p^\perp(L)$. ($G/L$ denotes the space of left cosets of L in G, considered, of course, as a homogeneous space of G.) If we restrict to the product of $G/L$ with a sufficiently small neighborhood of 0 in $N_p^\perp(L)$, by our above remarks this map is a homeomorphism with a neighborhood of $p$ in $M(L)$. Similarly, mapping this neighborhood of zero in $N_p^\perp(L)$ onto $M(L)$, then projecting on $G\backslash M(L)$, defines a homeomorphism with a neighborhood of the orbit $Gp$. If we use these homeomorphisms to define manifold structures for $M(L)$ and $G\backslash M(L)$ (it is left to the reader to check that the manifold axioms are fulfilled), that the map $M(L) \to G\backslash M(L)$ is a fiber space map follows more or less by definition.

Turn to the totally geodesic statement. We must make precise what is meant by "totally geodesic" in G\M, since it is not quite a Riemannian manifold. Now, of all the equivalent defining properties of a totally geodesic submanifold of a Riemannian manifold, one is adapted to generalize to a metric space: We say that a subset $A$ of a metric space $Q$ is totally geodesic if, given $q \in A$, there is a neighborhood $U$ of $q$ in $Q$ such that every geodesic curve of $Q$ starting at $q$, which lies in $U$ and ends on $A$, must lie completely in $A$. In the case $Q = G\backslash M$, with the fact that geodesics in $Q$ are projections of G-transversal geodesics in M, we see that the neighborhood constructed above of the orbit whose isotropy group is L has precisely this property; hence $G\backslash M(L)$ is totally geodesic in G\M. Q.E.D.
The next question is: Given a subgroup $L \subset G$, does the metric space structure on G\M, restricted to $G\backslash M(L)$, arise from a Riemannian metric on $G\backslash M(L)$, and how can this Riemannian metric be computed in terms of the given Riemannian metric on M?

THEOREM 25.9 Let M be a complete Riemannian manifold, and let L be a group of isometries of M. Let $F(L)$ be the fixed point set of L; that is,

\[ F(L) = \{p \in M : gp = p \text{ for all } g \in L\}. \]

Then $F(L)$ is a closed, regularly embedded totally geodesic submanifold of M. In addition, for $p \in F(L)$, the tangent space to $F(L)$ at $p$, namely $F(L)_p$, is precisely $\{v \in M_p : g_*(v) = v \text{ for all } g \in L\}$.
Proof. Let $p, q \in F(L)$. Let $\sigma(t)$, $0 \le t \le 1$, be the geodesic joining $p$ to $q$, and let $g \in L$. If $p$ and $q$ are sufficiently close together (for example, if $q$ lies on a convex ball about $p$), then $g\sigma(t) = \sigma(t)$ for $0 \le t \le 1$. (Otherwise there would be two small geodesics joining $p$ to $q$.) If $v = \sigma'(0)$, then $g_*(v) = v$ and $q = \exp(v)$. This shows that the intersection of $F(L)$ and a sufficiently small neighborhood of $p$ is contained in $\exp(\{v \in M_p : g_*(v) = v \text{ for all } g \in L\})$. This provides coordinate systems to make $F(L)$ into a manifold. The argument also shows that it is a totally geodesic submanifold, since a sufficiently small geodesic of M whose end points lie on $F(L)$ must then lie completely in $F(L)$.

THEOREM 25.10 Let G be a closed group of isometries of a complete Riemannian manifold M. Let L be a compact subgroup of G, and let $N(L, G)$ be the normalizer of L in G; that is,

\[ N(L, G) = \{g \in G : gLg^{-1} \subset L\}. \]

Let $F^0(L) = \{p \in M : G^p = L\}$. Then $F^0(L)$ is an open submanifold of $F(L)$, which is left invariant by the action of $N(L, G)$, and $N(L, G)\backslash F^0(L)$ is isomorphic to $G\backslash M(L)$. All orbits of $N(L, G)$ on $F^0(L)$ are principal.

Proof. To show that $F^0(L)$ is open in $F(L)$, suppose $p \in F^0(L)$ and that $q \in F(L)$ is sufficiently close to $p$. Then $G^p = L \subset G^q$. We have seen that $G^q$ is conjugate to a subgroup of $G^p$. This, however, is possible (since they are compact) if and only if $G^q = L$, that is, $q \in F^0(L)$.

Suppose now that $p \in F^0(L)$, $g \in G$, and $gp \in F^0(L)$. Then $G^p = L = G^{gp}$. But $G^{gp} = gG^pg^{-1} = gLg^{-1}$. (Proof: Let $h \in G^{gp}$. Then $hgp = gp$, or $g^{-1}hgp = p$, or $g^{-1}hg \in G^p$, or $h \in gG^pg^{-1}$.) Thus $g$ must belong to $N(L, G)$.

Let $p \in F^0(L)$. The orbit of $p$ under $N(L, G)$ must be contained in $M(L)$. Thus we obtain a map of $N(L, G)\backslash F^0(L) \to G\backslash M(L)$, which is obviously onto. Let us show that it is 1-1: Suppose, then, that points $p, q \in F^0(L)$ lie on the same orbit of G. The argument in the last paragraph shows that $p$ and $q$ must lie on the same orbit of $N(L, G)$.
The isotropy subgroup of $N(L, G)$ at each point of $F^0(L)$ is obviously $N(L, G) \cap L$; hence all orbits are principal and the orbit space $N(L, G)\backslash F^0(L)$ is a manifold. That the mapping

\[ N(L, G)\backslash F^0(L) \to G\backslash M(L) \]

is differentiable is seen by referring back to the way in which the manifold structures were defined. (Details are left to the reader.) To complete the proof, we shall show that this map is an isometry of Riemannian manifolds. Let $p$ and $q$ be points of $F^0(L)$ that are close together, and let $\sigma(t)$, $0 \le t \le 1$, be a geodesic of minimal length joining $Gp$ to $Gq$. After translating by G, we can suppose that $\sigma(0) = p$. Now length $\sigma \le d(p, q)$. Hence, if $d(p, q)$ is sufficiently small, we have

\[ G^{\sigma(1)} \subset G^p = L. \]

But $\sigma(1) = gq$, for some $g \in G$, so $G^{\sigma(1)}$ is conjugate to L; hence $G^{\sigma(1)} = L$, hence $\sigma(1) \in F^0(L)$, and $g \in N(L, G)$. Thus

\[ d(Gp, Gq) \ge d(N(L, G)p, N(L, G)q). \]

The reverse inequality is obvious. Q.E.D.
We change direction now to treat the infinitesimal properties of groups of isometries. First we must describe the Lie algebra of a Lie group.

Definition Let G be a Lie group. Its Lie algebra, usually denoted by $\mathbf{G}$, is defined as follows:
(a) An element of $\mathbf{G}$ is a one-parameter subgroup of G, that is, a mapping $g: R \to G$ such that $g(t + s) = g(t)\cdot g(s)$ for $-\infty < t, s < \infty$.
(b) $\mathbf{G}$ is defined as a real Lie algebra (in the abstract sense) as follows: If $g_1$ and $g_2$ are one-parameter subgroups, that is, elements of $\mathbf{G}$, then $g_1 + g_2$ is the one-parameter subgroup $g_3$:
The Jacobi bracket, $[g_1, g_2]$, is the one-parameter subgroup $g_4$:
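The two displayed limits defining $g_3$ and $g_4$ are not legible at this point. As a hedged sketch, the standard formulas from Lie group theory are given below; they are an assumption, since the author's exact normalization (and sign convention for the bracket) cannot be recovered from the text.

```latex
% Assumed standard limit formulas; not taken verbatim from the book.
% g_1, g_2 are one-parameter subgroups of G, n runs over positive integers.
\begin{align*}
g_3(t) &= \lim_{n\to\infty}
  \Bigl( g_1\bigl(\tfrac{t}{n}\bigr)\, g_2\bigl(\tfrac{t}{n}\bigr) \Bigr)^{n},\\[4pt]
g_4(t) &= \lim_{n\to\infty}
  \Bigl( g_1\bigl(\sqrt{t/n}\bigr)\, g_2\bigl(\sqrt{t/n}\bigr)\,
         g_1\bigl(-\sqrt{t/n}\bigr)\, g_2\bigl(-\sqrt{t/n}\bigr) \Bigr)^{n}
  \qquad (t \ge 0),\\[4pt]
g_4(t) &= g_4(-t)^{-1} \qquad (t < 0).
\end{align*}
```

For a matrix group with $g_i(t) = \exp(tA_i)$ these limits reduce to $\exp(t(A_1 + A_2))$ and $\exp(t(A_1A_2 - A_2A_1))$, which is one way to check the normalization.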
It is shown in treatises on the theory of Lie groups that these limits exist, define one-parameter subgroups, and that these operations satisfy the algebraic conditions necessary to show that $\mathbf{G}$ is well defined as a Lie algebra.

For our purposes, it is most important to see what the Lie algebra means in terms of transformation groups. Suppose, then, that G acts as a transformation group (in a $C^\infty$ way) on a manifold M. Each one-parameter subgroup $t \to g(t)$ in G then acts on M. We have seen that it has a unique vector field $X$ as an infinitesimal generator; that is, each orbit $t \to g(t)p$ of $g$ is an integral curve of $X$. It is most important to realize that this mapping $\mathbf{G} \to V(M)$ is a Lie algebra homomorphism; that is, that the sum and bracket of two one-parameter subgroups have as infinitesimal generators the sum and Jacobi bracket of their infinitesimal generators. (Again we assume this as a basic fact in the theory of Lie groups.) This homomorphism $\mathbf{G} \to V(M)$ may be described as the infinitesimal version of the action of G on M. Return to the case of a Riemannian manifold.
THEOREM 25.11 Let $X$ be the infinitesimal generator of a one-parameter group of transformations of a Riemannian manifold M. Then this is a one-parameter group of isometries if and only if

\[ \langle\nabla_Y X, Z\rangle + \langle\nabla_Z X, Y\rangle = 0 \quad\text{for all } Y, Z \in V(M). \tag{25.1} \]

These conditions amount to a system of differential equations for the vector field $X$. These equations are called the Killing equations, and the solutions are called Killing vector fields.
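As an illustration of the Killing equations (25.1), the following sketch checks them by hand on the Euclidean plane, where the covariant derivative is ordinary differentiation of components. The choice of manifold, the fields $X$ and $X'$, and the component names $a, b, c, d$ are assumptions made only for this example.

```latex
% Worked check of (25.1) on M = R^2 with the flat metric (illustrative only).
% Y = a \partial_x + b \partial_y and Z = c \partial_x + d \partial_y are arbitrary fields.
\begin{align*}
X &= -y\,\partial_x + x\,\partial_y
  &&\text{(generator of rotations about the origin)}\\
\nabla_Y X &= Y(-y)\,\partial_x + Y(x)\,\partial_y = -b\,\partial_x + a\,\partial_y,\\
\langle \nabla_Y X, Z\rangle + \langle \nabla_Z X, Y\rangle
  &= (-bc + ad) + (-da + cb) = 0
  &&\text{so $X$ is a Killing field.}\\[6pt]
X' &= x\,\partial_x
  &&\text{(generator of a dilation in $x$)}\\
\nabla_Y X' &= a\,\partial_x,\\
\langle \nabla_Y X', Z\rangle + \langle \nabla_Z X', Y\rangle
  &= 2ac \neq 0 \text{ in general}
  &&\text{so $X'$ is not a Killing field.}
\end{align*}
```

This agrees with the geometric picture: rotations preserve Euclidean length, while stretching along the $x$-axis does not.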
Proof. Let $t \to g(t)$ be the one-parameter transformation group generated by $X$. If $Y$ is a vector field, let $Y_t$ be the vector field $p \to g(t)_*(Y(g(-t)p))$. The following relation is easily derived:

\[ \frac{\partial}{\partial t} Y_t \Big|_{t=0} = -[X, Y]. \]

Expressing the fact that $g(t)$ is a group of isometries, we have

\[ \langle Y_t, Z_t\rangle(p) = \langle Y, Z\rangle(g(-t)p). \]

Taking $\partial/\partial t$ and setting $t = 0$, we have

\[ X(\langle Y, Z\rangle) = \langle [X, Y], Z\rangle + \langle Y, [X, Z]\rangle \quad\text{for all } Y, Z \in V(M). \tag{25.2} \]

(Although we shall not pursue the point, the equations are those guaranteeing that the Lie derivative by $X$ of the metric tensor is zero.) But

\[ X(\langle Y, Z\rangle) = \langle\nabla_X Y, Z\rangle + \langle Y, \nabla_X Z\rangle, \]

and $[X, Y] = \nabla_X Y - \nabla_Y X$, so that (25.2) is equivalent to (25.1).

Conversely, suppose that (25.1), hence (25.2), is satisfied. First suppose that the coordinate system may be chosen so that $X = \partial/\partial x_1$. Then, using (25.2),

\[ \frac{\partial g_{ij}}{\partial x_1} = 0, \]

and the length of a curve is

\[ \int \sqrt{g_{ij}\,\frac{dx_i}{ds}\frac{dx_j}{ds}}\; ds. \]

But the integral curves of $X$, that is, the orbits of $g(t)$, are then

\[ t \to (x_1(0) + t,\; x_2(0),\; \ldots,\; x_n(0)), \]

and we see quite explicitly that these transformations preserve length. In case $X$ cannot be put (locally) in this canonical form, proceed as follows: Introduce $M' = M \times (-\infty, \infty)$, with $t$ the additional coordinate, $X' = X + (\partial/\partial t)$, and the metric on $M'$ that is the product of that on M and the Euclidean metric on $(-\infty, \infty)$. It is readily verified that $X'$ is a Killing vector field for this product metric and that the one-parameter group it generates is a group of isometries if and only if $t \to g(t)$ is composed of isometries. But the above argument can be applied to $X'$, since it is everywhere nonzero. Q.E.D.
Remark. As a bonus from this proof, we see that a metric admits a Killing vector field that is nonzero in a neighborhood of a point if and only if the point admits a coordinate system in which the components $g_{ij}$ of the metric are functions of $x_2, \ldots, x_n$ alone.

The following geometric fact is useful as an illustration of what can be done with the Killing equations.
THEOREM 25.12 Let $X$ be a Killing vector field on a Riemannian manifold M, and let $f = \langle X, X\rangle$ be the length function of $X$ on M. Then a point $p \in M$ is a critical point for $f$ if and only if the integral curve of $X$ beginning at $p$ is a geodesic of M.

Proof. Let $Y \in V(M)$. Then

\[ Y(\langle X, X\rangle) = 2\langle\nabla_Y X, X\rangle = -2\langle\nabla_X X, Y\rangle. \]

Hence $p$ is a critical point for $\langle X, X\rangle$ if and only if $\nabla_X X(p) = 0$. If $X(p) = 0$, we are finished, since we count a constant integral curve of $X$ as a geodesic, even if a degenerate one. Suppose $X(p) \ne 0$. Let $\sigma(s)$, $0 \le s \le 1$, be the geodesic of M with $\sigma(0) = p$, $\sigma'(0) = X(p)$. Let $v(s) = X(\sigma(s))$, that is, $v$ is $X$ restricted to $\sigma$. A general property is: $v$ is a Jacobi vector field on $\sigma$. This is easiest to see if $X$ generates a global one-parameter group of isometries $t \to \phi_t$. Then $s \to \phi_t(\sigma(s))$ is also a geodesic; hence $\delta(s, t) = \phi_t(\sigma(s))$ is a geodesic deformation of $\sigma$ whose infinitesimal deformation field is precisely $v$. That $v$ is Jacobi follows from our earlier work. In the case where $X$ does not generate a global isometry group, a slight variant of this argument may be used: $\delta(s, t)$ may still be defined, for $t$ sufficiently small, so that for each $s$, $t \to \delta(s, t)$ is an integral curve of $X$. Again we must prove that $s \to \delta(s, t)$ is geodesic. This can be done, for example, by reducing $X$ to canonical form, as in the proof of Theorem 6.3. But

\[ \nabla v(0) = \nabla_{\sigma'(0)} X = \nabla_X X(p). \]

Thus $p$ is a critical point if and only if $\nabla v(0) = 0$, that is, if and only if (by the uniqueness of Jacobi fields) $X(\sigma(s)) = v(s) = \sigma'(s)$ for $0 \le s \le 1$. This is precisely the desired conclusion.

A detailed study of the Killing equations as differential equations is not possible here, but we shall do a few items of this nature as illustrations.

LEMMA 25.13 Let $X$ be a Killing vector field on M, and $p$ a point such that $X(p) = 0 = \nabla_v X$ for all $v \in M_p$. Then $X$ is identically zero on M.

Proof. Let $\sigma(s)$, $0 \le s \le 1$, be a geodesic of M beginning at $p$. We have seen that $X$ restricted to $\sigma$ is a Jacobi vector field. The initial value of this Jacobi field and its first covariant derivative must vanish; hence $X$ restricted to $\sigma$ must vanish identically, hence $X$ is zero in a neighborhood of $p$. Thus the set of all points where $X$ and its first covariant derivative are zero is both open and closed; hence it equals M, since M is connected. Q.E.D.
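Theorem 25.12 and Lemma 25.13 can be illustrated on the unit 2-sphere. The sketch below is only an example under stated assumptions: the coordinates $(\theta, \varphi)$ (colatitude and longitude) and the choice $X = \partial/\partial\varphi$ are introduced here and do not appear in the text.

```latex
% Illustration on the unit sphere S^2 (assumed coordinates: colatitude \theta, longitude \varphi).
\begin{align*}
ds^2 &= d\theta^2 + \sin^2\theta \, d\varphi^2,
  \qquad X = \frac{\partial}{\partial\varphi}
  &&\text{(Killing: the $g_{ij}$ do not depend on $\varphi$)}\\
f &= \langle X, X\rangle = \sin^2\theta,
  \qquad df = 2\sin\theta\cos\theta \, d\theta .
\end{align*}
% Critical points of f:
%   \theta = 0, \pi    : the poles, where X = 0 (degenerate constant geodesics);
%   \theta = \pi/2     : the equator, an integral curve of X that is a great circle.
% Every other integral curve of X is a circle of latitude, which is not a geodesic,
% in agreement with Theorem 25.12.
% At the poles X vanishes but \nabla X \neq 0 (X is an infinitesimal rotation there),
% so the hypothesis of Lemma 25.13 fails, consistent with X not being identically zero.
```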
THEOREM 25.14 Let M be a Riemannian manifold, and let $I(M)$ denote the set of Killing vector fields on M. Then:
(a) $I(M)$ is a Lie subalgebra of $V(M)$.
(b) If $n = \dim M$, then $\dim I(M) \le n(n + 1)/2$.
(c) If $X \in I(M)$, $Y, Z \in V(M)$, then

\[ \nabla_Y\nabla_Z X = R(Y, X)(Z) + \tfrac12\nabla_{[Y, Z]}X. \]

(d) For $X, Y \in I(M)$, $Z \in V(M)$,

\[ R(X, Y)(Z) = \nabla_{[X, Y]}Z - \tfrac12\nabla_{[Z, X]}Y + \tfrac12\nabla_{[Z, Y]}X + [Z, [X, Y]]. \]

Proof. The straightforward computation needed to prove (a) is left to the reader. We now prove (b): For $p \in M$, let $I(M)^p = \{X \in I(M) : X(p) = 0\}$. Since $I(M)^p$ is the kernel of the restriction map $X \to X(p)$, it suffices to show that

\[ \dim I(M)^p \le \frac{n(n + 1)}{2} - n = \frac{n(n - 1)}{2}. \]

Now each $X \in I(M)^p$ has, by definition, a singular point at $p$. We can then define the linear approximation to $X$ at $p$, $l_X$, as a linear transformation $M_p \to M_p$ as follows: $l_X(v) = [X, Y](p)$, where $Y \in V(M)$ is such that $Y(p) = v$. For $w \in M_p$,

\[ \langle l_X(v), w\rangle = \langle [X, Y], w\rangle = \langle\nabla_X Y, w\rangle - \langle\nabla_Y X, w\rangle = \langle\nabla_{X(p)} Y, w\rangle - \langle\nabla_v X, w\rangle = -\langle\nabla_v X, w\rangle, \]

since $X(p) = 0$. We conclude:

$l_X(v) = -\nabla_v X$ for $v \in M_p$.

$X$ is identically zero if and only if $l_X = 0$ (using Lemma 25.13).

$l_X$ is a skew-symmetric (with respect to the form $\langle\ ,\ \rangle$) linear transformation of $M_p$.

Thus $\dim I(M)^p \le$ dimension of the space of skew-symmetric $n \times n$ real matrices $= n(n - 1)/2$.

Turn to the proof of (c). If $\sigma$ is a geodesic of M, we see that $X$ restricted to $\sigma$ is a Jacobi vector field. Then, writing out the Jacobi equations,

\[ \nabla_{\sigma'(s)}\nabla_{\sigma'(s)}X = R(\sigma'(s), X)(\sigma'(s)). \]

Since $\sigma$ is arbitrary,

\[ \nabla_Y\nabla_Y X = R(Y, X)(Y) \quad\text{for all } Y \in V(M). \]

We want to "polarize" this identity:

\[ \nabla_{(Y+Z)}\nabla_{(Y+Z)}X = \nabla_Y\nabla_Y X + \nabla_Z\nabla_Z X + \nabla_Z\nabla_Y X + \nabla_Y\nabla_Z X \]
\[ = R(Y + Z, X)(Y + Z) = R(Y, X)(Y) + R(Z, X)(Z) + R(Y, X)(Z) + R(Z, X)(Y). \]

Hence,

\[ \nabla_Z\nabla_Y X + \nabla_Y\nabla_Z X = R(Z, X)(Y) + R(Y, X)(Z), \]
\[ \nabla_Z\nabla_Y X + \nabla_Y\nabla_Z X = 2\nabla_Y\nabla_Z X + R(Z, Y)(X) + \nabla_{[Z, Y]}X \]

(using the Ricci identity for iterated covariant derivatives). Use the "cyclic permutation identity" for the curvature tensor:

\[ R(Z, X)(Y) - R(Z, Y)(X) - R(Y, X)(Z) = 0. \]

Putting these together gives (c). Proof of (d):

\[ [[X, Y], Z] = \nabla_{[X, Y]}Z - \nabla_Z[X, Y] = \nabla_{[X, Y]}Z - \nabla_Z\nabla_X Y + \nabla_Z\nabla_Y X \]
\[ = \text{(using (c))}\quad \nabla_{[X, Y]}Z - R(Z, Y)(X) - \tfrac12\nabla_{[Z, X]}Y + R(Z, X)(Y) + \tfrac12\nabla_{[Z, Y]}X. \]

Use the cyclic identity for $R$:

\[ R(Z, X)(Y) - R(Z, Y)(X) - R(Y, X)(Z) = 0, \]

and we get (d). Q.E.D.
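The bound in part (b) of Theorem 25.14 is attained by Euclidean space, which also makes the structure of the proof concrete. The following is a sketch under the assumption $M = R^n$ with the flat metric; the names $T_i$ and $R_{ij}$ are introduced only for this example.

```latex
% Killing fields of flat R^n (illustrative example; T_i, R_{ij} are names chosen here).
\begin{align*}
T_i &= \frac{\partial}{\partial x_i}, \quad 1 \le i \le n
  &&\text{($n$ translations)}\\
R_{ij} &= x_i\frac{\partial}{\partial x_j} - x_j\frac{\partial}{\partial x_i},
  \quad 1 \le i < j \le n
  &&\text{($n(n-1)/2$ rotations)}\\
\dim I(R^n) &= n + \frac{n(n-1)}{2} = \frac{n(n+1)}{2}.
\end{align*}
% In the notation of the proof with p = 0: the restriction map X -> X(0) is onto
% (use the T_i), and its kernel I(M)^p is spanned by the R_{ij}, each of which
% vanishes at 0 and has a constant skew-symmetric matrix of first derivatives,
% which represents the linear approximation l_X up to sign.
```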
In favorable cases, formula (d) may be applied to compute the curvature tensor of M in terms of the Lie-algebra structure of $I(M)$. The most favorable situation is when M is a symmetric homogeneous space.

Definition Suppose that a manifold M is acted on transitively by a Lie group G, with K the isotropy subgroup of G at a fixed point $p \in M$ (so that M is the coset space $G/K$). Let $\mathbf{G}$ be the Lie algebra of G, $\mathbf{K}$ the subalgebra corresponding to K. M is then said to be a symmetric homogeneous space (relative to the action of G) if there is an automorphism $\alpha$ of $\mathbf{G}$ such that

\[ \mathbf{K} = \{X \in \mathbf{G} : \alpha(X) = X\}; \qquad \alpha^2 = \text{the identity}. \]

This condition can be rephrased as follows: Put

\[ \mathbf{P} = \{X \in \mathbf{G} : \alpha(X) = -X\}. \]

Then,
(a) $\mathbf{G}$ is the direct sum of $\mathbf{K}$ and $\mathbf{P}$;
(b) $[\mathbf{K}, \mathbf{P}] \subset \mathbf{P}$;
(c) $[\mathbf{P}, \mathbf{P}] \subset \mathbf{K}$.

Conversely, if a subspace $\mathbf{P}$ of $\mathbf{G}$ exists satisfying these three conditions, $\alpha$ can be defined as the identity on $\mathbf{K}$, and minus the identity on $\mathbf{P}$, so that the symmetric condition is equivalent to the existence of such a $\mathbf{P}$. We also say that a $\mathbf{K}$ satisfying this condition is a symmetric subalgebra.

The symmetric spaces are very important for differential geometry, since on the one hand they include most of the interesting "classical" spaces, such as spaces of constant curvature, projective spaces, and Grassmann varieties, and on the other hand their geometric properties can be treated with general methods that do not work so well for more complicated sorts of spaces. They form a "max-min" class of spaces, that is, they seem to be the largest class of spaces that can be treated with certain unified techniques. Of course this definition is not the most geometric one possible, but we must refer to Helgason's book [1] for details. We shall present two general theorems about them as illustrations.
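Conditions (a), (b), and (c) can be checked by hand in the smallest interesting case. The sketch below assumes $G = SO(3)$ acting on the 2-sphere with $K = SO(2)$ the rotations about one axis; the basis $E_1, E_2, E_3$ and its bracket relations are the usual ones for the rotation algebra and are introduced here only for illustration.

```latex
% Example: the rotation algebra so(3), with basis E_1, E_2, E_3 (assumed normalization).
\begin{align*}
[E_1, E_2] &= E_3, \qquad [E_2, E_3] = E_1, \qquad [E_3, E_1] = E_2,\\
\mathbf{K} &= \operatorname{span}\{E_3\}, \qquad
\mathbf{P} = \operatorname{span}\{E_1, E_2\}.
\end{align*}
% (a) G = K + P is a direct sum, by inspection.
% (b) [E_3, E_1] = E_2 and [E_3, E_2] = -E_1 lie in P, so [K, P] is contained in P.
% (c) [E_1, E_2] = E_3 lies in K, so [P, P] is contained in K.
% The map \alpha = +1 on K, -1 on P is an automorphism:
\begin{align*}
\alpha([E_1, E_2]) &= \alpha(E_3) = E_3 = [-E_1, -E_2] = [\alpha E_1, \alpha E_2],\\
\alpha([E_3, E_1]) &= \alpha(E_2) = -E_2 = [E_3, -E_1] = [\alpha E_3, \alpha E_1].
\end{align*}
```

So the sphere $SO(3)/SO(2)$ is a symmetric homogeneous space in the sense of the definition.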
THEOREM 25.15 Suppose $M = G/K$ is a symmetric homogeneous space, and that M is a Riemannian manifold on which G acts as a group of isometries. Let $\mathbf{P}$ be the subspace of $\mathbf{G}$ satisfying (a), (b), and (c) of the definition.
(a) For $X, Y, Z \in \mathbf{P}$,

\[ R(X, Y)(Z)(p) = [Z, [X, Y]](p), \]

where $p$ is the point of M at which K is the isotropy subgroup of G. Qualitatively, the curvature tensor of M is determined completely by the algebraic properties of $\mathbf{G}$.
(b) The covariant derivative of the curvature tensor of M is zero; that is, the curvature tensor is invariant under parallel translation.

Proof. The infinitesimal action of G on M defines $\mathbf{G}$ as a Lie algebra of Killing vector fields on M. The basic conditions we need are:

The restriction map $X \to X(p)$ defines an isomorphism of $\mathbf{P}$ with $M_p$.
For $X, Y \in \mathbf{P}$, $[X, Y](p) = 0$.

Theorem 25.14 gives

\[ R(X, Y)(Z) = \nabla_{[X, Y]}Z - \tfrac12\nabla_{[Z, X]}Y + \tfrac12\nabla_{[Z, Y]}X + [Z, [X, Y]] \quad\text{for } X, Y, Z \in \mathbf{G}. \tag{25.3} \]

If we take $X, Y, Z \in \mathbf{P}$, and restrict to $p$, we have part (a). We now prove

\[ \nabla_X Y(p) = 0 \quad\text{for } X, Y \in \mathbf{P}. \tag{25.4} \]

First $\nabla_X Y(p) = (\nabla_Y X + [X, Y])(p) = \nabla_Y X(p)$. Take $Z \in \mathbf{P}$ and use the Killing equations together with this symmetry at $p$:

\[ \langle\nabla_X Y, Z\rangle(p) = -\langle\nabla_Z Y, X\rangle(p) = -\langle\nabla_Y Z, X\rangle(p) = \langle\nabla_X Z, Y\rangle(p) = \langle\nabla_Z X, Y\rangle(p) = -\langle\nabla_Y X, Z\rangle(p) = -\langle\nabla_X Y, Z\rangle(p), \]

or $\langle\nabla_X Y, Z\rangle(p) = 0$; hence, since the $Z(p)$ fill up $M_p$, forcing $\nabla_X Y(p) = 0$.

The covariant derivative of the curvature tensor is an $F(M)$-multilinear mapping of

\[ V(M) \times V(M) \times V(M) \times V(M) \to V(M), \]

denoted by $R(\ ;\ ,\ )(\ )$, and defined by the identity

\[ \nabla_W(R(X, Y)(Z)) = R(W; X, Y)(Z) + R(\nabla_W X, Y)(Z) + R(X, \nabla_W Y)(Z) + R(X, Y)(\nabla_W Z) \quad\text{for } X, Y, Z, W \in V(M). \tag{25.5} \]

(This identity can be turned around actually to define the covariant derivative of $R$, or any tensor field, for that matter. One simply puts $R(W; X, Y)(Z)$ on the left-hand side, everything else on the right-hand side, uses the resulting formula to define $R(W; X, Y)(Z)$ in terms of $R$ and $\nabla$, and then verifies that the formula is $F(M)$-multilinear.) We now demonstrate how this can be used to show that the vanishing of $R(W; X, Y)(Z)$ identically is necessary and sufficient for the invariance of $R$ (or any tensor field, for that matter) under parallel translation. Let $\sigma(t)$, $0 \le t \le 1$, be a regular curve of M, and let $v_1(t)$, $v_2(t)$, $v_3(t)$ be self-parallel vector fields on $\sigma$; that is,

\[ \nabla v_1 = 0 = \nabla v_2 = \nabla v_3. \]

Invariance under parallel translation of $R$ means, then, that

\[ \nabla(R(v_1, v_2)(v_3)) = 0. \]
It is sufficient to verify this in a special case, namely, suppose $X, Y, Z, W$ are vector fields such that

\[ \sigma'(t) = W(\sigma(t)), \quad X(\sigma(t)) = v_1(t), \quad Y(\sigma(t)) = v_2(t), \quad Z(\sigma(t)) = v_3(t), \quad 0 \le t \le 1. \]

Then

\[ \nabla v_1(t) = \text{(by definition)}\ \nabla_W X(\sigma(t)), \quad \nabla v_2(t) = \nabla_W Y(\sigma(t)), \quad\text{etc.} \]
\[ \nabla(R(v_1, v_2)(v_3)) = \nabla_W(R(X, Y)(Z))(\sigma(t)) = 0, \]

if $R(\ ;\ ,\ )(\ )$ vanishes identically. The converse follows by just inverting the argument.

After this excursion into tensor analysis, we return to proving part (b). Since G acts transitively on M and any isometry preserves $R$ and all its covariant derivatives (the proof is left to the reader), it suffices to prove that the covariant derivative of $R$ vanishes at $p$. Using (25.4) and (25.5), it suffices to prove that

\[ \nabla_W(R(X, Y)(Z))(p) = 0 \quad\text{for } X, Y, Z, W \in \mathbf{P}. \]

By (25.4), $\nabla_W([Z, [X, Y]])(p) = 0$. Thus, when we apply $\nabla_W$ to both sides of (25.3), it suffices to deal with a term of the form

\[ \nabla_W\nabla_{[X, Y]}Z = \text{(using Theorem 25.14)}\quad R(W, Z)([X, Y]) + \tfrac12\nabla_{[W, [X, Y]]}Z. \]

But $[X, Y] \in \mathbf{K}$; hence $[W, [X, Y]] \in \mathbf{P}$, and we see that both terms vanish at $p$, which finishes the proof.
26
Deformation of Submanifolds in Riemannian Spaces
Let M and N be Riemannian manifolds with, say, $\dim N < \dim M$. The embedding and deformation problems can be described as follows:

(A) Does there exist an isometric embedding of N in M, that is, a mapping $\phi: N \to M$ that is 1-1 and satisfies

\[ \|\phi_*(v)\| = \|v\| \quad\text{for } v \in T(N)? \]

(B) Given two isometric embeddings $\phi_1, \phi_2: N \to M$, does there exist an isometry $\alpha: M \to M$ such that $\alpha\phi_1 = \phi_2$?
If not, how can the family of all isometric embeddings be described? These may be described as the two main problems of Riemannian geometry. Both involve the theory of nonlinear partial differential equations (mostly of the yet-to-be-created kind); hence we cannot present much more than a series of incomplete comments, which at most will serve to introduce the reader to the problems. In addition, in this chapter we mean to introduce the more general ideas that will appear in the second volume of this book.

The Generalization of Developable Surfaces

Classically, a developable surface is a surface in Euclidean 3-space satisfying the condition: Each point of the surface is contained in a curve having the property that the tangent plane to the surface is self-parallel along the curve. One of the main theorems of elementary surface theory is that the Gaussian curvature of the developable surface is zero and that, conversely, if the Gaussian curvature is zero and if certain local regularity conditions are satisfied, then the surface is developable. (Actually, the classical theory of surfaces is imprecise on this point.) This definition makes sense for an arbitrary submanifold of a Riemannian manifold if, of course, one replaces "parallel" by the parallel translation of Riemannian geometry. Let us make this precise.

Suppose N is a submanifold of the Riemannian manifold M, $\nabla$ is the covariant derivative operation associated with the Riemannian structure on M, and $\langle\ ,\ \rangle$ is the inner product. Let Q be a submanifold of N such that:

For each curve $\sigma(t)$, $0 \le t \le 1$, in Q, the field $t \to N_{\sigma(t)}$ is self-parallel under parallel translation along $\sigma$.

The condition for this is that $t \to \nabla v(t)$ be tangent to N for any vector field $t \to v(t)$ along $\sigma$ that is tangent to N. This implies the following condition:

\[ \nabla_X Y \text{ is tangent to } N, \text{ for each } X, Y \in V(M) \text{ such that } X \text{ is tangent to } Q,\ Y \text{ is tangent to } N. \tag{26.1} \]

Let $S_{(\ )}(\ ,\ )$ be the second fundamental form of N; that is,

\[ S_u(X, Y) = \langle\nabla_X Y, u\rangle \quad\text{for each } X, Y \text{ tangent to } N, \text{ each } u \in N^\perp. \tag{26.2} \]

Then (26.1) is equivalent to

\[ S_{N_p^\perp}(Q_p, N_p) = 0 \quad\text{for all } p \in Q. \tag{26.3} \]

Equation (26.1) can now be iterated: $\nabla_{X_1}\nabla_X Y$ is tangent to N, for each $X, X_1$ tangent to Q, each $Y$ tangent to N. But

\[ \nabla_{X_1}\nabla_X Y = \nabla_X\nabla_{X_1}Y + R(X_1, X)(Y) + \nabla_{[X_1, X]}Y, \]

where the last term is tangent to N by (26.1); hence,

\[ R(Q_p, Q_p)(N_p) \subset N_p \quad\text{for all } p \in Q. \tag{26.4} \]

We can iterate by covariantly differentiating (26.4), giving the conditions

\[ R(Q_p; \ldots; Q_p; Q_p, Q_p)(N_p) \subset N_p \quad\text{for all } p \in Q. \tag{26.5} \]

($R(\ ; \ldots;\ ,\ )(\ )$ denotes all successive covariant derivatives of the curvature tensor, as defined in Chapter 23.)
THEOREM 26.1 Let N be a submanifold of a Riemannian manifold M, Q a submanifold of N. Conditions (26.3) through (26.5) are necessary and sufficient that the tangent spaces to N be self-parallel along Q.

That these conditions are sufficient is easily seen by reversing these steps. We now search for the conditions for the existence of such a Q. Condition (26.2) suggests the following definition.

Definition Let N be a submanifold of M. A vector $v \in N_p$ is a characteristic vector (of the second fundamental form) if

\[ S_u(v, N_p) = 0 \quad\text{for all } u \in N_p^\perp. \tag{26.6} \]

$C_p$ denotes the set of characteristic vectors ($C_p(N)$ if ambiguity is possible). Let $C_p'$ be the following subspace of $C_p$:

\[ C_p' = \{v \in C_p : R(v, N_p)(N_p) \subset N_p, \tag{26.7a} \]
\[ \text{and } R(v; N_p; \ldots; N_p; N_p, N_p)(N_p) \subset N_p\}. \tag{26.7b} \]
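The definition of characteristic vectors can be made concrete on the simplest developable surface. The sketch below assumes N is the unit cylinder in Euclidean 3-space, parametrized by $(\theta, z)$; the parametrization and the unit normal $n$ are choices made only for this example.

```latex
% Characteristic vectors of the unit cylinder N in R^3 (illustrative example).
\begin{align*}
(\theta, z) &\mapsto (\cos\theta, \sin\theta, z), \qquad
\partial_\theta = (-\sin\theta, \cos\theta, 0), \quad
\partial_z = (0, 0, 1), \quad
n = (\cos\theta, \sin\theta, 0),\\
\nabla_{\partial_\theta}\partial_\theta &= (-\cos\theta, -\sin\theta, 0) = -n, \qquad
\nabla_{\partial_\theta}\partial_z = \nabla_{\partial_z}\partial_\theta
  = \nabla_{\partial_z}\partial_z = 0,\\
S_n(\partial_\theta, \partial_\theta) &= -1, \qquad
S_n(\partial_\theta, \partial_z) = 0, \qquad
S_n(\partial_z, \partial_z) = 0 .
\end{align*}
% Hence C_p = span{ \partial_z } at every point p, by (26.6).
% Since R = 0 in R^3, conditions (26.7a) and (26.7b) hold trivially, so C_p' = C_p
% and dim C_p' = 1 is constant.
% The integral curves of C' are the rulings z -> (cos \theta_0, sin \theta_0, z):
% straight lines of R^3 along which the tangent plane of the cylinder is constant.
```

This is the classical picture of a developable surface, recovered from the general definitions.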
THEOREM 26.2 Let N be a submanifold of M, with $C_p'$ defined for $p \in N$ by (26.7). Suppose that

\[ \dim C_p' = \text{constant for } p \in N. \tag{26.8} \]

Then, through each $p \in N$, there is a unique maximal connected submanifold Q such that

\[ Q_q = C_q' \quad\text{for each } q \in Q. \tag{26.9} \]

Further, Q is a totally geodesic submanifold of M (hence also of N) such that the tangent spaces to N are self-parallel along Q.

Proof. Let $C' = \{X \in V(M) : X(p) \in C_p' \text{ for each } p \in N\}$. Condition (26.8) implies that $C'(p) = C_p'$ for all $p \in N$. Thus $C'$ forms a nonsingular vector-field system on N. We want to show that

\[ \nabla_X Y \in C' \quad\text{for } X, Y \in C'. \tag{26.10} \]

This will also show that $[X, Y] \in C'$; hence, that $C'$ is completely integrable. The global form of the Frobenius complete integrability theorem will imply the existence of Q. Formula (26.10) will also imply that Q is a totally geodesic submanifold of M.

Now we prove (26.10): $\nabla_X Y(p) \in C_p$ for all $p \in N$ if and only if

\[ \nabla_{\nabla_X Y}Z \text{ is tangent to } N, \text{ for } Z \text{ tangent to } N. \tag{26.11} \]

But

\[ \nabla_{\nabla_X Y}(Z) = \nabla_Z\nabla_X Y + [\nabla_X Y, Z] \]
\[ = \nabla_X\nabla_Z Y + R(Z, X)(Y) + \nabla_{[Z, X]}Y + [\nabla_X Y, Z] \]
\[ = \nabla_X\nabla_Y Z + \nabla_X([Z, Y]) + R(Z, X)(Y) + \nabla_{[Z, X]}Y + [\nabla_X Y, Z], \]

which is plainly tangent to N, which proves (26.11). Now, for $Z, W$ tangent to N, (25.5) gives

\[ R(\nabla_X Y, Z)(W) = \nabla_X(R(Y, Z)(W)) - R(X; Y, Z)(W) - R(Y, \nabla_X Z)(W) - R(Y, Z)(\nabla_X W), \]

which we see is tangent to N.
The further conditions necessary to prove (26.10) follow in a similar way, using the definition of iterated covariant derivatives of $R$. Q.E.D.

So far, we have been working with an arbitrary M. In case M is special, for example, of constant curvature, results even closer to the classical theory of developable surfaces can be obtained. For example:

THEOREM 26.3 Suppose that N is a submanifold of M, that

\[ C = \{X \in V(M) : X(p) \in C_p \text{ for all } p \in N\}, \]

that $\dim C_p$ is constant for $p \in N$, and that

\[ R(C_p, C_p)(N_p) \subset N_p \quad\text{for all } p \in N. \tag{26.12} \]

Then the vector field system $C$ is completely integrable, and if Q is one of its integral submanifolds, then the tangent spaces to N are self-parallel along Q. If, further,

\[ R(C_p, N_p)(N_p) \subset N_p, \tag{26.13} \]

then Q is a totally geodesic submanifold of M.

We leave it as an exercise to the reader to show that (26.13) is automatically satisfied if M has constant curvature.

Proof. We prove that $C$ is completely integrable, that is, $[C, C] \subset C$. For $X, Y \in C$, $Z$ tangent to N,

\[ \nabla_{[X, Y]}Z = \nabla_X\nabla_Y Z - \nabla_Y\nabla_X Z - R(X, Y)(Z), \]

which is tangent to N. The rest of the theorem follows routinely from our work in proving Theorems 26.1 and 26.2.

In summary, we may say that this work develops a generalization of the classical theory of developable surfaces to reasonably general Riemannian situations. The classical theory completely ignores the possibility of singularities in the developable structure, and only recently (Massey [1] and Hartman-Nirenberg [1]) has work begun on the singularities. A developable surface in 3-space with singularities may be described as a surface $S \subset R^3$ such that each point $p \in S$ satisfies either (a) or (b):
(a) $p$ is contained in a line segment lying in $S$ along which the tangent plane is self-parallel.
(b) The principal curvatures are zero at $p$, that is, the second fundamental form is zero at $p$.
366
Part 3. Global Riemannian Geometry
(The necessary and sufficient condition for this is that S be of zero Gaussian curvature.) In this form the conditions may be generalized to higher dimensional and Riemannian situations, but this goes beyond the scope of this book.
Deformation Problems for Riemannian Submanifolds

In this section we shall compare different isometric embeddings of the same Riemannian manifold $N$ into a Riemannian manifold $M$. Hence, for the sake of clarity, we explicitly label submanifold mappings, supposing that $\phi$ and $\phi'$ are two isometric embeddings of $N$ into $M$. In addition to being isometries, we shall suppose that $\phi$ and $\phi'$ satisfy the following condition, which will simplify the discussion considerably. (Note that it is automatically satisfied if $M$ is of constant curvature, which is the classical situation.)

If $p \in N$, $u, v \in N_p$, then the sectional curvatures of the planes spanned by $\phi_*(u), \phi_*(v)$ and by $\phi'_*(u), \phi'_*(v)$ are the same. (26.14)

We shall not repeat this fixed assumption throughout this section. Let $\phi$ and $\phi'$ be two isometric embeddings of a Riemannian manifold $N$ in a Riemannian manifold $M$. Let $\phi(N)^\perp$ and $\phi'(N)^\perp$ be the normal vector bundles to the submanifolds determined by $\phi$ and $\phi'$. Then $\phi$ and $\phi'$ are rigidly related if there is a map $\psi: \phi(N)^\perp \to \phi'(N)^\perp$ such that:

For $p \in N$, $\psi$ maps the normal tangent vectors to $\phi(N)$ at $\phi(p)$ linearly onto the normal tangent vectors to $\phi'(N)$ at $\phi'(p)$, preserving the inner product on these normal vectors that is inherited from $M$. (26.15a)

For $p \in N$, $u \in \phi(N)^\perp_{\phi(p)}$, $v, w \in N_p$,

$$S_u(\phi_*(v), \phi_*(w)) = S'_{\psi(u)}(\phi'_*(v), \phi'_*(w)),$$ (26.15b)

where $S_{(\ )}(\ ,\ )$ and $S'_{(\ )}(\ ,\ )$ are, respectively, the second fundamental forms of $\phi(N)$ and $\phi'(N)$.

Condition (26.15a) can be summarized in the language of fiber bundles by saying that $\psi$ is a bundle isomorphism of the two normal vector bundles defined on $N$ by the isometric embeddings. It is easy to see that such isomorphisms exist locally. Whether they exist globally is a purely topological question, toward which much current research in differential topology is applicable. For a geometer, (26.15b) is the key condition. However, we must defer a further explanation of its geometric meaning. Suffice it to say that in case $M$ is a space of constant curvature (which is the classical situation), it suffices to guarantee that $\phi$ and $\phi'$ are related by an isometry $\alpha$ of $M$; that is, $\phi = \alpha\phi'$.

In favorable cases it can be proved that two isometric embeddings are rigidly related for purely algebraic reasons. The following theorem offers at least a qualitative explanation of this.
THEOREM 26.4 Let $\phi$ be an isometric embedding of $N$ in $M$. Suppose that $m = \dim M$, $n = \dim N$. Choose the following range of indices and summation conventions:

$$1 \le i, j, \ldots \le n; \qquad 1 \le a, b, \ldots \le m - n.$$

For each $p \in N$, choose a fixed orthonormal basis $(v_i)$ of $N_p$ and $(u_a)$ of $\phi(N)^\perp_{\phi(p)}$. Consider the second fundamental forms $(S_{u_a}(\ ,\ ))$ as symmetric bilinear forms on $N_p$, and let $\theta_{ia}$ be the 1-covectors, that is, 1-forms, on $N_p$, defined by

$$\theta_{ia}(v) = S_{u_a}(v, v_i) \quad \text{for } v \in N_p.$$

Suppose that:

(a) Any other set $\theta'_{ia}$ of 1-forms on $N_p$ related to the $\theta_{ia}$ as

$$\theta_{ia} \wedge \theta_{ja} = \theta'_{ia} \wedge \theta'_{ja}, \qquad \theta'_{ia}(v_j) = \theta'_{ja}(v_i)$$ (26.16)

must be deducible from the $\theta_{ia}$ by relations of the form

$$\theta'_{ia} = M_{ab}\,\theta_{ib},$$ (26.17)

where $(M_{ab})$ is an orthogonal matrix, that is, satisfies $M_{ab}M_{cb} = \delta_{ac}$. (In other words, relations (26.16) must imply (26.17), for some choice of $M_{ab}$.)

(b) The forms $S_{u_a}(\ ,\ )$ are linearly independent.

Then any other isometric embedding of $N$ in $M$ (satisfying (26.14) as always) is rigidly related to $\phi$.

Proof. Let $\phi': N \to M$ be another isometric embedding. For $p \in N$, let $(u_a')$ be an orthonormal basis for $\phi'(N)^\perp_{\phi'(p)}$, let $S'_{u_a'}(\ ,\ )$ be the corresponding second fundamental forms, and let $\theta'_{ia}$ be the 1-forms defined by $\theta'_{ia}(v) = S'_{u_a'}(v, v_i)$. That (26.16) is satisfied is implied by the fundamental relation between second fundamental forms and curvature, and the fact that (26.14) is satisfied. Let $M_{ab}$ be the orthogonal matrix satisfying (26.17). Define a linear mapping $\psi_p: \phi(N)^\perp_{\phi(p)} \to \phi'(N)^\perp_{\phi'(p)}$ by the condition

$$\psi_p(u_a) = M_{ba}\,u_b'.$$

Now it is readily verified that $\psi_p$ is independent of the bases we have used to define it; hence we may consider it as defining a global map $\psi: \phi(N)^\perp \to \phi'(N)^\perp$, satisfying (26.15a). Further, condition (26.15b) follows trivially from (26.17). Q.E.D.
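The step from (26.17) to (26.15b) can be spelled out on basis vectors. The following is only a sketch in the notation of the proof: it assumes the reading $\theta'_{ia} = M_{ab}\theta_{ib}$ of (26.17) and $\psi_p(u_a) = M_{ba}u_b'$, identifies $N_p$ with its images under $\phi_*$ and $\phi'_*$, and uses the fact that an orthogonal matrix also satisfies $M_{ba}M_{bc} = \delta_{ac}$.

```latex
\begin{align*}
S'_{\psi(u_a)}(v, v_i)
  &= M_{ba}\, S'_{u_b'}(v, v_i)
     && \text{(linearity in the normal vector)} \\
  &= M_{ba}\, \theta'_{ib}(v)
     && \text{(definition of } \theta'_{ib}\text{)} \\
  &= M_{ba} M_{bc}\, \theta_{ic}(v)
     && \text{(by (26.17))} \\
  &= \delta_{ac}\, \theta_{ic}(v) = \theta_{ia}(v) = S_{u_a}(v, v_i)
     && \text{(orthogonality of } M\text{)}
\end{align*}
```

Since both sides are linear in the normal vector and in $v$, and the $v_i$ span $N_p$, equality on these basis elements gives (26.15b) in general.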
Remarks. The condition that $S_{u_1}(\ ,\ ), \ldots, S_{u_{m-n}}(\ ,\ )$ be linearly independent forms forces a condition between $n$ and $m$, namely,

$$m - n \le \frac{n(n+1)}{2} \quad \text{or} \quad m \le \frac{n(n+3)}{2}.$$

(Since $n(n+1)/2$ is the maximal number of linearly independent quadratic forms on a vector space of dimension $n$, this corresponds to the intuitive fact that when $m$ is too big in relation to $n$, there is just too much freedom in the perpendicular direction to hope to get rigidity.) Finding effective sufficient conditions for reasonably general cases that (26.17) be satisfied is a difficult algebraic problem which is beyond the scope of this book. In case $\dim M - \dim N = 1$, that is, $N$ is a hypersurface in $M$, the answer is quite simple, and was found by E. Cartan [3]. We shall follow his proof.

THEOREM 26.5 Let $\phi: N \to M$ be an isometric embedding of $N$ as a hypersurface in $M$ such that the following condition is satisfied:

For $p \in N$, the dimension of the space of characteristic vectors of the second fundamental form of $N$ at $\phi(p)$ is no greater than $n - 3$. (26.18)

Then $\phi$ is rigidly related to any other isometric embedding of $N$ in $M$.

Proof. Let $S(\ ,\ ) = S_{u_1}(\ ,\ )$, $\theta_i = \theta_{i1}$, $\theta_i' = \theta'_{i1}$, in the notation of Theorem 26.4.
Let $C_p$ be the set of characteristic vectors of $S$; that is,

$$C_p = \{v \in N_p : S(v, w) = 0 \text{ for all } w \in N_p\}.$$

Notice also that

$$C_p = \{v \in N_p : \theta_i(v) = 0 \text{ for } 1 \le i \le n\}.$$

Equation (26.16) takes the form

$$\theta_i \wedge \theta_j = \theta_i' \wedge \theta_j'.$$

After at most relabeling things, we can suppose that $\theta_1, \ldots, \theta_a$ are a maximal linearly independent set from among the $\theta_i$ ($a = n - \dim C_p$, since $\theta_1 = 0 = \cdots = \theta_a$ defines $C_p$). Thus the 2-covectors $\theta_i \wedge \theta_j$, $1 \le i < j \le a$, are linearly independent; hence so are the $\theta_i' \wedge \theta_j'$ for $1 \le i < j \le a$. This implies that the $\theta_1', \ldots, \theta_a'$ are linearly independent. Turning the argument around, we see that $\theta_1', \ldots, \theta_a'$ are a maximal set of linearly independent forms from the $\theta_i'$.
Now $a \ge 3$. For $u \in C_p$, $1 \le i, j \le a$,

$$0 = u \,\lrcorner\, (\theta_i' \wedge \theta_j') = \theta_i'(u)\,\theta_j' - \theta_j'(u)\,\theta_i'.$$

Since $i$ can be chosen different from $j$ (requiring only $a \ge 2$, as a matter of fact), $\theta_i'(u) = 0$. By symmetry, we see that

$$C_p = \{u \in N_p : \theta_i'(u) = 0\} = \{u \in N_p : S'(u, w) = 0 \text{ for all } w \in N_p\},$$

where $S'$ is the symmetric (by assumption (26.16)) form defined by

$$S'(u, v_i) = \theta_i'(u) \quad \text{for } u \in N_p.$$

Notice that we must prove that

$$S(\ ,\ ) = \pm S'(\ ,\ ),$$

since a one-dimensional orthogonal matrix must be $\pm 1$. Since the characteristic vectors of $S$ and $S'$ are the same, it suffices to prove this relation on $C_p^\perp$ or, what amounts to the same thing, to suppose the special case $C_p = 0$ or $a = n$. Now

$$0 = \theta_1 \wedge \theta_1 \wedge \theta_2 = \theta_1 \wedge \theta_1' \wedge \theta_2'.$$

Thus $\theta_1$, $\theta_1'$, and $\theta_2'$ must be linearly dependent, or since $\theta_1'$ and $\theta_2'$ are independent, $\theta_1$ is dependent on $\theta_1'$ and $\theta_2'$. Similarly, $\theta_1$ is dependent on $\theta_1'$ and $\theta_3'$. However, since $\theta_1'$, $\theta_2'$, $\theta_3'$ are independent (here we use $a \ge 3$), $\theta_1$ must be dependent on $\theta_1'$ alone, say $\theta_1' = a_1\theta_1$. But 1 has played no privileged role; hence

$$\theta_i' = a_i\theta_i \quad \text{for } 1 \le i \le a \quad \text{(no summation)}.$$

But then

$$\theta_i' \wedge \theta_j' = a_i a_j\, \theta_i \wedge \theta_j \quad \text{(no summation)}.$$

Hence,

$$a_i a_j = 1 \quad \text{for } 1 \le i < j \le a.$$

In particular, $a_1 a_2 = a_1 a_3$; hence $a_2 = a_3$. By symmetry, $a_i = a_j$ for $1 \le i, j \le a$; hence $a_1^2 = 1$, which is what is required to finish the proof.
Part 4
DIFFERENTIAL GEOMETRY AND THE CALCULUS OF VARIATIONS: ADDITIONAL TOPICS IN DIFFERENTIAL GEOMETRY
27 First-Order Invariants of Submanifolds and Convexity for Affinely Connected Manifolds
As in Euclidean geometry, many of the geometric properties of submanifolds of Riemannian manifolds really involve only the underlying affine connection. It is worth our while to change the point of view from that of the calculus of variations and attempt to describe the geometric invariants of submanifolds more systematically, using mainly the underlying affine connection. As a bonus, of course, the results will hold also for pseudo-Riemannian manifolds, which is of interest for applications to physics (for example, the Theory of General Relativity).

In this chapter, $M$ will be a manifold with a given affine connection, denoted by $\nabla$, and $N$ will be a submanifold. We shall assume that $\nabla$ has zero torsion tensor. Recall that $\nabla$ is defined by a bilinear mapping of $V(M) \times V(M) \to V(M)$, say, $(X, Y) \to \nabla_X Y$, satisfying

$$\nabla_X(fY) = X(f)Y + f\,\nabla_X Y,$$ (27.1)

$$\nabla_{fX} Y = f\,\nabla_X Y \quad \text{for } X, Y \in V(M),\ f \in F(M).$$ (27.2)
Recall the significance of these laws. Equation (27.2) implies that the connection depends "tensorially" on $X$, but (27.1) implies that this is not so in $Y$. However, it may be possible to consider quotient relations that kill off the first term on the right-hand side of (27.1), and hence convert the covariant derivative operation into a genuine "tensor field." We can now apply this remark.

Definition

For each point $p \in N$, define a bilinear mapping $S: N_p \times N_p \to M_p/N_p$ as follows: For $u$, $v$ tangent vectors to $N$ at $p$, choose vector fields $X$ and $Y$ that are tangent to $N$, that satisfy $X(p) = u$, $Y(p) = v$, and set

$$S(u, v) = \text{image of } \nabla_X Y(p) \text{ under the quotient projection map } M_p \to M_p/N_p.$$

To verify that $S(\ ,\ )$ is well defined, notice that the map $(X, Y) \to \nabla_X Y(p) \to M_p/N_p$ remains bilinear when the vector fields $X$ and $Y$ tangent to $N$ are multiplied by functions. $S(\ ,\ )$ is called the second fundamental form of $N$.

To get a real-valued bilinear form from $S$, choose any 1-covector $\omega$ at $p$ that is zero on $N_p$, and so passes to the quotient to define a linear form on $M_p/N_p$. Define:

$$S_\omega(u, v) = \omega(S(u, v)) = \omega(\nabla_X Y),$$

if $X$ and $Y$ are vector fields tangent to $N$ with $X(p) = u$, $Y(p) = v$.
We now present in formal lemmas some of the main geometric properties of this form.

LEMMA 27.1 $S(\ ,\ )$ is a symmetric bilinear form.

Proof. The torsion tensor $T(\ ,\ )$ is

$$T(X, Y) = \nabla_X Y - \nabla_Y X - [X, Y].$$

If $X$ and $Y$ are tangent to $N$, then so is $[X, Y]$. Thus $T(X, Y) = 0$ implies $(\nabla_X Y - \nabla_Y X)(p) \equiv 0 \pmod{N_p}$, which shows that $S(X(p), Y(p)) = S(Y(p), X(p))$. Q.E.D.

LEMMA 27.2 Let $u \in N_p$ be a tangent vector to $N$ that is tangent to a geodesic of the affine connection on $M$ that lies on $N$. Then $S(u, u) = 0$. In particular, if $N$ is geodesic at $p$, $S(\ ,\ )$ is identically zero at $p$.

Proof. Let $X$ be a vector field tangent to $N$ such that $X(p) = u$ and the integral curve of $X$ beginning at $p$ is a geodesic. By the definition of geodesic, $\nabla_X X = 0$ along the geodesic, in particular at $p$, whence $S(u, u) = 0$. $N$ is said to be geodesic at $p$ if all geodesics beginning at $p$ and tangent there to $N$ remain tangent to $N$. That $S(\ ,\ )$ identically zero at $p$ is a necessary condition should now be evident.
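A standard example may help fix the definition. The setting below is an assumption introduced only for illustration and is not taken from the text: $M = R^n$ with the flat connection, $N$ the unit sphere, and $\omega = x_i\,dx_i$, which vanishes on tangent vectors to $N$. Components of vector fields are written $X_i$, $Y_i$, with summation over repeated indices.

```latex
% Assumed setting: M = R^n, flat connection, N = { x : x_i x_i = 1 },
% \omega = x_i\,dx_i (zero on N_p); X, Y vector fields tangent to N.
\begin{align*}
Y \text{ tangent to } N
  &\;\Longrightarrow\; x_i Y_i = 0 \text{ along } N, \\
0 = X(x_i Y_i)
  &= X_i Y_i + x_i\, X(Y_i)
   = \langle X, Y \rangle + \omega(\nabla_X Y), \\
S_\omega(u, v)
  &= \omega(\nabla_X Y)(p) = -\langle u, v \rangle .
\end{align*}
```

The result is visibly symmetric, in agreement with Lemma 27.1, and $S_\omega(u, u) \neq 0$ for $u \neq 0$, consistent with Lemma 27.2: no straight line of $R^n$ lies on the sphere.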
LEMMA 27.3 Let $p \to H_p \subset M_p$ be a field of subspaces of the tangent spaces to $M$, defined for $p \in N$, such that

$$M_p = N_p \oplus H_p.$$

Such a field defines an induced affine connection on $N$, denoted by $\nabla^N$, in the following way: For $X^*, Y^* \in V(N)$, choose $X, Y \in V(M)$ which reduce to $X^*$ and $Y^*$ on $N$. For $p \in N$, put

$$\nabla^N_{X^*} Y^*(p) = \text{projection of } \nabla_X Y(p) \text{ on } N_p.$$

Proof. The main point is to show that the projection of $\nabla_X Y(p)$ on $N_p$ is zero if $X$ or $Y$ is zero on $N$. But this should be rather evident.

If $H_p$ is identified with $M_p/N_p$, note that this definition of $\nabla^N$ can be rewritten as

$$\nabla_X Y - \nabla^N_X Y = S(X, Y) \quad \text{for } X, Y \in V(M) \text{ tangent to } N.$$

LEMMA 27.4 If $S(\ ,\ )$ is identically zero for all points $p \in N$, then $N$ is totally geodesic; that is, $N$ is geodesic at each of its points.

Proof. Since this is a purely local problem, we can suppose that there is at least one such field $p \to H_p$ of tangent subspaces enabling us to define an induced affine connection on $N$. Notice that $S(\ ,\ ) = 0$ implies that this connection really does not depend on $H$ (hence there is an induced connection globally defined on $N$). Also, $\nabla_X Y = \nabla^N_X Y$ for $X$, $Y$ tangent to $N$. In particular, a geodesic of $N$ in the induced connection is a geodesic for the given connection on $M$. By the uniqueness of geodesics having given tangent vector, this proves that $N$ is geodesic at each point. Q.E.D.
Convex Hypersurfaces and Functions

Now we turn to the following question: Let $f$ be a real-valued function on $M$, and let $p$ be a point of $N$ that is a critical point for $f$ restricted to $N$. Then $df$ restricted to $N$ is zero at $p$; that is,

$$df(u) = u(f) = 0 \quad \text{for all } u \in N_p.$$ (27.3)

We now ask how to compute the Hessian of $f$, restricted to $N$, at the critical point $p$. Now, in general, the Hessian is a quadratic form, denoted by $u \to h_f(u)$, on $N_p$. It can be defined as follows: For $u \in N_p$, pick a vector field $X$ that is tangent to $N$, satisfies $X(p) = u$, and put

$$h_f(u) = X(X(f))(p).$$ (27.4)

We want to express (27.4) more precisely in terms of the geometry of $N$ and $f$. Suppose that $f$ itself does not have a critical point at $p$. Then (27.3) expresses the geometric fact that $N$ and the level surface $f^{-1}(f(p))$ are tangent at $p$. Let $S'_{df}(\ ,\ )$ be the second fundamental form of this level surface. A basic formula is

$$h_f(u) = S_{df}(u, u) - S'_{df}(u, u).$$ (27.5)

To prove (27.5), start from (27.4):

$$X(X(f)) = X(df(X)) = \nabla_X(df)(X) + df(\nabla_X X).$$

Now $df(\nabla_X X)(p) = S_{df}(u, u)$, by definition.
$\nabla_X(df)(X)(p)$ depends only on the value of $X$ at $p$; hence we can choose a vector field $Y$ satisfying $Y(f) = 0$, $Y(p) = X(p)$. Then

$$\nabla_X(df)(X)(p) = \nabla_Y(df)(Y)(p) = Y(df(Y))(p) - df(\nabla_Y Y)(p) = -S'_{df}(Y(p), Y(p)),$$

which proves (27.5). To realize the significance of this formula, let us compute $S'_{df}(\ ,\ )$ in case $M$ is Euclidean space, with coordinates $(x_i)$, $1 \le i \le n$, and the flat affine connection (that is, $\nabla_{\partial/\partial x_i}(\partial/\partial x_j) = 0$):

$$df = \frac{\partial f}{\partial x_i}\,dx_i.$$

Suppose $Y = A_i(\partial/\partial x_i)$ satisfies $df(Y) = 0$; that is, $A_i(\partial f/\partial x_i) = 0$.
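The computation of $S'_{df}(Y, Y)$ in this Euclidean case can be sketched as follows. It assumes, as in the text, that $df(Y) = 0$ holds identically, so that the relation $A_j(\partial f/\partial x_j) = 0$ may itself be differentiated along $Y$; summation over repeated indices is understood.

```latex
\begin{align*}
\nabla_Y Y
  &= A_i \frac{\partial A_j}{\partial x_i}\,\frac{\partial}{\partial x_j}
     && \text{(flat connection)} \\
S'_{df}(Y, Y) = df(\nabla_Y Y)
  &= A_i \frac{\partial A_j}{\partial x_i}\,\frac{\partial f}{\partial x_j} \\
0 = Y\!\left( A_j \frac{\partial f}{\partial x_j} \right)
  &= A_i \frac{\partial A_j}{\partial x_i}\,\frac{\partial f}{\partial x_j}
   + A_i A_j \frac{\partial^2 f}{\partial x_i\, \partial x_j} \\
S'_{df}(Y, Y)
  &= -A_i A_j \frac{\partial^2 f}{\partial x_i\, \partial x_j}
\end{align*}
```

So, on vectors tangent to a level surface, $S'_{df}$ is the negative of the ordinary Hessian matrix of $f$ evaluated on those vectors.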
Recall that the function $f$ is said to be convex if its Hessian matrix $(\partial^2 f/\partial x_i\,\partial x_j)$ is positive semidefinite. This then implies that:

The form $v \to S'_{df}(v, v)$ is nonpositive. (27.6)

Such a condition, verified at all noncritical points of $f$, is then the appropriate generalization of "convex function" to an affinely connected manifold. Similarly, we can say that a hypersurface $N$ of $M$ is convex if, for each $p \in N$ and each form $\lambda$ on $M_p$ with $\lambda(N_p) = 0$, the following condition is satisfied:

The form $v \to S_\lambda(v, v)$ on $N_p$ does not change sign.

Now we turn to the question of proving "geometric convexity" of regions bounded by convex hypersurfaces.

THEOREM 27.5 Suppose that $f$ is a function on $M$ such that $f(p) = 0$, $f^{-1}(0)$ is a hypersurface that is geodesic at $p$ and is tangent to the hypersurface $N$ at $p$. Suppose that $S_{df}(v, v) \le 0$ for all $v \in N_p$. Then $p$ has a neighborhood $U$ such that

$$f(q) \le 0 \quad \text{for all } q \in N \cap U;$$

that is, $N$ lies, in a neighborhood of $p$, completely on "one side" of the hypersurface $f^{-1}(0)$. Notice the analogy with the statement in Euclidean geometry that a convex hypersurface lies completely on one side of its tangent plane.

Proof. Since $f^{-1}(0)$ is geodesic at $p$, its second fundamental form vanishes at $p$ by Lemma 27.2, so the second term of (27.5) drops out and $h_f(v) = S_{df}(v, v)$ on $N_p$. Using (27.5), we see that the function $q \to f(q)$, for $q \in N$, has a relative maximum at $q = p$. Q.E.D.
THEOREM 27.6 Suppose that $f$ is a real-valued function on $M$ such that: For $t \in [0, 1]$, the hypersurface $f^{-1}(t)$ is strictly convex in the sense that $S'_{df}(v, v) < 0$ for all nonzero $v \in M_p$ satisfying $v(f) = 0$, $0 \le f(p) \le 1$. Then the set of all points $p \in M$ such that $0 \le f(p) \le 1$ is geodesically convex in the sense that a geodesic whose end points lie on the set lies completely on the set.

Proof. Suppose $\sigma(t)$, $0 \le t \le 1$, is a geodesic with $0 \le f(\sigma(0)) \le 1$ and $0 \le f(\sigma(1)) \le 1$. If $f(\sigma(t))$, for $0 \le t \le 1$, does not lie on the interval $[0, 1]$, there is a point $t_0 \in (0, 1)$ that is a relative maximum for $t \to f(\sigma(t))$. Thus we can apply (27.5), with $N$ taken as the geodesic $\sigma$. The first term of (27.5), $S_{df}(u, u)$, is zero, since it is the second fundamental form of $\sigma$, which is zero, since $\sigma$ is a geodesic. Our hypotheses then assert that the Hessian of $f$ restricted to $\sigma$ must be positive at $t_0$, which is a contradiction. Q.E.D.
These simple results suffice to give the idea of the geometric meaning of formula (27.5). Many more sophisticated applications are possible.
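As one illustration of formula (27.5), consider an example that is assumed here and is not taken from the text: $M = R^n$ with the flat connection and $f = \tfrac12 x_i x_i$, whose level surfaces away from the origin are spheres. Write $Y = A_i\,\partial/\partial x_i$ for a vector field with $df(Y) = 0$, and let $\sigma(t) = a + tb$ be a straight line, that is, a geodesic of the flat connection.

```latex
% Assumed example: M = R^n, flat connection, f = (1/2) x_i x_i.
\begin{align*}
df(Y) = x_i A_i = 0
  &\;\Longrightarrow\;
  0 = Y(x_i A_i) = A_i A_i + x_j A_i \frac{\partial A_j}{\partial x_i}, \\
S'_{df}(Y, Y) = df(\nabla_Y Y)
  &= x_j A_i \frac{\partial A_j}{\partial x_i} = -A_i A_i < 0
  \quad (Y \neq 0), \\
\frac{d^2}{dt^2}\, f(\sigma(t))
  &= b_i b_i = 0 - S'_{df}(b, b)
  \quad \text{at points where } df(b) = 0 .
\end{align*}
```

The level surfaces are thus strictly convex in the sense of Theorem 27.6 (away from the critical point at the origin), and the last line is (27.5) with $N$ taken to be the line $\sigma$: the Hessian of $f$ along $\sigma$ is positive, so $f \circ \sigma$ has no interior relative maximum. This recovers the familiar fact that a segment whose end points lie in a closed ball lies entirely in the ball.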
28 Affine Groups of Automorphisms. Induced Connections on Submanifolds. Projective Changes of Connection
Let $M$ continue to be a manifold with a torsion-free affine connection $\nabla$. Suppose that $\nabla'$ is another affine connection. We shall show that the "difference," $D(\ ,\ )$, of $\nabla$ and $\nabla'$ is a tensor field. For $X, Y \in V(M)$, put $D(X, Y) = \nabla_X Y - \nabla'_X Y$. For $f \in F(M)$, note that

$$D(fX, Y) = fD(X, Y), \qquad D(X, fY) = fD(X, Y) + X(f)Y - X(f)Y = fD(X, Y).$$ (28.1)

Then $D(\ ,\ )$ is $F(M)$-multilinear; hence, defines a tensor field on $M$. When $\nabla'$ is torsion-free as well, $D$ is clearly also symmetric in $X$ and $Y$. Suppose now that $\phi$ is a diffeomorphism of $M$ and that $\nabla'$ is the "transformed" affine connection; that is,

$$\phi_*(\nabla_X Y) = \nabla'_{\phi_*(X)}\,\phi_*(Y) = \nabla_{\phi_*(X)}\,\phi_*(Y) - D(\phi_*(X), \phi_*(Y)).$$ (28.2)

Suppose further that $t \to \phi_t$ is a one-parameter group of diffeomorphisms whose infinitesimal generator is the vector field $Z$. We know then that the derivative of $\phi_{t*}(X)$ with respect to $t$, at $t = 0$, is (up to a sign fixed by convention) the bracket $[Z, X]$. Substituting $\phi_t$ for $\phi$ in (28.2), differentiating with respect to $t$, we have

$$[Z, \nabla_X Y] = D'(X, Y) + \nabla_{[Z, X]} Y + \nabla_X [Z, Y],$$ (28.3)

where $D'$ is some tensor field. (If one uses this relation as the definition of $D'$, it can be easily verified independently that it is a tensor field.) Now, $\phi$ preserves the affine connection, that is, is a connection automorphism, if $\nabla' = \nabla$. Hence, if $Z$ generates a one-parameter group of connection automorphisms, we have

$$[Z, \nabla_X Y] = \nabla_{[Z, X]} Y + \nabla_X [Z, Y] \quad \text{for all } X, Y \in V(M).$$ (28.4)

This reasoning can be readily reversed (exercise!) to show that (28.4) is also sufficient that the vector field $Z$ generate a one-parameter group of connection automorphisms.
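Condition (28.4) is easy to test in the simplest case. The example below is assumed for illustration only: $M = R^n$ with the flat connection and $Z = a_{ij}x_j\,\partial/\partial x_i$ with constant coefficients $a_{ij}$, the generator of a one-parameter group of linear maps. It is enough to check (28.4) on the coordinate fields $X = \partial/\partial x_k$, $Y = \partial/\partial x_l$, because the difference of the two sides is the tensor field $D'$ of (28.3).

```latex
% Assumed example: M = R^n, flat connection, Z = a_{ij} x_j \partial/\partial x_i,
% X = \partial/\partial x_k, Y = \partial/\partial x_l.
\begin{align*}
\nabla_X Y &= 0
  \;\Longrightarrow\; [Z, \nabla_X Y] = 0, \\
[Z, X] &= -a_{ik}\,\frac{\partial}{\partial x_i}, \qquad
[Z, Y] = -a_{il}\,\frac{\partial}{\partial x_i}, \\
\nabla_{[Z, X]} Y
  &= -a_{ik}\,\nabla_{\partial/\partial x_i} \frac{\partial}{\partial x_l} = 0, \qquad
\nabla_X [Z, Y]
   = -\frac{\partial a_{il}}{\partial x_k}\,\frac{\partial}{\partial x_i} = 0 .
\end{align*}
```

Both sides of (28.4) vanish, as expected, since linear maps of $R^n$ carry straight lines to straight lines and preserve their affine parametrization.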
LEMMA 28.1 Suppose that $\phi: M \to M$ is a connection automorphism, that $M$ is connected, and that $p$ is a point of $M$ such that

$$\phi(p) = p; \qquad \phi_*: M_p \to M_p \text{ is the identity.}$$

Then $\phi$ acts as the identity on $M$.

Proof. The set of all points of $M$ that are fixed under $\phi$ is obviously closed in $M$. We shall show that it is also open in $M$, which will prove that it is all of $M$. $\phi$ maps geodesics of the connection into geodesics. Since it leaves $p$ fixed and leaves fixed all tangent vectors starting at $p$, it leaves fixed all geodesics starting at $p$, hence leaves fixed each point of a neighborhood of $p$; that is, the set of all fixed points of $\phi$ is open in $M$. Q.E.D.

Now, if $Z$ is a vector field on $M$, the condition that the one-parameter group generated by $Z$ leave $p$ fixed is obviously $Z(p) = 0$; that is, $p$ is a singular point of $Z$. If this condition is satisfied, we can define a linear transformation $l_Z: M_p \to M_p$ in the following way: For $v \in M_p$, pick any vector field $Y$ such that $Y(p) = v$, and set

$$l_Z(v) = [Y, Z](p).$$ (28.5)

(The reader can readily verify, following a pattern we have used many times before (for example, in the definition of the second fundamental form), that the vanishing of $Z$ at $p$ guarantees that this does not depend on how $v$ is extended to a vector field.) The linear transformation $l_Z$ is called the linear part of $Z$ at $p$. To justify this name, we can look at it in local coordinates, say $(x_1, \ldots, x_n)$, with $x_i(p) = 0$. Suppose $Z = A_i\,\partial/\partial x_i$, with $A_i(0) = 0$. Then

$$l_Z\!\left(\frac{\partial}{\partial x_j}\right) = \left[\frac{\partial}{\partial x_j}, Z\right](p) = \frac{\partial A_i}{\partial x_j}(0)\,\frac{\partial}{\partial x_i};$$

that is, $((\partial A_i/\partial x_j)(0))$ is the matrix of the linear transformation $l_Z$. The $((\partial A_i/\partial x_j)(0))$ are, of course, just the first terms in the Taylor series of $A_i(x)$ about $x = 0$.
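A small worked case of (28.5), using an example assumed for illustration: the generator of rotations of the plane about the origin, $Z = -x_2\,\partial/\partial x_1 + x_1\,\partial/\partial x_2$, so that $A_1 = -x_2$, $A_2 = x_1$, and $Z(0) = 0$.

```latex
% Assumed example: M = R^2, Z = -x_2 \partial/\partial x_1 + x_1 \partial/\partial x_2.
\begin{align*}
l_Z\!\left( \frac{\partial}{\partial x_1} \right)
  &= \left[ \frac{\partial}{\partial x_1}, Z \right](0)
   = \frac{\partial}{\partial x_2}, \qquad
l_Z\!\left( \frac{\partial}{\partial x_2} \right)
   = \left[ \frac{\partial}{\partial x_2}, Z \right](0)
   = -\frac{\partial}{\partial x_1}, \\
\left( \frac{\partial A_i}{\partial x_j}(0) \right)
  &= \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.
\end{align*}
```

Here the linear part is the familiar infinitesimal rotation matrix; it is nonzero even though $Z$ itself vanishes at the origin.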
LEMMA 28.2 Let $Z$ be a vector field, with $Z(p) = 0$. Then $l_Z = 0$ on $M_p$ if and only if the one-parameter group generated by $Z$ acts as the identity on tangent vectors at $p$.

We leave the proof to the reader. From Lemmas 28.1 and 28.2, we have immediately Lemma 28.3.

LEMMA 28.3 If $Z$ generates a group of connection automorphisms, and if $Z(p) = 0 = l_Z$, then $Z$ is zero at every point of $M$.

Now we turn to the following question: Suppose that $Z$ is a vector field that generates a one-parameter group of connection automorphisms, that is, satisfies (28.4). Suppose that $N$ is a submanifold of $M$ and that $Z$ is tangent to $N$; that is, $Z(p) \in N_p$ for all $p \in N$, so that $Z$ restricted to $N$ defines a vector field on $N$.

Definition

Let $Z$ be a vector field on a manifold $N$. We say that $Z$ vanishes to the kth order at a point $p \in N$ if

$$Z(p) = 0 \quad \text{and} \quad [X_1, [\ldots [X_k, Z] \ldots]](p) = 0$$ (28.6)

for all choices of k-tuples of vector fields $(X_1, \ldots, X_k)$ on $N$. (Notice that $Z$ vanishes to the first order if $Z(p) = 0$ and $l_Z = 0$.)

Return to the case where $N$ is a submanifold of $M$, $Z$ is tangent to $N$, and $Z$ satisfies (28.4). We ask: Is it possible that $Z$ restricted to $N$ vanishes to the kth order at a point of $N$ without vanishing everywhere on $M$? This is clearly possible for some choices of $N$. For example, if $N$ is a plane in Euclidean space $M$, there is clearly a nontrivial one-parameter group of affine transformations of $M$ leaving every point of the plane fixed. However, we can find conditions on $N$ that prevent this. Let $X$ and $Y$ be vector fields tangent to $N$. Then, by (28.4) and the vanishing of the torsion,

$$[Z, \nabla_X Y] = \nabla_{[Z, X]} Y + \nabla_X [Z, Y] = \nabla_{[Z, X]} Y + \nabla_{[Z, Y]} X + [X, [Z, Y]].$$

Thus, if $Z$ restricted to $N$ vanishes to the second order at $p$, then

$$[Z, \nabla_X Y](p) = 0.$$

If every tangent vector at $p$ can be written as a combination of vectors of $N_p$ and vectors of the form $\nabla_X Y(p)$, for $X$, $Y$ tangent to $N$, then we see that $l_Z = 0$; hence, by Lemma 28.3, $Z$ is identically zero on $M$. In general, let us call the space of vectors spanned by $N_p$ and $\{\nabla_X Y(p) : X, Y \text{ vector fields tangent to } N\}$ the first osculating space to $N$. Then we have the following rather trivial theorem, which is important for the qualitative picture it gives us of the relation of the "induced geometry" on $N$ to the group of affine connection automorphisms.

THEOREM 28.4 Let $N$ be a submanifold of the affinely connected space $M$, and let $Z$ be a vector field that is tangent to $N$ and generates a one-parameter group of connection automorphisms. Suppose that the first osculating space to $N$ at $p$ fills up $M_p$. Then if $Z$ restricted to $N$ vanishes to the second order at $p$, it must vanish everywhere on $M$.

This analysis can be carried further to get analogous criteria for orders of vanishing higher than 2, by considering the higher osculating spaces to $N$. (The second osculating space to $N$ at $p$ would be that spanned by $N_p$, the set of $\nabla_X Y(p)$ and $\nabla_{X_1} \nabla_X Y(p)$, for vector fields $X$, $X_1$, $Y$ tangent to $N$, and so on for the higher osculating spaces.) We leave this to the reader, since it involves a simple iteration of the basic argument, that is, applying (28.4) to $[Z, \nabla_{X_1} \nabla_X Y]$.

Theorem 28.4 is not the best possible answer. There are clearly submanifolds having the property that a vector field satisfying (28.4), vanishing to the first order when restricted to $N$, vanishes identically. For example, this is so if $N$ admits an "induced affine connection," that is, one that is left invariant by any affine automorphism of $M$ which maps $N$ into itself. (For example, we have seen that if $N$ is totally geodesic, it admits such an "induced connection.") We shall now turn to this more basic question of induced affine connections on submanifolds, using Cartan's "method of the moving frame." (Indeed, this problem gives a good introduction to Cartan's theory.) The most complete work on this subject is by Klingenberg [1]. We shall be working with a particularly simple case, namely, that where $N$ is a hypersurface in $M$.
Of course this question is a purely local one, so that we are free to choose "moving frames," that is, a basis $(\omega_1, \ldots, \omega_n)$ for 1-forms on $M$. (Choose the following range of indices and summation convention: $1 \le i, j, \ldots \le n = \dim M$; $1 \le a, b, \ldots \le n - 1$.) Let $(\omega_{ij})$ be the connection forms associated with this basis and the affine connection. As definition,

$$\omega_{ij}(X) = -\omega_i(\nabla_X X_j) \quad \text{for } X \in V(M),$$ (28.7)

where $(X_j)$ is the basis of vector fields dual to the $(\omega_i)$. Then the following relations hold:

$$d\omega_i = \omega_{ij} \wedge \omega_j$$ (28.8)

(expressing the fact that the torsion tensor is zero).

$$d\omega_{ij} = \omega_{ik} \wedge \omega_{kj} + R_{ij}, \qquad R_{ij} = R_{ijkl}\,\omega_k \wedge \omega_l.$$ (28.9)
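Relation (28.8) can be checked directly from the definition (28.7). The sketch below assumes the conventions $d\omega(X, Y) = X(\omega(Y)) - Y(\omega(X)) - \omega([X, Y])$ and $(\alpha \wedge \beta)(X, Y) = \alpha(X)\beta(Y) - \alpha(Y)\beta(X)$, which may differ by a normalizing factor from those used elsewhere in the book; $(X_j)$ denotes the basis of vector fields dual to $(\omega_i)$.

```latex
% Uses \omega_i(X_k) = \delta_{ik} (constant) and zero torsion:
% [X_j, X_k] = \nabla_{X_j} X_k - \nabla_{X_k} X_j.
\begin{align*}
d\omega_i(X_j, X_k)
  &= X_j(\omega_i(X_k)) - X_k(\omega_i(X_j)) - \omega_i([X_j, X_k]) \\
  &= -\omega_i(\nabla_{X_j} X_k - \nabla_{X_k} X_j)
   = \omega_{ik}(X_j) - \omega_{ij}(X_k), \\
(\omega_{il} \wedge \omega_l)(X_j, X_k)
  &= \omega_{il}(X_j)\,\delta_{lk} - \omega_{il}(X_k)\,\delta_{lj}
   = \omega_{ik}(X_j) - \omega_{ij}(X_k).
\end{align*}
```

The two expressions agree on every pair of frame vectors, which is (28.8); the only place the torsion enters is the replacement of the bracket by the difference of covariant derivatives.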
(The $R_{ijkl}$ are the components of the curvature tensor $R(\ ,\ )(\ )$, just as defined earlier for Riemannian geometry.) Now we can arrange the moving frame so that $\omega_n = 0$ on $N$. By (28.8),

$$d\omega_n = \omega_{nj} \wedge \omega_j, \quad \text{whence} \quad \omega_{na} \wedge \omega_a = 0 \text{ restricted to } N.$$

Suppose that $(\omega_i')$ is another such moving frame. They are related to the $(\omega_i)$ by relations of the form $\omega_i = M_{ij}\,\omega_j'$, where $(M_{ij})$ is a matrix-valued function on $M$ whose determinant is not zero. Let $(\omega_{ij}')$ be the connection forms (defined by (28.7)) in the primed system. The transformation law between the $(\omega_{ij})$ and $(\omega_{ij}')$ is readily calculated from (28.7), and is found to be

$$\omega_{ij} = M_{ik}\,\omega_{kl}'\,M^{-1}_{lj} + dM_{ik}\,M^{-1}_{kj},$$ (28.10)

where $(M^{-1}_{ij})$ is the inverse matrix to $(M_{ij})$. Suppose that the primed frame also satisfies $\omega_n' = 0$; that is, $M_{na} = 0$ restricted to $N$. Our goal is to use this freedom to change frames to satisfy more conditions. From (28.10), we then have $\omega_{na} = M_{nn}\,\omega_{nb}'\,M^{-1}_{ba}$, restricted to $N$; hence

$$\omega_{na} \cdot \omega_a = M_{nn}\,\omega_{na}' \cdot \omega_a'.$$ (28.11)
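The transformation law (28.10) is stated without proof; one way to obtain it from (28.7) is sketched here. The only added ingredient is the relation between the dual bases: if $\omega_i = M_{ij}\omega_j'$, then the dual vector fields satisfy $X_j = M^{-1}_{lj}X_l'$, as one checks from $\omega_i(X_j) = M_{ik}M^{-1}_{kj} = \delta_{ij}$.

```latex
\begin{align*}
\omega_{ij}(X)
  &= -\omega_i(\nabla_X X_j)
   = -M_{ik}\,\omega_k'\bigl( \nabla_X ( M^{-1}_{lj} X_l' ) \bigr) \\
  &= -M_{ik} \bigl( X(M^{-1}_{kj})
       + M^{-1}_{lj}\,\omega_k'(\nabla_X X_l') \bigr) \\
  &= M_{ik}\,\omega_{kl}'(X)\,M^{-1}_{lj} - M_{ik}\,dM^{-1}_{kj}(X), \\
-M_{ik}\,dM^{-1}_{kj}
  &= dM_{ik}\,M^{-1}_{kj}
  \quad \text{(differentiate } M_{ik} M^{-1}_{kj} = \delta_{ij}\text{)}.
\end{align*}
```

Combining the last two lines gives (28.10). Restricting to $N$, where $M_{na} = 0$ (so that also $M^{-1}_{na} = 0$ and $dM_{na}$ vanishes on tangent vectors to $N$), leaves only the term $M_{nn}\,\omega_{nb}'\,M^{-1}_{ba}$ in $\omega_{na}$, which is the relation used to derive (28.11).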
(The dot $(\cdot)$ indicates the symmetric product of 1-forms; that is, if $\theta_1$, $\theta_2$ are 1-forms, $\theta_1 \cdot \theta_2(X, Y) = \tfrac12(\theta_1(X)\theta_2(Y) + \theta_1(Y)\theta_2(X))$.) It can be readily seen that the form

$$(u, v) \to \omega_{na} \cdot \omega_a(u, v)$$

is the second fundamental form of $N$, $(u, v) \to S_{\omega_n}(u, v)$. We shall suppose that this form is nondegenerate, that is, that

$$S_{\omega_n}(u, v) = 0 \text{ for all } v \text{ tangent to } N \text{ implies } u = 0.$$

For simplicity, we suppose that the form is positive definite; the cases of other signatures of the quadratic form $u \to S_{\omega_n}(u, u)$ can be handled similarly. The law of transformation (28.11) then assures us that a moving frame $(\omega_i)$ can be chosen so that $\omega_{na} \cdot \omega_a = \omega_a \cdot \omega_a$; that is, so that

$$\omega_{na} = \omega_a \quad \text{restricted to } N.$$ (28.12)
Suppose that $(\omega_i')$ is another choice of moving frame satisfying (28.12). Then the transition functions $(M_{ij})$ must satisfy

$$M_{ac}M_{ab} = \delta_{cb}\,M_{nn} \quad \text{and} \quad M_{na} = 0 \quad \text{restricted to } N.$$ (28.13)

Differentiate (28.13):

$$dM_{ac}\,M_{ab} + M_{ac}\,dM_{ab} = \delta_{cb}\,dM_{nn},$$

or

$$dM_{ac}\,M^{-1}_{ba} + M^{-1}_{ca}\,dM_{ab} = \delta_{cb}\,dM_{nn}\,M^{-1}_{nn}.$$

Hence,

$$dM_{ab}\,M^{-1}_{ba} + M^{-1}_{ba}\,dM_{ab} = (n - 1)\,dM_{nn}\,M^{-1}_{nn},$$

or

$$dM_{ab}\,M^{-1}_{ba} = \left(\frac{n-1}{2}\right) dM_{nn}\,M^{-1}_{nn}.$$

Let

$$\theta = \omega_{aa} - \left(\frac{n-1}{2}\right)\omega_{nn}.$$

Computing the relation between $\theta$ and $\theta'$, we have

$$\theta = \theta' + M_{an}\,\omega_{nb}'\,M^{-1}_{ba} - \left(\frac{n-1}{2}\right) M_{nn}\,\omega_{nb}'\,M^{-1}_{bn}.$$ (28.14)

Now, given the primed moving frame, we can arrange the unprimed moving frame by a change of frame for which the transition functions $(M_{ij})$ satisfy (28.13), that is, leave the relation (28.12) invariant, and such that

$$\theta = 0.$$ (28.15)

In fact we can accomplish this by a choice of $(M_{ij})$ satisfying $M_{nn} = 1$; $M_{ab} = \delta_{ab}$; $M^{-1}_{an} = -M_{an}$; for with these choices, (28.14) set equal to zero can be regarded as a set of equations for $M_{an}$. Now, with conditions (28.13) and (28.15) imposed, let us see how the possible changes of frame are restricted. Using (28.14) again, the relation $\theta = \theta' = 0$, and $\omega_{nb}' = \omega_b'$, we have

$$M^{-1}_{ba}\,M_{an} = \left(\frac{n-1}{2}\right) M_{nn}\,M^{-1}_{bn}.$$

Hence, since $M_{ab}M^{-1}_{bn} + M_{an}M^{-1}_{nn} = 0$ gives $M_{nn}M^{-1}_{bn} = -M^{-1}_{ba}M_{an}$,

$$M^{-1}_{ba}\,M_{an} = -\left(\frac{n-1}{2}\right) M^{-1}_{ba}\,M_{an};$$

hence,

$$M_{bn} = -\left(\frac{n-1}{2}\right) M_{bc}\,M^{-1}_{ca}\,M_{an} = -\left(\frac{n-1}{2}\right) M_{bn}, \quad \text{forcing } M_{bn} = 0.$$ (28.16)

But this relation tells us that the set of vectors $v$ satisfying $\omega_a(v) = 0$ is left invariant when the moving frames are changed; that is, for $p \in N$, there is a subspace $H_p$ of $M_p$ such that $M_p = N_p \oplus H_p$. Further, the reader can convince himself by working through the above argument that a connection automorphism of $M$ that maps $N$ onto itself will preserve the splitting of the tangent space of $M$ at points of $N$. (For example, notice that the transformation applied to a moving frame satisfying (28.13) and (28.15) will again satisfy (28.13) and (28.15).) Hence, if we use this field $p \to H_p$ of subspaces to define an induced affine connection on $N$ (as explained in the preceding chapter), an affine transformation of $M$ leaving $N$ invariant will be an automorphism of the induced connection on $N$. Summing up, we have proved Theorem 28.5.

THEOREM 28.5 Let $N$ be a hypersurface in an affinely connected space $M$ whose second fundamental form is definite. Then there is an "induced" affine connection on $N$ which is preserved by every connection automorphism of $M$ which leaves $N$ invariant.

The procedure we have followed is typical of Cartan's method of the moving frame. In terms of the jargon, what we have done is to reduce the structure group of the tangent bundle of $M$ restricted to $N$ in a "natural" way to a subgroup that is small enough to enable one to define an induced affine connection on $N$. Many examples of this reduction process can be found in Cartan's book on the method of the moving frame [3], although the generalities are not very clear there. A general (although sketchy) treatment of these matters can be found in a paper by Hermann [9].
Projective Change of Connections

If two affine connections, $\nabla$ and $\nabla'$, are given on a manifold $M$, we have seen that their "difference" $D(\ ,\ )$ is a tensor field. Recall that

$$D(X, Y) = \nabla_X Y - \nabla'_X Y \quad \text{for } X, Y \in V(M).$$

We may ask: When are $\nabla$ and $\nabla'$ related in such a way that the second fundamental forms of a submanifold with respect to the two connections are the same? The answer is Theorem 28.6.

THEOREM 28.6 Suppose that $\theta$ is a 1-form on $M$ such that

$$\nabla_X Y - \nabla'_X Y = \tfrac12\bigl(\theta(X)Y + \theta(Y)X\bigr)$$ (28.17)

for each pair $X, Y \in V(M)$. Then a submanifold $N$ of $M$ has the same second fundamental form with respect to $\nabla'$ as it does with respect to $\nabla$. Conversely, if this property is valid for every submanifold of $M$, then $\nabla$ and $\nabla'$ are related by (28.17). Further, (28.17) is satisfied if and only if $\nabla$ and $\nabla'$ are projectively related in the classical sense, that is, their geodesics differ only by a change in parametrization.

Proof. Suppose (28.17) is satisfied. Let $N$ be a submanifold of $M$. Let $\omega$ be a 1-form on $M$ such that $\omega(N_p) = 0$ for all $p \in N$. Then, for $X$, $Y$ vector fields that are tangent to $N$, we have

$$S_\omega(X, Y) = \omega(\nabla_X Y) = \omega(\nabla'_X Y) = S'_\omega(X, Y),$$

in view of (28.17). The converse is obvious. Suppose now that the geodesics of $\nabla$ and $\nabla'$ differ only by a change in parametrization. Let $X$ be a vector field whose integral curves are geodesics of $\nabla$; that is, $\nabla_X X = 0$. Then the integral curves must, after a change in parametrization, be geodesics of $\nabla'$; that is,

$$\nabla'_X X = fX \quad \text{for some } f \in F(M).$$

Let $D(X, Y) = \nabla_X Y - \nabla'_X Y$. Then $D(X, X) = -fX$. Since $D$ is a tensor field, this must hold for an arbitrary vector field on $M$. Polarization of this identity (that is, substitution of $X + Y$ in place of $X$) shows that (28.17) is satisfied. The converse is readily obtained by reversing these steps and observing that, at least locally, each geodesic can be exhibited as the integral curve of a vector field $X$ satisfying $\nabla_X X = 0$. Q.E.D.
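The two "forward" implications in this proof amount to one-line computations, sketched below. The sketch assumes that the numerical factor in (28.17) is $\tfrac12$; $\omega$ is a 1-form with $\omega(N_p) = 0$, and $X$, $Y$ are tangent to $N$ in the first line.

```latex
\begin{align*}
\omega(\nabla_X Y) - \omega(\nabla'_X Y)
  &= \tfrac12 \bigl( \theta(X)\,\omega(Y) + \theta(Y)\,\omega(X) \bigr) = 0
  && \text{(since } \omega(X) = \omega(Y) = 0\text{)}, \\
\nabla_X X = 0
  &\;\Longrightarrow\;
  \nabla'_X X = \nabla_X X - \theta(X)\,X = -\theta(X)\,X .
\end{align*}
```

The first line is the equality of the second fundamental forms. The second says that an integral curve of $X$ that is a geodesic of $\nabla$ has $\nabla'$-acceleration proportional to its tangent vector, which is the condition for it to become a geodesic of $\nabla'$ after a change of parametrization.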
29 The Laplace-Beltrami Operator
Let $M$ be a manifold, with a pseudo-Riemannian metric on $M$ defined by a nondegenerate inner product $\langle\ ,\ \rangle$ on tangent vectors. Associated with the metric we can define a linear differential operator $\Delta$ that acts on functions on $M$. By specializing the metric, many of the differential operators that are important in mathematical physics may be obtained. For example, the ordinary Laplace operator

$$\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$$

is associated with the flat Euclidean metric on Euclidean space; the d'Alembertian, or wave, operator

$$\Box = \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2} - \frac{\partial^2}{\partial z^2}$$

is associated with the Lorentz metric on space-time. Our aim in this chapter is to illustrate the power of the theory of affine connections on manifolds by deriving many of the properties of these operators without introducing coordinates.

Let $f$ be a function on $M$. The gradient of $f$, denoted by $\operatorname{grad} f$, is a vector field on $M$ defined as

$$\langle \operatorname{grad} f, X \rangle = X(f) = df(X) \quad \text{for } X \in V(M).$$ (29.1)

The first-order differential operator $f \to \|\operatorname{grad} f\|$ is sometimes called the first Beltrami differential operator. In fact, we have already seen in Chapter 13 that a function $f$ such that $\|\operatorname{grad} f\|^2$ is constant on the level surfaces of $f$ is a solution of the Hamilton-Jacobi partial differential equation associated with the variational problem; for such an $f$, the integral curves of $\operatorname{grad} f$ are geodesics.

Let $\theta$ be the n-form on $M$ defining the volume element associated with the metric. The easiest way to define $\theta$ is by using an orthonormal moving frame $(\omega_1, \ldots, \omega_n)$ of 1-forms, that is, the metric is given by

$$ds^2 = \pm\,\omega_1 \cdot \omega_1 \pm \cdots \pm \omega_n \cdot \omega_n,$$

where $(\cdot)$ denotes the symmetric product of 1-forms. Then $\theta$ is given by

$$\theta = \omega_1 \wedge \cdots \wedge \omega_n.$$ (29.2)
387
29. The Laplace-Beltrami Operator
Let $X$ be a vector field on $M$. The divergence of $X$, denoted by $\operatorname{div}(X)$, is the function such that
$$X(\theta) = (\operatorname{div} X)\,\theta. \tag{29.3}$$
Thus $\operatorname{div}(X) = 0$ is the condition that the one-parameter group generated by $X$ leave invariant the volume on $M$ defined by $\theta$. Let $f \in F(M)$. We can define $\Delta(f)$, the Laplace-Beltrami operator (sometimes called the second Beltrami operator), as
$$\Delta(f) = \operatorname{div}(\operatorname{grad} f). \tag{29.4}$$
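We can give a more explicit form of $\Delta(f)$ in terms of the orthonormal moving frame $(\omega_i)$ and the connection. Let $(X_i)$, $1 \le i, j, \ldots \le n = \dim M$, be a basis of vector fields dual to the $(\omega_i)$; that is, $\omega_i(X_j) = \delta_{ij}$. Then, for a vector field $X$ (summation over repeated indices understood),
$$X(\omega_i) = X(\omega_i)(X_j)\,\omega_j = [X(\omega_i(X_j)) - \omega_i([X, X_j])]\,\omega_j = \omega_i(\nabla_{X_j} X - \nabla_X X_j)\,\omega_j.$$
Now
$$X(\theta) = X(\omega_1) \wedge \cdots \wedge \omega_n + \cdots + \omega_1 \wedge \cdots \wedge X(\omega_n)$$
$$= [\omega_1(\nabla_{X_1} X - \nabla_X X_1)]\,\omega_1 \wedge \cdots \wedge \omega_n + \cdots + \omega_1 \wedge \cdots \wedge \omega_{n-1} \wedge [\omega_n(\nabla_{X_n} X - \nabla_X X_n)]\,\omega_n$$
$$= [\omega_i(\nabla_{X_i} X - \nabla_X X_i)]\,\theta.$$
By (29.3),
$$\operatorname{div} X = \omega_i(\nabla_{X_i} X - \nabla_X X_i).$$
Now $Y = \omega_i(Y)X_i$; hence
$$\langle Y, X_j\rangle = \omega_i(Y)\langle X_i, X_j\rangle.$$
Put $e_i = \langle X_i, X_i\rangle$. (Then $e_i = \pm 1$.) Now
$$\operatorname{div} X = \sum_i e_i\,(\langle X_i, \nabla_{X_i} X\rangle - \langle X_i, \nabla_X X_i\rangle).$$
But
$$\langle X_i, \nabla_X X_i\rangle = \tfrac{1}{2}X(\langle X_i, X_i\rangle) = 0.$$
Now we have div and $\Delta$ expressed in terms of the frames and the connection:
$$\operatorname{div} X = \sum_i e_i \langle \nabla_{X_i} X, X_i\rangle, \tag{29.5}$$
$$\Delta(f) = \operatorname{div}(\operatorname{grad} f) = \sum_i e_i \langle \nabla_{X_i} \operatorname{grad} f, X_i\rangle. \tag{29.6}$$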
There is another useful form of $\Delta(f)$. Starting from (29.6), we have
$$\Delta(f) = \sum_i e_i\,(X_i(\langle \operatorname{grad} f, X_i\rangle) - \langle \operatorname{grad} f, \nabla_{X_i} X_i\rangle).$$
Hence,
$$\Delta(f) = \sum_i e_i\,(X_i X_i(f) - (\nabla_{X_i} X_i)(f)). \tag{29.7}$$
Equation (29.7) is the form that most closely resembles the usual Laplace (or d'Alembert) operator for Euclidean space. For if the connection is flat, a coordinate system $(x_i)$ can be chosen so that
$$X_i = \frac{\partial}{\partial x_i}, \qquad \nabla_{X_i} X_j = 0.$$
Then
$$\Delta(f) = \sum_i e_i \frac{\partial^2 f}{\partial x_i^2}.$$
(Laplace or d'Alembert are obtained by specializing the $e_i$.) On the other hand, for an arbitrary metric we can always find an orthonormal moving frame $(X_i)$ such that at a given point $p$,
$$\nabla_{X_i} X_j(p) = 0.$$
Further, a coordinate system $(x_i)$ about that point can be chosen so that
$$X_i(p) = \frac{\partial}{\partial x_i}(p).$$
(Exercise: Prove these statements.) Then, at one point, we can always arrange that $\Delta$ has the same form as in the flat Euclidean case.
Finally, to illustrate the usefulness of these formulas, we shall derive a formula for the Laplace-Beltrami operator for the metric induced on a submanifold $N$ of $M$. Suppose that $\dim N = p$, and choose the additional ranges of indices: $1 \le a, b, \ldots \le p$; $p + 1 \le u, v, \ldots \le n$. Suppose that $(X_a)$ are tangent to $N$. (Notice that we are implicitly assuming that the metric induced on $N$ is nondegenerate.) Suppose that $\operatorname{grad} f = Y + Z$, where $Y$ and $Z$ are vector fields that are, respectively, tangent and perpendicular to $N$. Then $Y$ restricted to $N$ is the gradient vector field of $f$ restricted to $N$ with respect to the induced metric on $N$. Then
$$\Delta_N(f) = \sum_a e_a \langle X_a, \nabla_{X_a} Y\rangle,$$
where $\Delta_N$ is the Laplace-Beltrami operator with respect to the metric induced on $N$ applied to $f$ restricted to $N$. Now
$$\langle X_a, \nabla_{X_a} Z\rangle = X_a(\langle X_a, Z\rangle) - \langle \nabla_{X_a} X_a, Z\rangle = -S_Z(X_a, X_a),$$
where $S_Z(\ ,\ )$ is the second fundamental form of $N$. Now, splitting the sum in (29.6) into its tangential and normal parts,
$$\Delta(f) = \Delta_N(f) - \sum_a e_a S_Z(X_a, X_a) + \sum_u e_u \langle \nabla_{X_u} \operatorname{grad} f, X_u\rangle.$$
Example: Spherical Harmonics
Suppose that $M$ is flat Euclidean space, with Euclidean coordinates $(x_i)$, and that $N$ is the sphere of radius $r$. Let $g = \tfrac{1}{2}x_i x_i = \tfrac{1}{2}|x|^2$. Then $p = n - 1$. Choosing the moving frame so that $(X_a)$ is tangent to $N$, we have
$$X_n = \frac{\operatorname{grad} g}{\|\operatorname{grad} g\|} = \frac{x_i(\partial/\partial x_i)}{|x|}.$$
Suppose that $f$ is a function on $R^n$ that is homogeneous of degree $\lambda$. By Euler's homogeneous function relation,
$$x_i \frac{\partial f}{\partial x_i} = \lambda f.$$
That is,
$$X_n(f) = \frac{\lambda f}{|x|}.$$
Now $g$ is a solution of the Hamilton-Jacobi equation for the Riemannian metric $ds^2 = dx_i\,dx_i$, since $\|\operatorname{grad} g\|^2 = 2g$. We know, then, from general principles that $\nabla_{X_n} X_n = 0$; that is, the integral curves of $X_n$ are geodesics. (This is obvious geometrically, of course: The integral curves of $X_n$ are the orthogonal trajectories of the spheres concentric about the origin, that is, straight lines.) We have also seen that the second fundamental form $S_{X_n}(\ ,\ )$ of the sphere of radius $r$ has all its eigenvalues equal to $-(1/r)$. Then, on $N$,
$$\Delta_N(f) = \Delta(f) - X_n X_n(f) - \frac{n-1}{r}X_n(f) = \Delta(f) - \frac{\lambda^2 f}{r^2} + \frac{\lambda f}{r^2} - \frac{\lambda f\,(n-1)}{r^2} = \Delta(f) - \frac{\lambda}{r^2}(\lambda + n - 2)f.$$
In particular, we have the following.
THEOREM 29.1
If $f$ is a function on $R^n$ which is harmonic (that is, satisfies $\Delta(f) = 0$) and which is homogeneous of degree $\lambda$, then $f$ restricted to the unit sphere in $R^n$ is an eigenfunction of the Laplace-Beltrami operator on the sphere with eigenvalue $-\lambda(\lambda + n - 2)$.
In particular, we may consider the $f$ that are polynomials and that also are harmonic on $R^n$. Those polynomials of degree $\lambda$ are permuted by the action of the rotation group, and this gives all finite-dimensional linear representations of the group of rotations of $R^n$.
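As a numerical sanity check of Theorem 29.1, the following sketch takes $n = 3$ and the harmonic polynomial $f = x^2 - y^2$ (homogeneous of degree $\lambda = 2$) and applies a finite-difference version of the Laplace-Beltrami operator of the unit sphere, written in the usual spherical coordinates. The choice of $f$, the sample point, the step size and all function names are assumptions made only for this illustration; they are not taken from the text.

```python
import math

def f(x, y, z):
    # Harmonic and homogeneous of degree 2 on R^3 (illustrative choice).
    return x * x - y * y

def on_sphere(theta, phi):
    # Restriction of f to the unit sphere, in spherical coordinates.
    s = math.sin(theta)
    return f(s * math.cos(phi), s * math.sin(phi), math.cos(theta))

def sphere_laplacian(u, theta, phi, h=1e-3):
    # Central-difference approximation of
    # (1/sin(theta)) d/dtheta( sin(theta) du/dtheta ) + (1/sin(theta)^2) d^2u/dphi^2.
    sp = math.sin(theta + h / 2)
    sm = math.sin(theta - h / 2)
    d_theta = (sp * (u(theta + h, phi) - u(theta, phi))
               - sm * (u(theta, phi) - u(theta - h, phi))) / (h * h)
    d_phi = (u(theta, phi + h) - 2 * u(theta, phi) + u(theta, phi - h)) / (h * h)
    s = math.sin(theta)
    return d_theta / s + d_phi / (s * s)

lam, n = 2, 3
theta, phi = 1.1, 0.4
ratio = sphere_laplacian(on_sphere, theta, phi) / on_sphere(theta, phi)
predicted = -lam * (lam + n - 2)
print(ratio, predicted)
```

The printed ratio should agree with the predicted eigenvalue up to discretization error; any other harmonic homogeneous polynomial can be substituted for `f` with the matching value of `lam`.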
The Geometric Background for "Separation of Variables"
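Now we return to the first definition of $\Delta(f)$ as $\operatorname{div}(\operatorname{grad} f)$. Suppose $f$, $g$ are functions on $M$. Then
$$\operatorname{grad}(fg) = f \operatorname{grad} g + g \operatorname{grad} f.$$
$$\Delta(fg)\,\theta = \operatorname{grad}(fg)(\theta) = (f \operatorname{grad} g)(\theta) + (g \operatorname{grad} f)(\theta)$$
$$= f \operatorname{grad} g(\theta) + df \wedge (\operatorname{grad} g \mathbin{\lrcorner} \theta) + g \operatorname{grad} f(\theta) + dg \wedge (\operatorname{grad} f \mathbin{\lrcorner} \theta).$$
Now $df \wedge \theta = 0$, since it is an $(n+1)$-form on an $n$-dimensional manifold; hence
$$df \wedge (\operatorname{grad} g \mathbin{\lrcorner} \theta) = (\operatorname{grad} g \mathbin{\lrcorner} df)\,\theta - \operatorname{grad} g \mathbin{\lrcorner} (df \wedge \theta) = \langle \operatorname{grad} f, \operatorname{grad} g\rangle\,\theta.$$
Hence,
$$\Delta(fg) = f\,\Delta(g) + g\,\Delta(f) + 2\langle \operatorname{grad} f, \operatorname{grad} g\rangle.$$
We have now proved
LEMMA 29.2
$\Delta(fg) = \Delta(f)g + f\,\Delta(g)$ if and only if $\langle \operatorname{grad} f, \operatorname{grad} g\rangle = 0$.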
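Lemma 29.2 can be checked numerically in the simplest setting. The sketch below assumes the flat Euclidean plane, where $\Delta$ is the ordinary Laplacian, and compares the defect $\Delta(fg) - f\Delta(g) - g\Delta(f)$ with $2\langle \operatorname{grad} f, \operatorname{grad} g\rangle$ by central differences. The test functions, the sample point and the helper names are hypothetical choices for illustration only.

```python
import math

def laplacian(u, x, y, h=1e-3):
    # Five-point central-difference Laplacian in the flat plane.
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
            - 4 * u(x, y)) / (h * h)

def gradient(u, x, y, h=1e-3):
    # Central-difference gradient (du/dx, du/dy).
    return ((u(x + h, y) - u(x - h, y)) / (2 * h),
            (u(x, y + h) - u(x, y - h)) / (2 * h))

def f(x, y):
    return math.sin(x) * y

def g(x, y):
    return x * x + math.exp(y)

def fg(x, y):
    return f(x, y) * g(x, y)

x0, y0 = 0.3, -0.2
gf = gradient(f, x0, y0)
gg = gradient(g, x0, y0)
cross = gf[0] * gg[0] + gf[1] * gg[1]

# Defect of the naive product rule, and the cross term that accounts for it.
defect = laplacian(fg, x0, y0) - f(x0, y0) * laplacian(g, x0, y0) - g(x0, y0) * laplacian(f, x0, y0)
print(defect, 2 * cross)
```

For a pair with orthogonal gradients, such as $f = x$ and $g = y$, both printed numbers vanish, which is the situation exploited in separation of variables.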
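LEMMA 29.3
Suppose that $f$ is a function on $M$ such that
$$\|\operatorname{grad} f\|^2 = F_1(f), \tag{29.9}$$
$$\Delta(f) = F_2(f). \tag{29.10}$$
Then there exists a function $g = F(f)$ such that
$$\Delta(g) = \lambda g. \tag{29.11}$$
($F_1(\ )$, $F_2(\ )$, and $F(\ )$ are functions of one real variable, say, $x$.) In fact, $g$ satisfies (29.11) if and only if
$$F''(x)F_1(x) + F'(x)F_2(x) = \lambda F(x). \tag{29.12}$$
Proof. Suppose we look for $g = F(f)$ to satisfy (29.11). $dg = F'(f)\,df$; hence, $\operatorname{grad} g = F'(f)\operatorname{grad} f$.
$$\Delta(g)\,\theta = (\operatorname{grad} g)(\theta) = (F'(f)\operatorname{grad} f)(\theta) = F'(f)\operatorname{grad} f(\theta) + d(F'(f)) \wedge (\operatorname{grad} f \mathbin{\lrcorner} \theta)$$
$$= F'(f)\,\Delta(f)\,\theta + F''(f)\,df \wedge (\operatorname{grad} f \mathbin{\lrcorner} \theta) = F'(f)\,\Delta(f)\,\theta + F''(f)\,\|\operatorname{grad} f\|^2\,\theta.$$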
Formula (29.12) now follows.
Functions $f$ satisfying (29.9) and (29.10) are called isoparametric (with respect to the given Riemannian manifold). Once such functions have been found, Lemma 29.3 tells us that eigenfunctions for the Laplace-Beltrami operator can be found by solving an ordinary linear differential equation, namely, (29.12). Conditions (29.9) and (29.10) also have a geometric significance:
THEOREM 29.4
Let $f$ be a function such that $\|\operatorname{grad} f\|$ is a (nonzero) function of $f$. Then $\Delta(f)$ is a function of $f$ if and only if the mean curvature of the level surfaces of $f$ is constant on each surface.
Proof. We see from the proof of Lemma 29.3 that we may change $f$ by a function of one variable, $f \to F(f)$. In particular, we may suppose $\|\operatorname{grad} f\|^2 = \pm 1$; hence we may choose an orthonormal moving frame $(X_1, \ldots, X_n)$ such that $\operatorname{grad} f = X_1$.
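Now
$$\langle \nabla_{X_1} X_1, X_1\rangle = \tfrac{1}{2}X_1(\langle X_1, X_1\rangle) = 0.$$
Hence, by (29.6),
$$\Delta(f) = \sum_{i=2}^{n} e_i \langle \nabla_{X_i} X_1, X_i\rangle = -\sum_{i=2}^{n} e_i \langle X_1, \nabla_{X_i} X_i\rangle.$$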
The right-hand side is now precisely the mean curvature of the level surface
f = constant, that is, the sum of the eigenvalues of the second fundamental form.
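Green's Formula
Suppose now that $M$ is a manifold with a boundary hypersurface $\partial M = N$ and a pseudo-Riemannian metric, denoted by $\langle\ ,\ \rangle$. We shall be using the concepts involving integration that were introduced in Chapter 7. Let $f$, $g$ be functions on $M$. We suppose that $M$ is orientable and that $\theta$ ($= \omega_1 \wedge \cdots \wedge \omega_n$ in terms of orthonormal moving frames) is the volume-element differential form defined by the metric. Then
$$g\,\Delta(f)\,\theta = g \operatorname{grad} f(\theta) = g\,d(\operatorname{grad} f \mathbin{\lrcorner} \theta) = d(g \operatorname{grad} f \mathbin{\lrcorner} \theta) - dg \wedge (\operatorname{grad} f \mathbin{\lrcorner} \theta).$$
$$dg \wedge (\operatorname{grad} f \mathbin{\lrcorner} \theta) = (\operatorname{grad} f \mathbin{\lrcorner} dg)\,\theta = \langle \operatorname{grad} f, \operatorname{grad} g\rangle\,\theta.$$
This is symmetric in $g$ and $f$; hence
$$(f\,\Delta(g) - g\,\Delta(f))\,\theta = d(f \operatorname{grad} g \mathbin{\lrcorner} \theta - g \operatorname{grad} f \mathbin{\lrcorner} \theta).$$
Integrating over $M$ and using Stokes' formula on the right-hand side, we have
$$\int_M (f\,\Delta(g) - g\,\Delta(f))\,\theta = \int_N f(\operatorname{grad} g \mathbin{\lrcorner} \theta) - g(\operatorname{grad} f \mathbin{\lrcorner} \theta). \tag{29.13}$$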
Suppose now that $X_N$ is a unit-length vector field perpendicular to $N$, pointing into $M$ from $N$. Then $X_N \mathbin{\lrcorner} \theta$ restricted to $N$ is the volume element form on $N$, which we denote by $\theta_N$. If $X$ is any vector perpendicular to $X_N$, notice that $X \mathbin{\lrcorner} \theta$ is zero restricted to $N$. Now $\operatorname{grad} f - X_N(f)X_N$ is perpendicular to $X_N$. Hence, restricted to $N$,
$$\operatorname{grad} f \mathbin{\lrcorner} \theta = X_N(f)\,\theta_N.$$
Equation (29.13) now becomes
$$\int_M (f\,\Delta(g) - g\,\Delta(f))\,\theta = \int_N (f X_N(g) - g X_N(f))\,\theta_N. \tag{29.14}$$
This is Green's formula, from which many integral formulas and uniqueness theorems can be proved. We shall refer to a textbook on partial differential equations for detail on these applications (Garabedian [1]). As an illustration, we shall consider integral formulas obtained by inserting for $g$ a fundamental solution of the Laplace-Beltrami equation $\Delta = 0$. Let $p$ be a point of $M$. A function $g$ that satisfies $\Delta(g) = 0$ on $M - \{p\}$, but has a singularity at $p$, with
$$\int_M f\,\Delta(g)\,\theta = f(p) \quad \text{for all } f \in C^\infty(M), \tag{29.15}$$
is called a fundamental solution. Symbolically, $\Delta(g) = \delta_p$. Suppose that $\Delta(f) = 0$. Then, from (29.14),
$$f(p) = \int_N [f X_N(g) - g X_N(f)]\,\theta_N. \tag{29.16}$$
This can be interpreted as a "mean-value" formula expressing the value of $f$ at $p$ in terms of the values of $f$ and its normal derivative over the boundary of $M$. Suppose now that $g$ = constant on $N$. Now, inserting $g = 1$ in (29.14), we have
$$\int_N X_N(f)\,\theta_N = 0.$$
Hence, the second term in (29.16) drops out, and we have
$$f(p) = \int_N f\,X_N(g)\,\theta_N, \tag{29.17}$$
which is a "mean-value formula" for $f(p)$. For example, in Euclidean space it is readily verified that the fundamental solution can be chosen so as to be constant on concentric spheres about $p$, giving the ordinary mean-value formula for harmonic functions.
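The ordinary mean-value formula mentioned last is easy to test numerically. The sketch below assumes the Euclidean plane and the harmonic function $f(x, y) = e^x \cos y$; it averages $f$ over a circle about $p$ and compares the result with $f(p)$. The function, the point, the radius and the sample count are illustrative assumptions, not data from the text.

```python
import math

def f(x, y):
    # Real part of exp(x + iy), hence harmonic on the plane.
    return math.exp(x) * math.cos(y)

def circle_average(u, px, py, radius, samples=64):
    # Average of u over equally spaced points of the circle about (px, py).
    total = 0.0
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        total += u(px + radius * math.cos(angle), py + radius * math.sin(angle))
    return total / samples

p = (0.3, -0.2)
average = circle_average(f, p[0], p[1], 0.7)
print(average, f(*p))
```

Replacing `f` by a non-harmonic function such as $x^2 + y^2$ makes the two printed values differ, which shows that harmonicity is what the formula relies on.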
30 Characteristics and Shock Waves
We shall give only a sketchy treatment of the subject matter indicated in the title. A full-scale exposition would involve us in a good deal of the theory of partial differential equations and applied mathematics, and hence would require another book. Our starting point is a brilliant but little-known exposition of the theory of shock waves given by Levi-Civita [2]. The main idea can be described very easily in terms of manifolds. Let $M$ and $E$ be manifolds, and let $\phi$ be a mapping of $M \to E$. We shall suppose that, in local coordinates for $M$ and $E$, $\phi$ satisfies a given system $P$ of partial differential equations of order $r$.
Definition
A submanifold $N$ of $M$ is a characteristic submanifold for the system $P$ of partial differential equations if there is a map $\phi : M \to E$ such that:
(a) $\phi$ restricted to $M - N$ is differentiable to all orders and is a solution of the system $P$.
(b) $\phi$ is of class $C^{r-1}$ on all of $M$, but not all the $r$th derivatives of $\phi$ are continuous at each point of $N$.
(c) The limits of the $r$th-order derivatives of $\phi$ exist along curves in $M - N$ that approach points of $N$. (This condition will be made more precise below.)
A map $\phi : M \to E$ satisfying (a) through (c) will be called a shock solution of $P$, with $N$ as its submanifold of discontinuity.
In addition to giving this general definition (not precisely in this language, of course), Levi-Civita points out how the differential equation for the characteristic submanifolds, and the jump conditions for the $r$th derivatives of $\phi$ on $N$, can be obtained as a consequence of "geometrical-dynamical compatibility conditions," which are obtained by combining the compatibility relations implied by (a) through (c) with the fact that $\phi$ "solves" $P$. Rather than carry out the details of this program in full generality, we propose in this chapter to concentrate on what seem to be the most important cases for geometric and physical applications, namely: where $E$ is a vector bundle over $M$ and in addition $\phi$ is a cross section. We shall be especially interested in the case where $E$ is the tangent bundle (that is, the $\phi$ are vector fields, which we like to denote by such letters as $X, Y, \ldots$). It will be almost obvious how to extend from this case to the most general vector bundles; hence we shall not pursue the more general directions.
The Geometric Compatibility Conditions for Vector Fields
Let $M$ be a manifold and let $N$ be a submanifold of $M$. All data will be $C^\infty$ unless mentioned otherwise. Let $X$ be a vector field on $M$ which satisfies the following conditions:
$$X \text{ is continuous (as a cross-section map: } M \to T(M)) \text{ on } M. \tag{30.1a}$$
$$X \text{ is } C^\infty \text{ on } M - N. \tag{30.1b}$$
Let $\sigma(t)$ and $\sigma_1(t)$, $0 \le t \le 1$, be curves in $M$, with $\sigma(t)$ and $\sigma_1(t) \in M - N$ for $0 < t < 1$, and $\sigma(0) = \sigma_1(0) = p \in N$. Let $Y$ be a ($C^\infty$) vector field on $M$. Suppose that
$$\lim_{t \to 0}\,[Y, X](\sigma(t))$$
exists. We shall call it $\delta(Y, X, \sigma)$. It is an element of $M_{\sigma(0)}$. Let $f \in F(M)$. Then
$$\delta(fY, X, \sigma) = f(p)\,\delta(Y, X, \sigma) - X(f)(p)\,Y(p). \tag{30.2}$$
From (30.2) we have
LEMMA 30.1
$\delta(Y, X, \sigma) - \delta(Y, X, \sigma_1)$ depends only on the value of $Y$ at $p$. If, for $u \in M_p$, we put
$$\delta_p(u) = \delta(Y, X, \sigma) - \delta(Y, X, \sigma_1), \tag{30.3}$$
where $Y$ is a vector field such that $Y(p) = u$, we conclude that $\delta_p(u)$ depends linearly on $u$.
Thus, if we are given a smooth family of pairs of curves, one pair for each point of $N$, $u \to \delta_p(u)$ can be considered as a tensor field on $N$ which measures the jump in the first derivatives of $X$ across $N$. We aim now to find the compatibility conditions that result from the fact that $X$ itself is continuous across $N$. Suppose that $Z$ is a vector field on $M$ that is tangent to $N$. For $p \in M$, let $s \to \exp(sZ)p$ be the integral curve of $Z$ starting at $p$. Let $\omega$ be a differential 1-form on $M$. Then, for each $s$,
$$\lim_{t \to 0} \omega(X(\exp(sZ)\sigma(t))) = \lim_{t \to 0} \omega(X(\exp(sZ)\sigma_1(t))).$$
Take the derivative with respect to $s$ of both sides of this relation, assume that it is permissible to interchange limit and derivative, and set $s = 0$. The result is
$$\lim_{t \to 0}\,[Z(\omega)(X(\sigma(t))) + \omega([Z, X](\sigma(t)))] = \lim_{t \to 0}\,[Z(\omega)(X(\sigma_1(t))) + \omega([Z, X](\sigma_1(t)))].$$
($Z(\omega)$ denotes the Lie derivative of $\omega$ by the vector field $Z$, which is again a 1-form.) The first terms on both sides of this relation cancel each other, since $X$ is continuous across $N$. Since $\omega$ is an arbitrary differential form, we have proved the following.
THEOREM 30.2
With the assumption listed above, $\delta_p(u)$ is zero when $u$ is tangent to $N$. (This condition may be referred to as the geometric compatibility condition.)
The extension to the case where $X$ has discontinuities of the $r$th derivatives across $N$, but is $C^{r-1}$ on all of $M$, can be made similarly. For any choice $(Z_1, \ldots, Z_{r-1})$ of vector fields, $[Z_1, [Z_2, \ldots [Z_{r-1}, X]] \ldots]$ is continuous across $N$, but its derivatives have a jump across $N$. Thus we can define, for $u_1, \ldots, u_r \in M_p$, $\delta_p(u_1, \ldots, u_r)$ by choosing vector fields $(Z_1, \ldots, Z_r)$ whose value at $p$ is $(u_1, \ldots, u_r)$, and letting
$$\delta_p(u_1, \ldots, u_r) = \lim_{t \to 0}\,[Z_1, \ldots [Z_r, X] \ldots](\sigma(t)) - \lim_{t \to 0}\,[Z_1, \ldots [Z_r, X] \ldots](\sigma_1(t)).$$
We see immediately that $\delta_p$ depends tensorially and symmetrically on $u_1, \ldots, u_r$, and that $\delta_p(u_1, \ldots, u_r) = 0$ whenever one of the $u$ is tangent to $N$.
The Dynamic Compatibility Conditions
In general, the "dynamic compatibility conditions" are those conditions imposed on the tensor field $\delta$ constructed in the preceding section by the condition that $X$ satisfy a certain $r$th-order system of differential equations; hence, in particular, certain functions of the $r$th-order derivatives of $X$ must be continuous across $N$. A common type of differential equation to require of a vector field $X$ is that the Lie derivative of a tensor field on $M$ be zero. For example, we shall examine the cases where the tensor field is a $p$-differential form $\theta$. Suppose, then, that $X(\theta)$ is continuous across $N$. Suppose that $Y_1, \ldots, Y_p$ are $C^\infty$ vector fields on $M$. Then $X(\theta)(Y_1, \ldots, Y_p)$ are continuous across $N$. But
$$X(\theta)(Y_1, \ldots, Y_p) = X(\theta(Y_1, \ldots, Y_p)) - \theta([X, Y_1], Y_2, \ldots, Y_p) - \cdots - \theta(Y_1, \ldots, [X, Y_p]).$$
Now $X(\theta(Y_1, \ldots, Y_p))$ is continuous across $N$, since $X$ is. Thus we have proved
THEOREM 30.3
If $\theta$ is a $p$-form such that $X(\theta)$ is continuous across $N$, then for $u_1, \ldots, u_p \in M_p$,
$$\theta(\delta_p(u_1), u_2, \ldots, u_p) + \cdots + \theta(u_1, \ldots, u_{p-1}, \delta_p(u_p)) = 0.$$
In particular, if $\theta$ is a nonzero $m$-form ($m = \dim M$), then the linear transformation $u \to \delta_p(u)$ of $M_p \to M_p$ has trace zero.
Example
$M$ has a pseudo-Riemannian metric, defined by an inner product $\langle\ ,\ \rangle$; $X = \operatorname{grad} f$ for $f \in F(M)$, that is, $\langle X, Y\rangle = Y(f)$ for $Y \in V(M)$; $\theta$ is the volume element differential form defined by the metric; $N$ is a hypersurface of $M$. Now $X(\theta) = (\operatorname{div} X)\theta$, where $\operatorname{div} X$ is the divergence of the vector field $X$. Hence $\Delta(f)$, the Laplacian of $f$, is $\operatorname{div}(\operatorname{grad} f)$. Suppose that we require that $\Delta(f)$ be continuous across $N$.
LEMMA 30.4
With $X = \operatorname{grad} f$, we have that $\langle \delta_p(u), v\rangle = \langle u, \delta_p(v)\rangle$.
Proof. Let $Y$, $Z$ be vector fields on $M$. Then,
$$[Y, X] = \nabla_Y X - \nabla_X Y,$$
$$\langle \nabla_Y X, Z\rangle = Y(\langle X, Z\rangle) - \langle X, \nabla_Y Z\rangle = Y(Z(f)) - \langle X, \nabla_Y Z\rangle.$$
We see, then, that the only continuity jumps in $\langle [Y, X], Z\rangle$ occur in the contribution of $Y(Z(f))$, which is equal to $Z(Y(f)) + [Y, Z](f)$. Since $[Y, Z]$ is a first-order operator, we have the lemma.
Now suppose that $g$ is a function on $M$ such that $dg \ne 0$ at every point of $N$, but $g = 0$ on $N$. Then, for $p \in N$, $\operatorname{grad} g$ is perpendicular to $N_p$. We shall show that: If the first derivatives of $X$ are discontinuous across $N$ at every point of $N$, then $\operatorname{grad} g$ must lie in $N_p$ for each $p \in N$. In particular, $\langle \operatorname{grad} g, \operatorname{grad} g\rangle = 0$ on $N$.
Suppose otherwise: That is, for some $p \in N$, $\operatorname{grad} g(p)$ is linearly independent from $N_p$, so that $N_p$ and $\operatorname{grad} g(p)$ together span $M_p$. By Theorem 30.3, the trace of $\delta_p$ must be zero; the geometric compatibility conditions require that $\delta_p(N_p) = 0$. These two facts require that $\delta_p(\operatorname{grad} g(p)) \in N_p$. Then
$$\langle \delta_p(\operatorname{grad} g(p)), \operatorname{grad} g(p)\rangle = 0.$$
But also, by Lemma 30.4,
$$\langle \delta_p(\operatorname{grad} g(p)), N_p\rangle = 0.$$
Since $\langle\ ,\ \rangle$ is nondegenerate, $\delta_p(\operatorname{grad} g(p))$ must be zero, which implies that $\delta_p(M_p) = 0$, contradicting that the first derivatives of $X$ have a jump across $N$. In other words, we have proved the following theorem:
THEOREM 30.5
Let $\Delta$ be the Laplace-Beltrami operator associated with a pseudo-Riemannian manifold. Consider the partial differential equation $f \to \Delta(f) + \cdots$. (The dots indicate terms of lower order than the second.) Let $g$ be a function on $M$ whose level surfaces are characteristics for the equation, in the sense that there are shock solutions for which they are the surfaces of discontinuity for the second derivatives. Then the length of $\operatorname{grad} g$ is zero; that is, $\operatorname{grad} g$ is a lightlike vector field on $M$.
Shock Conditions for Tensor Fields Occurring in Classical Continuum Physics
Let $M$ be a manifold with an affine connection $\nabla$. For simplicity, we again suppose that $\nabla$ has a zero torsion tensor. Let $T$ be a tensor field of type (1, 1) on $M$. This means that $T$ is an $F(M)$-linear map of $V(M) \to V(M)$. Let $Y \in V(M)$. The covariant derivative of $T$ by $Y$, denoted by $\nabla_Y T$, is another tensor field of the same algebraic type as $T$. Explicitly,
$$\nabla_Y T(X) = \nabla_Y(T(X)) - T(\nabla_Y X) \quad \text{for } X \in V(M). \tag{30.4}$$
(The reader will verify that $\nabla_Y T$ so defined is $F(M)$-linear in $X$; hence it really does define a tensor field on $M$.) Hold $X$ and a point $p \in M$ fixed: $v \to \nabla_v T(X)$ defines a linear transformation of vectors at $p$ into vectors at $p$. The trace of this linear transformation is a number depending linearly on the value of $X$ at $p$. As $p$ and $X$ vary, we get a differential form on $M$ called the divergence of $T$, denoted by $\operatorname{div}(T)$. To get an explicit formula for $\operatorname{div} T$, choose a basis $(X_i)$ for vector fields on $M$ ($1 \le i, j, \ldots \le n = \dim M$). Let $(\omega_i)$ be a dual basis for 1-forms on $M$; that is, $\omega_i(X_j) = \delta_{ij}$. Then
$$\operatorname{div} T(X) = \omega_i(\nabla_{X_i}(T)(X)) \quad \text{for } X \in V(M). \tag{30.5}$$
Tensor fields of the type $T$ occur in a very fundamental way in classical continuum physics. This tensor operation of "divergence" is the basic one appearing in the partial differential equations of continuum physics. We shall not attempt here to explain why this is so. It seems to be built into the geometric assumptions underlying our understanding of the "continuum." Brillouin's book [1] offers the reader the most convincing detailed explanation of this fact, at least for elasticity theory. Many books on general relativity offer more general explanations. (The books by Levi-Civita [1] and Einstein [1] are the best from this point of view.) Indeed, that this tensor operation of "divergence" occurs in the same way in all branches of classical continuum mechanics was one of the main mathematical clues in Einstein's mind in constructing The General Theory of Relativity.
Suppose, then, that $T$ is such a tensor field on $M$, and that $N$ is a hypersurface of $M$ such that $T$ is $C^\infty$ on $M - N$, but is merely continuous on all of $M$. For each $X \in V(M)$, $T(X)$ is then a vector field that is $C^\infty$ on $M - N$, but which is only continuous on $M$. Hence, for $p \in N$, for two curves $\sigma$ and $\sigma_1$ starting at $p$ but pointing out into $M - N$, and for $u \in M_p$, we can define $\delta_{T(X)}(u) \in M_p$ as before, measuring the jump in the first derivatives of $T(X)$ across $N$. To make things more symmetric, we can define $\delta_T : M_p \times M_p \to M_p$ by setting
$$\delta_T(X(p), u) = \delta_{T(X)}(u).$$
Explicitly, then, for $X, Y \in V(M)$,
$$\delta_T(X(p), Y(p)) = \lim_{t \to 0}\,\{[Y, T(X)](\sigma(t)) - [Y, T(X)](\sigma_1(t))\}. \tag{30.6}$$
Now
$$[Y, T(X)] = \nabla_Y(T(X)) - \nabla_{T(X)} Y = (\nabla_Y T)(X) + T(\nabla_Y X) - \nabla_{T(X)} Y.$$
Since $X$ and $Y$ are $C^\infty$ vector fields, and $T$ itself is continuous across $N$, the last two terms contribute nothing to (30.6); hence
$$\delta_T(X(p), Y(p)) = \lim_{t \to 0}\,\{(\nabla_Y T)(X)(\sigma(t)) - (\nabla_Y T)(X)(\sigma_1(t))\}. \tag{30.7}$$
The geometric compatibility conditions are then
$$\delta_T(M_p, N_p) = 0 \quad \text{for all } p \in N. \tag{30.8}$$
Suppose that $\operatorname{div} T$ is continuous across $N$. Then, by (30.7), the trace of the transformation $u \to \delta_T(v, u)$ is zero for fixed $v \in M_p$. This, together with (30.8), implies that
$$\delta_T(M_p, M_p) \subset N_p \quad \text{for all } p \in N. \tag{30.9}$$
To get some interesting information about $N$, let us suppose that $M$ has a pseudo-Riemannian metric $\langle\ ,\ \rangle$ for which $\nabla$ is the Levi-Civita affine connection, and such that
$$\langle T(X), Y\rangle = \pm\langle X, T(Y)\rangle \quad \text{for } X, Y \in V(M). \tag{30.10}$$
(In elasticity it is usual to have a (+) sign, while the (−) occurs in electromagnetism.) Suppose that $g$ is a function on $M$ such that $dg \ne 0$ at points of $N$, but that $g(N) = 0$. For $X, Y, Z \in V(M)$,
$$\langle \nabla_Y(T)(X), Z\rangle = Y(\langle T(X), Z\rangle) + \cdots = \pm Y(\langle X, T(Z)\rangle) + \cdots = \pm\langle X, (\nabla_Y T)(Z)\rangle + \cdots.$$
(The terms indicated by $\cdots$ do not affect the jump relation.) Thus we have that
$$\langle \delta_T(X, Y), Z\rangle = \pm\langle \delta_T(Z, Y), X\rangle. \tag{30.11}$$
In particular, by (30.9),
$$0 = \langle \delta_T(X, Y), \operatorname{grad} g\rangle = \pm\langle \delta_T(\operatorname{grad} g, Y), X\rangle.$$
Summing up, we have proved:
THEOREM 30.6
Suppose that $T$ is a tensor field on $M$ of type (1, 1) satisfying (30.10); that $N$ is a hypersurface of $M$ across which $T$ and $\operatorname{div} T$ are continuous. Suppose that $g$ is a function such that $g(N) = 0$. Then
$$\delta_T(\operatorname{grad} g, M_p) = 0 \quad \text{for all } p \in N. \tag{30.12}$$
Notice that the condition that $\operatorname{div} T$ be continuous is not yet sufficient to yield a condition for $N$ which is independent of $T$. Such a condition requires additional differential equations that $T$ must satisfy, which in physical problems involve thermodynamic conditions.
31 The Morse Index Theorem
Consider a differential equation of the form
$$v''(t) + r(t)v(t) = 0. \tag{31.1}$$
Classical Sturm-Liouville theory deals with such equations in which $v(t)$ and $r(t)$ are scalar-valued functions of $t$. The theory of these equations is well known, and of course they appear in many contexts in applied mathematics. However, the theory of systems of type (31.1) in which $v(t)$ is a vector-valued function of $t$ is considerably less developed and less well known, despite the fact that many physical problems lead to such equations in a very natural way, particularly in stability problems. Morse [1] has developed the foundations for a successful generalization of the classical Sturm-Liouville theory to such systems. We shall now present enough notations and definitions to be able to state the main result: the Morse index theorem. The proof will be given later. Since it may be difficult for the reader to see the forest for the trees while reading the proof, we may point out here that the proof basically consists in putting together certain well-known analytical techniques concerning systems of second-order linear differential equations with the basic ideas of the calculus of variations. The main difference between our proof and Morse's is that we try to work directly with the infinite-dimensional linear spaces that occur, whereas Morse, by a variety of ingenious analytical and geometric tricks, tries to reduce the infinite-dimensional situation to a finite one.
Let $V$ be a vector space of finite dimension† over the real numbers. Elements of $V$ will be denoted by such letters as $u, v, w, \ldots$. It will be assumed that $V$ has a given fixed, positive-definite, symmetric bilinear form $(u, v) \to \langle u, v\rangle$. Thus
$$\langle au + bv, a_1u_1 + b_1v_1\rangle = aa_1\langle u, u_1\rangle + ab_1\langle u, v_1\rangle + a_1b\langle v, u_1\rangle + bb_1\langle v, v_1\rangle$$
$$\text{for } a, a_1, b, b_1 \in R \ (= \text{real numbers}),\quad u, v, u_1, v_1 \in V, \tag{31.2a}$$
$$\langle u, v\rangle = \langle v, u\rangle \quad \text{for } u, v \in V, \tag{31.2b}$$
$$\langle u, u\rangle \ge 0 \quad \text{for } u \in V; \qquad \langle u, u\rangle = 0 \text{ if and only if } u = 0. \tag{31.2c}$$
† It seems to be an open problem to extend the theory to infinite-dimensional space.
For $v \in V$, put $\|v\| = \langle v, v\rangle^{1/2}$. Recall that the following inequalities follow from the positive-definite condition:
(a) $\|u + v\| \le \|u\| + \|v\|$ (triangle inequality).
(b) $|\langle u, v\rangle| \le \|u\|\,\|v\|$. Equality holds if and only if $au + bv = 0$ for some $a, b \in R$, not both zero (Schwarz inequality).
We must also consider linear transformations of $V$ into itself, usually denoted by $R, S, T, \ldots$, and bilinear symmetric forms other than $\langle\ ,\ \rangle$ that will not necessarily be positive-definite [that is, satisfy (31.2a), (31.2b), but not necessarily (31.2c)], and that will be denoted by $Q(\ ,\ )$. A linear transformation $R : V \to V$ is said to be symmetric if
$$\langle R(u), v\rangle = \langle u, R(v)\rangle \quad \text{for } u, v \in V.$$
Now $t$ will be a real parameter extending over the interval $[0, \infty)$ or a subinterval. We shall consider vector-valued functions of $t$, denoted usually by $u(t)$, $v(t)$, etc., defined over an interval and usually continuous, piecewise $C^2$, and taking values in $V$. The derivative with respect to $t$ is denoted by $v'(t)$, $v''(t)$, $dv/dt$, etc. We shall be considering differential operators of the form $v \to v''(t) + R_t(v(t))$, also denoted by
$$J = \frac{d^2}{dt^2} + R_t, \tag{31.3}$$
where $t \to R_t$ is a one-parameter family of symmetric linear transformations of $V$. (It is possible to generalize the theory by including some kinds of terms in $v'$ on the right-hand side of (31.3), but we prefer to treat this simpler case, referring to Morse [1] for a complete treatment.) We must also consider boundary conditions: Algebraically, a boundary condition is an ordered pair $(W, Q)$ consisting of a subspace $W \subset V$ and a bilinear, symmetric form $(u, v) \to Q(u, v)$ defined on $W$ alone. One fundamental problem may be described as follows: Find a solution of
$$v''(t) + R_t(v(t)) = 0, \quad 0 \le t < \infty, \tag{31.4}$$
subject to the following boundary conditions:
$$v(0) \in W, \quad \langle v'(0), w\rangle = -Q(v(0), w) \quad \text{for all } w \in W, \tag{31.5}$$
$$v(a) \in W^a, \quad \langle v'(a), w\rangle = -Q^a(v(a), w) \quad \text{for all } w \in W^a, \tag{31.6}$$
for a given number $a > 0$, and two sets $(W, Q)$ and $(W^a, Q^a)$ of boundary conditions. We refer to (31.5) and (31.6) as, respectively, left- and right-hand boundary conditions.
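There is a problem in the calculus of variations associated with (31.4) through (31.6) that is the foundation for the Morse treatment. Proceed as follows to find it: Suppose $v(t)$, $0 \le t \le a$, satisfies (31.4) through (31.6). Then†
$$0 = -\int_0^a \langle v'' + R_t(v), v\rangle\,dt = -\langle v'(t), v(t)\rangle\Big|_0^a + \int_0^a (\langle v', v'\rangle - \langle R_t(v), v\rangle)\,dt$$
$$= Q^a(v(a), v(a)) - Q(v(0), v(0)) + \int_0^a (\langle v', v'\rangle - \langle R_t(v), v\rangle)\,dt.$$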
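This suggests the following definition: Suppose $v(t)$, $0 \le t \le a$, is a curve in $V$. Define
$$I(v) = Q^a(v(a), v(a)) - Q(v(0), v(0)) + \int_0^a (\langle v', v'\rangle - \langle R_t(v), v\rangle)\,dt$$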
and call it the index of the curve $v$. If our boundary-value problem, (31.4) through (31.6), admits a solution, there is a curve $v$ with $I(v) = 0$; hence it is suggested that we turn this remark around and try to minimize $I(v)$ by a curve $v(t)$ satisfying (31.4) through (31.6). This is an ordinary variational problem. It is readily verified that its Euler equations are (31.4), but this fact will remain in the background.
In this treatment we shall restrict ourselves to the case in which the right-hand boundary conditions $(W^a, Q^a)$ are identically zero.
Definition
A point $a \in (0, \infty)$ is said to be a focal point for the operator $J$ and boundary condition $(W, Q)$ if there is a nontrivial $C^2$ curve $v(t)$ in $V$, $0 \le t \le a$, satisfying
$$J(v) = 0, \quad v(0) \in W, \quad \langle v'(0), w\rangle = -Q(v(0), w) \text{ for all } w \in W, \quad v(a) = 0.$$
The index of such a focal point is equal to the dimension of the linear space of all curves satisfying these conditions (hence, it is always finite and no greater than the dimension of $V$).
† Where it is felt that it will lead to no confusion, we shall compress the notation by omitting $t$.
Definition
Let $[0, a]$ be an interval of real numbers. Let $\Omega(0, a)$ be the space of continuous, piecewise $C^2$ curves $t \to v(t)$, $0 \le t \le a$, in $V$ satisfying the following conditions:
$$v(0) \in W, \quad \langle v'(0), w\rangle = -Q(v(0), w) \text{ for all } w \in W, \quad v(a) = 0.$$
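Since two such curves can be added pointwise and multiplied by real constants, $\Omega(0, a)$ is a vector space over the real numbers. For $v \in \Omega(0, a)$, let
$$I(v) = -Q(v(0), v(0)) + \int_0^a (\langle v', v'\rangle - \langle R_t(v), v\rangle)\,dt.$$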
Define the index of the interval $[0, a]$ as the maximum number of linearly independent elements of $\Omega(0, a)$ on which the function $I$ is negative. Thus there are the two distinct ideas of index of a focal point and index of a closed interval $[0, a]$. They are related via the following main theorem:
THEOREM 31.1 (MORSE INDEX THEOREM)
The index of an interval $[0, a]$ is finite and equal to the sum of indices of the focal points contained in the open interval $(0, a)$. It is also equal to the maximal number of linearly independent elements of $\Omega(0, a)$ that are $C^2$ and are eigenfunctions of the differential operator $J = (d^2/dt^2) + R_t$ for positive eigenvalues.
As a general intuitive remark, notice that the index of an interval is an analytical invariant of the operator $J$ and boundary condition $(W, Q)$, while the sum of the indices of the focal points is more like a topological invariant. Thus the index of the interval may be expected to vary reasonably smoothly when $J$, $(W, Q)$, or $[0, a]$ are varied in a reasonably smooth way. As such a variation is performed, it is not expected that each focal point varies smoothly; the remarkable fact contained in the index theorem is that the sum of indices of the focal points does vary in a more reasonable way. Another intuitive remark is that the index theorem provides the foundation for a perturbation-theory approach to the problem of finding focal points.
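The statement can be illustrated in the simplest scalar case. The sketch below assumes $V = R$, $R_t = r$ constant, and $W = \{0\}$ (so the left-hand condition is $v(0) = 0$ and $Q$ plays no role); it also assumes that the index form is then $I(v) = \int_0^a (v'^2 - r v^2)\,dt$, the standard form suggested by the integration by parts above. In this case the solutions of $J(v) = 0$ with $v(0) = 0$ are multiples of $\sin(\sqrt{r}\,t)$, so the focal points are $k\pi/\sqrt{r}$, each of index 1. The code discretizes $I$ on a uniform grid and counts the negative eigenvalues of the resulting tridiagonal matrix through the signs of its elimination pivots (Sylvester's law of inertia). The grid size and function names are hypothetical.

```python
import math

def index_of_interval(a, r=1.0, interior_points=200):
    # Discretize I(v) = integral of (v'^2 - r v^2) over [0, a] with v(0) = v(a) = 0.
    # The quadratic form becomes a symmetric tridiagonal matrix with constant
    # diagonal 2/h^2 - r and off-diagonal -1/h^2.
    h = a / (interior_points + 1)
    diag = 2.0 / (h * h) - r
    off = -1.0 / (h * h)
    negative = 0
    pivot = diag
    for k in range(interior_points):
        if k > 0:
            # Pivot recursion of symmetric Gaussian elimination (LDL^T).
            pivot = diag - off * off / pivot
        if pivot < 0:
            # Each negative pivot corresponds to one negative eigenvalue.
            negative += 1
    return negative

def focal_points_in(a, r=1.0):
    # Number of focal points k*pi/sqrt(r) lying in the open interval (0, a).
    count = 0
    k = 1
    while k * math.pi / math.sqrt(r) < a:
        count += 1
        k += 1
    return count

for a in (3.0, 7.0, 10.0):
    print(a, index_of_interval(a), focal_points_in(a))
```

For each interval the two counts should coincide, as the theorem asserts; the elimination-pivot count is only one convenient way to obtain the number of negative eigenvalues and is not the method of proof used in the text.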
Proof of the Morse Index Theorem
Let $V$, $(W, Q)$, $J = (d^2/dt^2) + R_t$, $\Omega(0, a)$, etc., be as described in the Introduction. They will be considered as fixed throughout the discussion. The proof of the index theorem will be broken up into small steps.
LEMMA 31.2
Given a pair $(u_0, v_0)$ of vectors in $V$, there is a unique curve in $V$: $t \to v(t)$, $0 \le t < \infty$, satisfying $J(v) = 0$, and $v(0) = u_0$, $v'(0) = v_0$. In particular, if $u_0 = v_0 = 0$, then $v(t) \equiv 0$. If $u_0$, $v_0$, and the coefficients of $J$ depend continuously on additional parameters, so do the resulting solutions, and the dependence is uniformly continuous for $t$ ranging over a bounded closed interval.
This follows from the basic existence theorem for ordinary differential equations.
LEMMA 31.3 The vector space of solution curves of $J = 0$ that are $C^2$ and satisfy the $(W, Q)$-boundary condition at $t = 0$ has the same dimension as $V$.
Proof. For later reference, we shall prove a little more and develop additional notations. Suppose $\dim V = n$, $\dim W = m$, $\dim W^\perp = n - m$. ($W^\perp$ denotes the orthogonal complement of $W$ in $V$ with respect to the form $\langle\ ,\ \rangle$. Explicitly, $W^\perp = \{v \in V : \langle v, w\rangle = 0 \text{ for all } w \in W\}$.) Adopt the following ranges of indices:
$$1 \le a, b, \ldots \le m; \qquad 1 \le i, j, \ldots \le n; \qquad m + 1 \le \alpha, \beta, \ldots \le n. \tag{31.8}$$
Adopt a fixed orthonormal basis $(u_i)$ of $V$ such that $(u_a)$ and $(u_\alpha)$ are, respectively, orthonormal bases of $W$ and $W^\perp$. Then we can find $n$ solution curves of $J = 0$, denoted by $v_i(t)$, $0 \le t < \infty$, $1 \le i \le n$, such that
$$v_a(0) = u_a \quad \text{for } 1 \le a \le m, \tag{31.9a}$$
$$v_a'(0) \in W, \quad \langle v_a'(0), w\rangle = -Q(u_a, w) \quad \text{for all } w \in W,\ 1 \le a \le m, \tag{31.9b}$$
$$v_\alpha(0) = 0 \quad \text{for } m + 1 \le \alpha \le n, \tag{31.9c}$$
$$v_\alpha'(0) = u_\alpha \quad \text{for } m + 1 \le \alpha \le n. \tag{31.9d}$$
(The existence and uniqueness of solution curves satisfying these conditions follow easily from Lemma 31.2.) Note also that these curves satisfy the $(W, Q)$-boundary condition at $t = 0$. We show that the curves $v_i(t)$, $1 \le i \le n$, are linearly independent. Suppose there is a linear relation of the form
$$\sum_i C_i v_i(t) = 0.$$
Setting $t = 0$ and using (31.9a) and (31.9c), we have $\sum_a C_a u_a = 0$, implying $C_a = 0$, implying $\sum_\alpha C_\alpha v_\alpha(t) = 0$. Differentiating, setting $t = 0$, and using
Part 4. Additional Topics-Differential
Geometry
(31.9d), we have $\sum_\alpha C_\alpha u_\alpha = 0$, forcing $C_\alpha = 0$, whence linear independence of the $v_i(t)$. To complete the proof of Lemma 31.3, we show that every solution curve $v(t)$ of $J = 0$ satisfying the $(W, Q)$-boundary condition at $t = 0$ can be written as a sum of the $v_i(t)$ with constant coefficients. Now we have $v(0) \in W$; hence $v(0)$ can be written as
$$v(0) = \sum_a C_a u_a = \sum_a C_a v_a(0).$$
The solution curve $v(t) - \sum_a C_a v_a(t)$ is zero for $t = 0$; hence, by the boundary condition, its derivative at $t = 0$ is orthogonal to $W$ and can be written as a sum $\sum_\alpha C_\alpha u_\alpha$. Thus $v(t) - \sum_i C_i v_i(t)$ is a solution of $J = 0$, is zero at $t = 0$, and its first derivative is zero at $t = 0$; hence it is identically zero. Q.E.D.
For future reference, we shall refer to the basis $(v_i(t))$ of solutions of $J = 0$ and the $(W, Q)$-boundary condition constructed above as a canonical basis.
LEMMA 31.4 If $v(t)$ and $w(t)$ are two solutions of $J = 0$ satisfying the $(W, Q)$-boundary condition at $t = 0$, then
$$\langle v'(t), w(t)\rangle = \langle v(t), w'(t)\rangle \quad \text{for } 0 \le t < \infty. \tag{31.10}$$
Proof. Note the identity
$$\frac{d}{dt}\bigl(\langle v'(t), w(t)\rangle - \langle v(t), w'(t)\rangle\bigr) = \langle v''(t), w(t)\rangle + \langle v'(t), w'(t)\rangle - \langle v'(t), w'(t)\rangle - \langle v(t), w''(t)\rangle = \langle -R_t(v(t)), w(t)\rangle + \langle v(t), R_t(w(t))\rangle = 0,$$
obtained by use of the symmetry property of $R_t$. Now
$$\langle v'(0), w(0)\rangle - \langle v(0), w'(0)\rangle = -Q(v(0), w(0)) + Q(v(0), w(0)) = 0.$$
Hence the difference is constant in $t$ and vanishes at $t = 0$, which proves (31.10).
LEMMA 31.5 If $\epsilon$ is sufficiently small, there are no focal points on the interval $[0, \epsilon]$.
Proof. Let $(v_i(t))$, $1 \le i \le n$, be a canonical basis for solutions of $J = 0$ satisfying the $(W, Q)$-boundary condition at $t = 0$. Define curves $w_i(t)$ as follows:
$$w_a(t) = v_a(t) \quad \text{for } 1 \le a \le m; \qquad w_\alpha(t) = \frac{v_\alpha(t)}{t} \quad \text{for } m + 1 \le \alpha \le n.$$
By (31.9), $w_\alpha$ is continuous at $t = 0$ and equals there $v_\alpha'(0) = u_\alpha$. Then the vectors $(w_i(t))$ are linearly independent for $t = 0$; hence by continuity also for $t$ sufficiently small, say, for $0 \le t \le \epsilon$. Then $[0, \epsilon]$ can contain no focal points. For suppose otherwise: That is, $v(t)$ is a solution of $J = 0$, not identically zero, satisfying the $(W, Q)$-boundary condition at $t = 0$ and vanishing at, say, $t = \epsilon$. By Lemma 31.3, $v(t)$ can be written as $\sum_i C_i v_i(t)$, for constants $C_i$. Hence, also,
$$0 = v(\epsilon) = \sum_a C_a w_a(\epsilon) + \epsilon \sum_\alpha C_\alpha w_\alpha(\epsilon).$$
Thus $C_a = 0 = C_\alpha$; hence $v(t) \equiv 0$, a contradiction.
LEMMA 31.6 Suppose $t_0 \in (0, a)$ is a focal point. Then, for $\epsilon$ sufficiently small, $[t_0 - \epsilon, t_0 + \epsilon]$ contains no other focal point. We then conclude, using also Lemma 31.5, that each bounded interval contains only a finite number of focal points.
Proof. Suppose $v_i(t)$, $1 \le i \le n$, is any basis of solutions of $J = 0$ satisfying the $(W, Q)$-boundary condition at $t = 0$ such that
$$v_i(t_0) = 0 \quad \text{for } 1 \le i \le p,$$
but $v_i(t_0)$ are linearly independent for $p + 1 \le i \le n$ ($p$ is then the index of the focal point). By formula (31.10),
$$\langle v_i'(t_0), v_j(t_0)\rangle = 0 \quad \text{for } 1 \le i \le p,\ p + 1 \le j \le n.$$
The $v_i'(t_0)$ must be linearly independent for $1 \le i \le p$ [otherwise the $v_1(t), \ldots, v_p(t)$ could not be linearly independent]; hence $v_1'(t_0), \ldots, v_p'(t_0), v_{p+1}(t_0), \ldots, v_n(t_0)$ must form a basis for $V$. Now
$$\lim_{t \to t_0} \frac{v_i(t)}{t - t_0} = v_i'(t_0) \quad \text{for } 1 \le i \le p;$$
hence, if $\epsilon$ is sufficiently small and $0 < |t - t_0| \le \epsilon$, the vectors
$$\frac{v_i(t)}{t - t_0},\quad v_j(t) \quad \text{for } 1 \le i \le p,\ p + 1 \le j \le n,$$
form a basis for $V$. The proof that there are no focal points other than $t_0$ on $[t_0 - \epsilon, t_0 + \epsilon]$ is now similar to that in Lemma 31.5.
We need more notation. If $v(t)$ is a curve in $\Omega(0, a)$ and $w(t)$, $0 \le t \le a$, is any continuous piecewise $C^1$ curve in $V$ satisfying the $(W, Q)$-boundary condition at $t = 0$, put
$$I(v, w) = -Q(v(0), w(0)) + \int_0^a \bigl(\langle v'(t), w'(t)\rangle - \langle R_t(v(t)), w(t)\rangle\bigr)\,dt. \tag{31.11}$$
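To see how the sign of the form (31.11) reacts to a conjugate point, here is a small numerical sketch. It rests on assumptions made only for illustration: the scalar case $V = R$ with $W = 0$ and $Q = 0$ (so the $Q$ term drops out), $R_t = 1$, the test curve $v(t) = \sin(\pi t/a)$, and the composite Simpson rule for the integral. For this curve $I(v, v) = (a/2)((\pi/a)^2 - 1)$, which is positive for $a < \pi$ and negative for $a > \pi$. The function names are hypothetical.

```python
import math

def index_form(v, dv, R, a, n=2000):
    """Approximate I(v, v) = int_0^a (v'(t)^2 - R(t) v(t)^2) dt by Simpson's rule.

    Scalar case with W = 0, Q = 0 (assumed), so no boundary term appears.
    n is the number of subintervals and must be even.
    """
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    h = a / n

    def g(t):
        return dv(t) ** 2 - R(t) * v(t) ** 2

    total = g(0.0) + g(a)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

def sine_curve(a):
    """Return v(t) = sin(pi t / a) and its derivative; v vanishes at 0 and a."""
    w = math.pi / a
    return (lambda t: math.sin(w * t)), (lambda t: w * math.cos(w * t))

for a in (2.0, 4.0):
    v, dv = sine_curve(a)
    print(a, index_form(v, dv, lambda t: 1.0, a))
```

The sign change between $a = 2$ and $a = 4$ reflects the conjugate point of $t = 0$ at $t = \pi$ in this model.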
Let $\Omega_0(0, a)$ be the subset of curves $v(t)$ in $\Omega(0, a)$ defined by taking all linear combinations with constant coefficients of curves of the following type: For each $t_0 \in (0, a]$ that is a focal point, consider a $C^2$ curve $v(t)$ in $[0, t_0]$ that satisfies $J = 0$ and the $(W, Q)$-boundary condition at $t = 0$, and that vanishes at $t = t_0$. Extend this curve over $[0, a]$ by defining $v(t) = 0$ for $t_0 \le t \le a$. This is shown graphically in Fig. 6. Then:
The dimension of $\Omega_0(0, a)$ as a real vector space is equal to the sum of indices of the focal points on the interval $(0, a]$. (31.12)
FIGURE 6
LEMMA 31.7 Let $(v_i(t))$, $1 \le i \le n$, $0 \le t \le a$, be any basis for the vector space of curves in $V$ that are $C^2$ and satisfy $J = 0$ and the $(W, Q)$-boundary condition at $t = 0$. Suppose that $u(t)$, $0 \le t \le a$, is a differentiable curve in $V$ such that $I(u, w) = 0$ for all curves $w(t)$, $0 \le t \le a$, that lie in $\Omega_0(0, a)$. Then $u(t)$ admits a representation as
$$u(t) = \sum_{i=1}^n a_i(t) v_i(t) \quad \text{for } 0 \le t \le a, \tag{31.13}$$
where the coefficients $a_i(t)$ are continuous, piecewise $C^2$ functions for $0 \le t \le a$.
Proof. Obviously, $u(t)$ admits such a representation (31.13) valid except possibly for the values of $t$ that are focal points. We must show the functions $a_i(t)$ obtained in this way have a limit as $t$ approaches a focal point. Suppose, then, that $t_0 \in (0, a]$ is such a focal point. We may suppose the basis $(v_i(t))$ is chosen so that $v_i(t_0) = 0$ for $1 \le i \le p$, and $(v_i(t_0))$ are linearly independent for $p + 1 \le i \le n$. By Lemma 31.4,
$$\langle v_i'(t_0), v_j(t_0)\rangle = 0 \quad \text{for } 1 \le i \le p,\ p + 1 \le j \le n.$$
As before, this implies that $v_1'(t_0), \ldots, v_p'(t_0), v_{p+1}(t_0), \ldots, v_n(t_0)$ forms a basis for $V$. Then, for $t$ close to $t_0$,
$$v_i(t) = (t - t_0)\,v_i'(t_0) + o(t - t_0) \quad \text{for } 1 \le i \le p;$$
hence, for $t$ sufficiently close to $t_0$,
$$\left(\frac{v_1(t)}{t - t_0}, \ldots, \frac{v_p(t)}{t - t_0}, v_{p+1}(t), \ldots, v_n(t)\right)$$
forms a basis of $V$ and depends in a $C^1$ way on $t$. Since $u(t)$ is continuous, the functions $a_i(t)(t - t_0)$ for $1 \le i \le p$ and $a_i(t)$ for $p + 1 \le i \le n$ are continuous at $t_0$. These remarks are valid for any $u$ that is merely continuous. Now we want to take into account the fact that $I(u, w) = 0$ for all $w \in \Omega_0(0, a)$. For $1 \le j \le p$, let $w_j$ be the elements of $\Omega_0(0, a)$ defined as follows:
$$w_j(t) = v_j(t) \quad \text{for } 0 \le t \le t_0, \qquad w_j(t) = 0 \quad \text{for } t_0 \le t \le a. \tag{31.14}$$
Then
$$0 = I(u, w_j) = -Q(u(0), w_j(0)) + \int_0^a \bigl(\langle u'(t), w_j'(t)\rangle - \langle R_t(u(t)), w_j(t)\rangle\bigr)\,dt$$
equals, using (31.14),
$$-Q(u(0), v_j(0)) + \int_0^{t_0} \bigl(\langle u'(t), v_j'(t)\rangle - \langle R_t(u(t)), v_j(t)\rangle\bigr)\,dt$$
equals, after integrating by parts and taking into account the fact that $J(v_j) = 0$ and that $u$ and $v_j$ satisfy the $(W, Q)$-boundary condition at $t = 0$, $\langle u(t_0), v_j'(t_0)\rangle$. Thus, $u(t_0)$ must be a linear combination of $v_{p+1}(t_0), \ldots, v_n(t_0)$. We conclude that
$$\lim_{t \to t_0} a_i(t)(t - t_0) = 0 \quad \text{for } 1 \le i \le p.$$
Now, since $u(t)$ is differentiable, the functions $a_i(t)(t - t_0)$ for $1 \le i \le p$ are differentiable at $t = t_0$. We conclude (using the definition of derivative) that $\lim_{t \to t_0} a_i(t)$ exists and equals the derivative of $a_i(t)(t - t_0)$ at $t = t_0$. Q.E.D.
LEMMA 31.8 Let $v_i(t)$, $1 \le i \le n$, $0 \le t \le a$, be a basis of curves in $V$ that are $C^2$ and satisfy $J = 0$ and the $(W, Q)$-boundary condition at $t = 0$. Suppose $u(t)$ and $v(t)$, $0 \le t \le a$, are two curves in $V$ admitting representations of the following type:
Suppose in addition that the functions $f_i(t)$ are continuous and piecewise $C^1$ for $0 < t \le a$, and that $u(t)$ satisfies the $(W, Q)$-boundary condition at $t = 0$. Then $I(u) \ge I(v)$. Equality holds only if $u = v$.
Since all the other terms in this identity approach a limit, we have
Since it must clearly be $> 0$ unless $f_i'(t) = 0$, that is, unless $u(t) = v(t)$ for $0 \le t \le a$, we have $I(u) > I(v)$ except if $u = v$. Q.E.D.
This lemma is due to Ambrose [1] and serves as a replacement for the arguments from the general calculus of variations that were used by Morse.
COROLLARY TO LEMMA 31.8 The interval $[0, a]$ contains no focal points if and only if $I(u) > 0$ for all nonzero curves $u \in \Omega(0, a)$. (In other words, the Morse index theorem holds if $[0, a]$ contains no focal point.)
We must now apply Lemma 31.8 in the special case in which $W$ and $Q$ are both zero; focal points are, in this case, called conjugate points. For the reader's convenience, we restate the definition in slightly different form.
Definition Let $a$ and $b$ be positive real numbers; $a$ and $b$ are said to be mutually conjugate if there is a $C^2$ curve $v(t)$, not identically zero, satisfying $v''(t) + R_t(v(t)) = 0$ and $v(a) = v(b) = 0$.
LEMMA 31.9 Suppose that $a$ and $b$ are real numbers, $0 \le a < b$, such that the real-number interval between them contains no pair of mutually conjugate points. Suppose that $u(t)$ and $v(t)$, $0 \le t < \infty$, are continuous curves such that $u$ is piecewise $C^2$ and $v$ is $C^2$, and that
$$\left(\frac{d^2}{dt^2} + R_t\right)(v) = 0, \qquad u(b) = v(b), \qquad u(a) = v(a).$$
Then
$$\int_a^b \bigl(\|v'(t)\|^2 - \langle R_t(v(t)), v(t)\rangle\bigr)\,dt \le \int_a^b \bigl(\|u'(t)\|^2 - \langle R_t(u(t)), u(t)\rangle\bigr)\,dt.$$
Equality holds only if $u(t) = v(t)$ for $a \le t \le b$.
Proof. First we deal with the case $u(a) = v(a) = 0$. By a translation of the origin of the $t$-axis, we can also suppose that $a = 0$. The result then follows from Lemma 31.8, since our hypotheses imply that there are no focal points in the interval $(0, b]$, with respect to the boundary conditions $W = 0$, $Q = 0$, at $t = 0$.
Now we reduce the general case $u(a) = v(a)$ to the case just considered. By Lemma 31.3 (since $a$ and $b$ are not mutually conjugate), there is a $C^2$ curve $w(t)$ satisfying
$$w(b) = 0, \qquad w(a) = u(a) = v(a), \qquad \left(\frac{d^2}{dt^2} + R_t\right) w = 0.$$
Now
$$\frac{d}{dt}\bigl(\langle v(t), w'(t)\rangle - \langle v'(t), w(t)\rangle\bigr) = \langle v'(t), w'(t)\rangle + \langle v(t), w''(t)\rangle - \langle v''(t), w(t)\rangle - \langle v'(t), w'(t)\rangle = -\langle v(t), R_t(w(t))\rangle + \langle R_t(v(t)), w(t)\rangle = 0.$$
Hence,
$$\langle v(b), w'(b)\rangle = \langle v'(b), w(b)\rangle + \langle v(a), w'(a)\rangle - \langle v'(a), w(a)\rangle$$
equals, after using the relations $w(b) = 0$ and $w(a) = v(a)$,
$$\langle v(a), w'(a)\rangle - \langle v'(a), v(a)\rangle.$$
Putting the last relation together with the last inequality proves Lemma 31.9.
Lemmas 31.8 and 31.9 may be regarded as particular analytical tools needed to prove the index theorem. They really contain fundamental facts from the calculus of variations in a disguised form. Now we proceed to another analytical tool, proving (again in slightly disguised form) a "compactness" principle for solutions of the type of differential equations we have been considering.
THEOREM 31.10 Suppose that $J_k = (d^2/dt^2) + R_t^k$ is a sequence of differential operators of the type we have been considering, $k = 1, 2, \ldots$, that $(W_k, Q_k)$ is a sequence of boundary conditions, and that $a_k$ is a sequence of real numbers. Suppose that
$$\lim_{k \to \infty} R_t^k = R_t, \qquad \lim_{k \to \infty} W_k = W, \qquad \lim_{k \to \infty} Q_k = Q, \qquad \lim_{k \to \infty} a_k = a > 0.$$
Suppose further that $v_k(t)$, $0 \le t \le a_k$, is a sequence of $C^2$ curves in $V$ satisfying the $(W_k, Q_k)$-boundary condition at $t = 0$ and
$$v_k(a_k) = 0, \qquad J_k(v_k) = 0, \qquad \int_0^{a_k} \|v_k(t)\|^2\,dt \le 1.$$
Then at least one subsequence of the $v_k$ converges, along with its first two derivatives, uniformly to a $C^2$ curve $v(t)$, $0 \le t \le a$, that is a solution of $J(v) = 0$, the $(W, Q)$-boundary condition, and $v(a) = 0$.
Proof. Let
$$u_k(t) = \frac{v_k(t)}{\|v_k(0)\| + \|v_k'(0)\|}.$$
Since $\|u_k(0)\|$ and $\|u_k'(0)\|$ are $\le 1$, we can suppose after at most taking subsequences that $\lim_{k \to \infty} u_k'(0) = u_0$, $\lim_{k \to \infty} u_k(0) = u_1$. By the existence theorem for ordinary linear differential equations of the type $J_k = 0$, $u_k(t)$ converges, along with its first two derivatives, uniformly to a $C^2$ nonidentically zero curve $u(t)$, $0 \le t \le a$, that is a solution of $J(u) = 0$, the $(W, Q)$-boundary condition,
and $u(a) = 0$. Then, also,
$$\lim_{k \to \infty} \int_0^{a_k} \|u_k(t)\|^2\,dt = \int_0^a \|u(t)\|^2\,dt.$$
But also
$$\int_0^{a_k} \|u_k(t)\|^2\,dt = \frac{1}{(\|v_k(0)\| + \|v_k'(0)\|)^2} \int_0^{a_k} \|v_k(t)\|^2\,dt \le \frac{1}{(\|v_k(0)\| + \|v_k'(0)\|)^2}.$$
Hence $\|v_k(0)\| + \|v_k'(0)\|$ itself is bounded, and we can apply the existence theorem for the systems $J_k = 0$ to infer the existence of a curve $v(t)$ toward which $v_k(t)$ and its first two derivatives converge uniformly. Q.E.D.
We can now proceed to the proof of the index theorem. It is most convenient to arrange the proof so that the final result will appear as a statement which says that different kinds of indices are in reality the same. Hence we now introduce the different indices, and also the so-called augmented indices.
Definition Let $a$ be a positive real number.
$I_1(0, a)$ = the sum of indices of the focal points contained in the interval $[0, a)$.
$AI_1(0, a)$ = the sum of indices of the focal points contained in the interval $[0, a]$. (Thus $AI_1$ is the augmented index corresponding to the index $I_1$.)
$I_2(0, a)$ = the maximal dimension of a linear subspace of $\Omega(0, a)$ on which the form $v \to I(v)$ is negative-definite.
$AI_2(0, a)$ = the maximal dimension of a linear subspace of $\Omega(0, a)$ on which the form $v \to I(v)$ is negative-semidefinite.
$I_3(0, a)$ = the maximal number of linearly independent $C^2$ eigenfunctions of the differential operator $(d^2/dt^2) + R_t$ corresponding to positive eigenvalues that also satisfy the boundary conditions implied by membership in $\Omega(0, a)$.
$AI_3(0, a)$ = the maximal number of linearly independent $C^2$ eigenfunctions of the differential operator $(d^2/dt^2) + R_t$ corresponding to nonnegative eigenvalues that also satisfy the boundary conditions implied by membership in $\Omega(0, a)$.
Note that several facts follow readily:
$$AI_3(0, a) - I_3(0, a) = AI_1(0, a) - I_1(0, a) = \begin{cases} \text{the index of the focal point } a & \text{(if } a \text{ is a focal point)}, \\ 0 & \text{(if } a \text{ is not a focal point)}. \end{cases} \tag{31.15}$$
$$I_3(0, a) \le I_2(0, a), \qquad AI_3(0, a) \le AI_2(0, a). \tag{31.16}$$
Proof. Suppose that $u(t)$, $0 \le t \le a$, lies in $\Omega(0, a)$ and satisfies
$$u''(t) + R_t(u(t)) = \lambda^2 u(t), \quad \text{with } \lambda \ge 0.$$
Then
$$-\lambda^2 \int_0^a \|u(t)\|^2\,dt = -\int_0^a \langle u(t), u''(t) + R_t(u(t))\rangle\,dt$$
equals, after integrating by parts and taking into account the boundary conditions satisfied by $u$, $I(u)$. Then $I(u) \le 0$, with strict inequality if $\lambda > 0$ and $u$ is not identically zero. To prove (31.16), notice now that $u \to I(u)$ restricted to the linear subspace of $\Omega(0, a)$ spanned by the positive eigenfunctions of $(d^2/dt^2) + R_t$ is negative-definite. Similarly, $u \to I(u)$ is negative-semidefinite on the subspace of $\Omega(0, a)$ spanned by the nonnegative eigenfunctions of $(d^2/dt^2) + R_t$.
LEMMA 31.11 $AI_3(0, a)$ is finite.
Proof. Suppose otherwise: That is, there are an infinite number of $C^2$ eigenfunctions $u_k(t)$, $k = 1, 2, \ldots$, $0 \le t \le a$, of $(d^2/dt^2) + R_t$ corresponding to eigenvalues $\lambda_k^2$ and satisfying the boundary condition corresponding to lying in $\Omega(0, a)$. We can suppose without loss of generality that
$$|Q(u_k(0), u_k(0))| + \int_0^a \langle u_k(t), u_k(t)\rangle\,dt = 1, \tag{31.17a}$$
$$\int_0^a \langle u_k(t), u_j(t)\rangle\,dt = 0 \quad \text{if } k \ne j. \tag{31.17b}$$
Then
$$\lambda_k^2 \int_0^a \langle u_k(t), u_k(t)\rangle\,dt = \int_0^a \langle u_k''(t) + R_t(u_k(t)), u_k(t)\rangle\,dt = Q(u_k(0), u_k(0)) - \int_0^a \bigl(\|u_k'(t)\|^2 - \langle R_t(u_k(t)), u_k(t)\rangle\bigr)\,dt \le Q(u_k(0), u_k(0)) + \int_0^a \langle R_t(u_k(t)), u_k(t)\rangle\,dt.$$
There is a real number $\delta$ such that
$$\langle R_t(v), v\rangle \le \delta \langle v, v\rangle \quad \text{for all } v \in V,\ 0 \le t \le a.$$
Then
$$\lambda_k^2 \le |Q(u_k(0), u_k(0))| + \delta \int_0^a \langle u_k(t), u_k(t)\rangle\,dt < 1 + \delta;$$
that is, the sequence $(\lambda_k^2)$ is bounded. We can then suppose, after possibly taking subsequences, that $\lim_{k \to \infty} \lambda_k = \lambda$, and that $u_k(t)$ converges as $k \to \infty$, along with its first two derivatives, uniformly to a curve $u(t)$ that is an eigenfunction for eigenvalue $\lambda^2$ (using Theorem 31.10). But this contradicts (31.17b).
LEMMA 31.12
$$AI_3(0, a) = AI_2(0, a), \qquad I_3(0, a) = I_2(0, a).$$
Proof. We prove the first equality: The second is similar. Let $d = AI_3(0, a)$. Then there are $d$ linearly independent eigenfunctions $u_1(t), \ldots, u_d(t)$ in $\Omega(0, a)$, with eigenvalues $\lambda_1^2, \ldots, \lambda_d^2$. We can normalize so that
$$\int_0^a \langle u_j(t), u_k(t)\rangle\,dt = \delta_{jk} \quad \text{for } 1 \le j, k \le d.$$
What we must show is that if $u(t) \in \Omega(0, a)$ is not identically zero and satisfies
$$\int_0^a \langle u_k(t), u(t)\rangle\,dt = 0 \quad \text{for } 1 \le k \le d, \tag{31.18}$$
then $I(u) > 0$. Suppose otherwise: That is, such a $u$ exists with $I(u) \le 0$. Now minimize $I(u)$ over all $u \in \Omega(0, a)$ satisfying (31.18) and $\int_0^a \|u(t)\|^2\,dt = 1$. Using the "direct method" of the calculus of variations [1], this minimum is taken on by a $C^2$ function $u_0(t)$ in $\Omega(0, a)$ that is also an eigenfunction of $(d^2/dt^2) + R_t$ with eigenvalue $\lambda_0$. But this eigenfunction would then have to satisfy
$$0 \ge I(u_0) = -\lambda_0 \int_0^a \|u_0(t)\|^2\,dt,$$
forcing $\lambda_0 \ge 0$, contradicting the definition of $d$.
LEMMA 31.13 If $0 \le a < b$, then
$$I_2(0, a) \le I_2(0, b), \qquad AI_2(0, a) \le AI_2(0, b), \qquad AI_1(0, a) \le I_2(0, b).$$
Proof. Choose $\epsilon$ sufficiently small and positive so that $a + \epsilon \le b$, and so that there are no mutually conjugate points on the interval $[a - \epsilon, a + \epsilon]$. We are now going to define a linear mapping $\phi_\epsilon: \Omega(0, a) \to \Omega(0, b)$ such that
$$I(\phi_\epsilon(u)) \le I(u) \quad \text{for each } u \in \Omega(0, a) \tag{31.19}$$
by making use of Lemma 31.9. Explicitly, $\phi_\epsilon(u)(t)$, $0 \le t \le b$, is to be a continuous piecewise $C^2$ curve in $V$ such that
$$\phi_\epsilon(u)(t) = u(t) \quad \text{for } 0 \le t \le a - \epsilon; \tag{31.20a}$$
$$\phi_\epsilon(u)(t) \text{ is a solution of } \frac{d^2}{dt^2} + R_t = 0 \text{ for } a - \epsilon \le t \le a + \epsilon, \text{ with } \phi_\epsilon(u)(a - \epsilon) = u(a - \epsilon),\ \phi_\epsilon(u)(a + \epsilon) = 0; \tag{31.20b}$$
$$\phi_\epsilon(u)(t) = 0 \quad \text{for } a + \epsilon \le t \le b. \tag{31.20c}$$
The reader will readily verify that $\phi_\epsilon$ is a linear mapping that (using Lemma 31.9) satisfies (31.19). Suppose now that $u_1(t), \ldots, u_d(t)$ are linearly independent elements of $\Omega(0, a)$ on which the form $I$ is $< 0$. By (31.19), $I(\phi_\epsilon(u_k)) < 0$ for $1 \le k \le d$. We must then show that $\phi_\epsilon(u_k)(t)$ are linearly independent if $\epsilon$ is sufficiently small. Suppose otherwise: That is, there is, for each $\epsilon$, a relation of the form $\sum_{k=1}^d a_k(\epsilon) u_k(t) = 0$ valid for $0 \le t \le a - \epsilon$. We can normalize so that $\sum_k a_k(\epsilon)^2 = 1$. Then there would be a sequence of $\epsilon$ going to zero such that $a_k(\epsilon) \to a_k$, and a relation of the form $\sum_k a_k u_k(t) = 0$, valid for $0 \le t \le a$, contradicting that the $u_k$ were linearly independent. This proves the first inequality in Lemma 31.13. The second is similar. The third involves a slight modification of the argument. Let $a_1, \ldots, a_f$ be the focal points in the interval $[0, a]$, arranged so that $0 < a_1 < a_2 < \cdots < a_f \le a$. Let $d_1, \ldots, d_f$ be the indices of each of the focal points. Let $u_1(t), \ldots, u_{d_j}(t)$, $0 \le t \le a_j$, be in $\Omega(0, a_j)$, be linearly independent, and $C^2$, and satisfy $(d^2/dt^2) + R_t = 0$. Then $\phi_\epsilon(u_1), \ldots, \phi_\epsilon(u_{d_j})$ span a subspace of $\Omega(0, b)$ on which $I$ is
LEMMA 31.14 If $\epsilon$ is sufficiently small,
$$AI_3(0, a + \epsilon) \le AI_3(0, a).$$
Proof. Suppose otherwise: Let $\epsilon_k$, $k = 1, 2, \ldots$, be a sequence of real numbers with $\epsilon_k \to 0$ as $k \to \infty$, and with
$$AI_3(0, a + \epsilon_k) \ge AI_3(0, a) + 1.$$
Let $u_{j,k}(t)$, $1 \le j \le AI_3(0, a) + 1$, $1 \le k < \infty$, $0 \le t \le a + \epsilon_k$, be curves in $\Omega(0, a + \epsilon_k)$ that are $C^2$, that are eigenfunctions of $(d^2/dt^2) + R_t$ with nonnegative eigenvalues, and that, for fixed $k$, are linearly independent. Using Theorem 31.10, we can arrange (by taking subsequences and normalizing) that $u_{j,k}(t)$ converges to a curve $u_j(t)$ as $k \to \infty$, that the first two derivatives converge uniformly for $0 \le t \le a$, and that the corresponding eigenvalues of $(d^2/dt^2) + R_t$ converge. Then
$$\int_0^a \langle u_{j_1}(t), u_{j_2}(t)\rangle\,dt = \delta_{j_1 j_2} \quad \text{for } 1 \le j_1, j_2 \le AI_3(0, a) + 1.$$
But the $u_j$ are $C^2$, belong to $\Omega(0, a)$, and are eigenfunctions of $(d^2/dt^2) + R_t$ with nonnegative eigenvalues, and are linearly independent, a contradiction.
Now we can prove the index theorem itself. Note that it is equivalent to the statement:
$$I_1(0, a) = I_2(0, a) = I_3(0, a). \tag{31.21}$$
Note that we have already proved the second of these equalities (Lemma 31.12). The corollary to Lemma 31.8 implies that the first holds if $a$ is sufficiently small. Assuming that (31.21) is true for $a$, we shall prove it is true for $a + \epsilon$, if $\epsilon$ is sufficiently small and positive. Now, $I_1(0, a + \epsilon) = I_1(0, a) + (\text{index of the focal point } a)$. By Lemma 31.13,
$$I_1(0, a) = I_2(0, a) \le I_2(0, a + \epsilon) = I_3(0, a + \epsilon) \le AI_3(0, a) = I_3(0, a) + \text{index of } a,$$
where the last inequality holds if $\epsilon$ is sufficiently small, by Lemma 31.14. Also,
$$I_1(0, a + \epsilon) \le I_2(0, a + \epsilon) = I_3(0, a + \epsilon) \le I_3(0, a) + \text{index of } a.$$
Thus $I_3(0, a + \epsilon) - \text{index of } a = I_1(0, a)$, which proves (31.21) for $a + \epsilon$. Then the set of all numbers $b$ such that (31.21) is true for all $a \in [0, b]$ is open. To complete the proof, we must show that it is closed. Suppose, then, that $a_k$ is a monotone-increasing sequence, $\lim_{k \to \infty} a_k = a$, such that (31.21) is true for each $a_k$. We must prove it is true for $a$ also. Thus, we have
$$I_1(0, a_k) = I_2(0, a_k) \quad \text{for } k = 1, 2, \ldots.$$
From the definition of $I_1$ we have $\lim_{k \to \infty} I_1(0, a_k) = I_1(0, a)$. By Lemma 31.13, $I_2(0, a_k) \le I_2(0, a)$. Let us suppose that $\lim_{k \to \infty} I_2(0, a_k) \ne I_2(0, a)$; that is,
$$I_2(0, a) > I_2(0, a_k) \quad \text{for all } k.$$
To complete the proof, we show that
$$I_3(0, a) \le I_2(0, a - \epsilon) \quad \text{for } \epsilon \text{ sufficiently small.} \tag{31.22}$$
Suppose, then, that $u_1(t), \ldots, u_d(t)$ are linearly independent $C^2$ eigenfunctions of $(d^2/dt^2) + R_t$ for positive eigenvalues $\lambda_1^2, \ldots, \lambda_d^2$ that lie in $\Omega(0, a)$ ($d = I_3(0, a)$). Choose $\epsilon > 0$ sufficiently small so that there are no mutually conjugate points on the interval $[a - 2\epsilon, a]$. Let $v_1^*(t), \ldots, v_d^*(t)$, $0 \le t \le a - \epsilon$, be continuous piecewise $C^2$ curves in $V$ defined as
$$v_k^*(t) = u_k(t) \quad \text{for } 0 \le t \le a - 2\epsilon,\ 1 \le k \le d,$$
$$\frac{d^2}{dt^2} v_k^*(t) + R_t(v_k^*)(t) = 0 \quad \text{for } a - 2\epsilon \le t \le a - \epsilon,\ 1 \le k \le d.$$
(Use Lemma 31.9 to construct the curves in Fig. 7.) Since $\lim_{\epsilon \to 0} I(v_k^*) = I(u_k) < 0$, for $1 \le k \le d$, we see that $I(v_k^*) < 0$ if $\epsilon$ is sufficiently small. By an argument similar to that used in proving Lemma 31.13, we see that $v_1^*, \ldots, v_d^*$ are linearly independent if $\epsilon$ is sufficiently small. This proves (31.21) and finishes the proof of the index theorem itself.
FIGURE 7 (the points $a - 2\epsilon$ and $a - \epsilon$ marked on the $t$-axis)
32
Complex Manifolds and Their Submanifolds
In this chapter we present a short treatment of the local theory of submanifolds of complex manifolds, pointing out the analogy with the theory of submanifolds of affinely connected manifolds presented in Chapter 27, and the relation with the analytical theory of functions of several complex variables. First we must describe how complex manifolds are to be regarded from a differential-geometric point of view. To eliminate confusion, it is desirable to eliminate complex numbers themselves from the definition, to regard complex manifolds from a "real" point of view. (Just as the complex numbers themselves are interpreted as pairs of real numbers, eliminating any mysticism about the square root of $-1$.) Suppose that $M$ is a manifold of dimension $2n$. Consider a coordinate chart from an open subset of $M$ to an open subset of $R^{2n}$. Now, $2n$-dimensional Euclidean space is just $C^n$, the space of $n$ complex variables. We say that $M$ has a complex manifold structure if an atlas of coordinate charts can be chosen so that the transition maps between two charts are defined by complex-analytic functions. We say that a map between two manifolds with such structures is complex-analytic (or holomorphic) if, when referred back to $C^n$ by the coordinate charts, it is defined by complex-analytic functions. (We shall need to consider as known to the reader the elementary facts about holomorphic functions on domains of $C^n$.) Two such structures on the same manifold can be regarded as essentially the same if the identity map is holomorphic. It is important to realize, however, that a given manifold may have many different complex manifold structures and that a manifold need not admit any complex manifold structure. (For example, the $2n$-dimensional spheres, for $n \ne 1$ or $3$, do not admit any. It is not known whether the six-dimensional sphere can admit one.)
Our first aim is to make this remark clearer by actually exhibiting a complex structure as a geometric structure defined by a tensor field on the manifold, just as, say, a Riemannian metric is a structure defined by a tensor field. As a first step in this direction, we describe how the complex analytic structure on $C^n$ itself is determined by a tensor field. Consider $R^{2n}$ as the space of variables $(x_i, y_i)$, with $1 \le i, j, \ldots \le n$. Putting $z_i = x_i + \sqrt{-1}\, y_i$ gives the identification of $R^{2n}$ with $C^n$ that we have in mind; that is, the coordinates of $R^{2n}$ are considered as the real and imaginary parts of the complex variables of $C^n$. Suppose $F = f + \sqrt{-1}\, g$ is a complex-valued function on $R^{2n}$ that is holomorphic. How do we express the holomorphic condition in coordinate-free terms? First, the conditions can be described by the Cauchy-Riemann equations:
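$$\frac{\partial f}{\partial x_i} = \frac{\partial g}{\partial y_i}; \qquad \frac{\partial f}{\partial y_i} = -\frac{\partial g}{\partial x_i}.$$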
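Define an $F(R^{2n})$-linear map, that is, a tensor field, $J: V(R^{2n}) \to V(R^{2n})$ by setting
$$J\left(\frac{\partial}{\partial x_i}\right) = \frac{\partial}{\partial y_i}, \qquad J\left(\frac{\partial}{\partial y_i}\right) = -\frac{\partial}{\partial x_i}.$$
Then the Cauchy-Riemann equation becomes
$$X(f) = J(X)(g) \quad \text{for all } X \in V(R^{2n}). \tag{32.1}$$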
Notice also that
$$J(JX) = -X \quad \text{for all } X \in V(R^{2n}). \tag{32.2}$$
We can now characterize complex-analytic maps $\phi: R^{2n} \to R^{2m}$ by means of the $J$-tensor, namely: For each point $p \in R^{2n}$, each tangent vector $v$ at $p$,
$$\phi_*(J(v)) = J\phi_*(v). \tag{32.3}$$
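The conditions (32.2) and (32.3) can be checked by hand in the lowest-dimensional case. The sketch below is an illustration under stated assumptions: it takes $n = m = 1$, represents $J$ in the basis $(\partial/\partial x, \partial/\partial y)$ by the matrix with $J(\partial/\partial x) = \partial/\partial y$ and $J(\partial/\partial y) = -\partial/\partial x$, and compares the holomorphic map $z \to z^2$ with the non-holomorphic map $z \to \bar{z}$. The maps and helper names are chosen here for illustration and do not come from the text.

```python
def matmul(A, B):
    """Product of two small matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Matrix of J on R^2 = C: columns are J(d/dx) = d/dy and J(d/dy) = -d/dx.
J = [[0, -1],
     [1, 0]]

def jacobian_z_squared(x, y):
    """Real Jacobian of z -> z^2, i.e. (x, y) -> (x^2 - y^2, 2xy)."""
    return [[2 * x, -2 * y],
            [2 * y, 2 * x]]

def jacobian_conjugate(x, y):
    """Real Jacobian of z -> conj(z), i.e. (x, y) -> (x, -y)."""
    return [[1, 0],
            [0, -1]]

def commutes_with_J(D):
    """Condition (32.3): the differential D satisfies D J = J D."""
    return matmul(D, J) == matmul(J, D)

print(matmul(J, J))                                # J^2 = -(identity)
print(commutes_with_J(jacobian_z_squared(3, -2)))  # holomorphic map
print(commutes_with_J(jacobian_conjugate(3, -2)))  # anti-holomorphic map
```

Integer sample points keep the comparison exact; with floating-point entries a tolerance would be needed.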
To prove the characterization (32.3), note that to prove $\phi$ is holomorphic, it suffices to show that $\phi^*(F)$ is holomorphic for every holomorphic function $F$ on $R^{2m}$. However, the characterization of the Cauchy-Riemann equations as (32.1) makes it obvious that (32.3) is this condition. Equation (32.3) tells us that a given complex structure on a manifold $M$ defines a tensor field $J: V(M) \to V(M)$, with $J^2 = -(\text{identity})$. For if the $J$-tensor on $R^{2n}$ is carried over to $M$ by a coordinate chart, then (32.3) implies that $J$ is actually independent of the coordinate chart associated with the complex structure. Now, not every tensor field $J: V(M) \to V(M)$ with $J^2 = -(\text{identity})$ arises in this way from a complex structure: Certain integrability conditions must be satisfied. ($M$ is said to carry an almost complex structure if it merely has such a tensor. The 6-sphere, for example, has such a tensor, which is not integrable.) Such conditions are given by Frohlicher [1], and take the form
$$[X, Y] + J[JX, Y] + J[X, JY] - [JX, JY] = 0 \quad \text{for } X, Y \in V(M). \tag{32.4}$$
The key point is that the left-hand side, as a function of $X$ and $Y$, is $F(M)$-bilinear; hence, defines a genuine tensor field. The verification of this is straightforward and is left to the reader. Having done this, notice that to prove it is zero, it suffices to show that it is zero for a basis of vector fields, for example, the basis $((\partial/\partial x_i), (\partial/\partial y_i))$, which is obvious. Then we can carry over (32.4) to a manifold with a complex structure. It turns out that, conversely, a $J$-tensor satisfying (32.2) and (32.4) arises in this way from a complex structure. If the data are real-analytic, this is not hard to prove; it was done in the paper by Frohlicher [1], for example. If the data are given as only $C^\infty$, it is considerably more difficult to prove, but is true. The first proof is by Newlander and Nirenberg [1]; the theorem was first conjectured by D. C. Spencer, and was recently proved following his ideas by Kohn [1], although it turned out that fundamental new ideas in partial differential equations had to be introduced to accomplish this. At any rate, we shall take as our beginning point that a complex structure is defined on a manifold $M$ by a $J$-tensor satisfying (32.2) and (32.4).
Our main concern in this chapter is with the properties of submanifolds of $M$. First we must consider those submanifolds that themselves are complex manifolds. Let $N$ be a complex-analytic manifold, and let $\phi: N \to M$ be a submanifold map that is also complex-analytic. We shall call this a complex submanifold of $M$. From the characterization of holomorphic maps in terms of the $J$-tensor, we see that $J(\phi_*(N_p)) \subset \phi_*(N_p)$ for all $p \in N$. As usual in this book, from now on let us suppress the explicit notation for the submanifold map. Then the condition for a complex submanifold becomes
$$J(N_p) = N_p \quad \text{for all } p \in N. \tag{32.5}$$
We now have:
THEOREM 32.1 A submanifold $N$ of $M$ is a complex submanifold if and only if (32.5) is satisfied.
Proof. We have already seen that (32.5) is necessary. To prove it is sufficient, notice that (32.5) implies that $N$ itself carries a $J$-tensor obtained by restricting $J$ to $N$. That the integrability conditions (32.4) are satisfied for the $J$-tensor restricted to $N$ is a consequence of the fact that $[X, Y]$ is tangent to $N$ if $X$ and $Y$ are vector fields of $M$ that are tangent to $N$. Q.E.D.
Turn now to consideration of a submanifold $N$ of arbitrary dimension. We want to find a method for describing the "maximal" complex submanifold of $M$ that is contained in $N$. Now if $v$ is a tangent vector of $N$ that is tangent to such a complex submanifold, then $J(v)$ is also tangent to $N$. Let us call a tangent vector to $N$ with this property a holomorphic tangent vector. We can paraphrase Theorem 32.1 by saying that $N$ is a complex submanifold if and only if all tangent vectors are holomorphic. For $p \in N$, let $H_p$ be the subspace of $N_p$ consisting of all holomorphic tangent vectors; that is,
$$H_p = \{v \in N_p : J(v) \in N_p\}. \tag{32.6}$$
Similarly, define $H$ as the following subspace of $V(M)$:
$$H = \{X \in V(M) : X(p) \in H_p \text{ for all } p \in N\}. \tag{32.7}$$
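Definition (32.6) is easy to test on a concrete real hypersurface. The sketch below is an illustration under stated assumptions: $M = C^2 = R^4$ with coordinates $(x_1, y_1, x_2, y_2)$ and the standard $J$, $N$ the unit sphere, so that $N_p$ is the Euclidean orthogonal complement of $p$. The example and the function names are chosen here and are not taken from the text.

```python
def J(v):
    """Standard J on R^4 = C^2 in coordinates (x1, y1, x2, y2).

    J(d/dx_i) = d/dy_i and J(d/dy_i) = -d/dx_i, so J(a, b, c, d) = (-b, a, -d, c).
    """
    x1, y1, x2, y2 = v
    return (-y1, x1, -y2, x2)

def dot(u, v):
    """Euclidean inner product on R^4."""
    return sum(a * b for a, b in zip(u, v))

def is_tangent(p, v):
    """v is tangent to the unit sphere at p exactly when v is orthogonal to p."""
    return dot(p, v) == 0

def is_holomorphic_tangent(p, v):
    """Condition (32.6): v lies in N_p and J(v) lies in N_p as well."""
    return is_tangent(p, v) and is_tangent(p, J(v))

p = (1, 0, 0, 0)  # a point of the unit sphere
for v in [(0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]:
    print(v, is_tangent(p, v), is_holomorphic_tangent(p, v))
```

At this point $N_p$ is three-dimensional while $H_p$ is the plane spanned by the last two basis vectors: the tangent vector $(0, 1, 0, 0)$ fails the test because $J$ sends it to a multiple of the normal direction.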
Now there is a possibility of "singularities" in the field $p \to H_p$ of tangent subspaces; that is, $p \to \dim H_p$ is not necessarily constant on $N$. However, we shall not consider this sort of pathology here; then, also, for $p \in N$,
$$H_p = \{X(p) : X \in H\}. \tag{32.8}$$
We may ask: Is $H_p$ tangent to a complex submanifold of $M$ that is contained in $N$? In order that this be so, we must have $[X, Y] \in H$ for $X, Y \in H$. We can construct a tensor field that "measures" the extent to which this is true: For $X, Y \in H$, set
$$L(X, Y) = J[JX, Y] \text{ projected into } V(M)/H. \tag{32.9}$$
(We have chosen this particular combination to assure convenient symmetry properties of $L(\ ,\ )$.) Now, we can verify that $L(\ ,\ )$ has a tensorial behavior as a function of $X$ and $Y$ (although the term in the right-hand side of (32.9) does not have a tensorial behavior before it is projected). As usual, to verify this, we must show that $L(\ ,\ )$ is $F(M)$-bilinear: For example,
$$L(fX, Y) = J[J(fX), Y] = J[fJ(X), Y] = J(f[JX, Y] - Y(f)JX) = fJ[JX, Y] + Y(f)X,$$
which is equal to $fL(X, Y)$ when the right-hand side is projected mod $H$. Hence $L$ passes to the quotient with respect to the restriction mapping $H \to H_p$, and we get, for each $p \in N$, a bilinear mapping (which we again denote by $L(\ ,\ )$) of $H_p \times H_p \to M_p/N_p$. This field of bilinear mappings is called the Levi form of $N$. Explicitly, then, for $X, Y \in H$, $p \in N$,
$$L(X(p), Y(p)) = J[JX, Y](p) \text{ projected into } M_p/N_p. \tag{32.10}$$
LEMMA 32.2 The Levi form is symmetric.
Proof. This follows from the integrability condition (32.4):
$$J[JX, Y] - J[JY, X] = [Y, X] + [JX, JY].$$
The right-hand side projects into zero when projected mod $N_p$, since $X$, $Y$, $JX$, and $JY$ are all tangent to $N$. The left-hand side, though, is $L(X(p), Y(p)) - L(Y(p), X(p))$.
Let us examine now the consequences of the Levi form vanishing identically.
THEOREM 32.3 If the Levi form vanishes, then the field $p \to H_p$ of tangent subspaces of $T(N)$ is completely integrable. The maximal integral manifolds of this field then define a foliation of $N$ by maximal complex submanifolds. In particular, if $N$ is a hypersurface of $M$ (that is, if $\dim M = \dim N + 1$), then these complex submanifolds of $N$ are hypersurfaces in $N$; hence $N$ may be considered locally as a one-parameter family of complex-analytic hypersurfaces† of $M$. (Such objects are called hyperplanoids in the classical literature.) Conversely, if a real hypersurface of $M$ has this geometric property, then its Levi form vanishes.
Proof. To prove integrability of $p \to H_p$, we must show that $[H, H] \subset H$. If $X, Y \in H$, $L(X, Y) = 0$ if and only if $J[JX, Y]$ is tangent to $N$, hence if and only if $[JX, Y]$ also belongs to $H$. This condition is obviously equivalent to $[H, H] \subset H$. That the maximal integral submanifolds of the field $p \to H_p$ are complex-analytic submanifolds of $M$ is clear from Theorem 32.1, since $J(H_p) = H_p$, and the tangent space to the maximal integral submanifolds is precisely $H_p$. The converse is obvious.
For the rest of this chapter, we shall concentrate on the case where $N$ is a real hypersurface. We can give a more convenient characterization of the hyperplanoids.
THEOREM 32.4 The hyperplanoids that are real-analytic are, locally, precisely the hypersurfaces that can be written as $f = 0$, where $f$ is the real part of a holomorphic function $f + \sqrt{-1}\,g = F$.
Proof. First notice that a hypersurface determined by $f = 0$ can also be written locally as the locus determined by
$$\sqrt{-1}\,F - t = 0, \quad \text{where } t \text{ is a real variable;}$$
that is, the hypersurface is composed of a one-parameter family of complex-analytic hypersurfaces. Conversely, suppose that $A$ is a complex manifold of one complex dimension less than $M$, and that $\phi : A \times R \to M$ is a real-analytic submanifold mapping such that, for fixed $t$, the mapping $p \to \phi(p, t)$ of $A \to M$ is holomorphic. (This is precisely what is meant by a hyperplanoid.) We can suppose without loss in generality that $M$ is $C^n$ itself, and that $A$ is $C^{n-1}$. Since $\phi$ is real-analytic, we can extend $\phi$ to a mapping of $C^{n-1} \times C \to C^n$ by extending $t$ to be a complex variable. The condition that $\phi$ be a submanifold map requires that this extended map of $C^{n-1} \times C \to C^n$ have nonzero Jacobian. Then, by the implicit function theorem, there is (always locally, of course) an inverse holomorphic map $C^n \to C^{n-1} \times C$. Following this map by the projection $C^{n-1} \times C \to C$, we obtain a holomorphic function $F$ on $C^n$, that is, on $M$. The image of $N$ in $M$ is characterized by the condition that $F$ take real values on $N$; that is, $N$ is obtained by setting the imaginary part of $F$, which is the real part of the holomorphic function $-\sqrt{-1}\,F$, equal to zero. Q.E.D.
† There is a constant confusion in the terminology of complex manifold theory between real and complex dimension. A “complex-analytic hypersurface” is a complex submanifold of two less real dimensions.
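As a purely illustrative check of Theorem 32.4, one can write out a single hyperplanoid by hand. The ambient space $C^2$ and the particular holomorphic function $F$ below are our own choices for the sketch, not taken from the text.

```latex
% Illustrative assumptions: M = C^2 with coordinates z_k = x_k + \sqrt{-1}\,y_k,
% and F(z_1, z_2) = z_1 + z_2^2 (a hypothetical choice; dF never vanishes).
\begin{align*}
f &= \operatorname{Re} F = x_1 + x_2^2 - y_2^2, \\
N &= \{\, f = 0 \,\} = \bigcup_{t \in \mathbf{R}} A_t, \qquad
A_t = \{\, (z_1, z_2) : z_1 + z_2^2 = \sqrt{-1}\, t \,\}.
\end{align*}
% Each leaf A_t is the graph z_1 = \sqrt{-1}\,t - z_2^2 of a holomorphic
% function of z_2, hence a complex-analytic hypersurface of C^2 lying in N.
% Since N_p has real dimension 3, H_p has real dimension 2 and must coincide
% with the tangent space of the leaf through p. The field p -> H_p is
% therefore integrable, which is the vanishing of the Levi form.
```

The same computation should go through for any holomorphic $F$ with nonvanishing differential; the quadratic term is included only so that $N$ is not simply a real hyperplane.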
All through this chapter there has been, in the background, an analogy with the theory of submanifolds of affinely connected spaces that was built up in Chapter 27, with the Levi form analogous to the second fundamental form. We can now make this analogy more explicit.
THEOREM 32.5 Suppose that, in addition to the complex structure, $M$ has an affine connection $\nabla$ with zero torsion tensor such that the covariant derivative of the $J$-tensor defining the complex structure is zero. Let $N$ be a submanifold of $M$, let $S(\ ,\ )$ be its second fundamental form with respect to the affine connection, and let $L(\ ,\ )$ be its Levi form with respect to the complex structure. Then
$$L(u, v) = S(u, v) + S(Ju, Jv) \quad \text{for } u, v \in H_p,\ p \in N. \tag{32.11}$$
Proof. The condition that the covariant derivative of the $J$-tensor be zero is explicitly
$$\nabla_X (JY) = J\nabla_X Y \quad \text{for } X, Y \in V(M).$$
The torsion-free condition is $\nabla_X Y - \nabla_Y X = [X, Y]$. Then, for $X, Y \in H$,
$$J[JX, Y] = J(\nabla_{JX} Y - \nabla_Y JX) = \nabla_{JX} JY + \nabla_Y X.$$
Taking the value of both sides at $p \in N$ and projecting mod $N_p$ gives (32.11). Q.E.D.
The simple formula suggests a relation between “affine convexity” and “pseudoconvexity.” We shall discuss this only in case $N$ is a hypersurface, so that $\dim M_p/N_p$ is 1. Identifying it with $R$, recall that $N$ was said to be convex if $S : N_p \times N_p \to R$ kept a fixed sign. Notice then that, by (32.11), $L : H_p \times H_p \to R$ also keeps a fixed sign: This property of $L$ is called “pseudoconvexity.” Notice, for example, that the flat affine connection on $R^{2n}$ satisfies the hypotheses of Theorem 32.5 when $R^{2n}$ is identified with $C^n$, so that a hypersurface of Euclidean space that is convex in the usual sense is pseudoconvex. This analogy can be pursued much further, but this would involve us in another book describing the theory of functions of several complex variables.
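To see formula (32.11) at work, here is a short worked computation for a round sphere in $C^n$. The choice of example, the Euclidean inner product $\langle\ ,\ \rangle$, and the use of the position vector $x$ to represent classes in $M_p/N_p$ are our assumptions; sign conventions for $S$ may differ from those of Chapter 27.

```latex
% Assumptions (ours): N = \{ x \in \mathbf{R}^{2n} = \mathbf{C}^n : \langle x, x \rangle = r^2 \},
% \nabla = flat connection, J = multiplication by \sqrt{-1} (an orthogonal map).
% For X, Y tangent to N we have \langle x, Y \rangle = 0 along N; differentiating along X:
\begin{align*}
0 &= X\langle x, Y \rangle = \langle X, Y \rangle + \langle x, \nabla_X Y \rangle, \\
S(u, v) &\equiv -\frac{\langle u, v \rangle}{r^{2}}\, x \pmod{N_p}, \\
L(u, v) &= S(u, v) + S(Ju, Jv)
 \equiv -\frac{\langle u, v \rangle + \langle Ju, Jv \rangle}{r^{2}}\, x
 = -\frac{2\langle u, v \rangle}{r^{2}}\, x \pmod{N_p},
 \qquad u, v \in H_p .
\end{align*}
% Hence L(u, u) \neq 0 for every nonzero u in H_p: L keeps a fixed sign.
```

Under these assumptions the Levi form of the sphere is definite on $H_p$, which is consistent with the remark that a hypersurface convex in the usual sense is pseudoconvex.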
The last topic we shall touch on is the question of the geometric nature of the “domain of holomorphy” idea. To someone who is familiar only with the classical theory of holomorphic functions of one complex variable, the theory for several variables seems very bewildering, since many of the general principles that are familiar in the one-variable case are completely different in the case of several variables. For example, any domain in $C^1$ is a “domain of holomorphy”; that is, there are holomorphic functions in the domain that cannot be extended to be holomorphic in any larger domain. The situation is completely different in $C^n$ for $n \geq 2$. The simplest example was pointed out by Hartogs, namely: Any function that is holomorphic in the region between two concentric spheres can be extended to a holomorphic function in the interior of the bigger sphere. It turns out that the geometric key to this phenomenon is that the sphere is strongly convex; hence it is also pseudoconvex. (Notice that “pseudoconvexity” for real curves in $C^1$ makes no sense, since $H_p$ must always be zero in this case.) Now the most definitive results along these lines have been proved by Grauert, Kohn, and H. Rossi. We cite Kohn [1] for further details. Again, these involve analytical techniques that transcend the scope of this book; we shall present only a simple remark that gives some intuitive geometric insight into their results.
THEOREM 32.6 Let $N$ be a real hypersurface of a complex manifold $M$. Suppose that the Levi form of $N$ is nonzero at each point of $N$. Let $f$ be a function on $M$ that is the real part of a holomorphic function. Then the derivatives of $f$ at points of $N$ in directions normal to $N$ are determined by derivatives of $f$ in directions tangential to $N$. In particular, $f$ cannot be constant on $N$ unless it is identically constant on $M$.
Proof. Let $X \in H$. By hypothesis, for each $p \in N$ we can choose $X$ so that $L(X(p), X(p))$ is nonzero. Then $J[JX, X]$ is not tangent to $N$ at $p$; hence, also in a certain neighborhood of $p$. Then any vector field $Z$ in a neighborhood of $p$ can, after multiplication by a factor, be written as $J[JX, X] + Y$, where $Y$ is tangent to $N$. Suppose $f + \sqrt{-1}\,g$ is holomorphic on $M$; that is, $f$ and $g$ satisfy (32.1). Then
$$\begin{aligned} Z(f) &= J[JX, X](f) + Y(f) = -[JX, X](g) + Y(f) \\ &= X(JX)(g) - (JX)(X)(g) + Y(f) \\ &= X(X)(f) + (JX)(JX)(f) + Y(f). \end{aligned}$$
The left-hand side involves a derivative of $f$ in a normal direction to $N$, while the right-hand side involves derivatives that are in directions tangent to $N$. The argument can be iterated to show that all normal derivatives can be so expressed. Q.E.D.
33
Mechanics on Riemannian Manifolds
Recall from our short exposition (in Chapter 11) of classical mechanics of particles and waves that many things are closely related to the geometry of Euclidean spaces. Now that we have acquired more experience with Riemannian geometry, it is interesting to extend these ideas to Riemannian spaces. This is more than an academic exercise. Some ideas become simpler when looked at from such a Riemannian standpoint, and some (for example, constrained motion) must inevitably involve Riemannian geometric ideas, even if the classical treatments manage to disguise this point rather well. Arnold has recently made this point [2], and some of our work will follow his ideas.
Newton’s Equations on an Affinely Connected Manifold
Let $M$ be a manifold with an affine connection defined by a covariant-derivative operation $\nabla$. If $\sigma : t \to \sigma(t)$ is a curve in $M$, and if $v : t \to v(t)$ is a vector field on $\sigma$, then $\nabla v(t)$ denotes the covariant derivative of the vector field. Recall how this is defined. Suppose $X$, $Y$ are vector fields on $M$ such that
$$\sigma'(t) = X(\sigma(t)), \qquad v(t) = Y(\sigma(t)).$$
Then
$$\nabla v(t) = \nabla_X Y(\sigma(t)).$$
This immediately enables us to formulate Newton’s equations. Let $F$, a “force field,” be a map: $T(M) \times R \to T(M)$. Newton’s equations with this force field are
$$\nabla m(\sigma'(t)) = F(\sigma'(t), t) \tag{33.1}$$
($m$ is a tensor field on $M$ such that its value at a point $p$ is a linear map: $M_p \to M_p$. It will be called the mass tensor.) D’Alembert’s principle also makes sense on an affinely connected manifold. Let $N$ be a submanifold of $M$, and let $p \to N_p^{\perp}$ be a field of transversal subspaces of $M_p$, defined for $p \in N$, such that
$$M_p = N_p \oplus N_p^{\perp} \quad \text{for } p \in N.$$
Suppose a “particle” with mass tensor $m$ moves under a force law $F$, with the additional condition that it is constrained to be on $N$. D’Alembert’s principle now prescribes that the forces of constraint be in the transversal direction defined by $N^{\perp}$, that is,
$$\nabla m(\sigma'(t)) - F(\sigma'(t), t) \in N^{\perp}_{\sigma(t)}. \tag{33.2}$$
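A minimal worked instance of (33.2) may help fix ideas. The setting below (unit mass, flat $R^3$, a round sphere with the radial line as transversal subspace, and no applied force) is our own illustrative choice.

```latex
% Assumptions (ours): M = \mathbf{R}^3 with the flat connection, m = identity,
% N = \{ |x| = r \}, N^{\perp}_p = \mathbf{R}\,p (the radial line), and F = 0.
% Then (33.2) says that \sigma''(t) is radial:
\begin{align*}
\sigma''(t) &= \lambda(t)\,\sigma(t), \\
\langle \sigma, \sigma \rangle = r^{2}
 \;&\Longrightarrow\; \langle \sigma, \sigma' \rangle = 0
 \;\Longrightarrow\; \langle \sigma, \sigma'' \rangle = -\,|\sigma'|^{2}, \\
\lambda(t)\, r^{2} &= -\,|\sigma'(t)|^{2},
 \qquad\text{so}\qquad
 \sigma'' = -\frac{|\sigma'|^{2}}{r^{2}}\,\sigma , \\
\frac{d}{dt}\,|\sigma'|^{2} &= 2\,\langle \sigma', \sigma'' \rangle
 = 2\lambda\,\langle \sigma', \sigma \rangle = 0 .
\end{align*}
% The speed is constant and the force of constraint is the centripetal force,
% so the particle traverses a great circle uniformly.
```

In this special case D’Alembert’s principle alone recovers the familiar statement that a free particle on a sphere moves along a great circle at constant speed.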
In Chapter 11, we described a method for writing these equations in more explicit form. We can now show how this is done from the affine connection point of view. Recall (Chapter 28) that the induced affine connection, written here $\nabla^N$, is defined on $N$ as follows: If $\sigma : t \to \sigma(t)$ is a curve in $N$ and $v(t)$ is a vector field along $\sigma$ tangent to $N$, then $\nabla^N v(t)$ = projection of $\nabla v(t)$ on $N_{\sigma(t)}$; that is,
$$\nabla v(t) - \nabla^N v(t) = S(\sigma'(t), v(t)) \in N^{\perp}_{\sigma(t)}, \tag{33.3}$$
where $S(\ ,\ )$ is the second fundamental form of the submanifold. Apply this to (33.2): We obtain the equations
$$\nabla^N m(\sigma'(t)) + S(\sigma'(t), m\sigma'(t)) - F \in N^{\perp}_{\sigma(t)};$$
that is,
$$\nabla^N m(\sigma'(t)) = F_{\parallel}(\sigma'(t), t), \tag{33.4}$$
$$\nabla m(\sigma'(t)) - F(\sigma'(t), t) = S(\sigma'(t), m\sigma'(t)) - F_{\perp}(\sigma'(t), t), \tag{33.5}$$
where $F_{\parallel}$ and $F_{\perp}$ are the projections of $F$ tangent to and transversal to $N$. Notice that Newton’s law for the constrained motion, (33.4), is of the same form as Newton’s law on $M$.
Newton’s Law of Motion and Killing Vector Fields
Suppose that $M$ is a manifold with a Riemannian metric. In fact, we do not need to assume that the metric is positive-definite. Let $\langle\ ,\ \rangle$ be the inner product on vector fields defining the metric, and let $\nabla$ be the Levi-Civita affine connection associated with the metric. Consider a force field $F$ and a particle moving along a curve $t \to \sigma(t)$ according to Newton’s law:
$$\nabla(\sigma'(t)) = F(\sigma'(t), t). \tag{33.6}$$
(For simplicity, we assume that the mass tensor is the identity. It can always be absorbed in $F$.) Suppose $X$ is a vector field on $M$. Define a function $f_X$ on $T(M)$ as follows:
$$f_X(v) = \langle v, X(p) \rangle \quad \text{for } p \in M,\ v \in M_p.$$
We want to investigate whether $f_X$ is a “conserved quantity,” that is, whether $(d/dt) f_X(\sigma'(t)) = 0$, where $\sigma(t)$ satisfies (33.6). In fact we have
$$\frac{d}{dt} f_X(\sigma'(t)) = \langle \nabla\sigma'(t), X \rangle + \langle \sigma', \nabla_{\sigma'} X \rangle = \langle F(\sigma'(t), t), X \rangle + \langle \sigma', \nabla_{\sigma'} X \rangle.$$
Suppose that $X$ is a Killing vector field, that is, is the infinitesimal generator of a one-parameter group of isometries of $M$. Then we know (Chapter 28) that the condition for this is that
$$\langle v, \nabla_v X \rangle = 0 \quad \text{for all } v \in T(M).$$
Hence we see that
$$\frac{d}{dt} f_X(\sigma'(t)) = \langle F(\sigma'(t), t), X(\sigma(t)) \rangle. \tag{33.7}$$
This is a remarkably simple formula that accounts for the comparative simplicity of the equations of motion of those mechanical systems whose configuration space admits a transitive group of motions.
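As a sanity check on (33.7), consider the most familiar case. The example below (Euclidean $R^3$, the rotation field about the third axis, and a central force with a hypothetical radial profile $\varphi$) is our own illustration rather than part of the text.

```latex
% Assumptions (ours): M = \mathbf{R}^3 with the Euclidean metric,
% \sigma(t) = (x, y, z), and X = x\,\partial_y - y\,\partial_x (rotation about the z-axis).
\begin{align*}
\nabla_v X &= v_x\,\partial_y - v_y\,\partial_x ,
 \qquad \langle v, \nabla_v X \rangle = v_y v_x - v_x v_y = 0
 \quad (\text{so } X \text{ is a Killing field}), \\
f_X(\sigma') &= x\,\dot y - y\,\dot x
 \quad (\text{the third component of angular momentum}), \\
\frac{d}{dt}\, f_X(\sigma') &= x\,\ddot y - y\,\ddot x
 = x F_y - y F_x = \langle F, X(\sigma) \rangle .
\end{align*}
% For a central force F = \varphi(|\sigma|)\,\sigma the right-hand side is
% \varphi\,(x y - y x) = 0, so f_X is conserved.
```

Here (33.7) reduces to the statement that the rate of change of angular momentum is the torque, which vanishes for a central force.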
Newton’s Equation of Motion on a Lie Group; Euler’s Equations of Rigid Body Motion
As we saw in Chapter 14, if a Lie group $G$ acts transitively and simply on the configuration space of a system of particles, Newton’s equations of motion take the form
$$\nabla\sigma'(t) = F(\sigma'(t), t), \tag{33.8}$$
where $\nabla$ is the affine connection defined by a right-invariant metric $\langle\ ,\ \rangle$ on $G$. Consider $\mathbf{G}$, the Lie algebra of $G$, as the subalgebra of $V(G)$ consisting of the right-invariant vector fields. Then
$$\langle X, Y \rangle = \text{constant} \quad \text{for } X, Y \in \mathbf{G}. \tag{33.9}$$
Suppose $X : t \to X(t)$ is a curve in $\mathbf{G}$ such that $X(t)(\sigma(t)) = \sigma'(t)$ for all $t$, where $t \to \sigma(t)$ is the solution of (33.8). Then
$$\nabla_{\sigma'(t)}(X(t)) = F(\sigma'(t), t).$$
But the left-hand side equals
$$\frac{dX}{dt}(\sigma(t)) + \nabla_X X.$$
Now, for $Y \in \mathbf{G}$, we have, using (20.2),
$$\langle \nabla_X X, Y \rangle = X(\langle X, Y \rangle) - \tfrac{1}{2}\, Y(\langle X, X \rangle) + \langle X, [Y, X] \rangle = \langle X, [Y, X] \rangle,$$
by (33.9). Suppose now that $B(X, Y)$ is a nondegenerate, symmetric, bilinear form on $\mathbf{G}$ that is invariant under the adjoint representation; that is,
$$B([Z, X], Y) + B(X, [Z, Y]) = 0 \quad \text{for } X, Y, Z \in \mathbf{G}.$$
(Such a form, the Killing form, exists if $G$ is semisimple, a condition that is adequate for our purposes. See Helgason [1] or Hermann [8].) Let $A$ be the symmetric linear transformation: $\mathbf{G} \to \mathbf{G}$ such that
$$\langle X, Y \rangle = B(AX, Y) \quad \text{for } X, Y \in \mathbf{G}.$$
Then we have
$$\left\langle \frac{dX}{dt}, Y \right\rangle + \langle X, [Y, X] \rangle = \langle F, Y \rangle$$
or
$$B\!\left(A\frac{dX}{dt}, Y\right) - B([AX, X], Y) = B(AF, Y).$$
Since this holds for all $Y \in \mathbf{G}$, we have
$$A\frac{dX}{dt} = [AX, X] + AF$$
or
$$\frac{dX}{dt} = A^{-1}[AX, X] + F. \tag{33.10}$$
In particular, if $A$ = identity, that is, if the metric $\langle\ ,\ \rangle$ is left-invariant also, then $dX/dt = F$. The equations are then (for the case $G = SO(3, R)$) equivalent to Euler’s equations (16.14). In the force-free case, $F = 0$, they determine the geodesics of the right-invariant metric on $G$. (Arnold has particularly pointed out [2] the importance of these equations for problems in point and fluid mechanics.)
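The following short derivation sketches what (33.10) implies in the force-free case. The two conservation laws use only the invariance and symmetry of $B$ and the symmetry of $A$; the component form at the end additionally assumes, as our own normalization, that the Lie algebra of $SO(3, R)$ is identified with $R^3$ so that the bracket is the cross product and $A = \mathrm{diag}(I_1, I_2, I_3)$. With other sign conventions for the bracket of invariant vector fields the right-hand sides may change sign.

```latex
% Force-free case of (33.10): A\,\dot X = [AX, X].
\begin{align*}
\frac{d}{dt}\, B(AX, X) &= 2\,B(A\dot X, X) = 2\,B([AX, X], X) = 0, \\
\frac{d}{dt}\, B(AX, AX) &= 2\,B(A\dot X, AX) = 2\,B([AX, X], AX) = 0 .
\end{align*}
% Both vanish by invariance of B: taking Z = AX gives
%   B([AX, X], X) + B(X, [AX, X]) = 0,  hence  B([AX, X], X) = 0,
% and taking Z = X gives
%   B([X, AX], AX) + B(AX, [X, AX]) = 0,  hence  B([X, AX], AX) = 0.
%
% Component form under the stated SO(3) assumptions ([X, Y] = X \times Y):
\begin{align*}
I_1\,\dot x_1 &= (I_2 - I_3)\, x_2 x_3, \\
I_2\,\dot x_2 &= (I_3 - I_1)\, x_3 x_1, \\
I_3\,\dot x_3 &= (I_1 - I_2)\, x_1 x_2 .
\end{align*}
```

Under these assumptions $B(AX, X) = \langle X, X \rangle$ plays the role of (twice) the kinetic energy and $B(AX, AX)$ that of the squared angular momentum, the two classical integrals of the free rigid body.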
Bibliography

ABRAHAM, R.
[1] “Foundations of Mechanics.” Benjamin, New York, 1967.
AMBROSE, W.
[1] The Cartan structural equations in classical Riemannian geometry. J. Indian Math. Soc. 24, 23-76 (1960).
ARNOLD, V.
[1] Small denominators and problems of stability of motion in classical and celestial mechanics. Russian Math. Surveys 18, 85-192 (1963).
[2] Sur la géométrie différentielle des groupes de Lie de dimension infinie et ses applications à l’hydrodynamique des fluides parfaits. Ann. Inst. Grenoble 16, 319-361 (1966).
AUSLANDER, L., and MACKENZIE, R. E.
[1] “Introduction to Differentiable Manifolds.” McGraw-Hill, New York, 1963.
BISHOP, R., and CRITTENDEN, R.
[1] “Geometry of Manifolds.” Academic Press, New York, 1964.
BLISS, G. A.
[1] The problem of Mayer with variable end points. Trans. Am. Math. Soc. 19, 305-314 (1918).
[2] “Lectures on the Calculus of Variations.” Univ. of Chicago Press, Chicago, Illinois, 1946.
BRILLOUIN, L.
[1] “Tensors in Mechanics and Elasticity.” Academic Press, New York, 1964.
CARATHEODORY, C.
[1] Untersuchungen über die Grundlagen der Thermodynamik. Math. Ann. 67, 355-386 (1909).
[2] “Variationsrechnung.” Teubner, Leipzig, 1935.
CARTAN, E.
[1] “Leçons sur les Invariants Intégraux.” Hermann, Paris, 1922.
[2] “Géométrie des Espaces de Riemann.” Gauthier-Villars, Paris, 1952.
[3] “Leçons sur la Méthode de la Repère Mobile.” Gauthier-Villars, Paris, 1936.
CHERN, S. S., and KUIPER, N.
[1] Some theorems on the isometric imbedding of compact Riemannian manifolds in Euclidean space. Ann. Math. 56, 422-430 (1952).
CHEVALLEY, C.
[1] “Lie Groups.” Princeton Univ. Press, Princeton, New Jersey, 1946.
CHOW, W. L.
[1] Über Systeme von linearen partiellen differential Gleichungen. Math. Ann. 117, 89-105 (1940).
COURANT, R., and HILBERT, D.
[1] “Methods of Mathematical Physics,” Vol. II. Wiley (Interscience), New York, 1962.
EINSTEIN, A.
[1] “The Meaning of Relativity.” Princeton Univ. Press, Princeton, New Jersey, 1950.
FEDERER, H.
[1] Curvature measures. Trans. Am. Math. Soc. 93, 418-491 (1959).
FLANDERS, H.
[1] “Differential Forms, with Application to the Physical Sciences.” Academic Press, New York, 1963.
FRÖLICHER, A.
[1] Zur Differentialgeometrie der komplexen Strukturen. Math. Ann. 129, 50-95 (1955).
GARABEDIAN, P.
[1] “Partial Differential Equations.” Wiley, New York, 1964.
GELFAND, I. M., and FOMIN, S.
[1] “Calculus of Variations.” Prentice Hall, Englewood Cliffs, New Jersey, 1963.
GELFAND, I. M., and SILOV, G. E.
[1] “Generalized Functions.” Academic Press, New York, 1964.
GOLDSTEIN, H.
[1] “Classical Mechanics.” Addison-Wesley, Reading, Massachusetts, 1951.
GOLUBEV, V. V.
[1] “Lectures on Integration of the Equations of Motion of a Rigid Body about a Fixed Point.” Off. of Tech. Serv., U. S. Dept. of Commerce, Washington, D. C., 1960.
HALKIN, H.
[1] The principle of optimal evolution. In Intern. Symp. on Nonlinear Differential Equations and Nonlinear Mechanics, 1961 (J. P. LaSalle and S. Lefschetz, eds.), pp. 184-302. Academic Press, New York, 1963.
HARTMAN, P., and NIRENBERG, L.
[1] On spherical image maps whose Jacobians do not change sign. Am. J. Math. 81, 901-920 (1959).
HELGASON, S.
[1] “Differential Geometry and Symmetric Spaces.” Academic Press, New York, 1962.
HERMANN, R.
[1] On geodesics that are also orbits. Bull. Am. Math. Soc. 66, 91-93 (1960).
[2] On the accessibility problem in control theory. In Intern. Symp. on Nonlinear Differential Equations and Nonlinear Mechanics, 1961 (J. P. LaSalle and S. Lefschetz, eds.), pp. 325-332. Academic Press, New York, 1963.
[3] Convexity and pseudoconvexity for complex manifolds. J. Math. Mech. 13, 243-248 (1964).
[4] Second variation for variational problems in canonical form. Bull. Am. Math. Soc. 71, 145-149 (1965).
[5] Second variation for minimal submanifolds. J. Math. Mech. 16, 473-492 (1966).
[6] Remarks on the foundations of integral geometry. Rend. Circ. Mat. Palermo 9, 91-96 (1960).
[7] Some differential-geometric aspects of the Lagrange variational problem. Illinois J. Math. 6, 634-673 (1962).
[8] “Lie Groups for Physicists.” Benjamin, New York, 1966.
[9] Equivalence invariants for submanifolds of homogeneous spaces. Math. Ann. 158, 284-289 (1965).
KLINGENBERG, W.
[1] Zur affinen Differentialgeometrie. Math. Z. 54, 65-80 (1951).
KOBAYASHI, S., and NOMIZU, K.
[1] “Foundations of Differential Geometry.” Wiley (Interscience), New York, 1963.
KOHN, J. J.
[1] Harmonic integrals on strongly pseudoconvex manifolds. Ann. Math. 78, 112-148 (1963).
LEVI-CIVITA, T.
[1] “The Absolute Differential Calculus.” Blackie, London, 1928.
[2] “Caractéristiques des Systèmes Différentielles.” F. Alcan, Paris, 1932.
LICHNEROWICZ, A.
[1] “Théories Relativistes de la Gravitation.” Masson, Paris, 1955.
MASSEY, W. S.
[1] Surfaces of Gaussian curvature zero in Euclidean 3-space. Tôhoku Math. J. 14, 73-79 (1962).
MILNOR, J.
[1] “Morse Theory.” Princeton Univ. Press, Princeton, New Jersey, 1963.
MORSE, M.
[1] “Calculus of Variations in the Large.” Am. Math. Soc., Providence, Rhode Island, 1935.
MUNKRES, J.
[1] “Elementary Differential Topology.” Princeton Univ. Press, Princeton, New Jersey, 1963.
NEWLANDER, A., and NIRENBERG, L.
[1] Complex analytic coordinates in almost complex manifolds. Ann. Math. 65, 391-404 (1957).
NOMIZU, K., and OZEKI, H.
[1] The existence of complete Riemannian metrics. Proc. Am. Math. Soc. 12, 889-891 (1961).
OTSUKI, K.
[1] On the existence of solutions of a system of quadratic equations. Proc. Japan Acad. 29, 99-100 (1953).
PONTRJAGIN, L.
[1] “The Mathematical Theory of Optimal Processes.” Wiley (Interscience), New York, 1962.
PRAGER, W.
[1] “Introduction to the Mechanics of Continua.” Ginn, Boston, 1961.
RADON, J.
[1] Zum Problem von Lagrange. Hamburg Math. Einzelschriften, Number 2. Teubner, Leipzig, 1928.
ROXIN, E.
[1] A geometric interpretation of Pontrjagin’s maximal principle. In Intern. Symp. on Nonlinear Differential Equations and Nonlinear Mechanics, 1961 (J. P. LaSalle and S. Lefschetz, eds.). Academic Press, New York, 1963.
SPIVAK, M.
[1] “Calculus on Manifolds.” Benjamin, New York, 1965.
STERNBERG, S.
[1] “Lectures on Differential Geometry.” Prentice Hall, Englewood Cliffs, New Jersey, 1964.
TRICOMI, F.
[1] “Differential Equations.” Hafner Publ. Co., New York, 1961.
WHITTAKER, E. T., and WATSON, G. N.
[1] “A Course of Modern Analysis.” Cambridge Univ. Press, London and New York, 1940.
WHITTAKER, E. T.
[1] “A Treatise on Analytical Dynamics of Particles and Rigid Bodies.” Cambridge Univ. Press, London and New York, 1959.
YOSIDA, K.
[1] “Introduction to Functional Analysis.” Springer, Berlin, 1965.
Mathematics in Science and Engineering A
Series of Monographs and
Textbooks
Edited by RICHARD BELLMAN, University of S o u t h California
1. TRACY Y. THOMAS. Concepts from Tensor Analysis and Differential Geometry. Second Edition. 1965
2. TRACY Y. THOMAS. Plastic Flow and Fracture in Solids. 1961
3. RUTHERFORD ARIS. The Optimal Design of Chemical Reactors: A Study in Dynamic Programming. 1961
4. JOSEPH LASALLE and SOLOMON LEFSCHETZ. Stability by Liapunov's Direct Method with Applications. 1961
5. GEORGE LEITMANN (ed.). Optimization Techniques: With Applications to Aerospace Systems. 1962
6. RICHARD BELLMAN and KENNETH L. COOKE. Differential-Difference Equations. 1963
7. FRANK A. HAIGHT. Mathematical Theories of Traffic Flow. 1963
8. F. V. ATKINSON. Discrete and Continuous Boundary Problems. 1964
9. A. JEFFREY and T. TANIUTI. Non-Linear Wave Propagation: With Applications to Physics and Magnetohydrodynamics. 1964
10. JULIUS T. TOU. Optimum Design of Digital Control Systems. 1963
11. HARLEY FLANDERS. Differential Forms: With Applications to the Physical Sciences. 1963
12. SANFORD M. ROBERTS. Dynamic Programming in Chemical Engineering and Process Control. 1964
13. SOLOMON LEFSCHETZ. Stability of Nonlinear Control Systems. 1965
14. DIMITRIS N. CHORAFAS. Systems and Simulation. 1965
15. A. A. PERVOZVANSKII. Random Processes in Nonlinear Control Systems. 1965
16. MARSHALL C. PEASE, III. Methods of Matrix Algebra. 1965
17. V. E. BENES. Mathematical Theory of Connecting Networks and Telephone Traffic. 1965
18. WILLIAM F. AMES. Nonlinear Partial Differential Equations in Engineering. 1965
19. J. ACZEL. Lectures on Functional Equations and Their Applications. 1966
20. R. E. MURPHY. Adaptive Processes in Economic Systems. 1965
21. S. E. DREYFUS. Dynamic Programming and the Calculus of Variations. 1965
22. A. A. FEL'DBAUM. Optimal Control Systems. 1965
23. A. HALANAY. Differential Equations: Stability, Oscillations, Time Lags. 1966
24. M. NAMIK OGUZTORELI. Time-Lag Control Systems. 1966
25. DAVID SWORDER. Optimal Adaptive Control Systems. 1966
26. MILTON ASH. Optimal Shutdown Control of Nuclear Reactors. 1966
27. DIMITRIS N. CHORAFAS. Control System Functions and Programming Approaches (In Two Volumes). 1966
28. N. P. ERUGIN. Linear Systems of Ordinary Differential Equations. 1966
29. SOLOMON MARCUS. Algebraic Linguistics; Analytical Models. 1967
30. A. M. LIAPUNOV. Stability of Motion. 1966
31. GEORGE LEITMANN (ed.). Topics in Optimization. 1967
32. MASANAO AOKI. Optimization of Stochastic Systems. 1967
33. HAROLD J. KUSHNER. Stochastic Stability and Control. 1967
34. MINORU URABE. Nonlinear Autonomous Oscillations. 1967
35. F. CALOGERO. Variable Phase Approach to Potential Scattering. 1967
36. A. KAUFMANN. Graphs, Dynamic Programming, and Finite Games. 1967
37. A. KAUFMANN and R. CRUON. Dynamic Programming: Sequential Scientific Management. 1967
38. J. H. AHLBERG, E. N. NILSON, and J. L. WALSH. The Theory of Splines and Their Applications. 1967
39. Y. SAWARAGI, Y. SUNAHARA, and T. NAKAMIZO. Statistical Decision Theory in Adaptive Control Systems. 1967
40. RICHARD BELLMAN. Introduction to the Mathematical Theory of Control Processes Volume I. 1967 (Volumes II and III in preparation)
41. E. STANLEY LEE. Quasilinearization and Invariant Imbedding. 1968
42. WILLIAM AMES. Nonlinear Ordinary Differential Equations in Transport Processes. 1968
43. WILLARD MILLER, JR. Lie Theory and Special Functions. 1968
44. PAUL B. BAILEY, LAWRENCE F. SHAMPINE, and PAUL E. WALTMAN. Nonlinear Two Point Boundary Value Problems. 1968
45. IU. P. PETROV. Variational Methods in Optimum Control Theory. 1968
46. O. A. LADYZHENSKAYA and N. N. URAL'TSEVA. Linear and Quasilinear Elliptic Equations. 1968
47. A. KAUFMANN and R. FAURE. Introduction to Operations Research. 1968
48. C. A. SWANSON. Comparison and Oscillation Theory of Linear Differential Equations. 1968
49. ROBERT HERMANN. Differential Geometry and the Calculus of Variations. 1968
50. N. K. JAISWAL. Priority Queues. 1968
In preparation
ROBERT P. GILBERT. Function Theoretic Methods in the Theory of Partial Differential Equations
YUDELL LUKE. The Special Functions and Their Approximations (In Two Volumes)
HUKUKANE NIKAIDO. Convex Structures and Economic Theory
V. LAKSHMIKANTHAM and S. LEELA. Differential and Integral Inequalities
KING-SUN FU. Sequential Methods in Pattern Recognition and Machine Learning