(t) = 0 if io
0 io
(p (t)=0
if
\t0-t\>6 |*o - * | < 6
fori^io,
t€{l,...,d}.
For this choice of ?, however pb
pto+6
/ h(t)
$$Q(\eta) := \int_a^b \phi(t,\eta(t),\dot\eta(t))\,dt \to \min \quad \text{among all } \eta \in D_0^1(I,\mathbb{R}^d). \tag{1.3.8}$$
If $u$ satisfies the assumptions of Theorem 1.3.1, then
$$Q(\eta) \ge 0 \quad \text{for all } \eta \in D_0^1(I,\mathbb{R}^d), \tag{1.3.9}$$
and hence $\eta = 0$ is a trivial solution of (1.3.8). We are interested in the question whether there are others. The Euler-Lagrange equations for (1.3.8) are
$$\frac{d}{dt}\phi_\pi(t,\eta(t),\dot\eta(t)) = \phi_\eta(t,\eta(t),\dot\eta(t)), \tag{1.3.10}$$
The classical theory
i.e.
$$\frac{d}{dt}\Bigl(F_{pp}(t,u(t),\dot u(t))\,\dot\eta(t) + F_{pu}(t,u(t),\dot u(t))\,\eta(t)\Bigr) = F_{up}(t,u(t),\dot u(t))\,\dot\eta(t) + F_{uu}(t,u(t),\dot u(t))\,\eta(t). \tag{1.3.11}$$
Since $u$ is considered as given, our first observation is that (1.3.11) is a linear homogeneous system of second order equations for the unknown $\eta$. These equations are called Jacobi equations.
Definition 1.3.1. A solution $\eta \in C^2(I,\mathbb{R}^d)$ of the Jacobi equations (1.3.11) is called a Jacobi field along $u(t)$.
L e m m a 1.3.1. Let F £ C3(I x Rd x K d ,R), detF p p (t,u(t),6(t)) + 0 for all t £ I, u £ C2(I,Rd). Then any solution of rj £ AC0(I,Rd), d 2 6Q(ri,
™(t,ri(t),fi(t)) = Fpp(t,u(t),u(t)) (7«), iF(a?i,0i),J 2 := (*i)ID- (2-4.11) By (2.4.10), (2.4.11), ||v>(x) - v»(x,)t| < WtfhW \\x - and + ^ ||x - xi|| + i | M x ) - ^(xi)||, hence \\ (a?) - ^(a?i) = -l^hix 1 . 0 Jo Jo However, F(0) = 1 > 0 = lim R continuous with Co M P < f{v) < C\ |t>|p + C2 for some constants Co, CI, c2. (u, t)). min(e 0 , - ) u i ? ^ 2
for all t and rj
and so the assumption $\det F_{pp}(t,u(t),\dot u(t)) \ne 0$, which is seemingly weaker than the one of Theorem 1.2.3, indeed suffices to apply that theorem. q.e.d.
We now derive the so-called necessary Legendre condition:
Theorem 1.3.2. Under the assumption of Theorem 1.3.1, i.e. $u \in D^1(I,\mathbb{R}^d)$ minimizes $I$ in the sense described there, we have that $F_{pp}(t,u(t),\dot u(t))$ is positive semidefinite for all $t \in I$, i.e.
$$F_{p^ip^j}(t,u(t),\dot u(t))\,\xi^i\xi^j \ge 0 \quad \text{for all } \xi = (\xi^1,\dots,\xi^d) \in \mathbb{R}^d.$$
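(At points where $\dot u(t)$ is discontinuous, this holds for the left and right derivatives.)
Proof. We may assume that $t_0 \in \mathring{I}$ and that $\dot u$ is continuous at $t_0$. The result at the points where $\dot u$ jumps then follows by taking appropriate limits, and likewise at $t_0 = a, b$. We then consider $0 < \epsilon < \min(t_0 - a, b - t_0)$ and define $\eta \in D_0^1(I,\mathbb{R}^d)$ by
$$\eta(t) = \begin{cases} 0 & \text{for } a \le t \le t_0 - \epsilon \text{ and } t_0 + \epsilon \le t \le b,\\ \epsilon\xi & \text{for } t = t_0,\\ \text{linear} & \text{for } t_0 - \epsilon \le t \le t_0 \text{ and for } t_0 \le t \le t_0 + \epsilon,\end{cases}$$
for given $\xi \in \mathbb{R}^d$. Then
$$\dot\eta(t) = \begin{cases} 0 & \text{for } a < t < t_0 - \epsilon \text{ or } t_0 + \epsilon < t < b,\\ \xi & \text{for } t_0 - \epsilon < t < t_0,\\ -\xi & \text{for } t_0 < t < t_0 + \epsilon.\end{cases}$$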
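We apply Theorem 1.3.1 to obtain
$$0 \le \delta^2 I(u,\eta) = \int_{t_0-\epsilon}^{t_0+\epsilon} F_{p^ip^j}(t,u(t),\dot u(t))\,\xi^i\xi^j\,dt + O(\epsilon^2) \quad \text{for } \epsilon \to 0,$$
since all other terms contain a factor $\epsilon$, and we integrate over an interval of length $2\epsilon$. Hence
$$F_{p^ip^j}(t_0,u(t_0),\dot u(t_0))\,\xi^i\xi^j = \lim_{\epsilon \to 0}\frac{1}{2\epsilon}\int_{t_0-\epsilon}^{t_0+\epsilon} F_{p^ip^j}(t,u(t),\dot u(t))\,\xi^i\xi^j\,dt \ge 0.$$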
q.e.d.
The Jacobi equations and the notion of Jacobi fields are meaningful for arbitrary solutions of the Euler-Lagrange equations, not only for minimizing ones. In fact, Jacobi fields are solutions of the linearized Euler-Lagrange equations. Namely:
Theorem 1.3.3. Let $F \in C^3(I \times \mathbb{R}^d \times \mathbb{R}^d,\mathbb{R})$, and let $u_s(t)$ be a family of $C^2$-solutions of the Euler-Lagrange equations
$$\frac{d}{dt}F_p(t,u_s(t),\dot u_s(t)) - F_u(t,u_s(t),\dot u_s(t)) = 0, \tag{1.3.12}$$
with $u_s$ depending differentiably on a parameter $s \in (-\epsilon,\epsilon)$. Then
$$\eta(t) := \frac{\partial}{\partial s}u_s(t)\Big|_{s=0}$$
is a Jacobi field along $u = u_0$.
Proof. We differentiate (1.3.12) w.r.t. $s$ at $s = 0$ to obtain
$$\frac{d}{dt}\Bigl(F_{pp}(t,u(t),\dot u(t))\,\dot\eta(t) + F_{pu}(t,u(t),\dot u(t))\,\eta(t)\Bigr) - F_{up}(t,u(t),\dot u(t))\,\dot\eta(t) - F_{uu}(t,u(t),\dot u(t))\,\eta(t) = 0,$$
i.e. the Jacobi equation (1.3.11).
q.e.d.
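Theorem 1.3.3 can be checked numerically on a concrete integrand. The following Python sketch uses the pendulum integrand $F(t,u,p) = \frac12 p^2 + \cos u$, which is an example chosen here and not taken from the text. Its Euler-Lagrange equation is $\ddot u = -\sin u$, and its Jacobi equation along $u$ is $\ddot\eta = -\cos(u)\,\eta$. The sketch compares a difference quotient of the family $u_s$ with $u_s(0) = 0$, $\dot u_s(0) = 1 + s$ against the solution of the Jacobi equation with $\eta(0) = 0$, $\dot\eta(0) = 1$. The helper `rk4` and all step sizes are illustrative assumptions.

```python
import math

def rk4(rhs, state, t_end, n_steps):
    """Integrate state' = rhs(state) from t = 0 to t_end with classical RK4."""
    h = t_end / n_steps
    y = list(state)
    for _ in range(n_steps):
        k1 = rhs(y)
        k2 = rhs([a + 0.5 * h * b for a, b in zip(y, k1)])
        k3 = rhs([a + 0.5 * h * b for a, b in zip(y, k2)])
        k4 = rhs([a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y

def pendulum(y):
    # Euler-Lagrange equation u'' = -sin(u) as a first order system
    u, du = y
    return [du, -math.sin(u)]

def pendulum_with_jacobi(y):
    # Euler-Lagrange equation together with the Jacobi equation eta'' = -cos(u) eta
    u, du, eta, deta = y
    return [du, -math.sin(u), deta, -math.cos(u) * eta]

T, N, ds = 2.0, 2000, 1e-4

# central difference quotient of the family u_s in the parameter s at s = 0
u_plus = rk4(pendulum, [0.0, 1.0 + ds], T, N)[0]
u_minus = rk4(pendulum, [0.0, 1.0 - ds], T, N)[0]
eta_fd = (u_plus - u_minus) / (2 * ds)

# Jacobi field along u_0 with the matching initial values
eta_jacobi = rk4(pendulum_with_jacobi, [0.0, 1.0, 0.0, 1.0], T, N)[2]

print(eta_fd, eta_jacobi)
```

The two printed numbers should agree up to discretization error, which is the content of the theorem for this particular family.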
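Lemma 1.3.2. Let $a \le a_1 < a_2 \le b$, and let $\eta$ be a Jacobi field on $[a_1,a_2]$ with $\eta(a_1) = 0 = \eta(a_2)$. Then
$$\int_{a_1}^{a_2}\phi(t,\eta(t),\dot\eta(t))\,dt = 0. \tag{1.3.13}$$
Proof. Since $\phi(t,\eta,\pi)$ is homogeneous of degree 2 in $(\eta,\pi)$, we have $2\phi = \phi_\eta \cdot \eta + \phi_\pi \cdot \pi$. Therefore, integrating by parts and using $\eta(a_1) = 0 = \eta(a_2)$,
$$2\int_{a_1}^{a_2}\phi(t,\eta,\dot\eta)\,dt = \int_{a_1}^{a_2}\bigl(\phi_\eta \cdot \eta + \phi_\pi \cdot \dot\eta\bigr)\,dt = \int_{a_1}^{a_2}\Bigl(\phi_\eta(t,\eta,\dot\eta) - \frac{d}{dt}\phi_\pi(t,\eta,\dot\eta)\Bigr)\cdot\eta\,dt. \tag{1.3.14}$$
Comparing (1.3.10) and (1.3.11), we see that the right-hand side vanishes, since $\eta$ is a Jacobi field.
q.e.d.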
As before, let $F$ be of class $C^3$, and let $u(t)$ be a solution of class $C^2$ on $[a,b]$ of the Euler-Lagrange equations
$$\frac{d}{dt}F_p(t,u(t),\dot u(t)) - F_u(t,u(t),\dot u(t)) = 0.$$
Definition 1.3.2. Let $a \le a_1 < a_2 \le b$. We call the parameter value $a_2$ conjugate to $a_1$ and the point $(a_2,u(a_2))$ conjugate to $(a_1,u(a_1))$ if there exists a not identically vanishing Jacobi field $\eta$ on $[a_1,a_2]$ with $\eta(a_1) = 0 = \eta(a_2)$.
We may derive the important result of Jacobi:
Theorem 1.3.4. Let $F \in C^3(I \times \mathbb{R}^d \times \mathbb{R}^d,\mathbb{R})$ and suppose $u \in C^2(I,\mathbb{R}^d)$. Suppose that $F_{pp}(t,u(t),\dot u(t))$ is positive definite on $I$. If there exists $a^*$ with $a < a^* < b$ that is conjugate to $a$, then $u$ cannot be a local minimum of $I$. More precisely, for any $\epsilon > 0$, there exists $v \in D^1(I,\mathbb{R}^d)$ with $v(a) = u(a)$, $v(b) = u(b)$,
$$\sup_{t \in I}\bigl(|u(t) - v(t)| + |\dot u(t) - \dot v(t)|\bigr) < \epsilon$$
and $I(v) < I(u)$.
Proof. Let $\eta(t)$ be a nontrivial Jacobi field on $[a,a^*]$ with $\eta(a) = 0 = \eta(a^*)$. We put
$$\eta^*(t) := \begin{cases}\eta(t) & \text{for } a \le t \le a^*,\\ 0 & \text{for } a^* \le t \le b.\end{cases}$$
Then $\eta^* \in D_0^1(I,\mathbb{R}^d)$, and by Lemma 1.3.2
$$Q(\eta^*) = \int_a^{a^*}\phi(t,\eta(t),\dot\eta(t))\,dt = 0.$$
If $u$ were a local minimum, then by Theorem 1.3.1
$$0 \le \delta^2 I(u,\eta) = Q(\eta) \quad \text{for all } \eta \in D_0^1(I,\mathbb{R}^d).$$
Hence $\eta^*$ would be a minimizer of $Q$, hence by Lemma 1.3.1 $\eta^* \in C^2(I,\mathbb{R}^d)$. Since $\eta^*(t) = 0$ for $t \ge a^*$, then $\dot\eta^*(a^*) = 0$. Since also $\eta^*(a^*) = 0$, and since $\eta^*$ solves the Jacobi equation, a (linear) second order ordinary differential equation, the uniqueness theorem for solutions of such equations implies $\eta^* \equiv 0$,
a contradiction, because by assumption $\eta$ does not vanish identically. Hence $u$ cannot be a local minimizer. q.e.d.
In words, Theorem 1.3.4 says that a solution of the Euler-Lagrange equations cannot be minimizing beyond the first conjugate point. Turned the other way round, Theorem 1.3.4 says that if $u$ is a local minimizer, then there cannot be any parameter value $a^*$ with $a < a^* < b$ that is conjugate to $a$. It may happen, however, that $b$ is conjugate to $a$. An example will be given in the next chapter.
Summary. In order to obtain necessary conditions for a solution of the Euler-Lagrange equations
$$\frac{d}{dt}F_p(t,u(t),\dot u(t)) = F_u(t,u(t),\dot u(t))$$
to minimize
$$I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt,$$
one needs to study the second variation
$$Q(\eta) := \delta^2 I(u,\eta) = \frac{d^2}{ds^2}I(u + s\eta)\Big|_{s=0} \quad \text{for } \eta \in D_0^1.$$
If, for fixed $u$, we consider the variational problem $Q(\eta) \to \min$, we are led to the Jacobi equations
$$\frac{d}{dt}\Bigl(F_{pp}(t,u(t),\dot u(t))\,\dot\eta(t) + F_{pu}(t,u(t),\dot u(t))\,\eta(t)\Bigr) = F_{up}(t,u(t),\dot u(t))\,\dot\eta(t) + F_{uu}(t,u(t),\dot u(t))\,\eta(t)$$
for $\eta$. Solutions $\eta$ with $\eta(a) = \eta(b) = 0$ are called Jacobi fields, $a^* \in (a,b)$ for which there exists a nontrivial Jacobi field on $[a,a^*]$ is called conjugate to $a$, and if there exists such $a^*$, $u$ cannot be locally minimizing on $[a,b]$. In other words, a solution of the Euler-Lagrange equations cannot be minimizing beyond the first conjugate point.
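To make the notion of a conjugate point concrete, here is a small Python sketch for the integrand $F(t,u,p) = \frac12(p^2 - u^2)$ with $d = 1$. This integrand is an example chosen here, not one from the text. For it, $F_{pp} = 1$, $F_{pu} = 0$, $F_{uu} = -1$, so the Jacobi equation reads $\ddot\eta = -\eta$, and the Jacobi field with $\eta(a) = 0$, $\dot\eta(a) = 1$ is $\sin(t - a)$. The first parameter value conjugate to $a$ should therefore be $a + \pi$. The function name `first_conjugate_point` and the step size are assumptions made for this illustration.

```python
import math

def first_conjugate_point(a, h=1e-4, t_max=10.0):
    """Integrate eta'' = -eta with eta(a) = 0, eta'(a) = 1 by RK4 and
    return the first zero of eta after a, or None if none is found."""
    def rhs(eta, deta):
        return deta, -eta

    t, eta, deta = a, 0.0, 1.0
    while t < a + t_max:
        k1 = rhs(eta, deta)
        k2 = rhs(eta + 0.5 * h * k1[0], deta + 0.5 * h * k1[1])
        k3 = rhs(eta + 0.5 * h * k2[0], deta + 0.5 * h * k2[1])
        k4 = rhs(eta + h * k3[0], deta + h * k3[1])
        eta_new = eta + h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        deta_new = deta + h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if eta > 0.0 and eta_new <= 0.0:
            # locate the sign change by linear interpolation within the step
            return t + h * eta / (eta - eta_new)
        t, eta, deta = t + h, eta_new, deta_new
    return None

a_star = first_conjugate_point(0.0)
print(a_star, math.pi)
```

In this example, Theorem 1.3.4 says that a solution of $\ddot u = -u$ cannot be a local minimum on an interval $[a,b]$ with $b - a > \pi$.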
1.4 Free boundary conditions
We recall the definition of an $n$-dimensional embedded differentiable submanifold $M$ of $\mathbb{R}^d$: For every $p \in M$, there have to exist a neighbourhood $V = V(p) \subset \mathbb{R}^d$, an open set $U \subset \mathbb{R}^n$ and an injective differentiable map $f : U \to V$ of everywhere maximal rank $n$ (i.e. for every $z \in U$, the derivative $Df(z)$, a linear map from $\mathbb{R}^n$ to $\mathbb{R}^d$, has rank $n$) with
$$M \cap V = f(U).$$
An example is the sphere $S^n$ described in detail in Section 2.1 (Example 2.1.1). The tangent space $T_pM$ of $M$ at $p$ then is the vector space $Df(z)(\mathbb{R}^n)$. It can be considered as a subspace of the vector space $T_p\mathbb{R}^d$, the tangent space of $\mathbb{R}^d$ at $p$. As in 1.1, we now consider the variational problem
$$I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt \to \min$$
with $F$ of class $C^2$. This time, however, we do not impose the Dirichlet boundary condition that the values of $u(a)$ and $u(b)$ are prescribed, but the more general condition that for given submanifolds $M_1$, $M_2$ (differentiable, embedded) of $\mathbb{R}^d$, we require that
$$u(a) \in M_1, \quad u(b) \in M_2.$$
(Dirichlet boundary conditions constitute the special case where $M_1$ and $M_2$ are points.) In this section, we do not consider regularity questions. As an exercise,
the reader should supply the necessary regularity assumptions on $F$, $u$, etc. at each step. Let $u$ be a solution. Then, as before, $u$ has to satisfy the Euler-Lagrange equations, because if $u(a) \in M_1$, $\eta(a) = 0$, then also $u(a) + s\eta(a) \in M_1$ for any $s$, and likewise at $b$, and so we may again consider variations of the form $u + s\eta$, $\eta \in D_0^1$. This time, however, more general variations are also admissible. Namely, let $u_s(t)$ be a family of maps from $I$ into $\mathbb{R}^d$ depending differentiably on $s \in (-\epsilon,\epsilon)$, with $u(t) = u_0(t)$ and
$$u_s(a) \in M_1, \quad u_s(b) \in M_2 \quad \text{for all } s.$$
Let
$$\eta(t) := \frac{\partial}{\partial s}u_s(t)\Big|_{s=0}.$$
Then again
$$0 = \frac{d}{ds}I(u_s)\Big|_{s=0} = \frac{d}{ds}\int_a^b F(t,u_s(t),\dot u_s(t))\,dt\,\Big|_{s=0} = \int_a^b\bigl\{F_p(t,u(t),\dot u(t))\cdot\dot\eta(t) + F_u(t,u(t),\dot u(t))\cdot\eta(t)\bigr\}\,dt$$
$$= \int_a^b\Bigl\{-\frac{d}{dt}F_p + F_u\Bigr\}\cdot\eta\,dt + F_p(t,u(t),\dot u(t))\cdot\eta(t)\Big|_{t=a}^{t=b} = F_p(t,u(t),\dot u(t))\cdot\eta(t)\Big|_{t=a}^{t=b},$$
since $u$ solves the Euler-Lagrange equations. We now observe that $\eta(a) \in T_{u(a)}M_1$ (and likewise at $b$), since we may find a 'local chart' $f$ as above with $M_1 \cap V(u(a)) = f(U)$ for a neighbourhood $V$ of $u(a)$ and some open set $U \subset \mathbb{R}^{n_1}$ ($n_1 = \dim M_1$). By choosing $\epsilon$ smaller if necessary, we may assume $u_s(a) \in M_1 \cap V = f(U)$ for $s \in (-\epsilon,\epsilon)$. Since $f$ is injective, there then has to exist a curve $\gamma(s) \subset U$ with $u_s(a) = f \circ \gamma(s)$ for all $s$. Hence
$$\eta(a) = \frac{\partial}{\partial s}u_s(a)\Big|_{s=0} = Df\bigl(f^{-1}(u(a))\bigr)\,\dot\gamma(0)$$
is indeed tangent to $M_1$ at $u(a)$. Moreover, any tangent vector to $M_1$ at $u(a)$ can be realized in this manner. Therefore, since we may choose the values of $\eta$ at $a$ and $b$ independently of each other, we conclude
$$F_p(a,u(a),\dot u(a)) \cdot V = 0 \quad \text{for all } V \in T_{u(a)}M_1$$
and likewise
$$F_p(b,u(b),\dot u(b)) \cdot W = 0 \quad \text{for all } W \in T_{u(b)}M_2.$$
We have thus shown:
Theorem 1.4.1. Let $u$ be a critical point of $I$ among curves with $u(a) \in M_1$, $u(b) \in M_2$ ($M_1$, $M_2$ given differentiable embedded submanifolds of $\mathbb{R}^d$), i.e. $\frac{d}{ds}I(u_s)|_{s=0} = 0$ for all variations $u_s(t)$ differentiable in $s$ with $u_s(a) \in M_1$, $u_s(b) \in M_2$ for all $s \in (-\epsilon,\epsilon)$ ($\epsilon > 0$). Then $u$ is a solution of the Euler-Lagrange equations for $I$, and in addition, $F_p(a,u(a),\dot u(a))$ and $F_p(b,u(b),\dot u(b))$ are orthogonal to $T_{u(a)}M_1$ and $T_{u(b)}M_2$, respectively. In particular, if for example $M_1 = \mathbb{R}^d$, then $F_p(a,u(a),\dot u(a)) = 0$.
Summary. If instead of a Dirichlet boundary condition, we more generally impose a free boundary condition that $u(a)$ and $u(b)$ are only required to be contained in given differentiable submanifolds $M_1$ and $M_2$, respectively, of $\mathbb{R}^d$, then $F_p(a,u(a),\dot u(a))$ and $F_p(b,u(b),\dot u(b))$ are orthogonal to these submanifolds for a critical point of $I$ under those boundary conditions.
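The boundary condition of Theorem 1.4.1 can be observed in a discretized problem. The Python sketch below treats $I(u) = \int_0^1 (\frac12\dot u^2 + u)\,dt$ with $u(0) = 0$ fixed and $u(1)$ free, i.e. $M_1 = \{0\}$ and $M_2 = \mathbb{R}$ with $d = 1$. This example is chosen here and does not come from the text. The Euler-Lagrange equation is $\ddot u = 1$, and the theorem predicts $F_p = \dot u(1) = 0$, so that $u(t) = t^2/2 - t$. The discretization (piecewise linear $u$, trapezoidal rule for $\int u$) and the helper `solve_tridiagonal` are assumptions of this sketch.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are not used."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for k in range(1, n):
        denom = diag[k] - lower[k] * c[k - 1]
        c[k] = upper[k] / denom
        d[k] = (rhs[k] - lower[k] * d[k - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):
        x[k] = d[k] - c[k] * x[k + 1]
    return x

N = 200
h = 1.0 / N

# Unknowns u_1, ..., u_N (u_0 = 0 is prescribed, u_N is free).
# Setting the gradient of the discrete functional to zero gives
#   2 u_k - u_{k-1} - u_{k+1} = -h^2   for 0 < k < N,
#   u_N - u_{N-1}            = -h^2/2  at the free end.
lower = [-1.0] * N
upper = [-1.0] * N
diag = [2.0] * (N - 1) + [1.0]
rhs = [-h * h] * (N - 1) + [-0.5 * h * h]

u = [0.0] + solve_tridiagonal(lower, diag, upper, rhs)

# one-sided difference quotient approximating F_p = u'(1) at the free end
end_slope = (u[N] - u[N - 1]) / h
print(u[N], end_slope)
```

The free end value comes out as $-1/2$, and the one-sided slope at $t = 1$ is of order $h$, consistent with the natural boundary condition $\dot u(1) = 0$.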
1.5 Symmetries and the theorem of E. Noether
In the variational problems of classical mechanics, one often encounters conserved quantities, like energy, momentum, or angular momentum. It was realized by E. Noether that all those conservation laws result from a general theorem stating that invariance properties of the variational integral $I$ lead to corresponding conserved quantities. We first treat a special case.
Theorem 1.5.1. We consider the variational integral
$$I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt,$$
with $F \in C^2([a,b] \times \mathbb{R}^d \times \mathbb{R}^d,\mathbb{R})$. We suppose that there exists a smooth one-parameter family of differentiable maps $h_s : \mathbb{R}^d \to \mathbb{R}^d$ (the precise smoothness requirement is that $h(s,z) := h_s(z)$ is of class $C^2((-\epsilon_0,\epsilon_0) \times \mathbb{R}^d,\mathbb{R}^d)$ for some $\epsilon_0 > 0$), with
$$h_0(z) = z \quad \text{for all } z \in \mathbb{R}^d$$
and satisfying
$$\int_a^b F\Bigl(t,h_s(u(t)),\frac{d}{dt}h_s(u(t))\Bigr)\,dt = \int_a^b F\Bigl(t,u(t),\frac{d}{dt}u(t)\Bigr)\,dt \tag{1.5.1}$$
for all $s \in (-\epsilon_0,\epsilon_0)$ and all $u \in C^2([a,b],\mathbb{R}^d)$. Then, for any solution $u(t)$ of the Euler-Lagrange equations (1.1.4) for $I$,
$$F_p(t,u(t),\dot u(t))\,\frac{d}{ds}h_s(u(t))\Big|_{s=0} \tag{1.5.2}$$
is constant in $t \in [a,b]$.
Definition 1.5.1. A quantity $C(t,u(t),\dot u(t))$ that is constant in $t$ for each solution of the Euler-Lagrange equations of a variational integral $I(u)$ is called a (first) integral of motion.
Proof of Theorem 1.5.1: Equation (1.5.1) yields for any $t_0 \in [a,b]$, using $h_0(z) = z$,
$$0 = \frac{d}{ds}\int_a^{t_0} F\Bigl(t,h_s(u(t)),\frac{d}{dt}h_s(u(t))\Bigr)\,dt\,\Big|_{s=0} = \int_a^{t_0}\Bigl\{F_u(t,u(t),\dot u(t))\,\frac{d}{ds}h_s(u(t)) + F_p(t,u(t),\dot u(t))\,\frac{d}{ds}\frac{d}{dt}h_s(u(t))\Bigr\}\,dt\,\Big|_{s=0}. \tag{1.5.3}$$
We recall the Euler-Lagrange equations (1.1.4) for $u$:
$$0 = \frac{d}{dt}F_p(t,u(t),\dot u(t)) - F_u(t,u(t),\dot u(t)). \tag{1.5.4}$$
Using (1.5.4) in (1.5.3) to replace $F_u$, we obtain
$$0 = \int_a^{t_0}\Bigl\{\frac{d}{dt}F_p(t,u(t),\dot u(t))\,\frac{d}{ds}h_s(u(t)) + F_p(t,u(t),\dot u(t))\,\frac{d}{dt}\frac{d}{ds}h_s(u(t))\Bigr\}\,dt\,\Big|_{s=0} \tag{1.5.5}$$
$$= \int_a^{t_0}\frac{d}{dt}\Bigl(F_p(t,u(t),\dot u(t))\,\frac{d}{ds}h_s(u(t))\Big|_{s=0}\Bigr)\,dt.$$
Therefore
$$F_p(t_0,u(t_0),\dot u(t_0))\,\frac{d}{ds}h_s(u(t_0))\Big|_{s=0} = F_p(a,u(a),\dot u(a))\,\frac{d}{ds}h_s(u(a))\Big|_{s=0} \tag{1.5.6}$$
for any $t_0 \in [a,b]$. This means that (1.5.2) is constant on $[a,b]$. q.e.d.
Examples
Example 1.5.1. We consider for $u : \mathbb{R} \to \mathbb{R}^{3n}$, $u = (u_1,\dots,u_n)$ with $u_i \in \mathbb{R}^3$,
$$F(t,u,\dot u) = \sum_{i=1}^n \frac{m_i}{2}|\dot u_i|^2 - V(u),$$
i.e. a mechanical system in $\mathbb{R}^3$ with point masses $m_i$, and a potential $V(u)$ that is independent of the third coordinates of the $u_i$. Then $h_s(z) = z + se_3$ (applied to each $u_i$), where $e_3$ is the third unit vector in $\mathbb{R}^3$, leaves $F$ invariant in the sense of Theorem 1.5.1. Since
$$\frac{d}{ds}h_s\Big|_{s=0} = e_3,$$
we conclude that
$$\sum_{i=1}^n m_i\dot u_i^3$$
is constant, i.e. the third component of the momentum vector of the system is conserved.
Example 1.5.2. Similarly, if a system as in Example 1.5.1 is invariant under rotations about the $e_3$-axis, and if $h_s$ now denotes such rotations, then (up to a constant factor)
$$\frac{d}{ds}h_s\Big|_{s=0}u_i = e_3 \wedge u_i.$$
Hence, the conserved quantity is the angular momentum w.r.t. the $e_3$-axis,
$$\sum_{i=1}^n F_{p_i} \cdot (e_3 \wedge u_i) = \sum_{i=1}^n (m_i\dot u_i) \cdot (e_3 \wedge u_i) = \sum_{i=1}^n (u_i \wedge m_i\dot u_i) \cdot e_3.$$
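The conservation law of Example 1.5.2 can be observed numerically. The Python sketch below integrates a single point mass ($n = 1$) in the potential $V(u) = \frac12(x^2 + y^2) + \frac14 z^4$, which is invariant under rotations about the $e_3$-axis. Both this potential and the velocity Verlet integrator are choices made here for illustration, not taken from the text. The monitored quantity is $(u \wedge m\dot u) \cdot e_3 = m(x\dot y - y\dot x)$.

```python
def force(q):
    # -grad V for V(u) = 0.5*(x^2 + y^2) + 0.25*z^4 (rotationally symmetric about e_3)
    x, y, z = q
    return (-x, -y, -z ** 3)

def angular_momentum_z(m, q, v):
    # third component of u ^ (m u'), the conserved quantity of Example 1.5.2
    return m * (q[0] * v[1] - q[1] * v[0])

def velocity_verlet(m, q, v, dt, n_steps):
    """Integrate m u'' = -grad V(u) with the velocity Verlet scheme."""
    q, v = list(q), list(v)
    f = force(q)
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]
        q = [qi + dt * vi for qi, vi in zip(q, v)]
        f = force(q)
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]
    return q, v

m = 2.0
q0, v0 = (1.0, 0.0, 0.3), (0.2, 0.7, -0.1)
q1, v1 = velocity_verlet(m, q0, v0, 1e-3, 5000)

L0 = angular_momentum_z(m, q0, v0)
L1 = angular_momentum_z(m, q1, v1)
print(L0, L1)
```

For this scheme the horizontal force is always parallel to $(x,y)$, so each half-step leaves $x\dot y - y\dot x$ unchanged and the two printed values agree up to rounding. With a potential that is not rotationally symmetric, the same experiment would show a drift.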
We now come to the general form of E. Noether's theorem.
Theorem 1.5.2 (Theorem of E. Noether). We consider the variational integral
$$I(u) = \int_a^b F(t,u(t),\dot u(t))\,dt$$
with $F \in C^2([a,b] \times \mathbb{R}^d \times \mathbb{R}^d,\mathbb{R})$. We suppose that there exists a smooth one-parameter family of differentiable maps
$$h_s = (h_s^0,\bar h_s) : [a,b] \times \mathbb{R}^d \to \mathbb{R} \times \mathbb{R}^d$$
($s \in (-\epsilon_0,\epsilon_0)$ as before) with
$$h_0(t,z) = (t,z) \quad \text{for all } (t,z) \in [a,b] \times \mathbb{R}^d$$
and satisfying
$$\int_{h_s^0(a)}^{h_s^0(b)} F\Bigl(t_s,\bar h_s(u(t_s)),\frac{d}{dt_s}\bar h_s(u(t_s))\Bigr)\,dt_s = \int_a^b F(t,u(t),\dot u(t))\,dt \tag{1.5.7}$$
for $t_s = h_s^0(t)$, all $s \in (-\epsilon_0,\epsilon_0)$ and all $u \in C^2([a,b],\mathbb{R}^d)$. Then, for any solution $u(t)$ of the Euler-Lagrange equations (1.1.4) for $I$,
$$F_p(t,u(t),\dot u(t))\,\frac{d}{ds}\bar h_s(u(t))\Big|_{s=0} + \bigl(F(t,u(t),\dot u(t)) - F_p(t,u(t),\dot u(t))\,\dot u(t)\bigr)\,\frac{d}{ds}h_s^0(t)\Big|_{s=0} \tag{1.5.8}$$
is constant in $t \in [a,b]$.
Proof. We reduce the statement to the one of Theorem 1.5.1 by artificially considering $t$ as a dependent variable on the same footing with $u$. Thus, we consider the integrand
$$\bar F\Bigl(t(\tau),u(t(\tau)),\frac{dt}{d\tau},\frac{d}{d\tau}u(t(\tau))\Bigr) := F\Bigl(t,u(t),\frac{\frac{d}{d\tau}u(t(\tau))}{\frac{dt}{d\tau}}\Bigr)\frac{dt}{d\tau} = F(t,u(t),\dot u(t))\,\frac{dt}{d\tau}. \tag{1.5.9}$$
Then
$$\bar I(t,u) := \int_{\tau_0}^{\tau_1}\bar F\,d\tau = \int_a^b F(t,u(t),\dot u(t))\,dt = I(u), \quad \text{if } t(\tau_0) = a,\ t(\tau_1) = b. \tag{1.5.10}$$
By our assumption, $\bar F$ remains invariant under replacing $(t,u)$ by $h_s(t,u)$. Consequently, Theorem 1.5.1 applied to $\bar I$ yields that
$$\bar F_p(t,u(t),\dot u(t))\,\frac{d}{ds}\bar h_s(u(t))\Big|_{s=0} + \bar F_{p^0}(t,u(t),\dot u(t))\,\frac{d}{ds}h_s^0(t)\Big|_{s=0},$$
with $p^0$ standing for the place of the argument $\frac{dt}{d\tau}$ of $\bar F$ (while $p$ stands as before for the arguments $\dot u$), is invariant. Since, by (1.5.9), $\bar F_p = F_p$ and $\bar F_{p^0} = F - F_p\dot u$ at $s = 0$ (note $\frac{dt}{d\tau} = 1$ for $s = 0$ since $h_0^0(t) = t$), this implies the invariance of (1.5.8). q.e.d.
Example 1.5.3. Suppose $F = F(u,\dot u)$, i.e. $F$ does not depend explicitly on $t$. Then $h_s(t,z) = (t + s,z)$ leaves $I$ invariant as required in Theorem 1.5.2. Therefore, the 'energy'
$$F(t,u(t),\dot u(t)) - F_p(t,u(t),\dot u(t))\,\dot u(t)$$
is conserved. We shall see another proof of this fact in Section 4.1.
Summary. The theorem of E. Noether identifies a quantity that is preserved along any solution $u(t)$ of the Euler-Lagrange equations of a variational integral, a so-called first integral of motion, with any differentiable symmetry of the integrand. For example, in classical mechanics, conservation of momentum and angular momentum correspond to translational and rotational invariance of the integral, respectively, while time invariance leads to the conservation of energy.
Exercises
1.1 For mappings $u : [a,b] \to \mathbb{R}^d$, consider
$$E(u) := \frac12\int_a^b |\dot u(t)|^2\,dt$$
($|\cdot|$ is the Euclidean norm of $\mathbb{R}^d$, i.e. for $z = (z^1,\dots,z^d)$, $|z|^2 = \sum_{i=1}^d (z^i)^2$). Compute the Euler-Lagrange equations and the second variation. Also, let
$$L(u) := \int_a^b |\dot u(t)|\,dt.$$
Show that
$$L(u)^2 \le 2(b - a)E(u)$$
with equality if $|\dot u(t)| \equiv$ constant almost everywhere. (What is an appropriate regularity class for the mappings $u$ that are considered here?)
1.2 Determine all minimizers of the variational integral
$$I(u) = \int_{-1}^{1}(1 - \dot u(t))^2\,dt$$
with $u(-1) = 0 = u(1)$.
1.3 Develop a theory of Jacobi fields for variational problems with free boundary conditions. In particular, you should obtain an analogue of Jacobi's theorem.
1.4 For mappings $u : [a,b] \to \mathbb{R}^d$, consider … Compute the first and second variation of $I$ and the Jacobi equation. Can you find Jacobi fields?
2 A geometric example: geodesic curves
2.1 The length and energy of curves
We let $M$ be an $n$-dimensional embedded submanifold of $\mathbb{R}^d$. In this section, we assume that $M$ is of class $C^3$, i.e. that all local charts are thrice differentiable. We let $c \in AC([0,T],M)$ be a curve on $M$. This means that $c$ is an absolutely continuous map from the interval $[0,T]$ into $\mathbb{R}^d$ with the property that $c(t) \in M$ for every $t \in [0,T]$. The derivative of $c$ w.r.t. $t$ will be denoted by a dot,
$$\dot c(t) := \frac{d}{dt}c(t).$$
The length of $c$ is given by
$$L(c) := \int_0^T |\dot c(t)|\,dt = \int_0^T\Bigl(\sum_{\alpha=1}^d (\dot c^\alpha(t))^2\Bigr)^{1/2}dt, \tag{2.1.1}$$
where $(c^1,\dots,c^d)$ are the coordinates of $c = c(t)$. We also define the energy of $c$ as
$$E(c) := \frac12\int_0^T |\dot c(t)|^2\,dt = \frac12\int_0^T\sum_{\alpha=1}^d (\dot c^\alpha(t))^2\,dt. \tag{2.1.2}$$
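We let now
$$f : U \to V, \quad f(U) = M \cap V$$
be a local chart for $M$ as defined in Section 1.4. We assume for a moment that $c([0,T])$ is contained in $f(U)$. Since $f$ maps $U$ bijectively onto $f(U)$, there exists a curve $\gamma(t) \subset U$ with
$$c(t) = f(\gamma(t)). \tag{2.1.3}$$
Since the derivative $Df(z)$ has maximal rank everywhere (by definition of a chart, cf. 1.4), $\gamma$ is absolutely continuous, since $c$ is, and we have the chain rule
$$\dot c(t) = (Df)(\gamma(t)) \circ \dot\gamma(t),$$
or
$$\dot c^\alpha(t) = \frac{\partial f^\alpha}{\partial z^i}(\gamma(t))\,\dot\gamma^i(t),$$
where the index $i$ is summed from 1 to $n$. Thus
$$L(c) = \int_0^T\Bigl(\frac{\partial f^\alpha}{\partial z^i}(\gamma(t))\,\dot\gamma^i(t)\,\frac{\partial f^\alpha}{\partial z^j}(\gamma(t))\,\dot\gamma^j(t)\Bigr)^{1/2}dt$$
and
$$E(c) = \frac12\int_0^T\frac{\partial f^\alpha}{\partial z^i}(\gamma(t))\,\dot\gamma^i(t)\,\frac{\partial f^\alpha}{\partial z^j}(\gamma(t))\,\dot\gamma^j(t)\,dt.$$
In these formulae, and in the sequel, the index $\alpha$ is summed from 1 to $d$. For $z \in U$, we put
$$g_{ij}(z) := \frac{\partial f^\alpha}{\partial z^i}(z)\,\frac{\partial f^\alpha}{\partial z^j}(z). \tag{2.1.4}$$
With this notation, the preceding formulae become
$$L(c) = \int_0^T\bigl(g_{ij}(\gamma(t))\,\dot\gamma^i(t)\,\dot\gamma^j(t)\bigr)^{1/2}dt \tag{2.1.5}$$
and
$$E(c) = \frac12\int_0^T g_{ij}(\gamma(t))\,\dot\gamma^i(t)\,\dot\gamma^j(t)\,dt. \tag{2.1.6}$$
Definition 2.1.1. $(g_{ij}(z))_{i,j=1,\dots,n}$ is called the metric tensor of $M$ w.r.t. the chart $f : U \to V$.
We note that $(g_{ij}(z))_{i,j=1,\dots,n}$ is symmetric, i.e.
$$g_{ij}(z) = g_{ji}(z) \quad \text{for all } i,j,$$
and positive definite, i.e.
$$g_{ij}(z)\,\eta^i\eta^j > 0 \quad \text{whenever } \eta = (\eta^1,\dots,\eta^n) \ne 0 \in \mathbb{R}^n.$$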
Geodesic curves
Remark 2.1.1. The use of local charts for $M$ seems to have the obvious disadvantage that the expressions for length and energy of curves become more complicated. The advantage of this approach, namely not to consider curves on $M$ as curves in $\mathbb{R}^d$ satisfying a constraint, is that this constraint now is automatically fulfilled. All curves represented in local charts lie on $M$. This more than compensates for the complication in the formulae for $L$ and $E$.
Our aim will be to find curves of shortest length or of smallest energy on $M$, i.e. to minimize the functionals $L$ and $E$ among curves on $M$. For this purpose it will be useful to observe certain invariance properties of $L$ and $E$. First of all, whenever $i : \mathbb{R}^d \to \mathbb{R}^d$ is a Euclidean isometry, i.e. $i(y) = Ay + b$ with $A \in O(d)$, the orthogonal group, and $b \in \mathbb{R}^d$, then
$$L(i(c)) = L(c) \tag{2.1.7}$$
$$E(i(c)) = E(c) \tag{2.1.8}$$
for any curve $c : [0,T] \to \mathbb{R}^d$. Secondly, $L$ is parameterization invariant in the sense that whenever $\tau : [0,S] \to [0,T]$ is a diffeomorphism (i.e. $\tau$ is bijective, and both $\tau$ and its inverse $\tau^{-1}$ are everywhere differentiable), then
$$L(c) = L(c \circ \tau)$$
for any curve $c : [0,T] \to \mathbb{R}^d$.
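Namely
$$L(c \circ \tau) = \int_0^S\Bigl|\frac{d}{ds}(c \circ \tau)(s)\Bigr|\,ds = \int_0^S\Bigl|\frac{dc}{dt}(\tau(s))\Bigr|\,\Bigl|\frac{d\tau}{ds}\Bigr|\,ds = \int_0^T |\dot c(t)|\,dt. \tag{2.1.9}$$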
$E$, however, is not parameterization invariant. By the Schwarz inequality, we have instead
$$L(c) = \int_0^T |\dot c(t)|\,dt \le \Bigl(\int_0^T dt\Bigr)^{1/2}\Bigl(\int_0^T |\dot c(t)|^2\,dt\Bigr)^{1/2} = \sqrt{2T}\sqrt{E(c)}, \tag{2.1.10}$$
with equality iff
$$|\dot c(t)| \equiv \text{constant} \quad \text{for almost all } t. \tag{2.1.11}$$
We have shown:
Lemma 2.1.1. For every $c \in AC([0,T],\mathbb{R}^d)$
$$L(c) \le \sqrt{2T}\sqrt{E(c)},$$
with strict inequality, unless $|\dot c(t)| \equiv$ constant almost everywhere.
If $|\dot c(t)| \equiv$ constant almost everywhere,
we say that the curve $c$ is parameterized proportionally to arc-length, and if $|\dot c(t)| \equiv 1$, we say that it is parameterized by arc-length. We recall that a Jordan curve, i.e. an injective curve $c : [0,T] \to \mathbb{R}^d$, is rectifiable if it is absolutely continuous (which we always assume), and this implies that it may be parameterized by arc-length, i.e. there exists a diffeomorphism $\tau : [0,L(c)] \to [0,T]$ with
$$\Bigl|\frac{d}{ds}(c \circ \tau)(s)\Bigr| = 1 \quad \text{for almost all } s,$$
i.e. the reparameterized curve $\tilde c = c \circ \tau$ is parameterized by arc-length. From Lemma 2.1.1, we obtain:
Corollary 2.1.1. Let $c : [0,L(c)] \to \mathbb{R}^d$ be a curve parameterized on $[0,L(c)]$. Among all reparameterizations $\tau : [0,L(c)] \to [0,L(c)]$
(i.e. we keep the interval of definition fixed, namely $[0,L(c)]$), the parameterization by arc-length leads to the smallest energy. Namely, if $c : [0,L(c)] \to \mathbb{R}^d$ is parameterized by arc-length,
$$L(c) = 2E(c), \tag{2.1.12}$$
whereas for any other parameterization of $c$ on the same interval,
$$L(c) < 2E(c). \tag{2.1.13}$$
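The inequality of Lemma 2.1.1 and the role of the parameterization can be illustrated on polygonal curves. The Python sketch below samples a half circle in $\mathbb{R}^2$ on $[0,T]$ once with constant speed and once with the reparameterization $\theta(t) = \pi (t/T)^2$. The curve, the sample counts and the helper `length_and_energy` are assumptions of this illustration. For a piecewise linear curve, length and energy reduce to finite sums, and the Schwarz inequality holds for these sums as well.

```python
import math

def length_and_energy(times, points):
    """Length and energy of the piecewise linear curve through the samples."""
    length = 0.0
    energy = 0.0
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        dist = math.dist(points[k], points[k + 1])
        length += dist                      # |c'| dt on this segment
        energy += 0.5 * dist * dist / dt    # (1/2) |c'|^2 dt on this segment
    return length, energy

def half_circle(theta):
    return (math.cos(theta), math.sin(theta))

T, N = 2.0, 1000
times = [T * k / N for k in range(N + 1)]

# same image curve, two different parameterizations on [0, T]
uniform = [half_circle(math.pi * t / T) for t in times]
nonuniform = [half_circle(math.pi * (t / T) ** 2) for t in times]

L_u, E_u = length_and_energy(times, uniform)
L_n, E_n = length_and_energy(times, nonuniform)

print(L_u ** 2, 2 * T * E_u)   # equal: constant speed
print(L_n ** 2, 2 * T * E_n)   # strict inequality: variable speed
```

Both polygons have length close to $\pi$, but only the constant speed one attains equality in $L(c)^2 \le 2T\,E(c)$.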
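We now return to those curves $c$ that are confined to lie on $M$, in order to discover a third invariance. Namely, we compare the two expressions (2.1.1) and (2.1.5) for the length of $c$, and similarly (2.1.2) and (2.1.6) for its energy. (2.1.1) is obviously independent of the chart $f : U \to V$ and its metric tensor, and therefore (2.1.5) has to be independent of them, too. In order to study this more closely, let $\tilde f : \tilde U \to \tilde V$ be another chart with $c([0,T]) \subset \tilde f(\tilde U)$. Then there exists a curve $\tilde\gamma$ in $\tilde U$ with $c(t) = \tilde f(\tilde\gamma(t))$ for all $t$. Putting
$$\tilde g_{ij}(z) := \frac{\partial\tilde f^\alpha}{\partial z^i}(z)\,\frac{\partial\tilde f^\alpha}{\partial z^j}(z) \quad \text{for } z \in \tilde U,$$
we then also have
$$L(c) = \int_0^T\bigl(\tilde g_{ij}(\tilde\gamma(t))\,\dot{\tilde\gamma}^i(t)\,\dot{\tilde\gamma}^j(t)\bigr)^{1/2}dt. \tag{2.1.14}$$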
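In order to study this invariance property more closely, we define
$$\varphi := \tilde f^{-1} \circ f : f^{-1}\bigl(f(U) \cap \tilde f(\tilde U)\bigr) \to \tilde f^{-1}\bigl(f(U) \cap \tilde f(\tilde U)\bigr)$$
(see Figure 2.1). $\varphi$ is called a coordinate transformation. $\varphi$ is a diffeomorphism, i.e. a bijective map between open subsets of $\mathbb{R}^n$ whose derivative $D\varphi$ has maximal rank everywhere. We have $\tilde\gamma(t) = \varphi(\gamma(t))$, hence
$$\dot{\tilde\gamma}^i(t) = \frac{\partial\varphi^i}{\partial z^k}(\gamma(t))\,\dot\gamma^k(t), \tag{2.1.15}$$
and from $\tilde f(\varphi(z)) = f(z)$ we get
$$g_{ij}(z) = \tilde g_{kl}(\varphi(z))\,\frac{\partial\varphi^k}{\partial z^i}(z)\,\frac{\partial\varphi^l}{\partial z^j}(z). \tag{2.1.16}$$
Figure 2.1.
From (2.1.15) and (2.1.16), we see
$$\tilde g_{ij}(\tilde\gamma(t))\,\dot{\tilde\gamma}^i(t)\,\dot{\tilde\gamma}^j(t) = g_{ij}(\gamma(t))\,\dot\gamma^i(t)\,\dot\gamma^j(t), \tag{2.1.17}$$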
and this shows again the equivalence of (2.1.5) and (2.1.1), and likewise for the corresponding expressions of the energy. The important transformation formula (2.1.16) shows how the metric tensor transforms under coordinate transformations. This invariance property of $L$ and $E$ makes it possible to express the length and energy of an arbitrary curve $c$ on $M$ that is not necessarily contained in the image of a single chart as follows: One finds a subdivision
$$t_0 = 0 < t_1 < \dots < t_{m-1} < t_m = T$$
of $[0,T]$ with the property that $c([t_{\nu-1},t_\nu])$ is contained in the image of a single chart $f_\nu$ for each $\nu = 1,\dots,m$. Let $(g^\nu_{ij}(z))_{i,j=1,\dots,n}$ be the metric tensor of $M$ w.r.t. the chart $f_\nu$. Then
$$L(c) = \sum_{\nu=1}^m\int_{t_{\nu-1}}^{t_\nu}\bigl(g^\nu_{ij}(\gamma_\nu(t))\,\dot\gamma_\nu^i(t)\,\dot\gamma_\nu^j(t)\bigr)^{1/2}dt,$$
where $c(t) = f_\nu \circ \gamma_\nu(t)$ for $t \in [t_{\nu-1},t_\nu]$. By the preceding considerations, this does not depend on the choice of charts $f_\nu$. For this reason, one usually just says that for a curve $c$ on $M$
$$L(c) = \int_0^T\bigl(g_{ij}(\gamma(t))\,\dot\gamma^i(t)\,\dot\gamma^j(t)\bigr)^{1/2}dt, \tag{2.1.18}$$
where $\gamma$ is the representation for $c$ w.r.t. a local chart, and $(g_{ij})_{i,j=1,\dots,n}$ is the metric tensor of $M$ w.r.t. this chart. Similarly
$$E(c) = \frac12\int_0^T g_{ij}(\gamma(t))\,\dot\gamma^i(t)\,\dot\gamma^j(t)\,dt. \tag{2.1.19}$$
We now assume that the charts for $M$ are twice differentiable and return to the question of finding shortest curves on $M$, for example between two given points. By Corollary 2.1.1, it is preferable to minimize $E$ instead of $L$, because a minimizer for $E$ contains more information than one for $L$; namely, minimizers for $E$ are precisely those minimizers for $L$ that are parameterized proportionally to arc-length. Thus, minimizing $E$ not only selects shortest curves but also convenient parameterizations of such curves. We now compute the Euler-Lagrange equations for $E$ as given by (2.1.19):
$$0 = \frac{d}{dt}E_{\dot\gamma^i} - E_{\gamma^i} \quad \text{for } i = 1,\dots,n$$
$$\Longleftrightarrow\quad 0 = \frac{d}{dt}\bigl(2g_{ij}(\gamma(t))\,\dot\gamma^j(t)\bigr) - \Bigl(\frac{\partial}{\partial z^i}g_{kj}\Bigr)(\gamma(t))\,\dot\gamma^k(t)\,\dot\gamma^j(t)$$
(the factor 2 in the first term results from the symmetry $g_{ij} = g_{ji}$)
$$\Longleftrightarrow\quad 0 = 2g_{ij}\ddot\gamma^j + 2\frac{\partial}{\partial z^k}g_{ij}\,\dot\gamma^k\dot\gamma^j - \frac{\partial}{\partial z^i}g_{kj}\,\dot\gamma^k\dot\gamma^j. \tag{2.1.20}$$
We now introduce some further notation: $(g^{ij})_{i,j=1,\dots,n}$ is the matrix inverse to $(g_{ij})_{i,j=1,\dots,n}$, i.e.
$$g^{ij}g_{jk} = \delta^i_k = \begin{cases}1 & \text{for } i = k,\\ 0 & \text{for } i \ne k,\end{cases}$$
$$g_{ij,k} := \frac{\partial}{\partial z^k}g_{ij},$$
and finally the Christoffel symbols
$$\Gamma^i_{jk} := \frac12 g^{il}\bigl(g_{jl,k} + g_{kl,j} - g_{jk,l}\bigr).$$
Equation (2.1.20) then becomes
$$0 = \ddot\gamma^i + g^{il}\Bigl(g_{lj,k}\,\dot\gamma^k\dot\gamma^j - \frac12 g_{kj,l}\,\dot\gamma^k\dot\gamma^j\Bigr) = \ddot\gamma^i + \frac12 g^{il}\bigl(g_{lj,k} + g_{lk,j} - g_{jk,l}\bigr)\dot\gamma^j\dot\gamma^k$$
by using symmetries. Thus:
Lemma 2.1.2. The Euler-Lagrange equations for the energy $E$ for curves on $M$ are
$$0 = \ddot\gamma^i(t) + \Gamma^i_{jk}(\gamma(t))\,\dot\gamma^j(t)\,\dot\gamma^k(t) \quad \text{for } i = 1,\dots,n. \tag{2.1.21}$$
The theorem of Picard-Lindelöf about solutions of ordinary differential equations implies:
Lemma 2.1.3. For any $z \in U$, $v \in \mathbb{R}^n$, the system (2.1.21) has a unique solution $\gamma(t)$ with $\gamma(0) = z$, $\dot\gamma(0) = v$ for $t \in [-\epsilon,\epsilon]$ and some $\epsilon > 0$. Moreover, $\gamma(t)$ depends differentiably on the initial values $z$, $v$.
Definition 2.1.2. The solutions of (2.1.21) are called geodesics on $M$.
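Examples
Example 2.1.1. The sphere
$$S^n := \Bigl\{(x^1,\dots,x^{n+1}) \in \mathbb{R}^{n+1} : \sum_{i=1}^{n+1}(x^i)^2 = 1\Bigr\} \subset \mathbb{R}^{n+1}$$
is a differentiable manifold of dimension $n$. In order to construct local charts, we put
$$\Omega_1 := S^n \setminus \{(0,0,\dots,0,1)\}, \quad \Omega_2 := S^n \setminus \{(0,0,\dots,0,-1)\}$$
and define $g_1 : \Omega_1 \to \mathbb{R}^n$, $g_2 : \Omega_2 \to \mathbb{R}^n$ as
$$g_1(x^1,\dots,x^{n+1}) := \Bigl(\frac{x^1}{1 - x^{n+1}},\dots,\frac{x^n}{1 - x^{n+1}}\Bigr)$$
and
$$g_2(x^1,\dots,x^{n+1}) := \Bigl(\frac{x^1}{1 + x^{n+1}},\dots,\frac{x^n}{1 + x^{n+1}}\Bigr)$$
($g_1$ and $g_2$ are the stereographic projections from the north and south pole, respectively). We then obtain charts
$$f_1 = g_1^{-1} : \mathbb{R}^n \to S^n \setminus \{(0,\dots,0,1)\}, \quad f_2 = g_2^{-1} : \mathbb{R}^n \to S^n \setminus \{(0,\dots,0,-1)\}.$$
More explicitly, $f_1$ can be computed as follows: With
$$(z^1,\dots,z^n) = g_1(x^1,\dots,x^{n+1}) = \Bigl(\frac{x^1}{1 - x^{n+1}},\dots,\frac{x^n}{1 - x^{n+1}}\Bigr),$$
we have
$$1 = \sum_{i=1}^{n+1}x^ix^i = z^iz^i(1 - x^{n+1})^2 + x^{n+1}x^{n+1},$$
hence
$$x^{n+1} = \frac{z^iz^i - 1}{z^iz^i + 1}$$
and then
$$x^j = \frac{2z^j}{1 + z^iz^i} \quad (j = 1,\dots,n).$$
Thus
$$f_1(z^1,\dots,z^n) = \Bigl(\frac{2z^1}{1 + z^iz^i},\dots,\frac{2z^n}{1 + z^iz^i},\frac{z^iz^i - 1}{z^iz^i + 1}\Bigr).$$
For the metric tensor, we compute
$$\frac{\partial f_1^j}{\partial z^k} = \frac{2\delta_{jk}}{1 + z^iz^i} - \frac{4z^jz^k}{(1 + z^iz^i)^2} \quad (j,k = 1,\dots,n), \qquad \frac{\partial f_1^{n+1}}{\partial z^k} = \frac{4z^k}{(1 + z^iz^i)^2}.$$
Hence
$$g_{ij}(z) = \frac{\partial f_1^\alpha}{\partial z^i}\frac{\partial f_1^\alpha}{\partial z^j} = \frac{4}{(1 + |z|^2)^2}\delta_{ij}. \tag{2.1.22}$$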
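Formula (2.1.22) can be cross-checked numerically from the definition (2.1.4). The Python sketch below takes $n = 2$, implements the chart $f_1(z) = \bigl(\frac{2z^1}{1+|z|^2},\frac{2z^2}{1+|z|^2},\frac{|z|^2-1}{|z|^2+1}\bigr)$, approximates its partial derivatives by central differences, and compares $g_{ij} = \frac{\partial f^\alpha}{\partial z^i}\frac{\partial f^\alpha}{\partial z^j}$ with $\frac{4}{(1+|z|^2)^2}\delta_{ij}$. The function names, the sample point and the step size are assumptions of this illustration.

```python
def f1(z):
    """Inverse stereographic chart R^2 -> S^2 minus the north pole."""
    s = z[0] ** 2 + z[1] ** 2
    return [2 * z[0] / (1 + s), 2 * z[1] / (1 + s), (s - 1) / (s + 1)]

def metric_tensor(chart, z, h=1e-5):
    """g_ij = sum over alpha of (d f^alpha / d z^i)(d f^alpha / d z^j),
    with the partial derivatives approximated by central differences."""
    n = len(z)
    columns = []
    for i in range(n):
        z_plus, z_minus = list(z), list(z)
        z_plus[i] += h
        z_minus[i] -= h
        f_plus, f_minus = chart(z_plus), chart(z_minus)
        columns.append([(a - b) / (2 * h) for a, b in zip(f_plus, f_minus)])
    return [[sum(a * b for a, b in zip(columns[i], columns[j]))
             for j in range(n)] for i in range(n)]

z = [0.3, -0.7]
g = metric_tensor(f1, z)
expected = 4.0 / (1.0 + z[0] ** 2 + z[1] ** 2) ** 2

print(g)
print(expected)
```

The diagonal entries should match `expected` and the off-diagonal entries should vanish, up to the finite difference error.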
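Actually, the metric tensor w.r.t. the chart $f_2$ is given by the same formula. In order to compute the expression for geodesics, we also need to compute the Christoffel symbols. It turns out that adding a little generality will actually facilitate the computations. We consider a metric of the form
$$g_{ij}(z) = \frac{1}{\varphi(z)^2}\delta_{ij}, \tag{2.1.23}$$
where $\varphi$ is a positive differentiable function, so that
$$g^{ij}(z) = \varphi(z)^2\delta_{ij}. \tag{2.1.24}$$
We also put $\psi := \log\varphi$. Then
$$g_{ij,k} = -\frac{2}{\varphi^3}\frac{\partial\varphi}{\partial z^k}\delta_{ij} = -\frac{2}{\varphi^2}\frac{\partial\psi}{\partial z^k}\delta_{ij}.$$
Next
$$\Gamma^k_{ij} = \frac12 g^{kl}\bigl(g_{il,j} + g_{jl,i} - g_{ij,l}\bigr) = -\delta_{ik}\frac{\partial\psi}{\partial z^j} - \delta_{jk}\frac{\partial\psi}{\partial z^i} + \delta_{ij}\frac{\partial\psi}{\partial z^k}. \tag{2.1.25}$$
Thus, $\Gamma^k_{ij}$ vanishes if all three indices $i$, $j$, $k$ are distinct, and for all $i \ne j$
$$\Gamma^i_{ii} = \Gamma^j_{ij} = \Gamma^j_{ji} = -\frac{\partial\psi}{\partial z^i} \quad \text{and} \quad \Gamma^j_{ii} = \frac{\partial\psi}{\partial z^j}.$$
In the present case, $\psi = \log(1 + |z|^2) - \log 2$, hence
$$\frac{\partial\psi}{\partial z^i} = \frac{2z^i}{1 + |z|^2}. \tag{2.1.26}$$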
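As a sanity check on the Christoffel symbols of the sphere, the following Python sketch (for $n = 2$) evaluates the defining expression $\Gamma^k_{ij} = \frac12 g^{kl}(g_{il,j} + g_{jl,i} - g_{ij,l})$ with finite differences of the metric $g_{ij} = \frac{4}{(1+|z|^2)^2}\delta_{ij}$ and compares it with the closed form $-\delta_{ik}\partial_j\psi - \delta_{jk}\partial_i\psi + \delta_{ij}\partial_k\psi$, where $\psi = \log(1+|z|^2) - \log 2$. All function names and the sample point are assumptions of this sketch. It uses the fact that the metric is diagonal, so that the inverse metric is obtained entrywise.

```python
import math

def psi(z):
    return math.log(1.0 + sum(c * c for c in z)) - math.log(2.0)

def metric(z):
    """g_ij = exp(-2 psi) delta_ij, i.e. 4 / (1 + |z|^2)^2 on the diagonal."""
    n = len(z)
    lam = math.exp(-2.0 * psi(z))
    return [[lam if i == j else 0.0 for j in range(n)] for i in range(n)]

def d_metric(z, k, h=1e-5):
    """Central difference approximation of g_{ij,k} = d g_ij / d z^k."""
    z_plus, z_minus = list(z), list(z)
    z_plus[k] += h
    z_minus[k] -= h
    g_plus, g_minus = metric(z_plus), metric(z_minus)
    n = len(z)
    return [[(g_plus[i][j] - g_minus[i][j]) / (2 * h) for j in range(n)]
            for i in range(n)]

def christoffel_numeric(z):
    """G[k][i][j] = Gamma^k_ij from the definition, for a diagonal metric."""
    n = len(z)
    g = metric(z)
    dg = [d_metric(z, k) for k in range(n)]  # dg[k][i][j] = g_{ij,k}
    G = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # g^{kl} is nonzero only for l = k, where it equals 1 / g_kk
                G[k][i][j] = 0.5 / g[k][k] * (
                    dg[j][i][k] + dg[i][j][k] - dg[k][i][j])
    return G

def christoffel_formula(z):
    """Closed form -d_ik psi_j - d_jk psi_i + d_ij psi_k."""
    n = len(z)
    s = 1.0 + sum(c * c for c in z)
    dpsi = [2.0 * c / s for c in z]
    delta = lambda a, b: 1.0 if a == b else 0.0
    return [[[-delta(i, k) * dpsi[j] - delta(j, k) * dpsi[i]
              + delta(i, j) * dpsi[k] for j in range(n)]
             for i in range(n)] for k in range(n)]

z = [0.4, -0.2]
G_num = christoffel_numeric(z)
G_formula = christoffel_formula(z)
max_err = max(abs(G_num[k][i][j] - G_formula[k][i][j])
              for k in range(2) for i in range(2) for j in range(2))
print(max_err)
```

A small value of `max_err` indicates that the closed form and the definition agree at the chosen point.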
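Therefore, the equations for geodesics become
$$0 = \ddot\gamma^i + 2\sum_{j=1}^n\Gamma^i_{ij}(\gamma)\,\dot\gamma^i\dot\gamma^j - \Gamma^i_{ii}(\gamma)\,\dot\gamma^i\dot\gamma^i + \sum_{j \ne i}\Gamma^i_{jj}(\gamma)\,\dot\gamma^j\dot\gamma^j$$
(using the symmetry $\Gamma^i_{jk} = \Gamma^i_{kj}$; here the index $i$ is not summed)
$$= \ddot\gamma^i - \frac{4\sum_{j=1}^n\gamma^j\dot\gamma^j}{1 + |\gamma|^2}\,\dot\gamma^i + \frac{2\gamma^i}{1 + |\gamma|^2}\,|\dot\gamma|^2. \tag{2.1.27}$$
We now claim that the geodesic $\gamma(t)$ through the origin, i.e. $\gamma(0) = 0$, with $\dot\gamma(0) = a \in \mathbb{R}^n$ is given by
$$\gamma(t) = a\,\alpha(t), \tag{2.1.28}$$
where $\alpha : \mathbb{R} \to \mathbb{R}$ then satisfies $\alpha(0) = 0$, $\dot\alpha(0) = 1$. Making the ansatz (2.1.28) in (2.1.27) leads to
$$0 = a^i\ddot\alpha - \frac{4|a|^2\alpha\dot\alpha}{1 + \alpha^2|a|^2}\,a^i\dot\alpha + \frac{2a^i\alpha}{1 + \alpha^2|a|^2}\,|a|^2\dot\alpha^2, \quad i = 1,\dots,n.$$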
Since we may assume $a \ne 0$ (otherwise the solution with $\dot\gamma(0) = a$ is a point curve, hence uninteresting), this equation holds if $\alpha(t)$ satisfies the ordinary differential equation (ODE)
$$0 = \ddot\alpha - \frac{2|a|^2\alpha}{1 + |a|^2\alpha^2}\,\dot\alpha^2. \tag{2.1.29}$$
The theorem of Picard-Lindelöf implies that (2.1.29) has a unique solution in a neighbourhood of $t = 0$. We then have found a solution $\gamma(t)$ of (2.1.27) of the desired form (2.1.28). The image of $\gamma(t)$ is a straight line through 0. By Lemma 2.1.3, we have thus found all solutions through 0. The images of the straight lines under the chart $f_1$ are the great circles on $S^n$ through the south pole. We can now use a symmetry argument to conclude that all the geodesic lines on $S^n$ are given by the great circles on $S^n$. Namely, the south pole does not play any distinguished role, and we could have constructed a local chart by stereographic projection from any other point on $S^n$ as well, and the metric tensor would have assumed the same form (2.1.22).
More generally, one may also argue as follows: We want to find the geodesic arc $\gamma(t)$ on $S^n$ with $\gamma(0) = p_0$, $\dot\gamma(0) = V_0$ for some $p_0 \in S^n$, $V_0 \in T_{p_0}S^n$. Let $c_0(t)$ be the great circle on $S^n$ parameterized such that $c_0(0) = p_0$, $\dot c_0(0) = V_0$. $c_0$ is contained in a unique two-dimensional plane through the origin in $\mathbb{R}^{n+1}$. Let $i$ denote the reflection across this plane. This is an isometry of $\mathbb{R}^{n+1}$ mapping $S^n$ onto itself. It therefore maps geodesics on $S^n$ onto geodesics, because we have observed that the length and energy functionals are invariant under isometries, and so isometries have to map critical points to critical points. Now $i$ maps $p_0$ and $V_0$ to themselves. If $\gamma$ were not invariant under $i$, $i \circ \gamma$ would be another geodesic with initial values $p_0$, $V_0$, contradicting the uniqueness result of Lemma 2.1.3. Therefore, $i \circ \gamma = \gamma$, and therefore $\gamma = c_0$.
We draw some conclusions: The geodesic arc through two given points need not be unique. Namely, let $p$, $q$ be antipodal points on $S^n$, e.g. north and south pole. Then there exist infinitely many great circles that pass through both $p$ and $q$. We shall later on see that the first conjugate point of a point $p \in S^n$ along a great circle is the antipodal point $q$ of $p$. One also sees by explicit comparison that a geodesic arc on $S^n$ ceases to be minimizing beyond the first conjugate point, in accordance with Theorem 1.3.4.
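The ODE (2.1.29), $\ddot\alpha = \frac{2|a|^2\alpha}{1+|a|^2\alpha^2}\dot\alpha^2$ with $\alpha(0) = 0$, $\dot\alpha(0) = 1$, can also be integrated numerically. One checks by direct substitution that $\alpha(t) = \tan(|a|t)/|a|$ solves it for $|a|\,|t| < \pi/2$; under stereographic projection this corresponds to moving along a great circle through the south pole at constant speed $2|a|$. This closed form is a remark added here, not part of the text. The Python sketch below compares an RK4 integration with this expression; the function names and the parameters are illustrative assumptions.

```python
import math

def alpha_rhs(alpha, dalpha, a_norm):
    """Right-hand side of (2.1.29) solved for the second derivative of alpha."""
    k2 = a_norm * a_norm
    return 2.0 * k2 * alpha * dalpha * dalpha / (1.0 + k2 * alpha * alpha)

def integrate_alpha(a_norm, t_end, n_steps=2000):
    """RK4 integration of (2.1.29) with alpha(0) = 0, alpha'(0) = 1."""
    h = t_end / n_steps
    x, v = 0.0, 1.0
    for _ in range(n_steps):
        k1x, k1v = v, alpha_rhs(x, v, a_norm)
        k2x = v + 0.5 * h * k1v
        k2v = alpha_rhs(x + 0.5 * h * k1x, v + 0.5 * h * k1v, a_norm)
        k3x = v + 0.5 * h * k2v
        k3v = alpha_rhs(x + 0.5 * h * k2x, v + 0.5 * h * k2v, a_norm)
        k4x = v + h * k3v
        k4v = alpha_rhs(x + h * k3x, v + h * k3v, a_norm)
        x += h / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += h / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return x

a_norm, t_end = 1.5, 0.5
alpha_num = integrate_alpha(a_norm, t_end)
alpha_exact = math.tan(a_norm * t_end) / a_norm
print(alpha_num, alpha_exact)
```

The blow-up of $\tan(|a|t)$ at $|a|t = \pi/2$ reflects that the chart $f_1$ does not cover the north pole, which the great circle reaches at that parameter value.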
2.2 Fields of geodesic curves
Let $M$ be an embedded, differentiable submanifold of $\mathbb{R}^d$, or, more generally, a Riemannian manifold of dimension $n$†, again of class $C^3$. Let $M_0$ be a submanifold of $M$; this means that $M_0$ itself is a differentiable submanifold of $\mathbb{R}^d$, respectively a Riemannian manifold, and that the inclusion $i : M_0 \hookrightarrow M$ is a differentiable embedding. We assume that $M_0$ has dimension $n - 1$, and that it is also of class $C^3$.
† We do not introduce the concept of an abstract Riemannian manifold here, but some readers may know that concept already, and in fact it provides the natural setting for the theory of geodesics. On the other hand, the embedding theorem of J. Nash says that any Riemannian manifold can be isometrically embedded into some Euclidean space $\mathbb{R}^d$, hence considered as a submanifold of $\mathbb{R}^d$. Therefore, from that point of view, no generality is gained by considering Riemannian manifolds instead of submanifolds of $\mathbb{R}^d$.
Theorem 2.2.1. For any $x_0$ in $M_0$, there exist a neighbourhood $V$ of $x_0$ in $M$, and a chart $f : U \to V$ with the following properties:
(i) $U$ contains the origin of $\mathbb{R}^n$, $f(0) = x_0$.
(ii) $M_0 \cap V = f(U \cap \{x^n = 0\})$.
(iii) The curves $x^i \equiv c_i$, $c_i$ = constant, $i = 1,\dots,n-1$, are geodesics parameterized by arc-length. The arcs $t_1 \le x^n \le t_2$ on any such curve between the hypersurfaces $x^n \equiv t_1$ and $x^n \equiv t_2$ are all of the same length $t_2 - t_1$.
(iv) The metric tensor on $U$ satisfies
$$g_{nn} \equiv 1, \quad g_{in} \equiv 0 \quad \text{for all } i = 1,\dots,n-1. \tag{2.2.1}$$
(The second relation means that the curves $x^i \equiv c_i$, $i = 1,\dots,n-1$, intersect the hypersurfaces $x^n \equiv$ constant orthogonally.)
Proof. Since $M_0$ is a hypersurface, for every $p \in M_0$, there exist two unit normal vectors $n^\pm(p)$ to $M_0$ at $p$, i.e. $n^\pm(p) \in T_pM$, $\|n^\pm(p)\| = 1$, $\langle n^\pm(p),v\rangle = 0$ for all $v \in T_pM_0 \subset T_pM$. In a sufficiently small neighbourhood $V$ of $x_0$, we may assume that such a normal vector $n(p)$ may be chosen so that it depends smoothly on $p \in M_0 \cap V =: V_0$. We assume that there is a local chart $\varphi : U_0 \to V_0$ for $M_0$ ($U_0 \subset \mathbb{R}^{n-1}$), possibly choosing $V$ smaller, if necessary. For every $p \in M_0 \cap V$, we then consider the geodesic arc $\gamma_p(t)$ with
$$\gamma_p(0) = p, \quad \dot\gamma_p(0) = n(p). \tag{2.2.2}$$
This geodesic exists for $|t| < \epsilon = \epsilon(p)$ by Lemma 2.1.3. By choosing $V$ smaller if necessary, we may assume that $\epsilon > 0$ is independent of $p$. Instead of $\gamma_p(t)$, we write $\gamma(p,t)$. Since the solution of (2.2.2) depends differentiably on its initial values (see Lemma 2.1.3), hence on $p$, the map
$$f : U_0 \times (-\epsilon,\epsilon) \to M, \quad (x,t) \mapsto \gamma(\varphi(x),t)$$
is likewise differentiable, where $\varphi : U_0 \to V_0$ is a local chart for $M_0$. We may assume $x_0 = \varphi(0)$, by composing $\varphi$ with a diffeomorphism if necessary. At $(0,0) \in U_0 \times (-\epsilon,\epsilon)$, the Jacobian of $f$ is spanned by the linearly independent vectors $\frac{\partial\varphi}{\partial x^1}, \dots, \frac{\partial\varphi}{\partial x^{n-1}}, n(\varphi(x))$ (note that $\gamma(\varphi(x),0) = \varphi(x)$ and that $n(\varphi(x))$ is orthogonal to all the vectors $\frac{\partial\varphi}{\partial x^j} \in T_{\varphi(x)}M_0$, $j = 1,\dots,n-1$). Therefore, by the inverse function theorem, $f$ yields a chart in some neighbourhood $U$ of $(0,0) \in U_0 \times (-\epsilon,\epsilon)$. $f$ obviously satisfies (i), (ii) (after redefining $V$). (iii) also holds by construction (putting $x^n = t$).

Next, $g_{nn} = 1$, since the curves $x^i = c_i$, namely $f(c_1,\dots,c_{n-1},t)$, $t \in (-\epsilon,\epsilon)$, are geodesics parameterized by arc-length, hence $g_{nn} = \langle \frac{\partial f}{\partial t}, \frac{\partial f}{\partial t}\rangle = 1$. Finally, the system of equations for these curves to be geodesic is
$$\frac{d^2x^k}{(dx^n)^2} + \Gamma^k_{ij}\,\frac{dx^i}{dx^n}\frac{dx^j}{dx^n} = 0 \quad (x^n = t) \quad \text{for } k = 1,\dots,n.$$
Hence in particular
$$\Gamma^k_{nn} = 0 \quad \text{for } k = 1,\dots,n.$$
Now
$$\Gamma^k_{nn} = \tfrac12\, g^{kl}\,(2g_{nl,n} - g_{nn,l}) = g^{kl}\,g_{nl,n},$$
since $g_{nn} = 1$. Therefore $g_{nk,n} = 0$ for all $k = 1,\dots,n$. Since furthermore $g_{nk}(x^1,\dots,x^{n-1},0) = 0$ for $k = 1,\dots,n-1$, because the geodesic arc $x^n = t$, $x^i = c_i$ = constant, is orthogonal to the surface $\varphi(x^1,\dots,x^{n-1}) = f(x^1,\dots,x^{n-1},0)$, we obtain $g_{nk} = 0$ for $k = 1,\dots,n-1$. q.e.d.

Definition 2.2.1. The coordinates whose existence is affirmed by Theorem 2.2.1 are called geodesic parallel coordinates based on the hypersurface $M_0$.

Theorem 2.2.2. Let $f : U \to V$ be a chart with the properties described in Theorem 2.2.1. In particular, the curves $x^i = c_i$, $c_i$ = constant, for $i = 1,\dots,n-1$ are geodesic arcs. Then any such curve is the shortest connection of its endpoints when compared with all curves contained entirely in $U$ and having the same endpoints.

Proof. We consider the geodesic
$$\gamma(t) = \{x^i = c_i,\ x^n = t,\ -\epsilon \le t \le \epsilon\},$$
where $U = U_0 \times (-\epsilon,\epsilon)$. Let $\tilde\gamma(t)$, $t_1 \le t \le t_2$, be another curve in $U$ with $\tilde\gamma(t_1) = \gamma(-\epsilon)$, $\tilde\gamma(t_2) = \gamma(\epsilon)$. We have to prove
$$L(\tilde\gamma) \ge L(\gamma), \tag{2.2.3}$$
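with strict inequality, unless $\tilde\gamma$ is a reparameterization of $\gamma$ (here $\gamma$ denotes the geodesic of the chart and $\tilde\gamma$ the comparison curve). Now
$$L(\tilde\gamma) = \int_{t_1}^{t_2} \Big( \sum_{i,j=1}^{n-1} g_{ij}(\tilde\gamma(t))\,\dot{\tilde\gamma}^i(t)\,\dot{\tilde\gamma}^j(t) + \big(\dot{\tilde\gamma}^n(t)\big)^2 \Big)^{1/2} dt, \tag{2.2.4}$$
since $g_{nn} = 1$, $g_{in} = 0$ for $i = 1,\dots,n-1$ by Theorem 2.2.1(iv),
$$\ge \int_{t_1}^{t_2} |\dot{\tilde\gamma}^n(t)|\,dt \ge \tilde\gamma^n(t_2) - \tilde\gamma^n(t_1) = \gamma^n(\epsilon) - \gamma^n(-\epsilon) = L(\gamma).$$
The first inequality is strict, unless $\tilde\gamma^i$ is constant for $i = 1,\dots,n-1$, and the second one is strict, unless $\tilde\gamma^n(t)$ is monotonic. q.e.d.

Following Weierstraß, we say that the geodesics
$$\gamma(t) = \{x^i = c_i,\ x^n = t,\ -\epsilon \le t \le \epsilon\}$$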
constitute a field of geodesics. Theorem 2.2.2 essentially says that any geodesic arc in this field is shorter than any other curve with the same endpoints in the region covered by the field. Both properties are essential. Namely, geodesic arcs on $S^n$ that are longer than a great semicircle show that geodesics not embedded in a field need not minimize the length between their endpoints. And geodesic arcs on a cylinder, contained in meridians, but longer than a semicircle show that there may be shorter curves not contained in the field.

We observe that if $\gamma(t)$ solves (2.1.21), so does $\gamma(\lambda t)$ for $\lambda$ = constant. We fix $z_0 \in U$ and denote the geodesic arc $\gamma$ of Lemma 2.1.2 with
$$\gamma(0) = z_0, \quad \dot\gamma(0) = v$$
by $\gamma_v$. Then by the above observation
$$\gamma_v(t) = \gamma_{\lambda v}\big(\tfrac{t}{\lambda}\big) \quad \text{for } \lambda \ne 0. \tag{2.2.5}$$
Thus $\gamma_{\lambda v}$ is defined on $[-\frac{\epsilon}{\lambda}, \frac{\epsilon}{\lambda}]$, if $\gamma_v$ is defined on $[-\epsilon,\epsilon]$. Since $\gamma_v$ depends differentiably on $v$, and since $\{v \in \mathbb{R}^n : |v| = 1\}$ is compact, there exists $\epsilon_0 > 0$ with the property that for all $v$ with $|v| = 1$, $\gamma_v$ is defined on $[-\epsilon_0,\epsilon_0]$. From (2.2.5), we then conclude that for any $w \in \mathbb{R}^n$ with $|w| \le \epsilon_0$, $\gamma_w$ is defined on $[-1,1]$. For later purposes, we also note that by Lemma 2.1.3, $\epsilon_0$ may be chosen to depend continuously on $z_0$.
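The rescaling identity (2.2.5) can be checked numerically in a case where geodesics are known in closed form. The following is a minimal sketch, assuming the unit sphere $S^2 \subset \mathbb{R}^3$, where the geodesic with $\gamma(0) = z_0$ and $\dot\gamma(0) = v$ ($v$ tangent at $z_0$) is the great circle $\cos(|v|t)\,z_0 + \sin(|v|t)\,v/|v|$. The function name `sphere_geodesic` and all sample values are illustrative assumptions, not part of the text.

```python
import math

def sphere_geodesic(z0, v, t):
    """Great-circle geodesic on the unit sphere with gamma(0) = z0, gamma'(0) = v.

    Assumes |z0| = 1 and <z0, v> = 0 (v is tangent to the sphere at z0).
    """
    speed = math.sqrt(sum(c * c for c in v))
    if speed == 0.0:
        return tuple(z0)  # constant geodesic
    return tuple(math.cos(speed * t) * a + math.sin(speed * t) * b / speed
                 for a, b in zip(z0, v))

z0 = (0.0, 0.0, 1.0)
v = (0.3, -0.4, 0.0)      # tangent vector at z0
lam = 2.5                 # rescaling factor lambda != 0
t = 0.7

lhs = sphere_geodesic(z0, v, t)                                  # gamma_v(t)
rhs = sphere_geodesic(z0, tuple(lam * c for c in v), t / lam)    # gamma_{lambda v}(t / lambda)
max_diff = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_diff)           # should be at the level of rounding error
```

In this setting the map $w \mapsto \gamma_w(1)$ is simply `sphere_geodesic(z0, w, 1.0)`, which returns `z0` for `w = (0, 0, 0)`.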
We now define a map
$$e = e_{z_0} : \{w \in \mathbb{R}^n : |w| \le \epsilon_0\} \to U, \quad w \mapsto \gamma_w(1).$$
Then $e(0) = z_0$. We compute the derivative of $e$ at 0 as
$$De(0)(v) = \frac{d}{dt}\gamma_{tv}(1)\Big|_{t=0} = \frac{d}{dt}\gamma_v(t)\Big|_{t=0} \ \text{ by (2.2.5)} \ = \dot\gamma_v(0) = v.$$
Hence, the derivative of $e$ at $0 \in \mathbb{R}^n$ is the identity, and the inverse mapping theorem implies:

Theorem 2.2.3. $e$ maps a neighbourhood of $0 \in \mathbb{R}^n$ diffeomorphically (i.e. $e$ is bijective, and both $e$ and $e^{-1}$ are differentiable) onto a neighbourhood of $z_0 \in U$. q.e.d.

We want to normalize our chart $f : U \to V$ for $M$. First of all, we may assume
$$z_0 = 0 \tag{2.2.6}$$
for the point $z_0 \in U$ under consideration. Secondly, the transformation formula (2.1.16) implies that we may perform a linear change of coordinates (i.e. replace $f$ by $f \circ A$, where $A \in GL(n,\mathbb{R})$) in order to achieve
$$g_{ij}(0) = \delta_{ij}. \tag{2.2.7}$$
We assume that $f : U \to V$ satisfies these normalizations. We then replace $f$ by $f \circ e$ defined on $\{w \in \mathbb{R}^n : |w| \le \epsilon_0\}$.

Theorem 2.2.4. In this new chart, the metric tensor satisfies
$$g_{ij}(0) = \delta_{ij}, \tag{2.2.8}$$
$$\Gamma^i_{jk}(0) = 0 = g_{ij,k}(0) \quad \text{for all } i,j,k. \tag{2.2.9}$$
Proof. By (2.1.16), $g_{ij}(0) = \delta_{ij}$ holds, since the metric tensor w.r.t. the chart $f$ satisfies this property and $De(0)$ is the identity by the proof of Theorem 2.2.3. In order to verify (2.2.9), we observe that in our new chart, the straight lines $tv$ ($v \in \mathbb{R}^n$, $t|v| \le \epsilon_0$) are geodesics. Namely, $tv$ is mapped to $\gamma_{tv}(1) = \gamma_v(t)$ (see (2.2.5)), where $\gamma_v(t)$ is the geodesic with initial direction $v$. We thus insert $\gamma(t) = tv$ into the geodesic equation (2.1.21). Then $\ddot\gamma = 0$, hence
$$\Gamma^i_{jk}(tv)\,v^jv^k = 0 \quad \text{for } i = 1,\dots,n.$$
In particular, inserting $t = 0$, we get
$$\Gamma^i_{jk}(0)\,v^jv^k = 0 \quad \text{for all } v \in \mathbb{R}^n,\ i = 1,\dots,n.$$
We use $v = e_l$, where $(e_l)_{l=1,\dots,n}$ is an orthonormal basis of $\mathbb{R}^n$. Then
$$\Gamma^i_{ll}(0) = 0 \quad \text{for all } i \text{ and } l.$$
We next insert $v = \tfrac12(e_l + e_m)$, $l \ne m$. The symmetry $\Gamma^i_{jk} = \Gamma^i_{kj}$ (which directly follows from the definition of $\Gamma^i_{jk}$ and the symmetry $g_{jk} = g_{kj}$) then yields
$$\Gamma^i_{lm}(0) = 0 \quad \text{for all } i,l,m.$$
The vanishing of $g_{ij,k}$ for all $i,j,k$ then is an easy exercise in linear algebra. q.e.d.

Definition 2.2.2. The local coordinates $x^1,\dots,x^n$ constructed before Theorem 2.2.4 are called Riemannian normal coordinates.

We let $x^1,\dots,x^n$ be Riemannian normal coordinates. We transform them into polar coordinates $r, \varphi^1,\dots,\varphi^{n-1}$ in the standard manner (e.g. if $n = 2$, $x^1 = r\cos\varphi^1$, $x^2 = r\sin\varphi^1$). This coordinate transformation is of course singular at 0. We now express the metric tensor w.r.t. these polar coordinates. We write $g_{rr}$ instead of $g_{11}$, and we write $g_{r\varphi}$ instead of $g_{1l}$, $l = 2,\dots,n$, and $g_{\varphi\varphi}$ instead of $(g_{kl})_{k,l=2,\dots,n}$. In particular, by Theorem 2.2.4 and the transformation rule (2.1.16)
$$g_{rr}(0) = 1, \quad g_{r\varphi}(0) = 0. \tag{2.2.10}$$
The lines through the origin are geodesics by the construction of Riemannian normal coordinates, and in polar coordinates, they now become the curves $\varphi \equiv \varphi_0$ with fixed $\varphi_0$. Therefore, the geodesic equation (2.1.21) gives
$$\Gamma^i_{rr} = 0 \quad \text{for all } i$$
(where of course $\Gamma^i_{rr}$ stands for $\Gamma^i_{11}$), i.e.
$$\tfrac12\,g^{il}\,(2g_{rl,r} - g_{rr,l}) = 0 \quad \text{for all } i,$$
hence
$$2g_{rl,r} - g_{rr,l} = 0 \quad \text{for all } l. \tag{2.2.11}$$
Putting $l = r$ gives $g_{rr,r} = 0$, and with (2.2.10) then
$$g_{rr} \equiv 1. \tag{2.2.12}$$
Using this in (2.2.11) gives $g_{r\varphi,r} = 0$, hence with (2.2.10) again
$$g_{r\varphi} \equiv 0. \tag{2.2.13}$$
We have thus shown:

Theorem 2.2.5. In the preceding coordinates, so called Riemannian polar coordinates, that are obtained by transforming Riemannian normal coordinates into polar coordinates, the metric tensor has the form
$$\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & g_{\varphi\varphi}(r,\varphi) & \\ 0 & & & \end{pmatrix},$$
where $g_{\varphi\varphi}$ stands for the $(n-1) \times (n-1)$-matrix of the components of the metric tensor w.r.t. the angular variables $\varphi^1,\dots,\varphi^{n-1}$.

Note that this generalizes the situation for Euclidean polar coordinates. The Euclidean metric on $\mathbb{R}^2$, written in polar coordinates, e.g. takes the form
$$\begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix}.$$
Note that Theorem 2.2.5, in contrast to Theorem 2.2.4, is valid on the whole chart, not only at the origin.

Corollary 2.2.1. Riemannian polar coordinates are geodesic parallel coordinates based on the hypersurfaces $r$ = constant ($r \ne 0$, since $r = 0$ corresponds to a single point, and not a hypersurface).

Proof. By Theorem 2.2.5, all properties stated in Theorem 2.2.1 hold. q.e.d.
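The block form of the metric in Theorem 2.2.5 can be observed numerically in two familiar cases. The sketch below approximates the pulled-back metric $g_{ij} = \langle \partial_i f, \partial_j f\rangle$ by central differences for polar coordinates in the Euclidean plane and, as an assumed second example, for the map $(r,\varphi) \mapsto (\sin r\cos\varphi, \sin r\sin\varphi, \cos r)$, which gives polar coordinates on the unit sphere $S^2$ centered at the north pole. The helper names and sample point are hypothetical.

```python
import math

def pullback_metric(f, r, phi, h=1e-6):
    """Approximate g_ij = <d_i f, d_j f> at (r, phi) by central differences."""
    def partial(i):
        plus = f(r + h, phi) if i == 0 else f(r, phi + h)
        minus = f(r - h, phi) if i == 0 else f(r, phi - h)
        return [(a - b) / (2 * h) for a, b in zip(plus, minus)]
    d = [partial(0), partial(1)]
    return [[sum(a * b for a, b in zip(d[i], d[j])) for j in range(2)]
            for i in range(2)]

def plane(r, phi):
    # Euclidean polar coordinates in R^2
    return (r * math.cos(phi), r * math.sin(phi))

def sphere(r, phi):
    # polar coordinates on the unit sphere centered at (0, 0, 1);
    # r is the distance from the pole measured along great circles
    return (math.sin(r) * math.cos(phi), math.sin(r) * math.sin(phi), math.cos(r))

r, phi = 0.8, 1.1
g_plane = pullback_metric(plane, r, phi)    # expected close to diag(1, r^2)
g_sphere = pullback_metric(sphere, r, phi)  # expected close to diag(1, sin(r)^2)
print(g_plane)
print(g_sphere)
```

In both cases the entries $g_{rr}$ and $g_{r\varphi}$ come out as 1 and 0 up to discretization error, and only the angular entry depends on the geometry.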
By Corollary 2.2.1 and Theorem 2.2.2, the curves $\varphi$ = constant, $r_1 \le r \le r_2$, are shortest connections between their end points among all curves lying in the chart. We are now going to observe that this holds even globally, i.e. also in comparison with curves that may leave the chart:

Theorem 2.2.6. For each $p \in M$, there exists $\epsilon_0 > 0$ with the property that Riemannian polar coordinates centered at $p$ may be introduced with domain $\{(r,\varphi) : 0$
> 6.
(2.2.14)
Since the curve $(t,\varphi_0)$, $0 \le t \le \epsilon$, has length $\epsilon$ as easily follows from Theorem 2.2.5, this will imply the claim. In order to verify (2.2.14), we proceed as follows:
$$L\big(c|_{[0,t_0]}\big) = \int_0^{t_0} \big(g_{ij}(c(t))\,\dot c^i(t)\,\dot c^j(t)\big)^{1/2} dt$$
(identifying $c|_{[0,t_0]}$ with its coordinate representation)
$$\ge \int_0^{t_0} \big(g_{rr}\,\dot r^2\big)^{1/2} dt$$
by Theorem 2.2.5 and since $g_{\varphi\varphi}$ is positive definite (writing $c(t) = (r(t),\varphi(t))$)
$$= \int_0^{t_0} |\dot r|\,dt,$$
again by Theorem 2.2.5
$$\ge \int_0^{t_0} \dot r\,dt = r(t_0) = \epsilon.$$
Here, equality only holds if $g_{\varphi\varphi}$
2.3 The existence of geodesics

Definition 2.3.1. Let $M$ be a connected differentiable submanifold of Euclidean space $\mathbb{R}^d$, or, more generally†, a connected Riemannian manifold. The distance between $p, q \in M$ is
$$d(p,q) := \inf\{L(c) \mid c : [a,b] \to M \text{ rectifiable curve with } c(a) = p,\ c(b) = q\}.$$

Theorem 2.3.1. Let $M$ (as in Definition 2.3.1) be compact. There exists $\epsilon_0 > 0$ with the property that any two points $p, q \in M$ with $d(p,q) \le \epsilon_0$ can be connected by a unique shortest geodesic arc (i.e. of length $d(p,q)$). This geodesic arc depends continuously on $p$ and $q$.

Proof. We take $\epsilon_0$ as described in Corollary 2.2.2. This gives a unique shortest geodesic arc from $p$ to $q$ which furthermore depends continuously on $q$. Exchanging the roles of $p$ and $q$ then yields continuous dependence on $p$, too. q.e.d.

† See footnote on p. 43.
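Definition 2.3.1 can be made concrete on the unit sphere, where (as a standard fact assumed here, not derived in the text) the distance between unit vectors $p$ and $q$ is the angle $\arccos\langle p, q\rangle$. The sketch compares this value with the length of a piecewise geodesic detour; the point names and the choice of detour are illustrative only.

```python
import math

def unit(v):
    """Normalize a vector of R^3 to length 1."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def angle(p, q):
    """Great-circle distance between two unit vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding

def polygon_length(points):
    """Length of the piecewise geodesic curve through the given points."""
    return sum(angle(a, b) for a, b in zip(points, points[1:]))

p = unit((1.0, 0.0, 0.0))
q = unit((0.0, 1.0, 0.0))
detour = [p, unit((1.0, 0.2, 0.8)), unit((0.5, 0.5, -0.3)), q]

d_pq = angle(p, q)                    # quarter of a great circle
len_detour = polygon_length(detour)   # length of a competing curve
print(d_pq, len_detour)
```

Every competing curve is at least as long as `d_pq`, in line with the definition of $d(p,q)$ as an infimum of lengths.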
We now proceed to establish a global result:

Theorem 2.3.2. Let $M$ be a compact connected differentiable submanifold of $\mathbb{R}^d$, or, more generally, a compact connected Riemannian manifold. Then any two points $p, q \in M$ can be connected by a shortest geodesic arc (i.e. of length $d(p,q)$).

Proof. Let $(c_n)_{n\in\mathbb{N}}$ be a minimizing sequence. We may assume w.l.o.g. that all $c_n$ are parameterized on the interval $[0,1]$ and proportionally to arc-length. Thus
$$c_n(0) = p, \quad c_n(1) = q, \quad L(c_n) \to d(p,q) \ \text{ for } n \to \infty.$$
For each $n$, we may find $t_{0,n} = 0 < t_{1,n} < \dots < t_{m,n} = 1$ with
$$L\big(c_n|_{[t_{j-1,n},\,t_{j,n}]}\big) \le \epsilon_0,$$
with $\epsilon_0$ given by Theorem 2.3.1. By Theorem 2.3.1, there exists a unique shortest geodesic arc between $c_n(t_{j-1,n}) =: p_{j-1,n}$ and $c_n(t_{j,n}) =: p_{j,n}$. We replace $c_n|_{[t_{j-1,n},\,t_{j,n}]}$ by this shortest geodesic arc and obtain a new minimizing sequence, again denoted by $c_n$, that now is piecewise geodesic. Since the lengths of the $c_n$ are bounded because of the minimizing property, we may actually assume that $m$ is independent of $n$. Since $M$ is compact, after selecting a subsequence of $c_n$, the points $p_{j,n}$ converge to limit points $p_j$ ($j = 0,\dots,m$) as $n \to \infty$. $c_n|_{[t_{j-1,n},\,t_{j,n}]}$, the unique shortest geodesic arc between $p_{j-1,n}$ and $p_{j,n}$, then converges to the unique shortest geodesic arc between $p_{j-1}$ and $p_j$ (for this point, one verifies that limits of geodesic arcs are again geodesic arcs, that limits of shortest arcs are again shortest arcs, that $d(p_{j-1},p_j) \le \epsilon_0$, and one uses Theorem 2.3.1). We thus obtain a piecewise geodesic limit curve $c$, with $c(0) = p$, $c(1) = q$, and
$$L(c) = \lim_{n\to\infty} L(c_n),$$
since we have for the geodesic pieces
$$L\big(c|_{[t_{j-1},\,t_j]}\big) = \lim_{n\to\infty} L\big(c_n|_{[t_{j-1,n},\,t_{j,n}]}\big)$$
for all $j$ ($t_j = \lim_{n\to\infty} t_{j,n}$). Since the $c_n$ constitute a minimizing sequence,
$$L(c) = d(p,q),$$
and $c$ thus is of shortest possible length. This implies that $c$ is geodesic. Namely, otherwise we could find $0 \le s_1 < s_2 \le 1$ with $L(c|_{[s_1,s_2]}) \le \epsilon_0$, but with $c|_{[s_1,s_2]}$ not being geodesic. Replacing $c|_{[s_1,s_2]}$ by the shortest geodesic arc between $c(s_1)$ and $c(s_2)$ would yield a shorter curve (cf. Theorem 2.2.6), contradicting the minimizing property of $c$. q.e.d.

Thus, any two points on a compact $M$ may be connected by a shortest geodesic. We now pose the question whether they can be connected by more than one geodesic, not necessarily the shortest. On $S^n$, for example, this is clearly the case. Actually, the answer is that it is the case on any compact $M$. That result needs a topological result that is not available to us here, however. Therefore, we will restrict ourselves to a special case which, however, already displays the crucial geometric idea of the construction for the general case, too.

Theorem 2.3.3. Let $M$ be a differentiable submanifold of Euclidean space $\mathbb{R}^d$ (or, more generally†, a Riemannian manifold), diffeomorphic to the sphere $S^2$. The latter condition means that there exists a bijective map
$$h : S^2 \to M$$
that is differentiable in both directions. Then any two points $p, q \in M$ can be connected by at least two geodesics.

Proof. $M$ is compact and connected since it is diffeomorphic to $S^2$, which is compact and connected. Let us assume $p \ne q$. We leave it to the reader to modify our constructions in order that they also apply to the case $p = q$. (In that case, Theorem 2.3.3 asserts the existence of a nonconstant geodesic $c : [0,1] \to M$ with $c(0) = p = c(1)$.) One may then construct a diffeomorphism $h_0 : S^2 \to M$ with the following properties: Let $S^2 = \{(x^1,x^2,x^3) \in \mathbb{R}^3 : |x| = 1\}$. Then
$$p = h_0(0,0,1), \quad q = h_0(0,0,-1)$$
and a shortest geodesic arc $c : [0,1] \to M$ with $c(0) = p$, $c(1) = q$ is given by $c(t) = h_0(0, \sin\pi t, \cos\pi t)$.

† See footnote on p. 43.
Let us point out that these normalizations are not at all essential, but only convenient for our constructions. We look at the family of curves
$$\gamma(t,s) = h_0(\sin 2\pi s\,\sin\pi t,\ \cos 2\pi s\,\sin\pi t,\ \cos\pi t), \quad 0 \le s,t \le 1. \tag{2.3.1}$$
Then
$$\gamma(t,0) = \gamma(t,1) = c(t) \quad \text{for all } t$$
and
$$\gamma(0,s) = c(0), \quad \gamma(1,s) = c(1) \quad \text{for all } s.$$
We find some number $\kappa$ with
$$L(\gamma(\cdot,s)) \le \kappa \quad \text{for all } s. \tag{2.3.2}$$
Redefining the parameter t, we may also assume that all curves 7(-,s) are parameterized proportionally to arc-length. By Theorem 2.3.1, there exists 60 > 0 such that the shortest geodesic between any p, q £ M, with d(p, q) < e0 *s u r n c l u e - Let 0 = £0 < h < . . . < tm = 1 be a partition of [0,1] with 'i-«;-!<]£
forj = l , . . . , m .
(2.3.3)
Let another partition (TI, . . . , r m ) satisfy To = ^O < T\ < h < T2 < . . . < Tm < tm = T m + i
and T
J~TJ~i<^
forj
= l , . . . , r o + l.
(2.3.4)
If 7 : [0,1] —• M is any curve parameterized proportionally to arc-length with L(l) < K, we then have for j = 1 , . . . , m d(1(tj^),1(tj))
2.3 The existence of geodesies Lemma 2.3.1. Suppose e0 for all j .
< e0 and d(7( r j)>7( r j-i)) <
rf(7(^),7(^-1))
r(7) : = r 2 o n ( 7 )
L(r(7))
(2-3.5)
wn£/i equality iff 7 is geodesic. Proof. By uniqueness of the shortest geodesic between 7(fy_i) and 7(£j), we have L(n(7))
Likewise, for every curve 7', L (7,'
) < €Q for all j ,
MMV))<W) with equality only in case
r2(y) =7 ; Therefore L(r(7))
Proof. Each curve $r^n(\gamma)$, $n \in \mathbb{N}$, is a piecewise geodesic with corners $r^n\gamma(\tau_1),\dots,r^n\gamma(\tau_m)$ and endpoints $r^n\gamma(\tau_0) = \gamma(0)$, $r^n\gamma(\tau_{m+1}) = \gamma(1)$. The individual segments are the unique shortest connections between these points. Therefore, each such curve is uniquely determined by the $m$-tuple
$$A^n := (r^n\gamma(\tau_1),\dots,r^n\gamma(\tau_m)) \in \underbrace{M \times \dots \times M}_{m \text{ times}}.$$
Since $M$ is compact, a subsequence of $A^n$ converges to some limit $(p_1,\dots,p_m) \in M \times \dots \times M$. $r^n(\gamma)$ then converges uniformly towards the piecewise geodesic $\gamma_0$ with endpoints $\gamma_0(0) = \gamma(0)$, $\gamma_0(1) = \gamma(1)$ and nodes $\gamma_0(\tau_i) = p_i$ ($i = 1,\dots,m$) with segments $\gamma_0|_{[\tau_{j-1},\tau_j]}$ being the shortest geodesic arcs between their endpoints. This follows from the continuous dependence of the occurring geodesic arcs on their endpoints (Theorem 2.3.1). We denote the convergent subsequence of $(r^n(\gamma))_{n\in\mathbb{N}}$ by $(\gamma_\nu)_{\nu\in\mathbb{N}}$. For all $\nu \in \mathbb{N}$ then
$$\gamma_{\nu+1} = r^{n(\nu)}\gamma_\nu \quad \text{with } n(\nu) \in \mathbb{N}.$$
By the minimizing property of the subsegments of the $\gamma_\nu$,
$$L\big(\gamma_\nu|_{[\tau_{j-1},\tau_j]}\big) = d(\gamma_\nu(\tau_{j-1}), \gamma_\nu(\tau_j)),$$
hence
$$L(\gamma_\nu) = \sum_{j=1}^{m+1} d(\gamma_\nu(\tau_{j-1}), \gamma_\nu(\tau_j)).$$
Since $\gamma_\nu(\tau_j)$ converges to $p_j = \gamma_0(\tau_j)$, $L(\gamma_\nu)$ converges to
$$L(\gamma_0) = \sum_{j=1}^{m+1} d(\gamma_0(\tau_{j-1}), \gamma_0(\tau_j))$$
for $\nu \to \infty$. Then also
$$L(\gamma_0) = \lim_{\nu\to\infty} L(\gamma_{\nu+1}) = \lim_{\nu\to\infty} L(r^{n(\nu)}\gamma_\nu) \le \lim_{\nu\to\infty} L(\gamma_\nu) \ \text{ by Lemma 2.3.1} \ = L(\gamma_0),$$
and equality has to hold throughout. Moreover, $r(\gamma_\nu)$ converges to $r(\gamma_0)$, and
$$L(r(\gamma_0)) = \lim_{\nu\to\infty} L(r(\gamma_\nu)) \ge \lim_{\nu\to\infty} L(r^{n(\nu)}\gamma_\nu) \ \text{ by Lemma 2.3.1 again} \ = L(\gamma_0).$$
Lemma 2.3.1 then implies that $\gamma_0$ is geodesic. q.e.d.

We now return to the proof of Theorem 2.3.3: We apply the preceding curve shortening process to all curves $\gamma(\cdot,s)$, $s \in [0,1]$, simultaneously. For each $s$, a subsequence of $r^n\gamma(\cdot,s)$ then converges to a geodesic from $p$ to $q$. We want to exclude the situation that all those limit geodesics coincide with $c$. Let $\kappa_0 := L(c)$, and
$$\kappa_1 := \sup_{0 \le s \le 1}\ \lim_{n\to\infty} L(r^n\gamma(\cdot,s)).$$
Then $\kappa_1 \ge \kappa_0$. We distinguish two cases:

(1) $\kappa_1 > \kappa_0$

Since $\gamma(\cdot,s)$ is continuous in $s$, so is $r^n\gamma(\cdot,s)$ for every $n \in \mathbb{N}$. We now claim: Whenever
$$\sup_s L(r^n\gamma(\cdot,s)) \le \kappa_1 + \epsilon \tag{2.3.6}$$
there exists $s_n \in [0,1]$ with
$$L(r^n\gamma(\cdot,s_n)) - L(r^{n+1}\gamma(\cdot,s_n)) \le 2\epsilon \tag{2.3.7}$$
and
$$L(r^n\gamma(\cdot,s_n)) \ge \kappa_1 - \epsilon. \tag{2.3.8}$$
Indeed, otherwise
$$\sup_s L(r^{n+1}\gamma(\cdot,s)) < \kappa_1 - \epsilon,$$
contradicting the definition of $\kappa_1$ (note that $\sup_s L(r^n\gamma(\cdot,s))$ is monotonically decreasing in $n$ by Lemma 2.3.1). By definition of $\kappa_1$, there exists a sequence $(\epsilon_n)_{n\in\mathbb{N}} \to 0$ with
$$\sup_s L(r^n\gamma(\cdot,s)) \le \kappa_1 + \epsilon_n.$$
A subsequence of $(r^n\gamma(\cdot,s_n))_{n\in\mathbb{N}}$ has to converge to some limit curve $\tilde c$ as above, and because of (2.3.7) with $\epsilon = \epsilon_n$, we conclude as in the proof of Lemma 2.3.2 that $L(r(\tilde c)) = L(\tilde c)$, and $\tilde c$ is hence geodesic by Lemma 2.3.1. Because of (2.3.8) and continuity of $L$ in the limit as in the proof of Lemma 2.3.2, we get $L(\tilde c) = \kappa_1$. Since $c$ and $\tilde c$ are both defined on $[0,1]$ and have different lengths, they have to be different curves. Thus, $\tilde c$ is the desired second geodesic.

(2) $\kappa_1 = \kappa_0$
We are going to show that in this case, there even exist infinitely many geodesics from $p$ to $q$. For that purpose, we consider the curve
$$\bar\gamma(s) := \gamma(\tfrac12, s).$$
This is a closed curve with $\bar\gamma(0) = \bar\gamma(1) = c(\tfrac12)$ (see Figure 2.2). Since $h_0$ is a diffeomorphism and $r^n\gamma(t,s)$ is obtained through a process that can easily be made continuous from $\gamma(t,s) = h_0(\sin 2\pi s\,\sin\pi t, \cos 2\pi s\,\sin\pi t, \cos\pi t)$, $r^n\gamma(t,s)$ has to map $[0,1] \times [0,1]$ surjectively onto $M$. Therefore, for every $n \in \mathbb{N}$ and every $s \in [0,1]$, there exists $\sigma_n(s)$ with
$$\bar\gamma(s) \in r^n\gamma(\cdot,\sigma_n(s)) =: \gamma_{n,s}(\cdot)$$
(in other words, $r^n\gamma(\cdot,\sigma_n(s))$ is a curve passing through $\bar\gamma(s)$). $\gamma_{n,s}(\cdot)$ then is a curve with $\gamma_{n,s}(0) = c(0) = p$, $\gamma_{n,s}(1) = c(1) = q$, and because of $\kappa_1 = \kappa_0$, we obtain
$$\lim_{n\to\infty} L(\gamma_{n,s}(\cdot)) \le \sup_{\sigma\in[0,1]}\ \lim_{n\to\infty} L(r^n\gamma(\cdot,\sigma)) = \kappa_0. \tag{2.3.9}$$

Figure 2.2.

After selection of a subsequence, $(\gamma_{n,s}(\cdot))_{n\in\mathbb{N}}$ again converges to some limit curve $c_s(\cdot)$ with
$$c_s(0) = p, \quad c_s(1) = q \quad \text{and} \quad \bar\gamma(s) \in c_s(\cdot).$$
By (2.3.5), $L(c_s(\cdot)) \le \kappa_0$, and since $\kappa_0$ is the infimum of the lengths of all curves from $p$ to $q$ ($\kappa_0 = L(c)$, and $c$ is minimizing), $c_s(\cdot)$ is a minimizing curve itself, hence geodesic. Therefore, we have shown that for every $s$, there exists a geodesic from $p$ to $q$ that passes through $\bar\gamma(s)$. Hence there exist infinitely many geodesics from $p$ to $q$, as claimed. q.e.d.

Remarks:
(1) Lemmas 2.3.1 and 2.3.2 do not need that $M$ is diffeomorphic to $S^2$. Compactness suffices.
(2) We may construct the curves $\gamma_{n,s}(\cdot)$ at the end of the proof also in case $\kappa_1 > \kappa_0$. In that case, however, limits of such curves need not be geodesic anymore.
(3) See Section 3.1 for an abstract version of the argument at the end of the preceding proof.
Exercises 2.1
2.2
1
For curves
6 R 2 | x 2 > 0}, con-
Compute the Euler-Lagrange equations and determine all solutions. For curves d
7(t) = ( 7 1 , . . t 7 d ) : ^ { ( x 1 , . , x d ) 6 l d | 5 ] ( x l ) 2 < l } ) t=i
consider
Compute the Euler-Lagrange equations and determine all solutions.
2.3 Determine all geodesics between two given points on a cylinder $\{(x,y,z) \in \mathbb{R}^3 : x^2 + y^2 = 1\}$.
2.4 Let $\Sigma$ be a surface of revolution in $\mathbb{R}^3$, i.e. $\Sigma = \{(x,y,z) \in \mathbb{R}^3 : x^2 + y^2 = f(z)\}$ for a smooth, positive $f : \mathbb{R} \to \mathbb{R}$. What can you say about geodesics on $\Sigma$? For example, are the curves $(x,y)$ = constant geodesics? When are the curves $z$ = constant geodesics?
2.5 Determine Riemannian polar coordinates on the sphere $S^n$ with a domain of definition that is as large as possible.
2.6 Let $p$ be the center of Riemannian polar coordinates on $M$, with domain of definition $\{v \in \mathbb{R}^d : \|v\| < \rho\}$. Let $c : [0,\epsilon] \to M$ be a geodesic with $c(0) = p$ that is parameterized by arc-length, $0 < \epsilon < \rho$. Show that $c([0,\epsilon])$ does not contain a point that is conjugate to $p$.
2.7 Let $M$ be a differentiable submanifold of $\mathbb{R}^d$ that is diffeomorphic to $S^2$. Show that for any $p \in M$, there exists a nonconstant geodesic $c : [0,1] \to M$ with $c(0) = c(1) = p$.
2.8 Try to find other topological classes of manifolds with the property that there always exists more than one geodesic connection between any two points.
3 Saddle point constructions

3.1 A finite dimensional example

Let $F : \mathbb{R}^d \to \mathbb{R}$ be a function of class $C^1$ which is bounded from below and which is 'proper' in the following sense:
$$F(x) \to \infty \quad \text{for } |x| \to \infty. \tag{3.1.1}$$
Since $F$ is bounded from below, (3.1.1) is equivalent to: For every $s \in \mathbb{R}$,
$$\{x \in \mathbb{R}^d : F(x) \le s\} \ \text{ is compact.} \tag{3.1.2}$$
Therefore, $F$ assumes its infimum. Namely, we take any
$$s_0 > \inf_{x\in\mathbb{R}^d} F(x).$$
Then $\{x \in \mathbb{R}^d : F(x) \le s_0\}$ is compact and nonempty, and since $F$ is continuous, it has to assume its infimum on that set. We now assume that $F$ even has two relative minima, $x_1, x_2$ in $\mathbb{R}^d$, and that they are strict in the following sense: For $x = x_1, x_2$, we have
$$\exists \delta_0 > 0\ \forall y \text{ with } 0 < |y - x| \le \delta_0 : \ F(y) > F(x). \tag{3.1.3}$$

Theorem 3.1.1. Under the above assumptions, $F$ has a third critical point $x_3$ (i.e. $\nabla F(x_3) = 0$) with
$$F(x_3) > \max(F(x_1),F(x_2)) =: \kappa_0.$$

Proof. We consider curves $\gamma : [0,1] \to \mathbb{R}^d$ with
$$\gamma(0) = x_1, \quad \gamma(1) = x_2. \tag{3.1.4}$$
We first observe that there exists $\alpha > 0$ with the property that for any such curve, there exists $t_0 \in (0,1)$ with
$$F(\gamma(t_0)) \ge \kappa_0 + \alpha. \tag{3.1.5}$$
In order to verify this, we may assume w.l.o.g.
$$F(x_1) \le F(x_2).$$
We then choose $\delta$ with
$$0 < \delta < \min\big(\delta_0, \tfrac12|x_1 - x_2|\big). \tag{3.1.6}$$
For every $y$ with $|y - x_2| = \delta$ then by (3.1.3)
$$F(y) > F(x_2),$$
and since $\{|y - x_2| = \delta\}$ is compact, $F$ assumes its minimum on this set, hence for some $\alpha > 0$
$$\min_{|y - x_2| = \delta} F(y) \ge F(x_2) + \alpha = \kappa_0 + \alpha. \tag{3.1.7}$$
Since for every curve $\gamma$ with (3.1.4) we have
$$|\gamma(1) - x_2| = 0, \quad |\gamma(0) - x_2| = |x_1 - x_2|,$$
there has to exist some $t_0 \in [0,1]$ with
$$|\gamma(t_0) - x_2| = \delta$$
(recall (3.1.6)). By (3.1.7) then $F(\gamma(t_0)) \ge \kappa_0 + \alpha$, and (3.1.5) follows indeed. We now define
$$\kappa_1 := \inf_\gamma \sup_{t\in[0,1]} F(\gamma(t)),$$
where $\gamma$ again is a curve in $\mathbb{R}^d$ with $\gamma(0) = x_1$, $\gamma(1) = x_2$. By (3.1.5)
$$\kappa_1 > \kappa_0. \tag{3.1.8}$$
Our intention now is to find a critical point $x_3$ of $F$ with $F(x_3) = \kappa_1$. Since $F(x_1), F(x_2) \le \kappa_0$,
$x_3$ will then necessarily be different from $x_1$ and $x_2$. As a step towards the existence of such a point $x_3$, we claim:

$\forall \epsilon > 0\ \exists \delta > 0\ \forall$ curves $\gamma$ with $\gamma(0) = x_1$, $\gamma(1) = x_2$ with
$$\sup_{t\in[0,1]} F(\gamma(t)) < \kappa_1 + \delta \tag{3.1.9}$$
$\exists t_0 \in [0,1]$ with:
$$F(\gamma(t_0)) \ge \kappa_1 - \epsilon \tag{3.1.10}$$
$$|(\nabla F)(\gamma(t_0))| < \epsilon. \tag{3.1.11}$$
Suppose this is not the case. Then $\exists \epsilon_0 > 0\ \forall n \in \mathbb{N}\ \exists$ curve $\gamma_n$ between $x_1$ and $x_2$ with
$$\sup_{t\in[0,1]} F(\gamma_n(t)) < \kappa_1 + \tfrac1n \tag{3.1.12}$$
such that for all $t_0 \in [0,1]$ with
$$F(\gamma_n(t_0)) \ge \kappa_1 - \epsilon_0 \tag{3.1.13}$$
we have
$$|(\nabla F)(\gamma_n(t_0))| \ge \epsilon_0. \tag{3.1.14}$$
For $s \ge 0$, we define a new curve $\gamma_{n,s}$ by
$$\gamma_{n,s}(t) := \gamma_n(t) - s\,(\nabla F)(\gamma_n(t)).$$
Since $x_1$ and $x_2$ are minima, $\nabla F(x_1) = 0 = \nabla F(x_2)$, and so $\gamma_{n,s}(0) = x_1$, $\gamma_{n,s}(1) = x_2$, so that the curves $\gamma_{n,s}$ are valid comparison curves. By our properness assumption (3.1.2) and (3.1.12), $\gamma_n(t)$ stays in a bounded subset of $\mathbb{R}^d$, and $\nabla F$ will then be bounded on that bounded set, and hence for any $s_0 > 0$ and all $0 \le s \le s_0$, the curves $\gamma_{n,s}(t)$ stay in some bounded set, too. This set is independent of $n$ (as long as $0 \le s \le s_0$, for fixed $s_0 > 0$). By Taylor's formula
$$F(\gamma_{n,s}(t)) = F(\gamma_n(t)) - s\,\nabla F(\gamma_n(t)) \cdot \nabla F(\gamma_n(t)) + o(s).$$
Since $F$ is continuously differentiable and $\gamma_{n,s}(t)$ is contained in a bounded set, $o(s)$ can be estimated independently of $n$ and $t$ (as long as $0 \le s \le s_0$). In particular, after possibly choosing $s_0 > 0$ smaller,
$$F(\gamma_{n,s}(t)) \le F(\gamma_n(t)) - \tfrac{s}{2}\,|\nabla F(\gamma_n(t))|^2 \tag{3.1.15}$$
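for all $n$, $s$ with $0 \le s \le s_0$, and $t$ with
$$|\nabla F(\gamma_n(t))| \ge \epsilon_0. \tag{3.1.16}$$
Thus, in particular,
$$F(\gamma_{n,s_0}(t)) \le F(\gamma_n(t)) - \tfrac{s_0}{2}\,\epsilon_0^2 \tag{3.1.17}$$
for all such $t$ and all $n$. We now simply choose $n$ so large that
$$\tfrac1n < \tfrac{s_0}{2}\,\epsilon_0^2. \tag{3.1.18}$$
Then by our assumption, all $t_0$ with $F(\gamma_n(t_0)) \ge \kappa_1 - \epsilon_0$ satisfy (3.1.14), and hence for all such $t_0$
$$F(\gamma_{n,s_0}(t_0)) \le F(\gamma_n(t_0)) - \tfrac{s_0}{2}\,\epsilon_0^2 < \kappa_1 + \tfrac1n - \tfrac{s_0}{2}\,\epsilon_0^2 \ \text{ by (3.1.12)} \ < \kappa_1 \ \text{ by (3.1.18).} \tag{3.1.19}$$
Having proved (3.1.19), there are now various ways to construct a path $\gamma$ from $x_1$ to $x_2$ with
$$F(\gamma(t)) < \kappa_1 \quad \text{for all } t \in [0,1]. \tag{3.1.20}$$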
One way is to refine the above construction by letting s depend on t as follows: we choose a smooth function a(t):[0,l]-[0,a0] with a(t) = 0
whenever
F(yn(t))
< K\ — e0
a(t) = so
whenever
F(yn(t))
> K\ — —.
and
We then look at the path j(t) = 7n,
F(7(*)) < F(7B(t)) - ^
~
<M- ? -
^
(cf. (3.1.15), (3.1.16), (3.1.14)), and finally for all t with F(7 n (*)) > * i -
T
F(7(*)) = i ! '(7n,.o(0)<«i
(cf-(3.1.19)).
Thus, (3.1.20) holds indeed. This, however, contradicts the definition of $\kappa_1$. Therefore, the assumption that our claim was not correct led to a contradiction, and the claim holds.

It is now simple to prove the theorem. Namely, we let $\epsilon_n \to 0$ for $n \to \infty$, and for $\epsilon = \epsilon_n$, we find $\delta = \delta_n$ as in the claim. We then choose a curve $\gamma_n$ from $x_1$ to $x_2$ with
$$\sup_{t\in[0,1]} F(\gamma_n(t)) < \kappa_1 + \min(\epsilon_n,\delta_n). \tag{3.1.21}$$
According to the claim, there exists $t_n \in [0,1]$ with
$$F(\gamma_n(t_n)) \ge \kappa_1 - \epsilon_n \tag{3.1.22}$$
$$|(\nabla F)(\gamma_n(t_n))| < \epsilon_n. \tag{3.1.23}$$
After selection of a subsequence, $(\gamma_n(t_n))_{n\in\mathbb{N}}$ then converges to some point $x_3$, because of (3.1.2) and (3.1.21). $x_3$ then satisfies by continuity of $F$ and $\nabla F$
$$F(x_3) = \kappa_1 \tag{3.1.24}$$
$$\nabla F(x_3) = 0. \tag{3.1.25}$$
Thus, $x_3$ is the desired critical point. q.e.d.

Theorem 3.1.1 may be refined as follows:

Theorem 3.1.2. Let $F$ as above again have two relative minima, not necessarily strict anymore. Then either $F$ has a critical point $x_3$ with $F(x_3) > \max(F(x_1),F(x_2)) = \kappa_0$, or it has infinitely many critical points.

Proof. For the argument of the proof of Theorem 3.1.1, we only need
$$\inf_\gamma \sup_{t\in[0,1]} F(\gamma(t)) > \kappa_0, \tag{3.1.26}$$
where the infimum again is taken over curves $\gamma : [0,1] \to \mathbb{R}^d$ with $\gamma(0) = x_1$, $\gamma(1) = x_2$. So, suppose that (3.1.26) does not hold. We then want to show the existence of infinitely many critical points. As in the proof of Theorem 3.1.1, we may assume $F(x_1) \le F(x_2)$. The argument at the beginning of the proof of Theorem 3.1.1 then shows that (3.1.26) holds if $x_2$ is a strict relative minimum. If $x_2$ is a relative minimum which is not strict, for all sufficiently small $\delta > 0$, say $\delta \le \delta_0$, we have
$$F(x_2) \le F(x) \quad \text{for all } x \text{ with } |x - x_2| \le \delta_0 \tag{3.1.27}$$
and there always exists some $x_\delta$ with $0 < |x_\delta - x_2| < \delta$ and
$$F(x_\delta) = F(x_2). \tag{3.1.28}$$
We then put $\delta_1 = \delta_0/2$. Then $x_{\delta_1}$ is a relative minimum of $F$ by (3.1.27), (3.1.28), hence a critical point. Having found a critical point $x_{\delta_n}$ with $0 < |x_{\delta_n} - x_2| < |x_{\delta_{n-1}} - x_2|$, we put
$$\delta_{n+1} = \tfrac12\,|x_{\delta_n} - x_2|$$
and find a critical point $x_{\delta_{n+1}}$ with
$$0 < |x_{\delta_{n+1}} - x_2| < \delta_{n+1}.$$
Thus, $x_{\delta_{n+1}}$ is a critical point of $F$ different from all preceding ones. q.e.d.

Remark. It is not very hard to sharpen the statement of Theorem 3.1.2 from 'infinitely many' to 'uncountably many'.
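The minimax value $\kappa_1$ and the deformation step $\gamma_{n,s} = \gamma_n - s\nabla F(\gamma_n)$ can be illustrated on a concrete function. The sketch below assumes the example $F(x,y) = (x^2-1)^2 + y^2$ on $\mathbb{R}^2$, which is not taken from the text: it has strict minima at $(\pm 1, 0)$ with $\kappa_0 = 0$, every path between them crosses $\{x = 0\}$ where $F \ge 1$, and the straight segment attains $\sup F = 1$ at the saddle $(0,0)$. The names `straight`, `detour` and the step size `s` are illustrative choices.

```python
import math

def F(x, y):
    """Assumed example: two strict minima at (-1, 0), (1, 0), saddle at (0, 0)."""
    return (x * x - 1.0) ** 2 + y * y

def grad_F(x, y):
    return (4.0 * x * (x * x - 1.0), 2.0 * y)

def sup_along(path, samples=2001):
    """Approximate sup of F along a path [0, 1] -> R^2 by sampling."""
    return max(F(*path(k / (samples - 1))) for k in range(samples))

def straight(t):
    # segment from x_1 = (-1, 0) to x_2 = (1, 0), passing through the saddle
    return (-1.0 + 2.0 * t, 0.0)

def detour(t):
    # a competing path that bulges away from the saddle
    return (-1.0 + 2.0 * t, 0.5 * math.sin(math.pi * t))

sup_straight = sup_along(straight)   # equals the minimax value 1 for this F
sup_detour = sup_along(detour)       # strictly larger

# One deformation step at the highest point of the detour, compared with
# the bound F(gamma_s) <= F(gamma) - (s/2) |grad F|^2 from (3.1.15).
x, y = detour(0.5)
gx, gy = grad_F(x, y)
s = 0.1
deformed_value = F(x - s * gx, y - s * gy)
bound = F(x, y) - 0.5 * s * (gx * gx + gy * gy)
print(sup_straight, sup_detour, deformed_value, bound)
```

For this particular $F$ and step size the deformed value lies below the bound, so pushing the path along $-\nabla F$ lowers its maximum as long as the gradient at the maximum is not small, which is the mechanism behind the contradiction argument.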
3.2 The construction of Lyusternik-Schnirelman

In this section, we want to prove the following theorem, in order to exhibit some important global construction in the calculus of variations, introduced by Lyusternik-Schnirelman. The result presented is much more elementary than the theorem of Lyusternik-Schnirelman, which says that on any surface with a Riemannian metric, e.g. a surface embedded in some Euclidean space, diffeomorphic to the two-dimensional sphere, there exist at least three closed geodesics without self-intersections. The more elementary character of our setting allows us to bypass essential geometric difficulties encountered in a detailed proof of the Lyusternik-Schnirelman Theorem.
Figure 3.1.

Theorem 3.2.1. Let $\gamma$ be a closed convex Jordan curve† of class $C^1$ in the plane $\mathbb{R}^2$. ($\gamma$ then divides the plane into a bounded region $A$, and an unbounded one, by the Jordan curve Theorem. That $\gamma$ is convex means that the straight line between any two points of $\gamma$ is contained in the closure $\bar A$ of $A$.) Then there exist at least two such straight lines between points on $\gamma$ meeting $\gamma$ orthogonally at both end points (see Figure 3.1).

† A closed Jordan curve is a curve $\gamma : [0,T] \to \mathbb{R}^d$ with $\gamma(0) = \gamma(T)$ that is injective on $[0,T)$. Cf. the definition of a Jordan curve on p. 35.

Proof. We start by finding one such line. Let $\mathcal{L}$ be the set of all straight lines $l$ in $\bar A$ with $\partial l \subset \gamma$. We say that a sequence $(l_n)_{n\in\mathbb{N}} \subset \mathcal{L}$ converges to $l \in \mathcal{L}$, if the end points of the $l_n$ converge to those of $l$. In order to have a closed space, we allow lines to be trivial, i.e. to consist of a single point on $\gamma$ only. We denote the space of these point curves on $\gamma$ by $\mathcal{L}_0$. We let $I := [0,1]$ be the unit interval. We consider continuous maps $v : I \to \mathcal{L}$ with the following two properties:
(i) $v(0) = v(1)$.
(ii) To any such family, we may assign two subregions $A_1(t)$ and $A_2(t)$ of $A$ in a certain manner. Namely, we let $A_1(t)$ and $A_2(t)$ be the two regions into which $v(t)$ divides $A$. Having chosen $A_1(0)$ and $A_2(0)$, $A_1(t)$ and $A_2(t)$ then are determined by the continuity requirement. We then require $A_1(1) = A_2(0)$.
We let $V_1$ be the class of all such families $v$. The construction is visualized in Figure 3.2. (0 corresponds to $0 \in I$, I to $\frac14$, II to $\frac12$, III to $\frac34$, IV to 1.)

Figure 3.2.

Actually, in order to simplify the visualization, if $v(0)$ is a point curve (on $\gamma$), (i) may be relaxed to just requiring that $v(1)$ also is a point curve (on $\gamma$), not necessarily coinciding with $v(0)$ (see Figure 3.3). Namely, any point curves can be connected through point curves, i.e. with vanishing length. We denote by $L(l)$ the length of $l \in \mathcal{L}$ and define
$$\kappa_1 := \inf_{v\in V_1}\ \sup_{t\in I} L(v(t)).$$

Figure 3.3.
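For a concrete convex curve the two chords of Theorem 3.2.1 can be written down directly. The sketch below assumes an ellipse with semi-axes `A` and `B` (illustrative values, not from the text) and checks that its major and minor axes meet the curve orthogonally at both endpoints, while a generic chord does not. The quantities returned as cosines play the role of $\cos\alpha_1$, $\cos\alpha_2$, the cosines of the angles between a chord and the curve at its endpoints.

```python
import math

A, B = 2.0, 1.0   # semi-axes of the assumed ellipse x^2/A^2 + y^2/B^2 = 1

def point(theta):
    """Point of the ellipse with parameter theta."""
    return (A * math.cos(theta), B * math.sin(theta))

def tangent(theta):
    """Tangent vector of the ellipse at parameter theta."""
    return (-A * math.sin(theta), B * math.cos(theta))

def chord_data(theta1, theta2):
    """Length of the chord and the cosines of its angles with the curve at both ends."""
    p, q = point(theta1), point(theta2)
    c = (q[0] - p[0], q[1] - p[1])
    length = math.hypot(c[0], c[1])
    cosines = []
    for th in (theta1, theta2):
        tx, ty = tangent(th)
        cosines.append((c[0] * tx + c[1] * ty) / (length * math.hypot(tx, ty)))
    return length, cosines[0], cosines[1]

major = chord_data(0.0, math.pi)                    # chord along the x-axis
minor = chord_data(math.pi / 2, 3 * math.pi / 2)    # chord along the y-axis
generic = chord_data(0.3, 2.0)                      # an arbitrary chord
print(major, minor, generic)
```

Both axes give cosines that vanish up to rounding, so they are chords of the kind the theorem asserts; their lengths are $2A$ and $2B$.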
We want to show that
$$\kappa_1 > 0.$$
For this purpose, let $\rho > 0$ be the inner radius of $\gamma$, i.e. the largest $\rho$ for which there exists a disc
$$B(x_0,\rho) \subset A$$
for some $x_0 \in A$ ($B(x_0,\rho) := \{x \in \mathbb{R}^2 : |x - x_0| \le \rho\}$). Then
$$\kappa_1 \ge \kappa_1' := \inf_{v\in V_1}\ \sup_{t\in I} L(v(t) \cap B(x_0,\rho)).$$
We let $A_i'(t) := A_i(t) \cap B(x_0,\rho)$, $i = 1,2$. Because of (ii) and the continuous dependence of $A_i(t)$ and hence also of $A_i'(t)$ on $t$, there exists some $t_0 \in I$ with
$$\mathrm{Area}(A_1'(t_0)) = \mathrm{Area}(A_2'(t_0)).$$
Thus $v(t_0)$ divides $B(x_0,\rho)$ into two subregions of equal area. $v(t_0) \cap B(x_0,\rho)$ then has to be a diameter of $B(x_0,\rho)$, i.e.
$$L(v(t_0) \cap B(x_0,\rho)) = 2\rho.$$
Therefore
$$\kappa_1 \ge \kappa_1' = 2\rho > 0$$
and $\kappa_1$ is positive indeed. We are now going to show by a line of reasoning already familiar from Sections 2.3 and 3.1 that $\kappa_1$ is realized by a critical point $l$ of $L$ among all lines with end points in $\gamma$, i.e. by $l$ meeting $\gamma$ orthogonally (see Theorem 1.4.1). For that purpose we shall assume for the moment that $\gamma$ is of class $C^3$. Later on, we shall reduce the case where $\gamma$ is only $C^1$ to the present one by an approximation argument. We now claim:

$\forall \epsilon > 0\ \exists \delta > 0 : \forall v \in V_1$ with
$$\sup_{t\in I} L(v(t)) < \kappa_1 + \delta$$
$\exists t_0 \in I$ with
$$L(v(t_0)) > \kappa_1 - \epsilon$$
and
$$|\cos(\alpha_1(v(t_0)))|,\ |\cos(\alpha_2(v(t_0)))| < \epsilon,$$
where $\alpha_1(l)$ and $\alpha_2(l)$ are the angles of $l$ at its endpoints with $\gamma$.
Otherwise $\exists \epsilon_0 > 0 : \forall n \in \mathbb{N}\ \exists v_n \in V_1$ with
$$\sup_{t\in I} L(v_n(t)) < \kappa_1 + \tfrac1n$$
such that for all $t_0$ with $L(v_n(t_0)) > \kappa_1 - \epsilon_0$
$$|\cos\alpha_1(v_n(t_0))| \ge \epsilon_0 \quad \text{or} \quad |\cos\alpha_2(v_n(t_0))| \ge \epsilon_0.$$
The idea to reach a contradiction from that assumption is simple, once the following Lemma is proved:

Lemma 3.2.1. For every planar closed Jordan curve $\gamma$ of class $C^3$, there exists $\beta > 0$ with the following property: Whenever $x \in \mathbb{R}^2$ satisfies
$$\mathrm{dist}(x,\gamma) := \inf_{y\in\gamma} |x - y| \le \beta$$
there exists a unique $y \in \gamma$ with $\mathrm{dist}(x,\gamma) = |x - y|$.

Proof. We consider $\gamma$ as an embedded submanifold of the Euclidean plane $\mathbb{R}^2$. $\gamma$ is then covered by the images of charts $f : U \to V$ of the type constructed in Theorem 2.2.1. Here, $U$ and $V$ are open in $\mathbb{R}^2$, and
$$\gamma \cap V = f(U \cap \{x^2 = 0\}).$$
Furthermore, the curves $x^1$ = constant in $U$ correspond to geodesics, i.e. straight lines in $V$ perpendicular to $\gamma$, and they form shortest connections to $\gamma \cap V$. By shrinking $U$, if necessary, we may assume that it is of the form $(-\xi,\xi) \times (-\eta,\eta)$, with $\xi > 0$, $\eta > 0$. Since $\gamma$ is compact, it can be covered by finitely many such charts
$$f_i : (-\xi_i,\xi_i) \times (-\eta_i,\eta_i) \to V_i, \quad i = 1,\dots,m.$$
If we then restrict $f_i$ to $(-\xi_i,\xi_i) \times (-\frac{\eta_i}{2},\frac{\eta_i}{2})$, the lines $x^1$ = constant, $-\frac{\eta_i}{2} < x^2 < \frac{\eta_i}{2}$, then correspond to shortest geodesics to $\gamma$, since the part of $\gamma$ not contained in $V_i$ is not contained in the image of $f_i$, and hence has distance at least $\frac{\eta_i}{2}$ from the image of the smaller set $(-\xi_i,\xi_i) \times (-\frac{\eta_i}{2},\frac{\eta_i}{2})$. This is indicated in Figure 3.4 where the broken lines correspond to $x^2 = \pm\frac{\eta_i}{2}$, and this is depicted for two different indices $i$. Therefore,
$$\beta := \min_{i=1,\dots,m} \big(\tfrac{\eta_i}{2}\big)$$
satisfies the claim. q.e.d.
Figure 3.4.
Figure 3.5. We now return to the proof of Theorem 3.2.1: Without loss of generality eo < f3 < ^ . Assume e.\ cosai (vn(t0))
> e0.
The following construction is depicted in Figure 3.5. Choose $s_1(t_0) \in v_n(t_0)$ with
$$|s_1(t_0) - p_1(t_0)| = \beta,$$
where $p_1(t_0)$ is the endpoint of $v_n(t_0)$ where it forms the angle $\alpha_1(t_0)$ with $\gamma$. We replace the subarc $v_n^1(t_0)$ of $v_n(t_0)$ between $p_1(t_0)$ and $s_1(t_0)$ by the shortest line segment $v_n'(t_0)$ from $s_1(t_0)$ to $\gamma$. By the theorem of Pythagoras and the convexity of $\gamma$,
$$L(v_n'(t_0)) \le L(v_n^1(t_0)) \sin\alpha_1(v_n(t_0)) + L(v_n^2(t_0)) \le L(v_n^1(t_0))\sqrt{1-\varepsilon_0^2} + L(v_n^2(t_0)) = \beta\sqrt{1-\varepsilon_0^2} + L(v_n^2(t_0)).$$
Since L (y\ {to)) = j3, we have L(i£(to))
whenever
L (v n (t)) > Ki - (3
and Pi(t) = s^(t),
whenever
L (i>n (£)) < K\ — 2/3 i = 1,2
and lftW-*iWI?
for all*.
We then choose again the shortest lines from Si(t) to 7 and replace vn(t) by the straight line v^{t) between those points, where these two shortest lines meet 7. By our geometric argument above L (v* (t)) < K\ — 77 for some rj > 0, whenever L(vn(t))
> fti - 6 0 .
Since also always L(v'n(t))
contradicting the definition of K\. Consequently, our claim is correct. We then find a sequence (tn)neN C J and (vn)neN C V\ with sup L(vn(t)) tei L(vn(tn))
$$|\cos(\alpha_1(v_n(t_n)))|,\ |\cos(\alpha_2(v_n(t_n)))| < \frac{1}{n}.$$
A subsequence of $(v_n(t_n))_{n \in \mathbb{N}}$ then converges to a straight line $l_1$ in $A$ of length $\kappa_1$ meeting $\gamma$ orthogonally at its endpoints. In order to construct a second line $l_2$ meeting $\gamma$ orthogonally at its endpoints, we proceed as follows: We denote by $V_2$ the class of all continuous maps $v : I \times I \to C$ with $v(\{0\} \times I)$ and $v(\{1\} \times I) \subset C_0$ and with the following property: For all continuous maps $\tau : I \to I \times I$,
$$\tau(s) = (t_1(s), t_2(s)), \tag{3.2.1}$$
with
$$t_1(1) = 1 - t_1(0), \quad t_2(0) = 0, \quad t_2(1) = 1, \tag{3.2.2}$$
we have $v \circ \tau \in V_1$. Let us exhibit an example of such a $v \in V_2$ (see Figure 3.6). We consider the $v_1 \in V_1$ of Figure 3.5 where $v_1(0)$ and $v_1(1)$ were point curves on $\gamma$, and we rotate $v_1$ via the parameter $t_2$ so that at $t_2 = 1$ we have the same picture as at $t_2 = 0$, but with $t_1$ interchanged with $1 - t_1$. Equation (3.2.2) then holds. We note that $I \times I$ becomes a Möbius strip, when we identify the parameter $t_1$ on the line $t_2 = 1$ with the parameter $1 - t_1$ on the line $t_2 = 0$.

Figure 3.6. (Panels: $t_2 = 0$, $t_2 = 1/4$, $t_2 = 1/2$, $t_2 = 3/4$, $t_2 = 1$.)

We define
$$\kappa_2 := \inf_{v \in V_2} \sup_{t \in I^2} L(v(t)).$$
Then $\kappa_2 \ge \kappa_1$, and $\kappa_2$ again is realized by some straight line $l_2$ in $A$ meeting $\gamma$ orthogonally at its endpoints. We consider two cases:
(1) $\kappa_2 > \kappa_1$. Then $L(l_2) = \kappa_2 > \kappa_1 = L(l_1)$, and $l_2$ hence is different from $l_1$.
(2) $\kappa_2 = \kappa_1$.
We claim that in this case, we even get infinitely many solutions of our problem, i.e. lines in $A$ meeting $\gamma$ orthogonally. Namely, we let $v_0 \in V_2$ be any critical family, i.e. satisfying
$$\sup_{t \in I^2} L(v_0(t)) = \kappa_2.$$
(It is not hard to see that in the present case such a $v_0 \in V_2$ indeed exists.) We then have for any $\tau : I \to I^2$ with (3.2.2)
$$\sup_{s \in I} L(v_0(\tau(s))) \le \kappa_2, \tag{3.2.3}$$
$$\kappa_1 \le \sup_{s \in I} L(v_0(\tau(s))), \tag{3.2.4}$$
and since $\kappa_1 = \kappa_2$, we have equality in (3.2.3) and (3.2.4). This means that $v_0 \circ \tau$ is a critical family for $\kappa_1$, and it then has to contain a solution $l_\tau$ of our problem. Let $S \subset \{(s,t) \in I \times I \mid L(v_0(s,t)) = \kappa_2\}$ denote the set in $I \times I$ corresponding to all solutions induced by $v_0$. After carrying out the identification prescribed by (3.2.2), which makes $I \times I$ into a Möbius strip, we see that the complement of $S$ in this Möbius strip is not path connected. Namely, otherwise we could find $\tau$ satisfying (3.2.2) for which $\tau(I)$ avoids $S$, and for such a $\tau$, $v_0 \circ \tau$ would then not contain a solution, as $S$ is the set of all solutions in the family $v_0$. This, however, contradicts what has just been said (see Figure 3.7). In fact, $S$ has to carry a one dimensional cycle† on the Möbius strip. Otherwise, $S$ would be contractible (in the Möbius strip) and one could reparameterize $v_0$ on $I^2$ so that the set of solutions corresponds to a finite number of points. But this is incompatible with $\kappa_2 = \kappa_1$ as we have just seen. Since for each path $\tau$ as in (3.2.2) with $\tau(I) \subset S$, $v_0 \circ \tau \in V_1$ is nonconstant by (3.2.1) and (3.2.2), we obtain an uncountable number of solutions.

† We have to employ here some constructions from algebraic topology. A reference is any good book on that subject, e.g. M. Greenberg, Lectures on Algebraic Topology, Benjamin, Reading, Mass., 1967, pp. 33-45, 186. While this is somewhat technical we strongly urge the reader to try to understand the essential geometric idea of the preceding construction.

Figure 3.7.

We thus have shown our result if $\gamma$ is of class $C^3$. If $\gamma$ is only of class $C^1$, we choose a sequence of curves $\gamma_n$ of class $C^3$ approximating $\gamma$. This means that there are parameterizations $\gamma_n(\tau)$, $\gamma(\tau)$ by arc-length with
$$\lim_{n \to \infty} \sup_\tau \left( |\gamma_n(\tau) - \gamma(\tau)| + \left|\frac{d}{d\tau}\gamma_n(\tau) - \frac{d}{d\tau}\gamma(\tau)\right| \right) = 0.$$
We then let $l_{1,n}$ and $l_{2,n}$ be the corresponding solutions for $\gamma_n$. After selection of subsequences, $l_{1,n}$ and $l_{2,n}$ then converge to solutions $l_1$, $l_2$ for $\gamma$, and those $l_1$ and $l_2$ realize the critical values $\kappa_1$ and $\kappa_2$, respectively. Since the argument to produce infinitely many solutions in case $\kappa_1 = \kappa_2$ did not depend on a higher differentiability assumption on $\gamma$, it is still applicable here, and we thus can complete the proof as before.
q.e.d.

The variational content of Theorem 3.2.1 is that we produce two geodesics in $\mathbb{R}^2$ that meet a given convex Jordan curve orthogonally. In fact, this statement generalizes to any closed convex Jordan curve on some surface, enclosing a domain homeomorphic to the unit disk. In Sections 2.3, 3.2, we could only treat variational problems that could be reduced to finite dimensional problems, because we did not yet develop tools to show the existence of critical points of functionals defined on infinite dimensional spaces. We shall develop such tools in Part II, and consequently in Chapter 9 of Part II, we shall be able to present general results about the existence of unstable critical points in the spirit of the preceding results. The crucial notion will be the Palais-Smale condition that guarantees that the type of reasoning presented in Section 3.1 extends to certain functionals defined on infinite dimensional spaces. Also, the reasoning employed in Section 3.2 that infinitely many critical points can be found if two suitable critical values coincide will be given an axiomatic treatment in Section 9.3 of Part II.
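To make the statement of Theorem 3.2.1 concrete, the following minimal sketch checks numerically that the two axes of an ellipse are chords meeting the curve orthogonally at both endpoints, while a generic chord through the centre is not. The ellipse, its semi-axes `A_AXIS`, `B_AXIS`, and the helper names are assumptions chosen for illustration; the sketch does not carry out the minimax construction itself.

```python
import math

A_AXIS, B_AXIS = 2.0, 1.0  # assumed semi-axes of the sample ellipse

def point(theta):
    """Point of the ellipse with parameter theta."""
    return (A_AXIS * math.cos(theta), B_AXIS * math.sin(theta))

def tangent(theta):
    """Tangent vector of the ellipse at parameter theta."""
    return (-A_AXIS * math.sin(theta), B_AXIS * math.cos(theta))

def chord_data(theta1, theta2):
    """Return the chord length and |cos| of its angles with the curve at both endpoints."""
    p, q = point(theta1), point(theta2)
    d = (q[0] - p[0], q[1] - p[1])
    length = math.hypot(d[0], d[1])
    cosines = []
    for theta in (theta1, theta2):
        tx, ty = tangent(theta)
        cosines.append(abs(d[0] * tx + d[1] * ty) / (length * math.hypot(tx, ty)))
    return length, cosines

major = chord_data(0.0, math.pi)                  # chord along the major axis
minor = chord_data(0.5 * math.pi, 1.5 * math.pi)  # chord along the minor axis
generic = chord_data(0.3, math.pi + 0.3)          # some other chord through the centre
```

For the two axes both cosines vanish up to rounding, so these chords satisfy the orthogonality condition $|\cos\alpha_1| = |\cos\alpha_2| = 0$ used in the proof; for the generic chord they do not.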
Exercises
3.1 Let $F \in C^1(M,\mathbb{R})$ ($M$ an embedded, connected, differentiable submanifold of $\mathbb{R}^d$) be bounded from below and proper (i.e. for all $s \in \mathbb{R}$, $\{x \in M : F(x) \le s\}$ is compact), and suppose $F$ has two relative minima $x_1, x_2$. Let $\kappa_0 := \max(F(x_1), F(x_2))$. Show that $F$ either possesses a critical point $x_3$ with $F(x_3) > \kappa_0$, or that it has uncountably many critical points.
3.2 Let $F \in C^1(\mathbb{R}^d,\mathbb{R})$ be bounded from below and proper, and suppose it has three strict relative minima $x_1, x_2, x_3$. Try to identify conditions under which $F$ then has to possess more than two additional critical points, e.g. three or four.
3.3 Let $A$ be a compact convex subset of the unit sphere $S^2 \subset \mathbb{R}^3$, and suppose $\partial A$ is a smooth curve $\gamma$; the convexity condition here means that for any two points in $A$, one can find precisely one geodesic arc inside $A$ that connects them. Show the existence of at least two geodesic arcs in $A$ that meet $\gamma$ orthogonally at both endpoints.
4 The theory of Hamilton and Jacobi
4.1 The canonical equations
We let $t$ be a real parameter varying between $t_1$ and $t_2$. We consider the variational integral
$$I = \int_{t_1}^{t_2} L(t, x^1(t),\dots,x^n(t), \dot x^1(t),\dots,\dot x^n(t))\,dt \tag{4.1.1}$$
for the unknown functions $x(t) = (x^1(t),\dots,x^n(t))$ with fixed endpoints $x(t_1)$ and $x(t_2)$. Here, $\dot{\ } = \frac{d}{dt}$. We assume that $L$ is of class $C^2$. The Euler-Lagrange equations for $I$ are
$$\frac{d}{dt}L_{\dot x^i} - L_{x^i} = 0 \quad (i = 1,\dots,n). \tag{4.1.2}$$
We assume the invertibility condition
$$\det L_{\dot x^i \dot x^j} \ne 0. \tag{4.1.3}$$
As shown in 1.2, this implies that solutions of (4.1.2) are of class $C^2$. (4.1.3) also implies that we may perform a Legendre transformation. Namely, by the implicit function theorem, we may then locally solve
$$p_i = L_{\dot x^i} \tag{4.1.4}$$
w.r.t. $\dot x^i$, i.e.
$$\dot x^i = \dot x^i(t,x,p) \quad (p = (p_1,\dots,p_n)). \tag{4.1.5}$$
The expressions $p_i$ are called momenta. The Hamiltonian $H$ is defined as
$$H(t,x,p) := \dot x^i p_i - L(t,x,\dot x). \tag{4.1.6}$$
We obtain
$$H_{x^i} = p_j\frac{\partial \dot x^j}{\partial x^i} - L_{\dot x^j}\frac{\partial \dot x^j}{\partial x^i} - L_{x^i},$$
and with (4.1.4) then $H_{x^i} = -L_{x^i}$, and with (4.1.2) and (4.1.4) then
$$H_{x^i} = -\dot p_i. \tag{4.1.7}$$
Also
$$H_{p_i} = \dot x^i + p_j\frac{\partial \dot x^j}{\partial p_i} - L_{\dot x^j}\frac{\partial \dot x^j}{\partial p_i},$$
and thus again with (4.1.4)
$$H_{p_i} = \dot x^i. \tag{4.1.8}$$
(4.1.7) and (4.1.8) constitute a so-called canonical system. We are going to see that (4.1.7) and (4.1.8) also arise as Euler-Lagrange equations of the variational problem obtained by expressing $L$ in (4.1.1) through $H$ via (4.1.6). Namely,
$$I = \int_{t_1}^{t_2} \left(\dot x^j p_j - H(t,x,p)\right) dt, \tag{4.1.9}$$
where the unknown functions are $x(t)$ and $p(t)$, has Euler-Lagrange equations (4.1.7) and (4.1.8), and so does
$$I = -\int_{t_1}^{t_2} \left(x^j \dot p_j + H(t,x,p)\right) dt. \tag{4.1.10}$$
Before proceeding, we observe that if $H$ does not depend explicitly on $t$, i.e. $H = H(x,p)$, then $H$ is a constant of motion, i.e. constant along any solution $x(t)$ of the equations. Namely,
$$\frac{d}{dt}H(x(t),p(t)) = H_{x^i}\dot x^i + H_{p_i}\dot p_i = 0 \tag{4.1.11}$$
by (4.1.7) and (4.1.8).
Example. For $L = \frac{1}{2}|\dot x|^2 - V(x)$, we have
$$H = \frac{1}{2}|p|^2 + V(x),$$
and the canonical equations become
$$\dot x = p, \qquad \dot p = -V_x.$$
This example, which describes the Newtonian motion of a particle of unit mass subject to a potential $V$, is helpful for remembering the signs in the canonical equations.
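The canonical equations of this example and the conservation law (4.1.11) can be tried out numerically. The sketch below is only an illustration under stated assumptions: the potential $V(x) = x^2/2$ in one dimension, the leapfrog integrator, and all function names are choices made here and do not come from the text.

```python
import math

def potential_gradient(x):
    """V_x for the assumed example potential V(x) = x**2 / 2."""
    return x

def hamiltonian(x, p):
    """H = p^2/2 + V(x) for the assumed potential."""
    return 0.5 * p * p + 0.5 * x * x

def leapfrog(x, p, dt, steps):
    """Integrate x' = H_p = p, p' = -H_x = -V_x with the leapfrog scheme."""
    for _ in range(steps):
        p -= 0.5 * dt * potential_gradient(x)  # half step in the momentum
        x += dt * p                            # full step in the position
        p -= 0.5 * dt * potential_gradient(x)  # second half step in the momentum
    return x, p

x0, p0 = 1.0, 0.0
x1, p1 = leapfrog(x0, p0, dt=1e-3, steps=5000)  # integrate up to t = 5
energy_drift = abs(hamiltonian(x1, p1) - hamiltonian(x0, p0))
```

With these initial values the exact solution is $x(t) = \cos t$, $p(t) = -\sin t$, so `x1` and `p1` can be compared with $\cos 5$ and $-\sin 5$, and `energy_drift` stays small, in line with $H$ being a constant of motion.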
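4.2 The Hamilton-Jacobi equation
Assumption. There is given a set $\Omega \subset \mathbb{R}^{n+1} = \{(t,x^1,\dots,x^n)\}$ with the property that for any points $A, B \in \Omega$, $A = (\sigma,\kappa^1,\dots,\kappa^n)$, $B = (s,q^1,\dots,q^n)$, there exists a unique solution of the canonical equations connecting $A$ and $B$,
$$x^i = f^i(t; s,q^1,\dots,q^n; \sigma,\kappa^1,\dots,\kappa^n), \tag{4.2.1}$$
and also
$$p_i = g_i(t; s,q^1,\dots,q^n; \sigma,\kappa^1,\dots,\kappa^n) = L_{\dot x^i}. \tag{4.2.2}$$
In particular,
$$\kappa^i = f^i(\sigma; s,q^1,\dots,q^n; \sigma,\kappa^1,\dots,\kappa^n), \qquad q^i = f^i(s; s,q^1,\dots,q^n; \sigma,\kappa^1,\dots,\kappa^n). \tag{4.2.3}$$
We also define
$$v_i := g_i(s; s,q^1,\dots,q^n; \sigma,\kappa^1,\dots,\kappa^n) = L_{\dot q^i}(s,q,\dot q). \tag{4.2.4}$$
In the sequel, $\dot f^i$ etc. will mean a derivative w.r.t. the first independent variable, $f^{i\prime}$ etc. a derivative w.r.t. the second one. Inserting (4.2.1), (4.2.2) into $I$, we obtain
$$I = I(s,q,\sigma,\kappa) \tag{4.2.5}$$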
and call this expression the geodesic distance between $A$ and $B$. In this connection, $I$ is called eiconal. Recalling (4.1.9), we may write
$$I = \int_\sigma^s \left(p_i\dot x^i - H(t,x,p)\right) dt. \tag{4.2.6}$$
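We want to compute the derivatives of $I(s,q,\sigma,\kappa)$. With a prime denoting the derivative w.r.t. $s$,
$$I_s = v_i\dot q^i - H(s,q,v) + \int_\sigma^s \left(g_i'\dot f^i + g_i\dot f^{i\prime} - H_{x^i}f^{i\prime} - H_{p_i}g_i'\right) dt.$$
Equations (4.1.7) and (4.1.8) yield $H_{x^i} = -\dot g_i$, $H_{p_i} = \dot f^i$, and thus
$$I_s = v_i\dot q^i - H(s,q,v) + \int_\sigma^s \frac{d}{dt}\left(g_i f^{i\prime}\right) dt = v_i\dot q^i - H(s,q,v) + g_i f^{i\prime}\Big|_{t=\sigma}^{t=s}.$$
Differentiating (4.2.3) w.r.t. $s$ gives
$$f^{i\prime} = -\dot f^i \quad\text{for } t = s, \qquad f^{i\prime} = 0 \quad\text{for } t = \sigma,$$
and thus altogether
$$I_s = v_i\dot q^i - H(s,q,v) - v_i\dot q^i = -H(s,q,v) = L(s,q,\dot q) - \dot q^i L_{\dot q^i}. \tag{4.2.7}$$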
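Next,
$$\frac{\partial I}{\partial q^j} = \int_\sigma^s \left(\frac{\partial g_i}{\partial q^j}\dot f^i + g_i\frac{\partial \dot f^i}{\partial q^j} - H_{x^i}\frac{\partial f^i}{\partial q^j} - H_{p_i}\frac{\partial g_i}{\partial q^j}\right) dt = g_i\frac{\partial f^i}{\partial q^j}\Big|_{t=\sigma}^{t=s},$$
again by (4.1.7), (4.1.8). By (4.2.3),
$$\frac{\partial f^i}{\partial q^j} = \delta^i_j \quad\text{for } t = s, \qquad \frac{\partial f^i}{\partial q^j} = 0 \quad\text{for } t = \sigma.$$
Thus
$$I_{q^j} = v_j = L_{\dot q^j}(s,q,\dot q). \tag{4.2.8}$$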
Analogously,
$$I_\sigma = H(\sigma,\kappa,\varphi) = -L(\sigma,\kappa,\dot\kappa) + \dot\kappa^i L_{\dot\kappa^i}, \tag{4.2.9}$$
$$I_{\kappa^j} = -\varphi_j = -L_{\dot\kappa^j}(\sigma,\kappa,\dot\kappa). \tag{4.2.10}$$
Inserting (4.2.8) into (4.2.7), we obtain
$$I_s + H(s,q,I_q) = 0. \tag{4.2.11}$$
Thus, the geodesic distance as a function of the endpoint satisfies (4.2.11), a Hamilton-Jacobi equation. In the present context that equation is also called the eiconal equation. We observed at the end of Section 4.1 that $H$ is constant along solutions if it does not depend on $t$ explicitly. In that case, (4.2.11) implies that $I$ depends linearly on $s$. It may be useful for understanding the preceding formulae if we derive them without the use of the Legendre transformation. Thus
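$$I = \int_\sigma^s L(t,x(t),\dot x(t))\,dt = \int_\sigma^s L(t,f,\dot f)\,dt$$
and, with a prime denoting the derivative w.r.t. $s$,
$$I_s = L(s,q,\dot q) + \int_\sigma^s \left(L_{x^i}f^{i\prime} + L_{\dot x^i}\dot f^{i\prime}\right) dt.$$
The Euler-Lagrange equations give
$$L_{x^i} = \frac{d}{dt}L_{\dot x^i},$$
and so
$$I_s = L(s,q,\dot q) + \int_\sigma^s \frac{d}{dt}\left(L_{\dot x^i}f^{i\prime}\right) dt = L(s,q,\dot q) + L_{\dot x^i}f^{i\prime}\Big|_{t=\sigma}^{t=s}.$$
As before, we obtain from (4.2.3)
$$f^{i\prime} = -\dot f^i \quad\text{for } t = s, \qquad f^{i\prime} = 0 \quad\text{for } t = \sigma,$$
hence
$$I_s = L(s,q,\dot q) - L_{\dot q^i}\dot q^i,$$
i.e. (4.2.7). Likewise,
$$I_{q^j} = \int_\sigma^s \left(L_{x^i}\frac{\partial f^i}{\partial q^j} + L_{\dot x^i}\frac{\partial \dot f^i}{\partial q^j}\right) dt = L_{\dot x^i}\frac{\partial f^i}{\partial q^j}\Big|_{t=\sigma}^{t=s} = L_{\dot q^j}(s,q,\dot q),$$
i.e. (4.2.8). Thus, the Hamilton-Jacobi equation (4.2.11) is
$$I_s - L(s,q,\dot q) + I_{q^i}\dot q^i = 0. \tag{4.2.12}$$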
We have seen in the preceding how solutions of the canonical equations yield solutions of the Hamilton-Jacobi equations. We now want to establish a converse result. Let $\varphi(t,x^1,\dots,x^n)$ be a solution of the Hamilton-Jacobi equation which we now write as
$$p_0 + H(t,x^1,\dots,x^n,p_1,\dots,p_n) = 0 \tag{4.2.13}$$
with $p_0 = \frac{\partial\varphi}{\partial t}$, $p_i = \frac{\partial\varphi}{\partial x^i}$.

Definition 4.2.1. If
$$\varphi = G(t,x^1,\dots,x^n,\lambda_1,\dots,\lambda_n) \quad\text{with } G \in C^2 \tag{4.2.14}$$
and
$$\det\left(G_{x^i\lambda_j}\right)_{i,j=1,\dots,n} \ne 0 \tag{4.2.15}$$
is a family of solutions of (4.2.13) depending on $n$ parameters $\lambda_1,\dots,\lambda_n$, we call
$$\varphi = G(t,x^1,\dots,x^n,\lambda_1,\dots,\lambda_n) + \lambda \tag{4.2.16}$$
(where $\lambda$ is a free real parameter) a complete integral of (4.2.13).

We have the following theorem of Jacobi:

Theorem 4.2.1. Let $\varphi = G(t,x^1,\dots,x^n,\lambda_1,\dots,\lambda_n) + \lambda$ be a complete integral of (4.2.13). Then one may obtain a family of solutions of the
canonical equations
$$H_{p_i} = \dot x^i \tag{4.2.17}$$
$$H_{x^i} = -\dot p_i \tag{4.2.18}$$
depending on $2n$ parameters $\lambda_1,\dots,\lambda_n$, $\mu^1,\dots,\mu^n$ by solving
$$G_{\lambda_i} = \mu^i \tag{4.2.19}$$
$$G_{x^i} = p_i. \tag{4.2.20}$$

Proof. Because of (4.2.15), (4.2.19) may be solved w.r.t. $x^i$,
$$x^i = x^i(t,\lambda_1,\dots,\lambda_n,\mu^1,\dots,\mu^n).$$
Inserting this into (4.2.20) then yields
$$p_i = p_i(t,\lambda_1,\dots,\lambda_n,\mu^1,\dots,\mu^n).$$
We have to show that $x^i$ and $p_i$ satisfy the canonical equations. For this purpose, we differentiate (4.2.13) w.r.t. $x^i$ and obtain:
$$G_{tx^i} + H_{p_k}G_{x^kx^i} + H_{x^i} = 0. \tag{4.2.21}$$
Differentiating (4.2.13) w.r.t. $\lambda_i$, we obtain
$$G_{t\lambda_i} + H_{p_k}G_{x^k\lambda_i} = 0, \tag{4.2.22}$$
since the terms containing $\frac{\partial x^k}{\partial\lambda_i}$ cancel by (4.2.21). Differentiating (4.2.19) w.r.t. $t$, we obtain
$$G_{\lambda_it} + G_{\lambda_ix^k}\frac{dx^k}{dt} = 0. \tag{4.2.23}$$
Comparing (4.2.22) and (4.2.23) and recalling (4.2.15) yields (4.2.17). Differentiating (4.2.20) w.r.t. $t$, we obtain
$$\frac{dp_i}{dt} = G_{x^it} + G_{x^ix^k}\frac{dx^k}{dt}. \tag{4.2.24}$$
Comparing (4.2.24) and (4.2.21) and using the relation (4.2.17) just derived, we then obtain (4.2.18).
q.e.d.

The canonical equations are a system of ODE whereas the Hamilton-Jacobi equation is a first order partial differential equation (PDE). The preceding considerations show the equivalence of these equations. While in general, one may consider a PDE as being more difficult than a system of ODE, in applications, one may often find a solution of the canonical
equations by solving the Hamilton-Jacobi equation. Here, it is typically of great help that the Hamilton-Jacobi equation does not depend on the unknown function itself, but only on its derivatives. Let us consider the following example of geometric optics:
$$I = \int \varphi(t,x)\sqrt{1+\dot x^2}\,dt \quad (\varphi(t,x) > 0),$$
already explained in Example (3) of Section 1.1 in a slightly different notation. The physical meaning is that $x(t)$ is considered as the graph of a light ray travelling in a medium with light velocity $\frac{c}{\varphi(t,x)}$, where $c$ is the velocity of light in vacuum. In this example, putting
$$L(t,x,\dot x) = \varphi(t,x)\sqrt{1+\dot x^2}, \tag{4.2.25}$$
we have
$$p = L_{\dot x} = \frac{\varphi\,\dot x}{\sqrt{1+\dot x^2}}, \qquad H = p\dot x - L = -\sqrt{\varphi^2 - p^2}. \tag{4.2.26}$$
$I(s,q,\sigma,\kappa)$ here is the time that a light ray needs to travel from $A = (\sigma,\kappa)$ to $B = (s,q)$. The Hamilton-Jacobi equation $I_s + H(s,q,I_q) = 0$ becomes the eiconal equation
$$I_s^2 + I_q^2 = \varphi^2. \tag{4.2.27}$$
The surfaces $I(s,q) = \text{constant}$ are called wave fronts. Another simple example comes from a quadratic
$$L(t,x,\dot x) = \frac{1}{2}\left(\dot x^2 + \alpha x^2\right) \quad (\alpha = \text{constant}). \tag{4.2.28}$$
Then
$$p = L_{\dot x} = \dot x, \qquad H = p\dot x - L = \frac{1}{2}\left(p^2 - \alpha x^2\right), \tag{4.2.29}$$
and the Hamilton-Jacobi equation becomes
$$I_t + \frac{1}{2}\left(I_x^2 - \alpha x^2\right) = 0. \tag{4.2.30}$$
If we substitute $I = p(t)x^2$, we are led to the Riccati equation
$$\dot p + 2p^2 - \frac{\alpha}{2} = 0. \tag{4.2.31}$$
If we substitute $I = -\lambda t + \psi(x)$ with a parameter $\lambda$, we obtain from (4.2.30)
$$-\lambda + \frac{1}{2}\left(\psi'(x)^2 - \alpha x^2\right) = 0,$$
i.e.
$$I = -\lambda t + \int_0^x \sqrt{\alpha\xi^2 + 2\lambda}\,d\xi. \tag{4.2.32}$$
The equation $I_\lambda(t,x,\lambda) = \mu$ means
$$t + \mu = \int_0^x \frac{d\xi}{\sqrt{\alpha\xi^2 + 2\lambda}}.$$
This can be solved for $x$; let us assume for example $\alpha < 0$; then the solution is
$$x = \sqrt{\frac{2\lambda}{-\alpha}}\,\sin\left(\sqrt{-\alpha}\,(t+\mu)\right).$$
$x$ of course solves the Euler-Lagrange equation for (4.2.28),
$$\ddot x = \alpha x.$$
A physical realization is the harmonic oscillator, where $x(t)$ is the displacement of an oscillating spring, with $\alpha = -\frac{k}{m}$ ($m$ = mass, $k$ = spring constant). Since $p = I_x$, $I_t + H(x,I_x) = 0$, we obtain from (4.2.32)
$$\lambda = H(x,p),$$
i.e. $\lambda$ is the energy of the spring.
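The Riccati substitution can be checked numerically on the Hamilton-Jacobi equation $I_t + \frac{1}{2}(I_x^2 - \alpha x^2) = 0$ of this example. The sketch below assumes the particular value $\alpha = -1$, for which $p(t) = -\frac{1}{2}\tan t$ solves $\dot p + 2p^2 - \frac{\alpha}{2} = 0$; this choice, the finite-difference step `h`, and the function names are illustrative assumptions.

```python
import math

ALPHA = -1.0  # assumed value alpha < 0 (harmonic oscillator case)

def action(t, x):
    """Candidate solution I = p(t) x^2 with p(t) = -tan(t)/2, valid for alpha = -1."""
    return -0.5 * math.tan(t) * x * x

def hj_residual(t, x, h=1e-5):
    """Central-difference estimate of I_t + (I_x^2 - alpha x^2)/2."""
    i_t = (action(t + h, x) - action(t - h, x)) / (2 * h)
    i_x = (action(t, x + h) - action(t, x - h)) / (2 * h)
    return i_t + 0.5 * (i_x ** 2 - ALPHA * x * x)

# Sample points with |t| < pi/2, where tan(t) is smooth.
residuals = [abs(hj_residual(t, x)) for t in (0.1, 0.5, 1.0) for x in (-1.0, 0.3, 2.0)]
```

The residuals vanish up to discretization and rounding error, since $I_t = -\frac{1}{2}\sec^2(t)\,x^2$ and $\frac{1}{2}I_x^2 = \frac{1}{2}\tan^2(t)\,x^2$ cancel against $\frac{1}{2}x^2$.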
4.3 Geodesics
We consider the case where $L$ is homogeneous of degree 1 in $\dot x$, i.e.
$$L = L_{\dot x^i}\dot x^i. \tag{4.3.1}$$
Then
$$\det L_{\dot x^i\dot x^j} = 0, \tag{4.3.2}$$
and we cannot perform a Legendre transformation as in Section 4.1. We have
$$H = -L + \dot x^i L_{\dot x^i} = 0, \tag{4.3.3}$$
and the computations of Section 4.2 yield (writing $L_{\dot q^i}$ instead of $p_i$ etc.)
$$I_s = L(s,q,\dot q) - \dot q^i L_{\dot q^i} = 0. \tag{4.3.4}$$
An example are the geodesic lines considered in Chapter 2. Here,
$$L = \sqrt{Q} \quad\text{with } Q = g_{ij}(x^1,\dots,x^n)\dot x^i\dot x^j. \tag{4.3.5}$$
The Euler-Lagrange equations are
$$\frac{d}{dt}\left(\frac{1}{\sqrt{Q}}Q_{\dot x^i}\right) - \frac{1}{\sqrt{Q}}Q_{x^i} = 0. \tag{4.3.6}$$
Since $t$ does not occur explicitly in (4.3.5) and since $I$ is invariant under transformations of $t$, we may choose $t$ such that
$$Q = 1, \tag{4.3.7}$$
i.e. that solutions are parameterized by arc-length. Equation (4.3.6) then becomes
$$\frac{d}{dt}Q_{\dot x^i} - Q_{x^i} = 0. \tag{4.3.8}$$
Conversely, along a solution of (4.3.8), we have $Q = \text{constant}$, justifying our choice of $t$. Namely, $Q$ is homogeneous of degree 2 w.r.t. the variables $\dot x^i$, hence
$$Q_{\dot x^i}\dot x^i = 2Q. \tag{4.3.9}$$
Differentiating (4.3.9) w.r.t. $t$ along a solution,
$$\left(\frac{d}{dt}Q_{\dot x^i}\right)\dot x^i + Q_{\dot x^i}\ddot x^i = 2\frac{d}{dt}Q = 2Q_{x^i}\dot x^i + 2Q_{\dot x^i}\ddot x^i,$$
and (4.3.8) indeed yields $\frac{d}{dt}Q = 0$ along a solution.
As already demonstrated in 2.1, (4.3.8) are the Euler-Lagrange equations for
$$E = \int_\sigma^s Q(x(t),\dot x(t))\,dt = \int_\sigma^s g_{ij}(x(t))\dot x^i(t)\dot x^j(t)\,dt. \tag{4.3.10}$$
We recall (Lemma 2.1.1) that the Schwarz inequality implies
$$\int_\sigma^s \sqrt{Q}\,dt \le \left((s-\sigma)\int_\sigma^s Q\,dt\right)^{1/2}$$
with equality precisely if $Q = \text{constant}$, and the extremals of $E$ are precisely those extremals of $I$ parameterized proportionally to arc-length. In contrast to $I$, $E$ is no longer invariant under transformations of $t$. Therefore, for solutions of the Euler-Lagrange equations corresponding to $E$, the parameterization is determined up to a constant factor. The Hamiltonian for $E$ is
$$H = Q_{\dot x^i}\dot x^i - Q = Q \quad\text{because of (4.3.9).} \tag{4.3.11}$$
Moreover,
$$p_i = Q_{\dot x^i} = 2g_{ij}\dot x^j. \tag{4.3.12}$$
Thus (with $(g^{ij}) = (g_{ij})^{-1}$)
$$H = \frac{1}{4}g^{ij}p_ip_j. \tag{4.3.13}$$
The Hamilton-Jacobi equation becomes
$$E_t + \frac{1}{4}g^{ij}E_{x^i}E_{x^j} = 0 \quad\text{cf. (4.3.13), (4.2.11), (4.3.10)} \tag{4.3.14}$$
and the canonical equations are
$$\dot x^i = \frac{1}{2}g^{ij}p_j \quad\text{cf. (4.1.8), (4.3.13)}$$
$$\dot p_i = -\frac{1}{4}\frac{\partial g^{kj}}{\partial x^i}p_kp_j \quad\text{cf. (4.1.7), (4.3.13).} \tag{4.3.15}$$
As observed at the end of Section 4.2, $E$ depends linearly on $t$.
4.4 Fields of extremals
Let $\Omega \subset \mathbb{R}^{n+1}$ satisfy the assumptions of 4.2, $T \in C^1(\Omega,\mathbb{R})$. The equation
$$T(\sigma,\kappa^1,\dots,\kappa^n) = 0 \tag{4.4.1}$$
then defines a possibly degenerate hypersurface $\Sigma$ (assume $\Sigma \ne \emptyset$). Given $B = (s,q^1,\dots,q^n) \in \Omega$, we seek $A = (\sigma,\kappa^1,\dots,\kappa^n) \in \Sigma$ that minimizes $I(s,q^1,\dots,q^n,\sigma,\kappa^1,\dots,\kappa^n)$ as a function of $(\sigma,\kappa)$ satisfying (4.4.1). At such a minimizing $A$, we have with some Lagrange multiplier $\lambda$
$$I_\sigma + \lambda T_\sigma = 0, \qquad I_{\kappa^j} + \lambda T_{\kappa^j} = 0 \quad (j = 1,\dots,n). \tag{4.4.2}$$
Unless the situation is degenerate ($\lambda = 0$ or $T_\sigma = T_{\kappa^i} = 0$ for all $i$), this means that the vector $(I_\sigma, I_{\kappa^1},\dots,I_{\kappa^n})$ is proportional to the gradient of $T$, hence orthogonal to $\Sigma$. From (4.2.9), (4.2.10), we then obtain
$$-H(\sigma,\kappa,\varphi) = \lambda T_\sigma, \qquad \varphi_j = \lambda T_{\kappa^j} \quad (j = 1,\dots,n), \tag{4.4.3}$$
with $\varphi_j = L_{\dot\kappa^j}(\sigma,\kappa,\dot\kappa)$. These are equations for the tangent vector $(\dot\kappa^1,\dots,\dot\kappa^n)$ of the solution from $A$ to $B$. A solution satisfying (4.4.3) is called orthogonal to $\Sigma$. We want to use the following:

Assumption. Through each point of $\Omega$, there is precisely one solution orthogonal to $\Sigma$.

For each $B = (s,q^1,\dots,q^n) \in \Omega$, this solution meets $\Sigma$ at a unique point $A = (\sigma(s,q),\kappa(s,q))$, and we call
$$J(s,q) := I(s,q,\sigma(s,q),\kappa(s,q))$$
the geodesic distance from the hypersurface $\Sigma$.

Theorem 4.4.1. Given such a field of solutions orthogonal to $\Sigma$, the geodesic distance satisfies
$$J_s = -H(s,q,L_{\dot q}) \tag{4.4.4}$$
and
$$J_{q^j} = L_{\dot q^j}, \tag{4.4.5}$$
hence also the eiconal equation
$$J_s + H(s,q,J_q) = 0. \tag{4.4.6}$$
Proof.
$$J_s = I_s + I_\sigma\sigma_s + I_{\kappa^i}\kappa^i_s. \tag{4.4.7}$$
$T(\sigma(s,q),\kappa(s,q)) = 0$ implies
$$\sigma_sT_\sigma + \kappa^i_sT_{\kappa^i} = 0,$$
and likewise for the derivatives w.r.t. $q^j$. If we then use (4.4.2), we obtain in (4.4.7)
$$J_s = I_s, \quad\text{and likewise}\quad J_{q^i} = I_{q^i},$$
and the result follows from (4.2.7), (4.2.8), (4.2.11).
q.e.d.

Conversely

Theorem 4.4.2. If $J(s,q)$ is a solution of (4.4.6) of class $C^2$, there exists a field of solutions orthogonal to the hypersurfaces $J(s,q) = \text{constant}$, and $J$ is the geodesic distance from the hypersurface $J = 0$.

Proof. Let $J$ satisfy (4.4.6). We put
$$p_i := J_{q^i}(s,q). \tag{4.4.8}$$
The following system of ODE
$$\dot q^i = H_{p_i}(s,q,J_q) \tag{4.4.9}$$
then defines an $n$-parameter family of curves. By (4.4.8), we have along any such curve
$$\dot p_i = J_{q^is} + J_{q^iq^j}\dot q^j,$$
and (4.4.6) gives
$$J_{sq^i} + H_{q^i} + H_{p_j}J_{q^jq^i} = 0.$$
Recalling (4.4.9), we obtain
$$\dot p_i = -H_{q^i}. \tag{4.4.10}$$
Equations (4.4.9) and (4.4.10) state that the curves $q(s)$ constitute a field of solutions. (4.4.6) and (4.4.8) yield
$$-H = J_s, \qquad p_i = J_{q^i}.$$
This means that (4.4.3) is satisfied for $T = J$ with $\lambda = 1$, and the solutions are orthogonal to the hypersurfaces $J = \text{constant}$.
q.e.d.

Theorem 4.4.1 gives solutions of the Hamilton-Jacobi equation (4.4.6) depending on an arbitrarily given function $T \in C^1(\mathbb{R}^{n+1})$ (namely, we obtain those solutions that start on $T = 0$), whereas Theorem 4.4.2 implies that all solutions are obtained in that way. The surfaces $J = \text{constant}$ are called parallel surfaces of the field. In the special case where the hypersurface $T = 0$ degenerates into a point, we recover the considerations of Section 4.2.
4.5 Hilbert's invariant integral and Jacobi's theorem
For a solution $J(t,x^1,\dots,x^n)$ of the Hamilton-Jacobi equation, we put again
$$p_i := J_{x^i}(t,x^1,\dots,x^n).$$
If $A = (\sigma,\kappa^1,\dots,\kappa^n)$ and $B = (s,q^1,\dots,q^n)$ are connected by an arbitrary differentiable path $x^i(\tau)$, the integral
$$J(B) - J(A) = \int_\sigma^s \frac{d}{d\tau}J(\tau,x(\tau))\,d\tau = \int_\sigma^s \left(J_{x^i}\frac{dx^i}{d\tau} + J_\tau\right) d\tau$$
does not depend on this particular path, but only on the end points $A$ and $B$. We rewrite this integral as
$$\int_\sigma^s \left(p_i\frac{dx^i}{d\tau} - H(\tau,x(\tau),p(\tau))\right) d\tau \tag{4.5.1}$$
and call it Hilbert's invariant integral. Conversely now let functions $p_i(\tau,x^1,\dots,x^n)$ be given in a region $\Omega \subset \mathbb{R}^{n+1}$ for which the integral (4.5.1) does not depend on the path $x(\tau)$ connecting $A = (\sigma,x(\sigma))$ and $B = (s,x(s))$. Thus, we may define $J : \Omega \to \mathbb{R}$ by
$$J(B) - J(A) = \int_\sigma^s \left(p_i\frac{dx^i}{d\tau} - H(\tau,x(\tau),p(\tau))\right) d\tau. \tag{4.5.2}$$
Since this integral does not depend on the path connecting $A$ and $B$, we must have
$$J_{x^i} = p_i, \qquad J_t = -H(t,x,p). \tag{4.5.3}$$
$J$ then solves the Hamilton-Jacobi equation. By Theorem 4.4.2, any solution of the Hamilton-Jacobi equation is the geodesic distance function for a field of solutions of the canonical equations. Thus, any invariant integral of the form (4.5.1) yields a field of solutions. Let us now reconsider Jacobi's Theorem 4.2.1. Let
$$\varphi = G(t,x^1,\dots,x^n,\lambda_1,\dots,\lambda_n) + \lambda \tag{4.5.4}$$
be a complete integral of
$$p + H(t,x^1,\dots,x^n,p_1,\dots,p_n) = 0 \tag{4.5.5}$$
(with $p = \frac{\partial\varphi}{\partial t}$, $p_i = \frac{\partial\varphi}{\partial x^i}$), in particular
$$\det\left(G_{x^i\lambda_j}\right) \ne 0. \tag{4.5.6}$$
Jacobi's theorem says that we obtain a $2n$-parameter family of solutions of the canonical equations by solving
$$G_{\lambda_i} = \mu^i, \qquad G_{x^i} = p_i,$$
where the parameters are $\lambda_1,\dots,\lambda_n$, $\mu^1,\dots,\mu^n$. For fixed values of $\lambda_1,\dots,\lambda_n,\lambda$, $G$ determines a field of solutions of the canonical equations, and by the preceding consideration, it is given by the corresponding invariant integral
$$G(B) - G(A) = \int_\sigma^s \left(p_i\frac{dx^i}{d\tau} - H\right) d\tau = \int_\sigma^s \left\{L(\tau,x^i(\tau),\dot x^i(\tau)) + \left(\frac{dx^i}{d\tau} - \dot x^i(\tau)\right)L_{\dot x^i}\right\} d\tau, \tag{4.5.7}$$
where $\dot x^i(\tau)$ now denotes the derivative in the direction of the solution and not in the direction of the arbitrary curve $x^i(\tau)$ connecting $A$ and $B$. We now vary $\lambda_1,\dots,\lambda_n$, but keep the curve $x^i(\tau)$ fixed. Then the field
of solutions varies, and so then does $\dot x^i(\tau)$. We also determine $\lambda$ so that $G(A) = 0$. Differentiating (4.5.7) then yields
$$G_{\lambda_i} = \int_\sigma^s \left(\frac{dx^j}{d\tau} - \dot x^j\right)L_{\dot x^j\lambda_i}\,d\tau. \tag{4.5.8}$$
In the same way as $G(B)$, this expression only depends on $B$ ($A$ is kept fixed for the moment) but not on the particular $x^j(\tau)$. For each $B$, we find $B_0$ on the surface
$$G(t,x^1,\dots,x^n,\lambda_1,\dots,\lambda_n) = 0$$
that can be connected with $B$ by a solution of the canonical equations. Along such a solution, we have
$$\frac{dx^j}{d\tau} = \dot x^j,$$
and the integrand in (4.5.8) thus vanishes along this curve. Instead of integrating from $A$ to $B$, it therefore suffices to integrate from $A$ to $B_0$, and we obtain
$$G_{\lambda_i} = \mu^i \tag{4.5.9}$$
with $\mu^i$ being the value of the integral from $A$ to $B_0$. Thus, $\mu^i$ can be considered as a constant for the solution passing through $B_0$. If, conversely, (4.5.9) defines a family of curves $x^i(t,\lambda_j,\mu^j)$ (the family is locally unique because of (4.5.6)), then, since $G_{\lambda_i}$ is constant, the integrand in (4.5.8) has to vanish along any curve of the family. Thus
$$\left(\frac{dx^j}{d\tau} - \dot x^j\right)L_{\dot x^j\lambda_i} = 0 \quad (i = 1,\dots,n). \tag{4.5.10}$$
In our field we have (cf. (4.2.8))
$$L_{\dot x^j} = G_{x^j},$$
hence by assumption (4.5.6)
$$\det L_{\dot x^j\lambda_i} = \det G_{x^j\lambda_i} \ne 0.$$
Equation (4.5.10) then implies
$$\frac{dx^j}{d\tau} = \dot x^j;$$
this means that the curves defined by (4.5.9) are solutions of the canonical equations contained in the field defined by $G(t,x^1,\dots,x^n,\lambda_1,\dots,\lambda_n)$. We also observe that the parameter $\lambda$ is only used for specifying the surface $G = 0$ and has no geometric meaning beside that.
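4.6 Canonical transformations
We want to find transformations, i.e. diffeomorphisms†
$$(x,p) \mapsto (\xi,\pi),$$
that preserve the canonical equations
$$\dot x = H_p, \qquad \dot p = -H_x. \tag{4.6.1}$$
This means that $\xi = \xi(x,p)$, $\pi = \pi(x,p)$ satisfy
$$\dot\xi = H^*_\pi, \qquad \dot\pi = -H^*_\xi \tag{4.6.2}$$
with $H^*(t,\xi(x,p),\pi(x,p)) = H(t,x,p)$. Equation (4.6.1) constitutes a system of ODE and if the assumptions of the Picard-Lindelöf theorem are satisfied, a solution exists for given initial values $x(t_0) = x_0$, $p(t_0) = p_0$ on some interval $[t_0,t_1]$. For any $t \in [t_0,t_1]$, we then obtain such a transformation by letting $\xi(x,p) = x(t)$, $\pi(x,p) = p(t)$ where $(x(t),p(t))$ is the solution of (4.6.1) with $x(t_0) = x$, $p(t_0) = p$. Thus, the evolution of (4.6.1) in time $t$, the so-called Hamiltonian flow, yields 'canonical transformations'. However, the concept of canonical transformations is more general as we now shall see. Since
$$\dot\xi^i = \frac{\partial\xi^i}{\partial x^j}\dot x^j + \frac{\partial\xi^i}{\partial p_j}\dot p_j = \frac{\partial\xi^i}{\partial x^j}H_{p_j} - \frac{\partial\xi^i}{\partial p_j}H_{x^j}, \qquad \dot\pi_i = \frac{\partial\pi_i}{\partial x^j}H_{p_j} - \frac{\partial\pi_i}{\partial p_j}H_{x^j}$$
and
$$H^*_{\pi_i} = H_{x^j}\frac{\partial x^j}{\partial\pi_i} + H_{p_j}\frac{\partial p_j}{\partial\pi_i}, \qquad H^*_{\xi^i} = H_{x^j}\frac{\partial x^j}{\partial\xi^i} + H_{p_j}\frac{\partial p_j}{\partial\xi^i},$$
we obtain the conditions
$$\frac{\partial\xi^i}{\partial x^j} = \frac{\partial p_j}{\partial\pi_i}, \quad \frac{\partial\xi^i}{\partial p_j} = -\frac{\partial x^j}{\partial\pi_i}, \quad \frac{\partial\pi_i}{\partial x^j} = -\frac{\partial p_j}{\partial\xi^i}, \quad \frac{\partial\pi_i}{\partial p_j} = \frac{\partial x^j}{\partial\xi^i}, \tag{4.6.3}$$
or in matrix notation
$$\begin{pmatrix} \frac{\partial\xi}{\partial x} & \frac{\partial\xi}{\partial p} \\ \frac{\partial\pi}{\partial x} & \frac{\partial\pi}{\partial p} \end{pmatrix}^{-1} = \begin{pmatrix} \left(\frac{\partial\pi}{\partial p}\right)^T & -\left(\frac{\partial\xi}{\partial p}\right)^T \\ -\left(\frac{\partial\pi}{\partial x}\right)^T & \left(\frac{\partial\xi}{\partial x}\right)^T \end{pmatrix}, \tag{4.6.4}$$

† A diffeomorphism is a bijective map that together with its inverse is everywhere differentiable.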
where $A^T$ denotes the transpose of a matrix $A$. Obviously, this is a condition that does not depend anymore on the particular Hamiltonian $H$.

Definition 4.6.1. A diffeomorphism $\psi : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$, $(x,p) \mapsto (\xi,\pi)$, satisfying (4.6.3) (or, equivalently, (4.6.4)) is called canonical transformation.

Canonical transformations can often be used to simplify the canonical equations. Before we return to that topic, however, we interrupt the discussion of the Hamilton-Jacobi theory in order to describe some basic points of symplectic geometry (for more information on that subject, we refer to D. McDuff, D. Salamon, Introduction to Symplectic Topology, Oxford University Press, Oxford, 1995). We denote the $(n \times n)$ unit matrix by $I_n$ and put
$$J := \begin{pmatrix} 0 & -I_n \\ I_n & 0 \end{pmatrix}.$$
Then obviously
$$J^2 = -I_{2n}. \tag{4.6.5}$$
Equation (4.6.4) may then be written as
$$(D\psi)^{-1} = -J(D\psi)^TJ, \tag{4.6.6}$$
or equivalently
$$(D\psi)^TJD\psi = J. \tag{4.6.7}$$
In this connection, a $\psi$ satisfying (4.6.7), i.e. a canonical transformation, is also called symplectomorphism. From these relations, one also easily sees that $\psi$ is a canonical transformation iff $\psi^{-1}$ is. In terms of $J$, the canonical equations (4.6.1) can also be written as
$$\dot z = -J\nabla H(t,z) \tag{4.6.8}$$
where $z = (x,p)$, $\nabla H(t,z) = (H_x,H_p)$. For a reader who knows the calculus of exterior differential forms, the following explanation should be useful. We consider the two-form
$$\omega = dx^i \wedge dp_i \quad\text{on } \mathbb{R}^{2n}$$
(here, as always, we use a summation convention: $dx^i \wedge dp_i$ means $\sum_{i=1}^n dx^i \wedge dp_i$). According to the transformation rules for exterior differential forms (i.e. $d\xi^j = \frac{\partial\xi^j}{\partial x^i}dx^i + \frac{\partial\xi^j}{\partial p_i}dp_i$ etc.), we have, for $\xi = \xi(x,p)$, $\pi = \pi(x,p)$,
$$d\xi^j \wedge d\pi_j = \left(\frac{\partial\xi^j}{\partial x^i}\frac{\partial\pi_j}{\partial p_k} - \frac{\partial\xi^j}{\partial p_k}\frac{\partial\pi_j}{\partial x^i}\right) dx^i \wedge dp_k.$$
Thus, $\omega$ remains invariant under the transformation $\psi$, i.e.
$$d\xi^j \wedge d\pi_j = dx^i \wedge dp_i, \tag{4.6.9}$$
precisely if $\psi$ is a canonical transformation. In fact, this is often used as the definition of a canonical transformation. If $\omega$ is left invariant under $\psi$, so is
$$\omega^n := \underbrace{\omega \wedge \cdots \wedge \omega}_{n \text{ times}} = n!\,(-1)^{\frac{n(n-1)}{2}}\,dx^1 \wedge \cdots \wedge dx^n \wedge dp_1 \wedge \cdots \wedge dp_n. \tag{4.6.10}$$
Since
$$d\xi^1 \wedge \cdots \wedge d\xi^n \wedge d\pi_1 \wedge \cdots \wedge d\pi_n = (\det D\psi)\,dx^1 \wedge \cdots \wedge dx^n \wedge dp_1 \wedge \cdots \wedge dp_n,$$
we conclude Liouville's:

Theorem 4.6.1. Every canonical transformation $\psi : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ satisfies
$$\det D\psi = 1. \tag{4.6.11}$$
q.e.d.
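The symplectic condition $(D\psi)^TJD\psi = J$ and the determinant statement (4.6.11) are easy to test for $n = 1$. The following sketch is an illustration under assumptions made here: it uses the time-$t$ flow of the sample Hamiltonian $H = \frac{1}{2}(p^2 + x^2)$, whose Jacobian is a rotation of phase space, takes $J$ with entries $(0,-1;1,0)$, and compares with the scaling $(x,p) \mapsto (2x,p)$; the helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def symplectic_form(d, j):
    """Compute D^T J D."""
    return matmul(matmul(transpose(d), j), d)

J = [[0.0, -1.0], [1.0, 0.0]]  # the matrix J for n = 1

def flow_jacobian(t):
    """Jacobian of the time-t flow of H = (p^2 + x^2)/2: x' = p, p' = -x."""
    c, s = math.cos(t), math.sin(t)
    return [[c, s], [-s, c]]

D = flow_jacobian(0.7)
flow_form = symplectic_form(D, J)            # should reproduce J
det_D = D[0][0] * D[1][1] - D[0][1] * D[1][0]

scaling = [[2.0, 0.0], [0.0, 1.0]]           # (x, p) -> (2x, p)
scaling_form = symplectic_form(scaling, J)   # equals 2J, so the condition fails
```

For $2 \times 2$ matrices $D^TJD = (\det D)\,J$, so in this case the condition is equivalent to $\det D = 1$; the rotation passes the test and the scaling does not.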
One also expresses this result by saying that a canonical transformation is volume preserving in phase space as $dx^1 \wedge \cdots \wedge dx^n \wedge dp_1 \wedge \cdots \wedge dp_n$ can be interpreted as the volume form of $\mathbb{R}^{2n}$. By what was observed in the beginning of this section, this applies in particular to the Hamiltonian flow which constitutes Liouville's original statement. After this excursion and interruption, we return to our canonical equations (4.6.1) and try to simplify them by suitable canonical transformations. Canonical transformations may be easily obtained from the variational integral
$$I = \int_{t_1}^{t_2} L(t,x,\dot x)\,dt$$
with
$$L(t,x,\dot x) = \dot x \cdot p - H(t,x,p) \quad (p = L_{\dot x}).$$
If $W$ is any differentiable function, then
$$I^* := \int_{t_1}^{t_2} \left(L + \frac{dW}{dt}\right) dt$$
has the same critical points as $I$, because
$$I^* = I + W(t_2) - W(t_1),$$
so that $I^*$ and $I$ differ only by a constant independent of the particular path $x(t)$. Thus, we may for example take any function $W(t,x,\xi)$ and require that for all choices of $x, \xi, \dot x, \dot\xi$
$$\dot x \cdot p - H(t,x,p) = \dot\xi \cdot \pi - H^*(t,\xi,\pi) + \frac{dW}{dt}. \tag{4.6.12}$$
Then, with $\Lambda(t,\xi,\dot\xi) := \dot\xi \cdot \pi - H^*(t,\xi,\pi)$,
$$I^* = \int_{t_1}^{t_2} \Lambda(t,\xi,\dot\xi)\,dt$$
differs from $I$ only by a constant. Thus, if $x(t)$ is a critical path for $I$, $\xi(x(t),p(t))$ then becomes a critical path for $I^*$. Since
$$\frac{dW}{dt} = W_t + W_x \cdot \dot x + W_\xi \cdot \dot\xi,$$
(4.6.12) becomes
$$\dot x \cdot (p - W_x) - \dot\xi \cdot (\pi + W_\xi) - H + H^* - W_t = 0. \tag{4.6.13}$$
Since (4.6.13) is required to hold for all choices of $x, \xi, \dot x, \dot\xi$, we obtain:

Theorem 4.6.2. Given an arbitrary (differentiable) function $W(t,x,\xi)$, a canonical transformation (transforming (4.6.1) into (4.6.2)) is obtained through the equations
$$p = W_x, \qquad \pi = -W_\xi, \qquad H^* = H, \text{ i.e. } W_t = 0. \tag{4.6.14}$$

$W_t = 0$ of course means that $W = W(x,\xi)$. In the same manner, we may also take a function $W(t,p,\xi)$, $W(t,x,\pi)$ or $W(t,p,\pi)$. In the first case, we obtain for example the equations
$$x = W_p, \qquad H^* = H, \text{ i.e. } W_t = 0.$$
Here and above, of course $H^* = H^*(t,\xi,\pi)$. We may now easily explain Jacobi's method for solving the canonical equations. We try to find $W(x,\xi)$ satisfying
$$H(t,x,W_x(x,\xi)) = H^*(\xi), \tag{4.6.15}$$
i.e. reduce the Hamiltonian to a function of the variable $\xi$ alone. We have to require that
$$\det W_{x^i\xi^j} \ne 0. \tag{4.6.16}$$
This ensures that the equation $\pi = -W_\xi$ determines $x$, and $p$ then is determined from $p = W_x$. If (4.6.15) holds, (4.6.2) becomes
$$\dot\xi = 0, \qquad \dot\pi = -H^*_\xi. \tag{4.6.17}$$
This implies that $\xi^1,\dots,\xi^n$ are constants of motion (i.e. independent of $t$), or so-called integrals of the Hamiltonian flow. A system for which
100
The theory of Hamilton and Jacobi
$n$ independent integrals can be found is called completely integrable. Thus, if we can find a so-called generating function $W(x,\xi)$ of the above type reducing the Hamiltonian to a function of $\xi$ alone, the canonical system is completely integrable. Clearly, since in this case $\xi^1, \dots, \xi^n$ are constant in $t$, the relation $\dot\pi = -H_\xi(\xi)$ can then be used to determine $\pi_1, \dots, \pi_n$. In other words, a completely integrable canonical system may be solved explicitly through quadratures. Actually, one may show in this case that the sets $T_c = \{\xi^1 = c^1, \dots, \xi^n = c^n\}$ for a constant vector $c = (c^1, \dots, c^n)$ are $n$-dimensional tori, if compact and connected. Thus, the so-called phase space $\{(x,p) \in \mathbb{R}^{2n}\}$ is foliated by tori that are invariant under the motion, and on each such torus, the motion is given by straight lines. It should be pointed out, however, that completely integrable dynamical systems are quite rare, in the sense that complete integrability usually depends on particular symmetries, and their dynamical behaviour is quite exceptional in the class of all Hamiltonian systems. The invariant tori may disappear under arbitrarily small perturbations. By way of contrast, the Kolmogorov-Arnold-Moser theory asserts that these invariant tori persist under sufficiently small and smooth perturbations if the components of $H_\xi$ are rationally independent and satisfy certain Diophantine inequalities, and if the matrix $H_{\xi\xi}$ of second derivatives is invertible.

In the older literature, the notion of 'canonical transformation' is usually applied to any transformation $\psi : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ that preserves the form of the canonical equations, i.e. (4.6.1) is transformed into (4.6.2), but without requiring that $H^*(t,\xi,\pi) = H(t,x,p)$. An example of a canonical transformation in this wider sense is $\xi = 2x$, $\pi = p$ with $H^* = 2H$.
If we now take a generating function $W(t,x,\xi)$ as above, the Hamiltonian is transformed into
$$H^* = H + W_t \tag{4.6.18}$$
while the first two relations of (4.6.14), i.e.
$$p = W_x, \qquad \pi = -W_\xi \tag{4.6.19}$$
still hold. This may be used to explain Jacobi's theorem once more, as we shall now see. Let $I(t, x^1, \dots, x^n, \lambda_1, \dots, \lambda_n)$ be a solution of the Hamilton-Jacobi equation
$$I_t + H(t,x,I_x) = 0, \tag{4.6.20}$$
depending on parameters $\lambda_1, \dots, \lambda_n$ and satisfying as usual
$$\det I_{x^i\lambda_j} \neq 0. \tag{4.6.21}$$
We now choose $W(t, x^1, \dots, x^n, \xi_1, \dots, \xi_n) = I(t, x^1, \dots, x^n, \xi_1, \dots, \xi_n)$. The corresponding transformation then is
$$p = I_x, \qquad \pi = -I_\xi, \qquad H^*(t,\xi,\pi) = H(t,x,p) + I_t. \tag{4.6.22}$$
Because of (4.6.20), $H^* \equiv 0$. Thus, the new canonical equations are just
$$\dot\xi = 0, \qquad \dot\pi = 0.$$
Solutions are of course
$$\xi = \lambda = \text{constant}, \qquad \pi = -I_\lambda = -\mu = \text{constant}.$$
We have thus obtained the statement of Jacobi's Theorem 4.2.1, namely that from a solution of (4.6.20) with (4.6.21), we may obtain solutions of the canonical equations by solving
$$I_\lambda = \mu, \qquad I_x = p$$
with parameters $\lambda = (\lambda_1, \dots, \lambda_n)$, $\mu = (\mu_1, \dots, \mu_n)$.
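As a minimal illustration of this procedure, one may take $n = 1$ and the free-particle Hamiltonian $H(t,x,p) = \tfrac12 p^2$. Both this Hamiltonian and the solution $I$ below are assumptions chosen here for the sake of the example; they are not taken from the text. The computation only checks (4.6.20), (4.6.21) and the relations $I_\lambda = \mu$, $I_x = p$ in this special case.

```latex
\begin{align*}
I(t,x,\lambda) &= \lambda x - \tfrac12 \lambda^2 t, \\
I_t + H(t,x,I_x) &= -\tfrac12 \lambda^2 + \tfrac12 \lambda^2 = 0,
  \qquad I_{x\lambda} = 1 \neq 0, \\
I_\lambda = \mu \;&\Longleftrightarrow\; x - \lambda t = \mu
  \;\Longleftrightarrow\; x(t) = \mu + \lambda t, \\
p &= I_x = \lambda, \\
\dot x &= \lambda = H_p, \qquad \dot p = 0 = -H_x .
\end{align*}
```

So the two-parameter family $x(t) = \mu + \lambda t$, $p(t) = \lambda$ obtained from $I$ indeed solves the canonical equations of this Hamiltonian.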
Classical references for this chapter include: C.G.J. Jacobi, Vorlesungen über Analytische Mechanik (ed. H. Pulte), Vieweg, Braunschweig, Wiesbaden, 1996; C. Carathéodory, Variationsrechnung und partielle Differentialgleichungen erster Ordnung, Teubner, Leipzig, 1935; R. Courant, D. Hilbert, Methoden der Mathematischen Physik II, Springer, Berlin, 2nd edition, 1968. The global aspects are developed in V.I. Arnold, Mathematical Methods of Classical Mechanics, GTM 60, Springer, New York, 1978. A recent advanced monograph is H. Hofer, E. Zehnder, Symplectic Invariants and Hamiltonian Dynamics, Birkhäuser, Basel, 1994. That text will give readers a good perspective on the present research directions in the field.
Exercises

4.1 Discuss the relation between the canonical equations for the energy functional $E$ and the equations for geodesics derived in Chapter 2.

4.2 (Kepler problem) Consider the Lagrangian
$$L(x,\dot x) = \frac{1}{2}|\dot x|^2 + \frac{\lambda}{2|x|} \qquad \text{for } x \in \mathbb{R}^3.$$
Compute the corresponding Hamiltonian and write down the canonical equations. Show that the three components of the angular momentum $x \wedge \dot x$ are integrals of the Hamiltonian flow.

4.3 For smooth functions $F, G : \mathbb{R}^{2n} \to \mathbb{R}$, define their Poisson bracket as
$$\{F,G\} := \sum_{i=1}^{n} \left( F_{x^i} G_{p_i} - F_{p_i} G_{x^i} \right),$$
where $z = (x,p) = (x^1, \dots, x^n, p_1, \dots, p_n)$ are Euclidean coordinates of $\mathbb{R}^{2n}$. Let $z(t) = (x(t),p(t))$ be a solution of a canonical system
$$\dot x = H_p, \qquad \dot p = -H_x$$
for some Hamiltonian $H(x,p)$ that is independent of $t$. Show that for any (smooth) $F : \mathbb{R}^{2n} \to \mathbb{R}$
$$\frac{d}{dt} F(z(t)) = \{F,H\}(z(t)).$$
Show that the Poisson bracket is antisymmetric, i.e.
$$\{F,G\} = -\{G,F\},$$
and satisfies the Jacobi identity
$$\{\{F,G\},L\} + \{\{G,L\},F\} + \{\{L,F\},G\} = 0$$
for all smooth $F, G, L$.

4.4 Show that a diffeomorphism $\psi : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ is a canonical transformation if
$$\{F \circ \psi, G \circ \psi\} = \{F,G\} \circ \psi$$
for all smooth $F, G$.
5 Dynamic optimization
Optimal control theory is concerned with time dependent processes that can be influenced or controlled via the tuning of certain parameters. The aim is to choose these parameters in such a manner that a desired result is achieved and the cost resulting from the intermediate states of the process and from the application or change of the parameters is minimized. In some problems, the control parameter can be applied only at discrete time steps, while other problems can be continuously controlled. As we shall see, however, the discrete and the continuous case can be treated by the same principles. Since the end result may be prescribed, and the value of a parameter at some given time influences the state of the system at subsequent times and therefore typically will also contribute through this influence to the cost of the process at those later times, the determination of the optimal control parameters is best performed in a backward manner. This means in the discrete case that one first selects the best value of the control parameter at the last stage, whatever state the system is in at that time, then the value at the second-to-last stage, so that at this step the contribution of the value of the control parameter at the last stage to the total cost function is already determined and one only needs to optimize the cost function w.r.t. the second-to-last parameter value, and so on.
5.1 Discrete control problems

We consider a process with $n$ states $x_1, \dots, x_n \in \mathbb{R}^d$. At each state $x_i$, we may choose a control parameter
$$\lambda_i \in \Lambda_i, \tag{5.1.1}$$
where $\Lambda_i$ is a given control restriction ($\Lambda_i \subset \mathbb{R}^c$), to determine
$$x_{i+1} = \varphi_i(x_i,\lambda_i) \tag{5.1.2}$$
with cost
$$k_i(x_i,\lambda_i).$$
The total cost of the process starting at the initial state $x_\nu$ is
$$K_\nu(x_\nu;\lambda_\nu,\dots,\lambda_n) := \sum_{i=\nu}^{n} k_i(x_i,\lambda_i), \qquad \text{with } x_{i+1} = \varphi_i(x_i,\lambda_i). \tag{5.1.3}$$
We wish to minimize the total cost of the process and define the Bellman function
$$I_\nu(x_\nu) := \inf_{\substack{\lambda_i \in \Lambda_i \\ i = \nu,\dots,n}} K_\nu(x_\nu;\lambda_\nu,\dots,\lambda_n) \qquad (\nu = 1,\dots,n). \tag{5.1.4}$$
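Theorem 5.1.1. The Bellman function satisfies the Bellman equation
$$I_\nu(x_\nu) = \inf_{\lambda_\nu \in \Lambda_\nu} \bigl( k_\nu(x_\nu,\lambda_\nu) + I_{\nu+1}(\varphi_\nu(x_\nu,\lambda_\nu)) \bigr) \qquad \text{for } \nu = 1,\dots,n \tag{5.1.5}$$
(here, we put $I_{n+1} \equiv 0$). Furthermore, $(\lambda_\nu,\dots,\lambda_n) \in \Lambda_\nu \times \dots \times \Lambda_n$, $(x_\nu,\dots,x_n)$ with (5.1.2) are solutions of (5.1.4) iff
$$I_j(x_j) = k_j(x_j,\lambda_j) + I_{j+1}(x_{j+1}) \qquad \text{for } j = \nu,\dots,n. \tag{5.1.6}$$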
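Proof. Since
$$K_\nu(x_\nu;\lambda_\nu,\dots,\lambda_n) = k_\nu(x_\nu,\lambda_\nu) + K_{\nu+1}(\varphi_\nu(x_\nu,\lambda_\nu);\lambda_{\nu+1},\dots,\lambda_n),$$
we get
$$\begin{aligned}
I_\nu(x_\nu) &= \inf_{\substack{\lambda_i \in \Lambda_i \\ i = \nu,\dots,n}} K_\nu(x_\nu;\lambda_\nu,\dots,\lambda_n) \\
&= \inf_{\lambda_\nu \in \Lambda_\nu} \Bigl( k_\nu(x_\nu,\lambda_\nu) + \inf_{\substack{\lambda_j \in \Lambda_j \\ j = \nu+1,\dots,n}} K_{\nu+1}(\varphi_\nu(x_\nu,\lambda_\nu);\lambda_{\nu+1},\dots,\lambda_n) \Bigr) \\
&= \inf_{\lambda_\nu \in \Lambda_\nu} \bigl( k_\nu(x_\nu,\lambda_\nu) + I_{\nu+1}(\varphi_\nu(x_\nu,\lambda_\nu)) \bigr),
\end{aligned}$$
which is (5.1.5). For $(\lambda_\nu,\dots,\lambda_n) \in \Lambda_\nu \times \dots \times \Lambda_n$, $x_{j+1} = \varphi_j(x_j,\lambda_j)$ for $j = \nu,\dots,n$, we obtain by repeated application of (5.1.5)
$$I_\nu(x_\nu) \le k_\nu(x_\nu,\lambda_\nu) + I_{\nu+1}(x_{\nu+1}) \le \dots \le k_\nu(x_\nu,\lambda_\nu) + \dots + k_n(x_n,\lambda_n) = K_\nu(x_\nu;\lambda_\nu,\dots,\lambda_n).$$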
If the infimum w.r.t. $\lambda_j \in \Lambda_j$ ($j = \nu,\dots,n$) is realized, we must have equality, and (5.1.6) follows. q.e.d.

Corollary 5.1.1. $(\lambda_1,\dots,\lambda_n) \in \Lambda_1 \times \dots \times \Lambda_n$, $(x_1,\dots,x_n)$ with (5.1.2) is a solution of (5.1.4), iff for all $\nu = 1,\dots,n$, $(\lambda_\nu,\dots,\lambda_n) \in \Lambda_\nu \times \dots \times \Lambda_n$, $(x_\nu,\dots,x_n)$ with (5.1.2) is a solution of (5.1.4).

Corollary 5.1.2. (Bellman's method) An optimal solution of the process can be calculated as follows: For any value of $x_n$, compute $\lambda_n(x_n)$ minimizing (5.1.5) for $\nu = n$. Having computed $\lambda_j(x_j)$ for $j = \nu+1,\dots,n$, compute $\lambda_\nu(x_\nu)$ for any value of $x_\nu$ so as to minimize (5.1.5) and put $x_{\nu+1} = \varphi_\nu(x_\nu,\lambda_\nu(x_\nu))$. For an arbitrary initial value $x_1$, an optimal process thus is given by: $\lambda_1 := \lambda_1(x_1)$, $x_2 := \varphi_1(x_1,\lambda_1)$, $\lambda_2 := \lambda_2(x_2)$, $\dots$
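Bellman's method can be sketched in a few lines of code under simplifying assumptions that are not part of the text: the state space is replaced by a finite set `states` that is closed under all transitions, and each control set is finite, so that every infimum in (5.1.5) is a minimum over finitely many values. The names `bellman_backward`, `rollout`, `controls`, `phi` and `k` are hypothetical and chosen only for this illustration.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def bellman_backward(
    n: int,
    states: Sequence[Hashable],
    controls: Callable[[int, Hashable], Sequence[Hashable]],
    phi: Callable[[int, Hashable, Hashable], Hashable],
    k: Callable[[int, Hashable, Hashable], float],
) -> Tuple[Dict[int, Dict[Hashable, float]], Dict[int, Dict[Hashable, Hashable]]]:
    """Backward recursion (5.1.5) on a finite state set (an assumption of this sketch).

    Returns (I, policy): I[nu][x] is the Bellman function I_nu(x) and
    policy[nu][x] is a control realizing the minimum in (5.1.5).
    phi must map every (nu, x, lam) back into `states`.
    """
    I = {n + 1: {x: 0.0 for x in states}}  # convention I_{n+1} = 0
    policy: Dict[int, Dict[Hashable, Hashable]] = {}
    for nu in range(n, 0, -1):  # last stage first
        I[nu], policy[nu] = {}, {}
        for x in states:
            candidates = (
                (k(nu, x, lam) + I[nu + 1][phi(nu, x, lam)], lam)
                for lam in controls(nu, x)
            )
            best_cost, best_lam = min(candidates, key=lambda pair: pair[0])
            I[nu][x] = best_cost
            policy[nu][x] = best_lam
    return I, policy


def rollout(n: int, x1: Hashable, policy, phi) -> Tuple[List[Hashable], List[Hashable]]:
    """Forward pass of Corollary 5.1.2: lambda_1 = lambda_1(x_1), x_2 = phi_1(x_1, lambda_1), ..."""
    xs, lams = [x1], []
    for nu in range(1, n + 1):
        lam = policy[nu][xs[-1]]
        lams.append(lam)
        xs.append(phi(nu, xs[-1], lam))
    return xs, lams
```

For instance, with `states = range(-3, 4)`, controls `(-1, 0, 1)`, transition `phi = lambda nu, x, lam: max(-3, min(3, x + lam))` and cost `k = lambda nu, x, lam: float(x * x + lam * lam)`, the value `I[1][x1]` agrees with a brute-force minimum over all control sequences, while the backward pass only visits each (stage, state, control) triple once.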
5.2 Continuous control problems

We want to minimize
$$K(t_1,x(t_1))$$
for a path $x : [t_0,t_1] \to \mathbb{R}^d$ under the following conditions: We have the initial condition
$$x(t_0) = x_0$$
and the final condition
$$x(t_1) \in B_1$$
with a given set $B_1 \subset \mathbb{R}^d$. We have the control equation
$$\dot x(t) = f(t,x(t),\lambda(t)) \qquad \text{for almost all } t \in (t_0,t_1)$$
for a piecewise continuous control function $\lambda(t)$ satisfying
$$\lambda(t) \in \Lambda$$
for some given $\Lambda \subset \mathbb{R}^c$. Pairs $(\lambda(t),x(t))$ satisfying all these restrictions are called admissible, and the set of admissible pairs is called $P(t_0,x_0)$. We put
$$I(t_0,x_0) := \inf_{(\lambda(t),x(t)) \in P(t_0,x_0)} K(t_1,x(t_1))$$
(Bellman function).

Lemma 5.2.1. (i) $I(t_1,x_1) = K(t_1,x_1)$ for all $x_1 \in B_1$. (ii) For any path $(\lambda(t),x(t)) \in P(t_0,x_0)$, $I(t,x(t))$ is a monotonically increasing function of $t \in [t_0,t_1]$.

Proof. (i) is obvious. For (ii), if $t_0 \le \tau_1 \le \tau_2 \le t_1$, the set of all admissible paths from $(\tau_2,x(\tau_2))$ to $(t_1,B_1)$ can be considered as a subset of those from $(\tau_1,x(\tau_1))$ to $(t_1,B_1)$. Namely, if we have any path from $(\tau_2,x(\tau_2))$ to $(t_1,x_1)$ for some $x_1 \in B_1$, we may compose it with $x(t)|_{[\tau_1,\tau_2]}$ to obtain a path from $(\tau_1,x(\tau_1))$ to $(t_1,x_1)$. Thus, every endpoint in $B_1$ that can be reached from $(\tau_2,x(\tau_2))$ by an admissible path can also be reached from $(\tau_1,x(\tau_1))$ by an admissible path. This implies monotonicity. q.e.d.

Theorem 5.2.1. $(\bar\lambda(t),\bar x(t))$ is a solution of the problem iff $I(t,\bar x(t))$ is constant in $t$. Moreover, if there exist a function $J(t,x)$ that satisfies $J(t_1,x_1) = K(t_1,x_1)$ for all $x_1 \in B_1$ and is monotonically increasing along any admissible path, and an admissible path $(\bar\lambda(t),\bar x(t))$ along which $J$ is constant, then that path is a solution of the problem.

Proof. For a solution,
$$I(t_0,x_0) = K(t_1,\bar x(t_1)) = I(t_1,\bar x(t_1)) \qquad (x_0 = \bar x(t_0)), \tag{5.2.1}$$
and $I(t,\bar x(t))$ then is constant by Lemma 5.2.1 (ii). If $I(t,\bar x(t))$ is constant, then (5.2.1) holds, and by Lemma 5.2.1, we have a solution. Given $J$ as described, by the monotonicity of $J$, for any admissible path
$$J(t_0,x_0) \le K(t_1,x(t_1)),$$
and for the path $(\bar\lambda(t),\bar x(t))$,
$$J(t_0,x_0) = J(t_1,\bar x(t_1)) = K(t_1,\bar x(t_1)),$$
and optimality follows. q.e.d.
Lemma 5.2.1 implies that for those $t$ for which $I(t,x(t))$ is differentiable ($(\lambda(t),x(t)) \in P(t_0,x_0)$)
$$I_t(t,x(t)) + I_x(t,x(t))\,f(t,x(t),\lambda(t)) \ge 0.$$
For an optimal $(\lambda(t),x(t))$, we then have by Theorem 5.2.1
$$I_t(t,x(t)) + I_x(t,x(t))\,f(t,x(t),\lambda(t)) = 0.$$
Corollary 5.2.1. (Bellman equation) Let $\tau \in [t_0,t_1]$, $\xi \in \mathbb{R}^d$. Assume that for every $\lambda \in \Lambda$, there exists an admissible pair $(\lambda(t),x(t))$ with $\lambda(\tau) = \lambda$, $x(\tau) = \xi$. Then
$$\inf_{\lambda \in \Lambda} \bigl( I_t(\tau,\xi) + I_x(\tau,\xi)\,f(\tau,\xi,\lambda) \bigr) = 0.$$

Proof. This follows from the proof of Lemma 5.2.1. Namely, the assumption implies that we may select $\lambda$ such that the path is optimal at the point $(\tau,\xi)$ under consideration. q.e.d.

Example. We want to minimize the integral
$$\int_{t_0}^{t_1} \bigl( u^2(t) + \lambda^2(t) \bigr)\,dt$$
with the initial condition $u(t_0) = u_0$ and the control equation
$$\dot u(t) = \alpha u(t) + \beta\lambda(t) \qquad \text{with given } \alpha, \beta \in \mathbb{R}. \tag{5.2.2}$$
In order to express this problem as a control problem, we introduce a new dependent variable $v(t)$ as solution of the equation
$$\dot v(t) = u^2(t) + \lambda^2(t), \qquad v(t_0) = 0. \tag{5.2.3}$$
We then want to minimize
$$v(t_1).$$
Given $\rho : [t_0,t_1] \to \mathbb{R}$ with $\rho(t_1) = 0$ and satisfying the Riccati equation
$$\dot\rho(t) = -2\alpha\rho(t) + \beta^2\rho^2(t) - 1, \tag{5.2.4}$$
we put
$$J(t,u,v) = \rho(t)u^2(t) + v(t).$$
Then
$$J(t_1,u(t_1),v(t_1)) = v(t_1)$$
and from (5.2.2), (5.2.3), (5.2.4)
$$\frac{d}{dt} J(t,u(t),v(t)) = \beta^2\rho^2u^2 + 2\rho\beta u\lambda + \lambda^2 = (\beta\rho u + \lambda)^2 \ge 0,$$
and this expression vanishes precisely if
$$\lambda(t) = -\beta\rho(t)u(t). \tag{5.2.5}$$
By Theorem 5.2.1, $x(t) = (u(t),v(t))$ and $\lambda(t) = -\beta\rho(t)u(t)$ yield an optimal solution.

If we substitute $\lambda(t)$ through the control equation (5.2.2) in the variational integral, we obtain the integral
$$\int_{t_0}^{t_1} \Bigl( \frac{1}{\beta^2}\dot u(t)^2 + \Bigl(1 + \frac{\alpha^2}{\beta^2}\Bigr) u(t)^2 - 2\frac{\alpha}{\beta^2}\,u(t)\dot u(t) \Bigr)\,dt,$$
which is essentially the same as the one considered at the end of 4.2 with integrand given by (4.2.28). We recall that the latter had also been reduced to a Riccati equation. Equation (5.2.5) expresses the control parameter as a function of the state of the system. We thus have a feedback control: knowing the state at a given time determines the control needed to reach an optimal state at the next time.
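The feedback law (5.2.5) can be checked numerically. The following sketch integrates the Riccati equation (5.2.4) backward from $\rho(t_1) = 0$ with a classical Runge-Kutta scheme and then integrates (5.2.2), (5.2.3) forward under a control of the form $\lambda = g(t)u$. Since $J$ is constant along the optimal path, the optimal cost $v(t_1)$ should equal $J(t_0,u_0,0) = \rho(t_0)u_0^2$. The function names, the choice of integrator and the grid layout are assumptions of this illustration, not part of the text.

```python
from typing import List


def riccati_grid(alpha: float, beta: float, t0: float, t1: float, steps: int) -> List[float]:
    """Solve (5.2.4) backward from rho(t1) = 0 with RK4.

    Returns rho on a uniform grid of 2*steps + 1 points, i.e. at the
    `steps` integration nodes used later and at their midpoints.
    """
    h = (t1 - t0) / (2 * steps)

    def rhs(r: float) -> float:
        return -2.0 * alpha * r + beta ** 2 * r ** 2 - 1.0

    rho = [0.0] * (2 * steps + 1)
    for i in range(2 * steps, 0, -1):  # step from t_i back to t_{i-1}
        r = rho[i]
        k1 = rhs(r)
        k2 = rhs(r - 0.5 * h * k1)
        k3 = rhs(r - 0.5 * h * k2)
        k4 = rhs(r - h * k3)
        rho[i - 1] = r - h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return rho


def closed_loop_cost(alpha: float, beta: float, t0: float, t1: float,
                     u0: float, steps: int, gain: List[float]) -> float:
    """Integrate (5.2.2), (5.2.3) with lambda(t) = gain(t) * u(t) and return v(t1).

    `gain` lives on the same half-step grid as the output of riccati_grid.
    """
    h = (t1 - t0) / steps

    def f(u: float, g: float):
        lam = g * u
        return alpha * u + beta * lam, u * u + lam * lam

    u, v = u0, 0.0
    for i in range(steps):
        g0, gm, g1 = gain[2 * i], gain[2 * i + 1], gain[2 * i + 2]
        a1, b1 = f(u, g0)
        a2, b2 = f(u + 0.5 * h * a1, gm)
        a3, b3 = f(u + 0.5 * h * a2, gm)
        a4, b4 = f(u + h * a3, g1)
        u += h * (a1 + 2 * a2 + 2 * a3 + a4) / 6.0
        v += h * (b1 + 2 * b2 + 2 * b3 + b4) / 6.0
    return v


# Example: alpha = 0, beta = 1 on [0, 1] with u0 = 1.
rho = riccati_grid(0.0, 1.0, 0.0, 1.0, 200)
optimal = closed_loop_cost(0.0, 1.0, 0.0, 1.0, 1.0, 200, [-1.0 * r for r in rho])
uncontrolled = closed_loop_cost(0.0, 1.0, 0.0, 1.0, 1.0, 200, [0.0] * len(rho))
```

For $\alpha = 0$, $\beta = 1$ the Riccati equation reads $\dot\rho = \rho^2 - 1$ with solution $\rho(t) = \tanh(t_1 - t)$, so `rho[0]` and `optimal` should both be close to $\tanh(1)$, while the uncontrolled cost ($\lambda \equiv 0$, hence $u \equiv 1$) is close to $1$.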
5.3 The Pontryagin maximum principle

We consider the control problem
$$\int_{t_0}^{t_1} F(t,x(t),\lambda(t))\,dt \to \min \tag{5.3.1}$$
with the control conditions
$$x(t_0) = x_0, \qquad \dot x(t) = f(t,x(t),\lambda(t)) \tag{5.3.2}$$
with controls $\lambda(t) \in \Lambda \subset \mathbb{R}^c$ and the end condition
$$g(t_1,x(t_1)) = 0. \tag{5.3.3}$$
Here, $\lambda(t)$ is required to be piecewise continuous, and $x(t)$ to be continuous. (Equation (5.3.2) then has to be interpreted as an integral equation $x(t) = x_0 + \int_{t_0}^{t} f(\tau,x(\tau),\lambda(\tau))\,d\tau$.) $F$, $f$, and $g$ are required to be of class $C^1$. Also, $t_0$ is fixed, whereas $t_1 > t_0$ is variable subject to the restriction (5.3.3). We define the Pontryagin function
$$H(x,\lambda,p,t,\mu_0) := p \cdot f(t,x,\lambda) - \mu_0 F(t,x,\lambda).$$
We now state the Pontryagin maximum principle.

Theorem 5.3.1. If $(x(t),\lambda(t))$ is a solution of the control problem, there exist $\mu_0 \ge 0$, $a = (a_1,\dots,a_d) \in \mathbb{R}^d$ ($a \neq 0$ if $\mu_0 = 0$) and a continuous $p = (p_1,\dots,p_d)$ on $[t_0,t_1]$ such that at all points where $\lambda(t)$ is continuous, we have
$$H(x(t),\lambda(t),p(t),t,\mu_0) = \max_{\lambda \in \Lambda} H(x(t),\lambda,p(t),t,\mu_0) \tag{5.3.4}$$
and
$$\dot p = -H_x, \qquad \dot x = H_p, \tag{5.3.5}$$
and at the end point $t_1$, we have the transversality condition
$$p(t_1) = -\frac{\partial g_j}{\partial x}(t_1,x(t_1))\,a_j. \tag{5.3.6}$$
There also exists a continuous function $\eta : [t_0,t_1] \to \mathbb{R}$ such that at all points where $\lambda(t)$ is continuous
$$\eta(t) = H(x(t),\lambda(t),p(t),t,\mu_0) \tag{5.3.7}$$
and
$$\dot\eta(t) = H_t, \tag{5.3.8}$$
$$\eta(t_1) = \frac{\partial g_j}{\partial t}(t_1,x(t_1))\,a_j. \tag{5.3.9}$$
Also, one may always achieve $\mu_0 = 0$ or $1$.
Remarks:
(1) The equation $\dot x = H_p$ is just the control equation $\dot x = f(t,x(t),\lambda(t))$.
(2) If $\Lambda = \mathbb{R}^c$, then (5.3.4) becomes $H_\lambda(x(t),\lambda(t),p(t),t,\mu_0) = 0$.
(3) If we want to guarantee a fixed end time $t_1$, we simply introduce an additional variable $x^{d+1} = t$ with control conditions
$$\dot x^{d+1} = 1, \qquad x^{d+1}(t_0) = t_0$$
and end condition
$$x^{d+1}(t_1) = t_1.$$
We now want to exhibit the Hamilton-Jacobi theory as a special case of optimal control theory. Concretely, we want to derive the Euler-Lagrange equations, which are equivalent to the canonical equations of Chapter 4, from the Pontryagin maximum principle. We thus consider the variational problem
$$\int_{t_0}^{t_1} L(t,x(t),\dot x(t))\,dt \to \min$$
with $x(t_0) = x_0$, $x(t_1) = x_1$, $x : [t_0,t_1] \to \mathbb{R}^d$, and where $x(t)$ is required to have piecewise continuous first derivatives. We introduce the control variable through the control equation
$$\lambda(t) = \dot x(t)$$
with $\Lambda = \mathbb{R}^d$, i.e. no constraint imposed. We have $g(t_1,x(t_1)) = x_1 - x(t_1)$. The Pontryagin function of this problem is
$$H(x,\lambda,p,t,\mu_0) = p \cdot \lambda - \mu_0 L(t,x,\lambda).$$
By Theorem 5.3.1 there exist $\mu_0 = 0$ or $1$, $a \in \mathbb{R}^d$ ($a \neq 0$ for $\mu_0 = 0$)
and $p \in C^0([t_0,t_1],\mathbb{R}^d)$ with
$$\dot p = -H_x, \qquad p(t_1) = a,$$
$$H(x(t),\lambda(t),p(t),t,\mu_0) = \max_{\lambda \in \mathbb{R}^d} H(x(t),\lambda,p(t),t,\mu_0),$$
and $\eta \in C^0([t_0,t_1],\mathbb{R})$ with
$$\eta(t) = H(x(t),\lambda(t),p(t),t,\mu_0), \qquad \dot\eta(t) = H_t, \qquad \eta(t_1) = 0.$$
We now want to exclude that $\mu_0 = 0$. In that case, we would have
$$\dot\eta = H_t = 0, \quad \text{hence } \eta \equiv 0 \text{ since } \eta(t_1) = 0,$$
and
$$\dot p = -H_x = 0, \quad \text{hence } p \equiv a \text{ since } p(t_1) = a.$$
Thus $H = a \cdot \lambda$, and since $H_t = 0$ and hence $\eta \equiv 0$, $H(x(t),\lambda(t),p(t),t,0) = 0$. As this value is the maximum of $a \cdot \lambda$ over all $\lambda \in \mathbb{R}^d$, we get $a = 0$, contradicting the statement of the theorem that $a \neq 0$ in case $\mu_0 = 0$. We may thus assume $\mu_0 = 1$. The Pontryagin maximum principle then gives the Weierstraß condition
$$L(t,x(t),\lambda) - L(t,x(t),\dot x(t)) \ge p \cdot (\lambda - \dot x(t)) \qquad \text{for all } \lambda \in \mathbb{R}^d \tag{5.3.10}$$
and
$$H_\lambda(x(t),\lambda(t),p(t),t,1) = 0 \tag{5.3.11}$$
and the Legendre condition
$$L_{\dot x\dot x}(t,x(t),\dot x(t)) \text{ is positive semidefinite.}$$
Equation (5.3.11) implies
$$p = L_{\dot x},$$
and together with
$$\dot p = -H_x = L_x, \tag{5.3.12}$$
we obtain the Euler-Lagrange equations
$$\frac{d}{dt} L_{\dot x} = L_x. \tag{5.3.13}$$

A basic reference for the variational aspects of optimization and control theory, which also gives a detailed proof of the Pontryagin maximum principle together with many applications, is E. Zeidler, Nonlinear Functional Analysis and its Applications, III, Springer, New York, 1984, pp. 93-6, 422-40.
Part two Multiple integrals in the calculus of variations
1 Lebesgue measure and integration theory
1.1 The Lebesgue measure and the Lebesgue integral

In this section, we recall the basic notions and results about the Lebesgue measure and the Lebesgue integral that will be used in the sequel. Most proofs are omitted as they can be readily found in standard textbooks, e.g. J. Jost, Postmodern Analysis, Springer, Berlin, 1998, pp. 151-97 and 209-15.

Definition 1.1.1. A collection $\Sigma$ of subsets of $\mathbb{R}^d$ is called a $\sigma$-algebra (on $\mathbb{R}^d$) if
(i) $\mathbb{R}^d \in \Sigma$
(ii) If $A \in \Sigma$, then also $\mathbb{R}^d \setminus A \in \Sigma$
(iii) If $A_n \in \Sigma$, $n = 1,2,3,\dots$, then also $\bigcup_{n=1}^{\infty} A_n \in \Sigma$.
The Borel $\sigma$-algebra is the smallest $\sigma$-algebra containing all open subsets of $\mathbb{R}^d$. The elements of the Borel $\sigma$-algebra are called Borel sets.

Easy consequences of (i)-(iii) are
(iv) $\emptyset \in \Sigma$
(v) If $A_n \in \Sigma$, $n = 1,2,3,\dots$, then also $\bigcap_{n=1}^{\infty} A_n \in \Sigma$.
(vi) If $A, B \in \Sigma$, then also $A - B := A \setminus (A \cap B) \in \Sigma$.
Definition 1.1.2. Let $\Sigma$ be a $\sigma$-algebra. A measure $\mu$ on $\Sigma$ is a countably additive function $\mu : \Sigma \to \mathbb{R}^+ \cup \{\infty\}$. 'Countably additive' here means that
$$\mu\Bigl( \bigcup_{n=1}^{\infty} A_n \Bigr) = \sum_{n=1}^{\infty} \mu(A_n)$$
for any collection of mutually disjoint ($A_m \cap A_n = \emptyset$ for $m \neq n$) elements of $\Sigma$. A measure defined on the Borel $\sigma$-algebra is called a Borel measure. A Borel measure $\mu$ is called a Radon measure if $\mu(K) < \infty$ for every compact $K \subset \mathbb{R}^d$ and $\mu(B) = \sup\{\mu(K) \mid K \subset B,\ K \text{ compact}\}$ for every Borel set $B$.

A measure $\mu$ on $\Sigma$ enjoys the following properties:
(vii) $\mu(\emptyset) = 0$
(viii) If $A, B \in \Sigma$, $A \subset B$, then $\mu(A) \le \mu(B)$
(ix) If $A_n \in \Sigma$, $n = 1,2,3,\dots$ and $A_n \subset A_{n+1}$ for all $n$, then
$$\mu\Bigl( \bigcup_{n=1}^{\infty} A_n \Bigr) = \lim_{n\to\infty} \mu(A_n).$$
Theorem 1.1.1. There exist a (unique) $\sigma$-algebra $\Sigma$ on $\mathbb{R}^d$ and a (unique) measure $\mu$ on $\Sigma$ satisfying
(x) Any open subset of $\mathbb{R}^d$ is contained in $\Sigma$ (i.e. $\Sigma$ contains the Borel $\sigma$-algebra)
(xi) For $Q := \{x = (x^1,\dots,x^d) \in \mathbb{R}^d \mid a_j \le x^j \le b_j,\ j = 1,\dots,d\}$, for numbers $a_1,\dots,a_d$, $b_1,\dots,b_d$, we have
$$\mu(Q) = \prod_{j=1}^{d} (b_j - a_j)$$
(xii) (translation invariance) For $x \in \mathbb{R}^d$, $A \in \Sigma$ we have $x + A := \{x + y \mid y \in A\} \in \Sigma$ and $\mu(x + A) = \mu(A)$
(xiii) If $A \subset B$, $B \in \Sigma$, $\mu(B) = 0$, then $A \in \Sigma$ (and, consequently, $\mu(A) = 0$).
This $\mu$ is called Lebesgue measure, and the elements of $\Sigma$ are called (Lebesgue) measurable. In later chapters, we shall however write meas in place of $\mu$ for Lebesgue measure.
One should note that the $\sigma$-algebra of (Lebesgue) measurable sets is larger than the Borel $\sigma$-algebra. We say that a property holds almost everywhere in $A \subset \mathbb{R}^d$ if it holds on $A \setminus B$ for some $B \subset A$ with $\mu(B) = 0$. We say that two functions $f, g : A \to \mathbb{R} \cup \{\pm\infty\}$ are equivalent if $f(x) = g(x)$ for almost all $x \in A$. A set contained in a set of measure 0 is called a null set. We usually write $\operatorname{meas} A$ instead of $\mu(A)$ for a measurable set $A$.

Definition 1.1.3. Let $A \subset \mathbb{R}^d$ be measurable. A function $f : A \to \mathbb{R} \cup \{\pm\infty\}$ is called measurable if $\{x \in A \mid f(x) < \lambda\}$ is measurable for every $\lambda \in \mathbb{R}$.

If $f_n$, $n \in \mathbb{N}$, are measurable, $c \in \mathbb{R}$, then $f_1 + f_2$, $cf_1$, $f_1 f_2$, $\max(f_1,f_2)$, $\min(f_1,f_2)$, $\limsup_{n\to\infty} f_n$, $\liminf_{n\to\infty} f_n$ are likewise measurable. Any continuous function $f$ is measurable, because in that case $\{f(x) < \lambda\}$ is open in its domain of definition. We have the following important composition property:

Theorem 1.1.2. Let $g : A \to \mathbb{R}^c$ be measurable (i.e. $g = (g^1,\dots,g^c)$, and each component $g^i$ is measurable), $\gamma : \mathbb{R}^c \to \mathbb{R}$ continuous. Then $\gamma \circ g$ is measurable.

The characteristic function $\chi_A$ of $A \subset \mathbb{R}^d$ is defined as
$$\chi_A(x) := \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{otherwise.} \end{cases}$$
Thus, $A$ is measurable if and only if its characteristic function $\chi_A$ is measurable. More generally, $s : A \to \mathbb{R}$ is called a simple function or a step function if it assumes only finitely many values, say $s(A) = \{\lambda_1,\dots,\lambda_k\}$, and if all the sets $\{s(x) = \lambda_i\}$ are measurable. Thus
$$s = \sum_{i=1}^{k} \lambda_i \chi_{\{s(x) = \lambda_i\}}.$$
Theorem 1.1.3. $f : A \to \mathbb{R}$ is measurable if and only if it is the pointwise limit of a sequence of simple functions. If $f : A \to \mathbb{R}$ is measurable and bounded, then it is the uniform limit of a sequence of simple functions.
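Definition 1.1.4.
(1) Let $A \subset \mathbb{R}^d$ be measurable with $\mu(A) < \infty$, $s = \sum_{i=1}^{k} \lambda_i \chi_{\{s(x) = \lambda_i\}}$ a simple function on $A$. The Lebesgue integral of $s$ is
$$\int_A s(x)\,dx := \sum_{i=1}^{k} \lambda_i\,\mu(\{s(x) = \lambda_i\}).$$
(2) Let $A$ be as in (1), $f : A \to \mathbb{R}$ measurable and bounded. Let $s_n : A \to \mathbb{R}$ be a sequence of simple functions converging uniformly to $f$ according to Theorem 1.1.3. The Lebesgue integral of $f$ then is
$$\int_A f(x)\,dx := \lim_{n\to\infty} \int_A s_n(x)\,dx$$
(this integral is independent of the choice of the sequence $(s_n)_{n\in\mathbb{N}}$).
(3) $A$ as in (1), $f : A \to \mathbb{R} \cup \{\pm\infty\}$ measurable. Put
$$f_{m,n}(x) := \begin{cases} m & \text{if } f(x) < m \\ f(x) & \text{if } m \le f(x) \le n \\ n & \text{if } f(x) > n. \end{cases}$$
We say that $f$ is integrable if
$$\lim_{\substack{m\to-\infty \\ n\to\infty}} \int_A f_{m,n}(x)\,dx$$
exists. That limit then is called the Lebesgue integral $\int_A f(x)\,dx$ of $f$.
(4) $A \subset \mathbb{R}^d$ measurable, $f : A \to \mathbb{R} \cup \{\pm\infty\}$ measurable. $f$ is called integrable if for any increasing sequence $A_1 \subset A_2 \subset \dots \subset A$ of measurable sets with $\mu(A_n) < \infty$ for all $n$ and $\bigcup_{n=1}^{\infty} A_n = A$,
$$\lim_{n\to\infty} \int_A f(x)\chi_{A_n}(x)\,dx$$
exists. That limit then is independent of the choice of $(A_n)$ and called the Lebesgue integral $\int_A f(x)\,dx$ of $f$.

Theorem 1.1.4. The Lebesgue integral is a linear nonnegative functional on $\mathcal{L}^1(A)$, the vector space of Lebesgue integrable functions on a measurable set $A$, and it satisfies: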
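(1) If $f \in \mathcal{L}^1(A)$, and if $f = g$ almost everywhere on $A$, i.e. $\mu\{x \in A \mid f(x) \neq g(x)\} = 0$, then $g \in \mathcal{L}^1(A)$, and
$$\int_A f(x)\,dx = \int_A g(x)\,dx.$$
In particular, $\int_A f(x)\,dx = 0$ if $\mu(A) = 0$.
(2) If $f \in \mathcal{L}^1(A)$, then $|f| \in \mathcal{L}^1(A)$, and
$$\Bigl| \int_A f(x)\,dx \Bigr| \le \int_A |f(x)|\,dx.$$
(3) If $f \in \mathcal{L}^1(A)$, $h : A \to \mathbb{R} \cup \{\pm\infty\}$ measurable with $|h| \le f$, then $h \in \mathcal{L}^1(A)$ and
$$\Bigl| \int_A h(x)\,dx \Bigr| \le \int_A f(x)\,dx.$$
(4) If $\mu(A) < \infty$, $f : A \to \mathbb{R}$ measurable with $m \le f \le M$, then $f \in \mathcal{L}^1(A)$, and
$$m\,\mu(A) \le \int_A f(x)\,dx \le M\,\mu(A).$$
(5) If $(A_n)_{n\in\mathbb{N}}$ is a sequence of mutually disjoint ($A_m \cap A_n = \emptyset$ for $m \neq n$) measurable sets, $A := \bigcup_{n=1}^{\infty} A_n$, $f \in \mathcal{L}^1(A_n)$ for every $n$, and if
$$\sum_{n=1}^{\infty} \int_{A_n} |f(x)|\,dx < \infty,$$
then $f \in \mathcal{L}^1(A)$, and
$$\int_A f(x)\,dx = \sum_{n=1}^{\infty} \int_{A_n} f(x)\,dx.$$
Conversely, if $f \in \mathcal{L}^1(A)$, then this equation holds for any such sequence $(A_n)_{n\in\mathbb{N}}$.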
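(6) If $f \in \mathcal{L}^1(A)$, then for every $\epsilon > 0$, there exists $\varphi \in C_0(\mathbb{R}^d) := \{g : \mathbb{R}^d \to \mathbb{R} \text{ continuous} \mid \{x \mid g(x) \neq 0\} \text{ bounded}\}$ with
$$\int_A |f(x) - \varphi(x)|\,dx < \epsilon.$$

Theorem 1.1.5 (Fubini). Let $A \subset \mathbb{R}^c$, $B \subset \mathbb{R}^d$ be measurable, and write $x = (\xi,\eta) \in A \times B$. If $f : A \times B \to \mathbb{R} \cup \{\pm\infty\}$ is integrable, then
$$\int_{A\times B} f(x)\,dx = \int_A \Bigl( \int_B f(\xi,\eta)\,d\eta \Bigr) d\xi.$$
(Here, for example, $\int_B f(\xi,\eta)\,d\eta$ exists for almost all $\xi \in A$.)

For $f \in \mathcal{L}^1(A)$, we put
$$-\!\!\!\!\!\!\int_A f(x)\,dx := \frac{1}{\operatorname{meas} A} \int_A f(x)\,dx.$$
We then have Jensen's inequality:

Theorem 1.1.6. Let $A \subset \mathbb{R}^d$ be bounded and measurable, $f : \mathbb{R} \to \mathbb{R}$ a convex function. Then for all $\varphi \in \mathcal{L}^1(A)$
$$f\Bigl( -\!\!\!\!\!\!\int_A \varphi(x)\,dx \Bigr) \le -\!\!\!\!\!\!\int_A f(\varphi(x))\,dx.$$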
1.2 Convergence theorems

In this section, again no proofs are given, and the reader is referred to J. Jost, loc. cit., pp. 199-208.

Theorem 1.2.1 (B. Levi). Let $A \subset \mathbb{R}^d$ be measurable, and let $f_n : A \to \mathbb{R} \cup \{\pm\infty\}$ be a monotonically increasing sequence (i.e. $f_n(x) \le f_{n+1}(x)$ for all $x \in A$, $n \in \mathbb{N}$) of integrable functions. If
$$\lim_{n\to\infty} \int_A f_n(x)\,dx < \infty,$$
then $f := \lim_{n\to\infty} f_n$ (pointwise limit) is integrable, and
$$\int_A f(x)\,dx = \lim_{n\to\infty} \int_A f_n(x)\,dx.$$
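Corollary 1.2.1. Let $A \subset \mathbb{R}^d$ be measurable, $f_n : A \to \mathbb{R}^+ \cup \{\infty\}$ (nonnegative and) integrable. If
$$\sum_{n=1}^{\infty} \int_A f_n(x)\,dx < \infty,$$
then $\sum_{n=1}^{\infty} f_n$ is integrable, and
$$\int_A \sum_{n=1}^{\infty} f_n(x)\,dx = \sum_{n=1}^{\infty} \int_A f_n(x)\,dx.$$

Theorem 1.2.2 (Fatou). Let $A \subset \mathbb{R}^d$ be measurable, $f_n : A \to \mathbb{R} \cup \{\pm\infty\}$ integrable for $n \in \mathbb{N}$. Assume that there exists some integrable $F : A \to \mathbb{R} \cup \{\pm\infty\}$ with
$$f_n \ge F \qquad \text{for all } n \in \mathbb{N},$$
and that
$$\int_A f_n(x)\,dx \le K < \infty \qquad \text{for some } K \text{ independent of } n.$$
Then $\liminf_{n\to\infty} f_n$ is integrable, and
$$\int_A \liminf_{n\to\infty} f_n(x)\,dx \le \liminf_{n\to\infty} \int_A f_n(x)\,dx.$$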
Theorem 1.2.3 (Lebesgue). Let $A \subset \mathbb{R}^d$ be measurable, $f_n : A \to \mathbb{R} \cup \{\pm\infty\}$ a sequence of integrable functions converging pointwise almost everywhere on $A$ to some function $f : A \to \mathbb{R} \cup \{\pm\infty\}$. Suppose there exists some integrable $F : A \to \mathbb{R} \cup \{\pm\infty\}$ with
$$|f_n| \le F \qquad \text{for all } n.$$
Then $f$ is integrable, and
$$\int_A f(x)\,dx = \lim_{n\to\infty} \int_A f_n(x)\,dx.$$
Theorem 1.2.3 is called the theorem on dominated convergence. Let us consider an example that shows the necessity of the hypotheses in the previous results: $f_n : [0,1] \to \mathbb{R}$ is defined as
$$f_n(x) := \begin{cases} n & \text{for } 1/n \le x \le 2/n \\ 0 & \text{otherwise.} \end{cases}$$
Then
$$\lim_{n\to\infty} f_n = 0,$$
and
$$\lim_{n\to\infty} \int_0^1 f_n(x)\,dx = 1 \neq 0 = \int_0^1 \lim_{n\to\infty} f_n(x)\,dx.$$
The $f_n$ do not form a monotonically increasing sequence, so that B. Levi's theorem does not apply, and they are not bounded by some integrable function that is independent of $n$, so that Lebesgue's theorem does not apply either. Considering $-f_n$ instead of $f_n$, we finally obtain a sequence for which the conclusion of Fatou's theorem does not hold. As a corollary of Theorem 1.2.3 one has (approximate the derivative by difference quotients):
Corollary 1.2.2 (Differentiation under the integral). Let $I \subset \mathbb{R}$ be an open interval, $A \subset \mathbb{R}^d$ measurable, and suppose $f : A \times I \to \mathbb{R} \cup \{\pm\infty\}$ satisfies
(i) for any $t \in I$, $f(\cdot,t)$ is integrable on $A$
(ii) for almost all $x \in A$, $f(x,\cdot)$ is differentiable on $I$
(iii) there exists an integrable $\phi : A \to \mathbb{R} \cup \{\pm\infty\}$ with the property that for all $t \in I$ and almost all $x \in A$
$$\Bigl| \frac{\partial}{\partial t} f(x,t) \Bigr| \le \phi(x).$$
Then
$$\psi(t) := \int_A f(x,t)\,dx$$
is a differentiable function of $t \in I$, with
$$\frac{d}{dt}\psi(t) = \int_A \frac{\partial}{\partial t} f(x,t)\,dx.$$
q.e.d.
In this chapter, we present some results from functional analysis that will be needed in the sequel, in particular in the next chapter. All proofs are supplied. As a reference, one may use any good book on functional analysis, e.g. K. Yosida, Functional Analysis, Springer, Berlin, 5th edition, 1978, pp. 52-5, 81-3, 90-92, 102-28, 139-45 or F. Hirzebruch, W. Scharlau, Einführung in die Funktionalanalysis, Bibliograph. Inst., Mannheim, 1971, pp. 60-88, 107-12. (These were also our main sources when compiling this chapter.)
2.1 Definition and basic properties of Banach and Hilbert spaces

Definition 2.1.1. A vector space $V$ over $\mathbb{R}$ is called a normed space if there exists a map $\|\cdot\| : V \to \mathbb{R}$, called norm, satisfying
(i) $\|v\| > 0$ for all $v \in V$, $v \neq 0$
(ii) $\|\lambda v\| = |\lambda|\,\|v\|$ for all $\lambda \in \mathbb{R}$, $v \in V$
(iii) $\|v + w\| \le \|v\| + \|w\|$ for all $v, w \in V$ (triangle inequality).
A sequence $(v_n)_{n\in\mathbb{N}} \subset V$ is said to converge to $v \in V$ if
$$\lim_{n\to\infty} \|v_n - v\| = 0.$$
(In order to distinguish the notion of convergence just defined from the notion of weak convergence to be defined in the next section, we sometimes call it norm convergence or strong convergence.)

A sequence $(v_n)_{n\in\mathbb{N}} \subset V$ is called a Cauchy sequence if for every $\epsilon > 0$ we may find $N \in \mathbb{N}$ such that for all $n, m \ge N$
$$\|v_n - v_m\| < \epsilon.$$
A normed space $(V, \|\cdot\|)$ is called a Banach space if it is complete w.r.t. the notion of convergence just defined, i.e. if every Cauchy sequence converges to some $v \in V$.
Examples
(1) Every finite dimensional normed vector space is a Banach space, for example $\mathbb{R}^d$ with its Euclidean norm $|\cdot|$.
(2) Let $K \subset \mathbb{R}^d$ be compact. $C^0(K) := \{f : K \to \mathbb{R} \text{ continuous}\}$, $\|f\|_\infty := \sup_{x\in K} |f(x)|$ for $f \in C^0(K)$, defines a Banach space. If we equip $C^m(K) := \{f : K \to \mathbb{R}\ m\text{-times continuously differentiable}\}$, $m \in \mathbb{N}$, with the norm $\|\cdot\|_\infty$, it is not a Banach space, because it is not complete. Namely, the convergence w.r.t. $\|\cdot\|_\infty$ is uniform convergence, and while the uniform limit of continuous functions is continuous, the uniform limit of differentiable functions is in general not differentiable.
(3) Let $(V, \|\cdot\|)$ be a Banach space, $W \subset V$ a linear subspace that is closed w.r.t. $\|\cdot\|$, i.e. if $(w_n)_{n\in\mathbb{N}} \subset W$ converges to $v \in V$ ($\lim_{n\to\infty} \|w_n - v\| = 0$), then $v \in W$. Then $(W, \|\cdot\|)$ is a Banach space itself.

Definition 2.1.2. A Hilbert space is a vector space $H$ over $\mathbb{R}$ equipped with a scalar product, i.e. a map $(\cdot,\cdot) : H \times H \to \mathbb{R}$ satisfying
(i) $(v,w) = (w,v)$ for all $v, w \in H$
(ii) $(\lambda_1 v_1 + \lambda_2 v_2, w) = \lambda_1 (v_1,w) + \lambda_2 (v_2,w)$ for all $\lambda_1, \lambda_2 \in \mathbb{R}$, $v_1, v_2, w \in H$
(iii) $(v,v) > 0$ for all $v \in H \setminus \{0\}$.
In addition, we require
(iv) $H$ is complete w.r.t. the norm $\|v\| := (v,v)^{1/2}$, i.e. a Banach space.

In order to justify the preceding definition, we need to verify that $\|v\| = (v,v)^{1/2}$ indeed defines a norm in the sense of Definition 2.1.1. Since the properties (i), (ii) of Definition 2.1.1 are clearly satisfied, we only need to check the triangle inequality:

Lemma 2.1.1. Let $(\cdot,\cdot) : H \times H \to \mathbb{R}$ satisfy (i)-(iii) of Definition 2.1.2. Then we have the Schwarz inequality:
$$|(v,w)| \le \|v\| \cdot \|w\| \qquad \text{for all } v, w \in H,$$
with equality if and only if $v$ and $w$ are linearly dependent.

Proof. We have for $v, w \in H$, $\lambda \in \mathbb{R}$
$$(v + \lambda w, v + \lambda w) \ge 0 \qquad \text{by (iii)}.$$
Inserting $\lambda = -\frac{(v,w)}{(w,w)}$ (the case $w = 0$ being trivial) and expanding with the help of (i), (ii) yields the Schwarz inequality. Since
$$\|v + w\|^2 = (v + w, v + w) = \|v\|^2 + \|w\|^2 + 2(v,w),$$
the Schwarz inequality in turn implies the triangle inequality. q.e.d.

Definition 2.1.3. Let $V$ be a vector space (over $\mathbb{R}$, as always). $M \subset V$ is called convex if whenever $x, y \in M$, then also
$$tx + (1-t)y \in M \qquad \text{for all } 0 \le t \le 1.$$
Example 2.1.1. Let $(V, \|\cdot\|)$ be a normed space. Then for every $\mu > 0$, $B_\mu := \{x \in V \mid \|x\| \le \mu\}$ is convex. Namely, if $x, y \in B_\mu$, i.e. $\|x\| \le \mu$, $\|y\| \le \mu$, then for $0 \le t \le 1$
$$\|tx + (1-t)y\| \le t\|x\| + (1-t)\|y\| \le \mu,$$
hence $tx + (1-t)y \in B_\mu$.

The following definition contains a sharpening of the convexity of the balls $B_\mu$. It will be formulated only for $\mu = 1$, but by homogeneity ((ii) of Definition 2.1.1), it implies an analogous condition for any $\mu > 0$.

Definition 2.1.4. A normed space $(V, \|\cdot\|)$ is called uniformly convex if for all $\epsilon > 0$ there exists $\delta > 0$ with the property that for all $x, y \in V$ with $\|x\| = \|y\| = 1$, we have
$$\Bigl\| \tfrac{1}{2}(x + y) \Bigr\| > 1 - \delta \;\Longrightarrow\; \|x - y\| < \epsilon. \tag{2.1.1}$$

Remark 2.1.1. An equivalent form of the implication (2.1.1) is
$$\|x - y\| \ge \epsilon \;\Longrightarrow\; \Bigl\| \tfrac{1}{2}(x + y) \Bigr\| \le 1 - \delta \tag{2.1.2}$$
(again for $\|x\| = \|y\| = 1$).
Example 2.1.2. In a Hilbert space $(H, (\cdot,\cdot))$, we have the parallelogram identity
$$\Bigl\| \tfrac{1}{2}(x + y) \Bigr\|^2 = \tfrac{1}{2}\|x\|^2 + \tfrac{1}{2}\|y\|^2 - \tfrac{1}{4}\|x - y\|^2, \tag{2.1.3}$$
which follows by expanding the norms in terms of the scalar product. Therefore, any Hilbert space is uniformly convex.

Lemma 2.1.2. In Definition 2.1.4, the condition $\|x\| = \|y\| = 1$ may be replaced by
$$\|x\| \le 1, \qquad \|y\| \le 1.$$
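Proof. In the situation of Definition 2.1.4, for $\epsilon_0 > 0$, we may find $\delta_0 > 0$ such that for all $z, w$ with $\|z\| = \|w\| = 1$, we have
$$\Bigl\| \tfrac{1}{2}(z + w) \Bigr\| > 1 - \delta_0 \;\Longrightarrow\; \|z - w\| < \epsilon_0. \tag{2.1.4}$$
Let now $\epsilon > 0$, $\|x\| \le 1$, $\|y\| \le 1$. If for $\delta < \tfrac{1}{2}$
$$1 - \delta < \Bigl\| \tfrac{1}{2}(x + y) \Bigr\|,$$
then
$$\|x\| > 1 - 2\delta, \qquad \|y\| > 1 - 2\delta.$$
In particular, $x, y \neq 0$, and by the triangle inequality,
$$\Bigl\| \frac{1}{2}\Bigl( \frac{x}{\|x\|} + \frac{y}{\|y\|} \Bigr) \Bigr\| \ge \Bigl\| \tfrac{1}{2}(x + y) \Bigr\| - \frac{1}{2}\Bigl\| x - \frac{x}{\|x\|} \Bigr\| - \frac{1}{2}\Bigl\| y - \frac{y}{\|y\|} \Bigr\| > 1 - 3\delta.$$
We apply (2.1.4) with $z = \frac{x}{\|x\|}$, $w = \frac{y}{\|y\|}$, $\epsilon_0 = \frac{\epsilon}{2}$. If $3\delta \le \delta_0$, we then get
$$\Bigl\| \frac{x}{\|x\|} - \frac{y}{\|y\|} \Bigr\| < \frac{\epsilon}{2}.$$
Now
$$\|x - y\| \le \Bigl\| \frac{x}{\|x\|} - \frac{y}{\|y\|} \Bigr\| + \Bigl\| x - \frac{x}{\|x\|} \Bigr\| + \Bigl\| y - \frac{y}{\|y\|} \Bigr\| \quad \text{by the triangle inequality}$$
$$< \frac{\epsilon}{2} + 4\delta.$$
Choosing $\delta = \min(\tfrac{1}{3}\delta_0, \epsilon/8)$, we have shown the implication
$$\Bigl\| \tfrac{1}{2}(x + y) \Bigr\| > 1 - \delta \;\Longrightarrow\; \|x - y\| < \epsilon$$
for $\|x\| \le 1$, $\|y\| \le 1$. q.e.d.

Lemma 2.1.3. Let $(V, \|\cdot\|)$ be a uniformly convex Banach space. Let $(x_n)_{n\in\mathbb{N}} \subset V$ be a sequence with
$$\limsup_{n\to\infty} \|x_n\| \le 1 \tag{2.1.5}$$
and
$$\lim_{n,m\to\infty} \Bigl\| \tfrac{1}{2}(x_n + x_m) \Bigr\| = 1. \tag{2.1.6}$$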
Then $(x_n)$ converges to some $x \in V$ with $\|x\| = 1$.

Proof. Let $\epsilon > 0$. (2.1.5) and (2.1.6) imply $\lim \|x_n\| = 1$. Therefore, by replacing $x_n$ by $\frac{x_n}{\|x_n\|}$, we may assume w.l.o.g. $\|x_n\| = 1$. Because of (2.1.5), we may apply Lemma 2.1.2. By (2.1.6), we may find $N \in \mathbb{N}$ such that for $n, m \ge N$
$$\Bigl\| \tfrac{1}{2}(x_n + x_m) \Bigr\| > 1 - \delta,$$
with $\delta$ determined by Lemma 2.1.2. We obtain
$$\|x_n - x_m\| < \epsilon,$$
i.e. $(x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence. Since $(V, \|\cdot\|)$ is a Banach space, it has a limit $x$, and
$$\|x\| = \lim_{n\to\infty} \|x_n\| = 1.$$
q.e.d.

In order to formulate the Hahn-Banach theorem, a fundamental extension result for linear functionals from a linear subspace to the whole space, we need:

Definition 2.1.5. Let $V$ be a (real) vector space. $p : V \to \mathbb{R}^+$ ($\mathbb{R}^+ := \{t \in \mathbb{R} \mid t \ge 0\}$)
is called convex if
(i) $p(x + y) \le p(x) + p(y)$ for all $x, y \in V$
(ii) $p(\lambda x) = \lambda p(x)$ for all $x \in V$, $\lambda \ge 0$.

Example 2.1.3. The norm on a normed vector space.

Let $V_0$ be a linear subspace of the vector space $V$, $f_0 : V_0 \to \mathbb{R}$ linear. A linear $f : V \to \mathbb{R}$ is called an extension of $f_0$ if $f|_{V_0} = f_0$.

Theorem 2.1.1 (Hahn-Banach). Let $V_0$ be a linear subspace of the vector space $V$, $p : V \to \mathbb{R}^+$ convex. Suppose that $f_0 : V_0 \to \mathbb{R}$ is linear and satisfies
$$f_0(x) \le p(x) \qquad \text{for all } x \in V_0. \tag{2.1.7}$$
Then there exists an extension $f : V \to \mathbb{R}$ of $f_0$ with
$$f(x) \le p(x) \qquad \text{for all } x \in V. \tag{2.1.8}$$
Remark 2.1.2. We shall need the Hahn-Banach theorem only in the case where $V$ possesses a countable basis, i.e. is separable (see p. 130).

Proof. We may assume $V_0 \neq V$. Let $v \in V \setminus V_0$, and let $V_1$ be the linear subspace of $V$ spanned by $V_0$ and $v$, i.e.
$$V_1 := \{x + tv \mid x \in V_0,\ t \in \mathbb{R}\}.$$
We shall now investigate how $f_0$ can be extended to $f_1 : V_1 \to \mathbb{R}$ with
$$f_1(x) \le p(x) \qquad \text{for all } x \in V_1. \tag{2.1.9}$$
We put $f_1(v) =: \alpha$. Then as an extension of $f_0$, $f_1$ satisfies
$$f_1(x + tv) = f_0(x) + t\alpha.$$
Equation (2.1.9) requires
$$f_0(x) + t\alpha \le p(x + tv). \tag{2.1.10}$$
For $t > 0$, this is equivalent to
$$\alpha \le p\Bigl( \frac{x}{t} + v \Bigr) - f_0\Bigl( \frac{x}{t} \Bigr), \tag{2.1.11}$$
and for $t < 0$ to
$$\alpha \ge -p\Bigl( -\frac{x}{t} - v \Bigr) - f_0\Bigl( \frac{x}{t} \Bigr). \tag{2.1.12}$$
For $x_1, x_2 \in V_0$, we have
$$f_0(x_2) - f_0(x_1) \le p(x_2 - x_1) = p((x_2 + v) - (x_1 + v)) \le p(x_2 + v) + p(-x_1 - v),$$
hence
$$-f_0(x_2) + p(x_2 + v) \ge -f_0(x_1) - p(-x_1 - v). \tag{2.1.13}$$
Thus
$$\alpha_2 := \inf_{x_2 \in V_0} \bigl( -f_0(x_2) + p(x_2 + v) \bigr) \ge \alpha_1 := \sup_{x_1 \in V_0} \bigl( -f_0(x_1) - p(-x_1 - v) \bigr).$$
Therefore, any $\alpha$ with $\alpha_1 \le \alpha \le \alpha_2$ satisfies (2.1.11) and (2.1.12), hence (2.1.10). Thus, the desired extension $f_1$ exists. If $V$ possesses a countable basis, we may use the preceding construction to extend $f_0$ inductively to all of $V$. If $V$ does not possess a countable basis, we need to use Zorn's lemma to complete the proof. For that purpose, let
$$\Phi := \{\varphi : W \to \mathbb{R} \text{ extension of } f_0 \text{ to some linear subspace } W,\ V_0 \subset W \subset V, \text{ satisfying } \varphi(x) \le p(x) \text{ for all } x \in W\}.$$
On $\Phi$, we have an obvious ordering relation (namely, for $\varphi_i : W_i \to \mathbb{R}$, $i = 1,2$, we have $\varphi_1 \le \varphi_2$ if $W_1 \subset W_2$ and $\varphi_2|_{W_1} =$
for all $x \in V_0$.
Then there exists an extension $f : V \to \mathbb{R}$ of $f_0$ with
$|f(x)| \le \lambda \|x\|$ for all $x \in V$.
Theorem 2.1.2 (Helly). Let $(V, \|\cdot\|)$ be a Banach space, $f_1, \dots, f_n$ linear functionals $V \to \mathbb{R}$ that are continuous w.r.t. the norm convergence, $\mu, \alpha_1, \dots, \alpha_n \in \mathbb{R}$. Suppose that for any $\lambda_1, \dots, \lambda_n \in \mathbb{R}$
$\left|\sum_{i=1}^n \lambda_i \alpha_i\right| \le \mu \left\|\sum_{i=1}^n \lambda_i f_i\right\|$. (2.1.14)
Then for each $\epsilon > 0$, there exists $x_\epsilon \in V$ with
$f_i(x_\epsilon) = \alpha_i$ for $i = 1, 2, \dots, n$ (2.1.15)
and $\|x_\epsilon\| \le \mu + \epsilon$.
Proof. Let $m \le n$ be the maximal number of linearly independent $f_i$, $i = 1, \dots, n$. It suffices to consider $m$ linearly independent $f_i$, w.l.o.g. $f_1, \dots, f_m$, since the remaining ones are easily seen to be taken care of by (2.1.14). $F(x) := (f_1(x), \dots, f_m(x))$ may then be considered as a linear map onto $\mathbb{R}^m$. We equip $\mathbb{R}^m$ with its Euclidean structure. Let
$B_{\mu+\epsilon} := \{x \in V \mid \|x\| \le \mu + \epsilon\}$.
Then $F(B_{\mu+\epsilon})$ is a convex set containing 0 as an interior point. Also, $F(B_{\mu+\epsilon})$ is balanced in the sense that with $p \in \mathbb{R}^m$ it also contains $-p$. We now assume that $(\alpha_1, \dots, \alpha_m)$ is not contained in $F(B_{\mu+\epsilon})$. Because of the properties of $F(B_{\mu+\epsilon})$ just noted, we may then find $\lambda = (\lambda_1, \dots, \lambda_m)$ with
$\left|\sum_{i=1}^m \lambda_i \alpha_i\right| \ge \sup_{\|x\| \le \mu+\epsilon} \left|\sum_{i=1}^m \lambda_i f_i(x)\right| = (\mu + \epsilon) \left\|\sum_{i=1}^m \lambda_i f_i\right\|$,
contradicting (2.1.14). Thus $(\alpha_1, \dots, \alpha_m) \in F(B_{\mu+\epsilon})$, implying the claim. q.e.d.

2.2 Dual spaces and weak convergence
Let $V$ be a vector space. The linear functionals $f : V \to \mathbb{R}$
then also form a vector space. If $(V, \|\cdot\|)$ is a normed vector space, we define the norm of a linear functional $f : V \to \mathbb{R}$ as
$\|f\|_* := \sup_{x \in V \setminus \{0\}} \frac{|f(x)|}{\|x\|} \in \mathbb{R}^+ \cup \{\infty\}$. (2.2.1)
Lemma 2.2.1. A linear functional $f : V \to \mathbb{R}$ is continuous if and only if $\|f\|_* < \infty$.
The easy proof is left to the reader. (See also Lemma 2.3.1 below.) q.e.d.
Definition 2.2.1. $V^* := \{f : V \to \mathbb{R}$ linear with $\|f\|_* < \infty\}$ equipped with the norm (2.2.1) is called the dual space of $(V, \|\cdot\|)$. (It is easy to verify that (2.2.1) defines a norm on $V^*$ in the sense of Definition 2.1.1.)
Lemma 2.2.2. $(V^*, \|\cdot\|_*)$ is a Banach space.
Proof. Let $(f_n)_{n \in \mathbb{N}} \subset V^*$ be a Cauchy sequence. For every $\epsilon > 0$ we may then find $N \in \mathbb{N}$ such that for $n, m \ge N$
$\|f_n - f_m\|_* < \epsilon$.
By (2.2.1), this implies that for every $x \in V$
$|f_n(x) - f_m(x)| \le \epsilon \|x\|$.
Therefore, since $\mathbb{R}$ is complete, $(f_n(x))_{n \in \mathbb{N}}$ converges for every $x \in V$. We denote the limit by $f(x)$. $f : V \to \mathbb{R}$ then is a linear functional. It is an easy consequence of the triangle inequality that $\|f\|_* < \infty$ and that $\lim_{n \to \infty} \|f_n - f\|_* = 0$. This implies that $(f_n)_{n \in \mathbb{N}}$ converges to $f \in V^*$, and $(V^*, \|\cdot\|_*)$ therefore is complete, hence a Banach space. q.e.d.
Remark 2.2.1. We did not assume that $V$ itself is a Banach space.
We now consider $(V^*)^* =: V^{**}$, the dual space of $V^*$, with norm denoted by $\|\cdot\|_{**}$. Any $x \in V$ defines a linear functional $i(x) : V^* \to \mathbb{R}$,
$i(x)(f) := (f, x) := f(x)$.
Lemma 2.2.3. $\|i(x)\|_{**} = \|x\|$. Thus, the linear functional $i(x) : V^* \to \mathbb{R}$ is contained in $V^{**}$, i.e. we have a linear isometric map $i : V \to V^{**}$.
Proof. We have
$\|x\| \ge \sup_{f \in V^* \setminus \{0\}} \frac{|(f, x)|}{\|f\|_*} = \|i(x)\|_{**}$. (2.2.2)
Conversely, let $x \in V$. Let
$f_0(tx) := t\|x\|$ for $t \in \mathbb{R}$.
By the Hahn-Banach theorem (Corollary 2.1.1), we may extend $f_0$ from $\{tx \mid t \in \mathbb{R}\}$ to $V$ as a linear functional $f$ with
$\|f\|_* = 1$ and $|(f, x)| = \|x\|$.
Therefore
$\|i(x)\|_{**} = \sup_{f \in V^* \setminus \{0\}} \frac{|(f, x)|}{\|f\|_*} \ge \|x\|$. (2.2.3)
Equations (2.2.2) and (2.2.3) imply the result. q.e.d.
Definition 2.2.2. A normed linear space $(V, \|\cdot\|)$ is called reflexive if $i : V \to V^{**}$ is a bijective isometry (i.e. $\|x\| = \|i(x)\|_{**}$ for all $x \in V$).
Remark 2.2.2. (1) Since $(V^{**}, \|\cdot\|_{**})$ is a Banach space by Lemma 2.2.2, any reflexive space is complete, i.e. a Banach space.
(2) By the remark before Definition 2.2.2, the crucial condition in that definition is the surjectivity of $i$.
Definition 2.2.3. (i) Let $(V, \|\cdot\|)$ be a normed linear space. $(x_n)_{n \in \mathbb{N}} \subset V$ is said to be weakly convergent to $x \in V$ if $f(x_n)$ converges to $f(x)$ for all $f \in V^*$, in symbols: $x_n \rightharpoonup x$.
(ii) Let $(V^*, \|\cdot\|_*)$ be the dual of a normed linear space. $(f_n)_{n \in \mathbb{N}} \subset V^*$ is said to be weak* convergent to $f \in V^*$ if $f_n(x)$ converges to $f(x)$ for all $x \in V$.
Theorem 2.2.1. Let $V$ be a separable† normed linear space. Let $(f_n)_{n \in \mathbb{N}} \subset V^*$ be bounded, i.e. $\|f_n\|_* \le$ constant (independent of $n$). Then $(f_n)$ contains a weak* convergent subsequence.
† Separable means that $V$ contains a countable subset $(y_\nu)_{\nu \in \mathbb{N}}$ that is dense w.r.t. $\|\cdot\|$, i.e. for every $y \in V$, $\epsilon > 0$ there exists $y_\nu$ with $\|y - y_\nu\| < \epsilon$.
Proof. Let $(y_\nu)_{\nu \in \mathbb{N}}$ be a dense subset of $V$. Since $(f_n(y_1))_{n \in \mathbb{N}}$ is bounded, a subsequence $(f_n^1(y_1))$ of $(f_n(y_1))$ converges. Having iteratively found a subsequence $(f_n^m)$ of $(f_n)$ for which $(f_n^m(y_\nu))_{n \in \mathbb{N}}$ converges for $1 \le \nu \le m$, we may find a subsequence $(f_n^{m+1})$ of $(f_n^m)$ for which also $(f_n^{m+1}(y_{m+1}))_{n \in \mathbb{N}}$ converges. The diagonal sequence $(f_n^n)_{n \in \mathbb{N}}$ then converges at every $y_\nu$, $\nu \in \mathbb{N}$, and since $(y_\nu)_{\nu \in \mathbb{N}}$ is dense in $V$ and the $f_n$ are uniformly bounded, $(f_n^n(x))_{n \in \mathbb{N}}$ has to converge for every $x \in V$. Thus, we have found a weak* convergent subsequence of $(f_n)_{n \in \mathbb{N}}$. q.e.d.
Remark 2.2.3. (1) The argument employed in the preceding proof is called Cantor diagonalization.
(2) Without the assumption that $V$ is separable, Theorem 2.2.1 need not hold for subsequences (one then has to pass to subnets, as in the Banach-Alaoglu theorem). The following corollary, in contrast, remains true without separability:
Corollary 2.2.1. Let $(V, \|\cdot\|)$ be a separable reflexive Banach space. Then every bounded sequence $(x_n)_{n \in \mathbb{N}} \subset V$ contains a weakly convergent subsequence.
Proof. By (2.2.2) or reflexivity, $(i(x_n))_{n \in \mathbb{N}}$ is a bounded sequence in $V^{**}$ and therefore contains a weak* convergent subsequence. Since $V$ is reflexive, the limit is of the form $i(x)$ for some $x \in V$. Thus
$f(x_n) = (f, x_n) \to (f, x) = f(x)$ for every $f \in V^*$
so that $(x_n)_{n \in \mathbb{N}}$ converges weakly to $x$. q.e.d.
Theorem 2.2.2. Any weakly convergent sequence $(x_n)_{n \in \mathbb{N}}$ in a Banach space is bounded.
Proof. We shall show that $i(x_n)$ is uniformly bounded on $\{f \in V^* \mid \|f\|_* \le 1\}$. Then also
$\|x_n\| = \|i(x_n)\|_{**} = \sup_{f \in V^* \setminus \{0\}} \frac{|(f, x_n)|}{\|f\|_*}$ (2.2.4)
is bounded (see Lemma 2.2.3 for the first equality). Since $i(x_n)$ is linear, it suffices to show uniform boundedness on some ball in $V^*$. Otherwise, we find a sequence $B_j$ of closed balls,
$B_j = \{f \in V^* \mid \|f - f_j\|_* \le \rho_j\}$ for some $f_j \in V^*$, $\rho_j > 0$
with
$B_{j+1} \subset B_j$ and $\lim_{j \to \infty} \rho_j = 0$
and a subsequence $(x'_n)$ of $(x_n)$ with
$|(f, x'_j)| > j$ for all $f \in B_j$. (2.2.5)
By construction, $(f_j)_{j \in \mathbb{N}}$ forms a Cauchy sequence and therefore converges to some $f_0 \in V^*$, with
$f_0 \in \bigcap_{j=1}^{\infty} B_j$.
Because of (2.2.5), we have
$|(f_0, x'_j)| > j$ for all $j \in \mathbb{N}$.
This is impossible since $(f_0, x'_n)$ converges because $(x'_n)_{n \in \mathbb{N}}$ converges weakly. q.e.d.
Example 2.2.1. (1) In a finite dimensional normed vector space (which automatically is complete, i.e. a Banach space), weak convergence is just componentwise convergence and therefore equivalent to the usual convergence w.r.t. the norm.
(2) In an infinite dimensional reflexive Banach space $(V, \|\cdot\|)$, this is no longer so, because one may always find a sequence $(e_n)_{n \in \mathbb{N}} \subset V$ with $\|e_i\| \le 1$ for all $i$ and $\|e_i - e_j\| \ge 1$ for $i \ne j$. Such a sequence cannot converge w.r.t. $\|\cdot\|$, because it is not a Cauchy sequence, but it always contains a weakly convergent subsequence according to Corollary 2.2.1 (we have shown Corollary 2.2.1 only under the assumption of separability, but it holds true in general).
Lemma 2.2.4. Let $(V, \|\cdot\|)$ be a separable normed space. Then $V^*$ satisfies the first axiom of countability w.r.t. the weak* topology, i.e. for each $f \in V^*$, there exists a sequence $(U_\nu)_{\nu \in \mathbb{N}}$ of subsets of $V^*$ that are open in the weak* topology and contain $f$, such that every $U$ that is open in this topology and contains $f$ also contains some $U_\nu$. Consequently, if $(V, \|\cdot\|)$ is also reflexive, then $V^*$ satisfies the first axiom of countability w.r.t. the weak topology.
Proof. Let $f \in V^*$. Every neighbourhood of $f$ w.r.t. the weak* topology contains a neighbourhood of the form
$U_{\epsilon; v_1, \dots, v_k}(f) := \{g \in V^* \mid |g(v_i) - f(v_i)| < \epsilon$ for $i = 1, \dots, k\}$.
Since $V$ is separable, there exists a sequence $(w_n)_{n \in \mathbb{N}} \subset V$ that is dense w.r.t. the $\|\cdot\|$ topology. We claim that the neighbourhoods of the form $U_{\frac{1}{n}; w_{i_1}, \dots, w_{i_k}}(f)$ form a basis of the neighbourhood system of $f$ of the required type, i.e. every $U_{\epsilon; v_1, \dots, v_k}(f)$ contains some such $U_{\frac{1}{n}; w_{i_1}, \dots, w_{i_k}}(f)$. For that purpose, we choose $n$ with $\frac{2}{n} < \epsilon$ and $w_{i_1}, \dots, w_{i_k}$ with $\|v_j - w_{i_j}\| < \frac{1}{n}$ for $j = 1, \dots, k$. For $g \in U_{\frac{1}{n}; w_{i_1}, \dots, w_{i_k}}(f)$, we then have
$|g(v_j) - f(v_j)| \le |g(w_{i_j}) - f(w_{i_j})| + |(g - f)(v_j - w_{i_j})| < \frac{1}{n} + \frac{1}{n} < \epsilon$,
i.e. $g \in U_{\epsilon; v_1, \dots, v_k}(f)$ as required. Finally, if $V$ is reflexive, then the weak* and the weak topology of $V^*$ coincide. q.e.d.
We now present some further applications of the Hahn-Banach theorem that will be used in Chapter 3.
Lemma 2.2.5. Let $(V, \|\cdot\|)$ be a normed space, $V_0$ a closed linear subspace. Then $V_0$ is also closed w.r.t. weak convergence.
Proof. By the Hahn-Banach theorem (Corollary 2.1.1), for every $x_0 \in V \setminus V_0$, we may find a continuous linear functional $f_0 : V \to \mathbb{R}$ with
$f_0(x_0) = 1$, $f_0|_{V_0} = 0$.
Thus, $x_0$ cannot be a weak limit of a sequence in $V_0$. q.e.d.
Lemma 2.2.6. Let $(V, \|\cdot\|)$ be a reflexive Banach space, $V_0$ a closed linear subspace. Then $V_0$ is reflexive.
Proof. We may identify $V_0^{**}$ with a subspace of $V^{**}$, by putting $v(f) = v(f|_{V_0})$ for $f \in V^*$, $v \in V_0^{**}$. Let $v \in V_0^{**}$. Since $V$ is reflexive, there exists $x \in V$ with $v(f) = f(x)$ for all $f \in V^*$. We claim $x \in V_0$. Otherwise, by the Hahn-Banach theorem (Corollary 2.1.1), there exists $f \in V^*$ with
$f(x) \ne 0$, $f|_{V_0} = 0$.
Since $f(x) = v(f|_{V_0})$ by the above, this is impossible. Since every $f_0 \in V_0^*$ can be extended to $f \in V^*$, again by Hahn-Banach, we conclude
$v(f_0) = f_0(x)$ for all $f_0 \in V_0^*$.
Thus, $v = i(x)$. This implies $V_0^{**} = i(V_0)$, i.e. reflexivity of $V_0$. q.e.d.
Corollary 2.2.2. A Banach space $(V, \|\cdot\|)$ is reflexive if and only if its dual $(V^*, \|\cdot\|_*)$ is reflexive.
Proof. If $V = V^{**}$, then also $V^* = V^{***}$. Thus, if $V$ is reflexive, so is $V^*$. Consequently, if conversely $V^*$ is reflexive, so then is $V^{**}$. Since $V$ can be identified with a closed subspace of $V^{**}$ by Lemma 2.2.3, Lemma 2.2.6 then yields reflexivity of $V$. q.e.d.
Lemma 2.2.7. Let $(V, \|\cdot\|)$ be a normed space, and suppose that $(x_n)_{n \in \mathbb{N}} \subset V$ converges weakly to $x \in V$. Then
$\|x\| \le \liminf_{n \to \infty} \|x_n\|$.
Proof. After selection of a subsequence, we may assume that $\|x_n\|$ converges (see Theorem 2.2.2). Assume
$\|x\| > \lim_{n \to \infty} \|x_n\|$.
As in the proof of Lemma 2.2.3, we may find $f \in V^*$ with
$\|f\|_* = 1$, $|f(x)| = \|x\|$.
But then
$|f(x)| > \lim_{n \to \infty} \|x_n\| \ge \limsup_{n \to \infty} |f(x_n)|$,
while the weak convergence of $(x_n)_{n \in \mathbb{N}}$ to $x$ implies
$f(x) = \lim_{n \to \infty} f(x_n)$.
This contradiction establishes the claim. q.e.d.
Theorem 2.2.3 (Milman). Any uniformly convex Banach space is reflexive.
Proof (Kakutani). Let $(V, \|\cdot\|)$ be a uniformly convex Banach space, and let $x_0^{**} \in V^{**}$. We need to show that there exists some $x_0 \in V$ with
$i(x_0) = x_0^{**}$ (2.2.6)
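(see Remark 2 after Definition 2.2.2). We may assume w.l.o.g. that
$\|x_0^{**}\|_{**} = 1$. (2.2.7)
For every $n \in \mathbb{N}$, we may then find $f_n \in V^*$ with
$\|f_n\|_* = 1$ and $1 - \frac{1}{n} < x_0^{**}(f_n) \le 1$. (2.2.8)
We now claim that for every $n \in \mathbb{N}$, we may find $x_n \in V$ with
$f_i(x_n) = x_0^{**}(f_i)$ for $i = 1, \dots, n$ (2.2.9)
and
$\|x_n\| \le \|x_0^{**}\|_{**} + \frac{1}{n} = 1 + \frac{1}{n}$. (2.2.10)
For any $\lambda_1, \dots, \lambda_n \in \mathbb{R}$, we have
$\left|\sum_{i=1}^n \lambda_i x_0^{**}(f_i)\right| = \left|x_0^{**}\left(\sum_{i=1}^n \lambda_i f_i\right)\right| \le \|x_0^{**}\|_{**} \left\|\sum_{i=1}^n \lambda_i f_i\right\|_*$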
and so the claim follows from Helly's Theorem 2.1.2. Since in addition to (2.2.10) also
$\|x_n\| = \|f_n\|_* \|x_n\| \ge f_n(x_n) = x_0^{**}(f_n) > 1 - \frac{1}{n}$,
we must have
$\lim_{n \to \infty} \|x_n\| = 1$.
For $m > n$, we have
$2 - \frac{2}{n} < f_n(x_n) + f_n(x_m) \le \|x_n + x_m\| \le \|x_n\| + \|x_m\| \le 2 + \frac{2}{n}$.
By Lemma 2.1.3, $(x_n)_{n \in \mathbb{N}}$ is a Cauchy sequence and converges to some $x_0 \in V$, satisfying
$\|x_0\| = 1$ (2.2.11)
and
$f_i(x_0) = x_0^{**}(f_i)$ for $i = 1, 2, 3, \dots$ (2.2.12)
The solution $x_0$ of (2.2.11), (2.2.12) is unique. Namely, if there were another solution $x'_0$, on one hand, we would have
$\|x_0 + x'_0\| < 2$ (2.2.13)
by uniform convexity. On the other hand
$f_i(x_0 + x'_0) = 2 x_0^{**}(f_i)$ for all $i$,
hence
$2 - \frac{2}{i} < 2 x_0^{**}(f_i) = f_i(x_0 + x'_0) \le \|x_0 + x'_0\|$,
hence
$\|x_0 + x'_0\| \ge 2$.
This contradicts (2.2.13), and so $x_0$ is unique. We now claim that
$f_0(x_0) = x_0^{**}(f_0)$ for any $f_0 \in V^*$, (2.2.14)
so that $x_0^{**} = i(x_0)$, proving the theorem. Let this $f_0 \in V^*$ be given. In the above reasoning, we replace the sequence $f_1, f_2, f_3, \dots$ by $f_0, f_1, f_2, f_3, \dots$ We then obtain $x'_0 \in V$ with
$\|x'_0\| = 1$ and $f_i(x'_0) = x_0^{**}(f_i)$ for $i = 0, 1, 2, 3, \dots$ (2.2.15)
Since the solution $x_0$ of (2.2.11), (2.2.12) was shown to be unique, however, we must have $x'_0 = x_0$. Equation (2.2.15) for $i = 0$ then is (2.2.14). q.e.d.
Corollary 2.2.3 (Riesz). Any Hilbert space $(H, (\cdot, \cdot))$ can be identified with its dual $H^*$.
Proof. Since a Hilbert space is uniformly convex, Theorem 2.2.3 implies $H = H^{**}$. On the other hand, any $x \in H$ induces an $f_x \in H^*$ by
$f_x(y) := (x, y)$ for $y \in H$.
We have
$\|f_x\|_* = \sup_{\|y\| = 1} (x, y) \le \|x\|$
and $f_x(x) = (x, x) = \|x\|^2$, hence
$\|f_x\|_* = \|x\|$.
Thus, $H$ is isometrically embedded into $H^*$. For the same reason, $H^*$ is isometrically embedded into $H^{**}$, and since $H = H^{**}$, one readily verifies that these embeddings must be surjective, hence $H = H^* = H^{**}$. q.e.d.
Let $M$ be a linear subspace of a Hilbert space $H$. The orthogonal complement $M^\perp$ of $M$ is defined as
$M^\perp := \{x \in H : (x, y) = 0$ for all $y \in M\}$.
It is clear that $M^\perp$ is a closed linear subspace of $H$. $M$ need not be closed here, but the orthogonal complement of $M$ is the same as the one of its closure $\bar{M}$ in $H$.
Corollary 2.2.4. Let $M$ be a closed linear subspace of the Hilbert space $H$. Then every $x \in H$ can be uniquely decomposed as $x = x_1 + x_2$ with $x_1 \in M$, $x_2 \in M^\perp$.
Proof. By the proof of Corollary 2.2.3, $x \in H$ corresponds to $f_x \in H^*$ with $f_x(y) = (x, y)$ for all $y \in H$. We let $f_x^M$ be the restriction of $f_x$ to $M$. $M$, since closed, is a Hilbert
space itself, and $f_x^M$ is an element of the dual $M^*$. By Corollary 2.2.3, it corresponds to some $x_1 \in M$, i.e.
$f_x^M(y) = (x_1, y)$ for all $y \in M$.
We put $x_2 := x - x_1$. Then for all $y \in M$,
$(x - x_1, y) = f_x(y) - f_x^M(y) = 0$
since $f_x = f_x^M$ on $M$. Therefore, $x_2 \in M^\perp$. Thus, we have constructed the required decomposition. Concerning uniqueness, if $x = x_1 + x_2 = x'_1 + x'_2$ with $x_1, x'_1 \in M$, $x_2, x'_2 \in M^\perp$, then for all $y \in M$
$(x, y) = (x_1, y) = (x'_1, y)$,
and by Corollary 2.2.3 applied to $M$, $x_1 = x'_1$, and therefore also $x_2 = x'_2$. q.e.d.
Of course, the reader knows the preceding result in the case where $H$ is finite dimensional, i.e. a Euclidean space. $x_1$ is interpreted as the orthogonal projection of $x$ onto the subspace $M$, and therefore Corollary 2.2.4 is called the projection theorem.
The next result will be needed for Sections 4.2 and 4.3 when we establish the existence of minimizers for lower semicontinuous, convex functionals.
Theorem 2.2.4 (Mazur). Suppose $(x_n)_{n \in \mathbb{N}}$ converges weakly to $x$ in some Banach space $V$. For every $\epsilon > 0$, we may then find a convex combination
$\sum_{n=1}^N \lambda_n x_n$ ($\lambda_n \ge 0$, $\sum_{n=1}^N \lambda_n = 1$)
with
$\left\|x - \sum_{n=1}^N \lambda_n x_n\right\| < \epsilon$. (2.2.16)
Proof. We consider the set $C_0$ of all convex combinations of the $x_n$, i.e.
$C_0 := \left\{\sum_{n=1}^N \lambda_n x_n$ with $N \in \mathbb{N}$, $\lambda_n \ge 0$, $\sum_{n=1}^N \lambda_n = 1\right\}$.
Replacing all $x_n$ by $x_n - x_1$ and $x$ by $x - x_1$, we may assume $0 \in C_0$. If (2.2.16) is not true, then there exists $\epsilon > 0$ with
$\|x - y\| \ge \epsilon$ for all $y \in C_0$. (2.2.17)
$C_1 := \{z \in V : \|z - y\| < \frac{\epsilon}{2}$ for some $y \in C_0\}$ is convex and contains the ball with radius $\frac{\epsilon}{2}$ and center 0. We consider the Minkowski functional $p$ of $C_1$ defined by
$p(z) := \inf\{\lambda > 0 : \lambda^{-1} z \in C_1\}$.
$p$ is convex in the sense of Definition 2.1.5 since $C_1$ is convex, and continuous since $C_1$ contains the ball of radius $\frac{\epsilon}{2} > 0$ about 0. Since, because of (2.2.17), $\|x - z\| > \frac{\epsilon}{2}$ for every $z \in C_1$, we have $p(x) > 1$. More precisely, there exists $y_0$ with
$x = \lambda^{-1} y_0$, $0 < \lambda < 1$, $p(y_0) = 1$.
We consider the linear subspace
$V_0 = \{\mu y_0, \mu \in \mathbb{R}\} \subset V$
and the linear functional
$f_0(\mu y_0) = \mu$ on $V_0$.
Then
$f_0 \le p$ on $V_0$,
and by the Hahn-Banach Theorem 2.1.1, there exists an extension $f$ of $f_0$ to all of $V$ with $f \le p$ on $V$. Since $p$ is continuous, so is $f$, i.e. $f \in V^*$, and since $p \le 1$ on $C_1 \supset C_0$,
$\sup_{y \in C_0} f(y) \le 1 < \lambda^{-1} = f(\lambda^{-1} y_0) = f(x)$.
This, however, contradicts the fact that $(x_n)_{n \in \mathbb{N}} \subset C_0$ converges weakly to $x$. Thus, (2.2.17) cannot hold, and (2.2.16) is established. q.e.d.
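Mazur's theorem can be illustrated numerically. The following is only a sketch under stated assumptions: we work in a finite-dimensional truncation of $\ell^2$, where the orthonormal vectors $e_1, e_2, \dots$ model a sequence that converges weakly to 0 while keeping norm 1. The helper names (`unit_vector`, `average`, `pairing`) are hypothetical and chosen for this illustration only. The equal-weight convex combination of $e_1, \dots, e_N$ has norm $N^{-1/2}$, so convex combinations do approach the weak limit in norm.

```python
import math

def unit_vector(n, dim):
    """n-th standard basis vector (0-based) in a dim-dimensional truncation of l^2."""
    return [1.0 if k == n else 0.0 for k in range(dim)]

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(c * c for c in v))

def average(vectors):
    """Convex combination with equal weights lambda_n = 1/N."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def pairing(f, v):
    """Value on v of the functional induced by f via the scalar product."""
    return sum(a * b for a, b in zip(f, v))

dim = 400
es = [unit_vector(n, dim) for n in range(dim)]

# Each e_n keeps distance 1 from the weak limit 0.
norms = [norm(e) for e in es]

# A fixed functional f = (1, 1/2, 1/3, ...) sees f(e_n) = 1/(n+1), which tends to 0.
f = [1.0 / (k + 1) for k in range(dim)]
values = [pairing(f, e) for e in es]

# Norms of the equal-weight convex combinations of the first N vectors.
for count in (4, 100, 400):
    print(count, norm(average(es[:count])))
```

The sequence itself never gets close to 0 in norm; only suitable convex combinations do, which is exactly the content of (2.2.16).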
2.3 Linear operators between Banach spaces
The results of this section will be used in Chapter 8. In Section 2.2, we considered linear functionals $f : V \to \mathbb{R}$; in the beginning, $V$ was a normed linear space, with norm denoted by $\|\cdot\|$, and later, we also assumed that $V$ was complete, i.e. a Banach space. In the present section, we replace the target $\mathbb{R}$ by a general Banach space $W$, with norm also denoted by $\|\cdot\|$. We thus consider linear operators
$T : V \to W$,
and we put
$\|T\| := \sup_{x \in V \setminus \{0\}} \frac{\|Tx\|}{\|x\|} \in \mathbb{R}^+ \cup \{\infty\}$. (2.3.1)
Lemma 2.3.1. The linear operator $T : V \to W$ is continuous if and only if $\|T\| < \infty$.
Proof. If $\|T\| < \infty$, then the inequality
$\|Tx\| \le \|T\| \|x\|$ (2.3.2)
implies that $T$ is continuous. (Of course, this uses the linearity of $T$.) Conversely, if $T$ is continuous, we recall the usual $\epsilon$-$\delta$ criterion for continuity, and so for $\epsilon = 1$, we find some $\delta > 0$ with the property that $\|Ty\| \le 1$ if $\|y\| \le \delta$. For $x \in V \setminus \{0\}$, we then have with $y = \delta \frac{x}{\|x\|}$ ($\|y\| \le \delta$)
$\|Tx\| = \frac{\|x\|}{\delta} \|Ty\| \le \frac{1}{\delta} \|x\|$.
Thus
$\|T\| \le \frac{1}{\delta}$.
The space of continuous linear operators $T : V \to W$ between the normed spaces $(V, \|\cdot\|)$ and $(W, \|\cdot\|)$ is denoted by $L(V, W)$. It becomes a normed space with norm $\|T\|$.
Lemma 2.3.2. If $(W, \|\cdot\|)$ is a Banach space, then so is $(L(V, W), \|\cdot\|)$.
The proof is the same as the one of Lemma 2.2.2, simply replacing $(\mathbb{R}, |\cdot|)$ by $(W, \|\cdot\|)$.
Remark 2.3.1. Again, $(V, \|\cdot\|)$ need not be a Banach space here.
Lemma 2.3.3. Let $T \in L(V, W)$. Then
$\ker T := \{x \in V : Tx = 0\}$
is a closed linear subspace of $V$.
Proof. $\ker T = T^{-1}(0)$ is the pre-image of a closed set under a continuous map, hence closed. q.e.d.
In the sequel, we shall encounter bijective continuous linear operators $T : V \to W$ between Banach spaces. It is a general theorem in functional analysis, the inverse operator theorem, that the inverse of $T$, denoted by $T^{-1}$, is then continuous as well. Here, however, we do not want to prove that result, and we shall therefore frequently assume that $T^{-1}$ is continuous although that assumption is automatically fulfilled in the light of that theorem.
Lemma 2.3.4. Let $T : V \to W$ be a bijective continuous linear map between Banach spaces, with a continuous inverse $T^{-1}$. If $S \in L(V, W)$ satisfies
$\|T - S\| < \frac{1}{\|T^{-1}\|}$,
then $S$ is bijective as well, with a continuous inverse $S^{-1}$.
Proof. We write
$S = T(\mathrm{Id} - T^{-1}(T - S))$. (2.3.3)
As with the geometric series, the inverse of $S$ then is given by
$S^{-1} = \left(\sum_{\nu=0}^{\infty} (T^{-1}(T - S))^\nu\right) T^{-1}$, (2.3.4)
provided that series converges. However,
$\left\|\sum_{\nu=m}^{n} (T^{-1}(T - S))^\nu\right\| \le \sum_{\nu=m}^{n} \|(T^{-1}(T - S))^\nu\| \le \sum_{\nu=m}^{n} (\|T^{-1}\| \|T - S\|)^\nu$,
and since $\|T^{-1}\| \|T - S\| < 1$ by assumption, the series satisfies the Cauchy property and hence converges to a linear operator with finite norm. q.e.d.
If $V$ is a vector space, we say that $V$ is the direct sum of the subspaces $V_1$, $V_2$,
$V = V_1 \oplus V_2$,
if for every $x \in V$, we can find unique elements $x_1 \in V_1$, $x_2 \in V_2$, with $x = x_1 + x_2$.
We then also call $V_1$ and $V_2$ complementary subspaces of $V$. Easy linear algebra also shows that if $V_1$ possesses a complementary subspace of finite dimension, then the dimension of that space is uniquely determined, i.e. if $V_1 \oplus V_2 = V_1 \oplus V_2'$, then $\dim V_2 = \dim V_2'$. We now consider a normed vector space $(V, \|\cdot\|)$. Then every finite dimensional subspace $V_0$ is complete, hence closed. We also have:
Lemma 2.3.5. Let $V_0 \subset V$ be a finite dimensional subspace of the normed vector space $(V, \|\cdot\|)$. Then $V_0$ possesses a closed complementary subspace $V_1$, i.e. $V = V_0 \oplus V_1$.
Proof. Let $e_1, \dots, e_n$ be a basis of $V_0$, $f_0^j : V_0 \to \mathbb{R}$ be the linear functionals with
$f_0^j(e_i) = \delta_{ij}$ ($i, j = 1, \dots, n$).
By Corollary 2.1.1, we may find extensions $f^j : V \to \mathbb{R}$ with $f^j|_{V_0} = f_0^j$. We define $\pi : V \to V$ as
$\pi x := \sum_{j=1}^n f^j(x) e_j$.
$\pi$ is continuous, with $\pi(V) = V_0$. $V_1 := \ker \pi$ then is closed as the kernel of a continuous linear operator (Lemma 2.3.3), and every $x \in V$ admits the unique decomposition $x = \pi(x) + (x - \pi(x))$ with $\pi(x) \in V_0$, $x - \pi(x) \in V_1$, because $\pi \circ \pi = \pi$. q.e.d.
Definition 2.3.1. Let $T : V \to W$ be a continuous linear operator between Banach spaces $(V, \|\cdot\|)$ and $(W, \|\cdot\|)$. $T$ is called a Fredholm operator if the following conditions hold:
(i) $V_0 = \ker T$ is finite dimensional. Consequently, according to Lemma 2.3.5, there exists a closed subspace $V_1$ of $V$ with
$V = V_0 \oplus V_1$. (2.3.5)
(ii) There exists a finite dimensional subspace $W_0$ of $W$, called the cokernel of $T$ ($\mathrm{coker}\, T$), giving rise to a decomposition of $W$ into closed subspaces
$W = W_0 \oplus W_1$ (2.3.6)
with $W_1 = T(V) =: R(T)$ (range of $T$).
Thus, $T$ yields a bijective continuous linear operator $T_1 : V_1 \to W_1$. We finally require
(iii) $T_1^{-1} : W_1 \to V_1$ is continuous.
For a Fredholm operator $T$, we call
$\mathrm{ind}\, T := \dim V_0 - \dim W_0$ ($= \dim \ker T - \dim \mathrm{coker}\, T$)
the index of $T$. The set of all Fredholm operators $T : V \to W$ is denoted by $F(V, W)$.
Remark 2.3.2. Question to the reader: Why is $F(V, W)$ not a vector space?
Remark 2.3.3. As mentioned, condition (iii) is automatically satisfied as a consequence of the inverse operator theorem.
Remark 2.3.4. In our conventions, the cokernel of $T$ is only determined up to isomorphism, i.e. any $W_0$ satisfying (2.3.6) with $W_1 = T(V)$ is a cokernel. Usually, one defines the cokernel as the quotient space $W/W_1$, but here we do not want to introduce quotient spaces of Banach spaces.
Theorem 2.3.1. Let $V, W$ be Banach spaces. $F(V, W)$ is open in $L(V, W)$, and
$\mathrm{ind} : F(V, W) \to \mathbb{Z}$
is continuous, hence constant on each connected component of $F(V, W)$.
Proof. Let $T : V \to W$ be a Fredholm operator. We use the decompositions
$V = V_0 \oplus V_1$ with $V_0 = \ker T$,
$W = W_0 \oplus W_1$ with $W_0 = \mathrm{coker}\, T$
of Definition 2.3.1. For $S \in L(V, W)$, we define a continuous linear operator
$S' : V_1 \times W_0 \to W$, $(x, z) \mapsto Sx + z$,
and we obtain a continuous linear operator
$L(V, W) \to L(V_1 \times W_0, W)$, $S \mapsto S'$.
Since $T_1 : V_1 \to W_1$ is bijective with a continuous inverse, $T'$ is also bijective with a continuous inverse, and by Lemma 2.3.4 this then also holds for $S'$ for all $S$ in some neighbourhood of $T$. For such $S$, $S'(V_1)$ is closed as $V_1$ is closed and $(S')^{-1}$ is continuous, and we have the decomposition
$W = S'(V_1) \oplus S'(W_0)$,
and since $S'(V_1) = S(V_1)$ also
$W = S(V_1) \oplus S'(W_0)$, (2.3.7)
and since $W_0$ is finite dimensional, so is $S'(W_0)$. Then $S(V) \supset S(V_1)$ is also closed since $S(V_1)$ is closed and possesses a complementary subspace of finite dimension. Finally, the dimension of the kernel of $S$ is upper semicontinuous.
Namely, if $S$ is in our above neighbourhood of $T$, then since $S'$ is bijective, $S$ is injective on $V_1$, and hence the kernel of $S$ is contained in some complementary subspace of $V_1$, and as observed above, the dimension of such a subspace equals the one of $V_0$. Thus
$\dim \ker S \le \dim \ker T$ (2.3.8)
if $S$ is in a suitable neighbourhood of $T$ in $L(V, W)$. Altogether, we have verified that $S$ is a Fredholm operator if it is sufficiently close to $T$. From the preceding, we see that there exist finite dimensional subspaces $V_0' = \ker S$ and $V_0''$ of $V$ with
$V = V_0' \oplus V_0'' \oplus V_1$,
and thus
$\dim V_0' + \dim V_0'' = \dim V_0$ ($V_0 = \ker T$). (2.3.9)
$S$ thus is injective on $V_0'' \oplus V_1$, and since $S$ coincides with $S'$ on $V_1$, we get a decomposition
$W = S(V_1) \oplus S(V_0'') \oplus W_0'$,
with $W_0' = \mathrm{coker}\, S$, and from (2.3.7)
$\dim S(V_0'') + \dim W_0' = \dim S'(W_0) = \dim W_0$ (2.3.10)
since $S'$ is bijective. Consequently
$\mathrm{ind}\, S = \dim \ker S - \dim \mathrm{coker}\, S$
$= \dim V_0' - \dim W_0'$
$= (\dim V_0 - \dim V_0'') - (\dim W_0 - \dim S(V_0''))$ by (2.3.9), (2.3.10)
$= \dim V_0 - \dim W_0$ since $S$ is injective on $V_0''$
$= \mathrm{ind}\, T$
for $S$ in some neighbourhood of $T$. q.e.d.
The following result motivates the definition of a Fredholm operator:
Theorem 2.3.2 (Fredholm alternative). Let $V$ be a Banach space, $T : V \to V$ a Fredholm operator of index 0. We consider the equation
$Tx = y$. (2.3.11)
Either
(i) $Tx = y$ is solvable for all $y$, and thus $T$ is surjective, hence also injective as $\mathrm{ind}\, T = 0$, and so the solution $x$ is uniquely determined by $y$, or
(ii) $Tx = y$ is only solvable if $y$ is contained in some proper subspace of $V$ (with a finite dimensional complementary subspace), and for each such $y$, the solutions $x$ constitute a finite dimensional affine subspace.
Proof. A direct consequence of the definition. q.e.d.
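In finite dimensions every linear map $T : \mathbb{R}^n \to \mathbb{R}^n$ is a Fredholm operator of index 0, so the alternative can be checked by hand. The following sketch is an illustration only, not part of the theory above: it uses exact Gaussian elimination, and the function names `rank`, `solvable` and `index` are hypothetical helpers introduced for this example.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def solvable(T, y):
    """T x = y is solvable iff adjoining y as a column does not raise the rank."""
    augmented = [list(row) + [yi] for row, yi in zip(T, y)]
    return rank(augmented) == rank(T)

def index(T):
    """dim ker T - dim coker T for a matrix T with len(T) rows and len(T[0]) columns."""
    dim_ker = len(T[0]) - rank(T)
    dim_coker = len(T) - rank(T)
    return dim_ker - dim_coker

T_regular = [[2, 1], [1, 1]]   # case (i): injective and surjective
T_singular = [[1, 2], [2, 4]]  # case (ii): one-dimensional kernel and cokernel

print(index(T_regular), index(T_singular))
print(solvable(T_singular, [1, 2]), solvable(T_singular, [1, 0]))
```

For `T_singular`, the equation is solvable only for $y$ in the range, a proper subspace with a one-dimensional complement, and for such $y$ the solutions form a one-dimensional affine subspace.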
2.4 Calculus in Banach spaces
In this section, we collect some material that will only be used in Chapters 8 and 9.
Definition 2.4.1. Let $(V, \|\cdot\|_V)$, $(W, \|\cdot\|_W)$ be Banach spaces, $F : V \to W$ a map. $F$ is called differentiable (in the sense of Fréchet) at $u \in V$ if there exists a bounded linear map
$DF(u) : V \to W$
with
$\lim_{v \to 0,\ v \ne 0} \frac{\|F(u + v) - F(u) - DF(u)(v)\|_W}{\|v\|_V} = 0$.
$F$ is called differentiable in $U \subset V$ if it is differentiable at every $u \in U$. $F$ is said to be of class $C^1$ if $DF(u)$ depends continuously on $u$. $F$ is said to be of class $C^2$ if $DF(u)$ is differentiable in $u$ and the derivative $D^2F(u) := D(DF)(u)$ depends continuously on $u$.
It is easy to show that a differentiable map is continuous. We now wish to derive the implicit and inverse function theorems in Banach spaces that will be used in Chapter 8. We shall need a technical tool, the Banach fixed point theorem:
Lemma 2.4.1. Let $A$ be a closed subset of some Banach space $(V, \|\cdot\|)$. Let $0 < q < 1$, and suppose $G : A \to A$ satisfies
$\|Gy_1 - Gy_2\| \le q \|y_1 - y_2\|$ for all $y_1, y_2 \in A$. (2.4.2)
Then there exists a unique $y \in A$ with
$Gy = y$. (2.4.3)
If we have a continuous family $G(x)$ where all the $G(x)$ satisfy (2.4.2) (with $q$ not depending on $x$), then the solution $y = y(x)$ of (2.4.3) depends continuously on $x$.
Proof. We choose $y_0 \in A$ and put iteratively
$y_n := G y_{n-1}$.
We have
$y_n = \sum_{i=1}^n (y_i - y_{i-1}) + y_0 = \sum_{i=1}^n (G^{i-1} y_1 - G^{i-1} y_0) + y_0$. (2.4.4)
We obtain from (2.4.2)
$\sum_{i=1}^n \|G^{i-1} y_1 - G^{i-1} y_0\| \le \sum_{i=1}^n q^{i-1} \|y_1 - y_0\| \le \frac{1}{1 - q} \|y_1 - y_0\|$.
Consequently, the series $y_n$ in (2.4.4) converges absolutely and uniformly to some $y \in A$, noting that $A$ is assumed to be closed, and the limit function $y = y(x)$ is continuous. We have
$y = \lim_{n \to \infty} G y_n = G\left(\lim_{n \to \infty} y_n\right) = Gy$,
hence (2.4.3). The uniqueness of a solution of (2.4.3) follows from (2.4.2), since $q < 1$. q.e.d.
Theorem 2.4.1 (Implicit Function Theorem). Let $V_1, V_2, W$ be Banach spaces with all norms denoted by $\|\cdot\|$, $U \subset V_1 \times V_2$ open, $(x_0, y_0) \in U$, $F \in C^1(U, W)$, i.e. $F$ is continuously differentiable. For purposes of normalization solely, we assume
$F(x_0, y_0) = 0$. (2.4.5)
We also suppose that $D_2F(x_0, y_0) : V_2 \to W$, the derivative of $F(x_0, \cdot) : V_2 \to W$ at $y = y_0$, is invertible. By our differentiability assumption, $D_2F(x_0, y_0)$ is continuous, and we assume that its inverse is likewise continuous. Then there exist open neighbourhoods $U_1$ of $x_0$, $U_2$ of $y_0$ with $U_1 \times U_2 \subset U$, and a differentiable map
$\varphi : U_1 \to U_2$
with
$F(x, \varphi(x)) = 0$ (2.4.6)
and
$D\varphi(x) = -(D_2F(x, \varphi(x)))^{-1} \circ D_1F(x, \varphi(x))$ for all $x \in U_1$ (2.4.7)
($D_1F(\cdot, y) : V_1 \to W$ is the derivative of $F(\cdot, y) : V_1 \to W$). In fact, for every $x \in U_1$, $\varphi(x)$ is the only solution $y \in U_2$ of $F(x, y) = 0$. Thus, the equation
$F(x, y) = 0$
can be solved locally uniquely for $y$ as a function of $x$, if the derivative of $F$ w.r.t. $y$ is continuously invertible.
Proof. The idea is to transform the problem into a fixed point problem for which the Banach fixed point theorem is applicable. We put
$l := D_2F(x_0, y_0)$.
With this notation, our fixed point equation is
$\Phi(x, y) := y - l^{-1} F(x, y) = y$ (2.4.8)
Using l~ ol = id (note that / is invertible by assumption), we get *(ar, yi) - *(x, y2) = r1(D2
F(x0j y0)(yi - y2) - (F(ar, t/i) - F(x,
y2)).
In Lemma 2.4.1, we take q = ~, and by the differentiability of F at (#o>2/o) and the continuity of Z"1, we may find 6' > 0,£ > 0 with the property that for Wx-xoW^S' and \\yi - Vo\\ < ^, I life - yo\\ < e
( hence also \\yi - y2\\ < 2e ),
2.4 Calculus in Banach spaces
153
we have ||*(rc,yi) - *(a?,i&)|| < - \\yi - !fe||. Furthermore, we may find 6" > 0 with the property that for \\x — sco11 < <$"> we have ||*(a?,»b)-*(a?o,»d)ll < 2* Since $(#o,2/o) = 2/o by assumption, we then have for \\y — yo\\ < e \\Q(x,y) - voll < ||*(»,») " *(aMfo)ll + 11*0*. 0o) " *(*o, tt>)|| <^\\y-yo\\
+
^<e
whenever \\x — XQ\\ < 6 := min(<$',<$"). This means that if \\x — XQ\\ < <5, $ (#,?/) maps the closed ball A:={y£V2:\\y-y0\\<e} onto itself. By Lemma 2.4.1, for every x with \\x — XQ\\ < <5, there exists a unique y =: (£>(#) with \\y — y0\\ < e and y = $(x,t/), i.e. F(x1y) = 0. Moreover, t/ depends continuously on x. We consider the open balls Ui := {x : \\x - x0\\ < (5}, t/2 := {2/: ||v - Vo\\ < £>• ($(#,•) also maps the open ball U2 onto itself.) By choosing <$,£ > 0 smaller, if necessary, we may assume f / i x t / 2 C U. It remains to show that
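$\varphi$ is differentiable and that (2.4.7) holds. Let $x_1 \in U_1$, $y_1 := \varphi(x_1) \in U_2$, and abbreviate
$l_1 := D_1F(x_1, y_1)$, $l_2 := D_2F(x_1, y_1)$.
Since $F$ is differentiable and $F(x_1, y_1) = 0$, we may write
$F(x, y) = l_1(x - x_1) + l_2(y - y_1) + r(x, y)$
where the remainder term satisfies
$\lim_{x \to x_1,\ y \to y_1} \frac{\|r(x, y)\|}{\|x - x_1\| + \|y - y_1\|} = 0$. (2.4.9)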
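Since $F(x, \varphi(x)) = 0$ for $x \in U_1$, we obtain
$\varphi(x) = -l_2^{-1} l_1 (x - x_1) + y_1 - l_2^{-1} r(x, \varphi(x))$. (2.4.10)
By (2.4.9), we may find $\eta, \rho > 0$ such that for $\|x - x_1\| \le \eta$, $\|y - y_1\| \le \rho$,
$\|r(x, y)\| \le \frac{1}{2 \|l_2^{-1}\|} \left(\|x - x_1\| + \|y - y_1\|\right)$.
Thus, as $\varphi$ is continuous, for $x$ sufficiently close to $x_1$,
$\|r(x, \varphi(x))\| \le \frac{1}{2 \|l_2^{-1}\|} \left(\|x - x_1\| + \|\varphi(x) - y_1\|\right)$,
and (2.4.10) then gives $\|\varphi(x) - y_1\| \le c \|x - x_1\|$ for some constant $c$. Consequently, (2.4.10) can be written as
$\varphi(x) - \varphi(x_1) = -l_2^{-1} l_1 (x - x_1) + r_0(x)$, (2.4.12)
with $r_0(x) := -l_2^{-1} r(x, \varphi(x))$ and
$\lim_{x \to x_1} \frac{\|r_0(x)\|}{\|x - x_1\|} = 0$ from (2.4.9). (2.4.13)
(2.4.12) and (2.4.13) yield the differentiability of $\varphi$ at $x_1$ with derivative $-l_2^{-1} \circ l_1$, which is (2.4.7).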
(2.4.14)
Proof. We shall apply Theorem 2.4.1 to $F(x, y) := f(y) - x$, and find an open neighbourhood $U_1$ of $x_0$ and a differentiable function $\varphi$ with $\varphi(U_1) \subset U_2$ for a neighbourhood $U_2$ of $y_0$, with
$f(\varphi(x)) = x$ for $x \in U_1$.
As $\varphi(U_1) = f^{-1}(U_1)$ is open, we may redefine $U_2$ as $\varphi(U_1)$, and $\varphi$ then yields a bijection between $U_1$ and $U_2$. As $f(\varphi(x)) = x$, we have $Df(y_0) \circ D\varphi(x_0) = \mathrm{id}$, i.e. (2.4.14). q.e.d.
The next topic concerns ordinary differential equations in Banach spaces. In Chapter 9, we shall use the Picard-Lindelöf theorem in a Banach space that we shall now derive. We need the integral of a continuous function $x : I \to V$ from some interval $I = [a, b] \subset \mathbb{R}$ into some Banach space $V$,
$\int_a^b x(t)\,dt$.
This can be defined as a Riemann integral as in the case of real-valued functions through approximation by step functions. Given a continuous
$\Phi : I \times V \to V$,
we say that $x(t)$ solves the ODE (ordinary differential equation)
$\frac{d}{dt} x(t) = \dot{x}(t) = \Phi(t, x(t))$ with $x(a) = x_0$ (2.4.15)
on $I$, if for all $t \in I$
$x(t) = x_0 + \int_a^t \Phi(\tau, x(\tau))\,d\tau$. (2.4.16)
Theorem 2.4.2 (Picard-Lindelöf). Suppose that $\Phi$ is uniformly Lipschitz continuous, i.e. suppose there exists some $L < \infty$ with
$\|\Phi(t_1, x_1) - \Phi(t_2, x_2)\| \le L \left(|t_1 - t_2| + \|x_1 - x_2\|\right)$ for all $t_1, t_2 \in I$, $x_1, x_2 \in V$. (2.4.17)
Then for any $x_0 \in V$, there exists a unique solution of (2.4.15).
Proof. We shall solve (2.4.16) with the help of Lemma 2.4.1. For a continuous $y : I \to V$, we define $Gy \in C^0(I, V)$,
$(Gy)(t) := x_0 + \int_a^t \Phi(\tau, y(\tau))\,d\tau$.
We note that $C^0(I, V)$, the space of continuous functions from $I$ with values in $V$, is a Banach space w.r.t. the norm
$\|y\|_{C^0} := \sup_{t \in I} \|y(t)\|$.
(To verify this, one just needs to observe that any sequence $(y_n)_{n \in \mathbb{N}} \subset C^0(I, V)$ with
$\lim_{n,m \to \infty} \|y_n - y_m\|_{C^0} \left(= \lim_{n,m \to \infty} \sup_{t \in I} \|y_n(t) - y_m(t)\|\right) = 0$
converges uniformly to some continuous function $y : I \to V$.) We have
$\|Gy_1 - Gy_2\|_{C^0} = \sup_{t \in I} \left\|\int_a^t \left(\Phi(\tau, y_1(\tau)) - \Phi(\tau, y_2(\tau))\right) d\tau\right\| \le \sup_{t \in I} |t - a|\, L\, \|y_1 - y_2\|_{C^0}$ because of (2.4.17).
We choose $\epsilon > 0$ so small that
$\epsilon L \le \frac{1}{2}$.
Lemma 2.4.1 with $V$ replaced by $C^0([a, a + \epsilon], V)$ and with $q = \frac{1}{2}$ then implies that there exists a unique $y \in C^0([a, a + \epsilon], V)$ with
$y(t) = (Gy)(t) = x_0 + \int_a^t \Phi(\tau, y(\tau))\,d\tau$ for $a \le t \le a + \epsilon$
$x(0) = x_0$
($x(0)$, the value at 'time' 0, is called initial value). We denote the solution by $x(x_0, t)$. Then for $s, t > 0$,
$x(x_0, t + s) = x(x(t), s)$ (semigroup property). (2.4.18)
Thus, the solution with initial value $x_0$ at 'time' $t + s$ is the same as the solution with initial value $x(t)$ computed at 'time' $s$.
Proof. This follows from the uniqueness statement in Theorem 2.4.2, as both sides of (2.4.18) are solutions. q.e.d.
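The fixed point iteration behind Theorem 2.4.2 and the semigroup property can be tried out on the simplest example. The following sketch makes several assumptions that are not in the text: it takes $V = \mathbb{R}$, the autonomous right-hand side $\Phi(x) = x$ (so $L = 1$), the interval $[0, \frac{1}{2}]$ (so $\epsilon L \le \frac{1}{2}$), and it represents the Picard iterates exactly as polynomials with rational coefficients. The names `picard_step`, `evaluate`, `sup_distance` and `flow` are hypothetical helpers.

```python
from fractions import Fraction

def picard_step(coeffs, x0):
    """Apply (Gy)(t) = x0 + int_0^t y(tau) dtau to the polynomial y = sum coeffs[k] t^k."""
    integrated = [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]
    return [Fraction(x0)] + integrated

def evaluate(coeffs, t):
    """Value of the polynomial with the given coefficients at t."""
    return sum(c * t**k for k, c in enumerate(coeffs))

def sup_distance(p, q, eps, samples=50):
    """C^0 distance on [0, eps], approximated by sampling (an assumption of this sketch)."""
    points = [eps * Fraction(j, samples) for j in range(samples + 1)]
    return max(abs(evaluate(p, t) - evaluate(q, t)) for t in points)

def flow(x0, t, steps=12):
    """Approximate x(x0, t) for the ODE x' = x by `steps` Picard iterations."""
    y = [Fraction(x0)]
    for _ in range(steps):
        y = picard_step(y, x0)
    return evaluate(y, t)

eps = Fraction(1, 2)
y = [Fraction(1)]          # start with the constant function y_0 = x_0 = 1
distances = []
for _ in range(8):
    y_next = picard_step(y, 1)
    distances.append(sup_distance(y_next, y, eps))
    y = y_next

# Successive iterates approach each other at least as fast as q = 1/2 predicts.
print([float(d) for d in distances])

# Semigroup property, up to the truncation error of the iteration.
quarter = Fraction(1, 4)
print(float(flow(1, 2 * quarter)), float(flow(flow(1, quarter), quarter)))
```

Here the iterates are the Taylor polynomials of the exponential function, so the limit is the known solution $x(t) = e^t$; for a general $\Phi$ the integral would have to be approximated numerically.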
Exercises
2.1 Let $(V, \|\cdot\|_V)$, $(W, \|\cdot\|_W)$ be normed linear spaces. For a linear map $f : V \to W$, put
$\|f\| := \sup_{x \in V \setminus \{0\}} \frac{\|f(x)\|_W}{\|x\|_V}$.
Show that $f$ is continuous iff $\|f\| < \infty$. Let $L(V, W) := \{f : V \to W$ linear with $\|f\| < \infty\}$. Show that if $(W, \|\cdot\|_W)$ is a Banach space then so is $(L(V, W), \|\cdot\|)$.
2.2 Show that a normed space $(V, \|\cdot\|)$ is uniformly convex if the following condition holds: Whenever $(x_n)_{n \in \mathbb{N}}, (y_n)_{n \in \mathbb{N}} \subset V$ satisfy $\limsup \|x_n\| \le 1$, $\limsup \|y_n\| \le 1$ and $\lim \|x_n + y_n\| = 2$, then $\lim_{n \to \infty} (x_n - y_n) = 0$.
2.3 A normed space $(V, \|\cdot\|)$ is called strictly normed if the following condition holds: Whenever $x, y \in V$, $x, y \ne 0$ satisfy
$\|x + y\| = \|x\| + \|y\|$,
then there exists $\alpha > 0$ with $x = \alpha y$. Show that any uniformly convex normed space is strictly normed.
2.4 Does the Banach fixed point theorem (Lemma 2.4.1) continue to hold if we replace (2.4.2) by the condition $\|Gy_1 - Gy_2\| < \|y_1 - y_2\|$ for all $y_1, y_2 \in A$ with $y_1 \ne y_2$?
3 $L^p$ and Sobolev spaces

3.1 $L^p$ spaces
In the sequel, instead of functions $f : A \to \mathbb{R} \cup \{\pm\infty\}$ ($A$ measurable), we shall consider equivalence classes of functions, where $f$ and $g$ are equivalent if $f(x) = g(x)$ for almost all $x \in A$. We shall be lax with the notation, however, not distinguishing between a function and its equivalence class. The equivalence class of the zero function is called the null class, and a function in that class is called a null function.
Definition 3.1.1. Let $A \subset \mathbb{R}^d$ be measurable, $p \in \mathbb{R} \setminus \{0\}$.
$$L^p(A) := \{(\text{equivalence classes of}) \text{ measurable functions } f : A \to \mathbb{R} \cup \{\pm\infty\} \text{ with } |f(x)|^p \in L^1(A)\}.$$
For $f \in L^p(A)$, we put
$$\|f\|_p := \|f\|_{L^p(A)} := \Big(\int_A |f(x)|^p\,dx\Big)^{\frac{1}{p}}. \tag{3.1.1}$$
The notation suggests that $\|\cdot\|_p$ is a norm, and we now proceed to verify this for $p \ge 1$. First of all,
$$\|f\|_p = 0 \iff f \text{ is a null function.} \tag{3.1.2}$$
Thus, $\|\cdot\|_p$ is positive definite (on the set of equivalence classes). Next, for $c \in \mathbb{R}$,
$$\|cf\|_p = |c|\,\|f\|_p. \tag{3.1.3}$$
It remains to verify the triangle inequality. This is obvious for $p = 1$:
$$\|f_1 + f_2\|_{L^1(A)} \le \|f_1\|_{L^1(A)} + \|f_2\|_{L^1(A)}. \tag{3.1.4}$$
For $p > 1$, we need
Lemma 3.1.1 (Hölder's inequality). Let $p, q > 1$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$, $f_1 \in L^p(A)$, $f_2 \in L^q(A)$. Then $f_1 f_2 \in L^1(A)$, and
$$\|f_1 f_2\|_1 \le \|f_1\|_p \|f_2\|_q. \tag{3.1.5}$$
Proof. By homogeneity, we may assume w.l.o.g.
$$\|f_1\|_p = 1, \quad \|f_2\|_q = 1. \tag{3.1.6}$$
Recalling Young's inequality, namely
$$ab \le \frac{a^p}{p} + \frac{b^q}{q} \quad \text{for } a, b \ge 0,\ p, q > 1,\ \frac{1}{p} + \frac{1}{q} = 1, \tag{3.1.7}$$
we have for $x \in A$
$$|f_1(x) f_2(x)| \le \frac{|f_1(x)|^p}{p} + \frac{|f_2(x)|^q}{q},$$
hence by our normalization (3.1.6)
$$\int_A |f_1(x) f_2(x)|\,dx \le \frac{1}{p} + \frac{1}{q} = 1 = \|f_1\|_p \|f_2\|_q.$$
q.e.d.
We now obtain the triangle inequality:
Lemma 3.1.2 (Minkowski's inequality). Let $f_1, f_2 \in L^p(A)$, $p \ge 1$. Then
$$\|f_1 + f_2\|_p \le \|f_1\|_p + \|f_2\|_p. \tag{3.1.8}$$
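Both inequalities just stated can be checked on discretized functions, where the integrals become finite sums. The sketch below is only an illustration under assumptions: the sample functions `f1`, `f2` on $A = (0, 1)$, the exponent $p = 3$ and the helper `lp_norm` (a Riemann-sum approximation of $\|\cdot\|_p$) are illustrative choices.

```python
import numpy as np

def lp_norm(values, p, dx):
    """Riemann-sum approximation of the L^p norm on a uniform grid with spacing dx."""
    return (np.sum(np.abs(values) ** p) * dx) ** (1.0 / p)

x = np.linspace(0.0, 1.0, 10001)
dx = x[1] - x[0]
f1 = np.sin(7 * x) + x ** 2          # illustrative sample functions on A = (0, 1)
f2 = np.exp(-x) * np.cos(3 * x)

p = 3.0
q = p / (p - 1.0)                    # conjugate exponent, 1/p + 1/q = 1

holder_lhs = lp_norm(f1 * f2, 1.0, dx)                      # ||f1 f2||_1
holder_rhs = lp_norm(f1, p, dx) * lp_norm(f2, q, dx)        # ||f1||_p ||f2||_q
minkowski_lhs = lp_norm(f1 + f2, p, dx)                     # ||f1 + f2||_p
minkowski_rhs = lp_norm(f1, p, dx) + lp_norm(f2, p, dx)     # ||f1||_p + ||f2||_p
```

Since the discrete sums are themselves integrals w.r.t. a weighted counting measure, (3.1.5) and (3.1.8) hold for them exactly, not just approximately.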
Proof. The case $p = 1$ is given by (3.1.4). We now consider $p > 1$ and put $q := \frac{p}{p-1}$ (so that $\frac{1}{p} + \frac{1}{q} = 1$). For $\psi(x) := |f_1(x) + f_2(x)|^{p-1}$, we have
$$\psi^q = |f_1 + f_2|^p,$$
i.e. $\psi \in L^q(A)$. Since
$$|f_1(x) + f_2(x)|^p \le |f_1(x)\psi(x)| + |f_2(x)\psi(x)|,$$
we get
$$\|f_1 + f_2\|_p^p \le \|f_1 \psi\|_1 + \|f_2 \psi\|_1 \le (\|f_1\|_p + \|f_2\|_p)\|\psi\|_q = (\|f_1\|_p + \|f_2\|_p)\|f_1 + f_2\|_p^{\frac{p}{q}}$$
by Hölder's inequality (3.1.5). Since $p - \frac{p}{q} = 1$, (3.1.8) follows. q.e.d.
We have thus verified that $\|\cdot\|_p$ is a norm on $L^p(A)$. In fact, we have:
Theorem 3.1.1 (Riesz-Fischer). Let $A$ be measurable, $p \ge 1$. Then $L^p(A)$ is a Banach space.
Proof. Let $(f_n)_{n \in \mathbb{N}} \subset L^p(A)$ be a Cauchy sequence. For every $\nu \in \mathbb{N}$, we may then find $n_\nu \in \mathbb{N}$ with
$$\|f_n - f_{n_\nu}\|_p < 2^{-\nu} \quad \text{for all } n \ge n_\nu.$$
This implies that the series
$$\|f_{n_1}\|_p + \sum_{\nu=1}^{\infty} \|f_{n_{\nu+1}} - f_{n_\nu}\|_p \tag{3.1.9}$$
converges. We claim that the series
$$f_{n_1} + \sum_{\nu=1}^{\infty} (f_{n_{\nu+1}} - f_{n_\nu}) \tag{3.1.10}$$
then converges in $L^p(A)$. To see this, put $g_m := \sum_{\nu=1}^{m} |f_{n_{\nu+1}} - f_{n_\nu}|$. By Minkowski's inequality (3.1.8) and the convergence of (3.1.9), $\|g_m\|_p$ is bounded independently of $m$. Since all elements of the series are nonnegative, $(g_m)_{m \in \mathbb{N}}$ converges to some $g : A \to \mathbb{R}^+ \cup \{\infty\}$ pointwise in $A$, and Corollary 1.2.1 implies that $(g_m)$ also converges to $g$ in $L^p(A)$. In particular, $g(x) < \infty$ for almost all $x \in A$. Thus, our original series (3.1.10) is absolutely convergent for almost all $x \in A$, towards some $f$ with $|f| \le g + |f_{n_1}|$; in particular $f \in L^p(A)$. We interrupt the proof to record:
Lemma 3.1.3. Let $(f_n)_{n \in \mathbb{N}}$ converge to $f$ in $L^p(A)$. Then some subsequence converges pointwise almost everywhere to $f$.
In order to complete the proofs of Lemma 3.1.3 and Theorem 3.1.1, it remains to show that the series (3.1.10) converges to $f$ in $L^p(A)$. (Then a subsequence of $(f_n)$ converges to $f$ in $L^p(A)$. Since $(f_n)$ was assumed to be a Cauchy sequence in $L^p(A)$, the whole sequence has to converge in $L^p(A)$. It is in general not true, however, that the whole sequence also converges pointwise almost everywhere to $f$.) This is easy:
$$f_{n_1}(x) + \sum_{\nu=1}^{m} \big(f_{n_{\nu+1}}(x) - f_{n_\nu}(x)\big) - f(x)$$
converges to 0 almost everywhere in $A$ as $m \to \infty$, and since
$$\Big| f_{n_1}(x) + \sum_{\nu=1}^{m} \big(f_{n_{\nu+1}}(x) - f_{n_\nu}(x)\big) - f(x) \Big| \le 2|g(x)| + 2|f_{n_1}(x)|,$$
we may apply Lebesgue's Theorem 1.2.3 on dominated convergence to conclude that we get convergence also w.r.t. $\|\cdot\|_p$. q.e.d.
Corollary 3.1.1. $L^2(A)$ is a Hilbert space with scalar product
$$(f_1, f_2) := \int_A f_1(x) f_2(x)\,dx.$$
Proof. It follows from Hölder's inequality (Lemma 3.1.1) that
$$|(f_1, f_2)| \le \|f_1\|_2 \|f_2\|_2.$$
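Theorem 3.1.2. $L^\infty(A)$ is a Banach space.
Proof. It is straightforward to verify that $\|\cdot\|_\infty$ is a norm. It remains to show completeness. Thus, let $(f_n)_{n \in \mathbb{N}}$ be a Cauchy sequence in $L^\infty$. For $\nu \in \mathbb{N}$, we find $n_\nu \in \mathbb{N}$ such that for $m, n \ge n_\nu$
$$\|f_n - f_m\|_\infty < \frac{1}{\nu}.$$
Thus
$$\Big\{x \in A \;\Big|\; |f_n(x) - f_m(x)| > \frac{1}{\nu}\Big\}$$
is a null set for $m, n \ge n_\nu$, and so then is
$$N := \bigcup_{\nu \in \mathbb{N}}\ \bigcup_{m, n \ge n_\nu} \Big\{x \in A \;\Big|\; |f_n(x) - f_m(x)| > \frac{1}{\nu}\Big\}$$
as the countable union of null sets. Since
$$|f_n(x) - f_m(x)| \le \frac{1}{\nu} \quad \text{for } m, n \ge n_\nu \text{ and } x \in A \setminus N,$$
$f_n$ converges uniformly on $A \setminus N$ towards some $f$. We simply put $f(x) = 0$ for $x \in N$. Then
$$\operatorname*{ess\,sup}_{x \in A} |f_n(x) - f(x)| = \operatorname*{ess\,sup}_{x \in A \setminus N} |f_n(x) - f(x)| \le \frac{1}{\nu} \quad \text{for } n \ge n_\nu,$$
since the essential supremum is not affected by null sets, and $f_n$ converges to $f$ in $L^\infty(A)$. q.e.d.
We also note that Hölder's inequality admits the following extension to the case $p = 1$, $q = \infty$:
Lemma 3.1.4. Let $f_1 \in L^1(A)$, $f_2 \in L^\infty(A)$. Then $f_1 f_2 \in L^1(A)$, and
$$\|f_1 f_2\|_1 \le \|f_1\|_1 \|f_2\|_\infty. \tag{3.1.11}$$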
Proof.
$$\int_A |f_1(x) f_2(x)|\,dx \le \operatorname*{ess\,sup}_{x \in A} |f_2(x)| \int_A |f_1(x)|\,dx = \|f_2\|_\infty \|f_1\|_1.$$
q.e.d.
Theorem 3.1.3. Let $A \subset \mathbb{R}^d$ be measurable. Let $1 < p < \infty$, $q = \frac{p}{p-1}$, i.e. $\frac{1}{p} + \frac{1}{q} = 1$. Then $L^q(A)$ is the dual space of $L^p(A)$. In particular, $L^p(A)$ is reflexive.
Remark 3.1.1. The dual space of $L^1(A)$ is given by $L^\infty(A)$ while the dual space of $L^\infty(A)$ is larger than $L^1(A)$. Therefore, neither $L^1(A)$ nor $L^\infty(A)$ is reflexive.
In order to prepare the proof of Theorem 3.1.3, we first derive:
Theorem 3.1.4 (Clarkson). Let $A \subset \mathbb{R}^d$ be measurable, $2 \le q < \infty$. Then $L^q(A)$ is uniformly convex.
Remark 3.1.2. Clarkson's theorem holds more generally for $1 < q < \infty$. The proof for $1 < q < 2$ is a little more complicated than the one for $2 \le q < \infty$.
The proof of Theorem 3.1.4 is based on:
Lemma 3.1.5. Let $2 \le q < \infty$, $f, g \in L^q(A)$. Then
$$\|f + g\|_q^q + \|f - g\|_q^q \le 2^{q-1}\big(\|f\|_q^q + \|g\|_q^q\big). \tag{3.1.12}$$
Proof. For $x, y \ge 0$, we have
$$(x^q + y^q)^{\frac{1}{q}} \le (x^2 + y^2)^{\frac{1}{2}} \le 2^{\frac{q-2}{2q}} (x^q + y^q)^{\frac{1}{q}}. \tag{3.1.13}$$
(In order to verify the left inequality in (3.1.13), we may assume w.l.o.g. $x^2 + y^2 = 1$. Then $x^q \le x^2$, $y^q \le y^2$ since $q \ge 2$, and the desired inequality easily follows. The right inequality follows for example from Hölder's inequality (Lemma 3.1.1) applied to the following functions $f_1, f_2 : (-1, 1) \to \mathbb{R}$:
$$f_1 = 1, \qquad f_2(t) = \begin{cases} x^2 & \text{for } -1 < t < 0 \\ y^2 & \text{for } 0 \le t < 1. \end{cases})$$
The left-hand side of (3.1.13) implies
$$\big(|a + b|^q + |a - b|^q\big)^{\frac{1}{q}} \le \big(|a + b|^2 + |a - b|^2\big)^{\frac{1}{2}} \tag{3.1.14}$$
for $a, b \in \mathbb{R}$, and by the right-hand side of (3.1.13), we have
$$\sqrt{2}\,(a^2 + b^2)^{\frac{1}{2}} \le 2^{\frac{q-1}{q}} \big(|a|^q + |b|^q\big)^{\frac{1}{q}}. \tag{3.1.15}$$
Since $|a + b|^2 + |a - b|^2 = 2(a^2 + b^2)$, (3.1.14) and (3.1.15) imply
$$|f(x) + g(x)|^q + |f(x) - g(x)|^q \le 2^{q-1}\big(|f(x)|^q + |g(x)|^q\big), \tag{3.1.16}$$
and (3.1.12) follows by integrating (3.1.16). q.e.d.
Proof (Theorem 3.1.4). Let $f, g \in L^q(A)$ with $\|f\|_q = \|g\|_q = 1$. By (3.1.12),
$$\|f + g\|_q^q + \|f - g\|_q^q \le 2^q.$$
Therefore, for $\varepsilon > 0$, we may find $\delta > 0$ such that
$$\|f - g\|_q < \varepsilon$$
whenever $\big\|\tfrac{1}{2}(f + g)\big\|_q > 1 - \delta$. This shows uniform convexity. q.e.d.
Proof (Theorem 3.1.3). We consider the map $i : L^p(A) \to L^q(A)^*$ with
$$i(f)(g) := \int_A f(x) g(x)\,dx.$$
By Hölder's inequality (Lemma 3.1.1)
$$\|i(f)\| = \sup_{g \in L^q(A),\ \|g\|_q \le 1} |i(f)(g)| \le \|f\|_p. \tag{3.1.17}$$
Thus $i(f)$ is indeed an element of $L^q(A)^*$. We claim that we have equality in (3.1.17). This means that there exists some $g \in L^q$ with
$$\Big|\int_A f(x) g(x)\,dx\Big| = \|f\|_p \|g\|_q. \tag{3.1.18}$$
We put $g(x) := \operatorname{sign} f(x)\, |f(x)|^{p-1}$. Then $|g|^q = |f|^p$, hence $g \in L^q(A)$, and
$$\int_A |f(x) g(x)|\,dx = \int_A |f(x)|^p\,dx = \Big(\int_A |f(x)|^p\,dx\Big)^{\frac{1}{p}} \Big(\int_A |f(x)|^p\,dx\Big)^{\frac{1}{q}} = \|f\|_p \|g\|_q.$$
This verifies (3.1.18), hence equality in (3.1.17). Equality in (3.1.17) implies that $i$ is an isometry, in particular injective. In order to complete the proof we need to show that $i$ is surjective. Suppose on the contrary that
$$L^q(A)^* \setminus i(L^p(A)) \ne \emptyset.$$
Since $L^p(A)$ is complete and $i$ is continuous, $i(L^p(A))$ is complete, hence closed. By the Hahn-Banach theorem (Corollary 2.1.1), there then exists $v \in L^q(A)^{**}$, $v \ne 0$, with
$$v|_{i(L^p(A))} = 0.$$
We now suppose for a moment that $1 < p \le 2$. Then $2 \le q < \infty$, and $L^q(A)$ is reflexive by Theorems 3.1.4 and 2.2.3. We may therefore find a $g$ in $L^q(A)$ with
$$F(g) = v(F) \quad \text{for all } F \in L^q(A)^*.$$
We then have for any $\varphi \in L^p(A)$
$$0 = v(i(\varphi)) = i(\varphi)(g) = \int_A \varphi(x) g(x)\,dx.$$
3.2 Approximation of $L^p$ functions by smooth functions (mollification)
In this section, we shall smooth out $L^p$ functions by integrating them against smooth kernels. As these kernels approach the Dirac distribution, these regularizations will tend towards the original function. For that purpose, we need some $\varrho \in C_0^\infty(\mathbb{R}^d)$† with
$$\varrho(x) \ge 0 \quad \text{for all } x \in \mathbb{R}^d, \tag{3.2.1}$$
$$\varrho(x) = 0 \quad \text{for } |x| \ge 1, \tag{3.2.2}$$
$$\int_{\mathbb{R}^d} \varrho(x)\,dx \Big(= \int_{B(0,1)} \varrho(x)\,dx\Big) = 1. \tag{3.2.3}$$
Such a $\varrho$ is called a Friedrichs mollifier. In this §, $\Omega$ will always denote an open subset of $\mathbb{R}^d$. Let $f \in L^1(\Omega)$. We extend $f$ to all of $\mathbb{R}^d$ by putting $f(x) = 0$ for $x \in \mathbb{R}^d \setminus \Omega$. Let $h > 0$.
$$f_h(x) := \frac{1}{h^d} \int_{\mathbb{R}^d} \varrho\Big(\frac{x - y}{h}\Big) f(y)\,dy. \tag{3.2.4}$$
† For $\Omega \subset \mathbb{R}^d$ open, $C_0^\infty(\Omega)$ is the space of all $C^\infty$ functions with compact support in $\Omega$.
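$f_h$ is called the mollification of $f$ with parameter $h$. In order to appreciate this definition, we first observe
$$\operatorname{supp} \varrho\Big(\frac{x - y}{h}\Big) \subset B(y, h) := \{z \in \mathbb{R}^d \mid |z - y| \le h\}, \tag{3.2.5}$$
where $\varrho\big(\frac{x - y}{h}\big)$ is considered as a function of $x$, and
$$\frac{1}{h^d} \int_{\mathbb{R}^d} \varrho\Big(\frac{x - y}{h}\Big)\,dx = 1. \tag{3.2.6}$$
For these reasons, one expects that $f_h$ tends towards $f$ as $h$ tends to 0. It remains to clarify the type of convergence, however. The advantage of approximating $f$ by $f_h$ comes from:
Lemma 3.2.1. Let $\Omega' \subset\subset \Omega$‡, $h < \operatorname{dist}(\Omega', \partial\Omega)$. Then $f_h \in C^\infty(\Omega')$.
Proof. By Corollary 1.2.2, we may differentiate w.r.t. $x$ under the integral sign in (3.2.4), and since $\varrho \in C^\infty$ so then is $f_h$. q.e.d.
We now start investigating the convergence of $f_h$ towards $f$.
Lemma 3.2.2. If $f \in C^0(\Omega)$, then for each $\Omega' \subset\subset \Omega$, $f_h$ converges uniformly to $f$ on $\Omega'$ as $h \to 0$. In symbols: $f_h \rightrightarrows f$ on $\Omega'$ as $h \to 0$.
Proof. We have
$$f(x) = \int_{|w| \le 1} \varrho(w) f(x)\,dw \quad \text{by (3.2.3)} \tag{3.2.7}$$
and
$$f_h(x) = \int_{|w| \le 1} \varrho(w) f(x - hw)\,dw \tag{3.2.8}$$
by the substitution $w = \frac{x - y}{h}$ in (3.2.4). For $\Omega' \subset\subset \Omega$ and $h < \frac{1}{2} \operatorname{dist}(\Omega', \partial\Omega)$, we then have
$$\sup_{x \in \Omega'} |f(x) - f_h(x)| \le \sup_{x \in \Omega'} \int_{|w| \le 1} \varrho(w)\, |f(x) - f(x - hw)|\,dw \le \sup_{x \in \Omega'} \sup_{|w| \le 1} |f(x) - f(x - hw)|,$$
using (3.2.3) once more. Since $\Omega'$ is bounded, $\{x \in \Omega \mid \operatorname{dist}(x, \Omega') \le h\}$ is compact (recall the choice of $h$). Therefore, $f$ is uniformly continuous on that set, and we conclude that
$$\sup_{x \in \Omega'} |f(x) - f_h(x)| \to 0 \quad \text{as } h \to 0,$$
‡ $\Omega' \subset\subset \Omega$ means that the closure of $\Omega'$ is compact and contained in $\Omega$. We say that $\Omega'$ is relatively compact in $\Omega$.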
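i.e. uniform convergence. q.e.d.
Theorem 3.2.1. Let $f \in L^p(\Omega)$, $1 \le p < \infty$. Then $f_h$ converges to $f$ in $L^p(\Omega)$ as $h \to 0$.
Proof. We have for $g \in L^p(\Omega)$
$$\int_\Omega |g_h(x)|^p\,dx = \int_\Omega \Big| \int_{|w| \le 1} \varrho(w) g(x - hw)\,dw \Big|^p dx$$
$$\le \int_\Omega \Big( \int_{|w| \le 1} \varrho(w)\,dw \Big)^{\frac{p}{q}} \Big( \int_{|w| \le 1} \varrho(w)\, |g(x - hw)|^p\,dw \Big) dx$$
by Hölder's inequality (with $\frac{1}{p} + \frac{1}{q} = 1$)
$$= \int_{|w| \le 1} \varrho(w) \int_\Omega |g(x - hw)|^p\,dx\,dw,$$
using (3.2.3) and Fubini's theorem,
$$\le \int_{|w| \le 1} \varrho(w) \int_{\mathbb{R}^d} |g(y)|^p\,dy\,dw = \int_\Omega |g(y)|^p\,dy,$$
using (3.2.3) again and the fact that $g$ has been extended by 0 outside $\Omega$. Thus
$$\|g_h\|_{L^p(\Omega)} \le \|g\|_{L^p(\Omega)}. \tag{3.2.9}$$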
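Let $\varepsilon > 0$. By Theorem 1.1.4, (6), we may find $\varphi \in C_0^0(\mathbb{R}^d)$ with
$$\|f - \varphi\|_{L^p(\mathbb{R}^d)} < \frac{\varepsilon}{3}. \tag{3.2.10}$$
Since $\varphi$ has compact support, we may apply Lemma 3.2.2 to conclude that for sufficiently small $h > 0$,
$$\|\varphi - \varphi_h\|_{L^p(\mathbb{R}^d)} < \frac{\varepsilon}{3}. \tag{3.2.11}$$
Applying (3.2.9) to $f - \varphi$, we obtain
$$\|f_h - \varphi_h\|_{L^p(\mathbb{R}^d)} \le \|f - \varphi\|_{L^p(\mathbb{R}^d)} < \frac{\varepsilon}{3}. \tag{3.2.12}$$
(3.2.10)-(3.2.12) yield
$$\|f - f_h\|_{L^p(\Omega)} \le \|f - f_h\|_{L^p(\mathbb{R}^d)} < \varepsilon. \tag{3.2.13}$$
q.e.d.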
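The convergence asserted in Theorem 3.2.1 can be observed in one dimension. The following is only a sketch under assumptions: the kernel $\exp(-1/(1 - w^2))$ (normalized numerically so that the discrete version of (3.2.3) holds), the discontinuous test function (the indicator of $(0.25, 0.75)$), the grids and the exponent $p = 2$ are illustrative choices. The mollification is computed from the representation (3.2.8).

```python
import numpy as np

def rho(w):
    """Unnormalized bump: exp(-1/(1 - w^2)) for |w| < 1, zero otherwise."""
    out = np.zeros_like(w, dtype=float)
    inside = np.abs(w) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - w[inside] ** 2))
    return out

w = np.linspace(-1.0, 1.0, 801)
dw = w[1] - w[0]
weights = rho(w)
weights /= np.sum(weights) * dw          # enforce (3.2.3) for the discrete kernel

def f(x):
    """Illustrative discontinuous test function, already extended by zero."""
    return ((x > 0.25) & (x < 0.75)).astype(float)

def mollify(func, x, h):
    """f_h(x) = int_{|w| <= 1} rho(w) f(x - h w) dw, cf. (3.2.8), by a Riemann sum."""
    shifted = func(x[:, None] - h * w[None, :])
    return shifted @ weights * dw

x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]
p = 2.0
errors = []                              # approximate L^p(0, 1) distance between f_h and f
for h in (0.2, 0.1, 0.05, 0.025):
    diff = mollify(f, x, h) - f(x)
    errors.append((np.sum(np.abs(diff) ** p) * dx) ** (1.0 / p))
```

In this example the $L^2$ error decreases as $h$ is halved, although $f_h$ cannot converge uniformly to the discontinuous $f$.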
Corollary 3.2.1. For $1 \le p < \infty$, $C_0^\infty(\Omega)$ is dense in $L^p(\Omega)$.
Proof. Let $f \in L^p(\Omega)$, $\varepsilon > 0$. We may then find $\Omega' \subset\subset \Omega$ with
$$\|f\|_{L^p(\Omega \setminus \Omega')} < \frac{\varepsilon}{2}.$$
We put $f' := f \chi_{\Omega'}$. Then
$$\|f - f'\|_{L^p(\Omega)} < \frac{\varepsilon}{2}. \tag{3.2.14}$$
By Theorem 3.2.1, for sufficiently small $h$,
$$\|f' - f'_h\|_{L^p(\Omega)} < \frac{\varepsilon}{2}. \tag{3.2.15}$$
By (3.2.14), (3.2.15)
$$\|f - f'_h\|_{L^p(\Omega)} < \varepsilon.$$
Since $f'_h \in C_0^\infty(\Omega)$ for $h < \operatorname{dist}(\Omega', \partial\Omega)$, the claim follows. q.e.d.
Corollary 3.2.2. $L^p(\Omega)$ is separable for $1 \le p < \infty$. Every $f \in L^p(\Omega)$ can be approximated by piecewise constant functions.
Proof. By Corollary 3.2.1, it suffices to find a countable subset $B_0$ of $L^p(\Omega)$ with the property that for every $\varphi \in C_0^\infty(\Omega)$ and every $\varepsilon > 0$, there exists some $a \in B_0$ with
$$\|\varphi - a\|_{L^p(\Omega)} < \varepsilon. \tag{3.2.16}$$
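Let $B$ be the set of all functions $a$ on $\mathbb{R}^d$ of the following form: There exist some $k, N \in \mathbb{N}$ and rational numbers $\alpha_1, \ldots, \alpha_k$ and cubes $Q_1, \ldots, Q_k \subset \mathbb{R}^d$ with corners having all their coordinates in $\frac{1}{N}\mathbb{Z}$ and of edge length $\frac{1}{N}$ such that
$$a(x) = \begin{cases} \alpha_i & \text{for } x \in Q_i \\ 0 & \text{otherwise.} \end{cases}$$
Clearly, $B$ is countable. Since a continuous function $\varphi$ with compact support is uniformly continuous, we may easily find some $a \in B$ with
$$\|a - \varphi\|_{L^p(\Omega)} \le \|a - \varphi\|_{L^p(\mathbb{R}^d)} < \varepsilon. \tag{3.2.17}$$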
We put $B_0 := \{a \chi_\Omega \mid a \in B\}$. $B_0$ is likewise countable, and from (3.2.16), (3.2.17), we conclude that $B_0$ is dense in $L^p(\Omega)$. q.e.d.
Remark 3.2.1. The separability of $L^p(\Omega)$ can also be seen by using Corollary 3.2.1 and the Weierstrass approximation theorem that allows the approximation of continuous functions with compact support by polynomials with rational coefficients.
The preceding results do not hold for $L^\infty(\Omega)$. Namely, if a sequence of continuous functions converges w.r.t. $\|\cdot\|_{L^\infty(\Omega)}$, then it converges uniformly, and therefore, the limit is again continuous. Therefore, noncontinuous elements of $L^\infty(\Omega)$ cannot be approximated by continuous functions in the $L^\infty$-norm. Also, $L^\infty(\Omega)$ is not separable. To see this, let $(a_n)_{n \in \mathbb{N}}$ be any sequence with values in $\{0, 1\}$, i.e. $a_n \in \{0, 1\}$ for all $n$. To $(a_n)$, we associate the function $f_{(a_n)}$ on $(0, 1)$ defined by
$$f_{(a_n)}(x) := \begin{cases} 1 & \text{for } \frac{1}{k+1} \le x < \frac{1}{k} \text{ if } a_k = 1 \\ 0 & \text{for } \frac{1}{k+1} \le x < \frac{1}{k} \text{ if } a_k = 0 \end{cases} \quad \text{for } k \in \mathbb{N}.$$
Then for any two different sequences $(a_n)$, $(b_n)$,
$$\|f_{(a_n)} - f_{(b_n)}\|_{L^\infty((0,1))} = 1.$$
Since the set of such sequences is uncountable, this implies that $L^\infty((0, 1))$ is not separable. Of course, a similar construction is possible for $L^\infty(\Omega)$, $\Omega$ any open subset of $\mathbb{R}^d$.
We finally note:
Lemma 3.2.3. Let $f \in L^2(\Omega)$, and suppose that for all $\varphi \in C_0^\infty(\Omega)$
$$\int_\Omega f(x) \varphi(x)\,dx = 0.$$
Then $f = 0$.
Proof. Since $C_0^\infty(\Omega)$ is dense in $L^2(\Omega)$, and since
$$g \mapsto \int_\Omega f(x) g(x)\,dx$$
is a continuous linear functional on $L^2(\Omega)$, we obtain that
$$\int_\Omega f(x) g(x)\,dx = 0 \quad \text{for all } g \in L^2(\Omega).$$
Putting $g = f$ yields the result. q.e.d.
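3.3 Sobolev spaces
In this section, we wish to introduce certain extensions of the $L^p$ spaces, the so-called Sobolev spaces. They will play a fundamental role in subsequent chapters because they constitute function spaces that are complete w.r.t. norms naturally occurring in variational problems. In this section, $\Omega$ will always denote an open subset of $\mathbb{R}^d$. We shall use the following notation: For a $d$-tuple $\alpha := (\alpha_1, \ldots, \alpha_d)$ of nonnegative integers,
$$|\alpha| := \sum_{i=1}^d \alpha_i, \qquad D_\alpha \varphi := \Big(\frac{\partial}{\partial x^1}\Big)^{\alpha_1} \cdots \Big(\frac{\partial}{\partial x^d}\Big)^{\alpha_d} \varphi.$$
Definition 3.3.1. Let $u, v \in L^1(\Omega)$. Then $v$ is said to be the $\alpha$-th weak derivative of $u$, $v := D_\alpha u$, if
$$\int_\Omega \varphi v\,dx = (-1)^{|\alpha|} \int_\Omega u D_\alpha \varphi\,dx \tag{3.3.1}$$
for every $\varphi \in C_0^{|\alpha|}(\Omega)$. We can now define, for $k \in \mathbb{N}$ and $1 \le p \le \infty$, the Sobolev space
$$W^{k,p}(\Omega) := \{u \in L^p(\Omega) \mid D_\alpha u \text{ exists and lies in } L^p(\Omega) \text{ for all } |\alpha| \le k\}.$$
For $1 \le p < \infty$, we put
$$\|u\|_{W^{k,p}(\Omega)} := \Big( \sum_{|\alpha| \le k} \int_\Omega |D_\alpha u|^p \Big)^{\frac{1}{p}}.$$
Finally, let $H^{k,p}(\Omega)$ and $H_0^{k,p}(\Omega)$ be the closures of $C^\infty(\Omega) \cap W^{k,p}(\Omega)$ and $C_0^\infty(\Omega) \cap W^{k,p}(\Omega)$, respectively, in $W^{k,p}(\Omega)$.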
We shall use the following abbreviations for $u \in W^{1,1}(\Omega)$, $1 \le i \le d$. $D_i u$ is the weak derivative for the multiindex $(0, \ldots, 0, 1, 0, \ldots, 0)$, 1 at the $i$-th position, and $Du$ is the vector $(D_1 u, \ldots, D_d u)$ of all first weak derivatives. The following result is obvious.
Lemma 3.3.1. Let $u \in C^k(\Omega)$, and suppose all derivatives of $u$ of order $\le k$ are in $L^p(\Omega)$. Then $u \in W^{k,p}(\Omega)$, and the weak derivatives are given by the ordinary derivatives. q.e.d.
Thus, the $W^{k,p}$ spaces constitute a generalization of the spaces of $k$ times differentiable functions. The $W^{k,p}$ norm is considerably weaker than the $C^k$-norm, and so the $W^{k,p}$ spaces are larger than the $C^k$ spaces. Before investigating the properties of these spaces, it should be useful to consider an example:
Let $\Omega = (-1, 1) \subset \mathbb{R}$, $u(x) := |x|$. We claim that $u \in W^{1,p}(\Omega)$ for $1 \le p \le \infty$. In order to see this it suffices to show that the first weak derivative of $u$ is given by
$$v(x) = \begin{cases} 1 & \text{for } 0 \le x < 1 \\ -1 & \text{for } -1 < x < 0. \end{cases}$$
Indeed, we have for $\varphi \in C_0^1((-1, 1))$
$$\int_{-1}^1 \varphi(x) v(x)\,dx = -\int_{-1}^0 \varphi(x)\,dx + \int_0^1 \varphi(x)\,dx = -\int_{-1}^1 \varphi'(x)\,|x|\,dx.$$
We claim, however, that $u$ is not contained in $W^{2,p}(\Omega)$. Namely, if $w(x)$ were the second weak derivative of $u$, it would have to be the first weak derivative of $v$, and consequently, we would have $w(x) = 0$ for $x \ne 0$. The rule for integration by parts (3.3.1) would then require that for all $\varphi \in C_0^1((-1, 1))$
$$0 = \int_{-1}^1 \varphi(x) w(x)\,dx = -\int_{-1}^1 \varphi'(x) v(x)\,dx = \int_{-1}^0 \varphi'(x)\,dx - \int_0^1 \varphi'(x)\,dx = 2\varphi(0),$$
which is not the case. Thus, $v$ does not have a first weak derivative.
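The two computations in this example can be reproduced numerically. The sketch below is only an illustration: the test function (a smooth bump supported in $(-0.5, 0.9)$, deliberately not even about 0), the grid and the helper names are assumptions, and the integrals are plain Riemann sums.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]

u = np.abs(x)
v = np.where(x >= 0.0, 1.0, -1.0)          # candidate first weak derivative of u

def bump(x, c=0.2, r=0.7):
    """Illustrative test function: smooth bump with support [c - r, c + r] inside (-1, 1)."""
    out = np.zeros_like(x)
    s = (x - c) / r
    inside = np.abs(s) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - s[inside] ** 2))
    return out

phi = bump(x)
dphi = np.gradient(phi, dx)                # numerical derivative of phi

def integral(values):
    return np.sum(values) * dx             # Riemann sum on the uniform grid

lhs = integral(phi * v)                    # int phi v dx
rhs = -integral(u * dphi)                  # -int u phi' dx, cf. (3.3.1) with |alpha| = 1
second = -integral(dphi * v)               # would have to be 0 if v had a weak derivative w = 0 a.e.
phi_at_zero = bump(np.array([0.0]))[0]
```

Here `lhs` and `rhs` agree up to discretization error, while `second` is close to $2\varphi(0) \ne 0$, in line with the argument that $v$ has no weak derivative.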
Remark 3.3.1. Some readers may have encountered the notion of a distributional derivative. It is important to distinguish between weak and distributional derivatives. Any $L^1(\Omega)$ function possesses distributional derivatives of any order, but as the preceding example shows, not necessarily weak derivatives. In the example, of course, the second distributional derivative of $u$ is $2\delta_0$, where $\delta_0$ is the Dirac delta distribution at 0. $u$ does not possess a second weak derivative because the delta distribution cannot be represented by an $L^1$ function.
Theorem 3.3.1. The Sobolev spaces $W^{k,p}(\Omega)$ are separable Banach spaces w.r.t. $\|\cdot\|_{W^{k,p}(\Omega)}$.
Proof. That $\|\cdot\|_{W^{k,p}(\Omega)}$ is a norm follows from the fact that $\|\cdot\|_{L^p(\Omega)}$ is a norm (see Section 3.1). Similarly, we shall now derive completeness of $W^{k,p}(\Omega)$ from the completeness of the $L^p(\Omega)$ spaces (Theorem 3.1.1). Thus, let $(u_n)_{n \in \mathbb{N}} \subset W^{k,p}(\Omega)$ be a Cauchy sequence w.r.t. $\|\cdot\|_{W^{k,p}(\Omega)}$. This implies that $(D_\alpha u_n)_{n \in \mathbb{N}}$ is a Cauchy sequence w.r.t. $\|\cdot\|_{L^p(\Omega)}$ for all $|\alpha| \le k$. By Theorem 3.1.1, $(D_\alpha u_n)$ therefore converges in $L^p(\Omega)$ towards some $v_\alpha$. For every $\varphi \in C_0^{|\alpha|}(\Omega)$, we have
$$\int_\Omega D_\alpha u_n \cdot \varphi = (-1)^{|\alpha|} \int_\Omega u_n D_\alpha \varphi, \tag{3.3.2}$$
and both sides converge as $n \to \infty$. Therefore, $v_\alpha$ is the $\alpha$-th weak derivative of $v_0$, the $L^p$-limit of $(u_n)_{n \in \mathbb{N}}$, and consequently $v_0 \in W^{k,p}(\Omega)$. The separability again follows from the corresponding property for $L^p(\Omega)$ (Corollary 3.2.2). q.e.d.
Theorem 3.3.2. $W^{k,p}(\Omega) = H^{k,p}(\Omega)$.
This result says that elements of $W^{k,p}(\Omega)$ can be approximated by $C^\infty(\Omega)$ functions w.r.t. $\|\cdot\|_{W^{k,p}(\Omega)}$. In general, however, for $k \ge 1$ one has $H_0^{k,p}(\Omega) \ne W^{k,p}(\Omega)$ so that $W^{k,p}(\Omega)$ functions cannot be approximated by $C_0^\infty(\Omega)$ functions, in contrast to $L^p(\Omega)$ functions where this is possible (Corollary 3.2.1). This is seen from the following simple example: $\Omega = (-1, 1) \subset \mathbb{R}$, $u(x) = 1$. If $(\varphi_n)_{n \in \mathbb{N}} \subset C_0^\infty(\Omega)$ converges to $u$ in $L^1(\Omega)$, then after selection of a subsequence, it converges pointwise almost everywhere (Lemma 3.1.3), and therefore, for sufficiently large $n$, there exists $x_n \in (-1, 1)$ with $\varphi_n(x_n) \ge \frac{1}{2}$. Since
Therefore,
$$D_\alpha(u_h)(x) = \frac{1}{h^d} \int D_{\alpha,x}\, \varrho\Big(\frac{x - y}{h}\Big) u(y)\,dy \quad \text{(using Corollary 1.2.2)},$$
where $D_{\alpha,x}$ is the derivative w.r.t. $x$,
$$= (-1)^{|\alpha|} \frac{1}{h^d} \int D_{\alpha,y}\, \varrho\Big(\frac{x - y}{h}\Big) u(y)\,dy = \frac{1}{h^d} \int \varrho\Big(\frac{x - y}{h}\Big) D_\alpha u(y)\,dy$$
by definition of $D_\alpha u$
$$= (D_\alpha u)_h(x). \tag{3.3.3}$$
Thus, the derivative of the mollification is the mollification of the derivative. Since $D_\alpha u \in L^p(\Omega)$, by Theorem 3.2.1, $(D_\alpha u)_h$ converges to $D_\alpha u$ in $L^p(\Omega)$ for $h \to 0$. By (3.3.3), we conclude that $D_\alpha(u_h)$ converges to $D_\alpha u$ in $L^p(\Omega)$, for all $|\alpha| \le k$, and this means that $u_h$ converges to $u$ in $W^{k,p}(\Omega)$. q.e.d.
Theorem 3.3.3. $W^{k,p}(\Omega)$ is reflexive for $k \in \mathbb{N}$, $1 < p < \infty$.
Proof. It follows from Theorem 3.1.3 that the dual space of $W^{k,p}(\Omega)$ is given by $W^{k,q}(\Omega)$ with $\frac{1}{p} + \frac{1}{q} = 1$. This implies reflexivity. q.e.d.
Theorem 3.3.4. $H_0^{k,p}(\Omega)$ is closed under weak convergence in $W^{k,p}(\Omega)$.
Proof. This follows from Lemma 2.2.5, since $H_0^{k,p}(\Omega)$ by its definition is a closed subspace (w.r.t. strong convergence) of $W^{k,p}(\Omega)$. q.e.d.
Theorem 3.3.5. For $1 < p < \infty$, $k \in \mathbb{N}$, any sequence in $W^{k,p}(\Omega)$ that is bounded w.r.t. $\|\cdot\|_{W^{k,p}(\Omega)}$ contains a weakly convergent subsequence.
Proof. By Theorems 3.3.1 and 3.3.3, $W^{k,p}(\Omega)$ is separable and reflexive. Therefore, the result follows from Corollary 2.2.1. q.e.d.
3.4 Rellich's theorem and the Poincaré and Sobolev inequalities
The compactness theorem of Rellich is:
Theorem 3.4.1. Let $\Omega \subset \mathbb{R}^d$ be open and bounded. Let $(u_n)_{n \in \mathbb{N}} \subset H_0^{1,p}(\Omega)$ be bounded, i.e. $\|u_n\|_{W^{1,p}(\Omega)} \le c$ (independent of $n$). Then a subsequence of $(u_n)_{n \in \mathbb{N}}$ converges in $L^p(\Omega)$.
Remark 3.4.1. Rellich originally proved the theorem for $p = 2$. Kondrachev proved the stronger result that some subsequence converges in $L^q(\Omega)$ for $1 \le q < \frac{dp}{d-p}$ if $p < d$ and for $1 \le q < \infty$ if $p \ge d$. Of course, these exponents come from the Sobolev Embedding Theorem (see (3.4.12)). See Corollary 3.4.1 below.
Proof. Since $u_n \in H_0^{1,p}(\Omega)$, for every $n \in \mathbb{N}$ and $\varepsilon > 0$, there exists some $v_n \in C_0^\infty(\Omega)$ with
$$\|u_n - v_n\|_{W^{1,p}(\Omega)} < \frac{\varepsilon}{3}. \tag{3.4.1}$$
Therefore
$$\|v_n\|_{W^{1,p}(\Omega)} \le c' \quad \Big(= c + \frac{\varepsilon}{3}\Big). \tag{3.4.2}$$
We consider the mollification
$$v_{n,h}(x) = \frac{1}{h^d} \int_\Omega \varrho\Big(\frac{x - y}{h}\Big) v_n(y)\,dy$$
of $v_n$ and estimate
$$|v_n(x) - v_{n,h}(x)| = \Big| \int_{|w| \le 1} \varrho(w)\big(v_n(x) - v_n(x - hw)\big)\,dw \Big| \quad \text{by (3.2.7), (3.2.8)}$$
$$\le \int_{|w| \le 1} \varrho(w) \int_0^{h|w|} \Big| \frac{\partial}{\partial r} v_n(x - r\vartheta) \Big|\,dr\,dw \quad \text{with } \vartheta = \frac{w}{|w|}. \tag{3.4.3}$$
This implies
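$$\int_\Omega |v_n(x) - v_{n,h}(x)|^p\,dx \le \int_\Omega \Big( \int_{|w| \le 1} \varrho(w) \int_0^{h|w|} \Big| \frac{\partial}{\partial r} v_n(x - r\vartheta) \Big|\,dr\,dw \Big)^p dx$$
$$= \int_\Omega \Big( \int_{|w| \le 1} \varrho(w)^{1 - \frac{1}{p}}\, \varrho(w)^{\frac{1}{p}} \int_0^{h|w|} \Big| \frac{\partial}{\partial r} v_n(x - r\vartheta) \Big|\,dr\,dw \Big)^p dx$$
$$\le \Big( \int_{|w| \le 1} \varrho(w)\,dw \Big)^{p-1} \Big( \int_{|w| \le 1} \varrho(w)\, h^p |w|^p \int_\Omega |Dv_n(x)|^p\,dx\,dw \Big),$$
using Hölder's inequality, Fubini's theorem and the notation
$$Dv_n = \Big( \frac{\partial}{\partial x^1} v_n, \ldots, \frac{\partial}{\partial x^d} v_n \Big).$$
Since $\int_{|w| \le 1} \varrho(w)\,dw = 1$ (by (3.2.3)), we obtain
$$\|v_n - v_{n,h}\|_{L^p(\Omega)} \le h\, \|Dv_n\|_{L^p(\Omega)} \le h c' \quad \text{by (3.4.2)}$$
$$< \frac{\varepsilon}{3} \quad \text{if } h \text{ is sufficiently small.} \tag{3.4.4}$$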
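Next,
$$|v_{n,h}(x)| \le \frac{1}{h^d}\, c_0\, \|v_n\|_{L^1(\Omega)} \quad \text{with } c_0 := \sup_z \varrho(z), \text{ by definition of } v_{n,h},$$
$$\le \frac{1}{h^d}\, c_0\, (\operatorname{meas} \Omega)^{1 - \frac{1}{p}}\, \|v_n\|_{L^p(\Omega)} \quad \text{by Hölder's inequality,} \tag{3.4.5}$$
and similarly
$$|Dv_{n,h}(x)| \le \frac{1}{h^{d+1}}\, c_1\, (\operatorname{meas} \Omega)^{1 - \frac{1}{p}}\, \|v_n\|_{L^p(\Omega)} \tag{3.4.6}$$
with $c_1 := \sup_z \big| \frac{\partial}{\partial z} \varrho(z) \big|$. From (3.4.2), (3.4.5), (3.4.6), we see that for fixed $h > 0$,
$$\|v_{n,h}\|_{C^1(\Omega)} \le \text{constant} \tag{3.4.7}$$
(where the constant depends on $h$). Therefore, $(v_{n,h})_{n \in \mathbb{N}}$ contains a uniformly convergent subsequence by the Arzelà-Ascoli theorem. Since uniform convergence implies $L^p$-convergence (e.g. by Theorem 1.2.3), the closure of $(v_{n,h})_{n \in \mathbb{N}}$ is compact in $L^p(\Omega)$. Since a compact subset of a metric space (e.g. a Banach space) is totally bounded, there exist finitely many $w_1, \ldots, w_N \in L^p(\Omega)$ such that for every $n \in \mathbb{N}$ there exists $1 \le j \le N$ with
$$\|v_{n,h} - w_j\|_{L^p(\Omega)} < \frac{\varepsilon}{3}. \tag{3.4.8}$$
By (3.4.1), (3.4.4), (3.4.8), for every $n \in \mathbb{N}$ we find $1 \le j \le N$ with
$$\|u_n - w_j\|_{L^p(\Omega)} < \varepsilon.$$
Thus, $(u_n)_{n \in \mathbb{N}}$ is totally bounded in $L^p(\Omega)$. Therefore, the closure of $(u_n)_{n \in \mathbb{N}}$ in $L^p(\Omega)$ is compact (again, a general result for metric spaces), and it thus contains a convergent subsequence in $L^p(\Omega)$. q.e.d.
We now come to the Poincaré inequality:
Theorem 3.4.2. Let $\Omega \subset \mathbb{R}^d$ be open and bounded. For any $u \in H_0^{1,p}(\Omega)$
$$\|u\|_{L^p(\Omega)} \le \Big( \frac{\operatorname{meas} \Omega}{\omega_d} \Big)^{\frac{1}{d}} \|Du\|_{L^p(\Omega)}, \tag{3.4.9}$$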
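where $\omega_d$ is the Lebesgue measure of the unit ball in $\mathbb{R}^d$.
Proof. Since $C_0^1(\Omega)$ is dense in $H_0^{1,p}(\Omega)$, we may assume $u \in C_0^1(\Omega)$. We put $u(x) = 0$ for all $x \in \mathbb{R}^d \setminus \Omega$. For $\vartheta \in \mathbb{R}^d$ with $|\vartheta| = 1$, we have
$$u(x) = -\int_0^\infty \frac{\partial}{\partial r} u(x + r\vartheta)\,dr.$$
Integration w.r.t. $\vartheta$ yields
$$|u(x)| = \Big| -\frac{1}{d\omega_d} \int_0^\infty \int_{|\vartheta| = 1} \frac{\partial}{\partial r} u(x + r\vartheta)\,d\vartheta\,dr \Big| \le \frac{1}{d\omega_d} \int_\Omega \frac{1}{|x - y|^{d-1}}\, |Du(y)|\,dy.$$
Therefore
$$\Big( \int_\Omega |u(x)|^p\,dx \Big)^{\frac{1}{p}} \le \frac{1}{d\omega_d} \Big( \int_\Omega \Big( \int_\Omega \frac{|Du(y)|}{|x - y|^{d-1}}\,dy \Big)^p dx \Big)^{\frac{1}{p}}$$
$$\le \frac{1}{d\omega_d} \Big( \int_\Omega \Big( \int_\Omega \frac{dy}{|x - y|^{d-1}} \Big)^{p-1} \int_\Omega \frac{|Du(y)|^p}{|x - y|^{d-1}}\,dy\,dx \Big)^{\frac{1}{p}} \quad \text{by Hölder's inequality}$$
$$\le \frac{1}{d\omega_d} \Big( \int_\Omega |Du(y)|^p \int_\Omega \frac{dx}{|x - y|^{d-1}}\,dy \Big)^{\frac{1}{p}} \Big( \sup_{x \in \Omega} \int_\Omega \frac{dy}{|x - y|^{d-1}} \Big)^{\frac{p-1}{p}}, \tag{3.4.10}$$
using Fubini's theorem to exchange the order of integration in the first factor. In order to control
$$\int_\Omega \frac{dx}{|x - y|^{d-1}},$$
we choose $R$ with $\operatorname{meas} \Omega = \operatorname{meas} B(y, R) = \omega_d R^d$ ($B(y, R) := \{z \in \mathbb{R}^d \mid |z - y| \le R\}$). Since
$$\frac{1}{|x - y|^{d-1}} \le \frac{1}{R^{d-1}} \quad \text{for } |x - y| \ge R, \qquad \frac{1}{|x - y|^{d-1}} \ge \frac{1}{R^{d-1}} \quad \text{for } |x - y| \le R,$$
we have
$$\int_\Omega \frac{dx}{|x - y|^{d-1}} \le \int_{B(y,R)} \frac{dx}{|x - y|^{d-1}} = d\omega_d R = d\, \omega_d^{1 - \frac{1}{d}} (\operatorname{meas} \Omega)^{\frac{1}{d}}. \tag{3.4.11}$$
Equations (3.4.10) and (3.4.11) yield (3.4.9). q.e.d.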
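For $d = 1$ and $\Omega = (0, 1)$, the constant in (3.4.9) is $\big(\frac{\operatorname{meas} \Omega}{\omega_1}\big)^{1} = \frac{1}{2}$, since the unit ball $(-1, 1)$ has length $\omega_1 = 2$. The sketch below checks the inequality for a few functions vanishing at the boundary. It is only an illustration: the sample functions, the exponents and the Riemann-sum helper `lp_norm` are assumptions.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100001)
dx = x[1] - x[0]

def lp_norm(values, p):
    """Riemann-sum approximation of the L^p((0, 1)) norm."""
    return (np.sum(np.abs(values) ** p) * dx) ** (1.0 / p)

# d = 1, Omega = (0, 1): omega_1 = 2 (length of the unit ball (-1, 1)), meas Omega = 1
poincare_constant = (1.0 / 2.0) ** (1.0 / 1.0)

# Illustrative functions vanishing at x = 0 and x = 1, paired with their exact derivatives
samples = {
    "sin(pi x)": (np.sin(np.pi * x), np.pi * np.cos(np.pi * x)),
    "x(1-x)": (x * (1.0 - x), 1.0 - 2.0 * x),
    "x sin(pi x)^2": (np.sin(np.pi * x) ** 2 * x,
                      np.pi * np.sin(2.0 * np.pi * x) * x + np.sin(np.pi * x) ** 2),
}

ratios = {}                                 # ||u||_p / ||u'||_p for each sample and exponent
for name, (u, du) in samples.items():
    for p in (1.0, 2.0, 4.0):
        ratios[(name, p)] = lp_norm(u, p) / lp_norm(du, p)
```

All computed ratios stay below $\frac{1}{2}$; for $\sin(\pi x)$ the ratio equals $\frac{1}{\pi}$ for every $p$, so the constant in (3.4.9) is not claimed to be optimal.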
We now come to somewhat stronger results that will however only be needed in Chapter 9. Namely, we have the Sobolev inequalities.
Theorem 3.4.3. Let $u \in H_0^{1,p}(\Omega)$.
(i) If $p < d$, then $u \in L^{\frac{dp}{d-p}}(\Omega)$, and
$$\|u\|_{L^{\frac{dp}{d-p}}(\Omega)} \le c\, \|Du\|_{L^p(\Omega)}. \tag{3.4.12}$$
(ii) If $p > d$, then $u \in C^0(\bar\Omega)$, and
$$\sup_\Omega |u| \le c\, (\operatorname{meas} \Omega)^{\frac{1}{d} - \frac{1}{p}}\, \|Du\|_{L^p(\Omega)}, \tag{3.4.13}$$
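with constants $c$ depending only on $p$ and $d$. (Actually, by a Theorem of Morrey, for $p > d$, $u \in H_0^{1,p}(\Omega)$ is even Hölder continuous with exponent $1 - \frac{d}{p}$.) We only prove (i) as (ii) will not be used in the present book:
Proof. We first assume $u \in C_0^1(\Omega)$. Since $u$ has compact support, we have
$$|u(x)| \le \int_{-\infty}^{\infty} |D_i u(y)|\,dy^i \quad \text{for } i = 1, 2, \ldots, d,$$
where the integration is along the line through $x$ parallel to the $i$-th coordinate axis. Multiplying these inequalities for $i = 1, \ldots, d$ yields
$$|u(x)|^{\frac{d}{d-1}} \le \Big( \prod_{i=1}^d \int_{-\infty}^{\infty} |D_i u(y)|\,dy^i \Big)^{\frac{1}{d-1}}.$$
Using Hölder's inequality†, we compute
$$\int_{-\infty}^{\infty} |u(x)|^{\frac{d}{d-1}}\,dx^1 \le \Big( \int_{-\infty}^{\infty} |D_1 u(y)|\,dy^1 \Big)^{\frac{1}{d-1}} \int_{-\infty}^{\infty} \prod_{i=2}^d \Big( \int_{-\infty}^{\infty} |D_i u(y)|\,dy^i \Big)^{\frac{1}{d-1}} dx^1$$
$$\le \Big( \int_{-\infty}^{\infty} |D_1 u(y)|\,dy^1 \Big)^{\frac{1}{d-1}} \Big( \prod_{i=2}^d \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |D_i u(y)|\,dy^i\,dx^1 \Big)^{\frac{1}{d-1}}.$$
† More precisely, one uses Exercise 3.2 below with $p_1 = \cdots = p_{d-1} = d - 1$.
Iteratively also integrating w.r.t. $dx^2, \ldots, dx^d$ finally yields
$$\|u\|_{L^{\frac{d}{d-1}}(\Omega)} \le \Big( \prod_{i=1}^d \int_\Omega |D_i u(x)|\,dx \Big)^{\frac{1}{d}} \le \frac{1}{d} \int_\Omega \sum_{i=1}^d |D_i u(x)|\,dx. \tag{3.4.14}$$
This is (3.4.12) for $p = 1$. The case of general $p$ may now be obtained by applying (3.4.14) to $|u|^\mu$ for suitable $\mu > 1$ and using Hölder's inequality. Namely, from (3.4.14) for $|u|^\mu$ in place of $u$
$$\big\| |u|^\mu \big\|_{L^{\frac{d}{d-1}}(\Omega)} \le \mu \int_\Omega |u(x)|^{\mu - 1}\, |Du(x)|\,dx \le \mu\, \big\| |u|^{\mu - 1} \big\|_{L^q(\Omega)}\, \|Du\|_{L^p(\Omega)}$$
by Hölder's inequality, for $\frac{1}{p} + \frac{1}{q} = 1$. For $p < d$, we may take $\mu = \frac{(d-1)p}{d-p}$ and obtain
$$\|u\|^{\mu}_{L^{\frac{\mu d}{d-1}}(\Omega)} \le \mu\, \|u\|^{\mu - 1}_{L^{(\mu - 1)q}(\Omega)}\, \|Du\|_{L^p(\Omega)},$$
which yields (3.4.12), since $\frac{\mu d}{d-1} = (\mu - 1) q = \frac{dp}{d-p}$.
q.e.d.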
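As a consequence, we obtain the theorem of Kondrachev:
Corollary 3.4.1. Let $\Omega \subset \mathbb{R}^d$ be open and bounded. Let $(u_n)_{n \in \mathbb{N}} \subset H_0^{1,p}(\Omega)$ be bounded for some $1 \le p < d$. Then a subsequence converges in $L^q(\Omega)$ for any $1 \le q < \frac{dp}{d-p}$.
Proof. By Theorem 3.4.1, a subsequence converges in $L^p(\Omega)$. We may assume $q > p$ as otherwise the result is an easy consequence of Hölder's inequality since $\Omega$ is bounded. We denote this converging subsequence again by $(u_n)$. From Hölder's inequality, we obtain
$$\|u_n - u_m\|_{L^q(\Omega)} \le \|u_n - u_m\|^{\mu}_{L^1(\Omega)}\, \|u_n - u_m\|^{1 - \mu}_{L^{\frac{dp}{d-p}}(\Omega)} \quad \text{if } \mu \text{ satisfies } \frac{1}{q} = \mu + (1 - \mu)\Big( \frac{1}{p} - \frac{1}{d} \Big). \tag{3.4.15}$$
Since $Du_n$ is bounded in $L^p(\Omega)$ by assumption (so that, by (3.4.12), $u_n - u_m$ is bounded in $L^{\frac{dp}{d-p}}(\Omega)$), and $(u_n)$ is a Cauchy sequence in $L^p(\Omega)$, hence also in $L^1(\Omega)$, (3.4.15) then implies the Cauchy property in $L^q(\Omega)$. q.e.d.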
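Exercises
3.1 Let $A_1 := \{x \in \mathbb{R}^d \mid \|x\| \ge 1\}$, $A_2 := \{x \in \mathbb{R}^d \mid \|x\| \le 1\}$, and consider $f(x) = \|x\|^\lambda$ for $\lambda \in \mathbb{R}$. For which values of $d, p, \lambda$ is $f \in L^p(A_1)$, or $f \in L^p(A_2)$?
3.2 Let $A \subset \mathbb{R}^d$ be measurable. Let $p_1, \ldots, p_k \ge 1$, $\sum_{i=1}^k \frac{1}{p_i} = 1$, $f_i \in L^{p_i}(A)$ for $i = 1, \ldots, k$. Show $f_1 \cdot \ldots \cdot f_k \in L^1(A)$, with
$$\|f_1 \cdot \ldots \cdot f_k\|_1 \le \prod_{i=1}^k \|f_i\|_{p_i}.$$
3.3 Let $A \subset \mathbb{R}^d$ be measurable, $\operatorname{meas} A < \infty$, $1 \le p \le q \le \infty$. Then $L^q(A) \subset L^p(A)$, and for $f \in L^q(A)$
$$\frac{1}{(\operatorname{meas} A)^{\frac{1}{p}}} \|f\|_p \le \frac{1}{(\operatorname{meas} A)^{\frac{1}{q}}} \|f\|_q.$$
(Hint: Apply Hölder's inequality with $f_1 = 1$, $f_2 = f$.)
3.4 Let $A \subset \mathbb{R}^d$ be measurable, $1 \le p \le q \le r$, $\frac{1}{q} = \frac{\alpha}{p} + \frac{1 - \alpha}{r}$, $f \in L^p(A) \cap L^r(A)$. Then $f \in L^q(A)$, and
$$\|f\|_{L^q(A)} \le \|f\|^{\alpha}_{L^p(A)}\, \|f\|^{1 - \alpha}_{L^r(A)}.$$
3.5 Let $A \subset \mathbb{R}^d$ be measurable, $\operatorname{meas} A < \infty$, $f : A \to \mathbb{R} \cup \{\pm\infty\}$ measurable. Then
$$\lim_{p \to \infty} \frac{1}{(\operatorname{meas} A)^{\frac{1}{p}}} \|f\|_{L^p(A)} = \|f\|_{L^\infty(A)}$$
(where we allow these quantities to be infinite).
3.6 Let $A \subset \mathbb{R}^d$ be measurable, $(f_n)_{n \in \mathbb{N}} \subset L^p(A)$ with $\|f_n\|_p \le$ constant. Suppose $f_n$ converges pointwise almost everywhere on $A$ to some $f$. Is $f \in L^p(A)$, and do we necessarily get $\|f_n - f\|_p \to 0$ as $n \to \infty$?
3.7 Let $A_1, A_2, f$ be as in Exercise 3.1. For which $d, k, p, \lambda$ is $f$ in $W^{k,p}(A_1)$ or in $W^{k,p}(A_2)$?
3.8 Consider the sequence $(\sin(nx))_{n \in \mathbb{N}}$ in $L^2((0, 1))$. Does it converge in the $L^2$-norm? Does it converge weakly? If so, what is the limit?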
4 The direct methods in the calculus of variations

4.1 Description of the problem and its solution
The typical problem of the calculus of variations is to minimize an integral of the form
$$F(u) := \int_\Omega f(x, u(x), Du(x))\,dx$$
where $\Omega$ is some open subset of $\mathbb{R}^d$ (in most cases, $\Omega$ is bounded), among functions $u : \Omega \to \mathbb{R}$ belonging to some suitable class of functions and satisfying a boundary condition, for example a Dirichlet boundary condition
$$u(y) = g(y) \quad \text{for } y \in \partial\Omega$$
for some given $g : \partial\Omega \to \mathbb{R}$. Thus, the problem is
$$F(u) \to \min \quad \text{for } u \in C,$$
where $C$ is some space of functions. The strategy of the direct method is very simple: Take a minimizing sequence $(u_n)_{n \in \mathbb{N}} \subset C$, i.e.
$$\lim_{n \to \infty} F(u_n) = \inf_{u \in C} F(u),$$
and show that some subsequence of $(u_n)$ converges to a minimizer $u \in C$. To make this strategy be successful, several conditions should be met:
(1) Some compactness condition has to hold so that a minimizing sequence contains a convergent subsequence. This requires the careful selection of a suitable topology on $C$.
(2) The limit $u$ of such a subsequence should be contained in $C$. This is a closedness condition on $C$. In particular, for (1) and (2) to hold, $C$ should not be too restrictive. In other words, one should not specify too many properties for a solution $u$ in advance.
(3) Some lower semicontinuity condition of the form
$$F(u) \le \liminf_{n \to \infty} F(u_n) \quad \text{if } u_n \text{ converges to } u$$
has to hold, in order to ensure that the limit of a minimizing sequence is indeed a minimizer for $F$.
The lower semicontinuity condition becomes easier if the topology of $C$ is more restrictive, because the stronger the convergence of $u_n$ to $u$ is, the easier that condition is satisfied. That is at variance, however, with the requirement of (1) since for too strong a topology, sequences do not always contain convergent subsequences. Therefore, we expect that the topology for $C$ has to be carefully chosen so as to balance these various requirements. In order to gain some insights into this aspect, it is useful to approach the problem from an abstract point of view. Thus, we shall return to the concrete integral variational problem raised in the beginning only later.
4.2 Lower semicontinuity
We say that a topological space $X$ satisfies the first axiom of countability, if the neighbourhood system of each point $x \in X$ has a countable base, i.e. there exists a sequence $(U_\nu)_{\nu \in \mathbb{N}}$ of open subsets of $X$ with $x \in U_\nu$ with the property that for every open set $V \subset X$ with $x \in V$ there exists $n \in \mathbb{N}$ with
$$U_n \subset V.$$
$X$ satisfies the second axiom of countability if its topology has a countable base, i.e. there exists a family $(U_\nu)_{\nu \in \mathbb{N}}$ of open subsets of $X$ with the property that for every open subset $V$ of $X$ and every $x \in V$, there exists $n \in \mathbb{N}$ with
$$x \in U_n \subset V.$$
We note that separable metric spaces $X$ satisfy the second axiom of countability. In fact, let $(x_\nu)_{\nu \in \mathbb{N}}$ be a dense subset of $X$, and let $(r_\mu)_{\mu \in \mathbb{N}}$ be dense in $\mathbb{R}^+$. Then
$$U(x_\nu, r_\mu) := \{x \in X : d(x, x_\nu) < r_\mu\}$$
($d(\cdot, \cdot)$ the distance function of $X$) forms a countable base for the topology. If the first countability axiom is satisfied, topological notions usually admit sequential characterizations. For example, if $(x_n)_{n \in \mathbb{N}} \subset X$ is a sequence in a topological space $X$ satisfying the first axiom of countability, then any accumulation point of $(x_n)$ (i.e. any $x \in X$ with the property that for every neighbourhood $U$ of $x$ and any $m \in \mathbb{N}$, there exists $n \ge m$ with $x_n \in U$) can be obtained as the limit of some subsequence of $(x_n)$. Although we shall often employ weak topologies which typically do not satisfy the first axiom of countability, for our purposes it will usually be sufficient to use sequential versions of topological properties. For that reason, we shall define our topological notions in sequential terms, without adding the word 'sequentially'.
Definition 4.2.1. Let $X$ be a topological space. A function $F : X \to \bar{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$ is called lower semicontinuous (lsc) at $x$ if
$$F(x) \le \liminf_{n \to \infty} F(x_n)$$
for any sequence $(x_n)_{n \in \mathbb{N}} \subset X$ converging to $x$. $F$ is called lower semicontinuous if it is lsc at every $x \in X$.
The following properties are immediate:
Lemma 4.2.1.
(i) If $F : X \to \bar{\mathbb{R}}$ is lsc, $\lambda > 0$, then $\lambda F$ is lsc.
(ii) If $F, G : X \to \bar{\mathbb{R}}$ are lsc, and if their sum $F + G$ is well defined (i.e. there is no $x \in X$ for which one of the values $F(x), G(x)$ is $+\infty$ and the other one is $-\infty$), then $F + G$ is also lsc.
(iii) For $F, G : X \to \bar{\mathbb{R}}$ lsc, $\inf(F, G)$ is also lsc.
(iv) If $(F_i)_{i \in I}$ is a family of lsc functions, then $\sup_{i \in I} F_i$ is also lsc.
Examples.
(1) Any continuous function is lower semicontinuous.
(2) If $X$ satisfies the first axiom of countability, then $A \subset X$ is open if and only if its characteristic function $\chi_A$ is lsc.
Definition 4.2.2.
(i) Let $X$ be a normed space, with norm $\|\cdot\|$. $F : X \to \bar{\mathbb{R}}$ is weakly proper, if for every sequence $(x_n)_{n \in \mathbb{N}} \subset X$ with $\|x_n\| \to \infty$ we have $F(x_n) \to \infty$ for $n \to \infty$.
186
Direct methods
(ii) Let $X$ be a topological space. $F : X \to \overline{\mathbb{R}}$ is coercive if every sequence $(x_n)_{n\in\mathbb{N}} \subset X$ with $F(x_n) \le$ constant (independent of $n$) has an accumulation point.

We now formulate the following general existence theorem for minimizers:

Theorem 4.2.1. Let $X$ be a separable reflexive Banach space, $F : X \to \mathbb{R}$ weakly proper and lower semicontinuous w.r.t. weak convergence. Then there exists a minimizer $x_0$ for $F$, i.e.
$$F(x_0) = \inf_{x\in X} F(x) \quad (> -\infty).$$

Proof. Let $(x_n)_{n\in\mathbb{N}}$ be a minimizing sequence for $F$, i.e.
$$\lim_{n\to\infty} F(x_n) = \inf_{x\in X} F(x).$$
Since $F$ is weakly proper, $\|x_n\|$ is bounded. Since $X$ is reflexive, after selection of a subsequence, $x_n$ converges weakly to some $x_0 \in X$ by Corollary 2.2.1. By lower semicontinuity of $F$,
$$F(x_0) \le \lim_{n\to\infty} F(x_n) = \inf_{x\in X} F(x),$$
and since $x_0 \in X$, we must have in fact equality. Also, since $F$ assumes only finite values by assumption, this implies that $\inf_{x\in X} F(x) > -\infty$. q.e.d.

Remark 4.2.1. The argument of the preceding proof also shows that in a separable reflexive Banach space, a weakly proper functional is coercive w.r.t. the weak topology.

Lower semicontinuity w.r.t. weak convergence is a rather strong property, in fact much stronger than lower semicontinuity w.r.t. the Banach space topology of $X$. Fortunately, there exists a general class of functionals, namely the convex ones, for which the latter property implies the former.

Definition 4.2.3. Let $V$ be a convex subset of a vector space; $F : V \to \overline{\mathbb{R}}$ is called convex if for any $x, y \in V$, $0 \le t \le 1$,
$$F(tx + (1-t)y) \le tF(x) + (1-t)F(y)$$
(convexity of $V$ means that $tx + (1-t)y \in V$ whenever $x, y \in V$, $0 \le t \le 1$).
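Lemma 4.2.2. Let $V$ be a convex subset of a separable reflexive Banach space, $F : V \to \overline{\mathbb{R}}$ convex and lower semicontinuous. Then $F$ is also lower semicontinuous w.r.t. weak convergence.

Proof. Let $(x_n)_{n\in\mathbb{N}} \subset V$ converge weakly to $x \in V$. We may assume that $F(x_n)$ converges to some $\kappa \in \overline{\mathbb{R}}$. By Theorem 2.2.4, for every $m \in \mathbb{N}$ and every $\varepsilon > 0$, we may find a convex combination
$$y_m := \sum_{n=m}^{N} \lambda_n x_n \qquad \Big(\lambda_n \ge 0,\ \sum_{n=m}^{N} \lambda_n = 1\Big)$$
with
$$\|y_m - x\| < \varepsilon.$$
Since $F$ is convex,
$$F(y_m) \le \sum_{n=m}^{N} \lambda_n F(x_n). \tag{4.2.1}$$
Given $\varepsilon > 0$, we choose $m = m(\varepsilon) \in \mathbb{N}$ so large that for all $n \ge m$,
$$F(x_n) < \kappa + \varepsilon,$$
and hence, by (4.2.1), also $F(y_m) < \kappa + \varepsilon$. Letting $\varepsilon$ tend to 0, the $y_m$ converge to $x$ in norm. Since $F$ is lower semicontinuous,
$$F(x) \le \liminf_{m\to\infty} F(y_m) \le \limsup_{m\to\infty} F(y_m) \le \kappa = \lim_{n\to\infty} F(x_n).$$
This shows weak lower semicontinuity of $F$. q.e.d.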
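The role of convexity in Lemma 4.2.2 can be seen in a small numerical experiment. The following is only a sketch under stated assumptions: the oscillating sequence $u_n(x) = \sin(2\pi n x)$ on $(0,1)$ (which converges weakly, but not strongly, to 0 in $L^2$) and the two integrands $u^2$ and $(u^2-1)^2$ are illustrative choices that do not appear in the text, and the integrals are approximated by a midpoint rule.

```python
import math

def integrate(g, n=20000):
    """Midpoint rule for the integral of g over (0, 1)."""
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) for k in range(n))

def convex_energy(u):
    # F(u) = int_0^1 u(x)^2 dx; the integrand is convex in u
    return integrate(lambda x: u(x) ** 2)

def nonconvex_energy(u):
    # G(u) = int_0^1 (u(x)^2 - 1)^2 dx; the integrand is not convex in u
    return integrate(lambda x: (u(x) ** 2 - 1.0) ** 2)

def oscillation(n):
    # u_n(x) = sin(2 pi n x): weakly convergent to 0 in L^2(0, 1)
    return lambda x: math.sin(2.0 * math.pi * n * x)

def zero(x):
    # the weak limit
    return 0.0

for n in (1, 4, 16):
    u_n = oscillation(n)
    print(n, convex_energy(u_n), nonconvex_energy(u_n))
print("weak limit:", convex_energy(zero), nonconvex_energy(zero))
```

In this experiment the convex functional satisfies $F(0) \le \liminf F(u_n)$ (the values along the sequence stay near $1/2$), whereas the nonconvex one has $G(0) = 1$ while $G(u_n)$ stays near $3/8$, so weak lower semicontinuity fails for it.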
4.3 The existence of minimizers for convex variational problems

We return to the concrete variational problem discussed in Section 4.1 and begin with:

Lemma 4.3.1. Let $\Omega \subset \mathbb{R}^d$ be open, $f : \Omega \times \mathbb{R}^d \to \mathbb{R}$, with $f(\cdot, v)$ measurable for all $v \in \mathbb{R}^d$, $f(x, \cdot)$ continuous for all $x \in \Omega$, and
$$f(x, v) \ge -a(x) + b|v|^p$$
for almost all $x \in \Omega$, and all $v \in \mathbb{R}^d$, with $a \in L^1(\Omega)$, $b \in \mathbb{R}$, $p > 1$. Then
$$\Phi(v) := \int_\Omega f(x, v(x))\,dx$$
is a lower semicontinuous functional on $L^p(\Omega)$,
$$\Phi : L^p(\Omega) \to \mathbb{R} \cup \{\infty\}.$$
Proof. Since $f$ is continuous in $v$, $f(x, v(x))$ is a measurable function, and so $\Phi$ is well-defined on $L^p(\Omega)$, by Theorem 1.1.2. Suppose $(v_n)_{n\in\mathbb{N}}$ converges to $v$ in $L^p(\Omega)$. Then a subsequence converges pointwise almost everywhere to $v$ by Lemma 3.1.3. We shall denote this subsequence again by $(v_n)$, noting that the subsequent arguments may also be applied to any remaining subsequence. Since $f$ is continuous in $v$ (actually, it would suffice to have $f$ lower semicontinuous in $v$), we have
$$f(x, v(x)) - b|v(x)|^p \le \liminf_{n\to\infty}\big(f(x, v_n(x)) - b|v_n(x)|^p\big).$$
Because of the lower bound
$$f(x, v_n(x)) - b|v_n(x)|^p \ge -a(x)$$
with $a \in L^1(\Omega)$, we may apply the Theorem 1.2.2 of Fatou to conclude
$$\int_\Omega \big(f(x, v(x)) - b|v(x)|^p\big)\,dx \le \liminf_{n\to\infty} \int_\Omega \big(f(x, v_n(x)) - b|v_n(x)|^p\big)\,dx.$$
Since $v_n$ converges to $v$ in $L^p(\Omega)$,
$$\int_\Omega b|v(x)|^p\,dx = \lim_{n\to\infty} \int_\Omega b|v_n(x)|^p\,dx,$$
and we conclude lower semicontinuity, namely
$$\int_\Omega f(x, v(x))\,dx \le \liminf_{n\to\infty} \int_\Omega f(x, v_n(x))\,dx.$$
q.e.d.
Lemma 4.3.2. Under the assumptions of Lemma 4.3.1, assume that $f(x, \cdot)$ is a convex function on $\mathbb{R}^d$ for every $x \in \Omega$. Then $\Phi(v) := \int_\Omega f(x, v(x))\,dx$ defines a convex functional on $L^p(\Omega)$.

Proof. Let $v, w \in L^p(\Omega)$, $0 \le t \le 1$. Then
$$\Phi(tv + (1-t)w) = \int_\Omega f\big(x, tv(x) + (1-t)w(x)\big)\,dx \le \int_\Omega \big\{t f(x, v(x)) + (1-t) f(x, w(x))\big\}\,dx$$
by the convexity of $f$,
$$= t\Phi(v) + (1-t)\Phi(w).$$
q.e.d.
We may now obtain a general existence result for the minimizer of a convex variational problem.

Theorem 4.3.1. Let $\Omega \subset \mathbb{R}^d$ be open, and suppose $f : \Omega \times \mathbb{R}^d \to \mathbb{R}$ satisfies
(i) $f(\cdot, v)$ is measurable for all $v \in \mathbb{R}^d$.
(ii) $f(x, \cdot)$ is convex for all $x \in \Omega$.
(iii) $f(x, v) \ge -a(x) + b|v|^p$ for almost all $x \in \Omega$, all $v \in \mathbb{R}^d$, with $a \in L^1(\Omega)$, $b > 0$, $p > 1$.
Let $g \in H^{1,p}(\Omega)$, and let $A := g + H_0^{1,p}(\Omega)$. Then
$$F(u) := \int_\Omega f(x, Du(x))\,dx$$
assumes its infimum on $A$, i.e. there exists $u_0 \in A$ with
$$F(u_0) = \inf_{u\in A} F(u).$$

Proof. By Lemma 4.3.1, $F$ is lower semicontinuous w.r.t. $H^{1,p}(\Omega)$ convergence (note that convex functions on $\mathbb{R}^d$ are continuous), and by Lemma 4.2.2, $F$ then is also lower semicontinuous w.r.t. weak $H^{1,p}(\Omega)$ convergence, since $H^{1,p}(\Omega)$ is separable and reflexive for $p > 1$ (see Theorems 3.3.1 and 3.3.3). Let $(u_n)_{n\in\mathbb{N}}$ be a minimizing sequence in $A$, i.e.
$$\lim_{n\to\infty} F(u_n) = \inf_{u\in A} F(u).$$
Since
$$\int_\Omega |Du_n|^p \le \frac{1}{b} F(u_n) + \frac{1}{b}\int_\Omega a(x)\,dx,$$
$(Du_n)_{n\in\mathbb{N}}$ is bounded in $L^p(\Omega)$, hence $(u_n)_{n\in\mathbb{N}} \subset g + H_0^{1,p}(\Omega)$ is bounded in $H^{1,p}(\Omega)$ by the Poincaré inequality (see Theorem 3.4.2). Since $H^{1,p}(\Omega)$ is a separable reflexive Banach space, by Theorem 3.3.5, after selection of a subsequence, $(u_n)_{n\in\mathbb{N}}$ converges weakly to some $u_0 \in A$ ($A$ is closed under weak convergence, Theorem 3.3.4). Since $F$ is convex by Lemma 4.3.2 and lower semicontinuous by Lemma 4.3.1, it is also lower semicontinuous w.r.t. weak $H^{1,p}(\Omega)$ convergence by Lemma 4.2.2. Therefore
$$F(u_0) \le \lim_{n\to\infty} F(u_n) = \inf_{u\in A} F(u),$$
and since $u_0 \in A$, we must have equality. q.e.d.

Remark 4.3.1. The condition $u \in g + H_0^{1,p}(\Omega)$, i.e. $u - g \in H_0^{1,p}(\Omega)$, is a (generalized) Dirichlet boundary condition. It means that $u = g$ on $\partial\Omega$ in the sense of Sobolev spaces.
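The scheme of Theorem 4.3.1 (fix the boundary values, take a minimizing sequence, pass to the limit) can be imitated in a discrete setting. The following is a minimal sketch under assumptions that are not in the text: $\Omega = (0,1)$, $f(x,v) = |v|^2$, boundary values $g(0) = 0$, $g(1) = 1$, a uniform grid with forward differences, and plain gradient descent as the device producing the minimizing sequence. The names `energy`, `gradient_step` and `minimize` are hypothetical.

```python
def energy(u, h):
    # Discrete version of F(u) = int_0^1 |u'(x)|^2 dx using forward differences
    return sum(h * ((u[i + 1] - u[i]) / h) ** 2 for i in range(len(u) - 1))

def gradient_step(u, h, tau):
    # One descent step on the interior nodes only; the boundary nodes stay fixed,
    # which is the discrete analogue of the condition u - g in H_0^{1,p}.
    new = u[:]
    for i in range(1, len(u) - 1):
        grad_i = 2.0 * (2.0 * u[i] - u[i - 1] - u[i + 1]) / h
        new[i] = u[i] - tau * grad_i
    return new

def minimize(n_cells=16, steps=5000):
    h = 1.0 / n_cells
    # admissible starting function: g(0) = 0, g(1) = 1, zero at all interior nodes
    u = [0.0] * n_cells + [1.0]
    energies = [energy(u, h)]
    for _ in range(steps):
        # tau = h / 8 is the reciprocal of the largest curvature of the discrete energy
        u = gradient_step(u, h, tau=h / 8.0)
        energies.append(energy(u, h))
    return u, energies

u, energies = minimize()
print(energies[0], energies[-1])                     # energy along the sequence
print(max(abs(u[i] - i / 16) for i in range(17)))    # distance to u(x) = x
```

For this convex integrand the discrete energies decrease along the sequence and the iterates approach the straight line $u(x) = x$, which is the minimizer of the Dirichlet integral with these boundary values.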
4.4 Convex functionals on Hilbert spaces and Moreau-Yosida approximation

In this section, we develop a more abstract method for showing the existence of minimizers of variational problems. It has the advantages that it does not need the concept of weak convergence and that it provides a constructive approach for finding the minimizer. In order to concentrate on the essential aspects, we shall only treat a special situation.

Definition 4.4.1. Let $X$ be a metric space with metric $d(\cdot,\cdot)$, and let $F : X \to \mathbb{R} \cup \{\infty\}$ be a functional. For $\lambda > 0$, we define the Moreau-Yosida approximation $F^\lambda$ of $F$ as
$$F^\lambda(x) := \inf_{y\in X}\big(\lambda F(y) + d^2(x, y)\big) \tag{4.4.1}$$
for $x \in X$.

Remark 4.4.1. This is different from the definition in Section 5.1 where we shall take $d(x,y)$ instead of $d^2(x,y)$. Here, one might take $d^\alpha(x,y)$ for any exponent $\alpha > 1$. For our present purposes, it is most convenient to work with $\alpha = 2$.

We now let $H$ be a Hilbert space with scalar product $(\cdot,\cdot)$ and norm $\|\cdot\|$ and induced metric $d(x,y) = \|x - y\|$. Let $D(F) \subset H$, and let $F : D(F) \to \mathbb{R}$ be a functional. We say that $F$ is densely defined if $D(F)$ is dense in $H$. For $x \notin D(F)$, we put $F(x) = \infty$. We say that $F$ is convex if whenever $\gamma : [0,1] \to H$ is a straight line segment, then for $0 \le t \le 1$
$$F(\gamma(t)) \le (1-t)F(\gamma(0)) + tF(\gamma(1)). \tag{4.4.2}$$
In particular, if $\gamma(0), \gamma(1) \in D(F)$, then also $\gamma(t) \in D(F)$ for $0 \le t \le 1$.

Lemma 4.4.1. Let $F : H \to \mathbb{R} \cup \{\infty\}$ be convex, bounded from below, and lower semicontinuous. Then for every $x \in H$ and $\lambda > 0$, there exists a unique $y^\lambda =: J^\lambda(x)$ with
$$F^\lambda(x) = \lambda F(y^\lambda) + d^2(x, y^\lambda). \tag{4.4.3}$$
Proof. We have to show that the infimum in (4.4.1) is realized by a unique $y^\lambda$.

Uniqueness: Let $y_1^\lambda, y_2^\lambda$ be solutions of (4.4.3), and let $y_0^\lambda = \frac{1}{2}(y_1^\lambda + y_2^\lambda)$ be their mean value. By convexity of $F$,
$$F(y_0^\lambda) \le \tfrac{1}{2}\big(F(y_1^\lambda) + F(y_2^\lambda)\big), \tag{4.4.4}$$
and by Euclidean geometry, if $y_1^\lambda \ne y_2^\lambda$, we have
$$\|x - y_0^\lambda\|^2 < \tfrac{1}{2}\big(\|x - y_1^\lambda\|^2 + \|x - y_2^\lambda\|^2\big), \tag{4.4.5}$$
hence
$$\lambda F(y_0^\lambda) + \|x - y_0^\lambda\|^2 < \lambda F(y_1^\lambda) + \|x - y_1^\lambda\|^2 = \lambda F(y_2^\lambda) + \|x - y_2^\lambda\|^2,$$
contradicting the minimizing property of $y_1^\lambda$ and $y_2^\lambda$. Thus, we must have $y_1^\lambda = y_2^\lambda$, proving uniqueness.

Existence: (4.4.5) may be refined as follows: For $y_1, y_2 \in H$ and $y_0 := \frac{1}{2}(y_1 + y_2)$ we have for any $x \in H$
$$\|x - y_0\|^2 = \tfrac{1}{2}\big(\|x - y_1\|^2 + \|x - y_2\|^2\big) - \tfrac{1}{4}\|y_1 - y_2\|^2. \tag{4.4.6}$$
We now let $(y_n)_{n\in\mathbb{N}}$ be a minimizing sequence, i.e.
$$\lambda F(y_n) + \|x - y_n\|^2 \to \inf_{y\in H}\big(\lambda F(y) + \|x - y\|^2\big) =: \kappa_\lambda. \tag{4.4.7}$$
We claim that $(y_n)$ is a Cauchy sequence. For $l, k \in \mathbb{N}$, we put
$$y_{k,l} := \tfrac{1}{2}(y_k + y_l).$$
Using the convexity of $F$ as in (4.4.4) and (4.4.6), we obtain
$$\lambda F(y_{k,l}) + \|x - y_{k,l}\|^2 \le \tfrac{1}{2}\big(\lambda F(y_k) + \|x - y_k\|^2\big) + \tfrac{1}{2}\big(\lambda F(y_l) + \|x - y_l\|^2\big) - \tfrac{1}{4}\|y_k - y_l\|^2. \tag{4.4.8}$$
By definition of $\kappa_\lambda$ (see (4.4.7)), the left hand side of (4.4.8) cannot be smaller than $\kappa_\lambda$, and so we conclude that
$$\|y_k - y_l\|^2 \to 0 \quad \text{as } k, l \to \infty,$$
establishing the Cauchy property. Since the norm is continuous and $F$ is assumed to be lower semicontinuous, the limit $y^\lambda$ of $(y_n)_{n\in\mathbb{N}}$ then solves (4.4.3). q.e.d.

Lemma 4.4.2. Let $F$ and $y^\lambda = J^\lambda(x)$ be as in Lemma 4.4.1. Let $x$ be in the closure of $D(F)$. Then
$$x = \lim_{\lambda\to 0} J^\lambda(x). \tag{4.4.9}$$
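Proof. Since $x$ is in the closure of $D(F)$, for every $\delta > 0$, we may find
$$x_\delta \in B(x, \delta) := \{y \in H : \|x - y\| < \delta\}$$
with $F(x_\delta) < \infty$. Then
$$\lim_{\lambda\to 0}\big(\lambda F(x_\delta) + \|x - x_\delta\|^2\big) \le \delta^2$$
and therefore
$$\limsup_{\lambda\to 0} \kappa_\lambda \le 0 \tag{4.4.10}$$
(see (4.4.7) for the definition of $\kappa_\lambda$). Let us now assume that there exists a sequence $\lambda_n \to 0$ for $n \to \infty$ with
$$\|x - y^{\lambda_n}\|^2 \ge \alpha > 0 \quad \text{for all } n. \tag{4.4.11}$$
Then from (4.4.10)
$$\limsup_{n\to\infty}\big(\lambda_n F(y^{\lambda_n}) + \|x - y^{\lambda_n}\|^2\big) \le 0, \tag{4.4.12}$$
hence
$$F(y^{\lambda_n}) \to -\infty \quad \text{as } n \to \infty. \tag{4.4.13}$$
(4.4.12) and (4.4.13) imply
$$F(y^1) + \|x - y^1\|^2 \le F(y^{\lambda_n}) + \|x - y^{\lambda_n}\|^2 \to -\infty \quad \text{as } n \to \infty,$$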
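which is impossible. Thus, (4.4.11) cannot hold, and (4.4.9) follows. q.e.d.

Theorem 4.4.1. Let $F : H \to \mathbb{R} \cup \{\infty\}$ be convex, bounded from below, and lower semicontinuous, and $F \not\equiv \infty$. For $x \in H$, we let $y^\lambda = J^\lambda(x)$ as in Lemma 4.4.1. If $(y^{\lambda_n})_{n\in\mathbb{N}}$ is bounded for some sequence $\lambda_n \to \infty$, then $(y^\lambda)_{\lambda>0}$ converges to a minimizer of $F$ as $\lambda \to \infty$.

Proof. Since $(y^{\lambda_n})_{n\in\mathbb{N}}$ is bounded and since $y^{\lambda_n}$ minimizes
$$F(y) + \frac{1}{\lambda_n}\|x - y\|^2,$$
we obtain
$$F(y^{\lambda_n}) \to \inf_{y\in H} F(y),$$
so that $(y^{\lambda_n})_{n\in\mathbb{N}}$ is a minimizing sequence for $F$. We now claim that $\|x - y^\lambda\|^2$ is monotonically increasing in $\lambda$. Indeed, let $0 < \mu_1 < \mu_2$. Then by definition of $y^{\mu_1}$,
$$F(y^{\mu_2}) + \frac{1}{\mu_1}\|x - y^{\mu_2}\|^2 \ge F(y^{\mu_1}) + \frac{1}{\mu_1}\|x - y^{\mu_1}\|^2,$$
hence
$$F(y^{\mu_2}) + \frac{1}{\mu_2}\|x - y^{\mu_2}\|^2 \ge F(y^{\mu_1}) + \frac{1}{\mu_2}\|x - y^{\mu_1}\|^2 + \Big(\frac{1}{\mu_1} - \frac{1}{\mu_2}\Big)\big(\|x - y^{\mu_1}\|^2 - \|x - y^{\mu_2}\|^2\big).$$
This is compatible with the minimizing property of $y^{\mu_2}$ only if
$$\|x - y^{\mu_2}\|^2 \ge \|x - y^{\mu_1}\|^2,$$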
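and monotonicity follows. This monotonicity then implies that
$$\|x - y^\lambda\|^2$$
is bounded independently of $\lambda$ since it is assumed to be bounded for the sequence $\lambda_n \to \infty$. We next claim that $F(y^\lambda)$ monotonically decreases towards $\inf_{y\in H} F(y)$. Indeed, from the definition of $y^\lambda$,
$$F(y^\lambda) = \inf_{\{y\,:\,\|x - y\| \le \|x - y^\lambda\|\}} F(y),$$
and therefore $F(y^\lambda)$ has to decrease since $\|x - y^\lambda\|$ increases. The limit has to be $\inf_{y\in H} F(y)$ since this is so for the subsequence $(y^{\lambda_n})_{n\in\mathbb{N}}$.

We now claim that $(y^\lambda)_{\lambda>0}$ satisfies the Cauchy property, i.e. for every $\varepsilon > 0$, there exists $\lambda_0 > 0$ such that for all $\lambda, \mu \ge \lambda_0$
$$\|y^\lambda - y^\mu\|^2 < \varepsilon.$$
For that purpose, we choose $\lambda_0$ so large that for $\lambda, \mu \ge \lambda_0$
$$\Big|\,\|x - y^\lambda\|^2 - \|x - y^\mu\|^2\Big| < \frac{\varepsilon}{2}, \tag{4.4.14}$$
which is possible by the preceding monotonicity and boundedness results. We may also assume
$$F(y^\lambda) \ge F(y^\mu). \tag{4.4.15}$$
We let
$$y^{\lambda,\mu} := \tfrac{1}{2}(y^\lambda + y^\mu).$$
Then from the convexity of $F$, (4.4.15), and (4.4.6),
$$F(y^{\lambda,\mu}) + \frac{1}{\lambda}\|x - y^{\lambda,\mu}\|^2 \le F(y^\lambda) + \frac{1}{\lambda}\Big(\|x - y^\lambda\|^2 + \frac{\varepsilon}{4} - \frac{1}{4}\|y^\lambda - y^\mu\|^2\Big)$$
by (4.4.14).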
This, however, is compatible with the minimizing property of $y^\lambda$ only if
$$\|y^\lambda - y^\mu\|^2 < \varepsilon.$$
Thus $(y^\lambda)_{\lambda>0}$ satisfies the Cauchy property for $\lambda \to \infty$, and it therefore converges to some $\bar{y} \in H$. $\bar{y}$ then minimizes $F$, because $F(y^\lambda)$ decreases towards $\inf_{y\in H} F(y)$ for $\lambda \to \infty$, and $F$ is lower semicontinuous. q.e.d.

The preceding reasoning is adapted from J. Jost, Convex functionals and generalized harmonic maps between metric spaces, Comment. Math. Helv. 70 (1995), 659-673. For a more general construction, see J. Jost, Nonpositive Curvature: Geometric and Analytic Aspects, Birkhäuser, Basel, 1997, pp. 61-4. In particular, the method also works in uniformly convex Banach spaces. General references for Moreau-Yosida approximation are the books of Attouch and Dal Maso quoted in Chapter 6.

Theorem 4.4.1 yields an alternative proof of Theorem 4.3.1 in case $p = 2$. Namely, Lemma 4.3.1 implies the lower semicontinuity, Lemma 4.3.2 the convexity of the functional, and the Poincaré inequality the boundedness of any minimizing sequence, as described in the proof of Theorem 4.3.1. The present proof, however, does not need the concept of weak convergence. As mentioned, the method extends to uniformly convex Banach spaces, and thus can handle also arbitrary values of $p > 1$ (see Remark 3.1.2).
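The behaviour of $J^\lambda(x)$ described in Lemma 4.4.2 and Theorem 4.4.1 can be observed in the simplest Hilbert space $H = \mathbb{R}$. The following is a sketch for an illustrative choice that is not taken from the text: $F(y) = |y|$, for which the minimizer of $\lambda|y| + (x-y)^2$ is obtained by shrinking $x$ towards 0 by $\lambda/2$. The function names are hypothetical, and the brute-force grid search is only included as an independent check of the closed form.

```python
def F(y):
    # convex, bounded from below, lower semicontinuous; its minimizer is y = 0
    return abs(y)

def J_closed_form(x, lam):
    # minimizer of y -> lam * |y| + (x - y)^2: shrink x towards 0 by lam / 2
    shrink = max(abs(x) - lam / 2.0, 0.0)
    return shrink if x >= 0 else -shrink

def J_brute_force(x, lam, radius=5.0, n=20001):
    # grid search for the minimizer of y -> lam * F(y) + (x - y)^2 on [-radius, radius]
    best_y, best_val = None, float("inf")
    for k in range(n):
        y = -radius + 2.0 * radius * k / (n - 1)
        val = lam * F(y) + (x - y) ** 2
        if val < best_val:
            best_y, best_val = y, val
    return best_y

x = 1.5
for lam in (0.01, 0.1, 1.0, 2.0, 4.0):
    print(lam, J_closed_form(x, lam), J_brute_force(x, lam))
```

For small $\lambda$ the computed values stay close to $x$, in line with (4.4.9), and for $\lambda \ge 2|x|$ they coincide with the minimizer $0$ of $F$, in line with Theorem 4.4.1.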
4.5 The Euler-Lagrange equations and regularity questions

In this section, we return to the variational problems considered in Sections 4.1 and 4.3; we consider variational integrals of the form
$$\Phi(u) := \int_\Omega f(x, u(x), Du(x))\,dx \quad \text{for } u \in H^{1,p}(\Omega)$$
on a bounded, open subset $\Omega$ of $\mathbb{R}^d$, and we make the following assumptions on $f : \Omega \times \mathbb{R} \times \mathbb{R}^d \to \overline{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$:
(i) $f(\cdot, u, v)$ is measurable for all $u \in \mathbb{R}$, $v \in \mathbb{R}^d$.
(ii) $f(x, \cdot, \cdot)$ is differentiable for almost all $x \in \Omega$.
(iii) $|f(x,u,v)| \le c_0 + c_1|u|^p + c_2|v|^p$, $c_0, c_1, c_2$ constants, for almost all $x \in \Omega$, and all $u \in \mathbb{R}$, $v \in \mathbb{R}^d$.

Condition (iii) implies that $\Phi(u)$ is finite for $u \in H^{1,p}(\Omega)$, since $\Omega$ is bounded. (If $\Omega$ is unbounded, this still holds provided $c_0 = 0$.) In the preceding section, we have obtained some results on the existence of a minimizer for $\Phi$ in the class $g + H_0^{1,p}(\Omega)$, for given $g \in H^{1,p}(\Omega)$. In the present section, we wish to characterize such minimizers by necessary conditions. These conditions will assume the form of differential equations. In fact, these differential equations will hold for arbitrary critical points of $\Phi$ (as specified in the assumptions of our subsequent results), and not only for minimizers.

Theorem 4.5.1. Let $f$ satisfy (in addition to (i)-(iii))
(iv) $$\Big|\frac{\partial f}{\partial u}(x,u,v)\Big|,\ \Big|\frac{\partial f}{\partial v}(x,u,v)\Big| \le c_3 + c_4|u|^p + c_5|v|^p,$$
$c_3, c_4, c_5$ constants, for almost all $x \in \Omega$, and all $u \in \mathbb{R}$, $v \in \mathbb{R}^d$. Let $u$ be a minimizer for $\Phi$ in the class $g + H_0^{1,p}(\Omega)$ ($g \in H^{1,p}(\Omega)$ given). We then have for all $\varphi \in C_0^\infty(\Omega)$
$$\int_\Omega \Big\{\frac{\partial f}{\partial u}(x, u(x), Du(x))\,\varphi(x) + \sum_{i=1}^d \frac{\partial f}{\partial v_i}(x, u(x), Du(x))\,\frac{\partial\varphi(x)}{\partial x_i}\Big\}\,dx = 0. \tag{4.5.1}$$
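Proof. Since $u$ is a minimizer for $\Phi$ in $g + H_0^{1,p}(\Omega)$,
$$\Phi(u) \le \Phi(u + t\varphi) \quad \text{for } t \in \mathbb{R},\ \varphi \in C_0^\infty(\Omega). \tag{4.5.2}$$
We have
$$\Phi(u + t\varphi) = \int_\Omega f\big(x, u(x) + t\varphi(x), Du(x) + tD\varphi(x)\big)\,dx.$$
By (ii), (iii), (iv), we may apply Corollary 1.2.2 to conclude that $\Phi(u + t\varphi)$ is differentiable w.r.t. $t$, and
$$\frac{d}{dt}\Phi(u + t\varphi) = \int_\Omega \Big\{\frac{\partial f}{\partial u}\big(x, u(x) + t\varphi(x), Du(x) + tD\varphi(x)\big)\,\varphi(x) + \sum_{i=1}^d \frac{\partial f}{\partial v_i}\big(x, u(x) + t\varphi(x), Du(x) + tD\varphi(x)\big)\,\frac{\partial\varphi(x)}{\partial x_i}\Big\}\,dx. \tag{4.5.3}$$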
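Furthermore, (4.5.2) implies
$$\frac{d}{dt}\Phi(u + t\varphi)\Big|_{t=0} = 0. \tag{4.5.4}$$
Equations (4.5.3) and (4.5.4) imply (4.5.1). q.e.d.

Remark 4.5.1. From the preceding proof, it is clear that we do not need to assume that $u$ is a minimizer for $\Phi$. It suffices that $u$ is a critical point for $\Phi$ in the sense that
$$\frac{d}{dt}\Phi(u + t\varphi)\Big|_{t=0} = 0 \quad \text{for all } \varphi \in C_0^\infty(\Omega). \tag{4.5.5}$$

Corollary 4.5.1. Suppose that $f$ satisfies (i)-(iv), and in addition, $f \in C^2$. If $u \in C^2(\Omega)$ minimizes $\Phi$ in the class $g + H_0^{1,p}(\Omega)$ (or, more generally, satisfies (4.5.5)), then
$$\sum_{i,j=1}^d \frac{\partial^2 f}{\partial v_i \partial v_j}(x, u(x), Du(x))\,\frac{\partial^2 u}{\partial x_i \partial x_j} + \sum_{i=1}^d \frac{\partial^2 f}{\partial v_i \partial u}(x, u(x), Du(x))\,\frac{\partial u}{\partial x_i} + \sum_{i=1}^d \frac{\partial^2 f}{\partial v_i \partial x_i}(x, u(x), Du(x)) - \frac{\partial f}{\partial u}(x, u(x), Du(x)) = 0. \tag{4.5.6}$$

Definition 4.5.1. Equation (4.5.6) is called the Euler-Lagrange equation for $\Phi$.

Proof (Corollary 4.5.1). By the differentiability assumptions made, we may integrate (4.5.1) by parts to obtain
$$0 = \int_\Omega \Big\{\frac{\partial f}{\partial u}(x, u(x), Du(x)) - \sum_{i=1}^d \frac{d}{dx_i}\Big(\frac{\partial f}{\partial v_i}(x, u(x), Du(x))\Big)\Big\}\,\varphi(x)\,dx.$$
From Lemma 3.2.3 (applied to $\operatorname{supp}\varphi \subset\subset \Omega$, so that the term in $\{\cdots\}$ is in $L^2$), we then obtain (4.5.6). q.e.d.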
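As a worked illustration of Corollary 4.5.1, one may write out the Euler-Lagrange equation, in the form $\sum_i \frac{d}{dx_i}\big(\frac{\partial f}{\partial v_i}\big) - \frac{\partial f}{\partial u} = 0$, for a concrete integrand. The integrand below is a hypothetical choice made only for this example (with $h$ assumed bounded and smooth, so that (iii) and (iv) hold for $p = 2$ and $f \in C^2$); it is not taken from the text.

```latex
\begin{align*}
f(x,u,v) &= \tfrac12 |v|^2 + \tfrac12 u^2 - h(x)\,u, \\
\frac{\partial f}{\partial v_i} &= v_i, \qquad
\frac{\partial^2 f}{\partial v_i \partial v_j} = \delta_{ij}, \qquad
\frac{\partial^2 f}{\partial v_i \partial u} = 0, \qquad
\frac{\partial^2 f}{\partial v_i \partial x_i} = 0, \qquad
\frac{\partial f}{\partial u} = u - h(x), \\
0 &= \sum_{i=1}^d \frac{d}{dx_i}\Bigl(\frac{\partial f}{\partial v_i}(x,u,Du)\Bigr)
     - \frac{\partial f}{\partial u}(x,u,Du)
   = \Delta u(x) - u(x) + h(x).
\end{align*}
```

So for this integrand a critical point of class $C^2$ solves the linear equation $\Delta u - u = -h$; only the second-order term survives from the first sum because the Hessian of $f$ in $v$ is the identity matrix.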
Equation (4.5.6) constitutes a quasilinear partial differential equation of second order for $u$. Many such partial differential equations arise as Euler-Lagrange equations of variational problems. Therefore, if one wants to solve such an equation, one might try to find a minimizer of the associated variational problem. However, the existence theory for minimizers as described in Section 4.3 naturally yields an element $u$ of the Sobolev space $H_0^{1,p}(\Omega)$, whereas in Corollary 4.5.1 it is required that $u$ be of class $C^2(\Omega)$. Thus, there exists a gap, since in general elements of $H_0^{1,p}(\Omega)$ are not of class $C^2$. It is the task of regularity theory to bridge this gap, i.e. to show that under suitable assumptions on $f$, any minimizer of $\Phi$ is smooth, and specifically here of class $C^2$. The theory of partial differential equations indicates that such a result does not hold without additional assumptions on $f$, like an ellipticity assumption, meaning that the matrix $(a^{ij}(x))_{i,j=1,\ldots,d}$ with coefficients
$$a^{ij}(x) = \frac{\partial^2 f}{\partial v_i \partial v_j}(x, u(x), Du(x))$$
is positive definite. Indeed, examples show that without such an assumption, in general one does not get smoothness of minimizers. On the positive side, however, we do have de Giorgi's and Nash's:

Theorem 4.5.2. Let $\Omega$ be open and bounded in $\mathbb{R}^d$, $f : \Omega \times \mathbb{R}^d \to \mathbb{R}$ be of class $C^\infty$, with
(i) $$\lambda|\xi|^2 \le \sum_{i,j=1}^d \frac{\partial^2 f}{\partial v_i \partial v_j}(x, v)\,\xi_i\xi_j \le \Lambda|\xi|^2$$
for all $x \in \Omega$, $v, \xi \in \mathbb{R}^d$, with constants $\lambda > 0$, $\Lambda < \infty$,
(ii) $$\Big|\frac{\partial f}{\partial v}(x, v)\Big| \le M(1 + |v|)$$
for a constant $M < \infty$.
Let $u \in g + H_0^{1,p}(\Omega)$ be a bounded minimizer of
$$F(u) := \int_\Omega f(x, Du(x))\,dx$$
($g \in H^{1,p}(\Omega)$ given). Then $u$ is smooth in $\Omega$ ($u \in C^\infty(\Omega)$).

The proof of the theorem of de Giorgi and Nash is too long to be presented here. We refer to M. Giaquinta, Introduction to Regularity Theory for Nonlinear Elliptic Systems, Birkhäuser, Basel, 1993, pp. 76-99 and J. Jost, Partielle Differentialgleichungen, Springer, Berlin, 1998 where a detailed proof is given. Of course, there also exist extensions of this result to more general integrands of the form $f(x, u, v)$. We refer the interested reader to O. Ladyzhenskaya, N. Ural'tseva, Linear and Quasilinear Elliptic Equations, Academic Press, New York, 1968 (translated from the Russian), Chapters IV-VI.

One remark is in order here: Since Sobolev functions are only equivalence classes of functions (in the sense specified at the beginning of Section 3.1), a more precise version of Theorem 4.5.2 is: Under the stated assumptions, the equivalence class of $u$ contains a function of class $C^\infty$. This point, however, usually is assumed to be implicitly understood in statements of regularity theorems.

In order to display at least one regularity result, however, we consider a particular example: For a bounded, open $\Omega \subset \mathbb{R}^d$, $g \in H^{1,2}(\Omega)$, we wish to minimize Dirichlet's integral
$$D(u) := \int_\Omega |Du(x)|^2\,dx \tag{4.5.8}$$
in the class $g + H_0^{1,2}(\Omega)$. By Theorem 4.3.1, a minimizer $u$ exists, and by Theorem 4.5.1, it satisfies
$$\int_\Omega Du(x) \cdot D\varphi(x)\,dx = 0 \quad \text{for all } \varphi \in C_0^\infty(\Omega) \tag{4.5.9}$$
(here $Du(x) \cdot D\varphi(x) := \sum_{i=1}^d D_i u(x) D_i \varphi(x)$). If $u$ is of class $C^2$, then it satisfies
$$\Delta u(x) = 0 \quad \text{in } \Omega \qquad \Big(\Delta := \sum_{i=1}^d \frac{\partial^2}{\partial x_i^2}\Big)$$
($\Delta$ is called Laplace operator) by Corollary 4.5.1, i.e. it is harmonic. This is the famous Dirichlet principle: obtain a harmonic function $u$ in $\Omega$ with boundary values $g$ by minimizing the Dirichlet integral among all functions with those boundary values. In order to justify Dirichlet's principle it thus remains to show that any solution of (4.5.9) is of class $C^2$. Actually, one can show more, namely, $u \in C^\infty$ (in fact, $u$ is even real analytic in $\Omega$ but this will not be demonstrated here), and at the same time weaken the assumption. Namely, we have:
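Theorem 4.5.3 (Weyl's lemma). Let $u \in L^1(\Omega)$ satisfy
$$\int_\Omega u(x)\Delta\varphi(x)\,dx = 0 \quad \text{for all } \varphi \in C_0^\infty(\Omega). \tag{4.5.10}$$
Then $u \in C^\infty(\Omega)$.

Remark 4.5.2. (1) Clearly, (4.5.9) implies (4.5.10) by definition of $Du$. (2) The remark made after Theorem 4.5.2 again applies.

Proof (Theorem 4.5.3). We consider the mollifications with a rotationally symmetric $\varrho$ (and we express this by writing $\varrho$ as a function of $|x|$)
$$u_h(x) = \frac{1}{h^d}\int_\Omega \varrho\Big(\frac{|x - y|}{h}\Big) u(y)\,dy$$
as in Section 3.2. Given $\varphi \in C_0^\infty(\Omega)$, we restrict $h$ to be smaller than $\operatorname{dist}(\operatorname{supp}\varphi, \partial\Omega)$. We obtain
$$\int_\Omega u_h(x)\Delta\varphi(x)\,dx = \int_\Omega \int_\Omega \frac{1}{h^d}\varrho\Big(\frac{|x - y|}{h}\Big) u(y)\,dy\,\Delta\varphi(x)\,dx = \int_\Omega u(x)\Delta\varphi_h(x)\,dx \tag{4.5.11}$$
using Fubini's theorem.

Remark 4.5.3. We have also used the fact that $\Delta$ commutes with mollification, i.e.
$$(\Delta\varphi)_h = \Delta(\varphi_h). \tag{4.5.12}$$
For this, one needs that $\varrho$ is a function of $|x|$ only, i.e. rotationally symmetric. Also, this point needs the rotational invariance of the Laplace operator $\Delta$. Therefore, the present proof does not generalize to other variational problems.

After this interruption, we return to (4.5.11) and conclude that
$$\int_\Omega u_h(x)\Delta\varphi(x)\,dx = 0 \tag{4.5.13}$$
by applying (4.5.10) to $\varphi_h$. Since $u_h$ is smooth, this implies
$$\Delta u_h(x) = 0$$
in $\Omega_h := \{x \in \Omega \mid \operatorname{dist}(x, \partial\Omega) > h\}$. Also
$$\int_{\Omega_h} |u_h(y)|\,dy \le \int_\Omega |u(x)|\,dx \tag{4.5.14}$$
by Fubini's theorem, using $\frac{1}{h^d}\int \varrho\big(\frac{|x - y|}{h}\big)\,dy = 1$ by (3.2.3), and the right hand side is $< \infty$ since $u \in L^1(\Omega)$.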
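Therefore, the functions $u_h$ are uniformly bounded in $L^1$. We now need

Lemma 4.5.1. Let $f \in C^2(\Omega)$ be harmonic, i.e.
$$\Delta f(x) = 0 \quad \text{in } \Omega.$$
Then $f$ satisfies the mean value property, i.e. for every ball $B(x_0, r) \subset \Omega$,
$$f(x_0) = \frac{1}{\omega_d r^d}\int_{B(x_0,r)} f(x)\,dx = \frac{1}{d\omega_d r^{d-1}}\int_{\partial B(x_0,r)} f(x)\,d\sigma(x), \tag{4.5.15}$$
where $\omega_d$ is the volume of the unit ball in $\mathbb{R}^d$.

Proof. For $0 < \varrho \le r$
$$0 = \int_{B(x_0,\varrho)} \Delta f(x)\,dx = \int_{\partial B(x_0,\varrho)} \frac{\partial f}{\partial\nu}(x)\,d\sigma(x),$$
where $\nu$ denotes the exterior normal of $B(x_0, \varrho)$,
$$= \int_{\partial B(0,1)} \frac{\partial f}{\partial\varrho}(x_0 + \varrho\omega)\,\varrho^{d-1}\,d\omega$$
in polar coordinates $\omega = \frac{x - x_0}{\varrho}$,
$$= \varrho^{d-1}\frac{\partial}{\partial\varrho}\int_{\partial B(0,1)} f(x_0 + \varrho\omega)\,d\omega = \varrho^{d-1}\frac{\partial}{\partial\varrho}\Big(\varrho^{1-d}\int_{\partial B(x_0,\varrho)} f(x)\,d\sigma(x)\Big).$$
Thus,
$$\frac{1}{d\omega_d\varrho^{d-1}}\int_{\partial B(x_0,\varrho)} f(x)\,d\sigma(x)$$
is constant in $\varrho$, and since its limit for $\varrho \to 0$ is $f(x_0)$ as $f$ is continuous, it has to coincide with $f(x_0)$ for all $0 < \varrho \le r$. Since
$$\frac{1}{\omega_d r^d}\int_{B(x_0,r)} f(x)\,dx = \frac{d}{r^d}\int_0^r \Big(\frac{1}{d\omega_d\varrho^{d-1}}\int_{\partial B(x_0,\varrho)} f(x)\,d\sigma(x)\Big)\varrho^{d-1}\,d\varrho,$$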
the first equality in (4.5.15) also follows. q.e.d.

We return to the proof of Theorem 4.5.3: Since $u_h$ is harmonic, it satisfies the mean value properties of Lemma 4.5.1. Since the family $u_h$ is bounded in $L^1$,
$$u_h(x_0) = \frac{1}{\omega_d r^d}\int_{B(x_0,r)} u_h(x)\,dx$$
is bounded for fixed $r$ with $B(x_0, r) \subset \Omega_h$. Therefore, the $u_h$ are uniformly bounded in $\Omega_{h_0}$ for $0 < h \le \frac{h_0}{2}$. Furthermore, from (4.5.15)
$$|u_h(x_1) - u_h(x_2)| \le \frac{1}{\omega_d r^d}\int_{(B(x_1,r)\setminus B(x_2,r)) \cup (B(x_2,r)\setminus B(x_1,r))} |u_h(x)|\,dx \le c(r)\,|x_1 - x_2| \tag{4.5.16}$$
for some constant depending on $r$, if $B(x_1, r), B(x_2, r) \subset \Omega_{h_0}$. Therefore, the gradient of $u_h$ is also uniformly bounded on $\Omega_{h_0}$. Likewise, derivatives of $u_h$ of all orders can be uniformly bounded on $\Omega_{h_0}$ ($0 < h \le \frac{h_0}{2}$), either by repeating the same procedure, or by observing that together with $u_h$, also all derivatives of $u_h$ are harmonic so that (4.5.16) can be iteratively applied to all derivatives in order to convert a bound on some derivative into a bound for a higher one. Therefore, a subsequence of $u_h$ converges towards some smooth function $v$, together with all its derivatives, as $h \to 0$. Since all the $u_h$ satisfy $\Delta u_h = 0$, so then does $v$: $\Delta v = 0$ in $\Omega$. Since on the other hand $u_h$ converges to $u$ in $L^1(\Omega)$ by Theorem 3.2.1, the two limits have to coincide (e.g. by Lemma 3.1.3). Therefore $u = v$, and consequently $u$ is smooth and harmonic. q.e.d.

As an application, we consider the following
Example 4.5.1. Let $a : \mathbb{R} \to \mathbb{R}$ be Lipschitz continuous with
$$0 < \lambda \le a(y) \le \Lambda < \infty \quad \text{for all } y \in \mathbb{R}.$$
Let $\Omega \subset \mathbb{R}^d$ be open. We want to minimize
$$F(u) := \int_\Omega \sum_{i=1}^d a(u(x))\,D_i u(x) D_i u(x)\,dx \tag{4.5.17}$$
in the class $A := g + H_0^{1,2}(\Omega)$, with given $g \in H^{1,2}(\Omega)$. By the Picard-Lindelöf theorem, the ordinary differential equation
$$\frac{du}{dv} = \frac{1}{\sqrt{a(u)}} \tag{4.5.18}$$
admits a solution $u(v)$ of class $C^{1,1}$. We then have
$$a(u)\,\frac{du}{dv}\,\frac{du}{dv} = 1. \tag{4.5.19}$$
Since $\frac{du}{dv} \ge \Lambda^{-1/2} > 0$, the inverse function $v(u)$ exists and is of class $C^{1,1}$ as well, and we have by (4.5.19) and a chain rule for Sobolev functions that easily follows from the chain rule for differentiable functions by an approximation argument that
$$\sum_{i=1}^d a(u) D_i u D_i u = \sum_{i=1}^d D_i v D_i v.$$
Therefore, (4.5.17) is transformed into Dirichlet's integral
$$F(u) = D(v).$$
Since the latter admits a smooth minimizer, the original problem (4.5.17) then admits a minimizer that is of class $C^{1,1}$ in $\Omega$.
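The change of variables in Example 4.5.1 can be checked numerically in one dimension. The sketch below rests on choices that are not in the text: the coefficient $a(u) = (2 + \cos u)^2$ (so $1 \le a \le 9$), for which $v(u) = 2u + \sin u$ satisfies $dv/du = \sqrt{a(u)}$, equivalently (4.5.18); the sample function $u(x) = \sin 3x$ on $(0,1)$; and central differences together with a midpoint rule as numerical approximations. All function names are hypothetical.

```python
import math

def a(u):
    # hypothetical coefficient with 1 <= a(u) <= 9
    return (2.0 + math.cos(u)) ** 2

def v_of_u(u):
    # antiderivative of sqrt(a(u)) = 2 + cos(u), hence dv/du = sqrt(a(u))
    return 2.0 * u + math.sin(u)

def u_sample(x):
    # an arbitrary smooth sample function on (0, 1)
    return math.sin(3.0 * x)

def derivative(g, x, h=1e-5):
    # central difference approximation of g'(x)
    return (g(x + h) - g(x - h)) / (2.0 * h)

def integrand_F(x):
    # a(u) * u'^2, the integrand of (4.5.17) for d = 1
    return a(u_sample(x)) * derivative(u_sample, x) ** 2

def integrand_D(x):
    # (d/dx v(u(x)))^2, the integrand of Dirichlet's integral of v
    return derivative(lambda s: v_of_u(u_sample(s)), x) ** 2

def integrate(g, n=2000):
    # midpoint rule on (0, 1)
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) for k in range(n))

print(integrate(integrand_F), integrate(integrand_D))
```

Up to discretization error the two printed values agree, which is the one-dimensional instance of $F(u) = D(v)$.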
Exercises

4.1 Weaken the growth assumption required in (iv) of Theorem 4.5.1. Hint: Use the Sobolev Embedding Theorem.

4.2 Compute the Euler-Lagrange equations for the variational integral
$$A(u) := \int_\Omega \sqrt{1 + |Du(x)|^2}\,dx.$$
($A(u)$ represents the volume of the graph of $u$ over $\Omega$. Critical points are minimal hypersurfaces that can be represented as graphs over $\Omega$.)

4.3 Compute the Euler-Lagrange equations for
$$E(u) := \int_\Omega \sum_{i,j=1}^d g^{ij}(x)\,D_i u(x) D_j u(x)\,\big(\det g_{ij}(x)\big)^{1/2}\,dx,$$
where $(g^{ij}(x))_{i,j=1,\ldots,d}$ is the inverse matrix of $(g_{ij}(x))_{i,j=1,\ldots,d}$. Assume that $(g_{ij}(x))_{i,j=1,\ldots,d}$ is positive definite for all $x \in \Omega$. Show that for given $g \in H^{1,2}(\Omega)$, there exists a unique minimizer of $E$ among all $u \in H^{1,2}(\Omega)$ with $u - g \in H_0^{1,2}(\Omega)$. (Minimizers for $E$ are harmonic functions w.r.t. the metric $g_{ij}(x)$.)
5 Nonconvex functionals. Relaxation
5.1 Nonlower semicontinuous functionals and relaxation

From Section 4.3, we recall the following

Theorem 5.1.1. Let $\Omega \subset \mathbb{R}^d$ be open, $1 < p < \infty$, $f : \Omega \times \mathbb{R}^d \to \overline{\mathbb{R}}$ measurable and suppose:
(i) For almost all $x \in \Omega$, $f(x, \cdot)$ is convex on $\mathbb{R}^d$.
(ii) There exist $a \in L^1(\Omega)$, $b \in \mathbb{R}$ with
$$f(x, v) \ge -a(x) + b|v|^p$$
for almost all $x \in \Omega$ and all $v \in \mathbb{R}^d$.
Then
$$F(u) := \int_\Omega f(x, Du(x))\,dx$$
is lsc and convex on $H^{1,p}(\Omega)$ equipped with its weak topology and assumes its infimum in the class of all $f \in H^{1,p}(\Omega)$ with $f - g \in H_0^{1,p}(\Omega)$ for some given $g \in H^{1,p}(\Omega)$.

Here, (ii) is just a coercivity condition ensuring that a minimizing sequence stays bounded w.r.t. the $H^{1,p}$-norm (w.l.o.g. $F \not\equiv \infty$). (i) implies that $F$ is lsc w.r.t. the norm topology of $H^{1,p}$, and the convexity then implies that $F$ is also lsc w.r.t. the weak $H^{1,p}$ topology. Since bounded sequences in $H^{1,p}$ have weakly convergent subsequences, any minimizing sequence has a convergent subsequence, and a limit of such a subsequence then minimizes $F$ by lower semicontinuity.

Not all functionals that one wishes to consider in the calculus of variations are convex, however. As a motivation for what follows, we consider the following example of Bolza:
$$\Omega = (0,1) \subset \mathbb{R}, \qquad u : (0,1) \to \mathbb{R}, \qquad u(0) = 0 = u(1),$$
$$F(u) = \int_0^1 \big(u^2(x) + (u'(x)^2 - 1)^2\big)\,dx.$$
We claim that
$$\inf\{F(u) : u \in H_0^{1,4}((0,1))\} = 0. \tag{5.1.1}$$
For the proof, we consider 'sawtooth' functions: Let $n \in \mathbb{N}$,
$$u_n(x) := \begin{cases} x - \dfrac{i}{n} & \text{for } \dfrac{2i}{2n} \le x \le \dfrac{2i+1}{2n} \\[2mm] -x + \dfrac{i+1}{n} & \text{for } \dfrac{2i+1}{2n} \le x \le \dfrac{2i+2}{2n} \end{cases} \qquad (i = 0, 1, \ldots, n-1).$$
$u_n$ is contained in $H_0^{1,\infty}((0,1)) \subset H_0^{1,4}((0,1))$ and satisfies:
$$0 \le u_n(x) \le \frac{1}{2n} \quad \text{for all } x \in (0,1), \tag{5.1.2}$$
$$u_n(0) = 0 = u_n(1), \tag{5.1.3}$$
$$|u_n'(x)| = 1 \quad \text{for almost all } x \in (0,1). \tag{5.1.4}$$
Consequently
$$\lim_{n\to\infty} F(u_n) = 0.$$
Since $F(u)$ is nonnegative for every $u$, (5.1.1) follows. The infimum of $F$ therefore cannot be realized by any $H_0^{1,4}$ function, because if we had $F(u) = 0$, then $u(x) = 0$ for almost all $x \in (0,1)$ and $|u'(x)| = 1$ for almost all $x \in (0,1)$, and these two conditions are not compatible. (In fact, since $d = 1$ here, any $u \in H_0^{1,4}((0,1))$ is absolutely continuous, and so $u \equiv 0$ if $u(x) = 0$ a.e., hence $u$ is differentiable and $u' \equiv 0$. More generally, any Sobolev function that is constant on some set $A$ has a representative $u$ whose derivative $Du$ vanishes on $A$.) We have thus shown that the problem
$$F(u) \to \min \quad \text{in } H_0^{1,4}(\Omega)$$
does not have a solution.
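The values $F(u_n)$ of the sawtooth functions can be computed explicitly: since $|u_n'| = 1$ almost everywhere, only $\int_0^1 u_n^2\,dx = 1/(12n^2)$ remains. The following sketch evaluates the Bolza functional numerically; the midpoint rule, the grid size and the function names are assumptions made for this illustration.

```python
def sawtooth(n):
    # u_n from the text: slope +1 on the first half of each cell [i/n, (i+1)/n],
    # slope -1 on the second half
    def u(x):
        t = (x * n) % 1.0          # position inside the current tooth, in [0, 1)
        return (t if t <= 0.5 else 1.0 - t) / n
    def du(x):
        t = (x * n) % 1.0
        return 1.0 if t < 0.5 else -1.0
    return u, du

def bolza(u, du, cells=4096):
    # midpoint rule for F(u) = int_0^1 (u^2 + (u'^2 - 1)^2) dx
    h = 1.0 / cells
    total = 0.0
    for k in range(cells):
        x = (k + 0.5) * h
        total += h * (u(x) ** 2 + (du(x) ** 2 - 1.0) ** 2)
    return total

for n in (1, 2, 4, 8, 16):
    u, du = sawtooth(n)
    print(n, bolza(u, du), 1.0 / (12.0 * n * n))
print("F(0) =", bolza(lambda x: 0.0, lambda x: 0.0))
```

The computed values follow $1/(12n^2)$ and tend to 0, while the function $u \equiv 0$, towards which the $u_n$ converge uniformly, has $F(0) = 1$.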
We observe that our minimizing sequence $(u_n)$ converges to zero weakly in $H_0^{1,4}$, by (5.1.2) and
$$\int_0^1 u_n'(x)\varphi(x)\,dx \to 0 \quad \text{for all test functions } \varphi.$$
However,
$$F(0) = 1 > 0 = \lim_{n\to\infty} F(u_n).$$
Therefore, $F$ is not lsc w.r.t. weak $H^{1,4}$-convergence although the integrand is continuous in $u'$. As we shall see, this results from the lack of convexity of the integrand. We also observe that any sequence of sawtooth functions $u_n$, i.e. satisfying
$$|u_n'| = 1 \quad \text{a.e.},$$
that converges to 0 in $L^2$ is a minimizing sequence for $F$.

Remark 5.1.1. Functionals of the type of our example often arise in optimal control theory as described in Section 5.2 of Part I. For example, one considers problems of the following type
$$\int_0^T f(t, u(t), \alpha(t))\,dt \to \min \tag{5.1.5}$$
under the side conditions
$$u(0) = u_0, \qquad u(T) = u_T, \tag{5.1.6}$$
$$u'(t) = g(t, u(t), \alpha(t)) \tag{5.1.7}$$
with given functions $f$ and $g$. $u$ is called a state variable, $\alpha$ a control variable. This means that one assumes that $u$ describes the state of some system evolving in time $t$ whose derivative or rate of change can be controlled through a parameter $\alpha$. The aim then is to choose $\alpha$ in such a manner that the functional, often considered as 'cost function', is minimized. Thus, one needs to find some equation
$$\alpha(t) = \varphi(t, u(t), u'(t)),$$
and this leads to minimizing functionals of the type
$$\int_0^T f(t, u(t), u'(t))\,dt.$$
Expressions of the type $(u'(t)^2 - 1)^2$ can occur in many technical examples, like boats sailing against the wind.

Faced with a problem that one cannot solve, one may contemplate several options: One could try to modify the problem, or one might generalize the concept of a solution, or both. We shall discuss several such strategies. We first modify the problem via relaxation. This is an important method in the calculus of variations, and we therefore discuss it in some generality.

Definition 5.1.1. Let $X$ be a topological space, $F : X \to \overline{\mathbb{R}}$. We define the lower semicontinuous envelope or relaxed function $\mathrm{sc}^- F$ of $F$ as follows:
$$(\mathrm{sc}^- F)(x) := \sup\{\Phi(x) : \Phi : X \to \overline{\mathbb{R}} \text{ is lower semicontinuous with } \Phi(y) \le F(y) \text{ for all } y \in X\}.$$

Lemma 5.1.1. $\mathrm{sc}^- F$ is the largest lsc function on $X$ that is $\le F$ everywhere. In particular, $F$ is lower semicontinuous if and only if $F = \mathrm{sc}^- F$.

Proof. $\mathrm{sc}^- F$ is lsc as a supremum of lsc functions, see Lemma 4.2.1 (iv). Obviously, $\mathrm{sc}^- F \le F$, and for all lsc $\Phi$ with $\Phi \le F$, we have $\Phi \le \mathrm{sc}^- F$ by definition of $\mathrm{sc}^- F$. q.e.d.

Theorem 5.1.2. Let $X$ be a topological space, $F : X \to \overline{\mathbb{R}}$ a function. Then every accumulation point of a minimizing sequence for $F$ is a minimum point for $\mathrm{sc}^- F$. Consequently, if $F$ is coercive, then $\mathrm{sc}^- F$ assumes its minimum, and
$$\min_X \mathrm{sc}^- F = \inf_X F.$$
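Proof. Let $(x_n)_{n\in\mathbb{N}} \subset X$ be a minimizing sequence for $F$ with accumulation point $x_0$. Then
$$\begin{aligned}
(\mathrm{sc}^- F)(x_0) &\le \liminf_{n\to\infty} (\mathrm{sc}^- F)(x_n) && \text{by lower semicontinuity of } \mathrm{sc}^- F \text{ (see Lemma 5.1.1)} \\
&\le \liminf_{n\to\infty} F(x_n) && \text{since } \mathrm{sc}^- F \le F \\
&= \inf_{y\in X} F(y) && \text{since } (x_n) \text{ is a minimizing sequence for } F.
\end{aligned} \tag{5.1.8}$$
On the other hand, the constant function $\Phi(x) \equiv \inf_{y\in X} F(y)$ is lsc and $\le F$, hence by Lemma 5.1.1 for every $x \in X$
$$\inf_{y\in X} F(y) \le (\mathrm{sc}^- F)(x). \tag{5.1.9}$$
From (5.1.8) and (5.1.9) we conclude
$$(\mathrm{sc}^- F)(x_0) = \inf_{y\in X} F(y) = \min_{x\in X} (\mathrm{sc}^- F)(x). \tag{5.1.10}$$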
This implies the first claim. If $F$ is coercive, then every minimizing sequence has an accumulation point, and the second claim also follows. q.e.d.

What does Theorem 5.1.2 tell us for our example? It simply says that if we cannot minimize our original functional $F$ due to its lack of lower semicontinuity, we then minimize another functional instead, one that is lower semicontinuous and as close as possible to $F$. Theorem 5.1.2 then says that limits (or more generally, accumulation points) of minimizing sequences for $F$ do not minimize $F$, but the relaxed functional $\mathrm{sc}^- F$. Since $\mathrm{sc}^- F$ is the largest lsc functional $\le F$ by Lemma 5.1.1, that is the best one can hope for. It then remains the task to determine the relaxed functional of some given $F$. Before proceeding to do so for our example, let us relax ourselves a little and derive some easy consequences of the definition of the relaxed functional and consider some easier examples first.

Lemma 5.1.2. Let $X$ satisfy the first axiom of countability. Then $\mathrm{sc}^- F$ is the relaxed function for $F : X \to \overline{\mathbb{R}}$ iff the following two conditions are satisfied:
(i) whenever $x_n \to x$,
$$(\mathrm{sc}^- F)(x) \le \liminf_{n\to\infty} F(x_n);$$
(ii) for every $x \in X$, there exists a sequence $x_n \to x$ with
$$(\mathrm{sc}^- F)(x) \ge \lim_{n\to\infty} F(x_n).$$

Proof. We claim that, since $X$ satisfies the first axiom of countability,
$$(\mathrm{sc}^- F)(x) = \inf\{\liminf_{n\to\infty} F(x_n) : x_n \to x \text{ in } X\}. \tag{5.1.11}$$
We denote the right hand side of (5.1.11) by $F^-(x)$. Then $F^-$ is lsc. In order to verify this, we have to check
$$\liminf_{\nu\to\infty}\big(\inf\{\liminf_{n\to\infty} F(y_{\nu,n}) : y_{\nu,n} \to y_\nu\}\big) \ge \inf\{\liminf_{n\to\infty} F(x_n) : x_n \to x\} \tag{5.1.12}$$
whenever $y_\nu \to x$. Indeed, otherwise, for some $\delta > 0$, we would find some diagonal sequence $y_{\nu,n_\nu} \to x$ as $\nu \to \infty$ with
$$F(y_{\nu,n_\nu}) \le \inf\{\liminf_{n\to\infty} F(x_n) : x_n \to x\} - \delta,$$
which is impossible. Thus, $F^-$ is sequentially lsc, hence lsc, because $X$ is assumed to satisfy the first axiom of countability. Also, $F^- \le F$, and for every lsc $\Phi \le F$, we have for $x_n \to x$
$$\Phi(x) \le \liminf_{n\to\infty} \Phi(x_n) \le \liminf_{n\to\infty} F(x_n),$$
and hence
$$\Phi(x) \le F^-(x).$$
Thus, $F^-$ is the largest lsc functional $\le F$, and (5.1.11) follows from Lemma 5.1.1. It is then easy to see (and left as an exercise) that $F^-(x)$ satisfies and is characterized by the properties (i) and (ii). q.e.d.

Example 5.1.1. Let $X$ be a topological space, $A \subset X$ a subset. The indicator function $i_A$ is defined by
$$i_A(x) := \begin{cases} 0 & \text{if } x \in A \\ \infty & \text{if } x \notin A. \end{cases}$$
We then have
$$\mathrm{sc}^- i_A = i_{\bar A},$$
where $\bar A$ is the closure of $A$ in $X$.
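The characteristic function $\chi_A$ is defined by
$$\chi_A(x) := \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A. \end{cases}$$
Then $\mathrm{sc}^-\chi_A = \chi_{\mathring{A}}$, where $\mathring{A}$ is the complement of $\overline{X \setminus A}$.

Example 5.1.2. Let $\Omega \subset \mathbb{R}^d$ be open, $1 < p < \infty$, $I : L^p(\Omega) \to \overline{\mathbb{R}}$ defined by
$$I(u) := \begin{cases} \int_\Omega |Du|^p\,dx + \int_\Omega |u|^p\,dx & \text{if } u \in C^1(\Omega) \\ \infty & \text{otherwise.} \end{cases}$$
(Note that $I(u)$ may also be infinite for some $u \in C^1(\Omega)$.) We claim
$$(\mathrm{sc}^-I)(u) = \begin{cases} \int_\Omega |Du|^p\,dx + \int_\Omega |u|^p\,dx & \text{if } u \in H^{1,p}(\Omega) \\ \infty & \text{otherwise.} \end{cases}$$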
In order to show this, we shall verify the conditions of Lemma 5.1.2:

(i) $\mathrm{sc}^-I$ is lower semicontinuous on $L^p$, which yields condition (i). The lower semicontinuity is seen as follows: Suppose $u_n \to u$ in $L^p(\Omega)$. For the purpose of lower semicontinuity, we may select a subsequence $(w_\nu)_{\nu\in\mathbb{N}} \subset (u_n)_{n\in\mathbb{N}}$ with
$$\lim_{\nu\to\infty} (\mathrm{sc}^-I)(w_\nu) = \liminf_{n\to\infty} (\mathrm{sc}^-I)(u_n),$$
and we may also assume that this limit is finite. $(w_\nu)_{\nu\in\mathbb{N}}$ then is bounded in $H^{1,p}(\Omega)$. A subsequence of $(w_\nu)$ then converges weakly in $H^{1,p}(\Omega)$ (Theorem 3.3.5), and by the Rellich-Kondrachev compactness Theorem 3.4.1, it also converges strongly in $L^p(\Omega)$. The limit has to be $u$, because the original sequence $(u_n)$ was assumed to converge to this limit. Since the $H^{1,p}$-norm is lsc w.r.t. weak $H^{1,p}$ convergence (Lemma 2.2.7), we have
$$(\mathrm{sc}^-I)(u) \le \lim_{\nu\to\infty} (\mathrm{sc}^-I)(w_\nu) = \liminf_{n\to\infty} (\mathrm{sc}^-I)(u_n).$$
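(ii) Let $u \in H^{1,p}(\Omega)$. Since $C^1(\Omega) \cap H^{1,p}(\Omega)$ is dense in $H^{1,p}(\Omega)$, we may find a sequence $(u_n)_{n\in\mathbb{N}} \subset C^1(\Omega) \cap H^{1,p}(\Omega)$ with
$$\lim_{n\to\infty} \Big( \int_\Omega |Du_n|^p + \int_\Omega |u_n|^p \Big) = \int_\Omega |Du|^p + \int_\Omega |u|^p,$$
i.e.
$$\lim_{n\to\infty} I(u_n) = (\mathrm{sc}^-I)(u).$$
If $u \notin H^{1,p}(\Omega)$, then $I(u) = (\mathrm{sc}^-I)(u) = \infty$.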
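This verifies condition (ii).

Example 5.1.3. Similarly, for
$$I_0(u) := \begin{cases} \int_\Omega |Du|^p\,dx & \text{if } u \in C_0^1(\Omega) \\ \infty & \text{if } u \in L^p(\Omega) \setminus C_0^1(\Omega), \end{cases}$$
the relaxed functional is
$$(\mathrm{sc}^-I_0)(u) = \begin{cases} \int_\Omega |Du|^p\,dx & \text{if } u \in H_0^{1,p}(\Omega) \\ \infty & \text{otherwise.} \end{cases}$$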
Remark 5.1.2. We may also define the above functionals $I$, $I_0$ on $L^p_{loc}(\Omega)$ instead of $L^p(\Omega)$. The relaxed functionals will be given by the same formulae.

Remark 5.1.3. For $p = 1$, the relaxations of $I$ and $I_0$ are not given anymore by the $H^{1,1}$-norm, but by the BV-norm which is defined in Chapter 7.

In metric spaces, there is an alternative useful characterization of the relaxation of a given functional which we now want to describe.

Definition 5.1.2. Let $X$ be a metric space with distance function $d(\cdot,\cdot)$, $F : X \to \mathbb{R} \cup \{\infty\}$ be bounded from below, $F \not\equiv \infty$. For $\lambda > 0$, we define the Moreau-Yosida transform of $F$ as
$$F_\lambda(x) := \inf_{y\in X} \big(F(y) + \lambda d(x,y)\big). \tag{5.1.13}$$

Theorem 5.1.3. The functionals $F_\lambda$ satisfy
$$|F_\lambda(x_1) - F_\lambda(x_2)| \le \lambda d(x_1,x_2) \tag{5.1.14}$$
for every $\lambda > 0$, $x_1, x_2 \in X$. In particular, they are Lipschitz continuous. For any $x \in X$
$$(\mathrm{sc}^-F)(x) = \lim_{\lambda\to\infty} F_\lambda(x). \tag{5.1.15}$$
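Proof. For $x_1, x_2, y \in X$, $\lambda > 0$, we obtain from the triangle inequality
$$F(y) + \lambda d(x_1,y) \le F(y) + \lambda d(x_2,y) + \lambda d(x_1,x_2).$$
The definition of $F_\lambda(x_2)$ then implies
$$\inf_{y\in X} \big(F(y) + \lambda d(x_1,y)\big) \le F_\lambda(x_2) + \lambda d(x_1,x_2),$$
hence $F_\lambda(x_1) \le F_\lambda(x_2) + \lambda d(x_1,x_2)$. Interchanging the roles of $x_1$ and $x_2$, we conclude
$$|F_\lambda(x_1) - F_\lambda(x_2)| \le \lambda d(x_1,x_2).$$
Since we have now shown that $F_\lambda$ is Lipschitz continuous, and since $F_\lambda \le F$, we obtain $F_\lambda \le \mathrm{sc}^-F$, hence for all $x \in X$
$$\sup_{\lambda>0} F_\lambda(x) \le (\mathrm{sc}^-F)(x). \tag{5.1.16}$$
For any $\lambda > 0$, we find $x_\lambda \in X$ with
$$F(x_\lambda) + \lambda d(x,x_\lambda) < F_\lambda(x) + \frac{1}{\lambda}.$$
If $\sup_{\lambda>0} F_\lambda(x) = \infty$, then (5.1.15) already follows from (5.1.16). Otherwise, since $F$ is bounded from below,
$$\lim_{\lambda\to\infty} x_\lambda = x$$
and
$$(\mathrm{sc}^-F)(x) \le \liminf_{\lambda\to\infty} F(x_\lambda) \le \lim_{\lambda\to\infty} F_\lambda(x). \tag{5.1.17}$$
Equations (5.1.16) and (5.1.17) imply (5.1.15). q.e.d.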
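To get a feeling for Theorem 5.1.3, one may evaluate the Moreau-Yosida transform numerically. The following is only a rough sketch under assumptions of our own: we take $X = \mathbb{R}$ with $d(x,y) = |x-y|$, the hypothetical example $F(y) = 1$ for $y \le 0$ and $F(y) = 0$ for $y > 0$ (which is not lsc at $0$, with $(\mathrm{sc}^-F)(0) = 0$), and we replace the infimum over $X$ by a minimum over a grid of step $h$. The grid value at $x = 0$ is $\min(1, \lambda h)$ rather than $0$, so $h$ has to be small compared with $1/\lambda$.

```python
def F(y):
    """Hypothetical example: F = 1 on (-inf, 0], F = 0 on (0, inf); not lsc at 0."""
    return 1.0 if y <= 0 else 0.0


def moreau_yosida(x, lam, h=1e-4, radius=1.0):
    """Grid approximation of F_lam(x) = inf_y (F(y) + lam * |x - y|).

    The infimum over the real line is replaced by a minimum over the
    grid points k * h with |k * h| <= radius (an assumption of this sketch).
    """
    n = int(radius / h)
    return min(F(k * h) + lam * abs(x - k * h) for k in range(-n, n + 1))


# F_lam(0) stays close to (sc^- F)(0) = 0, although F(0) = 1,
# while F_lam(-0.5) increases to (sc^- F)(-0.5) = 1 as lam grows.
for lam in (1.0, 10.0, 100.0):
    print(lam, moreau_yosida(0.0, lam), moreau_yosida(-0.5, lam))
```

The Lipschitz estimate (5.1.14) holds for the grid version as well, since the minimum is taken over the same set of points $y$ for every $x$.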
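5.2 Representation of relaxed functionals via convex envelopes

Theorem 5.2.1. Let $\Omega \subset \mathbb{R}^d$ be open, $1 < p < \infty$, $f : \mathbb{R}^d \to \mathbb{R}$ continuous with
$$c_0 |v|^p \le f(v) \le c_1 |v|^p + c_2$$
for some constants $c_0, c_1, c_2$. Let
$$F : \{u \in H^{1,p}(\Omega) : u - u_0 \in H_0^{1,p}(\Omega)\} \to \mathbb{R}$$
be given by
$$F(u) := \int_\Omega f(Du(x))\,dx \qquad (u_0 \in H^{1,p}(\Omega) \text{ given}).$$
Then the relaxed function of $F$ w.r.t. the weak $H^{1,p}$ topology is given by
$$(\mathrm{sc}^-F)(u) = \int_\Omega (\mathrm{cvx}^- f)(Du(x))\,dx,$$
where
$$(\mathrm{cvx}^- f)(v) := \sup\{g(v) : g \le f,\ g \text{ convex}\}$$
is the largest convex function $\le f$.

For the proof, we shall need the following:

Lemma 5.2.1. Let $W = \prod_{i=1}^d (\alpha_i, \beta_i) \subset \mathbb{R}^d$ be an open rectangle, $1 < p < \infty$. We let $f \in L^p(W)$ and extend $f$ periodically to $\mathbb{R}^d$, i.e.
$$f\big(x^1 + m_1(\beta_1 - \alpha_1), \dots, x^d + m_d(\beta_d - \alpha_d)\big) = f(x^1, \dots, x^d)$$
for $m_1, \dots, m_d \in \mathbb{Z}$, $x = (x^1, \dots, x^d) \in W$, and put
$$f_n(x) := f(nx) \quad \text{for } n \in \mathbb{N}.$$
Then we get the weak convergence
$$f_n \rightharpoonup \bar f := \frac{1}{\operatorname{meas} W} \int_W f(x)\,dx \quad \text{in } L^p(W) \text{ for } n \to \infty. \tag{5.2.1}$$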
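The statement of Lemma 5.2.1 can be illustrated numerically in the simplest situation. The following sketch rests on choices of our own that are not taken from the text: $W = (0,1)$, $f$ the 1-periodic step function equal to $1$ on $[0, \frac12)$ and $0$ on $[\frac12, 1)$ (so $\bar f = \frac12$), the test function $g(x) = x^2$, and a midpoint rule for the integrals. The pairings $\int_W f_n g$ should then approach $\bar f \int_W g = \frac16$, although $f_n$ itself keeps oscillating between $0$ and $1$.

```python
import math


def f(x):
    """Hypothetical 1-periodic step function: 1 on [0, 1/2), 0 on [1/2, 1)."""
    return 1.0 if (x - math.floor(x)) < 0.5 else 0.0


def g(x):
    """Hypothetical test function on W = (0, 1)."""
    return x * x


def pairing(n, m=100_000):
    """Midpoint-rule approximation of the integral of f(n x) g(x) over (0, 1)."""
    h = 1.0 / m
    return h * sum(f(n * (k + 0.5) * h) * g((k + 0.5) * h) for k in range(m))


mean_f = 0.5            # (1 / meas W) * integral of f over W
limit = mean_f / 3.0    # mean_f times the integral of g over (0, 1)

for n in (1, 4, 16, 64):
    print(n, pairing(n), limit)
```

In this example the deviation from the limit is of order $1/n$, in line with the estimate on subrectangles used in the proof.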
Proof. First
$$\int_W |f_n(x)|^p\,dx = \int_W |f(nx)|^p\,dx = \frac{1}{n^d} \int_{nW} |f(y)|^p\,dy = \int_W |f(x)|^p\,dx$$
by the periodicity of $f$. Thus
$$\|f_n\|_{L^p(W)} = \|f\|_{L^p(W)}. \tag{5.2.2}$$
In the same manner,
$$\int_W f_n(x)\,dx = \int_W f(x)\,dx = \int_W \bar f\,dx. \tag{5.2.3}$$
Let now $W_0$ be a subrectangle of $W$, written in the form
$$W_0 = \prod_{i=1}^d (a_i + b_i\alpha_i,\ a_i + b_i\beta_i),$$
or more compactly
$$W_0 = a + bW \qquad (a = (a_1, \dots, a_d),\ b = (b_1, \dots, b_d)).$$
Then
$$\begin{aligned}
\int_{W_0} (f_n(x) - \bar f)\,dx &= \int_{a+bW} (f(nx) - \bar f)\,dx \\
&= \frac{1}{n^d} \int_{na+nbW} (f(y) - \bar f)\,dy \\
&= \frac{1}{n^d} \int_{na+[nb]W} (f(y) - \bar f)\,dy + \frac{1}{n^d} \int_{na+(nb-[nb])W} (f(y) - \bar f)\,dy
\end{aligned}$$
by periodicity of $f$. The first term on the right-hand side vanishes by (5.2.3), and thus, again using the periodicity of $f$,
$$\Big| \int_{W_0} (f_n(x) - \bar f)\,dx \Big| \le \frac{1}{n^d} \int_W |f(y) - \bar f|\,dy.$$
Letting $n \to \infty$, we obtain for every subrectangle $W_0$ of $W$
$$\lim_{n\to\infty} \int_{W_0} (f_n(x) - \bar f)\,dx = 0. \tag{5.2.4}$$
Let now $g \in L^q(W)$, with $\frac1p + \frac1q = 1$. We have to show
$$\lim_{n\to\infty} \int_W f_n(x) g(x)\,dx = \int_W \bar f g(x)\,dx. \tag{5.2.5}$$
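Given $\varepsilon > 0$, we then find subrectangles $W_1, \dots, W_k$ ($k = k(\varepsilon)$) and $\lambda_i \in \mathbb{R}$ ($i = 1, \dots, k$) with
$$\Big\| g - \sum_i \lambda_i \chi_{W_i} \Big\|_{L^q(W)} < \varepsilon. \tag{5.2.6}$$
(The possibility of approximating $L^q(\Omega)$ functions $g$ ($\Omega$ open in $\mathbb{R}^d$) in such a manner by step functions can easily be seen as follows: Since $C_0^\infty(\Omega)$ is dense in $L^q(\Omega)$, there exists $\varphi_\varepsilon \in C_0^\infty(\Omega)$ with $\|g - \varphi_\varepsilon\|_{L^q(\Omega)} < \frac{\varepsilon}{2}$. It is then easy to construct a step function $\sum \lambda_i \chi_{W_i}$ ($\lambda_i \in \mathbb{R}$, $W_i$ disjoint rectangles contained in $\operatorname{supp} \varphi_\varepsilon$) with
$$\sup_{\operatorname{supp}\varphi_\varepsilon} \Big| \varphi_\varepsilon - \sum_i \lambda_i \chi_{W_i} \Big| < \frac{\varepsilon}{2\,(\operatorname{meas} \operatorname{supp} \varphi_\varepsilon)^{1/q}}.$$
Then indeed
$$\Big\| g - \sum_i \lambda_i \chi_{W_i} \Big\|_{L^q(\Omega)} < \varepsilon.)$$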
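Then
$$\begin{aligned}
\Big| \int_W (f_n(x) - \bar f)\,g(x)\,dx \Big| &\le \Big| \int_W (f_n(x) - \bar f) \sum_i \lambda_i \chi_{W_i}(x)\,dx \Big| + \Big| \int_W (f_n(x) - \bar f)\Big(g(x) - \sum_i \lambda_i \chi_{W_i}(x)\Big)\,dx \Big| \\
&\le \sum_{i=1}^k |\lambda_i| \Big| \int_{W_i} (f_n(x) - \bar f)\,dx \Big| + \varepsilon\, \|f_n - \bar f\|_{L^p(W)}
\end{aligned}$$
by (5.2.6) and Hölder's inequality (Lemma 3.1.1). The first term tends to zero as $n \to \infty$ by (5.2.4), whereas the second one is bounded by $2\varepsilon \|f\|_{L^p(W)}$ by (5.2.2) and can hence be made arbitrarily small. Therefore, (5.2.5) holds. q.e.d.

The proof of Theorem 5.2.1 will be broken up into several steps:

(1) We put
$$(q^-f)(v) := \inf\Big\{ \frac{1}{\operatorname{meas} U} \int_U f(v + D\varphi(x))\,dx : \varphi \in H_0^{1,p}(U),\ U \text{ bounded domain in } \mathbb{R}^d \Big\}, \tag{5.2.7}$$
and we claim:

Lemma 5.2.2.
$$(\mathrm{sc}^-F)(u) \le \int_\Omega (q^-f)(Du(x))\,dx.$$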
Proof. Replacing $F(u)$ by $G(v) := F(v + u_0)$ for $v = u - u_0$, we may assume $u_0 = 0$, i.e. $u \in H_0^{1,p}(\Omega)$. Since the piecewise affine functions, i.e. those $u$ for which $Du$ is constant on disjoint rectangles $W_i \subset \Omega$, with $\Omega \setminus \bigcup W_i$ arbitrarily small, are dense in $H^{1,p}$ (for the same reason that the functions that are piecewise constant on disjoint rectangles $W_i$ are dense in $L^p$), and since $F$ is continuous under strong $H^{1,p}$-convergence, it suffices to treat the case where
$$Du = v_0 = \text{constant on some rectangle } W.$$
We next observe that for a given constant vector $v$, $(q^-f)(v)$ is independent of the choice of $U$ in (5.2.7). First, the value of the inf on the right-hand side of (5.2.7) does not change under translations or homotheties of $U$. The general case of $U_1$ and $U_2$ then is handled by approximating $U_1$ by disjoint homothetical translations of $U_2$ and vice versa. We may therefore take $U = W$ in (5.2.7). We now choose a sequence $(\varphi_n)_{n\in\mathbb{N}} \subset H_0^{1,p}(W)$ with
$$\frac{1}{\operatorname{meas} W} \int_W f(v_0 + D\varphi_n(x))\,dx \to (q^-f)(v_0). \tag{5.2.8}$$
We extend $\varphi_n$ periodically from $W$ to $\mathbb{R}^d$ and put $u(x) := v_0 x$ (then $Du = v_0$) and
$$u_n(x) := u(x) + \frac{1}{n} \varphi_n(nx).$$
By Lemma 5.2.1, $u_n$ converges to $u$ weakly in $H^{1,p}$. Then $u_n = u$ on $\partial W$ by periodicity of $\varphi_n$ and $\varphi_n|_{\partial W} = 0$. We have
$$\int_W f(Du_n(x))\,dx = \int_W f\Big(v_0 + \frac{1}{n} D(\varphi_n(nx))\Big)\,dx = \frac{1}{n^d} \int_{nW} f(v_0 + D\varphi_n(y))\,dy = \int_W f(v_0 + D\varphi_n(x))\,dx \tag{5.2.9}$$
since $\varphi_n$ is periodic. Equations (5.2.8) and (5.2.9) imply
$$\lim_{n\to\infty} F(u_n) = \lim_{n\to\infty} \int_W f(Du_n(x))\,dx = \int_W (q^-f)(Du(x))\,dx.$$
q.e.d.
(2) We observe
$$(q^-f)(v) \le f(v) \qquad (\text{put } \varphi = 0 \text{ in (5.2.7)}). \tag{5.2.10}$$
With
$$(q^-F)(u) := \int_\Omega (q^-f)(Du(x))\,dx,$$
we obtain from Lemma 5.2.2 and (5.2.10)
$$\mathrm{sc}^-F = \mathrm{sc}^-(q^-F), \tag{5.2.11}$$
and upon iteration
$$\mathrm{sc}^-F = \mathrm{sc}^-((q^-)^nF), \tag{5.2.12}$$
where $(q^-)^n$ means performing the construction $q^-$ iteratively $n$ times. From the growth conditions on $f$ assumed in Theorem 5.2.1, we conclude that $((q^-)^nf)(v)$ is monotonically decreasing and bounded from below in $n$, hence converges to some limit $(Qf)(v)$. From B. Levi's Theorem 1.2.1, we conclude
$$\lim_{n\to\infty} ((q^-)^nF)(u) = \lim_{n\to\infty} \int_\Omega ((q^-)^nf)(Du(x))\,dx = \int_\Omega (Qf)(Du(x))\,dx =: (QF)(u). \tag{5.2.13}$$
Since by (5.2.12)
$$(\mathrm{sc}^-F)(u) \le ((q^-)^nF)(u) \quad \text{for all } n,$$
we conclude from (5.2.13)
$$(\mathrm{sc}^-F)(u) \le (QF)(u).$$
From the definition of $Qf$, we also conclude
$$Qf(v) = \inf\Big\{ \frac{1}{\operatorname{meas} U} \int_U (Qf)(v + D\varphi(x))\,dx : \varphi \in H_0^{1,p}(U),\ U \subset \mathbb{R}^d \text{ open, bounded} \Big\}. \tag{5.2.14}$$
As before, this expression is independent of the choice of $U$.
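Definition 5.2.1. $g : \mathbb{R}^d \to \mathbb{R}$ is called quasiconvex if for all $v \in \mathbb{R}^d$, all open and bounded $U \subset \mathbb{R}^d$ and all $\varphi \in H_0^{1,p}(U)$,
$$g(v) \le \frac{1}{\operatorname{meas} U} \int_U g(v + D\varphi(x))\,dx. \tag{5.2.15}$$
Equation (5.2.14) then implies that $Qf$ is quasiconvex.

(3) Lemma 5.2.3. $f : \mathbb{R}^d \to \mathbb{R}$ is convex if and only if it is quasiconvex.

Proof. '$\Rightarrow$': Jensen's inequality says that if $f$ is convex, for every $\psi \in L^1(U, \mathbb{R}^d)$
$$f\Big( \frac{1}{\operatorname{meas} U} \int_U \psi(x)\,dx \Big) \le \frac{1}{\operatorname{meas} U} \int_U f(\psi(x))\,dx \tag{5.2.16}$$
(see Theorem 1.1.6). Since, as observed above, in Definition 5.2.1 it suffices to consider one fixed domain $U$, we may assume $\operatorname{meas} U = 1$ and put $\psi(x) = v + D\varphi(x)$. Since $\varphi \in H_0^{1,p}(U)$,
$$\int_U \psi(x)\,dx = v \operatorname{meas} U = v,$$
and (5.2.16) therefore implies that $f$ is quasiconvex.

'$\Leftarrow$': We assume that $f$ is quasiconvex, i.e.
$$f(v_0) = \frac{1}{\operatorname{meas} U} \int_U f(v_0)\,dx \le \frac{1}{\operatorname{meas} U} \int_U f(v_0 + D\varphi(x))\,dx \tag{5.2.17}$$
for all $\varphi \in H_0^{1,p}(U)$. We have to show that for all $v_1, v_2 \in \mathbb{R}^d$, $0 \le t \le 1$
$$f(tv_1 + (1-t)v_2) \le t f(v_1) + (1-t) f(v_2). \tag{5.2.18}$$
Equation (5.2.17) implies
$$f(tv_1 + (1-t)v_2) \le \frac{1}{\operatorname{meas} U} \int_U f(tv_1 + (1-t)v_2 + D\varphi(y))\,dy \tag{5.2.19}$$
for all $U$ and all $\varphi \in H_0^{1,p}(U)$. After a rotation, we may assume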
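that $v_1 - v_2$ is a positive multiple of the first basis vector $e_1$ of our standard basis of $\mathbb{R}^d$, i.e. $v_1 - v_2 = |v_1 - v_2|\,e_1$. Let $W = (a,b)^d$ be a cube. We shall construct functions $\varphi_n \in H_0^{1,p}(W)$ with
$$D\varphi_n(x) = \begin{cases} (1-t)(v_1 - v_2) & \text{on a set } W_1^n \subset W \text{ with } \operatorname{meas} W_1^n = t(b-a)\big(b-a-\tfrac{2}{n}\big)^{d-1} \\ -t(v_1 - v_2) & \text{on a set } W_2^n \subset W \text{ with } \operatorname{meas} W_2^n = (1-t)(b-a)\big(b-a-\tfrac{2}{n}\big)^{d-1} \end{cases}$$
and $\|D\varphi_n\|_{L^\infty(W)} \le c_0$ for some fixed constant $c_0$ that does not depend on $n$. Using these $\varphi_n$ in (5.2.19) yields
$$f(tv_1 + (1-t)v_2) \le t f(v_1) + (1-t) f(v_2) + \rho_n$$
with $\rho_n \to 0$ as $n \to \infty$, hence (5.2.18). It remains to construct $\varphi_n$. We divide the interval $(a,b)$ into $2n$ subintervals as follows:
$$\begin{aligned}
I_1 &= \Big(a,\ a + \frac{t}{n}(b-a)\Big) \\
I_2 &= \Big(a + \frac{t}{n}(b-a),\ a + \frac{1}{n}(b-a)\Big) \\
I_3 &= \Big(a + \frac{1}{n}(b-a),\ a + \frac{1}{n}(b-a) + \frac{t}{n}(b-a)\Big) \\
&\ \ \vdots
\end{aligned}$$
i.e. the intervals $I_{2\nu-1}$ have length $\frac{t}{n}(b-a)$, and they alternate with the intervals $I_{2\nu}$ of length $\frac{1-t}{n}(b-a)$. We then put
$$W_1^n := \Big( \bigcup_{\nu=1}^n I_{2\nu-1} \Big) \times \Big(a + \frac{1}{n},\ b - \frac{1}{n}\Big)^{d-1}, \qquad W_2^n := \Big( \bigcup_{\nu=1}^n I_{2\nu} \Big) \times \Big(a + \frac{1}{n},\ b - \frac{1}{n}\Big)^{d-1}.$$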
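We then put $\varphi_n(a, x^2, \dots, x^d) = 0$,
$$\frac{\partial \varphi_n(x)}{\partial x^1} = \begin{cases} (1-t)|v_1 - v_2| & \text{for } x \in W_1^n \\ -t|v_1 - v_2| & \text{for } x \in W_2^n, \end{cases} \qquad \frac{\partial \varphi_n(x)}{\partial x^i} = 0 \ \text{ for } i = 2, \dots, d.$$
(Remember that we assume that $v_1 - v_2$ points in the positive $x^1$-direction.) We then have $\varphi_n(b, x^2, \dots, x^d) = 0$. We also put $\varphi_n = 0$ on $\partial W$, and on $W \setminus (W_1^n \cup W_2^n)$ we choose an interpolation that is affine linear in $x^2, \dots, x^d$. Since $\sup_{x \in W_1^n \cup W_2^n} |\varphi_n(x)| \le$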
(6 - a)\vi - v2\ =:
02 nn
§
2n
we get
sup a:€VV , \(W 1 n UW a n )
. I < n—.
t=2,...,d
Gte*
^n
Thus, for large enough n, sup \V
™ sup
I*
. ax 1
< |vi - t/2| =: c 0 .
This completes the construction of $\varphi_n$ and the proof of Lemma 5.2.3. q.e.d.

(4) We may now complete the proof of Theorem 5.2.1. From (2), we know
$$(\mathrm{sc}^-F)(u) \le QF(u) = \int_\Omega Qf(Du(x))\,dx.$$
By Lemma 5.2.3, $Qf$ is convex. By Lemma 4.3.1, $QF(u)$ therefore is lsc w.r.t. weak $H^{1,p}$ convergence. Since $QF \le F$ (see (5.2.10) and the definition of $QF$), we must also have from the definition of $\mathrm{sc}^-F$ that
$$QF(u) \le (\mathrm{sc}^-F)(u).$$
Hence equality. Thus
$$(\mathrm{sc}^-F)(u) = \int_\Omega (Qf)(Du(x))\,dx.$$
Moreover, for every convex function $g \le f$,
$$G(u) := \int_\Omega g(Du(x))\,dx$$
is a weakly $H^{1,p}$ lsc functional $\le F$. Therefore, from the definition of $\mathrm{sc}^-F$, the convex function $Qf$ must in fact be the largest convex function $\le f$. This completes the proof. q.e.d.

Corollary 5.2.1. $F$ as in Theorem 5.2.1 is weakly lower semicontinuous in $H^{1,p}$ if and only if $f$ is convex.

Proof. Lemma 4.3.1 says that convex functionals are weakly lower semicontinuous. If $f$ is not convex, then by Theorem 5.2.1 $\mathrm{sc}^-F \ne F$, hence $F$ is not weakly lsc by Lemma 5.1.1. q.e.d.

Remark 5.2.1. One may also consider variational problems for vector valued functions $u : \Omega \subset \mathbb{R}^d \to \mathbb{R}^n$,
$$F(u) := \int_\Omega f(Du(x))\,dx.$$
Again, $f$ is called quasiconvex if for all open and bounded $U \subset \mathbb{R}^d$ and all $\varphi \in H_0^{1,p}(U;\mathbb{R}^n)$, $v \in \mathbb{R}^{nd}$
$$f(v) \le \frac{1}{\operatorname{meas} U} \int_U f(v + D\varphi(x))\,dx.$$
In this case, however, while convex functions are still quasiconvex, the converse is no longer true. Theorem 5.2.1 continues to hold but with convexity replaced by quasiconvexity. Also, one may consider more general problems of the form
$$F(u) = \int_\Omega f(x, u(x), Du(x))\,dx$$
with similar results and conceptually similar, but technically more involved proofs.

Remark 5.2.2. The notion of quasiconvexity and many of the basic corresponding lower semicontinuity results are due to C. Morrey. In fact, the quasiconvex functionals are precisely the weakly lower semicontinuous ones. For detailed references to the work of Morrey and other researchers, see the book of Dacorogna quoted at the end of this chapter.

Remark 5.2.3. Theorem 5.2.1 can be considered as a representation theorem for relaxed functionals. In particular, it says that a functional on $H^{1,p}$ obtained by integrating an integrand $f(Du(x))$ (with certain technical assumptions on $f$) has a relaxed functional of the same type, i.e. again representable by integration w.r.t. some integrand $g(Du(x))$ of the same type. Furthermore, $g$ may be computed explicitly from $f$.

We now return to our initial example
$$F(u) = \int_0^1 \big\{ u^2(x) + (u'(x)^2 - 1)^2 \big\}\,dx$$
for $u \in H_0^{1,4}((0,1))$. $F(u)$ is the sum of a functional which is continuous w.r.t. strong $L^2$-convergence, hence also w.r.t. weak $H^{1,4}$ convergence, and another one to which Theorem 5.2.1 applies. We conclude that
$$(\mathrm{sc}^-F)(u) = \int_0^1 \big\{ u^2(x) + Q(u'(x)) \big\}\,dx,$$
with
$$Q(v) = \begin{cases} (v^2 - 1)^2 & \text{if } |v| \ge 1 \\ 0 & \text{otherwise,} \end{cases}$$
the largest convex function $\le (v^2 - 1)^2$.
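The formula for $Q$ can be checked numerically. The sketch below is only an illustration under assumptions of our own: we sample $f(v) = (v^2-1)^2$ on a grid in $[-2,2]$, compute the lower convex hull of the sampled graph by a standard monotone chain procedure, and compare the resulting piecewise linear function with $Q$. The helper names are hypothetical, and the hull of finitely many samples only approximates the convex envelope on the sampled interval.

```python
def f(v):
    """The double-well integrand of the example."""
    return (v * v - 1.0) ** 2


def Q(v):
    """The claimed largest convex function below f."""
    return f(v) if abs(v) >= 1.0 else 0.0


def lower_convex_hull(points):
    """Lower convex hull (monotone chain) of points sorted by first coordinate."""
    hull = []
    for p in points:
        # Drop the last vertex while it lies on or above the chord hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def envelope(v, hull):
    """Evaluate the piecewise linear function through the hull vertices at v."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= v <= x2:
            t = (v - x1) / (x2 - x1)
            return (1.0 - t) * y1 + t * y2
    raise ValueError("v lies outside the sampled interval")


grid = [(k - 200) / 100.0 for k in range(401)]   # sample points in [-2, 2]
hull = lower_convex_hull([(v, f(v)) for v in grid])

for v in (-1.5, -0.5, 0.0, 0.5, 1.5):
    print(v, f(v), envelope(v, hull), Q(v))
```

On the interval $(-1,1)$ the hull consists of the single segment joining the two wells $(\pm 1, 0)$, which is exactly the flat part of $Q$.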
References

For the definition of relaxation and its general properties:
G. dal Maso, An Introduction to Γ-Convergence, Birkhäuser, Boston, 1993, pp. 28-37.
G. Buttazzo, Semicontinuity, Relaxation and Integral Representation in the Calculus of Variations, Pitman Research Notes in Math. 207, Longman Scientific, Harlow, Essex, 1989, pp. 7-28.

For Theorem 5.2.1 and generalizations thereof:
B. Dacorogna, Direct Methods in the Calculus of Variations, Springer, Berlin, 1989, pp. 197-249.
Nonconvex Junctionals. Relaxation Exercises Determine sc~F and discuss the relaxation for F(u) = /
(1 ~ u'{x)fu{x)2dx
with u(-l) F(u)=
f
for u G H1A
= 0,1/(1) = 1,
(2x-uf(x))2u(x)2dx
for
ueH1'4
w i t h i t ( - l ) = 0,ii(l) = 1, and F(u) = f
((u(x)2 - a ) + (^(x) 2 - 1)) cte for u G #
M
with a G R. Determine 5C~7 for J : L p (fi) -+ ft (11 G R d open and bounded),
J(tt):= f/ n |Dtirdx + / n HA(b ifuecHn) I oo otherwise. Why does the proof of Lemma 5.3.3 not work for vector-valued mappings Rd -+ E n with n > 1, i.e. # : R d n -+ R, v G R d n , (p G £To ,P (K,R n ) as in Remark 5.2.1?
6 Γ-convergence

6.1 The definition of Γ-convergence

In this chapter, we treat the important concept of Γ-convergence, introduced and developed by de Giorgi and his school.

Definition 6.1.1. Let $X$ be a topological space satisfying the first axiom of countability, $F_n : X \to \overline{\mathbb{R}}$ functions ($n \in \mathbb{N}$). We say that $F_n$ Γ-converges to $F$,
$$F = \Gamma\text{-}\lim_{n\to\infty} F_n,$$
if
(i) for every sequence $(x_n)_{n\in\mathbb{N}}$ converging to some $x \in X$,
$$F(x) \le \liminf_{n\to\infty} F_n(x_n),$$
and
(ii) for every $x \in X$, there exists a sequence $x_n$ converging to $x$ with
$$F(x) = \lim_{n\to\infty} F_n(x_n).$$
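Example 6.1.1. $F_n : \mathbb{R} \to \mathbb{R}$,
$$F_n(x) := \begin{cases} 1 & \text{for } x \ge \frac{1}{n} \\ nx & \text{for } -\frac{1}{n} \le x \le \frac{1}{n} \\ -1 & \text{for } x \le -\frac{1}{n}. \end{cases}$$
Then
$$(\Gamma\text{-}\lim F_n)(x) = \begin{cases} 1 & \text{for } x > 0 \\ -1 & \text{for } x \le 0, \end{cases}$$
while the pointwise limit is $0$ for $x = 0$, $1$ ($-1$) for $x > 0$ ($< 0$).

Example 6.1.2. $F_n : \mathbb{R} \to \mathbb{R}$,
$$F_n(x) := \begin{cases} nx & \text{for } 0 \le x \le \frac{1}{n} \\ 2 - nx & \text{for } \frac{1}{n} \le x \le \frac{2}{n} \\ 0 & \text{otherwise.} \end{cases}$$
Then
$$(\Gamma\text{-}\lim F_n)(x) = 0,$$
which is the same as the pointwise limit.

Example 6.1.3. $F_n : \mathbb{R} \to \mathbb{R}$,
$$F_n(x) := \begin{cases} -nx & \text{for } 0 \le x \le \frac{1}{n} \\ nx - 2 & \text{for } \frac{1}{n} \le x \le \frac{2}{n} \\ 0 & \text{otherwise.} \end{cases}$$
Then
$$(\Gamma\text{-}\lim F_n)(x) = \begin{cases} -1 & \text{for } x = 0 \\ 0 & \text{otherwise,} \end{cases}$$
whereas the pointwise limit is again identically $0$. Note that the $F_n$ of 6.1.3 is the negative of the $F_n$ of 6.1.2. Thus, in general
$$\Gamma\text{-}\lim F_n \ne -\Gamma\text{-}\lim(-F_n).$$

Example 6.1.4. $F_n : \mathbb{R} \to \mathbb{R}$,
$$F_n(x) := \begin{cases} -nx & \text{for } 0 \le x \le \frac{1}{n} \\ nx - 2 & \text{for } \frac{1}{n} \le x \le \frac{2}{n} \\ 0 & \text{otherwise} \end{cases} \quad \text{for odd } n, \qquad F_n(x) := 0 \quad \text{for even } n.$$
$F_n$ then converges pointwise to $0$, but does not Γ-converge at $x = 0$.

Example 6.1.5. $F_n : \mathbb{R} \to \mathbb{R}$,
$$F_n(x) = \sin nx.$$
Then
$$(\Gamma\text{-}\lim F_n)(x) = -1,$$
whereas $F_n$ does not converge pointwise.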
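For Example 6.1.5, a recovery sequence as required in condition (ii) of Definition 6.1.1 can be written down explicitly; condition (i) is trivial here because $\sin \ge -1$. The following sketch (the function name and the sample point $x = 0.3$ are our own choices) picks, for given $x$ and $n$, the point nearest to $x$ at which $\sin(n\,\cdot)$ attains the value $-1$; its distance from $x$ is at most $\pi/n$.

```python
import math


def recovery_point(x, n):
    """Point nearest to x at which sin(n * .) attains its minimum value -1.

    The minima of sin(n y) are at y = (2 pi k - pi / 2) / n with integer k.
    """
    k = round((n * x + math.pi / 2) / (2 * math.pi))
    return (2 * math.pi * k - math.pi / 2) / n


x = 0.3  # hypothetical sample point
for n in (1, 10, 100, 1000):
    xn = recovery_point(x, n)
    # distance to x, value along the recovery sequence, value at x itself
    print(n, abs(xn - x), math.sin(n * xn), math.sin(n * x))
```

The last column keeps oscillating, in accordance with the lack of pointwise convergence, while the values along the recovery sequence are constantly $-1$.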
From Examples 6.1.4 and 6.1.5, we see that among the two notions of pointwise convergence and Γ-convergence, neither one implies the other.

Example 6.1.6. $F_n : X \to \overline{\mathbb{R}}$ converges continuously to $F : X \to \overline{\mathbb{R}}$ if for every $x \in X$ and every neighbourhood $V$ of $F(x)$ in $\overline{\mathbb{R}}$ (i.e. $V = \{y \in \mathbb{R} : |F(x) - y| < \varepsilon\}$ for some $\varepsilon > 0$ in case $F(x) \in \mathbb{R}$, $V = \{y \in \mathbb{R} : y > K\} \cup \{\infty\}$ for some $K \in \mathbb{R}$ in case $F(x) = \infty$, and analogously for $F(x) = -\infty$), there exist $n_0 \in \mathbb{N}$ and a neighbourhood $U$ of $x$ with
$$F_n(y) \in V$$
for all $n \ge n_0$, $y \in U$. $F_n$ converges continuously if and only if both $F_n$ and $-F_n$ Γ-converge to $F$ and $-F$, respectively. Continuous convergence implies pointwise convergence, and we conclude from Examples 6.1.2 and 6.1.3 that Γ-convergence is weaker than continuous convergence.

Example 6.1.7. Let $X$ satisfy the first axiom of countability, $F_n = F : X \to \overline{\mathbb{R}}$ a constant sequence. Then
$$\Gamma\text{-}\lim F_n = \mathrm{sc}^-F$$
is the relaxed function of $F$. Thus, we have the remarkable phenomenon that a constant sequence may converge to a limit different from the constant sequence element.

Remark 6.1.1. Without changing the content of the definition of Γ-convergence, condition (ii) may be replaced by the following condition which is weaker and therefore easier to verify:
(ii') for every $x \in X$, there exists a sequence $x_n$ converging to $x$ with
$$\limsup_{n\to\infty} F_n(x_n) \le F(x).$$

The following result is useful in approximation arguments:

Lemma 6.1.1. Let $X$ satisfy the first axiom of countability. Suppose $(x_m)_{m\in\mathbb{N}}$ converges to $x$ in $X$, and
$$\limsup_{m\to\infty} F(x_m) \le F(x).$$
Suppose that (ii') is satisfied for every $x_m$ (i.e. for every $m$, there exists a sequence $(x_{m,n})_{n\in\mathbb{N}}$ converging to $x_m$ with
$$\limsup_{n\to\infty} F_n(x_{m,n}) \le F(x_m)).$$
Then (ii') also holds for $x$.

Proof. Since $X$ satisfies the first axiom of countability, we may take a neighbourhood system $(U_\nu)_{\nu\in\mathbb{N}}$ of $x$ and renumber it and take intersections so that
$$x_m \in U_m \quad \text{for all } m \in \mathbb{N},$$
and that every sequence $(y_\nu)_{\nu\in\mathbb{N}}$ with $y_\nu \in U_{\mu(\nu)}$ for all $\nu$ and some sequence $\mu(\nu) \to \infty$ as $\nu \to \infty$ converges to $x$. For $n \in \mathbb{N}$, we let
$$m_n := \max\Big\{ m \in \mathbb{N} : x_{m,n} \in U_m,\ F_n(x_{m,n}) \le \frac{1}{m} + F(x_m) \Big\}.$$
Then
$$\lim_{n\to\infty} m_n = \infty.$$
Namely, otherwise, we would find $k_0 \in \mathbb{N}$ with
$$F_{n_\nu}(x_{k,n_\nu}) > \frac{1}{k} + F(x_k) \quad \text{or} \quad x_{k,n_\nu} \notin U_k$$
for all $k \ge k_0$ and some sequence $n_\nu \to \infty$ for $\nu \to \infty$. To see that this is impossible we simply observe that since $x_{k_0} \in U_{k_0}$ and since $x_{k_0,n}$ converges to $x_{k_0}$ as $n \to \infty$ we have
$$x_{k_0,n} \in U_{k_0} \quad \text{for all sufficiently large } n,$$
and likewise since we assume
$$\limsup_{n\to\infty} F_n(x_{k_0,n}) \le F(x_{k_0}),$$
we have
$$F_n(x_{k_0,n}) \le F(x_{k_0}) + \frac{1}{k_0} \quad \text{for all sufficiently large } n.$$
We then have
$$F_n(x_{m_n,n}) \le F(x_{m_n}) + \frac{1}{m_n}.$$
Therefore $y_n := x_{m_n,n}$ converges to $x$ as $n \to \infty$, and
$$\limsup_{n\to\infty} F_n(y_n) \le \limsup_{n\to\infty} \Big( F(x_{m_n}) + \frac{1}{m_n} \Big) \le F(x)$$
by assumption and since $m_n \to \infty$ as $n \to \infty$. Thus, $(y_n)_{n\in\mathbb{N}}$ is the desired sequence. q.e.d.

Let $F : X \to \mathbb{R} \cup \{\infty\}$ satisfy $\inf_{y\in X} F(y) > -\infty$. Given $\varepsilon > 0$, we say that $x \in X$ is an $\varepsilon$-minimizer of $F$ if
$$F(x) \le \inf_{y\in X} F(y) + \varepsilon.$$
Note that $x$ is a minimizer of $F$ if it is an $\varepsilon$-minimizer for every $\varepsilon > 0$. In contrast to minimizers, $\varepsilon$-minimizers always exist for any $\varepsilon > 0$. The following result is a trivial consequence of the definition of Γ-convergence, but quite important.

Theorem 6.1.1. (Let $X$ satisfy the first axiom of countability.) Let the sequence of functions $F_n : X \to \overline{\mathbb{R}}$ Γ-converge to $F : X \to \overline{\mathbb{R}}$. Let $\inf_{y\in X} F_n(y) > -\infty$ for every $n \in \mathbb{N}$. Let $x_n$ be an $\varepsilon_n$-minimizer for $F_n$. Assume $\varepsilon_n \to 0$ and $x_n \to x$ for some $x \in X$. Then $x$ is a minimizer for $F$, and
$$F(x) = \lim_{n\to\infty} F_n(x_n). \tag{6.1.1}$$

Proof. If $x$ were not a minimizer for $F$, there would exist $x' \in X$ with
$$F(x') < F(x). \tag{6.1.2}$$
Since $F_n$ Γ-converges to $F$, there exists a sequence $(x_n') \subset X$ with
$$\lim x_n' = x', \qquad \lim F_n(x_n') = F(x').$$
We put $\delta := \frac{1}{3}(F(x) - F(x'))$. We may choose $n$ so large that
$$\varepsilon_n < \delta, \tag{6.1.3}$$
$$F_n(x_n') < F(x') + \delta, \tag{6.1.4}$$
$$F_n(x_n) > F(x) - \delta \quad \text{(by property (i) of Definition 6.1.1).} \tag{6.1.5}$$
Since $x_n$ is an $\varepsilon_n$-minimizer of $F_n$,
$$\begin{aligned}
F_n(x_n') &\ge F_n(x_n) - \varepsilon_n \\
&> F_n(x_n) - \delta && \text{by (6.1.3)} \\
&> F(x) - 2\delta && \text{by (6.1.5).}
\end{aligned} \tag{6.1.6}$$
From (6.1.4) and (6.1.6), we get
$$F(x) < F(x') + 3\delta,$$
contradicting (6.1.2) by definition of $\delta$. Thus, $x$ is a minimizer for $F$. If (6.1.1) did not hold, then after selection of a subsequence,
$$F(x) < \lim F_n(x_n),$$
whereas by property (ii) of Definition 6.1.1, there would exist a sequence $(x_n')$ converging to $x$ with
$$F(x) = \lim F_n(x_n'),$$
and we would again contradict the $\varepsilon_n$-minimizing property of $x_n$. q.e.d.

Corollary 6.1.1. (Let $X$ satisfy the first axiom of countability.) Let $F_n : X \to \overline{\mathbb{R}}$ Γ-converge to $F : X \to \overline{\mathbb{R}}$. Let $x_n$ be a minimizer for $F_n$. If $x_n \to x$, then $x$ minimizes $F$, and
$$F(x) = \liminf F_n(x_n).$$
The following result is similarly both trivial and important.

Theorem 6.1.2. (Let $X$ satisfy the first axiom of countability.) Let $F_n$ Γ-converge to $F$. Then $F$ is lower semicontinuous.

Proof. Otherwise, there exist some $x \in X$ and some sequence $(x_m)_{m\in\mathbb{N}} \subset X$ with
$$\lim_{m\to\infty} x_m = x, \qquad \lim_{m\to\infty} F(x_m) < F(x). \tag{6.1.7}$$
By Γ-convergence, for every $m$, there exists a sequence $(x_{m,n})_{n\in\mathbb{N}} \subset X$ with
$$\lim_{n\to\infty} x_{m,n} = x_m, \qquad \lim_{n\to\infty} F_n(x_{m,n}) = F(x_m). \tag{6.1.8}$$
We assume $-\infty < \lim F(x_m)$, $F(x) < \infty$ simply to avoid case distinctions. We let
$$\delta := \frac{1}{4} \Big( F(x) - \lim_{m\to\infty} F(x_m) \Big) > 0 \quad \text{by (6.1.7).}$$
For every $m \in \mathbb{N}$, we may find $n_m \in \mathbb{N}$ with
$$F_{n_m}(x_{m,n_m}) - F(x_m) < \delta, \tag{6.1.9}$$
$$\lim_{m\to\infty} x_{m,n_m} = x, \qquad \lim_{m\to\infty} n_m = \infty.$$
Then by Γ-convergence
$$F(x) \le \liminf_{m\to\infty} F_{n_m}(x_{m,n_m}). \tag{6.1.10}$$
We may then choose $m$ so large that
$$F(x_m) < F(x) - 3\delta \tag{6.1.11}$$
and
$$F_{n_m}(x_{m,n_m}) > F(x) - \delta. \tag{6.1.12}$$
Equations (6.1.9), (6.1.11) and (6.1.12) are not compatible, and the resulting contradiction proves the lower semicontinuity. q.e.d.

Remark. As a consequence of Corollary 3.2.2 and Theorems 3.1.3, 3.3.1, and 3.3.3, in combination with Lemma 2.2.4, the weak topology of $L^p(\Omega)$ and $W^{k,p}(\Omega)$ for $1 < p < \infty$ satisfies the first axiom of countability so that the preceding notions are applicable.

The reference for this section is G. dal Maso, An Introduction to Γ-Convergence, Birkhäuser, Boston, 1993.
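6.2 Homogenization

In this section and the next one, we describe two important examples of Γ-convergence. They are taken from H. Attouch, Variational Convergence for Functions and Operators, Pitman, Boston, 1984. In the discussion of these two examples, we shall be more sketchy about some technical details than in the rest of the book, because the main point of these examples is to show how the concept of Γ-convergence can be usefully applied to concrete problems that arise in various applications of the calculus of variations.

Let $M$ be a smooth subset of the open unit cube $(0,1)^d$ of $\mathbb{R}^d$. $M$ is considered as a hole. Let
$$M_\varepsilon := \bigcup_{m\in\mathbb{Z}^d} \varepsilon(M + m)$$
($\varepsilon(M + m) := \{x = \varepsilon y + \varepsilon m \text{ with } y \in M\}$) be a periodic lattice of 'holes' of scale $\varepsilon$. Let $\Omega \subset \mathbb{R}^d$, $\Omega_\varepsilon := \Omega \setminus (M_\varepsilon \cap \Omega)$, i.e. a domain with many small holes. Such domains occur in many physical problems like crushed ice, porous media etc. Often, the physical value of $\varepsilon$ is so small that it is useful to perform the mathematical analysis for $\varepsilon \to 0$. This is called homogenization. Let
$$a(x) := i_{\mathbb{R}^d \setminus M_1}(x) := \begin{cases} 0 & \text{for } x \in \mathbb{R}^d \setminus M_1 \\ \infty & \text{for } x \in M_1 \end{cases}$$
be the indicator function of $\mathbb{R}^d \setminus M_1$. $a(\frac{x}{\varepsilon})$ then is the indicator function of $\mathbb{R}^d \setminus M_\varepsilon$. We consider the functional
$$F_\varepsilon(u) := \frac{\varepsilon^2}{2} \int_\Omega |Du(x)|^2\,dx + \int_\Omega a\Big(\frac{x}{\varepsilon}\Big) u^2(x)\,dx \tag{6.2.1}$$
for $u \in H_0^{1,2}(\Omega)$. A minimizer of the functional
$$F_\varepsilon(u) - \int_\Omega f(x) u(x)\,dx$$
(for given $f \in L^2(\Omega)$) satisfies
$$\Delta u = -\frac{f}{\varepsilon^2} \ \text{ in } \Omega_\varepsilon \quad \text{and} \quad u = 0 \ \text{ on } \partial\Omega_\varepsilon. \tag{6.2.2}$$
Here $\partial\Omega_\varepsilon = \partial\Omega \cup (\partial M_\varepsilon \cap \Omega)$. The boundary condition on $\partial\Omega$ comes from the requirement that $u \in H_0^{1,2}(\Omega)$, while the boundary condition on $\partial M_\varepsilon$ is forced by the functional.

Theorem 6.2.1. With respect to weak $L^2(\Omega)$ convergence
$$\Gamma\text{-}\lim_{\varepsilon\to 0} F_\varepsilon = F, \tag{6.2.3}$$
with
$$F(u) = \frac{1}{2\mu(M)} \int_\Omega u^2(x)\,dx,$$
where
$$\mu(M) := \int_{(0,1)^d \setminus M} |D\eta(x)|^2\,dx = \int_{(0,1)^d} \eta(x)\,dx,$$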
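and $\eta$ is the solution of
$$\begin{aligned}
&\Delta\eta = -1 \quad \text{in } (0,1)^d \setminus M, \\
&\eta = 0 \quad \text{in } M, \\
&\eta \text{ is } \mathbb{Z}^d\text{-periodic (i.e. } \eta(x + m) = \eta(x) \text{ for } x \in (0,1)^d,\ m \in \mathbb{Z}^d).
\end{aligned} \tag{6.2.4}$$

Proof. We put $\eta_\varepsilon(x) := \eta(\frac{x}{\varepsilon})$. By Lemma 5.2.1, $\eta_\varepsilon$ converges weakly in $L^2(\Omega)$ to $\mu(M)$ as $\varepsilon \to 0$. Let now $u \in L^2(\Omega)$. By approximation, we may assume that $u$ is smooth, e.g. contained in $W^{1,2}(\Omega) \cap C^0(\Omega)$. We put
$$u_\varepsilon := \frac{1}{\mu(M)}\, u\, \eta_\varepsilon.$$
Then $u_\varepsilon$ converges weakly in $L^2(\Omega)$ to $u$, and $u_\varepsilon = 0$ on $M_\varepsilon$. Moreover
$$F_\varepsilon(u_\varepsilon) = \frac{\varepsilon^2}{2\mu(M)^2} \int_\Omega \big( u^2 |D\eta_\varepsilon|^2 + 2u\eta_\varepsilon\, Du \cdot D\eta_\varepsilon + \eta_\varepsilon^2 |Du|^2 \big). \tag{6.2.5}$$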
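If $U \subset \Omega$ is open, because of the periodicity, $\int_U |D\eta_\varepsilon|^2$ asymptotically behaves like
$$\frac{\operatorname{meas} U}{\varepsilon^d} \int_{(0,\varepsilon)^d \setminus M_\varepsilon} |D\eta_\varepsilon|^2 = \frac{\operatorname{meas} U}{\varepsilon^2} \int_{(0,1)^d \setminus M} |D\eta|^2.$$
This means that
$$\lim_{\varepsilon\to 0} \varepsilon^2 \int_U |D\eta_\varepsilon|^2 = \operatorname{meas} U \int_{(0,1)^d \setminus M} |D\eta|^2 = \operatorname{meas} U \cdot \mu(M), \tag{6.2.6}$$
hence, approximating $u$ by step functions, we also get
$$\lim_{\varepsilon\to 0} \varepsilon^2 \int_{\Omega_\varepsilon} u^2 |D\eta_\varepsilon|^2 = \mu(M) \int_\Omega u^2 \tag{6.2.7}$$
(note that we assume $u$ to be continuous). Moreover, since $\eta_\varepsilon$ is bounded independently of $\varepsilon$,
$$\lim_{\varepsilon\to 0} \varepsilon^2 \int_\Omega \eta_\varepsilon^2 |Du|^2 = 0, \tag{6.2.8}$$
and from (6.2.6), (6.2.7) and the Schwarz inequality, also
$$\lim_{\varepsilon\to 0} \varepsilon^2 \int_\Omega u\eta_\varepsilon\, Du \cdot D\eta_\varepsilon = 0. \tag{6.2.9}$$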
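Equations (6.2.5)-(6.2.9) imply
$$\lim_{\varepsilon\to 0} F_\varepsilon(u_\varepsilon) = F(u). \tag{6.2.10}$$
In order to complete the proof of Γ-convergence, we need to verify that whenever functions $v_\varepsilon$ that vanish on $M_\varepsilon$ converge weakly in $L^2(\Omega)$ to $u$, then
$$\liminf_{\varepsilon\to 0} F_\varepsilon(v_\varepsilon) \ge F(u). \tag{6.2.11}$$
By an approximation argument, we may assume $u \in C_0^\infty(\Omega)$. We put
$$u_\varepsilon = \frac{1}{\mu(M)}\, u\, \eta_\varepsilon$$
as before. We have
$$F_\varepsilon(v_\varepsilon) + F_\varepsilon(u_\varepsilon) \ge \varepsilon^2 \int_\Omega Dv_\varepsilon \cdot Du_\varepsilon = \frac{\varepsilon^2}{\mu(M)} \int_\Omega \big( \eta_\varepsilon\, Dv_\varepsilon \cdot Du + u\, Dv_\varepsilon \cdot D\eta_\varepsilon \big). \tag{6.2.12}$$
Using (6.2.10), we obtain from (6.2.12) in the limit $\varepsilon \to 0$
$$\liminf_{\varepsilon\to 0} F_\varepsilon(v_\varepsilon) + \frac{1}{2\mu(M)} \int_\Omega u^2 \ge \liminf_{\varepsilon\to 0} \frac{\varepsilon^2}{\mu(M)} \int_\Omega u\, Dv_\varepsilon \cdot D\eta_\varepsilon, \tag{6.2.13}$$
since the other term on the right-hand side of (6.2.12) goes to $0$ by a similar reasoning as above. Equation (6.2.4) implies
$$\varepsilon^2 \Delta\eta_\varepsilon = -1 \quad \text{in } \Omega_\varepsilon. \tag{6.2.14}$$
Moreover
$$\varepsilon^2 \int_\Omega Du\, v_\varepsilon \cdot D\eta_\varepsilon \le \varepsilon^2 \Big( \int_\Omega |Du|^2 v_\varepsilon^2 \Big)^{1/2} \Big( \int_\Omega |D\eta_\varepsilon|^2 \Big)^{1/2} \to 0, \tag{6.2.15}$$
since $v_\varepsilon$ as a weakly converging sequence is bounded in $L^2$, $|Du|$ is bounded by our approximation assumption that $u$ is smooth enough, and since we may use (6.2.6). Integrating the right-hand side of (6.2.13) by parts, and using (6.2.14) and (6.2.15), we obtain
$$\liminf_{\varepsilon\to 0} F_\varepsilon(v_\varepsilon) + \frac{1}{2\mu(M)} \int_\Omega u^2 \ge \liminf_{\varepsilon\to 0} \frac{1}{\mu(M)} \int_\Omega v_\varepsilon u = \frac{1}{\mu(M)} \int_\Omega u^2,$$
since $v_\varepsilon$ converges weakly in $L^2$ to $u$. This implies (6.2.11) and concludes the proof. q.e.d.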
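6.3 Thin insulating layers

We consider an insulating layer of width $2\varepsilon$ and conductivity $\lambda$, and we want to analyse the limit where $\varepsilon$ and $\lambda$ tend to $0$. Let $\Omega \subset \mathbb{R}^3$ be bounded and open, $S$ a smooth complete surface in $\mathbb{R}^3$, e.g. a plane,
$$\Sigma := \Omega \cap S, \qquad S_\varepsilon := \{x \in \mathbb{R}^3 : \operatorname{dist}(x, S) < \varepsilon\}, \qquad \Sigma_\varepsilon := \Omega \cap S_\varepsilon, \qquad \Omega_\varepsilon := \Omega \setminus \Sigma_\varepsilon.$$
Conductivity coefficient:
$$\begin{cases} 1 & \text{on } \Omega_\varepsilon \\ \lambda & \text{on } \Sigma_\varepsilon \end{cases} \qquad (\lambda > 0).$$
Variational problem: $I^{\varepsilon,\lambda} : H_0^{1,2}(\Omega) \to \mathbb{R}$,
$$I^{\varepsilon,\lambda}(u) := \frac{1}{2} \int_{\Omega_\varepsilon} |Du|^2\,dx + \frac{\lambda}{2} \int_{\Sigma_\varepsilon} |Du|^2\,dx,$$
$$I^{\varepsilon,\lambda}(u) - \int_\Omega f u \to \min \qquad (f \in L^2(\Omega) \text{ given}). \tag{6.3.1}$$
The Euler-Lagrange equations are
$$\Delta u_{\varepsilon,\lambda} + f = 0 \quad \text{on } \Omega_\varepsilon, \tag{6.3.2}$$
$$\lambda \Delta u_{\varepsilon,\lambda} + f = 0 \quad \text{on } \Sigma_\varepsilon, \tag{6.3.3}$$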
(6.3.4)
^C,A|QC
A d^*
=
^C,A|E €
_ _
du€iX
on dQe H <9£c
(6.3.5)
(where n^/ denotes the exterior normal of a set U) u€i\ = 0
on dfl.
(6.3.6)
T h e o r e m 6.3.1. We let e - • 0, A - • 0. If £ - • a with 0 < a < oo, then u€y\ —>• u weakly in L2(ft) u€y\ =4 u uniformly on every fto CC fi \ E, w/iere it solves Au + f = 0
onfi\E
^lan = 0 du and
dit on\2
r
,
where [U]J: is the jump ofu across E, and! | | , and | ^ , ore £/ie exterior normal derivatives for the two components of ft \ E. (7n case a = oo, u is continuous across E, and! Aw = / m ft.) Furthermore
-5jL,D"l>+UMid£ JC'A r-converges w.r.t. the weak L2-topology to I(u): = {Un\z\°u\2 I oo
0
=
+ lMyV
lUn\Du\2 I oo
ifueHl0>2(n\ll) otherwise
ifueH^n) otherwise (in this case, the result holds
a = 0 : J(«) = ( i /n\= l^l 2 loo
* « e # 1,3 ( n \ E) L{T ^ f °nff
otherwise
-topology in place of the weak one)
Thus, in case a = 0, we obtain a perfect insulation in the limit, whereas for a = oo, the limiting layer does not insulate at all. We assume for simplicity S = {x3 = 0}. L e m m a 6.3.1. There exists a constant c\ (depending on / , fi, 5, but not on e, A) such that for all sufficiently small e, A
/ n <>
/ \Du,,^ + \ l
(^)
|IHi«,»|' < cL (l + -|) .
Proof. i2 ,A|
f \Due,x\2 + A /
\Du^
= - /
Auc,A • iiC)A + /
+ A /
- A /
Auc>A • uc,A
UC,A-£T7—
= / / • wC)A ^
u€yX~^
because of the Euler-Lagrange equations
' lM«.*lz,2(n) *
I/IL2(0)
By the Poincare inequality (Theorem 3.4.2), /
u2x
< c2 /
/
|£^C,A|2
\Du€yX\2.
By a change of scale y3 = ex3 /
(y1 = x\y2
u2eX < c3e /
JJ2€
= x2),
|2 \Du^ :,A|
JY>€
(we only get e instead of e2, because the area of the portion of #E C on which ue,\ vanishes, namely dQ D 5 € , is proportional to e). Altogether
f < A < c4 (l 4- j) (J \Du^x\2 + \J |Du€fA| 2 < and the estimates follow. q.e.d. Proof {Theorem 6.3.1). We only consider the case 0 < a < oo (the other cases follow from a limiting argument). We first observe r - lim Ie>x(u) = oo
'due
L2(VL) \ H^2(ft \ E).
We assume for simplicity X = X{e) = ea ,T
:=Ie^el
Let u € if 1 , 2 (fi \ E). We first check property (ii) of T-convergence:
We need to find ue —* u weakly in L 2 (fi) with HmI e (u e ) = I(u). We define [ u(xl,x2,xz) if | z 3 | > e 2 2 ue(x ,x , x ) := < ~{u(x\x ,e)+u(x\x ,-e)} 1
2
1
if |rz 3 |<£.
2
+—{u(x ,x ,e)-u(x ,x ,-e)} Then ue G #o' 2 (ft \ E), ue - - u weakly in L 2 (fi) for e -+ 0, r(ue) \Du\2 + ^
= 5 / /
\]-D(u(x\x2,e)
I
u(x\x2,-e))
+ \ \2
3
+D(^(u(x\x2,e)-u(x1,x2,-e)))\ ~ \ f
\Du\2 + ?- [
~5 /
|^|2-f £
\D(x*(u(x\x2,e)-u(x\x2,-e)))\2 \u(x\a*,e)-u(x\x2,-e)\2
/
+ terms that contain a:3 and go to zero as e —» 0 (|rr3| < e). If w is smooth (which we may assume by an approximation argument), therefore for e —> 0 2 i > c ) - 4 / l^^l2 + ? / M E* 2
4
JQ\E
JE
We now check property (i) of T-convergence: Let ve —* u weakly in L2(Q). We need to show lim inf F (Ve e-»0
)>n u).
For ue as above, ae f
<+°Hjou,r
\Dv<\'
> ae I > f
/
Due Dv( (D{u(.,e)+u(.,-e)}
J Ee
X3
+-D{u(;e)-u(;-e)} e3 +-±(u(;e)-u(;-e)))-Dv , (
where e$ of course is the unit vector in the x3 direction.
We may assume u smooth (otherwise, we use an approximation argument). Then as above
U
^ f ?/ E J^I 2 n/ E M-
> lim inf £ /
(DW,6)-ti(.,-e))).^x3
+ lim inf - /
e:3 • Dve (u(-, e) - ii(-, - e ) ) .
Without loss of generality liminfe_^o Ie{v€) < oc. Then supe /
|D*;C|2 < oo.
(6.3.7)
Consequently, lima^y (^Du(-,e) + ^Du(-,-e)YDve
\Dve\2J
cae
-+ 0 for e -> 0. Similarly, lim sup a / (D(u(-,e) - w(-,-e))) • Dvex3 —• 0 e->o J E C since \x3\ < e. Thus
W T / J ^ + UM* > lim i n f - /
e3'Dt;e(w(',6)-u(-,-c)).
Since it(-, e) and u(-, — e) do not depend on x 3 , we obtain by integration ~ / = ? /
e3-Ito€(u(-,€)-u(-,-€)) ^ K - , €) - w(-, -c)) - ^ /
t;€ « , c) - u(-, - c ) ) ,
where here of course dZf = 0 D {x 3 = ±c}. Since we may assume liminf c _o^ c (^c) < oo, ve is bounded in jfiF1,2(nc). Therefore, we may
assume that the traces of v€ on d £ € converge*)". Since u is assumed smooth and v€ converges to u weakly in L 2 , we may assume v
€,(rae)± - " u ^ O * )
weakly in L2(<9£e).
We then get lim| /
e3-Dve(u(-yc)-u(',-e))
= -
I [ti]|.
Altogether liminf |
\Dve\2 + <\ Jju]l
jf
> f
/ M | .
Therefore \Du\2 + |
liminf I > e ) > \f
J[u}l q.e.d.
Exercises 6.1
Determine the T-limits of the following sequences of functions Fn : R -+ R: F n (#) := n(sinn£ -f 1) v
'
ll
forx = 0 x for 0 < a; < ± B 2n - n 2 x for - < x < n — — n F n (#) := sin n# 4- cos nrr. 2
F (x):-{;
6.2 Show the following result: Let $X$ be a topological space satisfying the first axiom of countability, $F_n, G_n : X \to \mathbb{R} \cup \{\pm\infty\}$. Suppose that $F_n$ Γ-converges to $F$, $G_n$ Γ-converges to $G$, and $F_n + G_n$ Γ-converges to $H$ (assume that the sums $F_n + G_n$, $F + G$ are always well defined; for example, there must not exist $x \in X$ with $F(x) = \infty$, $G(x) = -\infty$ or vice versa). Then
$$F + G \le H.$$
Does one get equality '=' instead of '≤' here? (Hint: Consider $F_n(x) = \sin nx$, $G_n(x) = -\sin nx$.)

† For this technical point, see e.g. W. Ziemer, Weakly Differentiable Functions, Springer, GTM 120, New York, 1989, pp. 189ff.
7 BV-functionals and Γ-convergence: the example of Modica and Mortola
7.1 The space BV(Ω)
Let $C_0^0(\mathbb{R}^d)$ be the space of continuous functions on $\mathbb{R}^d$ with compact support. For each Radon measure $\mu$ and each $\mu$-measurable function $\nu : \mathbb{R}^d \to \mathbb{R}$ with $|\nu| = 1$ $\mu$-almost everywhere, we can form a linear functional
$$L : C_0^0(\mathbb{R}^d) \to \mathbb{R}, \qquad L(f) = \int_{\mathbb{R}^d} f \nu \, d\mu.$$
Conversely, we have the Riesz representation theorem, given here without proof (see e.g. N. Dunford, J. Schwartz, Linear Operators, Vol. I, Interscience, New York, 1958, p. 265).

Theorem 7.1.1. Let $L : C_0^0(\mathbb{R}^d) \to \mathbb{R}$ be a linear functional with
$$\|L\|_K := \sup\{L(f) : f \in C_0^0(\mathbb{R}^d),\ |f| \le 1,\ \operatorname{supp} f \subset K\} < \infty \tag{7.1.1}$$
for each compact $K \subset \mathbb{R}^d$. Then there exist a Radon measure $\mu$ on $\mathbb{R}^d$ and a $\mu$-measurable function $\nu : \mathbb{R}^d \to \mathbb{R}$ with $|\nu| = 1$ $\mu$-almost everywhere with
$$L(f) = \int_{\mathbb{R}^d} f \nu \, d\mu \quad \text{for all } f \in C_0^0(\mathbb{R}^d). \tag{7.1.2}$$
If $L$ is nonnegative, i.e. $L(f) \ge 0$ whenever $f \ge 0$ everywhere, then $\nu \equiv 1$, i.e.
$$L(f) = \int_{\mathbb{R}^d} f \, d\mu. \tag{7.1.3}$$
Thus, the Radon measures on $\mathbb{R}^d$ are precisely the nonnegative linear functionals on $C_0^0(\mathbb{R}^d)$. (Note that (7.1.1) automatically holds if $L$ is nonnegative; namely
$$\|L\|_K = L(\chi_K)$$
in that case, where $\chi_K$ is the characteristic function of $K$.) The same result more generally holds for $C_0^0(\mathbb{R}^d, H)$ where $H$ is a finite dimensional Hilbert space with scalar product $\langle \cdot, \cdot \rangle$. Then linear functionals $L : C_0^0(\mathbb{R}^d, H) \to \mathbb{R}$ satisfying (7.1.1) are represented as
$$L(f) = \int_{\mathbb{R}^d} \langle f, \nu \rangle \, d\mu \tag{7.1.4}$$
where $\mu$ again is a Radon measure and $\nu : \mathbb{R}^d \to H$ is $\mu$-measurable with $|\nu| = 1$ $\mu$-almost everywhere. Also, in the situation of Theorem 7.1.1, one has
$$\mu(\Omega) = \sup\{L(f) : f \in C_0^0(\Omega),\ |f| \le 1\}$$
for any open $\Omega \subset \mathbb{R}^d$. The expression $\nu \, d\mu$ in (7.1.4) ($|\nu| = 1$ $\mu$-almost everywhere) is called a vector-valued signed measure. ($\mu$ is supposed to be a Radon measure and $\nu$ a $\mu$-measurable function with values in $H$.)

Definition 7.1.1. Let $\Omega \subset \mathbb{R}^d$ be open. The space $BV(\Omega)$ consists of all functions $u \in L^1(\Omega)$ for which there exists a vector-valued signed measure $\nu\mu$ with $\mu(\Omega) < \infty$ and
$$\int_\Omega u \operatorname{div} g \, dx = -\int_\Omega g \nu \, d\mu \tag{7.1.5}$$
for all $g \in C_0^\infty(\Omega, \mathbb{R}^d)$. In this case, we write $Du = \nu\mu$, $D_i u = \nu^i \mu$ ($\nu = (\nu^1, \ldots, \nu^d)$, $i = 1, \ldots, d$). For $u \in BV(\Omega)$, we put
$$\|Du\|(\Omega) := \mu(\Omega) = \sup\left\{\int_\Omega u \operatorname{div} g \, dx : g = (g^1, \ldots, g^d) \in C_0^\infty(\Omega, \mathbb{R}^d),\ |g(x)| \le 1 \text{ for all } x \in \Omega\right\} < \infty$$
and
$$\|u\|_{BV(\Omega)} := \|u\|_{L^1(\Omega)} + \|Du\|(\Omega).$$
For $u \in BV(\Omega)$, $\|Du\|$ is a Radon measure on $\Omega$:
$$\|Du\|(\Omega_0) = \sup\left\{\int_{\Omega_0} u \operatorname{div} g \, dx : g \in C_0^\infty(\Omega_0, \mathbb{R}^d),\ |g| \le 1\right\}$$
for $\Omega_0$ open in $\Omega$. We write
$$\|Du\|(\Omega_0) =: \int_{\Omega_0} \|Du\|,$$
and also
$$\|Du\|(f) =: \int_\Omega f \|Du\|$$
for a nonnegative Borel measurable function $f$ on $\Omega$.
We have for $f \in C_0^0(\Omega)$, $f \ge 0$:
$$\|Du\|(f) = \sup\left\{\int_\Omega u \operatorname{div} g \, dx : g \in C_0^\infty(\Omega, \mathbb{R}^d),\ |g(x)| \le f(x)\ \forall x \in \Omega\right\}. \tag{7.1.6}$$
Lemma 7.1.1. If $u \in W^{1,1}(\Omega)$, then $u \in BV(\Omega)$, and
$$d\mu = |Du| \, dx,$$
where $Du$ is the weak derivative of $u$ and $dx$ is $d$-dimensional Lebesgue measure, and
$$\nu(x) = \begin{cases} \dfrac{Du(x)}{|Du(x)|} & \text{if } Du(x) \ne 0 \\ 0 & \text{otherwise.} \end{cases}$$
The proof is obvious. q.e.d.

On a compact hypersurface $S \subset \mathbb{R}^d$ of class $C^\infty$, we have an induced metric and in particular a volume form $dS$. The $(d-1)$-dimensional volume of $S$ then is
$$|S|_{d-1} = \int_S dS.$$
Lemma 7.1.2. Let $E$ be a bounded open set in $\mathbb{R}^d$ with a boundary $\partial E$ of class $C^\infty$. Then
$$|\partial E|_{d-1} = \|D\chi_E\|(\mathbb{R}^d), \tag{7.1.7}$$
where $\chi_E$ is the characteristic function of $E$.
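Proof. We have to show
$$|\partial E|_{d-1} = \sup\left\{\int_E \operatorname{div} g : g \in C_0^\infty(\mathbb{R}^d, \mathbb{R}^d),\ |g| \le 1\right\}.$$
By the Gauss theorem
$$\int_E \operatorname{div} g = \int_{\partial E} g(x) \cdot n(x) \, d(\partial E),$$
where $n(x)$ is the exterior normal. Therefore
$$|\partial E|_{d-1} \ge \sup\left\{\int_E \operatorname{div} g : g \in C_0^\infty(\mathbb{R}^d, \mathbb{R}^d),\ |g| \le 1\right\}.$$
For the converse inequality, we use a partition of unity to extend $n$ to a $C^\infty$-vector field $V$ on $\mathbb{R}^d$ with $|V(x)| \le 1$ for all $x \in \mathbb{R}^d$. For $\varphi \in C_0^\infty(\mathbb{R}^d)$ with $|\varphi| \le 1$, we put $g = \varphi V$ and obtain, since $V \cdot n = 1$ on $\partial E$,
$$\int_E \operatorname{div} g = \int_{\partial E} \varphi \, d(\partial E).$$
Consequently
$$\sup\left\{\int_E \operatorname{div} g : g \in C_0^\infty(\mathbb{R}^d, \mathbb{R}^d),\ |g| \le 1\right\} \ge \sup\left\{\int_{\partial E} \varphi \, d(\partial E) : \varphi \in C_0^\infty(\mathbb{R}^d),\ |\varphi| \le 1\right\} = |\partial E|_{d-1}.$$
This completes the proof. q.e.d.

The same conclusion holds if $E \subset\subset \Omega$ for some bounded open set $\Omega$; namely
$$|\partial E|_{d-1} = \|D\chi_E\|(\Omega) = \sup\left\{\int_E \operatorname{div} g : g \in C_0^\infty(\Omega, \mathbb{R}^d),\ |g| \le 1\right\}$$
in that case.

Definition 7.1.2. A Borel set $E \subset \mathbb{R}^d$ has finite perimeter in an open set $\Omega$ if $\chi_E|_\Omega \in BV(\Omega)$. The perimeter of $E$ in $\Omega$ in that case is
$$P(E, \Omega) := \|D\chi_E\|(\Omega) \quad \left(= \sup\left\{\int_E \operatorname{div} g : g \in C_0^\infty(\Omega, \mathbb{R}^d),\ |g| \le 1\right\}\right). \tag{7.1.8}$$
$E$ is a set of finite perimeter if $\chi_E \in BV(\mathbb{R}^d)$.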
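The following lower semicontinuity result is easy to prove and very useful.

Theorem 7.1.2. Let $\Omega \subset \mathbb{R}^d$ be open, $(u_n)_{n \in \mathbb{N}} \subset BV(\Omega)$, and suppose
$$u_n \to u \quad \text{in } L^1(\Omega).$$
Then for every open $U \subset \Omega$
$$\|Du\|(U) \le \liminf_{n \to \infty} \|Du_n\|(U). \tag{7.1.9}$$
If in addition
$$\sup\{\|Du_n\|(\Omega) : n \in \mathbb{N}\} < \infty, \tag{7.1.10}$$
then $u \in BV(\Omega)$.

Proof. Let $g \in C_0^\infty(U, \mathbb{R}^d)$ with $|g| \le 1$. Then
$$\int_U u \operatorname{div} g = \lim_{n \to \infty} \int_U u_n \operatorname{div} g \le \liminf_{n \to \infty} \|Du_n\|(U).$$
Taking the supremum over all such $g$, we obtain (7.1.9). If $\varphi \in C_0^\infty(\Omega)$, then for $i = 1, \ldots, d$
$$\lim_{n \to \infty} \int \varphi D_i u_n = -\lim_{n \to \infty} \int u_n D_i \varphi = -\int u D_i \varphi$$
and hence
$$\left|\int u D_i \varphi\right| \le \sup|\varphi| \, \liminf_{n \to \infty} \|Du_n\|(\Omega) < \infty$$
in case (7.1.10) holds. Since $C_0^\infty(\Omega)$ is dense in $C_0^0(\Omega)$, for $i = 1, \ldots, d$
$$D_i u(\varphi) := -\int u D_i \varphi$$
then is a bounded linear functional on $C_0^0(\Omega)$, and thus $u \in BV(\Omega)$. q.e.d.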
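The inequality (7.1.9) can be strict. As a hedged numerical illustration (the example $u_n(x) = \sin(nx)/n$ on $(0, 2\pi)$ and the helper name `l1_norm_and_variation` are choices made here, not taken from the text), the following sketch shows $u_n \to 0$ in $L^1$ while the total variation stays near 4, so $\|Du\| = 0 < \liminf \|Du_n\|$.

```python
import math

def l1_norm_and_variation(n, samples=200_000):
    """Return (L1 norm, total variation) of u_n(x) = sin(n x)/n on (0, 2*pi)."""
    h = 2.0 * math.pi / samples
    values = [math.sin(n * k * h) / n for k in range(samples + 1)]
    # Left Riemann sum for the L1 norm of u_n (the limit function is u = 0).
    l1 = sum(abs(v) for v in values[:-1]) * h
    # The sum of absolute increments approximates ||Du_n||((0, 2*pi)).
    variation = sum(abs(values[k + 1] - values[k]) for k in range(samples))
    return l1, variation

for n in (1, 10, 100):
    l1, variation = l1_norm_and_variation(n)
    print(n, l1, variation)
```

The exact values are $\|u_n\|_{L^1} = 4/n$ and $\|Du_n\|((0, 2\pi)) = \int_0^{2\pi} |\cos nx| \, dx = 4$, which the sums above approximate.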
We next discuss the approximation of BV-functions by smooth ones through mollification. As usual, we let $\rho \in C_0^\infty(\mathbb{R}^d)$ be a mollifier with $\rho \ge 0$, $\operatorname{supp} \rho \subset B(0,1)$, $\int_{\mathbb{R}^d} \rho(x) \, dx = 1$, and we also impose the symmetry condition
$$\rho(x) = \rho(-x). \tag{7.1.11}$$
We then put as in Section 3.2
$$\rho_h(x) := h^{-d} \rho\left(\frac{x}{h}\right),$$
and for $u \in L^1(\Omega)$, we extend $u$ to $L^1(\mathbb{R}^d)$ by defining $u(x) = 0$ for $x \in \mathbb{R}^d \setminus \Omega$ and put
$$u_h(x) := \rho_h * u(x) := \int_{\mathbb{R}^d} \rho_h(x - y) u(y) \, dy \in C^\infty(\Omega).$$
Theorem 7.1.3. If $u \in BV(\Omega)$, then $u_h \to u$ in $L^1(\Omega)$ and $\|Du_h\| \to \|Du\|$ in the sense of Radon measures as $h \to 0$, i.e. for every $f \in C_0^0(\Omega)$
$$\lim_{h \to 0} \int_\Omega f \|Du_h\| = \int_\Omega f \|Du\|. \tag{7.1.12}$$
In particular,
$$\lim_{h \to 0} \|Du_h\|(\Omega) = \|Du\|(\Omega). \tag{7.1.13}$$
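Proof. $u_h \to u$ in $L^1(\Omega)$ by Theorem 3.2.1. It suffices to consider the case $f \ge 0$. From (7.1.3) it follows as in the proof of Theorem 7.1.2 that for every $f \in C_0^0(\Omega)$ with $f \ge 0$
$$\int_\Omega f \|Du\| \le \liminf_{h \to 0} \int_\Omega f \|Du_h\|. \tag{7.1.14}$$
It thus remains to prove that for such $f$
$$\limsup_{h \to 0} \int_\Omega f \|Du_h\| \le \int_\Omega f \|Du\|. \tag{7.1.15}$$
For that purpose, we first obtain from (7.1.6)
$$\int_\Omega f \|Du_h\| = \sup\left\{\int_\Omega g(x) \cdot Du_h(x) \, dx : g \in C_0^\infty(\Omega, \mathbb{R}^d),\ |g(x)| \le f(x)\ \forall x \in \Omega\right\}. \tag{7.1.16}$$
Here, $Du_h = \left(\frac{\partial}{\partial x^1} u_h, \ldots, \frac{\partial}{\partial x^d} u_h\right)$ is the gradient of $u_h$, since $u_h$ is smooth.
For any such $g$ as in (7.1.16)
$$\begin{aligned}
\int g(x) \cdot Du_h(x) \, dx &= -\int u_h(x) \operatorname{div} g(x) \, dx \\
&= -\int \int \rho_h(x - y) u(y) \, dy \, \operatorname{div} g(x) \, dx \\
&= -\int \left(\int \rho_h(y - x) \operatorname{div} g(x) \, dx\right) u(y) \, dy \quad \text{by (7.1.11)} \\
&= -\int u(y) \operatorname{div}(g_h)(y) \, dy.
\end{aligned} \tag{7.1.17}$$
Since we assume $|g| \le f$, we have $|g_h| \le |g|_h \le f_h$, and since $f$ is continuous, $f_h \to f$ uniformly as $h \to 0$ (see Lemma 3.2.2), i.e. $|f_h(x) - f(x)| \le \eta_h$ for all $x \in \Omega$, with $\lim_{h \to 0} \eta_h = 0$. By definition of $\|Du\|$, the right hand side of (7.1.17) therefore is bounded by $\int_\Omega (f + \eta_h) \|Du\|$. Thus, for every such $g$
$$\lim_{h \to 0} \int g(x) \cdot Du_h(x) \, dx \le \int_\Omega f \|Du\|,$$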
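and (7.1.15) follows (cf. (7.1.16)). q.e.d.

Corollary 7.1.1. Let $\Omega$ be a bounded, open subset of $\mathbb{R}^d$. Then any sequence $(u_n)_{n \in \mathbb{N}} \subset BV(\Omega)$ with
$$\|u_n\|_{BV} \le K \quad \text{for some } K$$
contains a subsequence that converges in $L^1(\Omega)$ to some $u \in BV(\Omega)$ with $\|u\|_{BV} \le K$.

Proof. By Theorem 7.1.3, there exist $v_n \in C^\infty(\Omega)$ with
$$\|u_n - v_n\|_{L^1(\Omega)} \le \frac{1}{n}, \qquad \|Dv_n\|(\Omega) \le K + 1.$$
Therefore $(v_n)_{n \in \mathbb{N}}$ is bounded in $W^{1,1}(\Omega)$. By the Rellich-Kondrachev compactness theorem 3.4.1, after selection of a subsequence, $(v_n)_{n \in \mathbb{N}}$ converges in $L^1(\Omega)$ to some $u \in L^1(\Omega)$. $(u_n)$ has to converge to $u$ as well (in $L^1(\Omega)$). By Theorem 7.1.2, $u \in BV(\Omega)$, and $\|u\|_{BV} \le K$. q.e.d.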
A reference for the BV theory is W. Ziemer, Weakly Differentiable Functions, Springer, GTM 120, New York, 1989, Chapter 5.
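7.2 The example of Modica-Mortola
We now come to the theorem of Modica-Mortola:

Theorem 7.2.1. Let
$$F_n(u) := \begin{cases} \displaystyle\int_{\mathbb{R}^d} \left(\frac{|Du|^2}{n} + n \sin^2(\pi n u)\right) dx & \text{for } u \in H^{1,2}(\mathbb{R}^d) \cap L^1(\mathbb{R}^d) \\ \infty & \text{otherwise,} \end{cases}$$
$$F(u) := \begin{cases} \displaystyle\frac{4}{\pi} \int_{\mathbb{R}^d} |Du| = \frac{4}{\pi} \|Du\|(\mathbb{R}^d) & \text{for } u \in BV(\mathbb{R}^d) \\ \infty & \text{otherwise.} \end{cases}$$
Then w.r.t. $L^1(\mathbb{R}^d)$ convergence
$$F = \Gamma\text{-}\lim_{n \to \infty} F_n. \tag{7.2.1}$$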
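Proof. (i) We first want to show
$$F(u) \le \liminf_{n \to \infty} F_n(u_n) \tag{7.2.2}$$
whenever $u_n \to u$ in $L^1(\mathbb{R}^d)$.
For that purpose, we put
$$h_n(t) := \frac{1}{n} \int_0^{nt} |\sin(\pi \tau)| \, d\tau.$$
We note that
$$|h_n(s) - h_n(t)| \le |s - t| \quad \text{for all } n \in \mathbb{N},\ s, t \in \mathbb{R}.$$
Therefore
$$\|h_n \circ u_n - h_n \circ u\|_{L^1} \le \|u_n - u\|_{L^1} \to 0 \quad \text{as } n \to \infty. \tag{7.2.3}$$
Also
$$\lim_{n \to \infty} h_n(t) = \frac{2}{\pi} t. \tag{7.2.4}$$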
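The limit (7.2.4) reflects that the mean value of $|\sin(\pi \tau)|$ over a period is $2/\pi$. A minimal sketch, assuming $t \ge 0$ and using a closed form for the integral (the function name `h` is chosen here for illustration), makes the convergence visible numerically.

```python
import math

def h(n, t):
    """h_n(t) = (1/n) * integral_0^{n t} |sin(pi*tau)| d tau, for t >= 0, in closed form."""
    s = n * t
    full_periods = math.floor(s)
    frac = s - full_periods
    # Each unit interval contributes 2/pi; the remainder contributes (1 - cos(pi*frac))/pi.
    return (2.0 * full_periods / math.pi + (1.0 - math.cos(math.pi * frac)) / math.pi) / n

t = 0.37
for n in (1, 10, 100, 1000):
    print(n, h(n, t), 2.0 * t / math.pi)
```

The deviation from $2t/\pi$ comes only from the incomplete last period and is therefore of order $1/n$.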
We now obtain
$$\left\|h_n \circ u_n - \frac{2}{\pi} u\right\|_{L^1} \le \|h_n \circ u_n - h_n \circ u\|_{L^1} + \left\|h_n \circ u - \frac{2}{\pi} u\right\|_{L^1} \to 0 \quad \text{as } n \to \infty \tag{7.2.5}$$
by (7.2.3), (7.2.4), and Lebesgue's Theorem 1.2.3 on dominated convergence. We may assume
$$u_n \in H^{1,2}(\mathbb{R}^d) \quad \text{for every } n \in \mathbb{N}, \tag{7.2.6}$$
because otherwise $F_n(u_n) = \infty$, and (7.2.2) is trivial. Then
$$\begin{aligned}
\liminf_{n \to \infty} F_n(u_n) &\ge 2 \liminf_{n \to \infty} \int_{\mathbb{R}^d} |Du_n| \, |\sin(\pi n u_n)| \\
&= 2 \liminf_{n \to \infty} \int_{\mathbb{R}^d} |D(h_n \circ u_n)| \\
&\ge \frac{4}{\pi} \int_{\mathbb{R}^d} |Du| \quad \text{by (7.2.5) and Theorem 7.1.2} \\
&= F(u).
\end{aligned}$$
This shows (7.2.2).

(ii) We want to show that for every $u \in L^1(\mathbb{R}^d)$, there exists a sequence $(u_n)_{n \in \mathbb{N}} \subset L^1(\mathbb{R}^d)$ converging to $u$ in $L^1(\mathbb{R}^d)$ with
$$\limsup_{n \to \infty} F_n(u_n) \le F(u), \tag{7.2.7}$$
thereby completing the proof of Γ-convergence. This inequality will be much harder to show than (7.2.2), however. We shall proceed in several steps:

(1) We may assume $u \in C_0^\infty(\mathbb{R}^d)$. By a slight extension of the reasoning of Theorem 7.1.3, we may find $u_h \in C_0^\infty(\mathbb{R}^d)$ (take a smooth ...) with
$$\lim_{h \to 0} \int_{\mathbb{R}^d} |u_h(x) - u(x)| \, dx = 0, \qquad \lim_{h \to 0} F(u_h) = F(u).$$
Applying Lemma 6.1.1, we may indeed assume $u \in C_0^\infty(\mathbb{R}^d)$.

(2) We now want to show that it suffices to verify the claim for certain step functions.
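By (1), we assume $u \in C_0^\infty(\mathbb{R}^d)$. By Sard's theorem, for almost all $t \in \mathbb{R}$,
$$u^{-1}(t) = \{x : u(x) = t\}$$
is a hypersurface of class $C^\infty$. For every $\nu \in \mathbb{Z}$, $n \in \mathbb{N}$, we may then choose $t_{\nu,n}$ with this property, with
$$\frac{\nu}{n} < t_{\nu,n} < \frac{\nu + 1}{n}$$
and satisfying
$$\frac{1}{n} \left|u^{-1}(t_{\nu,n})\right|_{d-1} \le \int_{\nu/n}^{(\nu+1)/n} \left|u^{-1}(t)\right|_{d-1} dt.$$
The coarea formula (Theorem A.1) then implies
$$\begin{aligned}
\int_{\mathbb{R}^d} |Du(x)| \, dx &= \int_{-\infty}^{\infty} \left|u^{-1}(t)\right|_{d-1} dt \\
&= \sum_{\nu = -\infty}^{\infty} \int_{\nu/n}^{(\nu+1)/n} \left|u^{-1}(t)\right|_{d-1} dt \\
&\ge \sum_{\nu = -\infty}^{\infty} \frac{1}{n} \left|u^{-1}(t_{\nu,n})\right|_{d-1} \\
&= \sum_{\nu = -\infty}^{\infty} \frac{1}{n} \left\|D\chi_{\{u > t_{\nu,n}\}}\right\| \quad \text{by Lemma 7.1.2.}
\end{aligned}$$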
We choose $N(n) \in \mathbb{N}$ with $N(n) \ge n \max|u| + 1$ and put
$$u_n := -\frac{N(n)}{n} + \frac{1}{n} \sum_{\nu = -N(n)}^{N(n)} \chi_{\{u > t_{\nu,n}\}}.$$
The preceding inequality implies
$$\limsup_{n \to \infty} F(u_n) \le F(u).$$
If $t_{\nu,n} < u(x) \le t_{\nu+1,n}$, then $u_n(x) = \frac{\nu + 1}{n}$. Therefore
$$\operatorname{supp} u_n \subset \operatorname{supp} u,$$
and for all $x$
$$|u(x) - u_n(x)| \le \frac{2}{n}.$$
Since $u$ is assumed to have compact support, therefore
$$\lim_{n \to \infty} \int_{\mathbb{R}^d} |u_n(x) - u(x)| \, dx = 0.$$
Lemma 6.1.1 then implies that it suffices to prove the claim for the functions $u_n$.

(3) In (2), we have reduced the claim to step functions
$$u = \sum_{i=1}^{N} a_i \chi_{\Omega_i},$$
where the $\Omega_i$ are disjoint bounded open sets with boundary $\partial\Omega_i$ of class $C^\infty$. Since the general case is completely analogous, for simplicity, we only consider the case $N = 1$, i.e.
$$u = a \chi_\Omega \tag{7.2.8}$$
with $\Omega$ bounded and $\partial\Omega$ of class $C^\infty$. Thus
$$F(u) = \frac{4}{\pi} a \, |\partial\Omega|_{d-1} \quad \text{(cf. Lemma 7.1.2).} \tag{7.2.9}$$
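We let $0 < \rho < \epsilon_0$, where $\epsilon_0$ is given in Lemma B.1. Thus, the signed distance function $d(x)$ as defined in Appendix B is smooth on $\{x \in \mathbb{R}^d : \operatorname{dist}(x, \partial\Omega) < \rho\}$. We need the following auxiliary result:

Lemma 7.2.1. Let $n \in \mathbb{N}$, let $\alpha_n \in \mathbb{R}$, with $\lim_{n \to \infty} \alpha_n = a \in \mathbb{R}$, $n\alpha_n \in \mathbb{Z}$, and let
$$\phi_n(\chi) := \int_{\mathbb{R}} \left(\frac{|\chi'(t)|^2}{n} + n \sin^2(\pi n \chi(t))\right) dt$$
be the one-dimensional analogue of $F_n$. Then there exist Lipschitz functions $\chi_n : \mathbb{R} \to \mathbb{R}$ with
$$\chi_n(t) = 0 \quad \text{for } t \le 0,$$
$$\chi_n(t) = \alpha_n \quad \text{for } t \ge \frac{\alpha_n}{\sqrt{n}},$$
$$0 \le \chi_n(t) \le \alpha_n \quad \text{for } 0 \le t \le \frac{\alpha_n}{\sqrt{n}},$$
and
$$\limsup_{n \to \infty} \phi_n(\chi_n) \le \frac{4}{\pi} a. \tag{7.2.10}$$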
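We postpone the proof of Lemma 7.2.1 and proceed with the proof of the theorem. We choose a sequence $(\alpha_n)_{n \in \mathbb{N}} \subset \mathbb{R}$ with
$$\lim_{n \to \infty} \alpha_n = a,$$
and $n\alpha_n \in \mathbb{Z}$ as in Lemma 7.2.1. We put
$$\Omega_n := \left\{x \in \Omega : d(x) < \frac{\alpha_n}{\sqrt{n}}\right\}$$
and
$$u_n(x) := \chi_n(d(x)) \quad \text{with } \chi_n \text{ as in Lemma 7.2.1.} \tag{7.2.11}$$
Then
$$u_n(x) = 0 \quad \text{for } x \in \mathbb{R}^d \setminus \Omega,$$
$$u_n(x) = \alpha_n \quad \text{for } x \in \Omega \setminus \Omega_n,$$
$$0 \le u_n(x) \le \alpha_n \quad \text{for } x \in \Omega_n.$$
We also note
$$\lim_{n \to \infty} |\Omega_n| = 0. \tag{7.2.12}$$
Thus (cf. (7.2.8))
$$\lim_{n \to \infty} \int_{\mathbb{R}^d} |u(x) - u_n(x)| \, dx = 0, \tag{7.2.13}$$
and $u_n$ converges to $u$ in $L^1$. We also let (as in Appendix B)
$$\Sigma_t := \{x \in \mathbb{R}^d : d(x) = t\}.$$
We note
$$Du_n(x) = 0, \quad \sin(n\pi u_n(x)) = 0 \quad \text{for } x \in \mathbb{R}^d \setminus \Omega_n, \tag{7.2.14}$$
and
$$|Dd(x)| = 1 \quad \text{for } x \in \Omega_n \text{ by Lemma B.1.} \tag{7.2.15}$$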
253
Then limsupF n (ii n )
f I \Du (x)\2 = limsup /
—h nsm2(n7TUn(x))
I
\ 1 \Dd(x)\dx
= limsup [^ |l^Xn(01 + n s i n 2 ( n 7 r X w ( t ) ) | n n-*oo JO \ J by Corollary Bl (coarea formula)
\Et\d^dt
< limsup sup 0 n ( X n ) * | s t l d - i I n-oo \ 0 < t < ^ J 4 < —a \d£l\d_x by Lemma 7.2.1 and Lemma B.l = F(u) (cf. (7.2.9)). This is (7.2.7). (4) It only remains to prove Lemma 7.2.1: The idea is of course to minimize
(7.2.16)
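We now construct a solution of (7.2.16) with the desired properties: w.l.o.g. $a > 0$ (the case $a < 0$ is analogous). We choose $c_1 = \frac{1}{n}$ in (7.2.16). We put
$$\psi_n(t) := \int_0^t \frac{1}{n} \left(\frac{1}{n} + \sin^2(n\pi s)\right)^{-\frac{1}{2}} ds, \qquad \eta_n := \psi_n(\alpha_n).$$
Then
$$0 < \eta_n \le \frac{1}{\sqrt{n}} \alpha_n.$$
We let
$$\chi_n : [0, \eta_n] \to [0, \alpha_n]$$
be the inverse of $\psi_n$. Then $\chi_n$ is of class $C^1$ and
$$\frac{1}{n} \chi_n'(t) = \left(\frac{1}{n} + \sin^2(n\pi \chi_n(t))\right)^{\frac{1}{2}}. \tag{7.2.17}$$
We extend $\chi_n$ to $\mathbb{R}$ as a Lipschitz function by putting
$$\chi_n(t) = 0 \quad \text{for } t \le 0,$$
$$\chi_n(t) = \alpha_n \quad \text{for } t \ge \eta_n.$$
Then
$$\begin{aligned}
\phi_n(\chi_n) &= \int_0^{\eta_n} \left(\frac{|\chi_n'(t)|^2}{n} + n \sin^2(\pi n \chi_n(t))\right) dt \\
&\le \int_0^{\eta_n} \left(\frac{|\chi_n'(t)|^2}{n} + n \left(\frac{1}{n} + \sin^2(\pi n \chi_n(t))\right)\right) dt \\
&= 2 \int_0^{\eta_n} \left(\frac{1}{n} + \sin^2(\pi n \chi_n(t))\right)^{\frac{1}{2}} \chi_n'(t) \, dt \quad \text{by (7.2.17)} \\
&= 2 \int_0^{\alpha_n} \left(\frac{1}{n} + \sin^2(\pi n s)\right)^{\frac{1}{2}} ds.
\end{aligned}$$
Since $\left(\frac{1}{n} + \sin^2(\pi n s)\right)^{1/2} \le \frac{1}{\sqrt{n}} + |\sin(\pi n s)|$ and $\int_0^{\alpha_n} |\sin(\pi n s)| \, ds = \frac{2}{\pi} \alpha_n$ because $n\alpha_n \in \mathbb{Z}$, the right hand side is bounded by $\frac{2\alpha_n}{\sqrt{n}} + \frac{4}{\pi} \alpha_n$, and Lemma 7.2.1 follows. q.e.d.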
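The final bound in the proof can be checked numerically. The following is a minimal sketch, assuming $\alpha > 0$ with $n\alpha$ an integer; the function name `profile_energy` and the midpoint rule are choices made here for illustration. It evaluates $2 \int_0^{\alpha} (1/n + \sin^2(\pi n s))^{1/2} \, ds$ and compares it with the limit value $\frac{4}{\pi}\alpha$.

```python
import math

def profile_energy(n, alpha, points_per_period=200):
    """Midpoint rule for 2 * integral_0^alpha sqrt(1/n + sin^2(pi*n*s)) ds."""
    periods = round(n * alpha)  # n*alpha is assumed to be a positive integer
    m = periods * points_per_period
    ds = alpha / m
    total = 0.0
    for k in range(m):
        s = (k + 0.5) * ds
        total += math.sqrt(1.0 / n + math.sin(math.pi * n * s) ** 2)
    return 2.0 * total * ds

alpha = 1.0
for n in (4, 16, 64, 256):
    print(n, profile_energy(n, alpha), 4.0 * alpha / math.pi)
```

By the estimate in the proof, each computed value should lie between $\frac{4}{\pi}\alpha$ and $\frac{4}{\pi}\alpha + \frac{2\alpha}{\sqrt{n}}$, up to quadrature error.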
References
L. Modica and St. Mortola, Un esempio di Γ-convergenza, Boll. U.M.I. (5), 14-B (1977), 285-99.
L. Modica, The gradient theory of phase transitions and the minimal interface criterion, Arch. Rat. Mech. Anal. 98 (1987), 123-42.

Let us also quote without proof the following result of L. Modica, loc. cit., which plays an important role in the theory of phase transitions: Let $\Omega \subset \mathbb{R}^d$ be open and bounded with Lipschitz boundary, $W : \mathbb{R} \to \mathbb{R}^+$ be continuous with precisely two zeroes $\alpha, \beta$ (which then are absolute minima, because $W$ is nonnegative),
$$F_n(u) := \begin{cases} \displaystyle\int_\Omega \left(\frac{1}{n} \|Du(x)\|^2 + n W(u(x))\right) dx & \text{for } u \in H^{1,2}(\Omega) \\ \infty & \text{otherwise} \end{cases}$$
and
$$F_0(u) := \begin{cases} \displaystyle\frac{2 c_0}{\beta - \alpha} \int_\Omega \|Du\| & \text{for } u \in BV(\Omega) \text{ and } u(x) \in \{\alpha, \beta\} \text{ for almost all } x \in \Omega \\ \infty & \text{otherwise} \end{cases}$$
with
$$c_0 = \int_\alpha^\beta W^{\frac{1}{2}}(s) \, ds.$$
Then $F_0$ is the Γ-limit of $F_n$ w.r.t. $L^1$-convergence.

The proof is similar to the one of Theorem 7.2.1, except that we cannot apply Sard's lemma anymore, because even for a smooth function $u$, $\alpha$ and $\beta$ need not be regular values. Thus, one has to consider nonsmooth level sets as well and appeal to some general results about BV-functions and sets of finite perimeter.

The interpretation of Modica's theorem is the following: Consider first the problem
$$\int_\Omega W(u(x)) \, dx \to \min$$
under the constraint
$$\frac{1}{\operatorname{meas} \Omega} \int_\Omega u(x) \, dx = \gamma,$$
with $\alpha < \gamma < \beta$ (w.l.o.g. assume $\alpha < \beta$). A minimizer then is of the form
$$u(x) = \begin{cases} \alpha & \text{for } x \in A_1 \subset \Omega \\ \beta & \text{for } x \in A_2 \subset \Omega \end{cases}$$
such that
$$A_1 \cup A_2 = \Omega, \qquad \alpha \operatorname{meas} A_1 + \beta \operatorname{meas} A_2 = \gamma \operatorname{meas} \Omega. \tag{7.2.19}$$
$u$ thus jumps from the value $\alpha$ to the value $\beta$ along $\partial A_1 \cap \Omega = \partial A_2 \cap \Omega =: \Gamma$. However, apart from the preceding relations (7.2.19), $A_1$ and $A_2$ and hence also $\Gamma$ are completely arbitrary. In particular, $\Gamma$ may be very irregular. In order to gain some control over the transition hypersurface $\Gamma$, one adds the regularizing term $\int_\Omega \|Du(x)\|^2$ to the functional, albeit with an arbitrarily small weight, and in fact one passes to the limit where this weight vanishes so that one preserves (7.2.18), (7.2.19). Although this regularizing term disappears in the limit, it still has the effect of regularizing the hypersurface $\Gamma$ along which the transition from $\alpha$ to $\beta$ occurs. Namely, the hypersurface of discontinuity of the minimizer $u$ now is constrained by the requirement that the BV norm of $u$, $\int_\Omega \|Du\|$, be minimized. This means that $\Gamma$ is a so-called minimal hypersurface. The existence and regularity theory for such minimal hypersurfaces may be found for example in E. Giusti, Minimal Surfaces and Functions of Bounded Variation, Birkhäuser, Boston 1984, pp. 3-134.
Exercises
7.1 Try to construct bounded sets in $\mathbb{R}^d$ that do not have a finite perimeter.
7.2 Prove the preceding theorem of L. Modica for $d = 1$.
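Appendix A The coarea formula

Theorem A.1 (coarea formula for smooth functions). Let $u \in C_0^d(\mathbb{R}^d)$. Then by Sard's theorem,
$$C_u := \{t \in \mathbb{R} : \exists x \in \mathbb{R}^d : Du(x) = 0,\ u(x) = t\}$$
has one-dimensional Lebesgue measure zero, and thus, for almost all $t \in \mathbb{R}$, $u^{-1}(t)$ is a smooth hypersurface by the implicit function theorem. We then have for every open $\Omega \subset \mathbb{R}^d$
$$\int_\Omega |Du(x)| \, dx = \int_{-\infty}^{\infty} \left|u^{-1}(t) \cap \Omega\right|_{d-1} dt. \tag{A.1}$$
Proof. (1) We first show the result for a linear map $l : \mathbb{R}^d \to \mathbb{R}$ (w.l.o.g. $l \ne 0$). Let $\pi : \mathbb{R}^d \to \mathbb{R}$ be the projection onto the first coordinate. We may find $A \in Gl(1, \mathbb{R})$, $R \in O(d, \mathbb{R})$† with
$$l = A \circ \pi \circ R.$$
For every measurable subset $E$ of $\mathbb{R}^d$, we have by Fubini's theorem
$$|E|_d = \int_{-\infty}^{\infty} \left|E \cap \pi^{-1}(t)\right|_{d-1} dt,$$
where
$$|E|_d = \int_{\mathbb{R}^d} \chi_E$$
is the Lebesgue measure of $E$. Since $R$ is orthogonal, we likewise have
$$|E|_d = \int_{-\infty}^{\infty} \left|E \cap R^{-1} \circ \pi^{-1}(t)\right|_{d-1} dt.$$
We then change variables via $s = At$ and obtain
$$|A| \, |E|_d = \int_{-\infty}^{\infty} \left|E \cap R^{-1} \circ \pi^{-1} \circ A^{-1}(s)\right|_{d-1} ds = \int_{-\infty}^{\infty} \left|E \cap l^{-1}(s)\right|_{d-1} ds. \tag{A.2}$$
† $Gl(d, \mathbb{R}) := \{d \times d\text{-matrices } A \text{ with real entries and } \det A \ne 0\}$, $O(d, \mathbb{R}) := \{A \in Gl(d, \mathbb{R}) \mid A^t = A^{-1}\}$ (orthogonal group).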
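Since $|A| = |dl|$, and $l$ is linear, this is the coarea formula for linear maps.

(2) Let
$$S_u := \{x \in \mathbb{R}^d : Du(x) = 0\},$$
$$U_t := \{x \in \mathbb{R}^d : u(x) > t\} \quad \text{for } t \in \mathbb{R}.$$
We put
$$u_t := \begin{cases} \chi_{U_t} & \text{if } t \ge 0 \\ -\chi_{\mathbb{R}^d \setminus U_t} & \text{if } t < 0. \end{cases}$$
Then
$$u(x) = \int_{\mathbb{R}} u_t(x) \, dt.$$
Let $\varphi \in C_0^\infty(\mathbb{R}^d \setminus S_u, \mathbb{R}^d)$, $|\varphi| \le 1$. Then
$$\int_{\mathbb{R}^d} u(x) \operatorname{div} \varphi(x) \, dx = \int_{\mathbb{R}^d} \int_{\mathbb{R}} u_t(x) \operatorname{div} \varphi(x) \, dt \, dx = \int_{\mathbb{R}} \int_{\mathbb{R}^d} u_t(x) \operatorname{div} \varphi(x) \, dx \, dt \tag{A.3}$$
by Fubini's theorem. By definition of $S_u$ and the implicit function theorem, $u^{-1}(t) \cap \mathbb{R}^d \setminus S_u$ is a hypersurface of class $C^d$. Since we assume $\operatorname{supp} \varphi \subset \mathbb{R}^d \setminus S_u$, we may apply the divergence theorem to obtain
$$\int_{U_t} \operatorname{div} \varphi(x) \, dx = \int_{(\partial U_t) \cap \mathbb{R}^d \setminus S_u} \varphi(x) \cdot n(x) \, d(\partial U_t)(x)$$
and
$$-\int_{\mathbb{R}^d \setminus U_t} \operatorname{div} \varphi(x) \, dx = \int_{(\partial U_t) \cap \mathbb{R}^d \setminus S_u} \varphi(x) \cdot n(x) \, d(\partial U_t)(x),$$
where $n(x)$ is the exterior normal of $U_t$. We use this in (A.3) (recall the definition of $u_t$) to obtain
$$\begin{aligned}
-\int_{\mathbb{R}^d} Du(x) \cdot \varphi(x) \, dx &= \int_{\mathbb{R}^d} u(x) \operatorname{div} \varphi(x) \, dx \\
&= \int_{\mathbb{R}} \int_{(\partial U_t) \cap \mathbb{R}^d \setminus S_u} \varphi(x) \cdot n(x) \, d(\partial U_t)(x) \, dt \\
&\le \int_{\mathbb{R}} \left|u^{-1}(t) \cap \mathbb{R}^d \setminus S_u\right|_{d-1} dt \quad \text{since we assume } |\varphi| \le 1.
\end{aligned}$$
Taking the supremum over all such $\varphi$, we obtain
$$\int_{\mathbb{R}^d} |Du(x)| \, dx = \int_{\mathbb{R}^d \setminus S_u} |Du(x)| \, dx \le \int_{\mathbb{R}} \left|u^{-1}(t)\right|_{d-1} dt. \tag{A.4}$$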
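(3) We now prove the reverse inequality. We let $l_n : \mathbb{R}^d \to \mathbb{R}$ be piecewise linear maps with
$$\lim_{n \to \infty} \int_{\mathbb{R}^d} |l_n - u| = 0 \tag{A.5}$$
$$\lim_{n \to \infty} \int_{\mathbb{R}^d} |Dl_n| = \int_{\mathbb{R}^d} |Du|. \tag{A.6}$$
Let $U_t^n := \{x \in \mathbb{R}^d : l_n(x) > t\}$. By (A.5), there exists a countable set $T_1 \subset \mathbb{R}$ with the property that for all $t \notin T_1$
$$\lim_{n \to \infty} \int_{\mathbb{R}^d} |\chi_t - \chi_t^n| = 0, \tag{A.7}$$
where $\chi_t$ is the characteristic function of $\{u(x) > t\}$, and $\chi_t^n$ the one of $\{l_n(x) > t\}$. As noted above, by Sard's theorem and the implicit function theorem, there exists a null set $T_2 \subset \mathbb{R}$ such that for all $t \notin T_2$, $u^{-1}(t)$ is a smooth hypersurface of class $C^d$. We put
$$T := T_1 \cup T_2.$$
Let $t \in \mathbb{R} \setminus T$, $\epsilon > 0$. By Lemma 7.1.2, there exists $g \in C_0^\infty(\mathbb{R}^d, \mathbb{R}^d)$ with $|g| \le 1$ and
$$\left|u^{-1}(t)\right|_{d-1} \le \int_{\{u(x) > t\}} \operatorname{div} g(x) \, dx + \frac{\epsilon}{2}. \tag{A.8}$$
We let $M := \sup_{x \in \mathbb{R}^d} |\operatorname{div} g(x)|$. We choose $n_0$ so large that for $n \ge n_0$
$$\int_{\mathbb{R}^d} |\chi_t - \chi_t^n| \, dx \le \frac{\epsilon}{2M}. \tag{A.9}$$
Then for $n \ge n_0$
$$\left|\int_{\{u(x) > t\}} \operatorname{div} g(x) \, dx - \int_{\{l_n(x) > t\}} \operatorname{div} g(x) \, dx\right| \le M \int_{\mathbb{R}^d} |\chi_t - \chi_t^n| \, dx \le \frac{\epsilon}{2}. \tag{A.10}$$
(A.8) and (A.10) imply
$$\begin{aligned}
\left|u^{-1}(t)\right|_{d-1} &\le \int_{\{u(x) > t\}} \operatorname{div} g(x) \, dx + \frac{\epsilon}{2} \\
&\le \int_{\{l_n(x) > t\}} \operatorname{div} g(x) \, dx + \epsilon \\
&= \int_{\partial\{l_n(x) > t\}} g(x) \cdot n(x) \, d(\partial\{l_n(x) > t\}) + \epsilon,
\end{aligned}$$
$n(x)$ denoting the exterior normal of $\{l_n(x) > t\}$,
$$\le \left|l_n^{-1}(t)\right|_{d-1} + \epsilon.$$
Since $\epsilon > 0$ was arbitrary and this holds for almost all $t$, Fatou's lemma, the coarea formula for the piecewise linear maps $l_n$ (cf. (1)), and (A.6) yield
$$\int_{\mathbb{R}} \left|u^{-1}(t)\right|_{d-1} dt \le \liminf_{n \to \infty} \int_{\mathbb{R}} \left|l_n^{-1}(t)\right|_{d-1} dt = \liminf_{n \to \infty} \int_{\mathbb{R}^d} |Dl_n(x)| \, dx = \int_{\mathbb{R}^d} |Du(x)| \, dx. \tag{A.11}$$
(A.4) and (A.11) easily imply the claim. q.e.d.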
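In dimension $d = 1$ the $(d-1)$-dimensional measure of a level set is simply the number of its points, so (A.1) can be checked by counting. The following sketch is an illustration under assumptions made here: $u(x) = \sin x$ restricted to $\Omega = (0, 2\pi)$ (which is not compactly supported, so it only illustrates the identity itself), and the name `coarea_check` is hypothetical.

```python
import math

def coarea_check(samples=2000, levels=200):
    """Compare both sides of the coarea formula for u(x) = sin(x) on (0, 2*pi)."""
    xs = [2.0 * math.pi * k / samples for k in range(samples + 1)]
    us = [math.sin(x) for x in xs]
    # Left-hand side: integral of |u'|, computed as the sum of absolute increments.
    lhs = sum(abs(us[k + 1] - us[k]) for k in range(samples))
    # Right-hand side: integrate the number of points in u^{-1}(t) over t in (-1, 1).
    dt = 2.0 / levels
    rhs = 0.0
    for j in range(levels):
        t = -1.0 + (j + 0.5) * dt
        crossings = sum(
            1 for k in range(samples) if (us[k] - t) * (us[k + 1] - t) < 0.0
        )
        rhs += crossings * dt
    return lhs, rhs

print(coarea_check())
```

Both sides should be close to $\int_0^{2\pi} |\cos x| \, dx = 4$: every level $t \in (-1, 1)$ is attained exactly twice, and levels with $|t| > 1$ are not attained at all.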
Corollary A.1. Let $u \in C_0^d(\mathbb{R}^d)$, $g : \mathbb{R} \to \mathbb{R}$ integrable, $\Omega \subset \mathbb{R}^d$ open. Then
$$\int_\Omega g(u(x)) \, |Du(x)| \, dx = \int_{-\infty}^{\infty} g(t) \left|u^{-1}(t) \cap \Omega\right|_{d-1} dt. \tag{A.12}$$
Proof. (A.12) follows from Theorem A.1 if $g$ is the characteristic function of an open set and similarly if $g$ is the characteristic function of a measurable set. By considering
$$g^+(t) := \max(0, g(t)), \qquad g^-(t) := \max(0, -g(t))$$
separately, it suffices to consider the case where $g \ge 0$, since always $g(t) = g^+(t) - g^-(t)$. We thus assume $g \ge 0$. Let now $(\rho_n)_{n \in \mathbb{N}} \subset \mathbb{R}^+$ with
$$\lim_{n \to \infty} \rho_n = 0, \qquad \sum_{n=1}^{\infty} \rho_n = \infty,$$
and put inductively
$$A_n := \left\{x \in \mathbb{R} : g(x) \ge \rho_n + \sum_{j < n} \rho_j \chi_{A_j}(x)\right\}.$$
Then for all $x \in \mathbb{R}$
$$g(x) = \sum_{n=1}^{\infty} \rho_n \chi_{A_n}(x). \tag{A.13}$$
Since we observed that (A.12) holds for $\chi_{A_n}$ in place of $g$, the representation in (A.13) in conjunction with Beppo Levi's Theorem 1.2.1 on monotone convergence then implies (A.12) for $g$. q.e.d.

Remark A.1. The coarea formula is due to Federer. It holds more generally for Lipschitz functions $u : \mathbb{R}^d \to \mathbb{R}$. See H. Federer, Geometric Measure Theory, Springer, New York, 1969, pp. 241-760, 268-71.
Appendix B The distance function from smooth hypersurfaces

We also need some elementary results about the (signed) distance function from a smooth hypersurface. Let $\Omega \subset \mathbb{R}^d$ be open with nonempty boundary $\partial\Omega$. We put
$$d(x) := \begin{cases} \operatorname{dist}(x, \partial\Omega) & \text{if } x \in \Omega \\ -\operatorname{dist}(x, \partial\Omega) & \text{if } x \in \mathbb{R}^d \setminus \Omega. \end{cases}$$
$d$ is Lipschitz continuous with Lipschitz constant 1. Namely, for $x, y \in \mathbb{R}^d$, we find $\pi_y \in \partial\Omega$ with $d(y) = |y - \pi_y|$, hence
$$d(x) \le |x - \pi_y| \le |x - y| + |y - \pi_y| = |x - y| + d(y),$$
and interchanging the roles of $x$ and $y$ yields
$$|d(x) - d(y)| \le |x - y|.$$
We now assume that $\partial\Omega$ is of class $C^2$. Let $x_0 \in \partial\Omega$. Let $n(x_0)$ be the outer normal vector of $\Omega$ at $x_0$, and let $T_{x_0}$ be the tangent plane of $\partial\Omega$ at $x_0$. We rotate the coordinates of $\mathbb{R}^d$ so that the $x^d$ coordinate axis is pointing in the direction of $-n(x_0)$. In some neighbourhood $U(x_0)$ of $x_0$, $\partial\Omega$ can then be represented as
$$x^d = f(x') \tag{B.1}$$
with $x' = (x^1, \ldots, x^{d-1})$, where $f \in C^2(T_{x_0} \cap U(x_0))$, $Df(x_0') = 0$. The Hessian $D^2 f(x_0)$ is symmetric, and therefore, after a further rotation of coordinates, it becomes diagonalized,
$$D^2 f(x_0) = \begin{pmatrix} \kappa_1 & & 0 \\ & \ddots & \\ 0 & & \kappa_{d-1} \end{pmatrix}. \tag{B.2}$$
$\kappa_1, \ldots, \kappa_{d-1}$ are the eigenvalues of $D^2 f(x_0)$, and they do not depend on the special position of our coordinates. They are invariants of $\partial\Omega$, and are called the principal curvatures of $\partial\Omega$ at $x_0$. The mean curvature of $\partial\Omega$ at $x_0$ is
$$H(x_0) = \frac{1}{d-1} \sum_{i=1}^{d-1} \kappa_i = \frac{1}{d-1} \Delta f(x_0). \tag{B.3}$$
The outer normal vector $n(x)$ at $x \in \partial\Omega \cap U(x_0)$ has components
$$n^i(x) = \frac{\frac{\partial f}{\partial x^i}(x')}{\left(1 + |Df(x')|^2\right)^{\frac{1}{2}}}, \quad i = 1, \ldots, d-1 \tag{B.4}$$
$$n^d(x) = -\frac{1}{\left(1 + |Df(x')|^2\right)^{\frac{1}{2}}} \tag{B.5}$$
($x' = (x^1, \ldots, x^{d-1})$). In particular
$$\frac{\partial}{\partial x^j} n^i(x_0) = \kappa_i \delta_{ij} \quad \text{for } i, j = 1, \ldots, d-1. \tag{B.6}$$
Lemma B.1. Suppose $\Omega$ is open in $\mathbb{R}^d$ and that $\partial\Omega$ is bounded and of class $C^k$ with $k \ge 2$. For $\eta \in \mathbb{R}$, put
$$\Sigma_\eta := \{x \in \mathbb{R}^d : d(x) = \eta\}.$$
There exists $\epsilon_0 > 0$ (depending on $\partial\Omega$) with the property that for $|\eta| < \epsilon_0$, $\Sigma_\eta$ is a hypersurface of class $C^k$. Also,
$$\lim_{\eta \to 0} |\Sigma_\eta|_{d-1} = |\partial\Omega|_{d-1}. \tag{B.7}$$
Proof. Since $\partial\Omega$ is compact and of class $C^2$, there exists $\epsilon > 0$ with the following property: Whenever $|\eta| < \epsilon$, for each $x_0 \in \partial\Omega$, there exist two unique open balls $B_1, B_2$ with
$$B_1 \subset \Omega, \quad B_2 \subset \mathbb{R}^d \setminus \bar{\Omega}, \quad \bar{B}_1 \cap \partial\Omega = x_0 = \bar{B}_2 \cap \partial\Omega$$
of radius $|\eta|$. The eigenvalues of the Hessian $D^2 f(x_0)$ of a normalized representation $f$ of $\partial\Omega$ at $x_0$ as above then have to lie between $-\frac{1}{\epsilon}$ and $\frac{1}{\epsilon}$, i.e.
$$|\kappa_i| \le \frac{1}{\epsilon} \tag{B.8}$$
for the principal curvatures $\kappa_1, \ldots, \kappa_{d-1}$. If $x$ is a centre of such a ball, then $x_0 = \pi_x$. Also, by uniqueness, these balls depend continuously on $x_0 \in \partial\Omega$. Thus, if $|\eta| < \epsilon$, each $x \in \Sigma_\eta$ is the centre of such a ball, and
$$\pi_x = x + n(x) d(x) \quad \text{with } n(x) := n(\pi_x) \tag{B.9}$$
is the unique point in $\partial\Omega$ with $|x - \pi_x| = |d(x)|$. We once again employ the coordinates used for the definition of $f$ and rewrite (B.9) as
$$x = F(x', d) = (x', f(x')) - n(x', f(x')) \, d. \tag{B.10}$$
Then $F \in C^{k-1}\left((T_{x_0} \cap U(x_0)) \times \mathbb{R}, \mathbb{R}^d\right)$ and at the point $(x_0', d(x))$
$$DF = \begin{pmatrix} 1 - \kappa_1 d(x) & & & 0 \\ & \ddots & & \\ & & 1 - \kappa_{d-1} d(x) & \\ 0 & & & 1 \end{pmatrix} \quad \text{by (B.6).} \tag{B.11}$$
By (B.8) and since $|\eta| < \epsilon$, $\det DF \ne 0$. By the inverse function theorem, $x'$ and $d$ therefore locally are $C^{k-1}$ functions of $x$ (cf. (B.9)). Since
$$d(x) = d(x_0 - \eta n(x_0)) = \eta,$$
we have $Dd(x) \cdot n(x_0) = -1$. Since $d$ is Lipschitz with Lipschitz constant 1, we conclude $|Dd(x)| = 1$ and
$$Dd(x) = -n(x_0) \in C^{k-1}.$$
Thus $d \in C^k$ locally, and the level hypersurfaces $\Sigma_\eta$ are of class $C^k$.

For (B.7), we may w.l.o.g. take $\eta > 0$ as the case $\eta < 0$ succumbs to the same reasoning. We consider the vector field
$$V(x) = Dd(x).$$
The Gauss theorem yields
$$\int_{\{0 < d(x) < \eta\}} \operatorname{div} V = \int_{\Sigma_0} V(x) \cdot n(x) \, d\Sigma_0(x) + \int_{\Sigma_\eta} V(x) \cdot n_\eta(x) \, d\Sigma_\eta(x),$$
where $n_\eta$ is the normal vector of $\Sigma_\eta$ pointing in the direction opposite to $n$. Since the measure of $\{0 < d(x) < \eta\}$ goes to zero with $\eta$ and
$$V(x) = -n(x) \quad \text{for } x \in \Sigma_0 = \partial\Omega,$$
$$V(x) = n_\eta(x) \quad \text{for } x \in \Sigma_\eta,$$
(B.7) easily follows. q.e.d.
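For the unit disc in $\mathbb{R}^2$ the signed distance function is explicit, $d(x) = 1 - |x|$, and the statements of Lemma B.1 can be checked directly. The sketch below is an illustration only; the example domain and the helper names `signed_distance`, `gradient_norm` and `level_set_length` are assumptions made here.

```python
import math

def signed_distance(x, y):
    """Signed distance to the unit circle: positive inside the disc, negative outside."""
    return 1.0 - math.hypot(x, y)

def gradient_norm(x, y, h=1e-6):
    """Central finite-difference approximation of |Dd(x, y)|."""
    dx = (signed_distance(x + h, y) - signed_distance(x - h, y)) / (2.0 * h)
    dy = (signed_distance(x, y + h) - signed_distance(x, y - h)) / (2.0 * h)
    return math.hypot(dx, dy)

def level_set_length(eta):
    """Length of {d = eta}, a circle of radius 1 - eta (valid for eta < 1)."""
    return 2.0 * math.pi * (1.0 - eta)

print(gradient_norm(0.6, 0.3))  # close to 1 away from the centre
for eta in (0.1, 0.01, 0.001):
    print(eta, level_set_length(eta), 2.0 * math.pi)
```

Here both principal curvature bounds are governed by the radius 1, so the level sets are smooth circles for $|\eta| < 1$, and their length $2\pi(1 - \eta)$ tends to $|\partial\Omega|_1 = 2\pi$ as in (B.7).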
References D. Gilbarg, N. Trudinger, Elliptic Partial Differential Equations, Springer, Berlin, 2nd edition, 1983, pp. 354-6.
8 Bifurcation theory

8.1 Bifurcation problems in the calculus of variations
We wish to consider a variational problem depending on a parameter $\lambda$, and to investigate how the space of solutions depends on this parameter. We thus consider
$$I(u, \lambda) := \int_a^b F(t, u(t), \dot{u}(t), \lambda) \, dt.$$
$\lambda$ is supposed to vary in some open set $\Lambda \subset \mathbb{R}^l$. Often, one has $l = 1$. We assume that
$$F : [a, b] \times \mathbb{R}^d \times \mathbb{R}^d \times \Lambda \to \mathbb{R}$$
is sufficiently often differentiable so that all derivatives taken in the sequel exist. For that purpose, one may simply assume that $F$ is of class $C^\infty$ in all its arguments although that is a little stronger than needed in the sequel.

Remark 8.1.1. One may also impose boundary conditions depending on $\lambda$, i.e.
$$u(a) = u_1(\lambda), \qquad u(b) = u_2(\lambda),$$
and finally, one may vary the boundary points themselves,
$$a = a(\lambda), \qquad b = b(\lambda).$$
This latter variation, however, can formally be incorporated in the variation of $F$, by transforming the integral.
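Let
$$\tau(\cdot, \lambda) : [a(\lambda_0), b(\lambda_0)] \to [a(\lambda), b(\lambda)]$$
be a bijective linear map, for some fixed $\lambda_0$. Then
$$\int_{a(\lambda)}^{b(\lambda)} F(\tau, u(\tau), \dot{u}(\tau)) \, d\tau = \int_{a(\lambda_0)}^{b(\lambda_0)} F\big(\tau(t, \lambda), u(\tau(t, \lambda)), \dot{u}(\tau(t, \lambda))\big) \frac{\partial \tau}{\partial t} \, dt,$$
and putting
$$v(t) := u(\tau(t, \lambda)),$$
$$\tilde{F}(t, v(t), \dot{v}(t), \lambda) := F\left(\tau(t, \lambda), v(t), \dot{v}(t) \left(\frac{\partial \tau}{\partial t}\right)^{-1}\right) \frac{\partial \tau}{\partial t},$$
$$\tilde{I}(v, \lambda) := \int_{a(\lambda_0)}^{b(\lambda_0)} \tilde{F}(t, v(t), \dot{v}(t), \lambda) \, dt$$
yields a parameter-dependent variational integral for $v$ with fixed boundary points $a(\lambda_0), b(\lambda_0)$.

As established in Theorem 1.1.1 of Part I, a critical point $u$ of $I(\cdot, \lambda)$ of class $C^2$ satisfies the Euler-Lagrange equations
$$F_{pp}(t, u(t), \dot{u}(t), \lambda) \ddot{u}(t) + F_{pu}(t, u(t), \dot{u}(t), \lambda) \dot{u}(t) + F_{pt}(t, u(t), \dot{u}(t), \lambda) - F_u(t, u(t), \dot{u}(t), \lambda) = 0. \tag{8.1.1}$$
We abbreviate (8.1.1) as
$$L_\lambda u = 0. \tag{8.1.2}$$
In the light of Theorems 1.2.2 and 1.2.4 and Lemma 1.3.1 of Part I, we shall assume
$$\det F_{pp}(t, u(t), \dot{u}(t), \lambda) \ne 0 \tag{8.1.3}$$
for all functions $u$ occurring in the sequel. Equation (8.1.3) implies that (8.1.1) can be solved for $\ddot{u}$ in terms of $u$ and $\dot{u}$, i.e.
$$\begin{aligned}
\ddot{u} &= -F_{pp}(t, u(t), \dot{u}(t), \lambda)^{-1} \left\{F_{pu}(t, u(t), \dot{u}(t), \lambda) \dot{u}(t) + F_{pt}(t, u(t), \dot{u}(t), \lambda) - F_u(t, u(t), \dot{u}(t), \lambda)\right\} \\
&=: f(t, u(t), \dot{u}(t), \lambda),
\end{aligned} \tag{8.1.4}$$
see (1.2.10) of Part I. (8.1.2) thus is equivalent to
$$\ddot{u}(t) - f(t, u(t), \dot{u}(t), \lambda) = 0. \tag{8.1.5}$$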
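The topic of bifurcation theory then is to study the space of solutions of (8.1.5) in its dependence on the parameter $\lambda$. Before approaching this problem from a general point of view in the next section, we should briefly comment on the relations with the Jacobi theory introduced in Section 1.3 of Part I. For a critical point $u$ of $I(\cdot, \lambda)$ and $\eta \in D_0^1(I, \mathbb{R}^d)$, we had established the expansion
$$I(u + s\eta, \lambda) = I(u, \lambda) + \frac{1}{2} s^2 \delta^2 I(u, \eta, \lambda) + o(s^2) \quad \text{for } s \to 0, \tag{8.1.6}$$
with
$$\begin{aligned}
\delta^2 I(u, \eta, \lambda) = Q_\lambda(\eta) &:= \frac{d^2}{ds^2} \int_a^b F(t, u(t) + s\eta(t), \dot{u}(t) + s\dot{\eta}(t), \lambda) \, dt \Big|_{s=0} \\
&= \int_a^b \left\{F_{p^i p^j}(t, u, \dot{u}, \lambda) \dot{\eta}_i \dot{\eta}_j + 2 F_{p^i u^j}(t, u, \dot{u}, \lambda) \dot{\eta}_i \eta_j + F_{u^i u^j}(t, u, \dot{u}, \lambda) \eta_i \eta_j\right\} dt,
\end{aligned} \tag{8.1.7}$$
abbreviated as
$$\int_a^b \left\{F_{\lambda, pp} \dot{\eta} \dot{\eta} + 2 F_{\lambda, pu} \dot{\eta} \eta + F_{\lambda, uu} \eta \eta\right\} dt.$$
Critical points of $Q_\lambda$ satisfy the Jacobi equations
$$J_\lambda(u) \eta := \frac{d}{dt} \left(F_{pp}(t, u, \dot{u}, \lambda) \dot{\eta} + F_{pu}(t, u, \dot{u}, \lambda) \eta\right) - F_{pu}(t, u, \dot{u}, \lambda) \dot{\eta} - F_{uu}(t, u, \dot{u}, \lambda) \eta = 0. \tag{8.1.8}$$
$J_\lambda(u)$ is called the Jacobi operator associated with the critical point $u$ of $I(\cdot, \lambda)$. We also observe that
$$J_\lambda(u) \eta = \frac{d}{ds} L_\lambda(u + s\eta) \Big|_{s=0}. \tag{8.1.9}$$
Of course, this is not surprising since $L_\lambda$ represents the first variation of $I(\cdot, \lambda)$ and $J_\lambda$ the second one. From the expansion (8.1.6) we see that
$$I(u + s\eta, \lambda) > I(u, \lambda) \quad \text{for sufficiently small } |s| > 0 \text{ if } \delta^2 I(u, \eta, \lambda) > 0. \tag{8.1.10}$$
No such conclusions can be achieved, however, if
$$\delta^2 I(u, \eta, \lambda) = 0. \tag{8.1.11}$$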
269
Now by Lemma 1.3.2 of Part I, for a Jacobi field f] that vanishes at the boundary points a and 6, (8.1.11) holds. This indicates that Jacobi fields play a decisive role for deciding about the minimizing property of a critical point u of /(-, A). Jacobi fields satisfy J\(U)TI = Q,
(8.1.12)
i.e. are solutions of the linearization of the equation L\U = 0 satisfied by u. This also indicates that Jacobi fields will play a decisive role in analysing the bifurcation behaviour of L\u — 0 as A varies. Namely, in finite dimensional problems, the presence of a nontrivial solution of the linearization of a parameter-dependent equation L\u = 0 at some parameter value Ao either results from a nontrivial family u(r) of solutions of L\0U(T) — 0 by differentiating the equation w.r.t the parameter r, or it indicates a nontrivial bifurcation as A varies in the vicinity of Ao- In the next section, we shall see that under appropriate assumptions, the same also holds in the present infinite dimensional context. In fact, the bifurcation problem will be reduced to a finite dimensional one via Lyapunov-Schmid reduction. The reason why this is possible in our variational context is that under our assumption (8.1.3), the space of Jacobi fields is always finite dimensional. Namely, analogously to (8.1.4), (8.1.5), the assumption (8.1.3) implies that (8.1.8) can be solved w.r.t 77, i.e fj - (p(t, u, ii, 77,77, A) = 0.
(8.1.13)
(Although this is not indicated by the notation, (8.1.13) is a linear equation for 77, and so the space of solutions is a linear space.) Now suppose that we have a sequence (?7n)n£N of solutions of (8.1.13) (for fixed u, A) that are bounded in some appropriate function space like C2(I) or W2,2(I). For concreteness, let us consider C 2 (7), i.e. for example \\r)n\\C2(I)
< 1-
By the Arzela-Ascoli theorem, after selection of a subsequence, (f]n)neN then converges in Cl(I) to some limit denoted by 770. (8.1.13) then implies that (rj)nen converges in C°(I) (as it follows from our assumptions on the differentiability of F that ? is smooth, in particular continuous). Thus (since the uniform limit of derivatives is the derivative of the limit), (77n)neN converges in C2{I) to 770, and consequently 770 also solves (8.1.13). From this compactness result, one easily deduces that the space of solutions of (8.1.13) has finite dimension.
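The finite dimensional statement above can be seen in the simplest model case. The following sketch uses the scalar equation $L_\lambda u = \lambda u - u^3 = 0$, a standard pitchfork example chosen here for illustration (it does not come from the text, and the function names are hypothetical). The linearization at $u = 0$ has a nontrivial kernel exactly at $\lambda_0 = 0$, and this is where the number of solutions changes.

```python
import math

def solutions(lam):
    """Real solutions u of the model equation L(u, lam) = lam*u - u**3 = 0."""
    if lam <= 0.0:
        return [0.0]
    root = math.sqrt(lam)
    return [-root, 0.0, root]

def linearization(u, lam):
    """Derivative of L with respect to u, the finite dimensional analogue of J_lambda(u)."""
    return lam - 3.0 * u * u

for lam in (-0.1, 0.0, 0.1):
    print(lam, solutions(lam), linearization(0.0, lam))
```

For $\lambda \ne 0$ the linearization at $u = 0$ is invertible and the trivial branch is locally unique; at $\lambda = 0$ it vanishes and two further solutions $\pm\sqrt{\lambda}$ branch off.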
8.2 The functional analytic approach to bifurcation theory
We consider the following general situation. We have Banach spaces $V$, $W$, and a parameter space $\Lambda$. We assume that $\Lambda$ is an open subset of some Banach space. We consider a parameter dependent family of equations
$$L_\lambda u = 0, \tag{8.2.1}$$
with
$$L : V \times \Lambda \to W, \qquad (u, \lambda) \mapsto L_\lambda u.$$
We assume that $L_\lambda u$ is sufficiently often differentiable w.r.t. $u$ and $\lambda$ so that all subsequent expansions are valid. The aim of bifurcation theory is to study the set of solutions $u$ of (8.2.1) as $\lambda$ varies, to identify the bifurcation values of $\lambda$, i.e. those values of $\lambda$ where the structure of the solution set changes, and to investigate that structure at such bifurcation points. In order to arrive at concrete results, we need an additional assumption. We consider the derivative of $L_\lambda u$ w.r.t. $u$,
$$J_\lambda(u) v := (D_u L_\lambda(u)) v := \frac{d}{dt} L_\lambda(u + tv) \Big|_{t=0} \tag{8.2.2}$$
for $v \in V$. We assume that $J_\lambda$ is a Fredholm operator of index 0, i.e. that $\ker J_\lambda$ and $\operatorname{coker} J_\lambda$ are of finite and equal dimension, and furthermore that there exists a canonical isomorphism
$$\ker J_\lambda \cong \operatorname{coker} J_\lambda. \tag{8.2.3}$$
We first consider the case where
$$L_{\lambda_0} u_0 = 0 \tag{8.2.4}$$
$$\ker J_{\lambda_0}(u_0) = \{0\} \quad \text{for some } \lambda_0 \in \Lambda,\ u_0 \in V. \tag{8.2.5}$$
We shall see that in this case, no bifurcation can occur at $\lambda_0$. Namely, we have:

Theorem 8.2.1. Let $L_{\lambda_0} u_0 = 0$ for some $\lambda_0 \in \Lambda$, $u_0 \in V$, $\ker J_{\lambda_0}(u_0) = \{0\}$. Then there exist neighbourhoods $U(\lambda_0)$ of $\lambda_0$ in $\Lambda$ and $V(u_0)$ of $u_0$ in $V$ such that for all $\lambda \in U(\lambda_0)$, there exists a unique $u \in V(u_0)$ with
$$L_\lambda u = 0.$$
Proof. Since $J_{\lambda_0}$ is assumed to be a Fredholm operator of index 0, (8.2.5) implies that $J_{\lambda_0} : V \to W$ is an isomorphism. Thus the derivative w.r.t. the variable $u$ of the map $(u,\lambda) \mapsto L_\lambda u$ is an isomorphism at $(u_0,\lambda_0)$, and the implicit function Theorem 2.4.1 implies that the equation $L_\lambda u = 0$ can be locally resolved w.r.t. $u$, i.e. there exist neighbourhoods $U(\lambda_0)$, $V(u_0)$ and a map
$$U(\lambda_0) \to V(u_0), \qquad \lambda \mapsto u(\lambda)$$
such that $L_\lambda u = 0$ precisely if $u = u(\lambda)$. q.e.d.

We next consider the case where
$$L_{\lambda_0} u_0 = 0, \tag{8.2.6}$$
$$K := \ker J_{\lambda_0}(u_0) \ \text{is one-dimensional.} \tag{8.2.7}$$
The assumption that this kernel is one-dimensional may look restrictive, but it is typically satisfied in variational problems, and in this situation, we can already see the typical phenomena of bifurcation while avoiding additional technical complications that arise for higher dimensional kernels. In the sequel, we shall assume for simplicity $u_0 = 0$ (which can always be achieved by changing the dependent variables in our equation by a translation). In the sequel, we shall also usually write $J_{\lambda_0}$ in place of $J_{\lambda_0}(u_0) = J_{\lambda_0}(0)$. We may write
$$V = K \oplus V_1, \tag{8.2.8}$$
and in view of (8.2.3), we may also write
$$W = K \oplus W_1, \tag{8.2.9}$$
with
$$W_1 = J_{\lambda_0}(V) = J_{\lambda_0}(V_1). \tag{8.2.10}$$
We denote by
$$\pi : V \to K$$
the projection onto $K$ according to (8.2.8), and we consider $\pi(V)$ as a subspace of $W$, according to (8.2.9). Thus, if $u = \xi + w$ with $\xi \in K$, $w \in V_1$, then $\pi(u) = \xi$. In particular, $\pi(0) = 0$. We consider the map
$$A_{\lambda_0} : V \to W, \qquad u \mapsto L_{\lambda_0} u + \pi(u).$$
Lemma 8.2.1. $A_{\lambda_0}$ is a local diffeomorphism, i.e. the derivative $DA_{\lambda_0} = DA_{\lambda_0}(0) : V \to W$ is an isomorphism.

Proof. The derivative is computed as
$$DA_{\lambda_0} v = J_{\lambda_0} v + \pi(v) \quad \text{for } v \in V. \tag{8.2.11}$$
The Fredholm operator $J_{\lambda_0}$ yields a bijective continuous linear map between $V_1$ and $W_1$ because of the decompositions (8.2.8), (8.2.9), (8.2.10), and its inverse is likewise continuous (by Definition 2.3.1). From the definition of $K$ and $\pi$ and (8.2.3) we then conclude that $DA_{\lambda_0}$ is an isomorphism. q.e.d.
We now consider the map
$$A : V \times \Lambda \to W, \qquad (u,\lambda) \mapsto A_\lambda(u) := L_\lambda(u) + \pi(u).$$
By Lemma 2.3.4, there exists a neighbourhood $V(\lambda_0)$ of $\lambda_0$ in $\Lambda$ such that for all $\lambda \in V(\lambda_0)$, $A_\lambda$ is a local diffeomorphism near $0$. We may therefore apply the implicit function Theorem 2.4.1. Consequently, as
$$A(0,\lambda_0) = 0, \tag{8.2.12}$$
there exist neighbourhoods $U(0)$ of $u_0 = 0$ in $V$, $U_1(0)$ of $0$ in $W$ such that for all $\lambda \in V(\lambda_0)$ and $\xi \in U_1(0)$, there exists a unique $u \in U(0)$ with
$$A(u,\lambda) = \xi, \tag{8.2.13}$$
i.e.
$$L_\lambda u + \pi(u) = \xi. \tag{8.2.14}$$
We write $u = u(\xi,\lambda)$ for the solution $u$ of (8.2.13). We have in particular
$$u(0,\lambda_0) = 0, \tag{8.2.15}$$
since $L_{\lambda_0} 0 = 0$, $\pi(0) = 0$ (remember $u_0 = 0$). In this notation, (8.2.13) is $A(u(\xi,\lambda),\lambda) = \xi$. The aim now is to find $\xi$ with
$$\pi(u(\xi,\lambda)) = \xi, \tag{8.2.16}$$
because (8.2.14) will then give
$$L_\lambda u(\xi,\lambda) = 0, \tag{8.2.17}$$
which is the equation that we wish to solve. Since the image of $\pi$ is assumed to be one-dimensional (and in any case finite dimensional as $J_\lambda$ is supposed to be a Fredholm operator), we have reduced our bifurcation problem to a finite dimensional problem. In the sequel, we shall thus let $\xi$ vary only in $K$, the image of $\pi$. Thus, we may consider $\xi$ as a scalar quantity, $\xi = \alpha\xi_0$, with $\alpha \in \mathbb{R}$, where $\xi_0$ is a generator of $K$. We denote the derivative of $u(\alpha\xi_0,\lambda)$ w.r.t. $\alpha$ and $\lambda$, respectively, at $\alpha = 0$, $\lambda = \lambda_0$ by $\partial_\alpha u$ and $\partial_\lambda u$, respectively. (Note that $\lambda$ in general is not a scalar quantity, as we do not assume that $\Lambda$ is one-dimensional.) Differentiating (8.2.14) w.r.t. $\alpha$ yields
$$J_{\lambda_0}\partial_\alpha u + \pi(\partial_\alpha u) = \xi_0. \tag{8.2.18}$$
Since $\xi_0 \in K$, also
$$J_{\lambda_0}\xi_0 + \pi(\xi_0) = \xi_0. \tag{8.2.19}$$
Lemma 8.2.1 then implies
$$\partial_\alpha u = \xi_0. \tag{8.2.20}$$
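We are now ready for the essential point, namely the asymptotic expansion of the equation (8.2.16), i.e.
$$\pi(u(\xi,\lambda)) = \xi \tag{8.2.21}$$
near $\xi = 0$, $\lambda = \lambda_0$. We let $\partial^2_\alpha u$, $\partial^2_\lambda u$ be the second derivatives of $u(\alpha\xi_0,\lambda)$ w.r.t. $\alpha$ and $\lambda$, respectively, at $\alpha = 0$, $\lambda = \lambda_0$, and likewise $\partial^2_{\alpha\lambda} u$ be the mixed second derivative w.r.t. $\alpha$ and $\lambda$. Higher derivatives will be denoted similarly by corresponding symbols. The Taylor expansion of (8.2.16) then is
$$\xi = \pi(u(\xi,\lambda)) = \pi(0) + \pi(\partial_\alpha u)\alpha + \pi(\partial_\lambda u)\mu + \tfrac12\pi(\partial^2_\alpha u)\alpha^2 + \pi(\partial^2_{\alpha\lambda} u)\alpha\mu + \tfrac12\pi(\partial^2_\lambda u)(\mu,\mu) + \text{terms of higher order in } \alpha \text{ and } \mu. \tag{8.2.22}$$
Since $\pi(0) = 0$ and since, by (8.2.20), $\partial_\alpha u = \xi_0$, hence $\pi(\partial_\alpha u)\alpha = \alpha\xi_0 = \xi$, we may write (8.2.22) as
$$0 = \pi(\partial_\lambda u)\mu + \tfrac12\pi(\partial^2_\alpha u)\alpha^2 + \text{higher order terms in } \alpha \text{ only} + \pi(\partial^2_{\alpha\lambda} u)\alpha\mu + \text{higher order terms that also involve } \mu. \tag{8.2.23}$$

Remark 8.2.1. In order to interpret the terms in this expansion, we differentiate (8.2.14), i.e.
$$L_\lambda u(\xi,\lambda) + \pi(u(\xi,\lambda)) = \alpha\xi_0 \tag{8.2.24}$$
twice w.r.t. $\alpha$. One differentiation yields
$$J_\lambda(u)\partial_\alpha u + \pi(\partial_\alpha u) = \xi_0, \tag{8.2.25}$$
and differentiating once more gives
$$DJ_\lambda(\partial_\alpha u)^2 + J_\lambda\partial^2_\alpha u + \pi(\partial^2_\alpha u) = 0. \tag{8.2.26}$$
We put $\lambda = \lambda_0$ and project onto $K$ in the decomposition (8.2.9). We may also denote that projection by $\pi$, and we then have $\pi \circ J_{\lambda_0} = 0$. Also, from (8.2.20), $\partial_\alpha u = \xi_0$, and so we get
$$\pi(DJ_{\lambda_0}\xi_0^2) = -\pi(\partial^2_\alpha u). \tag{8.2.27}$$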
Thus, the first term in the expansion in $\alpha$ in (8.2.24) can be expressed via $DJ_\lambda$. In a variational context, $J_\lambda$ represents the second variation, and so $DJ_\lambda$ represents the third variation of the variational integral. Likewise, if $\partial^2_\alpha u$ vanishes, i.e. if the third variation vanishes on the Jacobi field $\xi_0$, then $\pi(\partial^3_\alpha u)$ can be expressed by the fourth variation, and so on.

We now discuss the simplest case of a bifurcation, namely where
$$\pi(\partial^2_\alpha u) \neq 0. \tag{8.2.28}$$
We put $\alpha =: t\tau$, $\mu =: t^2\bar\mu$, $a_0 := \pi(\partial_\lambda u)\bar\mu$, $a_1 := \tfrac12\pi(\partial^2_\alpha u)$, and (8.2.23) becomes
$$0 = a_0 t^2 + a_1 t^2\tau^2 + t^2 E(t,\tau,\bar\mu) \tag{8.2.29}$$
with
$$E(t,\tau,\bar\mu) = O(t) \quad \text{for fixed } \tau, \bar\mu \text{ for } t \to 0. \tag{8.2.30}$$
For $t \neq 0$, (8.2.29) is equivalent to
$$0 = a_0 + a_1\tau^2 + E(t,\tau,\bar\mu). \tag{8.2.31}$$
We shall now see by a simple application of the implicit function theorem that the bifurcation behaviour of equation (8.2.31) is equivalent to the one of
$$0 = a_0 + a_1\tau^2. \tag{8.2.32}$$
We assume $a_0 \neq 0$; as will be discussed below (see Lemma 8.2.2), this can be derived from a suitable assumption about the variation of $L_\lambda$ as a function of $\lambda$. (8.2.28) of course means that $a_1 \neq 0$. If $\frac{a_0}{a_1} > 0$, then there is no solution $\tau$ of (8.2.32), whereas for $\frac{a_0}{a_1} < 0$, we have two solutions $\tau_1, \tau_2$. We keep $\bar\mu$ fixed for the moment and write (8.2.31) as
$$0 = a_0 + a_1\tau^2 + E(t,\tau,\bar\mu) =: \Phi(t,\tau). \tag{8.2.33}$$
We consider the case $\frac{a_0}{a_1} < 0$ with the solutions $\tau_1, \tau_2$ of (8.2.32). As $E(0,\tau,\bar\mu) = 0$, we have
$$\Phi(0,\tau_i) = 0 \quad \text{for } i = 1,2, \tag{8.2.34}$$
whereas
$$\frac{\partial}{\partial\tau}\Phi(0,\tau_i) \neq 0, \tag{8.2.35}$$
because of $a_0, a_1 \neq 0$ and (8.2.30). The implicit function theorem then implies the existence of (locally unique) functions
$$\tau_i(t) : (-\varepsilon,\varepsilon) \to \mathbb{R} \quad \text{for } i = 1,2,$$
for some $\varepsilon > 0$, that satisfy, for $0 < |t| < \varepsilon$,
$$\Phi(t,\tau_i(t)) = 0. \tag{8.2.36}$$
We have thus found two solutions $\tau_1(t), \tau_2(t)$ of (8.2.33), hence (8.2.22), hence (8.2.16), hence (8.2.17), i.e. (8.2.1) for $t \neq 0$, for the parameters
$$\lambda_t = \lambda_0 + t^2\bar\mu. \tag{8.2.37}$$
In the other case, $\frac{a_0}{a_1} > 0$, (8.2.30) implies that for sufficiently small $|t| \neq 0$, there is no solution of (8.2.33), i.e. of (8.2.1). Thus, as promised, the bifurcation behaviour in case $\pi(\partial^2_\alpha u) \neq 0$ ((8.2.28)) is completely described by the simple quadratic equation (8.2.32). Of course, replacing $\bar\mu$ by $-\bar\mu$ changes the sign of $a_0$ and thus interchanges the cases $\frac{a_0}{a_1} > 0$ and $\frac{a_0}{a_1} < 0$.
We summarize our result in:

Theorem 8.2.2. We consider a parameter dependent family of equations
$$L_\lambda u = 0 \tag{8.2.38}$$
as above,
$$V \times \Lambda \to W, \qquad (u,\lambda) \mapsto L_\lambda u,$$
where $V$, $W$ are Banach spaces and $\Lambda$ is an open subset of some Banach space, and $L_\lambda u$ is smooth in $u$ and $\lambda$. We suppose that $L_{\lambda_0} 0 = 0$, and we wish to find the solutions of $L_\lambda u = 0$ in the vicinity of $0$ as $\lambda$ varies in the vicinity of $\lambda_0$. With
$$J_\lambda(u)v := (D_u L_\lambda(u))v = \frac{d}{dt} L_\lambda(u + tv)\Big|_{t=0},$$
we assume that there is a canonical isomorphism
$$\ker J_\lambda \cong \operatorname{coker} J_\lambda \quad \text{(see (8.2.3))}, \tag{8.2.39}$$
and we let
$$\pi : V \to \ker J_{\lambda_0} \qquad (J_{\lambda_0} = J_{\lambda_0}(0))$$
be a projection as defined above (see (8.2.8)-(8.2.10)). We assume furthermore that
$$\dim\ker J_{\lambda_0} = 1 \quad \text{(see (8.2.7))}. \tag{8.2.40}$$
We assume that there exists some $\mu$ with
$$a_0 := \pi(\partial_\lambda u)\mu \ \Big(= \frac{d}{dt}\pi(u(0,\lambda_0 + t\mu))\Big|_{t=0}\Big) \neq 0 \tag{8.2.41}$$
(see Lemma 8.2.2 below), and also
$$2a_1 := \pi(\partial^2_\alpha u) \neq 0 \tag{8.2.42}$$
(nonvanishing of the third variation, see Remark 8.2.1). Then there exist $\varepsilon > 0$ and a variation $\lambda_t = \lambda_0 + t^2\mu$ of $\lambda_0$ with the property that for $0 < t < \varepsilon$, there exists a neighbourhood $U_t$ of $0$ in $V$ such that the number of solutions $u \in U_t$ of
$$L_{\lambda_t} u = 0 \tag{8.2.43}$$
equals the number of solutions of the quadratic equation
$$a_0 + a_1\tau^2 = 0. \tag{8.2.44}$$
q.e.d.
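As a small illustration of how the count in (8.2.44) is read off, the following sketch decides between the two cases $a_0/a_1 > 0$ and $a_0/a_1 < 0$ for given scalar coefficients. The function names are hypothetical and chosen only for this illustration; the sketch assumes $a_0 \neq 0$ and $a_1 \neq 0$ as in the theorem.

```python
import math


def count_quadratic_solutions(a0: float, a1: float) -> int:
    """Number of real solutions tau of a0 + a1 * tau**2 = 0 (assumes a0, a1 != 0)."""
    if a0 == 0 or a1 == 0:
        raise ValueError("the theorem assumes a0 != 0 and a1 != 0")
    # tau**2 = -a0 / a1 has two real roots if a0/a1 < 0 and none if a0/a1 > 0.
    return 2 if a0 / a1 < 0 else 0


def quadratic_solutions(a0: float, a1: float) -> list:
    """The real solutions tau_1 < tau_2, or an empty list if there are none."""
    if count_quadratic_solutions(a0, a1) == 0:
        return []
    r = math.sqrt(-a0 / a1)
    return [-r, r]
```

Replacing `a0` by `-a0`, which corresponds to reversing the direction of the parameter variation, switches between the two cases.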
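Remark 8.2.2. Since $\ker J_{\lambda_0}$, the image of $\pi$, is assumed to be one-dimensional, we have simply considered $\pi(\partial_\lambda u)$, $\pi(\partial^2_\alpha u)$ as scalar quantities.

We now consider the case where
$$\pi(\partial^2_\alpha u) = 0, \tag{8.2.45}$$
but
$$\pi(\partial^3_\alpha u) \neq 0. \tag{8.2.46}$$
(8.2.23) then becomes
$$0 = \pi(\partial_\lambda u)\mu + \tfrac16\pi(\partial^3_\alpha u)\alpha^3 + \pi(\partial^2_{\alpha\lambda} u)\alpha\mu + \text{higher order terms}. \tag{8.2.47}$$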
For a complete description of the bifurcation behaviour, this time we need to consider a two parameter variation. We assume that there exist $\mu_1, \mu_2$ with
$$\pi(\partial_\lambda u)\mu_1 \neq 0 \quad \text{(see Lemma 8.2.2 below)} \tag{8.2.48}$$
and
$$\pi(\partial^2_{\alpha\lambda} u)\mu_2 \neq 0, \quad \text{but} \quad \pi(\partial_\lambda u)\mu_2 = 0. \tag{8.2.49}$$
We put $\alpha := t\tau$, $\mu = t^3 b_1\mu_1 + t^2 b_2\mu_2$, with parameters $b_1, b_2$, and rewrite (8.2.47) as
$$0 = t^3\Big(\pi(\partial_\lambda u)\mu_1 b_1 + \pi(\partial^2_{\alpha\lambda} u)\mu_2 b_2\tau + \tfrac16\pi(\partial^3_\alpha u)\tau^3 + E(t,\tau,\mu_1,\mu_2)\Big) =: c_0 t^3\big(a_0 + a_1\tau + \tau^3 + E(t,\tau,\mu_1,\mu_2)\big), \tag{8.2.50}$$
with $c_0 = \tfrac16\pi(\partial^3_\alpha u) \neq 0$ (see (8.2.46)). For $t \neq 0$, this is equivalent to
$$0 = a_0 + a_1\tau + \tau^3 + E(t,\tau,\mu_1,\mu_2). \tag{8.2.51}$$
Again
$$E(t,\tau,\mu_1,\mu_2) = O(t) \quad \text{as } t \to 0, \text{ for fixed } \tau, \mu_1, \mu_2. \tag{8.2.52}$$
As before, we may thus invoke the implicit function theorem to conclude that the qualitative description of the bifurcation behaviour is furnished by the solution structure of the cubic equation
$$0 = a_0 + a_1\tau + \tau^3. \tag{8.2.53}$$
In particular, locally there exist at most three solutions. We summarize our result in:

Theorem 8.2.3. As in Theorem 8.2.2, assume the general conditions (8.2.38)-(8.2.40). Furthermore, we assume that there exist parameter variations $\mu_1, \mu_2$ with
$$\pi(\partial_\lambda u)\mu_1 \neq 0, \tag{8.2.54}$$
$$\pi(\partial^2_{\alpha\lambda} u)\mu_2 \neq 0, \quad \text{but} \quad \pi(\partial_\lambda u)\mu_2 = 0 \tag{8.2.55}$$
(see (8.2.48), (8.2.49)).
Then there exist $\varepsilon > 0$ and a two-parameter variation of $\lambda_0$,
$$\lambda_t = \lambda_0 + t^3 b_1\mu_1 + t^2 b_2\mu_2, \tag{8.2.56}$$
such that for $0 < t < \varepsilon$, there exists a neighbourhood $U_t$ of $0$ in $V$ for which the number of solutions $u \in U_t$ of
$$L_{\lambda_t} u = 0 \tag{8.2.57}$$
equals the number of solutions of the cubic equation
$$a_0 + a_1\tau + \tau^3 = 0. \tag{8.2.58}$$
($a_0 = 6\pi(\partial_\lambda u)b_1\mu_1/\pi(\partial^3_\alpha u)$, $a_1 = 6\pi(\partial^2_{\alpha\lambda} u)b_2\mu_2/\pi(\partial^3_\alpha u)$, noting Remark 8.2.2 again.) q.e.d.
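The number of real solutions of the cubic equation (8.2.58) can be read off from the sign of its discriminant $-4a_1^3 - 27a_0^2$. The following sketch, with a hypothetical function name, only illustrates this standard fact for given scalar coefficients $a_0, a_1$; it is not part of the proof.

```python
def count_cubic_solutions(a0: float, a1: float) -> int:
    """Number of distinct real solutions tau of a0 + a1 * tau + tau**3 = 0."""
    # Discriminant of the depressed cubic tau**3 + a1 * tau + a0.
    disc = -4.0 * a1 ** 3 - 27.0 * a0 ** 2
    if disc > 0:
        return 3  # three distinct real roots
    if disc < 0:
        return 1  # one real root and a pair of complex conjugate roots
    # disc == 0: a multiple root; the triple root tau = 0 occurs only for a0 = a1 = 0.
    return 1 if (a0 == 0 and a1 == 0) else 2
```

The parameter values with vanishing discriminant, $4a_1^3 + 27a_0^2 = 0$, form the cusp-shaped curve in the $(a_0, a_1)$-plane across which the number of solutions changes.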
What we are seeing in Theorem 8.2.3 is the so-called cusp catastrophe (in the language of R. Thom's theory of catastrophes), the bifurcation of the zero set of a cubic polynomial depending on the parameters $a_0, a_1$. In the same manner, one may also identify conditions where the bifurcation behaviour is described by other so-called elementary catastrophes, as classified by R. Thom (see e.g. Th. Bröcker, Differentiable Germs and Catastrophes, LMS Lect. Notes 17, Cambridge Univ. Press, Cambridge, 1975). The higher the order of the polynomial involved, however, the more independent parameters one needs. The general idea is that the singular behaviour at a bifurcation point, in particular the nonsmooth structure of the solution set at such a point, is simply the result of the projection of a smooth hypersurface in the product of the solution space and the parameter space onto the solution space. The singularity arises because that hypersurface happens to have a vertical tangent plane over the solution space at the bifurcation point.

In order to discuss the assumption (8.2.41), (8.2.54), we provide

Lemma 8.2.2. Assume that for every $\beta \in \mathbb{R}$, there exists some $\mu$ with
$$\beta = \pi(D_\lambda L_{\lambda_0} u(0,\lambda_0))(\mu) \ \Big(:= \pi\Big(\frac{d}{dt} L_{\lambda_0 + t\mu} u(0,\lambda_0)\Big)\Big|_{t=0}\Big). \tag{8.2.59}$$
(Again, we write $\beta$ in place of $\beta\xi_0$ and consider it as a scalar quantity, as the image of $\pi$ is assumed to be one-dimensional.) Then for every $\beta \in \mathbb{R}$, there exists some $\mu$ with
$$\pi((\partial_\lambda u)\mu) = \beta. \tag{8.2.60}$$

Proof. We abbreviate $\lambda_t = \lambda_0 + t\mu$ (as $\Lambda$ is open, $\lambda_0 + t\mu \in \Lambda$ for sufficiently small $|t|$). By (8.2.14)
$$L_{\lambda_t} u(\xi,\lambda_t) + \pi(u(\xi,\lambda_t)) = \xi.$$
Taking the derivative w.r.t. $t$ at $t = 0$ and $\xi = 0$ gives
$$\pi((\partial_\lambda u)\mu) = -\frac{d}{dt}\big(L_{\lambda_t} u(0,\lambda_t)\big)\Big|_{t=0} = -\Big(\frac{d}{dt} L_{\lambda_t}\Big) u(0,\lambda_0)\Big|_{t=0} - (D_u L_{\lambda_0})\frac{d}{dt} u(0,\lambda_t)\Big|_{t=0}.$$
Since $D_u L_{\lambda_0} = J_{\lambda_0}$, and $\pi \circ J_{\lambda_0} = 0$ by definition of $\pi$, applying $\pi$ to both sides of the preceding equation gives
$$\pi((\partial_\lambda u)\mu) = -\pi(D_\lambda L_{\lambda_0} u(0,\lambda_0))(\mu),$$
and by assumption (8.2.59), we may find $\mu$ for which the right-hand side becomes $\beta\xi_0$. (We take $-\beta$ in place of $\beta$ in (8.2.59).) q.e.d.

The approach to bifurcation theory presented here originated with L. Lichtenstein, Untersuchung über zweidimensionale reguläre Variationsprobleme, Monatsh. Math. Phys. 28 (1917), and was developed in X. Li-Jost, Eindeutigkeit und Verzweigung von Minimalflächen, Thesis, Bonn, 1991, see also X. Li-Jost, Bifurcation near solutions of variational problems with degenerate second variation, Manuscr. Math. 86 (1995), 1-14, J. Jost, X. Li-Jost, X. W. Peng, Bifurcation of minimal surfaces in Riemannian manifolds, Trans. AMS 347 (1995), 51-62, Correction ibid. 349 (1997), 4689-90.

The reduction of a bifurcation problem in an infinite dimensional setting to a finite dimensional one is an example of the Lyapunov-Schmidt reduction which we now wish to discuss. As before, we consider a parameter dependent family of equations
$$L_\lambda u = 0 \tag{8.2.61}$$
with
$$V \times \Lambda \to W, \qquad (u,\lambda) \mapsto L_\lambda u$$
($V$, $W$ Banach spaces, $\Lambda$ an open subset of some Banach space) near $(u_0,\lambda_0)$ with
$$L_{\lambda_0} u_0 = 0. \tag{8.2.62}$$
Again, we assume that $J_\lambda(u) = D_u L_\lambda(u)$ is a Fredholm operator. Thus
$$V_0 := \ker J_{\lambda_0}(u_0)$$
is finite dimensional, and we have decompositions
$$V = V_0 \oplus V_1, \tag{8.2.63}$$
$$W = W_0 \oplus W_1, \quad \text{with } W_1 = R(J_{\lambda_0}(u_0)),\ W_0 \text{ finite dimensional.} \tag{8.2.64}$$
($R$ denotes the range of an operator as in Definition 2.3.1.) We let
$$\pi : W \to W_0$$
be the projection defined by the decomposition (8.2.64). Then our equation $L_\lambda u = 0$ is equivalent to
$$\pi L_\lambda u = 0 \quad \text{and} \quad (\mathrm{Id} - \pi) L_\lambda u = 0. \tag{8.2.65}$$
We first consider
$$(\mathrm{Id} - \pi) L : V_1 \times V_0 \times \Lambda \to W_1,$$
and with $X := V_0 \times \Lambda$, we write
$$(\mathrm{Id} - \pi) L_\lambda(v'' + v') =: g(v'', v', \lambda) \quad \text{with } v' \in V_0,\ v'' \in V_1,\ \lambda \in \Lambda.$$
Then, at $v'' + v' = u_0$,
$$D_{v''} g(v'', v', \lambda_0) = (\mathrm{Id} - \pi) D_{v''} L_{\lambda_0}(v'' + v') : V_1 \to W_1$$
is an isomorphism by definition of $V_1$, $W_1$; namely it is simply $J_{\lambda_0}(u_0)$, considered as a map from $V_1$ to $W_1$. Therefore, by the implicit function Theorem 2.4.1, near $(u_0,\lambda_0)$, we may find a unique $v'' = \varphi(v',\lambda)$ with
$$g(\varphi(v',\lambda), v', \lambda) = 0. \tag{8.2.66}$$
Thus $u = v' + \varphi(v',\lambda)$ solves $L_\lambda u = 0$ precisely if
$$\pi L_\lambda(v' + \varphi(v',\lambda)) = 0. \tag{8.2.67}$$
Equation (8.2.67) is a finite dimensional system of equations, because the image of $\pi$, $W_0$, is finite dimensional. This is a Lyapunov-Schmidt reduction, and we have seen an instance of this in detail in the preceding for the case where $V_0$ and $W_0$ are one-dimensional. A general reference for this and other topics and methods in bifurcation theory is S. N. Chow, J. Hale, Methods of Bifurcation Theory, Springer, New York, 1982.
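To make the two steps of the reduction concrete, here is a minimal sketch on a toy example in $\mathbb{R}^2$. The family $L_\lambda(x,y) = (\lambda x - x^3 - xy,\ y - x^2)$ is an assumption made purely for illustration: at $(u_0,\lambda_0) = (0,0)$ its kernel $V_0$ is the $x$-axis, $V_1$ and $W_1$ are the $y$-axis, and $W_0$ is the first coordinate axis. All function names are hypothetical.

```python
import math


def L(x, y, lam):
    """Toy family L_lam : R^2 -> R^2 (illustrative assumption, not from the text)."""
    return (lam * x - x ** 3 - x * y, y - x ** 2)


def phi(x, lam, steps=20):
    """Solve the W_1-component (Id - pi) L_lam(x, y) = 0 for y in V_1 by Newton's method."""
    y = 0.0
    for _ in range(steps):
        residual = L(x, y, lam)[1]
        d_residual = 1.0  # d/dy of (y - x**2): the isomorphism V_1 -> W_1
        y -= residual / d_residual
    return y


def reduced(x, lam):
    """The finite dimensional equation pi L_lam(x + phi(x, lam)), here lam*x - 2*x**3."""
    return L(x, phi(x, lam), lam)[0]


def nontrivial_solutions(lam):
    """Zeros x != 0 of the reduced equation lam*x - 2*x**3 = 0."""
    if lam <= 0:
        return []
    r = math.sqrt(lam / 2.0)
    return [-r, r]
```

In this toy case the reduced equation is $\lambda x - 2x^3 = 0$, so besides the trivial solution $x = 0$ there are two further solutions exactly for $\lambda > 0$; each zero $x$ of `reduced` yields the solution $(x, \varphi(x,\lambda))$ of the full system.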
8.3 The existence of catenoids as an example of a bifurcation process

We consider the variational problem
$$I(u) = \int_a^b F(t, u(t), \dot u(t))\,dt \tag{8.3.1}$$
with
$$F(t, u, \dot u) = u\sqrt{1 + \dot u^2}. \tag{8.3.2}$$
This variational problem is of the type considered in Section 1.1 of Part I. $I(u)$ with $F$ given by (8.3.2) is the area of the surface of revolution obtained by rotating the curve $u(t)$, $a \le t \le b$, about the $t$-axis. Critical points are so-called minimal surfaces of revolution. According to Theorem 1.1.1 of Part I, the corresponding Euler-Lagrange equation is computed as
$$0 = \frac{d}{dt} F_p(t,u(t),\dot u(t)) - F_u(t,u(t),\dot u(t)) = F_{pp}(t,u(t),\dot u(t))\ddot u(t) + F_{pu}(t,u(t),\dot u(t))\dot u(t) + F_{pt}(t,u(t),\dot u(t)) - F_u(t,u(t),\dot u(t)),$$
which in the present case becomes
$$\frac{d}{dt}\Big(\frac{u\dot u}{\sqrt{1 + \dot u^2}}\Big) - \sqrt{1 + \dot u^2} = 0, \tag{8.3.3}$$
or equivalently
$$u\ddot u - \dot u^2 - 1 = 0. \tag{8.3.4}$$
By (1.1.7) of Part I, we have $F - \dot u F_p = \text{constant}$, since $F_t = 0$, hence in our case
$$u\sqrt{1 + \dot u^2} - \frac{u\dot u^2}{\sqrt{1 + \dot u^2}} = \frac{u}{\sqrt{1 + \dot u^2}} = \text{constant} =: \lambda.$$
Therefore, for $\lambda \neq 0$, the general solution of (8.3.4) is
$$u(t) = \lambda\cosh\Big(\frac{t - t_0}{\lambda}\Big)$$
with parameters $\lambda, t_0$. Here $\lambda \neq 0$, and we may assume $\lambda > 0$ as the case $\lambda < 0$ is symmetric to the case $\lambda > 0$. Also, since $t_0$ just represents a translation of the independent variables, we may assume $t_0 = 0$, i.e.
$$u(t) = \lambda\cosh\Big(\frac{t}{\lambda}\Big). \tag{8.3.5}$$
The curve $u(t)$ is called a catenary, and the minimal surface of revolution obtained by revolving $u(t)$ about the $t$-axis is called a catenoid. For the sake of normalization, we consider the interval $I = [-1,1]$. In order to use the general theory of Section 8.2, we need to choose appropriate Banach spaces $V$, $W$ and $\Lambda = \mathbb{R}$ and consider the operator
$$L : V \times \Lambda \to W, \qquad (u,\lambda) \mapsto \Big(\frac{d}{dt}\Big(\frac{u\dot u}{\sqrt{1 + \dot u^2}}\Big) - \sqrt{1 + \dot u^2},\ u(1) - \lambda\cosh\frac{1}{\lambda},\ u(-1) - \lambda\cosh\frac{1}{\lambda}\Big). \tag{8.3.6}$$
On the right hand side, we have a differential operator of second order and a Dirichlet boundary condition. The boundary values are real numbers, and so $W$ should contain $\mathbb{R}^2$ as a factor as we have two boundary points. Otherwise, $V$ and $W$ should differ by two orders of differentiability. Thus, possible choices are Sobolev spaces
$$V = W^{k+2,p}(I), \qquad W = W^{k,p}(I) \times \mathbb{R}^2 \quad \text{for some } k, p$$
or spaces of differentiable functions
$$V = C^{k+2}(I), \qquad W = C^k(I) \times \mathbb{R}^2.$$
Here, we shall take
$$V = W^{2,2}(I), \qquad W = L^2(I) \times \mathbb{R}^2, \tag{8.3.7}$$
but the reader should also convince herself or himself that the other choices work as well, although the space $L^2$ will always play some auxiliary role. In the sequel, we shall denote the scalar product in $L^2(I)$ by $(\cdot,\cdot)_{L^2}$, i.e.
$$(w_1,w_2)_{L^2} = \int_{-1}^{1} w_1(t) w_2(t)\,dt$$
for $w_1, w_2 \in L^2(I)$. The scalar product on $W = L^2(I) \times \mathbb{R}^2$ for $\bar w_1 = (w_1,s_1)$, $\bar w_2 = (w_2,s_2)$ ($w_1, w_2 \in L^2(I)$, $s_1, s_2 \in \mathbb{R}^2$),
$$(\bar w_1,\bar w_2)_W = (w_1,w_2)_{L^2} + s_1 \cdot s_2,$$
is obtained from the scalar products on $L^2(I)$ and on $\mathbb{R}^2$. The Jacobi operator is given by $J_\lambda(u)v = D_u L_\lambda(u)v$, evaluated by means of (8.3.5). In order to determine the kernel of $J_\lambda(u)$, we need to solve the equation
$$J_\lambda(u)v = 0. \tag{8.3.9}$$
This is equivalent to
$$\ddot v(t) - \frac{2}{\lambda}\tanh\Big(\frac{t}{\lambda}\Big)\dot v(t) + \frac{1}{\lambda^2} v(t) = 0, \tag{8.3.10}$$
$$v(-1) = v(1) = 0. \tag{8.3.11}$$
The space of solutions of (8.3.10) is generated by
$$v_1(t) = -\sinh\frac{t}{\lambda}, \qquad v_2(t) = \cosh\frac{t}{\lambda} - \frac{t}{\lambda}\sinh\frac{t}{\lambda}.$$
(These solutions are simply obtained by differentiating the general solution $\lambda\cosh\big(\frac{t - t_0}{\lambda}\big)$ of (8.3.4) w.r.t. the parameters $\lambda$ and $t_0$ (at $t_0 = 0$), cf. Theorem 1.3.3 of Part I.) The boundary condition (8.3.11) cannot be satisfied by $v_1$, and so we have to find out for which values of $\lambda$
$$v(t) := v_2(t) = \cosh\frac{t}{\lambda} - \frac{t}{\lambda}\sinh\frac{t}{\lambda} \tag{8.3.12}$$
satisfies $v(1) = v(-1) = 0$. This is the case precisely if
$$\lambda = \tanh\frac{1}{\lambda}. \tag{8.3.13}$$
We agreed above to consider only positive values of $\lambda$, and this equation has precisely one positive solution which we denote by $\lambda_0$, and likewise, we put $u_0(t) = \lambda_0\cosh\big(\frac{t}{\lambda_0}\big)$, cf. (8.3.5).
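The value $\lambda_0$ is not given in closed form, but it is easy to approximate. The following sketch locates the positive zero of $v(1) = \cosh\frac{1}{\lambda} - \frac{1}{\lambda}\sinh\frac{1}{\lambda}$ by bisection; the function names and the starting bracket $[0.5, 1]$ are choices made for this illustration only.

```python
import math


def v_at_one(lam):
    """Boundary value v(1) of the Jacobi field v(t) = cosh(t/lam) - (t/lam) sinh(t/lam)."""
    x = 1.0 / lam
    return math.cosh(x) - x * math.sinh(x)


def solve_lambda0(lo=0.5, hi=1.0, tol=1e-14):
    """Bisection for the positive zero of v_at_one, i.e. of lam = tanh(1/lam)."""
    f_lo = v_at_one(lo)  # negative at lam = 0.5, while v_at_one(1.0) = exp(-1) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = v_at_one(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lambda0 = solve_lambda0()
# Boundary height of the degenerate catenary u_0 at t = 1 and t = -1.
nu0 = lambda0 * math.cosh(1.0 / lambda0)
```

Under these assumptions one finds $\lambda_0 \approx 0.8336$, and the corresponding boundary value $\lambda_0\cosh\frac{1}{\lambda_0} \approx 1.5089$.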
The only solutions of (8.3.10), (8.3.11) are $\alpha v(t)$ with $\alpha \in \mathbb{R}$ and $v(t)$ given in (8.3.12), and so we have
$$\dim\ker J_{\lambda_0}(u_0) = 1. \tag{8.3.14}$$
We call $v$ a weak solution of the Jacobi equation if
$$\int_I v(t)\frac{d}{dt}\Big(\frac{\dot\eta(t)}{\cosh^2\frac{t}{\lambda}}\Big)dt + \int_I \frac{1}{\lambda^2\cosh^2\frac{t}{\lambda}} v(t)\eta(t)\,dt = 0 \tag{8.3.15}$$
for all $\eta \in C_0^\infty(I)$. In the sequel, we shall need a little regularity result, namely that any solution $v$ of (8.3.15) of class $L^2(I)$ is automatically smooth, in fact of class $C^\infty(I)$. As we are dealing with a one-dimensional problem here, this result is not too hard to demonstrate, but since that would lead us too far astray, we omit the proof. It can be found in most good books on differential equations or functional analysis, e.g. K. Yosida, Functional Analysis, Springer-Verlag, Berlin, 5th edition, 1978, pp. 177-82. Of course, if $v$ is of class $C^2$, (8.3.15) is equivalent to
$$\int_I \Big(\frac{d}{dt}\Big(\frac{\dot v(t)}{\cosh^2\frac{t}{\lambda}}\Big) + \frac{1}{\lambda^2\cosh^2\frac{t}{\lambda}} v(t)\Big)\eta(t)\,dt = 0 \tag{8.3.16}$$
for all $\eta \in C_0^\infty(I)$, and by Lemma 1.1.1 of Part I, this is equivalent to $v$ being a solution of the Jacobi equation.

We shall now identify $\ker J_{\lambda_0}(u_0)$ and $\operatorname{coker} J_{\lambda_0}(u_0)$ as required in (8.2.3). We shall simply write $J_{\lambda_0}$ in place of $J_{\lambda_0}(u_0)$. According to our choice (8.3.7), we consider $J_{\lambda_0}$ as an operator
$$J_{\lambda_0} : W^{2,2}(I) \to L^2(I) \times \mathbb{R}^2.$$
If $w \in R_0(J_{\lambda_0}) := R(J_{\lambda_0}|_{H_0^{2,2}(I)})$, i.e. if there exists $v \in H_0^{2,2}(I)$ with $J_{\lambda_0} v = w$, then for all $\varphi \in \ker J_{\lambda_0}$
$$(w,\varphi)_{L^2} = (J_{\lambda_0} v,\varphi)_{L^2} = (v, J_{\lambda_0}\varphi)_{L^2} = 0$$
(in the same manner as the equivalence of (8.3.15) and (8.3.16) and noting that $\varphi$ is smooth and $v$ and $\varphi$ both vanish on $\partial I$). Thus if $w \in R_0(J_{\lambda_0})$, then also $w \in (\ker J_{\lambda_0})^\perp$ where $\perp$ denotes the orthogonal complement in the Hilbert space $L^2(I)$, as in Corollary 2.2.4. Consequently, if we denote the closure of a linear subspace $M$ of $L^2(I) \times \mathbb{R}^2$ by $\overline{M}$, then also
$$\overline{R_0(J_{\lambda_0})} \subset (\ker J_{\lambda_0})^\perp.$$
Conversely, if $w \in R_0(J_{\lambda_0})^\perp$ $(=: (R_0(J_{\lambda_0}))^\perp)$, then
$$(w, J_{\lambda_0} v)_W = 0 \quad \text{for all } v \in H_0^{2,2}(I).$$
By the regularity result mentioned above, this implies that $w$ is smooth, and so we may integrate by parts to get
$$(w, J_{\lambda_0} v)_W = (J_{\lambda_0} w, v)_W \quad \text{for all } v \in H_0^{2,2}(I)$$
and hence $w \in \ker J_{\lambda_0}$. Altogether, we have shown that
$$\ker J_{\lambda_0} = R_0(J_{\lambda_0})^\perp.$$
Since, according to Corollary 2.2.4, we have the decomposition
$$L^2(I) = \overline{R_0(J_{\lambda_0})} \oplus R_0(J_{\lambda_0})^\perp,$$
we may also consider $R_0(J_{\lambda_0})^\perp$ as $\operatorname{coker} J_{\lambda_0}$, and so we get the required identification
$$\ker J_{\lambda_0} \cong \operatorname{coker} J_{\lambda_0}. \tag{8.3.17}$$
We note that this depends on the fact that $J_{\lambda_0}$ is formally self-adjoint in the sense that
$$(v, J_{\lambda_0} w) = (J_{\lambda_0} v, w) \tag{8.3.18}$$
if e.g. $v, w \in H_0^{2,2}(I)$.
Remark 8.3.1. The situation here is slightly different from the one in Section 8.2 inasmuch as we identify $\operatorname{coker} J_{\lambda_0}$ here with $R_0(J_{\lambda_0})^\perp$ and not with $R(J_{\lambda_0})^\perp$. Therefore, in the present situation, if $\pi$ denotes the orthogonal projection onto $\ker J_{\lambda_0} = \operatorname{coker} J_{\lambda_0}$, we have
$$\pi(J_{\lambda_0} v) = 0$$
only for $v \in H_0^{2,2}(I)$, but not for all $v \in H^{2,2}(I)$. This is for example relevant for the argument of the proof of Lemma 8.2.2.

Regularity theory also implies that $R(J_{\lambda_0})$ is closed. Namely, if for $(v_n)_{n\in\mathbb{N}} \subset (\ker J_{\lambda_0})^\perp \subset W^{2,2}(I)$, we have
$$J_{\lambda_0} v_n =: f_n$$
and $f_n$ converges to $f_0$ in $L^2(I) \times \mathbb{R}^2$, then $\|v_n\|_{W^{2,2}(I)}$ is bounded. By Rellich's Theorem 3.4.1, after selection of a subsequence, $v_n$ then converges in $W^{1,2}(I)$. From the equation, i.e.
$$\frac{1}{\cosh^2\frac{t}{\lambda_0}}\ddot v_n + \frac{d}{dt}\Big(\frac{1}{\cosh^2\frac{t}{\lambda_0}}\Big)\dot v_n + \frac{1}{\lambda_0^2}\frac{1}{\cosh^2\frac{t}{\lambda_0}} v_n = f_n,$$
we then see that $\ddot v_n$ also converges in $L^2(I)$. Thus, $v_n$ converges in $W^{2,2}(I)$, and the limit $v_0$ then satisfies
$$J_{\lambda_0} v_0 = f_0.$$
Thus, the image of $J_{\lambda_0}$ is closed. Thus, $J_{\lambda_0}$ is a Fredholm operator of index 0.

Our aim is now to check that the assumptions of Theorem 8.2.2 hold. In order to verify (8.2.42), i.e. $\pi(\partial^2_\alpha u) \neq 0$, according to Remark 8.2.1, we need to compute $dJ_{\lambda_0}$, i.e. the second derivative of $L_{\lambda_0}$. Starting from (8.3.3) and inserting (8.3.5), i.e. $u_0 = \lambda_0\cosh t/\lambda_0$, we obtain
$$dJ_{\lambda_0}(u_0)(v,v) = \frac{d}{dt}\Big(\frac{2}{\cosh^3\frac{t}{\lambda_0}} v\dot v - \frac{3\lambda_0\tanh\frac{t}{\lambda_0}}{\cosh^3\frac{t}{\lambda_0}}\dot v^2\Big) - \frac{1}{\cosh^3\frac{t}{\lambda_0}}\dot v^2.$$
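By (8.2.27), we have to project this onto the kernel of $J_{\lambda_0}(u_0)$ and check that the result is nonzero, for our Jacobi field $v$ given in (8.3.12), i.e. $v = \cosh t/\lambda_0 - t/\lambda_0\sinh t/\lambda_0$. Since here the projection $\pi$ is given by the orthogonal projection in the Hilbert space $L^2(I) \times \mathbb{R}^2$ onto $\ker J_{\lambda_0}(u_0)$, which is generated by the Jacobi field $v$, we simply have to verify that the $L^2$-product of $dJ_{\lambda_0}(u_0)(v,v)$ with $v$ is nonzero. Thus, we compute
$$(dJ_{\lambda_0}(u_0)(v,v),v)_{L^2} = \int_{-1}^{1}\Big\{\frac{d}{dt}\Big(\frac{2}{\cosh^3\frac{t}{\lambda_0}} v(t)\dot v(t) - \frac{3\lambda_0\tanh\frac{t}{\lambda_0}}{\cosh^3\frac{t}{\lambda_0}}\dot v(t)^2\Big) - \frac{1}{\cosh^3\frac{t}{\lambda_0}}\dot v(t)^2\Big\} v(t)\,dt$$
$$= \int_{-1}^{1}\Big(-\frac{2}{\cosh^3\frac{t}{\lambda_0}} v(t)\dot v(t)^2 + \frac{3\lambda_0\tanh\frac{t}{\lambda_0}}{\cosh^3\frac{t}{\lambda_0}}\dot v(t)^3 - \frac{1}{\cosh^3\frac{t}{\lambda_0}}\dot v(t)^2 v(t)\Big)dt \quad \text{by an integration by parts}$$
$$= \int_{-1}^{1}\frac{3\dot v(t)^2}{\cosh^3\frac{t}{\lambda_0}}\Big(\lambda_0\tanh\frac{t}{\lambda_0}\,\dot v(t) - v(t)\Big)dt.$$
Now with $v = \cosh\frac{t}{\lambda_0} - \frac{t}{\lambda_0}\sinh\frac{t}{\lambda_0}$, we have
$$\lambda_0\tanh\frac{t}{\lambda_0}\,\dot v(t) - v(t) = -\cosh\frac{t}{\lambda_0},$$
and so
$$(dJ_{\lambda_0}(u_0)(v,v),v)_{L^2} = -3\int_{-1}^{1}\frac{\dot v(t)^2}{\cosh^2\frac{t}{\lambda_0}}\,dt < 0.$$
Thus, indeed
$$\pi(\partial^2_\alpha u) > 0. \tag{8.3.19}$$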
We finally consider (8.2.41). Thus, we have to verify that $\pi(\partial_\lambda u) \neq 0$, with
$$\partial_\lambda u = \frac{d}{dt} u(0,\lambda_t)\Big|_{t=0} \quad \text{for a suitable family } \lambda_t \text{ of parameters.} \tag{8.3.20}$$
We start with (8.2.14), i.e. in the notations of Section 8.2
$$L_{\lambda_t} u(\xi,\lambda_t) + \pi(u(\xi,\lambda_t)) = \xi. \tag{8.3.21}$$
In the present case $L_\lambda$ is given by (8.3.6), and $\pi$ is the orthogonal projection in $L^2(I)$ onto $\ker J_{\lambda_0}$, the one-dimensional space generated by $v(t) = \cosh t/\lambda_0 - t/\lambda_0\sinh t/\lambda_0$ (see (8.3.12)), where $\lambda_0$ is so chosen that $v(1) = v(-1) = 0$. Thus, this $v$ can be taken as the $\xi_0$ of Section 8.2. However, since
$$\frac{d}{d\lambda}\Big(\lambda\cosh\frac{1}{\lambda}\Big)\Big|_{\lambda=\lambda_0} = 0$$
by choice of $\lambda_0$ (see (8.3.13)), we shall need to employ a variation of the parameter somewhat different from the family $\lambda_t = \lambda_0 + t\mu$ employed in Section 8.2. Here, we put $\nu_0 := \lambda_0\cosh\frac{1}{\lambda_0}$ and choose the family $\lambda_t$ such that
$$\lambda_t\cosh\frac{1}{\lambda_t} = \nu_0 + t\mu. \tag{8.3.22}$$
We then differentiate (8.3.21) w.r.t. $t$ at $t = 0$, $\xi = 0$ to obtain
$$J_{\lambda_0}(u_0)\partial_\lambda u + \frac{1}{\|v\|^2_{L^2(I)}}(\partial_\lambda u, v)_{L^2}\,v = 0, \qquad \partial_\lambda u(1) = \partial_\lambda u(-1) = \mu. \tag{8.3.23}$$
Then
$$(J_{\lambda_0}(u_0)\partial_\lambda u, v)_{L^2} = \int_{-1}^{1}\Big(\frac{d}{dt}\Big(\frac{1}{\cosh^2\frac{t}{\lambda_0}}(\partial_\lambda u(t))^{\cdot}\Big) + \frac{1}{\lambda_0^2\cosh^2\frac{t}{\lambda_0}}\partial_\lambda u(t)\Big) v(t)\,dt = \frac{2\mu}{\lambda_0^2\cosh\frac{1}{\lambda_0}} < 0 \quad \text{for } \mu < 0,$$
integrating by parts and using $J_{\lambda_0}(u_0)v = 0$, $v(1) = 0 = v(-1)$. Equation (8.3.23) then implies
$$\pi(\partial_\lambda u) = \frac{1}{\|v\|^2_{L^2(I)}}(\partial_\lambda u, v)_{L^2}\,v \neq 0,$$
i.e. (8.3.20). We thus have verified all the assumptions of Theorem 8.2.2 (for the family $\lambda_t$ defined by (8.3.22) in place of the family $\lambda_t = \lambda_0 + t\mu$). Theorem 8.2.2 thus describes the bifurcation behaviour of the solutions of (8.3.3) or (8.3.4), i.e. the critical points of (8.3.1), (8.3.2) near $u_0(t) = \lambda_0\cosh\frac{t}{\lambda_0}$: For boundary values $u(1) = u(-1) < \lambda_0\cosh\frac{1}{\lambda_0}$, there is no solution (at least in the vicinity of $u_0$), whereas for $u(1) = u(-1) > \lambda_0\cosh\frac{1}{\lambda_0}$, we may find two solutions. Of course, this may also be verified directly without going through all the abstract machinery of Section 8.2, but hopefully this example can serve to illustrate the general scheme.

The catenoids are frequently discussed in books on the calculus of variations, e.g. O. Bolza, Vorlesungen über Variationsrechnung, Teubner, Leipzig, Berlin, 1909, or M. Giaquinta, St. Hildebrandt, Calculus of Variations, Springer, Berlin, 1996, I, p. 366 and II, pp. 263-70. A discussion in terms of bifurcation theory also in the case of not necessarily symmetric boundary conditions (i.e. not requiring $u(1) = u(-1)$) is given by H. Wenk, Extremverhalten der Stabilität von Catenoiden als Rotationsminimalfläche, Diplom thesis, Bochum, 1994.
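The direct verification mentioned above can be sketched numerically: a symmetric catenary $u(t) = \lambda\cosh(t/\lambda)$ on $[-1,1]$ has boundary value $c = \lambda\cosh\frac{1}{\lambda}$, so one only has to count the $\lambda > 0$ solving this scalar equation. In the sketch below, the constant `LAMBDA0` is an approximate value of $\lambda_0$ that is assumed to be known, and the function names and search brackets are choices made for illustration.

```python
import math

# Approximate positive root of lam = tanh(1/lam); assumed known here (about 0.8336).
LAMBDA0 = 0.8335565596


def boundary_height(lam):
    """Boundary value u(1) = u(-1) of the catenary u(t) = lam * cosh(t / lam)."""
    return lam * math.cosh(1.0 / lam)


def bisect(f, lo, hi, iters=200):
    """Plain bisection, assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def catenary_parameters(c):
    """All lam > 0 with lam * cosh(1/lam) = c, for 0 < c < 1e6 (illustrative bracket)."""
    nu0 = boundary_height(LAMBDA0)
    if c < nu0:
        return []          # below the critical height: no symmetric catenary
    if c == nu0:
        return [LAMBDA0]   # the degenerate catenary u_0
    g = lambda lam: boundary_height(lam) - c
    # boundary_height decreases on (0, LAMBDA0) and increases on (LAMBDA0, infinity).
    return [bisect(g, 0.05, LAMBDA0), bisect(g, LAMBDA0, c + 1.0)]
```

For a boundary value slightly above $\lambda_0\cosh\frac{1}{\lambda_0}$ the two returned parameters lie on either side of $\lambda_0$ and merge as the boundary value decreases to the critical one, which is the quadratic picture of Theorem 8.2.2.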
Exercises

8.1 How many parameters are needed for a complete description of the bifurcation behaviour of the roots of a fourth-order polynomial?

8.2 Consider the problem of finding critical points of
$$I(u) = \int_{-\kappa}^{\kappa} u(t)\sqrt{1 + \dot u(t)^2}\,dt, \qquad u(\kappa) = u(-\kappa) = 1$$
for a parameter $\kappa > 0$. Determine the value $\kappa_0$ for which a bifurcation occurs. (Hint: This problem can be reduced to the one considered in Section 8.3.)

8.3 Consider geodesics on $S^2$ as in Chapter 2 of Part I. More precisely, we take two points $p, q \in S^2$ with distance $d(p,q) = \lambda$, and consider geodesic arcs between $p$ and $q$ of length $\lambda$, i.e. length minimizing arcs. What happens at $\lambda = \pi$? Does this fit into the framework described in Section 8.2?
9 The Palais-Smale condition and unstable critical points of variational problems
9.1 The Palais-Smale condition

In this chapter, we take up a direction that has already been presented in Chapter 3 of Part I, namely the search for nonminimizing critical points of variational problems. This chapter will consequently be independent of Chapters 4-8 of the present Part II. In Section 3.1 of Part I, we presented existence results for unstable critical points of functionals $F$ of class $C^1$ on some finite dimensional Euclidean space $\mathbb{R}^d$. We only needed a coercivity condition on the functional guaranteeing that a critical sequence $(x_n)_{n\in\mathbb{N}}$ (i.e. satisfying $DF(x_n) \to 0$, $|F(x_n)|$ bounded) stayed in a bounded set. The local compactness of $\mathbb{R}^d$ then allowed the extraction of a convergent subsequence whose limit $x_0$ satisfied $DF(x_0) = 0$, because of the continuity of $DF$. In Sections 2.3 and 3.2 of Part I, we also presented examples where variational problems could be reduced to such finite dimensional problems. The domain was a little more complicated than $\mathbb{R}^d$, but being finite dimensional, it was still locally compact so that we had no difficulties finding limits of subsequences for critical sequences. In the remainder of this book, however, we have had ample opportunity to realize that variational problems are typically and naturally posed on some infinite-dimensional Hilbert or Banach space. Such a space is not locally compact anymore w.r.t. its Hilbert or Banach space topology, so that the previous strategy encounters a serious problem. Also weak topologies do not help much as the functionals under consideration typically are not continuous w.r.t. the weak topology. If one searches for minimizers, this problem can be overcome by introducing convexity assumptions as we have seen in Chapters 4 and 8, but any convexity assumption excludes the existence of critical points other than minima.
Nevertheless, the lack of compactness of the underlying space must be compensated by an assumption on the functional that guarantees the appropriate compactness of critical sequences. In other words we do not require the compactness of arbitrary bounded sequences on our space, which is impossible as argued, but only of critical sequences. This is the idea of the Palais-Smale condition which we now formulate:

Definition 9.1.1. Let $(V, \|\cdot\|)$ be a Banach space, $F : V \to \mathbb{R}$ a functional of class $C^1$. We say that $F$ satisfies the Palais-Smale condition, abbreviated as (PS), if any sequence $(x_n)_{n\in\mathbb{N}} \subset V$ satisfying
(i) $|F(x_n)| \le c$ for some constant $c$
(ii) $\|DF(x_n)\| \to 0$ for $n \to \infty$
contains a convergent subsequence.

Note that a limit $x_0$ of such a subsequence satisfies $DF(x_0) = 0$ (i.e. is a critical point of $F$) because $DF$ is continuous. A direct consequence of the definition is:

Lemma 9.1.1. Suppose $F : V \to \mathbb{R}$ satisfies (PS). Then for any $\alpha \in \mathbb{R}$
$$K_\alpha := \{x \in V : F(x) = \alpha,\ DF(x) = 0\}$$
(the set of critical points of $F$ with value $\alpha$) is compact. q.e.d.

We also have:

Lemma 9.1.2. Suppose $F : V \to \mathbb{R}$ satisfies (PS). For $\alpha \in \mathbb{R}$, we put
$$U_{\alpha,\rho} := \bigcup_{x \in K_\alpha}\{z \in V \mid \|x - z\| < \rho\} \quad (\rho > 0),$$
$$N_{\alpha,\delta} := \{y \in V \mid |F(y) - \alpha| < \delta,\ \|DF(y)\| < \delta\} \quad (\delta > 0).$$
Then the families $(U_{\alpha,\rho})_{\rho>0}$ and $(N_{\alpha,\delta})_{\delta>0}$ are fundamental systems of neighbourhoods of $K_\alpha$ (i.e. each neighbourhood of $K_\alpha$ contains some $U_{\alpha,\rho}$ and some $N_{\alpha,\delta}$).

Proof. It is clear that $U_{\alpha,\rho}$ and $N_{\alpha,\delta}$ are neighbourhoods of $K_\alpha$ for $\rho > 0$ respectively $\delta > 0$. It follows from the compactness of $K_\alpha$ that each neighbourhood of $K_\alpha$ contains some $U_{\alpha,\rho}$. Concerning the same property of the $N_{\alpha,\delta}$, let us assume on the contrary that there exist a neighbourhood $U$ of $K_\alpha$ and a sequence $(y_n)_{n\in\mathbb{N}}$ with
$$y_n \in N_{\alpha,\frac1n} \setminus \big(U \cap N_{\alpha,\frac1n}\big)$$
for all $n$. (PS) implies that a subsequence of $(y_n)_{n\in\mathbb{N}}$ converges to some $y_0 \in K_\alpha \subset U$, contradicting the openness of $U$. q.e.d.

In our applications below, we shall also encounter the situation where we want to find critical points of the restriction of some functional $F$ to the level hypersurface $G(x) = \beta$ of some other functional $G$. For that purpose, we shall need a relative version of the Palais-Smale condition which we shall formulate only for the case of a Hilbert space:

Definition 9.1.2. Let $(H, \langle\cdot,\cdot\rangle)$ be a Hilbert space, $F, G : H \to \mathbb{R}$ functionals of class $C^1$, $\beta \in \mathbb{R}$. Suppose $DG(x) \neq 0$ for all $x$ with $G(x) = \beta$. We say that $F$ satisfies (PS) relative to $G = \beta$ if every sequence $(x_n)_{n\in\mathbb{N}} \subset H$ with $G(x_n) = \beta$ and satisfying
(i) $|F(x_n)| \le c$ for some constant $c$
(ii) $DF(x_n) - \dfrac{\langle DF(x_n), DG(x_n)\rangle}{\|DG(x_n)\|^2} DG(x_n) \to 0$ for $n \to \infty$
contains a convergent subsequence.

A limit $x_0$ of such a subsequence then satisfies
$$G(x_0) = \beta \tag{9.1.1}$$
and
$$DF(x_0) - \frac{\langle DF(x_0), DG(x_0)\rangle}{\|DG(x_0)\|^2} DG(x_0) = 0, \tag{9.1.2}$$
i.e. is a critical point of the restriction of $F$ to $G(x) = \beta$. Of course, results analogous to Lemmas 9.1.1 and 9.1.2 hold in the relative case. One simply intersects the corresponding sets with $\{G(x) = \beta\}$ and replaces $DF$ by its projection to that level set.

As in Sections 2.3, 3.1, 3.2 of Part I, in order to find critical points of a functional, one needs to construct (local) deformations that decrease the value of the functional except at or at least away from critical points. We shall now do so in stages of increasing generality. We start with a functional
$$F : H \to \mathbb{R}$$
of class $C^2$ on some Hilbert space $(H, (\cdot,\cdot))$ that satisfies (PS). For each $u \in H$, $DF(u)$ is a linear functional on $H$, and by Corollary 2.2.3, it can therefore be identified with an element $\nabla F(u)$ of $H$, called the gradient of $F$ at $u$. Thus, $\nabla F(u)$ satisfies
$$DF(u)(\nabla F(u)) = \|DF(u)\|^2, \tag{9.1.3}$$
$$\|\nabla F(u)\| = \|DF(u)\|. \tag{9.1.4}$$
Since $F$ is assumed to be of class $C^2$, $DF$ and hence $\nabla F$ are of class $C^1$ in their dependence on $u$. In particular, $\nabla F$ is locally Lipschitz. We now consider the (negative) gradient flow induced by $F$:
$$\frac{\partial}{\partial t}\psi(u,t) = -\nabla F(\psi(u,t)) \quad \text{for } t > 0, \tag{9.1.5}$$
$$\psi(u,0) = u. \tag{9.1.6}$$
Because of the Lipschitz property, by Theorem 2.4.2 and Corollary 2.4.2, for small $t > 0$, there exists a unique solution $\psi(u,t)$ satisfying the semigroup property
$$\psi(u, s + t) = \psi(\psi(u,t), s) \tag{9.1.7}$$
for sufficiently small $s, t > 0$. Moreover,
$$\psi(u,t) = u \quad \text{for all } u \text{ with } \nabla F(u) = 0, \text{ i.e. for all critical points of } F. \tag{9.1.8}$$
Finally
$$\begin{aligned}
F(\psi(u,t)) &= F(u) + \int_0^t \frac{\partial}{\partial\tau} F(\psi(u,\tau))\,d\tau \\
&= F(u) + \int_0^t DF(\psi(u,\tau))\,\frac{\partial}{\partial\tau}\psi(u,\tau)\,d\tau \\
&= F(u) - \int_0^t \|DF(\psi(u,\tau))\|^2\,d\tau \quad \text{by (9.1.5), (9.1.3)} \\
&< F(u) \quad \text{for } t > 0, \text{ if } DF(u) = DF(\psi(u,0)) \neq 0,
\end{aligned} \tag{9.1.9}$$
i.e. if $u$ is not a critical point of $F$. Thus, we have found the prototype of a deformation that decreases the value of $F$ except at its critical points. For technical reasons, however, the above flow will need some modifications and generalizations. First of all, a solution of (9.1.5) need not exist for all $t > 0$ because it
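may become unbounded in finite 'time' $t$. This can be easily remedied by using the Lipschitz function $\eta : \mathbb{R}^+ \to \mathbb{R}^+$,
$$\eta(s) := \begin{cases} 1 & \text{for } 0 \le s \le 1 \\ \dfrac{1}{s} & \text{for } s \ge 1 \end{cases}$$
and putting
$$\widetilde{\nabla} F(u) := \eta(\|\nabla F(u)\|)\,\nabla F(u)$$
(i.e. $\widetilde{\nabla} F(u) = \nabla F(u)$ for $\|\nabla F(u)\| \le 1$ and $\|\widetilde{\nabla} F(u)\| \le 1$ for all $u$) and replacing (9.1.5) by
$$\frac{\partial}{\partial t}\psi(u,t) = -\widetilde{\nabla} F(\psi(u,t)). \tag{9.1.10}$$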
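The effect of the truncation in (9.1.10) can be illustrated in a finite-dimensional setting. The following is a minimal sketch, not part of the original text: it assumes the model functional $F(x,y) = (x^2-1)^2 + y^2$ on $\mathbb{R}^2$ and approximates the flow by explicit Euler steps. The names `F`, `grad_F`, `eta`, `truncated_grad`, `flow` and the step size `dt` are illustrative choices.

```python
import math


def F(u):
    """Model functional on R^2 (an assumption made for illustration)."""
    x, y = u
    return (x * x - 1.0) ** 2 + y * y


def grad_F(u):
    """Gradient of the model functional F."""
    x, y = u
    return (4.0 * x * (x * x - 1.0), 2.0 * y)


def eta(s):
    """Cut-off from the text: 1 for 0 <= s <= 1, and 1/s for s >= 1."""
    return 1.0 if s <= 1.0 else 1.0 / s


def truncated_grad(u):
    """eta(|grad F(u)|) * grad F(u); its norm never exceeds 1."""
    g = grad_F(u)
    scale = eta(math.hypot(g[0], g[1]))
    return (scale * g[0], scale * g[1])


def flow(u, t, dt=1e-3):
    """Explicit Euler approximation of d/dt psi = -truncated_grad(psi).

    Returns the end point and the list of values of F along the discrete path.
    """
    steps = int(round(t / dt))
    values = [F(u)]
    for _ in range(steps):
        g = truncated_grad(u)
        u = (u[0] - dt * g[0], u[1] - dt * g[1])
        values.append(F(u))
    return u, values
```

Calling `flow((2.0, 3.0), 5.0)` moves the starting point with speed at most 1 while lowering $F$ step by step, whereas `flow((1.0, 0.0), 1.0)` leaves the critical point $(1, 0)$ where it is, in line with (9.1.8). The Euler scheme is only an approximation of the flow; the monotonicity of the discrete values relies on the step size being small.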
Of course, we still use (9.1.6). Since $\|\widetilde{\nabla} F(u)\| \le 1$ for all $u$, the solution of (9.1.10), (9.1.6) now exists for all $t \ge 0$, and satisfies (9.1.7) for all $s, t \ge 0$. Equation (9.1.8) also still holds, and as in the derivation of (9.1.9), we get
$$F(\psi(u,t)) = F(u) - \int_0^t \eta(\|\nabla F(\psi(u,\tau))\|)\,\|DF(\psi(u,\tau))\|^2\,d\tau < F(u) \quad \text{for } t > 0,$$
if $u$ is not a critical point of $F$. More generally, we have $F(\psi(u,t)) \le F(\psi(u,s))$ whenever $0 \le s \le t$, for all $u$.

Next, we wish to localize the construction near a level $\alpha$. Thus, for given $\varepsilon_0 > 0$ and a neighbourhood $U$ of $K_\alpha$ we want to have a flow $\psi(u,t)$ with (9.1.7), (9.1.8) and also
$$\psi(u,t) = u \quad \text{if } |F(u) - \alpha| \ge \varepsilon_0, \tag{9.1.11}$$
and the following more explicit local decrease of the value of $F$: For $\alpha \in \mathbb{R}$, we put
$$F_\alpha := \{v \in H \mid F(v) < \alpha\}.$$
We want to find $0 < \varepsilon < \varepsilon_0$ with
$$\psi(F_{\alpha+\varepsilon} \setminus U, 1) \subset F_{\alpha-\varepsilon} \tag{9.1.12}$$
$$\psi(F_{\alpha+\varepsilon}, 1) \subset F_{\alpha-\varepsilon} \cup U, \tag{9.1.13}$$
and of course also
$$F(\psi(u,t)) \le F(\psi(u,s)) \quad \text{if } 0 \le s \le t, \text{ for all } u. \tag{9.1.14}$$
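We let $\varphi : \mathbb{R} \to \mathbb{R}$ be Lipschitz continuous with
$$\varphi(s) = 0 \quad \text{for } |\alpha - s| \ge \varepsilon_0$$
$$\varphi(s) = 1 \quad \text{for } |\alpha - s| \le \frac{\varepsilon_0}{2}$$
$$0 \le \varphi(s) \le 1 \quad \text{for all } s$$
and replace (9.1.10) by
$$\frac{\partial}{\partial t}\psi(u,t) = -\varphi(F(\psi(u,t)))\,\widetilde{\nabla} F(\psi(u,t)). \tag{9.1.15}$$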
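Again, a solution $\psi(u,t)$ exists for all $t \ge 0$ and satisfies (9.1.7) for all $s, t \ge 0$, as well as (9.1.8) and (9.1.14) (for the latter it was necessary to require $\varphi \ge 0$). (9.1.11) also is clear from the choice of $\varphi$. We now verify (9.1.12), (9.1.13). If $0 < \varepsilon \le \frac{\varepsilon_0}{2}$ and $u \in F_{\alpha+\varepsilon}$ and if $F(\psi(u,1)) \ge \alpha - \varepsilon$, from (9.1.14)
$$|F(\psi(u,t)) - \alpha| \le \varepsilon \quad \text{for all } 0 \le t \le 1,$$
and therefore
$$\varphi(F(\psi(u,t))) = 1 \quad \text{for all } 0 \le t \le 1. \tag{9.1.16}$$
As before, we may now compute
$$\begin{aligned}
F(\psi(u,1)) &= F(u) + \int_0^1 \frac{\partial}{\partial\tau} F(\psi(u,\tau))\,d\tau \\
&= F(u) - \int_0^1 \varphi(F(\psi(u,\tau)))\,\eta(\|\nabla F(\psi(u,\tau))\|)\,\|DF(\psi(u,\tau))\|^2\,d\tau \\
&\le \alpha + \varepsilon - \int_0^1 \min(1, \|DF(\psi(u,\tau))\|^2)\,d\tau
\end{aligned} \tag{9.1.17}$$
since we assume $u \in F_{\alpha+\varepsilon}$, using (9.1.16) and the properties of $\eta$.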
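The last step in (9.1.17) uses an elementary property of the cut-off $\eta$. As a short check, with $s$ standing for $\|\nabla F(\psi(u,\tau))\| = \|DF(\psi(u,\tau))\|$ (see (9.1.4)), one may argue as follows.

```latex
\[
\eta(s)\,s^{2} =
\begin{cases}
s^{2} = \min(1, s^{2}) & \text{for } 0 \le s \le 1,\\[2pt]
s \;\ge\; 1 = \min(1, s^{2}) & \text{for } s \ge 1,
\end{cases}
\qquad\text{hence}\qquad
\eta(s)\,s^{2} \;\ge\; \min(1, s^{2}) \quad \text{for all } s \ge 0 .
\]
```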
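By Lemma 9.1.2, we may find $\delta, \rho > 0$ with
$$N_{\alpha,\delta} \subset U_{\alpha,\rho} \subset U_{\alpha,2\rho} \subset U \tag{9.1.18}$$
(here, we are using (PS)!). From the definition of $N_{\alpha,\delta}$, thus $\|DF(\psi(u,\tau))\|^2 \ge \delta^2$ whenever $\psi(u,\tau) \notin N_{\alpha,\delta}$. Without loss of generality $\delta \le 1$. (9.1.17) then yields
$$F(\psi(u,1)) \le \alpha + \varepsilon - \left(\operatorname{meas}\{0 \le \tau \le 1 \mid \psi(u,\tau) \notin N_{\alpha,\delta}\}\right)\delta^2. \tag{9.1.19}$$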
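From (9.1.18), we have for $v \in H \setminus U$
$$\operatorname{dist}(v, N_{\alpha,\delta}) \left( := \inf_{w \in N_{\alpha,\delta}} \|v - w\| \right) \ge \rho.$$
Since
$$\left\| \frac{\partial}{\partial t}\psi(u,t) \right\| \le 1, \tag{9.1.20}$$
therefore, if $u \notin U$, then also $\psi(u,\tau) \notin N_{\alpha,\delta}$ for $0 \le \tau < \rho$, and similarly, if $\psi(u,1) \notin U$, then also $\psi(u,\tau) \notin N_{\alpha,\delta}$ for $1 - \rho < \tau \le 1$. Therefore, if either $u \notin U$ or $\psi(u,1) \notin U$, then
$$\operatorname{meas}\{0 \le \tau \le 1 \mid \psi(u,\tau) \notin N_{\alpha,\delta}\} \ge \rho.$$
Thus, from (9.1.19), if $u \notin U$ or if $\psi(u,1) \notin U$,
$$F(\psi(u,1)) \le \alpha + \varepsilon - \rho\delta^2 \le \alpha - \varepsilon \quad \text{if we choose } \varepsilon \le \tfrac{1}{2}\rho\delta^2. \tag{9.1.21}$$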
Thus, for $0 < \varepsilon \le \min(\tfrac{1}{2}\varepsilon_0, \tfrac{1}{2}\rho\delta^2)$, we get (9.1.12), (9.1.13). In conclusion, we have shown the following deformation result:

Theorem 9.1.1. Let $F : H \to \mathbb{R}$ be a $C^2$ functional on a Hilbert space $H$, satisfying (PS). Let $\alpha \in \mathbb{R}$, and put
$$F_\alpha := \{v \in H : F(v) < \alpha\}, \qquad K_\alpha := \{v \in H : F(v) = \alpha,\ DF(v) = 0\}.$$
Let $\varepsilon_0 > 0$ and a neighbourhood $U$ of $K_\alpha$ be given. Then there exist $0 < \varepsilon < \varepsilon_0$ and a continuous family
$$\psi : H \times [0, \infty) \to H$$
with the semigroup property $\psi(\psi(u,s),t) = \psi(u, s+t)$ for all $s, t \ge 0$, $u \in H$ and with
(i) $\psi(u,0) = u$ for all $u \in H$
(ii) $F(\psi(u,t))$ is nonincreasing in $t$ for all $u \in H$
(iii) $\psi(u,t) = u$ for all $t$ whenever $DF(u) = 0$, in particular for $u \in K_\alpha$
(iv) $\psi(u,t) = u$ whenever $|F(u) - \alpha| \ge \varepsilon_0$, for all $t$
(v) $\psi(F_{\alpha+\varepsilon} \setminus U, 1) \subset F_{\alpha-\varepsilon}$, $\psi(F_{\alpha+\varepsilon}, 1) \subset F_{\alpha-\varepsilon} \cup U$
(vi) If $F(u)$ is even (i.e. $F(u) = F(-u)$ for all $u$), then also $F(\psi(u,t))$ is even in $u$ for all $t$ (i.e. $F(\psi(u,t)) = F(\psi(-u,t))$).
(Property (vi) follows from the construction: All the auxiliary functions are invariant under replacing $u$ by $-u$ if $F$ is even, and $\nabla F(-u) = -\nabla F(u)$ in the even case.) q.e.d.

Corollary 9.1.1. If under the preceding assumptions, $F$ has no critical point with value $\alpha$, i.e. $K_\alpha = \emptyset$, then there exists a deformation $\psi$ with the preceding properties and
$$\psi(F_{\alpha+\varepsilon}, 1) \subset F_{\alpha-\varepsilon} \quad \text{for some } \varepsilon > 0. \tag{9.1.22}$$
Proof. If $K_\alpha = \emptyset$, we may choose $U = \emptyset$ in Theorem 9.1.1. q.e.d.

We shall now extend Theorem 9.1.1 in two directions. First, we consider the relative case, where in addition to $F$, we have another $C^2$ functional $G : H \to \mathbb{R}$ with
$$DG(x) \neq 0 \quad \text{for all } x \text{ with } G(x) = \beta,$$
for some given value $\beta \in \mathbb{R}$. We wish to find critical points of the restriction of $F$ to $G = \beta$. We assume that $F$ satisfies the relative (PS) condition of Definition 9.1.2 on $G = \beta$. We then perform the preceding construction with
$$\nabla^G F(u) := \nabla F(u) - \frac{\langle \nabla F(u), \nabla G(u)\rangle}{\|\nabla G(u)\|^2}\,\nabla G(u) \tag{9.1.23}$$
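in place of $\nabla F(u)$. We then have
$$\frac{\partial}{\partial t} G(\psi(u,t)) = -\varphi(F(\psi(u,t)))\,\eta(\|\nabla^G F(\psi(u,t))\|)\,\langle \nabla^G F(\psi(u,t)), \nabla G(\psi(u,t))\rangle$$
from the chain rule and the analogue of (9.1.15)
$$= 0,$$
since $\langle \nabla^G F(v), \nabla G(v)\rangle = 0$ for all $v \in H$. Therefore, the flow $\psi(u,t)$ now leaves $G = \beta$ invariant. We obtain:

Theorem 9.1.2. Let $F, G : H \to \mathbb{R}$ be $C^2$ functionals on a Hilbert space $(H, \langle\cdot,\cdot\rangle)$ with $F$ satisfying (PS) relative to $G = \beta$. Let $\alpha \in \mathbb{R}$,
$$F^G_{\alpha,\beta} := \{v \in H \mid F(v) < \alpha,\ G(v) = \beta\},$$
$$K^G_{\alpha,\beta} := \{v \in H \mid F(v) = \alpha,\ G(v) = \beta,\ \nabla^G F(v) = 0\}.$$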
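Let $\varepsilon_0 > 0$, and let $U$ be a neighbourhood of $K^G_{\alpha,\beta}$ in $\{G(v) = \beta\}$. Then there exist $\varepsilon > 0$ and a continuous semigroup family
$$\psi : \{G(v) = \beta\} \times [0, \infty) \to \{G(v) = \beta\}$$
satisfying
(i) $\psi(u,0) = u$ for all $u \in \{G(v) = \beta\}$
(ii) $F(\psi(u,t))$ is nonincreasing in $t$
(iii) $\psi(u,t) = u$ for all $u \in K^G_{\alpha,\beta}$
(iv) $\psi(u,t) = u$ for all $t$ if $|F(u) - \alpha| \ge \varepsilon_0$
(v) $\psi(F^G_{\alpha+\varepsilon,\beta} \setminus U, 1) \subset F^G_{\alpha-\varepsilon,\beta}$, …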
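The orthogonality $\langle \nabla^G F(v), \nabla G(v)\rangle = 0$ behind the relative construction can be checked numerically in a finite-dimensional model. The following sketch is an illustration under stated assumptions, not taken from the text: $H = \mathbb{R}^3$, $F(u) = \frac12 \sum_i d_i u_i^2$ with the hypothetical coefficients $d = (1, 2, 3)$, and $G(u) = \frac12 |u|^2$, so that $G = \frac12$ is the unit sphere.

```python
def dot(a, b):
    """Euclidean scalar product on R^n."""
    return sum(x * y for x, y in zip(a, b))


def grad_F(u, diag=(1.0, 2.0, 3.0)):
    """Gradient of the model functional F(u) = 1/2 * sum(d_i * u_i^2)."""
    return [d * x for d, x in zip(diag, u)]


def grad_G(u):
    """Gradient of the model constraint functional G(u) = 1/2 * |u|^2."""
    return list(u)


def projected_grad(u):
    """Projection of grad F(u) onto the orthogonal complement of grad G(u),
    following the pattern of formula (9.1.23)."""
    g_f, g_g = grad_F(u), grad_G(u)
    coeff = dot(g_f, g_g) / dot(g_g, g_g)
    return [a - coeff * b for a, b in zip(g_f, g_g)]
```

For `u = [0.6, 0.0, 0.8]` the scalar product `dot(projected_grad(u), grad_G(u))` vanishes up to rounding, and `projected_grad([1.0, 0.0, 0.0])` is the zero vector: in this model the critical points of $F$ relative to the sphere are the eigenvectors of the diagonal matrix.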
Secondly, we wish to extend the preceding construction to functionals on Banach spaces. For a functional on a Banach space, in general one does not have a good notion of a gradient. We therefore need to introduce Palais' concept of a pseudo-gradient:

Definition 9.1.3. Let $(V, \|\cdot\|)$ be a Banach space, $U \subset V$, $F : U \to \mathbb{R}$ a functional of class $C^1$. A pseudo-gradient vector field for $F$ is a locally Lipschitz continuous vector field $v : U \to V$ satisfying
(i) $\|v(u)\| \le \min(1, \|DF(u)\|)$
(ii) $DF(u)(v(u)) \ge \frac{1}{2}\min(\|DF(u)\|, \|DF(u)\|^2)$
for all $u \in U$.

Lemma 9.1.3. Let $F : V \to \mathbb{R}$ be a functional of class $C^1$ on the Banach space $V$. Then $F$ admits a pseudo-gradient vector field on
$$V' := \{u \in V \mid DF(u) \neq 0\}.$$
Proof. For each $u \in V'$, we can find $w = w(u)$ with
$$\|w\| < \min(1, \|DF(u)\|) \tag{9.1.24}$$
$$DF(u)(w) > \tfrac{1}{2}\min(\|DF(u)\|, \|DF(u)\|^2). \tag{9.1.25}$$
Since $DF$ is continuous (as we assume $F \in C^1$), $w$ satisfies (9.1.24), (9.1.25) also for all $v$ in some neighbourhood $N_u$ of $u$. Since $\{N_u : u \in V'\}$ is an open covering of $V'$, it possesses a locally finite refinement† $\{M_\alpha\}_{\alpha \in I}$. Let
$$\rho_\alpha(v) := \operatorname{dist}(v, V' \setminus M_\alpha).$$
† This holds for any open covering of a paracompact set, see e.g. J. Dieudonné, Grundzüge der modernen Analysis, 2, Vieweg, Braunschweig, second edition, 1987, pp. 26-9; $V'$ is paracompact for example because it is metrizable.
$\rho_\alpha$ is Lipschitz continuous, and $\rho_\alpha(v) = 0$ for $v \notin M_\alpha$. We put
$$\varphi_\alpha(v) := \frac{\rho_\alpha(v)}{\sum_{\beta \in I}\rho_\beta(v)}.$$
Since each $v$ is only contained in finitely many $M_\beta$, because of the local finiteness of the covering, the denominator of $\varphi_\alpha$ is a finite sum. $(\varphi_\alpha)_{\alpha \in I}$ is a partition of unity subordinate to $\{M_\alpha\}$, i.e. $0 \le \varphi_\alpha \le 1$, $\varphi_\alpha = 0$ outside $M_\alpha$, $\sum_{\alpha \in I}\varphi_\alpha = 1$. Also, the $\varphi_\alpha$ are Lipschitz continuous. Then
$$v(u) := \sum_{\alpha \in I}\varphi_\alpha(u)\,w(u_\alpha) \quad \text{for some } u_\alpha \in M_\alpha$$
is a convex combination of vectors satisfying (9.1.24), (9.1.25) and hence satisfies these relations, too. $v(u)$ thus is a pseudo-gradient vector field for $F$. q.e.d.

Note that we only need to require $F \in C^1$, and not $F \in C^2$, in order to construct a locally Lipschitz pseudo-gradient field. We then have the following deformation for $C^1$-functionals on Banach spaces.

Theorem 9.1.3. Let $F : V \to \mathbb{R}$ be a $C^1$-functional on a Banach space $V$ satisfying (PS). Let $\alpha \in \mathbb{R}$, $\varepsilon_0 > 0$, $U$ a neighbourhood of $K_\alpha$ as in Theorem 9.1.1. Then there exist $0 < \varepsilon < \varepsilon_0$ and a continuous family $\psi : V \times [0, \infty) \to V$ satisfying the semigroup property w.r.t. $t \ge 0$, and
(i) $\psi(u,0) = u$ for all $u \in V$
(ii) $F(\psi(u,s)) \le F(\psi(u,t))$ whenever $0 \le t \le s$, $u \in V$
(iii) $\psi(u,t) = u$ for all $t$ whenever $DF(u) = 0$
(iv) $\psi(u,t) = u$ whenever $|F(u) - \alpha| \ge \varepsilon_0$, for all $t$
(v) $\psi(F_{\alpha+\varepsilon} \setminus U, 1) \subset F_{\alpha-\varepsilon}$, $\psi(F_{\alpha+\varepsilon}, 1) \subset F_{\alpha-\varepsilon} \cup U$
(vi) If $F(\cdot)$ is even, so is $F(\psi(\cdot,t))$ for all $t$.
Proof. The proof is the same as the one of Theorem 9.1.1, replacing $\nabla F(u)$ by a pseudo-gradient vector field $v(u)$, except for the following technical point: Lemma 9.1.3 asserts the existence of a pseudo-gradient field only on $\{x \in V \mid DF(x) \neq 0\}$. We therefore have to choose another Lipschitz continuous cut-off function $\gamma : V \to \mathbb{R}$ with $0 \le \gamma \le 1$, $\gamma(u) = 0$ if $u \in N_{\alpha,\delta/2}$, $\gamma(u) = 1$ for $u \in V \setminus N_{\alpha,\delta}$. We may then consider
$$\frac{\partial}{\partial t}\psi(u,t) = -\gamma(\psi(u,t))\,\varphi(F(\psi(u,t)))\,\eta(\|v(\psi(u,t))\|)\,v(\psi(u,t))$$
with $\varphi$, $\eta$ as before. This has the additional effect that
$$\frac{\partial \psi(u,t)}{\partial t} = 0 \tag{9.1.26}$$
whenever $\psi(u,t) \in N_{\alpha,\delta/2}$, which is a neighbourhood of $K_\alpha$, while the evolution is the same as before (with $v(u)$ in place of $\nabla F(u)$) outside $N_{\alpha,\delta}$. This cut-off near $K_\alpha$ does not affect the rest of the construction. If $F$ is even, we may also choose $\gamma$ even. However, there might still exist critical points of $F$ in $F_{\alpha+\varepsilon} \setminus N_{\alpha,\delta}$. In order to take account of those, we strengthen the requirements on the above cut-off function $\varphi$ to c
everything works out as before. q.e.d.

It is possible, and not overly difficult, to extend Theorem 9.1.3 to the relative case and to obtain a result analogous to Theorem 9.1.2. Here, however, we refrain from doing so.
9.2 The mountain pass theorem

With the help of the deformation theorems of the previous section, one may easily derive existence results for critical points of a functional satisfying (PS). To illustrate this point, we start with the trivial

Lemma 9.2.1. Let $F : V \to \mathbb{R}$ be a $C^1$ functional on a Banach space satisfying (PS). If
$$\alpha := \inf_{u \in V} F(u) > -\infty,$$
then $F$ possesses a critical point $u_0$ with value $\alpha$ (i.e. $F(u_0) = \alpha$, $DF(u_0) = 0$).

Proof. Suppose that $K_\alpha = \emptyset$. Then $U = \emptyset$ is a neighbourhood of $K_\alpha$. Let $\varepsilon_0 > 0$ be arbitrary. Choose $\varepsilon$ as in Theorem 9.1.3. From the definition of $\alpha$,
$$F_{\alpha+\varepsilon} \neq \emptyset, \qquad F_{\alpha-\varepsilon} = \emptyset.$$
Therefore, it is impossible that, as Theorem 9.1.3 (v) asserts, the deformation $\psi(\cdot,1)$ maps $F_{\alpha+\varepsilon}$ into $F_{\alpha-\varepsilon}$. This contradiction implies $K_\alpha \neq \emptyset$, which means the existence of the desired critical point. q.e.d.

Of course, the methods presented in Chapter 4 yield more general existence results for minimizers of variational problems. The strength of the Palais-Smale approach rather lies in its capability of producing nonminimizing critical points. To demonstrate this, we now present the mountain pass theorem of Ambrosetti-Rabinowitz.

Theorem 9.2.1. Let $F : V \to \mathbb{R}$ be a $C^1$ functional on a Banach space $(V, \|\cdot\|)$ satisfying (PS). Suppose $F(0) = 0$ and
(i) $\exists \rho > 0, \beta > 0$: $F(u) \ge \beta$ for all $u$ with $\|u\| = \rho$
(ii) $\exists u_1$ with $\|u_1\| \ge \rho$ and $F(u_1) < \beta$.
We let $\Gamma := \{\gamma \in C^0([0,1], V) \mid \gamma(0) = 0,\ \gamma(1) = u_1\}$. Then
$$\alpha := \inf_{\gamma \in \Gamma}\ \sup_{\tau \in [0,1]} F(\gamma(\tau)) \quad (> 0)$$
is a critical value of $F$ (i.e. there exists $u_0$ with $F(u_0) = \alpha$, $DF(u_0) = 0$).

Proof. Suppose again that $K_\alpha = \emptyset$, and take the neighbourhood $U = \emptyset$ of $K_\alpha$. We let $\varepsilon_0 = \min(\beta, \beta - F(u_1))$. Choose $\varepsilon$ as in Theorem 9.1.3. From the definition of $\alpha$, there exists $\gamma_0 \in \Gamma$ with
$$\sup_{\tau \in [0,1]} F(\gamma_0(\tau)) < \alpha + \varepsilon,$$
while no such $\gamma$ can satisfy
$$\sup_{\tau \in [0,1]} F(\gamma(\tau)) < \alpha - \varepsilon,$$
i.e. satisfy $\gamma([0,1]) \subset F_{\alpha-\varepsilon}$. However, if we apply the deformation $\psi(\cdot,1)$ of Theorem 9.1.3, we obtain a path
$$\gamma(\tau) := \psi(\gamma_0(\tau), 1) \subset F_{\alpha-\varepsilon}$$
with $\gamma(0) = \gamma_0(0) = 0$ and $\gamma(1) = \gamma_0(1) = u_1$ by choice of $\varepsilon_0$. This contradiction implies $K_\alpha \neq \emptyset$, i.e. the existence of the desired critical point. q.e.d.
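The minimax value of Theorem 9.2.1 can be approximated in a two-dimensional toy problem. The sketch below is an illustration only and rests on assumptions not made in the text: the model functional $F(x,y) = \frac12(x^2+y^2) - \frac13 x^3$ on $\mathbb{R}^2$ (with $F(0) = 0$, $F \ge \frac1{12}$ on the circle of radius $\rho = \frac12$, and $F(u_1) = -4.5$ at $u_1 = (3,0)$), and a hypothetical one-parameter family of sampled paths $\gamma_a(t) = (3t, a\sin \pi t)$. Such a restricted family only yields an upper estimate for $\alpha$. In this model every path from $0$ to $u_1$ crosses the line $x = 1$, where $F \ge \frac16$, so the estimate agrees with the value $\frac16$ at the saddle point $(1, 0)$.

```python
import math


def F(x, y):
    """Model functional (assumed for illustration): minimum at 0, saddle at (1, 0)."""
    return 0.5 * (x * x + y * y) - x ** 3 / 3.0


def sup_along_path(a, samples=3001):
    """Largest sampled value of F along gamma_a(t) = (3t, a*sin(pi*t)), 0 <= t <= 1."""
    best = -math.inf
    for i in range(samples):
        t = i / (samples - 1)
        best = max(best, F(3.0 * t, a * math.sin(math.pi * t)))
    return best


def minimax_estimate(amplitudes):
    """Upper estimate of inf over paths of sup of F, restricted to the sampled family."""
    return min(sup_along_path(a) for a in amplitudes)


# Amplitudes -1.0, -0.9, ..., 1.0 for the hypothetical family of paths.
alpha_estimate = minimax_estimate([k / 10.0 for k in range(-10, 11)])
```

The resulting `alpha_estimate` is close to $\frac16$, which lies above the barrier height $\beta = \frac1{12}$ and is attained at a critical point that is not a minimizer.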
Let us summarize the essential features of the preceding reasoning:
(1) One chooses a family of sets, here $\Gamma$, that exploits some properties of $F$ and is invariant under the deformation $\psi(\cdot,1)$.
(2) This family yields a minimax value $\alpha$.
(3) $\alpha$ can be estimated from above with the help of any member of our family $\Gamma$ ($\alpha \le \sup_{\tau \in [0,1]} F(\gamma(\tau))$ for any $\gamma \in \Gamma$), and from below through the constraints that the members of $\Gamma$ have to satisfy (in Theorem 9.2.1, every $\gamma \in \Gamma$ intersects $\partial B(0,\rho)$, and therefore $\alpha \ge \beta > 0$, and therefore in particular, the critical point produced is different from 0).
(4) A reasoning by contradiction, based on the deformation theorem, shows that $\alpha$ is a critical value.

As an application of the mountain pass theorem, we consider the following example:

Theorem 9.2.2. Let $\Omega \subset \mathbb{R}^d$ be a bounded domain, $2 < p < \frac{2d}{d-2}$ (respectively $< \infty$ for $d = 1, 2$). Then the Dirichlet problem
$$\Delta u + |u|^{p-2}u = 0 \quad \text{in } \Omega \tag{9.2.1}$$
$$u = 0 \quad \text{on } \partial\Omega \tag{9.2.2}$$
admits at least two nontrivial (weak) solutions ('nontrivial' means not identically 0).

Proof. If $u$ is a solution, so is $-u$. Therefore, it suffices to verify the existence of one nontrivial solution. (9.2.1), (9.2.2) are the Euler-Lagrange equations in $H_0^{1,2}(\Omega)$ for the functional
$$F(u) = \frac{1}{2}\int_\Omega |Du|^2 - \frac{1}{p}\int_\Omega |u|^p. \tag{9.2.3}$$
This functional is a continuous functional on $H_0^{1,2}(\Omega)$, because $\int_\Omega |Du|^2$ clearly is continuous there, and $\int_\Omega |u|^p$ too, because of the Sobolev Embedding Theorem 3.4.3 as we assume $p < \frac{2d}{d-2}$. $F$ is also differentiable, with
$$DF(u)(\varphi) = \int_\Omega Du \cdot D\varphi - \int_\Omega |u|^{p-2}u\varphi. \tag{9.2.4}$$
Again
$$\varphi \mapsto \int_\Omega Du \cdot D\varphi$$
clearly is continuous on $H_0^{1,2}(\Omega)$, whereas
$$\varphi \mapsto \int_\Omega |u|^{p-2}u\varphi$$
is continuous, because we have by Hölder's inequality
$$\left| \int_\Omega |u|^{p-2}u\varphi \right| \le \left( \int_\Omega |u|^p \right)^{\frac{p-1}{p}} \left( \int_\Omega |\varphi|^p \right)^{\frac{1}{p}} \tag{9.2.5}$$
$$\le c_0\,\|u\|_{H^{1,2}(\Omega)}^{p-1}\,\|\varphi\|_{H^{1,2}(\Omega)} \tag{9.2.6}$$
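by the Sobolev Embedding Theorem 3.4.3 for some constant $c_0$. Thus $F : H_0^{1,2}(\Omega) \to \mathbb{R}$ is of class $C^1$. We shall verify the Palais-Smale condition for $F$: Suppose $(u_n)_{n\in\mathbb{N}} \subset H_0^{1,2}(\Omega)$ satisfies
$$|F(u_n)| \le c_1 \quad \text{for some constant } c_1 \tag{9.2.7}$$
$$DF(u_n) \to 0 \quad \text{for } n \to \infty. \tag{9.2.8}$$
Thus
$$\left| \frac{1}{2}\int |Du_n|^2 - \frac{1}{p}\int |u_n|^p \right| \le c_1 \tag{9.2.9}$$
and
$$\sup\left\{ \left| \int Du_n \cdot D\varphi - \int |u_n|^{p-2}u_n\varphi \right| : \varphi \in H_0^{1,2}(\Omega),\ \|\varphi\|_{H^{1,2}} \le 1 \right\} \to 0 \quad \text{for } n \to \infty. \tag{9.2.10}$$
In (9.2.10), we use $\varphi = \dfrac{u_n}{\|u_n\|_{H^{1,2}}}$ and obtain
$$-\int |Du_n|^2 + \int |u_n|^p \le c_2\,\|u_n\|_{H^{1,2}}. \tag{9.2.11}$$
Since $p > 2$, (9.2.9) and (9.2.11) imply
$$\int |Du_n|^2 \le c_3\,\|u_n\|_{H^{1,2}} + c_4. \tag{9.2.12}$$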
Since by the Poincaré inequality (Theorem 3.4.2)
$$\|u_n\|^2_{H^{1,2}(\Omega)} = \int |u_n|^2 + \int |Du_n|^2 \le c_5 \int |Du_n|^2, \tag{9.2.13}$$
we conclude from (9.2.12)
$$\|u_n\|_{H^{1,2}(\Omega)} \le c_6. \tag{9.2.14}$$
Thus, any 'critical sequence' $(u_n)_{n\in\mathbb{N}}$ is bounded. We now claim that such a sequence $(u_n)_{n\in\mathbb{N}}$ contains a convergent subsequence, thereby completing the verification of (PS). We need to show that, after selection of a subsequence,
$$\int |Du_n - Du_m|^2 \to 0 \quad \text{for } n, m \to \infty \tag{9.2.15}$$
(using again the Poincaré inequality as in (9.2.13)). Now
$$\int Du_n \cdot D(u_n - u_m) - \int |u_n|^{p-2}u_n(u_n - u_m) \to 0 \quad \text{for } n, m \to \infty \tag{9.2.16}$$
by (9.2.10), (9.2.14). By the Rellich-Kondrachov theorem (Corollary 3.4.1), we may also assume (by selecting a subsequence) that $(u_n)_{n\in\mathbb{N}}$ is a Cauchy sequence in $L^p(\Omega)$. Then, using Hölder's inequality as in (9.2.5),
$$\left| \int |u_n|^{p-2}u_n(u_n - u_m) \right| \le \left( \int |u_n|^p \right)^{\frac{p-1}{p}} \left( \int |u_n - u_m|^p \right)^{\frac{1}{p}} \to 0 \quad \text{for } n, m \to \infty. \tag{9.2.17}$$
Equation (9.2.16) then implies
$$\int Du_n \cdot D(u_n - u_m) \to 0 \quad \text{for } m, n \to \infty,$$
which implies (9.2.15). We have thus verified (PS) for $F$.

We shall now check the remaining assumptions of Theorem 9.2.1. First of all, $F(0) = 0$. Recalling that by the Sobolev Embedding Theorem 3.4.3 (and the Poincaré inequality, see (9.2.13))
$$\|u\|_{L^p(\Omega)} \le c\,\|u\|_{H^{1,2}(\Omega)},$$
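we have, with the constant $c$ of the Sobolev embedding $\|u\|_{L^p(\Omega)} \le c\,\|u\|_{H^{1,2}(\Omega)}$ and $c_5$ from (9.2.13),
$$F(u) \ge \left( \frac{1}{2c_5} - \frac{c^p}{p}\,\|u\|^{p-2}_{H^{1,2}(\Omega)} \right) \|u\|^2_{H^{1,2}(\Omega)} \ge \beta > 0$$
if $\|u\|_{H^{1,2}(\Omega)} = \rho$ is sufficiently small. Finally, take any $u_2 \in H_0^{1,2}(\Omega)$ with $\int_\Omega |u_2|^p \neq 0$. Then for sufficiently large $\lambda > 0$, $u_1 = \lambda u_2$ satisfies
$$F(u_1) = \frac{\lambda^2}{2}\int_\Omega |Du_2|^2 - \frac{\lambda^p}{p}\int_\Omega |u_2|^p < 0.$$
We have now verified all the assumptions of the mountain pass Theorem 9.2.1, and we consequently get a critical point $u$ of $F$ with $F(u) \ge \beta > 0$. This is the desired nontrivial solution. (In fact, regularity theory implies that any weak solution of (9.2.1) is smooth in $\Omega$, see e.g. Gilbarg-Trudinger, loc. cit.) q.e.d.

Remark 9.2.1. By the same method, we can also treat the equation
$$\Delta u - \lambda u + |u|^{p-2}u = 0 \quad \text{for any } \lambda \ge 0. \tag{9.2.18}$$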
9.3 Topological indices and critical points

In Section 3.2 of Part I, we have seen an example where a topological construction permitted us to deduce the existence of more than one (unstable) critical point of a functional. In the present section, we first give an axiomatic approach to such constructions and then apply this in conjunction with the Palais-Smale condition to a concrete variational problem to show the existence of infinitely many solutions. Such global topological constructions originated with the work of Lyusternik. Contributors also include Schnirelman, and, more recently, Rabinowitz, and many others. The reader will find detailed references in the monographs quoted at the end of this chapter.

Definition 9.3.1. Let $X$ be a topological space, $F : X \to \mathbb{R}$ continuous. $x \in X$ is called a special point for $F$, with value $\alpha$,
$$x \in \operatorname{spec}_\alpha F \quad (\alpha \text{ then is called a special value})$$
if $x$ is contained in all $A \subset X$ with the following property: For each open $U \supset A$ there exist $\varepsilon = \varepsilon(U) > 0$ and a continuous $\psi : X \times [0,1] \to X$ satisfying
(i) $\psi(y,0) = y$ for $y \in X$
(ii) $F(\psi(y,t)) \le F(\psi(y,s))$ for all $y \in X$, $0 \le s \le t \le 1$
(iii) For every $y \in X \setminus U$ with $F(y) < \alpha + \varepsilon$, we have $F(\psi(y,1)) \le \alpha - \varepsilon$.
then also $\psi(A, 1) \in \mathcal{M}$.
(9.3.1)
Suppose
$$-\infty < \alpha = \inf_{A \in \mathcal{M}}\ \sup_{y \in A} F(y) < \infty. \tag{9.3.2}$$
Then $\alpha$ is a special value for $F$, i.e. there exists $x_0 \in \operatorname{spec}_\alpha F$.

Proof. Suppose $\operatorname{spec}_\alpha F = \emptyset$. According to the preceding remark, we may then take $U = \emptyset$ and find $\psi : X \times [0,1] \to X$ and $\varepsilon > 0$ with
$$F(\psi(y,1)) \le \alpha - \varepsilon \quad \text{whenever } F(y) < \alpha + \varepsilon. \tag{9.3.3}$$
We may find $A_0 \in \mathcal{M}$ with
$$\sup_{y \in A_0} F(y) < \alpha + \varepsilon, \tag{9.3.4}$$
but no $A \in \mathcal{M}$ can satisfy
$$\sup_{y \in A} F(y) \le \alpha - \varepsilon. \tag{9.3.5}$$
However, if we take $A_1 := \psi(A_0, 1)$ then $A_1 \in \mathcal{M}$ by assumption, and by (9.3.3)
$$\sup_{y \in A_1} F(y) \le \alpha - \varepsilon,$$
contradicting (9.3.5). Thus, $\operatorname{spec}_\alpha F \neq \emptyset$. q.e.d.

In order to obtain the existence of further special points, we now shall introduce the notion of a (topological) index. Such an index is based on symmetry or invariance properties of the functional under consideration. Here, we only consider the case of the simplest nontrivial symmetry group, namely $\mathbb{Z}_2$, although the subsequent constructions easily generalize to any compact group $G$. We thus make the following symmetry assumptions:
• $X$ is a topological space with a nontrivial involution, i.e. there exists a continuous map $j : X \to X$, $j \neq \mathrm{id}$, with $j^2 = \mathrm{id}$.
• $F : X \to \mathbb{R}$ is continuous and even, i.e. $F(j(x)) = F(x)$ for all $x \in X$.
• $\mathcal{M} := \{A \subset X \mid j(A) = A \text{ and for all } x \in A,\ j(x) \neq x \text{ (i.e. } A \text{ contains no fixed points of } j)\}$.
We now also require $\psi(j(x),t) = j(\psi(x,t))$ for all the deformations of Definition 9.3.1.

Definition 9.3.2. An index for $(X, F)$ is a map $i : \mathcal{M} \to \{0, 1, 2, \ldots, \infty\}$ satisfying for all $A, A_1, A_2 \in \mathcal{M}$:
(i) $i(A) = 0 \Leftrightarrow A = \emptyset$
(ii) $A$ finite ($A \neq \emptyset$) $\Rightarrow i(A) = 1$
(iii) $i(A_1 \cup A_2) \le i(A_1) + i(A_2)$
(iv) $A_1 \subset A_2 \Rightarrow i(A_1) \le i(A_2)$
(v) $i(A) \le i(j(A))$
(vi) $A$ compact $\Rightarrow \exists$ neighbourhood $U$ of $A$ in $X$ with $U \in \mathcal{M}$, $i(A) = i(U) < \infty$.
For $n \in \{0, 1, 2, \ldots, \infty\}$, we put
$$\mathcal{M}_n := \{A \in \mathcal{M} \mid i(A) \ge n\}.$$
Remark 9.3.2. More precisely, one should call an $i$ as in Definition 9.3.2 an index for $(X, F, \mathbb{Z}_2)$, in order to specify the symmetry group involved.

For $n \in \{0, 1, 2, \ldots, \infty\}$, we define
$$\alpha_n := \inf_{A \in \mathcal{M}_n}\ \sup_{y \in A} F(y).$$
(9.3.6)
(ii) If furthermore for some k > 1, an = a n + i = . . . = c*n-ffc, then spec a n F is infinite. Proof We note that property (v) of Definition 9.3.2 implies that Mn is invariant under (symmetric) deformations ip. Therefore, Lemma 9.3.1 implies spec Qn F ^ 0. For the second statement, we claim that for A0 = spec Qn F , t(i4o) >fc + l.
(9.3.7)
If k > 1, property (ii) of Definition 9.3.2 then implies the existence of infinitely many special points with value an. Suppose on the contrary that i(A0) < k.
(9.3.8)
By Definition 9.3.2 (vi), we may find a neighbourhood U of ,4 0 with U e M and i(Ao) = i(U).
(9.3.9)
Since A0 consists of special points, we may find a (symmetric) deformation ip with F{ip{y, 1)) < an - e for all y e X \ U
with
F{y) < a n + e
f Since the infimum over an empty set is oo, this contains the assumption Mn / 0-
310
The Palais-Smale
condition
for some e > 0. Since an — a n + k , we may find A G Mn+k with sup F(y) < an 4- e, hence sup
F(z) < an - e.
(9.3.10)
*€V(i4\I/,l)
We have i(i4\£7)>i(i4)-i(£7) >n + k-k,
by (iii) using (9.3.8), (9.3.9), ^ G Mn+k
= n. Thus
A\UeMn, hence A\(7 ^ 0 by (i). Since, as noted in the beginning, A4n is invariant under i/>, we get
1>{A\U9l)eMn, hence sup
F(y) > a n ,
contradicting (9.3.10). q.e.d.

In order to apply the preceding considerations, we need to construct an index with the properties listed in Definition 9.3.2. We shall present here Coffman's version of the genus of Krasnoselskij.

Definition 9.3.3. Suppose the symmetry assumptions stated before Definition 9.3.2 hold. The genus of $A \neq \emptyset$, $A \in \mathcal{M}$ is defined as follows:
$$\operatorname{gen}(A) := \inf\{n \in \{1, 2, 3, \ldots, \infty\} \mid \exists \text{ continuous } f : A \to \mathbb{R}^n \setminus \{0\} \text{ with } f(j(x)) = -f(x) \text{ for all } x \in A\}$$
while $\operatorname{gen}(\emptyset) := 0$.

As an example, we state:

Lemma 9.3.2. The genus of the unit sphere $S^{n-1} = \{\|x\| = 1\}$ in $\mathbb{R}^n$ (with involution $j(x) = -x$) is equal to $n$.
Proof. The inclusion map $S^{n-1} \hookrightarrow \mathbb{R}^n$ satisfies the properties of Definition 9.3.3, and so $\operatorname{gen}(S^{n-1}) \le n$. If $n \ge 2$, $S^{n-1}$ is connected, and therefore, by the intermediate value theorem, there is no continuous map $f : S^{n-1} \to \mathbb{R}^1 \setminus \{0\}$ with $f(-x) = -f(x)$ for all $x$. Hence $\operatorname{gen}(S^{n-1}) \ge 2$. In fact, by the Borsuk-Ulam theorem†, there is no such continuous map to $\mathbb{R}^m \setminus \{0\}$ with $m < n$.
by Lemma 9.3.2 . q.e.d.
Theorem 9.3.2. The genus as defined in Definition 9.3.3 is an index in the sense of Definition 9.3.2.

Proof. We need to check the properties (i)-(vi) of Definition 9.3.2.
(i) is obvious.
(ii) If $A \in \mathcal{M}$ is finite, then $A$ is of the form $\{x_\nu, j(x_\nu) \mid \nu = 1, \ldots, k\}$ for some $k$. We define $f : A \to \mathbb{R}^1 \setminus \{0\}$ by $f(x_\nu) = 1$, $f(j(x_\nu)) = -1$ for all $\nu$ (of course, we may assume $x_\mu \neq j(x_\nu)$ for all $\mu, \nu$).
(iii) Let $\operatorname{gen}(A_\nu) = n_\nu < \infty$, $\nu = 1, 2$, and let the continuous $f_\nu : A_\nu \to \mathbb{R}^{n_\nu} \setminus \{0\}$ satisfy $f_\nu(j(x)) = -f_\nu(x)$ for all $x$. By the Tietze extension theorem‡, $f_\nu$ can be continuously extended to $\bar{f}_\nu : X \to \mathbb{R}^{n_\nu}$. By considering $\frac{1}{2}(\bar{f}_\nu(x) - \bar{f}_\nu(j(x)))$ in place of $\bar{f}_\nu$, we may assume that the extension still satisfies
$$\bar{f}_\nu(j(x)) = -\bar{f}_\nu(x) \quad \text{for all } x.$$
The map $(\bar{f}_1, \bar{f}_2) : A_1 \cup A_2 \to \mathbb{R}^{n_1 + n_2} \setminus \{0\}$ then shows that
$$\operatorname{gen}(A_1 \cup A_2) \le \operatorname{gen}(A_1) + \operatorname{gen}(A_2).$$
(iv) is obvious.
(v) follows, since $f \circ j$ shares the necessary properties with $f$.

† See e.g. E. Zeidler, Nonlinear Functional Analysis and its Applications, I, Springer, New York, 1984, p. 708, for a proof.
‡ See E. Zeidler, loc. cit., p. 49.
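(vi) Let $A \in \mathcal{M}$ be compact. Since $j(x) \neq x$ for all $x \in A$ (by the properties of $\mathcal{M}$), for each $x \in A$, we may find a neighbourhood $U(x)$ with $U(x) \cap j(U(x)) = \emptyset$. Since $A$ is compact, it can be covered by finitely many such neighbourhoods $U_\nu$, $\nu = 1, \ldots, n$. For each $U_\nu$, we choose a continuous function $\varphi_\nu : X \to \mathbb{R}$ with $\varphi_\nu(x) > 0$ for $x \in U_\nu$, $\varphi_\nu(x) = 0$ for $x \in X \setminus U_\nu$. We then define $h = (h^1, \ldots, h^n) : A \to \mathbb{R}^n \setminus \{0\}$ by
$$h^\nu(x) := \begin{cases} \varphi_\nu(x) & \text{for } x \in U_\nu \\ -\varphi_\nu(j(x)) & \text{for } x \in A \setminus U_\nu, \text{ in particular for } x \in j(U_\nu). \end{cases}$$
(Since every $x \in A$ is contained in some $U_\nu$, we have $h(x) \neq 0$ for all $x \in A$.) Thus $\operatorname{gen}(A) \le n < \infty$. If $A \in \mathcal{M}$ is compact with $\operatorname{gen}(A) = n$, and
$$f : A \to \mathbb{R}^n \setminus \{0\}$$
is continuous with $f(j(x)) = -f(x)$, we may extend $f$ as before to $\bar{f} : X \to \mathbb{R}^n$ (with the same symmetry property). Since $A$ is compact, so is $f(A)$, and therefore, we may find an open neighbourhood $V$ of $f(A)$ with $\bar{V} \subset \mathbb{R}^n \setminus \{0\}$. Then $U := \bar{f}^{-1}(V)$ satisfies
$$n = \operatorname{gen}(A) \le \operatorname{gen}(U) \quad \text{by (iv)}$$
$$\le n \quad \text{since } \bar{f}(U) \text{ is contained in } V \subset \mathbb{R}^n \setminus \{0\}.$$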
Thus $\operatorname{gen}(U) = \operatorname{gen}(A)$ as required. q.e.d.

We may now obtain a general existence theorem for critical points of functionals satisfying (PS):

Theorem 9.3.3. Let $F, G : H \to \mathbb{R}$ be $C^2$ functionals on a Hilbert space $(H, \langle\cdot,\cdot\rangle)$ that are even, i.e. $F(x) = F(-x)$, $G(x) = G(-x)$ for all $x \in H$. Suppose $F$ satisfies (PS) relative to $G = \beta$, and is bounded from below. Let
$$\mathcal{M} := \{A \subset \{G(x) = \beta\} \mid 0 \notin A \text{ and } (x \in A \Leftrightarrow -x \in A)\}.$$
Let $\gamma_0 := \sup\{\operatorname{gen}(K) \mid K \in \mathcal{M} \text{ compact}\}$ $(\le \infty)$. Then $F$ possesses at least $\gamma_0$ critical points relative to $G = \beta$.

Proof. Since (PS) holds, by Theorem 9.1.2, all special points (in the sense of Definition 9.3.1) for the restriction of $F$ to $X := \{x \in H \mid G(x) = \beta\}$ are critical points for $F$ relative to $G = \beta$. Hence, it suffices to produce $\gamma_0$ special points of $F$ on $X$. Let
$$\alpha_n := \inf_{A \in \mathcal{M},\ \operatorname{gen}(A) \ge n}\ \sup_{x \in A} F(x).$$
Since $F$ is bounded below, and since in the definition of $\gamma_0$, we only consider compact sets, we have $-\infty < \alpha_n < \infty$ whenever $n \le \gamma_0$. By Theorem 9.3.2, we may apply Theorem 9.3.1 to the genus as an index. We have in fact
$$-\infty < \alpha_1 \le \alpha_2 \le \ldots \le \alpha_n < \infty \quad \text{whenever } n \le \gamma_0.$$
If we always have strict inequality, then the $x_n \in \operatorname{spec}_{\alpha_n} F$ produced by Theorem 9.3.1 (i) are all different, because their values $F(x_n)$ are all different. If however any two such numbers $\alpha_{n-1}$ and $\alpha_n$ are equal, then by Theorem 9.3.1 (ii) we even obtain infinitely many special points. Thus, in any case, we have at least $\gamma_0$ special, hence critical points. q.e.d.

As an application of Theorem 9.3.3, we consider the example of the previous section:

Corollary 9.3.2. Let $\Omega \subset \mathbb{R}^d$ be a bounded domain, $2 < p < \frac{2d}{d-2}$ (respectively $< \infty$ for $d = 1, 2$). Then for any $\lambda \ge 0$, the Dirichlet problem
$$\Delta u - \lambda u + |u|^{p-2}u = 0 \quad \text{in } \Omega \tag{9.3.11}$$
$$u = 0 \quad \text{on } \partial\Omega \tag{9.3.12}$$
admits infinitely many (weak) solutions.

Proof. We consider the even functionals
$$F(u) = \frac{1}{2}\int_\Omega \left( |Du|^2 + \lambda |u|^2 \right), \qquad G(u) = \frac{1}{p}\int_\Omega |u|^p.$$
We claim that $F$ satisfies (PS) relative to $G = 1$. The proof is similar to the argument employed for the demonstration of Theorem 9.2.2: let $(u_n)_{n\in\mathbb{N}} \subset H_0^{1,2}(\Omega)$ satisfy $G(u_n) = 1$,
$$|F(u_n)| \le c_1 \tag{9.3.13}$$
$$\left\| DF(u_n) - \frac{\langle DG(u_n), DF(u_n)\rangle}{\|DG(u_n)\|^2}\, DG(u_n) \right\| \to 0 \quad \text{for } n \to \infty \tag{9.3.14}$$
where all norms and scalar products are from $H_0^{1,2}(\Omega)$. From (9.3.13) (and the Poincaré inequality in case $\lambda = 0$), we obtain
$$\|u_n\|_{H^{1,2}(\Omega)} \le c_2. \tag{9.3.15}$$
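We obtain as in the proof of Theorem 9.2.2 (cf. (9.2.5)), by using Hölder's inequality, that
$$|DG(u_n)(u_n - u_m)| = \left| \int |u_n|^{p-2}u_n(u_n - u_m) \right| \le \left( \int |u_n|^p \right)^{\frac{p-1}{p}} \left( \int |u_n - u_m|^p \right)^{\frac{1}{p}}. \tag{9.3.16}$$
Since $p < \frac{2d}{d-2}$, from (9.3.15) and Sobolev's Embedding Theorem 3.4.3, we conclude that $\int |u_n|^p$ is bounded, whereas (9.3.15) and the Rellich-Kondrachov theorem (Corollary 3.4.1) imply that, after selection of a subsequence, $(u_n)_{n\in\mathbb{N}}$ is a Cauchy sequence in $L^p(\Omega)$. Thus, from (9.3.16)
$$DG(u_n)(u_n - u_m) \to 0 \quad \text{for } n, m \to \infty. \tag{9.3.17}$$
Also
$$\|DG(u_n)\| = \sup_{w \in H_0^{1,2}(\Omega)} \frac{|DG(u_n)(w)|}{\|w\|_{H^{1,2}}} \ge \frac{|DG(u_n)(u_n)|}{\|u_n\|_{H^{1,2}}} = \frac{\int |u_n|^p}{\|u_n\|_{H^{1,2}}} \ge \frac{p}{c_2} > 0 \tag{9.3.18}$$
from (9.3.15) and
$$\frac{1}{p}\int |u_n|^p = G(u_n) = 1. \tag{9.3.19}$$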
From (9.3.17), (9.3.18) we conclude that there exist $h_{nm} \in H_0^{1,2}(\Omega)$ with
$$DG(u_n)(u_n - u_m + h_{nm}) = 0 \quad \text{for all } n, m \tag{9.3.20}$$
$$\|h_{nm}\|_{H^{1,2}} \to 0 \quad \text{for } n, m \to \infty. \tag{9.3.21}$$
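Therefore, from (9.3.14)
$$DF(u_n)(u_n - u_m + h_{nm}) \to 0,$$
i.e.
$$\int \left( Du_n \cdot (D(u_n - u_m) + Dh_{nm}) + \lambda u_n(u_n - u_m + h_{nm}) \right) \to 0 \quad \text{for } n, m \to \infty$$
and because of (9.3.21) then also
$$\int \left( Du_n \cdot D(u_n - u_m) + \lambda u_n(u_n - u_m) \right) \to 0.$$
This implies
$$\int \left( |D(u_n - u_m)|^2 + \lambda |u_n - u_m|^2 \right) \to 0 \quad \text{for } n, m \to \infty,$$
and consequently, $(u_n)_{n\in\mathbb{N}}$ is a Cauchy sequence in $H_0^{1,2}(\Omega)$. This verifies (PS) relative to $G = 1$. In order to apply Theorem 9.3.3, we thus only need to check that in the present case, $\gamma_0 = \infty$. However,
$$\left\{ u \in H_0^{1,2}(\Omega) \;\Big|\; \frac{1}{p}\int_\Omega |u|^p = 1 \right\}$$
is the intersection of a sphere centered at the origin in $L^p(\Omega)$ with the subspace $H_0^{1,2}(\Omega)$. Therefore, the argument of Lemma 9.3.2 easily implies $\gamma_0 = \infty$. Theorem 9.3.3 thus produces infinitely many solutions $u_n$ of
$$DF(u_n) - \frac{\langle DG(u_n), DF(u_n)\rangle}{\|DG(u_n)\|^2}\, DG(u_n) = 0,$$
i.e. with
$$\mu_n = \frac{\langle DG(u_n), DF(u_n)\rangle}{\|DG(u_n)\|^2},$$
weak solutions of
$$\Delta u_n - \lambda u_n + \mu_n |u_n|^{p-2}u_n = 0 \quad \text{in } \Omega$$
$$u_n = 0 \quad \text{on } \partial\Omega.$$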
If we choose $\nu_n$ with $\nu_n^{p-2} = \mu_n$, then $v_n := \nu_n u_n$ solves (9.3.11), (9.3.12) weakly. Again, we remark that elliptic regularity theory implies that all $u_n$ and $v_n$ are smooth in $\Omega$, so that in fact we obtain classical solutions of (9.3.11), (9.3.12). q.e.d.

In Theorem 9.2.2 and in Corollary 9.3.2, we had imposed the restriction
$$p < \frac{2d}{d-2} \quad (\text{in case } d \ge 3),$$
and the reader may wonder whether this is necessary. To pursue this question, we shall now discuss the theorem of Pohozaev:

Theorem 9.3.4. Let $\Omega \subset \mathbb{R}^d$ be a smooth domain which is strictly star shaped w.r.t. $0 \in \mathbb{R}^d$ (this means that the outer normal $\nu$ of $\partial\Omega$ satisfies $\langle x, \nu(x)\rangle > 0$ for all $x \in \partial\Omega$). Then for $\lambda \ge 0$, any solution of
$$\Delta u - \lambda u + |u|^{\frac{4}{d-2}}u = 0 \quad \text{in } \Omega \tag{9.3.22}$$
$$u = 0 \quad \text{on } \partial\Omega \tag{9.3.23}$$
vanishes identically.

We shall present a complete proof only for $\lambda > 0$ and for smooth solutions $u$ (elliptic regularity implies that any weak solution of (9.3.22), (9.3.23) is automatically smooth† on $\bar{\Omega}$, but the present book does not treat this topic): We multiply (9.3.22) by $\sum_{i=1}^d x^i \frac{\partial u}{\partial x^i}$ and obtain
$$\left( \Delta u - \lambda u + |u|^{\frac{4}{d-2}}u \right) \sum_{i=1}^d x^i \frac{\partial u}{\partial x^i} = 0. \tag{9.3.24}$$
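By (9.3.23), we have $u = 0$ on $\partial\Omega$, hence also
$$\sum_{i=1}^d x^i \frac{\partial u}{\partial x^i} = \sum_{i=1}^d x^i \nu^i \frac{\partial u}{\partial \nu} \quad \text{on } \partial\Omega$$
($\nu = (\nu^1, \ldots, \nu^d)$ is the exterior normal of $\Omega$). Integrating (9.3.25) therefore yields
$$\frac{d-2}{2}\int_\Omega |Du|^2 + \frac{\lambda d}{2}\int_\Omega |u|^2 - \frac{d-2}{2}\int_\Omega |u|^{\frac{2d}{d-2}} + \frac{1}{2}\int_{\partial\Omega} \left| \frac{\partial u}{\partial \nu} \right|^2 \sum_{i=1}^d x^i \nu^i = 0. \tag{9.3.26}$$

† See for example Appendix B in M. Struwe, Variational Methods, Springer, Berlin, 2nd edition, 1996.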
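On the other hand, multiplying (9.3.22) by $u$ leads to
$$\int_\Omega |Du|^2 + \lambda \int_\Omega |u|^2 - \int_\Omega |u|^{\frac{2d}{d-2}} = 0. \tag{9.3.27}$$
Equations (9.3.26) and (9.3.27) imply
$$2\lambda \int_\Omega |u|^2 + \int_{\partial\Omega} \left| \frac{\partial u}{\partial \nu} \right|^2 \sum_{i=1}^d x^i \nu^i = 0. \tag{9.3.28}$$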
If $\lambda > 0$, this implies $u = 0$ in $\Omega$, hence the result. (If $\lambda = 0$, one still concludes that $\frac{\partial u}{\partial \nu} = 0$ on $\partial\Omega$. Since also $u = 0$ on $\partial\Omega$ by (9.3.23) one may invoke a unique continuation theorem for solutions of elliptic equations to obtain $u = 0$ in $\Omega$. We omit the details.) q.e.d.

Theorem 9.3.4 implies that for $p = \frac{2d}{d-2}$ in Theorem 9.2.2 and Corollary 9.3.2, the Palais-Smale condition no longer holds. Namely, if it did, the proofs of those results would yield the existence of nontrivial solutions. It also shows that if the Palais-Smale condition fails, the whole scheme developed in the present chapter for producing critical points breaks down. Since for $p < \frac{2d}{d-2}$, (PS) does hold, the case $p = \frac{2d}{d-2}$ can be considered as a limit case for (PS). In fact, such limit cases of the Palais-Smale condition occur in many variational problems that are of importance in Riemannian geometry, e.g. the Yang-Mills functional on a four-dimensional Riemannian manifold, two-dimensional harmonic maps, surfaces of constant mean curvature, the Yamabe functional etc.

The interested reader is for example referred to
K. C. Chang, Infinite Dimensional Morse Theory and Multiple Solution Problems, Birkhäuser, Boston, 1993,
J. Jost, Riemannian Geometry and Geometric Analysis, Springer, Berlin, 2nd edition, 1998,
M. Struwe, Variational Methods, Springer, Berlin, 2nd edition, 1996,
and the references contained therein. The basic references that have been used in writing the present chapter are the monograph of M. Struwe just quoted, as well as
P. Rabinowitz, Minimax Methods in Critical Point Theory with Applications to Differential Equations, CBMS Reg. Conf. Ser. 65, AMS, Providence, 1986
and
E. Zeidler, Nonlinear Functional Analysis and its Applications, III, Springer, Berlin, 1984.
These three monographs contain not only detailed bibliographical references (which the reader is urged to consult in order to find the original sources of the results of the present chapter) but also many further results and examples concerning the Palais-Smale condition and index theories.
Exercises
9.1 Why is Theorem 9.2.1 called 'mountain pass theorem'? Hint: Try to find an analogy between the statement of that result and the geometry of mountain passes.
9.2 Try to find conditions for a function $f$ so that the reasoning of Theorem 9.2.2 can be extended to the Dirichlet problem
$\Delta u(x) = f(x, u(x))$ for $x \in \Omega$
$u(x) = 0$ for $x \in \partial\Omega$
in a smooth bounded domain $\Omega$. (An answer can be found in Theorem 6.2 of the quoted monograph of M. Struwe.)
9.3 Develop an index theory for a general compact group $G$ in place of $\mathbb{Z}_2$.
9.4 Extend Theorem 9.1.3 to the relative case as indicated at the end of Section 9.1.