Quadratic Form Theory and Differential Equations John Gregory cz:
DEPARTMENT OF MATHEMATICS SOUTHERN ILLINOIS UNIVERSITY AT CARBONDALE CARBONDALE, ILLINOIS
@
1980
ACADEMIC PRESS A Subsidiary of Harcourt Brace Jovanovich, Publishers
New York
London
Toronto
Sydney
San Francisco
COPYRIGHT ' 1980, BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF mls PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM mE PUBLISHER.
ACADEMIC PRESS, INC.
111 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by ACADEMIC PRESS, INC. (LONDON) LTD. 24/28 Oval Road, London NWI 7DX
Library of Congress Cataloging in Publication Data Gregory, John, Date Quadratic rorrn theory and differential equations. (Mathematics in science and engineering) Bibliography: p. Includes index. 1. Forms, Quadratic. 2. Differential equations, Partial. 3. Calculus of variations. 1. Title. II. Series. QA243.G73 512.9’44 80-520 ISBN 0-12-301450-6
PRINTED IN mE UNITED STATES OF AMERICA
80 81 82 83
987654321
To Virginia and Magnus
Contents
Chapter 0
A Few Introductory Remarks
Chapter 1
Introduction to Quadratic Forms and Differential Equations
1.0 LI 1.2 1.3 1.4
Introduction The Finite-Dimensional Case The Calculus of Variations Fundamental Lemmas (Integration by Parts) Quadratic Forms and Differential Equations
Chapter 2 2.0 2.1 2.2 2.3
3.0 3.1 3.2 3.3 3.4 3.5
Abstract Theory
Introduction Hilbert Space Theory Further Ideas of Hestenes Approximation Theory of Quadratic Forms
Chapter 3
4 5 25 31 38
58
59 62
73
The Second-Order Problem 82 83 88 103 114
Introduction The Focal-Point Problem The Numerical Problem The Eigenvalue Problem The Numerical Eigenvalue Problems Proofs of Results
135 vii
viii
Contents
Chapter 4 4.0 4. I 4.2 4.3 4.4
Introduction The Signature Theory of Lopez Approximation Theory Comparison Results Higher-Order Numerical Problems and Splines
Chapter 5 5.0 5.1 5.2 5.3
I 2 3 4
174 175
188 197
The Quadratic Control Problem
Introduction Focal-Interval Theory of Quadratic Forms Focal Arcs of Differential Equations Two Examples An Approximation Theory of Focal Intervals
Postscript
140 143 156 160 166
Elliptic Partial Differential Equations
Introduction Summary The Numerical Problem Separation of Variables
Chapter 6 6.0 6.1 6.2 6.3 6.4
The 2nth-Order Problem
201 202
208 215 221
The Numerical Problem Revisited
The x(t)x'(t) Term Cheap Boundary-Value Methods Systems Nonlinear Problems
225 226 227
229
References
231
Index
235
Preface
Historically, quadratic form theory has been treated as a rich but misunder› stood uncle. It appears briefly, almost as an afterthought, when needed to solve a variety of problems. A partial list of such problems includes the Hessian matrix in n-dimensional calculus; the second variational (Jacobi or accessory) problem in the calculus of variations and optimal control theory; Rayleigh- Ritz methods for finding eigenvalues of real symmetric matrices; the Aronszajn- Weinstein methods for solving problems of vibrating rods, membranes, and plates; oscilla› tion, conjugate point, and Sturm comparison criteria in differential equations; Sturm- Liouville boundary value problems; spline approximation ideas for nu› merical approximations; Gershgorin-type ideas (and the Euler- Lagrange equa› tions) for banded symmetric matrices; Schrodinger equations; and limit-point› limit-circle ideas of singular differential equations in mathematical physics. A major purpose of this book is to develop a unified theory of quadratic forms to enable us to handle the mathematical and applied problems described above in a more meaningful way. Our development is on four levels and should appeal to a variety of users of mathematics. For the theoretically inclined, we present a new fonnal theory of approximations of quadratic forms/linear operators on Hilbert spaces. These ideas allow us to handle a wide range of problems. They also allow us to solve these problems in a qualitative and quantitative manner more easily than with more conventional methods. Our second level of develop› ment is qualitative in nature. Using this theory, we can derive very general quali› tative comparison results such as generalized Sturm separation theorems of dif› ferential equations and generalized Rayleigh- Ritz methods of eigenvalues. Our theory is also quantitative in nature. We shall derive in level three an approxima› tion theory that can be applied in level four to give numerical algorithms that are easy to implement and give good numerical results. ix
x
Preface
Our development will provide several bonuses for the reader. A major advan› tage is that our numerical theory and algorithms are designed to be used with high-speed computers. The computer programs are small and easy to implement. They trade detailed analysis by and sophistication on the part of the user for large numbers of computer computations that can be performed in fractions of milli› seconds. Another advantage is that our four levels can be understood and used (virtually) independently of each other. Thus our numerical algorithms can be understood and implemented by users with little mathematical sophistication. For example, for eigenvalue problems, we need no understanding of projection operators, Hilbert spaces, convergence, Green’s functions. or resolvent opera› tors. We need only the idea of the Euler- Lagrange equation, an idea that we can obtain a discrete solution as a result of level one, a one-step- three-term differ› ence equation, and an interval-halving procedure. As with any mathematical theory, we shall leave the reader with several re› search problems still unanswered. In the area of discrete mathematics, we present for splines and for real symmetric banded or block diagonal symmetric matrices a use that may stimulate further research. For those problems in optimal control theory, we expect our methods, which give qualitative results, to give quantitative results similar to those obtained for the calculus-of-variations case. For the area of limit-point-limit-circle differential equations and singular differ› ential equations (Bessel, Legendre, Laguerre), we expect our ideas to carry over to this very important area of mathematical physics. For the area of differential equations, we hope that our ideas on integral-differential equations can lead to new ideas for oscillation theory for non-self-adjoint problems. Our concept of quadratic form theory began with the landmark Pacific Journal of Mathematics paper by Professor Magnus Hestenes in 1951. For many years, he was convinced. that there should be a unified method for problems dealing with a quadratic form Jix) on a Hilbert space d. A major part of his work depends upon two nonnegative integer-valued functions s and n, which corres› pond to the number of negative and zero eigenvalues of J(x). In subsequent years, Hestenes and his students showed how this theory could be applied to solve a multitude of applied problems. In 1970 the author developed, in a Ph. D. thesis under Professor Hestenes at the University of California, Los Angeles, an approximating theory of quadratic forms J(x;u) defined on Hilbert spaces d (rr), where a is a parameter in a metric space. In this and subsequent work, this approximation theory has been used to solve the types of problems listed above. A major part of our work involves the development and interpretation of inequalities concerning s(u) and n(u) as tr approaches a fixed member a 0 of the matrix space I. In Chapter I we take a look backward at more classical methods and ideas of quadratic forms. It may initially be read briefly for flavor and interest since this material is not completely necessary for subsequent chapters. We begin this
Preface
xi
chapter with finite-dimensional quadratic forms. Many of these ideas will be new to even the sophisticated reader and will appear in an infinite-dimensional con› text in later parts of the text. The topics include the duality between quadratic forms and symmetric matrices, stationary conditions off: lRn ~ lRI, Rayleigh› Ritz methods, and eigenvalues as comparison parameters. Section 1.2 contains a brief introduction to the calculus of variations and in particular the second varia› tion. Of interest is that the Euler- Lagrange necessary conditions are differential equations. In Section 1.3 we cover a general theory of integration by parts and multiplier rules. Section 1.4 explores briefly the relationship between quadratic forms and differential equations. Many examples are included, covering the sim› pler second-order problems to the more difficult 2nth-order control theory or partial differential equations. Chapter 2 may also be initially read for flavor by all but the theoretical mathe› matician since it contains our theoretical machinery and results. Section 2.1 con› tains the basic Hilbert space material, which was given by Hestenes and which forms the basis of our approximation theory. The majority of the material in Section 2.2 is more general than needed for the remainder of this book. Section 2.3 is our fundamental theoretical section yielding nonnegative integer inequali› ties. Briefly, if s(o-) and n(o-) correspond to the number of negative and zero eigenvalues of a quadratic form or symmetric matrix, then for 0- "close to" 0- 0 we obtain s(o- 0) :0::;; s(o-) :0::;; s(o-) + n(o-) ~ s(o- 0) + n(o- 0)’ This innocent-looking inequality is used extensively throughout this book. Chapter 3 is a complete discussion of the second-order problem, and the reader is strongly advised to begin here. We have made a serious attempt to make our ideas in this chapter conceptually clear and descriptive so as to be readily understood. In a real sense, Chapter 3 is a book unto itself. The nontheoretical parts may be understood by senior-level students in mathematics and the physical sciences. Once grasped, the remainder of the book can at least be read for the flavor of more general examples. Formal proofs have been postponed until the last section of this chapter. We begin Chapter 3 with a discussion of the duality of focal-point theory of quadratic forms and the oscillation theory of differential equations. Section 3.2 contains approximation ideas and shows how to build numerical solutions for differential equations. Sections 3.3 and 3.4 contain gen› eral theories for eigenvalue problems. The unified setting yields numerical› eigenvalue-focal-point theories and results, as wen as efficient and accurate computer algorithms. Chapter 4 contains the most general ordinary-differential-system-quadratic› form problem, namely, the self-adjoint 2nth-order integral-differential case be› gun by Hestenes and Lopez. The exposition is primarily theoretical, but in Sec› tion 4.4 we do give numerical ideas of higher-order spline approximations and banded symmetric matrices. Section 4.1 contains the work of Lopez relating quadratic forms and differential equations. Section 4.2 contains our approxima-
xii
Preface
tion theory. Section 4.3 presents a general comparison theory and results that are applicable to a variety of problems. Chapter 5 contains the elliptic partial differential equation theory begun by Hestenes and Dennemeyer; this theory is contained in Section 5.1. The numeri› cal construction of conjugate (or focal) surfaces for Laplacian-type partial differ› ential equations, including eigenvalue results, is given in Section 5.2. In Section 5.3 we give a separation-of-variables theory for quadratic forms and new ideas for block tridiagonal matrices. Chapter 6 contains a general theory of quadratic control problems begun by Hestenes and Mikami. In particular, in Section 6.1 we generalize the concepts of oscillation, focal, and conjugate point to focal intervals and show how to count and approximate them. The concept of abnormality is the key idea here, which distinguishes conjugate-point (calculus-of-variations) problems and focal-inter› val (optimal control theory) problems. In Section 6.2 we apply these ideas to solutions of differential equations. In Section 6.3 we give two nontrivial exam› ples to illustrate abnormality. Finally, in Section 6.4 we apply our approximation ideas a second time to obtain an approximation theory of focal intervals. It should be evident that we have been influenced by many distinguished scholars whose works cover several centuries. We should like particularly to acknowledge the work and guidance of Professor Magnus Hestenes in the begin› ning of this effort. Quadratic form theory is only one of at least four major mathematical areas that bear his stamp. To paraphrase one of our most illustrious forefathers, "If we have seen further than others, it is because we have stood on the shoulders of giants. " We should like to acknowledge Lewis Williams and Ralph Wilkerson for their support in the generation of computer algorithms that appear in this text. We acknowledge Joseph Beckenbach for his fine illustrations, Sharon Champion for her expert typing and patience in reading handwritten pages, and the author’s charming wife, Virginia, for her editorial corrections. Finally, the author would like to thank Professor Richard Bellman for inviting him to write this book at an early stage of its development, thus providing the encouragement to complete the task.
A Few Introductory Remarks
Chapter 0
On May 14, 1979, I had just arrived at the Technical University in Wroclaw, Poland. My luggage and hence my notes had not yet arrived due to the inefficiency of the American (not the Polish) airlines. There was a scheduled Monday morning seminar, and I was asked if I should like to speak, essentially on the spur of the moment. I replied, as one must in those circumstances, "Of course." It seems reasonable that the summary of such a presentation before a charming though general audience, having some lan› guage difficulties, might form an introduction to this book. From the point of viewofthis book, the followingdiagram is fundamental: Differential equations problem
(2)
I
Equivalent quadratic form problem
-
(1)
Solution of differential equations problem
(4)
(3) ~
I
Solution of quadratic form problem
Usually, people working on differential equations proceed on path (1). By this statement, we mean that they have their own methods to solve their problems. Thus, a numerical problem might call for divided difference methods, while oscillation theory might call for Sturm theory type argu› ments. Our approach will be to convert the differential equation into the equivalent quadratic form, path (2); solve this quadratic form problem, path (3); then convert back into the solution, path (4). These methods seem to re› quire more steps. However, the steps are often easier to accomplish and are 1
2
0 A Few Introductory Remarks
more enlightening. We get better results, methods, and ease of applicability. In addition, we have more flexibility and more ability to generalize to more difficult problems with less additional effort. Three example problem areas come to mind, and we shall quickly describe them in the next few paragraphs, deferring a more thorough explanation until Chapter 3. We ask the reader to skim the next few paragraphs for the cream and not be concerned about details. Equally important, we ask the reader to note that these examples can be easily combined by our ideas, a process not easily performed on path (1). We shall illustrate a numerical oscillation eigenvalue theory of differential equations at the end of the next few paragraphs. Let L(x) be a linear self-adjoint, differential operator, and Q(x) be the associated quadratic form, such as our most elementary infinite example L(x) = x"(t) + x(t)
(1)
= 0,
I: (X,2 - x 2)dt, Q(x, y) = I: [X’(t)y’(t)- x(t)y(t)] dt.
(2a)
Q(x) =
and (2b)
For (1) we wish to study conjugate or oscillation points relative to t = 0; that is, point A such that there is a nontrivial solution of (1), denoted xo(t), such that xo(O) = Xo(A) = O. (1)is the Euler-Lagrange equation of(2). It is obtained by integration by parts or a divergence theorem. Let ~(A) denote the col› lection of smooth functions such that x(t) is in ~(A) implies x(O) = 0 and x(t) == 0 on [A, b]. We shall see that ~(A) is a subspace of a Hilbert space. For (2),we wish to determine the signature S(A), that is, the dimension of’t&’where ’t&’ is a maximal subspace of ~(A) with respect to the property that x =F 0 in ’t&’ implies Q(x) < O. That is, S(A) is the dimension of a negative space of ~(A). Let n(A) = dim{x in ~(A)I Q(x,y) = 0 for y in ~(A)}. These two nonnegative indices correspond, respectively, to the number of negative and zero eigen› values of a real symmetric matrix. Instead of finding the zeros of (1) subject to y(O) = 0, path (1), we convert L(x) to Q(x), path (2), solve the signature S(A) for each 0:::;; A :::;; b, path (3), and finally use the result that for Ao in [0, b], (3)
S(AO) =
L
n(A).
).
Thus, S(A o) counts the number of oscillation points before t = AO’ Similarly, the eigenvalue differential equation L(x;~) = x" + x(O) = x(n) is converted to a quadratic form
J(x;~)
= J(x)
-
~K(x)
=
I: X,2 dt - ~ Io" x
2
dt.
~x
= 0,
o
A Few Introductory Remarks
3
This time, s(~) is the signature of J(x;~) on a smooth space of functions defined on [0, n] vanishing at the end points. We solve this problem, path (3); then establish the equivalence between an eigenvalue ~o and the discon› path (4). tinuity in s(~o), Similarly, for numerical problems, we convert (1) to a quadratic form (2a), in path (2), numerically approximate (2a) by a finite-dimensional qua› dratic form, path (3), and then show that this approximation leads to a numerical solution that converges to the desired result in a very strong, derivative norm sense, path (4). As we have remarked before, our methods allow us to combine these three problems in a relatively simple manner to obtain a numerical oscillation theory of eigenvalues.
Chapter 1
Introduction to Quadratic Forms and Differential Equations
1.0 Introduction
The purpose ofthis chapter is to present to the reader much of the beauty and many of the fundamental ideas of quadratic forms. This chapter is an introduction to the remainder of this book. It may be read (and reread) for interest and examples, or it may be skipped entirely by those who are only interested in specific problems. Section 1.1 treats the finite-dimensional case or equivalently a real sym› metric matrix. Since most readers may be familiar with the usual ideas, we have included several topics that illustrate important ideas which are not commonly known nor understood. We believe even the expert will find these topics of interest. The topics are: (a) the duality between finite-dimensional quadratic forms and matrices; (b) optimal or stationary conditions of f:lRn --+ 1R 1 and in particular second-order conditions (the Hessian); (c) the (finite) Rayleigh-Ritz method for obtaining eigenvalues; and (d) eigenvalues as comparison parameters. Section 1.2 contains a brief introduction to the calculus of variations. Of special interest is the second variation functional or the stationary value con› ditions when the original functional is quadratic. The major idea is that the necessary condition for quadratic functionals leads to a self-adjoint differ› ential system of equations. Some interesting examples are given. In Section 1.3 we explore the fundamental tool in our work, i.e., inte› gration by parts. We show that these ideas can be put on a sound mathe› matical basis. Of special note is the use of multiplier rules. In Section 1.4we explore briefly the relationship between quadratic forms
1.1 The Finite-Dimensional Case
5
and differential equations. In particular, two indices of quadratic forms are introduced and their relationship with solutions of differential equations with boundary value problems are given for many interesting problem settings. This section also contains many examples that the reader should find helpful. We note that these indices correspond to the number of negative and zero eigenvalues of a real symmetric (possibly infinite) matrix. Our initial idea was to include a section on the Aronszajn-Weinstein theory of eigenvalues for compact operators since classically these ideas pro› vide one of the most beautiful uses of Hilbert space theory. However, with the use of a computer we have developed numerical algorithms (Chapters 3 and 5) that surpass the computational algorithms of those classical methods in speed, accuracy, and feasibility. The interested reader may consult Gould [12] for the best explanation of these methods.
1.1
The Finite-Dimensional Case
In this section we treat four topics. Our criteria of which topics to include and of the degree of each topic were based on a personal judgment, based upon interesting ideas and what we feel is needed to understand quadratic form theory and the remainder ofthis book. Whenever possible we shall avoid technical details, results, and settings and use an expository style. The first topic deals with the duality between real finite-dimensional quadratic forms on /R" and real symmetric matrices. The second topic deals with optimal or stationary conditions of a function f: /R" -> 1R 1 and second-order necessary conditions involving the symmetric matrix A (the Hessian) with elements aij = iJ 2f/iJxJJxj evaluated at a stationary point. The third topic is the (finite) Rayleigh-Ritz method for obtaining the eigenvalues of a real symmetric matrix. Our fourth and last topic is the concept of eigenvalues as companion parameters between a real symmetric matrix A and the identity, or more generally another real symmetric matrix B. We have also added some ideas on Lagrange multipliers for yet another view of eigenvalue theory and extremal problems. In fact, as we shall indicate in subsequent sections and chapters, this is often the correct, more practical view of eigenvalues. We begin the first topic by assuming that JR is a finite-dimensional, real inner product space and Q(x) is a quadratic form defined on JR. The re› mainder of the book will be concerned with extending these concepts, along with the "meaning" of nonpositive eigenvalues, to infinite-dimensional qua› dratic forms Q(x) and Hilbert spaces JR. Our model of JR in dimension n is usually /Rn and of Q(x) is xT Ax = (Ax, x), where A is an n x n real symmetric matrix, x an n vector, and x T the transpose of x. For completeness, we include some topics involving background material in the next few paragraphs.
6
1 Introduction to Quadratic Forms and Differential Equations
We assume that the reader is familiar with the definition of (Yf’, [RI, +,.) as a real vector space, !/ a subspace of Yf’, linear combinations, linear in› dependence and linear dependence, span, and basis. IfYf’ is a vector space, an inner product on Yf’is a function (" .): Yf’ x Yf’ ~ [Rl such that if x, y, z in Yf’ and e in [Rl, then (x + y, z) = (x, z) + (y, z),(ex, y) = e(x, y), (x, y) = (y, x), and (x,x) ~ 0 with equality if and only if x = 0 in Yf’. The usual example is (x,y) = yTx = ajb j, where repeated indices are summed, x = (ai>’" ,an)T, and y = (b l , .. ,bn)T in [Rn. The norm of x is a function 11’11:Yf’ ~ [RI such that is the positive square root of (x, x). The following ideas are found in most standard texts, for example, Hoffman and Kunze [32].
Ilxll
[RI
Theorem 1 we have
If Yf’ is an inner product space, then for any x, y in Yf’ and e in
(i) Ilexll = lelllxll, (ii) Ilx/l ~ 0, /Ixll = 0 if and only if x (iii) I(x, y)1 s /Ixlilly/l, (iv) Ilx + yll s Ilxll + IIyll•
=
0,
Statement (iii) is the well-known Cauchy-Schwartz inequality, and (iv) is the triangular inequality. We remark that both of these inequalities hold in the more general case of a real symmetric matrix A associated with a quadratic form Q(x) = (Ax, x) if Q(x) is nonnegative, i.e., x =F 0 implies Q(x) ~ O. The inner product is the special case with A = I. We shall make these concepts clearer below, but for now let Q(x, y) = (Ax, y) = (x, Ay) = Q(y, x) be the bilinear form. Conditions (iii) and (iv) become, respectively,
IQ(x,
y)1 s .jQ(x)JQ(y), (iii)’ (iv)’ JQ(x + y) s .jQ(x) + JQ(y). Condition (iii)’ follows since for Areal,
os
Q(x
+ AY) = Q(x + Ay,x + AY) = Q(x, x) + Q(Ay,x) + Q(x, AY) + Q(AY,AY) = Q(x) + 2AQ(X,y)+ A2Q(y).
If x = 0 or y = 0, we have equality in (iii)’.The fact that the quadratic func› tion of Ahas no roots or one double root implies the discriminant "B 2 - 4AC" of the quadratic formula is nonpositive; otherwise we would obtain two real values of A, and hence f(A) = A2Q(y) + A[2Q(x, y)] + Q(x) is negative for some A = Ao. Thus B 2 - 4AC = 4Q2(X, y) - 4Q(x)Q(y) s 0, and hence Q2(X, y) S Q(x)Q(y). If Q(x) > 0, equality holds if and only if x + Ay = O.
1.1
The Finite-Dimensional Case
7
For (iv)’, Q(x + y) = Q(x) + 2Q(x, y) + Q(y) ::; Q(x) + 2IQ(x, Y)I + Q(y) ::; Q(x) + 2JQ(x)JQ(y) + Q(y) = (JQ(x) + JQ(y»2. Since Q(x + y) ~ 0 we may take square roots of both sides to obtain (iv)’.By "dropping the Q" we obtain the usual proofs of (iii) and (iv) in Theorem 1. The vector x is orthogonal to y if(x, y) = O. The vector x is orthogonal to !/ (a subset of .Yf’) if (x, y) = 0 for all y in !/. !/ is an orthogonal set if (x, y) = 0 for all x # y in [1’. !/ is an orthonormal set if [I’ is an orthogonal set and IIxli = 1 for all x in !/. The Gram-Schmidt orthogonalization process ,xn } are n linearly independent vectors, provides that if {x., x 2 , there exists an orthonormal set of vectors {Y1’Y2,... ,Yn} such that span{x1" .. ,xk ) = span{y, ... ,Yd,where 1::; k s; n. The vectors {Yk} are defined inductively by Y1 = xtlllx111 and Ym+1 = zm+tlllzm+111, where (as› suming Y2, ... ,Ymhave been found) m
(1)
Zm+
1=
X m+
1-
I
(xm + 1, y,Jh•
k=l
In fact, Zm is the solution to the projection or best approximation problem illustrated by Fig. 1.
Fig. 1
If.Yf’is a vector space over [R1, then L:.Yf’ --+ .Yf’ is a linear operator if x, Y in .Yf’ and c in ~1 imply L(cx + y) = cL(x) + L(y). It is well known (but bothersome to state precisely) that there is an isomorphism between the set oflinear operators L(.Yf’)and the set Atn x n of n x n matrices, where .Yf’ is an n-dirnensional vector space. However, before we move on, let us illustrate the above definitions and concepts by assuming .Yf’ = {x(t) = a o + a1t + a2t2 + a 3t 3 } , where a k in
8
1
Introduction to Quadratic Forms and Differential Equations
[RI (k = 0,1,2,3) with L = D, the derivative operator. Choosing the standard basis {1,t,t Z,t3} of £’ with coordinates a = (aO,aI,aZ,a3)T in [R4, we note that D(1) = 0 = 01 + Ot + OtZ + Ot3, D(t) = 1 = 11 + Ot + OtZ + Ot3,
D(t Z) = 2t = 01 + 2t + OtZ + Ot3, D(t 3) = 3t Z = 01 + Ot + 3t Z + Ot3. Thus D, is identified with the 4 x 4 matrix
0
1 0
M= 0 0 2 ( 000 000 while D( - 3x + 4x Z ) = - 3 + 8x since
Note that D(t k) determine the components of the columns of M. Similarly, this four-dimensional space becomes an inner product space if we define either or
(x, y)z =
f
1
p(t)x(t)y(t)dt,
where
x(t)
=
y(t) =
+ a.: + azt Z + a3t3 bo + bIt + bzt Z + b 3t3, ao
and p(t) > 0 and integrable. Note that if x(t) = t and y(t) = t 3, then x(t) is orthogonal to y(t) using (, )1> but x(t) is not orthogonal to y(t) using ( , )z since in this case with p(t) = 1 for example,
(x,y)z = Similarly
"xiiI = ~ Ilxl ~
=
f~1 (t)(t 3)dt = t t 51:’1 = %=1= O. 1, while
= (x,x)z =
Ilxllz = -A since
fi tZdt
=
tt31:’1 = i.
9
1.1 The Finite-Dimensional Case
Things are not all that bad, however, since all norms in finite-dimensional vector spaces are equivalent, that is, there exists m and mwith 0 < m < m such that
mllxll s Ilxll’::;; mllxll for any x in JIf’ and norms "’11,11-1/’. Thus in [R2 if x = (t1,t2)T, then the circle norm IIxll3 = (tf + t~)1/2 is equivalent to the square norm IIxl14 = max{lt 11, It21} since each circle of positive radius has both an inscribed and circumscribed square, that is for example tmax{lt11,lt21}::;; (tf + tD l / 2::;; 2max{lt 11,lt21}• Using ( ,
h,
the Cauchy-Schwartz inequality becomes
/2 /2 2(t)dty If1 X(t)y(t)dtl::;; (f1 X (f1 y2(t)dty for any third-order polynomial x(t) and y(t), while the triangular inequality becomes
(f~1
[x(t)
(fl
X2(t)dty
+ y(tWdtY/2s
/2
+ (f~1y2(t)dtY/2
Finally the Gram-Schmidt process for the basis {1, t, t 2, t 3} and the inner product (, h proceeds using (1) and letting X1(t) = 1, xit) = t, X3(t) = t 2, X4(t)=t 3:
I x1(t)"~
= f~
1
2
1 dt = 2,
Z2(t)=t-(f1 t Il zi t )112 =
f~
Y1 =
1/.j2,
~dt)~=t-~~1~1
1(t - t)2 dt = 1(t -
=t-~,
t)31 ~ 1 = i x if= /2’
yit) = .JI2(t - t)•
To save the reader’s patience (and cost of books) we shall stop the procedure here, except to note that the reader may verify that (Yi’yjh = J ij , where i, j = 1, 2 and Jij = 0 if i =1= j, and = 1 if i = j. Jij is the Kroneker delta. The next step is to compute Z3(t) by
aij
2 Z3(t) = t -
(f~l
t
2
~dt)
~
- {f1 t
2[
.JI2(t -
D
Jdt} JIT(t
-~)
and continue the above process to Z4(t). Returning to our topic, we remark that in this text we are interested only in self-adjoint linear transformations. Let L: JIf’ --+ JIf’ be a linear operator. L T is the adjoint of L if (Lx, y) = (x, y) for all x, Y in ;Yf. L is self-adjoint if
e
10
1 Introduction to Quadratic Forms and Differential Equations
L = LT. It is well known that L T is a linear operator; hence in the isomor› phism between L(.Yt’,.Yt’)and v1tnxn described above, every self-adjoint linear operator is associated with a real symmetric matrix A = AT, i.e., A = (ajj) implies aij = ajj. It is also well known that L is self-adjoint if and only if its associated matrix A in every orthogonal basis is a symmetric matrix. In this case Q(x) = (Lx, x) or x T Ax = (Ax, x) is the associated quadratic form. Note that if Q(x) is quadratic, then Q(x, y) = (Ax, y) is the associated bilinear form and (2)
Q(x + y) = Q(x) + 2Q(x, y) + Q(y).
For the remainder of this subsection we assume that .Yt’ is !Rn and the linear operators are symmetric matrices A = AT. Many of our comments hold equally well for .Yt’ a finite-dimensional, real vector space. The matrix U is orthogonal if U- 1 = UT. Since (Ux, Uy) = (Uy)TUX= yTUTUx = yTlx = yTX and II U xll 2 = Ilxll2, angles and distances are preserved under orthogonal transformations. A is orthogonally diagonalizable ifthere exists an orthogonal matrix U such that UTAU = U- 1 AU = D, where D is a diagonal matrix, that is, D = (dij) implies dij = 0 if i #- j. The complex number Ais an eigenvalue of the matrix A if there exists a nonzero vector x, called an eigenvector, such that Ax = Ax. One of the most beautiful results in mathematics is the principal axis theorem: A is orthogonally diagonalizable with D = diag{Al> ..1. 2 , , An} and real eigenvalues {Ai} if and only if A is symmetric. Further› more the ith column vector of U is a normalized eigenvector corresponding to Ai’ and these eigenvectors are mutually orthonormal since UU- 1 = UU T = I = (c5ij)’ We note that the calculation n
Q(x) = . ~ I.
J
aijtitj = 1
~
(J
=
L Ajst
(t:1) TA:(t 1) t
t
n
T
U AU (;:)
~
n
(J (i:) D
n i= 1
shows that under the proper orthogonal transformation (rotation of axis) the quadratic form Q(x) = (Ax, x).\i relative to a basis fJB has a more natural basis where the principal axis theorem holds. The vector x in .Yt’ = !Rn has com› , sn)T relative ponents (t 1>’ , tn)T relative to basis fJB and components (s 1" to the basis of orthonormal eigenvectors of A. This topic is covered in most texts in linear algebra.
1.1
The Finite-Dimensional Case
11
To illustrate these ideas we start with a simple example. Let Q(x) - 36 = 5rI - 4t 1t 2 + 8d - 36 = describe a conic (more precisely an ellipse). We desire to show that there is a change of basis so that the above conic has the or si!3 2 + S~/22 = 1. We proceed more proper form 4si + 9s~ - 36 = as follows: A
( 5-2)
= -2
det(A -
8’
AI) = det
(5- -2) = A -2 8 - A
.1)(8 - A) -
(5 -
4
.1 2 - 13.1 + 40 - 4 = .1 2 - 13.1 + 36 = (A - 9)(.1 - 4). For the eigenvalue .1=4 we have Ax = 4x or t1 - 2t 2 = 0, -2t 1 - 4t 2 = 0, which yields any nonzero multiple of Xl = (2, I)T as an eigenvector. Similarly for .1=9 we obtain - 4t 1 - 2t 2 = 0, - 2t 1 - t2 = 0, or X2 = (-1,2) as an eigenvector. or
Setting U to be the matrix whose columns are unit eigenvectors we have UTAU = diag{4,9} since T
_
U AU -
=
(
5-2)8 (2/J"S -1/J"S) 1/J"S 2/J"S
2/J"S -1/J"S)( -1/J"S 2/J"S -2
8/J"S 4/J"S) (2/J"S ( - 8/J"S 18/J"S 1/J"S
-l/J"S)
2/J"S
(20/5 =
0) 45/5
The major idea is that a vector x whose coordinates (t1’ t 2)T relative to the usual standard basis {e1,e2} has a different representation (coordinates) (Sl,S2)T relative to a more natural basis {f1’/2} pictured below. That is, x = t 1i + t 2j = Sd1 + s212. Note that the rows (and columns) of U are orthonormal. In [R2, U is a rotation matrix with rows (cos e, - sin e) and (sin e, cos e), respectively. In this case cos e = 2/J"S, sin e = 1/J"S so that e is approximately 26.565.Thus the original axis are rotated counterclockwise approximately 26.565. Note that Q(x) is now 4si + 9s~. The value of Q(x) has not changed and neither has the physical location of x, only its repre› sentation (see Fig. 2a). Thus the point on the right semimajor axis with components (6/J5, 3/J"S) in the standard basis has components (3,0) relative to the {11’/2} basis. To check these assertions we note that
36 =
=
(~r (~ ~)(~)
(3)T( -1/J"S 2/J"S
1/J"S)( 5 -2)(2/J"S -1/J"S)(3) 2/J"S - 2 8 1/J"S 2/J"S
= (6/J"S) ( 5 -2)(6/J"S)
3/J"S
-2
8
3/J"S•
12
1 Introduction to Quadratic Forms and Differential Equations
\\ \0
<,
"
\(-3,Olf
......
\ I I
›--.
I I J
Fig.2 (a) Ellipse: (s1/3 2) + (S~/22) (sI/3 2) - (S~/(J6)2) = 1 with s = Sdl
= 1 with x = Sdl + sd2; e ~ + sd2; e ~ 26.565.
26.565. (b) Hyperbola:
Of secondary interest, but worth noting, is that if the basis is not changed, the operator U rotates the physical vector x. In this case there isa rotation of approximately 26.565clockwise. To anticipate and illustrate one of our fundamental ideas about the signature (number of negative eigenvalues) of a quadratic form we consider a second quadratic form, Q1(X) = -ti - 4t 1t2 + 2d (see Fig. 2b). The associated matrix is Al = A - 61 or A
=(-1 -2) - 2
2 .
Note that similar calculations yield the same rotation matrix U as above, with eigenvalues six units smaller, i.e., ttl = Al - 6 = 4 - 6 = - 2 and tt2 = A2 - 6 = 9 - 6 = 3 with the same corresponding eigenvectors. Thus AlX 1 = (A - 61)x1 = AX1- 6X1 = 4x1 - 6X1 = -2x1 and similarly A1X2 = 3X2’
1.1 The Finite-Dimensional Case
13
while UTA1U = UT(A - 61)U = UTAU - 6U TU = diag{4,9} - 61 = diag {- 2, 3}. Let us assume that the same physical vector x is on the conic section Ql(X) + 18 = (in this case a hyperbola), so - 2st + 3s~ + 18 = or (Sl/W - (sz/J6f = 1 and
(~y (-~
~)(~)
= -18.
The calculations are computed in the "rotated" setting since they are easier to carry out. To check we note that
Ql(X) =
-d -
4t 1tz +
2t~
=
-(~y
- 4(~)(~)
+ 2(~Y
= ~( - 36 - 72 + 18) = -18. That is (6/J5, 3/J5) in the standard basis or (3, O)I in the {/lJZ} basis is the same point (see Fig. 2a). The essential difference between these two quadratic forms is the,number of negative eigenvalues in the associated matrix. In the first case both eigen› values are positive and we have an ellipse (or a circle if the eigenvalues are equal). In the second case there is one negative eigenvalue and we obtain a hyperbola. The number of negative eigenvalues also equals the dimension of the negative space of Q(x), which we call the signature. Thus the signature of Q(x) on :Yl’ is the dimension of a maximal subspace $’ of :Yl’ such that x oF 0 in % implies Q(x) < O. This number is always well defined, i.e., if %1 and $’z are two maximal subspaces with this property, their dimensions are equal. The calculations are easier to see after our forms have been orthogonally diagonalized. Thus in the first case, it is seen that el = (1,O)T and ez = (O,I)T, relative to the standard basis, are both positive vectors of A = diag{4,9} since eIAel = 4 and e1Aez = 9. Now any nonzero vector y is expressable as U(Cl> CZ)T relative to the standard basis, and hence yTAy = (c., cz)UTAU(cl> CZ)T = 4cr + 9d > 0. The key point is that el and ez are diagonally ortho› gonal, that is, (1, O)D(O, W = if D is any diagonal matrix. Note that bilinear forms have the property that Q(CIX + czy) = c!Q(x) + 2CICZQ(X, y) + dQ(y), so that if Q(x,y) = 0, Q(x) > 0, Q(y) > 0, then Q(clX + czy) > 0 if + d oF 0. In our first case we can also argue that since the eigenvectors of A are A-orthogonal, i.e., (Axl>Xz) = 0, and since (AXl,Xl) = 4(x1>xd > and(Axz,xz) = 9(xz,xz) > 0, thenz = CIX + czyimplies(Az,z) = 4crllxllz + 9dllyl/z. Thus the signature of A or Q is 0. Similarly, in the second case of the hyperbola, the signature of Q1> Al or A, = diag{ -2,3} is 1. This follows since (I,O)T yields (l,0)A 1(1,0)T = -2<0, while (1,0) and (0,1) are AI-orthogonal and (0,I)A 1(1,0)T=3
cr
14
1 Introduction to Quadratic Forms and Differential Equations
yields (Ct>C2)TA1(Cl,C2) = -2et + 3cL which is negative if (Cl,C2) = (1,0). The same argument could be made with Al replacing Al since the corre› sponding eigenvectors of A1 are AI-orthogonal. We now begin our second topic in this section: necessary conditions for stationary values of a function f: [Rn ~ [RI. Of interest is the quadratic form involving the "second derivative" or Hessian matrix. Our remarks will be very heuristic, and are intended to illuminate further topics in this book. Many references contain a precise development; see, for example, Hestenes
[30]. Assume n = 1 and f: [RI
~
[RI has a Taylor series expansion such as
at t = to’ Thenf(t) having a local minimum at t = to, f(t o)::;; f(t) in a neigh› borhood of to, implies (i) f’(to) = 0 and (ii) f"(t o) ;::: O. The first necessary condition (i) follows since otherwise f(to) + f’(to)(t - to) is a nonhorizontal straight line, while f"(t o) < 0 leads to f(t) ~ f(t o) + tf"(to)(t - t o)2 < f(t o). Similarly under the same circumstances, f’(to) = 0 and f"(t o) ::;; 0 are neces› sary conditions that f(t) ::;; f(t o) in a neighborhood of t = to’ If f’(to) = and f"(t o) > 0, then there exists a neighborhood of t = to such that f(t) > f(to) with a similar sufficiency condition holding if f"(t o) < 0. If 17 > 1, the above ideas carry over, but we change notation for conve› nience. Thus if (3)
f(x
+ h) = f(x) + f’(x,h) + tf"(x, h) + ...
is the Taylor series expansion of f: [Rn
f’(x,h) = V’f(x). h =
[Rt, our notation is
~
a;
uti
(x)h;,
and f"(x, h)
= hT Ah = aijhih j.
Furthermore, repeated indices are summed, V’f(x) is the gradient of first partial derivatives evaluated at x, A = (aij) is the symmetric matrix of second partials o2flot;otj evaluated at x, called the Hessian, and f"(x,h) is a qua› dratic form. If f(x) = Q(x) = aijt;tj is quadratic, then f(x + h) = Q(x) + 2Q(x, h) + Q(h) is the Taylor series expansion. Thus Vf=2Ax,
Q(h)
= tf"(x, h),
1.1 The Finite-Dimensional Case
15
and
where indices i and j are summed. As in the case of n = 1, if f(x) is a local minimum at x, Le.,f(x) ~ f(x + h) for small Ihl, then f’(x) = "Vf(x) = 0 and A :2: O. This last symbolism means that hT Ah :2: 0 for all h in [Rn. By our above discussion, A is orthogonally diagonalizable to A = diag{ AI" .. ,An} and Ai :2: O. SimilarIy,f(x + h) ~ f(x) for small ls] implies that f’(x) = Vf(x) = 0 and A ~ 0 (hT Ah ~ 0 for all h in [Rn). Thus Ai ~ O. A point Xo at which Vf(xo) = 0 is called a critical point of f and f(xo) is called the critical value. An appropriate sufficiencytheorem holds at a critical point with strength› ened conditions such as A > 0 (hT Ah > 0 for all h in IRn, h =;6 0). For example, for n = 2, iff(x)satisfies (3), Xo satisfies the first necessary conditionj’(x.) = 0, and
(given above), then f(xo + h) = f(xo)
5-2)8 (hI) h > f(xo), 2
+ ( hhI) ( _ 2 2
since A > 0 because its eigenvalues are positive. Similarly, ifVfl(xo) = 0 and Al = A - 61 (given above) is the Hessian evaluated at xo, then j] has neither a local maximum nor minimum at xo’ More specifically, if fix) = x T AIx, then Vf2 = 2A Ix = 0 implies Xo = 0 (neither eigenvalue of At is zero, hence the null space of Al is {O}). Thus f(xo + h) = f(h) = hT Alh. Using the above ideas where the eigenvalues of Al are -2 and 3, if we move from Xo = 0 in the direction hI = (2/J5, l/J5)T of the first eigenvector, we have f(h l) = - 2 < 0 = f(O), while if we move in the direction h2 = ( -1/J5, 2/J5)T of the second eigenvector, we have f(h 2 ) = 3 > 0 = f(O). Thus Xo = 0 is neither a local maximum nor minimum point but a saddle point of fix). Our third topic in this section is the Rayleigh-Ritz procedure for deter› mining eigenvalues of a real symmetrix matrix A or a quadratic form Q(x). The complete ideas and results are elegantly done by Hestenes [30]. Much of this material carries over to the topic of self-adjoint, compact linear transformations on a Hilbert space. We begin by stating the theorems from Hestenes and then the motivation of the results. In particular, we show that a heuristic proof is easily obtained by orthogonally diagonalizing A or Q(x) and noting that the standard unit vectors are eigenvectors.
16
1 Introduction to Quadratic Forms and Differential Equations
The Rayleigh quotient of A is the function
R(x) = Q(x) = (Ax, x) (x, x) (x, x)
) (4
(x i= 0).
Thus R(x) is a mapping from ~n - {O} to ~1. Note that if a is any nonzero real number, then R(ax) = Q(ax)/l/axI/2 = a2Q(x)/(aZllxI/2) = R(x). Hence R is a homogeneousfunction of degree zero. Thus its value is unchanged on any ray through the origin and R(x) = R(y) where y = x/llxll and y lies on the unit circle C = {xIllxl/ = 1} in IRn C is compact in ~n, since it is a closed and bounded subset, and R(x) is continuous on C. Thus there exists unit vectors Xl and x, in C such that R(Xl) :::;; R(x) :::;; R(xn) for any x in C or X i= 0 in ~n. In the next three theorems we show that if ..1.1 :::;; Az :::;; ... :::;; An are the n eigenvalues of A, then Xl and Xn are eigenvectors corresponding to ..1.1 and An’ respectively. Furthermore the kth eigenvalue is the solution to a constrained problem of optimizing R(x), and the kth eigenvector gives the solution of this problem. Theorem 2 If A is a symmetric matrix, there exists an orthonormal set of eigenvectors {Xl’X 2, ... , x n} of A such that the corresponding eigenvalues {Al,Az,’ " ,An} satisfy Ak = R(Xk) and ..1.1:::;; Az :::;; ... :::;; An’ The vector Xl minimizes R(x) on IR n - {O} and the vector Xn maximizes R(x) on IRn - {O}. For each k such that 1 < k < n, the vector Xk minimizesR(x) on the set
I
86 k = {x E IR n (x, Xi) = 0; i
=
1,... ,k - I} - {O}
=
span{x k ,
,xn }
-
{O}.
Similarly x k maximizes R(x) on the set
0; i = k + 1,... ,n} - {O}
span Ix., ... ,xk} - {O}. Finally if f?,&k denotes a "subspace" of dimension k with x = 0 removed and f?,&k denotes the set of all such "subspaces" then Ak satisfies the (min-max or max-min) principle rri k = {x E
(5)
~nl(X,Xi)
=
Ak = I1!.in[max R(X)] = _max [ ~k
x in
§)k
~n
-
k
+1
=
x in
min §}11- k
+1
R(X)].
Equality (5) seems especially formidable. We note that f!J k is one of the sets f?,&n - k+ 1. Hence by the earlier part of the theorem, min xin~"-k+l
R(x):::;; min R(x) = Ab xin9lk
but since Ak is obtained, we may maximize both sides to obtain the max-min equality in (5). The min-max equality follows similarly or (as Hestenes suggests) by applying this argument to the eigenvalues of - A, which are -An:::;; -An-I:::;;"’:::;; -..1.1’
1.1
The Finite-Dimensional Case
17
Since our purpose is primarily to shed light, we remark that Theorem 2 (heuristically) follows easily by geometrical considerations if we have diago› nalizedA. That is, UTAU = A = diag{Ab’" ,An},where we assume (without ::; An’ The standard unit vectors loss of generality) that A1 ::; A2 ::; e1" .. .e., where e1 = (1,0,0, ... ,O)T, etc., are eigenvectors of A and Q(ek) = Ak. If x = (cb ... ,Cn)T is such that IIxl12 = CiCi = 1, then R(x) = AiCiCi has the smallest value when C 1 = 1 and Ck = (k = 2,... ,n). To see this we note that R(x) is a convex combination of the {A;} and 5 = {(A1 db’ .. ,Andn) Id, ~ 0, 1: di = I} is the "face" or the intersection of the n - 1 hyperplane determined by {A;} and the "positive octant" in n space. In the above cf = d, (i = 1, ... ,n). In fact, R(x) is the sum of the coordinates on 5. In linear programming terminology we wish to minimize f(t) = t 1 + l z + ... + tn, where t = t2 , ,tn)Tis in 5. Since 5 is a convex set [x and y in 5 implies sx + (1 - s)y in 5 for any s ::; 1] and f is linear, the minimum value of f on 5 exists and this value occurs at "corner" points of 5. An immediate calculation gives the desired result since these corner or extreme points have at most one nonzero value. Figure 3 illustrates these ideas with n = 3. We assume Ai> since our geometric ideas hold under the translation Si = IAnl + t i . In this case, min, R(x) = A1 as stated. 51 is the line segment connecting the points (0, A2 , 0) and (0,0, A3 ) , or equivalently, it is the subset of 5 with d 1 = 0. The respective "dimensions" of 5 and 51 are n - 1 = 2 and n - 2 = 1. Note that A2 = mins, R(x). In the general case similar argument holds for Ak (2 ::; k ::; n) by constructing a collection 5 = 50’ 5 b 52’ 53’ ... , S; -1, where each S, + 1 is the "positive" edge or face of 5 i .
«;
:;
Fig.3 Face 8 = {(d,A, + dzAz + d3A3)ld, ~ 0, d, + dz + d3 = I}; line segment 8, = {O, dzAz + d3A3)!dz ~ 0, s, ~ 0, d z + d 3 = I}. 8, and P’ give, respectively, the smallest and largest values of R(x) of the set 8 n D k
.R
..
18
1 Introduction to Quadratic Forms and Differential Equations
The min-max theorems follow by picture and our comments after Theo› rem 2 once again. Pick !!2 k in !!2 k Since R(x) is homogeneous of degree zero, we may search for optimal values of the sum of the component functions on S. The value max R(x), x in !!2\ is taken on at the intersection point P of an edge of Sand !!2 k since extremal values of a linear function cannot occur in the interior of Sl’ By the above argument, if P is not a corner or extreme point it is not optimal. The minimum of such values occurs when !!2 k = span{el"" ,e k } . Finally, the problems of finding optimal values of A and A are equivalent since U is one to one and preserves lengths. For example el = (1,0, ... ,O)T satisfies with y = U x, 1
A1
=
(
)
Ae1 e 1
,
(Ax,x)
(UAx, Ux) (AUx, Ux) = max -’------’x*o (x,x) x*o (x,x)
= max - - = max x*o (x, x)
(Ay,y)
(Ay, y)
= max T T = max -y*O (U y, U y) y*O (y, y)
since UT AU = A and U preserves distances. Thus Theorem 2 is a "rotation," or more correctly an isometry (distance preserving mapping), of an apparent geometrical picture. The remainder of the third topic of this section might have been placed before Theorem 2. We have included this material to show interesting calculations and concepts for quadratic forms. The following theorem is stated in Hestenes [30]’ Both the results and the computations involved in obtaining these results are of interest.
Theorem 3 The vector x is an eigenvectorof A if and only if it is a critical point of R(x). The eigenvalues of A are the corresponding critical values. Let Xo be a critical point of R(x) with A. = R(xo) = Q(xo}/JlxoW = (Axo, xo)/(xo,xo) the critical value. Then for any vector y and e> 0 and small we have 1 [Q(xo + ey) Q(xo)] -;; Ilxo + eyl12 -llx ol1 2
_ 1 IlxoI12[Q(xO) + 2eQ(xo, y) + Q(y)] - Q(xo)[llx oI1 2 + 2e(xo, y) + Ily112] - e Ilxo + eyl1211xol12
= 21IxoWQ(xo, y)- Q(xo)(xo,Y)+ I>llxoI12Q(y) - Q(xo)llyI12 Ilxo + l>yll211 xol1 Taking the limit as I>
-4
2
2
Ilxo + eyl1211 xoll
0 and noting that Q(xo) = Allxol12 we have
lim(l/e)[R(xo + sy) - R(xo)] = (2/llxoI12)[Q(xo, y) - A(Xo,y)]. 8->0
1.1
The Finite-Dimensional Case
19
This limit is zero for all y if and only if Q(xo, y) - (xo, y) = 0 or (Axo › AXo,y) = 0 for all y, i.e., Axo = Axo. Thus our result follows. Some comments are in order. From (3) we have with R replacing f and ey replacing h, R(x + ey) = R(x) + R’(x,ey) + tR"(x, sy) + .. ’. R’ is linear in its second argument and equal to VR . (sy) so that subtracting R(x) from both sides, dividing bye, and letting e --+ 0 we have R’(xo, y) = (2/1IxoI12)[Q(xo, y) - A(xo, y)]. Thus the gradient of R is VR(xo) = (21 IIxo1l2)[ Axo - R(xo)xo]. Locally at x = xo, R(x + ey) - R(x) is linear in y if Xo is not an eigenvector of A. IfXo is an eigenvector of A, then this expression is locally quadratic in y. It is illustrative to use elementary calculus to obtain the (first and) second directional derivatives of the Rayleigh quotient R(x) and the Taylor series expansion as in (3). This will enable us to derive independently a "stronger" result than in Theorems 2 and 3. The critical point Xk is an (I, m) saddle point of R(x) if there exists subspaces S I and S2 of IRn of dimension I and m, re› spectively, such that YI ¥ 0 in Sl and Y2 ¥ 0 in S2 imply that there exists b > 0 such that lei < b implies R(Xk+ eYI) < R(Xk) < R(Xk+ eY2). The above means that locally we may decrease the critical value Ak = R(Xk)by "moving" from x = Xk in an I-dimensional direction and increase R(Xk) by "moving" from x = X k in an m-dimensional direction. In Theorem 4 we show that we may choose Sl = spanjx,, ... ,XI} and S2 = span{xn-m+b’" .x.}. To continue Theorem 3 we have
Theorem 4 Let Al :s; A2 :s; ... < An be the eigenvalues of A with corres› ponding eigenvectors Xl’ X2’... .x;; respectively. Iffor some k (1 < k < n) we have Ai < Ak < An’then the critical point Xk is a saddle point of the Rayleigh quotient (neither a local maximum or minimum).More precisely, if AI < Ak < An- m + I ’ then Xk is an (I, m) saddle point. Finally Al and An are, respectively, the absolute minimumand maximum of R(x) on IR n - {OJ. Let h(e)=Q(x+ey), g(e)=llx+eYI12, and R(x+ey)=f(e)=h(B)lg(e). Now fee) = f(O) + e1’(O) + te 21"(0) + "’,where j’{s) = [g(e)h’(e) - h(e)g’(e)]lg2(e) and
1"(e)
= g2(e)[g’(e)h’(B) + g(e)h"(e) - h’(e)g’(e)- h(e)g"(e)] - { } g4(e)
We have not bothered to determine { } since it is zero when e = 0 (a critical point). Thus 1’(0) = [g(O)h’(O) - h(0)g’(0)]lg2(0) and 1"(0) = [g(O)h"(O) › h(0)g"(0)]lg 2(0). The first and second derivatives are found by the Taylor series expansion. Thus h(e) = Q(x + sy) = Q(x) + 2eQ(x, y) + e2Q(y) = h(O) + eh’(O) + te 2h"(0) so that h’(O) = 2Q(x,y) and h"(O) = 2Q(y); similarly (or replacing A by 1) we have g’(O) = 2(x, y) and g"(O) = 211x1I2. Thus the critical points of R(x) are
20
1 Introductioo to Quadratic Forms and Differential Equations
when 1’(0) = O. Letting x = Xo be a critical point with critical value Ao = R(xo) we obtain 0= g(O)h’(O) - h(O)g’(O) = 2I1xoUZQ(xo,Y)- Q(xo)(xo,Y) = 21!xolj2[Q(xo,Y) - Ao(xo,Y)]’Since Y is arbitrary, we obtain as in Theorem 3 that 1’(0) = or equivalently Axo = AOXO if and only if (Ao, xo) is a critical solution of the Rayleigh quotient. At a critical solution we have j(e) = j(O) + te 2f"(0) + ... so that (to second order) 2
R( Xo + ey)
~ R( ) + .1eZ (lIx oI1 (2Q (Y)) - Q(x o)(21! Y1I = Xo 2 IIx oll4
2 ))
.
Thus
R(xo + ey) ~ R(xo) + eZ(Q(y) - AollyllZ)/llxollz = R(xo) + eZ«A - Ao)y,y)/llxoI12 If(Ak> Xk) is an eigensolution of A with Al < Ak < An’ then Q1(Y) = Q(y) › Ak(y,y) satisfies Q1(X1) = Q(X1) - Akllxl112 = R(x 1)fjlx1 1Iz - Akll x l11 2 = (AI - Ak)lIx1l1Z < 0, while Q1(X1) = Q(xn) - Akllxnllz = (An - Ak)llxnl!z > 0. Hence for s small, R(xo + ex1) < R(xo) < R(xo + exn). This establishes the first assertion of the theorem. The next to last sentence of the theorem about (/, m) saddle points follows by direct computation and the fact that Q(xp , xq ) = if p =F q. Thus if y
= L~=
1
amxm, then I
Q1(Y) = Q(y) - Ak(Y’Y)=
L [a;’Q(x
m) -
A~;'llxmI12]
m=l I
= L a;’(Am- Ak)lIxmllZ <
0.
m=l
The last statement follows from advanced calculus ideas. The minimum value of R(x) on the unit disk C = {x IlIxll = 1} in ~n is obtained since C is compact and R(x) is continuous. The unit eigenvectors are the only critical points on C, therefore Al is the minimum value of R(x) on C. Similarly An is the maxi› mum value of R(x) on C. The last sentence now follows since R(x) is homo› geneous of degree zero. As an example let A = diag] -1, 1,2,2} with eigenvalues Al = -1, A2 = 1, ..1. 3 = 2, and A4 = 2. Let {e.} be the associated standard eigenvectors where e1 = (l,O,O,Ol, etc. Then e2 is a (1,2) saddle point since Al < Az < A3 and n - k + 1 = 4 - 2 + 1 = 3. If 8 1 = span le.} and 8 z = span{e3,e4}
1.1
The Finite-Dimensional Case
21
then Y1 # 0 in S 1 and Yz# 0 in Sz implies R(ez +e(aed)~ R(ez)+eza zR(e1)= z Z+c Z)2 l_eza and R(ez+e(be 1+ eez))=R(ez)+e(b so that R(ez+e(ae l ))< 1 = R(ez) < R(ez + etbe, + eez)). If the reader believes our example with a diagonal matrix is too special, since U T AU = A or A = UA U\ the reader may make up his own matrix A with "diagonal form" A. That is, for any orthogonal matrix U, form A = UAUT and Xk = Uei, the kth column of U. The purpose of our final topic in this section is to give an alternative definition of eigenvalues of a real symmetric matrix A or quadratic form Q(x) = (Ax, x). For many problems this definition is more practical than the usual definition of Ax = AX. Thus for numerical halving problems we think of a zero of a continuous real-value function f(t) as a value such that the product f(to+)f(to-) is negative. This is not an equivalent definition, but in the case of real symmetric matrices the definitions are equivalent. It also contains the Rayleigh quotient ideas but is easier to apply. Finally this definition involves the signature idea contained above. Let A be a real symmetric matrix or Q(x) = (Ax, x) be the associated quadratic form. Let seA) denote the signature of the quadratic form J(x; A) = Q(x) - Allxllz. That is, seA) is the dimension of a maximal subspace f(j of [Rn such that x # 0 in f(j implies J(x; A) < O. Note that AI < Az implies that
J(x;Az) - J(X;A1) = Q(x) - Az/lxllz - (Q(x) - AlllxllZ) = (AI - Az)llxllZ < 0 if x # O. Now J(x; Az)::;; J(x; AI) so that x # 0, J(x; AI) < 0 implies J(x; Az) < O. Thus sCAd ? S(-1’l)’ i.e., seA) is a nondecreasing, nonnegative, integer-valued func› tion. Assume as above that Al ::;; Az S ... S An and Xl> Xz, . . . , x, are, res› pectively, the n eigenvalues and eigenvectors of A or Q(x) = (Ax, x). If A* < Al = minR(x) for x # 0, then Q(x)/llxllz > Al or J(X;A*) = Q(x)› A*llxllz > (AI - A*)llxIIZ > O. That is, J(x; A*) is positive definite and seA*) = O. Similarly if A> An = maxR(x) for x # 0, then Q(x)/llxllz < An or J(X;A) = Q(x) - Allxllz < (An - A)IIxIIZ < O. Thus J(x;X) is negative definite and s(A) = n. The reader may verify as an exercise that the intermediate eigen› values between Al and An behave as we expect. Thus Theorem 5 seA) is a nonnegative, nondeereasing, integer-valued function of A. It is continuous from the left, i.e., s(Ao - 0) = s(Ao), and its jump at A = A.o is equal to the number ofeigenvalues equal to ..1.0’ i.e.,seA + 0) = s(Ao) + n(Ao), where n(Ao) is the number of eigenvalues equal to )’0’ Finally s(Ao) = I.l.<.<.o
n().).
In future work we shall refer to n(Ao) as the nullity of J(x; Ao). In this case it is the dimension of the null space JlI(Ao) = {x E [Rn/J(x, Y; Ao) = 0 for all Y in [Rn}. Note that %(A.) # {OJ for exactly n values of A, counting multiplicity. Thus this space is the span of the eigenvectors corresponding to
22
1 Introduction to Quadratic Forms and Differential Equations
eigenvalues that are equal to Asince x in JV(A) implies 0 = (Ax, y) - Ao(X, y) = «A - Ao1)x, y) for all yin IRn. For future work we note that m(A) = S(A) + n(A) is the dimension of a maximal subspace E of IRn for which J(x; A) sOon E. This nondecreasing integer-valued function is continuous from the right with m(Ao) - m(Ao - 0) = n(Ao). We also have an interesting comparison result. Note for x =f 0, J(X;A) = Q(x) - Allxll2 < implies Q(x) < Allxllz. Thus S(A) gives the dimension of the subspace for which Q(x) is less than Allxll2. This concept can be generalized if we replace IIxll2 by K(x) = (Bx, x), where B is symmetric. As an example to illustrate signature, if A = U AUT, where A = diag{ -1, -1,0,1.5, e, n, n,n}, then the graph of S(A) is given in Fig. 4. IfA < -1, then J(x; A)is positive definite. s(),.)
8 n(1T)=3
5 4 3 f--------O ----92
---_0
n(-I )=2
-I
1.5
e
IT
Fig.4 s(J.) versus J..
We may also think of eigenvalues as Lagrange multipliers. It is of interest to show that the eigenvalues can be determined solely by a process that maximizes quadratic forms over the unit sphere in IR n and not on proper subsets of IRn such as constraints L(x) = (x, Xi) = 0 as in the previous work. We assume that Ai S Az:s "’:S An are the eigenvalues of A or Q(x) with corresponding orthonormal vectors Xi’ x~, . . . ,xn , respectively. By trans› lation we may assume that Ai > 0, since A + (IAnl + 1)1 has the same eigen› vectors as A with eigenvalues translated by IAnl + 1. This assumption is not required in our proof by construction, but avoids some technical difficulties. We know that An = max Q(x) = Q(x n), where the maximum is taken over C = nlxll = 1}, the unit ball in IRn. Notice that K(x, y) = (x, xn)(y, x n) is a bilinear form in x and y and K(x) = K(x, x) = (x,xnf is quadratic. For practice we have K(x
+ sy) = =
=
(x + sy, Xn)Z = [(x, Xn) + e(y, Xn)]2 (x, Xn)Z + 2e(x, Xn)(y, Yn) + e2(y, Xn)Z K(x) + 2eK(x, y) + eZK(y).
1.1
The Finite-Dimensional Case
23
Let Ql(X) = Q(x) - An(X,xn)z. The eigenvalues of Ql(X) are At = 0 :s; Al s Az:S;"’::S; An- l with corresponding eigenvectors Xn,X1,Xz, ... ’Xn-l> since for i and j fixed we have Ql(Xj,X)=Q(X;,X)-Aix;,xn)(Xj,Xn)=(AiXi,Xj)› An(X;,Xn)(Xj’x n) = A;bij - Anbinbjn’This expression is zero ifi oF j orifi = j = n. If i = j oF n, then this expression is Ai’If A were diagonal, i.e., A = diag{.11’ Az,. . . ,An},then Q 1 would correspond to the real symmetric matrix B = diag{Al’ AZ,’.. ,An-l> O}. Now An-l = maxQl(x) = maxQ(x) - An(X,Xn)Z = Ql(Xn- 1 ) , where the maximum is over C and not some subset of C. Continuing in this way, if Qz(x) = Q(x) - An- 1(X, Xn_l)Z - An(X,xn)Z, then A3 = max Qz(x) = QZ(Xn- 2), where the maximum is over C. Finally note that we may decompose Q(x) into its finite "Fourier series" n
Q(X,y) =
L )•iX,Xk)(y,XJ. k= 1
Clearly at each step we have invoked a Lagrange-multiplier-type rule. This result is stronger than the min-max theory in that we maximize over all of C and do not restrict ourselves to certain subspaces. Our final effort is to show that the known eigensolution results follow from a Lagrange multiplier rule or Kuhn-Tucker theorem. Hestenes [30, p. 221] defines Xo to be a regular point of a set S in IR n if every solution h of the tangential constraints (see notation below) g’p(xo,h) :s; 0 for f3 = p + 1,... ,m, and g~(xo, h) :s; 0 for all a such that a is an active index; that is, gixo) = 0 is tangent to S at Xo’In our case this is equivalent to the fact that the normal space of Sat Xo is generated by the gradients Vg1(xo), . . . , Vgn(xo). The following theorem is given in Hestenes [30, p. 186].
Theorem 6 Suppose that the constraints
Xo
minimizesf(x) locally on the set S defined by
gix) ::s; 0 (a = 1,... ,p),
gp(X) = 0 (f3 = p
+ 1,... ,m).
If Xo is a regular point of S, then there exist multipliers AI,. . . ,Am such that A",
~
0 (a = 1,... ,p)
with
Ay = 0
if gy(xo) < 0
and such that (i) VF(xo) = 0 where F = f + AlgI + ... (ii) F(x) ::s; f(x) on S, and (iii) F(xo) = f(xo).
+ Amgm. In
addition,
If gl(X) = -1 + (X, x) and f(x) = -Q(x), the minimum of f(x) on the set Sl = {Xlgl(X):S; O} oF {O} is obtained at a point Xl on the boundary of Sl’ We have demonstrated this above. For Sl is compact in IR n, f(x) is continuous, and hence x I in S I satisfies f(xo) ::s; f(x) for all x in Sl’ Since f(ax 1 ) = a 2f(xl) > f(xd, Xl has norm one. Since the function gl(X) + 1 is
24
1 Introduction to Qnadratic Forms and Differential Equations
quadratic, g~(x!> h) = 2(x!> h) = 0 implies h is perpendicular to Xl> and hence Xl is a regular point of 8 1, Thus there exists Al such that F(x) = - Q(x) + A1[-1 + (x,x)J satisfies (i) 0 = VF(x l ) = -2Axl + 2A1(X1>xdxl, (ii) -Q(x) + A1[1 -llx ll1 2J $,; -Q(x) on 8 1 , and (iii) F(x o) = f(xo). The second and third conditions, which imply Al( -1 + Ilxln $,; 0 on 8 1 and A1(-1 + Ilx112) = 0, are "meaningless" since Al is known to be non› negative. We note that there is another argument that Al > 0 and hence IIxll1 2 = 1; otherwise c, = 0 must hold by Theorem 6. IfAl = 0, then ztx. = 0, 0= -(Ax1,Xl) $,; -(Ax,x) $,; 0 for X in 8 1 , and A> 0 implies Sl = {O}. Finally 0 = VF(xd gives the eigenvalue result AXI = AlXl’ Since f(x l ) = -Q(x 1) = -(AXl,X1)= -Al(X 1,Xl) = -AI’ -AI is the smallest value of -Q(x) on Sl, or equivalently, Al is the largest value of Q(x) on Sl’ In summary, if 8 1 ¥- {O} (n ~ 1), then there exists a positive eigenvalue Al of A and a normalized eigenvector Xl such that (Ax l , X 1 ) = Al = max(Ax,x) for X in Sl’ Let gz(x) = (x,xIf and 8 2 = {xlgl(x) $,; 0, g2(X) $,; O} ¥- {O}. 8 2 is closed and bounded, since gl(X) and g2(X) are continuous functions so that the minimum value of f(x) exists on 8 2 , say at a point X = X2’ By assumption g2(X2) $,; 0 so that X2 ¥- Xl’ By homogeneity IIx211 = 1 as before. Once again X2 is a regular point of S2 since g’l(X2,h)= 2(x 2,h) and g’z(x, y) = 2(X,X1)(Y’X l) or g’z(x2,h)= 2(X2,X1)(x1,h) implies that every solution to g’l(X2,h)= g’z(x2,h) = 0 is in the tangent plane at X = x 2. Thus there exist multipliers Xl ~ 0 and A2 ~ 0 such that F(x) = - Q(x) + A’l[ -1 + (x, x)] + A’z(X,X1)2 satisfies
(i) 0 = VF(x 2) = -2Ax2 + 2X1X2 + 2A’z(X2,X1)X1, X1[ -1 + (x, x)] + A’z(X1,X)2 $,; 0 on 8 2, and A~[ -1 + (X2,X2)] + ).’z(Xl,X2)2 = O.
(ii) (iii)
Lemma 7 If S2 -# {O} (n ~ 2), then there exists a multiplier Xl such that AX2= X1X2, IIx211 = 1,0 <),,~ $,; A1> and Xl = Q(Xl) = max Q(x) for X in 8 2, We now prove this result directly from Theorem 6 without reference to homogeneity of Q(x). Regularity of X2 in 8 2 is not a problem. If IIx211 < 1. then x 2 is an interior point of 8 2; otherwise if IIx211 = 1, every solution of the constrained derivative equations is a tangent vector of S2 at X2’ We begin by noting that (X!>X2) = 0 implies from (iii) that A~[ -1 + IIx2WJ = If A’l = 0, then from (i) we have AX2 = 0 and 0 = - (AX2,x 2) = f(x 2) $,; f(x) = -(Ax,x) on S2 or (Ax, x) $,; 0 on S2, which implies S2 = {O}. Thus Xl > 0 and IIx2112 = 1.
o.
1.2 The Calculus of Variations
25
Since Xl > 0 and IIx211 = 1, we have by (i) again that Axz = A~xz, A’l = X1(XZ, XZ) = (Axz,xz) and -A’l = f(xz)::; f(x) = -(Ax,x) for x in Sz, and hence A’l = max(Ax,x) for x in S zContinuation of the above arguments with Az = Xl above leads to Theorem 8 If m s; n, there exists a sequence of positive multipliers ::2: Am and corresponding orthonormal vectors Xl’ xz, ... ’Xm such that AXk= Axkand Ak = Q(Xk)’If
Al ::2: Az ::2:
gl(X) = [ -1 + IIxIIZ], gz(x) = (x,xl)Z, .. . , gm(x) = (x,xm-If and Sm = {x E ~nlgk(X)::; 0; k = 1,2, ... , m}, then Am and X mare the solution to the problem ofminimizing f(x) = -Q(x) on Sm’
1.2
The Calculus of Variations
The purpose of this section is to give the main ideas for the fixed point problem in the calculus of variations. This topic has been covered so well by so many fine authors that it is sufficient to give a brief summary. Excellent references are Bliss [3], Gelfand and Fomin [10], Hestenes [29], Morse [40], and Sagan [46]. Much of the material of this section may be found in Hestenes [29, pp. 57-72]. We are especially interested in second-order conditions or when the original functional is a quadratic form. Of great importance to us is that even when the original problem is not quadratic, its second variation is a quadratic form and must satisfy a stationary condition. This second-order necessary condition is called the Jacobi condition. Strengthened Jacobi conditions also playa large role in sufficiency conditions [29]. Let [a, b] be a fixed interval and let ~ denote the class of all continuous arcs x(t) defined on [a, b] having piecewise continuous derivatives on [a, b]. Hestenes considers x(t) = (x 1(t), XZ(t), . . . ,xn(t)), but we assume that n = 1 for simplicity of exposition; Let ~o denote the linear space of all functions x in ~ such that x(a) = x(b) = 0, f!Jt be a region in ~3, and .91 be the class of x in ~ such that (t, x(t), x(t)) are in f!Jt. Let f: ~3 ~ ~I be C Z on fJJl and let
(1)
I(x)
=
Lb f(t,x(t),x(t))dt.
The basic problem is to find necessary conditions on an arc Xo in .91, where X o minimizes I(x) in the class fJ o = {x in dlx(a) = xo(a) and x(b) = xo(b)}. The major (necessary) results are stated in the following theorem and corol› laries. We like and shall follow Hestenes’s style of stating this theorem and several corollaries in a concise manner. We shall give comments and examples
26
1 Introduction to Quadratic Forms and Differential Equations
that we hope will make the basic ideas clear. We use x(t) as the derivative of x(t) in this section. Theorem 1 Suppose that constant c such that
Xo
minimizes I(x) in f!4 0 Then there exists a
(2)
holds along xo(t). Furthermore the inequality (3)
E(t, x, X, u) = f(t, x, u) - f(t, x, x) - (u - x)fx(t, x, x) ;:::
holds for all (r, x, x, u) such that (t, x, x) is on xo(t) and (t, x, u) is in f!ll. Finally there exists a constant d such that xo(t) satisfies
.
(4)
f - xfx =
Jar it ds + d. t
Equation (2) is the integral form of the Euler-Lagrange equation, in› equality (3) is called the Weierstrass condition, and E is the Weierstrass excess function. The statement "holds along xo(t)" means that (2a)
while the statements "on xo(t)" and "xo(t) satisfies" are similarly interpreted. If(2) can be differentiated at a point to, we have xo(t) satisfying (2’)
d
dtfx - I,
in a neighborhood of to’
=0
It is not necessary for
x~(to) to exist for (2’)to hold at to. If xo(t) has a con› tinuous second derivative on a subinterval, the identity
:t (f - xix) -
it + x(:t fx -
fx) = 0,
which can be verified by differentiation, yields an alternate form of the Euler-Lagrange equation and hence (4), which is more useful for some problems. At a corner point t of x(t) there are two elements, (t, x(t), x(t - 0)) and (t, x(t), x(t + 0)). We have as a corollary to Theorem 1, Theorem 2 If xo(t) satisfies Theorem 1, the functions w 1(t) = f(t) › x(t)fx(t) and wz(t) = fit) are continuous along xo(t). At a corner point to of xo(t) we have E(t o, xo(t), xo(to - 0), xo(to
+ 0)) = E(t, xo(t), xo(to + 0), xo(to -
0)
At each element (t, x, x) of an arc xo(t) that satisfies (3) we have fxx ~
= O. o.
1.2 The Calculus of Variations
27
The result of the first sentence is called the Weierstrass-Erdmann corner condition. The condition fxx ~ 0 is called the condition of Legendre. For maximum problems the same results hold except that E ~ 0 in (3) and fix ~ O. If x(t) = (x’(r),. . . ,xn(t» with n > 1, the immediate corresponding results hold with (fXiXJ), a nonnegative matrix at each element. A solution xo(t) of (2) is called an extremaloid, or an extremal if x(t) is continuous. An admissible arc is nonsinqular if fix i= 0 on xo(t).We state one further corollary on the smoothness of solutions of (2). Theorem 3 Iff is of class c» (m ~ 2) on [J£ and xo(t) is a nonsinqular extremal, then xo(t) is of class c(m). More generally, a nonsinqular minimizing arc xo(t) is of class c(m) between corners. We now derive the first and second variations of I(x) along an arc x. This will parallel the development of directional derivatives of I at X o in the direction y of Section 1.1. Assume for b > 0 the function (5)
F(8) = I(x
+ 8Y) =
f f(t, x(t) + 8y(t),x(t) + 8y(t»dt
is defined for 181 < b, x(t) + 8y(t) is admissible for 161 < b, and F(6) in C2 for 161 < b. The derivative F(O) is the first variation of I at x and F"(O) is the second variation of I at x. A straightforward calculation
lib lib + -lib lib
-1 [F(6) - F(O)] = 6
6
= -
6
a
a
6
~
-
6
a
[f(t, x + 8y, X + 6Y) - f(t, x, x)] dt
+ 6Y, X + 8Y) -
[f(t, x
a
[f(t, x
fit, x
+ 8Y, x) -
f(t, x
+ 6Y, x)] dt
f(t,x,x)] dt
lib
+ sy, x)[6y(t)] dt + -6
a
fAt, x, x)[6X(t)] dt
yields, by letting 6 go to zero, that (6)
F’(O) =
fab (fxY + J;,y) dt.
The "~" signs take the place of a mean-value theorem and the usual argu› ments hold. Similarly, taking the limit as 6 goes to zero of [F’(8) - F’(O)]j6, we have (7)
28
1 Introduction to Quadratic Forms and Differential Equations
Note that F’(O) = I’(x,y) is a linear function of yet), and F"(O) = I"(x, y) is quadratic in yet). Furthermore, there is a Taylor series expansion I(x + BY) = I(x) + BI’(X,y) + !B2I"(x, y) + ... (8) of I at x in the direction y. The various derivative functions F x, F y , F xx, etc., are evaluated at (t, x(t), x(t)). Finally, I"(x, z, y) is the bilinear form such that I"(x, y, y) = I"(x, y). These ideas are also found in Hestenes [29, pp. 68-69]. Intuitively they follow as in Section 1.1. The last statement of Theorem 4 follows from integration by parts. Theorem 4 If an admissible arc xo(t) minimizesI(x) in the class PA o , then I’(xo, y) = 0 and I’(xo, y) ~ 0 for all arcs y in The relationship I’(xo, y) = 0 holds on if and only if (2) holds.
eo.
eo
To illustrate these theorems we consider an example. Let f(t, x, x) = (1 + X2(t))1 /2. Then xo(t) is the minimal solution to (1) if xo(t) = at + b since I(x) is the arc length from t = a to t = b. If a = 1, b = 2, x(l) = 1, x(2) = 3, then the line segment is xo(t) = 2t - 1 and I(xo) = Returning to Theorem 1 we have from (2), fx(xo) == 0 and
J5.
fx(xo) =
1
2xo(t)
2 [1 + xMt)J1/2
so that (2) implies fx(xo) = c, xo(t) = d = [x(2) - x(I)]/(2 - 1) = 2 by the mean-value theorem, and finally xo(t) = 2t - 1. The same result is obtained from (4) since d = f - xfx = (1 + X2)1 /2 - x 2/ (1 + X)1 /2 = 1/(1 + X2)1 /2 im› plies as before x~(t) = 2. For the Weierstrass E function we have from (3) with xo(t) = 2t - 1, g(u) = .J1+22 - (u - 2)21J"S. We claim this is nonnegative for all u. Since g(u) = ul(1 + U 2)1 /2 - 21J"S and
JT+U2 -
g"(u) = [(1
+ U2)1/2 -
u((l
+ :2)1 /2)
J/(1 + u2)
= 1 : u2 > 0,
we have only one local minimum (a global minimum) and no local maxi› mums. Since g(2) = 0, g(u) ~ g(2) = O. Note also that
x 1/2 )x = 1 +1 x2 = 1 +1 22> 0 ( + x(t)]
fxx = [1
along xtlt), so that the condition of Legendre is satisfied. For a second example let fl(x) = x 2(t) - x 2(t). Note that the associated integral is quadratic so that if
Il(x) = Q(x) =
f [x 2(t) - x 2(t)Jdt,
1.2 The Calculus of Variations
29
we have the Taylor series expansion F(e) = Q(x + ey) = Q(x) + 2eQ(x, y) + Q(y), where Q(x, y) is the associated bilinear form Q(x, y) =
f [x(t)y(t) -
x(t)y(t)] dt,
F’(O) = I’(x, y) = 2Q(x, y), and F"(O) = I"(x, y) = 2Q(y). The Euler-Lagrange equation in derivative form is d dt f x - fx = (2x)
+ 2x = 2[x + x] = 0
with solutions x(t) = A sin t + B cos t. Condition (3) becomes E = (u - xf so that the Weierstrass condition is satisfied. Let a = O. We shall show below that if 0 < b :::; 1C, then there exists a minimizing solution to 11(X) with minimum value 11(0) = O. Ifb > 1C, there is no minimizing solution. An immediate calculation shows that if 1C < C < 21C and c < b, Xl(t) = sint on [0,c/2], x 1(t) = sin(c - t) on [c/2,c], and Xl(t) == 0 on [c,b], then 11(X1(t)) = sine < 0 = 11(XO) if xo(t) == 0 on [O,b]. Thus the above conditions are not sufficient to insure a minimum. We need a second› order condition called the Jacobi condition. The Jacobi condition comes into play in two ways. In a general problem where lex) is not quadratic, we still need to satisfy the condition that the second variation satisfies I"(xo, y) ~ O. Since I"(xo,O) = 0 as I"(x, y) is qua› = y(b) = O}, this dratic in y and y == 0 is in the class ~o = {y in ~Iy(a) second-order condition of Theorem 4 requires that we find the minimum of an accessory problem, which is that yet) == yields the minimum value of I"(xo, y) for y in ~o. Ifthe original f was quadratic, the integral Q(x) = lex) satisfies a Taylor series expansion Q(x o + ey) = Q(x o) + eQ(xo, y) + e2Q(y) = Q(xo) + e2Q(y) since Q(xo, y) = 0 along extremal solutions if y is in ~o. Thus we once again come to the question of whether Q( y) ~ 0 for y in ~o. A proper development of these topics usually include deeper concepts such as field theory [29]. However, we shall avoid these concepts by our use of signature theory in Chapter 3 and in later chapters. Thus we finish this section by stating the main definitions and theorems for the accessory minimum problem. We have seen that I"(xo, y) ~ 0 on ~o for a minimizing arc xo(t) of lex). This is equivalent to the fact y == 0 minimizes I"(xo,Y) on ~o. The problem of minimizing I"(xo, y) on ~o will be called the accessory minimum problem. Following Hestenes we assume that the minimizing arcxo(t) has no corners and is nonsinqular, that is /xx =1= 0 evaluated at all points (t, xo(t),xo(t)) for t in [a, b]. Furthermore, we now assume n ~ 1 so that x(t) = (x 1(t), x 2(t), ... ,xn(t)), where (t, x(t), x(t)) is a point in 1R 2 n + 1 We denote I"(xo,y)
30
1 Introduction to Quadratic Forms and Differential Equations
by J(y) where (9a) (9b)
J(y) =
S: 2w(t, y(t), y(t)) dt,
2w = fxixilyi + 2fxixilsj + fXixiyiy,
and
(9c)
J(y, z) = =
b
Sa
[WyiZi(t) + wydi(t)] dt
f {[Ixixiyi + fX’xiyJ]Zi(t)+ [Ixixiyi + fxixiy]ii}dt,
where repeated indices are summed and the coefficients functions fxixi, fxixi, fxixi are evaluated at the arguments (t,xo(t),xo(t)) for t in [a,b]. The Euler-Lagrange equation for (9) becomes the vector equation, (10)
d dt Wyi = Wy"
Solutions of (10) are called accessory extremal equations and form a 2n› dimensional solution space so that they are uniquely determined by the values y(t o) and Y(to), where to is in [a, b]. In particular, if y(t o) = y(t o) = 0, then y(t) == 0, a ~ t ~ b. The same result holds if y(t) = on an infinite subset of [a, b]. For the I l(X)example, we have n = 1, 2w = x 2(t) - x 2(t), co; = 2x’(t),Wxx= 2, WXx = Wxx = 0, ca; = -2x(t), Wxx = -2, WyiZi(t) + wytii(t) = [ -2y(t)]z(t) + [2y(t)]i(t) so that Q(y) = Q(y, y), where
Q(y) = 2 Lb [y2(t) - y2(t)]dt =
Sab 2w(t, y(t), y(t)) dt.
The Euler-Lagrange equation (10) becomes x + x = O. Since n = 1, there is a two-dimensional solution family, which we denote by x(t) = A sin t + B cos t. If x(to) = x(t o) = 0 for to in [a, b], then 0 = A sin to + B cos to = A cos to - B sin to’ Thus A = and B = 0, which means that xo(t) == 0. A point t = c on (a, b] is a conjugate point to t = a on xo(t) if there exists an accessory extremal y(t) such that y(a) = y(c) = 0 with y(t) ¥= 0 on [a,c]. In the last example if a = then an accessory extremal y(t) with y(a) = 0, satisfies y(t) = A sin t. Thus t = n is a conjugate point to t = a if b ~ n. If b < tt, there are no conjugate points to t = a. Hence if b > n, we see by Theorem 5 that there is no minimum solution for I 1(x), since there exists a conjugate point c = tt. We have shown this result independently above by a Taylor series expansion argument. Theorem 5 (Jacobi Condition) If a nonsingular admissible arc without corners xo(t) minimizes I(x) in ~o, then there is no point t = con Xo conjugate to t = a on (a,b).
1.3 Fundamental Lemmas (Integration by Parts)
1.3
31
Fundamental Lemmas (Integration by Parts)
The purpose of this section is to show that the fundamental tool which connects quadratic forms and differential equations can be put on a sound mathematical basis. This tool is usually referred to as fundamental lemmas in the calculus of variations, integration by parts in differential equations, or divergence theorems in classical mathematics. Similar tools carryover to partial differential equations except that more technicality is required. To this end Professor Magnus Hestenes has graciously allowed us to use some of his unpublished work. Since the author feels this work is essentially com› plete and difficult to improve upon, we present Professor Hestenes work verbatim, except where the phrase "we note" is added or minor editorial comment is required. This section may be postponed until the material in Chapter 3 has become more familiar. We begin by giving a multiplier rule found in [29]. Thus
Lemma 1 Let L, L t , ... , Lm be m + 1 linear functionals on d such that L(y) = 0 for all y such that Liy) = O. There exist multipliers at, . . . .a.; such that L(x) = a(%Lix) for all x in d. These multipliers are unique if and only if L t , ... , L m are linearly independent on d. In the present section we shall be concerned primarily with the class d of absolutely continuous functions x: [a, b] --+ IR t , whose derivative x(t) is piecewise continuous on [a, b]. In this event a ~ t ~ b can be subdivided into a finite number of subintervals on each of which x(t) is continuous. At each point of discontinuity of x(t) the left- and right-hand limits x(t - 0) and and x(t + 0) exist. The class d is a vector space. Ifc is a fixed point on [a, b], then L(x) = x(c) is a linear functional on d. A general linear functional on d is given by the formula L(x) =
(1)
fab [X(t) dF(t) + x(t)N(t)] dt,
where N is integrable on [a, b] and F is of bounded variation on [a, b]. The following lemma is fundamental in our analysis.
Lemma 2 Let!!J be the class of all arcs x in d having x(a)
=
x(b) = O.
The relation
(2)
L(x) =
f [x(t) dF + x(t)N(t) dt] = 0
holds for all x in!!J if and only if there is a constant c such that N(t) = F(t) + c almost everywhere on a ~ t ~ b. The relation (2) holds on !!J if and only if it holds for all arcs x in!!J possessing derivatives of all orders on a ~ t ~ b.
32
1 Introduction to Quadratic Forms and Differential Equations
In order to prove Lemma 2 observe first that if (3)
G(t)
= F(t) + c,
where c is a constant, then J~ x(t) dF = J~ x(t) dG. Using this relation together with the identity f:[x(t)dG
+ x(t)G(t)dt]
= G(b)x(b) - G(a)x(a)
it is seen that the linear functional (1) is expressible in the form (4)
L(x) = G(b)x(b) - G(a)x(a) +
Lb [N(t) -
G(t)]x(t) dt
ond. Suppose that L(x) = 0 on the subclass f!IJ of d having L 1(x) = x(a) = 0, L 2(x) = x(b) = O. By virtue of Lemma 1 there exist constants rx, 13 such that L(x) = j3x(b) - rxx(a) on d. Ifwe choose the constant c in (3) so that G(b) = 13, then, by virtue of (4), we have (5)
s:
[N(t) - G(t)]x(t) dt = [G(a) - rx ]x(a)
on d. Selecting x in d such that x(a) = 0, x(t) = 1 (a ::; t ::; s), and x(t) = 0 (s < t ::; b), it is seen that
(6)
f:CN(t) - G(t)] dt = 0
for all s on [a, b]. It follows that N(t) = G(t) almost everywhere on a ~ t ~ b, as was to be proved. Conversely, if N(t) = G(t) = F(t) + c almost everywhere on [a, b], then, by virtue of (4), L(x) = G(b)x(b) - G(a)x(a) on d. Hence L(x) = 0 on f!IJ, as was to be proved. It remains to prove the last statement in the lemma. To this end let ~(oo) be the class of all arcs x in d possessing derivatives of all orders. IfL(x) = 0 on f!IJ, then L(x) = 0 on f!IJ n ~(oo). In order to prove the converse suppose that L(x) = 0 on f!IJ n ~(oo). Then by the argument given above we can select a constant c such that (5) holds on ~(oo) with G = F + c. The function
p (t ) = exp (
exP( -lit)) t _ 1
(0
1)
is of class Coo and is bounded on 0 < t < 1. Choose s on a < t < band h such that 0 < h ~ b - s. The arc having x(a) = 0 and x(t) = 1 (0 ~ t ~ s), x(t) = p((t - s)lh) (s < t < s + h), x(t) = 0 (s + h ~ t ~ b) is in ~(oo). For this + J~+h P(t)p((t - s)/h) arc (5) takes the form, with P = N - G, 0 = J~P(t)dt dt. Setting t = s + Bn, we have dt = h di), and the last term takes the form h g P(t + Oh)p(O) di). This term tends to zero with h. It follows that (6) holds
1.3 Fundamental Lemmas (Integration by Parts)
33
on a s; s < b. Consequently N(t) = G(t) almost everywhere on [a, b]. We have accordingly L(x) = 0 on &6, as was to be proved. As an immediate consequence of this result we have Lemma 3 Let M(t) and N(t) be integrable real-valued functions on [a, b]. Set
(7)
F(t)
=
f: M(s)ds.
The relation L(x) =
(8) holds on
(9)
ffJ
if and
f:
[M(t)x(t)
+ N(t)x(t)] dt = 0
only if there is a constant c such that N(t) = F(t)
+c
almost everywhere on [a, b]. If the relation (8) holds on ffJ II qjoo, then it holds on ffJ. Finally, if N(t) is assumed to be piecewise continuous on [a, b], then (8) holds on ffJ if and only if (9) holds everywhere on [a, b].
As an immediate corollary we have Corollary 4 If M(t) is integrable on a ::::; t ::::; b and J~ M(t)x(t) dt = 0 for all x in &6, then M(t) = 0 almost everywhere on a ::::; t ::::; b. Again the arcs in ffJ can be restricted to be of class e oo We now extend Lemma 3 to results which are applicable to linear, self› adjoint, second-order differential equations. To this end we introduce the following terminology. A function x(t) (a ::::; t ::::; b) will be said to be of class c» (k ~ 0) if it is continuous and possesses continuous derivatives of the first k orders on a ::::; t ::;; b. It will be said to be of class D(k) (k ~ 1) if it is of class e(k-I) and possesses a piecewise continuous kth derivative on a ::;; t ::;; b. It will be said to be of class B(k) (k ~ 1) if it is of class e(k-I) and its (k - l)st derivatives satisfy a Lipschitz condition on a ::;; t ::;; b. In the definitions just given we tacitly assume that k ~ 1. A function x(t) (a ::;; t s; b) is of class e(O) if it is continuous on a ::;; t ::;; b, is of class 1)(0) if it is piecewise continuous on a ::::; t ::;; b, and is of class B(O) if it is a bounded integrable function on a s; t s; b. The set of functions of class s», e(k), and D(k) will be denoted, respectively, by ffJ\ qjk, E0 k. Clearly, ffJk :::> E0 k :::> qjk. A function is said to be of class e(oo) if it is of class e(k) for every integer k. The class of functions of class e(oo) will be denoted by C". The classes ffJOO and E0 00 can be defined similarly. We have ffJoo = qjoo = E0 OO By the classes ffJkO, qjkO, E0 kO (k ~ 1) will be meant the classes of functions x in ffJk, e: E0\ respectively, having x(a)(a) = x(a)(b) = 0 (a = 0,1, ... ,k - 1).
34
1 Introduction to Quadratic Forms and Differential Equations
Let M o(t), ... , M k(t) be k + 1 integrable functions on a S t ao, ar. ... ,ak-t, b o, b t, ... ,bk- t be 2k constants. Set
(10)
s
b. Let
L(x) = a"x(")(a) - b..x(")(b) + f: Mp(t)x(f3)(t)dt,
where 0( is summed over the range 0,1, ... ,k - 1, f3 is summed over the range 0, 1, ... ,k, and x(f3)(t) = d f3x(t)/dtf3. The linear functional L is well defined on the subspaces f!g"’, !(i’’’’,f0"’, where m ;;:: k. We have the following: Lemma 5 Let Co,
C t, ... ,
Ck _ t be constants and set
= f:Mo(s)ds+c o,
(lla)
Po(t)
(llb)
Pt(t) = f:CMt(s)
(llc)
Pit)
+ (s -
+ coCa -
t) + Ct,
ft (s - t)"-Y (a - t)"-Y = Ja Mis) (0( _ y)! ds + Cy (0( _ y)! ’
where 0( < k and y is summed from expressible in the form (12)
t)Mo(s)]ds
to 0(. The linear functional L(x) in (10) is
L(x) = [a" - P,,(a)]x(")(a) - [b" - P,,(b)]x(")(b)
+
f[Mk(t) - P k_ 1(t)]X(k)(t)dt
on the classes gum, !{i’m, f0m (m ;;:: k). If c" = a" (0( = 0,1, ... ,k - 1), then on these classes we have (13)
L(x) = [Pib) - b,,]x(")(b) + f:[Mk(t) - P k_ 1(t)]X(k)(t)dt.
We note that several comments may be in order. The first is that L(x) in (10) is (usually in our context) a bilinear form J(x, y) with y fixed, and that we wish to find yet) so that J(x, y) = on a class offunctions x(t) with certain boundary conditions. Thus we desire to represent L(x) in a more convenient form as in (12) where we can more easily see the necessary conditions on yet). Thus for example, if k = 1 then (10) is of the form
= J(x,y) = [Ay(a) + By(b)]x(a) - [By(a) + Cy(b)]x(b) +Lb {[r(t)y’(t) + q(t)y(t)]x’(t) + [q(t)y’(t) + p(t)y(t)]x(t)} dt, where a o = Ay(a) + By(b), b o = By(a) + Cy(b), M o(t) = q(t)y’(t) + p(t)y(t), and M t(t) = r(t)y’(t) + q(t)y(t). Integration by parts of the second term of Lt(x)
the integral with u(t) = x(t), dv = M oCt) dt, du = x’(t)dt, vet) = P oCt) leads to L 1(x) = aox(a) - box(b) + Po(t)x(t)
which is (12) when k = 1.
I: + f[M
1(t)
- Po(t)]x’(t)dt,
1.3 Fundamental Lemmas (Integration by Parts)
35
We note similarly when k = 2 (leaving boundary terms to the reader), L 2 (x) = J(x, y) =
Lb [Mit)x"(t) + M l(t)X’(t)+ M o(t)x(t)] dt
= Po(t)x(t)I: + J: ([M1(t) - Po(t)]x’(t) + M 2(t)x"(t)}dt = Po(t)x(t)
I:+ P1(t)x’(t)I: + J:[M
2(t)
- P1(t)]x"(t)dt,
where, as above, M 2(t), M l(t), and M o(t) are linear in y"(t), y’(t), and y(t), respectively, so that J(x, y) = J(y, x). The third equal sign holds as when k = 1 for L1(x) by integration by parts. The fourth equal sign holds by integration by parts in agreement with (lib). Observe that by (Ll) we have P~(t) = M o(t) and P~(t) = Mit) - P~-l(t) since
,
it
Pit) = M~(t)
(s -
(a -
t)~-Y-l
- Ja Mis) (oc _ y _ I)! ds -
C
t)~-Y
y (oc _ y)!
(0::;; y ::;; o: - 1)
almost everywhere on a ::;; t ::;; b. From these relations we see that (14)
where o: is summed from 0 to k - 1. Using this formula in (10), we obtain the formula (12) for L. Since Pia) = C~, the formula (13) follows from (12) when c~
=
a~.
We note that the comment "we see" may require some calculation. Thus to obtain (14) we have, where the appropriate quantities exist [P o(t)x(t)]’ = Po(t)x(t) + P o(t)x’(t)= M o(t)x(t) + P o(t)x’(t), [P1(t)x(l)(t)]’ = P1(t)X(H ll(t) + [M1(t) - P1_1(t)]X1(t),
for 1 ::;; 1::;; k - 1 (l not summed) by the last paragraph, and k-l
L [P,(t)x(l)(t)]’ = M o(t)x(t) + Po(t)x’(t)
’=0
k-l
+ L {P,(t)X(l+l)(t) + [M,(t) -
,=
P’_l(t)]X(l)(t)}
1
k-l =
L M1(t)x(l)(t) + P o(t)x’(t) -
P o(t)x’(t)
1=0
k-2
+ L [P1-1(t) 1=2
P1_1(t)]x(l)(t)
+ Pk_1(t)X(k)(t)
36
1 Introduction to Quadratic Forms and Differential Equations
so that (14) holds. To obtain (12) we have using (10) and (14),
f: M p(t)x(P)(t) dt
L(x) = a~x(~)(a)
- b~x(~)(b)
+
a~x(~)(a)
- b~x(~)(b)
+ lb Mk(t)x(k)(t)dt
=
=
f: [Pk_l(t)X(k)(t)] dt
+ lb [Pit)x(~)(t)]'
dt -
a~x(~)(a)
+ Pit)x(~l(t)l:
- b~x(~)(b)
b +Jai [ M k(t)-Pk- 1(t)] x (k) (t)dt.
Lemma 6 Let d be one of the classes f!4m, ~m, f0m (1 :s;; k :s;; m :s;; co) and = x(~)(b) = 0(0: = 0, let g be the set of all functions x in d having x(~)(a) 1,... ,k - 1). If L(x) = on g, there exist constants d~, ea (0: = 0, 1,... ,k - 1) such that
L(x) =
(15)
d~x(~)(a)
-
on d. The relation L(x) = on g holds 1, ... ,k - 1) can be chosen so that (16)
Mk(t) =
it (s Ja Mis) (k _
eax(~)(b)
if and
only (a -
t)k-~-l
if constants
c~ (0: =
0,
t)k-~-l
0: _ 1)! ds + c, (k - 0: - I)!
holds almost everywhere on a :s;; t :s;; b. The relation (15) holds on d if and only = a; - d~ holds on d and P~(b) = b~ - e~, where P~ is defined by (11).
if (16) with c~
Suppose that L(x) = linear constraints
on g. Then L(x) =
on d (0:
subject to the 2k
= 0,1, ... ,k - 1).
By virtue of Lemma 1 there exist constants d~, e~ such that (15) holds. on g. Select c~ = a~ - d~ and let Conversely, if (15) holds, then L(x) = P~ be defined by the formulas (11). Then, by (12), (13), and (14), we have
(17)
[Pa(b) - b~
+ e~]x(a)(b)
+ Lb[Mit) -
P k_ 1(t)]X(k)(t)dt =
for all x in d. Let f(t) be a function of class ~oo on a :s;; t :s;; b. The function x defined on a:S;; t :s;; b by the relations X(k)(t) = f(t), x(~)(b) = 0(0: = 0, 1, ... ,k - 1) is in .91. For this function x the relation (17) takes the form S~[Mk(t) - P k- 1(t)]f(t)dt = 0. By Corollary 4 this relation holds for an arbitrary function f(t) of class ~oo if and only if Mk(t) = P k- 1(t) almost
1.3 Fundamental Lemmas (Integration by Parts)
37
everywhere on a ~ t ~ b. Hence L(x) = 0 on..&’ ifand only if(16) holds almost everywhere on a ~ t ~ b. If (16) holds, then equations (17) take the simpler form [P,.(b) - b, + e~]x(~)(b) = 0 on d. Since x(~)(b) (e< = 0,1, ... ,k - 1) can be chosen arbitrarily, it follows that P,.(b) = b; - e~. Consequently (15) holds on d ifand only if C~ = a~ - d~, P,.(b) = b~ - e~, and (16) holds almost everywhere on a ~ t ~ b. This proves the lemma. Corollary 7 If L(x) = 0 on one of the classes f!Jm, Cf}m, ~m (1 ~ k ~ m ~ 00), then L(x) = 0 on all of these classes. If L(x) = 0 on one ofthe classes f!JmO, Cf}mO, ~mO (1 ~ k ~ m ~ (0), then L(x) = 0 on all of these classes.
We note that if Mk(t) satisfies (16) then differentiating we have with e< summed over the indicated integer values ,
(s -
t
Mk(t) = Mk- 1(t) -
-
t)k-~-Z
fa M,.(s) (k _ e< _
(a - t)k-~z C~ (k _ e< _ 2)!
2)! ds
(0 ~ e< ~ k - 2)
or Mk(t) = M k- 1(t) - Pk-Z(t). Differentiating once again and using the expression for P~(t) below (13), we have
M;:(t) = M k- 1(t) - Pk-Z(t)
,
r Mis) (k~- _ 03 _ k
= Mk- 1(t) - Mk-Z(t)
(a -
+ c, (k _
tt-
3-
+ Ja
-
3
-
y
y)! ds
y
3 _ y)!
(0 ~ Y ~ k - 3).
By induction, this process continues until it terminates in (18). Equation (18) also follows by integrating (10) by parts, reversing the role of u and v. For example with k = 2, we have (assuming boundary terms are zero for con› venience)
L(x) = Lb[Mo(t)x(t) + M 1(t)x’(t) + Mz(t)xl/(t)]
I: S: {Mo(t)x(t) + [M l(t) - Mz(t)]x’(t)}dt z(t)x’(t)I :+ l(t) - Mz(t)]x(t)l:
= M z(t)x’(t) + =M
+
[M
f {Mo(t) -
[M 1(t) - Mz(t)]’}x(t)dt.
Thus the process which leads to (18) requires k operations of integration by parts.
38
1 Introduction to Quadratic Forms and Differential Equations
Corollary 8 Suppose that for f3 = 0, 1, ... ,k the function M p is of class on a;S; t ;S; b. Then L(x) = on the classes ~mO, Cf}mO, fiflmO (1 ;S; k :::;; m s; (0) if and only if Cf}fJ
(18) on a
M0 ;S;
t
;S;
-
d dt M 1
d2
+ dt 2 M 2 -
d
...
+ (_I)k dt k M k =
b.
Let F 0’ . , F k- 1 be functions of bounded variation on a s; t;S; band let N(t) be an integrable function. Let L be the linear functional (19) Recall that if yet) is absolutely continuous on a
s:
yet) ar,
= F~(t)y(t)l:
-
;S;
s: F~(t)y(t)
t
;S;
b, then dt.
By virtue of this relation it is seen that L can be put in the form (10) with a, = -F~(a), b~ = -F~(b), Mo(t) = 0, Mit) = -Fy-1(t) (y = 1, ... ,k - 1), Mk(t) = N(t) - Fk-1(t). Conversely, a linear functional of the form (10) can be put in the form (19) by selecting N(t) = Mk(t) and Fia) = 0, F~(t) = a~ + J~M~(s)ds (a < t < b), F~(b) = Fib - 0) - b, for IX = 0,1, ... ,k - 1. In view of this fact no generality is obtained by selecting L to be of the form (19). Accordingly we shall restrict ourselves to representations of L of the form (10).
1.4 Quadratic Forms and Differential Equations In Sections 1 and 2 we have seen that quadratic forms playa central part in the study of extremal or variational problems. The primary purpose of this section is to begin to explain the relationship between quadratic forms and self-adjoint differential equations. At one time this relationship was primarily one way, in the sense that quadratic functionals were studied by using the theory of differential equations. Thus for example Bliss [3] and Hestenes [29] devote a number of pages to existence theorems for differential equations. Hestenes [27] recognized that although the desired results were obtained, the study of quadratic functionals could be improved by developing a theory of a quadratic form lex) on a Hilbert space .Yt’. This would allow a unified treatment of many problems which had formerly been treated individually. In the cited reference [27] and in other publications by Hestenes and many of his students, this Hilbert space theory was applied to a wide variety of quadratic functional problems whose associated Euler-Lagrange equation
1.4 Quadratic Forms and Differential Equations
39
is a linear, self-adjoint differential equation or system. These equations were of elliptic type. Of particular importance to us, were two nonnegative integers connected with J(x) on £0, which Hestenes termed the index or signature s and the nullity n. These indices correspond, respectively, to the number of negative and zero eigenvalues of a real symmetric matrix. The author took this Hilbert space concept one step further by giving defined on a collection an approximation theory of quadratic forms {J(x; Of of Hilbert spaces {d(an where a is an element of a metric space (~,p). particular importance are inequalities involving s(a) and n(a), which lead to numerical and other types of approximation results ofthe topics in this book. The purpose of this section is to briefly give the problems and results, as discussed above, of Hestenes and his students. Thus this section is, in effect, the beginning of the remainder of this book. For completeness, we remark that in later work, Hestenes formally defined a more general index called the relative nullity and a formal concept of Q closure. However, we shall not consider this index directly in this book. Our intent is to start with "simple" problems and detailed explanations and calculations. We ask the reader to make an effort to become involved with these problems. As the example number increases, the problems tend to be more complicated and less con› ducive to calculations and examples. However, our central theme is always the same so that the earlier problems are examples of the later problems. We begin by stating some definitions and notation. Let d be a (real) Hilbert space with subspaces fIl, Cfj’, f0, .... It is not important that the reader have a thorough knowledge of Hilbert space theory at this time. This will be given in Chapter 2. For this section, the reader may think of a real inner product space (of possibly infinite dimension). The elements of d are denoted by x, y, z, .... Let J(x), K(x), Q(x), ... denote quadratic forms on d. J(x) is positive, negative, nonpositive, nonnegative on a subspace fIl if x # 0 in fll implies J(x) > 0, J(x) < 0, J(x) :-s; 0, and J(x) ~ 0, respectively. The vectors x and yare J orthogonal if J(x, y) = O. The J-orthoqonal complement of fIl, denoted by fllJ, is the subspace of all vectors x in d such that J(x, y) = 0 for all y in fIl. The subspace of vectors fIl n fIlJ is denoted by fIlo- A vector x in fIl0 is a J -null vector of fIl. The dimension of fIl0, denoted by n, is the nullity of J on f!4. The signature (or index) of J on fIl, denoted by s, is the dimension of a maximal subspace Cfj’ of fIl such that x # 0 in Cfj’ implies J(x) < O. To illustrate these ideas we consider a simple example which we call Example O. Let
an
(1)
J(x) =
f: (x/
2
-
x 2 ) dt
be defined on a smooth space of functions d to be defined below. Let fIl be the subspace of d such that x(O) = x(b) = 0. Integrating J(x, y) by parts with
40 u
1 Introduction to Quadratic Forms and Differential Equations
= x’(r)and
(2)
dv
= y’(t) dt we have
f: [x’(t)y’(t)- x(t)y(t)] dt
= -
f: [x"(t)
I:.
+ x(t)]y(t) dt + x’(t)y(t)
IfJ(x, y) = 0 for arbitrary y(t), then a necessary condition is that x(t) satisfies L(x) = x" + x = 0 which is a linear, self-adjoint, second-order differential equation whose solution space is S = span {sin t, cos t}. If 0 < b < n, then f!4J = Sand f!40 = {O} since it is the set of vectors in S vanishing at t = 0 and t = b < tt. Thus x(t) = Asint + Bsint, 0 = x(O) = B, 0 = x(b) = Asinb implies A = 0 and hence x(t) == O. It will be shown that J is positive on f!4 and hence s = n = O. Ifb = n, then f!4J = S, but f!40 = span {sin t} since x(t) = sin t satisfies the differential equation and vanishes at t = 0 and t = tt, Thus J is nonnegative but not positive on f!4 and hence s = 0, n = 1. Ifb > n, then s ~ 1 using the example function in Section 1.2 or the function in Section 3.1. f!4o = {O} unless b = nk for some natural number k. It will be shown in Section 3.1 that if nk < b ~ n(k + 1), then the signature is k and that the signature "counts" the number of zeros of L(x) = 0 on (0, b). We remark that if b > tt, the function x1(t) = sint on [O,n] and x1(t) == 0 on [n,b] is not in [lJJ and hence not in f!4o. The problem is that integration by parts holds on [0, n) and (n, b] but not on [0, b]. In fact if y(n) = 1 for y in [lJ, then J(x 1, y)
=
f: (xly’ - x1y)dt = f:(xlYl- X1Yl)dt
= -
r
(x ’{+ x1)ydt + (cost)y(t)I:-
= 0 + ( -1)(1) - (1)(0) = -1 # O. This illustrates that if the coefficient functions in Example 1 are, for example, continuous, then a null vector in f!40 must be C 1 on [a, b]. We should like to justify some of the statements above in a way that can be systematically continued to higher-order problems. Thus, for example, we seek to find nontrivial solutions to x(t) = A sin t + B cos t and parameter b > 0 such that x(O) = x(b) = O. In this case we have the linear system written in equation form 0= x(O) = AsinO
+
BcosO
and 0= x(b) = Asinb + Bcosb
or in matrix form
1.4
Quadratic Forms and Differential Equations
41
The associated coefficient matrix has rank one or two. In the latter case, the determinant of the system Ll = - sin b is not zero, the coefficient matrix is invertible and hence x(t) == 0 is the only solution. Thus there exist a countable set of points 0 < b, < b z < b 3 < ... such that the coefficient matrix has rank one. In this example bk are the zeros of sin t, i.e., b = n, 2n, 3n, .... For each such bk there is a nontrivial solution such that x(O) = x(b k) = O. Since L(x) = x(O) is a linear-independent constraint and since the solution space of x" + x = 0 is two dimensional, there is only one linearly independent solution x o(t). It has zeros at each bk In our example xo(t) = sin t. We shall see that this situation is not true in general for higher-order equations. That is, there may exist a countable collection of solutions of Ll = 0, but no solution vanishes at three distinct points t 1 = 0, t z = b., and t 3 = b., Example 1 is the quadratic form (3)
J(x) =
f [p(t)x,Z(t) -
q(t)XZ(t)]dt
with associated differential equation (4)
L(x) = [p(t)x’(t)]’ + q(t)x(t)
= o.
This example is discussed in detail in Section 3.1. The same general ideas hold as in Example 0, where p(t) == q(t) == 1, a = O. That is, ifp’(t) and q’(t) are continuous and p(t) > 0 on [a, b], then null solutions form a one-dimensional solution space, span{x1(t)}, where Xi(t) is a nontrivial solution of L(x) = 0, x(a) = O. Ifx1(b) #- 0 then ~o = 0 and n = O. Ifx1(b) = 0, then ~o = span x.Ir) and n = 1.The signature counts the number of points a < t 1 < t Z < tn < b such that Xi(tk) = O. Example l’is as above except that we have a nonzero boundary condition. Thus consider the quadratic form (5)
J(x) = AxZ(a) +
f [p(t)x’z - q(t)x
2
]
dt.
Since L(x) = [p(t)x’(t)]’ + q(t)x(t) = 0, integration by parts with u = p(t)x’(t), do = y’(t)dt, du = [p(t)x’(t)]’dt, and u = y(t) leads to J(x, y) = Ax(a)y(a) + p(b)x’(b)y(b) - p(a)x’(a)y(a) = [Ax(a) - p(a)x’(a)]y(a) + p(b)x’(b)y(b). If J(x, y) = 0 for "smooth" (unconstrained) functions y(t) in .91, then x in d J = do implies that Ax(a) = p(a)x’(a) and x’(b) = O. If~ is constrained by either x(a) = 0 or x’(a) = 0, then (if A #- 0) both x(a) = 0 and x’(a) = 0, which implies that lAo = {O} since L(x) = 0 and x(a) = x’(a) = O. Similar results hold if Cx(b)y(b) replaces Ax(a)y(a) in (5). If (6) J(x, y) = Ax(a)x(b)
+ 2Bx(a)x(b) + CxZ(b) +
f [p(t)x’Z -
q(t)XZ] dt,
42
1 Introduction to Quadratic Forms and Differential Equations
we have, as in the last paragraph,
+ Bx(a)y(b) + Bx(b)y(a) + Cx(b)y(b) + p(b)x’(b)y(b) - p(a)x’(a)y(a).
J(x, y) = Ax(a)y(a)
Hence Ax(a) + Bx(b) = p(a)x’(a)
and
Bx(a)
+ Cx(b) = -
p(b)x’(b)
are necessary conditions for x(t) in d J = do. If f!4 is constrained by both x(a) = 0 and x(b) = 0, then x(t) in f!4o = f!4 n f!4J implies x(t) == O. Finally, most generally in this second-order example, if R(t) > 0, then (7) J(x, y) = [Ax(a)
+ Bx(b)Jy(a) + [Bx(a) + Cx(b)Jy(b)
+ fab {[R(t)x’(t) + Q(t)x(t)Jy’(t) + [Q(t)x’(t) + P(t)x(t)Jy(t)} dt =
+ Bx(b) - R(a)x’(a) - Q(a)x(a)]y(a) + [Bx(a) + Cx(b) + R(b)x’(b) + Q(b)x(b)]y(b)
[Ax(a)
+ fab {-[R(t)x’(t) + Q(t)x(t)]’ + [Q(t)x(t) + P(t)x(t)J}y(t)dt so that do equation (8)
=
d
J
is the collection of vectors in d satisfying the differential
L(x) = - [R(t)x’(t) + Q(t)x(t)J’
+ [Q(t)x’(t) + P(t)x(t)J = 0
along with the boundary conditions (9a)
Ax(a)
+ Bx(b)
Bx(a)
+ Cx(b) + R(b)x’(b) + Q(b)x(b) =
- R(a)x’(a) - Q(a)x(a) = 0
and (9b)
O.
If f!4 is constrained by x(a) = x(b) = 0, then x’(a) = x’(b) = 0 implying that f!4o = {O}. These types of boundary conditions occur in the literature in other con› texts. For example, Stakgold [49, pp. 69-76J considers these ideas in his discussion of self-adjointness, symmetry, and Green’s function and formula. Gould [12, pp. 80-84J and [12, pp. 102-116J considers these conditions for the Rayleigh-Ritz eigenvalue problems. We now consider Example 2, which generalizes Example 1. This example is given in Hestenes [27, pp. 533-536J and in other parts of that reference. We shall only be interested in stating basic results; some of this theory will be given in Chapters 2 and 4.
1.4
Quadratic Forms and Differential Equations
43
Let .s# consist of the arcs (10)
(a :::;; t :::;; b),
where each xj : [a, b] (11a)
--+ [Rl
is absolutely continuous with x/(t) in L 2[ a, b]. Let
J(x) = 2q[x(a),x(b)]
+ f2w(t,x,x)dt,
where (Llb)
2q[x(a),x(b)]
= Ajkxj(a)xk(a) + 2B jkxj(a)x k(b) + Cjkxj(b)xk(b)
and (Llc)
2w(t, x, x)
=
Rjk(t)xj(t)xk(t)
+ 2Qjk(t)Xj(t)x k(t) + Pjk(t)Xj(t)xk(t).
Repeated indices are summed, (AjJ and (C jk) are symmetric constant matrices, Pjit) = Pkj(t) is integrable, Qjk(t) is square integrable, Rjit) = Rkit) is essentially bounded with R jk(t)l1:ink ~ hnjn j for some constant h, 0 < h < 1, and any vector (nl, n 2 , ,n P ) in [RP for almost all t in [a, b]. In the above a s; t:::;; band j, k = 1, ... , p. The inequality condition on R jk is called the strengthened condition of Legendre. The associated bilinear form is
where the subscripts denote partial derivatives of the respective quantities,
8
.
8
.
(12b)
qka(x) = 8x k(a) q = AjkxJ(a)
(12c)
qkb(X) = 8x k(b) q = Bj~J(a)
(12d)
Wxk
= Pjk(t)xj(t)
.
+ BjkxJ(b), + Cj~J(b),
.
+ Qj~j(t),
and (12e)
If f!J = {x in .s#Ix(a) = x(b) = a}, then x in [!4J implies that the Euler› Lagrange equation is satisfied, that is, there exists a constant vector c = (Cl,’" ,Cp)T such that the system
(13)
(k= 1,2, ... ,p)
44
1 Introduction to Quadratic Forms and Differential Equations
holds almost everywhere on [a, b]. ggJ is a 2p-dimensional subspace of d since the strengthened condition of Legendre holds. Note that gg c d implies do = d J C ggJ so that x in d J satisfies (13). In this case, since (13) is satisfied, we have, by integration by parts, that J(x, y) = qka(x)yk(a)
= [qkAx) -
+ qklJ(x)yk(b) + ~k(t)yk(t)l: ~k(a)Jyk(a)
+ [qklJ(X) + ~k(b)Jyk(b),
where ~k(t) = S~ Wxk ds + ci, Thus x(t) is in d along with the boundary conditions
J
if and only if (13) is satisfied
(14) More generally Hestenes considers a subspace C(j of d such that C(j = {x in dlLix) = aO:kxk(a) + bo:kXk(b) = 0, (0( = 1,2, ... ,m)}. Note that gg c C(j c d and hence d J c C(jJ C ggJ. gg is a subspace of d of codimension 2p since there are 2p linearly independent constraints, Lk(x) = xk(a) = and L~(x) = xk(b) = 0, in going from d to gg. ggJ has dimension 2p, while d J is a 2p-dimensional space with linear constraints (14). An immediate extension of the above with p = 1, shows that d J can be {O}; while if q == 0 in (11), then d J has dimension p similar to Example 0 when p = 1. For this more general problem Hestenes [27, p. 542J shows that x(t) is in C(jJ if and only if it satisfies (13) and the transversality conditions, that there exist constants (hi’ h 2 , . . . , hml such that (15a) and (I5b) holds. This result follows by a very clever multiplier rule and by the lemma that under the above conditions there exists constants (h 1, h 2 , . ,hm)T such that J(x, y) + ho:Liy) = 0 holds for all y in d. Integration by parts as in the case when ho: = yields (15). Note that if ao: k == == bo: k (0( = 1,... ,p), then in fact C(j = d and hence C(jJ = d J . This is verified since in this case (14) and (15) are identical. At the other end of the subspace inequality defining C(j, assume m = 2p and, without loss of generality, that Lix) = ao:kxk(a) (0( = 1,2, ... ,p) and Lo:+p(x) = bo:~k(b) (0( = 1,2, ... ,p), where (ao:k) and (bo: k) are p x p linearly independent matrices. Then xk(a) = xk(b) = 0 for all k so that C(j = gg and hence C(jJ = ggJ. Thus qka(X) = qklJ(X) = 0 for all k and the multipliers in (15) are uniquely determined from the constants ~k(a) and ~k(b). So (15) is really from the constants ~k(a) and ~k(b). So (15) is really a vacuous condition and gg only need satisfy (13).
1.4 Quadratic Forms and Differential Equations
45
Hestenes has also considered extensions of the space f!4 above where in addition, a set of isoperimetric conditions is satisfied such as (« = 1,... ,m).
We defer this problem until we consider the work of Lopez in Chapter 4. Our next example, Example 3, is a simple fourth-order problem corre› sponding to the second-order problem we called Example O. Thus let p(t) > 0 and (16)
J(x)
=
S:
[p(t)x 2(t) - q(t)x 2(t)] dt.
The formal setting is given below, but for now let d = {x(t): [0, b] -+ [Rl} be defined so that (16) makes sense and let f!4 = {x in d Ix(O) = x’(O) = 0, x(b) = x’(b) = O}. This is the correct extension for concepts such as oscilla› tion or conjugate points and extremal solutions in the previous examples. The associated bilinear form is (17)
J(x, y) =
S:
[p(t)xl/(t)yl/(t) - q(t)x(t)y(t)] dt
=
p(t)xl/(t)y’(t) ,: -
S:
=
p(t)xl/(t)y’(t) /; -
[p(t)xl/(t)]'y(t)I~
+
S:
[(p(t)xl/(t) )’y’(t) + q(t)x(t)y(t)] dt
{[p(t)xl/(t)]1/ - q(t)x(t)}y(t) dt.
We have performed a double integration by parts. In the first integration, = p(t)xl/(t), do = yl/(t)dt, du = [p(t)xl/(t)]’dt and u = y’(t). In the second integration, u = - [p(t)xl/(y)]’, do = y’(t)dt, du = - [p(t)xl/(t)]"dt, and v =
u
y(t).
Thus by (17) x is in f!4J if and only if x(t) satisfies (18)
L(x)
= [p(t)xl/(t)]" - q(t)x(t) = O.
For convenience we assume that p(t) and q(t) are in C 2(0, b), so that (18) exists as a differential equation and not in integrated form. The solution space ,qjJis a four-dimensional subspace. Once again f!4 c d implies d J c ,qjJ so that if x is in d J , then x satisfies (18) and the boundary conditions can be worked out from (17) as before. Because f!4 o = f!4 n f!4J, we require that (18) be satisfied along with the conditions x(O) = x’(O) = x(b) = x’(b) = O. Under these conditions, Example 3 is like an eigenvalue problem. That is, these four constraints are (usually) linearly independent and hence f!4 o = {O}. However, for some values of t = b we obtain f!4 o -=I {O}.
46
1 Introduction to Quadratic Forms and Differential Equations
To illustrate this idea, let p(t) == 1 and q(t) == 1. Then (18) becomes L(x) = X(4)(t) - x = O. The four-dimensional solution space is fllJ = span le’, e:’, sin t, cos z}. We wish to determine constants A, B, C, D not all zero and b > 0 such that x(t) = Aet (19)
+ Be- t + C sin t + D cos t satisfies
x(O) = x’(O) = 0,
x(b) = x’(b) = O.
Since x’(z) = Aet - Be- t + Ccost - Dsint we have four homogeneous equations in four unknowns x(O) = A + B + D = 0, x’(O) = A - B + C = 0, x(b) = Aeb + ee > + Csinb + Dcosb = 0, x’(b) = Aeb - Be- b + Ccosb - Dsinb = 0,
W) 0
1 -1I 1 0 (’ eb e- b sinb cosb eb _e- b cosb -sinb
o
1
B C D
0 0 0
Straightforward row reduction leads to 1
(~
0 1 _eb+e- b sinb _eb_e- b cosb
I ) (1
-2
-1 0 -eb+cosb ~ 0 -eb-sinb 0
~( Ol io Sinb-~+ie-'
cosb-!eb-!e- b
=(o~l o~ o
_~
0 1 1 1 -2 _eb+e- b sinb _eb_e- b cosb
-~Lb)
-eb-sinb
-~+COS}+Y-ie-')
_eb-sinb+!e b+!e- b
~)
sinb -i(eb - e- b) cosb-t(eb+e- b) . cos b-!{eb+e- b) - sin b- ~(eb-e-b)
Note that the fourth row of the final matrix is the derivative of the third row, which is correct since we were looking for b such that a linear combination of these functions and their derivatives vanishes at t = b. The determinant of the final matrix is (20)
D. = -sin 2 b + •!:(eb - e- b ? - cos’’b + cosb(eb + e- b) - t(eb + e- b)2 = - 1 + cos b( eb + e- b) - 1 = 2(- 1 + cos b cosh b).
1.4 Quadratic Forms and Differential Equations
47
It is not possible that the last two rows be zero for b > 0, since the (3, 3) and (4,4) element both equal to zero implies sinb = 0 and hence eb = «», which implies b = O. Thus the rank of the system is at least 3 and hence the null space has dimension at most 1. We remark that since cos(2nn - n12) = cos(2nn + n12) = 0 and cos(2nn) = 1, the intermediate value property yields two solutions of Ll = 0 in each interval (2nn - n12, 2nn + nI2). Furthermore these solutions monotonically approach (very quickly) the two end points of the interval. Let b i < b2 < b 3 < ... be the countable collection of zeros of Ll = 0 in (20), so that b2 n - 1 is in (2nn - n12, 2nn) and b z n is in (2nn, Lnn + nI2). Let XI(t) and x 2 (t) be linearly independent solutions of X(4) - x = 0 subject to the linearly independent constraints LI(x) = x(O) = 0 and Lix) = x’(O) = O. The question we now consider is how are the points {bk} generated? Does there exist a particular nontrivial solution yet) = Axl(t) + Bx 2(t) such that yet) and y’(t)vanish at t = 0 and two or more of the points bk? We now show that this is impossible. Suppose for example that yet) and y’(t)vanish at t = 0, t = bi> and t = b z . If so, let x(t) = y(bi + t). Then x’(t) = y’(bi + t), etc., imply that x(t) is a nontrivial solution to X(4)(t) - x(t) = 0, vanishing along with x’(t) at t = 0 and t = b z - b l. This is impossible since 0< b z - b i < n and Pc < b l. In the general case, assume there exists a nontrivial solution yet), such that yet) and y’(t) vanish at t = 0, t = bk 1, and t = bk2, where bk 1 and bk2 are the first two such points. As above with x(t) = yet + bk.), we have x(o) = x’(O) = and X(bk2 - bk.) = X’(bk2 - bk .) = 0. Thus bk2 - bk 1 = bkl or bk2 = 2bk,. Let c = bk , Then cos c cosh c = 1 and
1 = cos 2c cosh 2c = (2 cos" c - 1)(2 cosh? c - 1) = =
4cos 2 ccosh? c - 2(cos Z c + cosh’’c) + 1 4 - 2(cosZc + cosh/ c) + 1
or cos! c + cosh? C = 2. Thus we arrive at the inequality
J2
> coshc > cosh(Pc) > !e9f2 ,
which is clearly impossible. Let the above sequence of points < b i < b z < ... be known as con› jugate points. We will not underline this "definition" since we only wish it to apply at this time to this specific problem. Then we have the following remark. Remark For the above example, there exist a countably infinite set of conjugate points such that no two distinct (nonzero) conjugate points ak and al satisfy
48
1 Introduction to Quadratic Forms and Differential Equations
More generally, if there exist points a < b < c such that x(t) satisfies X(4) - x = and x(a) = x’(a) = x(b) = x’(b) = x(c) = x’(c) = 0, then x(t) == 0. Example 4 is our final example in ordinary differential equations and is quite esoteric. We shall give little motivation, leaving for illustration previous examples of this section. Example 4 is a most general integral-differential equation of Fredholm type. The details of this theory are worked out in a dissertation of Lopez [36]. We shall return to this problem in Chapter 4. The fundamental vector space .xl is the set of functions z(t) = [Zl(t),.. . ,zp(t)J whose «th component, za(t), is a real-valued function defined on the interval a ~ t ~ b of class C n - 1 ; z~n-1)(t) is absolutely continuous and z~n)(t) is Lebesque square integrable on a ~ t ~ b. .xl is a Hilbert space. The inner product is given by (21) where a = 1, ... ,p; k = 0, ... , n - 1; superscripts denote the order of dif› ferentiation; and repeated indices (except for n) are summed. The fundamental quadratic form J(x) is given by (22)
J(x) = H(x) +
Jab Jab k~p(S,
t)x~)(s)x~)(t)
+ Lb R~p(t)x~)(t)xW)(t)
dt
(a, f3 = 1, ... ,p; i,j = 0, ... ,n), where R~p(t) and integrable functions on a ~ t ~ b; (23)
ds dt
= R~it)
are essentially bounded
H(x) = A~~x~k)(a)x~)(a),
A~~ = A~~ (k, 1= 0, ... , n - 1) are n2 p 2 real numbers; K~p(s, t) = K~a(t, essentially bounded and integrable functions on a ~ t ~ b; and
(24)
R~p(t)7ta7tp
s) are
;;:: h7t a7t a
holds almost everywhere on a ~ t ~ b, for every 7t = (7tl>’ . , 7t p ) in EP, and some h » 0. This inequality is the ellipticity (or Legendre) condition of Hestenes in this setting. The connection between quadratic forms and integral-differential equa› tions is now given. Let f!l denote a subspace of .xl such that x is in f!l if and only if (25a)
1.4 Quadratic Forms and Differential Equations
49
(ex, [3 = 1,.. . ,p; k,l = 0, .. . ,n - 1; y = 1,.. . ,m:::; np), where M~a are real numbers such that the linear functionals Ly(x) are linearly independent on d. Let ~(A) (a :::; A:::; b) denote the subspace of ~ whose component functions satisfy
A:::; t:::; b
on
(25b)
for
k = 0, ... , n - 1.
For any arc x(t) in d set (26) for almost all t on a :::; t :::; b. Define the recursive relations (27a)
vp(t) =
0 0 Jait Tp(S) ds + cp,
(27b)
v~(t)
f: [T~(S)
o
=
-
v~- 1(s)] dx + c~
(k = 1,.. . ,n - 1),
where cg, ... , cp- I are real numbers. Let J(x, y) be the bilinear form asso› ciated with J(x), i.e., J(x) = J(x, x).
Theorem 1 Let J(x) be the quadratic form given by (22). There exists an arc x = (xl(t), ... ,xp(t))ind suchthatJ(x, y) = for ally = (Yl(t), ... ,Yp(t)) in ~(A) if and only if the constants c~, . . . , cp- l in (27) and constants 111" .. ,11m can be chosen such that the Euler constants Tp(t) = Vp-l(t)
(28)
([3 = 1,.. . ,p)
hold almost everywhere on a :::; t :::; A, and the transversality conditions
(29)
+
A:~x~k)(a)
l1yM~p
- v~(a)
=
hold at t = a.
The proof of this result follows in the expected way from the method of integration by parts or by the Riesz representation theorem for Hilbert spaces. We also note that (28) is the integrated form of the 2nth-order integral differential equation (if it exists) (30)
dn -
d"
I
dt n [Tp(t)] - dt n - l [Tp-l(t)]
+ ... + (-ltTg(t) =
0.
For Example 5 we consider linear elliptic partial differential equations of the form (31)
o
( m.2: -0.’ OQ.) = 0,
L(x) = -0. [Rij(t)xit)] - x(t) pet) -
t,
,= 1 t,
50
1 Introduction to Quadratic Forms and Differential Equations
where t=(tt.tz, ... ,tm) is in ~m; x(t) is a real-valued function; oxjotj is written as x j; P(t), Qi(t), and RJt) satisfy smoothness and symmetric prop› erties described in Chapter 5, repeated indices are summed, and i.] = 1, ... ,m. Once again Rij(t) = Rj;(t) satisfies the ellipticity (or Legendre) condition Rij(t)~;~j > 0 for all nonzero ~ = (~l'" ',~m) in ~m. The quadratic form of interest is
(32)
J(x)
=
ST{P(t)XZ(t) + [2Qi(t)Xi(t)]X(t) + Rij(t)xi(t)xit)}dt
with associated bilinear form
where T is an open bounded subset of ~m with "nice" boundary oT (see Chapter 5). As in our other examples (31) is the Euler-Lagrange equation for J(x) in (32). It is obtained by integration by parts or by a divergence theorem. This result is expressed in
Theorem 2 There exists a subset T 1 of T and a nontrivial solution x(t) of (31) vanishing on oT1 if and only if J(x, y) = 0 for all y(t) vanishing in the closure of T - T 1 Hestenes shows (in unpublished classroom notes) that the integration by parts involves a "divergence theorem" and/or an application of his theory of Q closure. This general integration by parts idea for this application may also be found in Dennemeyer [7]. As we have seen in Section 1.2, we may derive a more general Euler› Lagrange equation and then apply it to the special form ofquadratic functions. To avoid too many subscripts we change notation briefly to do this more general problem in two independent variables. Thus let
(34)
I(x) = SSTf(s,t, x, x., Xt) ds dt.
Proceeding as in Section 1.2, we let F(e) = I(x and form
~B [F(e)
- F(O)] =
~B
II [f(s, t, T
x
+ BY), where y(s, t) == 0 on aT
+ BY, X + By.,Xt + BYt) S
f(s,t,x,x.,xt)]dsdt
~ SST[fxY(s,t) + fxsYs(S,t) + fx,Yls,t)] ds dt = F’(O).
Quadratic Forms and Differential Equations
1.4
51
The approximation sign is the first-order Taylor series expansion in e evaluated at s = O. Using Green’s theorem, which states that ffT(Qs - Pt)dsdt = fT P ds + Qdt,
and setting P = - YIx, and Q = YIxs ’ we obtain
0
e
[0
0]
Qs - P, = os (yIxJ + ot (YIx,) = ysIxs + Ytfx,+ Y os (fxJ + at (fx,) . From Green’s theorem and the last calculation ff [YsIxs = -
:s
+ Ytfx, + Y(
(fxJ
+ :t (Ix.»] ds dt
fOT YIx,ds+ YIxs dt = 0
since y(s, t) == 0 on aT. Using this calculation and F’(O) we obtain F’(O)
=
ffT(fxY+ !xsYs+ Ix,Yt)dsdt
= ffTY[Ix-
(~(fxJ+
:t(fx,»)]dSdt.
Since F’(O) = 0 for all y(s, t) == 0 on aT we obtain
o as (IxJ -
(35)
0 ot (Ix) - Ix = 0,
which is the Euler-Lagrange equation of (34). Let us assume in fact that the above calculations hold for m ~ 2 inde› pendent variables so that (35) becomes
a
m
.L at. (fx.) -
(36)
l=
1
t; = O.
I
Applying this result to J(x) in (32) with I = P(t)x 2(t)
+ [2Qi(t)Xi(t)]X(t) + RijXi(t)Xj(t),
we obtain
o=
0
m
Oti [2Qi(t)X(t) + 2Rijxj(t)] - 2P(t)x(t) - 2Qi(t)Xi(t)
i~l
o. t,
= 2[a
[Ri~P)]
+
(.f,= a~i)X(t) 1
t,
which is the Euler-Lagrange equation (31).
- P(t)X(t)],
52
1 Introduction to Quadratic Forms and Differential Equations
Ifthe reader chooses not to accept (36) for m > 2, as we have not proven this result, he may use (35) to verify (31) with Tn = 2. The specific example of (32) we may think of in this connection is the Dirichlet integral (37) where T is a bounded open subset of [Rz with oT the boundary of T "suffi› ciently smooth" for our purposes. We may also remark that by introduction of a real parameter {l, we obtain an elegant theory of eigenvalues connected with the Dirichlet integral or more generally with J(x) in (32). The Euler-Lagrange equation of (34) is Laplace’s equation
oZx
(38)
L(x) = ~ ut 1
oZx
+ ---;-z = O. ut z
If [!g is the class of all functions x(t) vanishing on oT, then x is in [!gD if and only if it is harmonic on T. That is, x(t) satisfies (38). Dirichlet’s principle of uniqueness of boundary data implies that [!go = {O}, for x(t) == 0 on oT and x(t) satisfying (38) implies that x(t) == O. The reader may wish to consult a text, such as Stakgold [49J, and in particular Volume II, Chapter 6, to explore this important topic in more detail. For completeness we note that many of our comments in this specific example could follow from the first Green’s theorem .
r
JT
u VZvdt
= -
r (grad u• grad v)dt + JoT r u on OD ds,
JT
where VZv = (ozv/oti) + (OzD/Ot~), gradu = (ou/otdi + (ou/otz)j, and ou/on denotes the outward normal derivative of u on oT. Example 6 is a quadratic control problem, see Mikami [37]. The space of functions is the arcs
x: x(t),u(t)
(a~t~b),
where x(t) = (x’{z),XZ(t), . . . ,xn(t»)is a state vector, u(t) = (u1(t), UZ(t), . . . ,uq(t» is a control vector, x(t) and u(t) are square integrable on [a, b], and the asterisk denotes transpose. Our quadratic form is expressible as (39)
J(x) = b*Fb
+
i
tl
to
2w(t, x, u)dt,
where
2w(t, x, u) = x* P(t)x
+ x*Q(t)u + u*Q*(t)x + u*R(t)u,
1.4
’6’(A)(a
~
Quadratic Forms and Differential Equations
53
A ~ t) are subspaces of arcs satisfying the linear control equation
x = Ax + Bu
(40)
(a
~
t
~
A),
a linear constraint equation
Mx+ Nu =0
(41)
and the boundary conditions
(42a)
x(tO) = Cb,
(42b)
x(t) = 0,
u(t) = 0
(A ~ t ~ b).
The matrices A, P = P*, and Q are square integrable, B, M, N, and R = R* are essentially bounded and measurable and C, D, and F = F* are constant matrices. In addition, the matrix N is assumed to have the inverse of NN* existing and essentially bounded. Stated in another way, there exists a positive number h such that at almost all t on a ~ t ~ b,:n:* N(t)N*(t):n: ~ h:n:*:n: for every :n: in [Rm. In addition to the other generalities, we now will have an oscillation point or focal point theory by introduction of the spaces {’6’(A)}.If A == 0 and B is the identity matrix in (40), with (41) deleted, then u = x and this case reduces to the second-order system case in Example 1’. In the current case we have a concept of abnormality which did not exist in previous examples. That is, in Example 1’, if x(c) = x’(c) = 0 for a ~ c ~ b, where x(t) is a solution of the Euler-Lagrange equation, then x(t) == O. We shall see in Chapter 6 that nontrivial solutions of the Euler-Lagrange equations below are allowed to vanish on a subinterval of [a, b]. We will show that our signature theory in Chapter 3 makes sense, if we think of these intervals as points. Furthermore our approximation theory will also hold. The next theorem is found in Mikami [38, p. 475]. Its proof is extremely difficult. If the reader is uncomfortable with a collection {’6’(A)},the generic space ’6’(A) may be replaced by ’6’(b). :Yt’ is given in Chapter 6.
,
Theorem 3 Let x in :Yt’ with x(t) absolutely continuous on a ~ t ~ A. Then x is J orthogonal to ’6’(A)if and only if there exist an absolutely continuous vector p(t) (a ~ t ~ A) and a square integrable vector f1(t) (a ~ t ~ A) such that (43) (44)
jJ
+ A*p + M*f1 = B*p + N*f1 =
wx , Wu
and (45)
Fb - C*p(tO)
= O.
(to ~ t ~ A),
54
1 Introduction to Quadratic Forms and Differential Equations
Equations (43) and (44)together with (40)and (41) are the Euler-Lagrange equations, and Eq. (45) is the transversality condition. Example 7 is a singular differential equation and quadratic form. A theory has been developed for these types of problems by Stein [50]. However, in our sense this is an unfinished problem since we have no concept of an approximation theory. Thus we shall briefly describe these ideas and list several illustrative examples. As an example quadratic form we have (46)
lex) =
f 2w(t,x,x’)dt,
where (47)
2w(t,x,x’)= Rjk(t)x/ik + 2Qjk(t)Xj(t)ik(t) + Pjk(t)Xj(t)xk(t),
and Rjk(t) is continuous for a :::;; t :::;; b. However, the matrix (Rjk(t» satisfies det(Rjk(c» = 0, for a finite number of points c in [a, b]. Note that except for this difficulty we have the quadratic form lex) given in (11) above. For› mally, the Euler-Lagrange equation is expressed as in (13), that is it is a second-order linear system of the form d
(48)
k
dt(w",) =
k
Wx
The major examples are singular second-order linear homogeneous equa› tions. In this case Stein [50, p. 199] has given a canonical form (where we assume for convenience the singularity at t = 0): (49)
lex) =
f: [t 2Pr(t) .e(t) + 2(’+Pq(t)x(t)i(t) + t2 p(t)X <x
2(t)]
dt
with associated singular differential equation (49)’ [t 2Pr(t)x’(t) + t<X+Pq(t)x(t)]’ = t<X+Pq(t)x’(t) + t 2<x p(t)x(t). We assume that pet), q(t), and ret) are continuous on [0, b] and that p(O) and reO) are nonzero unless pet) and ret) are identically zero. To connect these ideas and other ideas in this work to problems in mathematical physics, and to concepts such as limit point, limit circle ideas we return to Example 0 with a = 2n and b = 00, that is let (50) If we consider the subspace of functions which vanish at t = 2n, then the solution space of the Euler-Lagrange equation x(t) with x(2n) = 0 is span {sin t} as above. lex) cannot be elliptic or Legendre. In fact, there exists an infinite-dimensional negative space of lex), or equivalently the
1.4 Quadratic Forms and Differential Equations
55
Euler-Lagrange solution x(t) = sin t vanishes infinitely often on the interval [2n, (0). The last sentence will become clearer in Section 3.1. This form with analytic coefficients is singular because of the fact that the upper limit is infinite. We now show (by two methods) that it is equivalent to a form given in (48). Our first method is to begin with the problem
(51)
x"(t) + x(t)
=0
(2n:::;; t < (0),
x(2n) = O.
Making a change of variables y(s) = x(t(s)), where t = l/s and dt = -1/s2 ds, we have, with primes denoting differentiation with respect to the appropriate variable,
dx ds x’(t) = ds dt = y’(s)(- S2), x"(t) = d(x’(t)) ds = _ [S2 y’(S)]’(_ S2) = S2[S2y’(S)]’. ds dt Thus (51) becomes
(52)
[S2 y’(S)]’ + (l/s 2)y(s) = 0,
y(I/2n) = 0
and the associated quadratic form is (53) which is ofthe form (48).The Euler-Lagrange solution to (52)is y(s) = sin(l/s) which oscillates infinitely often as s ~ 0 +. Furthermore, the quadratic form in (53) has an infinite-dimensional negative space (see Section 3.1). Our second method is to begin with the quadratic form (50); then the above change of variables lead to
with associated Euler-Lagrange equation [S2 y’(S)]’+ (1/s2)y(s) = O. Note that both methods lead to equivalent results. Note also that the self› adjoint form of the differential equation must be preserved. For example, in (52) we are tempted to multiply by S2 and obtain S2[S2 y’(S)]’+ y(s) = 0, but this is not the Euler-Lagrange equation ofthe correct quadratic form. Stein notes that many of the classical orthogonal polynomials and differ› ential equations are in form (48) except that the singular point(s) may not
56
1 Introduction to Quadratic Forms and Differential Equations
be at t = O. These include 1. Jacobi equation:
d dt [’y(t)(l - t)M+l(l + t)N+IJ + n(M + N + n + 1)(1- t)M(l + tty(t) = O. 2. Legendre equation: d dt [y(t)(l - t 2)] + n(n + l)y(t) = 0. 3. Laguerre equation:
4. Associated Legendre equation:
d [(1 - t 2)y(t)J + dt
(m - 1 _ t2 + l(l + 1) y(t) 2
)
= O.
5. Associated Laguerre equation: d [t 2y(t)J + ( dt
2
t + nt -4
1(1 + 1)) y(t) = O.
6. Chebyshev equation:
d
it [(1 -
t 2)1/2 y(t)J
n2
+ 1_
t2 y(t) = 0.
7. Bessel equation: d t 2 - n2 -d [ty(t)J + - - y(t) = O. t t Note that several of these equations are really eigenvalue problems of the type J(x) - J-tK(x), where the parameter J-t is a comparison parameter. This will be discussed in more detail in Section 3.3. For example, the Legendre equation is the Euler-Lagrange equation associated with the parametric quadratic form
J(x) - J-tK(x)
=
f
1
(1 - t 2)y’2(t)dt - J-t
r
1
y2(t)dt,
where the value u; = n(n + 1) for n = 0,1,2, ... is an eigenvalue of this equation. We will see (at least in the nonsingular case) that the eigensolutions J-tn = n(n + 1) with the associated eigenvector xn(t), is equivalent to PJ o = go (’) PJP = span{xn(t)}, where P(x) = J(x) - J-tK(x). Furthermore, PJ o =
1.4 Quadratic Forms and Differential Equations
57
!!J n !!JP = {O} if J1 is not an eigenvalue. Similarly, the associated Legendre equation is a double eigenvalue problem of the form J(x) - J1K 1(x) - eK 2(x) = f~
1
- J1
(1 - t 2 ) y
f2(t)
f
1
dt
y2(t)dt - s
f
1
C~
t 2)y2(t) dt,
where the values J1n = n(n + 1) for n = 0,1,2, ... and em = _m2 for m = 0, 1,2, ... , n is the double eigenvalue of this equation. The same comments hold for!!J o =!!J n !!JP with P(x) = J(x) - J1K 1(x) - eK 2(x). We shall show in Section 3.4 that (in the nonsingular case) a numerical approximation theory can be given that allows us to approximate qualita› tively and compute single and double eigenvalues. Soon, we hope to be able to do these tasks for singular problems.
Chapter 2
Abstract Theory
2.0 Introduction
The purpose of this chapter is to present to the reader theoretical ideas and results necessary for subsequent chapters. The reader may wish to skip or skim these topics at first reading and return to them at a later time. The examples in Section 1.4 and 3.1 should clarify and motivate the theoretical ideas of this chapter. Section 2.1 gives the basic Hilbert space theory required for this text. This material includes weak and strong convergence, linear operators, and qua› dratic forms. Much of this material will be new even to those people with a strong background in Hilbert space theory. Theorems 1-4 summarize the new directions taken by Hestenes, which allow us to attack the problems of subsequent chapters. Section 2.2 contains part of a very elaborate and beautiful theory of quadratic forms by Professor Magnus Hestenes. Much of this material is more extensive than we shall require. The reader should make an effort to understand Theorems 12 and 14 in this section. These theorems deal with the signature and nullity of a quadratic form and are very important to our efforts. Section 2.3 is a fundamental theoretical section. It formed the basis of the author’s Ph.D. thesis under Hestenes. The signature theory of a quadratic form J(x) on a Hilbert space d given by Hestenes in Sections 2.1 and 2.2 is extended to an approximation theory of quadratic forms J(x; 0) defined on d(O"). The major result is (5) of Theorem 6, which is a collection of in› equalities relating the approximating signature s(O") and nullity n(O") of J(x; 0") on d(O") to the fixed values s(O"o) and 11(0"0)’ The purpose of much of this text is to extend and apply these inequalities to problems of differential equations. 58
2.1 Hilbert Space Theory
2.1
59
Hilbert Space Theory
The purpose of this section is to state very briefly the basic theory of Hilbert spaces, strong and weak convergence, and linear operators and qua› dratic forms needed for this book. As the reader may appreciate, a thorough discussion of this material is not possible here. There are many excellent references, such as Halmos [24], Helmberg [26], and Stakgold [49]. In partic› ular, Hestenes [27] contains much of the material on quadratic forms given here that we shall need in this book. This reference, which may be difficult for the reader, is somewhat unique in its presentation of material and success in solving problems. We assume the definitions of vector space, subspace, basis, and direct sum. The fundamental (real) vector space is denoted by d ; subspaces by ~, "6’, ~, ... ; elements of d by the letters x, y, z, ... ; complex scalars by a, b, c, .... The inner product is denoted by (x, y). The norm of x is defined by Ilxll = (x, X)l/Z. We assume throughout that our scalar field is real for convenience and applications. The complex scalar field holds equally well in theory. A sequence {xq } in d converges strongly to xo, written xq = xo, if limq=oollxq - xoll---+ O. A sequence {xq } in d converges weakly to xo, written xq ---+ X o, if limq=oo(:xq - xo,y) = 0 for all yin d. The subspace ~ of dis closed if it is closed relative to strong convergence. We sometime will use the words "manifold" or "linear" in place of subspace. d is complete if every (strong) Cauchy sequence has a limit in d.We shall assume that d is a (real) Hilbert space, that is, a complete inner product space with norm Ilxll. The sequence {xq } in sf is bounded if {Ixql} is a bounded sequence of real numbers. We remark that our inner product is not the usual L, inner product (1)
(x,y) =
S: x(t)y(t)dt,
but depends upon the order of differentiation of the problem under con› sideration. Thus, for example, for second-order differential systems we re› quire an inner product given by (21) in Section 1.4. When n = 1 (second-order case) our inner product is given by
(2)
(x, y) = x(a)y(a) +
f x’(t)y’(t)dt.
We then require a space of functions which will be complete relative to the norm Ilxll = (x, X)l/Z of (2). Some useful results from the above definitions are: strong convergence implies weak convergence; closed subspaces of complete spaces are complete; if ~ is a closed subspace of d such that every weakly convergent sequence is strongly convergent, then ~ has finite dimension; finite-dimensional sub› spaces are closed; finally, xq ---+ Xo if and only if {xq } is a bounded sequence and limq=xJx, xq ) converges for all x in a dense subset of d.
60
2 Abstract Theory
A real-valued function f(x) is continuous on si if xq= X o implies f(xq) --+ f(xo); f(x) is weakly continuous on si if xq --+ X o implies f(xq) --+ f(xo); f(x) is weakly lower semicontinuous (WLSq on si if xq --+ Xo implies lim infq= f(xq) Z f(xo); f(x) is linear on.# if f(ax + by) = cif(x) + bf(y). A continuous linear function is said to be a linear form or linear functional. A function Ton si to si is linear if T(ax + by) = aTx + bTy; T is continuous if xq = X o implies TXq Txo; T is a linear operator (linear transformation) if it is linear and continuous; T is compact (or completely continuous) if xq --+ X o implies TXq --+ Txo; T is bounded if there exists a positive number M such that IITxl1 ~ Mlixli holds on si. The smallest of these bounds M is the norm of T and is denoted by II Til. Some useful results from the above definitions are: T is compact implies T is continuous; if T is linear, then it is compact if and only if every bounded sequence {xq} of si has a strongly convergent subsequence, that is, IIxqll ~ M implies there exists a subsequence {xqJ such that TXqn= y in si. The real-valued function Q(x, y) on si x si is a bilinear form if, for each yin si, Q(x, y) and Q(y, x) are linear forms in x. Ifxq --+ Xo and yq --+ Yoimply Q(xq,yq) --+ Q(xo,Yo), then Q(x,y) is compact. If Q(x,y) = Q(y,x), then Q(x) = Q(x, x) is the quadratic form associated with the bilinear form Q(x, y). Q(x) is positive (negative, nonpositioe, nonnegative) on si if Q(x) > 0 [Q(x) < 0, Q(x) ~ 0, Q(x) Z OJ for x =1= in si. Q(x) is positive definite on si if there exists a positive number h such that Q(x) z hllxl1 2 on si. Q(x) is compact if xq --+ X o implies Q(xq ) --+ Q(xo). Q(x) is WLSC if xq --+ X o implies lim infq = 00 Q(xq ) z Q(xo). The result that Q(x) is compact if and only if Q(x, y) is compact is im› mediate by the identity 2Q(x, y) = Q(x + y) - Q(x) - Q(y). Furthermore if Q(x) is positive definite, Illxlll= Q(X)1/2 yields an equivalent strong topology. That is there exists positive constants m and M such that ’L
=
for x in si. mllxll ~ Illxlll s Mllxll We note the study of quadratic forms is equivalent to the study of self› adjoint linear transformations and in particular that the study of compact quadratic forms is equivalent to the study of compact, self-adjoint, linear transformations. Thus Q(x, y) is a bilinear form on si if and only if there exists a linear transformation Ton si such that Q(x, y) = (Tx, y) for all x and yin si. If Q is quadratic, then T is self-adjoint. T is a compact linear trans› formation if and only if Q is a compact quadratic form. The identification between Q and T follows from the Riesz Representation Theorem which states that if f is a linear functional on si, there exists a unique vector y in s1 such that f(x) = (x,y) on s1 with Ilfll = Ilyli. Thus if z is fixed, Q(x, z) is a linear functional on si and hence Q(x, z) = (x, y). Letting y = Tz gives the above results.
2.1 Hilbert Space Theory
61
Let Q(x,y) be a bilinear form and Q(x) be the associated quadratic form. Two vectors x and yare Q orthogonal if Q(x, y) = O. The vector x is Q orthog› onal to f!J if y in f!J implies Q(x, y) = O. The set of all vectors Q orthogonal to f!J is the Q-orthogonal complement,denoted by f!J Q f!J and «5 are Q orthogonal if each x in f!J is Q orthogonal to «5. A vector x is a Q null vector of f!J if x in !JI and x is Q orthogonal to f!J. !JI 0 will denote the set of Q null vectors of f!J. The signature (index) of Q(x) on f!J is the dimension of a maximal, linear subclass «5 of f!J on which Q(x) is negative. The nullity of Q(x) on f!J is the dimension of f!J (\ f!J Q Finally J(x) is an elliptic form on sf if J(x) is WLSC on .91, and xq => X o whenever xq -+ X o and J(x q ) --+ J(x o). Several equivalent definitions of an elliptic form are given below. The above definition was chosen for many reasons: It was the original one given by Hestenes; it is the most useful in the development of the theory of quadratic forms; it most closely corresponds to the applications considered by Hestenes. The con› dition of WLSC fixes the sign of J(x); without it, either J(x) or - J(x) is elliptic. We note the following results for elliptic forms, which show they are "almost" positive definite. A quadratic form J(x) is elliptic on .91 if and only if there exists a finite-dimensional subspace f!J of .91 such that J(x) is positive definite on the orthogonal complement of f!J. A quadratic form J(x) is elliptic on .91 if and only if there exists a positive definite form P(x) and a compact form K(x) such that J(x) = P(x) - K(x). This result is Garding’s inequality for elliptic partial differential equations. Furthermore K(x) can be chosen nonnegative on d. A positive elliptic form is positive definite. We have noted above that for a quadratic form problem such as (3)
J(x)
=
Io! [p(t)X
’ 2(t)
- q(t)x 2(t)] dt,
p(t) > 0
associated with the differential equation L(x) = [p(t)x’(t)J’ + q(t)x(t) = 0, we use the inner product (2) and not (I). In the above J(x) is elliptic [relative to (2)], the integral of X ’ 2 (t) is elliptic and the integral of x 2 (t) is compact. Using classical methods with Au = - u", the "operator" A is not defined on all of L 2[0, I], and hence not bounded or compact. Classical methods require that we find the inverse of A using Green’s function, which is compact. See for example Hochstadt [31, p. 65]. Our methods avoid these classical methods and problems, which become very difficult for higher-order problems. The reader may wish to consult a text such as Goldberg [11] to see the sophisti› cated machinery needed by classical methods. Even then they do not seem to obtain the results of Hestenes and his students. Theorem I is a very beautiful and remarkable theorem. It is given in Hestenes [27, pp. 543-544]. Combining this result with Theorem 3 we see that, in some sense, elliptic forms can be pictured as having only a finite
62
2 Abstract Theory
number of negative and zero eigenvalues so that many of the ideas in Section 1.1 are applicable. We often do not really care about the positive eigenvalues since Q(x) is positive on d+ and hence equivalent to the inner product on d +. Theorem 1 Let Q(x) be a quadratic form on d. Then there exists three subspaces d_, do, d+ such that (a) d = d_ EB do EB d+ (the direct sum); (b) the three subspaces are mutually orthogonal and Q orthogonal; (c) Q(x) is negative on d _, zero on do, and positive on d. Iff!4 is a closed subspace of d (and hence complete), it may be decomposed as above. In this case the signature and nullity of Q(x) on f!4 are the dimensions of f!J _ and f!4 o, respectively. Theorem 3 also follows with f!4 replacing d in this case.
Theorem 2 A quadratic form Q(x) is WLSC on d if and only if there exists P(x) nonnegative and K(x) compact such that Q(x) = P(x) - K(x). In fact K(x) can be chosen nonnegative. Theorem 3 If J(x) is elliptic on d, then the spaces d_ and do above are finite dimensional. Their respective dimensions are characterized by Theorems 12 and 14 of Section 2.2. Finally, we state a fundamental theorem for our study of eigenvalues of compact (operators) quadratic forms. This theorem is given in Hestenes [27, p. 561]. Theorem 4 Let J(x) be elliptic and K(x) be compact on d, respectively, such that K(x) ~ 0, x :I: implies l(x) > 0. Then there exists a real number b such that lex) + bK(x) is positive definite on d. If in addition K(x) ~ when› ever lex) ~ 0, then b may be chosen to be positive.
2.2 Further Ideas of Hestenes The purpose of this section is to present unpublished parts of an elegant theory of quadratic forms due to Hestenes. Some of this work is found throughout [27], but often in different form. Our intent is to illustrate both the ideas of Hestenes and some theoretical directions in which this work can go. Because of practical limitations we shall often summarize this unpublished work and present few examples or other forms of aid to the reader. Similarly there will often be omission of proofs and intermediate results. This regretful limitation imposes a hardship on the reader and on the clarity of the ideas of Hestenes. Except for minor instances the text is as
2.2 Further Ideas of Hestenes
63
given by Hestenes. This section could (should) be omitted on a first reading and optional for many readers of this book. However, Theorems 12 and 14 should be understood by all readers. We are interested in three related concepts in this section, which we designate as Q closure, Q-linear functionals, and index theory of a quadratic form Q(x) on a vector space d. The first concept is somewhat similar to a projection theory of linear, self-adjoint operators and involves relationships Q Q (for example) ofthe subspaces f!4, f!4 (defined below), f!4 o = f!4 !l f!4 , (f!4 o)Q, Q and (f!4 )o of d. The second concept involves a linear functional L(x) on d and is primarily concerned with problems of representing L(x) on d or a subspace of d, in the form Q(z, x) where z is fixed. This representation leads to the Euler-Lagrange equations and a multiplier rule. The last concept allows us to count the negative space and the null space of Q(x) on subspaces f!4 of d and to relate these indices to linear functionals L(x). We begin with several definitions. Let Q(x) be a real quadratic form defined on a Hilbert space d. Two vectors x and y in d will be said to be Q orthogonal if Q(x, y) = O. Let [JO be a linear manifold in d. By the Q› orthogonal completment,[JOQ of f!4, will be meant the class of all vectors z in d that are Q orthogonal to every vector y in f!4. A vector x in f!4 0 = f!4 !l [JOQ will be called a Q null vector in f!4. We have Q(x) = Q(x,x) = 0 on f!4 o. It should be observed that do = d !l d Q = d Q Observe also that if!!fi is the Q sum !!fi = f!4 + ’ll of two linear manifolds f!4 and s’, then !!fiQ = f!4 !l CQ. In order to illustrate these ideas consider the real two-dimensional space of points z = (x, y) with Q(z) = x 2 - y2, Q(Zl’ Z2) = XIX2 - YIY2’Iff!4 is the Q line Y = mx, then f!4 is the line my = x. In this event [JOo = f!4 r, [JOQ = 0 2 2 unless m = 1. If m = 1, then f!4 0 = f!4 = f!4 Q Observe that if [JOo = 0 (i.e., m2 i= 1), then d is the direct sum d = f!4 + f!4Q. If f!4, ’ll are distinct lines Q through the origin such that f!4 !l ’ll = 0, then d = f!4 + ’llQ = ’ll + f!4 , as one readily verifies. A generalization of this result is given in the following lemma. The proof is not easy and appears in Hestenes’s unpublished work. Lemma 1 Let [JO be a finite-dimensional subspace of d. Then m = Q dim f!4 - dim(f!4 !l do) is the codimension of f!4 , that is, m is the dimension Q of a maximal subspace ’ll such that C !l f!4 = O. Let ’ll be a subspace of dimension r. If r > m, then’ll !l f!4Q i= O. If f(} !l f!4Q = 0, then r :-:::; m,
(1)
d
= [JO + ’llQ
and
If’ll !l [JOQ = 0 and r = m, then
If (2) holds, then r = m. There exists a linear manifold’ll such that (2) holds.
64
2 Abstract Theory
In Lemma 1 we note that if Q is positive definite, we have the usual results as if Q were an inner product. To illustrate the depth and flavor of these ideas we have included Hestenes’s proof of Theorem 2. Thus Theorem 2 Let Q be a quadratic functional on a linear spaced and let flJ be a finite-dimensional subspace of d. If flJo = gJ n flJQ = 0, then
(3) In general we have
(4) Moreover, (5)
gJQQ
= fJB + do, flJoQQ = gJQ o = fJB o + do,
where flJoQQ = [(flJo)Q]Q and flJQo = (flJQ)o = flJQ n flJQQ. If flJo = 0, then flJoQ = .91, consequently (3) is a special case of (4). We shall establish (4) under the assumption that flJ -=I O. The proof for the case in which flJ0 = 0 is obtained by omitting appropriate remarks concerning flJo› Let Xl’ .. , Xm be a basis for flJo and choose Yt> ... , Yq in flJ such that Xl’ ... , X n Yl’ ... ,Y q forms a basis for flJ. The determinant
(6)
(rY.,p = 1,... ,q).
Ifthis were not so, the equations Q(y~, Yp)b p = 0 (« = 1, ... , q) would have solutions bI> ... , bq not all zero. The vector Y = Ypb p would be a vector in gJ such that X" = 0 (0’ = 1, ... ,m), Q(y~, Y) = 0 (rY. = 1, ... , q), the first equa› tion holding since x; is in flJo- It follows that Y would be in flJ0, contrary to our choice of our bases for flJ. Consequently (6) holds. Consider now a vector X in flJoQ, that is, a vector x such that Q(x", x) = 0 (0" = 1, , m). Select scalars bI>"" bq such that Q(Y~,x) = Q(Y~,Yp)bp («, P = 1, , q). The vector z = x - Y with Y = Ypb p satisfies the relations Q(Y~, z) = Q(Y~, x) - Q(Y~, Yp)b p = 0 (rY. = 1, ... , q). Since Y is in flJ, we have Q(x",y) = 0 and hence Q(x",z) = Q(x",x) - Q(x",y) = 0 (0’ = 1, ... , m). It follow that z is Q orthogonal to a basis for flJ and hence is in flJQ. Hence x is expressible in the form Y + z with Y in fJB and fJB Q. Consequently (4) holds, as was to be proved. In order to prove relations (5) observe first that whenever gJoQ = fJB + fJBQ we have (7)
fJB 0 QQ = flJQ n flJQQ
= fJBQ 0
Since flJ0 c flJQ, we also have flJQQ c flJ0 Q = flJ + flJQ. Inasmuch as flJ c flJQQ, the relation
(8)
2.2 Further Ideas of Hestenes
65
holds. Since PJ + PJ o = PJ, the relations (5) will follow from (8) and (7) when we have established the relation (9)
In order to prove (9) observe that 880 is of finite dimension, since 88 is. By Lemma 1 with 88 replaced by PJ 0 there is a subspace Cfj’ such that
d = PJ o +Cfj’Q = Cfj’+ PJoQ,
(10)
Cfj’nPJoQ=O.
Consequently, by (7) do = Cfj’Q n PJoQQ = Cfj’Q n PJoQ. Since PJQ o :::J 880 , we have PJQ o = PJQ o n (880 + Cfj’Q) = 880 + Cfj’Q n 88 0 Q = 880 + do. This com› pletes the proof of Theorem 2. To anticipate the next fundamental definition we set (Lla) \
Cfj’ = PJQ,
PJ o
=
PJ
n
PJQ
= 88 n
ce,
It is easily verified that
(lIb) (11 c)
:!J c ceQ,
:!JQ
= PJQ =
88oQ :::J :!J oQ = ceoQ:::J
-e,
880 c Cfj’o =
s.,
ce + ’{jQ:::J:!J +:!J Q =:!J + PJQ.
The manifold PJ will be said to be Q closed relative to d or simply Q closed in case (12)
(PJo)Q
=
(PJ n PJQ)Q
=
PJ
+ PJQ.
IfPJ is Q closed, then by (11), Cfj’oQ = Cfj’ + Cfj’Q, :!J oQ = PJ + 88 Q. Consequently, if PJ is Q closed, so are Cfj’ = 88Q and:!J = PJ + Cfj’0 = PJ + 88Q o . Moreover, as was seen in the last paragraph, (13)
PJoQQ
= PJQo,
PJQQ
= PJ + PJQo.
Conversely, if Cfj’ = PJQ is Q closed and (13) holds, then 88 is Q closed. This follows because 880 Q = 880 QQQ = ’(j0 Q = Cfj’ + Cfj’Q = 88Q + PJQQ = PJQ + 88 + 88Q o = PJ + 88Q. Here we have used the identity .@QQQ = f!), which holds for any subspace f!). It should be noted that if PJQQ = PJ + do, then Cfj’0 =
88Qo = PJQ n PJQQ
= PJQ n
(88
+ do)
=
880
+ do,
Cfj’oQ
= 88 oQ .
If in addition Cfj’ = PJQ is Q closed, we have 880 Q = ’(joQ = ’(j + ceQ = PJQ + PJQQ = PJQ + 88 + do = 88 + 88Q. Consequently PJ is Q closed in this event. The results obtained above are summarized in the following theorem. Theorem 3 Let PJ be a subspace of d. If 88 is Q closed, then 88Q and are Q closed and (13) holds. If either PJQ or PJQQ is Q closed and (13) holds, then PJ is Q closed. If PJ is Q closed and
88 QQ (14)
PJoQQ
= 880 + s;/ 0
66
2 Abstract Theory
as is the case when dim flJo < (15)
flJQQ
= flJ + do,
00,
then flJQ and flJQQ are Q closed and flJQo
= flJQ n flJQQ = flJo + do.
If either flJQ or flJQQ is Q closed and the first relation in (15) holds, then flJ is = g n do,
Q closed and (14) holds. Ifg is a linear manifold such that flJ n do then flJ is Q closed ifand only if g is Q closed.
The last statement follows from the relations flJoQ = g oQ, flJQ g + gQ = flJ + flJQ, and the definition of Q orthogonality. We have the following dual of Theorem 2.
= gQ,
Theorem 4 Suppose the Q-orthogonal complement ;!4Q ofa linear manifold PJ is offinite dimension. Then flJ is Q closed ifand only if
(16)-
codimflJ = dimPJQ - dimPJ n do
where codim flJ is the codimension of flJ.
In order to illustrate the significance of these ideas Hestenes gives the following interesting example. Let T be the unit disk u 2 + v2 < 1 in the real uv space. Let d be the class of all real-valued continuous functions x: x(u, v) (u Z + v2 ~ 1) having continuous derivatives xu, x; on T and having a finite Dirichlet integral D(x) = h {x~ + x;} du dv. Then D(x, y) = Sd xuYu + xvvv}du dv. Let flJ be all x in d having x(u, v) = 0 on the boundary aT of T. By the use of the diver› gence theorem it can be shown that z is in flJD if and only if it satisfies the Euler equations ~z = Zuu + Zvv = 0 for the integral D. Hence ;!4D consists of all functions z in d that are harmonic on T. Given a function x in d, the function z defined by the Poisson’s integral formula . 8) 1 8,rSIn ( zrcos =-2 11:
Itt -tt 1 -
2
1- r
rcos
(8
Z
-
tj»
+ r zx(costj>,sllltj»dtj>
is a function in f1$D having the same boundary values at x. Hence y = x - z is in flJ and d = flJ + f1$D. Thus flJ is D closed. The class do consists of all x in sl such that D(x) = 0 and hence of all x such that x(u, v) is constant on T. Hence flJ n do = O. Similarly D(x) = 0 on f1$0 and hence f1$0 = 0 also. Since f1$o = 0 and d = flJ + flJD, f1$DD = f1$ + do, (f1$D)O = do. In view of this example, the concept of Q closure is closely connected with Dirichlet’sprinciple. This example can be generalized in nontrivial ways. The second concept of Hestenes is to restrict the ideas of a real linear function L: d --+ [Rl. This restriction depends on the concept of Q closure and leads to a representation of L(x) by a bilinear form Q(z, x) with z fixed. This in turn leads to integration by parts and the Euler-Lagrange equations. Thus a linear functional L on d will be said to be a Q linear functional on
2.2 Further Ideas of Hestenes
67
d if the class 8l of all x in d having L(x) = 0 is Q closed, that is, (~ n ~Q)Q = 8l + 8l Q It is clear that this concept depends not only on the quadratic form Q but also on the vector space d. A characterization of Q-closed subspaces is given in the following theorem. Once again we give Hestenes proof for flavor.
Theorem 5 Let L be a linear functional on d. If there exists a vector y in do = d Q such thafL(y) =I- 0, then L is a Q-linear functional on d. If L(x) = 0 for all x in do, then L is a Q-linear functional on d if and only if there is a vector X o in d such that (17)
L(x) = Q(xo, x)
for all x in d.
Let ~ be the class of all x in d such that L(x) = O. For the first statement if there is a vector y in do such that L(y) =I- 0, then d = ~ + span{ y}. Since y is in d Q and d Q c ~Q, it follows that d = ~ + ~Q and hence that ~ is Q closed. Consequently L is a Q-linear functional on d. Suppose next that L is expressible in the form (17). Then f!8 is the Q› orthogonal complement of the one-dimensional space C(j generated by Xo’ Since C(j is Q closed, it follows that f!8 = C(jQ is Q closed and that L is a Q-linear functional on d. Suppose finally that L(x) = 0 on do and that L is a Q-linear functional on d. Then ~ is Q closed and f!8 0 ::l do. If L(x) = 0 on d, then f!8 = d and (17)holds with Xo = O. Suppose therefore that L(x) =1= 0 on d. If~ 0 =I- do, let z be a vector in f!8 0 that is not in do. If 8l 0 = do, then d = ~ + 8lQ and there is a vector z in f!8Q which is not in do. In either case the linear functional M(x) = Q(z, x) has the property that M(x) = 0 whenever L(x) = O. There is accordingly a constant c such that M(x) = Q(z, x) = cL(x) for all x in d. Ifc = 0 then Q(z, x) = 0 for all x in d and z would be do, which is not the case. Hence with X o = zlc we have L(x) = (1/c)Q(z, x) = Q(x o, x) and (17)holds. This proves Theorem 5. For interest and completeness we give four further theorems without proofs. A collection oflinear functionals {L l , L z, . . . ,Lm} are linearly independent on d if aaLa(x) = 0 holds on d implies al = az = ... = am = O. Theorem 6 A subspace ~ in d is Q closed if and only if every Q-linear function L on d that vanishes on ~o + do has the property that there is a vector Yoin f!8 such that (18)
L(x) = Q(yo, x)
for all x in 8l and hence for all x in f!8 + do.
68
2 Abstract Theory
Theorem 7 Let L I , . . . .L; be linear functionals that are linearly in› dependent on do and let M t, ... ,Mn be Q-linear functionals that vanish on do. The class g ofall vectors x in d such that
(19)
La(x) = 0,
Mp(x) = 0
(0: = 1,... ,r; p = 1,... ,n)
is Q closed. A vector y in d is in gQ b l , . . . , b; such that
if and only if there exist multipliers
(20)
for all x in d. If M I ’. . . , M n are linearly independent on d then (21)
dim gQ = n
+ dim do = r + n + dim(g n
do).
Theorem 8 Let L t , ... ,Lm be m linear functionals such that every linear combination aaLa is a Q-linear functional. The class g ofall y in d such that (22)
(IX = 1, ... ,m)
is Q closed. An arc x in d is in gQ if and only if there exist multipliers A.l, ... , I’m such that Q(x,y) = A.aLa(y)for all y in d. If L I , . . . ,Lm are linearly indepen› dent, these multiplers are unique and dim gQ = m + dim(g n do). Theorem 9 Let!J8 be a Q-closed subspace in d such that dim f!JQ < 00. Let L be a linear functional on d such that L(x) = 0 on do. Then L is a Q-linear functional on d if and only if there is an element Xo in s;/ such that
(23)
L(y) = Q(xo, y)
for all y in !J8. Our third concept is that of signature, nullity, and relative nullity. The first two indices playa central part in this book. The latter appears oc› casionally, such as in abnormal problems of control theory. Thus let Q be a quadratic functional relative to d and let f!J be a subspace of d. The nullity n(!J8) of Q on !J8 or the Q nullity of f!J will mean the dimension of the class !J8 0 = !J8 n !J8Q of Q null vectors of !J8. If n(!J8) = 0, we say that Q is nondeqenerate on f!J. We shall be concerned mainly with the cases in which n(f!J) is finite. For example, as we have noted in Section 1.3, for normal differential equations on bounded intervals [a, b], the nullity of the classes considered were all finite. The relative Q nullity rn(!J8) of Q on !J8 relative to .sJ1 or, more simply, the relative Q nullity of !J8 will mean the dimension of the minimal linear Q. manifold g in f!J o such that f!J o = g + !J8 n do. Clearly, rn(f!J) = codim f!J o Moreover if n(f!J) is finite, n(f!J) = rn(f!J) + dim(f!J n do). Again we shall be primarily interested in the case in which rn(!J8) is finite. If!J8 is a subspace of a second subspace ((5, we define the relative Q nullity rn(!J8, ((5) of f!J relative
2.2 Further Ideas of Hestenes
69
to ce to be the dimension of the minimal subspace <S of fJ8 0 such that flJo = <S + fJ8 n ceo. Since flJ n ce ::::J fJ8 n do it follows that rn(flJ, ce) ~ rn{fJ8, d) = rn{fJ8). Moreover, rn{flJ,fJ8) = O. Suppose that ce and ::2 are maximal subspaces of fJ8 on which Q < O. It will be shown below that if dim ce < 00 then dim::2 = dim ceo This dimen› sion will be called the signature s(fJ8) of Q on fJ8 or, more simply, the signature of fJ8 or the Q signature of fJ8. Occasionally we shall refer to s(fJ8) as the nega› tive signature of fJ8. The positive signature of Q on fJ8 is of course the negative signature of - Q on fJ8. Again in our applications we shall be concerned mainly in the case in which the Q signature of flJ is finite. In order to illustrate these ideas let d be the real four-dimensional space of vectors x = (x 1 , X Z , X 3 , X 4 ) . Let Q be defined by the bilinear form Q(x,y) = 4X1y l - x 3 y3 - X 4 y4. Let fJ8 be the subspace generated by Yl = (0,1,0,0)\ Yz = (1,0,2,0)\ and Y3 = {O, 0, 0, W. Here do = fJ8 n do = span{yd and fJ8 0 = span{Yl’YZ}’The class <S such that fJ8 0 = <S + fJ8 n .rfo above may be generated by Yz. The class ce = span{Y3} is a maximal sub› space in fJ8 on which Q < O. Hence n(d) = 1, n{fJ8) = 2, rn{fJ8) = 1, s{fJ8) = 1, and s(d) = 2. The class fJ8Q = span{YI,Yz}. Hence dim«flJQ)o) = n(fJ8 Q) = 2, rn(fJ8 Q) = 1, s(fJ8 Q) = O. Lemma 10 Assume Q{x) ~ 0 on fJ8. If x is in fJ8, then x is in fJ8 0 only if Q(x) = O.
if and
Clearly if x is in fJJ 0 = fJ8 n fJ8 Q, then Q(x) = Q(x, x) = O. Conversely if Y in f14, then f(t) = Q(tx + y) = t1Q{x) + 2tQ(x, y) + Q(y) = 2at + b ~ 0 if Q(x) = 0, implies a = Q(x, y) = O. Lemma 11 Let ce and ::2 be maximal linear manifolds in fJJ on which Q < O. If dim s’ < 00, then dimf2 = dim-e’, Moreover, Q(z) ~ 0 for all z in ceQ n f14, the equality holding if and only ifz is in fJ8 oWe shall first prove that Q ~ 0 on ceQ n fJ8. Suppose, to the contrary, Q(z) < 0 for a vector z in fJ8 that is Q orthogonal to ceo If Y is in ce, then Q(y,z) = 0 and Q(y + az) = Q(y) + aZQ(z) < 0, unless y = 0 and a = O. Hence Q < 0 on ce + span {z}, contrary to the maximality of ceo It follows that Q ~ 0 on f14 n ceQ. Since dim (e < 00, (e is Q closed and fJ8 = ce + ce n ,qj. Ifz is in <S = ceQ n fJ8 and Q{z) = 0, then Q is in <So by Lemma 10. It follows that z is Q orthogonal to <S as well as ce and hence to fJ8 = ce + <S. Conse› quently z is in fJ8 o. Conversely, if z is in flJo , then Q(z) = 0 and z is in C Q n flJ. This proves the last conclusion in the theorem. Suppose next that dim f2 > dim ceo Then by Lemma 1, there is a vector z # 0 in f2 that is Q orthogonal to ceo Since Q(z) < 0, this is impossible since Q ~ 0 on ceQ n f14. It follows that dim f2 s dim ceo By symmetry dim ce s dim::2. Hence dim ce = dim::2, as was to be proved.
70
2
Abstract Theory
Theorem 12
The Q signature of PJ, if finite, is given by each of the fol›
lowing quantities. The dimension s of the maximal subspace ce in PJ on which Q < O. The dimension r of the maximal subspace § of PJ on which Q ::;; 0 and having § n PJ0 = O. (c) The dimension m of the minimal subspace ’?2 of PJ such that Q ~ 0 on PJ n ceQ. (d) The least integer k such that there exist k linear functional L 1 , . . . ,LK on fJ8 such that Q ~ 0 for all x in PJ having LaCx:) = 0 (IX = 1, ... ,k). (a) (b)
In the proof we assume that d = PJ. The number s given in (a) is, by definition, the signature of fJ8. Clearly r ~ s, where r is given by (b). Hestenes shows that the manifold § can be replaced by a manifold §’ of the same dimension r on which Q < O. It follows that r = s. Turn next to the index m defined by (c). If m < s there is a vector x =1= 0 in ((j belonging to §Q. Since Q(x) < 0 and Q ~ 0 on §Q, this is impossible, hence m ~ s. If§ n fJ8 0 =1= 0, we can express § in the form § = g + § n PJ0 with dim g < m and Q ~ 0 on gQ = §Q, which is also impossible, Suppose next there is a vector z in § such that Q(z) > O. Let §’ be all vectors in § that are Q orthogonal to z. Then g;Q = §Q + span{z}. If x is in g;Q then x = y + az with y in §Q. Since Z IS in § we have Q(y, z) = 0 and Q(x) = Q(y) + a 2Q(z) ~ O. This too is impossible by virtue of the minimality of §. Hence Q ::;; 0 on § and § n PJo = O. Consequently m s; r = s by (b) so that m = s, as was to be proved. In order to prove (d) let Yb ... , Yr be a basis for ((j. Then Q(x) ~ 0 for all x in ((jQ, that is, for all x such that La(x) = Q(Ya,x) = 0 (IX = 1, ,s); hence k s; s. Suppose next that there exists linear functional L 1 , ,Lk with k < S having the properties described in (d). Then there would exist a vector z =1= 0 in ((j such that La(z) = 0 (IX = 1, ... , k) and Q(z) < 0, which is impossible. Consequently k = s. This proves Theorem 12. Corollary 13 If Ll> ... , L k are linear functionals on PJ such that Q ~ 0 for all x in PJ having La(x) = 0 (IX = 1, ... , k), then s(PJ)::;; k. If § is a subspace of PJ such that Q ~ 0 on §Q n PJ, then s(PJ) ::;; dim §.
As a further result we have Theorem 14 Let PJ be a subspace ofd such that m(fJ8) = s(fJ8) + n(fJ8) < Then m(fJ8) is the dimension of a maximal subspace § of fJ8 on which Q ::;; O. Moreover, m(fJ8) is equal to the least integer k such that there exist k linear functionals L b . . . .L; with the property that Q(x) > 0 for all x =1= 0 in PJ satisfies the relations La(x) = 0 (IX = 1, ... , k). 00.
In the proofit is sufficient to consider the case PJ = d. Let ce be a maximal linear subspace of PJ = d on which Q < O. Then dim ((j = s(PJ) and Q(x) ~ 0
2.2 Further Ideas of Hestenes
71
for all x in ((jQ, the equality holding if and only if x is in f!4 0 = do. Moreover, Q :s; 0 in !!2 = tt’ + f!lo and dim !!2 = m(f!4). Let 8 be a maximal subspace of f!4 on which Q :s; O. Then 8 ::") f!4 0 = do and we can select a linear manifold ff in 8 such that 8 = ff + f!4 0 and ff r, f!4 0 = O. Since Q :s; 0 on ff, it follows from Theorem 12 that dimff:S; s(f!l). If dim ff < s(f!l), there is a vector z in tt’ which is Q orthogonal to ff and hence to 8. Since Q(z) < 0, z is not in ff nor in 8. Moreover, Q :s; 0 on !!2 + span{z} contrary to our choice of 8. Hence dimff = s(f!l) and dim s’ = m(f!l), as was to be proved. In order to prove the last statement let L 1 , . . , L n , where n = n(f!l), be n linear functionals that are linearly independent on f!4 o- Then Lix) = 0 (IX = 1, ... ,n) implies that x is not in f!lo unless x = O. Let Xl>’" .x,; where s = s(f!l), be a basis for ((j. A vector x i= 0 in f!l having La(x) = 0, Q(x p , x) = 0 (ex = 1,... ,n; p = 1,... ,s) is in ((jQ but not in f!l o. Consequently Q(x) > 0 and k :s; n + s = m(f!l). Clearly k ~ m(f!l). It follows that k = m(f!l), as was to be proved. Corollary 15 Ifthere exist k linear functionals L 1 , .. .L, on f!l such that Q(x) > 0 for all x i= 0 in f!l having La(x) = 0 (ex = 1, ... ,k), then m(f!l) :s; k. Theorem 16 Suppose the Q signature sed) of .91 is finite, and f!l is a subspace of d. Then the Q signatures s(f!l), s(f!lQ) and the relative Q nullities rn(f!l), rn(f!lQ) are finite and satisfy the inequalities (24) Iff!l is
Q closed,
then
rn(f!l) = rn(f!4Q),
+ s(f!lQ) + rn(f!l) = s(d), (26) n(f!l) = rn(f!l) + dim(f!l n do), n (f!lQ) = rn(f!4) + ned), (27) m(f!4) + s (f!4Q) :s; sed) + dim(f!l n do) :s; m(d) = m(f!lQ) + s(f!l), where m(tt’) = nett’) + s(tt’). (25)
s(f!l)
These relations hold when ned) = 00. This situation arises in the study of the second variation for parametric problems in the calculus of variation. However, in our applications we shall be mainly concerned with the case in which ned) is finite. In order to prove this result let tt’ and !!2 be, respectively, maximal sub› spaces in f!4 and f!lQ on which Q < O. Let 8 be a linear manifold such that (f!4 Q)o = 8 + do and 8 n do = O. The manifolds ’ff, fiJ, 8 are mutually Q orthogonal and have dimensions s(f!l), s(f!lQ), rn(f!4Q), respectively. Their sum ff = tt’ + !!2 + 8 is of dimension r = s(f!l) + s(f!lQ) + rn(f!lQ). A vector x in ff is of the form x = u + v + w with u in C, v in !!2, and w in 8 where
72
2 Abstract Theory
u, v, ware mutually Q-orthogonal vectors. Hence Q(x) = Q(u) + Q(v)+ Q(w) = Q(u) + Q(v) ::;; 0, the equality holding if and only if u = v = O. It follows that %0 = If and % n do = If n do = O. Consequently r ::;; sed), by Theorem 12. Since f1#0 c (f1#Q)o) it follows readily that rn(88) ::;; rn(f1#Q). This establishes (24). Suppose next that 88 is Q closed. Since rn(f1#) < 00 it follows that f1#Q o = Q) 880 + do and hence that 880 = g + 88 n do. It follows that rn(88) = rn(88 Q and that (25) holds. Since %0 = If we have %oQ = gQ = 88 0 = f1# + f1#Q. A vector x in %Q is in %oQ and hence is expressible in the form x = y + z with Q. yin f1# and z in 88 Since x and z are Q orthogonal to If}, so is y. Hence Q(y) ~ 0, by Lemma 11. Similarly Q(z) ~ 0 since x and z are Q orthogonal to ~. Inas› much as Q(y,z) = 0 we have Q(x) = Q(y) + Q(z) ~ O. Since % n do = 0, Q::;; 0 on %, and Q ~ 0 on %Q, we have dim % = r = sed), as was to be shown. The inequalities (27) follow from (25) and (26) and the theorem is established. The main result described in the last theorem can be put in another form. This is done in the following Corollary 17 Let 88 and :Yrbe linear manifolds in .91 offinite signatures s(88) and s(:Yr). Suppose that 88 c :Yr and that 88 is Q closed relative to :Yr. Let t§ be a maximal linear manifold in :Yr n 88Qsuch that Q ::;; 0 on t§ and such thatt§ n £’0 = O;thens(£’) = s(88) + dimt§.Moreover,dim<;§ = rn(88,£’)+ s(£’ n f1#Q), where rn(f1#, £’) is the relative nullity off1# with respect to £’. In the proof we can identify £’ with d. If~ and g are chosen as described in the proof ofthe last theorem, then t§ = ~ + If has the properties described in the corollary. Moreover, any other manifold t§ having these properties must be of the same dimension. The dimension of t§ is sometimes referred to as the order of concavity of £’ relative to 88. Theorem 18 Let L 1, . . . .L, be k linear functionals on .91 and q be the number of these in a maximal set that are linearly independent on do. If 88 is the class of all x in .91 such that Lrx(x) = 0 (IX = 1,... , k), then
(28)
s(d) ::;; s(f1#) + k - q,
sed) + n(d) = S(88) + d + k - q,
where d = dim(88 n do). This result is readily verified. As a further result we have Theorem 19 Let Ql> Q2 be quadratic functionals on .91 offinite signatures s 1, S2 and finite nullities n 1, n z, respectively. Let d be the dimensionofthe class ~ of vectors that are simultaneously Ql null and Qz null vectors of d. Suppose that
(29)
2.3
Approximation Theory of Quadratic Forms
73
on .91, or more generally that Q1(X) s 0 wheneverQz(x) :s; O. Then (30)
If the inequality (31)
holds for all x =I- 0 in .91 or more generally that Qz(x) :s; 0, then
if Q1(X) <
0 for all x =I- 0 in .91 such
(32) Let ({j’z be a maximal subspace on which Qz is negative with dimension Sz. Since Q1(X) is also negative on ({j’z, it follows that Sz :s; S1’ Let £0 1 and ~z be, respectively, the classes Q1 null and Qz null vectors of d. Then £0 = ~ 1 n ~ z› The dimension of C = ~z + ~z is Sz + nz. We have Qz:S; 0 and hence also Q1 :s; 0 on,g. Since C n ~1 = ~ it follows that S1 ~ Sz + nz - d and d s: n 1• This establishes the relations (30). The inequality (32) follows from (30), since d = 0 under the hypothesis imposed in the last statement in the theorem.
2.3
Approximation Theory of Quadratic Forms
We have seen in Sections 1,4,2.1, and 2.2, and shall see in other parts of this book, a part of a rich theory for an elliptic quadratic form J(x), defined on a Hilbert spaced as given by Hestenes. A fundamental part ofthis theory is concerned with the signature s and nullity n of J(x) on d. These indices were used by Hestenes and his students to develop a generalized Sturm› Liouville theory and a local Morse theory. In this section the theory of Hestenes is extended to elliptic quadratic forms J(x; a) defined on d(a), where a is a member of the metric space (L, p) and .91 (a) denotes a closed subspace of d. The major part of this extension is concerned with inequalities dealing with the signature s(a) and nullity n(a) of J(x; a) on d(a), where a is in a p neighborhood of a fixed point a o in L. We shall show in other parts of this book that the hypothesis for these inequalities is sufficiently weak so as to include many important mathematical problems. It is possible that the reader is not interested in the formal theory nor detailed proof. However, we ask that Theorems 6 and 7 be read for content. The reader may draw a circle with radius (j and ponder over (5) and (6). Condition (6c) leads to results associated with normal problems in differential equations such as in Chapter 3, while (6b) leads to results associated with abnormal problems in control theory as in Chapter 6. After finishing our L theory in Corollary 8 we show that this theory can be generalized to a theory of focal point (or oscillation) problems involving a resolvent space
74
2 Abstract Theory
{J’l’(ic)I.Ie in A = [a, bJ} and then briefly to eigenvalue problems for compact quadratic forms (linear operators). Applications of these ideas are found in Chapter 3 and in later chapters. The basic theory of Hilbert spaces, strong and weak convergence, and linear operators and quadratic forms is given in Sections 1 and 2. Our fundamental Hilbert space is denoted by .91; subspaces by fJI, Cff, f», ... ; elements of .91 by the letters x, y, Z, ... ; scalars by a, b, c, .... The inner product is denoted by (x, y), the norm by Ilxll, strong convergence by xq => Xo, and weak convergence by x q ~ X o’ We shall assume that subspaces of .91 are closed and the scalars are real. The latter assumption is for convenience; the complex case holds equally well. For convenience we repeat some concepts which appeared earlier. A real-valued function L(x) defined on .91 is said to be a linear form if it is linear and continuous. A real-valued function Q(x, y) defined on .91 x .91 is a bilinear form if, for each y in .91, Q(x, y) and Q(y, x) are linear forms in x. Ifxq ~ X o and yq ~ Yoimply Q(xq, yq) ~ Q(xo, Yo), then Q(x, y) is compact. H Q(x, y) = Q(y, x), then Q(x) = Q(x, x) is the quadratic form associated with the bilinear form Q(x, y). We assume throughout this book that bilinear forms satisfy Q(x, y) = Q(y, x). Q(x) is positive (negative, nonpositite, nonnegative) on .91 if Q(x) > 0 [Q(x) < 0, Q(x) ~ 0, Q(x) ~ OJ for x "# 0 in d. Q(x) is positive definite on d if there exists a positive number k such that Q(x) ~ kllxl1 2 on d. Q(x) is compact if xq ~ Xo implies Q(xq) ~ Q(x o). Q(x) is weakly lower semicon› tinuous (WLSC) if xq ~ Xo implies lim inf, = 00 Q(xq) ~ Q(xo). Two vectors x and y in dare Q orthogonal if Q(x, y) = O. The vector x is Q orthogonal to (!J if y in (!J implies Q(x, y) = O. The set of all vectors Q orthogonal to (!J is the Q orthogonal complement, denoted by (!JQ. fJI and Cff are Q orthogonal if each x in (!J is Q orthogonal to Cff. A vector x is a Q null vector of fJI if x in fJI ( l (!JQ. (!J 0 will denote the set of Q null vectors of fJI. The signature (index) of Q(x) on (!J is the dimension of a maximal, linear subclass Cff of (!J on which Q(x) is negative. The nullity of Q(x) on (!J is the dimension of (!Jo = (!J ( l (!JQ. Finally J(x) is an elliptic form on .91 if J(x) is WLSC on .91, and xq => Xo whenever xq ~ X o and J(x q) ~ J(xo). We note that Theorem 12 of Section 2.2 gives several characterizations of the signature s on (!J, Theorem 14 of Section 2.2 characterizes the sum m = s + n, and statements of Section 2.1 give characterizations of elliptic quadratic forms. We now state and derive fundamental inequalities which relate the signature and nullity of an elliptic form on a closed subspace of .91 to ap› proximating elliptic forms on approximating closed subspaces. The main results are contained in Theorems 4 and 5. Theorem 6 is a combination of these two theorems.
2.3 Approximation Theory of Quadratic Forms
75
Let ~ be a metric space with metric p, A sequence {O"r} in ~ converges to 0"0 in ~, written a, ~ 0"0’ iflim r = 00 p(O"" 0"0) = O. For each 0" in ~ let .91(0") be a closed subspace ofd such that (la)
If0", ~ 0"0’ x, in d(O"r), x, ~ Yo, then Yois in .91(0"0);
(lb)
If Xo is in .91(0"0) and 8> 0,
there exists (5 > 0 such that whenever p(a,ao) < (5, there exists satisfying Ilxo - xtrll < 8.
X
tr in .91(0’)
Lemma 1 Condition (lb) is equivalent to the following: Let 86’(0"0) be a subspace of .91(0"0) of dimension hand 8> O. There exists (5 > such that whenever p(ao, 0") < (5, there exists a subspace @(O") of .91(0") of dimension h with the property that if Xo is a unit vector in 86’(0’0) there exists Xtr in 86’(0’) such that Ilxo - xtrll < 8. Clearly this condition implies (1b) with h = 1. Conversely, let Xl" . ,Xh be an orthonormal basis for @(O"o). Given a > there exists (5 > such that if p(ao, a) < (5, then Xl tr ,’.. ,Xhtr is in .91(0") with Ilxk - xktrl1 2 < a/h. Assume that usual summation conventions with k, 1= 1,... .h. Letting X o= bkxk and X tr = bkxktr, where bkbk = 1 we have
IIxo- xtr ll2 = IIbk(xk - Xktr)11 2 ::; (Ibklllxk- XktrlD2 ::; (b~J(llxl - x/trlllix/- xltrlD ::; h(a/h) =
8.
This concludes the proof of the lemma. The approximation hypotheses for quadratic forms are now stated. For each a in ~ let J(x; a) be a quadratic form defined on .91(0’) with J(x, y; a) the associated bilinear form. Let s(a) and n(a) be the index and nullity of J(x; 0") on .91(0’). For r = 0,1,2, ... let x, be in d(a r), Yr in d(a r) such that: if x, ~ XO, Yr => Yoand a, ~ 0’0 then (2a) r=oo
(2b) r=
00
and lim J(x r; a r)
(2c)
=
J(xo; 0’0)
r= 00
implies x,
=> X o ’
Lemma 2 Assume condition (2a) holds. Let a 0 be given. Then there exists (5 > 0, M> 0 such that p(a,ao) < (5 implies IJ(x, Y; 0’)1::; MllxlIllyllfor, all x, yin .91(0’).
76
2 Abstract Theory
Suppose the conclusion does not hold. Then for r = 1, 2, ... we may choose a; in ~ and x., Yr in d«(Jr) such that Ilxrll = IIYrl1 = 1, p«(J,,(Jo) < 11r, and a; = IJ(x" Yr; (Jr)1 > r. Now xr = x.]a, => 0 and Yr = y.]«; => 0 so by (2a) 1 = J(x" Yr; (Jr)
~
J(O, 0; (Jo) = O.
This contradiction establishes the result. Theorem 3 If (2a) and (2c)hold then either J(x; (J) or -J(x; (J) satisfy (2b). Suppose the conclusion does not hold. Then there exists sequences {(Jr}, {Yr}and {zr} (r = 0,1,2, ... ) such that a, ~ (Jo; Y" z; in d«(Jr); y, ~ Yo, z, ~ Zo; and r= 00
lim J(y" Zr; (Jr)
=
B,
=
C> J(zo; (Jo)
r= 00
and lim J(zr; (Jr) r= 00
where A, B, and C are real numbers by Lemma 2. Thus the equation [A - J(yo; (Jo)]a 2 + 2a[B - J(yo,zo; (Jo)] + [C - J(zo; (Jo)] = has two distinct real roots aI’ a z. For i = 1,2 and r aiYr + z; so that x., ~ X Oj. By the definition of a., J(Xri;
(JJ
=
J(Yr; (Jr)a;
+ 2a jJ(y" Zr;
(Jr)
+ J(zr; (Jr)
=
0,1,2, ... let Xrj
~ Aa;
=
+ 2Baj + C
= J(yo; (Jo)a; + 2a jJ(yo,zo; (Jo) + J(zo; (Jo) = J(XOj; (Jo) so that from (2c) Xrj => XOi (i = 1,2). Since al i= a2 then Yq => Yo and Zq Finally from (2a) we have
=>
zo.
A = lim J(y; (Jr) = J(yo; (Jo) > A. r=co
This contradiction establishes the theorem. Theorem 4 Assume conditions (la), (2b), and (2c) hold. Then for any (Jo in ~ there exists fJ > 0 such that p«(J 0, (J) < fJ implies (3)
Assume the conclusion is false. Then there exists a sequence {(J.} with a, ~ (Jo and s«(Jr) + n«(Jr) > s«(Jo) + n«(Jo). Let k = s«(Jo) + n«(Jo) + 1. For r = 1,2, ... there exists k orthonormal vectors Xl" Xz" ’Xkr in d«(Jr) with J(x; (Jr)::;; 0 on spanjxj,; ... , x kr}. For each p = 1, , k the sequence
2.3
Approximation Theory of Quadratic Forms
77
{Xp,} is bounded in S’1 and hence has a weakly convergent subsequence, which we may assume to be {xp!},such that x p, --+ xp’ By (la) x p is in S’1(0"0). Assume the usual repeated index summation convention with p = 1,... , k. Let b = (b l , . . , bk ) be arbitrary, set Yo = bpxp and Yr = bpxpr. Since Yr --+ Yo we have by (2b) J(yo; ao):::; lim inf J(y,; a,) :::; O. r= 00
Thus Xl’ ... ,Xk is a linear dependent set, for if not by Theorem 14 of Sec› tion 2.2, k - 1 = s(ao) + n(ao) ~ k. Choose b -# 0 such that Yo = bpXp = 0; also choose y, = bpXpr’ We note y, --+ Yo= 0 and
0= J(O; ao):::; liminfJ(y,; a.):::; limsupJ(y,; 0",):::; O. r=
r= 00
00
Hence J(Yr; 0",) --+ 0 = J(O; ao) so that Yr = 0 by (2c). Finally 0 = lim,=ooIIY,11 2 = bpbp -# O. This contradiction establishes the theorem.
Theorem 5 Assume conditions (lb) and (2a) hold. Then for any ao in L there exists f> > 0 such that p( a 0, 0") < f> implies
(4) Let .?4(0"0) be a maximal subspace of S’1(0"0) such that J(x; ao) < 0 on &6(a 0)’Let x., ... , Xh be a basis for &6(a 0)’By Lemma 1 and conditions (1b) and (2a) there exists a basis Xl,,., ... , Xhcr for &6(0") such that if x; = apxpcr and Apq(O") = J(xpcr,xqcr; 0") then
F(a,O") = J(x cr; a) = apaqApq{O")
(p, q = 1, ... , h; p, q summed)
is a continuous function of a at a oBy the usual arguments for quadratic forms we may choose M < 0 and b > 0 such that F(a, 0"0) :::; 2M apa p and
F(a,a) = F(a,O"o)
+ [Apq{O") -
ApiO"o)]apa q:::; Mapap,
where p(ao,a) < b. This completes the proof. Combining Theorems 4 and 5 we obtain
Theorem 6 Assume conditions (1) and (2) hold. Then for any a o in there exists f> > 0 such that p(a, 0" 0) < b implies
~
(5)
Corollary 7 Assume b > 0 has been chosen such that p(0", a 0) < b implies equation (5) holds. Then if p(a, a 0) < b we have (6a)
n(O"):::; n(ao),
78
2 Abstract Theory
(6b) n(a) = n(O"o)
implies
s(O")
= s(O" 0)
and
implies
s(O")
= s(O"o)
and
and
(6c) n(O"o) =
n(O")
= 0.
Condition (6a) holds since n(O"o) < n(O") and s(O"o):::; s(O") contradict the last inequality in (5). If n(O") = n(O"o), then starting with the last inequality in (5) we have s(O") + n(O"):::; s(O"o) + n(O"o) = s(O"o) + n(O") or s(O"):::; s(O"o)’ But s(O"o) :::; s(O") and hence (6b) holds. Finally if n(O"o) = 0, (5) becomes s(O"o):::; s(O") :::; s(O") + n(O") :::; s(O" 0) and hence all inequalities are equalities. Finally in the L setting we have
Corollary 8
The set {O" in Lln(O")
= O] is open.
The set {O" in Lln(O") =1=
O]
is closed.
This completes the L theory. We next show that the resolvent hypothesis involving the collection {J’f’(),)IA. in A = [a, bJ} satisfies the hypotheses and conclusions of the L theory. This resolvent theory will be referred to as the A theory or A setting. Let a,b be real numbers (a < b) and define A = [a,b]. Let {J’f’(A)IAin A} be a one-parameter family of closed subspaces of .91 such that J’f’(a) = 0, J’f’(b)= A, and J’f’(A1 ) c J’f’(A2) whenever AI, A2 in A, Al < A2’ We require that one (or both) parts of the additional hypothesis is satisfied: (7a)
whenever a s; Ao < b,
and (7b)
whenever a < Ao :::; b.
Lemma 9 If f!lJ is a closed subspace of .91, {xn } c f!J, x;
-->
Yo, then Yo
in f!lJ.
The proof of this lemma is a property of Hilbert spaces and will be left to the reader. We now consider (1) and (2) in the A setting. Thus we set L = A = [a, bJ and p(At> A2 ) = 1..1. 2 - All. We show that the hypotheses in (7) are stronger than those in (1). The converse of Theorem 10 holds in our setting and is left as an exercise. If Ar :::; Ao and Ar --> Ao, we write Ar » Ao.
Theorem 10 Hypothesis (7) implies (1). In particular (7a) implies (la), while (7b) implies (lb). For (la) let Ar --> Ao, x, in J’f’(Ar ) and x, --> Yo’Ifthere exists a subsequence such that )’rk }’ Ao, we have {xrJ c J’f’()’o)so that Yo is in J’f’(Ao) by Lemma 9.
PrJ
2.3 Approximation Theory of Quadratic Forms
79
Thus assume )’r ’" )’0 and Ar i= b. Let Abe given and satisfy AO < A:::;; b. By (7a) there exists N such that r :2: N implies x, in £(A). By Lemma 9, Yo is in £(J:). Finally Yoin JIf(Ao) follows from (7a). For (lb) assume X o in JIf(Ao) and e > 0 is given. We assume Ao i= a; if Ao = a, the result follows immediately as Xo = O. From (7b) there exists A(a:::;; I < Ao), x in JIf(I), such that Ilx- xoll < e. Let b :::;; AO - A. Then A:2: AO - b implies A :::;; )’0 - b < A, JIf(I:) c JIf(A), and hence x is in £(A). This completes the proof. The form l(x) is elliptic on .91 if conditions (2b) and (2c) hold with l(x) replacing l(x; 0-) and .91 replacing d(a). Let l(x; A) denote the restriction of l(x) to JIf(A). The following theorem is immediate as l(x) is elliptic on d.
Theorem 11 The forms l(x; A) satisfy hypothesis (2). The signature and nullity of l(x) restricted to JIf(A) are now defined. The signature (index) of l(x; A), written s(),), is the dimension of a maximal linear subclass pg of £(A) on which l(x; A) < 0 for x i= O. The nullity of l(x; A), written n(A), on £(A) is the dimension of the subspace ’{j of £(A), where ’{j = {y in d(A)ll(x,y; A) = 0 for all x in £(A)}. We shall denote the set ’(j by £o{),). The symbolism S(A - 0) is used to denote the left-hand limit of S(A). Similar remarks hold for S(A + 0), n(A- 0), and n(A + 0). We set s(a - 0) = 0, n(a - 0) = 0, and s(b + 0) = s(b) + n(b). The first two statements in Theorem 12 have been given in Theorems 12 and 14 of Section 2.2.
Theorem 12 The quantity S(A) is the dimension of a maximal subspace Iff of d(A) on which l(x; A) :::;; 0 such that Iff(’) do(A) = O. The sum m(),) =
S(A) + n(A) is given by the quantity: The dimension of a maximal subspace :!fl of d(A) in which l(x; A) :::;; O. Thus the quantities S(A) and m(A) are non› decreasing functions on A. A point A at which S(A) is discontinuous will be called a focal point of l(x) relative to d(A) (A in A). The difference f(A) = S(A + 0) - S(A - 0) will be called the order of Aas a focal point. A focal point A will be counted the number of times equal to its order. The term focal point is often replaced by the terms conjugate point or oscillation point. The latter term is used when we wish to emphasize the ideas of differential equations. The former term was used to emphasize boundary conditions, but we make no such distinction. We now give inequalities involving S(A) and n(A). We note that inequalities (8)-(10) have been given in the more general a setting of hypotheses (1) and (2). Thus they follow immediately by Theorems 4 and 5, respectively.
80
2
Abstract Theory
Theorem 13 Assume hypothesis (7a) holds. Let Ao in A be given. Then there exists b > 0 such that Ain A and I), - Aol < b imply
S(A) + n(l) S S(AO) + n(),o)•
(8)
In particular s(a n()-o + 0).
+ 0) = n(a + 0) = O.
Finally s(),o)
+ n(Ao) = S(Ao + 0) +
Inequality (8) holds by our discussion. The second result follows as s(a) + n(a) = O. Finally from Theorem 12 and (8), m().o + 0) ~ m(),o) ~ m(Ao + 0). Theorem 14 Assume hypothesis (7b) holds. Let )’0 in A be given. Then there exists b > 0 such that Ain A and I), - Aol < b imply
(9)
In particular s(A.o - 0) = S(AO)’ Inequality (9) holds by our discussion. The remaining result holds since s()•o - 0) s s(),o) S s(),o - 0). Theorem 15 Assumehypothesis (7) holds. Let )’0 in A be given. Then there exists b > 0 such that Ain A and IA - )’01 < b imply
S(AO) S S(A) S S(A) + n(A) S s()’o)
(10)
+ n(Ao)’
In addition we have,for such )" (11)
(l2a)
n(Ao) = 0
implies
S(A) = s(),o)
and
n(A) = 0,
implies
S(A) = S(Ao)
and
m().) = m(Ao).
n(A) S n(),o),
and (12b)
n(A) = n(Ao)
Normal problems of differential equations satisfy Theorem 16. In Section 3.1 we picture this phenomena when n(A) is zero or one. In Chapter 6 we obtain a focal interval theory for (abnormal) control problems using Theorem 15 when the hypothesis of Theorem 16 is not applicable. Theorem 16 Assume A1 #- A2 in A implies £’0(A1 ) n £’0(A2) = O. Then a S A < A1 S b implies S(A) + n(A) S S(A1)’ In addition if (7a) holds then S(A + 0) = S(A) + n(A). If (7) holds then f(A) = n().) and the set A 1 = {A in Aln(),) #- O} is finite.
For the first result let qj = !YJ EEl £’o(A) be a subspace of £’(A), where !YJ is a maximal subspace such that x#-O in!YJ implies J(x; A) < O. By Theorem 12
2.3
Approximation Theory of Quadratic Forms
+ n(A) ::s; s(Ad. If(7a) holds we have the inequalities S(A + 0) ::s; S(A + 0) + n(), + 0) = S(A) + n(A) ::s; S(A + 0). Finally if (7b) also holds we have f(A) = s(). + 0) - S(A) + S(A) - S(A -
81
we have s(),)
0) =
n(A). Thus A 1 is finite as s(b) is since J(x) is elliptic on d. For our final effort in this section we shall give Theorem 17, which allows us to extend the results of Theorems 6 and 7 to further applications such as eigenvalue problems. Thus assume M = I x ~, I an open interval of [Rl, is a metric space with metric d defined by
d(Jil,Ji2) = 1~2
-
~11 + p(a2,al)
for any pair of points Jil = (~l,ad, Ji2 = (~2,a2) in M. Let s(Ji) = s(~,a), n(Ji) = n(~, a) be the index and nullity of J(x; Ji) on d(Ji); let m(Ji) = m(~, a) = s(~, a) + n(~, a). Theorem 6 and Corollary 7 hold with the obvious modi› fications.
Theorem 17 Let conditions (1) and (2) be satisfied with Ji = (~, a) in M replacing a in ~. For fixed a let the signature s(~, a) be a monotone function of ~ such that s(~ + 0, a) = s(~ - 0, a) implies n(~, a) = O. Let Jio = Ro, ao) in M be given such that s(~o - 0, ao) = n, s(~o + 0, ao) = m. Then there exists 00 > 0 such that I~ - ~ol ::s; 0 0 and oto; ao) ::s; 00 imply that s(~, a) is between nand m. Assume s(~, a) is monotone increasing on an interval I and hence n ::s; m. Choose S > Ososmallthats(~,ao) = nfor~in(~o - 2b, ao) c Iands(A,ao) = m for ~ in (~o, ~o + 20) c I. By assumption n(~o - 0, ao) = n(~o + 0, ao) = O. Finally choose 00,0 < bo ::s; 0, such that p(a,ao) < 0 0 implies (5) holds for both Jio = (~o - b, ao) and Jio = (~o + 0, ao). By (6c), s(~o - 0, a) = nand s(~o + b, a) = m for all a such that p(a,ao) < 00 , The theorem now follows by the monotone condition. To be mathematically consistent, we should now derive an approximation theory for eigenvalues and a theory for numerical approximations similar to the (resolvent) A theory above. These three extensions (and their inter› sections) of the basic ~ comprise the remainder of this book. However, these two theories are postponed until Section 3.3 and 3.2, respectively, when they can be presented with examples and meaningful commentary.
Chapter 3
The Second-Order Problem
3.0 Introduction The purpose of this chapter is to present a lengthy discussion of the second-order problem. In many senses this chapter is the most satisfying chapter of this text. The example problems are more easily understood and the solutions are more complete than in other chapters. Similarly, this chapter provides examples of more complicated problem areas in subsequent sec› tions. Our intent is that the reader understand this chapter as thoroughly as possible. We have had some difficulty in dividing this chapter into sections. Problem areas such as focal-point problems, numerical problems, and eigen› value problems can be treated both separately and together depending on the particular problem at hand. While this is a major strength of our theory, it does create some problems in exposition. Many technical details and proofs are put off until Section 3.5 so that the reader is not distracted. In Section 3.1, we examine the duality between the focal-point theory of quadratic forms and the oscillation point theory of second-order differ› ential equations. Several figures and constructive examples are included for exposition purposes. We hope the reader will make a special effort to grasp these ideas. Section 3.2 contains the numerical theory of second-order prob› lems. This section is divided into three parts. The first part is to present the basic algorithm and relevant theory on a fixed interval. The second part is to extend these results to give a numerical-focal-point theory. Finally, we give test cases, program listings and numerical computer results for two numerical procedures. The most important procedure is a dynamic method which computes the solutions at points a n + 1 from the values at a n - 1 and 82
3.1
The Focal-Point Problem
83
an’ The second procedure is a relaxation method on a fixed interval. We include this second procedure to anticipate the numerical results in Section 5.2 on partial differential equations. Section 3.3 contains the single-eigenvalue problem. We have included much expository material and several figures to illustrate the duality between eigenvalue theory and focal-point theory. An elaborate approximation theory that includes numerical-eigenvalue-focal-point problems is given here. Sec› tion 3.4 treats the numerical single-eigenvalue problem and the complete double-eigenvalue problem. In each problem area, we modify the computer algorithm given in Section 3.2 to develop fast, efficient, and accurate computer algorithms. Computer results and program listings are also included. Finally, Section 3.5 contains the proofs and some historical comments that were omitted in our earlier sections.
3.1
The Focal-Point Problem
The major purpose of this section is to interpret the focal-point theory contained in Theorems 9-16 of Section 2.3 for the case of the second-order problem of this chapter. For clarity we give some examples and figures to illustrate the indices s(Je) and n(Je) in Theorems 1 and 2 below. In later sections we shall combine the focal-point parameter Je with other parameters. For example, in Section 3.2 we obtain a numerical-focal-point problem by combining Je with a numerical parameter (J. To begin this section let
(1)
L(x) = [p(t)x’(t)]’ + q(t)x(t) = 0,
(2a)
J(x)
=
f
[p(t)X’2(t) - q(t)x 2(t)] dt,
J(x, y)
=
f:
[p(t)x’(t)y’(t) - q(t)x(t)y(t)] dt
and
(2b)
be, respectively, our second-order differential equation, the associated qua› dratic form, and the bilinear form. We assume p(t) > 0 and r(t) are piecewise continuous (for convenience) and that J(x, y) is defined on .91, where .91 is the set of arcs x(t) that are absolutely continuous on A = [a, b] with square integrable derivatives x’(t). .stI is a Hilbert space with norm = (x, x), where
Ilxll
(3)
(x, y)
= x(a)y(a) +
f:
x’(t)y’(t)dt.
The above extends the concepts of (real) finite-dimensional quadratic forms such as Q(x) = x T Ax and bilinear forms Q(x, y) = yTAx in Section 1.1
84
3 The Second-Order Problem
to infinite dimensions. In the finite-dimensional case, A is a symmetric n x n real matrix and x, y belong to the Hilbert space ~n with inner product (x, y) = yT x. The eigenvalues of A are real. The condition p(t) > 0 in (2) ensures that the signature [the "number" of negative eigenvalues of J(x)] is finite, and the nullity [the "number" of zero eigenvalues of J(x)] is zero or one for second-order differential equations. We now show that these indices count the number of oscillation points t* of (1),i.e., xo(a) = xo(t*) = 0, where Xo is a nontrivial solution of (1). For each A in A let Yf(A) denote the arc x(t) in d satisfying x(a) = 0 and x(t) == 0 on p, b]. The collection {Yf(A) A in A} is a resolution of Yf(b), that is (7) of Section 2.3 is satisfied. Let s().) denote the signature of J(x) on Yf(A); that is, the dimension of f(J where f(J is a maximal subspace of Yf(A) such that x =F 0 in f(J implies J(x) < O. Let n().) denote the nullity of J(x) on Yf(A);that is, the dimension of the subspace Yfo(A) = {x in Yf(}.)IJ(x, y) = 0 for all y in Yf(A)}.The sum m(A) = S(A) + n(A) is the dimension of £0, where £0 is a maximal subspace of Yf(A)such that x in £0 implies J(A) .::; O. Note that S(A) and m(A) are nonnegative, integer-valued, nondecreasing functions of A. For example, if Al < A2’ x(t) in Yf(Al) and J(x) < 0, then x(t) is in Yf(A2)and J(x) < O. Theorem 1 is a partial summary of our results from Theorems 15 and 16 of Section 2.3. If n().o) > 1 there exists at least two linearly independent solutions of (1) vanishing at Ao which is impossible for second-order differential equations.
I
Theorem 1 The indices S(A) and m(A)are nondecreasing functions of A. If Ao in (a, b) the left-hand limit S(Ao - 0) = s().o). The integer S(Ao + 0) - S(Ao) is equal to n(Ao) - n(Ao + 0) which is zero or one. Finally s()’o) = La
We recall from Section 2.3 that A is a focal point of J(x) relative to {Yf(A)IA in A} if f(A) = S(A + 0) - S(A - 0) is not zero. Theorem 2 contains a complete summary of the results of Theorems 15 and 16 of Section 2.3. Theorem 2 If Yfo(Al) n YfO(.A2) = 0 whena .::; )’1 < A2 .::; b, then f(a) = 0 and f(A) = n(A). Thus if )’0 is in A, the following quantities are equal:
the number of focal points on the interval a .::; A < Ao, the signature S(Ao) of J(x) on Yf(Ao), the sum La<:;A
)’0
Historically in the calculus of variations, a point Ao where n(Ao) =F 0 is called a conjugate or focal point since it is equivalent in our context to the result that there is a nontrivial solution of (1) vanishing at t = a and t = Ao. For this reason we also refer to },o as an oscillation point.
3.1
The Focal-Point Problem
85
Because "one picture is worth a thousand words," we ask the reader to pay special attention to Fig. 1. This picture illustrates many of the basic ideas in this chapter. Figure 1a illustrates a nontrivial solution xo(t) of (1) with xo(a) = O. Figure 1b illustrates the function S(A). Note that any solution x(t) of (1)with x(a) = 0 is a multiple ofxo(t). Figure 1c illustrates a comparison result. For example let
(4) with PI(t)
~
p(t)
> 0 and ql(t) ::s;; q(t). In this case
J I(X) - J(x)
=
s:
{[PI(t) - p(t)]x’Z(t) - [q 1(t) - q(t)]XZ(t)}dt
~ 0
or J 1(x) ~ J(x). Thus 0> JI(y) implies 0> JI(y) ~ J(y) for y in Jf’(A) or SI(A)::S;; S(A) for each Ain A, since these indices count the number of negative vectors for each quadratic form. Figure 1d illustrates the corresponding solution to L 1(x) = 0, x(a) = O. In Figs. 1b and 1c, n(A) == 0 except at the points Ai or Ai where the signature jumps in which case n(A) = 1. Note that the oscillation points of LI(x) occur after those of L(x) since J I(X) ~ J(x). Figure Ie illustrates the signature of an approximating quadratic form J(x; 0") to J(x). This concept will be introduced in Section 3.2, where 0" will represent a numerical parameter. In that case Fig. His the numerical solution to xo(t) in Fig. 1a. In Section 3.2 we extend the theory of Section 2.3 to obtain a numerical-focal-point theory. Thus let S(A, 0") and n()", 0") represent, respec› tively, the signature and nullity ofthe approximating quadratic form J(x; 0") on an approximating space d(O") n Jf’(A) of Jf’(A). The idea is that if 0" and are "small" then
II- AI
(5a)
S(A) ::s;;
sCA, 0") ::s;; s(I, 0") + nO:, 0") ::s;; S(A) + n(A)
and (5b)
n(A) = 0
implies
s(I,o) = S(A),
n(I,O") =
o.
In addition S(A,O") is nondecreasing as a function of A for fixed 0". Other examples of the 0" parameter are for perturbations of p(t) and q(t) or in eigenvalue problems as in Section 3.3. Note that in the approximation prob› lem illustrated by Figure Ie there is an interval about the conjugate points in which we do not know the value of S(A, 0"). As examples of the above remarks let us take a = 0, p(t) = PI(t) = 1, q(t) = 4, and ql(t) = 1. Hence for Fig. 1a L(x) = x"(t) + 4x(t) with extremal solution xo(t) = sin2t and oscillation points at Al = n12, Az= n, A3 = 3n12, .... In Fig. 1b the corresponding signature function S(A) satisfies s(O) = 0 and S(A) = n, where (nI2)n < A ::s;; (nI2)(n + 1) for n = 0, 1,2, .... Corre› sponding to Figs. 1c and 1d we have L1(x) = x"(t) + x(t) with extremal
86
3 The Second-Order Problem
r>.
~XO
I
I
I
I
>-~2
a
:3
2 (b]
,
0
0
,
>-2
>"3
,
>-,
a
I
>-~
b
_2_ o I
>..’I
>"3 >"2
2 (e)
I
b
:3
(, I II I ---+-t.===f..++---I-+-J---+
+-1
a
>"1
>’2
b
Fig. 1 (a) Extremal solution of L(x) = 0, x(a) = 0. (b) Signature SiAl of J(x) on X(A). (e) Signature 5,(1.) of J ,(x) on X(}.). (d) Extremal solution of L,(x) = 0, x(a) = 0. (e) Signature S(A,O’) of J(x, 0’)on X().) n d(O’). (f) Numerical extremal solution of J(x,O’) on X(A) n .sf d(O’).
solution X1(t) = sint and oscillation points at ),’1 = n, A~ = 2n, .. . . The corresponding signature function Sl(A) satisfies s(O) = 0 and s(},) = n, where ttn < A ~ n(n + 1) for n = 0, 1,2, .. . . Note that J l(X) = (X,2 - x 2)dt ~ (X,2 - 4x 2)dt = J(x) so that Sl()’) ~ seA). We close this section by constructing an example to demonstrate that J(x) = (X,2 - x 2) dt has negative vectors for b > tt. This will also be done in Section 3.2when we build numerical solutions which yield negative vectors.
It
It
It
3.1 The Focal-Point Problem
For the moment let us define, for
8
sin t, 1, x e(t) = . sm(t - s), { 0, Then J(xe(t» b
-
~ o [x~2(t)
=
-8
87
> 0 (see Fig. 2a), 0::;; t::;; nl2
nl2 < t ::;; nl2 + 8 nl2 + S < t ::;; n + e t> tt. + 8.
since
(t)] dt = ~"/20
2 Xe
[COS
+ I"+e
J"/2 +e
=
"/2
~o
(COS
.
2
. 2(t)] t - sm dt +
[COS 2 (t
2
f,"/2 + e (_1 2 ) dt ,,/2
_ 8) - sirr’(z- 8)] dt
. t - sirr’ t)dt -
8
+ f,"
(COS
2
~2
. t)dt. t - sin?
The reader may verify that pieces of xo(t) = sin t, "stretched" between con› jugate points by an amount 8 (each time) and vanishing elsewhere, provide n linearly independent vectors when snipped apart. Thus "continue" xJt) above by defining Yit) = xlt - n - 8) if tt + 8 ::;; t < 2n + 2s, and Yit) = 0 otherwise (see Fig. 2b). We now have a second negative vector such that any real linear combination z(t) = ()(xit) + Pyit) satisfies
J(z) = J«()(xit)
+ PYe(t» = ()(2( -
8) +
p2 ( -
8) < 0
if ()( and P are not both zero. This construction holds in the general second-order case since a necessary condition that J(x) have a negative vector is that q(t) in (1) be positive for some ~E(t) (a)
I
o
I
I
I
(litE I TTiE _1T 2 TT\ 2
(b)
l-l--+--+--+-""""---+-~--,t=.c--;;-:;,------+------~
o
88
3 The Second-Order Problem
values of t. That is, if q(t) ::;; 0 then J(x) > 0 in (2), for any nontrivial vector x(t) in Jf’(A), a < A::;; b. For example if q(t) = 0, p(t) = 1 the extremal solution is xo(t) = t - a which has no conjugate points. In fact J(x) = J~X'2 dt is positive for any nontrivial function x(t) in Jf’(A), a < A::;; b.
3.2 The Numerical Problem
In this section we treat the numerical solution of the second-order differ› ential equation problem discussed in Section 3.1, that is, the differential equation (1)
L(x)
= [p(t)x’(t)]’ + q(t)x(t)
= 0
with associated quadratic form (2)
J(x)
=
f [p(t)x/
2(t)
- q(t)x 2(t)] dt.
Our main practical result of this section is to construct a simple, easily obtainable, accurate numerical algorithm of the form (3)
so that the piecewise-linear vector x".(t) which satisfies x,,(ak) = Ck agrees with a solution xo(t) of (1), normalized by the condition xo(a + 0) = x".(a + (1), in the sense that (4)
lim fb [x~(t) (1---+0
Ja
-
x~(t>J2
dt = O.
The parameter <1 is the step size of our numerical process. The associated picture is contained in Fig. 1 and explained in more detail below. Condition (3) yields the "Euler-Lagrange solution" of a real symmetric tridiagonal matrix. The solution bears a similar relationship to the finite-dimensional quadratic form that xo(t) bears to J(x) in (2). The numerical solution x,,(t) also provides (constructively) the representation of a negative vector. For the reader, the numerical problem should provide a practical and meaningful example of our approximation theory. The approximating parameter <1 will denote the step size associated with a finite-dimensional approximation J(x; (1) of (2). For <10 = 0, J(x; 0) or J(x; (10) denotes the infinite-dimensional problem given by (2), and convergence <1 --+ <10 or p(<1o, (1) < () means that the step size <1 goes to zero. We shall also provide new ideas for symmetric tridiagonal matrices which will extend in more general problems of later chapters to block matrices. To avoid unnecessary confusion, we shall divide this section into three parts. In the first part we shall consider our problem on a fixed interval
3.2
00
01
02
ok_1
0k
0ktl
The Numerical Problem
oN_IoN
89
0Ntl
Fig. 1 Solid line: true solution; dashed line: linear approximation. Note: if tin [ak-1,ak+l]’ otherwise.
[a, b]. This will avoid the technical difficulties of defining and explaining resolution ideas before they are really necessary for understanding. The second part will be more technical and involve the resolution of spaces d(u) by spaces {.@(Je, o) = .n"(Je) (\ d(u) Ia ~ Je ~ b}, where .n"(Je) is defined below or in Section 3.1. In this part, our ideas are given in a more formal "theorem-definition-proof" style. In the third part we shall discuss two types of numerical algorithms, test cases, and numerical computer results. We have included a numerical relaxation method to illustrate the numerical ideas in Chapter 5. Proofs are deferred until Section 3.5 when they may detract from the exposition. Perhaps our ideas are best explained by consulting a figure. In Fig. 2 we have "commutative diagram" of our procedure. Arrow (1) denotes ideas originally given by Hestenes, which were redeveloped in Section 1 to "fit" into the overall picture. Arrows (2) denote approximation ideas of Section 2.3 applied to the numerical problem. Arrows (3) denote new constructive ideas (algorithms) which include the Euler-Lagrange equations for tri› diagonal matrices. For convenience of presentation, we now describe the basic numerical procedure: (i) Let L(x) = 0 be the differential equation in (1) that is the Euler-Lagrange equation of the quadratic form J(x) in (2), and let xo(t) be a nontrivial solution of the differential equation satisfying xo(a) = O. (ii) Approximate the vectors x(t) on [a,b] by spline functions of degree 1 (order 2) so that the approximation finite-dimensional quadratic form is
90
3 The Second-Order Problem
L(x) x(a)
= (px’)’ + qx = 0 = 0, x(A) = 0
J(x) = S:(PX,2 - qx 2)dt X
in AC, x’ in L
x(t) =
0
2
if t
,
x(a) = 0; ~
Completion of diagram
Numerical Approximation of L(x)
(2)
Numerical Approximation of J(x) by Splines
(3)
A Fig. 2
J(x; a) = bTD E, where x•= b"z" is in d(a) (see below), D" is a symmetric tridiagonal matrix, and a is a parameter denoting the distance between "knot points." (iii) Obtain the Euler-Lagrange equations for D,,; call the solution xAt). (iv) Show that xAt) converges to xo(t) as a -.0 in the strong L 2 derivative norm sense of (4). We begin by constructing the approximating quadratic form J(x; a) = xTD(a)x of J(x) in (2) and in particular, the elements d",p of D(a). Let a be small and positive and choose a partition of the interval [a, b], namely, n(a) = (ao = a < a1 < a2 < ... < aN+ 1 ~ b), where ak = a + ka (k = 0,1,2, 3,... , N). For convenience, we assume aN+ 1 = b. For each k, let zk(t) be the spline hat function of degree 2, namely, (5)
if tIll [ak-1,ak+1] otherwise.
We note that {Zl,’" ,ZN} is a basis for the vector space of piecewise linear functions xAt) which vanish at t = a and at t = b. We denote this space by d(a). The dotted line in Fig. 1 is an example ofa vector in d(a). To define the approximating quadratic form J(x; a) we choose pAt) = p(at) and q,,(t) = q(at) if ak ~ t < ak+ 1, where at = ak + a /2. Then (6)
defines a quadratic form on the space d(a). If x(t) = b"z,,(t) is in d(a) (repeated indices are summed), we have J(x; a) = J(b"z", bpzp; a) = b"bpJ(z", zp; a) = b"bpd"p. The following result is immediate. Theorem 1 J(x; a) defined by (6) on the space d(a) is a quadratic form whose associated matrix D(a) = (d"p) is symmetric and tridiagonal.
91
3.2 The Numerical Problem
To construct d"p we note that if lac - PI;:::: 2 we have d"p = 0 since Zk(t) vanishes outside of the interval (ak-b ak+ 1)’ Thus
r.
dk,k = J(Z1> Zk; a) = =
r.
[pAt)Z~2(t)
- qa(t)zr(t)] dt
(1)2 -(j dt -
* i ak = P(ak-d ak-, *) ( k +pa
[Pa(t)z;/(t) - qa(t)zr(t)] dt
q(at-l) a
--2-
iak (t ai; -,
f +’( - -a1)2 dt -q(at) -2- f +’(ak+1-t )2 dt a Uk
Uk
Uk
Uk
_ ap(at-1) _ q(at-1) (1)( _ )31 2 2 3 t ak-l
-
a
+
a
2
a
Uk
Uk-l
ap(at) _ q(at) (1.)(
a
2
ak-d dt
2
3
ak+ 1
_
ak t)3I +’ Uk
_ [p(at-l) + p(at)] a
(*)] -"3a [ q (*) ak-l + q ak .
Note that Zk+ 1(t) vanishes outside the interval (a k, ak+ 2) and the product Zk(t)Zk+ 1 (z) vanishing outside the interval (a k, ak+ 1)’ Thus dk,k+
1
= J(Z1> Zk+ 1; a)
= p(at)
f
Uk + 1 (
U
-~
1) (1) ~
dt - q(at)
fUk+l JUk
(1
-;;
(ak+
1 -
)(1
t)
)
-;; (t - ak) dt
= - p(at)ja - t;q(ana. In practice it is more convenient to normalize our problem so that (7a)
and (7b)
dk.k+l = -p(at)-«(j2j6)q(at).
This is due to the fact that J(x; a) is quadratic, hence homogeneous of degree 2. That is J(cx; a) = c 2J(x; a) for any real value of c. As we have indicated
92
3
The Second-Order Problem
above, we seek a nontrivial solution xAt) that spans the one-dimensional solution space. The specific representative is obtained by the choice of C1 below. This vector is the Euler-Lagrange solution. We now derive the discrete difference algorithm which computes the sequence {c~} where xa(t) = c~z~(t). The motivation for our work is that we wish to construct the Euler-Lagrange equation for D(a). In the continuous case the bilinear form for (2) is J(x, y) =
Lb (px’y’ -
= -
qxy) dt
f [(px’)’ + qx]y dt + p(t)x’(t)y(tll:•
Hence in trying to make J(x, y) = 0 for all y(t) vanishing at t = a and t = b we set L(x) = (px’)’ + qx = 0 and "wait" until y(b) = 0. The discrete analog is to try to find c = (C 1,C 2, .. Y so that J(x, y; a) = fjTD(a)c = when the function y(t) = b~zit) is in d(a). Thus we try to solve D(a)c = and "wait" until x(t) in d(a) vanishes at t = b. We come "as close as we can" if we define the sequence {c;} (i = 1, ... , N) by the solution to (8a) (8b) (8c)
c 1 d ll c 1d21
+ c 2 d 12 = 0,
+ c 2d 22 + C3d23 = 0, (k = 3,4,5, ... ,N),
and the extremal solution xa(t) = c~zit). Notice that (8) gives a sequence {Ck} such that D(a)c = except for the "last term." We also assert that (for a small) if n j is the jth value of k for which ckCk+ 1 .:::;; 0 the vectors c1 = (C1,C2,’" ,Cn"O,O, )T and c j+ 1 = (0,0, ... ,0,c nj+ 1, cnj+ 2, ... ,cnj+1,0, 0, ... )T for j = 1,2, satisfy cJD(a)c j .:::;; as does any linear combination c = aCi + bCj (see Fig. 3). In order to motivate how numerical Euler-Lagrange equations such as (8) are obtained, especially for higher-order problems, we have Figs. 3a and 3b. In Fig 3a the products j; are obtained as follows. The first n - 1 entries are zero by (8); f" = -dn.n+1Cn+1 since f" + dn.n+1Cn+1 = by 8(c) with k = n; fn+ 1 = d n+ l.ncn is evident; as is fn+k = for k > 1. In Fig. 3b similar logic holds. The first assertion about D(a)c holds with n = N from Fig. 3a. For j = 1 and Ck,C k , + 1 .:::;; we have ciD(a)c 1 = «J; = - en. n+ 1 CnC n+ 1 < since en, n+ 1 < from (7b) for a small. Similarly for Cj (j = 2,3, ... ) we have cJD(a)c j = (- d; + 1, nCn)C n+ 1 + (- dm. m+ 1 Cm+ 1)cm .:::;; 0. Finally the assertion about cholds for if Ii - jl > 1, cTD(a)c = a2[ciD(a)ci] + b2[cJD(a)cJ + 2ab[cJD(a)c j ] .:::;; since the last term is zero. The reader may verify the same result when
3.2
s.,
dl,z
dz,l
d2,2 d Z, 3
93
The Numerical Problem
o
o dll dn
o
o o
«.«.
1 n- 1
s.,
n- 1
-d n, n + 1 Cn + l
C’n
dn,n+l
dn+ 1 , n dn + 1 , n + l
dn + 1 , n + 2
o
o o o
dn + 1 , nC"
o o
o 0
dn,n-’
d". n+
0
..
o
1
«.;».
o
0 Cn + 1 Cn + 2
Cm - 1
em 0
o
0
Fig. 3
d",n+1
Cn + l
-d n + 1, 0
nCn
0
-d m, m + 1Cm + 1 dm + 1 mcm 0 0
This construction demonstrates a sufficient condition that there exist negative vectors for the finite-dimensional quadratic form (6), as well as the continuous quadratic form lex) in (2) by assuming peat) s; pet) and q(at) ~ q(t) for a k S; t < ak+ l : We have not (yet) proven this result nor the fact that weobtain a maximal-dimensional negative space. This will be done in Section 3.5 where we prove Theorem 2. To summarize these results we have Theorem 2 Define d k k and dk+ 1. k = d k , k+ 1 as in (7) and {cJ by (8). Let xo(t) be a nonzero solution to L(x) = in (1), xo(a) = 0, and xa(t) = caZit), where C 1 is chosen so that C 1 = xo(a + 0’). Then (4)
lim
a-O
f [x~(t) a
-
x~(t)y
dt
= 0.
The proof of this result is very technical. It involves new ideas in the theory of approximation in that we are approximating the nonpositive part of an operator. Furthermore, 0’ may be chosen relatively large if pet) and
94
3 The Second-Order Problem
q(t) are smooth because our methods depend on construction of negative vectors. As an aside we present in Theorem 3 another new result for the elliptic type matrix D(u) given above. We expect that our results can be generalized to more general banded matrices of this type. Theorem 3 generalizes the well-known result that a symmetric matrix Anx n = (a;) is positive definite if and only ifthe determinants PI’ pz, . . . ,Pn are positive, where Pk is the deter› minant of the principal minor Ak = (a;), where i, j = 1, ... ,k. For these results assume Pr is as described above, that is, let Po = 1, PI = dl,l’ and Pr = dr,rPr-l - d;, r-lPr- Z (r = 2,3,4, ... ). Theorem 3 The number of agreements in sign of two successivemembers of the sequence {Pr} is equal to the number of eigenvalues of D(u) that are greater than zero. Similarly if nl> n z,. . . ,n j" .. and cl , cz, . . . ,Cj" .. are the positive integers and vectors described under condition (8), then the njth principal minor of D(u) has j eigenvalues less than or equal to zero and cj is an approximate zero eigenvector of D(u) in the sense described above. To find the eigenvalues and eigenvectors of D(u) we would construct the new tridiagonal matrix D~(u) = D(u) - 0, where ~ is any real number and apply Theorem 3 to this new problem. Note that this procedure would give a great deal of information for numerical problems. It is evident that with the speed of modern computers, these methods are both accurate and ex› tremely fast. We now begin the second part of this section. The reader should already understand the heuristic ideas of what is happening. The purpose of this part is to present our ideas in a formal mathematical style. A second reading may also make these ideas more understandable. We now introduce the resolution parameter A.. This second parameter allows us to consider the numerical problem on the interval [a, b] as a problem on subintervals [a, b’], where a < b’ < b. The major theoretic results are in characterizing the indices s(A., u) and n(A., o) below and deriving the main result given in (4). For con› venience of presentation we repeat some ideas or conditions contained above. The omitted proofs are found in Section 3.5. Once again we assume the quadratic form given in (2) or the bilinear form (9)
J(x, y) =
L[p(t)x’(t)y’(t)- q(t)x(t)y(t)] dt. b
For convenience we assume p(t) > 0 and q(t) are continuous functions on a ::;; t ::;; b although these restrictions may be relaxed. Let a denote the arcs x(t) that are absolutely continuous on A = [a, b] and have square integrable derivatives x’(t). The norm, = (x, X)l/Z on the Hilbert space .xl is defined from
IIxll
(10)
(x, y) = x(a)y(a) +
Lb x’(t)y’(t)dt.
3.2
The Numerical Problem
95
Let :l: denote the set of real numbers of the form (J = lin (n = 1,2, ... ) and zero. The metric on E is the absolute value function I(J 2 - (J 11. Let (J = lin, define the partition n«(J)
= (ao = a <
a 1 < a2’ .. < aN+
1 ::;;
b),
where we assume without loss of generality that b = aN + 1 and ak = k(b - a)/(N
(11)
+ 1) + a
(k = 0, ... ,N).
If (J = 0, let d(O) denote the functions x(t) in the subset of d such that x(a) = x(b) = O. If (J =I- 0, the space d«(J) is the set of continuous piecewise› linear functions x(t) = baz a with vertices at n(O’). That is, if (J = lin, then (5)
Zk
for k (n
()_{olt -
= 1, 2, 3, ...
l
if tin [a k- 1,ak+1] otherwise
t - a kI/ O’
= baza(t)
and x(t)
is in d(O’). The N "hat" functions zn(t)
= 1,... ,N) form the basis of d(O’). They are usually referred to as splines
of degree 1. The author "discovered" them in 1968, in order to obtain certain minimum properties, 22 years after their appearance in Schoenberg [48]. For each Ain A let J’t’(A) denote the arcs x(t) in d satisfying x(a) = 0 and x(t) == 0 on [A, b]. If /l = (A,O’) is in the metric space .A = A x L with metric d(/ll’/l2) = IA2 - All + 10’2 - 0’11, let 81(/l) = d(O’) 11 J’t’(A). Thus an arc x(t) in ~(A, (J) is a piecewise-linear segment on [a, ak], where a k s A < ak+ 1 such that x(a) = 0 and x(t) == 0 on [ak,b] (see Fig. 4). For /l = (A, u) let p,,.(t) and q,,(t) be defined as in Theorem 1 above then J(x; /l) = J(x, x; /l), where (12)
J(x, Y; /l) =
f:
[pAt)x’(t)y’(t)- q,,(t)x(t)y(t)] dt
is defined for arcs x(t), y(t) in 81(/l). We have shown above by a straightfor› ward calculation that J(x; /l) = babada,l/l) = xTD(/l)x, where x = baz a is in
/
’/ ’/
/
’/
/ ’/ /
I
\
’/
/
\
’/
’/
\ \
/ ’/
x.. ,Ctl......\, I
/ /
,
\
’/
\
\
,,
\
\
00
1
i_1
OJ Fig. 4
i+1
96
3 The Second-Order Problem
iJ6(p) and D(J1) is a symmetric, tridiagonal matrix "increasing" in Je so that the upper k x k submatrix of D(aH I> (J) is D(ak> (J). We now give some important definitions. We assume that the (general) definitions of signature, nullity, Q-null vectors, and Q orthogonality are known. For each p = (Je, (J) in M let s(p) = s(Je, (J) and n(p) = n(Je, (J) denote the index and nullity of the quadratic form J(x; p) given in (12) if (J =I- 0. Let s()" 0) and n(Je, 0) denote the index and nullity of the quadratic form J(x) given by (2) on .1f(Je). Let (J in L be given. A point Je((J) at which s()" (J) is discontinuous is an oscillation point of J(x; (J) relative to {.1f(Je) Je in A}. The oscillation points are denoted by }’m((J), m = 1,2,3, .... In Theorem 2 our construction of a numerical solution yields the oscilla› tion (conjugate) points as a biproduct. However, the problem of determining (numerical) oscillation points is of interest in its own right for higher-order problems and for the insight we obtain. We begin this problem by citing a lemma which may be found in Gourlay and Watson [13, p. 81] with}, = 0 in the reference. It involves a Sturm-sequence argument for polynomials. We assume (J small enough so that d""" > 0 and d", ,,+ 1 < O. We define recur› sively: Po = 1, PI = d ll , . . . ,
I
(13)
Pr = dr,rPr-l - d;,r-lPr-Z
(r = 2, 3,4, ... )
and note that Pr is the determinant of the upper r x r submatrix of D((J), which we denote by D(r)((J). In Theorem 4, result (d) is proven in Lemma 1 of Section 3.5. It is included here for completeness.
Theorem 4
The following nonnegative integers are equal:
s(ak+ I> (J) + n(ak+1’ (J), k - l(k), where l(k) is the number of agreements in sign of {PO,Pl’ Pz,• .. ,Pk} given by (13), (c) the number of nonpositiue eigenvalues of D(k)((J), and (d) the number of times the vector c((J), defined below, "crosses the axis" on the interval [ao, ak+ 1]. (a) (b)
Theorem 5 There exists b > 0 such that if (J < band aH 1 is not an oscillation point of (2), i.e., xO(aH 1) =I- 0, where Xo is a solution of (2) such that xo(a) = 0, then the nonnegative integers in Theorem 2 are equal to: (e) (f)
the number of oscillation points of (2) on (a, aH 1), and s(ak+I>(J) = s(ak+l’O) + n(ak+I>O).
We note that the calculations of d"." and d". a+ 1 in (8) and the number of sign changes of p, in (13) allow us to determine the number of (numerical) oscillation points of (1). By Theorem 2 the mth oscillation or conjugate point
3.2
The Numerical Problem
97
Am(cr) is a continuous function of a if Am(cr) < b. Thus we can construct the numerical oscillation vector which satisfies Theorem 2 and yields the oscilla› tion points. In a sense Theorem 4 describes an eigenvalue problem, while Theorem 2 leads to the construction of the eigenvector and eigenvalue. For completeness, we now state some ideas that may be evident to the reader. Given the matrix D(cr) = (dap ), define a sequence (c1 i= 0) {C 1,C2, C3, } of real numbers as in (8). Assume a small enough so that d r r > 0 and dr r+ 1 < O. Given the sequence of numbers {c.} defined in (8), let x,,(t) = caza be the spline of degree 1 (broken line segment) such that XAak) = ci, The vector x,,(t) is the Euler-Lagrange solution of D(cr) in the sense that the number of times it crosses the axis is the number of negative eigenvalues (see Theorem 4). Furthermore, we have x,,(t) - xo(t) in the strong derivative sense described above in Theorem 2. The next theorem is obtained by noting as we have above, that the product of D(a) with vectors of the form Xl = L~~ I CkZk and X2 = L~:;'nl + I CkZ k is almost the zero vector because of (8). In fact cI D(cr)c 1 = - cn,+ 1 cn,dn,. nl + I and c1D(cr)C2 = -cn,Cn,+ldn,+l,n, - cn2cn2+1dn2.n2+1’ These results are obtained by "visualizing the effect of D(a) on the given vectors (Fig. 3). Hence our remark about the Euler-Lagrange equation of tridiagonal matrices. Theorem 6 must be modified in the obvious manner if c, = 0 for some value of I. Theorem 6 If C,C’+1 < 0 for exactly the values 1= nb n2,"" then the vectors c1 = (c., C2, ... ,cn" 0, 0, 0, ... )T, C2 = (0,0,0, ... , cn,+ 1, Cn,+2" .. , Cn2’0, 0, 0, ... f, etc., are negative vectors for D(cr) in the sense that c{D(a)ci < 0 as is any nonzero linear combination of {Ci}’ and conversely. Since C,C/+ 1 < 0 and - dn n + 1 > 0 and "if" part holds. The converse holds by Theorem 2. Finally we remark that the proof of Theorem 2 relies on the fundamental a hypothesis of Section 2.3. Historically, Theorems 4 and 5 were obtained from the fundamental signature approximation inequalities, the algorithm given by (8) was verified by computer runs, the motivation of null and negative vectors was established, and finally the theorems leading to Theorem 2 listed in Section 3.5 were obtained by the author. These theorems demonstrate a new type of convergence which is very different from the usual ideas and methods. They provide a further justifi› cation for the Hilbert space approach of Chapter 2. The third part of this section involves the numerical construction of the solution in (1) using (7), which generates the elements of D(a), and (8) which allows us to solve recursively for {cd. In addition to the strong convergence result given in Theorem 2, we believe that our methods are better than the usual difference methods, since they approximate integrals that involve a
98
3 The Second-Order Problem
smoothing process and not the usual approximation of derivatives by differences. In this section we shall describe two methods of solution, namely, a "dynamic" algorithm for initial-value problems and a "relaxed" method for boundary-value problems. The former method is similar to conventional methods of ordinary differential equations in that our solution at ak+Z depends upon the values at ak and ai ; l ’ The latter method assumes given boundary conditions such as x(a) = x(b) = 0 and computes the solution by relaxation methods. The dynamic method has in fact been essentially described above and given by Eqs. (8). For the usual examples we desire a one-dimensional solution with x(O) = O. This can be done by choosing Cl to be some desired constant or by choosing C1 = 1, and then solving recursively for Ck by Eq. (8). If we want an alternate initial condition such as x’(a) = 0, we begin with k = 0, Co = 1, c, 1 = C 1 in (8c).This gives Co = 1, Cl = - codo,o/(do,-1 + d O,1), and Ck defined recursively by (8) for k = 2, 3, 4, ... , For boundary-value problems such as x(a) = x(b) = 0 we used a relaxed or Gauss-Seidel method on the matrix D(a) = (d"p) to solve D(a)c = O. We began with an initial guess for the vector c, which we call c(Ol. Let c(n) be the value of the vector c = (CO,C1"" ,cN+d T after n iterations and C~~1 = 0 for all n. Then from (8c) we have (14) 4n+ 1) = -(c~nLdk,k+1 + 4n! / ldk, k_l )/dk.k (k = 1,2,3, ... ,N).
In our test cases of relaxation, described in more detail below, we observed an interesting phenomenon that agrees with the theory of relaxation. To be specific, we assume our setting is such that a = 0 and the first conjugate point (focal point or oscillation point) is at b = n. If aN + 1 < n, the matrix (d"p) is positive definite and hence the iteration (14) gives vectors c(n) that converge to c(oo) = 0 as expected. The convergence is "monotone" in that we have numerically verified that 0 < aN, < aNz < n implies that {c(n)(NI)} converges to c n, the matrix (from our theory) has a negative eigenvalue and the iteration (14) diverges. As above, the divergence is monotone in aN’ If aN is close to n, the divergence is slow, while ifn < aN, < aNz’ the divergence of {c
3.2
The Numerical Problem
99
runs of this constant coefficient case (test case 1) are slightly better than the equation described in the next equation. Our second equation is the dif› ferential equation [(2 + cos t)x’(t)]’+ (2 + 2 cos t)x(t)
= O.
The reader can verify that sin t is "the" solution that vanishes at t = O. This equation provides coefficient with reasonable change in values for t in [0,271:]. Figure 5 is a listing of our dynamic or initial value program. The program can be easily modified to give the solution cos t to the initial problem with IMPLICIT REAL"S (A-H,O•Z) DIMENSION C(3) C NUMBER OF DIVISIONS OF INTERVAL ARE READ IN, IF SaO STOP 100 READ (5,150)S 150 FORMAT (010.4) IF(S.EQ.O.DO) GO TO 401 XS= S"7.DO N-IDINT(XS) S=1.DO/S B=S/Z.DO BB=S+S/Z.DO C EI-DIAGONAL ELEMENT, E2 AND E3 ARE OFF-DIAGONAL ELEMENTS EI=DCOS(B)+DCOS(BB)+4.DO-S"S"(Z.DO"DC05(B)+2.DO"DCOS(BB)+4.DO) "/3. DO EZ=-DCOS(BB)-2.DO-S"S"(DCOS(BB)+I.DO)/3.DO C SOLUTION IS NORMALIZED TO 5INX C(1)=D5IN(S) C(Z)=-C(I)"EI/E2 WRITE (6,200) S,e(l) zoo FORMAT(T2,’STEP SIZE =’,F11.S,TZ5,’C(1) -’ ,F16.14) WRITE (6, ZOl) ZOl FORMAT(’l’) WRITE (6,202) 202 FORMAT(T2,’X VALUE’ ,T20 , ’ALGORITHM VALUE’ ,T45,’ACTUAL VALUE’, *T70,’DIFFERENCE’) DO Z50 I=3,N T-I B=(T-Z.DO)"S+S/Z.DO BB=(T-1.DO)"S+5/2.DO E1=DCOS(B)+DCOS(BB)+4.DO-S"S"(2.DO"DCOS(B)+Z.DO"DCOS(BB)+4.DO) "/3.DO E3=-DCOS(BB)-2.DO-S"S"(DCOS(BB)+1.DO)/3.DO C C(3) REPRESENTS THE I+Z DYNAMIC VALUE C(3)=(-C(1)*E2-C(2)"E1)/E3 E2-E3 M=N/100 C(l)=C(2) C(2)=C(3) IF (MOD(I,M)) 250,210,250 210 Y=I X-Y"S Z=DSI N(X) W=Z-C(3) WRITE(6,225) (X,C(3) ,Z,W) 225 FORMAT(T2,FS.5,TZO,F16.14,T45,F16.14,T70,D22.14) 250 CONTINUE WRITE (6,300) 300 FOR~lAT (’1 ’ ) GO TO 100 401 STOP END
Fig. 5 Listing of program for initial-value problem (p(t)x’)’ + q(t)x cost and ’lit) = 2 + 2eost.
=0
with p(t)
=2+
100
3 The Second-Order Problem
X’(O) = 0 instead of x(O)
= 0, as described above. Figure 6 is a listing of our relaxation or boundary-value program with x(O) = x(n) = O. We note that this program is in single-precision arithmetic since each new iteration using (14) can be thought of as the first iteration with better initial data. Dynamic runs are in double-precision arithmetic. Table 1 is our test case with a run of our dynamic program with size (J = 1 is. Table 2 is a summary of runs of our dynamic program with different step sizes. Table 3 is the test case of our relaxation method with step size s = n/l00 In this run we initialized three values at 1, the remaining 98 values at 0, which is certainly "far removed" from the correct value of sin t. After 1000 steps we normalized the solution so that x(n/2) = 1. The differences of normalized values is given as !(Xk) = sin x, - Ck/C51 since C51 is our largest value. DIMENSION EU(lOl) ,EO(lOl) ,C(lOl) ,F(lOI) P(X)=2.0+COS(X) R(X)=Z.O+Z.O*COS(X) N=lOl NN=N-l C NU-NUHBER OF DIVISIONS OF THE INTERVAL S=3.l4l59Z654/NN DO 11 I=l,N 11 C(Il=o.O C(ZO)=l.O C(45)=1.0 C(30)=1. 0 DO SO K-Z,NN AK=K*S-S/2.0 AL=AK••S ED(K)=P(AL)+P(AK)-S*S*(R(AL)+R(AK))/3.0 EU(K)--P(AK)-S*S*R(AK)/6.0 50 CONTINUE DO 100 ITER=l,5000 DO 99 K=Z ,NN C GAUSS-SEIDEL ITERATION STEP 99 C(K)=-(EU(K-l}*C(K-l)+EU(K)*C(K+l))/EO(K) M-MOD (ITER,IOOO) IF(M.NE.O) GO TO 100 WRITE (6,75) ITER 75 FORMAT(TZ, ’NUMBER OF ITERATIONS-’ ,IS) WRITE (6,76) 76 FORMAT (TZ,’ X-VALUE’ ,TZO, ’ACTUAL-VALUE’ ,T4s.’ ALGORITHM› *VALUE’,T70,’DIFFERENCE’,T90,’NORMALIZER’) DO 98 J=l,N 98 F(J)=C(J)/C(5l) DO 70 J-l,N,5 X=(J-l)*S Z-SIN(X) D-Z-F(J) WRITE(6,80) (X,Z.F(J) ,D,C(5l)) 80 FORMAT(T2,F8.5,TZO,F16.8,T45,F16.8,T70,E16.8,T90,FI6.8) 70 CONTINUE WRITE (6,110) 110 FORMAT(’l’) IF (ITER.NE.lOOO) GO TO 100 C RE-INITIALIZE SOLUTION TO SIN X 85 C(L)=F(L) 100 CONTINUE STOP
END
Fig. 6 Listing of program for boundary-value problem (p(t)x’)’ + q(t)x 2 + cost and q(t) = 2 + 2eost.
=
0 with p(t) =
Table 1
Test Case of the Initial-Value Problem (p(t)x’)’ + q(t)x
=
0"
X value
Algorithm value
Actual value
Difference
0.0625 0.3125 0.5625 0.8125 1.0625 1.3125 1.5625 1.8125 2.0625 2.3125 2.5625 2.8125 3.0625 3.3125 3.5625 3.8125 4.0625 4.3125 4.5625 4.8125 5.0625 5.3125 5.5625 5.8125 6.0625 6.3125 6.5625 6.8125
0.062459 0.307438 0.533303 0.726009 0.873577 0.966830 0.999971 0.970940 0.881540 0.737331 0.547279 0.323199 0.079026 -0.170060 -0.408571 -0.621679 -0.796132 -0.921085 -0.988769 -0.994976 -0.939320 -0.825263 -0.659894 -0.453497 -0.218904 0.029299 0.275681 0.504923
0.062459 0.307438 0.533302 0.726008 0.873574 0.966826 0.999965 0.970931 0.881529 0.737318 0.547264 0.323184 0.079010 -0.170076 -0.408588 -0.621696 -0.796150 -0.921104 -0.988787 -0.994993 -0.939334 -0.825272 -0.659899 -0.453497 -0.218898 0.029310 0.275696 0.504941
-0.5420-09 -0.6800-07 -0.3860-06 -0.1110-05 -0.2330-05 -0.4050-05 -0.6170-05 -0.8500-05 -0.1070-04 -0.1270-04 -0.1420-04 -0.1520-04 -0.1580-04 -0.1630-04 -0.1690-04 -0.1760-04 -0.1830-04 -0.1860-04 -0.1810-04 -0.1650-04 -0.1360-04 -0.9740-05 -0.4860-05 0.5690-06 0.6100-05 0.1120-04 0.1530-04 0.1810-04
"Step size = rh and p(t)
= 2 + cos t and q(t) = 2 + 2 cos t. Table 2
Error Range of Solution Values of the Initial-Value Problem (p(t)x’)’ + q(t)x = 0" Step size
-f2 -h1
rra 1 no 1
5T2"
nh:4 1
J01j(j
41fmr
Maximum error
Minimum error
0.3010-03 0.7610-04 0.1900-04 0.4750-05 0.119D-05 0.2950-06 0.7980-07 0.285D-07
0.6610-07 0.2070-08 0.5430-09 0.1650-09 0.4510-10 0.1180-10 0.3170-11 0.8230-12
a p(t) = 2 + cos t and q(t) size = (J on the range 0 - 2n.
=
2
+ 2 cos t with step
= 2 + cos t and q(t)
-0.477E-03 -0.905E-03 -0.124E-02 -0.146E-02 -0.154E-02 -0.147E-02 -0.125E-02 -0.914E-03 -0.482E-03 0.0 O.471E-03 0.900E-03 0.124E-02 0.148E-02 0.157E-02 0.150E-02 0.128E-02 0.929E-03 0.486E-03 0.627E-06
2000 iterations (normalized value = 1.00609)
= 2 + 2 cos t with step size =
0.15708 0.31416 0.47124 0.62832 0.78540 0.94248 1.09956 1.25664 1.41372 1.57080 1.72788 1.88496 2.04204 2.19911 2.35619 2.51327 2.67035 2.82743 2.98451 3.14159
p(t)
-0.110E-Ol -0.209E-Ol -0.286E-Ol -0.336E-Ol -0.352E-OI -0.335E-Ol -0.285E-Ol -0.207E-Ol -O.109E-Ol 0.0 0.110E-Ol 0.211E-Ol 0.293E-Ol 0.347E-OI 0.367E-Ol 0.351E-Ol 0.299E-0l 0.217E-Ol 0.114E-Ol 0.627E-06
X value
a
1000 iterations (normalized value = 0.05646) -0.607E-05 -0.120E-04 -O.211E-04 -0.312E-04 -0.390E-04 -0.404E-04 -0.345E-04 -0.252E-04 -O.144E-04 0.0 0.542E-05 0.971E-05 0.14OE-04 0.2118-04 0.237E-04 0.219E-04 0.413E-04 0.780E-05 0.351E-05 0.627E-06
4000 iterations (normalized value = 1.00423)
n/l00 on the range O-n.
-0.289E-04 -0.556E-04 -0.807E-04 -0.I00E-03 -0.111E-03 -0.108E-03 -0.920E-04 -0.658E-04 -0.341E-04 0.0 0.252E-04 0.455E-04 0.645E-04 0.825E-04 0.889E-04 0.8548-04 0.702E-04 0.483E-04 0.246E-04 0.627E-06
3000 iterations (normalized value = 1.00529)
-0.476E-05 -0.965E-05 -0.168E-04 -0.255E-04 -0.329E-04 -0.340E-04 -0.290E-04 -0.202E-04 -0.102E-04 0.0 0.536E-05 0.822E-05 0.125E-04 0.191E-04 0.212E-04 0.198E-04 0.136E-04 0.762E-05 0.351E-05 0.627E-06
5000 iterations (normalized value = 1.00318)
Differences of Normalized Values of Solution to the Boundary-Value Problem (p(t)x’)x’ + q(t)x = 0-
Table 3
3.3 The Eigenvalue Problem
103
Since most of our runs were performed on a terminal-time-sharing arrangement, exact run time is not possible. All the runs given in Table 3, including massive amounts of output and compiling time, took 17 sec. Hence our time for the test case in Table 1 would be a maximum of 1 sec. Similarly the runs in Table 3, where we iterated 5000 times, required 24 sec.
3.3 The Eigenvalue Problem In this section we treat the second-order eigenvalue problem. The differen› tial equation and boundary conditions are (1)
L(x;~)
=
[p(t)x’]’ + q(t)x(t)
+ ~r(t)x(t)
with associated quadratic form H(x; ~)
(2a)
J(x) =
f:
= J(x)
= 0, x(a) = x(b) - ~K(x),
=0
where
[p(t)X’2(t) - q(t)x 2(t)] dt
and (2b) We assume p(t) > 0, d, A = [a,b], J’l’()’)and the continuity conditions on p(t), q(t), and r(t) are as in Section 3.1. There are two main ideas in this section. The first idea is that eigenvalue problems are parameter-comparison prob› lems of the equations and forms in Section 3.1. That is, we compare the elliptic form J(x) with the compact form K(x). The second idea is that the approxi› mation (J theory of Section 2.3 includes the eigenvalue theory. We use an elementary example to illustrate our ideas, which we call Example 0. In (1) let p(t) == l,q(t) == O,r(t) == l,a = O,andb be some arbitrarily fixed positive number. If b = n for example, we verify directly that the pair (~n, xn(t» with ~n = n 2 and xit) = sin nt satisfies (1) since (sin nt)" + n 2 sin nt = and xn(O) = xn(n) = 0. This, in fact, is what is done in many textbooks. Eigensolutions appear for this simple problem with no concept of how they are computed or their qualitative or quantitative behavior. For our purposes there is a more practical, workable, equivalent definition of eigenvalues, which we will now illustrate. For Example 0, the idea is that any ~ > and positive integer k, coupled with the initial condition x(O) = 0, defines a one-dimensional solution family, span {x~(t)}, and a point t~ such that x~(t) satisfies x" + ~x = 0, x(O) = x(t~) = 0, and vanishes k - 1 times on the interval (0, t~). We call these k points on (0, t~] oscillation points. As we shall see below, the fact that K(x) is positive implies that H(x; ~) is strictly decreasing in ~ and that oscillation points oft = move (continuously)
104
3
The Second-Order Problem
e.
e
to the left toward t = 0 with increasing Increasing is illustrated by the change from Fig. ld to la in Section 3.1. In Fig. I we have pictured the level curves in the (A, e) plane for the signa› ture S(A, e) of Example O. The signature is the negative dimension of the quadratic form H(x; e)
=
f:
[X’2(t )
- ex 2(t)] dt
on the space of functions vanishing at t = 0 and on the interval [A, b]. We ask the reader to try to understand this picture as clearly as possible. A major benefit of understanding Figs. I and 2 is that our approximation theory with an additional parameter a yields figures which are slight perturbations of Figs. 1 and 2. Thus, if we think of Figs. 1 and 2 as being associated with a fixed value a 0 of a parameter a in a metric space (I:, p), for pia, a 0) sufficiently small, the a pictures of Figs. 1 and 2 are a slight perturbation of that of a o › In particular, if a is the numerical parameter of Section 3.2, the reader may anticipate a numerical-eigenvalue-oscillation theory. If So = S(Ao, eo, ao) and n{}co,eo,ao) = 0 are the signature and nullity for a point (Ao,eo) in Figs. 1 or 2b, there exists t5 > 0 such that if p., e) is a point in the associated
9
4
TT TT
3 ’2
TT
2Tr
Fig.!
311
3.3
The Eigenvalue Problem
105
(a)
3 2
~I
(b)
t
b:C,
a ~-~r_2
- s1.i’: ~r_3 ""’/1’ ....
...... \1/",
:.~i~,Fig. 2
o figure and p(uo, o) + lAo - AI + I~o - ~I < b, then S(A,~, o) = So and n(A, ~, c) = 0. For Example we havefor k = 1,2,3, ... , level curves r k = {(A, ~)IS(A,~) = k - 1 and n(A,~) = I}. The curve r k separates the regions 0k-l and 0b where Ok = {(A,~) IS(A, ~) = k - 1 and n(A,~) = O}. We shall show below that Ok is an open connected set. For each Aoin (0, bJ there exists a unique point of intersection of J. = AO with each r k (k = 1, 2, 3, ...). This point is the "unique" solution to our eigenvalue which we designate as (}'O'~k(AO)) problem in the sense that there is a nontrivial solution Xk(t) which satisfies x"(t) + ~~(t) = 0, which vanishes at t = and t = Ao and at k - 1 points in the interval (0, Ao). The wording "unique" indicates that any nonzero multiple of xk(t) has the same property. Thus this eigensolution has k oscil› lation points. Note that the eigenvalue problem has an infinite number of solutions gk(Ao)Ik = 1,2,3, ... } for any AO > with the eigenvalues having a limit point at 00.
106
3 The Second-Order Problem
Conversely, for each ~o > 0 and b finite, we shall show that the line ~o intersects only a finite number of curves r 1 , r z , ... , r ko ’ where we allow ko to be zero if b is small, i.e., s(b, ~o) = O. If ko = 0, this set of curves is empty by convention. The intersection of the line ~ = ~o with k (k = 1, ... ,k o) determines a point (A'k(~O)' ~o), where Ak(~O) is the right-hand boundary condition on the eigenvector Xk(t). That is, Xk(t) satisfies the dif› ferential equation x" + ~ox = 0 and x(O) = X(Ak(~O)) = O. Note that )'k(~O) < Ak+ 1(~0) with the kth eigensolution xk(t) vanishing k - 1 times on the interval (0, Ak ( (T 0)). Finally we remark that if b were allowed to become infinitely large, for each ~o > 0 the value of ko becomes infinitely large. Thus we have a duality between eigenvalue and oscillation problems. If(Ao, ~o) is a point on the curve r b the quadratic form H(x; ~o), with b = )’0’ has a (k - I)-dimensional negative space. It has a null vector xo(t) that is a solution of the equation x"(t) + ~ox(t) = 0 such that xo(O) = Xo(Ao) = 0 and xo(t) vanishes k - 1 times in the interval (0, )’0)’ In the general case of (1) or (2), the picture is modified as follows. If r(t) ~ 0 and q(t) ~ 0, the curves are essentially as pictured in Fig. 1. Ifr(t) ~ 0 the curves are as pictured, except that some eigenvalues may be negative; that is, the curves may be pulled down in the picture. Thus for example if ~
=
r
H l(X; ~) =
f: [X’2(t)-
x 2(t) - ~ lX2(t)] dt,
where q(t) = 1 in (2a) then the eigenvalue ~ of H(x; ~) is related to ~ 1 by ~ = 1 + ~ 1 or ~ 1 = ~ - 1 so that the line ~ = 1 becomes ~ 1 = O. To complete our picture in the general case of (1) or (2) we must assume the Hestenes condition on J(x) and K(x), that is we assume that (3)
x =F 0,
K(x) ~ 0
implies
J(x)
> O.
With this condition we may have a two sided graph (Fig. 2a) of s(~) for each fixed A in (a, b]. In this figure we graph S(A,~) versus ~ for A fixed. We shall show that there exists ~* independent of Asuch that J(x) - ~* K(x) is positive definite on Ye(b) and hence on any Ye(A). Figure 2b is the picture of a general problem with (3) satisfied. In Example 0, ~* may be chosen to be zero and ~ -10 ~ - 2, ~ - 3, . . . do not exist since K(x) ~ O. The values ~1o ~2' ~3' . . . of Fig. 2a are found by intersecting the line A = AO with r 1, r 2, r 3, . . . in Fig. 2b. We begin our formal treatment of eigenvalues by focusing on the signa› ture of appropriate quadratic forms. This will lead to the eigenvalue results given below. An understanding of Fig. 1 or 2 should allow the reader to translate the signature results to eigenvalue results. For exposition purposes, we take a minor detour in Theorem 1 to prove a less general result than is possible. Thus assume the quadratic form K(x) in (2b) is positive, (Jl is a fixed subspace of d, and s(~) is the signature of H(x; ~) = J(x) - ~K(x) on
3.3 The Eigenvalue Problem
fJ4 for any real ~. This corresponds to 4(a) where ~ not exist.
1, ~ - 2, ~ - 3, . . .
107
do
Theorem 1 s(~) is a nondecreasinq integral-valued function on (- 00, 00). Furthermore there exists ~* ~ 0 such that ~ ~ ~* implies that see) = o. If q(t) ~ 0, we may choose ~* = 0 since J(x) is positive definite and s(O) = O.
H(x; ~2) - H(x; ~1) = J(x) - ~2K(x) - [J(x) › is a negative quadratic form. Thus H(x; ~2) < H(x; ~d, and if X o satisfies H(xo; ed < 0, then H(xo; e2) < O. Thus S(e2):?: seed since these integers count the respective dimensions of the negative space. Ifq(t) ~ 0, then lex; 0) > 0 for any x(t) not identically zero. The existence of ~* in the general case where r(t) :?: 0 may not be satisfied has been given in Hestenes [27, p. 559]. Intuitively in Theorem 1 we may assume that our hypotheses imply that if ~* is sufficiently small, then -~*r(t) - q(t):?: 0 holds on [a,b]. Hence x i= 0 implies H(x; Pc*) > 0 since Let
~lK(x)]
~1 < ~2 then = (~1 - ~2)K(x)
H(x; ),*) =
f p(t)X’2(t)dt - f: [~*r(t)
+ q(t)x 2(t)] dt > O.
Our formal results are obtained in two parts. Our first development is an approximation-eigenvalue theory in a (e, 0) setting, where ~ is the eigen› value parameter and (J is the approximation parameter. In this case we have hypotheses (1) and (2) of Section 2.3 holding in the (~, (J) setting (they are inherited properties from the (J setting) and hence the fundamental inequalities (5) of Section 2.3 hold in the (e, (J) setting. Examples of this theory are eigenvalue approximation problems involving perturbation of coefficients of (1) or (2) on a fixed space of functions and a numerical-eigenvalue theory on a fixed space of functions. More in keeping with Figs. 3 and 4, the reader may replace (J with the resolvent parameter Pc in which case we obtain the duality theory involving eigenvalues and resolution spaces pictured in Figs. 1 and 2. The second development involves a (e, Pc, (J) setting where the parameters are as represented above. This case is more complicated and involves a resolu› tion of the spaces d«(J) by Y’f(Pc) spaces. That is, {fJ4(Pc,(J) = Y’f(Pc) n d«(J)IPc in [a, b]} is a resolvent of d«(J). In this setting, hypotheses such as (1) and (2) of Section 2.3 are inherited in passing from the (J setting to the (A, (J) setting. In this part we picture Figs. 1 and 2b as associated with a fixed value (J 0 of (J. Values of (J close to (J 0 will be associated with pictures close to Figs. 1 and 2b. We assume that the reader is familiar with the approximating hypotheses (1) and (2) and the fundamental inequality results in (5) and (6) of Section 2.3. It is the extension of this result from the (J setting to the (~, (J) or the (e, Pc, (J) setting that yields the following results. If (J 0 is associated with Fig. 1 or 2b,
108
3 The Second-Order Problem
small perturbations of 0"0 yield a picture which is a small perturbation of Fig. I or 2b. We continue the first development by extending the L theory to the M theory given below. Let L be a metric space with metric p. For each 0" in L let d(O") be a closed subspace of d, J(x; 0") an elliptic form defined on d(O"), and K(x; 0") a compact form on d(O"). Elliptic and compact forms have been characterized in Section 2.1. We assume conditions (1) and (2) of Section 2.3 are satisfied and that a, ~ 0"0’ x, in d(O"r), Xo in d(O"o), x; ~ Xo imply K(xr; O"r) ~ K(xo; 0"0)’ Let M = [R1 X L be the metric space with metric d defined by d(fl1> fl2) =
le2 - ed + P(0"2,0"1)
for any pair of points fl1 = (ehO"l) and fl2 = (e2,0"2) in M. For each fl = (e,O") in M define d(fl) = d(O") and H(x; fl) = J(x; e,O")
= J(x; 0") - eK(x; 0") on the space d(fl). Finally let S(fl) = s(e,O"), n(fl) = n(e, 0"), and m(fl) = m(e,O")
(4)
denote the index, nullity, and sum of the index and nullity of H(x; fl) on d(fl). Theorem 2 Conditions (1) and (2) of Section 2.3 hold with fl replacing 0" and H replacing J. The proof ofthis theorem is given in Section 3.5 and indicates the elegance ofthe theoretical ideas oftwo types of convergence, compactness of quadratic forms and Hestenes’s ideas of ellipticity. Conceptually, this theorem and the next few theorems should be understood, with special emphasis on the generalization from a L setting to an M setting or more generally to an M x A setting as in the second development. Theorems 3, 4, and 5 now follow immediately from Theorem 2. Theorem 3 For any flo = (eo, 0"0) in M there exists b > 0 such that fl = (e,O") and d(/l,flo) < 15, then
(5)
s(eo,O"o) ::;; s(e,O") ::;; s(e,O") Theorem 4
+ n(e,O")
::;; s(eo, 0"0)
if
+ n(eo, 0"0)’
Assume b > 0 has been chosen such that fl = (e, 0"), d(fl, flo) < if d(fl,flo) < b we have
15 implies inequalities (5) hold. Then (6a)
n(e,O") ::;; n(eo, 0"0),
(6b)
n(e,O")
= n(eo,O")
implies
s(..1.,O")
= s(..1.o,O"o)
and
m(e,O") = m(eo,O"o), and and
n(e,O") = O.
3.3
The set {Jl in Mln(Jl)
Theorem 5
=
The Eigenvalue Problem
109
O} is open. The set {Jl in Mln(Jl) =I’
O} is closed. Theorem 6 Let (Join L be given and let So be a nonempty compact subset of g In(~, (Jo) = OJ. Then there exists e> 0 such that ~o in So, and p«(J, (Jo) < s imply (7)
s(~o,(J)
= s(~o,(Jo),
n(~o,(J)
= n(Ao,(Jo) = 0,
where So, is the s neighborhood of So. Let ~* be real and (J 0 in L such that n(~*, (J 0) = s(~*, (J 0) = O. Then there exists e > 0 such that pto; (J 0) < e and I~ - ~ol < e imply n(~, (J) = Corollary 7
= o.
s(~,(J)
We have finally reached a point in our presentation where we may define the word eigenvalue. Our definition generalizes the usual definition of eigen› values in that it is equivalent to the usual definition for second-order differ› ential equation corresponding to Eq. (1), and includes, for example, the abnormal case of optimal control theory as in Chapter 6. Let (Jo in L be given. A real number ~o is an eigenvalue (characteristic (Jo) =I’ O. The number value) of J(x; (Jo) relative to K(x; (Jo) on d«(Jo) if n(~o, n(~o, (Jo) is its multiplicity. An eigenvalue ~o will be counted the number of times equal to its multiplicity. If ~o is an eigenvalue and Xo =I’ 0 is in d«(Jo) such that Jtx«, y; (Jo) = ~oK(xo, y; (Jo) for all y in d«(Jo), then Xo is an eigenvector corresponding to ~ 0 We begin the development of eigenvalues by assuming that J, K, and d are independent of (J, that is, consider a fixed elliptic form J(x) and a fixed compact form K(x) on a fixed space d. Theorem 8 has been given in Hestenes [27]. These results have been illustrated in Fig. 2a. Theorem 8 Assume x =I’ 0 in d, K(x) :::;; 0 implies J(x) > O. Then there exists ~* such that H(x; ~*) is positive definite on d. If ~o 2 ~*, there exists e = e(~o) such that (8a)
s(~)
= s(~o),
n(~)
= 0
for
~o - s
<
~
for
~o
<
<
~o
<
~o
and
(8b) If ~o :::;; (9a)
s(~)
~*,
there exists e =
= s(~o)
+ n(~o),
e(~o)
~
+ e.
such that
n(~)
= 0,
for
~o -
n(~)
=
0,
for
~o
e < ~ < ~o
and (9b)
<
~
<
~o
+ e.
110
3 The Second-Order Problem
If ~* ~' ~ ~
~ ~'
< C;
< t’, then s(C) - s(~') is equal to the number of eigenvalues on < ~' ~ ~*, then s(C) - s(O is equal to the number of eigen›
if~"
values on i" < ~ ~ ~'. If ~* ~ ~' < C, then s(~") + n(C) - s(O is equal to the number of eigen› values on ~' ~ ~ ~ ~"; if C < ~' ~ ~*, then s( ~") + n(~") - s( ~') is equal to the number of eigenvalues on i" ~ ~ ~ ~'. It is instructive to describe the graph of ~ versus s(~) as pictured in Fig. 2a. By Theorem 8 this graph is a step function with a finite or countably infinite number of intervals (or jumps). Each interval has the associated nonnegative integer value s(~). The number ~* is not unique. It may be chosen to be any interior point of the interval on which s(~) = O. Note that s(~) is a nonde› creasing function on (~*, 00) and nonincreasing on (- 00, ~*). It is continuous from the right if ~ < ~* and from the left if ~* < ~. The discontinuities in s(~) are points at which n(~) =I 0; in fact the jump at ~ is n(~) = 1 for second-order equations such as (1). The next step in the development of eigenvalues is to extend the results of Theorem 8 and the picture in Fig. 2a to an approximation theory of eigenvalues. In the following three theorems, if (J were the eigenvalue param› eter ~, we would have pictures corresponding to Fig. 2b. For convenience, we shall denote the kth eigenvalue greater than ~* by ~k' the kth eigenvalue less than ~* by ~ -k’ If (Jo in L is such that Theorem 9 holds, we use the notation ~k((J 0) and ~ _ k((J 0) to describe the respective eigenvalues.
Theorem 9 Let Uo in L be given and assume J(x; (Jo) > 0 whenever x =I 0 in d((Jo) and K(x; (Jo) ~ O. Then there exists 1’/ > 0 such that p(u,uo) < ~ implies J(x; (J) > 0 whenever x =I 0 in d(u) and K(x; o) ~ O. In addition there exists ~* and /j > 0 such that J1 = ().,u), J10 = (~*,(Jo), d(J1,J1o) < /j imply H(x; J1) > 0 on d(J1). Theorem 10 Let Uo in L be given such that J(x; (Jo) > 0 whenever x =I 0 in d((Jo), K(x,(Jo) ~ O. Assume ~' and C(~' < ~") are not eigenvalues of (Jo and there exists k eigenvalues of Uo on (~', ~"). Then there exists s > 0 such that pt«, uo) < s implies there are exactly k eigenvalues of (J on (~', ~"). In fact, if ~n(UO) ~ ~n+ 1(UO) ~ ... ~ ~n+k-l(UO) are the k eigenvalues of (Jo on (C ~"), then ~n((J) ~ ~n+ 1((J) ~ ... ~ ~n+k-1((J) are the k eigenvalues of a on (~', C). Corollary 11 is our final effort in our first development. It is a major result in this presentation. Corollary 11 If the nth eigenvalue ~n((J) (11 = – 1, – 2, – 3, ... ) exists for a = (Jo, it exists in a neighborhood of (Jo and is a continuous junction of (J.
3.3
The Eigenvalue Problem
III
Thus we have completed an approximate-eigenvalue theory. Important examples of these problems are the numerical eigenvalue problems in Section 3.4. In addition, by choosing (J to be A, the resolution parameter, we obtain a focal-point-eigenvalue theory as pictured in Figs. 1 or 2b. We shall skip this task by deriving a more general theory in this section. For completeness, we given Theorems 12 and 13 which are concerned with comparison theorems and eigenvalue problems. These results are com› parison theorems for the respective signatures s(e) and s*(e). For example, in Theorem 12, s*(e) ::; s(e) since scI* c sci. The result (10) holds by Theorem 8. In general the proofs of these results follow the proofs of Theorem 18 and 19 of Section 2.2 and other comparison results. Theorem 12 Let scll be a subspace of sci, J(x) > 0 whenever x#-O and K(x)::; 0, and e* be given as in Theorem 10. Let gi}, gn (i = – 1, –2, – 3, ... ) be the eigenvalues ofJ(x) relative to K(x) on sci and scll, respectively. If the kth eigenvalues et exist (k = – 1, – 2, – 3, ... ), we have
ekl
(lOa)
ek ::; ~t
(k
= 1,2,3, ...)
(k
= -1, -2, -3,
and
(lOb)
~k
2 et
Strict inequality holds for any k (k = – 1, – 2, – 3, null vectors of sci and scli are disjoint.
).
) such that the J(x;
~k)
Theorem 13 Let J I (x) and K I (x) be a second pair of elliptic and compact forms on sci and assume J(x) > 0 whenever x #- 0, K(x) < O. Let H I(X; e) = JI(x) - eKI(x)and assume for any real that H(x; e)::; whenever HI(x; e)::; o. Then there exists e* such that both HI (x; ~*) and H(x; ~*) are positive definite on sci. Let gd, gn (k = – 1, – 2, – 3, ... ) be the eigenvalues of J(x) relative to K(x) on sci and JI(x) relative to KI(x) on sci, respectively. Then inequalities (10) hold. If H(x; e) < 0 whenever x#-O and HI (x; e) < 0, then inequalities (9) hold with strict inequality. .
e
o
To prove Theorem 13, we note that the first hypothesis implies that there exists ~* such that H(x; ~*) > O. Thus H I(X; e*) > 0, for otherwise if Xo #- 0 is such that H I(XO; ~*) ::; 0, then H(xo; e*) ::; O. For the second paragraph, let ~ > e* and Xo #- 0 be such that H I(XO; e) ::; O. By assumption, H(xo; e) ::; O. Thus if s(~), n(~), Sl(~)' and nl(e) are the respective indices, then Sl(~) + nl(~)::; s(~) + n(~) and the result follows. We remark that the condition H(x; ~) ::; 0 whenever HI (x; ~) ::; 0 in› tuitively means that H(x; ~) is "less positive" than H I(X; ~). Thus H(x; ~) > 0 implies H I(X; e) > O. Inequalities (10) hold similarly. The hypothesis of
112
3 The Second-Order Problem
Theorem 13 may be weakened by assuming this condition on H(x; ~) and H l(X; ~) for ~ in any interval containing ~*. To illustrate Theorem 12, let p(t) == 1, q(t) == and r(t) == 1 be defined on the interval [0, n]’ If d is the space of functions vanishing at the end points, then ~n = n 2 , n = 1,2,3, ... , are the eigenvalues with eigenvectors xit) = sinnt. If d 1 is the subspace of arcs satisfying L 1(x) = x(n/2) = 0, then d = d 1 Ef:) E, where dim E = 1. Ifn is even, xit) = sin nt satisfies L1(xn) = O. Ifn is odd, L1(xn):I= 0 so that X n is not in d 1 Hence ~t = 22 , ~~ = 42 , . , ~~ = (2n)2, .. . and ~k = k 2 :::;; (2kf = ~t. We leave to the reader the task of constructing a new picture such as Fig. 3 for the space d 1 in the last paragraph. This effort provides interesting examples for Theorems 18 and 19 of Section 2.2. Thus for Anot (necessarily) are reduced by one or zero in the new picture. equal to n, the values S(A,~) To illustrate Theorem 13, let p(t) == Pl(t) == 1, q(t) == ql(t) == 0, r(t) = 4, and r1(t) == 1 on the interval [O,»], Now
Hl(X;~)
-
H(x;~)
=~
f:(4 - 1)x 2 (t)dt.
If Hl(X;~):::;; 0, then ~ > 0, which implies that Hl(X;~) - H(x;~) ~ or H l(X; ~) ~ H(x; ~) so that 0> H l(X; ~) ~ H(x; ~). Thus the hypothesis of Theorem 13holds. The eigensolutions of L(x) = X" + 4~x = 0, x(O) = x(n) = are ~n = n 2/4, xn(t) = sin nt, 11 = 1,2,3, ... , similar to above. The eigen› = n2,x~(t) = sin nt, solutions of’Ljix) = x" + ~*x = O,x(O) = x(n) = Oare~~ n = 1,2,3, .... Since k2/4 :::;; k2 we have verified (10). We remark that the Rayleigh-Ritz theory of eigenvalues due to Weinstein and Aronszajn (see Gould [12]) is contained in our approximation theory. Our second part is straightforward and involves extending the (~, a) theory above to a (~, A, a) theory where Ais the resolution parameter. That is, {JIl’(A)IA in A} is a one-parameter family described in (7) in Section 2.3. In the more general case we desire an approximation-eigenvalue theory in› cluding the resolvent parameter A, the eigenvalue parameter ~, and the approximating parameter a. In many examples, inequality (11) would be an inherited property from (J to (~,)" (J). Inequality (7) is in fact correct in the (~, A, a) setting, but some care must be taken in the proof. (See Section 3.5.) Thus for the problem defined by (1) and (2), the (~, a) setting leads to an approximation theory of eigenvalues on [a, b]. This problem might include smooth perturbation of the coefficientfunctions or numerical-approximation problems on the interval [a, b]. The addition of the parameter A allows us to determine where, for example, a particular eigensolution vanishes on (a, b) or the focal or conjugate points of an eigenvector solution. We now define the spaces 8U(fJ,) that resolve the space d(a). The basic inequality results are then given, relating the signature s(/1) = s(~, A, a) and 11(/1) = n(~, A, (J) to fixed values s(/1o) and 11(/1).
3.3 The Eigenvalue Problem
113
Let D = 11\£1 X 1\ x L be the metric space with metric d defined by d(/11,/12) = 1~2 - ~11 + IA2 - All + p(0"2,O"d, where J.l1 = (~1,A1'0"1) and /12 = (~2,A2'0"2)' For each J.l = ((, A, 0") in M define H(x; /1) = H(x; (,0") on and n(/1) = n(~,A,O") de› the space P4(J.l) = .>1’(0") x Yl’(A). Let s(J.l) = S(~,A,O") note the signature and nullity of H(x; J.l) on P4(/1). In keeping with our announced philosophy for this chapter, the steps leading to the next theorem will be postponed until Section 3.5. They cul› minate in the following theorem. Theorem 14 For any /10 = ((o,Ao,O"o) in D, there exists 15 > 0 such that ((, A, 0") and d(/10,/1) < 15, then
if /1 = (11)
S((o,Ao,O"o) ~
S(~,A,O")
~
S((,A,O") +
~ s((o, Ao,0"0)
n(~,A,O")
+ n((o, Ao, 0" 0)’
Furthermore n((o, AO, 0"0) = 0 implies s((, A, 0") = s((o, Ao, 0"0) = 0 and n(~,A,O") = 0 whenever d(/1o,/1) < 15. We now begin a discussion of the focal-point problem. For convenience we assume (through the discussion of Theorem 17) that ~ = 0 or that there is no eigenvalue parameter present. Similarly we will use the notation S(A,0") to denote the signature of J(x; 0") defined on the space .>1’(0") r. Yl’(A). Let 0"0 in L be given. A point Ao at which S(A,11 0) is discontinuous will be called a focal point of J(x; 0"0) relative to {Yl’(A)IAin A}. The difference S(AO + 0, 0"0) - S(AO - 0, 0"0) will be called the order of AO as a focal point of 0"0’ A focal point Ao is counted the number of times equal to its order. In the above, S(AO + 0, 0"0) is the right-hand limit of S(A,110) as ), ~ Ao from above. The quantity S(Ao - 0, 0"0) is similarly defined. In the problems of this section we have that S(A - 0,0"0) = S(A, 0") whereas the disjoint hypotheses of Theorem 15 imply s(). + 0, 0"0) = S(A,O"o) + n(A,O"o), where n(A,O"o) is zero or one. Theorem 15 follows from this. Theorem 15 Let 110 in L be given such that X, A" in 1\, a ~ X < A" ~ b imply the J(x; 0"0) null vectors on P4(X,O"o) and P4(A",110) are disjoint. Assume X and A" are not focal points of 0"0 (a ~ X < A" < b) and there exist k focal points of 110 on (X, A"). Then there exists e > 0 such that p(I1,l1o) < s implies there are exactly k focal points of 0" on (X,A"). In fact if An(110) ~ An+1(0"0) ~ ... ~ An+k-1(110) (n = 1,2,3, ...) are the k focal points of 0"0 on (X, A"), then An(O") ~ An+ 1(0") ~ An+k-1(11) are the k focal points of 0" on (X, A"). Corollary 16 Under the above hypotheses there exists s > 0 such that pto, 0"0) < e and a ~ A ~ a + e imply that there exists no focal point A of 0". Corollary 17 Under the above hypotheses the nth focal point An(O") is a continuous function of 0" (n = 1,2,3, ... ).
114
3 The Second-Order Problem
This section concludes with the approximation of the eigenvalue-focal› point problem (or the focal-point-eigenvalue problem). We return to The› orem 14 and its consequences and reinsert the parameter ~. Let (Jo in ~, ~o in [Rl and 110 = (~o,(Jo) be given. A point Ao at which s(~o,)-, (Jo) is discontinuous will be called a focal point of £(x; 110) relative to {£(A)I), in A}. We note that this discontinuity also yields an eigenvalue ~o of J(x; (Jo) relative to K(x; (Jo) on the space d(O"o) n £(Ao). Similarly, Theorems 15, 16, and 17 can be extended to the continuity of the approxima› tion of the eigenvalue-focal-point problem. We restate or generalize Corollary 17, leaving Theorems 15 and 16 to the reader. In this new setting ~iA, (J) is the nth eigenvalue for the ()"O") problem, and )'n(~, 0") is the nth focal point or conjugate point for the (~, (J) problem. We leave to the reader the task of examples in this generalization from our examples above. Pictorially we have a perturbed picture of Figs. lor 2b. Theorem 18 The nth focal point )-i~, (J) is a continuous function of (~, (J) in the metric space R x ~. The nth eigenvalue ~n(A, 0") is a continuous function of (A, (J) in the metric space A x L.
3.4 The Numerical Eigenvalue Problems In this section we treat the numerical single-eigenvalue problem and the complete double-eigenvalue problem. In the former case we continue Section 3.3 and give a constructive algorithm and numerical examples. In the latter case we give both qualitative and quantitative results including a constructive algorithm and examples. As in Section 3.3, the single-eigenvalue problem is to find eigenvalues ~n and the corresponding eigenvector xit) of L1(x; ~) = 0, x(a) = x(b) = 0, or equivalently, extremal solutions of the quadratic form J l(X; ~) where (1)
L 1(x; ~)
= [p(t)x’(t)J’ + q(t)x(t) + ~r(t)x(t)
J l(X; ~)
=
=
and (2)
L[p(t)X’2(t)- q(t)x 2(t)J dt - ~ f: r(t)x 2(t}dt. b
The double-eigenvalue problem is to find eigenvalues (~n,/;n) and the cor› responding eigenvector xn(t) of L 2(x; ~,e) = 0, x(O() = x(f3) = x(y) = (0( < f3 < Y), or equivalently, extremal solutions of the quadratic form H(x; ~,e) where (3)
L 2(x; ~,8)
= [p(t)x’(t)J’ + l(t)x(t)
+ ~q(t)x(t)
+ er(t)x(t) =
3.4 The Numerical Eigenvalue Problems
115
and (4)
H(x;
~,8)
=
f: [p(t)X’2(t)-
l(t)x 2(t) -
~q(t)X2(t)
- 8r(t)X 2(t)] dt.
In the above, p(t) > 0 and p, I, q, and r are assumed to be continuous functions for convenience. By xn(t), we mean a nontrivial solution that spans the one-dimension space of eigensolutions. This solution exists in the double› eigenvalue problem since the extra parameter is "balanced" by the extra boundary condition. In the double-eigenvalue problem we assume q(t) ~ 0 and r(t) ~ O. We shall show that this assumption results in no loss of gener› ality, since (4) may be redefined with this property. We begin this section by discussing two problems. Historically, the study of eigenvalues of compact operators occupies a large part of the theoretical and practical literature. Many of these problems are discussed elsewhere in this book. We note especially Gould [12], which describes the ideas and many applications of the generalized theory of Rayleigh-Ritz methods, due to Aronszajn and Weinstein. Our first example is the Schrodinger equation of the hydrogen atom in polar coordinates. We cannot solve this type of problem (yet) in the manner we should like, since Eq. (5) involves a singular problem in that p(t) = 0 at some points. We expect that the ideas in this book will extend to certain types of singular quadratic forms and differential equations, but this has not yet been done. Assuming that singular methods are satisfactory (that we could build numerical solutions, for example) we would add another chapter to this book. Our purpose in this example is to illustrate how the technique of separation of variables leads to single- and double-eigenvalue problems. Actually, by using some ingenuity we can "solve" (in an approxi› mation and numerical sense) single- or double-eigenvalue problems of many classical singular differential equations such as Legendre’s equation (11) below. This topic will be discussed at the end of this section, after we have developed more ideas and become familiar with techniques. If the potential energy of the hydrogen electron is U = - e21r and its mass is rno, the wave function ljJ(r, 0, » satisfies the Schrodinger equation (5)
Equation (5) is Laplace’s equation which in rectangular coordinate is
116
3 The Second-Order Problem
where I1lj1 = ljIxx + ljIyy + ljIzZ" The equation was transformed into spherical coordinates to facilitate a classical solution using separation of variables. An explanation of the Schrodinger equations can be found in any book dealing with quantum mechanics, for example Joss [33]. In fact this topic forms the cornerstone of many books on quantum mechanics. It is the time› independent (perturbed) Schrodinger-type equation in one-, two-, and three› space variables that we should like eventually to solve numerically and to approximate. Returning to the above equation, let ljI(r, 8, 4J) = X(r,8)$(4J). Then (5) becomes a2x ax 0 (. OX) (6) cD(4J) or2 + -;- cD(4J) a;:- + r2 sin 808 SIll 8 cD(4J)
2
+ si~2
1[1
8 X(r, 8)cD"(4J)]
+
ae
8:;0 (E + ;Z) X(r, 8)cD(4J)
=
o.
Dividing by cD multiplied by the coefficient in (6) of 1"(cD), and separating variables we have _ $"(4J)
(7)
cD(4J)
2
2 X or 2
2
= r sin 8 [0 X
+~
oX r or
8n1mo ( E+ e
+~
+ ~2 _1_ ~
(Sin 8 OX)
r sin 8 08
08
2)] X.
4
Since the left-hand side is independent of 8 and r and the right-hand side is independent of 4J, they must both be equal to a constant, which we call By boundary conditions that we have not stated, we have the eigenvalue problem cD"(4J) + ecD(4J) = 0 with eigenvalue solutions em = m’, where m is an integer. We now set the right-hand side of (7) equal to = m2 and rearrange slightly to obtain the parametric partial differential equation in rand 8
e.
e
2
0 X
or2
(8)
2 ex
1
2
2
8n mo ( + [~ E Setting X(r,8)
=
a (.
1
+ -;- a;: + r 2 sin 8 a8 e
SIll 8
OX)
ae
2
)
+ -;: -
m ] r 2 sin 8 X
=
o.
R(r)0(8) we separate variables once again to obtain
2 r R
2) [R" + ~r R’ + 8n2mo (E + e R] h r 2
= _~
o
2
0)=
(_1_!..- (sin 80’) _ m sin 8 d8 sin 8
(X
3.4
The Numerical Eigenvalue Problems
117
and hence (9)
(m
2
1 d [sine0’(8)] + a - - . - ) 0(8) = -.sm8 d8 sm 28
and (10)
2m
T
2 R’(r) + 8n R"(r) -;:
2
( Ee + 7 ) R(r) - ra2 R(r) = 0.
Note that both equations (9) and (10) are double-eigenvalue problems, while the equation involves a single-eigenvalue problem. Because of the boundary conditions (the beauties of nature) a is also in integer; see Joos [33] for a complete solution. In fact a is of the form a = (k + m)(k + m + 1) or a = l(l + 1),where k is an integer. The substitution t = cos 8 in (9) leads to (l - t 2)x"(t) - 2tx(t)
+ [a - m2 /(1 -
t 2)]x(t)
=
0,
or in self-adjoint form (11)
[(1 - t 2)x’(t)]’ + [l(l
+ 1) -
m2 /(1
-
t 2)]x(t) = 0,
whose solutions are the associated Legendre (harmonic) polynomials. As we remarked above, we cannot solve equations such as (9) and (10) at this point with the same certainty as in the nonsingular cases. We ask the reader to bear with us, however. Mathematically, we could have started with an equa› tion such as (5), which yielded nonsingular ordinary differential equations similar to (9) and (10). We might remark that in fact (10) is in reality a triple› eigenvalue problem, since there exist only certain values for the energy parameter E. As an aside, we might mention that much of mathematical physics in› volves quadratic forms. For example, the equation x"(t) + [u 2 - g(u, t)] x(t) = 0, or less generally when g(u, t) == g(t), where g(t) has a simple pole at t = 0, occurs in scattering theory. Physically, u2 represents the energy of the scattering particle and g(t) is the potential. This is another eigenvalue prob› lem; see Olver [41]. Our second example is the isoperimetric problem involving a quadratic cost functional I(x) and a "quadratic" constraint functional I 1(x). This prob› lem occurs naturally in both the physical and social sciences; for example, as the cost or return on a manufacturing process. In the former case we are interested in minimization problems; in the latter case we are interested in maximization problems. Hestenes [29] in Chapter 2, Section 9, gives a very nice treatment of the necessary conditions of the problem of minimizing I(x)
=
f: f(t, x(t), x’(t)) dt
118
3 The Second-Order Problem
in the class of admissible arcs !J4 joining two given points (a, A) and (b, B), and satisfying a set of integral constraints of the form (’)I = 1, ... ,p).
The basic result is that if xo(t) is nice (normal) and minimizes I on f!8, then there exists a unique set of multipliers Ai>’.. ,Ap such that J(x) = l(x) + AI ll(x) + ... + Aiix)satisfies the basic necessary conditions ofthecalculus of variations. These include the first variational condition J’(xo, y) = for all y such that y(a) = y(b) = 0, the second variation condition, and the fact that xo(t) satisfies d(Fx,)/dt - F; = 0, where F = f + Adl + ... + Apfp. Since in our example J(x) is "quadratic," J’(xo,y) = Jl(xo,y) and necessary conditions are also sufficient conditions. In the above J l(X) = J(x) - A1Bl › ... - ApBp. For our example we shall assume l(x) = Sox’2dt and I I(x) = -I:J. + So x 2 dt = 0, where I:J. = SO sirr’ t dt. Following Hestenes notation we get J(x) = fo" [X’2(t) - )’lX2(t)] dt
+ A1I:J.,
J l(X) = fo" [X’2(t) - A1X2(t)] dt,
where es is the set of all nice functions vanishing at t = and t = n. The solution in this case is our elementary example in the beginning of Section 3.3. That is, xo(t) = sin t. Note that solutions depend on a relationship between the right-hand point b (equal to n in this case) and the value of I:J., as in Section 3.3. Thus, if I:J. n = So sin 2 nt dt, our solution is xo(t) = sin nt (n = 1,2,3, ... ), while there are no solutions for other values of I:J. when b = n. We now begin the numerical (single-) eigenvalue problem. Most of the theoretical ideas have been given in Sections 3.2 and 3.3 so that our emphasis will be nontheoretical, but complete enough to be understood without refer› ring to Sections 3.2 and 3.3.These ideas are developed as pictured in Section 3.3, except that L, and J 1 are given by (1) and (2) of this section. We present new numerical algorithms, methods, and ideas for the eigenvalue problem of second-order differential equations, quadratic extremal problems, and banded symmetric matrices. Computer runs have been very favorable in that our methods are fast, easy to implement, and very accurate. Unlike the situation for ordinary differential equations, there are few alternative feasible methods for eigenvalue problems. Usually these alternative methods are inefficient shooting methods or are only applicable to and illustrated by "perturbations" of special examples. Our methods have no such restrictions and yield better results. As we have indicated in Section 3.3, our basic idea is to redefine the problem so that we are not searching for a parameter ~ in
~.4
The Numerical Eigenvalue Problems
119
order that certain boundary conditions are satisfied, but rather boundary conditions which satisfy a parametrized approximate difference equation. As in to Section 3.2, where we constructed a numerical approximation J(x; 0") of the quadratic form J(x), we shall construct the numerical approxi› mation J J (x; ~,O") of J 1 (x,~) in (2),where 0" is the parameter representing mesh size. This is a finite-dimensional quadratic form J1(x; ~,O") = fjTDJ(~,O")E, where Dl(~'O") = D(O") - ~E(O") is a symmetric, tridiagonal matrix and D(O") and E(O") are the numerical approximations of the respective quadratic forms J(x) and K(x), where J l(X; ~) = J(x) - ~K(x) in (2). We then construct the numerical eigenvector solution, which is the Euler-Lagrange equation of Dl(~' 0"), and give an interval-halving algorithm to find the correct value of ~ to match the right endpoint condition as in Section 3.3. Finally, corre› sponding to Theorem 2 of Section 3.3, if xo(t) is a solution to (1) and xo(t) is our numerical approximation solution normalized so that xo(a) = x~(a), then we have lim
u-+O
[Jb [x~(t)
- xMt)]2 dt
a
+ I~" -
~olJ
= o.
In fact, if p, q, and r are reasonably smooth, the mesh size may be chosen relatively large because of our theoretical ideas, which depend upon negative solutions. Let 0" be small and positive and choose a partition of the interval [a, b], namely, n(O") = (a = ao < al < az < ... < aN+ 1 :::;; b), where a k = a + ka (k = 1,2, ... , N). Without loss of generality we may assume that b - a = (N + 1)0". For each k = 1,2, ... , N, let Zk(t) be the spline hat function of degree 2, namely, if t in [ak-bak+l] otherwise.
(12)
We note that {Zl,Z2"’" ZN} is a basis of the vector space of piecewise› linear functions xAt) that vanish at t = a and t = b. To define the approxi› mating quadratic form J l(X; ~,er), we choose rAt) = r(at), pAt) = p(at), and qAt) = q(at) if ak :::;; t < ak+ 1, where at = ak + a /2. Then
=
(l3a)
J(x; 0")
(l3b)
K(x; 0") =
f f:
[p,,(t)X’2(t) - q,,(t)x 2(t)] dt, r,,(t)x
2(t)dt,
and (l3c)
J l(X; ~,O")
= J(x; er) -
~K(x;
0")
120
3 The Second-Order Problem
define the fundamental quadratic forms. Choosing x(t) = baza(t) to be a vector in d(a) (repeated indices are summed) we have
J 1(x; ~,a)
= Jl(baza,bpzp; (,a) = bab pJ 1(za,zp; (,a),
where J 1 (x, y ; ~,a) is the bilinear form associated with the quadratic form J 1(x; (,a) = J 1(x,x; ~,a). Theorem 1 and the associated calculations follow immediately as in Theorem 1 of Section 3.3.
Theorem 1 J I(X; ~,a) defined by (13) is a quadratic form whoseassociated is symmetric and tridiagonal where matrix Dl(~' a) = (dap) = (eaP) - ~(faP) (eaP) and (faP) are defined below. From Section 2 or by direct calculation we have
(14b)
J(Zk> Zk+ 1; a) =
fa:~+12
[Pa(t)Zk(t)Zk+ 1(t) - qa(t)Zk(t)Zk+ 1(t)J dt
_ -p(at) a (*) a - 6"q ak’
and (14d)
K(Zk> Zk+ 1; a)
=
r.
rQ’(t)Zk(t)Zk+1(t) dt =
~ r(an
In practice we note that our quadratic forms are homogeneous of degree 2; hence we shall normalize (14) to obtain ek,k = aJ(zk; a), ek, k+ 1 = aJ(Zk> Zk+ 1; a), h,k = aK(Zk; a), fk,k+ 1 = aK(Zk,Zk+ 1; a), (15a)
and (15b)
which define the elements of the matrix Dl(~,a). The motivation for our algorithm is as follows. For each fixed real num› ber ~ the Euler-Lagrange equation for Dl(~' a) is given by the vector c that "satisfies" Dl(~' a)c = O. In general, ~ is not an eigenvalue, because the vector
3.4 The Numerical Eigenvalue Problems
121
C= (C1,CZ," .)T defines x(t) = c~z~(t), which does not (in general) vanish at t = b. An interval-halving procedure is used on ~ to find the corresponding x(t) that vanishes at t = b. The second eigenvalue is obtained by specifying that the corresponding x(t) vanishes twice on the interval (a, b] including once at b. The lth eigenvalue (1 :2: 3) is defined similarly. In this paragraph and the description of the algorithm below, we assume we are looking for the eigenvalues greater than ~* given in Theorem 8 of Section 3.3 and pic› tured in Fig. 1. This is certainly true if ret) :2: 0. If we are looking for the lth eigenvalues less than ~* then ~ -1-1 < ~ -I and this eigenvector vanishes I - 1 times on (a, b). The algorithm is as follows (assume 1= 1): (a) Choose ~L (the lower bound on ~) and ~u (the upper bound on ~). These values may be obtained by inspection or by searching for ~L which has no focal point in (a, b] and ~u which has a focal point in (a, b). (b) Evaluate (once) the numbers ek.b ek,k+1,fk,b andfk,k+1’ (c) Set ~ = !(~L + ~u). Define the matrix Dl(~,(J) whose elements are given in (15). (d) Find the components of the vector c=(cr.cz, ... )T, where c, is defined recursively with C 1 = 1 by
c 1 d ll + c zd 12 = 0,
(16a) (I6b) (I6c)
c 1d z 1 Ck-1dk,k-1
+
+ czd z z + C3d23
Ckdk,k
+
= 0,
Ck+1dk,k+1 =
(k = 3,4,5, ... ).
(e) If CiCi+1 > for all integers i = 1, 2, ... , N + 1, then the current value of ~ is too small, so set ~L = ~, which increases the lower bound. If CiCi+ 1 ~ 0, set ~u = ~. In either case return to step (c). (f) Stop if CiCi+ 1 > 0 for 1 ~ i ~ N - 1 and CN+ 1 ~ 0, or if ~u - ~L < 6, where 6> is a preassigned interval length for ~. The numerical eigenvector is c~z~(t), where c is the current value given by (16). We remark that if we wish to find ~l and the associated eigenvector, the only modification is that Step (e) becomes (e.). (el) IfCiCi+ 1 > 0 for all but 1 - 1 integers, i = 1,2, ... ,N Otherwise set ~u = ~. In either case return to Step (c).
+ 1, set ~L = ~.
We now present several test cases of computer runs. We wish to state once again that while our numerical algorithms in Section 3.2 can be matched by existing methods (the four-step Runge-Kutta process, for example), for a combination of speed, accuracy, and efficiency in implementing we believe the algorithm in this section is far superior to any existing method. Thus, for example Rayleigh-Ritz methods are difficult to implement and rather
122
3
The Second-Order Problem
slow, while shooting methods suffer from the requirement that the coefficient functions must be reevaluated when ~ is changed since the nth step at t; depends upon ~ and the previous values of the independent variable. Our matrix (eafJ) and (fafJ) need be computed only once for each (J since dafJ = eafJ + ~fafJ' The remaining operations to compute c are on the order of microseconds. In the first test case we find the first four eigenvalues of
x"(t)
+ ~x = 0,
x(O)
= 0,
x(n) =
0,
where are of course ~ = 1 , 2 , Y, 4 with corresponding eigenvectors sin J[i in each case. In Table 1 we give the results of the case when ~4 = 16. Although this example has "trivial" coefficients we have often seen that our 2
2
2
Table 1
Fourth Eigenvalue of x"
+ ~x = 0"
Lambda-up
Lambda-Io
Lambda
Crossing point
40.0000000000 22.0000000000 22.0000000000 17.5000000000 17.5000000000 16.3750000000 16.3750000000 16.0937500000 16.0937500000 16.0234375000 16.0234375000 16.0058593750 16.0058593750 16.0014648437 16.0014648437 16.0003662109 16.0003662109 16.0000915527 15.9999542236 15.9999542236 15.9999542236 15.9999370575 15.9999370575 15.9999327660 15.9999306202 15.9999306202 15.9999300838
4.0000000000 4.0000000000 13.0000000000 13.0000000000 15.25OOOOOOOO 15.2500000000 15.8125000000 15.8125000000 15.9531250000 15.9531250000 15.9882812500 15.9882812500 15.9970703125 15.9970703125 15.9992675781 15.9992675781 15.9998168945 15.9998168945 15.9998168945 15.9998855591 15.9999198914 15.9999198914 15.9999284744 15.9999284744 15.9999284744 15.9999295473 15.9999295473
22.0000000000 13.0000000000 17.5000000000 15.2500000000 16.37500000oo 15.8125000000 16.0937500000 15.9531250000 16.0234375000 15.9882812500 16.0058593750 15.9970703125 16.0014648437 15.9992675781 16.0003662109 15.9998168945 16.0000915527 15.9999542236 15.9998855591 15.9999198914 15.9999370575 15.9999284744 15.9999327660 15.9999306202 15.9999295473 15.9999300838 15.9999298155
2.679199 3.145996 3.004394 3.145996 3.104980 3.145996 3.132324 3.145996 3.139160 3.143066 3.141113 3.142089 3.141113 3.142089 3.141113 3.142089 3.141113 3.141113 3.142089 3.142089 3.141113 3.142089 3.141113 3.141113 3.142089 3.141113 3.141113
a
x(O) = x(n) = O.
3.4
The Numerical Eigenvalue Problems
123
algorithm loses little accuracy when handling more complicated smooth coefficients. We remark that our results are quite satisfactory. The four cases required less than 24 sec of computing time, which included compiling and a great deal of computer output. In each case we use N = 1024and e = 10- 6 . For completeness, we note that for this problem we obtained ~ 1 = 0.9999948, ~2 = 3.999978, and ~3 = 8.999955, which are our algorithmic approxima› tions to ~1 = 1, ~2 = 4, and ~3 = 9, respectively. For the second test case we use an example in Gould [12]. The equation is u"(x) - (1 - cos x)u(x)
+ ~u(x)
=0
with boundary conditions u(O) = u(n) = O. The results are given in Table 2. We remark that results in Gould using the Rayleigh-Ritz method for the upper bounds and the method of extension of special choice due to Bazley for the lower bounds are especially good for ~1 and ~2' This is because of the "special nature" of the example problem. These results cannot be ob› tained in general. In fact, it is usually difficult in practice to apply these methods. This is not really a criticism of these methods, but illustrates how really difficult eigenvalue problems have been. Table 2 Results for u"(t) - (1 - cos t)u(t)
+ ~u(t)
=
0"
Value o"
lth eigenvalue ~l ~2 ~3 ~4
n/800
n/lOoo
11:/2000
n/4000
1.9180598 1.9180598• 1.9180588 1.91805825 5.031943 5.031935 5.031923 5.031925 10.01440 10.01436 10.01432 10.01431 17.0083 17.0082 17.0079 17.0079
Lower bound Upper bound (Bazley) (Rayleigh-Ritz) 1.91805812 5.031913 10.011665 16.538364
1.91805816 5.031922 10.014381 17.035639
u(O) = u(n) = 0 with varying step sizes and error bounds for the first four eigenvalues. In all cases listed out focal point is "best possible." That is, if aN = n we obtain CN-lCN S; 0 the lth time. Thus the numerical boundary value corresponding to x(n) = 0 is obtained in the interval (n - (J, n). a
b
While our results are not as good for ~1 (we obtain seven-place accuracy; Gould obtains eight-place accuracy) we shall obtain similar results for any problem with this type of smoothness in the coefficient functions p(t), q(t), and r(t). Even in this test case the methods described by Gould are very difficult to implement. For larger eigenvalues they would be even more difficult. Our methods are simple to implement and are applicable to all problems. We generate (for small a) many eigenvalues without recomputing D(a) and E(a).
124
3 The Second-Order Problem
The third example has recently appeared in the literature. Reddien [44J has used projection methods to find the first five eigenvalues of the Mathieu equation
x" - (6cos2t)x =
x(O)
~x,
= x(n) = O.
As above, we remark that these projection methods are very difficult to implement. Table 3 shows the correct value, Reddien’s best value, and our value for the first five eigenvalues. Our results are significantly more accurate in all cases. Table 3 Eigenvalues of Mathieu’s Equation" lth eigenvalue
Correct value
OUf
-2.7853797 3.2769220 9.2231328 16.2727010 25.1870798 a XU -
(6 cos 2t)x
= ~x,
value
-2.7853789 3.2769236 9.22313769 16.2727148 25.1871123
x(O) = x(n)
Reddien’s value -2.7843130 3.2826062 9.2457998 16.3676806 25.5435224
= O.
We now begin the double-eigenvalue problem characterized by (3) and (4) above or more exactly by (17)
L(x; ~,a)
= [p(t)x’(t)]’ + l(t)x(t)
subject to X(tI.) = X(P) = x(y)
=
+ ~q(t)x(t)
+ ar(t)x(t)
= 0
0, where p(t) > 0, tI. < P < y, and
(18)
where (18a) (18b)
J(x) = K 1(x)
=
faY [p(t)X’2(t) -
f
l(t)x 2(t)] dt,
q(t)x 2(t)dt,
and (18c)
Note that with the two parameters
~
and a we need two conditions such as
x(P) = x(y) = 0 to obtain an eigenvector. For one parameter we need only one condition such as x(P) = 0 to obtain an eigenvalue. We shall discuss,
at the end of this section, another set of conditions to obtain eigenvectors for the two-parameter case of the associated Legendre polynomials. These
125
3.4 The Numerical Eigenvalue Problems
conditions are not meaningful in a physical sense but yield some results. Once again our summary picture is as in Fig. 1 of Section 3.3 with Land H of (17) and (18) replacing (1) and (2) of Section 2.3, respectively. We begin by presenting the theoretical results and preliminaries necessary for the remainder of this section. In particular, we give very general inequality results concerning the signature and nullity of quadratic forms in (18), relate this to differential equations in (17), and show how these results fit in a qualitative picture for our two-parameter problem. We then show how to build finite-dimensional Hilbert spaces by use of splines and finite-dimension quadratic forms, which are approximations of the quadratic forms in (18).We also give the Euler-Lagrange solution for this finite-dimensional problem. A very strong approximation result relating to our finite-dimension solution is given. We then give a two-dimension iteration scheme to find the proper values of ~ and 8, and test cases are given to show how efficient and numeri› cally accurate our procedures are. For each pair of real numbers (~,8) let the quadratic form H(x; ~,8) = J(x) - ~K1(X) - 8K 2(x) given in (18) be defined on the interval oc::;; t::;; 13. Let d(f3) be the set of all arcs x(t) defined on oc ::;; t ::;; 13 such that x(oc) = x(f3) = 0 and such that x(t) is absolutely continuous and x’(t) is square integrable on [oc, 13], d(f3) is a Hilbert Space with inner product [x, y) = x(oc)y(oc)
(19)
+
S:
x’(t)y’(t)dt
Ilxll
and = [(X,X)]1/2. Let SP(A,p) denote the signature (index) of H on d(f3), that is, the dimension of a maximal subspace f!J c d(f3) such that x#-O in f!J implies H(x; ~,8) < O. Let np(~,8) denote the nullity of H(x; ~,8) on d(f3), that is, the dimension of the space of arcs in d(f3) such that H(x, y; ~,8) = 0 for all arcs y(t) in d(f3) where H(x, y; ~, 8) is the bilinear form associated with H(x; ~,8) = H(x, x; ~,8) in (18). We pause to review briefly some characteristics of these nonnegative integer-valued functions, which the reader should picture as the number of negative and zero eigenvalues of a real symmetric matrix. We assume that Sy(~,8) and ny(~,8) is defined similarly to Sp(~,8) and np(~,8) above on the interval o: ::;; t ::;; ’Y and will "incorrectly" use the symbol H since there is no danger of confusion.
Theorem 2 Sp(~l>81)::;;
Sp(~2,82)'
Assume q(t) ~ 0 and r(t) Similarly Si~l>81)::;;
The first statement follows since H(x; ~l>81)
-
H(x; ~2,82)
~
0; then
~ 1 ::;; ~2
= J(x) - ~lK1(X)
+
~2K1(X)
= (~2 -
~1
SY(~2,82)'
~1)K1(X)
and
::;; ~2 and 8 1 Finally Sp(~,8)::;;
81 ::;; 82
::;; 82
imply
- 81K2(X) - J(x)
+ 82 Kz(X) +
(82 -
imply
Si~,8).
81)K2(x) ~ 0
126
3 The Second-Order Problem
or H(x; ~l,ed 2:: H(x; ~2,e2) so that if xo(t) implies H(xo; ~be1) < 0 then H(xo; ~2,e2) < 0. The final statement follows by defining yo(t) equal to xo(t) on [a,,8] andYo(t) == Oon [,8,y]. ThenifH(xo; ~,e) < 0, wehaveH(yo; ~,e) = H(xo; ~,e) < 0. It is of special importance to note the connection between s(~, e) and the number of oscillation points ofa nontrivial solution of(17) subject to x(a) = O. The next result is stated only for t = ,8 but holds equally well for ,8 replaced by any to, a < to ~ y. As above, we would (incorrectly) use the same symbol H to denote a quadratic form with integration over the interval [a, to]. Note that Sto(~' e) is a nondecreasing function of to follows as in the proof of Theorem 2. Theorem 3 The value of np(~,e) is zero or one. It is one if and only if there exists a nontrivial solution xo(t) to (1) such that xo(a) = xo(,B) = 0. The value of sp(~, e) equals m if and only if there exists a nontrivial solution Xl (t) of (1) satisfying x1(a) = and x 1(tj) = for j = 1,2, ... , m, where a < t 1 < t 2 < ... < t m < ,8. These results have really been given in more detail in Section 3.3, where and s are zero. Note that sp(~,e) counts the number of points t on (a,,8) for = 1. which nt<~,e) It is instructive to note that for q(t) 2:: 0, r(t) 2:: 0, and q and r linearly independent functions, we can separate the (~, e) plane into open sets ~
Om
= {(~,t:)lsp(~,t:)
= m, npR,e) = O}
with boundary lines
r m = ((~,e)lsp(~,t:)
= m, np(~,e) = I}. From Theorems 2 and 3 we note than r m’ m = 0,1,2, ... , defines a function s = gm(~) that is one to one and has negative slope (see Fig. 1). This follows immediately since for a fixed value of ~, we have shown in Section 3.3 that there exists an eigenvalue-eigenvector solution (a,xo(t» to Eq. (I) where xo(a) = xo(,B) = 0 and xo(t) vanishes at m points in the interval (a, ,8). In› creasing c, for example, implies that e must be decreased if we wish to stay on the boundary curve r m’ The relative negative values of the slope of T m and r~ depend on the relative size of q(t) and r(t) in the interval [a,,8] and [,8, y]. The last conclusion in Theorem 3 that sp(~,e) ~ Sy(~,£) shows that, for a as pictured in Fig. 1, for (~o,eo) "between" the r m and fixed value of(~o,eo) r m+ 1 curves and "between" the rj and rj + 1 curves defined below then m ~ j. If we define OJ and rj similar to above except that the "prime" denotes the y situation, i.e.,
OJ = {(~,e)lsy(~,e)
=
i. np(~,e)
= O},
3.4
The Numerical Eigenvalue Problems
127
E \
\
f2 \
\
\
\
\
\
\
\
\ \
\
(2,1>" \
f
\,
l
\
,, \
~ (2,2) ’(
\
fO
\
,
\ \
\
\ \
\ (0,2)
\
\
\
\
\ (0,3)
\ \
\
\
\
\ \
\
\
r3
\
r4 Fig .r
then our picture is given by Fig. 1, where (~o,Eo), the solution of the double› eigenvalue problem, lies at the intersection of the two lines r m and rj de› scribed above, and the ordered pair, designated (m, m’) at this intersection, denotes the fact that the corresponding eigensolution "crosses" the axis m times in («, /3) and m’ = j - m - 1 times in (/3, y). We now give the spline approximating setting associated with the dif› ferential-equation-quadratic form problem given by (17) and (18). The theoretical basis of these ideas is given in Sections 3.2 and 3.3. We ask the reader’sindulgence while we introduce more parameters and a generalization of the theory above with a product parameter" = (8, (J, 2, J.l). The reader may be more confused than enlightened by the next few paragraphs and may wish to skip these paragraphs ifhe understands the single-eigenvalue ideas. Briefly, we shall add one new parameter (the second eigenvalue B) to the quadratic form (13), which yields matrix elements (20a)
and (20b)
128
3 The Second-Order Problem
corresponding to (15), of a matrix D 2 ( ; , e, 0") corresponding to D 1( ; , 0") above. The theorem corresponding to Theorem 1 will now hold, and a two-param› eter algorithm involving ; and e will yield an eigenvector solution, cor› responding to the one-parameter algorithm involving; above. Let .91 be the space of arcs x(t) which are absolutely continuous, with x’(r) square integrable on [ct, y], such that x(ct) = x(y) = 0 and norm given by (x, y) = x(a)y(a)
+
f
x’(t)y’(t)dt.
Let L denote the set of real numbers of the form 0" = lin (n = 1,2,3, ... ) and zero. The metric on L is the absolute value function. For 0" = lin, define the partition n(O")
= (ao = ct <
where we assume y - ct = (N (21)
al
< a2 < ... < aN+ 1 = y),
+ 1)0" and
a k = k(y -
+ 1) +
(k = 1, ... ,N).
The space .91(0") is the set of continuous broken linear functions with vertices at n(O"). For each A. in [ y]. Now that our Hilbert spaces are constructed, we construct the appro› priate quadratic forms designated by H(x; 11) = H(x; ,1,,0",;, e), which are the approximating quadratic forms for (18). Thus define p,,(t) = p(at) if t is in [ak> ak+ 1), where at = ak + 0"12, with similar definition of l,,(t), qit), and rit). For 11 = (A.,O",;,e) let H(x; 11) = H(x,x; 11), where (22)
H(x, y; 11) =
f
[pAt)x’(t)y’(t)- l,,(t)x(t)y(t)] dt
-; LYq,,(t)x(t)y(t) dt -
e
f
rAt)x(t)y(t) dt
is defined for arcs x(t), y(t) in gJ(l1). As above, we define s(1]) and n(l1) to be the signature and nullity of the quadratic form H(x; 11) on the Hilbert space 93’(11). The connection between S(l1), n(l1), and oscillation or conjugate points is now given. Let 0", ;, and e be given; a point A. at which s(A., 0",;, e) is dis› continuous is an oscillation point of H(x; ,1,,0’,;, e) relative to {J’l’(A.)IA. in
3.4 The Numerical Eigenvalue Problems
129
[a, b]}. A direct extension of the results in Section 3.3 shows that, as a con› a) sequence of our ideas of approximation, the mth oscillation point Am(O",~, is a continuous function for m = 1, 2, 3, ... and Am < y. Continuity is in the sense of the metric defined above. When 0" = we have the continous prob› lem given by (17) and (18) and our definition coincides with the usual def› inition of oscillation or conjugate points. We follow the earlier ideas in this section numerically find s(A, O",~, a) for 0" # 0. Choose 0" = lin, a k as in (21),
()_{I - nit - akl
(23)
Zk t -
if t in [ak-bak+1] otherwise,
for k = 1,2,3, ... ,N and x(t) = bjZit) (repeated indices are summed) in I] = (A., O",~, a). Note that Zk(t) is the spline hat function and is the basis for our finite-dimensional space. A straightforward calculation shows that H(x; 1]) = bjbi/ij(l]) = bT D2(IJ)b, where b = (bbb 2, .. . )T, D2 = (dap) de› fined in (20), x = b.z.. Dil]) is a symmetric tridiagonal matrix "increasing" in Aso that the upper k x k submatrix of Diak+ b a,~, a) is Diak> O",~, a). The results of Theorem 4 are given in Section 3.2 with modification to this ex› ample in that we have two additional parameters ~ and a.
&6(1]) for
Theorem 4 The values s(A, O",~, a) and n(A, O",~, a) are, respectively, the number of negative and zero eigenvalues of the symmetric tridiagonal matrix Diak,a,~,a), where 0"#0 and ak~}, such that if IA. - ak+ 11 + 10" - 0"’1 + I~ - ~'I + la - a’i < (j and ak+ 1 is not a conjugate point to t = o: then the above sum is equal to (i) the number of oscillation points of (17) on (e, ak+ 1), (ii) the sum s(A,a’, ~', a’) + n(A., 0"’, ~', a’), and in particular (iii) the sum s(A, o,~, a) + n(A., o,~, 8) for the continuous case.
e,
Our final effort will be to construct a finite-dimensional approximation solution xAt) of problems (17) and (18). That is, if xo(t) is the solution to (17), x,,(t) is our approximate solution, and if they are normalized so that xAcx + a) = xo(cx + 0"), then (24)
f[x~(t)
- x~(t)]2dt
-+
0
as 0"
-+
0, ~
-+
~o, a -+ ao,
where ~ and a are associated with the approximate problem and ~o and ao with the continuous problem. Two steps are involved in this solution. The first is to construct the ele› ments d k k and d k k+ 1 of the symmetric tridiagonal matrix D2(1]). The second is to give the Euler-Lagrange equation of this matrix. This equation is the a) referred to above. A direct calculation in (22) and the same solution (;'(a,~,
130
3 The Second-Order Problem
steps as in (14) and (15) above lead to (25a)
dk,k = ek,k - Uk,k - sgk,k
= [p(at- d + Plan] -to"2[1(at- d + l(am - t~0"2[q(at_ d + q(an] -tw 2[r(at_ d + r(an], and (25b)
dk, k+ 1 = e k, k+ 1 = _
-
~fk,
k+ 1 -
sgk,k+ 1 s0"2 r(at)
p(aZ) _ 0"2[(at) _ ~0"2q(atL
6
6
6
Finally, we show as above that for a given fixed value of ~ and s, the finite-dimensional approximation to the solution xo(t) in (17) is the vector xAt) = CiZ;(t) (repeated indices are summed), where the components c, are defined recursively by (26a)
C1dll
(26b)
c 1d21 + C2d22
(26c)
+ c 2d 12 = 0, + c 3d23 = 0, (k
= 3,4,5" .. ).
= (C 1,C2,C3" ,1 is the vector defining xu(t) = c~z~(t), This vector c(O",~,s) where x,it) satisfies the limiting relationship in (24). In practice, if the cof› ficient functions p(t), l(t), q(t), r(t) are at all "nice" our algorithm is easy to apply and converges quickly. This is due to the fact that we approximated an integration process using the quadratic form (18) and not a difference process using the differential equation (17), Furthermore, for each choice of 0", the values ek,b etc., need only be computed once in our two-dimensional iteration scheme unlike the case of differential equations. This results in relatively little computation time and allows us to compute all numerical eigensolutions ~iO"), siO") with one computation of pAat), etc. We now give a two-dimensional iteration scheme that allows us to find c(O",~, s) under the assumption that p(t) > 0, q(t) ?: 0, and r(t) ?: 0. The con› dition p(t) > 0 is necessary to avoid singular theory; the nonnegativity of q(t) and r(t) may be obtained by rewriting our equations slightly, as in our example below, and is not a requirement in our original problem. More precisely, we find cm,m’(O")’ where m and m’ are the number of crossings of 2'(0",~, s) on the intervals (a, (3) and ({3, Y), respectively, described above and pictured in Fig. 1 on the intersection of the two curves r m and rj. Our innermost subroutine computes a solution c(0", ~,s) from (26). Ifthis solution "crosses" the axis "exactly" m times in the interval (a, (3), m + 1 times in the interval (a, {3], j = m + m’ + 1 times in (a, Y), and j + 1 times in
3.4 The Numerical Eigenvalue Problems
131
(a, y], we are done. Call this solution cm,m.(a). The word "exactly" means the crossing is within a predetermined (j neighborhood of 13 and y. If both ~ and e are too large, the resulting solution x,it) given by c(a,~, e) "crosses" too soon or is to the left of the curve given by cm,m.(a) and must be shifted to the right by decreasing ~ and e. Similarly, if ~ and e are too small, the curve x".(t) given by c(a,~, e) is to the right of the curve given by cm,m•(a) and must be shifted to the left by increasing ~ and e. The secondmost inner loop is the single-eigenvalue problem done twice. where e 1 is the solution to the Thus, for fixed ~ find e1 = el(~) and ez = ez(~), eigenvalue problem on (a,f3) with m crossings and ez is the solution of the eigenvalue problem on (a,f3) with j = m + m’ + 1 crossings. This enables us to find the points PI and Pz in Fig. 2. We assume without loss of generality that .1.e(~I) = ei~l) - el(~I) > 0 and .1.e(~z) = ez(~z) - el(~Z) < 0 have been found as in Fig. 2. Choosing ~' = t(~1 + ~z) we compute .1.e(O = ez(O › el(~')' If 1.1.f-l(~')1 < e where e is prescribed we are done. Otherwise if .1.f-l( ~').1.f-l( ~ 1) < 0, we set ~ z = ~' (if .1.e( ~').1.e( ~ 1) > 0, we set ~ 1 = ~') and repeat this process. The interval [~l' ~z] is halved at each step. This process converges to the desired solution. E
Fig. 2
Theorem 5 The algorithm described above converges to a numerical so› lution x,,(t) defined by cm,m.(a), which is an eigenvector of (17) or (18) corre› sponding to the double eigenvalue (.;, e(~») found above. The numerical solution cm,m•(a) is generated by (26) and satisfies the convergence criteria given by (24). We now consider the numerical example given by Fox [39, pp. 93-112] namely, (27)
x" + (~
+ e cos t + elx = 0,
x(O) = x(2) = x(4) = 0
132
3 The Second-Order Problem
or (28)
H(x; ~,I: )
=
4 f0 (X,2 - e1x2)dt - ~ f04X2 dt - I:: f04(cos t)X2 dt.
That is IY. = 0, f3 = 2, y = 4; p(t) == 1, l(t) = e’, q(t) == 1, r(t) = cos t. Note that r(t) ~ is not satisfied on [0,4]. To correct this situation and to obtain pos› itive values of ~ and 1::, we rewrite (27) (27’)
x"
+ [(~ - I:: + 60) + 1::(1 + cos t) + (e’ -
and (28) H(x;
~,I: )
=
60)Jx =
f04 [X,2 + (60 - el)x2J dt - I::f04(1 + cos t)x 2 dt - (~ + 60 -
1::)
f04x
2
dt
so that if we define new parameters by ;; = ~ - I:: + 60, B = e, then (28) becomes 4 (28’) H I,ll) = f0 [x’2 + (60 - el)x 2Jdt 1(x;
-;;f04x 2 dt - Bf04(1 + cos t)x
2
dt.
Note that H 1(x; 0,0) is positive definite as is H 1 (x; ;;, Ji) for;; s and B :s; 0. Note how "flexible" the double-eigenvalue problem is when compared with the single-eigenvalue problem, where we require J(x) > whenever K(x) :s; 0, x i= 0. The change from unbarred to barred parameters is always possible on fixed intervals. The double-iteration procedure described above yields ;; and B, and hence ~ = ;; + B - 60 and s = B with the same eigenvector in both cases. Table 4 gives the values of ~ and e corresponding to m and m’.The values for m = 0, m’ = 1 are given in Fox [39J as ~ = -4.6204, e = 7.8787; our Table 4 Values of
~
~
and e Corresponding to m and m’
0
1
2
0
-10.8034 17.1705
-4.6203 7.8785
-0.2913 -0.7143
1
-4.3164 24.3671
0.6342 13.6812
4.4521 4.9594
2
4.6384 34.3424
9.1117 23.0994
12.5428 14.6673
133
3.4 The Numerical Eigenvalue Problems
values are ~ = -4.6203, e = 7.8785. Additionally, our eigenvector for these values crosses the axis (using linear interpolation) at t{J = 1.99996 and t, = 3.999997. For other values of m and m’, Fox [39] only gives answers to the nearest hundredths. We agree with their answers in all cases. Their methods are somewhat unclear and do not seem to contain an attendant mathematical theory. Finally, to end this section, we should like to make some remarks about other criteria for determining double eigenvalues than the ones we have considered, that is x(fJ) = x(y) = O. As an example, let p(t) = (1 - t 2 ) , l(t) = 0, q(t) = 1, r(t) = 1/(1 - t 2 ), a = -1, fJ = 0, and y = 1 in the translation of the singular equation in (11) to the setting of (4). The solutions (in terms of I and m) of Eq. (11) are the zonal harmonics given by the equation where
and Po(t)
= 1,
P1(t)
P 4(t) = 3it
4
-
Pit) = tt 2
= t,
1jt + i, 2
-
P s(t) =
t P 3(x) = ~t3 s 6b - ¥t 3 + ¥t,
tt,
etc. Note Pi(t) is even or odd depending on whether m + I is even or odd. This criterion can replace (for example) the condition x(f3) = O. That is, given ( in our double iteration procedure we can choose e(~), so that Eq. (26) as a function of ( and e(~) will generate an even (or odd) solution. That is, we choose e(() so that Co = 0, C- 1 = C1 = xo(a) in (26) and iterate until a value of ( leads to a solution x(J"(t) [depending on ~ and e(~)] which vanishes a fixed number of times in (0,1). Analogously, we may desire Pi(t) to be a polynomial of degree k. This can be done by given ~, find e(() so that the solution in (26) gives a kth-order polynomial [the (k + 1)st-order numerical divided difference is zero], and iterating on ( once again. Even though we have no singular approximation theory at this time, it is not unreasonable to believe that we could solve perturbation problems where p(t, 6), l(t,6), q(t,6), and r(t,6) are continuous in 6 and reduce to p(t), l(t), q(t), and r(t), respectively, in (11) when 6 = 0. The same criterion discussed above, giving the associated Legendre polynomials when 6 = 0, should give (numerically and abstractly) perturbed Legendre polynomials on the interval [0, 1)or( -1,1). For the example ofthe last paragraph we may let a = -1 + e, fJ = 0, and y = 1 - e for e small and positive. On this subinterval of (- 1, 1) we usually have a well-defined double-eigenvalue problem where our theory can be demonstrated. Letting s decrease to zero (theoretically) will yield the desired result. Equivalently, we can choose a = -1, fJ = 0, y = 1, p(t) = (1 + e - t 2 ), and r(t) = 1/(1 + s - t 2 ) and allow e to decrease to zero. For
134
3 The Second-Order Problem
the single-eigenvalue case with m = 0, the criterion that the (k + l)st-order divided difference vanish works very well. To be more specific, when m = in the problem ofthelast paragraph the solution is the eigenvalue IX = 1(1 + 1) and the corresponding eigenvector is P,(t) as defined in the previous para› graph. In this case, we find that values of IX less than l(l + 1) yield an (l + l)st› order divided difference of the numerical solution that is positive for all t while values of IXgreater than 1(1 + 1)yield a divided differencethat is negative. This follows since Pl’(t) is a polynomial of degree 1 - m + m when m is even. A simple iteration procedure on IX yields the correct value IX = l(l + 1) with eigenvector Pl(t). When m =1= 0, the sign of the desired divided differences does not always remain the same throughout the interval. Thus this idea does not work when m =1= O. We shall see that a modification ofthis idea does work. These last results were obtained by Joseph Gibson in a master’s paper under the author’s direction. Gibson also pointed out that symmetry of the problem (and solution) require special care. That is, if IX = -1, fJ = 0, and y = 1, we do not have our usual double-eigenvalue problem by requiring a solution to vanish at IX, fJ, and y. This is due to the fact that any solution xo(t) satisfies xo(- t) = xo(t) or xo(- t) = - xo(t). Gibson used x(O) = x(l) = 0 as one condition and hence could not use x( -1) = 0 as the second condition. As we note above, the constraint x(O) = x(l) = 0 does correctly determine the function a(m) or m(a). When he specified a point to in (0,1) such that Pl’(to) = 0 as our second condition, our perturbation problem yielded the correct solution with eigenvector Pi(t) as we then have a double-eigenvalue problem. Gibson also discovered that although the (l + l)st-order divided dif› ference does not always remain constant on an interval, he could often obtain the correct solution if he chose the "correct" point at which to compute the divided difference. For the solution with m = 2 and 1= 3, the fourth-order divided difference was of constant sign in the interval (0,0.5) and he was able to compute this solution using the scheme described for m = O. For m = 2 and 1= 5 the sixth-order divided difference was of constant sign in the interval (0.6,0.9)and he was able to compute this solution. In all sample cases Gibson was able to determine the correct solution by iterating on a until the graph of the function f~(t) was "lowest." This is not a practical procedure since this inspection required his direct interaction with the computer program. The function f~(t) is defined as
where g(t) is the (l + l)st-order divided difference of our numerical solution at t. For a = 12 (l = 3) he obtained value of fa(t) between -15.5 and - 8.1 for values of a and t in the intervals 11 :::;; a :::;; 13 and 0 < t < 9.5. For "almost all t," f~(t) was decreasing in fJ where fJ = 112 - al.
3.5 Proofs of Results
3.5
135
Proofs of Results
The purpose of this section is to give the proofs of Sections 3.1-3.4 that we had earlier omitted. We shall not restate theorems but refer to the theorems in the order in which they appear in the earlier sections. Omitted proofs of theorems or parts of theorems indicate that these proofs follow, as we have indicated in the text where the theorems occur. We shall also include some comments of historical interest. Our notation in this section is consistent with other sections. New theorems in this section are numbered as they appear; otherwise they are denoted as illustrated by the next line. Proof of Theorem 1 (Section 3.1) We note that this result and the next theorem were originally proven by Hestenes [27, pp. 568-571 J using different methods. The proof of the second sentence is given in Theorem 14 of Section 2.3 where we show that S(Ao - 0) ::;;: s(}’o) ::;;: S(Ao - 0). Similarly in Theorem 13 of Section 2.3 we show that S(Ao) + n(Ao) = S(Ao + 0) + n(),o + 0) so that S(Ao + 0) - S(AO) = n(A)- n(A + 0). The final result follows immediately by summing both sides of the last given equality. Proof Theorem 2 (Section 3.1) This theorem follows immediately from Theorem 16 of Section 2.3, Theorem 1 of Section 3.1, and the various discus› sions. We note that the critical idea is that the disjoint hypothesis of Theorem 2 of Section 3.1 implies that n(Ao + 0) = O. The major result in Section 3.2 is Theorem 2. For clarity of exposition in Section 3.2, we gave Theorem 2 first. The other results (theorems) of this section are "lemmas" to Theorem 2 or follow from our discussion and are left to the reader. We now assume in Section 3.2 that the "lemmas" which we gave as Theorems 3, 4 [except for part (d)], and 6 have been proven. The following three results lead to the proof of Theorem 2 of Section 3.2. Our first result in this section follows after Theorem 6 of Section 3.2. In Theorem 6 we constructed negative vectors for D(a). We now wish to show that we have the correct number. This will be done by showing that the sequences {ck} and {Pk} are very closely related. Thus choose M, a positive integer, so large that b - a < M CT. Define M-1
gl =
nd
i , i+ 1
and
i=l
gk gk+1 =-d-› k, k+ 1
(k = 1,2, ... ,M - 1).
Notethatg ngn+ 1 <0. Define the vector Y=(Y1’YZ,Y3,•.,)T by Yk= (_I)k+ 1Pk_1gk’ The kth component of the vector D(a)y is the quantity [d k. k-1Pk- Zgk-1 - dk,kPk-1gk
= [d~,k-lPk-Z by the definition of Pk’
- dk,kPk-l
+
dk,k+ 1Pkgk+ 1J( _1)k
+ pk](-I)kgk = 0
136
3 The Second-Order Problem
Lemma 1 There exists a constant y =F 0 so that Ck = y( _1)k-lPk-lgk’ Thus the result in Theorem 4 [part (d)J of Section 3.2 holds.
Let Ek(a) denote the matrix formed from the first k rows and k + 1 columns of D(a). By removing the first column of Ek(a), we note that the rank of Ek(a) is k. Let z be in [R;k+ 1. Then the solution space of Ek(a)z = 0 is one dimensional. By construction, WI = (cI>C2"’" CbCk+l)T and W2 = (Yl,Y2’" ., Yb Yk+ l)T are in the null space of E(k)(a) when Yk = (-I)k+lPk_lgk’ This completes the proof. We next must establish that conditions (1) and (2) of Section 2.3 are satisfied in the a setting of Section 3.2. This is done in Theorems 4, 5, and 6 of Section 4.4 for the more general 2nth-order problem and hence for the problem setting of Section 3.2. For convenience and without loss of generality we assume Ao < b is an oscillation point, i.e., xo(t) is a nonzero solution of (1) of Section 3.2 such that xo(a) = 0 = xo(Ao). Furthermore, assume for each a =F 0 that k; satisfies ak" ~ AO < a k,,+l ~b, C = (Cl, C2,•••,Ck", 0, ... ,0), X (t) = I~~l CiZi(t), where Zi(t) is the ith spline basis element and J(x l1(t); a) < O. By Theorems 4 and 5 ofSection 3.2 Ck"Ck,, + 1 ~ 0 so we may choose C 1 such that J(x l1(t); a) < 0 and so that l1ll = 1. I1
Ilc
Cn " ,
Lemma 2 For each a = 11m wehave J(c l1 ; a) < 0, where cl1 = (c 1 , C2, ... , 0); c, is chosen by our algorithm. Furthermore, if = 1, then there
Ilcall
exists Yo(t) in d(b) such that Yan(t) => Yo(t). The first part was proven above. The second part follows since Hilbert spaces are weak sequentially compact; that is, if {wd c S (bounded), then there exists Wo in S such that Wk n - Wo for some subsequence {wd of {wd. Theorem 3 The vector ca = (CI> c 2 , ,ck", 0) given by the algorithm defines xl1(t) = caza(t) as described above. The sequence {xl1 (t)} converges strongly to xo(t) (as a - 0) in the derivative norm sense of (10) of Section 3.2. Thus Theorem 2 of Section 3.2 holds.
This completes the proofs of Section 3.2. The proof of Theorem 1 of Section 3.3 is given in the text. In the next result, conditions (1) and (2) of Section 3.3 are, respectively, conditions (1) and (2) of Section 2.3.
Proof of Theorem 2 (Section 3.3) Since d(J1) = d(a), conditions (1)hold. For (2a), let x., y, in d(J1,), r = 0, 1,2, ... with x, - X o and y, => Yo. Then (with the obvious notation) Htx., y,; fl,) - H(x o, Yo; flo) = J(X., y,; a,) - J(xo, Yo;ao)
+ eo[K(xo , Yo;a) - Ktx., y,; a,)]
+ (~o
- ~,)K(x.,
y,; a.).
137
3.5 Proofs of Results
Ifr ---> 00, the first term goes to 0 since (2) holds on L, the third term goes to 0 as Kix., Yr; a r) is bounded, and the second term goes to 0 by the equality 2[K(xo, Yo; 0"0)
-
K(x" Yr; O"r)] = K[xo + Yo; 0"0) - Kix, + Yr; a r) - K(xo; 0"0) - K(yo; 0"0)
+ Kix.: a r) + K(Yr; O"J For (2b) let lim Ar denote lim inf.; CD Ar ; then O"r)] lim Hix.; Jir) = lim[J(xr ; O"r) - ~rK(xr; ~ limJ(x r ; O"r) - lim ~rK(xr; O"r)
~
J(xo;
0"0) -
~oK(xo;
0"0)
= H(xo; Jio). For (2c) if x, ..... x o , lim Hix.; Jir) = H(xo,Jlo); then r= 00
= lim J(x r; Jir) - lim ~rK(xr; r=
00
r~
Jir)
00
so that J(xo; ao) = Iim r =oo J (x r ; ar)’ Since (2c) holds on L, we have Xr=>XO. This completes the proof. Proof of Theorem 9 (Section 3.3) If the first result is not true, we may choose sequences {O"q}, {xq} such that O"q ..... ao, x q in d(O"q), Ilxqll = I, K(xq; O"q) ~ 0, and J(x q; O"q) ~ 0. Since {xq} is bounded, there exist Yo in d and a subsequence {xqJ, which we assume to be {xq } such that xq ---> Yo. By (1) of Section 2.3, Yois in d(O"o). We claim Yo = 0. If not, K(yo; 0"0) = limq=CD K(xq; O"q) ~ 0 implies J(yo; a o) > 0, which is impossible, as ~
limsupJ(xq; a q) ~ liminfJ(xq ; O"q) ~ J(yo; q=oo q=oo
0"0)’
Thus J(yo; 0"0) = 0 = lim q= CD J(x q; O"q) and by (2c) of Section 2.3, x q=> 0. The contradiction 1 = limq=00 Ilxqll = 11011 = 0 establishes the first result. For the second result, by Theorem 8 of Section 3.3 there exists ~* such that H(x; Jio) > on d(Jio), where Jio = (A,*, ao). Thus n(),*, 0"0) = s(A,*, ao) = 0. The result now follows by Corollary 7 of Section 3.3. Proof of Theorem 10 (Section 3.3) We may assume
~*(ao)
~
e’ < e"; if
e’ < e*(O"o) < e" we consider the two intervals e’ ~ ~ < ~*(ao) and ~*(O"o) ~ e~ C separately. Assume s(~',O"o) = n, then by Theorem 8 of Section 3.3,
s(A,",O"o) = n + k - l,n(A’,O"o) = n(A",O"o) = O. By Corollary 4 of Section 3.3, there exists <5 > such that if pta, 0"0) < b then n(A.’, a) = n(A",a) = 0, s(A’,0") = nand s(A.", a) = n + k - 1. The result follows from Theorem 8 of Section 3.3 by taking 8 = min(b, IJ) where 1] is given in Theorem 9 of Section 3.3.
138
3 The Second-Order Problem
There remains one major result to prove in Section 3.3; that is, Theorem 14 of Section 3.3. This result will also be given in Theorem 3 of Section 4.2 with the ~ (eigenvalue) parameter suppressed. In fact, the (~, a) problem is equivalent to the a problem. The difficulty occurs when we add the ), (resolvent) parameter. We will see that our signature inequalities are still true. However, condition (1b) of Section 2.3 is not in general tube in the (A, a) setting so that further care must be taken to obtain these inequality results. For ease of presentation we will suppress the ~ parameter and deal with the resolution of the spaces d(a) by the collection {Yl’(A)IJ. in [a,b]}. In addition we shall restate some notation and remarks.
Proof of Theorem 14 (Section 3.3) Let M = A x L be the metric space with metric defined byd(Jlb Jl2) = 1,12 - Atl + p(a2, at), where u, =()’bat) and Jl2 = (,12’ (2)’ For each Jl = (A,a) in M, define J(x; Jl) = J(x; a) on the space PJ(Jl) = Yl’(A)n d(a). Let s(Jl) = s()" a), n(Jl) = n(A, a) denote the index and nullity of J(x; Jl) on PJ(Jl). We shall use the terminology "holds on Mil to refer to conditions (1) and (2)in the Jl setting of this section as opposed to the a setting of Section 2.3. Lemma 4 is immediate as J(x; Jl) = J(x; a) on d(Jl).
a
Lemma 4 Lemma 5 onM.
If (2) holds on L, then (2) holds on M. If (1a) holds on Land (7a) of Section 2.3 holds, then (1a) holds
Suppose u, ~ Jlo, xq inPJ(/lq),x q ~ xo, where u, = (}’q,aq),q = 0,1,2, .... From a q ~ ao, x q in d(a q), x q ~ Xo we have Xo in d(ao). From Aq ~ ,10 we have Xo in Yl’(Ao).Thus Xo in Yl’(A o) n d(ao) = PJ(Jlo). Theorem 6 Assume (1a) and (2) hold on L and that (7a) of Section 2.3 holds. For any Jlo = (Ao,ao) in M there exists 15 > such that if /l = (A, a), d(Jlo,a) < 15, then
s(A, a)
+ n(A,a) ::; s(Ao,a 0) + n(Ao,a 0)’
The proof of this result follows immediately from Theorem 4 of Sec› tion 2.3. We note that condition (1b) does not hold on M without extra hypotheses. These extra hypotheses are not necessary to prove our other inequality. Theorem 7 Assume (lb) and (2) hold on L and that (7b) of Section 2.3 holds. For any Jlo = (,10’ ao) in M there exists 15 > 0 such that if Jl = (A, a), a(Jlo, /l) < 15, then
3.5
Proofs of Results
139
We note there exists <5 > 0 such that (l(llo, Il) < <5 implies the following inequalities hold: s(A,o, 0"0) ::; s(Ao- «5,0"0) ::; s(Ao - <5,0") ::; s(A,O").
The first inequality holds by Theorem 5 of Section 2.3 as
~(Ao,
0" 0) = cl(
U ~ (A, 0" 0))
whenever
a
< ,1,0 ::; b.
as A
The second inequality holds by replacing .91 with J’l’(Ao - 0") in (1b) and Theorem 10 of Section 2.3. More specifically, if x is the projection of x onto J’l’(A- 0) and Xu is in .91(0") and given by (1b), then Xu in J’l’(A- 0) 1\ .91(0") and IIxu - xoll ::; IIxu - xoll < e. The third inequality follows as J’l’(}’o- «5) c J’l’(A). ~
Combining Theorems 6 and 7 we have Theorem 14 of Section 3.3 with suppressed. This concludes the proofs of the results in Section 3.3.
The 2nth-Order Problem
Chapter 4
4.0
Introduction
In this chapter, we consider ideas and results that are applicable to a very general class of quadratic forms and to linear self-adjoint operators of a generalized Fredholm type. That is, 2nth-order, integral-differential systems such as rij(t) = Vij-1(t), or as the generalized system (assuming suffi› cient differentiability) (1)
d" d"-1 dt" [rij(t)] - dt"-1 [rij -1(t)]
+ ... + (_1)"[r~(t)]
=
0,
where arcs x(t) = (x 1(t), xz(t), . . . ,xp(t» define equations
r~(t)
= R~p(t)x~)(t)
v~(t)
= f;r~(s)ds
v~(t)
=
f: [r~(s)
+ f: K~p(s,
t)x~)(s)
ds,
+ c~, - v~ - 1(S)] ds + c~
((X, f3 = 1, ... , p; k = 1, ... ,n - 1; i, j = 0, ... ,n). In the above, R~p(t) and K~p(s, t) satisfy smoothness and symmetry properties sufficient to guarantee that our system is the Euler-Lagrange equation for J(x) below; x~)(t) denotes the ith derivative of the (Xth component function; and repeated indices are summed. For p = 1 and n = 1 we obtain (ignoring subscripts) the generalized equation
:t (Ril(t)X(i)(t) +
s:
Kil(s,t)X(i)(S)dS) = RiO(t)x(i)(t) + 140
s:
KiO(s, t)x(i)(s)ds,
4.0
Introduction
141
where i = 0, 1. If, in addition, Kii(s, t) == we are back to the problems of Chapter 3. The fundamental quadratic form J(x) is given by (2)
J(x) = H(x)
+
f: f K~p(s,
t)x~)(s)xW)(t)
ds dt
+ Lb R~p(t)x~)(t)xW)(t)dt, where H(x) is a quadratic form in x(a) and x(b). As the reader may appreciate, the ideas of Chapter 3 are examples for this chapter and should be understood before this chapter is seriously attempted. We shall give less heuristic material such as pictures and examples than we gave in Chapter 3. However, we give some examples when appro› priate. Thus we give an example problem with several relevant comments. We consider rather deep comparison theorems including some nonlinear situations. We give a detailed derivation of the Euler-Lagrange equation when p = 1 and n = 1, and we construct the numerical matrix (quadratic form) when n = 2. This setting is so general that we have a choice of material and problems to pursue. Hence we leave many problems to another day or to the reader. In Section 4.1 we consider the most general type of Fredholm quadratic form associated with the integral-differential system. The connection be› tween (1), which is the Euler-Lagrange equation, the transversality, or boundary conditions, and the quadratic form J(x) in (2) is the main result. Ifthis were a research paper, we would give a summary setting and theorem such as in thenext two paragraphs. We would then use Theorem 1 and our approximation results of Section 4.2 to derive a focal-point theory as in Chapter 3 and a focal-interval theory as we will give in Chapter 6. However, for many reasons, we shall give the more complete ideas of Lopez [36] in Section 4.1. We shall then use our approximation theory to rederive and generalize these ideas. Let f!4 denote a subspace of d such that x is in f!4 if and only if Ly{x) = M~ax~k)(a)
(3a)
= 0, x~)(b) = 1; y = 1, ... .m s; np) where M~a are real
p = 1, ... ,p; k, 1 = 0, ... ,n numbers such that the linear functionals Ly(x) are linearly independent on d. Let f!4(}"), a :s; }" :s; b, denote the subspace of [!g whose component func› tions satisfy
(oc,
(3b)
X~k)(t)
== 0
on}" :s; t :s; b
For any arc x(t) in d set (4)
'l:~(t)
= R~p(t)x~)(t)
+
for k = 0,1, ... ,n - 1.
f K~p(s,
t)X~k)(S)
ds
142
4
The 2nth-Order Problem
for almost all t on a :s; t :s; b. Define the recursive relations
f: T~(S) = f: [T~(S)
(5a)
VO(t) =
(5b)
v~(t)
ds + c~, -
V~-l(S)]
dx
+ c~
(k = 1,... ,n - 1),
where c~, . . . ,cp- 1 are real numbers. Let J(x, y) be the bilinear form asso› ciated with J(x), i.e., J(x) = J(x, x).
Theorem 1 Let J(x) be the quadratic form given by (2). There exists an arc x = (Xl(t),. . . ,xit)) ill d such that J(x, y) = 0 for all y = (Yl (t), . . . ,Yp{t)) ill B8(J..) if and only if the constants c~, . . . ,cp- 1 in (5) and constants Ill’ ... ,Ilm can be chosen such that the Euler equations (6)
({3 = 1, ... ,p)
hold almost everywhere on a :s; t :s; J.., and the transversality conditions (7)
hold at t = a. The proof of this result follows in a nontrivial, expected way from the method of integration by parts or by the Riesz representation theorem for Hilbert spaces. For n = 1 and K~p(s, t) identically zero for all indices, the results are given in Hestenes [27]. In (7) H(x) = A~~x~k)(a)x~)(a). The basic results of Theorem 1 are to show that the spaces P4(J..) and the forms J(x) in (2) satisfy the resolvent £(J..) theory of Section 2.3. We then characterize P4(J..f and P4 o(J..) in terms of solutions of equations and trans› versality conditions similar to Theorem 1. If K~p(s, t) == 0 for all i, i. IX, {3, then we obtain a focal-point theory including Theorem 16 of Section 2.3, where s(J.. o) = LA
4.1 The Signature Theory of Lopez
143
equations along with transversality conditions. We shall see it is possible to compare ordinary differential equations to integral-differential equations as well as to nonlinear differential equations. In Section 404 we apply our approximation theory to numerical problems. The use of higher-order splines to replace the hat functions of Section 3.2 is of interest. For illustrative purposes, we "construct" the numerical approx› imation matrix (quadratic form) for fourth-order differential-equation problems. Finally, we mention a use of integral-differential equations for possible study of oscillation properties of differential equations which normally appear to be non-self-adjoint and/or of odd order. For example, at the end of Section 4.1 we encounter the differential equation L(x) = X"’ + X’ = O. This is not a self-adjoint problem, but L(x), in some sense, is connected with L1(x) =
X"
+ X-
f x(s)ds,
which is self-adjoint. Thus some solutions of odd-order equations can be studied by the methods of this chapter. Similarly, for a more complicated example, Lz(x) = (rx’)" + (qx)’ = (rx’)’ + qx comes from (rx’)’ + qx = et
f eSx(s)ds.
In fact, the left-hand side of Lz(x) may be differentiated as many times as desired with the same right-hand side.
4.1
The Signature Theory of Lopez
The purpose of this section is to explain the connection between the quadratic form (2) and (the possible integrated form rp(t) = vp-1(t) of) the integral-differential system (1) with the associated boundary conditions (3), given in Section 4.0. A nontrivial example is given to illustrate these ideas. We have briefly seen this problem as Example 4 of Section 104. In Section 1.3, we have given fundamental lemmas (integration by parts) so that the reader may anticipate the proofs that we shall not give. The majority of this section comes directly from Chapters 8, 11, and 12 of Lopez’s dissertation [36], which contains more information than we shall be able to give here. Our choice of topics and results is in keeping with the rest of this book. The fundamental Hilbert space s1 considered in this section is the set offunctions z(t) = (z 1(t), . . . ,zp(t)) whose «th component zit) is a real-valued function defined on the interval a ::;:; t ::;:; b of class en-I; zin-l)(t) is absolutely continuous and zin)(t) is Lebesgue square integrable on a ::;:; t ::;:; b. The inner
144
4
The 2nth-Order Problem
product is given by (1)
where !Y. = 1, ... ,p, k = 0, ... ,n - 1, superscripts denote the order of differ› entiation, and repeated indices (except for n) are summed. Theorems 1 and 2 follow as expected. The symbol "--+" denotes weak convergence while "=>" denotes strong convergence. Throughout this section !Y., f3 = 1, 2, ... , p; k, 1=0, 1, ... , n - 1; i, j = 0, 1, ... , n, and repeated indices (except for n) are summed. Theorem 1 The vector spaced with the inner product (1) is a Hilbert space. The norm IIxll of the arc x is given by the formula IIxll = (x, X)1/2. The --+ relation x q => Xo holds ifand only iffor each « and k, x~~ --+ x~~(a) and x~~(t) x~J(t) in the mean of order 2. Similarly, the relation x q --+ Xo holds if and only if for each !Y. and k, x~~(a) --+ x~~(a) and x~~(t) --+ x~'8(t) weakly in the class of Lebesgue square integrable functions. In either case, for each !Y. and k, x~~(t)--+ x~~(t) uniformlyon a ~ t ~ b.
Corollary 2
Let .91* be the class of arcs x(t) in .91 such that x~k)(a)
=
= O. The subspace .91* together with the induced inner product
x~k)(b)
(2) is a Hilbert space.
We now consider the quadratic form J(x). Let (3a)
+ B:~[x~k)(a)x~)(b)
H(x) = A:~x~k)(a)x~)(a)
+ x~k)(b)x~)(a)J
+ C:~x~k)(b)x~)(b), k1 d eno t e rea l k1 = A1pa’ k Bap k1 = Blk kl = td kl Bkl h Aap, were ap, Cap constan s an Aap pa’ C ap a The quadratic form J(x) on .91 is given by the formula
qk
J(x)
(3b)
=
f
t)x~i)(s)x~)(t)
f R~p(t)x~)(t)x~1(t)
dt,
H(x)
+
+ Lb K~p(s,
ds dt
t) and R:~(t) are integrable; K:’P(s, t) and R:’P(t) are square t) and R~p(t) are essentially bounded and integrable; K~p(s, t) = Kffit,s) and R~p(t) = Rffa(t); and H(x) is given by (3a). The corre› sponding bilinear form J(x, y) is given by the formula where K:~(s, integrable;
(4a)
K~p(s,
J(x, y) = H(x, y) +
LbLb K~p(s,
+ Lb R~p(t)x~)(t)y~1(t)dt,
t)x~)(s)y~)(t)
ds dt
4.1 The Signature Theory of Lopez
145
where the bilinear form Hix, y) is given by the formula (4b)
H(x, y) = A~~x~k)(a)yW(a)
+ B~~[x~k)(a)y~)(b)
+ x~k)(b)y~)(a)J
+ C~~x~k)(b)y~)(b).
For any arc x(t) in .91, let k1x(k)(a) HIfJa(x) = A(f.fJ + Ba{Jkl x(k)(b) (f. (f. , k1x(k)(a) + Ck1x(k)(b) HIfJb(x) = B(f.fJ (f. (f.fJ (f.
(5)
and (6)
for almost all points t on the interval a
~
t
~
b. By Fubini’s theorem,
By the Riesz representation theorem (see Section 2.1), there exists a self› adjoint, linear transformation y = Tx on .91 such that the representation J(z, x) = (z, y) holds for every arc z(t) in d. The inner product (z, y) of z and y is given by (1). In order to describe this transformation let L(x) denote the linear form L(x) = a~x~k)(a)
(8)
on .91, where a~, integrable. Let (9)
B~(t)
=
b~
+ b~X~k)(b)
are constants, A~(t)
+
f: A~(t)x~)(t)
dt
are integrable, and A~(t)
are square
A~(t),
B’;(t) = A;(t) +
f B;-1(S) ds + b;-1
(m = 1,2, ... ,n - 1).
The following result concerning L(x) is easily established. It follows from Hestenes [27]. Lemma 3 Let L(x) be the linear form given by (8). There exists a unique arc y(t) in .91 such that the representation L(x) = (x, y) holds for every arc x(t) in .91. Moreover, the unique arc y(t) is determined by the following conditions:
(lOa) and (lOb)
146
4 The 2nth-Order Problem
In order to state the next result, let (11)
B~ O( t )
= Jar A~(s)
B~(t)
=
t
f)A~(S)
ds + c~, - B~-l(S)J
ds + c~,
where c~, . . . , C~-l are constants and A~(t), . . . , A~-l(t) are the functions given by (8). The following result also holds. Note that (13) is Eq. (1) of Section 4.0. Lemma 4 Let.PI* be the subspace of arcs x(t) in .PI such that x~k)(a) = = O. Let L(x) be the linear form given by (8). Then L(x) vanishes iden› 1 tically on .PI* if and only if constants c~, ... , in (11) may be chosen such that x~k)(b)
c:-
(12)
holds almost everywhere on a :s;; t :s;; b. In particular, suppose that A~(t) ofclass C. Then L(x) vanishes identically on sd" if and only if
are
(13) holds on a :s;; t :s;; b. For an arc x(t) in .PI, let (14)
B7J(t) = 7:7J(t) +
f B7J- 1(s)ds + Hpb-1(X),
where H7Jb- 1(X) are given by (5) and 7:~(t), ... , 7: p- 1(t) are given by (6). The self- adjoint, linear transformation T associated with the Hermitian, bilinear form J(x, y) is now described. Theorem 5 Let x(t) be an arc in .PI and let J(z,x) be the Hermitian, bilinear form given by (4). There exists a unique arc y(t) in .PI such that the representation J(z, x) = (z,y) holds for every arc z(t) in d. Moreover y(t) is determined by the following relations, (15)
where H~a(x), by (14).
H~b(X)
y~k)(a)
=
y~n)(t)
= 7: p(t)
+ H~b(X)
H~ix) +
+ Lb B~(s)ds,
f Bp-1(s)ds + Hpb1(X),
are given by (5), 7: p(t) are given by (6) and B~(t)
are given
147
4.1 The Signature Theory of Lopez
The conclusion of this theorem follows directly from Lemma 3 with L(z) = J(z, x).
For any arc x(t) in sf of class C 2n, let
I./ _1)k-’ dt’d’ rrk+’(t) + (-I)kBp-k(t), a Ep(x) = ~ (_l)n- ’i -a t)x: (s)ds t + ito (-It- i::i [R~P(t)x~)(t)], k
(16) (17)
E~[t,
x(t)]
=
b
LJ i=O
J
i
i K~(ls,,.
(I)
a
where rrk+’(t), Bp-k(t) are respectively given by (6), (14) and K~p(s, are of class ci. Corollary 6 If K~/ls, class C 2n, then
t), R~p(t)
(18)
y~)(a)
= Bp(a), y~n+k)(a)
y~2n)(t)
= E/1(x)
t), B~p(t)
are of class Ci and x(t) is an arc in sf of = E~[a,x(a)J
and where Bp(a) are given by (14) at t and Ep(x) is given by (17).
a, E~[a,x(a)J
=
are given by (16) at t
= a, ’
The next three lemmas characterize the complete or weak continuity of quadratic forms in this section. They allow us to determine conditions of ellip› ticity in Theorems 11 and 12. Thus we have very general examples of fun› damental concepts which are found in Section 2.1. Lemma 7 The Hermitian, bilinear form (19)
is completely continuous on sf. Lemma 8 The quadratic form
(20)
K(x) = H(x) +
lb R~p(t)x~)(t)x~)(t)
dt
(i
+j
=I- 2n),
where H(x) is given in (3a), is completely continuous on sf. Lemma 9 Let K(x) be a completely continuous quadratic form on sf and ra/1(t) == R:p(t). The quadratic form
(21)
Q(x) = K(x) +
lb raP(t)x~n)(t)x~)(t)dt
148
4 The 2nth-Order Problem
is weakly lower, semicontinuous on .xi (22)
if and only if the inequality
rap(t)1r a1rp 2:: 0,
holds almost everywhere on a :::; t :::; b for every 1r #-
in \RP.
Condition (22) is, of course, the weak Legendre condition ofHestenes [27]. With the help of the preceding lemma, the quadratic form D(x) given in the subsequent result is characterized as to positive definiteness (see Section 2.1). Theorems 10, 11, and 12 are the expected generalizations.
Theorem 10 Let D(x) be the quadratic form (23)
+
D(x) = x~k)(a)x~k)(a)
f rap(t)x~n)(t)x~)(t)
dt
defined on .xi where rap(t) == R:ji(t). Then for some constant h, 0< h < 1, the inequality (24)
holds on .xi (25)
if and only if the inequality rap(t)1r a1rp 2:: h1r a1r a
holds almost everywhere on a :::; t :::; b for every set tt #- in \RP. Theorem 11 Let J(x) be the quadratic form givenby (3) and raP(t) == R:ji(t). Then J(x) is an elliptic form on ss if and only if for some constant h, 0< h < 1, the inequality given by (25) holds almost everywhere on a :::; t :::; b for every set tt #- in IRP. If rap(t) are continuous, then the inequality (25) holds if and only if the inequality (26)
r ap(t)1r a1rp> 0,
holds on a :::; t :::; b for every 1r #- in IRP. Theorem 12 If J(x) is an elliptic form on ss, then the determinant /raP(t) I is different from zero almost everywhere on a s; t:::; b. In particular if rap(t) are continuous, then the determinant IraP(t) Iis different from zero on a :::; t :::; b. We now introduce a resolution of the subspace fJ?J of .xi. Our major result is that the spaces fJ?J(A) satisfy the resolvent theory or A theory of Section 2.3 and in particular conditions (7a) and (7b) of Section 2.3 with fJ?J(A) replacing £(A). Thus Theorems 10-15 of Section 2.3 hold in the fJ?J(A) setting of this section. Let fJ?J denote the subspace of.xi above such that
(27)
Ly(x) = M~ax~k)(a)
= 0,
x~)(b)
= 0,
where (for the remainder of this section) y = 1,2, ... ,m :::; np, M~a
denote
4.1 The Signature Theory of Lopez
149
constants, and the linear forms Ly{x) on d are linearly independent. Thus the m x np matrix IIM~all has rank m. The next few results are more general than we shall require for the remainder of this chapter. Lopez gives a space C =f. 0 described in the next paragraph. After Theorem 15 we shall always assume C = 0 and obtain the more usual boundary conditions. Let C denote an r-dimensional, linear subclass of d such that 81 (’)C = {O} and f~k)(b) = 0 for every arc f(t) in C. Let P4(A), a ~ A ~ b, be the linear subclass of arcs x(t) in [fB such that (28)
on
A~ t
~
b
for some arc f(t) in ~. In particular, let feet) (s = 1, ... , r) be a maximal set of linearly independent arcs in ~ and suppose that the rank of the matrix Ilhe(t)11 (s = 1,... , r) on the interval a ~ t ~ b is the minimum of the integers p, r. The following result is an immediate consequence of Theorem 1.
Theorem 13 The linear subclass 81 of d, together with the inner product given by (1), constitutes a Hilbert space. The norm of the arc x given by the formula = (x,x)lfZ Similarly the linear subclass P4(A), a ~ A ~ b, of [fB together with the inner product given by (1) constitutes a Hilbert space.
IIxll
Ilxll
is
Conditions (a), (b), and (c) of the next theorem follow immediately. Condition (d) is more difficult to prove. a
Theorem 14 The one-parameter family of closed subspaces 81 (A), A ~ b, of 81 satisfy
~
P4(a) = {O}; P4(b) = 81; P4(Al) a subspace of P4()"z) whenever Al < Az; if a ~ Ao < b, then P4(Ao) is the intersection of all classes P4(A) with Ao < A ~ b; (d) if a < Ao ~ b, then P4(Ao) is identical with the closure of the union of ofspaces P4(A) with a :s;; A < Ao. (a) (b) (c)
is
We now assume the above notation with B~~ and C~~ identically zero in (3a), r~(t) given in (6), and the recursive relations (29)
v~(t) v~(t)
f: r~(s) = f: [r~(s) =
ds + c~,
- V~-l(S)]
ds + c~,
where c~, . . . , cp- 1 are constants. Note that v~(t) = r~(t), v~(t) = r~(t) - v~(t) and v~(t) = i~(t) - r~(t), v~(t) = r~(t) - v~(t) and v2(t) = i'~(t) - v~(t) = i’i(t) › i~(t) + r~(t), etc. when the derivatives make sense. Thus (1) of Section 6.0 holds.
150
4 The 2nth-Order Problem
For any two arcs x(t), f(t) in d, let
(30)
J ix, f) = v~(2)f~k)(2)
S: S: K~p(s, + S: R~p(t)x~)(t)fW)(t)dt, +
t)x~)(s)fW)(t)
ds dt
where v~(2) are given by (29) at t = 2. The following result characterizes gg(2t Theorem 15 Let J(x) be the quadratic form given in (3). An arc x(t) in d is J orthogonal to a subspace gB(2), a ::; 2 ::; b, if and only if the constants cg, . . . , cp- 1 in (29) and constants f.1b’ , f.1m can be chosen such that (a) the Euler equations
’t"p(t) = vp- 1(t)
(31)
(fJ = 1, ... , p),
hold almost everywhere on a::; t ::; 2, where ’t"p(t) are given by (6) for j and vp- 1(t) are given by (29); (b) the transversality conditions (32)
+
A~px~k)(a)
f.1 yM;p - v~(a)
=
= n,
0,
hold at t = a, where v~(a) are given by (29) at t = a; and (c) J ix,f) = 0 for every arc fit) in the r-dimensional, linear subclass described above where J ix, f) is given by (30).
re
An arc x(t) in d satisfying the Euler equations (31) almost everywhere on a ::; t ::; b is called an extremal. Moreover, an extremal x(t) satisfying the transversality conditions (32) at t = a is called a transversal extremal. Thus an arc x(t) in d is J orthogonal to a subclass gB(2), a ::; 2 ::; b, if and only if x(t) is a transversal extremal on a ::; t ::; 2 such that J ix, f) = 0 holds for every arc f(t) in the subclass described above, where J ix, f) is given by (30). For our final results in this section, we assume == 0. Thus gg(2), a ::; 2 ::; b, is the subspace of arcs x(t) in d such that
re
(33)
Lix) = 0,
re
on
2 < t ::; b,
where the linearly independent forms Ly(x) on d are given in (27). The family of closed subspaces gB(2), a s; 2 ::; b, of gg have properties (a)-(d) of Theorem 14.
Lopez defines gB* to be the subspace of gB such that x~k)(a) = x~k)(b) = 0 and then gB*(2) = gB* n gB(2) so as to get a conjugate point theory. We shall use the words conjugate point and focal point interchangeably, depending on the rank of (M~<x), Thus if (M~<x) has "full rank," Lix) = in (27) implies x~k)(a) = 0, and we have Lopez’s conjugate point theory. His ideas are in keeping with the classical definitions, while ours are not.
4.1 The Signature Theory of Lopez
151
The next theorem characterizes gg(Je)J and ggo(Je) and is a major theorem for later sections. Theorem 16 Let J(x) be the quadratic form given by (3). An arc x(t) in .91 is J orthogonal to a subspace gg(Je), a ~ Je ~ b, if and only if x(t) is a transversal extremal on a ~ t ~ A.. That is, the constants c~, . . . ,cji- 1 in (29) and constants /11> , /1m can be chosen such that (a) the Euler equations given by (31) hold almost everywhere on a ~ t ~ Je; (b) the transversality conditions givenby (32) hold at t = a. An arc x(t) is in ggo(Je) if and only if it is a transversal extremal on a ~ t ~ Je, and Ly(x) = X~k)(A) = 0 and x(t) =: 0 on A < t ~ b.
Since J(x) is an elliptic form on .91, the determinant Irap(t) I is different from zero almost everywhere on a ~ t ~ b, by Theorem 12.Accordingly, the subsequent result is obtained from the theory of ordinary differential equa› tions. Theorems 17 and 21 show that ordinary different equations lead to a focal-point theory and not a focal-interval theory. These results "verify" Theorem 16 of Section 2.3 and Theorem 2 of Section 3.1. Theorem 17 Let the quadratic form J(x) given by (3) be an elliptic form on .91. Assumethat K~p(s, t) vanish identically and R~p(t) are essentially bounded. If an arc x(t) in .91 is an extremal and x~)(t) = v~(t) = 0 (fJ = 1, ... , p; k = 0, ... , n - 1) at somepoint on a ~ t ~ b, where v~(t) are givenby (29), then x(t) =: o on a ~ t ~ b. Accordingly, no nonzero, J-null vector of gg(Al) is a J null vector of gg(A2) wheneverA1 "# A2.
In order to describe and to characterize conveniently the focal points of J(x) relative to gg(A), a ~ A ~ b, the following definition is introduced. An arc x(t) in .91 is a focal arc with respect to a subspace gg(A), a ~ Je ~ b, if and only if x(t) is J orthogonal to gg(Je) and Ly(x)= 0 (y = 1, ... , m ~ np), where the linear forms Lix) are given in (27). The focal class ff is the linear class of focal arcs x(t). Thus a focal arc x(t) with respect to a linear subclass gg(Je), a ~ A ~ b, is a transversal extremal on a ~ t ~ Je such that Lix) = O. The class ff n 93’(A) of focal arcs x(t) in gg(A) is the class ggo(A) of J null vectors of gg(A). We recall that we have discussed "focal point" above. Thus if s(Je) is the signature of J(x) on 93’(Je), then Ao is a focal point of J(x) relative to gg(Je), a ~ A. ~ b, if f(Ao) = s(,1 o + 0) - S(Ao - 0) "# O. In addition f(Ao) is called the order of Aas a focal point. Theorem 18 is also a major theorem. It shows that for integral-differ› ential equations we obtain a focal interval theory. A complete description of this theory will be postponed until Chapter 6. Theorem 18 is very similar to the corresponding result in Mikami [37, p. 75J and our Section 6.2. In the next four theorems we relate s(Je) and n(A) to solutions of equations and transversality conditions.
152
4 The 2nth-Order Problem
Theorem 18 A value A, a < A < b, is a focal point of lex) relative to go(A), a ~ A ~ b, if and only if there is a focal arc x#-o such that (a) X~k)(A) = 0, (er: = 1, ... , p; k = 0, ... , n - 1); (b) x(t) ¥= on any interval A< t ~ A’; and (c) there is no focal arc y such that yet) == on a ~ t ~ A, and yet) ¥= x(t) on any interval A < t ~ A’. The order of Aas a focal point is equal to the number C(A) of linearly independent arcs in a maximal subspace of focal arcs of this type. Moreover, there are at most a finite number of points on the interval a ~ A ~ b at which C(A) #- O. The index s(b) of lex) on!JB is equal to the sum of the orders C(A) of the focal points Aof lex) relative to go(A), a ~ A~ b. Theorem 19 Let h denote the sum of the orders of the focal points of lex) relative to go(A), a ~ }, ~ b. There is an h-dimensional subspace
(34)
E,lx) =
Jo
(_1)n-
j
L (-1tj~O
j
n
=
Ifx(t) is in C
2n
::j(R~p(t)x~(t)
+
f K~p(s,
t)x~)(s)
dS)
dj d j [T~(t)]. t
and R~,lt), and K~rls, t) are sufficiently smooth, then the Euler equation given by (31), namely, Tp(t) = Vp-l(t), where Tp(t) is given in (6) and Vp-l(t) is given in (29), is equivalent to EfJ(x) = 0. Thus Tp(t) = Vp-l(t) is the integrated form of Ep(x) = O. By Theorems 4, 5, and 6, extremal solutions are of class C2n when we restrict lex) to the subspace d* n c 2 n and (34) makes sense.
4.1 The Signature Theory of Lopez
153
To see this equivalence in detail, we proceed as follows. From Tp(t) = vp-1(t) we have
d
d
d2
d
dt [Tp(t)] = dt [vp-1(t)] = Tp-l(t) - vp- 2(t),
d
dt 2 [Tp(t)] = dt [Tp-l(t)] - Vp-2 = dt [Tp-l(t)] - [Tp-2(t)] d3 dt 3 [Tp(t)]
d2
d dt [Tp-2(t)]
d2
d
= dt 2 [Tp-1(t)] -
+ vp- 3(t)],
+ vp- 3 (t),
= dt 2 [Tp-l(t)] - dt [Tp-2(t)] + Tp- 3(t) - vp- 4(t), dn- 1 dn- 2 dn- 3 dt n- 1 [Tp(t)] = dt n- 2 [Tp-l(t)] - dt n- 3 [Tp- 2(t)]
+ (-lr-2{[T~(t)]
-
+ ...
v~(t)},
dn dn- 1 dn- 2 dt n [Tp(t)] = dt n- 1 [Tp-l(t)] - dt n- 2 [Tp-2(t)] + ...
+ (-It{i~(t) dn - 1
= -
sr:’
- v~(t)}
dn -
2
[Tn-1(t)] _ _ [T n- 2(t)] + ... P dt"’? P
We conclude this section by considering an example problem. Let (35)
J(x) =
f: (X,2 -
x 2)dt
+
f:f: x(s)x(t)ds dt
be the quadratic form associated with the bilinear form (36)
J(x,y) =
f: (x’y’ -
xy)dt +
f:f: x(s)y(t) ds dt.
Integrating by parts with u = x’, dv = y’dt, du = x" dt, and v = y, we have J(x, y) =
f:(
-x"(t) - x(t)
+
f:
x(s) dS) y(t) dt
+ X’yl:•
Thus if go is the set of all "nice" arcs y(t) vanishing at t = a and t = band 2 Xo is in C , we have Xo is in f!8J if and only if xo(t) satisfies (37)
L(x) = x"(t)
+ x(t) -
f: x(s) ds = O.
154
4 The 2nth-Order Problem
If x(t) is in C 3 , we can differentiate (37) so that (D3 + D)x = 0 or xo(t) = A sin t + B cos t + C. Substituting this result into (37) we obtain
0= C =
S:
(Asint + Bcost + C)dt
C - (-Acost + Bsint + Ct)l:
= C - A - (- A cos b + B sin b + Cb) or (38a)
0= A( -1
+ cos b) + B( -sinb) + C(1 -
b).
Ifxo(t) is in f!J we also obtain (38b)
o=
(38c)
0= Asinb + Bcosb + C.
A .0
+ B . 1 + C,
In this case to solve for A, B, and C we have a coefficient matrix
,1=
- 1 + cosb -sinb 0 1 [ sinb cosb
or
1,11 = f(b) = -1 + cosb - sin? b - sinb(1 - b) - cosb( -1 =
-2 + 2cosb - sinb
+ cosb)
+ bsinb.
Note that f(b) = 0 when, for example, b = 2n, and hence there exists a non› zero solution xo(t). In fact, the reader may verify that if b = 2n then the above equations become 0 = A(O) + B(O) + C(1 - 2n), 0 = B + C, 0 = B + C, which implies C = 0, B = - C = O. Thus, xo(t) = sin t is a solution. Note that unlike the case for Section 2.1 the vector x1(t) = sin ton [0, n] and zero on [n, 2n] satisfies
J(Xl) = fo" [(sin t)’z - (sin t?] dt + [fo" sin t dtJ > O. Furthermore there can exist no vector X z #- 0, where xz(t) == 0 on tt and J(x z) ~ O. Otherwise J l(X Z) ~ 0, where Z" J1(x) = fo [xfZ(t) - xZ(t)]dt.
~
t
~
2n
4.1 The Signature Theory of Lopez
155
This illustrates (in this case) that the conjugate points of lex) occur after those of ll(X). We now identify this problem with Lopez’s theory. From (4) we have n = 1, p = 1 and IX, f3 are ignored; k, 1 = 0; i,j = 0, 1; and R l 1(t) = 1, ROO(t) = -1, KOO = 1 with all remaining terms identically zero. From (6) we have to(t) = - x(t)
+ So2" xes)ds, tl(t) = x’(z),
Condition (27) becomes x(a) = 0 and x(b) = 0, while (29) becomes VO(t) = S:( -x(s) + s:"x(U) dU) ds + co,
From (33) we have 86(A) is the set of arcs x(t) such that x(a) = 0, x(t) == on [A,b]. From Theorem 16, we see that x(t) is in 86(A)J if and only if there exists CO in (29) and P,l = 0 [since vO(a) = 0] such that the Euler equations given by (31), namely, ’[l(t) = VO(t), hold almost everywhere on a:5; t s; A. Thus x’(r) = ’[l(t) = VO(t)
(39)
= S: ( -xes) + S:" x(u)du)dS + CO or (40)
x"(t) + x(t) -
S:" x(u) du = 0
holds on
a
s; t :5; A
whenever we can differentiate (39). As per our discussion following Theorem 21, xo(t) = sin t is an extremal vector [condition (5) holds] if we restrict lex) to the set of functions x(t) in C 2 Similarly, x(t) is in 860(A) if and only if (4) is satisfied on a :5; t :5; A, x(a) = 0 and x(t) == 0 on A :5; t :5; b. Thus if A = 2n then xo(t) = sin t is in 86 o(2n). We note that xit) is no longer a negative vector for e > 0, as at the end of Section 3.1. However z.(t), defined below, is a negative vector on the interval [0,2n + e]. Thus let
x.(t) =
zit) =
Sin t
o :5; t :5; nl2
1
nl2 < t < nl2 + e nl2 + s < t :5; n + e n + e < t :5; 2n + e,
. { ~ll (t
{
- e)
sin t
o :5; t :5; nl2
1
nl2 < t :5; nl2 + s nl2 + s < t :5; 2n + e.
sin(t - e)
156
4
The 2nth-Order Problem
Now similar to Section 3.1, we have J(x,) = -8 + (8 +
J: sintdtY = -8 + (8 + 2)2> 0,
while J(z,)
= -8+(8+ J:"sintdtY = -8<0.
The remaining details of this example are left to the reader.
Approximation Theory
4.2
The purpose of this section is to derive an approximation theory for the problems studied in Section 4.1. More specifically, we show that our fun› damental inequality (10) of Section 2.3 can be generalized to the inequality S(Ao, 0"0)
~
S(A, 0")
s()", O") + n(A, 0")
~
~
s().o, 0"0)
+ n(Ao,0"0),
where J(x; 0") is defined below on a space d(O") II .16(A). As in Chapter 3 we are especially interested in the situation when 0" is the eigenvalue parameter, the numerical parameter or the numerical-eigenvalue parameter. Thus, we can derive results for focal interval and focal-point-numerical-eigenvalue problems. We also reestablish many of the results of Section 4.1 as a special case. In addition, we shall repeat (though in a different applied setting) some of the theoretical arguments of Chapter 3. We restate the definition of f!.6 and f!.6(),) of Section 4.0 for convenience. Let f!.6 denote a subspace of d such that x is in f!.6 if and only if (la)
Ly(x) = M~iXx~k)(a)
= 0, x~)(b) = 1; y = 1, ... , m ~ np), where
= 1, ... ,p; k,l = 0, ... , n M~iX are real numbers such that the linear functionals Ly(x)are linearly independent on d. Let f!.6().), a ~ A ~ b, denote the subspace of f!.6 whose component functions satisfy
(a,p
(1b)
on
A~ t
~
b
for
k = 0, 1, ... , n - 1.
Assume that for each 0" in a metric space (L, p) we have a quadratic form (2)
J(x; 0") = A:~trx~k)(a)x~)(a)
f Jab K~Ptr(s, + f R~Ptr(t)x~)(t)X~)(t)
+
t)X~)(S)X~)(t)
ds dt dt,
4.2 Approximation Theory
157
where the indices iX, P, k, l, i, and j, symmetry and smoothness properties of K~Pa, and R~Pa are those given in Section 4.1. In order to illustrate later developments, we shall show that conditions (2) of Section 2.3 hold if the coefficients A~~a, K~pAs, r), and Rapa(t) of (18) are continuous in a and continuous (for fixed a) in the independent variables t and s. A~~a,
Theorem 1 Let fJ8 denote the space of functions given by (la). Let A~~a, t), and Ra/lAt) satisfy the continuity properties given above. Then condi› tions (2) of Section 2.3 hold on fJ8. K~pAs,
For (2a) of Section 2.3, if x, ..... /J(x" Yr; a r) - Jix«, Yo; ao)1
Yr => Yo, and a, ..... ao, then
X O,
:s;;
IJ(x" Yr; a r) - J(x" Yr; ao)1
+ IJ(x" Yr; ao) -
J(xo, Yo; ao)l•
The second term becomes arbitrarily small as J(x, Y; ao) is elliptic on fJ8. That is, it can be expressed as the difference D(x, y) - K(x, y), where D is topologically equivalent to the inner product (x, y) and K(x, y) is compact. Equivalently, the second term becomes small by use of Theorem 1 of Section 4.1 since the coefficients are fixed. The first term is bounded by M 1 t/J(a)llxrIIIlYrll :s;; M 2t/J(a), where t/J(a) = 3 sup{IR~Pao(t)
- R~pa(t)l,
IK~pao(s,
t) - K~/la(s,
t)l, IA~~ao
- A~~al}
and the supremum is taken for s, tin [a,b]; o; P = 1, ... , p; i,j = 0, ... , n; k, t = 0, ... , n - 1. Thus the first difference tends to zero as a ..... ao by the continuity of the coefficients and the fact that both weak and strong con› vergence implies boundedness by Theorem 1 of Section 4.1. Similar arguments for (2b) and (2c) of Section 2.3 hold. For example, if x, ..... Xo, then J(x r; a r) - J(x o; ao) = J(x r; a r) - J(x r; ao)
+ J(x r; ao) -
J(xo; ao).
Since /J(x r; a r) - J(x r; ao)l:s;; M 3 t/J(a r) .....
as
a; ..... ao,
(2b) of Section 2.3 holds since J(x; ao) is elliptic and hence lim infJ(xr ; ao) z J(xo; ao). r== 00
IJ(xr;a r) - J(xo;ao)1
z IIJ(x r;a r) -
J(xr;ao)!-IJ(xo;ao) - J(x r; ao)1I.
As above, J(x r; a r) ..... J(x r; ao) so that Jtx.: ao) ..... J(xo; ao). But J(x; ao) is elliptic and therefore x, => X o ’ This completes the proof. Similarly, we have condition (1) of Section 2.3 holding in this setting.
158
4
The 2nth-Order Problem
Theorem 2 The subspaces .cJl(A), a;S; A ;S; b, in (1a) satisfy hypothesis (1) of Section 2.3. The proof of this result follows in two steps. In the first we have that these subspaces satisfy conditions (a), (b), (c), and (d) of Theorem 14 of Section 4.1. In the second by Theorem 10 of Section 2.3 we have that these conditions are sufficient for (1) of Section 2.3. We remark that the signature theory of Lopez now follows in the focal point case, where K~{J(s, t) == 0, by Theorem 16 of Section 2.3 and in the focal interval case by the derivation in Section 6.1. We now show that our funmental inequality can be extended as indicated above. . Let M = A x ~ be the metric space with metric d defined by d(fll, flz) = p.z - All + p(aZ,al)’ where fll = (Abal), flz = (Az,az); (~,p) is a metric space; and A = [a, b] with the usual absolute-valued metric. For each fl = (A,a) in M and J(x; a) in (2), define J(x; fl) = J(x; a) on the space re’(fl) = d(a) n .cJl(A). Let S(fl) = S(A, a) and n(fl) = n(A,a) denote the index (signature) and nullity of J(x; fl) on re’(Ji} In many senses, Theorem 3 is the main result for applications to approxi› mation problems. It allows us to extend our theory to more general problems. This fundamental result follows from Theorem 14 of Section 3.3, which is proven in Section 3.5. Theorem 3 Assume that the quadratic forms J(x; a) and the spaces d(a) satisfy (1) and (2) of Section 2.3. For any Jlo = (20 , 0’0) in M there exists (j > 0 such that if fl = (A, a), d(flo,fl) < (j, then (3)
s(Ao,ao);S; s()"a);S; s(2,a)
+ n(2,a);S; s(20,ao) + n(Ao,ao).
Furthermore and
n(A,a) = o.
We now digress to interpret Theorem 3 for the integral-differential equations in this setting. For convenience we will assume d(a) = .cJl. For each a in ~ and quadratic form J(x; a) given by (2) let (5)
(6a)
V~u(t)
= f: 'r~u(s)ds
(6b)
v~u(t)
= f)t~, (s)
(7)
tp,,(t) = vp;; l(t),
+ c~u, -
v~;;l]ds
+ c~u,
4.2 Approximation Theory
159
(8) (9)
satisfy the conditions of those in Section 4.1. Then by Theorem 1 of Section 4.1 we have Theorem 4 The integer n(A,0") is the number of distinct nonzero solutions x(t) = (x 1 (r),xz(t), ... , xp(t)) to (7)
’rp".(t) = vp; l(t)
satisfying the transversality conditions (8)
and the boundary conditions M~ax~k)(a) 1= 0, ... , n - I; y = I, ... , m ::; np).
= 0, x~)(A)
= 0
(rx,/3 = 1, ... ,p; k,
We note as in Chapter 3 that for fixed 0"0’ s(A,0"0) and m(A,0"0) = s(A,O"o) + n(}" 0"0) are nondecreasing, nonnegative, integer-valued functions of A. Fur› thermore, s(A- 0, (1) = s(A,O") and the disjoint hypothesis or K~pu(s, t) == implies s(}, + 0, (1) = s(A,O") + n(}" (1). Thus in this case, s(A + 0, 0"0) ›
s(},- 0, 0"0) = n(A,0"0)’ This disjoint hypothesis is usually called "normality" in problems of differential equations, calculus of variations, and control theory. Finally, we arrive at the concept of focal points. A point }, at which s(A,0"0) is discontinuous will be called a focal point of J(x; 0"0) relative to ~(A) (A in A). The difference fV" 0"0) = s(A + 0, (10) - s(},- 0, 0"0) will be called the order of the focal point. A focal point will be counted the number of times equal to its order. The next theorem is a restatement of Theorem 16 of Section 2.3 in this setting. Theorem 5 Assume for (10 in L that ~ 0(,11> (10) n ~ o(}’z, (10) = when # ,12’ then f(a, 0"0) = O,j(A,O"o) = n(A,(10) on a ::; A::; b. Thus if ,10 is in A the following quantities are equal: ,11
(i) the sum La,;.«.
160
4
The 2nth-Order Problem
We can say much more for the approximation setting. In the next two results, we assume that 0"0 in L satisfies 0’0(A. b 0"0) n 0’o(A.z, 0"0) = 0 when }’l i= }’z• Since this implies that n(}" 0"0) = 0 except for a finite number of points A. in A, we have from (4) with }’o = X and then with }’o = }." Theorem 6 Assume X and X’ are not focal points of 0"0 (a::; I,’ < },"< b) and },ier0) ::; }’q+ 1 (0"0) ::; ::; }’q+k - 1 (0"0) are the k focal points of 0"0 on (A’’;’’’).Then there exists an s > 0 such that p(O",O"o) < B implies }’q(O")::; }’q+ 1(0") ::; ... ::; }’q+k-l(0") are the k focal points of 0" on (},’,},"). Corollary 7 (k = 1,2, ... ).
The kth focal point }’k(O") is a continuous function of
0"
If K~p(s, t) == 0 does not hold, or if the disjoint hypothesis 0’o(}’l> 0"0) n 0’o()’z,O"o)= 0 when I’l i= Az does not hold, a nonzero solution to rjiao(t) = Vji;;ol(t) subject to (8) may be identically zero on a proper subinterval of A. In this case, s(), + 0, 0"0) - s(A. - 0; 0"0) i= 0 if and only if I, is the right-hand endpoint of the subinterval and s(b, 0"0) counts the number of such intervals. A detailed explanation of abnormal problems may be found in Chapter 6. This phenomenon is the "focal-interval" problem encountered in optimum control theory. It should be regarded as a generalized type of oscillation problem.
4.3
Comparison Results
The purpose of this section is to give very general comparison results for the signatures of two quadratic forms or for the oscillations of two dif› ferential equations. We also give nonlinear results by changing one of the forms to a nonlinear functional. Elementary type results have been seen in Section 3.1, Section 3.3, and at the end of Section 4.1, as well as more sophisticated results for eigenvalues such as in Theorem 2 of Section 3.4. We have indicated in Section 1.1 that eigenvalue problems are really com› parison parametric problems. The basic point to remember is that by the words "comparison theorems" or results we mean comparison of the signature of a quadratic form J l(X) on a space d 1 with the signature of a quadratic form J 2(X) on a space d z› If we may then relate the respective signatures to solutions xo(t) of differ› ential equations that vanish at points t*, we may compare the respective vanishing points. Ifthis relation is not possible, we still have a more general theory of oscillation, but one less interesting to students of differential equa› tions.
4.3
161
Comparison Results
In Thoerem 1, we shall prove the Sturm comparison theorem. Leighton [35, p. 225] refers to this theorem as a generalization of the Sturm-Picone theorem. We could then use some clever arguments and the identity (Picone formula)
b Jar [(r 1 -
r)z’Z + (p - P1)ZZ] dt
b + Jar r [YZ’-y y’zJZ dt
= -z [r 1yz, - ry’Z]Ib y a
to obtain Theorem 1. In the above and in Theorem 1, we assume r(t), r 1(t), p(t), and P1(t) are continuous. We ask the reader to pay special attention to the methods of our proofs since they are repeated throughout this section. For Theorem 1, let y(t) be a nonnull solution of (r(t)y’(t»’ + p(t)y(t) = 0 and z(t) be a nonnull solution of (r 1 (t)z’(t», + P1 (t)z(t) = 0, where 0 < r(t) ~ r 1(t) and P1(t) ~ p(t); or less generally we may assume
fab [(r 1 -
(1)
Theorem 1 If z(a)
r)z’Z + (p - pdzZ] dt 2:: O.
= z(b) = 0 and y(a) = 0, then y(t*) = 0 for a < t*
~
b.
+
Our proof for the specific hypothesis (1) is as follows. Let m().) = s().) n().), where ,161().), a ~ Il ~ b, be the space of C 2 functions vanishing at t = a and on [Il,b], where J(x)
=
Lb [r(t)x’Z(t) -
p(t)XZ(t)] dt.
Similarly, let m 1().) = Sl().) + n 1().1) on ,161().), a J 1(x) =
~
). ~ b, where
f [r1(t)x’Z(t)- P1(t)XZ(t)] dt.
Recall that m(ll) is the dimension of a maximal subspace rrJ of ,161().) such that x in rrJ implies J(x) ~ O. By Section 3.1, J l(Z) = J l(Z,z) = 0; while by (1), J 1(z) - J(z) 2:: 0 since J 1 (z) - J(z) is the left-hand side of (1).Now 0 = J 1 (z) 2:: J(z) implies m(b) 2:: 1 and, by Section 3.1, the existence of t*, which satisfies the theorem. For future purposes, we prove this result for the more general hypothesis 0< r(t) s r1(t) and P1(t) ~ p(t). In this case, J l(X) - J(x) 2:: 0 or J l(X) 2:: J(x) for any vector x. If x is such that J 1 (x) < 0, then 0 > J 1(x) 2:: J(x) so that s().) 2:: Sl().)’ Similarly, m().) 2:: m 1 (1l). The hypothesis of Theorem 1 implies m1(b) 2:: 1 so that m(b) 2:: 1 and our result holds as before. If there is strict inequality, such as r1(ttl > r(t 1) at some point t 1 in [a, b], then the hypothesis
162
4 The 2nth-Order Problem
implies s(b) ~ 1. This assertion follows since J 1(z) = 0 and J 1(x) > J(x) for all x i= 0 implies J(z) < O. Thus the null solution of one quadratic form is the negative vector of the second quadratic form. In some real sense, we believe our ideas are more useful and general than the more standard oscillation theorems. For example, in the above, we can relax the continuity conditions on the coefficient functions. Similarly, we need not consider solutions to equations (rx’)’ + px = 0, but the general solutions satisfying the integral form of this equation. These results can also be generalized to focal-interval-type results of Section 3.1 and Chapter 6. Furthermore, we have seen that our methods are often quantitative and not only quantitative; that is, we can count as well as compare. Finally, they are applicable to stationary solutions of extremal problems and have some real meaning. We now consider second-order problems of integral-differential equa› tions as in Section 4.1. We leave to the reader the multitude of higher-order problems. Our results are given in terms of focal points, but the reader can make the transition to "oscillation points" as the point t* of Theorem 1 or to right-hand end points of focal interval as in Chapter 6, when these terms are applicable. Thus let (ri(t)x’(t))’+ Pi(t)X(t) =
Jar- qi(S, t)x(s)ds - dtd Jafb lies, t)x’(s)ds
be associated with the quadratic forms Ji(x)
=
S: [ri(t)x’Z(t) - Pi(t)XZ(t)] dt + Lb f qi(S, t)x(s)x(t) ds dt + Lb S: lieS, t)x’(s)x’(t)ds dt.
In the above, i=l, 2; rit)~r1(t»0, u(s)u(t), lz(s, t) - 11(s, t) = v(s)v(t).
PZ(t)~Pl(t);
qiS,t)-q1(S,t) =
Theorem 2 Let a < Al < Az < A3•.. < Ami ~ b be the focal points asso› ciated with equation (21 ) and let a < Jl1 < Jlz < Jl3 < ... < JlmZ ~ b be asso› ciated with equation (2 z). Then mz ~ m1’and for eachj such that 1 ~ j ~ mz, we have Aj ~ Jlj.
Let x(t) be any vector vanishing at a and b, then J z(x) - J l(X) =
Lb [(rz - r 1)x,Z(t) - (pz - Pl)XZ(t)] dt + (Lb u(s)x(s)dsY+ (S: v(s)x’(s)dS) z ~ o.
4.3 Comparison Results
163
Thus J z(x) ~ J I(X), so that a negative vector of J z(x) is a negative vector of J 1(x) (i.e., Jz(y) < 0 implies J 1(y) < 0). The result now follows from the definition of focal points. The next theorem shows that the focal points of ordinary differential equations can be "bracketed" by focal points of integral-differential equa› tions and conversely. Corollary 3 Assume that ql(S, t) = ’l(s, t) == 0 in Theorem 2. Then Aj ~ /1j for each j such that 1 -::;j ~ mz. Conversely, if ql(S,t) = ’I(S,t) == 0, rl(t) ~ rz(t) > 0, pz(t) ~ Pl(t), qz(s,t) = - u(s)u(t), lz(s,t) = - v(s)v(t), then /1j ~ Ajfor eachj such that 1 -::;j -::; mi’
The above inequalities are "sharp." For example, assume the conditions of Theorem 2; then from an argument similar to Theorem 6 we have the following sharp, strict inequality result. Corollary 4 Let j be a nonnegative integer such that any of the following hold: rz(t) - r 1 (t) > 0 on [a, /lJ; Pl(t) - pz(t) > 0 on [a, /lj]; S~j u(s)x(s)ds #- 0 or f~jv(s)x'(s)ds #- 0, where x(s) is a solution to (2 z) on [a,/1J vanishing on (J1j, b]. Then }’j < /lj’ Theorem 5 (Sturm separation theorem) IfXl(t)and xz(t) are two linearly independent solutions of (2d and (2z), respectively then between any two consecutivefocal points ofx1(t) there is afocal point ofxz(t).
The negation implies there exist points a < al < a z < a3 such that if s(t) is the signature of the appropriate quadratic form, we have s(a z + 0) › s(a 1 - 0) > 0 and s(a3 - 0) - s(a) = 0, which is clearly impossible. Theorem 6 Assume the hypothesis of Theorem 2 except that qz(s,t) ~ q1(S, r), Then }’1 s /11 if /11 exists. Furthermore, }’1 < /l1 unless r 1(t) == rz(t), Pl(t) == pz(t) and ql (s, t) == qz(s, t) on [a, /11] and f~l v(s)x’(s)ds = 0 for any solution to (2 z) on [a, /11].
By hypothesis, if x(t) is any nonnull vector vanishing at t the interval [/lllb], and x(t 1)x(tZ ) ~ 0 on [a,/lI]’ then
Jz(x) - J 1(x) =
=
f:’[(rz - r 1)x’Z - (pz - pl)x + f:’f:’[qz(s,t) - ql(s,t)]x(s)x(t)dsdt + (f:’V(S)X’(S)dSY~ o.
The result again follows by our discussion.
Z(t)]dt
a and in
164
4 The 2nth-Order Problem
Conversely, if xo(t) is any nonnull solution to (2 2 ) with xo(a) = 0, then 12(XO) "restricted" to [a, Ill] has valued zero. If12(XO) - 1 1 (xo) > 0 restricted to [a,lll]’ then as before we have )’1 < Ill’ We now consider comparison theorems between linear and nonlinear equations. In Theorems 7, 8, and 9 we set L(x) = (r(t)x’(t))’ - q(t)x(t) and L 1(x) = (r 1(t)x’(t))’ - ql(t)x(t) and assume that xg(t,x) ~ on an interval a ::::; t ::::; b. Let 1(x) and 1 I (x) be the quadratic forms associated with L(x) and L 1(x), respectively. Furthermore we assume that 0< rl(t) ::::;r(t) and q I (t) ::::; q(t) on a ::::; t ::::; b. Theorem 7 Assume there exists a nontrivial solution to L(x) = g(t, x) satisfying x(a) = x(b) = O. Then there exists a nontrivial solution to L 1(x) = hx,), - qlX = 0 such that x(ad = x(a 2) = 0, a ::::; al < a2 ::::; b. Let x(t) be as given in the hypothesis. Then
0= S:[L(x) - g(t,x)]xdt
=
J:[ -rx,2 -
qx2 - xg(t,x)]dt + rX’xl:
implies
0=
Lb [rx,2 + qx 2 + xg(t,x)] dt ~
J: (r
1x,2
+ qlx)dt = 1 1(x).
Thus mj(b);;::’: 0, which implies the result. Theorem 7 and 8 are "sharp," as is Theorem 6.
Corollary 8 Let a = al < a2 < a3 < ... < a n + I ::::; b be points such that a nontrivial solution of L(x) = g(t, x) vanishes at a, (i = 1,... ,n + 1). Then, there exists a nontrivial solution to L 1(x) = which vanishes at least n + 1 times on the interval a ::::; t ::::; b. Corollary 9 If no nontrivial solution to L 1(x) = 0 vanishes n times on a a
s; t s; b, then no nontrivial solution to L(x) = g(t, x) vanishes n times on s; i s; b.
As an example the reader may verify that x(t) = sin t is a solution to [(2 + cos t)x’]’ + (2 + 2 cos t)x = such that x(o) = 0. Hence no nontrivial solution to [(2 + cos t)x’]’ + (2 + 2 cos t)x = [hI (t)x 5 + h 2(t)x3] (hi (t) > 0, h 2 (t) > 0) vanishes more than once on [0, n) or (more generally) on [a, a + n). Note that if k ;;::.: then sin t is a solution to [(2 + cos t)x’]’ + (2 + 2 cos t)x = k(x 5 + x 3), so our results are "best possible." In Theorems 10, 11, and 12 we assume f(t,x,x’) does not change sign on a s; t ::::; b for some nontrivial solution x(t) of L(x) = (rx’)’ - qx = f(t, x, x’), 0 < rl (t)::::; r(t), and ql (t) ::::; q(t).
4.3
Comparison Results
165
Theorem 10 Assume the hypothesis of the last paragraph and that x(t) is a nontrivial solution to L(x) = f(t, X, x’) such that a < c < b are three con›
secutive zeros of x(t), x’(c) -=1= O. Then there exists a nontrivial solution to Ll(x) = (rlx’)’ - qlx = 0 vanishing at a l and az, a ::::; a l < a z ::::; b. We may assume that x(t)f(t, x, x’) ~ 0 on a ::::; t::::; c. Set yet) = x(t) on [a, c] and zero otherwise. The implication in the proof of Theorem 7 yields
0= f[rx’Z + qx Z + V(t,x,x’)]dt > Lb(rd Z + qlyz)dt = ll(Y) and hence the conclusion follows as above. Corollary 11 Assume x(t) is a nontrivial solution to L(x) = f(t, x, x’) such that x(ai) = 0 for i = 1,2, ... ,2n + 1, a s; a l < az < ... < aZ n + l = band x’(a) -=1= 0 for j = 2,3, ... ,2n. Then there exists a nontrivial solution yet) of Ll(x) = (rlx’)’ - qlX = 0 which vanishes at (at least) n points of a ::::; t s; c. Corollary 12 If a nontrivial solution of Ll(x) = 0 vanishes less than n times on a ::::; t ::::; b, then every nontrivial solution yet) of L(x) = f(t, x, x’) vanishes less than 2n + 1 times on as; t ::::; b unless a < a; < b, yea;) = 0 implies y’(a;) = 0 (i = 2,3, ... ,2n). For example, if f(t, x, x’) = 8(t, x, x’)[1 - (x? - (x’)Z], then sin t is a so› lution to
+ cos t)x’]’ + (2 + 2 cos t)x = 8(t, x, x’)[1 - (XZ) - (x’)z]. Thus, there exists a nontrivial solution to (r1x’)’+ qlx = 0 for any rl(t) and ql(t) such that 0 < rl(t)::::; 2 + cost, ql(t)::::; 2 + 2cost, which vanishes at least n times on any interval a ::::; t ::::; a + 2nn. In fact, in this example we [(2
can easily show that any solution must vanish 2n times on a ::::; t ::::; a + 2nn. In Theorems 13 and 14 we extend (in the obvious manner) our comparison theory to fourth-order differential equations. For simplicity, we set Lz(x) = (rx")" + qx, though our results are valid for more general fourth-order linear, self-adjoint equations. We also assume 0 < r l ::::; rand ql ::::; q on a ::::; t ::::; b as above. In Theorem 13, we assume xg(t, x, x’, x", x"’) ::::; O. We note the obvious corollaries to Theorem 14 are (essentially) Corollaries 8 and 9 except that the condition x(to) = 0 (or x(t) vanishes at to) is replaced by x(to) = 0 and x’(to) = O. Theorem 13 Let x(t) be a nontrivial solution to Lix) = get, x, x’,x", x"’) such that x(a) = x’(a) = x(b) = x’(b) = O. Assumexg(t, x) ::::; 0; then there exists a nontrivial solution yet) to Lix) = (rlx")" + qlx = 0 satisfying y(al) = y’(al) = y(az) = y’(az) = 0, where a s; a l < az ::::; b.
166
4 The 2nth-Order Problem
Integrating by parts twice we have
0= f:[L 2(x) - gJxdt
= f: [r(x")2 + qx - gxJ dt + [(rx")’x - (rx")x’JI: >
f [r
1(x")2
+ ql X J dt,
which completes the theorem. Theorem 14 Assumef(t, x, x’,x", x’’’)does not change sign for a nontrivial solution x(t) of L 2(x) = f(t, x, x’, x", x"’) on as t S b, where a < c < b, x(a) = x’(a) = x(c) = x’(c) = x(b) = x’(b) = 0, x"(b) =1= O. Then there exists a nontrivial solution y(t) of Lix) = (rlx’)’ + qlX = 0 such that y(al) = y’(al) = y(a2) = y’(a2) = 0 for a S a 1 < a2 S b. Similar results hold for the general2nth-order problem. For convenience, we assume L(x) = (r(t)x(n)(t»(n) + (-1tq(t)x(t). As above, we assume 0 < rl(t)sr(t), ql(t)Sq(t), and (_l)n-l xg(t,x,x’,... ,x(2n-l))sO hold on as t S b. Theorem 15 Let x(t) be a nontrivial solution to L(x) = g(t,x,x’,... , x(2n-l)) such that x(a) = x’(a) = x"(a) = ... = x(n-l) (a) = x(b) = x’(b) = x"(b) = ... = x(n-l)(b) = O. Then there exists a nontrivial solution y(t) to L 1(x) = h(t)x(n,]
4.4 Higher-Order Numerical Problems and Splines The main purpose of this section is to show that the approximation ideas in Section 2.3 for quadratic forms whose associated Euler-Lagrange equa› tions are 2mth-order ordinary differential equations can be applied to nu› merical quadratic form problems. The basic result is that the hypothesis of Section 2.3 is satisfied in this setting and hence the fundamental inequalities
4.4 Higher-Order Numerical Problems and Splines
167
(5) and (6) of Section 2.3 hold. We shall see that our approximating vectors are the appropriate size spline functions and that the indices s(a) and n(a) are the number of negative and zero eigenvalues of an approximating, sym› metric, sparse matrix. This work generalizes the ideas of Section 3.2, where we considered the second-order case. We assume that the reader is familiar with these ideas, which serve as examples for this section. As expected, this generalization provides little theoretical difficulty but great practical difficulty. We shall briefly comment on the construction of the approximating matrix for a fourth-order problem to illustrate these ideas. This construction is more difficult than in the second-order case. In Chapter 5, we shall give the numeri› cal problem for second-order elliptic partial differential equations. The approximating matrix in that case becomes a block tridiagonal matrix. Splines are used in this section in two ways. On the abstract level, their well-known approximation properties simplify the proofs needed to show that the approximating hypothesis of Section 2.3 is satisfied. On the applied level it is shown that splines are the right approximating elements. It is interesting to observe that the "chain is complete" if we view splines as solutions to fixed-endpoint problems in the calculus of variations. It is clear that ideas and results of this section may be applied to a wide variety of additional problems. We begin our formal development by defining the forms J(x; a) and the subspaces d(a) for this section. In this section, d will denote the totality of arcs x in (t, Xl>’ ,xp ) space defined by a set of p real-valued functions x:x,,(t) (0 :::;; t:::;; 1;!Y. = 1, ... ,p) such that xit) is of class Cm - l ; x~m-l)(t) is is absolutely continuous; x~m)(t) is square integrable. In this section « denotes a parameter with values 1, 2, ... ,p, q denotes a parameter with values 0, ... , m - 1, superscripts denote the order of differentiation, and repeated indices are assumed summed. The inner product on d is given by (1) (m not summed), with corresponding norm given by
IIxl12 = (x, x).
Let I: denote the set of real numbers a = lin (n = 1,2,3, ... ) and zero. The metric on I: is the absolute value function. Let d(O) = d. To construct d(a) fora = lin, define the partition n(a) = {klnlk = 0,1, ... , n}. The space d(a) is the space of spline functions with knots at n(a), which will be described in Theorem 1. The space is a p(n + Ij-dirnensional space. For convenience we have adopted notational conventions associated with spline theory. The fundamental (real) bilinear form is given by (2)
J(x, y) = H(x, y) + JOl R~p(t)x~)(t)y~)(t)dt,
168
4 The 2nth-Order Problem
where H(x, y) = A~~X~k)(O)y~)(O)
+
x~k)(l)y~)(O)]
+ B~~[X~k)(O)y~)(l) +
e~~x~k)(l)y~)(l),
e kl Akl ~{J = Alk {J~' ~{J = elk {J~' an d Bkl ~{J are cons t an t rna tri rIces,. Rij ~{J (t) = Rji (J~ (t) are (r lor purposes of simplicity) continuous functions on 0 ::; t ::; 1; and the inequality (3)
holds almost everywhere on 0::; t s; 1, for every fjJ = (fjJl"" ,fjJp) in £P, and some h > O. In the above 0:, f3 = 1, ... ,p; k,l = 0, ... ,m - 1; i,j = O, ... ,m. The quadratic forms are now defined. For a = 0, let d(O) = d and l (4) J(x; 0) = J(x) = H(x, x) + fo Rjp(t)x~i)(t)x~"l(t) dt. For a = lin (n = 1,2,3, ... ), we now define the quadratic form J(x; 0") for x in d(O"). Thus, let Rj{J,,(t) = Rjp(kln) if t E [kin, (k + l)ln) and Rj{J,,(l) = Rjp(n - l)fn) for rx,f3 = 1, ... ,p; i,j = 0, ... ,m. Finally, set l (5) J(x; a) = H(x, x) + fo Rj{Ju(t)x~)(t)x~)(t) dt, where x = (Xl(t), ... ,xit)), x(t) is in d(a). We now show that conditions (1)and (2)of Section 2.3 hold for the spaces d(a) and forms J(x; a). We first state the necessary results that we need from the theory of splines. By a spline function of degree 2m - 1 (or order 2m), having knots at n(l/n), we mean a function S(t) in e Zm-Z( - 00, (0) with the property S(t) is in P Zm- l (a polynomial of degree at most 2m - 1) in each of the intervals (- 00,0), (0, lin), ... , «n - l)ln, 1), (1, (0). Let m s; n + 1, and denote by Lzm(n) those spline functions of degree 2m - 1 that reduce to an element of Pm - 1 in each of the intervals (- 00,0) and (1, (0). The last condition implies S(V)(O) = S(V)(1) = 0 for v = m, ... ,2m - 2. Theorems 1,2, and 3 are given in Schoenberg [48].
Theorem 1 If Yo,... ,Yn are real numbers, there exists a unique S(t) E LZm(n) such that S(kln) = Yk(k = 0, ... ,n). Theorem 2 Let f(t) be in d (with p = 1), and suppose S(t) is the unique element of Lzm(n)such that S(kln) = f(kln) (k = 0, ... ,n). If s(t) is in LZm(n), then (6a)
l l fo [s(m)(t) - pm)(tWdt 2’: fo [s(m)(t) - f(m)(t)]2 dt
with equality
if and only if s(t) -
S(t) is in Pm-I’
4.4 Higher-Order Numerical Problems and Splines
(6b)
169
f: (f(m)(t))Zdt ~ fo (s(m)(t))Z dt, l
with equality if and only if f(t)
= Set) in [0,1].
Theorem 3 Let f(t) and Sn(t) in Lzm(n) satisfy (for each n such that m S n + 1) the hypothesis in Theorem 2. Then (7a)
lim (’I [s~m)(t) n-+
00
Jo
- f(m)(tWdt
=
0;
for each v = 0, 1, ... , m - 1, uniformly on
(7b)
[0,1];
and
(7c) The history of spline theory and the author’s involvement may be of interest to the reader. Schoenberg first introduced the mathematical idea of splines in 1946, although cubic splines had been used by draftsmen to draw smooth curves for many years [1]’ Splines have strange and wonderful properties. For example, higher-order polynomials that are used to inter› polate data are not practical for numerical work because of their large "oscillatory" behavior. However, piecewise-cubic "polynomials" with dis› continuity in the third derivative give good interpolation results with little oscillatory behavior and are in C 2 Mathematically, for example, (6b) shows that the appropriate spline function gives a minimal value of the quadratic forms, while (7a) shows that the spline approximations converge to f(t) in a very strong sense. For m = 1, the reader should recognize this convergence as given in Section 3.2.The author had derived these properties for quadratic forms in 1969 and was introduced to spline functions several years later by seeing the expression
fo (f(m)(t))zdt l
=
minimum
on the blackboard of Frank Richards, a student of Schoenberg. When the author was told that this "was a spline problem," he insisted it "looked like a quadratic form problem." Discovery is wonderful! We might mention that the mathematical interest in splines has been of "exponential growth" since approximately 1963. Finally, we know from personal experience that (higher-dimensional) splines were and are used to design the pleasing shapes of automobile bodies. To continue with our development, we note that weak and strong con› vergence have been characterized in Theorem 1 of Section 4.1.
170
4 The lnth-Order Problem
We now show that conditions (1) and (2) of Section 2.3 hold in light of the theorems on splines. Let Xo in d(O) = d be given. For (J = lin, n = 1,2, 3, ... , let xuit) be the unique element of L2rn(n) such that xuit) = xoP) for t in n«(J) and j = 1, ... .p, as described in Theorem 2. Let xu(t) = (Xul(t), Xu2(t), ... ,xup(t)). Condition (lb) of Section 2.3 now holds from Theorem 1 of Section 4.1. Theorem 4 Assume for each (J = lin (n = 1,2,3, ... ) that Xu is the arc constructed above which agrees with the arc Xo in d(O) = d at the points n«(J). Then Xu => Xo’ Thus, condition (lb) of Section 2.3 holds.
Since
Ilxu
-
xol1 2
= [x~~(O)
- x~M)][x~~(O)
+ SOl [x~': )(t)
- x~M)]
- xbrnj(t)] [x~': )(t)
- Xb:>(t)] dt,
(where IX = 1, ... ,p; q = 0, . . . .m - 1; IX and q summed; m not summed), the result follows from parts (a) and (b) of Theorem 3. Theorem 5
Condition (la) of Section 2.3 holds.
Since d«(J) is a subspace of d = d(O) for each (J from the weak completeness of Hilbert spaces.
= lin, the result follows
Theorem 6 If we define J(x; 0) = J(x), then J(x; (J) defined on d«(J) and given by (5) satisfies condition (2) of Section 2.3.
The proof of Theorem 6 is the "same" as the proof of Theorem 1 of Section 4.2 with minor notational changes. Let J(x) be given by (4). For (J = lin (n = 1,2, ... ), let J(x; (J) be defined on d«(J) and given by (5). Let s«(J) and n«(J) be the index and nullity of J(x; (J) on d«(J) and s(O) and n(O) be the index and nullity of J(x) on d. Then there exists (j > 0 such that whenever I(J! < (j Theorem 7
s(O) ::::;; s«(J) ::::;; s«(J)
(8)
+ n«(J) ::::;; s(O) + n(O).
This result follows by Theorems 4, 5, and 6 of Section 2.3. In many types of problems such as eigenvalue problems, focal-point problems, or normal oscillation problems, the nullity n(O) = 0 except at a finite number of points. In this case we have Corollary 8
n(O) (9)
Assume the hypothesis and notation of Theorem 7 and that (j > 0 such that whenever I(JI < (j, we have
= O. Then there exists a
s«(J) = s(O)
and
n«(J) = O.
We now wish to go in two or three directions at the same time. One direction is to define the finite-dimensional problem and construct the
4.4
Higher-Order Numerical Problems and Splines
171
matrix D(a) given below, where a = lin is the partition size. The second direction is to extend the above theoretical a setting to give a focal theory in a (A, a) setting. Finally, we wish to go in a "diagonal direction" that is to give a numerical-focal-point theory by "combining" these two theories. The (A, a) theory would proceed as in Theorems 3, 5, and 6 of Section 4.2. Since the approximate-focal-point theory has been essentially given in Section 4.2, we shall define the finite-dimensional problem where a = lin on the fixed interval [0, 1] and derive the associated matrix D. The task of a resolvent theory in the (A, a) setting is left to the reader but follows as in Section 3.2. We note that the results are as in Section 3.2. That is, if a = lin, a k = kin, ak+ 1 = (k + 1)ln, and ak < A ;S; ak+ l’ then the quadratic form J(x;A,a) = xTDkx, where x(t) is in d(a) and vanishes for t ~ ak+1, and Dk is the "upper" k x k submatrix of D. Thus, the resolvent problem is really the expected restriction problem. The continuity theorem also holds as in Section 4.2. To begin the finite-dimensional instruction, let IY., fJ = 1, ... ,p; i,j = 0, ... ,m-l; k, 1=0, ,n; and e=(IY.-I)(n+ 1)+(k+ 1), 1J=(fJ-l)(n+ 1)+ (l + 1) for z, IJ = 1, ,p(n + 1). Repeated indices are summed unless otherwise indicated. Let Z = (Zl(t), ... ,zp(t» be fixed in d(a). Assume the con› where struction described in the text above Theorem 4 with xa(t) = ~akYk(t), Yk(t) is a basis element of the spline space L2m(n) described in Theorem 1. We note that x~)(O) = ~akY~)(O) -+ z~)(O) and x~)(I) = ~akY~)(I) -+ z~)(I). From (5) we have J(x; a)
= H(x) +
fo1 R~pq(t)x~)(t)x~)(t)dt
U = AaP~akYk
ro (O)~PIYI
+
w(0) + 2B aiakYk u ro(O)~PIYI
w(1)
1
+ fo R~pq~ak~PIY~)(t)yP)(t)dt
C~P~akY~)(I)~PlyP)(I)
= X:~~ak~p"
where x:~
= A~py~)(O)yli)(O)
+
If we set I’, = ~ak> (to)
C~py~)yP)(I)
I’, = ~Pt>
+
2B~py~)(0)yP)(1)
+
f; R~pqY~)(t)yli)(t)dt.
and de~ = X:~,
we have
J(x; a) = de~(a)rer~.
We note that the matrix (d.~(a» is symmetric. For p = 1 and m = 1 we obtain a tridiagonal matrix for zero boundary data. For p = 1 and m = 2, we have a fourth-order equation where Yk(t) vanishes outside the interval [a k-2,a k+2]. Thus xi,j = 0 if Ii - jl > 3. For the general problem with zero
172
4 The 2nth-Order Problem
boundary data, we note that a different class of interpolating splines have support on at most 2m intervals. Hence our matrix will appear in diagonal form, each diagonal of length at most 4m - 1, and "separated" from the next diagonal by length n. Thus the matrix is sparse (a preponderance of zeros) and existing computer techniques may be used to find the number of negative and zero eigenvalues of this real symmetric matrix. Theorem 9 The indices s(O’) and n(O’) are, respectively, the number of negative and zero eigenvalues of the p(n + 1) x p(n + 1) matrix (de~(O'))' We close the theoretical ideas of this section by including some comments. Let ( be a real parameter and let K(x) be a second quadratic form similar == 0 if i + j = 2m. We have seen in Section 4.1 to J(x) in (4), where R~P(t) that K(x) is a compact quadratic form and hence J(x; () = J(x) - (K(x) leads to an eigenvalue theory. We can then repeat the development of this section to get a numerical eigenvalue theory (where ((,0’) replaces 0’) or a numerical-eigenvalue-focal-point theory (where ((, A, 0’) replaces (A, 0’)). Second, we note that the spline approximation theory can be applied to J(x) associated with an integral-differential equation. However, this leads to a matrix D(a), which is not sparse. Using a Given’s-type method [43J, there exists a sequence of matrices that can reduce D(a) to a matrix D(a) associated with ordinary differential equations. This seems to suggest that integral-differential equations and ordinary differential equations are related by a "change of basis," at least in some strong approximation sense. Finally, we make a few remarks about the construction of D(a) for Example 3 of Section 1.4. The differential equation is L(x) = X(4) - X = 0, and our quadratic form is J(x, y) = g [X"2(t) - x 2(t)] dt. The basis elements are the cubic spline functions, Yk(t) given as follows: if ak-2 ~ t < ak-1 0’3 + 30’2(t - a k- 1) + 3a(t - ak_1)2 - 3(t - ak- d 3 (t - ak_2)3
Yk(t) =
0’3
+
3a 2(ak+
(ak+2 - t)3
1
if ak-1 ~ t < ak - t) + 3a(ak+ 1 - t) - 3(ak+ 1
-
t)
if ak ~ t < ak+ 1 if ak+1 ~ t < ak+2 otherwise.
The trick is that we must fix k or Yk(t) and then compute dk. 1 = J(Yk, Yl) integrated separately over the four intervals [ak- 2, ak- 1J, [ak- 1,QkJ, [ab ak+ 1J and [ak+ 10 ak+ 2J. If Ik - II > 3, the result is zero. Also note that is not continuous. integrating by parts requires great care since y~/(t)
4.4
Higher-Order Numerical Problems and Splines
173
For illustration purposes, we evaluate y~(t)Y~-l(t) over the interval [ak-Z,a k- 1 ]. The reader may then form the approximate integral of r(t)y~(t)Y~_l(t), for example, for a general problem, by multiplying our value by r(t*) where t* = ak-l + a12. Thus
J{a a
k - 1 k - 2
"" dt = J{akk - - 21 [ ( t - ak-Z )3J"[(J 3 + 3(J Z(t - ak-Z ) YkYk-l a
+ 3a(t = fa:k~'
- ak_Z)Z - 3(t - a k_z)3]" dt
[6(t - ak-z)][3a - 18(t - ak-Z)] dt
= [9a(t - ak_Z)Z - 36(t - ak-Zn I: ~~ = 9(J3 - 36(J3
= -
27a 3.
Elliptic Partial Differential Equations
Chapter 5
5.0 Introduction
The main purpose of this chapter is to present an approximation theory of quadratic forms that is applicable to linear elliptic multiple integral problems, that is, to quadratic forms given in (1) whose Euler-Lagrange equation is given in (2), where (1)
J(x) =
IT {P(t)x
2(t)
+ [2Qi(t)X;(t)]x(t) + Rijx;(t)Xj(t)}dt
and (2)
L(x) = ~o ( Riit)xit)) - x(t) ( P(t) u~
«.,
.L ~oQ.) = O. u~ m
.=1
In the above, t = t 2 , , t m) is in ~m; i.] = 1, ... , m; x(t) is a real-valued function; ox/otj is written as Xj; P(t), Q;(t), and Rij(t) satisfy smoothness and symmetric properties described in Section 5.1; T is a subset of ~m described in Section 5.1, and repeated indices are summed. We note that much of the material for elliptic differential equations or multiple integral quadratic forms has been given in earlier chapters. For example, in Example 5 of Section 1.4we have given the relationships between the quadratic form (1) and the differential equation (2). In addition, many of our theoretical ideas and results have been given in Chapters 3 and 4. For example, our basic focal-point or oscillation-point ideas are as in Sections 3.1 and 3.3. The new material is that our applications are changed from ordinary differential equations to partial differential equations. We now have solutions to (2) vanishing on conjugate surfaces of T, whereas in 174
5.1 Summary
175
Chapter 3 we had solutions to the differential equation L 1 (x ) = (p(t)x’(t»’ + q(t)x(t) = vanishing at conjugate points. Similarly, the reader might antici› pate the expected ideas for eigenvalue problems and comparison theorems. For example, ifin (1) and (2) Qi(t) == Qt(t) == 0, P*(t) ~ P(t), and R*(t) ~ R(t) define two quadratic forms J*(x) and J(x) in (1), then J*(x) ~ J(x) and the conjugate surfaces of J*(x) occur after those of J(x). For these reasons, Section 5.1 will be a collection of previous ideas applied to problems defined by (1) and (2). Proofs and justifications that follow as those in the earlier chapters will often be omitted. While our numerical theory is similar to that in Section 3.2, there are many interesting problems that occur for problems defined by (1) and (2). The (hat) basis functions {Zk(t)} of Section 2.3 are replaced (for m = 2) by the products {zi(s)zit)},which are pyramid functions. Similarly, tridiagonal matrices are replaced by block tridiagonal matrices, which are tridiagonal blocks of tridiagonal matrices. In Section 5.2 we examine in detail this interesting numerical problem. Finally, in Section 5.3, we shall consider a new method of separation of variables by quadratic form methods. The fact that our numerical eigenvalue theory of Section 3.4 allows us to solve more general types of problems than usually solved by classical methods is of special interest. We shall also consider an undeveloped idea of "factoring" block tridiagonal matrices into "products" of diagonal matrices.
5.1 Summary
The purpose of this section is to summarize theories and ideas of earlier chapters in the setting of elliptic partial differential equations and multiple integral quadratic forms. This summary is in keeping with the author’s philosophy that our approximation theory can be applied to many problem areas and that more difficult problems can be more easily understood by understanding and solving easier problems. When we refer to earlier sections of this text (or do not prove theorems), it is because the ideas in this partial differential equations setting follow (immediately) as they did in earlier sections. We assume that the reader is acquainted with these earlier ideas. Our emphasis is on new results that follow in a similar manner to the results for problems of ordinary differential equations given in earlier chapters. Since there will be several topics covered in this section, we shall briefly outline these topics. We begin with the theory of quadratic forms by Dennemeyer. Our exposition is intended to parallel the earlier development and in particular the second-order problems of Chapter 3. Dennemeyer’s ideas are contained in his dissertation [7] and research article [8]. They
176
5
Elliptic Partial Differential Equations
follow from ideas in Hestenes [27J and [28]. Reference [8J contains many of the technical details for quadratic forms that we shall (mercifully) omit by assuming smooth problems and solutions. These details include ellipticity, Gaarding’s inequality, and coerciveness. In addition, they include much of the work of the founders in this area. The interested reader may wish to read this informative work in elliptic-type partial differential equations. For our purposes, Theorem 1, which gives the connection between conjugate surfaces, the quadratic form theory, and the Euler-Lagrange equations, is a major result. The second topic is the approximation theory of quadratic forms by the author, which is sufficiently general to handle the multiple integral quadratic forms. As in Chapter 3, the main results are given in terms of inequalities involving nonnegative indices. In particular, we show that the hypothesis for these inequalities are sufficiently general to include the resolution space or ), theory of focal point as well as continuous perturbations of coefficients of quadratic forms and partial differential equations. We then extend the approximation setting to obtain an approximate theory of conjugate surfaces. These results are then interpreted to obtain existence theorems and other properties for the multiple integral problem. The final topic of this section is comparison theorems for quadratic forms and partial differential equations. We begin the formal development of this section by giving the quadratic form theory leading to the partial differential equation described in Section 5.0. We shall define our fundamental Hilbert space (or Sobolev space) d, the quadratic form J(x) to be considered, and then state a main theorem relating quadratic forms to partial differential equations. The notation and ideas are found in Dennemeyer [8]. For ease of presentation, we refer the reader to this reference or to Hestenes [28J for technical details such as smoothness conditions on the coefficientfunctions R ij , Qi’ and P, on vectors x(t), and on B 1 types of regions as found in the works of Calkins and Morrey. These technical details are very important (and difficult) in the theory of partial differential equations, but the details contribute little to our understanding. Following Dennemeyer, we let m ~ 2 be a fixed positive integer, T c [Rm be a fixed region of class s’, t = (t 1, t 2, . . . , t m ) be a point in T, and x(t) be a real-valued function defined on T. A region is a bounded, open, connected subset of [Rm. We shall not define fixed region of class B 1 However, some examples (given by Dennemeyer) include (i) the interior of a sphere or interval in [Rm, (ii) the interior of the union of a finite number of closed contiguous nonoverlapping intervals, and (iii)the image of one ofthe regions in (i)or (ii)under a continuous, one-to-one mapping ¢(t) where ¢(t) and ¢ -l(t) satisfy a uniform Lipschitz condition on every compact subset of their
5.1
Summary
177
respective domains. If T 1 c T, let 1\ denote the closure of T 1 and Ti denote the boundary of T l ’ Let .Yt’ be the Hilbert space of vectors x(t) with inner product (1)
(x,y) = fT.xit)Yit)dt+ fTX(t)y(t)dt
with norm Ilxll = (x, X)1/2, where xit) = ax(t)/at j, repeated indices are summed, and i,j = 1,2, ... , m. Our fundamental quadratic form on .Yt’ is (2a)
J(x) = IT {P(t)x 2(t) + [2Qi(t)Xi(t)]x(t)
+ RJt)Xi(t)Xj(t)}dt
with associated bilinear form (2b)
J(x, y) = IT {Pxy + Qi(XYi + XiY) + RijXiYj}dt
where Rij(t) = Rji(t) and the ellipticity condition Rij(t)~i~j > holds for all t in the closure of Tand ~ = (~b ~2" .. , ~m)in IRm with ~ "" 0. The ellipticity condition means that J(x) is an elliptic quadratic form relative to (1). If K(x) is a quadratic form as in (2)with Rij 0, then K(x) is compact (seeSection 2.1). The associated Euler-Lagrange equation or extremal solution for J(x) is
=
(3)
E(x) =
!(Rij ax) ati at j
x(p -
f: aatiQi)
=
O.
i=1
This result is derived as Example 5 of Section 1.4.For convenience, we assume additional conditions upon R ij, P, Qi so that solutions of (3) are in .Yt’ n C 2 (T) . In the remainder of this chapter, we have assumed that all function spaces are subspaces of the Hilbert space d = Co(T) described in Dennemeyer [8, p. 623]. That is, the vectors x(t) are functions that "vanish" on the boundary aT = T* of T and are "smooth" on T. A conjugate surface T! of (3)is the boundary of a region T 1 C T of class B1 on which a nontrivial solution of (3) vanishes. Once again we remark that it would create great problems of exposition and understanding to consider "generalized" solutions of (3) or consider more general Hilbert spaces than defined here. The interested reader will find a good introduction to these topics and the necessary references in Hestenes [28]. Theorem 1 is a major result for us in that it allows us to relate the solutions of(3), which vanish on boundary surfaces, to the signature s and nullity n of the quadratic form (2). Thus conjugate points or oscillation points in Chapter 3 become conjugate surfaces. In this theorem, T 1 c T and T - T 1 is the set {t in IRm It in T, t not in T 1}, S denotes the closure of the set S in IR m, and T* denotes the boundary of T.
178
5 Elliptic Partial Differential Equations
Theorem 1 Let J(x) be the quadratic form given by (2). There exists a conjugate surface TT with corresponding extremal solution x(t) if and only if J(x, y) = 0 for all y in .Yt which vanish in T - T 1 This result follows by integration by parts and Dennemeyer’s discussion [8, p. 631] or by Example 5 of Section 1.4. We shall detour from our development in our next four paragraphs. In the first paragraph we consider an example problem. In the second paragraph we give some focal surface results for this example. Finally, we give some brief comments about eigenvalue problems. As an example problem, which we shall call Example 1, let b > 0 and T = {( s, t) in ~ 0 s S, t s b}. Let
zi
(4a)
J(x)
(4b)
J(x, y) =
=
r -
f:f:[G;
+
(~~r
2x
Z(s,
t)] ds dt,
- - 2x(s, t)y(s,t)] ds dt, ~ ob ~ b [ -OosX -OosY + -oxoy ot in 0
and
(5) In the above, Qi == 0, P(s, t) = -2, R l l = 1, R zz = 1, R 12 = R Z1 = O. Note that the matrix R = (Rij) is positive. Ifb = n, then x(s, t) = sin s sin t vanishes on T*, the boundary ofT, and E(x) = - sins sint - sins sint + 2sins sint = O. To derive these results in a more coherent manner, we note that Eq. (5) is a special case ofEq. (35) of Section 1.4. It is solved by separation ofvariables. Letting X(s, t) = S(s)T(t), we obtain
S"(s)T(t) + S(s)T"(t)
+ 2S(s)T(t) = 0
or
S"(s)/S(s) + 2 =
-
T"(t)fT(t) = J1..
The constant term J1. is obtained since the left-hand side is independent of t and the right-hand side is independent ofs. Furthermore, J1. > 0 and 2 - J1. > 0 since we desire solutions that vanish when t = 0 and t > 0 and when s = 0 and S > O. Setting J1. = C Z and 2 - J1. = d Z, we have
T"(t)
+ cZT(t) = 0,
T(O) = 0
and S(O) = 0,
5.1 Summary
179
which leads to T(t) = sin ct and S(s) = sin ds. Since c and d are natural numbers with CZ + d Z = 2, we have c = d = 1 or X(s, t) = sin s sin t as the (only) solution. As in Chapter 3, any multiple of X(s, t) is also a solution. We note that if P(s, t) = -1, there are no solutions, while if P(s, t) = - 50, there are many possible solutions (c,d), such as (7,1), (1,7), and (5,5). To anticipate future ideas of signature for Example 1 given by (4)and (5): For < A < b, let J"t’(A) be the space of functions x(s, t) defined on T(A) = [0, A]Z such that x(s, t) == on T - T(A). In this notation [0, A]z denotes the square {(s, t)IO :::;; s, t :::;; A}.Now n(A) is equal to or 1 since J"t’o(A) requires the solution x(s, t) = sin ssin t vanishing on T(A)*. It is equal to 1 if and only if A = kn :::;; b for k = 1, 2, 3, .... As can be anticipated from Section 3.1, s(},) = k if kn < A:::;; (k + l)n :::;; band S(A + 0) - S(A - 0) = n(A). We shall not consider eigenvalue problems in detail in this section, but the reader may see that Example 1 also illustrates an eigenvalue problem J(x;~) = J(x) - ~K(x), where K(x)
=
S: S:
XZ(s, t) ds dt,
with function spaces defined in the previous paragraph. If A = kn = b for k = 1,2,3, ... , then ~ = 2 is an eigenvalue of this problem with eigenvector x(t) = sin s sin t, which vanishes on k conjugate surfaces. In this case, the conjugate surfaces are Ti, where T, = [0, nl]z for 1 = 1,2,3, ... , k - 1. Thus, the reader may develop for himself the duality between ~ and Aas in Fig. 1 of Section 3.3. To complete our first topic of the relationship between the quadratic form (2)and the associated partial differential equation (3), we shall combine the ideas and results of Section 3.1 with those of Dennemeyer [8]. Dennemeyer’s theoretical results are virtually identical to Theorems 1 and 2 of Section 3.1. The application of these theoretical results to conjugate surfaces of partial differential equations is of great interest. We now give some of Dennemeyer’s work. In Dennemeyer [8, p. 627], a resolution is given identical to the {J"t’(A)} resolution in Section 2.3. Dennemeyer then quotes the signature results in Hestenes [27], which are similar to Theorem 1 and 2 of Section 3.1. Specific applications (then) given to our particular problem, namely to a collection {T(A)IA.’ s A s A"} of subsets of IRm and to a corresponding space offunctions denoted by {d(A) IA’ s A s A"} are of interest. The one-parameter family {T(A)IA.’ :::;; A:::;; A"} has the following properties: (a) T(A’) consists of a point of IR m , or else has (m - l)-dimensional measure zero, while T(A") = T; (b) T(A) is a region of class B1 , A’< A:::;; A";
180
5 Elliptic Partial Differential Equations
(c) if A1> A2 are such that X s Ai < T*(Al) n T(A2) is not empty; (d) if Ao satisfies A’ :::;; Ao < A",then
)’2
< X’, then T(Ad c T(A2)’ and
T(),o) = n T(A),
(e) if Ao satisfies A’< Ao :::;; A", then T(Ao) = u T(A),
The following theorem is then proven. Theorem 2 Let {T(A)} be a family of subsets of T having properties (a)-(e). Define the family {d(A)} of subsets of d as follows: (i) zero on (ii) support
d(A’) is the set whose sole member is the function which is identically T, and d(A") = d; If A is such that A’< A < A", d(A) is the set of all x in d having set contained in T().). That is, x == 0 in T - T(A).
Then the family {d(A)} is a family of subspaces of d for which the resolu› tion properties of Section 3.1 hold.
Dennemeyer gives the following examples of sets T{A} with properties (a)-(e) above. Example 1 Let to in /Rm be fixed, and let T(),) denote the interior of the sphere It - tol = A, for 0 < A:::;; r, r a fixed positive number. Let T(O) = {to}, T = {t:lt - tol < r}. Example 2 Let T be a given interval (a, b) having positive measure, and let A denote length measured along the diagonal joining the points a and b, where A" = Ib - al. Let Ck denote the kth direction cosine of the line joining a to b. Let T(O) = {a} and T(),) = (a, a
+ AC),
0 < ), :::;; X’.
Example 3 Let T = (a, b), and let to denote the center of T. Define the family {T(A)}for 0 < A:::;; 1 by T(A) = (to - t),(b - a), to + tA(b - a))
and let T(O) = {to}. Example 4 Let S denote an interval (a, b) of positive measure, and let to be a point on the boundary S*. Let V denote a hypercube (to - th, to + th), where h > 0 is fixed. Let T be the union of S with V. Let {S(),)} be the family of expanding subsets constructed for the interval S in the same manner as for the interval in Example 2, for 0:::;;), :::;; X’, where X’ = Ib - al. Let {V(A)}
5.1
Summary
181
be the family of cubes V(A) = (to - !Ah, to + !Ah),
0< A::; 1,
centered about the point to, and let V(O) = {to}. Define the family {T(A)} of subsets of T by T().) = S(A) if 0 ::; A < A" and T(A) = S u V(A - A") if A" ::; A::; A" + 1. Instead of expanding to fill S and then T, one can have the family of subsets {T().)} expand to fill V first, then T. Alternatively, one can have the family {T(A)} expand into both sets simultaneously if T(O) =
{to},
0< A::; A".
T(A) = S(A) u V(AjA"),
In Theorem 7.3 of [8], Dennemeyer obtains results virtually identical to Theorem 2 of Section 3.1. The space of functions we denoted by .1f(A) in Section 3.1 is the space offunctions Dennemeyer denotes by d(A) associated with the subsets T(A) as noted in Theorem 2 above. Assume s(A) is the signature of the quadratic form in (4) on .1f(A) or d(A) with the other indices similarly defined. Then the number f(A) = n(A) of Theorem 2 of Section 3.1 is the number of linearly independent solutions of (3) that vanish on T*(A) in the maximal set of such nontrivial solutions. Thus, s(A o) counts the number of conjugate surfaces T*(A) in the interior of T(Ao) where a conjugate surface T*(A) is counted according to the number of linearly independent solutions of (3) vanishing on T*(A). Note that in our example problem the boundary of T = [0, n]2 is a conjugate surface. If in this example we change the coefficient 2 to 50 in (4) and (5), then the functions Xl(S, t) = sin s sin 7t, xis, t) = sin 7s sin t, 2x;as 2 X3(S, t) = sin 5s sin 5t satisfy a + a2x;at 2 + 50x = 0 and vanish on the boundary of T = [0, n]2’ Thus T* is a conjugate surface and counted three times. In this case, if s(A) is the signature of J(x) on .1f(A) associated T(A) = [0,,1]2’ thenf(n) = s(n + 0) - s(n - 0) = 3. For m ~ 2, we consider the problem given by Dennemeyer and defined by (6)
(7)
~(ax) at i ati
+
JlX = 0
x(t) =
0
in
T
for
t in
=
(0, b)m, T*,
where f1 > 0 and repeated indices are summed. Dennemeyer’s presentation is so well done that we make only editorial changes on the next several paragraphs. The terms not defined previously in this section have similar meaning to the same terms in other sections of this text. We shall call this Example 2. Once again, we remark that there are many implications for eigenvalue problems that the reader may wish to explore.
)82
5
Elliptic Partial Differential Equations
The quadratic form of interest is
(8)
IT (XiXi -
J(x) =
2
/lX ) dt.
The associated Euler Equation is given by (6). The class of extremals is the class of solutions of Eq. (6). If x is an extremal, then the J-orthogonality condition
I: (XiYi -
J(x, Y) =
/lXY)dt = 0
holds for every Y in .91 and x(t) is analytic on (0, h). The class do of J-null vectors of .91 consists of all solutions of (6) with boundary conditions (7). There are at most a finite number of linearly independent solutions of this problem, Separable solutions of (6) and (7) are of the form (9)
=
X
nk1l:t
m
sm-- k , O hk k= 1
where the set (nb ’ .. , n m) of positive integers satisfy the equation
I
m
(10)
(n.)2 u ---l.
j=1
hj
=2’ 11:
The set of functions of the form (9) spans the class do of J-null vectors of d. There are a finite number v of linearly independent functions of this type, and the number v is the nullity of J on d. If x is a function of the form (9) with positive integers satisfying (10), then x is in do. Since the nullity of J on d must be finite, there are at most a finite number oflinearly independent functions of this type. Suppose now that x(t) is in do and let the Fourier series for x(t) in T be m
00
I _a
x(t) =
p l ’ " Pm
PI, ’ .. , Pm -1
J]
Pk1l:t k
sm -h-’
k-1
k
Since x must satisfy Eq. (6),
I _a PI, ... ,Pm-1 00
m
[
p l ’ " Pm
-
.~11:
2 Pj
~
+ /l
)-1)
]
J] m
k-1
holds on every closed set in T. Hence, whenever {Pj} of positive integers must satisfy (11)
Pk1l:t k
sm -h-
=
0
k
apI’ , ’Pm
# 0, then the set
Im (pj)2 -h j -_211:/l’
j=1
There are at most a finite number of distinct sets { Pj} of positive integers that satisfy this last relation. Thus x(t) must be a finite linear combination of
5.1
Summary
183
functions of the type (9). The number v of linearly independent functions of this type in a maximal set is the nullity of Jon d. In fact the nullity of Jon d is given by M, where M denotes the sum of the counts of all distinct sets (Pb’ .. ,Pm) of positive integers which satisfy (10). A set (PI" .. ,Pm) is counted m!/r! times whenever it has r of its elements alike. We conclude this example by counting focal points. That is, we illustrate Theorem 2 of Section 3.1in this setting. Let 1denote the length of the diagonal from 0 to the point b. Define the family {T(A)} of subintervals by T(A) = (0, Ab/l), 0 < A < 1. Let {d(A)} be the corresponding family of subspaces of d. A function x is a J-null vector of d(A) if and only if x is a linear com› bination of functions of the form
k = 1, ...
,m,
with (Pb ... , Pm) a set of positive integers satisfying (12) There is a set Ab . . . ,AN of values A in the interval 0 < A < 1, such that for each Aj there exists at least one set (p I, . . . , Pm) of positive integers satisfying (12). These values Ajoflength along the diagonal are the distinct focal points of J relative to the family {d(A)}. The corresponding intervals (0, Ajb/l) have boundaries that are the distinct conjugate surfaces T*(Aj) of J in T. Let M(Aj) denote the sum of the counts of sets (PI, ... ,Pm) of positive integers satisfying (12) with A replaced by Aj’ the count being made as indicated previously for j = 1,... , N. Then the signature of J on d is the sum N
s(b) =
L M(Aj)’ j= I
Dennemeyer [8] gives an interesting example in polar coordinates, which we now give and shall call Example 3. Let Tin [R;z be the interior of the circle of radius R about the origin. Separable solutions of Eq. (6) with m = 2 in polar coordinates that are single valued in T are of the form x = J p(W)[Cl cos pB + Cz sinpB], where Cl’ Cz are constants, P = 0, 1,2, ... , and J p is the Bessel function of the first kind of order p. The class of J -null vectors contains no nontrivial functions unless J1.R > t Ol ’ where t OI is the first zero of J o(t), and in any case the nullity will be either zero or one. Let T(},) be the interior of the circle of radius Aabout the origin, for 0 ~ }, ~ R. Then T*(A)is a conjugate surface if and only if J p(J1.A) = 0 for some P = 0, 1, 2, .... Let J 0’ J l , ’ . . ,J p be the Bessel functions of integral order that have at least one zero in the interval 0 < A < J1.R, and let vq be the number of zeros
184
5 Elliptic Partial Differential Equations
of Jit) in this interval, for q = 0,1, ... .p. Then the signature of J(x) on T is = I~=o vq . This value is the same for any expansion in sets {T(A)}having the properties (a)-(e) listed before Theorem 2. The second topic of this section is to consider an approximation theory applicable to the problems defined by (2) and (3). We note that our main interest is in the application of the theoretical results in Sections 2.3 and 3.5 to partial differential equations and in particular to ideas and examples given earlier in this section. We begin with a briefsummary ofthese theoretical results. In Theorem 6 of Section 2.3 we showed that the indices s(o-) and n(a), which are respectively the signature and nullity of J(x; a) on d(a), satisfied the fundamental inequality s(ao)::::;; s(a)::::;; s(a) + n(a)::::;; s(ao) + n(ao). This in› equality was extended in Section 3.5 to include an approximation-focal› point theory. We shall briefly restate this extended result in Theorem 3 to fix ideas and notation. Let M = A x L be the metric space with metric d defined by d(f.1.1,fl2) = IA 2 - All + p(a2,ad, where fll = (Al,al)’ fl2 = (A2,a2);(L,p) is a metric space; and A = [a, b] with the usual absolute valued metric. For each fl = (A, a) in M and J(x; a), define J(x; fl) = J(x; a) on the space P-8(fl) = d(a) n JIC’(A). The set {JIC’(I.)IA in A} is the resolution space defined just above Lemma 9 of Section 2.3. Let S(fl) = S(A, a) and n(fl) = n(A, a) denote the index (signature) and nullity of J(x; fl) on iJ(fl). In many senses Theorem 3 is the main result for applications to approxi› mation problems of this section. It allows us to obtain conditions (13) and (14) in general problems of partial differential equations. We note that the reader may now redo much of Chapter 3 only with applications to partial differential equations.
s
Theorem 3 Assume that the quadratic forms Jix: a) and the spaces d(a) satisfy (1) and (2) of Section 2.3. For any flo = (Ao, ao) in M, there exists J > 0 such that iffl = ()., a), d(flo, fl) < 15, then
(13)
s(Ao,ao)::::;; s(A,a)::::;; s(A,a)
+ n(A,a)::::;;
s(Ao,ao)
+ n(Ao,ao).
Furthermore,
(14)
implies
S(A, a) = S(A o, a 0) and
n()., a)
= O.
We now interpret Theorem 3 for the setting of this section. As examples, the reader may regard J(x; a) as perturbations of J(x) in (2) that may include an eigenvalue parameter ~. For our numerical work in Section 5.2, d(a) will include the doubly linear first-order spline functions described there. Resolution space examples are given earlier in this section.
5.1
For each a in
(15)
J(x; a)
=
~,
Summary
185
let
IT {P.,.(t)x
2(t)
+ 2[Q.,.i(t)xlt)]x(t) + R.,.ij(t)xi(t)xj(t)}dt
be defined on a subspace d(a) of d, and let (16)
E (x; a ) = - a ( R.,.ij(t) -ax) - x ( P.,.(t) at i at j
~
aQ.,.i) = at i
L, - i=l
be the associated Euler-Lagrange equation. For each A in A = [a, b], let {Yf’(A)I}, E A} be a resolution of d. As above, Yf’(},)is now assumed to be the set of functions x(t) in d with support in T(A). By Theorem 1 we have Theorem 4 The nullity n(Ji) = n(A, a) is the number of distinct nonzero solutions to (16) vanishing on PJ(Ji). We note that for a 0 fixed S(A,a 0) and m(A,a 0) = S(A, 0" 0) + n(A, 0" 0) are nondecreasing, nonnegative integer-valued functions of A. We have shown above that S(A - 0, 0") = S(A, 0") and that S(A + 0, a) = S(A, 0") + n(A0"). Thus s(}, + 0,0"0) - S(A - O,ao) = n(A,O"o). These results follow from (13). This disjoint hypothesis is usually called normality in problems of differential equations, calculus of variations and control theory. Chapter 6 contains a thorough discussion of these topics. A point A at which s(}" 0"0) is discontinuous will be called a focal point of J(x; 0"0) relative to Yf’(A)(A in A). The difference f(A, 0"0) = S(A + 0,0"0) › S(A - 0, ao) will be called the order of the focal point. A focal point will be counted the number of times equal to its order. Theorem 5 records many of the results for this problem. Theorem 5 Assume for 0"0 in ~ that PJO(AbO"O) n PJo(}’z,O"o) = when }’z, then f(a, 0" 0) = 0, fV., 0"0) = n(A,O"o) on a :::;; A s b. Then if Ao in A, the following quantities are equal:
}’1 =I-
(i) the sum La";A
For the approximation setting we can say much more. In the next two results we assume that 0"0 in L satisfies PJO(At> 0"0) n PJO(AZ, 0"0) = when
186
5 Elliptic Partial Differential Equations
At # Az• Since this implies that n(A,(Jo) = 0 except for a finite number of points Ain A, we have Theorem 6 Assume A’and A"are not focal points of (Jo (a :s; A’ < A" < b) and Ai(J0) :s; Aq+ t ((Jo) :s; ... :s; Aq+k-t((Jo) are the k focal points of (Jo on (A’, A"). Then there exists an e > 0 such that p((J, (Jo) < e impliesAq«(J) :S;Aq+ t«(J) :s; ... :s; Aq+k- d(J) are the k focal points of (J on (A’,A"). Corollary 7 The kthfocal point (Jk(A) is a continuous functiontk = 1,2, ... ), as is the kth conjugate surface. As an example of our methods, we use Theorem 5 to generalize Corollary 8.3 of Dennemeyer [8]. We assume that Rij(t) = Raoi/t) and P(t) = Pao(t) are defined on T and P(t) > 0 on a fixed subspace T(Ao) c T, where a < Ao < b. Then Theorem 8 There exists a b > 0 such that if f.lo = (Ao,(Jo), f.l = (A,(J), and lAo - AI + p((Jo,(J) < (), then no solution on T(,1) of the differential equation
a~i ( Rai/t) ;~)
- Pa(t)x = 0
oscillates in T(,1) in the sense that no conjugate surface is properly contained in T(A). The hypothesis implies that
f
T(J.)
[Rij,(t)Xi(t)Xj,(t) + P(t)XZ(t)] dt > 0
for x(t) in £(,10) and hence that S(Ao,(Jo) = 0 and n(,1o,(Jo) = O. Thus, by the above, there exists b > 0 such that s(,1, (J) = 0 and n(,1, (J) = 0 whenever lAo - AI + p((Jo, (J) < b. This completes the proof. We remark that, similar to Chapter 3, the parameter (J above can include the eigenvalue parameter ~. For example, let K(x; (J) = Sr QAt)xZ(t)dt for (J in L. Define H(x; (J,~, A)= J(x; (J) - ~K(x; (J), where ~ is a real parameter. Theorems 5, 6, and 7 generalize to the corresponding eigenvalue results for elliptic-type partial differential equations. Our third topic in this section is comparison ideas for conjugate surfaces of elliptic partial differential equations or equivalently, the related signature theory of quadratic forms. Thus, our results are a different application of the same or similar theory of Section 4.3 and earlier comparison results of this text. We assume the reader is familiar with the material in Section 4.3. Hence we give few new results, but leave these results as an exercise to the reader. Theorem 9 and the resulting comments are the expected results given in Dennemeyer. Historically, they follow from the signature theory originally
5.1 Summary
187
given by Hestenes, such as in the latter part of Section 2.2. The actual me› chanics of the proofs of these results follow immediately by the reasoning involved in the proofs of Theorems 1 and 2 of Section 4.3 and are left to the reader. We note that these results are also obtained in a more classical way and that there are Picone-type identities for these problems as in the third paragraph of Section 4.3. The interested reader should refer to Chapter 5 of Swanson [51].
Theorem 9 Let J*(x) =
IT [P*(t)x
2
+ 2Q{(t)XXi + Rt(t)XiXj]dt
(i, j = 1,2, ... ,m) be a second quadratic form on d having suitable coefficients P*(t), Q{(t), Rt(t) such that the properties of J in (2a) hold for J*. Suppose that J*(x) ~ J(x) holds for all vectors in d. Let (17)
E*(x) =
~
at i
(Rt
ax) - x(p* at j
f OQf) 0 ot j =
i=l
be the Euler equation corresponding to J*. Let {T(A)} be a family of subsets of T having the expected properties. Then the theorems on focal points and conjugate surfaces hold for Eq. (17). Let Tf, T!, ... , T~ be the distinct con› jugate surfaces of Eq. (3) ordered according to the increasing and distinct focal points of J in the interval, and let TT’, T!" ... , T~~ be the distinct conjugate surfaces of Eq. (17) ordered according to the increasing and distinct focal points of J* in the same interval. Let T" r = 1,2, ... ,N, be the member of the family {T(A)} having as its boundary T~ and let T~, r = 1,2, ... ,N*, be the member ofthe family {T(A)}having as its boundary T~'. Then T r C T~, r = 1,2, ... ,N*. If J*(x) > J(x) holds for all nontrivial functions x in d, then T’r C T~, r = 1,2, ... , N*.
Less generally, the relations between the conjugate surfaces stated in the conclusion of Theorem 9 hold for the conjugate surfaces of the differential equations (18)
(19)
E(x)
E*(x)
= -a
at.1
( R i • -ax) lat•)
-
P(t)x
=0
’
a ( R~· -ax) - P*(t)x = 0 = -at. at. I
I)
)
provided Rt/t)~i~j ~ Rij(t)~i~j and P*(t) ~ P(t) holds for t in T and ~ in [Rm. Ifstrict inequality holds for some t in Tin at least one of these inequalities, then the proper inclusion of the conjugate surfaces T~ in T~ hold for r = 1, ... , N*. Thus, for example, if P(t,Jl) is strictly increasing in Jl for each t in
188
5 Elliptic Partial Differential Equations
T, then if equations
j1*
> u, the proper inclusion of conjugate surfaces holds for the a ( ax) ati Rij at j
-
Pit: Il)x =
and a ( Rij at ax) ati j
-
P (r: t, 11 *)x -_
.
Finally we remark that Theorem 7 and subsequent theorems of Section 4.3 contain the ideas to extend these ideas of conjugate surfaces for nonlinear partial differential equations. Once again we leave these ideas to the reader.
5.2
The Numerical Problem
In this section, we give a new theory, procedures, and results for the numerical computation of conjugate surfaces of the quadratic form (2) and Eq. (3) of Section 5.1. The technical results are similar to those given in Chapter 3 for the second-order differential equation (r(t)x’(t»)’ + p(t)x(t) = and are often left as an exercise for the reader. As we expect, these numerical results are not as good as those in Chapter 3. To fix ideas and to make the calculations easier, we often consider in this exposition an elementary example of (2) and (3) in Section 5.1. We let m = 2, R ll (t) = Rdt) = 1, Rdt) = R 2 1(t) = 0, P(t) = 2 and obtain (1), (2), and (3) below. It is immediate that any multiple of X(tr.t2) = sin r, sin z, satisfies the differential equation (1) and the boundary conditions (2), and is an extremal solution of (3) on the square interval T = [0, bJ2 C 1R 2 , where b is a large fixed positive number. In our development, we shall be explicit enough to allow the reader to implement our ideas for more general coeffi› cient functions than the constant functions for P(t) and Rij(t) given above. We have considered other cases that yield similar numerical results, but they will not be given here. A summary of our ideas is as follows: (i) The partial differential equation and initial conditions (1)
and (2)
x(O, t 2 )
=
(0 .:::;; t 1
.:::;;
b,
.: ;
t2
.:::;;
b)
5.2 The Numerical Problem
189
are (ii) replaced by the quadratic form (3)
J(x) =
IT [xI(t) + x~(t)
- 2x Z(t)] dt , dt z •
(iii) A finite-dimensional quadratic form with matrix D(u), which is real, symmetric, and block tridiagonal, is shown to be a numerical approximation of (3). (iv) We then compute x".{t), the Euler-Lagrange equation of D(u), and show that, if properly normalized, x".{t) converges to the solution xo(t) of (3) as a -> 0. In our example problem, the numerical solution xa(t) is the discrete bilinear approximation of xo(t) = sin t 1 sin t z corresponding to a mesh size of a, Unfortunately, we cannot directly compute a solution using D(u) as we can in the case of second-order differential equations of Section 3.2. In that situation, where D(u) was a tridiagonal matrix, we can directly compute the numerical approximation xo(t) (see (8) of Section 3.2). This problem of direct computation is to be expected from the theory of elliptic partial differential equations, which requires boundary data on all of T*, the numerical theory such as in Forsythe [9], or the heuristic feeling of roundoff error and insta› bility no matter the accuracy of the computer. We shall verify that D(u) is correct by checking the known discrete solution and by relaxation methods that are discussed below. We remark that this problem is insidious. Our algorithm leads us to the belief that there should be no problem, in that numerical solutions should be computed in a step-by-step manner on ever› expanding regions. Unfortunately, numerical problems always all too soon rear their ugly heads. Test runs with double-precision IBM FORTRAN (involving about 16 significant figures of accuracy) and CDC FORTRAN with approximately double the number of significant figures yield unstable solutions. The CDC results took longer to become unstable, but they do become unstable. We begin our numerical procedure by choosing L to denote the set of real numbers ofthe form o = lin (n = 1,2,3, ... ) and O. For a = lin define the two-dimensional partition nz(a) = n(u) x n(a) of the square [O,b]z, where ak = kbln (k = 0,1,2, ... ,N,,) and n(u)
= (ao = <
a1
<
az
< ... <
aN"
= b).
We assume, for convenience and without loss of generality, that aN" = b. The space d(u) is the set of continuous bilinear functions with vertices at nz(u). Thus d(u) is the vector space of bivariate splines with basis Zij(tb t z) = Yi(tdyitz), where Yk(S) (k = 1, ... , N; - 1) is the one-dimensional spline hat function given in (5) of Section 3.2,
()_{I- Is - akl/a
Yk S
-
if ak-1::;; S::;; otherwise
ak+1
190
5 Elliptic Partial Differential Equations
The basis elements zij(t 1 , t z) are pyramids with apex or vertex at the point (a;, aj, 1) in [R3 and support in the square with corner points P 1(ai-1, a l : 1), P z = (ai-b aj+1), P 3 = (ai+baj-1), and P 4 = (ai+b Qj+1)’ Finally, let .91(0)
denote the space of "smooth" functions described in Section 5.1, defined on the rectangle T = [0, b]z c [Rz, and vanishing on T* = aT, the boundary of T. The reader should see Fig. 1 for the appropriate picture.
,j£------0i-l
-+-_.----------------~s
OJ
Fig. 1
0i+1
Note that all points P are on the surface.
For each A in [0, b], let £(A) denote the arcs x(t) in .91(0) with support in the square interval [0, A]z of [Rz. If J1 = (.:1., o) is in the metric space M = [0, b] x ~ with metric d(J1b J1z) = l.:1. z - A1) + lu z - a 11, let fZB(J1) = d(u) x £(A). Thus, an arc x(t) in fZB(A, o) is a bivariate spline with support in [0, ak] Z c [Rz where ak s A < ak+ l ’ Because of our sample problem with constant coefficients, we define J(x; J1) = J(x; 0’) as in (3), restricted to the class of functions .91(0’). In the more general case, we would define J(x; c) similar to (6) in Section 3.2 where, for example, P l1(t) = Pia., a) if t is in the square given by P 1, P z, P 3 , and P 4 above. A straightforward calculation in the next paragraph (for a i= 0) shows that J(x; J1) = cac pd allJ1) = C TD(J1)C, where x(t) = Caw it), C = (CbCZ’" .)", d ap(J1) = J(w a, wp;J1), and D(J1) is a symmetric tridiagonal block of tridiagonal matrices increasing in A so that the "upper" submatrix of D(ak+ 1, rr) is D(a b c), In the above, wit) = Zi,j(t) where the correspondence IX~ (i, j) is one to one and given after (4d) in Section 5.2.
5.2 The Numerical Problem
191
To construct D(a), we assume the double subscripted notation above; then J( z•• Z )= I,}’ k,1
=
OZk,1 OZi,j OZk,1- 2 z••z z z ] dt d t -fc0bfcb0 [OZi,j ot ot+ot ot k,1 1 Z l l z z I,}
f:f: [Y;(tdYitz)(Y~(tl)YI(tZ)
+ (Yi(tl)yj(tZ))(Yk(tl)y;(tZ))
- 2(Yi(tl)y/tZ))(Yk(tl)Yt(tz))] dt , dt z•
Ii- kl
II
If > 1 or if U- > 1, J(Zi,j, Zk,l) = 0 since the product function Zi)t)Zk,l(t) is identically zero. Otherwise, we have (4a)
(4b)
and (4c)
+ Yi+l(tdYi(tl)yj+l(tZ)Y~{tz) + 2Yi+ l(tdYi(tl)Yj+l(tZ)Yj(tz)] dt 1 dt z
Note that following Theorem 1 of Section 3.2, in the calculation of the dk,k element, we have (setting pAt) = q,,(t) = 1 and Yk(t)= Zk(t))
c. ak-I
’Z()d _2 Yk t t(J
and
rak+1 Jak-t
z
Yk(t)dt =
2
’3
(J.
192
5 Elliptic Partial Differential Equations
This yields the second equality in (4a). Similar results hold for the second equality in (4b) and (4c). We have carried out our calculations so that non› constant coefficient cases may be easily considered by the reader. Thus, for example, (4a) would become
22
11 22 2 2 2 2 Raij(a;, aj) -;;"3 a + Rai/a;, a) "3 a -;; - P aij(a;, a) "3 (J"3 a
(4d)
if J(x) is given as (2) of Section 5.1 with m = 2 and Ql(t) = Q2(t) = O. We now show that D(a) is the approximating finite-dimensional matrix to J(x) on T, and hence D(A, (J) is the approximating finite-dimensional matrix to J(x) on R(},) n d(a). Let IY. = IY.a(i,j) = Nai + j (i,j = 1, ... ,Na) and f3 = f3a(k, l) = N ak + I (k, I = 1, ... , N a)’ Let wa(t) = zJt) and xo(t) be an extremal solution of (3). Let C = {C 1,C2,C3""} be the Euler-Lagrange solution of D«(J), i.e., D(a)C ~ 0, where "~" is described in Section 2.3 if D«(J) is tridiagonal and below if D«(J) is block tridiagonal. For motivation, we refer the reader to Fig. 3 of Section 3.2 and to the surrounding discussion leading up to (8) of Section 3.2. Our situation in this section is similar except that the real elements di,j and cj in that figure are replaced by tridiagonal matrices and vectors. The integration-by-parts motivation for J(x, y) in Section 3.2 is replaced by identical motivation of integration by parts for J(x, y) in (26) of Section 5.1. The reader should return to Example 5 of Section 1.4 for this type of integration by parts. We now state our main theorem on numerical approximation. The result is similar to Theorem 2 of Section 3.2. The proof of Theorem 1 follows with similar arguments, ideas, and theory as the proof of Theorem 2 of Section 3.2. We leave this proof to the reader. We assume xa(t) = cawit), where C = {Cl,C2,C 3",,} is the solution to D«(J)C ~ 0 as described above and xa(t) is properly normalized. Then Theorem 1 The vectors {xa(t)} converge strongly to xo(t) (as a ~ 0) in the derivative norm sense of (1) of Section 5.1; that is, if
g(a)
=
fT[(8~1
+ [xo(t) then g«(J)
~
0 as a
~
Y+ (8~2
[xo(t) - xAt)]
[xo(t) - xa(t)]
Y
xa(t)] 2 ] dt,
O.
We shall now describe in more detail the matrix D(a) and the Euler› Lagrange equations. This type of matrix is found in more classical settings of numerical solutions of partial difference equations where we have finite difference approximation of the derivatives (see Forsythe [9]). Our methods
5.2 The Numerical Problem
193
are different in that we approximate the integration problem, which should be smoother. Note that Theorem 1 gives very strong convergence results even when the coefficient functions are not very smooth. We hope and expect our ideas to shed more light on block tridiagonal matrices of this type. Thus we hope to show in later work (by separation of variables) that D(O") is a linear combination of tridiagonal matrices analogous to the con› tinuous case. The picture is as follows. The Euler-Lagrange equation is
(5)
E1 Gz 0 0
F 1 0 0 ... E z F z 0 ... G3 E 3 F 3 ’ " 0
G4-
E4- ...
C1 Cz C3 C4
~O.
In the above En’F n, and Gn are N x N tridiagonal matrices, En is symmetric, G~ = F n - I , and C n is an N x 1 column matrix corresponding to the points {(an,tz) It z E n(O")}. If A =I- b, then the latter elements of En’ Fn, Gn, and C, contain the appropriate zeros. The matrix equation (5) is similar to (8) of Theorem 3.2 and becomes (for m = 1,... , N) (6)
The associated computer equation (for k = 1,... , N) with g’r, k-1 denoting the (k, k - 1) element of Gm is (7)
g’r,k-1Cm-1,k-1
+ g’r,kCm-1.k + g’r,k+1Cm-I,k+1
+ e’r,k-Icm,k-1 + e’r,kCm,k + e’r,k+ICm,k+1 + f’r,k-1Cm+I,k-1 + f’r,kCm+1,k + fk,k+ICm+I,k+1 = O. In all cases a subscript zero indicates a zero element in (7) or zero matrix in (6), as does an index that takes us past the value of A, that is, a value of a subscript I for which a, > ,1,. As we indicate above, (5), (6), and (7) do not yield a direct numerical solution as does (8) of Section 2.3 for the second› order case. There is a further numerical complication for J(x), which we shall describe in more detail below. We now describe some numerical results of our algorithm. Our test results involve two different ideas that we label Case A and Case B. In Case A we check the matrix D(O") for our sample problem. In Case B we use the method of relaxation to compute a numerical solution. The notation L.c"cpe"p denotes CTD(O")C, where D(O")C ~ 0 is pictured in (5). Case A Direct verification: In this case, we take T = [0, nJ z c [Rz and choose a step size of 0" = n/70. The known solution is xo(t) = sin t I sin t z.
194
5 Elliptic Partial Differential Equations
We build a numerical solution with elements Cjj = sin a, sin aj, and letting C, and D(a) = (eaP) as described above, we obtain the sum LcacpeaP = 0.952 X 10- 7 . For completeness, we include the computer program listing of this example in Fig. 2. The computation was performed in double precision and is the approximation of J(xo) in (3). The exact result is found in the following calculation: (8)
f: fo"(cosZt1sinZtz + cosZtzsinZt1 - 2sinZt1sinZtz)dt1dtz =
«. f: sin? t z dt z + I: cos u, dt z I: sin? t
f: cos 2t 1
1
dt 1 = 0,
where we use the identity cos 2t = cos? t - sin Z t twice. The error in this calculation is due to our bilinear approximation of the surface xo(t) = sin t 1 sin t z. For the case T 1 = [O,I]z with A= n/2 + n/90, we obtain Lcacpeap = 0.402 X lO- z which illustrates, at least, that the value 0.952 x 10- 7 is (somewhat) close to zero as expected. This is the numerical approximation of the function equal to sin s sin t on [0, n/2] z, bilinear in sand t on T 1 and vanishing on the boundary T! of T 1 This number is not meaningful except to note that it must be relatively large and positive. Ifit were negative, there would be a vector on Til vanishing on T! such that J(x; 0") ~ O. This would imply that with A = n/2 + n/90 we have S(A,O") + n(A, 0") ~ 1, which is not possible until A ~ n. Note that sin S sin t integrated over [0, n/2]z in (8) would also be zero but we must "wait" for a conjugate surface, that is, until this function vanishes on the boundary of T. We remark that the integration (8) with I = n/2 + n/90 replacing tt as the upper limit yields a negative value of - 0.057. This does not negate what we have said since xo(t) = sin s sin t does not vanish on the boundary of [0, n/2]z or [0, I]z. We have also numerically verified that the function equal to sin 7t 1 sin t z on the rectangular interval 6n/7 ~ t 1 ~ n, 0::;;; t z ~ n, and zero otherwise on T = [0, n]z is a solution of J(x) = 0 where
J(x) =
IT [xi(t) + x~(t)
- 50xZ(t)J di, dt z•
In fact we obtain a numerical value of LcacpeaP = 0.81 X 10- 5 , which is quite respectable as we use a step size of 0" = n/70. Our solution surface contains at most nine nonzero values corresponding to each discrete value of t z . Case B Relaxation: By relaxation we mean a procedure in which we assign initial values to the vector C of (5) and then use (7) to calculate the current value of Cm,k using the eight neighboring points. This topic is dis-
5.2 The Numerical Problem
195
TOF:
200 100
300 C 350 C 375
IMPLICIT REAL*8 (A-H,O-Z) DIMENSION C(75,75),D(75,75) SUM=O.DO S=3.141592654DO/70.DO E1=8.DO/3.DO-8.DO*S*S/9.DO E2=-1.DO/3.DO-2.DO*S*S/9.DO E3=-1.DO/3.DO-S*S/18.DO DO 100 1=1,70 DO 200 J=1,70 X=I*S Y=J*S C(I,J)=DSIN(X)*DSIN(Y) CONTINUE CONTINUE D(1,1)=E1*C(1,1)+E2*C(1,2)+E2*C(2,1)+E3*C(2,2) SUM=D(l,l)*C(l,l)+SUM DO 300 1=2,69 0(1,I)=E2*C(1,I-1)+E1*C(1,I)+E2*C(1,I+1)+ *E3*C(2,I-1)+E2*C(2,1)+E3*C(2,I+1) SUM=D(l,I)*C(l,I)+SUM CONTINUE WRITE (6,350)(D(1,II),II=1,49) FORMAT(T2,10D11.4) WRITE (6,375) FORMAT(’l’) DO 400 K=2,69 XMAX=O.DO XMIN=l.DO D(K,1)=E2*C(K-1,1)+E3*C(K-1,2)+E1*C(K,1)+ *E2*C(K,2)+E2*C(K+1,1)+E3*C(K+1,2) SUM=D(K,l)*C(K,l)+SUM DO 500 J=2,69 0(K,J)=E3*C(K-1,J-1)+E2*C(K-1,J)+E3*C(K-1,J+1)+ *E2*C(K,J-1)+E1*C(K,J)+E2*C(K,J+1)+ *E3*C(K+1,J-1)+E2*C(K+1,J)+E3*C(K+1,J+1) XMAX=DMAX1(x}~X,D(K,J))
500 550 C 600 C 650 400 700
XMIN=DMIN1(XMIN,D(K,J)) CONTINUE WRITE(6,550) (K,XMAX,XMIN,SUM) FORMAT(T2,I2,T6, ’XMAX=’,D19.12,T40, ’XMIN=’,D19.12,T70, *’SUM=’,D19.12) WRITE (6,600)(D(K,JJ) ,JJ=1,49) FORMAT (T2,10D11.4) WRITE (6,650) FORMAT(’l’) CONTINUE WRITE(6,700) SUM FORMAT (T2,’SUM=’,D19.12) STOP END Fig. 2
196
101 100
103 102
500
201 200
400 401
250 301 300 350 150
5
Elliptic Partial Differential Equations
IMPLICIT REAL*8 (A-H,O-Z) DIMENSION C(101,101),F1(101),F2(101),E1(101) ,E2(101) R(X)=2.DO+DCOS(X) P(X)=1.5DO+DCOS(X) S=3.141592654DO/100.DO DO 100 1=1,101 DO 101 J=1,101 C(I,J)=(1.DO-DABS(1.DO-(I-1)/50.DO))* *(1.DO-DABS(1.DO-(J-1)/50.DO)) CONTINUE CONTINUE DO 102 1=1,101 DO 103 J=13,63,25 C(I,J)=4.DO*C(I,J) CONTINUE CONTINUE DO 500 J=1,101 T1=J*S+S/2.DO T2=Tl-S F1(J)=-1.DO/6.DO-R(T1)/6.DO-S*S*P(T1)/18.DO F2(J)=-2.DO/3.DO+(R(T2)+R(T1))/6.DO-S*S*(P(T2)+P(T1))/9.DO E1(J)=1.DO/3.DO-2.DO*R(T1)/3.DO-2.DO*P(T1)*S*S/9.DO E2(J)=4.DO/3.DO+2.DO*(R(T2)+R(T1))/3.DO-4.DO*S*S*(P(T2)+ $P(Tl))/9.DO CONTINUE DO 150 ITER=1,1000 DO 200 1=2,100 DO 201 J=2,100 DUMMY=-(F1(J-1)*C(I-1,J-1)+F2(J)*C(I-1,J)+F1(J)*C(I-1,J+1)+ $E1(J-1)*C(I,J-1)+E1(J)*C(I,J+1)+F1(J-1)*C(I+1,J-1)+ $F2(J)*C(I+1,J)+F1(J)*C(I+1,J+1))/E2(J) C(I,J)=DUMMY CONTINUE CONTINUE M=MOD(ITER,500) IF(M.Ne.O) GO TO 150 WRITE (6,400) FORMAT(T2, ’X-VALUE’,T25, ’Y-VALUE’,T45,’TRUE-VALUE’,T70, $’ALGO-VALUE’,T95, ’DIFFERENCE’) WRITE(6,401)(C(51,51)) FORMAT(T2, ’UNNORMALIZED VALUE OF C(51,51)=’ ,D16.8) DO 300 1=2,101,15 DO 301 J=2,101,15 X=(I-l)*S Y=(J-1)*S Z=DSIN(X)*DSIN(Y) W=C(I,J)/C(51,51) D=W-Z WRITE(6,250) (X,Y,Z,W,D) FORMAT(T2,F16.8,T25,F16.8,T45,F16.8,T70,F16.8, *T95,DI6.8) CONTINUE CONTINUE WRITE(6,350) FORMAT(’l’) CONTINUE STOP END Fig. 3
5.3 Separation of Variables
197
cussed in detail in Forsythe [9, Chapters 21 and 22]. One such pass with m, k = 1, ... , N is called an iteration. For the problem described above with solution XO(tI, t 2) = sin t I sin t 2 in [0, n J2 with step size (J = n/50, we obtain a maximum error less than 0.2 x 10- 3 after 500 iterations and less than 0.25 x 10- 4 after 1000 iterations. The median error is an order of magnitude better. The initial values at cij were chosen so that x o(n/2, n/2) = 1, Xo = 0 on T*, and bilinear otherwise. The calculations were performed in single precision and took approximately two minutes of computer terminal time. (We have no method of obtaining accurate computer timing.) Our relaxation results were not as good when we changed the coefficient functions to nonconstant values. Thus, for the equation 2
(9a)
ax +;;a ((2 + cos t 2 ) ;;ax) + (3 + 2 cos t 2 )x = at; ut ut
--;z
2
0
2
or the associated quadratic form (9b)
J(x)
=
f:f: [xi + - (3
(2
+ cos t2)X~
+ 2cost 2)X2(t)Jdtldt 2,
we note that xo(t) = sin t I sin t 2 is a solution to (9a) vanishing on the boundary of T = [0, t: J2’ In this case with (J = n/50, we obtain a maximum error of 0.65 x 10- 2 after 500 iterations, with little improvement after 1000 iterations. With (J = n/1oo and 2000 iterations, we obtained a maximum error less than 0.35 x 10- 2 For completeness, we include in Fig. 3 the computer program listing of this example (with 1000 iterations). Finally, for J(x) defined by either (3) and (9) we observed phenomena in our relaxation methods consistent with the theory (see Forsythe [9J and in particular p. 240). They are also similar to the numerical results described after (14) of Section 3.2. If our interval is [0,2.5J2’ then D(a) is positive de› finite and our relaxation method drives the solution of D((J)C = 0 toward the zero solution (very slowly). Ifour interval was [0, 3.5J2’ D((J) has negative eigenvalues and our computations rapidly diverged. The relaxation method works only when D((J) has a zero eigenvalue or equivalently the first con› jugate surface.
5.3
Separation of Variables
The purpose of this very brief section is to consider the topic of separation of variables. However, instead of dealing with the partial differential equation as is usually done, we shall deal with the quadratic form. We remind the
198
5 Elliptic Partial Differential Equations
reader that we have performed separation of variables for the partial dif› ferential equation in Sections 3.4 and 5.1. We shall also describe some ideas of "separating" a block tridiagonal matrix into sums of "products" of tri› diagonal matrices. We have indicated in Section 5.2 how unsatisfactory these block tridiagonal matrices are. Our purpose in this section is not to obtain some deep theory, but merely to cast some light on these topics. We begin with the major example of Section 5.2, namely, the quadratic form J(x) and the partial differential equation L(x), where (1)
J(x) =
f:f:[(~;Y
+
G:Y -
2x(s, t)]dSdt
and
iPx
(2)
L(x) = as 2
a2 x + at 2 + 2x(s, t) =
for x(s,t) defined on [O,b]2’ We have shown in Section 5.2 that if we assume x(s, t) = S(s)T(t), we obtain equations of the form
S"(S) T"(t) S(s) + 2 = - T(t) = A or T"(t) + AT(t) = 0 and S"(S) + (2 - A)S(s) = 0. If we ignore these results and consider only the quadratic form given by (1) directly, then the above substitution for x(s, t) = S(s)T(t) in (1) yields
(3)
J(x)
f:f: {[S’(s)T(t)y+ [S(s)T(t)y - 2S2(s)T2(t)}dsdt = f: T 2(t)dt f: [S /2(S) + (- 2 + A)S2(S)]ds + f: S2(S) ds f: [T 2 AT dt. =
(t ) -
2(t)]
We shall describe how A is determined in several cases. Clearly 0 < A and 0<2 - A; otherwise there are no conjugate solutions (see (4) below). We now assume that we wish to find a numerical solution of (2) vanishing on the boundary of a rectangle R, but that we are not able to find a closed› form extremal solution to either (4a)
or (4b)
5.3 Separation of Variables
199
In the above, R is defined by 0::;; S::;; b 1 , 0::;; t s; b2 If b 1 = b z, then -A = 2 - Aor A = 1. We can verify numerically as in Section 3.2 that the smallest b such that the boundary R* of R = [0, b]2 is a conjugate surface occurs when b = n. Similarly, we could obtain a second conjugate surface when b = 2n, etc. For a rectangle such as b z = 2b 1 (for example), we still have the condition A> from (4b) and 2 - A > from (4a). Hence let 0::;; A::;; 2. As Aincreases in this closed interval, the quadratic form J iT) becomes more negative. Equivalently, the conjugate (or oscillation) point of J z(T) decreases con› tinuously from infinity when A = to its minimum value when A= 2. Similarly, as Aincreases, the conjugate point of J I(S) increases continuously from its minimum value when A = to infinity when A = 2. Thus, there is a value Ao < Ao < 2, such that the line b2 = 2b1 is crossed. These ideas can be continued to a more general quadratic form such as (5)
J(x) =
r- Jrb2 [(OX)2 Jal !X1(S, r) os + !Xz(s, r) (OX)2 ot a 2
2P(s, t)x Z(s, t)] ds dt,
where !X1(s, t) = r1(s)P1(t), !Xz(s, t) = rz(s)P2(t), P(s, t) = r3(s)P3(t), where r1(s) > 0, r2(s) > 0, P1(t) > 0, pz(t) > 0, and r3(s) = cr2(S) for some constant c. Letting
x(s, t) = S(s)T(t), we obtain a separation such as J(x) =
(f:’2 P1(t)TZ(t)dt) (f~1 r-. (S)Sf2(S) + (L~l
r 2(S)SZ(S)dS)(
Ar2(s)S2(s)] dS)
S:’2 {P2(t)T’2(t)+ [APl(t) -
2CP3(t)] TZ(t)}dt)
yielding (6a)
and (6b)
f2 J iT) = f:: {PZ(t)T’2(t)+ [API (z) - 2CP3(t)] TZ(t)}dt
similar to (4b) and (4a), respectively. The same reasoning as above will yield (numerical) solutions in this case, as we assumed no closed form solutions to our elementary problem. As in Section 3.4, our problem becomes D - AE, where D and E need be computed once for each step size a. Our second topic of factoring block tridiagonal matrices is rather fanciful. As this is written, nothing of substance has been done, but the idea seems to be interesting although somewhat speculative. The plan is to take the block tridiagonal matrix in the form of Eq. (5) of Section 5.2, define the elements R~i}' and P ai] of Eq. (4d) of Section 5.2, separate variables as we have
R;5,
200
5
Elliptic Partial Differential Equations
in (5) above, and factor R~i~ into the product rt(si)pt(t j ) as we have done with r) in the last paragraph. Similarly, we may factor R;fjand Pai]> We now have the numerical approximation of J i (S) and J z(T) as given in (6). More precisely, we have the numerical eigenvalue problem of Section 3.4. The cor› rect value of;’ may be found as we have indicated in the previous paragraph. For example, if D(a) is the block tridiagonal matrix associated with (1) whenb = 2n, we can obtain no solutions of D(a)C = Obyrelaxationmethods. However, we can "factor" D(a) into tridiagonal matrices Ds(a) and Dt(a), which are incidently equal in this case. IfD = {di } and E = {ej} are respec› tively the Euler-Lagrange solutions of Ds(a)D and Dt(a)E, then C = {d;ej} is the numerical solution of D(a)C = O. Clt(S,
Chapter 6
The Quadratic Control Problem
6.0 Introduction In the study offocal points for differential equations such as in Section 2.3 and the fact that n(O’o) = 0, except at discrete points, lead to the equalities (1)
s(O’o)::::; s(O’) ::::; s(O") + n(O")::::; s(O’o) + n(O’o)
and the fact that n(O’0) = 0, except at discrete points, lead to the equalities s(O’) = s(O’o) and n(O’) = 0 on open subsets of L. When 0’ is replaced by the resolvent parameter A in [a, b] for example, we obtain a conjugate-point theory with the main result that s(A - 0) = s(A) and (2)
s(Ao) =
L n(A). ,1.<,1.0
We shall refer to this discrete property as the disjoint hypothesis condition since it appears because of the disjoint hypothesis: if x is in £’0(0" d 11 £’0(0’2) for 0’1 -=1= 0"2’ then x = O. This is the property of normality in differential equations, which appears in Theorem 16 of Section 2.3. This property translates into the known result that because of uniqueness of solutions xo(t) of L(x) = 0, if xo(t) = on a set S with an accumulation point, then xo(t) == O. In quadratic control problems this concept of normality or disjoint hypothesis is no longer true. Inequality (1) will still hold but condition (2) will not, since n(A.) may be positive on a nontrivial interval and s(Ao) is finite. In place ofthis conjugate-point theory, which satisfies the disjoint hypothesis, we obtain a conjugate-interval theory or focal-interval theory. Thus a non› trivial vector x may be in £’0(A.) for A. in a nontrivial closed interval [J.l,)’2] 201
202
6 The Quadratic Control Problem
of [a, b]. Furthermore, s(2) now counts focal intervals and not focal points. More precisely, s(2) counts the number of right endpoints .1 2 of a closed interval [.1 1,.1 2 ]. In this sense focal points are degenerate focal intervals. Hence we have a generalized theory of oscillation for differential equations. Thus we must return to (1) and begin again. This is done in Section 6.1. While this return requires more effort, it is also quite satisfying in that it illustrates that our approach to differential equations is quite sound. While the number of research papers on normal oscillation problems seems to grow without bound, those methods are not able to cope with this more general oscillation phenomena. In addition, it is pleasing to get such nice results from such simple inequalities as (1). Unlike the situation in Chapter 3, where we were primarily interested in describing behavior and examples of differential equations, we shall expend a great deal of effort trying to make sense out of inequality (1) when the disjoint hypothesis does not hold. Section 6.1 contains the signature theory for quadratic control problems. We shall begin with Theorem 15 of Section 2.3 and develop a theory in which Eq. (10)of Section 2.3 holds, but the disjoint hypothesis does not hold. In Section 6.2 we shall interpret these signature results to solutions of differential equations. In particular, we shall relate the focal point or focal intervals of quadratic forms in Section 6.1 to solutions of control differential equation problems. Section 6.2 also contains further ideas of Mikami found in his dissertation and Pacific Journal ofMathematics paper [37, 38]. Observe that even today, this work seems to be at the forefront of current efforts in quadratic control theory. Section 6.3 contains two examples of the theory of Sections 6.1 and 6.2. Finally in Section 6.4, we show that a second "iteration" of our approximation ideas leads to an approximation or pertur› bation theory for the results of Sections 6.1 and 6.2.
6.1 Focal-Interval Theory of Quadratic Forms The main purpose of this section is to give the approximation theory, which holds for quadratic control problems, and to explain the difference between the focal-point theory (normality) encountered in earlier chapters and the focal-interval theory (abnormality) of this chapter in terms of the approximation theory. For completeness, we note Theorems 2 and 3 of Section 4.3, which are the expected comparison results. They follow as corollaries from our signature theory. We leave to the reader the task of interpreting these results for focal intervals in this section and then for solutions of control equations in Section 6.2. For convenience we repeat some concepts and results from Section 2.3. Let a, b be real numbers (a < b), and define A = [a,b]. Let {.Yt’(2)IA in A}
6.1 Focal-Interval Theory of Quadratic Forms
203
be a one-parameter family of closed subspaces of £’ such that £’(a) = 0, £’(b) = se, and £’(..1.1) c £’(..1.2), whenever ..1.1’ . 1. 2 in A, ..1.1 < . 1. 2 In this chapter we shall require that one or both parts of the additional hypothesis is satisfied: £’(..1.0) =
(la)
n
£’(..1.)
).o<).sb
whenever a ::; . 1. 0 < b, and (lb) whenever a < . 1. 0 ::; b. Let J(x) be an elliptic quadratic form on d and J(x; A) denote the restriction of J(x) to £’(..1.). Let 8(..1.) and n(A) denote, respectively, the signature and nullity of J(x; A) on £’(..1.). In Theorem 12 of Section 2.3 we show that 8(..1.) and m(A) = s(A) + n(A) are nondecreasing in A. A point A at which f(A) = s(A+ 0) - s(A- 0) =I- 0 is called a focal point. Theorem 15 of Section 2.3 is our main result to continue our study of control theory or focal-interval problems. Theorem 2 of Section 3.1 sum› marizes the results of Section 2.3 when the disjoint hypotheses hold in the focal-point case. As we shall see, the disjoint theory and results may hold on subintervals of [a, b]. Both theorems are now restated for convenience. Let . 1. 0 in A be given. Then there exists () > 0 such that Ain . 1. 01 < () imply
Theorem 1
A and
1..1. -
s(Ao) ::; s(A) ::; s(A) + n(A) ::; s(Ao) + n(Ao).
(2)
In addition we have, for such A, (3) (4a)
n(Ao) = 0
implies s(A) = s(Ao)
and
n(A) = 0,
implies s(A) = s(Ao)
and
m().) = m(Ao).
n(A) ::; n(Ao),
and (4b)
n(A) = n(Ao)
Theorem 2 If £’0(..1.1) n £’0(..1.2) = 0 when ..1.1 =I- . 1. 2, then f(a) = 0 and f(A) = n(A) on a ::; A::; b. Thus if . 1. 0 in A the following quantities are equal: (5a)
the number of focal points on a::; A < . 1. 0,
(5b)
the signature s(Ao) of J(x) on £’(..1.0),
(5c)
the sum Las).<;,o n(A), and
204
6 The Quadratic Control Problem
(5d)
the sum 2Is(AI + 0) - s(AdJ taken over all Al such that a :::;; Ai < and S(A) discontinuous at )’i’
)’0
We now consider the situation in which the disjoint hypothesis does not hold. In this case, we get the results in Theorems 5 and 7 and the description offocal intervals described below. Both types of behavior may occur together on A = [a, b], as may be seen in the examples of Section 6.3. Let £’(A)J denote the set {xIJ(x,y) = 0 for all y in £’(),)). If in addition x is in £’(),), then as above, x is in £’o(A)= £’(A) 1\ £’()y Lemma 3 If )’1 < ),z and x#-O in £’0(A1) n £’o(Az),then x in .1t’o(A)for Al :::;; A:::;; AZ’ For I: in A1 s A:::;; Az we have .1t’(A1) c £’(1:) and .1t’(A.z)J c .1t’(I)J. Thus .1t’o().d n .1t’o(),z) = .1t’(Ad n £’(Az)J c .1t’(I:) n .1t’(l:l = .1t’o(l:l• Lemma 4 If x#-O in £’o(A),A1 < A < AZ’then x is in .1t’(A),AI:::;; A s Az. Clearly x is in £’(Az). If x is not in £’()’z)J, there exists y in .1t’(AZ) such that J(x, y) = 1. We may choose sequences {Jln} and {Yn} such that Yn in £’(Jln), Yn ~ Y as Jln /’ }’z• Thus 1 = J(x,y) = limn=cx,J(x,Yn) = O. Thus x is in £’o(Az). Conversely in satisfies A1 < I: < AZ’ then x is in £’(I:)J c .1t’(A1)J. But x is in <;.£’(A) = £’(Ad. Thus x is in .1t’0()’1)’
n;’l
Theorem 5 If x#-O in £’0(A1) 1\ .1t’O(Az), )’1 #- AZ’then there exists a closed interval Al = {A IA’:::;; A:::;; A"}of A such that {AIx in £’o(A)}= A 1 Let A’ = glb{Alxin .1t’0().)} and A" = lub{Alxin £’o(A)}. By Lemma 3, x is in £’o(A),A’ < A < A". The theorem now follows by Lemma 4. Theorem 6 restates inequalities involving the indices S(A) and n(A), which characterizes focal-interval problems. Theorem 6 Let Ao be given such that a < inequalities hold: (6a) (6b) (6c)
S(AO - 0) = S(Ao), n(Ao) ;;::: n(Ao - 0),
},o
< b. Then the following
S(Ao + 0) ;;::: S(Ao), n(Ao) ;;::: n().o
+ 0),
S(Ao + 0) - S(Ao) = n(Ao) - n(Ao + 0) ;;::: O.
Condition (6a) follows since S(AO + 0) ;;::: s(},o) holds by monotonicity, and S(Ao - 0) :::;; S(Ao) :::;; S(Ao - 0) holds by monotonicity and (2),respectively. If, for example, for (6b), n(),o) < n(Ao + 0), then by (2)with s(A.o) :::;; S(A + 0) we have S(Ao) + n(Ao) < S(Ao + 0) + n(Ao + 0) :::;; S(Ao) + n(Ao). The last inequal› ity holds by (2). Since this result is impossible, (6b) holds. Finally for (6c) we
6.1
Focal-Interval Theory of Quadratic Forms
205
have the inequalities s(Xo) + n(Xo) =::; S(Ao + 0) + n(Xo + 0) =::; S(Ao) + n(Ao) holding by monotonicity and (2) respectively. We shall now describe when S(A) changes. This occurs when we "lose" null vectors. To recap our results from the viewpoint of the collection {sU,)I), in A} and the losing of null vectors we have Theorem 7. Theorem 7 The nonnegative integer valued function S(A) is nondecreasing. It is continuous from the left and has a right-hand jump fU,o) = n(Ao) › n(),o + 0) ~ O.
We now observe that focal-point theory (normality) is a special case of the focal-interval theory since in the former n(Xo + 0) = 0 so that f(Ao) = n(Ao). In a formal mathematical sense Theorem 2 is a corollary of Theorem 6. Our approach to develop the focal-point theory first is more in keeping with expository problems and the relative use of these results. For expository purposes, we now summarize the two types of phenomena described above. The least general phenomenon is the focal-point phenomenon or normal› ity phenomenon. We obtain the disjoint hypothesis condition when a vector x =I- 0 cannot belong to .Yt’o(Adn .Yt’0()’2}if Al =I- A2. Thus if Al = {A in AI n(A) =I- O}, then A, is a finite subset of A and Theorem 2 holds. This is the phenomenon observed in (normal) ordinary differential equations. More generally, if the disjoint hypothesis does not hold, then a vector x =I- 0 may belong to .Yt’o()’).From Theorem 5,), is in A., which is a nontrivial closed subinterval of A. We refer to this situation as the focal-interval phenom› enon. It is characterized by Theorems 5, 6, and 7. We now consider what is really happening. From condition (4b) we need only consider points Ao at which n(Ao) is not constant in an open neighborhood of )’0’ From (6b) we have n(l,o) ~ n().) in this interval. Ifn(),o - 0) < n(Ao), then we have acquired at least one null vector, xo(t) at ), = Ao, but the signature does not change because of this acquisition, since s(),o) - s(), - 0) = O. To see this fact let )’1 satisfy a =::; Al < AO’ Then .Yt’(Adc .Yt’(Ao) implies .Yt’()’o)Jc .Yt’(Al)J, so that if xo(t) is in .Yt’o(AO} and also in .Yt’0(A1)J c .Yt’().o}J, then xo is not in .Yt’(Ad.Roughly speaking, xo(t} is a solution to L(x} = 0 on [a, b], but it does not vanish until it arrives at t = )’0’ On the other hand, if n(),o + 0) < n(),o}, then we have converted a null vector xo(t) at A = Ao into a vector that is not null for A> Ao. In this case, s(A + 0) - s(),} = n(Ao} - n(A + 0) by (6c) and we shall acquire a negative vector after we leave A = )’0’ Several examples of negative vectors have been given above for the focal-point case; see Sections 3.1 and 3.2. However, this situation can be pictured more clearly in the following way. The vector xo(t) is in .Yt’(Ao)and hence in .Yt’(A2}, where AO < A2 =::; b. It is in .Yt’(Ao)Jbut not
206
6 The Quadratic Control Problem
in .;.r"’(A.z)J. Roughly speaking, xo(t) is a solution to L(x) = 0 on [a, Ao], van› ishing at Ao, and is identically zero on [Ao,b]. We have seen in Section 1.4 that because of the boundary terms involved in integration by parts, J(xo, y) ::/= 0 for all y in £’(Az), although J(xo) = O. In the sense of Theorem 12 of Section 2.3, Xo contributes to S(Az) but is not in £’o(Az).Equally, it cannot be used to enlarge the dimension of a negative space of J(x). For example, if S(Az) = 1 and x 1(t) is in £’(Az) with J(Xl) < 0, then J(xo) = 0 as we discussed above, but some linear combination y(t) = Axo(t) + BXl(t) satisfies J(y) > O. We continue our study of focal-interval theory by defining focal intervals. For this purpose we assume Al < Az < ... < Ap are the distinct focal points on A, where e, will denote the order of Ai as a focal point. The space £’o(A) will be called the J-null vectors of £’(A). Let E, denote the J-nun vectors of £’(Ai) which are not J-null vectors of £’(A),A> Ai’The following procedure is given to make "focal intervals" well defined. To obtain the e 1 focal intervals 11 , , Ie, associated with (which end at) Al we proceed as follows: choose 1 1 = [~11' Al]’ where ~ 11 = min {A. :s;; AlIx in £’O(A) for some x ::/= 0 in E 1}’Let x 11 be the vector giving ~ 11’ Choose I z = [~Zl,Al]' where ~Zl = min{A:S;; AljX in £’o(A) for some x::/= 0 in E 1 and (X,Xll) = O}. Let XZl be the vector giving ~Zl"'" Choose lei = [~ell> Al]’ where ~el1 =min{A:S;;A1Ix in Jlt’o(A) for some x::/=O in E 1 and (x, Xkl)=O; then Jlt’0(A1 ) is the direct k = 1, ... , el - 1}. IfXe,l is the vector giving ~ell' sum of the span of {X11,XZl>’" ,Xe,l} and Jlt’OP’l)(\ Jlt’OCAl + 0). Note that this construction "defines" focal intervals. Thus ~11 is the smallest value ,1.0 of Ain A for which n(Ao) > n(Ao - 0) as expressed in (6b), where the J-null vector gained at ,1.0 is lost at A1 There may be a value of A < AO such that a J-null vector x 1 (t) was gained at A, but it does not belong to E 1 It is a J-null vector for £’(Ao + 8) for some 8 > O. With obvious modifications, we define the ez focal intervals I e 1 + 1 , . , Ie, +e2 associated with AZ; ... ; the ep focal intervals I k , , II associated with Ap (k = e 1 + e z + ... + e p-l + 1; 1= e 1 + ez + ... + ep ) . Suppose for example we have a graph of Aversus S(A) as in Fig. 1. Suppose also that 11 = [~l1,Al]' I z = [~21,Al]' 13 = [~lZ,Az], 14 = [~Z2,Az], and 15 is yet to be specified, although it exists, with Az as its right-hand end point, since S(Az + 0) = 5. The left-hand end point of I 5, ~3Z occurs in the interval [~zz,)'z]. If ~3Z = ~zz, then two null vectors began at A = ~22' If ~3Z = Az, have a focal-point phenomenon and the focal interval shrinks to a single point. Finally, the signature S(A) does not tell us if other null vectors have appeared. It does tell us that no other null vectors have disappeared before A = b. Let X, X’ be in A with a :s;; X < A" :s;; b. We denote the number of focal intervals on (A’,A") by f(X, A"), the number of focal intervals on (a, A") con› taining the point A’ by g(X A"), and the dimension of the space Yl’o(X)e Yl’o(A’) r. Jlt’o(X’) by rn(A’, X’).
6.1 Focal-Interval Theory of Quadratic Forms
207
2
,
I
I
I
I
I
a
~lZ
e-ll
~Zl
A,
~zz
...
I
Az
A
b
-1,"-Iz-
13
0(
~
-14 Fig. 1
We now relate f(A’,A") to the signature function as we have done for conjugate-point theory. In that case, f(X,A") is the number of conjugate points on the open interval (A’,A"). Hence s(A") - s(X) =
I
n(A)-
J.<J."
I J.<J.’
n(A) =
I
n(A),
J.’:s;J.<J."
so that (7)
f(A",X) =
I
n(A) = S(A") - s(X) - n(X).
J.’< J.<J."
Equivalently f(A",X) = S(A" - 0) - s(X + 0) = s(A") - [s(A’) + n(X)], and we get the above result. The first equality holds since (A’,A") is an open inter› val. The second equality holds since S(A) is continuous from the left and by Theorem 2. In the focal-interval case, we begin with an apparent complicated se› quence of lemmas. However, out final results in Theorems 15 and 16 are relatively uncomplicated, and stated so that (6) generalizes in an agreeable way.
Lemma 8 If X, A" in A, a s; X < A" < b, then f(a, A") = f(a, X) + f(X, A") + g(X, A"). I is a focal interval on (a, A") if and only if it satisfies one of the following three mutually exclusive descriptions: I is a focal interval on (a,X); I is a focal interval on (X,A"); or I is a focal interval on (a, A") containing X.
Lemma 9 If A in A, then f(a, A) = s().). The result follows by the definition of I k
Lemma 10 If a ::;; X < A" < b, then g(X, A") = rn(X, A").
208
6 The Quadratic Control Problem
The number of focal intervals on (a, A") containing X is equal to the num› ber of null vectors of JIl’(X) not contained in JIl’(A"). The next result now follows immediately from the above lemmas. Theorem 11 If A’,A" in A, a
s
X’ < b, then
f(X, A") = S(A") - s(X) - rn(A’,A").
(8)
We note that Hestenes, in (25) of Section 2.2, has given the following result. = s(!!J) + s(!!JJ) + rn(!!J), where ss:’ are the J-orthogonal vectors of!!J, rn(!!J) is the dimension of!!J o ,rJ8 o n sf o› Thus the number of focal points on (A.’ 1-") is equal to s(!!JJ). Let m(A’ A")denote the dimension of the space JIl’o(A’) n JIl’o(A"). Noting that n(X) = rn(X),") + m(A’, X’), we have our desired extension of (7). 1f!!J is a closed subspace of sf, then s(sf)
Corollary 12 (9)
6.2
If X, )." in A, a
~
e
),’ < A" < b, then
f(A’,;’’’)= s(A") - [s(A.’) + n(X)]
+ m(X,A").
Focal Arcs of Differential Equations
The purpose of this section is to relate the ideas and results in the previous section to the theory of solutions of systems of differential equations, which are the Euler-Lagrange equations for quadratic control problems. Many details for this type of problem are found in Mikami [37, 38J and Hazzard [25]’ We shall also consider further ideas of our theory. We shall see that it is possible to obtain simultaneously a focal-point phenomenon and a focal› interval phenomenon. The former should be better understood by the reader because of our earlier theory for ordinary differential equations, as in Chapter 3. The focal-interval phenomenon is more difficult to understand and derive. However, we are able to achieve quite satisfactory results, even for this phenomenon, in relating S(A) to solutions of control differential equations with transversality and boundary conditions. An element x of sf is an arc x.x’(z), uk(t),
a ~ t ~ b,
k = 1, ... ,q,
i = 1" .. .n,
where x’(r) and uk(t) are Lebesgue square integrable functions, The subspace !!J of sf will denote all arcs that also satisfy (1)
x = Ax + Bu
and
C*x(a)
= O.
Finally C(} will denote all arcs x in !!J that also satisfy x(b) = O. The quadratic form J(x) = x*(a)Dx(a)
+
L b
2w(t, x, u) dt
6.2
Focal Arcs of Differential Equations
209
is assumed elliptic relative to the inner product (x, y) = x*(a)y(a)
+
S: (y*x + u*u)dt,
where x:x(t), u(t);
y: y(t), u(t);
and 2w(t,x,u) = x*Px
+ x*Qu + u*Q*x + u*Ru.
In the above let the asterisk (*) denote the transpose of a matrix. The matrices A, B, C, and D are respectively n x n, n x q, n x r, and n x n con› stant real matrices where the rank of C is r :-s;; n. P(t) and Q(t) are n x nand n x q Lebesgue square integrable matrices on [a, bJ with P(t) = P*(t) and R(t) = R*(t) is a q x q essentially bounded and Lebesgue integrable matrix on [a, bJ satisfying R(t) ~ eI almost everywhere for some s > 0. The ellip› ticity of J is a consequence of the fact that R is positive definite in this sense. For each 2 in [a, bJ, let ~(2) be given by ~(),) = {x in ~ Ix(t) = 0, u(t) = a.e. on), :-s;; t :-s;; b}. Let s(),) and n(1) denote the signature and nullity of J(x) on ~(),). We note that 2 1 < 2 2 implies ~(21) c ~(22) and that the resolvent hypothesis holds with ~ and ~(1) replacing d and £(2), respectively, in Section 6.1. Thus all inequalities, descriptions, and results from Section 6.1 hold in this section. Later in this section we shall add the condition that Mx + Nu = 0. We now consider focal intervals in the setting of this example. The dif› ferential equations and boundary conditions associated with extremal solu› tions and their correspondence with the null vectors and focal intervals of Section 6.1 are of special interest. We note that the notion of focal intervals in an optimal control setting was omitted in Hazard [25]. This reference defines focal intervals for the problem of Bolza in the calculus of variations setting in terms of differential equations and "side" conditions. An arc x in f!4 n ~J is called a focal arc. From Mikami [38J we have: x is a focal arc if and only if there exists an adjoint vector p = (p1, ... ,pn)* and constant multipliers v = (v, ... ,v r )* such that the vector xp: x(t), u(t), p(t)
satisfies the Euler-Lagrange equations
(2)
x=
Ax + Bu,
p + A*p = wx ,
the end conditions C*x(a) = 0, and the transoersality condition p(a) = Dx(a) + Cu. In the above, pi is absolutely continuous and pi is Lebesgue square integrable on the interval [a, b]. Note that ~(2) c ~ and hence ~J c ~(2)J for all 2 in [a, b]. Thus a focal arc is "almost in" ee o(),) = ee(2) n ee(;1Y in that it satisfies the Euler-Lagrange
210
6 The Quadratic Control Problem
equations (2) along with the boundary conditions and the end conditions. It remains only to check that it is in ~(),). By extension of the interval I we mean an interval l’of [a, b] such that I c 1’. This definition was also used by Hazard to define focal intervals in her setting. In the next three theorems we relate the indices such as s(),) to focal intervals of control theory.
Theorem 1 If x i= 0 is in ~ o(),) for), in I = [X, ),"] and not in ~ o(),) for }, in an extension of I, there exists a focal arc y that is identically zero on I and not identically zero on an extension of I, such that x(t) = y(t) on [a, )’’’] and x(t) = 0 on [X,b]. This follows since, if x in ~ o(),"), it coincides with a focal arc y on the interval [a, )’’’]’T he arc y has the properties of the theorem. The converse is not true. For if y is a focal arc described in Theorem 1, there may exist a focal arc z that is zero on [a, )’’’] and satisfies z(t) = y(t) on [A",J:] for )," < J .:s; b. The focal arc Y1 = Y - z "extends" the interval on which x is a null vector. That is, x is in ~ o(),) for), in [},’,J]. This discussion partially motivates the following definition. Examples of these phenomena are given in Section 6.3. Let a < },’ .:s; )," .:s; b. A focal arc y is called a maximumfocal arc associated with an interval I = [X,}c"] if (i) y is identically zero on I, (ii) y is not iden› tically zero on an extension of I, and (iii) there exists no focal arc Yl having Yl(t) = y(t) on an extension of I to the left and Yl(t) = 0 on an extension of I to the right. The condition that (iv) there exists no focal arc yz having yz(t) = y(t} on an extension of I to the right and Yz(t}= 0 on an extension of I to the left is necessary to our definition. However, it is redundant, for if yz satisfies (iv), the arc Yl = Y - yz satisfies (iii), and the converse is also true. The interval I = [A’,),"] will be called a maximum focal interval if there exists a maximum focal arc associated with I. The order of the maximum focal interval is the number n of linearly independent maximal focal arcs Yl, ... , Yn in a maximum set, every proper linear combination corresponding to I.
Theorem 2 Let x = Xkl in ~ o()’), X .:s; ), .:s; s", be a null vector associated with a focal interval 1= I kl = [X,),"] described in Section 6.1. Then there corresponds a unique maximal focal arc Yassociated with I such that x(t) = y(t) on [a,}."] and x(t) = 0 on [A’,b]. Let XI> ... , X m be a maximum set of m linearly independent arcs asso› ciated with a focal interval 1= I k1 described in Section 6.1. By Theorem 1 there exist focal arcs YI> ... , Ym vanishing on I and no extension of I such that Xi(t) = Yi(t) for t in [a, ),"]; furthermore Xi(t) = 0 on [X, b]. Ifc9’)t) = 0 (j = 1, ... ,m;j summed) on a.:s; t s; b, then r:LjXj(t) = 0 on a.:s; t < b so that
6.2
Focal Arcs of Differential Equations
211
aj = 0 for j = 1, ... , m. Thus if I is a maximal focal interval of order n, we have n ~ m. Conversely, if n > m, there exist m + 1 linearly independent maximal focal arcs Yl, ... ,Ym+l associated with I. Let x;(t) = Yi(t) on [a,A"] and Xi(t) = 0 on [A’,b] for i = 1, ... , m + 1. Let a l , . . . , a m + l be real and chosen such that ajaj # 0 and ajxj(t) = 0 on [a, b] for j = 1, ... ,m + 1; j summed. Then ajy)t) = 0 for t in [a, A"], which contradicts the requirement that every proper linear combination vanish only on I. Corollary 3 If A in [a, b], then s(A) is equal to the number of maximal focal intervals on the open interval (a, A). This result follows immediately from Lemma 9 of Section 6.1. Mikami [37] has given the first part of the following theorem. Theorem 4 If the matrices A and B above are analytic in [a, b] and if xo(t) is a focal arc such that xo(t) = 0 on a subinterval [a’, b’] of [a, b] with a ’ < b’, then xo(t) == 0, uo(t) == 0 on [a, b]. Thus under this hypothesis, the focal intervals or maximal focal intervals referred to in Theorems 1, 2, and 3 are focal points. Further the disjoint hypothesis holds. We should like to describe in more detail the picture of the situation described in Theorems 1, 2, and 3. It is not obvious that this will clarify this difficult concept, but we will try. The examples in Section 6.3 should further clarify these concepts. The reader may prefer to begin with Section 6.3 and to skip the next paragraph. Let us assume a s; ..1.1 < A2 ::::; b. Then <;q(Al) C <;q(A2) c <;q c f!.8 imply f!.8J c <;qJ C <;q(A 2)J C <;q(Al)J. The hypothesis of Theorem 1 implies that if A" = b, then x is a focal arc on [a, A"]; otherwise, assume A" < A: ::::; b. Since x is in <;q0(..1.") but not in <;q0(A:) = <;q(A:) n <;q(A:V, we have x is not in <;q(A:)J. That is, there has appeared a vector w(t) in <;q(A:) but not in <;q(A") such that J(x, w) # O. We note that w(A:) # 0 for A: - A sufficiently small and positive. Similarly, if a ::::; A < A’, then x is in <;q 0(A’) and not in <;q o(A). Since x is in <;q(A’)J c <;q(A)J, X is not in <;q(A). That is, x(t) = 0 for A’ - A sufficiently small and positive. Furthermore, if y(t) = 0 on an interval [A’,A:], then Y is in <;q(A:)J by evaluating the integral for J(y, u) over [a,A"] and then over [A",A:]. The remaining comments about "converse" and "maximum focal arcs" may now be clearer, and we leave the picture to the reader. We remark once again that Corollary 3 shows that the maximal focal interval is the generalization of focal points. We note that the most complete current work on quadratic control problems is probably found in Mikami [37], which has been summarized in [38]. The former includes many concepts we have discussed in earlier sections, such as ellipticity conditions, comparison theorems, and eigenvalue
212
6 The Quadratic Control Problem
problems applied to control theory problems. This reference, which we recommend to the reader, also includes concepts of conjugate base and the order of abnormality. To complete this section we have added some ideas of this reference. Theorem 6 below is of special interest, since it (or an apparently less general form) is the starting point for our next stage of approximation results, as in Section 6.4. This result is given in Mikami [37J, and we quote rather freely since we cannot improve the presentation. Some of this material has been stated above, but we restate it for convenience and completeness. Our presentation is "backward" in that Theorem 5 is a corollary of Theorem 6. Our rationale is to present this useful corollary and then summarize the main theorem and its proof for completeness. We note that our notation has been altered slightly from the earlier part of this section. Our fundamental quadratic form is (3)
J(x)
= b*Fb +
l
t1
to
2w(t, x, u) dt,
where we define 2w(t, x, u)
= x* P(t)x + x*Q(t)u + u*Q*(t)x + u*R(t)u,
relative to a one-parameter family 0’(..1.) (to ~ A ~ t 1) of subspaces of arcs satisfying the linear control equation Ax + Bu
.x =
(4)
a linear constraint equation
(5)
Mx
+ Nu = 0
and the boundary conditions x(tO) = Cob,
(6) (7)
x(t)
= 0,
u(t)=O
(},~t~t1).
The matrices A, P = P*, and Q are square integrable. B, M, N, and R = R* are essentially bounded and measurable, and Co, D and, F = F* are constant matrices. The matrix N in addition is assumed to have the inverse of NN* existing and essentially bounded. Stated in another way, there exists a positive number h such that at almost all t on to ~ t ~ t 1, (8a)
n*N*(t)N(t)n ~ hn*n
for every n in IRq. Furthermore, we assume that there exist positive numbers h o, h 1 such that at almost all points on to ~ t ~ r’, (8b)
for all n in IRq.
n*R(t)n
+ h 1n*N*(t)N(t)n ~
hon*n
6.2
Focal Arcs of Differential Equations
213
This apparently more general problem is really equivalent to that given earlier in this section because of the assumptions on the matrix N. Of course the current formulation, if required, is easier to apply. Mikami’s fundamental Hilbert space will be denoted by the symbol Ye. This Hilbert space will be the underlying Hilbert space in the rest of this section. An element x of Ye is the system of vectors x:xi(t),b C1, tI’(t)
(to ~ t ~ t 1 )
(i = 1, ... ,n;
(J = 1, ... ,r; k = 1, ... ,q), where x’(r) and uk{t) are Lebesgue, real-valued, square-integrable functions and b’ is a real scalar parameter. In optimal control terminology, x(t) is called the state variable, u(t) the control variable, and b the control parameter. The inner product of two vectors x and y in Ye is given by
tl
(x, y) = c*b
+ Jti
(y*x + v*u) dt,
O
where x:x(t),b,u(t)
and
y:y(t), c, v(t).
Furthermore, Ye is complete under this inner product. However, we must restrict our attention to a subspace f!l =
{x in Yelx(t) is in f!l(n)}
on which J(x) in (3) is defined. In this definition ?I(n) is the space of all absolutely continuous vectors x(t) such that Xi is Lebesgue square integrable. For Theorem 5, let !Jl denote the subspace of Ye such that (4), (5), and (6) hold on [to, t 1 ] , and let re denote the subspace of !Jl on which x(t 1 ) = 0 and u(t 1 ) = O. The one-parameter family re(A) (to ~ A. ~ t 1 ) of re have been given above. Theorem 5 Let x be in Ye with x(t) absolutely continuous on to ~ t ~ A. Then x is J orthogonal to re(A) if and only if there exist an absolutely continuous vector p(t) (to s t s A.) and a square-integrable vector J.1(t) (to s t s A) such that
(10)
P + A*p + M*J.1 = W B*p + N*J.1 = co;
(11)
Fb - Co*p(tO) = O.
(9)
X’
(to ~ t ~ A.),
Equations (9) and (10) together with (4) and (5) will be called the Euler› Lagrange equations, and Eq. (11)is the transversality condition. An element x of Ye that satisfies the Euler-Lagrange equations on the subinterval to ~ t ~ A. and the transversality condition is called a transversal extremal arc on to ~ t ~ A..
214
6 The Quadratic Control Problem
To obtain our main theoretical results, of which Theorem 5 is a corollary, we must modify the above definition of the subspace re. Thus let C 1 be an n x r scalar matrix and (12)
This result is found (in part) in [38, pp. 43-56]. We shall attempt to sum› marize the ideas of the proof. Theorem 6 If x is in f!£, then x is in ’tJ J if and only p in f!£(n) and a square-integrable vector J.l such that
if there exist vectors
p+A*p+M*Jl=w x, B*p + N*u
=W
y ,
and
Mikami points out that J(x) is defined on f!£ and that for fixed x in f!£, J(y, x) is a linear form in y in f!£ that can be extended to :Ye, while preserving continuity, because f!£ is-dense in:Ye. The operator L(y) = y(t 1 ) - C1c is a bounded linear operator on f!£. By a multiplier rule proven by Mikami, if x is in re J , there exists p in f!£(n), Ain jRn, J.l a square integral vector, and y in jR" such that for all y in :Ye (13a)
i:’ [(p*+p*A)y+p*Bv]dt
J(y, x) =
- p*(t 1 )y(t1 ) +
+ p*(tO)y(tO) + A*[y(tO) -
i:’(Jl*My + Jl*Nv)dt + y*[y(t
1
) -
Coc] C1c].
Since J(y,x) is also equal to (13b)
J(y, x) = b*Fc
+ JtO ft’(wx*Y + wu*v)dt,
Theorem 6 follows by equating like terms. Statement (13a) involves skillful use of Hilbert space theory. We have tried to present enough of a summary so that anyone who is familiar with these types of arguments can get the gist of what is happening. For further details the reader is referred to Mikami [37]. As above, the equations
x=
Ax + Bu,
P + A*p + M*Jl = wx’
Mx + Nu = 0, and
B*p + N*Jl =
Wy
6.3
Two Examples
215
on [to, t 1 ] are referred to as the Euler-Lagrange equations. Because of (12), condition (14)
is the (new) transoersality condition. A solution {x, u, p, /i} of the Euler› Lagrange equations is called an extremal. A vector x:x(t), b, u(t) in fE is called an extremal arc if there exist vectors p E 2((n) and u square integrable such that {x, u, p, /i} is an extremal. A solution {x, b, u, p, /i} of the Euler› Lagrange equations and the transversality condition is called a transversal extremal. The vector x:x(t), b, u(t) ofa transversal extremal is called a trans› versal extremal arc.
Corollary 7  A vector x is in 𝒞₀ = 𝒞 ∩ 𝒞_J if and only if x is a transversal extremal arc that satisfies the end conditions (15).

Mikami also notes we may introduce the "usual" Hamiltonian concepts and obtain a more symmetric form of the Euler-Lagrange equations. Thus let

(16)  H(t, x, u, p, μ) = ½p*(Ax + Bu) + ½(Ax + Bu)*p + ½μ*(Mx + Nu) + ½(Mx + Nu)*μ − w(t, x, u).

Theorem 8  For a given x in 𝒳, x ∈ 𝒞_J if and only if there exist vectors p ∈ 𝒳(n) and μ square integrable such that

ẋ = H_p,   ṗ = −H_x,   H_u = 0,   H_μ = 0.

These equations are also referred to as the Euler-Lagrange equations.
6.3 Two Examples

The purpose of this section is to give two examples of control theory phenomena. Our first example is somewhat elementary and reminds us of the second-order examples in Sections 2.0 and 2.1. The second example is more complicated and is illustrative of Theorems 1, 2, and 3 of Section 6.2.
For Example 1 let

(1)  J(x) = ∫ from 0 to π of (u² − x²) dt

be defined relative to a one-parameter family of subspaces 𝒞(λ) (0 ≤ λ ≤ π) of arcs (x(t), u(t)), where x(t) is absolutely continuous and x(t) and u(t) are square integrable on 0 ≤ t ≤ π, such that

(2)  ẋ = Bu   (0 ≤ t ≤ λ)

with boundary conditions

(3a)  x(0) = 0

and

(3b)  x(t) = 0,  u(t) = 0   (λ ≤ t ≤ π).

In (2) we define B(t) = 0 if π/2 ≤ t < 3π/4 and B(t) = 4 otherwise on [0, π]. In Section 6.2 we have shown that the vector x(t) is in 𝒞₀(λ), for 0 < λ < π, if and only if there exists an absolutely continuous vector p(t) (0 ≤ t ≤ λ) such that Eqs. (2), (3), and

(4a)  ṗ = −x   (0 ≤ t ≤ λ),
(4b)  Bp = u   (0 ≤ t ≤ λ)

hold. Thus ẋ = Bu = B²p and ẍ + B²x = 0 almost everywhere. The following solutions to (2), (3), and (4) are obtained:

(5a)  x(t) = { sin 4t, 0 ≤ t ≤ π/2;  0, π/2 < t ≤ 3π/4;  sin(4t − 3π), 3π/4 < t ≤ π },

(5b)  u(t) = { cos 4t, 0 ≤ t ≤ π/2;  0, π/2 < t ≤ 3π/4;  cos(4t − 3π), 3π/4 < t ≤ π },

(5c)  p(t) = { ¼ cos 4t, 0 ≤ t ≤ π/2;  ¼, π/2 < t ≤ 3π/4;  ¼ cos(4t − 3π), 3π/4 < t ≤ π }.
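The piecewise solution (5) is easy to check by machine. The following sketch is our own illustration (in Python, with arbitrary tolerances), not something from the text: it samples [0, π], stays away from the corners t = π/2 and t = 3π/4, and confirms (2), (4a), and (4b) with centered differences.

```python
import numpy as np

# Our own test harness for Example 1: B = 0 on [pi/2, 3pi/4), B = 4 otherwise.
def B(t): return 0.0 if np.pi/2 <= t < 3*np.pi/4 else 4.0

def x(t):
    if t <= np.pi/2:    return np.sin(4*t)
    if t <= 3*np.pi/4:  return 0.0
    return np.sin(4*t - 3*np.pi)

def u(t):
    if t <= np.pi/2:    return np.cos(4*t)
    if t <= 3*np.pi/4:  return 0.0
    return np.cos(4*t - 3*np.pi)

def p(t):
    if t <= np.pi/2:    return 0.25*np.cos(4*t)
    if t <= 3*np.pi/4:  return 0.25
    return 0.25*np.cos(4*t - 3*np.pi)

h = 1e-6
for t in np.linspace(0.01, np.pi - 0.01, 400):
    if min(abs(t - np.pi/2), abs(t - 3*np.pi/4)) < 0.01:
        continue                               # skip the corners of the piecewise data
    dx = (x(t + h) - x(t - h)) / (2*h)         # centered difference for x'
    dp = (p(t + h) - p(t - h)) / (2*h)         # centered difference for p'
    assert abs(dx - B(t)*u(t)) < 1e-4          # (2):  x' = Bu
    assert abs(dp + x(t)) < 1e-4               # (4a): p' = -x
    assert abs(B(t)*p(t) - u(t)) < 1e-9        # (4b): Bp = u
print("Equations (2) and (4) hold away from the corners.")
```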
The following results agree with the theory in Section 6.2. Thus

n(λ) = 1 if π/2 ≤ λ ≤ 3π/4 or if λ = π/4 or λ = π;  n(λ) = 0 otherwise,
and

s(λ) = 0 if 0 ≤ λ ≤ π/2;  s(λ) = 1 if π/2 < λ ≤ 3π/4;  s(λ) = 2 if 3π/4 < λ ≤ π.

We note that λ = π/2 and possibly λ = π are the more classical focal (conjugate, oscillation) points, while I = [π/2, 3π/4] is a focal interval. Thus s(λ) = 0 on [0, π/2]; s(π/2 + 0) = 1 since n(π/2) = 1; s(λ) = 1 on (π/2, 3π/4] since we have encountered no classical focal points nor has any focal interval ended before t = 3π/4; s(3π/4 + 0) = 2 since a focal interval ended at t = 3π/4; and s(λ) = 2 for 3π/4 < λ ≤ π by similar reasoning. If we extended our problem from [0, π] to [0, π + ε) for some ε > 0, s(π + 0) would equal either two or three: it has the value two if t = π is the left-hand end of a nondegenerate focal interval and the value three if t = π is a classical focal point.

From Section 3.1, we note that if B(t) is chosen to be sufficiently smooth, we might gain a negative vector. For example, let B₁(t) = 4 on [0, π], so that instead of (2) we have ẋ = 4u and, instead of (1),

J₁(x) = ∫ from 0 to π of [(1/16)x′² − x²] dt,

with associated differential equation (1/16)x″ + x = 0 and solution x(t) = sin 4t. The associated problem has n(λ) = 1 at λ = π/4, λ = π/2, λ = 3π/4, and λ = π, and n(λ) = 0 otherwise, with corresponding signature s(λ) given by

s(λ) = 0 if 0 ≤ λ ≤ π/4;  s(λ) = 1 if π/4 < λ ≤ π/2;  s(λ) = 2 if π/2 < λ ≤ 3π/4;  s(λ) = 3 if 3π/4 < λ ≤ π.
These examples should lend some credibility to our contention that the methods in this book yield a more general definition of oscillation for differential equations.

Our second example is rather complex. It is given in Mikami [37, pp. 79-83] and modified only for editorial comments or further explanation. Let 4e = π and let the matrix B (and hence BB*) be defined as follows:
B(t) = \begin{pmatrix} \sqrt{2} & 0 & 0 & 0 \\ \sqrt{2} & 0 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},   BB* = \begin{pmatrix} 2 & 2 & 0 & 0 \\ 2 & 2 & 0 & 0 \\ 0 & 0 & 16 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}   on [0, 2e);

B(t) = BB* = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}   on [2e, 3e);

and

B(t) = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},   BB* = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}   on [3e, 4e].
Let ℬ be the space of all arcs x: xᵢ(t), u_k(t) (0 ≤ t ≤ 4e; i, k = 1, …, 4) such that x(t) is absolutely continuous, x(t) and u(t) are Lebesgue square-integrable, ẋ = Bu, and x(0) = 0. Let

J(x) = ∫ from 0 to 4e of (u*u − x*x) dt   (x ∈ ℬ)

be a quadratic form on ℬ. Note that J(x) satisfies the strengthened Legendre condition and hence is elliptic on ℬ. Let

𝒞 = {x ∈ ℬ | x(4e) = 0}.
An arc x with x(t) absolutely continuous and with x(t) and u(t) square integrable is J orthogonal to 𝒞 if and only if there exists an absolutely continuous vector p: pᵢ(t) (0 ≤ t ≤ 4e; i = 1, …, 4) such that the Euler-Lagrange equations

(6)  ẋ = Bu,   ṗ = −x,   B*p = u

are satisfied on 0 ≤ t ≤ 4e. These equations reduce to

(7)  ẋ = BB*p,   ṗ = −x.

Our interest is in the vectors x that satisfy Eqs. (7) and the end condition x(0) = 0. There exist eight linearly independent solutions {x, p} of (7), which in fact are in 𝒜 ∩ 𝒞_J, and four linearly independent solutions {x, p} of (7) satisfying x(0) = 0, which are in ℬ ∩ 𝒞_J. The graphs of these solutions are illustrated in Fig. 1; Fig. 1a refers to our first solution satisfying (7) and x(0) = 0.
Fig. 1.  (a1) x¹ = 2 sin 2t, 0, −2 sin(2t − 6e); p¹ = cos 2t, −1, −cos(2t − 6e). (a2) x² = 2 sin 2t, 0, 0; p² = cos 2t, −1, −1. (b) x¹ = 0, 0, −2 sin(2t − 6e); p¹ = −1, −1, −cos(2t − 6e). (c) x³ = 2 sin 4t, 0, sin(2t − 6e); p³ = ½ cos 4t, ½, ½ cos(2t − 6e). (d) x⁴ = 2 sin t; p⁴ = 2 cos t.
By the notation in component form x_a(t) = (x_a¹(t), x_a²(t), x_a³(t), x_a⁴(t))ᵀ with x_a¹ = 2 sin 2t, 0, −2 sin(2t − 6e) we mean

x_a¹(t) = { 2 sin 2t on 0 ≤ t ≤ 2e;  0 on 2e < t ≤ 3e;  −2 sin(2t − 6e) on 3e < t ≤ 4e },

with similar notation for x_a²(t), x_a³(t), x_a⁴(t), and for p_a(t) = (p_a¹(t), p_a²(t), p_a³(t), p_a⁴(t))ᵀ. The component x_a³(t) ≡ 0 for t in [0, 4e] and hence is ignored in our picture. Similarly, our fourth solution of (7) with x(0) = 0 is pictured in Fig. 1d and denoted by x_d; the first three components of x_d(t) and p_d(t) are identically zero and hence ignored.
Note that from (7) we have ẍ + BB*x = 0. The complete nonzero listing of the four solutions in this notation is as follows.

Solution (a)
x_a¹(t) = 2 sin 2t, 0, −2 sin(2t − 6e),   p_a¹(t) = cos 2t, −1, −cos(2t − 6e),
x_a²(t) = 2 sin 2t, 0, 0,                 p_a²(t) = cos 2t, −1, −1;

Solution (b)
x_b¹(t) = 0, 0, −2 sin(2t − 6e),          p_b¹(t) = −1, −1, −cos(2t − 6e),
x_b²(t) = 0, 0, 0,                        p_b²(t) = 1, 1, 1;

Solution (c)
x_c³(t) = 2 sin 4t, 0, sin(2t − 6e),      p_c³(t) = ½ cos 4t, ½, ½ cos(2t − 6e);

Solution (d)
x_d⁴(t) = 2 sin t, 2 sin t, 2 sin t,      p_d⁴(t) = 2 cos t, 2 cos t, 2 cos t.
To aid the reader, we shall verify x_a(t) on [0, 2e) in Solution (a). Thus if x_a(t) = (x_a¹(t), x_a²(t), x_a³(t), x_a⁴(t))ᵀ = (2 sin 2t, 2 sin 2t, 0, 0)ᵀ, then

ẍ_a + BB*x_a = \begin{pmatrix} -8\sin 2t \\ -8\sin 2t \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 2 & 2 & 0 & 0 \\ 2 & 2 & 0 & 0 \\ 0 & 0 & 16 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 2\sin 2t \\ 2\sin 2t \\ 0 \\ 0 \end{pmatrix} = 0

verifies that ẍ_a + BB*x_a = 0 holds on the interval [0, 2e).
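The same verification can be delegated to a computer algebra system. The sketch below is our own (with 4e = π hard-coded and the BB* block reconstructed above); it checks ẍ_a + BB*x_a = 0 symbolically on [0, 2e).

```python
import sympy as sp

t = sp.symbols('t')                        # 4e = pi, so the first interval is [0, pi/2)

x_a = sp.Matrix([2*sp.sin(2*t), 2*sp.sin(2*t), 0, 0])   # solution (a) on [0, 2e)
BBstar = sp.Matrix([[2, 2, 0, 0],
                    [2, 2, 0, 0],
                    [0, 0, 16, 0],
                    [0, 0, 0, 1]])                       # BB* on [0, 2e)

residual = sp.simplify(sp.diff(x_a, t, 2) + BBstar * x_a)
print(residual.T)                          # the zero vector: x_a'' + BB* x_a = 0
```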
Mikami has used methods (slightly) different from those we have used to define maximum focal intervals and count s(λ) as in our Theorems 2 and 3 of Section 6.2. We now return to Fig. 1 and briefly explain our rationale for s(λ). By inspection the first jump of s(λ) cannot occur before λ = e. In fact s(e + 0) = 1 due to the solution x_c(t); that is, λ = e is a focal (or conjugate) point or a degenerate focal interval. Note x_b(t) = 0 in an open neighborhood of λ = e, but this zero solution can be extended, so that x_b(t) does not "contribute" to s(λ). The point λ = 2e is most confusing (at first); in fact s(λ) = 1 in an open interval about λ = 2e. Thus x_a does not "contribute" to s(2e + 0), since x_a is not a solution to our equations; that is, for example, x_a¹(t) is not continuous at t = 2e. Similarly, x_b(t) ≡ 0 in an open neighborhood of t = 2e, and x_a²(t) is not continuous at t = 2e. There is a jump in s(λ) at λ = 3e due to x_c(t), which ends a focal interval at t = 3e. Note x_a(t) does not contribute to s(3e + 0), since x_a − x_b is a solution to the left of t = 3e and zero to the right of 3e. Similarly x_b(t) = 0 on a left extension. Thus s(3e) = 1 and s(3e + 0) = 2. Finally, λ = 4e = π can only contribute to the nullity and not to s(λ). Hence
s(λ) = 0 if 0 ≤ λ ≤ e;  s(λ) = 1 if e < λ ≤ 3e;  s(λ) = 2 if 3e < λ ≤ 4e.

6.4 An Approximation Theory of Focal Intervals
In Section 6.1 we used the approximation theory of Chapter 2 to derive a focal-interval theory of quadratic forms. More precisely, we gave a theory of focal points and of focal intervals for an elliptic form J(x) on a Hilbert space 𝒜. These results were based upon inequalities dealing with the indices s(σ) and n(σ) of the elliptic form J(x; σ) defined on the closed subspace 𝒜(σ) of 𝒜, where σ belongs to the metric space (Σ, ρ). The purpose of this section is to make a "second pass" through this theory and derive an approximation theory for the focal points and focal intervals of Sections 6.1 and 6.2. Our results are based upon inequalities dealing with the indices s(μ) and n(μ), where μ belongs to the metric space (M, d), M = [a, b] × Σ. For the usual focal-point problems we show that λ_n(σ), the nth focal point, is a ρ-continuous function of σ. For the focal-interval case, we give sufficient hypotheses so that the number of focal intervals is a local minimum at σ₀ in Σ. For completeness, we indicate that example problems for quadratic problems in a control theory setting can be formulated.

We begin with Theorem 1, which is a restatement of Theorem 14 of Section 3.3 in the setting of this section. The proof of this theorem has been given in Theorems 4-7 of Section 3.5. We assume the reader is familiar with this material, much of which will not be repeated here. For σ in the metric space (Σ, ρ), let J(x; σ) be a quadratic form defined on the closed subspace 𝒜(σ) of 𝒜 and assume that the approximating hypotheses (1) and (2) of Section 2.3 hold. Let M = Λ × Σ be the metric space with metric d defined by d(μ₁, μ₂) = |λ₂ − λ₁| + ρ(σ₂, σ₁), where μ₁ = (λ₁, σ₁) and μ₂ = (λ₂, σ₂). For each μ = (λ, σ) in M, define J(x; μ) = J(x; σ)
on the space ℬ(μ) = 𝒞(λ) ∩ 𝒜(σ). Let s(μ) = s(λ, σ) and n(μ) = n(λ, σ) denote the signature and nullity of J(x; μ) on ℬ(μ).

Theorem 1  For any μ₀ = (λ₀, σ₀) in M, there exists δ > 0 such that if μ = (λ, σ) and d(μ₀, μ) < δ, then

(1)  s(λ₀, σ₀) ≤ s(λ, σ) ≤ s(λ, σ) + n(λ, σ) ≤ s(λ₀, σ₀) + n(λ₀, σ₀).

Furthermore, if n(λ₀, σ₀) = 0, then s(λ, σ) = s(λ₀, σ₀) and n(λ, σ) = 0.
We now begin our approximating theory of focal points and focal intervals. Thus let σ₀ in Σ be given. A point λ₀ at which s(λ, σ₀) is discontinuous will be called a focal point of J(x; σ₀) relative to {𝒞(λ) | λ in Λ}. The difference s(λ₀ + 0, σ₀) − s(λ₀, σ₀) will be called the order of λ₀ as a focal point of σ₀. A focal point λ₀ is counted the number of times equal to its order. In the above, s(λ₀ + 0, σ₀) is the right-hand limit of s(λ, σ₀) as λ → λ₀ from above. The quantity s(λ₀ − 0, σ₀) is similarly defined. It has been shown in Theorems 14 and 16 of Section 2.3 that s(λ − 0, σ₀) = s(λ, σ₀), while the disjoint hypotheses of Theorem 2 imply s(λ + 0, σ₀) = s(λ, σ₀) + n(λ, σ₀). Thus:

Theorem 2  Let σ₀ in Σ be given such that λ′, λ″ in Λ, a ≤ λ′ < λ″ ≤ b imply the J(x; σ₀) null vectors on ℬ(λ′, σ₀) and ℬ(λ″, σ₀) are disjoint. Assume λ′ and λ″ are not focal points of σ₀ (a ≤ λ′ < λ″ < b) and there exist k focal points of σ₀ on (λ′, λ″). Then there exists ε > 0 such that ρ(σ, σ₀) < ε implies there are exactly k focal points of σ on (λ′, λ″). In fact, if λ_n(σ₀) ≤ λ_{n+1}(σ₀) ≤ ⋯ ≤ λ_{n+k−1}(σ₀) (n = 1, 2, 3, …) are the k focal points of σ₀ on (λ′, λ″), then λ_n(σ) ≤ λ_{n+1}(σ) ≤ ⋯ ≤ λ_{n+k−1}(σ) are the k focal points of σ on (λ′, λ″).
Assume s(λ′, σ₀) = n − 1. Then by the above remark, s(λ″, σ₀) = n + k − 1 and n(λ′, σ₀) = n(λ″, σ₀) = 0. Hence there exists δ > 0 such that if ρ(σ, σ₀) < δ, then n(λ′, σ) = n(λ″, σ) = 0, s(λ′, σ) = n − 1, and s(λ″, σ) = n + k − 1. The result follows by definition.

Corollary 3  Under the above hypotheses there exists ε > 0 such that ρ(σ, σ₀) < ε and a ≤ λ ≤ a + ε imply there exists no focal point λ of σ.

Corollary 4  Under the above hypotheses the nth focal point λ_n(σ) is a continuous function of σ (n = 1, 2, 3, …).

If we assume that the disjoint hypotheses of Theorem 2 do not hold, we obtain a focal-interval theory. In this case we have shown in Theorem 9 of Section 6.1 that if x₀ is a J(x; σ₀) null vector of ℬ(λ₀, σ₀), then λ₀ belongs to a proper closed subinterval Λ₁ of Λ, where Λ₁ = {λ in Λ | x₀ is a J(x; σ₀)
null vector of ℬ(λ, σ₀)}. We have also shown in Section 6.1 that focal intervals can be well defined. Furthermore, the relationship between focal intervals and the indices s(λ, σ) and n(λ, σ) has been given in that section. We now consider inequalities involving f(λ′, λ″; σ), the number of focal intervals (with respect to σ) on the interval (λ′, λ″) of Λ. We shall denote the dimension of the J(x; σ) null vectors common to the spaces ℬ(λ′, σ) and ℬ(λ″, σ) by m(λ′, λ″; σ). We begin by restating Corollary 12 of Section 6.1 in this setting.

Corollary 5  Let σ₀ be in Σ. If λ′, λ″ in Λ (a ≤ λ′ < λ″ < b), then

(3)  f(λ′, λ″; σ₀) = s(λ″, σ₀) − [s(λ′, σ₀) + n(λ′, σ₀)] + m(λ′, λ″; σ₀).
Theorem 6  Let λ′, λ″ in Λ (a ≤ λ′ < λ″ < b), η > 0, and assume σ in Σ, ρ(σ₀, σ) < η implies m(λ′, λ″; σ₀) ≤ m(λ′, λ″; σ). Then there exists δ > 0 such that f(λ′, λ″; σ₀) ≤ f(λ′, λ″; σ) whenever ρ(σ₀, σ) < δ.
From equality (3) we have

f(λ′, λ″; σ₀) = s(λ″, σ₀) − [s(λ′, σ₀) + n(λ′, σ₀)] + m(λ′, λ″; σ₀)
             ≤ s(λ″, σ) − [s(λ′, σ) + n(λ′, σ)] + m(λ′, λ″; σ) = f(λ′, λ″; σ).

Corollary 7  If n(λ″, σ₀) = 0, then there exists δ > 0 such that f(λ′, λ″; σ₀) ≤ f(λ′, λ″; σ) whenever ρ(σ₀, σ) < δ.

In this case n(λ″, σ) = 0, so that m(λ′, λ″; σ₀) = 0 = m(λ′, λ″; σ). We conclude this section by noting that example problems can be formulated by returning to Section 6.2 and defining F_σ, P_σ(t), Q_σ(t), R_σ(t), A_σ(t), B_σ(t), M_σ(t), N_σ(t), and C₀σ similar to the quantities in (3)-(5) of that section and continuous in σ. We leave such example problems to the reader.
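Although this section is abstract, the bookkeeping in Corollary 5 is elementary arithmetic. The helper below is purely our own illustration (the names are hypothetical); it simply evaluates equality (3) for indices supplied by the caller.

```python
def focal_interval_count(s_lpp, s_lp, n_lp, m):
    """Equality (3): f(l', l''; sigma) = s(l'') - [s(l') + n(l')] + m(l', l''),
    where m is the dimension of the common null vectors of B(l') and B(l'')."""
    return s_lpp - (s_lp + n_lp) + m

# Illustrative numbers only: s(l'') = 2, s(l') = n(l') = 0, m = 0 gives f = 2.
print(focal_interval_count(2, 0, 0, 0))
```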
Postscript
The Numerical Problem Revisited
In the time since the manuscript for this book was prepared, the author and a young SIU Computer Science graduate student named Charlie Gibson have extended the numerical algorithm given in Section 3.2 in many interesting directions. The purpose of this section is briefly to discuss these extensions. Much of this work is incomplete in a mathematical sense and should be considered as a preliminary report. Note that our outlook and philosophy have "changed" somewhat, although our basic tools have not. We shall now approach our problems as initial-value problems. This will allow us to obtain better algorithms than the finite element method for boundary-value problems as in Chapter 3 and to extend our methods to solve nonlinear equations of the form L(x) − N(x) = 0. The operator L(x) is as in Chapter 3, while N(x) is a nonlinear operator.

Topics will be discussed as follows: (1) The basic problem in Section 3.2 given by (1) and (2) and the approximating quadratic form (6) is extended to include the "x(t)x′(t)" term; this is an initial-value problem. (2) We show how to solve boundary-value problems (cheaply) using initial-value techniques. After reading Chapter 3, the reader may appreciate that our eigenvalue procedures are perhaps the most elegant examples of shooting methods. This method works equally well for the equation L(x) = f(t) where a particular solution can be guessed. (3) The basic algorithm with one dependent variable x(t) is extended to systems where x(t) = (x₁(t), x₂(t), …, x_m(t))ᵀ. We also illustrate that higher-order self-adjoint equations as in Chapter 4 (with Kᵢⱼ ≡ 0) can be handled with this technique. The perceptive reader will note that eigenvalue results should follow (immediately). (4) Finally we consider the nonlinear problem L(x) − N(x) = 0. The idea is to replace
N(x) by xf(x) and obtain a "linearization" of the nonlinear problem. If f(x) depended only on the independent variable t, we would have a linear problem. However, our technique allows us to approximate f(x) at the node a_{k+1} from the computed values of x at a_k and a_{k-1}.
1 The x(t)x’(t) Term
We shall change some notations and definitions from (1), (2), and (5) in Section 3.2. Thus let

(1)  L(x) = [r(t)x′(t) + q(t)x(t)]′ − [q(t)x′(t) + p(t)x(t)] = 0,

where r(t) > 0 on [a, b], and let

(2a)  J(x, y) = ∫ from a to b of [r(t)x′(t)y′(t) + q(t)x′(t)y(t) + q(t)x(t)y′(t) + p(t)x(t)y(t)] dt,

(2b)  J(x) = J(x, x).

For technical reasons, we define our spline basis elements slightly differently than in (5) of Section 3.2. Thus let

(3)  z_k(t) = √σ − |t − a_k|/√σ  if t is in [a_{k-1}, a_{k+1}];  z_k(t) = 0 otherwise.
Repeating the arguments after Theorem 1 in Section 3.2, we have, with a*_k = ½(a_k + a_{k+1}), r*_k = r(a*_k), etc.,

(4a)  d_{k,k} = r*_{k-1} + r*_k + σ(q*_{k-1} − q*_k) + (σ²/3)(p*_{k-1} + p*_k)

and

(4b)  d_{k,k+1} = −r*_k + (σ²/6)p*_k.
This result "agrees" with (7) of Section 3.2 after our notational changes. In fact there is an alternative approximation that gives (slightly) better numerical results when q(t) is differentiable. Note that

q(t)x²(t) evaluated from a to b = ∫ from a to b of [q(t)x²(t)]′ dt = ∫ from a to b of [q′(t)x²(t) + 2q(t)x′(t)x(t)] dt,

so that J(x) in (2b) becomes

(5)  J(x) = ∫ from a to b of {r(t)x′²(t) + [p(t) − q′(t)]x²(t)} dt.
Note, in fact, that the Euler-Lagrange equation for (5) is

(d/dt)[r(t)x′(t)] = [p(t) − q′(t)]x(t),

which is (1) if q′(t) exists. Redoing the calculations leading to (4) with q(t) replaced by zero and p(t) replaced by p(t) − q′(t), and using the approximation

σ(q*′_{k-1} + q*′_k) ≈ (q_k − q_{k-1}) + (q_{k+1} − q_k) = q_{k+1} − q_{k-1},

where q′_k = q′(a_k) and q_k = q(a_k), we have from (4)

(6a)  d_{k,k} = r*_{k-1} + r*_k − (σ/3)(q_{k+1} − q_{k-1}) + (σ²/3)(p*_{k-1} + p*_k)

and

(6b)  d_{k,k+1} = −r*_k − (σ/6)(q_{k+1} − q_k) + (σ²/6)p*_k.

Both (4) and (6) are consistent with our approximation ideas and hence are correct. As in (8c) of Section 3.2, we obtain the three-term recursive formula

(7)  d_{k,k-1}c_{k-1} + d_{k,k}c_k + d_{k,k+1}c_{k+1} = 0,

which yields c_{k+1} in terms of c_k and c_{k-1}. We remind the reader that our solution arc x_σ(t) now satisfies x_σ(a_k) = √σ c_k. Theorem 2 of Section 3.2 still holds. In fact, with smooth coefficient functions r(t), q(t), and p(t), test runs yield the quadratic convergence we observed in the test runs described in Chapter 3. As a rule of thumb, the error is less than kσ², where k is usually observed to be less than 0.5 and is often smaller.
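A modern rendering of the march defined by (6) and (7) might look as follows. This is our own sketch in Python, not code from the text; the function names and the test problem x″ + x = 0 are arbitrary illustrations.

```python
import numpy as np

def march(r, q, p, a, b, n, c0, c1):
    """Generate c_2, ..., c_n from d_{k,k-1}c_{k-1} + d_{k,k}c_k + d_{k,k+1}c_{k+1} = 0
    using the coefficients (6a)-(6b); r, q, p are the coefficient functions in (1)."""
    s = (b - a) / n                              # mesh size sigma
    t = a + s * np.arange(n + 1)                 # nodes a_k
    rs = r(t[:-1] + s/2)                         # r*_k at the midpoints a*_k
    ps = p(t[:-1] + s/2)                         # p*_k
    qn = q(t)                                    # q(a_k), used by (6a)-(6b)
    c = np.zeros(n + 1)
    c[0], c[1] = c0, c1
    for k in range(1, n):
        dkm = -rs[k-1] - (s/6)*(qn[k] - qn[k-1]) + (s*s/6)*ps[k-1]   # d_{k,k-1}
        dkk = (rs[k-1] + rs[k] - (s/3)*(qn[k+1] - qn[k-1])
               + (s*s/3)*(ps[k-1] + ps[k]))                          # d_{k,k}, (6a)
        dkp = -rs[k] - (s/6)*(qn[k+1] - qn[k]) + (s*s/6)*ps[k]       # d_{k,k+1}, (6b)
        c[k+1] = -(dkm*c[k-1] + dkk*c[k]) / dkp
    return t, np.sqrt(s) * c                     # x_sigma(a_k) = sqrt(sigma) c_k

# Test: x'' + x = 0 on [0, pi], i.e. r = 1, q = 0, p = -1, exact solution sin t.
n = 400
s = np.pi / n
t, xs = march(lambda t: np.ones_like(t), lambda t: np.zeros_like(t),
              lambda t: -np.ones_like(t), 0.0, np.pi, n, 0.0, np.sin(s)/np.sqrt(s))
print(np.max(np.abs(xs - np.sin(t))))            # we observe the O(sigma^2) behavior
```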
2 Cheap Boundary-Value Methods

From Section 1 we have an initial-value problem, since we require c₀ and c₁ to use (7). However, the coefficients d_{k,k-1}, d_{k,k}, and d_{k,k+1} are independent of the sequence {c_k} and need be computed only once. Since the time required to perform additions and multiplications by modern computers is on the order of a million per second, our numerical procedure allows initial-value problems to "become" boundary-value problems. We note that our procedures for computing eigenvalues in Chapter 3 are examples of parametrized boundary-value problems. The coefficients d_{k,k-1}, d_{k,k}, and d_{k,k+1} in (15) of Section 3.4 require that the values e_{k,k-1}, e_{k,k}, e_{k,k+1}, f_{k,k-1}, f_{k,k}, and f_{k,k+1} be computed only once, while the computation d_{k,k} = e_{k,k} − λf_{k,k} requires only one multiplication and one addition.
We also note that initial-value and boundary-value problems of the form L(x) = f(t) can easily be handled if we can guess a particular solution. As an elementary example, suppose we consider the problem

(1a)  x″ + 25x = t,   x(0) = α,   x(1) = β.

Clearly x_p(t) = t/25 is a particular solution with x_p(0) = 0 and x_p(1) = 1/25. The solution to (1a) is x(t) = x₀(t) + x_p(t), where x₀(t) satisfies

(1b)  x″ + 25x = 0,   x(0) = α,   x(1) = β − 1/25.

This homogeneous problem is solved by choosing c₀ = α/√σ and iterating on the value c₁ until this value of c₁ leads to a solution such that x(1) = β − 1/25.
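For (1a) the iteration just described is a bisection on c₁. The sketch below is our own rendering (the end values, mesh, bracket, and iteration count are arbitrary choices): it marches the homogeneous problem (1b) with the recursion of Section 1 and stops when x₀(1) matches β − 1/25.

```python
import numpy as np

alpha, beta = 1.0, 0.5                 # end values in (1a), chosen arbitrarily here
n = 200
s = 1.0 / n                            # mesh on [0, 1]

def x0_at_1(c1):
    """March x'' + 25x = 0 (r = 1, q = 0, p = -25) from c_0 = alpha/sqrt(sigma)
    and a trial c_1; return x0(1) = sqrt(sigma) c_n."""
    dkk = 2 - 50*s*s/3                 # d_{k,k} from (4a)
    dkp = -1 - 25*s*s/6                # d_{k,k+1} = d_{k,k-1} from (4b)
    cm, ck = alpha/np.sqrt(s), c1
    for _ in range(1, n):
        cm, ck = ck, -(dkp*cm + dkk*ck)/dkp
    return np.sqrt(s)*ck

target = beta - 1.0/25.0               # since x_p(t) = t/25 and x = x0 + x_p
lo, hi = -50.0, 50.0                   # bracket for c_1, assumed wide enough
for _ in range(60):                    # x0(1) is affine in c_1, so bisection works
    mid = 0.5*(lo + hi)
    if (x0_at_1(lo) - target)*(x0_at_1(mid) - target) <= 0:
        hi = mid
    else:
        lo = mid
print(0.5*(lo + hi), x0_at_1(0.5*(lo + hi)) - target)   # shooting value, residual
```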
3 Systems

The techniques described above can be extended to quadratic forms of the type

(1)  J(x) = ∫ from a to b of [x′ᵀ(t)R(t)x′(t) + 2x′ᵀ(t)Q(t)x(t) + xᵀ(t)P(t)x(t)] dt,

whose Euler-Lagrange equation is

(2)  (d/dt)[R(t)x′(t) + Q(t)x(t)] = Qᵀ(t)x′(t) + P(t)x(t).

In the above we assume that R(t) > 0, P(t) is symmetric, and x(t) = (x₁(t), x₂(t), …, x_m(t))ᵀ. We consider only the case with m = 2 and Q(t) = 0. This includes the case when Q(t) is symmetric and Q′(t) exists (piecewise), since we may integrate by parts in that case and obtain the "new" Q₁(t) = 0 and P₁(t) = P(t) − Q′(t), as discussed in Section 1. The basic idea is to define y = (y₁, y₂)ᵀ = (e_k z_k, f_k z_k)ᵀ and x = (x₁, x₂)ᵀ = (c_k z_k, d_k z_k)ᵀ, where repeated indices are summed and the {z_k(t)} are the spline hat functions in (3) of Section 1. Proceeding as in Section 1, we have, corresponding to (7),

(3a)  c_{k-1}R^{11}_{k,k-1} + c_k R^{11}_{k,k} + c_{k+1}R^{11}_{k,k+1} + d_{k-1}R^{12}_{k,k-1} + d_k R^{12}_{k,k} + d_{k+1}R^{12}_{k,k+1} = 0

and

(3b)  c_{k-1}R^{21}_{k,k-1} + c_k R^{21}_{k,k} + c_{k+1}R^{21}_{k,k+1} + d_{k-1}R^{22}_{k,k-1} + d_k R^{22}_{k,k} + d_{k+1}R^{22}_{k,k+1} = 0.
The number R^{11}_{k,k-1} is J₁₁(z_k, z_{k-1}; σ), where

J₁₁(x, y; σ) = ∫ from a to b of [y₁′ r₁₁ x₁′ + y₁ p₁₁ x₁] dt.
The numbers R^{ij}_{k,l} in (3a) and (3b) are defined similarly and need only be computed once, as we described in Section 2. As we did in (7) of Section 1, where we used the values of c_{k-1} and c_k to compute c_{k+1}, we use the values c_{k-1}, c_k, d_{k-1}, and d_k to compute c_{k+1} and d_{k+1}. Note that in (7) of Section 1 we have one equation in one unknown, while in (3a) and (3b) we have two equations in two unknowns. Intuitively this yields J(x, y) = 0, except for boundary terms; J(x, y) = e_k L¹_k + f_k L²_k, where repeated indices are summed, L¹_k is the left-hand side of (3a), and L²_k is the left-hand side of (3b). That J(x, y) = 0 for all e_k and f_k yields (3).

We have run a variety of interesting test runs for this example setting. Our accuracy is essentially the same as described in the last paragraph of Section 1. What is surprising is illustrated by our first example: on [a, b] = [0, 4] let

R(t) = \begin{pmatrix} 1 & t \\ t & 2 \end{pmatrix}   and   P(t) = -\begin{pmatrix} 2 & t \\ t & 1 \end{pmatrix}.
The reader may verify that x₀(t) = (sin t, cos t)ᵀ is the solution when c₀ = 0, c₁ = sin σ, d₀ = 1, and d₁ = cos σ. We obtain the same maximum error (as a function of σ) as discussed above. Of special interest is that R(t) > 0 does not hold for t ≥ √2. This is a pleasant surprise, since our theory only holds when R(t) > 0.

As a second example, we find the first conjugate point, relative to t = 0, of L(x) = x⁽⁴⁾ − x = 0. We mean a point t₁ > 0 such that a nontrivial solution x₀(t) exists with x₀(0) = x₀′(0) = x₀(t₁) = x₀′(t₁) = 0. Conventional methods require that we convert L(x) to a system of four first-order equations. However, we shall convert this problem to a second-order system by letting x₁(t) = x(t) and x₂(t) = x″(t). This yields the system x₁″ = x₂, x₂″ = x₁, which is Eq. (2) with Q(t) = 0,

R = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}   and   P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
We note that we have successfully avoided the use of higher-order splines and are approximating higher derivatives by the spline hat functions of Section 1. We set c₀ = x₀(0) = 0 and c₋₁ = c₁ in (3) since x₀′(0) = 0. We can check that d₀ = 0 does not yield a solution to our problem, and hence we may take d₀ = 1.
Any other nonzero value of d₀ will determine a solution y₀(t) that is a constant multiple of the x₀(t) we are to determine; it will not change the value of t₁. We now iterate on d₁ until the boundary conditions are satisfied. In this example, it is evident that d₁ > 1 yields a solution that does not oscillate; thus our upper value of d₁ is 1. When σ = 1/512 we obtain a value of t₁ in the interval 4.72949219 ≤ t₁ ≤ 4.73144531, which is "best possible" after 38 iterations. The number of iterations depends upon the initial difference of the upper and lower bounds for d₁. The length of this interval equals σ, so there is no point in further iteration without decreasing σ. This result compares very favorably with the true solution t₁ = 4.73004, which is a solution of cosh t₁ cos t₁ = 1.
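To convey the flavor of this computation without reproducing the R^{ij} bookkeeping, the sketch below is a simplification of our own: it marches the system x₁″ = x₂, x₂″ = x₁ with plain central differences (instead of the spline recursion) and bisects on d₁ = x₂′(0), with d₀ = x₂(0) = 1, until the solution just fails to oscillate.

```python
import numpy as np

def first_zero(d1, h=np.pi/512, tmax=8.0):
    """March x1'' = x2, x2'' = x1 with x1(0) = x1'(0) = 0, x2(0) = 1 (d0 = 1),
    x2'(0) = d1; return the first t > 1 where x1 changes sign, else None."""
    x1p, x2p = 0.0, 1.0                        # values at t = 0
    x1 = 0.5*h*h + (h**3/6)*d1                 # Taylor start: x1''(0)=1, x1'''(0)=d1
    x2 = 1.0 + h*d1                            # x2''(0) = x1(0) = 0
    k = 1
    while (k + 1)*h <= tmax:
        x1n = 2*x1 - x1p + h*h*x2              # central difference for x1'' = x2
        x2n = 2*x2 - x2p + h*h*x1              # central difference for x2'' = x1
        x1p, x2p, x1, x2 = x1, x2, x1n, x2n
        k += 1
        if k*h > 1.0 and x1 <= 0.0:
            return k*h
    return None

lo, hi = -2.0, 1.0                  # d1 = 1 gives no oscillation (see the text)
for _ in range(40):                 # bisect on d1, as in the iteration above
    mid = 0.5*(lo + hi)
    if first_zero(mid) is None:
        hi = mid
    else:
        lo = mid
print(first_zero(lo))               # about 4.730; the true t1 solves cosh t cos t = 1
```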
4 Nonlinear Problems

Perhaps the most interesting new idea involves nonlinear problems of the form ℱ(x) = L(x) − N(x) = 0, where L(x) is of the form above and N(x) is a continuous nonlinear operator in x. There are many important physical problems of this type. Perhaps the simplest is the undamped pendulum problem described by ℱ(x) = x″ + (g/l) sin x = 0, where l is the length of the pendulum, x the angular measurement from the center line, and g the gravitational constant.*

* See W. E. Boyce and R. C. DiPrima, "Elementary Differential Equations and Boundary Value Problems," 3rd ed., p. 419. Wiley, New York, 1977.

The trick is to multiply ℱ(x) by −y(t), integrate over [a, b], and factor x from N(x). For convenience, we assume that this integral has the form

(1)  0 = H(x, y) = ∫ from a to b of [r(t)x′(t)y′(t) + p(t)x(t)y(t) + f(x)x(t)y(t)] dt.

If f(x) were dependent only on t, we would have our familiar linear problem. Let z_k(t) be as defined above, x(t) = c_k z_k(t), and y(t) = d_l z_l(t), where repeated indices are summed. Let r_σ(t) and p_σ(t) be defined as in Section 3.2 and r*_k as defined above, with p*_k defined similarly. By H_σ(x, y) we mean the form H(x, y) with r_σ(t) replacing r(t), p_σ(t) replacing p(t), and f_σ(x) replacing f(x), with f_σ(x) to be specified below. From (1) we have, by linearity in the d_k, that 0 = d_k H_σ(x, z_k). Since d_k is arbitrary, we have H_σ(x, z_k) = 0. This observation gives us a three-term Euler-Lagrange relation for the c_k, as before. The only problem is how to define f_σ(x); since f is continuous and c_{k-1} and c_k are known, this definition is possible.

For ease of exposition, we consider an example where

H(x, y) = ∫ from 0 to π of [x′y′ − (1 + sin²t)xy + f(x)xy] dt,

that is, r(t) ≡ 1, p(t) ≡ −(1 + sin²t), and f(x) = x². The reader may observe that the corresponding differential equation is
x″ + (1 + sin²t)x − x³ = 0. Note also that x(t) = sin t is a solution to this nonlinear equation. The computation of H_σ(x, z_k) = 0 in this example proceeds as follows (k fixed and not summed):

0 = c_{k-1} ∫ from a_{k-1} to a_k of {z′_{k-1}z′_k + [p*_{k-1} + (σ/2)(c²_{k-1} + c²_k)]z_{k-1}z_k} dt + ⋯ .

We obtain (in this example)

0 = c_{k-1}{(−1) + (σ²/6)[p*_{k-1} + (σ/2)(c²_{k-1} + c²_k)]}
  + c_k{(2) + (2σ²/3)[½(p*_{k-1} + p*_k) + σc²_k]}
  + c_{k+1}{(−1) + (σ²/6)[p*_k + (σ/2)(3c²_k − c²_{k-1})]}.
In the above, the first integral is only over [a_{k-1}, a_k], since z_{k-1}(t)z_k(t) ≡ 0 outside this interval. On this interval we approximate f(x) by [f(x(a_{k-1})) + f(x(a_k))]/2, where x(a_k) = √σ c_k. The second integral follows analogously, as does the third, except that we must extrapolate for the value of f(x) in the interval [a_k, a_{k+1}]. We use simple first-order extrapolation: if y is our desired approximation to f(x(a*_k)), we have

[f(x(a_k)) − f(x(a_{k-1}))]/σ = [y − f(x(a_k))]/(σ/2).

Finally we remark that test cases work quite well. In this example with a = 0, c₀ = sin 0/√σ = 0, and c₁ = sin σ/√σ, we have maximum absolute errors as given in Table 1. Note that these errors are smaller than we sometimes obtain in the linear case. This is because the coefficients in our problem are so smooth, which is not really surprising, given our methods.
Table 1

σ        Error
π/8      0.8 × 10⁻⁵
π/16     0.2 × 10⁻⁵
π/32     0.5 × 10⁻⁶
π/64     0.1 × 10⁻⁶
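The nonlinear march is easy to run. The sketch below is our own code (mesh size and names are arbitrary): it applies the three-term relation displayed above to x″ + (1 + sin²t)x − x³ = 0, with f_σ built from the averaged and extrapolated values of f(x) = x², and compares against the known solution sin t.

```python
import numpy as np

n = 256
s = np.pi / n                                    # sigma
t = s * np.arange(n + 1)
ps = -(1 + np.sin(t[:-1] + s/2)**2)              # p*_k, with p(t) = -(1 + sin^2 t)

c = np.zeros(n + 1)
c[1] = np.sin(s) / np.sqrt(s)                    # c_0 = 0, c_1 = sin(sigma)/sqrt(sigma)
for k in range(1, n):
    f_left  = (s/2)*(c[k-1]**2 + c[k]**2)        # averaged f on [a_{k-1}, a_k]
    f_right = (s/2)*(3*c[k]**2 - c[k-1]**2)      # extrapolated f on [a_k, a_{k+1}]
    A_m = -1 + (s*s/6)*(ps[k-1] + f_left)                        # multiplies c_{k-1}
    A_0 =  2 + (2*s*s/3)*(0.5*(ps[k-1] + ps[k]) + s*c[k]**2)     # multiplies c_k
    A_p = -1 + (s*s/6)*(ps[k] + f_right)                         # multiplies c_{k+1}
    c[k+1] = -(A_m*c[k-1] + A_0*c[k]) / A_p

print(np.max(np.abs(np.sqrt(s)*c - np.sin(t))))  # small, in line with Table 1
```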
References
1. Ahlberg, J. H., Nilson, E. N., and Walsh, J. L., "The Theory of Splines and Their Applications." Academic Press, New York, 1967.
2. Akhiezer, N. I., and Glazman, I. M., "Theory of Linear Operators in Hilbert Space," Vol. 1. Ungar, New York, 1966.
3. Bliss, G. A., "Lectures on the Calculus of Variations." Univ. of Chicago Press, Chicago, Illinois, 1945.
4. Carson, A. B., An analogue of Green's theorem for multiple integral problems in the calculus of variations, in "Contributions to the Calculus of Variations, 1938-1941" (G. A. Bliss, ed.), pp. 453-490. Univ. of Chicago Press, Chicago, Illinois, 1942.
5. Coppel, W. A., "Disconjugacy." Springer-Verlag, Berlin and New York, 1971.
6. Courant, R., and Hilbert, D., "Methods of Mathematical Physics," Vol. 1. Wiley (Interscience), New York, 1953.
7. Dennemeyer, R. F., Quadratic forms in Hilbert space and second order elliptic differential equations, Dissertation, Univ. of California, Los Angeles, 1956.
8. Dennemeyer, R. F., Conjugate surfaces for multiple integral problems in the calculus of variations, Pacific J. Math. 30, No. 3, 621-638 (1969).
9. Forsythe, G. E., and Wasow, W. R., "Finite Difference Methods for Partial Differential Equations." Wiley, New York, 1967.
10. Gelfand, I. M., and Fomin, S. V., "Calculus of Variations." Prentice-Hall, Englewood Cliffs, New Jersey, 1963.
11. Goldberg, S., "Unbounded Linear Operators." McGraw-Hill, New York, 1966.
12. Gould, S. H., "Variational Methods for Eigenvalue Problems." Univ. of Toronto Press, Toronto, 1966.
13. Gourlay, A. R., and Watson, G. A., "Computational Methods for Matrix Eigenproblems." Wiley, New York, 1973.
14. Greenspan, D., "Discrete Numerical Methods in Physics and Engineering." Academic Press, New York, 1974.
15. Gregory, J., An approximation theory for elliptic quadratic forms on Hilbert spaces, Pacific J. Math. 37, No. 2, 383-395 (1970).
16. Gregory, J., A theory of focal points and focal intervals for an elliptic quadratic form on a Hilbert space, Trans. Amer. Math. Soc. 157, 119-128 (June 1971).
17. Gregory, J., An approximation theory for focal points and focal intervals, Proc. Amer. Math. Soc. 32, No. 2, 477-487 (April 1972).
18. Gregory, J., Elliptic quadratic forms, focal points and a generalized theory of oscillation, J. Math. Anal. Appl. 51, No. 3, 580-595 (September 1975).
19. Gregory, J., Comparison theorems for oscillation of nonlinear, nonself-adjoint equations by use of quadratic forms, J. Math. Anal. Appl. 57, No. 3, 676-682 (March 1, 1977).
20. Gregory, J., Numerical algorithms for oscillation vectors of second order differential equations including the Euler-Lagrange equations for symmetric tridiagonal matrices, Pacific J. Math. 76, No. 3, 397-406 (June 1978).
21. Gregory, J., and Lopez, G. C., An approximation theory for generalized Fredholm quadratic forms and integral differential equations, Trans. Amer. Math. Soc. 222, 319-335 (1976).
22. Gregory, J., and Richards, F., Numerical approximation for 2nth order differential systems via splines, Rocky Mountain J. Math. 5, No. 1, 107-116 (1975).
23. Gregory, J., and Wilkerson, R., New numerical methods for symmetric differential equations, quadratic extremal problems, and banded matrices; the second order problem, Trans. Illinois State Acad. Sci. 71, No. 2, 222-235 (1978).
24. Halmos, P. R., "Introduction to Hilbert Space." Chelsea, Bronx, New York, 1957.
25. Hazard, K. F., Index theorems for the problem of Bolza, in "Contributions to the Calculus of Variations, 1938-1941" (G. A. Bliss, ed.), pp. 293-356. Univ. of Chicago Press, Chicago, Illinois, 1942.
26. Helmberg, G., "Introduction to Spectral Theory of Hilbert Space." American Elsevier, New York, 1969.
27. Hestenes, M. R., Applications of the theory of quadratic forms in Hilbert space in the calculus of variations, Pacific J. Math. 1, 525-582 (1951).
28. Hestenes, M. R., Quadratic variational theory and linear elliptic partial differential equations, Trans. Amer. Math. Soc. 101, 306-350 (1961).
29. Hestenes, M. R., "Calculus of Variations and Optimal Control Theory." Wiley, New York, 1966.
30. Hestenes, M. R., "Optimization Theory, The Finite Dimensional Case." Wiley, New York, 1975.
31. Hochstadt, H., "Integral Equations." Wiley, New York, 1973.
32. Hoffman, K. M., and Kunze, R. A., "Linear Algebra." Prentice-Hall, Englewood Cliffs, New Jersey, 1971.
33. Joos, G., "Theoretical Physics." Stechert, New York, 1934.
34. Kreith, K., "Oscillation Theory." Springer-Verlag, Berlin and New York, 1973.
35. Leighton, W., "Ordinary Differential Equations." Wadsworth, Belmont, California, 1970.
36. Lopez, G. C., Quadratic variational problems involving higher order ordinary derivatives, Dissertation, Univ. of California, Los Angeles, 1961.
37. Mikami, E., Quadratic optimal control problems, Dissertation, Univ. of California, Los Angeles, 1968.
38. Mikami, E., Focal points in a control problem, Pacific J. Math. 35, No. 2, 473-485 (1970).
39. Miller, J. J. H., "Topics in Numerical Analysis." Academic Press, New York, 1972.
40. Morse, M., The calculus of variations in the large, Amer. Math. Soc. Colloq. Publ. 18 (1934).
41. Olver, F. W. J., "Asymptotics and Special Functions." Academic Press, New York, 1974.
42. Prenter, P. M., "Splines and Variational Methods." Wiley, New York, 1975.
43. Ralston, A., "A First Course in Numerical Analysis." McGraw-Hill, New York, 1965.
44. Reddien, G. W., Some projective methods for the eigenvalue problem, Applicable Anal. 28, Fasc. 3, 61-73 (1977).
45. Riesz, F., and Sz.-Nagy, B., "Functional Analysis." Ungar, New York, 1965.
46. Sagan, H., "Introduction to the Calculus of Variations." McGraw-Hill, New York, 1969.
47. Sauer, N., Properties of bilinear forms on Hilbert spaces related to stability properties of certain partial differential operators, J. Math. Anal. Appl. 20, 124-144 (1967).
48. Schoenberg, I. J., Contributions to the problem of approximation of equidistant data by analytic functions, Quart. Appl. Math. 4, 45-99, 112-141 (1946).
49. Stakgold, I., "Boundary Value Problems of Mathematical Physics," Vols. I and II. Macmillan, New York, 1971.
50. Stein, J., Singular quadratic functionals, Dissertation, Univ. of California, Los Angeles, 1971.
51. Swanson, C. A., "Comparison and Oscillation Theory of Linear Differential Equations." Academic Press, New York, 1968.
Index
A
Abnormality, 160, 201
Accessory extremal, 30
Accessory minimum problem, 29
Adjoint, 9
Approximation theory, 75-78, 156-157, 167-170, 184
  eigenvalues, 81, 108-114, 186
  focal intervals, 221-223
  focal points, 78-81, 156-160, 184-186, 202-203
  numerical, 88-93, 135-136, 166-173, 189-192

B
Bessel functions, 183
Bilinear form, 60, 74
Block matrices, 175, 190
Boundary conditions, 41-42, 159, 208, 226
Bounded sequence, 59

C
Calculus of variations, 25
Cauchy-Schwarz inequality, 6
Class C, C′, D′, 33
Closed subspace, 59
Compact operator, 60
Comparison theorems, 72, 110, 161, 187
Completely continuous, 60, 147
Condition of Legendre, 27, 43
Conjugate point, 30, 45, 84, 150, 228, see also Focal point; Oscillation point
Conjugate surface, 177, 178, 181
Continuous function, 60
  weakly, 60
  weakly lower semicontinuous, 60, 148
Control parameter, 213
Control problem, see Quadratic control problem
Control variable, 213
Convergence, strong and weak, 59, 144
Corner point, 26
Critical point, 15

D
Dennemeyer, 175, 179, 183
Dirichlet integral, 52, 66
Disjoint hypothesis condition, 201, 204

E
Eigenvalue (or eigenvector), 10, 16, 21, 56, 103-107, 109
  double, 57, 115, 124-134
  hypothesis, 109
  multiplicity, 93
Elliptic forms, 61, 74, 148
Ellipticity, 50, 148, 177
Elliptic partial differential equations, 49, 174
Extension of the interval, 210
Extremal, 27, 150, 177
Extremal arc, 215

F
Focal arcs, 209
  maximal, 210
Focal class, 151
Focal intervals, 206
  maximum, 210, 211
Focal interval theory, 142, 201, 204-208, 221-223
Focal point, 53, 79, 83, 152, 159, 183, 185, 222, see also Conjugate point; Oscillation point
  order, 151
Focal point hypothesis, 147, 179
Forsythe, 192
Fredholm equations and forms, 141

G
Gårding's inequality, 61
Gauss-Seidel method, 98
Givens' method, 172
Gould, 42, 115, 123
Gradient, 14, 19, 23
Green's function, 61
Green's theorem, 51, 52

H
Harmonic functions, 52, 66
Hazard, 208, 209
Hessian, 14
Hestenes, 15, 29, 31, 61, 62, 109, 176
Hilbert space, 59, 128
Homogeneous function, 16, 91
Hyperplane, 17

I
Index, see Signature
Inner product, 6, 59
Integration by parts, 28, 31, 146
Isoperimetric, 45, 117

J
Jacobi condition, 25, 30

L
Lagrange multipliers, 22
Legendre condition, see Condition of Legendre; Ellipticity
Limit point, limit circle, 54
Linear functionals and forms, 60, 66, 70, 74
Linear independence, 67
Linear operator, 60
Lopez, 45, 143

M
Mathieu's equation, 124
Mikami, 52, 53, 202, 208, 211, 217
Minimum, 25
Min-max theorems, 18
Multiplier rule, 31

N
Negative vectors, 86, 93
Nonsingular arc, 29
Nonsingular condition, 27
Norm, 6, 59, 60
Normality, 201, 202
Nullity, 21, 39, 61, 159, 185
Null vector, 39, 151, 182
Numerical problem, 88, 166, 188, 224
  Euler-Lagrange equation, 92, 192, 226
  nonlinear, 229
  systems, 227

O
Orthogonal complement, 39, 61, 63, 74, 204, 213
Orthogonally diagonalizable, 10
Orthogonal polynomials, 55, 117
Orthogonal subspaces, 61, 62, 63
Orthogonal vectors, 7
Oscillation point, 45, 53, 84, 96, 128, see also Conjugate point; Focal point

P
Poisson integral formula, 66

Q
Q closure, 39, 63
Quadratic control problem, 201
Quadratic forms, 60-62
  compact, 60, 62, 147, 177
  and control problems, 212
  and differential equations, 1, 38, 90, 227
  finite dimensional, 5
  and integral-differential equations, 48, 141
  and partial differential equations, 178
  positive definite, 60
  positive, negative, nonnegative, nonpositive, 39, 60, 74

R
Rayleigh quotient, 16, 19
Rayleigh-Ritz theory, 112, 115
Region (of class B′), 176
Regular point, 23
Relative nullity, 39, 68, 207
Relaxation, 98, 194
Riesz representation theorem, 49, 60, 142, 145

S
Saddle point, 19
Schrödinger's equation, 115
Second-order conditions, 25, 118
Self-adjoint, 10
  differential equations, 38, 145
Separation of variables, 116, 178, 182
  of quadratic forms, 199
Signature, 13, 21, 39, 61, 69, 70, 110
  η parameter, 128
  λ parameter, 79, 84, 85, 185, 205, 211
Singular differential equations, 54
Spline functions, 89, 95, 168-170, 190
Stakgold, 42
State variables, 213
Stationary values, 14
Stein, 13
Sturm sequence, 96
Subspaces, 62
Sufficiency conditions, 25
Symmetric, 10

T
Taylor series expansion, 14
Transversality conditions, 44, 49, 54, 150, 208, 213
Tridiagonal matrix, 90, 120

V
Variations, first and second, 27
Vector space, 6, 59

W
Weierstrass conditions, 26
Weierstrass-Erdmann corner conditions, 27