FOUNDATIONS AND APPLICATIONS OF VARIATIONAL AND PERTURBATION METHODS No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in rendering legal, medical or any other professional services.
FOUNDATIONS AND APPLICATIONS OF VARIATIONAL AND PERTURBATION METHODS
S. RAJ VATSYA
Nova Science Publishers, Inc. New York
Copyright © 2009 by Nova Science Publishers, Inc. All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher. For permission to use material from this book please contact us: Telephone 631-231-7269; Fax 631-231-8175 Web Site: http://www.novapublishers.com NOTICE TO THE READER The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication. This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS. LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA Vatsya, S. Raj. Foundations and applications of variational and perturbation methods / S. Raj Vatsya. p. cm. 
Includes bibliographical references and index. ISBN 978-1-60741-414-8 (eBook) 1. Perturbation (Mathematics) 2. Perturbation (Quantum dynamics) 3. Variational principles. I. Title. QA871.V285 2009 515'.392--dc22 2008050554
Published by Nova Science Publishers, Inc.
New York
CONTENTS

Preface
Acknowledgements

I. Foundations

Chapter 1. Integration and Vector Spaces
    1.I. Preliminaries
    1.II. Integration
    1.III. Vector Spaces

Chapter 2. Operators in Vector Spaces
    2.I. Operators in Banach Spaces

Chapter 3. Variational Methods
    3.I. Formulation
    3.II. Convergence
    3.III. Padé Approximants
    3.IV. Monotonic Convergence

Chapter 4. Perturbation Methods
    4.I. Perturbed Operator
    4.II. Spectral Perturbation
    4.III. Spectral Differentiation
    4.IV. Iteration

II. Applications

Chapter 5. Matrices
    5.I. Tridiagonal Matrices
    5.II. Structured Matrices
    5.III. Conjugate Residual-Like Methods

Chapter 6. Atomic Systems
    6.I. Preliminaries
    6.II. Eigenvalues and Critical Points
    6.III. Scattering

Chapter 7. Supplementary Examples
    7.I. Ray Tomography
    7.II. Maxwell's Equations
    7.III. Positivity Lemma for the Elliptic Operators
    7.IV. Transport and Propagation
    7.V. Quantum Theory

References
Index
PREFACE

Variational and perturbation methods constitute the basis of a variety of numerical techniques to solve a wide range of problems in the physical sciences and applied mathematics. Many practicing researchers have limited familiarity with the mathematical foundations of these methods. This literature is scattered and often concentrates on mathematical subtleties without associating them with the physical phenomena, limiting its accessibility. Another impediment to the dissemination of the rigorous results is a lack of examples illustrating them. Computationally oriented texts present them as ad hoc, problem specific procedures.

The present text aims to present these and other related methods in a unified, coherent framework. Pertinent results, e.g., the convergence and bound properties, are obtained rigorously and illustrated by copious use of examples drawn from various areas of physics, chemistry and engineering disciplines. The material provides sufficient information to researchers in scientific disciplines to apply the mathematical results properly and to mathematicians who may wish to use such techniques to develop solution schemes for scientific problems or analyze them for their properties.

Out of a number of mathematical subtleties addressed in the material covered, one deserves mention. Processes in the interior of a physical system respond in a fundamental way to the conditions imposed at its boundary, which are often experimentally controllable or determinable. In the models representing such phenomena, the boundary conditions determine the mathematical character of equations in an equally fundamental way. For the methods to solve such equations to be well founded and reliable, it is essential that these mathematical properties be adequately taken into account. Effort is made to explain the impact of this and other similar conditions on the mathematical properties, methods and physical phenomena.
While the mathematical rigor is not compromised, the material is kept focused on solving physical problems of interest. To this end, only the essential areas of abstract mathematics are covered and excessive abstraction is avoided. Mathematical concepts are developed assuming the background of the reader expected of an advanced undergraduate student in science and engineering programs. The concepts and their contents are clarified with examples and comments to facilitate the understanding. This book developed from the notes for an interdisciplinary course, “Applied Analysis,” which the author developed and taught at York University. The course attracted advanced undergraduate and graduate students from the physical sciences and mathematics departments. The material was continuously updated in view of the new developments. Some non-standard applications of the variational and perturbation methods that have not yet been
included in the instructional texts and courses have also been included. Effort is made to render this a suitable textbook for a course for students with a background in scientific and mathematical disciplines, as well as a suitable reference item for researchers in a broad range of areas. The material can also supplement other courses in approximation theory, numerical analysis theory and functional analysis.

The first two chapters concentrate on the mathematical topics needed for later developments. This material is introduced and developed from commonly familiar grounds for accessibility, maintaining the essentials of the concepts. These topics are covered in standard texts. Therefore the details and the proofs are provided only when considered instructive. A reader familiar with them may skip to the later chapters and refer back to these as needed. A reader interested in exploring these topics in more detail may consult standard reference material. While a vast amount of suitable literature is available, the following classic texts are still quite prolific sources of information: in modern analysis, M. H. Stone, Linear transformations in Hilbert space and their applications to analysis, Am. Math. Soc., New York (1932); F. Riesz and B. Sz.-Nagy, Functional analysis, translated by L. F. Boron, Ungar, New York (1955); and T. Kato, Perturbation theory for linear operators, Springer-Verlag, New York (1980); and in classical analysis, R. Courant and D. Hilbert, Methods of mathematical physics, Interscience, New York (1953).

Chapters 3 and 4 cover detailed mathematical foundations of the variational and perturbation methods. The remainder of the text is devoted to worked out examples from various areas of physics, chemistry and engineering disciplines. The examples illustrate the techniques that can be used to select a suitable method and verify the conditions for its validity and thus, establish its reliability.
Algebraic details and numerical applications to specific problems are covered extensively in the existing literature. The applications part of the present text covers the ground between the computational aspects of the methods and the results developed in the foundations part, where there is a dearth, partly due to a lack of organized foundational literature. The material is organized by mathematical classification, each class applicable to a variety of problems. The applications part concentrates on concrete problems and reduces each to one of the topics studied in the foundations part, thereby developing a solution scheme and illustrating the applications of the results.

The material is presented in precise terms unless descriptive text is deemed necessary for explanations. Supplementary remarks are included for further clarifications and to isolate parts of the text for later reference. An argument is described in detail the first time it is needed. If a similar argument is used for other proofs later, it is outlined relatively briefly to avoid excessive duplication. A term defined for the first time is bolded. Proofs and definitive statements such as the remarks and examples, in danger of confusion with other text, are concluded with a thick period •

S. R. Vatsya
Formerly: S. R. Singh
August 2008
ACKNOWLEDGEMENTS

The author is grateful to several researchers, particularly Professors Huw Pritchard and John Nuttall, for bringing numerous problems to his attention. Thanks are due to them and his students for persistently encouraging him to write a text covering this material. Also, constructive advice of Dr. Mile Ostojic and Shafee Ahamed was greatly appreciated.
I. FOUNDATIONS
Chapter 1
1. INTEGRATION AND VECTOR SPACES

1.I. PRELIMINARIES

In this section, some introductory concepts and results needed later are collected. Although commonly known concepts will be avoided, some will be included for completeness and their significance, especially if there is a danger of confusion. The proofs of the results commonly available in literature are omitted unless considered instructive.

Out of the real number system, zero plays about the most prominent role in analysis. The following simple but useful characterization of zero and its variants pervades the entire analysis:

Proposition 1.1. If $0 \le a \le \varepsilon$ for each $\varepsilon > 0$, then $a = 0$.

Proof. If $a > 0$, take $\varepsilon = a/2$, implying that $1 \le 1/2$, which is a contradiction •
The next concept of major significance is that of a function. Let $X$ and $Y$ denote the sets of elements or points. A function $f$ is a map from its domain $D(f)$ in $X$ to its range $R(f)$ in $Y$, i.e., an assignment of each $x$ in $D(f)$ to $y = f(x)$ in $R(f)$. Basic properties of functions, e.g., continuity and differentiability, are defined in terms of the metrics or distance functions associated with $X$ and $Y$. The material will be introduced for the functions of one variable with $X$ and $Y$ being the sets of real and complex numbers where the metric is given by the absolute value, $|x_1 - x_2|$. Extensions are natural and will be introduced as needed.

A sequence of functions $\{f_n\}_{n=1}^{\infty}$ is said to converge pointwise to $f$ if $\lim_{n \to \infty} |f_n(x) - f(x)| = 0$ for each fixed $x$ in some set $\Omega$ in $D(f)$, and it is said to converge uniformly if $\sup_{x \in \Omega} |f_n(x) - f(x)| \to 0$ as $n \to \infty$. The discrete index can be replaced with a continuous one with adjustments.

Continuity can be defined in terms of the convergence. A function is right, left continuous if $f(x + \varepsilon) \to f(x)$ as $\varepsilon \to 0$ from the right, left, respectively. A function is continuous if it is both right and left continuous, and uniformly continuous if

$$\sup_{x \in \Omega} |f(x + \varepsilon) - f(x)| \to 0 \quad \text{as } \varepsilon \to 0 .$$
A function $f$ is absolutely continuous on $\Omega$ if

$$\sum_n |f(x_n) - f(x_n')| \to 0 \quad \text{as} \quad \sum_n |x_n - x_n'| \to 0 ,$$

for any subdivision contained in $\Omega$. If a subdivision of an interval is disjoint except for the boundary points of its defining intervals, it will still be termed disjoint, unless the boundary points are of significance. A function $f$ is called a function of bounded variation on $\Omega$ if $\sum_{n=1}^{N} |f(x_n) - f(x_{n-1})|$ remains bounded by a constant for any disjoint subdivision covering $\Omega$. It is sufficient for this condition to hold for a set of distinct points $\{x_n\}_{n=1}^{N}$ in $\Omega$ such that $\Omega = [x_1; x_N]$. A function of bounded variation can be expressed as the difference of two non-decreasing functions.

The limit of a convergent sequence of continuous functions can be discontinuous, e.g., the step function, which can be obtained as the limit of continuous functions. Its converse is also true. Also, a sequence of continuous functions can converge pointwise to a continuous function $f$ at each $x$ in $\Omega$ and not converge uniformly to it, as shown by the following example:

Example 1.1. Let $f_n(x) = 0$ for $1 \ge x \ge 1/n$ and for $x \le 1/(n+2)$; and let $f_n(x) = 1$ at $x = 1/(n+1)$. On the remaining interval it is a piecewise linear function joining 0 and 1. Then $\lim_{n \to \infty} f_n(x) = 0$ for each fixed $x$ but $\sup_{x \in [0,1]} |f_n(x)| = 1$ •
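Example 1.1 can be checked numerically. The following Python sketch is an illustration only and is not part of the original text; the function name `tent` and the sample point 0.3 are assumptions chosen for the demonstration. It evaluates the functions at a fixed point, where the values vanish once 1/n falls below that point, and at the moving peak 1/(n+1), where the value stays equal to one.

```python
def tent(n: int, x: float) -> float:
    """f_n of Example 1.1: zero outside (1/(n+2), 1/n), one at 1/(n+1), linear in between."""
    left, peak, right = 1.0 / (n + 2), 1.0 / (n + 1), 1.0 / n
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)   # rising edge from 0 to 1
    return (right - x) / (right - peak)     # falling edge from 1 to 0


# Pointwise convergence: at a fixed x the values are eventually zero.
x_fixed = 0.3
values_at_x = [tent(n, x_fixed) for n in range(1, 11)]

# No uniform convergence: the maximum over [0, 1] is attained at the peak.
peak_values = [tent(n, 1.0 / (n + 1)) for n in range(1, 11)]

print(values_at_x)
print(peak_values)
```

The first list becomes identically zero from n = 4 onwards, while every entry of the second list equals one, so the supremum over the interval does not decrease.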
The following three classic results provide useful criteria to conclude the uniform convergence.

Lemma 1.1. (Weierstrass approximation theorem) With an arbitrary function $f$ continuous on a closed, bounded interval $\Omega$, for each $\varepsilon > 0$, there exists a polynomial $p_n(x)$ of sufficiently high degree $n$ such that $\sup_{x \in \Omega} |p_n(x) - f(x)| < \varepsilon$ •

Lemma 1.2. (Dini's theorem) If a semi-monotonic sequence of continuous functions $\{f_n\}_{n=1}^{\infty}$ converges pointwise to a continuous $f$ on a closed domain $\Omega$, then it converges uniformly on $\Omega$ •
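Lemma 1.1 asserts only the existence of an approximating polynomial. One standard constructive choice, which is not the one used in the text and is adopted here purely as an illustration, is the Bernstein polynomial on [0, 1]. In the sketch below the helper names `bernstein` and `sup_error`, the test function |x - 1/2| and the grid size are all assumptions; the supremum is replaced by a maximum over a finite grid.

```python
from math import comb


def bernstein(f, n: int, x: float) -> float:
    """Bernstein polynomial of degree n for f on [0, 1], evaluated at x."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(n + 1))


def sup_error(f, n: int, grid_size: int = 201) -> float:
    """Maximum of |B_n f - f| on a uniform grid, used as a stand-in for the supremum."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in grid)


def kink(x: float) -> float:
    """Continuous test function that is not differentiable at x = 1/2."""
    return abs(x - 0.5)


errors = [sup_error(kink, n) for n in (4, 16, 64, 256)]
print(errors)
```

The printed errors decrease as the degree grows, in line with the lemma, although slowly for a function with a kink.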
A sequence $\{f_{m(n)}\}$ is a subsequence of $\{f_n\}$ if it is contained in $\{f_n\}$ and $m(n) \to \infty$ as $n \to \infty$, e.g., $\{n^2\}$ is a subsequence of $\{n\}$ for $n = 1, 2, \ldots$ A family of functions $\{f_n\}_{n=1}^{\infty}$ is termed equi-continuous if for each $\varepsilon > 0$, there is a $\delta(\varepsilon)$ such that $\sup_n |f_n(x_1) - f_n(x_2)| < \varepsilon$ whenever $|x_1 - x_2| < \delta(\varepsilon)$.
Lemma 1.3. (Arzela-Ascoli theorem) A set of uniformly bounded equi-continuous functions $\{f_n\}_{n=1}^{\infty}$ on a closed domain $\Omega$ contains a uniformly convergent sequence with limit function being continuous. Further, every pointwise convergent subsequence of $\{f_n\}_{n=1}^{\infty}$ converges uniformly to its continuous limit function •

Above definitions and results have their counterparts in the series of functions, which can be obtained by defining the sequences as the partial sums, i.e., $f_n = \sum_{m=1}^{n} g_m$.

A complex function $f(z)$ of complex variable $z$ is called analytic on an open set $\Omega$ in $D(f)$ if it is differentiable at each $z$ in $\Omega$. These functions are considerably smoother than the differentiable functions of real variables. Analyticity implies that if the derivative of a function is equal to zero everywhere, then the function is constant. An analytic function admits the Taylor series expansion about each point within the region of its analyticity. This is not valid for the functions of real variables as can be seen from $\exp(-1/x^2)$, which together with all its derivatives, is equal to zero at $x = 0$ but is not constant in any of its neighborhoods. Extension of the notion of integral developed in the next section does not impact upon the results pertaining to the integration of an analytic function, which are properly covered by Cauchy's construction and the associated results. If $f(z)$ is analytic in the whole of the complex plane, it is called entire.

Proposition 1.2. (Liouville's theorem) A non-constant entire function is unbounded.

Proof. While other proofs of this result are available, one can be based on Cauchy's residue theorem by expressing $\dot f(z_0)$, the derivative of $f(z)$ at $z_0$, as

$$\dot f(z_0) = \frac{1}{2\pi i} \int \frac{f(z)\, dz}{(z - z_0)^2} ,$$

where the integration is along a positively oriented closed contour enclosing $z_0$, which can be deformed into a circle at infinity. If $f(z)$ is bounded, then the integral can be evaluated to yield $\dot f(z_0) = 0$ for each $z_0$. This together with the analyticity of $f(z)$ implies that $f(z)$ is constant •

In the process of proving Proposition 1.2, we have also shown

Corollary 1.1. If $f(z)$ is entire and $f(\infty) = \kappa$ then $f(z) = \kappa$ everywhere •
1.II. INTEGRATION

1.II.1. Basic Concepts

The Riemann sum defining the integral of a function was found to be unsatisfactory for applications as analysis advanced to its modern form. In response, attempts were made to modify Riemann's definition. Among all, Lebesgue's construction was deemed to be the most satisfactory one, which underlies modern analysis. In this section, the concept of the Lebesgue integral is developed starting with Riemann's definition. The need for this extension will become clear as the material is developed. The extended definition does not impact directly on the evaluation of the integrals. However, the analysis based on this extension serves to establish the validity of the results, which can then be exploited with the Riemann sums.

Complete understanding of Lebesgue's construction is not necessary. It is sufficient to be able to use the results deduced as its consequences, some of which are not valid with Riemann's construction. It is described here as it explains a basic tenet of analysis, which is to extend an incomplete set to a complete one in some sense. A simple example of such construction is provided by the completion of the real number system starting with the rational numbers, and the completion of the complex number system from the reals.

Lebesgue developed an abstract system for generality, which underlies his construction. While this development is more complete and satisfying in addition to being beautiful, its accessibility is limited. For simplicity, we develop the concept from the familiar grounds without sacrificing the essential content. The conditions under which certain results are valid cannot be completely clarified without an understanding of the abstract structure. In such cases, the conditions are stated in familiar terms even though they may be stronger than necessary. The development can be easily illustrated with integrals of the functions of one variable on an interval $[a; b]$ in the real line.
The level of generality adopted for the exposure is sufficient for applications. Any necessary extensions will be introduced as needed. Various useful results are stated without proofs, which are available elsewhere.

The concept of integration originated out of the need to calculate the area covered by the graph $\{f(x), x\}$ of a function. Consider a bounded function with $D(f)$ and $R(f)$ in the set of real numbers $R^1$. Let

$$\underline{I}_n = \sum_{i=1}^{n} m(\Delta_i) \inf [f(\Delta_i)] \quad \text{and} \quad \overline{I}_n = \sum_{i=1}^{n} m(\Delta_i) \sup [f(\Delta_i)] ,$$

where $\{\Delta_i\}$ is a disjoint set of intervals covering the domain of the integral $D(I) = [a; b]$ contained in $D(f)$ with length $m(\Delta_i)$ such that $\max_{i=1,n} m(\Delta_i) \to 0$ as $n \to \infty$. The lower and upper Darboux sums are defined as $\underline{I} = \lim_{n \to \infty} \underline{I}_n$ and $\overline{I} = \lim_{n \to \infty} \overline{I}_n$, respectively, if the limits exist. This construction is easily generalized to higher dimensions. For example, for the functions in a plane, all one has to do is to take the set of rectangles for $\{\Delta_i\}$ and their areas for $m(\Delta_i)$.

If $\underline{I} = \overline{I} = I$, then $f$ is Riemann integrable with its integral equal to $I$. Equivalently, $I = \lim_{n \to \infty} \sum_{i=1}^{n} m(\Delta_i) [f(x_i)]$, where $x_i$ is an arbitrary point in $\Delta_i$. Integrals of unbounded
functions are defined as the limits of the integrals of the bounded functions and the integrals on an infinite interval are defined as the limits of the integrals on finite intervals, provided that these limits exist. Integrals of complex functions are evaluated by separating their real and imaginary parts.

Example 1.2. Consider a function $\tilde f = 1$ on the set of all rational numbers and $\tilde f = 0$ on the set of all irrationals in $[0;1]$. Since every interval in the subdivision must contain at least one rational and one irrational, $\tilde f$ is not Riemann integrable. However, $\tilde f$ can be obtained as the limit of a sequence of functions $\{f_n\}$ with each $f_n = 1$ on a finite number of points, defined by $S_n = \{l/m,\ m = 1, 2, \ldots, n;\ l = 0, 1, 2, \ldots, m\}$, and $f_n = 0$ elsewhere. These functions are Riemann integrable with the integral of each being equal to zero. Thus, the limit function of a sequence of Riemann integrable functions is not necessarily Riemann integrable, i.e., the set of Riemann integrable functions is not closed or complete. This deficiency has a profound impact on analysis •

Since a finite number of lines cover zero area, the value of a function at a finite number of points has no effect on the area covered by its graph. Such sets are termed the sets of measure zero. A statement is said to be true almost everywhere, if it is valid except at a set of measure zero. While the notion of measure is introduced here from the length, it is generalized to situations wherever integration is to be defined, with the following basic property: The measure $m$ of a countable collection of mutually disjoint sets is equal to the sum of their individual measures. Unless otherwise stated, $m$ will be assumed to be non-negative. If the elements of a set can be mapped one to one into the set of integers, it is called countable, e.g., $S_\infty$ as defined in Example 1.2.

The measure of a countable set of elements each one of measure zero is equal to zero. For illustration, consider $S_\infty$. The same proof applies to all countable sets. Each element of $S_\infty$ can be covered by an interval of arbitrarily small measure. For a given $\varepsilon > 0$, cover the $n$th element by a set of measure $\varepsilon / 2^n$, $n = 1, 2, \ldots$ Then $0 \le m(S_n) \le \varepsilon$, and hence, $0 \le m(S_\infty) = \lim_{n \to \infty} m(S_n) \le \varepsilon$, implying that $m(S_\infty) = 0$, from Proposition 1.1. Since $m(S_\infty) = 0$, it should be expected that an integral of $\tilde f$ be definable and its value should be equal to zero. Thus Riemann's construction is also deficient.

A function $s$ is called simple if it can be expressed as $s(x) = \sum_{i=1}^{n} \alpha_i \chi(S_i)$, where $\chi(S_i)$ is the characteristic function of $S_i$, i.e., $\chi(S_i) = 1$ on $S_i$ and zero elsewhere, with $\{S_i\}$ being a mutually disjoint set covering $D(I)$. The Riemann integrals of such functions are given by

$$\int_{D(I)} s(x)\, dm(x) = \sum_{i=1}^{n} \alpha_i\, m(S_i) .$$
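The covering argument used above to show that the countable set of rationals has measure zero can be made concrete. The sketch below is an illustration under stated assumptions, not part of the original text: the helper names `rationals_up_to` and `cover_length` are hypothetical, and exact rational arithmetic is used so that the bound is not blurred by rounding. The k-th listed point receives an interval of length ε/2^k, and the total length of the cover is compared with ε.

```python
from fractions import Fraction


def rationals_up_to(n: int) -> list:
    """Distinct points l/m in [0, 1] with m = 1..n and l = 0..m, in order of first appearance."""
    seen, points = set(), []
    for m in range(1, n + 1):
        for l in range(m + 1):
            q = Fraction(l, m)
            if q not in seen:
                seen.add(q)
                points.append(q)
    return points


def cover_length(points, eps: Fraction) -> Fraction:
    """Total length of a cover assigning the k-th point an interval of length eps / 2**k."""
    return sum((eps / 2**k for k in range(1, len(points) + 1)), Fraction(0))


eps = Fraction(1, 1000)
for n in (5, 20, 50):
    pts = rationals_up_to(n)
    print(n, len(pts), float(cover_length(pts, eps)))
```

However many points are included, the total length of the cover stays below ε, which is the inequality to which Proposition 1.1 is applied.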
Let $f$ be non-negative and let

$$I = \sup \int_{D(I)} s(x)\, dm(x) ,$$

the supremum being taken over all simple functions such that $0 \le s \le f$. If $I$ exists, then $f$ is called Lebesgue integrable with its integral being equal to $I$. Equivalently, let $\{s_n(x)\}$ be a non-decreasing sequence of simple functions with their integrals having a common bound. It can be shown that this sequence converges to a function $f$ almost everywhere. Each such $f$ is Lebesgue integrable with integral given by

$$I = \int_{D(I)} f(x)\, dm(x) = \lim_{n \to \infty} \int_{D(I)} s_n(x)\, dm(x) .$$
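The definition through a non-decreasing sequence of simple functions can be illustrated on a concrete case. The sketch below is a hedged illustration, not the author's construction: the choice f(x) = x² on [0, 1], the dyadic levels k/2^n and the name `simple_integral` are assumptions made for the example. For this f, the set on which the simple function takes the value k/2^n is an interval, so its measure is just a difference of square roots.

```python
from math import sqrt


def simple_integral(n: int) -> float:
    """Integral of the simple function s_n <= f for f(x) = x**2 on [0, 1].

    s_n takes the value k / 2**n on the set where k / 2**n <= f < (k + 1) / 2**n,
    which for this f is the interval [sqrt(k / 2**n), sqrt((k + 1) / 2**n)).
    The single point x = 1 has measure zero and is ignored.
    """
    levels = 2**n
    total = 0.0
    for k in range(levels):
        measure = sqrt((k + 1) / levels) - sqrt(k / levels)  # m(S_k)
        total += (k / levels) * measure                       # alpha_k * m(S_k)
    return total


integrals = [simple_integral(n) for n in range(1, 11)]
print(integrals)
```

The printed values increase with n and remain below 1/3, the value of the integral of x² over [0, 1], which is their common bound and their limit.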
Existence of the limit is guaranteed by the existence of the common bound and it can be shown to be unique. It is clear that the sets of measure zero do not contribute to the integral. If not mentioned otherwise, integration will be understood to be in the Lebesgue sense. Integration of $\tilde f$ is now clearly defined to be equal to zero. The sets of measure zero have no impact on the value of an integral and can therefore be ignored.

This definition retains the usual properties of the Riemann integrals. The integrals of the sums and differences of the functions are defined as the sums and differences of their integrals. The integrals of general real functions are defined by separately integrating their positive and negative parts. The integrals of complex functions are defined by separately integrating their real and imaginary parts. The integrals of unbounded functions and on unbounded $D(I)$ are defined by the appropriate limits, if they exist.

A natural question arises whether the set of the Lebesgue integrable functions is closed or complete in the sense the set of Riemann integrable functions is not. The answer is provided in the affirmative by the following result.
Theorem 1.1. (Beppo-Levi theorem) Let $\{f_n\}$ be a non-decreasing sequence of integrable functions with their integrals having a common bound. Then $f = \lim_{n \to \infty} f_n$ exists almost everywhere and

$$\int_{D(I)} f(x)\, dm(x) = \lim_{n \to \infty} \int_{D(I)} f_n(x)\, dm(x)$$ •
As indicated earlier, the results on the sequences of functions can be transferred to the series. Such an extension of the Beppo-Levi theorem is particularly useful.

Corollary 1.2. If $\sum_{n=1}^{\infty} \int_{D(I)} |f_n(x)|\, dm(x)$ converges, then the series $\sum_{n=1}^{\infty} f_n(x)$ converges almost everywhere to an integrable function and the order of summation and integration can be interchanged.

Proof. Follows from Theorem 1.1 by substituting $\sum_{m=1}^{n} f_m(x)$ for $f_n(x)$ •
Parts of the assumptions and conclusions in Theorem 1.1 can be interchanged to yield the following result:

Theorem 1.2. (Lebesgue monotone convergence theorem) Let $\{f_n\}$ be a non-decreasing sequence of integrable functions such that $f = \lim_{n \to \infty} f_n$. Then $f$ is integrable and

$$\int_{D(I)} f(x)\, dm(x) = \lim_{n \to \infty} \int_{D(I)} f_n(x)\, dm(x)$$ •
In addition to their value in defining a closed set of integrable functions, Theorems 1.1 and 1.2 provide sufficiency criteria for an interchange of the limit and the integration, which is not always valid as is clear from the following example:

Example 1.3. Consider the sequences $\{\sqrt{n}\, x^n\}$, $\{n x^n\}$ and $\{n^2 x^n\}$ on $[0;1]$. All three sequences converge to zero except at a set of measure zero, that is $x = 1$, where they all increase without bound, i.e., the limit function is the same for all, which is the vertical infinite line at $x = 1$ with integral equal to zero. However, their integrals converge to zero and one, respectively, in the first two cases and in the third, a limit does not exist •

In addition to Theorems 1.1 and 1.2, the following result is an alternative way to ensure the validity of such an interchange.
Theorem 1.3. (Lebesgue dominated convergence theorem) Let $f_n$, $n = 1, 2, \ldots$, and $g$ be integrable functions such that $|f_n| \le g$ and $f_n \to f$ as $n \to \infty$ almost everywhere. Then $f$ is integrable and

$$\int_{D(I)} f(x)\, dx = \lim_{n \to \infty} \int_{D(I)} f_n(x)\, dx$$ •
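The three sequences of Example 1.3 can be examined with elementary arithmetic, since the integral of c·xⁿ over [0, 1] equals c/(n+1). The sketch below is illustrative only; the function name `integral_of_power` and the sampled values of n are assumptions. It is assumed here that the first sequence is √n·xⁿ, which is the reading consistent with the stated limit of zero.

```python
from math import sqrt


def integral_of_power(coefficient: float, n: int) -> float:
    """Exact value of the integral of coefficient * x**n over [0, 1]."""
    return coefficient / (n + 1)


for n in (10, 100, 1000, 10000):
    row = (
        integral_of_power(sqrt(n), n),   # tends to 0
        integral_of_power(n, n),         # tends to 1
        integral_of_power(n**2, n),      # grows without bound
    )
    print(n, row)
```

In the second and third cases the limit of the integrals differs from the integral of the limit function, so by Theorem 1.3 no integrable function can dominate these sequences.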
While Theorems 1.1 and 1.2 also provide sufficient criteria to allow the interchange of the integration and limit operations, sequences encountered are rarely monotonic or bounded. However, boundedness of a sequence by an integrable function is usually convenient to establish. Thus Theorem 1.3 provides a more useful sufficiency criterion.

The question of the orders of operations arises also in the case of the multiple integrals. Let $f(x, y)$ be a function defined on the rectangle $D(I) = D_1 \times D_2$ with measure $m = m_1 \times m_2$. The following result ensures the legitimacy of such an interchange:

Theorem 1.4. (Fubini's theorem) Let $f(x, y)$ be absolutely integrable in any order on $D(I) = D_1 \times D_2$, i.e.,

$$\int_{D_1} dm_1(x) \int_{D_2} dm_2(y)\, |f(x, y)| \quad \text{or} \quad \int_{D_2} dm_2(y) \int_{D_1} dm_1(x)\, |f(x, y)| ,$$

exists. Then the integrals

$$\int_{D_1} dm_1(x)\, f(x, y) \quad \text{and} \quad \int_{D_2} dm_2(y)\, f(x, y) ,$$

are defined for almost every $y$ and $x$, respectively, and the integral on $D(I)$ can be evaluated in any order, i.e.,

$$\int_{D_1} dm_1(x) \int_{D_2} dm_2(y)\, f(x, y) = \int_{D_2} dm_2(y) \int_{D_1} dm_1(x)\, f(x, y) = \int_{D(I)} f(x, y)\, dm(x, y) .$$
The last term is the originally defined integral •

Remark 1.1. (Stieltjes integral) Consider the case of integration of the functions of one variable with $m(x)$ defined by a non-decreasing function $\gamma(x)$ with the measure of an interval $[a'; b']$ given by $(\gamma(b') - \gamma(a'))$. We assume that $\gamma(x)$ is right continuous. The integral denoted by $I = \int_{D(I)} f(x)\, d\gamma(x)$ is defined as above with the measure so calculated. This is usually termed the Riemann-Stieltjes or the Lebesgue-Stieltjes integral, depending on the construction used in integration. If not explicitly mentioned, it will be understood to be defined as the Lebesgue-Stieltjes integral. This concept is naturally extendible to include the functions $\gamma(x)$ of bounded variation and the multi-dimensional spaces also. Above results on integration are clearly applicable to the Stieltjes integrals •

Non-decreasing functions are differentiable almost everywhere. Since a function of bounded variation can be expressed as the difference of two non-decreasing functions, it is differentiable almost everywhere also. Indefinite integral $f(x)$ of an integrable function $g(x)$, $f(x) = \int_a^x g(y)\, dy$, is a function of bounded variation with total variation $V(f; [a; b])$ over the interval $[a; b]$ given by

$$V(f; [a; b]) = \int_a^b |g(x)|\, dx ,$$

the definition being valid also for complex functions. Consequently, an indefinite integral of an integrable function is differentiable almost everywhere, and $g = \dot f$ almost everywhere, where the dot denotes the derivative. Continuity is not sufficient for the differentiability of a function. However, an absolutely continuous function is differentiable almost everywhere and it can be expressed as the integral of its derivative. A useful result valid also for complex functions of bounded variations is stated in Proposition 1.3 below.

Proposition 1.3. (Helly's theorem) Let $\{\rho_n(\lambda)\}$ be a sequence of functions of bounded variation of a real variable $\lambda$ with total variation $V[\rho_n(\lambda)] \le \kappa$, where $\kappa$ is an $n$-independent constant. Then $\{\rho_n(\lambda)\}$ has a subsequence $\{\rho_{m(n)}(\lambda)\}$, which converges pointwise to a function $\rho(\lambda)$ of bounded variation with $V[\rho(\lambda)] \le \kappa$ •
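The Stieltjes integral of Remark 1.1 can be approximated by sums in which the length of each subinterval is replaced by the increment of γ. The sketch below is an illustration under assumptions that are not in the original text: the integrator `gamma` (the identity plus a unit jump at 1/2, right continuous), the integrand x², the uniform subdivision and the use of right end points are all choices made for the example. With this integrator the integral over [0, 1] should equal 1/3 from the continuous part plus f(1/2) = 1/4 from the jump.

```python
def gamma(x: float) -> float:
    """Right-continuous, non-decreasing integrator: identity plus a unit jump at 1/2."""
    return x + (1.0 if x >= 0.5 else 0.0)


def stieltjes_sum(f, g, a: float, b: float, n: int) -> float:
    """Sum of f(x_i) * (g(x_i) - g(x_{i-1})) over a uniform subdivision of [a, b]."""
    total = 0.0
    previous = a
    for i in range(1, n + 1):
        current = a + (b - a) * i / n
        total += f(current) * (g(current) - g(previous))  # increment of gamma on the subinterval
        previous = current
    return total


approx = stieltjes_sum(lambda x: x * x, gamma, 0.0, 1.0, 10_000)
print(approx, 7.0 / 12.0)
```

The jump of γ contributes through the single subinterval whose right end point is 1/2, which is how a point of positive measure enters the sum.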
Integration by parts and by substitution can also be carried out with adequate integrability conditions. If $f$ and $g$ are integrable functions with $F$ and $G$ being their indefinite integrals, respectively, then integration by parts, i.e.,

$$\int_a^b F(x)\, g(x)\, dx + \int_a^b G(x)\, f(x)\, dx = \int_a^b F(x)\, dG(x) + \int_a^b G(x)\, dF(x) = F(b) G(b) - F(a) G(a), \qquad (1.1)$$

is valid. For integration by substitution, if $x(t)$ is a non-decreasing, absolutely continuous function, and $f(x)$ is integrable, then $f(x(t))\, \dot x(t)$ is also integrable with

$$\int_a^b f(x)\, dx = \int_{t(a)}^{t(b)} f(x(t))\, \dot x(t)\, dt . \qquad (1.2)$$
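Equations (1.1) and (1.2) can be checked numerically on smooth examples. The following sketch is an illustration only; the midpoint rule, the helper name `midpoint_rule` and the particular functions (x(t) = t² on [0, 2] with f = cos for the substitution, and f = cos, g = 1 on [0, 1] for integration by parts) are assumptions chosen for the demonstration.

```python
from math import cos, sin


def midpoint_rule(h, a: float, b: float, n: int = 20_000) -> float:
    """Composite midpoint approximation of the integral of h over [a, b]."""
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) for i in range(n))


def x_of_t(t: float) -> float:
    """Assumed substitution x(t) = t**2, non-decreasing on [0, 2]."""
    return t * t


def x_dot(t: float) -> float:
    """Derivative of x(t) = t**2."""
    return 2.0 * t


# Substitution (1.2): x runs over [x(0), x(2)] = [0, 4] while t runs over [0, 2].
lhs_sub = midpoint_rule(cos, x_of_t(0.0), x_of_t(2.0))
rhs_sub = midpoint_rule(lambda t: cos(x_of_t(t)) * x_dot(t), 0.0, 2.0)

# Integration by parts (1.1) with f = cos and g = 1, so that F = sin and G(x) = x on [0, 1].
lhs_parts = midpoint_rule(sin, 0.0, 1.0) + midpoint_rule(lambda x: x * cos(x), 0.0, 1.0)
rhs_parts = sin(1.0) * 1.0 - sin(0.0) * 0.0

print(lhs_sub, rhs_sub)
print(lhs_parts, rhs_parts)
```

Both pairs agree to within the accuracy of the quadrature, the first pair being approximations of sin 4 and the second of sin 1.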
1.II.2. Integration over Trajectories

The concept of integration on a set of trajectories in a space-time manifold has been used to develop the path-integral formulation of quantum mechanics and applied effectively to analyze a number of problems in quantum theory, particularly the quantum fields (Feynman, 1948; Feynman and Hibbs, 1965). The formulation has also emerged as the basis for effective computational techniques to solve partial differential equations in several scientific and engineering disciplines including the quantum theory (Vatsya, 2005). This concept opens up a wide range of novel constructions and questions. We restrict to a limited exposure, directly relevant to the type of cases encountered in the applications considered in ch. 7. The following construction is formal. Operations must be justified with properties available in the specific applications.

For illustration, consider the trajectories in a one-dimensional space, the base manifold, parameterized by a variable $\tau$ and thus, each expressible as a function $\xi(\tau)$ with graph being the set of points $\{\xi(\tau); \tau\}$ in the two-dimensional, composite, manifold obtained by adjoining the parameter as the additional coordinate to the base manifold. Construction in the manifolds of higher dimensions is about the same. The aim is to determine an appropriately weighted sum of a function $f(\xi)$ defined over a set of trajectories from one point $(y; \tau(y))$ to the other, $(x; \tau(x))$.

The concept of integration was developed in the last subsection as the limit of properly constructed finite sums with a weight attached to each term. Specifically, the target function was approximated by a sequence of simple functions and its integral was defined as the limit of the sequence of integrals of the approximating simple functions.
In the following, we extend the construction to obtain the integrals of the functions f (ξ ) with respect to a suitable measure m (ξ ) defined on the set of trajectories, termed the Wiener measure. In general, complications arise in the construction of a suitable measure. Once a Wiener measure has been constructed, the integral can be defined as a Lebesgue sum without any further adjustments. We proceed with a construction of the weighted sums and define the integral as their limit, whenever it exists. Consider the set of all trajectories joining the two points ( y;τ ( y )) and ( x;τ ( x)) in the composite space. Let
$$\tau(y) = \tau_0 < \tau_1 < \tau_2 < \cdots < \tau_{N-1} < \tau_N = \tau(x)$$

be a subdivision of the interval $[\tau_0;\tau_N]$ in the parameter space, which may be equispaced, although this is not necessary. Let $\{\Delta_m\}_{m=1}^{M}$ be a disjoint set of open intervals with its closure covering $[y;x]$, and let $\xi_m$ be a point in the interior of $\Delta_m$ for each $m$. Each $\Delta_m$ should shrink to a point as $N$ increases to infinity. Let the point $(\xi_m;\tau_n)$ be denoted by $\xi_{m_n}$, and set

$$P_B\{\xi_{m_n}\}_{m_n=1}^{M} = \{\xi_m\}_{m=1}^{M}, \quad n = 1,2,\ldots,(N-1),$$

i.e., $P_B$ is the projection from the composite space to the base space. This attaches a copy of the base manifold at each point $\tau_n$ in the parameter space. The corresponding interval, enclosing $P_B(\xi_{m_n})$, will be denoted by $\Delta_{m_n}$, which is an interval on the copy of the base manifold at $\tau_n$. Now construct $M$ trajectories in the interval $[\tau_0;\tau_1]$, all from $(y;\tau(y)) = \xi_{m_0} = \xi_0$ to $\{\xi_{m_1}\}_{m_1=1}^{M}$, and in the intervals $[\tau_{n-1};\tau_n]$ from $\{\xi_{m_{n-1}}\}$ to $\{\xi_{m_n}\}$ for
$n = 2,3,\ldots,(N-1)$, as well as $M$ trajectories from $\{\xi_{m_{N-1}}\}$ to $(x;\tau(x)) = \xi_{m_N} = \xi_N$ in $[\tau_{N-1};\tau_N]$, such that each segment is contained in the rectangle with corners at $\xi_{m_{n-1}}$, $\xi_{m_n}$, $\xi_{(m+1)_{n+1}}$, $\xi_{(m+1)_n}$ for each $n$, and such that $f$ on each segment is sufficiently close to its value on all the trajectories contained in the rectangle. In applications, these segments can be uniquely defined in terms of their end points. This will be assumed to be the case. Thus, $f$ on each of the segments can be expressed as a function of its end points. Unions of all of these segments determine $M^{N-1}$ trajectories from $y$ to $x$. It is a legitimate construction as long as this set approximates its limit set. We shall restrict to the integration of the functions with the property
$$f([a';c']) = f([a';b'])\, f([b';c']) \tag{1.3-a}$$

for each set of points $a', b', c'$ on each trajectory, including the approximating set. Thus, $f$ is essentially a representation of an additive group of real numbers. Supplemented with the continuity, $f$ is determined to be an exponential. In applications, we shall encounter only the exponential functions explicitly. In any case, with this group property, $f[\xi_\mu(\tau)]$ on each of these trajectories $\xi_\mu$ can be expressed as

$$f[\xi_\mu(\tau)] = f[\xi_N,\xi_{m_{N-1}}]\, f[\xi_{m_{N-1}},\xi_{m_{N-2}}] \cdots f[\xi_{m_2},\xi_{m_1}]\, f[\xi_{m_1},\xi_0], \tag{1.3-b}$$

where $\mu = 1,2,\ldots,M^{N-1}$ runs over the finite set of trajectories defined by $m_n = 1,2,\ldots,M$, for each $n = 1,2,\ldots,(N-1)$. Consider the sum
$$Z_{N,M}[\xi_N,\xi_0] = \sum_{m_1,m_2,\ldots,m_{N-1}=1}^{M} f[\xi_\mu(\tau)]\; m(\Delta_{m_1})\, m(\Delta_{m_2}) \cdots m(\Delta_{m_{N-1}}), \tag{1.4}$$

where $m(\Delta_{m_n}) = m(\Delta_m)$ is the measure assigned to the trajectories in the interval $\Delta_{m_n}$ in the base manifold. Let

$$\Im_N[\xi_N,\xi_0] = \lim_{M\to\infty} Z_{N,M}[\xi_N,\xi_0], \tag{1.5}$$

if it exists. From Eqs. (1.3-a)-(1.4), Eq. (1.5) can be expressed as
$$\Im_N[\xi_N,\xi_0] = \int dm(\xi_{N-1})\, f[\xi_N,(\xi_{N-1};\tau_{N-1})]\, f[(\xi_{N-1};\tau_{N-1}),(\xi_{N-2};\tau_{N-2})] \cdots \int\cdots\int dm(\xi_1)\, f[(\xi_2;\tau_2),(\xi_1;\tau_1)]\, f[(\xi_1;\tau_1),\xi_0], \tag{1.6-a}$$

where $\xi_n$, for each $n = 1,2,\ldots,(N-1)$, is the coordinate variable on the base space. Thus, the right side of Eq. (1.6-a) defines an integral on the product space of $(N-1)$ copies of the base space, and $\tau$ is still a free parameter. If the limit exists in Eq. (1.6-a) as $N$ increases to infinity with $\xi_N = (x;\tau(x))$ and $\xi_0 = (y;\tau(y))$ fixed, then it defines the integral of $f$ on the set of all trajectories from $(y;\tau(y))$ to $(x;\tau(x))$, together with the resulting measure, i.e.,

$$\Im[(x;\tau(x)),(y;\tau(y))] = \lim_{N\to\infty} \Im_N[\xi_N,\xi_0]. \tag{1.6-b}$$
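The passage from the sum over trajectories in Eq. (1.4) to the iterated form of Eq. (1.6-a) rests on the multiplicative property (1.3-a). The following is a minimal sketch of that step for finite M and N. The segment weight `segment_weight` (a Gaussian in the end points), the node positions and the interval widths are all illustrative assumptions, not quantities taken from the text.

```python
import itertools
import math

def segment_weight(later, earlier, dtau):
    """Value of f on one segment, as a function of its end points (assumed Gaussian form)."""
    return math.exp(-(later - earlier) ** 2 / (4.0 * dtau))

def brute_force_sum(x, y, nodes, widths, n_steps, dtau):
    """Finite sum in the spirit of Eq. (1.4): one term per choice (m_1, ..., m_{N-1})."""
    total = 0.0
    for choice in itertools.product(range(len(nodes)), repeat=n_steps - 1):
        points = [y] + [nodes[m] for m in choice] + [x]
        term = 1.0
        for earlier, later in zip(points, points[1:]):
            term *= segment_weight(later, earlier, dtau)   # factorization as in Eq. (1.3-b)
        for m in choice:
            term *= widths[m]                              # measure of each interval
        total += term
    return total

def iterated_sum(x, y, nodes, widths, n_steps, dtau):
    """The same sum, evaluated one parameter slice at a time as in Eq. (1.6-a)."""
    count = len(nodes)
    # partial[m]: weighted sum over all pieces from y to nodes[m] on the current slice
    partial = [segment_weight(xi, y, dtau) for xi in nodes]
    for _ in range(n_steps - 2):
        partial = [
            sum(segment_weight(xi, nodes[k], dtau) * widths[k] * partial[k] for k in range(count))
            for xi in nodes
        ]
    return sum(segment_weight(x, nodes[k], dtau) * widths[k] * partial[k] for k in range(count))

nodes = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical points xi_m in the base manifold
widths = [0.5] * len(nodes)           # hypothetical measures m(Delta_m)
direct = brute_force_sum(1.0, -1.0, nodes, widths, n_steps=4, dtau=0.25)
sliced = iterated_sum(1.0, -1.0, nodes, widths, n_steps=4, dtau=0.25)
```

The brute-force version visits all M^(N−1) trajectories, while the sliced version needs only on the order of N·M² operations; the two agree because f factorizes over segments.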
The integrals on the trajectories in an unbounded domain are defined as the limits of the integrals on the bounded domains. As a further extension of the process, one can define integration along all trajectories ending at ( x;τ ( x)) , i.e., from all ( y;τ ( y )) , by summing over ( y;τ ( y )) with appropriate measure, or by continuing the construction to all points ( y;τ ( y )) at infinity. The integral along all trajectories ending at ( x;τ ( x)) will be denoted by ℑ[ x;τ ( x)] , which is of more direct interest for the present. This summation can be introduced in Eq. (1.4) and retained in the subsequent equations to arrive at the same result. In the following we take the summed forms of Eq. (1.4) to (1.6-b). In various applications, an explicit knowledge of the measure is not required. Instead, the existence of the limits, and thus the integrals, and the necessary properties of ℑ[ x;τ ( x)] can be obtained by using the following representation. With τ N −1 = τ , ε = [τ ( x) − τ ] and ξ = ξ N −1 , Eq. (1.6-a) summed over ( y;τ ( y )) yields
$$\Im_N[x;\tau+\varepsilon] = \int dm(\xi)\, f[x,\tau+\varepsilon;\xi,\tau]\; \Im_{N-1}[\xi,\tau]. \tag{1.7-a}$$

Since $\varepsilon \to 0$ as $N \to \infty$, we can express the limit as

$$\lim_{\varepsilon\to 0} \Im[x;\tau+\varepsilon] = \lim_{N\to\infty} \Im_N[x;\tau+\varepsilon] = \lim_{N\to\infty} \int dm(\xi)\, f[x,\tau+\varepsilon;\xi,\tau]\; \Im_{N-1}[\xi,\tau] = \lim_{\varepsilon\to 0} \int dm(\xi)\, f[x,\tau+\varepsilon;\xi,\tau]\; \Im[\xi,\tau]. \tag{1.7-b}$$
An explicit procedure will be described by a specific example in ch. 7. It is not necessary to include the set of all trajectories in the sum. Integrations over other properly characterized sets can be defined by assigning a suitable measure to the set and its approximating sets. From Eq. (1.7-b), we must have that
$$\lim_{\varepsilon\to 0} \int dm(\xi)\, f[x,\tau+\varepsilon;\xi,\tau] = 1 . \tag{1.7-c}$$
Thus, if there is a suitable measure m (ξ ) for a set of trajectories in Eq. (1.7-a), the same measure can be redistributed over a different set without affecting the deduction and the normalization. Remark 1.2. Eq. (1.7-b) defines a one parameter family of integral operators, which propagates the function ℑ[ ] from one point [ξ ,τ ] to another neighboring point,
[ x,τ + ε ] , in its infinitesimal neighborhood. For this reason, the function f in this role is frequently called a propagator. It is clear that the value of ℑ[ξ ,τ ] can be arbitrarily specified, with some restrictions on the class of functions, without affecting Eqs. (1.7-a) and (1.7-b). This arbitrariness can also be used in Eq. (1.4) to specify a function value in a neighborhood of ( y;τ ( y )) to arrive at the same result. Furthermore, Eqs. (1.7-a) and (1.7-b), can be used to deduce Eq. (1.6-b) by repeated operations of the propagator, which is equivalent to generating a Lie group element as a product of the elements in an infinitesimal neighborhood of the identity. This process is equivalent to the summed form of Eq. (1.5) supplemented with Eq. (1.4), which is the limit of a sum over a finite number of trajectories converging to the limit set from all points to ( x;τ ( x)) . Thus, if a propagator satisfying Eq. (1.7-c) exists, ℑ[ x,τ ( x)] can be characterized as a weighted sum of a function f (ξ ;τ ) over the trajectories ending at ( x;τ ( x)) •
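The normalization (1.7-c) and the repeated action of the propagator described in Remark 1.2 can be illustrated numerically. The sketch below assumes a Gaussian kernel with dm(ξ) = dξ as the propagator; this choice is an assumption made purely for illustration, and the explicit procedure used in the text is the one deferred to ch. 7. The function names are hypothetical.

```python
import math

def gaussian_propagator(x, xi, eps):
    """Assumed form of f[x, tau + eps; xi, tau]: a normalized Gaussian kernel."""
    return math.exp(-(x - xi) ** 2 / (4.0 * eps)) / math.sqrt(4.0 * math.pi * eps)

def midpoint_integral(g, lo, hi, n=4000):
    """Composite midpoint rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def normalization(x, eps, half_width=10.0):
    """Approximates the integral appearing in Eq. (1.7-c) for the assumed kernel."""
    return midpoint_integral(
        lambda xi: gaussian_propagator(x, xi, eps), x - half_width, x + half_width
    )

def compose(x, y, eps1, eps2, half_width=10.0):
    """Two successive propagation steps, integrated over the intermediate point."""
    return midpoint_integral(
        lambda xi: gaussian_propagator(x, xi, eps1) * gaussian_propagator(xi, y, eps2),
        -half_width,
        half_width,
    )

unit_mass = normalization(0.3, 0.05)
two_steps = compose(0.4, -0.2, 0.05, 0.07)
one_step = gaussian_propagator(0.4, -0.2, 0.05 + 0.07)
```

For this kernel the normalization integral equals one for every ε, and two steps of sizes ε₁ and ε₂ compose into a single step of size ε₁ + ε₂, which is the group-like behaviour that Remark 1.2 alludes to.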
1.III. VECTOR SPACES

A linear vector space is a set V whose elements, called vectors, are denoted by u, υ, w, …, associated with a field of scalars α, β, …, with two operations, addition and multiplication, defined on them as follows:

1. For all u, υ in V, there is a vector (u + υ) = (υ + u) in V;
2. (u + υ) + w = u + (υ + w);
3. There is a null vector 0 such that u + 0 = 0 + u = u;
4. α(u + υ) = αu + αυ;
5. (α + β)u = αu + βu;
6. (αβ)u = α(βu);
7. There is a unit element 1 in the field such that 1u = u.

Out of various classes of vector spaces over a variety of fields, we shall have occasions to deal with only the following spaces. Even out of them, most of the analysis will be carried out in the setting of $L^2$. The field in each case is the field of real or complex numbers and the operations of addition and multiplication are as usually known. The number of arguments of the functions in case of the function spaces is arbitrary.

1. $V^n$: Space of n-column vectors, including n = ∞.
2. $C^0$: Space of continuous functions.
3. $L^1$: Space of absolutely integrable functions.
4. $L^2$: Space of square integrable functions.
It is straightforward to check that the first three of these spaces satisfy the properties required of them. The requirement that the square integrability of two functions implies the same for their sum, which is required of $L^2$, is not so obvious. Instead of proving it at this stage, we introduce further structures on these spaces. This result will be proven in the process.

A norm is a mapping from a vector space into the set of non-negative real numbers taking u to || u || with the following properties:

1. || u || ≥ 0;
2. || u || = 0 if and only if u = 0;
3. || α u || = | α | || u ||;
4. || u + υ || ≤ || u || + || υ ||.
A linear vector space equipped with a norm is called a normed linear space. Let $u_j$ be the elements of a vector $u$ in $V^n$. It becomes a normed linear space with the supremum norm denoted by $|||\cdot|||$, defined by $|||u||| = \sup_{j=1,\ldots,n} |u_j|$. The space $C^0$ acquires the structure of a normed linear space with the supremum norm $|||u||| = \sup_{x \in D(u)} |u(x)|$. The space $L^1$ is a normed linear space with the norm defined by

$$\|u\| = \int_{D(u)} |u(x)|\, dm(x) .$$
The properties of the norm except 2 are straightforward to check for $L^1$. However, || u || = 0 implies only that u = 0 almost everywhere. Considering all functions differing on the sets of measure zero as equivalent and thus, one vector, $L^1$ acquires the desired structure.

A vector space is called a scalar product or inner product space if to each pair of vectors u and υ, there is a complex number (u, υ), called a scalar or inner product, such that

1. (u, υ) = (υ, u)*, where * denotes the complex conjugate;
2. (u, υ + w) = (u, υ) + (u, w);
3. (u, αυ) = α(u, υ);
4. (u, u) ≥ 0;
5. (u, u) = 0 if and only if u = 0.
This notational convention, with the scalar product linear in its second argument, is more commonly used in the scientific literature; it differs from the one common in the mathematical literature. However, both of the definitions are equivalent. The scalar product can be used to define a norm by $\|u\| = \sqrt{(u,u)}$. To show this, we need the Schwarz inequality, which is of significant interest otherwise as well. Three essentially equivalent proofs are given for their instructive value. The inequality is frequently used to estimate quantities that are not explicitly defined as scalar products but can be treated as such.

Lemma 1.4. (Schwarz inequality) With the scalar product defined as above,

$$|(u,\upsilon)|^2 \le (u,u)\,(\upsilon,\upsilon) .$$

Proof-1. Let $a = (u,u)$, $b = |(u,\upsilon)|$ and $c = (\upsilon,\upsilon)$. There is a complex number $\alpha$ such that $|\alpha| = 1$ and $\alpha(u,\upsilon) = b$. For any real $\beta$, we have

$$(u - \beta\alpha\upsilon,\; u - \beta\alpha\upsilon) = a - 2b\beta + c\beta^2 \ge 0 .$$

If $c = 0$, then $b = 0$, otherwise this inequality is violated for large positive $\beta$, proving the result. If $c > 0$, take $\beta = b/c$, which yields $b^2 \le ac$ •
Proof-2. The result is clearly true if $\upsilon = 0$ (or $u = 0$). Assuming that $\upsilon \ne 0$, let $\phi = \upsilon/\sqrt{(\upsilon,\upsilon)}$ and $w = [u - (\phi,u)\phi]$. Since $(\phi,\phi) = 1$, we have

$$0 \le (w,w) = ([u - (\phi,u)\phi],\,[u - (\phi,u)\phi]) = (u,u) - (u,\phi)(\phi,u),$$

i.e., $(u,\phi)(\phi,u) \le (u,u)$, implying the result •

Proof-3. With symbols as in Proof-2, $u$ can be expressed as $u = [(\phi,u)\phi + \phi']$, where $(\phi,\phi') = (\phi,[u - (\phi,u)\phi]) = 0$. Hence $(u,u) = |(u,\phi)|^2 + (\phi',\phi') \ge |(u,\phi)|^2$, implying the result •
The fact that $\|u\| = \sqrt{(u,u)}$ defines a norm follows from Lemma 1.4. All the properties except the triangle inequality are straightforward to check. This inequality is verified as follows:

$$(u+\upsilon,\,u+\upsilon) = (u,u) + (u,\upsilon) + (\upsilon,u) + (\upsilon,\upsilon) \le (u,u) + 2\sqrt{(u,u)}\sqrt{(\upsilon,\upsilon)} + (\upsilon,\upsilon) = \left[\sqrt{(u,u)} + \sqrt{(\upsilon,\upsilon)}\right]^2 .$$
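The scalar product convention used here (conjugation on the first argument, linearity in the second), the Schwarz inequality and the triangle inequality can be spot-checked on finite column vectors. The sketch below is only an illustration; the vectors `u` and `v` are arbitrary sample values and the function names are hypothetical.

```python
import math

def inner(u, v):
    """Scalar product (u, v) = sum of conj(u_n) * v_n, linear in the second argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def norm(u):
    """Norm induced by the scalar product, ||u|| = sqrt((u, u))."""
    return math.sqrt(inner(u, u).real)

u = [1 + 2j, -0.5j, 3 + 0j]        # arbitrary sample vectors
v = [0.25 - 1j, 2 + 0j, -1 + 1j]
w = [a + b for a, b in zip(u, v)]  # u + v

# Non-negative if the Schwarz inequality of Lemma 1.4 holds for this pair.
schwarz_gap = inner(u, u).real * inner(v, v).real - abs(inner(u, v)) ** 2
# Non-negative if the triangle inequality holds for this pair.
triangle_gap = norm(u) + norm(v) - norm(w)
```

A single pair of vectors does not prove anything, but a negative gap would immediately expose an error in the convention or in the implementation.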
It is straightforward to check that

$$(u,\upsilon) = \int_D u^*(x)\,\upsilon(x)\, dm(x)$$

satisfies all the properties of a scalar product except 5. However, if all functions equal almost everywhere are considered equivalent, then $L^2$ becomes a scalar product space, and hence, a normed linear space.

As is the case with the real number system, a set $D$ in a normed linear space is closed if it contains the limit vector of every convergent sequence in it, i.e., if $\{u_n\}$ is a convergent sequence in $D$, there is a vector $u$ in $D$ such that $\lim_{n\to\infty} \|u_n - u\| = 0$. An open set $D$ can be extended to its closure, a closed set $\bar{D}$, by including all of its limit points. The set $D$ is termed dense in $\bar{D}$. In general, $D$ is dense in $D'$ if $\bar{D}$ contains $D'$. The Cauchy criterion enables one to determine if a sequence $\{u_n\}$ is convergent. In a complete space, it is convergent if it is a Cauchy sequence, i.e., if $\lim_{n,m\to\infty} \|u_n - u_m\| = 0$. A closed or complete normed linear vector space is called a Banach space.
Consider the spaces $V^n$, $C^0$, $L^1$ and $L^2$ introduced earlier. Completeness of $V^n$ follows from the completeness of the real and complex number systems and $C^0$ is the closure of the set of all polynomials by the Weierstrass approximation theorem (Lemma 1.1). The other two spaces are not complete if the Riemann integration is used to define the norms, but they are complete if Lebesgue's construction is used, as indicated in the following example.

Example 1.4. Consider $L^1([0;1], dx)$, i.e., the space of absolutely integrable functions on $[0;1]$ with respect to the measure defined by $dm(x) = dx$. If the integration is defined in the Riemann sense, then it is not a Banach space for the following reason. Consider the function $\tilde{f}$ introduced in Example 1.2, which is zero at all irrationals and one at all rationals in the interval $[0;1]$. It is the limit of a Cauchy sequence $\{f_n\}$, but it is not Riemann integrable, and hence does not belong to the space so defined. However, if the Lebesgue definition of integration is assumed, then it follows from the Beppo-Levi theorem (Theorem 1.1) that $L^1([0;1], dx)$ is a Banach space. The same argument applies to $L^2$ •

The proofs of convergence and existence require completeness of the underlying spaces. Among others, this is one reason why Lebesgue's extension of the concept of integral is crucial to carry out the necessary analysis.

A scalar product space, which is also a Banach space, is called a Hilbert space. The terms Banach and Hilbert spaces are at times used only for the infinite dimensional spaces. In case of the finite dimensional spaces, the complications resulting from the convergence questions associated with infinite dimensionality do not arise. However, in other respects, they can be treated as the subclasses of the respective infinite dimensional spaces. For the present, these terms will be used for the finite dimensional spaces also. Further, most of the analysis will be carried out in the Hilbert spaces, denoted by H.

A scalar product can be defined on $V^n$ as well: $(u,\upsilon) = \sum_{n=1}^{N} u_n^* \upsilon_n$, where $N$ can be finite or infinite with the equality assumed to hold in the limit. In fact this scalar product can also be defined in the integral notation by taking $dm(x) = 1$ at the integer values of $x$ and zero otherwise. In this way, $V^n$ acquires the Hilbert space structure, which will be denoted by $L^2(N)$. With respect to the supremum norm, $V^n$ is still a Banach space but not a Hilbert space.

A sequence $\{u_n\}$ convergent in a Banach space with respect to the norm, i.e., if $\lim_{n\to\infty} \|u_n - u\| = 0$, is called strongly convergent to $u$. If for each $\upsilon$ in a Hilbert space H, $\lim_{n\to\infty} (\upsilon, u_n - u) = 0$, then it is called weakly convergent to $u$. Weak convergence can be defined in a general Banach space also. This requires additional concepts and therefore, it will be considered later. This is a weaker form of convergence, as shown in Example 1.5. We need the following result, which finds a wide range of other applications as well.

Lemma 1.5. (Riemann-Lebesgue lemma) With an absolutely integrable $f$,

$$\lim_{\alpha\to\infty} \int_{D(I)} e^{i\alpha x} f(x)\, dx = 0 .$$

This result is applicable for integrals on infinite domains also •
Example 1.5. We show that sin(nx) converges weakly to zero in $L^2([0;1], dx)$ but not strongly. The squared norm of the vector defined by sin(nx) is equal to $1/2 - \sin(2n)/(4n)$, so the norm tends to $1/\sqrt{2}$ as $n \to \infty$, and hence the sequence does not converge strongly to zero. For weak convergence, first notice that a square integrable function on a finite interval is also absolutely integrable and hence, $L^2([0;1], dx)$ is contained in $L^1([0;1], dx)$. The result now follows from Lemma 1.5 •

Strong convergence implies that $\|u_n\| \to \|u\|$, which follows from

$$\|u_n\| \le \|u\| + \|u_n - u\| \to \|u\| \quad (n\to\infty),$$

together with the same inequality with $u_n$ and $u$ interchanged. Supplemented with $\|u_n\| \to \|u\|$, weak convergence implies strong convergence, as shown in Proposition 1.4.

Proposition 1.4. Let $\{u_n\}$ be a weakly convergent sequence with limit $u$ such that $\|u_n\| \to \|u\|$. Then $\{u_n\}$ converges strongly to $u$.

Proof. The result follows from

$$\|u_n - u\|^2 = \|u\|^2 - (u_n,u) - (u,u_n) + \|u_n\|^2 \to 0 \quad (n\to\infty) \;•$$
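The difference between weak and strong convergence in Example 1.5 can be seen numerically. The following is a minimal sketch: the test vector `v(x) = x(1 − x)` is an arbitrary assumption, only real-valued functions are treated, and the integrals are approximated by a midpoint rule rather than evaluated in the Lebesgue sense.

```python
import math

def midpoint_integral(g, lo=0.0, hi=1.0, n=20000):
    """Composite midpoint rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def norm_sin(n):
    """Norm of sin(n x) in L^2([0;1], dx)."""
    return math.sqrt(midpoint_integral(lambda x: math.sin(n * x) ** 2))

def v(x):
    """Hypothetical fixed test vector (an assumption for illustration)."""
    return x * (1.0 - x)

def pairing(n):
    """Scalar product (v, sin(n x)) for the real-valued test vector v."""
    return midpoint_integral(lambda x: v(x) * math.sin(n * x))

norm_50 = norm_sin(50)
pairing_5 = pairing(5)
pairing_50 = pairing(50)
```

The norms stay close to 1/√2 while the scalar products with the fixed vector shrink as n grows, which is the behaviour Lemma 1.5 predicts.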
If not stated otherwise, the convergence in a Banach space, including the Hilbert, will mean strong convergence. Let u be in H . It is clear that if (υ , u ) = 0 for all υ in H , then u = 0 for one can take u for υ . Furthermore, the following stronger result is valid:
Lemma 1.6. If (υ, u) = 0 for all υ in a set D dense in H, then u = 0.

Proof. If u ≠ 0 then for each ε > 0, there is a υ(ε) in D such that || u − υ(ε) || ≤ ε / || u ||, since D is dense in H. Consequently,

$$(u,u) = (\upsilon(\varepsilon),u) + (u-\upsilon(\varepsilon),u) = (u-\upsilon(\varepsilon),u) \le |(u-\upsilon(\varepsilon),u)| \le \|u-\upsilon(\varepsilon)\|\,\|u\| \le \varepsilon .$$

Hence, from Proposition 1.1, (u, u) = 0, implying that u = 0, which is a contradiction, since u ≠ 0 •
A set of vectors $\{u_n\}_{n=1}^{N}$ in a vector space is linearly independent if and only if none of them can be expressed as a linear combination of some or all of the others. It is straightforward to show that a set is linearly independent if and only if $\sum_{n=1}^{N} \alpha_n u_n = 0$ implies that each $\alpha_n = 0$. The maximum number of linearly independent vectors in a space is called its dimension. If there is no maximum number, then the space is infinite dimensional. The dimension of $V^N$, considered as a Hilbert space or a Banach space with respect to the supremum norm, is $N$, which can be infinite. The other three spaces are infinite dimensional. For example, consider the spaces of respective functions on the interval $[0;1]$. The set $\{x^n\}_{n=0}^{\infty}$ forms a linearly independent set in each of the spaces.

A set of nonzero vectors $\{u_n\}_{n=1}^{\infty}$ is called a basis if and only if every vector in the space can be expressed as a linear combination of some of its members. Unless otherwise stated, the basis set will be assumed to be linearly independent. Also, if infinitely many members are needed to express a vector, then the equality holds in the convergence sense, i.e., $\{u_n\}_{n=1}^{\infty}$ is a basis if for every $u$ in a Banach space, there are scalars $\alpha_n$ such that $\lim_{N\to\infty} \|u - \sum_{n=1}^{N} \alpha_n u_n\| = 0$. This relation will still be expressed as $u = \sum_{n=1}^{\infty} \alpha_n u_n$. A set of vectors is complete in the space if it forms a basis. A vector with unit norm is called normalized. Every nonzero vector in a Banach space can be normalized by dividing it by its norm. If the members of a basis have unit norm, it is called a normalized basis. There are spaces without any basis. We will encounter only the spaces with bases, which will be assumed without further mention. All finite linear combinations of a basis clearly form a dense set in the space.

The set of column vectors with the $n$-th entry equal to 1 and others equal to zero, for $n = 1,2,\ldots$, forms a normalized basis in $V^N$ with respect to either norm. The set $\{x^n\}_{n=0}^{\infty}$ forms a basis in the other three spaces on any finite interval.

Corollary 1.3. Let $\{\varphi_n\}$ be a basis in H. A vector u = 0 if and only if $(\varphi_n,u) = 0$ for all n.

Proof. The only if part is obvious. If $(\varphi_n,u) = 0$ for all n, then (υ, u) = 0 for all υ in the set D formed by all finite linear combinations of $\{\varphi_n\}$. Since D is dense in H, the result follows from Lemma 1.6 •
Remark 1.3. (Gram-Schmidt procedure) Let $\{u_n\}_{n=1}^{\infty}$ be a basis in H. If $(u_n,u_m) = \delta_{nm}$, where the Kronecker delta $\delta_{nm} = 1$ for m = n and $\delta_{nm} = 0$ otherwise, then it is called an orthonormal basis. This basis is clearly normalized. An orthonormal basis can be constructed from an arbitrary basis $\{u_n\}_{n=1}^{\infty}$ by the Gram-Schmidt process as follows. Let $\varphi_1 = u_1/\|u_1\|$, $\phi_n = u_n - \sum_{i=1}^{n-1} \varphi_i(\varphi_i,u_n)$ and $\varphi_n = \phi_n/\|\phi_n\|$. The set $\{\varphi_n\}_{n=1}^{\infty}$ forms an orthonormal basis •

A subset E of H is called a linear manifold if it contains all linear combinations of its elements. If it is also closed, then it is called a subspace of H. A linear manifold of finite dimensions is always closed, hence always a subspace. Every element u of H can be expressed as u = f + g with f in a subspace E and g orthogonal to every element in E. This result is known as the decomposition theorem. The set of all such g, i.e., of all elements of H orthogonal to E, is denoted by $E^C$ and called the orthogonal complement of E.

Remark 1.4. Composite spaces can be formed from the original spaces. Among them the direct sums and the tensor products play a significant role, particularly in case of the Hilbert spaces encountered in scientific models. To avoid complexities, these spaces can be conveniently determined by the orthonormal bases $\{\varphi_{1n}\}$ in $H_1$ and $\{\varphi_{2n}\}$ in $H_2$ with the respective scalar products denoted by $(\cdot,\cdot)_1$ and $(\cdot,\cdot)_2$. The basis in the direct sum $H = H_1 \oplus H_2$, with the vectors denoted by $u = u_1 \oplus u_2$, is the union of the two basis sets $\{\varphi_{1n},\varphi_{2m}\}$. The scalar products for the vectors in the same space are the same as in the original spaces and the products like $(\varphi_{1n},\varphi_{2m})$ are set equal to zero. The spaces $H_1$ and $H_2$ are thus made into the orthogonal complements of each other in H. The basis in the tensor product $H = H_1 \otimes H_2$, with vectors denoted by $u = u_1 \otimes u_2$, is given by the products of the bases, $\{\varphi_{1n}\varphi_{2m}\}$, with scalar product

$$(\varphi_{1n}\varphi_{2m},\,\varphi_{1n'}\varphi_{2m'}) = (\varphi_{1n},\varphi_{1n'})_1\,(\varphi_{2m},\varphi_{2m'})_2 .$$

The spaces are determined by the bases as the completion of the sets of all of their linear combinations •
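The Gram-Schmidt procedure of Remark 1.3 can be sketched for finite column vectors as follows. This is a minimal illustration in the classical form of the procedure; the sample vectors, the tolerance used to detect linear dependence and the function names are assumptions introduced here.

```python
import math

def inner(u, v):
    """Scalar product (u, v) = sum of conj(u_n) * v_n, linear in the second argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def norm(u):
    """Norm induced by the scalar product."""
    return math.sqrt(inner(u, u).real)

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal set spanning the same space as the given independent vectors."""
    basis = []
    for u in vectors:
        # phi_n = u_n - sum_i varphi_i (varphi_i, u_n)
        phi = list(u)
        for e in basis:
            coeff = inner(e, u)
            phi = [p - coeff * x for p, x in zip(phi, e)]
        length = norm(phi)
        if length < tol:
            raise ValueError("input vectors are not linearly independent")
        # varphi_n = phi_n / ||phi_n||
        basis.append([p / length for p in phi])
    return basis

sample = [[1, 1, 0], [1, 0, 1j], [0, 1, 1]]   # arbitrary linearly independent vectors
orthonormal = gram_schmidt(sample)
```

In floating-point arithmetic the classical form can lose orthogonality for nearly dependent inputs; subtracting the projections from the running remainder instead of from the original vector is a common remedy, at the cost of departing slightly from the formula as written.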
Chapter 2
2. OPERATORS IN VECTOR SPACES

2.I. OPERATORS IN BANACH SPACES

Operators in the vector spaces are defined essentially in the same manner as the functions. We shall restrict to the case of X and Y being the Banach spaces, and mostly the case Y = X. An operator A is a map from its domain D(A) in X to its range R(A) in Y, i.e., an assignment of each u in D(A) to υ = Au in R(A). If A(αu + βυ) = (αAu + βAυ), then the operator A is called linear; otherwise, it is nonlinear. The norm || A || of A is defined as sup || Au ||, supremum being taken over all normalized vectors in D(A), if it exists. If || A || exists, A is called bounded; otherwise, A is unbounded. The null operator 0 maps every vector into the null vector, and the identity operator 1 maps every vector into itself, which are the examples of bounded operators with norms equal to zero and one, respectively. Two operators A and B are equal if D(A) = D(B) and if Au = Bu for each u in D(A). If D(A) is contained in D(A′) and for each u in D(A), Au = A′u, then A′ is an extension of A and A, a restriction of A′. An operator A is densely defined if D(A) is dense in the space. We have

Theorem 2.1. A densely defined bounded operator A can be extended to the entire space to its closure $\bar{A}$ such that $\|\bar{A}\| = \|A\|$.

Proof. Since D(A) is dense in X, for each u in X, there is a Cauchy sequence $\{u_n\}$ in D(A) converging to u. Boundedness of A then implies that

$$\|A(u_n - u_m)\| \le \|A\|\,\|u_n - u_m\| \to 0 \quad (n,m\to\infty),$$
i.e., $\{Au_n\}$ is a Cauchy sequence, which in view of the completeness of Y, must converge to a υ in Y. Set $\bar{A}u = \upsilon$. The extension $\bar{A}$ is uniquely defined, for if there are two sequences $\{u_n\}$ and $\{u_n'\}$ both converging to u, then

$$\|Au_n' - \upsilon\| = \|A(u_n' - u_n) + Au_n - \upsilon\| \le \|A(u_n - u_n')\| + \|Au_n - \upsilon\| \le \|A\|\,\|u_n - u_n'\| + \|Au_n - \upsilon\| \to 0,$$

and hence $\{Au_n'\}$ also converges to υ. The result $\|\bar{A}\| = \|A\|$ will follow from the inequalities $\|\bar{A}\| \ge \|A\|$ and $\|\bar{A}\| \le \|A\|$. Since D(A) is contained in $D(\bar{A})$, $\|\bar{A}\| \ge \|A\|$ is obvious. The converse follows from

$$\|\bar{A}u\| = \lim_{n\to\infty} \|\bar{A}u_n\| = \lim_{n\to\infty} \|Au_n\| \le \lim_{n\to\infty} (\|A\|\,\|u_n\|) = \|A\|\,\|u\|,$$

for each nonzero u. We have used the fact that the sequence of the norms of a strongly convergent sequence converges to the norm of the limit •

While all densely defined bounded operators can be closed, not all unbounded operators admit closure. The difficulty arises that for sequences $\{u_n\}$ and $\{u_n'\}$ both converging to u, $\{Au_n\}$ and $\{Au_n'\}$ may both be convergent but have different limits. If an operator admits closure, it is called closable. If $\bar{A} = A$, the operator is called closed. For any closable operator T such that $\bar{T} = A$, D(T) is called the core of A. For a set D to be a core of A, it is necessary but not sufficient that D be dense in D(A). However, if A has a bounded inverse then D contained in D(A) is a core of A if and only if AD is dense in the range space.

Unless stated otherwise, a bounded operator will be assumed to have the whole space for its domain and hence to be closed. An unbounded operator will be assumed to be densely defined. In case of the unbounded operators, a number of operations require justification, which will be considered as needed.

It is straightforward to verify that for two properly defined bounded operators || αA || = | α | || A ||, || A + B || ≤ || A || + || B || and || AB || ≤ || A || || B ||. It follows that $\|A^n\| \le \|A\|^n$. An operator A is continuous if for each ε > 0, there is a δ(ε) > 0 such that || A(u − υ) || < ε whenever || u − υ || < δ(ε). It can be shown that a linear operator is continuous if and only if it is bounded.
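The norm inequalities above can be illustrated with matrices acting on finite column vectors equipped with the supremum norm. The sketch below relies on a standard fact that is not stated in the text and is used here as an assumption: for this norm, the operator norm of a matrix equals its largest absolute row sum. The matrices and function names are arbitrary illustrations.

```python
def sup_norm(u):
    """Supremum norm of a column vector."""
    return max(abs(x) for x in u)

def apply(A, u):
    """Action of the matrix A on the vector u."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]

def op_norm(A):
    """Operator norm induced by the supremum norm: largest absolute row sum (assumed fact)."""
    return max(sum(abs(a) for a in row) for row in A)

def matmul(A, B):
    """Matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matadd(A, B):
    """Matrix sum A + B."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1.0, -2.0], [0.5, 3.0]]    # arbitrary sample operators
B = [[0.0, 1.0], [-1.0, 2.0]]

product_norm = op_norm(matmul(A, B))
sum_norm = op_norm(matadd(A, B))
cube_norm = op_norm(matmul(A, matmul(A, A)))
attained = sup_norm(apply(A, [1.0, 1.0]))   # a normalized vector on which the supremum is reached
```

Here the supremum defining || A || is attained on the normalized vector whose entries carry the signs of the row with the largest absolute sum, which is why `attained` equals `op_norm(A)` for this matrix.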
An operator A is invertible if there is an operator B, called its left inverse, such that BAu = u for each u in D(A); if ABu = u, then B is the right inverse; if B is both the left and the right inverse, it is termed the inverse, denoted by $A^{-1}$. The inverse of an operator can be characterized as follows.

Lemma 2.1. A is invertible if and only if Au = 0 implies that u = 0. If A is invertible, $(A^{-1})^{-1} = A$.

Proof. If A is invertible, then $u = A^{-1}(Au)$, and thus Au = 0 implies that u = 0. If Au = 0 implies that u = 0, then u ≠ 0 implies that υ = Au ≠ 0. For each υ in the range of A, the vector u with Au = υ is uniquely defined, for if $Au_1 = \upsilon = Au_2$, then $A(u_1 - u_2) = 0$ and hence $u_1 = u_2$. Thus $A^{-1}$ is defined on the range of A by $A^{-1}\upsilon = u$. Since there is a one to one correspondence between the vectors u and υ, $A^{-1}$ as defined above is also the right inverse, i.e., $A^{-1}A = AA^{-1} = 1$, which also implies that $(A^{-1})^{-1} = A$ •

Consider the operator-valued function $(z - A)^{-1}$ of a complex variable z, called the resolvent of A. All z for which $(z - A)^{-1}$ exists as a bounded operator constitute the resolvent set of A. All the other values of z constitute the spectrum σ(A) of A. Values of z for which $(z - A)^{-1}$ does not exist define the point or discrete spectrum $\sigma_p(A)$ of A. The remainder of σ(A) is termed the essential spectrum, which can be further separated into the absolutely continuous part $\sigma_c(A)$ and the singular continuous part. The inverse $(z - A)^{-1}$ exists on the essential spectrum but as an unbounded or discontinuous operator. We shall encounter only the operators for which the singular continuous part is empty or almost empty.

If a number λ is in $\sigma_p(A)$, then by definition $(\lambda - A)^{-1}$ does not exist. Hence from Lemma 2.1, there is a vector u ≠ 0 such that (λ − A)u = 0, i.e., Au = λu, providing an equivalent characterization of $\sigma_p(A)$. The points in $\sigma_p(A)$ and the vectors u are called the eigenvalues and the corresponding eigenvectors, respectively. If the multiplicity of an eigenvalue is equal to one, it is called a simple eigenvalue. An eigenvalue of higher multiplicity, i.e., one with more than one linearly independent eigenvector, is termed a degenerate eigenvalue. The space spanned by all the eigenvectors corresponding to an eigenvalue is called the associated eigenspace, which in case of a simple eigenvalue is one-dimensional.

Projection operators P, defined by $P^2 = P$, e.g., 0 and 1, and their pairs play a significant role in analysis. The projection on the eigenspace of an operator is called its eigenprojection corresponding to the eigenvalue. The range of the eigenprojection is spanned by the set of the eigenvectors. Some useful properties of the projection operators are proven in Lemma 2.2. For part of the proof, we need the concept of convergence in the uniform operator topology, which will be seen to be of a wide range of applicability. A sequence of operators $\{A_n\}$ is said to converge uniformly to A if and only if $\|A_n - A\| \to 0$ as $n \to \infty$. This is the strongest form of convergence for the sequences of operators. Weaker forms of convergence will be introduced later.

Lemma 2.2. (i) A projection P has two eigenvalues, zero and one, and if P ≠ 0 then || P || ≥ 1. (ii) If P and P′ is a pair of projections such that || P − P′ || < 1, then the dimensions of their ranges are equal.

Proof. (i) For each u, we have that (P − 1)Pu = 0, which implies the first part. For the second, if P ≠ 0 we have || P || > 0 and since $\|P\| = \|P^2\| \le \|P\|^2$, it follows that || P || ≥ 1.
(ii) [Kato, 1980, ch. I, sec. 4.6] The operator U′[P, P′] = P′P + (1 − P′)(1 − P) maps the range PX of P into P′X and U′[P′, P] maps P′X into PX. The operator $Q = (P - P')^2$ commutes with P, P′ and hence, with U′[P, P′] and U′[P′, P]. Since $\|Q\| = \|(P - P')^2\| \le \|P - P'\|^2 < 1$, the binomial expansion

$$B = \sum_{n=0}^{\infty} \binom{-1/2}{n} (-Q)^n$$

converges absolutely, i.e.,

$$\|B\| \le \sum_{n=0}^{\infty} \left|\binom{-1/2}{n}\right| \|Q\|^n = (1 - \|Q\|)^{-1/2} .$$

Consequently, the sequence of its partial sums converges uniformly to an operator bounded by $(1 - \|Q\|)^{-1/2}$. Also, the sum B satisfies the relation $B^2 = (1 - Q)^{-1}$, as can be checked by multiplying $B^2$ by $(1 - Q)$ together with term by term summation. Hence, the limit operator, still denoted by B, is equal to $(1 - Q)^{-1/2}$.
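The binomial expansion for B can be checked on a small matrix. The sketch below is an illustration under stated assumptions: `Q` is an arbitrary 2×2 matrix with norm less than one rather than one built from a pair of projections, the series is truncated after a fixed number of terms, and the largest absolute row sum is used as the matrix norm. The coefficients are generated from the ratio (2n − 1)/(2n) between successive terms of the expansion of (1 − q)^(−1/2).

```python
def matmul(A, B):
    """Matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    """The n by n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def row_sum_norm(A):
    """Largest absolute row sum, a submultiplicative matrix norm."""
    return max(sum(abs(a) for a in row) for row in A)

def binomial_inverse_sqrt(Q, terms=200):
    """Partial sum of the series sum_n |binom(-1/2, n)| Q^n for (1 - Q)^(-1/2)."""
    size = len(Q)
    total = identity(size)
    power = identity(size)
    coeff = 1.0
    for n in range(1, terms):
        power = matmul(power, Q)
        coeff *= (2 * n - 1) / (2 * n)   # ratio of successive series coefficients
        total = [[total[i][j] + coeff * power[i][j] for j in range(size)] for i in range(size)]
    return total

Q = [[0.2, 0.1], [0.05, 0.3]]            # arbitrary sample with norm 0.35 < 1
B = binomial_inverse_sqrt(Q)
one_minus_Q = [[identity(2)[i][j] - Q[i][j] for j in range(2)] for i in range(2)]
check = matmul(matmul(B, B), one_minus_Q)   # should be close to the identity
bound = (1.0 - row_sum_norm(Q)) ** -0.5
```

The product B²(1 − Q) reproduces the identity to rounding accuracy, and the norm of B stays below (1 − ||Q||)^(−1/2), in line with the estimate in the proof.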
Foundations and Applications of Variational and Perturbation Methods
27
Let U = U′[P′, P]B = BU′[P′, P]. It can be checked by standard manipulations that U is invertible with U^{-1} = U′[P, P′]B = BU′[P, P′], and that P′ = U^{-1}PU, i.e., P and P′ are similar to each other. In particular, their ranges are in a one to one correspondence, mapped into each other by U and U^{-1}. Hence, the dimensions of the ranges are equal •
The resolvent (z − [A + B])^{-1} of the sum of the two operators A and B, with D[A + B] being the intersection of D(A) and D(B), which will be assumed to be dense, satisfies the second resolvent equation:

$$ \begin{aligned} (z - [A + B])^{-1} &= (z - A)^{-1} + (z - A)^{-1} B \, (z - [A + B])^{-1} \\ &= (z - A)^{-1} + (z - [A + B])^{-1} B \, (z - A)^{-1} , \end{aligned} \qquad (2.1) $$

where the roles of A and B can be interchanged. The first resolvent equation results when B is a number. Eq. (2.1) can be iterated to obtain a finite expansion with remainder or a uniformly convergent C. Neumann expansion, which we demonstrate for the case of A = 0 with natural extension to Eq. (2.1).
Lemma 2.3. For a bounded B and | z^{-1} | < (1/|| B ||), the resolvent is given by the infinite series

$$ (z - B)^{-1} = z^{-1} (1 - z^{-1}B)^{-1} = z^{-1} \left[ 1 + z^{-1}B + z^{-2}B^2 + \dots + z^{-n}B^n + \dots \right] . $$

Proof. The proof is essentially the same as the corresponding part of Lemma 2.2 (ii). To be specific, let C denote the series on the right side. For | z^{-1} | < (1/|| B ||) we have || z^{-1}B || < 1, and hence,

$$ \| C \| \le | z^{-1} | \left[ 1 + \| z^{-1}B \| + \| z^{-1}B \|^2 + \dots + \| z^{-1}B \|^n + \dots \right] \le | z^{-1} | \left[ 1 - \| z^{-1}B \| \right]^{-1} . $$

Since the series is absolutely and hence, uniformly convergent, its sum C defines a bounded operator. The fact that C = (z − B)^{-1} can now be seen by operating with (z − B)
from left and right sides and by term by term summation • Properties of the operator valued functions, e.g., the resolvent, can be studied in terms of the associated functionals. A functional F is an operator from a general Banach space to the
Banach space of complex numbers. The space of all linear functionals on a space defines its dual space. For each F, there is a unique element f in the dual space that defines F, expressed as F[u] = f(u). The result of Lemma 1.6 is extendible to the Banach spaces, i.e., if F[u] = f(u) = 0 as f varies over a dense set in the dual space, then u = 0. Functions υ([z − B]^{-1}u) of z, for each u in the space and υ in its dual, are analytic on the resolvent set of B. For each z in the spectrum of B, there must be a pair of vectors u and υ such that υ([z − B]^{-1}u) has a singularity, i.e., a pole or a cut. We have
Corollary 2.1. A bounded operator has at least one point in its spectrum.
Proof. It follows from Lemma 2.3 that || (z − B)^{-1} || ≤ | z^{-1} | [1 − || z^{-1}B ||]^{-1} → 0 as z → ∞. If υ([z − B]^{-1}u) does not have a singularity anywhere in the complex plane, it is entire and hence, from Liouville's theorem (Proposition 1.2, Corollary 1.1), it is equal to zero. Since this result holds for all u in the space and υ in its dual, we have that (z − B)^{-1} = 0, which is a contradiction •
A sequence of operators is said to converge strongly on a set if for each u in the set || (A_n − A)u || → 0 as n → ∞, and the sequence is said to converge weakly if for each u in the set and each υ in the dual space, υ([A_n − A]u) → 0 as n → ∞. It is straightforward to check that the
strong limit of a sequence of uniformly bounded operators { An } is bounded and if { An } is a Cauchy sequence in the strong operator topology, it has a bounded limit. Compact or completely continuous operators constitute a particularly useful type of bounded operators. An operator A is compact if for each bounded sequence of vectors,
|| un ||≤ M , { Aun } contains a Cauchy subsequence. A compact operator is bounded, otherwise there would be a sequence with || un ||≤ 1 and || Aun ||→ ∞ , which is not possible. However, the converse is not true, e.g., the identity operator in an infinite dimensional space is bounded but not compact. If the range of an operator is finite dimensional, it is termed an operator of finite rank or degenerate. Since a bounded sequence in a finite dimensional space always contains a convergent subsequence, an operator of finite rank is compact. A compact operator maps a weakly convergent sequence into a strongly convergent one, which is equivalent to the above characterization. The following characterization is frequently useful in determining if an operator is compact. Lemma 2.4. If a sequence of compact operators {K n } converges uniformly to the operator K , then K is compact, and an operator compact on a dense set admits a compact closure.
Proof. If the limit operator K is not compact, then there is a normalized sequence {u_n} such that || K(u_n − u_{n′}) || ≥ ε for some ε > 0 and all n ≠ n′. Since || K − K_m || → 0 as m → ∞, there is a value M such that || K − K_m || ≤ ε/4 for each m ≥ M. Since K_m is compact, {K_m u_n} contains a convergent subsequence, which will still be denoted by {n}. Thus, there is an N(m) such that || K_m(u_n − u_{n′}) || ≤ ε/4 for each n, n′ ≥ N(m). Consider,

$$ \begin{aligned} \| K(u_n - u_{n'}) \| &= \| K_m(u_n - u_{n'}) + (K - K_m)(u_n - u_{n'}) \| \\ &\le \| K_m(u_n - u_{n'}) \| + \| (K - K_m)(u_n - u_{n'}) \| \\ &\le \| K_m(u_n - u_{n'}) \| + \| K - K_m \| \, \| u_n - u_{n'} \| \\ &\le \| K_m(u_n - u_{n'}) \| + 2 \| K - K_m \| . \end{aligned} $$

Let m ≥ M and let n, n′ ≥ N(m). We have that || K(u_n − u_{n′}) || ≤ 3ε/4, which is a contradiction.
The remainder of the proof is essentially the same, as follows. A densely defined compact K admits a bounded closure K̄ from Theorem 2.1. To see that K̄ is compact, let {u_n} be a bounded sequence of vectors. For each n and for each ε > 0, there is a vector u′_n in the domain of K such that || u′_n − u_n || < ε. Consider

$$ \| \bar{K}(u_n - u_m) \| \le \| K(u'_n - u'_m) \| + \| \bar{K}[(u_n - u'_n) + (u'_m - u_m)] \| . $$

Earlier estimates show that the left side is a Cauchy sequence •
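As a numerical illustration of the first part of Lemma 2.4, the following minimal sketch works in a finite section of the sequence space, an assumption made only so that the computation is finite. The limit is taken to be the diagonal operator K with entries 1, 1/2, 1/3, ..., and the finite-rank operators K_m keep its first m entries, so that || K − K_m || should equal the largest discarded entry 1/(m+1). The names N, truncation and operator_norm are illustrative and do not appear in the text.

```python
import numpy as np

N = 200  # size of the finite section standing in for the full space (assumption)
diag = 1.0 / np.arange(1, N + 1)
K = np.diag(diag)  # diagonal operator with entries 1, 1/2, 1/3, ...

def truncation(m):
    """Finite-rank operator K_m keeping the first m diagonal entries of K."""
    d = diag.copy()
    d[m:] = 0.0
    return np.diag(d)

def operator_norm(A):
    """Operator norm of a matrix, i.e. its largest singular value."""
    return np.linalg.norm(A, 2)

# Uniform (norm) distance between K and its finite-rank truncations
errors = {m: operator_norm(K - truncation(m)) for m in (1, 5, 20, 100)}
for m, err in errors.items():
    print(m, err, 1.0 / (m + 1))
```

The printed distances decrease to zero with m, which is the uniform convergence assumed in the lemma; each K_m has finite rank and is therefore compact.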
2.II. Operators in Hilbert Spaces
Since a Hilbert space is a Banach space, all of the considerations of the last section are applicable for this case as well. Also, a number of the following results have their counterparts for more general Banach spaces. The fact that a Hilbert space is equipped with a scalar product provides it a richer structure resulting in a number of simpler and useful properties. Functionals in the Hilbert spaces admit particularly simple and useful representations as a consequence of the Riesz representation theorem. A linear functional F is defined by F[αu + βυ] = αF[u] + βF[υ], and F is bounded if | F[u] | ≤ M_F || u || for all u. We take M_F to be the least upper bound, which is called the norm || F || of F. It is clear that the scalar product (w, u) for a fixed w and variable u defines a functional. Conversely, we have
Theorem 2.2. (Riesz representation theorem) For every bounded linear functional F[u] in a Hilbert space H, there is a vector w such that F[u] = (w, u) with || w || = || F || = M_F.
Proof. Since the least upper bound of | F[u] | on the set of normalized vectors u is equal to M_F, one can select a sequence of normalized vectors {u_n} such that | F[u_n] | → M_F as n → ∞. By multiplying u_n by a complex number of unit magnitude, we can ensure that F[u_n] are real and non-negative. Straightforward calculation shows that for each pair of vectors u, υ in H,

$$ \| u + \upsilon \|^2 + \| u - \upsilon \|^2 = 2 \| u \|^2 + 2 \| \upsilon \|^2 , $$

and thus,

$$ \| u_n - u_m \|^2 = 4 - \| u_n + u_m \|^2 \le 4 - \frac{1}{M_F^2} \left( F[u_n] + F[u_m] \right)^2 \xrightarrow[n, m \to \infty]{} 4 - \frac{4 M_F^2}{M_F^2} = 0 , $$

where we have also used the fact that | F[u_n] + F[u_m] | ≤ M_F || u_n + u_m ||. Thus, {u_n} is a Cauchy sequence and hence converges to a normalized vector ū. Since

$$ | F[\bar{u}] - F[u_m] | = | F[\bar{u} - u_m] | \le M_F \| \bar{u} - u_m \| \xrightarrow[m \to \infty]{} 0 , $$

F[u_m] converges to F[ū], and hence F[ū] = M_F. Set w = M_F ū. Clearly, || w || = || F || = M_F, and the relation F[ū] = (w, ū) is true. For each vector u such that F[u] = 0, we have with an arbitrary constant κ,

$$ M_F^2 = (F[\bar{u}])^2 = (F[\bar{u} - \kappa u])^2 \le M_F^2 \| \bar{u} - \kappa u \|^2 = M_F^2 \left[ 1 - \kappa^* (u, \bar{u}) - \kappa (\bar{u}, u) + | \kappa |^2 (u, u) \right] , $$

implying that −κ*(u, ū) − κ(ū, u) + | κ |^2 (u, u) ≥ 0. Taking κ = (u, ū)/(u, u), we have that | (u, ū) |^2 ≤ 0, and hence (u, ū) = 0. Thus, F[u] = (w, u) for the null vectors of F as well.
The assertion follows by observing that each vector u can be expressed as a linear combination of ū and a null vector u_0, i.e., u = u_0 + κū, with κ = F[u]/M_F = F[u]/F[ū] •
Theorem 2.2 shows that a Hilbert space is self-dual. Let D′ be the set of all vectors υ such that for each υ there is a unique w depending only on υ such that (υ, Au) = (w, u) as u varies over D(A). The operator Â, the adjoint of A, is defined on D(Â) = D′ by Âυ = w.
Remark 2.1. A densely defined bounded A always has an adjoint, for the following reason. For each fixed υ in H, F[u] = (υ, Au) is a linear functional bounded by (|| A || || υ ||). Hence from Theorem 2.2, there is a w such that F[u] = (υ, Au) = (w, u), defining Â by Âυ = w on the whole of H. Uniqueness follows from the fact that u varies over a dense set, i.e., if there is a w′ such that (w − w′, u) = 0 for all such u, then w = w′ from Lemma 1.6. It is easily seen that Â is closed and the adjoint of Â is a closed extension of A. Since || w || = || F || ≤ || A || || υ ||, we have that || Â || ≤ || A ||, and the same estimate applied to the adjoint of Â, which extends A, yields || Â || = || A ||. Also, a closed A with the whole of H in its domain is bounded [Riesz and Sz-Nagy, 1955; P. 306]. A densely defined unbounded operator also has an adjoint but it may be trivial with only the null vector in its domain. However, adjoints of densely defined operators are closed and the adjoints of adjoints provide closed extensions of the original operators •
The following two results are examples of the significant role the adjoints play in understanding the properties of the operators, as will be seen repeatedly.
Proposition 2.1. The range of a densely defined operator with invertible adjoint is dense.
Proof. If not, there is a vector υ ≠ 0 in H such that (υ, Au) = 0 for all u in D(A), implying that υ is in D(Â) and Âυ = 0. Since Â^{-1} exists, this implies that υ = 0 (Lemma 2.1), which is a contradiction •
Theorem 2.3. Let {T_n} and {S_n} be the sequences of uniformly bounded operators such that T_n → T and Ŝ_n → Ŝ strongly, respectively, and let {K_n} be a sequence of operators converging uniformly to an operator K from H to H. Further, let {K_n} be a sequence of compact operators or let K be compact. Then {T_n K_n S_n} converges uniformly to the compact operator (TKS).
Proof. If {K_n} are compact, then K is compact from Lemma 2.4. Therefore it is sufficient to prove the result with K being compact. If the result is not true, then there is a sequence of normalized vectors {u_n} such that || (T_n K_n S_n − TKS)u_n || ≥ ε for an ε > 0. Majorize the left side as

$$ \begin{aligned} \| (T_n K_n S_n - TKS) u_n \| &= \| T_n (K_n - K) S_n u_n + (T_n - T) K S_n u_n + TK (S_n - S) u_n \| \\ &\le \| T_n (K_n - K) S_n u_n \| + \| (T_n - T) K S_n u_n \| + \| TK (S_n - S) u_n \| . \end{aligned} $$

Let the common bound of {T_n} and {S_n} be denoted by M. Then

$$ \| T_n (K_n - K) S_n u_n \| \le M^2 \| K_n - K \| \xrightarrow[n \to \infty]{} 0 . $$

Since || S_n u_n || ≤ M, there is a subsequence and a vector υ, such that || K S_n u_n − υ || → 0 as n → ∞. This combined with the strong convergence of {T_n} to T implies that

$$ \begin{aligned} \| (T_n - T) K S_n u_n \| &\le \| (T_n - T) \upsilon \| + \| (T_n - T)(K S_n u_n - \upsilon) \| \\ &\le \| (T_n - T) \upsilon \| + 2M \| K S_n u_n - \upsilon \| \xrightarrow[n \to \infty]{} 0 . \end{aligned} $$

Since || (S_n − S)u_n || ≤ 2M and TK is compact, there is a vector w such that || TK(S_n − S)u_n − w || → 0 as n → ∞. Consequently,

$$ \begin{aligned} \| TK (S_n - S) u_n \|^2 &= (TK (S_n - S) u_n , w) + (TK (S_n - S) u_n , TK (S_n - S) u_n - w) \\ &= (u_n , (\hat{S}_n - \hat{S}) \hat{K} \hat{T} w) + (TK (S_n - S) u_n , TK (S_n - S) u_n - w) \\ &\le \| (\hat{S}_n - \hat{S}) \hat{K} \hat{T} w \| + \| TK (S_n - S) \| \, \| TK (S_n - S) u_n - w \| \xrightarrow[n \to \infty]{} 0 . \end{aligned} $$

We have used the strong convergence of Ŝ_n to Ŝ. Since each of the terms majorizing || (T_n K_n S_n − TKS)u_n || converges to 0, it follows that || (T_n K_n S_n − TKS)u_n || → 0 as n → ∞, which is a contradiction •
Compactness of {K_n} or K is essential in Theorem 2.3. Otherwise, take S_n = K_n = K = 1 to conclude that T_n → T uniformly, i.e., that the strong convergence implies the uniform convergence, which is untrue.
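That strong convergence does not imply uniform convergence can be seen on the projections P_n onto the first n coordinates of a sequence space, which converge strongly to the identity while || P_n − 1 || = 1 for every n. The sketch below is only an illustration under stated assumptions: it uses a finite section of size N, so it mirrors the infinite-dimensional situation for n < N only, and the vector u with components 1/k as well as the names P, strong_err and uniform_err are chosen for illustration.

```python
import numpy as np

N = 400  # finite section standing in for the sequence space (assumption)
u = 1.0 / np.arange(1, N + 1)  # a fixed vector with components 1/k
identity = np.eye(N)

def P(n):
    """Orthogonal projection onto the first n coordinates."""
    d = np.zeros(N)
    d[:n] = 1.0
    return np.diag(d)

ns = (10, 50, 200)
# Strong convergence: the error on the fixed vector u shrinks with n
strong_err = [np.linalg.norm((P(n) - identity) @ u) for n in ns]
# Uniform convergence would need the operator norm to shrink; it stays at 1
uniform_err = [np.linalg.norm(P(n) - identity, 2) for n in ns]

for n, s, v in zip(ns, strong_err, uniform_err):
    print(n, s, v)
```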
If ÂA = AÂ, the operator is called normal and if also Â = A^{-1}, it is called unitary, which constitutes a particularly useful class of bounded, normal operators. A unitary operator U preserves the norm: || Uu || = || u ||. Conversely, a norm preserving operator of H onto H is unitary. Norm preserving operators defined on proper subspaces of H are called partially isometric. If A = Â, then A is called self-adjoint. If the closure of a densely defined operator is self-adjoint, the operator is termed essentially self-adjoint. If for each u, υ in D(A), (υ, Au) = (Aυ, u), A is termed Hermitian symmetric or symmetric. A symmetric bounded operator with H for its domain is easily seen to be self-adjoint. If a symmetric bounded operator is only densely defined, the adjoint of its adjoint provides its self-adjoint extension. Self-adjoint projections or the ortho-projections, constituting a subclass of bounded, self-adjoint operators, play a significant role in analysis and are useful mainly due to the following property:
Proposition 2.2. If P^2 = P, P ≠ 0 and P̂ = P then || P || = 1.
Proof. From Lemma 2.2 (i), we have that || P || ≥ 1. If P̂ = P we also have

$$ \| P u \|^2 = (P u, P u) = (\hat{P} P u, u) = (P^2 u, u) = (P u, u) \le \| P u \| \, \| u \| , $$

implying that || Pu || ≤ || u ||, i.e., || P || ≤ 1 •
A normalized eigenvector u defines an ortho-eigenprojection: Pυ = u(u, υ). For a degenerate eigenvalue of multiplicity N′, the ortho-eigenprojection is defined by

$$ P \upsilon = \sum_{n=1}^{N'} \varphi_n (\varphi_n , \upsilon) , $$

where {ϕ_n}, n = 1, ..., N′, forms an arbitrary orthonormal basis in the associated eigenspace.
As indicated above, symmetry for densely defined closed bounded operators is sufficient for essential self-adjointness. For unbounded operators, the situation is quite different. A densely defined unbounded symmetric operator, although it has an adjoint, may not be self-adjoint even if closed, as seen from the following example.
Example 2.1. [Stone, 1932; ch. X] Let the operator A be defined by Au = i(du/dx) = iu̇ with D(A) being the set of all absolutely continuous functions u in H = L^2([0;1], dx) such that u(0) = u(1) = 0. Integration by parts yields that

$$ (\upsilon , i \dot{u}) - (i \dot{\upsilon} , u) = i \left[ \upsilon^*(x) u(x) \right]_0^1 . \qquad (2.2) $$

The derivatives u̇, υ̇ exist almost everywhere. Since the right side of Eq. (2.2) is equal to zero for each u, υ in D(A), it follows that (υ, Au) = (Aυ, u), and hence A is a symmetric operator.
We show below that D(A) is dense in H. A vector υ is in the range R(A), i.e., υ = Au with u in D(A), if and only if υ is orthogonal to the vector w defined by the function equal to 1 on [0;1]. The only if part follows from

$$ (w, \upsilon) = \int_0^1 \upsilon(x) \, dx = i \int_0^1 \dot{u}(x) \, dx = i [u(1) - u(0)] = 0 . $$

For the if part, consider a given υ orthogonal to w. The solution u of iu̇ = υ with u(0) = 0 is given by u(x) = −i ∫_0^x υ(x′) dx′. It follows from the orthogonality of υ and w that u(1) = 0, and hence u is in D(A).
Now, if D(A) is not dense in H, then there is a non-zero vector f such that (f, u) = 0 for all u in D(A), i.e.,

$$ 0 = -i (f, u) = -i \int_0^1 f^*(x) u(x) \, dx = i \int_0^1 F^*(x) \dot{u}(x) \, dx = (F, Au) , $$

where F(x) is the appropriate indefinite integral of f(x). We have used integration by parts. Since the only vectors orthogonal to R(A) are constant multiples of w, F(x) must be a constant. Since f(x) is the derivative of F(x), which is well defined, f(x) = 0 almost everywhere, leading to a contradiction.
To calculate Â, it follows from Eq. (2.2) that for each absolutely continuous function υ in H, with arbitrary boundary values, there is a vector h such that (υ, Au) = (h, u), which is h = iυ̇. Thus, D(Â) consists of all absolutely continuous functions in H characterized by the set of vectors υ given by

$$ \upsilon(x) = \kappa + \int_0^x w(x') \, dx' , $$

where κ is an arbitrary constant and w is an absolutely continuous, otherwise arbitrary function in H. It is clear that D(Â) properly contains D(A), and hence, Â ≠ A, although the action of the two operators is the same on D(A), i.e., Â is a proper extension of A. It can be seen that the adjoint of Â is A, and hence A is also closed •
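The role of the boundary term in Eq. (2.2) can be checked numerically. The sketch below is a rough illustration rather than part of the argument: it approximates the scalar product on [0;1] by the trapezoidal rule on a uniform grid, and the sample functions sin(πx) and exp(x), like the helper names inner and boundary_term, are assumptions chosen for illustration. The function sin(πx) vanishes at both end points and so lies in D(A), while exp(x) has arbitrary boundary values.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 20001)  # uniform grid on [0, 1]

def inner(f, g):
    """Scalar product (f, g) = integral of conj(f) * g over [0, 1] (trapezoidal rule)."""
    y = np.conj(f) * g
    return np.sum((y[1:] + y[:-1]) / 2) * (x[1] - x[0])

def boundary_term(v, u):
    """Right side of Eq. (2.2): i [v*(x) u(x)] evaluated between 0 and 1."""
    return 1j * (np.conj(v[-1]) * u[-1] - np.conj(v[0]) * u[0])

u, du = np.sin(np.pi * x), np.pi * np.cos(np.pi * x)  # u(0) = u(1) = 0
v, dv = np.exp(x), np.exp(x)                          # no boundary condition

# Left side of Eq. (2.2) for the pair (v, u): the boundary term vanishes
lhs_uv = inner(v, 1j * du) - inner(1j * dv, u)
# Left side for the pair (v, v): the boundary term survives
lhs_vv = inner(v, 1j * dv) - inner(1j * dv, v)

print(lhs_uv, boundary_term(v, u))
print(lhs_vv, boundary_term(v, v))
```

The first pair shows why every absolutely continuous υ is admitted in D(Â) as long as u stays in D(A); the second shows that the symmetry relation fails on the enlarged domain.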
Remark 2.2. Some of the densely defined symmetric operators admit self-adjoint extensions. This topic will be addressed further in the sequel. For now, consider the extension of A of Example 2.1 to A′ obtained by enlarging D(A) to D(A′) by altering the boundary condition to u(0) = u(1), i.e., D(A′) consists of all vectors (κ + u), where κ is an arbitrary constant and u is an arbitrary vector in D(A). It follows from Eq. (2.2) that Â′u = A′u for each u in D(Â′) = D(A′) and hence, Â′ = A′. Similarly, it can be seen that the extension by the boundary condition u(0) = κu(1) with | κ | = 1 yields the other self-adjoint extensions of the same operator •
Symmetry even without self-adjointness endows the operator with the following properties:
Proposition 2.3. Eigenvalues of a symmetric operator are all real; the eigenvectors u, u′ corresponding to two distinct eigenvalues λ, λ′, respectively, are orthogonal to each other and there is at least one real z such that (z − A)^{-1} exists.
Proof. With λ and u being an eigenvalue and a corresponding normalized eigenvector, Au = λu implies that λ* = (u, Au)* = (Au, u) = (u, Au) = λ. For two distinct eigenvalues, we have that (u′, Au) = λ(u′, u) and (u, Au′) = λ′(u, u′), implying that (u, Au′)* = λ′(u, u′)*, yielding that (u′, Au) = λ′(u′, u), i.e., (λ − λ′)(u′, u) = 0, implying that (u′, u) = 0. Since an orthonormal set of vectors can at most be countable, there is at least one real value of z, in fact uncountably many, such that (z − A)u = 0 has no non-zero solution and hence, (z − A)^{-1} exists from Lemma 2.1 •
The distinction between self-adjoint and symmetric but not self-adjoint operators is a subtle manifestation of the boundary conditions and has a profound impact on the analysis as well as the results. Self-adjointness simplifies the analysis. More significantly, this property is directly related to various physical phenomena resulting in their formulations in terms of the self-adjoint operators. In particular, the spectral points and spectral states of self-adjoint operators have satisfactory interpretations in terms of their physical counterparts. On the other hand, some spectral points and states of non-self-adjoint operators, even if symmetric, have no physical counterparts making them unsuitable to represent the physical systems of interest. However, while the symmetry is usually straightforward to check, self-adjointness is a rather delicate property and requires more effort to establish, and achieve. Characterizations stated in Theorem 2.4 indicate the technicalities as well as provide useful criteria for self-adjointness. As indicated earlier, if the adjoint of the adjoint is self-adjoint, the operator is
called essentially self-adjoint. No distinction will be made between the essentially self-adjoint and self-adjoint operators unless necessary.
Theorem 2.4. (Riesz and Nagy, 1971, pp. 320-329; Kato, 1980, pp. 270-272) A symmetric operator A is self-adjoint
I) if and only if its Cayley transform U(A) = (i − A)(i + A)^{-1} is unitary;
II) if and only if Â has no non-real eigenvalues;
III) if the range of (z − A) is dense for some non-real z;
IV) only if the range of (z − A) is dense for each non-real z •
The Cayley transform is defined also for merely symmetric operators but it is then only partially isometric. The dimensions of the complements of its domain and the range are called the deficiency indices of the operator. It is clear that an operator is essentially self-adjoint if and only if both of its deficiency indices are equal to zero.
Numerical range of an operator is defined as the set of values (u, Au) with all normalized vectors u. The spectrum of a self-adjoint operator is a subset of the closure of its numerical range, which is contained in the real line. The numerical range of a merely symmetric operator is also contained in the real line but its spectrum contains non-real values also, which is one of the contents of Theorem 2.4.
A Hermitian symmetric transformation A of a space of finite dimension N, which is equivalent to a matrix, admits the representation

$$ A = \sum_n \lambda_n P_n , \qquad (2.3) $$

where λ_n are its eigenvalues and P_n are the corresponding eigenprojections. The limits of summation depend on the multiplicities of the eigenvalues. The set of projections {P_n} provides a resolution of the identity operator, i.e., Σ_n P_n = 1. Equivalently, with the understanding that {ϕ_n}, n = m, ..., N′, can be taken to be an arbitrary orthonormal basis in an eigenspace corresponding to a degenerate eigenvalue of multiplicity (N′ − m + 1), the set {ϕ_n}, n = 1, ..., N, constitutes an orthonormal basis in the underlying space. This convention will be implicitly assumed unless stated otherwise. These results of algebra admit an extension to the Hilbert spaces as follows.
A spectral family {E_λ} of right continuous projection valued functions of a real variable λ is characterized by
1. E_λ E_μ = E_λ for λ ≤ μ;
2. E_{λ+0} = E_λ;
3. E_{−∞} = 0 and E_∞ = 1.
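For a matrix, the representation of Eq. (2.3) and the three properties of a spectral family can be verified directly. The following sketch assumes a randomly generated Hermitian matrix and builds the rank-one eigenprojections from the eigenvectors returned by numpy.linalg.eigh; the spectral family is taken as the sum of the P_n over eigenvalues not exceeding λ, which makes it right continuous. The names A_rebuilt, identity and E are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
M = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (M + M.conj().T) / 2  # Hermitian symmetric matrix (illustrative choice)

# Real eigenvalues in ascending order, orthonormal eigenvectors as columns
lam, phi = np.linalg.eigh(A)

# Rank-one eigenprojections: P_n v = phi_n (phi_n, v)
P = [np.outer(phi[:, n], phi[:, n].conj()) for n in range(N)]

def E(x):
    """Spectral family: sum of P_n over eigenvalues lam_n <= x."""
    out = np.zeros((N, N), dtype=complex)
    for n in range(N):
        if lam[n] <= x:
            out = out + P[n]
    return out

A_rebuilt = sum(l * p for l, p in zip(lam, P))  # right side of Eq. (2.3)
identity = sum(P)                               # resolution of the identity

print(np.allclose(A_rebuilt, A), np.allclose(identity, np.eye(N)))
print(np.allclose(E(lam[1]) @ E(lam[3]), E(lam[1])))  # property 1
```

Below the smallest eigenvalue E vanishes and from the largest eigenvalue onwards it equals the identity, which is the finite-dimensional form of the third property.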
The last property, which entails the completeness of the spectral functions, may be satisfied with one or both end points being finite. Only the intervals containing all the points of its increase are relevant. Normal operators admit spectral representation of the type of Eq. (2.3). We state the result, the spectral theorem for unitary operators.
Theorem 2.5. (Spectral theorem for unitary operators) Every unitary transformation U admits a spectral decomposition

$$ U = \int_{-0}^{2\pi} e^{i\theta} \, dE_\theta $$

where the spectral family {E_θ} over the interval [0;2π] is uniquely determined by U. Conversely, the right side defines a unitary operator •
Several proofs for the spectral theorem for self-adjoint operators are available in literature. This theorem will be used frequently. A proof due to von Neumann obtains this result from Theorem 2.5 in view of Theorem 2.4 (I).
Theorem 2.6. (Spectral theorem for self-adjoint operators) Every self-adjoint operator A from H to H admits the representation

$$ A = \int_{-\infty}^{\infty} \lambda \, dE_\lambda $$

where the spectral family {E_λ} is uniquely determined by A. The projections {E_λ} commute with A, i.e., E_λ A = A E_λ, as well as with all bounded operators which commute with A. Conversely, a spectral family defines a self-adjoint operator by the right side of the equation •
While both sides of the result of the spectral theorem are well-defined for all vectors for bounded operators, for unbounded operators they are understood to be valid on D(A) characterized by the condition that || Au || is finite. Also, the domain of integration is required only to include the spectrum of A, instead of the entire real line. Further, in general the statement of the theorem is valid for the scalar products (υ, Au) for all u and υ. If the integral on the right side is defined as the strong limit, then the statement is applicable to the vectors (Au). The stated form is customary with this inherent meaning; in the strict sense, it is valid when the integral exists as the uniform limit.
It is clear from the properties of the spectral family that the identity operator is resolved by {E_λ}, which we state as
Corollary 2.2. Each u in H can be expressed as

$$ u = \int_{-\infty}^{\infty} dE_\lambda u , $$

where {E_λ} is the spectral family associated with a self-adjoint operator •
Spectral decomposition of partially isometric and symmetric operators is hindered for lack of completeness. It is now established that these operators in general do not possess representations similar to their self-adjoint and unitary counterparts [Stone, 1932; ch. X].
The spectral family for operators with pure point spectra is given by

$$ E_\lambda = \sum_{\lambda_n \le \lambda} P_n . $$

The domain of integration in this case reduces to any interval containing all the eigenvalues and the spectral theorem reduces to

$$ A = \sum_{n=1}^{\infty} \lambda_n P_n , \qquad (2.4) $$

with Eq. (2.3) being a special case for the transformations of the finite dimensional spaces. The following result follows from the definitions and Corollary 2.2.
Corollary 2.3. Let {ϕ_n}, n = 1, 2, ..., be the normalized set of eigenvectors of a self-adjoint operator with pure point spectrum. Then each u in H can be expressed as

$$ u = \sum_{n=1}^{\infty} \varphi_n (\varphi_n , u) \; • $$
For the operators with spectrum close to completely discrete, e.g., the compact operators, the results require slight adjustment. The spectrum of a compact operator consists, apart from possibly the point zero, of at most countably many eigenvalues, each with finite multiplicity, with no limit points except zero. The point zero can be an eigenvalue or in the essential spectrum. A symmetric compact operator is self-adjoint and admits the expansion given by Eq. (2.4). The set {P_n} augmented by the orthoprojection on the subspace corresponding to zero, if it is in the spectrum, resolves the identity. If zero is an eigenvalue, the subspace is the corresponding eigenspace. If zero is in the resolvent set, then the operator is finite dimensional, i.e., a matrix, and the corresponding set of projections resolves the identity in the finite-dimensional space.
Repeated operations of A on both sides of Eq. (2.4) show that the result of the spectral theorem extends to the powers of A and hence, to polynomials p(A) in A, as long as the series converges defining the domain of the operator. Furthermore, if [p(λ_n)]^{-1} exists, the equation p(A)u = υ is solved by u = Σ_n [p(λ_n)]^{-1} P_n υ. Thus [p(A)]^{-1} also exists and is defined by the right side, e.g.,

$$ (z - A)^{-1} = \sum_n (z - \lambda_n)^{-1} P_n . $$
This extends the spectral theorem to rational functions. Similarly, the result is extendible to include functions f(z, λ) of a complex variable z and real variable λ, defined by the convergent power series, e.g., the exponentials. Such functions will be termed elementary. The equality of the spectral theorem can be extended to include the functions which can be approximated by elementary functions, e.g., continuous functions. To avoid unnecessary generality, such cases will be considered as they arise. Unless otherwise stated, the results will be stated for the elementary functions only.
The proofs of the above results require convergence of the series and the results are valid in the sense of the convergence. Also, the results can be expressed in terms of the spectral function instead of projection operators. It is remarkable that these results can be extended to all self-adjoint operators, by virtue of the fact that the basic property required to prove them, which is P_n P_m = δ_nm P_n, where δ_nm is the Kronecker delta symbol, together with appropriate convergence properties, has its counterpart for more general spectra, stated below.
Proposition 2.4. Let {E_λ} be the spectral family associated with a self-adjoint or unitary operator A. Then

$$ [E_b - E_a]^2 = [E_b - E_a] \quad \text{and} \quad [E_b - E_a] \, [E_d - E_c] = 0 \quad \text{for } a < b < c < d \; • $$

This result, which is the orthonormality property of the projections [E_b − E_a] on the sets [a; b], follows from the properties of the spectral function. The results obtained for the operators with discrete spectra can now be seen to be valid by expressing the integrals as the limits of sums, operations being valid in view of the convergence of series. We summarize these results in the following generalized form of the spectral theorem, stated for self-adjoint operators, which has its counterpart for normal operators also.
Theorem 2.7. (Spectral theorem for functions of self-adjoint operators) Let f(z, λ) be a function of a complex variable z, and λ varying on the real line. Then the function f(z, A) of a self-adjoint operator A with spectral family {E_λ} admits the representation

$$ f(z, A) = \int_{-\infty}^{\infty} f(z, \lambda) \, dE_\lambda $$

with the domain of definition defined by the vectors u for which

$$ \| f(z, A) u \|^2 = \int_{-\infty}^{\infty} | f(z, \lambda) |^2 \, \| dE_\lambda u \|^2 $$

is finite •
Remark 2.3. Theorem 2.7 also shows that σ(f(A)) = f(σ(A)), which is essentially the statement of the spectral mapping theorem •
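In finite dimensions, Theorem 2.7 and Remark 2.3 reduce to evaluating f on the eigenvalues. The sketch below is a minimal illustration, assuming a random real symmetric matrix; the helper f_of_A and the particular values of τ and z are illustrative choices, not taken from the text. It forms the unitary operator exp[−iAτ], the resolvent (z − A)^{-1} and the square of A from the same eigen-decomposition.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2  # real symmetric, hence self-adjoint (illustrative choice)

lam, phi = np.linalg.eigh(A)  # eigenvalues and orthonormal eigenvectors (columns)

def f_of_A(f):
    """f(A) = sum_n f(lam_n) P_n, with P_n the projection on the n-th eigenvector."""
    return (phi * f(lam)) @ phi.conj().T

tau = 0.7
U = f_of_A(lambda x: np.exp(-1j * tau * x))  # exp(-i A tau)
z = 0.3 + 2.0j
R = f_of_A(lambda x: 1.0 / (z - x))          # resolvent (z - A)^{-1}
A2 = f_of_A(np.square)                       # A squared

print(np.allclose(R @ (z * np.eye(4) - A), np.eye(4)))  # R inverts (z - A)
print(np.allclose(U.conj().T @ U, np.eye(4)))           # U preserves the norm
print(np.allclose(np.linalg.eigvalsh(A2), np.sort(lam ** 2)))  # spectral mapping
```

The last line compares the spectrum of f(A) with f applied to the spectrum of A for f(λ) = λ^2.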
The following result, which is not valid in general for non-self-adjoint operators, even matrices, plays a crucial role in variational methods.
Corollary 2.4. The resolvent of a self-adjoint operator is bounded with its bound given by

$$ \| (z - A)^{-1} \| \le \frac{1}{d[z, \sigma(A)]} , $$

where d[z, σ(A)] is the minimum distance between z and λ as λ varies over σ(A), termed the distance between z and σ(A).
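The bound of Corollary 2.4, and its failure without self-adjointness, can be observed on two by two matrices. The sketch below is illustrative only: the two matrices, the point z and the helper names resolvent_norm and distance_to_spectrum are assumptions made for the example. For the symmetric matrix the norm of the resolvent coincides with the reciprocal of the distance to the spectrum, while for the nilpotent matrix it exceeds it by a large factor.

```python
import numpy as np

def resolvent_norm(A, z):
    """Operator norm of (z - A)^{-1}."""
    n = A.shape[0]
    return np.linalg.norm(np.linalg.inv(z * np.eye(n) - A), 2)

def distance_to_spectrum(A, z):
    """d[z, sigma(A)] for a matrix: minimum of |z - lambda| over its eigenvalues."""
    return np.min(np.abs(z - np.linalg.eigvals(A)))

z = 0.5 + 0.5j

A_sym = np.array([[2.0, 1.0], [1.0, -1.0]])     # self-adjoint
A_nil = np.array([[0.0, 10.0], [0.0, 0.0]])     # not self-adjoint; spectrum is {0}

for name, A in (("self-adjoint", A_sym), ("non-self-adjoint", A_nil)):
    print(name, resolvent_norm(A, z), 1.0 / distance_to_spectrum(A, z))
```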
Proof. From Theorem 2.7, we have

$$ \| (z - A)^{-1} u \|^2 = \int_{-\infty}^{\infty} \frac{1}{| z - \lambda |^2} \, \| dE_\lambda u \|^2 \le \frac{1}{(d[z, \sigma(A)])^2} \, \| u \|^2 , $$

implying the result •
The spectral function of a self-adjoint operator with pure point spectrum can be obtained as follows.

$$ \frac{1}{2\pi i} \oint dz \, (z - A)^{-1} = \frac{1}{2\pi i} \sum_m \oint dz \, (z - \lambda_m)^{-1} P_m = P_n , $$

where the integration is along a closed contour enclosing the single eigenvalue λ_n. If the contour encloses an interval, then this integral yields the corresponding spectral function. Extension of this result to general self-adjoint operators is not as straightforward as that of Theorem 2.7 except for the spectral function corresponding to an isolated part of the spectrum. However, its following extension is similar to its counterpart with the operators with pure point spectrum.
Theorem 2.8. With arbitrary u and υ in H and E_λ being the spectral function of a self-adjoint operator A, we have

$$ \left( u , \tfrac{1}{2} \left[ (E_\mu + E_{\mu - 0}) - (E_\nu + E_{\nu - 0}) \right] \upsilon \right) = \frac{1}{2\pi i} \lim_{\varepsilon \to 0} \int_{C(\mu, \nu, \varepsilon)} dz \, (u , (z - A)^{-1} \upsilon) , $$

where C(μ, ν, ε) is the contour consisting of a straight line from (μ + iε) to (ν + iε) and another straight line from (ν − iε) to (μ − iε). The spectral function is uniquely determined
by its value at negative infinity, which is normally set equal to zero, since only the difference is significant •
Theorem 2.8 states that the right side yields the spectral function at all of its points of continuity, by letting ν approach −∞. At the points of its discontinuities, the left side is the mean value of its left and right limits. If the end points of the contour approach points in the resolvent set, it can be closed. Also, the contour can be deformed within the region of analyticity of the resolvent. If u = υ, the result reduces to one with the imaginary value of left side being equal to the integral along a straight line from (μ + iε) to (ν + iε), which is known as the Stieltjes inversion formula. The proofs of both versions are obtained by expressing the integral on right side as

$$ \int_{C(\mu, \nu, \varepsilon)} dz \, (u , (z - A)^{-1} \upsilon) = \int_{C(\mu, \nu, \varepsilon)} dz \int \frac{d_\lambda (u , E_\lambda \upsilon)}{(z - \lambda)} . $$

The next step is to interchange the order of integration by Fubini's theorem (Theorem 1.4). Then integration with respect to z reduces it to a difference of the Poisson type integrals (Stone, 1932; Lemma 5.2, ch. V) with integrals in terms of the inverse tangent (Wall, 1948) and the result follows by taking the limit.
One parameter family of unitary operators exp[−iAτ], where A is self-adjoint, is closely related with its resolvent. It follows from the spectral theorem (Theorem 2.7) that exp[−iAτ] preserves the norm and hence, it is unitary. A simple application of Fubini's theorem (Theorem 1.4) yields

$$ \begin{aligned} \int_0^{\infty} d\tau \, e^{-(iA + z)\tau} &= \int_0^{\infty} d\tau \int_{-\infty}^{\infty} dE_\lambda \, e^{-(i\lambda + z)\tau} = \int_{-\infty}^{\infty} dE_\lambda \int_0^{\infty} d\tau \, e^{-(i\lambda + z)\tau} \\ &= \int_{-\infty}^{\infty} \frac{1}{z + i\lambda} \, dE_\lambda = [z + iA]^{-1} , \end{aligned} \qquad (2.5) $$

for Re.(z) > 0. Similarly, for Re.(z) < 0, integrating over the negative real line yields [z − iA]^{-1}. Further, an application of the Lebesgue dominated convergence theorem (Theorem 1.3) shows that the family of vectors u(τ) = exp[−iAτ] u(0) satisfies the equation

$$ \left[ i \frac{\partial}{\partial \tau} - A \right] u(\tau) = 0 , \qquad (2.6) $$

where u(0) and hence u(τ), are in D(A).
Remark 2.4. As is the case with the spaces, direct sums and tensor products of the operators also define composite operators. With operators A_1 and A_2 in H_1 and H_2, their
direct sum and tensor product are defined as $(A_1 \oplus A_2)[u_1 \oplus u_2] = (A_1 u_1 \oplus A_2 u_2)$ and $(A_1 \otimes A_2)[u_1 \otimes u_2] = (A_1 u_1 \otimes A_2 u_2)$, respectively •

2.III. Forms in Hilbert Spaces

In this section, we provide a limited description of the forms in Hilbert spaces, necessary for some analysis of the variational methods. The material is available in detail elsewhere [Kato, 1980]. To avoid lengthy duplications and unnecessary generalizations, we focus mainly on the constructions and results that are directly relevant for the present applications.
A map $T[u,\upsilon]$ from $H \times H$, i.e., the Cartesian product of $H$ with its copy, to the space of complex numbers will be called a form. The form $T[u,\upsilon]$ thus assigns a complex number to a pair of vectors $u$, $\upsilon$, both in $H$. The form $T[u,\upsilon]$ is called sesqui-linear if it is linear in $\upsilon$, i.e., $T[u, \alpha\upsilon + \beta\upsilon'] = \alpha T[u,\upsilon] + \beta T[u,\upsilon']$, and anti-linear in $u$, i.e., $T[\alpha u + \beta u', \upsilon] = \alpha^* T[u,\upsilon] + \beta^* T[u',\upsilon]$; the form is called bilinear if it is linear in both arguments. A symmetric form is defined by $T[u,\upsilon] = (T[\upsilon,u])^*$. The form is bounded if $|T[u,\upsilon]| \le M_T \|u\| \, \|\upsilon\|$ with a constant $M_T$, which can be taken to be the least upper bound, termed its norm; otherwise it is unbounded. The numerical range of a form is the set of values taken by $T[u,u]$ with all normalized $u$. The form is sectorial with vertex at $\kappa'$ if its numerical range is contained in the sector defined by $|\arg(T[u,u] - \kappa')| \le \theta < \pi/2$. With $T$ being a linear operator in $H$, $(Tu,\upsilon)$ clearly defines a sesqui-linear form. A theory parallel to that of the linear operators can be developed for the forms. The following result, which is obtained by the same method as the result of Remark 2.1, proves to be useful in the treatment of both the bounded and the unbounded forms.
Proposition 2.5. Each densely defined bounded sesqui-linear form admits the representation $T[u,\upsilon] = (Tu,\upsilon)$, where $T$ is a linear operator bounded by $M_T$.
Proof. For a fixed $u$, $T[u,\upsilon]$ defines a linear functional $F[\upsilon]$ bounded by $M_T \|u\|$. Hence from Theorem 2.2, there is a $w$ such that $T[u,\upsilon] = (w,\upsilon)$, with $\|w\| \le M_T \|u\|$. Since $\upsilon$ varies over a dense set, $w$ is unique. The operator $T$ is
defined by $Tu = w$. Since $\|Tu\| = \|w\| \le M_T \|u\|$, we have that $\|T\| \le M_T$. Linearity is obvious •
The forms arising for the present will be generated by operators. Sectorial forms will be treated in ch. 4 by considering the imaginary part of the form as a perturbation of its real part, which is symmetric. In this section, we consider symmetric forms arising out of semi-bounded symmetric operators $A$ with domain $D(A)$ in $H$, which may or may not be self-adjoint. An operator bounded above can be reduced to one bounded below by considering its negative, and the case of an operator bounded below can be reduced to that of a positive definite operator by the addition of a constant. Thus, without loss of generality $A$ can be assumed to be positive definite, i.e., $A \ge \kappa^2 > 0$. The form $(Au,\upsilon) = (u,A\upsilon)$ is clearly sesqui-linear and symmetric on $D(A)$. The following considerations are applicable to more general symmetric forms with $(u,A\upsilon)$ replaced with $\mathrm{Re}\,T[u,\upsilon] = \frac{1}{2}\left(T[u,\upsilon] + (T[\upsilon,u])^*\right)$.
It can be easily checked that $(u,\upsilon)_+ = (u,A\upsilon)$ defines a scalar product on $D(A)$. The closure $\bar{D}(A)$ of $D(A)$ with respect to $(u,\upsilon)_+$ is obtained by including the limit points of all convergent, i.e., the Cauchy, sequences in it. It is clear that $\bar{D}(A)$ contains $D(A)$. In fact $\bar{D}(A)$ coincides with $D(A^{1/2})$, for $\|A^{1/2}u\|^2 = (u,Au) = \|u\|_+^2$. Since $\bar{D}(A)$ is complete with respect to $(\,,)_+$, $\|\ \|_+$, it is a Hilbert space, $\bar{D}(A) = H_+$, with respect to the new scalar product and norm. We have
Proposition 2.6. For each $u$ in $H_+$,
i. $\|u\|_+ \ge \kappa \|u\|$;
ii. $H_+$ is contained in $H$;
iii. if a sequence $\{u_n\}$ converges in $H_+$, i.e., if $\|u_n - u\|_+ \to 0$, then it converges in $H$, i.e., $\|u_n - u\| \to 0$.
Proof. (i) follows from the fact that $\|u\|_+^2 = (u,Au) \ge \kappa^2 \|u\|^2$; (ii) follows from (i); and (iii) follows from (i) by replacing $u$ by $(u_n - u)$ •
Remark 2.5. For each $u$ in $H$ and each $\upsilon$ in $H_+$,
\[
|(u,\upsilon)| \le \|u\| \, \|\upsilon\| \le \frac{1}{\kappa} \|u\| \, \|\upsilon\|_+ ,
\]
from Proposition 2.6(i), and hence $(u,\upsilon)$ is a linear functional on $H_+$ bounded by $\|u\|/\kappa$. It follows from Theorem 2.2 that there is a unique $w$ in $H_+$ such that $(u,\upsilon) = (w,\upsilon)_+$ and $\|w\| \le (\|w\|_+/\kappa)$. The operator $B$, defined by $Bu = w$, has $H$ for its domain and its range is in $H_+$. Since $\|Bu\| = \|w\| \le (\|w\|_+/\kappa) \le (\|u\|/\kappa^2)$, $B$ is bounded in $H$. It can be seen to be symmetric and invertible by similar arguments [Riesz and Nagy, 1971, pp. 330-334]. The operator $B^{-1}$ provides an extension of $A$, called the Friedrichs extension. For a non-self-adjoint but symmetric $A$, $B^{-1}$ is the minimal extension and for a self-adjoint $A$, $A = B^{-1}$. We shall assume $A$ to be self-adjoint, by replacing the original operator by its Friedrichs extension, if need be. Let $z \ne 0$ be in the resolvent set of $A$, i.e., the equation
\[
(z - A) f = g \tag{2.7-a}
\]
has a unique solution $f$ in $D(A)$ and hence in $H_+$, for each $g$ in $H$. Let $B_+$ be the restriction of $A^{-1}$ to $H_+$. By operating with $A^{-1}$, this equation reduces to its equivalent form:
\[
(1 - zB_+) f = -Bg . \tag{2.7-b}
\]
Consequently, there is a one to one correspondence between the points $z \ne 0$ in the resolvent set of $A$ and $1/z$ in the resolvent set of $B_+$. The resolvents are related by
\[
(z - A)^{-1} g = -(1 - zB_+)^{-1} B g . \tag{2.7-c}
\]
The point zero is covered by $A^{-1} = B$. This also implies one to one correspondence between their spectral points. This technical point has significant implications for some analysis of the variational methods, usually required to establish that the distinction between $B$ and $B_+$ is inconsequential for the results, as shown here •
2.IV. Integral Transforms

An integral transform is an operator on a Banach space of functions $u$, defined in terms of the kernel $K(x,y)$ as
\[
(Au)(x) = \int K(x,y) \, u(y) \, dy . \tag{2.8-a}
\]
The functions can be defined on an infinite domain and with multi-dimensional variables, and more general measures can be included. Descriptions are mostly restricted to the functions on an interval $[a;b]$, in which case $K(x,y)$ is a function on $[a;b] \times [a;b]$, and to the Hilbert space $H = L^2([a;b],dx)$. The analysis is trivially extendible to more general cases.
If the kernel $K(x,y)$ in Eq. (2.8-a) is square integrable, i.e., if the Hilbert-Schmidt norm $\|A\|_2$ of $A$ defined by
\[
\|A\|_2^2 = \int |K(x,y)|^2 \, dx \, dy \tag{2.8-b}
\]
exists, then the operator is called a Hilbert-Schmidt operator. It is clear that $K(x,y)$ defines a vector in $H \otimes H = L^2([a;b] \times [a;b], dxdy)$. A Hilbert-Schmidt operator is bounded with $\|A\| \le \|A\|_2$, for
\[
\|Au\|^2 = \int |(Au)(x)|^2 dx = \int dx \left| \int K(x,y) \, u(y) \, dy \right|^2 \le \int dx \left( \int |K(x,y)|^2 dy \right) \int |u(y)|^2 dy = \|A\|_2^2 \, \|u\|^2 . \tag{2.8-c}
\]
We have used the Schwarz inequality (Lemma 1.4) and Fubini's theorem (Theorem 1.4).
Lemma 2.5. The integral operator defined by a Hilbert-Schmidt kernel is compact.
Proof. The vectors $\psi_{nm} = \varphi_n \varphi_m$, $n, m = 1, 2, \ldots$, constitute an orthonormal basis in $H \otimes H = L^2([a;b] \times [a;b], dxdy)$, where $\varphi_n$, $n = 1, 2, \ldots$, is an orthonormal basis in $L^2([a;b],dx)$, and $K(x,y)$ defines a vector in $H \otimes H$. Hence,
\[
K(x,y) = \sum_{n,m=1}^{\infty} (\psi_{nm}, K) \, \psi_{nm}(x,y)
\]
with the series converging with respect to the norm in $H \otimes H$. Let the operator $A_N$ be defined by
\[
(A_N u)(x) = \int K_N(x,y) \, u(y) \, dy ,
\]
where $K_N(x,y) = \sum_{n,m=1}^{N} (\psi_{nm}, K) \, \psi_{nm}(x,y)$. We have that $\|A - A_N\|_2 \to 0$ as $N \to \infty$, and hence $\|A - A_N\| \le \|A - A_N\|_2 \to 0$ as $N \to \infty$. The operator $A_N$, being of finite rank, is compact. The result now follows from Lemma 2.4 •
Corollary 2.5. Let $\{\lambda_n\}$ be the eigenvalues of a symmetric Hilbert-Schmidt operator. The square of its Hilbert-Schmidt norm is given by $\sum_{n=1}^{\infty} |\lambda_n|^2$.
Proof. From Lemma 2.5, the operator is compact. Thus, it is a self-adjoint operator with purely discrete spectrum. From Corollary 2.3, its eigenvectors form a complete set. The result follows from Lemma 2.5 by taking $\varphi_n$, $n = 1, 2, \ldots$ to be its eigenvectors •
Remark 2.6. A trace class operator can be expressed as a product of two Hilbert-Schmidt operators. These operators are clearly compact. The trace norm of a symmetric trace class operator is given by $\sum_{n=1}^{\infty} |\lambda_n|$ •
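The statement of Corollary 2.5 can be checked numerically on a simple case. The sketch below is only an illustration under stated assumptions: the kernel $K(x,y) = \min(x,y)\,(1 - \max(x,y))$ on $[0;1]$ is chosen here for convenience (it is not taken from the text above), and its eigenvalues are the known values $1/(n\pi)^2$. The function names are hypothetical. Both the double integral of $|K|^2$ and the sum of $|\lambda_n|^2$ should approach $1/90$.

```python
import math


def kernel(x, y):
    # Illustrative symmetric kernel (an assumption made for this sketch).
    return min(x, y) * (1.0 - max(x, y))


def hs_norm_squared(n=200):
    """Midpoint-rule estimate of the double integral of |K(x, y)|^2 over the unit square."""
    h = 1.0 / n
    mids = [(i + 0.5) * h for i in range(n)]
    return h * h * sum(kernel(x, y) ** 2 for x in mids for y in mids)


def eigenvalue_square_sum(terms=2000):
    """Partial sum of |lambda_n|^2 with lambda_n = 1 / (n * pi)^2."""
    return sum((1.0 / (n * math.pi) ** 2) ** 2 for n in range(1, terms + 1))


hs2 = hs_norm_squared()
ev2 = eigenvalue_square_sum()
print(hs2, ev2)
```

The midpoint rule is used only because it needs no special treatment of the kink of the kernel along the diagonal; any quadrature of comparable accuracy would serve.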
The inverse of a compact operator, if it exists, is unbounded except for the finite rank operators. Therefore, for a Hilbert-Schmidt operator, the properties of the Fredholm integral equations of the first kind
\[
Af = g , \tag{2.9-a}
\]
and the second kind
\[
(z - A) f = g , \tag{2.9-b}
\]
respectively, where $z$ is a complex variable, differ significantly. Numerical determinations of the solutions of Eq. (2.9-a) are known to run into instabilities resulting from the unbounded character of $A^{-1}$. The following result provides a criterion for the solvability of Eq. (2.9-a). If the condition of Theorem 2.9 is satisfied, then Eq. (2.9-a) can be solved by a simple method, with a number of useful applications. The proof is by construction and thus provides an algorithm to obtain the solutions numerically. For the purpose of Theorem 2.9, the variable $x$ is not restricted to the interval $[a;b]$. Instead, it will be assumed to vary over a set. Furthermore, the range space is of no consequence, and will therefore not be specified. Also, the result applies to infinite domains without any adjustments.
Theorem 2.9. Let an integral transform $A$ be defined on $L^2([a;b],dx)$ by Eq. (2.8-a). If there is a set $\{x_n\}$ such that $\{\phi_n\}$ defined by $\phi_n(y) = K^*(x_n, y)$ forms a basis in $H$, then $A$ is invertible and the solution of Eq. (2.9-a) is determined by $\{g(x_n)\}$.
Proof. Under the assumptions, Eq. (2.9-a) implies that $(\phi_n, f) = g(x_n) = g_n$. Linear combinations of vectors in $\{\phi_n\}$ can be used to construct an orthonormal basis $\{\varphi_n\}$, e.g., by the Gram-Schmidt process (Remark 1.3), and vice versa. Let $\varphi_m = \sum_{l=1}^{m} \gamma_{ml} \phi_l$. An approximation $f_n$ to $f$ is given by $f_n = \sum_{m=1}^{n} \alpha_m \varphi_m$, where
\[
\alpha_m = (\varphi_m, f) = \sum_{l=1}^{m} \gamma_{ml}^* (\phi_l, f) = \sum_{l=1}^{m} \gamma_{ml}^* g_l .
\]
The vector $f_n$ is just the expansion of $f$ in terms of an orthonormal basis and hence, $\|f_n - f\| \to 0$ as $n \to \infty$ •
Remark 2.7. For practical applications, construction of an orthonormal basis is not convenient at times. Since the sets $\{\phi_n\}$ and $\{\varphi_n\}$ can be obtained as linear combinations of the members of each other, straightforward algebraic manipulations show that the same result is obtained by solving the matrix equation
\[
\sum_{l=1}^{n} (\phi_m, \phi_l) \, \beta_l = g(x_m), \quad m = 1, 2, \ldots, n .
\]
The matrix with elements $(\phi_m, \phi_l)$ is known as the normalization matrix, which is positive definite. Therefore, it is amenable to inversion by standard methods, e.g., the Cholesky decomposition. The approximation $f_n$ is given by
\[
f_n = \sum_{m=1}^{n} \beta_m \phi_m
\]
•
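The matrix form of the construction can be sketched in a few lines. The example below is a minimal illustration, not the author's implementation: it assumes real functions on $[0;1]$ and the hypothetical choice $\phi_m(y) = y^m$, $m = 0, 1, 2$, so that the normalization matrix has elements $1/(m+l+1)$, and it takes $f(y) = 1 + 2y - 3y^2$ so that the data $g_m = (\phi_m, f)$ are known in closed form. The helper names `cholesky`, `solve_spd` and `reconstruct_coefficients` are introduced here for illustration only.

```python
import math


def cholesky(a):
    """Return the lower-triangular factor L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_spd(a, b):
    """Solve a x = b through the Cholesky factor: forward, then backward substitution."""
    low = cholesky(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def reconstruct_coefficients(n):
    # Normalization matrix (phi_m, phi_l) = integral of y^(m+l) over [0, 1].
    gram = [[1.0 / (m + l + 1) for l in range(n)] for m in range(n)]
    # Data g_m = integral of y^m (1 + 2y - 3y^2) over [0, 1], evaluated analytically.
    g = [1.0 / (m + 1) + 2.0 / (m + 2) - 3.0 / (m + 3) for m in range(n)]
    return solve_spd(gram, g)


beta = reconstruct_coefficients(3)
print(beta)  # coefficients of 1, y, y^2 in the reconstructed f
```

Since the assumed $f$ lies in the span of the three chosen $\phi_m$, the computed $\beta$ should reproduce its coefficients up to rounding. For larger $n$ this particular matrix becomes badly conditioned, which is one reason the choice of the set $\{x_n\}$ matters in practice.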
Examples of practical importance of the applications of this result, particularly the image reconstruction by ray tomographic technique, will be given in the sequel. Here we illustrate the result by deriving a few known algorithms, which have been obtained by other methods.
Example 2.2. Consider the Laplace transform: $\int_0^\infty e^{-\lambda t} f(t) \, dt = g(\lambda)$. By a change of variable to $x = (2\exp[-t] - 1)$, we have
\[
\int_{-1}^{1} dx \, (1 + x)^{\lambda} f(t(x)) = 2^{(\lambda+1)} g(\lambda + 1) ,
\]
reducing the Laplace transform to an integral transform in $H = L^2([-1;1],dx)$. Although the construction is valid with other sets, let us restrict to $\lambda = n = 0, 1, 2, \ldots$ It is more convenient to express the transform in terms of the moments $\mu_n$ defined by
\[
\mu_n = \int_{-1}^{1} dx \, x^n f(t(x)) , \quad n = 0, 1, 2, \ldots ,
\]
which can be determined recursively from $g(n+1)$. From the Weierstrass approximation theorem (Lemma 1.1), polynomials approximate every continuous function on $[-1;1]$ with respect to the supremum norm, and since the set of continuous functions is dense in $H = L^2([-1;1],dx)$, polynomials form a basis in $H$. Thus the construction of Theorem 2.9 can be used to invert the Laplace transform. The orthonormal basis is given by the normalized Legendre polynomials,
\[
\varphi_n(x) = \sum_{m=0}^{n} \gamma_{nm} x^m , \quad n = 0, 1, 2, \ldots .
\]
It follows that $\beta_n = (\varphi_n, f) = \sum_{m=0}^{n} \gamma_{nm} \mu_m$, and hence
\[
f(t(x)) = \sum_{n=0}^{\infty} \beta_n \varphi_n(x) = \sum_{n=0}^{\infty} \beta_n \sum_{m=0}^{n} \gamma_{nm} x^m .
\]
The series on the right side converges with respect to the norm in $H$. Alternatively, the same approximation is obtained by solving the normalization matrix equation (Remark 2.7). An alternative proof of the validity of this algorithm can be found, for example, in Bellman, Kalaba and Lockett (1966) •
Example 2.3. Next consider the Fourier transform:
\[
\int_0^1 e^{2\pi i x y} f(y) \, dy = g(x) .
\]
The set $\{e^{-2\pi i n x}\}$, $n = 0, \pm 1, \pm 2, \ldots$ forms an orthonormal basis in $H = L^2([0;1],dx)$, which follows from the validity of the Fourier series expansion of all absolutely integrable functions, a class that contains $H$. Another independent proof of the completeness of this basis will be given in sec. 2.V. It follows from the construction of Theorem 2.9 that
\[
f(x) = \sum_{n=-\infty}^{\infty} e^{-2\pi i n x} g(n) ,
\]
with the series being convergent with respect to the norm in $H$, which provides an alternative proof of the validity of the Fourier series expansion for square integrable functions. The sums of the type
\[
f(n) = \sum_{m=0}^{N} e^{-2\pi i m n} g(m)
\]
can be evaluated efficiently by the fast Fourier transform routines, which enable calculations of the forward and backward Fourier transforms on equi-spaced meshes •
Fourier transforms, both on finite and infinite domains, are among the most widely used integral operators in analysis and applications. Various analytical technicalities, which would otherwise require quite complex arguments, are easily made transparent with the help of the Fourier transforms on infinite domains. Therefore, although the properties of these transforms are widely available in the literature, some will be briefly considered below. Consider the transform
\[
(Uf)(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ixy} f(y) \, dy = g(x) . \tag{2.10}
\]
The aim is to show that $U$ is a unitary operator from $H = L^2((-\infty;\infty),dx)$ to itself. To this end, we obtain
Lemma 2.6. For continuously differentiable and integrable $f$ on the interval $[-a;b]$, $a, b > 0$,
\[
\lim_{\lambda \to \infty} \frac{1}{\pi} \int_{-a}^{b} f(x_0 + x) \, \frac{\sin(\lambda x)}{x} \, dx = f(x_0) .
\]
Proof. Decompose the integral on the left side as
\[
\int_{-a}^{b} f(x_0 + x) \, \frac{\sin(\lambda x)}{x} \, dx = f(x_0) \int_{-a}^{b} \frac{\sin(\lambda x)}{x} \, dx + \int_{-a}^{b} \frac{f(x_0 + x) - f(x_0)}{x} \, \sin(\lambda x) \, dx .
\]
Since $f$ is continuously differentiable, the factor $[f(x_0 + x) - f(x_0)]/x$ in the second integral is continuous, and hence the second integral converges to zero in the limit as $\lambda \to \infty$ from the Riemann-Lebesgue lemma (Lemma 1.5), and the first integral in the limit is easily evaluated to be equal to $\pi$ •
Lemma 2.7. The Fourier transform $g(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(y) \, e^{-ixy} \, dy$ of a continuously differentiable and absolutely integrable $f$ on the real line is continuous, and
\[
f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(y) \, e^{ixy} \, dy .
\]
Furthermore, if $f$ is also square integrable, then $\|f\| = \|g\|$.
Proof. Since $f$ is absolutely integrable, $g(x)$ is defined as a continuous function. Now,
\[
\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(y) \, e^{ixy} \, dy
= \lim_{\lambda \to \infty} \frac{1}{2\pi} \int_{-\lambda}^{\lambda} dy \, e^{ixy} \int_{-\infty}^{\infty} dx' \, f(x') \, e^{-ix'y}
= \lim_{\lambda \to \infty} \frac{1}{2\pi} \int_{-\infty}^{\infty} dx' \, f(x') \int_{-\lambda}^{\lambda} dy \, e^{i(x - x')y}
\]
by Fubini's theorem (Theorem 1.4). It follows that
\[
\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(y) \, e^{ixy} \, dy = \lim_{\lambda \to \infty} \frac{1}{\pi} \int_{-\infty}^{\infty} dx' \, f(x') \, \frac{\sin(\lambda [x - x'])}{[x - x']} = f(x) ,
\]
from Lemma 2.6. The fact that $\|f\|^2 = \|g\|^2$ follows from standard manipulations and an application of Fubini's theorem •
We now have the desired property of the operator $U$ stated as Theorem 2.10.
Theorem 2.10. The operator $U$ defined by Eq. (2.10) admits a unitary extension to $H = L^2((-\infty;\infty),dx)$.
Proof. From Lemma 2.7, $U$ is a norm-preserving operator on the intersection of $L^1((-\infty;\infty),dx)$, $H = L^2((-\infty;\infty),dx)$ and the set of continuously differentiable functions on the same infinite domain. Such functions form a dense set in $H = L^2((-\infty;\infty),dx)$; for example, the linear span of the set $\{x^n \exp(-x^2)\}_{n=0}^{\infty}$, which is equivalent to the Hermite basis, is dense. Hence $U$ is a densely defined bounded operator with unit norm, and thus, from Theorem 2.1, extendible to the entire $H$ as a bounded operator $U'$ with unit norm. Furthermore, for a sequence $\{f_n\}$ in the dense set converging to $f$,
\[
\|U'f\| = \lim_{n \to \infty} \|U'f_n\| = \lim_{n \to \infty} \|Uf_n\| = \lim_{n \to \infty} \|f_n\| = \|f\| ,
\]
implying that $U'$ is unitary •
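The norm-preserving property can be illustrated numerically for a single function. The sketch below is not a proof and makes several assumptions: the transform is normalized with the factor $1/\sqrt{2\pi}$, the test function is the shifted Gaussian $f(x) = \exp[-(x-1)^2/2]$ chosen here for convenience, and the infinite integrals are truncated to $[-12; 12]$ and evaluated by the trapezoidal rule. The names `trapezoid`, `fourier`, `MESH` and `STEP` are hypothetical.

```python
import cmath
import math


def trapezoid(values, h):
    """Composite trapezoidal rule on an equally spaced mesh of step h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def f(x):
    # Shifted Gaussian: smooth, absolutely integrable and square integrable.
    return math.exp(-0.5 * (x - 1.0) ** 2)


STEP = 0.05
MESH = [-12.0 + STEP * j for j in range(481)]  # covers [-12, 12]
F_VALUES = [f(x) for x in MESH]


def fourier(k):
    """Approximate (Uf)(k) = (2*pi)^(-1/2) * integral of exp(-i*k*x) f(x) dx."""
    values = [cmath.exp(-1j * k * x) * fx for x, fx in zip(MESH, F_VALUES)]
    return trapezoid(values, STEP) / math.sqrt(2.0 * math.pi)


norm_f_sq = trapezoid([fx ** 2 for fx in F_VALUES], STEP)
norm_g_sq = trapezoid([abs(fourier(k)) ** 2 for k in MESH], STEP)
print(norm_f_sq, norm_g_sq)  # both should be close to sqrt(pi)
```

For this particular $f$ both squared norms equal $\sqrt{\pi}$ exactly, which gives an independent check on the two quadratures.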
This extension is frequently referred to as the Fourier-Plancherel transform. The Fourier transforms of square integrable functions are inherently assumed to be defined in this manner, which will still be denoted by U . The formal operations with the Fourier transforms that may not be valid with square integrable functions are carried out with the dense set of continuously differentiable and absolutely integrable functions and their validity for the square integrable functions is checked as in Theorem 2.10. To avoid repetition, the formal operations will be assumed to be valid with the square integrable functions unless they have significant impact on the analysis and results. Integral operators encountered in applications frequently arise as the inverses of the differential operators, considered in the next section.
2.V. Differential Operators

Most of the equations modeling physical phenomena involve differential operators defined formally as linear combinations of the derivatives of various orders with functions as coefficients. An ordinary differential equation involves functions of one variable, while the partial differential equations involve more variables. The descriptions will be centered about the derivatives up to the second order. Although a parallel formulation can be developed for the higher order equations, they will not be encountered in the material covered for the present. The equations will be expressed in terms of the operators in function spaces, complementing the classical methods, resulting in a richer theory. As is the case with the integral operator equations, the differential equations can be expressed as
\[
(T - z) f = g \tag{2.11}
\]
where $T$ is a formal differential operator and $z$ is, in general, a complex number. For $g = 0$, Eq. (2.11) reduces to an eigenvalue equation. Eq. (2.11) can be expressed as an operator equation by selecting a suitable function space, large enough to accommodate the solutions of interest. Consider the case when the underlying space is $H = L^2([0;1],dx)$. This case extends to include a variety of equations defined for the functions on finite domains. The case of infinite domains alters the properties of the operators in a fundamental way, which will be discussed following this case. The basic concepts for the finite domains can be illustrated with the operators introduced in Example 2.1, which is done in the following examples.
Example 2.4. With symbols as in Example 2.1, Remark 2.2, the eigenvalues and the eigenvectors of the operator $A'$ are easily seen to be given by $2\pi n$ and $\{e^{-2\pi i n x}\}$, $n = 0, \pm 1, \pm 2, \ldots$, respectively. To determine if there are any other points in $\sigma(A')$, we solve Eq. (2.11) with $A'$ replacing $T$. In the process, the usual method to invert a differential operator will be re-interpreted, which will be shown to have an interesting application for the inversion of tri-diagonal matrices in ch. 5. For convenience, let $S$ stand for $(A' - z)$. Considered as an operator equation, if Eq. (2.11) has a unique solution then the associated homogeneous equation $Su = 0$ has no nonzero solution in $H$ (Lemma 2.1). However, the operator $S$ may have extensions with zero as its eigenvalue. In the present case, $S' = (\hat{A} - z)$ provides one such extension, which has zero for a simple eigenvalue with the corresponding eigenvector $u_0(x) = \exp[-izx]$. If $g$ admits an extension $g'$ to $R(S')$, Eq. (2.11) admits an extension
\[
S' f' = g' . \tag{2.12}
\]
We assume that Eq. (2.12) has a solution $f' = \tilde{f}$. Then it has a one-parameter family $f'(\kappa)$ of solutions given by $f'(\kappa) = \kappa u_0 + \tilde{f}$. Now let there be a value $\kappa_0$ such that $f'(\kappa_0)$ admits a restriction $h$ in $D(S)$; then $S' f'(\kappa_0)$ restricted to $R(S)$ is equal to $Sh = g$ and since Eq. (2.11) has a unique solution, we have that $h = f$. A solution $\tilde{f}$ can be obtained by expressing it as $\tilde{f} = u_0 w$. By virtue of the fact that $S'$ maps $u_0$ into the null vector, the equation for $w$ is simpler than Eq. (2.12), which can be solved to yield
\[
\tilde{f}(x) = i e^{-izx} \int_0^x dy \, e^{izy} g'(y) ,
\]
defined for an arbitrary $g'$ in $H$, and thus one can set $g' = g$. The resulting family of solutions is given by
\[
f'(\kappa) = \kappa e^{-izx} + i e^{-izx} \int_0^x dy \, e^{izy} g(y) . \tag{2.13}
\]
The desired value of $\kappa_0$ can be found by setting $f'(\kappa_0; 0) = f'(\kappa_0; 1)$, yielding
\[
\kappa_0 = \frac{i e^{-iz}}{(1 - e^{-iz})} \int_0^1 dy \, e^{izy} g(y) .
\]
Substituting for $\kappa_0$ in Eq. (2.13) and by rearranging the terms, we obtain
\[
f(x) = \int_0^1 dy \, G(x,y) \, g(y) \tag{2.14-a}
\]
with
\[
G^*(y,x) = G(x,y) =
\begin{cases}
\dfrac{i}{(1 - e^{-iz})} \, e^{-iz(x - y)} , & 0 \le y \le x \le 1 \\[2ex]
\dfrac{-i}{(1 - e^{iz})} \, e^{-iz(x - y)} , & 0 \le x \le y \le 1 .
\end{cases} \tag{2.14-b}
\]
In the present case, $g$ admits a trivial extension. Consequently the technicalities pertaining to the domains and the ranges of the operators become redundant. However, they are significant in general, as will be seen while applying this procedure to develop a method to invert the tri-diagonal matrices.
Now, we return to the original issue, i.e., the spectrum of $A'$. It is clear from Eqs. (2.14-a, b) that for each $z \ne 2\pi n$, $(A' - z)^{-1}$ exists as an integral operator with a symmetric Hilbert-Schmidt kernel with the whole of $H$ as its domain. Hence, $(A' - z)^{-1}$ is a bounded operator, in fact compact. Consequently, each $z \ne 2\pi n$ is in the resolvent set of $A'$. It follows that $A'$ has a pure point spectrum.
It was shown in Example 2.1 that $A'$ is self-adjoint. This property was not required to deduce the above results. Self-adjointness of $A'$ can also be established completely within the confines of the present example, from the above results. There is a real $z$ not in $\sigma_p(A')$ (Proposition 2.3), such that $(A' - z)^{-1}$ is self-adjoint, since it is a bounded symmetric operator with $H$ as its domain. Hence, $(A' - z)$, and therefore $A'$, is self-adjoint, e.g., from Theorem 2.7. Furthermore, since $(A' - z)^{-1}$ is compact, it has a discrete spectrum with at most zero being the accumulation point. Therefore, it follows from the spectral mapping theorem that $\sigma(A') = \sigma_p(A')$ (Remark 2.3). For this argument, it is sufficient that $(A' - z)^{-1}$ exists as a compact operator for just one fixed value of $z$. This, coupled with the symmetry of $(A' - z)^{-1}$ for a real value of $z$, establishes the self-adjointness of $A'$. Further, from Corollary 2.3, $\{e^{-2\pi i n x}\}$, $n = 0, \pm 1, \pm 2, \ldots$, forms an orthonormal basis in $H$, which is another proof of the validity of the Fourier series representation of the square integrable functions on the finite domains used in Example 2.3 •
Example 2.5. As indicated, the above procedure to invert an operator does not require self-adjointness, or even symmetry. As an example of a non-symmetric operator, consider the restriction $S_0$ of $S'$ with all vectors $u$ in $D(S')$ being in $D(S_0)$ as long as $u(0) = 0$. In view of the fact that $u(0) = 0$, $S_0 u = 0$ implies that $u = 0$, and hence $S_0$ is invertible. Thus the solution of the corresponding Eq. (2.11) can be selected from the family given by Eq. (2.13), which is
\[
f(x) = i e^{-izx} \int_0^x dy \, e^{izy} g(y) .
\]
Exactly the same procedure applies to the restriction $S_1$ of $S'$ with all vectors $u$ in $D(S')$ with $u(1) = 0$ being in $D(S_1)$, yielding the solution
\[
f(x) = i e^{-izx} \int_1^x dy \, e^{izy} g(y)
\]
•
Example 2.6. In this example, we consider the operator $D = \hat{A} A$ with $A$ being as in Example 2.1. While all the issues considered in Examples 2.1 and 2.4 are relevant to this example as well [Stone, 1932; ch. X], we restrict to the consideration of the additional concepts only. By integration by parts, we have
\[
-(\upsilon, \ddot{u}) + (\ddot{\upsilon}, u) = \left[ \dot{\upsilon}^* u - \upsilon^* \dot{u} \right]_0^1 . \tag{2.15}
\]
Since $D(D)$ consists of absolutely continuous functions in $H$ with second derivative being square integrable and vanishing boundary conditions, $(\upsilon, Du) = (D\upsilon, u)$ for all $u$ and $\upsilon$ in $D(D)$. Consequently, $D$ is a symmetric operator. Also, it follows from Eq. (2.15) that $D(\hat{D}) = D(D)$, and the action of $D$ and $\hat{D}$ on their common domain is also the same. Hence, $D$ is self-adjoint. Further, since
\[
(u, Du) = \int_0^1 |\dot{u}(x)|^2 \, dx \ge 0 ,
\]
$D$ is also non-negative. In fact, $D$ is strictly positive, since $(u, Du) = 0$ implies that $|\dot{u}(x)| = 0$ almost everywhere, and hence $u(x) = \mathrm{const.}$, but nonzero constant functions are not in $D(D)$. The eigenvalues and the corresponding normalized eigenvectors of $D$ are given by $(n\pi)^2$ and $[\sqrt{2} \sin(n\pi x)]$, $n = 1, 2, \ldots$ Again, to find out if there are any other points in $\sigma(D)$, we solve Eq. (2.11) with $(T - z)$ replaced by $D$, by the same argument as in Example 2.4. To this end, let $D'$ be the extension of $D$, obtained by dropping the vanishing boundary conditions. The operator $D'$ maps the two-dimensional subspace, spanned by the constant function and $x$, into 0. Thus, Eq. (2.11) admits an extension, $D' f' = g'$, which has a two-parameter family of solutions
\[
f'(\kappa) = \kappa_0 x + \kappa_1 (1 - x) + \tilde{f} .
\]
The eigenvectors in the eigenspace corresponding to the eigenvalue zero are chosen for convenience. As in Example 2.4, we attempt to determine a solution $\tilde{f}$ by expressing it as
\[
\tilde{f}(x) = (1 - x) \, h_0(x) + x \, h_1(x) .
\]
Instead of determining the general solutions $h_0(x)$ and $h_1(x)$, we select the vector which is in $D(D)$ from the family $f'(\kappa)$. If the solutions are restricted by $h_0(0) = 0$ and $h_1(1) = 0$, then one can set $\kappa_0 = \kappa_1 = 0$ and then $\tilde{f}$ is in $D(D)$, and hence $\tilde{f} = f$. Furthermore, with this representation, the solution of Eq. (2.11) can be written as
\[
f(x) = (1 - x) \int_0^x \dot{h}_0(y) \, dy + x \int_1^x \dot{h}_1(y) \, dy ,
\]
and thus, only the first derivatives of $h_0(x)$ and $h_1(x)$ need be determined, which is standard. The solution is given by
\[
f(x) = \int_0^1 G(x,y) \, g(y) \, dy , \tag{2.16-a}
\]
with
\[
G^*(y,x) = G(x,y) =
\begin{cases}
(1 - x) \, y , & 0 \le y \le x \le 1 \\
(1 - y) \, x , & 0 \le x \le y \le 1
\end{cases} \tag{2.16-b}
\]
•
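The kernel of Eq. (2.16-b) can be checked numerically on a case with a known answer. The sketch below is only an illustration: it assumes the right side $g(y) = \pi^2 \sin(\pi y)$, for which $f(x) = \sin(\pi x)$ satisfies $-\ddot{f} = g$ with vanishing boundary values, and it evaluates the integral of Eq. (2.16-a) by the trapezoidal rule. The names `green` and `apply_inverse` are hypothetical.

```python
import math


def green(x, y):
    """Kernel of Eq. (2.16-b) on the unit square."""
    return (1.0 - x) * y if y <= x else (1.0 - y) * x


def apply_inverse(g, x, n=400):
    """Trapezoidal estimate of the integral of G(x, y) g(y) over [0, 1]."""
    h = 1.0 / n
    ys = [j * h for j in range(n + 1)]
    values = [green(x, y) * g(y) for y in ys]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def g(y):
    # Assumed right side; the exact solution is then sin(pi * x).
    return math.pi ** 2 * math.sin(math.pi * y)


points = (0.0, 0.25, 0.5, 0.75, 1.0)
errors = [abs(apply_inverse(g, x) - math.sin(math.pi * x)) for x in points]
max_error = max(errors)
print(max_error)
```

The evaluation points are taken on the quadrature mesh so that the kink of the kernel at $y = x$ falls on a node, which keeps the trapezoidal rule at its usual second order accuracy.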
Example 2.7. Consider the same $D$ as in Example 2.6 but now with the boundary condition
\[
a(\partial) \, u(\partial) + b(\partial) \, (\partial u(\partial)) = 0 , \tag{2.17}
\]
denoted by $T$, where $\partial$ are the boundary points, zero and one, and $\partial u(\partial)$ is the derivative in the direction of the outward normal at the boundary. We shall also assume that $a(\partial)$ and $b(\partial)$ are non-negative and $a(\partial)$ is not identically equal to zero. If it is zero at both points, then each constant is in $D(T)$, yielding a zero eigenvalue, requiring some adjustments to the analysis and results, but this case can be treated by the same methods. Without loss of generality we can also assume that $a(0) = 1$, and then $a(1)$ can be an arbitrary non-negative number. The boundary condition can now be expressed as
\[
u(0) - b(0) \, \dot{u}(0) = 0 \quad \text{and} \quad a(1) \, u(1) + b(1) \, \dot{u}(1) = 0 . \tag{2.18}
\]
This operator can be studied exactly as its special case in Example 2.6. The operator $T$ is now defined by
\[
(Tu)(x) = -\ddot{u}(x) , \tag{2.19}
\]
together with the boundary condition given by Eq. (2.18), on the interval $[0;1]$. By the standard methods, which can be interpreted as in Example 2.6, $T^{-1}$ is given by Eq. (2.20):
\[
f(x) = \int_0^1 G(x,y) \, g(y) \, dy , \tag{2.20-a}
\]
\[
G^*(y,x) = G(x,y) = \frac{1}{[1 + b(0) c]}
\begin{cases}
(1 - cx) \, [b(0) + y] , & 0 \le y \le x \le 1 \\
(1 - cy) \, [b(0) + x] , & 0 \le x \le y \le 1
\end{cases} , \tag{2.20-b}
\]
where $c = a(1)/[a(1) + b(1)] \le 1$ •
It is straightforward to check that $T^{-1}$ is a self-adjoint, positive definite, Hilbert-Schmidt operator, implying that $T$ is a positive definite operator with a pure point spectrum. Furthermore, the representation of $T^{-1}$ given by Eqs. (2.20-a) and (2.20-b) yields the following useful result, which is a special case of the first half of the positivity lemma for the elliptic operators, having a number of applications, particularly in the transport problems. The general case will be considered in ch. 7. A non-negative function on $[0;1]$ that is strictly positive in its interior will be termed strictly positive.
Lemma 2.8. Let $0 \le \lambda < \lambda_0$, where $\lambda_0$ is the lowest eigenvalue of $T$, and let $u$ in $H = L^2([0;1],dx)$ be a strictly positive function on a set of non-zero measure in $[0;1]$. Then $(T - \lambda)^{-1}$ maps $u$ into a positive function in $H = L^2([0;1],dx)$.
Proof. Since $G(x,y)$ as given by Eq. (2.20-b) is a strictly positive function in the interior of $[0;1] \times [0;1]$, $T^{-1}u$ is strictly positive everywhere in the interior of $[0;1]$. If $T^{-n}u$ is strictly positive, then, by the same argument, $T^{-(n+1)}u$ is strictly positive. By the induction principle, $T^{-n}u$ is strictly positive for all $n \ge 1$.
Since $\|\lambda T^{-1}\| = \lambda/\lambda_0 < 1$, $(T - \lambda)^{-1}u$ is given by its Neumann expansion (Lemma 2.3):
\[
(T - \lambda)^{-1} u = T^{-1} (1 - \lambda T^{-1})^{-1} u = T^{-1} \left[ u + \lambda T^{-1} u + (\lambda T^{-1})^2 u + \ldots + (\lambda T^{-1})^n u + \ldots \right] .
\]
Since each term on the right side is positive, its partial sums form a strictly increasing sequence of positive functions, and hence, the sum is a strictly positive function in $H$. Since the expansion converges in $H$, the left side is in $H$ •
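The Neumann expansion used in the proof can be followed numerically on a discretized kernel. The sketch below is an illustration under assumptions that are not part of the text: the values $b(0) = 0.5$ and $c = 0.5$ (i.e., $a(1) = b(1)$), the value $\lambda = 1$, the indicator function of $[0.4; 0.6]$ as $u$, and a midpoint discretization with 50 cells. With these values the kernel of Eq. (2.20-b) is bounded by 0.625, so that $\|T^{-1}\| \le 0.625$ and $\lambda = 1$ lies below $\lambda_0$. The function names are hypothetical.

```python
def green(x, y, b0, c):
    """Kernel of Eq. (2.20-b); b0 stands for b(0) and c for a(1)/[a(1)+b(1)]."""
    lo, hi = (y, x) if y <= x else (x, y)
    return (1.0 - c * hi) * (b0 + lo) / (1.0 + b0 * c)


def indicator(x):
    # Non-negative test function, strictly positive on a set of non-zero measure.
    return 1.0 if 0.4 <= x <= 0.6 else 0.0


def neumann_solution(u, lam, b0, c, n=50, terms=80):
    """Sum T^{-1} (lam T^{-1})^m u for m = 0..terms on a midpoint mesh."""
    h = 1.0 / n
    mids = [(i + 0.5) * h for i in range(n)]
    matrix = [[h * green(x, y, b0, c) for y in mids] for x in mids]

    def apply_inverse(v):
        return [sum(row[j] * v[j] for j in range(n)) for row in matrix]

    term = apply_inverse([u(x) for x in mids])  # T^{-1} u
    total = term[:]
    for _ in range(terms):
        term = [lam * t for t in apply_inverse(term)]  # next term of the series
        total = [s + t for s, t in zip(total, term)]
    return mids, total


mids, with_lam = neumann_solution(indicator, 1.0, 0.5, 0.5)
_, without_lam = neumann_solution(indicator, 0.0, 0.5, 0.5)
print(min(with_lam), min(without_lam))
```

Every term of the series is positive at every mesh point, so the computed $(T - \lambda)^{-1}u$ is positive and exceeds $T^{-1}u$ pointwise, in line with the statement of the lemma.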
While a parallel treatment of the operators on infinite domains is possible, it is more convenient to treat them with the help of the Fourier-Plancherel transforms (Theorem 2.10). The operator $(\tilde{A} \tilde{f})(k) = k \tilde{f}(k)$ on $H = L^2((-\infty;\infty),dk)$, i.e., the operation of multiplication by $k$, is clearly symmetric and densely defined, e.g., on the set introduced in Theorem 2.10. It also admits a unique self-adjoint extension, which will be shown to be true by constructing its spectral function, which is instructive in itself. Let the operator $\tilde{E}_\lambda$ be defined by $(\tilde{E}_\lambda \tilde{f})(k) = 0$ for $\lambda < k$ and, for $\lambda \ge k$, $(\tilde{E}_\lambda \tilde{f})(k) = \tilde{f}(k)$, i.e., $(\tilde{E}_\lambda \tilde{f})(k) = \theta(\lambda - k) \tilde{f}(k)$, where $\theta(x)$ is the right continuous step function with unit jump at zero. It can be easily checked that $\tilde{E}_\lambda$ is a spectral function as defined in Theorem 2.6. The above operation of multiplication is now expressed as
\[
(\tilde{A} \tilde{f})(k) = \int_{-\infty}^{\infty} \lambda \, (d\tilde{E}_\lambda \tilde{f})(k) ,
\]
and hence $\tilde{A}$ is self-adjoint.
Now consider the formal derivative $(A' f)(x) = -i \dot{f}(x)$ restricted to the set of sufficiently smooth functions in $H = L^2((-\infty;\infty),dx)$. It follows by standard manipulations that
\[
(U A' f)(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx \, k \, e^{-ikx} f(x) = (\tilde{A} U f)(k) , \tag{2.21}
\]
i.e., $A' = U^{-1} \tilde{A} U$, which extends to $H = L^2((-\infty;\infty),dx)$ as in Theorem 2.10. This extension will be denoted by $A$. The operator so defined, being unitarily equivalent to a self-adjoint operator $\tilde{A}$, is itself self-adjoint.
The spectral function $E_\lambda$ of the derivative can also be obtained explicitly. From the spectral theorem, Theorem 2.6, we have that $E_\lambda = U^{-1} \tilde{E}_\lambda U$, and thus,
\[
(E_\lambda f)(x) = \frac{1}{2\pi} \int_{-\infty}^{\lambda} dk \, e^{ikx} \int_{-\infty}^{\infty} dy \, e^{-iky} f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\lambda} dk \, e^{ikx} (Uf)(k) . \tag{2.22}
\]
We have used the fact that $E_{-\infty} = 0$. The spectral function $E_\lambda$ is differentiable with its derivative defined by
\[
(\dot{E}_\lambda f)(x) = \left( \frac{dE_\lambda}{d\lambda} f \right)(x) = \frac{1}{\sqrt{2\pi}} (Uf)(\lambda) \, e^{i\lambda x} . \tag{2.23}
\]
The derivative $\dot{E}_\lambda$ does not define an operator on $H = L^2((-\infty;\infty),dx)$ since its domain is empty. Nevertheless, differentiability of $E_\lambda$ as a function of $\lambda$ implies that the
spectrum of A and hence that of A% , is absolutely continuous consisting of the entire real line. Since, the functions u (λ )
in the range of
E& λ have the eigenvector-like
property: ( A ′u (λ ))( x ) = λ (u (λ ))( x ) , they are termed the generalized eigenfunctions, which are closely related to the resolvent of A , as is the case with the differential operators on finite domains where the eigenvectors of appropriate extensions are used to construct the −1 resolvents. The derivative of E% λ is given by UE& λU , which is formally denoted by the
Dirac delta function
δ (λ − k ) .
Higher order derivatives are obtained by repeated operations of $A$, which are unitarily equivalent to the repeated operations of $\tilde{A}$. A variety of functions of $A$ can be obtained by using Theorem 2.7, by

$$h(A; z) = U^{-1}\, h(\tilde{A}; z)\, U. \tag{2.24}$$
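The relation in Eq. (2.24) has a simple discrete analogue that may help fix ideas. The following is a minimal sketch, assuming a periodic grid and NumPy's FFT as a stand-in for the Fourier-Plancherel transform $U$; the helper name `apply_function_of_A`, the grid size and the Gaussian test function are illustrative choices, not taken from the text.

```python
import numpy as np

def apply_function_of_A(h, f, dx):
    """Apply h(A), with A = -i d/dx, to samples f on a uniform periodic grid.

    Discrete stand-in for h(A) = U^{-1} h(A~) U: transform, multiply by h(k),
    transform back.
    """
    k = 2.0 * np.pi * np.fft.fftfreq(f.size, d=dx)  # angular wavenumbers
    return np.fft.ifft(h(k) * np.fft.fft(f))

# Illustrative grid and test function (assumptions, not from the text).
n, length = 1024, 40.0
dx = length / n
x = (np.arange(n) - n // 2) * dx
f = np.exp(-x**2 / 2.0)

Af = apply_function_of_A(lambda k: k, f, dx)       # A f   = -i f'
A2f = apply_function_of_A(lambda k: k**2, f, dx)   # A^2 f = -f''

# For the Gaussian: -i f' = i x f and -f'' = (1 - x^2) f.
err_first = np.max(np.abs(Af - 1j * x * f))
err_second = np.max(np.abs(A2f - (1.0 - x**2) * f))
print(err_first, err_second)
```

The periodic grid is only an approximation to the infinite line; it is adequate here because the test function is negligible at the ends of the interval.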
Multi-dimensional differential operators can be constructed as the tensor products of one-dimensional ones. In some cases they may be constructed directly as above. Among them, the self-adjoint realization of the three-dimensional second derivative, still denoted by $(-\nabla^2)$, is of particular interest. Both methods will be used for its treatment as convenient.

Let $h_1, h_2, h_3$ be equal to $A^2$ on $L^2[(-\infty;\infty), dx_1]$, $L^2[(-\infty;\infty), dx_2]$, $L^2[(-\infty;\infty), dx_3]$, respectively. Then $(-\nabla^2)$ is given by the tensor product

$$-\nabla^2 = [\,h_1 \otimes 1_2 \otimes 1_3 + 1_1 \otimes h_2 \otimes 1_3 + 1_1 \otimes 1_2 \otimes h_3\,], \tag{2.25}$$

where $1_\nu$ is the identity in $L^2[(-\infty;\infty), dx_\nu]$ and the underlying Hilbert space is given by

$$H = L^2((-\infty;\infty), dx_1) \otimes L^2((-\infty;\infty), dx_2) \otimes L^2((-\infty;\infty), dx_3) = L^2(R^3, dx_1 dx_2 dx_3).$$

Alternatively, $(-\nabla^2)$ is unitarily equivalent to the operation of multiplication by $k^2$ on its domain in $H = L^2(R^3, d\mathbf{k})$. The only adjustment required in the steps used for $A^2$ is to replace the one-dimensional Fourier-Plancherel transform by its three-dimensional equivalent:

$$(Uf)(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}} \int_{R^3} d\mathbf{r}\, e^{i\mathbf{k}\cdot\mathbf{r}} f(\mathbf{r}) = \frac{1}{(2\pi)^{3/2}} \int_0^{2\pi} d\phi \int_0^{\pi} \sin\vartheta\, d\vartheta \int_0^{\infty} r^2\, dr\; e^{i\mathbf{k}\cdot\mathbf{r}} f(r, \vartheta, \phi).$$
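Functions of self-adjoint operators can be obtained from Eq. (2.24) with obvious substitutions. The resolvent of $(-\nabla^2)$ is particularly interesting for its applications. For $\mathrm{Im}(z) \neq 0$, we have

$$([-\nabla^2 - z]^{-1} f)(\mathbf{r}) = \frac{1}{(2\pi)^3} \int_{R^3} \frac{d\mathbf{k}'}{(k'^2 - z)} \int_{R^3} d\mathbf{r}'\, e^{i\mathbf{k}'\cdot(\mathbf{r} - \mathbf{r}')} f(\mathbf{r}') = \frac{1}{(2\pi)^3} \int_{R^3} d\mathbf{r}'\, f(\mathbf{r}') \int_{R^3} \frac{d\mathbf{k}'}{(k'^2 - z)}\, e^{i\mathbf{k}'\cdot(\mathbf{r} - \mathbf{r}')}, \tag{2.26}$$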
where $f$ is restricted to the set of absolutely integrable functions, usually taken to be the set of functions that vanish outside a sphere of large radius, so that the interchange of integration is justified by Fubini's theorem (Theorem 1.4). Since the operator is bounded by Corollary 2.4 and the set of absolutely integrable functions is dense in $H = L^2(R^3, dx_1 dx_2 dx_3) = L^2(R^3, r^2 dr \sin\vartheta\, d\vartheta\, d\phi)$, the relation extends to the entire $H$ from Theorem 2.1. The integral is now evaluated by elementary methods of contour integration to yield

$$([-\nabla^2 - z]^{-1} f)(\mathbf{r}) = \int_{R^3} d\mathbf{r}'\, G^0(\mathbf{r}, \mathbf{r}'; z)\, f(\mathbf{r}'), \tag{2.27}$$
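where Green's function $G^0(\mathbf{r}, \mathbf{r}'; z)$ is given by

$$G^0(\mathbf{r}, \mathbf{r}'; z) = \frac{1}{4\pi}\, \frac{\exp\left[i\sqrt{z}\, |\mathbf{r} - \mathbf{r}'|\right]}{|\mathbf{r} - \mathbf{r}'|}, \qquad \mathrm{Im}(\sqrt{z}) > 0. \tag{2.28}$$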
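As a sanity check on Eq. (2.28), one can verify numerically that $G^0$ satisfies $(-\nabla^2 - z)G^0 = 0$ away from $\mathbf{r} = \mathbf{r}'$. The sketch below assumes the radial form of the Laplacian, $\nabla^2 G = (rG)''/r$ with $r = |\mathbf{r} - \mathbf{r}'|$, and uses a central finite difference; the function names, the value of $z$ and the step size are illustrative assumptions.

```python
import numpy as np

def sqrt_upper(z):
    """Branch of sqrt(z) with positive imaginary part (z off the non-negative real axis)."""
    s = np.sqrt(complex(z))
    return s if s.imag > 0 else -s

def green0(r, z):
    """G^0 of Eq. (2.28) as a function of the distance r = |r - r'| > 0."""
    return np.exp(1j * sqrt_upper(z) * r) / (4.0 * np.pi * r)

def radial_residual(r, z, h=1e-3):
    """(-Laplacian - z) G^0 for r > 0, using Laplacian G = (r G)'' / r."""
    w = lambda s: s * green0(s, z)
    second = (w(r + h) - 2.0 * w(r) + w(r - h)) / h**2
    return -second / r - z * green0(r, z)

z = 2.0 - 1.5j                      # illustrative value with Im(z) != 0
r = np.linspace(0.5, 5.0, 10)
rel = np.abs(radial_residual(r, z)) / np.abs(z * green0(r, z))
print(rel.max())                    # small, limited by the finite-difference error
```

The branch choice in `sqrt_upper` also makes $|G^0|$ decay exponentially in $r$, which is what keeps the integral operator in Eq. (2.27) bounded.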
Plane waves are the generalized eigenfunctions for $(-\nabla^2)$.

The spectrum of $(-\nabla^2)$ is absolutely continuous, coinciding with the non-negative real line. Therefore $(-\nabla^2 - k^2)$ is invertible for all real values of $k$. However, the inverse is unbounded and thus, discontinuous. The discontinuity of the resolvent is manifest in the inequality

$$\underset{\varepsilon \to 0}{\text{s-lim}}\, [-\nabla^2 - (k^2 + i\varepsilon)]^{-1} = [-\nabla^2 - (k^2 + i0)]^{-1} \neq [-\nabla^2 - (k^2 - i0)]^{-1} = \underset{\varepsilon \to 0}{\text{s-lim}}\, [-\nabla^2 - (k^2 - i\varepsilon)]^{-1}.$$
This discontinuity defines the branch cut associated with the resolvent in the complex plane along the non-negative real line. The inverses above and below the cut are expressed as the integral operators in terms of the retarded and advanced Green's functions $G^0_+(\mathbf{r}, \mathbf{r}'; k^2)$ and $G^0_-(\mathbf{r}, \mathbf{r}'; k^2)$, respectively, given by

$$G^0_\pm(\mathbf{r}, \mathbf{r}'; k^2) = \frac{1}{4\pi}\, \frac{\exp[\pm ik\, |\mathbf{r} - \mathbf{r}'|]}{|\mathbf{r} - \mathbf{r}'|}, \tag{2.29-a}$$

from Eq. (2.28). On a point $z = k^2$ on the cut, the principal value integral still exists, yielding the principal value Green's function

$$G^0_0(\mathbf{r}, \mathbf{r}'; k^2) = [\,G^0_+(\mathbf{r}, \mathbf{r}'; k^2) + G^0_-(\mathbf{r}, \mathbf{r}'; k^2)\,]/2, \tag{2.29-b}$$
which is symmetric and defines a self-adjoint operator. While the Green's functions define bounded operators for $\mathrm{Im}(z) \neq 0$, in the limit of the real line, they are not defined for all functions. However, the limits may exist in some sense other than the norms in $H$, defining the generalized eigenfunctions for a perturbed operator. This topic will be discussed in ch. 4.

Next, we determine the one-parameter family of unitary operators $\exp[i\nabla^2 t]$. Most of the operations are justified as earlier. To avoid repetition, only the additional arguments required will be indicated. Since the three operators in the sum defining $\nabla^2$ on the right side of Eq. (2.25) commute with each other, $\exp[i\nabla^2 t]$ is given by

$$\exp[i\nabla^2 t] = \exp[-ih_1 t]\, \exp[-ih_2 t]\, \exp[-ih_3 t]. \tag{2.30}$$
Therefore, it suffices to determine $\exp[-ih_1 t]$. It follows from the Lebesgue theorem (Theorem 1.3) that

$$\exp[-ih_1 t] = \int_0^{\infty} e^{-i\lambda t}\, dE_\lambda = \lim_{\varepsilon \to 0} \int_0^{\infty} \exp[-it(\lambda - i\varepsilon\lambda)]\, dE_\lambda,$$

for $\varepsilon, t \ge 0$, since the integrand is bounded by 1, which is integrable with respect to $(u, E_\lambda \upsilon)$. This representation permits the use of Fubini's theorem needed for the following manipulations. The rest of the deduction uses the same arguments as Lemma 2.7, yielding
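$$\begin{aligned}
(\exp[-ih_1 t]\, u)(x) &= \frac{1}{2\pi} \lim_{\varepsilon \to 0} \int_{-\infty}^{\infty} d\eta\, e^{i\eta x} \int_{-\infty}^{\infty} dy\, \exp[-it\eta^2 (1 - i\varepsilon) - i\eta y]\, u(y) \\
&= \frac{1}{2\pi} \lim_{\varepsilon \to 0} \int_{-\infty}^{\infty} dy\, \exp\left[\frac{i(x - y)^2}{4t(1 - i\varepsilon)}\right] u(y) \int_{-\infty}^{\infty} d\eta\, \exp\left[-it(1 - i\varepsilon) \left(\eta - \frac{x - y}{2t(1 - i\varepsilon)}\right)^2\right] \\
&= \lim_{\varepsilon \to 0} \frac{1}{2\sqrt{\pi t (i + \varepsilon)}} \int_{-\infty}^{\infty} dy\, \exp\left[\frac{i(x - y)^2}{4t(1 - i\varepsilon)}\right] u(y),
\end{aligned}$$

i.e.,

$$u(x; t) = (\exp[-ih_1 t]\, u(0))(x) = \frac{1}{2\sqrt{\pi i t}} \int_{-\infty}^{\infty} dy\, \exp\left[\frac{i(x - y)^2}{4t}\right] u(y; 0), \tag{2.31}$$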
again by the Lebesgue theorem. Since Eq. (2.31) is valid for each $u$ in a dense set and $\exp[-ih_1 t]$ is bounded, the relation extends to $L^2((-\infty;\infty), dx)$ by closure (Theorem 2.1). It follows from Eqs. (2.30) and (2.31) that

$$u(\mathbf{r}; t) = (\exp[i\nabla^2 t]\, u(0))(\mathbf{r}) = (4\pi i t)^{-3/2} \int_{R^3} d\mathbf{r}'\, \exp\left[\frac{i(\mathbf{r} - \mathbf{r}')^2}{4t}\right] u(\mathbf{r}'; 0). \tag{2.32}$$
Similar operations show the validity of Eqs. (2.31) and (2.32) for $t \le 0$. It follows from Eq. (2.6) that $u(t)$ defined by Eq. (2.32) is the solution of the time-dependent free particle Schrödinger equation

$$\left[i\frac{\partial}{\partial t} + \nabla^2\right] u(t) = 0, \tag{2.33}$$

with a prescribed $u(0)$. Eq. (2.31) provides the solution for the one-dimensional counterpart of Eq. (2.33). Eqs. (2.31) and (2.32) yield the propagator representations of $\exp[-ih_1 t]$ and $\exp[i\nabla^2 t]$, respectively, as defined in Eq. (1.7-b) (Remark 1.2). At times it is more convenient to use the units in which Eq. (2.33) reads as

$$\left[i\frac{\partial}{\partial t} + \alpha\nabla^2\right] u(t) = 0, \tag{2.34}$$

with $\alpha$ being a constant. The corresponding propagator can be obtained by the variable change $t \to \alpha t$, yielding

$$u(\mathbf{r}; t) = (\exp[i\alpha\nabla^2 t]\, u(0))(\mathbf{r}) = (4\pi i\alpha t)^{-3/2} \int_{R^3} d\mathbf{r}'\, \exp\left[\frac{i(\mathbf{r} - \mathbf{r}')^2}{4\alpha t}\right] u(\mathbf{r}'; 0). \tag{2.35}$$
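The one-dimensional propagator of Eq. (2.31) can be checked against a case with a known closed form. The sketch below assumes a Gaussian initial state $u(y;0) = e^{-y^2/2}$, for which the solution of the one-dimensional counterpart of Eq. (2.33) is $u(x;t) = (1 + 2it)^{-1/2} \exp[-x^2/(2(1 + 2it))]$, and evaluates the integral by a plain Riemann sum; the function name and grid parameters are illustrative assumptions.

```python
import numpy as np

def free_propagate_1d(u0, x, t, y):
    """Evaluate Eq. (2.31) at the points x by a Riemann sum over the uniform grid y."""
    dy = y[1] - y[0]
    kernel = np.exp(1j * (x[:, None] - y[None, :])**2 / (4.0 * t))
    prefactor = 1.0 / (2.0 * np.sqrt(np.pi * 1j * t))   # principal branch, t > 0
    return prefactor * (kernel * u0(y)[None, :]).sum(axis=1) * dy

u0 = lambda y: np.exp(-y**2 / 2.0)     # assumed initial state
t = 0.5
x = np.linspace(-3.0, 3.0, 7)
y = np.linspace(-12.0, 12.0, 24001)    # wide and fine enough for this u0

numeric = free_propagate_1d(u0, x, t, y)
s = 1.0 + 2.0j * t
exact = np.exp(-x**2 / (2.0 * s)) / np.sqrt(s)
print(np.max(np.abs(numeric - exact)))
```

The quadrature is viable only because the initial state decays rapidly; for a general $u(0)$ the oscillatory kernel would call for the limiting procedure described above.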
Chapter 3

3. VARIATIONAL METHODS

3.I. FORMULATION

Classical calculus of variations extends the notion of finding the extremals of the functions of a finite number of variables to finding the extremals of the functionals. As indicated in ch. 2, a functional will be considered a map from a Banach space to the Banach space of complex numbers. In general only the stationary points are sought, instead of the extremals, but the term is used often for brevity unless further clarification becomes necessary. The classical developments are briefly considered without going into technical details. Consider the functional $F[x]$ defined by

$$F[x] = \int_{t_0}^{t_1} L(x, \dot{x}, t)\, dt, \tag{3.1}$$
where $x(t)$ is in general a multi-dimensional function. We restrict to the one-dimensional case. Extensions are straightforward. The dot denotes the derivative with respect to the variable $t$. The domain of the functional $F[x]$ consists of the admissible functions $x(t)$, contained in a suitable Banach space. The integrand is a function of its arguments. The problem to be considered is to obtain the extremals of $F[x]$, if they exist, by varying $x(t)$ over the domain. A useful example is to determine the curve followed by a non-relativistic particle of mass $m$ in a force field generated by a potential $V(x)$. In this case

$$L(x, \dot{x}, t) = \frac{1}{2} m \dot{x}^2 - V(x). \tag{3.2}$$
As is the case with other similar formulations in mechanics, $L(x, \dot{x}, t)$ is called the Lagrangian, $F[x] = S$ is the action and the function $x(t)$ of time $t$ is the position in the underlying Euclidean space. Hamilton's principle of least action defines the path taken by the particle to be the graph of the minimizing function $x(t)$. As indicated above, in general, only the stationary values are sought, which may be maxima, minima or the saddle points. Thus, the problem of mechanics reduces to a problem of the variational calculus. The above notation was motivated by the physical meanings of the symbols in classical mechanics.

As the question of finding the extremals of the functions is approached in terms of their derivatives, the problem of finding the stationary values of the functionals is approached in terms of the functional derivatives. Consider the variations of $F[x]$ as $x(t)$ is varied by a small but arbitrary amount $\delta x$ keeping the end points $x(t_0)$ and $x(t_1)$ fixed. To be precise, let $\delta x = \varepsilon\xi(t)$, where $\varepsilon$ is a parameter and the function $\xi$ varies over the set of admissible functions. Since the variations must leave the end points $x(t_0)$ and $x(t_1)$ fixed, $\xi(t_0) = \xi(t_1) = 0$. By making $\varepsilon$ small, $\delta x$ can be made as small as desired and by letting $\xi$ vary over a set as large as possible, $\delta x$ can be made arbitrary. The value of the functional $F$ at $(x + \delta x)$ is given from Eq. (3.1) by

$$F[x + \delta x] = F[x, \xi, \varepsilon] = \int_{t_0}^{t_1} L(x + \varepsilon\xi, \dot{x} + \varepsilon\dot{\xi}, t)\, dt. \tag{3.3}$$

Thus, for fixed $x$ and $\xi$, the functional is expressed as an ordinary function of $\varepsilon$, reducing the variational problem to its counterpart for functions. The functional derivative of $F[x]$ at the function $x(t)$, denoted by somewhat of an abuse of notation by $(\delta F/\delta x)$, is defined as

$$\frac{\delta F[x]}{\delta x} = \left[\frac{d}{d\varepsilon} \int_{t_0}^{t_1} L(x + \varepsilon\xi, \dot{x} + \varepsilon\dot{\xi}, t)\, dt\right]_{\varepsilon = 0}. \tag{3.4}$$
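The definition in Eq. (3.4) reduces the functional derivative to an ordinary derivative in $\varepsilon$, which can be estimated numerically. The sketch below assumes the Lagrangian of Eq. (3.2) with $m = 1$ and the harmonic potential $V(x) = x^2/2$ (an illustrative choice, not from the text), and compares the path $x(t) = \sin t$, which solves $\ddot{x} = -x$, with a straight line through the same end points. The function names and the variation $\xi(t) = \sin \pi t$ are likewise assumptions.

```python
import numpy as np

def action(x, xdot, t, m=1.0, k=1.0):
    """F[x] of Eq. (3.1) with L = m*xdot^2/2 - k*x^2/2, by the trapezoidal rule."""
    lagrangian = 0.5 * m * xdot**2 - 0.5 * k * x**2
    dt = t[1] - t[0]
    return dt * (lagrangian.sum() - 0.5 * (lagrangian[0] + lagrangian[-1]))

def first_variation(x, xdot, xi, xidot, t, eps=1e-3):
    """Central-difference estimate of d/d(eps) F[x + eps*xi] at eps = 0, Eq. (3.4)."""
    plus = action(x + eps * xi, xdot + eps * xidot, t)
    minus = action(x - eps * xi, xdot - eps * xidot, t)
    return (plus - minus) / (2.0 * eps)

t = np.linspace(0.0, 1.0, 20001)
xi, xidot = np.sin(np.pi * t), np.pi * np.cos(np.pi * t)   # xi(0) = xi(1) = 0

# Path solving x'' = -x, and a straight line with the same end points.
dF_classical = first_variation(np.sin(t), np.cos(t), xi, xidot, t)
dF_straight = first_variation(np.sin(1.0) * t, np.full_like(t, np.sin(1.0)),
                              xi, xidot, t)
print(dF_classical, dF_straight)
```

The first value vanishes up to quadrature error, while the second does not: for the straight line the derivative equals $-\sin(1)/\pi$ for this particular $\xi$.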
The first order variation $\delta F(x)$ is defined as the term having $\varepsilon$ as the multiplier in the Taylor series expansion of $F[x, \xi, \varepsilon]$ in powers of $\varepsilon$, with similar definitions for the higher order variations. The total variation $\Delta F[x]$ is given by

$$\Delta F[x] = (F[x, \xi, \varepsilon] - F[x, \xi, 0]) = (F[x + \delta x] - F[x]).$$

Since it is of no consequence, $\varepsilon$ is usually absorbed in $\delta x$ while defining the variations of the functionals. The higher order functional derivatives can be defined in the same manner, i.e., as the higher order derivatives with respect to $\varepsilon$. The operations are legitimate for sufficiently smooth functions. All that is required is that $L(x, \dot{x}, t)$ be a twice continuously differentiable function of its arguments. The stationary value of the functional is thus determined by setting its first functional derivative at $\varepsilon = 0$ equal to zero. If the second derivative is positive, negative or zero, the stationary point is a minimum, maximum or a saddle point, respectively. Setting the first derivative equal to zero and integrating by parts, together with the boundary condition $\xi(t_0) = \xi(t_1) = 0$, yields,
$$\int_{t_0}^{t_1} \xi(t) \left[\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right) - \frac{\partial L}{\partial x}\right] dt = 0. \tag{3.5}$$

If $\xi$ varies over a sufficiently large set, dense in the underlying space in some sense, then the remainder of the integrand must be zero, which yields the Euler-Lagrange equations,

$$\left[\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right) - \frac{\partial L}{\partial x}\right] = 0. \tag{3.6}$$

Classically, $\xi$ is assumed to vary over the appropriate subset of the continuous functions and the results of calculus are invoked to deduce the Euler-Lagrange equations. Alternatively, the left side of Eq. (3.5) can be considered a functional defined on a Banach space and thus, $\xi$ varies over a dense set in its dual. If the Banach space is a Hilbert space, then the left side of Eq. (3.5) is the familiar scalar product. The condition for the extremal is clearly equivalent to the requirement of the first functional variation being equal to zero for all admissible variations of the function. The solution $x(t)$ of Eq. (3.6) yields the stationary value of the functional $F[x]$. If the Lagrangian is defined by Eq. (3.2), the Euler-Lagrange equations reduce to Newton's second law of motion: $m\ddot{x} = -\partial V(x)/\partial x$.

Consider the equation
$$(z - A) f = g, \tag{3.7}$$

where $A$ is an operator from a Hilbert space $H$ to $H$, $g$ is a vector in $H$ and $f$ is an unknown vector in the domain of $A$, which is dense in $H$. The associated homogeneous equation

$$(\lambda - A)\psi = 0 \tag{3.8}$$

defines an eigenvalue equation. The solution $\psi$ of Eq. (3.8), defined within a multiplicative constant, can be normalized according to convenience. One of the reasons for interest in the variational methods in the scientific and engineering disciplines is their application to solving equations of the type of Eqs. (3.7) and (3.8), assuming the existence and the needed uniqueness properties, which are established by independent considerations. Another reason is the use of the variational principles to formulate the physical problems as in the case of Hamilton's principle. In case of Eq. (3.7), the interest is in suitable approximation schemes for the solution $f$ while $z$ is a specified complex number, which may be fixed or vary over a region. Eq. (3.7) and its adjoint equation
$$(z^* - \hat{A})\hat{f} = \hat{g} \tag{3.9}$$

are closely related, assuming that the adjoint $\hat{A}$ exists, although their solutions $f$ and $\hat{f}$, and the right sides are independent of each other. In case of Eq. (3.8), the approximation schemes for the eigenvalues and the eigenvectors are needed, i.e., only the operator is specified. As is the case with the inhomogeneous equation, Eq. (3.8) is also closely related with its adjoint equation

$$(\lambda^* - \hat{A})\hat{\psi} = 0. \tag{3.10}$$
There is some overlap between the treatments of the inhomogeneous equations and the eigenvalue equations. However, there are also significant differences, resulting from the fact that while the parameter $z$ in the inhomogeneous equation is unrestricted within a certain domain, the eigenvalue is restricted by the equation it satisfies. First we consider the inhomogeneous equations. The eigenvalue problem will be considered taking advantage of this treatment supplemented with additional necessary considerations.

Consider the sesqui-linear form (sec. 2.III.),

$$F[\hat{h}, h] = (\hat{h}, g) + (\hat{g}, h) - (\hat{h}, [z - A]h). \tag{3.11}$$
For a fixed $h$, $\hat{h}$, the form $F[\hat{h}, h]$ can be considered a functional of the variable vectors $\hat{h}$, $h$, respectively, as in Proposition 2.5. The classical argument can now be directly applied to determine the stationary values of the respective functionals and the vectors yielding this value. Setting the first variation $\delta F[\hat{h}, h]$ of $F[\hat{h}, h]$ under the variation $\delta\hat{h} = \hat{u}$ of $\hat{h}$, while $h$ is kept fixed, yields

$$\delta F[\hat{h}, h]\,\big|_{h\ \text{fixed}} = (\hat{u}, g) - (\hat{u}, [z - A]h) = 0. \tag{3.12}$$
If $\hat{u} = \delta\hat{h}$ varies over a set dense in the range of $A$, i.e., $D(\hat{A})$, assumed to be dense in $H$, then from Lemma 1.6, Eq. (3.12) implies that $(z - A)h = g$, and if Eq. (3.7) has a unique solution, e.g., for $z$ in the resolvent set of $A$, then $h = f$. The converse is trivially true. This provides a complete variational characterization of the solution of Eq. (3.7). Similarly, the solution of Eq. (3.9) is characterized by considering the first functional variation with respect to the variations of $h$, as it varies over a dense set. Both equations can be fully characterized simultaneously by considering the first variation of the sesqui-linear form as follows:

$$\delta F[\hat{h}, h] = (\delta\hat{h}, g) + (\hat{g}, \delta h) - (\delta\hat{h}, [z - A]h) - (\hat{h}, [z - A]\delta h) = 0. \tag{3.13}$$
If $\delta h$ and $\delta\hat{h}$ are varied independently, each on a dense set, then both Eqs. (3.7) and (3.9) are equivalent to Eq. (3.13). This may be considered as $(\delta h \times \delta\hat{h})$ varying over a dense set in the Cartesian product of the two copies of the same Hilbert space, which can be, and usually is, the Cartesian product of the two copies of the same dense set in the original Hilbert space. If $A$ is self-adjoint, $z$ real and $\hat{g} = g$, Eq. (3.9) reduces to Eq. (3.7). If in addition, the Hilbert space is defined on the field of real numbers, the diagonal element of the form, which is well defined in terms of a single vector, is sufficient for the variational characterization of the solution. In this case, Eq. (3.11) reduces to

$$F[h, h] = F^d[h] = 2(h, g) - (h, [z - A]h). \tag{3.14}$$
The variational equation

$$\delta F^d[h] = 2\,[\,(\delta h, g) - (\delta h, [z - A]h)\,] = 0, \tag{3.15-a}$$

with $\delta h$ varying over a dense set, is then equivalent to Eq. (3.7). The total variation of $F^d[h]$ about the exact solution in this case is given by

$$F^d[f + \delta f] - F^d[f] = (\delta f, [A - z]\,\delta f). \tag{3.15-b}$$
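Eq. (3.15-b) is an exact identity, not a leading-order estimate, which is easy to confirm in a finite-dimensional model. The sketch below assumes a real symmetric positive definite matrix in place of $A$, the Euclidean scalar product, and a real $z$ below its spectrum; all names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, z = 6, 0.3
B = rng.standard_normal((n, n))
A = B @ B.T + np.eye(n)      # real symmetric with spectrum >= 1, so A - z is positive
g = rng.standard_normal(n)
I = np.eye(n)

def F_d(h):
    """Diagonal form of Eq. (3.14) for real vectors."""
    return 2.0 * h @ g - h @ ((z * I - A) @ h)

f = np.linalg.solve(z * I - A, g)          # exact solution of Eq. (3.7)
delta = 1e-2 * rng.standard_normal(n)      # an arbitrary variation

lhs = F_d(f + delta) - F_d(f)
rhs = delta @ ((A - z * I) @ delta)        # right side of Eq. (3.15-b)
print(lhs, rhs)
```

Since $(A - z)$ is positive in this model, the total variation is positive for every non-zero `delta`, so the exact solution minimizes $F^d$.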
For a positive, negative $(A - z)$, the total variation is positive, negative, respectively. Thus, the variational principle acquires the status of a minimum, maximum principle, respectively.

For a numerical implementation of the above formulation to solve Eqs. (3.7) and (3.9), trial functions $f^t$ and $\hat{f}^t$, with some unknown parameters, are used to construct the form $F[\hat{f}^t, f^t]$. The parameters are determined by setting the first derivative with respect to each of the parameters equal to zero. The functions $f^t$ and $\hat{f}^t$ are taken as the approximations to the solutions $f$ and $\hat{f}$, respectively, and $F[\hat{f}^t, f^t]$ as an approximation to $F[\hat{f}, f]$, which is often a quantity of physical interest and may be experimentally observable. Although less frequently, if the only quantity of interest is $F[\hat{f}, f]$, the approximate solutions from other calculations are also used to approximate it. By virtue of the fact that the first functional derivative at the solutions is equal to zero, the error $\Delta F = (F[\hat{f}^t, f^t] - F[\hat{f}, f])$ in the calculated value of the functional is normally assumed to be of the second order in the errors $\delta f = (f^t - f)$ and $\delta\hat{f} = (\hat{f}^t - \hat{f})$ in the solutions. As shown by the following counter example, careless application of this conjecture can lead to erroneous results.
Example 3.1. Consider Eq. (3.7) with $A$ defined by $Au = -\ddot{u}$ on $H = L^2([0;1], dx)$ together with the vanishing boundary conditions, $u(0) = u(1) = 0$, and other usual conditions defining its domain. Let $z = 0$ and $\delta f = \sin(\pi n^2 x)/n$, where $n$ is an integer. Thus, $\delta f$ is an admissible variation converging to zero as $n$ increases. It follows that

$$\Delta F^d = (F^d[f + \delta f] - F^d[f]) = (\delta f, A\,\delta f) = \frac{\pi^2 n^2}{2} \underset{n \to \infty}{\longrightarrow} \infty.$$
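The divergence in this example can be reproduced numerically. The following minimal sketch evaluates $\|\delta f\|$ and $(\delta f, A\,\delta f)$ by the trapezoidal rule, using the analytic second derivative of $\delta f$; the function name and grid size are illustrative assumptions.

```python
import numpy as np

def norm_and_form(n, points=200001):
    """Return ||df|| and (df, A df) for df = sin(pi n^2 x)/n with A = -d^2/dx^2."""
    x = np.linspace(0.0, 1.0, points)
    dx = x[1] - x[0]
    df = np.sin(np.pi * n**2 * x) / n
    a_df = (np.pi * n**2)**2 * df            # A df = -df'', computed analytically
    trapz = lambda y: dx * (y.sum() - 0.5 * (y[0] + y[-1]))
    return np.sqrt(trapz(df**2)), trapz(df * a_df)

for n in (1, 2, 4, 8):
    norm, form = norm_and_form(n)
    # The norm decays like 1/(n sqrt(2)) while the form grows like pi^2 n^2 / 2.
    print(n, norm, form, np.pi**2 * n**2 / 2.0)
```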
This shows that a small variation in the function about the exact solution does not imply a small change in the functional. It will be shown in Corollary 3.4 that this catastrophic result will not be encountered with the solutions obtained by the variational method •

To determine the approximations to the solutions within the framework of the variational methods, the following procedure is most widely used for its simplicity. In this scheme, the trial functions, now denoted by $f_N$, $\hat{f}_N$, are expressed as linear combinations of a basis $\{\phi_n\}$ with unknown parameters $\alpha_n$, $\hat{\alpha}_n$, i.e.,

$$f_N = \sum_{n=1}^{N} \alpha_n \phi_n, \tag{3.16-a}$$

$$\hat{f}_N = \sum_{n=1}^{N} \hat{\alpha}_n \phi_n. \tag{3.16-b}$$
The parameters are determined by setting the derivative of $F[\hat{f}_N, f_N]$ with respect to each of the parameters equal to zero, which is to solve the following set of algebraic equations:

$$\sum_{m=1}^{N} \alpha_m (\phi_n, [z - A]\phi_m) = (\phi_n, g), \tag{3.17-a}$$

$$\sum_{m=1}^{N} \hat{\alpha}_m (\phi_n, [z^* - \hat{A}]\phi_m) = (\phi_n, \hat{g}), \tag{3.17-b}$$

$n = 1, 2, \ldots, N$. The approximations to the solutions are given by Eqs. (3.16-a, b) with the parameters determined by Eqs. (3.17-a, b). In addition to being about the most straightforward application of the variational principle, this procedure incorporates the independence of the variations of the solutions of Eqs. (3.7) and (3.9), and decouples the variationally derived approximate equations. Furthermore, this reduces the study to the consideration of Eq. (3.17-a) and Eq. (3.17-b) essentially independently without direct involvement of the form. This procedure is known as the direct, the Rayleigh-Ritz or the Bubnov-Galerkin method. In the present text, the term variational methods will be used to mean obtaining the approximate solution of Eq. (3.7) by solving Eq. (3.17-a) and taking the approximate solution as defined by Eq. (3.16-a), which can then be used to compute other physical quantities of interest approximately. The adjoint equation will be invoked when needed.

It can be checked by algebraic manipulations that in Eqs. (3.16-a) to (3.17-b), an arbitrary basis can be replaced with an orthonormal basis $\{\varphi_n\}$ without altering the approximate solution as long as the $N$-dimensional subspace spanned by $\{\varphi_n\}_{n=1}^{N}$ and by $\{\phi_n\}_{n=1}^{N}$ is the same, by virtue of the fact that each set can be obtained by linear combinations of the members of the other. Thus, a basis can be selected for its computational convenience. An orthonormal basis is more convenient for analysis, which may or may not be more convenient computationally. This replacement will be assumed in the following to avoid repeated writing of essentially the same equations. The following result reduces the study of the algebraic equations, Eqs. (3.17-a) and (3.17-b), to the corresponding problem of analysis.

Theorem 3.1. Let $P_N$ be the orthoprojection on the $N$-dimensional subspace of $H$ spanned by the first $N$ basis vectors $\{\varphi_n\}_{n=1}^{N}$, i.e., $P_N u = \sum_{n=1}^{N} \varphi_n (\varphi_n, u)$, and let $A_N = P_N A P_N$. Then Eq. (3.17-a) has a solution if and only if

$$(z - A_N) f_N = P_N g \tag{3.18-a}$$

has a solution, and Eq. (3.17-b) has a solution if and only if

$$(z^* - \hat{A}_N)\hat{f}_N = P_N \hat{g} \tag{3.18-b}$$

has a solution, with $f_N$ and $\hat{f}_N$ as given by Eq. (3.16-a) and Eq. (3.16-b), respectively.

Proof. We give a proof for Eq. (3.18-a). The adjoint case is treated in the same manner.
With the basis replaced with an orthonormal basis, Eq. (3.17-a) reduces to

$$\left[z\,\alpha_n - \left(\varphi_n, A \sum_{m=1}^{N} \alpha_m \varphi_m\right)\right] = (\varphi_n, g), \quad n = 1, 2, \ldots, N. \tag{3.19}$$

Multiplying Eq. (3.19) by $\varphi_n$ and summing over $n$ reduces it for each $z$ to Eq. (3.18-a) with $f_N = \sum_{n=1}^{N} \alpha_n \varphi_n$.

Conversely, if Eq. (3.18-a) has a solution $f_N$, it is given by

$$z f_N = P_N g + A_N f_N = \sum_{n=1}^{N} \varphi_n (\varphi_n, g) + \sum_{n=1}^{N} \varphi_n (\varphi_n, A f_N).$$
For $z \neq 0$, this implies that $f_N = \sum_{n=1}^{N} \alpha_n \varphi_n$, where

$$z\,\alpha_n = (\varphi_n, g) + (\varphi_n, A f_N) = (\varphi_n, g) + \sum_{m=1}^{N} \alpha_m (\varphi_n, A\varphi_m)$$

for each $n$, which is the same equation as Eq. (3.17-a). If $z = 0$, one can consider the equation $[z' - (A + z')]f = g$, together with its approximating equation

$$[z' - P_N (A + z') P_N] f = P_N g,$$

with an arbitrary $z' \neq 0$ to deduce that

$$z'\alpha_n = (\varphi_n, g) + \sum_{m=1}^{N} \alpha_m (\varphi_n, [A + z']\varphi_m),$$

which shows the validity of the result for this case as well. We have used the fact that $f_N = P_N f_N$, which is a consequence of the fact that the right sides in Eqs. (3.18-a,b), and $f_N$, $\hat{f}_N$ defined by Eqs. (3.16-a,b), are in the range of $P_N$ •

As a consequence of Theorem 3.1, we have

Corollary 3.1. Let symbols be as in Theorem 3.1. Assuming that the solutions $f_N$, $\hat{f}_N$ exist, we have

$$(\hat{f}_N, g) = (\hat{g}, f_N) = (\hat{f}_N, [z - A_N] f_N) = (\hat{f}_N, [z - A] f_N) = F[\hat{f}_N, f_N].$$

Proof. Follows by straightforward substitutions in Theorem 3.1 and using the facts that $P_N f_N = f_N$ and $P_N \hat{f}_N = \hat{f}_N$ •

Remark 3.1. The equalities of Corollary 3.1 are valid with $f$, $\hat{f}$, $A$ replacing $f_N$, $\hat{f}_N$, $A_N$, respectively, whether $f_N$, $\hat{f}_N$ converge or not. The fact that the equalities of Corollary 3.1 hold for the approximate solutions also yields the basic results of the variational method stated in Corollaries 3.2 to 3.4, which follow by straightforward manipulations. The results are stated assuming that $M \ge N$, i.e., $P_M P_N = P_N P_M = P_N$, and are valid also with the replacement of $f_M$, $\hat{f}_M$, $A_M$ by $f$, $\hat{f}$, $A$, respectively, as long as the solutions exist, whether the approximations converge to $f$, $\hat{f}$ or not •

Corollary 3.2. With the assumptions as in Corollary 3.1 and Remark 3.1, we have
$$(F[\hat{f}_M, f_M] - F[\hat{f}_N, f_N]) = (\hat{f}_M - \hat{f}_N, g) = (\hat{g}, f_M - f_N) = (\hat{f}_M, [z - A] f_M) - (\hat{f}_N, [z - A] f_N) = (\hat{f}_M - \hat{f}_N, [z - A](f_M - f_N))\ •$$

The last equality on the right side of Corollary 3.2 can be majorized to yield

Corollary 3.3. In addition to the assumptions of Corollary 3.2, let $A$ be a bounded operator. Then

$$|F[\hat{f}_M, f_M] - F[\hat{f}_N, f_N]| = |(\hat{f}_M - \hat{f}_N, g)| = |(\hat{g}, f_M - f_N)| \le \text{Const.}\ \|\hat{f}_M - \hat{f}_N\|\ \|f_M - f_N\|\ •$$

It follows from Corollary 3.3 by replacing $f_M$, $\hat{f}_M$ by $f$, $\hat{f}$, that the error in the form calculated with the approximate solutions is of the second order with respect to the error in the solutions as long as the operator is bounded. This conclusion cannot be reached for the unbounded operators. However, the first two terms on the right side of the equality of Corollary 3.3 can be estimated to yield a weaker inequality:

Corollary 3.4. With the assumptions of Corollary 3.2, we have

$$|F[\hat{f}_M, f_M] - F[\hat{f}_N, f_N]| \le \min\left(\{\|\hat{f}_M - \hat{f}_N\|\ \|g\|\},\ \{\|\hat{g}\|\ \|f_M - f_N\|\}\right)\ •$$

Substituting $f$, $\hat{f}$ for $f_M$, $\hat{f}_M$ in Corollary 3.4 shows that the catastrophic result illustrated in Example 3.1 will not be encountered for the solutions obtained by the variational method.

Although not of much interest in case of the variational methods, the following representation of the forms considered in Corollary 3.1 is of considerable interest in the theory of the Padé approximants and the related moment problem, which will be shown to be a special case of the variational methods.

Proposition 3.1. With symbols as in Theorem 3.1 and Corollary 3.1, and with basis independent of $z$, $(\hat{g}, f_N)$ and $(\hat{f}_N, g)$ are rational functions of $z$, expressible as

$$(\hat{g}, f_N) = Q_{N-1}(z)/P_N(z) \quad \text{and} \quad (g, \hat{f}_N) = \hat{Q}_{N-1}(z)/\hat{P}_N(z),$$

where $Q_m(z)$, $\hat{Q}_m(z)$, $P_m(z)$, $\hat{P}_m(z)$ are polynomials of degree $m$ in $z$.
Proof. We give a proof for $(\hat{g}, f_N)$. The other case follows similarly.

From Eqs. (3.16-a) and (3.17-a), $(\hat{g}, f_N)$ is expressible as $\sum_{n=1}^{N} \alpha_n(z)\hat{\beta}_n$, where $\alpha_n$ are the elements of the solution $\alpha$ of the matrix equation

$$L_N(z)\,\alpha(z) = \beta, \tag{3.20}$$

where $L_N(z)$ is an $N \times N$ matrix with elements $(L_N(z))_{nm}$, which are linear functions of $z$, and $\beta$ is a $z$-independent $N$-vector. The determinant of an $m \times m$ matrix with each entry linear in $z$ is a polynomial of degree $m$, as follows. The result is obvious for $m = 1$. If it is true for some value of $m$, straightforward computation shows its validity for $(m + 1)$. The result is valid for all $m$ by induction. Thus the determinant of $L_N(z)$ is a polynomial of degree $N$ and the cofactors are polynomials of degree $(N - 1)$ in $z$. The result follows by substitution •

The inhomogeneous equations with $g = \hat{g} = 0$ reduce to the corresponding eigenvalue equations and the form $F[\hat{h}, h]$ reduces to

$$F_0[\hat{h}, h] = (\hat{h}, [A - \lambda]h), \tag{3.21}$$

with appropriate substitutions. If a number $\lambda$ exists such that the first variation of $F_0[\hat{h}, h]$ vanishes under all small variations of $\hat{h}$, $h$ over a dense set, then $h = \psi$, $\hat{h} = \hat{\psi}$ and $\lambda$, $\lambda^*$ are the eigenvalues of $A$, $\hat{A}$, respectively. Alternatively, $F_0[\hat{h}, h]$ is stationary about the exact solutions and eigenvalues of Eqs. (3.8) and (3.10).

The present characterization of the solutions of the eigenvalue equations is essentially the same as the formulation more frequently found in the literature, indicated below. The form $(\hat{h}, Ah)$ is stationary about the solutions of the eigenvalue equations with the restriction $(\hat{h}, h) = 1$. This condition is then implemented by the use of a Lagrange multiplier $\lambda$, reducing the formulation to the present one. In any case, the exact eigenvalue is given by $\lambda = (\hat{\psi}, A\psi)/(\hat{\psi}, \psi)$ and the approximate value $\lambda^t$ by $\lambda^t = (\hat{\psi}^t, A\psi^t)/(\hat{\psi}^t, \psi^t)$, where the superscript $t$ denotes the trial solutions. The non-linear form $\tilde{\lambda}[\hat{h}, h] = (\hat{h}, Ah)/(\hat{h}, h)$ is also stationary about the exact solutions, yielding essentially the same formulation as by varying $F_0[\hat{h}, h]$.
Practical application of the variational characterization can now be developed in a manner parallel to the inhomogeneous equations, which reduces the problem to solving the homogeneous counterparts of Eqs. (3.17-a) and (3.17-b), given by

$$\sum_{m=1}^{N} \alpha_m (\phi_n, [\lambda - A]\phi_m) = 0, \tag{3.22-a}$$

and

$$\sum_{m=1}^{N} \hat{\alpha}_m (\phi_n, [\lambda^* - \hat{A}]\phi_m) = 0, \tag{3.22-b}$$

which is equivalent to

Theorem 3.2. Variational approximations to the eigenvalues and the eigenvectors of an operator $A$ and $\hat{A}$ are the eigenvalues and the eigenvectors of $A_N = P_N A P_N$ and $\hat{A}_N = P_N \hat{A} P_N$, respectively.

Proof. Follows by setting $g = \hat{g} = 0$ in Eqs. (3.18-a) and (3.18-b) together with other
obvious substitutions •

We have inherently assumed that the eigenvalues are non-degenerate. Degeneracy creates no complication for the present purpose. In this case, the correspondence between the eigenspaces is still maintained. This issue will be addressed when it is of consequence, particularly in the case of spectral differentiation.

The results of Corollaries 3.1 to 3.4 for the inhomogeneous equations have been deduced for a fixed value of $z$. In case of an eigenvalue equation, the eigenvalue as well as the vector varies with $N$. Therefore, these results and the derivations require some adjustments. The operator $A_N = P_N A P_N$ has $N$ eigenvalues counting multiplicities. For the present, it is sufficient to consider one of them, $\lambda_N$, with a corresponding normalized eigenvector $\psi_N$. The corresponding eigenvalue of $\hat{A}_N = P_N \hat{A} P_N$ is $\lambda_N^*$ with the associated eigenvector $\hat{\psi}_N$. It is still true, from Theorem 3.2, that $P_N \psi_N = \psi_N$ and $P_N \hat{\psi}_N = \hat{\psi}_N$, and hence

$$(\hat{\psi}_N, [\lambda_N - A_N]\psi_N) = (\hat{\psi}_N, [\lambda_N - A]\psi_N). \tag{3.23}$$

This leads to the following equality:

Corollary 3.5. In addition to the symbols as above, let $P_N P_M = P_M P_N = P_N$ and $(\hat{\psi}_N, \psi_N) = 1$. Then

$$(\lambda_M - \lambda_N) = (\hat{\psi}_M - \hat{\psi}_N, [\lambda_M - A](\psi_M - \psi_N)).$$

Proof. From Theorem 3.2, we have that
$$[\lambda_M - A_M]\psi_M = [\lambda_M^* - \hat{A}_M]\hat{\psi}_M = 0.$$

Consequently,

$$(\hat{\psi}_M, [\lambda_M - A]\psi_M) = (\hat{\psi}_N, [\lambda_M - A]\psi_M) = (\hat{\psi}_M, [\lambda_M - A]\psi_N) = 0.$$

It follows that

$$(\lambda_M - \lambda_N)(\hat{\psi}_N, \psi_N) = (\hat{\psi}_N, [\lambda_M - A]\psi_N) - (\hat{\psi}_M, [\lambda_M - A]\psi_M) = (\hat{\psi}_M - \hat{\psi}_N, [\lambda_M - A](\psi_M - \psi_N)),$$

implying the result •

For a bounded $A$, the result of Corollary 3.5 implies that

$$|\lambda_M - \lambda_N| \le \text{const.}\ \|\hat{\psi}_M - \hat{\psi}_N\|\ \|\psi_M - \psi_N\|. \tag{3.24}$$

This result can be seen to be valid with other usual normalization conditions also. As in the inhomogeneous case, second order accuracy of the variational approximation to the eigenvalues cannot be concluded for an unbounded operator.
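The procedure of Eqs. (3.16-a) to (3.17-b), together with the identities of Corollaries 3.1 and 3.2, can be illustrated in a finite-dimensional model. The sketch below assumes a real $8 \times 8$ matrix of unit norm in place of $A$ (so that its adjoint is the transpose), a real $z = 2$ in its resolvent set, the Euclidean scalar product, and the first $N$ standard basis vectors as the orthonormal basis; all names and sizes are illustrative assumptions, and the exact solutions play the role of $f_M$, $\hat{f}_M$ as permitted by Remark 3.1.

```python
import numpy as np

rng = np.random.default_rng(1)
d, z = 8, 2.0
M = rng.standard_normal((d, d))
A = M / np.linalg.norm(M, 2)           # ||A|| = 1 < z, so z is in the resolvent set
g, g_hat = rng.standard_normal(d), rng.standard_normal(d)
I = np.eye(d)

f = np.linalg.solve(z * I - A, g)              # Eq. (3.7)
f_hat = np.linalg.solve(z * I - A.T, g_hat)    # Eq. (3.9); adjoint of real A is A.T

def form(h_hat, h):
    """Form of Eq. (3.11) for real vectors."""
    return h_hat @ g + g_hat @ h - h_hat @ ((z * I - A) @ h)

def galerkin(N):
    """Solve Eqs. (3.17-a,b) in the span of the first N standard basis vectors."""
    Phi = I[:, :N]                             # columns phi_1, ..., phi_N
    K = Phi.T @ (z * I - A) @ Phi              # entries (phi_n, [z - A] phi_m)
    alpha = np.linalg.solve(K, Phi.T @ g)
    alpha_hat = np.linalg.solve(K.T, Phi.T @ g_hat)
    return Phi @ alpha, Phi @ alpha_hat        # f_N and f_hat_N of Eqs. (3.16-a,b)

exact = form(f_hat, f)
for N in (2, 4, 6):
    fN, fN_hat = galerkin(N)
    err_form = exact - form(fN_hat, fN)
    bilinear = (f_hat - fN_hat) @ ((z * I - A) @ (f - fN))
    # Corollary 3.1: the first three printed values agree.
    # Corollary 3.2: err_form equals the bilinear expression in the two errors.
    print(N, fN_hat @ g, g_hat @ fN, form(fN_hat, fN), err_form, bilinear)
```

Because the model operator is bounded, the error in the form is the product of the two solution errors, as in Corollary 3.3.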
3.II. CONVERGENCE

3.II.1. Basic Results

The results of Corollaries 3.1 to 3.5 show that if the variational approximations converge to the exact solutions, the form converges even faster for the bounded operators and at least as fast for the unbounded operators. In this section, the convergence properties of the variationally obtained sequences are determined for the types of operators that are encountered in applications. While the convergence to the exact solution is of an independent interest, it also enables a proper use of the results obtained in sec. 3.I. to approximate the quantities of interest expressible as forms.

First we collect some basic results to be invoked in proving the convergence for classes of operators, which will be used to develop the solution schemes for equations encountered in practice, in the applications part.

Lemma 3.1. If a sequence of operators $\{A_N\}$ converges uniformly to $A$, then the sequence of their resolvents $\{(z - A_N)^{-1}\}$ converges uniformly to $(z - A)^{-1}$ for each $z$ in the resolvent set of $A$. The convergence is uniform with respect to $z$ in each closed bounded set contained in the resolvent set of $A$.
Proof. Since $\|A_N - A\| \to 0$ as $N \to \infty$ and $(z - A)^{-1}$ is bounded,
$$\|B_N(z)\| = \|(A_N - A)(z - A)^{-1}\| \le \|A_N - A\| \, \|(z - A)^{-1}\| \xrightarrow[N \to \infty]{} 0.$$
Consequently, $\|B_N(z)\| < 1$ for sufficiently large $N$ and thus, from Lemma 2.3, the right side of
$$(z - A_N)^{-1} = (z - A)^{-1}[1 - B_N(z)]^{-1}$$
admits a uniformly convergent Neumann expansion, which defines the left side as a bounded operator. Convergence in the uniform operator topology now follows from
$$\Delta(z) = \|(z - A_N)^{-1} - (z - A)^{-1}\| \le \|(z - A)^{-1}\| \, \|B_N(z)\| \, [1 - \|B_N(z)\|]^{-1} \xrightarrow[N \to \infty]{} 0.$$
With $z$, $z'$ in the resolvent set of $A$,
$$\|B_N(z) - B_N(z')\| = \|(A_N - A)[(z - A)^{-1} - (z' - A)^{-1}]\| \le |z - z'| \, \|B_N(z)\| \, \|(z' - A)^{-1}\| < |z - z'| \, \|(z' - A)^{-1}\|,$$
for sufficiently large $N$. Consequently, $\|B_N(z)\|$ is a family of equicontinuous functions. Similarly, it can be seen that $\|(z - A)^{-1}\|$ is a uniformly continuous function of $z$. It follows that $\Delta(z)$ is a family of equicontinuous functions of $z$. Uniform convergence with respect to $z$ in each closed bounded set contained in the resolvent set of $A$ follows from the Arzela-Ascoli theorem (Lemma 1.3) •
Uniform convergence of a sequence of operators transfers essentially the same property to their eigenprojections. Although the following result can be extended further, we restrict to the case when part of $\sigma(A)$ can be enclosed in a closed bounded region in the complex plane, termed the isolated part of $\sigma(A)$.
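The estimate on $\Delta(z)$ used in the proof of Lemma 3.1 can be checked directly on matrices. The sketch below is illustrative only: the matrices `A` and `E`, the sequence $A_N = A + E/N$ and the point $z$ are assumptions made for the example, and `resolvent_gap` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
S = rng.standard_normal((n, n))
A = (S + S.T) / 2
A /= np.linalg.norm(A, 2)            # symmetric, ||A|| = 1
E = rng.standard_normal((n, n))
E /= np.linalg.norm(E, 2)            # perturbation direction, ||E|| = 1
z = 0.3 + 1.0j                       # non-real, hence in the resolvent set of A
I = np.eye(n)
R = np.linalg.inv(z * I - A)         # (z - A)^(-1)

def resolvent_gap(N):
    """Return ||B_N(z)||, Delta(z) and the bound of Lemma 3.1 for A_N = A + E/N."""
    A_N = A + E / N                  # ||A_N - A|| = 1/N -> 0
    b = np.linalg.norm((A_N - A) @ R, 2)
    R_N = np.linalg.inv(z * I - A_N)
    delta = np.linalg.norm(R_N - R, 2)
    bound = np.linalg.norm(R, 2) * b / (1.0 - b)
    return b, delta, bound

rows = [resolvent_gap(N) for N in (2, 4, 8, 16, 32)]
for N, (b, delta, bound) in zip((2, 4, 8, 16, 32), rows):
    print(f"N={N:3d}  ||B_N||={b:.3e}  Delta={delta:.3e}  bound={bound:.3e}")
```

In each row the computed $\Delta(z)$ stays below the bound $\|(z - A)^{-1}\| \, \|B_N(z)\| \, [1 - \|B_N(z)\|]^{-1}$, and both decrease with $N$.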
Lemma 3.2. (i) With the assumptions of Lemma 3.1, let $p(\tilde{\sigma})$ be the eigenprojection on the invariant subspace corresponding to an isolated part $\tilde{\sigma}$ of the spectrum of $A$. Then for sufficiently large $N$, $\{A_N\}$ has a sequence of eigenprojections $p_N(\tilde{\sigma}_N)$ with corresponding spectrum $\tilde{\sigma}_N$, such that $\|p_N(\tilde{\sigma}_N) - p(\tilde{\sigma})\| \to 0$ and each point in $\tilde{\sigma}_N$ converges to a corresponding point in $\tilde{\sigma}$.
(ii) In addition to the assumptions of (i), let $\lambda$ be an $m$-fold degenerate, isolated eigenvalue of $A$. Then for sufficiently large $N$, $A_N$ has precisely $m$ eigenvalues $\lambda_{jN}$, $j = 1, 2, ..., m$, counting multiplicities, in a small neighborhood of $\lambda$, and each $\lambda_{jN} \to \lambda$ as $N \to \infty$.
Proof. (i) Let $c$ be a positively oriented closed contour of finite length enclosing $\tilde{\sigma}$ in its interior, and the remainder of the spectrum in its exterior, at a non-zero distance from $c$. It follows from Lemma 3.1 that
$$\sup_{z \text{ in } c} \|(z - A_N)^{-1} - (z - A)^{-1}\| \xrightarrow[N \to \infty]{} 0.$$
Consequently, $c$ is contained in the resolvent sets of $\{A_N\}$ for sufficiently large $N$. Let the projection $p_N(\tilde{\sigma}_N)$ be defined by
$$p_N(\tilde{\sigma}_N) = \frac{1}{2\pi i} \int_c dz \, (z - A_N)^{-1},$$
with $\tilde{\sigma}_N$ being the part of the spectrum of $A_N$ that is enclosed by $c$. It follows that
$$\|p_N(\tilde{\sigma}_N) - p(\tilde{\sigma})\| = \left\| \frac{1}{2\pi i} \int_c dz \, [(z - A_N)^{-1} - (z - A)^{-1}] \right\| \le \frac{1}{2\pi} \sup_{z \text{ in } c} \|(z - A_N)^{-1} - (z - A)^{-1}\| \int_c |dz| \xrightarrow[N \to \infty]{} 0.$$
Convergence of $\tilde{\sigma}_N$, which is clearly non-empty, to $\tilde{\sigma}$ follows from the fact that $c$ can be made arbitrarily close to $\tilde{\sigma}$ by increasing $N$.
(ii) From (i), we have that for sufficiently large $N$, $\|p_N(\tilde{\sigma}_N) - p(\lambda)\| < 1$. The dimension of the range of $p(\lambda)$ is equal to $m$ by assumption. Consequently, the dimension of the range of $p_N(\tilde{\sigma}_N)$ is also equal to $m$, from Lemma 2.2 (ii). This implies that $\tilde{\sigma}_N$ consists precisely of $m$ points, counting multiplicities. Convergence follows from (i) •
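The contour-integral definition of $p_N(\tilde{\sigma}_N)$ used in the proof of Lemma 3.2 lends itself to a direct numerical check. The following sketch is an illustration under assumptions, not the author's procedure: the contour is taken to be a circle, the integral is approximated by the trapezoidal rule, and the matrices `A`, `E` and the helper `riesz_projection` are hypothetical.

```python
import numpy as np

def riesz_projection(A, center, radius, nodes=200):
    """Trapezoidal-rule approximation of (1 / (2*pi*i)) times the integral of
    (z - A)^(-1) dz over the circle |z - center| = radius."""
    n = A.shape[0]
    I = np.eye(n)
    p = np.zeros((n, n), dtype=complex)
    for theta in 2.0 * np.pi * np.arange(nodes) / nodes:
        w = radius * np.exp(1j * theta)          # z - center on the contour
        p += w * np.linalg.inv((center + w) * I - A)
    return p / nodes

rng = np.random.default_rng(1)
n, m = 6, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag([1.0, 1.0, 3.0, 4.0, 5.0, 6.0]) @ Q.T   # eigenvalue 1 is 2-fold degenerate
E = rng.standard_normal((n, n))
E *= 0.1 / np.linalg.norm(E, 2)                         # ||E|| = 0.1
p_exact = Q[:, :m] @ Q[:, :m].T                         # eigenprojection p(lambda) for lambda = 1

errors, ranks = [], []
for N in (1, 2, 4, 8, 16):
    p_N = riesz_projection(A + E / N, center=1.0, radius=0.5)
    errors.append(np.linalg.norm(p_N - p_exact, 2))
    ranks.append(int(round(np.trace(p_N).real)))
    print(f"N={N:3d}  rank(p_N)={ranks[-1]}  ||p_N - p||={errors[-1]:.3e}")
```

The rank of $p_N$, computed as its trace, equals the multiplicity $m = 2$ for every $N$, in line with part (ii), while $\|p_N - p\|$ decreases as $\|A_N - A\| \to 0$.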
Although uniformly convergent sequences of operators are frequently encountered, this property is not available in a variety of practical cases. Relaxing this condition requires additional assumptions, and even then the convergence results are frequently weaker.
Lemma 3.3. If a sequence of operators $\{A_N\}$ converges strongly to $A$ and if the sequence of resolvents $\{(z - A_N)^{-1}\}$ is uniformly bounded by a constant independent of $N$ for a fixed $z$ in the resolvent set of $A$, then $\{(z - A_N)^{-1}\}$ converges strongly to $(z - A)^{-1}$. If the bound on $\{(z - A_N)^{-1}\}$ is independent of $z$ in a closed bounded set contained in the resolvent set of $A$, then the convergence is uniform with respect to $z$ on this set.
Proof. It follows from the second resolvent equation, Eq. (2.1), that
$$B_N(z)u = [(z - A_N)^{-1} - (z - A)^{-1}]u = (z - A_N)^{-1}(A_N - A)(z - A)^{-1}u$$
for each fixed vector $u$ in $H$. Since $[(z - A)^{-1}u]$ is a fixed vector in $H$ and $\{A_N\}$ converges strongly to $A$, we have $\|(A_N - A)(z - A)^{-1}u\| \to 0$ as $N \to \infty$. Further, since $\|(z - A_N)^{-1}\| \le$ const. independent of $N$,
$$\|B_N(z)u\| \le \|(z - A_N)^{-1}\| \, \|(A_N - A)(z - A)^{-1}u\| \le \text{const.}\; \|(A_N - A)(z - A)^{-1}u\| \xrightarrow[N \to \infty]{} 0.$$
As in Lemma 3.1, it is straightforward to show that $\{\|B_N(z)u\|\}$ is a family of equicontinuous functions of $z$ in the stated set. The uniform convergence with respect to $z$ on this set follows from the Arzela-Ascoli theorem (Lemma 1.3) •
In Lemma 3.3, we have assumed the strong convergence of the sequence of operators to conclude the same for the sequence of the resolvents, to their limits. At times the following sufficiency criterion can be exploited to remove the encumbrance of strong convergence on $H$.
Lemma 3.4. If a sequence $\{T_N\}$ of uniformly bounded operators converges strongly to $T$ on a set $D$ dense in $H$, it converges strongly on $H$.
Proof. For an arbitrary $u$ in $H$, there is a vector $\upsilon$ in $D$ such that $\|u - \upsilon\| \le \varepsilon$ for an arbitrary $\varepsilon > 0$. It follows that
$$\|(T_N - T)u\| \le \|(T_N - T)\upsilon\| + \|T_N - T\| \, \|u - \upsilon\| \le (1 + 2M)\varepsilon,$$
for the first term can be made smaller than $\varepsilon$ by increasing $N$ due to the strong convergence on $D$, and $M$ is the common bound of $\{T_N\}$, which implies the boundedness of $T$ by $M$ also (Theorem 2.1), yielding the stated bound on the second term. This implies strong convergence on $H$ •
Strong convergence of a sequence of resolvents, being weaker than the uniform convergence, does not transfer the same property to the eigenprojections to the same extent. However, supplemented with the self-adjointness, and additional properties, it can be used to obtain useful results. For the purpose of the following, strong convergence of the sequence of operators is largely inconsequential as these results depend on the strong convergence of the sequence of the resolvents $(z - A_N)^{-1}$ to $(z - A)^{-1}$. If this is the case, then the sequence of operators $\{A_N\}$ is said to converge strongly to $A$ in the generalized sense, which is a weaker condition, i.e., if $\{A_N\}$ converges strongly to $A$, then it converges also in the generalized sense on a subset of the resolvent set in accordance with Lemma 3.3, but the converse is not true.
Lemma 3.5. Let $A_N$ be a sequence of self-adjoint operators converging strongly to a
self-adjoint operator $A$ in the generalized sense, for all non-real values of $z$. Then $\rho_N(\lambda) = (f, E_N(\lambda)g)$ converges to $\rho(\lambda) = (f, E_\lambda g)$ at all of the points of continuity of $E_\lambda$, where $f$ and $g$ are arbitrary vectors in $H$, and $E_N(\lambda)$, $E_\lambda$ are the spectral functions of $A_N$, $A$, respectively.
Proof. The total variation of the sequence $\rho_N(\lambda)$ of complex functions of bounded variation of a real variable $\lambda$ is clearly uniformly bounded by $[\|f\| \, \|g\|]$. Hence, by Helly's theorem (Proposition 1.3) it has a convergent subsequence $\rho_{n(N)}(\lambda)$ with a limit $\rho'(\lambda)$. Consider the integral
$$I_{n(N)}(z) = \int_{-\infty}^{\infty} \frac{d\rho_{n(N)}(\lambda)}{(z - \lambda)},$$
which by integration by parts reduces to
$$I_{n(N)}(z) = -\int_{-\infty}^{\infty} \frac{\rho_{n(N)}(\lambda)}{(z - \lambda)^2} \, d\lambda,$$
where $z$ is a non-real complex number. The integrand is bounded by $\|f\| \, \|g\| / |z - \lambda|^2$, which is integrable with respect to $\lambda$ on $(-\infty; \infty)$. Hence it follows by the Lebesgue theorem (Theorem 1.3), that
$$I(z) = \lim_{N \to \infty} I_{n(N)}(z) = -\int_{-\infty}^{\infty} \frac{\rho'(\lambda)}{(z - \lambda)^2} \, d\lambda = \int_{-\infty}^{\infty} \frac{d\rho'(\lambda)}{(z - \lambda)}.$$
Since $(z - A_N)^{-1}$ converges strongly, and hence weakly, to $(z - A)^{-1}$, we have
$$\int_{-\infty}^{\infty} \frac{d\rho(\lambda)}{(z - \lambda)} = (f, [z - A]^{-1}g) = \lim_{N \to \infty} (f, [z - A_{n(N)}]^{-1}g) = \lim_{N \to \infty} \int_{-\infty}^{\infty} \frac{d\rho_{n(N)}(\lambda)}{(z - \lambda)} = \int_{-\infty}^{\infty} \frac{d\rho'(\lambda)}{(z - \lambda)}.$$
From Theorem 2.8, this implies that $\rho'(\lambda) = \rho(\lambda)$ at all of their points of continuity, which are identical. If the sequence itself does not converge, then there must be another convergent subsequence $\rho_{m(N)}(\lambda)$ with a limit $\rho''(\lambda) \ne \rho'(\lambda)$. However, as above, $\rho''(\lambda) = \rho(\lambda) = \rho'(\lambda)$, which is a contradiction, hence the result •
Theorem 3.3. With the assumptions of Lemma 3.5, the sequence $E_N(\lambda)$ of the spectral
functions of $A_N$ converges strongly to the spectral function $E_\lambda$ of $A$ at all of its points of continuity.
Proof. From Lemma 3.5, $E_N(\lambda)$ converges weakly to $E_\lambda$, i.e., the sequence of the vectors $[E_N(\lambda)g]$ converges weakly to $[E_\lambda g]$ for an arbitrary vector $g$. Since $E_N(\lambda)$ and $E_\lambda$ are self-adjoint projections, we have that $\|E_N(\lambda)g\|^2 = (g, E_N(\lambda)g)$, which converges to $(g, E_\lambda g) = \|E_\lambda g\|^2$ at all of its points of continuity. Thus, the sequence of vectors $[E_N(\lambda)g]$ converges strongly to $[E_\lambda g]$, from Proposition 1.4 •
The following useful conclusions, which imply each other, can be drawn from the result of Theorem 3.3.
Corollary 3.6. In addition to the assumptions of Theorem 3.3, let $\Omega$ be a real, closed, bounded and isolated set contained in the resolvent set of $A$, and let the ortho-projection $Q_N$ be defined by $Q_N = \int_\Omega dE_N(\lambda)$. Then
(i) $Q_N \xrightarrow{s} 0$ as $N \to \infty$.
(ii) $B_N g = \int_\Omega \lambda \, dE_N(\lambda) g \xrightarrow{s} 0$ as $N \to \infty$.
If in addition $z$ in $\Omega$ is at a positive distance from the spectrum of $A$, then
(iii) For each $z$ in $\Omega$,
$$(z - A)^{-1}g = \int_{R^1 - \Omega} \frac{dE_\lambda g}{(z - \lambda)} = \text{s-}\lim_{N \to \infty} \int_{R^1 - \Omega} \frac{dE_N(\lambda) g}{(z - \lambda)} = \text{s-}\lim_{N \to \infty} (z - A_N + B_N)^{-1}(1 - Q_N)g.$$
The last equality holds also for $z = 0$ in the limit as $z \to 0$.
(iv) The sequence $E'_N(\lambda)$ of the spectral functions of $(A_N - B_N)$ converges strongly to the spectral function $E_\lambda$ of $A$ at all of its points of continuity.
Proof. Proofs for all of the results, (i) to (iv), follow by essentially the same arguments. Since some of the arguments are the same as in Lemma 3.5, they are only outlined. Consider (i):
$$Q_N = E_N(\lambda)\big|_{\partial\Omega} \xrightarrow[N \to \infty]{s} E_\lambda\big|_{\partial\Omega} = 0,$$
from Theorem 3.3 and the fact that $E_\lambda$ is constant on $\Omega$. For (ii), integrating by parts we have,
$$\begin{aligned}
(u, B_N g) &= \int_\Omega \lambda \, d(u, E_N(\lambda)g) = \lambda (u, E_N(\lambda)g)\big|_{\partial\Omega} - \int_\Omega d\lambda \, (u, E_N(\lambda)g) \\
&\xrightarrow[N \to \infty]{} \lambda (u, E_\lambda g)\big|_{\partial\Omega} - \int_\Omega d\lambda \, (u, E_\lambda g) = \int_\Omega \lambda \, d(u, E_\lambda g) = 0,
\end{aligned}$$
from Theorem 3.3 by using the Lebesgue theorem (Theorem 1.3) as in Lemma 3.5. We have also used the fact that $E_\lambda$ is constant on $\Omega$. Further,
$$\|B_N g\|^2 = \int_\Omega \lambda^2 \, d\|E_N(\lambda)g\|^2 = \int_\Omega \lambda^2 \, d(g, E_N(\lambda)g) \xrightarrow[N \to \infty]{} \int_\Omega \lambda^2 \, d(g, E_\lambda g) = 0,$$
as above. It follows that the weak convergence of $B_N$ to zero implies its strong convergence from Proposition 1.4. In case of (iii), the first equality follows from the fact that $\sigma(A)$ is contained in $R^1 - \Omega$ and the domain of integration in Theorem 2.7 is required only to be large enough to include $\sigma(A)$. Strong convergence follows as in (ii). The last equality follows by definition.
For (iv), it follows from (iii) that the sequence $(A_N - B_N)$ converges strongly in the generalized sense to $A$ for all non-real $z$, implying the result from Theorem 3.3 •
Remark 3.2. (i) The equality
$$\int_{R^1 - \Omega} \frac{dE_N(\lambda) g}{(z - \lambda)} = (z - A_N + B_N)^{-1}(1 - Q_N)g,$$
used in Corollary 3.6 (iii) is valid for all $z$ as long as $z$ is in the resolvent set of $A$. As indicated, for $z = 0$, the right side can be evaluated by taking the limit as $z \to 0$. In fact, this equality is an extension of the case for $z = 0$, which is always encountered in the variational methods for the following: Since $A_N(1 - P_N) = 0$, $(A_N)^{-1}$ does not exist (Lemma 2.1) even if $A^{-1}$ is defined as a bounded operator. In this case, $B_N = 0$ and $Q_N = (1 - P_N)$. Although $(A_N)^{-1}$ does not exist for all $g$, it exists on the range of $(1 - Q_N) = P_N$, called the restricted inverse or the inverse of the restriction of $A_N$ to $P_N H$, defined as follows. General solution of the equation $A_N f_N = g_N$, with $g_N = P_N g_N$, can be expressed as $f_N = (u + f'_N)$, where $P_N u = 0$ and $f'_N = \text{s-}\lim_{z \to 0} (z - A_N)^{-1} P_N g_N$. Corollary 3.6 (iii) selects the solution $f_N = f'_N$ which is uniquely defined and shows that it converges strongly to $A^{-1}g$. The restricted inverse is denoted by $f_N = (A_N)^{-1} g_N$. The argument admits an obvious extension to the more general case considered in Corollary 3.6 (iii), where $B_N \ne 0$. However, if $B_N = 0$, the original variational method itself incorporates the adjustment indicated in Corollary 3.6 (iii). In general the computational procedure must be adjusted.
Even if $z$ is not in the resolvent set of $A$, but $A^{-1}$ exists, the restricted inverse still exists, e.g., by taking the limit $z \to 0$ with $\text{Im}(z) \ne 0$, but the convergence result may not hold. However, the results of Corollaries 3.1 to 3.4, together with Remark 3.1, are still easily seen to hold. With this understanding, $z = 0$ will not be isolated unless it has a significant impact on the argument or the results.
(ii) The results of this subsection, from Lemma 3.1 to Corollary 3.6, including the above remark, (i), are valid with the discrete sequence $\{N\}$ replaced with a continuous variable, i.e., instead of $N \to \infty$, with $\varepsilon' \to \varepsilon$. This can be seen either by following the same arguments or by noting that the convergence for all discrete sequences contained in a continuous sequence is sufficient for the convergence on the continuous sequence. For Lemma 3.5, we used discrete subsequences. Such subsequences can be selected from a continuous one, without impacting upon the arguments •
3.II.2. Compact Operators
Among the operators encountered in practice, compact operators are endowed with properties that imply quite strong convergence properties of the variational approximations, as shown below.
Theorem 3.4. Let $z \ne 0$ be in the resolvent set of a compact operator $A$, i.e., Eq. (3.7),
$$(z - A)f = g,$$
has a unique solution $f$ for each given $g$ in a Hilbert space $H$. Then Eq. (3.18-a), Theorem 3.1, i.e.,
$$(z - A_N)f_N = P_N g$$
with $A_N = P_N A P_N$, has a unique solution $f_N$ for sufficiently large $N$, and $\|f_N - f\| \to 0$ as $N \to \infty$. The convergence is uniform with respect to $z$ in each closed bounded set contained in the resolvent set of $A$.
Proof. From Theorem 2.2 with $T_n = S_n = P_N$, we have that $\|A_N - A\| \to 0$ as $N \to \infty$. Consequently,
$$\begin{aligned}
\|f_N - f\| &= \|(z - A_N)^{-1}P_N g - (z - A)^{-1}g\| \\
&\le \|[(z - A_N)^{-1} - (z - A)^{-1}]g\| + \|(z - A_N)^{-1}(P_N - 1)g\| \\
&\le \|(z - A_N)^{-1} - (z - A)^{-1}\| \, \|g\| + \|z^{-1}(P_N - 1)g\|.
\end{aligned}$$
The first term converges to zero uniformly with respect to $z$ in the specified set from Lemma 3.1. The second term converges to zero due to the strong convergence of $P_N$ to the identity operator. Uniformity with respect to $z$ for $z \ne 0$ is obvious •
The result of Theorem 3.4 implies that for sufficiently large basis set, errors in the approximate values have a common bound for all values of $z$ in any closed bounded set, which can be made arbitrarily small.
Remark 3.3. Theorem 3.4 excludes the case $z = 0$, which is never in the resolvent set of a compact operator, unless it is finite dimensional rendering the question of convergence redundant. In some instances the case $z = 0$ can be treated with the result of Theorem 2.9, which can be deduced as a corollary of Theorem 3.4 with $z = 1$ and $A = 0$, reducing
Theorem 2.9 essentially to an inversion of the identity operator by the variational method. If such reduction is not possible, then some alternative methods are required •
The convergence in Theorem 3.4 was deduced by estimating the error $\|f_N - f\|$. A computable estimate of $\|f_N - f\|$ converging to zero is desirable in all cases. Since the error estimate of Theorem 3.4 is expressed in terms of the exact resolvent $(z - A)^{-1}$, it is unsuitable for computation. This approach was needed to conclude the existence and boundedness of $(z - A_N)^{-1}$. With this result established, $\|f_N - f\|$ can equally well be expressed in terms of $(z - A_N)^{-1}$, which makes it better suited for computation. Still a bound on $\|f_N - f\|$ may not be conveniently computable for a general compact operator, if at all. However, for the Hilbert-Schmidt operators, which are frequently encountered in applications, convenient error estimates are feasible, which are obtained below.
Proposition 3.2. (Singh and Turchetti, 1977) With the assumptions of Theorem 3.4, we have
$$\|f_N - f\| \le |z|^{-1}[1 - \eta_N]^{-1} \, \|(1 - P_N)(g + A f_N)\| \xrightarrow[N \to \infty]{} 0,$$
where
$$\eta_N = \|T_N(z)\| = \|(z - A_N)^{-1}(A - A_N)\|.$$
For a Hilbert-Schmidt operator $A$, the result holds with the operator norms replaced with the Hilbert-Schmidt norms.
Proof. Straightforward manipulations with Eqs. (3.7) and (3.18-a) yield
$$(f - f_N) = (z - A)^{-1}[(1 - P_N)(g + A f_N)].$$
Since $\|(z - A_N)^{-1} - (z - A)^{-1}\| \to 0$ from Lemma 3.1, $(z - A_N)^{-1}$ is bounded for sufficiently large $N$. This, together with $\|(A - A_N)\| \to 0$, implies that $\|T_N(z)\| \to 0$. Hence, the right side of
$$(z - A)^{-1} = [1 - T_N(z)]^{-1}(z - A_N)^{-1}$$
and $(1 - \eta_N)^{-1}$ are well-defined for sufficiently large $N$. Consequently,
$$\begin{aligned}
\|f - f_N\| &= \|[1 - T_N(z)]^{-1}(z - A_N)^{-1}[(1 - P_N)(g + A f_N)]\| \\
&\le \|[1 - T_N(z)]^{-1}\| \, \|(z - A_N)^{-1}[(1 - P_N)(g + A f_N)]\| \\
&\le [1 - \eta_N]^{-1} \, \|z^{-1}[(1 - P_N)(g + A f_N)]\| \xrightarrow[N \to \infty]{} 0.
\end{aligned}$$
If $A$ is a Hilbert-Schmidt operator, it was shown in Lemma 2.5, that $\|(A - A_N)\|_2 \to 0$. All of the above steps then go through with the operator norm replaced with the Hilbert-Schmidt norm, which is an upper bound to the operator norm. With this property, it is clear that the estimate is valid with the operator norm replaced with the Hilbert-Schmidt norm, which also converges to zero •
Now, we consider the variational approximations to the eigenvalues and the eigenprojections of the compact operators.
Corollary 3.7. Let $\lambda$ be an $m$-fold degenerate, isolated eigenvalue of a compact operator $A$. Then for sufficiently large $N$, $A_N$ has precisely $m$ eigenvalues $\lambda_{jN}$, $j = 1, 2, ..., m$, counting multiplicities, in a small neighborhood of $\lambda$, and each $\lambda_{jN} \to \lambda$ as $N \to \infty$.
Proof. As in Theorem 3.4, $\|A_N - A\| \to 0$ as $N \to \infty$. The result now follows from Lemma 3.2(ii) •
As is the case with the inhomogeneous equation, computable error bounds on the eigenvalues converging to zero can be quite useful in numerical approximations. Such bounds are obtained in Proposition 3.3 below, for an operator of the Hilbert-Schmidt class, which are somewhat more involved than their counterpart for the inhomogeneous equation, but can be evaluated without an extensive computational effort. Bounds on the eigenprojections are determined essentially in the process of the proof of Proposition 3.3.
Proposition 3.3. (Singh and Turchetti, 1977) Let $p$ and $p_N$ be the eigenprojections of a compact operator $A$ and $A_N = P_N A P_N$ corresponding to the eigenvalues $\lambda$ and $\lambda_N$, respectively. Then,
$$|\lambda - \lambda_N| \le \frac{1}{2}\alpha_N \left[ 1 - \left( 1 - \frac{4\gamma_N}{\alpha_N^2} \right)^{1/2} \right] = \delta_N \xrightarrow[N \to \infty]{} 0,$$
where
$$\gamma_N = \|p_N A (1 - p_N)\|^2 \xrightarrow[N \to \infty]{} 0, \qquad \alpha_N = d_N - \|(1 - p_N)(A - A_N)(1 - p_N)\|,$$
and $d_N = \|B_N^{-1}\|^{-1}$ with $B_N = [\lambda_N - (1 - p_N)A_N(1 - p_N)]$. If $A$ is a Hilbert-Schmidt operator, then the result is valid with the Hilbert-Schmidt norm.
Proof. It follows from the eigenvalue equation $(\lambda - A)p = 0$ that
$$0 = p_N(\lambda - A)[p_N + (1 - p_N)]p = (\lambda - p_N A p_N)p_N p - p_N A(1 - p_N)p = (\lambda - \lambda_N)p_N p - p_N A(1 - p_N)p.$$
Since $\|p_N - p\| \to 0$ from Lemma 3.2(i), $p_N p \ne 0$ for sufficiently large $N$. Hence,
$$|\lambda - \lambda_N| \le \|p_N A(1 - p_N)p\| / \|p_N p\| \le \|p_N A(1 - p_N)\| \, \|(1 - p_N)p\| / \|p_N p\|. \tag{3.25}$$
It is straightforward to deduce from the eigenvalue equation that $(1 - p_N)p$ is the solution of
$$[\lambda - (1 - p_N)A(1 - p_N)](1 - p_N)p = (1 - p_N)A p_N p. \tag{3.26}$$
Let
$$T_N = [(\lambda - \lambda_N) - (1 - p_N)(A - A_N)(1 - p_N)]B_N^{-1}.$$
From Corollary 3.7 and Theorem 2.2, the first factor in $T_N$ converges uniformly to zero and $B_N = [\lambda_N - (1 - p_N)A_N(1 - p_N)]$ converges uniformly to the operator $[\lambda - (1 - p)A(1 - p)]$, which is invertible owing to the fact that $(1 - p)A(1 - p)$ is compact with spectrum in the complement of the one point set $\lambda$. Consequently, $[1 + T_N]^{-1}$ converges uniformly to the identity operator. It follows from Lemma 3.1 and Eq. (3.26) that
$$(1 - p_N)p = B_N^{-1}[1 + T_N]^{-1}(1 - p_N)A p_N p. \tag{3.27}$$
Further,
$$\|(1 + T_N)^{-1}\| \le (1 - \|T_N\|)^{-1} \le \{1 - d_N^{-1}[\,|\lambda - \lambda_N| + \|(1 - p_N)(A - A_N)(1 - p_N)\|\,]\}^{-1} = D_N.$$
It follows that
$$\|(1 - p_N)p\| \le d_N^{-1} D_N \, \|(1 - p_N)A p_N\| \, \|p_N p\|.$$
Substitution in Eq. (3.25) results in
$$|\lambda - \lambda_N| \le \frac{\gamma_N}{(\alpha_N - |\lambda - \lambda_N|)}, \tag{3.28}$$
with $\alpha_N$ and $\gamma_N$ as defined in the statement. The result stated is the solution of Eq. (3.28). For the same reasons as with the inhomogeneous equation, the result is valid with the operator norm replaced with the Hilbert-Schmidt norm whenever it exists. We have assumed that $N$ is sufficiently large to ensure that
$$|\lambda - \lambda_N| + \|(1 - p_N)(A - A_N)(1 - p_N)\| < d_N \; •$$
The operator $B_N^{-1} = [\lambda_N - (1 - p_N)A_N(1 - p_N)]^{-1} = [\lambda_N - C_N]^{-1}$ is a bounded matrix for all $N$, which converges to a bounded operator in the limit, but not to a compact operator. Therefore, calculation of $d_N$ requires evaluation of the norm of a matrix. For a self-adjoint operator, this calculation can be facilitated by observing that, in that case, $d_N^{-1}$ is the reciprocal of the smallest eigenvalue, in absolute value, of $[\lambda_N - C_N]$. For the non-self-adjoint operators,
$$\|[\lambda_N - C_N]^{-1}\alpha\|^2 = ([\lambda_N - C_N]^{-1}\alpha, [\lambda_N - C_N]^{-1}\alpha) \le \|\alpha\| \, \|[\lambda_N^* - \hat{C}_N]^{-1}[\lambda_N - C_N]^{-1}\alpha\|,$$
i.e.,
$$\|[\lambda_N - C_N]^{-1}\|^2 \le \|[\,|\lambda_N|^2 - (\lambda_N \hat{C}_N + \lambda_N^* C_N) + \hat{C}_N C_N]^{-1}\|.$$
Thus, this is reduced to the same as for a self-adjoint matrix. Further, since the result of Proposition 3.3 depends essentially on the fact that $\gamma_N \to 0$, it remains unaffected by a crude estimate of $d_N$, which can be exploited for computational convenience.
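The quantities entering Proposition 3.3 can be evaluated with a few matrix operations. The sketch below is illustrative and rests on assumptions: a real symmetric test matrix stands in for a Hilbert-Schmidt operator, $P_N$ projects on the first $N$ coordinates, the largest eigenvalue is the one tracked, and $d_N$ is taken as $\|B_N^{-1}\|^{-1}$. The operator norm is used throughout.

```python
import numpy as np

n, N = 60, 10
k = np.arange(1, n + 1)
A = np.diag(1.0 / k) + 0.1 * np.outer(1.0 / k, 1.0 / k)   # hypothetical test operator
I = np.eye(n)
P = np.diag((k <= N).astype(float))                       # P_N
A_N = P @ A @ P

lam = np.linalg.eigvalsh(A)[-1]                           # eigenvalue of A being approximated
w, U = np.linalg.eigh(A[:N, :N])
lam_N = w[-1]                                             # variational eigenvalue
psi_N = np.zeros(n)
psi_N[:N] = U[:, -1]
p_N = np.outer(psi_N, psi_N)                              # eigenprojection of A_N for lam_N
q_N = I - p_N                                             # 1 - p_N

gamma_N = np.linalg.norm(p_N @ A @ q_N, 2) ** 2
B_N = lam_N * I - q_N @ A_N @ q_N
d_N = 1.0 / np.linalg.norm(np.linalg.inv(B_N), 2)
alpha_N = d_N - np.linalg.norm(q_N @ (A - A_N) @ q_N, 2)
delta_N = 0.5 * alpha_N * (1.0 - np.sqrt(1.0 - 4.0 * gamma_N / alpha_N**2))

print(f"gamma_N = {gamma_N:.3e}, alpha_N = {alpha_N:.3e}")
print(f"|lam - lam_N| = {abs(lam - lam_N):.3e} <= delta_N = {delta_N:.3e}")
```

Only $\lambda_N$, $p_N$ and matrix norms enter $\delta_N$, so the bound is computable without knowledge of the exact eigenvalue; `lam` is computed here solely for comparison.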
3.II.3. Bounded Operators
Next we consider the bounded operators that are not compact. In general, uniform convergence of the variational approximations to such operators cannot be expected. With the loss of uniform convergence, even the existence of the solutions of Eq. (3.18-a) is lost and so is the convergence of the eigenvalues and the corresponding eigenprojections. For example, variational approximation $P_N$ to the identity operator is not invertible for an arbitrary vector and has two eigenvalues, zero and one, for each $N$, while the limit operator has only one eigenvalue. Thus, the loss of compactness weakens the results. If these results are not satisfactory for a particular application, they are still useful as they can be made the bases for further improvements exploiting other properties that may be available. In case of $P_N$, its restricted inverse (Remark 3.2(i)) is also $P_N$, which is satisfactory for most of the applications. We obtain parallel results below by weakening the uniform convergence of the variational approximations to their limit operators.
Theorem 3.5. Let $z$ be in the resolvent set of a bounded operator $A$, and let $\{(z - A_N)^{-1}\}$ be uniformly bounded, where $A_N = P_N A P_N$. Then Eq. (3.18-a),
$$(z - A_N)f_N = P_N g$$
has a unique solution $f_N$ for sufficiently large $N$, and $\|f_N - f\| \to 0$ as $N \to \infty$, where $f = (z - A)^{-1}g$, i.e., the solution of Eq. (3.7). If $\{(z - A_N)^{-1}\}$ has a common bound for all sufficiently large values of $N$ with respect to $z$ in a closed bounded set, the convergence is uniform with respect to $z$ in the set.
Proof. For a fixed vector $u$ in $H$ we have
$$\begin{aligned}
\|(P_N A P_N - A)u\| &= \|[P_N A(P_N - 1) + (P_N - 1)A]u\| \\
&\le \|P_N A(P_N - 1)u\| + \|(P_N - 1)Au\| \\
&\le \|P_N A\| \, \|(P_N - 1)u\| + \|(P_N - 1)Au\| \xrightarrow[N \to \infty]{} 0,
\end{aligned}$$
since $\|P_N A\| \le \|A\|$, $u$ and $Au$ are fixed vectors and $P_N$ converges strongly to the identity operator. The convergence result $\|f_N - f\| \to 0$ now follows from the estimate
$$\begin{aligned}
\|f_N - f\| &= \|(z - A_N)^{-1}P_N g - (z - A)^{-1}g\| \\
&\le \|[(z - A_N)^{-1} - (z - A)^{-1}]g\| + \|(z - A_N)^{-1}(P_N - 1)g\| \\
&\le \|[(z - A_N)^{-1}(A_N - A)(z - A)^{-1}]g\| + \|(z - A_N)^{-1}\| \, \|(P_N - 1)g\| \\
&\le \|(z - A_N)^{-1}\| \left[ \|(A_N - A)(z - A)^{-1}g\| + \|(P_N - 1)g\| \right].
\end{aligned}$$
We have used the second resolvent equation (Eq. (2.1)) and the uniform boundedness of $\{(z - A_N)^{-1}\}$. Equi-continuity of the left side is established by standard manipulations and the estimates using the first resolvent equation, implying the uniformity of the convergence with respect to $z$ as stated, by the Arzela-Ascoli theorem (Lemma 1.3) •
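The convergence asserted in Theorem 3.5 can be illustrated on a finite surrogate of a bounded, non-compact operator. The following sketch makes several assumptions that are not in the text: a large tridiagonal matrix stands in for the operator, $P_N$ projects on the first $N$ coordinate vectors, and the uniform bound on $(z - A_N)^{-1}$ is secured by taking $|z| > \|A\|$, so that the Neumann series (Lemma 2.3) yields $\|(z - A_N)^{-1}\| \le [\,|z| - \|A\|\,]^{-1}$.

```python
import numpy as np

def variational_solution(A, g, z, N):
    """Solve (z - A_N) f_N = P_N g on the span of the first N coordinate
    vectors and zero-pad the result to the full space."""
    f_N = np.zeros(A.shape[0])
    f_N[:N] = np.linalg.solve(z * np.eye(N) - A[:N, :N], g[:N])
    return f_N

n = 400                                            # size of the finite surrogate (assumption)
A = 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))       # tridiagonal, ||A|| < 1
g = 0.5 ** np.arange(n)
z = 1.5                                            # |z| > ||A||
norm_A = np.linalg.norm(A, 2)
f = np.linalg.solve(z * np.eye(n) - A, g)          # reference solution of (z - A) f = g
neumann_bound = 1.0 / (abs(z) - norm_A)

errors, resolvent_norms = [], []
for N in (5, 10, 20, 40):
    f_N = variational_solution(A, g, z, N)
    errors.append(np.linalg.norm(f_N - f))
    # Norm of (z - A_N)^(-1) restricted to P_N H.
    resolvent_norms.append(np.linalg.norm(np.linalg.inv(z * np.eye(N) - A[:N, :N]), 2))
    print(f"N={N:3d}  ||f_N - f||={errors[-1]:.3e}  ||(z - A_N)^-1||={resolvent_norms[-1]:.3f}")
```

The resolvent norms remain below the common bound for every $N$, while the error $\|f_N - f\|$ decreases.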
While Theorem 3.5 establishes the convergence of the variational solutions to the inhomogeneous equations, the condition $\|(z - A_N)^{-1}\| \le$ const. is quite stringent. In some cases, this condition is satisfied for $z$ in a set depending on the properties of $A$ as shown below, which can be used in conjunction with Theorem 3.5 to determine the convergence properties.
Proposition 3.4. Let $A$ be a bounded operator. Then $\{(z - A_N)^{-1}\}$ with $A_N = P_N A P_N$ is uniformly bounded, with respect to $N$ and $z$ in a closed bounded set in the exterior of the circle of radius $\|A\|$ in the complex plane.
Proof. Since $\|P_N A P_N\| \le \|A\|$, $(z - A_N)^{-1}$ is given by its Neumann expansion (Lemma 2.3), which also shows that $\|(z - A_N)^{-1}\| \le |z^{-1}| \, [1 - \|z^{-1}A\|]^{-1}$. Uniform boundedness with respect to $z$ follows from the fact that the right side decreases monotonically with increasing $|z|$. Alternatively, the equi-continuity of the left side is straightforward to establish and the result follows from the Arzela-Ascoli theorem (Lemma 1.3) •
Under the condition of Proposition 3.4, the Neumann expansion of $(z - A)^{-1}$ can be used to obtain the approximate solutions of Eq. (3.7). However, the variational approximations are more accurate and usually easier to evaluate.
For self-adjoint operators the set of convergence can be enlarged further. For this case, the spectrum of $A_N = P_N A P_N$ is contained in its numerical range, which is contained in the numerical range of $A$, i.e., the set of points $(u, Au)$ for all normalized $u$. The spectrum of $A_N$ also contains an inconsequential point, zero. Consequently, we have
Proposition 3.5. Let $A$ be a bounded, self-adjoint operator. Then $\{(z - A_N)^{-1}\}$ with $A_N = P_N A P_N$ is uniformly bounded, with respect to $N$ and $z$ in a closed bounded set in the exterior of the real interval $[-\|A\|; \|A\|]$ in the complex plane.
Proof. Since $\|P_N A P_N\| \le \|A\|$, the numerical range and hence the spectrum of the self-adjoint operator $A_N$ is contained in $[-\|A\|; \|A\|]$. Uniform boundedness follows from Corollary 2.4 with respect to both, $N$ and $z$ •
While Proposition 3.5 enlarges the domain of convergence considerably in comparison with the non-self-adjoint bounded operators, it still does not cover the entire resolvent set. For example, the identity operator has zero in its resolvent set but the strongly convergent variational approximation $P_N$ does not. A need for approximating the solutions for an arbitrary $z$ in the resolvent set of $A$ does arise in applications where the singularities of $\{(z - A_N)^{-1}\}$ are known to be problematic. In the following, we remedy this situation to obtain a sequence of approximations converging to the exact solution involving a self-adjoint operator $A$. For this purpose, it is sufficient to consider real $z$ for if $\text{Im}(z) \ne 0$, then the desired convergence follows from Theorem 3.5, needing no modification. We give two proofs of Lemma 3.6; one based on Corollary 3.6 for its brevity and the alternative, for its transparency and instructive value.
Lemma 3.6. (Singh, 1977) With $A_N = P_N A P_N$, let $B_N = A_N Q_N$, with $Q_N$ being the
orthoprojection on the eigenspace of $A_N$ corresponding to the intersection of $\sigma(A_N)$ and the interval $[z - d; z + d]$, where $[z - (d + 2\delta); z + (d + 2\delta)]$ is in the resolvent set of a bounded self-adjoint operator $A$ with $d, \delta > 0$. Then $\{B_N\}$ is a sequence of uniformly bounded self-adjoint operators converging strongly to zero.
Proof. 1. Corollary 3.6 (ii).
Proof. 2. Since $B_N$ is self-adjoint with its spectrum contained in $[z - d; z + d]$, $\|B_N\| \le \max[\,|z - d|; |z + d|\,]$. Let $c$ be a positively oriented closed circular contour centered at $z$ and radius $(d + \delta)$. For fixed $u$, $\upsilon$ in $H$ and an arbitrary $\eta$ in $c$, we have
$$\begin{aligned}
|(u, (B_N - A)(\eta - B_N)^{-1}B_N\upsilon)| &= |(u, (A_N - A)(\eta - B_N)^{-1}B_N\upsilon)|, \quad \text{for } B_N^2 = A_N B_N, \\
&\le \|(A_N - A)u\| \, \|(\eta - B_N)^{-1}\| \, \|B_N\upsilon\| \\
&\le \|(A_N - A)u\| \, \delta^{-1} \, \|B_N\upsilon\| \xrightarrow[N \to \infty]{} 0,
\end{aligned}$$
since $A_N$ converges strongly to $A$, $\|B_N\upsilon\| \le$ const. $\|\upsilon\|$ and $\|(\eta - B_N)^{-1}\| \le \delta^{-1}$ from Corollary 2.4. Hence
$$\begin{aligned}
|(u, [(\eta - B_N)^{-1} - (\eta - A)^{-1}]B_N\upsilon)| &= |(u, (\eta - A)^{-1}(B_N - A)(\eta - B_N)^{-1}B_N\upsilon)| \\
&= |((\eta^* - A)^{-1}u, (B_N - A)(\eta - B_N)^{-1}B_N\upsilon)| \xrightarrow[N \to \infty]{} 0.
\end{aligned}$$
The convergence can be seen to be uniform with respect to $\eta$ in $c$ by checking that the left side is a sequence of equi-continuous functions of $\eta$ by the standard estimates. Consequently,
$$|I| = \left| \frac{1}{2\pi i} \int_c d\eta \, (u, [(\eta - B_N)^{-1} - (\eta - A)^{-1}]B_N\upsilon) \right| \xrightarrow[N \to \infty]{} 0.$$
Since $(\eta - A)^{-1}$ is analytic in the region enclosed by $c$, the second integral on the right side is equal to zero. This together with Theorem 2.8 implies that
$$|I| = |(u, Q_N B_N\upsilon)| = |(u, B_N\upsilon)| \xrightarrow[N \to \infty]{} 0,$$
i.e., $B_N$ converges weakly to zero, which can be strengthened to strong convergence as follows:
$$\begin{aligned}
\|B_N u\|^2 &= (u, B_N B_N u) = (u, A_N B_N u) = (Au, B_N u) + ((A_N - A)u, B_N u) \\
&\le |(Au, B_N u)| + \|(A_N - A)u\| \, \|B_N u\| \xrightarrow[N \to \infty]{} 0,
\end{aligned}$$
for the first term converges to zero from the weak convergence of $B_N$ to zero and the second, from the strong convergence of $A_N$ to $A$. Thus, $\|B_N u\| \to 0$ as $N \to \infty$ •
Theorem 3.6. (Singh, 1977) With the assumptions as in Lemma 3.6, let $\tilde{A}_N = (A_N - B_N)$. Then for each $z \ne 0$ in the resolvent set of $A$, $(z - \tilde{A}_N)^{-1}$ converges strongly to $(z - A)^{-1}$. Further, for each $z$ in the resolvent set of $A$, including zero, $(z - \tilde{A}_N)^{-1}(1 - Q_N)$ converges strongly to $(z - A)^{-1}$, where $Q_N$ is the projection on the null space of $(z - \tilde{A}_N)$ as defined in Lemma 3.6.
Proof. This is just a statement of Corollary 3.6 (iii). Alternatively, from Lemma 3.6, $\tilde{A}_N$ converges strongly to $A$. Since $\tilde{A}_N$ is self-adjoint with its spectrum in the exterior of $[z - d; z + d]$, $(z - \tilde{A}_N)^{-1}$, for $z \ne 0$, is uniformly bounded by $d^{-1}$ from Corollary 2.4. First part now follows from Lemma 3.3. For $z \ne 0$, the second part follows from the additional result that $Q_N$ converges strongly to zero. For $z = 0$, if it is in the resolvent set of $A$, essentially the same argument as in Remark 3.2 (i) implies the result •
In view of the result of Theorem 3.6, if any singularities are encountered in the original variational method for some $z \ne 0$ in the resolvent set of $A$, it can be adjusted to obtain the sequences converging to $(z - A)^{-1}g$. The approximating sequence in this case is given by $(z - \tilde{A}_N)^{-1}g_N$ with $g_N \to g$, as can be seen by the same argument as in Theorem 3.5, with $(g_N - g)$ replacing $(P_N - 1)g$. The reduction stated in Corollary 3.8 below is valid for all $z$.
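The adjustment described above amounts to discarding, from the eigen-decomposition of $A_N$ on $P_N H$, the eigenvectors whose eigenvalues lie in $[z - d; z + d]$. The following sketch is a constructed illustration and not the author's example: the operator, the basis sets and the helper names `reduced_variational_solve` and `basis` are hypothetical, and $g_N$ is taken to be $(1 - Q_N)P_N g$. The operator has spectrum $\{-1, +1\}$, so $z = 0$ is in its resolvent set, while each basis set contains one vector mixing the two eigenspaces, which produces a spurious zero eigenvalue of $A_N$.

```python
import numpy as np

def reduced_variational_solve(A, V, g, z, d):
    """Approximate (z - A)^(-1) g on the span of the orthonormal columns of V,
    dropping eigenvectors of A_N whose eigenvalues lie within d of z."""
    a = V.T @ A @ V                        # matrix of A_N on P_N H
    w, U = np.linalg.eigh(a)
    keep = np.abs(w - z) > d               # retained part, the range of (1 - Q_N) P_N
    coeff = U.T @ (V.T @ g)
    f_N = V @ (U[:, keep] @ (coeff[keep] / (z - w[keep])))
    return f_N, int(np.count_nonzero(~keep)), w

K = 40
signs = np.tile([1.0, -1.0], K)
A = np.diag(signs)                         # self-adjoint, spectrum {-1, +1}
g = 1.0 / np.arange(1, 2 * K + 1)
z, d = 0.0, 0.5                            # z = 0 lies in the spectral gap of A
f_exact = g / (z - signs)                  # (z - A)^(-1) g

def basis(N):
    """First 2(N-1) coordinate vectors plus one vector mixing the next pair."""
    m = 2 * (N - 1)
    V = np.zeros((2 * K, m + 1))
    V[:m, :m] = np.eye(m)
    V[m, m] = V[m + 1, m] = 1.0 / np.sqrt(2.0)
    return V

rows = []
for N in (5, 10, 20, 40):
    f_N, dropped, w = reduced_variational_solve(A, basis(N), g, z, d)
    rows.append((N, dropped, np.min(np.abs(w - z)), np.linalg.norm(f_N - f_exact)))
    print(f"N={N:3d}  dropped={dropped}  min|w - z|={rows[-1][2]:.1e}  error={rows[-1][3]:.3e}")
```

For every $N$ the unreduced matrix $(z - A_N)$ is singular on $P_N H$, yet after one offending vector is dropped the reduced solution converges to $(z - A)^{-1}g$.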
Corollary 3.8. (Reduced variational method) In addition to the assumptions of Theorem 3.6, let $g_N = (1 - Q_N) P_N g$. Then $(z - \tilde A_N)^{-1} g_N$ converges strongly to $(z - A)^{-1} g$ as $N \to \infty$.
Proof. From Corollary 3.6 (i), we have that $g_N \to g$ strongly as $N \to \infty$. The result follows from Theorem 3.6 as in Theorem 3.5 •
The method of Corollary 3.8 entails that the offending vectors can be dropped from the basis set. This reduces the rank of the matrix together with the vector on the right side. Alternatively, they can be replaced with other vectors. However, the offending vectors cannot be dropped permanently; they can be placed at the end of the set. If the singularity persists, then the offending vectors are dropped completely by default. Thus, the computational effort required to implement the modification of Corollary 3.8 is minuscule. This result is somewhat counter-intuitive, as a reduction in the size of the basis set gives an impression that some information may be excluded. As seen above and indicated in Remark 3.2, this reduction excludes only the spurious singularities of the approximating sequence, and the restricted solution obtained by this reduction converges strongly to the desired solution.
It is clear that the spurious singularities that are problematic in approximating the elements of $(z - A)^{-1}$ create no difficulties in approximating the elements of the spectral function, as the residues at such poles converge to zero from Corollary 3.6 (iii).
The results for the self-adjoint operators are stronger than for the non-selfadjoint, to the extent that the convergent variational approximations for selfadjoint operators can be obtained with the reduced variational method (Corollary 3.8), if need be, for all $z$ in the resolvent set of $A$. The convergence can also be concluded to be uniform with respect to $z$ in an appropriate set. The proof of Theorem 3.6 does not go through with the non-selfadjoint operators, for the result of Corollary 2.4 is no longer valid, as can be seen from the matrix $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and
$z = 1$. However, this reduction can be exploited to obtain convergent variational approximations for non-selfadjoint operators also, as follows.
Remark 3.4. Let $T$ be a bounded operator with $z$ in its resolvent set. Then $(z - T)^{-1}$ and $(z^* - \hat T)^{-1}$ exist as bounded operators. Furthermore,
\[ (z - T)^{-1} (z^* - \hat T)^{-1} = (|z|^2 - z\hat T - z^* T + \hat T T)^{-1} \]
is bounded, positive definite. If $(z - T_N)^{-1}$ does not exist for a value of $z$, then the result of Theorem 3.6 can be exploited by expressing Eq. (3.7) as
\[ (z^* - \hat T)(z - T) f = (|z|^2 - z\hat T - z^* T + \hat T T) f = (z^* - \hat T) g. \tag{3.29} \]
While useful in some situations, solving Eq. (3.29) by the variational method requires the matrix elements of $\hat T T$, which at times is inconvenient. The following alternative method is computationally more convenient. With $T$ as above, let
\[ A = (z\hat T + z^* T - \hat T T) \quad \text{and} \quad A_N = (z\hat T_N + z^* T_N - \hat T_N T_N). \]
The sequence of self-adjoint operators $A_N$ converges strongly to $A$. If $(|z|^2 - A_N)^{-1}$ does not exist, one can construct a sequence of bounded operators $(|z|^2 - \tilde A_N)^{-1}$ by the method of Lemma 3.6 and Theorem 3.6. In fact, the present definition of $A_N$ can be substituted in Lemma 3.6 and Theorem 3.6 and $|z|^2$ can replace $z$ without having to alter any arguments. Then solve
\[ (|z|^2 - \tilde A_N) f_N = (z^* - \hat T_N) P_N g. \tag{3.30} \]
Since $(|z|^2 - \tilde A_N)^{-1}$ and $\hat T_N$ converge strongly to $(|z|^2 - A)^{-1}$ and $\hat T$, respectively, $f_N$ converges to $(z - T)^{-1} g$. As indicated for the selfadjoint operators, this construction is not needed. The result shows that if a singularity is encountered, the offending vectors can be excluded at that stage of computation for non-selfadjoint operators also •
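The rewriting used in Eq. (3.29) can be checked on a small matrix. The following is a minimal sketch, assuming the 2×2 nilpotent matrix mentioned above as $T$, the value $z = 1$, and an arbitrarily chosen right side $g$; the helper names (`solve`, `matmul`, `matvec`, `transpose`) are hypothetical and exact rational arithmetic is used only for convenience.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matvec(a, v):
    """Matrix times vector."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


def transpose(a):
    """Transpose, which is the adjoint for a real matrix."""
    return [list(row) for row in zip(*a)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination in exact fractions.
    Assumes the matrix is non-singular (no error handling)."""
    n = len(b)
    m = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


# Illustrative data (assumptions): real z and real T.
z = Fraction(1)
T = [[Fraction(0), Fraction(1)], [Fraction(0), Fraction(0)]]
g = [Fraction(1), Fraction(2)]

z_minus_T = [[(z if i == j else 0) - T[i][j] for j in range(2)] for i in range(2)]
z_minus_T_adj = transpose(z_minus_T)  # plays the role of (z* - T^)

# Direct solution of (z - T) f = g.
f_direct = solve(z_minus_T, g)

# Eq. (3.29): (z* - T^)(z - T) f = (z* - T^) g, with a self-adjoint matrix on the left.
normal_matrix = matmul(z_minus_T_adj, z_minus_T)
f_normal = solve(normal_matrix, matvec(z_minus_T_adj, g))
```

Both routes give the same vector, and `normal_matrix` is symmetric, which is the property that allows the self-adjoint results to be applied.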
3.II.4. Semi-Bounded Operators
As seen above, with loss of each nice property of the operator, the convergence results weaken and less information can be extracted, as should be expected. In case of the unbounded operators, Eqs. (3.17-a, 3.17-b) have no meaning for an arbitrary basis $\{\phi_n\}$ in $H$. This can be corrected by restricting the basis to be contained in the domain of $A$. The projection $P_N$ defined with this restriction still converges strongly to the identity operator on $H$, since $D(A)$ is dense in $H$. However, it cannot be guaranteed that $A_N = P_N A P_N$ converges to $A$ even weakly, disabling the above proofs. In the present subsection, we show that with $\{\phi_n\}$ in $D(A)$, the case of a semi-bounded operator reduces to the case of a bounded operator. We assume $A$ to be positive definite. Other cases can be reduced to this by a suitable combination of sign change and addition of a constant. Fully unbounded operators are considered in the next subsection.
The method considered below is based on the Friedrichs construction described in sec. 2.III., Remark 2.5. In applied literature, particularly in the studies related to the variational methods, the method at times is referred to as the energy method (Mikhlin 1964). In case of the self-adjoint operators, the technicalities of the Friedrichs construction are somewhat inconsequential but relevant. In cases of non-selfadjoint but symmetric and for sectorial
operators, the construction plays a crucial role. In these cases, the operator has multiple extensions. Among them, Friedrichs’ extension is a uniquely defined extension, minimal in some sense, which is usually the relevant operator in applications. Furthermore, the variational methods and Friedrichs’ extension are naturally compatible (Singh, 1976, 1977). In any case, the variational approximations to the operators will be shown to converge to their Friedrichs’ extensions. With this understanding, we consider below the case of a self-adjoint operator. Sectorial operators will be considered later.
Consider the set of algebraic equations, Eq. (3.17-a):
\[ \sum_{m=1}^{N} \alpha_m (\phi_n, [z - A]\phi_m) = (\phi_n, g), \tag{3.31-a} \]
which is equivalent to
\[ (z - A_N) f_N = P_N g, \tag{3.31-b} \]
as in Theorem 3.1, with $f_N = \sum_{n=1}^{N} \alpha_n \phi_n$ (Eq. (3.16-a)). We have
Theorem 3.7. Let $A$ be a positive definite self-adjoint operator, i.e., $A \ge \kappa' > 0$; let $\{\phi_n\}$ contained in $D(A)$ be a basis in a Hilbert space $H$; and let $z$ be in the resolvent set of $A$.
(i) If $z$ is at a positive distance independent of $N$, from the spectrum of $A_N = P_N A P_N$ for each $N$ greater than some integer, then $f_N = \sum_{n=1}^{N} \alpha_n \phi_n$ converges strongly to $f = (z - A)^{-1} g$ in $H$, i.e., $\|f_N - f\| \to 0$ as $N \to \infty$. If $\{(z - A_N)^{-1}\}$ has a bound independent of $N$ for all $z$ in a closed bounded set contained in the resolvent set of $A$, then the convergence is uniform with respect to $z$ on this set.
(ii) (Reduced variational method) If $A_N$ has eigenvalues in a small neighborhood of $z$, then (a) let $Q_N$ be the orthoprojection on the eigenspace of $A_N$ corresponding to all of its eigenvalues in a neighborhood $\Omega$ of $z$, with $\Omega$ being at a positive distance from the spectrum of $A$; (b) let $\tilde A_N$ be the restriction of $A_N$ to $(1 - Q_N)H = Q'_N H$, i.e., $\tilde A_N = A_N - Q_N A_N = Q'_N A_N Q'_N$; (c) let $\tilde g_N = Q'_N P_N g$ and (d) let $f'_N = \sum_{n=1}^{N'} \alpha_n \phi'_n = (z - \tilde A_N)^{-1} \tilde g_N$ (Corollary 3.8); otherwise let $f'_N = f_N$. Then $\|f'_N - f\| \to 0$ as $N \to \infty$.
If $\{(z - \tilde A_N)^{-1}\}$ has a bound independent of $N$ for all $z$ in a closed bounded set contained in the resolvent set of $A$, then the convergence is uniform with respect to $z$ in this set.
Proof. Let $H_+$ be the completion of the domain $D(A)$ of $A$ with respect to the scalar product $(u, \upsilon)_+ = (u, A\upsilon)$, with the norm denoted by $\|\cdot\|_+$ (sec. 2.III.). Let $B$, $B_+$ be as in Remark 2.5, i.e., $B = A^{-1}$ and $B_+$ is the restriction of $B$ to $H_+$. Then Eq. (3.31-a) is equivalent to
\[ \sum_{m=1}^{N} \alpha_m (\phi_n, [1 - zB]\phi_m)_+ = -(\phi_n, B g)_+, \tag{3.32-a} \]
which is equivalent to
\[ (1 - zB_N^+) f_N = -P_N^+ B g, \tag{3.32-b} \]
where $B_N^+ = P_N^+ B P_N^+$ with $P_N^+$ being the orthoprojection on the $N$-dimensional subspace of $H_+$ spanned by $\{\phi_n\}$, as in Theorem 3.1. Since $B$ is bounded and $P_N^+$ converges strongly to the identity operator in $H_+$, $B_N^+$ converges strongly to $B_+$ in $H_+$, i.e., for each $u$ in $H_+$, $\|(B_N^+ - B_+)u\|_+ \to 0$ as $N \to \infty$, as in Theorem 3.5.
(i) It follows from the equivalence of Eq. (3.31-a, b) and Eq. (3.32-a, b) that $z \neq 0$ is at a positive distance from the spectra of $A_N$ if and only if $z^{-1}$ is at a positive distance from the spectra of $B_N^+$. Since this distance is independent of $N$ and $B_N^+$ is self-adjoint, $(1 - zB_N^+)^{-1}$ is a sequence of uniformly bounded operators on $H_+$ from Corollary 2.4. For $z = 0$, $(1 - zB_N^+)^{-1}$ is clearly bounded by 1. Thus for each $z$ at a positive distance from the spectra of $A_N = P_N A P_N$, $(1 - zB_N^+)^{-1}$ converges to $(1 - zB_+)^{-1}$ strongly on $H_+$, from Lemma 3.3. Also, $\|P_N^+ B g - B g\|_+ \to 0$ as $N \to \infty$ for each $g$ in $H$, since $B g$ is in $H_+$ (Remark 2.5). It follows, as in Theorem 3.5, that
\[ \|f_N - f\|_+ = \|(1 - zB_+)^{-1} B g - (1 - zB_N^+)^{-1} P_N^+ B g\|_+ \to 0 \quad \text{as } N \to \infty, \]
for $f = (z - A)^{-1} g = -(1 - zB_+)^{-1} B g$ from Remark 2.5, Eq. (2.7-c). Convergence in $H$, i.e., $\|f_N - f\| \to 0$ as $N \to \infty$, follows from Proposition 2.6(iii).
Uniformity of the convergence with respect to $z$ in the specified set follows as in Lemma 3.3.
(ii) It follows from the one to one correspondence between the spectra of $A_N$ and $B_N^+$ that the reduced sets of equations obtained from Eq. (3.31-a, b) and from Eq. (3.32-a, b) are also equivalent. The result now follows from the same argument as in (i), which is the same as in Theorem 3.6 and Corollary 3.8 •
Remark 3.5. (i) The results of Corollary 3.1 to 3.4 can be stated in terms of $A$ as well as in terms of $B$, which are equivalent. For a self-adjoint $A$, the results with $\hat g = g$ are most useful. With this choice, Corollary 3.1 reduces to
\[ (g, f_N) = (f_N, [z - A] f_N) = -(f_N, [1 - zB] f_N)_+ = (B g, f_N)_+. \tag{3.33-a} \]
Similar expressions are obtained for the other results. The estimate of Corollary 3.3 yields
\[ |(f_M - f_N, g)| \le \text{Const.} \, \|f_M - f_N\|_+ \, \|f_M - f_N\|_+. \tag{3.33-b} \]
Thus the form on the left side is approximated up to the second order accuracy in comparison with the solution, although in terms of a greater norm. This is still an improvement over the result of Corollary 3.4 for completely unbounded operators.
There are other advantages in analyzing a semi-bounded operator in terms of the Friedrichs construction, instead of treating it as an unbounded operator. If $z \le \kappa$, then $(z - A) \le 0$. Hence from Corollary 3.2,
\[ (f_M - f_N, g) = (g, f_M - f_N) = (f_M - f_N, [z - A](f_M - f_N)) \le 0, \tag{3.33-c} \]
implying that $(g, f_N)$ is a semi-monotonically decreasing sequence.
(ii) With the reduced variational method, Eq. (3.33-a) still holds with $f'_N$ replacing $f_N$, but since $Q'_M P_M Q'_M$ may not contain $Q'_N P_N Q'_N$, the estimate of Eq. (3.33-b) and the monotonicity stated in Eq. (3.33-c) are not assured. However, the estimate of Eq. (3.33-b) holds with $f$ replacing $f_M$ together with $f'_N$ replacing $f_N$ •
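The monotonicity expressed by Eq. (3.33-c) can be observed on a finite-dimensional stand-in. The sketch below is only an illustration under stated assumptions: a hypothetical 4×4 symmetric, diagonally dominant (hence positive definite) matrix plays the role of $A$, the basis consists of the nested coordinate vectors, and $z = 0$ lies below the spectrum. The helper `solve` is a hypothetical exact Gauss-Jordan routine.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination in exact fractions.
    Assumes the matrix is non-singular (no error handling)."""
    n = len(b)
    m = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


# Hypothetical positive definite matrix standing in for A (an assumption).
A = [[4, 1, 0, 1],
     [1, 5, 2, 0],
     [0, 2, 6, 1],
     [1, 0, 1, 7]]
g = [1, 1, 1, 1]
z = 0  # below the lower bound of A, so (z - A) <= 0


def variational_value(n):
    """(g, f_N) with f_N solving (z - A_N) f_N = P_N g on span{e_1, ..., e_N}."""
    block = [[(z if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    alpha = solve(block, g[:n])
    return sum(a * b for a, b in zip(alpha, g[:n]))


values = [variational_value(n) for n in range(1, len(g) + 1)]
```

The list `values` is non-increasing in $N$, in line with Eq. (3.33-c); the last entry is the exact value for this finite-dimensional stand-in.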
3.II.5. Unbounded Operators
Next we consider a completely unbounded self-adjoint operator $A$. Any additional simplifying properties an operator may possess for a particular application will be exploited as they arise.
Lemma 3.7. Let $A$ be a self-adjoint operator with domain $D(A)$ in $H$. A vector $u$ is in $D(A)$ if and only if there is a vector $\upsilon$ in $H$ such that $u = (z - A)^{-1}\upsilon$ for an arbitrary $z$ in the resolvent set of $A$.
Proof. If there is a vector $u$ in $D(A)$, and hence in $H$, then $\upsilon = (z - A)u$ is a vector in $H$ by definition of $D(A)$. If there is a vector $\upsilon$ in $H$ then $(z - A)^{-1}\upsilon$ is also in $H$, for $(z - A)^{-1}$ is a bounded operator with domain $H$. Thus,
\[ Au = A(z - A)^{-1}\upsilon = [z - (z - A)](z - A)^{-1}\upsilon = (zu - \upsilon). \]
It follows that the right side, and hence $Au$, is in $H$, i.e., $u$ is in $D(A)$ •
The range of $(z - A)$ is dense from Theorem 2.4(iv) for each non-real $z$, and by the analyticity of the resolvent. This has been assumed inherently for the existence of a densely defined bounded resolvent. By closure (Theorem 2.1), $(z - A)^{-1}$ is defined as a bounded operator on the whole of $H$. Equivalently, the range of $(z - A)$ is dense due to the existence of $[z^* - A]^{-1}$ from Proposition 2.1.
Let $M_H$ be the set of all vectors $u$ in $H$, which can be expressed as $u = \sum_{n=1}^{N'} \alpha_n \varphi_n$ for any value of the integer $N'$ with some scalars $\{\alpha_n\}$ and basis $\{\varphi_n\}$ in $H$, which can be assumed to be orthonormal. It is assumed that $\{\varphi_n\}$ is in $D(A)$. The set $M_H$ is dense in $H$, i.e., for an arbitrary $\varepsilon > 0$ and fixed but arbitrary $u$ in $H$, there is an integer $N'$ such that $\|u - \sum_{n=1}^{N} \alpha_n \varphi_n\| < \varepsilon$ for each $N \ge N'$. We have
Lemma 3.8. The set $\hat M_H = (z - A)M_H$ is dense in $H$ for an arbitrary $z$ in the resolvent set of $A$.
Proof. Since $D(A)$ is dense in $H$, it is sufficient to show that $\hat M_H$ is dense in $D(A)$. Assume to the contrary, i.e., there is a nonzero vector $u$ in $D(A)$ such that $(u, \hat w) = 0$ for all $\hat w$ in $\hat M_H$, implying that $(u, [z - A]w) = 0$ for all $w$ in $M_H$. From Lemma 3.7, there is a nonzero vector $\upsilon$ in $H$ such that $([z^* - A]^{-1}\upsilon, [z - A]w) = 0$. The operator $[z^* - A]^{-1}$ is bounded with adjoint $[z - A]^{-1}$. Hence
\[ 0 = ([z^* - A]^{-1}\upsilon, [z - A]w) = (\upsilon, [z - A]^{-1}[z - A]w) = (\upsilon, w) \]
for all $w$ in $M_H$. Since $M_H$ is dense in $H$, this implies that $\upsilon = 0$ (Lemma 1.6, Corollary 1.3), which is a contradiction •
With this preparation, we can extend the result of Theorem 3.5 and Theorem 3.7 (i) to self-adjoint unbounded operators.
Theorem 3.8. (Singh and Stauffer 1974) Let $z$ be in the resolvent set of a self-adjoint operator $A$, and let $\{(z - A_N)^{-1}\}$ be uniformly bounded, where $A_N = P_N A P_N$. Then Eq. (3.18-a), i.e., $(z - A_N) f_N = P_N g$, has a unique solution $f_N$ for sufficiently large $N$, and $\|f_N - f\| \to 0$ as $N \to \infty$, where $f = (z - A)^{-1} g$, i.e., the solution of Eq. (3.7). If $\{(z - A_N)^{-1}\}$ is also bounded by an $N$-independent constant for each $z$ in a closed bounded set, then the convergence is uniform with respect to $z$ in the set.
Proof. Let $u$ be a vector in $\hat M_H$ as defined in Lemma 3.8, i.e., $(z - A)^{-1} u = \sum_{n=1}^{N'} \alpha_n \varphi_n$ with some integer $N'$. Consider
\[ B_N(z)u = \left[ (z - A_N)^{-1} - (z - A)^{-1} \right] u = (z - A_N)^{-1}(A_N - A)(z - A)^{-1} u, \]
which follows from the second resolvent equation, Eq. (2.1). Take $N \ge N'$, yielding
\[ \|(A_N - A)(z - A)^{-1} u\| = \|(P_N - 1) A (z - A)^{-1} u\| \to 0, \]
for $A(z - A)^{-1}u$ is a fixed vector in $H$ and $P_N$ converges strongly to the identity operator. Strong convergence on $\hat M_H$ now follows from the boundedness of $\{(z - A_N)^{-1}\}$ by a constant independent of $N$, for
\[ \|B_N(z)u\| \le \|(z - A_N)^{-1}\| \, \|(A_N - A)(z - A)^{-1} u\| \le \text{const.} \, \|(A_N - A)(z - A)^{-1} u\| \to 0 \quad \text{as } N \to \infty. \]
The strong convergence on $H$ follows from Lemma 3.4, since $\hat M_H$ is dense in $H$ from Lemma 3.8. Equi-continuity of the left side with respect to $z$ follows from the equicontinuity of $(z - A_N)^{-1}$ and $(z - A)^{-1}$, and the remainder of the proof follows as in Theorem 3.5 •
For each complex $z$ with $\operatorname{Im}(z) \neq 0$, $\|(z - A_N)^{-1}\| \le 1/|\operatorname{Im}(z)|$, implying the validity of the above convergence result. However, as is the case with the bounded operators, some unbounded operators encountered in the applications have real intervals in their resolvent sets. The method of Theorem 3.6, Corollary 3.8 and Theorem 3.7(ii) can be extended to correct the difficulty in this case also, as follows.
Proposition 3.6. (Reduced variational method) Let $A$ be a self-adjoint operator in a Hilbert space $H$ and let the set $\Omega$ in the real line containing a point $z$ in its interior be at a positive distance from the spectrum of $A$. If $A_N$ has eigenvalues in $\Omega$, then let $f'_N = (z - \tilde A_N)^{-1} \tilde g_N$, where $\tilde A_N$ is the restriction of $A_N$ to the orthogonal complement of the corresponding eigenspace and $\tilde g_N$, the projection of $P_N g$ on to this complement; otherwise, let $f'_N = f_N = [z - A_N]^{-1} P_N g$. Then $\|f'_N - f\| \to 0$ as $N \to \infty$.
Proof. Strong convergence of $[z - A_N]^{-1} P_N g$ to $[z - A]^{-1} g$ for $\operatorname{Im}(z) \neq 0$ follows from Theorem 3.8. Consequently, the spectral function $E_N(\lambda)$ of $A_N$ converges strongly to the spectral function $E_\lambda$ of $A$, from Theorem 3.3. The result now follows from Corollary 3.6 (iii). Uniform convergence with respect to $z$ follows by the same argument as in Theorem 3.8 •
An alternative proof of Proposition 3.6 can be based on Proof 2 of Lemma 3.6. The convergence proof of Theorem 3.8 is based on the following strategy: It is sufficient for a sequence of uniformly bounded operators to converge strongly on a dense set; the resolvent
maps the dense set $\hat M_H$ into another dense set $M_H$ of Lemma 3.8; and the vectors in $M_H$ are finite linear combinations of the basis vectors, which eliminates the menacing $P_N$. Thus, the proof of Theorem 3.6 can be extended, if it can be shown that $B_N$ converges strongly to zero on an appropriate dense set. In fact, following the above strategy, it is straightforward to show that $B_N$ converges strongly on $H$. The remainder of the proof is the same as the alternative proof of Theorem 3.6, and Corollary 3.8.
The spectral functions $E_N(\lambda)$ of a sequence $A_N$ of self-adjoint operators can be constructed from Theorem 2.8. For $A_N = P_N A P_N$, $E_N(\lambda)$ is a piecewise constant projection valued function of $\lambda$ with jumps at the eigenvalues $\lambda_n$ of $A_N$ of value equal to the corresponding eigenprojections $p_n$. Consider
\[ (g, f_N) = (g, [z - A_N]^{-1} g) = Q_{N-1}(z) / P_N(z), \]
as in Proposition 3.1. The eigenvalues $\lambda_n$ are clearly the zeros of $P_N(z)$ and the corresponding residues are equal to $(g, p_n g)$. If $A_N$ converges uniformly to $A$, then from Lemma 3.2(i), its spectral function converges uniformly to the spectral function of $A$. If $A$ is a compact operator, then from Corollary 3.7, the spectral function of $A_N = P_N A P_N$ converges uniformly to that of $A$. For the other cases, a weaker result, i.e., the strong convergence, is still assured. Convergence of the variationally determined approximate spectral functions can be used to generate sequences of approximations to other functions of the operator in addition to the resolvent.
3.III. PADÉ APPROXIMANTS
3.III.1. Formulation
Theory of the Padé approximants (Baker and Graves-Morris, 1996) has its origin in the moment problem (Shohat and Tamarkin, 1943) dating back to Tchebycheff. However, interest in it was revived by Stieltjes, who formulated the problem in the following form: Find a bounded non-decreasing function $\rho(\lambda)$ on the interval $\Omega$ in the real line such that its moments have a prescribed set of values $\mu_n$, i.e.,
\[ \mu_n = \int_\Omega \lambda^n \, d\rho(\lambda). \tag{3.34} \]
A finite interval $\Omega$ is equivalent to $[0; 1]$, in which case it is termed the Hausdorff moment problem. With the interval of integration $[0; \infty)$, it is known as the Stieltjes moment problem and with the interval $(-\infty; \infty)$, the problem is called the Hamburger moment problem. For the present analysis, the interval of integration is immaterial, which will therefore be omitted. While a number of results are valid for $\rho(\lambda)$ being a function of bounded variation, it will be assumed to be non-decreasing. The cases when $\rho(\lambda)$ is a function of bounded variation can be treated by expressing it as a difference of two non-decreasing functions but at the expense of the strength of some results.
Consider the Stieltjes transform of $\rho(\lambda)$ defined by (Remark 1.1)
\[ F(z) = \int \frac{d\rho(\lambda)}{z - \lambda}, \tag{3.35} \]
where $z$ is a complex variable. The function $F'(z) = zF(z)$ can be expanded formally as
\[ F'(z) = zF(z) = \sum_{n=0}^{\infty} z^{-n} \int \lambda^n \, d\rho(\lambda) = \sum_{n=0}^{\infty} z^{-n} \mu_n. \tag{3.36} \]
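As a quick numerical illustration of Eqs. (3.34)-(3.36), the sketch below assumes the simplest Hausdorff case, $d\rho(\lambda) = d\lambda$ on $[0; 1]$, for which $\mu_n = 1/(n+1)$ and $F(z) = \ln(z/(z-1))$ for real $z > 1$. The choice of measure, the value of $z$ and the function names are assumptions made only for this example.

```python
import math


def moment(n):
    """mu_n = integral of lambda**n over [0, 1] with d rho = d lambda."""
    return 1.0 / (n + 1)


def stieltjes_transform(z):
    """F(z) = integral of d lambda / (z - lambda) over [0, 1], for real z > 1."""
    return math.log(z / (z - 1.0))


def series_partial_sum(z, terms):
    """Partial sum of Eq. (3.36): sum of z**(-n) * mu_n for n < terms."""
    return sum(moment(n) * z ** (-n) for n in range(terms))


z = 3.0
exact = z * stieltjes_transform(z)   # F'(z) = z F(z)
approx = series_partial_sum(z, 40)   # z**(-1) = 1/3 is inside the circle of convergence
```

For this measure the series converges for $|z| > 1$, so `approx` and `exact` agree to rounding accuracy; for $|z| \le 1$ the same partial sums would not converge, which is where the rational approximations below become relevant.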
If a non-decreasing function $\rho(\lambda)$ exists such that the moments are given by Eq. (3.34), then the moment problem is solvable and the series given by Eq. (3.36) is termed the series of Stieltjes, which may or may not be convergent. If convergent, the series produces $F'(z)$ for $z^{-1}$ inside the circle of convergence of the series. For a solvable problem, if $F(z)$ is exactly known, then $\rho(\lambda)$ is its inverse Stieltjes transform, which can in principle be determined by Theorem 2.8. However, $F(z)$ is not determined exactly by a finite number of moments in general. The series expansion is used to obtain a converging sequence of approximations to $F'(z)$, and thus to $F(z)$, which is then inverted to determine the corresponding approximations to $\rho(\lambda)$. These approximations to $F'(z)$, known as the Padé approximants, are rational functions of $z^{-1}$. The $[n, m]$ approximant to $F'(z)$ is defined as
\[ [n, m]F'(z) = \frac{Q'_m(z^{-1})}{P'_n(z^{-1})}, \tag{3.37} \]
where $Q'_m(z^{-1})$ and $P'_n(z^{-1})$ are polynomials of degree $m$ and $n$, respectively, in $z^{-1}$. These polynomials are determined by equating the first $(m + n + 1)$ terms in the formal expansions of $F'(z)$ and $[n, m]F'(z)$ in powers of $z^{-1}$, i.e.,
\[ F'(z) - [n, m]F'(z) = o(z^{-(n+m+1)}), \tag{3.38-a} \]
equivalently,
\[ F'(z) P'_n(z^{-1}) - Q'_m(z^{-1}) = o(z^{-(n+m+1)}), \tag{3.38-b} \]
indicating that the coefficient of $z^{-(n+m+1)}$ is the first non-vanishing coefficient in the expansion. Even if such polynomials exist, they are determined only within a multiplicative constant. Suitable normalization will be assumed to determine them uniquely. The denominator $P'_n$ is usually taken to be a monic polynomial, but an ortho-normalization condition is also used, which is more suitable for the present formulation. When useful, $[n, n-1]F'(z)$ and $[n, n]F'(z)$ are the best approximations. The two are related by
\[ [n, n]F'(z) = \left( \mu_0 + z^{-1} [n, n-1][z(F'(z) - \mu_0)] \right), \tag{3.39} \]
i.e., the $[n, n]$ approximant can be obtained by approximating $F'(z)$ by the first term in its formal expansion plus the $[n, n-1]$ approximant to the remainder with the series rearranged. The other sequences can also be obtained in a similar manner.
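The matching condition of Eq. (3.38-b) amounts to a small linear system for the denominator coefficients. The following sketch is one possible way of setting it up, not a method prescribed in the text: it writes $w = z^{-1}$, normalizes the denominator by $P'_n(0) = 1$ (an assumption chosen for simplicity, differing from the normalizations mentioned above), and uses the moments $\mu_k = 1/(k+1)$ of $d\rho = d\lambda$ on $[0; 1]$ as sample data. All function names are hypothetical, and degenerate (singular) cases are not handled.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination in exact fractions.
    Assumes the matrix is non-singular (no error handling)."""
    n = len(b)
    m = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


def pade_from_moments(mu, n, m):
    """Return (q, p): ascending coefficients in w = 1/z of the numerator
    (degree m) and denominator (degree n), normalized so that p[0] = 1."""
    c = [Fraction(x) for x in mu[: n + m + 1]]

    def coeff(k):
        return c[k] if k >= 0 else Fraction(0)

    # Coefficients of w**k, k = m+1, ..., m+n, in F'(z) * P(w) must vanish.
    rows = [[coeff(k - j) for j in range(1, n + 1)] for k in range(m + 1, m + n + 1)]
    rhs = [-coeff(k) for k in range(m + 1, m + n + 1)]
    p = [Fraction(1)] + (solve(rows, rhs) if n else [])
    # The numerator collects the coefficients of w**k for k = 0, ..., m.
    q = [sum(p[j] * coeff(k - j) for j in range(min(k, n) + 1)) for k in range(m + 1)]
    return q, p


def series_of_ratio(q, p, terms):
    """First `terms` Taylor coefficients in w of q(w) / p(w), assuming p[0] = 1."""
    out = []
    for k in range(terms):
        qk = q[k] if k < len(q) else Fraction(0)
        out.append(qk - sum(p[j] * out[k - j] for j in range(1, min(k, len(p) - 1) + 1)))
    return out


mu = [Fraction(1, k + 1) for k in range(8)]  # sample moments (assumption)
q, p = pade_from_moments(mu, 2, 1)           # the [2, 1] approximant
matched = series_of_ratio(q, p, 5)
```

Here `matched[:4]` reproduces $\mu_0, \dots, \mu_3$ exactly, while `matched[4]` differs from $\mu_4$, which is the content of Eq. (3.38-a) with $n + m + 1 = 4$.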
In addition to their application in approximating $\rho(\lambda)$, the Padé approximants are useful in obtaining the approximations to $F'(z)$ also, which is often a quantity of physical interest. The approximations are useful even in the interior of the circle of convergence of the power series given by Eq. (3.36), due to their faster rate of convergence in comparison with the power series, but at times spurious singularities are encountered.
Consider the Hilbert space $H = L^2(\Omega, d\rho(\lambda))$, i.e., with the scalar product $(u, \upsilon)$ defined by
\[ (u, \upsilon) = \int_\Omega u^*(\lambda) \upsilon(\lambda) \, d\rho(\lambda), \]
for all functions with finite value of $(u, u)$. If incomplete, the manifold can be completed by including the limit functions as usual. Now, define an operator $A$ by $(Au)(\lambda) = \lambda u(\lambda)$, with its domain characterized by the definition. With $g = 1$, we have
\[ (g, [z - A]^{-1} g) = \int \frac{d\rho(\lambda)}{z - \lambda}. \tag{3.40} \]
On the other hand, it follows from the spectral theorem (Theorem 2.7) that with an arbitrary self-adjoint operator $A$,
\[ (g, [z - A]^{-1} g) = \int \frac{d(g, E_\lambda g)}{z - \lambda}, \tag{3.41} \]
with $(g, E_\lambda g)$ being a non-decreasing function of $\lambda$, which can be identified with $\rho(\lambda)$, $(g, A^n g)$ with the moments $\mu_n$ and $(g, [z - A]^{-1} g)$ with $F(z)$. This establishes equivalence between the moment problem and the theory of self-adjoint operators in a Hilbert space. Therefore, no distinction will be made between the two in the following analysis, except when necessary.
The spectrum of $A$ corresponding to a moment problem coincides with the points of increase of $\rho(\lambda)$. The Hausdorff problem clearly corresponds to a bounded operator, the Stieltjes case, to $A$ being bounded below, and the Hamburger problem, to a completely unbounded operator. A discrete set of increases defines the point spectrum, which are the points of discontinuity of $\rho(\lambda)$, and the set of points where $\rho(\lambda)$ is increasing and absolutely continuous constitutes the absolutely continuous spectrum of $A$. The remainder is in the resolvent set of $A$. A compact operator results if $\rho(\lambda)$ has only the jump increases located in compatibility with the properties of the spectrum of a compact operator.
This correspondence with the operators in Hilbert space makes it possible to treat the moment problem and the related theory of the Padé approximants by the methods used to study the variational methods. However, the requirement that the approximations be
expressible in terms of the moments, places a restriction on the basis set to be used. Consider the basis $\phi_n = A^{n-1} g$ for $n = 1, 2, \dots$, which is well defined with an arbitrary $g$ if $A$ is bounded. For an unbounded $A$, $g$ must be in the domain of $A^m$ for all non-negative integers $m$. In the following, $g$ will be assumed to satisfy this requirement whenever needed.
Consider the linear manifold spanned by $\{\phi_n = A^{n-1} g\}$. This manifold can be completed to a Hilbert space $H_M(g, A)$, depending on $g$ and $A$. The space $H_M$ is contained in $H$. An orthonormal basis set $\{\varphi_n\}$ can be constructed from $\{\phi_n\}_{n=1}^{\infty}$, e.g., by the Gram-Schmidt process (Remark 1.3). This results in $\varphi_n = P_{n-1}(A) g$ for each $n = 1, 2, \dots$, where $P_{n-1}(A)$ is a polynomial of degree $(n - 1)$ in $A$. These polynomials are determined recursively starting with $P_0(A) = 1$,
\[ (\varphi_m, \varphi_n) = (P_{m-1}(A) g, P_{n-1}(A) g) = 0, \]
equivalently by
\[ (A^{m-1} g, P_{n-1}(A) g) = 0; \quad n = 2, 3, \dots; \quad m = 1, 2, \dots, (n - 1). \tag{3.42} \]
The triangular system of algebraic equations, Eq. (3.42), determines an orthogonal basis set in terms of the moments, which can then be normalized. Use of this basis in the variational methods is usually known as the method of moments (Vorobyev, 1965).
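The construction behind Eq. (3.42) uses nothing but the moments, since $(A^i g, A^j g) = \mu_{i+j}$ for a self-adjoint $A$. A minimal sketch follows, assuming real moments, the sample values $\mu_k = 1/(k+1)$ (i.e., $d\rho = d\lambda$ on $[0; 1]$), and monic rather than normalized polynomials; the function names are hypothetical, and the classical Gram-Schmidt form is used for brevity rather than for numerical robustness.

```python
from fractions import Fraction


def inner(p, q, mu):
    """(p(A) g, q(A) g) for real polynomials with ascending coefficients,
    evaluated through the moments: (A**i g, A**j g) = mu[i + j]."""
    return sum(pi * qj * mu[i + j] for i, pi in enumerate(p) for j, qj in enumerate(q))


def orthogonal_polynomials(mu, count):
    """Monic polynomials P_0, ..., P_{count-1}, mutually orthogonal with
    respect to the moments mu (needs mu up to index 2 * count - 3)."""
    polys = []
    for degree in range(count):
        monomial = [Fraction(0)] * degree + [Fraction(1)]  # lambda**degree
        p = list(monomial)
        for earlier in polys:
            # Remove the component along each lower-degree polynomial.
            factor = inner(monomial, earlier, mu) / inner(earlier, earlier, mu)
            for k, coefficient in enumerate(earlier):
                p[k] -= factor * coefficient
        polys.append(p)
    return polys


mu = [Fraction(1, k + 1) for k in range(8)]  # sample moments (assumption)
polys = orthogonal_polynomials(mu, 3)
```

With these sample moments the result is $P_1(\lambda) = \lambda - 1/2$ and $P_2(\lambda) = \lambda^2 - \lambda + 1/6$; dividing each polynomial by the square root of `inner(p, p, mu)` would give the orthonormal set $\{\varphi_n\}$.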
3.III.2. Representations
Several equivalent expressions are available for the Padé approximants, each one suitable for certain analysis and computation. Some of them are obtained in this subsection.
Lemma 3.9. Consider Eq. (3.18-a), Theorem 3.1, $(z - A_N) f_N = P_N g$ with basis $\{\phi_n = A^{n-1} g\}_{n=1}^{\infty}$. If a solution $f_N$ exists then $(g, f_N)$ admits representation as a rational function,
\[ (g, f_N) = z^{-1} Q'_{N-1}(z^{-1}) / P'_N(z^{-1}), \]
with the coefficients of the polynomials $Q'_{N-1}(z^{-1})$ and $P'_N(z^{-1})$ expressed in terms of the moments $\{\mu_n\}_{n=0}^{(2N-1)}$.
Proof. It follows from Proposition 3.1 that
\[ (g, f_N) = \sum_{n=1}^{N} \alpha_n(z) \beta_n = Q_{N-1}(z) / P_N(z), \tag{3.43-a} \]
where $\alpha$ is the solution of Eq. (3.20):
\[ L_N(z) \, \alpha(z) = \beta, \tag{3.43-b} \]
and $Q_{N-1}(z)$ and $P_N(z)$ are polynomials in $z$. With the assumed basis, the elements of the matrix and the right side vector in Eq. (3.43-b) are given by $(L_N(z))_{nm} = (z\mu_{n+m-2} - \mu_{n+m-1})$ and $(\beta)_n = \mu_{n-1}$, $n, m = 1, 2, \dots, N$. Therefore, the solution $\alpha$, and hence $(g, f_N)$, is expressed in terms of the moments. The result follows by dividing the numerator and denominator by $z^N$ •
Lemma 3.10. With the symbols as in Lemma 3.9, $(A_N)^m g = A^m g$ for each $m \le (N - 1)$.
Proof. The result is obviously true for $m = 0$. Assuming it to be valid for $m \le (N - 2)$, we have
\[ A_N (A_N)^m g = P_N A P_N A^m g = P_N A A^m g = A^{m+1} g, \quad (m + 1) \le (N - 1), \]
by the induction assumption and the fact that $A^m g$ and $A^{m+1} g$ are in $P_N H_M$ for $m \le (N - 2)$, implying the result •
Lemma 3.11. With the symbols as in Lemma 3.10, $(g, (A_N)^m g) = (g, A^m g)$ for each $m \le (2N - 1)$.
Proof. For $m \le (2N - 1)$, we have that $(g, (A_N)^m g) = ((A_N)^l g, A_N (A_N)^n g)$ with some values of $l, n \le (N - 1)$ such that $m = (l + n + 1)$. Hence, from Lemma 3.10,
\[ (g, (A_N)^m g) = (A^l g, P_N A P_N A^n g) = (P_N A^l g, A A^n g) = (A^l g, A^{n+1} g) = (g, A^m g) \; • \]
These results yield the following representation:
Theorem 3.9. With the symbols as in Lemma 3.9, $z(g, f_N) = z(P_N g, [z - A_N]^{-1} P_N g)$ is identical with the $[N, N-1]$ Padé approximant to $F'(z)$, whenever it exists.
Proof. Since $P_N g = g$, we have that $z(g, f_N) = z(g, [z - A_N]^{-1} g)$, and its formal expansion, which may be convergent for sufficiently large values of $z$, is given by
\[ z(g, [z - A_N]^{-1} g) = \sum_{n=0}^{\infty} z^{-n} (g, (A_N)^n g) = \left[ \sum_{n=0}^{(2N-1)} z^{-n} (g, A^n g) + \sum_{n=2N}^{\infty} z^{-n} (g, (A_N)^n g) \right]. \]
From Lemma 3.11, the first $2N$ terms of this expansion agree with those of $F'(z)$ in Eq. (3.36). Consequently, $z(g, f_N) = Q'_{N-1}(z^{-1}) / P'_N(z^{-1})$ (Lemma 3.9) is the $[N, N-1]$ Padé approximant to $F'(z)$ from Eq. (3.38-a) •
So far we have studied the Padé approximants within the framework of the variational methods. Historically this relation was investigated after the earlier investigations, mostly based on their continued fraction representations (Wall 1948; Stone 1932). We obtain this representation in Lemma 3.12. The analysis begins with representing $A$ by an infinite simple tri-diagonal matrix, i.e., a Jacobi matrix.
Proposition 3.7. With respect to the infinite basis given by Eq. (3.42), $A$ assumes the
form of a self-adjoint Jacobi matrix and $A_N$ assumes the form of its truncated simple finite dimensional tri-diagonal matrix.
Proof. The matrix elements $a_{nm}$ of $A$ with respect to this basis, normalized, are given by
\[ a_{nm} = (P_{n-1}(A) g, A P_{m-1}(A) g). \]
Self-adjointness of the matrix follows from this property of $A$. Since $A P_{m-1}(A)$ is a polynomial of degree $m$ in $A$, the vector $A P_{m-1}(A) g$ can be expressed as a linear combination
\[ A P_{m-1}(A) g = \sum_{l=1}^{m+1} c_l P_{l-1}(A) g, \]
where $c_l$ are constants with $c_{m+1} \neq 0$. It follows from Eq. (3.42) that $a_{nm} = 0 = a_{mn}$ for $m \le (n - 2)$ and $a_{n(n-1)} = c_n \neq 0$ •
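Proposition 3.7 together with Lemma 3.11 can be illustrated with a case where the Jacobi matrix is known in closed form. The sketch below assumes $d\rho = d\lambda/2$ on $[-1; 1]$ with $g = 1$, for which the orthonormal basis consists of the normalized Legendre polynomials, the diagonal elements vanish and the off-diagonal elements are $n/\sqrt{4n^2 - 1}$; this measure and the function names are choices made for the example, not taken from the text.

```python
import math


def legendre_jacobi(size):
    """Truncated Jacobi matrix of multiplication by lambda on [-1, 1] with
    d rho = d lambda / 2, in the orthonormal Legendre basis."""
    matrix = [[0.0] * size for _ in range(size)]
    for n in range(1, size):
        off = n / math.sqrt(4.0 * n * n - 1.0)
        matrix[n - 1][n] = matrix[n][n - 1] = off
    return matrix


def truncated_moment(matrix, m):
    """(g, (A_N)**m g) with g represented by the first basis vector."""
    vector = [1.0] + [0.0] * (len(matrix) - 1)
    for _ in range(m):
        vector = [sum(row[k] * vector[k] for k in range(len(vector))) for row in matrix]
    return vector[0]


def exact_moment(m):
    """mu_m = (1/2) * integral of lambda**m over [-1, 1]."""
    return 1.0 / (m + 1) if m % 2 == 0 else 0.0


N = 3
J = legendre_jacobi(N)
errors = [abs(truncated_moment(J, m) - exact_moment(m)) for m in range(2 * N + 1)]
```

The entries of `errors` for $m \le 2N - 1 = 5$ are at rounding level, as Lemma 3.11 asserts, while the entry for $m = 2N$ is not small, showing that the bound on $m$ cannot be improved in this example.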
Lemma 3.12. With the symbols as in Proposition 3.7, $F(z) = (g, [z - A_N]^{-1} g)$, if it exists, admits the continued fraction representation given by
\[ F(z) = \cfrac{1}{[z - a_{11}] - \cfrac{|a_{12}|^2}{[z - a_{22}] - \cfrac{|a_{23}|^2}{[z - a_{33}] - \cdots - \cfrac{|a_{(N-2)(N-1)}|^2}{[z - a_{(N-1)(N-1)}] - \cfrac{|a_{(N-1)N}|^2}{(z - a_{NN})}}}}}. \]
Proof. Let $[z - A_N]$ denote the truncated Jacobi matrix as obtained in Proposition 3.7. Then $F(z) = \alpha_1$, where $\alpha_n$ are the elements of the $N$-vector solution $\alpha$ of $[z - A_N]\alpha = \beta$. The elements of $\beta$ are all zero, except that $\beta_1 = 1$. In detail, this system of equations reads as
\[ \begin{aligned} (z - a_{11})\alpha_1 - a_{12}\alpha_2 &= 1, \\ -a_{n(n-1)}\alpha_{(n-1)} + (z - a_{nn})\alpha_n - a_{n(n+1)}\alpha_{(n+1)} &= 0, \quad n = 2, 3, \dots, (N - 1), \\ -a_{N(N-1)}\alpha_{(N-1)} + (z - a_{NN})\alpha_N &= 0, \end{aligned} \]
which is solved by
\[ \alpha_1 = \frac{1}{(z - a_{11}) - a_{12} \dfrac{\alpha_2}{\alpha_1}}, \]
\[ \frac{\alpha_n}{\alpha_{(n-1)}} = \frac{a_{n(n-1)}}{(z - a_{nn}) - a_{n(n+1)} \dfrac{\alpha_{n+1}}{\alpha_n}}, \quad n = 2, 3, \dots, (N - 1), \]
\[ \frac{\alpha_N}{\alpha_{(N-1)}} = \frac{a_{N(N-1)}}{(z - a_{NN})}. \]
The result follows by substitution •
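The representation of Lemma 3.12 is convenient computationally because the fraction can be evaluated from the bottom up. The sketch below compares this evaluation with a direct solution of the tri-diagonal system used in the proof. It assumes a real symmetric Jacobi matrix and, as sample data, the Legendre case $d\rho = d\lambda/2$ on $[-1; 1]$ (zero diagonal, off-diagonal elements $n/\sqrt{4n^2 - 1}$), for which $F(z) = \tfrac{1}{2}\ln((z+1)/(z-1))$ exactly; the function names and the data are assumptions of this example.

```python
import math


def continued_fraction(diag, off, z):
    """Evaluate the fraction of Lemma 3.12 from the bottom up.
    diag[k] holds a_{k+1,k+1} and off[k] holds a_{k+1,k+2}."""
    value = z - diag[-1]
    for k in range(len(diag) - 2, -1, -1):
        value = (z - diag[k]) - abs(off[k]) ** 2 / value
    return 1.0 / value


def first_resolvent_element(diag, off, z):
    """alpha_1 from (z - A_N) alpha = beta with beta = e_1, by forward
    elimination without pivoting followed by back substitution.
    Assumes real off-diagonal elements."""
    n = len(diag)
    main = [z - d for d in diag]
    rhs = [1.0] + [0.0] * (n - 1)
    for k in range(1, n):
        ratio = -off[k - 1] / main[k - 1]
        main[k] -= ratio * (-off[k - 1])
        rhs[k] -= ratio * rhs[k - 1]
    alpha = [0.0] * n
    alpha[-1] = rhs[-1] / main[-1]
    for k in range(n - 2, -1, -1):
        alpha[k] = (rhs[k] + off[k] * alpha[k + 1]) / main[k]
    return alpha[0]


# Sample data (assumption): truncated Legendre Jacobi matrix of order N.
N = 4
diag = [0.0] * N
off = [n / math.sqrt(4.0 * n * n - 1.0) for n in range(1, N)]
z = 2.0

from_fraction = continued_fraction(diag, off, z)
from_system = first_resolvent_element(diag, off, z)
exact = 0.5 * math.log((z + 1.0) / (z - 1.0))
```

The two evaluations agree to rounding accuracy, and for this $z$, which lies outside the spectrum, both are already close to `exact` with $N = 4$.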
As is clear, the Padé approximants and the orthogonal polynomials are intimately connected, which is further exploited to yield the following representation, used frequently in classical analysis.
Lemma 3.13. (Nuttall and Singh, 1977) The approximant
\[ [N, N-1]F'(z) = Q'_{N-1}(z^{-1}) / P'_N(z^{-1}), \]
where the polynomials $P'_N(z^{-1})$ and $Q'_{N-1}(z^{-1})$ are defined within a multiplicative constant by
\[ P'_N(z^{-1}) = z^{-N} P_N(z) \quad \text{and} \quad Q'_{N-1}(z^{-1}) = z^{-(N-1)} \int \frac{d\rho(\lambda)}{(z - \lambda)} [P_N(z) - P_N(\lambda)], \]
with $P_N$ being the orthogonal polynomials as defined by Eq. (3.42).
Proof. Since $P_N$ is a polynomial of degree $N$, $(P_N(z) - P_N(\lambda))/(z - \lambda)$ is a polynomial of degree $(N - 1)$, both in $z$ and $\lambda$. Thus, $P'_N(z^{-1})$ and $Q'_{N-1}(z^{-1})$ as defined, are polynomials of the indicated degrees in $z^{-1}$. Let
\[ \Delta(z^{-1}) = \left[ P'_N(z^{-1}) F'(z) - Q'_{N-1}(z^{-1}) \right] = z^{-N} \int \frac{d\rho(\lambda) \, P_N(\lambda)}{(1 - z^{-1}\lambda)}. \]
The formal expansion of $\Delta(z^{-1})$ is given by
\[ \Delta(z^{-1}) = z^{-N} \sum_{n=1}^{\infty} z^{-n+1} \int \lambda^{n-1} P_N(\lambda) \, d\rho(\lambda) = z^{-N} \sum_{n=1}^{\infty} z^{-n+1} (g, A^{n-1} P_N(A) g) = o(z^{-2N}), \]
from Eq. (3.42). The result follows from the definition, Eq. (3.38-a) •
Lemma 3.13 establishes a homeomorphism between the orthogonal polynomials in terms of the operator defined by Eq. (3.42) and the ones defined in terms of the variable $z$. Orthogonal polynomials together with this relation play a significant role in the studies related to the Jacobi matrices, which have resulted in a large area of investigation. The converse of Lemma 3.13 is also true; a proof is omitted since it adds little by way of illustration or application.
3.III.3. Convergence And Applications

The above analysis is valid in $H_M(g, A)$, the completion of the span of $\{\phi_n\}_{n=1}^{\infty}$ or, equivalently, of the orthonormal set $\{\varphi_n\}_{n=1}^{\infty}$ constructed by Eq. (3.42). The space $H_M$ is contained in $H$. If $H_M = H$, then the results obtained for the variational methods are directly applicable to the moment problem. However, this is not always the case. For example, if $g$ is an eigenvector of $A$, assuming that one exists, then $H_M$ is one-dimensional, while $H$ is infinite dimensional. If $g$ is orthogonal to an eigenvector, then $H_M$ is infinite dimensional, but still not equal to $H$. The results obtained for the variational methods are valid for the restriction of $A$ to $H_M$, which is sufficient for some problems but not for others. We show below that $H_M$ is sufficiently large to accommodate $[z - A]^{-1} g$.

Theorem 3.10. With $z$ in the resolvent set of a self-adjoint operator $A$ and $g$ an arbitrary vector in $H$, $[z - A]^{-1} g$ is in $H_M(g, A)$.

Proof. First, consider a bounded $A$, which is also of independent interest. From the spectral theorem (Theorem 2.7), $[z - A]^{-1} g$ is expressed as
\[
[z - A]^{-1} g = \int_{\sigma(A)} \frac{dE_\lambda\, g}{z - \lambda} .
\]
Since $z$ is in the resolvent set of $A$, $(z - \lambda)^{-1}$ is a bounded continuous function of $\lambda$ on the spectrum of $A$, which is contained in an interval in the real line. From the Weierstrass approximation theorem (Lemma 1.1), for a fixed $z$, there is a polynomial $\tilde{P}_n(\lambda)$ such that
\[
\sup_{\lambda \in \sigma(A)} \left| (z - \lambda)^{-1} - \tilde{P}_n(\lambda) \right| \le \varepsilon(n) ,
\]
for each $\varepsilon(n) > 0$. Consequently,
\[
\begin{aligned}
\left\| \left[ (z - A)^{-1} - \tilde{P}_n(A) \right] g \right\|
&= \left\| \int \left[ (z - \lambda)^{-1} - \tilde{P}_n(\lambda) \right] dE_\lambda\, g \right\| \\
&\le \sup_{\lambda \in \sigma(A)} \left| (z - \lambda)^{-1} - \tilde{P}_n(\lambda) \right| \int \| dE_\lambda\, g \| \\
&\le \varepsilon(n)\, \| g \| \to 0 \quad \text{as } n \to \infty,
\end{aligned}
\]
and thus, $[z - A]^{-1} g$ is the limit of a convergent sequence of vectors $\tilde{P}_n(A) g$, which are clearly in $H_M$. Since $H_M$ is complete, $[z - A]^{-1} g$ is in $H_M$.

Now consider $[z - A]^{-1} E'_N(\lambda) g$, where $A$ can be unbounded and $E'_N(\lambda) = E_\lambda$ for $-N < \lambda < N$; $E'_N(\lambda) = E_N$ for $\lambda \ge N$; and $E'_N(\lambda) = E_{-N}$ for $\lambda \le -N$. It follows that
\[
[z - A]^{-1} E'_N(\lambda) g = \int_{\sigma(A)} \frac{dE_\lambda \left( E'_N(\lambda) g \right)}{z - \lambda} = \int_{-N}^{N} \frac{dE_\lambda\, g}{z - \lambda} .
\]
By the same argument as in the case of a bounded operator, $[z - A]^{-1} E'_N(\lambda) g$ is in $H_M$. Since $[z - A]^{-1}$ is bounded and $E'_N(\lambda)$ converges strongly to the identity operator as $N \to \infty$, $\{ [z - A]^{-1} E'_N(\lambda) g \}$ is a convergent sequence. Since $H_M$ is complete, the limit vector $[z - A]^{-1} g$ is in $H_M(g, A)$ •

Although the result of Theorem 3.10 is valid for an unbounded self-adjoint operator $A$, the basis required to generate the Padé approximants is not convenient to find. While the set of vectors in the domain of the powers of $A$ is dense in $H$, not every $g$ is in their domains, even though $[z - A]^{-1} g$ is well defined for all vectors $g$ in $H$. Therefore, the method is rarely applied. Instead, an attempt is made to express the problem in terms of a bounded operator. Nevertheless, there are cases when one must deal with an unbounded operator.

Theorem 3.10 shows the adequacy of the basis to be used in the moment method to approximate $[z - A]^{-1} g$ and $(g, [z - A]^{-1} g) = \rho(\lambda)$. However, the eigenvalues and the corresponding eigenprojections, if determinable by the moment method, are clearly those of the restriction of $A$ to $H_M$, which may or may not be the same as those of $A$ in $H$. In the following, we assume that the operators are their restrictions to $H_M$ and the convergence results refer to $H_M$.

Corollary 3.9. Let $A$ be a self-adjoint operator and let $z \ne 0$ be in the complement of its numerical range. Then, as $N$ tends to infinity, $[z - A_N]^{-1} g$ converges to $[z - A]^{-1} g$; the $[N, N-1]$ Padé approximants converge to $F'(z) = z (g, [z - A]^{-1} g)$ and the corresponding spectral function $E_N(\lambda)$ converges strongly to $E_\lambda$ in $H_M$ at all of its points of continuity.

Proof. Convergence of $[z - A_N]^{-1} g$ to $[z - A]^{-1} g$ follows from Theorem 3.8 in general. For bounded and semi bounded operators, alternative proofs follow from Theorem
3.5 and Theorem 3.7(i), respectively. This implies the convergence of the Padé approximants, which requires only the weak convergence. Since this convergence result is valid for all nonreal $z$, strong convergence of $E_N(\lambda)$ to $E_\lambda$ follows from Theorem 3.3 •

Since $A$ is a self-adjoint operator, variational equations are always solvable for $\mathrm{Im}(z) \ne 0$. For $z = 0$, the restricted solution can be obtained from Remark 3.2(i). For real $z$, spurious singularities may be encountered. In such cases, the approximations can be obtained by the method of Corollary 3.8, Theorem 3.7(ii) and Proposition 3.6.

The result of Corollary 3.9 is used below to obtain a numerical integration scheme. Let $F_N(z) = Q_{N-1}(z) / P_N(z) = (g, f_N)$ be as defined in Lemma 3.9, i.e., the approximation to $(g, [z - A]^{-1} g)$ corresponding to $[N, N-1]F'(z)$. Also let $\lambda_n$ and $\omega_n$ be the poles and the corresponding residues of $F_N(z)$. It is clear that $\lambda_n$ are the eigenvalues of $A_N$ and $\omega_n$, the jumps of $\rho_N(\lambda)$, which are all positive. With the normalization $(g, g) = 1$, $\sum_{n=1}^{N} \omega_n = (g, g) = 1$. For a given function $\xi(\lambda)$, define the integrals
\[
I_N(\xi) = \sum_{n=1}^{N} \omega_n\, \xi(\lambda_n) = \int_\Omega \xi(\lambda)\, d\rho_N(\lambda) \tag{3.44-a}
\]
and
\[
I(\xi) = \int_\Omega \xi(\lambda)\, d\rho(\lambda). \tag{3.44-b}
\]

Proposition 3.8. Let $\tilde{P}_n(\lambda)$ be a polynomial of degree $n$ and the integrals be defined by Eq. (3.44-a, b). Then for each $n \le (2N - 1)$, $I_N(\tilde{P}_n) = I(\tilde{P}_n)$.

Proof. From Eq. (3.44-a) and the spectral theorem (Theorem 2.7), we have $I_N(\tilde{P}_n) = (g, \tilde{P}_n(A_N) g)$. If $n \le (2N - 1)$, then from Lemma 3.11,
\[
I_N(\tilde{P}_n) = (g, \tilde{P}_n(A) g) = \int_\Omega \tilde{P}_n(\lambda)\, d\rho(\lambda) = I(\tilde{P}_n) \; •
\]
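The exactness property of Proposition 3.8 can be illustrated with a familiar special case. The Python sketch below assumes $\Omega = [-1, 1]$ and the normalized measure $d\rho(\lambda) = d\lambda / 2$, so that $(g, g) = 1$, and uses the standard 3-point Gauss-Legendre nodes and weights (rescaled to sum to one) in the role of $\lambda_n$ and $\omega_n$. These choices and all function names are assumptions of the example. With $N = 3$, the monomials of degree up to $2N - 1 = 5$ should be integrated exactly, while degree 6 should not.

```python
import math

# 3-point Gauss-Legendre rule on [-1, 1], with weights rescaled so that they sum to 1
# (this matches the assumed normalized measure d rho = d lambda / 2).
NODES = [-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5)]
WEIGHTS = [5 / 18, 8 / 18, 5 / 18]


def quadrature(xi):
    """I_N(xi) = sum over n of omega_n * xi(lambda_n), as in Eq. (3.44-a)."""
    return sum(w * xi(x) for w, x in zip(WEIGHTS, NODES))


def exact_moment(k):
    """I(lambda^k) for d rho = d lambda / 2 on [-1, 1]."""
    return 1 / (k + 1) if k % 2 == 0 else 0.0


def moment_errors(max_degree):
    """Absolute quadrature error for each monomial lambda^k, k = 0..max_degree."""
    return [
        abs(quadrature(lambda x, k=k: x ** k) - exact_moment(k))
        for k in range(max_degree + 1)
    ]
```

Calling `moment_errors(6)` returns seven errors: the first six are at rounding level and the last is visibly nonzero, in line with the bound $n \le (2N - 1)$.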
Lemma 3.14. Let $\xi(\lambda)$ be a continuous function of $\lambda$ and its integrals be as defined by Eqs. (3.44-a, b). Then $I_N(\xi) \to I(\xi)$ as $N \to \infty$.

Proof. Since $\xi(\lambda)$ is continuous, there is a polynomial $\tilde{P}_n(\lambda)$ such that
\[
\sup_{\lambda \in \Omega} \left| \xi(\lambda) - \tilde{P}_n(\lambda) \right| \le \varepsilon ,
\]
for each $\varepsilon > 0$, from the Weierstrass approximation theorem (Lemma 1.1). Since $I_N(\tilde{P}_n) = I(\tilde{P}_n)$, it follows for $N \ge (n + 1)/2$, that $[I(\xi) - I_N(\xi)] = (T_1 - T_2)$, where
\[
T_1 = \int_\Omega \left( \xi(\lambda) - \tilde{P}_n(\lambda) \right) d\rho(\lambda) \quad \text{and} \quad T_2 = \int_\Omega \left( \xi(\lambda) - \tilde{P}_n(\lambda) \right) d\rho_N(\lambda) .
\]
Now,
\[
| T_1 | \le \varepsilon \int_\Omega d\rho(\lambda) = \varepsilon ,
\]
and
\[
| T_2 | = \left| \sum_{k=1}^{N} \omega_k \left[ \xi(\lambda_k) - \tilde{P}_n(\lambda_k) \right] \right| = \left| \sum_{k=1}^{N} a_k b_k \right| ,
\]
with $a_k = \sqrt{\omega_k}$ and $b_k = \sqrt{\omega_k} \left[ \xi(\lambda_k) - \tilde{P}_n(\lambda_k) \right]$. Since $\sum_{k=1}^{N} a_k b_k$ is a scalar product in the $N$-vector space of column vectors, it follows from the Schwarz inequality (Lemma 1.4) that
\[
\begin{aligned}
| T_2 |^2 &\le \left( \sum_{k=1}^{N} | a_k |^2 \right) \left( \sum_{k=1}^{N} | b_k |^2 \right) \\
&= \left( \sum_{k=1}^{N} \omega_k \right) \left( \sum_{k=1}^{N} \omega_k \left| \xi(\lambda_k) - \tilde{P}_n(\lambda_k) \right|^2 \right) \\
&\le \sup_{\lambda \in \Omega} \left| \xi(\lambda) - \tilde{P}_n(\lambda) \right|^2 \left( \sum_{k=1}^{N} \omega_k \right)^2 \le \varepsilon^2 .
\end{aligned}
\]
Consequently,
\[
| I(\xi) - I_N(\xi) | \le | T_1 | + | T_2 | \le 2\varepsilon ,
\]
implying the result •

For $\rho(\lambda) = \lambda$, the result of Lemma 3.14 reduces to the familiar Gaussian quadrature formula. This result is usually obtained by approximating $\xi(\lambda)$ by a polynomial, and then $\lambda_n$ and $\omega_n$ are determined by requiring the quadrature to be exact for all polynomials of degree $m \le (2N - 1)$.

Consider $H_M(g, A)$ constructed with a given function $\rho(\lambda)$ defining the measure, the self-adjoint operator being the operation of multiplication by the variable $\lambda$ and the vector $g$ being the function equal to 1 everywhere, as indicated earlier (Eqs. (3.40), (3.41)). Introduce a scalar product $(\cdot, \cdot)_N$ on $H_M(g, A)$ for all $N$ defined by
\[
(u, \upsilon)_N = \int_\Omega u^*(\lambda)\, \upsilon(\lambda)\, d\rho_N(\lambda) ,
\]
where $\rho_N(\lambda)$ is as in Lemma 3.14. The scalar product $(\cdot, \cdot)_N$ is well-defined on the continuous functions of $\lambda$. The set of continuous functions can now be completed to a Hilbert space $H_M^N(g, A)$. Consider the usual ordered basis
\[
\{\phi_n\} = \{ A^{n-1} g \} = \{ \lambda^{n-1} \} , \quad n = 1, 2, \ldots ,
\]
in $H_M(g, A)$, with the orthonormal set $\{\varphi_n\}$ as defined by Eq. (3.42). Now, construct an orthonormal set $\{\varphi_n^N\}$ from $\{\phi_n\}$, equivalently from $\{\varphi_n\}$, with respect to the scalar product $(\cdot, \cdot)_N$. It follows from Proposition 3.8 that $(\varphi_n, \varphi_m)_N = (\varphi_n, \varphi_m)$ for $n, m = 1, 2, \ldots, N$. Consequently, $\varphi_n^N = \varphi_n$ for $n = 1, 2, \ldots, N$. In fact $\varphi_n^N$ for $n > N$ will not be needed. Therefore, the superscript $N$ will be dropped from the basis vectors. Let $P_M^N$ be defined by
\[
P_M^N u = \sum_{n=1}^{M} \varphi_n\, (\varphi_n, u)_N , \tag{3.45}
\]
on the set of continuous functions in $H_M^N(g, A)$, for all $M \le N$. The operator $P_M^N$ for $M \le N$ possesses the basic properties of $P_N$. We restrict to the case of $P_N^N$, since it is the optimal case.

Proposition 3.9. The operator $P_N^N$ defined by Eq. (3.45), is a projection on $H_M^N(g, A)$ for all $N$, with $\| P_N^N \| = 1$, converging strongly to the identity operator on $H_M^\infty(g, A) = H_M(g, A)$.

Proof. The fact that $P_N^N$ is a projection follows from the equality $(P_N^N)^2 = P_N^N$, which also implies that $\| P_N^N \| \ge 1$ (Lemma 2.2(i)). The fact that $\| P_N^N \| \le 1$ follows from
\[
\| P_N^N u \|^2 = (u, P_N^N u) \le \| u \| \, \| P_N^N u \| .
\]
Both of the equalities are valid for each $u$ on the set of the continuous functions in $H_M^N(g, A)$, and follow from Proposition 3.8 by observing that $[\varphi_n \varphi_m](\lambda)$ is a polynomial of degree no greater than $(2N - 2)$ for $n, m = 1, 2, \ldots, N$. By closure (Theorem 2.1), the
relation extends to $H_M(g, A)$. This proof is similar, although not identical, to the arguments used in Proposition 2.2.

As in Lemma 3.14, let $\tilde{P}_n(\lambda)$ be a polynomial such that for a fixed continuous $u$,
\[
\sup_{\lambda \in \Omega} \left| u(\lambda) - \tilde{P}_n(\lambda) \right| \le \varepsilon ,
\]
and hence, $\| u - \tilde{P}_n \| \le \varepsilon$, for each $\varepsilon > 0$. For each continuous function $u$, we have
\[
\begin{aligned}
(P_N - P_N^N) u &= \sum_{m=1}^{N} \varphi_m \left[ (\varphi_m, u) - (\varphi_m, u)_N \right] \\
&= \sum_{m=1}^{N} \varphi_m \left[ (\varphi_m, \tilde{P}_n) - (\varphi_m, \tilde{P}_n)_N \right] + [P_N - P_N^N](u - \tilde{P}_n) .
\end{aligned}
\]
Let $N \ge n$. Then from Proposition 3.8, we have
\[
(P_N - P_N^N) u = (P_N - P_N^N)(u - \tilde{P}_n) .
\]
Hence,
\[
\| (P_N - P_N^N) u \| \le \| (P_N - P_N^N) \| \, \| (u - \tilde{P}_n) \| \le 2\varepsilon ,
\]
implying the strong convergence of $P_N^N$ to the same limit as $P_N$, which is the identity operator, on the set of continuous functions. The result extends to $H_M(g, A)$ by closure (Theorem 2.1), which also implies that $H_M^\infty(g, A) = H_M(g, A)$ •

In view of the result of Proposition 3.9, all the results on the variational methods in sec. 3.II. are valid with $P_N$ replaced by $P_N^N$, which can be exploited for computational advantage as illustrated by the following example.

Example 3.2. (Collocation variational method) Consider the Fredholm integral equations of the second kind, Eq. (2.9-b), in $H = L^2([-1; 1], dx)$,
\[
(z - K) f = g' . \tag{3.46}
\]
In detail,
\[
z f(x) - \int_{-1}^{1} K(x; y) f(y)\, dy = g'(x) . \tag{3.47}
\]
With a square integrable kernel, $K$ is a Hilbert-Schmidt operator, and hence from Theorem 3.4, if Eq. (3.46) has a solution, a converging sequence of approximations can be obtained by
the variational method with an arbitrary basis, e.g., $\{ x^{n-1} \}_{n=1}^{\infty}$. This results in solving Eq. (3.17-a) with the scalar product defined by
\[
(u, \upsilon) = \int_{-1}^{1} u^*(x)\, \upsilon(x)\, dx ,
\]
requiring evaluations of double integrals to obtain the matrix elements of $K$.

The space $H_M(g, A)$ with $A$ being the operation of multiplication by $x$ and $g = 1$, is the same as $H$, since both are the spans of $\{ x^{n-1} \}_{n=1}^{\infty}$ with the same scalar product. To clarify, $g'$ and $K$ bear no relation with $g$ and $A$ except that Eq. (3.46) defines an equation in $H_M(g, A)$. From Proposition 3.9, $P_N^N$ constructed with the orthogonal polynomials in $H_M(g, A)$ is a sequence of uniformly bounded operators converging strongly to the identity operator on $H_M(g, A) = H$. Consequently, the result of Theorem 3.4 remains valid with $P_N^N$ replacing $P_N$, i.e., a sequence of approximations $\{ f'_N \}$ converging in $H$ can be obtained by solving
\[
(z - P_N^N K P_N^N) f'_N = P_N^N g' . \tag{3.48}
\]
This replacement requires the use of the set of ordered polynomials as the basis set, although orthogonality is not needed for computation. This is inconsequential for the present case as the set $\{ x^{n-1} \}_{n=1}^{\infty}$ is an ordered set.

From Eq. (3.48), a solution $f'_N$ is given by
\[
z f'_N = P_N^N g' + K_N f'_N = \sum_{n=1}^{N} \varphi_n\, (\varphi_n, g')_N + \sum_{n=1}^{N} \varphi_n\, (\varphi_n, K f'_N)_N ,
\]
which is the same set of equations as in the original method except that now the scalar product is approximated by its numerical value obtained by the $N$-point Gaussian quadrature as indicated in Eq. (3.44-a). It follows that $f'_N = \sum_{n=1}^{N} \alpha'_n \varphi_n$, where
\[
z \alpha'_n = (\varphi_n, g')_N + (\varphi_n, K f'_N)_N = (\varphi_n, g')_N + \sum_{m=1}^{N} \alpha'_m\, (\varphi_n, K \varphi_m)_N ,
\]
equivalently,
\[
\sum_{m=1}^{N} (\phi_n, [z - K] \phi_m)_N\, \alpha'_m = (\phi_n, g')_N ; \quad n = 1, 2, \ldots, N , \tag{3.49}
\]
with $f'_N = \sum_{n=1}^{N} \alpha'_n \phi_n$, as long as $\{\phi_n\}_{n=1}^{N}$ spans the same space as $\{\varphi_n\}_{n=1}^{N}$. In detail
\[
\sum_{m, l = 1}^{N} w_l\, \phi_n^*(x_l) \left[ z \phi_m(x_l) - \int_{-1}^{1} K(x_l, y) \phi_m(y)\, dy \right] \alpha'_m = \sum_{l=1}^{N} w_l\, \phi_n^*(x_l)\, g'(x_l) , \quad n = 1, 2, \ldots, N . \tag{3.50}
\]
Eq. (3.50) can be expressed in the matrix notation as
\[
\hat{J} W \left[ z J - \mathcal{K} \right] \alpha' = \hat{J} W \beta' , \tag{3.51}
\]
where $J_{lm} = \phi_m(x_l)$, $W$ is a diagonal matrix with elements $W_{lm} = w_l \delta_{lm}$, $\hat{J}$ is the Hermitian conjugate of $J$, the elements of $\mathcal{K}$ are given by
\[
\mathcal{K}_{lm} = \int_{-1}^{1} K(x_l, y) \phi_m(y)\, dy = (K \phi_m)(x_l) , \tag{3.52}
\]
$\beta'$ is the $N$-vector with elements $\beta'_l = g'(x_l)$ and $\alpha'$ is the $N$-vector solution. From Proposition 3.8, $\hat{J} W J$ is the identity matrix. It is easily seen that $\hat{J} W$ and $J$ are the inverses of each other. Multiplying Eq. (3.51) with $(\hat{J} W)^{-1}$ yields
\[
\sum_{m=1}^{N} \left[ z \phi_m(x_l) - \int_{-1}^{1} K(x_l, y) \phi_m(y)\, dy \right] \alpha'_m = g'(x_l) , \quad l = 1, 2, \ldots, N . \tag{3.53}
\]
Since all of the steps are reversible, the converse is also true, i.e., Eq. (3.53) and Eq. (3.48) are equivalent.

The cases with infinite domains can be considered in the same manner, e.g., $H = L^2([0; \infty), dx)$ is spanned by $\{ x^{n-1} \exp[-x] \}_{n=1}^{\infty}$. It can be identified with $H = L^2([0; \infty), d\rho(x))$ with $\rho(x) = (1 - \exp[-2x])/2$, which is spanned by $\{ x^{n-1} \}_{n=1}^{\infty}$. Also, multi-dimensional variables can be accommodated in a straightforward manner. The procedure is applicable to all classes of operators considered in sec. 3.II., with the corresponding results applicable. While the argument of the closure facilitates the analysis, for computation $(K \phi_m)(x)$ should be sufficiently smooth to be defined at the set $\{ x_n \}$. If not, adjustment at a set of measure zero may be required •
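A small numerical instance of Eq. (3.53) may make the procedure concrete. The Python sketch below assumes the separable kernel $K(x, y) = xy$, the right side $g'(x) = x$, the monomial basis $\phi_m(x) = x^{m-1}$ and the three Gauss-Legendre points on $[-1, 1]$ as the collocation set $\{x_l\}$. None of these choices comes from the text; they are picked so that $(K\phi_m)(x_l)$ can be integrated in closed form and so that the exact solution, $f(x) = x / (z - 2/3)$, is known.

```python
import math


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    rows = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(rows[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (rows[r][n] - tail) / rows[r][r]
    return x


# Collocation points: nodes of the 3-point Gauss-Legendre rule on [-1, 1].
POINTS = [-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5)]


def k_phi(m, x):
    """(K phi_m)(x) for the assumed kernel K(x, y) = x*y and phi_m(y) = y**(m-1).

    The integral of y * y**(m-1) = y**m over [-1, 1] is 2/(m+1) for even m, else 0.
    """
    integral = 2 / (m + 1) if m % 2 == 0 else 0.0
    return x * integral


def collocation_coefficients(z, g_prime):
    """Solve Eq. (3.53): sum_m [z phi_m(x_l) - (K phi_m)(x_l)] alpha_m = g'(x_l)."""
    n = len(POINTS)
    matrix = [
        [z * x ** (m - 1) - k_phi(m, x) for m in range(1, n + 1)]
        for x in POINTS
    ]
    rhs = [g_prime(x) for x in POINTS]
    return solve(matrix, rhs)
```

With `z = 1.0` and `g_prime = lambda x: x`, the coefficients should come out close to `(0, 3, 0)`, i.e., $f'_N(x) = 3x$, which coincides with the exact solution because it lies in the span of the basis. Note that the matrix being solved is not symmetric, although the assumed kernel is.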
The procedure described in Example 3.2 is known as the collocation variational method. The collocation method (Noble, 1973) results by approximating the integral in Eq. (3.53) by a numerical quadrature, which is convenient for applications but has a limited range of applicability, and the approximations are not as accurate as those of the variational method. The limitations result from the choices of $\{ x_n \}$ and from the numerical integration method employed. On the other hand, the variational methods have a wider range of applicability and the approximations are considerably more accurate. However, in some situations their application is also considerably more complicated, as seen in the case of Example 3.2. For these reasons, the collocation variational method is frequently used: it can be applied with moderate computational effort, its range of applicability is wider than that of the collocation method, and it is expected on intuitive grounds that the improvement in the accuracy of the approximations justifies the increase in effort. However, while the convergence properties of the variational methods and the collocation method are quite well understood, satisfactory results on the convergence of the collocation variational method are almost non-existent. Consequently, the conditions for the applicability of the method are also not clear. In particular, the basis and the set $\{ x_n \}$ in applications are often selected for convenience as long as they are intuitively reasonable.

As is clear from Proposition 3.9 and its illustration by Example 3.2, the collocation variational method has about as large a range of applicability as the variational methods and the convergence properties are essentially identical, although with a particular basis and the set of points $\{ x_n \}$. However, this condition on $\{ x_n \}$ is not very limiting for applications. The result also shows that if numerical integration is required in Eq. (3.53), it can be carried out independently of the Gaussian quadrature with $\{ x_n \}$ as the set of points. It should be noted, however, that in the collocation variational method, one may encounter a non-symmetric matrix even if the original operator is symmetric.

Some flexibility can be exercised in selecting the basis set. It can be shown that if each finite basis set can be uniformly approximated by the presently employed polynomials, then it can replace the polynomial basis in Eq. (3.53) without impacting upon the convergence result and without requiring other modifications.

The foregoing analysis was conducted with the approximation $\rho_N(\lambda)$ to $\rho(\lambda)$ obtained by the moment method with all of its points of increase being the jump discontinuities. At times, the function $\rho(\lambda)$ is known to be absolutely continuous or even continuously differentiable. Consequently, the approximation to its derivative $\dot{\rho}(\lambda)$ is a set of delta functions, which is not satisfactory. For applications, frequently a continuous approximation to $\dot{\rho}(\lambda)$ is required and the only information available is the associated moments. Since $\rho_N(\lambda) = (g, E_N(\lambda) g)$ converges to $\rho(\lambda)$ (Corollary 3.9) at all of its points of continuity, the quotients
\[
\dot{\rho}_N[\lambda'_{n-1}, \lambda'_n] = \left[ \rho_N(\lambda'_n) - \rho_N(\lambda'_{n-1}) \right] / \left[ \lambda'_n - \lambda'_{n-1} \right] \tag{3.54}
\]
are easily seen to converge to $\dot{\rho}(\lambda)$ as $N, N' \to \infty$, where the set $\{ \lambda'_n \}_{n=1}^{N'}$ is contained in the set of the points of continuity of $\rho_N(\lambda)$, such that the union of intervals $[\lambda'_{n-1}, \lambda'_n]$
covers the interval $\Omega$, and the maximum of their lengths tends to zero as $N' \to \infty$, where $\lambda$ is the point to which $[\lambda'_{n-1}, \lambda'_n]$ shrinks as $N' \to \infty$. If $\rho(\lambda)$ is absolutely continuous but not continuously differentiable, the convergence is almost everywhere. The quotient approximation is commonly used in calculations. For applications, the set $\{ \lambda'_n \}_{n=1}^{N'}$ should be contained in the set of the points of continuity of $\rho_N(\lambda)$. This does not impact upon the convergence result. We deduce a method to produce a smoother approximation.

Proposition 3.10. Let $\rho(\lambda)$ be an absolutely continuous function of $\lambda$ with its derivative $\dot{\rho}(\lambda)$ and let $\rho'_N(\lambda) = \sum_{n=1}^{N} \alpha_n \lambda^{n-1}$, with $\alpha_n$ being the elements of the $N$-vector solution $\alpha$ of $L_N \alpha = \beta$, where $L_N$ is an $N \times N$ matrix with the elements
\[
(L_N)_{nm} = \int_\Omega \lambda^{n+m-2}\, d\lambda = \mu_{n+m-2} ,
\]
and $\beta$ is the $N$-vector with elements
\[
\beta_n = \int_\Omega \lambda^{n-1}\, d\rho(\lambda) = \mu_{n-1} ; \quad n, m = 1, 2, \ldots, N .
\]
Then $\| \rho'_N - \dot{\rho} \| \to 0$ as $N \to \infty$ in $L^2[\Omega, d\lambda]$, where $\Omega$ is a finite interval in the real line.

Proof. Consider the integral equation
\[
\int_\Omega \lambda^{\eta}\, \dot{\rho}(\lambda)\, d\lambda = \nu(\eta) .
\]
The set $\{ \lambda^{n-1} \}_{n=1}^{\infty}$ forms a basis in $L^2[\Omega, d\lambda]$. The result follows from Theorem 2.9 •
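The construction in Proposition 3.10 amounts to solving one linear system built from the moments. As a minimal sketch, the Python code below assumes $\Omega = [0, 1]$ and the density $\dot{\rho}(\lambda) = 2\lambda$, so that the matrix elements are $1/(n + m - 1)$ and the right side is $\beta_n = 2/(n + 1)$. Exact rational arithmetic is used because matrices of this Hilbert type become ill-conditioned as $N$ grows; the interval, the density and the function names are assumptions of the example.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    rows = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    x = [0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(rows[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (rows[r][n] - tail) / rows[r][r]
    return x


def density_coefficients(n_terms):
    """Coefficients alpha_n of rho'_N(lambda) = sum_n alpha_n lambda^(n-1) on [0, 1]."""
    # (L_N)_{nm}: integral of lambda^(n+m-2) over [0, 1], i.e. 1/(n+m-1).
    l_matrix = [
        [Fraction(1, n + m - 1) for m in range(1, n_terms + 1)]
        for n in range(1, n_terms + 1)
    ]
    # beta_n: integral of lambda^(n-1) * 2*lambda over [0, 1], i.e. 2/(n+1).
    beta = [Fraction(2, n + 1) for n in range(1, n_terms + 1)]
    return solve(l_matrix, beta)
```

For `n_terms = 3` the solution is `(0, 2, 0)`, so the assumed density $2\lambda$ is recovered exactly once the basis contains $\lambda$. With floating-point arithmetic and larger $N$, the conditioning of $L_N$ would need attention.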
Proposition 3.10 is essentially identical to the inversion of the Laplace transform as illustrated in Example 2.2. The convergence of the integrals of $\rho'_N(\lambda)$ to $\rho(\lambda)$ follows from this result. Proposition 3.10 can be extended to include infinite domains by selecting an appropriate basis and peeling the required factor from $\dot{\rho}(\lambda)$. For example, if $\Omega = [0; \infty)$ and it is known that $\dot{\rho}(\lambda)$ decays exponentially at infinity, then one can consider the integral equation
\[
\int_\Omega \lambda^{\eta} e^{-a\lambda} \left[ e^{a\lambda} \dot{\rho}(\lambda) \right] d\lambda = \nu(\eta)
\]
with a suitable constant $a$ and use the method of Theorem 2.9 to obtain the approximations to $[e^{a\lambda} \dot{\rho}(\lambda)]$ in terms of the moments, since $\{ \lambda^{n-1} e^{-a\lambda} \}_{n=1}^{\infty}$ forms a basis in $[0; \infty)$, yielding the corresponding approximations to $\dot{\rho}(\lambda)$.

The resolvent $[z - A]^{-1}$ can be expressed as
\[
[z - A]^{-1} = z^{-1} [1 - z^{-1} A]^{-1} = z^{-1} \left( 1 + z^{-1} A [1 - z^{-1} A]^{-1} \right) ,
\]
and thus,
\[
F'(z) = z\, (g, [z - A]^{-1} g) = (g, g) + z^{-1} (g, A [1 - z^{-1} A]^{-1} g) . \tag{3.55}
\]
As indicated in Eq. (3.39), $[N, N]F'(z)$ approximants result from the $[N, N-1]$ approximants to
\[
(g, A [1 - z^{-1} A]^{-1} g) = \int \frac{\lambda\, d\rho(\lambda)}{1 - z^{-1}\lambda} = \int \frac{d\bar{\rho}(\lambda)}{1 - z^{-1}\lambda} , \tag{3.56}
\]
with
\[
\bar{\rho}(\lambda) = \int_{-\infty}^{\lambda} \lambda'\, d\rho(\lambda') ,
\]
which can be obtained by solving the algebraic equations
\[
L'_N(z)\, \alpha'(z) = \beta' \tag{3.57}
\]
with $(L'_N(z))_{nm} = (z \mu_{n+m-1} - \mu_{n+m})$ and $\beta'_n = \mu_n$, $n, m = 1, 2, \ldots, N$. The approximants $[N, N]F'(z)$ are given by
\[
[N, N]F'(z) = \left( \mu_0 + z^{-1} [N, N-1] \left[ z (F'(z) - \mu_0) \right] \right) = \left( \mu_0 + \sum_{n=1}^{N} \alpha'_n(z)\, \beta'_n \right) . \tag{3.58}
\]
In terms of the basis $\{ \phi_n = A^{n-1} g \}$, Eq. (3.57) is expressed as
\[
\sum_{m=1}^{N} \alpha_m\, (\phi_n, A [z - A] \phi_m) = (\phi_n, A g) , \quad n = 1, 2, \ldots, N . \tag{3.59}
\]
If $A$ is a positive definite operator, a subspace $H_M^A$ of $H_M$ can be constructed by introducing the scalar product $(u, \upsilon)_A = (u, A \upsilon)$ as in Theorem 3.7(i) and sec. 2.III. This reduces Eq. (3.59) to
\[
\sum_{m=1}^{N} \alpha_m\, (\phi_n, [z - A] \phi_m)_A = (\phi_n, g)_A , \quad n = 1, 2, \ldots, N . \tag{3.60}
\]
Thus, the entire analysis of $[N, N-1]F'(z)$ can be transposed on to $[N, N]F'(z)$. In the Stieltjes representation, the positive definiteness of $A$ is equivalent to the measure function in the last integral of Eq. (3.56) being non-decreasing. The same analysis applies to a negative definite operator, by replacing it by its negative. However, for a general self-adjoint operator, Eq. (3.59) or its equivalent may not even have a solution. Such spurious singularities can be encountered even if $A$ is invertible, although in that case, the modification of Corollary 3.8 and Theorem 3.7(ii) can be used to obtain a converging sequence of approximations to $F(z)$, $F'(z)$ and $\rho(\lambda)$. In view of these complications, $[N, N]$ approximants have less satisfactory convergence properties than the adjacent sequence $[N, N-1]$. However, for some operators encountered in applications, these two sequences produce opposite bounds making them equally useful, as will be shown in the next section.

3.IV. MONOTONIC CONVERGENCE

3.IV.1. Diagonal Forms

The convergence results of sec. 3.II., and thus of sec. 3.III., can be strengthened further with additional restrictions on the operator $A$, which will be assumed to be bounded below in addition to being self-adjoint. The vector $g$ will be taken to be an arbitrary vector in an arbitrary Hilbert space $H$, and $z$ will be assumed to be a real parameter. Additional conditions required will be specified. Also, by convention, the term monotonicity will be used for non-decreasing and non-increasing sequences, which at times are referred to as semi-monotonic sequences. Strict monotonicity will be emphasized whenever relevant.

In the present subsection, we obtain the approximating sequences to the diagonal elements of the resolvents obtained by solving the inhomogeneous equation
\[
(z - A) f = z (1 - z^{-1} A) f = g , \tag{3.61}
\]
where $z \ne 0$ is at a positive distance from the numerical range of $A$, implying that $(z - A)^{-1}$ exists as a bounded operator. In some cases, $z = 0$ can be included or the results can be strengthened to include it.

Lemma 3.15. Let $z$ and $A$ be such that $0 \le z^{-1} A < 1$, i.e., $z^{-1} A$ is non-negative with $\| z^{-1} A \| < 1$. Then the series $\sum_{n=0}^{\infty} z^{-n} (g, A^n g)$ converges monotonically from below to $(g, [1 - z^{-1} A]^{-1} g) = z (g, [z - A]^{-1} g)$ for each vector $g$ in $H$.
Proof. The convergence of the sequence follows from the convergence of the Neumann expansion of $[1 - z^{-1} A]^{-1}$ (Lemma 2.3). Since each term of the series is non-negative, its partial sums form a non-decreasing sequence, implying the monotonic convergence from below •

Lemma 3.16. Let $z$ and $A$ be such that $-1 < z^{-1} A \le 0$, i.e., $z^{-1} A$ is non-positive with $\| z^{-1} A \| < 1$. Then the sequences
\[
\left\{ \Delta_m = \sum_{n=0}^{2m} z^{-n} (g, A^n g) \right\} \quad \text{and} \quad \left\{ \Delta'_m = \sum_{n=0}^{2m+1} z^{-n} (g, A^n g) \right\} , \quad m = 0, 1, 2, \ldots ,
\]
converge monotonically from above and below, respectively, to $(g, [1 - z^{-1} A]^{-1} g)$ for each vector $g$ in $H$.

Proof. Again, the convergence follows from the convergence of the Neumann expansion of $[1 - z^{-1} A]^{-1}$ (Lemma 2.3). We show the monotonicity of $\Delta_m$ from above. The other case follows similarly.

Since $z^{-1} A$ is non-positive, $[-z^{-1} A]^{(2m+1)/2}$ is a well-defined non-negative operator. Also, $0 \ge (u, z^{-1} A u) \ge -(u, u)$ for each $u$. Consequently,
\[
\left( [-z^{-1} A]^{(2m+1)/2} g, \, [1 + z^{-1} A] \, [-z^{-1} A]^{(2m+1)/2} g \right) \ge 0 .
\]
Hence,
\[
(\Delta_{m+1} - \Delta_m) = - \left( [-z^{-1} A]^{(2m+1)/2} g, \, [1 + z^{-1} A] \, [-z^{-1} A]^{(2m+1)/2} g \right) \le 0 ,
\]
i.e., $\{ \Delta_m \}$ is a non-increasing sequence, implying the monotonic convergence from above •
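The behaviour of the partial sums in Lemma 3.15 and Lemma 3.16 can be observed on a finite dimensional example. The Python sketch below assumes that $A$ is diagonal with eigenvalues `eigs`, and that `weights` holds the squared components of $g$ along the eigenvectors, so that $(g, A^n g) = \sum_i w_i \lambda_i^n$. The sample values and function names are assumptions of the illustration.

```python
def partial_sums(eigs, weights, z, terms):
    """Partial sums S_M = sum_{n=0}^{M} z^{-n} (g, A^n g) for M = 0..terms-1."""
    sums, total = [], 0.0
    for n in range(terms):
        total += sum(w * (lam / z) ** n for lam, w in zip(eigs, weights))
        sums.append(total)
    return sums


def neumann_limit(eigs, weights, z):
    """(g, [1 - z^{-1} A]^{-1} g) evaluated in the eigenbasis of A."""
    return sum(w / (1 - lam / z) for lam, w in zip(eigs, weights))
```

With `eigs = [0.0, 0.5, 1.5]`, `weights = [0.2, 0.3, 0.5]` and `z = 2.0`, the operator $z^{-1}A$ is non-negative with norm below one and the sums increase towards the limit. With `z = -2.0`, the sums with an even upper index decrease and those with an odd upper index increase, bracketing the limit.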
While the numerical value of the direct applications of the Neumann expansions is limited, their convergence properties at times supplement the other arguments used to deduce the results of more practical value. In the following, we determine the convergence properties of the variationally determined approximations to Eq. (3.61), which are directly useful for computations. As earlier, $P_N$ will denote the $N$-dimensional orthoprojection converging to the identity operator and for an operator $A$ under consideration, $A_N = P_N A P_N$. The results are stated for the operators that are bounded below.

Theorem 3.11. Let $A > \kappa \ge z$. Then the variational approximations
\[
(P_N g, [z - A_N]^{-1} P_N g) = (g, f_N)
\]
converge monotonically from above to $(g, [z - A]^{-1} g) = (g, f)$.

Proof. Since the spectrum of $A_N$ is contained in the numerical range of $A_N$, which is contained in the numerical range of $A$, which is contained in the interval $[\kappa; \infty)$, we have $\| (z - A_N)^{-1} \| \le (\kappa - z)^{-1}$ from Corollary 2.4. Further, it follows from Corollary 3.2 as indicated in Remark 3.5(i), Eq. (33-c), that
\[
(g, f_{N+1} - f_N) = (f_{N+1} - f_N, [z - A](f_{N+1} - f_N)) \le 0 .
\]
In fact the lower bound to the right side is $[(z - \kappa) \| f_{N+1} - f_N \|^2]$. We have used the fact that $A$ is self-adjoint and $z$ is real, which enables one to take Eq. (3.61) for its adjoint equation as well. The inequality part implies that $(g, f_N)$ is non-increasing and the convergence follows from Theorem 3.7(i) •

Theorem 3.11 implies the following convergence result for the Padé approximants:

Corollary 3.10. Let $A \ge 0 > z$. Then the sequence $[N, N-1]F'(z)$ of the Padé approximants to
\[
F'(z) = (g, z [z - A]^{-1} g) = z \int_0^{\infty} \frac{d\rho(\lambda)}{z - \lambda}
\]
converges monotonically from below to $F'(z)$.

Proof. The result follows from the representation of Corollary 3.9 and Theorem 3.11 •
If $A$ is positive definite, $A^{-1}$ exists as a bounded operator. Then Eq. (3.61) is equivalent to
\[
(z A - A^2) f = A g , \tag{3.62}
\]
for $g$ in $D(A)$. The following results will be shown to be valid with arbitrary $g$ in $H$. Eq. (3.62) was analyzed in Eqs. (3.55) to (3.60), for its relevance to the Padé approximants. The same procedure can be used to construct a monotonically convergent sequence from above with an arbitrary basis instead of the moment basis, complementing the result of Theorem 3.11. This construction reduces Eq. (3.62) to Eq. (3.61) in $H_+$ (Theorem 3.7(i)). In the following, we use the same construction once more to deduce the same result. Arguments are essentially the same as in Theorem 3.7(i).

Complete $D(A)$ to $H_+^A$ with respect to the scalar product $(u, \upsilon)_+^A = (A u, A \upsilon)$. It is clear that $D(A)$ is included in $H_+^A$, which is included in $H_+$, which is included in $H$. In fact $H_+^A$ is identical to $D(A)$, which is complete with respect to $(u, \upsilon)_+^A$, but it is immaterial. The closure of $A^{-1}$ in $H_+^A$ will be denoted by $B$. The technicalities involving the restriction of $A^{-1}$ to $H_+^A$ are inconsequential as in Theorem 3.7(i). Therefore, they will be omitted.

Theorem 3.12. Let $A \ge \kappa > 0 \ge z$, and let $h_N = \sum_{n=1}^{N} \alpha_n \phi_n$, where $\alpha_n$ are determined by
\[
\sum_{m=1}^{N} \alpha_m\, (\phi_n, A [z - A] \phi_m) = (A \phi_n, g)
\]
with an arbitrary $g$ in $H$, and $\{ \phi_n \}$ is a basis in $H$ contained in $D(A)$, i.e., a basis in $H_+^A$. Then $\{ z^{-1} [(g, g) + (g, A h_N)] \}$ converges monotonically from below to $(g, [z - A]^{-1} g) = (g, f)$.

Proof. As in Theorem 3.7(i), $\alpha_n$ are the solutions of
\[
\sum_{m=1}^{N} \alpha_m\, (\phi_n, [1 - zB] \phi_m)_+^A = - (\phi_n, A^{-1} g)_+^A , \quad [1 - zB] > 0 .
\]
Hence from Theorem 3.11, the approximations
\[
(A^{-1} g, h_N)_+^A = \sum_{n=1}^{N} \alpha_n\, (A^{-1} g, \phi_n)_+^A = \sum_{n=1}^{N} \alpha_n\, (g, A \phi_n) = (g, A h_N)
\]
converge monotonically from below to
\[
(A^{-1} g, (1 - zB)^{-1} A^{-1} g)_+^A = (g, A (1 - zB)^{-1} A^{-1} g) = - (g, A (z - A)^{-1} g) ,
\]
which implies the result •

An equation involving a bounded below operator expressed in terms of a positive definite operator creates no additional complications in case of a general basis. However, in case of the moment basis, the construction requires some adjustment. If an addition of a positive constant is needed, then the moments for the new operator should be determined, which are linear combinations of the original moments. With the positive-definiteness of the operator, the following result complements Corollary 3.10.
Corollary 3.11. Let $A \ge \kappa > 0 > z$. Then the sequence $[N, N]F'(z)$ of the Padé approximants to
\[
F'(z) = (g, z [z - A]^{-1} g) = z \int_{\kappa}^{\infty} \frac{d\rho(\lambda)}{z - \lambda} , \quad \kappa > 0 ,
\]
converges monotonically from above to $F'(z)$.

Proof. Follows from Eqs. (3.39), Corollary 3.9 and Theorem 3.12, together with $z < 0$ •

It is clear from Corollary 3.10 and Corollary 3.11 that for a positive definite $A$ and $z < 0$, $[N, N-1]F'(z)$ and $[N, N]F'(z)$ provide monotonically convergent sequences from below and above, respectively.
3.IV.2. Eigenvalues

In general, the convergence results for the point spectrum are available only for the compact operators, as seen in sec. 3.II. In the following, we obtain convergence results for more general self-adjoint operators with the additional assumption that the lower part of the spectrum of $A$ consists entirely of the isolated eigenvalues, which has the obvious implication that $A$ is bounded from below. The eigenvalues of the operator will be denoted by $\lambda_n$, $n = 1, 2, \ldots$, arranged in the non-decreasing order counting multiplicities, with $u_n$ being the corresponding normalized eigenvectors. For a degenerate eigenvalue, an arbitrary orthonormal basis in the corresponding eigenspace can be taken to be the set of eigenvectors. If multiplicity requires separate consideration, it will be indicated. The eigenvalues and the corresponding eigenvectors of $A_N$ will be denoted by $\lambda_n^N$ and $u_n^N$ respectively, again indexed in the non-decreasing order.

For some results, an arbitrary finite dimensional subspace of the Hilbert space, contained in $D(A)$, will be required. By letting $P'_m$ be the orthoprojection on this subspace, it can be denoted by $P'_m H$. With $A'_m = P'_m A P'_m$, the eigenvalues and eigenvectors of $A'_m$ will be denoted by $\lambda'^{\,m}_n$ and $u'^{\,m}_n$, again maintaining the non-decreasing order. To clarify, $P'_m H$ is contained in $P'_{m+1} H$ as $P_m H$ is contained in $P_{m+1} H$ for each $m$, but $P'_m H$ and $P_m H$ are independent, although a member of the set $\{ P'_m H \}$ coincides with $P_m H$. To avoid stating the results for $n = 1$ separately, a conventional auxiliary vector $u_0 = 0$ will be assumed.

Lemma 3.17. The eigenvalues of $A$ are characterized by
\[
\lambda_n = \min\, (u, A u) , \quad u \text{ in } D(A) , \; (u, u_j) = 0 , \; \| u \| = 1 ; \quad j = 0, 1, \ldots, (n - 1) , \; n = 1, 2, \ldots .
\]
Proof. The result follows from the spectral theorem (Theorem 2.6), which under the indicated conditions states that
$$(u, Au) = \int_{\lambda_n - 0}^{\infty} \lambda \, d(u, E_\lambda u) \ge \lambda_n .$$
The equality is achieved for $u = u_n$ •
Lemma 3.18. Let $P'_n H$ be an arbitrary $n$-dimensional subspace of $H$. Then the eigenvalues of $A'_n = P'_n A P'_n$ provide upper bounds to the exact eigenvalues, i.e., $\lambda_j \le \lambda'_{jn}$ for each $j = 1, 2, \ldots, n$.
Proof. Let $w = \alpha_1 u'_{1n} + \alpha_2 u'_{2n} + \cdots + \alpha_j u'_{jn}$, with parameters $\alpha_l$ determined by $(w, u_l) = 0$ for $l = 1, 2, \ldots, (j-1)$. This constitutes a homogeneous set of $(j-1)$ linear equations in $j$ parameters, implying the existence of a nonzero $w$. Since $w$ is in $D(A)$ and $(w, u_l) = 0$ for $l = 1, 2, \ldots, (j-1)$, from Lemma 3.17, with $w$ normalized, we have that $\lambda_j \le (w, Aw)$. Further, $w = P'_n w$ lies in the span of the first $j$ eigenvectors of $A'_n$, which implies that
$$\lambda_j \le (w, Aw) = (P'_n w, A P'_n w) = (w, A'_n w) \le \lambda'_{jn} \ •$$
Lemma 3.19. (minimax principle) The eigenvalues of $A$ are characterized by
$$\lambda_n = \min_{P'_n H}\ \max_{u \text{ in } P'_n H} (u, Au), \quad \|u\| = 1,\ n = 1, 2, \ldots .$$
Proof. Since $\lambda'_{nn}$ is the largest eigenvalue of $A'_n$ in $P'_n H$, it follows from Lemma 3.18 that
$$\lambda_n \le \max_{u \text{ in } P'_n H} (u, Au) .$$
Since this inequality is valid for an arbitrary $P'_n H$, we also have
$$\lambda_n \le \min_{P'_n H}\ \max_{u \text{ in } P'_n H} (u, Au) .$$
However, there is a subspace, namely that spanned by $\{u_j\}_{j=1}^{n}$, in which the equality is achieved, hence the result •
The minimax principle of Lemma 3.19 characterizes the eigenvalues without referring to the exact eigenvectors, thus eliminating this unsatisfactory feature of Lemma 3.17. This principle also yields the following useful result:
Lemma 3.20. (monotonicity principle) In addition to the assumptions of Lemma 3.17 to Lemma 3.19, let $\tilde{A}$ dominate $A$, i.e., $D(A)$ contains $D(\tilde{A})$ and for each $u$ in $D(\tilde{A})$, $(u, Au) \le (u, \tilde{A}u)$. Then $\lambda_n \le \tilde{\lambda}_n$, $n = 1, 2, \ldots$, where $\lambda_n$, $\tilde{\lambda}_n$ are the ordered eigenvalues of $A$, $\tilde{A}$, respectively. The strict dominance $(u, Au) < (u, \tilde{A}u)$ implies the strict inequality $\lambda_n < \tilde{\lambda}_n$.
Proof. Let $P'_n H$ be the span of the first $n$ eigenvectors of $\tilde{A}$. Then it follows from the minimax principle (Lemma 3.19) that
$$\lambda_n \le \max_{u \text{ in } P'_n H} (u, Au) \le \max_{u \text{ in } P'_n H} (u, \tilde{A}u) = \tilde{\lambda}_n .$$
It is clear that the strict dominance implies a strict inequality •
The minimax principle also implies the bound and the monotonicity properties of the variationally determined sequences of approximations to the eigenvalues, as shown below.
Lemma 3.21. The approximations $\lambda_n^N$ constitute monotonically decreasing sequences, as $N$ increases, of upper bounds to the eigenvalues $\lambda_n$ for $n = 1, 2, \ldots, N$.
Proof. The fact that $\lambda_n \le \lambda_n^N$ follows trivially from the minimax principle or from Lemma 3.18. Further, it follows from Lemma 3.19 that
$$\lambda_n^{N+1} = \min_{P'_n P_{N+1} H}\ \max_{u \text{ in } P'_n P_{N+1} H} (u, Au), \quad n = 1, 2, \ldots, N .$$
Since the set $\{P'_n P_{N+1} H\}$ contains $\{P'_n P_N H\}$, i.e., the set of all $n$-dimensional subspaces of $P_{N+1} H$ contains the set of all $n$-dimensional subspaces of $P_N H$, and the minimum and maximum can only decrease and increase, respectively, as the size of the set increases, we have
$$\lambda_n^{N+1} \le \min_{P'_n P_N H}\ \max_{u \text{ in } P'_n P_N H} (u, Au) = \lambda_n^N \ •$$
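The bound and monotonicity statements of Lemma 3.18, Lemma 3.20 and Lemma 3.21 can be observed numerically in a finite-dimensional setting. The following is a minimal sketch, not part of the original text: a random symmetric matrix `A` stands in for the self-adjoint operator, the leading `N x N` block plays the role of $A_N = P_N A P_N$ with $P_N$ projecting onto the first `N` coordinate vectors, and `A_tilde` is an assumed dominating operator obtained by adding a positive semi-definite matrix. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 8                                   # dimension of the stand-in Hilbert space
X = rng.standard_normal((M, M))
A = (X + X.T) / 2                       # symmetric stand-in for the self-adjoint A
exact = np.linalg.eigvalsh(A)           # eigenvalues in non-decreasing order

def ritz_values(A, N):
    """Eigenvalues of A_N = P_N A P_N restricted to span{e_1, ..., e_N}."""
    return np.linalg.eigvalsh(A[:N, :N])

ritz = [ritz_values(A, N) for N in range(1, M + 1)]   # ritz[N-1] has length N

tol = 1e-10
# Lemma 3.18: lambda_n <= lambda_n^N for n = 1, ..., N.
upper_bound_ok = all(np.all(exact[:N] <= ritz[N - 1] + tol) for N in range(1, M + 1))
# Lemma 3.21: lambda_n^{N+1} <= lambda_n^N for n = 1, ..., N.
monotone_ok = all(np.all(ritz[N][:N] <= ritz[N - 1] + tol) for N in range(1, M))

# Lemma 3.20: A_tilde = A + C with C positive semi-definite dominates A.
Y = rng.standard_normal((M, 3))
A_tilde = A + Y @ Y.T
dominance_ok = bool(np.all(exact <= np.linalg.eigvalsh(A_tilde) + tol))

print(upper_bound_ok, monotone_ok, dominance_ok)
```

In this finite-dimensional setting the last approximation, with `N = M`, reproduces the exact eigenvalues; the question of whether the limits are exact in infinite dimensions is not addressed by such a check.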
Lemma 3.21 shows that the variational approximations can only improve with increasing size of the basis set, i.e.,
Corollary 3.12. With the symbols as in Lemma 3.21, we have
$$\lambda_n^N \downarrow \bar{\lambda}_n \ge \lambda_n, \quad n = 1, 2, \ldots, \ \text{as } N \to \infty .$$
Proof. From Lemma 3.21, $\{\lambda_n^N\}$ for each fixed $n$ is a non-increasing sequence bounded below by $\lambda_n$, and hence must converge to a limit $\bar{\lambda}_n \ge \lambda_n$ •
Corollary 3.12 shows the convergence of the variational approximations to the eigenvalues to some limits, but there is no guarantee that the limits are the exact eigenvalues. If the operator is also compact, the limits are the exact eigenvalues from Corollary 3.7. As shown in Theorem 3.13 below, this result holds also for operators with the lower part of the spectrum consisting only of isolated eigenvalues.
Theorem 3.13. Let $A$ be bounded below with the lower part of its spectrum consisting entirely of isolated eigenvalues, each of finite multiplicity. Then the eigenvalues $\lambda_n^N$ of $A_N$ converge monotonically from above to the corresponding eigenvalues $\lambda_n$ of $A$, and the corresponding eigenprojections $p_n^N$ of $A_N$ converge uniformly to the eigenprojections $p_n$ of $A$.
Proof. From Corollary 3.12 we have that $\lambda_n^N \downarrow \bar{\lambda}_n \ge \lambda_n$, $n = 1, 2, \ldots$ We show that if $\bar{\lambda}_{n-1} < \lambda_n \le \bar{\lambda}_n$ then $\bar{\lambda}_n = \lambda_n < \lambda_{n+1} \le \bar{\lambda}_{n+1}$ for each $n = 1, 2, \ldots$ Since the condition is satisfied for $n = 1$, the result will follow by induction for all $n$. This will imply the stated result. Part of the proof can be based on Theorem 3.3. We use arguments parallel to Lemma 3.2(i) for their transparency.
From Corollary 3.12 we have that $\bar{\lambda}_n \ge \lambda_n$. Assume that $\bar{\lambda}_n > \lambda_n$. Let $c$ be a positively oriented closed contour at a positive distance from $\bar{\lambda}_{n-1}$ and $\bar{\lambda}_n$ with $\lambda_n$ in the interior of the region enclosed by $c$. Also, let $p_n$ be the orthoprojection on the eigenspace of $A$ corresponding to $\lambda_n$ and let $u$ be an arbitrary normalized vector in the eigenspace. Then,
$$(u, p_n u) = \frac{1}{2\pi i} \oint_c dz \, (u, [z - A]^{-1} u) .$$
From Theorem 3.7(i),
$$\sup_{z \text{ in } c} \left| (u, [(z - A_N)^{-1} - (z - A)^{-1}] u) \right| \le \sup_{z \text{ in } c} \left\| [(z - A_N)^{-1} - (z - A)^{-1}] u \right\| \to 0 \ \text{ as } N \to \infty .$$
Consequently,
$$\|p_n u\|^2 = (u, p_n u) = \frac{1}{2\pi i} \lim_{N \to \infty} \oint_c dz \, (u, [z - A_N]^{-1} u) , \qquad \text{(3.63)}$$
as in Lemma 3.2(i). Alternatively, the order of the limit and integral in Eq. (3.63) can be interchanged by the Lebesgue theorem (Theorem 1.3) as the integrand is bounded by a constant by Corollary 2.4 and $dz / d|z|$ is bounded with respect to the measure $d|z|$.
Since $[z - A_N]^{-1}$ is analytic in the interior of the region enclosed by $c$ for all sufficiently large $N$, Eq. (3.63) implies that $(u, p_n u) = 0$, which is a contradiction since $(u, p_n u) = 1$ for a normalized $u$ in the eigenspace. Hence, $\bar{\lambda}_n = \lambda_n < \lambda_{n+1}$. Since the right side of Eq. (3.63) is the limit of $\|p_n^N u\|^2$, it follows that $p_n^N$ converges strongly to $p_n$, for each vector in the range of $p_n$ in particular, which is finite dimensional. Due to the ordering, $\lambda_n^N$ and $\lambda_n$ are in a one-to-one correspondence counting multiplicities. Consequently, the dimension of the range of $p_n^N$ is equal to that of $p_n$. Since this eigenspace is finite-dimensional, the convergence is uniform with respect to the norm •
The convergence result was shown to be valid by this method in Corollary 3.7 for compact operators with the assumption of uniform convergence. In the case of Theorem 3.13, a lack of uniform convergence is compensated by monotonicity and the induced order, in addition to the fact that the eigenvalues are isolated.
Converging sequences of lower bounds to the eigenvalues of $A$ can also be produced using the monotonicity principle (Lemma 3.20), if a sequence of lower bounds to $A$ can be generated. This program is followed in the method of intermediate problems (Weinstein and Stenger, 1972). In this method a base operator dominated by $A$ is constructed, which constitutes the first member of the sequence. The other members are then generated by formulating the intermediate problems. The base operator acts as a stopper, preventing the approximate eigenvalues from falling into an infinite sequence of trivial lower bounds. While a detailed consideration of this method is beyond the scope of the present text, some overlapping topics will be covered and the method will be illustrated in ch. 4 together with other methods to generate the lower bounds.
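The contour-integral representation of an eigenprojection, used in the proof of Theorem 3.13, can be checked numerically for a matrix. The sketch below is an illustration under stated assumptions rather than part of the original argument: a symmetric matrix `A` with prescribed isolated eigenvalues stands in for the operator, the contour is a circle, and the integral is approximated by the trapezoidal rule. The function name `riesz_projection` and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 6
Q, _ = np.linalg.qr(rng.standard_normal((M, M)))     # random orthogonal matrix
eigs = np.array([1.0, 2.0, 3.5, 5.0, 7.0, 9.0])      # chosen isolated eigenvalues
A = Q @ np.diag(eigs) @ Q.T                          # symmetric matrix with these eigenvalues

def riesz_projection(A, center, radius, n_points=200):
    """Trapezoidal approximation of (1/(2 pi i)) * integral of (z - A)^{-1} dz over a circle."""
    I = np.eye(A.shape[0])
    P = np.zeros_like(A, dtype=complex)
    for k in range(n_points):
        theta = 2 * np.pi * k / n_points
        z = center + radius * np.exp(1j * theta)
        dz = 1j * radius * np.exp(1j * theta) * (2 * np.pi / n_points)
        P += np.linalg.inv(z * I - A) * dz
    return P / (2j * np.pi)

n = 2                                        # 0-based index of the eigenvalue 3.5
P = riesz_projection(A, center=eigs[n], radius=0.5)
u = Q[:, n]                                  # normalized eigenvector for eigs[n]
P_exact = np.outer(u, u)                     # orthoprojection on the eigenspace
err = np.linalg.norm(P - P_exact)

# A circle enclosing no eigenvalue yields the null operator: the resolvent is analytic inside.
P_empty = riesz_projection(A, center=4.25, radius=0.5)
empty_norm = np.linalg.norm(P_empty)

print(err, empty_norm)
```

The second computation mirrors the contradiction argument of the proof: when no eigenvalue lies inside the contour, the integral vanishes.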
Chapter 4
4. PERTURBATION METHODS
4.I. PERTURBED OPERATOR
Most problems of practical interest can be expressed as an essentially solvable system modulated by a perturbation term. In mathematical terms, the question reduces to obtaining information on the perturbed operator from essentially complete information on its unperturbed part. While this premise underlies most of the developments in perturbation methods, the introduction of an external perturbation has also been found to help the analysis. Both of these approaches will be adopted in the present treatment.
The perturbative approach underlies various other methods as well. For example, in variational methods, an attempt is made to express the operator of interest as a sum of an operator of finite rank and a remainder, which is the perturbation term. Information on the unperturbed part can be obtained by algebraic methods. The unperturbed part is refined to make the perturbation small, in whatever sense possible, in an attempt to extract information on the operator of interest from the unperturbed part, in the limit. As seen in ch. 3, the program works quite satisfactorily in the case of compact operators. For other classes of operators there are limitations, i.e., even as the perturbation tends to zero in some sense, the properties of the limit of the approximating sequence may differ from those of the original operator. The method of intermediate problems shares this feature with variational methods.
In some schemes, the original perturbation term is retained and some expansion method is used. This approach was adopted in the earlier applications of perturbation methods and underlies various iterative procedures. The perturbative scheme also forms the basis of studies of the properties of one- or multi-parameter families of operators. The issue to be resolved in all methods based on the perturbative approach is the impact of the perturbation on the properties of the operator.
In this section, we collect some basic relations between the perturbed and unperturbed operators. The targeted operator $A$ will be expressed as $A = (A^0 + V)$ where $A^0$ is the unperturbed part, and $V$ is the perturbation. While some of the analysis can be extended to non-selfadjoint cases, we restrict the considerations to self-adjoint operators $A^0$ and $A$ for their practical significance, although non-selfadjoint operators will be encountered in the course of the treatment. Therefore, it is necessary to ascertain this property of the operators to be analyzed. As indicated in ch. 2, symmetry of an operator is usually easy to establish, but self-adjointness is a non-trivial problem. Some of the techniques and criteria for this purpose were introduced in ch. 2. Relative boundedness of a symmetric $V$ with respect to a self-adjoint operator is another frequently used criterion to establish the self-adjointness of a perturbed operator $A = (A^0 + V)$.
Let $A^0$ be a self-adjoint operator in a Hilbert space $H$ and let $V$ be a symmetric operator with $D(V)$ containing $D(A^0)$. If there are constants $a, b$ such that
$$\|Vu\| \le a\|u\| + b\|A^0 u\| \qquad \text{(4.1-a)}$$
for each $u$ in $D(A^0)$, then $V$ is called relatively bounded with respect to $A^0$ or just $A^0$-bounded. One can take for $b$ in Eq. (4.1-a) the greatest lower bound to all of its acceptable values, called the $A^0$-bound of $V$. If the boundedness of the sequences $\{u_n\}$ and $\{A^0 u_n\}$ implies the compactness of $\{Vu_n\}$, i.e., $\{Vu_n\}$ contains a convergent subsequence, then $V$ is called relatively compact with respect to $A^0$ or just $A^0$-compact. As shown in Lemma 4.1 below, the condition stated in Eq. (4.1-a) is equivalent to
$$\|Vu\|^2 \le a'^2\|u\|^2 + b'^2\|A^0 u\|^2 = \|(ia' - b'A^0)u\|^2 , \qquad \text{(4.1-b)}$$
with some constants $a', b'$.
Lemma 4.1. Let $A^0$ be a self-adjoint operator in a Hilbert space $H$ and let $V$ be a symmetric operator with $D(V)$ containing $D(A^0)$. Then Eq. (4.1-a) and Eq. (4.1-b) are equivalent, with the same value of the greatest lower bound to all of the values of the respective relative bounds $b$ and $b'$.
Proof. Eq. (4.1-b) implies that
$$\|Vu\|^2 \le (a'\|u\| + b'\|A^0 u\|)^2 ,$$
which implies Eq. (4.1-a) with $a, b = a', b'$. Conversely, it follows from Eq. (4.1-a) that
$$\|Vu\|^2 \le (1 + \varepsilon^{-1})a^2\|u\|^2 + (1 + \varepsilon)b^2\|A^0 u\|^2 - (\varepsilon^{-1/2}a\|u\| - \varepsilon^{1/2}b\|A^0 u\|)^2 \le (1 + \varepsilon^{-1})a^2\|u\|^2 + (1 + \varepsilon)b^2\|A^0 u\|^2 ,$$
with an arbitrary $\varepsilon > 0$. Thus, Eq. (4.1-b) is satisfied with $a'^2 = (1 + \varepsilon^{-1})a^2$ and $b'^2 = (1 + \varepsilon)b^2$. Since $b$ and $b'$ can be made arbitrarily close to each other by taking $\varepsilon$ small, the greatest lower bounds to all of their possible values are identical. The equality part in Eq. (4.1-b) follows by direct computation, which uses the symmetry of $V$ and of $A^0$ •
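The two algebraic steps used in the proof of Lemma 4.1 can be written out explicitly. In the following worked computation, the shorthand $x = \|u\|$, $y = \|A^0 u\|$ is introduced here only for brevity and is not part of the original notation; the second display uses the fact that $(u, A^0 u)$ is real for a symmetric $A^0$, so that $(iu, A^0 u)$ is purely imaginary.

```latex
\begin{align*}
(a x + b y)^2
  &= a^2 x^2 + b^2 y^2 + 2ab\,xy \\
  &= (1+\varepsilon^{-1})a^2 x^2 + (1+\varepsilon)b^2 y^2
     - \left(\varepsilon^{-1/2} a x - \varepsilon^{1/2} b y\right)^2 \\
  &\le (1+\varepsilon^{-1})a^2 x^2 + (1+\varepsilon)b^2 y^2 , \\[1ex]
\|(ia' - b'A^0)u\|^2
  &= a'^2\|u\|^2 + b'^2\|A^0 u\|^2 - 2a'b'\,\mathrm{Re}\,(iu, A^0 u) \\
  &= a'^2\|u\|^2 + b'^2\|A^0 u\|^2 .
\end{align*}
```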
Frequently, the following equivalent characterization of relative boundedness is more convenient:
Lemma 4.2. With the assumptions of Lemma 4.1, Eqs. (4.1-a, b) hold if and only if there is a $z$ in the resolvent set of $A^0$ such that $\|V(z \mp A^0)^{-1}\| \le b'$ with $b'$ being the $A^0$-bound of $V$.
Proof. We consider the case of $(z - A^0)^{-1}$. The other case follows by the same arguments. Also, Lemma 4.1 establishes essential equivalence of the two conditions and thus, a proof for one of them is sufficient.
If there is a $z$ in the resolvent set of $A^0$ such that $\|V(z - A^0)^{-1}\upsilon\| \le b'\|\upsilon\|$ for each $\upsilon$ in $H$, then by letting $(z - A^0)^{-1}\upsilon = u$, we have
$$\|Vu\| \le b'\|(z - A^0)u\| \le b'|z|\,\|u\| + b'\|A^0 u\| ,$$
implying the validity of Eq. (4.1-a). This implies the validity of Eq. (4.1-b) from Lemma 4.1. The relation is valid for all $u$ in $D(A^0)$ (Lemma 3.7), and since $D(V)$ contains $D(A^0)$, also on $D(A)$, which is the same as $D(A^0)$.
For the converse, let $\|Vu\| \le \|(ia' - b'A^0)u\|$, i.e., $\|Vu\| \le b'\|(ic - A^0)u\|$, where $c = a'/b'$. Since $A^0$ is self-adjoint and $c$ is real, the range of $(ic - A^0)$ is dense from Theorem 2.4 (iv) and $(ic - A^0)^{-1}$ exists. Hence, each $\upsilon$ in this range and $u$ are related by $(z - A^0)^{-1}\upsilon = u$ with $z = ic$. Consequently, $\|V(z - A^0)^{-1}\upsilon\| \le b'\|\upsilon\|$ on the range. By closure, the relation extends to the whole of $H$ (Theorem 2.1) •
The proof of the following characterization of self-adjointness is widely available in the literature [e.g., Kato, 1980; Theorem 4.3, ch. V].
Theorem 4.1. If a symmetric $V$ is bounded with respect to a self-adjoint operator $A^0$
with its $A^0$-bound less than one, then $A = (A^0 + V)$ is self-adjoint.
Proof. From Lemma 4.2, the assumptions imply that there is a non-real $z$ such that $\|B^-\| = \|V(z - A^0)^{-1}\| < 1$. Consequently, $(1 - B^-)^{-1}$ exists as a bounded operator, obtainable by the Neumann expansion (Lemma 2.3), with $H$ for its domain. Each $u$ is thus given by $u = (1 - B^-)(1 - B^-)^{-1}u$, and hence the range of $(1 - B^-)$ covers $H$. Now consider
$$[z - A] = [z - (A^0 + V)] = (1 - B^-)(z - A^0) .$$
Since $A^0$ is self-adjoint, the range of $(z - A^0)$ is dense, which from this equality implies that the range of $(z - A)$ is also dense. Hence, $A$ is self-adjoint from Theorem 2.4 (iii) •
While it is true that a densely defined, bounded operator admits a closure with $H$ for its domain (Theorem 2.1), this does not hold for operators that are not densely defined. The essential argument of Theorem 4.1 is that $(1 - B^-)$ is a densely defined, invertible operator.
Remark 4.1. (i) We have assumed that $\mathrm{Im}(z) \ne 0$. However, this is no limitation, for if there is a real $z$ in the resolvent set of $A^0$ satisfying the bound requirement, a non-real $z$ can be found satisfying the same condition. Further, if $\|B^{\pm}\| = \|V(z \pm A^0)^{-1}\|$ are bounded for one value of $z$, they are bounded for all $z$ in the resolvent set of $A^0$ by the analyticity of the resolvent and the fact that the two points can be enclosed in a bounded, connected region. At infinity, the operators can be seen to approach the null operator. Also, by taking the adjoints, it follows from these results that the same assertions are valid for $(z \pm A^0)^{-1}V$.
(ii) As the relative boundedness of $V$ translates into the boundedness of $V(z \pm A^0)^{-1}$ and $(z \pm A^0)^{-1}V$, it is straightforward to conclude that the relative compactness of $V$ is equivalent to the compactness of these operators for all $z$ in the resolvent set of $A^0$ by similar arguments •
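The factorization used in the proof of Theorem 4.1, and the decay of $V(z - A^0)^{-1}$ away from the real axis noted in Remark 4.1, can be illustrated with matrices. This is only a sketch under simplifying assumptions: in finite dimensions every operator is bounded and self-adjointness is automatic, so the example shows the algebra, not the domain questions. The diagonal matrix `A0`, the random symmetric `V`, and the helper `B_minus` are illustrative choices, not constructions from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 6
A0 = np.diag(np.arange(1.0, M + 1))           # stand-in for the self-adjoint A^0
X = rng.standard_normal((M, M))
V = (X + X.T) / 2                             # symmetric perturbation
A = A0 + V
I = np.eye(M)

def B_minus(z):
    """B^- = V (z - A^0)^{-1}."""
    return V @ np.linalg.inv(z * I - A0)

# The operator norm of B^- decreases as z = iy moves away from the real axis.
ys = [1.0, 10.0, 100.0, 1000.0]
norms = [np.linalg.norm(B_minus(1j * y), 2) for y in ys]

# Factorization used in the proof: z - A = (1 - B^-)(z - A^0).
z = 100j
lhs = z * I - A
rhs = (I - B_minus(z)) @ (z * I - A0)
factorization_error = np.linalg.norm(lhs - rhs)

print(norms, factorization_error)
```

Once the norm drops below one, the Neumann expansion of $(1 - B^-)^{-1}$ converges, which is the step the proof relies on.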
The results obtained above for general self-adjoint operators prove to be useful also for frequently encountered semi-bounded operators. However, the property of semi-boundedness can be exploited further. In the following, we obtain some consequent results and develop helpful techniques. It will be assumed that $A^0$ is self-adjoint, bounded below by a positive constant $\kappa'$, which may be the Friedrichs extension of a symmetric operator (Remark 2.5), and that $V$ is symmetric and form-bounded with respect to $A^0$, i.e., $D(V)$ contains $D(A^0)$ and on $D(A^0)$,
$$(u, Vu) \le a(u, u) + b(u, A^0 u) = b(u, [c + A^0]u), \quad c = a/b . \qquad \text{(4.2)}$$
The form boundedness is a weaker condition than the relative boundedness stated in Eqs. (4.1-a, b), i.e., relative boundedness implies form boundedness but the converse is not true (Kato, 1980; Theorem 1.38, ch. VI). While we have assumed $V$ to be symmetric, the results are adequate to treat sectorial forms (sec. 2.III.) by defining a sectorial $A$ by
$$A = (A^0 + iV) .$$
Let $H_+^{\mathcal{A}}$ be the completion of $D(A^0)$ with respect to the scalar product $(u, \upsilon)_+ = (u, \mathcal{A}\upsilon)$ with a positive definite $\mathcal{A}$, defining the norm by $\|u\|_+ = \|\sqrt{\mathcal{A}}\,u\|$. If $\mathcal{A} = A^0$, $H_+^{\mathcal{A}}$ will be denoted by $H_+$. Friedrichs' construction can be used to extend the results of Remark 2.5, as follows.
Lemma 4.3. If $V$ is symmetric in $H$ with $D(V)$ containing $D(A^0)$ and $(A^0)^{-1}V$ is bounded in $H_+$, then $(A^0)^{-1}V$ closes to a self-adjoint operator $B$ in $H_+$.
Proof. Since for each $u$, $\upsilon$ in $D(A^0)$,
$$(u, (A^0)^{-1}V\upsilon)_+ = (u, V\upsilon) = (Vu, \upsilon) = ((A^0)^{-1}Vu, A^0\upsilon) = ((A^0)^{-1}Vu, \upsilon)_+ ,$$
$(A^0)^{-1}V$ is symmetric in $H_+$. Since $(A^0)^{-1}V$ is bounded and $D(A^0)$ is dense in $H_+$, its closure is self-adjoint •
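The symmetry established in Lemma 4.3 has a simple matrix analogue. The sketch below assumes a real symmetric positive definite matrix `A0` as a stand-in for $A^0$ and a real symmetric `V`; it checks that $(A^0)^{-1}V$, although not symmetric in the ordinary scalar product, is symmetric with respect to $(u, \upsilon)_+ = (u, A^0\upsilon)$, and that it shares its eigenvalues with the symmetrized operator $(A^0)^{-1/2}V(A^0)^{-1/2}$. The helper `plus_product` and the specific matrices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 5
X = rng.standard_normal((M, M))
A0 = X @ X.T + M * np.eye(M)          # symmetric positive definite stand-in for A^0
Y = rng.standard_normal((M, M))
V = (Y + Y.T) / 2                     # symmetric perturbation

B = np.linalg.solve(A0, V)            # (A^0)^{-1} V, not symmetric in the usual product

def plus_product(u, v):
    """Scalar product (u, v)_+ = (u, A^0 v) for real vectors."""
    return u @ (A0 @ v)

u = rng.standard_normal(M)
v = rng.standard_normal(M)
symmetry_gap = abs(plus_product(u, B @ v) - plus_product(B @ u, v))

# Symmetrized operator (A^0)^{-1/2} V (A^0)^{-1/2}; it is similar to B.
w, Q = np.linalg.eigh(A0)
A0_inv_sqrt = Q @ np.diag(w ** -0.5) @ Q.T
S = A0_inv_sqrt @ V @ A0_inv_sqrt

eig_B = np.sort(np.linalg.eigvals(B).real)
eig_S = np.linalg.eigvalsh(S)
eig_gap = np.max(np.abs(eig_B - eig_S))

print(symmetry_gap, eig_gap)
```

The similarity $B = (A^0)^{-1/2} S (A^0)^{1/2}$ is what makes the two spectra coincide in this finite-dimensional setting.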
It was pointed out in sec. 2.III. that $H_+$ coincides with $D(\sqrt{A^0})$. Thus, $B$ of Lemma 4.3 is expected to be the closure of $(A^0)^{-1/2}V(A^0)^{-1/2}$. This result is rigorously valid as shown in Theorem 4.2 below. We obtain the results with $a = 0$ in Eq. (4.2), which are applicable by themselves, with $A^0$ replaced with other positive operators, and can be extended further including the case of non-zero $a$, which is commented upon.
Theorem 4.2. If a symmetric operator $V$ is form bounded with respect to a self-adjoint operator $A^0$ such that $(u, Vu) \le b(u, A^0 u)$, in a complex Hilbert space $H$, then there is an operator $\mathcal{V}$, both in $H_+$ and $H$, bounded by $b$, such that
$$(\upsilon, Vu) = (\upsilon, \mathcal{V}u)_+ , \quad u, \upsilon \text{ in } H_+ ,$$
equivalently,
$$(\upsilon, Vu) = (\sqrt{A^0}\,\upsilon, \mathcal{V}\sqrt{A^0}\,u) , \quad u, \upsilon \text{ in } H_+ .$$
Proof. In a complex Hilbert space $H$, we have that
$$(\upsilon, Vu) = \frac{1}{4}\left\{ ([u + \upsilon], V[u + \upsilon]) - ([u - \upsilon], V[u - \upsilon]) + i([u + i\upsilon], V[u + i\upsilon]) - i([u - i\upsilon], V[u - i\upsilon]) \right\} ,$$
which can be checked by direct computation. Consider the case when the form $(\upsilon, Vu)$ is real. Then it follows that
$$(\upsilon, Vu) = \frac{1}{4}\left\{ ([u + \upsilon], V[u + \upsilon]) - ([u - \upsilon], V[u - \upsilon]) \right\} ,$$
for the diagonal elements of the form are real. This together with $(u, Vu) \le b(u, A^0 u)$ implies that
$$|(\upsilon, Vu)| \le b\,\|\upsilon\|_+\,\|u\|_+ = b\,\|\sqrt{A^0}\,\upsilon\|\,\|\sqrt{A^0}\,u\| .$$
A non-real form, equivalently one of its arguments, can be multiplied by a constant of absolute value one without changing its absolute value, and the constant can be selected to ensure that the resulting form is real. The above estimate is therefore valid for an arbitrary form $(\upsilon, Vu)$. Thus, $(\upsilon, Vu)$ is a sesquilinear form in $H_+$ bounded by $b$. The result follows from Proposition 2.5. The arguments in $H$ are the same, where $(\upsilon, Vu)$ is considered a bounded form with the vectors $(\sqrt{A^0}\,\upsilon)$ and $(\sqrt{A^0}\,u)$ as its arguments [Kato, 1980; Lemma 3.1, ch. VI] •
Remark 4.2. It is clear from Theorem 4.2 that $\mathcal{V}$ is the closure of $(A^0)^{-1/2}V(A^0)^{-1/2}$ from $D(A^0)$ to $H_+$. Since $B$ of Lemma 4.3 and $\mathcal{V}$ define the same forms in $D(A^0)$ and $H_+$, they are essentially the same operator. This justifies peeling off $(A^0)^{-1/2}$ from one side of $B$ and inserting it on the other side to obtain $\mathcal{V}$. This results in a useful operational calculus. For example, the equations of the variational method can be converted to more suitable forms by replacing the basis $\{\phi_n\}$ by $\{\mathcal{A}^{1/2}\phi_n\}$ with an appropriate operator $\mathcal{A}$. A simple example is given by Theorem 3.7. The procedure is more versatile, particularly for more complicated equations involving products and sums of operators, where multiple Friedrichs' constructions may be required.
A non-zero $a$ can be accommodated by this argument or by considering the sesquilinear form $(\upsilon, [V - a]u)$ instead of $(\upsilon, Vu)$ in Theorem 4.2, with the same result. A one-to-one correspondence between the resolvents of the resulting operators and the original ones can be established by arguments similar to Remark 2.5 •
As indicated in Remark 4.1(ii), $A^0$-compactness of $V$ is equivalent to the compactness of $(z \pm A^0)^{-1}V$ for all $z$ in the resolvent set of $A^0$. Due to the positive definiteness of $A^0$, zero is in its resolvent set. The analyticity of $(z \pm A^0)^{-1}V$ together with the compactness of $(A^0)^{-1}V$ implies that $(A^0)^{-n}V$ are also compact. These properties imply that $(z \pm A^0)^{-1}V$ is compact for all $z$ in the resolvent set of $A^0$. Since the adjoints of compact operators are also compact, $V(A^0)^{-1}$ and the other related operators are also compact. We state Lemma 4.4 for these operators. However, the results are valid with other appropriate operators replacing $A^0$ and $V$.
Lemma 4.4. Let $(A^0)^{-1}V$ be compact in $H$. Then,
(i) for a bounded $V$, the closure of $(A^0)^{-1}V$ is compact in $H_+$;
(ii) the closure of $(A^0)^{-1/2}V(A^0)^{-1/2}$ is compact in $H_+$.
Proof. (i) If $\|u_n\|_+ \le M$, then $\|u_n\| \le M'$ from Proposition 2.6(i). Hence there is a subsequence of $\{u_n\}$, still denoted by $\{u_n\}$, assumed to be contained in $D(A^0)$, such that
$$\|(A^0)^{-1}V(u_n - u_m)\| \to 0 \ \text{ as } n, m \to \infty .$$
Consequently,
$$\|(A^0)^{-1}V(u_n - u_m)\|_+^2 = (V[u_n - u_m], (A^0)^{-1}V[u_n - u_m]) \le \|V(u_n - u_m)\|\,\|(A^0)^{-1}V(u_n - u_m)\| \to 0 \ \text{ as } n, m \to \infty ,$$
for the first term remains bounded due to the boundedness of $V$, and the second term converges to zero due to the compactness of $(A^0)^{-1}V$, both with respect to the norm in $H$. The operations are valid in $D(A^0)$. The result on $H_+$ follows by closure (Theorem 2.1, Lemma 2.4).
(ii) With the same symbols and assumptions as in (i), we have that $\|u_n\|_+ = \|\sqrt{A^0}\,u_n\| \le M$. Hence, $\{V(A^0)^{-1}\sqrt{A^0}\,u_n\} = \{V(A^0)^{-1/2}u_n\}$ contains a subsequence convergent in $H$. Now,
$$\|V(A^0)^{-1/2}(u_n - u_m)\| = \|\sqrt{A^0}\,(A^0)^{-1/2}V(A^0)^{-1/2}(u_n - u_m)\| = \|(A^0)^{-1/2}V(A^0)^{-1/2}(u_n - u_m)\|_+ \to 0 \ \text{ as } n, m \to \infty ,$$
for the left side converges to zero. The remainder of the proof is the same as in (i) •
We have not required boundedness of $V$ in (ii). Thus, the peeling off procedure of Theorem 4.2 yields a stronger result. However, the argument requires some care as it is not universally applicable.
The above results are stated for a fixed perturbation of a self-adjoint operator. However, they are directly applicable to parameter-dependent families of operators also. For example, a one-parameter family $B(z)$ can be studied by treating its value at one point $z$ as the unperturbed part and $B(z + \varepsilon)$ as the perturbed operator, taking $[B(z + \varepsilon) - B(z)]$ as the perturbation for each $\varepsilon$ in a neighborhood of zero.
The following example illustrates the application of the above results to establish the self-adjointness of a perturbed operator.
Example 4.1. (i) [Kato, 1980; pp. 301-303] Let $A^0$ be the self-adjoint realization of $-\nabla^2$ as described in sec. 2.V. and let $\hat{u}(k)$ denote the Fourier-Plancherel transform of a square integrable function $u(r)$. For a real $\alpha$, we have
$$\left( \int_{R^3} dk\, |\hat{u}(k)| \right)^2 = \left( \int_{R^3} \frac{dk}{(k^2 + \alpha^2)}\, (k^2 + \alpha^2)\, |\hat{u}(k)| \right)^2 \le \left( \int_{R^3} \frac{dk}{(k^2 + \alpha^2)^2} \right) \left( \int_{R^3} dk'\, (k'^2 + \alpha^2)^2\, |\hat{u}(k')|^2 \right) ,$$
by the Schwarz inequality (Lemma 1.4). Evaluating the first integral and using Eq. (2.24) to express the second integral, we have that
$$\left( \int_{R^3} dk\, |\hat{u}(k)| \right)^2 \le \frac{\pi^2}{\alpha}\, \|(A^0 + \alpha^2)u\|^2 ,$$
i.e., if $u$ is in $D(A^0)$, its Fourier transform is absolutely integrable, which implies that $u(r)$ is continuous and bounded from the estimate
$$|u(r)| \le (2\pi)^{-3/2} \int_{R^3} dk\, |\hat{u}(k)| \le \kappa\alpha^{-1/2}\, \|(A^0 + \alpha^2)u\| \le \kappa\left( \alpha^{-1/2}\|A^0 u\| + \alpha^{3/2}\|u\| \right) ,$$
where $\kappa$ is a constant. Consider a perturbation $V = V_0 + V_1$ where $V_0$ is a square integrable function and $V_1$ is bounded. It is easy to conclude that each bounded square integrable function is in $D(V)$ and
$$\|Vu\| \le a\|u\| + b\|A^0 u\| ,$$
with $a = (\kappa\alpha^{3/2}\|V_0\| + |||V_1|||)$ and $b = \kappa\alpha^{-1/2}\|V_0\|$, where $|||\cdot|||$ denotes the supremum norm. Since $\alpha$ can be taken to be arbitrarily large, it is clear that $V$ is $A^0$-bounded with greatest lower bound equal to zero.
From Lemma 4.2, relative boundedness with zero bound implies the existence of a $z$ such that $\|V(z \mp A^0)^{-1}\|$ can be made arbitrarily small, and $z$ can be taken to be on the negative real line, implying that $\|V(z \mp A^0)^{-1}\| \to 0$ as $z \to -\infty$. Consequently, $(z - A)^{-1}$ is bounded for sufficiently large negative $z$ from the Neumann expansion (Lemma 2.3), implying that $A$ is bounded below, using in addition the fact that $A^0$ is bounded below, by zero.
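The value $\pi^2/\alpha$ of the first integral in Example 4.1(i) can be verified numerically. The sketch below is an illustration only: it reduces the integral over $R^3$ to a radial one and applies the substitution $k = \alpha\tan\theta$, a choice made here for convenience and not taken from the text. The function name `radial_integral` is hypothetical.

```python
import math

def radial_integral(alpha, n=20000):
    """Midpoint-rule value of the integral of (k^2 + alpha^2)^{-2} over R^3.

    In spherical coordinates the integral equals
    4*pi * int_0^inf k^2 (k^2 + alpha^2)^{-2} dk, and the substitution
    k = alpha * tan(theta) turns it into
    (4*pi/alpha) * int_0^{pi/2} sin(theta)^2 d(theta).
    """
    h = (math.pi / 2) / n
    total = sum(math.sin((j + 0.5) * h) ** 2 for j in range(n))
    return 4 * math.pi * total * h / alpha

for alpha in (0.5, 1.0, 4.0):
    # Second and third columns should agree: numerical value and pi^2 / alpha.
    print(alpha, radial_integral(alpha), math.pi ** 2 / alpha)
```

The $1/\alpha$ dependence is what drives the relative bound $b = \kappa\alpha^{-1/2}\|V_0\|$ to zero as $\alpha$ grows.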
(ii) For practical situations, the condition of (i) can be relaxed further. Let the perturbation $V = V_0 + V_1$ be such that $V_0$ is a square integrable function and $V_1$ decays to zero at infinity. This includes, for example, the Coulomb potential. From Eqs. (2.27) and (2.28), we have
$$([A^0 + \alpha^2]^{-1}Vu)(r) = \frac{1}{4\pi} \int_{R^3} dr'\, \frac{\exp[-\alpha|r - r'|]}{|r - r'|}\, V(r')u(r') , \quad \alpha > 0 .$$
First consider the case $V_1 = 0$, i.e., an arbitrary square integrable $V$. By a change of variables from $\{r, r'\}$ to $\{s = (r - r'), r'\}$, we have
$$\int_{R^3 \times R^3} dr\, dr'\, \frac{\exp[-2\alpha|r - r'|]}{|r - r'|^2}\, |V(r')|^2 = \int_{R^3 \times R^3} ds\, dr'\, \frac{\exp[-2\alpha|s|]}{|s|^2}\, |V(r')|^2 < \infty .$$
This implies that $([A^0 + \alpha^2]^{-1}V)$ is a Hilbert-Schmidt and hence, a compact operator (Lemma 2.5). The required change of the order of integration is easily justified by Fubini's theorem (Theorem 1.4). Thus, $([A^0 + \alpha^2]^{-1}V)$ is compact for all square integrable potentials $V$.
Now let $V_1^N(r) = V_1(r)$ for $|r| \le N$ and zero otherwise, and let $V^N = V_0 + V_1^N$. Clearly, $V^N$ is a sequence of square integrable potentials and $\|V - V^N\| \to 0$ as $N \to \infty$. Since $[A^0 + \alpha^2]^{-1}$ is bounded, it follows that $\|[A^0 + \alpha^2]^{-1}(V - V^N)\| \to 0$ as $N \to \infty$. Consequently, $([A^0 + \alpha^2]^{-1}V)$ is compact from Lemma 2.4, and hence $V$ is $A^0$-compact.
This method together with boundedness of the functions in $D(A^0)$ can be used also to prove the result of (i). Conversely, the arguments of (i) can be used to deduce the result of (ii) •
As is the case with convergence, the smoothness properties of parameter-dependent operators can be defined with respect to the uniform operator topology as well as the strong and weak vector topologies. An operator valued function $B(z)$ of $z$ is uniformly continuous if
$\|B(z') - B(z)\| \to 0$ as $z' \to z$ and uniformly differentiable if there is an operator $\dot{B}(z)$ such that
$$\lim_{\varepsilon \to 0} \left\| \frac{B(z + \varepsilon) - B(z)}{\varepsilon} - \dot{B}(z) \right\| = 0 . \qquad \text{(4.3-a)}$$
The operator $\dot{B}(z)$ is termed the uniform derivative of $B(z)$. The operator is uniformly analytic if it is uniformly differentiable for $z$ in a region in the complex plane. Strong and weak derivatives, and analyticity are defined similarly in terms of the existence of limits
$$\lim_{\varepsilon \to 0} \left\| \frac{B(z + \varepsilon) - B(z)}{\varepsilon}\, u - \dot{B}(z)u \right\| = 0 , \qquad \text{(4.3-b)}$$
$$\lim_{\varepsilon \to 0} \left( \upsilon, \left[ \frac{B(z + \varepsilon) - B(z)}{\varepsilon} - \dot{B}(z) \right] u \right) = 0 , \qquad \text{(4.3-c)}$$
for all fixed vectors $u$, $\upsilon$. Higher order derivatives are also defined similarly. Studies involving the derivatives thus require the techniques of perturbation methods.
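The definition of the uniform derivative in Eq. (4.3-a) can be made concrete with the resolvent of a matrix, for which the derivative is known in closed form. This is a minimal sketch under stated assumptions: `A` is a random symmetric matrix standing in for a self-adjoint operator, $B(z) = (z - A)^{-1}$ is the chosen operator-valued function, and the candidate derivative $-(z - A)^{-2}$ follows from the resolvent equation. The names and step sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
M = 5
X = rng.standard_normal((M, M))
A = (X + X.T) / 2                         # symmetric stand-in for a self-adjoint operator
I = np.eye(M)

def B(z):
    """Operator-valued function B(z) = (z - A)^{-1}, defined off the spectrum."""
    return np.linalg.inv(z * I - A)

z = 0.3 + 2.0j                            # non-real, hence in the resolvent set
B_dot = -B(z) @ B(z)                      # candidate uniform derivative

# Operator-norm error of the difference quotient in Eq. (4.3-a) for shrinking steps.
steps = [1e-1, 1e-2, 1e-3, 1e-4]
errors = [np.linalg.norm((B(z + eps) - B(z)) / eps - B_dot, 2) for eps in steps]
print(errors)
```

The errors shrink roughly in proportion to the step, consistent with convergence in the operator norm.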
4.II. SPECTRAL PERTURBATION
4.II.1. Resolvent and Point Spectrum
In this and the next sub-section, we study the impact of a perturbation on the spectral properties of a self-adjoint operator $A^0$. The perturbing term will be assumed to be such that the perturbed operator $A = (A^0 + V)$ is also self-adjoint. In cases when explicit dependence on a parameter $\kappa$ is of interest, it will be included by defining $A = (A^0 + \kappa V)$. The treatment can be adjusted for more complicated parameter dependence.
The spectral properties of a self-adjoint operator are intimately connected with its resolvent. Its poles define the point spectrum and the branch cuts determine the essential spectrum. In the complement of its spectrum, the resolvent exists as a bounded operator, where the associated inhomogeneous equation has a unique solution. For the eigenvalues, the inhomogeneous equation does not have a solution for an arbitrary vector; instead the corresponding homogeneous equation has solutions. Thus, the point spectrum and the resolvent interact with each other in a significant way, which will become clearer as the topic is developed. In this subsection, we consider the impact of a perturbation on the resolvent and properties of the point spectrum, i.e., the eigenvalues and corresponding eigenprojections.
The eigenvalues, eigenvectors, eigenprojections and the spectral function of $A^0$ will be denoted by $\lambda_j^0$, $u_j^0$, $p_j^0$ and $E_\lambda^0$ correspondingly, with parallel notation $\lambda_j$, $u_j$, $p_j$ and $E_\lambda$ for $A$, whenever they exist. The following result is stated for analytic operators. Parallel weaker results with weaker smoothness properties will be obtained later, which usually require some alternative arguments.
Lemma 4.5. If $B(z)$ is a uniformly analytic function of $z$, then the eigenprojection $p(z)$ corresponding to each of its isolated eigenvalues $\xi(z)$ is uniformly analytic with the same region of analyticity. If $\xi(z)$ is a simple eigenvalue of $B(z)$, then it is also analytic. If the multiplicity $m$ of $\xi(z)$ is greater than one, then $z$ is the common point of $m$ analytic functions with derivatives $\dot{\xi}(z)$ being the solutions of the eigenvalue equation
$$p(z)\dot{B}(z)p(z) = \dot{\xi}(z)p(z) . \qquad \text{(4.4)}$$
Proof. The eigenprojection $p(z)$ corresponding to an isolated eigenvalue $\xi(z)$ is given by (Theorem 2.8)
$$p(z) = \frac{1}{2\pi i} \oint_c dz'\, [z' - B(z)]^{-1} ,$$
where $c$ is a positively oriented closed contour enclosing $\xi(z)$. Consequently,
$$\frac{1}{\varepsilon}[p(z + \varepsilon) - p(z)] = \frac{1}{2\pi i\varepsilon} \oint_c dz' \left\{ [z' - B(z + \varepsilon)]^{-1} - [z' - B(z)]^{-1} \right\}$$
$$= \frac{1}{2\pi i\varepsilon} \oint_c dz' \left\{ [z' - B(z + \varepsilon)]^{-1}[B(z + \varepsilon) - B(z)][z' - B(z)]^{-1} \right\}$$
$$\to \frac{1}{2\pi i} \oint_c dz'\, [z' - B(z)]^{-1}\dot{B}(z)[z' - B(z)]^{-1} \ \text{ as } \varepsilon \to 0 ,$$
with respect to the norm, since $\dot{B}(z)$ is the uniform derivative, which also justifies the interchange of the order of the limit and integration. We have used the resolvent equation, Eq. (2.1). The right side defines the uniform derivative $\dot{p}(z)$ of $p(z)$. It follows that
$$L(\varepsilon) = \frac{1}{\varepsilon}[B(z + \varepsilon)p(z + \varepsilon) - B(z)p(z)]$$
converges uniformly to $L(0) = [\dot{B}(z)p(z) + B(z)\dot{p}(z)]$. Hence, all the eigenvalues and the corresponding eigenprojections of $L(\varepsilon)$ converge to those of $L(0)$, from Lemma 3.2(ii), Remark 3.2(ii). This shows that the eigenvalue equation $[B(z) - \xi(z)]p(z) = 0$ can be uniformly differentiated to yield
$$[\dot{B}(z) - \dot{\xi}(z)]p(z) + [B(z) - \xi(z)]\dot{p}(z) = 0 ,$$
resulting in Eq. (4.4) with an operation of $p(z)$ from the left, for $p(z)B(z) = B(z)p(z) = \xi(z)p(z)$. From Lemma 2.2(ii), the dimension of the range of $p(z + \varepsilon)$ remains constant as $\varepsilon$ varies in a small neighborhood of zero. From Eq. (4.4), if $m = 1$, $\dot{\xi}(z) = (u(z), \dot{B}(z)u(z))$ is uniquely defined, where $u(z)$ is the normalized eigenvector of $B(z)$ corresponding to the eigenvalue $\xi(z)$. For $m > 1$, Eq. (4.4) is a matrix eigenvalue equation yielding $m$ values of $\dot{\xi}(z)$, each one being an eigenvalue of $[p(z)\dot{B}(z)p(z)]$. Existence of $\dot{\xi}(z)$ for each $z$ in the region of analyticity of $B(z)$ implies the analyticity of $\xi(z)$ •
The result stated in Eq. (4.4) is a simple example of the spectral differentiation and the Hellmann-Feynman theorem [Vatsya 2004-1], which will be considered further in sec. 4.III. It is deduced here as it is required for some results to follow. Consider the equation
\[ [z - (A^0 + \kappa V)]\,f = g. \tag{4.5} \]
The parameter $\kappa$ is usually real or introduced to keep track of the powers, which can be set equal to one after manipulations. For some arguments, complexification of $\kappa$ is more convenient. Power series expansions of the resolvents, used in the earlier studies to solve Eq. (4.5) and to approach the spectrum of $A$, usually produce slowly convergent sequences and the circle of convergence is often too small. This method is still used for large problems, where crude estimates are of some value because better approximations are difficult to obtain. In any case, the approach warrants a brief description.

As seen in the last section, the operators $(z - A^0)^{-1}V$ and $V(z - A^0)^{-1}$ are intimately connected with $A$ and its resolvent. These operators will be used in the following since the deduced results are more transparent with this approach and the constructions are applicable to realistic problems.
Lemma 4.6. Let $z$ be in the resolvent set of $A^0$ and let $\|\kappa(z - A^0)^{-1}V\| \le (1 - \varepsilon(d))$, where $\varepsilon(d) > 0$ depends only on $d = \mathrm{dist.}(z; \sigma(A^0))$. Then $z$ is in the resolvent set of $A = (A^0 + \kappa V)$ also, and the Neumann expansion of

\[ [z - (A^0 + \kappa V)]^{-1} = [1 - \kappa(z - A^0)^{-1}V]^{-1}(z - A^0)^{-1} = (z - A^0)^{-1}[1 - \kappa V(z - A^0)^{-1}]^{-1} \]

converges with respect to the norm. Furthermore,

\[ \|(z - A^0)[z - (A^0 + \kappa V)]^{-1}\| \le \frac{1}{\varepsilon(d)} \quad\text{and}\quad \|[z - (A^0 + \kappa V)]^{-1}\| \le \frac{1}{d\,\varepsilon(d)}. \]
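The statement of Lemma 4.6 can be checked numerically in a small case. The following is only a sketch under assumed data: the 2×2 matrices `A0` and `V`, the value of `kappa` and the point `z` are hypothetical choices made for illustration, with $\|\kappa(z - A^0)^{-1}V\| = 0.5 < 1$. It compares partial sums of the Neumann series $\sum_n [\kappa(z - A^0)^{-1}V]^n (z - A^0)^{-1}$ with the directly inverted resolvent.

```python
# Neumann expansion of [z - (A0 + kappa V)]^{-1} for 2x2 real matrices.
# All numerical values are illustrative assumptions, not taken from the text.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A0 = [[1.0, 0.0], [0.0, 3.0]]     # assumed unperturbed matrix
V = [[0.0, 0.5], [0.5, 0.0]]      # assumed perturbation
kappa, z = 1.0, 0.0               # z is at distance d = 1 from the spectrum of A0

R0 = inv2([[z * (i == j) - A0[i][j] for j in range(2)] for i in range(2)])  # (z - A0)^{-1}
K = [[kappa * x for x in row] for row in matmul(R0, V)]                     # kappa (z - A0)^{-1} V

def neumann_resolvent(terms):
    """Partial sum of sum_n K^n (z - A0)^{-1} with the given number of terms."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]          # K^0
    for _ in range(terms):
        term = matmul(power, R0)
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
        power = matmul(power, K)
    return total

exact = inv2([[z * (i == j) - A0[i][j] - kappa * V[i][j] for j in range(2)]
              for i in range(2)])
approx = neumann_resolvent(40)
error = max(abs(exact[i][j] - approx[i][j]) for i in range(2) for j in range(2))
print(error)
```

With these values the series converges geometrically; moving `z` closer to the spectrum of `A0` or increasing `kappa` slows the convergence and eventually violates the norm condition of the lemma.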
Proof. Follows from Lemma 2.3 and straightforward manipulations. Incidentally, the condition implies that $A$ is self-adjoint, from Theorem 4.1 •

Theorem 4.3. Further to the assumptions of Lemma 4.6, let $c$ be the circle of radius $d$ centered at an isolated eigenvalue $\lambda^0$ of $A^0$ of a finite multiplicity $m$, with corresponding eigenprojection $p^0$. Then there are $m$ eigenvalues $\lambda$ of $A$, counting multiplicities, enclosed by $c$. The eigenprojection $p(\lambda)$ and the corresponding eigenvalues $\lambda$ can be obtained as expansions in powers of $\kappa$ in the interior of the circle of radius $|\kappa|$ determined by the condition of Lemma 4.6. In the limit of $\kappa \to 0$, we have Eq. (4.4), i.e., $p^0 V p^0 = \dot\lambda(\kappa)|_{\kappa=0}\, p^0$.
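For a simple eigenvalue, the limiting relation in Theorem 4.3 says that the slope of $\lambda(\kappa)$ at $\kappa = 0$ is $(u^0, V u^0)$. A minimal sketch, assuming a hypothetical symmetric 2×2 example with $A^0 = \mathrm{diag}(1, 3)$ and a perturbation $V$ with entries $0.4$, $0.5$, $0.5$, $-0.2$ (values chosen only for illustration), compares a central finite difference of the lowest eigenvalue with this first-order coefficient.

```python
import math

def lowest_eigenvalue(kappa):
    """Lowest eigenvalue of A0 + kappa*V for the assumed symmetric 2x2 example."""
    a = 1.0 + kappa * 0.4      # entry (0, 0) of A0 + kappa V
    d = 3.0 - kappa * 0.2      # entry (1, 1)
    b = kappa * 0.5            # off-diagonal entry
    return 0.5 * (a + d) - 0.5 * math.sqrt((a - d) ** 2 + 4.0 * b * b)

h = 1e-5
slope = (lowest_eigenvalue(h) - lowest_eigenvalue(-h)) / (2.0 * h)  # numerical d(lambda)/d(kappa) at 0
first_order = 0.4   # (u0, V u0) with the unperturbed eigenvector u0 = (1, 0)
print(slope, first_order)
```

The finite difference is used here only as an independent check; in the expansion method itself the coefficient is obtained directly from $p^0 V p^0$.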
Proof. The projection is given by (Theorem 2.8)

\[
\begin{aligned}
p(\lambda) &= \frac{1}{2\pi i}\oint_c dz\,[z - (A^0 + \kappa V)]^{-1} \\
&= \frac{1}{2\pi i}\oint_c dz\,[z - A^0]^{-1}\sum_{n=0}^{\infty}[\kappa V(z - A^0)^{-1}]^n \\
&= \frac{1}{2\pi i}\oint_c dz\,[z - A^0]^{-1}Q(z),
\end{aligned}
\]

where $Q(z)$ and $[z - A^0]^{-1}Q(z)$ are both defined by convergent power series, with respect to the norm, from Lemma 4.6. Hence, the integration can be carried out term by term from the Beppo Levi theorem (Theorem 1.1), which extends to the present case by the same arguments as in the case of an absolutely bounded series. It follows that

\[ p(\lambda) = \sum_{n=0}^{\infty}\frac{\kappa^n}{2\pi i}\oint_c dz\,[z - A^0]^{-1}[V(z - A^0)^{-1}]^n \xrightarrow[\kappa\to 0]{u} p^0(\lambda^0), \]

implying the validity of the power series expansion of $p(\lambda)$ and thus its analyticity with respect to $\kappa$.
Furthermore,

\[ (A^0 + \kappa V)\,p(\lambda) = \frac{1}{2\pi i}\oint_c dz\,\{z[z - A^0]^{-1} - 1 + \kappa V(z - A^0)^{-1}\}\,Q(z) \xrightarrow[\kappa\to 0]{u} A^0 p^0(\lambda^0), \]

and the left side admits a power series expansion by the same argument as for $p(\lambda)$, in view of the fact that the term in curly brackets is bounded and linear in $\kappa$. Now, it follows from the eigenvalue equation
\[ 0 = (A^0 + \kappa V - \lambda)\,p(\lambda) = p(\lambda)(A^0 + \kappa V - \lambda)\,p(\lambda), \tag{4.6} \]

that $\lambda$ is an eigenvalue of the analytic matrix valued function $p(\lambda)(A^0 + \kappa V)p(\lambda)$ of $\kappa$ converging uniformly to $p^0(\lambda^0)A^0p^0(\lambda^0)$. Validity of the power series expansions and of Eq. (4.4) for this case follows from Lemma 4.5 •

Theorem 4.3 justifies the power series expansion of the eigenvalues and eigenprojections. In numerical applications of the expansion method, $\lambda$ and $p(\lambda)$ in Eq. (4.6) are expanded in powers of
$\kappa$, i.e.,

\[ \lambda = \lambda^0 + \kappa\lambda^{(1)} + \kappa^2\lambda^{(2)} + \dots; \qquad p(\lambda) = p^0 + \kappa p^{(1)} + \kappa^2 p^{(2)} + \dots, \]

and the coefficients are evaluated by term by term comparison. For zero order, this yields the unperturbed equation $(A^0 - \lambda^0)p^0 = 0$ and for the first order,

\[ (A^0 - \lambda^0)\,p^{(1)} = (\lambda^{(1)} - V)\,p^0. \tag{4.7} \]

Operating on Eq. (4.7) by $p^0$ from the left yields $\lambda^{(1)}p^0 = p^0Vp^0$, which is a special case of Eq. (4.4) stated in Theorem 4.3. This implies that $(\lambda^{(1)} - V)p^0 = -(1 - p^0)Vp^0$. Therefore, Eq. (4.7) has a family of solutions,

\[
\begin{aligned}
p^{(1)} &= \kappa' p^0 + \int_{-\infty}^{\lambda^0-\varepsilon}\frac{dE^0_\lambda\,(\lambda^{(1)} - V)\,p^0}{\lambda - \lambda^0} + \int_{\lambda^0+\varepsilon}^{\infty}\frac{dE^0_\lambda\,(\lambda^{(1)} - V)\,p^0}{\lambda - \lambda^0} \\
&= \kappa' p^0 - \int_{-\infty}^{\lambda^0-\varepsilon}\frac{dE^0_\lambda\,V p^0}{\lambda - \lambda^0} - \int_{\lambda^0+\varepsilon}^{\infty}\frac{dE^0_\lambda\,V p^0}{\lambda - \lambda^0},
\end{aligned}
\tag{4.8}
\]

where the interval $[(\lambda^0 - \varepsilon); (\lambda^0 + \varepsilon)]$ encloses only $\lambda^0$ of the eigenvalues of $A^0$, and $\kappa'$, a matrix, is determined by normalization as convenient.

Perturbations of finite rank, or degenerate, even with the unperturbed part being a matrix, arise in applications where the standard algebraic methods are deficient for numerical reasons or for their inability to yield some properties of interest. Such perturbations are thus of independent interest, and they also prepare the ground for more complex perturbations. In addition, an external introduction of such perturbing terms helps the analysis in some cases. Degenerate perturbations are considered next. An operator $V_\mu$ of finite rank $\mu$ can be expressed as
\[ V_\mu u = \sum_{n=1}^{\mu}\varphi_n(\psi_n, u), \tag{4.9} \]

where $\{\varphi_n\}$ constitutes a basis in the range of $V_\mu$ and $\psi_n$ are determined by the choice of $\varphi_n$. For example, if $\{\varphi_n\} = \{\phi_n\}$ is an orthonormal basis, $\psi_n$ can be expressed as

\[ \psi_n = \sum_{m=1}^{\infty}(\phi_n, V_\mu\phi_m)\,\phi_m. \]

If $\eta_n$ and $\omega_n$ are the eigenvalues and normalized eigenvectors of a self-adjoint $V_\mu$, respectively, $V_\mu$ admits the representation

\[ V_\mu u = \sum_{n=1}^{\mu}\omega_n(\psi_n, u) = \sum_{n=1}^{\mu}\eta_n\omega_n(\omega_n, u). \]

Analysis with any of the choices, all represented by Eq. (4.9), is essentially the same. Specific choices may be used for convenience for a particular application.
The resolvent $[z - A]^{-1}$ can be expressed as

\[ [z - A]^{-1} = [z - (A^0 + V_\mu)]^{-1} = [1 - (z - A^0)^{-1}V_\mu]^{-1}[z - A^0]^{-1}. \tag{4.10} \]

For each $z$ in the resolvent set of $A^0$, $[(z - A^0)^{-1}V_\mu]$ is a well-defined operator of rank $\mu$, and hence, its spectrum consists entirely of the eigenvalues, each one of a finite multiplicity. Consequently, the singularities of $[1 - (z - A^0)^{-1}V_\mu]^{-1}$ are its poles, i.e., the collection of poles of $(u, [1 - (z - A^0)^{-1}V_\mu]^{-1}\upsilon)$. Lemma 4.7 below provides a useful characterization of these poles.
Lemma 4.7. Let $V_\mu$ be as defined by Eq. (4.9), and let the matrix valued function of $z$ with elements

\[ L'_{nm}(z) = (\psi_n, [z - A^0]^{-1}\varphi_m) \tag{4.11} \]

be denoted by $L'(z)$. Then $\gamma$ in the resolvent set of $A^0$ is an eigenvalue of $A = (A^0 + V_\mu)$ if and only if it is a solution of $\xi(\gamma) = 1$, where $\xi(z)$ is an eigenvalue of $L'(z)$; equivalently, an eigenvalue of $[(z - A^0)^{-1}V_\mu]$.

Proof. The matrix $L'(z)$ is clearly an analytic function of $z$ on the resolvent set of $A^0$. From Lemma 4.5, $\xi(z)$ are also analytic.

If $\gamma$ is an eigenvalue of $A$, then there is a vector $u$ such that

\[ [(\gamma - A^0) - V_\mu]\,u = 0, \]

i.e.,

\[ u = \sum_{n=1}^{\mu}(\gamma - A^0)^{-1}\varphi_n(\psi_n, u) = \sum_{n=1}^{\mu}\alpha_n(\gamma - A^0)^{-1}\varphi_n, \]

where

\[ \alpha_n = (\psi_n, u) = \sum_{m=1}^{\mu}\alpha_m(\psi_n, [\gamma - A^0]^{-1}\varphi_m). \tag{4.12} \]

Thus, 1 is an eigenvalue of $L'(\gamma)$. Since all the eigenvalues are analytic, there must be at least one $\xi(z)$ such that $\xi(\gamma) = 1$.
Conversely, if there is an eigenvalue $\xi(z)$ such that $\xi(\gamma) = 1$, then there is a $\mu$-vector satisfying Eq. (4.12). Set

\[ u = \sum_{n=1}^{\mu}\alpha_n(\gamma - A^0)^{-1}\varphi_n. \]

Since $\alpha_n = \sum_{m=1}^{\mu}\alpha_m(\psi_n, [\gamma - A^0]^{-1}\varphi_m)$, we have that $\alpha_n = (\psi_n, u)$. It follows that

\[ (\gamma - A^0)\,u = \sum_{n=1}^{\mu}\alpha_n\varphi_n = \sum_{n=1}^{\mu}\varphi_n(\psi_n, u), \]

implying that $\gamma$ is an eigenvalue of $A$.
Equivalence of the eigenvalues of $L'(z)$ and $[(z - A^0)^{-1}V_\mu]$ follows by essentially the same argument. All one has to do is to consider

\[ \xi(z)\,u = \sum_{n=1}^{\mu}(\gamma - A^0)^{-1}\varphi_n(\psi_n, u) = \sum_{n=1}^{\mu}\alpha_n(\gamma - A^0)^{-1}\varphi_n, \]

and adjust the other arguments accordingly •

The characterization of the eigenvalues of $A = (A^0 + V_\mu)$ obtained in Lemma 4.7 is equivalent to their characterization as the zeros of the Weinstein-Aronszajn determinant, $\det[1 - L'(z)]$ [Weinstein and Stenger, 1972, pp. 82-85; Kato, 1980, pp. 244-250]. However, the present characterization is more suitable for some of the analysis to follow.

Remark 4.3. (i) The equation
$\xi(\gamma) = 1$ in Lemma 4.7 can have multiple solutions. Also, it follows from Eq. (4.10) and Lemma 4.7 that the resolvent set of $A$ includes the intersection of the resolvent set of $A^0$ and the complement of the set of these eigenvalues, $\xi(z)$. Changing the roles of $A^0$ and $A$ results in the conclusion that the resolvent set of $A^0$ includes the intersection of the resolvent set of $A$ and the complement of the set of the solutions of $\xi'(\lambda) = -1$, where $\xi'(\lambda)$ are the eigenvalues of $[(z - A)^{-1}V_\mu]$. In any case, the poles of $[z - A]^{-1}$, i.e., the eigenvalues of $A$, constitute a subset of the union of the solutions of $\xi(\lambda) = 1$ and the eigenvalues of $A^0$.

These results for the degenerate case extend easily to more general $A^0$-compact perturbations. Since $[(z - A^0)^{-1}V]$ is then compact (Remark 4.1 (ii)), $\xi(\lambda) = 1$ can have at most a finite number of solutions. Thus, each eigenvalue of $A$ with a relatively compact perturbation is isolated with at most a finite multiplicity.

(ii) Properties of the continuous spectrum are more delicate. There are quite mild perturbations, which can alter the continuous spectrum drastically. For example, according to the Weyl-von Neumann theorem (Kato, 1980, pp. 525-529) one can find a Hilbert-Schmidt perturbation with arbitrarily small Hilbert-Schmidt norm that alters a completely continuous spectrum into a completely discrete spectrum. Even this condition can be relaxed. On the other hand, various unbounded perturbations leave the continuous part of the spectrum unchanged, as will be seen in the next subsection. However, relatively compact perturbations leave the essential spectrum of the operator invariant [Kato, 1980; ch. IV, Theorem 5.35]. The union of the set of isolated eigenvalues and the essential spectrum constitutes the spectrum of an operator. It follows that the eigenvalues of $A$ with an $A^0$-compact $V$ are contained in the complement of its essential spectrum. We shall consider only the cases when the essential spectrum is absolutely continuous for the most part •

In the following, we consider the effect of the degenerate perturbations on the solutions of the inhomogeneous equations and the eigenvalues, which has a variety of applications. Consider the inhomogeneous equation
\[ [z - (A^0 + V_\mu)]\,f = g \tag{4.13} \]

for $z$ in the resolvent set of $A$. Since it is of no consequence, we have set $\kappa = 1$.
Theorem 4.4. (Capacitance method) Let $z$ be in the intersection of the resolvent sets of $A$ and $A^0$. Then

\[ f = [z - A^0]^{-1}\Big(g + \sum_{n=1}^{\mu}\alpha_n\varphi_n\Big), \]

where $\alpha_n$ are determined by

\[ \alpha_n - \sum_{m=1}^{\mu}\alpha_m(\psi_n, [z - A^0]^{-1}\varphi_m) = (\psi_n, [z - A^0]^{-1}g). \tag{4.14} \]
Proof. It follows from Eq. (4.9) and Eq. (4.13) that

\[ f = [z - A^0]^{-1}g + \sum_{n=1}^{\mu}[z - A^0]^{-1}\varphi_n(\psi_n, f). \tag{4.15} \]

Eq. (4.14) follows by setting $\alpha_n = (\psi_n, f)$, which has a unique solution from Remark 4.3 (i) •

The matrix on the left side of Eq. (4.14) is the capacitance matrix. This method has a variety of applications in inverting sparse and structured matrices, as well as in cases when $A^0$ is an operator on an infinite dimensional space. The following corollary follows trivially from Theorem 4.4.
Corollary 4.1. With the assumptions as in Theorem 4.4, and $\mu = 1$, $[z - A]^{-1}$ is given by

\[ [z - (A^0 + V_\mu)]^{-1}u = [z - A^0]^{-1}u + \frac{1}{1 - (\psi_1, [z - A^0]^{-1}\varphi_1)}\,[z - A^0]^{-1}\varphi_1\,(\psi_1, [z - A^0]^{-1}u) \; • \]
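The rank-one formula of Corollary 4.1 can be verified directly in a finite dimensional case. The sketch below assumes a hypothetical diagonal $A^0$ on a three dimensional real space, with illustrative vectors `phi` and `psi` and a point `z` chosen so that the denominator does not vanish; none of these values come from the text. It applies the formula to a vector `u` and checks that the result satisfies $[z - (A^0 + V_\mu)]x = u$.

```python
# Rank-one resolvent update of Corollary 4.1 for a diagonal A0 (illustrative data).

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

a0 = [1.0, 2.0, 4.0]      # diagonal entries of the assumed A0
phi = [1.0, 1.0, 0.0]     # V u = phi (psi, u)
psi = [0.5, 0.0, 1.0]
z = 0.5                   # lies in the resolvent sets of A0 and A for these values
u = [1.0, -2.0, 3.0]

def r0(v):
    """Apply (z - A0)^{-1} to v; trivial because A0 is diagonal."""
    return [vi / (z - ai) for vi, ai in zip(v, a0)]

def resolvent_rank_one(v):
    """[z - (A0 + phi (psi, .))]^{-1} v according to Corollary 4.1."""
    r0v, r0phi = r0(v), r0(phi)
    denom = 1.0 - dot(psi, r0phi)
    coeff = dot(psi, r0v) / denom
    return [x + coeff * y for x, y in zip(r0v, r0phi)]

x = resolvent_rank_one(u)
# Residual of (z - A0) x - phi (psi, x) - u, which should vanish up to rounding.
residual = [(z - ai) * xi - pi * dot(psi, x) - ui
            for ai, xi, pi, ui in zip(a0, x, phi, u)]
print(max(abs(r) for r in residual))
```

Only applications of the unperturbed resolvent are needed, which is the practical appeal of the formula when $(z - A^0)^{-1}$ is cheap to apply.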
In spite of its apparent simplicity, Corollary 4.1 has a number of useful applications. One of its applications is used in Corollary 4.2 below, to obtain a recursive scheme to determine $f$ of Theorem 4.4 as an alternative to solving Eq. (4.14).

Corollary 4.2. With the assumptions as in Theorem 4.4, let $A^\nu$ be defined by

\[ A^\nu u = A^0 u + \sum_{n=1}^{\nu}\varphi_n(\psi_n, u), \qquad \nu = 1, 2, \dots, \mu, \]

and let $z$ be in the resolvent sets of $A^0$ and of $A^\nu$ for each $\nu$. Then

\[ [z - A^\nu]^{-1}u = [z - A^{\nu-1}]^{-1}u + \frac{1}{1 - (\psi_\nu, [z - A^{\nu-1}]^{-1}\varphi_\nu)}\,[z - A^{\nu-1}]^{-1}\varphi_\nu\,(\psi_\nu, [z - A^{\nu-1}]^{-1}u), \qquad \nu = 1, 2, \dots, \mu. \]

Proof. Follows from Corollary 4.1 by setting $A^0 = A^{\nu-1}$ for each $\nu$ •
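The recursion of Corollary 4.2 and the capacitance matrix equation of Theorem 4.4 should produce the same solution $f$ of Eq. (4.13). A minimal sketch for $\mu = 2$, again with a hypothetical diagonal $A^0$ and illustrative vectors (all values are assumptions made for the example), solves Eq. (4.14) by Cramer's rule and compares the result with the recursively built resolvent.

```python
# Capacitance method (Theorem 4.4) versus the recursion of Corollary 4.2, mu = 2.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

a0 = [1.0, 2.0, 4.0]                          # assumed diagonal A0
phis = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]     # phi_1, phi_2 (illustrative)
psis = [[0.5, 0.0, 1.0], [1.0, 0.0, 0.5]]     # psi_1, psi_2 (illustrative)
z = 0.5
g = [1.0, -2.0, 3.0]

def r0(v):
    """Apply (z - A0)^{-1} to v."""
    return [vi / (z - ai) for vi, ai in zip(v, a0)]

def capacitance_solve(rhs):
    """Solve Eq. (4.14) for alpha and return f = (z - A0)^{-1}(rhs + sum alpha_n phi_n)."""
    c = [[(1.0 if n == m else 0.0) - dot(psis[n], r0(phis[m])) for m in range(2)]
         for n in range(2)]                   # capacitance matrix
    b = [dot(psis[n], r0(rhs)) for n in range(2)]
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    alpha = [(b[0] * c[1][1] - c[0][1] * b[1]) / det,
             (c[0][0] * b[1] - c[1][0] * b[0]) / det]
    shifted = [rhs[i] + alpha[0] * phis[0][i] + alpha[1] * phis[1][i] for i in range(3)]
    return r0(shifted)

def recursive_resolvent(v, nu):
    """Apply [z - A^nu]^{-1} to v by the recursion of Corollary 4.2."""
    if nu == 0:
        return r0(v)
    rv = recursive_resolvent(v, nu - 1)
    rphi = recursive_resolvent(phis[nu - 1], nu - 1)
    denom = 1.0 - dot(psis[nu - 1], rphi)     # must be non-zero at every stage
    coeff = dot(psis[nu - 1], rv) / denom
    return [x + coeff * y for x, y in zip(rv, rphi)]

f_cap = capacitance_solve(g)
f_rec = recursive_resolvent(g, 2)
gap = max(abs(p - q) for p, q in zip(f_cap, f_rec))
print(gap)
```

The recursion fails whenever an intermediate denominator vanishes, even if the capacitance matrix itself is invertible, which is the additional restriction discussed next.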
The recursive scheme of Corollary 4.2 requires a stronger condition than Theorem 4.4. Following the partial pivoting scheme of Lemma 4.8 below removes the additional restriction.

Lemma 4.8. (Vatsya and Tai, 1988) With the assumptions as in Corollary 4.2, if $\nu$ is the first index such that $[1 - (\psi_\nu, [z - A^{\nu-1}]^{-1}\varphi_\nu)] = 0$, then there is an index $\eta \ge (\nu + 1)$ such that $(1 - (\psi_\eta, [z - A^{\nu-1}]^{-1}\varphi_\nu)) \ne 0$, and Eq. (4.15) is equivalent to

\[ f = [z - A^{\nu-1}]^{-1}g + \sum_{n=\nu}^{\mu}[z - A^{\nu-1}]^{-1}\varphi'_n(\psi'_n, f), \]

where $\varphi'_n = \varphi_n$ for $n \ne \eta$, $\psi'_n = \psi_n$ for $n \ne \nu$, $\varphi'_\eta = (\varphi_\eta - \beta\varphi_\nu)$, $\psi'_\nu = (\psi_\nu + \beta\psi_\eta)$ with any $\beta \ne 0$, and $(1 - (\psi'_\nu, [z - A^{\nu-1}]^{-1}\varphi'_\nu)) \ne 0$.
Proof. Since $[z - A^{\nu-1}]^{-1}$ exists, $f$ is given by

\[ f = [z - A^{\nu-1}]^{-1}\Big(g + \sum_{n=\nu}^{\mu}\varphi_n(\psi_n, f)\Big), \]

which can be obtained by solving

\[ \alpha_n - \sum_{m=\nu}^{\mu}\alpha_m(\psi_n, [z - A^{\nu-1}]^{-1}\varphi_m) = (\psi_n, [z - A^{\nu-1}]^{-1}g), \tag{4.16} \]

from Theorem 4.4. If $(1 - (\psi_\eta, [z - A^{\nu-1}]^{-1}\varphi_\nu)) = 0$ for each $\eta \ge (\nu + 1)$, then all the elements in the first column, i.e., corresponding to $m = \nu$, are equal to zero, rendering the matrix non-invertible, which contradicts the fact that $[z - A]^{-1}$ exists, implying that there is an index $\eta \ge (\nu + 1)$ such that $(1 - (\psi_\eta, [z - A^{\nu-1}]^{-1}\varphi_\nu)) \ne 0$.

The solution $f$ can now be expressed as

\[
\begin{aligned}
f &= [z - A^{\nu-1}]^{-1}\Big(g + \sum_{n=\nu}^{\mu}\varphi_n(\psi_n, f) + \beta\varphi_\nu(\psi_\eta, f) - \beta\varphi_\nu(\psi_\eta, f)\Big) \\
&= [z - A^{\nu-1}]^{-1}g + \sum_{n=\nu}^{\mu}[z - A^{\nu-1}]^{-1}\varphi'_n(\psi'_n, f),
\end{aligned}
\]

with $(1 - (\psi'_\nu, [z - A^{\nu-1}]^{-1}\varphi'_\nu)) = (1 - (\psi_\eta, [z - A^{\nu-1}]^{-1}\varphi_\nu)) \ne 0$ •
We have used the representation of the resolvent given by Eq. (4.10) to develop a procedure to determine the isolated eigenvalues and the corresponding eigenprojections of $A$ in Lemma 4.7, and procedures to solve the inhomogeneous equation, Eq. (4.13), in the subsequent results above. Degenerate perturbations have also been used to obtain lower bounds to the eigenvalues of a self-adjoint operator $A = (A^0 + V)$ where the lower part of the spectrum of $A^0$ consists entirely of isolated eigenvalues and $V$ is a positive operator. Stronger results can be obtained for a strictly positive $V$. Both instances arise in practical applications. For the present, we consider the case of a strictly positive $V$ with the greatest lower bound equal to zero allowed.

Corollary 4.3. Let $V$ be a positive operator, i.e., $(u, Vu) > 0$ for each normalized vector $u$ in $H$, and let $P_N$ be the orthoprojection constructed with basis in $D(V)$. Then we have

\[ (P_N V u, [P_N V P_N]^{-1}P_N V u) \le (P_{N+1}V u, [P_{N+1}V P_{N+1}]^{-1}P_{N+1}V u) \le (u, V u), \]

for each $N$.

Proof. It follows from the assumptions that the numerical range of $V$ is contained in the open interval $(0; \infty)$. Consequently, the restriction of $[P_N V P_N]^{-1}$ to $P_N H$ exists (Remark 3.2(i)). The result follows from Eq. (3.33-c) with appropriate substitutions •
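The monotone inequality of Corollary 4.3 is easy to observe in a matrix example. The sketch below assumes a hypothetical positive definite 3×3 matrix `V`, a vector `u`, and the standard unit vectors as basis, so that $P_N V P_N$ is the leading $N \times N$ block of `V`; these choices are illustrative only. It evaluates the quadratic form $(P_N V u, [P_N V P_N]^{-1}P_N V u)$ for $N = 1, 2, 3$.

```python
# Monotone lower approximations to (u, V u) as in Corollary 4.3 (illustrative data).

def solve(G, b):
    """Solve G x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(G, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= factor * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

V = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]   # assumed positive definite V
u = [1.0, -1.0, 2.0]
Vu = [sum(V[i][j] * u[j] for j in range(3)) for i in range(3)]
exact = sum(u[i] * Vu[i] for i in range(3))                # (u, V u)

bounds = []
for N in (1, 2, 3):
    G = [row[:N] for row in V[:N]]       # P_N V P_N restricted to P_N H
    b = Vu[:N]                           # components of P_N V u
    y = solve(G, b)
    bounds.append(sum(bi * yi for bi, yi in zip(b, y)))
print(bounds, exact)
```

For $N$ equal to the full dimension the bound coincides with $(u, Vu)$, since $P_N$ is then the identity.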
Lemma 4.9. In addition to the assumptions as in Corollary 4.3, let $A^0$ be self-adjoint with the lower part of its spectrum consisting entirely of isolated eigenvalues. Further, let

\[ \hat V_N = V P_N[P_N V P_N]^{-1}P_N V = V P_N V_N^{-1}P_N V, \]

and let $\hat\lambda_{jN}$ and $\lambda_j$ denote the eigenvalues of $(A^0 + \hat V_N)$ and $(A^0 + V)$ respectively. Then $\hat\lambda_{jN} \uparrow \hat\lambda_j \le \lambda_j$ as $N \to \infty$.

Proof. The fact that $\{\hat\lambda_{jN}\}_{N=1}^{\infty}$ constitutes a non-decreasing sequence bounded above by $\lambda_j$ follows from Corollary 4.3 and the monotonicity principle (Lemma 3.20). Consequently, $\hat\lambda_{jN} \uparrow \hat\lambda_j \le \lambda_j$, as $N \to \infty$ •

The result of Lemma 4.9 has proven to be of little practical value with general basis sets due to the difficulties encountered in the evaluations of $\hat\lambda_{jN}$. However, with a properly selected basis as stated in Theorem 4.5, this result has been widely used to calculate the lower bounds.

Theorem 4.5. In addition to the assumptions of Lemma 4.9, let
$\{\varphi_n = V^{-1}u_n^0\}_{n=1}^{N}$ where $u_n^0$ are the eigenvectors of $A^0$ corresponding to the eigenvalues $\lambda_n^0$, and let $P_N$ be the orthoprojection on the space spanned by $\{\varphi_n\}_{n=1}^{N}$. Then $\hat\lambda_{jN} \le \lambda_j$ for $j = 1, 2, \dots, N$, and $\hat\lambda_{jN} = \lambda_j^0$ for $j > N$.

Proof. The degenerate operator $\hat V_N$ of Lemma 4.9 is constructed from Eq. (3.17-a) by replacing $[z - A]$ by $V$ and $g$ by $Vu$, yielding

\[ \hat V_N u = \sum_{n=1}^{N}\alpha_n(u)\,V\varphi_n = \sum_{n,m=1}^{N}Z_{nm}\,V\varphi_n(V\varphi_m, u), \]

where $Z$ is the inverse of the matrix with elements $(\varphi_n, V\varphi_m)$. With the stated basis, this reduces to

\[ \hat V_N u = \sum_{n,m=1}^{N}Z_{nm}\,u_n^0(u_m^0, u), \]

where $Z$ is the inverse of the matrix with elements $(u_n^0, V^{-1}u_m^0)$. Thus, $\hat V_N$ is a degenerate operator of rank $N$ with its range spanned by $\{u_n^0\}_{n=1}^{N}$. It is clear that $(A^0 + \hat V_N)u_j^0 = \lambda_j^0 u_j^0$, i.e., $\hat\lambda_{jN} = \lambda_j^0$ for $j > N$. The inequality follows from Lemma 4.9 •
The results of Lemma 4.9 and Theorem 4.5 are useful for obtaining the lower bounds [Weinstein and Stenger, 1972, pp. 82-85] but their convergence is not assured. Convergence can be shown to be valid with additional restrictions on $V$. Instead of obtaining varied results, this result will be proven for an example of practical importance in sec. 7.II. Introduction of an external degenerate perturbation, even of rank one, has been used to deduce significant results. We pursue this approach in the following.

Lemma 4.10. Let $\gamma$ be an isolated eigenvalue of a self-adjoint operator $T$ with corresponding eigenprojection $p$, and let $\varphi$ be a vector such that $(u, \varphi) \ne 0$ for each $u = pu$. Then $[z - T - B]^{-1}$ exists for each $z$ in some neighborhood of $\gamma$, where $B$ is defined by $B\upsilon = \varphi(\varphi, \upsilon)$.

Proof. Assume that there is an eigenvector $w$ such that

\[ (T - \gamma)\,w + \varphi(\varphi, w) = 0. \tag{4.17} \]

If $(\varphi, w) = 0$, Eq. (4.17) reduces to $(T - \gamma)w = 0$, and hence $w = pw$, which contradicts the assumption that $(u, \varphi) \ne 0$ for each $u = pu$. Therefore $(\varphi, w) \ne 0$. Taking the scalar product of Eq. (4.17) with an arbitrary $u = pu$ leads to the same contradiction, implying that $[\gamma - T - B]^{-1}$ exists. From Lemma 4.7 and Remark 4.3 ((i) and (ii)), the intersection of the resolvent set of $T$ and the spectrum of $[T + B]$ consists of an isolated set of points. Since $\gamma$ is isolated and it is not in the spectrum of $[T + B]$, the resolvent set of $[T + B]$ contains a connected neighborhood of $\gamma$ •
Lemma 4.10 reduces to a particularly simple form for a non-degenerate eigenvalue.

Corollary 4.4. With the assumptions as in Lemma 4.10, each simple eigenvalue $\gamma$ of $T$ is a solution of

\[ \xi(\gamma) = (\varphi, [T + B - \gamma]^{-1}\varphi) = 1, \]

with a corresponding eigenvector being given by $[T + B - \gamma]^{-1}\varphi$.

Proof. The operator $[T + B - \gamma]$ is invertible from Lemma 4.10. The result now follows from Lemma 4.7 by setting $A^0 = [T + B]$ and $V_\mu = -B$ •
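Corollary 4.4 can be illustrated with a small symmetric matrix. The following sketch assumes a hypothetical diagonal $T$ with simple eigenvalues 1, 3, 6 and a vector `phi` with a non-zero component along every eigenvector; both are illustrative choices. For each eigenvalue it solves $[T + B - \gamma]w = \varphi$ and checks that $(\varphi, w) = 1$ and that $w$ is an eigenvector of $T$.

```python
# Check of xi(gamma) = (phi, [T + B - gamma]^{-1} phi) = 1 at the eigenvalues of T.

def solve(G, b):
    """Solve G x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(G, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= factor * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

t = [1.0, 3.0, 6.0]          # eigenvalues of the assumed diagonal T
phi = [1.0, 2.0, -1.0]       # assumed vector, not orthogonal to any eigenvector

results = []
for k, gamma in enumerate(t):
    # Matrix of T + B - gamma with B v = phi (phi, v).
    M = [[(t[i] - gamma if i == j else 0.0) + phi[i] * phi[j] for j in range(3)]
         for i in range(3)]
    w = solve(M, phi)
    xi = sum(p * wi for p, wi in zip(phi, w))
    results.append((k, xi, w))
    print(gamma, xi, w)
```

In each case `w` is proportional to the corresponding unit vector, i.e., to the eigenvector of `T`, although $T - \gamma$ itself is singular and only the shifted operator $T + B - \gamma$ is inverted.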
The eigenvalues of $T$ are characterized as the poles of $[T - z]^{-1}$, which are sometimes difficult to approach, particularly when the eigenvalues differ substantially from each other in magnitude. In such cases, the characterization of Corollary 4.4 can be used to approximate the eigenvalues with relative ease. Also, this result can be extended in various ways. For the present, we extend it to obtain converging lower bounds to the eigenvalues of $A = (A^0 + V)$ with a non-negative $V$. It suffices to consider a finite number of eigenvalues $\{\lambda_j\}_{j=1}^{J}$ as $J$ can be arbitrarily large. The eigenvalues $\lambda_j^0$ of $A^0$ will be counted in non-decreasing order counting multiplicities, with $p_j^0 = u_j^0(u_j^0, \cdot)$ being the eigenprojection corresponding to $\lambda_j^0$. For the degenerate eigenvalues, any orthonormal set in the corresponding eigenspace can be taken for the eigenvectors, without impacting upon the analysis and results. Let

\[ \tilde A = \Big[A + \sum_{m=1}^{J}(\lambda^0_{J+1} - \lambda^0_m)\,p^0_m\Big] = \Big[A^0 + V + \sum_{m=1}^{J}(\lambda^0_{J+1} - \lambda^0_m)\,p^0_m\Big]. \]

In some situations, it is advantageous to replace $\lambda^0_{J+1}$ in the definition of $\tilde A$ by a constant $\alpha$ such that $\lambda^0_J < \alpha \le \lambda^0_{J+1}$, which leaves the analysis and results unaffected. It is clear that $\tilde A \ge \lambda_{J+1} + \varepsilon'$ with some $\varepsilon' \ge 0$, and hence, $[\tilde A - x]^{-1}$ exists on the interval $(-\infty; \lambda_{J+1} + \varepsilon' - \varepsilon]$ for $\varepsilon > 0$, with bound $\varepsilon^{-1}$. Let $L'(x)$ be the matrix valued function with elements

\[ L'_{ml}(x) = (\mathbf{u}_m^0, [\tilde A - x]^{-1}\mathbf{u}_l^0), \tag{4.18} \]

where $\mathbf{u}_m^0 = \sqrt{\lambda^0_{J+1} - \lambda^0_m}\;u_m^0$. The matrix $L'(z)$, obtained by replacing $x$ by a complex variable $z$, is analytic in some neighborhood of $(-\infty; \lambda_{J+1} + \varepsilon' - \varepsilon]$. We have

Lemma 4.11. With the matrix $L'(x)$ given by Eq. (4.18), $\gamma$ is an eigenvalue of $A$ if and only if it is the unique solution of $\xi(\gamma) = 1$ where $\xi(x)$ is an eigenvalue of the matrix $L'(x)$.
Proof. The fact that the equality $\xi(\gamma) = 1$ characterizes $\gamma$ follows from Lemma 4.7 by substituting $\tilde A$ for $A^0$ and $V_\mu u = -\sum_{m=1}^{J}\mathbf{u}_m^0(\mathbf{u}_m^0, u)$ for each $u$. From Lemma 4.5, $\xi(z)$ is analytic with $\dot\xi(z)$ given by $\dot\xi(z) = (u, [\tilde A - z]^{-2}u)$, for some non-zero vector $u$. Therefore for $x$ in $(-\infty; \lambda_{J+1} + \varepsilon' - \varepsilon]$, $\dot\xi(x) = \|[\tilde A - x]^{-1}u\|^2 \ge 0$. In fact $\dot\xi(x) > 0$, for $[\tilde A - x]^{-1}u = 0$ implies that $u = 0$. Consequently, $\xi(x)$ is a strictly increasing function. Existence and the uniqueness of the solution of $\xi(\gamma) = 1$ now follow from the fact that $L'(-\infty) = 0$ and $L'(x)$ diverges as $x$ approaches $\lambda_{J+1}$ from below •
Lemma 4.12. Let $\kappa$ be such that $\lambda_{J+1} + \varepsilon' > \kappa > \lambda_J \ge \dots \ge \lambda_1$ and let $\hat A = (\tilde A - \kappa)$. Then $\gamma$ is an eigenvalue of $A$ if and only if it is a fixed point of $\zeta(x)$, i.e., a solution of $\zeta(\gamma) = \gamma$ for $x$ in the interval $(-\infty; \kappa]$, where $\zeta(x)$ is an eigenvalue of the matrix $L(x)$ with the elements

\[ L_{ml}(x) = (\mathbf{u}_m^0, \hat A[\hat A + \kappa - x]^{-1}\mathbf{u}_l^0) - (\mathbf{u}_m^0, \mathbf{u}_l^0) + \kappa\delta_{ml}. \]

Proof. The identity $(\kappa - x)(\hat A + \kappa - x)^{-1} = [1 - \hat A(\hat A + \kappa - x)^{-1}]$ yields $[\kappa - L(x)] = (\kappa - x)L'(x)$. Thus $\zeta(x)$ and $\xi(x)$ are related by $\xi(x) = [\kappa - \zeta(x)]/[\kappa - x]$. Consequently, $\xi(\gamma) = 1$ if and only if $\zeta(\gamma) = \gamma$ as long as $\gamma < \kappa$ •
The results of Lemma 4.11 and Lemma 4.12 identify the eigenvalues $\lambda_j$ with the corresponding curves $\xi_j(x)$ and $\zeta_j(x)$. The lower bounds will be obtained by producing the appropriate approximating sequences to these curves.

Lemma 4.13. Let the matrix valued function $\hat L^N(x)$ be defined by its elements

\[ \hat L^N_{ml}(x) = (P_N\hat A\mathbf{u}_m^0, (P_N\hat A[\hat A + \kappa - x]P_N)^{-1}P_N\hat A\mathbf{u}_l^0) - (\mathbf{u}_m^0, \mathbf{u}_l^0) + \kappa\delta_{ml}, \]

with $P_N$ constructed from a basis in $D(\hat A) = D(A^0)$, and let $\hat\zeta_{jN}(x)$ be its eigenvalues. Then $\hat\zeta_{jN}(x) \uparrow \zeta_j(x)$ as $N \to \infty$, uniformly for $x$ in each closed bounded subset of the interval in its domain.
Proof. The convergence of each element $\hat L^N_{ml}(x)$ to $L_{ml}(x)$ follows from Theorem 3.7(i) for all $x$ in the specified domain. Since it is a matrix of finite dimension, this implies that $\|\hat L^N(x) - L(x)\|_J \to 0$ as $N \to \infty$, where $\|\cdot\|_J$ denotes the norm in the $J$-vector space treated as a Hilbert space. Convergence of $\hat\zeta_{jN}(x)$ to $\zeta_j(x)$ now follows from Lemma 3.2(ii).

If $\alpha_m$ are the elements of an arbitrary normalized $J$-vector $\alpha$, then

\[ (\alpha, \hat L^N(x)\alpha)_J = (P_N\hat A u, (P_N\hat A[\hat A + \kappa - x]P_N)^{-1}P_N\hat A u) - (u, u) + \kappa, \]

where $u = \sum_{m=1}^{J}\alpha_m\mathbf{u}_m^0$. From Theorem 3.12, $(\alpha, \hat L^N(x)\alpha)_J \uparrow (\alpha, L(x)\alpha)_J$ as $N \to \infty$. From the monotonicity principle (Lemma 3.20), $\hat\zeta_{jN}(x) \uparrow \zeta_j(x)$. It is straightforward to check that $\zeta_j(x)$ and $\hat\zeta_{jN}(x)$ are continuous functions of $x$. Hence, the convergence is uniform with respect to $x$ by Dini's theorem (Lemma 1.2) •
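Theorem 4.6. Let $\hat\lambda_{jN}$ be the solutions of $\hat\zeta_{jN}(\hat\lambda_{jN}) = \hat\lambda_{jN}$. We have that $\hat\lambda_{jN} \uparrow \lambda_j$ as $N \to \infty$ and the corresponding eigenprojections converge to the exact ones.

Proof. Let $\hat\xi_{jN}(x) = [\kappa - \hat\zeta_{jN}(x)]/[\kappa - x]$. It follows from Lemma 4.13 that $\hat\xi_{jN}(x) \downarrow \xi_j(x)$. Also, $\hat\zeta_{jN}(\hat\lambda_{jN}) = \hat\lambda_{jN}$ if and only if $\hat\xi_{jN}(\hat\lambda_{jN}) = 1$. Consequently, $\{\hat\lambda_{jN}\}$ is a non-decreasing sequence bounded above by $\lambda_j$. Hence, $\hat\lambda_{jN} \uparrow \hat\lambda_j \le \lambda_j$.

Assume that $\hat\lambda_j < \lambda_j$. Let $\varepsilon > 0$ be such that $\xi_j(\lambda_{J+1} + \varepsilon' - \varepsilon) > 1$, which exists since $\xi_j(x)$ diverges to $\infty$ as $x \uparrow \lambda_{J+1} + \varepsilon'$. Since $\hat\xi_{jN}(\lambda_{J+1} + \varepsilon' - \varepsilon)$ converges to $\xi_j(\lambda_{J+1} + \varepsilon' - \varepsilon)$, we have that $\hat\xi_{jN}(\lambda_{J+1} + \varepsilon' - \varepsilon) > 1$ for sufficiently large $N$. Consequently, the solutions $\hat\lambda_{jN}$ are contained in an interval $[\lambda_1; \lambda_{J+1} + \varepsilon' - \varepsilon]$.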
Consider

\[ |\xi_j(\hat\lambda_j) - 1| = |\xi_j(\hat\lambda_j) - \hat\xi_{jN}(\hat\lambda_{jN})| \le |\xi_j(\hat\lambda_j) - \xi_j(\hat\lambda_{jN})| + |\xi_j(\hat\lambda_{jN}) - \hat\xi_{jN}(\hat\lambda_{jN})|. \]

As $N \to \infty$, the first term converges to zero since $\hat\lambda_{jN} \to \hat\lambda_j$ and $\xi_j(x)$ is continuous. The second term converges to zero since the convergence of $\hat\xi_{jN}(x)$ to $\xi_j(x)$ is uniform for $x$ in $[\lambda_1; \lambda_{J+1} + \varepsilon' - \varepsilon]$ (Lemma 4.13). Therefore, by increasing $N$ it can be ensured that for each $\varepsilon'' > 0$,

\[ |\xi_j(\hat\lambda_j) - 1| = |\xi_j(\hat\lambda_j) - \hat\xi_{jN}(\hat\lambda_{jN})| < \varepsilon''. \]

Since the left side is independent of $N$, it follows by letting $N \to \infty$ that $\hat\lambda_j$ for each $j$ is a solution of $\xi_j(\hat\lambda_j) = 1$. This result can be concluded by adjusting Proposition 1.1 also. In any case, this implies that $\hat\lambda_j = \lambda_j$ from Lemma 4.11, which is a contradiction. Hence $\hat\lambda_{jN} \uparrow \lambda_j$.

Convergence of the eigenprojections was shown in the course of the proof of Lemma 4.13. For further clarification, assume that the eigenvalues are non-degenerate. The degenerate case can be treated with slight adjustments to the arguments. It is clear from Lemma 4.13 that the convergent approximating sequence of the eigenvectors is given by
\[ \hat u_{jN}(\hat\lambda_{jN}) = \big(P_N\hat A[\hat A + \kappa - \hat\lambda_{jN}]P_N\big)^{-1}\hat A\upsilon, \tag{4.19} \]

where $\upsilon = \sum_{m=1}^{J}\alpha_m\mathbf{u}_m^0$ with $\alpha_m$ being the components of the corresponding eigenvector of $\hat L^N(\hat\lambda_{jN})$ •

It follows from straightforward algebraic manipulations that

\[ P_N[\hat A - \hat\lambda_{jN}][\upsilon - \hat A\hat u_{jN}(\hat\lambda_{jN})] = 0, \]

equivalently,

\[ P_N\hat A\,\hat\omega_N = \hat\lambda_{jN}P_N\hat\omega_N, \tag{4.20} \]

with $\hat\omega_N = [\upsilon - \hat A\hat u_{jN}(\hat\lambda_{jN})]$. Both of the sequences, $\hat\omega_N$ and $\hat u_{jN}$, can be seen to converge to $u_j$. Thus, $\hat\lambda_{jN}$ is a solution of the eigenvalue equation given by Eq. (4.20). No further meaningful reduction appears feasible. In the method of Lemma 4.9 and Theorem 4.5 the approximations are the eigenvalues of $(A^0 + \hat V_N)$. Thus, the two methods are fundamentally different. The result of Theorem 4.6 is stronger in that it requires only the non-negativity of the perturbations and yields convergent sequences of lower bounds.

Remark 4.4. A method parallel to Theorem 4.6 can be developed to produce a sequence
of the upper bounds also. The upper bounds $\lambda_{jN}$ by this method are the solutions of $\xi_{jN}(\lambda_{jN}) = 1$ where $\xi_{jN}(x)$ are the eigenvalues of the matrix with elements

\[ L^N_{ml}(x) = (P_N\mathbf{u}_m^0, [\tilde A_N - x]^{-1}P_N\mathbf{u}_l^0). \tag{4.21} \]

However, these approximations reduce to the same as by the variational method applied directly to $A$, i.e., $\lambda_{jN}$ are the eigenvalues of $A_N$, as shown below.

Let $\alpha$ be the normalized eigenvector of $L^N(\lambda_{jN})$ corresponding to the eigenvalue $\xi_{jN}(\lambda_{jN})$, and let $u_j^N = [\tilde A_N - \lambda_{jN}]^{-1}w$, with $w = \sum_{m=1}^{J}\alpha_m P_N\mathbf{u}_m^0$. It is clear that $P_N w = w$ and $P_N u_j^N = u_j^N$. Since $\xi_{jN}(\lambda_{jN}) = 1$ is equivalent to $L^N(\lambda_{jN})\alpha = \alpha$, it follows from Eq. (4.21) that $(P_N\mathbf{u}_m^0, u_j^N) = \alpha_m$, which yields

\[
\begin{aligned}
w = [\tilde A_N - \lambda_{jN}]\,u_j^N &= [A_N - \lambda_{jN}]\,u_j^N + \sum_{m=1}^{J}P_N\mathbf{u}_m^0(\mathbf{u}_m^0, P_N u_j^N) \\
&= [A_N - \lambda_{jN}]\,u_j^N + w,
\end{aligned}
\]

i.e., $[A_N - \lambda_{jN}]u_j^N = 0$, implying the result, from Theorem 3.2. The convergence of the bounds $\lambda_{jN} \downarrow \lambda_j$, and of the corresponding eigenprojections, follows as for the lower bounds case in Lemma 4.13 and Theorem 4.6.

The approximate equations for the upper and lower bounds differ essentially by the representations of the eigenvectors. While $P_N u_j^N = u_j^N$, which is the approximate eigenvector corresponding to the upper bounds, the eigenvector $\hat\omega_N$ yielding the lower bounds has non-vanishing components both in $P_N H$ and $(1 - P_N)H$. In addition, $\hat\omega_N$ has a representation $\hat\omega_N = (\upsilon - \hat\omega'_N)$ •

So far we have considered the isolated eigenvalues with the further property that they are contained in an interval below the rest of the spectrum. The projection on the eigenspace corresponding to an isolated eigenvalue is given by the integral of the resolvent along a closed contour enclosing the eigenvalue, a procedure used repeatedly. In general, the spectral function on an interval can be determined from Theorem 2.8. However, this does not isolate the projection corresponding to an eigenvalue embedded in the continuous spectrum. Neither of the procedures can be used for the eigenvalues that are not isolated. In realistic examples, e.g., the quantum mechanical models of atomic systems, the set of eigenvalues can have a limit point in the continuous part, and eigenvalues can be embedded in the continuum. An alternative criterion to characterize the eigenprojection corresponding to an arbitrary eigenvalue is given by the mean ergodic theorem below.

Lemma 4.14. (mean ergodic theorem) The orthoprojection $p(\lambda)$ on the eigenspace
corresponding to an eigenvalue $\lambda$ of a self-adjoint operator $A$ is given by

\[ p(\lambda) = s\text{-}\lim_{t\to\infty}\frac{1}{t}\int_0^t e^{i(A - \lambda)t'}\,dt'. \tag{4.22} \]

Proof. Without loss of generality, set $\lambda = 0$, as it means only to replace $A$ by $(A - \lambda)$. We show the validity of the result for each $u$ in the null space of $A$ and in the range of $A$, which will be shown to be dense in $H$. The result in $H$ will follow from the fact that Eq. (4.22) defines a bounded operator (Theorem 2.1). For $u$ in the null space of $A$, we have that $Au = 0$ and hence, $e^{iAt'}u = u$, yielding

\[ p(0)\,u = \frac{1}{t}\int_0^t u\,dt' = u, \]

which implies the result for all such vectors. For each $u$ in the range of $A$, $p(0)u = u$ or $p(0)u = 0$. If $p(0)u = u$, then $u$ is in the null space of $A$. For $u$ in its orthogonal complement, there is a $\upsilon \ne 0$ in the domain of $A$ such that $u = A\upsilon$ and thus

\[ \frac{1}{t}\int_0^t e^{iAt'}u\,dt' = \frac{1}{t}\int_0^t e^{iAt'}A\upsilon\,dt' = -\frac{i}{t}\int_0^t\frac{d}{dt'}e^{iAt'}\upsilon\,dt' = -\frac{i}{t}[e^{iAt} - 1]\,\upsilon, \]

from Eq. (2.6). Since $p(0)u = 0$, it follows that

\[ \Big\|\frac{1}{t}\int_0^t e^{iAt'}u\,dt' - p(0)\,u\Big\| = \frac{1}{t}\|(e^{iAt} - 1)\upsilon\| \le \frac{2}{t}\|\upsilon\| \xrightarrow[t\to\infty]{} 0. \]

To show that the above set is dense, consider $(w, A\upsilon)$ where $w$ is a non-zero vector and $\upsilon$ varies over the domain of $A$. If $(w, A\upsilon) = 0$ then $w$ is in the domain of $\hat A$ and since $\upsilon$ varies over a dense set, $\hat A w = Aw = 0$, for $\hat A = A$. Thus, the only vectors orthogonal to the range of $A$ are its null vectors. Consequently the set of vectors including the null vectors and the range of $A$ is dense •
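In a finite dimensional case the time average in Eq. (4.22) can be evaluated in closed form, which makes the mechanism of the lemma visible. The sketch below assumes a hypothetical diagonal $A$ with the eigenvalues 1, 2, 2, 5 (illustrative values); on each eigenvector the average is exactly 1 when the eigenvalue equals $\lambda$ and decays like $1/t$ otherwise, so the averages converge to the diagonal of $p(\lambda)$.

```python
import cmath

def time_average(a, lam, t):
    """(1/t) * integral over [0, t] of exp(i (a - lam) t') dt' for a scalar eigenvalue a."""
    delta = a - lam
    if delta == 0.0:
        return 1.0 + 0.0j
    return (cmath.exp(1j * delta * t) - 1.0) / (1j * delta * t)

eigenvalues = [1.0, 2.0, 2.0, 5.0]   # assumed spectrum of a diagonal A
lam = 2.0                            # eigenvalue whose projection is sought

for t in (10.0, 1e3, 1e5):
    diagonal = [time_average(a, lam, t) for a in eigenvalues]
    print(t, [round(abs(d), 6) for d in diagonal])

final = [time_average(a, lam, 1e5) for a in eigenvalues]
```

The bound $2/(|a - \lambda|\,t)$ on the off-resonant entries is the scalar form of the estimate used in the proof.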
4.II.2. Continuous Spectrum Now we turn our attention to the continuous spectrum of a perturbed self-adjoint − i At
, termed the propagator, used to operator. One parameter family of unitary operators e characterize the projection on an eigenstate in the ergodic theorem, basically determines the evolution of a state with respect to t under the influence of A . If the spectral function is known, the propagator for a self-adjoint operator can be determined by the spectral theorem. Alternatively, the power series expansion can be used. In general, the power series expansion
154
S. Raj Vatsya
of an exponential is defined only on the vectors in the domains of all positive powers of $A$, which shrinks with increase in power. However, for self-adjoint operators, this set is dense and thus, the series expansion is valid. Another way to determine the propagator is to use the basic result from the theory of Lie groups, which this family constitutes. A group element can be determined from its Lie algebra element, $(-iA)$ in this case, termed the generator. The method is based on the definition

\[ e^{x} = \lim_{n\to\infty}\Big(1 + \frac{x}{n}\Big)^{n}, \tag{4.23} \]
i.e., the group element can be generated by repeated operations of the group elements in an infinitesimal neighborhood of the identity. This definition also runs into difficulty in some cases. Replacing $n$ by $-n$ in Eq. (4.23) provides a more satisfactory procedure with the same result.

In the framework of perturbation methods, the propagator is expressed in terms of the unperturbed propagator $e^{-iA^0t}$. To this end, define the family

\[ \Omega(t) = e^{iAt}e^{-iA^0t}, \quad -\infty < t < \infty, \tag{4.24} \]

which determines $e^{-iAt}$ by $e^{-iAt} = e^{-iA^0t}\,\hat{\Omega}(t)$. Since $u(t) = e^{-iAt}u(0)$ determines

\[ \upsilon(t) = e^{iA^0t}u(t) = \hat{\Omega}(t)u(0) = \hat{\Omega}(t)\upsilon(0), \]

for all $t$, the family $\hat{\Omega}(t)$ constitutes a Lie group. Its generator is determined by its strong derivative given by

\[ \text{s-}\lim_{\varepsilon\to0}\frac{1}{\varepsilon}\big[\hat{\Omega}(t+\varepsilon) - \hat{\Omega}(t)\big] = -i\,e^{iA^0t}\,V\,e^{-iA^0t}\,\hat{\Omega}(t) = -iW(t)\,\hat{\Omega}(t), \]

yielding $[-iW(t)]$ as its generator. The element $\hat{\Omega}(t)$ is now determined by the following strong integration:

\[ \hat{\Omega}(t) = 1 - i\int_0^t dt_1\,W(t_1)\,\hat{\Omega}(t_1), \tag{4.25} \]

with $\hat{\Omega}(0) = 1$. Iteration of Eq. (4.25), which is equivalent to an application of Eq. (4.23), results in
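\[ \begin{aligned} \hat{\Omega}(t) &= 1 - i\int_0^t dt_1\,W(t_1) + \frac{(-i)^2}{2!}\int_0^t dt_1\int_0^t dt_2\,P[W(t_1)W(t_2)] + \dots \\ &\quad + \frac{(-i)^n}{n!}\int_0^t dt_1\int_0^t dt_2\cdots\int_0^t dt_n\,P[W(t_1)W(t_2)\cdots W(t_n)] + \dots \\ &= \sum_{n=0}^{\infty}\frac{(-i)^n}{n!}\int_0^t dt_1\int_0^t dt_2\cdots\int_0^t dt_n\,P[W(t_1)W(t_2)\cdots W(t_n)], \end{aligned} \tag{4.26} \]

where

\[ P[W(t_1)W(t_2)\cdots W(t_n)] = W(t_{n_1})W(t_{n_2})\cdots W(t_{n_n}), \quad t_{n_1} \ge t_{n_2} \ge \dots \ge t_{n_{n-1}} \ge t_{n_n}, \]

is the time ordered product. The iteration itself produces the integrals over the ordered region $t \ge t_1 \ge t_2 \ge \dots \ge t_n \ge 0$ without the factor $1/n!$; since $P$ orders the factors, the integral over the full range $[0,t]$ in each variable is $n!$ times the ordered one, which accounts for the factor $1/n!$ in Eq. (4.26). The expansion of Eq. (4.26) is the time ordered exponential denoted by

\[ \hat{\Omega}(t) = P\exp\Big[-i\int_0^t dt'\,W(t')\Big]. \tag{4.27} \]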
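The relation between Eq. (4.25) and the time ordered exponential of Eq. (4.27) can be illustrated numerically. The sketch below is a hypothetical 2×2 model, not taken from the text: `A0_DIAG`, `V` and every function name are assumptions. It compares $\hat{\Omega}(t) = e^{iA^0t}e^{-iAt}$ with the ordered product of the near-identity factors $[1 - iW(t_k)\,\Delta t]$, later times on the left, which is the Euler discretization of Eq. (4.25).

```python
import cmath

# Hypothetical 2x2 model: A0 diagonal, V a real symmetric coupling, A = A0 + V.
A0_DIAG = (0.0, 1.0)
V = [[0.0, 0.5], [0.5, 0.0]]
A = [[A0_DIAG[0] + V[0][0], V[0][1]], [V[1][0], A0_DIAG[1] + V[1][1]]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def expm(a, terms=40):
    """Matrix exponential by its power series; adequate for the small norms used here."""
    result, term = IDENTITY, IDENTITY
    for n in range(1, terms):
        term = scale(1.0 / n, matmul(term, a))
        result = add(result, term)
    return result


def free_phase(t):
    """exp(i A0 t) for the diagonal A0."""
    return [[cmath.exp(1j * A0_DIAG[0] * t), 0.0], [0.0, cmath.exp(1j * A0_DIAG[1] * t)]]


def w(t):
    """W(t) = exp(i A0 t) V exp(-i A0 t)."""
    return matmul(matmul(free_phase(t), V), free_phase(-t))


def omega_hat_exact(t):
    """Omega_hat(t) = exp(i A0 t) exp(-i A t)."""
    return matmul(free_phase(t), expm(scale(-1j * t, A)))


def omega_hat_ordered(t, steps):
    """Ordered product of (1 - i W(t_k) dt), later times acting on the left."""
    dt = t / steps
    result = IDENTITY
    for k in range(steps):
        factor = add(IDENTITY, scale(-1j * dt, w(k * dt)))
        result = matmul(factor, result)
    return result


def max_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


exact = omega_hat_exact(1.0)
for steps in (200, 2000):
    print(steps, max_diff(omega_hat_ordered(1.0, steps), exact))
```

The discrepancy shrinks roughly in proportion to the step size, as expected of a first order scheme; the ordering of the factors matters because $W(t)$ at different times need not commute.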
Eq. (4.27) forms the basis of the time dependent perturbation theory with a range of applications. In some numerical computations, it is more convenient to use Eq. (4.23), or its approximation $e^{n\varepsilon'x} \approx (1+\varepsilon'x)^n$ with small $\varepsilon'$ and arbitrary $n$, for an approximate evaluation of the infinitesimal group element for small $t$, i.e., the first few terms of Eq. (4.26). The propagator accurate up to the second order is given by

\[ e^{-2iA\varepsilon} = e^{-2iA^0\varepsilon}\,\hat{\Omega}(2\varepsilon) = e^{-iA^0\varepsilon}\,e^{-2i\varepsilon V}\,e^{-iA^0\varepsilon} + O(\varepsilon^3). \tag{4.28} \]

As can be seen by direct expansion, the last approximation of Eq. (4.28) is more satisfactory than various others, e.g.,

\[ e^{-2iA\varepsilon} \approx e^{-2iA^0\varepsilon}\,e^{-2i\varepsilon V} = \wp_1(2\varepsilon) \approx e^{-2i\varepsilon V}\,e^{-2iA^0\varepsilon} = \wp_2(2\varepsilon), \tag{4.29} \]

which are accurate up to the first order in $\varepsilon$. Eq. (4.28) can be expressed in terms of Eq. (4.29) as

\[ e^{-2iA\varepsilon} \approx e^{-iA^0\varepsilon}\,e^{-2i\varepsilon V}\,e^{-iA^0\varepsilon} = \wp_1(\varepsilon)\,\wp_2(\varepsilon). \tag{4.30} \]

Example 4.2. Let $A^0$ be the self-adjoint realization of $(-\nabla^2)$ and $A = A^0 + V$ with $V$ being a function such that $A$ defines a self-adjoint operator in $H = L^2(R^3, dr)$, which
can be ensured using the results in sec. 4.I. The propagator for $A$ can be evaluated using Eq. (2.24) with the Fourier transform as the unitary operator, by the same method as Green's function and the propagator for $A^0$ were evaluated in sec. 2.V. For small time intervals, the procedure can be applied to Eq. (4.28) and to Eq. (4.29). The corresponding approximations can be evaluated from the values of $\wp_1(\varepsilon)$ and $\wp_2(\varepsilon)$ as indicated in Eq. (4.30), which are given by

\[ \begin{aligned} [e^{-iA\varepsilon}u](r) \approx [\wp_1(\varepsilon)u](r) &= \frac{1}{(2\pi)^3}\int_{R^3}dk\,e^{ik\cdot r - i\varepsilon k^2}\int_{R^3}dr'\,e^{-ik\cdot r' - i\varepsilon V(r')}\,u(r';0) \\ \approx [\wp_2(\varepsilon)u](r) &= \frac{1}{(2\pi)^3}\,e^{-i\varepsilon V(r)}\int_{R^3}dk\,e^{ik\cdot r - i\varepsilon k^2}\int_{R^3}dr'\,e^{-ik\cdot r'}\,u(r';0). \end{aligned} \tag{4.31} \]

Alternative expressions are obtained by using the free propagator given by Eq. (2.32) as follows.

\[ \begin{aligned} [e^{-iA\varepsilon}u(0)](r) \approx [\wp_1(\varepsilon)u](r) &= \frac{1}{(4\pi i\varepsilon)^{3/2}}\int_{R^3}dr'\,\exp\Big[\frac{i(r-r')^2}{4\varepsilon}\Big]\,e^{-i\varepsilon V(r')}\,u(r';0) \\ \approx [\wp_2(\varepsilon)u](r) &= \frac{1}{(4\pi i\varepsilon)^{3/2}}\,e^{-i\varepsilon V(r)}\int_{R^3}dr'\,\exp\Big[\frac{i(r-r')^2}{4\varepsilon}\Big]\,u(r';0)\ \bullet \end{aligned} \tag{4.32} \]
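The orders of accuracy claimed for Eqs. (4.28) to (4.30) can be checked on a small matrix model. This is a sketch only: the 2×2 matrices `A0` and `V` and all function names are hypothetical, chosen so that $A^0$ and $V$ do not commute. Halving $\varepsilon$ should reduce the error of $\wp_1(2\varepsilon)$ by a factor near 4 (local error of order $\varepsilon^2$) and the error of the symmetric product by a factor near 8 (local error of order $\varepsilon^3$).

```python
# Hypothetical non-commuting 2x2 pair; A = A0 + V.
A0 = [[0.0, 0.0], [0.0, 1.0]]
V = [[0.0, 0.5], [0.5, 0.0]]
A = [[A0[i][j] + V[i][j] for j in range(2)] for i in range(2)]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]


def expm(a, terms=40):
    """Matrix exponential by its power series; adequate for the small norms used here."""
    result, term = IDENTITY, IDENTITY
    for n in range(1, terms):
        term = scale(1.0 / n, matmul(term, a))
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


def max_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


def p1(eps):
    """exp(-i A0 eps) exp(-i eps V), the first product of Eq. (4.29) with 2*eps -> eps."""
    return matmul(expm(scale(-1j * eps, A0)), expm(scale(-1j * eps, V)))


def p2(eps):
    """exp(-i eps V) exp(-i A0 eps), the second product of Eq. (4.29) with 2*eps -> eps."""
    return matmul(expm(scale(-1j * eps, V)), expm(scale(-1j * eps, A0)))


def symmetric(eps):
    """exp(-i A0 eps) exp(-2 i eps V) exp(-i A0 eps), the product in Eq. (4.28)."""
    half = expm(scale(-1j * eps, A0))
    return matmul(matmul(half, expm(scale(-2j * eps, V))), half)


def first_order_error(eps):
    return max_diff(p1(2 * eps), expm(scale(-2j * eps, A)))


def symmetric_error(eps):
    return max_diff(symmetric(eps), expm(scale(-2j * eps, A)))


print("first order ratio:", first_order_error(0.01) / first_order_error(0.005))
print("symmetric ratio:  ", symmetric_error(0.01) / symmetric_error(0.005))
```

The same helpers confirm the factorization in Eq. (4.30): `matmul(p1(eps), p2(eps))` agrees with `symmetric(eps)` to rounding error, since the two inner factors $e^{-i\varepsilon V}$ combine exactly.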
In addition to providing the foundation of $t$-dependent perturbation theory, the family of operators $\Omega(t)$ plays a crucial role in the studies of continuous spectra of some self-adjoint operators, which we consider next. To avoid lengthy duplication, the background material leading from the basic formulation to the expressions used for computational purpose will be given a limited exposure. The purpose here is to provide a reasonable level of justification for the computational equations and to develop methods for solving them. Therefore, some of the steps will be somewhat sketchy. More detailed analysis of the background material is available in the literature [Kato, 1980; Prugovečki, 1971].

The first point to consider is the extent of the change to the continuous spectrum as a result of a perturbation. As indicated earlier, this is a significant issue as there are quite mild perturbations, which can alter the continuous spectrum drastically (Kato, 1980, pp. 525-529) and there are unbounded perturbations that leave it intact. The pertinent conditions for this are considered next.

Stability of the continuous spectrum is intricately related with the quantum mechanical scattering theory, which is described in terms of the asymptotically free states. A state is asymptotically free if the perturbation loses its influence on it as $t \to \pm\infty$, i.e., it evolves under the influence of $A^0$ alone in the limit. Thus, a state $u$ is asymptotically free if there are states $u_\pm$ such that
\[ \| e^{-iAt}u - e^{-iA^0t}u_\pm \| \to 0 \quad (t \to \mp\infty). \]

Since $e^{-iAt}$ is unitary, this is equivalent to $\|u - \Omega(t)u_\pm\| \to 0$ as $t \to \mp\infty$. The convention of $\pm$ assignment used comes from the scattering theory as will become clear. The strong limits $\Omega_\pm$ of $\Omega(t)$ as $t \to \mp\infty$, if they exist, are called the proper wave operators. The related operators $S = \hat{\Omega}_-\Omega_+$ and $T = (1 - S)/2\pi i$ are known as the scattering and transition operators, respectively.

The wave operators in any meaningful sense can be expected to exist only when $A^0$ has purely continuous spectrum. If $A^0$ has an eigenvalue $\lambda$ with a corresponding eigenvector $u$, then $\Omega(t)u = e^{i(A-\lambda)t}u$. If the limit exists, then for any real $a$,

\[ \|\Omega(t+a)u - \Omega(t)u\| = \|e^{ia(A-\lambda)}u - u\| \to 0 \quad (t\to\infty), \]

implying that $Au = \lambda u$. Thus, any eigenvector of $A^0$ is an eigenvector of $A$ with the same eigenvalue. Except for such special circumstances, the wave operators exist only if $A^0$ has a pure continuous spectrum.

Let $P_c^0$, $P_c$ denote the projections on the subspaces of continuity $P_c^0H = H_c^0$, $P_cH = H_c$ of $A^0$ and $A$, respectively. The subspaces of continuity are defined as the orthogonal complements of the sets of all eigenvectors. If $A^0$ does not have pure continuous spectrum, then the generalized wave operators defined by

\[ \Omega_\pm = \text{s-}\lim_{t\to\mp\infty}\Omega(t)P_c^0 = \text{s-}\lim_{t\to\mp\infty}e^{iAt}e^{-iA^0t}P_c^0, \tag{4.33} \]

may still exist. If $P_c^0 = 1$, both definitions define the same operators. Since there is no danger of confusion, the same notation is retained for both of the cases, except for comments whenever required.

The following basic result states the properties of the wave operators, if they exist, which describe their significance in the studies related to continuous spectra. One of the properties is their partial isometry. To recall, an operator is partially isometric if it preserves the norm for all vectors in its domain, which is a closed subspace of the Hilbert space. We consider the case of $\Omega_+$ or $\Omega_-$ as convenient. The other case follows similarly.

Theorem 4.7. If the wave operator $\Omega_+$ exists, then
(i) Partial isometry: $\Omega_+$ is partially isometric from $H_c^0$ to its range $P'H$.

(ii) Intertwining properties: for each real $\tau$,

\[ e^{iA\tau}\Omega_+ = \Omega_+e^{iA^0\tau}, \quad A\Omega_+ = \Omega_+A^0, \quad \hat{\Omega}_+A = A^0\hat{\Omega}_+ \]

on the intersections of the domains of both of the sides, and $(z - A)^{-1}\Omega_+ = \Omega_+(z - A^0)^{-1}$ whenever both sides are defined.

(iii) The projection $P' = \Omega_+\hat{\Omega}_+$ reduces $A$ on the range of $\Omega_+$, i.e., $P'Au = AP'u = P'AP'u$ for each $u$ in the range of $\Omega_+$.

(iv) $P'H$ is contained in $H_c$.

Proof.
(i) Since the strong convergence preserves the norm, we have

\[ \|\Omega_+u\| = \lim_{t\to-\infty}\|\Omega(t)P_c^0u\| = \|P_c^0u\| = \|u\| \]

for each $u$ in $H_c^0$, which also implies that $\hat{\Omega}_+\Omega_+ = P_c^0$.

(ii) For each real $\tau$,

\[ e^{iA\tau}\Omega_+ = e^{iA\tau}\,\text{s-}\lim_{t\to-\infty}e^{iAt}e^{-iA^0t} = \text{s-}\lim_{t\to-\infty}\Omega(t+\tau)\,e^{iA^0\tau} = \Omega_+e^{iA^0\tau}. \]

By differentiation, i.e., for $u$ in the domain of $A^0$, as in Eq. (2.6),

\[ \Omega_+A^0u = \text{s-}\lim_{\varepsilon\to0}\frac{1}{i\varepsilon}\,\Omega_+(e^{iA^0\varepsilon} - 1)u = \text{s-}\lim_{\varepsilon\to0}\frac{1}{i\varepsilon}\,(e^{iA\varepsilon} - 1)\Omega_+u = A\Omega_+u, \]

implying also the case of the resolvents $(z - A)^{-1}\Omega_+ = \Omega_+(z - A^0)^{-1}$, and that the domain of $A$ contains the range of $\Omega_+$.

(iii) It follows from (i) that $P'^2 = \Omega_+\hat{\Omega}_+\Omega_+\hat{\Omega}_+ = \Omega_+\hat{\Omega}_+ = P'$, i.e., $P'$ is a projection. Now, from (ii), on the range of $\Omega_+$, we have

\[ P'A = \Omega_+\hat{\Omega}_+A = \Omega_+A^0\hat{\Omega}_+ = A\Omega_+\hat{\Omega}_+ = AP' = P'AP'. \]
(iv) By integrating $(z - A)^{-1}\Omega_+ = \Omega_+(z - A^0)^{-1}$ as in Theorem 2.8, it follows that

\[ P_c(\lambda)\Omega_+ = \Omega_+P_c^0(\lambda), \]

where $P_c(\lambda) = E_\lambda P_c$ and $P_c^0(\lambda) = E_\lambda^0P_c^0$, i.e., the spectral functions restricted to the spaces of continuity. The result holds on all points of continuity, and by the right continuity of the spectral functions, everywhere. Thus for each $u$ in the domain of $A^0$, we have

\[ \|P_c(\lambda)\Omega_+u\|^2 = \|\Omega_+P_c^0(\lambda)u\|^2 = \|P_c^0(\lambda)u\|^2. \]

Since the right side is continuous in $\lambda$, so is the left side. This implies that for each $u$ in $H_c^0$, $\Omega_+u$ is in $H_c$. Since $\Omega_+u$ is in $P'H$, we have that $P'H$ is contained in $P_cH$ •
It is clear from Theorem 4.7(iv) that $H_c^0$ is isometrically equivalent to a subset of $H_c$. Thus, the restriction of $A^0$ to $H_c^0$ is unitarily equivalent to the restriction of $A$ to this subset. If this subset is equal to $H_c$, i.e., if $P' = P_c$, then the wave operators are called complete. In that case, $A^0$ and $A$ restricted to their spaces of continuity are unitarily equivalent and hence, their continuous spectra are identical. Thus, mere existence of the wave operators implies that the continuous spectrum of $A^0$ is a subset of the continuous spectrum of $A$ and completeness in addition, implies their equivalence.

As indicated earlier, the wave operators can reasonably be expected to exist for operators with continuous spectra. Even then, they should be expected to exist only for mild perturbations in some sense. A sufficient condition is given by the following result.

Lemma 4.15. Let $D$ be a dense set in $H_c^0$ and for each $u$ in $D$, let there exist a real $\tau$ such that (a) for $\tau \le t < \infty$, $e^{-iA^0t}u$ is in the intersection of the domains of $A^0$ and $A$, (b) $(Ve^{-iA^0t}u)$ is continuous in $t$, and (c) $\|Ve^{-iA^0t}u\|$ is integrable from $\tau$ to $\infty$. Then $\Omega_-$ exists.

Proof. Essentially by the same argument as in Eq. (2.6), it follows that

\[ \frac{d\Omega(t)}{dt}\,u = \frac{d}{dt}\,e^{iAt}e^{-iA^0t}u = i\,[e^{iAt}\,V\,e^{-iA^0t}]\,u. \]

Since this derivative is continuous, we have

\[ [\Omega(t_2) - \Omega(t_1)]\,u = i\int_{t_1}^{t_2}dt\,[e^{iAt}\,V\,e^{-iA^0t}]\,u, \]

and since $\|e^{iAt}\| = 1$, this implies that

\[ \|[\Omega(t_2) - \Omega(t_1)]\,u\| \le \int_{t_1}^{t_2}dt\,\|Ve^{-iA^0t}u\|. \]

Integrability of the integrand now implies that the left side tends to zero as $t_1, t_2 \to \infty$. Thus, $\{\Omega(t)u\}$ is a Cauchy sequence implying the existence of the limit on $D$. Since $D$ is dense and $\Omega(t)$ is uniformly bounded, $\Omega_-$ exists for each $u$ in $H_c^0$ (Theorem 2.1) •
Example 4.3. Let $A^0$ and $A$ be as in Example 4.1 with a square integrable $V$. Since the spectrum of $A^0$ is absolutely continuous, contained in $[0;\infty)$, $P_c^0 = 1$. Let $u(r) = \exp[-|r - a|^2/2]$ with $a$ in $R^3$ being arbitrary. From Eq. (2.32), the evolved vector $u(r;t)$ is given by

\[ u(r;t) = [e^{-iA^0t}u](r) = (4\pi it)^{-3/2}\int_{R^3}dr'\,\exp\Big[\frac{i(r-r')^2}{4t}\Big]\,u(r') = (1+2it)^{-3/2}\exp\Big[-\frac{|r-a|^2}{2(1+2it)}\Big]. \]

Since $|u(r;t)| \le (1+4t^2)^{-3/4}$ for all $r$, assuming that $V$ is square integrable with its vector norm $\|V\|'$, we have that $\|Ve^{-iA^0t}u\| \le (1+4t^2)^{-3/4}\,\|V\|'$, which is integrable. The set of vectors $u$ as $a$ varies over $R^3$ constitutes a dense set, implying from Lemma 4.15, that the wave operators $\Omega_-$ and $\Omega_+$ exist. Since $P_c^0 = 1$, the wave operators are proper. Thus, the continuous spectrum of $A$ includes the spectrum of $A^0$, which is the positive real line.

The condition on $V$ can be relaxed to include perturbations that are locally square integrable and decay at infinity as $(1+|r|)^{-1-\varepsilon}$ with some $\varepsilon > 0$. This still excludes the Coulomb potential, which is of substantial practical interest. This has impeded the development of a proper scattering theory for the long range potentials. However, Coulombic potentials are relatively compact (Example 4.1 (ii)), which is a useful property for the treatment of their discrete spectra •

Setting the issue of completeness aside for the moment, we obtain alternative, time independent expressions for the wave operators. Let the operators $\Omega_{\pm\varepsilon}$ be defined by

\[ (\upsilon, \Omega_\varepsilon u) = \varepsilon\int_{-\infty}^0dt\,e^{\varepsilon t}\,(\upsilon, \Omega(t)u) \quad\text{and}\quad (\upsilon, \Omega_{-\varepsilon}u) = \varepsilon\int_0^\infty dt\,e^{-\varepsilon t}\,(\upsilon, \Omega(t)u). \tag{4.34} \]
The right sides of Eq. (4.34) are in fact only the sesquilinear forms. However, the bounded operators $\Omega_{\pm\varepsilon}$ are then well-defined by a use of the Riesz representation theorem as in Proposition 2.5. Equalities of Eq. (4.34) are often written formally without the vectors with the meaning understood as in Eq. (4.34) and Lemma 4.16 below. These are the examples of the so called Bochner integrals. It is straightforward to check that $\Omega_{\pm\varepsilon}$ are unitary as a consequence of this property possessed by $\Omega(t)$. We have

Lemma 4.16. If the wave operator $\Omega_+ = \text{s-}\lim_{t\to-\infty}\Omega(t)P_c^0$ exists, then $\Omega_+ = \text{s-}\lim_{\varepsilon\to0}\Omega_\varepsilon$. Conversely, the right side defines the wave operator.

Proof. Let $u_+ = \Omega_+u = \text{s-}\lim_{t\to-\infty}\Omega(t)u$ and consider

\[ \lim_{\varepsilon\to0}(\upsilon, \Omega_\varepsilon u - u_+) = \lim_{\varepsilon\to0}\varepsilon\int_{-\infty}^0dt\,e^{\varepsilon t}\,(\upsilon, \Omega(t)u - u_+) = \lim_{\varepsilon\to0}\int_0^\infty d\tau\,e^{-\tau}\,(\upsilon, \Omega(-\tau/\varepsilon)u - u_+), \quad \tau = -\varepsilon t. \]

The integrand is bounded by $[e^{-\tau}\,\|\upsilon\|\,(\|\Omega(-\tau/\varepsilon)u\| + \|u_+\|)] = 2e^{-\tau}\,\|\upsilon\|\,\|u\|$, which is integrable. Therefore, the order of the limit and integration can be interchanged by the Lebesgue theorem (Theorem 1.3). Since for each $\tau > 0$, i.e., for almost every $\tau$, $(\upsilon, \Omega(-\tau/\varepsilon)u - u_+) \to 0$ as $\varepsilon \to 0$, we have $\Omega_+ = \text{w-}\lim_{\varepsilon\to0}\Omega_\varepsilon$. Since for each $u$, $\|\Omega_\varepsilon u\| = \|u\| \to \|u\| = \|u_+\|$ as $\varepsilon \to 0$, i.e., the sequence of norms converges to the norm of the limit, weak convergence implies strong convergence, from Proposition 1.4. Converse is obvious from the equality •

The result of Lemma 4.16 leads directly to $t$-independent integral equations, as shown below. Required interchanges of integration and limits can be justified by the Lebesgue dominated convergence theorem and the interchanges of orders of integration, by Fubini's theorem (Theorem 1.4), which will not be mentioned except when crucial. It follows from the spectral theorem (Theorem 2.7), as in Eq. (2.5), that
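\[ \begin{aligned} \hat{\Omega}_\varepsilon u &= \varepsilon\int_{-\infty}^0dt\,e^{\varepsilon t}\,e^{iA^0t}e^{-iAt}u \\ &= \varepsilon\int_0^\infty dt\,e^{-\varepsilon t}\int_{-\infty}^\infty e^{i\lambda t}\,e^{-iA^0t}\,dE_\lambda u \\ &= i\varepsilon\int_{-\infty}^\infty[\lambda + i\varepsilon - A^0]^{-1}\,dE_\lambda u \\ &= \int_{-\infty}^\infty\big(1 - [\lambda + i\varepsilon - A^0]^{-1}(\lambda - A^0)\big)\,dE_\lambda u \\ &= \int_{-\infty}^\infty\hat{\omega}_\varepsilon(\lambda)\,dE_\lambda u. \end{aligned} \tag{4.35} \]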
Eq. (4.35) is a Hilbert space version of the Lippmann-Schwinger equation. These equations provide $t$-independent framework for the studies related to continuous spectra and the quantum mechanical scattering theory. Our interest is in the Lippmann-Schwinger equations for the generalized eigenfunctions, which are used for computations and are closely related to Eq. (4.35). Following deductions are also briefly outlined, indicating the connection between the basic formulation and the forms used for computation. More details are available in the literature [Prugovečki, 1971; Reed and Simon, 1978, Vol. III].

For this purpose, we restrict to $A^0$ being the self-adjoint realization of $(-\nabla^2)$, which was described in relative detail in sec. 2.V. and $A = (A^0 + \kappa V)$ in $H = L^2[R^3, dr]$, where $V$ is an absolutely integrable and locally square integrable function of the Rollnik class defined by

\[ \int_{R^3\times R^3}dr\,dr'\,\frac{|V(r)|\,|V(r')|}{|r - r'|^2} < \infty. \tag{4.36} \]

An explicit parameter $\kappa$ is introduced for later applications. To simplify the manipulations $V$ will also be assumed to be positive on all bounded sets in $R^3$. The results can be extended by the same methods to include more general perturbations by including the sign.

As indicated in sec. 2.V., the generalized eigenfunction expansion of $A^0$ is provided by the normalized plane waves $\Phi^0(k) = (2\pi)^{-3/2}\exp[ik\cdot r]$, $\lambda = |k|^2 = k^2$. The purpose here is to obtain the eigenfunctions for $A$ i.e.,

\[ A'\Phi_\pm(k) = k^2\,\Phi_\pm(k), \quad \lambda = k^2, \tag{4.37} \]

where $A'$ is a suitable extension of $A$, with similar properties. More precise meaning of Eq. (4.37) is given by $\langle\Phi_\pm(k), Au) = k^2\,\langle\Phi_\pm(k), u)$, for each $u$ in the domain of $A$ intersecting with its space of continuity, where $\langle\Phi_\pm(k), u)$ is defined by

\[ \langle\Phi_\pm(k), u) = \int_{R^3}dr\,\Phi_\pm^*(k, r)\,u(r). \]
There are other ways to interpret Eq. (4.37) rigorously and develop a scheme to construct the eigenfunctions. A "negative" norm can be introduced on a suitable set of functions, which in fact is not negative in value, just smaller than the norm in $H$, which is the inverse process of Friedrichs' construction (Remark 2.5). The functions $\Phi_\pm$ acquire the status of vectors in the so defined Hilbert space $H_-$ [Berezanskiĭ, 1968]. The operator $A$ is then extended to $H_-$, if it admits an extension. We shall adopt a more conventional approach for the class of perturbations of the Rollnik class.

The solutions of Eq. (4.37) have little meaning unless they also satisfy

\[ \int_{\lambda_1}^{\lambda_2}dE_\lambda u = \int_{\lambda_1}^{\lambda_2}d\lambda\,\dot{E}_\lambda u = \int_{\lambda_1}^{\lambda_2}dm(\lambda)\,\Phi_\pm(k)\,\langle\Phi_\pm(k), u), \tag{4.38} \]

for arbitrary values of $\lambda_1, \lambda_2$ in the real line, with respect to a measure $m(\lambda)$. For the case under consideration, $dm(\lambda) = dk/(2\pi)^{3/2}$ as in sec. 2.V. Eq. (4.38) implies the completeness of the eigenfunctions on the space of continuity of $A$ and then, it is said to have a generalized eigenfunction expansion. The representation given by Eq. (4.38) enables construction of operators in terms of the eigenfunctions, which are originally defined in terms of the spectral function.

In the framework of perturbation methods, a solution scheme for Eq. (4.37) can be attempted by expressing $\Phi_\pm$ as $\Phi_\pm(k) = \Phi^0(k) + \upsilon_\pm(k)$, yielding formally

\[ \upsilon_\pm(k) = \Phi_\pm(k) - \Phi^0(k) = \kappa\,[\lambda \pm i0 - A^0]^{-1}\,V\,\Phi_\pm(k) \]

i.e.,

\[ \Phi_\pm(k) = \Phi^0(k) + \kappa\,[\lambda \pm i0 - A^0]^{-1}\,V\,\Phi_\pm(k) = \Phi^0(k) + \kappa\,[\lambda \pm i0 - A]^{-1}\,V\,\Phi^0(k). \tag{4.39} \]

If Eq. (4.39) has solutions satisfying Eq. (4.38), it can be seen by substitution in Eq. (4.35) that

\[ \hat{\Omega}_\pm u = \int_{-\infty}^\infty dm(\lambda)\,\Phi^0\,\langle\Phi_\pm, u). \tag{4.40} \]

The scattering and the transition operators can also be expressed in terms of the eigenfunctions. The defining elements of the scattering and the transition operators have their experimental counterparts and therefore, they are of a more direct interest than the operators. Thus, for a rigorous description of the experimental observations, the existence of the solutions of Eq. (4.39) having the completeness property stated in Eq. (4.38) should be established. This provides a motivation for the program, not a proof. The complications in proving the necessary results arise from two sources. First, the eigenfunctions are not the vectors in $H$
and second, the inverses $[\lambda \pm i0 - A]^{-1}$, if they exist, must be unbounded as are $[\lambda \pm i0 - A^0]^{-1}$, which is defined by an integral operator with its Green's function as the kernel given by Eqs. (2.27), (2.28) and (2.29-a, b). These complications invalidate applications of the arguments frequently used to justify the interchanges of orders of operations creating a need for more careful estimates. The existence and completeness of the generalized eigenfunctions is rigorously valid for the restrictions presently imposed on the potentials [Reed and Simon, 1978-79, Theorem XI.4; Prugovečki, 1971, Ch. 5]. Only the parts that are directly relevant for the present analysis are outlined below.

For a potential of the Rollnik class (Eq. (4.36)), the operators

\[ K_\pm(\lambda) = \sqrt{V}\,(\lambda \pm i0 - A^0)^{-1}\sqrt{V} = \text{u-}\lim_{\varepsilon\to0}K_{\pm\varepsilon}(\lambda), \tag{4.41} \]

where

\[ K_{\pm\varepsilon}(\lambda) = \sqrt{V}\,(\lambda \pm i\varepsilon - A^0)^{-1}\sqrt{V}, \]

are in the Hilbert-Schmidt class and hence, compact. It is clear that for $\lambda$ in the resolvent set of $A^0$, there is no distinction between $\pm$, and then $K(\lambda) = \sqrt{V}\,(\lambda - A^0)^{-1}\sqrt{V}$ is a uniformly analytic function of $\lambda$. The spectral properties of $A$ are now determined essentially by the same arguments as in Eq. (4.10) and Lemma 4.7. The continuous spectrum of $A$ coincides with the positive real line (Example 4.3) and its eigenvalues are in a one to one correspondence with the eigenvalues of $\kappa K(\lambda)$ of unit magnitude. To be precise, we have

Proposition 4.1. With the symbols as in Eqs. (4.39) and (4.41), a $\lambda$ in the resolvent set of $A^0$ is an eigenvalue of $A$ if and only if it is a solution of

\[ [1 - \kappa K(\lambda)]\,\upsilon = 0 \tag{4.42} \]

with $u = (\lambda - A^0)^{-1}\sqrt{V}\,\upsilon$ being the corresponding eigenvector.

Proof. Eq. (4.42) reads in detail as $\upsilon - \kappa\sqrt{V}\,(\lambda - A^0)^{-1}\sqrt{V}\,\upsilon = 0$. The result follows by setting $u = (\lambda - A^0)^{-1}\sqrt{V}\,\upsilon$, as it implies that $(\lambda - A^0)u = \sqrt{V}\,\upsilon = \kappa Vu$, and the steps are reversible •

The properties of the eigenvalues $\xi(\lambda)$ of $\kappa K(\lambda)$ and thus of $A$ can be analyzed by the method of Lemma 4.11. In brief, these eigenvalues vary monotonically from zero at $-\infty$ to their maximum values $\xi(0)$ at $\lambda = 0$, the largest one being equal to $\|\kappa K(0)\|$. If $\|\kappa K(0)\| > 1$, $A$ has at least one eigenvalue in the negative real line. As $\kappa$ decreases, the number of eigenvalues increases. It is clear that $A$ can have at most a finite number of isolated eigenvalues each of a finite multiplicity. If an eigenvalue $\xi(0) = 1$, for zero to be an eigenvalue of $A$, it is required that $u = -(A^0)^{-1}\sqrt{V}\,\upsilon$ be a vector, i.e., $u$ must be a square integrable function, which may or may not be the case. Thus, for $\kappa < \kappa_c$, where $\kappa_c$ is defined by $\|\kappa_cK(0)\| = 1$, $A$ has at least one eigenvalue; for $\kappa > \kappa_c$, it has no eigenvalue and for $\kappa = \kappa_c$, zero may be its eigenvalue or $A$ may have no eigenvalue at zero.
The scattering counterpart of Proposition 4.1 is described by the inhomogeneous equation as follows:

Theorem 4.8. With the symbols as in Proposition 4.1, let $g(k) = \sqrt{V}\,\Phi^0(k)$ and let

\[ f_\pm(k) = [1 - \kappa K_\pm(\lambda)]^{-1}\,g(k), \quad \lambda \ge 0. \tag{4.43} \]

Then the generalized eigenfunctions $\Phi_\pm(k)$ of $A$ are given by

\[ \Phi_\pm(k) = \Phi^0(k) + \kappa\,(\lambda \pm i0 - A^0)^{-1}\sqrt{V}\,f_\pm(k), \tag{4.44-a} \]

\[ \phantom{\Phi_\pm(k)} = \Phi^0(k) + \kappa\,(\lambda \pm i0 - A^0)^{-1}\,V\,\Phi_\pm(k), \tag{4.44-b} \]

where the strong limits are taken with respect to the norm $\|\Phi\|_V = \|\sqrt{V}\,\Phi\|$.

Proof. For $\lambda$ in the complement of the set of the eigenvalues of $A$, $[1 - \kappa K_\pm(\lambda)]$ have bounded inverses since they are the resolvents of the compact operators, and $g(k)$ is in $H$. Consequently, $f_\pm(k)$ are well defined vectors in $H$. Define $\Phi_\pm(k)$ from Eq. (4.44-a). It follows that

\[ \sqrt{V}\,\Phi_\pm(k) = g(k) + \kappa K_\pm(\lambda)\,f_\pm(k) = f_\pm(k). \tag{4.45} \]

Eq. (4.44-b) follows by substituting for $f_\pm(k)$ in Eq. (4.44-a). Since the limits in Eqs. (4.44-a) and (4.44-b) exist with respect to the norm $\|\cdot\|_V$, the last substitution is justified •
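The correspondence stated in Proposition 4.1 has an elementary finite-dimensional analogue, sketched below. Everything in the block is a hypothetical illustration: the 2×2 matrices, the value of $\kappa$ and the function names are assumptions, and $K(\lambda)$ is read as $\sqrt{V}\,(\lambda - A^0)^{-1}\sqrt{V}$, which is what the identity $(\lambda - A^0)u = \sqrt{V}\,\upsilon = \kappa Vu$ in the proof requires. The check is that $1 - \kappa K(\lambda)$ is singular at an eigenvalue $\lambda$ of $A = A^0 + \kappa V$ outside the spectrum of $A^0$, and that $u = (\lambda - A^0)^{-1}\sqrt{V}\,\upsilon$ built from its null vector is the eigenvector.

```python
import math

# Hypothetical 2x2 model: A0 real symmetric, V a positive diagonal "potential",
# and an attractive coupling kappa < 0, so that A = A0 + kappa * V.
A0 = ((1.0, 0.5), (0.5, 2.0))
V_DIAG = (1.0, 0.5)
KAPPA = -3.0
A = ((A0[0][0] + KAPPA * V_DIAG[0], A0[0][1]), (A0[1][0], A0[1][1] + KAPPA * V_DIAG[1]))
SQRT_V = tuple(math.sqrt(v) for v in V_DIAG)


def resolvent_a0(lam):
    """(lam - A0)^(-1) for lam in the resolvent set of A0."""
    m00, m01, m11 = lam - A0[0][0], -A0[0][1], lam - A0[1][1]
    det = m00 * m11 - m01 * m01
    return ((m11 / det, -m01 / det), (-m01 / det, m00 / det))


def one_minus_kappa_k(lam):
    """1 - kappa * sqrt(V) (lam - A0)^(-1) sqrt(V)."""
    r = resolvent_a0(lam)
    return tuple(
        tuple((1.0 if i == j else 0.0) - KAPPA * SQRT_V[i] * r[i][j] * SQRT_V[j] for j in range(2))
        for i in range(2)
    )


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def apply(m, x):
    return tuple(m[i][0] * x[0] + m[i][1] * x[1] for i in range(2))


# Lowest eigenvalue of A from the closed form for symmetric 2x2 matrices.
lam = 0.5 * (A[0][0] + A[1][1]) - math.hypot(0.5 * (A[0][0] - A[1][1]), A[0][1])

m = one_minus_kappa_k(lam)
upsilon = (m[0][1], -m[0][0])  # null vector of the singular symmetric matrix m
u = apply(resolvent_a0(lam), (SQRT_V[0] * upsilon[0], SQRT_V[1] * upsilon[1]))
residual = max(abs(x - lam * y) for x, y in zip(apply(A, u), u))
print(lam, det2(m), residual)
```

At a value of $\lambda$ that is not an eigenvalue of $A$, for instance `one_minus_kappa_k(-1.0)`, the determinant stays well away from zero, so Eq. (4.42) has only the trivial solution there.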
4.III. SPECTRAL DIFFERENTIATION

So far we have studied the continuity properties of the spectral quantities with respect to the continuity of operator, i.e., if the perturbation converges to zero in some sense then to what extent does this imply the convergence of spectral points and the other related quantities, e.g., the eigenprojections. Thus spectral differentiation is just yet another step in the transfer of the smoothness of the operator to its spectrum. To be precise, consider a one parameter family $A(\kappa)$ of the operator valued functions of $\kappa$. Assuming that this family is differentiable, in the following we investigate the extent to which the quantities associated with its spectrum are differentiable. Mathematically, this topic requires mostly the adaptations of the arguments used to study the continuity, although some new questions are encountered.

The major reason for an interest in this question is its useful applications, particularly in quantum chemistry. This interest stems from the Hellmann-Feynman theorem, which originally stated that if $u(\kappa)$ is a normalized eigenvector of $A(\kappa)$ corresponding to an eigenvalue $\xi(\kappa)$, then

\[ (u(\kappa), \dot{A}(\kappa)\,u(\kappa)) = \dot{\xi}(\kappa). \tag{4.46} \]
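For an isolated non-degenerate eigenvalue of a matrix family, Eq. (4.46) can be checked directly. The sketch below assumes a hypothetical real symmetric 2×2 family $A(\kappa) = A^0 + \kappa V$, so that $\dot{A}(\kappa) = V$; the matrices, the value of $\kappa$ and the function names are illustrative assumptions. The left side of Eq. (4.46) is compared with a central finite difference of the lowest eigenvalue.

```python
import math

# Hypothetical real symmetric 2x2 family A(kappa) = A0 + kappa * V.
A0 = ((0.0, 0.3), (0.3, 1.0))
V = ((1.0, 0.5), (0.5, -1.0))


def family(kappa):
    return tuple(tuple(A0[i][j] + kappa * V[i][j] for j in range(2)) for i in range(2))


def lowest_eigenpair(m):
    """Lowest eigenvalue and normalized eigenvector of a symmetric 2x2 matrix with m[0][1] != 0."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    xi = 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)
    vec = (b, xi - a)
    length = math.hypot(*vec)
    return xi, (vec[0] / length, vec[1] / length)


def expectation(u, m):
    """(u, m u) for a real vector u."""
    return sum(u[i] * m[i][j] * u[j] for i in range(2) for j in range(2))


kappa, h = 0.4, 1e-5
xi, u = lowest_eigenpair(family(kappa))
lhs = expectation(u, V)  # (u, dA/dkappa u), since dA/dkappa = V here
rhs = (lowest_eigenpair(family(kappa + h))[0] - lowest_eigenpair(family(kappa - h))[0]) / (2 * h)
print(lhs, rhs)
```

The two printed numbers agree to within the accuracy of the finite difference. The agreement relies on the eigenvalue being simple and isolated, which is the situation covered by the original statement of the theorem.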
This result, if valid, is useful in a number of areas, particularly in the studies of molecular bonding and forces, where the right side is equal to the force with the parameter being the distance. However, as will be seen, the result holds under strict and restrictive conditions and it does not cover the differentiation properties of a variety of spectral points, which are interesting from the application point of view [Pupyshev, 2000].

Lemma 4.5 shows the validity of Eq. (4.46) for an isolated non-degenerate eigenvalue of the operators, which have a uniformly analytic extension and extends it to cover the degenerate eigenvalues of a finite multiplicity, stated in Eq. (4.4). This result is useful, as shown by its applications in obtaining Lemma 4.11, in addition to a variety of other applications in the literature. However, the requirement of the analyticity is quite limiting. In the following, we relax the conditions. A trivial extension of Lemma 4.5 is given by

Lemma 4.17. Let $A(\kappa)$ be a uniformly differentiable family of self-adjoint operator valued function of $\kappa$ (Eq. (4.3-a)). Then each of its isolated eigenvalues $\xi(\kappa)$ of a finite multiplicity $m$ is the common point of $m$ eigenvalue curves $\xi_j(\kappa+\varepsilon)$ of $A(\kappa+\varepsilon)$ for all sufficiently small values of $\varepsilon$, counting multiplicities. The $m$ derivatives $\dot{\xi}_j(\kappa)$ of these curves are the eigenvalues determined by Eq. (4.4), i.e.,

\[ p(\kappa)\,\dot{A}(\kappa)\,p(\kappa) = \dot{\xi}_j(\kappa)\,p(\kappa), \tag{4.47} \]

where $p(\kappa)$ is the eigenprojection of $A(\kappa)$ corresponding to $\xi(\kappa)$.

Proof. From Lemma 3.1, Remark 3.2 (ii), $[z - A(\kappa)]^{-1}$ is uniformly differentiable. The result follows exactly as in Lemma 4.5 except that the interchange of integration and limit is now justified by the Lebesgue theorem (Theorem 1.3) instead of the analyticity •

The weakest differentiability condition assumed in the literature for the validity of Eq. (4.46) is the strong differentiability of $A(\kappa)$ (Eq. (4.3-b)) on their common domain, supplemented
with the existence and continuity of the family of eigenvectors $u_j(\kappa+\varepsilon)$ and the eigenvalues $\xi_j(\kappa+\varepsilon)$. With this assumption the left side of the equation

\[ \frac{1}{\varepsilon}\,(u_j(\kappa+\varepsilon), [A(\kappa+\varepsilon) - A(\kappa)]\,u_j(\kappa)) = \frac{1}{\varepsilon}\,[\xi(\kappa+\varepsilon) - \xi(\kappa)]\,(u_j(\kappa+\varepsilon), u_j(\kappa)) \tag{4.48} \]

converges to $(u_j(\kappa), \dot{A}(\kappa)\,u_j(\kappa))$ as $\varepsilon \to 0$, from the existence of strong derivative $\dot{A}(\kappa)$ and the strong convergence of $u_j(\kappa+\varepsilon)$ to some vector $u_j(\kappa)$. Consequently, the limit on the right side also exists, which together with the convergence of $u_j(\kappa+\varepsilon)$ to $u_j(\kappa)$ implies that $\xi(\kappa)$ is differentiable and that

\[ (u_j(\kappa), \dot{A}(\kappa)\,u_j(\kappa)) = \dot{\xi}(\kappa)\,(u_j(\kappa), u_j(\kappa)), \tag{4.49} \]

which is equivalent to the result of Lemma 4.17.

However, the assumed continuity properties of $u_j(\kappa+\varepsilon)$ and $\xi_j(\kappa+\varepsilon)$ hold for uniformly continuous operators but not for a strongly differentiable family. In fact strong continuity and even strong differentiability does not rule out the possibility of spectral expansion, i.e., there can be more than $m$ eigenvalues of $A(\kappa+\varepsilon)$ no matter how small $\varepsilon$ is, while the corresponding eigenvalue of $A(\kappa)$ has multiplicity equal to $m$. Contrary to the case of Theorem 3.13, there is no supplementing property available in the present case. Thus, the continuity assumptions used in deducing Eq. (4.49) are quite restrictive and difficult to establish. Without these properties, the strong differentiability of the operator implies the following relevant result:

Theorem 4.9. Let $A(\kappa')$ be a strongly differentiable family of self-adjoint operator valued functions of $\kappa'$ with a continuous derivative $\dot{A}(\kappa')$ on their common domain $D(A(\kappa))$ for $(\kappa - \varepsilon) \le \kappa' \le (\kappa + \varepsilon)$ for all $\varepsilon$ in some neighborhood of zero, and let $\xi(\kappa)$ be an isolated eigenvalue of $A(\kappa)$ of a finite multiplicity $m$ with the corresponding eigenprojection $p(\kappa)$. Then the spectral projection $p(\kappa+\varepsilon)$ of $A(\kappa+\varepsilon)$ corresponding to its spectrum contained in the interval $[\xi(\kappa) - \varepsilon' - 0;\ \xi(\kappa) + \varepsilon' + 0]$ converges strongly to $p(\kappa)$, where $\varepsilon'$ is an arbitrary positive number as long as this interval is at a positive distance from the spectrum of $A(\kappa)$ in the complement of $\xi(\kappa)$. Further, for sufficiently small $\varepsilon$ each such interval contains at least $m$ points of the spectrum of $A(\kappa+\varepsilon)$, counting the multiplicities.

Proof. The interval $[\xi(\kappa) - \varepsilon' - \delta;\ \xi(\kappa) + \varepsilon' + \delta]$ has only one point $\xi(\kappa)$ of the spectrum of $A(\kappa)$ in its interior. Let
\[ B(\kappa+\varepsilon) = \int_{\xi(\kappa)-\varepsilon'-\delta}^{\xi(\kappa)-\varepsilon'-0}\lambda\,dE_\lambda(\kappa+\varepsilon) + \int_{\xi(\kappa)+\varepsilon'+0}^{\xi(\kappa)+\varepsilon'+\delta}\lambda\,dE_\lambda(\kappa+\varepsilon), \]

where $E_\lambda(\kappa+\varepsilon)$ is the spectral function of $A(\kappa+\varepsilon)$. From Corollary 3.6 (iv), Remark 3.2 (ii), the spectral function of $(A(\kappa+\varepsilon) - B(\kappa+\varepsilon))$ converges strongly as $\varepsilon \to 0$, to the spectral function of $A(\kappa)$ at the end points of interval $[\xi(\kappa) - \varepsilon' - 0;\ \xi(\kappa) + \varepsilon' + 0]$. Consequently, the spectral projection

\[ p(\kappa+\varepsilon) = [E_{\xi(\kappa)+\varepsilon'+0}(\kappa+\varepsilon) - E_{\xi(\kappa)-\varepsilon'-0}(\kappa+\varepsilon)] \]

of $(A(\kappa+\varepsilon) - B(\kappa+\varepsilon))$, which is also the spectral projection of $A(\kappa+\varepsilon)$ for this interval, converges to $p(\kappa)$. This result can also be deduced by the method of Lemma 4.5 by letting $c$ be a circle centered at $\xi(\kappa)$ with radius $[\varepsilon' + \delta']$, where $\delta' < \delta$ is otherwise arbitrary, and then letting $\delta' \to 0$.

Since $p(\kappa+\varepsilon)u$ converges strongly to $p(\kappa)u$ for each $u$ in $p(\kappa)H$, the range of $p(\kappa+\varepsilon)$ is at least $m$-dimensional •

Under the conditions of Theorem 4.9, Eq. (4.47) has precisely $m$ solutions for $\dot{\xi}_j(\kappa)$, while $A(\kappa+\varepsilon)$ can have more points of its spectrum in a small neighborhood of $\xi(\kappa)$, which we assume to be the eigenvalues $\xi_j(\kappa+\varepsilon)$, $j = 1, 2, \dots, m' \ge m$. Formally, $m$ approximations $\xi'_j(\kappa+\varepsilon)$, $j = 1, 2, \dots, m$ can be obtained for the eigenvalues of $A(\kappa+\varepsilon)$ from the perturbation series expansion of Theorem 4.3. The first term of the expansion given by Eq. (4.7) is directly related to $\dot{\xi}_j(\kappa)$. Up to the first order, $\xi'_j(\kappa+\varepsilon)$ are the eigenvalues of $p(\kappa)A(\kappa+\varepsilon)p(\kappa)$ defining $\dot{\xi}_j(\kappa)$. These observations can be stated more precisely as

Theorem 4.10. In addition to the assumptions of Theorem 4.9, let $\xi''_j(\kappa+\varepsilon)$, $j = 1, 2, \dots, m$, be the eigenvalues of $p(\kappa)A(\kappa+\varepsilon)p(\kappa)$. Then each $\xi''_j(\kappa+\varepsilon)$ converges to $\xi(\kappa)$, the corresponding eigenprojection $p''_j(\kappa+\varepsilon)$ converges uniformly to the orthoprojections $p''_j(\kappa)$ such that $p''_j(\kappa)p(\kappa) = p''_j(\kappa)$, and the solutions $\dot{\xi}_j(\kappa)$ of Eq. (4.47),

\[ p(\kappa)\,\dot{A}(\kappa)\,p(\kappa) = \dot{\xi}_j(\kappa)\,p(\kappa), \]

are the derivatives $\dot{\xi}''_j(\kappa)$ of $\xi''_j(\kappa)$.
Proof. Since $p(\kappa) A(\kappa+\varepsilon) p(\kappa)$ maps the finite, $m$-dimensional subspace $p(\kappa)H$ to $p(\kappa)H$ for each $\varepsilon$, and converges strongly on $p(\kappa)H$, the convergence is uniform. The first part follows from Lemma 3.2(ii) and Remark 3.2(ii). By similar arguments, the strong differentiability of $A(\kappa')$ implies uniform differentiability of $p(\kappa) A(\kappa) p(\kappa)$. The remainder of the result follows from Lemma 4.17 •

While a spectral expansion is a possibility, there is a wide class of perturbations which preclude such an occurrence. The following theorem provides one such criterion.

Theorem 4.11. In addition to the assumptions of Theorem 4.9, let $\dot{A}(\kappa)$ be $A(\kappa)$-bounded, i.e., $[z - A(\kappa)]^{-1} \dot{A}(\kappa)$ is bounded for $z$ in the resolvent set of $A(\kappa)$. Then $\xi(\kappa)$ is the common point of $m$ eigenvalue curves of $A(\kappa')$ with derivatives $\dot{\xi}_j(\kappa)$ being the solutions of Eq. (4.47),
$$p(\kappa)\,\dot{A}(\kappa)\,p(\kappa) = \dot{\xi}_j(\kappa)\,p(\kappa).$$

Proof. The existence of a continuous derivative $\dot{A}(\kappa')$ implies that there is a small neighborhood of $\xi(\kappa)$ in the complex plane such that for all $z \ne \xi(\kappa)$ in this neighborhood, $[z - A(\kappa)]^{-1} V_{\varepsilon'}$ is bounded for all $\varepsilon'$ in the interval $[-\varepsilon; \varepsilon]$, where $V_{\varepsilon'} = [A(\kappa+\varepsilon') - A(\kappa)]/\varepsilon'$. Consequently, $\| \varepsilon\,[z - A(\kappa)]^{-1} V_{\varepsilon'} \|$ can be made arbitrarily small by decreasing $\varepsilon$. The result now follows from Theorem 4.3 •
As is clear from Theorem 4.1 and Example 4.1, the class of perturbations indicated in Theorem 4.11 covers most cases of practical importance. In any case, spectral expansion is essentially a part of the asymptotic perturbation theory commented upon in Remark 4.5 in the sequel.

If $m = 1$, then the original statement of the Hellmann-Feynman theorem, Eq. (4.46),
$$(u(\kappa), \dot{A}(\kappa) u(\kappa)) = \dot{\xi}(\kappa),$$
is equivalent to the more general form of Eq. (4.47):
$$p(\kappa)\,\dot{A}(\kappa)\,p(\kappa) = \dot{\xi}_j(\kappa)\,p(\kappa).$$
For a degenerate eigenvalue, there is some ambiguity. It follows from the above considerations that the equation
$$(u_j(\kappa), \dot{A}(\kappa) u_j(\kappa)) = \dot{\xi}_j(\kappa)\,(u_j(\kappa), u_j(\kappa)) \tag{4.50}$$
still holds as long as $u_j(\kappa)$ are the limits of $u_j(\kappa+\varepsilon)$ as $\varepsilon \to 0$, not for any other vectors in $p(\kappa)H$ [Vatsya, 2004-1]. The vectors $u_j(\kappa)$ are just the eigenvectors of Eq. (4.47).

In the above, we have studied the differentiability properties of the exact eigenvalues and corresponding eigenvectors, which are rarely available in practical applications. Instead, some approximations to these quantities are obtained, frequently by the variational methods. For this reason, there has been considerable interest in obtaining the corresponding results for the approximate values. In the present formulation of the variational methods, $A_N(\kappa') = P_N A(\kappa') P_N$ is taken as the approximating sequence to $A(\kappa')$, where $P_N$ are $\kappa'$-independent orthoprojections converging strongly to the identity operator. Since $P_N$, and thus $A_N(\kappa')$, for each $N$ maps the finite-dimensional space $P_N H$ to $P_N H$, the strong continuity and differentiability of $A(\kappa')$ imply the uniform continuity and differentiability, respectively, of $A_N(\kappa')$ with respect to $\kappa'$. Consequently, the corresponding results for $A_N(\kappa')$ are stronger than for $A(\kappa')$. Thus, we have

Corollary 4.5. In addition to the assumptions of Theorem 4.9, let $P_N$ be $\kappa'$-independent and for each fixed $N$, let $\xi_N(\kappa)$ be an isolated eigenvalue of multiplicity $m_N$ of $A_N(\kappa) = P_N A(\kappa) P_N$. Then $\xi_N(\kappa)$ is the common point of $m_N$ eigenvalue curves $\xi_{Nj}(\kappa+\varepsilon)$ of $A_N(\kappa+\varepsilon)$ for all sufficiently small values of $\varepsilon$, counting multiplicities. The derivatives $\dot{\xi}_{Nj}(\kappa)$ of these curves are the eigenvalues determined by Eq. (4.4), i.e.,
$$p_{Nj}(\kappa)\,\dot{A}_N(\kappa)\,p_{Nj}(\kappa) = \dot{\xi}_{Nj}(\kappa)\,p_{Nj}(\kappa), \tag{4.51}$$
where $p_{Nj}(\kappa)$ are the eigenprojections of $A_N(\kappa)$ corresponding to the eigenvalue $\xi_{Nj}(\kappa)$.

Proof. Follows by substituting $A_N(\kappa)$ for $A(\kappa)$ together with the other respective quantities in Lemma 4.17 •

The result of Corollary 4.5 is known to be valid under less restrictive conditions on $P_N$. These conditions amount to reducing the additional contributions to zero, leaving the relevant elements of $\dot{A}_N(\kappa)$ essentially equal to those of $P_N \dot{A}(\kappa) P_N$. Further extensions can be obtained with this observation. A more interesting question is whether $\dot{\xi}_{Nj}(\kappa)$ converges to $\dot{\xi}_j(\kappa)$ as $N \to \infty$. The question reduces to comparing the isolated eigenvalues of $\dot{A}_N(\kappa)$ and $\dot{A}(\kappa)$. However, the properties of $A_N(\kappa)$ with respect to $N$ are less satisfactory. Since $A_N(\kappa)$ converges strongly to $A(\kappa)$ at best, $m_N$ can vary with $N$. It can be concluded that
$m_N \ge m$ exactly as in Theorem 4.9, and the parallel results can be obtained by similar methods. However, even with the condition of Theorem 4.11, convergence of $\dot{\xi}_{Nj}(\kappa)$ to $\dot{\xi}_j(\kappa)$ requires additional properties, e.g., the monotonicity as in Theorem 3.13. In most cases of interest, there is usually sufficient additional information available on $A(\kappa)$ and $\dot{A}(\kappa)$ enabling further improvement of the results. If such properties can be used to conclude that
$$\xi_{Nj}(\kappa+\varepsilon) \underset{N\to\infty}{\longrightarrow} \xi_j(\kappa+\varepsilon) \quad\text{and}\quad p_{Nj}(\kappa+\varepsilon) \underset{N\to\infty}{\overset{s}{\longrightarrow}} p_j(\kappa+\varepsilon),$$
with the convergence being uniform with respect to $\varepsilon$ in some neighborhood of zero, then it can be concluded by the standard arguments that $\dot{\xi}_{Nj}(\kappa) \to \dot{\xi}_j(\kappa)$ as $N \to \infty$.
An interest in the higher order derivatives of the eigenvalues is limited except that a need for the second order derivative of an isolated eigenvalue of a twice differentiable operator arises occasionally. For this case, straightforward differentiation and manipulations of the eigenvalue equation yields
$$p(\kappa)\,\ddot{A}(\kappa)\,p(\kappa) + 2\,\dot{p}(\kappa)\,[\xi - A(\kappa)]\,\dot{p}(\kappa) = \ddot{\xi}(\kappa)\,p(\kappa). \tag{4.52}$$
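As a numerical illustration of Eqs. (4.46) and (4.52), the following minimal sketch uses a hypothetical 2x2 family $A(\kappa) = A_0 + \kappa V$ that is not taken from the text, with $A_0 = \mathrm{diag}(1, 3)$ and $V$ the off-diagonal exchange matrix. For such a linear family $\ddot{A} = 0$, and for a non-degenerate eigenvalue in finite dimensions Eq. (4.52) reduces to the familiar second-order perturbation sum used below. All function names are illustrative assumptions.

```python
import math

# Hypothetical model (an assumption for illustration): A(kappa) = [[1, kappa], [kappa, 3]],
# so that A_dot = V = [[0, 1], [1, 0]] and A_ddot = 0.

def eigenpairs(kappa):
    """Eigenvalues (ascending) and unit eigenvectors of [[1, kappa], [kappa, 3]], kappa != 0."""
    rad = math.hypot(1.0, kappa)              # sqrt(1 + kappa^2)
    pairs = []
    for xi in (2.0 - rad, 2.0 + rad):
        x, y = kappa, xi - 1.0                # unnormalized eigenvector of the 2x2 matrix
        nrm = math.hypot(x, y)
        pairs.append((xi, (x / nrm, y / nrm)))
    return pairs

def v_element(u, w):
    """Matrix element (u, V w) with V = [[0, 1], [1, 0]]."""
    return u[0] * w[1] + u[1] * w[0]

def lower_eigenvalue(kappa):
    return eigenpairs(kappa)[0][0]

def first_derivative(kappa):
    """Hellmann-Feynman value (u, A_dot u) for the lower eigenvalue."""
    (_, u), _ = eigenpairs(kappa)
    return v_element(u, u)

def second_derivative(kappa):
    """Curvature of the lower eigenvalue: 2 |(w, A_dot u)|^2 / (xi - xi_other), using A_ddot = 0."""
    (xi, u), (xi_other, w) = eigenpairs(kappa)
    return 2.0 * v_element(w, u) ** 2 / (xi - xi_other)

if __name__ == "__main__":
    k, h = 0.7, 1e-4
    fd1 = (lower_eigenvalue(k + h) - lower_eigenvalue(k - h)) / (2 * h)
    fd2 = (lower_eigenvalue(k + h) - 2 * lower_eigenvalue(k) + lower_eigenvalue(k - h)) / h ** 2
    print("first derivative :", first_derivative(k), "finite difference:", fd1)
    print("second derivative:", second_derivative(k), "finite difference:", fd2)
```

The finite-difference quotients serve only as an independent check; in this model the lower eigenvalue is $2 - \sqrt{1+\kappa^2}$, so both derivatives are also available in closed form.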
The derivative $\dot{p}(\kappa)$ can be obtained as in Eq. (4.8). Eq. (4.52) is known as the curvature theorem. This result, as well as the higher order derivatives, can be deduced rigorously in the same way as the first order derivative, with essentially the same assumptions extended to the order needed. Eq. (4.52) has found some useful applications in quantum chemistry and other areas [Vatsya and Pritchard, 1985].

Not much attention has been paid in the literature to the spectral differentiation of the quantities related to the continuous spectra, owing to their limited applicability. In the cases of mild perturbations considered here, continuous spectra of the perturbed and unperturbed operators are identical, both covering the positive real line. Thus, the differentiation of the spectral points has no meaning. Instead, the differentiation properties pertain to the scattering and transition operators. These are determined by the differentiation properties of their elements, which are also of more direct interest. This topic will be considered in sec. 6.III. For the present, a related result is stated in Lemma 4.18 below.

Lemma 4.18. With the symbols as in Theorem 4.8, we have
$$\frac{\partial [\kappa (g, f_+)]}{\partial \kappa} = (f_-, f_+) = \langle \Phi_-, V \Phi_+ \rangle .$$

Proof. The result follows by observing that
$$[\kappa (g, f_+)] = \kappa \left( g, [1 - \kappa K_\pm(\lambda)]^{-1} g \right),$$
together with the standard manipulations and the substitutions from Eqs. (4.42) and (4.44-a, b). Differentiability is obvious •

The result of Lemma 4.18 can be extended to include more complicated parameter dependence of the potential.

Remark 4.5. We have considered perturbations that are sufficiently mild not to cause basic changes to the spectral properties of the operator and to the spectral quantities under consideration. In the studies of spectral differentiation, the perturbation results from the variation of a parameter. Cases are encountered when an arbitrarily small perturbation can alter the spectral properties in a fundamental way. One example of such an occurrence is encountered in Theorem 4.9. In general, this topic constitutes a part of the asymptotic perturbation theory (Reed and Simon, 1978-79, Vol. IV). In addition to various other occurrences, this situation arises also for the case considered in Proposition 4.1. For $\kappa < \kappa_c$, $A$ has an eigenvalue in the negative real line and for $\kappa > \kappa_c$, it has no eigenvalue. The limit point may or may not be an eigenvalue. In any case, as $\kappa$ is increased beyond $\kappa_c$, a non-isolated eigenvalue at zero, or a sequence with accumulation point at zero, makes a transition into the continuous spectrum. For $\kappa < \kappa_c$, the eigenvalue is differentiable with derivative given by Theorem 4.9. If $\dot{A}(\kappa)$ is continuous, the derivative at zero is defined by the left continuity. The transition into the continuous spectrum, for $\kappa > \kappa_c$, is studied in terms of the transition probabilities and decay rates. Rigorous understanding of such behavior is somewhat limited and the pertaining analysis requires additional techniques, falling outside the scope of the present text. It is pertinent to remark, however, that heuristic and semi-rigorous treatments have been successfully used to study the pertaining physical phenomena.

Binding energies of quantum mechanical systems are given by the eigenvalues of self-adjoint operators. Ionization is defined by a transition of an eigenstate into the continuum. Thus, the critical strength $\kappa_c$ is of independent interest. It is clear that $\kappa_c^{-1}$ is the lowest eigenvalue of the negative definite Hilbert-Schmidt operator $K(-0)$. Thus, a sequence of converging upper bounds to $\kappa_c^{-1}$ can be generated by the variational method as shown in Theorem 3.13. The lower bound methods require a comparison operator, which is not readily available in general, although the method may be applicable to specific problems. However, an error bound converging to zero can be found with the result of Proposition 3.3, yielding a converging lower bound to $\kappa_c^{-1}$. The problem of critical parameters can be formulated as a generalized eigenvalue problem for more general quantum mechanical systems of interest (Sergeev and Kais, 1999). The case of a particle moving in a force field generated by a mild potential, e.g., Yukawa, can be treated by the present method. This and the cases of the atomic systems can be treated by the methods of ch. 3 and sec. 4.II., as will be illustrated in sec. 6.II. •
4.IV. ITERATION

Thus far we have considered only the linear homogeneous and inhomogeneous equations. The arguments based on iteration have been used frequently to develop other results. The procedure is always about the same as in the Neumann expansion of the perturbed resolvent starting with the unperturbed one. The resolvent equation is just an operator version of the linear equation $x = (a + bx)$ with the solution $x$. Its iterations generate the Neumann expansion, $x = a\,(1 + b + b^2 + \ldots)$, which converges if the perturbation $b$ is sufficiently small; equivalently, if the unperturbed solution $a$ is sufficiently close to the perturbed one. This property is exploited by various techniques, e.g., by adjusting $a$ and $b$ at every step in the variational methods, and by a judicious choice of $a$ and $b$ at some steps in numerical perturbative techniques and in hybrid schemes.

In this section, the iterative method is extended to nonlinear problems, with and without combining it with other techniques. We shall restrict ourselves to the type of problems frequently encountered in physical applications, particularly in the transport phenomena. The methods of analysis introduced are elementary in nature. In view of the properties of the equations, such methods produce valuable information, which is lost if more general results are applied, as they do not assume the available restrictive conditions. Since the linear problems constitute special cases of nonlinear ones, the results are applicable to them also. Relaxations of the conditions will be considered whenever deemed useful.

Considerations will be limited to the equations that admit a fixed point representation. With $f(x)$ being a function of one variable $x$ varying over its domain $\Omega$, a fixed point $\tilde{x}$ of $f$ is defined as the solution of $\tilde{x} = f(\tilde{x})$, if it exists. It is clear that for the fixed point equation to be meaningful, the intersection of the range of $f$ and its domain $\Omega$ must be nonempty. A map $f$ is a contraction if
$$| f(x_1) - f(x_2) | \le a\, | x_1 - x_2 |$$
for some $a < 1$. A contraction is clearly a uniformly continuous map. The sequence $\{x_n\}_{n=0}^{\infty}$ generated by the iterations of an arbitrary $x_0$ in $\Omega$ under $f$, i.e., $x_n = f(x_{n-1})$, $n = 1, 2, \ldots$, will be called the iterative sequence. The fixed point of a contraction of $\Omega$ is determined by the contraction mapping theorem, which reduces to the Neumann expansion (Lemma 2.3) for the linear case. Although essentially equivalent, slightly varied versions of this result are found in the literature. We state it in the form suitable for the present applications.

Lemma 4.19. (contraction mapping theorem) Let the sequence $\{x_n\}_{n=0}^{\infty}$ of the iterates generated with an arbitrary $x_0$ in a closed set $\Omega$ by a contraction $f$ be contained in $\Omega$. Then the sequence converges to the unique fixed point of $f$ in $\Omega$.
Proof. The inequality $| x_n - x_{n-1} | \le a^{n-1} | x_1 - x_0 |$ clearly holds for $n = 1$. Its validity for an arbitrary $n$ implies that
$$| x_{n+1} - x_n | = | f(x_n) - f(x_{n-1}) | \le a\, | x_n - x_{n-1} | \le a^{n} | x_1 - x_0 | ,$$
i.e., its validity for $(n + 1)$. Hence the inequality holds for all $n$ by induction. With arbitrary integers $n > m$, it follows that
$$| x_n - x_m | \le | x_n - x_{n-1} | + | x_{n-1} - x_{n-2} | + \ldots + | x_{m+1} - x_m | \le [a^{n-1} + a^{n-2} + \ldots + a^{m}]\, | x_1 - x_0 | \le \frac{a^{m}}{(1 - a)}\, | x_1 - x_0 | \underset{n,m\to\infty}{\longrightarrow} 0 ,$$
implying that $\{x_n\}_{n=0}^{\infty}$ is a Cauchy sequence. Since $\Omega$ is closed, it has a limit $\tilde{x}$ in $\Omega$. This together with the continuity of $f$ results in
$$\tilde{x} = \lim_{n\to\infty} x_{n+1} = \lim_{n\to\infty} f(x_n) = f(\tilde{x}) ,$$
i.e., $\tilde{x}$ is a fixed point of $f$.

For the uniqueness, assume that there is another fixed point $\tilde{x}' \ne \tilde{x}$. This implies that
$$| \tilde{x} - \tilde{x}' | = | f(\tilde{x}) - f(\tilde{x}') | \le a\, | \tilde{x} - \tilde{x}' | ,$$
i.e., $a \ge 1$, which is a contradiction •

Corollary 4.6. For a continuous function $f$ from $\Omega$ to $\Omega$, the result of Lemma 4.19 holds if $f^{l}$ is a contraction for some integer $l$.
Proof. Consider the sequences
$$\{x_{nl+m}\}_{n=0}^{\infty} = \{(f^{l})^{n}(x_m)\}_{n=0}^{\infty} = \{g^{n}(x_m)\}_{n=0}^{\infty}, \quad m = 0, 1, 2, \ldots, (l-1),$$
with $g = f^{l}$, as defined in Lemma 4.19. Since each $x_m$ is in $\Omega$, it follows from Lemma 4.19 that
$$\lim_{n\to\infty} x_{nl+m} = \tilde{y}, \quad m = 0, 1, 2, \ldots, (l-1),$$
where $\tilde{y}$ is the unique fixed point of $f^{l}$. Since there are only a finite number of sequences, for each $\varepsilon > 0$ there is an integer $N$, independent of $m$, such that $| x_{nl+m} - \tilde{y} | \le \varepsilon$ for each $n \ge N$. Since each $x_\mu$ is contained in one of the $l$ sequences $\{x_{nl+m}\}$, for sufficiently large integers $\mu, \nu$ we have that $| x_\mu - x_\nu | \le | x_\mu - \tilde{y} | + | \tilde{y} - x_\nu | \le 2\varepsilon$, and hence, $\{x_\mu\}$ is a Cauchy sequence converging to $\tilde{y}$. Since
$$\tilde{y} = \lim_{\mu\to\infty} x_{\mu+1} = \lim_{\mu\to\infty} f(x_\mu) = f(\tilde{y}) ,$$
$\tilde{y}$ is a fixed point of $f$. The uniqueness follows from the fact that each fixed point of $f$ is also a fixed point of $f^{l}$ and $\tilde{y}$ is uniquely defined •
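Lemma 4.19 and Corollary 4.6 can be illustrated with a short sketch. The example map, the cosine, is an assumption chosen for illustration and does not come from the text. On the closed set $[0, 1]$ the cosine maps the set into itself with contraction constant $a = \sin 1$, so Lemma 4.19 applies directly and the bound $a^{m}(1-a)^{-1}|x_1 - x_0|$ from its proof controls the error. On the whole real line the cosine is not a contraction, since $|\sin x|$ reaches one, but its second iterate is, which is the situation of Corollary 4.6 with $l = 2$.

```python
import math

def iterative_sequence(f, x0, steps):
    """Return [x_0, x_1, ..., x_steps] with x_n = f(x_{n-1})."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs

def a_priori_bound(a, x0, x1, m):
    """Bound a^m / (1 - a) * |x_1 - x_0| on |x_fixed - x_m| from the proof of Lemma 4.19."""
    return a ** m / (1.0 - a) * abs(x1 - x0)

# Lemma 4.19 on the closed set [0, 1]: |cos'(x)| <= sin(1) < 1 there.
a = math.sin(1.0)
xs = iterative_sequence(math.cos, 0.0, 200)
x_fixed = xs[-1]

# Corollary 4.6 on the real line: cos is not a contraction, but cos(cos(x)) is,
# because cos maps the line into [-1, 1], where |sin| <= sin(1).
ys = iterative_sequence(math.cos, 10.0, 200)

if __name__ == "__main__":
    print("fixed point from x0 = 0 :", x_fixed)
    print("fixed point from x0 = 10:", ys[-1])
    for m in (5, 10, 20):
        print(m, abs(xs[m] - x_fixed), a_priori_bound(a, xs[0], xs[1], m))
```

The a priori bound is typically pessimistic, since it uses the worst contraction constant on the whole set rather than the local slope near the fixed point.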
The contraction mapping theorem, Lemma 4.19, by itself is of limited numerical value. However, as in the linear case, it forms the foundation for the development of various other methods. Also, parts of the arguments can be used to supplement other deductions. By itself or in combination with adjustments, the result is widely applicable. One of the adjustments is in selecting $x_0$, which is essentially the unperturbed solution. In certain situations, a proper choice can improve the properties of the approximating sequence. Some of the most useful variations of the method are based on the re-statement of the fixed point equation as
$$[1 - \eta]\,\tilde{x} = [ f(\tilde{x}) - \eta\,\tilde{x} ] , \tag{4.53}$$
with a judicious choice of $\eta$, which can be and usually is a function. This characterizes $\tilde{x}$ as the fixed point of the function $h(x) = [1 - \eta]^{-1} [ f(x) - \eta x ]$. This method is useful in constructing a contraction $h(x)$ while $f(x)$ may not be one, or to improve the rate of convergence. The contraction mapping theorem can then be applied to $h(x)$. The simplest choice is to set $\eta = \text{const}$. For a differentiable $f$, taking its derivative $\dot{f}$ for $\eta$ is known as Newton's method. If instead of $\dot{f}$, its finite difference approximation is used, the procedure is known as the Newton-Raphson method (this finite difference variant is more commonly called the secant method). The most suitable points for finite difference approximations are the members of the sequence itself, i.e.,
$$\eta(x_0) = 0, \quad \eta(x_n) = [ f(x_n) - f(x_{n-1}) ] / [ x_n - x_{n-1} ] ,$$
where $x_n = h(x_{n-1})$, $n = 1, 2, \ldots$.

Most frequently, we shall encounter continuously differentiable maps. If $\dot{f}$ is bounded by a constant less than one, then $f$ is clearly a contraction and if $\dot{f}$ does not change sign,
$f$ is monotonic. If $\dot{f}$ is a continuous and non-decreasing function, then $f$ is called a convex function on $\Omega$, and if $\dot{f}$ is continuous and non-increasing, the function is concave. These definitions are stronger than the standard ones but sufficient and convenient for the present purpose. As usual, strict observance of each property will be explicitly emphasized. A set $\Omega$ is termed a convex set if for each $x_1, x_2$ in $\Omega$, $[a x_1 + (1 - a) x_2]$ is in $\Omega$ for each $0 \le a \le 1$. The following result is straightforward to prove and admits obvious variations. It is stated for its usefulness in some applications.

Proposition 4.2.
(i) Let $f(x)$, an increasing function with $f(0) > 0$, have a positive fixed point $\tilde{x}$, assumed to be minimal. Then the iterative sequence $\{x_n\}_{n=0}^{\infty}$ with $x_0 \le \tilde{x}$ converges to $\tilde{x}$ from below, i.e., $x_n \uparrow \tilde{x}$ as $n \to \infty$. A strictly increasing function generates a strictly increasing sequence as long as $x_0 < \tilde{x}$.
(ii) Let $f(x)$, a decreasing contraction with $f(0) > 0$, have a positive fixed point $\tilde{x}$, assumed to be minimal. Then the iterative sequence $\{x_n\}_{n=0}^{\infty}$ with $x_0 \le \tilde{x}$ produces alternating bounds: $x_{2n} \uparrow \tilde{x} \downarrow x_{2n+1}$ as $n \to \infty$. A strictly decreasing function generates strictly increasing and strictly decreasing converging bounds as long as $x_0 < \tilde{x}$.
(iii) Let $f(x)$, an increasing convex function with $f(0) > 0$, have a positive fixed point $\tilde{x}$, assumed to be minimal. Then the Newton sequence $\{x_n\}_{n=0}^{\infty}$ with $x_0 \le \tilde{x}$ converges to $\tilde{x}$ from below, i.e., $x_n \uparrow \tilde{x}$ as $n \to \infty$. A strictly convex function generates a strictly increasing sequence as long as $x_0 < \tilde{x}$.
(iv) Let $f(x)$, a decreasing concave function with $f(0) > 0$, have a positive fixed point $\tilde{x}$, assumed to be minimal. Then the Newton sequence $\{x_n\}_{n=0}^{\infty}$ with arbitrary $x_0$ in its domain produces upper bounds: $x_n \downarrow \tilde{x}$, $n > 0$, as $n \to \infty$. A strictly concave function generates a strictly decreasing sequence of bounds as long as $x_0 \ne \tilde{x}$ •

Let $B$ be a Banach space with the norm denoted by $\| \cdot \|$. A map $A$ is a contraction on $\Omega$ if $\| Au - A\upsilon \| \le a\, \| u - \upsilon \|$ for $u, \upsilon$ in $\Omega$ for some $a < 1$. The contraction mapping theorem still holds. Proofs of Lemma 4.19 and Corollary 4.6 require only the replacement of the absolute value with $\| \cdot \|$. Therefore, the results are stated without proofs.
Lemma 4.20. (contraction mapping theorem) Let the sequence $\{u_n\}_{n=0}^{\infty}$ of the iterates generated with an arbitrary $u_0$ in a closed set $\Omega$ by a contraction $A$ be contained in $\Omega$. Then the sequence converges strongly to the unique fixed point of $A$ in $\Omega$ •

Corollary 4.7. For a continuous operator $A$ from $\Omega$ to $\Omega$, the result of Lemma 4.20 holds if $A^{l}$ is a contraction for some integer $l \ge 1$ •

A convex set or convex hull $\Omega$ in $B$ is still defined as above, i.e., the set containing all vectors $[a u + (1 - a)\upsilon]$ whenever $u, \upsilon$ are in $\Omega$ and $0 \le a \le 1$. If there is an operator $\dot{A}(\upsilon)$ such that the strong derivative
$$\frac{d}{d\tau} A[\upsilon + \tau u] \Big|_{\tau = 0} = \text{s-}\lim_{\varepsilon \to 0} \frac{( A[\upsilon + \varepsilon u] - A[\upsilon] )}{\varepsilon} = \dot{A}(\upsilon)\, u , \tag{4.54-a}$$
then $A$ is said to be Fréchet differentiable, abbreviated differentiable, at $\upsilon$ with derivative $\dot{A}(\upsilon)$. If $\dot{A}(\upsilon)$ is independent of $\upsilon$, then $A$ is clearly linear. If $\dot{A}(\upsilon)$ exists on a closed convex set $\Omega$, then
$$[ Au - A\upsilon ] = \int_0^1 d\tau\, \dot{A}[\upsilon + \tau (u - \upsilon)]\,(u - \upsilon), \quad u, \upsilon \text{ in } \Omega , \tag{4.54-b}$$
which is an extension of $f(x) - f(y) = \int_y^x dx'\, \dot{f}(x')$ for the functions of a single variable, obtained by the variable change from $x'$ to $\tau = (x' - y)/(x - y)$. The formula given by Eq. (4.54-b) indicates the significance of the convexity of the underlying set. The higher order derivatives are defined in the same manner.

Eq. (4.53) can now be extended to the operators:
$$[1 - \eta]\, u = [ Au - \eta u ] , \tag{4.55-a}$$
where $u$ is a fixed point of $A$, i.e., $Au = u$. Eq. (4.55-a) can be used with $\eta = \text{const}$, or Newton's method can be extended by taking $\eta = \dot{A}$, i.e.,
$$[1 - \dot{A}(\upsilon)]\, u = [ Au - \dot{A}(\upsilon) u ] , \tag{4.55-b}$$
where $\upsilon$ is an approximation to $u$. The Newton-Raphson method can also be extended by an appropriate approximation to $\dot{A}(\upsilon) u$ in Eq. (4.54-a).
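To close the section, the re-stated fixed point equation, Eq. (4.53), can be sketched for a scalar map. The example $f(x) = (1 + x^2)/4$ is an assumption chosen for illustration, not taken from the text: it is increasing and convex on $x \ge 0$ with $f(0) > 0$, and its minimal positive fixed point is $2 - \sqrt{3}$, so parts (i) and (iii) of Proposition 4.2 apply. The function names are hypothetical.

```python
import math

def f(x):
    """Illustrative increasing convex map with f(0) > 0 (an assumed example)."""
    return (1.0 + x * x) / 4.0

def f_dot(x):
    """Derivative of f."""
    return x / 2.0

def relaxed_sequence(f, eta, x0, steps):
    """Iterate x_n = h(x_{n-1}) with h(x) = [f(x) - eta(x) x] / [1 - eta(x)], cf. Eq. (4.53)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        e = eta(x)
        xs.append((f(x) - e * x) / (1.0 - e))
    return xs

x_min = 2.0 - math.sqrt(3.0)                       # minimal positive fixed point of f

plain = relaxed_sequence(f, lambda x: 0.0, 0.0, 3)  # eta = 0: the plain iterative sequence
newton = relaxed_sequence(f, f_dot, 0.0, 3)         # eta = f_dot: the Newton sequence

if __name__ == "__main__":
    for n, (xp, xn) in enumerate(zip(plain, newton)):
        print(n, xp, xn, x_min)
```

Both sequences start at $x_0 = 0$ and increase towards the fixed point from below; the Newton choice of $\eta$ only changes the rate at which the same monotone bounds tighten.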
II. APPLICATIONS
Chapter 5
5. MATRICES

5.I. TRIDIAGONAL MATRICES

Structured and non-structured matrices are encountered in a variety of physical problems. They also arise as a result of the applications of other methods to solve infinite-dimensional operator equations, particularly the applications of variational methods. Although the tridiagonal matrices constitute a sub-class of structured matrices, they are of sufficient interest to warrant a separate treatment, particularly due to some properties not shared by the others. Clearly, all structured matrices are special cases of the general matrices, which are operators in finite-dimensional spaces. The purpose of an independent treatment is to exploit the additional properties that each class possesses.

Tridiagonal matrices play a significant role in the studies pertaining to the Jacobi matrices arising in the moment problems, and more general problems of symmetric operators in Hilbert spaces, as discussed in sec. 3.III. Here we extend an approach outlined in sec. 2.V., to invert the differential operators, to a tridiagonal matrix. In addition to being an interesting application, the procedure yields a basic property of the inverse, which can be used to deduce the other known properties derived by more complicated methods. Computationally, this property can be exploited to invert a tridiagonal matrix more efficiently. Further, the resulting expression gives the inverse explicitly in terms of the matrix elements, providing useful information for modeling where the elements are physical quantities impacting upon some other physical quantities depending on the inverse.

Although the same arguments are applicable to the complex matrices and extend to the block tridiagonal matrices, which include the banded matrices [Vatsya and Pritchard, 1982], for simplicity, we restrict to the case of a real tridiagonal matrix $A$. This case is sufficient to explain all the essential arguments of the procedure. Initially the results will be established for the simple matrices, i.e., those with none of the co-diagonal elements equal to zero, which will then be extended to include the non-simple cases. Consider the equation
$$A f = g , \tag{5.1}$$
assuming that $A$ is invertible. Guided by the interpretation of the method to solve first order differential equations, Eq. (5.1) will be treated by the following procedure: find a non-invertible extension $A'$ of $A$ and an extension $g'$ of $g$ such that the equation
$$A' f' = g' \tag{5.2}$$
is solvable and Eq. (5.2) under suitable boundary conditions reduces to Eq. (5.1). To avoid confusion with the sequences of vectors, the elements of $f$, $g$ and their extensions will be denoted by $\xi$ and $\gamma$, respectively, with similar notation for other vectors. The following results will be needed in the process.

Lemma 5.1. Let $T$ be a simple tridiagonal matrix in $L_2(M)$. The first and last element of each of its eigenvectors are non-zero.

Proof. Since $(T - \lambda)$, with $\lambda$ being a constant multiple of the identity matrix, is also a simple tridiagonal matrix, it is sufficient to consider the case $\lambda = 0$. Assume that there is a vector $u \ne 0$ such that $T u = 0$. With the elements of $u$ denoted by $\alpha_n$, assume that $\alpha_1 = 0$. Since $T_{12} \ne 0$, it follows that $\alpha_2 = - T_{11} \alpha_1 / T_{12} = 0$. For $n = 2$ to $(M - 1)$, we have by induction,
$$\alpha_{n+1} = - ( T_{n(n-1)} \alpha_{n-1} + T_{nn} \alpha_n ) / T_{n(n+1)} = 0 ,$$
and hence, $u = 0$, leading to a contradiction. Hence $\alpha_1 \ne 0$. The case of the last element is proven similarly by proceeding from the last element to the first •

Corollary 5.1. Each eigenvalue of a simple tridiagonal matrix is simple.

Proof. From Lemma 5.1, each eigenvector can be normalized so that its first element is equal to one. If there are two eigenvectors $u \ne \upsilon$ corresponding to the same eigenvalue $\lambda$, normalized as indicated, then there is a non-zero vector $w = u - \upsilon$ such that $(T - \lambda) w = 0$. This contradicts the result of Lemma 5.1, since the first element of $w$ is equal to zero •

The space $L_2(N + 1)$ will be expressed as the direct sum of $L_2(N)$ and the one-dimensional space $L_2^{N+1}(1)$. A vector $u$ with elements $\alpha_1, \alpha_2, \ldots, \alpha_{N+1}$ can be expressed as $u = \upsilon \oplus w$, where $\upsilon$ has elements $\alpha_1, \alpha_2, \ldots, \alpha_N$, and $w = \alpha_{N+1}$. The vector $u$ will be expressed as $\upsilon \oplus \alpha_{N+1}$.
Theorem 5.1. An invertible simple tridiagonal matrix $A$ in $L_2(N)$ admits a non-invertible simple tridiagonal extension $A'$ to $L_2(N + 1) = L_2(N) \oplus L_2^{N+1}(1)$, such that $A'( f \oplus 0) = ( A f \oplus \xi_N )$, where $f$ is an arbitrary vector in $L_2(N)$ with elements $\xi_n$.

Proof. Let $\alpha_1 = 1$, $\alpha_2 = - A_{11} / A_{12}$,
$$\alpha_{n+1} = - ( A_{n(n-1)} \alpha_{n-1} + A_{nn} \alpha_n ) / A_{n(n+1)} , \quad n = 2, 3, \ldots, (N - 1) ,$$
and $\alpha_{N+1} = - ( A_{N(N-1)} \alpha_{N-1} + A_{NN} \alpha_N )$. We have that $\alpha_{N+1} \ne 0$; otherwise there is a non-zero vector $u_0$ in $L_2(N)$, i.e., with elements $\alpha_n$, $n = 1$ to $N$, such that $A u_0 = 0$, which contradicts the invertibility of $A$ (Lemma 2.1). Define $A'$ as follows:
$$A'_{nm} = A_{nm} , \; n, m = 1, 2, \ldots, N ; \quad A'_{N(N+1)} = A'_{(N+1)N} = 1 , \quad A'_{(N+1)(N+1)} = - \alpha_N / \alpha_{N+1} . \tag{5.3}$$
It is clear that $A'$ is a simple matrix with $A' u = 0$, where $u$ is a vector with elements $\alpha_n$, $n = 1$ to $(N + 1)$, and $u$ is unique from Corollary 5.1. Validity of the equality $A'( f \oplus 0) = ( A f \oplus \xi_N )$ can now be checked by direct computation •

Since $A'$ has zero as a simple eigenvalue with the corresponding eigenvector $u$, there is a unique vector $\hat{u}$ in $L_2(N + 1)$ with elements $\hat{\alpha}_n$, such that $\hat{A}' \hat{u} = 0$, where $\hat{A}'$ is the adjoint of $A'$. The elements of $\hat{u}$ can be computed by transposing the expressions for $\alpha_n$ given in Theorem 5.1. We state the result as

Corollary 5.2. With $\hat{A}' \hat{u} = 0$, the elements $\hat{\alpha}_n$ of $\hat{u}$ are given by
$$\hat{\alpha}_1 = 1 , \quad \hat{\alpha}_2 = - A_{11} / A_{21} , \quad \hat{\alpha}_{n+1} = - ( A_{(n-1)n} \hat{\alpha}_{n-1} + A_{nn} \hat{\alpha}_n ) / A_{(n+1)n} , \; n = 2 \text{ to } (N - 1) , \quad \hat{\alpha}_{N+1} = - ( A_{(N-1)N} \hat{\alpha}_{N-1} + A_{NN} \hat{\alpha}_N ) . \tag{5.4}$$

The equality $A'( f \oplus 0) = ( A f \oplus \xi_N )$ of Theorem 5.1 constitutes the basis of the present procedure, which implies

Corollary 5.3. Let $f' = ( h \oplus 0)$ be an otherwise arbitrary solution of
$A' f' = g' = ( g \oplus \gamma_{N+1} )$, with the elements of $h$ in $L_2(N)$ denoted by $\xi'_n$. Then $h = f$, where $f$ is the solution of $A f = g$ with the elements $\xi'_n = \xi_n$, and $\xi'_N = \xi_N = \gamma_{N+1}$.

Proof. It follows from Theorem 5.1 and the stated assumptions that
$$A' f' = ( A h \oplus \xi'_N ) = g' = ( g \oplus \gamma_{N+1} ) ,$$
i.e., $A h = g$ and $\xi'_N = \gamma_{N+1}$. Now the invertibility of $A$ implies that $h = f$, where $f$ is the solution of $A f = g$ with the elements $\xi_n$ and hence, $\xi'_N = \xi_N = \gamma_{N+1}$ •

The result of Corollary 5.3 constitutes a solution scheme for Eq. (5.1), provided that Eq. (5.2) has a solution of the required form, which we establish in Lemma 5.2 below.

Lemma 5.2. Let $\gamma_{N+1} = - \hat{\alpha}_{N+1}^{-1} \sum_{n=1}^{N} \hat{\alpha}_n \gamma_n$ with $\hat{\alpha}_n$ as defined in Corollary 5.2. Then $A' f' = g'$ has a solution $f'$ with the representation $f' = ( h \oplus 0)$. Furthermore, $h = f$ with $\xi_N = \gamma_{N+1}$.

Proof. The selected value of $\gamma_{N+1}$ implies that $(\hat{u}, g') = 0$. Hence, by the Fredholm alternative, $A' f'(\kappa) = g'$ has a one-parameter family of solutions, given by $f'(\kappa) = \kappa u + \tilde{f}$, where $u$ is as defined in Theorem 5.1 and $\tilde{f}$ is an arbitrary solution of $A' \tilde{f} = g'$, with the elements $\tilde{\xi}_n$. From Lemma 5.1, $\alpha_{N+1} \ne 0$ and hence $\kappa = - \tilde{\xi}_{N+1} / \alpha_{N+1}$ is uniquely defined. Taking this value for $\kappa$ yields the stated representation. The second assertion follows from Corollary 5.3 •

With the result of Lemma 5.2 established, the solution $f$ of Eq. (5.1) can be obtained by back substitution. However, we will pursue the analogy with the differential equations further and obtain $f$ explicitly by expressing $f'$ as $f' = u h$.

Theorem 5.2. Assuming that $\alpha_n$ and $\hat{\alpha}_n$, as defined in Theorem 5.1 and Corollary 5.2, respectively, are all non-zero, the elements $\xi_n$ of $f$ are given by
$$\xi_n = - \alpha_n \sum_{m=n}^{N} \frac{1}{\hat{\alpha}_m A'_{m(m+1)} \alpha_{m+1}} \sum_{l=1}^{m} \hat{\alpha}_l \gamma_l .$$
Proof. From Lemma 5.2, it is sufficient to find a solution of Eq. (5.2) with $\xi_{N+1} = 0$. Eq. (5.2) is equivalent to
$$\tilde{A}_{n(n-1)} \tilde{\xi}_{n-1} + \tilde{A}_{nn} \tilde{\xi}_n + \tilde{A}_{n(n+1)} \tilde{\xi}_{n+1} = \hat{\alpha}_n \gamma_n , \quad n = 1, 2, \ldots, (N + 1) ,$$
where $\xi_n = \alpha_n \tilde{\xi}_n$, $\tilde{A}_{nm} = \hat{\alpha}_n A'_{nm} \alpha_m = \hat{\alpha}_m A'_{mn} \alpha_n$, which can be checked directly, and we have taken $\tilde{A}_{10} = \tilde{A}_{(N+1)(N+2)} = 0$ for convenience. Since the rows of the matrix $\tilde{A}$ add to zero, $\tilde{\xi}_n$ can be expressed as $\tilde{\xi}_n = \sum_{m=n}^{N} \eta_m$, which incorporates the condition $\tilde{\xi}_{N+1} = 0$. The elements $\eta_m$ satisfy the equation
$$\tilde{A}_{n(n-1)} \eta_{n-1} - \tilde{A}_{n(n+1)} \eta_n = \hat{\alpha}_n \gamma_n , \quad n = 1, 2, \ldots, N ,$$
yielding
$$\eta_m = - \frac{1}{\hat{\alpha}_m A'_{m(m+1)} \alpha_{m+1}} \sum_{l=1}^{m} \hat{\alpha}_l \gamma_l , \quad m = 1, 2, 3, \ldots, N$$
•
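The formula of Theorem 5.2 can be evaluated directly, as in the following minimal sketch. It assumes a real simple tridiagonal matrix of order $N \ge 2$ stored as a list of rows, with all $\alpha_n$, $\hat{\alpha}_n$ non-zero; the function names are hypothetical and indices are zero-based, so that `al[n]` holds $\alpha_{n+1}$.

```python
def null_vector(A, transpose=False):
    """alpha_1..alpha_{N+1} of Theorem 5.1, or alpha-hat of Corollary 5.2 if transpose is True."""
    N = len(A)                                   # assumes N >= 2
    el = (lambda i, j: A[j][i]) if transpose else (lambda i, j: A[i][j])
    al = [1.0, -el(0, 0) / el(0, 1)]
    for n in range(1, N - 1):                    # three-term recursion along the rows
        al.append(-(el(n, n - 1) * al[n - 1] + el(n, n) * al[n]) / el(n, n + 1))
    # last element: the extension has co-diagonal element A'_{N(N+1)} = 1
    al.append(-(el(N - 1, N - 2) * al[N - 2] + el(N - 1, N - 1) * al[N - 1]))
    return al

def solve_theorem_5_2(A, g):
    """Solve A f = g by the double sum of Theorem 5.2."""
    N = len(A)
    al = null_vector(A)
    ah = null_vector(A, transpose=True)
    # upper co-diagonal of the extension A': A_{m(m+1)} for m < N, and 1 for m = N
    upper = [A[m][m + 1] for m in range(N - 1)] + [1.0]
    inner, partial = [], 0.0
    for m in range(N):                           # eta_m without the overall minus sign
        partial += ah[m] * g[m]
        inner.append(partial / (ah[m] * upper[m] * al[m + 1]))
    xi, tail = [0.0] * N, 0.0
    for n in range(N - 1, -1, -1):               # xi_n = -alpha_n * sum_{m >= n} inner_m
        tail += inner[n]
        xi[n] = -al[n] * tail
    return xi

if __name__ == "__main__":
    A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    print(solve_theorem_5_2(A, [1.0, 0.0, 0.0]))
```

The sketch is meant only to make the structure of the double sum concrete; since it works with $\alpha_n$ and $\hat{\alpha}_n$ themselves, it is not intended for matrices of large order.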
Remark 5.1. In Theorem 5.2, we have obtained the inverse of a tridiagonal matrix by constructing a one-parameter family of the solutions of a non-invertible matrix and by selecting the desired solution. At times one encounters a non-invertible tridiagonal matrix and the inverse of its restriction is desired, e.g., the solution of Eq. (5.1) with $(\hat{w}, g) = 0$ such that $(w', f) = 0$. The procedure of Theorem 5.2 can be used to solve such systems essentially by the same arguments. All that is required is to replace $\alpha, \hat{\alpha}$ by $w, \hat{w}$ and obtain $f$ by determining $\kappa$ from $(w', \kappa w + h) = 0$, where $(\kappa w + h)$ is a one-parameter family of the solutions [Vatsya and Pritchard, 1980] •

This completes the analogy with the first order differential equations. However, the solution obtained in Theorem 5.2 is neither expressed entirely in terms of the elements of the original matrix $A$, nor is it convenient for computation, since $\alpha_n$, $\hat{\alpha}_n$ suffer from overflows even for well-conditioned matrices. In the following, this solution is reduced, essentially by algebraic manipulations, to eliminate both of these limitations. The procedure overlaps with the analysis of the continued fraction expansions in terms of the associated orthogonal polynomials (Lemma 3.12).

First we express the solution in terms of the Jacobi polynomials $p_n(\lambda)$ associated with the matrix $A$, needing no new elements of the extended matrix $A'$. The Jacobi polynomials are defined by
$$p_0(\lambda) = 1 , \quad p_1(\lambda) = (\lambda - A_{11}) , \quad p_n(\lambda) = (\lambda - A_{nn})\, p_{n-1}(\lambda) - A_{(n-1)n} A_{n(n-1)}\, p_{n-2}(\lambda) , \; n = 2, 3, \ldots, N . \tag{5.5}$$
We have

Lemma 5.3. The elements $\alpha_n$ and $\hat{\alpha}_n$ defined in Theorem 5.1 and Corollary 5.2 are expressed in terms of the Jacobi polynomials associated with the matrix $A$ as
$$\alpha_n = p_{n-1}(0) \prod_{m=1}^{n-1} \frac{1}{A'_{m(m+1)}} , \quad \hat{\alpha}_n = p_{n-1}(0) \prod_{m=1}^{n-1} \frac{1}{A'_{(m+1)m}} , \quad n = 1, 2, \ldots, (N + 1) .$$

Proof. We give a proof for $\alpha_n$. The result for $\hat{\alpha}_n$ follows by transposing the elements. The result is clearly true for $n = 1$. Assume that it is true for numbers up to $n$. Then
$$\alpha_{n+1} = - ( A'_{n(n-1)} \alpha_{n-1} + A'_{nn} \alpha_n ) / A'_{n(n+1)} = - \frac{1}{A'_{n(n+1)}} \left[ p_{n-1}(0)\, A'_{nn} \prod_{m=1}^{n-1} \frac{1}{A'_{m(m+1)}} + p_{n-2}(0)\, A'_{n(n-1)} \prod_{m=1}^{n-2} \frac{1}{A'_{m(m+1)}} \right] = p_n(0) \prod_{m=1}^{n} \frac{1}{A'_{m(m+1)}} .$$
The result follows by induction •

Lemma 5.4. The elements $\xi_n$ of the solution $f$ of Eq. (5.1) are given by
$$\xi_n = - \sum_{m=n}^{N} \sum_{l=1}^{m} \frac{p_{n-1}(0)\, p_{l-1}(0)}{p_{m-1}(0)\, p_m(0)} \left[ \prod_{\mu=n}^{m-1} A_{\mu(\mu+1)} \right] \left[ \prod_{\nu=l}^{m-1} A_{(\nu+1)\nu} \right] \gamma_l , \quad n = 1, 2, \ldots, N .$$

Proof. Follows by substitutions for $\alpha_n$ and $\hat{\alpha}_n$ from Lemma 5.3 in the result of Theorem 5.2 •

The solution given by Lemma 5.4, although expressed entirely in terms of the elements of $A$, is also inconvenient for applications for the same reason as the expression of Theorem 5.2. This expression can be adjusted to reduce it to a usable form in terms of the Jacobi rationals $\bar{p}_n(\lambda) = p_n(\lambda) / p_{n-1}(\lambda)$, defined from Eq. (5.5) by
$$\bar{p}_0(\lambda) = 1 , \quad \bar{p}_1(\lambda) = (\lambda - A_{11}) , \quad \bar{p}_n(\lambda) = (\lambda - A_{nn}) - \frac{A_{(n-1)n} A_{n(n-1)}}{\bar{p}_{n-1}(\lambda)} , \; n = 2, 3, \ldots, N . \tag{5.6}$$
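The recursions of Eqs. (5.5) and (5.6) can be sketched as follows, with the matrix stored as a list of rows and hypothetical function names. The recursion (5.5) is the standard three-term recursion for the characteristic polynomials of the leading blocks of $A$, so $p_n(\lambda)$ can grow or decay geometrically with $n$, whereas the rationals remain of moderate size. The diagonally dominant test matrix in the usage part is an assumption chosen to make this visible in double precision.

```python
import math

def jacobi_polynomials(A, lam=0.0):
    """p[n] = p_n(lam), n = 0..N, from Eq. (5.5)."""
    N = len(A)
    p = [1.0, lam - A[0][0]]
    for n in range(2, N + 1):
        p.append((lam - A[n - 1][n - 1]) * p[n - 1]
                 - A[n - 2][n - 1] * A[n - 1][n - 2] * p[n - 2])
    return p

def jacobi_rationals(A, lam=0.0):
    """r[n] = p_n(lam) / p_{n-1}(lam), n = 1..N (r[0] = 1), from Eq. (5.6)."""
    N = len(A)
    r = [1.0, lam - A[0][0]]
    for n in range(2, N + 1):
        r.append((lam - A[n - 1][n - 1]) - A[n - 2][n - 1] * A[n - 1][n - 2] / r[n - 1])
    return r

def tridiagonal(N, diag, off):
    """Constant-coefficient tridiagonal test matrix (illustrative helper)."""
    return [[diag if i == j else (off if abs(i - j) == 1 else 0.0) for j in range(N)]
            for i in range(N)]

if __name__ == "__main__":
    small = tridiagonal(3, 2.0, -1.0)
    print(jacobi_polynomials(small), jacobi_rationals(small))
    big = tridiagonal(600, 4.0, -1.0)
    # the polynomial leaves the floating point range, the rational does not
    print(math.isfinite(jacobi_polynomials(big)[-1]), jacobi_rationals(big)[-1])
```

Working with the ratios directly avoids forming the polynomials at all, which is the practical reason for preferring Eq. (5.6) over Eq. (5.5) in computations.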
Computationally, the Jacobi rationals are quite convenient to evaluate. They are used in the standard methods, although not by this name.
Theorem 5.3. With the Jacobi rationals of Eq. (5.6), written $\bar p_m(\lambda) = p_m(\lambda)/p_{m-1}(\lambda)$, let
$$a_m(\lambda) = A_{m(m+1)}/\bar p_m(\lambda),\qquad b_m(\lambda) = A_{(m+1)m}/\bar p_m(\lambda),\qquad m = 1,2,3,\dots,(N-1).$$
Then the elements $\xi_n$ of the solution f of Eq. (5.1) are given by
$$\xi_n = -\sum_{m=n}^{N}\sum_{l=1}^{m}\frac{1}{\bar p_m(0)}\left[\prod_{\mu=n}^{m-1}a_\mu(0)\right]\left[\prod_{\nu=l}^{m-1}b_\nu(0)\right]\gamma_l = -\sum_{l=1}^{N}\gamma_l\sum_{m=\max(n,l)}^{N}\frac{1}{\bar p_m(0)}\left[\prod_{\mu=n}^{m-1}a_\mu(0)\right]\left[\prod_{\nu=l}^{m-1}b_\nu(0)\right],\qquad n = 1,2,3,\dots,N.$$
Proof. Follows by substitution from Eq. (5.6) in Lemma 5.4 •
The above procedure can be used to obtain the solution of $(A - \lambda)f(\lambda) = g$. All that is required is to replace the Jacobi rationals $\bar p_m(0)$ with $\bar p_m(\lambda)$ in the result of Theorem 5.3. Furthermore, $(A - \lambda)^{-1}$ can be obtained from the same result, as stated in Corollary 5.4.
Corollary 5.4. With symbols as in Theorem 5.3,
$$[A - \lambda]^{-1}_{nj} = -\sum_{m=\max(n,j)}^{N}\frac{1}{\bar p_m(\lambda)}\left[\prod_{\mu=n}^{m-1}a_\mu(\lambda)\right]\left[\prod_{\nu=j}^{m-1}b_\nu(\lambda)\right],\qquad n, j = 1,2,\dots,N.$$
Proof. The result is obtained by letting $\gamma_l = \delta_{lj}$ and by replacing $\bar p_m(0)$ with $\bar p_m(\lambda)$ in the result of Theorem 5.3 •
Corollary 5.5. With the symbols as in Corollary 5.4, we have
$$[A - \lambda]^{-1}_{NN} = -\frac{1}{\bar p_N(\lambda)},$$
$$[A - \lambda]^{-1}_{(N-n)(N-n)} = a_{N-n}(\lambda)\,b_{N-n}(\lambda)\,[A - \lambda]^{-1}_{(N-n+1)(N-n+1)} - \frac{1}{\bar p_{N-n}(\lambda)},\qquad n = 1,2,\dots,(N-1);$$
$$[A - \lambda]^{-1}_{(n-1)j} = a_{n-1}(\lambda)\,[A - \lambda]^{-1}_{nj},\qquad n \le j \le N;$$
$$[A - \lambda]^{-1}_{n(j-1)} = b_{j-1}(\lambda)\,[A - \lambda]^{-1}_{nj},\qquad j \le n \le N.$$
Proof. The result follows from Corollary 5.4 •
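The recursions of Corollary 5.5 translate almost line by line into code. The following is a minimal sketch, not a tuned implementation: the function name, the dense list-of-rows storage and the use of exact `Fraction` arithmetic in the usage example are assumptions made for illustration, and the sketch assumes that no Jacobi rational vanishes at the chosen λ.

```python
from fractions import Fraction


def tridiagonal_inverse(A, lam=0):
    """Inverse of (A - lam) for a tridiagonal matrix A given as a list of rows.

    Follows Corollary 5.5: diagonal elements from the last to the first,
    then the off-diagonal elements by multiplication with a_m and b_m.
    """
    N = len(A)
    # Jacobi rationals of Eq. (5.6); index 0 is unused so that pbar[m] matches the text.
    pbar = [None] * (N + 1)
    pbar[1] = lam - A[0][0]
    for n in range(2, N + 1):
        pbar[n] = (lam - A[n - 1][n - 1]) - A[n - 2][n - 1] * A[n - 1][n - 2] / pbar[n - 1]
    # a_m = A_{m(m+1)} / pbar_m and b_m = A_{(m+1)m} / pbar_m, m = 1..N-1.
    a = [None] + [A[m - 1][m] / pbar[m] for m in range(1, N)]
    b = [None] + [A[m][m - 1] / pbar[m] for m in range(1, N)]

    inv = [[0] * N for _ in range(N)]
    # Diagonal: first two equalities of Corollary 5.5.
    inv[N - 1][N - 1] = -1 / pbar[N]
    for k in range(N - 1, 0, -1):  # k is the 1-based row/column index
        inv[k - 1][k - 1] = a[k] * b[k] * inv[k][k] - 1 / pbar[k]
    # Above the diagonal: inv_{(n-1) j} = a_{n-1} inv_{n j}, n <= j.
    for j in range(1, N + 1):
        for n in range(j, 1, -1):
            inv[n - 2][j - 1] = a[n - 1] * inv[n - 1][j - 1]
    # Below the diagonal: inv_{n (j-1)} = b_{j-1} inv_{n j}, j <= n.
    for n in range(1, N + 1):
        for j in range(n, 1, -1):
            inv[n - 1][j - 2] = b[j - 1] * inv[n - 1][j - 1]
    return inv


# Usage on an illustrative non-symmetric matrix, in exact arithmetic.
M = [[Fraction(x) for x in row] for row in [[2, 1, 0], [3, 4, 1], [0, 2, 5]]]
M_inv = tridiagonal_inverse(M)
product = [[sum(M[i][k] * M_inv[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
```

Each off-diagonal element costs a single multiplication once the diagonal is known, which is the property singled out in the discussion of the operation count.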
The solutions given by Theorem 5.3, Corollary 5.4 and Corollary 5.5 are defined completely in terms of the elements of the original matrix A. The first two equalities in Corollary 5.5 enable the construction of the diagonal elements recursively, starting with the last. The last two equalities enable the completion of the construction by simple multiplications of the rows above and the columns below the diagonal by $a_m(\lambda)$ and $b_m(\lambda)$, respectively, which is more efficient than the standard methods in use. The last two equalities also include a number of other weaker results [Vatsya and Pritchard, 1983]. This scheme is quite convenient for computational purposes as long as the Jacobi polynomials do not vanish at λ, which can be taken to be zero without loss of generality. It is also valid for non-simple matrices, as can be seen by substitution or by using the limiting argument indicated in the following to circumvent the difficulty resulting from the singularity caused by the zeros of the Jacobi polynomials, equivalently, of the Jacobi rationals.
Lemma 5.5. There exists an $\varepsilon_0 > 0$ such that for each positive $\varepsilon < \varepsilon_0$, $p_n(\varepsilon) \neq 0$, $n = 1,2,\dots,N$, $(A - \varepsilon)^{-1}$ exists, and $\|(A - \varepsilon)^{-1} - A^{-1}\| \to 0$ as $\varepsilon \to 0$.
Proof. Since a nonzero polynomial of degree n has at most n distinct zeros, the union of the zeros of all the polynomials forms a finite set. Therefore, none of the polynomials vanishes except at a finite number of points. Take $\varepsilon_0 > 0$ to be the smallest positive zero. The existence of $(A - \varepsilon)^{-1}$ and its convergence to $A^{-1}$ are established by the existence of $A^{-1}$ and the inequality
$$\|(A - \varepsilon)^{-1} - A^{-1}\| = \|A^{-1}[(1 - \varepsilon A^{-1})^{-1} - 1]\| \le \|A^{-1}\|\,\|\varepsilon A^{-1}(1 - \varepsilon A^{-1})^{-1}\| \le \frac{\varepsilon\,\|A^{-1}\|^2}{1 - \varepsilon\,\|A^{-1}\|} \xrightarrow[\varepsilon \to 0]{} 0,$$
resulting from the Neumann expansion for sufficiently small values of ε (Lemma 2.3) •
In the following, we use the result of Lemma 5.5 to develop a partial pivoting scheme to be invoked when a Jacobi polynomial has a zero at zero. The first step in obtaining $A^{-1}$ is the construction of the Jacobi rationals recursively, starting with the first one evaluated at zero; they are well defined if and only if the Jacobi polynomials do not vanish. Thus, the construction of rationals can
continue until the Jacobi rational $\bar p_n(0) = p_n(0)/p_{n-1}(0)$ vanishes for some value of n. If so, $\bar p_{n+1}(0)$ is undefined and thus $\bar p_{n+2}(0)$ cannot be constructed from Eq. (5.6). Therefore, the first step is to develop a procedure to construct the remaining rationals. Then, in the construction of the diagonal and the off-diagonal elements, difficulty is encountered in evaluating all elements that involve $\bar p_{n+1}(0)$ and $\bar p_n(0)$ in the denominator. All of these elements are evaluated in Lemma 5.6.
Lemma 5.6. Let $\bar p_\mu(0) \neq 0$, $\mu < n$, and $\bar p_n(0) = 0$. Then
$$[1/\bar p_{n+1}(\varepsilon)] \xrightarrow[\varepsilon \to 0]{} 0,\qquad \bar p_{n+2}(0) = -A_{(n+2)(n+2)},$$
$$A^{-1}_{nn} = \frac{1}{s_n}\left[s_{n+1}A^{-1}_{(n+2)(n+2)} + A_{(n+1)(n+1)}\right],\qquad \text{where } s_n = -A_{n(n+1)}A_{(n+1)n};$$
$$A^{-1}_{(n+1)n} = 1/A_{n(n+1)},\qquad A^{-1}_{n(n+1)} = 1/A_{(n+1)n},\qquad A^{-1}_{(n+1)j} = A^{-1}_{j(n+1)} = 0,\quad j \ge (n+1),$$
$$A^{-1}_{jn} = -A_{(n+2)(n+1)}A^{-1}_{j(n+2)}/A_{n(n+1)},\qquad A^{-1}_{nj} = -A_{(n+1)(n+2)}A^{-1}_{(n+2)j}/A_{(n+1)n},\qquad j \ge n+2.$$
Proof. Let $q_n(\varepsilon) = \bar p_n(\varepsilon)\,\bar p_{n+1}(\varepsilon)$ and $s_n = -A_{n(n+1)}A_{(n+1)n}$. It follows from Eq. (5.6) that if $\bar p_n(0) = 0$, then $q_n(\varepsilon) \to s_n$ as $\varepsilon \to 0$. Further, from Corollary 5.5, we have that
$$[A - \varepsilon]^{-1}_{nn} = \frac{1}{q_n^2(\varepsilon)}\left[s_n s_{n+1}(A - \varepsilon)^{-1}_{(n+2)(n+2)}\right] - \frac{q_n(\varepsilon) - s_n}{q_n(\varepsilon)\,\bar p_n(\varepsilon)}.$$
If $s_n = 0$, then since $\bar p_n(0) = 0$, $[A - \varepsilon]^{-1}_{nn}$ diverges as ε decreases, which is impossible from Lemma 5.5, since $A^{-1}$ exists. Hence if $\bar p_n(0) = 0$, then $s_n = -A_{n(n+1)}A_{(n+1)n} \neq 0$, i.e., $A_{n(n+1)} \neq 0$ and $A_{(n+1)n} \neq 0$. Since $[q_n(\varepsilon)/\bar p_{n+1}(\varepsilon)] = \bar p_n(\varepsilon) \to 0$, it follows that $[1/\bar p_{n+1}(\varepsilon)] \to 0$ as $\varepsilon \to 0$, which implies from Eq. (5.6) that $\bar p_{n+2}(0) = -A_{(n+2)(n+2)}$, and from Corollary 5.5 that $A^{-1}_{(n+1)(n+1)} = 0$. The element $A^{-1}_{nn}$ is obtained by taking the limit in the above equality, using $(q_n(\varepsilon) - s_n)/\bar p_n(\varepsilon) = \varepsilon - A_{(n+1)(n+1)}$ from Eq. (5.6). The remaining relations are obtained by algebraic manipulations using Corollary 5.5 and taking the limit as $\varepsilon \to 0$ •
Example 5.1. For illustration, consider the matrix
$$A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 2 & 1 \end{bmatrix}.$$
From Eq. (5.6), the Jacobi rationals satisfy $\bar p_1(0) = -1$, $\bar p_2(0) = 0$. Then from Lemma 5.6, $\bar p_4(0) = -1$. From Corollary 5.5, $A^{-1}_{44} = 1$. From Lemma 5.6, $A^{-1}_{33} = A^{-1}_{34} = A^{-1}_{43} = 0$, $A^{-1}_{22} = 4$, $A^{-1}_{32} = 1$, $A^{-1}_{23} = 1$, $A^{-1}_{42} = -2$, $A^{-1}_{24} = -1$. From Corollary 5.5, $A^{-1}_{11} = 5$. From Theorem 5.3, $a_1(0) = -1 = b_1(0)$, which yield the remaining elements of the first row and the first column, from Corollary 5.5. The resulting inverse is given by
$$A^{-1} = \begin{bmatrix} 5 & -4 & -1 & 1 \\ -4 & 4 & 1 & -1 \\ -1 & 1 & 0 & 0 \\ 2 & -2 & 0 & 1 \end{bmatrix}.$$
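The numbers of Example 5.1 can be checked mechanically. The sketch below, with helper names chosen only for this illustration, multiplies the two matrices and evaluates the closed forms of Lemma 5.6 for n = 2, where the second Jacobi rational vanishes; exact rational arithmetic is used so that the comparisons are exact.

```python
from fractions import Fraction

A = [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 1, -2, 1],
     [0, 0, 2, 1]]
A_INV = [[5, -4, -1, 1],
         [-4, 4, 1, -1],
         [-1, 1, 0, 0],
         [2, -2, 0, 1]]


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def entry(M, i, j):
    """1-based element access, matching the subscripts used in the text."""
    return Fraction(M[i - 1][j - 1])


def lemma_5_6_entries(M, n, inv_np2_np2):
    """Closed forms of Lemma 5.6 around a vanishing rational at (1-based) index n.

    inv_np2_np2 is the already known diagonal element of the inverse at (n+2, n+2).
    """
    s_n = -entry(M, n, n + 1) * entry(M, n + 1, n)
    s_np1 = -entry(M, n + 1, n + 2) * entry(M, n + 2, n + 1)
    return {
        (n, n): (s_np1 * inv_np2_np2 + entry(M, n + 1, n + 1)) / s_n,
        (n + 1, n): 1 / entry(M, n, n + 1),
        (n, n + 1): 1 / entry(M, n + 1, n),
        (n + 2, n): -entry(M, n + 2, n + 1) * inv_np2_np2 / entry(M, n, n + 1),
        (n, n + 2): -entry(M, n + 1, n + 2) * inv_np2_np2 / entry(M, n + 1, n),
    }


# Jacobi rationals at zero, Eq. (5.6): the second one vanishes.
pbar1 = -entry(A, 1, 1)
pbar2 = -entry(A, 2, 2) - entry(A, 1, 2) * entry(A, 2, 1) / pbar1

identity = matmul(A, A_INV)
closed_forms = lemma_5_6_entries(A, 2, Fraction(1))  # inverse element (4, 4) equals 1
```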
The procedure can also be used to develop an algorithm to solve a tridiagonal system of algebraic equations, but it results in the standard method in use. However, the scheme to obtain the full inverse described above differs from the standard methods mainly in that some elements of the rows and columns of the inverse are constant multiples of the corresponding elements of the adjacent rows and columns, as stated in Corollary 5.5. This property of the inverse can also be deduced from the standard methods for solving tridiagonal systems of algebraic equations. However, it is not as transparent there and is therefore not exploited in the methods in use. In extensive testing it was found that the present scheme incorporating this property requires about 30% fewer operations, thereby reducing the computational effort by the same percentage. As a consequence of the smaller number of operations, a marginally more accurate inverse is obtained compared to the other methods [Vatsya and Pritchard, 1983] •
The result of Lemma 5.1 together with Corollary 4.4 provides the following interesting characterization of the eigenvalues of a simple tridiagonal matrix:
Proposition 5.1. Let A be a simple tridiagonal matrix and let $w_0$ be the vector with its first or the last element equal to one and all others equal to zero. Then each eigenvalue λ of A is the solution of
$$\phi(\lambda) = (w_0, [A + B - \lambda]^{-1} w_0) = 1,$$
where the operator B is defined by $B\upsilon = w_0\,(w_0, \upsilon)$.
Proof. From Lemma 5.1, each eigenvector u of A satisfies $(u, w_0) \neq 0$. The result follows from Corollary 4.4 •
Since the eigenvalues of A are the poles of $[A - z]^{-1}$, i.e., the zeros of the determinant of $[A - z]$, which can be computed efficiently, the result of Proposition 5.1 by itself is of limited computational value. However, since $[A + B - z]$ is still a tridiagonal matrix, it can be used together with the preceding results yielding the explicit inverse, or the continued fraction representation of $(w_0, [A + B - \lambda]^{-1} w_0)$ from Lemma 3.12, to study the properties of eigenvalues, which is useful in some instances. Also, as will be seen in the next section, this result and further consequent results can be used for significant computational benefit in some problems.
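A quick numerical check of the characterization in Proposition 5.1 is possible on the smallest nontrivial case. The matrix below and the function name are illustrative assumptions: A is the 2×2 matrix with rows (2, −1) and (−1, 2), whose eigenvalues are 1 and 3, and $w_0$ is the first unit vector, so that $A + B - \lambda$ has rows $(3 - \lambda, -1)$ and $(-1, 2 - \lambda)$.

```python
def phi(lam):
    """(w0, [A + B - lam]^{-1} w0) for A = [[2, -1], [-1, 2]] and w0 = (1, 0).

    The (1, 1) element of the inverse of a 2x2 matrix is its (2, 2) element
    divided by the determinant.
    """
    det = (3.0 - lam) * (2.0 - lam) - 1.0
    return (2.0 - lam) / det


values_at_eigenvalues = [phi(1.0), phi(3.0)]
value_elsewhere = phi(0.0)
```

Both eigenvalues give the value one, while a point that is not an eigenvalue, such as λ = 0, does not.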
5.II. STRUCTURED MATRICES
The methods exploiting well defined structures of matrices to compute their inverses, equivalently, to solve the associated linear algebraic equations, and to calculate other quantities, e.g., the eigenvalues, are widely available in the literature [Spedicato, 1991]. In this section, we illustrate the use of results and techniques developed in ch. 4 to solve some problems where standard methods have been found to be inadequate. In addition to the numerical difficulties, standard algorithms are not suitable for studies of the correlation between the matrix elements and the computed quantity. Although not unique, such situations arise in the calculations of rate constants for a variety of reactions in chemical kinetics, which can be described by the master equation in the population vector $\eta(t)$, with elements being the populations of individual states at time t [Pritchard, 1984]:
$$\frac{d\eta(t)}{dt} + A\,\eta(t) = 0, \tag{5.7}$$
where $A = (A^0 + V)$, the $n, m$ element of $A^0$ is the equilibrium state transition rate constant from state n to the state m, and V is a diagonal matrix with its nth element being zero or the nonzero spontaneous decay rate from the nth state. We have used the symmetrized form with corresponding adjustments without changing the physical definitions of the symbols. In this form, $A^0$ and V are self-adjoint, non-negative matrices and zero is the lowest non-degenerate eigenvalue of $A^0$. The solution of Eq. (5.7) is given by
$$\eta(t) = \exp[-At]\,\eta(0) = \sum_{n=0}^{N} e^{-\lambda_n t}\,p_n\,\eta(0), \tag{5.8}$$
with $\lambda_n$ and $p_n$ being the eigenvalues and corresponding eigenprojections of A, respectively. As t increases, the states with positive $\lambda_n$ decay to zero. Consequently, $\eta(t)$ approaches $e^{-\lambda_0 t}\,p_0\,\eta(0)$ asymptotically. For V = 0, this leaves the equilibrium state $\eta(\infty) = p_0\,\eta(0)$, and for A > 0, $\eta(t)$ approaches zero with the decay time equal to $\lambda_0^{-1}$. For physical systems, $\lambda_0$ is significantly smaller than the other eigenvalues, sometimes negligible in comparison [Pritchard and Vatsya, 1983]. The accuracy of the standard methods, e.g., the Householder method, is usually of the order of the difference between the trace and the sum of the other eigenvalues. Consequently, these methods are inadequate to determine $\lambda_0$ with a sufficient degree of accuracy [Pritchard, 2004]. We illustrate the use of the present methods to circumvent this type of difficulty. In addition, the resulting approximations are explicitly expressed in terms of the matrix elements and thus provide valuable direct information for modeling the reactions. This information can be coupled with the Hellmann-Feynman theorem (Lemma 4.17) to study the variations with respect to the matrix elements and other parameters. While the focus is on this class of problems, the methods and some of the results cover a wider class, including the degeneracy of the lowest eigenvalue of $A^0$, and non-diagonal perturbations. The treatment of the cases considered illustrates the techniques to extend the methods to other similar problems.
Let $A^0$ and V be non-negative matrices, and let $\lambda_n^0$ and $p_n^0$, $n = 0,1,2,\dots$, be the eigenvalues and the corresponding eigenprojections of $A^0$. In the present case $\lambda_0^0 = 0$. In any case, letting $\lambda_0^0 = 0$ causes no loss of generality. For convenience in their repeated use, the eigenvector corresponding to the zero eigenvalue will be denoted by $u^0$ and the eigenvector of $A = (A^0 + V)$ corresponding to its lowest eigenvalue $\lambda_0$, assumed to be less than $\lambda_1^0$, will be denoted by u. Rough estimates of the upper and the lower bounds on the eigenvalues can be obtained relatively easily by using the results of Theorem 3.13, Lemma 4.9 and Theorem 4.5, with simple trial vectors, if needed. The perturbation V will be assumed to be a diagonal matrix with elements $\nu_n$. The main results apply to general positive perturbations. With this preparation, we have
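Lemma 5.7. With the symbols as in the preceding paragraph, and α > 0, $\lambda_0$ is the unique solution of $\phi(\lambda) = 1$ and the unique fixed point of $\breve\phi(\lambda)$ and of $\tilde\phi(\lambda)$, in the interval $(-\infty; \lambda_0')$, with an eigenvector $u(\lambda_0)$, where $\lambda_0' > \lambda_0$ is the lowest eigenvalue of $(A + \alpha p^0)$, and
$$\phi(\lambda) = \alpha\,(u^0, [A + \alpha p^0 - \lambda]^{-1} u^0),\qquad \alpha > 0, \tag{5.9-a}$$
$$\breve\phi(\lambda) = \alpha - (\alpha - \lambda)\,\phi(\lambda) = \alpha\,(u^0, [A + \alpha p^0 - \lambda]^{-1} V u^0),\qquad 0 < \alpha < \lambda_1, \tag{5.9-b}$$
$$\tilde\phi(\lambda) = \frac{\breve\phi(\lambda)}{\phi(\lambda)} = \frac{\alpha - (\alpha - \lambda)\,\phi(\lambda)}{\phi(\lambda)} = \frac{(\alpha - \lambda)\,\breve\phi(\lambda)}{\alpha - \breve\phi(\lambda)},\qquad \alpha > 0, \tag{5.9-c}$$
$$u(\lambda) = \alpha\,[A + \alpha p^0 - \lambda]^{-1} u^0,\qquad \alpha > 0,\quad \lambda_0' > \lambda_0,\quad (u^0, u) = 1. \tag{5.9-d}$$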
Proof. Since the lowest eigenvalue of $[A^0 + \alpha p^0]$ is α, $(w, [A^0 + V + \alpha p^0 - \lambda_0]\,w) > 0$ for each normalized w. With w = u, this implies that $(u, p^0 u) > 0$, i.e., $(u^0, u) \neq 0$. It follows from Corollary 4.4, by taking φ = α u^0, that $\lambda_0$ is a solution of $\phi(\lambda_0) = 1$ for each α > 0. Since
$$\dot\phi(\lambda) = \alpha\,(u^0, [A + \alpha p^0 - \lambda]^{-2} u^0) \ge 0,$$
$\phi(\lambda)$ is a non-decreasing function of λ on the interval $(-\infty; \lambda_1)$. However, $\dot\phi(\lambda) = 0$ implies that $[A + \alpha p^0 - \lambda]^{-1} u^0 = 0$, i.e., $u^0 = 0$, which is a contradiction. Hence, $\phi(\lambda)$ is a strictly increasing function of λ, implying that the solution $\lambda_0$ is unique. It is straightforward to check that $\breve\phi(\lambda) = \lambda$ and $\tilde\phi(\lambda) = \lambda$ if and only if $\phi(\lambda) = 1$ on the prescribed interval. Direct substitution shows that $u(\lambda_0)$ is the corresponding eigenvector •
The function $\tilde\phi(\lambda)$ is independent of α, as shown below.
Lemma 5.8. The function $\tilde\phi(\lambda)$ defined by Eq. (5.9-c), Lemma 5.7, is given by
$$\text{(i)}\quad \frac{1}{\tilde\phi(\lambda)} = (u^0, [A - (1 - p^0)\lambda]^{-1} u^0),$$
$$\text{(ii)}\quad \tilde\phi(\lambda) = \left(u^0, \sqrt{V}\left[1 + \sqrt{V}\sum_{n=1}^{N}\frac{p_n^0}{\lambda_n^0 - \lambda}\sqrt{V}\right]^{-1}\sqrt{V}\,u^0\right).$$
Proof. (i) If B and $(B - \beta p^0)$ are invertible matrices with β being a constant, then we have
$$(u^0, [B - \beta p^0]^{-1} u^0) = (u^0, B^{-1}[1 - \beta p^0 B^{-1}]^{-1} u^0) = \frac{(u^0, B^{-1} u^0)}{1 - \beta\,(u^0, B^{-1} u^0)}.$$
Setting $B = [A + \alpha p^0 - \lambda]$ and $\beta = (\alpha - \lambda)$ yields
$$\frac{1}{\tilde\phi(\lambda)} = \frac{(u^0, [A + \alpha p^0 - \lambda]^{-1} u^0)}{1 - (\alpha - \lambda)\,(u^0, [A + \alpha p^0 - \lambda]^{-1} u^0)} = (u^0, [A - (1 - p^0)\lambda]^{-1} u^0).$$
(ii) From (i), $\tilde\phi(\lambda)$ is independent of α. Consequently, from Eq. (5.9-c) of Lemma 5.7, $\tilde\phi(\lambda) = \lim_{\alpha \to \infty}\breve\phi(\lambda)$, if the right side exists. Now, straightforward manipulations of the resolvent yield
$$\sqrt{V}\,[A + \alpha p^0 - \lambda]^{-1} u^0 = \left(1 + \sqrt{V}\,[A^0 + \alpha p^0 - \lambda]^{-1}\sqrt{V}\right)^{-1}\sqrt{V}\,[A^0 + \alpha p^0 - \lambda]^{-1} u^0 = \frac{1}{\alpha - \lambda}\left(1 + \sqrt{V}\,[A^0 + \alpha p^0 - \lambda]^{-1}\sqrt{V}\right)^{-1}\sqrt{V}\,u^0,$$
implying from Eq. (5.9-b) that
$$\breve\phi(\lambda) = \frac{\alpha}{\alpha - \lambda}\left(u^0, \sqrt{V}\left[1 + \sqrt{V}\left(\sum_{n=1}^{N}\frac{p_n^0}{\lambda_n^0 - \lambda} + \frac{p^0}{\alpha - \lambda}\right)\sqrt{V}\right]^{-1}\sqrt{V}\,u^0\right),$$
which has the limit as given in (ii). This result is the same as obtained by the Wigner perturbation method, i.e., the Rayleigh-Schrödinger method retaining all terms in the expansion of the eigenvalue •
Arbitrariness of α can be exploited in the numerical implementations of the results of Lemma 5.7, mostly to increase the stability in calculating $\phi(\lambda)$ and $\breve\phi(\lambda)$, and also if these functions are used to calculate $\tilde\phi(\lambda)$.
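The independence of $\tilde\phi(\lambda)$ from α stated in Lemma 5.8 can be checked numerically on a small case. The following sketch uses a hypothetical two-state model chosen only for illustration: $u^0$ has equal components, $A^0 = 1 - p^0$, and V is diagonal with elements 0 and 1; the helper names are likewise assumptions.

```python
import math


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def form(u, M, v):
    """Bilinear form (u, M v) for 2-vectors and a 2x2 matrix."""
    return sum(u[i] * M[i][j] * v[j] for i in range(2) for j in range(2))


u0 = [math.sqrt(0.5), math.sqrt(0.5)]
p0 = [[u0[i] * u0[j] for j in range(2)] for i in range(2)]           # projection on u0
identity = [[1.0, 0.0], [0.0, 1.0]]
V = [[0.0, 0.0], [0.0, 1.0]]
A = [[identity[i][j] - p0[i][j] + V[i][j] for j in range(2)] for i in range(2)]


def phi_functions(alpha, lam):
    """phi, phi-breve and phi-tilde of Eqs. (5.9-a) to (5.9-c)."""
    M = [[A[i][j] + alpha * p0[i][j] - lam * identity[i][j] for j in range(2)]
         for i in range(2)]
    phi = alpha * form(u0, inverse_2x2(M), u0)
    phi_breve = alpha - (alpha - lam) * phi
    return phi, phi_breve, phi_breve / phi


def phi_tilde_direct(lam):
    """phi-tilde from Lemma 5.8 (i), which does not involve alpha."""
    M = [[A[i][j] - lam * (identity[i][j] - p0[i][j]) for j in range(2)]
         for i in range(2)]
    return 1.0 / form(u0, inverse_2x2(M), u0)


lam = 0.1
tilde_values = [phi_functions(alpha, lam)[2] for alpha in (0.5, 2.0, 10.0)]
reference = phi_tilde_direct(lam)
```

In this model the three values of α give the same $\tilde\phi(0.1)$ up to rounding, in agreement with form (i).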
Proposition 5.2. Let $\phi(\lambda)$, $\breve\phi(\lambda)$, $\tilde\phi(\lambda)$ and $\lambda_0'$ be as defined in Lemma 5.7. Then for λ in the interval $(0; \lambda_0')$, $\phi(\lambda)$ is a positive, increasing and convex function for each α > 0; $\breve\phi(\lambda)$ is a positive, increasing and convex function for $\lambda_1 > \alpha > 0$; and $\tilde\phi(\lambda)$ is a positive, decreasing and concave function independent of α.
Proof. Positivity of all three functions is clear. The increasing property of $\phi(\lambda)$ is proved in Lemma 5.7. Its second derivative is seen to be strictly positive by essentially the same argument.
With $B = [A - (1 - p^0)\alpha]$ we have $\breve\phi(\lambda) = \alpha\,(u^0, [A + \alpha p^0 - \lambda]^{-1} B\,u^0)$. Since $(u^0, B u^0) = (u^0, V u^0) \ge 0$ and, for each w orthogonal to $u^0$, $(w, B w) = (w, [A - (1 - p^0)\alpha]\,w) \ge (\lambda_1 - \alpha)$, it follows that $B \ge 0$. Therefore, $\sqrt{B} \ge 0$ is well defined and commutes with $[A + \alpha p^0 - \lambda]^{-1}$, yielding
$$\breve\phi(\lambda) = \alpha\,(\sqrt{B}\,u^0, [A + \alpha p^0 - \lambda]^{-1}\sqrt{B}\,u^0). \tag{5.10}$$
The result follows from Eq. (5.10) by the same arguments as for $\phi(\lambda)$. The same procedure together with Lemma 5.8 yields the properties of $\tilde\phi(\lambda)$ with either form, of Lemma 5.8.i or ii, although the form of Lemma 5.8.i is more convenient •
With the properties of the functions established in Proposition 5.2, converging lower and upper bounds to $\lambda_0$ can be obtained from Proposition 4.2 by setting $f(\lambda) = \lambda + \phi(\lambda)$, $f(\lambda) = \breve\phi(\lambda)$ and $f(\lambda) = \tilde\phi(\lambda)$. Alternatively, $\breve\phi(\lambda)$ can be used to produce a sequence $\{\breve\lambda_n\}$ of converging lower bounds by iteration or by the Newton method, and then $\{\tilde\phi(\breve\lambda_n)\}$ converges from above. Also, if $\hat\lambda_0 > \lambda_0$ is such that $\breve\phi(\hat\lambda_0) < \hat\lambda_0$, then $\hat\lambda_{n+1} = \breve\phi(\hat\lambda_n)$ converges from above to $\lambda_0$. The condition guarantees that $\hat\lambda_0 > \lambda_0$ is below the next fixed point of $\breve\phi(\lambda)$ that may exist in the interval $(\lambda_0; \lambda_0')$. In most of the applications in the unimolecular rate calculations, $\breve\phi(0)$ and $\tilde\phi(0)$ are as accurate as the standard methods and frequently more accurate, which is well beyond the accuracy of the experimental results. In some cases, $\breve\phi(0)$ and $\tilde\phi(0)$ provide lower and upper bounds that determine $\lambda_0$ effectively exactly [Pritchard and Vatsya, 1983]. We give some illustrative examples below.
Example 5.2. Let $A = [\mu(1 - p^0) + V]$. Although simple, it is still instructive and arises in the realistic strong collision limit in unimolecular reactions. Taking α = μ in Lemma 5.7, we have
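$$\phi(\lambda) = \phi_\mu(\lambda) = \mu\,(u^0, [\mu + V - \lambda]^{-1} u^0) = \mu\sum_{n=0}^{N}\frac{\eta_n}{\mu + \nu_n - \lambda}, \tag{5.11-a}$$
$$\breve\phi(\lambda) = \breve\phi_\mu(\lambda) = \mu\,(u^0, [\mu + V - \lambda]^{-1} V u^0) = \mu\sum_{n}^{\text{reactive}}\frac{\eta_n\nu_n}{\mu + \nu_n - \lambda}, \tag{5.11-b}$$
$$\tilde\phi(\lambda) = \tilde\phi_\mu(\lambda) = \left((\mu - \lambda)\sum_{n}^{\text{reactive}}\frac{\eta_n\nu_n}{\mu + \nu_n - \lambda}\right)\left[1 - \sum_{n}^{\text{reactive}}\frac{\eta_n\nu_n}{\mu + \nu_n - \lambda}\right]^{-1}. \tag{5.11-c}$$
In Eqs. (5.11-b) and (5.11-c), the sum is over the reactive states only, defined by $\nu_n > 0$. Thus, elementary bounds to $\lambda_0$ are given by
$$\mu\sum_{n}^{\text{reactive}}\frac{\eta_n\nu_n}{\mu + \nu_n} \le \lambda_0 \le \left(\mu\sum_{n}^{\text{reactive}}\frac{\eta_n\nu_n}{\mu + \nu_n}\right)\left[1 - \sum_{n}^{\text{reactive}}\frac{\eta_n\nu_n}{\mu + \nu_n}\right]^{-1} \bullet \tag{5.12}$$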
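The bounds of Eq. (5.12) and the fixed-point iteration on $\breve\phi$ are simple enough to try directly. In the sketch below the function names are hypothetical, and $\eta_n$ is assumed to be the squared nth component of the normalized vector $u^0$ (so that the $\eta_n$ sum to one), which is consistent with Eq. (5.11-a) but is not spelled out in this section.

```python
def phi_breve(lam, mu, eta, nu):
    """Eq. (5.11-b): sum over the reactive states (nu_n > 0) only."""
    return mu * sum(e * v / (mu + v - lam) for e, v in zip(eta, nu) if v > 0)


def strong_collision_bounds(mu, eta, nu):
    """Lower and upper bounds of Eq. (5.12), i.e. phi-breve(0) and phi-tilde(0)."""
    s = sum(e * v / (mu + v) for e, v in zip(eta, nu) if v > 0)
    return mu * s, mu * s / (1.0 - s)


def lowest_eigenvalue_by_iteration(mu, eta, nu, steps=50):
    """Iterate lam <- phi-breve(lam) starting from zero (lower bounds, increasing)."""
    lam = 0.0
    for _ in range(steps):
        lam = phi_breve(lam, mu, eta, nu)
    return lam


# Two-state illustration: equal weights, one non-reactive and one reactive state.
mu, eta, nu = 1.0, [0.5, 0.5], [0.0, 1.0]
lower, upper = strong_collision_bounds(mu, eta, nu)
lam0 = lowest_eigenvalue_by_iteration(mu, eta, nu)
```

For this two-state case the matrix $\mu(1 - p^0) + V$ has rows (0.5, −0.5) and (−0.5, 1.5), with lowest eigenvalue $1 - \sqrt{0.5} \approx 0.2929$, which lies between the elementary bounds 0.25 and 1/3 and is reproduced by the iteration.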
Example 5.3. Consider
$$A = \Big[\mu_0(1 - p^0) + V + \sum_{m=1}^{M}\mu_m(I_m - q_m)\Big], \tag{5.13}$$
where $I_m$ are the identity projections on the proper subspaces of the original space with the identity $I_0$, such that $I_m I_l = I_l$ for each $l \ge m$, and $q_m$ are the orthoprojections defined by $q_m w = I_m u^0\,(I_m u^0, w)/(u^0, I_m u^0)$. Example 5.2 is a special case of this example.
With $\alpha = \mu_0$, the functions $\phi(\lambda)$, $\breve\phi(\lambda)$ and $\tilde\phi(\lambda)$ are determined by
$$\psi(\lambda) = \Big[\mu_0 - \lambda + V + \sum_{m=1}^{M}\mu_m(I_m - q_m)\Big]^{-1} u^0 = \Big[B_0 - \sum_{m=1}^{M}\mu_m q_m\Big]^{-1} u^0 = B_M^{-1} u^0,$$
where
$$B_0 = \Big[V - \lambda + \sum_{m=0}^{M}\mu_m I_m\Big] = \sum_{m=0}^{M}(V_m - \lambda + \tilde\mu_m)\,J_m,$$
with
$$\begin{aligned}
&J_m = I_m - I_{m+1},\quad m = 0,1,\dots,(M-1),\qquad J_M = I_M,\\
&\upsilon_m = I_{M-m+1}u^0,\qquad \bar\mu_m = \frac{\mu_{M-m+1}}{(\upsilon_m, \upsilon_m)},\qquad \tilde\mu_m = \sum_{l=0}^{m}\mu_l,\\
&B_m u = B_0 u - \sum_{l=1}^{m}\bar\mu_l\,\upsilon_l\,(\upsilon_l, u),\quad m = 1,2,\dots,M,\ \text{for each } u,\\
&V_m = J_m V = V J_m.
\end{aligned} \tag{5.14}$$
Invertibility of $B_m$ for each m is easily established by the invertibility of $B_M$ and the fact that $B_m \ge B_{m+1}$. The result of Corollary 4.2 can now be used to determine $B_M^{-1}$ recursively, starting with
$$B_0^{-1} = \sum_{m=0}^{M}[V_m - \lambda + \tilde\mu_m]^{-1} J_m = \sum_{m=0}^{M}\sigma_m^0 J_m. \tag{5.15}$$
This results in
Proposition 5.3. With the symbols as above, and with $\bar\mu_l = \mu_{M-l+1}/(\upsilon_l, \upsilon_l)$ denoting the rescaled coefficients of Eq. (5.14),
$$B_l^{-1}\upsilon_j = \sum_{m=M-j+1}^{M}\sigma_m^l J_m u^0,\qquad l = 0,1,\dots,M;\quad j = (l+1),\dots,(M+1);$$
with $\sigma_m^l$ being the diagonal matrices defined by
$$\sigma_m^l = \sigma_m^0,\quad m = (M-j+1),\dots,(M-l);\qquad \sigma_m^l = \frac{\sigma_m^{l-1}}{1 - \bar\mu_l\,\xi_l},\quad m = (M-l+1),\dots,M;$$
where
$$\xi_l = (\upsilon_l, B_{l-1}^{-1}\upsilon_l) = \sum_{m=M-l+1}^{M}(u^0, \sigma_m^{l-1} J_m u^0).$$
Proof. The result is true for l = 0 from Eq. (5.15). Assume it to be valid for some l. With $\bar\mu_l = \mu_{M-l+1}/(\upsilon_l, \upsilon_l)$ as in Eq. (5.14), it follows from Corollary 4.2 that
$$\begin{aligned}
B_{l+1}^{-1}\upsilon_j &= B_l^{-1}\upsilon_j + \bar\mu_{l+1}\,B_l^{-1}\upsilon_{l+1}\,\frac{(\upsilon_{l+1}, B_l^{-1}\upsilon_j)}{1 - \bar\mu_{l+1}\xi_{l+1}}\\
&= \sum_{m=M-j+1}^{M}\sigma_m^l J_m u^0 + \bar\mu_{l+1}\,\frac{(\upsilon_{l+1}, B_l^{-1}\upsilon_j)}{1 - \bar\mu_{l+1}\xi_{l+1}}\sum_{m=M-l}^{M}\sigma_m^l J_m u^0\\
&= \sum_{m=M-j+1}^{M}\sigma_m^{l+1} J_m u^0;\qquad j = l+2,\dots,(M+1).
\end{aligned}$$
We have used the induction assumption and have set
$$\sigma_m^{l+1} = \sigma_m^l = \sigma_m^0,\quad m = (M-j+1),\dots,(M-l-1);\qquad \sigma_m^{l+1} = \sigma_m^l\left[1 + \bar\mu_{l+1}\,\frac{(\upsilon_{l+1}, B_l^{-1}\upsilon_j)}{1 - \bar\mu_{l+1}\xi_{l+1}}\right],\quad m = (M-l),\dots,M.$$
Since $j \ge l+1$ and $m \ge M-l$, it follows that $I_{M-j+1}J_m = J_m$, and the matrices $I_m$, $J_m$ and $\sigma_m^l$ all commute. Consequently
$$(\upsilon_{l+1}, B_l^{-1}\upsilon_j) = (\upsilon_j, B_l^{-1}\upsilon_{l+1}) = \sum_{m=M-l}^{M}(I_{M-j+1}u^0, \sigma_m^l J_m u^0) = \sum_{m=M-l}^{M}(u^0, \sigma_m^l J_m u^0) = \xi_{l+1},$$
and hence,
$$\sigma_m^{l+1} = \frac{\sigma_m^l}{1 - \bar\mu_{l+1}\xi_{l+1}},\qquad m = (M-l),\dots,M.$$
The result follows by induction •
From the result of Proposition 5.3, the necessary functions are given by
Corollary 5.6. With the symbols as above, we have
$$\begin{aligned}
\psi(\lambda) &= \sum_{m=0}^{M}\sigma_m^M J_m u^0,\\
\phi(\lambda) &= \mu_0\sum_{m=0}^{M}(u^0, \sigma_m^M J_m u^0) = \mu_0\,\xi_{M+1},\\
\breve\phi(\lambda) &= \mu_0\sum_{m=0}^{M}(V u^0, \sigma_m^M J_m u^0) = \mu_0 - \mu_0(\mu_0 - \lambda)\,\xi_{M+1},\\
\tilde\phi(\lambda) &= \frac{\breve\phi(\lambda)}{\phi(\lambda)} = \frac{1}{\xi_{M+1}} - (\mu_0 - \lambda).
\end{aligned} \tag{5.16}$$
Proof. Follows from the results and the definitions of Proposition 5.3, by substitution in Eqs. (5.9-a) to (5.9-c) •
These functions can be expressed in several forms. We demonstrate a recursive scheme to determine them based on the result of Proposition 5.3.
Proposition 5.4. Let $\xi_l$ be as in Proposition 5.3 with $\xi_0 = 0$, and let $\bar\mu_l = \mu_{M-l+1}/(\upsilon_l, \upsilon_l)$ as in Eq. (5.14). Then
$$\xi_{l+1} = (J_{M-l}u^0, \sigma_{M-l}^0 u^0) + \frac{\xi_l}{(1 - \bar\mu_l\xi_l)},\qquad l = 0,1,\dots,M.$$
Proof. From Proposition 5.3 we have
$$\begin{aligned}
\xi_{l+1} = (\upsilon_{l+1}, B_l^{-1}\upsilon_{l+1}) &= \sum_{m=M-l}^{M}(I_{M-l}u^0, \sigma_m^l J_m u^0) = \sum_{m=M-l}^{M}(u^0, \sigma_m^l J_m u^0)\\
&= (J_{M-l}u^0, \sigma_{M-l}^l u^0) + \sum_{m=M-l+1}^{M}(u^0, \sigma_m^l J_m u^0)\\
&= (J_{M-l}u^0, \sigma_{M-l}^l u^0) + \frac{1}{(1 - \bar\mu_l\xi_l)}\sum_{m=M-l+1}^{M}(u^0, \sigma_m^{l-1} J_m u^0)\\
&= (J_{M-l}u^0, \sigma_{M-l}^l u^0) + \frac{\xi_l}{(1 - \bar\mu_l\xi_l)},
\end{aligned}$$
with $\sigma_{M-l}^l = \sigma_{M-l}^0$.
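The result is transparent for $l \ge 1$; for l = 0, the second term is absent and the result holds due to the choice $\xi_0 = 0$ •
The recursive scheme of Proposition 5.4 can be converted into the continued fraction expansions of these functions as follows. Let $\tilde\xi_l = (1 - \bar\mu_l\xi_l)$, with $\bar\mu_l = \mu_{M-l+1}/(\upsilon_l, \upsilon_l)$ as in Eq. (5.14). By a rearrangement of the result of Proposition 5.4, we have
$$\tilde\xi_{l+1} = \alpha_{l+1} - \frac{\beta_{l+1}}{\tilde\xi_l},\qquad l = 1,2,\dots,M;$$
where $\beta_{l+1} = \bar\mu_{l+1}/\bar\mu_l$ and $\alpha_{l+1} = 1 + \beta_{l+1} - \bar\mu_{l+1}\,(J_{M-l}u^0, \sigma_{M-l}^0 u^0)$. It follows that
$$\tilde\xi_{l+1} = \alpha_{l+1} - \frac{\beta_{l+1}}{\tilde\xi_l} = \alpha_{l+1} - \cfrac{\beta_{l+1}}{\alpha_l - \cfrac{\beta_l}{\ddots\; - \cfrac{\beta_3}{\alpha_2 - \cfrac{\beta_2}{\tilde\xi_1}}}}, \tag{5.17}$$
with
$$\tilde\xi_1 = 1 - \frac{\mu_M}{\|J_M u^0\|^2}\,(J_M u^0, (V_M - \lambda + \tilde\mu_M)^{-1} u^0) \bullet$$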
Self-adjoint operators were shown to admit the Jacobi matrix representations with respect to appropriate bases in sec. 3.III.2 (Proposition 3.7). As a consequence, it was shown that certain elements of their inverses admit continued fraction expansions (Lemma 3.12). For a finite-dimensional case, i.e., a matrix, this reduces to a tri-diagonal matrix and a finite expansion of the type of Eq. (5.17). Some reactions have been modeled in terms of the tridiagonal matrices as well as in terms of the matrices of Example 5.3, by somewhat independent physical reasoning. The physical meanings of the coefficients appearing in the two expansions serve to relate the processes invoked in arriving at different mathematical representations of the original matrices. Such information is usually lost in direct numerical
computations. Also, these expressions are more suitable for determining the behavior of the rate constant, e.g., for large and small values of the physical parameters μ, $\mu_0$ [Vatsya and Pritchard, 1983]. In addition, the consequent algorithms have frequently been found to be more efficient, convenient and often more accurate in numerical evaluations of the rate constants and related quantities.
The examples given above serve to illustrate the basic procedure, which can be used to analyze a variety of structures. If a matrix is decomposable as a sum of two matrices with different structures, Corollary 4.1 provides a means to exploit both of them independently. For illustration, consider
$$A f = \Big[\mu_0 - \lambda + T + V + \sum_{m=1}^{M}\mu_m(I_m - q_m)\Big] f = g, \tag{5.18}$$
where T is a tri-diagonal matrix. Equations of the type of Eq. (5.18) and its variations arise in further extensions of Example 5.3 as well as independently. Let
$$B = \Big[\mu_0 - \lambda + T + V + \sum_{m=1}^{M}\mu_m I_m\Big],$$
which is also a perturbed tri-diagonal matrix as long as V is tri-diagonal, which includes the diagonal case. Assuming B to be invertible, we have
$$f = \Big[1 - \sum_{m=1}^{M}\mu_m B^{-1} q_m\Big]^{-1} B^{-1} g = \Big[1 - \sum_{m=1}^{M}\mu_m\tilde q_m\Big]^{-1} B^{-1} g, \tag{5.19}$$
with $\tilde q_m = B^{-1} q_m$. Instead of Eq. (5.19), its symmetric form can be considered without any essential complications. As such, it is clear that the structure of B can be exploited to obtain $B^{-1}$. To clarify, only the solutions of a set of algebraic equations are needed, not the complete inverse. This leaves the structure of $(1 - \sum_{m=1}^{M}\mu_m\tilde q_m)$ essentially the same as that of the matrix $(1 - \sum_{m=1}^{M}\mu_m q_m)$.
These results provide also the procedures to treat perturbed matrices efficiently [Vatsya and Tai, 1988]. Some sparse matrices can be decomposed as the sums of structured matrices and low rank perturbations, and thus can be treated as perturbed matrices.
5.III. CONJUGATE RESIDUAL-LIKE METHODS
Consider again the equation
$$A f = g, \tag{5.20}$$
where A is a real invertible matrix of rank N. The purpose of the conjugate gradient and conjugate residual-like methods is to solve Eq. (5.20) approximately, with a satisfactory degree of accuracy, without having to solve the complete system of N equations. To this end, let $f_0$ be an approximation to f, which can be set equal to zero. Then Eq. (5.20) reduces to an equivalent equation
$$A(f - f_0) = r_0 = g - A f_0. \tag{5.21}$$
If an approximate solution $h_0$ to Eq. (5.21) can be obtained, then the next approximation $f_1$ to f is given by $f_1 = (f_0 + h_0)$. Substitution reduces Eq. (5.21) to another equivalent equation. Repeating the procedure generates a sequence of approximations $f_n$ to f and the residues $r_n$ satisfying the equations
$$A(f - f_n) = r_n,\qquad n = 0,1,2,\dots \tag{5.22}$$
If the sequence of residues converges to zero, then the invertibility of A implies the convergence of $f_n$ to f. Thus, the problem reduces to solving Eq. (5.22) approximately, which is of the same type as the original Eq. (5.20). In principle, Eq. (5.20) is solvable by this procedure as long as the approximations are sufficiently accurate to guarantee the convergence of the resulting sequence. For computational efficiency, the sequence should converge sufficiently rapidly. For the scheme to be attractive, $f_n$ should also be generated efficiently, so that after all the iterations the total computational labor is less than that of solving the original set of N equations by alternative methods.
Approximate solutions of these equations can be obtained by the variational method with a basis in a proper subspace of $R^N$. The set of equations given by Eq. (3.18-a) with appropriate substitutions then has a lower rank than the original N. The methods to solve Eq. (5.20) with this approach are termed the conjugate gradient-like. If A is positive-definite, the existence of the solution of the reduced set of equations is guaranteed. For a general matrix, even if self-adjoint, one may encounter singularities. To enlarge the class of matrices to include in the procedure, one may solve the equivalent equation:
$$A^t A f = A^t g, \tag{5.23}$$
where $A^t$ is the transpose of A. This reduces the problem to inverting a positive definite matrix. It is clear that the methods to treat Eq. (5.23) apply to the conjugate gradient-like methods where A is positive-definite. Solving Eq. (5.23) by the variational method is equivalent to determining the stationary point of the diagonal element $F[h; h] = F^d[h]$ of the form F in the trial subspace, where
$$F^d[h] = 2(h, g) - (h, A^t A h), \tag{5.24}$$
for a fixed g, as described in sec. 3.I. (Eq. (3.14)). Since $A^t A$ is positive-definite, the stationary point is a maximum (Eq. (3.15-a)). Thus, if w is the variational solution, we have
$$\max_{u} F^d[u] = F^d[w],$$
with u varying over the trial subspace. Let
$$F_n^d[h] = 2(h, A^t r_n) - (h, A^t A h).$$
In view of the fact that the underlying space is real, we have
$$\|r_{n+1}\|^2 = \|r_n\|^2 - F_n^d[h_n], \tag{5.25}$$
where $h_n$ is an approximate solution of $A^t A h_n = A^t r_n$. Thus solving $A^t A h_n = A^t r_n$ by the variational method is equivalent to minimizing $\|r_{n+1}\|$ at each stage of iteration. For this reason, this class of the methods is termed the conjugate residual-like.
Almost invariably, the trial subspace in these schemes is taken to be a subspace of the linear span of $\{A^n r_0\}$, termed the Krylov space. This is identical to the construction of $H_M(r_0, A)$ in the treatment of the moment problem (Theorem 3.10). With this choice of the basis, the procedures are termed the Petrov-Galerkin-Krylov methods. However, in the moment method, $H_M(r_0, A)$ is in general an infinite-dimensional space and the issues are whether $H_M(r_0, A)$ is large enough to accommodate the solutions and whether the procedure produces a convergent sequence as the subspace is enlarged. In the present case, the dimension N′ of $H_M(r_0, A)$ can at most be equal to N, in which case it is identical to $R^N$. As indicated in sec. 3.III., N′ can be strictly less than N. If so, the first question is still relevant. Due to the finite dimensionality of the underlying space, the second question becomes redundant. Instead, the issue to be resolved is whether the sequence generated by solving a reduced set of the equations at each iteration stage converges to the solution of interest. This problem will be addressed in the following. Although a resolution of the sufficiency of $H_M(r_0, A)$ to accommodate the solution was provided by Theorem 3.10, it will be resolved in the process of proving the convergence results below.
Various implementations of this basic procedure differ mostly in the selection of the trial subspace. In the residue based generalized conjugate residual methods (RGCR), it is contained in the linear span of the residues $\{r_m\}_{m=0}^{n}$. We restrict to this procedure for the present, which is sufficient for an illustration of the basic points. In general, RGCR = RGCR(b, θ, φ) are described by the following steps [Saad and Schultz, 1985; Vatsya, 1988]:
Choose $f_0$ and set $p_0 = r_0 = g - A f_0$.
Iterate: For $n = 0,1,2,\dots$
$$f_{n+1} = f_n + \sum_{m=\phi(n)}^{n}\alpha_{nm}\,p_m,\qquad r_{n+1} = r_n - \sum_{m=\phi(n)}^{n}\alpha_{nm}\,A p_m = A(f - f_{n+1}), \tag{5.26-a}$$
where $\alpha_{nm}$ are determined by
$$(A p_m, r_{n+1}) = \Big(A p_m, \Big[r_n - \sum_{l=\phi(n)}^{n}\alpha_{nl}\,A p_l\Big]\Big) = 0,\qquad \phi(n) \le m \le n. \tag{5.26-b}$$
Compute $p_{n+1}$:
$$p_{n+1} = r_{n+1} + \sum_{m=\theta(n)}^{n}\beta_{nm}\,p_m, \tag{5.27-a}$$
where $\beta_{nm}$ are determined by $b[p_m; p_{n+1}] = 0$, $\theta(n) \le m \le n$.
The functions $\phi(n)$ and $\theta(n)$ are non-negative, non-decreasing and integer valued, and $b[\,;\,]$ is a positive definite bilinear form, which is unambiguously identified with a positive definite matrix T, i.e., $b[u; w] = (u, T w)$, e.g., from Proposition 2.5. Thus, $p_{n+1}$ is determined by solving
$$\Big(p_m, T\Big[r_{n+1} + \sum_{l=\theta(n)}^{n}\beta_{nl}\,p_l\Big]\Big) = 0,\qquad \theta(n) \le m \le n. \tag{5.27-b}$$
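The iteration of Eqs. (5.26) and (5.27) can be sketched for the simplest choice $T = A^t A$ with $\phi(n) = \theta(n) = 0$, i.e., all previous directions are retained and there is no restart. With this choice the vectors $A p_m$ are mutually orthogonal, so Eq. (5.26-b) leaves only the coefficient $\alpha_{nn}$ nonzero. The function and helper names are assumptions made for this illustration, and the test matrix is chosen with a positive definite symmetric part so that the iteration cannot break down.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def gcr(A, g, f0=None, tol=1e-12, max_iter=None):
    """Unrestarted sketch of Eqs. (5.26)-(5.27) with T = A^t A, phi = theta = 0."""
    n = len(g)
    f = list(f0) if f0 is not None else [0.0] * n
    r = [gi - ai for gi, ai in zip(g, matvec(A, f))]
    directions = [list(r)]            # p_0 = r_0
    images = [matvec(A, r)]           # A p_0
    for _ in range(max_iter or n):
        if dot(r, r) ** 0.5 <= tol:
            break
        p, Ap = directions[-1], images[-1]
        alpha = dot(Ap, r) / dot(Ap, Ap)          # Eq. (5.26-b), only alpha_nn survives
        f = [fi + alpha * pi for fi, pi in zip(f, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        # Next direction: r_{n+1} made orthogonal to all p_m in the (u, A^t A w) form.
        Ar = matvec(A, r)
        new_p, new_Ap = list(r), list(Ar)
        for pm, Apm in zip(directions, images):
            beta = -dot(Apm, Ar) / dot(Apm, Apm)  # Eq. (5.27-b) with T = A^t A
            new_p = [x + beta * y for x, y in zip(new_p, pm)]
            new_Ap = [x + beta * y for x, y in zip(new_Ap, Apm)]
        directions.append(new_p)
        images.append(new_Ap)
    return f, r


A_example = [[4.0, 1.0, 0.0], [-1.0, 3.0, 1.0], [0.0, -1.0, 2.0]]
g_example = [1.0, 2.0, 3.0]
solution, residual = gcr(A_example, g_example)
```

Storing every $p_m$ and $A p_m$ is what makes the storage and work grow with the iteration count, which is the motivation for restarting or truncating the sums.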
Eq. (5.26-b) is just Eq. (3.17-a), equivalently, (3.18-a), with appropriate substitutions, which determines $h_n$ with $[n - \phi(n) + 1]$ basis vectors $p_m$, and Eq. (5.27-b) determines the next basis vector as the component of $r_{n+1}$ that is orthogonal to the previous $[n - \theta(n) + 1]$ basis vectors. Since $0 \le \phi(n) \le n$ and $0 \le \theta(n) \le n$, it follows that $\phi(0) = \theta(0) = 0$. The storage requirements and the computational effort in $RGCR(T, \theta, \phi)$ increase rapidly as the differences $[n - \phi(n)]$ and $[n - \theta(n)]$ increase. To economize, a method may be used for a fixed number of iterations. If a satisfactory degree of accuracy has been achieved, then the procedure is terminated; otherwise it is restarted. This is termed the restarted $RGCR(T, \theta, \phi)$. A widely used form in this class is defined by $GCR(\tilde{l}) = RGCR(A^t A, 0, 0)$ restarted after every $\tilde{l}$ iterations. Another frequently used
method is $ORTHOMIN(l) = RGCR(A^t A, n - l + 1, n - l)$. While in $GCR(\tilde{l})$ the size of the orthonormal set and the number of equations increase with each iteration, in case of $ORTHOMIN(l)$ they are kept fixed, equal to $l$.

Let $w'$ be the variational approximation to $[A^t A]^{-1} g$ in a subspace of the approximating subspace. Then it follows from Corollary 3.2, as in Remark 3.5 (i) (Eq. (3.33-c)), that

\[
F^d[w] = \|A w\|^2 \ge 0; \quad F^d[w] - F^d[w'] = \|A(w - w')\|^2 \ge 0 . \tag{5.28}
\]
Although stated, the estimates will not be needed for the proofs; inequalities will suffice. Defining $\alpha_{nm}$ by Eq. (5.26-b) is equivalent to minimizing $\|r_{n+1}\|$, i.e., maximizing $F_n^d[u]$ as $u$ varies over the span of the vectors $\{p_m\}_{m=\phi(n)}^{n}$, which will be denoted by $S[\phi(n), n]$. If $\{p_m\}_{m=\phi(n)}^{n}$ is a linearly independent set of nonzero vectors, then Eq. (5.26-b) defines $\{\alpha_{nm}\}$ and then Eq. (5.26-a) determines $r_{n+1}$ uniquely. The maximum of $F_n^d[u]$ is attained for $h_n$ and $r_{n+1} = r_n - A h_n$. It is clear from Eq. (5.28) that $\|r_n\|$, when well-defined, is a non-increasing sequence and $\|r_{n+1}\|$ decreases as $\phi(n)$ is decreased.

If $r_0 = 0$, then $f = f_0$; otherwise $p_0 = r_0$. For $m \ge 1$ the basis vectors are constructed as follows. Let $n_0 = \min[\phi(n), \theta(n)]$ and let $\{p_m\}_{m=n_0}^{n}$ be the set of nonzero, linearly independent vectors such that $(p_n, T p_m) = 0$, $\theta(n-1) \le m \le n-1$. The set $\{p_m\}_{m=n_0}^{n}$ together with Eqs. (5.26-a) and (5.26-b) determines $r_{n+1}$ uniquely. If $r_{n+1} = 0$, then $f = f_{n+1}$; otherwise $p_{n+1}$ is determined by Eqs. (5.27-a) and (5.27-b) with

\[
\beta_{nm} = - \frac{(T p_m, r_{n+1})}{(T p_m, p_m)} . \tag{5.29}
\]

The set $\{p_m\}_{m=(n+1)_0}^{n+1}$ constructed by Eq. (5.29) satisfies the orthogonality condition but
there is no guarantee that it will be a set of nonzero, linearly independent vectors. To reduce the rank, in these methods we generate a subspace of $H_M(r_0, A)$ at each iteration stage. The basic hindrance to the convergence arises from the fact that $H_M(r_0, A)$ may not be generated in the course of the iterations. As a counter example, let

\[
A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad g = \begin{bmatrix} 1 \\ 0 \end{bmatrix}; \quad \text{with } f = \begin{bmatrix} 0 \\ 1 \end{bmatrix} . \tag{5.30}
\]

Take $f_0 = 0$; then $r_0 = g$, $\alpha_{00} = 0$, $f_1 = 0$, $r_1 = r_0$ and $p_1 = 0$ for any choice of $T$.
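The breakdown in this counter example can be reproduced in a few lines. The following is a minimal sketch in plain Python and is not part of the original text; the helper names `matvec` and `dot` and the particular positive definite matrix `T` are illustrative assumptions. It applies Eqs. (5.26-a), (5.26-b), (5.27-a) and (5.29) once, starting from $f_0 = 0$.

```python
# One RGCR step for the counter example of Eq. (5.30), using plain Python lists.
# The helpers and the choice of T are assumptions made for this illustration.

def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def dot(x, y):
    """Euclidean scalar product (x, y)."""
    return sum(a * b for a, b in zip(x, y))

A = [[0.0, 1.0], [1.0, 0.0]]
g = [1.0, 0.0]
T = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical positive definite T

f0 = [0.0, 0.0]
r0 = [gi - ai for gi, ai in zip(g, matvec(A, f0))]   # r_0 = g - A f_0
p0 = r0[:]                                           # p_0 = r_0

Ap0 = matvec(A, p0)
alpha00 = dot(Ap0, r0) / dot(Ap0, Ap0)               # Eq. (5.26-b), single basis vector
f1 = [fi + alpha00 * pi for fi, pi in zip(f0, p0)]   # Eq. (5.26-a)
r1 = [ri - alpha00 * ai for ri, ai in zip(r0, Ap0)]

Tp0 = matvec(T, p0)
beta00 = -dot(Tp0, r1) / dot(Tp0, p0)                # Eq. (5.29)
p1 = [ri + beta00 * pi for ri, pi in zip(r1, p0)]    # Eq. (5.27-a)

print("alpha00 =", alpha00, " f1 =", f1, " r1 =", r1, " p1 =", p1)
```

Since $r_1 = r_0 = p_0$, Eq. (5.29) gives $\beta_{00} = -1$ whatever positive definite $T$ is used, so $p_1$ vanishes and no new direction is generated.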
To establish the conditions to guarantee that $\{p_m\}$ and $\{r_m\}$ are well defined, which is shown to be essential for the convergence by this counter example, we need the following result:

Lemma 5.9. Let $p_m \ne 0$ and $\{p_l\}_{l=\phi(m-1)}^{m}$, $0 \le m \le n-1$, be linearly independent, and let $p_n = 0$ or $\{p_m\}_{m=\phi(n-1)}^{n}$ be linearly dependent. Then

\[
\max_{u \in S[\phi(n), n]} F_n^d[u] = 0 .
\]
Proof. The set $S[\phi(n-1), n-1]$, which is contained in $S[\phi(n-2), n-1]$, is a set of nonzero, linearly independent vectors. Hence Eq. (5.26-b) together with Eq. (5.26-a), with $(n-1)$ substituted for $n$, determines $r_n$ uniquely, i.e.,

\[
r_n = r_{n-1} - \sum_{m=\phi(n-1)}^{n-1} \alpha_{n-1,m}\, A p_m ,
\]

where $\alpha_{n-1,m}$ are determined by

\[
(A p_m, r_n) = \Big(A p_m, \Big[r_{n-1} - \sum_{l=\phi(n-1)}^{n-1} \alpha_{n-1,l}\, A p_l\Big]\Big) = 0, \quad \phi(n-1) \le m \le (n-1) . \tag{5.31}
\]

If $p_n = 0$ or $p_n$ depends linearly on $\{p_m\}_{m=\phi(n-1)}^{n-1}$, then $S[\phi(n), n]$ is contained in $S[\phi(n-1), n-1]$. Consequently from Eq. (5.28), we have that

\[
\max_{u \in S[\phi(n-1), n-1]} F_n^d[u] \ge \max_{u \in S[\phi(n), n]} F_n^d[u] \ge 0 . \tag{5.32}
\]

The left side of Eq. (5.32) is obtained by solving Eq. (3.17-a), equivalently, Eq. (5.26-b) with appropriate substitutions, i.e.,

\[
\Big(A p_m, r_n - \sum_{l=\phi(n-1)}^{n-1} \gamma_l\, A p_l\Big) = 0, \quad \phi(n-1) \le m \le n-1 .
\]

Since $(A p_m, r_n) = 0$ from Eq. (5.31), for each $m$, and the matrix with elements $(A p_m, A p_l)$ is positive definite due to the positive definiteness of $A^t A$, $\gamma_m = 0$ for each $m$. Therefore, the maximizing vector is equal to zero, implying that the left side in Eq. (5.32) is equal to zero. The result follows from Eq. (5.32) •
Lemma 5.10. Let $(A + A^t)$ be positive definite and let $\phi(n) \le \theta(n-1)$ for each $n$. If $r_n \ne 0$ then $p_n \ne 0$ and $\{p_m\}_{m=\phi(n-1)}^{n}$ is linearly independent.

Proof. The result is clearly true for $n = 0$. Assume it to be valid for some $n$. If $p_{n+1} = 0$ or if $\{p_m\}_{m=\phi(n)}^{n+1}$ is linearly dependent, then from Lemma 5.9,

\[
\max_{u \in S[\phi(n+1), n+1]} F_{n+1}^d[u] = 0 .
\]

From (5.27-a) we have $r_{n+1} = p_{n+1} - \sum_{m=\theta(n)}^{n} \beta_{n,m}\, p_m$. Since $\phi(n+1) \le \theta(n)$, it follows that $r_{n+1}$ is a vector in $S[\phi(n+1), n+1]$. Hence from Eq. (5.28),

\[
\delta = \max_{u \in S_{n+1}} F_{n+1}^d[u] \le \max_{u \in S[\phi(n+1), n+1]} F_{n+1}^d[u] = 0 ,
\]

where $S_{n+1}$ is the one-dimensional span of the vector $r_{n+1}$. A straightforward computation yields

\[
\delta = \frac{(r_{n+1}, A r_{n+1})^2}{\|A r_{n+1}\|^2} = \frac{1}{4}\, \frac{(r_{n+1}, [A + A^t] r_{n+1})^2}{\|A r_{n+1}\|^2} > 0 ,
\]

which is a contradiction. Hence the result holds for $(n+1)$, and by induction, for all $n$ •

Let $r_n$, $p_n$ be well-defined for $n = 0, 1, 2, \ldots$, which is obviously the case if the conditions of Lemma 5.10 are satisfied. For $n = 0$, $p_0 = r_0 = P_0(A) r_0$. If $p_n$, $r_n$ are of the form $P_n(A) r_0$, then it follows from Eqs. (5.26-a) and (5.27-a) that $r_{n+1}$, $p_{n+1}$ are of the form $P_{n+1}(A) r_0$. By induction then, $p_n$, $r_n$ are in $H_M(r_0, A)$. The following results show that the exact solution is obtained with at most $N' = \dim[H_M(r_0, A)]$ basis vectors.

Corollary 5.7. In addition to the assumptions of Lemma 5.10, let $[n - \phi(n-1)] \ge N'$ for some $n$. Then $r_m = 0$ for some $m \le n$.

Proof. Assume that $r_m \ne 0$, $m = 0, 1, 2, \ldots, n$. From Lemma 5.10, $p_m \ne 0$ for $0 \le m \le n$ and $\{p_m\}_{m=\phi(n-1)}^{n}$ is linearly independent. If $[n - \phi(n-1)] \ge N'$, this implies that the number of linearly independent vectors in $H_M(r_0, A)$ exceeds its dimension, which is a contradiction •
Corollary 5.8. Let $(A + A^t)$ be positive definite. Then $RGCR(T, \theta, 0)$ yields the exact solution in at most $N'$ iterations.

Proof. If $\phi(n) = 0$ for each $n$, then $\phi(n) \le \theta(n-1)$. Hence from Corollary 5.7, $r_m = 0$ for some $m \le N'$ •

Corollary 5.9. If the assumptions of Lemma 5.10 are satisfied, then $RGCR(T, 0, \phi)$ yields the exact solution in at most $N'$ iterations.

Proof. The assumptions $\theta(n) = 0$ and $\phi(n) \le \theta(n-1)$ for each $n$ imply that $\phi(n) = 0$. The result follows from Corollary 5.8 •

So far $T$ has played no significant role except for defining $\{p_n\}$. In the following we show that the choice $T = A^t A$ reduces the computational complications significantly.
Lemma 5.11. Let $(A + A^t)$ be positive definite, $T = A^t A$ and $\phi(n) = \theta(n-1) = \tilde{\phi}(n)$. If $r_n \ne 0$ for each $n$, then $\alpha_{nm} = 0$ for $\tilde{\phi}(n) \le m \le (n-1)$ and $\alpha_{nn} = (A p_n, r_n) / \|A p_n\|^2$.

Proof. From Eq. (5.26-b), $\{\alpha_{nm}\}$ is obtained by solving

\[
(A p_m, r_n) - \sum_{l=\phi(n)}^{n} \alpha_{nl}\, (A p_m, A p_l) = 0, \quad \tilde{\phi}(n) \le m \le n .
\]

If $T = A^t A$, then Eq. (5.27-b) reduces to $(A p_m, A p_{n+1}) = 0$, $\tilde{\phi}(n) \le m \le n$, for each $n$. Hence

\[
(A p_l, A p_m) = 0, \quad l \ne m \le n , \tag{5.33}
\]

yielding

\[
\alpha_{nm} = (A p_m, r_n) / \|A p_m\|^2, \quad \tilde{\phi}(n) \le m \le n . \tag{5.34}
\]

From Lemma 5.10, $p_m \ne 0$ for $\tilde{\phi}(n) \le m \le n$; thus $\alpha_{nm}$ are well defined. We show below by induction that for each $n$, $(A p_m, r_n) = 0$ for $\phi(n) \le m \le n-1$, implying the remainder of the result.
We show the validity of the result for $(n+1)$ if it holds for $n$. The case of $n = 1$ can be shown to be true by the same argument. Alternatively, $p_{-1}$ can be taken to be zero, showing the validity of the result for $n = 0$. Assume that $(A p_m, r_n) = 0$, $\tilde{\phi}(n) \le m \le n-1$, i.e., $\alpha_{nm} = 0$ from Eq. (5.34). Now it follows from Eq. (5.26-a) that $r_{n+1} = r_n - \alpha_{nn} A p_n$, and hence

\[
(A p_m, r_{n+1}) = (A p_m, r_n) - \alpha_{nn} (A p_m, A p_n) .
\]

For $\tilde{\phi}(n) \le m \le n-1$, the first term is zero by assumption and the second is zero from Eq. (5.33). For $m = n$,

\[
(A p_n, r_{n+1}) = (A p_n, r_n) - \frac{(A p_n, r_n)}{\|A p_n\|^2}\, \|A p_n\|^2 = 0 ,
\]

from Eq. (5.34). Thus $(A p_m, r_{n+1}) = 0$ for $\tilde{\phi}(n) \le m \le n$; and since $\tilde{\phi}(n) \le \tilde{\phi}(n+1)$, also for $\tilde{\phi}(n+1) \le m \le n$ •

The following result for GCR is a trivial consequence:

Corollary 5.10. Let $(A + A^t)$ be positive definite. Then $GCR(\infty)$ yields the exact solution in at most $N'$ iterations.

Proof. Since $GCR(\infty) = RGCR(A^t A, 0, 0)$, the result follows from Corollary 5.8 •
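The one-term update of Lemma 5.11 and the formula of Eq. (5.29) with $T = A^t A$ translate directly into a short routine. The following is a minimal sketch in plain Python under stated assumptions, not the author's implementation: the function name `gcr`, the helpers, the stopping tolerance `tol` and the truncation parameter `window` are all hypothetical. With `window=None` every direction is kept, as in $GCR(\infty)$; with `window=l` only the last `l` directions are kept, in the spirit of $ORTHOMIN(l)$.

```python
def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def dot(x, y):
    """Euclidean scalar product (x, y)."""
    return sum(a * b for a, b in zip(x, y))

def gcr(A, g, f0, max_iter, window=None, tol=1e-12):
    """Sketch of RGCR(A^t A, theta, phi) with the one-term update of Lemma 5.11.

    Assumes (A + A^t) is positive definite, so that A p_n is nonzero while r_n is.
    Returns the final iterate and the list of residual norms ||r_n||.
    """
    if window is not None and window < 1:
        raise ValueError("window must be at least 1")
    f = list(f0)
    r = [gi - ai for gi, ai in zip(g, matvec(A, f))]   # r_0 = g - A f_0
    P, AP = [r[:]], [matvec(A, r)]                     # stored p_m and A p_m
    norms = [dot(r, r) ** 0.5]
    for _ in range(max_iter):
        if norms[-1] <= tol:
            break
        p, Ap = P[-1], AP[-1]
        alpha = dot(Ap, r) / dot(Ap, Ap)               # alpha_nn of Lemma 5.11
        f = [fi + alpha * pi for fi, pi in zip(f, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        norms.append(dot(r, r) ** 0.5)
        # Eq. (5.29) with T = A^t A: beta_nm = -(A p_m, A r_{n+1}) / ||A p_m||^2
        Ar = matvec(A, r)
        p_new, Ap_new = r[:], Ar[:]
        for pm, Apm in zip(P, AP):
            beta = -dot(Apm, Ar) / dot(Apm, Apm)
            p_new = [x + beta * y for x, y in zip(p_new, pm)]
            Ap_new = [x + beta * y for x, y in zip(Ap_new, Apm)]
        P.append(p_new)
        AP.append(Ap_new)
        if window is not None:                         # keep only the last directions
            P, AP = P[-window:], AP[-window:]
    return f, norms

# Hypothetical 3 x 3 example: A = 4 * identity + skew part, so (A + A^t) = 8 * identity.
A = [[4.0, 1.0, 0.0], [-1.0, 4.0, 2.0], [0.0, -2.0, 4.0]]
g = [1.0, 2.0, 3.0]
f_full, norms_full = gcr(A, g, [0.0, 0.0, 0.0], max_iter=3)
f_trunc, norms_trunc = gcr(A, g, [0.0, 0.0, 0.0], max_iter=25, window=1)
print(norms_full[-1], norms_trunc[-1])
```

Storing $A p_m$ alongside $p_m$ avoids a second matrix-vector product per step. The routine fails with a division by zero in the situation of the counter example of Eq. (5.30), where $(A + A^t)$ is not positive definite and $p_1 = 0$.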
The main objective of these methods is to iterate the solutions of a small set of algebraic equations rather than increase the dimension until the exact solution is obtained. The associated convergence question is addressed next. Since $r_n = A(f - f_n)$ and $A$ is invertible, $f_n \to f$ if and only if $r_n \to 0$, as indicated earlier. If $r_n = 0$ for some $n$, then $f = f_n$ and the question of convergence becomes redundant. Therefore we assume that $r_n \ne 0$ for all $n$. Let

\[
\mu = \left[ 1 - \frac{1}{4} \frac{\nu^2}{\gamma} \right]^{1/2} , \tag{5.35}
\]

where $\nu$ is the smallest eigenvalue of $(A + A^t)$ and $\gamma$ is the largest eigenvalue of $A^t A$. If $(A + A^t) > 0$, it is clear that $\mu < 1$. We have

Theorem 5.4. Let $(A + A^t)$ be positive definite and let $\phi(n) \le \theta(n-1)$ for each $n$. With $\{r_n\}$ being the sequence of the residues generated by $RGCR(A^t A, \theta, \phi)$, we have that for each $n$, $\|r_n\| \le \mu^n \|r_0\| \to 0$ as $n \to \infty$.

Proof. Under the stated assumptions, for each $n$, $p_n \ne 0$ and $\{p_m\}_{m=\phi(n-1)}^{n}$ is linearly independent, from Lemma 5.10. Since $\phi(n-1) \le \phi(n) \le \theta(n-1) \le \theta(n)$, this implies the same for $\{p_m\}_{m=\theta(n)}^{n}$. Therefore $\{r_n\}$ and $\{p_n\}$ are uniquely determined from Eqs. (5.26-b) and (5.27-a), respectively. From Eq. (5.27-a), $r_n$ is in $S[\theta(n-1), n]$, which is contained in $S[\phi(n), n]$. Now, from Eq. (5.28),

\[
\delta = \max_{u \in S[\phi(n), n]} F_n^d[u] \ge \max_{u \in S_n} F_n^d[u] = \delta' .
\]

Since $\|r_{n+1}\|^2 \le \|r_n\|^2 - \delta \le \|r_n\|^2 - \delta'$, we have that

\[
\|r_{n+1}\|^2 \le \|r_n\|^2 - \delta' = \|r_n\|^2 - \frac{(r_n, A r_n)^2}{\|A r_n\|^2} = \|r_n\|^2 - \frac{1}{4}\, \frac{(r_n, [A + A^t] r_n)^2}{\|A r_n\|^2} \le \mu^2 \|r_n\|^2 ,
\]

implying the bound part of the result. Convergence follows from the bound together with $\mu < 1$ •

Consider the restarted version of $RGCR(A^t A, \theta, \phi)$. This method is used to generate
$f_n$ and $r_n$ for $1 \le n \le \tilde{l} \le N'$, starting with $f_0$ and $r_0$, which completes the first cycle. At the end of the cycle, one has $r_{\tilde{l}} = A(f - f_{\tilde{l}})$, which is of the same form as the original $r_0 = A(f - f_0)$. Thus the method can be applied to the new equation. Let the residues obtained during the $l$th cycle be denoted by $r_{l,m}$, $m = 0, 1, 2, \ldots, \tilde{l}$.

Theorem 5.5. With the assumptions of Theorem 5.4, for each $l$ and $0 \le m \le \tilde{l}$, we have that

\[
\|r_{l,m}\| \le \mu^{(l-1)\tilde{l} + m}\, \|r_0\| \to 0 \quad \text{as } l \to \infty .
\]

Proof. For $l = 1$, it follows from Theorem 5.4 that $\|r_{1,m}\| \le \mu^m \|r_0\|$ and $r_{1,m} = A(f - f_{1,m})$, $0 \le m \le \tilde{l}$. Assume that $\|r_{l,m}\| \le \mu^{(l-1)\tilde{l} + m} \|r_0\|$ for some $l$ and $r_{l,m} = A(f - f_{l,m})$ for $0 \le m \le \tilde{l}$. Setting $m = \tilde{l}$, one has that $r_{l,\tilde{l}} = r_{l+1,0}$ and $f_{l,\tilde{l}} = f_{l+1,0}$. Substituting $r_{l+1,0}$, $f_{l+1,0}$ for $r_0$, $f_0$, respectively, in Theorem 5.4 yields

\[
\|r_{l+1,m}\| \le \mu^m \|r_{l+1,0}\| = \mu^m \|r_{l,\tilde{l}}\| \le \mu^{l\tilde{l} + m}\, \|r_0\| ,
\]

and $r_{l+1,m} = A(f - f_{l+1,m})$. The bound part follows from induction and the convergence, from the bound together with $\mu < 1$ •

The following result is obtained from Theorems 5.4 and 5.5 by observing that
$ORTHOMIN(l) = RGCR(A^t A, n - l + 1, n)$ and $GCR(\tilde{l}) = RGCR(A^t A, 0, 0)$ restarted after every $\tilde{l}$ iterations.

Corollary 5.11. Let $(A + A^t)$ be positive definite.

(i) If $\{r_n\}$ is generated by $ORTHOMIN(l)$, $0 \le l \le n + 1$; then

\[
\|r_n\| \le \mu^n \|r_0\| \to 0 \quad \text{as } n \to \infty .
\]

(ii) If $\{r_{l,m}\}$, $0 \le m \le \tilde{l}$, is generated by $GCR(\tilde{l})$ during the $l$th cycle; then

\[
\|r_{l,m}\| \le \mu^{(l-1)\tilde{l} + m}\, \|r_0\| \to 0 \quad \text{as } l \to \infty .
\] •
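The bound of Theorem 5.4 can be checked by hand on a small example. The matrix below, with real parameters $a > 0$ and $b$, is a hypothetical choice made here for illustration only; $\mu$ is taken to satisfy $\mu^2 = 1 - \nu^2 / (4\gamma)$, as used in the last inequality of the proof of Theorem 5.4.

```latex
% Worked example (illustrative; a > 0 and b are assumed parameters).
\[
A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}, \qquad
A + A^t = 2a \, 1, \qquad
A^t A = (a^2 + b^2) \, 1 ,
\]
so that $\nu = 2a$, $\gamma = a^2 + b^2$ and
\[
\mu^2 = 1 - \frac{\nu^2}{4\gamma} = \frac{b^2}{a^2 + b^2} .
\]
For any $r_0 \neq 0$, the skew part of $A$ does not contribute to $(r_0, A r_0)$, hence
$(r_0, A r_0) = a \|r_0\|^2$ and $\|A r_0\|^2 = (a^2 + b^2) \|r_0\|^2$. With $p_0 = r_0$,
\[
\|r_1\|^2 = \|r_0\|^2 - \frac{(r_0, A r_0)^2}{\|A r_0\|^2}
          = \Big( 1 - \frac{a^2}{a^2 + b^2} \Big) \|r_0\|^2
          = \mu^2 \|r_0\|^2 .
\]
```

In this example the first step attains the bound with equality, which suggests that the estimate cannot be improved in general without further assumptions on $A$.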
We have concentrated on the residue based generalized conjugate residual method. There are other procedures in the general class termed the Petrov-Galerkin-Krylov methods. However, the present method of analysis can be used to study the properties of others as well. An advantage in using these iterative schemes is that the sparseness and other structures of the matrices, as well as the relative properties of the right side, are exploited without their explicit consideration. Although their convergence is limited by the counter example of Eq. (5.30), their range of applicability is reasonably wide, particularly with the extensions and adjustments that have been made. A disadvantage is that they are numerically attractive only if the convergence is rapid. For example, in GCR, the accumulated computational effort increases rapidly with iterations, impacting adversely on its efficiency. This is one reason that its restarted version is often preferred. When the procedure converges, the rate is limited by the condition number of the matrix, i.e., the ratio of its smallest eigenvalue to the largest, related with $\mu$ defined by Eq. (5.35). Although the error estimates obtained are upper bounds, in practice the rate of convergence is quite close to that determined by them. If the condition number is small, the convergence of the procedures is too slow to be of much practical value. These properties of the schemes limit their range of applicability. The methods have been modified and improved to enable their efficient use in a variety of problems but a judicious choice among these and the other available methods is necessary whenever a system of algebraic equations is to be solved.
Chapter 6

6. ATOMIC SYSTEMS

6.I. PRELIMINARIES

In this chapter, general results obtained on the variational and perturbation methods developed in ch. 3 and ch. 4 will be used to investigate the non-relativistic quantum mechanical atomic systems. Systems of $(N + 1)$ interacting particles can be reduced to the case of $N$ by separating the centre of mass motion, leaving $3N$ independent variables. The corresponding time-dependent Schrödinger's equation reads

\[
i \frac{\partial u(t)}{\partial t} = \left[ -\nabla^2_{3N} + V_{3N} \right] u(t) , \tag{6.1}
\]

where $\nabla^2_{3N}$ is the Laplacian in $R^{3N}$, which can be expressed as a tensor product as in Eq. (2.25), and $V_{3N}$ is the corresponding interparticle potential. For a two-particle system, Eq. (6.1) reduces to a single particle in a potential field, which is considerably simpler than the genuine multi-particle system. In case of the atoms, purely Coulombic potentials are encountered. Nuclear multi-particle systems interact via short-range potentials, which can be treated by the same methods as the atoms with simplifications resulting from the short range nature of the interactions. Also, the neutral atoms interact with each other with potentials that decay rapidly at infinity, e.g., the interaction between the neutral atoms with electrons only in the S-orbitals decays exponentially at infinity. In general, the decay is proportional to a negative power of the distance but still sufficiently rapid for the potential to be short-range [Nuttall and Singh, 1979]. For ions, the potential decays as a Coulombic plus a short range term. Thus, the short range and long range potentials as well as their combinations are of interest. We shall restrict to the Coulomb potentials and the potentials of the Rollnik class. Various other cases indicated in the process can be treated similarly. Eq. (6.1) can be expressed as
\[
i \frac{\partial u(t)}{\partial t} = \left[ A^{00} + V \right] u(t) , \tag{6.2}
\]

where $A^{00}$ is the self-adjoint realization of $-\nabla^2_{3N}$ and $V$ is the interaction term. Two-particle potentials are $A^{00}$-compact (Example 4.1(ii)). For a multi-particle system, this class of potentials defines a self-adjoint operator $A = (A^{00} + V)$, bounded from below. This follows by expressing it as a tensor product of the self-adjoint operators of Example 4.1(i), and the identity operators (Eq. (2.25)). The stationary states of Eq. (6.2), i.e., $u_n(t) = e^{-i\lambda_n t} u(0)$, describe the bound states, reducing it to an eigenvalue equation

\[
\left[ A^{00} + V \right] u_n(t) = \lambda_n u_n(t) . \tag{6.3}
\]

Since the parameter $t$ is redundant in Eq. (6.3), it will be suppressed. As was shown in sec. 4.II.2., the scattering states for a particle in a short range potential can be described by inhomogeneous equations. Inhomogeneous equations are also encountered in the treatment of eigenvalue problems as shown in sec. 4.II.1. As indicated in Remark 4.5, a determination of the critical points also reduces to an eigenvalue problem in some cases; in others, additional considerations are required. Some physical phenomena are described in terms of the time evolution of the solution of Eq. (6.2). Thus, all three types of equations considered in the previous chapters are relevant. In this chapter, we restrict to the bound states, including critical points, and the scattering phenomena. The propagator will be considered in ch. 7 together with other topics in transport phenomena and a treatment of the fundamental quantum theory.

6.II. EIGENVALUES AND CRITICAL POINTS

6.II.1. Helium Atom

Eq. (6.3) with $N = 1$, $V = -2 r_1^{-1}$, where $r_j = |\mathbf{r}_j|$, representing the Hydrogen atom, is exactly solvable with $\lambda_n = -1/n^2$, $n = 1, 2, \ldots, \infty$, each with multiplicity $n^2$, which is commensurate with Example 4.1(ii) and Remark 4.3.(i). Thus, its discrete spectrum is contained in the interval $[-1; 0]$ with zero being the accumulation point. The positive real line constitutes its absolutely continuous spectrum. For the Helium atom, we have
\[
A = \left[ A^0 + V \right] = \left[ A^0 + \frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|} \right] , \tag{6.4}
\]

where

\[
A^0 = \left[ A^{00} + V^0 \right] = \left[ -\nabla^2_{r_1} - \nabla^2_{r_2} - \frac{2}{r_1} - \frac{2}{r_2} \right] .
\]

The operator $A^0$ can be expressed as a tensor product $A^0 = [A^0_1 \otimes 1 + 1 \otimes A^0_2]$ with $A^0_j = [-\nabla^2_{r_j} - 2/r_j]$. The spectra of the tensor products of closed operators can be determined by Ichinose's lemma [Reed and Simon, 1978, Vol IV, Corollary 2, Sec. XIII.10], which states that

\[
\sigma[A_1 \otimes 1 + 1 \otimes A_2] = \sigma[A_1] + \sigma[A_2] . \tag{6.5}
\]

It follows that the point spectrum of $A^0$ consists of the set $\{-n^{-2} - m^{-2}\}_{n,m=1}^{\infty}$, i.e., the
isolated eigenvalues in $[-2; -1)$ with the accumulation points at $-1$ and zero, and the essential spectrum covering the interval $[-1; \infty)$. The part of the spectrum covering the strictly positive line is included in its absolutely continuous part. The term $|\mathbf{r}_1 - \mathbf{r}_2|^{-1}$ shifts the point spectrum in the positive direction from the monotonicity principle (Lemma 3.20). However, the lower part of the spectrum still consists of the isolated eigenvalues with an accumulation point at $-1$ [Kato, 1951]. The atomic systems with more electrons possess a similar property, i.e., the lower part of the spectrum consists of the isolated eigenvalues with an accumulation point at the lowest point of the essential spectrum. However, except for the Helium atom, their eigenvalues are contained in the continuous spectrum of $A^0$. The essential spectrum in all cases consists of the absolutely continuous part, the transition points and the eigenvalues embedded in the continuum.

It is clear from Theorem 3.13 that the approximating sequences converging from above to a finite number of the lowest lying eigenvalues of the atomic systems can be generated by the variational method with a basis in $L^2(R^{3N}; dr)$. The sequences of the approximating eigenprojections also converge to the exact. This procedure is universally used for this purpose and to solve other similar problems. The alternative described in Remark 4.4, although numerically different, is analytically equivalent to the Rayleigh-Ritz method and thus, produces identical results. However, for lack of adequate error estimates on these upper bounds, the accuracy of the approximations remains undetermined. Converging error bounds or the lower bounds can correct this deficiency.

Lower bounds to a finite number of the lowest lying eigenvalues of the Helium atom and similar other systems have been obtained by the method of Lemma 4.9 and Theorem 4.5. There are some other similar and apparently unrelated methods for this purpose. Details of this and other similar methods are available elsewhere [Weinstein and Stenger, 1972]. Therefore, we describe the application of this method to the Helium atom briefly and compare it with the alternative of Theorem 4.6. The latter method requires no further explanation as it is directly applicable with straightforward substitutions in the expressions given in Lemmas 4.12 and 4.13.
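The structure of the point spectrum of $A^0$ described above can be listed explicitly. The following is a small illustrative sketch in plain Python; the cutoff `n_max` and the variable names are assumptions made here, not part of the text. It enumerates the values $-n^{-2} - m^{-2}$ and separates those lying below $-1$ from those lying inside the interval $[-1; \infty)$ covered by the essential spectrum.

```python
# Enumerate the eigenvalues -1/n^2 - 1/m^2 of A^0 for n <= m up to a cutoff.
# The cutoff n_max is an illustrative choice.
n_max = 6
levels = {(n, m): -1.0 / n**2 - 1.0 / m**2
          for n in range(1, n_max + 1) for m in range(n, n_max + 1)}

# Values below -1 are isolated; the others lie inside [-1, infinity).
isolated = {k: v for k, v in levels.items() if v < -1.0}
embedded = {k: v for k, v in levels.items() if v >= -1.0}

for (n, m), value in sorted(isolated.items(), key=lambda item: item[1]):
    print(f"isolated: n={n}, m={m}, eigenvalue={value:.6f}")
print("number embedded in [-1, infinity):", len(embedded))
```

Only the pairs with one index equal to 1 fall below $-1$, since $-n^{-2} - m^{-2} \ge -1/2$ when both indices are at least 2. The remaining eigenvalues are embedded in the essential spectrum.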
The approximate potential $\hat{V}_N = V P_N [P_N V P_N]^{-1} P_N V$ required in Lemma 4.9 is obtained by inverting the matrix $\tilde{V}_N$ with its $n, m$ element given by $(\tilde{V}_N)_{nm} = (\phi_n, V \phi_m)$ with respect to a basis $\{\phi_n\}$. From Eq. (3.17-a), the degenerate operator $\hat{V}_N$ is given by

\[
\hat{V}_N u = \sum_{n,m=1}^{N} V \phi_n\, ([\tilde{V}_N]^{-1})_{nm}\, (V \phi_m, u) . \tag{6.6}
\]
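The construction of Eq. (6.6) can be tried out in finite dimension, with a positive definite matrix standing in for $V$. The following is a minimal sketch in plain Python under stated assumptions: the matrix `V`, the two-element `basis` and the helper names are hypothetical, and the explicit $2 \times 2$ inverse restricts the sketch to $N = 2$.

```python
def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def dot(x, y):
    """Euclidean scalar product (x, y)."""
    return sum(a * b for a, b in zip(x, y))

def degenerate_operator(V, basis):
    """Return the map u -> V_hat_N u of Eq. (6.6) for a two-element basis (N = 2)."""
    Vphi = [matvec(V, phi) for phi in basis]
    # Matrix with elements (phi_n, V phi_m) and its explicit 2 x 2 inverse.
    a, b = dot(basis[0], Vphi[0]), dot(basis[0], Vphi[1])
    c, d = dot(basis[1], Vphi[0]), dot(basis[1], Vphi[1])
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]

    def apply(u):
        rhs = [dot(w, u) for w in Vphi]                          # (V phi_m, u)
        coef = [sum(inv[n][m] * rhs[m] for m in range(2)) for n in range(2)]
        return [sum(coef[n] * Vphi[n][i] for n in range(2)) for i in range(len(u))]

    return apply

V = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # hypothetical positive definite V
basis = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]               # hypothetical basis, N = 2
V_hat = degenerate_operator(V, basis)

u = [0.0, 0.0, 1.0]
print(dot(u, matvec(V, u)), dot(u, V_hat(u)))
```

In this finite-dimensional setting $\hat{V}_N$ agrees with $V$ on the span of the basis and $(u, \hat{V}_N u) \le (u, V u)$ for every $u$, which is the property that makes the construction suitable for lower bounds.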
As indicated in Theorem 4.5, with an arbitrary basis $\{\phi_n\}$, the method often experiences difficulty in solving the required eigenvalue problem

\[
(A^0 + \hat{V}_N)\, \hat{u} = \hat{\lambda}\, \hat{u} . \tag{6.7}
\]

The difficulty is circumvented by taking the basis such that $V \phi_n = u_n^0$, where $u_n^0$ are the eigenvectors of $A^0$. With this choice, $(\tilde{V}_N)_{nm} = (u_n^0, V^{-1} u_m^0)$, which reduces Eq. (6.7) to an eigenvalue equation in the span of $\{u_n^0\}$, resulting in substantial computational simplification. Evaluation of $(\tilde{V}_N)_{nm}$ in case of the Helium atom is facilitated by the fact that $V^{-1} = |\mathbf{r}_1 - \mathbf{r}_2|$.

It is clear that the strict positivity of $V$ as an operator, i.e., $(u, V u) > 0$, is necessary in this method; otherwise Eq. (6.6) may not even be defined. However, this is not sufficient to deduce the convergence of the resulting sequence. In addition to a lack of positive definiteness of $V$, an obstacle to the convergence arises from the fact that $\{u_n^0\}$ is not a basis in $H = L^2(R^{3N}; dr)$. Completeness of the spectral function (Corollary 2.2) requires integration on the spectrum of $A^0$, which has a non-empty essential part. However, numerical applications of this method have produced quite accurate bounds.

All of these analytical limitations are eliminated in the method of Theorem 4.6. The requirement of the strict positivity of the potential $V$ is reduced to non-negativity, the convergence result is valid as long as $A^0$ and $A$ are self-adjoint, and the procedure is applicable with an arbitrary basis without causing any complications. All methods for the lower bounds require more computational effort than the standard method to compute upper bounds. Additional work required in the method of Theorem 4.6 is essentially the evaluation of the matrix elements of $A^2$ and the effort in repeated construction of $\hat{\zeta}_j^N(\lambda)$ to obtain its fixed points $\hat{\lambda}_j^N$. Since $\hat{\zeta}_j^N(\lambda)$ is an increasing function, its fixed point can be obtained by the iterative method [Singh, 1981] but it converges too slowly. Alternative methods, e.g., the bisection and Newton's method (Proposition 4.2), have been successfully used, which are quite satisfactory [Tai, Pritchard and Vatsya, 1993]. There are other methods to calculate the lower bounds requiring only the matrix elements of $A^2$, but their convergence is not assured.
However, there are cases where the method of Theorem 4.5 is satisfactory and more convenient, as will be illustrated in sec. 7.II. Applications of all methods to compute the lower bounds to binding energies of atoms more complex than Helium have been impeded for lack of a suitable base operator. The Hydrogen-like part $A^0$ is not suitable since all of its eigenvalues are embedded in the essential spectrum of the perturbed operator. Further development of the topic is too limited to warrant a discussion.
6.II.2. Short Range Potentials

A major drawback of all methods to compute the lower bounds is their need for an exactly solvable base operator $A^0 \le A$. In the following, a procedure is described, which does not require a comparison operator to produce the converging lower and upper bounds, although it has other limitations. In particular, its extension to multi-particle systems will require further developments.

Consider a particle in a potential, i.e., $N = 1$. Let $A = (A^{00} + \kappa V)$ with an absolutely integrable positive $V$ of the Rollnik class. The eigenvalues of $A$ were characterized in Proposition 4.1, Eq. (4.42), as the solutions of

\[
[1 - \kappa K(\gamma)]\, \upsilon = 0 , \tag{6.8}
\]

where $K(\lambda) = \kappa \sqrt{V} (\lambda - A^{00})^{-1} \sqrt{V}$. Each eigenvalue $\gamma$ of $A$ is in the negative real line and the continuous spectrum of $A$ coincides with the positive real line. Let $\xi(\lambda)$ be an eigenvalue of $[-K(\lambda)]$. Then Eq. (6.8) is equivalent to $\xi(\gamma) = -1/\kappa$. The potential strength $\kappa$ is assumed to be negative and thus, $-1/\kappa > 0$. Since $K(\lambda)$ is compact, a converging sequence of approximations to each $\xi(\lambda)$ for each $\lambda$, and the corresponding eigenprojections, can be produced by the variational method (Corollary 3.7), and since $K(\lambda)$ is also in the Hilbert-Schmidt class, computable, converging error bounds can be obtained by Proposition 3.3.
Let $\{\xi_N(\lambda)\}$ be the approximating sequence converging to $\xi(\lambda)$ and let $\{\delta_N(\lambda)\}$ be the sequence of the corresponding error bounds, which are given explicitly by Proposition 3.3 in terms of $K(\lambda)$, $K_N(\lambda) = P_N K(\lambda) P_N$ and the eigenprojection $p_N(\lambda)$ of $K_N(\lambda)$ corresponding to the eigenvalue $\xi_N(\lambda)$, as

\[
\delta_N(\lambda) = \frac{1}{2}\, \alpha_N(\lambda) \left[ 1 - \left( 1 - \frac{4 \beta_N(\lambda)}{\alpha_N(\lambda)^2} \right)^{1/2} \right] \to 0 \quad \text{as } N \to \infty , \tag{6.9}
\]

where

\[
\beta_N(\lambda) = \| p_N(\lambda) K(\lambda) (1 - p_N(\lambda)) \|^2 \to 0 \quad \text{as } N \to \infty ,
\]
\[
\alpha_N(\lambda) = d_N(\lambda) - \| (1 - p_N(\lambda)) (K(\lambda) - K_N(\lambda)) (1 - p_N(\lambda)) \| ,
\]
\[
d_N(\lambda) = \| B_N^{-1}(\lambda) \|^{-1} ,
\]
\[
B_N(\lambda) = [\xi_N(\lambda) - (1 - p_N(\lambda)) K_N(\lambda) (1 - p_N(\lambda))] .
\]
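Once the two numbers $\alpha_N(\lambda)$ and $\beta_N(\lambda)$ are available, the bound of Eq. (6.9) is a one-line evaluation. The following is a minimal sketch in plain Python; the function name `error_bound` and the convention of returning `None` when the square root is not real are assumptions made here. The expression is rewritten in an algebraically equal form to avoid loss of accuracy when $\beta_N$ is small.

```python
import math

def error_bound(alpha, beta):
    """Evaluate delta = (alpha / 2) * [1 - (1 - 4 * beta / alpha**2) ** 0.5], as in Eq. (6.9).

    Returns None when alpha <= 0 or 4 * beta > alpha**2, where the square root
    is not real and the formula gives no bound (a convention chosen here).
    """
    if alpha <= 0.0 or 4.0 * beta > alpha * alpha:
        return None
    root = math.sqrt(1.0 - 4.0 * beta / (alpha * alpha))
    # Equal to (alpha / 2) * (1 - root), but without subtracting nearly equal numbers.
    return 2.0 * beta / (alpha * (1.0 + root))

print(error_bound(2.0, 0.75))
```

The value returned is the smaller root of $\delta^2 - \alpha_N \delta + \beta_N = 0$ and behaves like $\beta_N / \alpha_N$ as $\beta_N \to 0$.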
|| is replaced with the Hilbert-Schmidt norm, whenever it
exists, for computational convenience. The results are valid with the operator norm as well. Since [ −K (λ )] is a positive, compact operator, the upper part of its spectrum consists of the isolated eigenvalues and thus ξ N ( λ ) ↑ ξ ( λ ) , which follows from Theorem 3.13 applied
)
to K (λ ) . The opposite bound is given by [ξ N (λ ) + δ N ( λ )] . Let γ N , γ N be the solutions of
)
)
)
γ N = [ξ N (γ N ) + δ N (γ N )]
(6.10-a)
γ N = ξ N (γ N ) ,
(6.10-b)
and
respectively. The convergence properties of these approximations are obtained in the following Theorem.
) ) Theorem 6.1. With γ N , γ N as in Eqs. (6.10-a, b), γ N ↑ γ ↓ γ N . )
)
Proof. The convergence of γ N may not be monotonic; we only have that γ N ≤ γ and
)
that γ N converges to γ . However, this implies the existence of at least one monotonically convergent subsequence. This is the only one that is retained in the numerical computations, which can still be denoted by { N } . With this understanding, we have
[ξ N ( λ ) + δ N ( λ )] ↓ ξ ( λ ) ↑ ξ N ( λ ) ,
(6.11)
for each fixed λ as N → ∞ . Also, ξ ( λ ) , ξ N (λ ) increase from zero at −∞ to ∞ as λ approaches zero, which follows from the same arguments as in Lemma 4.13. The stated result will follow if the convergence in Eq. (6.11) is uniform with respect to λ in a closed bounded set enclosing γ , as in Theorem 4.6. The uniform convergence with the assumption of monotonicity will follow from the continuity of the bounds in Eq. (6.11) from Dini’s theorem (Lemma 1.2), and without it, from their uniform continuity from ArzelaAscoli theorem (Lemma 1.3). Proofs in both of the cases are about the same. Since the arguments have been repeatedly used, we sketch the steps for the continuity of the
Foundations and Applications of Variational and Perturbation Methods
221
approximating sequences, which will follow from the continuity of ξ N (λ ) and δ N (λ ) , which will follow from the continuity of α N (λ ) and β N (λ ) . The compact operator K N (λ ) is clearly uniformly continuous with respect to λ , i.e.,
|| K N (λ ′) − K N (λ ) ||→ 0 as λ ′ → λ . Hence, ξ N (λ ) are continuous and p N (λ ) are uniformly continuous functions of λ (Lemma 3.2(ii), Remark 3.2. (ii)). Consequently [ p N (λ )K (λ )(1 − p N (λ ))] = η N (λ ) are uniformly continuous. Since
| (|| η N (λ ′) || − || η N (λ ) ||) | ≤ || η N (λ ′) − η N (λ ) || , || η N (λ ) || and hence, β N ( λ ) =|| η N ( λ ) ||2 , are continuous functions of λ . The continuity of
α N (λ ) follows essentially by the same estimates and arguments • While the method of Theorem 6.1 can be used to produce both, the lower and upper bounds, upper bounds can still be obtained more conveniently by the variational method used with A = ( A 00 + κ V ) . Thus, the practical value of this procedure is in eliminating the need for a comparison operator in evaluating the lower bounds. Inspite of a complex appearance of the expressions, the procedure can be implemented without extensive inconvenience and computational effort [Singh and Turchetti, 1977]. The procedure of Theorem 6.1 can be adjusted to include the potentials that change sign. However, it is still inconvenient for the numerical calculations due to an explicit involvement of
V . This difficulty can be circumvented by replacing [κ V ( A00 − λ ) −1 V ] by
[κ ( A 00 − λ ) − 1V ] , which is a Hilbert-Schmidt operator for the square integrable potentials as long as λ is strictly negative (Example 4.1 (ii)), which is sufficient for the consideration of eigenvalues. The result ξ (γ ) = −1/ κ is still valid with ξ ( λ ) being the eigenvalue of
[κ ( A 00 − λ ) − 1V ] (Remark 4.3). This enlarges the scope of the method for its convenience of applications but at the expense of a loss of symmetry with resulting complications. In this case, although γ is real, ξ (λ ), ξ N (λ ) can be complex. The bound δ N (λ ) then provides the
radius
of
a
circle
centered
about
ξ N (λ ) , which contains
ξ ( λ ) , i.e.,
| ξ (λ ) − ξ N (λ ) |≤ δ N (λ ) (Proposition 3.3). Since ξ (γ ) = −1/ κ is real, for sufficiently large N , the circle must intersect with the real line containing γ , due to the continuity of
ξ N (λ ) and convergence of ξ N (γ ) to ξ (γ ) . Although this creates some numerical complications, the procedure is still useful in determining the converging bounds. The method still excludes the long range potentials. However, [κ ( A 00 − λ ) −1V ] for negative λ is still compact. The only reason to restrict [κ ( A 00 − λ ) − 1V ] to the HilbertSchmidt class in Proposition 3.3 was to obtain computable and convergent upper bounds to
222
S. Raj Vatsya
the norms defining δ N (λ ) . If computable and convergent majorants to the norms in δ N (λ ) can be obtained, then the scheme can be used for such potentials as well. In general, a numerical treatment of the integral operators needed in these procedures increases the computational effort in comparison with the differential operator used in the standard variational method to obtain the upper bounds. The potentials in the Rollnik class can support only a finite number of eigenvalues (Remark 4.5). As κ is increased, this number decreases and beyond a critical value κ c , there is no eigenvalue. The critical value of the parameter is of practical interest, which is considered next. Although some of the analysis is extendible to more general cases, we use the symmetric operator [κ V ( A00 − λ ) −1 V ] . The critical behavior is usually formulated in terms of the differential equations, which is also the case with the eigenvalue problems. Both of the formulations are mathematically equivalent, which can be seen by the method of forms as in Theorem 4.2 or directly as in Proposition 4.1. In any case, the eigenvalue equation κ K ( λ ) = 1 is equivalent to the differential equation
$$[A_0^0 - \lambda + \kappa V]\, w = 0 . \tag{6.12}$$
As is clear from Eq. (6.8), κ c = −1/ λ0 , where λ0 is the largest eigenvalue of
K ( −0) = K ( +0) = K (0) , i.e., λ0 =|| K (0) || . Thus, the evaluation of the critical value of the parameter reduces to a special case of Theorem 6.1. It is clear from the above that the critical value of parameter κ c , can also be determined from Eq. (6.12) with λ = 0 , which reduces to a generalized eigenvalue problem
$$[A_0^0 + \kappa_c V]\, u = 0 . \tag{6.13}$$
At times, the parameter of interest does not appear as a multiplicative factor. In some such cases, a coordinate transformation reduces the equation to the form of Eq. (6.13) [Sergeev and Kais, 1999]. In any case, the eigenvector $u$ is related to $\upsilon$, defined by $[\lambda_0 + K(0)]\upsilon = 0$, by $u = [(A_0^0)^{-1}\sqrt{V}]\,\upsilon$, as shown in Proposition 4.1. As indicated there, $u$ may or may not be in $H$ and thus, Eq. (6.13) may or may not have a solution. However, $[(A_0^0)^{1/2} u]$ and $\sqrt{V}\,u$ are in $H$. If the variational approximations to the solution of Eq. (6.13) are obtained with a basis in $H$, they still converge to $u$ but with respect to the norms $\|(A_0^0)^{1/2} u\|$ and $\|\sqrt{V}\,u\|$. The generalized eigenvalue problems of the type of Eq. (6.13) can be reduced to a regular eigenvalue equation
$$[V^{-1/2} A_0^0 V^{-1/2} + \kappa_c]\, u' = [K^{-1}(0) + \kappa_c]\, u' = 0 . \tag{6.14}$$
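As a rough numerical illustration of Eq. (6.13), the following sketch replaces $A_0^0$ by the standard three-point finite-difference matrix for $-d^2/dx^2$ on an interval with Dirichlet ends and $V$ by a positive diagonal matrix. These choices, the function names and the grid are assumptions made for illustration only and are not part of the text. The lowest solution of $A u = \mu V u$ is found by power iteration on $A^{-1}V$, and the critical coupling is then $\kappa_c = -\mu$.

```python
import math

def thomas_solve(diag, off, rhs):
    """Solve a symmetric tridiagonal system; off[i] couples unknowns i and i+1."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c[i - 1]
        c[i] = off[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - off[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def apply_a(u, h):
    """Apply the finite-difference matrix for -d^2/dx^2 with Dirichlet ends."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / h ** 2)
    return out

def critical_coupling(v_values, length=1.0, iterations=300):
    """Return (kappa_c, u) with [A + kappa_c V] u = 0 for the lowest mode."""
    n = len(v_values)
    h = length / (n + 1)
    diag = [2.0 / h ** 2] * n
    off = [-1.0 / h ** 2] * (n - 1)
    u = [1.0] * n
    mu = 0.0
    for _ in range(iterations):
        # One step of power iteration on A^{-1} V.
        u = thomas_solve(diag, off, [v * x for v, x in zip(v_values, u)])
        norm = math.sqrt(sum(v * x * x for v, x in zip(v_values, u)))
        u = [x / norm for x in u]
        # Rayleigh quotient (u, A u) / (u, V u); the denominator is 1 here.
        mu = sum(x * y for x, y in zip(u, apply_a(u, h)))
    return -mu, u
```

For a constant potential the result can be checked against the known eigenvalues of the finite-difference matrix. For a non-constant $V$ the residual of $[A + \kappa_c V]u$ gives a direct check.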
The reduction is the limiting form of the method usually used to obtain its numerical solutions. For illustration, the method of forms can be used by constructing H −− as the completion of H with respect to the scalar product (u , υ ) −− = (u , K (0)υ ) . Eq. (6.13) is then the same as Eq. (6.14) in H −− . Numerically, Eq. (6.13) is first reduced to a generalized matrix eigenvalue problem and then to a finite-dimensional form of Eq. (6.14), which can be treated as follows. Since [ −K (0)] is positive and compact, [ −K −1 (0)] is bounded below by its lowest eigenvalue 1/ λ0 = −κ c . All of its eigenvalues are isolated with no upper bound. Thus, the converging upper bounds to κ c can be obtained by the variational method with basis in the domain of K −1 (0) , i.e., the range of K (0) . This demonstrates the equivalence of the integral and differential formulations of the critical point. However, the numerical method to compute the lower bound to critical point requiring an explicit use of the compact operator available in the integral form is inapplicable in the differential form. In the differential form, alternative methods can still be developed depending on the properties, e.g., the use of a comparison operator if available. Being at the borderline, an analysis of the critical behavior requires techniques from both, the bound state and scattering theories. A study of the scattering phenomena for a multiparticle system requires the techniques of multi-channel scattering, which have not been developed here. Existence and determination of the critical parameter has been formally stated and used for multi-particle systems in terms of a generalized eigenvalue problem similar to Eq. (6.13) [Sergeev and Kais, 1999]. Such formulation in terms of the differential equation raises the question of the existence of an eigenvalue at zero. As indicated above, zero may or may not be a legitimate eigenvalue, i.e., Eq. (6.13) may or may not have a solution in H . 
Further, in the case of long range potentials, there is a discontinuity in the sense that infinitely many eigenvalues are absorbed in the continuum. The strong fields required for ionization of such systems are too strong perturbations to be treated by the present methods. Thus, a rigorous formulation and analysis of the critical phenomena for such systems require various other techniques, which fall outside the scope of the present text. However, if the consequent generalized eigenvalue problem has a solution, then it can be solved by the methods developed in ch. 3 and ch. 4.
6.III. SCATTERING

6.III.1. Formulation

Quantum mechanically, a free particle is represented by a wave packet decomposable in plane waves. We restrict to a particle in a locally square integrable and absolutely integrable potential V of the Rollnik class. It is also assumed that the potential is positive and invertible, although the greatest lower bound can be zero and the potential can vanish on a set of measure zero. These restrictions can be relaxed, which results in some algebraic complications but the arguments remain essentially the same. Additional considerations require separation of the
null space of V , which is irrelevant and keeping track of the sign, equivalently, the spaces of positivity and the negativity of V . Decomposition of the wave packet corresponding to a particle under the influence of a potential is approached through the generalized eigenfunctions obtained in Theorem 4.8 known as the distorted plane waves. It follows from Eqs. (2.29-a) and (4.39) that
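$$\Phi_\pm(\mathbf{k},\mathbf{r}) = \frac{1}{(2\pi)^{3/2}}\, e^{i\mathbf{k}\cdot\mathbf{r}} - \frac{\kappa}{4\pi}\int d\mathbf{r}'\, \frac{e^{\pm ik|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\, V(\mathbf{r}')\,\Phi_\pm(\mathbf{k},\mathbf{r}') = \Phi^0(\mathbf{k},\mathbf{r}) - \kappa\,(\Gamma^0_\pm V \Phi_\pm)(\mathbf{k},\mathbf{r}), \tag{6.15}$$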
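where $\Gamma^0_\pm = (A_0^0 - \lambda \mp i0)^{-1}$. For large distances from the scattering region, $\Phi_\pm(\mathbf{k},\mathbf{r})$ behave as [Prugovečki, 1971; ch. V, Theorem 6.4]

$$\Phi_\pm(\mathbf{k},\mathbf{r}) \underset{r\to\infty}{\sim} \frac{1}{(2\pi)^{3/2}}\, e^{i\mathbf{k}\cdot\mathbf{r}} + T_k^\pm(\hat{\mathbf{r}})\,\frac{e^{\pm ikr}}{r}, \tag{6.16}$$

with

$$T_k^\pm(\hat{\mathbf{r}}) = -\frac{\kappa}{4\pi}\int d\mathbf{r}'\, e^{\mp i\mathbf{k}'\cdot\mathbf{r}'}\, V(\mathbf{r}')\,\Phi_\pm(\mathbf{k},\mathbf{r}'), \tag{6.17}$$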
where $\mathbf{k}' = k\hat{\mathbf{r}}$ and $\hat{\mathbf{r}}$ is the unit vector along $\mathbf{r}$. The coefficients $T_k^+(\hat{\mathbf{r}})$ and $T_k^-(\hat{\mathbf{r}})$ are the scattering amplitudes corresponding to the outgoing and the incoming spherical waves $r^{-1}e^{ikr}$ and $r^{-1}e^{-ikr}$, respectively. The scattering cross-sections defined by $|T_k^\pm(\hat{\mathbf{r}})|^2$ are experimentally observable quantities. We consider the case of $T_k^+(\hat{\mathbf{r}})$. A parallel formulation can be developed for $T_k^-(\hat{\mathbf{r}})$. Green's function admits expansion in terms of the Legendre polynomials in $\cos\vartheta$, where $\vartheta$ is the angle between the unit vectors along $\mathbf{r}$ and $\mathbf{r}'$:

$$\frac{e^{ik|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|} = ik\sum_{l=0}^{\infty}(2l+1)\,P_l(\cos\vartheta)\, j_l(kr')\, h_l(kr), \quad r > r', \tag{6.18}$$
where $j_l$ and $h_l$ are the spherical Bessel and Hankel functions, respectively. For the spherically symmetric potentials, the solution $\Phi_+(\mathbf{k},\mathbf{r})$ admits the corresponding expansion,

$$\Phi_+(\mathbf{k},\mathbf{r}) = \sum_{l=0}^{\infty} a_l(k)\,P_l(\cos\vartheta)\,u_l(k,r), \tag{6.19}$$
where $a_l(k)$ depends on the normalization chosen for $u_l(k,r)$. With suitable normalization, the radial solution $u_l(k,r)$ can be expressed as

$$u_l(k,r) = j_l(kr) + \kappa\int_0^\infty dr'\, G_l(r,r')\,V(r')\,u_l(k,r'), \tag{6.20-a}$$

where

$$G_l(r,r') = -j_l(kr)\,\eta_l(kr'), \quad r \le r', \tag{6.20-b}$$

with $\eta_l$ being the spherical Neumann functions. The asymptotic form of the solution determines

$$t_l = -\tan\delta_l = \kappa\int_0^\infty dr'\, r'^2\, j_l(kr')\,V(r')\,u_l(k,r'), \tag{6.21}$$

where $\delta_l$ are the phase-shifts. The scattering amplitude admits expansion in terms of the phase-shifts:

$$T_k^+(\hat{\mathbf{r}}) = \frac{1}{k}\sum_{l=0}^{\infty} (2l+1)\,e^{i\delta_l}\sin\delta_l\,P_l(\cos\vartheta). \tag{6.22}$$
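The partial-wave sum of Eq. (6.22) is straightforward to evaluate once a finite set of phase-shifts is available. The sketch below is only an illustration: the truncation to a finite list of $\delta_l$, the function names and the sample values are assumptions. The Legendre polynomials are generated from the standard three-term recurrence $(l+1)P_{l+1}(x) = (2l+1)xP_l(x) - lP_{l-1}(x)$.

```python
import cmath
import math

def legendre(l_max, x):
    """Return [P_0(x), ..., P_{l_max}(x)] from the three-term recurrence."""
    values = [1.0]
    if l_max >= 1:
        values.append(x)
    for l in range(1, l_max):
        values.append(((2 * l + 1) * x * values[l] - l * values[l - 1]) / (l + 1))
    return values

def amplitude(k, phase_shifts, cos_theta):
    """Evaluate the sum of Eq. (6.22), truncated to the supplied phase shifts."""
    p = legendre(len(phase_shifts) - 1, cos_theta)
    total = 0.0 + 0.0j
    for l, delta in enumerate(phase_shifts):
        total += (2 * l + 1) * cmath.exp(1j * delta) * math.sin(delta) * p[l]
    return total / k
```

Since $P_l(1) = 1$ and the imaginary part of $e^{i\delta_l}\sin\delta_l$ is $\sin^2\delta_l$, the forward value `amplitude(k, deltas, 1.0)` has imaginary part $k^{-1}\sum_l (2l+1)\sin^2\delta_l$, which provides a simple consistency check on an implementation.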
This partial wave decomposition corresponds to the radially reduced form of $A'$:

$$A_l = -\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d}{dr}\right) + \frac{l(l+1)}{r^2} + V(r), \tag{6.23}$$
where $A'$ is the extension of $A$ to the space of the generalized eigenfunctions and $A_l$ is its restriction to a partial wave. Since there is no danger of confusion, $A_l$ will denote the formal differential operator given by Eq. (6.23) as well as its self-adjoint realization in the underlying Hilbert space $H = L^2(R^+; r^2 dr)$. The representation corresponding to Eq. (6.16) reduces to

$$\psi_+ = \psi_- = \Phi^0 + \hat\psi \underset{r\to\infty}{\sim} j_l(kr) + t_l\,\eta_l(kr). \tag{6.24}$$

From Eq. (6.22), the scattering amplitude can be constructed from the solutions of the radial equations. The following analysis and results are presented for the scattering amplitude; they can be transposed to the phase shifts in a straightforward manner. The amplitude $T_k^+(\hat{\mathbf{r}})$ defines scattering from the state represented by the wave vector $\mathbf{k} = \mathbf{k}_+$ to $\mathbf{k}' = \mathbf{k}_-$ as follows. From Eq. (6.17) we have
$$T(\mathbf{k}_-;\mathbf{k}_+) = -\sqrt{2/\pi}\; T_k^+(\hat{\mathbf{r}}) = \frac{\kappa}{(2\pi)^{3/2}}\int d\mathbf{r}\, e^{-i\mathbf{k}_-\cdot\mathbf{r}}\,V(\mathbf{r})\,\Phi_+(\mathbf{k}_+,\mathbf{r}) = \kappa\,\langle \Phi^0(\mathbf{k}_-),\, V\Phi_+(\mathbf{k}_+)\rangle . \tag{6.25}$$

With $g_- = \sqrt{V}\,\Phi^0(\mathbf{k}_-) = \sqrt{V}\,\Phi^0_-$ and $\tilde f_+ = \sqrt{V}\,\Phi_+$, the amplitude is expressed as $T(\mathbf{k}_-;\mathbf{k}_+) = \kappa\,(g_-,\tilde f_+)$. Substitution for $\Phi_\pm$ from Eq. (6.15) in Eq. (6.17) yields the integral equations for $T(\mathbf{k}_-;\mathbf{k}_+)$, termed the element of the transition operator. The resulting equations, called the momentum space representations, are mathematically equivalent to Eq. (6.15). The following analysis can be transposed to the momentum space, which can also be used for numerical computations. The transition operator can be defined by an integral equation with $T(\mathbf{k}_-;\mathbf{k}_+)$ as its kernel [Reed and Simon, 1979; Theorem 11.42]. With the scattering theory formulated in a Hilbert space setting, the pertaining problems can be treated with the methods developed for the operators in the Hilbert spaces. It follows from Eq. (6.15) that

$$[1 - \kappa K_+(\lambda)]\,\tilde f_+ = \sqrt{V}\,\Phi^0(\mathbf{k}_+) = \sqrt{V}\,\Phi^0_+ = g_+, \quad \lambda = |\mathbf{k}|^2 = k^2 \ge 0, \tag{6.26-a}$$
which is half of Eq. (4.43) with slightly different notation adopted to avoid confusion with the following adjoint equation:
$$[1-\kappa K_-(\lambda)]\,\tilde f_- = g_- . \tag{6.26-b}$$

It is clear that

$$(g_-,\tilde f_+) = (\tilde f_-, g_+) = (\tilde f_-, [1-\kappa K_+(\lambda)]\,\tilde f_+). \tag{6.27}$$

For consistency, we also set $\tilde\Phi_+ = \Phi_+$ and $\tilde\Phi_- = V^{-1/2}\tilde f_-$. The scattering amplitude has the following differentiability property:

Corollary 6.1. With the symbols as above, we have

$$\frac{\partial T(\mathbf{k}_-;\mathbf{k}_+)}{\partial\kappa} = (\tilde f_-,\tilde f_+) = \langle\tilde\Phi_-, V\tilde\Phi_+\rangle .$$

Proof. With the observation that $T(\mathbf{k}_-;\mathbf{k}_+) = \kappa\,(g_-,\tilde f_+)$, the result follows as in Lemma 4.18 •
The potential $V$ is equal to the strong derivative $\partial A/\partial\kappa$ on a properly defined domain and thus, Corollary 6.1 provides the counterpart of the Hellmann-Feynman theorem for the continuous spectrum. For the remainder of this section, describing the methods to evaluate the scattering amplitude, $\kappa$ will be absorbed in $V$. The analysis of Theorem 4.8 can be used with Eq. (6.26-b) to obtain a generalized eigenfunction $\tilde\Phi_-$ such that $\tilde f_- = \sqrt{V}\,\tilde\Phi_-$. The analysis is carried out in $H$. The arguments and the results can be transposed to $H_V$ defined by the norm $\|\chi\|_V = \|\sqrt{V}\chi\|$. With this identification, both of the settings are equivalent. Since the results can be easily transposed into each other, they will be stated according to convenience without necessarily indicating the parallels.
6.III.2. Born Series

The simplest method to solve Eqs. (6.26-a, b) is by their power series expansion, yielding

$$\tilde f^B_\pm(\lambda) = \sum_{n=0}^{\infty} K_\pm^n(\lambda)\,g_\pm(\lambda), \tag{6.28}$$

which together with Eqs. (4.44-a, b) is equivalent to

$$\Phi^B_\pm(\mathbf{k}) = \sum_{n=0}^{\infty} [-\Gamma^0_\pm(\lambda)V]^n\,\Phi^0_\pm . \tag{6.29-a}$$

The resulting expansion for $T(\mathbf{k}_-;\mathbf{k}_+) = (g_-,\tilde f_+)$ is the following series expansion of the scattering amplitude,

$$T^B_\pm(\mathbf{k}_-;\mathbf{k}_+) = (g_-,\tilde f^B_+) = \sum_{n=0}^{\infty} \langle\Phi^0_-,\, V[-\Gamma^0_+(\lambda)V]^n\,\Phi^0_+\rangle . \tag{6.29-b}$$

Eqs. (6.29-a, b) to determine the distorted plane wave and the scattering amplitude are known as the Born series expansions of the respective quantities. Thus, we have

Proposition 6.1. If $\|K_\pm(\lambda)\| = \|\sqrt{V}\,\Gamma^0_\pm(\lambda)\sqrt{V}\| < 1$, then the Born series expansion given by Eq. (6.29-a) converges to the distorted plane waves $\Phi_\pm$ with respect to the norm $\|\cdot\|_V$ and the expansion given by Eq. (6.29-b) converges to the scattering amplitude $T(\mathbf{k}_-;\mathbf{k}_+)$.

Proof. Since $K_\pm(\lambda)$ are bounded, the expansions converge for $\|K_\pm(\lambda)\| < 1$ (Lemma 2.3), which implies the results •
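The convergence statement of Proposition 6.1 and its dependence on $\|K\|$ can be seen in a finite-dimensional toy model. In the sketch below a real symmetric $2\times 2$ matrix stands in for $K_\pm(\lambda)$. The matrix entries, the vector $g$ and the function names are illustrative assumptions and are not taken from the text. The partial sums of $\sum_n K^n g$ are compared with the exact solution of $[1-K]f = g$.

```python
def mat_vec(k, x):
    """Multiply the square matrix k by the vector x."""
    return [sum(k[i][j] * x[j] for j in range(len(x))) for i in range(len(k))]

def born_partial_sum(k, g, n_terms):
    """Return sum_{n=0}^{n_terms-1} K^n g."""
    term = list(g)
    total = [0.0] * len(g)
    for _ in range(n_terms):
        total = [t + s for t, s in zip(total, term)]
        term = mat_vec(k, term)
    return total

def solve_2x2(m, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(b[0] * m[1][1] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - m[1][0] * b[0]) / det]

def born_error(scale, n_terms):
    """Largest component error of the truncated series for K = scale * K0."""
    k = [[0.5 * scale, 0.3 * scale], [0.3 * scale, 0.4 * scale]]
    g = [1.0, 2.0]
    one_minus_k = [[1 - k[0][0], -k[0][1]], [-k[1][0], 1 - k[1][1]]]
    exact = solve_2x2(one_minus_k, g)
    approx = born_partial_sum(k, g, n_terms)
    return max(abs(a - e) for a, e in zip(approx, exact))
```

With `scale = 1.0` the norm of the model matrix is about 0.75 and the error after twenty terms is still of the order of a few percent, whereas halving the coupling brings it down by several orders of magnitude. This is the slow convergence for norms close to one referred to in the discussion of Schwinger's method.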
6.III.3. Schwinger’s Method

Although Proposition 6.1 provides a rigorous formulation of the Born series expansions, their numerical value is limited by their slow rate of convergence even for the potentials sufficiently weak to ensure that $\|K_\pm(\lambda)\| < 1$. This motivated attempts to obtain the scattering solutions by the variational methods. The Rayleigh-Ritz method proved to be quite successful in producing satisfactory upper bounds to the binding energies, in spite of a lack of error estimates, which was compensated for to some extent by the lower bound methods. The variational formulations of the scattering problems, by contrast, proved to be less satisfactory. A method based on the integral equation for the distorted plane wave was developed by Schwinger, which is considered next. A form suitable for an evaluation of $T(\mathbf{k}_-;\mathbf{k}_+) = (g_-,\tilde f_+)$ can be based on Eq. (6.27):

$$F[h_-;h_+] = (g_-,h_+) + (h_-,g_+) - (h_-,[1-K_+(\lambda)]\,h_+), \tag{6.30}$$

which is stationary about the exact solutions $\tilde f_+$, $\tilde f_-$, with its stationary value given by

$$F[\tilde f_-;\tilde f_+] = (g_-,\tilde f_+) = (\tilde f_-,g_+) = (\tilde f_-,[1-K_+(\lambda)]\,\tilde f_+). \tag{6.31}$$
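The stationarity of the form in Eq. (6.30) means that first-order errors in the trial vectors produce only a second-order error in the value of the form. Substituting $h_\pm = \tilde f_\pm + \delta_\pm$ and using Eqs. (6.26-a, b) gives $F[h_-;h_+] - (g_-,\tilde f_+) = -(\delta_-,[1-K_+]\delta_+)$. The sketch below checks this identity in a real $2\times 2$ analogue, in which the adjoint is the transpose. The matrix, the vectors and all names are assumptions chosen for illustration.

```python
def solve_2x2(m, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(b[0] * m[1][1] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - m[1][0] * b[0]) / det]

def dot(a, b):
    """Real inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def one_minus_k(k, x):
    """Apply [1 - K] to the vector x."""
    return [x[i] - sum(k[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

def functional(k, g_minus, g_plus, h_minus, h_plus):
    """Real-matrix analogue of the form F[h-; h+] of Eq. (6.30)."""
    return (dot(g_minus, h_plus) + dot(h_minus, g_plus)
            - dot(h_minus, one_minus_k(k, h_plus)))

K = [[0.2, 0.5], [0.1, 0.3]]          # hypothetical, non-symmetric model of K+
G_PLUS = [1.0, -1.0]
G_MINUS = [0.5, 2.0]
M = [[1 - K[0][0], -K[0][1]], [-K[1][0], 1 - K[1][1]]]
M_T = [[M[0][0], M[1][0]], [M[0][1], M[1][1]]]
F_PLUS = solve_2x2(M, G_PLUS)          # [1 - K] f+ = g+
F_MINUS = solve_2x2(M_T, G_MINUS)      # adjoint equation [1 - K]^T f- = g-

def error_for(eps, a=(1.0, 2.0), b=(-1.0, 0.5)):
    """Error of the form when f- and f+ are perturbed by eps*a and eps*b."""
    h_minus = [f + eps * x for f, x in zip(F_MINUS, a)]
    h_plus = [f + eps * x for f, x in zip(F_PLUS, b)]
    return functional(K, G_MINUS, G_PLUS, h_minus, h_plus) - dot(G_MINUS, F_PLUS)
```

Reducing `eps` by a factor of ten reduces `error_for(eps)` by a factor of one hundred, in line with the second-order estimate of Theorem 6.2(ii).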
One of the essentially equivalent forms of the functionals used in the Schwinger method, $F^s[\psi_-;\psi_+]$, is given by

$$F^s[\psi_-;\psi_+] = \langle\Phi^0_-,V\psi_+\rangle + \langle\psi_-,V\Phi^0_+\rangle - \langle\psi_-,[V+V\Gamma^0_+V]\,\psi_+\rangle . \tag{6.32}$$

Parallel sets of algebraic equations to be solved resulting from Eq. (3.17-a) reduce to
$$\sum_{m=1}^{N} \alpha_m^+\, (\phi_n,[1-K_+]\,\phi_m) = (\phi_n,g_+), \tag{6.33-a}$$

$$\sum_{m=1}^{N} \alpha_m^+\,\langle\phi'_n,[V+V\Gamma^0_+V]\,\phi'_m\rangle = \langle\phi'_n,V\Phi^0_+\rangle, \tag{6.33-b}$$

respectively. With the identification $\sqrt{V}\phi'_n = \phi_n$, Eqs. (6.33-a) and (6.33-b) are clearly equivalent. Although the sets of equations similar to Eqs. (6.33-a) and (6.33-b) corresponding to Eq. (3.17-b) need not be solved, the approximations to their solutions will be required for reference. The two solutions will be denoted by

$$\tilde f^s_\pm = \tilde f_{\pm N} = \sum_{m=1}^{N}\alpha_m^\pm\phi_m, \qquad \psi^s_\pm = \sum_{m=1}^{N}\alpha^\pm_m\phi'_m, \tag{6.34}$$
with $+$ corresponding to Eqs. (6.33-a), (6.33-b), and $-$, to their adjoint counterparts. With the identification $\sqrt{V}\phi'_n = \phi_n$, the variational approximation to $T^+(\mathbf{k}_-;\mathbf{k}_+)$ is given by

$$T^s(\mathbf{k}_-;\mathbf{k}_+) = (g_-,\tilde f^s_+) = \sum_{m=1}^{N}\alpha^+_m\,(g_-,\phi_m) = \sum_{m=1}^{N}\alpha_m^+\,\langle\Phi^0_-,V\phi'_m\rangle = \langle\Phi^0_-,V\psi^s_+\rangle . \tag{6.35}$$
We have the following convergence result:

Theorem 6.2. With the symbols as above, let $\{\sqrt{V}\phi'_n\} = \{\phi_n\}$ be a basis in $H = L^2(R^3; d\mathbf{r})$. Then

(i) $\|\tilde f_{\pm N} - \tilde f_\pm\| = \|\psi^s_\pm - \Phi_\pm\|_V \to 0$ as $N\to\infty$;

(ii) $|T^s(\mathbf{k}_-;\mathbf{k}_+) - T(\mathbf{k}_-;\mathbf{k}_+)| \le \text{const.}\,\|\tilde f_{+N}-\tilde f_+\|\,\|\tilde f_{-N}-\tilde f_-\| \to 0$ as $N\to\infty$;

for each $\lambda > 0$.

Proof. Since Eqs. (6.33-a) and (6.33-b) are equivalent, it is sufficient to consider Eq. (6.33-a), which is equivalent to

$$\tilde f_{+N} = [1-K_{+N}]^{-1}P_N\,g_+$$

from Theorem 3.1. Since $K_+(\lambda)$ is compact for each $\lambda > 0$, (i) follows from Theorem 3.4, and (ii) follows from Corollary 3.3 and (i) •

Since $K_+(\lambda)$ belongs to the Hilbert-Schmidt class, converging and computable error bounds on $|T^s(\mathbf{k}_-;\mathbf{k}_+) - T(\mathbf{k}_-;\mathbf{k}_+)|$ can be obtained from Proposition 3.2.

Schwinger’s method for the partial waves is obtained as for the scattering amplitude. The equation parallel to Eqs. (6.26-a) and (6.26-b) then reads

$$[1-K_l]\,f_l = g_l, \tag{6.36}$$

where $K_l$ is the integral operator with kernel $\sqrt{V}\,G_l\sqrt{V}$, $g_l = \sqrt{V}\,j_l$ and $f_l = \sqrt{V}\,u_l$, as defined in Eqs. (6.20-a) and (6.20-b). From Eq. (6.21), we have that $t_l = (g_l,f_l)$. The operator $K_l$ is still compact but it is also symmetric and hence, self-adjoint. Let $t_{lN} = t_l^s = (g_l,f_{lN})$ denote Schwinger’s approximations to $t_l$, i.e.,
$$t_{lN} = (g_l,f_{lN}) = (g_l,[1-P_NK_lP_N]^{-1}P_N\,g_l), \tag{6.37}$$

with $P_N$ being the orthoprojection converging strongly to the identity in $L^2([0;\infty), r^2dr)$.

Corollary 6.2. With $f_{lN}$ and $t_{lN}$ as in Eq. (6.37), $\|f_{lN} - f_l\| \to 0$ and $|t_l - t_{lN}| \le \text{const.}\,\|f_l - f_{lN}\|^2 \to 0$ as $N\to\infty$.

Proof. In view of the self-adjointness of $K_l$, the adjoint of Eq. (6.36) can be taken to be the same. The results follow from Theorem 6.2 with obvious substitutions •

Not much additional advantage is gained in case of the partial waves, except the following:

Corollary 6.3. In addition to the assumptions of Corollary 6.2, let $\|K_l(\lambda)\| < 1$. Then $\{t_{lN}\} \uparrow t_l$ as $N\to\infty$.

Proof. It follows from Corollary 3.2, as in Eq. (3.33-c), that $(t_{lM} - t_{lN}) = (f_{lM}-f_{lN},[1-K_l](f_{lM}-f_{lN}))$. Since $\|K_l(\lambda)\| < 1$, we have that $(1-K_l) \ge 0$, implying that $\{t_{lN}\}$ is a non-decreasing sequence. The result follows from this together with Corollary 6.2 •

While quite satisfactory, the Schwinger method requires an evaluation of the double integrals appearing in Eq. (6.33-b), which has decreased interest in its applications. The procedure can be simplified by using the collocation variational method based on Proposition 3.9. Its application was illustrated for one-dimensional equations with the integral operator defined in a Hilbert space on the finite intervals. Adjustments for the present case are straightforward. Its application to the momentum representation of the Lippmann-Schwinger equation has been found to be more convenient. Another alternative is to use the Schwinger variational method involving only the differential operators, considered below. An evaluation of the matrix elements of the differential operators still requires integration but it is relatively easier. Schwinger’s method can be translated in terms of the differential operators by setting
$$\psi_\pm = \Phi^0_\pm - \Gamma^0_\pm V\psi_\pm = \Phi^0_\pm + \hat\psi_\pm \underset{r\to\infty}{\sim} \Phi^0_\pm + T_k^\pm(\hat{\mathbf{r}})\,\frac{e^{\pm ikr}}{r}, \tag{6.38}$$

with $\psi_\pm$ as in Eq. (6.32) and

$$T_\pm = -\sqrt{2/\pi}\;T_k^{\pm s}(\hat{\mathbf{r}}) = \langle\Phi^0_\mp,V\psi_\pm\rangle . \tag{6.39}$$
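Returning briefly to the partial-wave approximations of Eq. (6.37), the monotonic convergence asserted in Corollary 6.3 can be observed in a small matrix model. In the sketch below a symmetric $5\times 5$ matrix with maximal row sum below one stands in for $K_l$, and $P_N$ projects on the first $N$ coordinates. The model matrix, the vector $g$ and the function names are assumptions made for illustration.

```python
def solve(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x

def schwinger_sequence(k, g):
    """Return t_N = (g, [1 - P_N K P_N]^{-1} P_N g) for N = 1, ..., len(g)."""
    values = []
    for n in range(1, len(g) + 1):
        block = [[(1.0 if i == j else 0.0) - k[i][j] for j in range(n)]
                 for i in range(n)]
        f_n = solve(block, g[:n])
        values.append(sum(gi * fi for gi, fi in zip(g[:n], f_n)))
    return values

SIZE = 5
# Symmetric model matrix; its largest row sum is about 0.91, so its norm is below 1.
K_MODEL = [[0.4 / (i + j + 1) for j in range(SIZE)] for i in range(SIZE)]
G_MODEL = [1.0, -0.5, 0.25, 2.0, 1.0]
```

The list returned by `schwinger_sequence(K_MODEL, G_MODEL)` is non-decreasing and its last entry coincides with the value obtained from the full system, as expected from the identity used in the proof of Corollary 6.3.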
Let A ′ be the extension of A to its generalized eigenfunctions, which is essentially the formal differential operator ( −∇ 2 + V ) defined on twice continuously differentiable functions. The operator A ′ reduces to A on D ( A ) . It follows from Eq. (6.38) that
$$(\nabla^2+\lambda)\,\psi_\pm = (\nabla^2+\lambda)\,\hat\psi_\pm = V\psi_\pm, \tag{6.40-a}$$

equivalently,

$$(A'-\lambda)\,\hat\psi_\pm = V[\hat\psi_\pm - \psi_\pm]. \tag{6.40-b}$$

Straightforward manipulations transform $F^s[\psi_-;\psi_+]$ of Eq. (6.32) into

$$F^s[\psi_-;\psi_+] = F^s[\hat\psi_-;\hat\psi_+] = T_+ + \langle\psi_-,[A'-\lambda]\,\psi_+\rangle - \langle[A'-\lambda]\,\psi_-,V^{-1}[A'-\lambda]\,\psi_+\rangle . \tag{6.41}$$

This expression for the form can be used instead of Eq. (6.32) to eliminate the complications caused by the double integrals. However, it requires the matrix elements of the operators involving $A'^2$ and $V^{-1}$.
Remark 6.1. Eq. (6.40-a), yielding the consequent results, is deduced by direct differentiation. A clearer picture emerges by expressing it in terms of $K_\pm$ and the related operators. It follows from Eq. (6.38) that $\sqrt{V}\hat\psi_\pm = K_\pm\sqrt{V}\psi_\pm$ are in the ranges of $K_\pm$ but $g_\pm = \sqrt{V}\,\Phi^0_\pm$ are not. We are assuming that $\sqrt{V}\psi_\pm$ are in $H$, which are the only relevant functions. Also, $K_\pm$ are invertible. If not, then $H$ can be taken to be the complement of their null spaces in the original Hilbert space without impacting upon the analysis. However, since $K_\pm$ are compact, their inverses are unbounded with their domains being the ranges of $K_\pm$, which are properly contained in $H$. In particular, the inverses are not defined on $g_\pm$. On the ranges of $K_\pm$, consider

$$V^{-1/2}(\lambda - A_0^0)V^{-1/2}K_{\pm\varepsilon}\,u = V^{-1/2}(\lambda-A_0^0)(\lambda\pm i\varepsilon - A_0^0)^{-1}\sqrt{V}\,u = u \mp i\varepsilon\, V^{-1}K_{\pm\varepsilon}\,u, \tag{6.42}$$

and for each $\upsilon$ in the domain of $V^{-1}$, we have

$$|i\varepsilon\,(\upsilon,V^{-1}K_{\pm\varepsilon}u)| = \varepsilon\,|(V^{-1}\upsilon,K_{\pm\varepsilon}u)| \le \varepsilon\,\|V^{-1}\upsilon\|\,\|K_{\pm\varepsilon}u\| \underset{\varepsilon\to 0}{\longrightarrow} 0 .$$
Consequently, the left side in Eq. (6.42) converges to $u$ weakly in $H$ for all $\upsilon$ in a dense set. It may appear somewhat strange that the self-adjoint operator $K^{-1} = V^{-1/2}(\lambda - A_0^0)V^{-1/2}$ is the inverse, even if in a restricted sense, of the non-self-adjoint operators $K_\pm$ and the self-adjoint operator $K_0 = (K_+ + K_-)/2$, defined by the principal value of Green’s function. However, this is just a manifestation of the fact that $(\lambda - A_0^0)$ has a multiple valued inverse. Consequently, $K^{-1}$ is defined on the union of the ranges of $K_{\pm,0}$, reducing to the inverse of each if restricted to the range of the respective operator. For these reasons the manipulations involving the inverses require some care. To indicate the restriction, $K^{-1}$ will still be denoted by $K^{-1}_{\pm,0}$, whenever helpful for clarification •

Eq. (6.40-a) can now be expressed as
$$K^{-1}\sqrt{V}\hat\psi_\pm = K^{-1}h^s_\pm = \sqrt{V}\psi_\pm = f^s_\pm, \tag{6.43}$$

i.e., $h^s_\pm = K_\pm f^s_\pm$. Although deduced from the integral formulation of the Schwinger method, Eq. (6.41), with $\psi_\pm$ having the asymptotic behavior given by Eq. (6.38), can be considered independently. For use of Eq. (6.41), the trial functions $\psi_\pm$ are expressed as

$$\psi_\pm = \Phi^0_\pm + \hat\psi_\pm = \Phi^0_\pm + \sum_{n=1}^{N}\alpha_n^{\pm s}\,\psi_n^\pm, \tag{6.44}$$

with $\sqrt{V}\hat\psi_\pm = h^s_\pm$ and $[V^{-1/2}(\nabla^2+\lambda)\hat\psi_\pm] = K^{-1}h^s_\pm = f^s_\pm = \sqrt{V}\psi_\pm$. The form $F^s[\psi_-;\psi_+]$ is easily seen to reduce to
$$F^s[\psi_-;\psi_+] = F^s[f^s_-;f^s_+] = (g_-,f^s_+) + (f^s_-,g_+) - (f^s_-,[1-K_+]\,f^s_+), \tag{6.45}$$

which is the same as Eq. (6.30). Thus, if $\{V^{-1/2}(\nabla^2+\lambda)\psi_n^+\} = \{V^{-1/2}(\nabla^2+\lambda)\psi_n^-\}$ constitutes a basis in $H$, then the resulting set of algebraic equations is equivalent to Eqs. (6.33-a) and (6.33-b), yielding a convergent procedure. The calculated value of $V^{-1/2}(\nabla^2+\lambda)\hat\psi_\pm$, with $\hat\psi_\pm = \sum_{n=1}^{N}\alpha_n^{\pm s}\psi^\pm_n$, converges to $\tilde f_\pm$ and $\sqrt{V}\hat\psi_\pm$ converges to $h_\pm = K_\pm\tilde f_\pm$, together with a second order convergence of the approximations to the scattering amplitude, from Theorem 6.2. A parallel differential formulation for the partial wave case can be obtained in the same manner with the results of Corollaries 6.2 and 6.3 remaining valid. This establishes the equivalence of the two formulations with the stated identification of the trial functions. For calculations, Eq. (6.41) can be used by itself as long as the condition on the basis is satisfied.
We have studied the convergence properties of the operators of the type $P_N A P_N$, which restricts the matrix elements to $(\phi_n, A\phi_m)$. This condition can be relaxed to allow the matrix elements of the type $(\phi'_n, A\phi_m)$. For example, the results hold if there is a positive definite operator $B$ such that $\{B\phi'_n\} = \{\phi_n\}$, which can be shown to be valid by using Friedrichs' construction (Remark 2.5). Thus, in the Schwinger method, although the condition $\{V^{-1/2}(\nabla^2+\lambda)\psi_n^+\} = \{V^{-1/2}(\nabla^2+\lambda)\psi_n^-\}$ can be satisfied, it can be relaxed. Further discussion and relaxation of this condition will be provided in the following subsection in the analysis of the Kohn method, where it has significant implications. For the present, we retain this condition. By setting $\sqrt{V}\hat\psi_\pm = h^s_\pm$, Eq. (6.41) reduces to

$$F^s[\psi_-;\psi_+] = F^s[\hat\psi_-;\hat\psi_+] = (g_-,K_+^{-1}h^s_+) + (K_-^{-1}h^s_-,g_+) + (K_-^{-1}h^s_-,[1-K_+^{-1}]\,h^s_+). \tag{6.46}$$

For calculations based on Eq. (6.45), $F^s[\psi_-;\psi_+]$ is made stationary with respect to the variations of $K_\pm^{-1}h^s_\pm$. Direct variations of $h^s_\pm$ in Eq. (6.46) are hindered by the terms $(g_-,K_+^{-1}h^s_+)$ and $(K_-^{-1}h^s_-,g_+)$, which are well defined by themselves, but since $g_\pm$ are not in the domain of $K^{-1}$, they cannot be expressed as $(K_-^{-1}g_-,h^s_+)$ and $(h^s_-,K_+^{-1}g_+)$. Although interesting, Eq. (6.46) expressed in terms of $K^{-1}$ thus offers no advantage.
6.III.4. Hulthén-Kohn Methods

Historically, before the Schwinger method was reduced to the differential form given by Eq. (6.41), Hulthén considered the form

$$F^H[\psi_-;\psi_+] = F^K[\psi_-;\psi_+] = T_+ + \langle\psi_-,[A'-\lambda]\,\psi_+\rangle, \tag{6.47}$$

which is stationary about the exact solutions in the sense that if $\psi_\pm$ deviate from the exact solutions by $\delta\psi_\pm$, then $F^K[\psi_-;\psi_+]$ deviates from $T(\mathbf{k}_-;\mathbf{k}_+)$ by $\langle\delta\psi_-,[A'-\lambda]\,\delta\psi_+\rangle$. A numerical advantage of Eq. (6.47) over Eq. (6.41) is that it eliminates the need to evaluate the matrix elements involving $A'^2$ and $V^{-1}$. By necessity, Hulthén used the trial functions of the same type as Eq. (6.44):

$$\psi^\pm_N = \Phi^0_\pm + \sum_{n=1}^{N}\alpha_n^{\pm H}\,\psi_n^\pm, \tag{6.48}$$

with $\alpha_1^{\pm H}\psi_1^\pm$ taken so that $\alpha_1^{\pm H}\psi_1^\pm \sim \alpha_1^{\pm H}\sqrt{\pi/2}\;r^{-1}e^{\pm ikr}$ as $r\to\infty$, and $\psi_n^\pm$, $n \ge 2$, decaying sufficiently rapidly to ensure that they are in $H$, in fact in the domain of $A$. This is
a legitimate choice from Eq. (6.38) and convenient for applications. With this choice, it can be ensured that $\{\psi_n^+\}_{n=2}^{N} = \{\psi_n^-\}_{n=2}^{N}$ without complications. Hulthén varied and obtained $\alpha_n^{\pm H}$, $n \ge 2$, in terms of $\alpha_1^{\pm H} = T^H_\pm$ by solving the following set of $(N-1)$ equations determined by $\{\psi^\pm_n\}_{n=2}^{N} = \{\psi_n\}_{n=2}^{N}$:

$$\sum_{m=2}^{N}\alpha_m^{\pm H}\,\langle\psi_n,[A'-\lambda]\,\psi_m\rangle = -\langle\psi_n,V\Phi^0_\pm\rangle, \quad n = 2,3,\ldots,N. \tag{6.49-a}$$

The parameter $\alpha_1^{\pm H}$ was then determined by a judicious choice out of the two solutions of

$$\langle\psi_-,[A'-\lambda]\,\psi_+\rangle = 0, \tag{6.49-b}$$

yielding an approximation to $\alpha_1^{\pm H} = T^H_\pm$. Hulthén’s set of $(N-1)$ equations can be expressed as

$$\sum_{m=2}^{N} \alpha_m^{\pm H}\,(\psi_n,[A-\lambda]\,\psi_m) = (\psi_n,g'_\pm), \quad n=2,3,\ldots,N, \tag{6.50-a}$$

with $\psi_n = \psi_n^\pm$, $n \ge 2$, as long as $V$ is square integrable. Eq. (6.50-a) is equivalent to

$$P'_{N-1}[A-\lambda]P'_{N-1}\,\psi'^{\pm}_{N-1} = [A_{N-1}-\lambda]\,\psi'^{\pm}_{N-1} = P'_{N-1}\,g'_\pm, \tag{6.50-b}$$

where $P'_{N-1}$ is the orthoprojection on the manifold spanned by $\{\psi_n\}_{n=2}^{N}$ and $\psi'^{\pm}_{N-1} = \sum_{n=2}^{N}\alpha_n^{\pm H}\psi_n$.
From Theorem 3.7(i) or Theorem 3.8, [ A N −1 − z ]−1 converges strongly to [ A − z ]−1 for all non-real z . Consequently from Theorem 3.3, the spectral function of AN −1 , which is discontinuous, converges to the spectral function of A at all of its points of continuity, which includes the positive real line where the spectrum of A is absolutely continuous and its spectral function is an increasing function. Thus, the points of discontinuity of the spectral function of AN −1 must form a dense set in the positive real line as N → ∞ . For a finite N , there are only a finite number of singularities for the set of equations given by Eqs. (6.50-a, b). Therefore, the solutions can be obtained for λ in the complement of a discrete set of points. However, the number of the singularities increases as N is increased and the reliability of the calculated values remains questionable. Thus, we have a rather counterconvergence result for this method. The Kohn method is also based on the form given by Eq. (6.47) with the same basis as in Hulthén’s method, which is
$$\psi_N^\pm = \Phi^0_\pm + \sum_{n=1}^{N}\alpha_n^{\pm K}\,\psi^\pm_n, \tag{6.51}$$

but its variation is set equal to zero with respect to all of the parameters to generate the following $N$ equations:

$$\sum_{m=1}^{N}\alpha_m^{\pm K}\,\langle\psi_n^\mp,[A'-\lambda]\,\psi_m^\pm\rangle = \langle\psi_n^\mp,g_\pm\rangle, \quad n = 1,2,\ldots,N. \tag{6.52}$$

The Kohn approximation to $\hat\psi_\pm$ is given by $\hat\psi_N^{\pm K} = \sum_{n=1}^{N}\alpha_n^{\pm K}\psi_n$. The solution of Eq. (6.52) yields an approximate value $\alpha_1^{+K}$ of $T(\mathbf{k}_-;\mathbf{k}_+)$, which is then improved upon to yield

$$T^K = F^K[\psi_-;\psi_+] = \langle\Phi^0_-,V\Phi^0_+\rangle - \sum_{m=1}^{N}\alpha_m^{+K}\,\langle\psi_m^-,V\Phi^0_+\rangle . \tag{6.53}$$
As indicated earlier, the singularities in Hulthén’s method are due to the inherent properties of the operators and the prescription to evaluate the parameters. In case of the Kohn method, exact equations have well-defined solutions. However, the singularities are encountered in this method also, in the calculations of the phase-shifts for some multi-particle atomic systems. They have been problematic, inspiring considerable interest in resolving and correcting the problem, particularly as this method is widely used. Spurious singularities are not normally encountered in the calculations related to scattering involving the mild potentials, i.e., locally square integrable with sufficiently rapid decay at infinity. Still, the convergence results even for such well-behaved systems are limited and sporadic. Nuttall (1969) considered this problem for the S-wave, i.e., $l = 0$, and concluded that the approximate value $t_{0N}^K$ of $t_0$ obtained with $N$ basis functions by the Kohn method converges to $t_0$ as $N\to\infty$, for almost every $k$, i.e., for all real $k$ except for a set of measure zero. The analysis and the result are equally valid for all of the values of $l$ and can be extended to the scattering amplitude in the multi-dimensional formulation. In any case, the result is of little computational value, since the values of $k$ where the convergence cannot be asserted can be countably many and they cannot be identified. A useable result was obtained by Singh and Stauffer (1974). It was shown in case of the partial waves that if such singularities are encountered then the offending vectors can be dropped, i.e., the reduced variational method (Proposition 3.6) yields a convergent procedure. This result will be illustrated below and extended to include the scattering amplitude. The Kohn method can be formulated in terms of the operators $K_{\pm,0}$ and $K^{-1}$. The trial function can still be taken to be of the form given by Eq. (6.44). With $\sqrt{V}\hat\psi_\pm = h^K_\pm$ and $[V^{-1/2}(\nabla^2+\lambda)\hat\psi_\pm] = K^{-1}h^K_\pm = f^K_\pm = \sqrt{V}\psi_\pm$, $F^K[\psi_-;\psi_+]$ is easily seen to reduce to
$$F^K[\psi_-;\psi_+] = F^K[f^K_-;f^K_+] = (g_-,g_+) + (K_-g_-,f^K_+) + (f^K_-,K_+g_+) - (K_-f^K_-,[1-K_+]\,f^K_+). \tag{6.54}$$

The set of algebraic equations corresponding to the variations of $f^K_\mp$ reads as

$$\sum_{m=1}^{N}\alpha_m^{\pm K}\,(\phi_n^\mp,[1-K_\pm]K_\pm\,\phi^\pm_m) = (\phi_n^\mp,K_\pm g_\pm), \quad n=1,2,\ldots,N. \tag{6.55-a}$$

Set $\tilde f^{KN}_\pm = \sum_{m=1}^{N}\alpha_m^{\pm K}\phi_m^\pm$. The bases $\{\phi_n^\pm\}$ in $H$ in Eq. (6.55-a) are independent. If $\{\phi_n^+\} = \{\phi_n^-\} = \{\phi_n\}$, Eq. (6.55-a) reduces to

$$[P_NK_\pm P_N - P_NK_\pm^2P_N]\,f^{KN}_\pm = P_NK_\pm\,g_\pm, \tag{6.55-b}$$

where $P_N$ is the orthoprojection on the space spanned by $\{\phi_n\}_{n=1}^{N}$, as in Theorem 3.1. The corresponding approximation to $T(\mathbf{k}_-;\mathbf{k}_+)$ is given by

$$T^K(\mathbf{k}_-;\mathbf{k}_+) = F^K[f^{KN}_-;f^{KN}_+] = (g_-,g_+) + (g_-,K_+f^{KN}_+). \tag{6.56}$$

Thus, the method with this form attempts to solve

$$[1-K_\pm]K_\pm\,\tilde f_\pm = K_\pm\,g_\pm . \tag{6.57}$$

For an invertible $K_\pm$, as presently is the case, Eq. (6.57) is equivalent to Eq. (6.26-a). If $K_\pm$ are not invertible, this equivalence holds on the ranges of $K_\pm$, which is sufficient. However, since $([1-K_\pm]K_\pm)$ are compact, their inverses are unbounded, complicating the analysis. Consider the following expression for $F^K[\psi_-;\psi_+]$ expressed in terms of $K^{-1}$, which is equivalent to Eq. (6.54):

$$F^K[\psi_-;\psi_+] = F^K[h^K_-;h^K_+] = (g_-,g_+) + (g_-,h^K_+) + (h^K_-,g_+) + (h^K_-,[1-K_+^{-1}]\,h^K_+). \tag{6.58}$$

The corresponding set of algebraic equations is given by
$$\sum_{m=1}^{N}\alpha_m^{\pm K}\,(\phi_n^\mp,[1-K_\pm^{-1}]\,\phi^\pm_m) = -(\phi_n^\mp,g_\pm), \quad n=1,2,\ldots,N. \tag{6.59-a}$$

If the basis sets are the same, i.e., $\{\phi_n^+\} = \{\phi_n^-\} = \{\phi_n\}$, this reduces to

$$[1-P_NK_\pm^{-1}P_N]\,h^{KN}_\pm = -P_N\,g_\pm, \tag{6.59-b}$$

with the target equations

$$[1-K_\pm^{-1}]\,h_\pm = -g_\pm . \tag{6.59-c}$$

The corresponding approximation to $T(\mathbf{k}_-;\mathbf{k}_+)$ is given by

$$T^K(\mathbf{k}_-;\mathbf{k}_+) = F^K[h^{KN}_-;h^{KN}_+] = (g_-,g_+) + (g_-,h^{KN}_+). \tag{6.60}$$

With the identification $\{\phi_n^\pm\} = \{\sqrt{V}\psi_n^\pm\}$, Eq. (6.59-a) reduces to the more familiar Eq. (6.52).
Consider Eq. (6.59-c). Since 1 is in the resolvent sets of $K_{\pm,0}$, it is also in the resolvent sets of $K^{-1}_{\pm,0}$, which can be replaced by $K^{-1}$ (Remark 6.1). In fact, $[1-K^{-1}_{\pm,0}]^{-1}$ and $[1-K^{-1}]^{-1}$ are all compact. However, inversion of the restrictions of the operators poses complications. The situation improves considerably for the applications of these methods to calculate the tangents of the partial wave phase shifts, which covers most of the numerical applications of the methods. It follows from Eq. (6.36) and the definition of $t_l$ that

$$t_l = (g_l,g_l) + (g_l,K_lf_l), \tag{6.61}$$

where $f_l$ is the solution of Eq. (6.36), i.e., $[1-K_l]f_l = g_l$, equivalently, of the partial wave form of Eq. (6.57)

$$[1-K_l]K_l\,f_l = K_l\,g_l, \tag{6.62-a}$$

and $h_l$ is the solution of the partial wave form of Eq. (6.59-c)

$$[K_l^{-1}-1]\,h_l = g_l . \tag{6.62-b}$$
The operator K l is still compact but it is also symmetric and hence, self-adjoint. Consequently, K l−1 is also self-adjoint. Consideration of Eq. (6.62-a) provides no new insight. Consider the set of the algebraic equations associated with Eq. (6.62-b):
$$\sum_{m=1}^{N}\alpha_m\,(\phi_n,[K_l^{-1}-1]\,\phi_m) = (\phi_n,g), \quad n=1,2,\ldots,N, \tag{6.63-a}$$

equivalently,

$$\sum_{m=1}^{N}\alpha_m\,(\psi_n,[\lambda-A_l]\,\psi_m) = (\psi_n,g), \quad n = 1,2,\ldots,N. \tag{6.63-b}$$

Assume that the basis set $\{\phi_n\} = \{\sqrt{V}\psi_n\}$ is contained in the domain of $K_l^{-1}$, i.e., in the
range of K l . As indicated earlier, the operator K l−1 , being the inverse of a self-adjoint operator K l , is self-adjoint and by the spectral mapping theorem (Remark 2.3), 1 is in its resolvent set. Its spectrum, although has only the isolated eigenvalues, they are spread over the entire real line. It follows from Theorem 3.8 that if [(K l-1 ) N − 1]−1 is uniformly bounded then the solution of Eq. (6.63-a) converges to hl . However, this is not guaranteed. If a singularity is encountered, then the reduced variational method (Proposition 3.6) can be used to obtain a convergent sequence of vectors, i.e., the offending vector can be dropped [Singh and Stauffer, 1974]. This result can be used for numerical computations with some additional effort needed to implement the modification. The convergence result for the modified Kohn method for the partial waves is obtained above by using the self-adjointness of K l−1 and the compactness of its inverse. Compactness
of K l was used only to the extent that 1 is in the resolvent set of K l−1 . The basis {φn } is required to be in the domain of K l−1 , i.e., in the range of K l . In the following, the problem is −1
formulated further to accommodate the domain restrictions on K , and the result is extended to cover the case of the scattering amplitude. In the process, various technical points are illustrated and the conditions for convergence are clarified. Consider the trial functions given by Eqs. (6.48) and (6.51).
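\[ \psi_N^\pm = \Phi_0^\pm + \sum_{n=0}^{N} \alpha_n^{\pm N}\psi_n^\pm = \Phi_0^\pm + \breve{\psi}_N^\pm, \tag{6.64-a} \]
and the corresponding exact solution
\[ \psi^\pm = \Phi_0^\pm + \breve{\psi}^\pm, \tag{6.64-b} \]
with $\psi_0^\pm \sim (\pi/2)\, r^{-1} e^{\pm ikr}$ as $r \to \infty$, and $\psi_n^+ = \psi_n^- = \psi_n$, $n \ge 1$, decaying sufficiently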
rapidly to ensure that they are in H . The minimal condition for Eq. (6.52-a) to be well defined restricts ψ n , n ≥ 1 to the form domain of A . As indicated earlier, this representation is adequate. Slightly different notation, by letting n = 0 to N , is adopted to isolate the basis function ψ 0± incorporating the boundary condition. The representation of Eq. (6.64-a) together with Eq. (6.52) is equivalent to Eq. (6.59-a) with the identification
\[ h_N^\pm = \sqrt{V}\,\breve{\psi}_N^\pm = \sum_{n=0}^{N} \alpha_n^{\pm N}\phi_n^\pm. \tag{6.65} \]
The exact counterpart of Eq. (6.65), the solution of Eq. (6.59-c), is given by $h^\pm = \sqrt{V}\,\breve{\psi}^\pm$. With this identification, the results can be transposed between Eqs. (6.64-a) and (6.65). Let $h^0 = (h^+ + h^-)/2$, which satisfies the equation
\[ [1 - K^{-1}] h^0 = -(g_+ + g_-)/2 = -g_0. \tag{6.66} \]
We restrict the description to the case of $h^+$. The results for $h^{-,0}$ can be obtained by obvious replacements. The results for $h^0$, which corresponds to the principal value of Green's function (Eq. (2.29-b)), can be transferred to the partial wave case for $t_l$. The discussion is focused on the exact solution. Its approximate counterpart is obtained similarly.

The representation of $h^+$, which is required to be in the range $R(K_+)$ of $K_+$, expresses it as a direct sum $h^+ = h_0^+ \oplus u^+$, where $h_0^+ = \alpha_0^+\phi_0^+$. While both $h_0^+$ and $u^+$ are in $R(K_+)$, $V^{-1/2}u^+$ is in $H$ but $\psi_0^+ = V^{-1/2}\phi_0^+$ is not. However, $\psi_0^+$ is in $H_V$ as defined in Theorem 4.8, which is the completion of $H$ with respect to the norm $\|\chi\|_V = \|\sqrt{V}\chi\|$. This decomposes $R(K_+)$ into a direct sum of $H_V^+$ and $D(V^{-1/2})$, where $H_V^+$ is the one-dimensional span of $\phi_0^+$ in $H$ and hence closed. Let $H_V^1$ be the union of $H_V^{\pm,0}$. It is clear that $D(V^{-1/2})$ is contained in the closed complement $H^c$ of $H_V^1$ in $H$. This expresses $H$ as the direct sum of $H_V^1$ and $H^c$. For convenience, the details are covered for $h^+ = h_0^+ \oplus u^+$, which is a vector in $H_V^1 \oplus H^c$, with the scalar product of the two vectors $\upsilon = \upsilon_0 \oplus \upsilon'$, $u = u_0 \oplus u'$ given by $(\upsilon, u) = (\upsilon_0, u_0) + (\upsilon', u')$. Let $p_0^+$ be defined by $p_0^+\upsilon = (\phi_0^-, \upsilon)\phi_0^+/(\phi_0^-, \phi_0^+)$ for $\upsilon$ in $H$ and let $q_0^+ = (1 - p_0^+)$. Eq. (6.59-c) is equivalent to
\[ \begin{bmatrix} p_0^+[K^{-1} - 1]p_0^+ & p_0^+[K^{-1} - 1]q_0^+ \\ q_0^+[K^{-1} - 1]p_0^+ & q_0^+[K^{-1} - 1]q_0^+ \end{bmatrix} \begin{bmatrix} h_0^+ \\ u^+ \end{bmatrix} = \begin{bmatrix} p_0^+ g_+ \\ q_0^+ g_+ \end{bmatrix}. \tag{6.67} \]
Due to the available flexibility in selecting $\psi_0^+$, there is no loss of generality in assuming that $(\phi_0^-, [K^{-1} - 1]\phi_0^+) = \langle\psi_0^-, [\lambda - A]\psi_0^+\rangle = d \neq 0$ and $(\phi_0^-, \phi_0^+) = \langle\psi_0^-, V\psi_0^+\rangle \neq 0$, which has been used in constructing $p_0^+$. This relaxes the condition $\phi_0^- = \phi_0^+$ on the basis functions, which was assumed in the differential formulation of the Schwinger method. This formulation is more compatible with the usual applications of the Kohn method. In view of the stated conditions, the normalization can be selected to ensure that $(\phi_0^-, \phi_0^+) = 1$. Eq. (6.67) can now be reduced by the Gaussian elimination to yield
\[ h_0^+ = \frac{1}{d}\left(\phi_0^-, \left[g_+ - (K^{-1} - 1)u^+\right]\right)\phi_0^+, \tag{6.68-a} \]
\[ \left[q_0^+[K^{-1} - 1]q_0^+ - \frac{1}{d}\, q_0^+(K^{-1} - 1)p_0^+(K^{-1} - 1)q_0^+\right]u^+ = q_0^+\left[1 - \frac{1}{d}(K^{-1} - 1)p_0^+\right]g_+. \tag{6.68-b} \]
Let $\omega^{\pm,0} = (K^{-1} - 1)\phi_0^{\pm,0}$. Since $\phi_0^{\pm,0}$ are in $R(K_{\pm,0})$, contained in $H$, $\omega^{\pm,0}$ are vectors in $H$. Let the operator $p^{\pm,0}$ be defined by $p^{\pm,0}\upsilon = d^{-1}\omega^{\pm,0}(\omega^{\mp,0}, \upsilon)$ for each $\upsilon$ in $H$. Eq. (6.68-a, b) are equivalent to
\[ h_0^+ = \frac{1}{d}\left[(\phi_0^-, g_+) - (\omega^-, u^+)\right]\phi_0^+ = h[u^+], \tag{6.69-a} \]
\[ C^+u^+ = q_0^+\left[1 - K^{-1} + p^+\right]q_0^+u^+ = q_0^+\left[d^{-1}\omega^+(\phi_0^-, g_+) - g_+\right] = B g_+. \tag{6.69-b} \]
From Eq. (6.69-a), $h_0^+$ is determined by $u^+$, which is determined by Eq. (6.69-b) without further reference to Eq. (6.69-a). The above procedure reduces Eq. (6.67) to a block triangular system by the Gaussian elimination. Since Eq. (6.69-b) is defined in $H^c$, the problem of approximating $h^{\pm,0}$ is thus reduced to analysis in $H^c$. With this preparation, we have the following crucial result:

Lemma 6.1. The operator $C^+$ defined by Eq. (6.69-b) has a compact inverse on $H^c$.

Proof. Since Eq. (6.69-b) is obtained by a legitimate Gaussian elimination from a solvable set of equations, Eq. (6.67), $C^+$ is invertible. For the compactness of the inverse, let $\{\upsilon_\mu\}$ be a sequence of bounded vectors in $H^c$ such that $(\phi_0^-, \upsilon_\mu) = 0$, and let $\{\alpha_\mu\}$ be a sequence of bounded constants. Then $\{\alpha_\mu\omega^+ + \upsilon_\mu\}$ is a sequence of bounded vectors in $H$. Taking $\{\alpha_\mu\omega^+ + \upsilon_\mu\}$ for $g_+$ in Eq. (6.69-b) yields a sequence of the solutions $\{w_\mu\} = \{(C^+)^{-1}B(\alpha_\mu\omega^+ + \upsilon_\mu)\}$, yielding the corresponding solutions $\{h'_\mu\}$ of Eq. (6.69-a). Then following the steps in reverse results in $(K^{-1} - 1)(h'_\mu \oplus w_\mu) = (\alpha_\mu\omega^+ \oplus \upsilon_\mu)$. Since $(K^{-1} - 1)^{-1}$ is compact and $\{\alpha_\mu\omega^+ + \upsilon_\mu\}$ is bounded in $H$, $\{h'_\mu \oplus w_\mu\}$ is compact, i.e., it contains a Cauchy subsequence in $H$, and hence, $\{w_\mu\}$ is compact in $H^c$, implying that $(C^+)^{-1}$ exists as a compact operator from $H^c$ to $H^c$ •
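The elimination leading from Eq. (6.67) to Eqs. (6.68-a, b) can be checked on a finite-dimensional analogue. The following is a minimal sketch, assuming a 1 + 2 block system in which the scalar d plays the role of $(\phi_0^-, [K^{-1} - 1]\phi_0^+)$; the function names and the numerical values are hypothetical and serve only as an illustration.

```python
def solve_2x2(S, r):
    """Solve the 2x2 system S u = r by Cramer's rule."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [(r[0] * S[1][1] - S[0][1] * r[1]) / det,
            (S[0][0] * r[1] - S[1][0] * r[0]) / det]


def block_eliminate(d, b, c, D, g0, g1):
    """Solve [[d, b], [c, D]] [h0, u] = [g0, g1] by block Gaussian elimination.

    The reduced equation for u mirrors Eq. (6.68-b) and the back
    substitution for h0 mirrors Eq. (6.68-a).
    """
    # Reduced (Schur complement) operator acting on u alone: D - c b / d
    S = [[D[i][j] - c[i] * b[j] / d for j in range(2)] for i in range(2)]
    # Reduced right side: g1 - c g0 / d
    r = [g1[i] - c[i] * g0 / d for i in range(2)]
    u = solve_2x2(S, r)
    # h0 is determined by u without further reference to the reduced equation
    h0 = (g0 - sum(b[j] * u[j] for j in range(2))) / d
    return h0, u


# Hypothetical data for the illustration
d, b, c = 2.0, [1.0, 0.0], [0.0, 1.0]
D = [[3.0, 1.0], [1.0, 4.0]]
g0, g1 = 3.0, [5.0, 6.0]

h0, u = block_eliminate(d, b, c, D, g0, g1)

# Residuals of the original, unreduced 3x3 system
residuals = [
    d * h0 + b[0] * u[0] + b[1] * u[1] - g0,
    c[0] * h0 + D[0][0] * u[0] + D[0][1] * u[1] - g1[0],
    c[1] * h0 + D[1][0] * u[0] + D[1][1] * u[1] - g1[1],
]
print(h0, u, residuals)
```

The residuals of the full system vanish to rounding error, which is the finite-dimensional content of the statement that $h_0^+$ is determined by $u^+$ alone once the reduced equation is solved.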
The equation corresponding to Eq. (6.69-b) for $u^0$ reads
\[ C^0u^0 = q_0^0\left[1 - K^{-1} + p^0\right]q_0^0u^0 = q_0^0\left[d^{-1}\omega^0(\phi_0^0, g_0) - g_0\right]. \tag{6.70} \]
All of the above arguments are applicable to Eq. (6.70). In addition, it is defined completely in terms of the self-adjoint operators, in contradistinction with Eq. (6.69-b). The projection $q_0^+ = (1 - p_0^+)$ can be expressed as $q_0^+ = q_0^0 + (p_0^0 - p_0^+) = q_0^0 + p_0^1$, where $p_0^1$ is an operator of finite rank. Consequently, $C^+ = C^0 + C^1$, where $C^1$ is an operator of finite rank, which follows by substitution. Thus, $(C^0)^{-1}$ and $(C^+)^{-1}$ exist as bounded operators and $(C^+ - C^0)$ is compact, for it is an operator of a finite rank.

The convergence results can be obtained with the above formulation, but we reduce the equations further. The additional analysis is useful for further clarifications and comparison with the standard formulation of the Kohn method corresponding to Eq. (6.52), and the results are slightly stronger resulting from Friedrichs' construction. The operator $q_0^0$, as $q_0^\pm$, is a projector from $H$ to $H^c$ and thus, Eqs. (6.69-b) and (6.70) are well defined on $H^c$. With this understanding, $q_0^0$ can be set equal to the identity in $H^c$.

In $H^c$, $C^+$, and thus $C^0$, is defined on $D(K^{-1}) = D(V^{-1/2}[\lambda - A_0^0]V^{-1/2})$, i.e., $D(K^{-1})$ consists of those vectors $u$ in $D(V^{-1/2})$, which satisfy the condition that $[\nabla^2 + k^2](V^{-1/2}u)$ is in $D(V^{-1/2})$. By the same procedure as the exact Eq. (6.67), the set of the algebraic equations in the Kohn method approximating Eq. (6.67) is also reduced to a block triangular system. This set approximating Eq. (6.69-b) is then given by
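\[ \sum_{m=1}^{N} \alpha_m^{\pm,0}(\phi_n, C^{\pm,0}\phi_m) = (\phi_n, B_{\pm,0}\,g_{\pm,0}), \quad n = 1, 2, \ldots, N, \tag{6.71} \]
with the basis $\{\phi_n\}_{n=1}^N$ contained in $D(K^{-1})$. With the identification $\{\phi_n\}_{n=1}^N = \{\sqrt{V}\,\psi_n\}_{n=1}^N$, Eq. (6.71) reads
\[ \sum_{m=1}^{N} \alpha_m^+\left(\psi_n, \left[(\lambda - A) + \sqrt{V}(p^0 + C^1)\sqrt{V}\right]\psi_m\right) = (\psi_n, \sqrt{V}\,g_{\pm,0}), \quad n = 1, 2, \ldots, N. \tag{6.72} \]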
Let
\[ u_N^+ = \sum_{m=1}^{N}\alpha_m^+\phi_m = \sum_{m=1}^{N}\alpha_m^+\sqrt{V}\,\psi_m, \tag{6.73-a} \]
\[ h_{0N}^+ = \frac{1}{d}\left[(\phi_0^-, g_+) - (\omega^-, u_N^+)\right]\phi_0^+ = h[u_N^+]. \tag{6.73-b} \]
The corresponding approximation to the exact solution $h^+$ is given by $h_{0N}^+ \oplus u_N^+$. This formulation of Eq. (6.72) together with Eq. (6.73-b), with $\{\psi_n\}_{n=1}^N$ in $D(A)$, has been used to conclude the convergence in measure [Nuttall, 1969]. However, this condition is insufficient for the equivalence of Eq. (6.72) and Eq. (6.71), which requires $\{\psi_n\}_{n=1}^N$ to be in $D(V^{-1/2}A)$. The condition is directly related to the compactness of $K_{\pm,0}$, which implies that 1 is in the resolvent set of its inverse, which plays a crucial role in the proof of the convergence obtained below. Fundamental change in the spectral properties resulting from the domain restrictions, particularly to accommodate the boundary conditions, has been frequently discussed in the present text. Although not so transparent as in some other examples, the present case is essentially of the same character as it is a direct consequence of the behavior of the vectors in the domain at infinity.

To analyze Eq. (6.71) further, let $H_+^c$ be the completion of $D(C_0)$ in $H^c$, with respect to the scalar product $(u, \upsilon)_+ = (u, C_0\upsilon)$, where $C_0 = [V^{-1/2}(A_0^0 + \varepsilon)V^{-1/2}]$ for an arbitrary $\varepsilon > 0$. For convenience, we set $\varepsilon = 1$. The analysis and the results are valid with $C_0$ replaced by $(1 + C_0)$ but the conditions on the basis are more transparent with $C_0$. For the potentials of the Rollnik class, $C_0^{-1} = [V^{1/2}(A_0^0 + \varepsilon)^{-1}V^{1/2}]$ is a positive and compact operator from $H^c$ to $H^c$, with $C_0 \ge \|C_0^{-1}\|^{-1} = \gamma > 0$, justifying the construction. Express $C^+$ as $C^+ = \left[1 + C_0 - (1 + \lambda)V^{-1} + p^0 + C^1\right]$. We have
Lemma 6.2. With $C_0$, $C^+$, $p^0$ and $C^1$ as above,
(i) $C_0^{-1}V^{-1}$ admits a self-adjoint bounded closure;
(ii) $C_0^{-1}p^0$ admits a self-adjoint compact closure; and
(iii) $C_0^{-1}C^1$ admits a compact closure;
on $H_+^c$.

Proof. For $u$ in $D(V^{-1})$, we have
\[ \frac{\|C_0^{-1}V^{-1}u\|_+^2}{\|u\|_+^2} = \frac{(C_0^{-1}V^{-1}u, V^{-1}u)}{(u, C_0u)} = \frac{(V^{-1/2}u, [1 + A_0^0]^{-1}V^{-1/2}u)}{(u, V^{-1/2}[1 + A_0^0]V^{-1/2}u)} \le \frac{(u, V^{-1}u)}{(u, V^{-1}u)} = 1. \]
Hence, $C_0^{-1}V^{-1}$ is bounded on a domain dense in $H_+^c$. It is also symmetric in $H^c$. (i) follows from Lemma 4.3. Compactness of $C_0^{-1}p^0$ and $C_0^{-1}C^1$ follows from the fact that these are the operators of finite ranks. As in the case of $C_0^{-1}V^{-1}$, $C_0^{-1}p^0$ also has a self-adjoint closure in $H_+^c$ •

Now the form $(u, C^+\upsilon)$ for $u$, $\upsilon$ in $H_+^c$ can be expressed as
\[ (u, C^+\upsilon) = (u, [1 - \mathcal{K}^0 - \mathcal{K}^1]\upsilon)_+, \tag{6.74} \]
where $\mathcal{K}^0$, $\mathcal{K}^1$ are the closures of $C_0^{-1}[(1 + \lambda)V^{-1} - 1 - p^0]$ and $(-C_0^{-1}C^1)$ respectively, in $H_+^c$. As in Theorem 3.7(i), Eq. (6.71) with $Bg_+ = g$ being arbitrary, is equivalent to
\[ C_N^+u_N^+ = [1 - \mathcal{K}_N^0 - \mathcal{K}_N^1]u_N^+ = P_NC_0^{-1}g, \tag{6.75} \]
where $P_N$ is the orthoprojection on the span of $\{\phi_n\}_{n=1}^N$ with respect to the scalar product in $H_+^c$, and $\mathcal{K}_N^0 = P_N\mathcal{K}^0P_N$, $\mathcal{K}_N^1 = P_N\mathcal{K}^1P_N$. We have also used Theorem 3.1. The basis is assumed to be complete in $H_+^c$, i.e., $P_N$ converges strongly in $H_+^c$ to the identity operator in $H_+^c$, still denoted by 1. This reduces Eq. (6.71) and Eq. (6.73-b) to the following equation in $H_V^1 \oplus H_+^c$:
\[ (1_V \oplus P_N)A(1_V \oplus P_N)(h_{0N}^+ \oplus u_N^+) = \begin{bmatrix} 1_V & p' \\ 0 & C_N^+ \end{bmatrix}\begin{bmatrix} h_{0N}^+ \\ u_N^+ \end{bmatrix} = \begin{bmatrix} p_0^+g_+ \\ P_NC_0^{-1}B_+g_+ \end{bmatrix} = (1_V \oplus P_N)(p_0^+g_+ \oplus C_0^{-1}B_+g_+), \tag{6.76} \]
where $1_V$ is the identity operator on $H_V^1$ and the operator $p'$ is defined by $p'\upsilon = d^{-1}\phi_0^+(\omega^-, \upsilon)$. The scalar product of the two vectors $\upsilon = \upsilon_0 \oplus \upsilon'$ and $u = u_0 \oplus u'$, each in $H_V^1 \oplus H_+^c$, is given by $(\upsilon, u)_{0+} = (\upsilon_0, u_0) + (\upsilon', u')_+ = (\upsilon_0, u_0) + (\upsilon', C_0u')$. From Lemma 6.2 and Eq. (6.74),
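$\mathcal{K}^0$ is bounded, self-adjoint and $\mathcal{K}^1$ is compact, in fact an operator of finite rank, and $[1 - \mathcal{K}^0]^{-1}$, $[1 - \mathcal{K}^0 - \mathcal{K}^1]^{-1}$ exist as bounded operators from Lemma 6.1, Eq. (6.74) and Remark 2.5 (Eqs. (2.7-a, b)). While 1 is in the resolvent sets of $\mathcal{K}^0$ and $[\mathcal{K}^0 + \mathcal{K}^1]$, it is in the interior of the numerical range of $\mathcal{K}^0$ and thus, it may not be in the resolvent set of $\mathcal{K}_N^0$ and $(\mathcal{K}_N^0 + \mathcal{K}_N^1)$. If so, let $\tilde{\mathcal{K}}_N^0$ be the reduced form of $\mathcal{K}_N^0$ in accordance with Theorem 3.6; otherwise, let $\tilde{\mathcal{K}}_N^0 = \mathcal{K}_N^0$. We have

Lemma 6.3. With the symbols as in Eq. (6.76),
\[ [1 - \tilde{\mathcal{K}}_N^0 - \mathcal{K}_N^1]^{-1} \xrightarrow[N \to \infty]{s} [1 - \mathcal{K}^0 - \mathcal{K}^1]^{-1} \quad \text{in } H_+^c. \]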
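Proof. From Theorem 3.6, $[1 - \tilde{\mathcal{K}}_N^0]^{-1} \to [1 - \mathcal{K}^0]^{-1}$ strongly in $H_+^c$ as $N \to \infty$. Since $\mathcal{K}^1$ is compact, $\mathcal{K}_N^1 \to \mathcal{K}^1$ uniformly from Theorem 2.3 as in Theorem 3.4. Consequently, from Theorem 2.3, $[1 - \tilde{\mathcal{K}}_N^0]^{-1}\mathcal{K}_N^1 \to [1 - \mathcal{K}^0]^{-1}\mathcal{K}^1$ uniformly in $H_+^c$. Since $[1 - \mathcal{K}^0]^{-1}$ and $[1 - \mathcal{K}^0 - \mathcal{K}^1]^{-1}$ exist, we have that
\[ [1 - \mathcal{K}^0 - \mathcal{K}^1]^{-1} = [1 - (1 - \mathcal{K}^0)^{-1}\mathcal{K}^1]^{-1}[1 - \mathcal{K}^0]^{-1}, \]
equivalently,
\[ [1 - (1 - \mathcal{K}^0)^{-1}\mathcal{K}^1]^{-1} = [1 - \mathcal{K}^0 - \mathcal{K}^1]^{-1}[1 - \mathcal{K}^0], \]
implying that 1 is in the resolvent set of the compact operator $(1 - \mathcal{K}^0)^{-1}\mathcal{K}^1$. Consequently, from Lemma 3.1, $[1 - (1 - \tilde{\mathcal{K}}_N^0)^{-1}\mathcal{K}_N^1]^{-1} \to [1 - (1 - \mathcal{K}^0)^{-1}\mathcal{K}^1]^{-1}$, uniformly, and hence
\[ [1 - \tilde{\mathcal{K}}_N^0 - \mathcal{K}_N^1]^{-1} = [1 - (1 - \tilde{\mathcal{K}}_N^0)^{-1}\mathcal{K}_N^1]^{-1}(1 - \tilde{\mathcal{K}}_N^0)^{-1} \xrightarrow[N \to \infty]{s} [1 - (1 - \mathcal{K}^0)^{-1}\mathcal{K}^1]^{-1}(1 - \mathcal{K}^0)^{-1} = [1 - \mathcal{K}^0 - \mathcal{K}^1]^{-1} \; • \]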
The result of Lemma 6.3 is sufficient to prove a convergence result parallel to Theorem 3.7 for the Kohn variational method.

Theorem 6.3. With the symbols as in Lemma 6.3,
(i) If $(C_N^+)^{-1}$ exists as a uniformly bounded operator with respect to $N$, then $\|(h_{0N}^+ \oplus u_N^+) - (h_0^+ \oplus u^+)\| \to 0$ as $N \to \infty$, where $h_{0N}^+ = h[u_N^+]$ (Eq. (6.73-b)).
(ii) (Reduced variational method) If $C_N^+$ has an eigenvalue in a small neighborhood of zero at a positive distance from the spectrum of $C^+$, equivalently, if $(\mathcal{K}_N^0 + \mathcal{K}_N^1)$ has an eigenvalue in the corresponding neighborhood of 1, then let $u_N'^+$ be the solution of the reduced set of equations (Theorem 3.7 (ii)) obtained from Eq. (6.75) in accordance with Lemma 6.3; otherwise, let $u_N'^+ = u_N^+$, and $h_{0N}'^+ = h[u_N'^+]$ (Eq. (6.73-b)). Then $\|(h_{0N}'^+ \oplus u_N'^+) - (h_0^+ \oplus u^+)\| \to 0$ as $N \to \infty$.

Proof. The results $\|u_N^+ - u^+\| \to 0$ and $\|u_N'^+ - u^+\| \to 0$ as $N \to \infty$ in (i) and (ii) follow as in Theorem 3.7 (i) and (ii), respectively. The convergence of $h_{0N}^+$, $h_{0N}'^+$ to $h_0^+$ follows from this result and from their definitions •

The form representing the scattering amplitude given by the equivalent equations Eqs. (6.47), (6.54) and (6.58), in terms of $A$ reduces to the form
\[ \begin{aligned} F^K[\psi_-;\psi_+] &= F^K[h_N^-; h_N^+] \\ &= (g_-, g_+) - ([p_0^- \oplus C_0^{-1}B_-g_-], [h_{0N}^+ \oplus u_N^+])_{0+} \\ &= (g_-, g_+) - ([h_{0N}^- \oplus u_N^-], [p_0^- \oplus C_0^{-1}B_+g_+])_{0+} \\ &= (g_-, g_+) - ([p_0^- \oplus C_0^{-1}B_-g_-], [h_{0N}^+ \oplus u_N^+])_{0+} - ([h_{0N}^- \oplus u_N^-], [p_0^- \oplus C_0^{-1}B_+g_+])_{0+} \\ &\quad + ([h_{0N}^- \oplus u_N^-], A[h_{0N}^+ \oplus u_N^+])_{0+} \\ &= T_N^K(k_-;k_+). \end{aligned} \tag{6.77} \]
We state the following result corresponding to Theorem 6.3 (i). The parallel result corresponding to (ii) follows by replacing $u_N^+$ by $u_N'^+$.

Proposition 6.2. Let $T_N^K(k_-;k_+)$ be as in Eq. (6.77). Then
\[ |T_N^K(k_-;k_+) - T(k_-;k_+)| \le \text{const.}\,\|u_N^+ - u^+\|_+\,\|u_N^- - u^-\|_+ \xrightarrow[N \to \infty]{} 0. \]
Proof. From Eq. (6.73-b), we have that
\[ \|h_{0N}^\pm - h_0^\pm\| \le \text{const.}\,\|u_N^\pm - u^\pm\| \le \text{const.}\,\|u_N^\pm - u^\pm\|_+. \]
Further from Corollary 3.2 and Eq. (6.77), we have that
\[ |T_N^K(k_-;k_+) - T(k_-;k_+)| = |([h_{0N}^- - h_0^-] \oplus [u_N^- - u^-], A[h_{0N}^+ - h_0^+] \oplus [u_N^+ - u^+])_{0+}|. \]
In view of the fact that $A$ is a bounded operator on $H_V^1 \oplus H_+^c$, the result follows from the routine estimates •

The convergence result of Theorem 6.3 can be obtained with analysis in $H_V^1 \oplus H^c$ by invoking Theorem 3.8 and Proposition 3.6. However, as explained in Remark 3.5, the second order convergence of the scattering amplitude, although in terms of a greater norm, results from the formulation in $H_V^1 \oplus H_+^c$ instead of $H_V^1 \oplus H^c$ where $A$ is unbounded. The results obtained above for the scattering amplitudes are easily transposed for the partial waves. However, this does not result in any improvement. Therefore, we omit the details.
It is clear from Theorem 6.3 that the solution obtained by the Kohn method, i.e., by solving Eq. (6.52) with the trial function given by Eq. (6.64-a), by reducing the set if need be, converges to the exact solution. The convergence is with respect to the norm in H after multiplication by V . In comparison, Hulthén’s method suffers from essentially a form of non-convergence. Thus, variation with respect to one additional parameter in Kohn’s prescription, generating one additional equation together with the coupling with the others, has a crucial impact on the properties of the method. Since C 0 −1 is compact, P N C 0 −1 → C 0 −1 uniformly to a compact operator in H +c . Consequently,
\[ [1 - \tilde{\mathcal{K}}_N^0 - \mathcal{K}_N^1]^{-1}P_NC_0^{-1} \xrightarrow[N \to \infty]{u} [1 - \mathcal{K}^0 - \mathcal{K}^1]^{-1}C_0^{-1}, \]
in H +c , from Theorem 2.3. This indicates a favorable property of the equations. While mathematically interesting, this result does not improve the numerical value of the method as the spurious singularities can still be encountered and they have to be eliminated by reducing the set of algebraic equations by dropping the offending vector.
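The origin of the spurious singularities can be seen already in two dimensions. The following is a minimal sketch, assuming a diagonal stand-in M for $K^{-1} - 1$ whose spectrum lies on both sides of zero; the matrix, the basis vectors and the helper names are hypothetical. The operator M is invertible, yet its restriction to the span of a single trial vector vanishes identically, and the singularity disappears once the basis is changed or enlarged.

```python
def dot(u, v):
    """Real Euclidean scalar product."""
    return sum(a * b for a, b in zip(u, v))


def apply_diag(diag, v):
    """Apply the diagonal operator with entries `diag` to the vector v."""
    return [d * x for d, x in zip(diag, v)]


def galerkin_matrix(diag, basis):
    """Matrix with entries (phi_n, M phi_m) of the projected equations."""
    return [[dot(pn, apply_diag(diag, pm)) for pm in basis] for pn in basis]


# Stand-in for K^{-1} - 1 with K^{-1} = diag(0.5, 1.5): invertible, indefinite
M = [-0.5, 0.5]
g = [1.0, 2.0]
phi1, phi2 = [1.0, 1.0], [1.0, 0.0]

# One trial vector: the projected operator is exactly zero (spurious singularity)
one_vector = galerkin_matrix(M, [phi1])

# Two trial vectors: the projected system is regular
G = galerkin_matrix(M, [phi1, phi2])
rhs = [dot(phi1, g), dot(phi2, g)]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
alpha = [(rhs[0] * G[1][1] - G[0][1] * rhs[1]) / det,
         (G[0][0] * rhs[1] - G[1][0] * rhs[0]) / det]

# Reconstructed solution; it coincides with the exact solution of M h = g
h = [alpha[0] * phi1[i] + alpha[1] * phi2[i] for i in range(2)]
print(one_vector, alpha, h)
```

Here the singular one-dimensional system carries no information about the exact equation, which is the situation handled in the text by dropping the offending vector from the basis.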
6.III.5. Rotated Hamiltonians

It follows from Eq. (6.25) and Eq. (4.44-a, b) that
\[ T(k_-;k_+) = \langle\Phi_0(k_-), V\Phi^+(k_+)\rangle = \langle\Phi_0(k_-), V\Phi_0(k_+)\rangle + S^+(k_-;k_+), \qquad S^+(k_-;k_+) = (h, [\lambda + i0 - A]^{-1}g), \tag{6.78} \]
where $h = V\Phi_0(k_-)$ and $g = V\Phi_0(k_+)$. We restrict to the square integrable $V$ and thus, $g$ and $h$ are in $H$. Consider the case of $S^+(k_-;k_+) = S(k_-;k_+)$, dropping the superscript. The case of $S^-(k_-;k_+)$ is similar.

As was discussed in the previous subsection, numerical evaluation of the scattering related quantities encounters difficulties due to the fact that they are defined in terms of Green's functions in the limit of the real line. In practice, a number of potentials possess additional analyticity properties, which can be exploited to express the quantities of interest in terms of the bounded operators. If the potentials are analytic with respect to the radial coordinate in a region bounded by the positive real line and a straight line from the origin at an angle $\alpha$ to it, then the method of the rotated Hamiltonians can be used to define these quantities in terms of the bounded operators, as follows.

Let $H_+$ be the completion of $D(A_0^0)$ with respect to the scalar product $(u, \upsilon)_+ = (u, [1 + A_0^0]\upsilon)$, i.e., the norm $\|u\|_+ = \|[1 + A_0^0]^{1/2}u\|$; let $U(\vartheta)$ be the one-parameter group of dilatations on $H$ defined by $(U(\vartheta)g)(r) = e^{3\vartheta/2}g(re^{\vartheta})$; let the two-body potential $V$ be $A_0^0$-compact, i.e., compact on $H_+$ (Lemma 4.4), such that $V(\vartheta) = U(\vartheta)VU^{-1}(\vartheta)$ admits an analytic extension into the strip $|\mathrm{Im}(\vartheta)| \le \alpha$ for some $\alpha > 0$. Exponentially decaying potentials without singularities except at the origin, which arise in applications, can be seen to satisfy these conditions. An essentially equivalent formulation can be developed by treating $V$ and $U(\vartheta)$ as maps from $H_+$ to $H_-$, where $H_-$ is the completion of $H$ with respect to the norm $\|u\|_- = \|[1 + A_0^0]^{-1/2}u\|$ [Simon, 1971].

Consider the set $\bar{D}_\alpha$ of the complex numbers $z'$ such that $0 \le |z'| < \infty$ and $-\alpha \le \arg(z') \le \alpha$, which is the closure of the region $D_\alpha$ of analyticity indicated above, with a continuous boundary. With the above conditions on the potentials, $h$, $g$ of Eq. (6.78) admit analytic square integrable extensions $h(z')$, $g(z')$ to $D_\alpha$. Replacing the radial integral along the positive real line by the integral along a ray at an angle $\alpha$ to it results in
\[ S(k_-;k_+) = e^{3i\alpha}(h^{-\alpha}, [\lambda - e^{-2i\alpha}A(\alpha)]^{-1}g^{\alpha}), \tag{6.79} \]
by Cauchy's theorem, where $\phi^{\alpha} = \phi(re^{i\alpha})$ for each function $\phi$ and $A(\alpha) = (A_0^0 + e^{2i\alpha}V^{\alpha})$. The variational approximation to $S(k_-;k_+)$ is obtained by solving the algebraic equations
\[ \sum_{m=1}^{N}\alpha_m(\phi_n, [\lambda - e^{-2i\alpha}A(\alpha)]\phi_m) = (\phi_n, g^{\alpha}), \quad n = 1, 2, \ldots, N, \tag{6.80} \]
where $\{\phi_n\}_{n=1}^N$ is the basis set. The convergence result is obtained by the same arguments as used in Lemma 6.3 to Proposition 6.2, with the additional benefit of an absence of the spurious singularities. Therefore, the proofs are briefly outlined.

Consider the sectorial form $(u, [\lambda - e^{-2i\alpha}A(\alpha)]\upsilon)$, with $u$, $\upsilon$ in $H_+$. Initially $u$, $\upsilon$ are taken in $D(A_0^0)$, which extends to $H_+$ by closure, as usual. We have that
\[ (u, [\lambda - e^{-2i\alpha}A(\alpha)]\upsilon) = (u, [1 - \zeta B^+ + K]\upsilon)_+, \]
where $\zeta = (1 + \lambda e^{2i\alpha})$, the operator $B^+$ is the closure of $[1 + A_0^0]^{-1}$ in $H_+$, and $K$ is the closure of $e^{2i\alpha}[1 + A_0^0]^{-1}V^{\alpha}$. By the stated assumptions on $V$, $K$ is compact (Lemma 4.4(i)). For each $g$ in $H$ and $\phi$ in $H_+$, $(\phi, g) = (\phi, [1 + A_0^0]^{-1}g)_+ = (\phi, Bg)_+$. The operator $[1 - \zeta B^+ + K]^{-1}B$, with $B = [1 + A_0^0]^{-1}$, whenever it exists, is a legitimate operator from $H$ to $H_+$. This provides an adequate definition of the sectorial operator $A(\alpha)$ by
\[ [\lambda - e^{-2i\alpha}A(\alpha)] = B^{-1}[1 - \zeta B^+ + K], \]
as in Remark 2.5. Now, Eq. (6.80) reduces to
\[ \sum_{m=1}^{N}\alpha_m(\phi_n, [1 - \zeta B^+ + K]\phi_m)_+ = (\phi_n, Bg^{\alpha})_+, \quad n = 1, 2, \ldots, N, \tag{6.81-a} \]
where $\{\phi_n\}_{n=1}^N$ is a basis in $H_+$. Eq. (6.81-a) is equivalent to (Theorem 3.1)
\[ [1 - \zeta B_N^+ + K_N]f_N = P_N^+Bg^{\alpha}, \tag{6.81-b} \]
in $H_+$, where $B_N^+ = P_N^+BP_N^+$, $K_N = P_N^+KP_N^+$ and $P_N^+$ is the orthoprojection converging strongly to the identity in $H_+$. We have

Lemma 6.4. With the symbols as above,
\[ [1 - \zeta B_N^+ + K_N]^{-1}P_N^+B \xrightarrow[N \to \infty]{s} [\lambda - e^{-2i\alpha}A(\alpha)]^{-1} \quad \text{in } H. \]
Proof. Since $B^+$ is self-adjoint and $\mathrm{Im}(\zeta) \neq 0$, $[1 - \zeta B_N^+]^{-1}$ is bounded (Corollary 2.4):
\[ \|[1 - \zeta B_N^+]^{-1}\| = |\zeta^{-1}|\,\|[\zeta^{-1} - B_N^+]^{-1}\| \le \frac{|\zeta^{-1}|}{|\mathrm{Im}(\zeta^{-1})|}. \]
Consequently $[1 - \zeta B_N^+]^{-1} \xrightarrow[N \to \infty]{s} [1 - \zeta B^+]^{-1}$, as in Theorem 3.5, and hence
\[ [1 - \zeta B_N^+]^{-1}K_N \xrightarrow[N \to \infty]{u} [1 - \zeta B^+]^{-1}K, \tag{6.82} \]
from Theorem 2.3. It follows that
\[ \begin{aligned} [1 - \zeta B_N^+ + K_N]^{-1}P_N^+B &= (1 + [1 - \zeta B_N^+]^{-1}K_N)^{-1}[1 - \zeta B_N^+]^{-1}P_N^+B \\ &\xrightarrow[N \to \infty]{s} (1 + [1 - \zeta B^+]^{-1}K)^{-1}[1 - \zeta B^+]^{-1}B \\ &= [1 - \zeta B^+ + K]^{-1}B = [\lambda - e^{-2i\alpha}A(\alpha)]^{-1}, \end{aligned} \]
since the products of uniformly and strongly convergent sequences are strongly convergent, unless the limit of the uniformly convergent sequence is compact. We have also used Lemma 3.1 •
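The uniform bound used at the start of the proof is elementary and can be checked numerically. The following is a minimal sketch, assuming that $B_N^+$ is represented by a real diagonal matrix with eigenvalues in $[0, 1]$; the values of $\lambda$ and $\alpha$, the eigenvalue grid and the function names are hypothetical.

```python
import cmath


def inverse_norm(zeta, eigenvalues):
    """Operator norm of [1 - zeta*B]^{-1} for a self-adjoint B given by its eigenvalues."""
    return max(1.0 / abs(1.0 - zeta * b) for b in eigenvalues)


def resolvent_bound(zeta):
    """Right side of the estimate: |1/zeta| / |Im(1/zeta)|."""
    inv = 1.0 / zeta
    return abs(inv) / abs(inv.imag)


# Hypothetical parameters: zeta = 1 + lambda * exp(2 i alpha) has Im(zeta) != 0
lam, alpha = 2.0, 0.3
zeta = 1.0 + lam * cmath.exp(2j * alpha)

# Eigenvalues of a stand-in for B_N^+, spread over [0, 1]
eigenvalues = [k / 50.0 for k in range(51)]

norm = inverse_norm(zeta, eigenvalues)
bound = resolvent_bound(zeta)
print(norm, bound)
```

The bound depends on $\zeta$ alone, not on the eigenvalues or on $N$, which is why no spurious singularity can arise from this part of the operator.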
The following result follows by standard arguments, e.g., Theorem 6.3 (i).

Theorem 6.4. With the symbols as in Lemma 6.4 and Eq. (6.80), we have
\[ \Big\|\sum_{n=1}^{N}\alpha_n\phi_n - [\lambda - e^{-2i\alpha}A(\alpha)]^{-1}g^{\alpha}\Big\| \xrightarrow[N \to \infty]{} 0 \; • \]
Also, by standard manipulations and this result, it follows that the approximation to S (k − ; k + ) obtained from the solution of Eq. (6.80) is of a second order accuracy with respect to the norm in H + (Corollary 3.2). Although we have not considered the multi-particle systems, above considerations are easily extendable to them. The multi-particle counterpart of Eq. (6.78) reads
\[ T(\Phi_f;\Phi_i) = \langle\Phi_f, V_i\Phi_i\rangle + S^+(\Phi_f;\Phi_i), \qquad S^+(\Phi_f;\Phi_i) = \langle V_f\Phi_f, [\lambda + i0 - A]^{-1}V_i\Phi_i\rangle, \tag{6.83} \]
where $\Phi_i$, $\Phi_f$ are the initial and final channel states, and $V_i$, $V_f$ are the corresponding effective potentials. The Hamiltonian $A = (A^0 + V_{3N})$, where $A^0$ is the self-adjoint realization of the $3N$-dimensional Laplacian $\nabla_{3N}^2$ and $V_{3N}$ is the sum of the two-particle potentials. The rotation is now in a multi-dimensional domain yielding
\[ S^+(k_-;k_+) = S(k_-;k_+) = e^{3iN\alpha}(h^{-\alpha}, [\lambda - e^{-2i\alpha}A(\alpha)]^{-1}g^{\alpha}). \tag{6.84} \]
There is a crucial difference between a particle in a potential and the multi-particle case: Even if the two-particle potentials constituting the multi-particle interaction are relatively compact, the resulting operator $K$ for a multi-particle system is only bounded, invalidating Eq. (6.82). All we are left with is the strong convergence of $(\zeta B_N^+ - K_N)$ to $(\zeta B^+ - K)$, both non-self-adjoint. Based on this result, it can be concluded that if $[1 - \zeta B_N^+ + K_N]^{-1}$ are uniformly bounded, then
\[ [1 - \zeta B_N^+ + K_N]^{-1}P_N^+B \xrightarrow[N \to \infty]{s} [1 - \zeta B^+ + K]^{-1}B = [\lambda - e^{-2i\alpha}A(\alpha)]^{-1}, \]
from Lemma 3.3. However, even the existence of $[1 - \zeta B_N^+ + K_N]^{-1}$ is not guaranteed. As a result, a possibility of the spurious singularities cannot be ruled out. Lack of self-adjointness complicates the matters further. However, if the singularities are encountered, then the procedure described in Remark 3.4 can still be used, i.e., by converting the approximate equations to their self-adjoint equivalent and then dropping the offending vector from the basis [Singh, 1977].
Chapter 7

7. SUPPLEMENTARY EXAMPLES

7.I. RAY TOMOGRAPHY

The first example we consider is to invert the integral
\[ T(L) = \int_L dl(r)\,\eta(r), \tag{7.1} \]
where r denotes a point in a two-dimensional domain D , the integral is along a line L in D and dl (r ) is the infinitesimal length element about the point r defining the measure. The aim is to determine the integrand η (r ) from the knowledge of its integrals along a set of lines {Ln } in D . This problem arises in the ray tomographic problems. For example, in the computer aided tomography (CAT), attenuations of the X-rays passing through the object of interest along a set of lines are measured experimentally. Total attenuation along each line can be expressed as an integral of the attenuation along its path. The quantity of interest is the density distribution of the object, which is determined by the local attenuation distribution. In the rocket and satellite imaging of the atmosphere, the integral is the integrated change in the frequency of a microwave beam and the quantity of interest is the charge distribution in the atmosphere, which is directly related to the pollution level. In the geophysical surveys, one method is to measure the sound travel time along a set of paths and the quantity of interest is the density distribution of the rock, which is inversely proportional to the velocity. There are a number of other applications, which are covered by the present method. In all of these cases, the signal is assumed to travel along straight lines, which is a good approximation for the operating physical conditions. By usual manipulations, each of these and some other problems, can be reduced to the mathematical problem stated here. Standard methods in use for the ray tomographic reconstruction of the image of the scanned object are based on expressing Eq. (7.1) as a Radon transform and then use the corresponding inversion formula to develop an algorithm. The most widely known algorithm of this type is the filtered back projection, which is closely related to several other methods [Herman et al, 1987]. 
Initially the methods yielded unstable numerical schemes with limited accuracy. The procedures have been adjusted and refined mostly by intuition based numerical techniques with some arbitrariness, e.g., the selection of an appropriate filter in the filtered
back projection method. Currently, they are reasonably satisfactory and widely in use, but a need for a robust, accurate and efficient algorithm still persists, at least in some applications. In any case, an improvement in the existing techniques, and alternative procedures, are always desirable.

In this section, we develop an algorithm based on the constructive proof of Theorem 2.9, which was illustrated by obtaining a known algorithm to invert the Laplace transform (Example 2.2), the familiar Fourier series expansion (Example 2.3) and to calculate a function from the knowledge of its moments (Proposition 3.10). As will be seen, in the case of the ray tomography, a diagonally dominant positive definite matrix is encountered, rendering its inversion straightforward and resulting in a numerically stable scheme. The algorithm is also straightforward to implement and produces optimally accurate results in the sense that the maximal information contained in the measurements is extracted. This method is known as the Areal Basis Inversion Technique (ABIT), which is described below with the Cartesian coordinate system x, y spanning the region of interest. However, it can be adjusted for use with any other locally orthogonal coordinate system, e.g., the polar. Also, the integrals along the curved lines create no essential difficulties. Further extensions of the method are also possible.

The value of the integrand at ∂D, the boundary of D, will be assumed to be equal to zero. If the value there is known or can be estimated, it can be included in the treatment. The set of lines will be assumed to join the points of ∂D. For a complete coverage, the end points of the lines should be spread over ∂D and, in the limit of infinitely many lines, should form a dense set of points in ∂D. Let ∂Db be the base line, i.e., every point in D can be reached from some point on ∂Db along a coordinate line, assumed for the present to be x = const.
In the present coordinate system and the straight lines, since ( dl / dx ) for a given line is constant, Eq. (7.1) reduces to
\[ g(L) = \int_L dx\,\eta(r), \tag{7.2} \]
where $g(L) = T(L)/(dl/dx)$. Defining $f(x, y) = \partial\eta(x, y)/\partial y$, assumed to exist almost everywhere with respect to $y$, reduces Eq. (7.2) to
\[ g(L) = \int_{D'} dx\,dy\,f(x, y), \tag{7.3} \]
where $D'$ is the region bounded by $L$ and the part of $\partial D$ containing $\partial D_b$. If $f(x, y)$ is determined, its partial integral with respect to $y$ is equal to $\eta(x, y)$ almost everywhere. Let $\chi(L; x, y)$ be the characteristic function of $D'$, then we have
\[ g(L) = \int_D dx\,dy\,\chi(L; x, y)\,f(x, y). \tag{7.4} \]
The problem is now formulated as the Fredholm integral equation of the first kind with domain in H = L2 ( D ; ( dxdy )) and the range variable L varies over the set {Ln } . All that is needed for the applicability of the construction of Theorem 2.9 is that the values of the integrals be available on a set {Ln } of lines such that φn = {χ ( Ln ; x, y )} forms a basis in H . The orthonormal set constructed with this set of the characteristic functions is easily seen
to be the collection of the constant multiples of the characteristic functions of the polygons enclosed by a subset of the lines, such that no line passes through their interiors. With a complete coverage, these functions are the two-dimensional step functions that can be used to approximate a Lebesgue integrable function on D . Thus, as n increases, {φn } forms a complete set in H . Hence,
f N ( x, y ) =
∑
N n =1
α n χ ( Ln ; x , y ) ,
(7.5)
provides a converging sequence of approximations to f ( x, y ) , with the coefficients αn being the elements of the solution h of
Ah = g′ ,
(7.6)
where g ′ is the vector with the elements g ( Ln ) and A is the matrix with elements
A nm = (φ n , φ m ) =
∫
D
dx dy χ ( Ln ; x , y ) χ ( Lm ; x , y ) ,
(7.7)
The values of g ( Ln ) are available from the experimental measurements and the matrix elements can be evaluated exactly by elementary methods. Thus, the method is quite convenient to implement. The resulting normalization matrix has the largest entries along the diagonal. Thus, Eq. (7.6) can be efficiently and accurately solved by the standard numerical schemes without complications. The corresponding approximation η N ( x, y ) to the original function of interest η (r ) can now be evaluated by a simple integration:
$$ \eta_N(x, y) = \int_{y_b}^{y} dy'\, f_N(x, y') = \sum_{n=1}^{N} \alpha_n \int_{y_b}^{y} dy'\, \chi(L_n; x, y'). \tag{7.8} $$
For the convergence of $\eta_N(x, y)$ to $\eta(\mathbf{r})$, we have
$$ |\eta_N(x, y) - \eta(x, y)|^2 \le \left| \int_{y_{\min}}^{y} dy'\, [f_N(x, y') - f(x, y')] \right|^2 \le (y_{\max} - y_{\min}) \int_{y_{\min}}^{y_{\max}} dy'\, [f_N(x, y') - f(x, y')]^2, $$
from the Schwarz inequality (Lemma 1.4). The right side exists for almost every x , and since
$$ \int_{x_{\min}}^{x_{\max}} dx\, |\eta_N(x, y) - \eta(x, y)|^2 \le (x_{\max} - x_{\min})\, \| f_N - f \|^2 \to 0 \quad (N \to \infty), $$
$\eta_N(x, y)$ converges to $\eta(x, y)$ for almost every $x$. Furthermore, it follows that
$$ \| \eta_N - \eta \| \le (x_{\max} - x_{\min})(y_{\max} - y_{\min})\, \| f_N - f \| \to 0 \quad (N \to \infty), $$
which shows that $\eta_N(x, y)$ converges to $\eta(x, y)$ for almost every $x$ and $y$. Since the imaged object may have discontinuous deformities, e.g., cracks in the rocks, this is quite adequate. The integrals encountered in evaluating the approximations given by Eq. (7.8) are elementary, yielding approximations $\eta_N(x, y)$ to $\eta(\mathbf{r})$ that are piecewise linear with respect to $y$.

This method was tested by comparing the images reconstructed by the present method with the exact geophysical profiles of the rocks. Sound travel time was calculated for realistic synthetic rock profiles and used to reconstruct the images. The approximate and the exact profiles were found to be in excellent agreement. The method was found to be more efficient, and to produce more accurate images with respect to the shapes and the density distributions, than a number of other methods in use. The numerical scheme was found to be stable as expected, and amenable to the usual algebraic equation solvers, e.g., the Cholesky decomposition method [Serzu et al., 1995].
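The construction of Eqs. (7.5) to (7.7) can be illustrated with a minimal sketch in a one-dimensional analogue. This is an assumption made for brevity, not the geometry of the text: the regions $D'$ are replaced by hypothetical intervals $[0, t_n]$, so that the Gram matrix of the characteristic functions is $\min(t_n, t_m)$, and the function names are illustrative only. The system is solved by a Cholesky factorization, as mentioned above.

```python
import numpy as np

def reconstruct_density(t, g):
    """Solve A h = g' for the coefficients alpha_n (1D analogue of Eqs. (7.5)-(7.7)).

    t : increasing end points t_n of the intervals [0, t_n]; these are
        hypothetical stand-ins for the regions D' bounded by the lines L_n
    g : measured integrals g_n = int_0^{t_n} f(x) dx
    """
    # Gram matrix of the characteristic functions: (chi_n, chi_m) = min(t_n, t_m)
    A = np.minimum.outer(t, t)
    # A is symmetric positive definite, so a Cholesky factorization applies
    L = np.linalg.cholesky(A)
    y = np.linalg.solve(L, g)
    return np.linalg.solve(L.T, y)

def evaluate(alpha, t, x):
    """f_N(x) = sum_n alpha_n chi_n(x), with chi_n equal to 1 on [0, t_n]."""
    chi = (x[:, None] <= t[None, :]).astype(float)
    return chi @ alpha

if __name__ == "__main__":
    N = 8
    t = np.linspace(1.0 / N, 1.0, N)
    g = t**3 / 3.0                 # exact integrals of f(x) = x^2
    alpha = reconstruct_density(t, g)
    mid = t - 0.5 / N              # one sample point inside each cell
    print(evaluate(alpha, t, mid))
```

In this analogue the reconstructed $f_N$ is a step function whose value on each cell is the average of $f$ over that cell, which is the behaviour expected of the step-function approximations described above.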
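7.II. MAXWELL’S EQUATIONS

Propagation of electromagnetic waves in macroscopic media, with no sources of charges and currents, is described adequately by the classical Maxwell field equations [Jackson, 1962]:
$$ \begin{aligned} \nabla \cdot \mathbf{B} &= 0, & \nabla \times \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} &= 0, \\ \nabla \cdot \mathbf{D} &= 0, & \nabla \times \mathbf{H} - \frac{\partial \mathbf{D}}{\partial t} &= 0, \end{aligned} \tag{7.9} $$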
where $\mathbf{E}$, $\mathbf{H}$ are the electric and magnetic fields, $\mathbf{D}$, $\mathbf{B}$ are the displacement and magnetic induction fields, the dot denotes the scalar product in the three-dimensional Euclidean space and the speed of light is taken to be unity. For an isotropic, low-loss dielectric medium with a frequency-independent dielectric function, these equations reduce to
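$$ \begin{aligned} \nabla \cdot \mathbf{H}(\mathbf{r}, t) &= 0, & \nabla \times \mathbf{E}(\mathbf{r}, t) + \frac{\partial \mathbf{H}(\mathbf{r}, t)}{\partial t} &= 0, \\ \nabla \cdot \varepsilon(\mathbf{r}) \mathbf{E}(\mathbf{r}, t) &= 0, & \nabla \times \mathbf{H}(\mathbf{r}, t) - \frac{\partial \mathbf{D}(\mathbf{r}, t)}{\partial t} &= 0, \end{aligned} \tag{7.10} $$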
where ε(r ) is the dielectric distribution of the medium, assumed to be a scalar. We have also set B = H assuming that the magnetic permeability is equal to one, which is an adequate approximation for the materials of interest for the present. Fourier transforming these equations with respect to the time t expresses them in terms of its conjugate variable, frequency ω , which is justified in view of the decay properties of the fields.
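$$ \begin{aligned} \nabla \cdot \tilde{\mathbf{H}}(\mathbf{r}, \omega) &= 0, & \nabla \times \tilde{\mathbf{E}}(\mathbf{r}, \omega) + i\omega \tilde{\mathbf{H}}(\mathbf{r}, \omega) &= 0, \\ \nabla \cdot \varepsilon(\mathbf{r}) \tilde{\mathbf{E}}(\mathbf{r}, \omega) &= 0, & \nabla \times \tilde{\mathbf{H}}(\mathbf{r}, \omega) - i\omega \varepsilon(\mathbf{r}) \tilde{\mathbf{E}}(\mathbf{r}, \omega) &= 0, \end{aligned} \tag{7.11} $$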
where the tilde denotes the Fourier transform. The two divergence conditions in Eq. (7.11), which require the fields to be divergence free, can be accommodated by restricting the solutions to be transverse, i.e., $\mathbf{k} \cdot \mathbf{H} = \mathbf{k} \cdot \mathbf{E} = \mathbf{k} \cdot \tilde{\mathbf{H}} = \mathbf{k} \cdot \tilde{\mathbf{E}} = 0$, where $\mathbf{k}$ is the wave vector. Eliminating the electric field from the remaining two equations reduces the considerations to [Joannopoulos, 1995]
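$$ \nabla \times \left( \frac{1}{\varepsilon(\mathbf{r})}\, \nabla \times \tilde{\mathbf{H}}(\mathbf{r}, \omega) \right) = \omega^2\, \tilde{\mathbf{H}}(\mathbf{r}, \omega). \tag{7.12} $$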
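The elimination step leading to Eq. (7.12) can be written out explicitly. The following is a short sketch that uses only the two curl relations of Eq. (7.11), with the arguments $(\mathbf{r}, \omega)$ suppressed:

```latex
\begin{align*}
\tilde{\mathbf{E}} &= \frac{1}{i\omega\,\varepsilon(\mathbf{r})}\,\nabla\times\tilde{\mathbf{H}}
  && \text{(from } \nabla\times\tilde{\mathbf{H}} - i\omega\varepsilon\tilde{\mathbf{E}} = 0\text{)},\\
\nabla\times\Big(\frac{1}{\varepsilon(\mathbf{r})}\,\nabla\times\tilde{\mathbf{H}}\Big)
  &= i\omega\,\nabla\times\tilde{\mathbf{E}}
   = i\omega\,(-i\omega\,\tilde{\mathbf{H}})
   = \omega^{2}\,\tilde{\mathbf{H}}
  && \text{(using } \nabla\times\tilde{\mathbf{E}} + i\omega\tilde{\mathbf{H}} = 0\text{)}.
\end{align*}
```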
A similar equation can be obtained for the electric field, but Eq. (7.12), in the magnetic field, is more convenient. The electric field can be obtained from the magnetic field using the last of Eq. (7.11). Thus, the problem of solving Maxwell’s equations, Eq. (7.9), is reduced to determining the eigenvalues and the eigenprojections in Eq. (7.12).

For the present, we restrict to the periodic structures, i.e., to the cases when $\varepsilon(\mathbf{r})$ is a periodic function. This covers a variety of applications, at least as a base problem. For example, the photonic band gap materials are periodic structures and their perturbations. Although their geometric extensions cover macroscopic domains, they are also called photonic band gap crystals. The period of $\varepsilon(\mathbf{r})$ defines the crystal. In a typical crystal $\varepsilon(\mathbf{r})$ is a discontinuous function with piecewise constant values. In any case, it is bounded on both sides, i.e., $0 < \varepsilon_{\min} \le \varepsilon(\mathbf{r}) \le \varepsilon_{\max} < \infty$. A basic photonic band gap material is typically an arrangement of the cells defined by the crystal period. For example, an arrangement of the
dielectric slabs of the same thickness separated by equal air gaps constitutes a one-dimensional crystal, with one cell defined by one slab and one air gap or in another similar manner.

Consider the periodic functions $u(\mathbf{r})$ on a lattice defined by $u(\mathbf{r}) = u(\mathbf{r} + \mathbf{r}_a)$ for all $\mathbf{r}_a$ that translate the lattice into itself, an example being the dielectric function $\varepsilon(\mathbf{r})$. An arbitrary lattice vector $\mathbf{r}_a$ can be expressed in terms of the primitive lattice vectors, which are the smallest vectors joining a point in one lattice cell to the corresponding point in each adjacent cell. For example, in a cubic lattice with spacing $a$, each $\mathbf{r}_a$ can be expressed as $\mathbf{r}_a = a(l\hat{\mathbf{x}} + m\hat{\mathbf{y}} + n\hat{\mathbf{z}})$ with $l, m, n$ being integers. In general, for a lattice in a three-dimensional Euclidean space, $\mathbf{r}_a$ is expressed as $\mathbf{r}_a = (l\mathbf{a}_1 + m\mathbf{a}_2 + n\mathbf{a}_3)$ with $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ being the primitive lattice vectors, which may not be of unit length. These considerations can be adjusted for other dimensions. It is sufficient to consider Eq. (7.12) on one cell, i.e., one crystal period, defined by the set of vectors $\mathbf{s} = (\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3)$. Taking the origin at the central point of the cell, the periodicity on the crystal is equivalent to the requirement that $\tilde{\mathbf{H}}(-\mathbf{s}/2, \omega) = \tilde{\mathbf{H}}(\mathbf{s}/2, \omega)$. Similar conditions can be applied for other choices of the origin.

As their counterpart for the rectangular domains, a basis for the scalar functions periodic on a general lattice can be expressed as $\{e^{i\mathbf{G}\cdot\mathbf{r}}\}$, where $\mathbf{G}$ are the reciprocal lattice vectors defined by $e^{i\mathbf{G}\cdot\mathbf{r}_a} = 1$ for every lattice vector $\mathbf{r}_a$, i.e., $\mathbf{G}\cdot\mathbf{r}_a$ is an integer multiple of $2\pi$. This defines a reciprocal lattice with the primitive vector set $\mathbf{s}^{-1} = (\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3)$, which can be calculated from
$$ \mathbf{G}\cdot\mathbf{r}_a = (l\mathbf{a}_1 + m\mathbf{a}_2 + n\mathbf{a}_3)\cdot(l'\mathbf{b}_1 + m'\mathbf{b}_2 + n'\mathbf{b}_3) = 2\pi N, \tag{7.13-a} $$
yielding
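$$ \mathbf{b}_1 = 2\pi\, \frac{\mathbf{a}_2 \times \mathbf{a}_3}{\mathbf{a}_1 \cdot \mathbf{a}_2 \times \mathbf{a}_3}, \quad \mathbf{b}_2 = 2\pi\, \frac{\mathbf{a}_3 \times \mathbf{a}_1}{\mathbf{a}_1 \cdot \mathbf{a}_2 \times \mathbf{a}_3}, \quad \mathbf{b}_3 = 2\pi\, \frac{\mathbf{a}_1 \times \mathbf{a}_2}{\mathbf{a}_1 \cdot \mathbf{a}_2 \times \mathbf{a}_3}. \tag{7.13-b} $$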
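As a minimal sketch, the primitive reciprocal vectors of Eq. (7.13-b) can be computed directly from the cross products. The function name and the face-centred cubic example are illustrative choices, not taken from the text; the check printed at the end is the defining relation $\mathbf{a}_i \cdot \mathbf{b}_j = 2\pi\delta_{ij}$.

```python
import numpy as np

def reciprocal_vectors(a1, a2, a3):
    """Primitive reciprocal lattice vectors from Eq. (7.13-b)."""
    volume = np.dot(a1, np.cross(a2, a3))          # a1 . (a2 x a3)
    b1 = 2.0 * np.pi * np.cross(a2, a3) / volume
    b2 = 2.0 * np.pi * np.cross(a3, a1) / volume
    b3 = 2.0 * np.pi * np.cross(a1, a2) / volume
    return b1, b2, b3

if __name__ == "__main__":
    # Face-centred cubic primitive vectors with unit spacing (illustrative choice)
    a = np.array([[0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])
    b = np.array(reciprocal_vectors(*a))
    # a_i . b_j / (2 pi) should be close to the identity matrix
    print(a @ b.T / (2.0 * np.pi))
```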
A basis for the periodic vector functions is given by $\{\mathbf{e}_\nu e^{i\mathbf{G}\cdot\mathbf{r}}\}$, where $\{\mathbf{e}_\nu\}$ is a basis in $R^N$. The set $\{\mathbf{e}_\nu\}$ in the present context is the set of the polarization vectors.

The frequency $\omega(\mathbf{k})$ is a scalar function of the vector $\mathbf{k}$, which determines the direction of propagation. Due to the periodicity of the lattice, $\tilde{\mathbf{H}}(\mathbf{r}, \omega)$ can be expressed as a sum of the Bloch or the Floquet states or modes, $e^{i\mathbf{k}\cdot\mathbf{r}}\mathbf{H}_{\mathbf{k}}(\mathbf{r})$, with $\mathbf{H}_{\mathbf{k}}(\mathbf{r})$ periodic on the lattice. This result is known as Bloch’s theorem. It is clear that $e^{i(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}_a} = e^{i\mathbf{k}\cdot\mathbf{r}_a}$ for every lattice vector $\mathbf{r}_a$ and thus, translation of $\mathbf{k}$ by a reciprocal lattice vector leaves the mode invariant. Therefore, it is sufficient to consider all $\mathbf{k}$ in a minimal zone, called the Brillouin zone, and for each $\mathbf{k}$, the solutions of the type of $\mathbf{H}_{\mathbf{k}}(\mathbf{r})$ can be sought. Alternatively, Eq. (7.12) restricted to a mode can be expressed as
$$ (i\mathbf{k} + \nabla) \times \left( \frac{1}{\varepsilon(\mathbf{r})}\, (i\mathbf{k} + \nabla) \times \mathbf{H}_{\mathbf{k}}(\mathbf{r}) \right) = \omega^2(\mathbf{k})\, \mathbf{H}_{\mathbf{k}}(\mathbf{r}). \tag{7.14} $$
The following analysis is applicable to Eq. (7.12) and Eq. (7.14) without altering the arguments. We describe the method for Eq. (7.12) for a fixed mode.

Eq. (7.12) determines the frequency for each wave vector for which the medium can support the field. If it has a solution for a wave vector at a given frequency, then the associated wave can propagate through the medium. This generates a graph of frequency versus wave vector indicating the set of allowed frequencies together with the wave vector, which determines the direction of propagation. Two adjacent curves determine a band of frequencies, and the range of wave vectors, that are prohibited. This band of frequencies is termed the photonic band gap. If the band gap corresponds to a partial range of directions, it is a partial band gap, and if it covers all of the directions, the band gap is complete. Radiation of a frequency in the interior of a complete band gap cannot propagate through the medium in any direction. Thus, such a structure acts as a perfect reflector. A device conforming to or based on a structure exhibiting a band gap is termed a photonic band gap material or crystal. Considerable effort is being invested in designing and fabricating photonic band gap materials for their major potential applications, mainly in photonic communication technology. Details can be found, e.g., in Joannopoulos (1995). We focus on the analysis of Eq. (7.12) to illustrate the applications of the methods developed in the present text to the calculations of the band gaps determined by the frequencies for all wave numbers, which constitute the basis for the design of photonic crystals. The electromagnetic field is determined by the eigenprojections.

Let $S$ be the set of all functions $\mathbf{u}'(\mathbf{k}, \mathbf{r})$ expressible as $\mathbf{u}'(\mathbf{k}, \mathbf{r}) = e^{i\mathbf{k}\cdot\mathbf{r}}\mathbf{u}(\mathbf{r})$ for a fixed $\mathbf{k}$, where $\mathbf{u}(\mathbf{r})$ is a function from one lattice cell in $R^N$ to $R^N$ such that $\nabla \cdot \mathbf{u}' = 0$ and $\mathbf{u}(\mathbf{r})$ is square integrable. The dimension $N$ of the underlying Euclidean space can at most be three in practice, but the analysis can be adjusted for an arbitrary dimension. The notation usual for $N = 3$ is used, with comments when needed. All linear combinations of such functions on the field of the complex numbers constitute a linear manifold, which can be completed to a Hilbert space $H$ with respect to the scalar product defined by
$$ (\mathbf{u}, \boldsymbol{\upsilon}) = \int d\mathbf{r}\; \mathbf{u}^*(\mathbf{r}) \cdot \boldsymbol{\upsilon}(\mathbf{r}), $$
where the integral is over the region covered by one cell of the material. The space $H$ can be expressed as a tensor product of $R^N$ and the Hilbert space $H'$ of square integrable, scalar functions, with the scalar product $[\,\cdot\,,\cdot\,]$ defined by $[u, \upsilon] = \int d\mathbf{r}\; u^*(\mathbf{r})\, \upsilon(\mathbf{r})$. The scalar product $(\mathbf{u}, \boldsymbol{\upsilon})$ can be expressed as $(\mathbf{u}, \boldsymbol{\upsilon}) = \sum_{\nu=1}^{N} [u_\nu, \upsilon_\nu]$, where $u_\nu$ and $\upsilon_\nu$ are the components of the vectors $\mathbf{u}$ and $\boldsymbol{\upsilon}$, respectively. The space $H'$ itself is a tensor product of $N$ copies of the space of square integrable scalar functions on a domain in the real line. An
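orthonormal basis in $H$ can be constructed from $\{\mathbf{e}_\nu\}$ and an orthonormal basis $\{\varphi_n\}$ in $H'$ as their tensor product $\{\mathbf{e}_\nu \otimes \varphi_n\}$. Eq. (7.12) acquires precise meaning as an eigenvalue equation in $H$ reducing to
$$ (A B A)\, \mathbf{H}_{\mathbf{k}} = \omega^2(\mathbf{k})\, \mathbf{H}_{\mathbf{k}} \tag{7.15} $$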
for a fixed mode, where $(B\mathbf{u})(\mathbf{r}) = \mathbf{u}(\mathbf{r})/\varepsilon(\mathbf{r})$ and formally, $(A\mathbf{u})(\mathbf{r}) = \nabla \times \mathbf{u}(\mathbf{r})$. The operator $B$ is clearly positive and bounded: $0 < \varepsilon_{\max}^{-1} \le B \le \varepsilon_{\min}^{-1} < \infty$. The exterior derivative defining $A$ has properties similar to the scalar derivative considered in Example 2.1, but there are differences. To determine its properties, consider
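$$ (\mathbf{u}, A\boldsymbol{\upsilon}) = \int d\mathbf{r}\; \mathbf{u}^* \cdot (\nabla \times \boldsymbol{\upsilon}) = \boldsymbol{\upsilon} \times \mathbf{u}^* \Big|_{-\mathbf{s}/2}^{\mathbf{s}/2} + \int d\mathbf{r}\; (\nabla \times \mathbf{u}^*) \cdot \boldsymbol{\upsilon}. $$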
In higher dimensions, the curl is replaced with the general exterior derivative. With $D(A)$ consisting of the functions $\boldsymbol{\upsilon}$ such that $\boldsymbol{\upsilon}(-\mathbf{s}/2) = \boldsymbol{\upsilon}(\mathbf{s}/2)$, the domain $D(\hat{A})$ of its adjoint coincides with $D(A)$, which is essentially the same result as in the scalar case. However, contrary to the scalar derivative, which is anti-symmetric, the exterior derivative is symmetric, implying that $A$ is a self-adjoint operator. Consequently, $(ABA)$ is self-adjoint and also non-negative. Thus, the determination of the frequencies of the waves propagating through a periodic structure is reduced to the calculation of the eigenvalues of a positive operator in a Hilbert space.

Since each $\mathbf{u}$ in $D(A)$ is divergence free, i.e., $\nabla \cdot \mathbf{u} = 0$, it follows that $A^2\mathbf{u} = \nabla \times \nabla \times \mathbf{u} = -\nabla^2\mathbf{u}$, and thus, $A^2$ is the self-adjoint realization of $-\nabla^2$ on its domain. Each $\mathbf{u}$ in $D(A^2)$ is mapped into $\sum_{\nu=1}^{N} \mathbf{e}_\nu (-\nabla^2 u_\nu)$. Thus, $A^2$ is a vector sum of the tensor products of the operators constructed from one-dimensional components of $-\nabla^2$ as illustrated in Eq. (2.25), i.e.,
$$ A^2 = \sum_{\mu,\nu=1}^{N} \mathbf{e}_\nu \left[ 1_1 \otimes \dots \otimes (i\partial_\mu)^2 \otimes \dots \otimes 1_N \right], \tag{7.16} $$
where $\partial_\mu$ is the derivative with respect to the $\mu$th coordinate in $R^N$ and $1_\mu$ is the identity on the $\mu$th copy of $H'$. It follows from Example 2.6 that each $(i\partial_\mu)^2$ is positive-definite, implying that $A^2$ is strictly positive and its inverse is a Hilbert-Schmidt operator, in fact an operator of the trace class (Remark 2.6). Thus, its spectrum consists entirely of the isolated eigenvalues accumulating at infinity, which can be computed using Eq. (7.16) or directly, which also yields its eigenvectors, as follows:
$$ -\nabla^2 \left( \mathbf{e}_\nu e^{i(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}} \right) = [(\mathbf{k}+\mathbf{G}) \times \mathbf{e}_\nu] \cdot [(\mathbf{k}+\mathbf{G}) \times \mathbf{e}_\nu] \left( \mathbf{e}_\nu e^{i(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}} \right) = \left| (\mathbf{k}+\mathbf{G}) \times \mathbf{e}_\nu \right|^2 \left( \mathbf{e}_\nu e^{i(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}} \right). \tag{7.17} $$
It follows from Eq. (7.13-a) that the eigenvalues $|(\mathbf{k}+\mathbf{G}) \times \mathbf{e}_\nu|^2$, with the corresponding eigenvectors $\mathbf{e}_\nu e^{i(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}}$, diverge in proportion to $(n')^2$ with some integer $n'$ as $|\mathbf{G}| \to \infty$. This covers all of the eigenvalues of $A^2$, as can be seen from Eq. (7.16) and Ichinose’s lemma (Eq. (6.5)).

The polar decomposition of $A$ expresses it as $A = U|A|$, where $|A|$ is the positive square root of $A^2$ and $U$ is a unitary operator [Kato, 1980, Sec. VI.7]. Since $A$ is self-adjoint, this is equivalent to $A = |A|U^{-1}$. Although inconsequential, for the present case $U = 1, -1$ on the subspaces spanned by the eigenvectors corresponding to the positive, negative eigenvalues of $A$, respectively. It follows from Eq. (7.17) that the eigenvalues of $|A|$ diverge with $n'$ as $|\mathbf{G}| \to \infty$, and hence its inverse is a Hilbert-Schmidt operator (Corollary 2.5). Alternatively, $|A|$ admits the representation similar to Eq. (7.16) and the result follows from Example 2.4. In any case, $ABA = |A|U^{-1}BU|A|$ is invertible with its inverse being a strictly positive definite Hilbert-Schmidt operator.

From Theorem 3.13, the converging sequences of the upper bounds to the eigenvalues of $ABA$ can be obtained by the Rayleigh-Ritz method. Finite element basis functions are frequently used for their ability to map varied geometrical structures. However, the plane
waves $\{\boldsymbol{\phi}_{(n,\nu)}\} = \{\mathbf{e}_\nu e^{i(\mathbf{k}+\mathbf{G}_n)\cdot\mathbf{r}}\} = \{\mathbf{e}_\nu \phi^0_{\mathbf{G}_n}\}$ are still about the most widely used basis functions for
the simplicity of calculations in spite of a slow rate of convergence in the present applications. Since this is the set of the eigenvectors of a self-adjoint operator with discrete spectrum, it is complete from Corollary 2.3. The slow rate of convergence is a consequence of the unsatisfactory convergence properties of the Fourier series expansions of discontinuous functions, which arise in the calculations of the photonic band gaps. While the expansion converges with respect to the Hilbert space norm, its partial sums produce rapidly oscillating approximations near the discontinuities, which is known as the Gibbs phenomenon. The number of oscillations increases as the number of terms is increased, while the amplitude of the overshoot does not decay. The measure of the set supporting the oscillations decreases to zero, maintaining the convergence in $H$, in spite of a lack of uniform convergence.
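The behaviour described above can be observed numerically. The following minimal sketch uses the square wave $\mathrm{sign}(\sin x)$ as a stand-in for a piecewise constant dielectric profile (an illustrative choice, not taken from the text) and compares the overshoot of the partial Fourier sums with their mean-square error.

```python
import numpy as np

def square_wave_partial_sum(x, n_terms):
    """Partial Fourier sum of sign(sin x), using the first n_terms odd harmonics."""
    k = 2 * np.arange(n_terms) + 1                      # 1, 3, 5, ...
    return (4.0 / np.pi) * np.sin(np.outer(x, k)) @ (1.0 / k)

if __name__ == "__main__":
    # On (0, pi) the square wave equals 1
    x = np.linspace(0.0, np.pi, 20001)
    for n_terms in (10, 40, 160):
        s = square_wave_partial_sum(x, n_terms)
        overshoot = s.max() - 1.0                       # stays near a fixed value
        l2_error = np.sqrt(np.pi * np.mean((s - 1.0) ** 2))  # decreases with n_terms
        print(n_terms, overshoot, l2_error)
```

The overshoot near the discontinuity remains at roughly the same level as terms are added, while the error in the $L^2$ norm decreases, which is the combination of norm convergence and lack of uniform convergence noted in the text.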
The Rayleigh-Ritz variational method with basis $\{\boldsymbol{\phi}_{(n,\nu)}\}$ reduces the problem to the evaluation of the eigenvalues and the eigenprojections of a matrix:
$$ \sum_{(m,\mu)} \Theta^{\mathbf{k}}_{(n,\nu)(m,\mu)}\, h_{(m,\mu)} = \bar{\omega}^2_{(j,\eta)}(\mathbf{k})\, h_{(n,\nu)}, \tag{7.18-a} $$
where $\bar{\omega}^2_{(j,\eta)} = \bar{\sigma}^{(N,N)}_{(j,\eta)}$ are the upper bounds to the exact eigenvalues $\omega^2_{(j,\eta)} = \sigma_{(j,\eta)}$, $j = 1, \dots, N$, $\eta = 1, \dots, N$. The approximate eigenvectors are given by
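$$ \mathbf{H}^{(N,N)}(\mathbf{r}) = \sum_{(n,\nu)} h_{(n,\nu)}\, e^{i\mathbf{k}\cdot\mathbf{r}}\, \boldsymbol{\phi}_{(n,\nu)}(\mathbf{r}), $$
and the matrix elements are given by
$$ \Theta^{\mathbf{k}}_{(n,\nu)(m,\mu)} = \left( \boldsymbol{\phi}_{(n,\nu)}, (ABA)\, \boldsymbol{\phi}_{(m,\mu)} \right) = [(\mathbf{k}+\mathbf{G}_n) \times \mathbf{e}_\nu] \cdot [(\mathbf{k}+\mathbf{G}_m) \times \mathbf{e}_\mu]\; B_{\mathbf{G}_n \mathbf{G}_m}, \tag{7.18-b} $$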
where $B_{\mathbf{G}_n \mathbf{G}_m}$ are the matrix elements of $B$ in $H'$, given by $B_{\mathbf{G}_n \mathbf{G}_m} = [\phi^0_{\mathbf{G}_n}, (1/\varepsilon)\phi^0_{\mathbf{G}_m}]$, i.e., the coefficients in the Fourier series expansion of $[1/\varepsilon(\mathbf{r})]$. With bases other than the plane waves, the Rayleigh-Ritz method yields a matrix with a more complicated structure than that given by Eq. (7.18-b). Also, the evaluations of the Fourier transforms are quite efficient and convenient. Computational simplicity often compensates for the slow rate of convergence.

Once again, there are no suitable error bounds available on the upper bounds obtained by this variational method. There are cases of practical importance where the calculated band gaps based on the upper bounds are quite narrow. Consequently, the mere existence of the band gaps remains suspect. In any case, information about the accuracy of the results is desirable. Converging error bounds or the opposite bounds can compensate for this deficiency. The structure of $(ABA)$ is well suited for the application of the method of the intermediate problems studied in Lemma 4.9 and Theorem 4.5. The operator $(ABA)$ can be expressed as
$$ (ABA) = (a_0 A^2 + A C A) = A_0 + V = \mathcal{A}, \tag{7.19} $$
where $C = (B - a_0)$, with $0 < a_0 < \varepsilon_{\max}^{-1}$, defines a positive, bounded operator in $H'$, as well as in $H$, with the bounds $(\varepsilon_{\max}^{-1} - a_0) \le C \le (\varepsilon_{\min}^{-1} - a_0)$. With $a_0 > (1/\varepsilon_{\min})$, where $\varepsilon_{\min}$ is the minimum value of $\varepsilon(\mathbf{r})$, the inequalities are reversed, yielding a parallel procedure to determine the upper bounds. Although of academic interest, this method is less attractive than the Rayleigh-Ritz method. Therefore, we focus on the method for the lower bounds. The eigenvalues $\{\sigma^0_{(n,\nu)}\}$ of the base operator $A_0$, given from Eq. (7.17) by $\sigma^0_{(n,\nu)} = a_0 |(\mathbf{k}+\mathbf{G}_n) \times \mathbf{e}_\nu|^2$, provide lower positive bounds to the eigenvalues $\sigma_{(n,\nu)}$ of $(ABA)$.

While we follow the formulation of Lemma 4.9 and Theorem 4.5, the same arguments can be used to obtain bounds on $C$ such that $0 = \breve{C}_0 \le \breve{C}_N \le \breve{C}_{(N+1)} \le C$ in $H'$ and set $\breve{V}_N = (A \breve{C}_N A)$, which satisfies the condition of Lemma 4.9 in $H$, and thus, can be used instead [Vatsya and Nikumb, 2002]. With the basis in accordance with Theorem 4.5, both of the methods yield the same numerical procedure.
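The plane-wave Rayleigh-Ritz matrix of Eq. (7.18-b) and the base-operator bounds $a_0 |\mathbf{k}+\mathbf{G}_n|^2$ can be compared in a minimal sketch for a scalar, one-dimensional analogue. Several assumptions are made here that are not in the text: the cross-product factors are replaced by the scalars $(k + G_n)$, the cell of length `a` contains one slab of width `d` and dielectric constant `eps_slab >= 1` surrounded by air, and $a_0$ is set to $0.99/\varepsilon_{\max}$. All function and parameter names are illustrative.

```python
import numpy as np

def plane_wave_matrix(k, n_max, eps_slab, d, a=1.0):
    """Scalar 1D analogue of Eq. (7.18-b): Theta_nm = (k + G_n) B_nm (k + G_m)."""
    n = np.arange(-n_max, n_max + 1)
    kg = k + 2.0 * np.pi * n / a                      # k + G_n
    dn = n[:, None] - n[None, :]
    # Fourier coefficients of 1/eps for a slab (eps = eps_slab, width d) in air (eps = 1);
    # np.sinc(x) = sin(pi x) / (pi x)
    B = (1.0 / eps_slab - 1.0) * (d / a) * np.sinc(dn * d / a) + (dn == 0)
    return kg[:, None] * B * kg[None, :], kg

def bounds(k, n_max, eps_slab, d, a=1.0):
    """Rayleigh-Ritz upper bounds and base-operator lower bounds a0 (k + G_n)^2."""
    theta, kg = plane_wave_matrix(k, n_max, eps_slab, d, a)
    upper = np.linalg.eigvalsh(theta)                 # sorted in ascending order
    a0 = 0.99 / eps_slab                              # assumes eps_max = eps_slab >= 1
    lower = np.sort(a0 * kg**2)
    return upper, lower

if __name__ == "__main__":
    upper, lower = bounds(k=0.5 * np.pi, n_max=10, eps_slab=9.0, d=0.4)
    for lo, up in zip(lower[:4], upper[:4]):
        print(lo, up)                                 # each exact eigenvalue lies in between
```

The base bounds are crude, which is the motivation for the improved lower bounds constructed in the remainder of this section; the sketch only illustrates that the exact eigenvalues are bracketed from both sides.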
Let $Z^{(N,N)}$ be the inverse of $(P_N \otimes I_N) V (P_N \otimes I_N)$, restricted to the $NN$-dimensional subspace $(P_N \otimes I_N) H$ of $H$, where $I_N$ is the identity in $R^N$ and $P_N$ is the orthoprojection on the space spanned by a basis $\{\phi_n\}$ in $H'$. The operator $\breve{V}^{(N,N)}$ is defined by
$$ \begin{aligned} \breve{V}^{(N,N)} \mathbf{u} &= V (P_N \otimes I_N) Z^{(N,N)} (P_N \otimes I_N) V \mathbf{u} = ACA (P_N \otimes I_N) Z^{(N,N)} (P_N \otimes I_N) ACA\, \mathbf{u} \\ &= \sum_{(n,\nu)(m,\mu)} [Z^{(N,N)}]_{(n,\nu)(m,\mu)}\; ACA [\mathbf{e}_\nu \otimes \phi_n] \left( ACA [\mathbf{e}_\mu \otimes \phi_m], \mathbf{u} \right), \end{aligned} \tag{7.20-a} $$
where the summation is over the respective ranges and $[Z^{(N,N)}]$ is the inverse of the matrix $[V^{(N,N)}]$, with the elements
$$ [V^{(N,N)}]_{(n,\nu)(m,\mu)} = \left( [\mathbf{e}_\nu \otimes \phi_n], V [\mathbf{e}_\mu \otimes \phi_m] \right). \tag{7.20-b} $$
In accordance with Theorem 4.5, computational evaluation of the lower bounds $\{\sigma^{(N,N)}_{(n,\nu)}\}$ is considerably simplified with the special choice
$$ \{(\mathbf{e}_\nu \otimes \phi_n)\} = \{[ACA]^{-1}\, \mathbf{e}_\nu \phi^0_{\mathbf{G}_n}\}. $$
With these basis functions, the eigenvalues and the eigenvectors of $(A_0 + \breve{V}^{(N,N)})$, for $n > N$, are identical to those of $(a_0 A^2)$, i.e., $\sigma^{(N,N)}_{(n,\nu)} = \sigma^0_{(n,\nu)}$, with the corresponding eigenvector $\mathbf{e}_\nu \phi^0_{\mathbf{G}_n}(\mathbf{r})$. For $n \le N$, $\sigma^{(N,N)}_{(n,\nu)}$ are the solutions of the matrix eigenvalue equation:
$$ \sum_{(m,\mu)} \Phi^{\mathbf{k}}_{(n,\nu)(m,\mu)}\, h'_{(m,\mu)} = \sigma^{(N,N)}_{(j,\eta)}(\mathbf{k})\, h'_{(n,\nu)}. \tag{7.21-a} $$
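The matrix elements on the left side of Eq. (7.21-a) are given by
$$ \Phi^{\mathbf{k}}_{(n,\nu)(m,\mu)} = [(\mathbf{k}+\mathbf{G}_n) \times \mathbf{e}_\nu] \cdot [(\mathbf{k}+\mathbf{G}_m) \times \mathbf{e}_\mu]\, \left\{ a_0 \delta_{nm} + [C_N^{-1}]^{-1}_{\mathbf{G}_n \mathbf{G}_m} \right\}, \tag{7.21-b} $$
where $[C_N^{-1}]^{-1}_{\mathbf{G}_n \mathbf{G}_m}$ are the elements of the inverse of the matrix with elements $[\phi^0_{\mathbf{G}_n}, C^{-1}\phi^0_{\mathbf{G}_m}]$, and $\delta_{nm}$ is the Kronecker delta. For $n \le N' \le N$, $\sigma^{(N,N)}_{(n,\nu)}$ provide non-trivial bounds to $\sigma_{(n,\nu)}$, i.e., $\sigma^{(N,N)}_{(n,\nu)} \le \sigma_{(n,\nu)}$, as long as $\sigma^{(N,N)}_{(N',\nu)} \le \sigma^{(N,N)}_{([N+1],\nu)}$. This condition restricts the smallest value allowed for $a_0$.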
The present method to calculate the lower bounds parallels the Rayleigh-Ritz method with plane wave basis, differing as follows. In the Rayleigh-Ritz method, the second multiplicative factor, defining the matrix in Eq. (7.18-b), is equal to the matrix elements of $(1/\varepsilon)$, which, in the method for the lower bounds, is replaced by $\{a_0 \delta_{nm} + [C_N^{-1}]^{-1}_{\mathbf{G}_n \mathbf{G}_m}\}$. The elements $(a_0 \delta_{nm})$ define a constant, diagonal matrix, and $[C_N^{-1}]^{-1}_{\mathbf{G}_n \mathbf{G}_m}$ are the matrix elements of the inverse of the matrix with elements $[\phi^0_{\mathbf{G}_n}, (\varepsilon/(1 - a_0\varepsilon))\phi^0_{\mathbf{G}_m}]$. The function $[\varepsilon/(1 - a_0\varepsilon)]$ has properties similar to $(1/\varepsilon)$. In the limit of a complete basis set, $\{a_0 \delta_{nm} + [C_N^{-1}]^{-1}_{\mathbf{G}_n \mathbf{G}_m}\}$ reduces to the matrix representation of $(1/\varepsilon)$. Additional computational effort in calculating the lower bounds by this method arises out of the need to invert $[C_N^{-1}]$. A numerical application of the method is illustrated in Vatsya and Nikumb (2002).

Convergence can be obtained in Lemma 4.9 for the potentials bounded below by a positive constant. In the present case, $C$ is bounded on both sides by positive constants and $|A|$ is strictly positive, which is more than what is needed to ascertain that $\sigma^{(N,N)}_{(n,\nu)} \uparrow \sigma_{(n,\nu)}$ for each fixed $(n,\nu)$ as $N \to \infty$. For convenience, we use the compact, vector notation:
( n,ν ) = n , ( N , N ) = N , (eν ⊗ φn ) = φ n , (P N ⊗ I N ) = PN and similar. Let H + be the completion of D ( A ) with respect to the scalar product (u, υ) + = (u, A υ) . The following results follow from the same arguments as in Theorem 3.7 (i). It follows that
(u, υ) = (u, υ) + , where
is the closure of A
−1
from D ( A ) in H + . The points z ≠ 0 in by ξ = z −1 and the
the resolvent set of A are related to the points ξ in the resolvent set of spectrum of
{ }
consists entirely of the isolated eigenvalues σ n−1
with an accumulation
point at zero (Remark 2.5). With this preparation, we have
Theorem 7.1. Let $\{\bar{\sigma}^{(N,N)}_{(j,\eta)}\}$ and $\{\sigma^{(N,N)}_{(j,\eta)}\}$ be the sequences of the upper and the lower bounds as defined by Eqs. (7.18-a, b) and (7.21-a, b), respectively. Then $\bar{\sigma}^{(N,N)}_{(j,\eta)} \downarrow \sigma_{(j,\eta)} \uparrow \sigma^{(N,N)}_{(j,\eta)}$ as $N \to \infty$. The corresponding approximate eigenprojections also converge to the exact.

Proof. Convergence of the upper bounds to $\sigma_{(j,\eta)}$ from above, together with the corresponding eigenprojections to the exact, follows from Theorem 3.13. For the lower bounds, we proceed by transforming the equation
$$ (A_0 + \breve{V}_N - \sigma_{nN})\, \mathbf{u}_{nN} = 0 \tag{7.22} $$
to the equivalent equation in H + . While the arguments are essentially the same as in sec. 4.I., a detailed construction is useful due to a relatively complicated structure of the present operators. Consider the form
) ) ( υ,[ A 0 + V N − σ ]u ) = ( υ, A −1[ A 0 + V N − σ ]u ) + ) = ( υ, 0 u ) + + ( υ, V N+ ]u ) + − σ ( υ, u ) + , 0
where
)
−1
(7.23)
)
0
and V N+ are the closures of A A and A −1V N in H + , respectively.
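Consider $\breve{V}_N^{+}$. From Eqs. (7.20-a) and (7.20-b), $(\boldsymbol{\upsilon}, \breve{V}_N \mathbf{u})$ is given by
$$ (\boldsymbol{\upsilon}, \breve{V}_N \mathbf{u}) = \sum_{n=1}^{N} \alpha_n(\mathbf{u})\, (V\boldsymbol{\upsilon}, \boldsymbol{\phi}_n), \tag{7.24-a} $$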
where
∑
n =1
∑
n =1
N
α m (u) (φ n , V φm ) = (φn , V u) ,
(7.24-b)
α m (u) (φ n , φ m ) + = (φ n , u )+ ,
(7.24-c)
i.e.,
with
N
−1
being the closure of A V in H + . As in Lemma 6.2, we have || u ||2+ ( u, A u ) (V u, A −1V u ) ( AC Au, A −1 AC Au ) (C Au, B −1C Au ) . = = = = || u ||2+ ( u, A u ) ( u, A u ) ( Au, B Au) ( Au, B Au )
−1 −1 Since 0 < ε max ≤ B ≤ ε min < ∞ , it follows that
1 2 ε 2min (ε −max − a0 ) 2 ≤ ε min
|| u ||+2 ( Au , C 2 Au ) ( Au , C 2 Au ) 2 2 1 ε (ε −min ≤ ≤ ≤ ε max − a0 ) 2 . max ( Au , Au ) || u ||+2 ( Au , Au )
Thus,
is a strictly positive and bounded operator in H + , due to this property of | A |
and C . Since
is bounded, it is self adjoint in H + from Lemma 4.3. Further, since,
is
strictly positive, [ PN+ PN+ ]−1 PN+ exists as a bounded operator (Remark 3.2(i)) and hence, Eq. (7.24-c) has a unique solution, which together with Eq. (7.24-a) yields
) ( υ, V N u ) =
∑
N
α (u ) (V υ, φ n ) = n =1 n
∑
N n =1
α n (u ) ( υ, φ n ) +
) = ( υ, PN+ [ PN+ PN+ ]−1 PN+ u ) + = ( υ, V N+ u ) + ,
i.e.
) V N+ =
PN+ [ PN+ PN+ ]−1 PN+
.
(7.25)
Also, from Theorem 3.7 (i), [ PN+ PN+ ]−1 PN+ 0
H + . The operator
)
+ → 1 in H + and thus, V N →
) is just VN+ with C replaced by 1.
strongly in
Eq. (7.22) assumes the following form in H + : 0
[
) + V N+ − σ nN ]u nN = 0 ,
(7.26-a)
which is equivalent to (Remark 2.5)
[(σ nN ) − 1 −
N
]w nN = 0 ,
where, due to the positivity of N
=[
N
→
0
) + V N+ ]−1/ 2 [
0
(7.26-b) 0
)
and VN+ , the self-adjoint operator
N
is well-defined by
) + V N+ ]−1/ 2 (Theorem 4.2). It is straightforward to check that
strongly in H + . The convergence of (σ nN ) −1 to (σ n ) −1 and hence, of σ nN to σn ,
together with the corresponding eigenprojections, follows as in Theorem 3.13. •

A converging sequence of the lower bounds can also be computed by the method of Theorem 4.6, but it is less convenient for the present case. Thus, the two methods to compute the lower bounds complement each other, although the method of Theorem 4.6 is less restrictive in general.
7.III. POSITIVITY LEMMA FOR THE ELLIPTIC OPERATORS

7.III.1. Basic Results

In this section, the results developed for the variational and perturbation methods are used to obtain the positivity properties of the solutions of some differential equations defined in terms of the elliptic operators on bounded domains. These results are used to treat practical problems arising in the transport phenomena in the next section. The scope of the results is considerably wider. The applications considered for the present involve simpler forms, but it is just as easy to consider the general elliptic differential operators defined by
$$ \begin{aligned} (A'u)(x) &= -\sum_{\mu,\nu=1}^{N} \frac{\partial}{\partial x_\mu} \left[ a_{\mu\nu}(x)\, \frac{\partial u(x)}{\partial x_\nu} \right] + a_0(x)\, u(x), \quad a_0(x) \ge 0, \quad x \text{ in } D, \\ (Bu)(x) &= \alpha(x)\, u(x) + \beta(x)\, \frac{\partial u(x)}{\partial \eta} = 0, \quad x \text{ in } \partial D, \\ &\alpha(x) \ge 0, \ \alpha \not\equiv 0, \quad \beta(x) \ge 0, \quad [\alpha(x), \beta(x)] \ne [0, 0], \\ \frac{\partial u(x)}{\partial \eta} &= \sum_{\mu,\nu=1}^{N} \hat{x}_\mu\, a_{\mu\nu}(x)\, \frac{\partial u(x)}{\partial x_\nu}, \end{aligned} \tag{7.27} $$
where $D$ is an open region in $R^N$ with $\partial D$ and $\bar{D}$ being its boundary and the closure, respectively, $a_{\mu\nu}(x)$ are the elements of a positive-definite matrix, $a_{\mu\nu}(x)$, $a_0(x)$, $\alpha(x)$ and $\beta(x)$ are piecewise continuous functions of $x$, in addition to being positive, and $\partial/\partial\eta$ is the co-normal derivative defined by the unit outer normal $\hat{x} = (\hat{x}_1, \dots, \hat{x}_N)$ to $\partial D$ at $x$.

The formal differential operator $A'$ supplemented with the stated boundary conditions defines a self-adjoint operator $A$ in $H = L^2(D, dx)$, where $dx$ is the product measure $(dx_1 \dots dx_N)$. This can be checked by the methods used in ch. 2, e.g., Example 2.7. The condition that at least one of the two functions $\alpha(x)$ and $\beta(x)$ be nonzero at each point of $\partial D$ is required for the self-adjointness of $A$. The condition that $\alpha(x)$ does not vanish everywhere on $\partial D$ ensures its positive-definiteness. As indicated in Example 2.7, with vanishing $\alpha(x)$, $A$ has zero as one of its eigenvalues, which invalidates some of the following results. However, the analysis can be adjusted to obtain similar results with vanishing $\alpha(x)$. For $N = 1$, $A$ is usually known as the Sturm-Liouville operator. The field underlying $H$ will be restricted to the real numbers.

Elliptic operators arise in numerous physical applications. For this reason and for their independent mathematical interest, they have been studied extensively by the classical [Courant and Hilbert, 1962, Vol. II, Ch. IV] as well as the modern methods [Stone, 1932, Ch. X]. The properties of the multi-dimensional operator defined by Eq. (7.27) are remarkably similar to its one-dimensional counterpart considered in Example 2.7. In addition to $A$ being a positive definite operator, $A^{-1}$ is an integral operator with its kernel, the Green function $G_0(x; \xi)$, being a positive and bounded function on $D \times D$ [Courant and Hilbert, 1953]. Since $D$ is bounded, $G_0(x; \xi)$ is square integrable and integrable. Thus, $A^{-1}$ is a Hilbert-Schmidt operator. Green’s functions for $(A + V)$, with $V$ being a sufficiently smooth and small perturbation, retain these properties of $G_0(x; \xi)$. Frequently this follows by absorbing $V$ in $a_0(x)$. Alternatively, the Neumann expansion (Lemma 2.3) and the spectral theorem can be used. Positivity of these Green’s functions will also be established independently by the positivity lemmas proved below. In addition, these Green’s functions are continuous, although not necessarily continuously differentiable. These properties of $G_0(x; \xi)$ and other mildly perturbed Green’s functions will be assumed in this section, with comments if necessary.

The above results can also be obtained by expressing $A$ as a tensor product of the type of Eq. (7.16), with each non-identity component in the products being a one-dimensional operator of the type considered in Example 2.7. The spectral structure of $A$ can be determined by Ichinose’s lemma (Eq. (6.5)) from the spectral structure of its one-dimensional components. The stated results can be deduced from the spectral structure.

The domain of $A$ consists of absolutely continuous functions of $x$ on $D$ with absolutely continuous derivative (Example 2.1, Examples 2.4 to 2.7). These functions are differentiable almost everywhere, not necessarily everywhere. In fact, they are differentiable in the complement of a closed set of measure zero, i.e., on the intervals. Thus, they are piecewise differentiable. Alternatively, the domain of $A$ coincides with the range of $A^{-1}$,
i.e., the domain of $A$ consists of the functions $u$ of the form $u = A^{-1}\upsilon$ as $\upsilon$ varies over $H$. This characterization can also be used to conclude that the domain of $A$ consists of twice, piecewise continuously differentiable functions. As an example, while both $\sin(n\pi x)$ and $|\sin(n\pi x)|$, $n > 1$, are in the domain of $D$ of Example 2.6, the requirement of differentiability everywhere excludes $|\sin(n\pi x)|$. If the formal differential operator $A'$ of Eq. (7.27) is assumed to be defined on twice continuously differentiable functions, then $A$ is a self-adjoint extension of $A'$. Also, since the variational forms are the scalar products, the sets of measure zero do not contribute to their values. These points have been discussed in the preceding chapters. Further elaboration is given here, for this domain definition has a significant impact on the following analysis. In the following, continuity will mean piecewise continuity, for brevity.

Since $A^{-1}$ is a Hilbert-Schmidt operator, $A$ has a pure point spectrum. Consider the eigenvalue problem
$$ (A - \mu_0 \rho)\, \varpi_0 = 0, \tag{7.28} $$
where $\rho(x)$ is a positive, bounded and continuous function of $x$ on $D$ and $\mu_0 = \mu_0(\rho)$ is the lowest eigenvalue defined by Eq. (7.28).

Remark 7.1. It is clear that $\mu_0^{-1}$ is the largest eigenvalue of $(A^{-1/2} \rho A^{-1/2})$, which is a Hilbert-Schmidt operator, as follows by arguments similar to Lemma 4.4. Thus, $\mu_0^{-1} = \| A^{-1/2} \rho A^{-1/2} \|$. It is easily seen that $(A^{-1/2} \rho A^{-1/2})$ is uniformly continuous with respect to $\rho(x)$, implying from Lemma 3.2(ii) and Remark 3.2(ii) that $\mu_0^{-1}$, and hence $\mu_0$, is continuous with respect to $\rho(x)$. Since $\rho(x)$ is a continuous function of $x$, this continuity property extends to $x$, by taking the mean value of the limits from the left and the right, if necessary. •

The result of Lemma 2.7 can be extended to the present case, i.e., the positivity of the function $(A - \lambda\rho)^{-1} g$ for $g > 0$ and $\lambda < \mu_0$, essentially by the same methods, which requires the result for $\lambda = 0$. For the present general case, a similar conclusion can be reached from the results obtained by Aronszajn and Smith (1957), which were deduced by using the fact that $(A - \lambda\rho)^{-1}$ has a pseudo-reproducing kernel. To be precise, it was shown that the positivity of $(A - \lambda\rho)^{-1}$ as an operator in $H$ and the positivity of its Green’s function as a function on $D \times D$ are equivalent for $\lambda < \mu_0$. In the following analysis, this will follow from the positivity of $(A - \lambda\rho)^{-1} g$ for all $g > 0$. A proof of this result based on the variational methods was given by Keller [1969], which is an extension of the proof for a special case by Bellman [1957]. We provide essentially Keller’s proof with some adjustments for technicalities, as an application of the present methods. The remainder of the analysis combines Keller’s and Vatsya’s methods [Vatsya, 1981-1, 1987]. The term positivity lemma
is often used for similar results of varying strengths. For the applications, precise results will be stated and validated. Lemma 7.1. Let g ( x ) be a strictly positive continuous function on D and let the other
symbols be as in Eqs. (7.27) and (7.28). If λ < μ0 and f ( x ) is a solution of ( A − λρ ) f = g ,
(7.29)
then f ( x ) > 0 on D . Proof. Since λ < μ0 , [ A − λρ ] is a positive definite operator. From Eqs. (3.15-a, b), f is the solution of Eq. (7.29) if and only if it maximizes the diagonal form
F d [ h ] = 2( h , g ) − ( h ,[ A − λρ ]h ) ,
(7.30-a)
as h varies over D(A). Substitution from Eq. (7.27) in Eq. (7.30-a) and integration by parts yields
\[ F_d[h] = \int_{\partial D,\ \beta \neq 0} ds \left( \frac{\alpha}{\beta} h^2 \right)(x) - \int_D dx \left[ \sum_{\mu,\nu=1}^{N} a_{\mu\nu} \frac{\partial h}{\partial x_\mu} \frac{\partial h}{\partial x_\nu} + (a_0 - \lambda\rho) h^2 - 2gh \right](x). \tag{7.30-b} \]
Whenever β = 0 on ∂D, the boundary term is equal to zero. Thus, the first integration on the restricted domain includes the complete boundary contribution. We have that ∂h/∂x_μ = ∂|h|/∂x_μ for almost all x in D wherever h(x) ≥ 0 and ∂h/∂x_μ = −∂|h|/∂x_μ wherever h(x) ≤ 0, and if h is in D(A), so is |h|. Consequently, F_d[|h|] − F_d[h] = 2(g, [|h| − h]).
Let f be the maximizing function. If f < 0 at a point x of its continuity, it is negative in a neighborhood of x . Consequently, F d [| f |] − F d [ f ] = 2( g ,[| f | − f ]) > 0 ,
and hence, f is not the maximizing function. This is a contradiction implying that f ≥ 0 almost everywhere. If f = 0 at a point x of its continuity, it is twice continuously differentiable at x achieving a local minimum there. Consequently, ∂f / ∂xμ = 0 and the matrix with elements ∂ 2 f / ∂xμ ∂xν is non-negative. At this point, Eq. (7.29) reduces to
\[ \sum_{\mu,\nu=1}^{N} a_{\mu\nu}(x) \frac{\partial^2 f(x)}{\partial x_\mu \partial x_\nu} = -g(x). \]
Being the trace of the product of two non-negative matrices, the left side is non-negative while the right side is strictly negative, reaching a contradiction. Hence f ( x ) > 0 on D • The following property of ϖ 0 is needed to strengthen the result of Lemma 7.1. Being an eigenvector, normalized ϖ 0 is defined within a multiplicative factor of absolute value one. We can ensure that Corollary 7.1. There exists an almost everywhere positive eigenvector ϖ 0 as defined by Eq. (7.28). Proof. It follows from the minimum principle (Lemma 3.17) that
\[ \mu_0(\rho) = \min_{u \text{ in } D(A)} \frac{(u, A u)}{(u, \rho u)}. \tag{7.31-a} \]
As in Lemma 7.1, the right side of Eq. (7.31-a) is invariant under the change of u to | u | , yielding the same minimum. Thus, ϖ 0 can be replaced with its absolute value • The argument of Corollary 7.1 breaks down for the higher eigenvalues since the minimization is required on the subspaces orthogonal to ϖ 0 . Application of Lemma 3.17 may appear illegitimate here. However, the result of Lemma 3.17 can be extended to include such situations by similar arguments. Further, using the fact that μ 0− 1 =|| A − 1 / 2 ρ A − 1 / 2 || (Remark 7.1), we have that
\[ \mu_0(\rho) = \min_{u \text{ in } H} \frac{(u, u)}{(u, A^{-1/2} \rho A^{-1/2} u)}, \tag{7.31-b} \]
which is equivalent to Eq. (7.31-a). Theorem 7.2. (Positivity lemma) Let g ( x ) be a strictly positive continuous function on D and let the other symbols be as in Eqs. (7.27) and (7.28). If f ( x ) is a solution of Eq.
(7.29), i.e., ( A − λρ ) f = g , then f ( x ) > 0 on D if and only if λ < μ0 . Proof. The sufficiency part follows from Lemma 7.1. For the necessity, from Eqs. (7.28) and (7.29), we have that (ϖ 0 , g ) = (ϖ 0 ,[ A − λρ ] f ) = ( μ 0 − λ ) (ϖ 0 , ρ f ) ,
where ϖ 0 is positive from Corollary 7.1. Since ρ , g > 0 , it follows that if f > 0 , then ( μ 0 − λ ) = (ϖ 0 , g ) /(ϖ 0 , ρ f ) > 0 •
A weaker form of the positivity lemma that proves useful at times, can be obtained by adjusting the arguments of Lemma 7.1. Corollary 7.2. (Weak positivity lemma) Let g ( x ) ≥ 0 be continuous on D and let the other symbols be as in Lemma 7.1. If λ < μ0 , then f ( x ) ≥ 0 •
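The statement of Theorem 7.2 can be checked numerically on a simple model. The following is a minimal sketch, not the construction used in the text: it assumes the one-dimensional operator A = −d²/dx² on (0, 1) with zero boundary values, ρ = 1 and g = 1, replaced by a standard second-difference matrix. The names solve_shifted and lowest_eigenpair are hypothetical helpers introduced only for this illustration. The lowest eigenvalue is estimated by inverse iteration with the quotient of Eq. (7.31-a), and (A − λρ)f = g is then solved once below and once above μ0.

```python
import math

def solve_shifted(shift, rhs):
    """Solve (A_h + diag(shift)) x = rhs by the Thomas algorithm.

    A_h is the second-difference approximation of -d^2/dx^2 on (0, 1)
    with zero boundary values (an assumed model operator).
    """
    n = len(rhs)
    h = 1.0 / (n + 1)
    off = -1.0 / h**2
    diag = [2.0 / h**2 + s for s in shift]
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def lowest_eigenpair(rho, iterations=200):
    """Inverse iteration for A w = mu * rho * w, started from a positive vector."""
    n = len(rho)
    zero = [0.0] * n
    v = [1.0] * n
    mu = 0.0
    for _ in range(iterations):
        rho_v = [r * vi for r, vi in zip(rho, v)]
        w = solve_shifted(zero, rho_v)                    # w = A^{-1} rho v
        num = sum(wi * ri for wi, ri in zip(w, rho_v))    # (w, A w) = (w, rho v)
        den = sum(r * wi * wi for r, wi in zip(rho, w))   # (w, rho w)
        mu = num / den                                    # quotient of Eq. (7.31-a)
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return mu, v

n = 99
rho = [1.0] * n
g = [1.0] * n
mu0, w0 = lowest_eigenpair(rho)
f_below = solve_shifted([-0.5 * mu0 * r for r in rho], g)  # lambda = 0.5 * mu0
f_above = solve_shifted([-1.2 * mu0 * r for r in rho], g)  # lambda = 1.2 * mu0
print(mu0, min(w0), min(f_below), min(f_above))
```

In this sketch the computed eigenvector stays positive, in line with Corollary 7.1, the solution is positive for λ below μ0, and it takes negative values once λ exceeds μ0.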
7.III.2. Applications
Positivity lemmas prove useful in solving the fixed point equations of the type A u = λ F ( u ) = F λ (u ) ,
(7.32-a)
which arise in a number of applications. Eq. (7.32-a) is equivalent to u = A − 1F λ ( u ) = B u ,
(7.32-b)
and to w = A −1 / 2 F λ ( A −1 / 2 w ) = C w ,
(7.32-c)
with the identification u = A^{-1/2}w (Theorem 4.2, Lemma 4.4), where F_λ(u) is in general a non-linear, continuous function of u. If F_λ(u) is a differentiable function, the Fréchet derivatives Ḃ, Ċ of B, C are defined by
\[ \dot B(u)\upsilon = A^{-1} \dot F_\lambda(u)\upsilon, \tag{7.33-a} \]
\[ \dot C(w)\upsilon = A^{-1/2} \dot F_\lambda(w) A^{-1/2}\upsilon. \tag{7.33-b} \]
If B, C map a well-defined set into itself and if Ḃ, Ċ < 1 on this set, the contraction mapping theorem (Lemma 4.20) can be used to obtain the respective solutions. Sometimes this is sufficient but not always, which will be illustrated by a concrete problem in the next section. The adjustments indicated in Eqs. (4.55-a, b), with a constant η and η = Ḃ, Ċ, can be used to enlarge the class of equations solvable by the iterative methods, as shown below. Lemma 7.2. Assume that
(i) S is a convex set in H;
(ii) for all x in D and all w in S, μ00 > γ ≥ Ḟ_λ(w) ≥ −ζ, where γ, ζ are positive constants and μ00 = μ0(0) = ||A^{-1}||^{-1};
(iii) there is a w in S such that F_λ(w) is in H.
Then F_λ(w) is in H for all w in S. Proof. For υ, w in S, we have from Eq. (4.54-b) that
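\[ F_\lambda(\upsilon) = F_\lambda(w) + \int_0^1 d\tau\, \dot F_\lambda[\tau\upsilon + (1-\tau)w]\,(\upsilon - w). \]
It follows from the convexity of S and (ii) that
\[ \gamma \ge \int_0^1 d\tau\, \dot F_\lambda[\tau\upsilon + (1-\tau)w] \ge -\zeta, \tag{7.34} \]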
implying that | F λ (υ ) | ≤ | F λ ( w) | + max(γ , ζ ) | υ − w | .
Since |υ − w| and F_λ(w) are in H, it follows that F_λ(υ) is in H •
In view of Lemma 7.2,
\[ T_\kappa \upsilon = (A + \kappa)^{-1}[\kappa\upsilon + F_\lambda(\upsilon)], \quad \upsilon \text{ in } S, \]
defines a one-parameter family of operators T_κ from S to H, with κ in the interval (−μ00; ∞). The problem of solving Eq. (7.32-a) is now reduced to obtaining the fixed points of T_κ, since the two are in one to one correspondence. Green's function for (A + κ)^{-1} retains the properties of G0(x; ξ) for κ in (−μ00; ∞), as can be seen by absorbing κ in a0(x).
Lemma 7.3. In addition to the assumptions of Lemma 7.2, let T_κ S be contained in S with some κ ≥ (1/2)[max(γ, ζ) − γ]. Then T_κ is a contraction of S in H, i.e., with respect to the norm || || on H.
Proof. With υ, w in S, we have that
\[ T_\kappa\upsilon - T_\kappa w = (A+\kappa)^{-1}[\kappa(\upsilon - w) + F_\lambda(\upsilon) - F_\lambda(w)] = (A+\kappa)^{-1} M_\kappa(\upsilon;w)(\upsilon - w), \]
where M_κ(υ; w) is the operation of multiplication defined by Eq. (4.54-b) as
\[ M_\kappa(\upsilon;w)\,h = \left( \kappa + \int_0^1 d\tau\, \dot F_\lambda[\tau\upsilon + (1-\tau)w] \right) h, \quad h \text{ in } H. \]
It follows from the convexity of S, Eq. (7.34) and the assumed bounds on κ that
\[ (\kappa + \gamma) \ge \left( \kappa + \int_0^1 d\tau\, \dot F_\lambda[\tau\upsilon + (1-\tau)w] \right) \ge -(\kappa + \gamma), \quad (\kappa + \gamma) \ge 0, \]
implying that ||M_κ(υ; w)|| ≤ (κ + γ). Since ||(A + κ)^{-1}|| ≤ (κ + μ00)^{-1} (Corollary 2.4), it follows that ||T_κυ − T_κw|| ≤ μ_κ ||υ − w||, where μ_κ = [(κ + γ)/(κ + μ00)] < 1 •
Corollary 7.3. With the assumptions of Lemma 7.3, the iterative sequence {u_{n+1}}_{n=0}^∞ = {T_κ u_n}_{n=0}^∞, where T_κυ = (A + κ)^{-1}[κυ + F_λ(υ)], with an arbitrary vector u_0 in S is contained in S and ||u_n − u_m|| → 0 as n, m → ∞, i.e., it is a Cauchy sequence with respect to || ||.
Proof. From Lemma 7.3, T_κ S is contained in S, and since u_0 is in S, {u_n} is in S by induction. The fact that it is a Cauchy sequence is the first part of the contraction mapping theorem (Lemmas 4.19 and 4.20), in view of the result of Lemma 7.3 •
Corollary 7.4. In addition to the assumptions of Lemma 7.3, let S be closed with respect to || || and let {u_n} be as in Corollary 7.3. Then ||u_n − u|| → 0, where u is the unique fixed point of T_κ in S.
Proof. The result follows from the contraction mapping theorem (Lemmas 4.19, 4.20) •
In the following, the result of Corollary 7.4 is strengthened to conclude the uniform convergence of the sequence with respect to x on D.
Lemma 7.4. With the assumptions of Lemma 7.3, for each υ, w in S, |||T_κυ − T_κw||| ≤ const. ||υ − w||.
Proof. For each x in D and υ, w in S, we have
\[ |[T_\kappa\upsilon - T_\kappa w](x)| = \left| \int_D d\xi\, G_\kappa(x;\xi)\, [M_\kappa(\upsilon;w)(\upsilon - w)](\xi) \right| \le \| M_\kappa(\upsilon;w)(\upsilon - w) \| \left[ \int_D d\xi\, G_\kappa^2(x;\xi) \right]^{1/2}, \]
by the Schwarz inequality (Lemma 1.4). Here T_κυ = (A + κ)^{-1}[κυ + F_λ(υ)] and M_κ(υ; w) denotes multiplication by κ + ∫_0^1 dτ Ḟ_λ[τυ + (1 − τ)w]. Green's function G_κ(x; ξ) defining (A + κ)^{-1} is bounded and square integrable. Consequently, the bracketed term is bounded. Denote the bound by ς. We have
\[ |[T_\kappa\upsilon - T_\kappa w](x)| \le \varsigma\, \| M_\kappa(\upsilon;w) \|\, \| \upsilon - w \| \le \varsigma (\kappa + \gamma) \| \upsilon - w \|. \]
We have used the estimate of ||M_κ(υ; w)|| from Lemma 7.3 •
Theorem 7.3. In addition to the assumptions of Lemma 7.4, let S be closed with respect to the supremum norm |||w||| = sup_{x in D} |w(x)|. Then the iterative sequence {T_κ u_n}_{n=0}^∞, where T_κυ = (A + κ)^{-1}[κυ + F_λ(υ)], with an arbitrary vector u_0 in S converges uniformly with respect to x on D to the unique fixed point of T_κ in S.
Proof. Since from Lemma 7.4,
\[ ||| u_{n+1} - u_{m+1} ||| = ||| T_\kappa u_n - T_\kappa u_m ||| \le \varsigma (\kappa + \gamma) \| u_n - u_m \|, \]
we have from Corollary 7.3 that {u_n}_{n=0}^∞ and {T_κ u_n}_{n=0}^∞ are Cauchy sequences with respect to ||| ||| with the same limit. Since S is closed, there is a u in S such that
\[ \lim_{n\to\infty} ||| u_{n+1} - u ||| = \lim_{n\to\infty} ||| T_\kappa u_n - u ||| = 0. \]
From Lemma 7.4 and Corollary 7.4, we have
\[ ||| T_\kappa u_n - T_\kappa u ||| \le \varsigma (\kappa + \gamma) \| u_n - u \| \to 0 \quad \text{as } n \to \infty. \]
It follows that
\[ ||| u - T_\kappa u ||| \le \lim_{n\to\infty} \left[ ||| u - u_{n+1} ||| + ||| T_\kappa u_n - T_\kappa u ||| \right] = 0, \]
which shows that u is a fixed point of T_κ and that {T_κ u_n}_{n=0}^∞ converges uniformly.
If there is another fixed point ũ ≠ u, then from Corollary 7.4, ||u − ũ|| = 0, i.e., ũ = u almost everywhere. However, it follows from Lemma 7.4 that
\[ ||| u - \tilde u ||| = ||| T_\kappa u - T_\kappa \tilde u ||| \le \varsigma (\kappa + \gamma) \| u - \tilde u \|. \]
Consequently, ũ = u everywhere on D •
Theorem 7.3 shows that the freedom in the choice of κ can be exploited to reduce the problem to one involving a contraction while this may not be the case with a given value of κ, e.g., κ = 0. We show below that a judicious choice of κ can be used further for a smaller class of non-linearities to obtain a monotonically convergent sequence.
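The effect of the shift κ can be illustrated on a small model problem. The sketch below rests on several assumptions that are not in the text: A = −d²/dx² on (0, 1) with zero boundary values, discretized by second differences, and the illustrative nonlinearity F_λ(u) = 1 − ζ tanh(u) with ζ = 20, whose derivative lies in [−ζ, 0]. The helper names are hypothetical. The map υ ↦ (A + κ)^{-1}[κυ + F_λ(υ)] is iterated from u_0 = 0 with κ = ζ and with κ = 0.

```python
import math

N = 99          # interior grid points (assumed discretization)
ZETA = 20.0     # bound on -F'(u) for the illustrative nonlinearity below

def solve_shifted(shift, rhs):
    """Thomas algorithm for (A_h + diag(shift)) x = rhs, A_h ~ -d^2/dx^2 on (0, 1)."""
    n = len(rhs)
    h = 1.0 / (n + 1)
    off = -1.0 / h**2
    diag = [2.0 / h**2 + s for s in shift]
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def F(u):
    """Illustrative nonlinearity with F(0) = 1 >= 0 and -ZETA <= F'(u) <= 0."""
    return 1.0 - ZETA * math.tanh(u)

def shifted_map(kappa, u):
    """One application of u -> (A + kappa)^{-1} [kappa * u + F(u)]."""
    rhs = [kappa * ui + F(ui) for ui in u]
    return solve_shifted([kappa] * len(u), rhs)

def run(kappa, steps=100):
    u = [0.0] * N
    monotone = True
    for _ in range(steps):
        new = shifted_map(kappa, u)
        if any(b < a - 1e-12 for a, b in zip(u, new)):
            monotone = False
        u = new
    # Distance between u and its image: zero at a fixed point.
    defect = max(abs(a - b) for a, b in zip(shifted_map(kappa, u), u))
    return u, defect, monotone

u_shift, defect_shift, monotone_shift = run(ZETA)   # kappa = zeta
u_plain, defect_plain, monotone_plain = run(0.0)    # kappa = 0
print(defect_shift, monotone_shift, defect_plain)
```

With κ = ζ the iterates in this sketch increase towards a fixed point, whereas with κ = 0 the derivative bound ζ exceeds the lowest eigenvalue of A (about π²) and the iteration does not settle.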
Proposition 7.1. In addition to the assumptions of Theorem 7.3, let u_0 = 0 be in S, F_λ(0) ≥ 0 for all x in D, and κ ≥ ζ. Then {u_n} as defined in Corollary 7.3 converges uniformly and monotonically from below to the fixed point u.
Proof. Since the convergence follows from Theorem 7.3, it is sufficient to show that {u_n} is a non-decreasing sequence. From the weak positivity lemma (Corollary 7.2), we have that u_1 = (A + κ)^{-1}[κu_0 + F_λ(u_0)] ≥ 0 = u_0. Let u_n ≥ u_{n−1}. From Eq. (7.34) we have
\[ M_\kappa(u_n;u_{n-1}) = \kappa + \int_0^1 d\tau\, \dot F_\lambda[\tau u_n + (1-\tau)u_{n-1}] \ge 0, \]
which together with the induction assumption implies that [M_κ(u_n; u_{n−1})(u_n − u_{n−1})](ξ) ≥ 0. Consequently, (u_{n+1} − u_n) = (A + κ)^{-1} M_κ(u_n; u_{n−1})(u_n − u_{n−1}) ≥ 0, from the weak positivity lemma. The monotonicity follows by induction •
In the following, we give some characterizations of suitable sets that may be substituted for S in Theorem 7.3 and Proposition 7.1. Let F_λ(0) be in H and for η > 0, let u_η = (A + η)^{-1}|F_λ(0)|, which is non-negative from Corollary 7.2 and bounded as in Lemma 7.4. Since (A + κ)u_η = |F_λ(0)| + (κ − η)u_η, it follows that u_η = (A + κ)^{-1}[|F_λ(0)| + (κ − η)u_η]. Let Q_η be the set of all functions bounded by u_η, i.e., all υ such that |υ| ≤ u_η, and let Q_η^+ be the set of all non-negative functions bounded by u_η, i.e., all υ such that 0 ≤ υ ≤ u_η. Then it is straightforward to check that the conditions (i) and (iii) of Lemma 7.2 hold on Q_η and Q_η^+. Assuming that (ii) is satisfied, κ can be restricted to satisfy the condition of Lemma 7.3. Proposition 7.2. Let Q_η^+, Q_η be as above.
(i) If there exist η and κ such that for each υ in Q_η^+,
0 ≤ κυ + F_λ(υ) ≤ F_λ(0) + (κ − η)u_η,
then T_κ Q_η^+ is contained in Q_η^+, where T_κυ = (A + κ)^{-1}[κυ + F_λ(υ)].
(ii) If there exist η and κ such that for each υ in Q_η,
0 ≤ |κυ + F_λ(υ)| ≤ |F_λ(0)| + (κ − η)u_η,
then T_κ Q_η is contained in Q_η.
Proof. We give a proof for (i); a proof of (ii) follows by similar arguments. For υ in Q_η^+, we have that
\[ 0 \le T_\kappa\upsilon = (A + \kappa)^{-1}[\kappa\upsilon + F_\lambda(\upsilon)] \le (A + \kappa)^{-1}[F_\lambda(0) + (\kappa - \eta)u_\eta] = u_\eta. \]
It is clear that if the condition of Proposition 7.2 holds and if (ii) of Lemma 7.2 is satisfied on the corresponding set, then κ can be chosen to satisfy the bound condition of Lemma 7.3. Thus, Q η+ , Q η , can be taken for S to ensure the uniform convergence of the iterative sequence to the desired solution. In case of Q η+ , u0 and κ can be chosen to satisfy the condition of Proposition 7.1 to obtain a monotonically convergent sequence • Next we still consider Eq. (7.32-a) but with different properties of the non-linearities, inspired by some problems arising in the heat transfer phenomena. The set of positive functions in H will be denoted by S + and the set of the numbers λ for which Eq. (7.32-a) has a positive solution, by Λ with λ * being its least upper bound, which is a parameter of physical significance. Lemma 7.5. Let F ( x ;υ ) be continuous and strictly positive, for υ in S + for all x in D , which includes F ( x ; 0) > 0 . Then each λ in Λ is positive. Proof. If Eq. (7.32-a) has a solution u > 0 for λ < 0 , then − u = − λ A −1F (u ) > 0 , from the positivity Lemma (Theorem 7.2), reaching a contradiction. For λ = 0 , u = 0 is the unique solution, yielding the result • Theorem 7.4. In addition to the assumptions of Lemma 7.5, let F be strictly increasing function of υ on S + , i.e., F ( w ) > F (υ ) for w > υ ≥ 0 , and let u0 = 0 , u n = λ A −1F (u n −1 ) , n = 1, 2,... , for λ > 0 . Then
(i) λ is in Λ if and only if {un } is uniformly bounded; (ii) for λ in Λ , {un } converges monotonically from below uniformly with respect to x in D to the minimal positive solution of Eq. (7.32-a). Proof. First we show that {un } is a monotonically increasing sequence, by induction.
Since F(0) > 0, it follows from the positivity lemma (Theorem 7.2) that u_1 = λA^{-1}F(0) > 0 = u_0. Assume that the result holds for some n. By the same argument, u_{n+1} − u_n = λA^{-1}[F(u_n) − F(u_{n−1})] > 0.
If {u_n} is uniformly bounded, it must converge to some positive bounded function u given by
\[ u(x) = \lim_{n\to\infty} u_{n+1}(x) = \lambda \lim_{n\to\infty} [A^{-1}F(u_n)](x) = \lambda \lim_{n\to\infty} \int_D d\xi\, G_0(x;\xi)\, [F(u_n)](\xi). \]
Since {u_n} is uniformly bounded and F is continuous, F(u_n) is also uniformly bounded.
Therefore, the integrand is uniformly bounded by a constant multiple of G0 ( x; ξ ) which is positive and integrable. Therefore, the order of the limit and integration is interchangeable by the Lebesgue theorem (Theorem 1.3), which together with the continuity of F implies that u is a solution of Eq. (7.32-a). Thus, λ is in Λ . If λ is in Λ , then there is a solution u > 0 = u0 . Again by induction, using the inequality, u − u n +1 = λ A − 1[ F (u ) − F (u n )] > 0 , we have that {un } is bounded by u . This proves (i).
The monotonicity of the sequence implies the monotonicity of the convergence in (ii). The uniform convergence follows from the monotonicity and Dini's theorem (Lemma 1.2), since u(x) and u_n(x) are continuous, which follows from the continuity of G0(x; ξ) with respect to x and the Lebesgue theorem. Now, let ū be another positive solution. As above, since ū > u_0, we have that ū > u_n for each n. Hence u = lim_{n→∞} u_n ≤ ū, i.e., u is the minimal positive solution •
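The dichotomy of Theorem 7.4(i) can be observed on a standard test case. The sketch below assumes the model A = −d²/dx² on (0, 1) with zero boundary values, discretized by second differences, and the nonlinearity F(u) = exp(u), which is positive and strictly increasing; neither choice is taken from the text. For this model the critical value λ* is known to be close to 3.51. The function names and the cut-off used to declare the iterates unbounded are hypothetical.

```python
import math

def solve_dirichlet(rhs):
    """Thomas algorithm for A_h x = rhs, A_h ~ -d^2/dx^2 on (0, 1), zero boundary values."""
    n = len(rhs)
    h = 1.0 / (n + 1)
    off = -1.0 / h**2
    diag = 2.0 / h**2
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def iterate(lam, n=99, max_steps=2000, bound=50.0, tol=1e-12):
    """u_0 = 0, u_n = lam * A^{-1} F(u_{n-1}) with F(u) = exp(u).

    Returns (u, converged, monotone). The iteration is stopped as soon as
    max(u) exceeds `bound`, which is treated here as 'not uniformly bounded'.
    """
    u = [0.0] * n
    monotone = True
    for _ in range(max_steps):
        new = solve_dirichlet([lam * math.exp(ui) for ui in u])
        if any(b < a - 1e-14 for a, b in zip(u, new)):
            monotone = False
        diff = max(abs(b - a) for a, b in zip(u, new))
        u = new
        if max(u) > bound:
            return u, False, monotone
        if diff < tol:
            return u, True, monotone
    return u, False, monotone

u_small, ok_small, mono_small = iterate(1.0)   # lambda below the critical value
u_large, ok_large, mono_large = iterate(5.0)   # lambda above the critical value
print(ok_small, max(u_small), ok_large)
```

In both runs the iterates increase monotonically; they stay bounded and converge for λ = 1, and grow past the cut-off for λ = 5.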
The characterization of u as the limit of the sequence defined in Theorem 7.4 determines a number of its useful properties, which follow essentially by the same arguments as in Lemma 7.5 and Theorem 7.4. The assumptions of Theorem 7.4 will be assumed to hold in the rest of this section. Additional assumptions needed for the validity of a particular result will be stated.
Lemma 7.6. Let F be dominated by F̄, i.e., F̄(w) > F(υ) for each w ≥ υ ≥ 0, and let a positive solution ū = λA^{-1}F̄(ū) exist for some λ > 0. Then a positive solution u = λA^{-1}F(u) exists and u(λ) < ū(λ) for all x in D.
Proof. Let u_0 = ū_0 = 0 and let u_{n+1} = λA^{-1}F(u_n), ū_{n+1} = λA^{-1}F̄(ū_n). If u_n ≤ ū_n for some n, then ū_{n+1} − u_{n+1} = λA^{-1}[F̄(ū_n) − F(u_n)] > 0 from the positivity lemma (Theorem 7.2). Since 0 = u_0 ≤ ū_0 = 0, this implies that u_n ≤ ū_n for all n. Consequently,
\[ u(\lambda) = \lim_{n\to\infty} u_n \le \lim_{n\to\infty} \bar u_n = \bar u(\lambda). \]
If u(λ) = ū(λ) everywhere on D, we have 0 = ū − u = λA^{-1}[F̄(ū) − F(u)] > 0, from the positivity lemma, which is a contradiction. Hence, u(λ; x) < ū(λ; x) at some x in D and by continuity, in some neighborhood of x. Since
\[ (\bar u - u)(\lambda;x) = \lambda \int_D d\xi\, G_0(x;\xi)\, [\bar F(\bar u) - F(u)](\lambda;\xi), \tag{7.35} \]
and Green's function is strictly positive on D, u(λ) < ū(λ) for all x in D. Alternatively, it can be shown that ū_1 > u_1. Then the induction assumption can be taken to be u_n < ū_n to arrive at u < ū •
Strict inequality holds only on D, not necessarily on ∂D.
Corollary 7.5. Let λ′ > 0 be in Λ. Then the interval (0; λ′] is in Λ and the minimal positive solution u(λ, x) is an increasing function of λ in Λ for all x in D.
Proof. The results follow from Lemma 7.6 by setting F̄ = (λ′/λ)F for each 0 < λ ≤ λ′ •
Corollary 7.6. Let F be dominated by a linear function F̄(x; υ) = [f0(x) + ρ(x)υ]. Then μ0(ρ) ≤ λ*.
Proof. Since Aū = λF̄(ū) is equivalent to [A − λρ]ū = λf0(x) > 0, a positive solution ū exists for each λ < μ0(ρ) from the positivity lemma. From Lemma 7.6, a positive solution u = λA^{-1}F(u) exists for each λ < μ0(ρ) and hence μ0(ρ) ≤ λ* •
Corollary 7.7. Let F̄ be as in Corollary 7.6. Then Aū = λF̄(ū) has no positive solution for any λ > λ*.
Proof. If it did, Au = λF(u) would have a positive solution from Lemma 7.6, contradicting the definition of λ* •
Corollary 7.8. If F dominates a linear function [f0(x) + ρ(x)υ], then λ* ≤ μ0(ρ).
Proof. Follows by transposing the argument of Corollary 7.6 •
The condition F(x; 0) > 0 is essential for the above results to hold, as shown by the following.
Corollary 7.9. If F is dominated by [ρ(x)υ], then a positive solution of Au = λF(u) does not exist for any λ in the interval (0; μ0(ρ)).
Proof. Assume to the contrary, i.e., there is a positive u such that Au = λF(u) for some λ < μ0(ρ), i.e., [A − λρ]u = λ[F(u) − ρu]. However, from the positivity lemma we have that −u = λ[A − λρ]^{-1}[ρu − F(u)] > 0, contradicting the assumption •
Lemma 7.7. The minimal positive solution of Au = λF(u) is left continuous with respect to λ in Λ.
Proof. For a given λ, let {λ_n} ↑ λ. From Corollary 7.5, we have that u(λ_n; x) ≤ u(λ_{n+1}; x) ≤ u(λ; x), and hence, {u(λ_n)} ↑ û ≤ u(λ). Since
\[ u(\lambda_n;x) = \lambda_n \int_D d\xi\, G_0(x;\xi)\, F(\xi; u(\lambda_n;\xi)), \]
and the interchange of integration and limit is justified by the Lebesgue theorem (Theorem 1.3), as in Theorem 7.4, we have by taking the limit on both sides and using the continuity of F, that û is a solution of Aû = λF(û). Since u is minimal, û ≥ u, and hence û = u •
The solution is in fact continuous with respect to λ on any open interval in Λ, which follows essentially from Lemma 7.7. The left continuity is significant only at the boundary, i.e., at λ* if u(λ*) exists. Theorem 7.5. Let F
be continuously differentiable with Ḟ(υ) > 0 for υ in S+. Then each λ in Λ is bounded above by ν0(u(λ)) = μ0[Ḟ(u(λ))].
Proof. Let λ′ < λ both be in Λ. Since from Eq. (4.54-b),
\[ F(u(\lambda)) - F(u(\lambda')) = \int_0^1 d\tau\, \dot F[\tau u(\lambda') + (1-\tau)u(\lambda)]\,[u(\lambda) - u(\lambda')] = M(\lambda,\lambda')[u(\lambda) - u(\lambda')], \]
we have that
\[ [A - \lambda M(\lambda,\lambda')]\,(u(\lambda) - u(\lambda')) = \lambda F(u(\lambda)) - \lambda' F(u(\lambda')) - \lambda M(\lambda,\lambda')[u(\lambda) - u(\lambda')] = (\lambda - \lambda') F(u(\lambda')) > 0. \]
Since (u(λ) − u(λ′)) > 0 (Corollary 7.5), it follows from the positivity lemma (Theorem 7.2) that λ < μ0[M(λ, λ′)], i.e., λ ≤ lim_{λ′↑λ} μ0(M). Continuity of M can be concluded from the continuity and the uniform boundedness, with respect to λ, of the integrand defining it, and the Lebesgue theorem (Theorem 1.3). The continuity of M with respect to λ implies the continuity of μ0(M) (Remark 7.1). Now the continuity of u(λ) (Lemma 7.7) together with the Lebesgue theorem (Theorem 1.3) implies that M(λ, λ′) → Ḟ[u(λ)] as λ′ ↑ λ, implying the result •
The results from Lemma 7.5 to Theorem 7.5 are extendible to the case with A replaced by (A + κ) with κ in accordance with Lemma 7.3. The choice in Proposition 7.1 alters the original nonlinear function to the one satisfying the present conditions and thus, the corresponding results can be deduced as the corollaries of the extended results. However, the arguments used in Lemma 7.2 to Theorem 7.3 are significant for their instructional value as well as for applications. Remark 7.2. Although λ = 0 is not in Λ, ν0(0) = μ0[Ḟ(0)] is still well-defined. As indicated earlier,
\[ \nu_0(u(\lambda)) = \| A^{-1/2} [\dot F(u(\lambda))] A^{-1/2} \|^{-1} = \| K(u(\lambda)) \|^{-1}, \tag{7.36} \]
i.e., the reciprocal of the largest eigenvalue of the compact operator K ( u ( λ )) . Since the minimal positive solution u ( λ ) is left continuous (Lemma 7.7), K ( u ( λ )) is uniformly continuous with respect to λ (Remark 7.1). Consequently from Lemma 3.2(ii) and Remark 3.2 (ii), ν 0 (u (λ )) is a continuous function of λ on (0; λ *) • Next we consider more precise nonlinearities. In case of a convex F , i.e.,
Ḟ(x; w) > Ḟ(x; υ) for w > υ, we have
\[ F(x;\upsilon) + \dot F(x;w)(w - \upsilon) > F(x;w) > F(x;\upsilon) + \dot F(x;\upsilon)(w - \upsilon), \quad \upsilon, w > 0, \tag{7.37-a} \]
and for concave F, i.e., Ḟ(x; w) < Ḟ(x; υ) for w > υ,
\[ F(x;\upsilon) + \dot F(x;w)(w - \upsilon) < F(x;w) < F(x;\upsilon) + \dot F(x;\upsilon)(w - \upsilon), \quad \upsilon, w > 0. \tag{7.37-b} \]
Inequalities on the left and the right sides in Eq. (7.37-a) are the statements of the facts that a convex curve lies above the tangent and below the chord, respectively. The reverse is valid in case of a concave function as indicated by Eq. (7.37-b). Let
\[ \theta(w;\upsilon) = F(x;w) - F(x;\upsilon) - \dot F(x;\upsilon)(w - \upsilon). \tag{7.38} \]
For a convex F , Corollary 7.8 and θ ( w ; 0) > 0 (Eq. (7.37-a)) imply that λ * ≤ ν 0 (0) , and for a concave F , Corollary 7.6 and θ ( w ; 0) < 0 (Eq. (7.37-b)) yield that ν 0 (0) ≤ λ * . These properties can be improved further, as follows.
Lemma 7.8.
(i) For a convex F, ν0(u(λ)) ≥ λ* is a decreasing function of λ on the open interval (0; λ*).
(ii) For a concave F, ν0(u(λ)) ≤ λ* is an increasing function of λ on the open interval (0; λ*).
Proof. (i) Since Ḟ(u(λ)) increases with u(λ) and hence with λ (Corollary 7.5), K(u(λ)) is an increasing function of λ (Remark 7.2). Thus, ν0(u(λ)) is a decreasing function since ν0^{-1}(u(λ)) is increasing from the monotonicity principle (Lemma 3.20). As indicated above, λ* ≤ ν0(0) and from Theorem 7.5, λ ≤ ν0(u(λ)) for 0 < λ < λ*. Since ν0(u(λ)) is a strictly decreasing function, this implies that ν0(u(λ)) > λ*. (ii) Monotonicity follows by transposing the argument of (i). For the bound, we have from Eq. (7.37-b) with w > 0 that
\[ F(w) < F(u) + (w - u)\dot F(x;u) < F(0) + [\dot F(0) - \dot F(u)]u + \dot F(u) w, \]
i.e., F is dominated by a linear function. The result follows from Corollary 7.6 •
Let {u_n} be the sequence generated in Theorem 7.4, which exists for all λ ≥ 0. However, it converges to the minimal positive solution for λ < λ* and diverges for λ > λ*. For λ = λ*, it may converge or diverge. If {u_n(x)} diverges at some x in D, it must diverge everywhere, for
\[ u_{n+1}(\lambda;x) = \lambda \int_D d\xi\, G_0(x;\xi)\, [F(u_n)](\lambda;\xi), \]
and Green's function G0(x; ξ) is strictly positive everywhere on D, as in Lemma 7.6, Eq. (7.35). For a concave nonlinearity, Ḟ(u_n(λ)), being a decreasing non-negative sequence, must have a limit. For the convex case, Ḟ(u_n(λ)) may converge or diverge. In all cases α(λ) = lim_{n→∞} α_n(λ) exists for all λ ≥ 0, where (Remark 7.2)
\[ \alpha_n(\lambda) = \| A^{-1/2} [\dot F(u_n(\lambda))] A^{-1/2} \|^{-1} = \| K(u_n(\lambda)) \|^{-1} = \nu_0(u_n(\lambda)). \tag{7.39} \]
It is clear that α(λ) = ν0(u(λ)) for 0 ≤ λ < λ*. Thus, α(λ) is an extension of ν0(u(λ)) to the entire non-negative real line. By extending the minimal solution u(λ) to λ ≥ λ* by setting u(λ) = ∞, we still have α(λ) = ν0(u(λ)).
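The quantities α_n(λ) of Eq. (7.39) can be computed directly in a discretized model. The following sketch assumes, for illustration only, A = −d²/dx² on (0, 1) with zero boundary values (second differences) and the convex nonlinearity F(u) = exp(u), so that Ḟ(u) = exp(u). The value ν0(u_n) is obtained as the lowest eigenvalue of A w = ν Ḟ(u_n) w by inverse iteration; solve_dirichlet and nu0 are hypothetical helper names.

```python
import math

def solve_dirichlet(rhs):
    """Thomas algorithm for A_h x = rhs, A_h ~ -d^2/dx^2 on (0, 1), zero boundary values."""
    n = len(rhs)
    h = 1.0 / (n + 1)
    off = -1.0 / h**2
    diag = 2.0 / h**2
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def nu0(u, iterations=200):
    """Lowest eigenvalue of A w = nu * F'(u) * w with F'(u) = exp(u)."""
    rho = [math.exp(ui) for ui in u]
    v = [1.0] * len(u)
    value = 0.0
    for _ in range(iterations):
        rho_v = [r * vi for r, vi in zip(rho, v)]
        w = solve_dirichlet(rho_v)                           # w = A^{-1} rho v
        num = sum(wi * ri for wi, ri in zip(w, rho_v))       # (w, A w)
        den = sum(r * wi * wi for r, wi in zip(rho, w))      # (w, rho w)
        value = num / den
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return value

lam = 1.0
u = [0.0] * 99
alphas = []
for _ in range(5):
    alphas.append(nu0(u))                                    # alpha_n = nu0(u_n)
    u = solve_dirichlet([lam * math.exp(ui) for ui in u])    # u_{n+1} = lam A^{-1} F(u_n)
print(alphas)
```

For this convex example the computed values decrease with n and stay above λ, which is the behaviour expected from Theorem 7.5 and Lemma 7.8(i).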
Lemma 7.9. If F is a concave function, then lim_{λ↑λ*} α(λ) = λ* and Λ = (0; λ*).
Proof. It follows from Theorem 7.5 and Lemma 7.8(ii) for λ < λ* that λ ≤ α(λ) < λ*. The first part follows by letting λ ↑ λ*. To show that a positive solution does not exist at λ = λ*, assume to the contrary. From the first part and the left continuity, we have λ* = ||K(u(λ*))||^{-1}. Since F is concave, it is dominated by a linear function as in Lemma 7.8(ii), i.e.,
\[ F(w) < F(0) + [\dot F(0) - \dot F(\psi)]\psi + \dot F(\psi) w, \]
with an arbitrary ψ > 0. Take ψ > u(λ*), yielding from Corollary 7.6 that λ* ≥ ||K(ψ)||^{-1}. Since ψ > u(λ*) implies that Ḟ(u(λ*)) > Ḟ(ψ), we have
\[ \lambda^* = \| K(u(\lambda^*)) \|^{-1} < \| K(\psi) \|^{-1}, \]
from the monotonicity principle (Lemma 3.20). This contradicts the result that λ* ≥ ||K(ψ)||^{-1} •
It is clear from Lemma 7.9 that for a concave nonlinearity, the positive solutions become unbounded as λ → λ*. In some exactly solvable convex cases, bounded solutions exist at λ = λ*. Also, for some convex cases, multiple solutions are known to exist [Keller, 1969], while in the concave case, the positive solution for λ < λ* is unique as shown below.
Corollary 7.10. For a concave F, the positive solution u for each λ < λ* is unique.
Proof. Let u be the minimal positive solution, which exists from Theorem 7.4. If there is another solution ū with ū > u at some x, then ū > u at all x in D as in Eq. (7.35). It follows that
\[ [A - \lambda \dot F(u)]\,(\bar u - u) = \lambda\theta(\bar u; u) < 0, \]
and λ < ν0(u(λ)) = μ0[Ḟ(u(λ))] from Theorem 7.5. From the positivity lemma, (ū − u) < 0, which is a contradiction •
The critical parameter λ* at times corresponds to a physically measurable quantity. For the concave case, its determination is reducible to a simple eigenvalue problem. Corollary 7.11. If F
is a concave function, then λ* = μ0(ρ) = ||A^{-1/2} ρ A^{-1/2}||^{-1}.
Proof. For a concave F, Ḟ is a non-negative decreasing function. Consequently, a unique limit lim_{υ→∞} Ḟ(x; υ) = ρ(x) exists. Let {υ_n} = {u(λ_n)} with λ_n ↑ λ*. It follows from Lemma 7.9 and Remark 7.2 that
\[ \lambda^* = \lim_{\lambda_n \uparrow \lambda^*} \nu_0(u(\lambda_n)) = \lim_{\lambda_n \uparrow \lambda^*} \| A^{-1/2} \dot F(u(\lambda_n)) A^{-1/2} \|^{-1} = \| A^{-1/2} \rho A^{-1/2} \|^{-1} \; \bullet \]
To deduce a parallel weaker result for the convex case we need
Corollary 7.12. Let υ < w on D. Then
(i) for a convex F, ν0(υ) > ν0(w);
(ii) for a concave F, ν0(υ) < ν0(w).
Proof. The result follows from ν0(υ) = μ0[Ḟ(υ)] = ||K(υ)||^{-1} and the other definitions as in Corollary 7.11 •
Lemma 7.10. If F is a convex function, then λ* ≥ ν0(∞).
Proof. From Corollary 7.12, we have that ν0(υ) is a non-increasing, positive function of υ. Consequently, lim_{n→∞} ν0(υ_n) = ν0(∞) exists, where {υ_n} is a positive, monotonically diverging sequence. From Eq. (7.37-a), convexity implies that
\[ F(\upsilon) < F(0) + \dot F(\upsilon)\upsilon < F(0) + \dot F(\infty)\upsilon, \]
assuming that Ḟ(∞) exists. The result follows from Corollary 7.6. If Ḟ(∞) does not exist, i.e., Ḟ(∞) = ∞, then
\[ \nu_0(\infty) = \lim_{n\to\infty} \nu_0(\upsilon_n) = \lim_{n\to\infty} \| A^{-1/2} \dot F(\upsilon_n) A^{-1/2} \|^{-1} = 0 < \lambda^* \; \bullet \tag{7.40} \]
As shown in Theorem 7.4, the iterative sequence {u_n} converges to the minimal positive solution. While this result is seen to be quite useful for the above analysis, the sequence converges too slowly to be of much value in most of the cases of practical interest. In the following we show that in cases of the convex and concave nonlinearities, Newton's method, i.e., Eq. (4.55-b), can be used to generate more rapidly convergent monotonic sequences. From Eq. (4.55-b), Newton's approximation w is given by
\[ w = \lambda [A - \lambda \dot F(\upsilon)]^{-1} [F(\upsilon) - \upsilon \dot F(\upsilon)] = \chi(\upsilon), \tag{7.41} \]
where υ is an initial approximation. It is clear that w is defined for each λ < ν 0 (υ ) . In case of a convex nonlinearity, minimal solution can exist at λ = λ * . For notational convenience, the results will still be stated for λ < λ * , assumed to be valid also for λ = λ * , if the solution exists at the critical point. The results will be obtained for strictly convex and concave
nonlinear F , which requires more careful considerations. They can be adjusted for the nonstrict cases. Lemma 7.11. Let υ , w be as above and let u be the minimal solution. Then
(i) for a convex F , 0 ≤ υ < u implies that w < u , for each λ < λ * ; (ii) for a concave F , υ > u implies that w > u , for each λ < λ * . Proof. (i) From Eq. (7.41) we have that
\[ [A - \lambda \dot F(\upsilon)](u - w) = \lambda [F(u) - F(\upsilon) - (u - \upsilon)\dot F(\upsilon)] = \lambda\theta(u;\upsilon) > 0, \]
due to the convexity of F (Eq. (7.37-a)). Since ν0(υ) > ν0(u) = α(λ) ≥ λ* > λ, from Lemma 7.8(i) and Corollary 7.12, the result follows from the positivity lemma (Theorem 7.2). (ii) Follows by transposing the arguments of (i) • Lemma 7.12.
(i) For a fixed λ and convex F, let {û_{n+1}}_{n=0}^∞ = {χ(û_n)}_{n=0}^∞ be Newton's sequence generated with û_0 = 0. Then û_n < û_{n+1} < u on D for all n and each λ < ν0(û_n(λ)), where u(λ) = ∞ for λ ≥ λ*.
(ii) For a fixed λ < λ* and concave F, let {û_{n+1}}_{n=0}^∞ = {χ(û_n)}_{n=0}^∞ be Newton's sequence generated with û_0 > u. Then û_n > û_{n+1} > u on D for all n.
Proof. We give a proof for (i); (ii) follows by transposing the arguments of (i). If û_n < u, then from Lemma 7.11, û_{n+1} < u. Since û_0 = 0 < u, by induction, {û_n} is bounded by u. Now û_1 = λ[A − λḞ(0)]^{-1}F(0) and λF(0) > 0 on D. Hence from the positivity lemma, û_1 > 0 = û_0. Let û_n > û_{n−1}. We have
\[ A \hat u_n = \lambda [F(\hat u_{n-1}) + (\hat u_n - \hat u_{n-1}) \dot F(\hat u_{n-1})] = \lambda \hat\theta(\hat u_n; \hat u_{n-1}). \]
Since F is convex, it follows that 0 < θ̂(û_n; û_{n−1}) < F(û_n) (Eq. (7.37-a)). Consequently,
\[ A(\hat u_{n+1} - \hat u_n) = \lambda [\hat\theta(\hat u_{n+1}; \hat u_n) - \hat\theta(\hat u_n; \hat u_{n-1})] = \lambda (\hat u_{n+1} - \hat u_n) \dot F(\hat u_n) + \lambda \theta(\hat u_n; \hat u_{n-1}), \]
i.e.,
\[ [A - \lambda \dot F(\hat u_n)](\hat u_{n+1} - \hat u_n) = \lambda \theta(\hat u_n; \hat u_{n-1}) > 0 \quad \text{on } D. \]
From the positivity lemma and the bound result above, û_n(λ) < û_{n+1}(λ) < u(λ) for all λ < ν0(û_n(λ)) •
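Newton's step of Eq. (7.41) is easy to try on a discretized model. The sketch below again assumes A = −d²/dx² on (0, 1) with zero boundary values (second differences) and the convex nonlinearity F(u) = exp(u) with λ = 1; these choices and the helper names are illustrative assumptions, not part of the text. Each step solves [A − λḞ(υ)]w = λ[F(υ) − υḞ(υ)], starting from υ = 0.

```python
import math

def solve_shifted(shift, rhs):
    """Thomas algorithm for (A_h + diag(shift)) x = rhs, A_h ~ -d^2/dx^2 on (0, 1)."""
    n = len(rhs)
    h = 1.0 / (n + 1)
    off = -1.0 / h**2
    diag = [2.0 / h**2 + s for s in shift]
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def newton_step(lam, v):
    """w = lam [A - lam F'(v)]^{-1} [F(v) - v F'(v)] with F(v) = F'(v) = exp(v)."""
    shift = [-lam * math.exp(vi) for vi in v]
    rhs = [lam * math.exp(vi) * (1.0 - vi) for vi in v]
    return solve_shifted(shift, rhs)

lam = 1.0
history = [[0.0] * 99]                 # Newton's sequence started from zero
for _ in range(6):
    history.append(newton_step(lam, history[-1]))

last_change = max(abs(a - b) for a, b in zip(history[-1], history[-2]))
print([max(u) for u in history], last_change)
```

In this sketch the iterates increase from step to step and stabilize after a few steps, consistent with the monotone behaviour stated in Lemma 7.12(i); the limit agrees with the minimal solution reached much more slowly by the iteration of Theorem 7.4 for the same model.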
For the strict nonlinearities, the strict monotonicity of the sequences holds only on D, not necessarily on ∂D, as in Lemma 7.6.
Remark 7.3. Although the result of Lemma 7.12(ii) is straightforward to deduce, it requires that û_0 > u. If lim_{w→∞} [F(w) − wḞ(w)] = f_∞(x) exists, then û_0 = [A − λρ_∞]^{-1} f_∞ will suffice, which can be used for some nonlinearities, e.g., F(x; w) = x + w + exp[−w], but not for all, e.g., F(x; w) = log(1 + x + w). In general, the following construction will serve the purpose. Let κ be a positive constant such that λ < ν0(κ). Since ν0(κ) increases with κ and ν0(∞) = λ* (Lemma 7.9), a suitable κ can be found for any λ < λ*. It is straightforward to check that û_0 = λ[A − λḞ(κ)]^{-1}[F(κ) − κḞ(κ)] ≥ u. However, with this choice, the strict inequality may be lost, but it is inconsequential for the concave non-linearities •
Theorem 7.6.
(i) Let F be convex, λ < λ* and let {û_n} be Newton's sequence generated with û_0 = 0, as in Lemma 7.12(i). Then û_n ↑ u uniformly with respect to x in D.
(ii) Let F be concave, λ < λ* and let {û_n} be Newton's sequence generated with û_0, as in Lemma 7.12(ii). Then û_n ↓ u uniformly with respect to x in D.
Proof. We give a proof for (i). (ii) follows similarly by invoking Lemma 7.12(ii) and noticing that in this case u is unique from Corollary 7.10. From Lemma 7.12(i), {û_n} is an increasing sequence bounded above by u and hence converges to some û ≤ u. Since Aû_n = λ[F(û_{n−1}) + (û_n − û_{n−1})Ḟ(û_{n−1})] and the right side converges to λF(û) pointwise as n → ∞, we have that û = lim_{n→∞} û_n = λA^{-1}F(û) ≤ u. As in Theorem 7.4, by the Lebesgue theorem (Theorem 1.3) and Dini's theorem (Lemma 1.2), we have that {û_n} converges uniformly on D to a fixed point û ≤ u. Since u is minimal, û = u •
Now we turn our attention to the calculation of the critical parameter λ* and focus mainly on the convex case. A parallel method is deducible for the concave case, which is commented upon briefly. While the method for the convex case is of considerable interest in
applications, the parallel method for the concave case is numerically less attractive than the one obtained in Corollary 7.11.
Lemma 7.13.
(i) Let $F$ be increasing and convex, let $\{u_n\}$ be the iterative sequence generated in Theorem 7.4 and let $\alpha_n(\lambda) = \nu_0(u_n(\lambda))$ (Eq. 7.39). Then for each $n \ge 1$, $\alpha_n(\lambda)$ is a positive, continuous and decreasing function of $\lambda$ in the interval $(0;\infty)$; $\alpha_n(0) = \text{const.} > 0$, and for each $\lambda$ in $(0;\infty)$, $\alpha_n(\lambda) > \alpha_{n+1}(\lambda) > \alpha(\lambda)$.
(ii) For a concave $F$, with the symbols as in (i), $\alpha_n(\lambda) < \alpha(\lambda)$ increases with $\lambda$ and $\alpha_n(\lambda) \le \text{const.}$
Proof. We give a proof for (i). (ii) follows similarly.
From the positivity lemma, $u_1(\lambda) = \lambda A^{-1}F(0)$ is positive. It is also a continuous and increasing function of $\lambda$ in $(0;\infty)$. Assume this result to be valid for $u_n(\lambda)$. Then $u_{n+1}(\lambda) = \lambda A^{-1}F(u_n(\lambda))$ is positive by the positivity lemma and the continuity follows from the Lebesgue theorem (Theorem 1.3) as in Theorem 7.4. Further, from the positivity lemma,
$$u_{n+1}(\lambda') - u_{n+1}(\lambda) = (\lambda' - \lambda)A^{-1}F(u_n(\lambda')) + \lambda A^{-1}[F(u_n(\lambda')) - F(u_n(\lambda))] > 0,$$
for $\lambda' > \lambda$. By induction, $u_n(\lambda)$, for each $n$, is a positive, continuous and increasing function of $\lambda$. Let $K_n(\lambda) = A^{-1/2}\dot{F}(u_n(\lambda))A^{-1/2}$. It is clear that $K_n(\lambda)$ is a positive, increasing and uniformly continuous, i.e., $\|K_n(\lambda) - K_n(\lambda')\| \to 0$ as $\lambda \to \lambda'$, compact operator valued function of $\lambda$.
Positivity and the monotonicity are obvious and the continuity follows by the Lebesgue theorem as in Remark 7.1. Consequently, its largest eigenvalue, $\|K_n(\lambda)\|$, is a positive and continuous (Lemma 3.2(ii), Remark 3.2(ii)) and increasing (Lemma 3.20) function, implying that $\alpha_n(\lambda) = \|K_n(\lambda)\|^{-1}$ is a positive, continuous and decreasing function of $\lambda$. Since $u_n(0) = 0$ for each $n$, $K_n(0) = K(0)$ is independent of $n$, positive and bounded, implying that $\alpha_n(0) = \text{const.} > 0$. Since for each $\lambda$, $\{u_n(\lambda)\}$ increases with $n$ (Theorem 7.4), and is bounded by $u(\lambda)$, it follows from Corollary 7.12 that $\alpha_n(\lambda) > \alpha_{n+1}(\lambda) > \alpha(\lambda)$ •
Theorem 7.7.
(i) Let $F$ be convex. Then $\alpha_n(\lambda)$ has a fixed point $\lambda_n$ for each $n$, and $\lambda_n \downarrow \lambda^*$.
(ii) Let $F$ be concave. Then $\alpha_n(\lambda)$ has a fixed point $\lambda_n$ for each $n$, and $\lambda_n \uparrow \lambda^*$.
Proof. Since $\alpha_n(0) = \text{const.} > 0$ and $\alpha_n(\lambda)$ is a continuous and decreasing function of $\lambda$, from Lemma 7.13, $\alpha_n(\lambda)$ has a unique fixed point $\lambda_n$. This result for a concave $F$ follows from the fact that $\alpha_n(\lambda)$ is bounded by a constant and increases with $\lambda$.
From Lemma 7.13 and Lemma 7.8, for $\lambda < \lambda^*$, $\alpha_n(\lambda) > \alpha_{n+1}(\lambda) > \alpha(\lambda) > \lambda^*$, and $\alpha_n(\lambda)$ is continuous and nonnegative on the positive real line. Consequently, $\{\lambda_n\}$ is a decreasing sequence bounded below by $\lambda^*$, and hence $\lambda_n \downarrow \bar{\lambda} \ge \lambda^*$. If $\bar{\lambda} > \lambda^*$ then $\lambda_n > \lambda^* + \varepsilon$ for each $\varepsilon$ such that $0 \le \varepsilon \le \varepsilon_0$ for some $\varepsilon_0 > 0$ and all $n$. Hence $\alpha_n(\lambda_n) \le \alpha_n(\lambda^* + \varepsilon)$, for $\alpha_n(\lambda)$ is a decreasing function of $\lambda$. Now, $u_n(\lambda)$ increases without bound on $D$ for $\lambda > \lambda^*$ (Theorem 7.4) and since $\alpha_n(\lambda^* + \varepsilon) = \nu_0(u_n(\lambda^* + \varepsilon))$, it follows from Lemma 7.10 that $\alpha_n(\lambda^* + \varepsilon) = \alpha_n(\lambda) \downarrow \nu_0(\infty) = \alpha(\lambda^* + \varepsilon)$, for each $0 \le \varepsilon \le \varepsilon_0$. Thus $\alpha_n(\lambda)$ converges monotonically to a continuous function $\alpha(\lambda)$ on the closed interval $[0; \lambda^* + \varepsilon_0]$, and hence uniformly from Dini's theorem as in Theorem 7.6. It follows that
$$\lim_{n\to\infty}\alpha_n(\lambda_n) = \lim_{n\to\infty}\lambda_n = \nu_0(\infty) \le \lambda^*,$$
from Lemma 7.10, which is a contradiction, implying the result •
Again, although the approximations based on the iterative sequence converge to the critical point, the convergence is usually too slow. Approximations based on Newton's sequence should be expected to yield a more rapidly convergent sequence. In the following we show the convergence of the approximations to the critical point generated with Newton's approximations to the solutions. As will be seen, computation of each approximation requires more effort than the one based on the iterative sequence, which can be justified in view of the improved rate of convergence.
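The construction of Theorem 7.7 can be tried numerically. The following is a minimal sketch, not the author's implementation, under these assumptions: the one-dimensional model $A = -d^2/dx^2$ on $(0,1)$ with zero boundary values, $F(u) = e^u$, a uniform finite-difference grid, and $\nu_0(u)$ taken as the lowest eigenvalue $\mu_0$ of $(A - \mu_0\dot{F}(u))\varpi_0 = 0$, computed by power iteration on $A^{-1}\dot{F}(u)$. All function names and the grid size are hypothetical.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[i] multiplies x[i-1], upper[i] multiplies x[i+1]."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

N = 50                                  # grid intervals on (0, 1); illustrative choice
H = 1.0 / N
OFF = [-1.0 / H ** 2] * (N - 1)
DIAG = [2.0 / H ** 2] * (N - 1)

def apply_inverse(rhs):
    """Discrete A^{-1} for A = -d^2/dx^2 with u(0) = u(1) = 0."""
    return thomas(OFF, DIAG, OFF, rhs)

def iterate(lam, n):
    """u_n(lam) from u_0 = 0 and u_{k+1} = lam * A^{-1} F(u_k), with F(u) = exp(u)."""
    u = [0.0] * (N - 1)
    for _ in range(n):
        u = apply_inverse([lam * math.exp(ui) for ui in u])
    return u

def nu0(u, sweeps=60):
    """Lowest eigenvalue mu of A v = mu * F'(u) v, by power iteration on A^{-1} F'(u)."""
    weight = [math.exp(ui) for ui in u]  # F'(u) = exp(u)
    v = [1.0] * (N - 1)
    growth = 1.0
    for _ in range(sweeps):
        y = apply_inverse([wi * vi for wi, vi in zip(weight, v)])
        growth = max(y)                  # v has maximum 1, so this tends to 1/mu
        v = [yi / growth for yi in y]
    return 1.0 / growth

def critical_estimate(n, lo=0.0, hi=11.0, steps=40):
    """Fixed point lam_n of the decreasing function alpha_n(lam) = nu0(u_n(lam)), by bisection."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if nu0(iterate(mid, n)) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `critical_estimate(n)` for `n = 1, 2, 3` gives a decreasing sequence of upper estimates of the critical value of the discretized problem. The sketch is only meant for small `n`: for larger `n` the trial iterates grow so fast at values of `lam` above the critical point that `math.exp` can overflow, which is one practical face of the slow convergence noted in the text.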
Lemma 7.14. Let $F$ be convex, let $\{\breve{u}_n\}$ be Newton's sequence generated in Lemma 7.12(i) and let $\breve{\alpha}_n(\lambda) = \nu_0(\breve{u}_n(\lambda))$ (Eq. 7.39). Then for each $n \ge 1$, we have
(i) $\breve{\alpha}_n(\lambda)$ has a unique fixed point $\breve{\lambda}_n$ such that $\lambda^* < \breve{\lambda}_n < \breve{\lambda}_{n-1}$;
(ii) $\breve{u}_n(\lambda)$ is a positive, continuous and increasing function of $\lambda$ on the interval $(0; \breve{\lambda}_{n-1})$;
(iii) $\breve{\alpha}_n(\lambda)$ is a positive, continuous and decreasing function of $\lambda$ on the interval $(0; \breve{\lambda}_{n-1})$, and $\breve{\alpha}_n(0) = \text{const.} > 0$.
For a concave $F$ with the sequence as in Lemma 7.12(ii), the inequalities in (i) to (iii) are reversed, which follows by transposing the arguments for the convex case.
Proof. The proof is by induction. It is clear that $\breve{\alpha}_0(\lambda) = \nu_0(0) = \breve{\lambda}_0 > \lambda^*$. We have that
$$\breve{u}_1(\lambda) = \lambda[A - \lambda\dot{F}(0)]^{-1}F(0)$$
is positive by the positivity lemma and clearly $\breve{u}_1(0) = 0$. Also, $\breve{u}_1(\lambda)$ is differentiable with respect to $\lambda$ with its derivative $\dot{\breve{u}}_1(\lambda)$ given by
$$\dot{\breve{u}}_1(\lambda) = [A - \lambda\dot{F}(0)]^{-1}F(0) + \lambda[A - \lambda\dot{F}(0)]^{-1}\dot{F}(0)[A - \lambda\dot{F}(0)]^{-1}F(0).$$
It follows from the positivity lemma that $\dot{\breve{u}}_1(\lambda) > 0$. The derivative is also continuous, which follows from the continuity of $[A - \lambda\dot{F}(0)]^{-1}F(0)$. Consequently, $[\breve{u}_1(\lambda') - \breve{u}_1(\lambda)] \downarrow 0$ as $\lambda' \downarrow \lambda$. These properties of $\breve{u}_1(\lambda)$ imply the properties stated in (iii) for $\breve{\alpha}_1(\lambda)$ exactly as in Lemma 7.13. Further, since $\breve{u}_0 < \breve{u}_1 < u$ (Lemma 7.12(i)) for $\lambda$ in $(0; \breve{\lambda}_0)$, $\alpha(\lambda) < \breve{\alpha}_1(\lambda) < \breve{\alpha}_0(\lambda)$ from Corollary 7.12. It is clear that $\breve{\alpha}_1(0) = \breve{\lambda}_0$. These properties of $\breve{\alpha}_1(\lambda)$ imply that it has a unique fixed point $\breve{\lambda}_1$ such that $\lambda^* < \breve{\lambda}_1 < \breve{\lambda}_0$.
Assume (i) to (iii) to hold for $n$. For $(n+1)$ we have that
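$$[A - \lambda\dot{F}(\breve{u}_n)]\breve{u}_{n+1}(\lambda) = \lambda[F(\breve{u}_n(\lambda)) - \breve{u}_n(\lambda)\dot{F}(\breve{u}_n(\lambda))],$$
i.e., $\breve{u}_{n+1}(0) = 0$. By the induction assumptions (ii) and (iii), $\breve{u}_n(\lambda)$, $\breve{\alpha}_n(\lambda)$ are defined on the open interval $(0; \breve{\lambda}_{n-1})$ and since $\breve{\lambda}_n < \breve{\lambda}_{n-1}$ from (i), it contains the interval $(0; \breve{\lambda}_n)$. Thus $\breve{u}_{n+1}(\lambda)$ is defined for $\lambda$ in the interval $(0; \breve{\lambda}_n)$. Let $\lambda < \lambda' < \breve{\lambda}_n$. We have
$$\begin{aligned}
A[\breve{u}_{n+1}(\lambda') - \breve{u}_{n+1}(\lambda)] &= \breve{\eta}(\lambda';\lambda) + \lambda\dot{F}(\breve{u}_n(\lambda))(\breve{u}_{n+1}(\lambda') - \breve{u}_{n+1}(\lambda)) \\
&= \lambda'\theta(\breve{u}_{n+1}(\lambda'); \breve{u}_n(\lambda')) - \lambda\theta(\breve{u}_{n+1}(\lambda); \breve{u}_n(\lambda)) \\
&= (\lambda' - \lambda)\theta(\breve{u}_{n+1}(\lambda'); \breve{u}_n(\lambda')) + \lambda[\theta(\breve{u}_{n+1}(\lambda'); \breve{u}_n(\lambda')) - \theta(\breve{u}_{n+1}(\lambda); \breve{u}_n(\lambda))] \\
&> \lambda[\theta(\breve{u}_{n+1}(\lambda'); \breve{u}_n(\lambda')) - \theta(\breve{u}_{n+1}(\lambda); \breve{u}_n(\lambda))] = \lambda\hat{\eta}(\lambda';\lambda).
\end{aligned}$$
We have used Eq. (7.37-a) to deduce that
$$\theta(\breve{u}_{n+1}(\lambda'); \breve{u}_n(\lambda')) = F(\breve{u}_n(\lambda')) + (\breve{u}_{n+1}(\lambda') - \breve{u}_n(\lambda'))\dot{F}(\breve{u}_n(\lambda')) > 0,$$
since $\breve{u}_{n+1}(\lambda') > \breve{u}_n(\lambda')$ for $\lambda' < \nu_0(\breve{u}_n(\lambda')) = \breve{\alpha}_n(\lambda')$ from Lemma 7.12(i). Since $\breve{\lambda}_n$ is the fixed point of $\breve{\alpha}_n(\lambda)$ with the properties stated in (iii), $\lambda' < \breve{\alpha}_n(\lambda')$ for each $\lambda' < \breve{\lambda}_n$. Now
$$\begin{aligned}
\hat{\eta}(\lambda';\lambda) &= [F(\breve{u}_n(\lambda')) - F(\breve{u}_n(\lambda)) - (\breve{u}_n(\lambda') - \breve{u}_n(\lambda))\dot{F}(\breve{u}_n(\lambda))] \\
&\quad + (\breve{u}_{n+1}(\lambda') - \breve{u}_n(\lambda'))\left(\dot{F}(\breve{u}_n(\lambda')) - \dot{F}(\breve{u}_n(\lambda))\right) \\
&\quad + (\breve{u}_{n+1}(\lambda') - \breve{u}_{n+1}(\lambda))\dot{F}(\breve{u}_n(\lambda)) \\
&> (\breve{u}_{n+1}(\lambda') - \breve{u}_{n+1}(\lambda))\dot{F}(\breve{u}_n(\lambda)).
\end{aligned}$$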
We have used the convexity of $F$, the induction assumptions and Lemma 7.12(i). Thus
$$[A - \lambda\dot{F}(\breve{u}_n(\lambda))](\breve{u}_{n+1}(\lambda') - \breve{u}_{n+1}(\lambda)) > 0,$$
and hence from the positivity lemma, $(\breve{u}_{n+1}(\lambda') - \breve{u}_{n+1}(\lambda)) > 0$ for $\lambda < \breve{\alpha}_n(\lambda)$, i.e., for each $\lambda < \lambda' < \breve{\lambda}_n$. Positivity of $\breve{u}_{n+1}(\lambda')$ follows from this by setting $\lambda = 0$. For the continuity we observe, following the above steps, that
$$(\breve{u}_{n+1}(\lambda') - \breve{u}_{n+1}(\lambda)) = [A - \lambda\dot{F}(\breve{u}_n(\lambda))]^{-1}\breve{\eta}(\lambda';\lambda).$$
It is straightforward to check that for a fixed $(n+1)$, $\breve{\eta}(\lambda';\lambda) \to 0$ as $\lambda' \to \lambda$. An application of the Lebesgue theorem (Theorem 1.3) then shows that $(\breve{u}_{n+1}(\lambda') - \breve{u}_{n+1}(\lambda)) \to 0$ as $\lambda' \to \lambda$. Thus (ii) is established for $\breve{u}_{n+1}(\lambda)$. This implies that $\breve{\alpha}_{n+1}(\lambda)$ is a positive, continuous and decreasing function of $\lambda$ in $(0; \breve{\lambda}_n)$ from the same argument as in Lemma 7.13.
Further, from Lemma 7.12(i), $u(\lambda) > \breve{u}_{n+1}(\lambda) > \breve{u}_n(\lambda)$ for $\lambda$ in $(0; \breve{\lambda}_n)$ and hence $\alpha(\lambda) < \breve{\alpha}_{n+1}(\lambda) < \breve{\alpha}_n(\lambda)$ from Corollary 7.12. This implies, as for $n = 1$, that $\breve{\alpha}_{n+1}(\lambda)$ has a unique fixed point $\breve{\lambda}_{n+1}$ such that $\lambda^* < \breve{\lambda}_{n+1} < \breve{\lambda}_n$.
In the concave case, the transposed arguments are supplemented with the following. In that case, the interval $(0; \lambda^*)$ contains $(0; \breve{\lambda}_n)$, which contains $(0; \breve{\lambda}_{n-1})$. Therefore, $u(\lambda) < \breve{u}_n(\lambda) < \breve{u}_{n-1}(\lambda)$ for $\lambda$ in $(0; \breve{\lambda}_n)$ from Lemma 7.12(ii) •
In proving Lemma 7.14, we have also obtained
Corollary 7.13. Let the symbols be as in Lemma 7.14. Then for each $\lambda$ in $(0; \breve{\lambda}_{n-1})$, $u(\lambda) > \breve{u}_n(\lambda) > \breve{u}_{n-1}(\lambda)$ and $\alpha(\lambda) < \breve{\alpha}_n(\lambda) < \breve{\alpha}_{n-1}(\lambda)$ •
The above results are sufficient to obtain the approximation result for the critical point, as follows.
Theorem 7.8. Let $\breve{\lambda}_n$ be as in Lemma 7.14. Then for convex $F$, $\breve{\lambda}_n \downarrow \lambda^*$ and for concave $F$, $\breve{\lambda}_n \uparrow \lambda^*$.
Proof. Again, we give a proof for the convex case. From Lemma 7.14(i), $\{\breve{\lambda}_n\}$ is a decreasing sequence bounded below by $\lambda^*$, and hence $\breve{\lambda}_n \downarrow \bar{\lambda} \ge \lambda^*$.
If $\bar{\lambda} > \lambda^*$ then for all $n$, $\breve{\lambda}_n > \lambda^* + 2\varepsilon$ for each non-negative $\varepsilon$ bounded by some $\varepsilon_0$. Consequently $\breve{\alpha}_n(\lambda)$ is defined on $(0; \lambda^* + 2\varepsilon)$, from Lemma 7.14(iii), and $\breve{\alpha}_n(\breve{\lambda}_n) < \breve{\alpha}_n(\lambda^* + 2\varepsilon)$ there. Now, $\{\breve{u}_n(\lambda^* + \varepsilon)\}$ is an increasing sequence from Lemma 7.12(i). If it is uniformly bounded then it must converge uniformly to some positive solution $u(\lambda^* + \varepsilon)$ as in Theorem 7.4. This implies that $\lambda^* + \varepsilon_0 < \lambda^*$, which contradicts the definition of $\lambda^*$. Hence $\{\breve{u}_n(\lambda^* + \varepsilon)\}$ diverges for some $x$ in $D$. Since $\breve{u}_n(\lambda^* + \varepsilon)$ is continuous on $D$ and increasing, it diverges on a set of nonzero measure in $D$, and hence everywhere as in Lemma 7.7. We have used the positivity of Green's function corresponding to $[A - \lambda\dot{F}(\breve{u}_{n-1}(\lambda))]^{-1}$, which is assured by the positivity lemma. Now it follows as in Eq. (7.40) that $\breve{\alpha}_n(\lambda^* + \varepsilon) \to \nu_0(\infty)$ as $n \to \infty$. We also have that $\breve{\alpha}_n(\lambda) \downarrow \alpha(\lambda)$ uniformly with respect to $\lambda \le \lambda^* + \varepsilon_0$ as in Theorem 7.7. Hence $\breve{\lambda}_n = \breve{\alpha}_n(\breve{\lambda}_n) \to \nu_0(\infty) \le \lambda^*$ as $n \to \infty$ (Lemma 7.10), which is a contradiction since $\breve{\lambda}_n > \lambda^*$. Consequently $\bar{\lambda} = \lambda^*$ •
Remark 7.4. In this section solving the nonlinear equations of the type of Eq. (7.32-a),
$$Au = \lambda F(u) = F_\lambda(u),$$
has been reduced to solving a sequence of the linear equations of the type of Eq. (7.29),
$$(A - \lambda\rho)f = g, \qquad (7.42\text{-a})$$
supplemented with the eigenvalues obtained by solving the equations of the type of Eq. (7.28),
$$(A - \mu_0\rho)\varpi_0 = 0. \qquad (7.43\text{-a})$$
Eqs. (7.42-a) and (7.43-a) can be reduced to the equations involving a compact operator as indicated in Eq. (7.28); equivalently by Friedrichs' construction: Complete $H$ to $H_+$ with respect to the scalar product $(u,\upsilon)_+ = (u, A\upsilon)$, which converts Eq. (7.42-a) into
$$(1 - \lambda K)f = Kg \qquad (7.42\text{-b})$$
and Eq. (7.43-a) into
$$(\mu_0^{-1} - K)\varpi_0 = 0, \qquad (7.43\text{-b})$$
where $K$ is the closure of $A^{-1}\rho$ in $H_+$, equivalently of $A^{-1/2}\rho A^{-1/2}$ (Theorem 4.2, Lemma 4.4). Variational methods are pre-eminently suitable to solve Eqs. (7.42-a) to (7.43-b) by the methods of Theorem 7.1(i) and sec. 3.II.2. In case of Eq. (7.42-a), this yields a sequence of approximations $\{f_n\}$ converging to $f$ with respect to $\|\cdot\|$. As can be seen by the same arguments as used above, this sequence can be converted into a uniformly convergent sequence $\{f_n'\}$ by one operation of Green's function, e.g.,
$$f_n' = A^{-1}(g + \lambda\rho f_n), \qquad (7.44)$$
which is straightforward to check •
7.IV. TRANSPORT AND PROPAGATION
Several examples are considered in this section, which are encountered in the areas generally classified as transport phenomena, including propagation. Similar problems arise in other areas as well. The purpose here is to select problems that can be solved by the methods developed earlier, and to illustrate their applications.
7.IV.1. Reaction-Diffusion
The physical solution of the steady-state, isothermal reaction-diffusion of a substance involving finite-order kinetics is determined by [Aris, 1975; Vatsya, 1987]
$$\frac{d^2\omega}{dx^2} = \lambda\omega^\nu, \quad \nu \ge 1,\ \lambda \ge 0,\ 0 < x < 1; \quad \dot{\omega}(0) = 0,\ \omega(1) = 1. \qquad (7.45)$$
For physical reasons, a positive solution is sought. By setting $u = (1 - \omega)$, Eq. (7.45) is reduced to an equivalent equation
$$Au = -\frac{d^2u}{dx^2} = \lambda(1 - u)^\nu = F_\lambda(\nu; u), \quad \dot{u}(0) = 0,\ u(1) = 0, \qquad (7.46)$$
which is of the form of Eq. (7.32-a). The operator $B$ defined by $Bu = A^{-1}F_\lambda(\nu; u)$ is Fréchet differentiable with derivative $\dot{B}_u = -\lambda\nu A^{-1}(1 - u)^{\nu-1}$. Let $S[0,1]$ be the set of non-negative functions $u$ bounded by $u' = 1$. In $H = L^2([0;1], dx)$, we have that $\|u'\| = 1$. Also let $G_0(x,y)$ denote Green's function for $A^{-1}$. It follows that
$$|(Bu)(x)| = \lambda\int_0^1 dy\, G_0(x,y)(1 - u(y))^\nu \le \lambda\,|||(1 - u)^\nu|||\int_0^1 dy\, G_0(x,y)u'(y) \le \lambda\|A^{-1}u'\| \le \frac{4\lambda}{\pi^2} \le \frac{4\lambda\nu}{\pi^2}, \quad \nu \ge 1,$$
where $|||\cdot|||$ denotes the supremum norm. We have used the facts that Green's function is positive and the lowest eigenvalue of $A$ is equal to $\pi^2/4$. The same steps show that $\dot{B}$ is bounded by $4\lambda\nu/\pi^2$. It is clear that for $\lambda < \pi^2/(4\nu)$, $B$ is a contraction of $S[0,1]$ into $S[0,1]$. Thus, the solution of Eq. (7.46) can be obtained by iteration from the contraction mapping theorem (Lemma 4.20) starting with an arbitrary function in $S[0,1]$. However, $B$ is a positive, decreasing and convex function of $u$ for all positive values of $\lambda$. Therefore existence of the solution is expected for all positive $\lambda$. This is expected on physical grounds also. Thus, the condition $\lambda < \pi^2/(4\nu)$ is a serious limitation. We show below that the method of Theorem 7.3 together with the set defined by Proposition 7.2(i) can be used to obtain the solution for this case for all positive values of $\lambda$. For $\nu = 1$, Eq. (7.46) is easily solved to yield
$0 \le u(x) = 1 - \cosh(\sqrt{\lambda}\,x)/\cosh(\sqrt{\lambda}) = \chi(x)$. Let $\nu \ge 2$. With $\eta = \lambda$, $u_\eta = \chi \le 1$ and $0 \le F_\lambda(\nu; 0) = \lambda$. Thus $F_\lambda(\nu; 0)$ is in $H = L^2([0;1], dx)$. Let $Q_\eta^+$ be the set containing all $\upsilon$ such that $0 \le \upsilon \le u_\eta$. For each $\upsilon$ in $Q_\eta^+$, we have
$$-\delta = -\nu\lambda\left(1/\cosh\sqrt{\lambda}\right)^{\nu-1} \ge \dot{F}_\lambda \ge -\nu\lambda.$$
Thus, condition (ii) of Lemma 7.2 is satisfied with $\gamma = -\delta$ and $\zeta = \nu\lambda$. Let $\kappa \ge (\nu\lambda + \delta)/2$, which satisfies the remaining condition of Lemma 7.3. For each $\upsilon$ in $Q_\eta^+$,
$$\lambda + (\kappa - \lambda)\upsilon = \kappa\upsilon + \lambda(1 - \upsilon) \ge \kappa\upsilon + F_\lambda(\nu; \upsilon) \ge 0.$$
For $\nu \ge 2$, $\kappa \ge \lambda$; consequently,
$$\lambda + (\kappa - \lambda)\upsilon \le \lambda + (\kappa - \lambda)u_\eta = F_\lambda(\nu; 0) + (\kappa - \eta)u_\eta,$$
and thus, the result of Proposition 7.2(i) holds. Now, it follows from Theorem 7.3 that Eq. (7.46) has a unique solution $u$ in $Q_\eta^+$ for each non-negative $\lambda$, which can be approximated uniformly by an iterative sequence
$\{{}_\kappa u_n\}_{n=0}^{\infty}$ generated with an arbitrary $u_0$ in $Q_\eta^+$. If $u_0 = 0$ and $\kappa \ge \nu\lambda$, then from Proposition 7.1, the convergence is also monotonic from below. The restriction $\nu \ge 2$ is not necessary. The above arguments hold for $1 \le \nu < 2$ as well, with $\kappa \ge \max[\lambda, (\nu\lambda + \delta)/2]$.
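The role of the shift $\kappa$ can be illustrated numerically. The sketch below is not taken from the text: it assumes that the iteration of Theorem 7.3 has the shifted form $(A + \kappa)u_{n+1} = \kappa u_n + F_\lambda(\nu; u_n)$, which is suggested by the inequalities above but not stated in this section, and it discretizes Eq. (7.46) by second-order finite differences with a ghost point for $\dot{u}(0) = 0$. The function names, grid size and tolerances are hypothetical.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[i] multiplies x[i-1], upper[i] multiplies x[i+1]."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def shifted_iteration(lam, nu, kappa, n=200, tol=1e-11, max_iter=5000):
    """Iterate (A + kappa) u_new = kappa*u + lam*(1 - u)**nu from u_0 = 0.

    A = -d^2/dx^2 on (0, 1) with u'(0) = 0, u(1) = 0; unknowns sit at
    x_i = i*h, i = 0..n-1, and the value u(1) = 0 is eliminated.
    """
    h = 1.0 / n
    inv_h2 = 1.0 / (h * h)
    lower = [-inv_h2] * n
    upper = [-inv_h2] * n
    upper[0] = -2.0 * inv_h2            # ghost point u_{-1} = u_1 enforces u'(0) = 0
    diag = [2.0 * inv_h2 + kappa] * n
    x = [i * h for i in range(n)]
    u = [0.0] * n
    history = [u]
    for _ in range(max_iter):
        rhs = [kappa * ui + lam * (1.0 - ui) ** nu for ui in u]
        new = thomas(lower, diag, upper, rhs)
        history.append(new)
        change = max(abs(a - b) for a, b in zip(new, u))
        u = new
        if change < tol:
            break
    return x, u, history
```

For example, `shifted_iteration(10.0, 2, 20.0)` uses $\lambda = 10$, well beyond the contraction range $\lambda < \pi^2/(4\nu)$, with $\kappa = \nu\lambda$; for $\nu = 1$ the result can be compared against the closed form $\chi(x)$ given above.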
7.IV.2. Heat Transfer
Heat is produced during an exothermic reaction in a bulk of organic material, such as a gas or a wood pile, with a consequent increase in temperature. The rate of heat production typically increases with increase in the ambient temperature. As the heat is produced, a part of it is lost to the surrounding environment at the surface of the material. If a steady state is reached, the material stays in thermal equilibrium. Otherwise, the temperature rises rapidly and an explosion occurs, usually in the form of spontaneous ignition. Due to its applications, the phenomenon has attracted considerable attention. The steady state temperature for this case is given by the solutions of the Frank-Kamenetskii equation [Frank-Kamenetskii, 1940, 1955],
$$-\nabla^2 u = \lambda e^u = \lambda F(u),\ x \text{ in } D; \quad u(x) = 0,\ x \text{ in } \partial D, \qquad (7.47)$$
where $D$ is the interior of the material and $\partial D$ is its boundary. Eq. (7.47) is valid for $\lambda < \lambda^*$, where $\lambda^*$ is the explosion point; the parameter is usually the pressure in the gas and, in the case of solid organic material, the size of the pile together with its geometric configuration. Both the temperature $u$ and the critical parameter value $\lambda^*$ are of interest. Eq. (7.47) is a special case of the problem analyzed in detail in the last section. In particular, the methods of Theorems 7.4 and 7.6 can be used to obtain satisfactory approximations to $u$, and those of Theorems 7.7 and 7.8 can be used to generate converging sequences of approximations to $\lambda^*$, supplemented with Remark 7.4. As indicated earlier, the schemes based on Newton's method produce considerably more rapidly converging sequences. Numerical performance of the procedures is reported in [Moise and Pritchard, 1989].
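As an illustration of Newton's sequence of Theorem 7.6 applied to Eq. (7.47), the following minimal sketch treats the one-dimensional slab $D = (0,1)$ with a uniform finite-difference grid; the geometry, grid size, tolerances and function names are assumptions made here for illustration only. Each step solves the linear problem $[A - \lambda e^{\breve{u}_n}]\breve{u}_{n+1} = \lambda e^{\breve{u}_n}(1 - \breve{u}_n)$, which is of the type of Eq. (7.42-a), starting from $\breve{u}_0 = 0$.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[i] multiplies x[i-1], upper[i] multiplies x[i+1]."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def newton_frank_kamenetskii(lam, n=200, tol=1e-10, max_iter=50):
    """Newton's sequence for -u'' = lam*exp(u) on (0, 1), u(0) = u(1) = 0, from u_0 = 0.

    Returns the last iterate on the interior grid points and the list of
    all iterates, so that the monotone growth can be inspected.
    """
    h = 1.0 / n
    inv_h2 = 1.0 / (h * h)
    m = n - 1                           # number of interior grid points
    off = [-inv_h2] * m
    u = [0.0] * m
    history = [u]
    for _ in range(max_iter):
        e = [math.exp(ui) for ui in u]  # F(u) = F'(u) = exp(u)
        diag = [2.0 * inv_h2 - lam * ei for ei in e]
        rhs = [lam * ei * (1.0 - ui) for ei, ui in zip(e, u)]
        new = thomas(off, diag, off, rhs)
        history.append(new)
        change = max(abs(a - b) for a, b in zip(new, u))
        u = new
        if change < tol:
            break
    return u, history
```

For this slab the lower-branch solution at $\lambda = 1$ has a maximum of about $0.1405$ at the midpoint, which the sketch reproduces in a handful of steps. The linear solve uses no pivoting, which is adequate only while $\lambda$ stays below the critical value of the discretized problem.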
7.IV.3. Radiative Transfer
The preceding examples were treated by exploiting the positivity lemma associated with the elliptic operators and the functions that arise in modeling them. In this subsection, we consider radiative transfer in the atmosphere, which is modeled in terms of a positive, increasing and convex operator in a Banach space, although these properties do not arise as a result of the elliptic operators and the functions with these properties. This necessitates extensions of the same concepts, but the consequent arguments are then remarkably similar [Vatsya, 1981-2].
Radiative transfer is determined by the solutions of the nonlinear Chandrasekhar H-equation
$$u(z) = 1 + z\,u(z)\int_0^1 \frac{d\rho(x)}{z + x}\,u(x), \qquad (7.48)$$
where $\rho(x)$ is determined by the ambient atmospheric properties and its interaction with the radiation, which will be assumed to be a non-decreasing function, continuous at the boundary, i.e., zero and one are the sets of $\rho$-measure zero. Without loss of generality, we can set $\rho(0) = 0$. The solution of interest $u$ is known as the Chandrasekhar H-function.
As will become clear, $L^1([0;1], d\rho(x))$ with norm defined by $\|\upsilon\| = \int_0^1 d\rho(x)\,|\upsilon(x)|$, abbreviated as $L^1$, is better suited as the underlying Banach space for an analysis of Eq. (7.48). Let the operator $A$ from $L^1$ to $L^1$ be defined by $Au = (1 + uBu)$, where
$$(Bu)(z) = z\int_0^1 \frac{d\rho(x)}{z + x}\,u(x). \qquad (7.49\text{-a})$$
It is clear that $u$ is a fixed point of $A$, i.e.,
$$u = 1 + uBu. \qquad (7.49\text{-b})$$
Lemma 7.15. The Fréchet derivative $\dot{A}(\upsilon)$ of $A$ exists at each $\upsilon$ in $L^1$ with $\|\dot{A}(\upsilon)\| \le \|\upsilon\|$.
Proof. It is straightforward to check that
$$(\dot{A}(\upsilon)w)(z) = \int_0^1 d\rho(x)\,\frac{z}{z + x}\,\alpha(z,x)$$
for each $w$ in $L^1$, where $\alpha(z,x) = [w(z)\upsilon(x) + w(x)\upsilon(z)]$, and that
$$\begin{aligned}
\|\dot{A}(\upsilon)w\| &\le \int_0^1 d\rho(z)\int_0^1 d\rho(x)\,\frac{z}{z + x}\,|\alpha(z,x)| \\
&= \frac{1}{2}\left[\int_0^1 d\rho(z)\int_0^1 d\rho(x)\,|\alpha(z,x)| - \int_0^1 d\rho(z)\int_0^1 d\rho(x)\,\frac{x - z}{z + x}\,|\alpha(z,x)|\right] \\
&= \frac{1}{2}\int_0^1 d\rho(z)\int_0^1 d\rho(x)\,|\alpha(z,x)| \le \|\upsilon\|\,\|w\|,
\end{aligned}$$
implying the result. The interchange of the order of integration is justified by Fubini's theorem (Theorem 1.4), and the second integral vanishes as it is invariant under the exchange of $x$ and $z$ but changes the sign •
The results of Lemma 7.16 are stated without proofs as they follow by straightforward substitutions.
Lemma 7.16. With $\upsilon$, $w$ in $L^1$, $\kappa_0 = (\rho(1) - \rho(0)) = \rho(1)$ and $A$ as in Eq. (7.48), we have
(i) for $\upsilon \ge 0$, $B\upsilon \ge 0$, $A\upsilon \ge 1$ for each $z \ge 0$;
(ii) for $\upsilon, w \ge 0$, $\dot{A}(\upsilon)w = wB\upsilon + \upsilon Bw \ge 0$;
(iii) $\|A\upsilon\| \le \kappa_0 + \frac{1}{2}\|\upsilon\|^2$;
(iv) $|(B\upsilon)(z)| \le \|\upsilon\|$ for $z$ in $(0;\infty)$ •
It is clear from Lemma 7.15 that if $\kappa_0 < 1/2$ and $\|\upsilon\| \le [1 - \sqrt{1 - 2\kappa_0}] = d$, then $\|\dot{A}\| \le d < 1$. Also, from Lemma 7.16(iii), if $\|\upsilon\| \le d$ then $\|A\upsilon\| \le d$. Thus, $A$ is a contraction of the closed ball $B_d$ of radius $d$ centered at the origin into itself, implying the existence of a unique fixed point of $A$ in the ball, which can be obtained by iteration, by the contraction mapping theorem (Lemma 4.20). The kernel of the integral operator $B$ has all the properties of Green's function corresponding to Eq. (7.32-a) and $A$ will be seen to be a convex operator in Lemma 7.18. Consequently, parallel results can be obtained essentially by the same arguments. However, there are some technical differences requiring extensions. Due to the simplicity of $A$, the bifurcation point, i.e., $\kappa_0 = 1/2$, is determined relatively easily. The following result follows by substitutions.
Corollary 7.14. Let $\kappa_0 < 1/2$ and let $u$ be the fixed point of $A$ of Eq. (7.48). Then $u(z)$ is a positive and continuous function of $z$ in $(0;\infty)$ with $1 \le u(z) \le [1 - 2\kappa_0]^{-1/2}$ and $\|u\| \le d$ •
As indicated above, the iterative sequence $\{u_n\}_{n=0}^{\infty}$, where $u_{n+1} = Au_n$, $n \ge 0$, with an arbitrary $u_0$ in $B_d$, converges to $u$ with respect to $\|\cdot\|$. The convergence can be seen to be uniform with respect to $z$ in any compact subset of $[0;\infty)$ by the methods to be used in the following. With $u_0 = 0$, the sequence is also monotonic, as shown below.
Lemma 7.17. With the symbols as above, the iterative sequence resulting with $u_0 = 0$ converges monotonically and uniformly to the minimal positive fixed point $u$ of $A$ for all $z \ge 0$.
Proof. From Lemma 7.16(i), $u_1 = Au_0 \ge 0 = u_0$. Let $u_n \ge u_{n-1}$. Then using Eq. (4.54-b) and Lemma 7.16(ii), we have
$$u_{n+1} - u_n = Au_n - Au_{n-1} = \int_0^1 d\tau\,\dot{A}[u_{n-1} + \tau(u_n - u_{n-1})](u_n - u_{n-1}) \ge 0. \qquad (7.50)$$
Thus, by induction, the sequence is non-decreasing. It is also uniformly bounded by $[1 - 2\kappa_0]^{-1/2}$ as follows from Corollary 7.14. Consequently, the sequence converges monotonically to a positive function from below. Continuity of the members of the sequence and the limit follows from the continuity of the kernel and the Lebesgue theorem (Theorem 1.3). Hence, the convergence is uniform on $[0;1]$ by Dini's theorem (Lemma 1.2).
Let $u' \ge 0$ be a fixed point of $A$. Then the above argument with
$$u' - u_{n+1} = Au' - Au_n = \int_0^1 d\tau\,\dot{A}[u_n + \tau(u' - u_n)](u' - u_n)$$
yields that $\{u_n\}_{n=0}^{\infty}$ is bounded by $u'$. Consequently $u = \lim_{n\to\infty}u_n \le u'$ is the minimal positive solution.
In the above, we have restricted the domain of $u(z)$ to $[0;1]$. The result can be extended to $[0;\infty)$ using the equality
$$u' - u = (u' - u)Bu + u'B(u' - u) = [1 - Bu]^{-1}u'B(u' - u).$$
Since $u' - u \ge 0$ on $[0;1]$, $B(u' - u) \ge 0$ on $[0;\infty)$ from Lemma 7.16(i). Also, $0 \le Bu \le \|u\| < 1$. Thus if $u' \ge 0$ on $[0;\infty)$, $u' - u \ge 0$ on $[0;\infty)$ •
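The monotone iteration of Lemma 7.17 is easy to try numerically. The sketch below is an illustration under assumptions not made in the text: the measure is taken as $d\rho(x) = \kappa_0\,dx$ and the integral in Eq. (7.49-a) is replaced by a midpoint rule, so that only nodal values of $u$ on $[0;1]$ are computed. The function name, node count and tolerance are hypothetical. Integrating Eq. (7.49-b) against $d\rho$ and using the symmetry argument of Lemma 7.15 gives $\|u\| = \kappa_0 + \frac{1}{2}\|u\|^2$, which also holds for the discrete measure and provides a simple check on the computed fixed point.

```python
import math

def h_iteration(kappa0, n=40, tol=1e-13, max_iter=2000):
    """Iterate u_{k+1} = 1 + u_k * (B u_k) from u_0 = 0 at quadrature nodes.

    Assumption: d rho(x) = kappa0 dx on [0, 1], discretized by the midpoint
    rule with n equal weights kappa0/n (total mass kappa0 < 1/2).
    """
    x = [(j + 0.5) / n for j in range(n)]
    w = kappa0 / n
    kernel = [[w * xi / (xi + xj) for xj in x] for xi in x]   # discrete B
    u = [0.0] * n
    history = [u]
    for _ in range(max_iter):
        bu = [sum(k * uj for k, uj in zip(row, u)) for row in kernel]
        new = [1.0 + ui * b for ui, b in zip(u, bu)]
        history.append(new)
        change = max(abs(a - b) for a, b in zip(new, u))
        u = new
        if change < tol:
            break
    norm = w * sum(u)               # discrete L1(rho) norm of the non-negative limit
    return x, u, norm, history
```

With `kappa0 = 0.4` the returned `norm` should agree with $d = 1 - \sqrt{1 - 2\kappa_0}$ to within the iteration tolerance, the iterates in `history` increase at every node, and the limit stays between $1$ and $[1 - 2\kappa_0]^{-1/2}$, in line with Corollary 7.14.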
Again, the rate of convergence can be improved by a suitable choice of $\eta$ in Eq. (4.55-a) or its special case, Newton's method, Eq. (4.55-b). A natural choice for $\eta$ in the present case is the function $Bu$, reducing the fixed point equation, Eq. (7.49-b), to
$$u = (1 - Bu)^{-1}, \qquad (7.51)$$
which generates a sequence $\{\bar{u}_n\}$ by $\bar{u}_{n+1} = (1 - B\bar{u}_n)^{-1}$. Newton's sequence $\{\breve{u}_n\}$ resulting from Eq. (4.55-b) is given by $\breve{u}_{n+1} = \chi(\breve{u}_n)$, where
$$\chi(\upsilon) = [1 - \dot{A}(\upsilon)]^{-1}[A\upsilon - \dot{A}(\upsilon)\upsilon]. \qquad (7.52)$$
We focus on Newton's sequence and show later that $u_n \le \bar{u}_n \le \breve{u}_n$, which implies the convergence of $\{\bar{u}_n\}$ as well.
Lemma 7.18. Let $\dot{A}(\upsilon)$ be as in Lemmas 7.15 and 7.16(ii). Then for $\tilde{\upsilon} \ge \upsilon \ge 0$ and continuous $w \ge 0$, we have that $[\dot{A}(\tilde{\upsilon}) - \dot{A}(\upsilon)]w \ge 0$, i.e., $A$ is convex. Further, with $\|\upsilon\| < 1$, $[1 - \dot{A}(\upsilon)]^{-1}w \ge 0$.
Proof. The first part follows by observing that
$$[\dot{A}(\tilde{\upsilon}) - \dot{A}(\upsilon)]w = (\tilde{\upsilon} - \upsilon)Bw + wB(\tilde{\upsilon} - \upsilon) = \dot{A}(\tilde{\upsilon} - \upsilon)w \ge 0,$$
from Lemma 7.16(ii). For the second part, we have that $\|\dot{A}(\upsilon)\| \le \|\upsilon\| < 1$ (Lemma 7.15) and hence the Neumann expansion (Lemma 2.3) of $[1 - \dot{A}(\upsilon)]^{-1}$ converges in $L^1$ and each term of the series is non-negative from Lemma 7.16(ii). Thus, $[1 - \dot{A}(\upsilon)]^{-1}w$ is a non-negative function in $L^1$ •
In the following, we adjust the arguments of the last section to deduce essentially the same results.
Corollary 7.15. Let $w, \upsilon \ge 0$. We have $\theta(w;\upsilon) = Aw - A\upsilon - \dot{A}(\upsilon)(w - \upsilon) \ge 0$.
Proof. Since from Eq. (4.54-b),
$$\theta(w;\upsilon) = \int_0^1 d\tau\,\{\dot{A}[\upsilon + \tau(w - \upsilon)] - \dot{A}(\upsilon)\}(w - \upsilon)$$
and $[\upsilon + \tau(w - \upsilon)] \ge \upsilon$ for $0 \le \tau \le 1$, the result follows from the convexity of $A$ (Lemma 7.18) •
Lemma 7.19. Let $0 \le \upsilon \le u$ on $[0;1]$. Then $\|\dot{A}(\upsilon)\| < 1$ and $w = \chi(\upsilon) \le u$ on $[0;1]$.
Proof. Since $0 \le \upsilon \le u$, $\|\upsilon\| \le \|u\| \le (1 - \sqrt{1 - 2\kappa_0}) < 1$ (Corollary 7.14). The first part now follows from $\|\dot{A}(\upsilon)\| \le \|\upsilon\|$ (Lemma 7.15). For the second part, we have
$$[1 - \dot{A}(\upsilon)](u - w) = [Au - A\upsilon - \dot{A}(\upsilon)(u - \upsilon)] = \theta(u;\upsilon) \ge 0$$
for $u \ge \upsilon \ge 0$ (Corollary 7.15). Since $\|\dot{A}(\upsilon)\| < 1$, the result follows from the second part of Lemma 7.18 •
Lemma 7.20. Let $\{\breve{u}_n\}$ be Newton's sequence generated by $\breve{u}_0 = 0$, i.e., $\breve{u}_{n+1} = \chi(\breve{u}_n)$ for $n \ge 0$. Then $\breve{u}_n \le \breve{u}_{n+1} \le u$ for each $n$ on $[0;1]$.
Proof. It is clear that $0 = \breve{u}_0 \le \breve{u}_1 = 1 \le u$. Assuming that $\breve{u}_{n-1} \le \breve{u}_n \le u$, we have
$$\begin{aligned}
\breve{u}_{n+1} - \breve{u}_n &= [A\breve{u}_n + \dot{A}(\breve{u}_n)(\breve{u}_{n+1} - \breve{u}_n)] - [A\breve{u}_{n-1} + \dot{A}(\breve{u}_{n-1})(\breve{u}_n - \breve{u}_{n-1})] \\
&= \theta(\breve{u}_n; \breve{u}_{n-1}) + \dot{A}(\breve{u}_n)(\breve{u}_{n+1} - \breve{u}_n) \ge \dot{A}(\breve{u}_n)(\breve{u}_{n+1} - \breve{u}_n)
\end{aligned}$$
from Corollary 7.15. Since $0 \le \breve{u}_n \le u$, it follows that $\|\dot{A}(\breve{u}_n)\| < 1$ (Lemma 7.19). Therefore, from Lemma 7.18 and Lemma 7.19, $\breve{u}_n \le \breve{u}_{n+1} \le u$. By induction $0 \le \breve{u}_n \le \breve{u}_{n+1} \le u$ on $[0;1]$ for all $n$ •
The results obtained so far are sufficient to conclude the convergence of Newton's sequence on $[0;1]$. However, the result of Lemma 7.20 can be extended to $[0;\infty)$ as follows. The equation $\breve{u}_{n+1} = \chi(\breve{u}_n)$ reduces to
$$\breve{u}_{n+1}(z) = 1 + \breve{u}_{n+1}(z)(B\breve{u}_n)(z) + \breve{u}_n(z)[B(\breve{u}_{n+1} - \breve{u}_n)](z). \qquad (7.53)$$
From Lemma 7.16(iv) and Lemma 7.20, we have
$$|(B\breve{u}_n)(z)| \le \|\breve{u}_n\| \le \|u\| < 1.$$
Therefore, Eq. (7.53) defines a continuous $\breve{u}_{n+1}(z)$ on $[0;\infty)$ if $\breve{u}_n(z)$ is defined there. Since $\breve{u}_0(z) = 0$ on $[0;\infty)$, this extends $\breve{u}_n(z)$ to $[0;\infty)$. The fixed point is extended by Eq. (7.51).
Lemma 7.21. Let $\{\breve{u}_n(z)\}$ be as defined by Eq. (7.53). Then $0 \le \breve{u}_n(z) \le \breve{u}_{n+1}(z) \le u(z)$ for each $n$ and $z \ge 0$.
Proof. Since $0 = \breve{u}_0 \le \breve{u}_1 = 1$, the result is true for $n = 0$. Assuming that $0 \le \breve{u}_{n-1} \le \breve{u}_n$, we have
$$\begin{aligned}
\breve{u}_{n+1} - \breve{u}_n &= (\breve{u}_{n+1} - \breve{u}_n)B\breve{u}_n + (\breve{u}_n - \breve{u}_{n-1})B(\breve{u}_n - \breve{u}_{n-1}) + \breve{u}_nB(\breve{u}_{n+1} - \breve{u}_n) \\
&= [1 - B\breve{u}_n]^{-1}[(\breve{u}_n - \breve{u}_{n-1})B(\breve{u}_n - \breve{u}_{n-1}) + \breve{u}_nB(\breve{u}_{n+1} - \breve{u}_n)].
\end{aligned}$$
For $z$ in $[0;1]$, $0 \le \breve{u}_n \le \breve{u}_{n+1}$ from Lemma 7.20. Therefore from Lemma 7.16(ii), $B(\breve{u}_n - \breve{u}_{n-1}) \ge 0$ on $[0;\infty)$. Also $0 \le \breve{u}_n \le u$ on $[0;1]$ and hence, $0 \le B\breve{u}_n \le \|\breve{u}_n\| < 1$ from Lemma 7.16(iv) and Lemma 7.20. These results and the assumption $0 \le \breve{u}_{n-1} \le \breve{u}_n$ are easily seen to imply that $(\breve{u}_{n+1} - \breve{u}_n) \ge 0$ on $[0;\infty)$. By induction $\{\breve{u}_n(z)\}$ is a positive non-decreasing sequence for all $z \ge 0$.
We have that $0 = \breve{u}_0 \le u$. Since
$$u - \breve{u}_{n+1} = [1 - B\breve{u}_n]^{-1}[(u - \breve{u}_n)B(u - \breve{u}_n) + \breve{u}_nB(u - \breve{u}_{n+1})],$$
the assumption $0 \le \breve{u}_n \le u$ on $[0;\infty)$ implies that $0 \le \breve{u}_{n+1} \le u$ exactly as above •
We have established that $\{\breve{u}_n\}$ is a non-negative and non-decreasing sequence bounded above by $u$, which is sufficient to conclude
Theorem 7.9. Let $\{\breve{u}_n\}$ be as in Lemma 7.21. Then $\breve{u}_n(z) \uparrow u(z)$ uniformly with respect to $z$ in any compact subset $S$ of $[0;\infty)$.
Proof. Since $\{\breve{u}_n\}$ is a non-decreasing sequence bounded by $u$, and $S$ is closed and bounded, it follows that $\breve{u}_n \uparrow \breve{u} \le u$. From Eq. (7.53) we have
$$\breve{u} = \lim_{n\to\infty}\breve{u}_{n+1} = 1 + \lim_{n\to\infty}[\breve{u}_{n+1}B\breve{u}_n + \breve{u}_nB(\breve{u}_{n+1} - \breve{u}_n)].$$
Since $\{(z/(z + x))(\breve{u}_{n+1} - \breve{u}_n)(x)\}$ is bounded by a $\rho$-integrable function $u$ and converges to zero pointwise, by the Lebesgue theorem (Theorem 1.3), $B(\breve{u}_{n+1} - \breve{u}_n) \to 0$ on $S$. By a similar argument it follows that $\breve{u}_{n+1}B\breve{u}_n \to \breve{u}B\breve{u}$ as $n \to \infty$, hence $\breve{u} = [1 - (B\breve{u})(z)]^{-1}$, with $|(B\breve{u})(z)| < 1$. Since $\breve{u} \le u$ for $z$ in $[0;1]$ and $u$ is the minimal solution (Lemma 7.17), $\breve{u} = u$ on $[0;1]$, implying that $\breve{u}(z) = [1 - (Bu)(z)]^{-1}$ on $S$. Therefore $\breve{u} = u$ on $S$. Since $\{\breve{u}_n\}$ is a non-decreasing sequence of continuous functions converging to a continuous function $u$, the convergence is uniform on $S$ by Dini's theorem (Lemma 1.2) •
As indicated earlier, the sequence $\{\bar{u}_n\}$ generated by iterating Eq. (7.51) with $\bar{u}_0 = 0$ is squeezed between $\{u_n\}$ and $\{\breve{u}_n\}$ as shown in Corollary 7.16, below. The monotonicity of $\{\bar{u}_n\}$ can be established by the same methods as have been used repeatedly in this subsection.
Corollary 7.16. Let $\{u_n\}$, $\{\bar{u}_n\}$ and $\{\breve{u}_n\}$ be as above. Then $u_n \le \bar{u}_n \le \breve{u}_n$ for each $n \ge 0$ and each $z$ in $[0;\infty)$.
Proof. We demonstrate the inequality $\bar{u}_n \le \breve{u}_n$. The other inequality follows similarly.
Since $0 = \bar{u}_0 \le \breve{u}_0 = 0$, the inequality holds for $n = 0$. Assume its validity for some $n$. We have that
$$\begin{aligned}
\breve{u}_{n+1} - \bar{u}_{n+1} &= \breve{u}_{n+1}B\breve{u}_n - \bar{u}_{n+1}B\bar{u}_n + \breve{u}_nB(\breve{u}_{n+1} - \breve{u}_n) \\
&= (\breve{u}_{n+1} - \bar{u}_{n+1})B\breve{u}_n + \bar{u}_{n+1}B(\breve{u}_n - \bar{u}_n) + \breve{u}_nB(\breve{u}_{n+1} - \breve{u}_n) \\
&= [1 - B\breve{u}_n]^{-1}[\bar{u}_{n+1}B(\breve{u}_n - \bar{u}_n) + \breve{u}_nB(\breve{u}_{n+1} - \breve{u}_n)] \ge 0,
\end{aligned}$$
for $\bar{u}_n \le \breve{u}_n$ by the induction assumption, and $\breve{u}_{n+1} \ge \breve{u}_n$ from Lemma 7.21. Further, since $0 \le \bar{u}_n \le \breve{u}_n \le u$, we have that $0 \le B\breve{u}_n < 1$ (Lemma 7.16(iv)). Consequently, $(\breve{u}_{n+1} - \bar{u}_{n+1}) \ge 0$ on $S$. The result follows by induction •
We have obtained three sequences $\{u_n\}$, $\{\bar{u}_n\}$ and $\{\breve{u}_n\}$, each one converging to the fixed point monotonically and uniformly with increased rate of convergence, respectively. The sequences $\{u_n\}$ and $\{\bar{u}_n\}$ are generated by straightforward iteration. Calculations of $\{\breve{u}_n\}$ require solving the linear equations of the type of Eq. (7.42-a) at each stage, as indicated in Remark 7.4. In some situations, $\rho(x)$ is not explicitly available but its moments can be calculated. This enables one to construct an approximation $\rho_n(x)$ to $\rho(x)$ by the moment method described in sec. 3.III. Also, since the exact integrations to obtain the members of these sequences are not always possible, the integrals in Eqs. (7.48) and (7.49-a) are replaced with some numerical quadrature. In the following we consider the approximation stated in Eq. (3.44-a), which was shown to satisfactorily approximate the integrals of the continuous functions in Lemma 3.14. Replacing $\rho(x)$ by $\rho_n(x)$ reduces Eq. (7.48) to
$$\hat{u}_n(z) = 1 + z\,\hat{u}_n(z)\int_0^1 \frac{d\rho_n(x)}{z + x}\,\hat{u}_n(x) = [1 + \hat{u}_n(z)(B_n\hat{u}_n)(z)] = (A_n\hat{u}_n)(z). \qquad (7.54)$$
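The ordering of the three sequences can be observed on a discretized version of the equation. The sketch below makes the same illustrative assumptions as a quadrature-based treatment would: $d\rho(x) = \kappa_0\,dx$, replaced by an equal-weight midpoint rule, with all names hypothetical. It advances the plain iteration $u_{n+1} = Au_n$, the iteration of Eq. (7.51), and Newton's sequence in the form of Eq. (7.53), which at the nodes is the linear system $[\mathrm{diag}(1 - B\breve{u}_n) - \mathrm{diag}(\breve{u}_n)B]\breve{u}_{n+1} = 1 - \breve{u}_nB\breve{u}_n$.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def three_sequences(kappa0, n=16, steps=6):
    """Return the plain, Eq. (7.51) and Newton iterates, all started from zero.

    Assumption: d rho(x) = kappa0 dx, midpoint rule with n nodes on [0, 1].
    """
    x = [(j + 0.5) / n for j in range(n)]
    w = kappa0 / n
    k = [[w * xi / (xi + xj) for xj in x] for xi in x]        # discrete B

    def apply_b(v):
        return [sum(kij * vj for kij, vj in zip(row, v)) for row in k]

    plain, resolvent, newton = [[0.0] * n], [[0.0] * n], [[0.0] * n]
    for _ in range(steps):
        u = plain[-1]
        plain.append([1.0 + ui * bi for ui, bi in zip(u, apply_b(u))])
        v = resolvent[-1]
        resolvent.append([1.0 / (1.0 - bi) for bi in apply_b(v)])
        y = newton[-1]
        by = apply_b(y)
        # Eq. (7.53) rearranged: (1 - B y) y_new - y * (B y_new) = 1 - y * (B y)
        mat = [[(1.0 - by[i] if i == j else 0.0) - y[i] * k[i][j]
                for j in range(n)] for i in range(n)]
        rhs = [1.0 - y[i] * by[i] for i in range(n)]
        newton.append(solve_linear(mat, rhs))
    return plain, resolvent, newton
```

With `kappa0 = 0.4`, the nodal values after each step satisfy the ordering of Corollary 7.16, and after six steps Newton's sequence has settled to many digits while the plain iteration is still visibly short of the limit. The price is one dense linear solve per Newton step.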
Since ρ n ( x ) can replace ρ ( x ) in the above analysis, the results are applicable to Eq. (7.54). In particular, the results of Lemmas 7.15, 7.16 and Corollary 7.14 will be used with such replacement. Also, the sequences approximating the solutions uˆn of Eq. (7.54), i.e., the fixed points of An , can be generated by the above methods. Alternatively, the exact solution uˆn ( z ) can be obtained at the roots of the quadrature by a matrix inversion and then at all values of z by substitution in Eq. (7.54). In the following we consider the convergence properties of uˆn to u. 1
Let $L^1_n([0;1], d\rho_n(x))$, with the norm defined by $\|\upsilon\|_n = \int_0^1 d\rho_n(x)\,|\upsilon(x)|$, be the space of absolutely $\rho_n$-integrable functions. Since $\rho_n(1) = \rho(1) = \kappa_0 < 1/2$, it follows from Corollary 7.14, by replacing $\rho(x)$ by $\rho_n(x)$, that $\|\hat{u}_n\|_n \le \left(1 - \sqrt{1 - 2\kappa_0}\right)$ with $1 \le \hat{u}_n \le [1 - 2\kappa_0]^{-1/2}$ for each $n$, and $\hat{u}_n$ are continuous functions.

Lemma 7.22. Let $|w| \le M$, $z \ge 0$; then $\lim_{z\to 0}(\mathcal{B}w)(z) = \lim_{z\to 0}(\mathcal{B}_n w)(z) = 0$.
Proof. We consider the case of $\mathcal{B}$; the case of $\mathcal{B}_n$ follows by the same argument. By definition

$$(\mathcal{B}w)(z) = z\int_0^1 \frac{d\rho(x)\,w(x)}{z+x}.$$

Since the integrand is bounded by a $\rho$-integrable function $M$ and converges to zero for each $x > 0$, i.e., almost everywhere with respect to $\rho$, the result follows from the Lebesgue theorem (Theorem 1.3) •

Lemma 7.23. Let $|w|, |w_n| \le M$, let $w$ be continuous on $[0;1]$ and $\|w - w_n\|_n \to 0$ as $n \to \infty$. Then $\upsilon_n(z) = (\mathcal{B}_n w_n)(z) \to (\mathcal{B}w)(z) = \upsilon(z)$ as $n \to \infty$, uniformly with respect to $z$ in any compact subset of $[0;\infty)$.

Proof. From Lemma 7.22, $\upsilon_n(0) = \upsilon(0) = 0$. Let $z > 0$. We have that

$$T_n(z) = |\upsilon(z) - \upsilon_n(z)| \le \left|\int_0^1 [d\rho(x) - d\rho_n(x)]\,\frac{z\,w(x)}{z+x}\right| + \left|\int_0^1 d\rho_n(x)\,\frac{z\,(w(x) - w_n(x))}{z+x}\right|.$$

Since $z > 0$ and $w(x)$ is continuous, $z w(x)/(z+x)$ is a continuous function of $x$ on $[0;1]$. Consequently, the first term converges to zero as $n \to \infty$, from Lemma 3.14. The second term is majorized by $\|w - w_n\|_n \to 0$ as $n \to \infty$. Thus $\upsilon_n(z) \to \upsilon(z)$ pointwise. Now, since $|w|, |w_n| \le M$, we also have

$$T_n(z) = |\upsilon(z) - \upsilon_n(z)| \le M\int_0^1 [d\rho(x) + d\rho_n(x)]\,\frac{z}{z+x}$$

and

$$\int_0^1 d\rho_n(x)\,\frac{z}{z+x} \le \int_0^1 d\rho(x)\,\frac{z}{z+x}$$

for $z$ in $(0;\infty)$, from Corollary 3.10. Hence

$$T_n(z) \le 2M\int_0^1 d\rho(x)\,\frac{z}{z+x} \to 0 \quad \text{as } z \to 0,$$

from Lemma 7.22. Thus for any $\varepsilon > 0$ there is a $\delta(\varepsilon)$ independent of $n$ such that $z < \delta(\varepsilon)$ implies that $T_n(z) < \varepsilon$. Let $z, z'$ be in a compact subset $S_\varepsilon$ of $[\delta(\varepsilon);\infty)$. We have that

$$|T_n(z) - T_n(z')| = \big|\,|\upsilon(z) - \upsilon_n(z)| - |\upsilon(z') - \upsilon_n(z')|\,\big| \le |(\upsilon(z) - \upsilon(z')) - (\upsilon_n(z) - \upsilon_n(z'))|$$
$$\le \left|\int_0^1 d\rho(x)\,\frac{(z - z')\,x\,w(x)}{(z+x)(z'+x)}\right| + \left|\int_0^1 d\rho_n(x)\,\frac{(z - z')\,x\,w_n(x)}{(z+x)(z'+x)}\right| \le |z - z'|\,\frac{2\kappa_0 M}{\delta(\varepsilon)} \to 0 \quad \text{as } z \to z'.$$
Therefore $\{T_n(z)\}$ is a family of equi-continuous functions converging to zero for each $z$ in $S_\varepsilon$ and therefore, uniformly on $S_\varepsilon$ by Arzela's theorem (Lemma 1.3). Thus, given $\varepsilon > 0$ one can pick a $\delta(\varepsilon)$ such that $T_n(z) < \varepsilon$ for $z < \delta(\varepsilon)$ and then increase $n$ to ensure that $T_n(z) < \varepsilon$ on the complement of $[0;\delta(\varepsilon))$ •

Theorem 7.10. Let $u$, $\{\hat{u}_n\}$ and $\kappa_0 < 1/2$ be as above. Then $\hat{u}_n(z) \to u(z)$ as $n \to \infty$, uniformly with respect to $z$ in any compact subset of $[0;\infty)$.

Proof. We have that $u(z) - \hat{u}_n(z) = T_n^1(z) + T_n^2(z)$, where $T_n^1(z) = (\mathcal{A}u)(z) - (\mathcal{A}_n u)(z)$ and

$$T_n^2(z) = (\mathcal{A}_n u)(z) - (\mathcal{A}_n\hat{u}_n)(z) = \int_0^1 d\tau\,\left\{\dot{\mathcal{A}}_n[\hat{u}_n + \tau(u - \hat{u}_n)]\,(u - \hat{u}_n)\right\}(z).$$

Now,

$$|T_n^1(z)| = |u(z)[(\mathcal{B} - \mathcal{B}_n)u](z)| \le (1 - 2\kappa_0)^{-1/2}\,|[(\mathcal{B} - \mathcal{B}_n)u](z)| \to 0 \quad \text{as } n \to \infty,$$

uniformly for $z$ in any compact subset of $[0;\infty)$, from Lemma 7.23 by setting $w = w_n = u$. Hence

$$\|T_n^1\|_n = \int_0^1 d\rho_n(x)\,|T_n^1(x)| \le \sup_{x \text{ in } [0;1]}|T_n^1(x)|\int_0^1 d\rho_n(x) \le \frac{1}{2}\left(\sup_{x \text{ in } [0;1]}|T_n^1(x)|\right) \to 0 \quad \text{as } n \to \infty.$$
Further, $\|T_n^2\|_n \le \sup_{\tau \text{ in } [0;1]}\|\dot{\mathcal{A}}_n[\hat{u}_n + \tau(u - \hat{u}_n)]\|_n\,\|u - \hat{u}_n\|_n$ and for each $\tau$ in $[0;1]$,

$$\|\dot{\mathcal{A}}_n[\hat{u}_n + \tau(u - \hat{u}_n)]\|_n \le \|\tau u + (1 - \tau)\hat{u}_n\|_n \quad \text{(Lemma 7.15)}$$
$$\le \tau\|u\|_n + (1 - \tau)\|\hat{u}_n\|_n \le \tau\|u\| + (1 - \tau)\|\hat{u}_n\|_n + \tau\,\big|\,[\|u\|_n - \|u\|]\,\big|.$$

Also, $[\tau\|u\| + (1 - \tau)\|\hat{u}_n\|_n] \le \left(1 - \sqrt{1 - 2\kappa_0}\right)$ from Corollary 7.14, and

$$\tau\,\big|\,[\|u\|_n - \|u\|]\,\big| \le \left|\int_0^1 [d\rho(x) - d\rho_n(x)]\,u(x)\right| \to 0 \quad \text{as } n \to \infty$$

from Lemma 3.14. We have used the fact that $u(x)$ is a non-negative continuous function. Thus, one can ensure by increasing $n$ that $\|T_n^2\|_n \le \left(1 - \tfrac{1}{2}\sqrt{1 - 2\kappa_0}\right)\|u - \hat{u}_n\|_n$. Therefore

$$\|u - \hat{u}_n\|_n \le \|T_n^1\|_n + \left(1 - \tfrac{1}{2}\sqrt{1 - 2\kappa_0}\right)\|u - \hat{u}_n\|_n,$$

so that

$$\|u - \hat{u}_n\|_n \le 2(1 - 2\kappa_0)^{-1/2}\,\|T_n^1\|_n \to 0 \quad \text{as } n \to \infty.$$

It follows from Lemma 7.23 that $(\mathcal{B}_n\hat{u}_n)(z) \to (\mathcal{B}u)(z)$ as $n \to \infty$, uniformly with respect to $z$. The proof is completed by observing that $u(z) = [1 - (\mathcal{B}u)(z)]^{-1}$, $\hat{u}_n(z) = [1 - (\mathcal{B}_n\hat{u}_n)(z)]^{-1}$, and $|(\mathcal{B}u)(z)|, |(\mathcal{B}_n\hat{u}_n)(z)| \le \|u\|, \|\hat{u}_n\|_n < 1$, from Corollary 7.14 and Lemma 7.16(iv) •
7.IV.4. Turbulent Diffusion

As seen in the previous sections, the positivity of the solutions of some equations proves a useful property to obtain quite strong results. Some fixed point equations encountered are not endowed with this property, requiring alternative techniques, although still within the range of the contraction mapping theorem. In the present and the next subsection, we consider two similar examples of this type. Turbulent diffusion is encountered mostly in fluid flow, including gas dynamics. The following nonlinear integro-differential equation is frequently used as the model describing this phenomenon [Monin and Yaglom, 1967; Velikson, 1975]:
$$\frac{du(t)}{dt} + a(t)u(t) + \int_0^t ds\,k(t,s)\,u(t-s)\,u(s) = g(t); \quad u(0) = \kappa, \quad 0 \le t \le T, \tag{7.55}$$

where $T$ is arbitrary. Eq. (7.55) is equivalent to the fixed point equation,

$$u(t) = c\,e^{-\alpha(t)} + \int_0^t d\tau\,e^{-[\alpha(t)-\alpha(\tau)]}g(\tau) - \int_0^t d\tau\,e^{-[\alpha(t)-\alpha(\tau)]}\int_0^\tau ds\,k(\tau,s)\,u(\tau-s)\,u(s)$$
$$= h(t) - \int_0^t d\tau\int_0^\tau ds\,q(t,\tau,s)\,u(\tau-s)\,u(s) = [\mathcal{F}u](t), \tag{7.56}$$

where $c = u(0)$, $\alpha(t) = \int_0^t d\tau\,a(\tau)$, $q(t,\tau,s) = e^{-[\alpha(t)-\alpha(\tau)]}k(\tau,s)$, and $h(t) = (\mathcal{F}u_0)(t)$ with $u_0 = 0$.

Eq. (7.56) is a Volterra type equation. Numerically, such equations are frequently solved by advancing the solution in small steps with respect to $t$. For a sufficiently small time step $t$, $\mathcal{F}$ can be shown to be a contraction of a ball in a suitable Banach space. This method will be considered in the next subsection. For the present we show that under suitable conditions, the iterative sequence $\{u_{n+1} = \mathcal{F}u_n\}_{n=0}^{\infty}$ generated by a function $u_0$ in a set $Q$ converges to the solution $u$ for an arbitrary $t \le T$, even though $\mathcal{F}$ is not a contraction. It will be assumed that $a(t)$ and $g(t)$ are absolutely integrable and

$$\int_0^\tau ds\,|q(t,\tau,s)| = \int_0^\tau ds\,\left|e^{-[\alpha(t)-\alpha(\tau)]}k(\tau,s)\right| \le M, \tag{7.57}$$

where $M$ is a constant. For the validity of Eq. (7.57) it is sufficient that $\sup_{\tau \text{ in } [0;T]}\int_0^\tau ds\,|k(\tau,s)|$ exists.
The assumed integrability of $|a(t)|$ implies that $|\alpha(t) - \alpha(\tau)|$ is bounded for each $t$ and $\tau$ in the interval $[0;T]$. This together with the absolute integrability of $g(t)$ implies that $|h(t)| \le \xi$, where $\xi$ is a constant. Let $\kappa \ge 2\xi$ and $\varsigma \ge \kappa^2 M/\xi$ be otherwise arbitrary constants and let $Q$ be the set of functions $\omega(t)$ on $[0;T]$ such that $|e^{-\varsigma t}\omega(t)| \le \kappa$. The set $Q$ is thus a closed set in the Banach space with norm defined by $\|\omega\| = \sup_{t \text{ in } [0;T]}|e^{-\varsigma t}\omega(t)|$, i.e., the space $C^0$ weighted by the decaying exponential. Validity of the construction is straightforward to check.

The first step towards the convergence of the iterative sequence is to ascertain that $\mathcal{F}Q$ is contained in $Q$, where $\mathcal{F}$ denotes the map defined by the right side of Eq. (7.56). This implies that the iterative sequence generated by an arbitrary $u_0$ in $Q$ is contained in $Q$. The result follows from the following estimate with an arbitrary $\omega$ in $Q$:

$$|e^{-\varsigma t}(\mathcal{F}\omega)(t)| \le |e^{-\varsigma t}h(t)| + e^{-\varsigma t}\int_0^t d\tau\int_0^\tau ds\,|q(t,\tau,s)|\,|\omega(\tau-s)|\,|\omega(s)|$$
$$\le \xi + \kappa^2 M\int_0^t e^{-\varsigma(t-\tau)}\,d\tau \le \xi + \frac{\kappa^2 M}{\varsigma} \le \kappa. \tag{7.58}$$

Next we show that there is an integer $n$ such that $\mathcal{F}^n$ is a contraction. This will follow from the estimate
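$$|e^{-\varsigma t}(\mathcal{F}^n\upsilon - \mathcal{F}^n\omega)| \le [(2\kappa M t)^n/n!]\,\|\upsilon - \omega\|, \tag{7.59-a}$$

which implies that

$$\|\mathcal{F}^n\upsilon - \mathcal{F}^n\omega\| \le [(2\kappa M T)^n/n!]\,\|\upsilon - \omega\|, \tag{7.59-b}$$

for each $\upsilon$ and $\omega$ in $Q$, where $\mathcal{F}$ denotes the map defined by the right side of Eq. (7.56). The contraction property for some $n$ follows since the multiplier on the right side of Eq. (7.59) converges to zero as $n \to \infty$. To obtain the estimate of Eqs. (7.59-a) and (7.59-b), we proceed with induction. The estimate of Eq. (7.59-a) is clearly true for $n = 0$. Assume it to be valid for $n$. It follows from the definitions that

$$(\mathcal{F}^{n+1}\upsilon - \mathcal{F}^{n+1}\omega)(t) = -\int_0^t d\tau\int_0^\tau ds\,[q(t,\tau,s)(\mathcal{F}^n\upsilon)(\tau-s) + q(t,\tau,\tau-s)(\mathcal{F}^n\omega)(\tau-s)]\,(\mathcal{F}^n\upsilon - \mathcal{F}^n\omega)(s),$$

which is estimated by

$$|e^{-\varsigma t}(\mathcal{F}^{n+1}\upsilon - \mathcal{F}^{n+1}\omega)| \le \kappa\,\frac{(2\kappa M)^n}{n!}\,\|\upsilon - \omega\|\int_0^t d\tau\,e^{-\varsigma(t-\tau)}\int_0^\tau ds\,s^n\,[|q(t,\tau,s)| + |q(t,\tau,\tau-s)|]$$
$$\le \frac{(2\kappa M)^{n+1}}{n!}\,\|\upsilon - \omega\|\int_0^t d\tau\,\tau^n = \frac{(2\kappa M t)^{n+1}}{(n+1)!}\,\|\upsilon - \omega\|.$$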
We have used the induction assumption and the fact that $\mathcal{F}^n\upsilon$, $\mathcal{F}^n\omega$ are in $Q$, i.e., $\|\mathcal{F}^n\upsilon\|, \|\mathcal{F}^n\omega\| \le \kappa$, where $\mathcal{F}$ denotes the map defined by the right side of Eq. (7.56).

The continuity of $\mathcal{F}$ follows from Eq. (7.59) by setting $n = 1$, which also shows that $\mathcal{F}$ is not a contraction of $Q$ in general. Convergence of the iterative sequence to the unique solution of Eq. (7.56), equivalently Eq. (7.55), in $Q$ is now a consequence of Corollary 4.7. Since the sequence converges with respect to the supremum norm, the convergence is uniform with respect to $t$ in $[0;T]$.
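The iteration analyzed above can be tried out on a discretized version of Eq. (7.56). The following Python sketch is only an illustration under assumptions that do not come from the text: a(t), g(t) and k are taken constant, the integrals are replaced by left-rectangle sums on a uniform grid, and the starting function is u_0 = 0. The name `picard_iterates` and all parameter values are hypothetical.

```python
import math

def picard_iterates(T=1.0, n_steps=50, n_iter=40, a=1.0, g=1.0, k=1.0, c=0.5):
    """Iterate u_{n+1} = F u_n for a discretized form of Eq. (7.56).

    Illustrative assumptions: constant a, g and k; left-rectangle quadrature
    on a uniform grid; starting function u_0 = 0; c = u(0).
    """
    dt = T / n_steps
    t = [i * dt for i in range(n_steps + 1)]
    # h(t) = c e^{-a t} + int_0^t e^{-a (t - tau)} g d tau, in closed form
    h = [c * math.exp(-a * ti) + (g / a) * (1.0 - math.exp(-a * ti)) for ti in t]

    def apply_map(u):
        # inner convolution: conv[j] ~ int_0^{t_j} k u(t_j - s) u(s) ds
        conv = [dt * k * sum(u[j - l] * u[l] for l in range(j))
                for j in range(n_steps + 1)]
        # outer integral with the factor e^{-[alpha(t) - alpha(tau)]}
        return [h[i] - dt * sum(math.exp(-a * (t[i] - t[j])) * conv[j]
                                for j in range(i))
                for i in range(n_steps + 1)]

    u = [0.0] * (n_steps + 1)
    diffs = []                      # sup-norm distance between successive iterates
    for _ in range(n_iter):
        new = apply_map(u)
        diffs.append(max(abs(p - q) for p, q in zip(new, u)))
        u = new
    return t, u, diffs, apply_map

if __name__ == "__main__":
    t, u, diffs, _ = picard_iterates()
    print("first differences:", diffs[:6])
    print("u(T) approx      :", u[-1])
```

The successive differences are expected to fall off rapidly, in line with the factorial factor in Eq. (7.59), even though a single application of the map need not be a contraction.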
7.IV.5. Optical Beam Propagation

The classical field theoretical description of an optical wave is completely provided by Maxwell's equations, which is sufficient to determine its interaction with macroscopic systems. While the wave nature of light precludes its localization in space, an optical signal can closely assume the form of a beam, which is an adequate approximation for describing a number of physical phenomena, e.g., material micro-fabrication with lasers and photonic communication technologies. We restrict the considerations to paraxial waves, which cover a variety of phenomena. In any case, it is sufficient to describe the basic elements of the analysis. A paraxial wave is a plane wave $e^{-ikz}$ modulated by a complex envelope $\mathcal{A}(\mathbf{r};z)$ that is a slowly varying function of the position, where $k$ is the wave-number, i.e., the wavelength is equal to $2\pi/k$. The wave propagates along the $z$-axis and the wave front is locally a plane wave in the $x,y$ plane with the position vector denoted by $\mathbf{r}$. The amplitude $\mathcal{A}(\mathbf{r};z)$ of a beam propagating in vacuum satisfies the paraxial Helmholtz equation [Saleh and Teich, 1991; Ch. 3]

$$i\,\frac{\partial\mathcal{A}}{\partial z} + \frac{1}{2k}\nabla_r^2\mathcal{A} = 0, \tag{7.60}$$
where $\nabla_r^2$ is the transverse part of the Laplacian. The intensity distribution $I(\mathbf{r};z)$ in the beam is given by $I = |\mathcal{A}|^2$. In a dielectric medium, $\mathcal{A}$ satisfies the nonlinear Schrödinger equation [Newell and Maloney, 1992]

$$i\,\frac{\partial\mathcal{A}}{\partial z} + \frac{1}{2k}\nabla_r^2\mathcal{A} - V\mathcal{A} = 0, \tag{7.61}$$
where the perturbing term $V$ is in general a function of $\mathbf{r}$, $z$ and $\mathcal{A}$. For the linear case, $V$ is independent of $\mathcal{A}$. Eq. (7.61) in its linear as well as in the nonlinear form arises also in various quantum mechanical applications [Feit, Fleck and Steiger, 1982]. It is more convenient to consider the dimensionless forms of Eqs. (7.60) and (7.61) resulting from the transformations $\mathbf{r} \to \mathbf{r}\,k/\omega_0$ and $z \to z/\omega_0$, where $\omega_0$ is the beam radius at some location. Retaining the original notation for the transformed quantities, Eq. (7.61) reads

$$i\,\frac{\partial\mathcal{A}}{\partial z} + \frac{1}{2}\nabla_r^2\mathcal{A} - V\mathcal{A} = 0, \tag{7.62}$$

reducing to the corresponding form of Eq. (7.60) with $V = 0$. Eq. (7.62) with $V = 0$ is exactly solvable with a Gaussian solution $\mathcal{A}_0$ given by

$$\mathcal{A}_0(\mathbf{r};z) = \frac{(\omega_f^2 + 2i[z_0 - z_f])}{(\omega_f^2 + 2i[z - z_f])}\,\exp\left[-\frac{r^2}{(\omega_f^2 + 2i[z - z_f])}\right], \tag{7.63-a}$$
where $\omega_f$ is the beam radius at the focal point $z_f$. In experimental settings, a collimated beam is frequently focused at a desired location with a convex lens. It is convenient to let $z_0$ be the lens location. The solution given by Eq. (7.63-a) is determined by its value at one point on the beam. The beam radius $\omega(z)$ at an arbitrary location $z$ is given by

$$\omega(z) = \omega_f\left[1 + 4\left(\frac{z - z_f}{\omega_f^2}\right)^2\right]^{1/2}. \tag{7.63-b}$$

With respect to the variables $\boldsymbol{\rho} = \mathbf{r}/\omega(z)$ and $\xi = \frac{1}{2}\tan^{-1}\left[\frac{2(z - z_f)}{\omega_f^2}\right]$, Eq. (7.62) assumes the following form:

$$i\,\frac{\partial\mathcal{A}}{\partial\xi} = -\frac{1}{2}\nabla_\rho^2\mathcal{A} + 2i\tan(2\xi)\,(\boldsymbol{\rho}\cdot\nabla_\rho)\mathcal{A} + \omega(\xi)^2\,V(I)\,\mathcal{A}, \tag{7.64}$$

where $\omega(\xi) = \omega(z(\xi)) = \omega_f/\cos(2\xi)$. With the representation $\mathcal{A} = \cos(2\xi)\exp\left[i\rho^2\tan(2\xi)\right]u$, and substitution in Eq. (7.64), it follows that $u$ satisfies the equation

$$i\,\frac{\partial u}{\partial\xi} = -\frac{1}{2}\nabla_\rho^2 u + V u, \tag{7.65}$$

where $V$ on the right side of Eq. (7.65) stands for $\left(2\rho^2 + \omega^2(\xi)V\right)$, with the $V$ inside the parentheses being that of Eq. (7.64). The solution of the unperturbed equation corresponding to $\mathcal{A}_0$ is given by

$$u^0(\rho,\xi) = \frac{\exp[2i(\xi_0 - \xi)]}{\cos(2\xi_0)}\,\exp[-\rho^2]. \tag{7.66}$$
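Two of the relations above lend themselves to a quick numerical check: the identity ω(z(ξ)) = ω_f / cos(2ξ) implied by Eq. (7.63-b) and the definition of ξ, and the fact that the function of Eq. (7.66) satisfies Eq. (7.65) when the original perturbation vanishes, i.e., i ∂u/∂ξ = −(1/2)∇²u + 2ρ²u. The Python sketch below is a hedged illustration: the default values z_f = 0, ω_f = 1, ξ_0 = 0, the use of the radial form of the two-dimensional Laplacian, and the finite-difference step are assumptions made for the example, and the function names are hypothetical.

```python
import cmath
import math

def beam_radius(z, z_f=0.0, w_f=1.0):
    """omega(z) of Eq. (7.63-b)."""
    return w_f * math.sqrt(1.0 + 4.0 * ((z - z_f) / w_f ** 2) ** 2)

def xi_of_z(z, z_f=0.0, w_f=1.0):
    """xi = (1/2) arctan(2 (z - z_f) / w_f^2)."""
    return 0.5 * math.atan(2.0 * (z - z_f) / w_f ** 2)

def u0(rho, xi, xi0=0.0):
    """Unperturbed solution of Eq. (7.66) as a function of the radial rho."""
    return cmath.exp(2j * (xi0 - xi)) * math.exp(-rho ** 2) / math.cos(2.0 * xi0)

def residual(rho, xi, d=1e-3):
    """Central-difference residual of i u_xi + (1/2) lap u - 2 rho^2 u for u0."""
    du_dxi = (u0(rho, xi + d) - u0(rho, xi - d)) / (2.0 * d)
    d2 = (u0(rho + d, xi) - 2.0 * u0(rho, xi) + u0(rho - d, xi)) / d ** 2
    d1 = (u0(rho + d, xi) - u0(rho - d, xi)) / (2.0 * d)
    lap = d2 + d1 / rho            # radial part of the two-dimensional Laplacian
    return 1j * du_dxi + 0.5 * lap - 2.0 * rho ** 2 * u0(rho, xi)

if __name__ == "__main__":
    z = 0.7
    print(beam_radius(z), 1.0 / math.cos(2.0 * xi_of_z(z)))
    print(abs(residual(0.8, 0.3)))
```

The residual is limited by the finite-difference step, so it is small but not zero.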
Eq. (7.65) is similar to Eq. (7.62) with a basic difference. Parallel rays with respect to $\rho$ conform to the characteristic curves of the unperturbed equation determined by $\mathcal{A}_0$. Consequently, an unperturbed Gaussian beam with varying width with respect to the $(\mathbf{r};z)$ coordinate system is collimated with respect to $(\rho;\xi)$, and the corresponding solution given by Eq. (7.66) is a soliton. If the beam width remains almost constant, this property still holds. As will be seen, this has a favorable impact on the numerical schemes to determine the solution.

Practically encountered perturbations are sufficiently mild, having a small impact on the beam structure and the solution. Thus, the solutions of Eq. (7.65), which can be expressed as an operator equation in $H = L^2(R^2; d\rho)$, can be obtained within the framework of the perturbation methods. With $A^0$ being the self-adjoint realization of $-\nabla_\rho^2/2$ in $H$, Eq. (7.65) reads

$$i\,\frac{\partial u}{\partial\xi} - (A^0 + V)u = i\,\frac{\partial u}{\partial\xi} - Au = 0. \tag{7.67}$$
If the perturbation is independent of $u$, then $A$ is a linear operator. Its self-adjointness for a large class of $V$ can be checked by the results of sec. 4.I. If $V$, and thus $A$, is a non-linear operator, the solution scheme will require iteration, i.e., at each stage a fixed value of $V$ will be needed. This reduces the case to the consideration of a linear operator. For now, we use the methods assuming that $A$ is self-adjoint, which will be justified in the process. Using Eq. (4.25), the solution of Eq. (7.67) given by $u(\xi) = e^{-iA\xi}u(0)$ can be expressed as

$$u(\xi) = u^0(\xi) - i\,e^{-iA^0\xi}\int_0^\xi d\xi'\,e^{iA^0\xi'}\,V(\xi';u(\xi'))\,u(\xi') = (\mathcal{F}u)(\xi), \tag{7.68}$$

where $u^0(\xi) = e^{-iA^0(\xi - \bar{\xi})}u^0(\bar{\xi})$ with an arbitrary $\bar{\xi}$ is the unperturbed solution, i.e., the amplitude in vacuum is given by Eq. (7.66). It is usually convenient to take $\bar{\xi} = \xi_0$. Thus, for each fixed $\xi$, $u(\xi)$ is a fixed point of $\mathcal{F}$, which is in general a nonlinear Volterra operator with close resemblance to Eq. (7.56). Eq. (7.68) can be treated by methods similar to those of the last subsection. We illustrate an alternative method. Methods parallel to each other are applicable to both of the examples. For calculations, a judicious choice can be made.

For the present, we show that for a sufficiently small $\xi$, $\mathcal{F}$ is a contraction of a ball of sufficiently small radius $\alpha$ in $H$ centered at $u^0$. In practical applications, $V(\upsilon)$ is usually a function of $I$, i.e., $\upsilon$ and $\upsilon^*$, bounded by a polynomial in $|\upsilon|$, which can therefore be expressed as $\|V(u^0 + w)\| \le a + b\|w\|$ for all $w$ in a ball of radius $\alpha < 1$ centered at $u^0$. Similarly, $\dot{V}$ is dominated by a linear function.
This also shows that $V$ is $A^0$-bounded, implying that $A$ is self-adjoint for all functions in the ball, which justifies the propagator representation and Eq. (7.68). The class so defined is sufficiently wide to include a variety of practical cases. Letting $u = (u^0 + \varphi)$ transforms Eq. (7.68) into

$$\varphi(\xi) = -i\,e^{-iA^0\xi}\int_0^\xi d\xi'\,e^{iA^0\xi'}\left[V(u^0 + \varphi)(u^0 + \varphi)\right](\xi') = (\mathcal{F}_0\varphi)(\xi),$$

i.e., $\varphi$ is a fixed point of $\mathcal{F}_0$. It follows from the definition of $\mathcal{F}_0$ and the assumed conditions that $\|\mathcal{F}_0 w\| \le a' + b'\|w\|$.
Let $\xi$ be sufficiently small to ensure that $\xi < b'^{-1}$ and that $b'^{-1}\xi a' < 1$. Further, let $\alpha \le \xi a'/b' < 1$. It follows that if $\|w\| \le \alpha$, then $\|\mathcal{F}_0 w\| \le \alpha$, i.e., $\mathcal{F}_0$, the map whose fixed point is $\varphi = u - u^0$, is a map of a ball of radius $\alpha$ centered at the origin into itself. Now,

$$(\mathcal{F}_0 w - \mathcal{F}_0\upsilon)(\xi) = -i\,e^{-iA^0\xi}\int_0^\xi d\xi'\,e^{iA^0\xi'}\left[V(u^0 + w)(u^0 + w) - V(u^0 + \upsilon)(u^0 + \upsilon)\right](\xi')$$
$$= -i\,e^{-iA^0\xi}\int_0^\xi d\xi'\,e^{iA^0\xi'}\left\{V(u^0 + w)(w - \upsilon) + \left[V(u^0 + w) - V(u^0 + \upsilon)\right](u^0 + \upsilon)\right\}(\xi')$$
$$= -i\,e^{-iA^0\xi}\int_0^\xi d\xi'\,e^{iA^0\xi'}\left(V(u^0 + w)(w - \upsilon) + \left[\int_0^1 d\tau\,\dot{V}[u^0 + \upsilon + \tau(w - \upsilon)](w - \upsilon)\right](u^0 + \upsilon)\right)(\xi').$$

As above, it can be seen that $\|\mathcal{F}_0 w - \mathcal{F}_0\upsilon\| \le \text{const.}\,\xi\,\|w - \upsilon\|$. Thus, by decreasing $\xi$ further if need be, we have that $\mathcal{F}_0$ is a contraction of the stated ball.
It follows from the contraction mapping theorem (Lemma 4.20) that the solution of Eq. (7.68) can be obtained by iteration by taking $\xi$ sufficiently small. However, it is usually more convenient and efficient to decrease $\xi$ further and reduce the number of iterations necessary to achieve a desired level of accuracy, usually just to one. In this scheme the solution is propagated from one value of $\xi$ to the other in small steps as discussed in Eqs. (4.23) to (4.32). These equations should be adjusted for Eq. (7.65), which is two-dimensional and in which the Laplacian term differs by a multiplicative constant; this adjustment will be assumed in the following. The expression given by Eq. (4.31) together with the approximate evaluations of the integrals by the fast Fourier routines is used frequently for the pertaining calculations. Eq. (4.32) has been used occasionally with approximate evaluations of the integrals by matrix multiplication, which is less convenient. We show below that Eq. (4.32) can be adapted for an application of the fast Fourier routines, yielding a more efficient numerical scheme than the one based on Eq. (4.31). Instead of Eq. (4.62), we focus on Eq. (7.65).

In all of the numerical procedures, one needs a suitable mesh structure. In the case of a beam with little change along the beam axis, an equi-spaced mesh is quite convenient and adequate. In a number of applications, the beam radius varies rapidly along the beam axis. For example, in laser machining of materials, a beam several millimeters wide at the lens may converge onto a region of a few micrometers about the focal point. In such cases, a fixed mesh structure at all points along the beam axis is unsuitable. This necessitates modifications to the formulations and the methods that require an equi-spaced mesh structure, e.g., the fast Fourier transform routines. Complications result in all cases. As indicated above, these difficulties are eliminated in Eq. (7.65). Eq. (4.28) can be expressed as

$$e^{-2iA\varepsilon} \approx e^{-iA^0\varepsilon}\,e^{-2i\varepsilon V}\,e^{-iA^0\varepsilon} = \wp_0(\varepsilon)\,e^{-2i\varepsilon V}\,\wp_0(\varepsilon), \tag{7.69}$$
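where $\wp_0(\varepsilon)$ is the free propagator $e^{-iA^0\varepsilon}$ given by Eq. (4.32) as

$$[\wp_0(\varepsilon)\upsilon(\xi)](\mathbf{r};\xi + \varepsilon) = \frac{1}{2\pi i\varepsilon}\int_{R^2} d\mathbf{r}'\,\exp\left[\frac{i(\mathbf{r} - \mathbf{r}')^2}{2\varepsilon}\right]\upsilon(\mathbf{r}';\xi). \tag{7.70}$$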
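As explained in Eqs. (2.30) to (2.35), and as is also clear from Eq. (7.70), $\wp_0(\varepsilon)$ can be expressed as a product of two one-dimensional free propagators $\mathcal{P}_0(\varepsilon)$ defined by

$$(\mathcal{P}_0(\varepsilon)h(\xi))(x;\xi + \varepsilon) = \frac{1}{\sqrt{2\pi i\varepsilon}}\int_{-\infty}^{\infty} dx'\,\exp\left[\frac{i}{2\varepsilon}(x - x')^2\right]h(x';\xi), \tag{7.71-a}$$

i.e.,

$$(\wp_0(\varepsilon)h(\xi))(x,y;\xi + \varepsilon) = \frac{1}{2\pi i\varepsilon}\int_{-\infty}^{\infty} dy'\,\exp\left[\frac{i}{2\varepsilon}(y - y')^2\right]\int_{-\infty}^{\infty} dx'\,\exp\left[\frac{i}{2\varepsilon}(x - x')^2\right]h(x',y';\xi). \tag{7.71-b}$$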
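It is clear from Eqs. (7.70), (7.71-a) and (7.71-b) that it is sufficient to develop a numerical procedure to evaluate the one-dimensional $\mathcal{P}_0(\varepsilon)$ given by Eq. (7.71-a), which can be expressed as

$$(\mathcal{P}_0(\varepsilon)\upsilon(\xi))(x;\xi + \varepsilon) = \left(M(\varepsilon)\left[\mathcal{P}_0'(\varepsilon)M(\varepsilon)\upsilon(\xi)\right]\right)(x;\xi + \varepsilon), \tag{7.72}$$

where $M(\varepsilon)$ is the operation of multiplication defined by $(M(\varepsilon)h)(x) = \exp[ix^2/(2\varepsilon)]\,h(x)$, and $\mathcal{P}_0'(\varepsilon)$ is a Fourier transform:

$$f(x;\xi + \varepsilon) = (\mathcal{P}_0'(\varepsilon)g(\xi))(x;\xi + \varepsilon) = \frac{1}{\sqrt{2\pi i\varepsilon}}\int_{-\infty}^{\infty} dx'\,\exp\left[-\frac{i}{\varepsilon}\,x\,x'\right]g(x';\xi). \tag{7.73}$$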
The Fourier transforms can be approximated by the fast Fourier transform ($FFT$) methods. The resulting scheme is convenient if the sets of grid points for the functions $g$ and $f$ are reciprocals of each other. To determine the evolution of the solutions for the case presently under consideration, it is desirable to have the same input and output grid structures with respect to the space variable, since the solution is advanced indefinitely. This is possible but cumbersome and compromises accuracy, and is therefore not normally preferred. The major complicating feature is the rapidly oscillating integrand resulting from a small value of $\varepsilon$, which is desirable for accuracy. It is shown below that this disadvantage can be exploited by constructing a suitable grid structure to reduce the operation of $\mathcal{P}_0'(\varepsilon)$ essentially to an $FFT^-$, where the superscript refers to the sign of the exponent in the Fourier transform.

The integral in Eq. (7.73) can be approximated by restricting the interval of integration to $[-a,a]$ with a sufficiently large but finite value of $a$. The value of the integrand outside this interval is assumed to be negligible. By the mid-point integration with equispaced grid points $\{a(2n/N - 1)\}_{n=0}^{N}$, the integral is further approximated by

$$(\mathcal{P}_0'(\varepsilon)g(\xi))\left(a\left(2\frac{n}{N} - 1\right);\xi + \varepsilon\right) = \frac{2a}{N\sqrt{2\pi i\varepsilon}}\sum_{m=1}^{N} g\left[a\left(2\frac{m}{N} - 1\right);\xi\right]\exp\left[-\frac{4a^2}{\varepsilon}\,i\left(\frac{nm}{N^2} - \frac{n + m}{2N} + \frac{1}{4}\right)\right], \quad n = 1,2,\ldots,(N-1);$$
$$(\mathcal{P}_0'(\varepsilon)g(\xi))(a;\xi + \varepsilon) = (\mathcal{P}_0'(\varepsilon)g(\xi))(-a;\xi + \varepsilon) = 0. \tag{7.74}$$
With $a = \sqrt{\pi N\varepsilon/2}$, Eq. (7.74) can be evaluated efficiently by $FFT^-$. Since $N$ is typically a large integer and $\varepsilon$ is a small number, this condition yields a moderate value of $a$, which is commensurate with the decay properties of a large class of solutions. Substitution from Eq. (7.74) in Eq. (7.72) yields

$$(\mathcal{P}_0(\varepsilon)g(\xi))\left(\left[a\left(2\frac{n}{N} - 1\right)\right];\xi + \varepsilon\right) = \frac{2a}{N\sqrt{2\pi i\varepsilon}}\exp\left[\frac{\pi i}{N}(n-1)^2\right]\sum_{m=1}^{N} g\left[a\left(2\frac{m}{N} - 1\right);\xi\right]\exp\left[-2\pi i\,\frac{(n-1)(m-1)}{N}\right]\exp\left[\frac{\pi i}{N}(m-1)^2\right], \quad n = 1,2,\ldots,(N-1);$$
$$(\mathcal{P}_0(\varepsilon)g(\xi))[-a;\xi + \varepsilon] = (\mathcal{P}_0(\varepsilon)g(\xi))[a;\xi + \varepsilon] = 0. \tag{7.75}$$

Eq. (7.75) is pre-eminently suitable for the standard $FFT$ routines. While the condition $a = \sqrt{\pi N\varepsilon/2}$ can be implemented to accommodate various practical situations with other choices, a convenient choice is as follows:

$$a = \sqrt{2^{\gamma}\pi}, \quad \gamma = 0,1,2,\ldots, \qquad \varepsilon = \frac{1}{2^{\nu}}, \quad \nu = 1,2,\ldots,$$

which determine $N = 2^{\mu}$, $\mu = (\nu + \gamma + 1)$. The interval of integration can be selected first to be sufficiently large by setting $a$ appropriately. Then $\varepsilon$ can be set sufficiently small to ensure the accuracy of the propagation with respect to $\xi$. The grid structure for the variable $x$ is determined by $a$ and $\varepsilon$. Rescaling of the variables $\xi$ and $x$ can also be used to increase the flexibility of the method, particularly to accommodate pre-set intervals for the variables. By the present method, the solution accurate up to $o(\varepsilon^2)$ is given by

$$u(\xi + 2\varepsilon) = M\,\mathcal{P}_0'(\varepsilon)\,M\,\exp[-2i\varepsilon V(\xi)]\,M\,\mathcal{P}_0'(\varepsilon)\,M\,u(\xi). \tag{7.76}$$
In comparison, the widely used split-step scheme based on $FFT$, i.e., Eq. (4.30) together with Eq. (4.31), defines the solution up to the same degree of accuracy by

$$u(x,\xi + 2\varepsilon) = FFT^{+}\left(\exp\left[-i\varepsilon k^2\right]FFT^{-}\left\{\exp[-2i\varepsilon V]\,FFT^{+}\left(\exp\left[-i\varepsilon k^2\right]FFT^{-}u(\xi)\right)\right\}\right). \tag{7.77}$$
Except for elementary operations of multiplication, use of Eq. (7.76) reduces the number of numerical operations required to about half in comparison with Eq. (7.77). This reduction in the number of operations results from the exact integration with respect to k carried out to obtain Eq. (4.32), which is approximated numerically in the split-step scheme. The transformation described in Eqs. (7.64) to (7.66) is useful for all beams regardless of the numerical scheme used to solve Eq. (7.65), which is the transformed Eq. (7.62). Although we have focused on Eq. (7.65), the procedure is extendible to include a number of similar equations including the coupled sets of nonlinear Schrödinger equations [Vatsya, 2005].
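The reduction of the one-dimensional free propagator to a single FFT, Eq. (7.75), can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation, and it makes several assumptions that should be kept in mind: the grid index runs from 0 to N − 1 on the points a(2m/N − 1), which leaves the phase factors unchanged because they depend only on n − m; the end-point values are not set to zero; a small recursive radix-2 routine stands in for a library FFT; and the test function is a Gaussian whose exact evolution under i ∂u/∂ξ = −(1/2) ∂²u/∂x² is known in closed form. The names `fft` and `free_step` are hypothetical.

```python
import cmath
import math

def fft(v):
    """Recursive radix-2 FFT with kernel exp(-2 pi i n m / N); len(v) a power of 2."""
    n = len(v)
    if n == 1:
        return list(v)
    even, odd = fft(v[0::2]), fft(v[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def free_step(h, eps):
    """Apply the 1-D free propagator once, in the form of Eq. (7.75).

    Grid (assumed): x_m = a (2 m / N - 1), m = 0..N-1, with a = sqrt(pi N eps / 2),
    so that (x_n - x_m)^2 / (2 eps) = pi (n - m)^2 / N.
    """
    n = len(h)
    a = math.sqrt(math.pi * n * eps / 2.0)
    # chirp factor exp(i pi m^2 / N); m^2 is reduced mod 2N to keep the phase small
    chirp = [cmath.exp(1j * math.pi * ((m * m) % (2 * n)) / n) for m in range(n)]
    pref = (2.0 * a / n) / cmath.sqrt(2j * math.pi * eps)
    spectrum = fft([c * v for c, v in zip(chirp, h)])
    return [pref * c * s for c, s in zip(chirp, spectrum)]

if __name__ == "__main__":
    N, eps = 512, 1.0 / 8.0
    a = math.sqrt(math.pi * N * eps / 2.0)
    xs = [a * (2.0 * m / N - 1.0) for m in range(N)]
    start = [math.exp(-x * x / 2.0) for x in xs]
    out = free_step(start, eps)
    q = 1.0 + 1j * eps
    exact = [cmath.exp(-x * x / (2.0 * q)) / cmath.sqrt(q) for x in xs]
    print(max(abs(o - e) for o, e in zip(out, exact)))
```

Because the input and output share the same grid, the step can be repeated, and interleaving it with multiplication by the potential factor gives a scheme of the type of Eq. (7.76).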
7.V. QUANTUM THEORY

The methods and results developed in the earlier chapters were applied to treat problems arising as a result of the quantum mechanical formulation, in ch. 6, and the related problem of optical beam propagation in the last section. The purpose of this section is to show that these concepts and results can also be used to gain an understanding of the fundamental structure and premises underlying the formulation of quantum theory and to illustrate the profound impact, yet again, of the domain restrictions on the properties of associated physical systems.

First we revisit the classical variational formulation. Since the material is widely available, needed results are stated omitting the details. Consider the action functional $F[\xi]$ defined by Eq. (3.1):

$$S_{xy}(\rho) = F[\xi] = \int_{\eta(y)}^{\eta(x)} L(\xi,\dot{\xi},\eta)\,d\eta, \tag{7.78}$$
where $\xi(\eta)$ is a parameterization of a trajectory $\rho$ from $y$ to $x$ in an $N$-dimensional manifold $M^N$. The underlying manifold will be assumed to be endowed with a metrical structure with infinitesimal arc-length $d\tau$ given by $d\tau^2 = g_{\alpha\beta}\,dx^\alpha dx^\beta = dx_\alpha dx^\alpha$, where $g$ is the metric tensor and repeated indices will be assumed to be summed over. Superscripts and subscripts will be used to indicate the contravariance and covariance of the vectors and tensors. Lagrangians with explicit dependence on $\eta$ will not be encountered. Therefore, we drop it as an argument.

If a Lagrangian satisfies the condition $L(\xi,a\dot{\xi}) = aL(\xi,\dot{\xi})$ for all constants $a$, it is homogeneous. Otherwise the Lagrangian is called inhomogeneous. It is clear that for a homogeneous Lagrangian, a change of parameter from $\eta$ to $a\eta$ leaves the action in Eq. (7.78) invariant. Consider an inhomogeneous $L$. The Euler-Lagrange equations reduce for each $\alpha$ to (Eq. 3.6)

$$\left[\frac{d}{d\eta}\left(\frac{\partial L}{\partial\dot{\xi}^\alpha}\right) - \frac{\partial L}{\partial\xi^\alpha}\right] = 0. \tag{7.79}$$

The solution of Eq. (7.79) will be termed the classical trajectory. With the covariant components of momentum $p$ defined by $p_\alpha = \partial L/\partial\dot{\xi}^\alpha$, the Hamiltonian $H(p,\xi)$ is given by $H(p,\xi) = p_\alpha\dot{\xi}^\alpha - L(\xi,\dot{\xi}(p,\xi))$. This requires that $\dot{\xi}$ be determinable as a function of $p$ and $\xi$ from the equations $p_\alpha = \partial L/\partial\dot{\xi}^\alpha$.
One can find a classical trajectory joining two points in $M^N$ with a prescribed value of the parameter interval $(\eta(x) - \eta(y))$, i.e., there exists a classical trajectory joining two arbitrary points in the $(N+1)$-dimensional manifold obtained by adjoining the parameter $\eta$ to $M^N$. Thus, the action, denoted by $S(x,y,\eta)$, along a classical trajectory is a function of $x$, $y$ and the parameter interval. Also, $p_\alpha$ at the end point $x$ of a classical trajectory is given by $p_\alpha = \partial S(x,y,\eta)/\partial x^\alpha$.

The action associated with an arbitrary trajectory can be computed by evaluating the integral in Eq. (7.78). Along a classical trajectory, it can also be obtained as the solution of the Hamilton-Jacobi equation [Rund, 1966; Courant and Hilbert, 1953, pp. 114-121]:

$$\frac{\partial S(x,y,\eta)}{\partial\eta} + H(p,x) = 0, \tag{7.80}$$

where $p_\alpha$ in Eq. (7.80) is taken to be $\partial S/\partial x^\alpha$. The nonlinear equation, Eq. (7.80), is difficult to solve in general. However, the series expansions of $S$ for small values of $(x - y)$ and $(\eta(x) - \eta(y))$ are obtained easily to evaluate the action with the presently required degree of accuracy by a method developed by Hamilton [Convey and McConnell, 1940]. For the present purpose, power series expansions are also more convenient than approximating the integral in Eq. (7.78) by alternative methods.
Consider an arbitrary curve $\xi(\eta)$ from $y$ to $x$ and approximate it with $N$ piecewise classical trajectories each from $(\xi(\eta_n);\eta_n)$ to $(\xi(\eta_{n+1});\eta_{n+1})$, $n = 0,\ldots,(N-1)$, with $(\xi(\eta_0);\eta_0) = (y;\eta(y))$ and $(\xi(\eta_N);\eta_N) = (x;\eta(x))$. The action

$$\int_{\eta_n}^{\eta_{n+1}} L(\xi,\dot{\xi},\eta)\,d\eta$$

on each segment of the original trajectory between $(\xi(\eta_n);\eta_n)$ and $(\xi(\eta_{n+1});\eta_{n+1})$ differs from the action along the classical trajectory joining the same points by a second order term due to a variational characterization of the solution of Eq. (7.79). This permits a representation of $S_{xy}(\rho)$ as the limit of the sum

$$\int_{\eta_0}^{\eta_N} L(\xi,\dot{\xi},\eta)\,d\eta = \lim_{N\to\infty}\sum_{n=0}^{N-1} S(\xi(\eta_{n+1}),\xi(\eta_n),(\eta_{n+1} - \eta_n)), \tag{7.81}$$

as $N \to \infty$, $(\eta_{n+1} - \eta_n) \to 0$ and thus, $[\xi(\eta_{n+1}) - \xi(\eta_n)] \to 0$, for each $n$. Further

$$\int_{\eta_n}^{\eta_{n+2}} L(\xi,\dot{\xi},\eta)\,d\eta = \int_{\eta_n}^{\eta_{n+1}} L(\xi,\dot{\xi},\eta)\,d\eta + \int_{\eta_{n+1}}^{\eta_{n+2}} L(\xi,\dot{\xi},\eta)\,d\eta \tag{7.82}$$

for the original trajectories as well as their approximations by the piecewise classical trajectories.

Consider the Lagrangian

$$L(\xi,\dot{\xi}) = \frac{1}{2}m\dot{\xi}^2 - V(\xi) \tag{7.83}$$

describing the classical motion of a particle under the force field generated by a potential $V(\xi)$ in three-dimensional physical Euclidean space $M^N = E^3$ parameterized by time $t$. From Eq. (7.80), the classical action $S(\xi,\xi',\varepsilon)$ accurate up to the first order in $\varepsilon = t(\xi) - t(\xi')$ is easily seen to be given by

$$S(\xi,\xi',\varepsilon) = \frac{m}{2\varepsilon}(\xi - \xi')_\alpha(\xi - \xi')^\alpha - \varepsilon V(\xi) + o(\varepsilon^2). \tag{7.84}$$
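The representation of the action along an arbitrary curve as a limit of sums of short-interval classical actions, Eq. (7.81), can be illustrated numerically with the leading terms of Eq. (7.84). The Python sketch below is a one-dimensional illustration under assumptions that are not in the text: m = 1, the curve ξ(η) = sin η on [0;1], and the potential V(ξ) = ξ²/2, for which the action integral of the Lagrangian of Eq. (7.83) along this curve equals sin(2)/4. The function name `discrete_action` is hypothetical.

```python
import math

def discrete_action(path, potential, eta0, eta1, n_seg, m=1.0):
    """Sum of short-interval actions m (dq)^2 / (2 h) - h V(q_end) over n_seg segments."""
    h = (eta1 - eta0) / n_seg
    total = 0.0
    for n in range(n_seg):
        q0 = path(eta0 + n * h)          # start point of the segment
        q1 = path(eta0 + (n + 1) * h)    # end point of the segment
        total += m * (q1 - q0) ** 2 / (2.0 * h) - h * potential(q1)
    return total

if __name__ == "__main__":
    exact = math.sin(2.0) / 4.0          # integral of (cos^2 - sin^2) / 2 over [0, 1]
    for n_seg in (10, 100, 1000):
        s = discrete_action(math.sin, lambda q: 0.5 * q * q, 0.0, 1.0, n_seg)
        print(n_seg, s, abs(s - exact))
```

The error shrinks roughly in proportion to the segment length, which is the behavior expected from an expression that is accurate to first order on each segment.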
The Schrödinger equation describing a quantum mechanical, non-relativistic physical particle of mass $m$ moving under the influence of a potential $V$ reads, in the standard notation, as

$$i\,\frac{\partial\psi}{\partial t} = -\frac{1}{2m}\nabla^2\psi + V\psi. \tag{7.85}$$

From Eq. (2.35), Eq. (4.32) and Eq. (7.84), the solution is given by

$$\psi(\mathbf{r};t + \varepsilon) = \left(\frac{m}{2\pi i\varepsilon}\right)^{3/2}\int_{R^3} d\mathbf{r}'\,\exp\left[\frac{im(\mathbf{r} - \mathbf{r}')^2}{2\varepsilon}\right]e^{-i\varepsilon V(\mathbf{r}')}\,\psi(\mathbf{r}';t) + o(\varepsilon^2)$$
$$= \left(\frac{m}{2\pi i\varepsilon}\right)^{3/2}\int_{R^3} d\mathbf{r}'\,\exp[iS(\mathbf{r},\mathbf{r}',\varepsilon)]\,\psi(\mathbf{r}';t) + o(\varepsilon^2). \tag{7.86}$$
The functions $\exp[iS(\mathbf{r},\mathbf{r}',\varepsilon)]$ are termed phase factors. It follows from Eq. (7.82) that $\exp[iS(\mathbf{r},\mathbf{r}',\varepsilon)]$ satisfies the group property stated in Eq. (1.3-a). Consequently, from Remark 1.2 we have that if $\psi(\mathbf{r};t)$ is the solution of Eq. (7.85), then it is a weighted sum of the phase factors $\exp[iS_{xy}(\rho)]$, generated by the corresponding classical action, over a set of trajectories terminating at $x$, with the corresponding measure $m(\mathbf{r})$ being a constant multiple of $\mathbf{r}$. We show the converse below.

It follows from sec. 1.II.2 that the weighted sum of phase factors $\exp[iS_{xy}(\rho)]$ over trajectories satisfies Eq. (7.86). We consider a more general equation:

$$\psi[\xi;\eta + \varepsilon] = \frac{1}{Q}\int\left(\Pi_{\gamma=1}^{N}\,d\zeta^\gamma\right)\exp\left[i\zeta_\gamma\zeta^\gamma\right][1 - i\varepsilon V(\xi)]\,\psi\left[\xi - \sqrt{\frac{2\varepsilon}{m}}\,\zeta;\eta\right], \tag{7.87}$$

valid up to the first order in $\varepsilon$. The function $Q$ is introduced to facilitate the enforcement of the normalization of Eq. (1.7-c). The infinitesimal volume element, usually written as $\Pi_{\gamma=1}^{N}\,d\zeta^\gamma$, is in fact a wedge product, which is a linear combination of this type of terms. This is sufficient to legitimize the pertaining manipulations in the following. The metric is independent of $\zeta$, although it can depend on $\xi$. Eq. (7.86) can be expressed as Eq. (7.87) by letting

$$\sqrt{\frac{m}{2\varepsilon}}\,(\mathbf{r} - \mathbf{r}') = \sqrt{\frac{m}{2\varepsilon}}\,(\xi - \xi') = \zeta,$$
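and other obvious substitutions. Term by term comparison of the Taylor series expansions of both sides of Eq. (7.87) yields

$$\frac{1}{Q}\int dq = \frac{1}{Q}\int\left(\Pi_{\gamma=1}^{N}\,d\zeta^\gamma\right)\exp\left[i\zeta_\gamma\zeta^\gamma\right] = 1, \tag{7.88-a}$$

$$\frac{1}{Q}\,\frac{\partial\psi(\xi;\eta)}{\partial\xi^\alpha}\int dq\,\zeta^\alpha = \frac{1}{Q}\,\frac{\partial\psi}{\partial\xi^\alpha}\,Q^\alpha = 0, \tag{7.88-b}$$

$$\frac{\partial\psi(\xi;\eta)}{\partial\eta} = -iV(\xi)\,\psi(\xi;\eta) + \frac{1}{Qm}\,\frac{\partial^2\psi(\xi;\eta)}{\partial\xi^\alpha\partial\xi^\beta}\,Q^{\alpha\beta}, \tag{7.88-c}$$

where

$$Q^{\alpha\beta} = \int dq\,\zeta^\alpha\zeta^\beta \quad \text{and} \quad Q^\alpha = \int dq\,\zeta^\alpha.$$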
Above manipulations are valid with sufficiently smooth functions decaying sufficiently rapidly at infinity. For example, we can assume that ψ (ζ ) = e
−2δ |ζ α ζ α |
)
ψ (ζ ) with some
) δ > 0 , such that ψ (ζ ) is still a square integrable function. The operations can be carried out
with this representation and then let δ → 0 . Procedure is validated by the Lebesgue theorem (Theorem 1.3). Since such functions form a dense set in the underlying Hilbert space H = L2 ( R N ; d ζ ) , which together with boundedness of the operator on right side, implies the
validity of relation on the entire space (Theorem 2.1), the procedure is legitimate. This argument will be used in the following without repeat and explicit demonstration to focus on the main objective of this section. The integrals Q , Q α and Qαβ for usually encountered cases can be evaluated exactly. However, we use the method below to obtain the equation without having to evaluate them. The method is illustrated for Qαβ ; Qα can be evaluated by the same method. We have
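$$Q^{\alpha\beta} = -\frac{i}{2}\int \Big(\prod_{\gamma=1}^{N} d\zeta^{\gamma}\Big)\left\{\frac{\partial}{\partial\zeta^{\lambda}}\exp\!\big[i\zeta_{\gamma}\zeta^{\gamma}\big]\right\} g^{\alpha\lambda}\,\zeta^{\beta}.$$
By integration by parts with respect to $\zeta^{\lambda}$ we obtain
$$Q^{\alpha\beta} = \frac{i}{2}\int \Big(\prod_{\gamma=1}^{N} d\zeta^{\gamma}\Big)\exp\!\big[i\zeta_{\gamma}\zeta^{\gamma}\big]\, g^{\alpha\lambda}\,\frac{\partial\zeta^{\beta}}{\partial\zeta^{\lambda}} = \frac{i}{2}\,g^{\alpha\beta}\,Q. \tag{7.89}$$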
The same method yields $Q^{\alpha} = 0$. We have assumed that the boundary contribution vanishes and that the metric in the exponent, which is the same as $g$, is independent of $\zeta$. Substitutions in Eq. (7.88-c) yield
$$i\,\frac{\partial\psi(\xi;\eta)}{\partial\eta} = -\frac{1}{2m}\,g^{\alpha\beta}\,\frac{\partial^{2}\psi(\xi;\eta)}{\partial\xi^{\alpha}\partial\xi^{\beta}} + V(\xi)\,\psi(\xi;\eta). \tag{7.90}$$
Both sides of Eq. (7.86) reduce to $\psi(r;t)$ in the limit as $\varepsilon \to 0$. Equivalently,
$$\text{s-}\lim_{\varepsilon\to 0}\left[\psi(r;t+\varepsilon) - \left(\frac{m}{2\pi i\varepsilon}\right)^{3/2}\int_{R^{3}} dr'\,\exp\!\big[iS(r,r',\varepsilon)\big]\,\psi(r';t)\right] = 0. \tag{7.91}$$
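The step from Eqs. (7.88-c) and (7.89) to Eq. (7.90) can be written out in two lines. The block below is only a restatement of that substitution in the notation of the text; it introduces no assumption beyond $Q^{\alpha\beta} = (i/2)\,g^{\alpha\beta}Q$.

```latex
\begin{align*}
\frac{\partial\psi}{\partial\eta}
  &= -iV\psi + \frac{1}{Qm}\,Q^{\alpha\beta}\,
     \frac{\partial^{2}\psi}{\partial\xi^{\alpha}\partial\xi^{\beta}}
   = -iV\psi + \frac{i}{2m}\,g^{\alpha\beta}\,
     \frac{\partial^{2}\psi}{\partial\xi^{\alpha}\partial\xi^{\beta}}, \\
i\,\frac{\partial\psi}{\partial\eta}
  &= -\frac{1}{2m}\,g^{\alpha\beta}\,
     \frac{\partial^{2}\psi}{\partial\xi^{\alpha}\partial\xi^{\beta}} + V\psi .
\end{align*}
```

The second line follows from the first on multiplying by $i$; the factor $Q$ cancels, which is why the integrals never need to be evaluated.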
Feynman’s path-integral formulation of quantum mechanics [Feynman, 1948; Feynman and Hibbs, 1965] is based on two postulates. The first postulate states that the quantum-mechanical amplitude, or wave function, is an appropriately weighted sum of the phase factors $\exp[iS_{xy}(\rho)]$ obtained with the classical Lagrangian along all trajectories terminating at $x$. This assumption was used to deduce the propagator equation, Eq. (7.86), by arguments similar to those in sec. 1.II.2. By comparing the first two terms in the Taylor series expansions of both sides of Eq. (7.86), as outlined above from Eq. (7.86) to Eq. (7.90), it was deduced that the amplitude satisfies the Schrödinger equation, Eq. (7.85). Thus, we have supplemented the path-integral formulation by its converse. Various quantum mechanical equations can be obtained by the above method [Vatsya, 1999]. This allows flexibility in the application of this method to a wider class of problems.
Feynman’s second postulate interprets $|\psi(r;t)|^{2}$ as the probability density, which is incorporated into the path-integral formulation from the basic assumptions of quantum mechanics. While this concept generates some interesting issues, they are outside the scope of the present considerations.
Now we return to the main objective of this section, which is to consider the impact of a restriction on the trajectories to be summed over on the resulting equation and its solutions. We consider the following characterization and illustrate its implications with one of the simplest examples. A more general treatment can be found elsewhere [Vatsya, 2004-2]. Let $S_{\hat{x}\hat{y}}(\rho)$ be the action associated with an arbitrary trajectory $\rho(\hat{x}\hat{y})$ joining the points $\hat{x}$ and $\hat{y}$. Then $\rho(\hat{x}\hat{y})$ will be called physical if and only if
$$\hat{\kappa}(\hat{x},\tau(\hat{x}))\,\exp\!\big[iS_{\hat{x}\hat{y}}(\rho)\big]\,\hat{\kappa}^{-1}(\hat{y},\tau(\hat{y})) = 1, \tag{7.92}$$
where $\hat{\kappa}$ is a path-independent function. In a manifold without singularities, $\hat{\kappa} = 1$. We shall encounter only analytic manifolds and thus set $\hat{\kappa} = 1$. This characterization of the admissible trajectories has been incorporated in the path-integral formulation as follows. As discussed above, $\psi(r;t+\varepsilon)$ can be expressed as a weighted sum over a set of trajectories $\xi(\eta)$. For a general setting, this reads as
$$\chi(x,\eta(x)) = \sum_{\xi(\eta),\,y} m(\xi(\eta))\,\exp\!\big[iS_{xy}(\xi(\eta),\eta)\big], \tag{7.93}$$
where $\eta = \eta(x) - \eta(y)$. The weight $m(\xi(\eta))$ is usually taken to be a path-independent function, even a constant, as in obtaining Eq. (7.91). However, it is sufficient that $m(\xi(\eta))$ defines a suitable measure. The sum in Eq. (7.93) is taken over all physical, or admissible, trajectories passing through $x$, i.e., if $x$, $y$ are two arbitrary points under consideration, then $\xi(\eta) = \rho$ is admissible in Eq. (7.93) if and only if $x$, $y$ are included in a segment $[\hat{x};\hat{y}]$ of $\xi(\eta) = \rho(\hat{x},\hat{y})$ with $\hat{x}$, $\hat{y}$ and $\rho$ satisfying Eq. (7.92). Thus the set of contributing paths is characterized by $S_{\hat{x}\hat{y}}(\rho) = 2\pi n$, with $n$ being an arbitrary integer. Since each term in the sum of Eq. (7.93) is periodic in the action with period $2\pi$, it is sufficient to include only the elemental trajectories, i.e., $n = 1$. Although not complete, this procedure to incorporate Eq. (7.92) into the path-integral formulation is sufficiently accurate [Vatsya, 1998]. Some additional implications can be inferred directly from Eq. (7.92).
The sums over the subsets of all trajectories can be included by redistributing the measure as indicated in sec. 1.II.2. In fact, the original formulation is in terms of sums, which are replaced with integrals by introducing a suitable measure. This can be done equally well with a subset of all trajectories. The consequent representation of $\chi$ is still given by
$$\chi(x;\eta+\varepsilon) = \int_{R^{3}} dm(y)\,\exp\!\big[iS(x,y,\varepsilon)\big]\,\chi(y;\eta+\varepsilon) + o(\varepsilon^{2}). \tag{7.94}$$
Let $S(x,\eta(x))$ be the set of contributing trajectories. If the parameter $\eta$ leaves $S(x,\eta(x))$ invariant under a translation $\eta \to (\eta + \tilde{\eta})$, i.e., $S(x,\eta(x)) = S(x,\eta(x)+\tilde{\eta})$, then from Eq. (7.93), $\chi(x,\eta)$ is periodic with respect to $\eta$ with period $\tilde{\eta}$. It is clear that the evolution parameter, time $t$, used to deduce Eq. (7.85) does not satisfy this requirement. We shall return to this point later.
In the following, we illustrate the procedure with a simple example. Consider the trajectories in $M^{N} = E^{3}$, i.e., the three-dimensional Euclidean space. Curves in $E^{3}$ can be parameterized by their arc-lengths $\tau$. Then the infinitesimal arc-length is the action determined by the Lagrangian $L' = \omega(\dot{\xi}_{\alpha}\dot{\xi}^{\alpha})^{1/2}$, within a multiplicative constant $\omega$, included to characterize the particle under consideration observed in a small neighborhood of some point in $E^{3}$. While a unique value of $\tau$ is associated with a fixed curve in terms of the other coordinates, it varies from curve to curve. Therefore, $\tau$ must be considered an independent parameter. However, the action calculated should remain the same as with $L'$ for each trajectory. This is possible by taking an inhomogeneous Lagrangian $L = [\omega(\dot{\xi}_{\alpha}\dot{\xi}^{\alpha} + 1)/2]$. This construction is more generally applicable. Computations can be carried out with a suitable inhomogeneous Lagrangian, $L$ in the present case, with the parameter $\tau$ considered an independent variable. The action computed by setting $\partial S/\partial\tau = 0$ yields the same value of the action as with $L'$, which is the quantity of interest. For this program to succeed, it is required that the momenta generated by each form of the Lagrangian be the same in form. This replacement serves both of the needs. For the present case, the action along a classical trajectory with $L$ is given by
$$S(x,y,\varepsilon) = \frac{\omega}{2}\left[\frac{(x-y)_{\alpha}(x-y)^{\alpha}}{\varepsilon} + \varepsilon\right]. \tag{7.95-a}$$
Setting $\partial S/\partial\varepsilon = 0$ yields
$$S(x,y,\varepsilon) = \omega\sqrt{(x-y)_{\alpha}(x-y)^{\alpha}} = \omega\varepsilon, \tag{7.95-b}$$
which is the value as obtained from $L'$. The knowledge of the action along infinitesimals, which are always classical, is sufficient to deduce the differential equation by the same method as Eq. (7.90). The equation can also be obtained directly from the Schrödinger equation, Eq. (7.90), by replacing $m$ by $\omega$ and $V$ by $-\omega/2$. In any case, the resulting equation with the action given by Eq. (7.95-a) reads
$$-2i\omega\,\frac{\partial\chi}{\partial\tau} = \big[\partial_{\alpha}\partial^{\alpha} + \omega^{2}\big]\chi. \tag{7.96}$$
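The stationarity step that takes Eq. (7.95-a) to Eq. (7.95-b) can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names, the bisection bracket and the sample values of $\omega$ and $|x-y|$ are illustrative assumptions. It locates the value of $\varepsilon$ at which $\partial S/\partial\varepsilon$ vanishes and compares the resulting action with $\omega|x-y|$.

```python
def action(distance, eps, omega):
    """Eq. (7.95-a): S = (omega / 2) * (d^2 / eps + eps), with d = |x - y|."""
    return 0.5 * omega * (distance * distance / eps + eps)


def d_action_d_eps(distance, eps, omega, h=1e-6):
    """Central-difference estimate of dS/d(eps)."""
    return (action(distance, eps + h, omega)
            - action(distance, eps - h, omega)) / (2.0 * h)


def stationary_eps(distance, omega, lo=1e-3, hi=1e3, iterations=200):
    """Find the root of dS/d(eps) = 0 by bisection on the bracket [lo, hi].

    dS/d(eps) is negative for eps < d and positive for eps > d,
    so the sign of the derivative tells us which half to keep.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if d_action_d_eps(distance, mid, omega) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative values only; any positive omega and distance inside the bracket work.
omega, distance = 1.7, 2.5
eps_star = stationary_eps(distance, omega)
print(eps_star, action(distance, eps_star, omega), omega * distance)
```

Under these assumptions the stationary point sits at $\varepsilon = |x-y|$, where the two terms of Eq. (7.95-a) are equal, so the action there equals $\omega\varepsilon$ as stated in Eq. (7.95-b).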
The action along an arbitrary curve can be obtained by integration
$$S_{xy}(\rho) = S(\rho(x,y)) = \int_{\tau(y)}^{\tau(x)} dS(x',y,\tau) \tag{7.97-a}$$
with
$$dS(x',y,\tau) = S(x'+dx,y,\tau+\varepsilon) - S(x',y,\tau) = \frac{\omega}{2}\left[\frac{dx_{\alpha}dx^{\alpha}}{\varepsilon} + \varepsilon\right] = \omega\sqrt{dx_{\alpha}dx^{\alpha}}, \tag{7.97-b}$$
as $\varepsilon \to 0$. The equivalence of the action and $\omega\tau$, obtained for infinitesimals in Eqs. (7.95-a) and (7.95-b), persists along all extremals, as required by $L'$. For the present case, these curves are straight lines in $E^{3}$. Along the other trajectories, the action can be computed from Eq. (7.97-a). The sum in Eq. (7.93) is taken over all monotonic trajectories defined by
$$S_{xy}(\rho) = S(x,y,\tau(x),\tau(y)) = \omega(\tau(x) - \tau(y)), \tag{7.98}$$
by virtue of the fact that the arc-length is taken as the parameter, implying that $L = L' = 1$. Eq. (7.98) implies that the translation $\tau \to (\tau + 2\pi n/\omega)$ of the arc-length induces the translation $S \to (S + 2\pi n)$ of the action, where $n$ is an integer. Eqs. (7.92) and (7.93), and thus the set $S$ of the contributing trajectories, are invariant under the translation $S \to (S + 2\pi n)$ of the action. In view of Eq. (7.98), they are also invariant under the corresponding translation $\tau \to (\tau + 2\pi n/\omega)$ of the arc-length $\tau$. Hence, $\chi(x,\tau)$ is periodic with period $\tilde{\tau} = (2\pi/\omega)$. Thus, the derivative $\partial/\partial\tau$ in Eq. (7.96) can be considered a self-adjoint operator on $L^{2}([0,2\pi/\omega], d\tau)$ with the domain defined by differentiability and the boundary condition $\chi(x,0) = \chi(x,2\pi/\omega)$, which was described in Example 2.4.
So far we have determined the consequences of an equivalence of the action with $\omega\tau$, resulting from Eqs. (7.92) and (7.93). For the resulting action to correspond to $L'$, we must have $\partial S/\partial\tau = 0$ everywhere. This condition together with Eq. (7.94) implies that $\partial\chi/\partial\tau = 0$, which holds for all homogeneous Lagrangians. It follows from Eq. (7.96) that
$$\big[\partial_{\alpha}\partial^{\alpha} + \omega^{2}\big]\psi(\omega;x) = 0. \tag{7.99}$$
The solution of Eq. (7.99) obviously satisfies the required boundary condition also. Now we restrict the sum in Eq. (7.93) to the elemental extremals defined by Eq. (7.95-b) for all $x$ and $\varepsilon = \tau$, i.e.,
$$S(x,y,\tau(x),\tau(y)) = \omega\sqrt{(x-y)_{\alpha}(x-y)^{\alpha}} = \omega(\tau(x)-\tau(y)), \tag{7.100}$$
which defines the straight lines in $E^{3}$. Longer paths are piecewise elemental extremals in accordance with Eq. (7.92). As indicated above, the sum over all such trajectories reduces to the sum over elemental paths, rendering the procedure adequate. Also, all of the above results, particularly the representation of Eq. (7.94), hold with this restriction as well, which will be assumed.
The condition given by Eq. (7.100) defines a 3-dimensional surface $Z_{3}$ in a 4-dimensional manifold $\Re^{4}_{+}$ obtained by adjoining the arc-length to $E^{3}$ as the additional coordinate $dx^{0} = d\tau$. The corresponding metric $g_{\mu\nu}$, $\mu,\nu = 0,1,2,3$, induced by Eq. (7.100), is defined by $g_{00} = 1$, $g_{0\alpha} = g_{\alpha 0} = 0$ and $g_{\alpha\beta} = -\delta_{\alpha\beta}$, $\alpha,\beta = 1,2,3$, where $\delta$ is the Kronecker delta. With the arc-length in $\Re^{4}_{+}$ denoted by $s$, the surface $Z_{3} = \Re^{4}_{0}$ is defined by $s^{2} = 0$ and $\Re^{4}_{+}$ by $s^{2} > 0$. A reflected copy $\Re^{4}_{-}$ of $\Re^{4}_{+}$ is defined by $s^{2} < 0$. The union $\Re^{4}$ of $\Re^{4}_{+}$, $\Re^{4}_{-}$ and their common boundary $\Re^{4}_{0}$ thus naturally acquires a Minkowskian structure. The arc-length $\tau$ is now identified with time $t$, and $\omega$ with energy. It is clear that the straight lines in $E^{3}$ are curves in $\Re^{4}_{0}$, and the trajectories for which the classical arc-length joining their end points exceeds the action are in $\Re^{4}_{+}$. An invariant variable $m$, conjugate to $s$, is naturally generated and identified with the rest mass. This structure is sufficient to develop the classical relativistic mechanics by standard methods requiring no other assumptions. Thus, a particle in $E^{3}$ traveling along the straight lines collectively manifests itself as a photon with energy $\omega$, and the same particle traveling along other paths collectively manifests itself as a particle with non-zero rest mass.
It is convenient to isolate the coordinate $dx^{0} = d\tau = dt$, which has now been identified with the arc-length in $E^{3}$. While Eq. (7.99) describes a photon of an arbitrary energy $\omega$, the other eigenvectors of the time derivative consequent to Eqs. (7.92) and (7.93) are not redundant, for the following reason. Let $S_{c}(\omega)$ denote the set of admissible extremals in accordance with Eq. (7.92), with the associated fundamental period $[0;T]$, where $T = (2\pi/\omega)$.
The existence of the set $S_{c}(\omega)$ implies the existence of $S_{c}(n\omega)$, for the elemental path with fundamental period $T = (2\pi/\omega)$ can be obtained as the union of a subset of paths in $S_{c}(n\omega)$, and thus is in $S_{c}(n\omega)$. Thus, if a particle characterized by $\omega$ is realizable, i.e., there are physical paths for it to follow, then its integral multiples are also realizable, classifying a discrete set of energies. The solutions $\psi(n\omega)$ corresponding to the multiples $(n\omega)$ are obtained from Eq. (7.99) by replacing $\omega$ by $(n\omega)$; equivalently, by taking the Lagrangians $L' = n\omega(\dot{\xi}_{\alpha}\dot{\xi}^{\alpha})^{1/2}$, $L = [n\omega(\dot{\xi}_{\alpha}\dot{\xi}^{\alpha} + 1)/2]$ and following the above procedure, yielding
$$\big[\partial_{\alpha}\partial^{\alpha} + n^{2}\omega^{2}\big]\psi(n\omega;x) = 0, \quad n = 0, \pm 1, \pm 2, \ldots. \tag{7.101}$$
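The set of solutions $\{\psi(n\omega)\}$ can be represented collectively by
$$\phi(x,\tau) = \sum_{n=-\infty}^{\infty}\psi(n\omega;x)\,\exp[in\omega t], \tag{7.102}$$
which satisfies the equation
$$\partial_{\mu}\partial^{\mu}\phi = \left[\frac{\partial^{2}}{\partial t^{2}} - \partial_{\alpha}\partial^{\alpha}\right]\phi = 0, \tag{7.103}$$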
together with the required boundary condition. Conversely, Eq. (7.103) together with the boundary condition $\phi(x,0) = \phi(x,T)$ yields the set of solutions $\{\psi(n\omega)\}_{n=-\infty}^{\infty}$, establishing equivalence between the solutions of Eqs. (7.101) and (7.103). Thus, Eq. (7.103) represents a collection of photons, i.e., it provides their field description.
The elemental trajectory in the set $S_{c}(\omega/n)$ is the union of a subset of the trajectories in $S_{c}(\omega)$ and the associated period is equal to $(nT)$. Thus, the integral fractions of the basic frequency are also realizable. The associated photons can be interpreted as the down-converted ones from a photon of energy $\omega$, which itself can be considered down-converted from a photon of energy $n\omega$. In any case, they can be classified in symmetry with the integral multiples in the conjugate space of $E^{3}$ obtained by interchanging the roles of $x^{\alpha}$ and their conjugate momenta $p_{\alpha}$. The conjugates of the momenta $p_{\alpha}$ with respect to the conjugate space are $x^{\alpha}$. The above steps can then be followed to yield $(nT)$, instead of $(n\omega)$, in complete symmetry. Several issues arise out of the above considerations, which will not be pursued for the present.
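The relation between Eqs. (7.101), (7.102) and (7.103) can be illustrated numerically in one space dimension. The sketch below is an illustration under stated assumptions rather than anything taken from the text: it uses the plane waves $\exp(in\omega x)$ as the solutions $\psi(n\omega;x)$, truncates the sum to a few arbitrary coefficients, and uses hypothetical function names. It estimates the residual $\phi_{tt} - \phi_{xx}$ by central differences and checks the periodicity $\phi(x,t+T) = \phi(x,t)$ with $T = 2\pi/\omega$.

```python
import cmath
import math


def psi(n, x, omega):
    """One-dimensional plane-wave solution of Eq. (7.101): psi'' + (n*omega)^2 psi = 0."""
    return cmath.exp(1j * n * omega * x)


def phi(x, t, omega, coefficients):
    """Truncated form of Eq. (7.102): sum over n of c_n psi(n*omega; x) exp(i n omega t)."""
    return sum(c * psi(n, x, omega) * cmath.exp(1j * n * omega * t)
               for n, c in coefficients.items())


def second_difference(f, s, h=1e-3):
    """Central-difference estimate of the second derivative of f at s."""
    return (f(s + h) - 2.0 * f(s) + f(s - h)) / (h * h)


def wave_residual(x, t, omega, coefficients):
    """Left side of Eq. (7.103) in one space dimension: phi_tt - phi_xx."""
    phi_tt = second_difference(lambda s: phi(x, s, omega, coefficients), t)
    phi_xx = second_difference(lambda s: phi(s, t, omega, coefficients), x)
    return phi_tt - phi_xx


omega = 2.0                        # illustrative basic frequency
period = 2.0 * math.pi / omega     # T = 2*pi/omega
# Arbitrary complex weights for n = -2, ..., 2 (assumed for the example only).
coefficients = {-2: 0.3, -1: 1.0 - 0.5j, 0: 0.7, 1: 0.2j, 2: -0.4}

print(abs(wave_residual(0.4, 1.1, omega, coefficients)))
print(abs(phi(0.4, 1.1 + period, omega, coefficients)
          - phi(0.4, 1.1, omega, coefficients)))
```

Both printed quantities should be at the level of rounding error. Only integer multiples of $\omega$ appear in the sum, which is what makes the field periodic with the fundamental period $T$.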
Eq. (7.103) is recognized as the equation for free field electromagnetic potentials, with 4-vector solutions. Existence of the vector solutions can also be established by purely mathematical considerations. The solutions define a tensor and its dual, both of rank two, as
$$f^{\mu\nu} = (\phi^{\nu,\mu} - \phi^{\mu,\nu}), \qquad \hat{f}^{\mu\nu} = \delta^{\mu\nu\sigma\eta} f_{\sigma\eta},$$
respectively, where $\delta^{\mu\nu\sigma\eta}$ is the Levi-Civita tensor density. The dual $\hat{f}$ satisfies the Jacobi identity:
$$\hat{f}^{\mu\nu}{}_{,\nu} = 0, \tag{7.104-a}$$
consequent of its definition. Eq. (7.104-a) is one pair of Maxwell's equations. The identity $f^{\mu\nu}{}_{,\nu} = (\phi^{\nu,\mu}{}_{,\nu} - \phi^{\mu,\nu}{}_{,\nu})$, together with the gauge fixing condition $\phi^{\mu}{}_{,\mu} = 0$, yields the remaining free field Maxwell equations:
$$f^{\mu\nu}{}_{,\nu} = 0. \tag{7.104-b}$$
The converse also holds, which is the standard approach to deduce Eq. (7.103) satisfied by the electromagnetic potentials but without the periodicity. The periodic boundary condition deduced here induces periodicity in $\phi$ and the associated quantization on $f^{\mu\nu}$ and $\hat{f}^{\mu\nu}$ also.
Starting with the path-integral formulation in $E^{3}$ and restricting the summation to piecewise straight lines, we have deduced a particle formulation of photons in Eq. (7.99) and Eq. (7.101), as well as their quantized field formulation in Eq. (7.103) to Eq. (7.104-b). In the process, non-extremal trajectories are transferred to the union of $\Re^{4}_{+}$ and $\Re^{4}_{-}$, generating a physical particle of non-zero rest mass $m$. The same arguments can be repeated in $\Re^{4}_{+}$ with parallel results in $\Re^{4}_{-}$. The Lagrangians are then taken as $L' = m(\dot{x}_{\mu}\dot{x}^{\mu})^{1/2}$ and $L = [m(\dot{x}_{\mu}\dot{x}^{\mu} + 1)/2]$, with the arc-length being the proper time denoted by $\tau$. The straight lines are replaced by the extremals in $\Re^{4}_{+}$, which define a 4-dimensional surface $Z_{4}$ in the 5-dimensional space obtained by adjoining $\tau$ as the additional coordinate to $\Re^{4}$, which then acquires a Minkowskian structure with $Z_{4} = \Re^{5}_{0}$ joining $\Re^{5}_{+}$ and its copy $\Re^{5}_{-}$. Non-extremals are transferred to $\Re^{5}_{+}$ and $\Re^{5}_{-}$. In $\Re^{5}_{0}$ the following particle equation, parallel to Eq. (7.101), is obtained:
$$\big[\partial_{\mu}\partial^{\mu} + n^{2}m^{2}\big]\psi(nm;x) = 0, \quad n = 0, \pm 1, \pm 2, \ldots. \tag{7.105}$$
Proceeding as in the deduction of Eq. (7.103) yields an equivalent equation for the 5-vector potential:
$$\left[\frac{\partial^{2}}{\partial\tau^{2}} - \partial_{\mu}\partial^{\mu}\right]\phi = 0, \tag{7.106}$$
which provides a field description of a collection of the multiples of particles of rest mass $m$. The above arguments apply with an arbitrary dimension and thus, by induction, the results are true for all dimensions $N$. Each time, the procedure generates a new invariant conjugate to the arc-length, of the type of rest mass, together with its particle and field descriptions. Tensor fields can be constructed from $N$-vectors in the same manner. To be precise, let $\phi$ be an $N$-vector solution of the equation corresponding to Eq. (7.106), which has a 5-vector solution. Define a tensor of rank two and its dual of rank $(N-2)$ by
$$f^{\mu\nu} = (\phi^{\nu,\mu} - \phi^{\mu,\nu}), \qquad \hat{f}^{\mu_{1}\ldots\mu_{N-2}} = \delta^{\mu_{1}\ldots\mu_{N}} f_{\mu_{N-1}\mu_{N}},$$
respectively, where $\delta^{\mu_{1}\ldots\mu_{N}}$ is the Levi-Civita tensor density. By virtue of its definition, the dual $\hat{f}$ satisfies the Jacobi identities
$$\hat{f}^{\mu_{1}\ldots\mu_{N-2}}{}_{,\mu_{j}} = 0, \quad j = 1, 2, \ldots, (N-2). \tag{7.107-a}$$
The identity $f^{\mu\nu}{}_{,\nu} = (\phi^{\nu,\mu}{}_{,\nu} - \phi^{\mu,\nu}{}_{,\nu})$ together with the gauge fixing condition $\phi^{\mu}{}_{,\mu} = 0$ yields
$$f^{\mu\nu}{}_{,\nu} = 0. \tag{7.107-b}$$
Conversely, Eq. (7.107-a) implies that $f$ can be expressed as the exterior derivative of a potential $\phi$ with $\hat{f}$ as its dual. Then Eq. (7.107-b) together with the gauge fixing condition yields Eq. (7.106) in $\phi$. An alternative gauge is sometimes more convenient. The gauge used here corresponds to the Lorentz condition.
The Schrödinger equation, Eq. (7.85), still remains outside the framework based on the restricted sets of trajectories, Eq. (7.92). One way to proceed would be to construct a Finsler space. In this construction, the parameter is treated as an additional coordinate, a new evolution parameter is introduced and the equivalence of the action and parameter along the trajectories is enforced. A metric tensor is then constructed from the Lagrangian, providing a metric space structure to the composite manifold. We use a Riemannian space structure, which is more satisfactory, although it has some similarity with the Finsler spaces.
Classical equations of motion can be obtained as the non-relativistic limits of their relativistic counterparts, which are equations in $\Re^{4}_{+}$. The Schrödinger equation can also be obtained as the non-relativistic limit of the Klein-Gordon equation in the presence of weak, slowly varying fields [Bjorken and Drell, 1964]. The potential $V$ in Eq. (7.85) is the time component of the 4-component potential $\{\phi_{\mu}\}_{\mu=0}^{3}$ describing the external field, i.e., $V = \phi_{0}$, and the other three potentials are negligible. We shall attempt to interpret the Schrödinger equation as an approximation of an equation obtainable by the above procedure by first deducing the Klein-Gordon equation. The procedure is outlined below. Further details are available elsewhere [Vatsya, 1995].
In general the corresponding classical relativistic Lagrangian is given by
$$L_{rel} = \frac{1}{2}\,\dot{x}_{\mu}\dot{x}^{\mu} - \kappa\phi_{\mu}\dot{x}^{\mu},$$
where $\kappa$ is the coupling parameter. As seen above, without the potentials, the action with $L_{rel}$ is equal to the arc-length in $\Re^{4}_{+}$, but not so with non-zero potentials. The problem can be formulated on the background of a Riemannian space with the Kaluza-Klein construction, which introduces an additional coordinate $q$ and defines the infinitesimal arc-length $ds$ by
$$ds^{2} = \big[dx_{\mu}dx^{\mu} - (dq + \phi_{\mu}dx^{\mu})^{2}\big],$$
determining the metric on the 5-dimensional Kaluza-Klein space $\Re_{K}$, with $(\{x^{\mu}\}, q)$ forming a 5-vector. The above procedure used for the Minkowskian spaces can be followed with adjustments to deduce parallel results in some Riemannian spaces. The results in $\Re_{K}$ can be obtained as their special case or by following a parallel deduction. For transparency and to facilitate a comparison with the relativistic formulation, we treat $\{x^{\mu}\}$ as a 4-vector in $\Re^{4}_{+}$ and $q$ as an additional coordinate. Thus, the underlying manifold is treated as the Cartesian product $\Re_{C}$ of $\Re^{4}_{+}$ and the one-dimensional space spanned by $q$. A Riemannian manifold structure is introduced in the process. The inhomogeneous form of the Lagrangian in $\Re_{C}$ is given by
$$L = \frac{1}{2}\,m'\big[\dot{x}_{\mu}\dot{x}^{\mu} - (\dot{q} + \phi_{\mu}\dot{x}^{\mu})^{2} + 1\big],$$
where $m'$ is a constant. The Hamiltonian is given by
$$H = \frac{1}{2m'}\big[(p_{\mu} - \phi_{\mu}p_{q})(p^{\mu} - \phi^{\mu}p_{q}) - p_{q}^{2}\big] - \frac{m'}{2},$$
where $p_{q} = \partial L/\partial\dot{q}$ is the momentum conjugate to $q$. A comparison of Hamilton’s equations obtained from $H$ and those deduced from $L_{rel}$ determines $p_{q} = \kappa$, which together with the relativistic relation $(p_{\mu} - \kappa\phi_{\mu})(p^{\mu} - \kappa\phi^{\mu}) = m^{2}$ results in $m' = \sqrt{m^{2} - \kappa^{2}}$.
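The passage from the inhomogeneous Lagrangian $L$ to the Hamiltonian $H$ is a Legendre transformation, and it can be spot-checked numerically. The sketch below is illustrative only: the function names, the finite-difference step and the randomly drawn sample values are assumptions, and the potential is treated as a constant covariant 4-vector at the sample point. The momenta are obtained from $L$ by central differences, and $p_{\mu}\dot{x}^{\mu} + p_{q}\dot{q} - L$ is compared with the closed form quoted for $H$.

```python
import random

# Diagonal of the metric used in the text: g_00 = 1, g_ab = -delta_ab.
METRIC = (1.0, -1.0, -1.0, -1.0)


def lagrangian(xdot, qdot, phi_lower, m_prime):
    """L = (m'/2) [xdot_mu xdot^mu - (qdot + phi_mu xdot^mu)^2 + 1].

    xdot holds the upper-index components, phi_lower the lower-index ones.
    """
    kinetic = sum(g * v * v for g, v in zip(METRIC, xdot))
    mixed = qdot + sum(a * v for a, v in zip(phi_lower, xdot))
    return 0.5 * m_prime * (kinetic - mixed * mixed + 1.0)


def momenta(xdot, qdot, phi_lower, m_prime, h=1e-4):
    """Central differences for p_mu = dL/d(xdot^mu) and p_q = dL/d(qdot)."""
    p_lower = []
    for i in range(4):
        up = list(xdot)
        dn = list(xdot)
        up[i] += h
        dn[i] -= h
        p_lower.append((lagrangian(up, qdot, phi_lower, m_prime)
                        - lagrangian(dn, qdot, phi_lower, m_prime)) / (2.0 * h))
    p_q = (lagrangian(xdot, qdot + h, phi_lower, m_prime)
           - lagrangian(xdot, qdot - h, phi_lower, m_prime)) / (2.0 * h)
    return p_lower, p_q


def hamiltonian_legendre(xdot, qdot, phi_lower, m_prime):
    """H = p_mu xdot^mu + p_q qdot - L."""
    p_lower, p_q = momenta(xdot, qdot, phi_lower, m_prime)
    return (sum(p * v for p, v in zip(p_lower, xdot)) + p_q * qdot
            - lagrangian(xdot, qdot, phi_lower, m_prime))


def hamiltonian_closed_form(xdot, qdot, phi_lower, m_prime):
    """H = [(p - phi p_q)_mu (p - phi p_q)^mu - p_q^2] / (2 m') - m'/2."""
    p_lower, p_q = momenta(xdot, qdot, phi_lower, m_prime)
    shifted = [p - a * p_q for p, a in zip(p_lower, phi_lower)]
    # The inverse of a diagonal metric with entries +1/-1 has the same diagonal.
    square = sum(g * s * s for g, s in zip(METRIC, shifted))
    return (square - p_q * p_q) / (2.0 * m_prime) - 0.5 * m_prime


rng = random.Random(0)             # fixed seed, sample values are arbitrary
xdot = [rng.uniform(-1.0, 1.0) for _ in range(4)]
phi_lower = [rng.uniform(-1.0, 1.0) for _ in range(4)]
qdot, m_prime = rng.uniform(-1.0, 1.0), 1.3

print(hamiltonian_legendre(xdot, qdot, phi_lower, m_prime),
      hamiltonian_closed_form(xdot, qdot, phi_lower, m_prime))
```

Because $L$ is quadratic in the velocities, the central differences are exact up to rounding, so the two printed values should agree closely for any sample point.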
The expansion of the action $S$ can be obtained from the Hamilton-Jacobi equation, Eq. (7.80), by Hamilton’s method or by supplementing it with other standard methods. Up to the order of accuracy necessary to deduce the differential equation, $S$ from a point $(y;q_{0})$ to $(x;q+q_{0})$ with $\xi = (x-y)$ is given by
$$\begin{aligned} S &= \frac{m'}{2\varepsilon}\left[\xi_{\mu}\xi^{\mu} - \frac{1}{12}\,g^{\mu\nu}f_{\mu\gamma}f_{\nu\iota}\,\xi^{\gamma}\xi^{\iota}\left(q + \phi_{\mu}\xi^{\mu} + \frac{1}{2}\phi_{\mu,\nu}\xi^{\mu}\xi^{\nu} + \frac{1}{6}\phi_{\mu,\nu\iota}\xi^{\mu}\xi^{\nu}\xi^{\iota}\right)^{2}\right] + \frac{1}{2}m'\varepsilon \\ &= \zeta_{\mu}\zeta^{\mu} - \frac{\varepsilon}{6m}\,q^{2}g^{\mu\nu}f_{\mu\gamma}f_{\nu\iota}\,\zeta^{\gamma}\zeta^{\iota} + \frac{1}{2}m'\varepsilon, \end{aligned} \tag{7.108}$$
where the subscript ${}_{,\nu}$ denotes the derivative with respect to $x^{\nu}$ and $\zeta_{\mu} = \sqrt{m'/2\varepsilon}\,\xi_{\mu}$, which define $q$. The Riemannian structure of the Kaluza-Klein manifold is manifest in Eq. (7.108). Also, the action $S$ is now a constant multiple of the arc-length in the manifold. Substitution of $S$ from Eq. (7.108) together with obvious variable transformations reduces Eq. (7.94) to Eq. (7.87), with $V$ replaced by $(q^{2}g^{\mu\nu}f_{\mu\gamma}f_{\nu\iota}\zeta^{\gamma}\zeta^{\iota}/6m + \tfrac{1}{2}m')$. As explained in deducing Eq. (7.99), homogeneity of the Lagrangian forces the condition $\partial S/\partial\tau = 0$, implying also $\partial\psi/\partial\tau = 0$. In the Kaluza-Klein formalism, $q$ varies over a compact domain in the real line. The integral can be extended to the entire real line by periodicity. The formalism introduces quantization of the coupling strength in the integral multiples $n\kappa$ of $\kappa$. The solution of interest for the present corresponds to the fundamental period, i.e., $n = 1$. The resulting equation for $n = 1$ in the presently used units reads
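$$\left(i\frac{\partial}{\partial x^{\mu}} + \kappa\phi_{\mu}\right)\left(i\frac{\partial}{\partial x_{\mu}} + \kappa\phi^{\mu}\right)\psi = \left[m^{2} - \frac{1}{12}\,f_{\mu\nu}f^{\mu\nu}\right]\psi; \tag{7.109-a}$$
and in the standard units,
$$\left(i\hbar\frac{\partial}{\partial x^{\mu}} + \frac{\kappa}{c}\phi_{\mu}\right)\left(i\hbar\frac{\partial}{\partial x_{\mu}} + \frac{\kappa}{c}\phi^{\mu}\right)\psi = \left[m^{2} - \frac{1}{6}\,G\hbar^{2}f_{\mu\nu}f^{\mu\nu}\right]\psi, \tag{7.109-b}$$
where $\hbar$ is Planck’s constant divided by $2\pi$, $c$ is the speed of light and $G$ is the universal gravitational constant. The constant multiples of $f_{\mu\nu}f^{\mu\nu}$ arise out of the curvature $R_{K}$ of $\Re_{K}$ given by $R_{K} = f_{\mu\nu}f^{\mu\nu}/4$.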
As seen from Eq. (7.109-b), the contribution of the curvature term is minuscule in comparison with the other terms and of one order higher than the quantum mechanical equations. In any case, in the flat manifold $\Re^{4}_{+}$ the equations reduce to the Klein-Gordon equations, i.e., with the curvature term omitted. Thus, the Schrödinger equation, Eq. (7.85), is deduced as the non-relativistic limit of the Klein-Gordon equation, which is the zero curvature limit of Eq. (7.109-a), equivalently Eq. (7.109-b), resulting from the present formulation.
The formulation of photons outlined in Eqs. (7.99) to (7.104-b) can now be compared with the standard scheme to quantize the electromagnetic field. With the gauge $\varphi_{0} = \varphi^{0} = 0$, $\nabla\cdot\varphi = 0$, which is more convenient for this purpose, Maxwell’s equations are reduced to Eq. (7.103) in the vector potential $\varphi$ but without the periodicity of $\phi$. Taking the Fourier transform with respect to the space variables transforms Eq. (7.103) into
$$\ddot{a}_{k}(\varphi) + k^{2}(\varphi)\,a_{k}(\varphi) = 0, \tag{7.110}$$
where $a_{k}(\varphi)$ is the transformed variable and the dot denotes the derivative with respect to time $t$. The solutions of Eq. (7.110) satisfy the condition $k\cdot a_{k}(\varphi) = 0$, leaving two independent transverse waves, both being essentially equivalent. A one-dimensional classical, non-relativistic material harmonic oscillator of unit mass is described by
$$\ddot{q} + \varpi^{2}q = 0, \tag{7.111}$$
where $q$ is the position and $\varpi$ the frequency. By identifying $a_{k}(\varphi)$ with $q$ and $k$ with $\varpi$, it is assumed by analogy with Eq. (7.110) that the field can be considered a collection of harmonic oscillators. The electromagnetic field is quantized by assuming them to be quantum oscillators, each one of which is described by the Schrödinger equation
$$\frac{1}{2}\left[-\frac{\partial^{2}}{\partial q^{2}} + \varpi^{2}q^{2}\right]u = i\,\frac{\partial u}{\partial t}. \tag{7.112}$$
In the present formulation, the quantized electromagnetic potentials are the amplitudes obtained as weighted sums of the classical phase factors along straight lines in $E^{3}$. By dropping the two space dimensions, Eq. (7.109-a) in its flat manifold approximation reduces to the Klein-Gordon equation for a one-dimensional material harmonic oscillator, which in its non-relativistic limit reduces to Eq. (7.112). Thus, Eq. (7.112) originates in a three-dimensional Kaluza-Klein manifold, indicating fundamentally different characters of the two formulations. The operator on the left side of Eq. (7.112) has a pure point spectrum with eigenvalues $(n + 1/2)\varpi$ for all non-negative integers $n$. Thus the ground state energy of a material oscillator is strictly positive, which is consistent with the uncertainty principle. Since photons cannot exist at rest, zero ground state energy causes no violation. In fact, the zero energy obtained above for the ground state of photons is desirable as it represents the vacuum. Thus, the material oscillators and photons are endowed with fundamentally different physical properties, rendering the concept of their equivalence to describe the electromagnetic fields tenuous.
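The eigenvalues $(n + 1/2)\varpi$ of the operator on the left side of Eq. (7.112) can be spot-checked with the first few Hermite functions. This is a minimal sketch under stated assumptions: the unnormalized eigenfunctions $H_{n}(\sqrt{\varpi}\,q)\,e^{-\varpi q^{2}/2}$ for $n = 0, 1, 2$ are the standard ones and are supplied here rather than taken from the text, the second derivative is approximated by central differences, and the function names, sample point and frequency are illustrative.

```python
import math


def hermite_function(n, q, w):
    """Unnormalized eigenfunction H_n(sqrt(w) q) exp(-w q^2 / 2) for n = 0, 1, 2."""
    s = math.sqrt(w) * q
    hermite = (1.0, 2.0 * s, 4.0 * s * s - 2.0)[n]
    return hermite * math.exp(-0.5 * w * q * q)


def apply_operator(n, q, w, step=1e-3):
    """(1/2) [-d^2/dq^2 + w^2 q^2] u_n(q), second derivative by central differences."""
    u_mid = hermite_function(n, q, w)
    second = (hermite_function(n, q + step, w) - 2.0 * u_mid
              + hermite_function(n, q - step, w)) / (step * step)
    return 0.5 * (-second + w * w * q * q * u_mid)


def energy_estimate(n, q, w):
    """Pointwise ratio (operator applied to u_n) / u_n, constant for an eigenfunction."""
    return apply_operator(n, q, w) / hermite_function(n, q, w)


w = 1.7      # illustrative frequency
q = 0.3      # sample point, chosen away from the zeros of the eigenfunctions
for n in range(3):
    print(n, energy_estimate(n, q, w), (n + 0.5) * w)
```

The lowest value, $\varpi/2$ for $n = 0$, is the strictly positive ground state energy referred to in the paragraph above.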
Furthermore, since the ground state energy of each material oscillator is positive, the assumption of the field being a collection of infinitely many such oscillators implies infinite vacuum energy. To circumvent this difficulty, the vacuum energy is taken as the reference point by setting it equal to zero, which is the simplest example of renormalization. Since most of the experimental results depend on the differences in energy states, instead of their absolute values, they remain unaffected by this adjustment. However, the gravitational effects of the energy are observable due to its equivalence with mass. The standard quantization scheme yields an infinite value for the vacuum energy with a consequent infinite gravitational field, contrary to observations. There are reasons to believe that the existing quantum electrodynamics is inapplicable at very high energies. For this reason, a large momentum cutoff is usually employed. Even with this cutoff, the gravitational effect of the vacuum is too large to escape detection [Feynman and Hibbs, 1965; pp. 244-246]. In contradistinction, the present formulation yields zero vacuum energy, in compatibility with the experimental observations, without impacting upon the satisfactory results of the existing quantum electrodynamics. In the process, it unifies the particle and field formulations in the quantum theory and deduces the Minkowskian structure of the space-time manifold, re-interpreting time as arc-length in the Euclidean space. Further, it provides a framework for other particle and field formulations together with the structures of the underlying manifolds.
REFERENCES
Aris, R., (1975) The mathematical theory of diffusion and reaction in permeable catalysis, (Clarendon, Oxford) Vol. 1, pp. 101-239.
Aronszajn, N. and Smith, K., (1957) “Characterization of positive reproducing kernels. Applications to Green’s functions,” Amer. J. Math., 79, 611-622.
Baker, G. A. and Graves-Morris, P. (1996) Padé Approximants, Second Edition, (Cambridge University Press, Cambridge).
Bazley, N. W., (1959), “Lower bounds to eigenvalues with application to the helium atom,” Proc. Nat. Acad. Sci. U.S.A., 45, 850-853.
Bellman, R., (1957), “On the non-negativity of Green’s functions,” Boll d’Unione Mate., 12, 411-413.
Bellman, R., Kalaba, R. E., and Lockett, J. A., (1966) Numerical Inversion of Laplace Transform, (Elsevier Publishing Company, Inc., New York), Ch. 2.
Berezanskiĭ, J. M., (1968) Expansions in eigenfunctions of selfadjoint operators, (Am. Math. Soc., Providence).
Bjorken, J. D., Drell, S. D., (1964) Relativistic Quantum Mechanics, (McGraw-Hill, N.Y.), pp. 198-207.
Chandrasekhar, S., (1960) Radiative Transfer, (Dover), ch. V.
Conway, A. W., and McConnell, A. J., (ed.) (1940), The mathematical papers of Sir William Rowan Hamilton, (Cambridge University Press, Cambridge) Vol. II, pp. 613-617.
Courant, R., and Hilbert, D., (1953) Methods of Mathematical Physics, (Interscience, New York).
Feit, M. D., Fleck, Jr., J. A. and Steiger, A. (1982) “Solution of the Schrödinger equation by a spectral method,” J. Comput. Phys., 47, 412-433.
Feynman, R. P., (1948) “Space-time approach to non-relativistic quantum mechanics,” Rev. Mod. Phys. 20, pp. 367-387.
Feynman, R. P. and Hibbs, A. R., (1965) Quantum Mechanics and Path Integrals, (McGraw-Hill, New York).
Frank-Kamenetskii, D. A., (1955) Diffusion and Heat Exchange in Chemical Kinetics, (Princeton University Press, Princeton).
Frank-Kamenetskii, D. A., (1940) “On the induction period in thermal explosions,” J. Chem. Phys. 8, 125.
Herman, G. T., (1980) Image Reconstruction from Projections: The fundamentals of computerized tomography, (Academic Press, New York).
Herman, G. T., Tuy, H. K., Langenberg, K. J., and Sabatier, P. C. (1987), Basic Methods of Tomography and Inverse problems, (Adam Hilger, Philadelphia).
Jackson, J. D., (1962) Classical electrodynamics, (Wiley, New York).
Joannopoulos, J. D., Meade, R. D. and Winn, J. N., (1995) Photonic Crystals, (Princeton University Press, Princeton, NJ).
Kato, T., (1980) Perturbation Theory for Linear Operators, (Springer-Verlag, New York).
Kato, T., (1951), “On the existence of solutions of the Helium wave equation,” Trans. Am. Math. Soc., 70, 212-218.
Keller, J. B., (1969), “Some positone problems suggested by nonlinear heat generation,” in Bifurcation theory and nonlinear eigenvalue problems, (ed. Keller, J. B., and Antman, S., Benjamin, New York), pp. 217-255.
Mikhlin, S. G. (1964) Variational methods in Mathematical Physics, (Pergamon Press, New York).
Moise, A. and Pritchard, H. O., (1989) “Newton-variational solution of the Frank-Kamenetskii thermal explosion problem,” Can. J. Chem. 67, 442-445.
Monin, A. S. and Yaglom, A. M., (1967) Statistical Hydromechanics, Part 2, (Nauka, Moscow).
Newell, A. C. and Moloney, J. V., (1992) Nonlinear Optics, (Addison-Wesley, Redwood City, CA).
Noble, B. (1973) in Topics in Numerical Analysis, (ed. J. H. Miller, New York, Academic Press) pp. 211-32.
Nuttall, J., (1969), “The convergence of the Kohn variational method,” Ann. Phys. 52, 428-443.
Nuttall, J. and Singh, S. R., (1979), “Existence of partial-wave two-cluster atomic scattering amplitudes,” Can. J. Phys. 57, 449-456.
Nuttall, J. and Singh, S. R., (1977), “Orthogonal Polynomials and Padé Approximants Associated with a System of Arcs,” J. Approx. Theory, 21, 1-42.
Pritchard, H. O., (2004), “Eigenvalue methods in unimolecular rate calculations,” J. Phys. Chem. A, 108, 173-178.
Pritchard, H. O., (1984) The quantum theory of unimolecular reactions, (Cambridge University Press, New York).
Pritchard, H. O. and Vatsya, S. R., (1983), “Stiffness of the master equation for low-temperature reaction rates,” J. Comp. Phys., 49, 5249-5252.
Prugovečki, E. (1971) Quantum Mechanics in Hilbert Space, (Academic Press, New York).
Pupyshev, V. I., (2000) “The nontriviality of the Hellmann-Feynman theorem,” Russian Journal of Physical Chemistry, 74, S267-S278.
Reed, M. and Simon, B., (1978) Methods of modern mathematical physics, (Academic, NY).
Riesz, F. and Sz.-Nagy, B. (1971), Functional analysis, (Translated by L. F. Boron, Ungar, New York, Fifth printing).
Rund, H., (1966) The Hamilton-Jacobi theory in the calculus of variations, (Van Nostrand, London), Ch. 3.
Saad, Y. and Schultz, M. H., (1985) “Conjugate gradient-like algorithms for solving nonsymmetric linear systems,” Math. Comp., 44, 417-424.
Saleh, B. A. and Teich, M. C., (1991) Fundamentals of Photonics, (Wiley, New York).
Sergeev, A. V. and Kais, S., (1999) “Variational principle for critical parameters of quantum systems,” J. Phys. A: Math. Gen., 32, 6891-6896.
Serzu, M. H., Vatsya, S. R., Lodha, G. S., Hayles, J. G., O'Connor, P. A., (1995), Crosshole seismic tomography and the areal basis inversion technique: A model study, (AECL Technical Report, TR-625, COG-95-157, SDDO, AECL, Chalk River Ontario, Canada K0J 1J0).
Shohat, J. A. and Tamarkin, J. D., (1943) The Problem of Moments, (Am. Math. Soc.).
Simon, B., (1971) Quantum mechanics for Hamiltonians defined as quadratic forms, (Princeton U. Press, Princeton).
Singh, S. R. (1981) “Converging Lower Bounds to Atomic Binding Energies,” J. Math. Phys., 22, 893-6.
Singh, S. R., (1977) “On Approximating the Resolvent of a Rotated Hamiltonian in the Scattering Region,” J. Math. Phys., 18, 1466-9.
Singh, S. R., (1976) “Some Convergence Properties of the Bubnov-Galerkin Method,” Pac. J. Math., 65, 217-21.
Singh, S. R. and Stauffer, A. D., (1974) “A Unified Formulation of Variational Methods in Scattering Theory,” Nuovo Cimento, 22B, 139-52.
Singh, S. R. and Turchetti, G., (1977) “Error Bounds on Some Approximate Solutions of the Fredholm Equations and Applications to Potential Scattering and Bound State Problems,” J. Math. Phys., 18, 1470-5.
Spedicato, E. (ed.), (1991) Computer Algorithms for Solving Linear Algebraic Equations, (Springer-Verlag, New York).
Stone, M. H., (1932) Linear Transformations in Hilbert Space and Their Applications to Analysis, (Am. Math. Soc., Providence, Rhode Island).
Tai, C. C., Vatsya, S. R. and Pritchard, H. O., (1993) “Related Upper and Lower Bounds to Atomic Binding Energies,” Intern. J. Quant. Chem., 46, 675-88.
Vatsya, S. R., (2005) “Path-integral formulation of optical beam propagation,” J. Opt. Soc. Am. B, 22, 2512-2518.
Vatsya, S. R., (2004-1), “Comment on ‘Breakdown of the Hellmann-Feynman theorem: Degeneracy is the key’,” Phys. Rev. B, 69, 037102.
Vatsya, S. R., (2004-2), “Path integral formulation of quantized fields,” LANL, arXiv:quant-ph/0404166.
Vatsya, S. R., (1999) “Mechanics of a particle in a Riemannian manifold,” Chaos, Solitons & Fractals, 10, 1391-1397.
Vatsya, S. R., (1998) “Gauge mechanical view of physical reality,” in Causality and locality in modern physics, (ed. G. Hunter, S. Jeffers and J. P. Vigier, Kluwer, Dordrecht), pp. 243-251.
Vatsya, S. R., (1995) “Mechanics of a charged particle on the Kaluza-Klein background,” Can. J. Phys., 73, 602-607.
Vatsya, S. R., (1989) “Existence of the solution of a nonlinear integro-differential equation,” J. Comput. Phys., 82, 241-244.
Vatsya, S. R., (1988) “Convergence of conjugate residual-like methods to solve linear equations,” SIAM J. Numer. Anal., 25, 2977-2982.
Vatsya, S. R., (1987) “Existence and approximation of the solutions of some nonlinear problems,” J. Math. Phys., 28, 1283-1286.
Vatsya, S. R., (1981-1) “On approximating the solutions and critical points of some nonlinear phenomena,” J. Math. Phys., 22, 957-964.
Vatsya, S. R., (1981-2) “On approximating the solutions of the Chandrasekhar H-equation,” J. Math. Phys., 23, 1728-1731.
Vatsya, S. R. and Nikumb, S. K., (2002) “Lower and upper bounds to photonic band gap edges,” Phys. Rev. B, 66, 085102.
Vatsya, S. R. and Pritchard, H. O., (1990), “Multi-exponential unimolecular rate formulae,” Theor. Chim. Acta, 77, 295-304.
Vatsya, S. R. and Pritchard, H. O., (1985), “An Explicit Inverse of a Tridiagonal Matrix,” Intern. J. Computer Math., 14, 63-84.
Vatsya, S. R. and Pritchard, H. O., (1983), “General behaviour of thermal unimolecular reactions at the low-pressure limit,” Mol. Phys., 54, 203-209.
Vatsya, S. R. and Pritchard, H. O., (1982), “Unimolecular Reactions with Intense Radiation,” J. Chem. Phys., 76, 1024-1032.
Vatsya, S. R. and Pritchard, H. O., (1980), “Some analytic properties of the master equation for unimolecular reaction,” Proc. R. Soc. London A, 375, 409-424.
Vatsya, S. R. and Tai, C. C., (1988) “Inverse of a Perturbed Matrix,” Intern. J. Computer Math., 23, 177-184.
Velikson, B. A., (1975) “Solution of a nonlinear integro-differential equation,” Comput. Math. and Math. Phys., 15, 256-259.
Vorobyev, Yu. V., (1965) Method of Moments in Applied Mathematics, (Gordon and Breach, New York).
Wall, H. S., (1948) Analytic Theory of Continued Fractions, (Van Nostrand, Princeton).
Weinstein, A. and Stenger, W., (1972) Methods of Intermediate Problems for Eigenvalues, (Academic, NY).
INDEX A academic, 260 accessibility, 6 accuracy, 74, 95, 115, 194, 197, 203, 205, 217, 249, 251, 260, 307, 308, 309, 310, 311, 323 adjustment, 38, 58, 81, 114, 121, 325 air, 256 algebraic method, 127, 140 algorithm, 46, 48, 192, 251, 252 alternative, 9, 48, 49, 83, 89, 92, 98, 108, 136, 144, 152, 160, 186, 203, 217, 223, 230, 252, 301, 306, 311 alters, 51, 143, 278 ambiguity, 169 amplitude, 225, 226, 227, 229, 232, 235, 238, 245, 259, 304, 306, 315 application, 41, 50, 52, 67, 68, 73, 87, 95, 101, 115, 134, 141, 154, 166, 183, 217, 230, 260, 262, 266, 287, 307, 327 argument, 19, 53, 54, 56, 66, 81, 90, 95, 98, 108, 114, 130, 132, 133, 139, 142, 159, 190, 197, 210, 268, 275, 276, 279, 287, 294, 297, 299, 311, 314 assignment, 3, 23, 157 assumptions, 9, 47, 70, 71, 75, 76, 79, 83, 90, 91, 124, 129, 133, 139, 144, 146, 147, 167, 168, 169, 170, 171, 186, 208, 209, 211, 230, 247, 270, 271, 272, 273, 274, 275, 286, 287, 315, 318 asymptotic, 169, 172, 232 asymptotically, 156, 194 atmosphere, 251, 291 atoms, 215, 219 attention, 153, 171, 283, 291
B Banach spaces, 23, 28, 29 band gap, 255, 257, 259, 260, 329 basis set, 21, 22, 82, 91, 102, 113, 115, 124, 146, 237, 238, 247, 262 beams, 310 behavior, 172, 202, 232, 242 Bessel, 224 bifurcation point, 293 binding, 219, 228 binding energies, 219, 228 bonding, 166 borderline, 223 boron, 328 boundary conditions, 35, 54, 67, 184, 242, 265 bounded solution, 280 bounds, 84, 118, 123, 124, 126, 128, 145, 146, 147, 148, 149, 151, 152, 172, 176, 194, 197, 198, 213, 217, 218, 219, 220, 221, 222, 223, 228, 229, 259, 260, 261, 262, 264, 271, 327, 329
C calculus, 63, 64, 65, 132, 328 Canada, 328 capacitance, 144 catalysis, 327 cell, 256, 257 charged particle, 329 chemical, 193 chemical kinetics, 193 classes, 16, 74, 114, 127
classical, 51, 63, 64, 66, 106, 254, 265, 310, 311, 312, 313, 315, 316, 317, 318, 322, 324 classical mechanics, 64 classified, 289, 319 closure, 12, 18, 23, 24, 28, 29, 33, 43, 61, 96, 111, 112, 114, 121, 129, 130, 131, 132, 133, 242, 243, 247, 262, 263, 265, 289 cofactors, 72 column vectors, 16, 21, 110 communication, 257, 304 communication technologies, 304 compatibility, 101, 325 complement, 22, 85, 98, 108, 136, 142, 143, 153, 165, 167, 231, 234, 239, 264, 265, 300 complex numbers, 3, 16, 28, 42, 63, 247, 257 complications, 12, 19, 118, 121, 163, 202, 209, 218, 221, 223, 231, 234, 237, 253 components, 151, 152, 257, 258, 265, 311 composite, 12, 13, 41, 321 computation, 72, 83, 92, 102, 113, 114, 129, 131, 162, 185, 187, 208, 285 computer, 251 concave, 176, 197, 278, 279, 280, 281, 282, 283, 284, 285, 287 concrete, 269 configuration, 291 confusion, 3, 157, 184, 225, 226 conjecture, 67 construction, 5, 6, 7, 8, 10, 12, 13, 14, 18, 46, 47, 48, 92, 95, 120, 121, 131, 132, 163, 190, 204, 218, 233, 241, 242, 253, 262, 283, 288, 302, 316, 321, 322 continuing, 14 continuity, 3, 13, 41, 78, 79, 80, 87, 88, 97, 108, 115, 157, 159, 162, 163, 165, 166, 167, 170, 172, 174, 220, 221, 234, 266, 267, 275, 276, 277, 280, 284, 286, 287, 294, 303 convex, 176, 177, 196, 270, 278, 279, 280, 281, 282, 283, 284, 285, 287, 288, 290, 291, 293, 295, 305 correlation, 193 Coulomb, 135, 160, 215 coupling, 246, 322, 323 coverage, 252, 253 covering, 4, 6, 8, 12, 171, 217 critical behavior, 222, 223 critical points, 216, 329 critical value, 222 crystal, 255, 256, 257 crystals, 255
D danger, 3, 157, 225 dating, 99 decay, 160, 172, 194, 215, 235, 255, 309 decomposition, 22, 37, 38, 47, 225, 254, 259 deduction, 15, 60, 321, 322 deficiency, 7, 36, 217, 260 definition, 6, 8, 11, 19, 25, 39, 80, 92, 96, 101, 106, 148, 154, 237, 247, 266, 276, 288, 299, 307, 320, 321 deformities, 254 degenerate, 25, 28, 33, 36, 73, 76, 84, 122, 140, 143, 146, 147, 148, 151, 166, 169, 194, 218 degree, 4, 71, 72, 100, 102, 106, 109, 110, 111, 190, 194, 202, 205, 310, 311 delta, 21, 39, 115, 261, 318 density, 251, 254, 315, 320, 321 derivatives, 5, 34, 51, 55, 58, 64, 136, 137, 166, 168, 169, 170, 171, 177, 269 detection, 325 dielectric, 255, 256, 304 dielectric function, 255, 256 differential equations, 51, 184, 186, 187, 222, 264 differentiation, 73, 138, 158, 165, 166, 171, 172, 231 diffusion, 289, 301, 327 dimensionality, 19, 204 Dirac delta function, 58 discontinuity, 59, 101, 223, 234 displacement, 255 distribution, 251, 255, 304 divergence, 255, 258 dominance, 124 duplication, 156
E eigenvalue, 25, 26, 33, 35, 36, 38, 40, 46, 51, 52, 55, 56, 65, 66, 72, 73, 76, 84, 85, 86, 87, 122, 123, 137, 138, 139, 140, 141, 142, 143, 147, 148, 149, 151, 152, 157, 164, 166, 167, 169, 170, 171, 172, 184, 185, 193, 194, 195, 196, 210, 212, 216, 218, 219, 221, 222, 223, 244, 258, 261, 266, 278, 280, 284, 290, 328 eigenvector, 33, 35, 52, 58, 73, 107, 138, 147, 148, 151, 152, 157, 164, 166, 184, 185, 193, 194, 195, 222, 259, 261, 268 elaboration, 266 electric field, 255 electromagnetic, 257, 320, 324
electromagnetic fields, 324 electrons, 215, 217 elliptic differential operators, 264 energy, 92, 318, 319, 324, 325 engineering, 12, 65 envelope, 304 environment, 291 equality, 19, 21, 39, 71, 73, 80, 81, 111, 123, 130, 149, 161, 185, 191, 294 equating, 100 equilibrium, 194 equilibrium state, 194 estimating, 83 Euclidean space, 63, 256, 257, 325 Euler-Lagrange equations, 65, 311 evolution, 153, 216, 308, 316, 321 exothermic, 291 expansions, 100, 119, 138, 139, 140, 187, 201, 227, 228, 259, 311, 313 explicit knowledge, 14 explosions, 327 exponential, 13, 154, 155, 302, 329 exposure, 6, 12, 156
F fabrication, 304 family, 5, 15, 36, 37, 38, 39, 41, 52, 53, 54, 55, 60, 75, 77, 134, 140, 153, 154, 156, 166, 167, 186, 187, 270, 300 Feynman, 12, 138, 166, 169, 194, 227, 315, 325, 327, 328, 329 filtered back projection, 251 flexibility, 115, 239, 310, 315 flow, 301 fluid, 301 Fourier, 48, 49, 51, 53, 57, 58, 134, 156, 252, 255, 259, 260, 307, 308, 324 freedom, 272
G gas, 291, 301 gauge, 320, 321, 324 Gaussian, 110, 113, 115, 240, 304, 305 gene, 42 generalizations, 42 generation, 328 geophysical, 251, 254 Gibbs, 259 graph, 6, 7, 12, 63, 257 gravitational constant, 323
gravitational effect, 325 gravitational field, 325 Green’s function, 59, 60, 156, 164, 224, 232, 239, 246, 265, 266, 270, 271, 276, 279, 288, 289, 290, 293, 327 ground state energy, 324, 325
H Hamilton’s principle, 65 Hamiltonian, 249, 311, 322, 329 Hamilton-Jacobi, 311, 323, 328 heat, 274, 291, 328 heat transfer, 274 helium, 327 Helmholtz equation, 304 heuristic, 172 Hilbert, 19, 20, 21, 22, 29, 30, 31, 36, 42, 43, 45, 46, 53, 56, 58, 65, 67, 82, 83, 84, 85, 86, 93, 98, 101, 102, 111, 112, 118, 122, 128, 131, 135, 143, 150, 157, 162, 163, 164, 172, 183, 219, 220, 221, 225, 226, 229, 230, 231, 257, 258, 259, 265, 266, 311, 314, 327, 328, 329 Hilbert space, 19, 21, 22, 29, 30, 31, 36, 42, 43, 45, 58, 65, 67, 82, 93, 98, 101, 102, 111, 118, 122, 128, 131, 150, 157, 162, 163, 183, 225, 226, 230, 231, 257, 258, 259, 314 homogeneity, 323 homogeneous, 52, 65, 73, 123, 136, 173, 311, 318 hybrid, 173
I identification, 227, 228, 229, 232, 237, 238, 239, 241, 269 identity, 15, 23, 28, 36, 37, 38, 58, 82, 83, 85, 86, 87, 88, 92, 94, 97, 108, 111, 112, 113, 114, 119, 149, 154, 170, 184, 198, 216, 230, 241, 243, 248, 258, 261, 265, 320, 321 images, 254 imaging, 251 implementation, 67, 196, 204 independence, 68 independent variable, 215, 316 indices, 36, 310 induction, 56, 72, 103, 125, 174, 184, 188, 199, 200, 208, 209, 212, 255, 271, 273, 274, 275, 276, 282, 284, 286, 287, 294, 296, 297, 298, 303, 321, 327 induction period, 327
inequality, 17, 18, 45, 59, 71, 110, 120, 123, 124, 134, 147, 174, 190, 254, 271, 275, 276, 283, 297, 298 infinite, 7, 9, 19, 21, 27, 28, 45, 46, 49, 50, 51, 57, 104, 107, 114, 116, 126, 144, 183, 204, 325 insight, 237 instabilities, 46 integration, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19, 34, 37, 38, 40, 41, 54, 59, 64, 78, 80, 99, 109, 115, 135, 137, 139, 154, 161, 166, 218, 230, 253, 267, 275, 277, 293, 309, 310, 314, 317 intensity, 304 interaction, 215, 216, 249, 292, 304 interactions, 215 interdisciplinary, vii interpretation, 184 interval, 4, 6, 7, 10, 11, 12, 13, 19, 20, 21, 37, 38, 40, 45, 46, 49, 56, 88, 89, 99, 107, 116, 120, 140, 146, 148, 149, 150, 152, 167, 168, 169, 195, 196, 197, 216, 217, 270, 276, 277, 279, 284, 285, 286, 287, 302, 309, 311 intuition, 251 inversion, 41, 47, 52, 83, 116, 237, 251, 252, 298, 328 ionization, 223 ions, 215 isothermal, 289 isotropic, 255 iteration, 173, 197, 204, 206, 290, 293, 298, 306, 307
J justification, 24, 156
K Kaluza-Klein, 322, 323, 324, 329 kernel, 44, 45, 53, 112, 164, 226, 229, 265, 266, 293, 294 kinetics, 289 Klein-Gordon, 321, 323, 324
L labor, 203 Lagrangian, 63, 65, 311, 312, 315, 316, 321, 322 laser, 307 lasers, 304 lattice, 256, 257
lead, 67 lens, 305, 307 Lie algebra, 154 Lie group, 15, 154 limitation, 130, 290 limitations, 115, 127, 187, 218, 219 linear, 4, 15, 16, 18, 21, 22, 23, 28, 29, 30, 31, 42, 43, 44, 47, 50, 51, 66, 68, 69, 72, 98, 102, 104, 121, 123, 139, 173, 175, 177, 193, 204, 254, 257, 276, 279, 280, 288, 298, 304, 306, 313, 328, 329 linear function, 4, 28, 29, 30, 31, 42, 44, 72, 276, 279, 280, 306 linear systems, 328 literature, 3, 17, 37, 49, 72, 92, 129, 156, 162, 166, 171, 173, 193 localization, 304 location, 304, 305 London, 328, 330 low-temperature, 328 lying, 217
M magnetic, 254, 255 magnetic field, 255 manifold, 12, 13, 22, 101, 102, 234, 257, 310, 311, 315, 318, 321, 322, 323, 324, 325, 329 manifolds, 12, 315, 325 mapping, 16, 39, 53, 173, 175, 176, 177, 238, 269, 271, 290, 293, 301, 307 master equation, 193, 328, 330 mathematical, 17, 127, 201, 251, 265, 320, 327, 328 matrices, 183, 193 matrix, 36, 38, 47, 48, 72, 86, 91, 92, 103, 104, 105, 113, 114, 115, 116, 138, 140, 141, 144, 145, 147, 148, 149, 150, 151, 183, 184, 185, 187, 188, 190, 192, 193, 194, 201, 202, 203, 205, 207, 212, 218, 223, 230, 231, 233, 252, 253, 259, 260, 261, 262, 265, 267, 298, 307 Maxwell's equations, 320 meanings, 64, 201 measures, 7, 45 mechanical, 152, 156, 162, 172, 215, 304, 310, 312, 315, 323, 329 mechanics, 63, 318, 328 media, 254 metric, 3, 310, 313, 314, 318, 321, 322 microwave, 251 modeling, 51, 183, 194, 291 models, 22, 152 momentum, 226, 230, 311, 322, 325
monotone, 9 motion, 65, 215, 312, 321 motivation, 163 multiples, 34, 192, 253, 319, 321, 323 multiplication, 15, 16, 57, 58, 110, 113, 246, 270, 307, 308, 310 multiplicity, 25, 33, 36, 38, 122, 125, 137, 139, 141, 143, 165, 166, 167, 170, 216 multiplier, 64, 72, 303
N natural, 3, 27, 294 negativity, 151, 218, 224, 327 New York, 327, 328, 329, 330 Newton, 175, 176, 177, 197, 328 Newton's second law, 65 nonlinear, 23, 72, 173, 269, 272, 274, 278, 282, 283, 288, 292, 301, 304, 306, 310, 311, 328, 329, 330 nonlinearities, 272, 274, 278, 281, 283 normal, 33, 39, 53, 55, 265 normalization, 15, 47, 48, 74, 100, 109, 140, 225, 240, 253, 313 normed linear space, 16, 18 norms, 18, 23, 24, 60, 83, 161, 222 numerical computations, 155, 202, 220, 226, 238
O observations, 163, 168, 325 one dimension, 239 optical, 304, 310, 329 organic, 291 orthogonality, 34, 113, 206 oscillations, 259 oscillator, 324, 325
P parameter, 12, 13, 14, 15, 41, 52, 54, 60, 64, 66, 118, 127, 133, 135, 136, 138, 153, 162, 166, 172, 186, 187, 216, 222, 223, 234, 246, 270, 274, 280, 283, 291, 311, 316, 317, 321, 322 partial differential equations, 12, 51 particles, 215, 321 performance, 291 periodic, 255, 256, 258, 316, 317, 320 periodicity, 256, 320, 323, 324 permeability, 255
perturbation, 43, 127, 133, 134, 135, 136, 143, 147, 154, 155, 156, 163, 165, 168, 169, 172, 173, 194, 196, 215, 264, 265, 306 perturbation theory, 155, 156, 169, 172 perturbations, 140, 143, 145, 151, 156, 159, 160, 162, 163, 169, 171, 172, 194, 202, 223, 255, 306 phase shifts, 225, 237 Philadelphia, 327 photon, 318, 319 photonic, 255, 257, 259, 304, 329 photonic crystals, 257 photons, 319, 320, 324 physical properties, 324 physical sciences, vii physics, 328, 329 plane waves, 162, 223, 224, 227, 259, 260 play, 22, 25, 31, 33, 106, 183 Poisson, 41 polarization, 256 pollution, 251 polygons, 253 polynomial, 4, 72, 100, 102, 106, 107, 109, 110, 111, 112, 115, 190, 306 polynomials, 18, 38, 48, 71, 72, 100, 102, 103, 106, 110, 113, 115, 187, 188, 190, 224 population, 193 power, 38, 39, 101, 139, 140, 153, 215, 227, 311 powers, 64, 100, 108, 138, 139, 140, 154 preparation, 97, 194, 240, 262 pressure, 291, 329 printing, 328 probability, 315 procedures, 127, 145, 152, 202, 204, 212, 213, 222, 251, 291, 307 production, 291 program, 126, 127, 163, 316 projector, 241 propagation, 256, 257, 289, 309, 310, 329 propagators, 308 property, 7, 13, 33, 35, 37, 39, 50, 53, 58, 75, 76, 78, 84, 92, 104, 127, 130, 152, 160, 161, 163, 167, 173, 176, 183, 192, 197, 217, 226, 246, 263, 266, 268, 301, 306, 313 pseudo, 266
Q quantization, 320, 323, 325 quantum, 12, 152, 156, 162, 166, 171, 172, 215, 216, 304, 310, 312, 315, 323, 324, 325, 327, 328 quantum chemistry, 166, 171
quantum electrodynamics, 325 quantum fields, 12 quantum mechanics, 12, 315, 327 quantum theory, 12, 216, 310, 325, 328
R radiation, 257, 292, 330 radius, 59, 88, 89, 139, 168, 221, 293, 304, 305, 306, 307 range, 3, 12, 19, 23, 24, 25, 26, 28, 31, 36, 42, 44, 46, 58, 66, 70, 76, 81, 88, 96, 108, 115, 118, 120, 126, 129, 130, 138, 141, 146, 147, 153, 155, 157, 158, 160, 168, 173, 212, 215, 216, 221, 223, 232, 238, 239, 243, 253, 257, 265, 301 Rayleigh, 68, 196, 217, 228, 259, 260, 262 reaction rate, 328 real numbers, 6, 13, 16, 67, 265 reality, 329 reasoning, 201 recall, 157 reconstruction, 47, 251 rectangular domains, 256 reduction, 83, 90, 91, 151, 223, 310 regular, 222 relaxation, 233 relevance, 120 reliability, 234 renormalization, 325 residues, 91, 99, 109, 203, 204, 211 resolution, 36, 204 Rhode Island, 329 Ritz method, 217, 228, 259, 260, 262 routines, 49, 307, 309 Russian, 328
S satellite, 251 scalar, 17, 18, 19, 22, 29, 37, 43, 65, 94, 101, 110, 111, 113, 117, 121, 131, 147, 223, 239, 242, 243, 246, 255, 256, 257, 258, 262, 266, 288 scattering, 156, 157, 160, 162, 163, 165, 171, 216, 223, 224, 225, 226, 227, 228, 229, 232, 235, 238, 245, 246, 328 Schrödinger equation, 61, 304, 310, 312, 315, 317, 321, 323, 324, 327 scientific, 12, 17, 22, 65 seismic, 328 selecting, 51, 115, 116, 175, 187, 239
self, 33, 35, 53, 104, 201 separation, 223 series, 5, 9, 27, 38, 39, 45, 48, 49, 53, 100, 101, 104, 118, 119, 138, 139, 140, 153, 168, 227, 228, 252, 259, 260, 295, 311 shares, 127 short-range, 215 sign, 92, 162, 175, 221, 224, 293, 309 similarity, 127, 321 singular, 25 singularities, 88, 90, 91, 101, 109, 118, 141, 203, 234, 235, 246, 247, 249, 315 smoothness, 135, 136, 165 soliton, 306 solutions, 46, 51, 52, 54, 55, 66, 67, 68, 70, 71, 72, 74, 86, 88, 121, 136, 137, 140, 142, 143, 150, 151, 163, 168, 169, 186, 187, 190, 202, 203, 204, 210, 219, 220, 223, 225, 228, 233, 234, 235, 240, 255, 256, 261, 264, 269, 280, 285, 291, 292, 298, 301, 306, 308, 309, 315, 319, 320, 324, 327, 329 space-time, 12, 325 spectra, 39, 94, 95, 156, 157, 159, 160, 162, 171, 217 spectrum, 25, 28, 36, 37, 38, 40, 46, 53, 56, 58, 59, 75, 76, 80, 85, 88, 89, 90, 93, 98, 101, 107, 120, 122, 125, 136, 138, 141, 143, 145, 146, 147, 152, 153, 156, 157, 159, 160, 164, 165, 167, 168, 172, 216, 217, 218, 219, 220, 227, 234, 238, 244, 258, 259, 262, 266, 324 speed, 255, 323 speed of light, 255, 323 sporadic, 235 stability, 196 steady state, 291 Stieltjes, 10, 41, 99, 100, 101, 118 storage, 205 strength, 99, 172, 219, 323 substitution, 11, 72, 105, 163, 165, 186, 189, 190, 195, 200, 241, 298, 305 symbols, 18, 51, 64, 70, 71, 73, 103, 104, 105, 124, 133, 164, 165, 171, 189, 194, 195, 199, 200, 226, 229, 243, 244, 248, 249, 267, 268, 269, 284, 287, 294 symmetry, 33, 35, 53, 127, 129, 221, 319 synthetic, 254 systems, 18, 35, 152, 172, 187, 192, 194, 215, 217, 219, 223, 235, 249, 304, 310, 328
T Taylor series, 5, 64, 313, 315 technology, 257
temperature, 291 tensor products, 22, 41, 58, 217, 258 theoretical, 304 theory, 12, 42, 51, 71, 101, 154, 156, 157, 160, 162, 226, 327, 328 thermal, 291, 327, 328, 329 thermal equilibrium, 291 three-dimensional, 58, 255, 256, 312, 316, 324 topology, 26, 28, 75, 135 trajectory, 13, 310, 311, 312, 315, 316, 319 trans, 130 transfer, 78, 165, 291, 292 transformation, 36, 37, 222, 310 transformations, 38, 304, 323 transition, 157, 163, 171, 172, 194, 217, 226 translation, 256, 316, 317 transparency, 89, 125, 322 transparent, 49, 138, 192, 201, 242 transport, 56, 173, 216, 264, 289 transport phenomena, 173, 216, 264, 289 transpose, 203 travel, 251, 254 travel time, 251, 254 trial, 67, 68, 72, 194, 232, 233, 235, 238, 246 turbulent, 301 two-dimensional, 12, 54, 251, 253, 307
U uncertainty, 324 uniform, 4, 26, 33, 37, 74, 75, 77, 78, 82, 86, 87, 89, 91, 93, 94, 97, 126, 135, 136, 137, 150, 169, 170, 171, 220, 259, 271, 274, 275, 277, 293, 294, 297, 303 uniformity, 87
V vacuum, 304, 306, 324, 325 validity, 6, 9, 48, 49, 51, 53, 61, 70, 72, 98, 129, 139, 153, 166, 174, 210, 275, 298, 302, 314
values, 19, 25, 34, 36, 42, 59, 64, 66, 78, 82, 87, 99, 103, 104, 128, 129, 138, 156, 163, 164, 166, 170, 190, 202, 234, 235, 253, 255, 266, 290, 298, 311, 325 variable, 3, 5, 6, 10, 11, 12, 14, 25, 29, 36, 39, 46, 47, 51, 61, 63, 66, 78, 81, 99, 106, 110, 148, 173, 177, 253, 255, 308, 310, 318, 323, 324 variables, 5, 45, 51, 63, 114, 135, 305, 310, 324 variation, 4, 11, 64, 65, 66, 67, 68, 72, 78, 99, 172, 235, 246 vector, 15, 16, 17, 18, 21, 23, 25, 29, 30, 31, 32, 34, 35, 45, 47, 52, 55, 65, 67, 72, 73, 77, 79, 86, 87, 91, 96, 97, 103, 105, 107, 108, 110, 114, 116, 118, 119, 122, 125, 126, 135, 136, 141, 142, 145, 147, 149, 150, 153, 160, 165, 167, 184, 185, 193, 205, 207, 208, 224, 238, 239, 246, 249, 253, 256, 257, 258, 262, 271, 272, 304, 320, 321, 322, 324 velocity, 251
W warrants, 138 wave number, 257 wave packet, 223, 224 wave vector, 225, 255, 257 wood, 291 writing, 69
X X-ray, 251
Y yield, 5, 9, 52, 59, 61, 71, 83, 103, 106, 137, 140, 192, 196, 235, 240, 260, 278, 285, 290, 314, 319