Managing editor A. Iserles, DAMTP, University of Cambridge, Silver Street, Cambridge CB3 9EW, England
Editorial Board C. de Boor, University of Wisconsin, Madison, USA F. Brezzi, Istituto di Analisi Numerica del CNR, Italy J. C. Butcher, University of Auckland, New Zealand P. G. Ciarlet, Universite Paris VI, France G. H. Golub, Stanford University, USA H. B. Keller, California Institute of Technology, USA H.-O. Kreiss, University of California, Los Angeles, USA K. W. Morton, University of Oxford, England M. J. D. Powell, University of Cambridge, England R. Temam, Universite Paris-Sud, France
Acta Numerica
Volume 8
1999
CAMBRIDGE UNIVERSITY PRESS
PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
The Pitt Building, Trumpington Street, Cambridge, United Kingdom CAMBRIDGE UNIVERSITY PRESS
The Edinburgh Building, Cambridge CB2 2RU, UK www.cup.cam.ac.uk 40 West 20th Street, New York, NY 10011-4211, USA www.cup.org 10 Stamford Road, Oakleigh, Melbourne 3166, Australia Ruiz de Alarcon 13, 28014 Madrid, Spain © Cambridge University Press 1999 This book is in copyright. Subject to statutory exception and to the provisions of the relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published 1999 Printed in the United Kingdom at the University Press, Cambridge Typeface Computer Modern 11/13pt System LaTeX
A catalogue record for this book is available from the British Library Library of Congress Cataloguing in Publication data available ISBN 0-521-77088-2 hardback ISSN 0962-4929
Contents
Numerical relativity: challenges for computational science
1
Gregory B. Cook and Saul A. Teukolsky
Radiation boundary conditions for the numerical simulation of waves
47
Thomas Hagstrom
Numerical methods in tomography
107
Frank Natterer
Approximation theory of the MLP model in neural networks
143
Allan Pinkus
An introduction to numerical methods for stochastic differential equations
197
Eckhard Platen
Computation of pseudospectra
247
Lloyd N. Trefethen
Acta Numerica (1999), pp. 1-45
© Cambridge University Press, 1999
Numerical relativity: challenges for computational science Gregory B. Cook and Saul A. Teukolsky* Center for Radiophysics and Space Research, Cornell University, Ithaca, NY 14853, USA E-mail: [email protected]
[email protected]
We describe the burgeoning field of numerical relativity, which aims to solve Einstein's equations of general relativity numerically. The field presents many questions that may interest numerical analysts, especially problems related to nonlinear partial differential equations: elliptic systems, hyperbolic systems, and mixed systems. There are many novel features, such as dealing with boundaries when black holes are excised from the computational domain, or how even to pose the problem computationally when the coordinates must be determined during the evolution from initial data. The most important unsolved problem is that there is no known general 3-dimensional algorithm that can evolve Einstein's equations with black holes that is stable. This review is meant to be an introduction that will enable numerical analysts and other computational scientists to enter the field. No previous knowledge of special or general relativity is assumed.
CONTENTS
1 Introduction 2
2 Initial data 14
3 Evolution 25
4 Related literature 38
5 Conclusions 39
References 40
* Also Departments of Physics and Astronomy, Cornell University, Ithaca, NY 14853.
G. B. COOK AND S. A. TEUKOLSKY
1. Introduction
Much of numerical analysis has been inspired by problems arising from the study of the physical world. The flow of ideas has often been two-way, with the original discipline flourishing under the attention of professional numerical analysts. In this review we will describe the burgeoning field of numerical relativity, which aims to solve Einstein's equations of general relativity numerically. The field contains many novel questions that may interest numerical analysts, and yet is essentially untouched except by physicists with training in general relativity. The subject presents a wealth of interesting problems related to nonlinear partial differential equations: elliptic systems, hyperbolic systems, and mixed systems. There are many novel features, such as dealing with boundaries when black holes are excised from the computational domain, or how to even pose the problem computationally when the coordinates must be determined during the evolution from initial data. Perhaps the most important unsolved problem is that, at the time of writing, there is no known general 3-dimensional algorithm that can evolve Einstein's equations with black holes that is stable. What red-blooded computational scientist could fail to rise to such a challenge? This review is meant to be an introduction that will enable numerical analysts and other computational scientists to enter the field - a field that has a reputation for requiring arcane knowledge. We hope to persuade you that this reputation is undeserved. Our review will not assume any previous knowledge of special or general relativity, but some elementary knowledge of tensors will be helpful. We will give a brief introduction to these topics. This should be sufficient to follow the main part of the review, which describes the formulation of general relativity as a computational problem.
We then describe various methods that have been proposed for attacking the problem numerically, and outline the successes and failures. We conclude with a summary of several outstanding problems. While numerical relativity encompasses a broad range of topics, we will only be able to cover a portion of them here. The style of this review is more informal than those usually found in this journal. There are two reasons for this. First, numerical relativity itself is largely untouched by rigorous investigation, and few results have been formalized as theorems. Second, the authors are physicists, for which we beg your indulgence.
1.1. Resources
A somewhat terse introduction to the partial differential equations of general relativity aimed at mathematicians can be found in Taylor (1996, Section 18). A more leisurely and complete exposition of the subject is given
by Sachs and Wu (1977). Standard textbooks aimed at physicists include Misner, Thorne and Wheeler (1973) and Wald (1984). Several collaborations are working on problems in numerical relativity. Information is available at the web sites www.npac.syr.edu/projects/bh and jean-luc.ncsa.uiuc.edu. These sites also include links to DAGH (Parashar and Brown 1995), a package supporting adaptive mesh refinement for elliptic and hyperbolic equations on parallel supercomputers.
1.2. Special relativity
Physical phenomena require four coordinates for their specification: three for the spatial location and one for the time. The mathematical description of special relativity unifies the disparate concepts of space and time into spacetime, a 4-dimensional manifold that is the arena for physics. Points on the manifold correspond to physical events in spacetime. The geometry of spacetime is described by a pseudo-Euclidean metric,
ds^2 = -dt^2 + dx^2 + dy^2 + dz^2,    (1.1)
which describes the infinitesimal interval, or distance, between neighbouring events.^1 All of physics takes place in this fixed background geometry, which is also called Minkowski space. We label the coordinates by Greek indices α, β, ..., taking on values from 0 to 3 according to the prescription
x^0 = t,  x^1 = x,  x^2 = y,  x^3 = z.    (1.2)
Then, if we introduce the metric tensor
η_{αβ} = diag(-1, 1, 1, 1),    (1.3)
we can write equation (1.1) as
ds^2 = η_{αβ} dx^α dx^β.    (1.4)
Here and throughout we use the Einstein summation convention: whenever indices are repeated in an equation, there is an implied summation from 0 to 3. A special role is played by null intervals, for which ds^2 = 0. Events connected by such an interval can be joined by a light ray. More generally, a curve in spacetime along which ds^2 = 0 is a possible trajectory of a light ray, and is called a null worldline. Similarly, we talk of timelike intervals and timelike worldlines (ds^2 < 0) and spacelike intervals and spacelike worldlines
^1 We always use the same units of measurement for time and space. It is convenient to choose these units such that the speed of light is one. Thus 1 second of time is equivalent to 3 × 10^10 cm of distance.
(ds^2 > 0). For a timelike worldline, the velocity
v = [(dx/dt)^2 + (dy/dt)^2 + (dz/dt)^2]^{1/2}    (1.5)
is everywhere less than 1; this corresponds to the trajectory of a material particle. A spacelike worldline would correspond to a particle travelling faster than the speed of light, which is impossible. Just as rotations form a symmetry group for the Euclidean metric, the set of Lorentz transformations forms the symmetry group of the metric (1.4). A Lorentz transformation is defined by a constant matrix Λ^{α'}_α that transforms the coordinates according to
x^α → x^{α'} = Λ^{α'}_α x^α.    (1.6)
It must preserve the interval ds^2 between events. Substituting the transformation (1.6) into (1.4) and requiring invariance gives the matrix equation
η = Λ^T η Λ.    (1.7)
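The invariance condition (1.7) is easy to check numerically. The following sketch (our illustration, not part of the original text) builds the standard boost along the x axis with velocity v and verifies that it preserves the Minkowski metric:

```python
# A minimal check (ours, not the authors') that a boost along x with velocity
# v (units with c = 1) satisfies Lambda^T eta Lambda = eta, equation (1.7).

def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

eta = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

v = 0.6                               # boost velocity
gamma = 1.0 / (1.0 - v * v) ** 0.5    # Lorentz factor
Lam = [[gamma, -gamma * v, 0, 0],
       [-gamma * v, gamma, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]

# Lambda^T eta Lambda should reproduce eta up to rounding error.
result = matmul(transpose(Lam), matmul(eta, Lam))
err = max(abs(result[i][j] - eta[i][j]) for i in range(4) for j in range(4))
print(err < 1e-12)  # True
```

Any product of such boosts and spatial rotations passes the same test, which is the numerical face of the statement that the Lorentz group is 6-dimensional.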
This equation is the generalization of the relation δ = R^T R for the rotation group, where δ is the Kronecker delta (identity matrix), the Euclidean metric tensor, and R is a 3 × 3 rotation matrix. The Lorentz group turns out to be 6-dimensional. It contains the 3-dimensional rotation group as a subgroup. The other three degrees of freedom are associated with boosts, transformations from one coordinate system to another moving with uniform velocity in a straight line with respect to the first. Note that in special relativity we select out a preferred set of coordinate systems for describing spacetime, those in which the interval can be written in the form (1.1). These are called inertial coordinate systems, or Lorentz reference frames. An observer in spacetime makes measurements - that is, assigns coordinates to events. Thus an observer corresponds to some choice of coordinates on the manifold. Corresponding to the inertial or Lorentz coordinates, we also use the terms inertial observers or Lorentz observers. The relation (1.6) is phrased in physical terms as follows: all inertial observers are related by Lorentz transformations. Physically, an inertial observer is one for whom a free particle moves with uniform velocity in a straight line. Note that the worldline in spacetime (curve on the manifold) traced out by a free particle is simply a geodesic of the metric. Requiring invariance of the interval under Lorentz transformations builds in one of the physical postulates of special relativity, that the speed of light is the same when measured in any inertial reference frame. For ds^2 = 0 is equivalent to v = 1, and a Lorentz transformation preserves ds^2. The second far-reaching postulate of Einstein was that one cannot perform a
physical experiment that distinguishes one inertial frame from another. In other words, suppose we write down an equation for some purported law of nature in one inertial coordinate system. Then we transform each quantity to another coordinate system moving with uniform velocity. When we are done, all quantities related to the velocity of the new frame must drop out of the equation, otherwise we could find a preferred frame with no velocity terms. This requirement turns out to restrict the possible laws of nature quite severely, and has been an important guiding principle in discovering the form of the laws. Mathematically, we implement the second postulate by writing all the laws of physics as tensor equations. We can always write such an equation in the form: tensor = 0. Since the tensor transformation law under Lorentz transformations is linear, if such an equation is valid in one inertial frame it will be valid in any other in the same form. One could use non-Lorentzian coordinates to describe spacetime. For example, one could use polar coordinates for the spatial part of the metric, or one could use the coordinates of an accelerated observer. However, the interpretation of these coordinates would still be done by referring back to an inertial coordinate system. The underlying geometry is still Minkowskian. Special relativity turns out to be entirely adequate for dealing with all the laws of physics, as far as we know, except for gravity. Einstein's great insight was that gravity could be described by giving up the flat metric of Minkowski geometry, and introducing curvature.
1.3. General relativity
In general relativity, spacetime is still a 4-dimensional manifold of events, but it is endowed with a pseudo-Riemannian metric:
ds^2 = g_{αβ} dx^α dx^β.    (1.8)
No choice of coordinates can reduce the metric to the form (1.4) everywhere: spacetime is curved. The metric tensor g_{αβ} and its derivatives play the role of the 'gravitational field', as we shall see. The coordinates x^α can be any smooth labelling of events in spacetime, and we are free to make arbitrary transformations between coordinate systems,
x^α → x^{α'} = x^{α'}(x^α).    (1.9)
This is the origin of the 'general' in general relativity (general coordinate transformations). If the coordinates can be completely arbitrary, not necessarily related directly to physical measurements, how are measurements carried out in the theory? The answer depends on the following theorem. At any point in a manifold with a pseudo-Riemannian metric, there exists a coordinate
transformation such that, at the chosen point,
g_{αβ} = η_{αβ},   ∂_γ g_{αβ} = 0.    (1.10)
In other words,
ds^2 = [η_{αβ} + O(|x|^2)] dx^α dx^β.    (1.11)
The proof follows from counting the degrees of freedom in the Taylor expansion of the transformation (1.9) about the chosen point. In fact, there is a whole 6-parameter family of such transformations, all related by Lorentz transformations that preserve η_{αβ}. We call one of these coordinate systems a local Lorentz frame. It is the best approximation to the global Lorentz frames of special relativity that can be found in a general pseudo-Riemannian metric. To first order in |x|, the geometry is the same as that of special relativity. The observer can make measurements as in special relativity, provided they are local. In particular, ds^2 itself is a physically measurable invariant. Departures from special relativity will be noticed on the scale set by the second derivatives of g_{αβ}: the stronger the gravitational field, the more curved spacetime is, the smaller is this scale. Not only are measurements in a local inertial frame carried out as in special relativity. General relativity asserts that all the nongravitational laws of physics are the same in a local inertial frame as in special relativity. This is the Principle of Equivalence, a generalization from Einstein's famous thought experiment about an observer inside a closed elevator. Physics inside a uniformly accelerated elevator is indistinguishable from physics inside a stationary elevator in a uniform gravitational field. Conversely, inside an elevator freely falling in a uniform gravitational field there are no observable gravitational effects. A local inertial frame is just the reference frame of a freely falling observer. The mathematical implementation of the Principle of Equivalence is very similar to the mathematical implementation of the special relativity principle for uniform velocity, namely to write the laws of physics as tensor equations.
Now, however, the tensors must be covariant under arbitrary coordinate transformations, not just under Lorentz transformations between inertial coordinate systems. A (nonunique) way of doing this is to start with any law valid in special relativity and replace all derivative operators by covariant derivative operators. In a general coordinate system, this introduces extra terms, the connection coefficients (Christoffel symbols). They are assumed to represent the effects of the gravitational field. Contrast this with special relativity. There, transforming from one inertial frame to another introduces terms from the velocity of the transformation. Covariance requires that these terms cancel out, restricting the form of the laws. Here, covariance introduces terms involving derivatives of the metric that are interpreted as gravitational effects. Thus no purported law of physics that is valid in special
relativity can be ruled out a priori: the real world has to be consulted via experiment. An example of a generalization of a law from special to general relativity is the law of motion of a test particle: we postulate that the worldline is a geodesic of spacetime. In Newton's theory of gravity the gravitational field is measured simply by the gravitational acceleration of a test particle released at a point. In general relativity, gravitational effects can always be removed locally by going to a freely falling frame. So what is the meaning of a 'true' gravitational field at a point? The answer is that the true gravitational field is a measure of the difference between the gravitational accelerations of two nearby test bodies. This is often called the tidal gravitational field, since the difference between the Moon's pull on different parts of the Earth is responsible for the tides in Newtonian gravity. Differential geometers will recognize that the tidal gravitational field is encoded in the Riemann tensor, since we are describing the separation of neighbouring geodesics.
1.4. Some differential geometry
We summarize here some basic formulas of differential geometry. Our purpose is mainly to establish notation and sign conventions, which unfortunately are not standardized in the literature. A vector V at any point in the manifold can be expressed in terms of its components in some basis
V = V^α e_α.    (1.12)
In this paper we will restrict ourselves to coordinate basis vectors for simplicity. These are tangent to the coordinate lines, so we can write them as the differential operators
e_α = ∂/∂x^α.    (1.13)
The dot product of the basis vectors is given by the metric tensor
e_α · e_β = g_{αβ}.    (1.14)
The 1-forms comprise the dual space to the space of vectors, that is, for every vector V and 1-form A, ⟨A, V⟩ defines a linear mapping to the real numbers. Since we are in a metric space, we set up a correspondence between vectors and 1-forms: the 1-form Ṽ corresponds to V if and only if ⟨Ṽ, W⟩ = V · W for all W. If we introduce basis 1-forms to write the components V_α of Ṽ, then the correspondence can be written
V_α = g_{αβ} V^β.    (1.15)
This is called 'lowering an index'. In physical applications, we treat a vector
and its corresponding 1-form as describing the same physical quantity, just with different representations. In the older literature, vectors and 1-forms are called contravariant vectors and covariant vectors. We still refer to the components as contravariant (up) or covariant (down). This use of the term 'covariant' should not be confused with the generic usage that denotes correct transformation properties under coordinate transformations. Tensors are multilinear maps from product spaces of 1-forms and vectors to real numbers. For example,
T_{αβγ} A^α B^β C^γ = number.    (1.16)
Again, we do not distinguish between tensors where a 1-form is replaced by its corresponding vector or vice versa:
T^{αβγ} A_α B_β C_γ = T_{αβγ} A^α B^β C^γ.    (1.17)
This leads to 'index gymnastics', where components of a tensor can be raised and lowered with g_{αβ} or the inverse metric tensor g^{αβ}:
T^α_{βγ} = g^{αμ} T_{μβγ}.    (1.18)
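In flat spacetime, where the metric is the constant matrix η of (1.3), index gymnastics is just matrix arithmetic. A small illustration (ours, not from the text) of lowering an index as in (1.15):

```python
# Hypothetical illustration: lowering the index of a 4-vector with the
# Minkowski metric eta_{ab} = diag(-1, 1, 1, 1), as in equation (1.15).

eta = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

V_up = [2.0, 1.0, 0.0, 0.0]          # contravariant components V^a

# V_a = eta_{ab} V^b (Einstein summation over the repeated index b)
V_down = [sum(eta[a][b] * V_up[b] for b in range(4)) for a in range(4)]
print(V_down)  # [-2.0, 1.0, 0.0, 0.0]

# The full contraction V^a V_a is a Lorentz-invariant scalar.
print(sum(V_up[a] * V_down[a] for a in range(4)))  # -3.0
```

The invariant V^a V_a is negative here, so this V is timelike in the sense of Section 1.2.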
The covariant (coordinate-invariant) derivative operator is represented by the operator ∇_α, which denotes the αth component of the covariant derivative, or the covariant derivative in the α direction. The covariant derivative of a scalar is simply the usual partial derivative: if f(x^μ) is a scalar function over the manifold, then its covariant derivative is
∇_α f = ∂f/∂x^α ≡ ∂_α f.    (1.19)
The covariant derivative of a vector field with components V^μ is a second-rank tensor with components defined by
∇_α V^μ = ∂_α V^μ + V^σ Γ^μ_{σα}.    (1.20)
Here, Γ^μ_{σα} is the connection coefficient, which is not a tensor. The corresponding formula for a 1-form follows from linearity and the fact that ⟨A, V⟩ is a scalar:
∇_α A_β = ∂_α A_β - A_σ Γ^σ_{βα}.    (1.21)
Similarly, for a general tensor the covariant derivative is the partial derivative with one 'correction term' with a plus sign for each up-index, and one correction term with a minus sign for each down-index. The values of the connection coefficients are
Γ^μ_{ασ} = (1/2) g^{μν} (∂_α g_{νσ} + ∂_σ g_{να} - ∂_ν g_{ασ}).    (1.22)
This formula follows from the requirement that the connection be compatible with the metric, that is, the covariant derivative of the metric vanishes:
∇_α g_{μν} = ∂_α g_{μν} - g_{σν} Γ^σ_{μα} - g_{μσ} Γ^σ_{να} = 0.    (1.23)
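Formula (1.22) is entirely mechanical, which makes it easy to evaluate numerically. The following sketch (our own, under the assumption of a 2-dimensional flat metric written in polar coordinates, g = diag(1, r^2)) computes the connection by finite differences and recovers the familiar values Γ^r_{θθ} = -r and Γ^θ_{rθ} = 1/r:

```python
# A sketch (not from the paper): evaluate formula (1.22) by central finite
# differences for the flat 2-D metric in polar coordinates x = (r, theta),
# with g_{ij} = diag(1, r^2).

def metric(x):
    r, theta = x
    return [[1.0, 0.0], [0.0, r * r]]

def inverse_metric(x):
    r, theta = x
    return [[1.0, 0.0], [0.0, 1.0 / (r * r)]]

def dg(x, c, h=1e-6):
    """Partial derivative of the metric with respect to coordinate c."""
    xp = list(x); xp[c] += h
    xm = list(x); xm[c] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def christoffel(x):
    """Gamma^m_{as} = (1/2) g^{mn} (d_a g_{ns} + d_s g_{na} - d_n g_{as})."""
    ginv = inverse_metric(x)
    d = [dg(x, c) for c in range(2)]
    return [[[0.5 * sum(ginv[m][n] * (d[a][n][s] + d[s][n][a] - d[n][a][s])
                        for n in range(2))
              for s in range(2)] for a in range(2)] for m in range(2)]

x = (2.0, 0.3)
G = christoffel(x)                       # G[m][a][s] = Gamma^m_{as}
print(abs(G[0][1][1] - (-2.0)) < 1e-4)   # Gamma^r_{theta theta} = -r: True
print(abs(G[1][0][1] - 0.5) < 1e-4)      # Gamma^theta_{r theta} = 1/r: True
```

The nonzero symbols here encode no curvature at all; they arise purely from the curvilinear coordinates, which is exactly why Γ is not a tensor.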
Covariant derivatives do not commute in general. The noncommutation defines the Riemann curvature tensor:
∇_α ∇_β V^μ - ∇_β ∇_α V^μ = R^μ_{ναβ} V^ν.    (1.24)
Its components can be written in terms of the connection and its derivatives (in a coordinate basis) as
R^μ_{ναβ} = ∂_α Γ^μ_{νβ} - ∂_β Γ^μ_{να} + Γ^μ_{σα} Γ^σ_{νβ} - Γ^μ_{σβ} Γ^σ_{να}.    (1.25)
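As a concrete instance of (1.25), consider the unit 2-sphere, the simplest genuinely curved space. The sketch below (ours, not the authors'; the nested finite differencing is crude but adequate) evaluates the component R^θ_{φθφ}, which should equal sin^2(θ):

```python
# A sketch (not from the paper): evaluate (1.25) on the unit 2-sphere with
# coordinates x = (theta, phi) and metric g_{ij} = diag(1, sin^2(theta)).
import math

def metric(x):
    th, ph = x
    return [[1.0, 0.0], [0.0, math.sin(th) ** 2]]

def inverse_metric(x):
    th, ph = x
    return [[1.0, 0.0], [0.0, 1.0 / math.sin(th) ** 2]]

def dg(x, c, h=1e-6):
    xp = list(x); xp[c] += h
    xm = list(x); xm[c] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def christoffel(x):
    """Gamma^m_{as} from formula (1.22)."""
    ginv = inverse_metric(x)
    d = [dg(x, c) for c in range(2)]
    return [[[0.5 * sum(ginv[m][n] * (d[a][n][s] + d[s][n][a] - d[n][a][s])
                        for n in range(2))
              for s in range(2)] for a in range(2)] for m in range(2)]

def dGamma(x, c, h=1e-5):
    xp = list(x); xp[c] += h
    xm = list(x); xm[c] -= h
    Gp, Gm = christoffel(xp), christoffel(xm)
    return [[[(Gp[m][a][s] - Gm[m][a][s]) / (2 * h) for s in range(2)]
             for a in range(2)] for m in range(2)]

def riemann(x, m, n, a, b):
    """R^m_{nab} from equation (1.25)."""
    G = christoffel(x)
    dGa, dGb = dGamma(x, a), dGamma(x, b)
    val = dGa[m][n][b] - dGb[m][n][a]
    for s in range(2):
        val += G[m][s][a] * G[s][n][b] - G[m][s][b] * G[s][n][a]
    return val

th = 0.8
R = riemann((th, 0.3), 0, 1, 0, 1)        # R^theta_{phi theta phi}
print(abs(R - math.sin(th) ** 2) < 1e-4)  # True
```

Running the same routine on the flat polar metric of the previous example returns zero, in line with the theorem quoted below that vanishing Riemann tensor characterizes flat geometry.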
Note that the Riemann tensor depends linearly on second derivatives of the metric and quadratically on first derivatives of the metric. Various symmetries reduce the number of independent components of the Riemann tensor in four dimensions from 4^4 = 256 to 20. It is a theorem that the Riemann tensor vanishes if and only if the geometry is flat, that is, there exist coordinates such that g_{αβ} = η_{αβ} everywhere. A contraction of a tensor produces another tensor of rank lower by two. For example,
B_{αν} = A^μ_{αμν}.    (1.26)
Contractions of the Riemann tensor are very important in general relativity. They are called the Ricci tensor,
R_{μν} = R^α_{μαν},    (1.27)
and the Ricci scalar,
R = R^α_α.    (1.28)
The Einstein tensor is the trace-reversed Ricci tensor
G_{μν} = R_{μν} - (1/2) g_{μν} R.    (1.29)
The covariant derivatives of the Riemann tensor satisfy certain identities, the Bianchi identities. Contracting these identities shows that the Einstein tensor satisfies four identities, also called the Bianchi identities:
∇_ν G^{μν} = 0.    (1.30)
These identities play a crucial role in the formulation of general relativity.
1.5. Einstein's field equations
We have discussed how gravitation affects all the other phenomena of physics. To complete the picture we need to describe how the distribution of mass and energy determines the geometry, g_{αβ}. Newtonian gravitation can be described as a field theory for a scalar field Φ satisfying Poisson's equation,
∇²Φ = 4πGρ.    (1.31)
Here ρ is the mass density and G is Newton's gravitational constant, which depends on the units of measurement. The gravitational acceleration of any object in the field is given by -∇Φ. Because Newtonian gravity is governed by an elliptic equation, changes in the distribution of matter instantaneously change the gravitational potential everywhere. Propagation of effects at speeds greater than the speed of light leads to causality violation, and Newtonian gravity is not consistent with special relativity. General relativity is a dynamical theory in which changes in the gravitational field propagate causally, at the speed of light. Einstein's field equations are written as
G^{μν} = 8πG T^{μν},    (1.32)
where G^{μν} is the Einstein tensor (1.29) and T^{μν} is the stress-energy tensor of matter and fields in the spacetime. In essence, (1.32) says that matter and energy dictate how spacetime is curved. The Bianchi identities (1.30) applied to Einstein's equations (1.32) imply that ∇_ν T^{μν} = 0, which expresses conservation of the total stress-energy of the system, and is a fundamental property of all descriptions of matter. Thus (1.32) also says that the curvature of spacetime dictates how matter and energy flow through it. To solve Einstein's equations, we must find a metric that satisfies (1.32) at all spatial locations for all time. The metric we are looking for exists on a 4-dimensional manifold but, interestingly enough, Einstein's equations do not specify the topology of that manifold. Furthermore, the coordinates labelling points on the manifold are also freely specifiable. Coordinate freedom (e.g., using spherical or cylindrical coordinates) is common in solving field equations such as those of hydrodynamics, but there is a fundamental difference in the case of general relativity. With hydrodynamics, one solves for the density and velocity of matter within some specified geometry. The exact form of, say, the divergence of a vector field may vary depending on the coordinate system used, but the value of that divergence does not change. In general relativity, we are solving for the geometry that defines what the divergence operator means. In addition to changing the spatial coordinate system, we are also free to redefine the temporal coordinate. We can redefine the time coordinate so that the shape and embedding of 3-dimensional constant-time slices vary throughout the 4-dimensional manifold. This is a freedom that is not exploited in Newtonian hydrodynamics, but is very important in general relativity.
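The elliptic character of Newtonian gravity, equation (1.31), is the prototype for the constraint equations met later in Section 1.6, and relaxation methods for it are the simplest entry point. The following model problem (our illustration, with a made-up constant source standing in for 4πGρ) solves a 1-D analogue by plain Jacobi iteration:

```python
# Model problem (ours, not from the paper): the 1-D analogue of Poisson's
# equation (1.31), phi'' = s with s constant, on [0, 1] with
# phi(0) = phi(1) = 0. Jacobi relaxation of the centred-difference equations
# converges to the analytic solution phi(x) = s x (x - 1) / 2.

n = 33                       # grid points
h = 1.0 / (n - 1)
s = 4.0                      # stands in for 4*pi*G*rho
phi = [0.0] * n

for sweep in range(6000):    # plain Jacobi: slow but simple
    new = phi[:]
    for i in range(1, n - 1):
        new[i] = 0.5 * (phi[i - 1] + phi[i + 1] - h * h * s)
    phi = new

exact = [s * (i * h) * (i * h - 1.0) / 2.0 for i in range(n)]
err = max(abs(a - b) for a, b in zip(phi, exact))
print(err < 1e-6)  # True
```

Jacobi needs O(n^2) sweeps to converge, which is why production codes use multigrid or Krylov solvers; the packages mentioned in Section 1.1 target exactly this kind of elliptic problem on large 3-D grids.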
Given a solution of Einstein's equations g_{μν}, we may find that one choice of coordinates will lead to singularities in the metric, while another choice may be perfectly regular. How to determine a good choice of coordinates is one of the major open questions in numerical relativity. As with most complex theories, the majority of solutions to Einstein's equations have been obtained in the case of special symmetries, or in certain limits where perturbation theory can be applied. The more general and more interesting solutions can only be obtained via numerical techniques. Given that general relativity is a 4-dimensional theory, a natural approach for solving the equations might be to discretize the full 4-dimensional domain into a collection of simplexes and solve the equations somehow on this lattice. A discrete form of Einstein's equations based on this idea was developed by Regge (1961) (see also Williams and Tuckey (1992)). While considerable efforts have been made to implement numerical schemes based upon this Regge calculus approach, they have not yet moved beyond test codes (cf. Barrett et al. (1997); Gentle and Miller (1998)).
1.6. Einstein's equations as a Cauchy problem
G^{μν} and T^{μν} are symmetric in their indices, so (1.32) represents ten independent equations. From the definition of the Einstein tensor (1.29), we see that these ten equations are linear in the second derivatives, and quadratic in the first derivatives, of the metric. Since there are ten components of g_{μν}, it seems that we have the same number of equations as unknowns. But recall that there are four degrees of freedom to make coordinate transformations that leave ds^2 invariant, according to equation (1.9). The problem is still well-posed, however, because of the four Bianchi identities (1.30). We therefore expect the ten equations (1.32) to decompose into four constraint or initial value equations, and only six evolution or dynamical equations. If the four initial value equations are satisfied at t = 0, the Bianchi identities guarantee that the evolution equations preserve them - at least analytically, if not numerically! (An analogous situation occurs for the initial value problem in Maxwell's equations of electromagnetism.) Another way of seeing that there are only six dynamical Einstein equations is that, when they are written out, only six involve second time-derivatives of the metric.
Let us now consider the initial value formulation more carefully. Foliate the 4-dimensional manifold with a set of spacelike, 3-dimensional hypersurfaces (or slices) {Σ}. Label the slices by a parameter t, that is, the slices are t = constant. Let x^i be spatial coordinates in the slices. (Latin indices range from 1 to 3 in this so-called 3 + 1 formulation of Einstein's equations.) Let n̂ be the unit normal at some point on a slice, that is,
n̂ = -α∇t.    (1.33)
Choose the scalar function α to set the spacing of the slices by
ds|_{along n̂} = α dt.    (1.34)
Here α is called the lapse function (sometimes denoted N in the literature), since it relates how much physical time elapses (ds) for a given coordinate time change (dt).
[Fig. 1. The 3 + 1 decomposition of spacetime. Neighbouring slices of the foliation are labelled by the value of the time coordinate on that slice. Spatial coordinates remain constant along the t direction as they evolve from Σ_{t_0} to Σ_{t_0+dt}. n̂ is the unit normal vector to the slice Σ_{t_0}.]
Equation (1.34) is equivalent to
α n̂ = (∂/∂t) along n̂, fixed x^i,    (1.35)
since then
α n̂ · α n̂ = -α^2 = (∂/∂t) · (∂/∂t) = g_{tt},    (1.36)
which is the coefficient of dt^2 in ds^2 when x^i = constant, as required by (1.34). Now, in general, one is not required to evolve initial data off the t = 0 slice along the normal congruence. Consider a non-normal congruence threading the family of spacelike hypersurfaces. Let
t̄ = ∂/∂t    (1.37)
be the tangent vector to the congruence, that is, t̄ connects points with the same spatial coordinates x^i. Then we can write (see Figure 1)
t̄ = α n̂ + β̄,    β̄ · n̂ = 0.    (1.38)
The spatial vector β^i is called the shift vector (sometimes denoted N^i in the literature). In terms of the metric components, equation (1.38) is equivalent to
g_{tt} = -α^2 + β_i β^i,    (1.39)
g_{ti} = β_i.    (1.40)
Denote the spatial part of the metric g_{ij} by γ_{ij}. The quantity γ_{ij} describes the intrinsic geometry on a 3-dimensional slice Σ. Then, in light of equations (1.39) and (1.40), the general pseudo-Riemannian metric (1.8) can be rewritten as
ds^2 = -α^2 dt^2 + γ_{ij}(dx^i + β^i dt)(dx^j + β^j dt).    (1.41)
This is the standard starting point of most numerical attempts to solve Einstein's equations. We have seen that the four coordinate degrees of freedom in the theory are parametrized by α and β^i. (This is also called the gauge freedom of the theory.) We regard γ_{ij} as a 'fundamental' variable of the theory. Rather than work with Einstein's equations as second order in time for this quantity, we introduce its 'time-derivative' K_{ij}, called the extrinsic curvature.^2 The quantities γ_{ij} and K_{ij} completely describe the instantaneous state of the gravitational field. Recall that, in the 4-dimensional form of Einstein's equations, six of the ten field equations contain second time-derivatives. These now correspond to twelve first-order evolution equations for γ_{ij} and K_{ij}. The particular value of γ_{ij} induced by the 4-metric g_{μν} onto a slice Σ depends on how Σ is embedded into the full spacetime. In order for the foliation of slices {Σ} to fit into the higher-dimensional space, they must satisfy a set of four elliptic constraint equations. These are the remaining four field equations. We can write the twelve first-order evolution equations for γ_{ij} and K_{ij} as follows:
∂_t γ_{ij} = -2α K_{ij} + ∇_i β_j + ∇_j β_i,    (1.42)
∂_t K_{ij} = α [R_{ij} - 2K_{iℓ}K^ℓ_j + K K_{ij} - 8πG S_{ij} + 4πG γ_{ij}(S - ρ)] - ∇_i ∇_j α + β^ℓ ∇_ℓ K_{ij} + K_{iℓ} ∇_j β^ℓ + K_{jℓ} ∇_i β^ℓ.    (1.43)
Here ∇_j is the spatial covariant derivative compatible with γ_{ij}, R_{ij} is the Ricci tensor associated with γ_{ij}, K = K^i_i, ρ is the matter energy density, S_{ij} is the matter stress tensor, and S = S^i_i. The four constraint equations can be written as
R + K^2 - K_{ij} K^{ij} = 16πG ρ,    (1.44)
∇_j (K^{ij} - γ^{ij} K) = 8πG j^i.    (1.45)
Here, R = R^i_i and j^i is the matter momentum density. Equation (1.44) is referred to as the scalar or Hamiltonian constraint, while the three equations in (1.45) are referred to as the vector or momentum constraints. Both can be transformed into standard elliptic forms, as described in Section 2.1. In this 3 + 1, or Cauchy, initial value formulation of Einstein's equations, we evolve the gravitational field from some initial time slice Σ_0 through
^2 More precisely, K_{ij} = -(1/2) £_{n̂} γ_{ij}, where £ denotes the Lie derivative.
time using (1.42) and (1.43). The initial data for the evolution are γ_{ij} and K_{ij}, which must be chosen to satisfy the constraints (1.44) and (1.45) on Σ_0. As mentioned earlier, it can be shown that the evolution preserves the constraints.
1.7. The characteristic initial value problem
An alternative approach for posing Einstein's equations as an initial value problem is to foliate spacetime with a set of null hypersurfaces. This leads to the 2 + 2, or characteristic, initial value formulation of general relativity (see Bishop, Gomez, Lehner, Maharaj and Winicour (1997b) and references therein). The characteristic formulation of Einstein's equations is particularly adept at following gravitational waves propagating through the spacetime, but has difficulty in highly dynamic, strong-field regions where the null surfaces tend to form caustics. Because of this problem, and the limited scope of this review, we will focus entirely on the Cauchy initial value formulation of general relativity.
2. Initial data
The initial data for the Cauchy formulation of general relativity are the metric γ_{ij} and extrinsic curvature K_{ij}. These each have six components that must be fixed, a total of twelve. As discussed in Section 1.6, general relativity has a 4-dimensional coordinate invariance or gauge freedom that can be parametrized by the lapse and shift functions. These functions can be chosen to specify four of the twelve quantities (or relations among them). The four constraint equations fix four more quantities. The remaining four quantities describe the two 'dynamical degrees of freedom' of general relativity, four quantities satisfying first-order dynamical equations, or equivalently two quantities satisfying second-order wave-like equations. These four quantities are freely specifiable initial data, corresponding roughly to the initial gravitational wave content of the spacetime.
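The 3 + 1 splitting of the metric can be checked mechanically. The sketch below (our illustration, with arbitrarily chosen values for the lapse, shift and 3-metric) assembles the 4-metric from equation (1.41) and confirms that evaluating ds^2 from the components agrees with the factored form, which in turn encodes equations (1.39) and (1.40):

```python
# A small consistency check (our sketch): assemble the 4-metric of equation
# (1.41) from made-up values of the lapse alpha, shift beta^i and 3-metric
# gamma_ij, using g_tt = -alpha^2 + beta_k beta^k and g_ti = beta_i from
# equations (1.39) and (1.40).

alpha = 1.3
beta_up = [0.2, -0.1, 0.05]                       # beta^i
gamma = [[1.1, 0.1, 0.0],                         # gamma_ij (symmetric)
         [0.1, 0.9, 0.2],
         [0.0, 0.2, 1.4]]

# Lower the shift index: beta_i = gamma_ij beta^j
beta_dn = [sum(gamma[i][j] * beta_up[j] for j in range(3)) for i in range(3)]
beta2 = sum(beta_dn[i] * beta_up[i] for i in range(3))   # beta_k beta^k

g = [[0.0] * 4 for _ in range(4)]
g[0][0] = -alpha * alpha + beta2
for i in range(3):
    g[0][i + 1] = g[i + 1][0] = beta_dn[i]
    for j in range(3):
        g[i + 1][j + 1] = gamma[i][j]

# Evaluate ds^2 two ways for one displacement (dt, dx^i).
dx = [0.7, 0.3, -0.2, 0.5]
quad = sum(g[a][b] * dx[a] * dx[b] for a in range(4) for b in range(4))
direct = (-alpha ** 2 * dx[0] ** 2
          + sum(gamma[i][j] * (dx[i + 1] + beta_up[i] * dx[0])
                            * (dx[j + 1] + beta_up[j] * dx[0])
                for i in range(3) for j in range(3)))
print(abs(quad - direct) < 1e-12)  # True
```

Note that only γ_{ij} and (through K_{ij}) its time derivative are dynamical here; α and β^i enter the metric but are freely chosen gauge functions, exactly the counting used in the paragraph above.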
In the weak field limit, where the equations of general relativity can be linearized, there are clear ways to determine which components are dynamic, which are constrained, and which are gauge. However, in the full nonlinear theory there is no unique decomposition. The approach one follows for decomposing the metric and extrinsic curvature determines the final form of the elliptic equations that constrain the initial data.

2.1. York-Lichnerowicz conformal decomposition
The most widely used approach for separating out the freely specifiable initial data from the constrained initial data is the York-Lichnerowicz conformal decomposition. Here we give a brief summary. For a more complete discussion, with references to the original literature, see York (1979).
First the metric is decomposed into a conformal factor multiplying a 3-metric:

    γ_ij = ψ⁴ γ̃_ij.  (2.1)
The auxiliary 3-metric γ̃_ij is called the conformal 3-metric. Its determinant can be normalized to some convenient value, leaving five degrees of freedom. Using (2.1), we can rewrite the Hamiltonian constraint (1.44) as

    ∇̃²ψ - (1/8)ψR̃ - (1/8)ψ⁵K² + (1/8)ψ⁵ K_ij K^ij = -2πG ψ⁵ ρ,  (2.2)
where ∇̃² and R̃ are the scalar Laplace operator and the Ricci scalar associated with γ̃_ij. Equation (2.2) shows that ψ is constrained by the elliptic Hamiltonian constraint. The five components of γ̃_ij contain two freely specifiable degrees of freedom together with three pieces of information related to the 3-dimensional spatial gauge freedom. These three pieces of information are essentially the initial choice of the spatial coordinate system, which is then propagated by the shift vector. The extrinsic curvature is decomposed into its trace K and trace-free part A^ij via
    K^ij = A^ij + (1/3)γ^ij K.  (2.3)
The embedding of the initial data hypersurface within the full spacetime fixes the initial time coordinate, the choice then being propagated by the lapse. Thus one piece of K_ij is used to specify the time coordinate, and it is taken to be the trace K for geometric and physical reasons (Ó Murchadha and York 1974). K is thus freely specifiable in the initial data. The five components of A^ij can be further decomposed using a transverse-traceless decomposition. In order to write the full set of constraints in terms of operators on the conformal 3-geometry, it is necessary to conformally decompose A^ij also. The conformal and transverse-traceless decompositions of A^ij do not commute, leading to two different formulations of the full set of constraint equations. Historically, the most widely used decomposition has applied the transverse-traceless decomposition to the conformally rescaled version of A^ij. While somewhat less physically motivated, under certain simplifying assumptions this approach decouples the vector constraint equation (1.45) from the Hamiltonian constraint. This was an important simplification when computational power was limited. This is not so much of a concern any more, and we present the alternative decomposition here. Readers wishing to skip the details can proceed to equation (2.9). We first decompose A^ij as

    A^ij = (LW)^ij + Q^ij,  (2.4)

where

    (LW)^ij ≡ ∇^i W^j + ∇^j W^i - (2/3)γ^ij ∇_ℓ W^ℓ,  (2.5)
and Q^ij is a symmetric transverse-traceless tensor (i.e., it satisfies ∇_j Q^ij = Q^i_i = 0). The remainder of A^ij, constructed from (LW)^ij, is referred to as the trace-free longitudinal part of the extrinsic curvature. In general, one would construct Q^ij from a general symmetric, trace-free tensor M^ij by subtracting off its longitudinal part. However, since the vector constraint is linear in K_ij, we can rewrite (2.4) as

    A^ij = ψ⁻⁴(L̃V)^ij + ψ⁻¹⁰ M̃^ij,  (2.6)

where (L̃V)^ij is defined as in (2.5) but with ∇_j → ∇̃_j and γ^ij → γ̃^ij. Note that the longitudinal part of A^ij is constructed from a new vector V^i, not W^i (see below), and M̃^ij = ψ¹⁰ M^ij. We can now rewrite the vector, or momentum, constraint (1.45) as

    Δ̃_L V^i + 6(L̃V)^ij ∇̃_j ln ψ = (2/3)γ̃^ij ∇̃_j K - ψ⁻⁶ ∇̃_j M̃^ij + 8πG ψ⁴ j^i.  (2.7)

This is a vector elliptic equation for V^i, where

    Δ̃_L V^i ≡ ∇̃_j (L̃V)^ij = ∇̃²V^i + (1/3)γ̃^ij ∇̃_j ∇̃_k V^k + R̃^i_j V^j,  (2.8)
and R̃_ij is the Ricci tensor associated with the conformal 3-geometry γ̃_ij. The vector V^i is a linear combination of both the three constrained longitudinal components of A^ij, represented by W^i in (2.4), and the longitudinal components of M^ij. Since A^ij is traceless, this means that Q^ij, the transverse-traceless part of M^ij, contains two freely specifiable quantities that are taken as the two gravitational degrees of freedom. Finally, given (2.6), we can rewrite the Hamiltonian constraint (2.2) as

    ∇̃²ψ - (1/8)ψR̃ - (1/12)ψ⁵K² + (1/8)ψ⁵[(L̃V)_ij + ψ⁻⁶M̃_ij][(L̃V)^ij + ψ⁻⁶M̃^ij] = -2πG ψ⁵ ρ.  (2.9)

Equations (2.9) and (2.7) form the coupled set of four elliptic equations that must be solved with appropriate boundary conditions in order to specify gravitational data properly on a given constant-time slice. In the historically more widely used decomposition, the trace-free extrinsic curvature is expressed as

    A^ij = ψ⁻¹⁰ Ã^ij = ψ⁻¹⁰ [(L̃V)^ij + M̃^ij],  (2.10)
and the Hamiltonian and momentum constraints reduce to

    ∇̃²ψ - (1/8)ψR̃ - (1/12)ψ⁵K² + (1/8)ψ⁻⁷ Ã_ij Ã^ij = -2πG ψ⁵ ρ,  (2.11)

    Δ̃_L V^i = (2/3)ψ⁶ γ̃^ij ∇̃_j K - ∇̃_j M̃^ij + 8πG ψ¹⁰ j^i.  (2.12)
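In the weak-field limit, where γ̃_ij → δ_ij, the transverse-traceless projection that underlies these decompositions can be carried out explicitly in Fourier space, mode by mode, with the flat-space projector P_ij = δ_ij - k̂_i k̂_j. The following sketch is an illustration only (not code from the review); the symmetric tensor and wave vector are arbitrary examples:

```python
import numpy as np

def tt_project(S, k):
    """Transverse-traceless part of a symmetric 3x3 tensor S at wave
    vector k, via S^TT_ij = (P_ik P_jl - (1/2) P_ij P_kl) S_kl with
    P_ij = delta_ij - khat_i khat_j."""
    khat = k / np.linalg.norm(k)
    P = np.eye(3) - np.outer(khat, khat)
    return P @ S @ P - 0.5 * P * np.trace(P @ S)

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
S = 0.5 * (M + M.T)                 # arbitrary symmetric tensor
k = np.array([1.0, 2.0, -0.5])      # arbitrary wave vector

Stt = tt_project(S, k)
trace_err = abs(np.trace(Stt))      # tracelessness
transv_err = np.linalg.norm(k @ Stt)  # transversality k_i S^TT_ij
```

Both defining properties hold to round-off, which is the Fourier-space statement that the projected part carries the two dynamical degrees of freedom.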
For more on this version of the decomposition, see York (1979). Two simplifying (but restrictive) choices are frequently made with the York-Lichnerowicz decomposition. First, the conformal 3-metric γ̃_ij is taken
to be flat (i.e., δ_ij in Cartesian coordinates), and the full 3-geometry is said to be conformally flat. This is a reasonable choice, since it is true in the limit of weak gravity. This assumption simplifies the elliptic equations because now R̃_ij = R̃ = 0 and the derivative operators become the familiar flat-space operators. The second assumption usually made is that K = 0. This says that the initial data slice Σ₀ is maximally embedded in the full spacetime. This is a physically reasonable assumption and, for the case of the decomposition (2.10), decouples the Hamiltonian and momentum constraint equations (2.11) and (2.12). These simplifying choices are used so frequently that many people implicitly assume that they are required in the York-Lichnerowicz decomposition. This, however, is not the case: the York-Lichnerowicz decomposition can be used to construct any initial data. In order to pose the problem of constructing gravitational initial data properly, we must specify boundary conditions. We will discuss the boundary conditions at the surfaces of black holes in Section 2.2. We must also specify boundary conditions at infinity. We are interested in the astrophysically relevant case of isolated systems (as opposed to cosmological models, for example). In this case, we demand that the hypersurface is ℝ³ outside some compact set, and choose the data to be 'asymptotically flat'. A full and rigorous formulation of asymptotic flatness is quite tedious and unnecessary here (see, for instance, York (1979) and references therein). For our purposes it will be sufficient to use the following. Assume that we are using a Cartesian coordinate system, so that the spatial metric can be written as γ_ij = δ_ij + h_ij. For the metric, it is sufficient to demand that

    h_ij = O(r⁻¹),  ∂_k h_ij = O(r⁻²),  r → ∞.  (2.13)

For the extrinsic curvature, it is sufficient to demand that

    K_ij = O(r⁻²),  r → ∞.  (2.14)
2.2. Black hole initial data

Surprisingly, the most general isolated black hole in equilibrium is described by an analytic solution of the Einstein equations, the Kerr metric (Misner et al. 1973, Section 33). The solution contains two parameters, the mass and angular momentum (spin) of the black hole. (The solution that includes electric charge, the Kerr-Newman metric, is not likely to be astrophysically important.) A nonrotating black hole is a limiting case, described by the spherically symmetric Schwarzschild metric. The challenge in constructing more general black hole spacetimes is to devise schemes that can handle one or more holes with varying amounts of linear and angular momentum on each hole. One of the difficulties in constructing black hole initial data is that they almost always contain singularities.
Most schemes for specifying black hole initial data avoid the singularities by imposing some form of boundary condition near the surface of each of the black holes. (An alternative is to include some kind of matter source to produce the black hole by gravitational collapse; see Shapiro and Teukolsky (1992) for an example.) The most thoroughly studied of these approaches uses the freedom within general relativity to specify the topology of the manifold. A maximal slice of the Kerr solution, the most general stationary black hole solution, has the property that it consists of two identical, causally disconnected universes (hypersurfaces) that are connected at the surface of the black hole by an 'Einstein-Rosen bridge' (Einstein and Rosen 1935, Misner et al. 1973, Brandt and Seidel 1995). We are free to demand that more general black hole initial data be constructed in a similar way from two identical hypersurfaces joined at the black hole 'throats' (Misner 1963). A method of images applicable to tensors can be used to enforce the isometry between the solutions on the two hypersurfaces, and the isometry induces boundary conditions on the topologically S² fixed-point sets that form the boundaries where the two hypersurfaces are joined (Bowen 1979, Bowen and York 1980, Kulkarni, Shepley and York 1983, Kulkarni 1984). A second approach completely bypasses the issue of the topology of the initial data hypersurface by imposing a boundary condition at the 'apparent horizon' associated with each black hole (Thornburg 1987). We will come back to apparent horizons in Section 3.3. Yet another approach is based on factoring out the singular behaviour of the initial data (Brandt and Brügmann 1997). This approach uses an alternative topology for the initial data hypersurface in which each black hole in 'our' universe is connected to a black hole in a separate universe, producing a solution with N_BH + 1 causally disconnected universes joined at the throats of N_BH black holes.
This approach has the advantage of not requiring that boundary conditions be imposed on a spherical surface at each hole, making it easier to use a Cartesian coordinate system. All three of these approaches for constructing black hole initial data are simplified by being constructed on a conformally flat, maximally embedded hypersurface. Because they all use the alternative transverse-traceless decomposition of the extrinsic curvature, the Hamiltonian and momentum constraint equations are decoupled. In vacuum, there exists an analytic solution for the background extrinsic curvature Ã^ij that satisfies the momentum constraint (1.45) for any ψ (Bowen and York 1980). For a single black hole, this solution is

    Ã^ij = (3/2r²)[P^i n^j + P^j n^i - (f^ij - n^i n^j) P^k n_k] + (3/r³)[ε^kiℓ S_ℓ n_k n^j + ε^kjℓ S_ℓ n_k n^i].  (2.15)
Here P^i and S^i are the linear and angular momenta of the black hole, r is the Cartesian coordinate radius from the centre of the black hole located at C^i, and n^i = (x^i - C^i)/r in Cartesian coordinates. Further, f_ij is the flat metric in whatever coordinate system is used, and ε_ijk is the totally antisymmetric tensor. For a general spatial metric γ̃_ij, it is defined as ε_ijk = √γ̃ [ijk], where [ijk] is the totally antisymmetric permutation symbol with [123] = 1. In vacuum, the Hamiltonian constraint (2.11) then reduces to

    ∇̄²ψ + (1/8)ψ⁻⁷ f_iℓ f_jm Ã^ij Ã^ℓm = 0.  (2.16)
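The Bowen-York solution (2.15) can be checked numerically: for any choice of P^i and S^i it is trace-free, and away from the origin its flat-space divergence vanishes, which is the content of the momentum constraint here. A minimal sketch (not from the review; the momenta and the sample point are arbitrary):

```python
import numpy as np

P = np.array([0.3, 0.0, 0.1])   # linear momentum (arbitrary example)
S = np.array([0.0, 0.2, 0.5])   # spin (arbitrary example)

def A_bowen_york(x):
    """Bowen-York trace-free extrinsic curvature (2.15), flat metric."""
    r = np.linalg.norm(x)
    n = x / r
    mom = (3.0 / (2.0 * r**2)) * (np.outer(P, n) + np.outer(n, P)
           - (np.eye(3) - np.outer(n, n)) * np.dot(P, n))
    Sxn = np.cross(S, n)         # (S x n)_i = eps_{kil} n_k S_l
    spin = (3.0 / r**3) * (np.outer(Sxn, n) + np.outer(n, Sxn))
    return mom + spin

x0 = np.array([1.2, -0.7, 2.0])  # arbitrary sample point away from r = 0
A = A_bowen_york(x0)
trace_err = abs(np.trace(A))

# flat-space divergence d_j A^{ij} by centred differences
h = 1e-4
div = np.zeros(3)
for j in range(3):
    e = np.zeros(3); e[j] = h
    div += (A_bowen_york(x0 + e)[:, j] - A_bowen_york(x0 - e)[:, j]) / (2 * h)
div_err = np.linalg.norm(div)
```

Both residuals vanish to finite-difference accuracy, confirming that only the Hamiltonian constraint (2.16) remains to be solved for ψ.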
The boundary condition on ψ at large distances from the collection of black holes can be obtained from its asymptotic behaviour, ψ → 1 + C/r + O(r⁻²), where C is a constant. When the choice of topology is that of two isometric hypersurfaces, the isometry induces a boundary condition on the spherical surface where the two hypersurfaces connect,

    ∂ψ/∂r = -ψ/(2r).  (2.17)

When the inner boundary is constructed to be an apparent horizon instead of using two isometric hypersurfaces, (2.17) is modified with a nonlinear correction; see Thornburg (1987) for details. The limitation of all three of the solution schemes described above is that the simplifying choice of a conformally flat 3-geometry and the analytic solution for the background extrinsic curvature represent a very limited choice for the unconstrained, dynamical portion of the gravitational fields. Also, a maximal slice (K = 0) may not always be a good choice for numerical evolutions. Moving beyond these limitations is the major challenge to be faced in constructing black hole initial data. This will certainly require solving the full coupled system of equations (2.9) and (2.7) (or alternatively (2.11) and (2.12)).
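As a concrete illustration of this elliptic problem, consider the simplest case: vacuum, conformal flatness, K = 0 and Ã^ij = 0 (a single time-symmetric hole), for which the Hamiltonian constraint reduces to the flat Laplace equation for ψ(r). The sketch below (an illustration, not code from the literature) imposes the isometry condition (2.17) at a throat r = a and a Robin condition encoding ψ → 1 + C/r at a finite outer radius, and recovers the Schwarzschild conformal factor ψ = 1 + a/r:

```python
import numpy as np

a, R, N = 1.0, 50.0, 1200        # throat radius, outer radius, grid cells
r = np.linspace(a, R, N + 1)
dr = r[1] - r[0]

A = np.zeros((N + 1, N + 1))
b = np.zeros(N + 1)

# interior: psi'' + (2/r) psi' = 0  (flat Laplacian, spherical symmetry)
for i in range(1, N):
    A[i, i - 1] = 1.0 / dr**2 - 1.0 / (r[i] * dr)
    A[i, i]     = -2.0 / dr**2
    A[i, i + 1] = 1.0 / dr**2 + 1.0 / (r[i] * dr)

# inner boundary, isometry condition (2.17): psi' + psi/(2a) = 0
A[0, 0] = -3.0 / (2 * dr) + 1.0 / (2 * a)
A[0, 1] = 4.0 / (2 * dr)
A[0, 2] = -1.0 / (2 * dr)

# outer boundary: psi ~ 1 + C/r  =>  psi' = (1 - psi)/r at r = R
A[N, N]     = 3.0 / (2 * dr) + 1.0 / R
A[N, N - 1] = -4.0 / (2 * dr)
A[N, N - 2] = 1.0 / (2 * dr)
b[N] = 1.0 / R

psi = np.linalg.solve(A, b)
err = np.max(np.abs(psi - (1.0 + a / r)))   # exact: Schwarzschild factor
```

Note that neither boundary condition prescribes the mass: the constant C (here C = a) emerges from the solve, exactly as in the full problem.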
In equation (B7) of Cook (1991), '1 for n = 1' should read 'a⁻¹ for n = 1'.
2.3. Equilibrium stars

An equilibrium or stationary solution of Einstein's equations has no time dependence. In coordinate-invariant language, the solution admits a Killing vector that is timelike at infinity. The metric is specified by a solution of the initial value equations that also satisfies the dynamical equations with time-derivatives set to zero. An important class of such solutions describes rotating equilibrium stars, which are axisymmetric. In axisymmetry there are just three nontrivial initial value equations. There is only one further equation to be satisfied from among the dynamical equations, and it is also elliptic because the time-derivatives have been set to zero. It is simpler in this case just to choose an appropriate form for the metric and solve the resulting four equations directly, without going through something like the York-Lichnerowicz decomposition. There are many numerical approaches for solving these equations to high accuracy: see Butterworth and Ipser (1976) and Friedman, Ipser and Parker (1986) and references therein for a description of a pseudo-spectral method; Komatsu, Eriguchi and Hachisu (1989) and Cook, Shapiro and Teukolsky (1994) and references therein for an iterative method based on a Green's function; and Bonazzola, Gourgoulhon, Salgado and Marck (1993) and references therein for a spectral method.

2.4. Binary black holes

The most important computations confronting numerical relativity involve binary systems containing black holes or neutron stars. Large experimental facilities are being built around the world in an effort to detect gravitational waves directly from astrophysical sources in the next few years (Abramovici et al. 1992). These binary systems are prime candidates as sources: as they emit gravitational waves they lose energy and slowly spiral inwards, until they finally plunge together, emitting a burst of radiation.
Since emission of gravitational radiation tends to circularize elliptical orbits, one is interested in initial data corresponding to quasicircular orbits. For the case of a binary black hole system, a very high accuracy survey has been performed to locate these orbits (Cook 1994). In this work, quasicircular orbits were found by locating binding energy minima⁴ along sequences of models with constant angular momentum (see Figure 2). Locating these minima required extremely high accuracy, which was achieved using a combination of techniques. First, the Hamiltonian constraint (2.16) was discretized on a numerically generated coordinate system specifically adapted

⁴ The equations for equilibrium stars can be derived from an energy variational principle. Thus the stability of such stars can be analysed by examining turning points along one-parameter sequences of equilibrium solutions (Sorkin 1982). The extension of this idea to quasi-equilibrium sequences is plausible, but has not been rigorously demonstrated.
Fig. 2. The binding energy as a function of separation for models with a range of angular momenta. The bold line represents the sequence of quasicircular orbits passing through the minima of the binding energy. μ and m are the reduced and total masses of the binary system.
to the problem, then solved using a FAS/block-multigrid algorithm (Cook et al. 1993). Then the results of several runs at different resolutions were combined using Richardson extrapolation to obtain results accurate to one part in 10⁵. There are efforts under way to produce binary black hole initial data that may be more astrophysically realistic by using a linear combination of single spinning black hole solutions to provide a conformal 3-geometry γ̃_ij that is not flat (Matzner, Huq and Shoemaker 1999). If the constraints can be successfully solved on this non-flat background, then a similar procedure can be used to locate the quasicircular orbits in these data sets as well. This would be extremely useful in estimating how much effect the choice of the conformal 3-geometry has on the location of these orbits.

2.5. Binary neutron stars

Even in the simplest case, constructing astrophysically interesting initial data for binary neutron star systems is considerably more difficult than for
black hole binaries. In addition to finding circular, near equilibrium solutions for the gravitational fields, we must also demand that the neutron star matter be in quasi-equilibrium. In Newtonian physics, a binary star system can exist as a true equilibrium. In general relativity this is not possible because of the loss of energy by gravitational wave emission. However, provided the stars are not so close that they are about to plunge together, the timescale for the orbit to change is much longer than the orbital period. Accordingly, one can look for solutions that neglect the gravitational radiation. Of particular interest are the two limiting cases where the neutron stars are co-rotating (no rotation in the frame co-rotating with the binary system) and counter-rotating (no rotation in the rest frame of the centre of mass). Several schemes have been devised to construct initial data for a neutron star binary in quasi-equilibrium (Wilson, Mathews and Marronetti 1996, Bonazzola, Gourgoulhon and Marck 1997, Baumgarte, Cook, Scheel, Shapiro and Teukolsky 1998). All of these schemes are based on the simplifying assumptions of conformal flatness and maximal slicing, differing primarily in how the neutron star matter is handled. We give here one particular example of the system of equations to be solved. First, the gravitational field equations are (Wilson et al. 1996)
    A^ij = (ψ⁻⁴/2α)(∇̄^i β^j + ∇̄^j β^i - (2/3) f^ij ∇̄_k β^k),  (2.18)

    ∇̄²β^i + (1/3) f^ij ∇̄_j (∇̄_k β^k) = 2ψ¹⁰ A^ij ∇̄_j (αψ⁻⁶) + 16πG αψ⁴ j^i,  (2.19)

    ∇̄²ψ = -(1/8)ψ⁵ f_iℓ f_jm (ψ⁴A^ij)(ψ⁴A^ℓm) - 2πG ψ⁵ ρ,  (2.20)

    ∇̄²(αψ) = αψ[(7/8)ψ⁴ f_iℓ f_jm (ψ⁴A^ij)(ψ⁴A^ℓm) + 2πG ψ⁴(ρ + 2S)].  (2.21)

The spatial metric is decomposed as in (2.1) and is taken to be conformally flat, γ̃_ij = f_ij (i.e., f_ij = δ_ij in Cartesian coordinates). We assume maximal slicing (K = 0), and get the equation for the trace-free extrinsic curvature (2.18) from physical arguments for quasi-equilibrium. Note that (2.18) is very similar to (2.6) except that M̃^ij is not present and we divide by 2α. We obtain equation (2.19) by substituting (2.18) into the momentum constraint (1.45). Note that the principal part of the operator for (2.19) is the same as in (2.7). The conformal factor ψ is fixed via the Hamiltonian constraint, which now takes the form in (2.20). Finally, the lapse α is fixed via (2.21), which enforces the maximal slicing condition on neighbouring slices.
(Teukolsky 1998, Shibata 1998). The matter equations are

    ∇̄_i [αψ⁶ (n_B/h)(ψ⁻⁴ f^ij ∇̄_j φ - (λ/α²) β^i)] = 0,  (2.22)

    λ ≡ C + β^ℓ ∇̄_ℓ φ = α[h² + ψ⁻⁴ f^ij (∇̄_i φ)(∇̄_j φ)]^(1/2),  (2.23)

    h² = -ψ⁻⁴ f^ij (∇̄_i φ)(∇̄_j φ) + α⁻²(C + β^ℓ ∇̄_ℓ φ)²,  (2.24)

    β^i = w^i + Ω ξ^i,  (2.25)
where φ is the velocity potential, C is an integration constant, Ω is a constant specifying the angular velocity of the rotating binary system, and ξ^i is a circular rotation vector (ξ^i = (-y, x, 0) in Cartesian coordinates for rotation about the z axis). Finally, n_B is the baryon number density (see (2.31) below). The domain of solution for (2.22) is the volume covered by matter, and the solution must satisfy
    [(ψ⁻⁴ f^ij ∇̄_j φ - (λ/α²) β^i) ∇̄_i n_B]_surf = 0  (2.26)
at the boundary of the matter where n_B goes to zero. Finally, the matter equations couple back into the gravitational field equations through the source terms on the right-hand sides of (2.19), (2.20), and (2.21), defined by

    ρ = (ρ₀ + ρᵢ + P)(1/α²h²)(C + β^ℓ ∇̄_ℓ φ)² - P,  (2.27)

    S = (ρ₀ + ρᵢ + P)(ψ⁻⁴/h²) f^ij (∇̄_i φ)(∇̄_j φ) + 3P,  (2.28)

    j^i = (ρ₀ + ρᵢ + P)(ψ⁻⁴/αh²)(C + β^ℓ ∇̄_ℓ φ) f^ij ∇̄_j φ,  (2.29)

    h ≡ (ρ₀ + ρᵢ + P)/ρ₀,  (2.30)

    ρ₀ = m_B n_B,  (2.31)

where ρ₀, ρᵢ, and P are, respectively, the rest mass density, internal energy density, and pressure of the matter in the matter's rest frame. These are all determined from the enthalpy h via (2.30), given n_B, the baryon mass m_B, and an equation of state for the matter. We find, then, that solving for an irrotational neutron star binary system in quasi-equilibrium requires the solution of a set of six coupled, nonlinear elliptic equations given by (2.19), (2.20), (2.21), and (2.22). The solution depends on two free parameters, C and Ω, which must be chosen to allow
a self-consistent solution. A scheme for doing this could be based on the algorithm described by Baumgarte et al. (1998) for the slightly simpler case of a synchronous (co-rotating) binary system. A stable iterative scheme is obtained by rescaling the equations so that the outermost point on the surface of each neutron star, where it crosses the axis connecting the two stars, is at a fixed coordinate location. The innermost point on the surface of each star is taken as an input parameter for a particular solution and roughly corresponds to the free parameter Ω. Next, the maximum value of the density ρ₀ is taken as another input parameter, roughly corresponding to C. The exact values of Ω and C are obtained at each step of the iteration by solving a set of nonlinear algebraic equations that follow from equation (2.24). What complicates the solution of these six equations is that, while five of them are solved on a domain extending out to radial infinity, (2.22) must be solved on the limited domain consisting of the volume containing matter. The boundary of this volume is not prescribed, but is determined by the solution. The first set of successful solutions to these equations has been obtained by Bonazzola, Gourgoulhon and Marck (1999) using spectral methods.
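The step of adjusting Ω and C at each iteration amounts to solving two coupled nonlinear algebraic equations. The following toy sketch shows only the generic structure, a two-variable Newton iteration with a numerical Jacobian; the functions in F are purely illustrative stand-ins, not the actual surface conditions derived from (2.24):

```python
import numpy as np

def F(p):
    Omega, C = p
    # purely illustrative stand-ins for the two surface conditions
    return np.array([C + 0.3 * Omega**2 - 1.2,
                     C - 0.5 * Omega - 0.9])

p = np.array([1.0, 1.0])          # initial guess for (Omega, C)
for _ in range(30):
    # Jacobian by centred differences
    J = np.zeros((2, 2))
    h = 1e-7
    for j in range(2):
        e = np.zeros(2); e[j] = h
        J[:, j] = (F(p + e) - F(p - e)) / (2 * h)
    p = p - np.linalg.solve(J, F(p))   # Newton update

residual = np.linalg.norm(F(p))
```

In the actual scheme this 2-dimensional solve sits inside the outer iteration over the six elliptic equations, so its cost is negligible; the delicate part is that F itself depends on the current field solution.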
2.6. Summary Common to all of these current efforts at constructing initial data is the need to solve a large set of coupled nonlinear elliptic equations with complicated boundaries over a large range of length scales. The classic problem of constructing axisymmetric rotating neutron star models has been studied extensively, and highly sophisticated and efficient computational techniques are now commonly used. The situation is not nearly so well in hand for the other examples described above. These problems are ripe for new ideas and algorithms. They have been attacked principally using finite difference techniques, although Bonazzola, Gourgoulhon and Marck (1998) are exploring spectral techniques for neutron star binaries, and Arnold, Mukherjee and Pouly (1998) have applied finite element techniques to the problem of solving the Hamiltonian constraint. Which numerical schemes will work the best is still an open question. A good scheme must balance efficiency and speed against accuracy. It must be able to resolve the different length scales of the problem, even though the fields vary on characteristic length scales comparable to the radius of the star when near to the star, while the outer boundary conditions must be imposed at large distances from the stars. There is a great need for both efficiency and accuracy. Physicists are interested in performing extensive parameter space surveys in order to understand the physical content of the initial data. With sufficient accuracy, such surveys can also provide great insight into dynamical, but slowly evolving configurations (the quasi-equilibrium approximation). The accuracy of
solutions is limited not only by the truncation error of the numerical scheme and the grid resolutions used, but also by the approximations made. Ideally, the outer boundary should extend to infinity, but this often poses problems numerically. In practice, the outer boundary is usually approximated via a fall-off condition at large radius (cf. equations (2.13) and (2.14)). How far out this radius can be pushed while still maintaining accuracy near the stars depends on the numerical scheme and gridding choices. There are many other issues that must be addressed. Perhaps the most important, besides efficiency and accuracy, are the following. How does the nonlinearity of the coupled system affect the choice of the solution scheme? Will iterative schemes for solving the coupled system remain stable when the nonlinear couplings become strong?
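The Richardson extrapolation used in Section 2.4 to reach one part in 10⁵ is itself simple: for a second-order scheme, results at resolutions h and h/2 can be combined to cancel the leading truncation-error term. A self-contained sketch (the second derivative below is just a stand-in for a full grid solution):

```python
import numpy as np

def d2(f, x, h):
    """second-order centred approximation to f''(x)"""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

f, x = np.sin, 0.7
exact = -np.sin(x)

coarse = d2(f, x, 0.1)
fine   = d2(f, x, 0.05)
# for a second-order scheme the extrapolated value is (4*fine - coarse)/3
extrap = (4.0 * fine - coarse) / 3.0

err_fine   = abs(fine - exact)
err_extrap = abs(extrap - exact)
```

The same combination applies to any pointwise grid function from runs at two resolutions, provided the scheme's order of convergence has been verified first.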
3. Evolution

3.1. Standard ADM form

In its simplest form, evolving Einstein's equations as a Cauchy problem involves updating the metric γ_ij and extrinsic curvature K_ij using the evolution equations (1.42) and (1.43). A pure evolution scheme solves only such time evolution equations. It relies on the evolution equations to preserve the validity of the constraints computationally as well as analytically. It is also possible to determine some of the dynamical quantities from evolution equations and others from the constraint equations at each time-step. Such algorithms are expected to be less efficient than pure evolution schemes, since they require the solution of elliptic equations for the constraints at each time-step. These mixed strategies have been the preferred algorithms in 1- and 2-dimensional problems, because of difficulties in designing stable, accurate pure evolution schemes. Moreover, as mentioned earlier, there is no known general 3-dimensional algorithm that can stably evolve Einstein's equations with black holes. While we will emphasize pure evolution schemes in this review, one should bear in mind the possibility that some explicit enforcement of the constraints may be necessary to guarantee a stable algorithm. As discussed in Section 1.6, when solving equations (1.42) and (1.43), we must separately specify exactly how far along we are evolving each point in proper time (physical time) by specifying the lapse function α. We must also choose how the spatial coordinates labelling a particular point on the hypersurface will change by specifying the shift vector β^i. Assume that we have fixed these four kinematical quantities somehow, and that we are in vacuum so that the matter terms vanish. Then, if we express the tensors in terms of a coordinate basis, we can write the evolution equations (1.42) and
(1.43) explicitly as

    ∂_t γ_ij = -2α K_ij + β^ℓ ∂_ℓ γ_ij + γ_iℓ ∂_j β^ℓ + γ_jℓ ∂_i β^ℓ,  (3.1)

    ∂_t K_ij = β^ℓ ∂_ℓ K_ij + K_iℓ ∂_j β^ℓ + K_jℓ ∂_i β^ℓ - ∂_i ∂_j α + Γ^ℓ_ij ∂_ℓ α + α(R_ij + K K_ij - 2K_iℓ K^ℓ_j),  (3.2)

where R_ij is the Ricci tensor of γ_ij,

    R_ij = ∂_ℓ Γ^ℓ_ij - ∂_i Γ^ℓ_ℓj + Γ^ℓ_ℓm Γ^m_ij - Γ^ℓ_im Γ^m_ℓj,  Γ^ℓ_ij = (1/2)γ^ℓm (∂_i γ_mj + ∂_j γ_mi - ∂_m γ_ij).
Notice that (3.1) and (3.2) do not form a simple wave equation. In fact, (3.1) contains no derivatives of K_ij at all, while (3.2) contains both linear combinations of second derivatives of γ_ij and quadratic combinations of first derivatives of γ_ij. We call the set of evolution equations given by (3.1) and (3.2) the 'standard ADM form'. The non-Laplacian second derivatives of γ_ij in equation (3.2) can be removed by certain modifications to the standard ADM equations (see Baumgarte and Shapiro (1999) and references therein), resulting in a system that seems to be better behaved. In general, the ADM equations are not of any known mathematical type. In particular, they do not satisfy any of the standard definitions of hyperbolicity. While physical effects propagate at the speed of light in general relativity, γ_ij and K_ij are not simple physical quantities. Rather, they are gauge-dependent quantities whose values depend on the choice of the lapse α and shift β^i. These can be chosen to allow for propagation of waves in γ_ij and K_ij at arbitrary speeds.
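The structure of (3.1) and (3.2) is easiest to see in the simplest setting: linearizing about flat space with α = 1 and β^i = 0, a single + polarized plane wave h(t, x) obeys ∂_t h = -2K and ∂_t K = -(1/2)∂²h/∂x², i.e. the wave equation with unit (light) speed. The following sketch (an illustration only; the Gaussian profile and grid parameters are arbitrary) evolves this system on a periodic grid and checks that the pulse propagates at speed 1:

```python
import numpy as np

N, L = 400, 20.0
dx = L / N
x = np.arange(N) * dx
dt = 0.5 * dx                  # CFL-limited time-step
T = 5.0

def f0(y):                     # Gaussian profile centred at y = 5, periodic
    s = (y - 5.0 + L / 2) % L - L / 2
    return np.exp(-s**2)

def f0p(y):                    # its derivative
    s = (y - 5.0 + L / 2) % L - L / 2
    return -2.0 * s * np.exp(-s**2)

def d2x(u):                    # periodic centred second derivative
    return (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2

def rhs(state):
    h, K = state
    return np.array([-2.0 * K, -0.5 * d2x(h)])

# initial data for a right-moving pulse h(t, x) = f0(x - t), so K = f0'/2
state = np.array([f0(x), 0.5 * f0p(x)])

for _ in range(int(round(T / dt))):    # classical RK4 time-stepping
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    state = state + (dt / 6.0) * (k1 + 2*k2 + 2*k3 + k4)

wave_err = np.max(np.abs(state[0] - f0(x - T)))
```

The pulse arrives at x - T to within the scheme's dispersion error; in the full nonlinear system this clean behaviour is exactly what is lost, since gauge modes need not travel at the light speed.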
3.2. Hyperbolic forms
There is a long history of analytic studies of hyperbolic formulations of general relativity (Foures-Bruhat 1952, Fischer and Marsden 1972, Friedrich 1985); see also Taylor (1996, Section 18.8). The earliest approaches made special gauge choices to rewrite (3.1) and (3.2) in the form of a manifestly symmetric hyperbolic system (Foures-Bruhat 1952, Fischer and Marsden 1972). Interest in using such formulations in numerical studies has been relatively recent (Bona and Masso 1992). The initial motivation for exploring these techniques was to put Einstein's equations into a form that could make more direct use of the vast repertoire of numerical techniques for handling first-order symmetric hyperbolic systems such as the equations of fluid
ADM = Arnowitt, Deser and Misner, who introduced the 3+1 decomposition used in numerical relativity earlier for other purposes.
mechanics. It was soon realized, however, that a clear understanding of the characteristic speeds of propagation of the evolving variables was also quite useful. This is especially true for the problem of evolving spacetimes that contain black holes, as discussed below. It is also hoped that having the equations in a form that can be more readily analysed will aid, for example, in properly posing boundary conditions or in treating the propagation of errors in the constraints (Frittelli 1997, Brodbeck, Frittelli, Hübner and Reula 1999). In particular, it is hoped that stable evolution schemes can be developed that do not require that elliptic constraint equations be solved on each time-step (Scheel, Baumgarte, Cook, Shapiro and Teukolsky 1998). A potential problem with using the gauge freedom to achieve explicitly hyperbolic forms is that it is widely believed that successful numerical schemes will need to exploit the gauge freedom for other purposes. Thus there has been a considerable effort recently to find formulations of general relativity that are explicitly hyperbolic while retaining all or most of the gauge freedom of the standard ADM formulation (Bona, Masso, Seidel and Stela 1995b, Choquet-Bruhat and York 1995, van Putten and Eardley 1996, Frittelli and Reula 1996, Friedrich 1996, Anderson, Choquet-Bruhat and York 1997). Common to all of these approaches is to expand the set of fundamental variables. All of the approaches include fundamental variables that are essentially first spatial derivatives of the metric. Some also include variables that directly encode the curvature of spacetime. Consider one of these hyperbolic systems, the 'Einstein-Bianchi' formulation of general relativity (Anderson et al. 1997). In vacuum, the equations are:
    ∂̂₀ γ_ij = -2α K_ij,  (3.3)

    ∂̂₀ K_ij + α ∂_k Γ^k_ij = … ,  (3.4)

    ∂̂₀ Γ^k_ij + α(∂_i K^k_j + ∂_j K^k_i - ∂^k K_ij) = … ,  (3.5)

    ∂̂₀ E_ij - α ε_i^kℓ ∂_k H_ℓj = … ,  (3.6)

    ∂̂₀ D_ij - α ε_i^kℓ ∂_k B_ℓj = … ,  (3.7)

    ∂̂₀ H_ij + α ε_i^kℓ ∂_k E_ℓj = … ,  (3.8)

    ∂̂₀ B_ij + α ε_i^kℓ ∂_k D_ℓj = … .  (3.9)

Here ∂̂₀ ≡ ∂_t - β^k ∂_k, and on each right-hand side '…' stands for lower-order terms: terms linear in the evolved variables, with coefficients built from K_ij, Γ^k_ij, ∂_i β^j and derivatives of ln α, whose complete expressions are given in Anderson et al. (1997) and are not reproduced here.
See Section 2.2 for the definition of ε_ijk; γ denotes the determinant of γ_ij. This formulation of general relativity differs significantly from the straightforward ADM formulation presented in equations (3.1) and (3.2) above. First, note that derivatives of the metric are replaced by the spatial connection Γ^i_jk, which is now treated as a fundamental variable. The system also includes four new variables, E_ij, D_ij, H_ij, and B_ij, which encode the information in the 4-dimensional Riemann tensor (1.25). If the shift β^i is zero and the nonlinear terms are dropped, note the resemblance of equations (3.6)-(3.9) to Maxwell's equations. An interesting feature of this formulation of general relativity is that the nine components each of E_ij, D_ij, H_ij, and B_ij are treated as independent in order to yield a hyperbolic system with physical characteristic velocities (zero or the speed of light). If the symmetries and constraints of general relativity are imposed explicitly to reduce the number of variables from 36 to the 20 independent components of the Riemann tensor, then additional characteristic speeds of half the speed of light are added to the system (Friedrich 1996). Also, in order to formulate the evolution equation for the extrinsic curvature K_ij as part of the
hyperbolic system, it is necessary to consider the new quantity, the densitized lapse

    α γ^{-1/2},    (3.10)

as the freely specifiable kinematical gauge quantity instead of the usual lapse variable α. Finally, note that in vacuum, D_ij = E_ij and B_ij = H_ij both analytically and computationally if they are equal in the initial data. Matter terms appear as additional source terms on the right-hand sides of the evolution equations. The Einstein-Bianchi formulation is an example of a hyperbolic formulation of Einstein's equations that is so new that there is as yet no published report of how well it works in practice.
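The full Einstein-Bianchi system is too large to analyse by hand here, but the computation it motivates, reading characteristic speeds off a first-order system, can be illustrated on a toy model. The sketch below (illustrative only; `char_speeds_2x2` is our own helper, not taken from any relativity code) reduces the 1-dimensional wave equation u_tt = c² u_xx to first order in v = u_t, w = u_x and finds the eigenvalues of the flux Jacobian, which are ±c: all characteristic speeds are physical, exactly the property one checks for the hyperbolic formulations discussed above.

```python
import math

def char_speeds_2x2(A):
    """Eigenvalues of a 2x2 flux Jacobian via trace and determinant."""
    (a, b), (c_, d) = A
    tr, det = a + d, a * d - b * c_
    disc = math.sqrt(tr * tr - 4.0 * det)
    return sorted(((tr - disc) / 2.0, (tr + disc) / 2.0))

c = 1.0  # speed of light in geometric units
# First-order reduction of u_tt = c^2 u_xx:  d/dt (v, w) = A d/dx (v, w)
A = [[0.0, c * c],
     [1.0, 0.0]]
speeds = char_speeds_2x2(A)
print(speeds)  # characteristic speeds are -c and +c
```

For a genuine formulation of Einstein's equations the matrix is far larger, but the diagnostic is the same: compute the eigenvalues of the principal part and confirm they are all physical.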
3.3. Black hole evolutions

Dealing with black holes when evolving a spacetime numerically introduces new problems. Inside a black hole is a physical singularity that cannot be finessed away by some clever coordinate transformation: the singularity must be avoided somehow. The first approaches to avoiding the singularity were to impose special time-slicing conditions that would slow down the evolution in the vicinity of the singularity. The most widely used condition was maximal slicing (Smarr and York 1978), but all such slicings lead to a generic phenomenon known as the 'collapse of the lapse'. When this happens, the lapse very rapidly approaches zero in the spatial region near the singularity to 'hold back' the advance of time there. Because the lapse stays large far away from the singularity, the spatial slice has to stretch, leading to steep gradients in the various fields. These gradients ultimately grow exponentially with time and there is no way to resolve them numerically for very long. These 'singularity avoiding' schemes can be made to work in spherical symmetry, and, with considerable effort, in axisymmetry (Evans 1984, Stark and Piran 1987, Abrahams, Shapiro and Teukolsky 1994b, Bernstein, Hobill, Seidel, Smarr and Towns 1994). The trick is to adjust the parameters of the calculation to extract the useful results before the code crashes. Such efforts appear doomed in general 3-dimensional calculations.

A newer approach for avoiding the singularity is based on the fundamental defining feature of a black hole: that its interior has no causal influence on its exterior. We can, in principle, simply excise the interior of the black hole from the computational domain. Then there is no chance of the evolution encountering the singularity. This class of methods is generically known as 'apparent-horizon boundary conditions' for reasons that will become clear below.
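To make the horizon-location problem raised by excision concrete, here is a minimal sketch (our own toy example, not one of the production finders cited below) for the special case of time-symmetric (K_ij = 0), conformally flat initial data γ_ij = ψ⁴ δ_ij, where the apparent horizon reduces to a minimal surface; in spherical symmetry this means ψ + 2r dψ/dr = 0. Assuming the Schwarzschild conformal factor ψ = 1 + M/2r, bisection recovers the known horizon radius r = M/2 in isotropic coordinates. The names `conformal_factor`, `expansion` and `find_horizon` are our own.

```python
def conformal_factor(r, M=1.0):
    """Schwarzschild conformal factor in isotropic coordinates."""
    return 1.0 + M / (2.0 * r)

def expansion(r, M=1.0, h=1e-7):
    """psi + 2 r dpsi/dr: vanishes on the apparent horizon of a
    time-symmetric, conformally flat, spherically symmetric slice."""
    dpsi = (conformal_factor(r + h, M) - conformal_factor(r - h, M)) / (2.0 * h)
    return conformal_factor(r, M) + 2.0 * r * dpsi

def find_horizon(M=1.0, lo=0.1, hi=2.0, tol=1e-12):
    """Bisection on the expansion; assumes a sign change in [lo, hi]."""
    flo = expansion(lo, M)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expansion(mid, M) * flo > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

r_ah = find_horizon(M=1.0)
print(r_ah)  # converges to the isotropic horizon radius M/2
```

Real finders must solve the analogous problem as a 2-dimensional elliptic equation on a closed surface, at every time-step.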
Before we can excise the interior of a black hole from the computational domain, we must first know where the black hole is. The surface of a black hole is the event horizon, a null surface that bounds the set of all null geodesics that can never escape to infinity. Unfortunately this is not a
useful definition for dynamical computations - at any instant you need to have already computed the solution arbitrarily far into the future to check if a given light ray escapes to infinity, falls into the black hole, or remains marginally trapped on the black hole surface. Computationally, the useful surface associated with a black hole is its apparent horizon, the boundary of the region of null geodesics that are 'instantaneously trapped'. The apparent horizon is guaranteed to lie within the event horizon under reasonable assumptions (Hawking and Ellis 1973, Section 9.2), and when the black hole settles down to equilibrium the apparent and event horizons coincide. More precisely, the apparent horizon is defined as the outermost surface on which the expansion of outgoing null geodesics vanishes. Such a surface is called a marginally outer-trapped surface (Wald 1984) and satisfies

    Θ = ∇_i s^i + K_ij s^i s^j − K = 0    (3.11)

everywhere on a closed 2-surface of topology S². Here, s^i is the outward-pointing unit normal to the closed 2-surface and Θ is the expansion (divergence) of null rays moving in the direction s^i. Since the solution of equation (3.11) must be a closed 2-surface, it can be expressed as the level surface τ = 0 of some scalar function τ(x^i), and the unit normals can be written as s^i = ∇^i τ/|∇τ|. This reduces the equation to a scalar elliptic equation. The key feature of apparent horizons, as seen from (3.11), is that they are defined
solely in terms of information on a single spacelike hypersurface. Several different approaches for solving equation (3.11) have been proposed (see Cook and York (1990), Baumgarte, Cook, Scheel, Shapiro and Teukolsky (1996), Gundlach (1998b) and references therein). Since black hole excision requires locating the apparent horizon at every time-step, there is a premium on finding efficient and robust methods. It is important that the current methods be improved. The details of how apparent-horizon boundary conditions are implemented can vary greatly, and it is not yet clear which methods are preferred, if any. The first tests of apparent horizon boundary conditions were made on spherically symmetric configurations (Seidel and Suen 1992, Scheel, Shapiro and Teukolsky 1995a, Anninos, Daues, Masso, Seidel and Suen 1995b, Marsa and Choptuik 1996). In these tests, the location of the horizon was either fixed at a particular coordinate radius, or allowed to move outward as matter fell into the black hole, increasing its physical size. Trial implementations of apparent horizon boundary conditions in 3-dimensional codes evolving spherically symmetric configurations have been reported by Anninos et al. (1995a) and Brügmann (1996). Tests of a more general 3-dimensional implementation of apparent horizon boundary conditions were reported in Cook et al. (1998). The details of this scheme were reported in Scheel, Baumgarte, Cook, Shapiro and Teukolsky (1997). For concreteness, we will describe the
apparent horizon boundary condition scheme used in Cook et al. (1998) and referred to as 'causal differencing'.⁶ The key feature of the causal differencing scheme described in Cook et al. (1998) is that it accommodates excised regions that move through the computational grid. When an excised region moves, grid points that had been excised from the domain can return to the computational domain and must be filled with correct data. This is accomplished by working in two different coordinate systems during each time-step. The 'physical' coordinates, denoted as the (t, x^i) coordinate system, are defined as having spatial coordinates that remain constant when dragged along the t direction (see (1.37)). 'Computational' coordinates, denoted (t̄, x̄^i), are then defined as having spatial coordinates that remain constant when dragged along the direction normal to the spatial hypersurface. A time-derivative in this direction is defined by

    ∂/∂t̄ = ∂/∂t − β^i ∂/∂x^i    (3.12)

and the coordinate transformation by

    t̄ = t,    (3.13)
    x̄^i = x̄^i(x^i, t).    (3.14)
A time-step begins by setting the two coordinate systems equal, x̄^i|_{t=t0} = x^i|_{t=t0}. The evolution equations are then used to evolve the data forward in time along the normal direction using equation (3.12) to a new time slice where t = t0 + Δt. All that remains is to transform the data from the computational coordinates back to the physical coordinates. If we begin at t = t0 with data located at grid points that are uniformly distributed in the x^i coordinates, then we end the first phase of the evolution with data located at grid points that are uniformly distributed in the x̄^i coordinates. We can therefore perform the required transformation back to the physical coordinates via interpolation (or extrapolation) from the computational grid. To determine the location of the physical grid points within the computational grid, we evolve the x̄^i coordinates along the t direction. Using

    dx^i/dt = 0    (3.15)

and (3.12), we find that

    dx̄^i/dt = β^j ∂x̄^i/∂x^j.    (3.16)
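In one dimension with a uniform shift β, equation (3.16) can be integrated exactly: x̄(x, t) = x + β(t − t0), so the computational coordinates simply drift through the physical grid at the shift velocity. The small sketch below (illustrative only; `evolve_xbar` and its grid parameters are our own choices) integrates (3.16) by finite differences and confirms the drift:

```python
def evolve_xbar(beta=0.1, n=41, dt=0.01, steps=100):
    """Integrate d xbar/dt = beta * (d xbar/dx), equation (3.16), for a
    uniform shift beta, starting from xbar = x at t = t0."""
    dx = 1.0 / (n - 1)
    x = [i * dx for i in range(n)]
    xbar = list(x)  # xbar = x at t = t0
    for _ in range(steps):
        # centred gradient of xbar (one-sided at the grid ends)
        grad = [(xbar[min(i + 1, n - 1)] - xbar[max(i - 1, 0)]) /
                ((min(i + 1, n - 1) - max(i - 1, 0)) * dx) for i in range(n)]
        xbar = [xb + dt * beta * g for xb, g in zip(xbar, grad)]
    return x, xbar

x, xbar = evolve_xbar()
# for a uniform shift the exact answer is xbar = x + beta * t
t = 0.01 * 100
err = max(abs(xb - (xi + 0.1 * t)) for xi, xb in zip(x, xbar))
print(err)  # error should sit at the rounding level: this case is exact
```

With a non-uniform shift the gradient ∂x̄/∂x departs from unity and the Jacobians must be integrated alongside, which is exactly what equation (3.18) below provides.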
⁶ The term 'causal differencing' has been applied to several similar schemes, but was first coined by Seidel and Suen (1992). An alternative scheme called 'causal reconnection' was developed by Alcubierre and Schutz (1994).
Fig. 3. 1-dimensional illustration of causal differencing, showing a time-step for a black hole in both the computational (a) and physical (b) coordinate systems. The shaded region represents the black hole interior. Data are evolved along the ∂/∂t̄ direction (dashed lines with arrows). In the computational frame (a), this is the vertical direction. The surface of the black hole is a characteristic. β^i has been chosen so that the black hole moves to the right in the physical coordinate system (b). ∂/∂t is in the direction of the solid lines, which are vertical in the physical coordinate system (b).

Equation (3.16) is evolved to t = t0 + Δt with the initial conditions x̄^i|_{t=t0} = x^i|_{t=t0+Δt} for each grid point in the physical coordinates at t = t0 + Δt that is not excised from the domain. If the black hole has moved during the time-step, the set of non-excised points {x^i}|_{t=t0+Δt} ≠ {x^i}|_{t=t0}. Special care must be given to computing spatial derivatives during the time-step. During the first phase of the evolution, the data are in the computational coordinate system, so terms in the evolution equations that involve ∂/∂x^i must be transformed to
    ∂/∂x^i = (∂x̄^j/∂x^i) ∂/∂x̄^j.    (3.17)

The Jacobians ∂x̄^i/∂x^j in the computational coordinate system can be obtained by integrating

    d/dt̄ (∂x̄^i/∂x^j) = (∂x̄^i/∂x^ℓ)(∂β^ℓ/∂x^j)    (3.18)

with the initial conditions ∂x̄^i/∂x^j|_{t=t0} = δ^i_j. The underlying reason that the time integration scheme outlined above should work for black holes moving arbitrarily across a computational domain is based on the generic behaviour of apparent horizons. When we evolve along the normal direction, the time direction is centred within the local light cone, that is, outgoing light rays move out at the local speed of light. Since such light rays cannot cross the apparent horizon, it must
be moving outwards at least as fast. Thus, by evolving along the normal direction, we know that the apparent horizon at t0 + Δt will have moved out in the computational coordinate system. Accordingly, as long as we have no excised points outside an apparent horizon at t0, we know we have valid evolved data at t0 + Δt extending at least a small distance within the location of the apparent horizon at t0 + Δt. This will be true regardless of the choice of the shift vector. Thus, new grid points exterior to the apparent horizon that appear at t0 + Δt are guaranteed to lie within the computational domain and data at these points can be set by interpolation. Figure 3 illustrates the case of a translating black hole. In Figure 3(a), we see the view from the computational coordinate system where the horizon moves outward. Here, data evolves vertically along the dashed lines with arrows. Physical coordinates remain constant along the solid lines so that, given this choice of the shift, the black hole is moving to the right. Notice that the first point to the right of the black hole falls into it, while the left-most point inside the black hole emerges from it during the time-step. Grid points in the physical coordinates (where the solid lines intersect the t0 + Δt hypersurface) are filled via interpolation from the evolved data. Figure 3(b) shows the same scenario in the physical coordinate frame where the black hole is moving to the right. Notice that the picture in the physical coordinate frame is completely dependent on the choice of the shift. In the computational frame, however, only the final location of the physical coordinates depends on the shift. Figure 4 shows an example of applying this causal differencing scheme to the case of a translating black hole described in Cook et al. (1998). The black hole is translating in the z direction with its centre on the z axis. The plot displays a measure of the violation of the Hamiltonian constraint (1.44) on the z axis.
The mass of the black hole is denoted by M and the black hole has a radius of 2M (in units where the gravitational constant G = 1). Equations similar to (3.1) and (3.2) were used to evolve the metric and extrinsic curvature. Values for the lapse and shift during the evolution were obtained from the analytic solution for the translating black hole. The excised region began five grid zones inside the apparent horizon and the evolution continued until t = 61M. We can clearly see that the domain to the left of the black hole is filled correctly with data, as the hole, and the excised region, move to the right through the computational grid. While evolving a translating black hole is a triumph for the causal differencing method, we are still a long way from generic 3-dimensional evolutions. The black hole in the above example is constructed from a spherically symmetric solution using a simple coordinate transformation (a boost). This severely limits the generality of the test, since spherically symmetric solutions to Einstein's equations contain no true gravitational dynamics. While the discrete nature of the numerics breaks the exact spherical symmetry,
Fig. 4. Normalized Hamiltonian constraint, H = (R + K² − K_ij K^ij)/(|R| + |K²| + |K_ij K^ij|), along the z axis as a function of time. The black hole is translating in the z direction at a speed of 1/10th the speed of light. The flat region shows the location of the excised part of the domain within the black hole.

this is only at the level of a small perturbation. At present, there have been no successful fully dynamical tests of apparent horizon boundary conditions. To evolve truly dynamical black hole spacetimes, one will need to consider more general shift vector choices than those considered so far. For example, to evolve a black hole binary system one may want to introduce the analogue of co-rotating coordinates but without 'twisting up' the coordinates around the individual black holes, which will not in general rotate about their own axes with the orbital angular velocity. A concern is that these general shift choices may introduce characteristic speeds into the system that exceed the speed of light. This could be potentially disastrous when applying apparent horizon boundary conditions to the standard ADM evolution schemes, since 'gauge waves' could propagate out through the apparent horizon. One reason that hyperbolic formulations of general relativity are currently receiving so much attention is that it is easy to compute the characteristic speeds and to be sure that they are all physical.

3.4. Instabilities and other problems

Perhaps the most serious problem that has plagued the development of schemes for evolving Einstein's equations is the pervasiveness of instabilities. As we have mentioned before, there are currently no known general
evolution schemes that can evolve Einstein's equations in three dimensions for an indefinite period of time. The possible sources of instability in any general relativistic evolution scheme are many and varied. First, the usual sorts of numerical instabilities, such as the Courant-Friedrichs-Lewy instability for explicit evolution schemes, can be handled trivially with a Courant condition. We might be concerned about the possible formation of shocks, given the nonlinear nature of Einstein's equations. However, unlike Euler's equations of fluid dynamics, the gravitational field equations do not develop shocks from smooth initial data (Choquet-Bruhat and York 1980, Christodoulou and Ó Murchadha 1981). It is unknown whether this analytic result guarantees that shocks cannot form in a numerical solution. Moreover, if there are matter sources with shocks on the right-hand sides of the gravitational equations, there will certainly be numerical difficulties. Hyperbolic formulations like the Einstein-Bianchi system have derivatives of the matter density, pressure, etc., as source terms, presumably making things worse. These problems have not been investigated yet. The situation is further complicated because a poor choice of the lapse or shift vector can introduce steep gradients in the solution that look like shocks but are really just coordinate singularities. This problem with the choice of the shift is one aspect of a general class of problems that can occur in numerical evolutions of Einstein's equations. As mentioned in Section 3.3, a particular choice of time slicing condition can lead to 'grid stretching' in the vicinity of a black hole. This is again a coordinate (gauge) effect that ultimately leads to exponentially growing features in the solution. These coordinate effects are not really instabilities in the traditional sense because they represent valid solutions of the equations.
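The Courant condition mentioned above is easy to demonstrate on a model problem. The sketch below (a standard first-order upwind scheme for the linear advection equation u_t + u_x = 0, chosen purely as an illustration and unrelated to any relativity code) stays bounded when the Courant number is 0.8 but blows up when it is 1.5:

```python
import math

def advect_max_amplitude(courant, n=64, steps=200):
    """First-order upwind scheme for u_t + u_x = 0 on a periodic grid;
    returns the final max |u|.  The scheme is stable iff courant <= 1."""
    u = [math.sin(2.0 * math.pi * i / n) for i in range(n)]
    for _ in range(steps):
        # u[i-1] with i = 0 wraps to u[-1], giving periodic boundaries
        u = [u[i] - courant * (u[i] - u[i - 1]) for i in range(n)]
    return max(abs(v) for v in u)

stable = advect_max_amplitude(0.8)    # obeys the Courant condition
unstable = advect_max_amplitude(1.5)  # violates it: amplitude explodes
print(stable, unstable)
```

For 0 ≤ C ≤ 1 each update is a convex combination of neighbouring values, so the maximum norm cannot grow; for C > 1 the shortest-wavelength modes are amplified every step, which is the exponential blow-up the Courant condition prevents.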
However, a poor choice of the gauge functions, the lapse and the shift, can lead to solutions with exponentially growing features that are difficult, but not impossible, to distinguish from real instabilities. The way to diagnose the presence of these 'gauge instabilities' is to examine the growth of physical, gauge-independent quantities, or of violations of the constraints, which are also gauge-independent. Such quantities will not exhibit unstable growth if the instability is simply due to gauge effects. The problem with this idea is that, without sufficient resolution, inaccuracies caused by growing gauge modes will contaminate these indicators, making it difficult in practice to diagnose why a calculation is blowing up. Also complicating the problem is the fact that the gauge freedom of general relativity yields a constrained evolution system. If the constraints are not explicitly enforced during an evolution, then numerical errors will necessarily drive solutions away from the constraint surface. Not all formulations of Einstein's equations are stable when the constraints are violated. It is known that the gauge-independent constraints do form a stable, symmetric hyperbolic system when γ_ij and K_ij are evolved via (3.1) and (3.2) (see
Frittelli 1997). Unfortunately, the mathematically rigorous definition of stable evolution used in this and other proofs may not be sufficient for numerical needs, since it guarantees only that perturbations do not grow faster than exponentially. Rapid power-law growth is often seen, and quickly spoils numerical evolutions. This sort of constraint-violating instability can be diagnosed by the unstable growth of gauge-independent quantities. An example where a constraint-violating instability was diagnosed and corrected in a simple 1-dimensional system can be found in Scheel et al. (1998). An interesting area of investigation is whether a system of evolution equations can be written that is attracted to the constraint surface (Brodbeck et al. 1999). Certain choices of boundary conditions can also turn a stable evolution scheme unstable. Of particular concern in this regard is the effect of apparent horizon boundary conditions on the overall stability of an evolution scheme. Currently, there appears to be no good way to analyse the stability of various apparent horizon boundary conditions. Furthermore, there seem to be very few good tools for analysing and diagnosing instabilities in general. The whole area of instabilities in evolution schemes is the outstanding computational problem in the field, and is urgently in need of further work.

3.5. Outer boundary conditions
Another issue that must be addressed is how to accurately and stably pose outgoing wave boundary conditions in general relativity. The dynamical simulations of greatest interest are those that generate strong gravitational waves that will propagate towards infinity. The nonlinearity of Einstein's equations couples ingoing and outgoing modes, implying that simple outgoing wave boundary conditions must be applied at large radii where the coupling is weak. The quadrupole nature of the dominant modes is a further complication when the outer boundary is not a constant-coordinate S² surface. Consider the apparently easy problem of setting Sommerfeld radiation conditions at the faces of a Cartesian grid for a simple scalar wave Φ(r, t). Assume that the solution is a purely outgoing wave given by

    Φ = f(t − r)/r.    (3.19)

We can construct a boundary condition from the usual Sommerfeld relation

    (∂/∂t + ∂/∂r)(rΦ) = 0    (3.20)

and the relations between Cartesian and spherical coordinates

    ∂/∂x = (∂r/∂x) ∂/∂r + (∂θ/∂x) ∂/∂θ + (∂φ/∂x) ∂/∂φ.    (3.21)

Using the fact that f has no angular dependence, we get

    ∂Φ/∂x = −(x/r)(∂Φ/∂t + Φ/r),    (3.22)
which can be applied easily at a constant x boundary. This works well for the simple monopole form given in (3.19). It even works well at large radii if f(t − r)/r is just the leading order term in an expansion for Φ(t, r, θ, φ) where the higher-order terms have angular dependence. But, if the leading order term in the expansion has angular dependence, as is the case for the dominantly quadrupole gravitational radiation, (3.22) will not work, and straightforward Cartesian generalizations have proven to be unstable for general relativistic problems. An approach that we term 'interpolated-Sommerfeld' has proven to be a dramatic improvement. The idea is to apply (3.20) directly at the boundaries of a Cartesian grid by using interpolation to provide data at points that are on radial lines associated with the boundary points. This allows the radial derivative to be approximated directly, thereby avoiding problems with angular dependence in the function. (We would not be surprised if this idea has been used before in other fields.) Algorithms designed to extract information about the outgoing gravitational waveforms can provide more sophisticated boundary conditions. One approach expresses the gravitational field at large distances as a perturbation about an analytic spherically symmetric background metric. The waves are decomposed into a multipole expansion. Each multipole component satisfies a 1-dimensional linear wave equation. The wave data is 'extracted' from the full numerical solution in the region near the outer boundary. This provides initial conditions to evolve the perturbation quantities to very large distances. The evolution is cheap because the equations are 1-dimensional, and the asymptotic waveform can be read off very accurately. As a byproduct, the evolution provides boundary conditions 'along the way' at the (much closer) outer boundary of the full numerical solution. See Rezzolla, Abrahams, Matzner, Rupright and Shapiro (1999) and Abrahams et al.
(1998) and references therein for more details. Another approach that combines wave extraction with supplying outer boundary conditions is to match the interior evolution to a full nonlinear characteristic evolution code (see Section 1.7). The two evolutions must be fully coupled, each providing the other with boundary data at the S² surface where they are joined. This approach has the advantage of being non-perturbative, but the additional evolution system is considerably more complex than the perturbative system. See Bishop, Gomez, Lehner and Winicour (1996) and Bishop et al. (1997b) and references therein for more details.
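The monopole Sommerfeld relation (3.22) can be spot-checked numerically. The sketch below (assuming a Gaussian profile f, chosen only for illustration) compares the two sides of (3.22) by finite differencing Φ = f(t − r)/r at a sample point:

```python
import math

def phi(t, x, y, z):
    """Outgoing monopole wave f(t - r)/r with a Gaussian profile f."""
    r = math.sqrt(x * x + y * y + z * z)
    u = t - r
    return math.exp(-u * u) / r

# check  d(phi)/dx = -(x/r) * (d(phi)/dt + phi/r)  at one spacetime point
t, x, y, z = 0.3, 1.0, 2.0, 2.0   # so that r = 3
h = 1e-6
r = math.sqrt(x * x + y * y + z * z)
dphi_dx = (phi(t, x + h, y, z) - phi(t, x - h, y, z)) / (2.0 * h)
dphi_dt = (phi(t + h, x, y, z) - phi(t - h, x, y, z)) / (2.0 * h)
rhs = -(x / r) * (dphi_dt + phi(t, x, y, z) / r)
print(dphi_dx, rhs)  # the two sides agree to finite-difference accuracy
```

The same check fails for a profile with angular dependence, which is precisely why the straightforward Cartesian condition breaks down for quadrupole radiation and motivates the interpolated-Sommerfeld and matching approaches described above.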
The primary difficulty with both matching techniques is in stably providing boundary conditions to the interior Cauchy evolution. The perturbative approach has had some success for the case of evolving weak waves. The characteristic approach has not yet been able to feed back information into a 3-dimensional Cauchy evolution of Einstein's equations without encountering severe instabilities. The technique has worked in lower-dimensional problems (Dubal, d'Inverno and Clarke 1995) and with simpler systems in three dimensions (Bishop, Gomez, Holvorcem, Matzner and Winicour 1997a).
4. Related literature

In addition to the references presented in the main part of this review, there are numerous papers that can prove useful to anyone entering the field of numerical relativity. The regime of spherical symmetry provides a lower-dimensional testing domain. However, there is no true gravitational dynamics in spherical symmetry: just as with electrodynamics, there are no spherical gravitational waves. To compensate for this, researchers typically resort to adding scalar wave matter sources, or they explore alternative theories of gravity that admit spherical waves: see Choptuik (1991), Scheel et al. (1995a), and Scheel, Shapiro and Teukolsky (1995b) and references therein. A lack of true dynamics is not always a problem, since gauge choices can provide time dependence and a nontrivial test of numerical schemes: see Bona, Masso and Stela (1995a). Axisymmetry provides another useful test-bed, but evolutions are hindered by tedious regularity conditions that must be satisfied at coordinate singularities. Some useful references to axisymmetric evolutions can be found in Abrahams, Bernstein, Hobill, Seidel and Smarr (1992), Shapiro and Teukolsky (1992), Anninos, Hobill, Seidel, Smarr and Suen (1993), Abrahams et al. (1994b), Abrahams, Cook, Shapiro and Teukolsky (1994a), Bernstein et al. (1994), and Anninos, Hobill, Seidel, Smarr and Suen (1995c). Critical behaviour in solutions of Einstein's equations is a recent discovery that was made through high-precision numerical work (Choptuik 1993). Gundlach (1998a) provides a recent thorough review of this topic. We have not dealt much with the issue of matter sources in Einstein's equations. The matter evolves via its own set of evolution equations, with gravitational effects coupled in through the metric and its derivatives. The most common matter sources that have been dealt with are hydrodynamic fluids.
Pons, Font, Ibanez, Marti and Miralles (1998) treat general relativistic hydrodynamics with Riemann solvers and provide useful references. Banyuls, Font, Ibanez, Marti and Miralles (1997) give an overview of shock-capturing techniques in relativistic hydrodynamics. An earlier general reference on general relativistic hydrodynamics is Wilson (1979).
Many of the recent references we have cited are also available from the Los Alamos preprint server, xxx.lanl.gov, in the gr-qc and astro-ph archives.
5. Conclusions

The frontier in numerical relativity explores 3-dimensional problems, in particular 3-dimensional evolutions. There are numerous problems, some of a fundamental physical nature and some computational. Most critical among the physical problems is the question of how to choose good coordinates (lapse and shift choices) during the evolution. We have a reasonable qualitative notion of what it means to choose good coordinates and ideas on how to do so. Early work in numerical relativity often made use of geometrically motivated coordinate choices, such as maximal slicing for the lapse, or 'minimal distortion' for the shift (Smarr and York 1978). It is likely that geometric insight will continue to be useful as new choices are sought. The fundamental computational issue is to develop an evolution scheme for general 3-dimensional black holes that is stable and accurate. Schemes that work in special cases have been developed, but the goal of a truly general scheme has so far been blocked by various instabilities. A critical unanswered question is: what is the source of these instabilities and how can we circumvent them? One possible source includes purely numerical instabilities in the basic discretization of the evolution system. The discretization is complicated by black hole excision and the imposition of nontrivial outer boundary conditions. It is also possible that the evolution systems being used admit unstable modes if the numerical solutions violate the constraints, or if approximated boundary conditions allow these modes to grow. Thus, achieving the goal of stable, accurate and efficient evolutions of black hole spacetimes requires many questions to be answered. Which systems of equations can or should be used? Can the evolutions be performed without imposing the constraints, or at least without imposing them by solving elliptic equations? How should apparent horizon boundaries be handled?
This latter question is closely tied to the particular numerical scheme to be used in solving the system of equations. What numerical schemes should be used? To date, only relatively simple schemes have been tried for 3-dimensional black hole simulations. More sophisticated techniques could be based on using the hyperbolic formulations of general relativity and an understanding of the characteristic variables and speeds associated with a given system. If such techniques are to be used, then they must incorporate the properties of the black hole boundary. A characteristic variable whose characteristic direction is outgoing just outside the black hole changes to ingoing just
inside the hole. Thus, the black hole surface is a sort of sonic point, and techniques from computational fluid dynamics might prove useful. These issues provide many interesting challenges to numerical analysts interested in evolution systems. Similarly, constructing initial data for the evolutions provides interesting problems in the area of coupled nonlinear elliptic systems. Beyond these numerical challenges exist what might be more properly called computational challenges. It requires an enormous amount of computation to evolve dozens of variables in time over three spatial dimensions with sufficient resolution to deal both with fields near the black hole and with waves near the outer boundary. These requirements place numerical relativity among the problems that will continue to demand the highest performance supercomputers and the best algorithms that computational scientists can provide.
REFERENCES

A. M. Abrahams, D. Bernstein, D. Hobill, E. Seidel and L. Smarr (1992), 'Numerically generated black-hole spacetimes: Interaction with gravitational waves', Phys. Rev. D 45, 3544-3558.
A. M. Abrahams, G. B. Cook, S. L. Shapiro and S. A. Teukolsky (1994a), 'Solving Einstein's equations for rotating spacetimes: Evolution of relativistic star clusters', Phys. Rev. D 49, 5153-5164.
A. M. Abrahams, S. L. Shapiro and S. A. Teukolsky (1994b), 'Calculation of gravitational wave forms from black hole collisions and disk collapse: Applying perturbation theory to numerical spacetimes', Phys. Rev. D 51, 4295-4301.
A. M. Abrahams et al., The Binary Black Hole Alliance (1998), 'Gravitational wave extraction and outer boundary conditions by perturbative matching', Phys. Rev. Lett. 80, 1812-1815.
A. Abramovici, W. E. Althouse, R. W. P. Drever, Y. Gürsel, S. Kawamura, F. J. Raab, D. Shoemaker, L. Sievers, R. E. Spero, K. S. Thorne, R. E. Vogt, R. Weiss, S. E. Whitcomb and M. E. Zucker (1992), 'LIGO: The laser interferometer gravitational-wave observatory', Science 256, 325-333.
M. Alcubierre and B. F. Schutz (1994), 'Time-symmetric ADI and causal reconnection: Stable numerical techniques for hyperbolic systems on moving grids', J. Comput. Phys. 112, 44-77.
A. Anderson, Y. Choquet-Bruhat and J. W. York, Jr. (1997), 'Einstein-Bianchi hyperbolic system for general relativity', Topological Methods in Nonlinear Analysis 10, 353-373.
P. Anninos, K. Camarda, J. Masso, E. Seidel, W.-M. Suen and J. Towns (1995a), 'Three dimensional numerical relativity: The evolution of black holes', Phys. Rev. D 52, 2059-2082.
P. Anninos, G. Daues, J. Masso, E. Seidel and W.-M. Suen (1995b), 'Horizon boundary condition for black hole spacetimes', Phys. Rev. D 51, 5562-5578.
P. Anninos, D. Hobill, E. Seidel, L. Smarr and W.-M. Suen (1993), 'Collision of two black holes', Phys. Rev. Lett. 71, 2851-2854.
P. Anninos, D. W. Hobill, E. Seidel, L. Smarr and W.-M. Suen (1995c), 'Head-on collision of two equal mass black holes', Phys. Rev. D 52, 2044-2058.
D. N. Arnold, A. Mukherjee and L. Pouly (1998), Adaptive finite elements and colliding black holes, in Numerical Analysis 1997: Proceedings of the 17th Dundee Biennial Conference (D. F. Griffiths, D. J. Higham and G. A. Watson, eds), Addison Wesley Longman, Harlow, England, pp. 1-15.
F. Banyuls, J. A. Font, J. M. Ibanez, J. M. Marti and J. A. Miralles (1997), 'Numerical {3 + 1} general relativistic hydrodynamics: A local characteristic approach', Astrophys. J. 476, 221-231.
J. W. Barrett, M. Galassi, W. A. Miller, R. D. Sorkin, P. A. Tuckey and R. M. Williams (1997), 'A parallelizable implicit evolution scheme for Regge calculus', Int. J. Theor. Phys. 36, 815-839.
T. W. Baumgarte and S. L. Shapiro (1999), 'On the integration of Einstein's field equations', Phys. Rev. D 59, 024007.
T. W. Baumgarte, G. B. Cook, M. A. Scheel, S. L. Shapiro and S. A. Teukolsky (1996), 'Implementing an apparent-horizon finder in three dimensions', Phys. Rev. D 54, 4849-4857.
T. W. Baumgarte, G. B. Cook, M. A. Scheel, S. L. Shapiro and S. A. Teukolsky (1998), 'General relativistic models of binary neutron stars in quasiequilibrium', Phys. Rev. D 57, 7299-7311.
D. Bernstein, D. W. Hobill, E. Seidel, L. Smarr and J. Towns (1994), 'Numerically generated axisymmetric black hole spacetimes: Numerical methods and code tests', Phys. Rev. D 50, 5000-5024.
N. T. Bishop, R. Gomez, P. R. Holvorcem, R. A. Matzner and J. Winicour (1997a), 'Cauchy-characteristic evolution and waveforms', J. Comput. Phys. 136, 140-167.
N. T. Bishop, R. Gomez, L. Lehner and J. Winicour (1996), 'Cauchy-characteristic extraction in numerical relativity', Phys. Rev. D 54, 6153-6165.
N. T. Bishop, R. Gomez, L. Lehner, M. Maharaj and J. Winicour (1997b), 'High-powered gravitational news', Phys. Rev. D 56, 6298-6309.
C. Bona and J. Masso (1992), 'A hyperbolic evolution system for numerical relativity', Phys. Rev. Lett. 68, 1097-1099.
C. Bona, J. Masso and J. Stela (1995a), 'Numerical black holes: A moving grid approach', Phys. Rev. D 51, 1639-1639.
C. Bona, J. Masso, E. Seidel and J. Stela (1995b), 'New formalism for numerical relativity', Phys. Rev. Lett. 75, 600-603.
S. Bonazzola, E. Gourgoulhon and J.-A. Marck (1997), 'A relativistic formalism to compute quasi-equilibrium configurations of non-synchronized neutron star binaries', Phys. Rev. D 56, 7740-7749.
S. Bonazzola, E. Gourgoulhon and J.-A. Marck (1998), 'Numerical approach for high precision 3-d relativistic star models', Phys. Rev. D 58, 104020.
S. Bonazzola, E. Gourgoulhon and J.-A. Marck (1999), 'Numerical models of irrotational binary neutron stars in general relativity', Phys. Rev. Lett. 82, 892-895.
42
G. B. COOK AND S. A. TEUKOLSKY
S. Bonazzola, E. Gourgoulhon, M. Salgado and J.-A. Marck (1993), 'Axisymmetric rotating relativistic bodies: A new numerical approach for "exact" solutions', Astron. Astrophys. 278, 421-443. J. M. Bowen (1979), 'General form for the longitudinal momentum of a spherically symmetric source', Gen. Relativ. Gravit. 11, 227-231. J. M. Bowen and J. W. York, Jr. (1980), 'Time-asymmetric initial data for black holes and black-hole collisions', Phys. Rev. D 21, 2047-2056. S. Brandt and B. Briigmann (1997), 'A simple construction of initial data for multiple black holes', Phys. Rev. Lett. 78, 3606-3609. S. R. Brandt and E. Seidel (1995), 'The evolution of distorted rotating black holes I: Methods and tests', Phys. Rev. D 52, 856-869. O. Brodbeck, S. Frittelli, P. Hiibner and O. A. Ruela (1999), 'Einstein's equations with asymptotically stable constraint propagation', J. Math. Phys. 40, 909923. B. Briigmann (1996), 'Adaptive mesh and geodesically sliced Schwarzschild spacetime in 3+1 dimensions', Phys. Rev. D 54, 7361-7372. E. M. Butterworth and J. R. Ipser (1976), 'On the structure and stability of rapidly rotating fluid bodies in general relativity I: The numerical method for computing structure and its application to uniformly rotating homogeneous bodies', Astrophys. J. 204, 200-233. M. W. Choptuik (1991), 'Consistency of finite-difference solutions of Einstein's equations', Phys. Rev. D 44, 3124-3135. M. W. Choptuik (1993), 'Universality and scaling in gravitational collapse of a massless scalar field', Phys. Rev. Lett. 70, 9-12. Y. Choquet-Bruhat and J. W. York, Jr. (1980), The Cauchy problem, in General relativity and gravitation. One hundred years after the birth of Albert Einstein (A. Held, ed.), Vol. 1, Plenum, New York, pp. 99-172. Y. Choquet-Bruhat and J. W. York, Jr. (1995), 'Geometrical well posed systems for the Einstein equations', C. R. Acad. Sci. Paris A321, 1089-1095. D. Christodoulou and N. 
O Murchadha (1981), 'The boost problem in general relativity', Commun. Math. Phys. 80, 271-300. G. B. Cook (1991), 'Initial data for axisymmetric black-hole collisions', Phys. Rev. D 44, 2983-3000. G. B. Cook (1994), 'Three-dimensional initial data for the collision of two black holes II: Quasicircular orbits for equal mass black holes', Phys. Rev. D 50, 5025-5032. G. B. Cook and J. W. York, Jr. (1990), 'Apparent horizons for boosted or spinning black holes', Phys. Rev. D 41, 1077-1085. G. B. Cook, M. W. Choptuik, M. R. Dubai, S. Klasky, R. A. Matzner and S. R. Oliveira (1993), 'Three-dimensional initial data for the collision of two black holes', Phys. Rev. D 47, 1471-1490. G. B. Cook, S. L. Shapiro and S. A. Teukolsky (1994), 'Rapidly rotating neutron stars in general relativity: Realistic equations of state', Astrophys. J. 424, 823-845. G. B. Cook et al, The Binary Black Hole Alliance (1998), 'Boosted threedimensional black-hole evolutions with singularity excision', Phys. Rev. Lett. 80, 2512-2516.
NUMERICAL RELATIVITY
43
M. R. Dubai R. A. d'lnverno and C. J. S. Clarke (1995), 'Combining Cauchy and characteristic codes II: The interface problem for vacuum cylindrical symmetry', Phys. Rev. D 52, 6868-6881. A. Einstein and N. Rosen (1935), 'The particle problem in the general theory of relativity', Phys. Rev. 48, 73-77. C. R. Evans (1984), 'A method for numerical relativity: Simulation of axisymmetric gravitational collapse and gravitational radiation generation', PhD thesis, University of Texas at Austin. A. E. Fischer and J. E. Marsden (1972), 'The Einstein evolution equations as a first-order quasi-linear symmetric hyperbolic system, F, Commun. Math. Phys. 28, 1-38. Y. Foures-Bruhat (1952), 'Theorem d'existence pour certain systems d'equations aux derivees partielles non lineaires', Ada. Math. 88, 141-225. J. L. Friedman, J. R. Ipser and L. Parker (1986), 'Rapidly rotating neutron star models', Astrophys. J. 304, 115-139. H. Friedrich (1985), 'On the hyperbolicity of Einstein's and other gauge field equations', Commun. Math. Phys. 100, 525-543. H. Friedrich (1996), 'Hyperbolic reductions for Einstein's equations', Class. Quantum Gravit. 13, 1451-1469. S. Frittelli (1997), 'Note on the propagation of the constraints in standard 3 + 1 general relativity', Phys. Rev. D 55, 5992-5996. S. Frittelli and O. A. Reula (1996), 'First-order symmetric hyperbolic Einstein equations with arbitrary fixed gauge', Phys. Rev. Lett. 76, 4667-4670. A. P. Gentle and W. A. Miller (1998), 'A fully (3+l)-d Regge calculus model of the Kasner cosmology', Class. Quantum Gravit. 15, 389-405. C. Gundlach (1998a), 'Critical phenomena in gravitational collapse', Adv. Theor. Math. Phys. 2, 1-49. C. Gundlach (19986), 'Pseudo-spectral apparent horizon finders: An efficient new algorithm', Phys. Rev. D 57, 863-875. S. W. Hawking and G. F. R. Ellis (1973), The Large Scale Structure of Spacetime, Cambridge University Press, Cambridge, England. H. Komatsu, Y. Eriguchi and I. 
Hachisu (1989), 'Rapidly rotating general relativistic stars I: Numerical method and its application to uniformly rotating polytropes', Mon. Not. R. Astr. Soc. 237, 355-379. A. D. Kulkarni (1984), 'Time-asymmetric initial data for the N black hole problem in general relativity', J. Math. Phys. 25, 1028-1034. A. D. Kulkarni, L. C. Shepley and J. W. York, Jr. (1983), 'Initial data for TV black holes', Phys. Lett. 96A, 228-230. R. L. Marsa and M. W. Choptuik (1996), 'Black-hole-scalar-field interactions in spherical symmetry', Phys. Rev. D 54, 4929-4943. R. A. Matzner, M. F. Huq and D. Shoemaker (1999), 'Initial data and coordinates for multiple black hole systems', Phys. Rev. D 59, 024015. C. W. Misner (1963), 'The method of images in geometrostatics', Ann. Phys. 24, 102-117. C. W. Misner, K. S. Thorne and J. A. Wheeler (1973), Gravitation, Freeman, New York.
44
G. B. COOK AND S. A. TEUKOLSKY
N. 0 Murchadha and J. W. York, Jr. (1974), 'Initial-value problem of general relativity. I. General formulation and physical interpretation', Phys. Rev. D 10, 428-436. M. Parashar and J. C. Brown (1995), Distributed dynamical data-structures for parallel adaptive mesh-refinement, in Proceedings of the International Conference for High Performance Computing (S. Sahni, V. K. Prasanna and V. P. Bhatkar, eds), Tata McGraw-Hill, New Delhi, India. See also www. ticam. utexas. edu/~parashar/public_html/DAGH. J. A. Pons, J. A. Font, J. M. Ibanez, J. M. Marti and J. A. Miralles (1998), 'General relativistic hydrodynamics with special relativistic Riemann solvers', Astro, and Astroph. 339, 638-642. T. Regge (1961), 'General relativity without coordinates', Nuovo Cimento 19, 558571. L. Rezzolla, A. M. Abrahams, R. A. Matzner, M. E. Rupright and S. L. Shapiro (1999), 'Cauchy-perturbative matching and outer boundary conditions: Computational studies'. Phys. Rev. D 59, 064001. R. K. Sachs and H. Wu (1977), General Relativity for Mathematicians, SpringerVerlag, New York. M. A. Scheel, T. W. Baumgarte, G. B. Cook, S. L. Shapiro and S. A. Teukolsky (1997), 'Numerical evolution of black holes with a hyperbolic formulation of general relativity', Phys. Rev. D 56, 6320-6335. M. A. Scheel, T. W. Baumgarte, G. B. Cook, S. L. Shapiro and S. A. Teukolsky (1998), 'Treating instabilities in a hyperbolic formulation of Einstein's equations', Phys. Rev. D 58, 044020. M. A. Scheel, S. L. Shapiro and S. A. Teukolsky (1995a), 'Collapse to black holes in Brans-Dicke theory I: Horizon boundary conditions for dynamical spacetimes', Phys. Rev. D 51, 4208-4235. M. A. Scheel, S. L. Shapiro and S. A. Teukolsky (19956), 'Collapse to black holes in Brans-Dicke theory II: Comparison with general relativity', Phys. Rev. D 51, 4236-4249. E. Seidel and W.-M. Suen (1992), 'Towards a singularity-proof scheme in numerical relativity', Phys. Rev. Lett. 69, 1845-1848. S. L. Shapiro and S. A. 
Teukolsky (1992), 'Collisions of relativistic clusters and the formation of black holes', Phys. Rev. D 45, 2739-2750. M. Shibata (1998), 'A relativistic formalism for computation of irrotational binary stars in quasi-equilibrium states', Phys. Rev. D 58, 024012. L. L. Smarr and J. W. York, Jr. (1978), 'Kinematical conditions in the construction of spacetime', Phys. Rev. D 17, 2529-2551. R. D. Sorkin (1982), 'A stability criterion for many-parameter equilibrium families', Astrophys. J. 257, 847-854. R. F. Stark and T. Piran (1987), 'A general relativistic code for rotating axisymmetric configurations and gravitational radiation: Numerical methods and tests', Computer Physics Reports 5, 221-264. M. E. Taylor (1996), Partial Differential Equations III: Nonlinear Equations, Springer, New York. S. A. Teukolsky (1998), 'Irrotational binary neutron stars in quasiequilibrium in general relativity', Astrophys. J. 504, 442-449.
NUMERICAL RELATIVITY
45
J. Thornburg (1987), 'Coordinate and boundary conditions for the general relativistic initial data problem', Class. Quantum Gravit. 4, 1119-1131. M. H. P. M. van Putten and D. M. Eardley (1996), 'Hyperbolic reductions for Einstein's equations', Phys. Rev. D 53, 3056-3063. R. M. Wald (1984), General Relativity, The University of Chicago Press, Chicago. R. M. Williams and P. A. Tuckey (1992), 'Regge calculus: A brief review and bibliography', Class. Quantum Gravit. 9, 1409-1422. J. R. Wilson (1979), A numerical method for relativistic hydrodynamics, in Sources of Gravitational Radiation (L. L. Smarr, ed.), Cambridge University Press, Cambridge, England, pp. 423-445. J. R. Wilson, G. J. Mathews and P. Marronetti (1996), 'Relativistic numerical method for close neutron star binaries', Phys. Rev. D 54, 1317-1331. J. W. York, Jr. (1979), Kinematics and dynamics of general relativity, in Sources of Gravitational Radiation (L. L. Smarr, ed.), Cambridge University Press, Cambridge, England, pp. 83-126.
Acta Numerica (1999), pp. 47-106
© Cambridge University Press, 1999
Radiation boundary conditions for the numerical simulation of waves

Thomas Hagstrom*
Department of Mathematics and Statistics, The University of New Mexico, Albuquerque, NM 87131, USA
E-mail: hagstrom@math.unm.edu
We consider the efficient evaluation of accurate radiation boundary conditions for time domain simulations of wave propagation on unbounded spatial domains. This issue has long been a primary stumbling block for the reliable solution of this important class of problems. In recent years, a number of new approaches have been introduced which have radically changed the situation. These include methods for the fast evaluation of the exact nonlocal operators in special geometries, novel sponge layers with reflectionless interfaces, and improved techniques for applying sequences of approximate conditions to higher order. For the primary isotropic, constant coefficient equations of wave theory, these new developments provide an essentially complete solution of the numerical radiation condition problem. In this paper the theory of exact boundary conditions for constant coefficient time-dependent problems is developed in detail, with many examples from physical applications. The theory is used to motivate various approximations and to establish error estimates. Complexity estimates are also derived to compare different accurate treatments, and an illustrative numerical example is given. We close with a discussion of some important problems that remain open.
* Supported, in part, by NSF Grant DMS-9600146; the Institute for Computational Mechanics in Propulsion (ICOMP), NASA Lewis Research Center, Cleveland, Ohio; DARPA/AFOSR Contract F49620-95-C-0075; and, while in residence at the Courant Institute, DOE Contract DEFGO288ER25053.
48
T. HAGSTROM
CONTENTS
1 Introduction 48
2 Formulations of exact boundary conditions 50
3 Approximations and implementations 76
4 Conclusions and open problems 100
References 103
1. Introduction Problems in wave propagation have played and will continue to play a central role in the mathematical analysis of physical and biological systems. A defining feature of most wave problems is the radiation of energy to the far field. Mathematically, this is naturally modelled by the use of an unbounded domain, with the addition, in frequency domain problems, of a radiation boundary condition at infinity. In numerical simulations, the accurate approximation of radiation to the far field is also crucial. As it is impossible to solve directly a problem posed on an unbounded domain, new techniques, such as the introduction of an artificial boundary and associated radiation boundary conditions, are needed. The goal of this article is to outline the development, implementation, and analysis of various practical methods for solving this problem for some important models, and to present what I believe is a useful mathematical framework in which to pursue improvements and extensions. Although wave propagation is inherently a time-dependent phenomenon, it has been fruitful in many settings to solve linear problems in the frequency domain. Approaches to the accurate solution of elliptic boundary value problems on unbounded domains are, generally, far better developed than their time domain analogues. Useful techniques include a variety of boundary integral methods, which may be applied on physical or artificial boundaries, including classical integral equations of potential theory (Greengard and Rokhlin 1997, Rokhlin 1990), extensions of Calderon-Seeley equations (Ryabenkii 1985), the Dirichlet-to-Neumann map (Givoli 1992) and infinite elements (Bettess 1992, Demkowicz and Gerdes 1999). Moreover, the efficiency of solving the classical equations has been greatly enhanced in the past decade through the introduction of the fast multipole method. The aims of the numerical analysis of partial differential equations on unbounded domains are clear. 
We seek methods which:
(i) can automatically achieve any prescribed accuracy on bounded subsets of the original domain,
(ii) in terms of both computation and storage, cost no more than the solution of a standard problem on the bounded subdomain.
RADIATION BOUNDARY CONDITIONS
49
Though certainly more research and development is called for, it is my opinion that these goals have been or can be met by the methods mentioned above for most elliptic problems. An exception to this is the difficult case of Helmholtz-type equations with general variable coefficients at infinity. The situation with time-dependent problems has been far less satisfactory. The general belief was that exact domain reductions, which necessarily involve history-dependent operators, could never be made computationally feasible. As a result, various simple approximations were employed. These easily met the second criterion, but their accuracy was often poorly understood. Although the approximate conditions proposed were typically embedded in a hierarchy of conditions of increasing order and, presumably, accuracy, as in Lindman (1975), Engquist and Majda (1977, 1979), and Bayliss and Turkel (1980), the hierarchy was rarely used. For some problems it seemed that good results were obtained with the low-order approximations. However, there was generally no way to monitor or decrease the error automatically. Moreover, as we shall see later, it is possible to pose very simple problems for which the standard methods produce inaccurate results. Clearly, our first criterion is not met by the simpler techniques, which from the point of view of the numerical analyst is completely unacceptable. In the past few years, the situation has radically changed, at least for the basic, constant coefficient equations of wave theory. 
Progress has been made on many fronts, including:
(i) the development of efficient algorithms for evaluating exact, temporally nonlocal boundary operators through the use of exact or uniformly accurate rational representations of the transforms of their associated convolution kernels,
(ii) the development of improved sponge layer techniques exhibiting reflectionless interfaces with the lossless interior domains,
(iii) improved techniques for the implementation to higher orders of the older hierarchies of approximate conditions, along with improvements in the analysis of their convergence with increasing order.
In this article, I hope to give a broad exposition from a unified viewpoint of the developments listed above. The reader should be cautioned from the outset that, in comparison with typical theories in computational mathematics, what follows may seem rather specialized. We deal with special equations, often restrict ourselves to special boundaries, and make use of special functions. That said, the results themselves have enormous applicability, as the equations we can successfully treat include many of the most important in applications. Moreover, it is not unreasonable to hope that some of these methods will be generalizable to problems we cannot now solve.
Our approach to the theory of radiation boundary conditions is straightforward. First, we construct exact boundary conditions. Although such conditions can be described in some generality as certain projections of boundary data, we concentrate on concrete expressions derived by separation of variables. For artificial boundaries that satisfy a scale invariance condition, the exact condition factors into the composition of nonlocal spatial and temporal operators. Approximate conditions are analysed using the standard concepts: stability and consistency. From this analysis we obtain sharp error estimates for a wide variety of techniques. These error bounds are used to estimate the computational complexity of the competing methods. The results are also illustrated by a simple numerical experiment. It should be noted that a comprehensive body of numerical results on an appropriate set of benchmark problems is lacking. There has been interest in developing such a set (see Geers (1998)), and I am of the opinion that the problems used here and in Alpert, Greengard and Hagstrom (1999 a) and Hagstrom and Goodrich (1998) are particularly useful, due both to the simplicity of their definition and to the difficulty of their solution. What I have not tried to do is give a comprehensive survey of the many contributions to this subject that have appeared over the past twenty years. The reader is referred to the survey articles (Givoli 1991, Tsynkov 1998) which have extensive bibliographies. I do make many references to the literature within the text, but these are primarily intended to aid the reader who wishes to delve into the subject more deeply, rather than to provide an accurate historical record of its development. Finally I would like to acknowledge the important contributions of those who collaborated with me on a variety of research projects in this field: Brad Alpert, John Goodrich, Leslie Greengard, S. I. Hariharan, H. B. 
Keller, Jens Lorenz, Richard MacCamy, Jan Nordstrom, and Liyang Xu. I also acknowledge the support of the NSF, the Institute for Computational Mechanics in Propulsion (NASA), DARPA/AFOSR, and, for work done while in residence at the Courant Institute, DOE. Throughout I have tried to emphasize the personal nature of the conclusions expressed herein through the use of the first person, and take full responsibility for them. 2. Formulations of exact boundary conditions In this section I briefly develop the theory of exact boundary conditions for time-dependent problems and apply it to derive explicit expressions for a long list of problems with simple artificial boundaries. The list includes the scalar wave equation and its dispersive analogue, Maxwell's equations, the linear elasticity equations, the advection-diffusion equation, the Schrodinger equation, the linearized compressible Euler equations, and the linearized incompressible Navier-Stokes equations. My purpose in presenting so many
examples, aside from the inherent physical interest in all of them, is to demonstrate the remarkable unity of the problem. In particular, although the details of each calculation differ, the recipe for carrying them out does not. Moreover, we shall see that the same few convolution kernels reappear in example after example to define the temporally nonlocal part of the exact condition. Practically, this means that the accurate approximation or compression of a rather small number of operators will have extensive applications. I will show how this can be done in the second part of this article. The outline of the theory of exact conditions presented here has been known for some time (Gustafsson and Kreiss 1979, Hagstrom 1983). However, it was only recently used to develop and analyse efficiently implementable but arbitrarily accurate approximations. The equations considered fall into two classes: hyperbolic equations for which the spatial and temporal operators have the same order, and equations which are first order in time but second order in space. Viewed as pseudodifferential operators, the exact conditions in each case are of different types. As a result our techniques for separating them into local and nonlocal parts differ.
2.1. The scalar wave equation
The scalar wave equation is the most ubiquitous model of wave propagation and, hence, is the natural starting point for our study. Consequently, the vast majority of work on the subject has been devoted to this case. I will proceed from the general to the particular, ending with useful expressions for exact conditions on planar, spherical and cylindrical artificial boundaries.

General boundaries
We consider a mixed problem for the inhomogeneous wave equation in an unbounded domain, Ω:

    \frac{1}{c^2} \frac{\partial^2 u}{\partial t^2} = \nabla^2 u + f,  t > 0,  x \in \Omega,   (2.1)

with initial and boundary conditions:

    u(x, 0) = u_0(x),  \frac{\partial u}{\partial t}(x, 0) = v_0(x),   (2.2)

    \alpha \frac{\partial u}{\partial n} + \beta \frac{\partial u}{\partial t} + \gamma u = g,  x \in \partial\Omega.   (2.3)

To truncate the problem, we choose a bounded subdomain, T ⊂ Ω, which we assume contains the support of the data, f, u_0, v_0 and g. The boundary of T then consists of two parts, Σ ⊂ ∂Ω and what we will call the artificial boundary, Γ. For example, if we are solving an exterior problem, that is, if
Fig. 1. Domains for an exterior problem: Ξ is the tail, T is the computational domain, Γ is the artificial boundary
the domain Ω is the complement of some finite number of finite domains, then Σ will consist of the boundaries of these regions while Γ will be some closed surface which surrounds them. (See Figure 1.) In T we solve (2.1), (2.2) and (2.3) supplemented by an additional boundary condition on Γ:

    B_e u = 0,  x \in \Gamma.   (2.4)

We term the boundary condition (2.4) exact if the truncated problem has a unique solution which, for all t > 0, coincides with the restriction to T of the solution of the original problem posed on the unbounded domain Ω. Note that we insist on a homogeneous boundary condition on Γ, so that it will be explicitly independent of the data. If we allowed the support of these to extend beyond T, an inhomogeneous condition would generally be required. An indirect description of B_e may be derived as follows. Consider the part of the domain 'discarded' as we pass from Ω to T, that is, the 'tail', Ξ = Ω − T. Consider the set, S, consisting of all solutions of the homogeneous wave equation in Ξ, with zero initial data and satisfying the boundary condition (2.3) with g = 0 on that part of ∂Ξ which is also part of ∂Ω. As we have as yet imposed no boundary condition on Γ, we expect that S will be infinite-dimensional.
Now the restriction to Ξ of the solution, u, of our original problem must be an element of S. Moreover, so long as the trace on Γ of a solution of (2.1), (2.2), (2.3) in T, along with the trace of its normal derivative, coincide with the traces of an element, w, of S, the solution is easily extended to Ω by simply setting it equal to w in Ξ. Set A to be the linear subspace of all pairs of functions on Γ which coincide with the trace of an element of S and the trace of its normal derivative. We call A the admissible subspace. Then B_e is defined by

    B_e u = 0  \iff  u \in A.

More directly, this formulation represents B_e = I − P_A, where P_A is a projection operator for A. We have been somewhat informal in our use of concepts from functional analysis, and the full justification of this abstract construction requires more work. However, the general recipe outlined here may be applied to other problems, as will be shown below. It also forms the basis of the construction of exact conditions for elliptic problems as given in Hagstrom and Keller (1986). Useful, and generally easily justifiable, representations of the operator B_e can be obtained by means of Laplace transformation and the theory of elliptic equations. We thus consider the Helmholtz equation

    \bar{s}^2 \hat{u} = \nabla^2 \hat{u},  x \in \Xi,   (2.5)

with boundary conditions

    \alpha \frac{\partial \hat{u}}{\partial n} + (s\beta + \gamma)\hat{u} = 0,  x \in \partial\Omega \cap \partial\Xi,   (2.6)

and s restricted to some right half-plane, Re s ≥ η > 0. We have also introduced

    \bar{s} = \frac{s}{c}.   (2.7)

At least for η sufficiently large, we also require that û be bounded, and assume that α, β, γ are real and α, β ≥ 0. For our purposes, it is simplest to parametrize the transforms of elements of S by their boundary values at Γ. In particular, given some sufficiently smooth function w(x, s) defined on Γ, there exists a unique solution, û, of (2.5)-(2.6) such that û(x, s) = w(x, s), x ∈ Γ. See, for instance, Ramm (1986) for proofs in various unbounded domains. Now we may compute the trace of the normal derivative (outward for T) of
this solution on Γ, which defines the so-called Dirichlet-to-Neumann map:

    \frac{\partial \hat{u}}{\partial n} = -\hat{V} w,  x \in \Gamma.

Denoting Laplace transformation by L, our exact boundary operator B_e may thus be defined by

    B_e u = \frac{\partial u}{\partial n} + L^{-1}\left( \hat{V} L u \right).   (2.8)

We note that B_e is not a local (i.e., differential) operator. This fact has led to the conclusion that direct implementations of the exact condition are uneconomical: a conclusion we shall demonstrate to be false for some special choices of the artificial boundary. One way to express V̂ is in terms of the Green's function for the problem in Ξ. Let G(x, y, s) satisfy

    \nabla_y^2 G - \bar{s}^2 G = \delta(x - y),  y \in \Xi,

along with (2.6) (in y) and

    G = 0,  y \in \Gamma.

Then, for x ∉ Γ,

    -\hat{V} w(x) = -\frac{\partial}{\partial n_x} \int_\Gamma \frac{\partial G}{\partial n_y}(x, y, s) \, w(y) \, dy.

(This equation must be interpreted in terms of limits as x → Γ.) For purposes of computation, it is most convenient to express V̂ in terms of its eigenvalues and eigenfunctions, that is,

    \hat{V} w(y, s) = \sum_{j} \lambda_j(s) Y_j(y, s) \int_\Gamma Y_j^*(z, s) \, w(z, s) \, dz.
In special cases this representation further simplifies, as the eigenfunctions, Y_j, turn out to be independent of s. Then the nonlocality may be expressed as the composition of a spatial and temporal operator, each of which may be amenable to a 'fast' evaluation. The invariance of the eigenfunctions with s follows from a scale invariance of the artificial boundary. Boundaries for which it holds include planes, spheres, cylinders and cones. Detailed expressions for B_e in these cases are developed below. We finally note that extensions of the solution in S from Γ to another boundary Γ' ⊂ Ξ may be used in lieu of the exact boundary condition. Precisely, it follows from causality that for any x' ∈ Γ' we may derive representations of the form

    u(x', t') = \int_0^{t' - \delta} \int_\Gamma \left[ K_D(x', x, t', t) u(x, t) + K_N(x', x, t', t) \frac{\partial u}{\partial n}(x, t) + K_T(x', x, t', t) \frac{\partial u}{\partial t}(x, t) \right] dx \, dt,   (2.9)

where the kernels K_{D,N,T} are determined by the geometry of Ξ and the choice of Γ', and where δ is the minimum distance between the two boundaries. Indeed, such expressions follow from the various representations of solutions of (2.5) by potential theory. Using (2.9) to provide Dirichlet data at Γ', it is possible to solve the wave equation in the extended domain consisting of the union of T and that part of Ξ bounded by Γ and Γ'. This approach was first suggested by Ting and Miksis (1986) and later implemented by Givoli and Kohen (1995) for exterior problems in three space dimensions. Then one may use the well-known Kirchhoff formula as a particular realization of (2.9):

    u(x', t') = \frac{1}{4\pi} \int_\Gamma \left[ \frac{1}{r} \frac{\partial u}{\partial n} - u \frac{\partial}{\partial n}\left(\frac{1}{r}\right) + \frac{1}{rc} \frac{\partial r}{\partial n} \frac{\partial u}{\partial t} \right]_{t = t' - r/c} dx,

where ∂/∂n is the outward normal derivative on Γ and r = |x − x'|.
Although I do not expect that boundary conditions based on this formula will be competitive from the point of view of cost with other equally accurate treatments discussed below, it is important to note that the paper of Ting and Miksis (1986) represents one of the first serious attempts to use exact conditions for the time-dependent wave equation. It is generalized to the equations of elasticity in Givoli and Kohen (1995) and to Maxwell's equations in He and Weston (1996).

Planar boundary
We now suppose that Ξ consists of the half-space (x, y) ∈ (0, ∞) × R^{n−1}. Applying a Fourier transformation in y with dual variables k, (2.5) becomes the ordinary differential equation

    \frac{d^2 \hat{u}}{dx^2} = (\bar{s}^2 + |k|^2) \hat{u},  x > 0.   (2.10)

For Re s > 0, bounded solutions of (2.10) are of the form

    \hat{u} = A(s, k) e^{-\mathcal{V} x},

where the branch of \mathcal{V} = (\bar{s}^2 + |k|^2)^{1/2} is chosen so that \mathcal{V} is analytic and has positive real part when Re s > 0 and satisfies \mathcal{V} \sim \bar{s}, s → ∞. The branch cut is conveniently chosen to be a curve in Re s < 0 connecting the branch points ±ic|k|. The exact condition (2.8) is expressed in terms of û in the following way. Let F denote Fourier transformation with respect to y, and let F^{-1} be its inverse. Rewrite \mathcal{V} by removing its large-s part so that the remainder is the transform of a function:

    \mathcal{V} = \bar{s} + |k| \bar{K}\left( \frac{\bar{s}}{|k|} \right).

Finally, let

    k(t) = \frac{1}{\pi} \int_{-1}^{1} \sqrt{1 - w^2} \cos wt \, dw.

As shown, for example, in Hagstrom (1996),

    \bar{K}(s) = (\mathcal{L}k)(s) = \sqrt{s^2 + 1} - s.   (2.11)

Therefore, using standard formulas from Laplace transform theory (e.g., Doetsch (1974)) we finally have the exact condition at x = 0:

    \left( \frac{\partial}{\partial x} + \frac{1}{c} \frac{\partial}{\partial t} \right) u + F^{-1}\left[ c|k|^2 \, k(c|k| \cdot) * (F u) \right] = 0.   (2.12)
(Here, * denotes convolution in time.) Note that, as mentioned in the preceding section, we have written B_e as the composition of nonlocal spatial and temporal operators. This is a consequence of the fact that the eigenfunctions of \mathcal{V} are simply the Fourier modes and, hence, are independent of s. We note that the exact condition (2.12) applies with minor modification to problems that are periodic in y or that are posed in cylindrical domains such as waveguides. Indeed, it even applies to certain problems with variable coefficients. Consider

    \frac{1}{c^2} \frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2} + L u,  x > 0,  y \in \Theta,

with some homogeneous boundary conditions, B_s u = 0, y ∈ ∂Θ. Suppose, with these boundary conditions, that the operator L has a complete, L²-orthonormal set of eigenfunctions Y_j(y) with negative, real eigenvalues −κ_j². For example, L could be a variable coefficient Sturm-Liouville operator. Then the analysis above can be repeated with the Fourier transform replaced by the Sturm-Liouville expansion and |k| replaced by κ_j. Precisely, we have

    \left( \frac{\partial}{\partial x} + \frac{1}{c} \frac{\partial}{\partial t} \right) u + \sum_j c \kappa_j^2 \left[ k(c \kappa_j \cdot) * \langle Y_j, u \rangle \right](t) \, Y_j(y) = 0.   (2.13)
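The convolution kernel k appearing here, with Laplace transform √(s² + 1) − s, can equivalently be written as k(t) = J₁(t)/t, where J₁ is the ordinary Bessel function. The following is a minimal, self-contained numerical check of that identity (the routine names are mine; the Bessel function is evaluated from its power series, and the kernel integral by the trapezoid rule after the substitution w = cos θ, which makes the integrand smooth and periodic):

```python
import math

def j1_over_t(t, terms=40):
    # J_1(t)/t via the power series  J_1(t) = sum_m (-1)^m (t/2)^(2m+1) / (m! (m+1)!)
    half2 = (0.5 * t) ** 2
    s, term = 0.0, 0.5
    for m in range(terms):
        s += term
        term *= -half2 / ((m + 1) * (m + 2))   # ratio of successive series terms
    return s

def k_kernel(t, n=400):
    # k(t) = (1/pi) * int_{-1}^{1} sqrt(1 - w^2) cos(w t) dw; with w = cos(theta) this
    # becomes (1/pi) * int_0^pi sin(theta)^2 cos(t cos(theta)) dtheta, whose periodic
    # smooth integrand makes the trapezoid rule converge spectrally fast.
    h = math.pi / n
    total = 0.0
    for i in range(n + 1):
        th = i * h
        f = math.sin(th) ** 2 * math.cos(t * math.cos(th))
        total += 0.5 * f if i in (0, n) else f
    return total * h / math.pi
```

Both evaluations give k(0) = 1/2 and agree to near machine precision for moderate t, consistent with K̄(0) = 1, i.e., ∫₀^∞ k(τ) dτ = 1.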
Spherical boundary
Using standard spherical coordinates, (ρ, θ, φ), Ξ is defined by ρ > R and Γ is the sphere ρ = R. The Laplace transform of the solution, u, in the tail is now expanded in spherical harmonics, that is,

    \hat{u} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \hat{u}_{lm}(\rho, s) Y_{lm}(\theta, \phi),

where the Y_{lm} are the spherical harmonics and

    \frac{d^2 \hat{u}_{lm}}{d\rho^2} + \frac{2}{\rho} \frac{d \hat{u}_{lm}}{d\rho} - \left( \bar{s}^2 + \frac{l(l+1)}{\rho^2} \right) \hat{u}_{lm} = 0.   (2.14)

For Re s > 0, bounded solutions of (2.14) in Ξ are given by

    \hat{u}_{lm} = A_{lm}(s) \frac{1}{\sqrt{\rho \bar{s}}} K_{l+1/2}(\rho \bar{s}),

the modified spherical Bessel function of the third kind. (See Abramowitz and Stegun (1972, Ch. 10).) It may be represented in terms of elementary functions:

    K_{l+1/2}(z) = \sqrt{\frac{\pi}{2z}} \, e^{-z} \sum_{j=0}^{l} \frac{(l+j)!}{j! \, (l-j)! \, (2z)^j}.   (2.15)
Hence we have the following expression for the exact boundary condition, which as before we write as the sum of a local operator and convolution with a function:
\[ \Bigl( \frac{\mathrm{d}}{\mathrm{d}\rho} + s + \frac{1}{\rho} \Bigr)\hat u_{lm} + \frac{1}{\rho}\,\bar S_l(\rho s)\,\hat u_{lm} = 0, \qquad \rho = R, \tag{2.16} \]
where, for l ≠ 0,
\[ \bar S_l(z) = z\,\frac{k_l'(z)}{k_l(z)} + z + 1, \tag{2.17} \]
and \(\bar S_0 = 0\). Returning to the time domain and denoting by \(\mathcal{H}\) the spherical harmonic transformation, we find
\[ \frac{\partial u}{\partial \rho} + \frac{1}{c}\frac{\partial u}{\partial t} + \frac{u}{\rho} + \frac{c}{R^2}\,\mathcal{H}^{-1}\bigl( S_l(ct/R) * (\mathcal{H}u)_{lm} \bigr) = 0, \qquad \rho = R. \tag{2.18} \]
The rationality of \(\bar S_l\) implies that the temporal convolution in (2.18) can be localized, that is, it is equivalent to the solution of a differential equation, albeit of order l. The localizability of the exact boundary condition was first noted and used by Sofronov (1993) and Grote and Keller (1995, 1996). We also have the following beautiful continued fraction representation for \(\bar S_l\), which will play a role in efficient implementations of the exact condition:
\[ \bar S_l(z) = \cfrac{-\tfrac{1}{2}\,l(l+1)}{\,z+1+\cfrac{\tfrac{1}{4}\bigl(l(l+1)-2\bigr)}{\,z+2+\cfrac{\tfrac{1}{4}\bigl(l(l+1)-6\bigr)}{\,z+3+\ddots\,}}}\,, \tag{2.19} \]
the jth partial numerator being \(\tfrac{1}{4}\bigl(l(l+1)-j(j-1)\bigr)\) over the partial denominator \(z+j\), and the fraction terminating at the level \(z+l\).
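The continued fraction can be checked directly against the Bessel-function definition \(\bar S_l(z) = z k_l'(z)/k_l(z) + z + 1\). A sketch (Python with SciPy; the function names are mine), using \(k_l(z) \propto z^{-1/2}K_{l+1/2}(z)\) — the constant prefactor cancels in the logarithmic derivative — and the identity \(K_\nu'(z) = -(K_{\nu-1}(z)+K_{\nu+1}(z))/2\):

```python
import numpy as np
from scipy.special import kv

def S_bessel(l, z):
    # S_l(z) = z*k_l'(z)/k_l(z) + z + 1 with k_l(z) = z**-0.5 * K_{l+1/2}(z)
    nu = l + 0.5
    Kp = -0.5 * (kv(nu - 1, z) + kv(nu + 1, z))   # K_nu'(z)
    return z * Kp / kv(nu, z) - 0.5 + z + 1

def S_cfrac(l, z):
    # continued fraction (2.19), terminating at the level with denominator z + l
    f = 0.0
    for j in range(l, 1, -1):
        f = 0.25 * (l * (l + 1) - j * (j - 1)) / (z + j + f)
    return -0.5 * l * (l + 1) / (z + 1 + f)

for l in (1, 2, 3):
    print(l, S_cfrac(l, 1.7), S_bessel(l, 1.7))
```

For l = 1 both reduce to −1/(z+1), and for l = 2 to −3(z+2)/(z²+3z+3).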
It is also possible to derive analogous expressions for sphere-to-sphere extension operators. As mentioned earlier, these can be used in lieu of boundary conditions. The properties of localizability and ease of approximation which (2.18) possesses remain valid for the extension operators. In fact it is the extension formulation that is implemented in Sofronov (1999). The construction of exact conditions on a spherical boundary is directly generalizable to conical domains. Precisely, we suppose Σ may be described in spherical coordinates by
\[ \Sigma = \{ (\rho,\theta,\phi) : \rho \in (R,\infty),\ (\theta,\phi) \in \Theta \}, \]
and that appropriate boundary conditions are imposed on \(\partial\Theta\). Then we may separate variables as before except that we must expand in terms of the eigenfunctions of the Beltrami operator on Θ and, in (2.14), we must replace \(-l(l+1)\) by the eigenvalues \(-\kappa_l^2\). Then the transform of the exact boundary condition is as in (2.16) except that the index of the modified Bessel functions is
\[ \nu_l = \sqrt{\kappa_l^2 + \tfrac{1}{4}}. \]
Of course, for these indices, the exact condition is no longer a rational function of s, so that the operators are not equivalent to local operators in time. However, as we shall see later, this does not preclude their effective implementation.
Cylindrical boundary
We now take Γ to be the infinite cylinder described in standard cylindrical coordinates, (r, θ, z), by r = R. Note that, just as in the case of a planar boundary, we may restrict z to a finite interval with the addition of appropriate boundary conditions and may also replace ∂²/∂z² by a more general
Sturm–Liouville operator. In the formulas that follow, this would simply require replacing Fourier transforms in z by Fourier series. Thus, with k denoting the Fourier dual variable to z and l indexing the Fourier series in θ, we derive the ordinary differential equation
\[ \frac{\mathrm{d}^2 \hat u_l}{\mathrm{d}r^2} + \frac{1}{r}\frac{\mathrm{d}\hat u_l}{\mathrm{d}r} - \Bigl( s^2 + k^2 + \frac{l^2}{r^2} \Bigr)\hat u_l = 0 \tag{2.20} \]
in the tail, with bounded solutions for Re s > 0 given by
\[ \hat u_l = A_l(s,k)\, K_l\bigl( r\sqrt{s^2+k^2} \bigr). \]
Here, again, \(K_l\) denotes the modified Bessel function of the third kind (Abramowitz and Stegun 1972, Ch. 9). Using the well-known asymptotic expansions of \(K_l\) for large argument, found, for example, in the previously noted reference,
we may again express the temporal part of the boundary condition as the sum of a local operator and convolution with a function. Precisely:
\[ \hat{\mathcal V}_l = s + \frac{1}{2R} + \bigl( \sqrt{s^2+k^2} - s \bigr) + \frac{1}{R}\,\bar C_l\bigl( R\sqrt{s^2+k^2} \bigr). \tag{2.21} \]
In the time domain we have
\[ \mathcal{B}u = \mathcal{F}_{\theta,z}^{-1}\Bigl( c k^2\, K(c|k|t) * (\mathcal{F}_{\theta,z}u) + \frac{c}{R^2}\, G_l(ct,R,k) * (\mathcal{F}_{\theta,z}u) \Bigr), \tag{2.22} \]
where \(\mathcal{F}_{\theta,z}\) represents Fourier transformation in θ and z, respectively, and \(G_l\) denotes the inverse Laplace transform of \(\bar C_l(R\sqrt{s^2+k^2})\).
This result can be extended to domains where θ is restricted to a subinterval \((\theta_0, \theta_1) \subset (0, 2\pi)\) and additional boundary conditions are imposed. Then we need only change the index of the Bessel functions in (2.21) from l to \(\kappa_l\), where \(-\kappa_l^2\) is the lth eigenvalue of ∂²/∂θ². We note that, in contrast with the spherical case, condition (2.22) cannot be exactly localized in time. However, as previously mentioned, the operators in all three cases may be very accurately and efficiently approximated
using essentially the same techniques. In particular, it is possible to express \(\bar C_l\) as the sum of a rational function and a function defined as an integral. Convergent approximations are derived in Sofronov (1999) by discretizing the integral. Similarly, the transform of the planar kernel may be expressed in integral form, and convergent approximations are derived in Hagstrom (1996) by discretization. In subsequent sections we will outline how these representations can be combined with multipole expansions to develop uniform approximations as in Alpert, Greengard and Hagstrom (1999b).
The dispersive wave equation
The dispersive wave equation is given by
\[ \frac{1}{c^2}\,\frac{\partial^2 u}{\partial t^2} = \nabla^2 u - u. \tag{2.23} \]
Exact boundary conditions for (2.23) have precisely the same form as those for the wave equation, except that the eigenvalues used in the definition of the nonlocal operators are shifted by one. Therefore we have, on a planar boundary, from (2.12):
\[ \frac{\partial u}{\partial t} + c\,\frac{\partial u}{\partial x} + \mathcal{F}^{-1}\Bigl( c\bigl(|k|^2+1\bigr)\, K\bigl( c\sqrt{|k|^2+1}\,t \bigr) * (\mathcal{F}u) \Bigr) = 0. \]
On a sphere, using (2.18), we have the same form with \(S_l\) replaced by a kernel \(\tilde S_l\) whose transform is that of (2.17) evaluated at the shifted argument, that is,
\[ \tilde{\bar S}_l = z\,\frac{\bigl( z^{-1/2}K_{l+1/2}(z) \bigr)'}{z^{-1/2}K_{l+1/2}(z)} + z + 1, \qquad z = R\sqrt{s^2+1}. \]
Note that this condition is not temporally localizable. Finally, on a cylindrical boundary, we adapt (2.22):
\[ \mathcal{B}u = \mathcal{F}_{\theta,z}^{-1}\Bigl( c\gamma^2\, K(c\gamma t) * (\mathcal{F}_{\theta,z}u) + \frac{c}{R^2}\, G_l(ct, R, k) * (\mathcal{F}_{\theta,z}u) \Bigr), \]
where \(\gamma = \sqrt{|k|^2+1}\) and \(\bar C_l\) is given by (2.21).
2.2. Generalizations to hyperbolic systems
Consider now a general first-order hyperbolic system with artificial boundary Γ given by the hyperplane x = 0, denoting as above the tangential variables by y. The equation in the tail takes the form
\[ \frac{\partial u}{\partial t} = A_0\,\frac{\partial u}{\partial x} + \sum_j B_j\,\frac{\partial u}{\partial y_j}, \tag{2.24} \]
where \(u \in \mathbb{R}^q\), \(A_0, B_j \in \mathbb{R}^{q\times q}\). We assume strong hyperbolicity. (See Kreiss and Lorenz (1989).) This implies, among other things, that the eigenvalues of
\[ P = \mathrm{i}\Bigl( k_0 A_0 + \sum_j k_j B_j \Bigr) \]
are purely imaginary for real \(k_j\). To simplify the algebra, we also assume that the artificial boundary is noncharacteristic: that is, \(A_0\) is invertible. Solving for ∂u/∂x and carrying out our usual Fourier–Laplace transformation in y and t we find:
\[ \frac{\mathrm{d}\hat u}{\mathrm{d}x} = M\hat u, \qquad M = A_0^{-1}\Bigl( sI - \sum_j \mathrm{i}k_j B_j \Bigr). \tag{2.25} \]
By our assumption of hyperbolicity, for Re s > 0 no eigenvalue of M can lie on the imaginary axis, since that would imply a nonimaginary eigenvalue of P. Therefore, for Re s > 0 we may define two distinct invariant subspaces of M: the first, of dimension \(q_+\), is associated with eigenvalues with positive real part and the second, Λ, of dimension \(q_- = q - q_+\), is associated with eigenvalues with negative real part. Let \(Q_+(s,k)\) be a \(q_+ \times q\) matrix of full rank all of whose rows are orthogonal to Λ. Then an exact boundary condition at x = 0 is given, in the transform variables, by
\[ Q_+\hat u = 0. \]
Fixing k and letting s be large, standard results in matrix perturbation theory (Kato (1976, Ch. 2)) imply that we may rewrite this as
\[ Q_{+,0}\,\hat u + \bar Q_+(s,k)\,\hat u = 0, \tag{2.26} \]
where the constant matrix \(Q_{+,0}\) is determined by the invariant subspaces of \(A_0\), and hence may be taken so that \(Q_{+,0}\hat u\) defines the usual normal characteristic variables, and \(\bar Q_+(s,k)\) decays for large |s|.
Therefore the elements of \(\bar Q_+\) are the transforms of bounded functions and the temporal transform may be inverted to reveal a convolutional form:
\[ Q_{+,0}\,u + \mathcal{F}^{-1}\bigl( Q_+ * (\mathcal{F}u) \bigr) = 0. \tag{2.27} \]
The exact boundary condition at more general boundaries might be developed in the following way. At each point on the artificial boundary identify normal incoming and outgoing characteristics for the interior domain, Ω. For the tail, Σ, the roles of these variables are reversed: outgoing for Ω is incoming for Σ and vice versa. We parametrize solutions in the tail by their incoming data, thereby expressing the outgoing characteristic variables, from the perspective of Σ, in terms of the incoming characteristic variables. From the perspective of the computational domain, Ω, we express incoming variables in terms of outgoing variables as expected. This construction is complicated by the fact that the number of incoming and outgoing characteristics will generally be different on different parts of the boundary, and we have not yet carried it out in any generality. An example where this difficulty occurs, namely the subsonic compressible Euler equations, is discussed below.
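The splitting of M into invariant subspaces is easy to explore numerically. The following sketch builds M for a small strongly hyperbolic model system of my own choosing (illustrative only, not a system from the article) and extracts the rows of Q₊ as left eigenvectors for the eigenvalues with positive real part:

```python
import numpy as np

# An illustrative strongly hyperbolic 2x2 model, q_t + A q_x + B q_y = 0
# (my own example; the symbol xi*A + eta*B is symmetric, hence hyperbolic):
b = 0.8
A = np.diag([1.0, -1.0])
B = np.array([[0.0, b], [b, 0.0]])

def M(s, k):
    # dq/dx = M q after Fourier-Laplace transformation, cf. (2.25)
    return -np.linalg.inv(A) @ (s * np.eye(2) + 1j * k * B)

s, k = 0.4 + 2.0j, 1.3                      # Re s > 0
lam, V = np.linalg.eig(M(s, k))
# hyperbolicity: no eigenvalue of M on the imaginary axis when Re s > 0
assert all(abs(ev.real) > 1e-12 for ev in lam)

# rows of Q_+ : left eigenvectors of M for eigenvalues with positive real part
# (computed as eigenvectors of M^H, whose eigenvalues are the conjugates)
lamL, W = np.linalg.eig(M(s, k).conj().T)
Q_plus = W[:, lamL.real > 0].conj().T
print(lam, Q_plus.shape)
```

For this model the eigenvalues satisfy λ² = s² + (kb)², the analogue of the wave-equation symbol.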
Systems equivalent to the wave equation: electromagnetism
Many hyperbolic systems of physical interest are isotropic. This, in addition to the requirement of homogeneity, forces them to be, in some sense, equivalent to systems of wave equations. Then, our formulations of exact boundary conditions for the wave equation can generally be translated into exact boundary conditions for the equivalent systems. A prime example of this is provided by the equations of electromagnetism. Of course, one can write these equations directly as a system of four wave equations for the vector and scalar potentials. (See, for instance, Schwartz (1987, Ch. 3).) Using the more conventional field variables, E and B, and assuming in Σ an absence of charges or currents, we have Maxwell's system:
\[ \frac{\partial E}{\partial t} - c\,\nabla\times B = 0, \tag{2.28} \]
\[ \frac{\partial B}{\partial t} + c\,\nabla\times E = 0, \tag{2.29} \]
where c is the speed of light. In addition we have
\[ \nabla\cdot E = \nabla\cdot B = 0. \tag{2.30} \]
(Of course E and B are uniquely determined by (2.28)–(2.29) and the initial and boundary conditions. However, it is easily seen that (2.30) is preserved under the time evolution.)
We begin with the simplest case of a planar boundary. Writing the system in the form (2.24) we note that the coefficient matrix corresponding to x-differentiation is singular, so that the boundary is characteristic. Applying Fourier transformation in the tangential variables and Laplace transformation in time leads to a differential-algebraic system where the algebraic equations are given by
\[ \bar s \hat E_1 = \mathrm{i}k_2\hat B_3 - \mathrm{i}k_3\hat B_2, \qquad \bar s \hat B_1 = -\mathrm{i}k_2\hat E_3 + \mathrm{i}k_3\hat E_2. \]
Here \(\bar s = s/c\) and the subscript 1 denotes a field component in the x direction. Using the algebraic equations to eliminate \(\hat E_1\) and \(\hat B_1\) yields a system of four equations for \((\hat E_2, \hat E_3, \hat B_2, \hat B_3)^T\) in the form (2.25).
The eigenvalues of M are given by
\[ \lambda_\pm = \pm\sqrt{\bar s^2 + |k|^2}, \]
each of multiplicity two,
where the branch is chosen as in our discussion of the wave equation. Exact boundary conditions may be determined by computing left eigenvectors corresponding to λ₊. As this eigenspace is two-dimensional, we may choose two independent eigenvectors, which in turn will generate two boundary conditions. Note that the only nonlocal operator that can arise is expressed in terms of λ₊, and hence will be the same as encountered in the case of the wave equation. One reasonable choice, which leads to symmetric formulae, is a pair of left eigenvectors whose entries are built symmetrically from \(\bar s(\lambda_+ + \bar s)\) and the tangential wavenumber products \(k_2k_3\). After some algebra, and the reintroduction of \(\hat E_1\) and \(\hat B_1\) to further simplify the results, we finally obtain the two exact conditions (2.31) and (2.32): first-order relations in t and the tangential variables for the tangential field components, augmented by the nonlocal term \(\mathcal{R}\),
where \(\mathcal{R}\) is the nonlocal operator appearing in (2.12):
\[ \mathcal{R}w = \mathcal{F}^{-1}\bigl( c|k|^2\, K(c|k|t) * (\mathcal{F}w) \bigr). \]
Note that, by taking appropriate linear combinations, many other forms could be obtained. Similarly, exact conditions at spherical and cylindrical boundaries for the Maxwell system involve the nonlocal operators appearing in (2.18) and (2.22), the exact conditions for the wave equation. Again, a number of formulations are possible, as each field component solves the wave equation individually and hence satisfies that equation's exact boundary condition. Of course, not all such formulations are well-posed. Below we outline a direct derivation in the spherical case which involves the application of the nonlocal operator to a minimal number of quantities and is hence somewhat less expensive to implement. For an alternative form see Grote and Keller (1999), where the authors adapt their derivation of local exact conditions for the wave equation. We begin by performing a Laplace transformation in time and expanding E and B in the orthogonal basis of vector spherical harmonics (Newton 1966, Ch. 2):
\[ \hat E = \sum_{l,m}\bigl( \hat E_{l,0}\,Y^{(0)}_{lm} + \hat E_{l,e}\,Y^{(e)}_{lm} + \hat E_{l,m}\,Y^{(m)}_{lm} \bigr), \qquad \hat B = \sum_{l,m}\bigl( \hat B_{l,0}\,Y^{(0)}_{lm} + \hat B_{l,e}\,Y^{(e)}_{lm} + \hat B_{l,m}\,Y^{(m)}_{lm} \bigr), \]
where
\[ Y^{(0)}_{lm} = e_\rho\,Y_{lm}, \qquad Y^{(e)}_{lm} = \frac{1}{\sqrt{l(l+1)}}\Bigl( e_\theta\,\frac{\partial Y_{lm}}{\partial\theta} + \frac{e_\phi}{\sin\theta}\,\frac{\partial Y_{lm}}{\partial\phi} \Bigr), \qquad Y^{(m)}_{lm} = e_\rho \times Y^{(e)}_{lm}, \tag{2.33} \]
and \(e_\rho\), \(e_\theta\) and \(e_\phi\) are the standard unit basis vectors in the spherical coordinate system. As in the case of a planar boundary, Maxwell's equations now lead to a differential-algebraic system in ρ for the expansion coefficients. The algebraic equations may be used to eliminate the coefficients associated with the radial harmonics, \(Y^{(0)}_{lm}\).
We then have a first-order system in ρ for the remaining variables, \((\hat E_{l,e}, \hat E_{l,m}, \hat B_{l,e}, \hat B_{l,m})^T\), whose bounded solutions are expressed in terms of \(k_l(\rho s)\) and its derivative, where \(k_l\) is the modified spherical Bessel function of the third kind (2.15). In terms of the expansion coefficients, the bounded solutions satisfy two conditions of the form (2.16), with local terms in \(1/\rho\) and \(1/\rho^2\) and nonlocal terms through \(\bar S_l(\rho s)\),
where \(\bar S_l\) is as in (2.16). Finally, we note that if we define \(\tilde B = e_\rho \times B\), then the two families of conditions can be combined. Therefore, letting ρ = R be the boundary location and inverting the transforms, we reach our final form, (2.34): a local first-order operator in t applied to the tangential combinations of E and \(\tilde B\), corrected by the harmonic-by-harmonic convolutions \(S_l(ct/R) * E_{l,m}\) and \(S_l(ct/R) * \tilde B_{l,e}\), weighted by \(Y^{(m)}_{lm}\) and \(Y^{(e)}_{lm}\) respectively, set to zero.
Not surprisingly, it is also possible to formulate the exact boundary condition on a cylinder using the nonlocal operator in (2.22), but we will present the details elsewhere.
Systems equivalent to the wave equation: elasticity
The equations of linear elasticity in an isotropic medium are typically formulated in terms of a 3-vector, u, describing the displacements (Eringen and Şuhubi 1975, Ch. 5). They are given by Navier's equations:
\[ \frac{\partial^2 u}{\partial t^2} = c_t^2\,\nabla^2 u + (c_l^2 - c_t^2)\,\nabla(\nabla\cdot u), \tag{2.35} \]
where \(c_l\) and \(c_t\) are the irrotational and equivoluminal sound speeds, respectively. Unlike the case of Maxwell's equations, each component of u does not satisfy the scalar wave equation. However, a Helmholtz decomposition of u produces a vector and a scalar wave equation, one with each wave speed. We shall proceed directly, deriving expressions for the exact boundary condition at planar and spherical boundaries. The direct approach to deriving exact boundary conditions on the plane x = 0 is to reduce the problem to a first-order system in x, carry out a Fourier–Laplace transform in (y, t), and compute the requisite projection operators into the admissible subspace, Λ, as described above. However, as this is a second-order system, we will jump ahead and look for the Dirichlet-to-Neumann map. Seeking solutions of the form
\[ u = e^{\lambda x + st + \mathrm{i}k\cdot y}\,v \]
leads to the algebraic system
\[ \Bigl( \bigl( c_t^2(\lambda^2 - |k|^2) - s^2 \bigr)\,I + (c_l^2 - c_t^2)\,w w^T \Bigr) v = 0, \qquad w = \begin{pmatrix} \lambda \\ \mathrm{i}k \end{pmatrix}. \tag{2.36} \]
There are three independent bounded solutions of this system. Two correspond to the equivoluminal modes,
\[ \lambda_t = -\sqrt{s^2/c_t^2 + |k|^2}, \qquad w^T v = 0, \]
spanned, for instance, by \(v = (0, q)^T\), where the 2-vector q is orthogonal to k. The third corresponds to the irrotational mode,
\[ \lambda_l = -\sqrt{s^2/c_l^2 + |k|^2}, \qquad v = w. \]
Setting V to be the matrix whose columns are the \(v_i\), we have at x = 0, for some 3-vector c,
\[ \hat u = Vc, \qquad \frac{\mathrm{d}\hat u}{\mathrm{d}x} = V\Lambda c, \qquad \Lambda = \mathrm{diag}(\lambda_t, \lambda_t, \lambda_l). \]
Therefore
\[ \frac{\mathrm{d}\hat u}{\mathrm{d}x} = V\Lambda V^{-1}\,\hat u, \]
where, after some algebra, we find
that V and \(V^{-1}\) are built from the vectors above. Returning to the time domain and using the fact that \(kk^T + qq^T = |k|^2 I\), we find
\[ \frac{\partial u}{\partial x} + \Sigma\,\frac{\partial u}{\partial t} + Cu + \mathcal{F}^{-1}\bigl( E * (\mathcal{F}u) \bigr) = 0, \tag{2.37} \]
where
\[ \Sigma = \mathrm{diag}(1/c_l,\ 1/c_t,\ 1/c_t), \]
C is a local, zeroth-order matrix operator, and the transform of the nonlocal operator E may be computed explicitly.
We see that it involves both the kernel K, through the terms \(\lambda_t + s/c_t\) and \(\lambda_l + s/c_l\), as well as some new kernels. As for Maxwell's equations, exact conditions at a spherical boundary are most easily obtained by expanding the solution, u, in terms of vector spherical harmonics (2.33). Denoting the transformed expansion coefficients by \(\hat u_{l,0}\), \(\hat u_{l,e}\), and \(\hat u_{l,m}\), we derive, after some algebra, a coupled second-order system for \((\hat u_{l,0}, \hat u_{l,e})\) and a separate equation for \(\hat u_{l,m}\), where prime denotes differentiation with respect to ρ. Bounded solutions of this system, which may also be directly derived via a Helmholtz decomposition as in Eringen and Şuhubi (1975, Ch. 8), are
described in terms of \(k_l(\rho s/c_l)\) and \(k_l(\rho s/c_t)\), where \(k_l\) is again given by (2.15). Denoting by B the matrix of these bounded solutions, the Dirichlet-to-Neumann map is defined by the matrix
\[ D_l = B'(R)\,B(R)^{-1}. \]
It is clear from the structure of the matrix B that the entries of \(D_l\) will be rational functions of s. Hence \(D_l\) corresponds to a localizable operator in time. Separating out the local part of the operator, we reach a form whose nonlocal part is a 2 × 2 rational matrix \(P_l\) acting on \((\hat u_{l,0}, \hat u_{l,e})\). The elements of the 2 × 2 matrix \(P_l\) are rational functions of s of degree no more than (2l + 3, 2l + 4). We have not yet studied them in any detail. Writing
\(u = (u_\rho, u_{\mathrm{tan}})^T\), using identities involving l(l+1) and \(e_\rho \times (\nabla\times u_{\mathrm{norm}})\), and inverting the transforms, we find a boundary condition consisting of local first-order terms together with harmonic-by-harmonic temporal convolutions with the kernels corresponding to \(P_l\) and \(\bar S_l\). A direct derivation of an exact local form, based on the Helmholtz decomposition and their approach to the wave equation, was first given by Grote and Keller (1998).
Linearized gas dynamics: a problem with anisotropy
In all the examples above which we have studied in detail, the system has been isotropic. A consequence of this is that the number of boundary conditions required is independent of the location on the boundary. We now consider an important system for which this is not the case. The linearized, subsonic Euler system for a polytropic gas in three space dimensions is given by a first-order system for the perturbation vector \((p, u_1, u_2, u_3, T)^T\), whose coefficient matrices couple the jth velocity component to the density and temperature through the (j+1)th unit 5-vector \(e_{j+1}\) and factors involving γ. Here p is the density perturbation, \(u_j\) the velocity perturbations, and T the temperature perturbation. Finally, we assume the flow is subsonic: |U| < 1.
Applying a Fourier–Laplace transformation with dual variables \((k_2, k_3, s)\), and solving for the x-derivatives, we obtain a system in the form (2.25), with M depending on γ, \(U_1\), \(k_2\), \(k_3\) and
\[ \bar s = s + \mathrm{i}k_2U_2 + \mathrm{i}k_3U_3. \]
The eigenvalues of M are given by the acoustic pair
\[ \lambda_\pm = \frac{U_1\bar s \pm \sqrt{\bar s^2 + (1-U_1^2)\,(k_2^2+k_3^2)}}{1-U_1^2} \]
and a triple eigenvalue
\[ \lambda_0 = -\frac{\bar s}{U_1}. \]
Five left eigenvectors may be chosen; those associated with \(\lambda_0\) are
\[ l_{0,1} = (\gamma-1,\ 0,\ 0,\ 0,\ -1), \qquad l_{0,2} = (\mathrm{i}k_2,\ \mathrm{i}k_2U_1,\ \bar s,\ 0,\ 0), \qquad l_{0,3} = (\mathrm{i}k_3,\ \mathrm{i}k_3U_1,\ 0,\ \bar s,\ 0), \]
with the remaining pair, \(l_\pm\), associated with \(\lambda_\pm\). Note that the signs of the real parts of the eigenvalues, and the number with positive and negative real part, depend on the sign of \(U_1\). I will as usual assume that Ω lies to the left of x = 0, so that the outflow case corresponds to \(U_1 > 0\) and the inflow case to \(U_1 < 0\).
Outflow boundary. At outflow we require one boundary condition, corresponding to the single incoming characteristic or, equivalently, the single eigenvalue \(\lambda_+\) with positive real part. The boundary condition is defined by \(l_+\). After inverting the transforms we find the condition (2.38), relating the pressure perturbation p (introduced here), its tangential convective derivative
\[ \frac{D_{\mathrm{tan}}w}{Dt} = \frac{\partial w}{\partial t} + U_2\frac{\partial w}{\partial y} + U_3\frac{\partial w}{\partial z}, \]
and the nonlocal operator
\[ \mathcal{H}w = \mathcal{F}^{-1}\bigl( (1-U_1^2)\,|k|^2\,\tilde K(k,t) * (\mathcal{F}w) \bigr), \]
where \(\tilde K(k,t)\) is a convected analogue of the planar kernel K.
Inflow boundary. At inflow, that is, if \(U_1 < 0\), we require four boundary conditions as the triple eigenvalue \(\lambda_0\) is now positive. It would be natural to simply append the three conditions defined by \(l_{0,j}\) to the condition defined by \(l_+\). However, when \(s = -|k|U_1 > 0\), \(\lambda_+ = \lambda_0\) and \(l_+\) is in the span of the \(l_{0,j}\). Therefore, the straightforward construction leads to ill-posed problems. (See also Giles (1990).) This may be remedied by replacing \(l_+\) with
an appropriate linear combination of the four conditions. One possibility, analogous to the one used in two-dimensional computations by Hagstrom and Goodrich (1998), leads to the system of four boundary conditions, (2.39)–(2.42): an acoustic condition (2.39), combining \(D_{\mathrm{tan}}p/Dt\), the operator \(\mathcal{H}\), and the divergence term \(\tfrac{1+U_1}{2}\bigl( \partial u_2/\partial y + \partial u_3/\partial z \bigr)\); the entropy condition,
\[ (\gamma-1)\,p - T = 0; \tag{2.40} \]
and two further conditions, (2.41) and (2.42), on the velocity perturbations.
The stability of these conditions and derived approximations will be shown by Goodrich and Hagstrom (1999). Note that (2.40) is simply the statement that the entropy perturbation is zero at inflow. Using it and the momentum equations we see that (2.41) and (2.42) are equivalent to setting to zero the tangential components of the vorticity at the boundary:
\[ \frac{\partial u_2}{\partial x} - \frac{\partial u_1}{\partial y} = 0, \qquad \frac{\partial u_3}{\partial x} - \frac{\partial u_1}{\partial z} = 0. \]
The boundary condition for the acoustic modes is related to exact boundary conditions for the convective wave equation satisfied by the pressure, p. Precisely, it may be obtained from this exact condition and the use of the equations to eliminate the normal p derivative. As mentioned earlier, we have no direct derivation of exact conditions for exterior problems with anisotropy. However, the interpretation above is suggestive of an ad hoc approach, which may work in this important case. In particular, the analogues of (2.40)–(2.41), namely the specification of zero entropy and tangential vorticity perturbations, are valid at inflow for any convex boundary. This leaves us with the problem of deriving exact conditions for the convective wave equation satisfied by the pressure perturbation, and then coupling it with the equations and boundary conditions to produce a well-posed problem. These latter problems to date are unsolved.

2.3. Equations of mixed order
Although the vast majority of work on radiation boundary conditions has been concentrated on the hyperbolic case, it is possible to apply the same general principles to construct exact boundary conditions for equations of different types. In this section we consider examples that involve partial
differential operators of first order in time but second order in the spatial variables. The major difference, in comparison with the hyperbolic case, is in the dependence of the symbol of the exact operator on the dual variable to time, s. (See Halpern and Rauch (1995) for a discussion of the appropriate symbol classes in the case of parabolic systems.) In particular, we cannot write the operator as the sum of a local operator and a temporal convolution with a bounded kernel, but rather write it as convolution composed with time differentiation. Our three examples will include the scalar advection–diffusion equation, the Schrödinger equation, and the incompressible Navier–Stokes equations. All three cases will be treated in three space dimensions at planar boundaries and all but the Navier–Stokes equations at spherical boundaries. As the reader is now experienced with the separation-of-variables techniques for deriving the boundary conditions, I will omit some of the details.
The advection–diffusion equation
Consider the scalar advection–diffusion equation
\[ \frac{\partial u}{\partial t} + U\cdot\nabla u = \nu\,\nabla^2 u \tag{2.43} \]
in the tail, Σ, defined by x > 0. After Fourier–Laplace transformation, we derive the following expression for bounded solutions:
\[ \hat u = A(s,k)\,e^{-\Lambda x}, \qquad \Lambda = \frac{s + \mathrm{i}U_{\mathrm{tan}}\cdot k + \nu|k|^2}{U_1/2 + \sqrt{\nu\bigl( s + \mathrm{i}U_{\mathrm{tan}}\cdot k + \nu|k|^2 \bigr) + U_1^2/4}}, \tag{2.44} \]
with the exact boundary condition given by
\[ \frac{\partial \hat u}{\partial x} + \Lambda\,\hat u = 0. \]
Here we have written \(U = (U_1, U_{\mathrm{tan}})^T\) and chosen a branch of the square root with positive real part for Re s > 0. Note that we cannot write Λ as the sum of a polynomial in s and the transform of a function. However, we can return to the time domain, finding:
\[ \frac{\partial u}{\partial x} + \mathcal{F}^{-1}\bigl( W_+ * (\mathcal{F}(Nu)) \bigr) = 0, \qquad Nu = \frac{\partial u}{\partial t} + U_{\mathrm{tan}}\cdot\nabla_y u - \nu\,\nabla_y^2 u. \tag{2.45} \]
(The formulas leading to W+ may be found in Oberhettinger and Badii (1970).)
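Formula (2.44) is easy to check numerically: Λ is, by construction, the decaying root of the characteristic quadratic νλ² + U₁λ − σ = 0, σ = s + iU_tan·k + ν|k|², of the transformed equation. A sketch with invented parameter values (all names are mine):

```python
import numpy as np

def Lam(s, k, U1, Utan, nu):
    # decaying exponent Lambda of (2.44); sigma = s + i*Utan.k + nu*|k|^2
    sig = s + 1j * np.dot(Utan, k) + nu * np.dot(k, k)
    return sig / (U1 / 2 + np.sqrt(U1**2 / 4 + nu * sig))

# hypothetical parameter values for the check
s, k = 0.5 + 1.2j, np.array([0.7, -0.3])
U1, Utan, nu = 0.4, np.array([0.2, 0.1]), 0.05
sig = s + 1j * np.dot(Utan, k) + nu * np.dot(k, k)
L = Lam(s, k, U1, Utan, nu)
# Lambda solves nu*L^2 + U1*L - sigma = 0, the quadratic for the exponents of
# nu u'' - U1 u' - sigma u = 0, and has Re L > 0 (decay into the tail)
print(L, abs(nu * L * L + U1 * L - sig))
```

Rationalizing the denominator in (2.44) recovers the usual root formula \((\sqrt{U_1^2+4\nu\sigma}-U_1)/(2\nu)\).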
Similarly, we can treat the case of a spherical boundary. Clearly, without loss of generality, we may assume that the advection field, U, is in the x direction. Then we have
\[ s\hat u + U_1\frac{\partial \hat u}{\partial x} = \nu\,\nabla^2\hat u. \]
Set
\[ \hat u = e^{U_1 x/(2\nu)}\,\hat v. \]
Then, \(\hat v\) satisfies
\[ \Bigl( s + \frac{U_1^2}{4\nu} \Bigr)\hat v = \nu\,\nabla^2\hat v. \]
Therefore, repeating our analysis from the case of the wave equation, (2.16), and using \(x = \rho\sin\theta\cos\phi\), we find
\[ \frac{\partial \hat u}{\partial\rho} = \frac{U_1}{2\nu}\,\sin\theta\cos\phi\;\hat u + e^{U_1 x/(2\nu)}\,\frac{\partial\hat v}{\partial\rho}. \]
Inverting the Laplace transform we reach our final form, (2.46): a condition combining \(\partial u/\partial\rho\), the local terms \(-\tfrac{U_1}{2\nu}\sin\theta\cos\phi\,u\) and \(u/R\), and a convolution, composed with the operator \(\partial/\partial t + U_1^2/(4\nu)\), applied harmonic-by-harmonic, where the kernels are built from
\[ P(t) = (\pi\nu t)^{-1/2}\,e^{-U_1^2 t/(4\nu)} \]
and from \(\tilde S_l(R,t)\), the spherical kernel of (2.16) with s shifted to \(s + U_1^2/(4\nu)\).
Due to the presence of the irrational function of s in the argument of \(\tilde S_l\), (2.46) cannot be directly expressed as a local operator. We note that the special case of the heat equation is recovered by setting U = 0. For the heat equation, however, fast methods for evaluating the solution operator are available (Greengard and Lin 1998), which may lead to more efficient methods of solution in most cases.
The Schrödinger equation
The Schrödinger equation is defined by
\[ -\mathrm{i}\,\frac{\partial u}{\partial t} = \nabla^2 u. \]
Clearly, representations of exact boundary conditions in transform space may be obtained from those above by setting U = 0, replacing s by −is and choosing appropriate branches. For the case of a planar boundary, define \(\sqrt{-\mathrm{i}s + |k|^2}\) so that, for Re s > 0,
\[ \mathrm{Re}\,\bigl( \sqrt{-\mathrm{i}s + |k|^2} \bigr) > 0. \tag{2.47} \]
Then, on the planar boundary x = 0 we have, in analogy with (2.45), a condition whose temporal part is a convolution, composed with time differentiation, with the kernel
\[ w = e^{-\mathrm{i}|k|^2 t + \mathrm{i}\pi/4}\,(\pi t)^{-1/2}. \]
On the spherical boundary, choosing \(\sqrt{-\mathrm{i}s}\) according to (2.47), we adapt (2.46) and find the analogous condition, with the corresponding kernels obtained by the substitution s → −is.
The linearized incompressible Navier–Stokes equations
As our final example of an equation of mixed order, we consider the construction of exact boundary conditions at a planar boundary for the Navier–Stokes equations linearized about a uniform flow. Again we shall see that our general techniques apply, and that the temporally nonlocal operators that arise are the same as those needed in the case of the advection–diffusion equation. (For alternative constructions in each case see Halpern (1986), Halpern and Schatzman (1989).) We thus consider
\[ \frac{\partial u}{\partial t} + U\cdot\nabla u + \nabla p = \nu\,\nabla^2 u, \qquad \nabla\cdot u = 0, \]
in the tail Σ ⊂ ℝ³ defined by x > 0. We make no restrictions on \(U_1\), so that there may be either inflow or outflow at the boundary. Following our standard construction, we perform a Fourier–Laplace transformation and make the system first order in x. Here we use the divergence constraint to
eliminate \(\partial\hat u_1/\partial x\), leading to a system of six equations for \(\hat p\), the velocity components and the x-derivatives of the tangential velocities:
\[ \frac{\mathrm{d}w}{\mathrm{d}x} = Mw, \]
where
\[ \bar s = s + \mathrm{i}U_{\mathrm{tan}}\cdot k + \nu|k|^2. \]
The eigenvalues of M are
\[ \lambda_1 = -|k|, \qquad \lambda_{2,3} = \frac{U_1 - \gamma}{2\nu}, \]
together with their growing counterparts, with γ satisfying Re γ > 0 when Re s > 0 defined by
\[ \gamma = \sqrt{U_1^2 + 4\nu\bar s}. \]
It is natural to define exact boundary conditions by the left eigenvectors associated with \(\lambda_1, \lambda_{2,3}\), which may be given in part by
\[ l_2 = \Bigl( \mathrm{i}k_2,\ -\frac{2\mathrm{i}k_2\nu\bar s}{U_1+\gamma},\ \bar s - \nu k_2^2,\ -\nu k_2k_3,\ \ldots \Bigr), \qquad l_3 = \Bigl( \mathrm{i}k_3,\ -\frac{2\mathrm{i}k_3\nu\bar s}{U_1+\gamma},\ -\nu k_2k_3,\ \bar s - \nu k_3^2,\ 0,\ \ldots \Bigr). \]
However, when \(s = \nu|k|^2 - U_1|k|\), \(\lambda_1 = \lambda_{2,3}\), and we find that \(q_1\) is in the span of \(q_{2,3}\). Therefore we replace it by a linear combination of the three eigenvectors which remains independent of \(q_{2,3}\):
\[ \tilde q_1 = \bigl( 1,\ -2(1 + \nu|k|W_+),\ \mathrm{i}k_2\nu(1 + zW_-),\ \mathrm{i}k_3\nu(1 + zW_-),\ 0,\ 0 \bigr), \]
where z relates |k| and \(\bar s\), and the kernels \(W_\pm\) are defined below.
Introducing the operators
\[ Hu = \mathcal{F}^{-1}\bigl( |k|^{-1}\,\mathcal{F}u \bigr), \qquad N = \frac{\partial}{\partial t} + U_{\mathrm{tan}}\cdot\nabla - \nu\,\nabla^2_{\mathrm{tan}}, \]
the kernels \(W_\pm\), which we recognize from our study of the advection–diffusion equation, and the temporally nonlocal operators
\[ \mathcal{W}_\pm u = \mathcal{F}^{-1}\bigl( W_\pm * (\mathcal{F}u) \bigr), \]
the exact conditions in the time domain are expressed by (2.48): one condition coupling p with \(\partial u_2/\partial y\) and \(\partial u_3/\partial z\), and two conditions on the tangential velocity components.
Construction of exact conditions on a spherical boundary has not, to our knowledge, been carried out. A closer study of the exact conditions in the planar case is quite suggestive of how these would look. Note, in particular, that (2.48) is related to the exact condition for the Laplace equation satisfied by p, namely \(p_x + H^{-1}p = 0\). The other two conditions are related to the advection–diffusion equation for the vorticity. As the exact conditions on a sphere for the Poisson equation and the advection–diffusion equation are easily formulated, it is reasonable to believe that an exact condition for the Navier–Stokes equations can be similarly found and will involve the same nonlocal operators. Exact boundary conditions for the compressible Navier–Stokes equations can be found using the same techniques, as discussed by Halpern (1991) and Hagstrom and Lorenz (1994). In that case, the number of boundary conditions is different at inflow and outflow boundaries. In addition, nonlocal operators associated with the wave equation are involved.

3. Approximations and implementations
Having now completed an exhaustive study of exact boundary conditions for a wide class of problems of physical interest, we turn to the problem of efficient implementation of or approximation to the nonlocal operators appearing in our formulations. I will restrict attention to the scalar wave equation. As we have seen, exact conditions for most other important hyperbolic systems involve the same pseudodifferential operators, so the techniques we develop will be applicable in all these cases.
As mentioned earlier, it was generally believed that the direct implementation of the exact conditions is prohibitively expensive, a belief which discouraged their study. Considering flop counts, this belief is false. Hairer, Lubich and Schlichte (1985) present an algorithm for the fast solution of convolutional Volterra equations. Its application to the exact boundary conditions in integral form yields a method for which the computational effort is smaller than that required by the interior solver, except for unusually long times. However, this approach does require the storage of full time histories at the boundary, which is excessive for moderately long time simulations. The primary alternative to direct implementation of the temporal integrals is the use of approximations to (or in the spherical case representations of) the kernels by sums of complex exponentials. For such kernels, convolution is equivalent to the solution of differential equations, so that the necessary work per time-step and storage is proportional to the number of exponentials used. In the next few sections I will develop the basic error estimates for such approximations and consider some examples. The same theory can be used to analyse the error associated with sponge layers, and I will do so for the so-called perfectly matched layer (PML), a reflectionless sponge layer recently introduced in computational electromagnetics.

3.1. Convolution with sums of complex exponentials
Consider the problem of computing
\[ \mathcal{O}u = \mathcal{H}^{-1}\bigl( E_l * (\mathcal{H}u)_l \bigr), \]
where now \(\mathcal{H}\) is any of the spatial harmonic transforms from the preceding sections, l is the harmonic index, * is temporal convolution, and \(E_l\) takes the form
\[ E_l(t) = \sum_{j=1}^{n_l} \alpha_{lj}\,e^{\beta_{lj}t}, \qquad \mathrm{Re}\,\beta_{lj} < 0. \tag{3.1} \]
Note that Ei(s) is a rational function of s of degree (n/ — 1, n{). There are two distinct and useful ways to represent E\. The first is as a sum of poles, which is directly derivable from (3.1):
Then, for any function w(t),
'£lj(t),
(3.2)
78
T. HAGSTROM
where the >y satisfy the differential equations dt
4>ii(0) = 0.
"
The second representation is as a finite continued fraction,
which is terminated by the condition %m+i = 0. Then, following Xu and Hagstrom (1999), Hagstrom and Hariharan (1998), we may evaluate E\ * w in recursive form. In particular, set >
J
i
k
I
l
k
lk
where En = E\ and E^ni+\
= 0. Set
Wk = Elk * wk-li
w
0
=
w
,
Wni + l
=
0.
Then we have dwk Ot
= l,...,ni.
(3.4)
Hence, for each representation, we must introduce rii auxiliary functions for the Zth harmonic for a total of N 1=0
auxiliary functions where N is the number of harmonics used to represent the solution. The work per step and storage associated with the convolution is thus proportional (with a small constant) to 7V"a. Note that we must also apply the harmonic transform, 7i, and its inverse at each step. In some instances this simply involves fast Fourier transforms, while in others we require spherical harmonic transforms using, for instance, the methods of Mohlenkamp (1997) and Driscoll, Healy and Rockmore (1997). In the former case the work is O(N In N), but in the latter we have 0(iVln 2 N). In special cases, the harmonic transform phase of the application of the boundary condition can be avoided. This occurs when the constants Q/J, fyj or jij, 6ij are eigenvalues of differential operators with eigenfunctions given by the Zth harmonic. The most important example of this is the case of the exact boundary condition at a spherical boundary. Then (2.19) has the
RADIATION BOUNDARY CONDITIONS
79
form (3.3) with s replaced by Rs/c and la
8
lj
1(1 + 1) = —g '
=
3-
Recalling that we see that for A; = 1 , . . . , N (3.4) takes the form
with w;o = u/2, u>./v+i = 0. Then, if we assume
we have
that is, w_1 is precisely the nonlocal part of (2.18). (Note: w_j as defined here differs from that in Hagstrom and Hariharan (1998) by a factor of (−1)^j.) The recursive form above can also be modified for use in approximating the nonlocal terms in (2.22), as discussed by Hagstrom and Hariharan (1998), though the approximation is no longer exact. For different approaches to implementing (2.18), see Sofronov (1999) and Grote and Keller (1995, 1996). As these involve spherical harmonic transforms, I expect the formulation given here will be somewhat more efficient, and certainly much easier to implement. The derivation of the continued fraction form was inspired by the reformulation by Barry, Bielak and MacCamy (1988) of asymptotic boundary conditions based on progressive wave expansions first suggested by Bayliss and Turkel (1980). See Hagstrom and Hariharan (1996) for more details. A different approach to applying approximate boundary conditions, based on localizable, homogeneous, rational approximations to the transformed representation of the exact boundary condition on a planar boundary (2.12), is developed in Higdon (1987, 1986). In particular, suppose that we approximate √(s² + |k|²) by a homogeneous rational function of s and |k|,
T. HAGSTROM
which is a general form for a homogeneous approximation that is localizable in time and space. Set

  λ² = s² + |k|²,

and note that, if u is a solution of the wave equation,

  ∂²û/∂x² = λ² û.

Replacing |k|² by λ² − s² leads to a boundary condition of the form

  Q(s, λ) û = 0.

Multiplying through by the denominator and factoring the result we find that

  ∏_{j=1}^q (η_j s + λ) û = 0,

which is equivalent to a product of first-order, constant-coefficient operators in ∂/∂t and ∂/∂x applied to u.
Stably implementing boundary conditions in this form is a reasonably simple matter. However, it may not be feasible to choose q very large (as is my intent) due to the growth of the difference stencil into the interior domain, so that I generally use auxiliary functions as in (3.2) or (3.4). Starting from (3.7), the parameters rjj, which typically correspond to cosines of incidence angles of perfect absorption, may be adjusted directly. This formulation has been applied in a number of more complex settings by Higdon (1991, 1992, 1994).
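As a concrete illustration of the perfect-absorption property, the following sketch checks that a single first-order factor annihilates a plane wave leaving the domain at its design angle. The sign convention of the factor (cos θ_j ∂/∂t + c ∂/∂x) is this sketch's own choice, not necessarily the one used above:

```python
import math

# Illustrative check (sign convention is this sketch's assumption): a
# first-order Higdon-type factor (cos(theta_j) d/dt + c d/dx) annihilates a
# plane wave leaving through x = 0 at incidence angle theta_j.
c = 1.0

def plane_wave_derivs(x, y, t, theta):
    # u = g(xi), xi = x cos(theta) + y sin(theta) - c t, with g = exp(-xi^2)
    xi = x * math.cos(theta) + y * math.sin(theta) - c * t
    gp = -2.0 * xi * math.exp(-xi * xi)      # g'(xi)
    return gp * math.cos(theta), -c * gp     # (u_x, u_t) by the chain rule

def higdon_factor(x, y, t, theta_wave, theta_bc):
    u_x, u_t = plane_wave_derivs(x, y, t, theta_wave)
    return math.cos(theta_bc) * u_t + c * u_x

th = 0.4
assert abs(higdon_factor(0.0, 0.3, 0.7, th, th)) < 1e-12   # perfect absorption
assert abs(higdon_factor(0.0, 0.3, 0.7, th, 1.0)) > 1e-3   # mismatched angle
```

A product of q such factors absorbs q angles perfectly, which is the content of the factored form above.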
3.2. Stability and consistency

Error estimates are derived, as always, by establishing the stability and consistency of the approximate boundary conditions. Let us begin with the simplest case of a planar boundary and an approximate boundary condition obtained by replacing √(s² + |k|²) by s + R(s, |k|) in the transformed exact condition (2.12). Assume further, for simplicity, that T is the half-space x < 0. Then the Fourier-Laplace transforms of the exact solution, u, and the error, e, take
the form

  û = A e^{−λx},   ê = E e^{λx},   λ = √(s² + |k|²),

with the amplitudes related by

  E = e(s, |k|) A.    (3.8)

(Generally, for more realistic choices of T, e must satisfy homogeneous boundary conditions on S, and so has a more complicated form. However, similar error estimates can be derived.) By Parseval's relation we find, on bounded subsets T′ ⊂ T:

  ‖e‖_{L²(0,T;L²(T′))} ≤ ( ∫ e^{2η(k)T} sup_{Re s = η(k)} |e|² ‖u(0, k, ·)‖²_{L²(0,T)} dk )^{1/2}.
To bound |e|, we must derive an upper bound on its numerator (consistency), and a lower bound on its denominator (stability). It is interesting to note that if we replace the approximate boundary condition by the exact boundary condition, the denominator has zeroes at s = ±i|k|. Of course the numerator is identically zero in this case, so the error is indeed zero. However, we expect that accurate approximate conditions will have small denominators near these points. A simple sufficient condition for stability is

  Re R > 0,   Re s > 0.    (3.9)

This can be relaxed somewhat, as will be seen in one of the examples. Ideally, we would take η = 0 so that our estimates are uniform in time. However, stable, homogeneous, spatially localizable conditions generally have poles on the imaginary s-axis. (See Trefethen and Halpern (1986, 1988).) At such points |e| > 1, so that no useful estimate holds. Therefore, for spatially local conditions we generally must settle for finite time estimates. In numerical calculations we can only treat functions with wave numbers in some bounded set, |k| ≤ M. Our error analysis simplifies somewhat if we restrict our attention to this set. The accuracy of any method is then characterized by

  δ(T, M; R) = sup_{|k| ≤ M} inf_{η ≥ 0} sup_{Re s = η} e^{ηT} |e|.    (3.10)
Alternatively, we can seek estimates involving derivative norms of the solution, leading to error estimates in terms of analogous weighted suprema. In this exposition I will follow the former, simpler approach. For examples of the latter see Hagstrom (1995, 1996) and Xu and Hagstrom (1999).
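The reflection-coefficient machinery admits a quick numerical sketch. The explicit form of e used below is an assumption of this sketch, chosen to be consistent with the description above: its numerator vanishes for the exact kernel, and its denominator vanishes at s = ±i|k| when the kernel is exact:

```python
import cmath

# Sketch of the reflection-coefficient analysis. The explicit form
#   e = (lam - s - R)/(lam + s + R),  lam = sqrt(s^2 + |k|^2),
# is this sketch's assumption: the numerator is zero for the exact kernel
# R = lam - s, and the denominator is 2*lam, vanishing at s = +/- i|k|.
def refl(s, k, R):
    lam = cmath.sqrt(s * s + k * k)
    return (lam - s - R) / (lam + s + R)

s, k = 0.5 + 2.0j, 1.5
lam = cmath.sqrt(s * s + k * k)
assert abs(refl(s, k, lam - s)) < 1e-12   # exact kernel: zero reflection
assert abs(refl(s, k, 0.0)) < 1.0         # crude first-order condition: |e| < 1
```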
The analysis outlined above is easily extended to our other special boundaries. For example, suppose T is the sphere of radius R and the approximate boundary condition is defined by an approximation to the exact kernel S_l. Then

  ê_l = e_l(s, R) û_l(s, R) I_{l+1/2}(rs/c)/I_{l+1/2}(Rs/c),

where I_{l+1/2} is the modified Bessel function of the first kind (Abramowitz and Stegun 1972, Ch. 9), and e_l is the spherical reflection coefficient defined by (3.11).
The accuracy of the method may then be characterized by the obvious analogue of (3.10). In the following sections I will estimate δ for various approximations.

3.3. Pade approximants and generalizations

For planar boundaries the function R(s, |k|) approximates the function

  K(z) = √(z² + 1) − z,

with z = s/|k|. Let R_p(z) be some approximation to K. Noting that

  K = 1/(2z + K),

it is reasonable to set

  R_{p+1} = 1/(2z + R_p).    (3.12)

Approximations based on (3.12) are extensively analysed in Xu and Hagstrom (1999). It is clear that they are in continued fraction form, and hence implementable via the recursion (3.4). Precisely,

  γ_1 = |k|²/2,   γ_j = |k|²/4, j = 2, …, p,   δ_j = 0,

with the fraction terminated by setting w_{p+1} = |k| R_0.
Note that

  R_{p+1} − K = −(R_p − K) / ((2z + K)(2z + R_p)).
Therefore, if we assume that R_p is at least bounded as z → ∞, we gain two orders in the large z approximation at each step. That is,

  |R_{p+1} − K| = O(z⁻²)|R_p − K|.

Consider the initialization

  R_0 = a ≥ 0.

Then an easy induction argument shows that Re R_p > 0 when Re z > 0 and that all poles are in the closed left half-plane. For a > 0, we can further conclude that the poles and zeroes of the real part lie in the open left half-plane. The choice a = 0, however, leads to the Pade approximants introduced by Engquist and Majda (1977) and Lindman (1975). These are spatially localizable, with poles on the imaginary z-axis between ±i. Xu and Hagstrom (1999) show that R_p is given by an explicit formula in terms of b = z + √(z² + 1), which leads to the remarkably simple formula (3.13) for e. From this we derive the following theorem.

Theorem 1  Let a > 0 and η, M > 0 be given. Then,

  sup_{Re s = η, |k| ≤ M} |e| ≤ (1 + a)(1 + 2η²/M²)^{−(2p+1)/2}.

The proof of this theorem follows immediately from the inequality

  inf_{Re z = η̂} |b| ≥ √(1 + 2η̂²).

From the estimate it might be concluded that the Pade approximants, a = 0, are optimal in this class. However, we have proceeded crudely. The minimum of |b| leading to the inequality above occurs at Im z = 0, where we can force e = 0 by a proper choice of a. A more detailed analysis is given in Xu and Hagstrom (1999). Imposing a tolerance τ and a computation time T we require

  δ(T, M) ≤ τ,

which implies

  p ≳ (ln((1 + a)/τ) + ηT) / ln(1 + 2η²/M²).
Optimizing this over η yields (for cMT on the order of ln(1/τ)) the following result.

Theorem 2  For any 0 < a < 1 there exists a constant, C, such that for any tolerance 0 < τ < 1, wave number bound M, and time T > 0, the approximation R_p with

  p ≥ C (ln(1/τ) + √(cMT))

satisfies δ(T, M; R_p) ≤ τ.

We see that the number of terms required is weakly dependent on the tolerance, but strongly dependent on the time and the tangential wave numbers. A completely different convergence analysis for the Pade approximants was given in Hagstrom (1995), with the same conclusions. Many other space-time localizable conditions are proposed in Trefethen and Halpern (1988), whose accuracy has not, to my knowledge, been estimated.
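A minimal sketch of the recursion (3.12), writing the planar kernel in the scaled variable as K(z) = √(z² + 1) − z (so that K = 1/(2z + K)); the particular test values are illustrative only:

```python
import math

def K(z):
    # planar kernel in the scaled variable z = s/|k|: K(z) = sqrt(z^2+1) - z
    return math.sqrt(z * z + 1.0) - z

def R_cf(z, p, a=0.0):
    # continued-fraction approximants R_{p+1} = 1/(2z + R_p), R_0 = a;
    # a = 0 gives the Engquist-Majda/Lindman Pade sequence
    R = a
    for _ in range(p):
        R = 1.0 / (2.0 * z + R)
    return R

z = 2.0
errs = [abs(R_cf(z, p) - K(z)) for p in range(1, 6)]
assert all(e2 < e1 for e1, e2 in zip(errs, errs[1:]))  # error shrinks each step
assert errs[-1] < 1e-6
```

The roughly constant error-reduction factor per step reflects the relation R_{p+1} − K = −(R_p − K)/((2z + K)(2z + R_p)).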
3.4. Truncations of (2.19) and asymptotic boundary conditions

A second approach to the construction of local approximate boundary conditions has been through the use of the progressive wave expansion, given in the cylindrical case by

  u ~ Σ_{j=0}^∞ r^{−j−1/2} f_j(ct − r, θ),    (3.14)

and in the spherical by

  u ~ Σ_{j=0}^∞ r^{−j−1} f_j(ct − r, θ, φ),    (3.15)
where for notational convenience we now use r instead of ρ to denote the spherical radius. (For a mathematical discussion of expansions of this type see Ludwig (1960).) These are used by Bayliss and Turkel (1980) to construct a hierarchy of boundary conditions satisfied by truncations of the expansion. This is easily accomplished using normal derivatives:

  B_p u = ∏_{j=1}^p ( (1/c) ∂/∂t + ∂/∂r + (2(j − 1) + α)/R ) u = 0,

where α = 1/2 for (3.14) and α = 1 for (3.15). However, the product form limits the order, p, which can be practically used. An alternative proposed by Hagstrom and Hariharan (1998) is to use the continued fraction form
(2.19) or its cylindrical analogue, which has the form (3.3) with s replaced by Rs/c and

  γ_{lj} = (l² − (2j − 1)²/4)/2,   δ_{lj} = j.

Here, l is the dual Fourier variable to θ. Truncating the expansions after p − 1 terms, that is, setting w_p = 0, leads to an approximate boundary condition whose accuracy may be assessed, in the spherical case, via (3.11). We have not carried through this analysis in full detail, as in the preceding section. However, we can make some conclusions. Assume l ≫ p. We then rely on the uniform large index asymptotic expansions for Bessel functions developed by Olver (1954). In particular, we find, to leading order, that away from the transition zones the exact kernel reduces to its planar counterpart with |k| replaced by l/R. By direct computation we see that the approximate boundary condition poorly approximates the exact condition when

  l/R ≫ |s|.    (3.16)
For |s| ≫ l/R, on the other hand, the approximation is good. Here, instead of increasing the order of the boundary condition, we can expand the domain, moving the boundary to γR. Evaluating the error on the original domain only, we see that our expression for e picks up an extra factor of

  K_{l+1/2}(γRs̄) I_{l+1/2}(Rs̄) / (K_{l+1/2}(Rs̄) I_{l+1/2}(γRs̄)).

Assuming (3.16), this factor is approximately γ^{−2l}. Therefore, we may hope that the error is small if γ^{−2p} is small, which requires

  γ ≥ τ^{−1/(2p)}.

Clearly, this argument is far from a proof. In particular we have completely ignored the transition regions. Their analysis might introduce some time dependence in the estimates, as in the similar case of the Pade approximants. We find that, with these favourable assumptions, some improvements can be made by combining domain extension with increases in p. It is therefore of some interest to carry through the convergence analysis.
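The downward sweep that evaluates the truncated continued fraction can be sketched as follows. The spherical coefficients γ_{lk} = (l(l+1) − k(k−1))/2 and δ_{lk} = k are this sketch's reading of the recursion; the test exploits the fact that γ_{l,l+1} = 0, so that terms beyond k = l cannot change the result:

```python
# Sketch of the downward sweep evaluating the truncated continued fraction.
# Assumed coefficients: gamma_lk = (l(l+1) - k(k-1))/2, delta_lk = k.
# r_k denotes the ratio w_k/w_{k-1} in Laplace space (s scaled by R/c).
def w1_over_w0(s, l, N):
    r = 0.0  # truncation: w_{N+1} = 0
    for k in range(N, 0, -1):
        gamma = (l * (l + 1) - k * (k - 1)) / 2.0
        r = gamma / (s + k + r)
    return r

s, l = 0.7 + 1.3j, 4
exact = w1_over_w0(s, l, l)       # N = l terms suffice for the sphere
deep = w1_over_w0(s, l, l + 6)    # gamma_{l,l+1} = 0 kills deeper terms
assert abs(exact - deep) < 1e-13
```

This vanishing of γ_{l,l+1} is what makes the spherical condition exactly representable with finitely many auxiliary functions, while the cylindrical coefficients never vanish and the truncated condition is only approximate.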
3.5. Uniform rational approximants

The analysis above points to a defect of the continued fraction approximations, namely, poor approximation properties for tangential wave numbers which are large in comparison to s. This suggests that substantial improvements can be made by uniformly approximating the transforms of the exact boundary kernels along lines Re s = η > 0. This program is carried out in Alpert et al. (1999b). It consists of two parts: proofs using multipole theory that good approximations exist, followed by the numerical construction of the poles and coefficients via nonlinear least squares. The fundamental approximation theorems used, which follow from the methods of Anderson (1992) and are proven in Alpert et al. (1999b), can be summarized in the following form.

Theorem 3  Suppose D_i, i = 1, …, p are disks in the complex plane of radius r_i and centre c_i. Suppose the complex numbers z_j, j = 1, …, n, and curve, C, lie within the union of the disks and that the function f(z) is defined by

  f(z) = Σ_{j=1}^n a_j/(z − z_j) + ∫_C ρ(ζ)/(z − ζ) dζ.

Then there exists a rational function g_m(z) with mp poles, all lying on the boundaries of the D_i, such that for any z satisfying Re (z − c_i) > a r_i > r_i,

  |f(z) − g_m(z)| ≤ C_f a^{−m},

where C_f is independent of m.

The proof follows from the direct construction of g_m. One simply places poles symmetrically around the boundary of the disks with coefficients chosen to match the large z expansion of each disk's contribution to f. The application of Theorem 3 to the approximation of S_l(z) is quite direct. As S_l is a rational function, it can be written as a sum of poles. These poles are zeroes of K_{l+1/2}(z), and uniform large l expansions of their locations are given in Olver (1954). In particular, they lie near a curve in the left half z-plane connecting the points z = ±il, with the poles nearest the imaginary axis separated from it by an O(l^{1/3}) distance. We cover these poles by O(ln l) disks such that all points in the closed right half z-plane satisfy the
Table 1. Number of poles, p, required to approximate S_l, l ≤ M, with δ(T, M) ≤ τ

  τ          M = 128    M = 256    M = 512    M = 1024
  10⁻⁶       12         14         15         16
  10⁻⁸       16         17         19         21
  10⁻¹⁵      26         29         33         36
inequality Re (z − c_i) > 2r_i. Using m = O(ln l) we thus achieve an error that scales like 2^{−m}. Taking into account the behaviour of the denominator of (3.11) near z = ±il, we deduce that, for some l-independent constant C, |e_l| is bounded by C times a fixed power of l times 2^{−m}. Hence we have the following theorem.

Theorem 4  There exists a constant C such that, for any tolerance 0 < τ < 1, wave number bound M > 1, radius R > 0, and time T > 0, there exists a p-pole rational approximation P_l(z) with

  p ≤ C ln M (ln(1/τ) + ln M)

such that δ(T, M; P_l) ≤ τ.

Note that the number of poles required is bounded independent of T. We have numerically constructed approximations satisfying Theorem 4. The numbers of poles required as M and τ are varied are listed in Table 1. One cannot but be impressed by the efficiency of these approximations, which allow the evaluation of the exact condition for harmonics of index up to 1024 to double precision accuracy with no more than 36 poles per harmonic. It is also possible to apply Theorem 3 to the approximation of the transform of the cylindrical kernel, C_l, and the planar kernel, K. In each case we use integral representations of the form given in the theorem. Details in the cylindrical case for k_z = 0 are given in Alpert et al. (1999b), while the planar case will be discussed elsewhere. Both C_0 and K have branch points on the imaginary axis. This forces us to settle for approximations with Re s ≥ η > 0. The number of poles required will thus depend on the time, T, as well as on M and τ. For the planar case we have the following theorem.

Theorem 5  There exists a constant C such that, for any tolerance 0 < τ < 1, wave number bound M, and time T such that cMT > 2, there exists
a p-pole rational approximation R(s, k) with

  p ≤ C ln(cMT) ln(1/τ)

such that δ(T, M; R) ≤ τ.

The planar approximations are computed by specifying τ and η = (MT)⁻¹. Choosing τ = 10⁻³ and η = 10⁻⁴ leads to a 21-pole approximation which is used in the numerical experiments below.

3.6. Reflectionless sponge layers

An alternative to the imposition of radiation boundary conditions at the artificial boundary is to surround the computational domain T with a sponge layer or absorbing region, within which propagating waves are damped. Though the construction of layers that absorb wave energy is reasonably simple, additional errors are typically introduced by the interaction of waves with the interface between the computational domain and the layer. Recently this approach was revitalized by the construction in Berenger (1994) of a sponge layer for Maxwell's equations with a reflectionless interface: the so-called perfectly matched layer, or PML. As shown in Abarbanel and Gottlieb (1997), the original formulation is only weakly well-posed. Petropoulos (1999) gives a clear mathematical derivation of reflectionless sponge layers in Cartesian, cylindrical and spherical coordinates, and derives strongly well-posed formulations. Surprisingly, much less has been written about reflectionless sponge layers for the wave equation. Here I adapt the construction of Chew and Weedon (1994) and Petropoulos (1999) to the wave equation and analyse the error, restricting myself to the planar case. The error estimates thus derived coincide with those derived for Maxwell's equations by the same techniques. Take x to be the coordinate normal to the layer interface and suppose that T corresponds to x < 0. The simplest starting point for the analysis is at the level of the solutions in T, described after our usual Fourier-Laplace transformation by

  û = A± e^{∓√(s² + |k|²) x}.

Clearly, these solutions are not damped with increasing or decreasing x for imaginary s satisfying |s| > |k|, that is, for propagating modes.
Damping may be achieved by modifying the exponent so that it has a real part that decreases with increasing x for the right-propagating (−) mode, and increases with increasing x for the left-propagating mode. As the sign of the imaginary part of √(s² + |k|²) coincides with that of s in the propagating mode regime, this is accomplished by adding to x an imaginary function, increasing in x, whose imaginary part has the opposite sign from that of s. A simple function with this property is

  (1/s) ∫₀ˣ σ(w) dw.

Thus we seek solutions in the sponge layer in the form

  û = A± e^{∓√(s² + |k|²)(x + s⁻¹ ∫₀ˣ σ(w) dw)}.

The interface x = 0 will be reflectionless if the solutions and their x-derivatives coincide there. This imposes the additional constraint σ(0) = 0. This constraint is not present in applications to hyperbolic systems, but is almost always imposed in computations. No other conditions on σ are needed to make the interface reflectionless. From the layer solutions it is straightforward to derive a pseudodifferential equation. For u we have

  s² û = (s/(s + σ)) ∂/∂x ( (s/(s + σ)) ∂û/∂x ) − |k|² û.

Inverting the transforms and assuming zero initial data in the layer, we finally obtain the time-domain equation (3.17), in which the factors s/(s + σ) appear as convolutions in time.
Local implementations involve the introduction of auxiliary variables to eliminate the convolutions. It is here that strong well-posedness can be lost. In our numerical experiments we replace (3.17) by

  ∂²u/∂t² = Δu − ∂v/∂x − w,    (3.18)

  ∂v/∂t + σv = σ ∂u/∂x,    (3.19)

  ∂w/∂t + σw = σ (∂²u/∂x² − ∂v/∂x).    (3.20)

These are strongly well-posed, but possibly not asymptotically stable, so that other reformulations may be better. To complete the layer description we must specify its length, d, and impose a boundary condition at its edge. Here I make the simplest choice,
u = 0, though of course one could lower the error with more sophisticated conditions. The general solution within the layer is then given by

  û = A₊ e^{−√(s²+|k|²)(x + s⁻¹∫₀ˣ σ(w) dw)} + A₋ e^{√(s²+|k|²)(x + s⁻¹∫₀ˣ σ(w) dw)}.

Matching this to a solution in T of the form

  û = A e^{−√(s²+|k|²) x} + E e^{√(s²+|k|²) x},

we find that E is related to A as in (3.8) with

  e(s, |k|) = −e^{−2√(s²+|k|²)(d + s⁻¹ ∫₀ᵈ σ(w) dw)}.

We see that, as in the case of rational approximants to K, good error estimates do not hold along the imaginary s-axis, particularly near s = ±i|k|. Taking Re s = η > 0 and introducing σ̄ = d⁻¹ ∫₀ᵈ σ(w) dw, we must estimate

  min Re ( d √(z² + 1) (|k| + σ̄/z) ),   z = s/|k|.    (3.21)

Expanding near the branch points z = ±i one finds that, for η sufficiently small, the minimum is achieved near |z| = 1. Restricting η to be no greater than one, we have, for some constant C,

  e^{ηT} |e| ≤ e^{Cη|k|T − C d √η (|k| + σ̄)}.

Minimizing over η and maximizing over |k| we prove the following.

Theorem 6  There exists a constant C such that, for any tolerance 0 < τ < 1, wave number bound, M, and time, T, the ideal reflectionless sponge layer (3.17) with average absorption σ̄ > 0 and width, d, satisfying

  d σ̄ ≥ C ( √(σ̄ T ln(1/τ)) + ln(1/τ) )

will have δ(T, M) ≤ τ. A remarkable feature of this bound is its independence of the maximum wave number, M.
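A transform-side sketch of the layer construction confirms the modal behaviour described above. Here σ is taken constant for simplicity (ignoring the σ(0) = 0 interface constraint), so the stretch s⁻¹∫₀ˣσ(w)dw reduces to σx/s:

```python
import cmath
import math

# Transform-side sketch of the layer: with constant sigma the stretched
# exponent is -sqrt(s^2 + k^2) * (x + sigma*x/s); propagating modes that are
# undamped at sigma = 0 decay through the layer once sigma > 0.
def layer_amplitude(omega, k, sigma, x):
    s = 1j * omega
    lam = cmath.sqrt(s * s + k * k)
    return abs(cmath.exp(-lam * (x + sigma * x / s)))

# without absorption, a propagating mode (|omega| > |k|) is undamped:
assert abs(layer_amplitude(2.0, 1.0, 0.0, 3.0) - 1.0) < 1e-12
# with sigma > 0 it decays at the rate sigma * x * sqrt(1 - k^2/omega^2):
amp = layer_amplitude(2.0, 1.0, 0.5, 3.0)
assert amp < 1.0
assert abs(amp - math.exp(-0.5 * 3.0 * math.sqrt(3.0) / 2.0)) < 1e-9
```

Note the decay rate degrades as |omega| approaches |k|, which is the glancing-mode difficulty reflected in the η-dependence of the estimates above.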
3.7. Complexity

Armed with these error estimates, we are in a position to assess the relative efficiency of the various accurate approaches. We consider two distinct idealized problems governed by the wave equation, with units chosen so that c = 1. The first problem assumes a three-dimensional computational domain T that is 1-periodic in two coordinate directions, has length 1 in the third direction, and is truncated by an artificial boundary at each end. The second problem assumes that T is contained within a sphere of radius 1, which serves as the artificial boundary. The first problem is used to test conditions for periodic or waveguide problems and the second to test conditions for exterior problems. In practice, planar boundaries are used to solve exterior problems, enclosing the computational domain in a box. Unfortunately, we do not as yet have any hard error estimates for either the Pade approximants or reflectionless sponge layers used in this way, and so can only make conjectures concerning their efficiency. In each case we assume that wave numbers up to M must be resolved on the boundary and that we are interested in the solution up to time T. We also suppose that the error tolerance is τ. For purposes of comparison it is useful to note the work, W_I, and storage, S_I, required by the interior solver. Assuming an explicit method with a reasonable stability constraint, these are

  W_I ∝ α⁴M⁴T,   S_I ∝ α³M³,
where α is the number of points per wavelength. Appropriate values for α are strongly dependent on the order of the method and may also depend on T. It will also be proportional to τ^{−1/p} for a pth-order method. In what follows I will suppress the α dependence, but it should then be kept in mind that in some instances α can be fairly large, and some methods will allow coarser representations when evaluating the boundary conditions. Similarly, we will treat logarithmic dependences on the tolerance as O(1) constants. I have tried to keep the analysis as simple as possible, ignoring possible improvements in the complexity estimates that might be achievable by better implementations. Of course I hope that in the future this analysis will be supplemented by serious computational experimentation.

Domain extension

By far the simplest way to achieve an accurate solution is to exploit the finite signal speed and extend the domain so that the boundary cannot influence the solution in T for times less than T. Clearly this requires an extension of width O(T). For the first problem the volume of the extended region is proportional to T, so that the extra work and storage, which we shall always denote by W_B and S_B, are given by

  W_B ∝ M⁴T²,   S_B ∝ M³T.
We see that the work and storage exceed those required by the interior scheme by a factor of T, which is clearly unacceptable for moderate to large times. The results for the second problem are even worse, as the volume of the extension is proportional to T³, that is,

  W_B ∝ M⁴T⁴,   S_B ∝ M³T³,

a factor of T³ above the interior scheme.

Kirchhoff's formula

Here we assume two spherical boundaries separated by a small distance. To compute the solution at each point on the outer boundary requires the computation of an integral over a sphere. This involves O(M⁴) work per time-step. Although it is reasonable to assume that the integration can be carried out on a coarser grid than required by the solution of the wave equation, so that the constant of proportionality may be small, the order estimates are

  W_B ∝ M⁵T,   S_B ∝ M³.

We see that the storage required is comparable to (probably less than) S_I but that the work is greater by a factor of M.

Direct implementation of the planar exact condition

Here we require direct and inverse Fourier transforms at each time step as well as the solution of the convolutional Volterra equation for each mode. Making use of the FFT, the work associated with the transformations is seen to be O(M² ln M) per time-step. For the convolution we may use the algorithm presented in Hairer et al. (1985), which requires O(MT ln² MT) operations per mode. As for storage, the direct implementation requires full storage of the time histories of the Fourier coefficients. This could probably be reduced somewhat using the t^{−3/2} decay of the convolution kernel, K(t), but I have not quantified the effect. Therefore we have

  W_B ∝ M³T ln² MT,   S_B ∝ M³T.

Except for extraordinarily long times, W_B compares favourably with W_I. However, we generally have S_B > S_I, possibly much greater for large T.

Direct implementation of the spherical exact condition

Here I will consider the completely local version (3.5). The implementations of Grote and Keller (1995, 1996) and Sofronov (1999) require an additional direct and inverse spherical harmonic transform per time-step. Using the fast algorithms of Driscoll et al. (1997) and Mohlenkamp (1997), this requires
O(M² ln² M) work per step and does not increase the order of the complexity estimate. We require, then, the solution of M additional equations on the boundary, associated with M auxiliary variables. This costs

  W_B ∝ M⁴T,   S_B ∝ M³.

Here we have, taking account of the probable smaller proportionality constants due to the effect of α, W_B < W_I, S_B < S_I: the first method we have seen that meets our goals of arbitrary accuracy without increase in cost!

Pade approximants

Suppressing the weak dependence on the error tolerance, we require O(MT) auxiliary functions and equations on the boundary. The cost then is

  W_B ∝ M⁴T²,   S_B ∝ M³T.

This is more by a factor of T than what is required by the interior solver, and so is unacceptable for long time computations. We note that in our numerical experiments the growth in the number of terms required as T increases was fairly mild, so that we expect the proportionality constants to be small. High-order conditions based on the Pade approximants have been implemented on rectangular domains. This depends on remarkable constructions of corner compatibility conditions given by Collino (1993) and Vacus (1996). It would be of interest to extend the error estimates to this case. It is conceivable that they will be somewhat better, as glancing modes which reflect off one boundary may be effectively absorbed at near normal incidence by another.

Asymptotic boundary conditions

In this case we have no proven error estimates. Accepting the optimistic assumption that we must expand the domain by a factor of τ^{−1/(2p)}, we find

  W_B ∝ (τ^{−1/(2p)} − 1)M⁴T + pM³T,   S_B ∝ (τ^{−1/(2p)} − 1)M³ + pM².

These estimates are optimized by p ∝ √(M ln(1/τ)), which leads to work and storage estimates that are better than those obtained for the exact condition, p = M, in some cases. Clearly it would be of interest to make the error estimates precise.
Uniform rational approximants

These have been constructed for both planar and spherical boundaries, and hence are directly applicable to both problems. In the first case we require O(ln² MT) auxiliary functions per mode on the boundary, and direct and inverse Fourier transforms each time-step. Therefore the work and storage required are

  W_B ∝ M³T ln² MT,   S_B ∝ M² ln² MT.

Except for extraordinarily long times, we have W_B ≪ W_I and S_B ≪ S_I. Similar estimates hold in the second case; in particular,

  S_B ∝ M² ln² M.

Theoretically, then, the uniform approximants represent a completely acceptable solution to the boundary condition problem in the constant coefficient case for exterior problems, as they provide essentially arbitrary accuracy for arbitrary times with W_B ≪ W_I and S_B ≪ S_I. There are some practical issues concerning the efficiency of the fast spherical harmonic transforms and the necessity of using an aspect ratio one computational domain.

Reflectionless sponge layers

Here there are two parameters, the layer width d and the average absorption, σ̄. Clearly, the number of mesh points in the layer will scale like dM³ in the planar case. Also, following the analysis in Collino and Monk (1998), the mesh spacing must scale inversely with σ̄. Hence we will take σ̄ fixed and reduce the error by increasing d. As d ∝ √T this implies

  W_B ∝ M⁴T^{3/2},   S_B ∝ M³T^{1/2},

which is unacceptable for large T. Most practical applications of this technique involve exterior problems with a computational domain which is a box. Therefore, I believe it would be of great interest to extend the error estimates to this case. If the time dependence of the errors were as bad as in the planar case, then the long time behaviour would be even worse, as the volume of the layer would grow like d³ ∝ T^{3/2}. However, there is some reason to believe that far better estimates hold. This is due to the fact that glancing waves at one boundary are nearly normally incident to another, and hence should be effectively absorbed.
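The order estimates collected in this subsection can be compared directly. Constants and the α-dependence are suppressed, so only the relative scalings are meaningful:

```python
import math

# Relative comparison of the boundary work estimates W_B gathered above
# (constants suppressed; c = 1), against the interior cost W_I.
def work(method, M, T):
    L2 = math.log(M * T) ** 2
    return {
        'interior':        M**4 * T,
        'domain_ext':      M**4 * T**2,
        'kirchhoff':       M**5 * T,
        'planar_exact':    M**3 * T * L2,
        'spherical_exact': M**4 * T,
        'pade':            M**4 * T**2,
        'uniform':         M**3 * T * L2,
        'sponge':          M**4 * T**1.5,
    }[method]

M, T = 512, 100.0
# uniform rational approximants: boundary work far below the interior work
assert work('uniform', M, T) < work('interior', M, T)
# domain extension and Pade approximants cost an extra factor of T
assert abs(work('domain_ext', M, T) / work('interior', M, T) - T) < 1e-9
```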
An alternative for exterior problems is the spherical layer developed in Petropoulos (1999). I believe that this layer could be directly analysed as a straightforward extension of the analysis given here. However, for problems of near unit aspect ratio, it is doubtful that a layer technique could be more efficient than the uniform rational approximants.

3.8. A numerical example

We now consider a simply described yet, as we shall see, difficult-to-solve concrete problem to illustrate some of our results. In particular, we solve the initial value problem for the wave equation in the planar region (x, y) ∈ (−1, 1) × (0, 1), assuming periodicity in y. An exact solution, u(x, y, t), is constructed by setting off periodic arrays of pulses at various negative times. This leads to

  u(x, y, t) = Σ_i u_i(x, y, t),

where

  u_i(x, y, t) = Σ_{k=−∞}^∞ ∫_{−∞}^{t − r_k} ψ_i(s) / √((t − s)² − r_k²) ds,

and

  r_k² = (x − x_i)² + (y − y_i − k)²,   ψ_i(s) = A_i e^{−μ_i (s − τ_i)²}.
Here, τ_i < 0 is chosen so that A_i e^{−μ_i τ_i²} is negligibly small. At t = 0 the solution is made negligibly small (to more than 11 digits) outside T. A program that accurately evaluates u using high-order Gaussian quadrature and high-order end-point corrected trapezoidal formulas for singular integrals was generously provided by Leslie Greengard, and is used to produce the error tables listed below. I am confident that the accuracy of the evaluation is on the order of ten decimal digits and, hence, far exceeds that of the numerical solutions. These solutions were used to test Pade approximants of K in numerical solutions of the linearized Euler equations (Hagstrom and Goodrich 1998) and will be used in the extensive numerical experiments to be presented in Alpert et al. (1999a). Here we consider a single pulse with parameters:

  μ₁ = 150,   τ₁ = −1/2,   x₁ = 0,   y₁ = 1/2,   A₁ = 1.

Three types of approximate boundary conditions were considered: the Pade sequence, using (3.12) with R₀ = 0, and its generalization, (3.12) with R₀ = 1; a strongly well-posed local implementation of the PML; and the planar uniform approximant computed with τ = 10⁻³ and η = 10⁻⁴. The latter employs 21 poles. The approximate conditions are imposed at x = ±1. We
Table 2. Errors using (3.12)

  R₀    p    Max. err. t ≤ 5   Max. err. t ≤ 25   Max. err. t ≤ 50
  0     0    1.2 × 10⁻¹        2.1 × 10⁻¹         2.1 × 10⁻¹
  0     5    9.9 × 10⁻³        7.0 × 10⁻²         7.6 × 10⁻²
  0    10    1.2 × 10⁻³        3.3 × 10⁻²         5.5 × 10⁻²
  0    15    2.6 × 10⁻⁴        1.9 × 10⁻²         3.7 × 10⁻²
  0    20    1.5 × 10⁻⁴        1.0 × 10⁻²         2.9 × 10⁻²
  0    25    1.5 × 10⁻⁴        6.4 × 10⁻³         1.9 × 10⁻²
  0    30    1.5 × 10⁻⁴        3.5 × 10⁻³         1.4 × 10⁻²
  1     5    8.9 × 10⁻³        6.9 × 10⁻²         7.7 × 10⁻²
  1    10    6.6 × 10⁻⁴        3.2 × 10⁻²         5.5 × 10⁻²
  1    15    1.6 × 10⁻⁴        1.7 × 10⁻²         3.5 × 10⁻²
  1    20    1.5 × 10⁻⁴        8.8 × 10⁻³         2.6 × 10⁻²
  1    25    1.5 × 10⁻⁴        4.9 × 10⁻³         1.8 × 10⁻²
  1    30    1.5 × 10⁻⁴        2.5 × 10⁻³         1.1 × 10⁻²
use a fourth-order explicit two-step method as our basic solver on a 200 × 100 uniform mesh. This provides a relative error less than 10⁻³ for 0 ≤ t ≤ 50. The boundary conditions and/or layer equations were also approximated to fourth order. In all cases we compute the relative L²-errors on a uniform 50 × 25 mesh. Complete details on our discretization techniques will be given in Alpert et al. (1999a).

Pade approximants and generalizations

Results of these experiments are summarized in Table 2. I ran experiments with p = 0–30 and the initializations R₀ = 0 and R₀ = 1. The results are consistent with the error estimates. Generally, the largest errors occurred near t = 50, and we are unable to achieve an error at the level of the discretization error at this late time with p ≤ 30. We are, on the other hand, fairly close to this goal at t = 25. The errors for the second sequence, R₀ = 1, are generally slightly smaller than those obtained using the Pade sequence. This sequence also has the advantage of being exact at steady state. Finally, we note that the short time error of 1.5 × 10⁻⁴ is the best one can do for the problem at hand and the mesh I have used. Indeed, it is the same error found if one uses the exact solution as a Dirichlet condition at the artificial boundary.
Table 3. Uniform rational approximation coefficients, K ≈ Σ_j a_j/(z − b_j), τ = 10⁻³, η = 10⁻⁴. Complex numbers are written in the form (real, imaginary)

  j       a_j                                               b_j
  1,2     (-.2410467618025768E-6, ±.2431987763837349E-6)    (-.4998142304334231E-4, ±.9999998607359947)
  3,4     (-.1617695923999794E-5, ±.1638622585172068E-5)    (-.2501648855535112E-3, ±.9999990907954994)
  5,6     (-.7723476507531262E-5, ±.7878743138182415E-5)    (-.8021925048752190E-3, ±.9999958082358295)
  7,8     (-.3400304516975200E-4, ±.3510673092397324E-4)    (-.2263515963206483E-2, ±.9999820162287431)
  9,10    (-.1454893381589074E-3, ±.1535469093409158E-3)    (-.6112737916031916E-2, ±.9999224860282032)
  11,12   (-.6104572904148162E-3, ±.6733883694898616E-3)    (-.1625071664643320E-1, ±.9996497460330479)
  13,14   (-.2473202929583869E-2, ±.3011442350813045E-2)    (-.4295328074381198E-1, ±.9982864080633248)
  15,16   (-.8964957513027030E-2, ±.1398751873403249E-1)    (-.1129636068874967, ±.9907617913485537)
  17,18   (-.1846252520037211E-1, ±.6565858806543060E-1)    (-.2902222956062986, ±.9462036470847180)
  19,20   (.9181095934161065E-1, ±.2076825633238755)        (-.6548034445533449, ±.7077228221122372)
  21      (.3787484004895032, 0)                            (-.9345542777004186, 0)
Table 4. Errors using the uniform rational approximant

Number of poles    Max. err. t ≤ 5    Max. err. t ≤ 25    Max. err. t ≤ 50
21                 1.5 × 10⁻⁴         4.1 × 10⁻⁴          5.3 × 10⁻⁴
Uniform approximants. For all Fourier modes we use the approximant determined by a tolerance of τ = 10⁻³ and an offset parameter of η = 10⁻⁴. This yields a 21-pole approximation with parameters listed in Table 3. Note that for our mesh we certainly have M ≤ 100, so that η < (MT)⁻¹. Hence the error due to the boundary condition is smaller than τ. The results, summarized in Table 4, are very encouraging. For all time intervals considered, the accuracy is the best that can be obtained on the given mesh, as determined by using exact Dirichlet data at the boundaries. Given the modest number of poles used, the efficiency of this approach is unparalleled, as predicted by the complexity analysis. We note that the data in this case was generously provided by Brad Alpert. More comprehensive experiments and detailed discussions of the numerical algorithms will be given in Alpert et al. (1999a).
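The complexity advantage of a pole representation is easy to see in code: a kernel of the form K(t) = Σ_j α_j e^{β_j t} can be convolved with boundary data recursively, with one scalar update per pole per time step (O(N) total work), instead of by direct summation over the entire history (O(N²)). The sketch below uses two made-up pole/coefficient pairs, not the Table 3 values:

```python
import math, cmath

# Hypothetical pole/coefficient pairs for illustration (NOT the Table 3 data).
poles  = [complex(-0.5,  1.0), complex(-0.5, -1.0)]
coeffs = [complex( 0.3,  0.2), complex( 0.3, -0.2)]

dt, N = 0.01, 1000
u = [math.sin(n * dt) for n in range(N)]   # sampled boundary data

# Recursive O(N) evaluation: each pole carries one ODE state
#   y_j(t) = int_0^t exp(b_j (t - tau)) u(tau) dtau,
# updated exactly for piecewise-constant u on each step.
y = [0j, 0j]
recursive = []
for n in range(N):
    recursive.append(sum(a * yj for a, yj in zip(coeffs, y)).real)
    for j, b in enumerate(poles):
        e = cmath.exp(b * dt)
        y[j] = e * y[j] + (e - 1.0) / b * u[n]

# Direct O(N^2) evaluation of the same discrete convolution, for checking.
def direct(n):
    s = 0j
    for m in range(n):
        for a, b in zip(coeffs, poles):
            s += a * (cmath.exp(b * (n - m) * dt)
                      - cmath.exp(b * (n - m - 1) * dt)) / b * u[m]
    return s.real
```

The two evaluations agree to roundoff; only the recursive one is practical inside a time-stepping loop, which is why a modest number of poles makes the exact-condition approach competitive with the interior solver.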
98
T. HAGSTROM
Reflectionless sponge layer. Results of our experiments with the reflectionless sponge layer or PML are summarized in Table 5. I must admit from the outset that the great variety of possible layers, determined by different absorption profiles, widths, and local realizations, makes it difficult to claim that any particular experiment is definitive or comprehensive. Of interest in this regard is the paper of Turkel and Yefet (1998), where a variety of different formulations are compared for the same problem.

Equations (3.18)-(3.20) are discretized to fourth order, with an implicit treatment of the absorption terms. We use a quartic rather than the usual quadratic variation of σ. Periodic boundary conditions are used at the terminating point. Listed in the table are the maximum value of σ, which we varied between 10 and 80, and the total number of points in each layer, n_l, which we varied between 25 and 75. For values of σ_max too large in comparison with the layer width, in particular for the 25-point layer and σ_max = 40, 80, the solution grew and the errors became large. I do not list these results here.

The results of this experiment are in basic agreement with the error estimates. This can be verified by checking the error ratios. In absolute terms, however, the performance is somewhat disappointing. The best results are achieved for the thickest layer and σ_max = 40, and almost reach the minimum possible error for t ≤ 5. However, the long time results are worse than those obtained with the other methods. Improvements might be achieved by changing the discretization. I was unable to take σ very large, for fixed mesh spacing, without encountering asymptotic instabilities.

3.9. Approximations in the dissipative case
In the preceding sections we studied many well-developed techniques for approximating radiation boundary conditions in the hyperbolic case. In contrast, much less has been done for equations of mixed type. It is an interesting question whether the highly efficient rational approximants we have considered can be extended to these cases. As the kernels to be approximated often involve the same analytic functions, with arguments restricted to subdomains of the domains of interest in the hyperbolic case, one is tempted to conjecture that it is possible. However, there is the complicating factor that the transforms do not behave like rational functions at infinity. For problems with small diffusion coefficients or viscosities, rational approximants in the spirit above have been proposed and analysed by Halpern (1986) for the scalar advection-diffusion equation, and by Halpern and
Table 5. Errors using (3.17): maximum errors for t ≤ 5, t ≤ 25 and t ≤ 50, for n_l = 25 with σ_max = 10, 20 and for n_l = 50, 75 with σ_max = 10, 20, 40, 80; the tabulated errors range from O(10⁻⁴) to O(10⁻¹).
Schatzman (1989) for the linearized incompressible Navier-Stokes equations. Error estimates in the small parameters are given.

It is also possible to make use of the special properties of dissipative problems to derive simple boundary conditions with some provable accuracy. One can exploit the differing decay rates of different modes to identify 'dominant' wave groups in the far field, and use boundary conditions that interpolate the exact conditions at these locations in wave number space. This procedure is developed in Hagstrom (1991a). Asymptotic error estimates are thereby derived, leading to an error, for planar boundaries, which decays in the general case like L⁻¹, where L is the domain length. In Hagstrom (1991b) these ideas are applied to the incompressible Navier-Stokes equations in a channel, producing conditions which seem reasonably accurate at moderate Reynolds numbers.

An interesting class of problems that can be accurately solved using simple boundary conditions are singularly perturbed hyperbolic systems at boundaries with no incoming characteristics. The simplest model of such a problem is the scalar advection-diffusion equation (2.43) at outflow (U₁ > 0) under the assumption ν ≪ 1. We then notice that the 'incoming' solution,

    u = E e^{st + ik·y + λx},    λ = U₁/ν + O(1),

is of boundary layer type, as λ⁻¹ = O(ν).
Therefore, any error we make at the boundary very rapidly decays into the interior. From a numerical perspective, however, the boundary layer may lead to large errors for an unrefined mesh - and one certainly does not want to refine an unphysical boundary layer at an artificial boundary. The amplitude of the layer can be decreased through the use of extrapolation conditions. Namely, we impose

    ∂ʳu/∂xʳ = 0.

The error associated with this boundary condition can again be explicitly analysed, using the exact solution near the boundary given in (2.44).
Clearly, for s + iU_tan·k + ν|k|² not too large, the error is O(νʳ) and r x-derivatives are bounded independent of ν. In the large s, |k| regime the prefactor is not small, but generally the solution amplitude, A, is exponentially small, as its exponent has an O(ν⁻¹) negative real part. Precise arguments and error estimates, including general boundaries and variable coefficients, are given in Loheac (1991). Nordstrom (1995) and Nordstrom (1997) established the accuracy of extrapolation boundary conditions at supersonic outflow for the compressible Navier-Stokes system. Surprisingly, these results extend to the subsonic case, where there is an incoming characteristic, if there are large transverse solution gradients. This result is applicable when physical boundary layers intersect the artificial boundary. Extrapolation conditions can also be used for the incompressible Navier-Stokes equations, as in Johansson (1993).

4. Conclusions and open problems

Our fundamental conclusion, amply demonstrated by the theory and fully supported by our as yet sparse numerical experimentation, is that the basic constant coefficient equations of wave theory on unbounded domains with sufficiently simple tails can be accurately solved at essentially the same cost as solving a standard problem on the bounded subdomain of interest. Excepting the case of computational domains of high aspect ratio, which I will discuss further below, the uniform rational approximants seem to provide an ideal solution. In particular, the work associated with their application is generally less than that required by the interior solver. However, for spherical boundaries, direct implementations of the exact condition are only slightly less efficient and, using the local form, extremely easy to implement. For short to moderate time calculations, local conditions based on high
degree Pade approximants or the use of reflectionless sponge layers are also acceptable. Despite the remarkable progress made over the past few years, there are still many important problems which remain unsolved. Below I will mention those which I think are most important, along with some speculations concerning their possible solution.

High aspect ratio domains. For exterior problems, our best techniques require the use of a spherical artificial boundary. In the most extreme cases, for example scattering from a body with two small dimensions such as a wire, we are required to use a computational domain whose volume is greater by a factor of the aspect ratio squared than the potential domain of interest. This is clearly undesirable. One possible solution for short to moderate times is to use a long cylindrical domain with simple boundary conditions at the ends, but such a procedure becomes inefficient as the time becomes large. More desirable would be the development of efficient approximations to the exact boundary condition on a family of domains that includes domains of high aspect ratio. The obvious candidates here are prolate and oblate spheroids, as the wave equation is separable in the associated coordinate system. From our point of view, the primary difference between this case and those we have treated is the lack of scale invariance of the boundary. This leads to the dependence on the Laplace transform parameter, s, of the angular eigenfunctions of the exact boundary operator. If we choose to expand in a fixed basis, such as the spherical harmonics, the temporally nonlocal part of the operator is no longer a diagonal matrix. Naturally, this complicates its approximation, but I do not believe that it precludes the existence of effective uniform approximants. Therefore, the detailed study of this operator seems worthwhile. A distinct approach to its elucidation and approximation may be via multipole expansions of the solution.
We have seen that these may be used to express the exact condition on a sphere. They have recently been studied for frequency domain problems in the spheroidal case by Holford (1999).

Planar boundary conditions on a rectangular box. Another possibility for constructing high aspect ratio domains is the use of rectangular boxes and either the Pade approximants or a reflectionless sponge layer. In light of the bad long time behaviour of our error estimates and computational experiments on periodic domains, this would seem an unlikely solution. However, there are suggestive arguments that better estimates might hold on box domains. Therefore, I believe that a rigorous error analysis for these boundary conditions on boxes should be developed.

Anisotropic systems in exterior domains. As mentioned in our discussion of the linearized compressible Euler equations, we have no general construction of exact boundary conditions for anisotropic problems on boundaries which include characteristic points. Of course a general theory, which might be based on the Riemann function, would be most desirable. Failing that, a particular solution in the gas dynamics case would have many applications. Such a solution would probably follow easily from the construction of exact conditions for the convective wave equation.
Approximation of exact conditions for problems of mixed order. As we have seen, it is straightforward to develop expressions for exact boundary conditions in these cases. However, the associated theory of rational approximations is essentially undeveloped. Progress on this front could be very useful, particularly for nondissipative problems such as equations of Schrodinger or Boussinesq type.

Variable coefficients. Many important problems in wave theory involve propagation in inhomogeneous media. Examples include aeroacoustics, underwater acoustics, and seismics. Unfortunately, the theory as now developed says little about such problems, except in some very special cases mentioned above. A natural starting point is the wave equation in a stratified medium. A reasonable program is to characterize the exact boundary condition and attempt to construct approximations. However, this will again be a case where the eigenfunctions of the exact operator depend on the Laplace transform parameter, so it is not clear how far one can go with our most successful approach. It may prove easier to construct reflectionless sponge layers, but this is again an open question. It is worth remembering that some of the earliest work on numerical radiation boundary conditions, namely Engquist and Majda (1977, 1979), made use of geometrical optics and, hence, was naturally extensible to variable coefficients. The existence of a limiting operator could even be proven. However, the use of this theory to characterize the accuracy, as carried out by Halpern and Rauch (1987), depends on the assumption that the wave field is dominated by high frequencies. An intriguing alternative, suggested by the results of Radvogin and Zaitsev (1998), is to use a coordinate system in the exterior which is characteristics-based, in the hope that it will allow for a significant coarsening of the mesh and, hence, extension of the computational domain.
Geometrical optics again seems the logical tool for assessing this approach. In comparison with the problems mentioned above, I believe that a satisfactory general treatment of problems with variable coefficients will prove to be the most difficult. Nonetheless, good results just for some special cases, such as stratified media or perturbations thereof, would be quite useful.
REFERENCES
S. Abarbanel and D. Gottlieb (1997), 'A mathematical analysis of the PML method', J. Comput. Phys. 134, 357-363.
M. Abramowitz and I. Stegun, eds (1972), Handbook of Mathematical Functions, Dover, New York.
B. Alpert, L. Greengard and T. Hagstrom (1999a), 'Accurate solution of the wave equation on unbounded domains'. In preparation.
B. Alpert, L. Greengard and T. Hagstrom (1999b), 'Rapid evaluation of nonreflecting boundary kernels for time-domain wave propagation', SIAM J. Numer. Anal. To appear.
C. R. Anderson (1992), 'An implementation of the fast multipole method without multipoles', SIAM J. Sci. Statist. Comput. 13, 923-947.
A. Barry, J. Bielak and R. MacCamy (1988), 'On absorbing boundary conditions for wave propagation', J. Comput. Phys. 79, 449-468.
A. Bayliss and E. Turkel (1980), 'Radiation boundary conditions for wave-like equations', Comm. Pure Appl. Math. 33, 707-725.
J.-P. Berenger (1994), 'A perfectly matched layer for the absorption of electromagnetic waves', J. Comput. Phys. 114, 185-200.
P. Bettess (1992), Infinite Elements, Penshaw Press, Sunderland, UK.
W. Chew and W. Weedon (1994), 'A 3-D perfectly matched medium from modified Maxwell's equations with stretched coordinates', Microwave Optical Technol. Lett. 7, 599-604.
F. Collino (1993), Conditions d'ordre eleve pour des modeles de propagation d'ondes dans des domaines rectangulaires, Technical Report 1790, INRIA.
F. Collino and P. Monk (1998), 'Optimizing the perfectly matched layer'. Preprint.
L. Demkowicz and K. Gerdes (1999), 'Convergence of the infinite element methods for the Helmholtz equation in separable domains', Numer. Math. To appear.
G. Doetsch (1974), Introduction to the Theory and Application of the Laplace Transformation, Springer, New York.
J. Driscoll, D. Healy and D. Rockmore (1997), 'Fast discrete polynomial transforms with applications to data analysis for distance transitive graphs', SIAM J. Comput. 26, 1066-1099.
B. Engquist and A. Majda (1977), 'Absorbing boundary conditions for the numerical simulation of waves', Math. Comput. 31, 629-651.
B. Engquist and A. Majda (1979), 'Radiation boundary conditions for acoustic and elastic wave calculations', Comm. Pure Appl. Math. 32, 313-357.
A. Eringen and E. Suhubi (1975), Elastodynamics, Vol. 2, Academic Press, New York.
T. Geers (1998), Benchmark problems, in Computational Methods for Unbounded Domains (T. Geers, ed.), Kluwer Academic Publishers, Dordrecht, Netherlands, pp. 1-10.
M. Giles (1990), 'Nonreflecting boundary conditions for Euler equation calculations', AIAA Journal 28, 2050-2058.
D. Givoli (1991), 'Non-reflecting boundary conditions', J. Comput. Phys. 94, 1-29.
D. Givoli (1992), Numerical Methods for Problems in Infinite Domains, Vol. 33 of Studies in Applied Mechanics, Elsevier, Amsterdam.
D. Givoli and D. Kohen (1995), 'Non-reflecting boundary conditions based on Kirchhoff-type formulae', J. Comput. Phys. 117, 102-113.
J. Goodrich and T. Hagstrom (1999), 'High-order radiation boundary conditions for computational aeroacoustics'. In preparation.
L. Greengard and P. Lin (1998), 'On the numerical solution of the heat equation on unbounded domains (Part I)'. Preprint.
L. Greengard and V. Rokhlin (1997), A new version of the fast multipole method for the Laplace equation in three dimensions, in Acta Numerica, Vol. 6, Cambridge University Press, pp. 229-269.
M. Grote and J. Keller (1995), 'Exact nonreflecting boundary conditions for the time dependent wave equation', SIAM J. Appl. Math. 55, 280-297.
M. Grote and J. Keller (1996), 'Nonreflecting boundary conditions for time dependent scattering', J. Comput. Phys. 127, 52-81.
M. Grote and J. Keller (1998), Exact nonreflecting boundary conditions for elastic waves, Technical Report 1998-08, ETH, Zurich.
M. Grote and J. Keller (1999), 'Nonreflecting boundary conditions for Maxwell's equations', J. Comput. Phys. To appear.
B. Gustafsson and H.-O. Kreiss (1979), 'Boundary conditions for time-dependent problems with an artificial boundary', J. Comput. Phys. 30, 333-351.
T. Hagstrom (1983), Reduction of Unbounded Domains to Bounded Domains for Partial Differential Equation Problems, PhD thesis, California Institute of Technology.
T. Hagstrom (1991a), 'Asymptotic boundary conditions for dissipative waves: General theory', Math. Comput. 56, 589-606.
T. Hagstrom (1991b), 'Conditions at the downstream boundary for simulations of viscous, incompressible flow', SIAM J. Sci. Statist. Comput. 12, 843-858.
T. Hagstrom (1995), On the convergence of local approximations to pseudodifferential operators with applications, in Proc. 3rd Int. Conf. on Math. and Num. Aspects of Wave Prop. Phen. (E. Becache, G. Cohen, P. Joly and J. Roberts, eds), SIAM, pp. 474-482.
T. Hagstrom (1996), On high-order radiation boundary conditions, in IMA Volume on Computational Wave Propagation (B. Engquist and G. Kriegsmann, eds), Springer, New York, pp. 1-22.
T. Hagstrom and J. Goodrich (1998), 'Experiments with approximate radiation boundary conditions for computational aeroacoustics', Appl. Numer. Math. 27, 385-402.
T. Hagstrom and S. Hariharan (1996), Progressive wave expansions and open boundary problems, in IMA Volume on Computational Wave Propagation (B. Engquist and G. Kriegsmann, eds), Springer, New York, pp. 23-43.
T. Hagstrom and S. Hariharan (1998), 'A formulation of asymptotic and exact boundary conditions using local operators', Appl. Numer. Math. 27, 403-416.
T. Hagstrom and H. B. Keller (1986), 'Exact boundary conditions at an artificial boundary for partial differential equations in cylinders', SIAM J. Math. Anal. 17, 322-341.
T. Hagstrom and J. Lorenz (1994), Boundary conditions and the simulation of low Mach number flows, in Proceedings of the First International Conference on Theoretical and Computational Acoustics (D. Lee and M. Schultz, eds), World Scientific, Singapore, pp. 657-668.
E. Hairer, C. Lubich and M. Schlichte (1985), 'Fast numerical solution of nonlinear Volterra convolution equations', SIAM J. Sci. Statist. Comput. 6, 532-541.
L. Halpern (1986), 'Artificial boundary conditions for the linear advection diffusion equation', Math. Comput. 46, 425-438.
L. Halpern (1991), 'Artificial boundary conditions for incompletely parabolic perturbations of hyperbolic systems', SIAM J. Math. Anal. 22, 1256-1283.
L. Halpern and J. Rauch (1987), 'Error analysis for absorbing boundary conditions', Numer. Math. 51, 459-467.
L. Halpern and J. Rauch (1995), 'Absorbing boundary conditions for diffusion equations', Numer. Math. 71, 185-224.
L. Halpern and M. Schatzman (1989), 'Artificial boundary conditions for viscous incompressible flows', SIAM J. Math. Anal. 20, 308-353.
S. He and V. Weston (1996), Wave-splitting and absorbing boundary conditions for Maxwell's equations on a curved surface, Technical Report TRITA-TET96-14, KTH, Stockholm.
R. Higdon (1986), 'Absorbing boundary conditions for difference approximations to the multidimensional wave equation', Math. Comput. 47, 437-459.
R. Higdon (1987), 'Numerical absorbing boundary conditions for the wave equation', Math. Comput. 49, 65-90.
R. Higdon (1991), 'Absorbing boundary conditions for elastic waves', Geophysics 56, 231-254.
R. Higdon (1992), 'Absorbing boundary conditions for acoustic and elastic waves in stratified media', J. Comput. Phys. 101, 386-418.
R. Higdon (1994), 'Radiation boundary conditions for dispersive waves', SIAM J. Numer. Anal. 31, 64-100.
R. Holford (1999), 'A multipole expansion for the acoustic field exterior to a prolate or oblate spheroid'. Submitted to J. Acoust. Soc. Amer.
C. Johansson (1993), 'Boundary conditions for open boundaries for the incompressible Navier-Stokes equations', J. Comput. Phys. 105, 233-251.
T. Kato (1976), Perturbation Theory for Linear Operators, Springer, New York.
H.-O. Kreiss and J. Lorenz (1989), Initial-Boundary Value Problems and the Navier-Stokes Equations, Academic Press, New York.
E. Lindman (1975), 'Free space boundary conditions for the time dependent wave equation', J. Comput. Phys. 18, 66-78.
J.-P. Loheac (1991), 'An artificial boundary condition for an advection-diffusion equation', Math. Meth. Appl. Sci. 14, 155-175.
D. Ludwig (1960), 'Exact and asymptotic solutions of the Cauchy problem', Comm. Pure Appl. Math. 13, 473-508.
M. Mohlenkamp (1997), 'A fast transform for spherical harmonics'. Preprint.
R. Newton (1966), Scattering Theory of Waves and Particles, McGraw-Hill, New York.
J. Nordstrom (1995), 'Accurate solutions of the Navier-Stokes equations despite unknown outflow boundary data', J. Comput. Phys. 120, 184-205.
J. Nordstrom (1997), 'On extrapolation procedures at artificial outflow boundaries for the time-dependent Navier-Stokes equations', Appl. Numer. Math. 23, 457-468.
F. Oberhettinger and L. Badii (1970), Tables of Laplace Transforms, Springer, New York.
F. Olver (1954), 'The asymptotic expansion of Bessel functions of large order', Philos. Trans. Royal Soc. London A247, 328-368.
P. Petropoulos (1999), 'Reflectionless sponge layers as absorbing boundary conditions for the numerical solution of Maxwell's equations in rectangular, cylindrical and spherical coordinates'. Submitted to SIAM J. Appl. Math.
Y. Radvogin and N. Zaitsev (1998), Absolutely transparent boundary conditions for time-dependent wave problems, in Seventh International Conference on Hyperbolic Problems.
A. Ramm (1986), Scattering by Obstacles, D. Reidel, Dordrecht, Netherlands.
V. Rokhlin (1990), 'Rapid solution of integral equations of scattering theory in two dimensions', J. Comput. Phys. 86, 414-439.
V. Ryaben'kii (1985), 'Boundary equations with projections', Russian Math. Surveys 40, 147-183.
M. Schwartz (1987), Principles of Electrodynamics, Dover, New York.
I. Sofronov (1993), 'Conditions for complete transparency on the sphere for the three-dimensional wave equation', Russian Acad. Sci. Dokl. Math. 46, 397-401.
I. Sofronov (1999), 'Artificial boundary conditions of absolute transparency for two- and three-dimensional external time-dependent scattering problems', Euro. J. Appl. Math. To appear.
L. Ting and M. Miksis (1986), 'Exact boundary conditions for scattering problems', J. Acoust. Soc. Amer. 80, 1825-1827.
L. Trefethen and L. Halpern (1986), 'Well-posedness of one-way wave equations and absorbing boundary conditions', Math. Comput. 47, 421-435.
L. Trefethen and L. Halpern (1988), 'Wide-angle one-way wave equations', J. Acoust. Soc. Amer. 84, 1397-1404.
S. Tsynkov (1998), 'Numerical solution of problems on unbounded domains. A review', Appl. Numer. Math. 27, 465-532.
E. Turkel and A. Yefet (1998), 'Absorbing PML boundary layers for wave-like equations', Appl. Numer. Math. 27, 533-557.
O. Vacus (1996), Singularites de frontiere et conditions limites absorbantes: le probleme du coin, Technical Report 2851, INRIA.
L. Xu and T. Hagstrom (1999), 'On convergent sequences of approximate radiation boundary conditions and reflectionless sponge layers'. In preparation.
Acta Numerica (1999), pp. 107-141
© Cambridge University Press, 1999
Numerical methods in tomography
Frank Natterer
Institut für Numerische und Instrumentelle Mathematik, Universität Münster, Einsteinstrasse 62, D-48149 Münster, Germany
E-mail: natterer@math.uni-muenster.de
In this article we review the image reconstruction algorithms used in tomography. We restrict ourselves to the standard problems in the reconstruction of a function from line or plane integrals as they occur in X-ray tomography, nuclear medicine, magnetic resonance imaging, and electron microscopy. Nonstandard situations, such as incomplete data, unknown orientations, local tomography, and discrete tomography are not dealt with. Nor do we treat nonlinear tomographic techniques such as impedance, ultrasound, and near-infrared imaging.
CONTENTS
1 Introduction 107
2 The filtered backprojection algorithm 109
3 3D reconstruction formulas 121
4 Iterative methods 127
5 Circular harmonic algorithms 131
6 Fourier reconstruction 134
7 Conclusions 138
References 139
1. Introduction

By 'tomography' we mean a technique for imaging 2D cross-sections of 3D objects. It is derived from the Greek word τομός (= slice). Tomographic techniques are used in radiology and in many branches of science and technology.
1.1. The basic example

In the simplest case, let us consider an object whose attenuation coefficient with respect to X-rays at the point x is f(x). We scan the cross-section by a thin X-ray beam L of unit intensity. The intensity past the object is

    e^{−∫_L f(x) dx}.

This intensity is measured, providing us with the line integral

    g(L) = ∫_L f(x) dx.    (1.1)
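As a concrete illustration of this measurement model (the phantom, beam geometry and quadrature below are our own choices), one can simulate the transmitted intensity for a disk-shaped object and recover the line integral (1.1) from its logarithm:

```python
import math

def f(x, y):
    # made-up phantom: attenuation 1 inside a disk of radius 0.5, else 0
    return 1.0 if x * x + y * y <= 0.25 else 0.0

def intensity(s, n=4000):
    # transmitted intensity along the vertical line x = s:
    # I = exp(-integral_L f), approximated by the midpoint rule on [-1, 1]
    h = 2.0 / n
    integral = sum(f(s, -1.0 + (k + 0.5) * h) for k in range(n)) * h
    return math.exp(-integral)

g = -math.log(intensity(0.0))   # measured line integral g(L)
# the chord of the disk at s = 0 has length 2 * 0.5 = 1, so g is close to 1
print(g)
```

Taking the negative logarithm of the measured intensity is exactly the step that converts raw detector readings into the line-integral data g on which all the algorithms below operate.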
The problem is to compute f from g. In principle this problem has been solved by Radon (1917). Let L be the straight line x·θ = s, where θ = (cos φ, sin φ)ᵀ. Then

    g(θ, s) = ∫_{x·θ=s} f(x) dx = (Rf)(θ, s).    (1.2)
R is known as the Radon transform. Radon's inversion formula reads

    f(x) = (1/(4π²)) ∫₀^{2π} ∫_ℝ g′(θ, s)/(x·θ − s) ds dφ,    (1.3)

where g′ is the derivative of g with respect to s, θ = (cos φ, sin φ)ᵀ, and the inner integral is understood as a Cauchy principal value.
1.3. Outline of the paper

In Section 2 we describe the most important algorithm in tomography, namely the filtered backprojection algorithm. Not only is it the workhorse for today's tomography, but it also serves as the model for the algorithms in future imaging devices, such as the 3D algorithms described in Section 3. Since many imaging problems can be described by large linear sparse systems of equations, iterative methods suggest themselves (see Section 4). Algorithms exploiting rotational symmetry of the imaging devices are described in Section 5. In Section 6 we deal with algorithms which work exclusively in Fourier space and which have the potential to outperform the filtered backprojection algorithm in speed.

1.4. Prerequisites
The reader needs only a rudimentary knowledge of numerical analysis (interpolation, quadrature, linear systems), functional analysis (linear operators, distributions), Fourier analysis (Fourier transform, Fourier series, inversion, convolution, fast Fourier transform (FFT)) and sampling theory (Shannon's sampling theorem for band-limited functions).

2. The filtered backprojection algorithm

In this section we give a detailed description of the most important algorithm in 2D tomography. The discrete implementation depends on the scanning geometry, that is, the way the data are sampled. This algorithm is essentially a numerical implementation of the Radon inversion formula (1.3). However, a different approach avoiding singular integrals is simpler. We describe this approach for the n-dimensional Radon transform
    g(θ, s) = ∫_{x·θ=s} f(x) dx = (Rf)(θ, s),

where f is a function in ℝⁿ and θ ∈ S^{n−1}, s ∈ ℝ¹. Let

    (R*g)(x) = ∫_{S^{n−1}} g(θ, θ·x) dθ
be the backprojection operator and let V, v be functions such that V = R*v. Mathematically, R* is simply the Hilbert space adjoint of the Radon transform R. It is easy to see that

    V * f = R*(v * g),    (2.1)

where the convolution on the left-hand side is in ℝⁿ, while the convolution on
the right-hand side is a 1D convolution with respect to the second variable:

    ∫_{ℝⁿ} V(x − y) f(y) dy = ∫_{S^{n−1}} ∫_{ℝ¹} v(x·θ − s) g(θ, s) ds dθ.    (2.2)
The idea is to choose V as an approximation to Dirac's δ-function. Then V * f is close to f. The interrelation between V and v is easily described in terms of the Fourier transform. Denoting with the same symbol '^' the n-dimensional Fourier transform

    f̂(ξ) = (2π)^{−n/2} ∫_{ℝⁿ} e^{−ix·ξ} f(x) dx,    ξ ∈ ℝⁿ,

and the 1D Fourier transform

    v̂(θ, σ) = (2π)^{−1/2} ∫_{ℝ¹} e^{−isσ} v(θ, s) ds,    σ ∈ ℝ¹,

we have

    V̂(ξ) = 2 (2π)^{(n−1)/2} |ξ|^{1−n} v̂(ξ/|ξ|, |ξ|);

see Natterer (1986). By |ξ| we mean the Euclidean length of ξ ∈ ℝⁿ.

The choice of V determines the spatial resolution of the reconstruction algorithm. We use the notion of resolution from sampling theory: see Jerri (1977). We give a short account of some basic facts of sampling theory. A function f in ℝⁿ is said to be band-limited with bandwidth Ω, or simply Ω-band-limited, if f̂(ξ) = 0 for |ξ| > Ω. An example for n = 1 is the sinc function

    sinc(x) = (sin x)/x,

which has bandwidth 1. Obviously, sinc(Ωx) has bandwidth Ω. Ω-band-limited functions are capable of representing details of size 2π/Ω but no smaller ones. This becomes clear simply by looking at the graph of sinc. In tomography the functions we are dealing with are usually of compact support. Such functions cannot be strictly band-limited, unless identically zero. Hence we require the functions only to be essentially Ω-band-limited, meaning that f̂(ξ) is approximately 0 for |ξ| > Ω in some sense: see Natterer (1986). A reconstruction method in tomography is said to have resolution 2π/Ω if it reconstructs essentially Ω-band-limited functions reliably. For strictly Ω-band-limited functions we have the following propositions, which also hold, with very good accuracy, for essentially Ω-band-limited functions.
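These sampling-theory facts are easy to verify numerically. In the sketch below (the test function and step-sizes are our own choices), f(x) = sinc²(x) is band-limited with bandwidth 2 and ∫ f(x) dx = π; the sampling sum h Σ_k f(hk) reproduces the integral when h satisfies the Nyquist condition for this bandwidth (h ≤ π/2), and aliases badly when h is far too large:

```python
import math

def f(x):
    # sinc^2: band-limited with bandwidth 2, integral over R equals pi
    if x == 0.0:
        return 1.0
    return (math.sin(x) / x) ** 2

def sampling_sum(h, K=100000):
    # h * sum_{|k| <= K} f(hk); truncation is harmless since f decays like 1/x^2
    return h * (f(0.0) + 2.0 * sum(f(h * k) for k in range(1, K + 1)))

exact = math.pi
fine = sampling_sum(1.0)    # h = 1 satisfies the Nyquist condition
coarse = sampling_sum(4.0)  # h = 4 is far beyond it: aliasing
print(fine, coarse)
```

With h = 1 the sum matches π to the truncation error, while with h = 4 the aliased copies of the spectrum contribute an O(1) error, which is exactly why the discretization parameters of the reconstruction algorithms below are tied to the essential bandwidth Ω.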
1. If f is Ω-band-limited and h ≤ π/Ω (the Nyquist condition), then f is uniquely determined by the values f(hk), k ∈ ℤ, and

    f(x) = Σ_{k∈ℤ} f(hk) sinc((π/h)(x − hk)).

2. If f is Ω-band-limited and h ≤ π/Ω, then

    ∫_ℝ f(x) dx = h Σ_{k∈ℤ} f(hk).

3. If f₁, f₂ are Ω-band-limited and h ≤ π/Ω, then

    ∫_ℝ f₁(x) f₂(x) dx = h Σ_{k∈ℤ} f₁(hk) f₂(hk).
Returning to the creation of a reconstruction algorithm with resolution 2π/Ω, we have to determine V such that

    V̂(ξ) = (2π)^{−n/2} φ̂(|ξ|/Ω),

where φ̂ is a filter with the property φ̂(σ) ≈ 1 for |σ| ≤ 1 and φ̂(σ) = 0 for |σ| > 1. This follows from the formula δ̂ = (2π)^{−n/2}. This means that for the filter function v we must have

    v̂(θ, σ) = ½ (2π)^{1/2−n} φ̂(|σ|/Ω) |σ|^{n−1}.    (2.3)

Examples are the ideal low pass

    φ̂(σ) = 1 for |σ| ≤ 1,  φ̂(σ) = 0 for |σ| > 1,

the cos filter

    φ̂(σ) = cos(σπ/2) for |σ| ≤ 1,  φ̂(σ) = 0 for |σ| > 1,

and the filter

    φ̂(σ) = sinc(σπ/2) for |σ| ≤ 1,  φ̂(σ) = 0 for |σ| > 1,

which has been introduced in tomography by Shepp and Logan (1974). The corresponding functions v for n = 2 are

    v_Ω(s) = (Ω²/(4π²)) u(Ωs),    u(s) = sinc(s) − ½ (sinc(s/2))²,

for the ideal low pass,

    v_Ω(s) = (Ω²/(8π²)) (u(Ωs + π/2) + u(Ωs − π/2))

with the same function u, and

    v_Ω(s) = (Ω²/(2π³)) (π/2 − Ωs sin(Ωs)) / ((π/2)² − (Ωs)²)
for the Shepp-Logan filter. More filters can be found in Chang and Herman (1980).

The integral on the right-hand side of (2.2) has to be approximated by a quadrature rule. We have to distinguish between several ways of sampling g = Rf.

2.1. Parallel geometry in the plane

In this case the 2D Radon transform g(θ, s) = (Rf)(θ, s) is sampled for θ = θ_j = (cos φ_j, sin φ_j)ᵀ, φ_j = πj/p, j = 0, …, p − 1, and s = s_ℓ = ℓρ/q, ℓ = −q, …, q. Here ρ is the radius of the reconstruction region, that is, we assume f(x) = 0 for x ∈ ℝ², |x| > ρ. This means that the measured rays come in p parallel bundles with directions evenly distributed over 180°, each bundle consisting of 2q + 1 equispaced lines. This was the scanning geometry of the first commercial scanner, for which Hounsfield received the Nobel prize in 1979. This geometry has been replaced by more efficient ones in today's scanners (see below), but it is still used in scientific and technical imaging. We evaluate the integral in (2.2) by the trapezoidal rule
\[ (V*f)(x) \approx \frac{2\pi\rho}{pq} \sum_{j=0}^{p-1} \sum_{\ell=-q}^{q} v_\Omega(x\cdot\theta_j - s_\ell)\, g(\theta_j, s_\ell). \tag{2.4} \]
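For the ideal low-pass and the natural sampling $s_\ell = \ell\pi/\Omega$, the kernel values $v_\Omega(s_k - s_\ell)$ entering (2.4) reduce to the classical discrete ramp-filter coefficients. A small numerical check (Python/NumPy; an illustrative sketch of ours, not from the original):

```python
import numpy as np

# u(s) = sinc(s) - (1/2) sinc(s/2)^2 is the profile of the ideal
# low-pass kernel v_Omega(s) = (Omega^2/(4 pi^2)) u(Omega s).  At the
# Nyquist samples s = pi*l/Omega one gets u(0) = 1/2, u(pi l) = 0 for
# even l != 0, and u(pi l) = -2/(pi l)^2 for odd l.
def u(s):
    s = np.asarray(s, dtype=float)
    return np.sinc(s / np.pi) - 0.5 * np.sinc(s / (2 * np.pi)) ** 2

u0 = float(u(0.0))             # 1/2
u1 = float(u(np.pi))           # -2/pi^2
u2 = float(u(2 * np.pi))       # 0
```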
The accuracy of this approximation can be assessed by sampling theory, according to which the trapezoidal rule for an inner product is exact provided the step-size $h$ satisfies the Nyquist criterion, that is, $h \le \pi/\Omega$ where $\Omega$ is the bandwidth of the factors in the inner product. In our case the first factor is $v_\Omega(x\cdot\theta_j - s)$ (as a function of $s$), which has bandwidth $\Omega$. The second factor is $g(\theta_j, s)$ (again as a function of $s$). This is given and does not, in general, have finite bandwidth. At this point we have to make an assumption. We assume $f$ to be essentially band-limited with essential bandwidth $\Omega$. The $n$-dimensional Fourier transform of $f$ and the 1D Fourier transform of $Rf$ (with respect to the second variable) are interrelated by
\[ (Rf)^\wedge(\theta,\sigma) = (2\pi)^{(n-1)/2}\, \hat f(\sigma\theta). \tag{2.5} \]
This is the famous (and easy to prove) 'projection' or 'central slice' theorem
of computerized tomography. In the present context we need it only to deduce that $f$ and $g = Rf$ have the same (essential) bandwidth. Thus the $s$-integral in (2.2) is accurately represented by the $\ell$-sum in (2.4) provided that the step-size $h = \rho/q$ in that sum satisfies the Nyquist criterion $h \le \pi/\Omega$. In other words,
\[ q \ge \frac{\rho\,\Omega}{\pi}. \tag{2.6} \]
The condition on the number $p$ of directions which makes the $j$-sum in (2.4) a good approximation to the $\varphi$-integral in (2.2) is less obvious. Based on Debye's asymptotic relation for the Bessel functions one can show that the essential bandwidth of $Rf$ as a function of $\varphi$, $\theta = (\cos\varphi, \sin\varphi)^T$, is $\rho\,\Omega$. Hence the corresponding Nyquist condition reads
\[ p \ge \rho\,\Omega. \tag{2.7} \]
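The projection theorem (2.5) used in this argument is easy to verify numerically. The sketch below (Python/NumPy, using the $(2\pi)^{-n/2}$ Fourier convention; an illustration of ours, not from the original) checks it for the Gaussian $f(x) = e^{-|x|^2/2}$ in the plane, for which all projections are equal:

```python
import numpy as np

# For f(x) = exp(-|x|^2/2) in R^2: (Rf)(theta, s) = c * exp(-s^2/2),
# c = integral of exp(-t^2/2) dt, and f^(xi) = exp(-|xi|^2/2).  The
# projection theorem (2.5) with n = 2 predicts
# (Rf)^(theta, sigma) = (2 pi)^{1/2} f^(sigma theta).
t = np.linspace(-8.0, 8.0, 4001)
dt = t[1] - t[0]
c = dt * np.exp(-t**2 / 2).sum()              # quadrature for the t-integral

s = t
g = c * np.exp(-s**2 / 2)                     # (Rf)(theta, s), any theta

sigma = 1.3
ghat = dt * np.sum(g * np.exp(-1j * sigma * s)) / np.sqrt(2 * np.pi)
fhat = np.exp(-sigma**2 / 2)                  # f^ at |xi| = sigma
ratio = ghat.real / fhat                      # should be (2 pi)^{1/2}
```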
Inequalities (2.6), (2.7) are the conditions for high accuracy in (2.4), assuming $f$ to be zero outside the ball of radius $\rho$ and essentially band-limited with bandwidth $\Omega$.

The double sum in (2.4) has to be evaluated for each reconstruction point $x$. This leads to unacceptable complexity. The complexity can be reduced by introducing the function
\[ h(\theta, t) = \frac{\rho}{q} \sum_{\ell=-q}^{q} v_\Omega(t - s_\ell)\, g(\theta, s_\ell). \]
Then, (2.4) reads
\[ (V*f)(x) \approx \frac{2\pi}{p} \sum_{j=0}^{p-1} h(\theta_j,\, x\cdot\theta_j). \]
This requires only a simple sum for each reconstruction point $x$, at the expense of an additional interpolation in the second argument of $h$. In most cases linear interpolation suffices (but nearest neighbour does not). This leads us to the filtered backprojection algorithm for standard parallel geometry (see Herman (1980, p. 133)).
Algorithm 1 Filtered backprojection algorithm for standard parallel geometry
Data: The values $\{g_{j\ell} = g(\theta_j, s_\ell) : j = 0,\dots,p-1,\ \ell = -q,\dots,q\}$, where $g$ is the 2D Radon transform of $f$.
Step 1: For $j = 0,\dots,p-1$ carry out the discrete convolutions
\[ h_{j,k} = \frac{\rho}{q} \sum_{\ell=-q}^{q} v_\Omega(s_k - s_\ell)\, g_{j,\ell}, \qquad k = -q,\dots,q. \]
Step 2: For each reconstruction point $x$, compute the discrete backprojection
\[ f_{FB}(x) = \frac{2\pi}{p} \sum_{j=0}^{p-1} \bigl((1-\vartheta)\, h_{j,k} + \vartheta\, h_{j,k+1}\bigr), \]
where $k = k(j,x)$ and $\vartheta = \vartheta(j,x)$ are determined by
\[ t = \frac{q}{\rho}\, x\cdot\theta_j, \qquad k = \lfloor t \rfloor, \qquad \vartheta = t - k. \]
Result: $f_{FB}(x)$ is an approximation to $f(x)$.

The algorithm depends on the parameters $p$, $q$ and on the choice of $v_\Omega$. It is designed to reconstruct a function $f$ with support in $|x| \le \rho$ and with essential bandwidth $\Omega$; that is, the spatial resolution of the algorithm is $2\pi/\Omega$. Conditions (2.6), (2.7) should be satisfied. A few remarks are in order.

1. Condition (2.6) has to be strictly satisfied. Otherwise the $s$-integral in (2.2) is not even approximately equal to the $\ell$-sum in (2.4), leading to unacceptable errors.

2. If (2.7) is not satisfied, the reconstruction is still accurate for $|x| \le \rho'$, $\rho' < \rho$, with $\rho'$ such that $p \ge \rho'\,\Omega$.

3. Filter functions $v_\Omega$ whose 'kernel sum'
\[ \sum_{\ell\in\mathbb{Z}} v_\Omega(h\ell), \qquad h = \rho/q, \]
does not vanish should not be used: see Natterer and Faridani (1990).

4. Usually, linear interpolation in Step 2 is sufficient. However, for difficult functions $f$ (e.g., functions containing large objects at the boundary of the reconstruction region) linear interpolation generates visible artefacts. In that case an oversampling procedure similar to the one of Algorithm 2 below is advisable. Alternatively one may use the circular harmonic algorithm from Section 5.

5. The algorithm needs $O(p)$ operations for each reconstruction point. Algorithms with lower complexity (such as $O(\log p)$) can be obtained either by Fourier reconstruction (see Section 6) or by the fast backprojection algorithm in Section 2.5.

6. Conditions (2.6), (2.7) suggest taking $p = \pi q$. This much debated condition is usually not complied with in radiological applications, where $p$ is chosen considerably smaller. This is due to the special requirements in radiological imaging.
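As an illustration, Algorithm 1 can be realized in a few lines for a phantom whose Radon transform is known analytically. The sketch below (Python/NumPy; the parameter choices and names are ours) reconstructs the characteristic function of a centered disk with the ideal low-pass filter, taking (2.6), (2.7) with near equality:

```python
import numpy as np

# Algorithm 1 for the disk phantom f = characteristic function of
# |x| <= R inside the reconstruction region |x| <= rho = 1.  The Radon
# data are analytic: g(theta, s) = 2 sqrt(R^2 - s^2) for |s| < R,
# independent of theta.
rho, R, q = 1.0, 0.5, 64
p = int(np.pi * q)                        # p ~ pi q  (conditions (2.6), (2.7))
s = rho * np.arange(-q, q + 1) / q        # s_l = l rho / q
phi = np.pi * np.arange(p) / p            # phi_j = pi j / p
g = 2.0 * np.sqrt(np.maximum(R**2 - s**2, 0.0))

# ideal low-pass kernel v_Omega, Omega = pi q / rho, at s_k - s_l
def u(x):
    return np.sinc(x / np.pi) - 0.5 * np.sinc(x / (2 * np.pi)) ** 2
Omega = np.pi * q / rho
kern = Omega**2 / (4 * np.pi**2) * u(np.pi * np.arange(-2 * q, 2 * q + 1))

# Step 1: h_k = (rho/q) sum_l v_Omega(s_k - s_l) g_l (same for every j here)
idx = np.arange(2 * q + 1)
h = rho / q * (kern[idx[:, None] - idx[None, :] + 2 * q] @ g)

# Step 2: discrete backprojection with linear interpolation
def fbp(x1, x2):
    t = q * (x1 * np.cos(phi) + x2 * np.sin(phi)) / rho
    k = np.floor(t).astype(int)
    th = t - k
    k0 = np.clip(k + q, 0, 2 * q)
    k1 = np.clip(k + q + 1, 0, 2 * q)
    return 2 * np.pi / p * np.sum((1 - th) * h[k0] + th * h[k1])

f_center = fbp(0.0, 0.0)      # true value 1
f_outside = fbp(0.75, 0.0)    # true value 0
```

At the center the reconstruction is close to the true value 1, and outside the disk it is small.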
2.2. The interlaced parallel geometry

It is well known (see, for instance, Kruse (1989)) that the data in the standard parallel geometry are redundant: if $p$ is even, then one can omit each $g(\theta_j, s_\ell)$ with $\ell + j$ odd without impairing the resolution. Deriving algorithms that use only the remaining 'interlaced' data (i.e., those $g(\theta_j, s_\ell)$ for which $j + \ell$ is even) is fairly subtle. What happens is the following. If in the $\ell$-sum in (2.4) every second term is dropped, the sum no longer approximates the corresponding $s$-integral in (2.2). Miraculously, the large quadrature error cancels when the $j$-sum in (2.4) is computed. This means that success depends entirely on a subtle interplay between different directions. This interplay is disrupted by the interpolation procedure in Step 2 of Algorithm 1. There are two ways out. The first one is to avoid interpolation altogether by using circular harmonic algorithms: see Section 5. The second one is to make the interpolation more accurate, for instance by oversampling. This leads to an algorithm that has the structure of a filtered backprojection algorithm.

Algorithm 2 Filtered backprojection algorithm for parallel interlaced geometry
Data: The values $\{g(\theta_j, s_\ell) : j = 0,\dots,p-1,\ \ell = -q,\dots,q,\ \ell + j\ \text{even}\}$, where $g$ is the 2D Radon transform of $f$, and $p$ has to be even.
Step 1: Choose a sufficiently large integer $M > 0$ ($M = 16$ will do) and compute, for $j = 0,\dots,p-1$,
\[ h_{j,k} = \frac{2\rho}{q} \sum_{\substack{\ell=-q\\ \ell+j\ \text{even}}}^{q} v_\Omega\Bigl(\frac{\rho k}{Mq} - s_\ell\Bigr)\, g(\theta_j, s_\ell), \qquad k = -Mq,\dots,Mq. \]
Step 2: For each reconstruction point $x$, compute
\[ f_{FB}(x) = \frac{2\pi}{p} \sum_{j=0}^{p-1} \bigl((1-\vartheta)\, h_{j,k} + \vartheta\, h_{j,k+1}\bigr), \]
where $k = k(j,x)$, $\vartheta = \vartheta(j,x)$ are determined by
\[ t = \frac{Mq}{\rho}\, x\cdot\theta_j, \qquad k = \lfloor t \rfloor, \qquad \vartheta = t - k. \]
Result: $f_{FB}(x)$ is an approximation to $f(x)$.
Note that the difference between this algorithm and Algorithm 1 is that it needs only half the data but produces the same image quality. We study the various assumptions underlying this algorithm.
1. The algorithm is designed to reconstruct a function $f$ supported in $|x| \le \rho$ with essential bandwidth $\Omega$. The sampling conditions (2.6), (2.7) have to be satisfied. In contrast to Algorithm 1, oversatisfying these conditions may lead to artefacts. Thus the algorithm should be used only if (2.6), (2.7) are satisfied with equality, that is, for $p = \pi q$.

2. Only filters $v$ with a smooth transition from nonzero to zero values should be used. The reason is that the additional filtering of the interpolation step is not present in Algorithm 2.

2.3. Standard fan beam geometry

This is the most widely used scanning geometry. It is generated by a source moving on a concentric circle of radius $r > \rho$ around the reconstruction region $|x| \le \rho$, with opposite detectors being read out in small time intervals (third generation scanner). Equivalently we may have a fixed detector ring with only the source moving around (fourth generation scanner). Denoting the angular position of the source by $\beta$ and the angle between a measured ray and the central ray by $\alpha$ ($\alpha > 0$ if the ray, viewed from the source, is left of the central ray), fan beam scanning amounts to sampling the function
\[ g(\beta,\alpha) = (Rf)(\theta,\, r\sin\alpha), \qquad \theta = \begin{pmatrix} \cos(\beta+\alpha) \\ \sin(\beta+\alpha) \end{pmatrix}, \tag{2.8} \]
at the points
\[ \beta = \beta_j = j\,\Delta\beta, \quad \Delta\beta = 2\pi/p, \quad j = 0,\dots,p-1, \qquad \alpha = \alpha_\ell = (\ell+d)\,\Delta\alpha, \quad \ell = -q,\dots,q. \]
Here, $\Delta\alpha$ and $q$ are chosen so as to cover the whole reconstruction region $|x| \le \rho$ with rays, and $d$ is the detector offset, which is either $0$ or $\tfrac14$. First we derive the fan beam analogue of (2.1). We only have to put
\[ (V*f)(x) = r \int_0^{2\pi} \int_{-\pi/2}^{\pi/2} v(x\cdot\theta - r\sin\alpha)\, g(\beta,\alpha)\, \cos\alpha\, d\alpha\, d\beta \]
with $\theta$ as in (2.8). Discretizing the integral by the trapezoidal rule yields
\[ (V*f)(x) \approx r\,\Delta\alpha\,\Delta\beta \sum_{j=0}^{p-1} \sum_{\ell=-q}^{q} v_\Omega\bigl(x\cdot\theta(\beta_j+\alpha_\ell) - r\sin\alpha_\ell\bigr)\, g(\beta_j,\alpha_\ell)\cos\alpha_\ell. \tag{2.9} \]
This is the fan beam analogue of (2.4) and defines a reconstruction algorithm for fan beam data. One can show that for this algorithm to have resolution $2\pi/\Omega$ one has to satisfy sampling conditions on $p$ and $q$ analogous to (2.6), (2.7), which we refer to as (2.10);
see Natterer (1993). As in the parallel case, an algorithm based on (2.9) needs $O(pq)$ operations for each reconstruction point. Reducing this to $O(p)$ is possible here, too, but this is not as obvious as in the parallel case. We first establish a relation for the expression $x\cdot\theta(\varphi) - s$ in (2.2), where $\theta(\varphi) = (\cos\varphi, \sin\varphi)^T$. Let $b = r\theta(\beta)$ be the source position, and let $\gamma$ be the angle between $x - b$ and $-b$. We take $\gamma$ positive if $x$, viewed from the source $b$, lies to the left of the central ray; that is, we have
\[ \cos\gamma = \frac{(b-x)\cdot b}{|b-x|\,|b|}. \]
The kernel $v_\Omega$ is homogeneous in the sense that
\[ v_\Omega(c\,\sigma) = c^{-2}\, v_{c\Omega}(\sigma). \tag{2.11} \]
Thus,
\[ v_\Omega\bigl(x\cdot\theta(\varphi) - s\bigr) = |b-x|^{-2}\, v_{\Omega|b-x|}\bigl(\sin(\gamma-\alpha)\bigr). \]
Using this in (2.2) we obtain
\[ (V*f)(x) = r \int_0^{2\pi} |b-x|^{-2} \int_{-\pi/2}^{\pi/2} v_{\Omega|b-x|}\bigl(\sin(\gamma-\alpha)\bigr)\, g(\beta,\alpha)\cos\alpha\, d\alpha\, d\beta. \]
Here, $b = r\theta(\beta)$, and $\gamma$ is independent of $\alpha$. Unfortunately, the $\alpha$-integral has to be evaluated for each $x$, since the subscript $\Omega|b-x|$ depends on $x$. In order to avoid this we make an approximation: we replace $\Omega|b-x|$ by $\Omega r$. This is not critical as long as $|b-x| \sim |b|$, that is, as long as $\rho$ is distinctly smaller than $r$;
in most scanners $r \sim 3\rho$, and this is sufficient for the approximation to be satisfactory. However, if $\rho$ is only slightly smaller than $r$, problems arise. Upon the replacement of $v_{\Omega|b-x|}$ by $v_{\Omega r}$ we obtain
\[ (V*f)(x) \approx r \int_0^{2\pi} |b-x|^{-2} \int_{-\pi/2}^{\pi/2} v_{\Omega r}\bigl(\sin(\gamma-\alpha)\bigr)\, g(\beta,\alpha)\cos\alpha\, d\alpha\, d\beta. \]
The $\alpha$-integral can now be precomputed as a function of $\gamma$ and $\beta$, yielding an algorithm with the structure of a filtered backprojection algorithm.

Algorithm 3 Filtered backprojection algorithm for standard fan beam geometry
Data: The values $\{g_{j\ell} = g(\beta_j, \alpha_\ell) : j = 0,\dots,p-1,\ \ell = -q,\dots,q\}$, where $g$ is the function in (2.8).
Step 1: For $j = 0,\dots,p-1$ carry out the discrete convolutions
\[ h_{j,k} = \Delta\alpha \sum_{\ell=-q}^{q} v_{\Omega r}\bigl(\sin(\alpha_k - \alpha_\ell)\bigr)\, g_{j\ell}\cos\alpha_\ell, \qquad k = -q,\dots,q. \]
Step 2: For each reconstruction point $x$, compute the discrete weighted backprojection
\[ f_{FB}(x) = r\,\Delta\beta \sum_{j=0}^{p-1} |b_j - x|^{-2} \bigl((1-\vartheta)\, h_{j,k} + \vartheta\, h_{j,k+1}\bigr), \]
where $k = k(j,x)$ and $\vartheta = \vartheta(j,x)$ are determined by
\[ \gamma = \pm\arccos\frac{(b_j - x)\cdot b_j}{|b_j - x|\,|b_j|}, \]
the sign chosen such that $\gamma > 0$ if $x$, viewed from the source $b_j = r\theta(\beta_j)$, lies to the left of the central ray, and
\[ t = \frac{\gamma}{\Delta\alpha}, \qquad k = \lfloor t \rfloor, \qquad \vartheta = t - k. \]
Result: $f_{FB}(x)$ is an approximation to $f(x)$.

The algorithm as it stands is designed to reconstruct a function $f$ with support in $|x| \le \rho$ which is essentially band-limited with bandwidth $\Omega$ from fan beam data with the source on a circle of radius $r > \rho$. The remarks following Algorithm 1 apply by analogy. In particular, conditions (2.10) have to be satisfied. For $r \sim \rho$ and with dense parts of the object close to the boundary of the reconstruction region, problems are likely to occur.
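The coordinate relation underlying (2.8), $\varphi = \beta + \alpha$, $s = r\sin\alpha$, is also the basis of 'rebinning', i.e., resorting fan beam data into parallel data. A minimal sketch (Python/NumPy; an illustration of ours with a rotationally symmetric phantom, so that the $\beta$-interpolation is trivial):

```python
import numpy as np

# Rebinning: the fan beam sample at (beta, alpha) equals the parallel
# sample at phi = beta + alpha, s = r sin(alpha).  Conversely, the
# parallel sample (phi0, s0) sits at alpha0 = arcsin(s0/r),
# beta0 = phi0 - alpha0.  Test phantom: disk of radius R, for which
# the data do not depend on beta.
r, R = 3.0, 0.5
alpha = np.linspace(-0.4, 0.4, 257)       # fan angles covering the disk
g_fan = 2.0 * np.sqrt(np.maximum(R**2 - (r * np.sin(alpha))**2, 0.0))

phi0, s0 = 0.7, 0.3                       # wanted parallel sample, |s0| < R
alpha0 = np.arcsin(s0 / r)
beta0 = phi0 - alpha0                     # unused here (symmetry in beta)
g_rebinned = float(np.interp(alpha0, alpha, g_fan))
g_exact = 2.0 * np.sqrt(R**2 - s0**2)     # parallel data of the disk
```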
2.4. Linear fan beam geometry

Here, the detector positions within a fan with vertex $b$ are evenly spaced on the line perpendicular to $b$. We need the explicit form of the inversion formula mainly for the derivation of the FDK algorithm in 3D cone beam tomography in the next section. With $g$ the function in (2.8), the sampled data are
\[ g_{j,\ell} = g(\beta_j, \alpha_\ell), \qquad \beta_j = \frac{2\pi}{p}\, j, \quad j = 0,\dots,p-1, \qquad \alpha_\ell = \arctan(y_\ell/r), \quad y_\ell = (\ell+d)\,\Delta y, \quad \ell = -q,\dots,q. \]
The coordinates $(\beta, y)$ are related to the parallel coordinates $(\varphi, s)$ in the representation $x\cdot\theta(\varphi) = s$ of the rays by
\[ \varphi = \beta + \arctan\frac{y}{r}, \qquad s = \frac{ry}{(r^2+y^2)^{1/2}}. \tag{2.12} \]
Hence,
\[ \frac{\partial(\varphi,s)}{\partial(\beta,y)} = \frac{r^3}{(r^2+y^2)^{3/2}}. \]
Substituting $(\beta, y)$ for $(\varphi, s)$ in (2.2) leads to
\[ (V*f)(x) = \int_0^{2\pi} \int_{\mathbb{R}} v\bigl(x\cdot\theta(\varphi) - s\bigr)\, g(\beta,y)\, \frac{r^3}{(r^2+y^2)^{3/2}}\, dy\, d\beta, \]
where (2.12) has to be inserted for $(\varphi, s)$. One computes
\[ x\cdot\theta(\varphi) - s = c\,(z - y), \qquad c = \frac{r - x\cdot\theta(\beta)}{(r^2+y^2)^{1/2}}, \qquad z = \frac{r\, x\cdot\theta(\beta+\pi/2)}{r - x\cdot\theta(\beta)}. \]
From (2.11) it follows that $v_\Omega(c(z-y)) = c^{-2}\, v_{c\Omega}(z-y)$, yielding
\[ (V*f)(x) = \int_0^{2\pi} \int_{\mathbb{R}} c^{-2}\, v_{c\Omega}(z-y)\, g(\beta,y)\, \frac{r^3}{(r^2+y^2)^{3/2}}\, dy\, d\beta = r \int_0^{2\pi} \frac{1}{(r - x\cdot\theta(\beta))^2} \int_{\mathbb{R}} v_{c\Omega}(z-y)\, g(\beta,y)\, \frac{r^2\, dy}{(r^2+y^2)^{1/2}}\, d\beta. \]
As in the standard fan beam case we make the approximation $c \sim 1$. Again, this is justified if $\rho$ is distinctly smaller than $r$, e.g., $r \sim 3\rho$. Then,
\[ (V*f)(x) \approx r \int_0^{2\pi} \frac{1}{(r - x\cdot\theta(\beta))^2} \int_{\mathbb{R}} v_\Omega\Bigl(\frac{r\, x\cdot\theta^\perp(\beta)}{r - x\cdot\theta(\beta)} - y\Bigr)\, g(\beta,y)\, \frac{r^2\, dy}{(r^2+y^2)^{1/2}}\, d\beta, \]
where $\theta^\perp(\beta) = \theta(\beta + \pi/2)$. Defining
\[ h(\beta, z) = \int_{\mathbb{R}} v_\Omega(z - y)\, g(\beta,y)\, \frac{r^2\, dy}{(r^2+y^2)^{1/2}}, \tag{2.13} \]
this can be written as
\[ (V*f)(x) \approx r \int_0^{2\pi} \frac{1}{(r - x\cdot\theta(\beta))^2}\, h\Bigl(\beta,\, \frac{r\, x\cdot\theta^\perp(\beta)}{r - x\cdot\theta(\beta)}\Bigr)\, d\beta. \tag{2.14} \]
The implementation of (2.13), (2.14) can now be done exactly as in the standard case, leading to a filtered backprojection algorithm which needs O(p) operations for each reconstruction point x.
2.5. Fast backprojection

The backprojection (Step 2 in Algorithms 1-3) is the most time-consuming part of the filtered backprojection algorithm. The filtering or convolution step (Step 1 in Algorithms 1-3) requires in principle the same number of operations, but this can easily be reduced drastically, either by cutting off the filter $v_\Omega$ or by using the fast Fourier transform (FFT). The backprojection consists of the evaluation of the sums
\[ \sum_{j=0}^{p-1} g(\theta_j,\, x\cdot\theta_j) \]
on a $p\times p$ grid. This is the simplest case of Algorithm 1, the resolution of the image being adjusted to the number of views $p$ according to the sampling theorem. Nilsson (1997) suggested a divide and conquer strategy for doing this with $O(p^2\log p)$ operations, as opposed to the $O(p^3)$ operations of a direct evaluation. Suppose $p = 2^m$.

Step 1: For $j = 0,1,2,\dots,p-1$, compute
\[ f_j^1(x) = g(\theta_j,\, x\cdot\theta_j). \]
Since $f_j^1$ is constant along the lines $x\cdot\theta_j = s$, it suffices to compute $f_j^1(x)$ at $2p$ points $x$. We need $p\cdot 2p$ operations.

Step 2: For $j = 0,2,4,\dots,p-2$, compute
\[ f_j^2 = f_j^1 + f_{j+1}^1. \]
Since $f_j^1$, $f_{j+1}^1$ are constant along the lines $x\cdot\theta_j = s$, $x\cdot\theta_{j+1} = s$, respectively, $f_j^2$ is almost constant along the lines $x\cdot\bar\theta_j = s$ where $\bar\theta_j = (\theta_j + \theta_{j+1})/\sqrt{2}$. Hence it suffices to compute $f_j^2$ only for a few, say 2, points on each of those lines. This means that we have to evaluate $f_j^2$ at $4p$ points, requiring $\frac{p}{2}\cdot 4p$ operations.

Step 3: For $j = 0,4,8,\dots,p-4$, compute
\[ f_j^3 = f_j^2 + f_{j+2}^2. \]
With the same reasoning as in Steps 1 and 2 we find that it suffices to compute $f_j^3(x)$ at only $8p$ points, requiring $\frac{p}{4}\cdot 8p$ operations. Proceeding in this fashion we arrive in Step $m$ at the approximation $f_0^m$ to $f$, which has to be evaluated at $2^m p$ points. Hence the number of operations in Steps 1 to $m$ is
\[ p\cdot 2p + \frac{p}{2}\cdot 4p + \frac{p}{4}\cdot 8p + \cdots + 1\cdot 2^m p \sim 2mp^2, \]
and this is $O(p^2\log p)$. Of course this derivation is heuristic, and we have simply ignored the necessary interpolations and approximations. However, practical experience demonstrates that such an algorithm can be made to work.

3. 3D reconstruction formulas

Algorithms for 3D tomography are still under development. The main problem is that the data entering explicit inversion formulas are usually not available. Thus the main task in 3D tomography is the derivation of inversion formulas that use only the data measured by a specific imaging device. It is clear that these formulas are tailored to the imaging device. In this section we restrict ourselves to the derivation of exact or approximate inversion formulas. The implementation in a discrete setting can be done along the lines of the 2D algorithms.

3.1. Inversion of the 3D Radon transform

Let $g = Rf$, $R$ the 3D Radon transform, be given on $S^2\times\mathbb{R}^1$. Using (2.1) for $n = 3$ leads directly to a filtered backprojection algorithm, exactly as in the 2D case. Introducing spherical coordinates
\[ \theta = \theta(\varphi,\psi) = \begin{pmatrix} \sin\psi\cos\varphi \\ \sin\psi\sin\varphi \\ \cos\psi \end{pmatrix}, \qquad 0 \le \varphi < 2\pi, \quad 0 \le \psi \le \pi, \]
(2.1) reads
\[ (V*f)(x) = \int_0^{2\pi}\!\int_0^{\pi} h(\theta,\, x\cdot\theta)\, \sin\psi\, d\psi\, d\varphi, \tag{3.1} \]
where $\theta = \theta(\varphi,\psi)$ and
\[ h(\theta,t) = \int_{\mathbb{R}} g(\theta,s)\, v(t-s)\, ds. \]
Once $h$ is computed, the evaluation of (3.1) requires the computation of a 2D integral for each reconstruction point. This is prohibitive in real world applications. Fortunately we can exploit the structure of the 3D Radon transform as the composition of two 2D Radon transforms. Putting
\[ k_\varphi(s,t) = \int_0^{\pi} h\bigl(\theta(\varphi,\psi),\, s\cos\psi + t\sin\psi\bigr)\, \sin\psi\, d\psi, \]
we can rewrite (3.1) as
\[ (V*f)(x) = \int_0^{2\pi} k_\varphi\bigl(x_3,\, x_1\cos\varphi + x_2\sin\varphi\bigr)\, d\varphi. \tag{3.2} \]
The last two formulas are essentially 2D backprojections. They can be evaluated exactly as described in the previous section. After having precomputed $h$ and $k$, the final reconstruction step (3.2) requires only a 1D integral for each reconstruction point. This algorithm is reminiscent of the two-stage algorithm of Marr, Chen and Lauterbur (1981), developed for magnetic resonance imaging (MRI), except that the convolution steps are not present.

3.2. The FDK approximate formula

This is the most widely used algorithm for cone beam tomography with the source running on a circle. It is well known that this inversion problem is highly unstable. However, practical experience with the FDK formula is nevertheless quite encouraging. The function sampled in cone beam tomography with the source on a circle is
\[ g(\theta,y) = \int_{\mathbb{R}} f(r\theta + ty)\, dt, \]
where $\theta = (\cos\varphi, \sin\varphi, 0)^T$ is a direction vector in the $x_1$-$x_2$ plane, $\theta^\perp$ is the subspace orthogonal to $\theta$, and $\tilde\theta = (-\sin\varphi, \cos\varphi, 0)^T$ (see below) is the vector in the $x_1$-$x_2$ plane perpendicular to $\theta$. As usual we assume $f = 0$ outside $|x| \le \rho$, where $\rho < r$.
The FDK formula is an ingenious adaptation of the 2D inversion formula of Section 2.4 to 3D. Consider the plane $\pi(\theta,x)$ through $r\theta$ and $x$ which intersects $\theta^\perp$ in a line parallel to the $x_1$-$x_2$ plane. Compute in this plane for each $\theta$ the contribution to (2.14). Finally, integrate all these contributions over $S^1$, disregarding the fact that the contributions come from different planes. The necessary computations are unpleasant, but the result is fairly simple. Based on (2.14),
\[ (V*f)(x) \approx \int_{S^1} \frac{r^2}{(r - x\cdot\theta)^2} \int_{\mathbb{R}^1} v_\Omega(u - u')\, g(\theta, u', z)\, \frac{r\, du'}{(r^2 + u'^2 + z^2)^{1/2}}\, d\theta, \tag{3.3} \]
where
\[ u = \frac{r\, x\cdot\tilde\theta}{r - x\cdot\theta}, \qquad z = \frac{r\, x_3}{r - x\cdot\theta}, \]
and $(u',z)$ are coordinates in $\theta^\perp$; that is, $g(\theta,u',z)$ stands for $g(\theta,y)$ with $y = (-u'\sin\varphi,\ u'\cos\varphi,\ z)^T$. The implementation of (3.3) leads to a reconstruction algorithm of the filtered backprojection type. The reconstructions computed with the FDK formula (3.3) are - understandably - quite good for flat objects, that is, if $f$ is nonzero only close to the $x_1$-$x_2$ plane in which the source runs. If this is not the case, then exact formulas using more data, such as Grangeat's formula (see below), have to be used.

3.3. Grangeat's formula
Grangeat's formula requires sources on a curve $C$ with the following property: each plane meeting $\mathrm{supp}(f)$ contains at least one source. This condition is obviously not satisfied when $C$ is a circle, for which the FDK approximation has been derived. The data for Grangeat's formula are generated by the function
\[ g(c,\theta) = \int_{\mathbb{R}} f(c + t\theta)\, dt, \qquad c \in C, \quad \theta \in S^2. \]
The condition on the source curve means that for each $x$ with $f(x) \ne 0$ and each $\theta \in S^2$ there exists a source $c = c(x,\theta) \in C$ such that $x\cdot\theta = c(x,\theta)\cdot\theta$. The gist of Grangeat's inversion is a relation between $g$ and the 3D Radon transform $Rf$ of $f$. This relation reads (Grangeat 1991)
\[ \frac{\partial}{\partial s}(Rf)(\theta,s)\Big|_{s = x\cdot\theta} = \int_{S^2\cap\,\theta^\perp} \frac{\partial}{\partial\theta}\, g\bigl(c(x,\theta),\, \omega\bigr)\, d\omega, \tag{3.4} \]
where $\frac{\partial}{\partial\theta}$ stands for the derivative in the direction $\theta \in S^2$, acting on the
second argument. For this to make sense we have to extend $g$ to all of $C\times\mathbb{R}^3$ by using the above definition not only for $\theta \in S^2$, but for all of $\mathbb{R}^3$. This is equivalent to extending $g$ by homogeneity of degree $-1$ in the second argument. With the help of the inversion formula
\[ f(x) = -\frac{1}{8\pi^2}\, \Delta_x \int_{S^2} (Rf)(\theta,\, x\cdot\theta)\, d\theta \]
for the 3D Radon transform, Grangeat's formula leads immediately to an inversion procedure for the data $g$. Related inversion formulas for cone beam tomography have been derived by Tuy (1983) and Gelfand and Goncharov (1987). For details of the implementation see Defrise and Clack (1995).
3.4. Orlov's inversion formula
This formula inverts the ray transform
\[ (Pf)(\theta,y) = \int_{\mathbb{R}^1} f(y + t\theta)\, dt, \qquad y \in \theta^\perp, \quad \theta \in S^2, \tag{3.5} \]
which arises, for instance, in 3D emission tomography (PET; Defrise, Townsend and Clack (1989)). If $\theta$ is restricted to a plane, then we simply have the Radon transform in this plane, and we can reconstruct $f$ in that plane by any of the methods in the previous section. In practice $g = Pf$ is measured for $\theta \in S_0^2$, where $S_0^2 \subseteq S^2$. In Orlov's formula (Orlov 1976), $S_0^2$ is a spherical zone around the equator, that is,
\[ S_0^2 = \{\theta(\varphi,\psi) : \psi_- \le \psi \le \psi_+,\ 0 \le \varphi < 2\pi\}, \]
using spherical coordinates $\theta(\varphi,\psi) = (\cos\varphi\cos\psi,\ \sin\varphi\cos\psi,\ \sin\psi)^T$ and $-\pi/2 < \psi_- < \psi_+ < \pi/2$. Then,
\[ f(x) = \Delta \int_{S_0^2} h\bigl(\theta,\, x - (x\cdot\theta)\theta\bigr)\, d\theta, \qquad h(\theta,x) = \frac{1}{4\pi^2} \int_{\theta^\perp} \frac{g(\theta,\, x - y)}{|y|\,\ell(\theta,y)}\, dy, \tag{3.6} \]
where $\Delta$ is the Laplacian acting on $x$ and $\ell(\theta,y)$ is the length of the intersection of $S_0^2$ with the plane through $0$ spanned by $\theta$ and $y \in \mathbb{R}^3$. The first formula of (3.6) is - up to $\Delta$ - a backprojection, while the second one is a convolution in $\theta^\perp$. Thus an implementation of (3.6) is again a filtered backprojection algorithm.

$P$ can also be inverted by the Fourier transform. We have
\[ (Pf)^\wedge(\theta,\xi) = (2\pi)^{1/2}\, \hat f(\xi), \qquad \xi \in \theta^\perp, \tag{3.7} \]
where $\wedge$ denotes the $(n-1)$-dimensional Fourier transform in $\theta^\perp$ on the left-hand side and the Fourier transform in $\mathbb{R}^n$ on the right-hand side. Assume that $S_0^2 \subseteq S^2$ satisfies the Orlov condition: every equatorial circle of $S^2$ meets $S_0^2$. Note that the set $S_0^2$ (the spherical zone) we used above in Orlov's formula satisfies this condition. From (3.7) it follows that $f$ is uniquely determined by $(Pf)(\theta,\cdot)$ for $\theta \in S_0^2$ under the Orlov condition. Namely, if $\xi \in \mathbb{R}^n$ is arbitrary, then Orlov's condition says that there exists $\theta \in S_0^2 \cap \xi^\perp$, and $\hat f(\xi)$ is determined from (3.7).

3.5. Colsher's inversion formula

Assume again that $g = Pf$ is known for $\theta \in S_0^2 \subseteq S^2$. We want to derive an inversion procedure similar to the one in Section 2.1. With the backprojection
\[ (P^*g)(x) = \int_{S_0^2} g\bigl(\theta,\, x - (x\cdot\theta)\theta\bigr)\, d\theta, \]
we again have
\[ V * f = P^*(v * g), \]
provided that $V = P^* v$. Again the convolutions on each side have different meanings. Explicitly this reads
\[ (V*f)(x) = \int_{S_0^2} \int_{\theta^\perp} v\bigl(\theta,\, x - (x\cdot\theta)\theta - y\bigr)\, g(\theta,y)\, dy\, d\theta, \tag{3.8} \]
which corresponds to (2.2).
see Colsher (1980). In order to get an inversion formula for P we have to determine v such that V = 6 or V = (2n)~3/2, that is,
J A solution v independent of 9 is
, O d 0 = (27r)- 2 K|.
(3.9)
where $|S_0^2\cap\,\xi^\perp|$ is the length of $S_0^2\cap\,\xi^\perp$. For the spherical zone $S_0^2$ from Section 3.4 with $\psi_- = -\psi_0$, $\psi_+ = \psi_0$, $\psi_0$ a constant with $0 < \psi_0 < \frac{\pi}{2}$, Colsher computed $\hat v$ explicitly. Setting $\xi_3 = |\xi|\cos\psi$ we obtain
\[ \hat v(\xi) = \frac{|\xi|}{(2\pi)^2} \begin{cases} \bigl(4\arcsin(\sin\psi_0/\sin\psi)\bigr)^{-1}, & \sin\psi > \sin\psi_0, \\ (2\pi)^{-1}, & \sin\psi \le \sin\psi_0. \end{cases} \tag{3.10} \]

Filters such as the Colsher filter (3.10) do not have small support. This means that $g$ in (3.8) has to be known in all of $\theta^\perp$. Often $g$ is only available in part of $\theta^\perp$ (truncated projections). One may then choose a filter $\hat v(\theta,\xi)$ concentrated on a small part of $\theta^\perp$; with $\theta = \theta(\varphi,\psi)$, $|\xi_3| = |\xi'|\tan\psi$ and $|\xi'| = (\xi_1^2 + \xi_2^2)^{1/2}$, the resulting $\hat V$ is close to $(2\pi)^{-3/2}$ if $\psi_0$ is small. In this case reconstruction from truncated projections is possible, at least approximately.
3.6. Conical tilt geometry

In electron microscopy (Frank 1992) one is faced with the case
\[ S_0^2 = \{\theta(\varphi,\psi_0) : 0 \le \varphi < 2\pi\} \]
for some $\psi_0$, where $\theta(\varphi,\psi) = (\cos\varphi\sin\psi,\ \sin\varphi\sin\psi,\ \cos\psi)^T$. $S_0^2$ does not satisfy Orlov's condition, and (3.9) cannot be satisfied, since $S_0^2\cap\,\xi^\perp = \emptyset$ for some $\xi$. In that case we put $\hat v(\theta,\xi) = 0$ for those $\xi$ with $S_0^2\cap\,\xi^\perp = \emptyset$, and determine $\hat v$ from (3.9) otherwise. With this choice of $v$, (3.8) is the minimal norm solution of $Pf = g$. A proper discretization along the lines of Section 2.1 leads to the weighted backprojection algorithm of electron microscopy: see Frank (1992).
4. Iterative methods

If exact inversion formulas are not available, iterative methods are the algorithms of choice. However, even if exact inversion is possible, iterative methods may be preferable due to their simplicity, versatility and ability to handle constraints and noise. Iterative methods are usually applied to discrete versions of the reconstruction problem. These discrete versions are obtained either by starting out from discrete models, as in the EM algorithm below, or by a projection method, known as a 'series expansion method' in the tomographic community (Censor 1981). This means that the unknown function $f$ is written as
\[ f = \sum_{\ell=1}^{N} f_\ell\, B_\ell \]
for certain basis functions $B_\ell$. With $g_i$ the $i$th measurement, the measurement process being linear, we obtain the linear system
\[ \sum_{\ell=1}^{N} a_{i\ell}\, f_\ell = g_i, \qquad i = 1,\dots,M, \tag{4.1} \]
for the expansion coefficients $f_\ell$, the matrix element $a_{i\ell}$ being the $i$th measurement for the basis function $B_\ell$. In tomography we always have $a_{i\ell} \ge 0$. Also, the matrix $(a_{i\ell})$ is typically sparse. Often $B_\ell$ is the characteristic function of pixels or voxels. Recently, smooth radially symmetric functions with small support (the 'blobs' of Lewitt (1992) and Marabini, Herman and Carazo (1998)) have been used. Blobs have several advantages over pixel- or voxel-based functions. Due to the radial symmetry it is easier to apply the Radon transform (or any of the other integral transforms) to $B_\ell$, making it easier to set up the linear system (4.1). The smoothness of the $B_\ell$ prevents a 'checkerboard' effect (i.e., the visual appearance of the pixels or voxels in the reconstruction) and does part of the necessary filtering and smoothing. The linear system (4.1) may be overdetermined ($M > N$) or underdetermined ($M < N$), consistent or inconsistent. Useful iterative methods must be able to handle all these cases.

4.1. ART (algebraic reconstruction technique)
This is an extension of the Kaczmarz (1937) method for solving linear systems. It has been introduced in imaging by Gordon, Bender and Herman (1970). We describe it in a more general context. We consider the linear system
\[ A_j f = g_j, \qquad j = 0,\dots,p-1, \tag{4.2} \]
where the $A_j : H \to H_j$ are bounded linear operators from the Hilbert space $H$ into the Hilbert spaces $H_j$. With $C_j : H_j \to H_j$ a positive definite operator, we define an iteration step $f^k \to f^{k+1}$ as follows:
\[ \begin{aligned} f^{k,0} &= f^k, \\ f^{k,j+1} &= f^{k,j} + \omega A_j^* C_j^{-1}\bigl(g_j - A_j f^{k,j}\bigr), \qquad j = 0,\dots,p-1, \\ f^{k+1} &= f^{k,p}. \end{aligned} \tag{4.3} \]
If $C_j = A_j A_j^*$ (provided $A_j$ is surjective) and $\omega = 1$, then $f^{k,j+1}$ is the orthogonal projection of $f^{k,j}$ onto the affine subspace $A_j f = g_j$. For $\dim(H_j) = 1$ this is the original Kaczmarz method. Other special cases are the Landweber method ($p = 1$, $C_1 = I$) and fixed-block ART ($\dim(H_j)$ finite, $C_j$ diagonal: see Censor and Zenios (1997)). It is clear that there are many ways to apply (4.3) to (4.1), and we will make use of this freedom to our advantage. One can show (Censor, Eggermont and Gordon 1983, Natterer 1986) that (4.3) converges provided that
\[ g_j \in \mathrm{range}(A_j), \qquad j = 0,\dots,p-1, \]
and $0 < \omega < 2$. This is reminiscent of the SOR theory of numerical analysis. In fact we have $f^k = A^* u^k$, where $u^k$ is the $k$th SOR iterate for the linear system $A A^* u = g$ with
\[ A = \begin{pmatrix} A_0 \\ \vdots \\ A_{p-1} \end{pmatrix}, \qquad g = \begin{pmatrix} g_0 \\ \vdots \\ g_{p-1} \end{pmatrix}. \]
If (4.2) is consistent, ART converges to the solution of (4.2) with minimal norm in $H$. Plain convergence is useful, but we can say much more about the qualitative behaviour and the speed of convergence by exploiting the special structure of the image reconstruction problems at hand. With $R$ the Radon transform in $\mathbb{R}^n$ we can put
\[ H = L_2(|x| \le 1), \qquad H_j = L_2(-1,+1;\, w^{1-n}), \qquad (A_j f)(s) = (Rf)(\theta_j, s), \]
where $w$ is the weight function $w(s) = (1 - s^2)^{1/2}$ and $\theta_j \in S^{n-1}$. One can show that the subspaces
\[ \mathcal{C}_m = \mathrm{span}\{C_{m,j} : j = 0,\dots,p-1\}, \qquad C_{m,j}(x) = C_m^{n/2}(x\cdot\theta_j), \]
with $C_m^{n/2}$ the Gegenbauer polynomials of degree $m$ (Abramowitz and Stegun 1970),
are invariant subspaces of the iteration (4.3). This has been discovered by Hamaker and Solmon (1978). Thus it suffices to study the convergence on each subspace $\mathcal{C}_m$ separately. The speed of convergence depends drastically on $\omega$ and - surprisingly - on the way the directions $\theta_j$ are ordered. We summarize the findings of Hamaker and Solmon (1978) and Natterer (1986) for the 2D case where $\theta_j = (\cos\varphi_j, \sin\varphi_j)^T$.

1. Let the $\varphi_j$ be ordered consecutively, that is, $\varphi_j = j\pi/p$, $j = 0,\dots,p-1$. For $\omega$ large (e.g., $\omega = 1$) convergence on $\mathcal{C}_m$ is fast for $m \le p$ large and slow for $m$ small. This means that the high-frequency parts of $f$ (such as noise) are recovered first, while the overall features of $f$ become visible only in the later iterations. For $\omega$ small (e.g., $\omega = 0.05$) the opposite is the case. This explains why the first ART iterates for $\omega = 1$ look terrible, and why surprisingly small relaxation factors (e.g. $\omega = 0.05$) are used in tomography.

2. Let the $\varphi_j$ be ordered in a suitable nonconsecutive way. Then fast convergence on all subspaces $\mathcal{C}_m$, $m \le p$, can be obtained even for $\omega = 1$. The subspaces $\mathcal{C}_m$ with $m > p$ are irrelevant, since they describe those details in $f$ that cannot be recovered from $p$ projections because they are below the resolution limit. This is a result of the resolution analysis in Natterer (1986).
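For $\dim(H_j) = 1$ and $C_j = A_j A_j^*$, one sweep of (4.3) is the classical Kaczmarz sweep through the rows of a matrix. A minimal sketch (Python/NumPy; the example system is ours and serves only as an illustration):

```python
import numpy as np

# Kaczmarz / ART, the case dim(H_j) = 1, C_j = A_j A_j^*: each substep
# of (4.3) with omega = 1 projects orthogonally onto the hyperplane
# a_j . f = g_j.
A = np.array([[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]])
f_true = np.array([1.0, 2.0])
g = A @ f_true                       # consistent data
omega = 1.0

f = np.zeros(2)
for sweep in range(200):
    for a, gj in zip(A, g):
        f = f + omega * (gj - a @ f) / (a @ a) * a
err = float(np.linalg.norm(f - f_true))
```

For this consistent system the iterates converge to the unique solution.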
4.2. EM (expectation maximization)

The EM algorithm for solving the linear system $Af = g$ reads
\[ f^{k+1} = \frac{f^k}{A^T \mathbf{1}}\; A^T \frac{g}{A f^k}, \qquad f^0 > 0. \tag{4.4} \]
Multiplications and divisions in this formula are understood componentwise ($\mathbf{1}$ being the vector of all ones). It is derived from a statistical model of image reconstruction: see Shepp and Vardi (1982). The purpose is to compute a maximizer of the log likelihood function
\[ \ell(f) = \sum_i \bigl(g_i \log(Af)_i - (Af)_i\bigr). \tag{4.5} \]
The convergence of (4.4) to a maximizer of (4.5) follows from the general EM theory in Dempster, Laird and Rubin (1977). Note that (4.4) preserves positivity if $a_{i\ell} > 0$. The pure EM method (4.4) produces unpleasant images which contain too much detail. Several remedies have been suggested. An obvious one is to stop the iteration early, typically after 12 steps. One can also perform a smoothing step after each iteration (EMS algorithm of Silverman, Jones, Nychka and Wilson (1990)). A theoretically more satisfying method is to add a penalty term $-B(f)$ to (4.5), that is, to maximize
\[ \ell(f) - B(f), \tag{4.6} \]
where $B(f)$ may be interpreted either in a Bayesian setting or simply as a smoothing term. Typically,
\[ B(f) = (f - \bar f)^T B\, (f - \bar f), \]
where $B$ is a positive definite matrix and $\bar f$ is a reference picture. Unfortunately, maximizing (4.6) is more difficult than maximizing (4.5) and cannot be done with a simple iteration such as (4.4). For partial solutions to this problem see Levitan and Herman (1987), Green (1990), Setzepfandt (1992). As in ART, a judicious arrangement of the equations can speed up the convergence significantly. The directions have to be arranged in 'ordered subsets': see Hudson, Hutton and Larkin (1992).
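The iteration (4.4) and the monotonicity of the likelihood (4.5) can be seen on a tiny example (Python/NumPy; the system is ours and serves only as an illustration):

```python
import numpy as np

# EM iteration (4.4): f^{k+1} = (f^k / (A^T 1)) * A^T (g / (A f^k)),
# all operations componentwise; the log likelihood (4.5) never decreases.
A = np.array([[1.0, 2.0], [3.0, 1.0], [1.0, 1.0]])
f_true = np.array([1.0, 2.0])
g = A @ f_true                       # consistent positive data
colsum = A.T @ np.ones(A.shape[0])   # A^T 1

def loglik(f):
    Af = A @ f
    return float(np.sum(g * np.log(Af) - Af))

f = np.ones(2)                       # f^0 > 0
lls = [loglik(f)]
for k in range(5000):
    f = f / colsum * (A.T @ (g / (A @ f)))
    lls.append(loglik(f))
res = float(np.linalg.norm(A @ f - g))
```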
4.3. MART (multiplicative algebraic reconstruction technique)

While ART converges in the consistent case to a minimal norm solution of (4.1), MART is designed to converge to a solution of (4.1) which minimizes the entropy
\[ \sum_{\ell=1}^{N} f_\ell \log f_\ell. \tag{4.7} \]
For this to make sense we assume that (4.1) has a positive solution, and we seek the minimizer of (4.7) among those $f$ that have only positive components. This is reasonable in many tomographic problems. The step $f^k \to f^{k+1}$ of the MART algorithm for (4.1) is as follows:
\[ \begin{aligned} f^{k,0} &= f^k, \\ f^{k,i}_\ell &= f^{k,i-1}_\ell \Bigl(\frac{g_i}{a_i^T f^{k,i-1}}\Bigr)^{\omega a_{i\ell}}, \qquad i = 1,\dots,M, \quad \ell = 1,\dots,N, \\ f^{k+1} &= f^{k,M}, \end{aligned} \]
where $a_i$ is the $i$th row of the matrix $(a_{i\ell})$. MART is an example of a multiplicative algorithm (see Pierro (1990)); another example is the EM algorithm.
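A sketch of the MART sweep for a consistent system with a positive solution (Python/NumPy; we take $\omega = 1$ and entries $a_{i\ell} \le 1$, a standard convergence requirement; the example is ours):

```python
import numpy as np

# MART: multiplicative updates f <- f * (g_i / (a_i . f))^{a_il},
# applied row by row; positivity of the iterates is preserved.
A = np.array([[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]])
f_true = np.array([1.0, 1.0, 1.0])
g = A @ f_true                       # consistent, positive solution exists

f = 0.5 * np.ones(3)                 # positive start
for sweep in range(2000):
    for a, gi in zip(A, g):
        f = f * (gi / (a @ f)) ** a  # componentwise power, exponents a_il
res = float(np.linalg.norm(A @ f - g))
```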
5. Circular harmonic algorithms

Circular harmonic algorithms can be derived from the inversion formula of Cormack (1963). For the 2D problem $Rf = g$ it is obtained by the Fourier expansions
\[ f(x) = \sum_{\ell\in\mathbb{Z}} f_\ell(|x|)\, e^{i\ell\varphi}, \qquad x = |x|(\cos\varphi, \sin\varphi)^T, \]
\[ g(\theta,s) = \sum_{\ell\in\mathbb{Z}} g_\ell(s)\, e^{i\ell\varphi}, \qquad \theta = (\cos\varphi, \sin\varphi)^T. \]
One can show that
\[ f_\ell(r) = -\frac{1}{\pi} \int_r^{\infty} (s^2 - r^2)^{-1/2}\, T_{|\ell|}\Bigl(\frac{s}{r}\Bigr)\, g_\ell'(s)\, ds, \tag{5.1} \]
where $T_\ell$ is the Chebyshev polynomial of the first kind of order $\ell$. In principle this defines an inversion formula for $R$: the Fourier coefficients $g_\ell$ of $g = Rf$ determine the Fourier coefficients $f_\ell$ of $f$ via (5.1), and hence $f$ is determined by $g$. The formula (5.1) is useless for practical calculations, since $T_\ell$ increases exponentially with $\ell$ outside $[-1,+1]$.
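The growth is drastic even for moderate orders, as the following sketch shows (Python/NumPy; an illustration of ours): for $x > 1$, $T_\ell(x) = \cosh(\ell\,\mathrm{arccosh}\,x)$, so $T_{20}$ already exceeds $10^3$ at $x = 1.1$.

```python
import numpy as np

# T_l outside [-1, 1] via the three-term recurrence
# T_0 = 1, T_1 = x, T_{l+1} = 2 x T_l - T_{l-1}; for x > 1 it grows
# like exp(l * arccosh(x)), which is why (5.1) is numerically useless.
x = 1.1
t0, t1 = 1.0, x
for l in range(1, 20):
    t0, t1 = t1, 2 * x * t1 - t0
T20 = t1
closed = float(np.cosh(20 * np.arccosh(x)))   # closed form for x >= 1
```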
Cormack (1964) also derived a stable version of his inversion formula, involving the Chebyshev polynomials $U_\ell$ of the second kind ($U_\ell$ being of order $\ell$). This formula does not suffer from exponential increase. It is the starting point of the circular harmonic reconstruction algorithms of Hansen (1981), Hawkins and Barrett (1986) and Chapman and Cary (1986). We take a different route and start out from (2.1) again. We consider only the case $n = 2$. Putting $x = x_{ik} = s_i\theta(\varphi_k)$, $\theta(\varphi) = (\cos\varphi, \sin\varphi)^T$, in (2.4) we obtain
\[ (V*f)(x_{ik}) \approx \frac{2\pi\rho}{pq} \sum_{\ell=-q}^{q} \sum_{j=0}^{p-1} v_\Omega\bigl(s_i\cos(\varphi_{j-k}) - s_\ell\bigr)\, g(\theta_j, s_\ell). \]
The $j$-sum is a convolution. In order to make this convolution cyclic we extend $g(\theta_j, s_\ell)$ by putting $g(\theta_{j+p}, s_\ell) = g(\theta_j, -s_\ell)$, in accordance with the evenness property of the Radon transform. Then,
\[ (V*f)(x_{ik}) \approx \frac{\pi\rho}{pq} \sum_{\ell=-q}^{q} \sum_{j=0}^{2p-1} v_\Omega\bigl(s_i\cos(\varphi_{j-k}) - s_\ell\bigr)\, g(\theta_j, s_\ell). \]
Defining
\[ h_{\ell i k} = \frac{\pi}{p} \sum_{j=0}^{2p-1} v_\Omega\bigl(s_i\cos(\varphi_{j-k}) - s_\ell\bigr)\, g(\theta_j, s_\ell), \qquad \ell = -q,\dots,q, \quad i = 0,\dots,q, \quad k = 0,\dots,2p-1, \]
we have
\[ (V*f)(x_{ik}) \approx \frac{\rho}{q} \sum_{\ell=-q}^{q} h_{\ell i k}. \]
This defines the circular harmonic algorithm.
Algorithm 4 Circular harmonic algorithm for standard parallel geometry Data: The values {gjtt = g(9j, S() : j = 0 , . . . ,p — 1, £ = —q,..., q}, where g is the 2D Radon transform of / . Step 1: Precompute the number V£tij = VQ(S{ cos(ipj) — S£j and extend g^£ to all j = 0 , . . . , 2p — 1 by gj+p,£ = gj,-e-
Step 2: For i = 0, …, q, ℓ = −q, …, q carry out the discrete cyclic convolutions

    h_{ℓik} = (π/p) Σ_{j=0}^{2p−1} v_{ℓ,i,j−k} g_{jℓ},    k = 0, …, 2p − 1.
Step 3: For i = 0, …, q and k = 0, …, 2p − 1, compute

    f_CH(x_{ik}) = (ρ/(2q)) Σ_{ℓ=−q}^{q} h_{ℓik}.

Result: f_CH(x_{ik}) is an approximation to f(x_{ik}).
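Step 2 is where the FFT enters: for each fixed (ℓ, i) the convolution over j is cyclic of length 2p, so all k can be obtained at once by the correlation theorem. A minimal numpy sketch (the array layout, the function name and the π/p normalization are my reading of the formulas above, not code from the text):

```python
import numpy as np

def cyclic_correlations(v, g):
    """Step 2 of the circular harmonic algorithm via FFT.

    v : (L, I, 2p) array, v[l, i, j] holding the precomputed filter values
    g : (2p, L) array, the data extended over j = 0, ..., 2p-1
    Returns h with h[l, i, k] = (pi/p) * sum_j v[l, i, (j-k) % 2p] * g[j, l].
    """
    two_p = v.shape[2]
    p = two_p // 2
    V = np.fft.fft(v, axis=2)            # transform the j index of the filter
    G = np.fft.fft(g, axis=0).T          # (L, 2p)
    # correlation theorem (v real): sum_j v[j-k] g[j] = IFFT(conj(V) * G)[k]
    h = np.fft.ifft(np.conj(V) * G[:, None, :], axis=2)
    return (np.pi / p) * h.real
```

Each of the O(q²) cyclic convolutions then costs O(p log p) rather than O(p²), which is the source of the O(q²p log p) operation count.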
Step 2 of the algorithm has to be done with a fast Fourier transform (FFT) in order to make the algorithm competitive with filtered backprojection. In that case Step 2 requires O(q²p log p) operations. This is slightly more (by the factor log p) than what is needed in the filtered backprojection algorithm. Step 3 needs 4pq² additions.

Algorithm 4 can be used almost without any changes for the interlaced parallel geometry, that is, for g_{jℓ} with ℓ + j odd missing (p even). One simply puts g_{jℓ} = 0 for j + ℓ odd and doubles h_{ℓik} in Step 2.

Circular harmonic algorithms are also available for standard fan beam data. Setting x = x_{ik} = t_i θ(β_k), t_i = iΔt, Δt = ρ/q in (2.9) gives

    (V ∗ f)(x_{ik}) ≈ r Δα Δβ Σ_{ℓ=−q}^{q} Σ_{j=0}^{p−1} v_Ω(t_i cos(β_{k−j} − α_ℓ) − r sin α_ℓ) g(β_j, α_ℓ) cos α_ℓ,
where g is now the fan beam data function from (2.8).

Algorithm 5 Circular harmonic algorithm for standard fan beam geometry

Data: The values {g_{jℓ} = g(β_j, α_ℓ) : j = 0, …, p − 1, ℓ = −q, …, q}, g being given by (2.8).

Step 1: Precompute the numbers v_{ℓij} = v_Ω(t_i cos(β_j − α_ℓ) − r sin α_ℓ) cos α_ℓ.

Step 2: For i = 0, …, q, ℓ = −q, …, q carry out the discrete cyclic convolutions

    h_{ℓik} = Δβ Σ_{j=0}^{p−1} v_{ℓ,i,k−j} g_{jℓ},    k = 0, …, p − 1.
Step 3: For i = 0, …, q and k = 0, …, p − 1 compute the sums

    f_CH(x_{ik}) = r Δα Σ_{ℓ=−q}^{q} h_{ℓik}.

Result: f_CH(x_{ik}) is an approximation to f(x_{ik}).

The complexity of Algorithm 5 is again O(q²p log p). A few remarks are in order.

1. Circular harmonic algorithms compute the reconstruction on a grid in polar coordinates. Interpolation to a Cartesian grid (for instance for the purpose of display) is not critical and can be done by linear interpolation.
2. The resolution of the circular harmonic algorithms is the same as for the corresponding filtered backprojection algorithms in Section 2.1.
3. Even though circular harmonic algorithms are asymptotically a little slower (by a factor of log p) than filtered backprojection, they usually run faster due to their simplicity. This is true in particular for fan beam data, because in that case the backprojection is quite time-consuming.
4. Circular harmonic algorithms tend to be more accurate than filtered backprojection because no additional approximations, such as interpolations or homogeneity approximations (in the fan beam case), are used.
5. The disadvantage of circular harmonic algorithms lies in the fact that they start with angular convolutions. This is considered impractical in radiological applications.
6. Fourier reconstruction

We have already made use of the relation

    ĝ(θ, σ) = (2π)^{(n−1)/2} f̂(σθ)    (6.1)

for the Radon transform g = Rf in R^n in Section 2.1 (the hat on g denoting the 1D Fourier transform with respect to the second variable), and of the corresponding formula for the n-dimensional ray transform in Section 3.4. In this section we use these formulas to derive reconstruction algorithms.

To fix ideas, we consider the case of the 2D Radon transform, sampled as in the standard parallel geometry. This means that g = Rf is given for θ = θ_j = (cos φ_j, sin φ_j)^T, φ_j = πj/p, j = 0, …, p − 1 and s = s_ℓ = ℓρ/q, ℓ = −q, …, q, as in Section 2.1. Here, f is assumed to vanish outside |x| < ρ. The idea of Fourier reconstruction is very simple: do a 1D Fourier transform on g with respect to the second variable for each θ. According to (6.1) this yields f̂ in all of R². Do a 2D inverse Fourier transform to obtain f. Even though this seems fairly obvious, the numerical implementation in a discrete setting is quite intricate. In fact, good Fourier algorithms have been found only quite recently. To begin with we describe the simplest possible implementation. We warn the reader that this algorithm is quite useless since it is not sufficiently accurate.
Algorithm 6 Standard Fourier reconstruction

Data: The numbers {g_{jℓ} = g(θ_j, s_ℓ) : j = 0, …, p − 1, ℓ = −q, …, q}, where g is the 2D Radon transform of f.

Step 1: For j = 0, …, p − 1 carry out the discrete Fourier transforms

    ĝ_{j,r} = (2π)^{−1/2} (ρ/q) Σ_{ℓ=−q}^{q} e^{−iπrℓ/q} g_{j,ℓ},    r = −q, …, q.

Step 2: For each k ∈ Z², |k| ≤ q, find (j, r) such that rθ_j is as close as possible to k and put f̂_k = (2π)^{−1/2} ĝ_{j,r}.

Step 3: Do the 2D discrete inverse Fourier transform

    f_m = (2π)^{−1} (π/ρ)² Σ_{|k|≤q} e^{iπ m·k/q} f̂_k,    m ∈ Z², |m| ≤ q.

Result: f_m is an approximation to f(ρm/q).
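Algorithm 6 can be transcribed almost line by line into numpy. In the following sketch the constants follow the formulas above, but the polar-to-Cartesian matching in Step 2 is my own bookkeeping, not the author's code:

```python
import numpy as np

def fourier_reconstruction(g, rho):
    """Algorithm 6 (standard Fourier reconstruction).

    g : (p, 2q+1) array, g[j, l] = Rf(theta_j, l*rho/q), phi_j = pi*j/p.
    Returns f on the grid x_m = rho*m/q, m in {-q, ..., q}^2.
    """
    p, n = g.shape
    q = (n - 1) // 2
    idx = np.arange(-q, q + 1)
    # Step 1: 1D DFTs; ghat[j, r+q] approximates ghat(theta_j, pi*r/rho)
    ghat = (2 * np.pi) ** -0.5 * (rho / q) * g @ np.exp(-1j * np.pi * np.outer(idx, idx) / q)
    # Step 2: nearest-neighbour interpolation onto the Cartesian grid (pi/rho)Z^2
    fhat = np.zeros((n, n), dtype=complex)
    for k1 in idx:
        for k2 in idx:
            if k1 * k1 + k2 * k2 > q * q:
                continue
            phi = np.arctan2(k2, k1) % (2 * np.pi)
            r = np.hypot(k1, k2)
            if phi >= np.pi:                 # evenness: g(theta, s) = g(-theta, -s)
                phi -= np.pi
                r = -r
            j = int(np.rint(phi * p / np.pi))
            if j == p:                       # angle rounded up to pi: wrap around
                j, r = 0, -r
            fhat[k1 + q, k2 + q] = (2 * np.pi) ** -0.5 * ghat[j, int(np.rint(r)) + q]
    # Step 3: 2D inverse DFT on the grid x_m = rho*m/q
    E = np.exp(1j * np.pi * np.outer(idx, idx) / q)
    return ((np.pi / rho) ** 2 / (2 * np.pi) * (E @ fhat @ E.T)).real
```

On a radially symmetric phantom the j index is immaterial and the method behaves tolerably; the inaccuracy the text warns about comes from the nearest-neighbour step on less symmetric data.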
The algorithm is designed to reconstruct a function f with support in |x| < ρ which is essentially Ω-band-limited. Inequalities (2.6), (2.7) have to be satisfied. We stress again that the algorithm as it stands is not to be recommended because of poor accuracy. Better versions are described below. A few comments are in order.

In Step 1 we compute an approximation ĝ_{j,r} to ĝ(θ_j, πr/ρ), where

    ĝ(θ, σ) = (2π)^{−1/2} ∫ e^{−iσs} g(θ, s) ds.

Under assumption (2.6) this approximation is reliable, since Fourier transforms are evaluated exactly by the trapezoidal rule if the Nyquist condition, in our case (2.6), is satisfied. According to (6.1), Step 1 provides us with the values

    f̂((π/ρ) r θ_j) ≈ (2π)^{−1/2} ĝ_{j,r},    r = −q, …, q.

In Step 2 we compute f̂ on the Cartesian grid (π/ρ)Z² by nearest neighbour interpolation:

    f̂((π/ρ)k) ≈ f̂_k,    k ∈ Z², |k| ≤ q.
Since f vanishes outside |x| < ρ, f̂ has bandwidth ρ. Thus sampling of f̂ on a 2D grid with step-size π/ρ is adequate by the sampling theorem. Step 3 is the trapezoidal rule for the 2D inverse Fourier transform, properly discretized and complying with the Nyquist condition. Hence f_m is an approximation to f(ρm/q).

Steps 1 and 3 of Algorithm 6 are justified by the sampling theorem. Thus the failure of the algorithm must be caused by the interpolation in Step 2. This is in fact the case. Of course we can replace the interpolation by a more accurate one, such as linear interpolation. However, this does not help much.

In spite of its poor accuracy, Fourier reconstruction is attractive because of its favourable complexity, which is due to the fast Fourier transform (FFT). We have used the FFT in the circular harmonic algorithm already, but the FFT is so essential for Fourier reconstruction that we say a few words here; for a thorough treatment we refer to Nussbaumer (1982). The discrete Fourier transform of length q is defined to be

    ŷ_k = Σ_{ℓ=0}^{q−1} e^{−2πikℓ/q} y_ℓ,    k = 0, …, q − 1.

Any algorithm that computes ŷ_0, …, ŷ_{q−1} from y_0, …, y_{q−1} in fewer than O(q²),
typically O(q log q), operations is called an FFT. In the circular harmonic algorithm we have used the FFT just for the evaluation of (6.3). In Fourier reconstruction we employ the FFT for the evaluation of Fourier integrals

    f̂(σ) = (2π)^{−1/2} ∫ e^{−iσs} f(s) ds

for functions f on R¹ with support in (−ρ, ρ). Sampling theory tells us that f̂ has to be discretized with step-size π/ρ (f̂ has bandwidth ρ). With h = ρ/q the step-size for f, the trapezoidal rule provides the approximation

    f̂(πk/ρ) ≈ (2π)^{−1/2} h Σ_{ℓ=−q}^{q−1} e^{−iπkℓ/q} f(ℓh),    k = −q, …, q − 1.    (6.4)
The range of k is in agreement with the sampling theorem: the step-size h = ρ/q corresponds to the bandwidth Ω = π/h = (π/ρ)q; hence |k| ≤ q in (6.4) suffices. Of course (6.4) is a discrete Fourier transform of length 2q. Sometimes one wants to adjust the step-sizes of f and f̂ differently. Then one has to evaluate

    ŷ_k = Σ_{ℓ=0}^{q−1} e^{−icℓk/q} y_ℓ,    k = 0, …, q − 1,    (6.5)
with an arbitrary real parameter c. This can be done by the chirp-z algorithm (see Nussbaumer (1982)), again using typically O(q log q) operations.

Assuming that we have a fast Fourier transform (FFT) algorithm that does a discrete Fourier transform of length q with O(q log q) operations (this may restrict q to 'FFT-friendly' numbers), the complexity of Algorithm 6 is as follows. Step 1 does p Fourier transforms of length 2q, requiring O(pq log q) operations. Assuming that the interpolation in Step 2 can be done in O(1) operations per point, we get O(q²) as the complexity of Step 2. The 2D Fourier transform in Step 3 can be done with O(q² log q) operations. Hence the complexity of Algorithm 6 is O((pq + q²) log q). This is much better than the filtered backprojection algorithm (Algorithm 1), which needs O(q²p) operations for a reconstruction on a q × q grid.

Presently there exist two Fourier methods with satisfactory accuracy and favourable complexity.

1. The linogram algorithm (Edholm and Herman 1987). Here, interpolation in Fourier space is avoided altogether by a clever choice of the directions
θ_j: one takes tan φ_j = j/q, where j, ℓ = −q, …, q. Doing a 1D Fourier transform on the g_{jℓ} results in approximations ĝ_{j,k} to ĝ(θ_j, σ_k) at the frequencies σ_k = kπ/(ρ cos φ_j), from which (6.1) yields

    f̂(σ_k θ_j) = (2π)^{−1/2} ĝ(θ_j, σ_k).

Note that this can be done efficiently by the chirp-z algorithm. The key observation is that the points

    σ_k θ_j = (kπ/ρ)(1, tan φ_j)^T = (kπ/ρ)(1, j/q)^T

form a grid lying on vertical lines with distance π/ρ, the points being evenly spaced within each vertical line (though with different step-sizes in different lines). On such a grid we can do a 1D FFT in the horizontal direction in the usual way. In the vertical direction the step-size is not what we need for a direct application of the FFT, but the chirp-z algorithm is still applicable. This takes care of the 2D inverse Fourier transform in |ξ₂| ≤ |ξ₁|. For |ξ₁| < |ξ₂| we proceed analogously with the data g_{jℓ} for the directions with cot φ_j = j/q, evaluating f̂(σθ_j) for σ = kπ/(ρ sin φ_j). We remark that the linogram data in Edholm and Herman (1987) are a little different from ours, namely s_ℓ = hℓ cos φ_j and s_ℓ = hℓ sin φ_j, respectively.
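The chirp-z evaluation that both (6.5) and the linogram transforms rely on rests on Bluestein's identity ℓk = (ℓ² + k² − (k−ℓ)²)/2, which turns the sum into a convolution computable with zero-padded FFTs. A sketch of my own (not code from Nussbaumer (1982)):

```python
import numpy as np

def chirp_z(y, c):
    """Evaluate yhat_k = sum_{l=0}^{q-1} exp(-1j*c*l*k/q) * y[l], k = 0..q-1,
    in O(q log q) operations via Bluestein's identity."""
    q = len(y)
    n = np.arange(q)
    w = np.exp(-1j * c * n**2 / (2 * q))            # chirp factors
    a = np.asarray(y) * w
    m = np.arange(-(q - 1), q)
    b = np.exp(1j * c * m**2 / (2 * q))             # e^{+ic(k-l)^2/(2q)}
    L = 1
    while L < 2 * q - 1:                            # power-of-two FFT length
        L *= 2
    conv = np.fft.ifft(np.fft.fft(a, L) * np.fft.fft(b, L))
    # b is stored with offset q-1, so yhat_k = w[k] * conv[k + q - 1]
    return w * conv[q - 1:2 * q - 1]
```

For c = 2π this reduces to the ordinary discrete Fourier transform, which gives a convenient check.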
2. The gridding algorithm (O'Sullivan 1985, Kaveh and Soumekh 1987, Schomberg and Timmer 1995). This algorithm works on the standard parallel data used in Algorithm 6. It does the interpolation in Step 2 of Algorithm 6 in the following way. Let w be a smooth function in R² with w = 1 on |x| < ρ which is decaying exponentially at infinity. Put f_w = wf. Then,

    f̂_w(ξ) = (2π)^{−1} ∫_0^π ∫_{−∞}^{∞} |σ| ŵ(ξ − σθ) f̂(σθ) dσ dθ

           = (2π)^{−3/2} ∫_0^π ∫_{−∞}^{∞} |σ| ŵ(ξ − σθ) ĝ(θ, σ) dσ dθ
by (6.1). Using a quadrature rule with nodes {σ_r, θ_j} and weights α_r, we obtain the approximation

    f̂_{w,k} = (2π)^{−3/2} Σ_{j=0}^{p−1} Σ_{r=−q}^{q} α_r ŵ((π/ρ)k − σ_r θ_j) ĝ_{j,r}    (6.7)

to f̂_w((π/ρ)k).
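As an illustration of how (6.7) is used, the following sketch evaluates it at one Cartesian grid point, truncating the sum to nodes near ξ. The radial Gaussian window, the truncation radius and the quadrature weights here are illustrative assumptions, not choices made in the text:

```python
import numpy as np

def gridding_value(ghat, theta, sigma, alpha, what, xi, radius=3.0):
    """Approximate fhat_w(xi) by the quadrature (6.7), keeping only nodes
    sigma_r * theta_j within `radius` of xi (where the window transform
    `what`, a function of distance, has decayed).

    ghat  : (p, nr) array of 1D Fourier transforms of the data
    theta : (p, 2) array of directions, sigma: (nr,) frequencies
    alpha : (nr,) quadrature weights, xi: (2,) evaluation point
    """
    total = 0.0j
    for j in range(theta.shape[0]):
        nodes = sigma[:, None] * theta[j]          # points sigma_r * theta_j
        d = np.linalg.norm(nodes - xi, axis=1)
        keep = d < radius                          # window truncation
        total += np.sum(alpha[keep] * what(d[keep]) * ghat[j, keep])
    return (2 * np.pi) ** -1.5 * total
```

With a rapidly decaying window the truncated sum agrees with the full one to high accuracy while touching only O(1) nodes per grid point, which is the whole point of gridding.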
The method relies on the following assumptions.
1. w is decaying at infinity so fast that only a few terms of the r sum in (6.7) have to be retained.
2. The dependence on the angle is not critical, so that it suffices to retain only a few terms in the j sum of (6.7).

If these conditions are met then f̂_{w,k} of (6.7) is a good approximation to f̂_w((π/ρ)k) which can be evaluated in essentially O(1) operations for each k. This takes care of Step 2. Of course, when using (6.7) we have to divide f by w after Step 3 to make up for the previous multiplication by w. Needless to say, our derivation of the gridding algorithm is purely heuristic. It seems that at present there exists no convincing theoretical analysis of the gridding algorithm.

7. Conclusions

In the preceding sections we have given the fundamentals of the most widely used algorithms in tomography. In many ways these fundamentals are quite different from traditional numerical analysis, the main difference being the consistent use of sampling theory and Fourier analysis. The development of algorithms is still very lively, particularly in 3D and in Fourier-based algorithms.

We have dealt only with the simplest problems and with standard situations. Practical problems deviate in many ways from the simple ones we considered. Often the data is incomplete (see Louis (1980)), leading to nonuniqueness and instability. Sometimes the integral equations to be solved are not completely specified: for instance, the weight function (as in emission tomography (Welch, Clack, Natterer and Gullberg 1997)) or the directions (as in electron microscopy (Wuschke 1990, Gelfand and Goncharov 1990)) are unknown. In technical applications in particular, the number of data is often so small (see, for instance, Sielschott and Derichs (1995)) that full reconstruction is impossible and special algorithms have to be developed, usually tailored to the specific application.
Sometimes only certain features of the object, such as boundaries between regions of different densities, are sought (Faridani, Finch, Ritman and Smith 1997, Ramm and Katsevich 1996), calling for special algorithms.
At present we have an adequate understanding of the fundamentals of tomographic reconstruction algorithms. However, new applications of tomography are arising almost daily, each presenting new challenges to the numerical analyst. So I guess that research in this field will go on for ever!
REFERENCES
M. Abramowitz and I. A. Stegun, eds (1970), Handbook of Mathematical Functions, Dover.
Y. Censor (1981), 'Row-action methods for huge and sparse systems and their applications', SIAM Review 23, 444-466.
Y. Censor, P. B. Eggermont and D. Gordon (1983), 'Strong underrelaxation in Kaczmarz's method for inconsistent systems', Numer. Math. 41, 83-92.
Y. Censor and S. A. Zenios (1997), Parallel Optimization, Oxford University Press.
L. T. Chang and G. T. Herman (1980), 'A scientific study of filter selection for a fan-beam convolution algorithm', SIAM J. Appl. Math. 39, 83-105.
C. H. Chapman and P. W. Cary (1986), 'The circular harmonic Radon transform', Inverse Problems 2, 23-49.
J. G. Colsher (1980), 'Fully three-dimensional emission tomography', Phys. Med. Biol. 25, 103-115.
A. M. Cormack (1963), 'Representation of a function by its line integrals, with some radiological applications I', J. Appl. Phys. 34, 2722-2727.
A. M. Cormack (1964), 'Representation of a function by its line integrals, with some radiological applications II', J. Appl. Phys. 35, 195-207.
S. R. Deans (1983), The Radon Transform and some of its Applications, Wiley.
M. Defrise, D. W. Townsend and R. Clack (1989), 'Three-dimensional image reconstruction from complete projections', Phys. Med. Biol. 34, 573-587.
M. Defrise and R. Clack (1995), 'A cone-beam reconstruction algorithm using shift-variant filtering and cone-beam backprojection', IEEE Trans. Med. Imag. 13, 186-195.
A. P. Dempster, N. M. Laird and D. B. Rubin (1977), 'Maximum likelihood from incomplete data via the EM algorithm', J. R. Statist. Soc. B 39, 1-38.
P. Edholm and G. T. Herman (1987), 'Linograms in image reconstruction from projections', IEEE Trans. Med. Imag. 6, 301-307.
A. Faridani, D. V. Finch, E. L. Ritman and K. T. Smith (1997), 'Local tomography II', SIAM J. Appl. Math. 57, 1095-1127.
J. Frank, ed. (1992), Electron Tomography, Plenum Press.
I. M. Gelfand and A. B. Goncharov (1987), 'Recovery of a compactly supported function starting from its integrals over lines intersecting a given set of points in space', Doklady 290 (1986); English translation in Soviet Math. Dokl. 34, 373-376.
I. M. Gelfand and A. B. Goncharov (1990), 'Spatial rotational alignment of identical particles given their projections: theory and practice', Translations of Mathematical Monographs 81, 97-122.
R. Gordon, R. Bender and G. T. Herman (1970), 'Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography', J. Theor. Biol. 29, 471-481.
P. Grangeat (1991), 'Mathematical framework of cone-beam reconstruction via the first derivative of the Radon transform', in Mathematical Methods in Tomography, Vol. 1497 of Lecture Notes in Mathematics (G. T. Herman, A. K. Louis and F. Natterer, eds), Springer, pp. 66-97.
P. J. Green (1990), 'Bayesian reconstruction from emission tomography data using a modified EM algorithm', IEEE Trans. Med. Imag. 9, 84-93.
C. Hamaker and D. C. Solmon (1978), 'The angles between the null spaces of X-rays', J. Math. Anal. Appl. 62, 1-23.
E. W. Hansen (1981), 'Circular harmonic image reconstruction', Applied Optics 20, 2266-2274.
W. G. Hawkins and H. H. Barrett (1986), 'A numerically stable circular harmonic reconstruction algorithm', SIAM J. Numer. Anal. 23, 873-890.
G. T. Herman (1980), Image Reconstruction from Projections: The Fundamentals of Computerized Tomography, Academic Press.
G. T. Herman and L. Meyer (1993), 'Algebraic reconstruction techniques can be made computationally efficient', IEEE Trans. Med. Imag. 12, 600-609.
H. M. Hudson, B. F. Hutton and R. Larkin (1992), 'Accelerated EM reconstruction using ordered subsets', J. Nucl. Med. 33, 960-968.
A. J. Jerri (1977), 'The Shannon sampling theorem - its various extensions and applications: a tutorial review', Proc. IEEE 65, 1565-1596.
S. Kaczmarz (1937), 'Angenäherte Auflösung von Systemen linearer Gleichungen', Bulletin de l'Académie Polonaise des Sciences et des Lettres A35, 355-357.
A. C. Kak and M. Slaney (1987), Principles of Computerized Tomographic Imaging, IEEE Press, New York.
M. Kaveh and M. Soumekh (1987), 'Computer assisted diffraction tomography', in Image Recovery: Theory and Application (H. Stark, ed.), Academic Press, pp. 369-413.
H. Kruse (1989), 'Resolution of reconstruction methods in computerized tomography', SIAM J. Sci. Statist. Comput. 10, 447-474.
E. Levitan and G. T. Herman (1987), 'A maximum a posteriori probability expectation maximization algorithm for image reconstruction in emission tomography', IEEE Trans. Med. Imag. 6, 185-192.
R. M. Lewitt (1992), 'Alternatives to voxels for image representations in iterative reconstruction algorithms', Phys. Med. Biol. 37, 705-716.
A. K. Louis (1980), 'Picture reconstruction from projections in restricted range', Math. Meth. Appl. Sci. 2, 209-220.
R. Marabini, G. T. Herman and J. M. Carazo (1998), 'Fully three-dimensional reconstruction in electron microscopy', in Computational Radiology and Imaging: Therapy and Diagnostics (C. Börgers and F. Natterer, eds), Vol. 110 of IMA Volumes in Mathematics and its Applications, Springer.
R. B. Marr, C. N. Chen and P. C. Lauterbur (1981), 'On two approaches to 3D reconstruction in NMR zeugmatography', in Mathematical Aspects of Computerized Tomography (G. T. Herman and F. Natterer, eds), Proceedings, Oberwolfach 1980, Springer, pp. 225-240.
F. Natterer (1986), The Mathematics of Computerized Tomography, Wiley and Teubner.
F. Natterer (1993), 'Sampling in fan beam tomography', SIAM J. Appl. Math. 53, 358-380.
F. Natterer and A. Faridani (1990), 'Basic algorithms in tomography', in Signal Processing Part II: Control Theory and Applications (F. A. Grünbaum et al., eds), Springer, pp. 321-334.
S. Nilsson (1997), 'Application of fast backprojection techniques for some inverse problems of integral geometry', Linköping Studies in Science and Technology, Dissertation No. 499, Department of Mathematics, Linköping University, Linköping, Sweden.
H. J. Nussbaumer (1982), Fast Fourier Transform and Convolution Algorithms, Springer.
S. S. Orlov (1976), 'Theory of three dimensional reconstruction, II: The recovery operator', Sov. Phys. Crystallogr. 20, 429-433.
J. D. O'Sullivan (1985), 'A fast sinc function gridding algorithm for Fourier inversion in computer tomography', IEEE Trans. Med. Imag. 4, 200-207.
A. R. De Pierro (1990), 'Multiplicative iterative methods in computed tomography', in Mathematical Methods in Tomography (G. T. Herman, A. K. Louis and F. Natterer, eds), Springer, pp. 167-186.
J. Radon (1917), 'Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten', Berichte Sächsische Akademie der Wissenschaften, Math.-Phys. Kl. 69, 262-267, Leipzig.
A. Ramm and A. Katsevich (1996), The Radon Transform and Local Tomography, CRC Press.
H. Schomberg and J. Timmer (1995), 'The gridding method for image reconstruction by Fourier transformation', IEEE Trans. Med. Imag. 14, 596-607.
L. A. Shepp and B. F. Logan (1974), 'The Fourier reconstruction of a head section', IEEE Trans. Nucl. Sci. NS-21, 21-43.
L. A. Shepp and Y. Vardi (1982), 'Maximum likelihood reconstruction for emission tomography', IEEE Trans. Med. Imag. 1, 113-122.
B. Setzepfandt (1992), 'ESNM: Ein rauschunterdrückendes EM-Verfahren für die Emissionstomographie', Dissertation, Fachbereich Mathematik, Universität Münster.
H. Sielschott and W. Derichs (1995), 'Use of collocation methods under inclusion of a priori information in acoustic pyrometry', in Proc. European Concerted Action on Process Tomography, Bergen, Norway, pp. 110-117.
B. W. Silverman, M. C. Jones, D. W. Nychka and J. D. Wilson (1990), 'A smoothed EM approach to indirect estimation problems, with particular reference to stereology and emission tomography', J. R. Statist. Soc. B 52, 271-324.
H. K. Tuy (1983), 'An inversion formula for cone-beam reconstruction', SIAM J. Appl. Math. 43, 546-552.
A. Welch, R. Clack, F. Natterer and G. T. Gullberg (1997), 'Towards accurate attenuation correction in SPECT', IEEE Trans. Med. Imag. 16, 532-541.
K. Wuschke (1990), 'Die Rekonstruktion von Orientierungen aus Projektionen', Diplomarbeit, Institut für Numerische und Instrumentelle Mathematik, Universität Münster.
Acta Numerica (1999), pp. 143-195
© Cambridge University Press, 1999
Approximation theory of the MLP model in neural networks

Allan Pinkus
Department of Mathematics, Technion - Israel Institute of Technology, Haifa 32000, Israel
E-mail: [email protected]
In this survey we discuss various approximation-theoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. The MLP model is one of the more popular and practical of the many neural network models. Mathematically it is also one of the simpler models. Nonetheless the mathematics of this model is not well understood, and many of these problems are approximation-theoretic in character. Most of the research we will discuss is of very recent vintage. We will report on what has been done and on various unanswered questions. We will not be presenting practical (algorithmic) methods. We will, however, be exploring the capabilities and limitations of this model. In the first two sections we present a brief introduction and overview of neural networks and the multilayer feedforward perceptron model. In Section 3 we discuss in great detail the question of density. When does this model have the theoretical ability to approximate any reasonable function arbitrarily well? In Section 4 we present conditions for simultaneously approximating a function and its derivatives. Section 5 considers the interpolation capability of this model. In Section 6 we study upper and lower bounds on the order of approximation of this model. The material presented in Sections 3-6 treats the single hidden layer MLP model. In Section 7 we discuss some of the differences that arise when considering more than one hidden layer. The lengthy list of references includes many papers not cited in the text, but relevant to the subject matter of this survey.
144
A. PINKUS
CONTENTS
1 On neural networks              144
2 The MLP model                   146
3 Density                         150
4 Derivative approximation        162
5 Interpolation                   165
6 Degree of approximation         167
7 Two hidden layers               182
References                        187
1. On neural networks

It will be assumed that most readers are pure and/or applied mathematicians who are less than conversant with the theory of neural networks. As such we begin this survey with a very brief, and thus inadequate, introduction.

The question 'What is a neural network?' is ill-posed. From a quick glance through the literature one quickly realizes that there is no universally accepted definition of what the theory of neural networks is, or what it should be. It is generally agreed that neural network theory is a collection of models of computation very, very loosely based on biological motivations. According to Haykin (1994, p. 2):

'A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
1. Knowledge is acquired by the network through a learning process.
2. Interneuron connection strengths known as synaptic weights are used to store the knowledge.'

This is a highly nonmathematical formulation. Let us try to be a bit less heuristic. Neural network models have certain common characteristics. In all these models we are given a set of inputs x = (x_1, …, x_n) ∈ R^n and some process that results in a corresponding set of outputs y = (y_1, …, y_m) ∈ R^m. The basic underlying assumption of our models is that the process is given by some mathematical function, that is, y = G(x) for some function G. The function G may be very complicated. More importantly, we cannot expect to be able to compute exactly the unknown G. What we do is choose our 'candidate' F (for G) from some parametrized set of functions, using a given set of examples, that is, some inputs x and associated 'correct' outputs y = G(x), which we assume will help us to choose the parameters. This is a very general framework. In fact it is
APPROXIMATION THEORY OF THE MLP
MODEL IN NEURAL NETWORKS
145
still too general. Neural network models may be considered as particular choices of classes of functions F(x, w) where the w are the parameters, together with various rules and regulations as well as specific procedures for optimizing the choice of parameters. Most people would also agree that a neural network is an input/output system with many simple processors, each having a small amount of local memory. These units are connected by communication channels carrying data. Most neural network models have some sort of training rule, that is, they learn or are trained from a set of examples. There are many, many different models of neural network. (Sarle (1998) lists over 40 different recognized neural network models, and there are a plethora of additional candidates.) Neural networks have emerged, or are emerging, as a practical technology, that is, they are being successfully applied to real world problems. Many of their applications have to do with pattern recognition, pattern classification, or function approximation, which are all based on a large set of available examples (training set). According to Bishop (1995, p. 5): 'The importance of neural networks in this context is that they offer a very powerful and very general framework for representing non-linear mappings from several input variables to several output variables, where the form of the mapping is governed by a number of adjustable parameters.' The nonlinearity of the neural network models presents advantages and disadvantages. The price (and there always is a cost) is that the procedure for determining the values of the parameters is now a problem in nonlinear optimization which tends to be computationally intensive and complicated. The problem of finding efficient algorithms is of vital importance and the true utility of any model crucially depends upon its efficiency. (However, this is not an issue we will consider in this survey.) 
The theory of neural nets has become increasingly popular in the fields of computer science, statistics, engineering (especially electrical engineering), physics, and many more directly applicable areas. There are now four major journals in the field, as well as numerous more minor journals. These leading journals are IEEE Transactions on Neural Networks, Neural Computation, Neural Networks and Neurocomputing. Similarly, there are now dozens of textbooks on the theory. In the references of this paper are listed only five books, namely Haykin (1994), Bishop (1995), Ripley (1996), Devroye, Györfi and Lugosi (1996), and Ellacott and Bos (1996), all of which have appeared in the last five years. The IEEE has generally sponsored (since 1987) two annual conferences on neural networks. Their proceedings run to over 2000 pages and each contains a few hundred articles and abstracts. A quick search of Mathematical Reviews (MathSciNet) turned up a mere 1800 entries when the phrase 'neural network' was entered (and you should realize that much of the neural network literature, including all the above-mentioned journals, is
not written for or by mathematicians and is not reviewed by Mathematical Reviews). In other words, this is an explosively active research area and deserves the attention of the readership of Acta Numerica. Initially there was a definite lack of mathematical sophistication to the theory. It tended to be more a collection of ad hoc techniques with debatable justifications. To a pure mathematician, such as the author, reading through some of the early literature in the field was an alien experience. In recent years the professionals (especially statisticians) have established a more organized framework for the theory. The reader who would like to acquire a more balanced and enlarged view of the theory of neural networks is urged to peruse a few of the above-mentioned texts. An additional excellent source of information about neural networks and its literature is the 'frequently asked questions' (FAQs) of the Usenet newsgroup comp.ai.neural-nets: see Sarle (1998). This survey is not about neural networks per se, but about the approximation theory of the multilayer feedforward perceptron (MLP) model in neural networks. We will consider certain mathematical, rather than computational or statistical, problems associated with this widely used neural net model. More explicitly, we shall concern ourselves with problems of density (when the models have at least the theoretical capability of providing good approximations), degree of approximation (the extent to which they can approximate, as a function of the number of parameters), interpolation, and related issues. Theoretical results, such as those we will survey, do not usually have direct applications. In fact they are often far removed from practical considerations. Rather they are meant to tell us what is possible and, sometimes equally importantly, what is not. They are also meant to explain why certain things can or cannot occur, by highlighting their salient characteristics, and this can be very useful.
As such we have tried to provide proofs of many of the results surveyed. The 1994 issue of Acta Numerica contained a detailed survey: 'Aspects of the numerical analysis of neural networks' by S. W. Ellacott (1994). Only five years have since elapsed, but the editors have again opted to solicit a survey (this time albeit with a slightly altered emphasis) related to neural networks. This is not unwarranted. While almost half of that survey was devoted to approximation-theoretic results in neural networks, almost every one of those results has been superseded. It is to be hoped that the same will be said about this paper five years hence.

2. The MLP model

One of the more conceptually attractive of the neural network models is the multilayer feedforward perceptron (MLP) model. In its most basic form this is a model consisting of a finite number of successive layers. Each layer
consists of a finite number of units (often called neurons). Each unit of each layer is connected to each unit of the subsequent (and thus previous) layer. These connections are generally called links or synapses. Information flows from one layer to the subsequent layer (thus the term feedforward). The first layer, called the input layer, consists of the input. There are then intermediate layers, called hidden layers. The resulting output is obtained in the last layer, not surprisingly called the output layer. The rules and regulations governing this model are the following.

1. The input layer has as output of its jth unit the (input) value x_{0j}.
2. The kth unit of the ith layer receives the output x_{ij} from each jth unit of the (i−1)st layer. The values x_{ij} are then multiplied by some constants (called weights) w_{ijk}, and these products are summed.
3. A shift θ_{ik} (called a threshold or bias) and then a fixed mapping σ (called an activation function) are applied to the above sum, and the resulting value represents the output x_{i+1,k} of this kth unit of the ith layer, that is,

    x_{i+1,k} = σ( Σ_j w_{ijk} x_{ij} − θ_{ik} ).
A priori one typically fixes, for whatever reasons, the activation function, the number of layers and the number of units in each layer. The next step is to choose, in some way, the values of the weights w_{ijk} and thresholds θ_{ik}. These latter values are generally chosen so that the model behaves well on some given set of inputs and associated outputs. (These are called the training set.) The process of determining the weights and thresholds is called learning or training. In the multilayer feedforward perceptron model, the basic learning algorithm is called backpropagation. Backpropagation is a gradient descent method. It is extremely important in this model and in neural network theory. We shall not detail this algorithm nor the numerous numerical difficulties involved. We will classify multilayer feedforward perceptron models not by their number of layers, but by their number of hidden layers, that is, the number of layers excluding the input and output layer.

As is evident, neural network theory has its own terminology. Unfortunately it is also true that this terminology is not always consistent or logical. For example, the term multilayer perceptron is generically applied to the above model with at least one hidden layer. On the other hand, the word perceptron was coined by F. Rosenblatt for the no hidden layer model with the specific activation function given by the Heaviside function
$$\sigma(t) = \begin{cases} 1, & t \ge 0,\\ 0, & t < 0.\end{cases}$$
Thus
$$y_k = \sigma\Big(\sum_{j=1}^n w_{jk}\, x_j - \theta_k\Big), \qquad k = 1,\ldots,m, \tag{2.1}$$
for some choice of $\sigma$, $w_{jk}$ and $\theta_k$, $j = 1,\ldots,n$, $k = 1,\ldots,m$. This no hidden layer perceptron network is generally no longer used, except in problems of linear separation. There is a simple mathematical rationale for this. A function of the form (2.1) is constant along certain parallel hyperplanes and thus is limited in what it can do. For example, assume $m = 1$ (one output), $n = 2$, and $\sigma$ is any increasing function. If the input is $\mathbf{x} = (x_1, x_2)$ and the output is $y$, then
$$y = \sigma(w_1 x_1 + w_2 x_2 - \theta).$$
Assume we are given four inputs $\mathbf{x}^1, \mathbf{x}^2, \mathbf{x}^3, \mathbf{x}^4$, no three of which lie on a straight line. Then, as is easily seen, there are output values which cannot be interpolated or approximated well. For example, assume $\mathbf{x}^1$ and $\mathbf{x}^2$ lie on opposite sides of the line through $\mathbf{x}^3$ and $\mathbf{x}^4$. Set $y_1 = y_2 = 1$, $y_3 = y_4 = 0$. Then we cannot solve
$$y_i = \sigma(w_1 x_1^i + w_2 x_2^i - \theta), \qquad i = 1,\ldots,4,$$
for any choice of $w_1, w_2$ and $\theta$. In fact the difference between at least one of the $y_i$ and the associated output will be at least $1/2$. This is totally unacceptable if one wishes to build a network that can approximate well any reasonable function, or classify points according to different criteria. With the Heaviside activation function and no hidden layer, two sets of points can be separated (classified) by this model if and only if they are linearly separable. To do more, hidden layers are necessary. The problem of being able to arbitrarily separate $N$ generic points in $\mathbb{R}^n$ into two sets by use of a one hidden layer perceptron model with Heaviside activation function (and one output) was considered by Baum (1988). He showed that the problem is solvable if one uses at least $\lceil N/n \rceil$ units in the hidden layer. This model can be used with both continuously valued and discrete inputs. Baum considers the latter; we will consider the former. We will prove that hidden layers and nonlinearity (or, to be more precise, nonpolynomiality) of the activation function make for models that have the capability of approximating (and interpolating) arbitrarily well.
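To make the role of the hidden layer concrete, here is a minimal Python sketch (all weights and thresholds are hand-picked for illustration and are not from the original text): a one hidden layer Heaviside network reproduces exactly the four labels that, by the argument above, no no-hidden-layer perceptron can match.

```python
# Sketch: a one hidden layer Heaviside network realizes labels that no
# no-hidden-layer perceptron can (the four-point configuration above).
# All weights and thresholds below are hand-picked for illustration.

def heaviside(t):
    return 1.0 if t >= 0 else 0.0

def one_hidden_layer(x, W, thetas, c):
    # y = sum_i c_i * sigma(w^i . x - theta_i), Heaviside activation
    return sum(ci * heaviside(sum(wj * xj for wj, xj in zip(wi, x)) - ti)
               for wi, ti, ci in zip(W, thetas, c))

# x1, x2 lie on opposite sides of the line through x3, x4 (x1 + x2 = 1).
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
labels = [1.0, 1.0, 0.0, 0.0]

# Three Heaviside units: a constant unit (zero weights, threshold -1),
# and two parallel half-plane indicators, giving
# y = 1 - H(x1 + x2 - 0.5) + H(x1 + x2 - 1.5).
W = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0)]
thetas = [-1.0, 0.5, 1.5]
c = [1.0, -1.0, 1.0]

outputs = [one_hidden_layer(x, W, thetas, c) for x in points]
print(outputs)  # [1.0, 1.0, 0.0, 0.0]
```

The two parallel hyperplanes carve out the strip $0.5 \le x_1 + x_2 \le 1.5$, precisely the set a single no-hidden-layer unit, constant along parallel lines, cannot isolate.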
The model presented above permits generalization, and this can be, and often is, done in a number of ways. The activation function may change from layer to layer (or from unit to unit). We can replace the simple linearity at each unit (i.e., $\sum_j w_{ijk} x_{ij}$) by some more complicated function of the $x_{ij}$. The architecture may be altered to allow for different links between units of different layers (and perhaps also of the same layer). These are just a few of the many, many possible generalizations. As the mathematical analysis of the multilayer perceptron model is far from being well understood, we will consider only this basic model, with minor modifications. For example, while it is usual in the multilayer perceptron model to apply the same activation function at each hidden layer, it is often the case, and we will follow this convention here, that there be no activation function or threshold applied at the output layer. There may be various reasons for this, from a practical point of view, depending on the problem considered. From a mathematical perspective, applying an activation function to the output layer, especially if the activation function is bounded, is unnecessarily restrictive. Another simplification we will make is to consider models with only one output (unless otherwise noted). This is no real restriction and will tremendously simplify our notation. With the above modifications (no activation function or threshold applied to the output layer and only one output), we write the output $y$ of a single hidden layer perceptron model with $r$ units in the hidden layer and input $\mathbf{x} = (x_1,\ldots,x_n)$ as
$$y = \sum_{i=1}^r c_i\, \sigma\Big(\sum_{j=1}^n w_{ij}\, x_j - \theta_i\Big).$$
Here $w_{ij}$ is the weight between the $j$th unit of the input and the $i$th unit in the hidden layer, $\theta_i$ is the threshold at the $i$th unit of the hidden layer, and $c_i$ is the weight between the $i$th unit of the hidden layer and the output. We will generally write this more succinctly as
$$y = \sum_{i=1}^r c_i\, \sigma(\mathbf{w}^i \cdot \mathbf{x} - \theta_i),$$
where $\mathbf{w} \cdot \mathbf{x} = \sum_{j=1}^n w_j x_j$ is the standard inner product. We can also express the output $y$ of a two hidden layer perceptron model with $r$ units in the first hidden layer, $s$ units in the second hidden layer, and input $\mathbf{x} = (x_1,\ldots,x_n)$. It is
$$y = \sum_{k=1}^s d_k\, \sigma\Big(\sum_{i=1}^r c_{ik}\, \sigma(\mathbf{w}^{ik} \cdot \mathbf{x} - \theta_{ik}) - \gamma_k\Big).$$
That is, we iterate the one hidden layer model. We will not write out the exact formula for the output of this model with more hidden layers.
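The two displayed formulas translate directly into code. A minimal Python sketch with the logistic sigmoid (the specific weights below are arbitrary illustrative values; for brevity the first layer units are shared across the outer sum of the two hidden layer model):

```python
import math

def sigma(t):
    # logistic sigmoid activation
    return 1.0 / (1.0 + math.exp(-t))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def one_hidden(x, W, theta, c):
    # y = sum_{i=1}^r c_i * sigma(w^i . x - theta_i)
    return sum(c[i] * sigma(dot(W[i], x) - theta[i]) for i in range(len(c)))

def two_hidden(x, W, theta, C, gamma, d):
    # y = sum_k d_k * sigma( sum_i C[k][i] * sigma(w^i . x - theta_i) - gamma_k ),
    # i.e. the one hidden layer model iterated once.
    hidden = [sigma(dot(W[i], x) - theta[i]) for i in range(len(theta))]
    return sum(d[k] * sigma(dot(C[k], hidden) - gamma[k]) for k in range(len(d)))

x = (1.0, -1.0)
W = [(1.0, 1.0)]                       # w . x = 0 at this input
y1 = one_hidden(x, W, [0.0], [2.0])    # 2 * sigma(0) = 1.0
y2 = two_hidden(x, W, [0.0], [[0.0]], [0.0], [1.0])   # sigma(0) = 0.5
```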
Some common choices for activation functions $\sigma$ (all may be found in the literature) are the following.

1. The Heaviside function mentioned above, that is, $\sigma(t) = \chi_{[0,\infty)}(t)$. This is sometimes referred to in the neural network literature as the threshold function.
2. The logistic sigmoid given by
$$\sigma(t) = \frac{1}{1 + e^{-t}}.$$
3. $\sigma(t) = \tanh(t/2)$, which is, up to a constant, just a shift of the logistic sigmoid.
4. The piecewise linear function of the form
$$\sigma(t) = \begin{cases} 0, & t \le -\tfrac12,\\[2pt] t + \tfrac12, & -\tfrac12 \le t \le \tfrac12,\\[2pt] 1, & t \ge \tfrac12.\end{cases}$$
5. The Gaussian sigmoid given by
$$\sigma(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-y^2/2}\, \mathrm{d}y.$$
6. The arctan sigmoid given by
$$\sigma(t) = \frac{1}{\pi} \arctan(t) + \frac12.$$
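The six activations above can be written down directly; a minimal Python sketch (the breakpoints $\pm\tfrac12$ in the piecewise linear variant are one common normalization, and the Heaviside convention $\sigma(0) = 1$ follows $\chi_{[0,\infty)}$):

```python
import math

def heaviside(t):
    # chi_[0, infinity): the threshold function
    return 1.0 if t >= 0 else 0.0

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def tanh_sigmoid(t):
    # up to a constant, a shift of the logistic sigmoid: tanh(t/2) = 2*logistic(t) - 1
    return math.tanh(t / 2.0)

def piecewise_linear(t):
    # 0 for t <= -1/2, linear in between, 1 for t >= 1/2
    return min(1.0, max(0.0, t + 0.5))

def gaussian_sigmoid(t):
    # (2*pi)^(-1/2) * integral_{-inf}^{t} exp(-y^2/2) dy, expressed via erf
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def arctan_sigmoid(t):
    return math.atan(t) / math.pi + 0.5
```

All six are bounded and (except for the Heaviside) continuous, increasing from 0 to 1, matching the sigmoidal behaviour discussed next.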
The logistic sigmoid is often used because it is well suited to the demands of backpropagation. It is a $C^\infty$ function whose derivative is easily calculated. Note that all the above functions are bounded (generally increasing from 0 to 1). The term sigmoidal is used for the class of activation functions satisfying $\lim_{t\to-\infty} \sigma(t) = 0$ and $\lim_{t\to\infty} \sigma(t) = 1$. However, there is a certain lack of consistency in the terminology. Some authors also demand that $\sigma$ be continuous and/or monotonic (or even strictly monotonic) on all of $\mathbb{R}$. Others make no such demands. We shall try to be explicit in what we mean when we use the term.

3. Density

In this section we will consider density questions associated with the single hidden layer perceptron model. That is, we consider the set
$$\mathcal{M}(\sigma) = \mathrm{span}\{\sigma(\mathbf{w} \cdot \mathbf{x} - \theta) : \theta \in \mathbb{R},\ \mathbf{w} \in \mathbb{R}^n\},$$
and ask the following question. For which $\sigma$ is it true that, for any $f \in C(\mathbb{R}^n)$, any compact subset $K$ of $\mathbb{R}^n$, and any $\varepsilon > 0$, there exists a $g \in \mathcal{M}(\sigma)$ such that
$$\max_{\mathbf{x} \in K} |f(\mathbf{x}) - g(\mathbf{x})| < \varepsilon?$$
In other words, when do we have density of the linear space $\mathcal{M}(\sigma)$ in the space $C(\mathbb{R}^n)$, in the topology of uniform convergence on compacta (compact sets)? In fact we shall also restrict the permissible set of weights $\mathbf{w}$ and thresholds $\theta$. To set terminology, we shall say that $\sigma$ has the density property if $\mathcal{M}(\sigma)$ is dense in $C(\mathbb{R}^n)$ in the above topology. It should be noted that this norm is very strong. If $\mu$ is any nonnegative finite Borel measure, with support in some compact set $K$, then $C(K)$ is dense in $L^p(K,\mu)$ for any $1 \le p < \infty$. Thus the results of this section extend also to these spaces. In the renaissance of neural net theory that started in the mid-1980s, it was clearly understood that this density question, whether for the single hidden or any number of hidden layer perceptron model, was of fundamental importance to the theory. Density is the theoretical ability to approximate well. Density does not imply a good, efficient scheme for approximation. However, a lack of density means that it is impossible to approximate a large class of functions, and this effectively precludes any scheme based thereon from being in the least useful. This is what killed off the efficacy of the no hidden layer model. Nonetheless it should be understood that density does not imply that one can approximate well to every function from
$$\mathcal{M}_r(\sigma) = \Big\{\sum_{i=1}^r c_i\, \sigma(\mathbf{w}^i \cdot \mathbf{x} - \theta_i) : c_i, \theta_i \in \mathbb{R},\ \mathbf{w}^i \in \mathbb{R}^n\Big\}$$
for some fixed $r$. On the contrary, there is generally a lower bound (for any reasonable set of functions) on the degree to which one can approximate using $\mathcal{M}_r(\sigma)$, independent of the choice of $\sigma$. (We consider this at some length in Section 6.) This is to be expected and is natural. It is, in a sense, similar to the situation with approximation by polynomials. Polynomials are dense in $C[0,1]$, but polynomials of any fixed degree are rather sparse. (Note also that the sets $\mathcal{M}_r(\sigma)$ are not subspaces. However, they do have the important property that $\mathcal{M}_r(\sigma) + \mathcal{M}_s(\sigma) = \mathcal{M}_{r+s}(\sigma)$.) Hecht-Nielsen (1987) was perhaps the first to consider the density problem for the single hidden layer perceptron model. He premised his observations on work based on the Kolmogorov Superposition Theorem (see Section 7). While many researchers subsequently questioned the exact relevance of this result to the above model, it is certainly true that this paper very much stimulated interest in this problem. In one of the first proceedings of the IEEE on the topic of neural networks, two papers appeared which discussed the density problem. Gallant and White (1988) constructed a specific continuous, nondecreasing sigmoidal function from which it was possible to obtain any trigonometric (Fourier) series. As such their activation function, which they called a cosine squasher, had the density property. Irie and Miyake (1988) constructed an integral representation for any $f \in L^1(\mathbb{R}^n)$ using a kernel of the form $\sigma(\mathbf{w} \cdot \mathbf{x} - \theta)$. This
allowed an interpretation in the above framework (but of course restricted to $\sigma \in L^1(\mathbb{R})$). In 1989 there appeared four much cited papers which considered the density problem for general classes of activation functions. They are Carroll and Dickinson (1989), Cybenko (1989), Funahashi (1989), and Hornik, Stinchcombe and White (1989). Carroll and Dickinson (1989) used a discretized inverse Radon transform to approximate $L^2$ functions with compact support in the $L^2$ norm, using any continuous sigmoidal function as an activation function. The main result of Cybenko (1989) is the density property, in the uniform norm on compacta, for any continuous sigmoidal function. (Cybenko does not demand monotonicity in his definition of sigmoidality.) His method of proof uses the Hahn-Banach Theorem and the representation (Riesz Representation Theorem) of continuous linear functionals on the space of continuous functions on a compact set. Funahashi (1989) (independently of Cybenko (1989)) proves the density property, in the uniform norm on compacta, for any continuous monotone sigmoidal function. He notes that, for $\sigma$ continuous, monotone and bounded, it follows that $\sigma(\cdot + \alpha) - \sigma(\cdot + \beta) \in L^1(\mathbb{R})$ for any $\alpha, \beta$. He then applies the previously mentioned result of Irie and Miyake (1988). Hornik, Stinchcombe and White (1989), unaware of Funahashi's paper, prove very much the same result. However, they demand that their activation function be only monotone and bounded, that is, they permit noncontinuous activation functions. Their method of proof is also totally different, but somewhat circuitous. They first allow sums and products of activation functions. This permits them to apply the Stone-Weierstrass Theorem to obtain density. They then prove the desired result, without products, using cosine functions and the ability to write products of cosines as linear combinations of cosines. There were many subsequent papers which dealt with the density problem and some related issues.
We quickly review some, but not all, of them. Stinchcombe and White (1989) prove that $\sigma$ has the density property for every $\sigma \in L^1(\mathbb{R})$ with $\int_{\mathbb{R}} \sigma(t)\, \mathrm{d}t \ne 0$. Cotter (1990) considers different types of models and activation functions (non-sigmoidal) for which the Stone-Weierstrass Theorem can be employed to obtain density, for instance $\sigma(t) = e^t$, and others. Jones (1990) shows, using ridge functions (which we shall soon define), that to answer the question of density it suffices to consider only the univariate problem. He then proves, by constructive methods, that a bounded (not necessarily monotone or continuous) sigmoidal activation function suffices. Stinchcombe and White (1990) also reduce the question of density to the univariate case and then consider various activation functions (not necessarily sigmoidal) such as piecewise linear (with at least one knot), a subset of polynomial splines, and a subset of analytic functions. They also consider the density question when bounding the set of permissible weights and thresholds. Hornik (1991) proves density for any
continuous, bounded and nonconstant activation function, and also in other norms. Ito, in a series of papers (Ito 1991a, 1991b and 1992), studies the problem of density using monotone sigmoidal functions, with only weights of norm 1. He also considers conditions under which one obtains uniform convergence on all of $\mathbb{R}^n$. Chui and Li (1992) constructively prove density where the activation function is continuous and sigmoidal, with weights and thresholds taking only integer values. Mhaskar and Micchelli (1992) extend the density result to what they call $k$th degree sigmoidal functions. They prove that if $\sigma$ is continuous, bounded by some polynomial of degree $k$ on all of $\mathbb{R}$, and
$$\lim_{t\to-\infty} \frac{\sigma(t)}{t^k} = 0, \qquad \lim_{t\to\infty} \frac{\sigma(t)}{t^k} = 1,$$
then density holds if and only if $\sigma$ is not a polynomial. Other results may be found in Light (1993), Chen and Chen (1993, 1995), Chen, Chen and Liu (1995), Attali and Pages (1997) and Burton and Dehling (1998). As we have noted, a variety of techniques were used to attack a problem which many considered important and difficult. The solution to this problem, however, turns out to be surprisingly simple. Leshno, Lin, Pinkus and Schocken (1993) prove that the necessary and sufficient condition for any continuous activation function to have the density property is that it not be a polynomial. Also considered in that paper are some sufficient conditions on noncontinuous activation functions which also imply density. For some reason the publication of this article was delayed and the submission date incorrectly reported. In a subsequent issue there appeared a paper by Hornik (1993) which references Leshno, Lin, Pinkus and Schocken (1993) and restates and reproves their results in a slightly altered form. In Pinkus (1996) a somewhat different proof is given, and it is also noted that the characterization of continuous activation functions with the density property can be essentially found in Schwartz (1944) (see also Edwards (1965, pp. 130-133)). The problem is in fact very much related to that of characterizing translation (and dilation) invariant subspaces of $C(\mathbb{R})$, in the topology of uniform convergence on compacta. As we have said, the main theorem we will prove is the following.

Theorem 3.1 Let $\sigma \in C(\mathbb{R})$. Then $\mathcal{M}(\sigma)$ is dense in $C(\mathbb{R}^n)$, in the topology of uniform convergence on compacta, if and only if $\sigma$ is not a polynomial.

If $\sigma$ is a polynomial, then density cannot possibly hold. This is immediate. If $\sigma$ is a polynomial of degree $m$, then, for every choice of $\mathbf{w} \in \mathbb{R}^n$ and $\theta \in \mathbb{R}$, $\sigma(\mathbf{w} \cdot \mathbf{x} - \theta)$ is a (multivariate) polynomial of total degree at most $m$, and thus $\mathcal{M}(\sigma)$ is contained in the space of polynomials of total degree at most $m$ and does not span $C(\mathbb{R}^n)$.
The main content of this theorem is the converse result.
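The immediate direction can be made concrete in a few lines. With the polynomial activation $\sigma(t) = t^2$, every element of $\mathrm{span}\{\sigma(\lambda t - \theta)\}$ expands to a quadratic $c_2 t^2 + c_1 t + c_0$, and quadratics already fail to interpolate $f(t) = |t|$ at four points. A small sketch (the sample points are arbitrary illustrative choices, not from the original text):

```python
# With sigma(t) = t^2, every network sum_i c_i * sigma(lambda_i*t - theta_i)
# is a polynomial of degree <= 2, so M(sigma) cannot be dense.  Quadratics
# already fail on |t|: the quadratic through (-1,1), (0,0), (1,1) is t^2,
# which misses the fourth sample |0.5| = 0.5.

def quad_through(y_at_m1, y_at_0, y_at_1):
    # coefficients (a, b, c) of the unique a*t^2 + b*t + c interpolating
    # the values at the nodes t = -1, 0, 1
    c = y_at_0
    a = (y_at_m1 + y_at_1) / 2.0 - y_at_0
    b = (y_at_1 - y_at_m1) / 2.0
    return a, b, c

a, b, c = quad_through(1.0, 0.0, 1.0)     # fit |t| at t = -1, 0, 1
value_at_half = a * 0.25 + b * 0.5 + c    # evaluate the fit at t = 0.5
print(a, b, c, value_at_half)  # 1.0 0.0 0.0 0.25  (but |0.5| = 0.5)
```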
We shall prove considerably more than is stated in Theorem 3.1. We shall show that we can, in diverse cases, restrict the permissible weights and thresholds, and also enlarge the class of eligible $\sigma$, while still obtaining the desired density. The next few propositions are amalgamations of results and techniques in Leshno, Lin, Pinkus and Schocken (1993) and Schwartz (1944). We start the analysis by defining ridge functions. Ridge functions are multivariate functions of the form
$$g(a_1 x_1 + \cdots + a_n x_n) = g(\mathbf{a} \cdot \mathbf{x}),$$
where $g : \mathbb{R} \to \mathbb{R}$ and $\mathbf{a} = (a_1,\ldots,a_n) \in \mathbb{R}^n \setminus \{0\}$ is a fixed direction. In other words, they are multivariate functions constant on the parallel hyperplanes $\mathbf{a} \cdot \mathbf{x} = c$, $c \in \mathbb{R}$. Ridge functions have been considered in the study of hyperbolic partial differential equations (where they go under the name of plane waves), computerized tomography, projection pursuit, approximation theory, and neural networks (see, for instance, Pinkus (1997) for further details). Set
$$\mathcal{R} = \mathrm{span}\{g(\mathbf{a} \cdot \mathbf{x}) : \mathbf{a} \in \mathbb{R}^n,\ g : \mathbb{R} \to \mathbb{R}\}.$$
Ridge functions are relevant in the theory of the single hidden layer perceptron model since each factor $\sigma(\mathbf{w} \cdot \mathbf{x} - \theta)$ is a ridge function for every choice of $\sigma$, $\mathbf{w}$ and $\theta$. It therefore immediately follows that a lower bound on the extent to which this model with $r$ units in the single hidden layer can approximate any function is given by the order of approximation from the manifold
$$\mathcal{R}_r = \Big\{\sum_{i=1}^r g_i(\mathbf{a}^i \cdot \mathbf{x}) : \mathbf{a}^i \in \mathbb{R}^n,\ g_i : \mathbb{R} \to \mathbb{R},\ i = 1,\ldots,r\Big\}.$$
(We return to this fact in Section 6.) In addition, if ridge functions are not dense in $C(\mathbb{R}^n)$, in the above topology, then it would not be possible for $\mathcal{M}(\sigma)$ to be dense in $C(\mathbb{R}^n)$ for any choice of $\sigma$. But ridge functions do have the density property. This is easily seen. $\mathcal{R}$ contains all functions of the form $\cos(\mathbf{a} \cdot \mathbf{x})$ and $\sin(\mathbf{a} \cdot \mathbf{x})$. These functions can be shown to be dense in $C(K)$ for every compact $K \subset \mathbb{R}^n$. Another dense subset of ridge functions is given by the $e^{\mathbf{a} \cdot \mathbf{x}}$. Moreover, the set
$$\mathrm{span}\{(\mathbf{a} \cdot \mathbf{x})^k : \mathbf{a} \in \mathbb{R}^n,\ k = 0, 1, \ldots\}$$
contains all polynomials and thus is dense. In fact we have the following result due to Vostrecov and Kreines (1961) (see also Lin and Pinkus (1993)), which tells us exactly which sets of directions are both sufficient and necessary for density. We will use this result.
Theorem 3.2. (Vostrecov and Kreines 1961) The set of ridge functions
$$\mathcal{R}(A) = \mathrm{span}\{g(\mathbf{a} \cdot \mathbf{x}) : g \in C(\mathbb{R}),\ \mathbf{a} \in A\}$$
is dense in $C(\mathbb{R}^n)$, in the topology of uniform convergence on compacta, if and only if there is no nontrivial homogeneous polynomial that vanishes on $A$.

Because of the homogeneity of the directions (allowing a direction $\mathbf{a}$ is equivalent to allowing all directions $\mu\mathbf{a}$ for every real $\mu$, since we vary over all $g \in C(\mathbb{R})$), it in fact suffices to consider directions normalized to lie on the unit sphere $S^{n-1}$. Theorem 3.2 says that $\mathcal{R}(A)$ is dense in $C(\mathbb{R}^n)$, for $A \subseteq S^{n-1}$, if no nontrivial homogeneous polynomial has a zero set containing $A$. For example, if $A$ contains an open subset of $S^{n-1}$ then no nontrivial homogeneous polynomial vanishes on $A$. In what follows we will always assume that $A \subseteq S^{n-1}$. The next proposition is a simple consequence of the ridge function form of our problem, and immediately reduces our discussion from $\mathbb{R}^n$ to the more tractable univariate $\mathbb{R}$. In what follows, $\Lambda$, $\Theta$ will be subsets of $\mathbb{R}$. By $\Lambda \times A$ we mean the subset of $\mathbb{R}^n$ given by
$$\Lambda \times A = \{\lambda \mathbf{a} : \lambda \in \Lambda,\ \mathbf{a} \in A\}.$$

Proposition 3.3 Assume $\Lambda$, $\Theta$ are subsets of $\mathbb{R}$ for which
$$\mathcal{N}(\sigma; \Lambda, \Theta) = \mathrm{span}\{\sigma(\lambda t - \theta) : \lambda \in \Lambda,\ \theta \in \Theta\}$$
is dense in $C(\mathbb{R})$, in the topology of uniform convergence on compacta. Assume in addition that $A \subseteq S^{n-1}$ is such that $\mathcal{R}(A)$ is dense in $C(\mathbb{R}^n)$, in the topology of uniform convergence on compacta. Then
$$\mathcal{M}(\sigma; \Lambda \times A, \Theta) = \mathrm{span}\{\sigma(\mathbf{w} \cdot \mathbf{x} - \theta) : \mathbf{w} \in \Lambda \times A,\ \theta \in \Theta\}$$
is dense in $C(\mathbb{R}^n)$, in the topology of uniform convergence on compacta.

Proof. Let $f \in C(K)$ for some compact set $K$ in $\mathbb{R}^n$. Since $\mathcal{R}(A)$ is dense in $C(K)$, given $\varepsilon > 0$ there exist $g_i \in C(\mathbb{R})$ and $\mathbf{a}^i \in A$, $i = 1,\ldots,r$ (some $r$), such that
$$\Big|f(\mathbf{x}) - \sum_{i=1}^r g_i(\mathbf{a}^i \cdot \mathbf{x})\Big| < \frac{\varepsilon}{2}$$
for all $\mathbf{x} \in K$. Since $K$ is compact, $\{\mathbf{a}^i \cdot \mathbf{x} : \mathbf{x} \in K\} \subseteq [\alpha_i, \beta_i]$ for some finite interval $[\alpha_i, \beta_i]$, $i = 1,\ldots,r$. Because $\mathcal{N}(\sigma; \Lambda, \Theta)$ is dense in $C[\alpha_i, \beta_i]$, $i = 1,\ldots,r$, there exist constants $c_{ij} \in \mathbb{R}$, $\lambda_{ij} \in \Lambda$ and $\theta_{ij} \in \Theta$, $j = 1,\ldots,m_i$,
$i = 1,\ldots,r$, for which
$$\Big|g_i(t) - \sum_{j=1}^{m_i} c_{ij}\, \sigma(\lambda_{ij} t - \theta_{ij})\Big| < \frac{\varepsilon}{2r}$$
for all $t \in [\alpha_i, \beta_i]$ and $i = 1,\ldots,r$. Thus
$$\Big|f(\mathbf{x}) - \sum_{i=1}^r \sum_{j=1}^{m_i} c_{ij}\, \sigma(\lambda_{ij}\, \mathbf{a}^i \cdot \mathbf{x} - \theta_{ij})\Big| < \varepsilon$$
for all $\mathbf{x} \in K$.

Proposition 3.3 permits us to focus on the univariate problem. We first prove density for a restricted class of activation functions.

Proposition 3.4 Let $\sigma \in C^\infty(\mathbb{R})$ and assume $\sigma$ is not a polynomial. Then $\mathcal{N}(\sigma; \mathbb{R}, \mathbb{R})$ is dense in $C(\mathbb{R})$.
Proof. It is well known (in fact it is a well-known problem given to advanced math students) that, if $\sigma \in C^\infty$ on any open interval and is not a polynomial thereon, then there exists a point $-\theta_0$ in that interval for which $\sigma^{(k)}(-\theta_0) \ne 0$ for all $k = 0, 1, 2, \ldots$. The earliest reference we have found to this result is Corominas and Sunyer Balaguer (1954). It also appears in the more accessible Donoghue (1969, p. 53), but there exist simpler proofs than that which appears there. Since $\sigma \in C^\infty(\mathbb{R})$, and $[\sigma((\lambda + h)t - \theta_0) - \sigma(\lambda t - \theta_0)]/h \in \mathcal{N}(\sigma; \mathbb{R}, \mathbb{R})$ for all $h \ne 0$, it follows that
$$\frac{\mathrm{d}}{\mathrm{d}\lambda}\, \sigma(\lambda t - \theta_0)\Big|_{\lambda=0} = t\, \sigma'(-\theta_0)$$
is contained in $\overline{\mathcal{N}(\sigma; \mathbb{R}, \mathbb{R})}$, the closure of $\mathcal{N}(\sigma; \mathbb{R}, \mathbb{R})$. By the same argument
$$\frac{\mathrm{d}^k}{\mathrm{d}\lambda^k}\, \sigma(\lambda t - \theta_0)\Big|_{\lambda=0} = t^k\, \sigma^{(k)}(-\theta_0)$$
is contained in $\overline{\mathcal{N}(\sigma; \mathbb{R}, \mathbb{R})}$ for any $k$. Since $\sigma^{(k)}(-\theta_0) \ne 0$, $k = 0, 1, 2, \ldots$, the set $\overline{\mathcal{N}(\sigma; \mathbb{R}, \mathbb{R})}$ contains all monomials and thus all polynomials. By the Weierstrass Theorem this implies that $\mathcal{N}(\sigma; \mathbb{R}, \mathbb{R})$ is dense in $C(K)$ for every compact $K \subset \mathbb{R}$.

Let us consider this elementary proof in more detail. What properties of the function $\sigma$ and of the sets $\Lambda$ and $\Theta$ of weights and thresholds, respectively, did we use? In fact we only really needed to show that
$$t^k\, \sigma^{(k)}(-\theta_0)$$
is contained in $\overline{\mathcal{N}(\sigma; \Lambda, \Theta)}$ for every $k$, and that $\sigma^{(k)}(-\theta_0) \ne 0$ for all $k$. It
therefore suffices that $\Lambda$ be any set containing a sequence of values tending to zero, and $\sigma \in C^\infty(\Theta)$, where $\Theta$ contains an open interval on which $\sigma$ is not a polynomial. Let us restate Proposition 3.4 in this more general form.

Corollary 3.5 Let $\Lambda$ be any set containing a sequence of values tending to zero, and let $\Theta$ be any open interval. Let $\sigma : \mathbb{R} \to \mathbb{R}$ be such that $\sigma \in C^\infty(\Theta)$, and $\sigma$ is not a polynomial on $\Theta$. Then $\mathcal{N}(\sigma; \Lambda, \Theta)$ is dense in $C(\mathbb{R})$.

We also note that the method of proof of Proposition 3.4 shows that, under these conditions, the closure of the set of linear combinations of $k+1$ shifts and dilations of $\sigma$ contains the space of polynomials of degree $k$. We will use this fact in Section 6. As such we state it formally here.

Corollary 3.6 Let
$$\mathcal{N}_r(\sigma) = \Big\{\sum_{i=1}^r c_i\, \sigma(\lambda_i t - \theta_i) : c_i, \lambda_i, \theta_i \in \mathbb{R}\Big\}.$$
If $\Theta$ is any open interval and $\sigma \in C^\infty(\Theta)$ is not a polynomial on $\Theta$, then $\overline{\mathcal{N}_r(\sigma)}$ contains $\pi_{r-1}$, the linear space of algebraic polynomials of degree at most $r - 1$.

We now consider how to weaken our smoothness demands on $\sigma$. We do this in two steps. We again assume that $\Lambda = \Theta = \mathbb{R}$. However, this is not necessary and, following the proof of Proposition 3.8, we state the appropriate analogue of Corollary 3.5.

Proposition 3.7 Let $\sigma \in C(\mathbb{R})$ and assume $\sigma$ is not a polynomial. Then $\mathcal{N}(\sigma; \mathbb{R}, \mathbb{R})$ is dense in $C(\mathbb{R})$.

Proof. Let $\phi \in C_0^\infty(\mathbb{R})$, that is, $C^\infty(\mathbb{R})$ with compact support. For each such $\phi$ set
$$\sigma_\phi(t) = \int_{-\infty}^{\infty} \sigma(t - y)\, \phi(y)\, \mathrm{d}y,$$
that is, $\sigma_\phi = \sigma * \phi$ is the convolution of $\sigma$ and $\phi$. Since $\sigma, \phi \in C(\mathbb{R})$ and $\phi$ has compact support, the above integral converges for all $t$, and, as is easily seen (taking Riemann sums), $\sigma_\phi$ is contained in the closure of $\mathcal{N}(\sigma; \{1\}, \mathbb{R})$. Furthermore, $\sigma_\phi \in C^\infty(\mathbb{R})$. It also follows that $\mathcal{N}(\sigma_\phi; \mathbb{R}, \mathbb{R})$ is contained in $\overline{\mathcal{N}(\sigma; \mathbb{R}, \mathbb{R})}$, since
$$\sigma_\phi(\lambda t - \theta) = \int_{-\infty}^{\infty} \sigma(\lambda t - \theta - y)\, \phi(y)\, \mathrm{d}y$$
for each $\lambda \in \mathbb{R}$. Because $\sigma_\phi \in C^\infty(\mathbb{R})$ we have, from the method of proof of Proposition 3.4, that $t^k\, \sigma_\phi^{(k)}(-\theta)$ is in $\overline{\mathcal{N}(\sigma_\phi; \mathbb{R}, \mathbb{R})}$ for all $\theta \in \mathbb{R}$ and all $k$.
Now if $\mathcal{N}(\sigma; \mathbb{R}, \mathbb{R})$ is not dense in $C(\mathbb{R})$, then $t^k$ is not in $\overline{\mathcal{N}(\sigma; \mathbb{R}, \mathbb{R})}$ for some $k$. Thus $t^k$ is not in $\overline{\mathcal{N}(\sigma_\phi; \mathbb{R}, \mathbb{R})}$ for each $\phi \in C_0^\infty(\mathbb{R})$. This implies that $\sigma_\phi^{(k)}(-\theta) = 0$ for all $\theta \in \mathbb{R}$ and each $\phi \in C_0^\infty(\mathbb{R})$. Thus each $\sigma_\phi$ is a polynomial of degree at most $k - 1$, and (taking a sequence of mollifiers $\phi_n$ for which $\sigma_{\phi_n}$ converges to $\sigma$) it therefore follows that $\sigma$ is a polynomial of degree at most $k - 1$. This contradicts our assumption.

We first assumed $\sigma \in C^\infty(\mathbb{R})$ and then showed how to obtain the same result for $\sigma \in C(\mathbb{R})$. We now consider a class of discontinuous $\sigma$. We prove that the same result (density) holds for any $\sigma$ that is bounded and Riemann-integrable on every finite interval. (By a theorem of Lebesgue, the property of Riemann-integrability for such functions is equivalent to demanding that the set of discontinuities of $\sigma$ has Lebesgue measure zero: see, for instance, Royden (1963, p. 70).) It is not true that, for arbitrary $\sigma$, the space $\mathcal{N}(\sigma; \mathbb{R}, \mathbb{R})$ is dense in $C(\mathbb{R})$.

Proposition 3.8 Let $\sigma : \mathbb{R} \to \mathbb{R}$ be bounded and Riemann-integrable on every finite interval. Assume $\sigma$ is not a polynomial (almost everywhere). Then $\mathcal{N}(\sigma; \mathbb{R}, \mathbb{R})$ is dense in $C(\mathbb{R})$.

Proof. It remains true that, for each $\phi \in C_0^\infty(\mathbb{R})$,
$$\sigma_\phi(t) = \int_{-\infty}^{\infty} \sigma(t - y)\, \phi(y)\, \mathrm{d}y$$
is in $C^\infty(\mathbb{R})$. Furthermore, for the $\sigma_{\phi_n}$ as defined in Proposition 3.7 we have that
$$\lim_{n\to\infty} \|\sigma - \sigma_{\phi_n}\|_{L^p(K)} = 0$$
for every $1 \le p < \infty$ and any compact $K$ (see, for instance, Adams (1975, p. 30)). As such, if $\sigma_{\phi_n}$ is a polynomial of degree at most $k - 1$ for each $n$, then $\sigma$ is (almost everywhere) also a polynomial of degree at most $k - 1$. Thus the proof of this proposition exactly follows the method of proof of Proposition 3.7 if we can show that $\sigma_\phi$ is in the closure of $\mathcal{N}(\sigma; \{1\}, \mathbb{R})$ for each $\phi \in C_0^\infty(\mathbb{R})$. This is what we now prove.

Let $\phi \in C_0^\infty(\mathbb{R})$ and assume that $\phi$ has support in $[-\alpha, \alpha]$. Set
$$y_i = -\alpha + \frac{2i\alpha}{m}, \qquad i = 0, 1, \ldots, m,$$
$$\Delta_i = [y_{i-1}, y_i],$$
and $\Delta y_i = y_i - y_{i-1} = 2\alpha/m$, $i = 1,\ldots,m$. By definition,
$$\sigma_m(t) = \sum_{i=1}^m \sigma(t - y_i)\, \phi(y_i)\, \Delta y_i \in \mathcal{N}(\sigma; \{1\}, \mathbb{R})$$
for each $m$. We will prove that the above sum converges uniformly to $\sigma_\phi$ on every compact subset $K$ of $\mathbb{R}$. By definition,
$$\sigma_\phi(t) - \sigma_m(t) = \sum_{i=1}^m \int_{\Delta_i} \sigma(t - y)[\phi(y) - \phi(y_i)]\, \mathrm{d}y + \sum_{i=1}^m \int_{\Delta_i} [\sigma(t - y) - \sigma(t - y_i)]\, \phi(y_i)\, \mathrm{d}y.$$
Since $\sigma$ is bounded on $K - [-\alpha, \alpha]$, and $\phi$ is uniformly continuous on $[-\alpha, \alpha]$, it easily follows that
$$\lim_{m\to\infty} \sum_{i=1}^m \int_{\Delta_i} \sigma(t - y)[\phi(y) - \phi(y_i)]\, \mathrm{d}y = 0,$$
uniformly for $t \in K$. Now
$$\Big|\sum_{i=1}^m \int_{\Delta_i} [\sigma(t - y) - \sigma(t - y_i)]\, \phi(y_i)\, \mathrm{d}y\Big| \le \|\phi\|_\infty \sum_{i=1}^m \Big[\sup_{y \in \Delta_i} \sigma(t - y) - \inf_{y \in \Delta_i} \sigma(t - y)\Big] \frac{2\alpha}{m}.$$
Since $\sigma$ is Riemann-integrable on $K - [-\alpha, \alpha]$, it follows that
$$\lim_{m\to\infty} \sum_{i=1}^m \Big[\sup_{y \in \Delta_i} \sigma(t - y) - \inf_{y \in \Delta_i} \sigma(t - y)\Big] \frac{2\alpha}{m} = 0.$$
This proves the result.

It is not difficult to check that the above conditions only need to hold locally, as in Corollary 3.5.

Corollary 3.9 Let $\Lambda$ be any set containing a sequence of values tending to zero, and let $\Theta$ be any open interval. Assume $\sigma : \mathbb{R} \to \mathbb{R}$ is such that
$\sigma$ is bounded and Riemann-integrable on $\Theta$ and not a polynomial (almost everywhere) on $\Theta$. Then $\mathcal{N}(\sigma; \Lambda, \Theta)$ is dense in $C(\mathbb{R})$.

The above results should not be taken to mean that we recommend using only a minimal set of weights and thresholds. Such a strategy would be wrong. In the cases thus far considered it was necessary, because of the method of proof, that we allow dilations (i.e., the set $\Lambda$) containing a sequence tending to zero. This is in fact not necessary. We have, for example, the following simple result, which is proven by classical methods.

Proposition 3.10 Assume $\sigma \in C(\mathbb{R}) \cap L^1(\mathbb{R})$, or $\sigma$ is continuous, nondecreasing and bounded (but not the constant function). Then $\mathcal{N}(\sigma; \{1\}, \mathbb{R})$ is dense in $C(\mathbb{R})$.

Proof. Assume $\sigma \in C(\mathbb{R}) \cap L^1(\mathbb{R})$. Continuous linear functionals on $C(\mathbb{R})$ are represented by Borel measures of finite total variation and compact support. If $\mathcal{N}(\sigma; \{1\}, \mathbb{R})$ is not dense in $C(\mathbb{R})$, then there exists such a nontrivial measure $\mu$ satisfying
$$\int_{-\infty}^{\infty} \sigma(t - \theta)\, \mathrm{d}\mu(t) = 0$$
for all $\theta \in \mathbb{R}$. Both $\sigma$ and $\mu$ have 'nice' Fourier transforms. Since the above is a convolution, this implies
$$\widehat{\sigma}(\omega)\, \widehat{\mu}(\omega) = 0$$
for all $\omega \in \mathbb{R}$. Now $\widehat{\mu}$ is an entire function (of exponential type), while $\widehat{\sigma}$ is continuous. Since $\widehat{\sigma}$ must vanish where $\widehat{\mu} \ne 0$, it follows that $\widehat{\sigma} = 0$ and thus $\sigma = 0$. This is a contradiction and proves the result. If $\sigma$ is continuous, nondecreasing and bounded (but not the constant function), then $\sigma(\cdot + a) - \sigma(\cdot)$ is in $C(\mathbb{R}) \cap L^1(\mathbb{R})$ (and not the zero function) for any fixed $a \ne 0$. We can now apply the result of the previous paragraph to obtain the desired result.

The above proposition does not begin to tell the full story. A more formal study of $\mathcal{N}(\sigma; \{1\}, \mathbb{R})$ was made by Schwartz (1947), where he introduced the following definition of the class of mean-periodic functions.

Definition. A function $f \in C(\mathbb{R}^n)$ is said to be mean-periodic if
$$\mathrm{span}\{f(\mathbf{x} - \mathbf{a}) : \mathbf{a} \in \mathbb{R}^n\}$$
is not dense in $C(\mathbb{R}^n)$, in the topology of uniform convergence on compacta.

Translation-invariant subspaces (such as the above space) have been much studied in various norms (more especially $L^2$ and $L^1$). The study of mean-periodic functions was an attempt to provide a parallel analysis for the space
$C(\mathbb{R}^n)$. Unfortunately this subject is still not well understood for $n > 1$. Luckily we are interested in the univariate case, and Schwartz (1947) provided a thorough analysis of such spaces (see also Kahane (1959)). The theory of mean-periodic functions is, unfortunately, too complicated to present here with proofs. The central result is that subspaces
$$\mathrm{span}\{f(t - a) : a \in \mathbb{R}\}$$
spanned by mean-periodic functions in $C(\mathbb{R})$ are totally characterized by the functions of the form $t^m e^{\gamma t}$ which are contained in their closure, where $\gamma \in \mathbb{C}$. (These values $\gamma$ determine the spectrum of $f$. Note that if $\gamma$ is in the spectrum, then so is $\overline{\gamma}$.) From this fact follows this next result.

Proposition 3.11 Let $\sigma \in C(\mathbb{R})$, and assume that $\sigma$ is not a polynomial. For any $\Lambda$ that contains a sequence tending to a finite limit point, the set $\mathcal{N}(\sigma; \Lambda, \mathbb{R})$ is dense in $C(\mathbb{R})$.

Proof. Let $\lambda_0 \in \Lambda \setminus \{0\}$. If $\sigma(\lambda_0 t)$ is not mean-periodic then
$$\mathrm{span}\{\sigma(\lambda_0 t - \theta) : \theta \in \mathbb{R}\}$$
is dense in $C(\mathbb{R})$, and we are finished. Assume not. Since $\sigma$ is not a polynomial the above span contains, in its closure, $t^m e^{\gamma t}$ for some nonnegative integer $m$ and $\gamma \in \mathbb{C} \setminus \{0\}$. (We may assume $m = 0$ since, by taking a finite linear combination of shifts, it follows that $e^{\gamma t}$ is also contained in the above closure.) Thus the closure of
$$\mathrm{span}\{\sigma(\lambda t - \theta) : \theta \in \mathbb{R},\ \lambda \in \Lambda\}$$
contains $e^{(\gamma\lambda/\lambda_0)t}$ for every $\lambda \in \Lambda$. We claim that $\mathrm{span}\{e^{(\gamma\lambda/\lambda_0)t} : \lambda \in \Lambda\}$ is dense in $C(\mathbb{R})$ if $\Lambda$ has a finite limit point. This is a well-known result. One can prove it by the method of proof of Proposition 3.4. Alternatively, if the above span is not dense then
$$\int_{-\infty}^{\infty} e^{(\gamma\lambda/\lambda_0)t}\, \mathrm{d}\mu(t) = 0, \qquad \lambda \in \Lambda,$$
for some nontrivial Borel measure $\mu$ of finite total variation and compact support. Now
$$g(z) = \int_{-\infty}^{\infty} e^{zt}\, \mathrm{d}\mu(t)$$
is an entire function on $\mathbb{C}$. But $g$ vanishes on the set $\{\gamma\lambda/\lambda_0 : \lambda \in \Lambda\}$, and this set contains a sequence tending to a finite limit point. This implies that $g$ is identically zero, which in turn implies that $\mu$ is the zero measure. This contradiction proves the density.
Remark. As may be noted from the method of proof of Proposition 3.11, the condition on $\Lambda$ can be replaced by the demand that $\Lambda$ not be contained in the zero set of a nontrivial entire function. We should also mention that Schwartz (1947, p. 907) proved the following result.

Proposition 3.12 Let $\sigma \in C(\mathbb{R})$. If $\sigma \in L^p(\mathbb{R})$, $1 \le p < \infty$, or $\sigma$ is bounded and has a limit at infinity or minus infinity, but is not the constant function, then $\sigma$ is not mean-periodic.

Thus, in the above cases $\mathcal{N}(\sigma; \{\lambda\}, \mathbb{R})$ is dense in $C(\mathbb{R})$ for any $\lambda \ne 0$.

Remark. If the input is preprocessed, then, rather than working directly with the input $\mathbf{x} = (x_1,\ldots,x_n)$, this data is first converted to $\mathbf{h}(\mathbf{x}) = (h_1(\mathbf{x}),\ldots,h_m(\mathbf{x}))$ for some given fixed continuous functions $h_j \in C(\mathbb{R}^n)$, $j = 1,\ldots,m$. Set
$$\mathcal{M}_{\mathbf{h}}(\sigma) = \mathrm{span}\{\sigma(\mathbf{w} \cdot \mathbf{h}(\mathbf{x}) - \theta) : \mathbf{w} \in \mathbb{R}^m,\ \theta \in \mathbb{R}\}.$$
Theorem 3.1 is still valid in this setting if and only if $\mathbf{h}$ separates points, that is, $\mathbf{x}^i \ne \mathbf{x}^j$ implies $\mathbf{h}(\mathbf{x}^i) \ne \mathbf{h}(\mathbf{x}^j)$ (see Lin and Pinkus (1994)). Analogues of the other results of this section depend upon the explicit form of $\mathbf{h}$.
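The necessity of the separation condition is easy to probe numerically. In the sketch below (a hypothetical choice of preprocessing, not from the original text) we take $n = 1$ and $h(x) = (x^2)$, which fails to separate $x$ and $-x$; every network output then inherits this collision, so, for instance, no odd function can be approximated.

```python
import math

def sigma(t):
    # logistic sigmoid
    return 1.0 / (1.0 + math.exp(-t))

def net_on_features(x, h, W, theta, c):
    # y = sum_i c_i * sigma(w^i . h(x) - theta_i): the network sees only h(x)
    hx = h(x)
    return sum(ci * sigma(sum(wj * hj for wj, hj in zip(wi, hx)) - ti)
               for wi, ti, ci in zip(W, theta, c))

h = lambda x: (x * x,)   # hypothetical preprocessing: does NOT separate x and -x

# Arbitrary weights: whatever they are, the outputs below must agree,
# because h(2.0) == h(-2.0).
W, theta, c = [(1.0,), (-2.0,)], [0.3, -0.7], [1.5, 2.5]
y_plus = net_on_features(2.0, h, W, theta, c)
y_minus = net_on_features(-2.0, h, W, theta, c)
print(y_plus == y_minus)  # True
```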
4. Derivative approximation

In this section we consider conditions under which a neural network in the single hidden layer perceptron model can simultaneously and uniformly approximate a function and various of its partial derivatives. This fact is requisite in several algorithms. We first introduce some standard multivariate notation. We let $\mathbb{Z}_+^n$ denote the lattice of nonnegative multi-integers in $\mathbb{R}^n$. For $\mathbf{m} = (m_1,\ldots,m_n) \in \mathbb{Z}_+^n$, we set $|\mathbf{m}| = m_1 + \cdots + m_n$, $\mathbf{x}^{\mathbf{m}} = x_1^{m_1} \cdots x_n^{m_n}$, and
$$D^{\mathbf{m}} = \frac{\partial^{|\mathbf{m}|}}{\partial x_1^{m_1} \cdots \partial x_n^{m_n}}.$$
If $q$ is a polynomial, then by $q(D)$ we mean the differential operator given by
$$q\Big(\frac{\partial}{\partial x_1},\ldots,\frac{\partial}{\partial x_n}\Big).$$
We also have the usual ordering on $\mathbb{Z}_+^n$, namely $\mathbf{m}^1 \le \mathbf{m}^2$ if $m_i^1 \le m_i^2$, $i = 1,\ldots,n$. We say $f \in C^{\mathbf{m}}(\mathbb{R}^n)$ if $D^{\mathbf{k}} f \in C(\mathbb{R}^n)$ for all $\mathbf{k} \le \mathbf{m}$, $\mathbf{k} \in \mathbb{Z}_+^n$. We set
$$C^{\mathbf{m}^1,\ldots,\mathbf{m}^s}(\mathbb{R}^n) = \bigcap_{i=1}^s C^{\mathbf{m}^i}(\mathbb{R}^n)$$
and, as a special case,
$$C^m(\mathbb{R}^n) = \bigcap_{|\mathbf{k}| \le m} C^{\mathbf{k}}(\mathbb{R}^n) = \{f : D^{\mathbf{k}} f \in C(\mathbb{R}^n) \text{ for all } |\mathbf{k}| \le m\}.$$
We recall that
$$\mathcal{M}(\sigma) = \mathrm{span}\{\sigma(\mathbf{w} \cdot \mathbf{x} - \theta) : \mathbf{w} \in \mathbb{R}^n,\ \theta \in \mathbb{R}\}.$$
We say that M(a) is dense in c ™ 1 - - m ' ( R n ) if, for any / G C m l - - m ' ( R n ) , any compact K of R n , and any e > 0, there exists a j G M{cr) satisfying
for all k G Z™ for which k < m l for some i. We will outline a proof (skipping over various details) of the following result. T h e o r e m 4.1 Let m l G Z™, i = 1 , . . . , s, and set m = max{|m l | : i = 1 , . . . , s}. Assume a G Cm(M.) and a is not a polynomial. Then A4(a) is dense in C m l ' - ' m S ( R n ) . This density question was first considered by Hornik, Stinchcombe and White (1990). They showed that, if a^m) G C(R)nL 1 (R), then M(a) is dense in C m (R"). Subsequently Hornik (1991) generalized this to a G Cm(R) which is bounded, but not the constant function. Hornik uses a functional analytic method of proof. With suitable modifications his method of proof can be applied to prove Theorem 4.1. Ito (1993) reproves Hornik's result, but for a G C°°(R) which is not a polynomial. His method of proof is different. We essentially follow it here. This approach is very similar to the approach taken in Li (1996) where Theorem 4.1 can effectively be found. Other papers concerned with this problem are Cardaliaguet and Euvrard (1992), Gallant and White (1992), Ito (19946), Mhaskar and Micchelli (1995) and Attali and Pages (1997). Some of these papers contain generalizations to density in other norms, and related questions. mS Proof. Polynomials are dense in Cm ( R n ) . This may be shown in a number of ways. One proof of this fact is to be found in Li (1996). It therefore suffices to prove that one can approximate polynomials in the appropriate norm. If h is any polynomial on R n , then h can be represented in the form
$$h(x) = \sum_{i=1}^r p_i(a^i\cdot x) \tag{4.1}$$
for some choice of $r$, $a^i \in \mathbb{R}^n$, and univariate polynomials $p_i$, $i = 1,\ldots,r$.
164
A. PINKUS
A precise proof of this fact is the following. (This result will be used again in Section 6, so we detail its proof here.) Let $H_k$ denote the linear space of homogeneous polynomials of degree $k$ (in $\mathbb{R}^n$), and $P_k = \bigcup_{s=0}^k H_s$ the linear space of polynomials of degree at most $k$. Set $r = \binom{n-1+k}{k} = \dim H_k$. Let $m^1, m^2 \in \mathbb{Z}_+^n$, $|m^1| = |m^2| = k$. Then $D^{m^1}x^{m^2} = C_{m^1}\,\delta_{m^1,m^2}$, for some easily calculated constant $C_{m^1}$. This implies that each linear functional $L$ on $H_k$ may be represented by some $q \in H_k$ via $L(p) = q(D)p$ for each $p \in H_k$. Now $(a\cdot x)^k \in H_k$ and $D^m(a\cdot x)^k = k!\,a^m$ if $|m| = k$. Thus, for each $q \in H_k$,
$$q(D)(a\cdot x)^k = k!\,q(a).$$
Since $r = \dim H_k$, there exist $r$ points $a^1,\ldots,a^r$ such that $\dim H_k|_A = r$ for $A = \{a^1,\ldots,a^r\}$. We claim that $\{(a^i\cdot x)^k\}_{i=1}^r$ spans $H_k$. If not, there exists a nontrivial linear functional that annihilates each $(a^i\cdot x)^k$: that is, some nontrivial $q \in H_k$ satisfies
$$0 = q(D)(a^i\cdot x)^k = k!\,q(a^i), \qquad i = 1,\ldots,r.$$
This contradicts our choice of $A$; hence $\{(a^i\cdot x)^k\}_{i=1}^r$ spans $H_k$. It also follows that $\{(a^i\cdot x)^s\}_{i=1}^r$ spans $H_s$ for each $s = 0,1,\ldots,k$. If not, there exists a nontrivial $q \in H_s$ that vanishes on $A$. But, for any $p \in H_{k-s}$, the function $pq \in H_k$ then vanishes on $A$, which is a contradiction. Thus
$$P_k = \operatorname{span}\{(a^i\cdot x)^s : i = 1,\ldots,r,\ s = 0,1,\ldots,k\}.$$
Let $\pi_k$ denote the linear space of univariate polynomials of degree at most $k$. It therefore follows that
$$P_k = \Bigl\{\sum_{i=1}^r p_i(a^i\cdot x) : p_i \in \pi_k,\ i = 1,\ldots,r\Bigr\}.$$
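As a concrete low-dimensional instance of this decomposition (an illustration only, not part of the proof): in $\mathbb{R}^2$ the monomial $x_1x_2 \in H_2$ is a sum of two ridge polynomials, $x_1x_2 = \frac14(x_1+x_2)^2 - \frac14(x_1-x_2)^2$, with directions $a^1 = (1,1)$ and $a^2 = (1,-1)$. A quick numerical check:

```python
# Check the ridge decomposition x1*x2 = ((x1+x2)^2 - (x1-x2)^2) / 4,
# an instance of h(x) = sum_i p_i(a^i . x) with a^1 = (1,1), a^2 = (1,-1).
def ridge_sum(x1, x2):
    p1 = lambda t: t * t / 4.0   # univariate polynomial for direction (1, 1)
    p2 = lambda t: -t * t / 4.0  # univariate polynomial for direction (1, -1)
    return p1(x1 + x2) + p2(x1 - x2)

for x1 in (-1.5, 0.0, 0.7, 2.0):
    for x2 in (-2.0, 0.3, 1.0):
        assert abs(ridge_sum(x1, x2) - x1 * x2) < 1e-12
print("ridge decomposition of x1*x2 verified")
```

Here two directions suffice because $\dim H_2 = 3$ in $\mathbb{R}^2$ but $x_1x_2$ happens to lie in the span of $(x_1+x_2)^2$ and $(x_1-x_2)^2$; in general $r = \dim H_k$ directions are needed.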
Thus $h$ may be written in the form (4.1). Hence it follows (see the proof of Proposition 3.3) that it suffices to prove that one can approximate each univariate polynomial $p$ on any finite interval $[\alpha,\beta]$ from $\mathcal{N}(\sigma;\mathbb{R},\mathbb{R}) = \operatorname{span}\{\sigma(\lambda t - \theta) : \lambda, \theta \in \mathbb{R}\}$ in the norm
$$\|f\|_{C^m[\alpha,\beta]} = \max_{0\le k\le m}\ \max_{t\in[\alpha,\beta]}|f^{(k)}(t)|.$$
Since $\sigma \in C^m(\mathbb{R})$ is not a polynomial we have, from the results of Section 3, that $\mathcal{N}(\sigma^{(m)};\mathbb{R},\mathbb{R})$ is dense in $C(\mathbb{R})$. Let $f \in C^m(\mathbb{R})$. Then, given $\varepsilon > 0$, there exists a $g \in \mathcal{N}(\sigma;\mathbb{R},\mathbb{R})$ such that
$$\max_{t\in[\alpha,\beta]}|f^{(m)}(t) - g^{(m)}(t)| < \varepsilon.$$
If every polynomial of degree at most $m-1$ is in the closure of $\mathcal{N}(\sigma;\mathbb{R},\mathbb{R})$ with respect to the norm $\|\cdot\|_{C^m[\alpha,\beta]}$, then, by choosing a polynomial $p$ satisfying
$$p^{(k)}(\alpha) = (f - g)^{(k)}(\alpha), \qquad k = 0,1,\ldots,m-1,$$
it follows, integrating $m$ times, that $g + p$ is close to $f$ in the norm $\|\cdot\|_{C^m[\alpha,\beta]}$. This follows by iterating the inequality
$$\max_{t\in[\alpha,\beta]}\bigl|f^{(k-1)}(t) - (g+p)^{(k-1)}(t)\bigr| \le (\beta-\alpha)\max_{t\in[\alpha,\beta]}\bigl|f^{(k)}(t) - (g+p)^{(k)}(t)\bigr|.$$
We have thus reduced our problem to proving that each of $1, t, \ldots, t^{m-1}$ is in the closure of $\mathcal{N}(\sigma;\mathbb{R},\mathbb{R})$ with respect to the norm $\|\cdot\|_{C^m[\alpha,\beta]}$. Because $\sigma \in C^m(\mathbb{R})$, it follows from the method of proof of Proposition 3.4 that, for $k \le m-1$, the function $t^k\sigma^{(k)}(-\theta_0)$ is contained in the closure of $\mathcal{N}(\sigma;\mathbb{R},\mathbb{R})$ with respect to the usual uniform norm $\|\cdot\|_{C[\alpha,\beta]}$ on any $[\alpha,\beta]$ (and, since $\sigma$ is not a polynomial, there exists a $\theta_0$ for which $\sigma^{(k)}(-\theta_0) \ne 0$). A detailed analysis, which we will skip, proves that each $t^k$, $k \le m-1$, is contained in the closure of $\mathcal{N}(\sigma;\mathbb{R},\mathbb{R})$ with respect to the more stringent norm $\|\cdot\|_{C^m[\alpha,\beta]}$. $\square$

In the above we have neglected the numerous possible nuances which parallel those contained in Section 3 (see, for instance, Corollary 3.5 and Propositions 3.10 and 3.11).

5. Interpolation

The ability to approximate well is related to the ability to interpolate. If one can approximate well, then one expects to be able to interpolate (the converse need not, in general, hold). Let us pose this problem more precisely in our setting. Assume we are given $\sigma \in C(\mathbb{R})$. For $k$ distinct points $\{x^i\}_{i=1}^k \subset \mathbb{R}^n$ and associated data $\{\alpha_i\}_{i=1}^k \subset \mathbb{R}$, can we always find $m$, $\{w^j\}_{j=1}^m \subset \mathbb{R}^n$, and $\{c_j\}_{j=1}^m$, $\{\theta_j\}_{j=1}^m \subset \mathbb{R}$ for which
$$\sum_{j=1}^m c_j\,\sigma(w^j\cdot x^i - \theta_j) = \alpha_i, \qquad i = 1,\ldots,k\,?$$
Furthermore, what is the relationship between $k$ and $m$? This problem has been considered, for example, in Sartori and Antsaklis (1991), Ito (1996), Ito and Saito (1996), and Huang and Babri (1998). In Ito and Saito (1996) it is proven that, if $\sigma$ is sigmoidal, continuous and nondecreasing, one can always interpolate with $m = k$ and some $\{w^j\}_{j=1}^k \subset S^{n-1}$.
Huang and Babri (1998) extend this result to any bounded, continuous, nonlinear $\sigma$ which has a limit at infinity or minus infinity (but their $w^j$ are not restricted in any way). We will use a technique from Section 3 to prove the following result.

Theorem 5.1 Let $\sigma \in C(\mathbb{R})$ and assume $\sigma$ is not a polynomial. For any $k$ distinct points $\{x^i\}_{i=1}^k \subset \mathbb{R}^n$ and associated data $\{\alpha_i\}_{i=1}^k \subset \mathbb{R}$, there exist $\{w^j\}_{j=1}^k \subset \mathbb{R}^n$ and $\{c_j\}_{j=1}^k$, $\{\theta_j\}_{j=1}^k \subset \mathbb{R}$ such that
$$\sum_{j=1}^k c_j\,\sigma(w^j\cdot x^i - \theta_j) = \alpha_i, \qquad i = 1,\ldots,k. \tag{5.1}$$
Furthermore, if $\sigma$ is not mean-periodic, then we may choose $\{w^j\}_{j=1}^k \subset S^{n-1}$.
Proof. Let $w \in \mathbb{R}^n$ be any vector for which the $w\cdot x^i = t_i$ are distinct, $i = 1,\ldots,k$. Set $w^j = \lambda_j w$ for $\lambda_j \in \mathbb{R}$, $j = 1,\ldots,k$. We fix the above $w$ and vary the $\lambda_j$. We will have proven (5.1) if we can show the existence of $\{c_j\}_{j=1}^k$, $\{\lambda_j\}_{j=1}^k$ and $\{\theta_j\}_{j=1}^k$ satisfying
$$\sum_{j=1}^k c_j\,\sigma(\lambda_j t_i - \theta_j) = \alpha_i, \qquad i = 1,\ldots,k. \tag{5.2}$$
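Once $w$ and the $\lambda_j, \theta_j$ are fixed, (5.2) is simply a $k \times k$ linear system in the coefficients $c_j$. The sketch below solves such a system directly; the particular points, data, parameters and logistic choice of $\sigma$ are illustrative assumptions, not taken from the text:

```python
import math, random

def sigma(t):                      # logistic sigmoid, one permissible choice
    return 1.0 / (1.0 + math.exp(-t))

def solve(A, b):                   # tiny Gaussian elimination with pivoting
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

random.seed(0)
t = [0.0, 0.5, 1.3]                          # distinct values t_i = w . x^i
alpha = [2.0, -1.0, 0.5]                     # data to interpolate
lam = [random.uniform(0.5, 3.0) for _ in t]  # lambda_j (generic choice)
theta = [random.uniform(-1.0, 1.0) for _ in t]
A = [[sigma(lam[j] * t[i] - theta[j]) for j in range(3)] for i in range(3)]
c = solve(A, alpha)
for i in range(3):
    val = sum(c[j] * sigma(lam[j] * t[i] - theta[j]) for j in range(3))
    assert abs(val - alpha[i]) < 1e-6
print("interpolated", len(t), "points with", len(t), "sigmoidal units")
```

The proof below shows that, for non-polynomial $\sigma$, parameters making this matrix nonsingular always exist; a generic random choice, as here, typically works.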
Solving (5.2) is equivalent to proving the linear independence (over $\lambda$ and $\theta$) of the $k$ continuous functions $\sigma(\lambda t_i - \theta)$, $i = 1,\ldots,k$. If these functions are linearly independent, there exist $\lambda_j, \theta_j$, $j = 1,\ldots,k$, for which
$$\det\bigl(\sigma(\lambda_j t_i - \theta_j)\bigr)_{i,j=1}^k \ne 0,$$
and then (5.2) can be solved, with these $\{\lambda_j\}_{j=1}^k$ and $\{\theta_j\}_{j=1}^k$, for any choice of $\{\alpha_i\}_{i=1}^k$. If, on the other hand, they are linearly dependent, then there exist nontrivial coefficients $\{d_i\}_{i=1}^k$ for which
$$\sum_{i=1}^k d_i\,\sigma(\lambda t_i - \theta) = 0 \tag{5.3}$$
for all $\lambda, \theta \in \mathbb{R}$. We rewrite (5.3) in the form
$$\int_{\mathbb{R}}\sigma(\lambda t - \theta)\,\mathrm{d}\mu(t) = 0 \tag{5.4}$$
for all $\lambda, \theta \in \mathbb{R}$, with the measure $\mathrm{d}\mu = \sum_{i=1}^k d_i\,\delta_{t_i}$
($\delta_{t_i}$ is the measure with point mass 1 at $t_i$). The measure $\mathrm{d}\mu$ is a nontrivial Borel measure of finite total variation and compact support. In other words, it represents a nontrivial linear functional on $C(\mathbb{R})$. We have constructed, in (5.4), a nontrivial linear functional annihilating $\sigma(\lambda t - \theta)$ for all $\lambda, \theta \in \mathbb{R}$. This implies that $\operatorname{span}\{\sigma(\lambda t - \theta) : \lambda, \theta \in \mathbb{R}\}$ is not dense in $C(\mathbb{R})$, which contradicts Proposition 3.7. This proves Theorem 5.1 in this case.

If $\sigma$ is not mean-periodic, then $\operatorname{span}\{\sigma(t - \theta) : \theta \in \mathbb{R}\}$ is dense in $C(\mathbb{R})$. As above, this implies that the $\{\sigma(t_i - \theta)\}_{i=1}^k$ are linearly independent for every choice of distinct $\{t_i\}_{i=1}^k$. Thus, for any $w \in S^{n-1}$ for which the $w\cdot x^i = t_i$ are distinct, $i = 1,\ldots,k$, there exist $\{\theta_j\}_{j=1}^k$ such that
$$\det\bigl(\sigma(t_i - \theta_j)\bigr)_{i,j=1}^k \ne 0.$$
Choosing $w^j = w$, $j = 1,\ldots,k$, and the above $\{\theta_j\}_{j=1}^k$, we can solve (5.1). $\square$

If $\sigma$ is a polynomial, then whether or not we can interpolate depends upon the choice of the points $\{x^i\}_{i=1}^k$ and on the degree of $\sigma$. If $\sigma$ is a polynomial of exact degree $r$, then $\operatorname{span}\{\sigma(w\cdot x - \theta) : w \in S^{n-1},\ \theta \in \mathbb{R}\}$ is precisely the space of multivariate polynomials of total degree at most $r$.

6. Degree of approximation

For a given activation function $\sigma$ we set, for each $r$,
$$\mathcal{M}_r(\sigma) = \Bigl\{\sum_{i=1}^r c_i\,\sigma(w^i\cdot x - \theta_i) : c_i, \theta_i \in \mathbb{R},\ w^i \in \mathbb{R}^n\Bigr\}.$$
We know, based on the results of Section 3, that if $\sigma \in C(\mathbb{R})$ is not a polynomial then to each $f \in C(K)$ ($K$ a compact subset of $\mathbb{R}^n$) there exist $g_r \in \mathcal{M}_r(\sigma)$ for which
$$\lim_{r\to\infty}\max_{x\in K}|f(x) - g_r(x)| = 0.$$
However, this tells us nothing about the rate of approximation. Nor does it tell us if there is a method, reasonable or otherwise, for finding 'good' approximants. It is these questions, and more especially the first, which we will address in this section.
We first fix some additional notation. Let $B^n$ denote the unit ball in $\mathbb{R}^n$, that is,
$$B^n = \{x : \|x\|_2 = (x_1^2 + \cdots + x_n^2)^{1/2} \le 1\}.$$
In this section we approximate functions defined on $B^n$. $C^m(B^n)$ will denote the set of all functions $f$ defined on $B^n$ for which $D^k f$ is defined and continuous on $B^n$ for all $k \in \mathbb{Z}_+^n$ satisfying $|k| \le m$ (see Section 4). The Sobolev space $W_p^m = W_p^m(B^n)$ may be defined as the completion of $C^m(B^n)$ with respect to the norm
$$\|f\|_{m,p} = \begin{cases} \Bigl(\sum_{0\le|k|\le m}\|D^k f\|_p^p\Bigr)^{1/p}, & 1 \le p < \infty,\\[2pt] \max_{0\le|k|\le m}\|D^k f\|_\infty, & p = \infty, \end{cases}$$
or some equivalent norm thereon. Here
$$\|g\|_p = \begin{cases} \Bigl(\int_{B^n}|g(x)|^p\,\mathrm{d}x\Bigr)^{1/p}, & 1 \le p < \infty,\\[2pt] \operatorname{ess\,sup}_{x\in B^n}|g(x)|, & p = \infty. \end{cases}$$
We set $B_p^m = B_p^m(B^n) = \{f : f \in W_p^m,\ \|f\|_{m,p} \le 1\}$. Since $B^n$ is compact and $C(B^n)$ is dense in $L_p = L_p(B^n)$, we have that $\mathcal{M}(\sigma)$ is dense in $L_p$ for each $\sigma \in C(\mathbb{R})$ that is not a polynomial.

We will first consider some lower bounds on the degree to which one can approximate from $\mathcal{M}_r(\sigma)$. As mentioned in Section 3, for any choice of $w \in \mathbb{R}^n$, $\theta \in \mathbb{R}$, and function $\sigma$, each factor $\sigma(w\cdot x - \theta)$ is a ridge function. Set
$$\mathcal{R}_r = \Bigl\{\sum_{i=1}^r g_i(a^i\cdot x) : a^i \in \mathbb{R}^n,\ g_i \in C(\mathbb{R}),\ i = 1,\ldots,r\Bigr\}.$$
Since $\mathcal{M}_r(\sigma) \subseteq \mathcal{R}_r$ for any $\sigma \in C(\mathbb{R})$, it therefore follows that, for every norm $\|\cdot\|_X$ on a normed linear space $X$ containing $\mathcal{R}_r$,
$$E(f;\mathcal{M}_r(\sigma);X) = \inf_{g\in\mathcal{M}_r(\sigma)}\|f - g\|_X \ge \inf_{g\in\mathcal{R}_r}\|f - g\|_X = E(f;\mathcal{R}_r;X). \tag{6.1}$$
Can we estimate the right-hand side of (6.1) from below in some reasonable way? And if so, how relevant is this lower bound?

Maiorov (1999) has proved the following lower bound. Assume $m \ge 1$ and $n \ge 2$. Then for each $r$ there exists an $f \in B_2^m$ for which
$$E(f;\mathcal{R}_r;L_2) \ge Cr^{-m/(n-1)}. \tag{6.2}$$
Here, and throughout, $C$ is some generic positive constant independent of the things it should be independent of! (In this case, $C$ is independent of $f$ and $r$.) The case $n = 2$ may be found in Oskolkov (1997). Maiorov also
proves that, for each $f \in B_2^m$,
$$E(f;\mathcal{R}_r;L_2) \le Cr^{-m/(n-1)}. \tag{6.3}$$
Thus he obtains the following result.

Theorem 6.1. (Maiorov 1999) For each $n \ge 2$ and $m \ge 1$,
$$E(B_2^m;\mathcal{R}_r;L_2) = \sup_{f\in B_2^m}E(f;\mathcal{R}_r;L_2) \asymp r^{-m/(n-1)}.$$
To be somewhat more precise, Maiorov (1999) proves the above result for $B_2^m$ for all $m > 0$, and not only integer $m$ (the definition of $B_2^m$ for such $m$ is then somewhat different). In addition, Maiorov, Meir and Ratsaby (1999) show that the set of functions for which the lower bound (6.2) holds is of large measure. In other words, this is not simply a worst case result.

The proof of this lower bound is too difficult and complicated to be presented here. However, the proof of the upper bound is more elementary and standard, and will be used again in what follows. As such we exhibit it here. It is also valid for every $p \in [1,\infty]$.

Theorem 6.2 For each $p \in [1,\infty]$ and every $m \ge 1$ and $n \ge 2$,
$$E(B_p^m;\mathcal{R}_r;L_p) \le Cr^{-m/(n-1)},$$
where $C$ is some constant independent of $r$.

Proof. As in the proof of Theorem 4.1, let $H_k$ denote the linear space of homogeneous polynomials of degree $k$ (in $\mathbb{R}^n$), and $P_k = \bigcup_{s=0}^k H_s$ the linear space of polynomials of degree at most $k$. Set $r = \binom{n-1+k}{k} = \dim H_k$. Note that $r \asymp k^{n-1}$. We first claim that $P_k \subseteq \mathcal{R}_r$. This follows from the proof of Theorem 4.1, where it is proven that if $\pi_k$ is the linear space of univariate polynomials of degree at most $k$, then for any set of $a^1,\ldots,a^r$ satisfying $\dim H_k|_A = r$, where $A = \{a^1,\ldots,a^r\}$, we have
$$P_k = \Bigl\{\sum_{i=1}^r g_i(a^i\cdot x) : g_i \in \pi_k,\ i = 1,\ldots,r\Bigr\}.$$
Thus $P_k \subseteq \mathcal{R}_r$, and therefore
$$E(B_p^m;\mathcal{R}_r;L_p) \le E(B_p^m;P_k;L_p).$$
It is a classical result that $E(B_p^m;P_k;L_p) \le Ck^{-m}$. Since $r \asymp k^{n-1}$, it therefore follows that
$$E(B_p^m;\mathcal{R}_r;L_p) \le Cr^{-m/(n-1)},$$
which proves the theorem. $\square$
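The exponent bookkeeping at the end of this proof, written out (with $C$ absorbing all constants):

```latex
% Since r \asymp k^{n-1}, we have k \asymp r^{1/(n-1)}, and so
\[
E(B_p^m;\mathcal{R}_r;L_p) \;\le\; E(B_p^m;P_k;L_p) \;\le\; Ck^{-m}
\;\asymp\; C\bigl(r^{1/(n-1)}\bigr)^{-m} \;=\; Cr^{-m/(n-1)}.
\]
```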
Remark. Not only is it true that $E(B_p^m;P_k;L_p) \le Ck^{-m}$, but there also exist, for each $p$, $m$ and $k$, linear operators $L : W_p^m \to P_k$ for which
$$\sup_{f\in B_p^m}\|f - L(f)\|_p \le Ck^{-m}.$$
This metatheorem has been around for years. For a proof, see Mhaskar (1996).

Theorem 6.2 is not a very strong result. It simply says that, using ridge functions, we can approximate at least as well as we can approximate with any polynomial space contained therein. Unfortunately the lower bound (6.2), currently only proven for the case $p = 2$, says that we can do no better, at least for the given Sobolev spaces. This lower bound is also, as was stated, a lower bound for the approximation error from $\mathcal{M}_r(\sigma)$ (for every $\sigma \in C(\mathbb{R})$). But how relevant is it? Given $p \in [1,\infty]$ and $m$, is it true that for all $\sigma \in C(\mathbb{R})$ we have
$$E(B_p^m;\mathcal{M}_r(\sigma);L_p) \le Cr^{-m/(n-1)}$$
for some $C$? No, not for all $\sigma \in C(\mathbb{R})$ (see, for example, Theorem 6.7). Does there exist a $\sigma \in C(\mathbb{R})$ for which this holds for some $C$? The answer is yes: there exist activation functions for which this lower bound is attained. This in itself is hardly surprising. It is a simple consequence of the separability of $C[-1,1]$. (As such, the $\sigma$ exhibited are rather pathological.) What is perhaps somewhat more surprising, at first glance, is the fact that there exist activation functions attaining this lower bound which are sigmoidal, strictly increasing and belong to $C^\infty(\mathbb{R})$.

Proposition 6.3. (Maiorov and Pinkus 1999) There exist $\sigma \in C^\infty(\mathbb{R})$ that are sigmoidal and strictly increasing, and have the property that, for every $g \in \mathcal{R}_r$ and $\varepsilon > 0$, there exist $c_i, \theta_i \in \mathbb{R}$ and $w^i \in \mathbb{R}^n$, $i = 1,\ldots,r+n+1$, satisfying
$$\Bigl|g(x) - \sum_{i=1}^{r+n+1}c_i\,\sigma(w^i\cdot x - \theta_i)\Bigr| < \varepsilon$$
for all $x \in B^n$.

This result and Theorem 6.2 immediately imply the following result.

Corollary 6.4 There exist $\sigma \in C^\infty(\mathbb{R})$ which are sigmoidal and strictly increasing, and for which
$$E(B_p^m;\mathcal{M}_r(\sigma);L_p) \le Cr^{-m/(n-1)} \tag{6.4}$$
for each $p \in [1,\infty]$, $m \ge 1$ and $n \ge 2$.
Proof of Proposition 6.3. The space $C[-1,1]$ is separable: that is, it contains a countable dense subset. Let $\{u_m\}_{m=1}^\infty$ be such a subset. Thus, to each $h \in C[-1,1]$ and each $\varepsilon > 0$ there exists a $k$ (dependent upon $h$ and $\varepsilon$) for which
$$|h(t) - u_k(t)| < \varepsilon$$
for all $t \in [-1,1]$. Assume each $u_m$ is in $C^\infty[-1,1]$. (We can, for example, choose the $\{u_m\}_{m=1}^\infty$ from among the set of all polynomials with rational coefficients.)

We will now construct a strictly increasing $C^\infty$ sigmoidal function $\sigma$, that is, $\lim_{t\to-\infty}\sigma(t) = 0$ and $\lim_{t\to\infty}\sigma(t) = 1$, such that, for each $h \in C[-1,1]$ and $\varepsilon > 0$, there exist an integer $m$ and real coefficients $a_1^m$, $a_2^m$ and $a_3^m$ (all dependent upon $h$ and $\varepsilon$) such that
$$|h(t) - (a_1^m\sigma(t-3) + a_2^m\sigma(t+1) + a_3^m\sigma(t+4m+1))| < \varepsilon$$
for all $t \in [-1,1]$. (Roughly, $\sigma$ is assembled piece by piece so as to encode each $u_k$ on the interval $[4k, 4k+2]$ while remaining smooth, strictly increasing and sigmoidal; for the details of the construction see Maiorov and Pinkus (1999).) From the construction there exist, for each $k \ge 1$, reals $a_1^k$, $a_2^k$, $a_3^k$ for which
$$a_1^k\sigma(t-3) + a_2^k\sigma(t+1) + a_3^k\sigma(t+4k+1) = u_k(t).$$
Let $g \in \mathcal{R}_r$ and $\varepsilon > 0$ be given. We may write
$$g(x) = \sum_{j=1}^r g_j(a^j\cdot x)$$
for some $g_j \in C[-1,1]$ and $a^j \in S^{n-1}$, $j = 1,\ldots,r$. From the above construction of $\sigma$ there exist constants $b_1^j$, $b_2^j$, $b_3^j$ and integers $k_j$ such that
$$|g_j(t) - (b_1^j\sigma(t-3) + b_2^j\sigma(t+1) + b_3^j\sigma(t+k_j))| < \varepsilon/r$$
for all $t \in [-1,1]$ and $j = 1,\ldots,r$.
Thus
$$|g_j(a^j\cdot x) - (b_1^j\sigma(a^j\cdot x - 3) + b_2^j\sigma(a^j\cdot x + 1) + b_3^j\sigma(a^j\cdot x + k_j))| < \varepsilon/r$$
for all $x \in B^n$, and hence
$$\Bigl|g(x) - \sum_{j=1}^r\bigl(b_1^j\sigma(a^j\cdot x - 3) + b_2^j\sigma(a^j\cdot x + 1) + b_3^j\sigma(a^j\cdot x + k_j)\bigr)\Bigr| < \varepsilon$$
for all $x \in B^n$. Now each $\sigma(a^j\cdot x - 3)$ and $\sigma(a^j\cdot x + 1)$, $j = 1,\ldots,r$, is a linear function of $x$, that is, a linear combination of $1, x_1,\ldots,x_n$. As such, the sum
$$\sum_{j=1}^r\bigl(b_1^j\sigma(a^j\cdot x - 3) + b_2^j\sigma(a^j\cdot x + 1)\bigr)$$
may be rewritten using at most $n+1$ terms of the same form. This proves the proposition. $\square$

Remark. The implications of Proposition 6.3 (and its proof) and Corollary 6.4 seem to be twofold. Firstly, sigmoidality, monotonicity and smoothness ($C^\infty$) are not impediments to optimal degrees of approximation. Secondly, and perhaps more surprisingly, these excellent properties are not sufficient to deter the construction of 'pathological' activation functions. In fact there exist real (entire) analytic, sigmoidal, strictly increasing $\sigma$ for which these same optimal error estimates hold (except that $3r$ replaces $r+n+1$ in Proposition 6.3). For further details, see Maiorov and Pinkus (1999). In practice any approximation process depends not only on the degree (order) of approximation, but also on the possibility, complexity and cost of finding good approximants. The above activation functions are very smooth and give the best degree of approximation; however, they are unacceptably complex.

We now know something about what is possible, at least theoretically. However, there is another interesting lower bound which is larger than that given above. How can that be? It has to do with the 'method of approximation'. We will show that, if the choice of coefficients, weights and thresholds depends continuously on the function being approximated (a not totally unreasonable assumption), then a lower bound on the error of approximation to functions in $B_p^m$ from $\mathcal{M}_r(\sigma)$ is of the order of $r^{-m/n}$ (rather than the $r^{-m/(n-1)}$ proven above). We will also show that for all $\sigma \in C^\infty(\mathbb{R})$ ($\sigma$ not a polynomial), and for many other $\sigma$, this bound is attained.

DeVore, Howard and Micchelli (1989) have introduced what they call a continuous nonlinear $d$-width. It is defined as follows. Let $K$ be a compact set in a normed linear space $X$. Let $P_d$ be any continuous map from $K$ to $\mathbb{R}^d$, and $M_d$ any map whatsoever from $\mathbb{R}^d$ to $X$.
Thus $M_d(P_d(\cdot))$ is a map from $K$ to $X$ that has a particular (and perhaps peculiar) factorization. For each such $P_d$ and $M_d$ set
$$E(K;P_d,M_d;X) = \sup_{f\in K}\|f - M_d(P_d(f))\|_X,$$
and now define the continuous nonlinear $d$-width
$$h_d(K;X) = \inf_{P_d,M_d}E(K;P_d,M_d;X)$$
of $K$ in $X$, where the infimum is taken over all $P_d$ and $M_d$ as above. DeVore, Howard and Micchelli prove, among other facts, the asymptotic estimate
$$h_d(B_p^m;L_p) \asymp d^{-m/n}.$$
In our context we are interested in the lower bound. As such, we provide a proof of the following.

Theorem 6.5. (DeVore, Howard and Micchelli 1989) For each fixed $p \in [1,\infty]$, $m \ge 1$ and $n \ge 1$,
$$h_d(B_p^m;L_p) \ge Cd^{-m/n}$$
for some constant $C$ independent of $d$.

Proof. The Bernstein $d$-width, $b_d(K;X)$, of a compact, convex, centrally symmetric set $K$ in $X$ is the term which has been applied to a codification of one of the standard methods of providing lower bounds for many of the more common $d$-width concepts. This lower bound is also valid in this setting, as we now show. For $K$ and $X$ as above, set
$$b_d(K;X) = \sup_{X_{d+1}}\sup\{\lambda : \lambda S(X_{d+1}) \subseteq K\},$$
where $X_{d+1}$ is any $(d+1)$-dimensional subspace of $X$, and $S(X_{d+1})$ is the unit ball of $X_{d+1}$.

Let $P_d$ be any continuous map from $K$ into $\mathbb{R}^d$, and set
$$\widetilde{P}_d(f) = P_d(f) - P_d(-f).$$
Thus $\widetilde{P}_d$ is an odd, continuous map from $K$ into $\mathbb{R}^d$, i.e., $\widetilde{P}_d(-f) = -\widetilde{P}_d(f)$. Assume $\lambda S(X_{d+1}) \subseteq K$. Then $\widetilde{P}_d$ is an odd, continuous map of $\partial(\lambda S(X_{d+1}))$ (the boundary of $\lambda S(X_{d+1})$) into $\mathbb{R}^d$. By the Borsuk Antipodality Theorem there exists an $f^* \in \partial(\lambda S(X_{d+1}))$ for which $\widetilde{P}_d(f^*) = 0$, that is, $P_d(f^*) = P_d(-f^*)$. As a consequence, for any map $M_d$ from $\mathbb{R}^d$ to $X$,
$$2f^* = [f^* - M_d(P_d(f^*))] - [-f^* - M_d(P_d(-f^*))]$$
and therefore
$$\max\bigl\{\|f^* - M_d(P_d(f^*))\|_X,\ \|{-f^*} - M_d(P_d(-f^*))\|_X\bigr\} \ge \|f^*\|_X = \lambda.$$
Since $f^* \in K$, this implies that
$$E(K;P_d,M_d;X) \ge \lambda.$$
This inequality is valid for every choice of eligible $P_d$ and $M_d$, and every $\lambda < b_d(K;X)$. Thus
$$h_d(K;X) \ge b_d(K;X),$$
and in particular
$$h_d(B_p^m;L_p) \ge b_d(B_p^m;L_p).$$
It remains to prove the bound $b_d(B_p^m;L_p) \ge Cd^{-m/n}$. This proof is quite standard. Let $\phi$ be any nontrivial function in $C^\infty(\mathbb{R}^n)$ with support in $[0,1]^n$. For $\ell > 0$ and any $j \in \mathbb{Z}^n$, set
$$\phi_{j,\ell}(x_1,\ldots,x_n) = \phi(x_1\ell - j_1,\ldots,x_n\ell - j_n).$$
Thus the support of $\phi_{j,\ell}$ lies in $\prod_{i=1}^n[j_i/\ell,(j_i+1)/\ell]$. For $\ell$ large, the number of $j \in \mathbb{Z}^n$ for which the support of $\phi_{j,\ell}$ lies totally in $B^n$ is of the order of $\ell^n$. A simple change of variable argument shows that, for every $p \in [1,\infty]$,
$$\|\phi_{j,\ell}\|_p = \ell^{-n/p}\|\phi\|_p \qquad\text{and}\qquad \|\phi_{j,\ell}\|_{m,p} \le C\ell^{m-n/p}\|\phi\|_{m,p}.$$
Furthermore, since the $\phi_{j,\ell}$ have distinct supports (for fixed $\ell$),
$$\Bigl\|\sum_j c_j\phi_{j,\ell}\Bigr\|_p = \ell^{-n/p}\|c\|_p\|\phi\|_p \qquad\text{and}\qquad \Bigl\|\sum_j c_j\phi_{j,\ell}\Bigr\|_{m,p} \le C\ell^{m-n/p}\|c\|_p\|\phi\|_{m,p},$$
where $\|c\|_p$ is the $\ell_p$-norm of the $\{c_j\}$, and where we have restricted the $j$ in the above sums to those $j$ for which the support of $\phi_{j,\ell}$ lies totally in $B^n$. We have therefore obtained a linear subspace of dimension of order $\ell^n$ with the property that, if
$$C\ell^m\Bigl\|\sum_j c_j\phi_{j,\ell}\Bigr\|_p \le 1,$$
then
$$\Bigl\|\sum_j c_j\phi_{j,\ell}\Bigr\|_{m,p} \le 1,$$
for some constant $C$ independent of $\ell$. This exactly implies that
$$b_d(B_p^m;L_p) \ge C\ell^{-m}, \qquad\text{where}\ d \asymp \ell^n.$$
Thus
$$h_d(B_p^m;L_p) \ge b_d(B_p^m;L_p) \ge Cd^{-m/n},$$
which proves the theorem. $\square$

This theorem is useful in what it tells us about approximating from $\mathcal{M}_r(\sigma)$ by certain continuous methods. However, two things should be noted and understood. Firstly, these permissible 'methods of approximation' do not necessarily include all continuous methods of approximation. Secondly, some of the approximation methods being developed and used today in this setting are iterative and are not necessarily continuous. Any element $g \in \mathcal{M}_r(\sigma)$ has the form
$$g(x) = \sum_{i=1}^r c_i\,\sigma(w^i\cdot x - \theta_i)$$
for some constants $c_i, \theta_i \in \mathbb{R}$ and $w^i \in \mathbb{R}^n$, $i = 1,\ldots,r$. In general, when approximating $f \in L_p$, our choice of $g$ will depend upon these $(n+2)r$ parameters. (Some of these parameters may be fixed independent of the function being approximated.) For any method of approximation which continuously depends on these parameters, the lower bound of Theorem 6.5 holds.

Theorem 6.6 Let $Q_r : L_p \to \mathcal{M}_r(\sigma)$ be any method of approximation where the parameters $c_i$, $\theta_i$ and $w^i$, $i = 1,\ldots,r$, are continuously dependent on the function being approximated (some may of course be fixed independent of the function). Then
$$\sup_{f\in B_p^m}\|f - Q_rf\|_p \ge Cr^{-m/n}$$
for some $C$ independent of $r$.

Additional upper and lower bound estimates appear in Maiorov and Meir (1999). Particular cases of their lower bounds for specific $\sigma$ improve upon the lower bound for $E(B_2^m;\mathcal{M}_r(\sigma);L_2)$ given in Theorem 6.1, without any assumption about the continuity of the approximating procedure. We only state this next result; its proof is too complicated to be presented here.

Theorem 6.7. (Maiorov and Meir 1999) Let $p \in [1,\infty]$, $m \ge 1$ and $n \ge 2$. Let $\sigma$ be the logistic sigmoid, that is,
$$\sigma(t) = \frac{1}{1 + \mathrm{e}^{-t}},$$
or a (polynomial) spline of a fixed degree with a finite number of knots. Then
$$E(B_p^m;\mathcal{M}_r(\sigma);L_p) \ge C(r\log r)^{-m/n}$$
for some $C$ independent of $r$.

We now consider upper bounds. The next theorem may, with minor modifications, be found in Mhaskar (1996) (see also Ellacott and Bos (1996, p. 352)). Note that the logistic sigmoid satisfies the conditions of Theorem 6.8.

Theorem 6.8 Assume $\sigma : \mathbb{R}\to\mathbb{R}$ is such that $\sigma \in C^\infty(\Theta)$ on some open interval $\Theta$, and $\sigma$ is not a polynomial on $\Theta$. Then, for each $p \in [1,\infty]$, $m \ge 1$ and $n \ge 2$,
$$E(B_p^m;\mathcal{M}_r(\sigma);L_p) \le Cr^{-m/n} \tag{6.5}$$
for some constant $C$ independent of $r$.

Proof. The conditions of Theorem 6.8 imply, by Corollary 3.6, that $\overline{\mathcal{N}_{k+1}(\sigma)}$, the closure of $\mathcal{N}_{k+1}(\sigma)$, contains $\pi_k$, the linear space of univariate algebraic polynomials of degree at most $k$. From the proof of Theorem 4.1 (see also Theorem 6.2), for $s = \dim H_k \asymp k^{n-1}$ there exist $a^1,\ldots,a^s$ in $S^{n-1}$ such that
$$P_k = \Bigl\{\sum_{i=1}^s g_i(a^i\cdot x) : g_i \in \pi_k,\ i = 1,\ldots,s\Bigr\},$$
where $P_k$ is the linear space of $n$-variate algebraic polynomials of degree at most $k$. Since each $g_i \in \overline{\mathcal{N}_{k+1}(\sigma)}$, and $\mathcal{M}_p(\sigma) + \mathcal{M}_q(\sigma) = \mathcal{M}_{p+q}(\sigma)$, it follows that
$$P_k \subseteq \overline{\mathcal{M}_{s(k+1)}(\sigma)}.$$
Set $r = s(k+1)$. Then
$$E(B_p^m;\mathcal{M}_r(\sigma);L_p) = E(B_p^m;\overline{\mathcal{M}_r(\sigma)};L_p) \le E(B_p^m;P_k;L_p) \le Ck^{-m}$$
for some constant $C$ independent of $r$. Since $r \asymp k^n$, we have
$$E(B_p^m;\mathcal{M}_r(\sigma);L_p) \le Cr^{-m/n},$$
which proves the theorem. $\square$

Remark. It is important to note that the upper bound of Theorem 6.8 can be attained by continuous (and in fact linear) methods in the sense of Theorem 6.6. The thresholds $\theta_i$ can all be chosen to equal $\theta_0$, where $\sigma^{(k)}(-\theta_0) \ne 0$, $k = 0,1,2,\ldots$ (see Proposition 3.4). The weights are also chosen independent of the function being approximated. The dependence on the function is only in the choice of the $g_i$ and, as previously noted (see the
remark after Theorem 6.2), this can in fact be done in a linear manner. (For each $p \in (1,\infty)$, the operator of best approximation from $P_k$ is continuous.)

Remark. For functions analytic in a neighbourhood of $B^n$, there are better order of approximation estimates, again based on polynomial approximation: see Mhaskar (1996).

If the optimal order of approximation from $\mathcal{M}_r(\sigma)$ is really no better than that obtained by approximating from the polynomial space $P_k$ of dimension $r \asymp k^n$, then one cannot but wonder if it is really worthwhile using this model (at least in the case of a single hidden layer). It is not yet clear, from this perspective, what the mathematical or computational justifications are for choosing this model over other models. Some researchers, however, would be more than content if they could construct neural networks that algorithmically achieve this order of approximation.

Petrushev (1998) proves some general estimates concerning ridge and neural network approximation. These results are valid only for $p = 2$; however, they generalize Theorem 6.8 within that setting. Let $L_2^1 = L_2[-1,1]$ with the usual norm
$$\|h\|_{L_2^1} = \Bigl(\int_{-1}^1|h(t)|^2\,\mathrm{d}t\Bigr)^{1/2}.$$
Similarly, $H^{m,2}$ will denote the Sobolev space on $[-1,1]$ with norm
$$\|h\|_{H^{m,2}} = \Bigl(\sum_{j=0}^m\|h^{(j)}\|_{L_2^1}^2\Bigr)^{1/2}.$$
Set
$$E(H^{m,2};\mathcal{N}_k(\sigma);L_2^1) = \sup_{\|h\|_{H^{m,2}}\le1}\ \inf_{g\in\mathcal{N}_k(\sigma)}\|h - g\|_{L_2^1}.$$
The point of the above is that this is all taking place in $\mathbb{R}^1$ rather than in $\mathbb{R}^n$.
Theorem 6.9. (Petrushev 1998) Let $m \ge 1$ and $n \ge 2$. Assume $\sigma$ has the property that
$$E(H^{m,2};\mathcal{N}_k(\sigma);L_2^1) \le Ck^{-m} \tag{6.6}$$
for some constant $C$ independent of $k$. Then
$$E(B_2^{m+(n-1)/2};\mathcal{M}_r(\sigma);L_2) \le Cr^{-m/n-(n-1)/2n} \tag{6.7}$$
for some other $C$ independent of $r$.
Remark. It follows from general 'interpolation' properties of spaces that, if (6.6) or (6.7) holds for a specific $m$, then it also holds for every positive value less than $m$.

The proof of Theorem 6.9 is too complicated to be presented here. The underlying idea is similar to that used in the proof of Theorem 6.8. One uses multivariate polynomials to approximate the functions in question, decomposes these multivariate polynomials into 'ridge' polynomials (the $g_i$ in the proof of Theorem 6.8), and then approximates these univariate 'ridge' polynomials from $\mathcal{N}_k(\sigma)$.
Corollary 6.10. (Petrushev 1998) For each $k \in \mathbb{Z}_+$, let
$$\sigma_k(t) = t_+^k = \begin{cases} t^k, & t \ge 0,\\ 0, & t < 0.\end{cases}$$
Then
$$E(B_2^m;\mathcal{M}_r(\sigma_k);L_2) \le Cr^{-m/n}$$
for $m = 1,\ldots,k+1+\frac{n-1}{2}$, and some constant $C$ independent of $r$.
A variation on a result of Petrushev (1998) proves this corollary for $m = k+1+(n-1)/2$. The other cases follow by taking differences (really just differentiating), or as a consequence of the above remark. Note that $\sigma_0$ is the Heaviside function.

For given $k \in \mathbb{Z}_+$, assume $\sigma$ is continuous, or piecewise continuous, and satisfies
$$\lim_{t\to\infty}\frac{\sigma(t)}{t^k} = 1 \qquad\text{and}\qquad \lim_{t\to-\infty}\frac{\sigma(t)}{t^k} = 0.$$
(This is essentially what Mhaskar and Micchelli (1992) call $k$th degree sigmoidal.) Then $\sigma(\lambda t)/\lambda^k \to \sigma_k(t)$ as $\lambda\to\infty$, uniformly off $[-c,c]$ for any $c > 0$, and the convergence also holds in $L_p[-1,1]$ for any $p \in [1,\infty)$. Let $\sigma_k$ be as defined in Corollary 6.10; thus $\mathcal{M}_r(\sigma_k) \subseteq \overline{\mathcal{M}_r(\sigma)}$. In addition, if $\sigma$ is a spline of degree $k$ with at least one simple knot, then by taking (a finite number of) shifts and dilates we can again approximate $\sigma_k$ in the $L_p[-1,1]$ norm, $p \in [1,\infty)$. Thus, applying Corollary 6.10, we obtain the following.

Corollary 6.11 For given $k \in \mathbb{Z}_+$, let $\sigma$ be as defined in the previous paragraph. Then
$$E(B_2^m;\mathcal{M}_r(\sigma);L_2) \le Cr^{-m/n}$$
for $m = 1,\ldots,k+1+\frac{n-1}{2}$, and some constant $C$ independent of $r$.
Note that the error of approximation in all these results has exactly the same form as that given by (6.5). If $\sigma \in C^\infty(\Theta)$ as in Theorem 6.8, then (6.6) holds, since $\overline{\mathcal{N}_k(\sigma)}$ contains $\pi_{k-1}$.

A different and very interesting approach to the problem of determining (or at least bounding) the order of approximation from the set $\mathcal{M}_r(\sigma)$ was initiated by Barron (1993). Until now we have considered certain standard smoothness classes (the $W_p^m$), and then tried to estimate the worst case error of approximation from functions in this class. Another approach is, given $\mathcal{M}_r(\sigma)$, to try to find classes of functions which are well approximated by $\mathcal{M}_r(\sigma)$. This is generally a more difficult problem, but one well worth pursuing. Barron does this, in a sense, in a specific but interesting setting. What we present here is based on work of Barron (1993), and generalizations due to Makovoz (1996).

We start with a general result which is a generalization, due to Makovoz (1996), of a result of Barron (1993) and Maurey (Pisier 1981) (see also Jones (1992)). (Their result does not contain the factor $\varepsilon_r(K)$.) It should be mentioned that, unlike the previously discussed upper bounds, these upper bounds are obtained by strictly nonlinear (and not necessarily continuous) methods.

Let $H$ be a Hilbert space and $K$ a bounded set therein. Let $\operatorname{co}K$ denote the convex hull of $K$. Set
$$\varepsilon_r(K) = \inf\{\varepsilon > 0 : K \text{ can be covered by } r \text{ sets of diameter at most } \varepsilon\}.$$

Theorem 6.12. (Makovoz 1996) Let $K$ be a bounded subset of a Hilbert space $H$, and let $f \in \operatorname{co}K$. Then there is an $f_r$ of the form
$$f_r = \sum_{i=1}^r a_ig_i$$
for some $g_i \in K$, $a_i \ge 0$, $i = 1,\ldots,r$, with $\sum_{i=1}^r a_i \le 1$, satisfying
$$\|f - f_r\|_H \le \frac{2\varepsilon_r(K)}{\sqrt{r}}.$$
Letting $K$ be the set of our approximants, we may have here a very reasonable approximation-theoretic result. The problem, however, is to identify $\operatorname{co}K$, or at least some significant subset of $\operatorname{co}K$, in other than tautological terms. Otherwise the result could be rather sterile.

Barron (1993) considered $\sigma$ which are bounded, measurable, and sigmoidal, and set
$$K(\sigma) = \{\pm\sigma(w\cdot x - \theta) : w \in \mathbb{R}^n,\ \theta \in \mathbb{R}\}.$$
(Recall that $x \in B^n$.) He then proved that $\operatorname{co}K(\sigma)$ contains the set $\mathcal{B}$ of all functions $f$ defined on $B^n$ which can be extended to all of $\mathbb{R}^n$ such that
some shift of $f$ by a constant has a Fourier transform $\hat f$ satisfying
$$\int_{\mathbb{R}^n}\|s\|_2\,|\hat f(s)|\,\mathrm{d}s \le \gamma,$$
for some $\gamma > 0$.
for some 7 > 0. Let us quickly explain, in general terms, why this result holds. As we mentioned earlier, at least for continuous, sigmoidal a (see the comment after Corollary 6.10), cr(A-) approaches ao(-) in norm as A — 00, where ao is the Heaviside function. As such, co K{ao) C coK(a) (and, equally important in what will follow, we essentially have K(ao) C K(a), i.e., we can replace each
Applying this result to K(ao), this implies that, for each s G Mn, s ^ 0,
j^-e
co K(ao)
for some 7 (dependent on Bn). Thus, if
y"j|s||2|/(s)|ds<7, then
To apply Theorem 6.12 we should also obtain a good estimate for $\varepsilon_r(K(\sigma))$. This quantity is generally impossible to estimate. However, since $K(\sigma_0) \subseteq K(\sigma)$ we have $\mathcal{M}_r(\sigma_0) \subseteq \mathcal{M}_r(\sigma)$, and it thus suffices to consider $\varepsilon_r(K(\sigma_0))$. Since we are approximating on $B^n$,
$$K(\sigma_0) = \{\pm\sigma_0(w\cdot x - \theta) : \|w\|_2 = 1,\ |\theta| \le 1\}.$$
(For any other $w$ or $\theta$ we add no additional function to the set $K(\sigma_0)$.) Now, if $\|w^1\|_2 = \|w^2\|_2 = 1$ with $\|w^1 - w^2\|_2 \le \varepsilon^2$, and $|\theta_1|, |\theta_2| \le 1$ with $|\theta_1 - \theta_2| \le \varepsilon^2$, then
$$\Bigl(\int_{B^n}\bigl|\sigma_0(w^1\cdot x - \theta_1) - \sigma_0(w^2\cdot x - \theta_2)\bigr|^2\,\mathrm{d}x\Bigr)^{1/2} \le C\varepsilon$$
for some constant $C$. Thus to estimate $\varepsilon_r(K(\sigma_0))$ we must find an $\varepsilon^2$-net for
$$\{(w,\theta) : \|w\|_2 = 1,\ |\theta| \le 1\}.$$
It is easily shown that for this we need of the order of $(\varepsilon^2)^{-n}$ elements. Thus
$$\varepsilon_r(K(\sigma_0)) \le Cr^{-1/2n}.$$
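Combining this covering estimate with Theorem 6.12 accounts for the exponent in the theorem below: covering the parameter set with $r \asymp \varepsilon^{-2n}$ pieces gives $\varepsilon_r(K(\sigma_0)) \le Cr^{-1/2n}$, and hence

```latex
\[
\|f - f_r\|_{L_2} \;\le\; \frac{2\varepsilon_r(K(\sigma_0))}{\sqrt{r}}
\;\le\; C\,r^{-1/2}\,r^{-1/2n} \;=\; C\,r^{-(n+1)/2n}.
\]
```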
We can now summarize.

Theorem 6.13. (Makovoz 1996) Let $\mathcal{B}$ be as defined above. Then, for any bounded, measurable, sigmoidal function $\sigma$,
$$E(\mathcal{B};\mathcal{M}_r(\sigma);L_2) \le E(\mathcal{B};\mathcal{M}_r(\sigma)\cap\mathcal{B};L_2) \le Cr^{-(n+1)/2n} \tag{6.8}$$
for some constant $C$ independent of $r$.

If $\sigma$ is a piecewise continuous sigmoidal function, then from Corollary 6.11 (the case $k = 0$) we have
$$E(B_2^{(n+1)/2};\mathcal{M}_r(\sigma);L_2) \le Cr^{-(n+1)/2n}.$$
This is the same error bound, with the same activation function, as appears in (6.8). As such, it is natural to ask which, if either, is the stronger result. In fact the results are not comparable. The condition defining $\mathcal{B}$ cannot be restated in terms of conditions on the derivatives. What is known (see Barron (1993)) is that on $B^n$ we essentially have
$$W_\infty^{[n/2]+2} \subseteq \operatorname{span}\mathcal{B} \subseteq W_\infty^1 \subseteq W_1^1.$$
(The leftmost inclusion is almost, but not quite, correct: see Barron (1993).) The error estimate of Barron (1993) did not originally contain the term $\varepsilon_r(K)$ and was thus of the form $Cr^{-1/2}$ (for some constant $C$). This initiated an unfortunate discussion concerning these results having 'defeated the curse of dimensionality'.

The literature contains various generalizations of the above results, and we expect more to follow. Makovoz (1996) generalizes Theorems 6.12 and 6.13 to $L_q(B,\mu)$, where $\mu$ is a probability measure on some set $B$ in $\mathbb{R}^n$, $1 \le q < \infty$. (For a discussion of an analogous problem in the uniform norm, see Barron (1992) and Makovoz (1998).) Donahue, Gurvits, Darken and Sontag (1997) consider different generalizations of Theorem 6.12, and they provide a general perspective on this type of problem. Hornik, Stinchcombe, White and Auer (1994) (see also Chen and White (1999)) consider generalizations of the Barron (1993) results to where the function and some of its derivatives are simultaneously approximated. Lower bounds on the error of
approximation are to be found in Barron (1992) and Makovoz (1996). However, these lower bounds essentially apply to approximating from $\mathcal{M}_r(\sigma_0)\cap\mathcal{B}$ (a restricted set of approximants and a particular activation function) and do not apply to approximation from all of $\mathcal{M}_r(\sigma)$. Other related results may be found in Mhaskar and Micchelli (1994), Yukich, Stinchcombe and White (1995) and Kůrková, Kainen and Kreinovich (1997).

For $f \in \mathcal{B}$, the following algorithm of approximation was introduced by Jones (1992) to obtain an iterative sequence $\{h_r\}$ of approximants ($h_r \in \mathcal{M}_r(\sigma)$), where $\sigma$ is sigmoidal (as above). These approximants satisfy
$$\|f - h_r\|_2 \le \frac{C}{\sqrt{r}}.$$
Starting from $h_0$ (say $h_0 = 0$), at step $r$ one computes
$$\min_{0\le\alpha\le1}\ \min_{g\in K(\sigma)}\|f - (\alpha h_{r-1} + (1-\alpha)g)\|_2.$$
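A minimal numerical sketch of one step of this greedy iteration (a toy univariate dictionary of Heaviside atoms standing in for $K(\sigma)$, and a coarse grid search for the two minima; all specifics are illustrative, not from the text):

```python
import math

# L2[0,1] discretized on a grid; 'atoms' plays the role of K(sigma_0).
N = 100
grid = [(i + 0.5) / N for i in range(N)]

def norm2(u):
    return math.sqrt(sum(v * v for v in u) / N)

atoms = []
for b in [j / 20.0 for j in range(21)]:          # atoms are +-H(t - b)
    h = [1.0 if t >= b else 0.0 for t in grid]
    atoms.append(h)
    atoms.append([-v for v in h])

f = [min(2.0 * t, 1.0) for t in grid]            # target, roughly in co(atoms)
h_r, errs = [0.0] * N, []
for r in range(1, 13):
    best_err, best = float("inf"), None
    # grid search standing in for: min over alpha in [0,1] and g in K
    for g in atoms:
        for a in [k / 10.0 for k in range(11)]:
            cand = [a * u + (1.0 - a) * v for u, v in zip(h_r, g)]
            e = norm2([u - v for u, v in zip(f, cand)])
            if e < best_err:
                best_err, best = e, cand
    h_r = best                                    # h_r = a_r h_{r-1} + (1-a_r) g_r
    errs.append(best_err)

assert all(b <= a + 1e-12 for a, b in zip(errs, errs[1:]))  # error non-increasing
assert errs[-1] < errs[0]
print("greedy errors:", [round(e, 3) for e in errs])
```

Since $\alpha = 1$ reproduces $h_{r-1}$, the error can never increase; the convergence rate $C/\sqrt{r}$ is what the theory above guarantees for exact (or near-exact) minimization.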
Assume that these minima are attained for $\alpha_r \in [0,1]$ and $g_r \in K(\sigma)$, and set $h_r = \alpha_rh_{r-1} + (1-\alpha_r)g_r$. (In the above we assume that $K(\sigma)$ is compact.) In fact, as mentioned by Jones (1992), improved upon by Barron (1993), and further improved by Jones (1999) (see also Donahue, Gurvits, Darken and Sontag (1997)), the $\alpha_r$ and $g_r$ need not be chosen to attain the above minima exactly, and yet the same convergence rate will hold.

We end this section by pointing out that much remains to be done in finding good upper bounds, constructing reasonable methods of approximation, and identifying classes of functions which are well approximated using this model. It is also worth noting that very few of the results we have surveyed used intrinsic properties of the activation functions. In Theorem 6.8 only the $C^\infty$ property was used. Corollary 6.11 depends solely on the approximation properties of $\sigma_k$. Theorem 6.13 is a result concerning the Heaviside activation function.

7. Two hidden layers

Relatively little is known concerning the advantages and disadvantages of using a single hidden layer with many units (neurons) over many hidden layers with fewer units. The mathematics and approximation theory of the MLP model with more than one hidden layer is not well understood. Some authors see little theoretical gain in considering more than one hidden layer, since a single hidden layer model suffices for density. Most authors, however, do allow for the possibility of certain other benefits to be gained from using more than one hidden layer. (See de Villiers and Barnard (1992) for a comparison of these two models.)
APPROXIMATION THEORY OF THE MLP MODEL IN NEURAL NETWORKS
One important advantage of the multiple (rather than single) hidden layer model has to do with the existence of locally supported, or at least 'localized', functions in the two hidden layer model (see Lapedes and Farber (1988), Blum and Li (1991), Geva and Sitte (1992), Chui, Li and Mhaskar (1994)). For any activation function $\sigma$, every $g \in \mathcal{M}_r(\sigma)$, $g \ne 0$, has
$$\int_{\mathbb{R}^n} |g(\mathbf{x})|^p \, d\mathbf{x} = \infty$$
for every $p \in [1,\infty)$, and no $g \in \mathcal{M}_r(\sigma)$ has compact support. This is no longer true in the two hidden layer model. For example, let $\sigma_0$ be the Heaviside function. Then
$$\sigma_0\!\left( \sum_{i=1}^{m} \sigma_0(\mathbf{w}^i \cdot \mathbf{x} - \theta_i) - \left( m - \tfrac12 \right) \right) = \begin{cases} 1, & \mathbf{w}^i \cdot \mathbf{x} \ge \theta_i, \ i = 1, \dots, m, \\ 0, & \text{otherwise.} \end{cases} \tag{7.1}$$
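A small numerical illustration of (7.1) may be useful (the half-spaces and test points below are our own arbitrary choices, not from the text): with the Heaviside activation, a single second-hidden-layer unit turns the $m$ first-layer half-space tests into the indicator of their intersection.

```python
def sigma0(t):
    """Heaviside activation: 1 if t >= 0, else 0."""
    return 1.0 if t >= 0 else 0.0

def polygon_indicator(x, W, theta):
    """Two-hidden-layer net with one unit in the second hidden layer:
    sigma0( sum_i sigma0(w^i . x - theta_i) - (m - 1/2) ),
    which equals 1 exactly when w^i . x >= theta_i for every i."""
    m = len(theta)
    first_layer = sum(sigma0(sum(wi * xi for wi, xi in zip(w, x)) - th)
                      for w, th in zip(W, theta))
    return sigma0(first_layer - (m - 0.5))   # single second-layer unit

# Illustrative numbers: the rectangle [0.2, 0.6] x [0.3, 0.7] in R^2,
# written as the intersection of four half-spaces w^i . x >= theta_i.
W = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
theta = [0.2, -0.6, 0.3, -0.7]

print(polygon_indicator((0.4, 0.5), W, theta))  # inside  -> 1.0
print(polygon_indicator((0.8, 0.5), W, theta))  # outside -> 0.0
```

Here the four half-spaces carve out a rectangle, anticipating the rectangle example discussed next; any closed convex polygonal domain is handled the same way.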
Thus the two hidden layer model with activation function $\sigma_0$, and only one unit in the second hidden layer, can represent the characteristic function of any closed convex polygonal domain. For example, for $a_i < b_i$, $i = 1, \dots, n$,
$$\sigma_0\!\left( \sum_{i=1}^{n} \left( \sigma_0(x_i - a_i) + \sigma_0(-x_i + b_i) \right) - \left( 2n - \tfrac12 \right) \right)$$
is the characteristic function of the rectangle $\prod_{i=1}^{n} [a_i, b_i]$. (Up to boundary values, this function also has the representation
$$\sigma_0\!\left( \sum_{i=1}^{n} \left( \sigma_0(x_i - a_i) - \sigma_0(x_i - b_i) \right) - \left( n - \tfrac12 \right) \right),$$
since $\sigma_0(-t) = 1 - \sigma_0(t)$ for all $t \ne 0$.) If $\sigma$ is a continuous or piecewise continuous sigmoidal function, then a similar result holds for such functions, since $\sigma(\lambda \cdot)$ approaches $\sigma_0(\cdot)$ as $\lambda \to \infty$ in, say, $L^p[-1,1]$ for every $p \in [1,\infty)$. The function
$$\sigma\!\left( \lambda \left( \sum_{i=1}^{m} \sigma(\lambda(\mathbf{w}^i \cdot \mathbf{x} - \theta_i)) - \left( m - \tfrac12 \right) \right) \right)$$
thus approximates the function given in (7.1) as $\lambda \to \infty$. Approximating by such localized functions has many, many advantages.

Another advantage of the multiple hidden layer model is the following. As was noted in Section 6, there is a lower bound on the degree to which the single hidden layer model with $r$ units in the hidden layer can approximate any function. It is given by the extent to which a linear combination of $r$ ridge functions can approximate this same function. This lower bound was shown to be attainable (Proposition 6.3 and Corollary 6.4), and, more importantly, ridge function approximation itself is bounded below (away from zero) with some non-trifling dependence on $r$ and on the set to be approximated. In the single hidden layer model there is an intrinsic lower bound on the degree of approximation, depending on the number of units used. This is not the case in the two hidden layer model. We will prove, using the same activation function as in Proposition 6.3, that there is no theoretical lower bound on the error of approximation if we permit two hidden layers. To be precise, we will prove the following theorem.

Theorem 7.1. (Maiorov and Pinkus 1999) There exists an activation function $\sigma$ which is $C^\infty$, strictly increasing, and sigmoidal, and has the following property. For any $f \in C[0,1]^n$ and $\varepsilon > 0$, there exist constants $d_i$, $c_{ij}$, $\theta_{ij}$, $\gamma_i$, and vectors $\mathbf{w}^{ij} \in \mathbb{R}^n$ for which
$$\left| f(\mathbf{x}) - \sum_{i=1}^{4n+3} d_i\, \sigma\!\left( \sum_{j=1}^{2n+1} c_{ij}\, \sigma(\mathbf{w}^{ij} \cdot \mathbf{x} - \theta_{ij}) - \gamma_i \right) \right| < \varepsilon$$
for all $\mathbf{x} \in [0,1]^n$.

In other words, for this specific activation function, any continuous function on the unit cube in $\mathbb{R}^n$ can be uniformly approximated to within any error by a two hidden layer neural network with $2n+1$ units in the first hidden layer and $4n+3$ units in the second hidden layer. (We recall that the constructed activation function is nonetheless rather pathological.)

In the proof of Theorem 7.1 we use the Kolmogorov Superposition Theorem. This theorem has been much quoted and discussed in the neural network literature: see Hecht-Nielsen (1987), Girosi and Poggio (1989), Kurkova (1991, 1992, 1995b), Lin and Unbehauen (1993). In fact Kurkova (1992) uses the Kolmogorov Superposition Theorem to construct approximations in the two hidden layer model with an arbitrary sigmoidal function. However, the number of units needed is exceedingly large, and does not provide for good error bounds or, in our opinion, a reasonably efficient method of approximation. Better error bounds follow by using localized functions (see, for instance, Blum and Li (1991), Ito (1994a), and especially Chui, Li and Mhaskar (1994)). Kurkova (1992) and others (see Frisch, Borzi, Ord, Percus and Williams (1989), Sprecher (1993, 1997), Katsuura and Sprecher (1994), Nees (1994, 1996)) are interested in using the Kolmogorov Superposition Theorem to find good algorithms for approximation. This is not our aim. We are using the Kolmogorov Superposition Theorem to prove that there is no theoretical lower bound on the degree of approximation common to all activation functions, as is the case in the single hidden layer model. In fact, we are showing that there exists an activation function with very 'nice' properties for which a fixed finite number of units in both hidden layers is
sufficient to approximate arbitrarily well any continuous function. We do not, however, advocate using this activation function.

The Kolmogorov Superposition Theorem answers (in the negative) Hilbert's 13th problem. It was proven by Kolmogorov in a series of papers in the late 1950s. We quote below an improved version of this theorem (see Lorentz, von Golitschek and Makovoz (1996, p. 553) for a more detailed discussion).

Theorem 7.2. There exist $n$ constants $\lambda_j > 0$, $j = 1, \dots, n$, $\sum_{j=1}^{n} \lambda_j \le 1$, and $2n+1$ strictly increasing continuous functions $\phi_i$, $i = 1, \dots, 2n+1$, which map $[0,1]$ to itself, such that every continuous function $f$ of $n$ variables on $[0,1]^n$ can be represented in the form
$$f(x_1, \dots, x_n) = \sum_{i=1}^{2n+1} g\!\left( \sum_{j=1}^{n} \lambda_j \phi_i(x_j) \right) \tag{7.2}$$
for some $g \in C[0,1]$ depending on $f$.
Note that this is a theorem about representing (and not approximating) functions. There have been numerous generalizations of this theorem. Attempts to understand the nature of this theorem have led to interesting concepts related to the complexity of functions. Nonetheless the theorem itself has had few, if any, direct applications.

Proof of Theorem 7.1. We are given $f \in C[0,1]^n$ and $\varepsilon > 0$. Let $g$ and the $\phi_i$ be as in (7.2). We will use the $\sigma$ constructed in Proposition 6.3. Recall that for any $h \in C[-1,1]$ and $\eta > 0$ we can find constants $a_1, a_2, a_3$ and an integer $m$ for which
$$|h(t) - (a_1 \sigma(t-3) + a_2 \sigma(t+1) + a_3 \sigma(t+m))| < \eta$$
for all $t \in [-1,1]$. This result is certainly valid when we restrict ourselves to the interval $[0,1]$ and functions continuous thereon. As such, for the above $g$ there exist constants $a_1, a_2, a_3$ and an integer $m$ such that
$$|g(t) - (a_1 \sigma(t-3) + a_2 \sigma(t+1) + a_3 \sigma(t+m))| < \frac{\varepsilon}{2(2n+1)} \tag{7.3}$$
for all $t \in [0,1]$. Further, recall that $\sigma(t-3)$ and $\sigma(t+1)$ are linear polynomials on $[0,1]$. Substituting (7.3) in (7.2), we obtain
$$\left| f(x_1, \dots, x_n) - \sum_{i=1}^{2n+1} \left[ a_1 \sigma\!\left( \sum_{j=1}^{n} \lambda_j \phi_i(x_j) - 3 \right) + a_2 \sigma\!\left( \sum_{j=1}^{n} \lambda_j \phi_i(x_j) + 1 \right) + a_3 \sigma\!\left( \sum_{j=1}^{n} \lambda_j \phi_i(x_j) + m \right) \right] \right| < \frac{\varepsilon}{2} \tag{7.4}$$
for all $(x_1, \dots, x_n) \in [0,1]^n$. Since
$$\sigma\!\left( \sum_{j=1}^{n} \lambda_j \phi_i(x_j) - 3 \right) \quad \text{and} \quad \sigma\!\left( \sum_{j=1}^{n} \lambda_j \phi_i(x_j) + 1 \right)$$
are linear polynomials in $\sum_{j=1}^{n} \lambda_j \phi_i(x_j)$ for each $i$, we can in fact rewrite
$$\sum_{i=1}^{2n+1} \left[ a_1 \sigma\!\left( \sum_{j=1}^{n} \lambda_j \phi_i(x_j) - 3 \right) + a_2 \sigma\!\left( \sum_{j=1}^{n} \lambda_j \phi_i(x_j) + 1 \right) \right] \quad \text{as} \quad \sum_{i=1}^{2n+2} \gamma_i\, \sigma\!\left( \sum_{j=1}^{n} \lambda_j \phi_i(x_j) + \beta_i \right),$$
where $\phi_{2n+2}$ is $\phi_k$ for some $k \in \{1, \dots, 2n+1\}$ (and $\beta_i$ is either $-3$ or $1$ for each $i$). Thus we may rewrite (7.4) as
$$\left| f(x_1, \dots, x_n) - \sum_{i=1}^{2n+2} \gamma_i\, \sigma\!\left( \sum_{j=1}^{n} \lambda_j \phi_i(x_j) + \beta_i \right) - a_3 \sum_{i=1}^{2n+1} \sigma\!\left( \sum_{j=1}^{n} \lambda_j \phi_i(x_j) + m \right) \right| < \frac{\varepsilon}{2} \tag{7.5}$$
for all $(x_1, \dots, x_n) \in [0,1]^n$. For each $i \in \{1, \dots, 2n+1\}$ and $\delta > 0$ there exist constants $b_{i1}, b_{i2}, b_{i3}$ and integers $m_i$ such that
$$\left| \sum_{j=1}^{n} \lambda_j \phi_i(x_j) - \sum_{j=1}^{n} \lambda_j \left( b_{i1} \sigma(x_j - 3) + b_{i2} \sigma(x_j + 1) + b_{i3} \sigma(x_j + m_i) \right) \right| < \delta$$
for all $(x_1, \dots, x_n) \in [0,1]^n$. Again we use the fact that the $\sigma(x_j - 3)$ and $\sigma(x_j + 1)$ are linear polynomials on $[0,1]$ to rewrite the above as
$$\left| \sum_{j=1}^{n} \lambda_j \phi_i(x_j) - \sum_{j=1}^{2n+1} c_{ij}\, \sigma(\mathbf{w}^{ij} \cdot \mathbf{x} - \theta_{ij}) \right| < \delta \tag{7.6}$$
for all $(x_1, \dots, x_n) \in [0,1]^n$, for some constants $c_{ij}$ and $\theta_{ij}$ and vectors $\mathbf{w}^{ij}$ (in fact the $\mathbf{w}^{ij}$ are all unit vectors).
We now substitute (7.6) into (7.5). As $\sigma$ is uniformly continuous on every closed interval, we can choose $\delta > 0$ sufficiently small so that
$$\left| \sum_{i=1}^{2n+2} \gamma_i\, \sigma\!\left( \sum_{j=1}^{n} \lambda_j \phi_i(x_j) + \beta_i \right) + a_3 \sum_{i=1}^{2n+1} \sigma\!\left( \sum_{j=1}^{n} \lambda_j \phi_i(x_j) + m \right) - \sum_{i=1}^{2n+2} \gamma_i\, \sigma\!\left( \sum_{j=1}^{2n+1} c_{ij}\, \sigma(\mathbf{w}^{ij} \cdot \mathbf{x} - \theta_{ij}) + \beta_i \right) - a_3 \sum_{i=1}^{2n+1} \sigma\!\left( \sum_{j=1}^{2n+1} c_{ij}\, \sigma(\mathbf{w}^{ij} \cdot \mathbf{x} - \theta_{ij}) + m \right) \right| < \frac{\varepsilon}{2} \tag{7.7}$$
for all $(x_1, \dots, x_n) \in [0,1]^n$.
From (7.5), (7.7), renumbering and renaming, the theorem follows.

As a consequence of what was stated in the remark following the proof of Proposition 6.3, we can in fact prove Theorem 7.1 with a $\sigma$ which is analytic (and not only $C^\infty$), strictly increasing, and sigmoidal (see Maiorov and Pinkus (1999)). The difference is that we must then use $3n$ units in the first layer and $6n+3$ units in the second layer. The restriction of Theorem 7.1 to the unit cube is for convenience only. The same result holds over any compact subset of $\mathbb{R}^n$.

We have established only two facts in this section. We have shown that there exist localized functions, and that there is no theoretical lower bound on the degree of approximation common to all activation functions (contrary to the situation in the single hidden layer model). Nonetheless there seems to be reason to conjecture that the two hidden layer model may be significantly more promising than the single hidden layer model, at least from a purely approximation-theoretic point of view. This problem certainly warrants further study.

Acknowledgement

The author is indebted to Lee Jones, Moshe Leshno, Vitaly Maiorov, Yuly Makovoz, and Pencho Petrushev for reading various parts of this paper. All errors, omissions and other transgressions are the author's responsibility.
REFERENCES

R. A. Adams (1975), Sobolev Spaces, Academic Press, New York. F. Albertini, E. D. Sontag and V. Maillot (1993), 'Uniqueness of weights for neural networks', in Artificial Neural Networks for Speech and Vision (R. J. Mammone, ed.), Chapman and Hall, London, pp. 113-125. J.-G. Attali and G. Pages (1997), 'Approximations of functions by a multilayer perceptron: a new approach', Neural Networks 10, 1069-1081. A. R. Barron (1992), 'Neural net approximation', in Proc. Seventh Yale Workshop
on Adaptive and Learning Systems, 1992 (K. S. Narendra, ed.), Yale University, New Haven, pp. 69-72. A. R. Barron (1993), 'Universal approximation bounds for superpositions of a sigmoidal function', IEEE Trans. Inform. Theory 39, 930-945. A. R. Barron (1994), 'Approximation and estimation bounds for artificial neural networks', Machine Learning 14, 115-133. P. L. Bartlett, V. Maiorov and R. Meir (1998), 'Almost linear VC dimension bounds for piecewise polynomial networks', Neural Computation 10, 2159-2173. E. B. Baum (1988), 'On the capabilities of multilayer perceptrons', J. Complexity 4, 193-215. C. M. Bishop (1995), Neural Networks for Pattern Recognition, Oxford University Press, Oxford. E. K. Blum and L. K. Li (1991), 'Approximation theory and feedforward networks', Neural Networks 4, 511-515. M. D. Buhmann and A. Pinkus (1999), 'Identifying linear combinations of ridge functions', Adv. Appl. Math. 22, 103-118. R. M. Burton and H. G. Dehling (1998), 'Universal approximation in p-mean by neural networks', Neural Networks 11, 661-667. P. Cardaliaguet and G. Euvrard (1992), 'Approximation of a function and its derivatives with a neural network', Neural Networks 5, 207-220. S. M. Carroll and B. W. Dickinson (1989), 'Construction of neural nets using the Radon transform', in Proceedings of the IEEE 1989 International Joint Conference on Neural Networks, Vol. 1, IEEE, New York, pp. 607-611. T. Chen and H. Chen (1993), 'Approximations of continuous functionals by neural networks with application to dynamic systems', IEEE Trans. Neural Networks 4, 910-918. T. Chen and H. Chen (1995), 'Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems', IEEE Trans. Neural Networks 6, 911-917. T. Chen, H. Chen and R. Liu (1995), 'Approximation capability in CIW1) by multilayer feedforward networks and related problems', IEEE Trans. Neural Networks 6, 25-30. X. Chen and H. 
White (1999), 'Improved rates and asymptotic normality for nonparametric neural network estimators', preprint. C. H. Choi and J. Y. Choi (1994), 'Constructive neural networks with piecewise interpolation capabilities for function approximations', IEEE Trans. Neural Networks 5, 936-944. C. K. Chui and X. Li (1992), 'Approximation by ridge functions and neural networks with one hidden layer', J. Approx. Theory 70, 131-141. C. K. Chui and X. Li (1993), 'Realization of neural networks with one hidden layer', in Multivariate Approximations: From CAGD to Wavelets (K. Jetter and F. Utreras, eds), World Scientific, Singapore, pp. 77-89. C. K. Chui, X. Li and H. N. Mhaskar (1994), 'Neural networks for localized approximation', Math. Comp. 63, 607-623. C. K. Chui, X. Li and H. N. Mhaskar (1996), 'Limitations of the approximation capabilities of neural networks with one hidden layer', Adv. Comput. Math. 5, 233-243.
E. Corominas and F. Sunyer Balaguer (1954), 'Condiciones para que una foncion infinitamente derivable sea un polinomo', Rev. Mat. Hisp. Amer. 14, 26-43. N. E. Cotter (1990), 'The Stone-Weierstrass theorem and its application to neural networks', IEEE Trans. Neural Networks 1, 290-295. G. Cybenko (1989), 'Approximation by superpositions of a sigmoidal function', Math. Control, Signals, and Systems 2, 303-314. R. A. DeVore, R. Howard and C. Micchelli (1989), 'Optimal nonlinear approximation', Manuscripta Math. 63, 469-478. R. A. DeVore, K. I. Oskolkov and P. P. Petrushev (1997), 'Approximation by feedforward neural networks', Ann. Numer. Math. 4, 261-287. L. Devroye, L. Gyorfi and G. Lugosi (1996), A Probabilistic Theory of Pattern Recognition, Springer, New York. M. J. Donahue, L. Gurvits, C. Darken and E. Sontag (1997), 'Rates of convex approximation in non-Hilbert spaces', Const. Approx. 13, 187-220. W. F. Donoghue (1969), Distributions and Fourier Transforms, Academic Press, New York. T. Draelos and D. Hush (1996), 'A constructive neural network algorithm for function approximation', in Proceedings of the IEEE 1996 International Conference on Neural Networks, Vol. 1, IEEE, New York, pp. 50-55. R. E. Edwards (1965), Functional Analysis, Theory and Applications, Holt, Rinehart and Winston, New York. S. W. Ellacott (1994), 'Aspects of the numerical analysis of neural networks', in Vol. 3 of Ada Numerica, Cambridge University Press, pp. 145-202. S. W. Ellacott and D. Bos (1996), Neural Networks: Deterministic Methods of Analysis, International Thomson Computer Press, London. C. Fefferman (1994), 'Reconstructing a neural net from its output', Revista Mat. Iberoamer. 10, 507-555. R. A. Finan, A. T. Sapeluk and R. I. Damper (1996), 'Comparison of multilayer and radial basis function neural networks for text-dependent speaker recognition', in Proceedings of the IEEE 1996 International Conference on Neural Networks, Vol. 4, IEEE, New York, pp. 1992-1997. H. L. Frisch, C. 
Borzi, D. Ord, J. K. Percus and G. O. Williams (1989), 'Approximate representation of functions of several variables in terms of functions of one variable', Phys. Review Letters 63, 927-929. K. Funahashi (1989), 'On the approximate realization of continuous mappings by neural networks', Neural Networks 2, 183-192. A. R. Gallant and H. White (1988), 'There exists a neural network that does not make avoidable mistakes', in Proceedings of the IEEE 1988 International Conference on Neural Networks, Vol. 1, IEEE, New York, pp. 657-664. A. R. Gallant and H. White (1992), 'On learning the derivatives of an unknown mapping with multilayer feedforward networks', Neural Networks 5, 129-138. S. Geva and J. Sitte (1992), 'A constructive method for multivariate function approximation by multilayer perceptrons', IEEE Trans. Neural Networks 3, 621624. F. Girosi and T. Poggio (1989), 'Representation properties of networks: Kolmogorov's theorem is irrelevant', Neural Computation 1, 465-469.
F. Girosi and T. Poggio (1990), 'Networks and the best approximation property', Biol. Cybern. 63, 169-176. M. Gori, F. Scarselli and A. C. Tsoi (1996), 'Which classes of functions can a given multilayer perceptron approximate?', in Proceedings of the IEEE 1996 International Conference on Neural Networks, Vol. 4, IEEE, New York, pp. 22262231. S. Haykin (1994), Neural Networks, MacMillan, New York. R. Hecht-Nielsen (1987), 'Kolmogorov's mapping neural network existence theorem', in Proceedings of the IEEE 1987 International Conference on Neural Networks, Vol. 3, IEEE, New York, pp. 11-14. R. Hecht-Nielsen (1989), 'Theory of the backpropagation neural network', in Proceedings of the IEEE 1989 International Joint Conference on Neural Networks, Vol. 1, IEEE, New York, pp. 593-605. K. Hornik (1991), 'Approximation capabilities of multilayer feedforward networks', Neural Networks 4, 251-257. K. Hornik (1993), 'Some new results on neural network approximation', Neural Networks 6, 1069-1072. K. Hornik, M. Stinchcombe and H. White (1989), 'Multilayer feedforward networks are universal approximators', Neural Networks 2, 359-366. K. Hornik, M. Stinchcombe and H. White (1990), 'Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks', Neural Networks 3, 551-560. K. Hornik, M. Stinchcombe, H. White and P. Auer (1994), 'Degree of approximation results for feedforward networks approximating unknown mappings and their derivatives', Neural Computation 6, 1262-1275. G. B. Huang and H. A. Babri (1998), 'Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions', IEEE Trans. Neural Networks 9, 224-229. S. C. Huang and Y. F. Huang (1991), 'Bounds on the number of hidden neurons in multilayer perceptrons', IEEE Trans. Neural Networks 2, 47-55. B. Irie and S. 
Miyake (1988), 'Capability of three-layered perceptrons', in Proceedings of the IEEE 1988 International Conference on Neural Networks, Vol. 1, IEEE, New York, pp. 641-648. Y. Ito (1991a), 'Representation of functions by superpositions of a step or a sigmoid function and their applications to neural network theory', Neural Networks 4, 385-394. Y. Ito (19916), 'Approximation of functions on a compact set by finite sums of a sigmoid function without scaling', Neural Networks 4, 817-826. Y. Ito (1992), 'Approximation of continuous functions on Rd by linear combinations of shifted rotations of a sigmoid function with and without scaling', Neural Networks 5, 105-115. Y. Ito (1993), 'Approximations of differentiable functions and their derivatives on compact sets by neural networks', Math. Scient. 18, 11-19. Y. Ito (1994a), 'Approximation capabilities of layered neural networks with sigmoidal units on two layers', Neural Computation 6, 1233-1243. Y. Ito (19946), 'Differentiable approximation by means of the Radon transformation and its applications to neural networks', J. Comput. Appl. Math. 55, 31-50.
Y. Ito (1996), 'Nonlinearity creates linear independence', Adv. Comput. Math. 5, 189-203. Y. Ito and K. Saito (1996), 'Superposition of linearly independent functions and finite mappings by neural networks', Math. Scient. 21, 27-33. L. K. Jones (1990), 'Constructive approximations for neural networks by sigmoidal functions', Proc. IEEE 78, 1586-1589. Correction and addition, Proc. IEEE (1991) 79, 243. L. K. Jones (1992), 'A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training', Ann. Stat. 20, 608-613. L. K. Jones (1994), 'Good weights and hyperbolic kernels for neural networks, projection pursuit, and pattern classification: Fourier strategies for extracting information from high-dimensional data', IEEE Trans. Inform. Theory 40, 439-454. L. K. Jones (1997), 'The computational intractability of training sigmoidal neural networks', IEEE Trans. Inform. Theory 43, 167-173. L. K. Jones (1999), 'Local greedy approximation for nonlinear regression and neural network training', preprint. J. P. Kahane (1959), Lectures on Mean Periodic Functions, Tata Institute, Bombay. P. C. Kainen, V. Kurkova and A. Vogt (1999), 'Approximation by neural networks is not continuous', preprint. H. Katsuura and D. A. Sprecher (1994), 'Computational aspects of Kolmogorov's superposition theorem', Neural Networks 7, 455-461. V. Y. Kreinovich (1991), 'Arbitrary nonlinearity is sufficient to represent all functions by neural networks: a theorem', Neural Networks 4, 381-383. V. Kurkova (1991), 'Kolmogorov's theorem is relevant', Neural Computation 3, 617-622. V. Kurkova (1992), 'Kolmogorov's theorem and multilayer neural networks', Neural Networks 5, 501-506. V. Kurkova (1995a), 'Approximation of functions by perceptron networks with bounded number of hidden units', Neural Networks 8, 745-750. V. Kurkova (19956), 'Kolmogorov's theorem', in The Handbook of Brain Theory and Neural Networks, (M. 
Arbib, ed.), MIT Press, Cambridge, pp. 501-502. V. Kurkova (1996), 'Trade-off between the size of weights and the number of hidden units in feedforward networks', Neural Network World 2, 191-200. V. Kurkova and P. C. Kainen (1994), 'Functionally equivalent feedforward neural networks', Neural Computation 6, 543-558. V. Kurkova, P. C. Kainen and V. Kreinovich (1997), 'Estimates of the number of hidden units and variation with respect to half-spaces', Neural Networks 10, 1061-1068. A. Lapedes and R. Farber (1988), 'How neural nets work', in Neural Information Processing Systems (D. Z. Anderson, ed.), American Institute of Physics, New York, pp. 442-456. M. Leshno, V. Ya. Lin, A. Pinkus and S. Schocken (1993), 'Multilayer feedforward networks with a non-polynomial activation function can approximate any function', Neural Networks 6, 861-867.
X. Li (1996), 'Simultaneous approximations of multivariate functions and their derivatives by neural networks with one hidden layer', Neurocomputing 12, 327-343. W. A. Light (1993), 'Ridge functions, sigmoidal functions and neural networks', in Approximation Theory VII (E. W. Cheney, C. K. Chui and L. L. Schumaker, eds), Academic Press, New York, pp. 163-206. J. N. Lin and R. Unbehauen (1993), 'On realization of a Kolmogorov network', Neural Computation 5, 18-20. V. Ya. Lin and A. Pinkus (1993), 'Fundamentality of ridge functions', J. Approx. Theory 75, 295-311. V. Ya. Lin and A. Pinkus (1994), 'Approximation of multivariate functions', in Advances in Computational Mathematics: New Delhi, India, (H. P. Dikshit and C. A. Micchelli, eds), World Scientific, Singapore, pp. 257-265. R. P. Lippman (1987), 'An introduction to computing with neural nets', IEEE Magazine 4, 4-22. G. G. Lorentz, M. von Golitschek and Y. Makovoz (1996), Constructive Approximation: Advanced Problems, Vol. 304 of Grundlehren, Springer, Berlin. V. E. Maiorov (1999), 'On best approximation by ridge functions', to appear in J. Approx. Theory V. E. Maiorov and R. Meir (1999), 'On the near optimality of the stochastic approximation of smooth functions by neural networks', to appear in Adv. Comput. Math. V. Maiorov, R. Meir and J. Ratsaby (1999), 'On the approximation of functional classes equipped with a uniform measure using ridge functions', to appear in J. Approx. Theory. V. Maiorov and A. Pinkus (1999), 'Lower bounds for approximation by MLP neural networks', Neurocomputing 25, 81-91. Y. Makovoz (1996), 'Random approximants and neural networks', J. Approx. Theory 85, 98-109. Y. Makovoz (1998), 'Uniform approximation by neural networks', J. Approx. Theory 95, 215-228. M. Meltser, M. Shoham and L. M. Manevitz (1996), 'Approximating functions by neural networks: a constructive solution in the uniform norm', Neural Networks 9, 965-978. H. N. 
Mhaskar (1993), 'Approximation properties of a multilayered feedforward artificial neural network', Adv. Comput. Math. 1, 61-80. H. N. Mhaskar (1994), 'Approximation of real functions using neural networks', in Advances in Computational Mathematics: New Delhi, India, (H. P. Dikshit and C. A. Micchelli, eds), World Scientific, Singapore, pp. 267-278. H. N. Mhaskar (1996), 'Neural networks for optimal approximation of smooth and analytic functions', Neural Computation 8, 164-177. H. N. Mhaskar and N. Hahm (1997), 'Neural networks for functional approximation and system identification', Neural Computation 9, 143-159. H. N. Mhaskar and C. A. Micchelli (1992), 'Approximation by superposition of a sigmoidal function and radial basis functions', Adv. Appl. Math. 13, 350-373. H. N. Mhaskar and C. A. Micchelli (1993), 'How to choose an activation function',
in Vol. 6 of Neural Information Processing Systems (J. D. Cowan, G. Tesauro and J. Alspector, eds), Morgan Kaufman, San Francisco, pp. 319-326. H. N. Mhaskar and C. A. Micchelli (1994), 'Dimension-independent bounds on the degree of approximation by neural networks', IBM J. Research Development 38, 277-284. H. N. Mhaskar and C. A. Micchelli (1995), 'Degree of approximation by neural and translation networks with a single hidden layer', Adv. Appl. Math. 16, 151-183. H. N. Mhaskar and J. Prestin (1999), 'On a choice of sampling nodes for optimal approximation of smooth functions by generalized translation networks', to appear in Proceedings of International Conference on Artificial Neural Networks, Cambridge, England. M. Nees (1994), 'Approximative versions of Kolmogorov's superposition theorem, proved constructively', J. Comput. Appl. Anal. 54, 239-250. M. Nees (1996), 'Chebyshev approximation by discrete superposition: Application to neural networks', Adv. Comput. Math. 5, 137-151. K. I. Oskolkov (1997), 'Ridge approximation, Chebyshev-Fourier analysis and optimal quadrature formulas', Tr. Mat. Inst. Steklova 219 Teor. Priblizh. Garmon. Anal., 269-285. P. P. Petrushev (1998), 'Approximation by ridge functions and neural networks', SIAM J. Math. Anal. 30, 155-189. A. Pinkus (1995), 'Some density problems in multivariate approximation', in Approximation Theory: Proceedings of the International Dortmund Meeting IDoMAT 95, (M. W. Miiller, M. Felten and D. H. Mache, eds), Akademie Verlag, Berlin, pp. 277-284. A. Pinkus (1996), 'TDI-Subspaces of C(Rd) and some density problems from neural networks', J. Approx. Theory 85, 269-287. A. Pinkus (1997), 'Approximating by ridge functions', in Surface Fitting and Multiresolution Methods, (A. Le Mehaute, C. Rabut and L. L. Schumaker, eds), Vanderbilt University Press, Nashville, pp. 279-292. G. Pisier (1981), 'Remarques sur un resultat non publie de B. 
Maurey', in Seminaire DAnalyse Fonctionnelle, 1980-1981, Ecole Poly technique, Centre de Mathematiques, Palaiseau, France. B. D. Ripley (1994), 'Neural networks and related methods for classification', J. Royal Statist. Soc, B 56, 409-456. B. D. Ripley (1996), Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge. H. L. Royden (1963), Real Analysis, MacMillan, New York. W. S. Sarle (1998), editor of Neural Network, FAQ, parts 1 to 7, Usenet newsgroup comp.ai.neural-nets, ftp://ftp.sas.com/pub/neural/FAQ.html M.A. Sartori and P. J. Antsaklis (1991), 'A simple method to derive bounds on the size and to train multilayer neural networks', IEEE Trans. Neural Networks 2, 467-471. F. Scarselli and A. C. Tsoi (1998), 'Universal approximation using feedforward neural networks: a survey of some existing methods, and some new results', Neural Networks 11, 15-37.
L. Schwartz (1944), 'Sur certaines families non fondamentales de fonctions continues', Bull. Soc. Math. France 72, 141-145. L. Schwartz (1947), 'Theorie generate des fonctions moyenne-periodiques', Ann. Math. 48, 857-928. K. Y. Siu, V. P. Roychowdhury and T. Kailath (1994), 'Rational approximation techniques for analysis of neural networks', IEEE Trans. Inform. Theory 40, 455-46. E. D. Sontag (1992), 'Feedforward nets for interpolation and classification', J. Cornput. System Sci. 45, 20-48. D. A. Sprecher (1993), 'A universal mapping for Kolmogorov's superposition theorem', Neural Networks 6, 1089-1094. D. A. Sprecher (1997), 'A numerical implementation of Kolmogorov's superpositions IF, Neural Networks 10, 447-457. M. Stinchcombe (1995), 'Precision and approximate flatness in artificial neural networks', Neural Computation 7, 1021-1039. M. Stinchcombe and H. White (1989), 'Universal approximation using feedforward networks with non-sigmoid hidden layer activation functions', in Proceedings of the IEEE 1989 International Joint Conference on Neural Networks, Vol. 1, IEEE, New York, pp. 613-618. M. Stinchcombe and H. White (1990), 'Approximating and learning unknown mappings using multilayer feedforward networks with bounded weights', in Proceedings of the IEEE 1990 International Joint Conference on Neural Networks, Vol. 3, IEEE, New York, pp. 7-16. B. G. Sumpter, C. Getino and D. W. Noid (1994), 'Theory and applications of neural computing in chemical science', Annual Rev. Phys. Chem. 45, 439481. H. J. Sussmann (1992), 'Uniqueness of the weights for minimal feedforward nets with a given input-output map', Neural Networks 5, 589-593. Y. Takahashi (1993), 'Generalization and approximation capabilities of multilayer networks', Neural Computation 5, 132-139. J. de Villiers and E. Barnard (1992), 'Backpropagation neural nets with one and two hidden layers', IEEE Trans. Neural Networks 4, 136-141. B. A. Vostrecov and M. A. 
Kreines (1961), 'Approximation of continuous functions by superpositions of plane waves', Dokl. Akad. Nauk SSSR 140, 1237-1240 = Soviet Math. Dokl. 2, 1326-1329. Z. Wang, M. T. Tham and A. J. Morris (1992), 'Multilayer feedforward neural networks: a canonical form approximation of nonlinearity', Internat. J. Control 56, 655-672. S. Watanabe (1996), 'Solvable models of layered neural networks based on their differential structure', Adv. Comput. Math. 5, 205-231. R. C. Williamson and U. Helmke (1995), 'Existence and uniqueness results for neural network approximations', IEEE Trans. Neural Networks 6, 2-13. J. Wray and G. G. Green (1995), 'Neural networks, approximation theory and finite precision computation', Neural Networks 8, 31-37. Y. Xu, W. A. Light and E. W. Cheney (1993), 'Constructive methods of approximation by ridge functions and radial functions', Numerical Alg. 4, 205-223.
J. E. Yukich, M. B. Stinchcombe and H. White (1995), 'Sup-norm approximation bounds for networks through probabilistic methods', IEEE Trans. Inform. Theory 41, 1021-1027.
Acta Numerica (1999), pp. 197-246
© Cambridge University Press, 1999
An introduction to numerical methods for stochastic differential equations

Eckhard Platen
School of Mathematical Sciences and School of Finance and Economics, University of Technology, Sydney, PO Box 123, Broadway, NSW 2007, Australia

This paper aims to give an overview and summary of numerical methods for the solution of stochastic differential equations. It covers discrete time strong and weak approximation methods that are suitable for different applications. A range of approaches and results is discussed within a unified framework. On the one hand, these methods can be interpreted as generalizing the well-developed theory on numerical analysis for deterministic ordinary differential equations. On the other hand, they highlight the specific stochastic nature of the equations. In some cases these methods lead to completely new and challenging problems.
CONTENTS
1 Introduction 197
2 Stochastic differential equations 199
3 Euler approximation 201
4 Strong and weak convergence 202
5 Stochastic Taylor expansions 204
6 Strong approximation methods 205
7 Weak approximation methods 213
8 Further developments and conclusions 226
References 228
1. Introduction

About three hundred years ago, Newton and Leibniz developed the differential calculus, allowing us to model continuous time dynamical systems in mechanics, astronomy and many other areas of science. This calculus has formed the basis of the revolutionary developments in science, technology and manufacturing that the world has experienced over the last two centuries.
E. PLATEN
As we try to build more realistic models, stochastic effects need to be taken into account. In areas such as finance, the randomness in the system dynamics is in fact the essential phenomenon to be modelled. Continuous time stochastic dynamics need to be modelled in many areas of application, including microelectronics, signal processing and filtering, several fields of biology and physics, population dynamics, epidemiology, psychology, economics, finance, insurance, fluid dynamics, radio astronomy, hydrology, structural mechanics, chemistry and medicine. Practical problems arising in some of these areas in the mid-1900s led to the development of a corresponding stochastic calculus. Almost a hundred years ago, Bachelier (1900) used what we now call Brownian motion or the Wiener process to model stock prices in the Paris Bourse. Later Einstein (1906), in his work on Brownian motion, used an equivalent mathematical construct. Wiener (1923) then developed more fully the mathematical theory of Brownian motion. A further advance was made by Ito (1944), who laid the foundation of a stochastic calculus known today as the Ito calculus. This represents the stochastic generalization of the classical differential calculus, allowing us to model in continuous time such phenomena as the dynamics of stock prices or the motions of a microscopic particle subject to random fluctuations. The corresponding stochastic differential equations (SDEs) generalize the ordinary deterministic differential equations (ODEs). A most striking example, where Ito SDEs provide the essential modelling device, is given by modern financial theory. The Nobel prize-winning work of Merton (1973) and Black and Scholes (1973) initiated the entire derivatives and risk management industry that we see today. 
For the development of the corresponding financial markets, it is vital to improve our understanding of their underlying stochastic dynamics, and to calculate efficiently relevant financial quantities such as derivative prices and risk measures. After the earlier technological revolution in manufacturing, it is the author's view that we are likely to experience now and into the next century a revolution in commercial technologies. The finance area is the most notable example where these changes have occurred. In the insurance area a similar development has already started. Marketing can be expected to base its future models on SDEs. We are at the beginning of a development where commercial and economic activities will become subject to detailed stochastic modelling and quantitative analysis. This global phenomenon will be a major driving force in the development of appropriate numerical methods for the solution of SDEs. This paper provides a very basic introduction, as well as a brief overview of the area of numerical methods for SDEs. The rapidly increasing literature on the topic makes it impossible to give a comprehensive survey. However, an attempt has been made to highlight key approaches, and note results
that have been instrumental in the development of the field, or may be of major significance in future research. Books on numerical solutions of SDEs that provide systematic information on the subject include Gard (1988), Milstein (1988a, 1995a), Kloeden and Platen (1992/1995b), Bouleau and Lepingle (1993), Janicki and Weron (1994) and Kloeden, Platen and Schurz (1994/1997). Given the diversity of numerical problems that arise in SDEs, there is a strong need to extend the wealth of expertise accumulated in the numerical analysis of ODEs to the field of SDEs. Important monographs on the numerical analysis of ODEs that have had an impact on the numerical analysis of SDEs include those by Gear (1971), Björck and Dahlquist (1974), Butcher (1987), Hairer, Nørsett and Wanner (1987), Hairer and Wanner (1991) and Stoer and Bulirsch (1993). As this paper will show, a multi-faceted variety of research topics on numerical methods for SDEs has emerged over the last twenty years. These topics can be linked to complexity theory: see, for instance, Traub, Wasilkowski and Woźniakowski (1988), Woźniakowski (1991) and Sloan and Woźniakowski (1998), where it was shown that simulation approaches, including those of stochastic numerical analysis, are optimal with respect to average case complexity.
2. Stochastic differential equations

Let us consider an Itô SDE of the form
$$dX_t = a(X_t)\,dt + b(X_t)\,dW_t \tag{2.1}$$
for $t \in [0,T]$, with initial value $X_0 \in \mathbb{R}$. The stochastic process $X = \{X_t,\ 0 \le t \le T\}$ is assumed to be a unique solution of the SDE (2.1), which consists of a slowly varying component governed by the drift coefficient $a(\cdot)$ and a rapidly fluctuating random component characterized by the diffusion coefficient $b(\cdot)$. The second differential in (2.1) is an Itô stochastic differential with respect to the Wiener process $W = \{W_t,\ 0 \le t \le T\}$. It is defined via the corresponding stochastic integral by using a limit of Riemann sums with values of the integrands taken at the left-hand ends of the discretization intervals. Another stochastic differential, the Stratonovich stochastic differential, would result if the values of the integrands were taken at the centre of the interval. As introductory textbooks on SDEs, one can refer to Arnold (1974), Gard (1988), Øksendal (1985) and Kloeden et al. (1994/1997). More advanced material on SDEs is presented, for instance, in Elliott (1982), Karatzas and Shreve (1988), Ikeda and Watanabe (1989), Protter (1990) and Kloeden and Platen (1992/1995b).
To keep our formulae simple in this introductory exposition, we discuss mainly the simple case of a one-dimensional SDE driven by a one-dimensional standard Wiener process. In principle, most of the numerical methods we mention can be generalized to multi-dimensional $X$ and $W$. Since the path of a Wiener process is not differentiable, the Itô calculus differs in its properties from classical calculus. This is most obvious in the stochastic chain rule, the Itô formula, which for a twice continuously differentiable function $f$ has the form
$$df(X_t) = \Big(f'(X_t)\,a(X_t) + \tfrac{1}{2}\,f''(X_t)\,b^2(X_t)\Big)\,dt + f'(X_t)\,b(X_t)\,dW_t \tag{2.2}$$
for $0 \le t \le T$. We remark that the extra term $\tfrac{1}{2} f'' b^2$ in the drift function of the resulting SDE (2.2) is characteristic of the Itô calculus, and consequently it also has a substantial impact on numerical methods for SDEs, as will be seen later. The Stratonovich calculus follows the rules of classical calculus more closely. However, it does not conveniently relate to martingale theory, a fundamental part of stochastic analysis. The Itô and the Stratonovich stochastic calculus can be related to each other, and one can switch from one to the other if necessary. The above stochastic process $X$ in (2.1) can be written as the solution of the Stratonovich SDE in the form
$$dX_t = \underline{a}(X_t)\,dt + b(X_t)\circ dW_t, \tag{2.3}$$
where, assuming $b'$ exists, we have the Stratonovich drift function
$$\underline{a}(x) = a(x) - \tfrac{1}{2}\,b(x)\,b'(x), \tag{2.4}$$
with the notation '$\circ$' in (2.3) referring to the Stratonovich stochastic differential. This differential also arises as the limit of classical differentials when the path of the Wiener process is smoothed, as in the Wong-Zakai approximation. Such approximations of SDEs are studied in Wong and Zakai (1965), Kurtz and Protter (1991b) and Saito and Mitsui (1995), for instance. The strong similarity of the Stratonovich calculus with the classical calculus is made clear by the Stratonovich chain rule, which for a differentiable function $f$ has the form
$$df(X_t) = f'(X_t)\big(\underline{a}(X_t)\,dt + b(X_t)\circ dW_t\big) = f'(X_t)\circ dX_t, \tag{2.5}$$
with '$\circ$' again denoting the Stratonovich differential. We note that in (2.5) only first-order derivatives of $f$ appear, as in deterministic calculus. It turns out that for some numerical tasks the Itô formulation of an SDE is more convenient, and for others the Stratonovich formulation, as we shall see later. Usually only the Itô calculus allows us to exploit powerful martingale results for numerical analysis.
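As a small numerical illustration of the conversion (2.4), the following sketch (with illustrative helper names and parameter values not taken from the paper) computes the Stratonovich drift for geometric Brownian motion, where $a(x) = \mu x$, $b(x) = \sigma x$ and hence $\underline{a}(x) = (\mu - \tfrac12\sigma^2)x$.

```python
# Sketch: converting the Ito drift of an SDE to its Stratonovich drift via
# (2.4), a_strat(x) = a(x) - (1/2) b(x) b'(x).  Names are illustrative.

def stratonovich_drift(a, b, b_prime):
    """Return the Stratonovich drift function for Ito coefficients a, b."""
    return lambda x: a(x) - 0.5 * b(x) * b_prime(x)

# Geometric Brownian motion: a(x) = mu*x, b(x) = sigma*x, b'(x) = sigma.
mu, sigma = 0.05, 0.2
a_strat = stratonovich_drift(lambda x: mu * x,
                             lambda x: sigma * x,
                             lambda x: sigma)

# The correction is -(sigma^2/2) x, so a_strat(x) = (mu - sigma^2/2) x.
print(a_strat(1.0))  # 0.05 - 0.02 = 0.03
```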
3. Euler approximation

Since analytical solutions of SDEs are rare, numerical approximations have been developed. These are typically based on a time discretization with points
$$0 = \tau_0 < \tau_1 < \cdots < \tau_n < \cdots < \tau_N = T$$
in the time interval $[0,T]$, using a step-size $\Delta = T/N$. More general time discretizations can be used, which could, for instance, be random. Simulation experiments and theoretical studies have shown that not all classical or heuristic discrete time approximations of SDEs converge in a useful sense to the corresponding solution process as the step-size $\Delta$ tends to zero: see, for instance, Clements and Anderson (1973), Wright (1974), Fahrmeier (1976), Clark and Cameron (1980) and Rümelin (1982). Consequently a systematic approach is needed in order to select an efficient and reliable numerical method for the problem at hand. Several different approaches have been proposed in the literature to handle SDEs numerically. Without relying much on the specific structure of SDEs, Kohler and Boyce (1974), Boyle (1977) and Boyce (1978) have suggested general Monte Carlo simulation of the given random system. Kushner (1974) and Kushner and Dupuis (1992) proposed discrete, finite state Markov chains as approximating processes for solutions of SDEs. Platen (1992) developed higher-order Markov chain approximations. When digital computers were still in their infancy, Dashevski and Liptser (1966) and Fahrmeier (1976) also used analogue computers to handle SDEs numerically. Both in the literature and in practice, most attention has been directed to discrete time approximations of SDEs. The Euler approximation, first studied in Maruyama (1955), is the simplest example of such a method, and is ideally suited for implementation on a digital computer. For the SDE (2.1) the Euler approximation $Y$ is given by the recursive equation
$$Y_{n+1} = Y_n + a(Y_n)\,\Delta + b(Y_n)\,\Delta W_n \tag{3.1}$$
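The recursion (3.1) can be sketched in a few lines of code; the coefficient functions, parameter values and random seed below are illustrative, not taken from the paper.

```python
import math
import random

# A minimal Euler (Euler-Maruyama) sketch for the scalar SDE (2.1),
# dX = a(X) dt + b(X) dW, on [0, T] with N equal steps.

def euler_maruyama(a, b, x0, T, N, rng):
    delta = T / N
    y = x0
    path = [y]
    for _ in range(N):
        dW = rng.gauss(0.0, math.sqrt(delta))  # increment ~ N(0, delta)
        y = y + a(y) * delta + b(y) * dW       # recursion (3.1)
        path.append(y)
    return path

rng = random.Random(42)
# Illustrative example with additive noise: a(x) = -x, b(x) = 0.5.
path = euler_maruyama(lambda x: -x, lambda x: 0.5, 1.0, 1.0, 100, rng)
print(len(path), path[-1])
```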
for $n = 0, 1, \ldots, N-1$ with $Y_0 = X_0$. Here $\Delta W_n = W_{\tau_{n+1}} - W_{\tau_n}$ denotes the increment of the Wiener process on the time interval $[\tau_n, \tau_{n+1}]$; these increments are independent $N(0, \Delta)$ Gaussian random variables with mean zero and variance $\Delta$. It has been shown in the literature that the Euler approximation converges, as $\Delta \to 0$ and under rather different types of convergence, to the solution $X$ of the Itô SDE (2.1). Some of the papers in which the Euler method has been studied are Allain (1974), Yamada (1976), Gikhman and Skorokhod (1979), Clark and Cameron (1980), Ikeda and Watanabe (1989), Janssen (1984a, 1984b), Atalla (1986), Jacod and Shiryaev (1987), Kaneko and Nakao (1988), Kanagawa (1988, 1989, 1995, 1996, 1997), Golec and Ladde (1989), Mikulevicius and Platen (1991), Mackevicius (1994), Cambanis and Hu (1996), Gelbrich (1995), Bally and Talay (1995, 1996a, 1996b), Jacod and Protter (1998), Kohatsu-Higa and Ogawa (1997) and Chan and Stramer (1998). This is certainly not a complete list of references on the Euler method. It is always an interesting task to study a new technique based on this simple discrete time approximation of an SDE. For instance, Gorostiza (1980) and Newton (1990) suggested Euler-type approximations with random step-size, where the approximate path jumps from threshold to threshold.

To simulate a realization of the Euler approximation, one needs to generate the independent random variables involved. In practice, linear or nonlinear congruential pseudo-random number generators are often used. An introduction to this area is given by Ripley (1983a). Books that include chapters on random number generation include Ermakov (1975), Yakowitz (1977), Rubinstein (1981), Ripley (1983b), Morgan (1984), Ross (1991), Mikhailov (1992), Fishman (1992) and Gentle (1998). We mention also the papers by Box and Muller (1958), Marsaglia and Bray (1964), Brent (1974), Eichenauer and Lehn (1986), Niederreiter (1988), Sugita (1995) and Antipov (1995, 1996). Random number generation on supercomputers was considered by Petersen (1988), Anderson (1990), Petersen (1994a) and Entacher, Uhl and Wegenkittl (1998). As in the deterministic case, it turns out that the Euler method is rather simple and crude, somewhat inefficient, and often exhibits poor stability properties. Much better stochastic numerical methods can be constructed systematically.

4. Strong and weak convergence

It is convenient to have some measure of the efficiency of a numerical scheme by identifying its order of convergence. Unlike in the typical deterministic modelling situation, in the stochastic environment there exist many different types of convergence that make theoretical or practical sense.
Therefore in stochastic numerical analysis, one has to specify the class of problems that one wishes to investigate, before starting to construct a numerical method and seeking to optimize its efficiency with respect to one or another convergence criterion. In stochastic numerical analysis, the order of convergence plays a crucial role in the design of numerical algorithms. However, as already explained, the choice of the convergence criterion depends on the type of the problem. Roughly speaking, there are two major types of convergence to be distinguished. These can be identified by whether one requires (a) approximations to the sample paths, or (b) approximations to the corresponding distributions. For convenience we choose a rather simple characterization of each of these
two types of convergence for the classification of numerical algorithms, and call them the strong and the weak convergence criterion, respectively. Tasks involving direct simulations of paths, such as the generation of a stock market price scenario, the computation of a filter estimate for some hidden unobserved variable, or the testing of a statistical estimator for parameters in some SDE, require that the simulated sample paths be close to those of the solution of the original SDE. This implies that in these cases, among others, some strong convergence criterion should be used. The following simple criterion allows us to classify numerical methods according to their strong order $\gamma$ of convergence, using the absolute error $E|X_T - Y_N|$ at the terminal time $T$. We shall say that a discrete time approximation $Y$ of the exact solution $X$ of an SDE converges in the strong sense with order $\gamma \in (0, \infty]$ if there exists a constant $K < \infty$ such that
$$E|X_T - Y_N| \le K\,\Delta^{\gamma} \tag{4.1}$$
for all step-sizes $\Delta \in (0,1)$. In the deterministic case with vanishing diffusion coefficient $b = 0$, this criterion reduces to the usual deterministic convergence order criterion, as used, for instance, in Gear (1971) or Butcher (1987). Fortunately, in a large variety of practical problems a pathwise approximation of the solution $X$ of an SDE is not required. Much computational effort has been wasted on simulations by missing this point. If one aims to compute, for instance, a moment of $X$, a probability related to $X$, the price of an option on a stock price $X$, or a general functional of the form $E(g(X_T))$, then no strong approximation is required. The simulation of such functionals does not force us to approximate the path of $X$. Rather, it is sufficient to approximate adequately the probability distribution that corresponds to $X$. Consequently we need only a much weaker type of convergence than that expressed by the strong convergence criterion (4.1). We shall say that a discrete time approximation $Y$ of a solution $X$ of an SDE converges in the weak sense with order $\beta \in (0, \infty]$ if, for any polynomial $g$, there exists a constant $K_g < \infty$ such that
$$|E(g(X_T)) - E(g(Y_N))| \le K_g\,\Delta^{\beta} \tag{4.2}$$
for all step-sizes $\Delta \in (0,1)$, provided that these functionals exist. Clearly this criterion covers the convergence of $p$th moments, because we can set $g(x) = x^p$. It reduces to the deterministic order criterion in the case $b = 0$ and $g(x) = x$. As we shall see later, the numerical methods that can be constructed with respect to this weak convergence criterion are much easier to implement than those required by the strong convergence criterion. In any practical simulation, one should try, if possible, to identify directly whether the task at hand is one that requires only a weak approximation method.
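The two criteria can be estimated side by side by Monte Carlo simulation. The sketch below applies the Euler scheme to geometric Brownian motion, whose exact solution is known, and estimates both the strong error $E|X_T - Y_N|$ from (4.1) and the weak error (4.2) for $g(x) = x$, using $E(X_T) = X_0 e^{rT}$. Parameter values, sample size and seed are illustrative.

```python
import math
import random

# Sketch contrasting the strong criterion (4.1) with the weak criterion
# (4.2) for the Euler scheme applied to dX = r X dt + sigma X dW, whose
# exact solution X_T = X_0 exp((r - sigma^2/2) T + sigma W_T) is known.

r, sigma, x0, T, N = 0.05, 0.2, 1.0, 1.0, 50
delta = T / N
rng = random.Random(0)

paths = 10000
strong_sum = 0.0
euler_mean = 0.0
for _ in range(paths):
    y, w = x0, 0.0
    for _ in range(N):
        dW = rng.gauss(0.0, math.sqrt(delta))
        y += r * y * delta + sigma * y * dW        # Euler step (3.1)
        w += dW                                    # exact W_T on same path
    x_exact = x0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * w)
    strong_sum += abs(x_exact - y)                 # pathwise distance
    euler_mean += y

strong_err = strong_sum / paths                    # estimate of (4.1)
weak_err = abs(x0 * math.exp(r * T) - euler_mean / paths)  # (4.2), g(x) = x
print(strong_err, weak_err)
```

Note that the weak-error estimate is itself polluted by Monte Carlo noise of order $1/\sqrt{\text{paths}}$, so this sketch only illustrates the two definitions, not a precise order measurement.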
5. Stochastic Taylor expansions

The key to the construction of most higher-order numerical approximations is a truncated expansion of the variables of interest over small time increments. The well-known Taylor formula provides the basis for the derivation of most deterministic numerical algorithms. In the stochastic case, a stochastic Taylor expansion for Itô SDEs was first described in Wagner and Platen (1978). This result was then extended and generalized in Platen (1981, 1982b), Platen and Wagner (1982), Azencott (1982), Sussmann (1988), Yen (1988), Ben Arous (1989), Kloeden and Platen (1991a, 1991b), Hu (1992, 1996), Hu and Watanabe (1996), Kohatsu-Higa (1997), Liu and Li (1997) and Kuznetsov (1998). The Wagner-Platen formula is obtained by iterated applications of the Itô formula to the integrands in the integral version of the SDE (2.1). For example, in a simple case, we obtain the expansion
$$X_t = X_{t_0} + a(X_{t_0})\int_{t_0}^{t} ds + b(X_{t_0})\int_{t_0}^{t} dW_s + b(X_{t_0})\,b'(X_{t_0})\int_{t_0}^{t}\int_{t_0}^{s_2} dW_{s_1}\,dW_{s_2} + R_{t_0,t}, \tag{5.1}$$
where $R_{t_0,t}$ represents some remainder term consisting of higher-order multiple stochastic integrals. Multiple Itô integrals of the type
$$I_{(1)} = \int_{\tau_n}^{\tau_{n+1}} dW_s, \qquad I_{(0,1)} = \int_{\tau_n}^{\tau_{n+1}}\int_{\tau_n}^{s_2} ds_1\,dW_{s_2}, \qquad I_{(1,0)} = \int_{\tau_n}^{\tau_{n+1}}\int_{\tau_n}^{s_2} dW_{s_1}\,ds_2,$$
$$I_{(1,1)} = \int_{\tau_n}^{\tau_{n+1}}\int_{\tau_n}^{s_2} dW_{s_1}\,dW_{s_2} = \tfrac{1}{2}\big((I_{(1)})^2 - \Delta\big), \qquad I_{(1,1,1)} = \int_{\tau_n}^{\tau_{n+1}}\int_{\tau_n}^{s_3}\int_{\tau_n}^{s_2} dW_{s_1}\,dW_{s_2}\,dW_{s_3} \tag{5.2}$$
on the interval $[\tau_n, \tau_{n+1}]$ form the random building blocks in the Wagner-Platen expansions. For comparison, an application of the Stratonovich-Taylor formula, developed in Kloeden and Platen (1991a) and (1991b), to the integral version of the Stratonovich SDE (2.3) yields the expansion
$$X_t = X_{t_0} + \underline{a}(X_{t_0})\int_{t_0}^{t} ds + b(X_{t_0})\int_{t_0}^{t} \circ\,dW_s + b(X_{t_0})\,b'(X_{t_0})\int_{t_0}^{t}\int_{t_0}^{s_2} \circ\,dW_{s_1}\circ dW_{s_2} + R_{t_0,t}, \tag{5.3}$$
where $R_{t_0,t}$ is some remainder term with higher-order multiple Stratonovich integrals. In this case multiple Stratonovich integrals of the form
$$J_{(1)} = I_{(1)}, \qquad J_{(1,1)} = \int_{\tau_n}^{\tau_{n+1}}\int_{\tau_n}^{s_2} \circ\,dW_{s_1}\circ dW_{s_2} = \tfrac{1}{2}\big(J_{(1)}\big)^2,$$
$$J_{(1,1,1)} = \int_{\tau_n}^{\tau_{n+1}}\int_{\tau_n}^{s_3}\int_{\tau_n}^{s_2} \circ\,dW_{s_1}\circ dW_{s_2}\circ dW_{s_3} = \tfrac{1}{3!}\big(J_{(1)}\big)^3 \tag{5.4}$$
on the interval $[\tau_n, \tau_{n+1}]$ represent the basic random elements of the expansion. Close relationships exist between multiple Itô and Stratonovich integrals, which form some kind of algebra. This algebra and certain approximations of multiple stochastic integrals have been described in Platen and Wagner (1982), Liske (1982), Platen (1984), Milstein (1988a, 1995a), Kloeden and Platen (1991a, 1991b, 1992/1995b), Kloeden, Platen and Wright (1992c), Hu and Meyer (1993), Hofmann (1994), Gaines and Lyons (1994), Gaines (1994, 1995a), Castell and Gaines (1995), Li and Liu (1997), Burrage (1998) and Kuznetsov (1998).
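The closed form for $I_{(1,1)}$ in (5.2) can be checked numerically: approximating the double Itô integral by a left-point Riemann sum over a fine subdivision of one Wiener path reproduces $\tfrac12((I_{(1)})^2 - \Delta)$ up to the discretization of $\sum (\delta W_k)^2 \approx \Delta$. The interval length, subdivision and seed below are illustrative.

```python
import math
import random

# Sketch verifying the identity I_(1,1) = ((I_(1))^2 - Delta)/2 from (5.2):
# the double Ito integral over [0, Delta] is approximated by a left-point
# Riemann sum (Ito convention) over a fine subdivision of the Wiener path.

rng = random.Random(1)
delta = 1.0
M = 10000                       # number of fine substeps (illustrative)
h = delta / M

w = 0.0
double_int = 0.0
for _ in range(M):
    dw = rng.gauss(0.0, math.sqrt(h))
    double_int += w * dw        # integrand W_s evaluated at the left point
    w += dw

identity = 0.5 * (w * w - delta)    # closed form, with I_(1) = W_Delta
print(double_int, identity)
```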
6. Strong approximation methods

In this section, we focus on strong discrete time approximations of SDEs. These are suitable for scenario simulations. They are usually more expensive to implement, both in development and computing time, than their weak counterparts.
6.1. Strong Taylor approximation
If, from the Wagner-Platen formula (5.1), we select only the first two integral terms, then we obtain the Euler approximation (3.1). It has been shown (see, for instance, Milstein (1974) or Platen (1981)) that in general the Euler approximation has strong order of only $\gamma = 0.5$, as a consequence of the Hölder continuity of order 0.5 of the paths of $X$. Taking one more term in the expansion (5.1), we obtain the well-known Milstein scheme
$$Y_{n+1} = Y_n + a(Y_n)\,\Delta + b(Y_n)\,\Delta W_n + b(Y_n)\,b'(Y_n)\,I_{(1,1)}, \tag{6.1}$$
proposed by Milstein (1974), where the double Itô integral $I_{(1,1)}$ is given in (5.2) as $((\Delta W_n)^2 - \Delta)/2$. In general this scheme has strong order $\gamma = 1.0$. Thus adding one more term from the Wagner-Platen formula to the Euler scheme already provides an improvement in efficiency. The Milstein scheme can be obtained alternatively from the Stratonovich-Taylor formula (5.3) by selecting the first three integral terms of that expansion.
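A single Milstein step (6.1) is the Euler step plus the correction term $b\,b'\,I_{(1,1)}$. The sketch below spells this out for geometric Brownian motion, where $b(x) = \sigma x$ and $b'(x) = \sigma$; the helper name, parameter values and seed are illustrative.

```python
import math
import random

# One-step sketch of the Milstein scheme (6.1): the Euler step plus
# b(y) b'(y) ((Delta W)^2 - Delta)/2.

def milstein_step(y, a, b, b_prime, delta, dW):
    return (y + a(y) * delta + b(y) * dW
            + 0.5 * b(y) * b_prime(y) * (dW * dW - delta))

r, sigma = 0.05, 0.2            # geometric Brownian motion coefficients
rng = random.Random(11)
delta = 0.1
dW = rng.gauss(0.0, math.sqrt(delta))
y1 = milstein_step(1.0, lambda x: r * x, lambda x: sigma * x,
                   lambda x: sigma, delta, dW)
print(y1)
```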
For multi-dimensional driving Wiener processes, the double stochastic integrals appearing in the Milstein scheme have to be approximated, unless the drift and diffusion coefficients fulfil a certain commutativity condition. Characterizations and approximations of such double Itô integrals are given, for instance, in Milstein (1988a, 1995a), Kloeden and Platen (1992/1995b), Gaines and Lyons (1994) and Kuznetsov (1998). Wagner and Platen (1978), Platen (1981) and Platen and Wagner (1982) have described which terms of the Wagner-Platen formula have to be chosen to obtain a desired higher strong order of convergence. Thus, for instance, the strong Taylor approximation of order $\gamma = 1.5$ has the form
$$Y_{n+1} = Y_n + a\,\Delta + b\,\Delta W_n + b\,b'\,I_{(1,1)} + b\,a'\,I_{(1,0)} + \big(a\,a' + \tfrac{1}{2}\,b^2 a''\big)\frac{\Delta^2}{2} + \big(a\,b' + \tfrac{1}{2}\,b^2 b''\big)\,I_{(0,1)} + b\,\big(b\,b'' + (b')^2\big)\,I_{(1,1,1)}, \tag{6.2}$$
where we suppress the dependence of the coefficients on $Y_n$ and use the multiple Itô integrals mentioned in (5.2). The integer strong order Taylor schemes given in Kloeden and Platen (1992/1995b) can be conveniently derived from a Stratonovich-Taylor formula of the type (5.3). For instance, the strong order 2.0 Taylor scheme is
$$Y_{n+1} = Y_n + \underline{a}\,\Delta + b\,\Delta W_n + b\,b'\,J_{(1,1)} + b\,\underline{a}'\,J_{(1,0)} + \underline{a}\,b'\,J_{(0,1)} + \underline{a}\,\underline{a}'\,\frac{\Delta^2}{2} + b\,(b\,b')'\,J_{(1,1,1)}$$
$$\qquad + \underline{a}\,(b\,b')'\,J_{(0,1,1)} + b\,(b\,\underline{a}')'\,J_{(1,1,0)} + b\,(\underline{a}\,b')'\,J_{(1,0,1)} + b\,\big(b\,(b\,b')'\big)'\,J_{(1,1,1,1)}; \tag{6.3}$$
here, in addition to those multiple Stratonovich integrals already mentioned in (5.4), we have also used
$$J_{(0,1,1)} = \int_{\tau_n}^{\tau_{n+1}}\int_{\tau_n}^{s_3}\int_{\tau_n}^{s_2} ds_1\circ dW_{s_2}\circ dW_{s_3}, \qquad J_{(1,0,1)} = \int_{\tau_n}^{\tau_{n+1}}\int_{\tau_n}^{s_3}\int_{\tau_n}^{s_2} \circ\,dW_{s_1}\,ds_2\circ dW_{s_3},$$
$$J_{(1,1,0)} = \int_{\tau_n}^{\tau_{n+1}}\int_{\tau_n}^{s_3}\int_{\tau_n}^{s_2} \circ\,dW_{s_1}\circ dW_{s_2}\,ds_3, \qquad J_{(1,1,1,1)} = \int_{\tau_n}^{\tau_{n+1}}\int_{\tau_n}^{s_4}\int_{\tau_n}^{s_3}\int_{\tau_n}^{s_2} \circ\,dW_{s_1}\circ dW_{s_2}\circ dW_{s_3}\circ dW_{s_4}. \tag{6.4}$$
Milstein (1988a, 1995a), Kloeden and Platen (1991a), Hofmann (1994), Gaines (1994), Liu and Li (1997), Burrage (1998) and Kuznetsov (1998) point out that certain multiple Stratonovich integrals can be expressed using some minimal set of random variables. This is important for efficient practical implementations.
In general, one can say that higher-order numerical schemes require not only adequate smoothness of the drift and diffusion coefficients, but also adequate information about the driving Wiener processes. This information is contained in the multiple stochastic integrals appearing in the Wagner-Platen and Stratonovich-Taylor formulae. For specific types of drift and diffusion coefficients, for instance, when they fulfil a certain commutativity condition, higher-order strong Taylor schemes can be considerably simplified: see Kloeden and Platen (1992/1995b). Only a reduced set of multiple stochastic integrals is then needed to achieve the corresponding strong order. In any given problem with several driving Wiener processes, it is worthwhile checking whether this might apply. Another situation where considerable extra efficiency can be gained occurs when the SDE has only a small noise term: that is, the diffusion coefficient is small and the noise can be interpreted as a perturbation. This situation was studied by Milstein and Tretjakov (1994). Such an approximation has to focus on the drift part of the dynamics. One then usually achieves only a low theoretical strong order, but owing to the smallness of the noise a reasonable overall performance of the algorithm is achieved. A relatively simple approach to constructing discrete time approximations is the splitting method applied by Bensoussan, Glowinski and Rascanu (1990, 1992), LeGland (1992), Sun and Glowinski (1994) and Petersen (1998), who treat the drift term and the diffusion term separately in their algorithms. This method in general achieves only the strong order of the Euler approximation, but is convenient in its implementation. As an illustration, in Figure 6.1 we approximate a simulated path of the geometric Brownian motion that follows the SDE
$$dX_t = r\,X_t\,dt + \sigma\,X_t\,dW_t \tag{6.5}$$
for $t \in [0,1]$ with $X_0 = 1$. This process represents the standard model for asset prices in mathematical finance. Fortunately, in this special case we have an explicit solution of the form
$$X_t = X_0 \exp\big(\big(r - \tfrac{1}{2}\,\sigma^2\big)t + \sigma\,W_t\big).$$
This was used in Figure 6.1 to plot a sample path of $X$ for an interest rate $r = 0.05$ and volatility $\sigma = 0.2$. For the time step-size $\Delta = 0.1$, we also show in Figure 6.1 the linearly interpolated path of the Milstein approximation. A disadvantage of higher-order strong Taylor approximations is the fact that derivatives of the drift and diffusion coefficients have to be calculated at each step. This can be avoided by considering derivative-free approximations, such as the Runge-Kutta-type methods to be discussed in the following section.
Fig. 6.1. Paths of exact solution X and the Milstein approximation
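An experiment of the kind shown in Figure 6.1 can be sketched as follows: one path of the explicit solution of (6.5) is compared at the grid points with the Milstein approximation (6.1) driven by the same Wiener increments. The parameter values are those quoted in the text; the seed is illustrative.

```python
import math
import random

# Sketch of the Figure 6.1 experiment: geometric Brownian motion (6.5),
# dX = r X dt + sigma X dW, versus its Milstein approximation (6.1).

r, sigma, x0, T = 0.05, 0.2, 1.0, 1.0
N = 10                          # step-size Delta = 0.1, as in the text
delta = T / N
rng = random.Random(7)

y, w, t = x0, 0.0, 0.0
max_gap = 0.0                   # largest grid-point distance |X - Y|
for _ in range(N):
    dW = rng.gauss(0.0, math.sqrt(delta))
    # Milstein step: Euler plus b b' ((dW)^2 - Delta)/2, where b(x) = sigma x
    # gives b(x) b'(x) = sigma^2 x.
    y += r * y * delta + sigma * y * dW \
         + 0.5 * sigma ** 2 * y * (dW * dW - delta)
    w += dW
    t += delta
    x_exact = x0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * w)
    max_gap = max(max_gap, abs(x_exact - y))
print(max_gap)
```

Even at the coarse step-size $\Delta = 0.1$, the Milstein path stays close to the exact path, consistent with its strong order $\gamma = 1.0$.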
6.2. Strong Runge-Kutta approximations
As mentioned at the beginning of Section 3, one cannot simply take well-known deterministic Runge-Kutta schemes and adapt them to an SDE. These only converge with a given strong order towards the correct solution if they also approximate the corresponding strong Taylor scheme. Any viable scheme that aims to achieve a certain strong order must in general involve the appropriate multiple stochastic integrals appearing in the corresponding Taylor scheme. As a first attempt at avoiding derivatives in the scheme, one can use the following simple method suggested by Platen (1984), which approximates the Milstein scheme. It has the Itô form
$$Y_{n+1} = Y_n + a(Y_n)\,\Delta + b(Y_n)\,\Delta W_n + \frac{1}{2\sqrt{\Delta}}\big(b(\bar Y_n) - b(Y_n)\big)\big((\Delta W_n)^2 - \Delta\big), \tag{6.6}$$
or the Stratonovich form
$$Y_{n+1} = Y_n + \underline{a}(Y_n)\,\Delta + b(Y_n)\,\Delta W_n + \frac{1}{2\sqrt{\Delta}}\big(b(\bar Y_n) - b(Y_n)\big)(\Delta W_n)^2, \tag{6.7}$$
with $\bar Y_n = Y_n + b(Y_n)\sqrt{\Delta}$. In Rümelin (1982), Gard (1988), Kloeden and Platen (1992/1995b, 1992) and Artemiev (1993a, 1993b), further Runge-Kutta-type schemes can be found. It is natural to ask whether the tree approach developed in Butcher (1987) can be translated to the stochastic setting. Some results along these lines
were given by Saito and Mitsui (1993b), Burrage and Platen (1994), Komori, Saito and Mitsui (1994), Komori and Mitsui (1995), Saito and Mitsui (1996), Burrage and Burrage (1996, 1998), Burrage, Burrage and Belward (1997), Komori, Mitsui and Sugiura (1997) and Burrage (1998). For instance, in the case of a single driving Wiener process, a rooted tree methodology has been described for Stratonovich SDEs by Burrage (1998). Following this approach, a strong order $\gamma = 1.0$ two-stage Runge-Kutta method has the form
$$Y_{n+1} = Y_n + \big(\underline{a}(Y_n) + 3\,\underline{a}(\bar Y_n)\big)\frac{\Delta}{4} + \big(b(Y_n) + 3\,b(\bar Y_n)\big)\frac{\Delta W_n}{4} \tag{6.8}$$
with
$$\bar Y_n = Y_n + \tfrac{2}{3}\big(\underline{a}(Y_n)\,\Delta + b(Y_n)\,\Delta W_n\big).$$
The advantage of the method (6.8) compared, for instance, with the Platen method (6.6) is that the principal error constant has been minimized within a class of one-stage first-order Runge-Kutta methods. Four-stage Runge-Kutta methods of strong order $\gamma = 1.5$ can also be found in Burrage (1998). Similarly, in the context of filtering problems, Newton (1986a, 1986b, 1991) and Castell and Gaines (1996) have proposed approximations that are, in some sense, asymptotically efficient with respect to the leading error coefficient within a class of Runge-Kutta-type methods. One strong order $\gamma = 1.0$ method proposed by Newton has the form
$$Y_{n+1} = Y_n + \big(a(Y_n) + a_1\big)\frac{\Delta}{2} + \big(b(Y_n) + 2b_1 + 2b_2 + b_3\big)\frac{\Delta W_n}{6}, \tag{6.9}$$
where
$$a_1 = a\big(Y_n + \tfrac{1}{2}\,a\,\Delta + b_2\,\Delta W_n\big), \qquad b_1 = b\big(Y_n + \tfrac{1}{2}\,b(Y_n)\,\Delta W_n\big),$$
$$b_2 = b\Big(Y_n + \tfrac{1}{2}\,a\,\Delta + b_1\,\frac{\Delta W_n}{2}\Big), \qquad b_3 = b\big(Y_n + a\,\Delta + b_2\,\Delta W_n\big).$$
Lepingle and Ribemont (1991) suggested a two-step strong scheme of first order. In Kloeden and Platen (1992/1995b) further two-step strong schemes have been proposed. We now present a long list of publications that deal with higher-order discrete time approximations of Itô or Stratonovich SDEs; these contain many ideas and diverse approaches that may prove of interest in future research. They include Franklin (1965), Shinozuka (1971), Kohler and Boyce (1974), Rao, Borwankar and Ramkrishna (1974), Dsagnidse and Tschitashvili (1975), Harris (1976), Glorennec (1977), Kloeden and Pearson (1977), Clark (1978), Nikitin and Razevig (1978), Helfand (1979), Platen (1980a), Razevig (1980), Greenside and Helfand (1981), Casasus (1982), Clark (1982a), Guo (1982), Talay (1982a, 1982b, 1983a, 1983b), Drummond,
Duane and Horgan (1983), Casasus (1984), Guo (1984), Janssen (1984a, 1984b), Shimizu and Kawachi (1984), Tetzlaff and Zschiesche (1984), Unny (1984), Clark (1982b), Averina and Artemiev (1986), Drummond, Hoch and Horgan (1986), Kozlov and Petryakov (1986), Greiner, Strittmatter and Honerkamp (1987), Liske and Platen (1987), Platen (1987), Milstein (1987), Shkurko (1987), Römisch and Wakolbinger (1987), Averina and Artemiev (1988), Milstein (1988b), Golec and Ladde (1989), Feng (1990), Nakazawa (1990), Bensoussan, Glowinski and Rascanu (1992), Feng, Lei and Qian (1992), Artemiev (1993b), Kloeden, Platen and Schurz (1993), Saito and Mitsui (1993a), Petersen (1994b), Török (1994), Ogawa (1995), Gelbrich and Rachev (1996), Grecksch and Wadewitz (1996), Newton (1996), Saito and Mitsui (1996), Schurz (1996b), Yannios and Kloeden (1996), Artemiev and Averina (1997), Denk and Schaffer (1997), Abukhaled and Allen (1998) and Schein and Denk (1998).

To illustrate the strong order of convergence of the strong Taylor schemes mentioned earlier, let us perform a simulation study that uses the geometric Brownian motion (6.5) introduced in the previous section. We estimate the absolute error (4.1) for different step-sizes $\Delta$ and different schemes, including the Euler scheme (3.1), the Milstein scheme (6.1), the order 1.5 strong Taylor approximation (6.2) and the order 2.0 strong Taylor approximation (6.3). In Figure 6.2 the logarithms of the respective estimated absolute errors from 2000 simulated paths are plotted against the logarithm of the step-size. We note that, for the different schemes, the slopes of the linearly interpolated absolute errors correspond to the theoretical strong orders of the schemes. Results for corresponding Runge-Kutta methods are almost identical. Simulation studies involving higher-order schemes can be found, for instance, in Klauder and Petersen (1985), Pardoux and Talay (1985), Liske and Platen (1987), Newton (1991) and Kloeden et al. (1994/1997).

6.3. A-stability and implicit strong methods
What really matters in a numerical scheme is that it should be numerically stable, conveniently implementable, and generate fast, highly accurate results. Since SDEs generalize ODEs, their numerical analysis encounters at least all the problems known from the deterministic case. Before any properties of higher order of convergence can be studied, the question of the numerical stability of a scheme has to be satisfactorily answered. Many practical problems turn out to be multi-dimensional: see Hofmann, Platen and Schweizer (1992) or Heath and Platen (1996) for examples from finance, or Schein and Denk (1998) for an example from microelectronics. We know from the numerical analysis of ODEs that stiff systems can easily occur, and that these cause numerical instabilities for most explicit methods.
Fig. 6.2. Log absolute error versus log step-size for the Euler, Milstein, order 1.5 Taylor and order 2.0 Taylor schemes
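The slope measurement behind a plot of this kind can be sketched as follows: the absolute error (4.1) of the Euler and Milstein schemes on geometric Brownian motion (6.5) is estimated for several step-sizes, and the strong order is read off as the slope of a least-squares line through the log-log points. Sample size, step-sizes and seed are illustrative.

```python
import math
import random

# Sketch of the Figure 6.2 experiment for the Euler and Milstein schemes:
# estimate E|X_T - Y_N| for several step-sizes on dX = r X dt + sigma X dW
# and regress log error against log step-size.

r, sigma, x0, T = 0.05, 0.2, 1.0, 1.0
rng = random.Random(3)

def abs_error(milstein, N, paths=2000):
    delta = T / N
    total = 0.0
    for _ in range(paths):
        y, w = x0, 0.0
        for _ in range(N):
            dW = rng.gauss(0.0, math.sqrt(delta))
            y += r * y * delta + sigma * y * dW          # Euler part (3.1)
            if milstein:                                  # correction (6.1)
                y += 0.5 * sigma ** 2 * y * (dW * dW - delta)
            w += dW
        total += abs(x0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * w) - y)
    return total / paths

def slope(milstein):
    Ns = [8, 16, 32, 64]
    xs = [math.log(T / n) for n in Ns]
    ys = [math.log(abs_error(milstein, n)) for n in Ns]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
            / sum((x - xbar) ** 2 for x in xs))

s_euler = slope(False)       # close to the theoretical order 0.5
s_milstein = slope(True)     # close to the theoretical order 1.0
print(s_euler, s_milstein)
```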
The propagation of errors in the stochastic case also depends on the specific nature of the stochastic part of the dynamics. It is quite a delicate matter to provide reasonable answers with respect to the stability of numerical schemes for general SDEs. Therefore it is useful to study important classes of test equations that provide insight into typical instability patterns. The well-known concept of A-stability (see Björck and Dahlquist (1974)) can be directly generalized to the case of SDEs with additive noise, that is, $b(x) = \text{const}$ in equation (2.1): see Milstein (1988a), Hernandez and Spigler (1992), Kloeden and Platen (1992) or Milstein (1995a). One introduces a complex-valued test equation of the form
$$dX_t = \lambda\,X_t\,dt + dW_t, \tag{6.10}$$
with additive noise, where $\lambda$ is a complex number with real part $\mathrm{Re}(\lambda) < 0$ and $W$ is a real-valued Wiener process. A one-step numerical approximation $Y$ applied to (6.10) then usually yields a recursive relation of the form
$$Y_{n+1} = G(\lambda\Delta)\,Y_n + Z_n, \tag{6.11}$$
where the $Z_n$ represent random terms that do not depend on $\lambda$ or $Y_0, Y_1, \ldots, Y_n$. The region of A-stability of a scheme is defined as the subset of the complex plane consisting of those complex numbers $\lambda\Delta$ with $\mathrm{Re}(\lambda) < 0$ and $\Delta > 0$ which are mapped by the function $G$ from (6.11) into the unit circle, that is, those $\lambda\Delta$ for which
$$|G(\lambda\Delta)| < 1. \tag{6.12}$$
If the A-stability region covers the whole left half of the complex plane, then we say that the scheme is A-stable. Owing to the additive noise in the test equation, the concept of A-stability does not say much about instabilities that may arise, for example, from multiplicative noise, that is, $b(x) = \sigma x$, or from other non-constant diffusion coefficients. We shall discuss the problem of stability under multiplicative noise in Section 7.4. A-stability is a rough indicator of the basic stability properties of any discrete time approximation. On the basis of implicit stochastic Taylor expansions (see Kloeden and Platen (1992/1995b)), it is possible to construct implicit discrete time approximations. As an example, we mention the Stratonovich version of a family of implicit Milstein schemes described in Milstein (1988a, 1995a) and Kloeden and Platen (1992/1995b). It has the form
$$Y_{n+1} = Y_n + \big(\alpha\,\underline{a}(Y_{n+1}) + (1-\alpha)\,\underline{a}(Y_n)\big)\Delta + b(Y_n)\,\Delta W_n + \tfrac{1}{2}\,b(Y_n)\,b'(Y_n)\,(\Delta W_n)^2, \tag{6.13}$$
where $\alpha \in [0,1]$ represents the degree of implicitness. This family of schemes is of strong order $\gamma = 1.0$ and A-stable for $\alpha \ge \tfrac{1}{2}$. A major difficulty arises from the fact that in a strong scheme it is almost impossible to construct implicit expressions for the noise terms, because in the actual discrete time approximation these would usually lead to terms with inverted Gaussian random variables. Such terms miss crucial moment properties. A possible research direction that seems to overcome part of the problem has been suggested by Milstein, Platen and Schurz (1998), who have proposed a family of balanced implicit methods. A balanced implicit method can be written in the form
$$Y_{n+1} = Y_n + a(Y_n)\,\Delta + b(Y_n)\,\Delta W_n + (Y_n - Y_{n+1})\,C_n, \tag{6.14}$$
where
Cn = c°(Yn)A + c1(Yn)\AWn\ and c°, c1 represent positive real-valued uniformly bounded functions. One can also choose more general functions c° and c 1 that must fulfil conditions described in Milstein et al. (1998). The freedom in choosing c° and c1 can be exploited to construct a method with stability properties tailored to the dynamics of the underlying SDE. However, one must pay a price: this method is only of strong order 7 = 0.5. In a number of applications, especially those associated with multiplicative noise, the balanced implicit method showed much better stability behaviour than other methods: see, for instance, Schurz (1996a) and Fischer and Platen (1998). Implicit schemes or different concepts of numerical stability have been suggested and studied in a variety of papers, and we again mention a long list, including Talay (19826, 1984), Klauder and Petersen (1985), Pardoux
NUMERICAL METHODS FOR SDES
and Talay (1985), Milstein (1988a, 1995a), Artemiev and Shkurko (1991), Drummond and Mortimer (1991), Kloeden and Platen (1992), Hernandez and Spigler (1992, 1993), Artemiev (1993a, 1993b, 1994), Saito and Mitsui (1993b), Hofmann and Platen (1994), Milstein and Platen (1994), Komori and Mitsui (1995), Hofmann and Platen (1996), Saito and Mitsui (1996), Schurz (1996a), Schurz (1996c), Ryashko and Schurz (1997), Burrage (1998), Higham (1998) and Petersen (1998). Despite all this work, stochastic numerical stability remains an open and challenging area of research.

7. Weak approximation methods

As previously mentioned, in many applications it is not necessary to generate an almost exact replica of the sample path of the solution of the underlying SDE. The Monte Carlo simulation of option prices is a typical example, where simple random walks can be used to approximate option pricing functionals. Within this section we discuss numerical methods that focus on approximating the probability distributions of solutions of SDEs, allowing us to handle wide classes of functionals. We then need to study the weak order of convergence of several stochastic numerical methods.

7.1. Weak Taylor approximation

The weak convergence criterion (4.2) allows us more degrees of freedom in constructing a discrete time approximation than the strong convergence criterion (4.1). For instance, under weak convergence, the random increments ΔW_n of the Wiener process can be replaced by simpler random variables ΔŴ_n which are similar to these in distribution. By substituting the N(0, Δ) Gaussian distributed random variable ΔW_n in the Euler approximation (3.1) by an independent two-point distributed random variable ΔŴ_n with

P(ΔŴ_n = ±√Δ) = 0.5,    (7.1)

we obtain the simplified Euler method

Y_{n+1} = Y_n + a(Y_n) Δ + b(Y_n) ΔŴ_n.    (7.2)

The key point for this choice of the two-point random variable ΔŴ_n is that its first two moments match the corresponding ones for ΔW_n. It can be shown that this method (7.2) converges with weak order β = 1.0 if sufficient regularity conditions are imposed. This weak order is higher than the strong order γ = 0.5 achieved by the Euler approximation (3.1). In Mikulevicius and Platen (1991), a lower order, β < 1.0, of weak convergence has been proved if there are only Hölder continuous drift and diffusion coefficients. The Euler approximation (3.1) can be interpreted as the order 1.0 weak Taylor scheme. One can select additional terms from the Wagner-Platen formula to obtain weak Taylor schemes of higher order. Platen (1984, 1992)
and Kloeden and Platen (1992/1995b) have described how to construct the weak Taylor scheme corresponding to a given weak order β ∈ {1, 2, 3, ...}. It turns out that one has to include all terms from the Wagner-Platen formula with multiple Ito integrals of multiplicity equal to or less than the desired weak order β. Thus the Euler method, that is, the order 1.0 weak Taylor scheme, is constructed using the multiple integrals of multiplicity one. The order 2.0 weak Taylor scheme must then include all terms with single and double integrals, and therefore has the form

Y_{n+1} = Y_n + a Δ + b ΔW_n + b b′ I_{(1,1)} + b a′ I_{(1,0)} + (a b′ + ½ b² b″) I_{(0,1)} + (a a′ + ½ b² a″) Δ²/2.    (7.3)

This scheme was first proposed by Milstein (1978) and later studied by Platen (1984) and Talay (1984). We still obtain a scheme of weak order β = 2.0 if we replace the random variable ΔW_n in (7.3) by ΔŴ_n, the double integrals I_{(1,0)} and I_{(0,1)} by ½ ΔŴ_n Δ, and the double Wiener integral I_{(1,1)} by ½ ((ΔŴ_n)² − Δ). Here ΔŴ_n might be a three-point distributed random variable with

P(ΔŴ_n = ±√(3Δ)) = 1/6    and    P(ΔŴ_n = 0) = 2/3.    (7.4)
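The moment-matching behind (7.4) can be checked directly; the sketch below compares the first five moments of the three-point variable with those of N(0, Δ) (the value of Δ is an arbitrary choice):

```python
import numpy as np

delta = 0.01                                   # arbitrary step-size
vals = np.array([np.sqrt(3 * delta), -np.sqrt(3 * delta), 0.0])
probs = np.array([1 / 6, 1 / 6, 2 / 3])        # three-point law from (7.4)

# Moments of N(0, delta) up to order five: 0, delta, 0, 3*delta**2, 0.
gauss = [0.0, delta, 0.0, 3 * delta**2, 0.0]
for k in range(1, 6):
    print(k, np.sum(probs * vals**k), gauss[k - 1])
```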
Note that the first four moments of ΔŴ_n match the corresponding ones of ΔW_n. By approximating all triple Ito integrals in the order 3.0 weak Taylor scheme, a simplified order 3.0 weak Taylor scheme was derived by Platen (1984), with the form

Y_{n+1} = Y_n + a Δ + b ΔŴ + ½ b b′ ((ΔŴ)² − Δ) + b a′ ΔẐ + (a a′ + ½ b² a″) Δ²/2 + (a b′ + ½ b² b″) (ΔŴ Δ − ΔẐ)
    + [a (a b′ + b a′ + ½ b² b″)′ + ½ b² (a b′ + b a′ + ½ b² b″)″] ΔŴ Δ²/6
    + [b (b a′ + a b′ + ½ b² b″)′ + a (b b′)′ + ½ b² (b b′)″] ((ΔŴ)² − Δ) Δ/6
    + [a (a a′ + ½ b² a″)′ + ½ b² (a a′ + ½ b² a″)″] Δ³/6 + b (a a′ + ½ b² a″)′ ΔŴ Δ²/6.    (7.5)
Here ΔŴ and ΔẐ can be chosen, for instance, as correlated zero mean Gaussian random variables with

E((ΔŴ)²) = Δ,    E((ΔẐ)²) = Δ³/3,    E(ΔẐ ΔŴ) = Δ²/2.
As in the case of strong approximations, weak higher-order schemes can be constructed only if there is adequate smoothness of the drift and diffusion coefficients, and a sufficiently rich set of random variables approximating the multiple stochastic integrals of the corresponding weak Taylor schemes, generated at each time step. Under the weak convergence criterion, we not only have considerable freedom to approximate multiple stochastic integrals, but also need fewer such integrals to achieve a certain order of weak convergence than for the same order of strong convergence: see Kloeden and Platen (1992/1995b) and Hofmann (1994). We note that the weak higher-order Taylor schemes involve higher-order derivatives of a and b. Obviously, it would be desirable to have derivative-free or Runge-Kutta-type weak schemes. We discuss these in the following section.

7.2. Weak Runge-Kutta approximations
A weak second-order Runge-Kutta approximation that avoids derivatives in a and b is given by the algorithm

Y_{n+1} = Y_n + ½ (a(Ȳ_n) + a(Y_n)) Δ + ¼ (b(Ȳ⁺) + b(Ȳ⁻) + 2 b(Y_n)) ΔŴ_n + ¼ (b(Ȳ⁺) − b(Ȳ⁻)) ((ΔŴ_n)² − Δ) Δ^{−1/2}    (7.6)

with

Ȳ_n = Y_n + a(Y_n) Δ + b(Y_n) ΔŴ_n    and    Ȳ^± = Y_n + a(Y_n) Δ ± b(Y_n) √Δ,

where ΔŴ_n can be chosen as in (7.4): see Platen (1984). Talay (1984) suggested a weak second-order scheme which is not completely derivative-free, and also requires two random variables at each step. Another weak second-order scheme that involves the derivative b′ has been proposed by Milstein (1985), with the form

Y_{n+1} = Y_n + ½ (a(Ȳ_n) + a(Y_n)) Δ + ½ b(Y_n) b′(Y_n) ((ΔŴ_n)² − Δ) + (½ b(Ȳ_n) + ½ b(Ȳ⁻)) ΔŴ_n,    (7.7)
where Ȳ_n and ΔŴ_n are as in (7.6), and

Ȳ⁻ = Y_n + a(Y_n) Δ − b(Y_n) √Δ.
Weak second- and third-order Runge-Kutta-type schemes have been proposed, for instance, by Kloeden and Platen (1992/1995b), Mackevicius (1994) and Komori and Mitsui (1995). There appears to be some scope for future research in weak higher-order Runge-Kutta schemes, possibly generalizing Butcher's rooted tree methods as described, for instance, by Komori et al. (1997) and Burrage (1998).
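A sketch of the derivative-free scheme (7.6) with the three-point increments (7.4), for an assumed test equation dX = μX dt + σX dW with known mean E(X_T) = exp(μT); the printed weak errors in the mean fall roughly by a factor of four when the step-size is halved:

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, T, x0 = 1.0, 0.2, 1.0, 1.0   # assumed test parameters

def a(x): return mu * x
def b(x): return sigma * x

def weak_rk_mean(n_steps, n_paths):
    dt = T / n_steps
    sq = np.sqrt(dt)
    y = np.full(n_paths, x0)
    for _ in range(n_steps):
        # three-point increment from (7.4)
        dw = np.sqrt(3 * dt) * rng.choice([-1.0, 0.0, 1.0],
                                          p=[1 / 6, 2 / 3, 1 / 6], size=n_paths)
        supp = y + a(y) * dt                       # supporting value
        y_bar = supp + b(y) * dw
        y_plus, y_minus = supp + b(y) * sq, supp - b(y) * sq
        y = (y + 0.5 * (a(y_bar) + a(y)) * dt
               + 0.25 * (b(y_plus) + b(y_minus) + 2 * b(y)) * dw
               + 0.25 * (b(y_plus) - b(y_minus)) * (dw**2 - dt) / sq)
    return y.mean()

exact = np.exp(mu * T)
for n in (4, 8):
    print(n, abs(weak_rk_mean(n, 100_000) - exact))
```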
7.3. Extrapolation methods

In deterministic numerical analysis, extrapolation methods represent an elegant way of achieving higher-order convergence by using lower-order methods, provided the numerical stability of these for a range of step-sizes can be guaranteed. For the weak second-order approximation of the functional E(g(X_T)), Talay and Tubaro (1990) proposed a Richardson extrapolation of the form

V^Δ_{g,2}(T) = 2 E(g(Y^Δ(T))) − E(g(Y^{2Δ}(T))),    (7.8)

where Y^δ(T) denotes the value at time T of an Euler approximation with step-size δ. Using Euler approximations with step-sizes δ = Δ and δ = 2Δ, and then taking the difference (7.8) of their respective functionals, the leading error coefficient cancels out, and V^Δ_{g,2}(T) ends up being a weak second-order approximation. Further weak higher-order extrapolations have been developed by Kloeden and Platen (1989). For instance, one obtains a weak fourth-order extrapolation method using

V^Δ_{g,4}(T) = (1/21) [32 E(g(Y^Δ(T))) − 12 E(g(Y^{2Δ}(T))) + E(g(Y^{4Δ}(T)))],    (7.9)

where Y^δ(T) is the value at time T of the weak second-order Runge-Kutta scheme (7.6) with step-size δ. Such weak high-order extrapolations require the existence of a leading error expansion for functionals of the underlying discrete time weak approximation Y^δ. Further results on extrapolation methods can be found in Hofmann (1994), Goodlett and Allen (1994), Kloeden, Platen and Hofmann (1995) and Mackevicius (1996). Artemiev (1985), Müller-Gronbach (1996), Gaines and Lyons (1997), Burrage (1998) and Mauthner (1998) have derived results on step-size control. Furthermore, Hofmann (1994) and Hofmann, Müller-Gronbach and Ritter (1998) have considered extrapolation methods with both step-size and order control. This is another challenging area of practical importance.
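The cancellation in (7.8) can be checked in closed form for an assumed simple case: for dX = μX dt + σX dW with g(x) = x, the Euler approximation has E(g(Y^δ(T))) = (1 + μδ)^(T/δ) exactly, so the weak errors can be compared without sampling noise:

```python
import numpy as np

mu, T = 1.0, 1.0          # assumed test case with g(x) = x
exact = np.exp(mu * T)    # E(g(X_T))

def euler_mean(delta):
    # Closed-form Euler expectation for this functional (no Monte Carlo noise).
    return (1.0 + mu * delta) ** round(T / delta)

for n in (8, 16, 32):
    delta = T / n
    raw = abs(euler_mean(delta) - exact)                                       # weak order 1
    extrapolated = abs(2 * euler_mean(delta) - euler_mean(2 * delta) - exact)  # weak order 2
    print(n, raw, extrapolated)
```

The extrapolated error decreases roughly four times faster per halving of the step-size than the raw Euler error, as the leading error term cancels.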
7.4. M-stability and implicit weak methods

The comments in Section 6.3 on numerical stability of discrete time approximations in the context of strong convergence apply equally to weak schemes. Numerical instability is a problem that arises in both strong and weak schemes, and similar methods can be used to study it. However, as we shall see below, it is again much easier to construct a weak method with satisfactory stability properties than a corresponding strong one. The crucial advantage in the construction of implicit schemes under the weak convergence criterion (4.2) lies in the freedom to choose the necessary random variables to be bounded. This allows us to construct weak schemes that have fully implicit terms for the noise part of the SDE. To highlight the importance of this fact, Hofmann and Platen (1994), Hofmann (1995) and Hofmann and Platen (1996) have considered a complex-valued test equation with multiplicative noise of Stratonovich type

dX_t = (1 − α) λ X_t dt + √α γ X_t ∘ dW_t,    (7.10)

where λ = λ₁ + λ₂ i and γ = γ₁ + γ₂ i are complex numbers such that γ² = λ. Here W again denotes a real-valued standard Wiener process. The real-valued parameter α ∈ [0, 2] describes the degree of stochasticity in the test equation (7.10). For α = 0 we have a purely deterministic equation. For α = 1, (7.10) represents a Stratonovich SDE without drift, while for α = 2 it can be written as an Ito SDE with no drift. Suppose that we can express a given stochastic numerical scheme, to be applied to the test equation (7.10) with equidistant step-size Δ, in the recursive form

Y_{n+1} = G(λΔ, α) Y_n,    (7.11)

where G is a complex-valued random function that does not depend on Y_0, Y_1, ..., Y_n. Then we can introduce the M-stability set

Γ = {Γ_α : 0 ≤ α ≤ 2},

with stability region

Γ_α = {λΔ ∈ C : Re(λ) < 0, ess_ω sup |G(λΔ, α)| < 1},    (7.12)

for α ∈ [0, 2]. Whereas the A-stability discussed in Section 6.3 can be linked to test equations with 'additive noise', the term M-stability is used with test equations that have 'multiplicative noise'. The ess_ω sup in (7.12) denotes the essential supremum with respect to all ω ∈ Ω, and in practice refers to the worst case scenario. To check for the worst possible paths is important because we have to protect a simulation against unstable scenarios. A single overflow in a large number of simulations can make the whole simulation study questionable. On the other hand, excluding some extreme simulated
Fig. 7.1. M-stability set for the simplified Euler scheme
scenarios would certainly bias the result. Therefore it seems natural to judge stability on a worst case basis. For the simplified Euler scheme (7.2) the M-stability region is shown in Figure 7.1. We note that in the deterministic case, α = 0, the region of stability Γ₀ is a circle and coincides with the A-stability region discussed in Section 6.3. For α > 0 we note that the stability region of the simplified Euler scheme shrinks, and no longer includes the λ₁-axis. This is a crucial observation, telling us that reduction of the step-size Δ might lead us to exit the stability region. In the deterministic case, this is not the typical behaviour of a scheme. Such behaviour is usually observed only when the step-sizes are close to machine precision. In the stochastic case, the noise modelled by the dynamics of the SDE can already generate this type of instability for large step-sizes, and has to be taken rather seriously. We see from Figure 7.1 that, for α = 2, the simplified Euler scheme has no M-stability at all. This is the martingale case for X, which is typical in asset price modelling in finance, where multiplicative noise arises naturally: see Hofmann et al. (1992). We also note that random walks and binomial trees, which are typically implemented in many applications, particularly in finance, have the structure of the simplified Euler scheme and can therefore suffer serious instabilities.
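The shrinking of the M-stability region can be reproduced numerically. Applying the simplified Euler scheme to (7.10) in Ito form, the two outcomes ±√Δ of the two-point noise give the amplification factors G = 1 + (1 − α/2)z ± √(αz) with z = λΔ (a derived form, stated here as an assumption); the worst case over both outcomes is what (7.12) requires to lie inside the unit circle:

```python
import numpy as np

def worst_case_G(z, alpha):
    """Worst-case amplification over the two outcomes of the two-point noise."""
    root = np.sqrt(alpha * z + 0j)
    drift = 1 + (1 - alpha / 2) * z
    return max(abs(drift + root), abs(drift - root))

def stable_fraction(alpha, n=60):
    xs = np.linspace(-3.0, -0.01, n)   # Re(lambda*Delta) < 0
    ys = np.linspace(-3.0, 3.0, n)
    pts = [complex(x, y) for x in xs for y in ys]
    return sum(worst_case_G(z, alpha) < 1 for z in pts) / len(pts)

for alpha in (0.0, 0.5, 1.0, 2.0):
    print(alpha, stable_fraction(alpha))
```

For α = 2 no grid point of the left half-plane is stable, matching the martingale case discussed above.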
7.5. Implicit and predictor-corrector methods

Implicit and predictor-corrector methods have much larger stability regions than most explicit schemes, and turn out to be better suited to simulation tasks with potential stability problems. Some results on implicit schemes in a weak context can be found in Milstein (1985, 1988a, 1995a), Drummond and Mortimer (1991) and Platen (1995). In Kloeden and Platen (1992/1995b) the following family of weak implicit Euler schemes, converging with weak order β = 1.0, has been discussed:

Y_{n+1} = Y_n + {ξ a_η(Y_{n+1}) + (1 − ξ) a_η(Y_n)} Δ + {η b(Y_{n+1}) + (1 − η) b(Y_n)} ΔŴ_n.    (7.13)

Here ΔŴ_n can be chosen as in (7.1), and we have to set

a_η(y) = a(y) − η b(y) b′(y)    (7.14)

for ξ ∈ [0, 1] and η ∈ [0, 1]. For ξ = 1 and η = 0, (7.13) leads to the drift implicit Euler scheme. The choice ξ = η = 0 gives us the simplified Euler scheme, whereas for ξ = η = 1 we have the fully implicit Euler scheme. It can be shown, for instance, that the fully implicit Euler scheme is A-stable in the sense of Section 6.3. The exterior of the M-stability set of the drift implicit Euler scheme with respect to test equation (7.10) is shown in Figure 7.2. One notes that the M-stability set is much larger in Figure 7.2 than in Figure 7.1. However, the λ₁-axis is not included in this set and one has to choose a step-size with a value above a critical minimal size to guarantee stability. A family of implicit weak order 2.0 schemes has been proposed by Milstein (1995a) with

Y_{n+1} = Y_n + {ξ a(Y_{n+1}) + (1 − ξ) a(Y_n)} Δ + ½ b(Y_n) b′(Y_n) ((ΔŴ_n)² − Δ) + {b(Y_n) + ½ (a(Y_n) b′(Y_n) + (1 − 2ξ) a′(Y_n) b(Y_n)) Δ} ΔŴ_n + (1 − 2ξ) {β a′(Y_n) + (1 − β) a′(Y_{n+1})} a(Y_n) Δ²/2,    (7.15)

where ΔŴ_n is chosen as in (7.4). Kloeden and Platen (1992/1995b) suggested the following Runge-Kutta-type implicit weak order 2.0 scheme:

Y_{n+1} = Y_n + ½ (a(Y_n) + a(Y_{n+1})) Δ + ¼ (b(Ȳ⁺) + b(Ȳ⁻) + 2 b(Y_n)) ΔŴ_n + ¼ (b(Ȳ⁺) − b(Ȳ⁻)) ((ΔŴ_n)² − Δ) Δ^{−1/2},    (7.16)
Fig. 7.2. Exterior of M-stability set for the drift implicit Euler scheme
with

Ȳ^± = Y_n + a(Y_n) Δ ± b(Y_n) √Δ,

where ΔŴ_n is chosen as in (7.4). In deterministic numerical analysis, predictor-corrector methods are often used because of their numerical stability, inherited from the implicit counterpart of their corrector scheme. With a predictor-corrector method one is not forced to solve an algebraic equation at each time step as with an implicit method. For instance, a weak second-order predictor-corrector method (see Platen (1995)) is given by the corrector

Y_{n+1} = Y_n + ½ (a(Ȳ_{n+1}) + a(Y_n)) Δ + ψ_n    (7.17)

with

ψ_n = b(Y_n) ΔŴ_n + ½ b(Y_n) b′(Y_n) ((ΔŴ_n)² − Δ)
and the predictor

Ȳ_{n+1} = Y_n + a(Y_n) Δ + ψ_n + (a(Y_n) a′(Y_n) + ½ b²(Y_n) a″(Y_n)) Δ²/2 + b(Y_n) a′(Y_n) ΔŴ_n Δ/2,
where ΔŴ_n is as in (7.4). A list of references on schemes with implicit features and stochastic numerical stability has already been given in Section 6.3. Further publications dealing with aspects of weak approximations include Fahrmeier (1974), Milstein (1978), Platen (1980b), Gladyshev and Milstein (1984), Platen (1984), Talay (1984), Milstein (1985), Ventzel, Gladyshev and Milstein (1985), Haworth and Pope (1986), Talay (1986), Milstein (1988a), Talay (1990), Talay and Tubaro (1990), Kloeden and Platen (1991b), Mikulevicius and Platen (1991), Kloeden, Platen and Hofmann (1992a), Kannan and Wu (1993), Hofmann (1994), Hofmann and Platen (1994), Mackevicius (1994), Komori and Mitsui (1995), Bally and Talay (1996a, 1996b), Hofmann and Platen (1996), Kohatsu-Higa and Ogawa (1997) and Milstein and Tretjakov (1997). Let us emphasize again that there is no point in trying to improve the efficiency of a simulation if its stability is not satisfactorily established.
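A sketch of the predictor-corrector pair (7.17) for an assumed test equation dX = μX dt + σX dW (for this drift a″ = 0, so that term of the predictor drops out); the printed weak errors in the mean are already small at coarse step-sizes:

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, T, x0 = 1.0, 0.2, 1.0, 1.0   # assumed test parameters

def pc_mean(n_steps, n_paths):
    dt = T / n_steps
    y = np.full(n_paths, x0)
    for _ in range(n_steps):
        # three-point increment from (7.4)
        dw = np.sqrt(3 * dt) * rng.choice([-1.0, 0.0, 1.0],
                                          p=[1 / 6, 2 / 3, 1 / 6], size=n_paths)
        a, b = mu * y, sigma * y
        psi = b * dw + 0.5 * b * sigma * (dw**2 - dt)      # b*b' = b*sigma here
        # predictor (the 0.5*b**2*a'' term vanishes since a'' = 0 here)
        y_pred = y + a * dt + psi + a * mu * dt**2 / 2 + b * mu * dw * dt / 2
        # corrector
        y = y + 0.5 * (mu * y_pred + a) * dt + psi
    return y.mean()

exact = np.exp(mu * T)
for n in (4, 8):
    print(n, abs(pc_mean(n, 100_000) - exact))
```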
7.6. Monte Carlo simulations of SDEs There exists a well-developed literature on general Monte Carlo methods. We might mention, among others, Hammersley and Handscomb (1964), Ermakov (1975), Sabelfeld (1979), Rubinstein (1981), Ermakov and Mikhailov (1982), Kalos and Whitlock (1986), Bratley, Fox and Schrage (1987), Bouleau (1990), Law and Kelton (1991), Ross (1991), Mikhailov (1992) and Fishman (1992). In focusing on weak numerical discrete time approximations of SDEs, one obtains greater insight into the stochastic analytic structure of the problem than one usually does in general Monte Carlo problems. One can exploit martingale representations as in Newton (1994), or measure transformations as discussed in Milstein (1988a) or Kloeden and Platen (1992/19956). These structures allow one to develop highly sophisticated Monte Carlo methods. In many cases these perform extremely well in circumstances where other methods fail, are difficult to implement or exceed available computing time. Functionals of the form u = E(g(XT)) can be approximated by weak approximations, as discussed in the context of the weak convergence criterion (4.2). One can form a straightforward
Monte Carlo estimate using the sample average

u_{N,Δ} = (1/N) Σ_{k=1}^{N} g(Y_T(ω_k)),    (7.18)

with N independent simulated realizations Y_T(ω₁), Y_T(ω₂), ..., Y_T(ω_N) of the discrete time weak approximation Y at time T. The mean error μ then has the form

μ = u_{N,Δ} − E(g(X_T)),

which we can decompose (see Kloeden and Platen (1992/1995b)) into a systematic error μ_sys and a statistical error μ_stat, such that

μ = μ_sys + μ_stat,    (7.19)

where

μ_sys = E(u_{N,Δ}) − E(g(X_T)) = E(g(Y_T)) − E(g(X_T)).    (7.20)

Obviously, the absolute systematic error |μ_sys| represents the critical variable under the weak-order convergence criterion (4.2). For a large number N of simulated independent sample paths of Y, we can conclude from the Central Limit Theorem that the statistical error μ_stat becomes asymptotically Gaussian with mean zero and variance of the form

Var(μ_stat) = Var(μ) = (1/N) Var(g(Y_T)).    (7.21)

This reveals a significant disadvantage of Monte Carlo methods, because the deviation

Dev(μ_stat) = √Var(μ_stat) = (1/√N) √Var(g(Y_T))    (7.22)
decreases at only the slow rate N^{−1/2} as N → ∞. Thus the length of a corresponding confidence interval for the error is, for instance, only halved by a fourfold increase in the number N of simulated realizations. The Monte Carlo approach is very general and works in almost all circumstances. For high-dimensional functionals it is sometimes the only method of obtaining a result. One pays for this generality by the large sample sizes required to achieve reasonably accurate estimates. We note from (7.22) that the length of a confidence interval is also proportional to the square root of the variance of the simulated functional. As we shall see in the next section, this provides us with an opportunity to
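The N^{−1/2} rate in (7.22) can be seen directly; in the sketch below a plain sample of g(Z) = Z² with Z standard Gaussian stands in for g(Y_T), and a fourfold increase in N roughly halves the 95% confidence half-width:

```python
import numpy as np

rng = np.random.default_rng(1)

def half_width(n_samples):
    """95% confidence half-width of the sample mean of g(Z) = Z**2."""
    g = rng.standard_normal(n_samples) ** 2
    return 1.96 * g.std(ddof=1) / np.sqrt(n_samples)

w1, w2 = half_width(10_000), half_width(40_000)
print(w1, w2, w1 / w2)   # ratio close to 2
```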
construct unbiased estimates for u = E(g(Y_T)) with much smaller variances than the raw Monte Carlo functional (7.18).
7.7. Variance reduction techniques
One can increase the efficiency of Monte Carlo simulation for SDEs considerably by using various variance reduction techniques. These primarily reduce the variance of the random variable actually simulated. There exist many ways of achieving substantial variance reduction. Only some of them can be mentioned here. Experience has shown that, to be effective, variance reduction techniques need to be adapted and engineered to the given specific problem. Some general variance reduction techniques from classical Monte Carlo theory usually result in only moderate improvements. Techniques that exploit to a high degree the stochastic analytic structure of the given functional of an SDE can easily yield savings in computer time corresponding to factors of several thousands. Useful references on variance reduction techniques in a more classical setting include Hammersley and Handscomb (1964), Ermakov (1975), Boyle (1977), Maltz and Hitzl (1979), Rubinstein (1981), Ermakov and Mikhailov (1982), Ripley (1983b), Kalos and Whitlock (1986), Bratley et al. (1987), Chang (1987), Wagner (1987, 1988a, 1988b, 1989a, 1989b), Law and Kelton (1991) and Ross (1991). In what follows, we first mention some more classical Monte Carlo variance reduction techniques and then point to stochastic numerical variance reduction methods that use martingale representations or measure transformations for functionals of SDEs.

The method of antithetic variates (see, for instance, Law and Kelton (1991) or Ross (1991)) is a very general method. It repeatedly uses the random variables originally generated in some symmetric pattern to construct sample paths, say for the driving Wiener process, in such a way that these offset each other's noise to some extent in the estimate. The simplest version of antithetic variates is obtained by using together a Wiener path realization W(ω₊) and its negative counterpart W(ω₋) = −W(ω₊) in the sample. This reduces the time for the computation of the sample and, using certain symmetries, one can reduce the variance of the simulated estimators substantially.

Another general method is variance reduction by conditioning: see Law and Kelton (1991). For some σ-algebra, or information set, F, we can interpret the conditional expectation E(g(X_T) | F) as a variance-reduced unbiased estimator for the functional E(g(X_T)). The variance is then reduced according to the inequality

Var(E(g(X_T) | F)) ≤ Var(g(X_T)).
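A minimal sketch of the antithetic variates idea described above, for an assumed test equation dX = μX dt + σX dW simulated with the Euler scheme; each Wiener path W(ω₊) is reused with its sign flipped, and the pair average has visibly smaller variance:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, T, x0, n_steps = 0.05, 0.2, 1.0, 1.0, 50   # assumed parameters

def terminal_values(dW):
    """Euler approximation of dX = mu*X dt + sigma*X dW along given increments."""
    dt = T / n_steps
    y = np.full(dW.shape[0], x0)
    for k in range(n_steps):
        y = y + mu * y * dt + sigma * y * dW[:, k]
    return y

dW = np.sqrt(T / n_steps) * rng.standard_normal((20_000, n_steps))
plain = terminal_values(dW)
antithetic = 0.5 * (plain + terminal_values(-dW))   # W and its mirrored path
print(plain.var(), antithetic.var())
```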
Here F represents some information about the path of X, for instance that this path remains in a certain region.

Stratified sampling is another technique that has been widely used in Monte Carlo simulation: see, for instance, Glynn and Iglehart (1989), Ross (1991) and Fournie, Lebuchoux and Touzi (1997). A simple version of it can be described by dividing the whole sample space into M sets of disjoint events A₁, ..., A_M with P(A_i) = 1/M for all i ∈ {1, ..., M}. For example, assume that the first step of the discrete time weak approximation ends up in one of the M equally probable states, where each of these events is indicated by writing Y_{T,A_i}, i ∈ {1, ..., M}, for the final value of Y at time T. Then we can use the unbiased estimator

Z₁ = (1/M) Σ_{i=1}^{M} g(Y_{T,A_i}),

where A_i is the above-mentioned random event. This estimator has variance

Var(Z₁) = (1/M²) Σ_{i=1}^{M} Var(g(Y_{T,A_i})).

We note that for large M the variance of the estimate Z₁ will be considerably smaller than that of the random variable g(Y_T).

The rather standard control variate technique is based on the selection of a random variable ξ with known mean E(ξ) that allows one to construct the unbiased estimate

Z₂ = g(Y_T) − α (ξ − E(ξ)),

where the parameter

α = Cov(g(Y_T), ξ) / Var(ξ)

is chosen to minimize the variance

Var(Z₂) = Var(g(Y_T)) + α² Var(ξ) − 2α Cov(g(Y_T), ξ).

Such a control variate can strongly exploit the stochastic analytic structure of the given functional. It turns out to be very powerful, as is shown, for example, in Hull and White (1988), Goodlett and Allen (1994), Newton (1994, 1997) and Heath and Platen (1996).

Another variance reduction technique, the measure transformation method, was proposed and studied by Milstein (1988a, 1995a), Kloeden and Platen (1992/1995b) and Hofmann et al. (1992). This introduces a new probability measure P̃ via a Girsanov transformation. The underlying Wiener process W is then no longer a Wiener process under the measure P̃. This
method formally computes the same functional as before, but uses the new measure P̃ and thus some corresponding Wiener process W̃. The measure-transformed estimate can now be expressed under the original measure P, which then provides an unbiased estimate via

E(g(Y_T)) = Ẽ( g(Y_T) dP/dP̃ ),

where dP/dP̃ represents the Radon-Nikodym derivative of P with respect to the new measure P̃, and Ẽ denotes expectation under P̃. Since the last relation can be fulfilled by a whole class of measure transformations, we have gained some degree of freedom and can seek a 'best' choice for P̃ that reduces the variance significantly. With reasonable knowledge about the qualitative properties of the functional E(g(Y_T)), this method can achieve considerable variance reductions.

If we summarize the variance reduction techniques discussed above, then it is apparent that all of them are fairly general and most of them can be combined with each other. This turns out to be an important property, because great flexibility is needed to tailor efficient Monte Carlo estimates for specific functionals. It should be mentioned, however, that there seems to be no single generally suitable method that is also highly efficient. Experience is required to find an appropriate variance-reduced estimator for a given functional.

7.8. Quasi-Monte Carlo approach

Within this section we add some comments on the quasi-Monte Carlo approach, which is another technique for enhancing weak approximation methods. There is a rich literature on this subject, with some reviews, for instance, in Ripley (1983b), Niederreiter (1992) and Niederreiter and Shiue (1995). Applications can be found in Barraquand (1995), Paskov and Traub (1995) and Joy, Boyle and Tan (1996), among others. The approach can be illustrated by considering the probability density function p_X of the random variable X_T in such a way that the functional to be computed is expressed in the form
u = E(g(X_T)) = ∫_{−∞}^{∞} g(x) p_X(x) dx.

Consequently the estimation of the functional u appears as a numerical integration problem over (−∞, ∞). If we denote by F_{X_T} the distribution
function of X_T, then u can be expressed as

u = ∫₀¹ g(F_{X_T}^{−1}(z)) dz.

A standard Monte Carlo simulation could now evaluate the sum

û_N = (1/N) Σ_{i=1}^{N} g(F_{X_T}^{−1}(R_i)),
where the R_i, i ∈ {1, ..., N}, are independent uniformly distributed random variables. In a quasi-Monte Carlo method these random variables are replaced by elements from some low-discrepancy sequence or point set: see, for instance, the book by Niederreiter (1992). Low-discrepancy point sets such as Sobol, Halton or Faure sequences, discussed for instance in Halton (1960), Sobol (1967), Tezuka (1993), Tezuka and Tokuyama (1994), Radovic, Sobol and Tichy (1996), Tuffin (1996, 1997) and Mori (1998), exhibit fewer deviations from uniformity than uniformly distributed random point sets. This can generally lead to faster rates of convergence than for random sequences, as discussed in Hofmann and Mathe (1997) and Sloan and Wozniakowski (1998). However, the gain in efficiency is not always balanced against the bias that may result from the use of these methods. Caution has to be exercised in dealing with simplistic quasi-Monte Carlo estimates that could lead to undesirable biases.

8. Further developments and conclusions

In this final section, we comment on promising directions for further research, and briefly mention relevant literature that we have not included so far. A number of new research areas have been opened up in recent years, closely related to the area of numerical methods for SDEs. Discrete time approximations for the numerical analysis of functionals of ergodic diffusion processes that depend on the corresponding invariant law were studied by Talay (1987, 1990, 1991, 1995), Grorud and Talay (1990, 1996) and Arnold and Kloeden (1996). Here the time horizon becomes de facto infinite, and one aims to tackle questions related to the computation of Lyapunov exponents, rotation numbers and other characteristics of stochastic dynamical systems. The numerical solution of nonlinear stochastic dynamical systems has been studied by Kloeden, Platen and Schurz (1991, 1992b) and Kloeden and Platen (1995a).
SDEs with coloured noise were approximated by Manella and Palleschi (1989), Fox (1991) and Milstein and Tretjakov (1994). Weak approximations on a bounded domain, which relate to the solution of a corresponding parabolic partial differential equation, are constructed in Platen (1983), Milstein (1995b, 1995c, 1996, 1997) and Hausenblas (1999a).
This appears to be a very promising direction of future research, where stochastic numerical techniques provide access to efficient numerical solutions of partial differential equations with difficult boundary conditions. These methods also seem to be applicable in higher dimensions. Approximations to first exit times of diffusion processes from a region were considered, for instance, by Platen (1983, 1985) and Abukhaled and Allen (1998). Related to this are numerical methods for SDEs with reflection or boundary conditions. These were studied, for instance, by Gerardi, Marchetti and Rosa (1984), Lepingle (1993), Slominski (1994), Asmussen, Glynn and Pitman (1995), Petterson (1995), Lepingle (1995) and Hausenblas (1999b). This is a technically demanding and growing area of research, where quantities such as local times have to be approximated. Discrete time approximations for Ito processes with a jump component have already been studied, for instance, by Wright (1980), Platen (1982a, 1984), Maghsoodi and Harris (1987), Mikulevicius and Platen (1988) and Maghsoodi (1994). Driven by practical applications in finance and insurance, this area can be expected to develop further in the long-term future. More generally, the discrete time strong and weak approximation of solutions of SDEs that represent semimartingales was studied by Marcus (1981), Platen and Rebolledo (1985), Protter (1985), Jacod and Shiryaev (1987), Mackevicius (1987), Bally (1989a, 1989b, 1990), Gyongy (1991) and Kurtz and Protter (1991a, 1991b). Special emphasis on semimartingale SDEs driven by Levy processes, including α-stable processes, was given in the book by Janicki and Weron (1994), and in papers by Kohatsu-Higa and Protter (1994), Janicki (1996), Janicki, Michna and Weron (1996), Protter and Talay (1997) and Tudor and Tudor (1997). Tudor and Tudor (1987) and Tudor (1989) have also approximated stochastic delay equations.
Approximation schemes for two-parameter SDEs were suggested by Tudor and Tudor (1983), Yen (1988) and Tudor (1992). Numerical experiments and numerical schemes for stochastic partial differential equations are discussed by Liske (1985), Elliott and Glowinski (1989), Bensoussan, Glowinski and Rascanu (1990), LeGland (1992), Gaines (1995b), Grecksch and Kloeden (1996), Ogorodnikov and Prigarin (1996), Gyongy and Nualart (1997), Werner and Drummond (1997) and Allen, Novosel and Zhang (1998). In Ma, Protter and Yong (1994), Douglas, Ma and Protter (1996) and Chevance (1997), numerical methods for forward-backward SDEs have been studied. This represents yet another new direction of research. Nonlinear diffusion processes that depend on related temporal and spatial partial differential equations were approximated by Ogawa (1992, 1994, 1995). Approximation schemes for Ito-Volterra SDEs have been suggested by Makroglou (1991) and Tudor and Tudor (1995). Averaging principles were applied to systems of singularly perturbed SDEs by Golec and Ladde (1990) and Golec (1995, 1997).
In almost every area of stochastic modelling with finite-dimensional or infinite-dimensional dynamics, numerical methods have been or will soon be developed to provide quantitative results. The difficulties are often very similar in the different fields, and concern numerical stability, higher-order efficiency and variance reduction. For well-researched problems the development of standard software tools is becoming part of the general scientific work in the area. The construction of stochastic numerical schemes through symbolic manipulation and related questions were considered, for instance, by Valkeila (1991), Kloeden et al. (1992c), Kloeden and Scott (1993), Kendall (1993), Steele and Stine (1993), Xu (1995) and Cyganowski (1995, 1996). It should be emphasized that Monte Carlo simulation in general, and particularly when it uses discrete time weak approximations of SDEs, represents by its very nature a parallel algorithm. The numerical analysis of ODEs is well developed, with an established literature on parallel computation and supercomputing: see, for example, Burrage (1995). Software packages and tools for it are already available. Stochastic numerical methods applied in parallel computation, as discussed in Petersen (1987, 1988), Anderson (1990) and Hausenblas (1999b), represent another promising area of research. It is expected that the numerical analysis of SDEs will experience a diverse and rapid development during the next few years. One aim of this paper is to encourage research in this rewarding but demanding interdisciplinary field. It involves stochastic calculus, numerical analysis, scientific computing and statistics, and is linked to many applied areas, including finance, physics and microelectronics. The progress of stochastic modelling in important fields of application will depend to some extent on our ability to master the resulting quantitative challenges.
Acknowledgements

The author gratefully acknowledges the support and criticism of Joe Gani and David Heath in writing this survey.
REFERENCES

M. I. Abukhaled and E. J. Allen (1998), 'A recursive integration method for approximate solution of stochastic differential equations', Intern. J. Comput. Math. 66, 53-66.
M. F. Allain (1974), Sur quelques types d'approximation des solutions d'équations différentielles stochastiques, PhD thesis, Univ. Rennes.
E. J. Allen, S. J. Novosel and Z. Zhang (1998), 'Finite element and difference approximation of some linear stochastic partial differential equations', Stochastics and Stochastics Reports 64, 117-142.
S. L. Anderson (1990), 'Random number generators on vector supercomputers and other advanced structures', SIAM Review 32, 221-251.
NUMERICAL METHODS FOR SDES
M. V. Antipov (1995), 'Congruence operator of the pseudo-random numbers generator and a modification of Euclidean decomposition', Monte Carlo Methods Appl. 1, 203-219.
M. V. Antipov (1996), 'Sequences of numbers for Monte Carlo methods', Monte Carlo Methods Appl. 2, 219-235.
L. Arnold (1974), Stochastic Differential Equations, Wiley, New York.
L. Arnold and P. E. Kloeden (1996), 'Discretization of a random dynamical system near a hyperbolic point', Mathematische Nachrichten 181, 43-72.
S. S. Artemiev (1985), 'A variable step algorithm for numerical solution of stochastic differential equations', Chisl. Metody Mekh. Sploshn. Sredy 16, 11-23. In Russian.
S. S. Artemiev (1993a), Certain aspects of application of numerical methods of solving SDE systems, in Numer. Anal., Vol. 1 of Bulletin of the Novosibirsk Computing Center, NCC Publisher, pp. 1-16.
S. S. Artemiev (1993b), The stability of numerical methods for solving stochastic differential equations, in Numer. Anal., Vol. 2 of Bulletin of the Novosibirsk Computing Center, NCC Publisher, pp. 1-10.
S. S. Artemiev (1994), 'The mean square stability of numerical methods for solving stochastic differential equations', Russ. J. Numer. Anal. Math. Model. 9, 405-416.
S. S. Artemiev and A. T. Averina (1997), Numerical Analysis of Systems of Ordinary and Stochastic Differential Equations, VSP, Utrecht.
S. S. Artemiev and I. O. Shkurko (1991), 'Numerical analysis of dynamics of oscillatory stochastic systems', Soviet J. Numer. Anal. Math. Model. 6, 277-298.
S. Asmussen, P. Glynn and J. Pitman (1995), 'Discretization error in simulation of one-dimensional reflecting Brownian motion', Ann. Appl. Probab. 5, 875-896.
M. A. Atalla (1986), Finite-difference approximations for stochastic differential equations, in Probabilistic Methods for the Investigation of Systems with an Infinite Number of Degrees of Freedom, Collection of Scientific Works, Kiev, pp. 11-16. In Russian.
A. T. Averina and S. S. Artemiev (1986), 'A new family of numerical methods for solving stochastic differential equations', Soviet Math. Dokl. 33, 736-738.
A. T. Averina and S. S. Artemiev (1988), 'Numerical solutions of systems of stochastic differential equations', Soviet J. Numer. Anal. Math. Model. 3, 267-285.
R. Azencott (1982), Stochastic Taylor formula and asymptotic expansion of Feynman integrals, in Séminaire de Probabilités XVI, Supplément, Vol. 921 of Lecture Notes in Math., Springer, pp. 237-285.
L. Bachelier (1900), 'Théorie de la spéculation', Annales de l'École Normale Supérieure, Series 3 17, 21-86.
V. Bally (1989a), 'Approximation for the solution of stochastic differential equations. I: Lp-convergence', Stochastics and Stochastics Reports 28, 209-246.
V. Bally (1989b), 'Approximation for the solution of stochastic differential equations. II: Strong convergence', Stochastics and Stochastics Reports 28, 357-385.
V. Bally (1990), 'Approximation for the solutions of stochastic differential equations. III: Jointly weak convergence', Stochastics and Stochastics Reports 30, 171-191.
V. Bally and D. Talay (1995), 'The Euler scheme for stochastic differential equations: Error analysis with Malliavin calculus', Math. Comput. Simul. 38, 35-41.
V. Bally and D. Talay (1996a), 'The law of the Euler scheme for stochastic differential equations I: Convergence rate of the distribution function', Probability Theory Related Fields 104, 43-60.
V. Bally and D. Talay (1996b), 'The law of the Euler scheme for stochastic differential equations II: Convergence rate of the density function', Monte Carlo Methods Appl. 2, 93-128.
J. Barraquand (1995), 'Monte Carlo integration, quadratic resampling, and asset pricing', Math. Comput. Simul. 38, 173-182.
G. Ben Arous (1989), 'Flots et séries de Taylor stochastiques', Probability Theory Related Fields 81, 29-77.
A. Bensoussan, R. Glowinski and A. Rascanu (1990), 'Approximation of the Zakai equation by the splitting up method', SIAM J. Control Optimiz. 28, 1420-1431.
A. Bensoussan, R. Glowinski and A. Rascanu (1992), 'Approximation of some stochastic differential equations by the splitting up method', Appl. Math. Optim. 25, 81-106.
A. Bjorck and G. Dahlquist (1974), Numerical Methods. Series in Automatic Computation, Prentice-Hall, New York.
F. Black and M. Scholes (1973), 'The pricing of options and corporate liabilities', J. Political Economy 81, 637-659.
N. Bouleau (1990), 'On effective computation of expectations in large or infinite dimension: Random numbers and simulation', J. Comput. Appl. Math. 31, 23-34.
N. Bouleau and D. Lepingle (1993), Numerical Methods for Stochastic Processes, Wiley, New York.
G. Box and M. Muller (1958), 'A note on the generation of random normal variables', Ann. Math. Statist. 29, 610-611.
W. E. Boyce (1978), 'Approximate solution of random ordinary differential equations', Adv. Appl. Probab. 10, 172-184.
P. P. Boyle (1977), 'A Monte Carlo approach', J. Financial Economics 4, 323-338.
P. Bratley, B. L. Fox and L. Schrage (1987), A Guide to Simulation, 2nd edn, Springer, New York.
R. P. Brent (1974), 'A Gaussian pseudo-random number generator', Commun. Assoc. Comput. Mach. 17, 704-706.
K. Burrage (1995), Parallel and Sequential Methods for Ordinary Differential Equations, Clarendon Press, Oxford University Press.
K. Burrage and P. M. Burrage (1996), 'High strong order explicit Runge-Kutta methods for stochastic ordinary differential equations', Appl. Numer. Math. 22, 81-101.
K. Burrage and P. M. Burrage (1998), 'General order conditions for stochastic Runge-Kutta methods for both commuting and non-commuting stochastic ordinary differential equation systems', Appl. Numer. Math. 28, 161-177.
K. Burrage and E. Platen (1994), 'Runge-Kutta methods for stochastic differential equations', Ann. Numer. Math. 1, 63-78.
K. Burrage, P. M. Burrage and J. A. Belward (1997), 'A bound on the maximum strong order of stochastic Runge-Kutta methods for stochastic ordinary differential equations', BIT 37, 771-780.
P. M. Burrage (1998), Runge-Kutta methods for stochastic differential equations, PhD thesis, University of Queensland, Brisbane, Australia.
J. C. Butcher (1987), The Numerical Analysis of Ordinary Differential Equations: Runge-Kutta and General Linear Methods, Wiley, Chichester.
S. Cambanis and Y. Z. Hu (1996), 'Exact convergence rate of the Euler-Maruyama scheme and application to sample design', Stochastics and Stochastics Reports 59, 211-240.
L. L. Casasus (1982), On the numerical solution of stochastic differential equations and applications, in Proceedings of the Ninth Spanish-Portuguese Conference on Mathematics, Vol. 46 of Acta Salmanticensia Ciencias, Univ. Salamanca, pp. 811-814. In Spanish.
L. L. Casasus (1984), On the convergence of numerical methods for stochastic differential equations, in Proceedings of the Fifth Congress on Differential Equations and Applications (Puerto de la Cruz, 1982), Informes 14, Univ. La Laguna, pp. 493-501. In Spanish.
F. Castell and J. Gaines (1995), 'An efficient approximation method for stochastic differential equations by means of the exponential Lie series', Math. Comput. Simul. 38, 13-19.
F. Castell and J. Gaines (1996), 'The ordinary differential equation approach to asymptotically efficient schemes for solution of stochastic differential equations', Ann. Inst. H. Poincare Probab. Statist. 32, 231-250.
K. S. Chan and O. Stramer (1998), 'Weak consistency of the Euler method for numerically solving stochastic differential equations with discontinuous coefficients', Stochastic Process. Appl. 76, 33-44.
C. C. Chang (1987), 'Numerical solution of stochastic differential equations with constant diffusion coefficients', Math. Comput. 49, 523-542.
D. Chevance (1997), Numerical methods for backward stochastic differential equations, in Numerical Methods in Finance (L. C. G. Rogers and D. Talay, eds), Cambridge University Press, pp. 232-244.
J. M. C. Clark (1978), The design of robust approximations to the stochastic differential equations of nonlinear filtering, in Communication Systems and Random Processes Theory (J. K. Skwirzynski, ed.), Vol. 25 of NATO ASI Series E: Applied Sciences, Sijthoff and Noordhoff, Alphen aan den Rijn, pp. 721-734.
J. M. C. Clark (1982a), An efficient approximation scheme for a class of stochastic differential equations, in Advances in Filtering and Optimal Stochastic Control, Vol. 42 of Lecture Notes in Control and Inform. Sci., Springer, pp. 69-78.
J. M. C. Clark (1982b), A nice discretization for stochastic line integrals, in Stochastic Differential Systems, Vol. 69 of Lecture Notes in Control and Inform. Sci., Springer, pp. 131-142.
J. M. C. Clark and R. J. Cameron (1980), The maximum rate of convergence of discrete approximations for stochastic differential equations, in Stochastic Differential Systems (B. Grigelionis, ed.), Vol. 25 of Lecture Notes in Control and Inform. Sci., Springer, pp. 162-171.
D. J. Clements and B. D. O. Anderson (1973), 'Well behaved Ito equations with simulations that always misbehave', IEEE Trans. Automat. Control 18, 676-677.
S. O. Cyganowski (1995), A Maple package for stochastic differential equations, in Computational Techniques and Applications: CTAC95 (A. K. Easton and R. L. May, eds), World Scientific.
S. O. Cyganowski (1996), 'Solving stochastic differential equations with Maple', Maple Tech. 3, 38.
M. I. Dashevski and R. S. Liptser (1966), 'Simulation of stochastic differential equations connected with the disorder problem by means of analog computer', Autom. Remote Control 27, 665-673. In Russian.
G. Denk and S. Schaffer (1997), 'Adams methods for the efficient solution of stochastic differential equations with additive noise', Computing 59, 153-161.
J. Douglas, J. Ma and P. Protter (1996), 'Numerical methods for forward-backward stochastic differential equations', Ann. Appl. Probab. 6, 940-968.
I. T. Drummond, S. Duane and R. R. Horgan (1983), 'The stochastic method for numerical simulations: Higher order corrections', Nuc. Phys. B220 FS8, 119-136.
I. T. Drummond, A. Hoch and R. R. Horgan (1986), 'Numerical integration of stochastic differential equations with variable diffusivity', J. Phys. A: Math. Gen. 19, 3871-3881.
P. D. Drummond and I. K. Mortimer (1991), 'Computer simulation of multiplicative stochastic differential equations', J. Comput. Phys. 93, 144-170.
A. A. Dsagnidse and R. J. Tschitashvili (1975), Approximate integration of stochastic differential equations, Tbilisi State University, Inst. Appl. Math. Trudy IV, Tbilisi, pp. 267-279. In Russian.
J. Eichenauer and J. Lehn (1986), 'A non-linear congruential pseudo random number generator', Statist. Paper 27, 315-326.
A. Einstein (1906), 'Zur Theorie der Brownschen Bewegung', Ann. Phys. IV 19, 371.
R. J. Elliott (1982), Stochastic Calculus and Applications, Springer.
R. J. Elliott and R. Glowinski (1989), 'Approximations to solutions of the Zakai filtering equation', Stoch. Anal. Appl. 7, 145-168.
K. Entacher, A. Uhl and S. Wegenkittl (1998), 'Linear congruential generators for parallel Monte Carlo: the leap-frog case', Monte Carlo Methods Appl. 4, 1-16.
S. M. Ermakov (1975), Die Monte-Carlo-Methode und verwandte Fragen, Hochschulbücher für Mathematik, Band 72, VEB Deutscher Verlag der Wissenschaften, Berlin. In German; translation from Russian by E. Schincke and M. Schleiff.
S. M. Ermakov and Mikhailov (1982), Statistical Modeling, 2nd edn, Nauka, Moscow.
L. Fahrmeier (1974), 'Schwache Konvergenz gegen Diffusionsprozesse', Z. Angew. Math. Mech. 54, 245.
L. Fahrmeier (1976), 'Approximation von stochastischen Differenzialgleichungen auf Digital- und Hybridrechnern', Computing 16, 359-371.
J. F. Feng (1990), 'Numerical solution of stochastic differential equations', Chinese J. Numer. Appl. 12, 28-41.
J. F. Feng, G. Y. Lei and M. P. Qian (1992), 'Second order methods for solving stochastic differential equations', J. Comput. Math. 10, 376-387.
P. Fischer and E. Platen (1998), Applications of the balanced method to stochastic differential equations in filtering, Technical report FMRR 005-98, Financial Mathematics Research Reports, Australian National University, Canberra.
G. S. Fishman (1992), Monte Carlo: Concepts, Algorithms and Applications. Series in Operations Research, Springer.
E. Fournie, J. Lebuchoux and N. Touzi (1997), 'Small noise expansion and importance sampling', Asympt. Anal. 14, 331-376.
R. F. Fox (1991), 'Second-order algorithm for the numerical integration of colored-noise problems', Phys. Rev. A 43, 2649-2654.
J. N. Franklin (1965), 'Difference methods for stochastic ordinary differential equations', Math. Comput. 19, 552-561.
J. G. Gaines (1994), 'The algebra of iterated stochastic integrals', Stochastics and Stochastics Reports 49, 169-179.
J. G. Gaines (1995a), 'A basis for iterated stochastic integrals', Math. Comput. Simul. 38, 7-11.
J. G. Gaines (1995b), Numerical experiments with S(P)DE's, in Proceedings of the ICMS Conference March 1994, Cambridge University Press.
J. G. Gaines and T. J. Lyons (1994), 'Random generation of stochastic area integrals', SIAM J. Appl. Math. 54, 1132-1146.
J. G. Gaines and T. J. Lyons (1997), 'Variable step size control in the numerical solution of stochastic differential equations', SIAM J. Appl. Math. 57, 1455-1484.
T. C. Gard (1988), Introduction to Stochastic Differential Equations, Marcel Dekker, New York.
C. W. Gear (1971), Numerical Initial Value Problems in Ordinary Differential Equations, Prentice-Hall, Englewood Cliffs, NJ.
M. Gelbrich (1995), 'Simultaneous time and chance discretization for stochastic differential equations', J. Comput. Appl. Math. 58, 255-289.
M. Gelbrich and S. T. Rachev (1996), Discretization for stochastic differential equations, Lp Wasserstein metrics, and econometrical models, in Distributions with Fixed Marginals and Related Topics, Vol. 28 of IMS Lecture Notes Monogr. Ser., Inst. Math. Statist., Hayward, CA, pp. 97-119.
J. E. Gentle (1998), Random Number Generation and Monte Carlo Methods. Series in Statistics and Computing, Springer.
A. Gerardi, F. Marchetti and A. M. Rosa (1984), 'Simulation of diffusions with boundary conditions', Systems Control Lett. 4, 253.
I. I. Gikhman and A. V. Skorokhod (1979), The Theory of Stochastic Processes, Vol. I-III, Springer.
S. A. Gladyshev and G. N. Milstein (1984), 'The Runge-Kutta method for calculation of Wiener integrals of functionals of exponential type', Zh. Vychisl. Mat. Mat. Fiz. 24, 1136-1149. In Russian.
P. Y. Glorennec (1977), 'Estimation a priori des erreurs dans la résolution numérique d'équations différentielles stochastiques', Séminaire de Probabilités, Univ. Rennes 1, 57-93.
P. W. Glynn and O. L. Iglehart (1989), 'Importance sampling for stochastic simulations', Management Science 35, 1367-1392.
J. Golec (1995), 'Stochastic averaging principle for systems with pathwise uniqueness', Stoch. Anal. Appl. 13, 307-322.
J. Golec (1997), 'Averaging Euler-type difference schemes', Stoch. Anal. Appl. 15, 751-758.
J. Golec and G. S. Ladde (1989), 'Euler-type approximation for systems of stochastic differential equations', J. Appl. Math. Simul. 2, 239-249.
J. Golec and G. S. Ladde (1990), 'Averaging principle and systems of singularly perturbed stochastic differential equations', J. Math. Phys. 31, 1116-1123.
S. T. Goodlett and E. J. Allen (1994), 'A variance reduction technique for use with the extrapolated Euler method for numerical solution of stochastic differential equations', Stoch. Anal. Appl. 12, 131-140.
L. G. Gorostiza (1980), 'Rate of convergence of an approximate solution of stochastic differential equations', Stochastics 3, 267-276. Erratum in Stochastics 4 (1981), 85.
W. Grecksch and P. E. Kloeden (1996), 'Time-discretised Galerkin approximations of parabolic stochastic PDEs', Bull. Austral. Math. Soc. 54, 79-85.
W. Grecksch and A. Wadewitz (1996), 'Approximation of solutions of stochastic differential equations by discontinuous Galerkin methods', J. Anal. Appl. 15, 901-916.
H. S. Greenside and E. Helfand (1981), 'Numerical integration of stochastic differential equations', Bell Syst. Techn. J. 60, 1927-1940.
A. Greiner, W. Strittmatter and J. Honerkamp (1987), 'Numerical integration of stochastic differential equations', J. Statist. Phys. 51, 95-108.
A. Grorud and D. Talay (1990), Approximation of Lyapunov exponents of stochastic differential systems on compact manifolds, in Analysis and Optimization of Systems, Vol. 144 of Lecture Notes in Control and Inform. Sci., Springer, pp. 704-713.
A. Grorud and D. Talay (1996), 'Approximation of Lyapunov exponents of nonlinear stochastic differential equations', SIAM J. Appl. Math. 56, 627-650.
S. J. Guo (1982), 'On the mollifier approximation for solutions of stochastic differential equations', J. Math. Kyoto Univ. 22, 243-254.
S. J. Guo (1984), 'Approximation theorems based on random partitions for stochastic differential equations and applications', Chinese Ann. Math. 5, 169-183.
I. Gyongy (1991), 'On approximation of Ito stochastic equations', Math. USSR Sbornik 70, 165-173.
I. Gyongy and D. Nualart (1997), 'Implicit scheme for stochastic partial differential equations driven by space-time white noise', Potential Analysis 7, 725-757.
E. Hairer and G. Wanner (1991), Solving Ordinary Differential Equations II: Stiff and Differential Algebraic Systems, Springer.
E. Hairer, S. P. Nørsett and G. Wanner (1987), Solving Ordinary Differential Equations I: Nonstiff Problems, Springer.
J. H. Halton (1960), 'On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals', Numer. Math. 2, 84-90.
J. M. Hammersley and D. C. Handscomb (1964), Monte Carlo Methods, Methuen, London.
C. J. Harris (1976), Simulation of nonlinear stochastic equations with applications in modelling water pollution, in Mathematical Models for Environmental Problems (C. A. Brebbi, ed.), Pentech Press, London, pp. 269-282.
E. Hausenblas (1999a), A Monte-Carlo method with inherited parallelism for solving partial differential equations with boundary conditions numerically, Dept. Math., University of Salzburg, Austria. Paper in progress.
E. Hausenblas (1999b), A numerical scheme using excursion theory for simulating stochastic differential equations with reflection and local time at a boundary, Dept. Math., University of Salzburg, Austria. Paper in progress.
D. C. Haworth and S. B. Pope (1986), 'A second-order Monte-Carlo method for the solution of the Ito stochastic differential equation', Stoch. Anal. Appl. 4, 151-186.
D. Heath and E. Platen (1996), 'Valuation of FX barrier options under stochastic volatility', Financial Engineering and the Japanese Markets 3, 195-215.
E. Helfand (1979), 'Numerical integration of stochastic differential equations', Bell Syst. Techn. J. 58, 2289-2299.
D. B. Hernandez and R. Spigler (1992), 'A-stability of implicit Runge-Kutta methods for systems with additive noise', BIT 32, 620-633.
D. B. Hernandez and R. Spigler (1993), 'Convergence and stability of implicit Runge-Kutta methods for systems with multiplicative noise', BIT 33, 654-669.
D. J. Higham (1998), Mean-square and asymptotic stability of numerical methods for stochastic ordinary differential equations, Strathclyde Mathematics Research Report 39, University of Strathclyde, Glasgow, UK.
N. Hofmann (1994), Beiträge zur schwachen Approximation stochastischer Differentialgleichungen, PhD thesis, Humboldt Universität Berlin.
N. Hofmann (1995), 'Stability of weak numerical schemes for stochastic differential equations', Math. Comput. Simul. 38, 63-68.
N. Hofmann and P. Mathe (1997), 'On quasi-Monte Carlo simulation of stochastic differential equations', Math. Comput. 66, 573-589.
N. Hofmann and E. Platen (1994), 'Stability of weak numerical schemes for stochastic differential equations', Comput. Math. Appl. 28, 45-57.
N. Hofmann and E. Platen (1996), 'Stability of superimplicit numerical methods for stochastic differential equations', Fields Institute Communications 9, 93-104.
N. Hofmann, T. Müller-Gronbach and K. Ritter (1998), Optimal approximation of stochastic differential equations by adaptive step-size control, Preprint Nr. A-9-98, Fachbereich Mathematik, Freie Universität Berlin.
N. Hofmann, E. Platen and M. Schweizer (1992), 'Option pricing under incompleteness and stochastic volatility', Mathematical Finance 2, 153-187.
Y. Z. Hu (1992), Séries de Taylor stochastique et formule de Campbell-Hausdorff d'après Ben Arous, in Séminaire de Probabilités XXVI, Vol. 1526 of Lecture Notes in Math., Springer, pp. 587-594.
Y. Z. Hu (1996), Strong and weak order of time discretization schemes of stochastic differential equations, in Séminaire de Probabilités XXX, Vol. 1626 of Lecture Notes in Math., Springer, pp. 218-227.
Y. Z. Hu and P. A. Meyer (1993), 'On the approximation of multiple Stratonovich integrals', in Stochastic Processes, Springer, pp. 141-147.
Y. Z. Hu and S. Watanabe (1996), 'Donsker's delta functions and approximation of heat kernels by the time discretization methods', J. Math. Kyoto Univ. 36, 499-518.
J. Hull and A. White (1988), 'The use of control variate techniques in option pricing', J. Financial and Quantitative Analysis 23, 237-251.
N. Ikeda and S. Watanabe (1989), Stochastic Differential Equations and Diffusion Processes, 2nd edn, North-Holland, Amsterdam. (1st edn (1981).)
K. Ito (1944), 'Stochastic integral', Proc. Imp. Acad. Tokyo 20, 519-524.
J. Jacod and P. Protter (1998), 'Asymptotic error distribution for the Euler method for stochastic differential equations', Ann. Probab. 26, 267-307.
J. Jacod and A. N. Shiryaev (1987), Limit Theorems for Stochastic Processes, Springer.
A. Janicki (1996), Numerical and Statistical Approximation of Stochastic Differential Equations with Non-Gaussian Measures, H. Steinhaus Center for Stochastic Methods in Science and Technology, Wroclaw, Poland.
A. Janicki and A. Weron (1994), Simulation of Chaotic Behavior of α-stable Stochastic Processes, Vol. 178 of Monographs and Textbooks in Pure and Applied Mathematics, Marcel Dekker, New York.
A. Janicki, Z. Michna and A. Weron (1996), 'Approximation of stochastic differential equations driven by α-stable Lévy motion', Applicationes Mathematicae 24, 149-168.
R. Janssen (1984a), 'Difference-methods for stochastic differential equations with discontinuous coefficients', Stochastics 13, 199-212.
R. Janssen (1984b), 'Discretization of the Wiener process in difference methods for stochastic differential equations', Stochastic Process. Appl. 18, 361-369.
C. Joy, P. P. Boyle and K. S. Tan (1996), 'Quasi Monte Carlo methods in numerical finance', Management Science 42, 926-938.
M. H. Kalos and P. A. Whitlock (1986), Monte Carlo Methods, Wiley-Interscience, New York.
S. Kanagawa (1988), 'The rate of convergence for Maruyama's approximate solutions of stochastic differential equations', Yokohama Math. J. 36, 79-85.
S. Kanagawa (1989), 'The rate of convergence for approximate solutions of stochastic differential equations', Tokyo J. Math. 12, 33-48.
S. Kanagawa (1995), 'Error estimation for the Euler-Maruyama approximate solutions of stochastic differential equations', Monte Carlo Methods Appl. 1, 165-171.
S. Kanagawa (1996), Convergence rates for the Euler-Maruyama type approximate solutions of stochastic differential equations, in Probability Theory and Mathematical Statistics, Proceedings of the Seventh Japan-Russia Symposium, World Scientific, Singapore, pp. 183-192.
S. Kanagawa (1997), 'Confidence intervals of discretized Euler-Maruyama approximate solutions of SDE's', Nonlinear Analysis, Theory, Methods and Applications 30, 4101-4103.
T. Kaneko and S. Nakao (1988), A note on approximations for stochastic differential equations, in Séminaire de Probabilités XXII, Vol. 1321 of Lecture Notes in Math., Springer, pp. 155-162.
D. Kannan and D. T. Wu (1993), 'A numerical study of the additive functional of solutions of stochastic differential equations', Dyn. Sys. Appl. 2, 291-310.
I. Karatzas and S. E. Shreve (1988), Brownian Motion and Stochastic Calculus, Springer.
W. S. Kendall (1993), Doing stochastic calculus with Mathematica, in Economic and Financial Modeling with Mathematica, TELOS, Santa Clara, CA, pp. 214-238.
J. R. Klauder and W. P. Petersen (1985), 'Numerical integration of multiplicative-noise stochastic differential equations', SIAM J. Numer. Anal. 6, 1153-1166.
P. E. Kloeden and R. A. Pearson (1977), 'The numerical solution of stochastic differential equations', J. Austral. Math. Soc. Ser. B 20, 8-12.
P. E. Kloeden and E. Platen (1989), 'A survey of numerical methods for stochastic differential equations', J. Stochastic Hydrology and Hydraulics 3, 155-178.
P. E. Kloeden and E. Platen (1991a), 'Relations between multiple Ito and Stratonovich integrals', Stoch. Anal. Appl. 9, 86-96.
P. E. Kloeden and E. Platen (1991b), 'Stratonovich and Ito stochastic Taylor expansions', Mathematische Nachrichten 151, 33-50.
P. E. Kloeden and E. Platen (1992), 'Higher order implicit strong numerical schemes for stochastic differential equations', J. Statist. Phys. 66, 283-314.
P. E. Kloeden and E. Platen (1992/1995b), Numerical Solution of Stochastic Differential Equations, Vol. 23 of Appl. Math., Springer.
P. E. Kloeden and E. Platen (1995a), Numerical methods for stochastic differential equations, in Nonlinear Dynamics and Stochastic Mechanics, CRC Math. Model. Series, CRC Press, Boca Raton, pp. 437-461.
P. E. Kloeden and W. D. Scott (1993), 'Construction of stochastic numerical schemes through Maple', Maple Technical Newsletter 10, 60-65.
P. E. Kloeden, E. Platen and N. Hofmann (1992a), Stochastic differential equations: Applications and numerical methods, in Proceedings of the 6th IAHR International Symposium on Stochastic Hydraulics, National Taiwan University, Taipei, pp. 75-81.
P. E. Kloeden, E. Platen and N. Hofmann (1995), 'Extrapolation methods for the weak approximation of Ito diffusions', SIAM J. Numer. Anal. 32, 1519-1534.
P. E. Kloeden, E. Platen and H. Schurz (1991), 'The numerical solution of nonlinear stochastic dynamical systems: A brief introduction', J. Bifur. Chaos 1, 277-286.
P. E. Kloeden, E. Platen and H. Schurz (1992b), 'Effective simulation of optimal trajectories in stochastic control', Optimization 1, 633-644.
P. E. Kloeden, E. Platen and H. Schurz (1993), Higher order approximate Markov chain filters, in Stochastic Processes: A Festschrift in Honour of Gopinath Kallianpur (S. Cambanis et al., eds), Springer, pp. 181-190.
P. E. Kloeden, E. Platen and H. Schurz (1994/1997), Numerical Solution of SDEs Through Computer Experiments, Universitext, Springer.
P. E. Kloeden, E. Platen and I. Wright (1992c), 'The approximation of multiple stochastic integrals', Stoch. Anal. Appl. 10, 431-441.
A. Kohatsu-Higa (1997), 'High order Ito-Taylor approximations to heat kernels', J. Math. Kyoto Univ. 37, 129-150.
A. Kohatsu-Higa and S. Ogawa (1997), 'Weak rate of convergence for an Euler scheme of nonlinear SDE's', Monte Carlo Methods Appl. 3, 327-345.
A. Kohatsu-Higa and P. Protter (1994), The Euler scheme for SDEs driven by semimartingales, in Stochastic Analysis on Infinite Dimensional Spaces (H. Kunita and H. H. Kuo, eds), Pitman, pp. 141-151.
W. E. Kohler and W. E. Boyce (1974), 'A numerical analysis of some first order stochastic initial value problems', SIAM J. Appl. Math. 27, 167-179.
Y. Komori and T. Mitsui (1995), 'Stable ROW-type weak scheme for stochastic differential equations', Monte Carlo Methods Appl. 1, 279-300.
Y. Komori, T. Mitsui and H. Sugiura (1997), 'Rooted tree analysis of the order conditions of ROW-type scheme for stochastic differential equations', BIT 37, 43-66.
Y. Komori, Y. Saito and T. Mitsui (1994), 'Some issues in discrete approximate solution for stochastic differential equations', Comput. Math. Appl. 28, 269-278.
R. I. Kozlov and M. G. Petryakov (1986), 'The construction of comparison systems for stochastic differential equations and numerical methods', Nauka Sibirsk Otdel. Novosibirsk, pp. 45-52. In Russian.
T. G. Kurtz and P. Protter (1991a), 'Weak limit theorems for stochastic integrals and stochastic differential equations', Ann. Probab. 19, 1035-1070.
T. G. Kurtz and P. Protter (1991b), Wong-Zakai corrections, random evolutions and simulation schemes for SDE's, in Stochastic Analysis (E. M. E. Meyer-Wolf and A. Schwartz, eds), Academic Press, pp. 331-346.
H. J. Kushner (1974), 'On the weak convergence of interpolated Markov chains to a diffusion', Ann. Probab. 2, 40-50.
H. J. Kushner and P. G. Dupuis (1992), Numerical Methods for Stochastic Control Problems in Continuous Time, Vol. 24 of Applications of Mathematics, Springer, New York.
D. F. Kuznetsov (1998), Some Questions in the Theory of Numerical Solution of Ito Stochastic Differential Equations, Saint Petersburg State Technical University Publisher. In Russian.
A. M. Law and W. D. Kelton (1991), Simulation Modeling and Analysis, 2nd edn, McGraw-Hill, New York.
F. LeGland (1992), Splitting-up approximation for SPDEs and SDEs with application to nonlinear filtering, in Stochastic Partial Differential Equations and their Applications, Vol. 176 of Lecture Notes in Control and Inform. Sci., Springer, Berlin, pp. 177-187.
D. Lepingle (1993), 'An Euler scheme for stochastic differential equations with reflecting boundary conditions', Comptes Rendus Acad. Sci. Paris, Series I Math. 316, 601-605.
D. Lepingle (1995), 'Euler scheme for reflected stochastic differential equations', Math. Comput. Simul. 38, 119-126.
D. Lepingle and B. Ribemont (1991), 'A multistep approximation scheme for the Langevin equation', Stochastic Process. Appl. 37, 61-69.
C. W. Li and X. Q. Liu (1997), 'Algebraic structure of multiple stochastic integrals with respect to Brownian motions and Poisson processes', Stochastics and Stochastics Reports 61, 107-120.
H. Liske (1982), 'Distribution of a functional of a Wiener process', Theory of Random Processes 10, 50-54. In Russian.
H. Liske (1985), 'Solution of an initial-boundary value problem for a stochastic equation of parabolic type by the semi-discretization method', Theory of Random Processes 113, 51-56. In Russian.
H. Liske and E. Platen (1987), 'Simulation studies on time discrete diffusion approximations', Math. Comput. Simul. 29, 253-260.
X. Q. Liu and C. W. Li (1997), 'Discretization of stochastic differential equations by the product expansion for the Chen series', Stochastics and Stochastics Reports 60, 23-40.
J. Ma, P. Protter and J. M. Yong (1994), 'Solving forward-backward stochastic differential equations explicitly: a four step scheme', Probability Theory Related Fields 98, 339-359.
V. Mackevicius (1987), 'Sp-stability of solutions of symmetric stochastic differential equations with discontinuous driving semimartingales', Ann. Inst. H. Poincare Probab. Statist. 23, 575-592.
V. Mackevicius (1994), 'Second order weak approximations for Stratonovich stochastic differential equations', Lietuvos Matem. Rink. 34, 226-247. Translation in Lithuanian Math. Journal 34, 183-200.
V. Mackevicius (1996), Extrapolation of approximations of solutions of stochastic differential equations, in Probability Theory and Mathematical Statistics, World Scientific, River Edge, NJ, pp. 276-297.
Y. Maghsoodi (1994), Mean-square efficient numerical solution of jump-diffusion stochastic differential equations, Preprint OR72, University of Southampton, UK.
Y. Maghsoodi and C. J. Harris (1987), 'In-probability approximation and simulation of nonlinear jump-diffusion stochastic differential equations', IMA J. Math. Control Inform. 4, 65-92.
A. Makroglou (1991), 'Numerical treatment of stochastic Volterra integro-differential equations', J. Comput. Appl. Math. II, 307-313.
F. H. Maltz and D. L. Hitzl (1979), 'Variance reduction in Monte-Carlo computations using multi-dimensional Hermite polynomials', J. Comput. Phys. 32, 345-376.
R. Mannella and V. Palleschi (1989), 'Fast and precise algorithm for computer simulation of stochastic differential equations', Phys. Rev. A 40, 3381-3386.
S. I. Marcus (1981), 'Modeling and approximation of stochastic differential equations driven by semimartingales', Stochastics 4, 223-245.
G. Marsaglia and T. A. Bray (1964), 'A convenient method for generating normal variables', SIAM Review 6, 260-264.
G. Maruyama (1955), 'Continuous Markov processes and stochastic equations', Rend. Circolo Math. Palermo 4, 48-90.
S. Mauthner (1998), 'Step size control in the numerical solution of stochastic differential equations', J. Comput. Appl. Math. 100, 93-109.
R. Merton (1973), 'The theory of rational option pricing', Bell Journal of Economics and Management Science 4, 141-183.
G. A. Mikhailov (1992), Optimization of Weighted Monte Carlo Methods. Series in Computational Physics, Springer.
240
E. PLATEN
R. Mikulevicius and E. Platen (1988), 'Time discrete Taylor approximations for Ito processes with jump component', Mathematische Nachrichten 138, 93-104.
R. Mikulevicius and E. Platen (1991), 'Rate of convergence of the Euler approximation for diffusion processes', Mathematische Nachrichten 151, 233-239.
G. N. Milstein (1974), 'Approximate integration of stochastic differential equations', Theory Probab. Appl. 19, 557-562.
G. N. Milstein (1978), 'A method of second order accuracy integration of stochastic differential equations', Theory Probab. Appl. 23, 396-401.
G. N. Milstein (1985), 'Weak approximation of solutions of systems of stochastic differential equations', Theory Probab. Appl. 30, 750-766.
G. N. Milstein (1987), 'A theorem on the order of convergence of mean-square approximations of solutions of systems of stochastic differential equations', Teor. Veroyatnost. i Primenen 32, 809-811. In Russian.
G. N. Milstein (1988a), Numerical Integration of Stochastic Differential Equations, Urals Univ. Press, Sverdlovsk. In Russian.
G. N. Milstein (1988b), 'A theorem of the order of convergence of mean square approximations of systems of stochastic differential equations', Theory Probab. Appl. 32, 738-741.
G. N. Milstein (1995a), Numerical Integration of Stochastic Differential Equations, Mathematics and its Applications, Kluwer, Dordrecht/Boston/London.
G. N. Milstein (1995b), 'The solving of boundary value problems by numerical integration of stochastic equations', Math. Comput. Simul. 38, 77-85.
G. N. Milstein (1995c), 'Solving the first boundary value problem of parabolic type by numerical integration of stochastic differential equations', Theory Probab. Appl. 40, 657-665.
G. N. Milstein (1996), 'Application of numerical integration of stochastic equations for solving boundary value problems with Neumann boundary conditions', Theory Probab. Appl. 41, 210-218.
G. N. Milstein (1997), 'Weak approximation of a diffusion process in a bounded domain', Stochastics and Stochastics Reports 62, 147-200.
G. N. Milstein and E. Platen (1994), The integration of stiff stochastic differential equations with stable second moments, Technical report SRR 014-94, Australian National University Statistics Report Series.
G. N. Milstein and M. V. Tretjakov (1994), 'Numerical solution of differential equations with colored noise', J. Statist. Phys. 77, 691-715.
G. N. Milstein and M. V. Tretjakov (1997), 'Numerical methods in the weak sense for stochastic differential equations with small noise', SIAM J. Numer. Anal. 34, 2142-2167.
G. N. Milstein, E. Platen and H. Schurz (1998), 'Balanced implicit methods for stiff stochastic systems', SIAM J. Numer. Anal. 35, 1010-1019.
B. J. Morgan (1984), Elements of Simulation, Chapman & Hall, London.
M. Mori (1998), 'Low discrepancy sequences generated by piecewise linear maps', Monte Carlo Methods Appl. 4, 141-162.
T. Müller-Gronbach (1996), 'Optimal design for approximating the path of a stochastic process', J. Statist. Planning Inf. 49, 371-385.
H. Nakazawa (1990), 'Numerical procedures for sample structures on stochastic differential equations', J. Math. Phys. 31, 1978-1990.
N. J. Newton (1986a), 'An asymptotic efficient difference formula for solving stochastic differential equations', Stochastics 19, 175-206.
N. J. Newton (1986b), Asymptotically optimal discrete approximations for stochastic differential equations, in Theory and Applications of Nonlinear Control Systems, North-Holland, pp. 555-567.
N. J. Newton (1990), 'An efficient approximation for stochastic differential equations on the partition of symmetrical first passage times', Stochastics 29, 227-258.
N. J. Newton (1991), 'Asymptotically efficient Runge-Kutta methods for a class of Ito and Stratonovich equations', SIAM J. Appl. Math. 51, 542-567.
N. J. Newton (1994), 'Variance reduction for simulated diffusions', SIAM J. Appl. Math. 54, 1780-1805.
N. J. Newton (1996), 'Numerical methods for stochastic differential equations', Z. Angew. Math. Mech. 76, 211-214. Suppl. 3, I-XVI.
N. J. Newton (1997), Continuous-time Monte Carlo methods and variance reduction, in Numerical Methods in Finance, Newton Institute, Cambridge University Press, Cambridge, pp. 22-42.
H. Niederreiter (1988), 'Remarks on nonlinear pseudo random numbers', Metrika 35, 321-328.
H. Niederreiter (1992), Random Number Generation and Quasi-Monte-Carlo Methods, SIAM, Philadelphia, PA.
H. Niederreiter and P. J.-S. Shiue (1995), Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, Vol. 106 of Lecture Notes in Statistics, Springer.
N. N. Nikitin and V. D. Razevig (1978), 'Methods of numerical modelling of stochastic differential equations and estimates of their error', Zh. Vychisl. Mat. Mat. Fiz 18, 106-117. In Russian.
S. Ogawa (1992), 'Monte Carlo simulation of nonlinear diffusion processes', Japan J. Industrial and Appl. Math. 9, 25-33.
S. Ogawa (1994), 'Monte Carlo simulation of nonlinear diffusion processes II', Japan J. Industrial and Appl. Math. 2, 31-45.
S. Ogawa (1995), 'Some problems in the simulation of nonlinear diffusion processes', Math. Comput. Simul. 38, 217-223.
V. A. Ogorodnikov and S. M. Prigarin (1996), Numerical Modelling of Random Processes and Fields: Algorithms and Applications, VSP, Utrecht.
B. Øksendal (1985), Stochastic Differential Equations, Springer.
E. Pardoux and D. Talay (1985), 'Discretization and simulation of stochastic differential equations', Acta Appl. Math. 3, 23-47.
S. Paskov and J. Traub (1995), 'Faster valuation of financial derivatives', J. Portfolio Manag., pp. 113-120.
W. P. Petersen (1987), Numerical simulation of Ito stochastic differential equations on supercomputers, in Random Media, Vol. 7 of IMA Vol. Math. Appl., Springer, pp. 215-228.
W. P. Petersen (1988), 'Some vectorized random number generators for uniform, normal and Poisson distributions for CRAY X-MP', J. Supercomputing 1, 318-335.
W. P. Petersen (1994a), 'Lagged Fibonacci series random number generators for the NEC SX-3', Intern. J. High Speed Computing 6, 387-398.
W. P. Petersen (1994b), 'Some experiments on numerical simulations of stochastic differential equations and a new algorithm', J. Comput. Phys. 113, 75-81.
W. P. Petersen (1998), 'A general implicit splitting for stabilizing numerical simulations of Ito stochastic differential equations', SIAM J. Numer. Anal. 35, 1439-1451.
R. Petterson (1995), 'Approximations for stochastic differential equations with reflecting convex boundaries', Stochastic Process. Appl. 59, 295-308.
E. Platen (1980a), Approximation of Ito integral equations, in Stochastic differential systems, Vol. 25 of Lecture Notes in Control and Inform. Sci., Springer, pp. 172-176.
E. Platen (1980b), 'Weak convergence of approximations of Ito integral equations', Z. Angew. Math. Mech. 60, 609-614.
E. Platen (1981), 'An approximation method for a class of Ito processes', Lietuvos Matem. Rink. 21, 121-133.
E. Platen (1982a), 'An approximation method for a class of Ito processes with jump component', Lietuvos Matem. Rink. 22, 124-136.
E. Platen (1982b), 'A generalized Taylor formula for solutions of stochastic differential equations', Sankhya A 44, 163-172.
E. Platen (1983), 'Approximation of first exit times of diffusions and approximate solution of parabolic equations', Mathematische Nachrichten 111, 127-146.
E. Platen (1984), Zur zeitdiskreten Approximation von Itoprozessen, Diss. B., IMath, Akad. der Wiss. der DDR, Berlin.
E. Platen (1985), On first exit times of diffusions, in Stochastic differential systems, Vol. 69 of Lecture Notes in Control and Inform. Sci., Springer, pp. 192-195.
E. Platen (1987), Derivative free numerical methods for stochastic differential equations, Vol. 96 of Lecture Notes in Control and Inform. Sci., Springer, pp. 187-193.
E. Platen (1992), 'Higher-order weak approximation of Ito diffusions by Markov chains', Probability in the Engineering and Information Sciences 6, 391-408.
E. Platen (1995), 'On weak implicit and predictor-corrector methods', Math. Comput. Simul. 38, 69-76.
E. Platen and R. Rebolledo (1985), 'Weak convergence of semimartingales and discretization methods', Stochastic Process. Appl. 20, 41-58.
E. Platen and W. Wagner (1982), 'On a Taylor formula for a class of Ito processes', Probability and Math. Statistics 3, 37-51.
P. Protter (1985), 'Approximations of solutions of stochastic differential equations driven by semimartingales', Ann. Probab. 13, 716-743.
P. Protter (1990), Stochastic Integration and Differential Equations, Springer.
P. Protter and D. Talay (1997), 'The Euler scheme for Lévy driven stochastic differential equations', Ann. Probab. 25, 393-423.
I. Radovic, I. M. Sobol and R. F. Tichy (1996), 'Quasi-Monte Carlo methods for numerical integration: Comparison of different low discrepancy sequences', Monte Carlo Methods Appl. 2, 1-14.
N. J. Rao, J. D. Borwankar and D. Ramkrishna (1974), 'Numerical solution of Ito integral equations', SIAM J. Control Optimiz. 12, 124-139.
V. D. Razevig (1980), 'Digital modelling of multi-dimensional dynamics under random perturbations', Autom. Remote Control 4, 177-186. In Russian.
B. D. Ripley (1983a), 'Computer generation of random variables: A tutorial letter', Inter. Statist. Rev. 45, 301-319.
B. D. Ripley (1983b), Stochastic Simulation, Wiley, New York.
W. Römisch and A. Wakolbinger (1987), On the convergence rates of approximate solutions of stochastic equations, in Vol. 96 of Lecture Notes in Control and Inform. Sci., Springer, pp. 204-212.
S. M. Ross (1991), A Course in Simulation, MacMillan, New York.
R. Y. Rubinstein (1981), Simulation and the Monte Carlo Method, Wiley.
W. Rümelin (1982), 'Numerical treatment of stochastic differential equations', SIAM J. Numer. Anal. 19, 604-613.
L. B. Ryashko and H. Schurz (1997), 'Mean square stability analysis of some linear stochastic systems', Dyn. Sys. Appl. 6, 165-189.
K. K. Sabelfeld (1979), 'On the approximate computation of Wiener integrals by Monte-Carlo method', Zh. Vychisl. Mat. Mat. Fiz 19, 29-43. In Russian.
Y. Saito and T. Mitsui (1993a), 'Simulation of stochastic differential equations', Ann. Inst. Statist. Math. 45, 419-432.
Y. Saito and T. Mitsui (1993b), 'T-stability of numerical schemes for stochastic differential equations', World Sci. Ser. Appl. Anal. 2, 333-344.
Y. Saito and T. Mitsui (1995), 'S-series in the Wong-Zakai approximation for stochastic differential equations', Vietnam J. Math. 23, 303-317.
Y. Saito and T. Mitsui (1996), 'Stability analysis of numerical schemes for stochastic differential equations', SIAM J. Numer. Anal. 33, 2254-2267.
O. Schein and G. Denk (1998), 'Numerical solution of stochastic differential-algebraic equations with applications to transient noise simulation of microelectronic circuits', J. Comput. Appl. Math. 100, 77-92.
H. Schurz (1996a), 'Asymptotical mean square stability of an equilibrium point of some linear numerical solutions with multiplicative noise', Stoch. Anal. Appl. 14, 313-354.
H. Schurz (1996b), 'Numerical regularization for SDEs: Construction of nonnegative solutions', Dyn. Sys. Appl. 5, 323-351.
H. Schurz (1996c), Stability, stationarity and boundedness of some implicit numerical methods for stochastic differential equations, PhD thesis, Humboldt University, Berlin.
A. Shimizu and T. Kawachi (1984), 'Approximate solutions of stochastic differential equations', Bull. Nagoya Inst. Tech. 36, 105-108.
M. Shinozuka (1971), 'Simulation of multivariate and multidimensional random differential processes', J. Acoust. Soc. Amer. 49, 357-367.
I. O. Shkurko (1987), Numerical solution of linear systems of stochastic differential equations, in Numerical Methods for Statistics and Modeling, Novosibirsk, pp. 101-109. Collected Scientific Works, in Russian.
I. H. Sloan and H. Wozniakowski (1998), 'When are quasi-Monte-Carlo algorithms efficient for high dimensional integrals?', J. Complexity 14, 1-33.
L. Slominski (1994), 'On approximation of solutions of multidimensional SDEs with reflecting boundary conditions', Stochastic Process. Appl. 50, 197-219.
I. M. Sobol (1967), 'The distribution of points in a cube and the approximate evaluation of integrals', USSR Comput. Math. Math. Phys. 19, 86-112.
J. M. Steele and R. A. Stine (1993), Mathematica and diffusions, in Economic and Financial Modeling with Mathematica, TELOS, Santa Clara, CA, pp. 192-213.
J. Stoer and R. Bulirsch (1993), Introduction to Numerical Analysis, 2nd edn, Springer. (1st edn (1980).)
H. Sugita (1995), 'Pseudo-random number generator by means of irrational rotation', Monte Carlo Methods Appl. 1, 35-57.
M. Sun and R. Glowinski (1994), 'Pathwise approximation and simulation for the Zakai filtering equation through operator splitting', Calcolo 30, 219-239.
H. J. Sussmann (1988), Product expansions of exponential Lie series and the discretization of stochastic differential equations, in Stochastic Differential Systems, Stochastic Control Theory and Applications (W. Fleming and P. L. Lions, eds), Vol. 10 of IMA Vol. Math. Appl., Springer, pp. 563-582.
D. Talay (1982a), Analyse Numerique des Equations Differentielles Stochastiques, PhD thesis, Universite de Provence, Centre Saint Charles. These 3eme Cycle.
D. Talay (1982b), 'Convergence pour chaque trajectoire d'un scheme d'approximation des EDS', Comptes Rendus Acad. Sci. Paris, Series I Math 295, 249-252.
D. Talay (1983a), How to discretize stochastic differential equations, in Nonlinear filtering and stochastic control, Vol. 972 of Lecture Notes in Math., Springer, pp. 276-292.
D. Talay (1983b), 'Resolution trajectorielle et analyse numerique des equations differentielles stochastiques', Stochastics 9, 275-306.
D. Talay (1984), Efficient numerical schemes for the approximation of expectations of functionals of the solution of an SDE and applications, in Filtering and Control of Random Processes, Vol. 61 of Lecture Notes in Control and Inform. Sci., Springer, pp. 294-313.
D. Talay (1986), 'Discretisation d'une equation differentielle stochastique et calcul approche d'esperances de fonctionelles de la solution', Model. Math. et Anal. Numer. 20, 141-179.
D. Talay (1987), Classification of discretization of diffusions according to an ergodic criterion, in Stochastic Modelling and Filtering, Vol. 91 of Lecture Notes in Control and Inform. Sci., Springer, pp. 207-218.
D. Talay (1990), 'Second order discretization schemes of stochastic differential systems for the computation of the invariant law', Stochastics and Stochastics Reports 29, 13-36.
D. Talay (1991), 'Approximation of upper Lyapunov exponents of bilinear stochastic differential equations', SIAM J. Numer. Anal. 28, 1141-1164.
D. Talay (1995), Simulation of stochastic differential systems, in Probabilistic Methods in Applied Physics (P. Kree and W. Wedig, eds), Vol. 451 of Lecture Notes in Physics, Springer, Chapter 3, pp. 54-96.
D. Talay and L. Tubaro (1990), 'Expansion of the global error for numerical schemes solving stochastic differential equations', Stoch. Anal. Appl. 8, 483-509.
U. Tetzlaff and H.-U. Zschiesche (1984), 'Näherungslösungen für Ito-Differentialgleichungen mittels Taylorentwicklungen für Halbgruppen von Operatoren', Wiss. Z. Techn. Hochschule Leuna-Merseburg 2, 332-339.
S. Tezuka (1993), 'Polynomial arithmetic analogue of Halton sequences', ACM Trans. Model. Comput. Simul. 3, 99-107.
S. Tezuka and T. Tokuyama (1994), 'A note on polynomial arithmetic analogue of Halton sequences', ACM Trans. Model. Comput. Simul. 4, 279-284.
C. Torok (1994), 'Numerical solution of linear stochastic differential equations', Comput. Math. Appl. 27, 1-10.
J. F. Traub, G. W. Wasilkowski and H. Wozniakowski (1988), Information-Based Complexity, Academic Press, New York.
C. Tudor (1989), 'Approximation of delay stochastic equations with constant retardation by usual Ito equations', Rev. Roumaine Math. Pures Appl. 34, 55-64.
C. Tudor and M. Tudor (1983), 'On approximation in quadratic mean for the solutions of two parameter stochastic differential equations in Hilbert spaces', An. Univ. Bucuresti Mat. 32, 73-88.
C. Tudor and M. Tudor (1987), 'On approximation of solutions for stochastic delay equations', Stud. Cerc. Mat. 39, 265-274.
C. Tudor and M. Tudor (1995), 'Approximation schemes for Ito-Volterra stochastic equations', Bol. Soc. Mat. Mexicana (3) 1, 73-85.
C. Tudor and M. Tudor (1997), 'Approximate solutions for multiple stochastic equations with respect to semimartingales', Z. Anal. Anwendungen 16, 761-768.
M. Tudor (1992), 'Approximation schemes for two-parameter stochastic equations', Probability and Math. Statistics 13, 177-189.
B. Tuffin (1996), 'On the use of low discrepancy sequences in Monte Carlo methods', Monte Carlo Methods Appl. 2, 295-320.
B. Tuffin (1997), 'Comments on "On the use of low discrepancy sequences in Monte Carlo methods"', Monte Carlo Methods Appl. 4, 87-90.
T. E. Unny (1984), 'Numerical integration of stochastic differential equations in catchment modelling', Water Res. 20, 360-368.
E. Valkeila (1991), 'Computer algebra and stochastic analysis', CWI 4, 229-238.
A. D. Ventzel, S. A. Gladyshev and G. N. Milstein (1985), 'Piecewise constant approximation for the Monte-Carlo calculation of Wiener integrals', Theory Probab. Appl. 24, 745-752.
W. Wagner (1987), 'Unbiased Monte-Carlo evaluation of certain functional integrals', J. Comput. Phys. 71, 21-33.
W. Wagner (1988a), 'Monte-Carlo evaluation of functionals of solutions of stochastic differential equations. Variance reduction and numerical examples', Stoch. Anal. Appl. 6, 447-468.
W. Wagner (1988b), 'Unbiased multi-step estimators for the Monte-Carlo evaluation of certain functionals', J. Comput. Phys. 79, 336-352.
W. Wagner (1989a), Stochastische numerische Verfahren zur Berechnung von Funktionalintegralen, Habilitation, Report 02/89, IMATH, Berlin.
W. Wagner (1989b), 'Unbiased Monte-Carlo estimators for functionals of weak solutions of stochastic differential equations', Stochastics and Stochastics Reports 28, 1-20.
W. Wagner and E. Platen (1978), Approximation of Ito integral equations, Preprint ZIMM, Akad. Wissenschaften, DDR, Berlin.
M. J. Werner and P. D. Drummond (1997), 'Robust algorithms for solving stochastic partial differential equations', J. Comput. Phys. 132, 312-326.
N. Wiener (1923), 'Differential space', J. Math. Phys. 2, 131-174.
E. Wong and M. Zakai (1965), 'On the convergence of ordinary integrals to stochastic integrals', Ann. Math. Statist. 36, 1560-1564.
H. Wozniakowski (1991), 'Average case complexity of multivariate integration', Bull. Amer. Math. Soc. 24, 185-194.
D. J. Wright (1974), 'The digital simulation of stochastic differential equations', IEEE Trans. Automat. Control 19, 75-76.
D. J. Wright (1980), 'Digital simulation of Poisson stochastic differential equations', Intern. J. Systems. Sci. 11, 781-785.
K. Xu (1995), 'Stochastic pitchfork bifurcation: numerical simulations and symbolic calculations using Maple', Math. Comput. Simul. 38, 199.
S. J. Yakowitz (1977), Computational Probability and Simulation, Addison Wesley, Reading, MA.
T. Yamada (1976), 'Sur l'approximation des solutions d'equations differentielles stochastiques', Z. Wahrsch. Verw. Gebiete 36, 153-164.
N. Yannios and P. E. Kloeden (1996), Time-discretization solution of stochastic differential equations, in Proc. CTAC 95 (R. L. May and A. K. Easton, eds), Computational Techniques and Applications: CTAC95, World Scientific, pp. 823-830.
Y. Y. Yen (1988), 'A stochastic Taylor formula for functional of two-parameter semimartingales', Acta Vietnamica 13, 45-54.
Acta Numerica (1999), pp. 247-295
© Cambridge University Press, 1999
Computation of pseudospectra

Lloyd N. Trefethen
Oxford University Computing Laboratory,
Wolfson Building, Parks Road,
Oxford OX1 3QD, England
E-mail: LNT@comlab.ox.ac.uk
There is more to the computation of pseudospectra than the obvious algorithm of computing singular value decompositions on a grid and sending the results to a contour plotter. Other methods may be hundreds of times faster. The state of the art is reviewed, with emphasis on methods for dense matrices, and a MATLAB code is given.
248
L. N. TREFETHEN
CONTENTS
1 Introduction 248
2 Norms and adjoints: matrices A and B 251
3 Spectrum and pseudospectra 252
4 A tutorial example 253
5 Discretization 254
6 Eigenvalues and eigenvectors 256
7 Scalar measures of nonnormality 258
8 Random perturbations 261
9 Contour plots via the SVD 263
10 Avoiding uninteresting sections of the z-plane 266
11 Projection to a lower-dimensional subspace 268
12 Triangularization & inverse iteration or Lanczos 271
13 Summary of speedups discussed so far 275
14 Parallel computation of pseudospectra 275
15 Global Krylov subspace iterations 276
16 Local Krylov subspace iterations 279
17 Curve-tracing for pseudospectral boundaries 280
18 Pseudospectra in Banach spaces 280
19 Pseudospectra and behaviour 282
20 A MATLAB program 284
21 Another example 286
22 Discussion 288
References 289
1. Introduction
A new tool has become popular in the 1990s for the study of matrices and linear operators. The traditional tool is eigenvalues or spectra (for matrices or linear operators, respectively), which may reveal information about the behaviour of systems both linear and nonlinear, including stability, resonance, and accessibility to matrix iterations and preconditioners. Eigenvalues and spectra tend to be less informative, however, when the matrix or operator is non-Hermitian, or more generally, nonnormal (roughly, having nonorthogonal eigenvectors). Pseudospectra are sets in the complex plane that sometimes do better. For each ε > 0, the ε-pseudospectrum of a given matrix or operator is a nonempty set in the complex plane, and the spectrum and the field of values (= numerical range) can be recovered as special cases from the limits ε → 0 and (after peeling away an ε-border region) ε → ∞, respectively. Pseudospectra seem to have been invented independently (with different names) at least five times: by Landau (1975, 1976, 1977), Varah (1979),
COMPUTATION OF PSEUDOSPECTRA
249
Godunov et al. (Godunov 1992 and 1997, Godunov, Kiriljuk and Kostin 1990, Kostin 1991, Godunov, Antonov, Kirilyuk and Kostin 1993), myself (1990, 1992), and Hinrichsen, Pritchard and Kelb (Hinrichsen and Pritchard 1992, Hinrichsen and Kelb 1993). Aside from one plot by J. Demmel (1987), however, they seem not to have been computed numerically before 1990. This situation changed completely in the 1990s, and pseudospectra have now been computed for dozens of applications. Here is a list of some of them, ordered by year of publication.
spectral methods for differential equations (Reddy and Trefethen 1990)
approximate Fourier analysis (Donato 1991)
matrix iterations (Nachtigal, Reichel and Trefethen 1992)
Toeplitz matrices and operators (Reichel and Trefethen 1992)
control theory (Hinrichsen and Pritchard 1992)
random matrices (Trefethen 1992)
Orr-Sommerfeld operator (Reddy, Schmid and Henningson 1993)
Airy operator (Reddy, Schmid and Henningson 1993)
flow in a channel (Trefethen, Trefethen, Reddy and Driscoll 1993)
compressible boundary layer flow (Schmid et al. 1993)
trailing line vortex flow (Schmid et al. 1993)
Wiener-Hopf operators (Reddy 1993)
stiffness of ordinary differential equations (Higham and Trefethen 1993)
convection-diffusion operators (Reddy and Trefethen 1994)
Hille-Phillips and Zabczyk operators (Baggett 1994)
polynomial zerofinding (Toh and Trefethen 1994)
magnetohydrodynamics (Borba et al. 1994)
aerodynamic flutter (Braconnier, Chatelin and Dunyach 1995)
flow down inclined plane (Olsson and Henningson 1995)
rounding error analysis (Chaitin-Chatelin and Frayssé 1996)
reaction-convection-diffusion equations (Higham and Owren 1996)
preconditioners for fluid mechanics (Darmofal and Schmid 1996)
absorbing boundary conditions (Driscoll and Trefethen 1996)
waveform relaxation (Lumsdaine and Wu 1997)
Papkovitch-Fadle problem (Trefethen 1997)
Abel integral operators (Plato 1997)
Ginzburg-Landau equations (Cossu and Chomaz 1997)
non-Hermitian quantum mechanics (Davies 1999a)
differential operators (Davies 1999b)
Markov chain 'cutoff phenomenon' (Jonsson and Trefethen 1998)
Chebyshev polynomials of matrices (Toh and Trefethen 1999a)
flow in a pipe (Trefethen, Trefethen and Schmid 1999)
ionospheric instabilities (Flaherty, Seyler and Trefethen 1999)
lasers and optical resonators (see Section 21).
As is common in the history of scientific computing, this progress has been made possible by developments in both hardware and algorithms. There is an obvious numerical method for plotting pseudospectra: compute an SVD (singular value decomposition) at each point on a grid in the complex plane, then send the results to a contour plotter. However, one can do better, typically by a factor of about N/4 for a problem of dimension N, even without using multiple processors or the techniques of sparse matrices. The aim of this article is to explain the ideas that make this possible.
The style of the article is tutorial. The reader I imagine has an interest in eigenvalue problems for large matrices, probably arising as discretizations of differential or integral operators, and a suspicion that sometimes they do not reveal all they should about his or her problem. Among the questions he or she may ask are: When should I compute pseudospectra? How should I do it? What will they tell me?
Throughout our discussion, ideas for matrices will be formulated in a manner consistent with the fact that, in most applications, the matrices we are dealing with are approximations to infinite-dimensional operators. Since pseudospectra are norm-dependent, it is essential to frame the matrix norms in a manner that permits them to converge to the appropriate continuous norms as the approximation is refined. We handle this by defining the inner product and norm associated with a matrix A with respect to a weighting matrix W, which might, for example, be a diagonal matrix of Gauss quadrature coefficients. The similarity transformation B = WAW^{-1} then provides a matrix B for which the equivalent problem of pseudospectra is associated with the usual Euclidean inner product and norm.
The literature on the numerical computation of pseudospectra is growing, but still manageable, and in this article, all the papers I know of are cited.
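The 'obvious' method just described, with no speedups at all, can be written in a few lines. The article's own code (Section 20) is MATLAB; the following NumPy sketch is merely illustrative, and the function name is an invention of this summary:

```python
import numpy as np

def pseudospectrum_grid(A, x, y):
    """Naive pseudospectra computation: the smallest singular value of
    (zI - A) at each grid point z = x + iy, one full SVD per point.
    The resulting array would then be sent to a contour plotter, with
    level curves drawn at the desired values of epsilon."""
    A = np.asarray(A, dtype=complex)
    N = A.shape[0]
    I = np.eye(N)
    sigmin = np.empty((len(y), len(x)))
    for j, yj in enumerate(y):
        for i, xi in enumerate(x):
            z = xi + 1j * yj
            # sigma_min is the last entry of the sorted singular values
            sigmin[j, i] = np.linalg.svd(z * I - A, compute_uv=False)[-1]
    return sigmin

# Example: a 2x2 Jordan block, a matrix that is far from normal
A = np.array([[0.0, 1.0], [0.0, 0.0]])
x = np.linspace(-1, 1, 5)
y = np.linspace(-1, 1, 5)
S = pseudospectrum_grid(A, x, y)
```

At the eigenvalue z = 0 the computed value is 0; elsewhere it is the reciprocal of the resolvent norm. The cost is one O(N^3) SVD per grid point, which is precisely what the speedups discussed in this article attack.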
Let me acknowledge here at the beginning those authors I am aware of who have published on this subject: C. Bekas, Thierry Braconnier, Martin Brühl, Jean-François Carpraux, Françoise Chaitin-Chatelin, Jocelyne Erhel, Valérie Frayssé, Eduardo Gallestey, Stratis Gallopoulos, Luc Giraud, Sergei Godunov, Vincent Heuveline, Nicholas Higham, Didi Hinrichsen, Viktor Kostin, Shiu-Hong Lui, P. Lavallée, Osni Marques, Alan McCoy, Bernard Philippe, Tony Pritchard, Axel Ruhe, Miloud Sadkane, Valeria Simoncini, Kim-Chuan Toh, Vincent Toumazou, and Anne Trefethen. To any others whom I may have overlooked, my sincere apologies.
For an introduction to the noncomputational aspects of pseudospectra, I recommend Trefethen (1992, 1997, 1999), and Trefethen, Trefethen, Reddy and Driscoll (1993). In 1990, getting a good plot of pseudospectra on a workstation for a 30 x 30 matrix took me several minutes. Today I would expect the same of a 300 x 300 matrix, and pseudospectra of matrices with dimensions in the thousands are around the corner.
2. Norms and adjoints: matrices A and B
Let A be a real or complex matrix or closed linear operator acting in a Hilbert space over the complex numbers C with inner product (·, ·) and corresponding norm || · ||. (The generalization to Banach spaces is discussed in Section 18.) In practice we are so often concerned with matrix discretizations of infinite-dimensional linear operators that it is important to be more explicit. The following manipulations in the context of pseudospectra were perhaps first written down in Section 5 and Appendix A of Reddy, Schmid and Henningson (1993). In the matrix case, we assume that a nonsingular weight matrix W has been prescribed and that (·, ·) and || · || are defined by

    (u, v) = (Wu)^H (Wv) = u^H (W^H W) v,        (2.1)

    ||u||^2 = (u, u) = (Wu)^H (Wu) = u^H (W^H W) u.        (2.2)
Here and throughout this paper in similar contexts, u and v are column vectors and uH, the Hermitian conjugate, is the complex conjugate transpose of u, and similarly for W. Another way to write (2.1) and (2.2) is (u,v) = (Wu,Wv)2,
\\u\\ = \\Wu\\2,
(2.3)
where (u,v)2 = uHv and ||u||2 = uHu, 'the 2-norm'. In applications, W might be \fh times the identity, if A is obtained by discretization on a regular ID grid of spacing h, or it might be a nonconstant diagonal matrix of quadrature weights for discretizations on irregular grids. The adjoint of A, denoted A*, is defined by the condition (Au,v) = (u,A*v) for all u and v in the domains of A and .A*, respectively. In the matrix case, a little calculation shows that A* is given by A* = (WHW)~1AH{WHW).
(2.4)
If W = I, all the complications above vanish and we have (u, v) = (u, v)_2, ||u|| = ||u||_2, and A* = A^H. Alternatively, for general W, we can make the complications go away by introducing the new matrix

    B = W A W^{-1}.        (2.5)

If v = Au for some u and v, then (2.5) implies (Wv) = B(Wu), and by the definition of matrix norms subordinate to vector norms, this implies ||A|| = ||B||_2. More generally, we have ||f(A)|| = ||f(B)||_2 for any function f. From (2.4) we can also compute

    B^H = W A* W^{-1},        (2.6)
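The relations (2.4)-(2.6) are easy to confirm numerically. The following sketch is not from the article; the variable names and the random-sampling estimate of the weighted operator norm are illustrative choices of this summary:

```python
import numpy as np

# Illustrative check of the weighted-norm construction: with
# ||u|| = ||W u||_2, the operator norm of A equals ||B||_2 for
# B = W A W^{-1} (eq. 2.5), and the adjoint
# A* = (W^H W)^{-1} A^H (W^H W) (eq. 2.4) maps to B^H (eq. 2.6).
rng = np.random.default_rng(0)
N = 5
A = rng.standard_normal((N, N))
W = np.diag(rng.uniform(0.5, 2.0, N))   # e.g. a diagonal of quadrature weights
Winv = np.linalg.inv(W)
B = W @ A @ Winv                        # eq. (2.5)

WHW = W.conj().T @ W
Astar = np.linalg.inv(WHW) @ A.conj().T @ WHW   # eq. (2.4)

# ||A|| in the W-norm, estimated as max ||W A u||_2 / ||W u||_2
# over many random vectors u; the estimate never exceeds ||B||_2.
u = rng.standard_normal((N, 1000))
ratios = np.linalg.norm(W @ (A @ u), axis=0) / np.linalg.norm(W @ u, axis=0)
normA_W = ratios.max()
normB_2 = np.linalg.norm(B, 2)          # largest singular value of B
```

The adjoint identity (Au, v) = (u, A*v) in the weighted inner product, and the equality W A* W^{-1} = B^H of (2.6), both hold to rounding error.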
revealing that the same transformation that takes A to B also takes A* to B^H.
The matrix A is normal if AA* = A*A or, equivalently, if A has a complete set of eigenvectors that are orthogonal with respect to the inner product (·, ·). From (2.4) we can calculate that this is the same as the equality BB^H = B^H B or, equivalently, the condition that B has a complete set of eigenvectors that are orthogonal with respect to the inner product (·, ·)_2. For example, A is normal with respect to (·, ·) if it is self-adjoint or skew-adjoint, and B is normal with respect to (·, ·)_2 if it is Hermitian or skew-Hermitian. Sometimes we will say that a matrix is 'highly nonnormal' or 'far from normal', terms with no precise meaning beyond the idea that its eigenvectors, if they exist, are in some sense 'far from orthogonal'.

3. Spectrum and pseudospectra
There is a function f(A) that we care about especially: the resolvent. For any z ∈ C, the resolvent of A at z is the matrix or linear operator

    (z - A)^{-1},

if this exists and is bounded, where z - A is a shorthand for zI - A and I is the identity. The spectrum of A, denoted by Λ(A), is the set of z ∈ C where the resolvent does not exist or is unbounded. The norm of the resolvent is

    ||(z - A)^{-1}|| = ||(z - B)^{-1}||_2,

with B related to A as always by (2.5), and we use the convention that this quantity is defined for all z ∈ C, including points in the spectrum Λ(A) = Λ(B), where it takes the value ∞. For each ε > 0, the ε-pseudospectrum of A is defined by
    Λ_ε(A) = {z ∈ C : ‖(z − A)⁻¹‖ ≥ ε⁻¹}
           = {z ∈ C : ‖(z − B)⁻¹‖₂ ≥ ε⁻¹}.    (3.1)
In words, the ε-pseudospectrum is the subset of the complex plane bounded by the ε⁻¹ level curve or curves of the resolvent norm. (Some authors use a strict inequality; it makes little difference for applications.) For z ∉ Λ(A), since ‖(z − A)⁻¹‖ is the supremum over all unit vectors u and v of the subharmonic functions |(u, (z − A)⁻¹v)|, it is a subharmonic function itself and hence satisfies the maximum principle, which implies that each bounded component of any ε-pseudospectrum contains part of Λ(A). The subharmonicity of the norm of the resolvent was pointed out by Boyd and Desoer (1985) and has been exploited for computational purposes by Gallestey (1998a, 1998b).
COMPUTATION OF PSEUDOSPECTRA
Other conditions can be derived that are equivalent to (3.1). Here is the one that is the most important and most different:

    Λ_ε(A) = {z ∈ C : z ∈ Λ(A + E) for some E with ‖E‖ ≤ ε}    (3.2)
           = {z ∈ C : z ∈ Λ(B + E) for some E with ‖E‖₂ ≤ ε}.
In words, the ε-pseudospectrum is the set of all complex numbers that are in the spectrum of some matrix or operator obtained by a perturbation of norm at most ε. This definition implies that pseudospectra can be interpreted in terms of perturbations of spectra, but this does not mean that the analysis of perturbations is the main thing pseudospectra are useful for. On the contrary, other aspects of the behaviour of a matrix or linear operator tend to be more important in applications, including growth or decay of ‖Aⁿ‖ as a function of n and growth or decay of ‖e^{tA}‖ as a function of t. I admit that over the years I have become exasperated by hearing so many people make the mistake of assuming that pseudospectra, since they can be defined by perturbations, must be a tool for coping robustly with rounding errors. In most applications, rounding errors are not the point at all. A starting point for computations is a third equivalent definition of pseudospectra. If σ_min(A) denotes the smallest singular value of A, then we have

    Λ_ε(A) = {z ∈ C : σ_min(z − B) ≤ ε}.    (3.3)
Thus the pseudospectra of A are the sets in the z-plane bounded by level curves of the function σ_min(z − B). For details of the equivalence of (3.1)-(3.3), see for example van Dorsselaer, Kraaijevanger and Spijker (1993). The mathematical foundations of such material are set forth in the book by Kato (1976).
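As a quick sanity check, the equivalences above are easy to verify numerically on a small example. The following Python/NumPy sketch (an illustration added alongside the article's MATLAB codes; all names here are mine) verifies (2.4)-(2.6) for a diagonal weight matrix W, and the equality of definitions (3.1) and (3.3), namely ‖(z − B)⁻¹‖₂ = 1/σ_min(z − B):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6

# A weighted inner-product space: W is a positive diagonal weight matrix.
w = rng.uniform(0.5, 2.0, N)
W = np.diag(w)
Winv = np.diag(1.0 / w)

A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
B = W @ A @ Winv                          # (2.5): B = W A W^{-1}

# (2.4): the adjoint of A in the weighted inner product
Astar = Winv @ Winv @ A.conj().T @ W @ W

# (2.6): the same similarity transformation takes A* to B^H
check_adjoint = np.allclose(W @ Astar @ Winv, B.conj().T)

# (3.1) vs (3.3): resolvent norm equals reciprocal of smallest singular value
z = 8.0 + 8.0j                            # a point safely outside the spectrum
I = np.eye(N)
smin = np.linalg.svd(z * I - B, compute_uv=False)[-1]
resnorm = np.linalg.norm(np.linalg.inv(z * I - B), 2)
check_resolvent = np.isclose(1.0 / smin, resnorm)
```

Both checks are exact identities, so they hold to machine precision for any W with positive diagonal and any z outside the spectrum.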
4. A tutorial example

To make the discussion concrete, this article is built around a single example of a highly nonnormal differential operator, which we shall treat computationally by a succession of methods. The operator is a time-reduced one-dimensional Schrödinger operator of a standard kind, except that the potential function that defines the operator is complex rather than real:

    Au(x) = u″ + (cx² − dx⁴)u,   c = 3 + 3i,   d = 1/16.    (4.1)
(The constants have been chosen to make the behaviour interesting.) This operator acts on functions defined on the whole real line R. To be precise, the Hilbert space in which A acts is L2 = L2(—oo,oo), and the domain on which it is defined is the subset of L2 of functions that have a second derivative in L2. Roughly speaking, for small x, the potential defining A looks quadratic and complex, whereas for large x it is quartic and nearly
real. Note that A is invariant with respect to negation of x, which implies that if u(x) is an eigenfunction of A with eigenvalue λ, then so is u(−x). In fact, it can be proved that all the eigenvalues of A are simple, and thus each eigenfunction is either even or odd. The observation that complex Schrödinger operators are highly nonnormal and have interesting pseudospectra is due to Brian Davies of King's College London (Davies 1999a). Our example is adapted from Davies' work. Here then is our task. We are presented with an operator such as (4.1) and wish to find out: What do its spectrum and pseudospectra look like? What do they tell us about its behaviour?
5. Discretization

If an operator cannot be handled analytically, the usual course is to approximate it by finite matrices. For computations of ε-pseudospectra, we are typically interested in small values of ε, and thus high-accuracy approximations are desirable. This means that, wherever possible, one should discretize by spectral methods rather than finite differences or finite elements, since spectral methods have arbitrarily high order of accuracy for smooth problems (Canuto et al. 1988, Fornberg and Sloan 1994, Fornberg 1996). For our tutorial example, (4.1) has been discretized by a Chebyshev collocation spectral method on a finite interval [−L, L] with boundary conditions u(±L) = 0. (One could work on [0, L] and separate the even and odd parts of the problem, which are orthogonal, but we did not do this.) For clarity, especially in the treatment of weight functions, let us spell this out. First the interval [−L, L] is approximated by the set of N + 2 Chebyshev points defined by

    x_j = L cos(jπ/(N+1)),   j = 0, …, N + 1.    (5.1)

The operator A is then approximated on this grid by an N × N matrix A_N defined by the following prescription. For any N-vector v, A_N v is the N-vector obtained by two steps: let p be the unique polynomial of degree ≤ N + 1 with p(±L) = 0 and p(x_j) = v_j for 1 ≤ j ≤ N; then, for j = 1, …, N,

    (A_N v)_j = p″(x_j) + (c x_j² − d x_j⁴) p(x_j).
The eigenfunctions of A decay exponentially and, as a consequence, we find that any particular eigenfunction can be computed accurately via the discrete matrices A_N for some finite L. For the portion of the spectrum and pseudospectra considered in this article, L = 10 is sufficiently large and, from now on, all of our numerical examples are based on L = 10. In the context of our spectral discretization, any N-vector v is associated with the continuous function u(x) equal to the polynomial interpolant p(x)
described above for |x| ≤ L and to zero for |x| > L. In particular, for each eigenvector v of A_N, there is an associated continuous function u(x), and if v is sufficiently smooth and decays strongly enough to zero near x = ±L, we expect that u(x) will be close to an eigenfunction of A with approximately the same eigenvalue. The description above is all we need for differentiation, but to compute pseudospectra, we need to integrate, too. For the spectral discretization our weight matrix W will take the form

    W = diag(w₁, w₂, …, w_N)    (5.2)
for suitable weights w_j > 0. A sufficient condition for A_N to converge in some sense to A as N → ∞ is (5.3) for any sequence of indices j with x_j → x, and any reasonable choice satisfying this condition (and indeed many choices that do not satisfy it) will generally produce good plots of pseudospectra. However, much better performance than mere convergence is achievable if we choose the weights based on ideas of Gauss quadrature, and in applications it is important to get this right if one is to be confident of the results. For our Chebyshev grid, there exists a set of Gauss (or Gauss-Chebyshev-Lobatto) weights {w_j} satisfying (5.3) such that

    ∫_{−L}^{L} f(x) dx = 2 Σ_{j=1}^{N} w_j² f(x_j)
if f is any function equal to √(L² − x²) times a polynomial of degree ≤ 2N − 1. These weights are simply

    w_j = ( π √(L² − x_j²) / (2(N+1)) )^{1/2},

and, from now on, these are our choice, with W defined accordingly by (5.2) and B by (2.5). Here is the MATLAB code segment that I used to construct the matrices A and B.

    D = zeros(N+2,N+2); i = (0:N+1)'; ci = [2;ones(N,1);2];
    x = cos(pi*i/(N+1));
    for j = 0:N+1
      cj = 1; if j==0 | j==N+1, cj = 2; end
      denom = cj*(x(i+1)-x(j+1)); denom(j+1) = 1;
      D(i+1,j+1) = ci.*(-1).^(i+j)./denom;
      if j>0 & j<N+1, D(j+1,j+1) = -x(j+1)/(2*(1-x(j+1)^2)); end
    end
    D(1,1) = (2*(N+1)^2+1)/6; D(N+2,N+2) = -(2*(N+1)^2+1)/6;
    L = 10; x = x(2:N+1); x = L*x; D = D/L;
    A = D^2; A = A(2:N+1,2:N+1);
    A = A + (3+3*sqrt(-1))*diag(x.^2) - (1/16)*diag(x.^4);
    w = sqrt(pi*sqrt(L^2-x.^2)/(2*(N+1)));
    B = zeros(N,N); for j = 1:N, B(:,j) = w.*A(:,j)/w(j); end

We now move on to the study of these matrix approximations to the differential operator (4.1). In doing so, however, we must note that, for some problems, it may not be realistic to expect an operator to be approximated by a single matrix. An example arises in the large-scale hydrodynamic stability calculations of Trefethen, Trefethen, Reddy and Driscoll (1993). Here, the operator depends on two Fourier parameters α and β, and, for each choice of the parameters, there is a different discretization matrix A_{αβ}. Computing the resolvent norm at a point z requires the minimization of σ_min(z − A_{αβ}) over all α and β, and the optimal choices vary from one value of z to the next. Situations like this are not unusual in large-scale applications, and when they arise, it may be necessary to consider discretization and computation of pseudospectra in tandem rather than in sequence.
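For readers working outside MATLAB, the construction above translates directly into Python/NumPy. The following sketch is my translation (the function name build_matrices is mine, not from the original); it builds the differentiation matrix by the standard Chebyshev formula with the 'negative sum' trick for the diagonal, imposes the boundary conditions by deleting the first and last rows and columns, and forms B = W A W⁻¹:

```python
import numpy as np

def build_matrices(N, L=10.0):
    """Chebyshev collocation discretization of Au = u'' + (cx^2 - dx^4)u
    on [-L, L] with u(+-L) = 0, c = 3+3i, d = 1/16, together with the
    diagonal weight matrix W giving B = W A W^{-1}."""
    M = N + 1
    i = np.arange(N + 2)
    t = np.cos(np.pi * i / M)                      # Chebyshev points on [-1, 1]
    c = np.ones(N + 2); c[0] = c[-1] = 2.0
    # standard Chebyshev differentiation matrix on the N+2 points
    dT = t[:, None] - t[None, :]
    D = np.outer(c, 1.0 / c) * (-1.0) ** (i[:, None] + i[None, :]) / (dT + np.eye(N + 2))
    D -= np.diag(D.sum(axis=1))                    # diagonal by the 'negative sum' trick
    x = L * t[1:-1]                                # interior points, scaled to [-L, L]
    D = D / L
    D2 = (D @ D)[1:-1, 1:-1]                       # second derivative with u(+-L) = 0
    A = D2 + np.diag((3 + 3j) * x**2 - (1 / 16) * x**4)
    w = np.sqrt(np.pi * np.sqrt(L**2 - x**2) / (2 * M))   # Gauss-Chebyshev-Lobatto weights
    B = (w[:, None] * A) / w[None, :]              # B = W A W^{-1}
    return A, B, x, w
```

A quick consistency check: for the cubic p(x) = x(L² − x²), which vanishes at ±L, the discrete operator reproduces p″(x) = −6x exactly up to rounding, and B is exactly similar to A.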
6. Eigenvalues and eigenvectors

The first things we compute are eigenvalues of A or, equivalently, of B. For matrices of dimension less than 1000 or so, this is easily done by standard 'direct' methods related to the QR algorithm, which deliver results close to machine precision (not counting what is lost to ill-conditioning) in O(N³) floating point operations. For matrices of larger dimensions, Krylov subspace iterations are generally used instead to determine not all the eigenvalues but those in the portion of the complex plane considered important (Lehoucq, Sorensen and Yang 1998, Saad 1992). For our example problem, we can get away with dimensions small enough for direct methods to be appropriate, and Figure 1 shows eigenvalues calculated by standard methods for the spectral approximations A_N with N = 140, 160, 180, 200. As is typical with discretizations of differential operators, it is the eigenvalues closest to the origin that are obtained for the smaller values of N, since these tend to correspond to smoother eigenmodes, resolvable on coarser grids. Once N is large enough that all the eigenvalues in this frame are essentially correct, we observe a Y-shaped distribution, with all the eigenvalues lying in the left half-plane and an infinite curve of them extending towards −∞. Of course, when approaching a problem like this in practice, one must take pains at every step to vary all possible aspects of the discretization systematically until one is confident that the results are
Fig. 1. Eigenvalues of matrix approximations A_N to (4.1) of dimensions N = 140, 160, 180, 200. Dots mark simple eigenvalues and stars mark nearly degenerate (though not exactly degenerate) pairs. As N increases, the two paths on the left zip together into a single line. For N = 200, the values throughout this part of the complex plane are accurate to 3 digits or more, and we take A_200 as our matrix A for subsequent computations. (The labels 1, 3, 16, 50 are utilized in Figure 2.) Another 147 eigenvalues of A_200 lie outside the axis limits to the left.
correct. In some cases convergence theorems will be available to give further reassurance (Chatelin 1983). Figure 1 is reminiscent of Figure 3 and other figures in the paper by Reddy, Schmid and Henningson (1993) on pseudospectra of Orr-Sommerfeld operators. That paper represents an outstanding first example of a study in which pseudospectra of a differential operator were computed carefully. Computation times will occasionally be reported in this article, all based on MATLAB programs executed on a SUN Ultra 30 workstation. To find the eigenvalues of A_200 takes a little more than one second, whereas the smaller matrix A_140 can be handled in less than half a second. It is typical in applications to encounter a picture like Figure 1, which blends some degree of complexity with a great deal of structure. Naturally, one wants to know more, and a first question one may ask is, What do the eigenvectors look like? For this question, A and B are no longer identical: if v is an eigenvector of A, then Wv is the corresponding eigenvector of B. For a plot representing physical space, it is the former that we want, and Figure 2 shows four of the eigenvectors of A_200. The four nearly degenerate eigenvalues in the upper-right branch of the Y correspond to even/odd eigenfunction pairs. For example, 'mode 3', illustrated in the figure, is even, whereas 'mode 4' is odd, but the two eigenvalues differ by less than one part in 10⁸. Modes 1 and 16 are representative of eigenvectors that 'live', loosely speaking, in the quadratic, complex part of the potential, where |x| is small enough that the cx² term dominates the dx⁴ term in (4.1), whereas mode 50 is one for which the dx⁴ term is dominant.
Fig. 2. Eigenvectors corresponding to the four eigenvalues of A_200 labelled in Figure 1. (The mode numbers are sorted by decreasing real part.) The inner curves are the real parts (subject to change with complex scaling), and the outer envelopes are the absolute values and their negatives. These modes are actually more accurate than they look, for they have been plotted by straight line interpolation between grid points, whereas in fact the mathematical model is based on polynomial interpolants. Notice that mode 50, with about 2 points per wavelength, is near the limit of resolution for this grid. [Four panels: mode 1 (even), mode 3 (even), mode 16 (odd), mode 50 (odd).]

7. Scalar measures of nonnormality

Suppose we suspect that A may be highly nonnormal. Before turning to pseudospectra, there are various scalars we might compute in an attempt to shed light on this matter. An early and influential paper on this topic was by Henrici (1962), and further contributions have been due to Chaitin-Chatelin and Fraysse (1996), among others. For simplicity we use the B formulation of Section 2; all our statements have twins for A. One scalar we might consider, which goes back essentially to Henrici, is

    ‖B^H B − B B^H‖₂ / ‖B‖₂².    (7.1)

For the matrix B_200 this ratio has the value 0.01843, a number that seems fully converged for the limit N → ∞ (for B_300 we get 0.01843 again). If the denominator of (7.1) is changed to ‖B²‖₂, as suggested by Chaitin-Chatelin and Fraysse (1996), we get the same results to five digits, and if we further replace the Euclidean norm by the Frobenius norm ‖B‖_F² = Σ_{i,j} |b_{ij}|², to obtain what Chaitin-Chatelin and Fraysse call the Henrici number, the numbers change only modestly, to 0.02602. These results suggest that, in some global sense, B is close to normal. This is a reflection of the fact that, since the coefficient d in (4.1) is real, the nonnormality of this operator is localized to the region of small |x|. Another scalar we might consider, also going back to Henrici, is

    ‖T‖₂ / ‖B‖₂,    (7.2)

where T is the strictly upper-triangular part of a Schur triangularization (unitary triangularization) of B. Different Schur triangularizations may lead to different values of ‖T‖₂, so (7.2) as it stands is not well defined, though it could be made so, at least for theoretical purposes, by taking the infimum over all Schur triangularizations. For B_200, with the triangularization computed arbitrarily by MATLAB, the ratio comes out as 0.02205, again effectively converged for N → ∞ (for B_300 it is 0.02204). Switching to the Frobenius norm, which makes the ratio independent of the Schur triangularization, changes the result to 0.02166 (0.02164). A third scalar we might consider is the distance of B to the set of normal matrices. For matrices measured in the Frobenius norm, this cannot be too far from the previous estimate, according to an inequality established by Laszlo (1994),

    ‖T‖_F / √N ≤ inf{ ‖B − N‖_F : N is normal } ≤ ‖T‖_F.    (7.3)
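The scalar measures of this section are one-liners in any matrix environment. Here is a Python/NumPy sketch (an illustration added here; the function names are mine, and the Frobenius variant is used for (7.2) so that the result does not depend on the particular Schur form):

```python
import numpy as np
from scipy.linalg import schur

def henrici_ratio(B):
    """(7.1): ||B^H B - B B^H||_2 / ||B||_2^2; zero if and only if B is normal."""
    C = B.conj().T @ B - B @ B.conj().T
    return np.linalg.norm(C, 2) / np.linalg.norm(B, 2) ** 2

def departure_ratio(B):
    """Frobenius version of (7.2): ||T||_F / ||B||_F, with T the strictly
    upper-triangular part of a Schur triangularization of B."""
    T, _ = schur(B, output='complex')
    return np.linalg.norm(np.triu(T, 1)) / np.linalg.norm(B)
```

For a normal matrix both ratios vanish, while for a Jordan block (ones on the superdiagonal, zeros elsewhere) both equal 1, the extreme case.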
So far, the departure from normality of our operator appears modest, perhaps too small to be important. However, there is a further scalar that tells a different story. If v₁, …, v_N are a set of linearly independent eigenvectors of B, each normalized by ‖v_j‖₂ = 1 (the normalization is not necessary, just convenient), then an eigenvector matrix for B is an N × N matrix V whose columns are these vectors taken in any order. The condition number of V is the real number

    κ₂(V) = ‖V‖₂ ‖V⁻¹‖₂,

which is necessarily ≥ 1. The Bauer-Fike Theorem asserts that if B is perturbed by E, then the eigenvalues move by at most κ₂(V)‖E‖₂. If κ₂(V) = 1, then V must be unitary and B must be normal. If κ₂(V) ≫ 1, on the other hand, perhaps there is a need to look beyond eigenvalues. For our example B_200 we find κ₂(V) = 2.83 × 10¹². Evidently the matrix of eigenvectors of B is very ill-conditioned indeed. This number, unlike our previous ones, is not quite converged for N → ∞; for B_160 we get κ₂(V) = 8.95 × 10¹¹ and for B_240 we get κ₂(V) = 3.74 × 10¹². However, convergence to a finite value as N → ∞ does seem to take place;
B_300 gives κ₂(V) = 3.79 × 10¹² and B_400 gives κ₂(V) = 3.75 × 10¹². In particular, the conclusion that κ₂(V) is of order 10¹² is genuine, and is not a symptom of the machine precision of our computer. Are some of the individual eigenvalues to blame for this pronounced nonnormality? To find out, we can look at the condition numbers of the eigenvalues, defined for a simple eigenvalue λ (Wilkinson 1965, p. 68) by

    κ(λ) = 1 / |w^H v|,    (7.4)
where w and v are normalized left and right eigenvectors of B corresponding to the eigenvalue λ, respectively. The significance of κ(λ) is that a perturbation B → B + E may alter the eigenvalue λ by as much as κ(λ)‖E‖₂ (in the limit of infinitesimal perturbations), but not more. Each eigenvalue necessarily has κ(λ) ≤ κ₂(V). The condition numbers of some of the eigenvalues of our operator are indicated in Figure 3. Evidently the eigenvalues near the origin, in the bottom-right part of the Y, are well conditioned. As one moves towards the fork of the Y, however, the condition numbers increase to about 10¹¹. (The behaviour in the line of nearly degenerate eigenvalue pairs is similar.) If we continue past the fork further into the left half-plane, κ(λ) increases gradually to a maximum of about 3.6 × 10¹¹ for N = 200, which becomes 4.3 × 10¹¹ for N = 240. It is apparent that the extreme ill-conditioning of the eigenvector matrix of B is reflected in the individual eigenvalues, but not just in one of them - in nearly all. Evidently there is a collective phenomenon at play here, a pattern that transcends individual eigenmodes, and, indeed, this is perhaps as far as scalar measures of nonnormality can usefully take us.

8. Random perturbations

Having decided to move beyond scalars, we begin to think about pseudospectra. Now pseudospectra are sets in the complex plane, and the first question to ask is, What does it mean to compute them? Do we want a picture of boundaries of Λ_ε(A) for various values of ε? Do we want some kind of surface plot? Do we want an approximate functional representation of the function ‖(z − A)⁻¹‖? The customary answer in the literature to date has been the first of these options, a graphical picture of boundary contours, and that is what we consider in this article, but it is possible that other variations will become popular in the future.
There is a simple idea for producing approximate pictures of pseudospectra: modify A by one or more small complex (even if A is real) random perturbations and look at the spectra of these perturbations. If A is a matrix, the idea of a random perturbation is well defined. Working as usual with the equivalent matrix B, we note that the set of all possible perturbations E of a given norm ε is compact and can be uniformly sampled by taking
Fig. 3. Condition numbers κ(λ) of some of the eigenvalues of the matrix B = B_200. The condition numbers of the four starred nearly degenerate eigenvalue pairs, from right to left, are 1.6 × 10⁶, 6.7 × 10⁷, 1.6 × 10⁹, and 2.9 × 10¹⁰.
random N × N matrices D with entries σ + iτ, where σ and τ are independent standard normal variables, and then setting E = εD/‖D‖₂. Since ‖D‖₂ ~ √(2N) as N → ∞, where N is the dimension, approximately the same effect can be achieved with less computation by the formula E = εD/√(2N). The approximation of pseudospectra by random perturbations was illustrated by Trefethen (1992), where for each of thirteen example matrices with N = 32, 100 random perturbations A + E were considered and the 3200 resulting eigenvalues superimposed as small dots. Possibly the first computed examples of this kind were published by Trefethen (1990). In investigating random perturbations, there is no need to consider matrices E of full rank. As pointed out perhaps first by Riedel (1994), the boundary of the ε-pseudospectrum can equally well or better be traced by matrices of rank one, and so an alternative to the formulas above would be D = xy^H, where x and y are vectors of independent entries σ + iτ, followed by E = εD/‖D‖₂ = εD/(‖x‖₂ ‖y‖₂). Proceeding in this way, we may trace the boundaries of the pseudospectra somewhat more efficiently, and there is no need for the computation of the norm of a matrix.
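The rank-one recipe just described is only a few lines of code. Here is a Python/NumPy sketch (the function name is mine; the article's own experiments are in MATLAB):

```python
import numpy as np

def perturbed_eigs(B, eps, trials=10, seed=0):
    """Poor man's pseudospectra: eigenvalues of 'trials' random rank-one
    perturbations B + E with ||E||_2 = eps exactly, E = eps x y^H/(||x|| ||y||)."""
    rng = np.random.default_rng(seed)
    N = B.shape[0]
    pts = []
    for _ in range(trials):
        x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
        y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
        E = eps * np.outer(x, y.conj()) / (np.linalg.norm(x) * np.linalg.norm(y))
        pts.append(np.linalg.eigvals(B + E))
    return np.concatenate(pts)
```

Plotting the returned points as dots gives pictures in the spirit of Figure 4. Because the 2-norm of a rank-one matrix xy^H is ‖x‖₂‖y‖₂, each E here has norm exactly ε with no matrix norm computation. For a highly nonnormal example such as an 8 × 8 Jordan block, even ε = 10⁻⁸ scatters the perturbed eigenvalues to distances of order ε^{1/8} ≈ 0.1 from the spectrum.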
As a practical matter, random perturbations are a valuable tool that should be used routinely in dealing with highly nonnormal matrices. Random perturbations are more important than the scalar measures of nonnormality discussed in the last section, for they reveal more without being much more expensive to calculate. To illustrate how compelling this technique may be, Figure 4 shows the eigenvalues of one random rank one perturbation in each case of the form B → B + E, where B = B_200 for our problem and ‖E‖₂ = ε = 10⁻¹, 10⁻³, 10⁻⁵, 10⁻⁷. (The results look about the same for perturbations of full rank.) For the first time we begin to see the 'shape' of this matrix A. As we would expect on the basis of Figure 3, the degree of nonnormality is pronounced all along the tail of the Y extending into the left half-plane, and reasonably uniform along that path. There are three problems with the technique of computing eigenvalues of random perturbations. One is that it gives only an approximate picture of the pseudospectra. Another is that pictures of this kind all too easily mislead people into presuming that the main point of analysis of pseudospectra is the investigation of perturbations. Finally, if A is an operator of infinite dimension, the notion of a random perturbation does not make sense, because E must range over a space that is not compact and thus cannot be sampled uniformly. In practice, therefore, I recommend that one begin by computing eigenvalues of random perturbations of finite matrices, without worrying too much what the precise definition of 'random' is in the case of a discretized operator, but then move on to other methods if the sensitivity to perturbations of the physically important eigenvalues is large. A different aspect of random perturbations is that, in some applications, a perturbation of a structured kind may reveal certain algebraic properties of a matrix or operator.
An important special case is that, if A is real, plotting the eigenvalues of real perturbations A + E may reveal the Jordan structure of A; the 'spider plots' of Chaitin-Chatelin and Fraysse that show this effect are beautiful and fascinating, and one of them appears on the cover of their book (Chaitin-Chatelin and Fraysse 1996). A systematic study of structured perturbations has been the subject of a number of papers by Hinrichsen and Pritchard and Kelb (Hinrichsen and Pritchard 1992, 1994, Hinrichsen and Kelb 1993).
9. Contour plots via the SVD

Let us now calculate pseudospectra properly. The place to begin is with the singular value decomposition, taking advantage of definition (3.3). The obvious algorithm is to evaluate σ_min(z − B) for values of z on a grid in the complex plane and then generate a contour plot from these data. (If B is real, the picture will be symmetric with respect to the real axis, and one halves the computation time by taking advantage of this symmetry.)
Fig. 4. Poor man's pseudospectra: eigenvalues of random rank one perturbations B_200 + E, ‖E‖₂ = ε = 10⁻¹, 10⁻³, 10⁻⁵, 10⁻⁷. The great sensitivity to perturbations confirms that B_200 is a highly nonnormal matrix.
Fig. 5. Boundaries of ε-pseudospectra of the matrix A = A_200 for ε = 10⁻¹, 10⁻², …, 10⁻¹⁰, from outside in. This is a fine picture, but producing it by the obvious SVD-based algorithm involving a 100 × 100 grid requires 4 hours of computing time on a SUN Ultra 30 workstation.

Here, for example, is a MATLAB code fragment for this kind of computation, assuming ν (nu) points in each direction on the grid:

    I = eye(size(B));
    for j = 1:nu
      for i = 1:nu
        z = zz(i,j);
        sigmin(i,j) = min(svd(z*I-B));
      end
    end
    contour(x,y,log10(sigmin),-10:-1);
Figure 5 shows numerically computed pseudospectra for our matrix B = B_200. This is typical of dozens of images of pseudospectra that have appeared in the literature since Trefethen (1992). For this image, σ_min(z_{ij} − B) was evaluated for 10,000 points z_{ij} on a 100 × 100 regular grid in a square portion of the complex plane, and the resulting values were given as data to MATLAB's contour plotter, just as in the code fragment above. (Whether or
not one introduces the logarithm makes negligible difference.) We see at a glance that the eigenvalues in the two finite branches of the Y have sensitivities that increase as one approaches the fork, and that the eigenvalues along the infinite branch to the left of the fork have roughly constant sensitivities on the order of 10¹⁰, as we knew already from Figure 3. Figure 5 has a striking feature, which would prove important in many applications: though the spectrum is in the left half-plane, the pseudospectra protrude significantly into the right half-plane. We shall say more about this in Section 19. The trouble with Figure 5 is that producing it by the method we have described involves a disturbingly long computation. Computing the SVD of an N × N matrix at each point of a ν × ν grid requires O(ν²N³) floating-point operations, and for N = 200 and ν = 100, as in this figure, the computation time on my workstation works out to about 4 hours. For N = 1000, it would rise to three weeks - or possibly much longer because of memory limitations. Of course, we can speed up the calculation by using a coarser grid, and in practice one would usually do this in the exploratory phase of any project. Figure 6 illustrates pseudospectra plotted on four different grids corresponding to ν = 5, 10, 20, 100. Yet this set of plots mainly serves to emphasize the need for better computational methods. Roughly speaking, one might say that only the first of the four plots of Figure 6 is satisfactory in terms of computing time, and only the last is satisfactory in terms of appearance. The next three sections will describe three ways to accelerate this computation, which can be used in combination. For our example, the speedups achieved are factors of approximately 1.5, 8, and 8, and when the methods are combined, we get a speedup by a factor of better than 60.

10. Avoiding uninteresting sections of the z-plane

The first way to speed up the calculation of pseudospectra is the simplest: avoid computing singular values in uninteresting regions of the complex plane, where the resolvent norm is small and there are no boundaries of the pseudospectra of interest. For our example, we note that in about a third of the portion of C shown in our plots, not much is happening, and it is a waste of time to evaluate ‖(z − A)⁻¹‖. Bypassing this step should accelerate the computation by a worthwhile constant factor. We find that a crude device of this sort improves the computation time for Figure 5 from about 4 hours to 2.5 hours, a speedup by a factor of about 1.5. Some interesting ideas for automating this kind of acceleration have been proposed by Gallestey (1998a, 1998b) under the name of the SH algorithm ('subharmonic'). Gallestey divides the region of C of interest iteratively into squares of various sizes and then uses the maximum principle for ‖(z − A)⁻¹‖ to prune away squares automatically on which nothing interesting can be
Fig. 6. Pseudospectra as in Figure 5 computed on successively finer grids; the points z at which σ_min(z − A) has been evaluated are marked by dots. For ν = 5, there are just 25 SVDs to evaluate and the computation is fast, but the plot is crude. The 'publication quality' grid with ν = 100 is prohibitively expensive.
expected to be happening. Any 'industrial strength' software package for computing pseudospectra should incorporate ideas like these. A related, pointwise variant of the same idea is relevant to the various iterative methods for computing σ_min(z − B) considered below: suppose it becomes clear early in an iteration at some point z that σ_min(z − B) is of order 1, while we only want to plot level curves below 10⁻¹. In this case it may be advantageous to terminate an iteration before convergence.
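A crude version of this exclusion idea can be automated by exploiting the fact that σ_min(z − B) is a nonexpansive (1-Lipschitz) function of z. The Python/NumPy sketch below (all names are mine; this is only a pointwise cousin of Gallestey's SH algorithm, not that algorithm itself) makes a coarse pre-pass and then skips the SVD at any fine grid point whose nearby coarse value, minus the distance to it, already exceeds the largest contour level of interest:

```python
import numpy as np

def sigmin(B, z):
    """Smallest singular value of z - B, as in definition (3.3)."""
    return np.linalg.svd(z * np.eye(B.shape[0]) - B, compute_uv=False)[-1]

def grid_sigmin_excluding(B, xs, ys, eps_max, stride=4):
    """sigma_min(z - B) on a fine grid, skipping points certified uninteresting
    by a coarse pre-pass and the bound |sigmin(z1) - sigmin(z2)| <= |z1 - z2|."""
    S = np.full((len(ys), len(xs)), np.nan)
    for i in range(0, len(ys), stride):            # coarse pre-pass
        for j in range(0, len(xs), stride):
            S[i, j] = sigmin(B, xs[j] + 1j * ys[i])
    for i in range(len(ys)):
        for j in range(len(xs)):
            if not np.isnan(S[i, j]):
                continue
            ic, jc = stride * (i // stride), stride * (j // stride)  # nearby coarse point
            dist = abs((xs[j] - xs[jc]) + 1j * (ys[i] - ys[ic]))
            if S[ic, jc] - dist > eps_max:
                continue                            # certified sigma_min > eps_max: skip
            S[i, j] = sigmin(B, xs[j] + 1j * ys[i])
    return S
```

Skipped points remain NaN, which contour routines simply leave blank; by the Lipschitz bound, no contour at level eps_max or below can pass through a certified point.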
11. Projection to a lower-dimensional subspace

The second way to speed up the calculation of pseudospectra, independent of the first, is to reduce the dimension of the N × N matrix A by orthogonal projection onto an invariant subspace of dimension n < N. The idea is that in many applications, most of the 'action' of interest can be captured by the lower-dimensional projection. This technique was perhaps first employed by Reddy, Schmid and Henningson (1993) and is described in Appendix B of that paper and in Section 6 of Toh and Trefethen (1996). It is elementary, but crucial in practice, and too often overlooked. We can often get an improvement in this way by a factor of 10 or more. Following the authors just cited, we first describe a procedure of this kind based on explicit matrix diagonalization. Suppose V is an N × n matrix whose columns are selected linearly independent eigenvectors of B, satisfying BV = VD for some n × n diagonal matrix D of corresponding eigenvalues. If V = QR is a QR decomposition of V, with Q of dimension N × n and R of dimension n × n (Trefethen and Bau 1997), then we have Q^H V = R and Q = V R⁻¹ and therefore

    Q^H B Q = Q^H B V R⁻¹ = Q^H V D R⁻¹ = R D R⁻¹.

Thus T = R D R⁻¹, which is an upper-triangular n × n matrix, is the matrix representation of the projection of B onto the subspace spanned by the selected eigenvectors. We can illustrate this projection process by the following MATLAB code segment, which projects B onto the invariant subspace corresponding to eigenvalues λ with Re λ > γ for some constant γ. Of course, different selections of special eigenvalues will be appropriate in other applications (see, for instance, Section 21).

    [V,D] = eig(B); eigB = diag(D);
    select = find(real(eigB) > gamma);
    V = V(:,select); D = D(select,select);
    [Q,R] = qr(V,0); T = R*D/R;
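In Python/NumPy the diagonalization-based projection reads as follows (a sketch added for illustration; the function name is mine):

```python
import numpy as np

def project_invariant(B, gamma):
    """Project B onto the invariant subspace spanned by eigenvectors whose
    eigenvalues have real part > gamma; returns the n x n upper-triangular
    T = R D R^{-1} and the orthonormal basis Q (N x n)."""
    lam, V = np.linalg.eig(B)
    sel = lam.real > gamma
    V, D = V[:, sel], np.diag(lam[sel])
    Q, R = np.linalg.qr(V)               # reduced QR: Q is N x n
    T = R @ D @ np.linalg.inv(R)         # upper-triangular projection of B
    return T, Q
```

Since R is upper-triangular and D diagonal, T = R D R⁻¹ is upper-triangular, its diagonal entries are the selected eigenvalues, and Q^H B Q = T.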
Fig. 7. Acceleration by preliminary projection onto an invariant subspace of dimension n < N. For this example we consider just eigenvalues of real parts > γ for various γ. [Panels: γ = −100, n = 53, 7 minutes; γ = −150, n = 66, 13 minutes; γ = −250, n = 92, 34 minutes.]
Figure 7 illustrates the effect of applying this projection to our matrix B = B_200 with γ = −50, −100, −150, and −250. As more eigenvalues are included, n rises from 37 to 53 to 66 to 92, but this is still far less than 200, and as the final operation count depends on n³, this is a very significant improvement - about a factor of eight in this example. A peculiar feature of the projection process just described is that it makes use of a matrix diagonalization. This sounds like a bad idea, since in applications B will often be highly nonnormal or even nondiagonalizable, implying that its eigenvalue problem may be very badly conditioned. In practice, it seems that the use of diagonalization does not cause much trouble, for reasons of backward error analysis. Though each individual numerically computed eigenvalue and eigenvector of a highly nonnormal matrix may be very much in error, their collective behaviour is generally better. Nevertheless, it seems that in principle one ought to avoid the diagonalization, and this can be done by using a Schur decomposition (unitary triangularization) instead. Suppose a unitary similarity transformation is found of the form

    Q^H B Q = [ T  X ]
              [ 0  Y ],    (11.1)

where Q ∈ C^{N×N} is unitary, T ∈ C^{n×n} is upper-triangular, and X ∈ C^{n×(N−n)} and Y ∈ C^{(N−n)×(N−n)} are arbitrary. If Q₁ ∈ C^{N×n} is the matrix consisting of the first n columns of Q, then (11.1) implies BQ₁ = Q₁T, which implies that if Tx = λx, then B(Q₁x) = λ(Q₁x). Thus the diagonal entries of T are n of the eigenvalues of B, and T is the projection of B onto the corresponding invariant subspace. The factorization (11.1) is known as a partial Schur decomposition (Dongarra, Duff, Sorensen and van der Vorst 1998, Lehoucq, Sorensen and Yang 1998). Since X and Y are arbitrary, all that is really involved here is the determination of an N × n matrix Q₁ with orthonormal columns such that T = Q₁^H B Q₁ is upper-triangular.
Such a matrix might be found by various methods, but we shall consider just the simplest: computing a complete Schur decomposition and then reordering the diagonal entries to bring those of interest to the upper-left. Reorderings of this kind are a standard option in LAPACK (Anderson et al. 1995). They are not standard in the current version of MATLAB, but the desired effect can be achieved using the following code segment adapted from programs of Diederik Fokkema (Fokkema 1996, Fokkema, Sleijpen and van der Vorst 1999):

    [U,T] = schur(B);
    if isreal(B), [U,T] = rsf2csf(U,T); end, T = triu(T);
    eigB = diag(T);
    select = find(real(eigB) > gamma);
COMPUTATION OF PSEUDOSPECTRA
    n = length(select);
    for i = 1:n
      for k = select(i)-1:-1:i
        G([2 1],[2 1]) = planerot([T(k,k+1) T(k,k)-T(k+1,k+1)]')';
        J = k:k+1;
        T(:,J) = T(:,J)*G;
        T(J,:) = G'*T(J,:);
      end
    end
    T = triu(T(1:n,1:n));

Like the one given earlier, this code segment produces an upper-triangular matrix T corresponding to the projection of B onto the selected subspace. Orthogonal projections have a monotonicity property: they never increase the resolvent norm at any point z. It follows that if Λ_ε(T) is the ε-pseudospectrum of the projected matrix, then Λ_ε(T) ⊆ Λ_ε(B), with the ε-pseudospectra of T increasing monotonically to those of B as successively larger invariant subspaces are selected.

Our orthogonal projections can be viewed as a special case of a more general class of two-sided projections that may be applied for problems of computing pseudospectra. These have been studied under the name of transfer functions by Hinrichsen, Pritchard and Kelb (Hinrichsen and Pritchard 1992, Hinrichsen and Kelb 1993) and by Simoncini and Gallopoulos (1998). Finally, it should be mentioned that a different projection idea has also been advocated by Godunov and Sadkane (1996): the numerical use of resolvent integrals (Kato 1976) for the computation of projections associated with subsets of C.

12. Triangularization and inverse iteration or Lanczos

A third major new idea for speeding up the computation of pseudospectra was introduced by S.-H. Lui in an article published in 1997 (Lui 1997). Lui's method is described in his own paper and elsewhere as a method of 'continuation', but his key contribution is really the technique of triangularization followed by inverse iteration or inverse Lanczos iteration. The idea is as follows. If B is a dense matrix, the computation of the smallest singular value of each N × N matrix z − B takes O(N^3) operations, for a total of O(ν^2 N^3) operations on a ν × ν grid.
However, suppose that, before computing any singular values, we perform a Schur decomposition, with or without compression, to replace B by a unitarily equivalent upper-triangular matrix T. Then for any z, z − B is unitarily equivalent to the upper-triangular matrix z − T, and hence will have the same singular values. Since z − T is triangular, however, its smallest singular value can be computed in O(N^2) rather than O(N^3) operations. Thus, at the price of a
single computation involving O(N^3) operations, we have reduced the cost of each subsequent SVD to O(N^2). The overall improvement is from O(ν^2 N^3) to O(N^3 + ν^2 N^2) floating point operations, which for most applications is effectively an improvement to O(ν^2 N^2). If B has been orthogonally projected onto a lower-dimensional invariant subspace as described in the last section, then it has been rendered triangular already. In this case there is no need for a further Schur triangularization.

It remains to describe how σ_min(z − T) can be computed in O(N^2) operations. The idea for this is that σ_min(z − T) is the square root of the smallest eigenvalue of (z − T)^H (z − T), and this can be computed by various iterations; since T is triangular, each step requires only O(N^2) operations. The simplest method is inverse iteration applied to (z − T)^H (z − T), that is, power iteration applied to (z − T)^{-1} (z − T)^{-H}. (An early use of inverse iteration for computing pseudospectra, without triangularization, was by Baggett (1994).) For example, the following rather crudely put together MATLAB code segment is functionally equivalent to the shorter code on p. 265, but many times faster for matrices of larger dimensions.

    I = eye(size(B));
    [U,T] = schur(B);
    if isreal(B), [U,T] = rsf2csf(U,T); end, T = triu(T);
    for j = 1:nu
      for i = 1:nu
        z = zz(i,j); T1 = z*I-T; T2 = T1';
        v = randn(n,1) + sqrt(-1)*randn(n,1); v = v/norm(v);
        sigold = 0;
        for k = 1:99
          v = T1\(T2\v);
          sig = norm(v);
          if abs(sigold/sig-1) < .001, break, end
          sigold = sig; v = v/sig;
        end
        sigmin(i,j) = 1/sqrt(sig);
      end
    end
    contour(x,y,log10(sigmin+1e-20),-10:-1);
The main part of this code is a double loop just as on p. 265, except that inside the loop, σ_min(z_ij − B) is now computed by inverse iteration. The convergence criterion used here is crude: we stop when two successive estimates of σ_min(z_ij − B)^{-2} agree to a tenth of a percent, taking up to a maximum of 99 steps.
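Outside MATLAB the same idea can be sketched with triangular solves; this Python version (an illustrative translation, mine, with an arbitrary random test matrix) verifies the estimate against a full SVD. Because each power step applies (z − T)^{-1}(z − T)^{-H} to a unit vector, the running estimate approaches σ_min from above.

```python
import numpy as np
from scipy.linalg import schur, solve_triangular, svdvals

def smin_inverse_iteration(T, z, tol=1e-6, maxit=500, seed=0):
    """Estimate sigma_min(z - T) for upper-triangular T by power iteration
    on (z-T)^{-1}(z-T)^{-H}; each step costs two O(N^2) triangular solves."""
    n = T.shape[0]
    T1 = z * np.eye(n) - T
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    v /= np.linalg.norm(v)
    sigold = 0.0
    for _ in range(maxit):
        w = solve_triangular(T1, v, trans='C')   # (z-T)^{-H} v
        v = solve_triangular(T1, w)              # (z-T)^{-1} w
        sig = np.linalg.norm(v)
        if abs(sigold / sig - 1.0) < tol:
            break
        sigold = sig
        v /= sig
    return 1.0 / np.sqrt(sig)                    # sig -> sigma_min^{-2}

rng = np.random.default_rng(1)
N = 40
T, _ = schur(rng.standard_normal((N, N)), output='complex')
z = 0.3 + 0.7j
est = smin_inverse_iteration(T, z)
exact = svdvals(z * np.eye(N) - T)[-1]
assert est >= exact * (1 - 1e-8)   # the estimate approaches sigma_min from above
assert est / exact < 1.1
```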
This simple method does well in many cases but, as always with power iteration, the convergence may be slow if the dominant eigenvalue (of (z − T)^{-1}(z − T)^{-H}) is not well separated from the others. To retain speedy convergence in such cases one can replace the inverse iteration by an inverse Lanczos iteration. Here is a modified MATLAB fragment to achieve the desired effect:

    I = eye(size(B));
    [U,T] = schur(B);
    if isreal(B), [U,T] = rsf2csf(U,T); end, T = triu(T);
    for j = 1:nu
      for i = 1:nu
        z = zz(i,j); T1 = z*I-T; T2 = T1';
        sigold = 0; qold = zeros(n,1); beta = 0; H = [];
        q = randn(n,1) + sqrt(-1)*randn(n,1); q = q/norm(q);
        for k = 1:99
          v = T1\(T2\q) - beta*qold;
          alpha = real(q'*v); v = v - alpha*q;
          beta = norm(v); qold = q; q = v/beta;
          H(k+1,k) = beta; H(k,k+1) = beta; H(k,k) = alpha;
          sig = max(eig(H(1:k,1:k)));
          if (abs(sigold/sig-1)<.001) | (sig<3 & k>2), break, end
          sigold = sig;
        end
        sigmin(i,j) = 1/sqrt(sig);
      end
    end
    contour(x,y,log10(sigmin+1e-20),-10:-1);
Suppose we apply this code to the matrix B = B_200 of our example, using no other acceleration methods. We find that, for most points z on the grid, 3 iterations are taken inside the inner loop, as illustrated in Figure 8. Thus the cost of each evaluation of σ_min(z − B) is essentially that of 6 triangular matrix solves. The computing time improves from 4 hours to about 29 minutes, a speedup by a factor of about 8. If uninteresting parts of the z-plane are avoided, the improvement is from 2.5 hours to 19 minutes, again a speedup by a factor of about 8. If in addition we first project B onto the subspace of dimension 92 associated with eigenvalues λ with Re λ > γ with γ = −250, as in the last section, the improvement is from 18 minutes to 3.7 minutes. This last speedup is by a factor of about 5, not 8, since the effectiveness of the preliminary triangularization is diminished for a matrix of dimension 92 rather than 200.
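The Lanczos variant can likewise be sketched in Python (my translation, not the article's code; full reorthogonalization is added for robustness, and the random test matrix is arbitrary). The largest eigenvalue of the Lanczos tridiagonal matrix converges to σ_min(z − T)^{-2}.

```python
import numpy as np
from scipy.linalg import schur, solve_triangular, svdvals

def smin_inverse_lanczos(T, z, maxit=30, seed=0):
    """Estimate sigma_min(z - T), T upper-triangular, by Lanczos iteration
    on the Hermitian matrix (z-T)^{-1}(z-T)^{-H}, whose largest eigenvalue
    is sigma_min^{-2}; each step costs two O(N^2) triangular solves."""
    n = T.shape[0]
    T1 = z * np.eye(n) - T
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    q /= np.linalg.norm(q)
    qold = np.zeros(n, dtype=complex)
    beta = 0.0
    m = min(maxit, n)
    H = np.zeros((m + 1, m + 1))
    Q = [q]
    for k in range(m):
        v = solve_triangular(T1, solve_triangular(T1, q, trans='C')) - beta * qold
        alpha = np.real(np.vdot(q, v))
        v = v - alpha * q
        for qj in Q:            # full reorthogonalization (added for safety)
            v = v - np.vdot(qj, v) * qj
        beta = np.linalg.norm(v)
        H[k, k] = alpha
        H[k + 1, k] = H[k, k + 1] = beta
        kk = k
        if beta < 1e-14:        # breakdown: an invariant subspace was found
            break
        qold, q = q, v / beta
        Q.append(q)
    lam = np.linalg.eigvalsh(H[:kk + 1, :kk + 1]).max()
    return 1.0 / np.sqrt(lam)

rng = np.random.default_rng(2)
n = 30
T, _ = schur(rng.standard_normal((n, n)), output='complex')
z = -0.2 + 0.4j
est = smin_inverse_lanczos(T, z, maxit=n)
exact = svdvals(z * np.eye(n) - T)[-1]
assert abs(est / exact - 1.0) < 1e-6
```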
Fig. 8. Pseudospectra of the matrix B = B_200 computed on a 20 × 20 grid by projection to dimension n = 92 followed by inverse Lanczos iteration. The numbers of Lanczos steps at each point of the grid are marked, illustrating that even with cold starts for each z, 3 steps typically suffice for convergence. Blank sections of the plot correspond to areas of the complex plane that have been pruned away as described in Section 10. This computation took 30 seconds
The Lanczos iteration we have just described is just one possibility for this kind of computation. Alternative methods have been studied in detail by Braconnier (1996, 1997), Braconnier and Higham (1996), Lui (1997), and Marques and Toumazou (1995a, 1995b). Braconnier and Higham improve the Lanczos iteration by selective reorthogonalization and Chebyshev acceleration, and they emphasize the importance for robustness of a carefully designed and conservative convergence criterion. All of these authors use continuation from one point z to the next so that an iteration starts with a better than random initial guess. Since a cold start tends to produce convergence in three iterations, however, it seems that the use of continuation is not indispensable.
13. Summary of speedups discussed so far
For our tutorial example of dimension N = 200, the various algorithms we have described have performed roughly as follows:

    straightforward use of SVD                        4 hours
    prune uninteresting portions of z-plane           2.5 hours
    prune and project to Re λ > −250                  18 minutes
    prune and use preliminary triangularization       19 minutes
    combination of all speedups                       3.7 minutes

By a succession of three improvements implementable in 40 lines of MATLAB (see Section 20), we have reduced the computation time for Figure 5 from 4 hours to 4 minutes. This factor of 60 is comparable to the factor by which workstations have speeded up in the years since Trefethen (1992), signalling the roughly equal roles of hardware and algorithmic improvements in the field of computation of pseudospectra.

Of course, to assess various algorithms systematically one would like asymptotic formulas rather than examples. Unfortunately, outside of the context of a particular class of matrices, it is hard to see how to derive asymptotic formulas with much substance for the computation of pseudospectra of matrices. How much can one gain by projection to a lower-dimensional subspace? It depends on how far the dimension can be lowered, and this depends on the problem at hand, not on any general parameters. Nevertheless we offer this rough guide to the improvement factors that seem to be achievable for many examples:

    prune uninteresting portions of z-plane     cuts no. of grid points in half
    project to interesting subspace             cuts N in half
    preliminary triangularization               speeds up by factor N/30
    combination of all three                    speeds up by factor N/4

This last figure N/4 is roughly the product of the three speedup factors 2^3 = 8, 2, and (N/2)/30. We may call it a Rule of Thumb. For a typical problem of size N = 1000, for example, one should expect to be able to compute a publication-quality plot of pseudospectra with about 1/250 as much work as by using the algorithm of p. 265.

14. Parallel computation of pseudospectra

Still further speedups are available via a fourth method: the use of multiple processors. In many situations the computation of pseudospectra falls in the class of problems known as embarrassingly parallel. This means that the computation decouples so readily that taking advantage of multiple processors requires little effort. The first parallel computations of pseudospectra appear to have been those by A. E. Trefethen reported in Trefethen, Trefethen, Reddy and Driscoll (1993), and subsequent contributions in this area
have been due to Braconnier (1996), A. E. Trefethen et al. (1996), Fraysse, Giraud and Toumazou (1996), Heuveline, Philippe and Sadkane (1997), Trefethen, Trefethen and Schmid (1999), and Bekas and Gallopoulos (1999).

In the simplest case, suppose one is computing a plot of pseudospectra by evaluating σ_min(z_ij − B) independently at each point z_ij of a grid; the grid points can then simply be divided among the processors. Load-balancing is less trivial if some parts of the z-plane require finer grid resolutions than those near the origin. In this case, if the points z_ij are to be treated independently, then load-balancing can be achieved by maintaining a list of points z_ij not yet treated and assigning a new point to a processor whenever it finishes with an old one. If the treatment of the points is dependent, which might be the case because of some kind of initial guess continued from point to point, then the management of these lists will benefit from some geometric structuring. We will not go into further details, but just summarize the subject of parallel computation of pseudospectra with the statement that if p processors are available, one can usually achieve a speedup by a factor close to p.

15. Global Krylov subspace iterations

Up to now, we have discussed methods belonging to the realm of dense linear algebra, where all N^2 entries of a matrix are manipulated explicitly and fundamental matrix calculations require O(N^3) operations. However, many people have had the idea that the well-developed techniques of Krylov subspace iterations should also have a role to play in computing pseudospectra, and, for matrices of dimensions in the thousands, this conclusion seems inescapable. For information on Krylov subspace iterations see Barrett et al. (1994), Dongarra, Duff, Sorensen and van der Vorst (1998), Greenbaum (1997), Lehoucq, Sorensen and Yang (1998), Saad (1992), and Trefethen and Bau (1997). The very many ideas of this kind that might be considered fall roughly into two classes.
One can attempt to approximate the pseudospectra of a matrix or operator all at once with a single sequence of Krylov subspaces, or one can use Krylov methods pointwise to accelerate the computation of resolvent norms for individual values of z. In this section we consider the first of these ideas. (In the end the greatest power may come from combining
the two, working locally with small regions of the z-plane but not individual points z.) The motivation for methods of this kind is that Krylov subspace iterations extract essential information from a matrix within the context of low-dimensional subspaces; they are projection processes closely related to those discussed in Section 11. Instead of computing just eigenvalues (Ritz values) in these subspaces, why not compute pseudospectra? Preliminary ideas in this direction can be found in Freund (1992) and in Nachtigal, Reichel and Trefethen (1992), and the method has been explored further by Toh and Trefethen (1996) and Simoncini and Gallopoulos (1998). The simplest procedure, as described by Toh and Trefethen, goes as follows. Starting from a random initial vector, we carry out an Arnoldi iteration in the usual manner, obtaining thereby an increasing sequence of initial columns of a Hessenberg matrix unitarily similar to B. We then compute pseudospectra of successive sections of this Hessenberg matrix, and take these as approximations to the pseudospectra of B. Rectangular sections with dimensions of the form (n + 1) × n are more appropriate than square ones; the pseudospectra of a rectangular matrix B can be defined via (3.3) or, equivalently, via the pseudoinverse of z − B (Toh and Trefethen 1996). (It will be interesting to see whether pseudospectra of rectangular matrices achieve importance in other contexts in the years ahead.)

Figure 9 gives an indication of how this method performs for our example problem. Starting from the full matrix B = B_200, pseudospectra are plotted corresponding to Krylov subspace approximations of dimensions 120, 140, 160, and 180. The results are disappointing. Not until n is close to 200 do the pictures look reasonable and, of course, in numerical computation one wants more than merely something that looks reasonable.
What has gone wrong is that the spectrum of B_200 extends very far into the left half-plane, with a leftmost (nonphysical) eigenvalue at about −700,000 + 300i, which will only get worse if N is increased. Under such circumstances a straightforward Arnoldi iteration has little chance of capturing the interesting behaviour near the origin efficiently. One can improve the situation in various ways, for example by working with B^{-1} instead of B, but this is not an easily used general technology. For large-scale problems, the first thing one might do in the exploratory phase of a computational project involving highly nonnormal matrices should perhaps be a computation of the kind described in this section. Indeed, it might be argued that pictures of estimated pseudospectra should be a routine by-product of all large-scale Krylov subspace calculations; the dimensions of the Hessenberg matrices will usually be low, so the cost will be small. If one decides that accurate pictures of pseudospectra are needed, however, in most cases one will want to move on to other methods.
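The Arnoldi-section procedure just described can be sketched generically (a Python illustration, mine; the scaled random matrix is an arbitrary stand-in for B_200). The key facts used are the Arnoldi relation BQ_n = Q_{n+1}H with H of dimensions (n + 1) × n, and the inequality σ_min(zĨ − H) ≥ σ_min(z − B), with Ĩ the (n + 1) × n section of the identity, which says that the pseudospectra of the rectangular sections approximate those of B from the inside.

```python
import numpy as np

def arnoldi(B, b, n):
    """n steps of Arnoldi: B @ Q[:, :n] = Q @ H with H upper Hessenberg
    of shape (n+1, n) and Q having n+1 orthonormal columns."""
    N = B.shape[0]
    Q = np.zeros((N, n + 1), dtype=complex)
    H = np.zeros((n + 1, n), dtype=complex)
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(n):
        v = B @ Q[:, k]
        for j in range(k + 1):                  # modified Gram-Schmidt
            H[j, k] = np.vdot(Q[:, j], v)
            v = v - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(v)
        Q[:, k + 1] = v / H[k + 1, k]
    return Q, H

rng = np.random.default_rng(0)
N, n = 80, 25
B = rng.standard_normal((N, N)) / np.sqrt(N)
Q, H = arnoldi(B, rng.standard_normal(N), n)
assert np.allclose(B @ Q[:, :n], Q @ H, atol=1e-10)

# sigma_min of the rectangular section z*Itil - H bounds sigma_min(z - B)
# from above, so the rectangular-section pseudospectra lie inside those of B
z = 0.5 + 0.5j
Itil = np.eye(n + 1, n)
smin_H = np.linalg.svd(z * Itil - H, compute_uv=False)[-1]
smin_B = np.linalg.svd(z * np.eye(N) - B, compute_uv=False)[-1]
assert smin_H >= smin_B - 1e-12
```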
Fig. 9. Pseudospectra of Arnoldi projections of various dimensions for B = B_200. Convergence takes place eventually, but it is too slow for Krylov subspace iterations in this pure form to be of much use for this example
16. Local Krylov subspace iterations

Krylov subspace methods are much more powerful than the last section may seem to suggest. The crucial modification is that they be applied for individual points z ∈ C, or in localized regions. Then one has the potential for convergence to arbitrary accuracy at high speed and, for large-scale problems, these are the most powerful methods known. I will not attempt to describe these methods in any detail, as my experience in this area is small and new developments are occurring very fast. However, here is a quick outline.

One of the first papers to discuss methods of this kind was by Carpraux, Erhel and Sadkane (1994), who used a Davidson iteration with continuation (Davidson 1975). This method was subsequently parallelized by Heuveline, Philippe and Sadkane (1997) and applied by them to a matrix of dimension N = 8192. Other contributions in this area are due to Lui (1997), Braconnier (1996, 1997), Braconnier and Higham (1996), and Ruhe (1994, 1998), who has developed a rational Krylov algorithm. In this and other computations it is crucial to use exact or approximate inverses of the matrix being analysed, wherever possible, as this may greatly speed up the convergence. The starting point is the idea of shift-and-invert Arnoldi iteration but, from there, many different paths can be taken.

One of the leading projects currently underway for large-scale computation of pseudospectra is being carried out by numerical analysts and plasma physicists at the University of Utrecht and the CWI in the Netherlands (van Dorsselaer, Goedbloed, Nool, van der Ploeg, van der Vorst and others). In a research project on the stability of plasmas, these researchers have succeeded in computing the eigenvalues of generalized unsymmetric eigenvalue problems, associated with Tokamak plasmas, of order up to 262,144. On a CRAY T3E, a relevant part of the so-called Alfvén spectrum (12 eigenvalues) could be computed in 7 seconds of wall-clock time.
This was done with the Jacobi-Davidson method (Dongarra, Duff, Sorensen and van der Vorst 1998, Sleijpen and van der Vorst 1996), using a direct decomposition of a shifted matrix for one single shift in the neighbourhood of the desired part of the spectrum. The group in Utrecht has now started work on the evaluation of pseudospectra for this problem, also with the Jacobi-Davidson method and with similar preconditionings as for the generalized eigenproblem. The idea is that one single preconditioner can be used for a portion of the pseudospectra, and the research focuses on the efficient re-use of search subspaces in sweeping over the spectrum with the Jacobi-Davidson method.

Continuation of data from point to point appears to be more important for these large-scale Krylov subspace iterations than for the dense matrix computations discussed in Section 12. The reason is that the large-scale methods depend crucially on the use of subspaces that get enlarged and deflated as the iteration proceeds. To evaluate a resolvent norm ||(z − B)^{-1}||, it may save a great deal of work if one starts with the subspaces already determined for a neighbouring point z'.

17. Curve-tracing for pseudospectral boundaries

Quite a different approach to producing plots of pseudospectra has been proposed by Kostin (1991) and worked out in detail by Brühl (1996). Rather than use a contour plotter with data based on a grid, why not trace the boundary curves of the pseudospectra directly? Such a technique has two potential advantages. One is that we can determine the boundary curves to great accuracy, if that is desired. The other is that fewer evaluations of σ_min(z − B) may be needed since no grid is involved. Brühl put this idea into practice with a Newton iteration at each step and showed that it can be effective. An appealing feature of his method is that any speedups one gets in this fashion can be combined with those provided by methods for accelerating the computation of σ_min(z − B), such as projection and Lanczos iteration.

For general use, the method of curve-tracing runs into questions of how to handle corners and, more seriously, how to cope robustly with pseudospectra that have two or many components. It is very attractive, however, for problems where one wants to concentrate accurately on particular sections of the boundaries of pseudospectra. For example, by a method of this kind one can design a program that enables the user to click with the mouse at a point in the complex plane and have the computer draw the boundary of a pseudospectrum that passes through this point. This can be informative and beautiful graphically, and it can provide an elegant route to computational estimates of matrix functions of interest based on contour integrals.
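A minimal predictor-corrector of this general kind can be sketched as follows (my Python illustration, not Brühl's algorithm or code; the Jordan-block test matrix and the step sizes are arbitrary choices). It uses the standard first-order formula dσ_min = Re((u^H v) dz) for the smallest singular triplet (u, σ_min, v) of z − A, which yields both a Newton correction onto the curve σ_min(z − A) = ε and a tangent direction along it.

```python
import numpy as np

def smin_and_grad(A, z):
    """Smallest singular value s of z - A together with c = u^H v, the
    complex number for which ds = Re(c * dz) to first order (standard
    perturbation theory for a simple smallest singular value)."""
    U, S, Vh = np.linalg.svd(z * np.eye(A.shape[0]) - A)
    u, v = U[:, -1], Vh[-1, :].conj()
    return S[-1], np.vdot(u, v)

def trace_level_curve(A, eps, z0, steps=40, h=0.05):
    """Predictor-corrector tracing of the curve sigma_min(z - A) = eps,
    starting from a point z0 assumed to lie near the curve."""
    z = z0
    pts = []
    for _ in range(steps):
        for _ in range(8):                       # Newton corrector
            s, c = smin_and_grad(A, z)
            if abs(c) < 1e-14 or abs(s - eps) < 1e-12:
                break
            z = z - (s - eps) * np.conj(c) / abs(c) ** 2
        pts.append(z)
        s, c = smin_and_grad(A, z)
        z = z + h * 1j * np.conj(c) / abs(c)     # step along the tangent
    return np.array(pts)

A = np.array([[0.0, 1.0], [0.0, 0.0]])           # 2 x 2 Jordan block
eps = 0.1
curve = trace_level_curve(A, eps, z0=0.5)
residuals = [abs(smin_and_grad(A, z)[0] - eps) for z in curve]
assert max(residuals) < 1e-8    # the traced points lie on the boundary
```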
One could click on a point z, for example, and see not only the pseudospectral boundary that passes through z but also some numerically computed upper and lower bounds based on that curve.

Brühl's work on curve-tracing methods has been carried further by Bekas and Gallopoulos (1998, 1999) in a method called COBRA. These authors combine curve-tracing and grid methods, using 'a small, moving grid that follows the boundary ∂Λ_ε, almost like the head of a cobra that follows the movements of its prey'. This hybrid approach, they argue, offers the advantages of curve-tracing combined with greater robustness and opportunities for parallelism.

18. Pseudospectra in Banach spaces

For a fixed matrix, all norms are equivalent, and thus the resolvent norms associated with two different norms differ at most by constants. Since the
effects of interest in plotting pseudospectra are often in some sense exponentially strong, it follows that in many cases, the pseudospectra of a matrix do not change much when one switches, say, from || ||_2 to || ||_1 or || ||_∞. In such cases the choice of norm may not be too important. (More extreme changes of norm may have more extreme effects; after all, the pseudospectra can be rendered trivial by the switch to a norm defined by the coefficients in an expansion in eigenvectors.)

It would be a mistake to presume, however, that the difference between || ||_2 and || ||_1, say, or more generally between Hilbert spaces and Banach spaces, is always minor. Once one is dealing with operators of infinite dimension or their matrix discretizations, the gaps between p-norms may be arbitrarily large, and in some cases the 'physics' of the problem lies in the gap. Pseudospectra in non-Euclidean norms are discussed from a theoretical point of view in van Dorsselaer, Kraaijevanger and Spijker (1993), and an example of a paper in which they are computed numerically is the study of Abel integral operators by Plato (1997).

My own conversion to the importance of this subject came with a study of the 'cutoff phenomenon' that occurs in certain Markov chains (Diaconis 1996). Diaconis and others have shown that, for various random processes such as random walk on a hypercube (Diaconis, Graham and Morrison 1990) and riffle shuffling of a deck of cards (Bayer and Diaconis 1992), convergence to a uniform probability distribution, when measured in a certain way, occurs not gradually but in a sudden fashion after a certain number of steps. Since the processes in question involve powers of matrices, this nonsteady behaviour suggests that pseudospectra must be important. The matrix dimensions in these problems are sometimes combinatorially large, however, and in such instances it may be crucial to use || ||_1 (the natural norm for probability) rather than || ||_2.
Indeed, the matrices of interest are sometimes normal with respect to the Euclidean norm. For example, the problem of random walk on an n-dimensional hypercube leads to a matrix of dimension N = 2^n. The matrix is real and symmetric, hence normal, so a normalized matrix of eigenvectors has κ_2(V) = 1 in the 2-norm and uninteresting pseudospectra. In the 1-norm, however, we get κ_1(V) ≈ 10^6 for n as low as 40, and the corresponding pseudospectra reach outside the unit disk. Similarly, the problem of riffle shuffling of a deck of 52 cards leads to a matrix of dimension 52! ≈ 8 × 10^67 with κ_1(V) ≈ 10^40 (Jonsson and Trefethen 1998). The largest eigenvalue is λ = 1/2, but the pseudospectra protrude outside the unit disk.

The obvious algorithm for computing pseudospectra with respect to || ||_1 or || ||_∞ requires the inversion of z − A at a cost of O(N^3) flops at each point z. The acceleration methods we have described do not apply directly in this case, but it is possible that they could be adapted to this purpose by the use of dual norms. Very recently, the first contribution that I know of to the fast
computation of pseudospectra in || ||_1 or || ||_∞ has appeared, by Higham and Tisseur (1999). These authors combine two ideas with impressive results. The first is the Schur reduction of A to triangular form, as in Section 12; in contrast to the situation with || ||_2, here we must retain the matrix Q of the reduction A = QTQ* for use in further computations, as the unitary similarity transformation does not leave || ||_1 or || ||_∞ invariant. The second is an iteration to determine the norm of the inverse of a triangular matrix that is developed as a fast algorithm for condition number estimation based on block matrices. Aside from the treatment of nonstandard norms, one of the useful features of the work by Higham and Tisseur for readers of the present paper is that it relates the computation of pseudospectra to the estimation of condition numbers.
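For a small matrix the 'obvious algorithm' looks like this (a Python sketch, mine; the triangular test matrix is an arbitrary nonnormal example). The point is simply that each grid point costs an explicit O(N^3) inversion once one leaves the 2-norm, which is the cost that the Higham and Tisseur approach attacks.

```python
import numpy as np

def resolvent_norms(A, z):
    """The 'obvious algorithm' for non-Euclidean norms: form (z - A)^{-1}
    explicitly, O(N^3) per grid point, and take its 1-, 2- and inf-norms."""
    R = np.linalg.inv(z * np.eye(A.shape[0]) - A)
    return (np.linalg.norm(R, 1),       # max column sum
            np.linalg.norm(R, 2),       # largest singular value
            np.linalg.norm(R, np.inf))  # max row sum

N = 40
A = np.triu(np.ones((N, N)), 1) - 0.5 * np.eye(N)   # nonnormal example
n1, n2, ninf = resolvent_norms(A, z=0.1)
# the p-norms are equivalent, but for nonnormal matrices the gaps can be
# large; the interpolation bound ||R||_2 <= sqrt(||R||_1 ||R||_inf) holds
assert n2 <= np.sqrt(n1 * ninf) + 1e-8
assert max(n1, ninf) >= n2 - 1e-8
```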
19. Pseudospectra and behaviour

Now, then, briefly, what is the purpose of all these computations of pseudospectra? Eigenvalues are generally computed for one or both of two reasons: to aid in the solution of a problem via diagonalization, or to give insight into how a system behaves (Trefethen 1997). Important examples of behaviour are stability or resonance for various physical or numerical processes and speed of convergence for numerical iterations. Behavioural phenomena are typically quantified by norms of functions of the matrix or operator in question, such as ||A^n||, ||exp(tA)||, or ||p(A)||, where p is a polynomial or a rational function. If A is an unbounded operator, as with our example (4.1), the notion of exp(tA) can be made rigorous by various methods considered in the theory of semigroups (Kato 1976, Pazy 1983).

If A is far from normal, pseudospectra are likely to do better than eigenvalues alone in the second of these two roles. It is known that pseudospectra cannot in general give exact information about norms of functions of matrices or operators (Greenbaum and Trefethen 1993). However, they may provide bounds that are much sharper than those obtained from eigenvalues. Some such bounds are described in Trefethen (1997), and examples can be found in many of the articles cited in the introduction. Here we will not discuss these matters in generality but just illustrate what the pseudospectra of our example operator A of (4.1) may reveal about its time evolution.

To be specific, suppose we are interested in the linear evolution process du/dt = Au, with solution u(t) = exp(tA)u(0). Looking first at the spectrum of A, we note that the rightmost eigenvalue in the complex plane is λ = −0.7803 + 1.8951i (labelled 1 in Figure 1), and the second-rightmost is λ = −2.3246 + 5.6695i. These numbers appear to suggest that the evolution of this system will exhibit gentle decay at a rate approximately e^{−0.78t}, the dominant modes being smooth ones.
Fig. 10. Transient behaviour of ||exp(tA)||: the actual curve (max ≈ 187,000), and the lower bound K obtained from pseudospectra. The pseudospectra correctly capture the transient growth of order 10^5
A glance at the pseudospectra in Figure 5 suggests a different time evolution. In fact, the most conspicuous part of the behaviour of this process will be associated with the nearly degenerate eigenvalues along the starred branch, of which the rightmost is the mode 3/4 pair, with λ ≈ −2.6809 + 70.8747i. Because the pseudospectra in this part of the plane protrude so strongly into the right half-plane, the evolution process will be susceptible to large transient effects, and the structures involved will have frequencies closer to 70 than 0. The low-frequency modes 1 and 2 will be significant only for long time integrations, and then only if the dynamical system is purely linear, governed by A alone, with no variations of coefficients or forcing terms or nonlinearities.

Figure 10 shows a plot of ||exp(tA)|| against t. We see that there is very strong transient amplification of some initial vectors, rising to a maximum of about 187,000 around t = 0.73. The order of magnitude of this growth can be predicted from the pseudospectra. For any A and ε, if Λ_ε(A) extends a distance η into the right half-plane, then it can be shown by a Laplace transform that ||exp(tA)|| must be as large as η/ε for some t > 0. Given
A, let the Kreiss constant K for A be defined as the supremum of this ratio over all ε (Kreiss 1962). Equivalently, K is the smallest constant such that

    ||(z − A)^{-1}|| ≤ K / Re z

for all z with Re z > 0. Then the inequality just mentioned takes the form

    sup_{t ≥ 0} ||exp(tA)|| ≥ K.                           (19.1)

For our example the Kreiss constant is approximately K = 48570, attained for z = 1.25 + 68.88i, with ||(z − A)^{-1}|| = 38850. The dashed line in Figure 10 marks this lower bound.

The size of the transient hump in ||exp(tA)|| is only one of the aspects of the behaviour of A that pseudospectra may shed light on. Another would be the response of a system governed by this operator to oscillatory inputs at various real frequencies ω, corresponding to points on the imaginary axis of Figure 5. Judging by the spectrum, one would guess that only frequencies ω ≈ 0 or ω ≈ 70 should excite much response, but the pseudospectra imply that amplifications on the order of 10^3 or more can be expected for the full range of frequencies ω ∈ [40, 80]. This phenomenon of pseudo-resonance and other aspects of the physics of nonnormality are discussed in Trefethen, Trefethen, Reddy and Driscoll (1993), Butler and Farrell (1992), and Farrell and Ioannou (1996).

More generally, pseudospectra may provide bounds on ||f(A)|| for any function f. The most general results along this line are to be found in Toh and Trefethen (1999b), where relationships are derived between the size of ||f(A)||, the size of f(z) on a complex domain Ω, and the Kreiss constant of A with respect to Ω.
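The bound (19.1) is easy to explore numerically. Here is a hedged Python sketch (the 2 × 2 example matrix is my arbitrary choice, picked for strong nonnormality, not the operator of the text): the Kreiss constant is estimated on a finite grid, which necessarily gives a lower bound on K, and is compared with a sampled transient hump of ||exp(tA)||.

```python
import numpy as np
from scipy.linalg import expm, svdvals

def kreiss_lower_bound(A, zs):
    """Estimate K = sup_{Re z > 0} Re(z) * ||(z - A)^{-1}|| from a finite
    sample of points z in the right half-plane (a lower bound on K)."""
    N = A.shape[0]
    return max(z.real / svdvals(z * np.eye(N) - A)[-1] for z in zs)

# stable eigenvalues (-0.2, -0.3) but a strong nonnormal coupling
A = np.array([[-0.2, 8.0], [0.0, -0.3]])
zs = [x + 1j * y for x in np.linspace(0.05, 3, 30)
                 for y in np.linspace(-3, 3, 31)]
K = kreiss_lower_bound(A, zs)

# (19.1): sup_{t >= 0} ||exp(tA)|| >= K; sample t on a grid
hump = max(np.linalg.norm(expm(t * A), 2) for t in np.linspace(0, 20, 201))
assert K > 1.0       # transient growth is detected from the resolvent alone
assert hump >= K     # the sampled hump respects the Kreiss lower bound
```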
20. A MATLAB program

An historic event in numerical computation was the codification of algorithms for computing matrix eigenvalues into the Fortran software package EISPACK in the 1970s (Smith et al. 1976). Major algorithmic advances had been made in that subject in the preceding decade and a half, which had advanced the state of the art far beyond what an average scientist could expect to program for him- or herself, and, equally important, these were problems for which potential users knew that they needed help. The impact of EISPACK was enormous: a set of problems that had earlier been challenging was reduced very quickly, as it were, to a black box.

The problem of computing pseudospectra is not yet in a comparable situation. The problem is too new and too rapidly changing, and the algorithms are not yet sufficiently well worked out for it to be appropriate to aim for black boxes in this area. Nevertheless, in a modest way, small-scale software
    % psa.m - Simple code for 2-norm pseudospectra of given matrix A.
    %         Typically about N/4 times faster than the obvious SVD method.
    %         Comes with no guarantees! - L. N. Trefethen, March 1999.

    % Set up grid for contour plot:
    npts = 20;                                   % <- ALTER GRID RESOLUTION
    s = .8*norm(A,1);
    xmin = -s; xmax = s; ymin = -s; ymax = s;    % <- ALTER AXES
    x = xmin:(xmax-xmin)/(npts-1):xmax;
    y = ymin:(ymax-ymin)/(npts-1):ymax;
    [xx,yy] = meshgrid(x,y); zz = xx + sqrt(-1)*yy;

    % Compute Schur form and plot eigenvalues:
    [U,T] = schur(A);
    if isreal(A), [U,T] = rsf2csf(U,T); end, T = triu(T); eigA = diag(T);
    hold off, plot(real(eigA),imag(eigA),'.','markersize',15), hold on
    axis([xmin xmax ymin ymax]), axis square, grid on, drawnow

    % Reorder Schur decomposition and compress to interesting subspace:
    select = find(real(eigA)>-250);              % <- ALTER SUBSPACE SELECTION
    n = length(select);
    for i = 1:n
      for k = select(i)-1:-1:i
        G([2 1],[2 1]) = planerot([T(k,k+1) T(k,k)-T(k+1,k+1)]')';
        J = k:k+1; T(:,J) = T(:,J)*G; T(J,:) = G'*T(J,:);
      end
    end
    T = triu(T(1:n,1:n)); I = eye(n);

    % Compute resolvent norms by inverse Lanczos iteration and plot contours:
    sigmin = Inf*ones(length(y),length(x));
    for i = 1:length(y)
      if isreal(A) & (ymax==-ymin) & (i>length(y)/2)
        sigmin(i,:) = sigmin(length(y)+1-i,:);
      else
        for j = 1:length(x)
          z = zz(i,j); T1 = z*I-T; T2 = T1';
          if real(z)<100                         % <- ALTER GRID POINT SELECTION
            sigold = 0; qold = zeros(n,1); beta = 0; H = [];
            q = randn(n,1) + sqrt(-1)*randn(n,1); q = q/norm(q);
            for k = 1:99
              v = T1\(T2\q) - beta*qold;
              alpha = real(q'*v); v = v - alpha*q;
              beta = norm(v); qold = q; q = v/beta;
              H(k+1,k) = beta; H(k,k+1) = beta; H(k,k) = alpha;
              sig = max(eig(H(1:k,1:k)));
              if (abs(sigold/sig-1)<.001) | (sig<3 & k>2), break, end
              sigold = sig;
            end
            %text(x(j),y(i),num2str(k))          % <- SHOW ITERATION COUNTS
            sigmin(i,j) = 1/sqrt(sig);
          end
        end
      end
      disp(['finished line ' int2str(i) ' out of ' int2str(length(y))])
    end
    contour(x,y,log10(sigmin+1e-20),-8:-1);      % <- ALTER LEVEL LINES
Fig. 11. Fast (not robust) MATLAB code for pseudospectra of dense matrices.
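For comparison with the fast code above, the "obvious SVD algorithm" that it is designed to beat can be sketched in a few lines of NumPy. This is an illustrative translation, not part of the published code; the grid construction mirrors psa.m, and the function name `svd_pseudospectra` is invented here.

```python
import numpy as np

def svd_pseudospectra(A, npts=20, scale=0.8):
    """Evaluate sigma_min(z*I - A) on a square grid by a full SVD at
    each grid point -- the straightforward O(npts^2 * N^3) approach."""
    s = scale * np.linalg.norm(A, 1)        # same axis scaling as psa.m
    x = np.linspace(-s, s, npts)
    y = np.linspace(-s, s, npts)
    xx, yy = np.meshgrid(x, y)
    zz = xx + 1j * yy
    N = A.shape[0]
    sigmin = np.empty((npts, npts))
    for i in range(npts):
        for j in range(npts):
            # smallest singular value of the shifted matrix
            sigmin[i, j] = np.linalg.svd(zz[i, j] * np.eye(N) - A,
                                         compute_uv=False)[-1]
    return x, y, sigmin
```

Contour lines of log10 of the returned array at levels -8, ..., -1 then give the same epsilon-pseudospectra boundaries as the MATLAB code, but every grid point pays the full O(N^3) price, which is exactly the cost that the Schur-plus-inverse-Lanczos strategy of psa.m avoids.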
286
L. N. TREFETHEN
for computing pseudospectra may prove useful. The MATLAB program of Figure 11 is one that I hope readers may find helpful, after adapting the details to their needs, as a starting point for dense matrix computations. The code is available online at www.comlab.ox.ac.uk/oucl/people/nick.trefethen.html. Its main purpose is to show how much better one can do for many problems than to use the obvious SVD algorithm. But this is nothing like robust software, and makes no claim to be. This code is filled with arbitrary features that can easily be broken. Just one item of software has achieved something like general use for the calculation of pseudospectra: the MATLAB program pscont from the Test Matrix Toolbox of Higham (1995). That code makes use of a straight SVD algorithm, however, and thus is not as fast as we would like. Other authors have taken steps to develop software for high-performance computations, but none appear to be in wide use at present.

21. Another example

I would like to finish by presenting a plot of pseudospectra of a second example operator, one with a special meaning for this field. So far as I know, the first person to define the notion of pseudospectra was Henry Landau at Bell Laboratories in the 1970s, who was motivated in part by applications in lasers and optical resonators (Landau 1975, 1976, 1977). One of the operators that Landau considered in detail was the complex symmetric (but non-Hermitian) compact integral operator

    Au(x) = sqrt(iF/pi) \int_{-1}^{1} e^{-iF(x-y)^2} u(y) dy,        (21.1)
acting on functions in L^2[-1,1], where F is a large parameter, the Fresnel number (Landau 1976, 1977). This operator is easily described in words: it convolves a function on [-1,1] with a high-frequency complex Gaussian. The eigenvalues lie on a spiral in the unit disk that converges to the origin (Cochran and Hinds 1974), but Landau proved that, for any ε > 0, the ε-pseudospectrum Λ_ε(A) contains the entire unit circle, for all sufficiently large F. He further showed that each z with |z| = 1 has an ε-pseudoinvariant subspace of dimension at least O(√F) as F → ∞. Twenty-two years later, Andrew Spratley and I have carried out numerical computations of the pseudospectra of this operator considered by Landau. We use a spectral collocation discretization much as in Section 5; the details will be reported elsewhere. The eigenfunctions and pseudoeigenfunctions are highly oscillatory, and fine grids are needed to resolve them. For the case F = 64, we find that an N × N matrix with N = 600 suffices for a good picture of pseudospectra, shown in Figure 12.
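The eigenvalue spiral described above is easy to reproduce numerically. The sketch below uses a plain Nyström discretization with Gauss-Legendre quadrature, not the spectral collocation scheme of Spratley and Trefethen; the square-root weighting is a standard device (my own choice here) that keeps the discretized matrix complex symmetric, like the operator itself.

```python
import numpy as np

def landau_matrix(F, N):
    """Nystrom discretization of the operator (21.1): Gauss-Legendre
    nodes/weights on [-1,1], with sqrt-weights applied symmetrically
    so that the matrix satisfies A = A^T (complex symmetric)."""
    x, w = np.polynomial.legendre.leggauss(N)
    K = np.sqrt(1j * F / np.pi) * \
        np.exp(-1j * F * (x[:, None] - x[None, :])**2)
    sw = np.sqrt(w)
    return sw[:, None] * K * sw[None, :]

# With a modest Fresnel number the eigenvalues already trace the
# spiral inside the unit disk, converging toward the origin:
A = landau_matrix(F=16, N=200)
lam = np.linalg.eigvals(A)
```

A plot of `lam` in the complex plane shows the spiral; for the F = 64 computation of Figure 12, a finer discretization (N = 600 in the text) is needed to resolve the oscillatory eigenfunctions.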
Fig. 12. Spectrum and ε-pseudospectra of the operator (21.1) (F = 64) studied in Landau's original work on pseudospectra, for ε = 10^{-1}, 10^{-2}, ..., 10^{-8}. The dashed curve is the unit circle. Unlike the differential operator (4.1), this integral operator is compact, and the eigenvalues spiral in to the origin. As F → ∞, for any ε > 0, Λ_ε(A_F) converges to the disk |z| ≤ 1 + ε. This figure, based on a spectral discretization of dimension N = 600, is probably accurate to plotting accuracy except in a central region of radius about 0.1
With N = 600 rather than N = 200 as before, the evaluation of a single resolvent norm ||(z - A)^{-1}|| for Figure 12 takes about 27 times longer than for Figure 5, about 15 seconds on the SUN Ultra 30. Furthermore, because of the fine structure to be resolved in the plot, we have used a 200 × 200 rather than 100 × 100 grid. This means that the total time to compute Figure 12 by the obvious SVD algorithm should be about 100 times greater than the previous figure of 4 hours, that is, about 15 days. According to the Rule of Thumb of Section 13, however, it should be possible to speed this up by a factor of about N/4 ≈ 150 by the dense matrix methods described in Sections 10-12 and in the MATLAB program of Figure 11, bringing the computation time down to two or three hours. We used a projection onto the space spanned by eigenvectors associated with eigenvalues λ with |λ| > 0.0001. This reduced the matrix dimension to N = 161, and, in the event, the computation took about 1 hour.
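The scaling argument in this paragraph can be restated mechanically; the constants below simply reproduce the figures quoted in the text (the timings are the text's estimates, not new measurements).

```python
# One SVD-based resolvent-norm evaluation costs O(N^3), so going from
# N = 200 to N = 600 multiplies the per-point cost by 3^3 = 27.
per_point = (600 / 200) ** 3
# A 200x200 grid has 4 times as many points as a 100x100 grid.
grid = (200 * 200) / (100 * 100)
total = per_point * grid           # 108, "about 100" in the text

# 4 hours for the earlier figure, scaled up, is roughly two weeks:
svd_days = 4 * total / 24
# The Rule of Thumb speedup of N/4 brings this back to a few hours:
fast_hours = 4 * total / (600 / 4)
```

This is why the text's "about 15 days" for the brute-force SVD computation shrinks to "two or three hours" with the methods of Sections 10-12.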
22. Discussion

The computation of pseudospectra is only a decade old. Remarkable progress has been made in the 1990s; of the 111 items in the list of references below, 96 date from this decade! In this survey I have concentrated mostly on techniques for dense matrices, typically those of dimension less than 1000. Many of the developments to come in the years ahead will pertain to the other case of sparse or structured matrices of larger dimensions, where variations on the themes of Arnoldi, Jacobi-Davidson and rational Krylov iterations are powerful. Our discussion has been confined to the standard matrix problem Ax = λx, that is, pseudospectra of matrices or operators, but in many applications it may be more appropriate to consider the generalized problem Ax = λBx, that is, pseudospectra of matrix or operator pencils (van Dorsselaer 1997, Frayssé, Gueury, Nicoud and Toumazou 1996, Frayssé and Toumazou 1998, Riedel 1994). Computing pseudospectra is not yet a routine matter among scientists and engineers who deal with nonnormal matrices, but I think it will become so. The nature of the software that is available will play a decisive role in determining how the field develops. As time goes by, more software products for large-scale eigenvalue computations will appear, descendants of today's codes such as ARPACK (Lehoucq, Sorensen and Yang 1998), and these will show an increasing emphasis on graphical interaction with the user. In such an environment it is inevitable that scientists will be encouraged to calculate more than just eigenvalues and, gradually, the computation of the eigenvalues of nonnormal matrices and the computation of their pseudospectra will fuse into one subject. This field will also participate in a broader trend in the scientific computing of the future, the gradual breaking down of the walls between the two steps of discretization (operator → matrix) and solution (matrix → spectrum or pseudospectra).
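For the generalized problem, the same grid-plus-smallest-singular-value recipe applies with zB - A in place of zI - A. A minimal sketch follows; it uses one common unweighted definition of pencil pseudospectra (the references cited above differ in how perturbations of A and B are weighted), and the function name is my own.

```python
import numpy as np

def pencil_sigmin(A, B, z):
    """Smallest singular value of z*B - A, whose level curves give
    (one unweighted definition of) pseudospectra of the pencil (A, B).
    With B = I this reduces to the standard matrix case."""
    return np.linalg.svd(z * B - A, compute_uv=False)[-1]
```

Evaluating this on a grid of z values and contouring, exactly as for the matrix case, produces pencil pseudospectra plots.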
For many problems, the matrix → operator limit is ill-behaved in the spectrum but well-behaved in the pseudospectra. Perhaps pseudospectra will play a role too in breaking down walls between the theorists of functional analysis and the engineers of scientific computing.

Acknowledgements

I am grateful to Andrew Spratley for his computations for Section 21 and to Jos van Dorsselaer and Henk van der Vorst for their contributions to the discussion of large-scale Krylov subspace iterations. (That the discussion is still far from complete is my responsibility alone.) In addition, van Dorsselaer provided indispensable assistance on the subject of partial Schur decompositions. For comments on a draft manuscript I thank Spratley and van Dorsselaer and also Mark Embree, Anne Greenbaum, Nick Higham,
Richard Lehoucq, Satish Reddy, and Endre Süli. Finally, special thanks go to Anna Aslanyan and Brian Davies for taking a detailed interest in this work, which included the computational verification of some of my numbers, and for pointing out an error in a preliminary draft.
REFERENCES

E. Anderson et al. (1995), LAPACK Users' Guide, 2nd edn, SIAM, Philadelphia.
J. S. Baggett (1994), 'Pseudospectra of an operator of Hille and Phillips', Res. Rep. 94-15, Interdisc. Proj. Ctr. Supercomp., Swiss Federal Institute of Technology, Zurich.
R. Barrett et al. (1994), Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia.
D. Bau, III, and L. N. Trefethen (1997), Numerical Linear Algebra, SIAM, Philadelphia.
D. Bayer and P. Diaconis (1992), 'Trailing the dovetail shuffle to its lair', Ann. Appl. Prob. 2, 294-313.
C. Bekas and E. Gallopoulos (1998), 'COBRA: A hybrid method for computing the matrix pseudospectrum', abstract, Copper Mountain Conf. on Iterative Methods.
C. Bekas and E. Gallopoulos (1999), 'COBRA: Parallel path following for computing the matrix pseudospectrum', manuscript in preparation.
D. Borba et al. (1994), 'The pseudospectrum of the resistive magnetohydrodynamics operator: resolving the resistive Alfvén paradox', Phys. Plasmas 1, 3151-3160.
A. Böttcher (1994), 'Pseudospectra and singular values of large convolution operators', J. Int. Eqs. Appl. 6, 267-301.
S. Boyd and C. A. Desoer (1985), 'Subharmonic functions and performance bounds on linear time-invariant feedback systems', IMA J. Math. Control Inform. 2, 153-170.
T. Braconnier (1996), Fvpspack: A Fortran and PVM package to compute the field of values and pseudospectra of large matrices, Numer. Anal. Rep. 293, Manchester Ctr. Comp. Maths., Manchester, England.
T. Braconnier (1997), 'Complete iterative method for computing pseudospectra', preprint.
T. Braconnier, F. Chatelin and J.-C. Dunyach (1995), 'Highly nonnormal eigenvalue problems in the aeronautical industry', Japan J. Ind. Appl. Math. 12, 123-136.
T. Braconnier and N. J. Higham (1996), 'Computing the field of values and pseudospectra using the Lanczos method with continuation', BIT 36, 422-440.
T. Braconnier, A. McCoy and V. Toumazou (1997), 'Using the field of values
for pseudospectra generation', Technical Report TR/PA/97/28, CERFACS.
M. Brühl (1996), 'A curve tracing algorithm for computing the pseudospectrum', BIT 36, 441-454.
K. M. Butler and B. F. Farrell (1992), 'Three-dimensional optimal perturbations in viscous shear flow', Phys. Fluids A 4, 1637-1650.
C. Canuto, M. Y. Hussaini, A. Quarteroni and T. A. Zang (1988), Spectral Methods in Fluid Dynamics, Springer, New York.
J. F. Carpraux, J. Erhel and M. Sadkane (1994), 'Spectral portrait for non-Hermitian large sparse matrices', Computing 53, 301-310.
F. Chaitin-Chatelin and V. Frayssé (1996), Lectures on Finite Precision Computations, SIAM, Philadelphia.
F. Chatelin (1983), Spectral Approximation of Linear Operators, Academic Press, London.
J. A. Cochran and E. W. Hinds (1974), 'Eigensystems associated with the complex-symmetric kernels of laser theory', SIAM J. Appl. Math. 26, 776-786.
C. Cossu and J. M. Chomaz (1997), 'Global measures of local convective instabilities', Phys. Rev. Lett. 78, 4387-4390.
D. L. Darmofal and P. J. Schmid (1996), 'The importance of eigenvectors for local preconditioners of the Euler equations', J. Comput. Phys. 127, 346-362.
E. R. Davidson (1975), 'The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices', J. Comput. Phys. 17, 87-94.
E. B. Davies (1999a), 'Pseudospectra, the harmonic oscillator and complex resonances', Proc. Royal Soc. London A 455, 585-599.
E. B. Davies (1999b), 'Pseudospectra of differential operators', J. Operator Theory, to appear.
J. W. Demmel (1987), 'A counterexample for two conjectures about stability', IEEE Trans. Autom. Control AC-32, 340-342.
P. Diaconis (1996), 'The cutoff phenomenon in finite Markov chains', Proc. Nat. Acad. Sci. USA 93, 1659-1664.
P. Diaconis, R. L. Graham and J. A. Morrison (1990), 'Asymptotic analysis of a random walk on a hypercube with many dimensions', Random Struct. Alg. 1, 51-72.
J. M. Donato (1991), Iterative Methods for Scalar and Coupled Systems of Elliptic Equations, PhD thesis, Dept. of Math., U. of California, Los Angeles.
J. J. Dongarra, I. S. Duff, D. C. Sorensen and H. A. van der Vorst (1998), Numerical Linear Algebra for High-Performance Computers, SIAM, Philadelphia.
J. L. M. van Dorsselaer (1997), 'Pseudospectra for matrix pencils and stability of equilibria', BIT 37, 833-845.
J. L. M. van Dorsselaer, J. F. B. M. Kraaijevanger and M. N. Spijker (1993), 'Linear stability analysis in the numerical solution of initial value problems', in Vol. 2 of Acta Numerica, Cambridge University Press, pp. 199-237.
T. A. Driscoll and L. N. Trefethen (1996), 'Pseudospectra for the wave equation with an absorbing boundary', J. Comput. Appl. Math. 69, 125-142.
B. F. Farrell and P. J. Ioannou (1996), 'Generalized stability theory. Part I: Autonomous operators and Part II: Nonautonomous operators', J. Atmos. Sci. 53, 2025-2040 and 2041-2053.
J. Flaherty, C. E. Seyler and L. N. Trefethen (1999), 'Large-amplitude transient growth in the linear evolution of equatorial spread F in the presence of shear', J. Geophys. Research, to appear.
D. R. Fokkema (1996), Subspace Methods for Linear, Nonlinear, and Eigen Problems, PhD thesis, U. of Utrecht.
D. R. Fokkema, G. L. G. Sleijpen and H. A. van der Vorst (1999), 'Jacobi-Davidson style QR and QZ algorithms for the reduction of matrix pencils', SIAM J. Sci. Comput. 20, 94-125.
B. Fornberg (1996), A Practical Guide to Pseudospectral Methods, Cambridge University Press, Cambridge.
B. Fornberg and D. M. Sloan (1994), 'A review of pseudospectral methods for solving partial differential equations', in Vol. 3 of Acta Numerica, Cambridge University Press, pp. 203-267.
V. Frayssé, L. Giraud and V. Toumazou (1996), 'Parallel computation of spectral portraits on the Meiko CS2', in High-Performance Computing and Networking (H. Liddell, A. Colbrook, B. Hertzberger and P. Sloot, eds), Springer, pp. 312-318.
V. Frayssé, M. Gueury, F. Nicoud and V. Toumazou (1996), 'Spectral portraits for matrix pencils', Technical Report TR/PA/96/19, CERFACS.
V. Frayssé and V. Toumazou (1998), 'A note on the normwise perturbation theory for the regular generalized eigenproblem', Numer. Lin. Alg. Appl. 5, 1-10.
R. W. Freund (1992), 'Quasi-kernel polynomials and their use in non-Hermitian matrix iterations', J. Comput. Appl. Math. 43, 135-158.
E. Gallestey (1998a), 'Computing spectral value sets using the subharmonicity of the norm of rational matrices', BIT 38, 22-33.
E. Gallestey (1998b), Theory and Numerics of Spectral Value Sets, PhD thesis, U. Bremen.
S. K. Godunov (1992), 'Spectral portraits of matrices and criteria for spectrum dichotomy', in Proc. Third IMACS-GAMM Symp. Comput. Arith.
Sci. Comput. (SCAN-91) (L. Atanassova and J. Herzberger, eds), North-Holland, Amsterdam.
S. K. Godunov (1997), Modern Aspects of Linear Algebra, Nauchnaya Kniga, Novosibirsk. In Russian.
S. K. Godunov, A. Antonov, O. P. Kiriljuk and V. I. Kostin (1993), Guaranteed Accuracy in Numerical Linear Algebra, Kluwer.
S. K. Godunov, O. P. Kiriljuk and V. I. Kostin (1990), Spectral portraits of matrices, Technical Report 3, Inst. of Math., Acad. Sci. USSR, Novosibirsk.
S. K. Godunov and M. Sadkane (1996), 'Elliptic dichotomy of a matrix spectrum', Lin. Alg. Appl. 248, 205-232.
A. Greenbaum (1997), Iterative Methods for Solving Linear Systems, SIAM, Philadelphia.
A. Greenbaum and L. N. Trefethen (1993), 'Do the pseudospectra of a matrix determine its behavior?', Technical Report 93-1371, Dept. Comput. Sci., Cornell U.
P. Henrici (1962), 'Bounds for iterates, inverses, spectral variation and field of values of nonnormal matrices', Numer. Math. 4, 24-40.
V. Heuveline, B. Philippe and M. Sadkane (1997), 'Parallel computation of spectral portrait of large matrices by Davidson type methods', Numer. Algs. 16, 55-75.
D. J. Higham and B. Owren (1996), 'Non-normality effects in a discretised nonlinear reaction-convection-diffusion equation', J. Comput. Phys. 124, 309-323.
D. J. Higham and L. N. Trefethen (1993), 'Stiffness of ODEs', BIT 33, 285-303.
N. J. Higham (1995), 'The Test Matrix Toolbox for MATLAB (Version 3.0)', Numer. Anal. Rep. 276, Manchester Ctr. Comp. Maths., Manchester, England; available online at http://www.maths.man.ac.uk/~higham/.
N. J. Higham and F. Tisseur (1999), 'A block algorithm for matrix 1-norm estimation, with an application to 1-norm pseudospectra', Numer. Anal. Rep. 341, Manchester Ctr. Comp. Maths., Manchester, England.
D. Hinrichsen and B. Kelb (1993), 'Spectral value sets: a graphical tool for robustness analysis', Systems Control Lett. 21, 127-136.
D. Hinrichsen and A. J. Pritchard (1992), 'On spectral variations under bounded real matrix perturbations', Numer. Math. 60, 509-524.
D. Hinrichsen and A. J. Pritchard (1994), 'Stability of uncertain systems', in Systems and Networks: Mathematical Theory and Applications, Vol. I, Akademie-Verlag, Berlin, pp. 159-182.
G. F. Jonsson and L. N. Trefethen (1998), 'A numerical analyst looks at the "cutoff phenomenon" in card shuffling and other Markov chains', in Numerical Analysis 1997 (D. F. Griffiths, D. J. Higham and G. A. Watson, eds), Longman, Harlow, Essex, England.
T. Kato (1976), Perturbation Theory for Linear Operators, 2nd edn, Springer, New York.
V. Kostin (1991), 'On definition of matrices' spectra', in High Performance Computing II (M. Durand and F. El Dabaghi, eds), Elsevier.
H. O. Kreiss (1962), 'Über die Stabilitätsdefinition für Differenzengleichungen die partielle Differentialgleichungen approximieren', BIT 2, 153-181.
H. J. Landau (1975), 'On Szegő's eigenvalue distribution theory and non-Hermitian kernels', J. d'Analyse Math. 28, 335-357.
H. J. Landau (1976), 'Loss in unstable resonators', J. Opt. Soc. Amer. 66, 525-529.
H. J. Landau (1977), 'The notion of approximate eigenvalues applied to an integral equation of laser theory', Quart. Appl. Math. 35, 165-172.
L. László (1994), 'An attainable lower bound for the best normal approximation', SIAM J. Matrix Anal. Appl. 15, 1035-1043.
P. Lavallée and M. Sadkane (1997), 'Une méthode stable de bloc-diagonalisation de matrices: Application au calcul de portrait spectral', Technical Report 3141, INRIA, Rennes.
R. B. Lehoucq, D. C. Sorensen and C. Yang (1998), ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, SIAM, Philadelphia.
S. H. Lui (1997), 'Computation of pseudospectra by continuation', SIAM J. Sci. Comput. 18, 567-573.
A. Lumsdaine and D. Wu (1997), 'Spectra and pseudospectra of waveform relaxation operators', SIAM J. Sci. Comput. 18, 286-304.
O. Marques and V. Toumazou (1995a), 'Spectral portrait computation by Lanczos method (augmented matrix method)', TR/PA/95/05, CERFACS.
O. Marques and V. Toumazou (1995b), 'Spectral portrait computation by Lanczos method (normal equation method)', TR/PA/95/02, CERFACS.
R. A. McCoy and V. Toumazou (1997), PRECISE User's Guide - Version 1.0, Technical Report TR/PA/97/38, CERFACS.
N. M. Nachtigal, L. Reichel and L. N. Trefethen (1992), 'A hybrid GMRES algorithm for nonsymmetric linear systems', SIAM J. Matrix Anal. Appl. 13, 796-825.
P. J. Olsson and D. S. Henningson (1995), 'Optimal disturbance growth in watertable flow', Stud. Appl. Math. 94, 183-210.
A. Pazy (1983), Semigroups of Linear Operators and Applications to Partial Differential Equations, Springer, New York.
R. Plato (1997), 'Resolvent estimates for Abel integral operators and the regularization of associated first kind integral equations', J. Int. Eqs. Appl. 9, 253-278.
S. C. Reddy (1993), 'Pseudospectra of Wiener-Hopf integral operators and constant-coefficient differential operators', J. Int. Eqs. Appl. 5, 369-403.
S. C. Reddy and D. S. Henningson (1993), 'Energy growth in viscous channel flows', J. Fluid Mech. 252, 209-238.
S. C. Reddy, P. J. Schmid and D. S. Henningson (1993), 'Pseudospectra of the Orr-Sommerfeld operator', SIAM J. Appl. Math. 53, 15-47.
S. C. Reddy and L. N. Trefethen (1990), 'Lax-stability of fully discrete spectral methods via stability regions and pseudo-eigenvalues', Comput. Meth. Appl. Mech. Engr. 80, 147-164.
S. C. Reddy and L. N. Trefethen (1994), 'Pseudospectra of the convection-diffusion operator', SIAM J. Appl. Math. 54, 1634-1649.
L. Reichel and L. N. Trefethen (1992), 'Eigenvalues and pseudo-eigenvalues of Toeplitz matrices', Lin. Alg. Appl. 162-164, 153-185.
K. Riedel (1994), 'Generalized epsilon-pseudospectra', SIAM J. Numer. Anal. 31, 1219-1225.
A. Ruhe (1994), 'The rational Krylov algorithm for large nonsymmetric eigenvalues - mapping the resolvent norms (pseudospectrum)', unpublished manuscript.
A. Ruhe (1998), 'Rational Krylov: A practical algorithm for large sparse nonsymmetric matrix pencils', SIAM J. Sci. Comput. 19, 1535-1551.
Y. Saad (1992), Numerical Methods for Large Eigenvalue Problems, Manchester U. Press, Manchester, England.
P. J. Schmid, D. S. Henningson, M. Khorrami and M. Malik (1993), 'A sensitivity study of hydrodynamic stability operators', Theor. Comput. Fluid Dyn. 4, 227-240.
V. Simoncini and E. Gallopoulos (1998), 'Transfer functions and resolvent norm approximation of large matrices', Elec. Trans. Numer. Anal. 7, 190-201.
G. L. G. Sleijpen and H. A. van der Vorst (1996), 'A Jacobi-Davidson iteration method for linear eigenvalue problems', SIAM J. Matrix Anal. Appl. 17, 401-425.
B. T. Smith et al. (1976), Matrix Eigensystem Routines - EISPACK Guide, Springer, Berlin.
K.-C. Toh and L. N. Trefethen (1994), 'Pseudozeros of polynomials and pseudospectra of companion matrices', Numer. Math. 68, 403-425.
K.-C. Toh and L. N. Trefethen (1996), 'Computation of pseudospectra by the Arnoldi iteration', SIAM J. Sci. Comput. 17, 1-15.
K.-C. Toh and L. N. Trefethen (1999a), 'The Chebyshev polynomials of a matrix', SIAM J. Matrix Anal. Appl., to appear.
K.-C. Toh and L. N. Trefethen (1999b), 'The Kreiss matrix theorem on a general complex domain', SIAM J. Matrix Anal. Appl., to appear.
V. Toumazou (1996), Portraits Spectraux de Matrices: Un Outil d'Analyse
de la Stabilité, PhD thesis, U. Raymond Poincaré, Nancy I, Technical Report TH/PA/96/46, CERFACS.
A. E. Trefethen et al. (1996), 'MultiMATLAB: MATLAB on multiple processors', Tech. Rep. CTC96TR293, Cornell Theory Center; available online at http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/ncstrl.cornell/TR96-1586.
A. E. Trefethen, L. N. Trefethen and P. J. Schmid (1999), 'Spectra and pseudospectra for pipe Poiseuille flow', Comput. Meth. Appl. Mech. Engr., to appear.
L. N. Trefethen (1990), 'Approximation theory and numerical linear algebra', in Algorithms for Approximation II (J. C. Mason and M. G. Cox, eds), Chapman and Hall, London.
L. N. Trefethen (1992), 'Pseudospectra of matrices', in Numerical Analysis 1991 (D. F. Griffiths and G. A. Watson, eds), Longman Scientific and Technical, Harlow, Essex, England, pp. 234-266.
L. N. Trefethen (1997), 'Pseudospectra of linear operators', SIAM Review 39, 383-406.
L. N. Trefethen (1999), 'Spectra and pseudospectra: The behavior of nonnormal matrices and operators', in The Graduate Student's Guide to Numerical Analysis, Vol. 26 of SSCM series (M. Ainsworth, J. Levesley and M. Marletta, eds), Springer, Berlin, to appear.
L. N. Trefethen and D. Bau, III (1997), Numerical Linear Algebra, SIAM, Philadelphia.
L. N. Trefethen, A. E. Trefethen, S. C. Reddy and T. A. Driscoll (1993), 'Hydrodynamic stability without eigenvalues', Science 261, 578-584.
J. M. Varah (1979), 'On the separation of two matrices', SIAM J. Numer. Anal. 16, 216-222.
J. H. Wilkinson (1965), The Algebraic Eigenvalue Problem, Clarendon Press, Oxford.