Lecture Notes in Mathematics 1825

Editors: J.-M. Morel, Cachan; F. Takens, Groningen; B. Teissier, Paris
Subseries: Fondazione C.I.M.E., Firenze; Adviser: Pietro Zecca

Springer
Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo
J. H. Bramble
A. Cohen
W. Dahmen
Multiscale Problems and Methods in Numerical Simulations

Lectures given at the C.I.M.E. Summer School held in Martina Franca, Italy, September 9-15, 2001

Editor: C. Canuto
Editor and Authors

Claudio Canuto
Dipartimento di Matematica, Politecnico di Torino
Corso Duca degli Abruzzi 24, 10129 Torino, Italy
e-mail: [email protected]

James H. Bramble
Mathematics Department, Texas A&M University
College Station, TX 77843-3368, USA
e-mail: [email protected]

Albert Cohen
Laboratoire Jacques-Louis Lions, Université Pierre et Marie Curie
175 rue du Chevaleret, 75013 Paris, France
e-mail: [email protected]

Wolfgang Dahmen
Institut für Geometrie und Praktische Mathematik, RWTH Aachen
Templergraben 55, 52056 Aachen, Germany
e-mail: [email protected]
Cataloging-in-Publication Data applied for.

Bibliographic information published by Die Deutsche Bibliothek: Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet at http://dnb.ddb.de
Mathematics Subject Classification (2000): 82D37, 80A17, 65Z05

ISSN 0075-8434
ISBN 3-540-20099-1 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

Springer-Verlag Berlin Heidelberg New York, a member of BertelsmannSpringer Science+Business Media GmbH
http://www.springer.de

© Springer-Verlag Berlin Heidelberg 2003
Printed in Germany

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Camera-ready TEX output by the authors
These Lecture Notes are dedicated to the victims of the brutal attacks of September 11, 2001, including all who were affected. All of us who attended the C.I.M.E. course, Americans and non-Americans alike, were shocked and horrified by what took place. We all hope for a saner world.
Preface
The C.I.M.E. course on “Multiscale Problems and Methods in Numerical Simulation” was held in Martina Franca (Italy) from September 9 to 15, 2001. The purpose of the course was to disseminate a number of new ideas that had emerged in the previous few years in the field of numerical simulation, bearing the common denominator of the “multiscale” or “multilevel” paradigm. This takes various forms, such as: the presence of multiple relevant “scales” in a physical phenomenon, with their natural mathematical and numerical counterparts; the detection and representation of “structures”, localized in space or in frequency, in the unknown variables described by a model; the decomposition of the mathematical or numerical solution of a differential or integral problem into “details”, which can be organized and accessed in decreasing order of importance; the iterative solution of large systems of linear algebraic equations by “multilevel” decompositions of finite-dimensional spaces.

Four world-leading experts illustrated the multiscale approach to numerical simulation from different perspectives. Jim Bramble, from Texas A&M University, described modern multigrid methods for finite element discretizations, and the efficient multilevel realization of norms in Sobolev scales. Albert Cohen, from Université Pierre et Marie Curie in Paris, smoothly guided the audience towards the realm of “Nonlinear Approximation”, which provides a mathematical ground for state-of-the-art signal and image processing, statistical estimation and adaptive numerical discretizations. Wolfgang Dahmen, from RWTH Aachen, described the use of wavelet bases in the design of computationally optimal algorithms for the numerical treatment of operator equations.
Tom Hughes, from Stanford University, presented a general approach to deriving variational methods capable of representing multiscale phenomena, and detailed the application of the variational multiscale formulation to Large Eddy Simulation (LES) in fluid dynamics, using the Fourier basis.

The “senior” lecturers were complemented by four “junior” speakers, who gave accounts of supplementary material, detailed examples or applications. Ken Jansen, from Rensselaer Polytechnic Institute in Troy, discussed variational multiscale methods for LES using a hierarchical basis and finite elements. Joe Pasciak, from Texas A&M University, extended the multigrid and multilevel approach presented by Bramble to the relevant case of symmetric indefinite second-order elliptic problems. Rob Stevenson, from Utrecht University, reported on the construction of finite element wavelets on general domains and manifolds, i.e., wavelet bases for standard finite element spaces. Karsten Urban, from RWTH Aachen, illustrated the construction of orthogonal and biorthogonal wavelet bases in complex geometries by the domain decomposition and mapping approach.

Both the senior and the junior lecturers contributed to the scientific success of the course, which was attended by 48 participants from 13 different countries. Not only did the speakers present their own material and perspectives in the most effective manner, but they also made a valuable effort to establish cross-references with the other lecturers’ topics, leading to a unified picture of the course theme.

On Tuesday, September 11, we were about to head for the afternoon session when we were hit by the terrible news coming from New York City. Incredulity, astonishment, horror, anger, worry (particularly for the families of our American friends) were the sentiments that alternated in our hearts. No space for mathematics was left in our minds. But on the next day, we unanimously decided to resume the course with even more determination than before; we strongly believe, and we wanted to testify, that only rationality can defeat irrationality, and that only the free circulation of ideas and the mutual exchange of experiences, as occurs in science, can defeat darkness and terror.

The present volume collects the expanded versions of the lecture notes by Jim Bramble, Albert Cohen and Wolfgang Dahmen. I am grateful to them for the timely production of such high-quality scientific material.
As the scientific director of the course, I wish to thank the former Director of C.I.M.E., Arrigo Cellina, and the whole Scientific Board of the Centre for inviting me to organize the event, and for providing us with the fine facilities in Martina Franca as well as part of the financial support. Special thanks are due to the Secretary of C.I.M.E., Vincenzo Vespri. Generous funding for the course was provided by the I.N.D.A.M. Groups G.N.C.S. and G.N.A.M.P.A. Support also came from the Italian Research Project M.U.R.S.T. Cofin 2000 “Calcolo Scientifico: Modelli e Metodi Numerici Innovativi” and from the European Union T.M.R. Project “Wavelets in Numerical Simulation”. The organization and the realization of the school would have been far less successful without the superb managing skills and the generous help of Anita Tabacco. A number of logistic problems were handled and solved by Stefano Berrone, as usual in the most efficient way. The help of Dino Ricchiuti, staff member of the Dipartimento di Matematica at the Politecnico di Torino, is gratefully acknowledged. Finally, I wish to thank Giuseppe Ghibò for his accurate job of processing the electronic version of the notes.

Torino, February 2003
Claudio Canuto
C.I.M.E.’s activity is supported by: Ministero dell’Università e della Ricerca Scientifica e Tecnologica, COFIN ’99; Ministero degli Affari Esteri, Direzione Generale per la Promozione e la Cooperazione, Ufficio V; Consiglio Nazionale delle Ricerche; E.U. under the Training and Mobility of Researchers Programme; UNESCO-ROSTE, Venice Office.
Contents
Theoretical, Applied and Computational Aspects of Nonlinear Approximation
Albert Cohen
1 Introduction
2 A Simple Example
3 The Haar System and Thresholding
4 Linear Uniform Approximation
5 Nonlinear Adaptive Approximation
6 Data Compression
7 Statistical Estimation
8 Adaptive Numerical Simulation
9 The Curse of Dimensionality
References
Multiscale and Wavelet Methods for Operator Equations
Wolfgang Dahmen
1 Introduction
2 Examples, Motivation
2.1 Sparse Representations of Functions, an Example
2.2 (Quasi-) Sparse Representation of Operators
2.3 Preconditioning
2.4 Summary
3 Wavelet Bases – Main Features
3.1 The General Format
3.2 Notational Conventions
3.3 Main Features
4 Criteria for (NE)
4.1 What Could Additional Conditions Look Like?
4.2 Fourier- and Basis-free Criteria
5 Multiscale Decompositions – Construction and Analysis Principles
5.1 Multiresolution
5.2 Stability of Multiscale Transformations
5.3 Construction of Biorthogonal Bases – Stable Completions
5.4 Refinement Relations
5.5 Structure of Multiscale Transformations
5.6 Parametrization of Stable Completions
6 Scope of Problems
6.1 Problem Setting
6.2 Scalar 2nd Order Elliptic Boundary Value Problem
6.3 Global Operators – Boundary Integral Equations
6.4 Saddle Point Problems
7 An Equivalent $\ell_2$-Problem
7.1 Connection with Preconditioning
7.2 There is always a Positive Definite Formulation – Least Squares
8 Adaptive Wavelet Schemes
8.1 Introductory Comments
8.2 Adaptivity from Several Perspectives
8.3 The Basic Paradigm
8.4 (III) Convergent Iteration for the ∞-dimensional Problem
8.5 (IV) Adaptive Application of Operators
8.6 The Adaptive Algorithm
8.7 Ideal Bench Mark – Best N-Term Approximation
8.8 Compressible Matrices
8.9 Fast Approximate Matrix/Vector Multiplication
8.10 Application Through Uzawa Iteration
8.11 Main Result – Convergence/Complexity
8.12 Some Ingredients of the Proof of Theorem 8
8.13 Approximation Properties and Regularity
9 Further Issues, Applications
9.1 Nonlinear Problems
9.2 Time Dependent Problems
10 Appendix: Some Useful Facts
10.1 Function Spaces
10.2 Local Polynomial Approximation
10.3 Condition Numbers
References
Multilevel Methods in Finite Elements
James H. Bramble
1 Introduction
1.1 Sobolev Spaces
1.2 A Model Problem
1.3 Finite Element Approximation of the Model Problem
1.4 The Stiffness Matrix and its Condition Number
1.5 A Two-Level Multigrid Method
2 Multigrid I
2.1 An Abstract V-cycle Algorithm
2.2 The Multilevel Framework
2.3 The Abstract V-cycle Algorithm, I
2.4 The Two-level Error Recurrence
2.5 The Braess-Hackbusch Theorem
3 Multigrid II: V-cycle with Less Than Full Elliptic Regularity
3.1 Introduction and Preliminaries
3.2 The Multiplicative Error Representation
3.3 Some Technical Lemmas
3.4 Uniform Estimates
4 Non-nested Multigrid
4.1 Non-nested Spaces and Varying Forms
4.2 General Multigrid Algorithms
4.3 Multigrid V-cycle as a Reducer
4.4 Multigrid W-cycle as a Reducer
4.5 Multigrid V-cycle as a Preconditioner
5 Computational Scales of Sobolev Norms
5.1 Introduction
5.2 A Norm Equivalence Theorem
5.3 Development of Preconditioners
5.4 Preconditioning Sums of Operators
5.5 A Simple Approximation Operator $Q_k$
Some Basic Approximation Properties
Approximation Properties: the Multilevel Case
The Coercivity Estimate
5.6 Applications
A Preconditioning Example
Two Examples Involving Sums of Operators
$H^1(\Omega)$ Bounded Extensions
References
List of Participants
Theoretical, Applied and Computational Aspects of Nonlinear Approximation

Albert Cohen
Laboratoire d’Analyse Numérique, Université Pierre et Marie Curie, Paris
e-mail: [email protected]

Summary. Nonlinear approximation has recently found computational applications such as data compression, statistical estimation or adaptive schemes for partial differential or integral equations, especially through the development of wavelet-based methods. The goal of this paper is to provide a short survey of nonlinear approximation in the perspective of these applications, as well as to stress some remaining open areas.
1 Introduction

Approximation theory is the branch of mathematics which studies the process of approximating general functions by simple functions such as polynomials, finite elements or Fourier series. It therefore plays a central role in the accuracy analysis of numerical methods. Numerous problems of approximation theory have in common the following general setting: we are given a family of subspaces $(S_N)_{N\ge 0}$ of a normed space $X$, and for $f \in X$, we consider the best approximation error
\[ \sigma_N(f) := \inf_{g\in S_N} \|f - g\|_X. \tag{1} \]
Typically, $N$ represents the number of parameters needed to describe an element in $S_N$, and in most cases of interest, $\sigma_N(f)$ goes to zero as this number tends to infinity. For a given $f$, we can then study the rate of approximation, i.e., the range of $r \ge 0$ for which there exists $C > 0$ such that
\[ \sigma_N(f) \le CN^{-r}. \tag{2} \]
Note that in order to study such an asymptotic behaviour, we can use a sequence of near-best approximations, i.e., $f_N \in S_N$ such that
\[ \|f - f_N\|_X \le C\sigma_N(f), \tag{3} \]
J.H. Bramble, A. Cohen, and W. Dahmen: LNM 1825, C. Canuto (Ed.), pp. 1–29, 2003.
© Springer-Verlag Berlin Heidelberg 2003
with $C > 1$ independent of $N$. Such a sequence always exists, even when the infimum is not attained in (1), and clearly (2) is equivalent to the same estimate with $\|f - f_N\|_X$ in place of $\sigma_N(f)$.

Linear approximation deals with the situation when the $S_N$ are linear subspaces. Classical instances of linear approximation families are the following:
1) Polynomial approximation: $S_N := \Pi_N$, the space of algebraic polynomials of degree $N$.
2) Spline approximation with uniform knots: some integers $0 \le k < m$ being fixed, $S_N$ is the spline space on $[0,1]$, consisting of $C^k$ piecewise polynomial functions of degree $m$ on the intervals $[j/N,(j+1)/N]$, $j = 0,\cdots,N-1$.
3) Finite element approximation on fixed triangulations: $S_N$ are finite element spaces associated with triangulations $T_N$, where $N$ is the number of triangles in $T_N$.
4) Linear approximation in a basis: given a basis $(e_k)_{k\ge 0}$ in a Banach space, $S_N := \mathrm{Span}(e_0,\cdots,e_N)$.
In all these instances, $N$ is typically the dimension of $S_N$, possibly up to some multiplicative constant.

Nonlinear approximation addresses in contrast the situation where the $S_N$ are not linear spaces, but are still typically characterized by $O(N)$ parameters. Instances of nonlinear approximation families are the following:
1) Rational approximation: $S_N := \{p/q\,;\ p,q \in \Pi_N\}$, the set of rational functions of degree $N$.
2) Free knot spline approximation: some integers $0 \le k < m$ being fixed, $S_N$ is the spline space on $[0,1]$ with $N$ free knots, consisting of $C^k$ piecewise polynomial functions of degree $m$ on intervals $[x_j, x_{j+1}]$, for all partitions $0 = x_0 < x_1 < \cdots < x_{N-1} < x_N = 1$.
3) Adaptive finite element approximation: $S_N$ is the union of the finite element spaces $V_T$ of some fixed type associated to all triangulations $T$ of cardinality less than or equal to $N$.
4) $N$-term approximation in a basis: given a basis $(e_k)_{k\ge 0}$ in a Banach space, $S_N$ is the set of all possible combinations $\sum_{k\in E} x_k e_k$ with $\#(E) \le N$.
Note that these examples are in some sense nonlinear generalizations of the previous linear examples, since they include each of them as particular subsets. Also note that in all of these examples (except for the splines with uniform knots), we have the natural property $S_N \subset S_{N+1}$, which expresses that the approximation is “refined” as $N$ grows.

On a theoretical level, a basic problem, both for linear and nonlinear approximation, can be stated as follows:

Problem 1: Given a family $(S_N)_{N\ge 0}$, what are the analytic properties of a function $f$ which ensure a prescribed rate $\sigma_N(f) \le CN^{-r}$?
By “analytic properties”, we typically have in mind smoothness, since we know that in many contexts a prescribed rate $r$ can be achieved provided that $f$ belongs to some smoothness class $X_r \subset X$. Ideally, one might hope to identify the maximal class $X_r$ such that the rate $r$ is ensured, i.e., to have a sharp result of the type
\[ f \in X_r \iff \sigma_N(f) \le CN^{-r}. \tag{4} \]
Another basic problem, perhaps on a slightly more applied level, is the effective construction of near-best approximants.

Problem 2: Given a nonlinear family $(S_N)_{N\ge 0}$, find a simple implementable procedure $f \mapsto f_N \in S_N$ such that $\|f - f_N\|_X \le C\sigma_N(f)$ for all $N \ge 0$.

In the case of linear approximation, this question is usually solved if we can find a sequence of projectors $P_N : X \to S_N$ such that $\|P_N\|_{X\to X} \le K$ with $K$ independent of $N$ (in this case, simply take $f_N = P_N f$ and remark that $\|f - f_N\|_X \le (1+K)\sigma_N(f)$). It is in general a more difficult problem in the case of nonlinear methods.

Since the 1960’s, research in approximation theory has evolved significantly toward nonlinear methods, in particular solving the two problems above for various spaces $S_N$. More recently, nonlinear approximation has become attractive on a more applied level, as a tool to understand and analyze the performance of adaptive methods in signal and image processing, statistics and numerical simulation. This is in part due to the emergence of wavelet bases, for which simple $N$-term approximations (derived by thresholding the coefficients) yield in some sense optimal adaptive approximations. In such applications, the problems that arise are typically the following ones.

Problem 3 (data compression): How can we exploit the reduction of parameters in the approximation of $f$ by $f_N \in S_N$ in the perspective of optimally encoding $f$ by a small number of bits? This raises the question of a proper quantization of these parameters.

Problem 4 (statistical estimation): Can we use nonlinear approximation as a denoising scheme? In this perspective, we need to understand the interplay between the approximation process and the presence of noise.

Problem 5 (numerical simulation): How can we compute a proper nonlinear approximation of a function $u$ which is not given to us as data but as the solution of some problem $F(u) = 0$?
This is in particular the goal of adaptive refinement strategies in the numerical treatment of PDEs. The goal of the present paper is to briefly survey the subject of nonlinear approximation, with a particular focus on Problems 1 to 5, and some emphasis on wavelet-based methods. We would like to point out that these questions are also addressed in the survey paper [15], which contains a more substantial development of the theoretical aspects. We hope that our notes might be helpful to the non-expert reader who wants to get a first general and intuitive vision of the subject, from the point of view of its various applications, before perhaps going into a more detailed study.

The paper is organized as follows. As a starter, we discuss in §2 a simple example, based on piecewise constant functions, which illustrates the differences between linear and nonlinear approximation, and we discuss a first algorithm which produces nonlinear piecewise constant approximations. In §3, we show that such approximations can also be produced by thresholding the coefficients in the Haar wavelet system. In §4, we give the general results on linear uniform approximation of finite element or wavelet types. General results on nonlinear adaptive approximation by wavelet thresholding or adaptive partitions are given in §5. Applications to signal compression and estimation are discussed in §6 and §7. Applications to adaptive numerical simulation are briefly described in §8. Finally, we conclude in §9 with some remarks and open problems arising naturally in the multivariate setting.
2 A Simple Example

Let us consider the approximation of functions defined on the unit interval $I = [0,1]$ by piecewise constant functions. More precisely, given a disjoint partition of $I$ into $N$ subintervals $I_0,\cdots,I_{N-1}$ and a function $f$ in $L^1(I)$, we shall approximate $f$ on each $I_k$ by its average $a_{I_k}(f) = |I_k|^{-1}\int_{I_k} f(t)\,dt$. The resulting approximant can thus be written as
\[ f_N := \sum_{k=0}^{N-1} a_{I_k}(f)\,\chi_{I_k}. \tag{5} \]
If the $I_k$ are fixed independently of $f$, then $f_N$ is simply the orthogonal projection of $f$ onto the space of piecewise constant functions on the partition $(I_k)$, i.e., a linear approximation of $f$. A natural choice is the uniform partition $I_k := [k/N,(k+1)/N]$. With such a choice, let us now consider the error between $f$ and $f_N$, for example in the $L^\infty$ metric. For this, we shall assume that $f$ is in $C(I)$, the space of continuous functions on $I$. It is then clear that on each $I_k$ we have
\[ |f(t) - f_N(t)| = |f(t) - a_{I_k}(f)| \le \sup_{t,u\in I_k} |f(t)-f(u)|. \tag{6} \]
We thus have the error estimate
\[ \|f - f_N\|_{L^\infty} \le \sup_k \sup_{t,u\in I_k} |f(t)-f(u)|. \tag{7} \]
This can be converted into an estimate in terms of $N$, under some additional smoothness assumptions on $f$. In particular, if $f$ has a bounded first derivative, we have $\sup_{t,u\in I_k} |f(t)-f(u)| \le |I_k|\,\|f'\|_{L^\infty} = N^{-1}\|f'\|_{L^\infty}$, and thus
\[ \|f - f_N\|_{L^\infty} \le N^{-1}\|f'\|_{L^\infty}. \tag{8} \]
Similarly, if $f$ is in the Hölder space $C^\alpha$ for some $\alpha\in\,]0,1[$, i.e., if for all $x,y\in[0,1]$,
\[ |f(x)-f(y)| \le C|x-y|^\alpha, \tag{9} \]
we obtain the estimate
\[ \|f - f_N\|_{L^\infty} \le CN^{-\alpha}. \tag{10} \]
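As a quick numerical illustration of the rate (10), the following Python sketch (an illustration only: averages and sup-norms are estimated on sample grids, so the reported errors are approximate) approximates $f(x)=\sqrt{x}$, which belongs to $C^{1/2}$, by its cell averages on uniform partitions. Quadrupling $N$ roughly halves the error, consistent with the rate $N^{-1/2}$.

```python
import math

def uniform_pc_error(f, N, samples_per_cell=200):
    """Estimate the sup-norm error of approximating f on [0,1] by its
    cell averages over the uniform partition I_k = [k/N, (k+1)/N]."""
    err = 0.0
    for k in range(N):
        a, b = k / N, (k + 1) / N
        h = (b - a) / samples_per_cell
        # cell average estimated by a midpoint rule
        avg = sum(f(a + (i + 0.5) * h) for i in range(samples_per_cell)) / samples_per_cell
        # deviation from the average, sampled on a fine grid
        err = max(err, max(abs(f(a + i * h) - avg) for i in range(samples_per_cell + 1)))
    return err

f = lambda x: math.sqrt(x)   # f is in the Holder space C^alpha with alpha = 1/2
for N in (4, 16, 64):
    print(N, uniform_pc_error(f, N))   # error decays roughly like N^(-1/2)
```

The dominant error comes from the first cell $[0,1/N]$, where $\sqrt{x}$ is steepest; this is precisely what an adaptive partition will exploit below.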
By considering simple examples such as $f(x) = x^\alpha$ for $0 < \alpha \le 1$, one can easily check that this rate is actually sharp. In fact it is an easy exercise to check that a converse result holds: if a function $f \in C([0,1])$ satisfies (10) for some $\alpha\in\,]0,1[$, then necessarily $f$ is in $C^\alpha$, and $f'$ is in $L^\infty$ in the case where $\alpha = 1$. Finally, note that we cannot hope for a better rate than $N^{-1}$: this reflects the fact that piecewise constant functions are only first-order accurate.

If we now consider an adaptive partition where the $I_k$ depend on the function $f$ itself, we enter the topic of nonlinear approximation. In order to understand the potential gain in switching from uniform to adaptive partitions, let us consider a function $f$ such that $f'$ is integrable, i.e., $f$ is in the space $W^{1,1}$. Since we have $\sup_{t,u\in I_k}|f(t)-f(u)| \le \int_{I_k}|f'(t)|\,dt$, we see that a natural choice of the $I_k$ can be made by equalizing the quantities $\int_{I_k}|f'(t)|\,dt = N^{-1}\int_0^1 |f'(t)|\,dt$, so that, in view of the basic estimate (7), we obtain the error estimate
\[ \|f - f_N\|_{L^\infty} \le N^{-1}\|f'\|_{L^1}. \tag{11} \]
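The equalizing construction can be tried out explicitly for $f(x)=x^\alpha$: since $\int_0^x|f'(t)|\,dt = x^\alpha$, the balanced knots are $x_k = (k/N)^{1/\alpha}$. The sketch below (again with averages and sup-norms estimated on sample grids) compares the uniform and adapted partitions for $\alpha = 1/2$.

```python
import math

alpha = 0.5
f = lambda x: x ** alpha     # f'(x) = alpha * x^(alpha-1) is in L^1 but unbounded

def pc_error_on_knots(f, knots, samples=200):
    """Estimate the sup-norm error of the cell-average approximant
    on the partition defined by the given knots."""
    err = 0.0
    for a, b in zip(knots[:-1], knots[1:]):
        h = (b - a) / samples
        avg = sum(f(a + (i + 0.5) * h) for i in range(samples)) / samples
        err = max(err, max(abs(f(a + i * h) - avg) for i in range(samples + 1)))
    return err

N = 32
uniform = [k / N for k in range(N + 1)]
# equalize int_{I_k} |f'|: here int_0^x |f'(t)| dt = x^alpha,
# so the balanced knots are x_k = (k/N)^(1/alpha)
adapted = [(k / N) ** (1 / alpha) for k in range(N + 1)]
print(pc_error_on_knots(f, uniform))   # behaves like N^(-alpha)
print(pc_error_on_knots(f, adapted))   # behaves like N^(-1)
```

On each adapted cell the oscillation of $f$ is exactly $1/N$, so the adapted error is bounded by $1/N$, while the uniform error is of order $N^{-\alpha}$.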
In comparison with the uniform/linear situation, we have thus obtained the same rate as in (8) for a larger class of functions, since $f'$ is not assumed to be bounded but only integrable. From a slightly different angle, the nonlinear approximation rate might be significantly better than the linear rate for a fixed function $f$. For instance, the function $f(x) = x^\alpha$, $0 < \alpha \le 1$, has the linear rate $N^{-\alpha}$ and the nonlinear rate $N^{-1}$, since $f'(x) = \alpha x^{\alpha-1}$ is in $L^1(I)$. Similarly to the linear case, it can be checked that a converse result holds: if $f \in C([0,1])$ is such that
\[ \sigma_N(f) \le CN^{-1}, \tag{12} \]
where $\sigma_N(f)$ is the $L^\infty$ error of best approximation by adaptive piecewise constant functions on $N$ intervals, then $f$ is necessarily in $W^{1,1}$. The above construction of an adaptive partition, based on balancing the $L^1$ norm of $f'$, is somewhat theoretical, in the sense that it pre-assumes a certain amount of smoothness for $f$. A more realistic adaptive approximation algorithm should also operate on functions which are not in $W^{1,1}$. We shall describe two natural algorithms for building an adaptive partition. The first algorithm is sometimes known as adaptive splitting and was studied, e.g., in [17]. In this algorithm, the partition is determined by a prescribed tolerance $\varepsilon > 0$
which represents the accuracy that one wishes to achieve. Given a partition of $[0,1]$ and any interval $I_k$ of this partition, we split $I_k$ into two sub-intervals of equal size if $\|f - a_{I_k}(f)\|_{L^\infty(I_k)} \ge \varepsilon$, and leave it as such otherwise. Starting this procedure on the single interval $I = [0,1]$ and using the fixed tolerance $\varepsilon > 0$ at each step, we end with an adaptive partition $(I_1, \cdots, I_N)$ and a corresponding piecewise constant approximation $f_N$ with $N = N(\varepsilon)$ pieces such that $\|f - f_N\|_{L^\infty} \le \varepsilon$. Note that we now have the restriction that the $I_k$ are dyadic intervals, i.e., intervals of the type $2^{-j}[n, n+1]$. We now want to understand how the adaptive splitting algorithm behaves in comparison to the optimal partition. In particular, do we also have $\|f - f_N\|_{L^\infty} \le C N^{-1}$ when $f' \in L^1$? The answer to this question turns out to be negative, but a slight strengthening of the smoothness assumption will be sufficient to ensure this convergence rate: we shall instead assume that the maximal function of $f'$ is in $L^1$. We recall that the maximal function of a locally integrable function $g$ is defined by
\[ Mg(x) := \sup_{r>0}\,[\mathrm{vol}(B(x,r))]^{-1} \int_{B(x,r)} |g(t)|\,dt. \tag{13} \]
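The adaptive splitting loop just described can be sketched in a few lines (an illustration under assumed conventions, not the author's code: the constant $a_{I_k}(f)$ is approximated by the mean of samples, and the $L^\infty$ norm by a maximum over samples):

```python
import numpy as np

def adaptive_split(f, eps, a=0.0, b=1.0, depth=0, max_depth=24):
    """Adaptive splitting: keep a dyadic interval (a, b) if the sampled
    L^inf distance between f and its average is below eps, else split it."""
    t = np.linspace(a, b, 65)
    v = f(t)
    err = np.abs(v - v.mean()).max()   # approximates ||f - a_I(f)||_{L^inf(I)}
    if err < eps or depth >= max_depth:
        return [(a, b)]
    m = 0.5 * (a + b)
    return (adaptive_split(f, eps, a, m, depth + 1, max_depth)
            + adaptive_split(f, eps, m, b, depth + 1, max_depth))

# f(x) = sqrt(x): f' is integrable but unbounded near 0, so the partition
# produced by the algorithm is much finer near the singularity at x = 0
parts = adaptive_split(np.sqrt, 1e-2)
widths = sorted(b - a for a, b in parts)
print(len(parts), widths[0], widths[-1])
```

Running the sketch on $f(x)=\sqrt{x}$ shows the expected behaviour: small dyadic intervals accumulate near $x=0$ while the smooth region is covered by a few large ones.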
It is known that $Mg \in L^p$ if and only if $g \in L^p$ for $1 < p < \infty$, and that $Mg \in L^1$ if and only if $g \in L\log L$, i.e., $\int (|g| + |g\,\log|g||) < \infty$. Therefore, the assumption that $Mf'$ is integrable is only slightly stronger than $f \in W^{1,1}$. If $(I_1, \cdots, I_N)$ is the final partition, consider for each $k$ the interval $J_k$ which is the parent of $I_k$ in the splitting process, i.e., such that $I_k \subset J_k$ and $|J_k| = 2|I_k|$. We therefore have
\[ \varepsilon \le \|f - a_{J_k}(f)\|_{L^\infty} \le \int_{J_k} |f'(t)|\,dt. \tag{14} \]
For all $x \in I_k$, the ball $B(x, 2|I_k|)$ contains $J_k$, and it follows that
\[ Mf'(x) \ge [\mathrm{vol}(B(x, 2|I_k|))]^{-1} \int_{B(x,2|I_k|)} |f'(t)|\,dt \ge [4|I_k|]^{-1}\varepsilon, \tag{15} \]
which implies in turn
\[ \int_{I_k} Mf'(t)\,dt \ge \varepsilon/4. \tag{16} \]
If $Mf'$ is integrable, this yields the estimate
\[ N(\varepsilon) \le 4\varepsilon^{-1} \int_0^1 Mf'(t)\,dt. \tag{17} \]
It follows that
\[ \|f - f_N\|_{L^\infty} \le C N^{-1} \tag{18} \]
with $C = 4\int_0^1 Mf'$. Note that in this case this is only a sufficient condition for the rate $N^{-1}$ (a simple smoothness condition which characterizes this rate is still unknown).
Nonlinear Approximation
7
3 The Haar System and Thresholding

The second algorithm is based on thresholding the decomposition of $f$ in the simplest wavelet basis, namely the Haar system. The decomposition of a function $f$ defined on $[0,1]$ into the Haar system is illustrated on Figure 1. The first component in this decomposition is the average of $f$, i.e., the projection onto the constant function $\varphi = \chi_{[0,1]}$:
\[ P_0 f = \langle f, \varphi\rangle\,\varphi. \tag{19} \]
The approximation is then recursively refined into
\[ P_j f = \sum_{k=0}^{2^j-1} \langle f, \varphi_{j,k}\rangle\,\varphi_{j,k}, \tag{20} \]
where $\varphi_{j,k} = 2^{j/2}\varphi(2^j\cdot - k)$, i.e., normalized averages of $f$ on the intervals $I_{j,k} = [2^{-j}k, 2^{-j}(k+1)[$, $k = 0, \cdots, 2^j-1$. Clearly $P_j f$ is the $L^2$-orthogonal projection of $f$ onto the space $V_j$ of piecewise constant functions on the intervals $I_{j,k}$, $k = 0, \cdots, 2^j-1$. The orthogonal complement $Q_j f = P_{j+1}f - P_j f$ is spanned by the basis functions
\[ \psi_{j,k} = 2^{j/2}\psi(2^j\cdot - k), \quad k = 0, \cdots, 2^j-1, \tag{21} \]
where $\psi$ is $1$ on $[0,1/2[$, $-1$ on $[1/2,1[$ and $0$ elsewhere. By letting $j$ go to $+\infty$, we therefore obtain the expansion of $f$ into an orthonormal system of $L^2([0,1])$:
\[ f = \langle f,\varphi\rangle\varphi + \sum_{j\ge 0}\sum_{k=0}^{2^j-1} \langle f, \psi_{j,k}\rangle\,\psi_{j,k} = \sum_{\lambda} d_\lambda\psi_\lambda. \tag{22} \]
Here we use the notation $\psi_\lambda$ and $d_\lambda = \langle f, \psi_\lambda\rangle$ in order to concatenate the scale and space parameters $j$ and $k$ into one index $\lambda = (j,k)$, which varies in a suitable set $\nabla$, and to include the very first function $\varphi$ into the same notation. We shall keep track of the scale by using the notation
\[ |\lambda| = j \tag{23} \]
whenever the basis function $\psi_\lambda$ has resolution $2^{-j}$. This simple example is known as the Haar system since its introduction by Haar in 1909. Its main limitation is that it is based on piecewise constant functions, which are discontinuous and only allow for approximation of low order accuracy. We shall remedy this defect by using smoother wavelet bases in the next sections. We can use wavelets in a rather trivial way to build linear approximations of a function $f$, since the projections of $f$ onto $V_j$ are given by
\[ P_j f = \sum_{|\lambda|<j} d_\lambda\psi_\lambda. \tag{24} \]
Figure 1. Decomposition into the Haar system
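The recursive averages and differences of (19)-(22) can be sketched as follows (a minimal illustration, assuming $f$ is given by its averages on the $2^J$ finest dyadic intervals):

```python
import numpy as np

def haar_decompose(values):
    """Haar decomposition of a piecewise constant f on 2^J dyadic intervals.
    Returns the coarse coefficient <f, phi> and the detail coefficients
    <f, psi_{j,k}> per level, coarsest level first."""
    J = int(np.log2(len(values)))
    c = np.asarray(values, float) * 2.0 ** (-J / 2)       # <f, phi_{J,k}>
    details = []
    while len(c) > 1:
        details.append((c[0::2] - c[1::2]) / np.sqrt(2))  # <f, psi_{j,k}>
        c = (c[0::2] + c[1::2]) / np.sqrt(2)              # <f, phi_{j,k}>
    return c[0], details[::-1]

v = np.array([1.0, 3.0, 2.0, 2.0])
mean_coef, det = haar_decompose(v)
# orthonormality of the system: coefficient energy equals ||f||_{L^2}^2
energy = mean_coef ** 2 + sum(np.sum(d ** 2) for d in det)
print(mean_coef, energy, np.mean(v ** 2))
```

Since the Haar system is orthonormal, the sum of squared coefficients reproduces $\|f\|_{L^2}^2$, which is a convenient check on the implementation.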
Such approximations simply correspond to the case $N = 2^j$ of the linear projection onto piecewise constant functions on a uniform partition of $N$ intervals, as studied in the previous section. On the other hand, one can think of using only a restricted set of wavelets at each scale $j$ in order to build nonlinear adaptive approximations. A natural way to obtain such adaptive approximations is by thresholding, i.e., keeping only the largest contributions $d_\lambda\psi_\lambda$ in the wavelet expansion of $f$. Such a strategy leads to an adaptive discretization of $f$, due to the fact that the size of the wavelet coefficients $d_\lambda$ is influenced by the local smoothness of $f$. Indeed, if $f$ is simply bounded on the support $S_\lambda$ of $\psi_\lambda$, we have the obvious estimate
\[ |d_\lambda| = |\langle f, \psi_\lambda\rangle| \le \sup_{t\in S_\lambda}|f(t)| \int |\psi_\lambda| = 2^{-|\lambda|/2} \sup_{t\in S_\lambda}|f(t)|. \tag{25} \]
On the other hand, if $f$ is continuously differentiable on $S_\lambda$, we can use the fact that $\int \psi_\lambda = 0$ to write
\[ |d_\lambda| = \inf_{c\in\mathbb{R}} |\langle f - c, \psi_\lambda\rangle| \le 2^{-|\lambda|/2} \inf_{c\in\mathbb{R}} \sup_{t\in S_\lambda}|f(t)-c| \le 2^{-|\lambda|/2} \sup_{t,u\in S_\lambda}|f(t)-f(u)| \le 2^{-3|\lambda|/2} \sup_{t\in S_\lambda}|f'(t)|. \]
Note that if $f$ were not differentiable on $S_\lambda$ but simply Hölder continuous of exponent $\alpha \in\,]0,1[$, a similar computation would yield the intermediate estimate $|d_\lambda| \le C2^{-(\alpha+1/2)|\lambda|}$. As in the case of Fourier coefficients, more smoothness implies a faster decay, yet a fundamental difference is that only local smoothness is involved in the wavelet estimates. Therefore, if $f$ is $C^1$ everywhere except at some isolated point $x_0$, the estimate of $|d_\lambda|$ by $2^{-3|\lambda|/2}$ is only lost for those $\lambda$ such that $x_0 \in S_\lambda$. In that sense, multiscale
representations are better adapted than Fourier representations to concentrate the information contained in functions which are not uniformly smooth. This is illustrated by the following example. We display on Figure 2 the function $f(x) = \sqrt{|\cos(2\pi x)|}$, which has cusp singularities at the points $x = 1/4$ and $x = 3/4$, and which is discretized at resolution $2^{-13}$ in order to compute its coefficients in the Haar basis for $|\lambda| < 13$. In order to visualize the effect of local smoothness on these coefficients, we display on Figure 3 the set of indices $\lambda = (j,k)$ such that $|d_\lambda|$ is larger than the threshold $\varepsilon = 5\times 10^{-3}$, plotting the spatial position $2^{-j}k$ of the wavelet on the $x$ axis and its scale level $j$ on the $y$ axis. We observe that for $j > 4$, the coefficients above the threshold are concentrated only in the vicinity of the singularities. This is explained by the fact that the decay of the coefficients is governed by $|d_\lambda| \le 2^{-3|\lambda|/2}\sup_{t\in S_\lambda}|f'(t)|$ in the regions of smoothness, while the estimate $|d_\lambda| \le C2^{-(\alpha+1/2)|\lambda|}$ with $\alpha = 1/2$ prevails near the singularities. Figure 4 displays the result of the reconstruction of $f$ using only this restricted set of wavelet coefficients,
\[ f_\varepsilon = \sum_{|d_\lambda|>\varepsilon} d_\lambda\psi_\lambda, \tag{26} \]
and it reveals the spatial adaptivity of the thresholding operator: the approximation is automatically refined in the neighbourhood of the singularities, where wavelet coefficients have been kept up to the resolution level $j = 8$. In this example, we have kept the largest components $d_\lambda\psi_\lambda$ measured in the $L^2$ norm. This strategy is ideal to minimize the $L^2$ error of approximation for a prescribed number $N$ of preserved coefficients. If we are interested in the $L^\infty$ error, we should rather keep the largest components measured in the $L^\infty$ norm, i.e., the largest normalized coefficients $|d_\lambda|2^{|\lambda|/2}$. Just as in the case of the adaptive splitting algorithm, we might want to understand how the partition obtained by wavelet thresholding behaves in comparison to the optimal partition. The answer is again that it is nearly optimal; we leave this question aside, however, since we shall provide much more general results on the performance of wavelet thresholding in §4. The wavelet approach to nonlinear approximation is particularly attractive for the following reason: the nonlinearity is reduced to a very simple operation (thresholding according to the size of the coefficients), resulting in simple and efficient algorithms in many applications, as well as a relatively simple analysis of these applications.
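A hedged numerical rendition of this experiment (the cusp function is taken here as $\sqrt{|\cos(2\pi x)|}$, consistent with the Hölder exponent $\alpha = 1/2$ quoted above; the midpoint sampling convention is an assumption):

```python
import numpy as np

J, eps = 13, 5e-3
x = (np.arange(2 ** J) + 0.5) / 2 ** J
# scaling coefficients <f, phi_{J,k}> at the finest resolution 2^{-13}
c = np.sqrt(np.abs(np.cos(2 * np.pi * x))) * 2.0 ** (-J / 2)
kept, total = 0, 0
while len(c) > 1:
    d = (c[0::2] - c[1::2]) / np.sqrt(2)   # detail coefficients d_lambda
    kept += int(np.sum(np.abs(d) > eps))
    total += len(d)
    c = (c[0::2] + c[1::2]) / np.sqrt(2)
print(kept, "of", total, "coefficients exceed the threshold", eps)
```

Only a small fraction of the $2^{13}-1$ Haar coefficients survive the threshold, with the fine-scale survivors clustered around the two cusps, as in Figure 3.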
4 Linear Uniform Approximation

We now address linear uniform approximation in more general terms. In order to improve on the rate $N^{-1}$ obtained with piecewise constant functions, one needs to introduce approximants with a higher degree of accuracy, such as splines or finite element spaces. In the case of linear uniform approximation,
Figure 2. The function $f(x) = \sqrt{|\cos(2\pi x)|}$

Figure 3. Coefficients above threshold $\varepsilon = 5\times 10^{-3}$
Figure 4. Reconstruction from coefficients above threshold
these spaces consist of piecewise polynomial functions on regular partitions $T_h$ with uniform mesh size $h$. If $V_h$ is such a space discretizing a regular domain $\Omega \subset \mathbb{R}^d$, its dimension is of the same order as the number of balls of radius $h$ which are needed to cover $\Omega$, namely
\[ N = \dim(V_h) \sim h^{-d}. \tag{27} \]
The approximation theory for such spaces is quite classical, see, e.g., [5], and can be summarized in the following way. If $W^{s,p}$ denotes the classical Sobolev space, consisting of those functions in $L^p$ such that $D^\alpha f \in L^p$ for $|\alpha| \le s$, we typically have the error estimate
\[ \inf_{g\in V_h} \|f - g\|_{W^{s,p}} \le C h^t \|f\|_{W^{s+t,p}}, \tag{28} \]
provided that $V_h$ is contained in $W^{s,p}$ and that $V_h$ has approximation order larger than $s+t$, i.e., contains all polynomials of degree strictly less than $s+t$. In the particular case $s = 0$, this gives
\[ \inf_{g\in V_h} \|f - g\|_{L^p} \le C h^t \|f\|_{W^{t,p}}. \tag{29} \]
Such classical results also hold for fractional smoothness. If we rewrite them in terms of the decay of the best approximation error with respect to the number of parameters, we obtain that if $X = W^{s,p}$, we have
\[ \sigma_N(f) \le C N^{-t/d}, \tag{30} \]
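As a quick sanity check of (29)-(30) in the simplest setting (piecewise constants, $t = 1$, $d = 1$; an illustrative sketch, not from the text), halving $h$ should roughly halve the $L^\infty$ error for a smooth $f$:

```python
import numpy as np

def pc_error(f, N, samples=64):
    """Sampled L^inf error of approximation by piecewise constant interval
    averages on a uniform partition of [0, 1] into N intervals."""
    errs = []
    for k in range(N):
        t = np.linspace(k / N, (k + 1) / N, samples)
        v = f(t)
        errs.append(np.abs(v - v.mean()).max())
    return max(errs)

e16, e32 = pc_error(np.sin, 16), pc_error(np.sin, 32)
print(e16 / e32)  # close to 2, consistent with the O(h) = O(N^{-1}) rate
```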
provided that $f$ has $t$ additional derivatives in the metric $L^p$ compared to the general functions in $X$. Therefore, the compromise between the $L^p$ or $W^{s,p}$ approximation error and the number of parameters is governed by the approximation order of the $V_h$ spaces, the dimension $d$, and the level of smoothness of $f$ measured in $L^p$. Such approximation results can be understood at a very basic and intuitive level: if $V_h$ contains polynomials of degree $t-1$, we can think of the approximation of $f$ as a close substitute for its Taylor expansion $f_K$ at this order on each element $K \in T_h$, which has accuracy $h^t|D^t f|$, and (29) can then be thought of as the integrated version of this local error estimate. At this stage it is interesting to look at linear approximation from the angle of multiscale decompositions into wavelet bases. Such bases are generalizations of the Haar system which was discussed in the previous section, and we shall first recall their main features (see [14] and [6] for more details). They are associated with multiresolution approximation spaces $(V_j)_{j\ge 0}$ such that $V_j \subset V_{j+1}$ and $V_j$ is generated by a local basis $(\varphi_\lambda)_{|\lambda|=j}$. By local we mean that the supports are controlled by
\[ \mathrm{diam}(\mathrm{supp}(\varphi_\lambda)) \le C2^{-j} \tag{31} \]
if $\lambda \in \Gamma_j$, and are "almost disjoint" in the sense that
\[ \#\{\mu \in \Gamma_j \ \mathrm{s.t.}\ \mathrm{supp}(\varphi_\lambda) \cap \mathrm{supp}(\varphi_\mu) \neq \emptyset\} \le C, \tag{32} \]
with $C$ independent of $\lambda$ and $j$. Such spaces can be built in particular as nested finite element spaces $V_j = V_{h_j}$ with mesh size $h_j \approx 2^{-j}$, in which case $(\varphi_\lambda)_{|\lambda|=j}$ is the classical nodal basis. Just as in the case of the Haar system, a complement space $W_j$ of $V_j$ in $V_{j+1}$ is generated by a similar local basis $(\psi_\lambda)_{|\lambda|=j}$. The full multiscale wavelet basis $(\psi_\lambda)$ allows us to expand an arbitrary function $f$, with the convention that we incorporate the functions $(\varphi_\lambda)_{|\lambda|=0}$ into the first "layer" $(\psi_\lambda)_{|\lambda|=0}$. In the standard constructions of wavelets on the Euclidean space $\mathbb{R}^d$, the scaling functions have the form $\varphi_\lambda = \varphi_{j,k} = 2^{jd/2}\varphi(2^j\cdot - k)$, $k \in \mathbb{Z}^d$, and similarly for the wavelets, so that $\lambda = (j,k)$. In the case of a general domain $\Omega \subset \mathbb{R}^d$, special adaptations of the basis functions are required near the boundary $\partial\Omega$, which are accounted for in the general notation $\lambda$. Wavelets need not be orthonormal, but one often requires that they constitute a Riesz basis of $L^2(\Omega)$, i.e., their finite linear combinations are dense in $L^2$ and for all sequences $(d_\lambda)$ we have the norm equivalence
\[ \Big\|\sum_\lambda d_\lambda\psi_\lambda\Big\|_{L^2}^2 \sim \sum_\lambda |d_\lambda|^2. \tag{33} \]
In such a case, the coefficients dλ in the expansion of f are obtained by an inner product dλ = f, ψ˜λ , where the dual wavelet ψ˜λ is an L2 -function. In the standard biorthogonal constructions, the dual wavelet system (ψ˜λ ) is also built from nested spaces V˜j and has similar local support properties as the primal wavelets ψλ . The practical advantage of such a setting is the possibility
of "switching" between the "standard" (or "nodal") discretization of $f \in V_j$ in the basis $(\varphi_\lambda)_{|\lambda|=j}$ and its "multiscale" representation in the basis $(\psi_\lambda)_{|\lambda|<j}$ by means of fast $O(N)$ decomposition and reconstruction algorithms, where $N \sim 2^{dj}$ denotes the dimension of $V_j$ in the case where $\Omega$ is bounded. Multiscale approximations and decompositions into wavelet bases provide a slightly stronger statement of the linear approximation results, due to the possibility of characterizing the smoothness of a function $f$ through the numerical properties of its multiscale decomposition. In the case of the Sobolev spaces $H^s = W^{s,2}$, this characterization has the form of the following norm equivalence:
\[ \|f\|_{H^s}^2 \sim \|P_0 f\|_{L^2}^2 + \sum_{j\ge 0} 2^{2sj}\|Q_j f\|_{L^2}^2, \tag{34} \]
where $P_j f = \sum_{|\lambda|<j} d_\lambda\psi_\lambda$ and $Q_j f = P_{j+1}f - P_j f = \sum_{|\lambda|=j} d_\lambda\psi_\lambda$ are respectively the biorthogonal projectors onto $V_j$ and $W_j$. Such a norm equivalence should at least be understood at the intuitive level as a close substitute for the Fourier characterization
\[ \|f\|_{H^s}^2 \sim \int (1+|\omega|^{2s})\,|\hat f(\omega)|^2\,d\omega, \tag{35} \]
where the weight $(1+|\omega|^{2s})$ plays a role analogous to $2^{2sj}$ in (34). Note that in the particular case where $V_j$ is the space of functions such that $\hat f$ is supported in $[-2^j, 2^j]$ and $P_j$ is the orthogonal projector, we directly obtain
\[ \|f\|_{H^s}^2 \sim \int (1+|\omega|^{2s})|\hat f(\omega)|^2 = \int_{|\omega|<1} (1+|\omega|^{2s})|\hat f(\omega)|^2 + \sum_{j\ge 0} \int_{2^j<|\omega|<2^{j+1}} (1+|\omega|^{2s})|\hat f(\omega)|^2 \sim \int_{|\omega|<1} |\hat f(\omega)|^2 + \sum_{j\ge 0} 2^{2sj} \int_{2^j<|\omega|<2^{j+1}} |\hat f(\omega)|^2 \sim \|P_0 f\|_{L^2}^2 + \sum_{j\ge 0} 2^{2sj}\|Q_j f\|_{L^2}^2. \]
We next remark that $Q_j$ can be replaced by $I - P_j$ in (34), in the sense that we also have
\[ \|f\|_{H^s}^2 \sim \|P_0 f\|_{L^2}^2 + \sum_{j\ge 0} 2^{2sj}\|f - P_j f\|_{L^2}^2. \tag{36} \]
In order to prove this, we need to compare the right hand sides of (34) and (36). In one direction we obviously have
\[ \|Q_j f\| \le \|f - P_j f\| + \|f - P_{j+1}f\|, \tag{37} \]
which is sufficient to control the r.h.s. of (34) by the r.h.s. of (36). In the other direction we use
\[ \|f - P_j f\| \le \sum_{l\ge j} \|Q_l f\| \tag{38} \]
and conclude by the discrete Hardy inequality, which states that if $(a_j)$ is a positive sequence and $b_j := \sum_{l\ge j} a_l$, then for all $s > 0$ and $p > 0$
\[ \sum_j (2^{sj} b_j)^p \le C(s,p) \sum_j (2^{sj} a_j)^p. \tag{39} \]
The norm equivalence (36) shows that $f \in H^s$ is exactly equivalent to the property that
\[ \Big(2^{sj} \inf_{g\in V_j} \|f - g\|_{L^2}\Big)_{j\ge 0} \in \ell^2, \tag{40} \]
which is a slight improvement over the approximation rate
\[ \inf_{g\in V_j} \|f - g\|_{L^2} \le C2^{-sj}, \tag{41} \]
which would be a re-expression of (29). In order to provide a similar statement for more general $L^p$ approximation, one needs to introduce the Besov spaces $B^s_{p,q}$, which measure smoothness of order $s > 0$ in $L^p$ according to
\[ \|f\|_{B^s_{p,q}} := \|f\|_{L^p} + \big\|\big(2^{sj}\,\omega_m(f, 2^{-j})_p\big)_{j\ge 0}\big\|_{\ell^q}, \tag{42} \]
where
\[ \omega_m(f,t)_p := \sup_{|h|\le t} \|\Delta_h^m f\|_{L^p} = \sup_{|h|\le t} \Big\|\sum_{k=0}^m \binom{m}{k}(-1)^k f(\cdot - kh)\Big\|_{L^p} \]
is the $m$-th order $L^p$ modulus of smoothness and $m$ is any integer strictly larger than $s$. Recall that we have $H^s \sim B^s_{2,2}$ for all $s > 0$, $C^s \sim B^s_{\infty,\infty}$, and $W^{s,p} \sim B^s_{p,p}$ for all non-integer $s > 0$. For such classes, the norm equivalences which generalize (34) and (36) have the form
\[ \|f\|_{B^s_{p,q}} \sim \|P_0 f\|_{L^p} + \big\|\big(2^{sj}\|Q_j f\|_{L^p}\big)_{j\ge 0}\big\|_{\ell^q} \sim \|P_0 f\|_{L^p} + \big\|\big(2^{sj}\|f - P_j f\|_{L^p}\big)_{j\ge 0}\big\|_{\ell^q}. \tag{43} \]
Such norm equivalences reflect the intuitive idea that the linear approximation error $\|f - P_j f\|_{L^p}$ decays like $O(2^{-sj})$, or $O(N^{-s/d})$, provided that $f$ has "$s$ derivatives in $L^p$". They are essentially valid under the restriction that the wavelet $\psi_\lambda$ itself has slightly more than $s$ derivatives in $L^p$. We refer to [6] for the general mechanism which allows us to prove these results, based on direct and inverse estimates as well as interpolation theory. Finally, we can re-express these norm equivalences in terms of wavelet coefficients: using the local properties of wavelet bases, we have at each level the norm equivalence
\[ \|Q_j f\|_{L^p}^p \sim \sum_{|\lambda|=j} \|d_\lambda\psi_\lambda\|_{L^p}^p \sim \sum_{|\lambda|=j} 2^{d(p/2-1)|\lambda|}|d_\lambda|^p, \tag{44} \]
which in combination with (43) yields
\[ \|f\|_{B^s_{p,q}} \sim \big\|\big(2^{(s+d/2-d/p)j}\,\|(d_\lambda)_{|\lambda|=j}\|_{\ell^p}\big)_{j\ge 0}\big\|_{\ell^q}. \tag{45} \]
5 Nonlinear Adaptive Approximation

Let us now turn to nonlinear adaptive approximation, with a special focus on $N$-term approximation in a wavelet basis: denoting by
\[ S_N := \Big\{\sum_{\lambda\in E} c_\lambda\psi_\lambda \ ;\ \#(E) \le N\Big\} \tag{46} \]
the set of all possible $N$-term combinations of wavelets, we are interested in the behaviour of $\sigma_N(f)$ as defined in (1) for some given error norm $X$. We first consider the case $X = L^2$ and assume for simplicity that the $\psi_\lambda$ constitute an orthonormal basis. In this case, a straightforward computation shows that the best $N$-term approximation of a function $f$ is achieved by its truncated expansion
\[ f_N := \sum_{\lambda\in E_N(f)} d_\lambda\psi_\lambda, \tag{47} \]
where $E_N(f)$ contains the indices corresponding to the $N$ largest $|d_\lambda|$. The approximation error is thus given by
\[ \sigma_N(f) = \|f - f_N\|_{L^2} = \Big(\sum_{\lambda\notin E_N(f)} |d_\lambda|^2\Big)^{1/2} = \Big(\sum_{n\ge N} d_n^2\Big)^{1/2}, \tag{48} \]
where $(d_n)_{n\ge 0}$ is defined as the decreasing rearrangement of the $|d_\lambda|$, $\lambda \in \nabla$ (i.e., $d_{n-1}$ is the $n$-th largest $|d_\lambda|$). Consider now the Besov spaces $B^s_{\tau,\tau}$, where $s > 0$ and $\tau$ are linked by $1/\tau = 1/2 + s/d$. According to the norm equivalence (45), these spaces are simply characterized by
\[ \|f\|_{B^s_{\tau,\tau}} \sim \|(d_\lambda)_{\lambda\in\nabla}\|_{\ell^\tau}. \tag{49} \]
Thus if $f \in B^s_{\tau,\tau}$, we find that the decreasing rearrangement $(d_n)_{n\ge 0}$ satisfies
\[ n\,d_n^\tau \le \sum_{k=0}^{n-1} d_k^\tau \le \sum_{k\ge 0} d_k^\tau = \sum_{\lambda\in\nabla} |d_\lambda|^\tau \le C\|f\|_{B^s_{\tau,\tau}}^\tau < +\infty, \tag{50} \]
and therefore
\[ d_n \le Cn^{-1/\tau}\|f\|_{B^s_{\tau,\tau}}. \tag{51} \]
It follows that the approximation error is bounded by
\[ \sigma_N(f) \le C\|f\|_{B^s_{\tau,\tau}} \Big(\sum_{n\ge N} n^{-\frac{2}{\tau}}\Big)^{1/2} \le CN^{\frac12-\frac1\tau}\|f\|_{B^s_{\tau,\tau}} = CN^{-s/d}\|f\|_{B^s_{\tau,\tau}}. \tag{52} \]
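The decay mechanism in (50)-(52) can be illustrated numerically with a model coefficient sequence (a toy sketch; the particular sequence is an assumption, not taken from the text):

```python
import numpy as np

tau = 1.0                                  # 1/tau = 1/2 + s/d gives s/d = 1/2
d = np.arange(1, 100001) ** (-1.0 / tau)   # decreasing rearrangement d_n

def sigma(N):
    # best N-term L^2 error: the l^2 norm of the discarded tail, cf. (48)
    return np.sqrt(np.sum(d[N:] ** 2))

ratio = sigma(100) / sigma(400)
print(ratio)  # about (400/100)^{s/d} = 2, the N^{-s/d} rate of (52)
```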
At this stage let us make some remarks:
• As previously noticed, the rate $N^{-s/d}$ can be achieved by linear approximation for functions having $s$ derivatives in $L^2$, i.e., functions in $H^s$. Just as in the simple example of §2, the gain in switching to nonlinear approximation is that the class $B^s_{\tau,\tau}$ is larger than $H^s$. In particular, $B^s_{\tau,\tau}$ contains discontinuous functions for arbitrarily large values of $s$, while functions in $H^s$ are necessarily continuous if $s > d/2$.
• The rate (52) is implied by $f \in B^s_{\tau,\tau}$. On the other hand it is easy to check that (52) is equivalent to the property (51), which is itself equivalent to the property that the sequence $(d_\lambda)_{\lambda\in\nabla}$ is in the weak space $\ell^\tau_w$, i.e.,
\[ \#\{\lambda \ \mathrm{s.t.}\ |d_\lambda| \ge \varepsilon\} \le C\varepsilon^{-\tau}. \tag{53} \]
This shows that the property $f \in B^s_{\tau,\tau}$ is almost equivalent to the rate (52). One can easily check that the exact characterization of $B^s_{\tau,\tau}$ is by the stronger property $\sum_{N\ge 0} (N^{s/d}\sigma_N(f))^\tau N^{-1} < +\infty$.
• The space $B^s_{\tau,\tau}$ is critically embedded in $L^2$, in the sense that the injection is not compact. This can be viewed as an instance of the Sobolev embedding theorem, or directly checked in terms of the non-compact embedding of $\ell^\tau$ into $\ell^2$ when $\tau \le 2$. In particular, $B^s_{\tau,\tau}$ is not contained in any Sobolev space $H^{s'}$ for $s' > 0$. Therefore, no convergence rate can be expected for linear approximation of functions in $B^s_{\tau,\tau}$.
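The weak-$\ell^\tau$ condition (53) is easy to test on the model sequence $d_n = n^{-1/\tau}$ (an illustrative sketch):

```python
import numpy as np

tau = 0.5
d = np.arange(1, 200001) ** (-1.0 / tau)   # d_n = n^{-1/tau}
# (53): the number of coefficients above eps grows only like eps^{-tau}
counts = {eps: int(np.sum(d >= eps)) for eps in (1e-4, 1e-6, 1e-8)}
for eps, c in counts.items():
    print(eps, c, c * eps ** tau)   # the product stays bounded (here ~1)
```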
Figure 5. Pictorial interpretation of nonlinear vs linear approximation
The general theory of nonlinear wavelet approximation developed by DeVore and his collaborators extends these results to various error norms, for
which the analysis is far more difficult than for the $L^2$ norm. This theory is fully detailed in [15], and we would like to summarize it by stressing three main types of results, the first two answering respectively problems 1 and 2 described in the introduction.

Approximation and smoothness spaces. Given an error norm $\|\cdot\|_X$ corresponding to some smoothness space in dimension $d$, the space $Y$ of those functions such that $\sigma_N(f) = \mathrm{dist}_X(f, S_N) \le CN^{-t/d}$ has a typical description in terms of another smoothness space. Typically, if $X$ represents $s$ orders of smoothness in $L^p$, then $Y$ represents $s+t$ orders of smoothness in $L^\tau$ with $1/\tau = 1/p + t/d$, and its injection in $X$ is not compact. This generic result has a graphical interpretation displayed on Figure 5. On this figure, a point $(1/p, s)$ represents function spaces with smoothness $s$ in $L^p$, and the point $Y$ sits $t$ levels of smoothness above $X$ on the critical embedding line of slope $d$ emanating from $X$. Of course, in order to obtain rigorous results, one needs to specify for each case the exact meaning of "$s$ derivatives in $L^p$" and/or slightly modify the property $\sigma_N(f) \le CN^{-t/d}$. For instance, if $X = L^p$ for some $p \in\,]1,\infty[$, then $f \in B^t_{\tau,\tau} = Y$ with $1/\tau = 1/p + t/d$ if and only if $\sum_{N\ge 0} [N^{t/d}\sigma_N(f)]^\tau N^{-1} < +\infty$. One also needs to assume that the wavelet basis has enough smoothness, since it should at least be contained in $Y$.

Realization of a near-best approximation. For various error metrics $X$, a near-best approximation of $f$ in $S_N$ is achieved by $f_N := \sum_{\lambda\in\Lambda_N(f)} d_\lambda\psi_\lambda$, where $d_\lambda := \langle f, \tilde\psi_\lambda\rangle$ are the wavelet coefficients of $f$ and $\Lambda_N(f)$ is the set of indices corresponding to the $N$ largest contributions $\|d_\lambda\psi_\lambda\|_X$. This fact is rather easy to prove when $X$ is itself a Besov space, by using (45). A much more elaborate result is that it is also true for spaces such as $L^p$ and $W^{m,p}$ for $1 < p < +\infty$, and for the Hardy spaces $H^p$ when $p \le 1$ (see [21]).

Connections with other types of nonlinear approximation.
In the univariate setting, the smoothness spaces $Y$ characterized by a certain rate of nonlinear approximation in $X$ are essentially the same if we replace $N$-term combinations of wavelets by splines with $N$ free knots or by rational functions of degree $N$. The similarity between wavelets and free knot splines is intuitive, since both methods allow the same kind of adaptive refinement, either by inserting knots or by adding wavelet components at finer scales. The similarities between free knot splines and rational approximation were elucidated by Petrushev in [19]. However, the equivalence between wavelets and these other types of approximation is no longer valid in the multivariate context (see §7). Also closely related to $N$-term approximation are adaptive splitting procedures, which generalize the splitting procedure proposed in §2 to higher order piecewise polynomial approximation (see e.g. [17] and [15]). Such procedures typically aim at equilibrating the local error $\|f - f_N\|_{L^p}$ on each element of the adaptive partition. In the case of the example of §2, we
remark that the piecewise constant approximation resulting from the adaptive splitting procedure can always be viewed as an $N$-term approximation in the Haar system, in which the coefficients involved have a certain tree structure: if $\lambda = (j,k)$ is used in the approximation, then $(j-1, [k/2])$ is also used at the previous coarser level. Therefore the performance of adaptive splitting approximation is essentially equivalent to that of $N$-term approximation with the additional tree structure restriction. These performances have been studied in [10], where it is shown that the tree structure restriction does not affect the order $N^{-s/d}$ of $N$-term approximation in $X \sim (1/p, r)$ if the space $Y \sim (1/\tau, r+s)$ is replaced by $\tilde Y \sim (1/\tilde\tau, r+s)$ with $1/\tilde\tau < 1/\tau = 1/p + s/d$.
6 Data Compression

There exist many interesting applications of wavelets to signal processing, and we refer to [18] for a detailed overview. In this section and the following one, we would like to discuss two applications which exploit the fact that certain signals - in particular images - have a sparse representation in wavelet bases. Nonlinear approximation theory allows us to "quantify" the level of sparsity in terms of the decay of the error of $N$-term approximation. From a mathematical point of view, the $N$-term approximation of a signal $f$ can already be viewed as a "compression" algorithm, since we are reducing the number of degrees of freedom which represent $f$. However, practical compression means that the approximation of $f$ is represented by a finite number of bits. Wavelet-based compression algorithms are a particular case of transform coding algorithms, which have the following general structure:

• Transformation: the original signal $f$ is transformed into its representation $d$ (in our case of interest, the wavelet coefficients $d = (d_\lambda)$) by an invertible transform $R$.
• Quantization: the representation $d$ is replaced by an approximation $\tilde d$ which can only take a finite number of values. This approximation can be encoded with a finite number of bits.
• Reconstruction: from the encoded signal, one can reconstruct $\tilde d$ and therefore an approximation $\tilde f = R^{-1}\tilde d$ of the original signal $f$.

Therefore, a key issue is the development of appropriate quantization strategies for the wavelet representation and the analysis of the error produced by quantizing the wavelet coefficients. Such strategies should in some sense minimize the distortion $\|f - \tilde f\|_X$ for a prescribed number of bits $N$ and error metric $X$. Of course this program only makes sense if we refer to a certain model of the signal: in a deterministic context, one considers the error $\sup_{f\in Y}\|f - \tilde f\|_X$ for a given class $Y$, while in a stochastic context, one considers the error $E(\|f - \tilde f\|_X)$, where the expectation is over the realizations $f$ of a stochastic process. In the following we shall indicate some results in the deterministic context.
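A minimal sketch of a scalar quantizer achieving the accuracy guarantee (54) discussed below (an illustration: coefficients under $\varepsilon$ are thresholded to zero, the rest rounded on a grid of step $\varepsilon$; the per-band bit allocation of the text is not reproduced here):

```python
import numpy as np

def quantize(d, eps):
    """Scalar quantization: drop |d| <= eps, round the rest to multiples
    of eps, so that |d - d_tilde| <= eps for every coefficient."""
    d = np.asarray(d, float)
    out = np.zeros_like(d)
    big = np.abs(d) > eps
    out[big] = np.round(d[big] / eps) * eps
    return out

d = np.array([0.004, -0.31, 0.072, 2.5, 0.0099])
dq = quantize(d, eps=0.01)
print(dq, np.abs(d - dq).max())
```

Each quantized value can be stored with a number of bits depending on its dyadic size band, which is exactly the adaptive allocation analyzed below.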
We shall discuss here the simple case of scalar quantization, which amounts to quantizing each coefficient $d_\lambda$ independently into an approximation $\tilde d_\lambda$ in order to produce $\tilde d$. Similarly to the distinction between linear and nonlinear approximation, we can distinguish between two types of quantization strategies:

• Non-adaptive quantization: the map $d_\lambda \mapsto \tilde d_\lambda$ and the number of bits used to represent $d_\lambda$ depend only on the index $\lambda$. In practice they typically depend on the scale level $|\lambda|$: fewer bits are allocated to the fine scale coefficients, which have smaller values than the coarse scale coefficients in an averaged sense.
• Adaptive quantization: the map $d_\lambda \mapsto \tilde d_\lambda$ and the number of bits used to represent $d_\lambda$ depend both on $\lambda$ and on the amplitude $|d_\lambda|$. In practice they typically depend on $|d_\lambda|$ only: more bits are allocated to the large coefficients, which correspond to different indices from one signal to another.

The second strategy is clearly more appropriate to exploit the sparsity of the wavelet representation, since a large number of bits will be used only for a small number of numerically significant coefficients. In order to analyze this idea more precisely, let us consider the following specific strategy: for a fixed $\varepsilon > 0$, we allocate no bits to the details such that $|d_\lambda| \le \varepsilon$ by setting $\tilde d_\lambda = 0$, which amounts to thresholding them, and we allocate $j$ bits to a detail such that $2^{j-1}\varepsilon < |d_\lambda| \le 2^j\varepsilon$. By choosing the $2^j$ values of $\tilde d_\lambda$ uniformly in the range $]-2^j\varepsilon, -2^{j-1}\varepsilon[\ \cup\ ]2^{j-1}\varepsilon, 2^j\varepsilon[$, we thus ensure that for all $\lambda$
\[ |d_\lambda - \tilde d_\lambda| \le \varepsilon. \tag{54} \]
If we measure the error in $X = L^2$, assuming the $\{\psi_\lambda\}$ to form a Riesz basis, we find that
\[ \|f - \tilde f\|^2 \sim \sum_{\lambda\in\nabla} |d_\lambda - \tilde d_\lambda|^2 \le \varepsilon^2\,\#\{\lambda \ \mathrm{s.t.}\ |d_\lambda| \ge \varepsilon\} + \sum_{|d_\lambda|\le\varepsilon} |d_\lambda|^2. \tag{55} \]
Note that the second term is simply the error of nonlinear approximation by thresholding at level $\varepsilon$, while the first term corresponds to the effect of quantizing the significant coefficients. Let us now assume that the class of signals $Y$ has a sparse wavelet representation, in the sense that there exist $\tau \le 2$ and $C > 0$ such that for all $f \in Y$ we have $d = (d_\lambda)_{\lambda\in\nabla} \in \ell^\tau_w(\nabla)$ with $\|d\|_{\ell^\tau_w} \le C$, i.e.,
\[ \sup_{f\in Y}\,\#\{\lambda \ \mathrm{s.t.}\ |d_\lambda| \ge \eta\} \le C\eta^{-\tau}. \tag{56} \]
We have seen in the previous section that this property is satisfied when $\|f\|_{B^s_{\tau,\tau}} \le C$ for all $f \in Y$ with $1/\tau = 1/2 + s/d$, and that it is equivalent to the nonlinear approximation property $\sigma_N \le CN^{-s/d}$. Using (56), we can
estimate both terms in (55) as follows: for the quantization term, we simply obtain
\[ \varepsilon^2\,\#\{\lambda \ \mathrm{s.t.}\ |d_\lambda| \ge \varepsilon\} \le C\varepsilon^{2-\tau}, \tag{57} \]
while for the thresholding term we have
\[ \sum_{|d_\lambda|\le\varepsilon} |d_\lambda|^2 \le \sum_{j\ge 0} 2^{-2j}\varepsilon^2\,\#\{\lambda \ \mathrm{s.t.}\ |d_\lambda| \ge \varepsilon 2^{-j-1}\} \le C\varepsilon^{2-\tau}. \tag{58} \]
Therefore we find that the compression error is estimated by $C\varepsilon^{1-\tau/2}$. We can also estimate the number of bits $N_q$ which are used to quantize the $d_\lambda$ according to
\[ N_q = \sum_{j>0} j\,\#\{\lambda \ \mathrm{s.t.}\ 2^{j-1}\varepsilon < |d_\lambda| \le 2^j\varepsilon\} \le C\varepsilon^{-\tau} \sum_{j>0} j\,2^{-\tau j} \le C\varepsilon^{-\tau}. \tag{59} \]
Comparing $N_q$ and the compression error, we find the striking result that
\[ \|f - \tilde f\|_{L^2} \le CN_q^{-(1-\tau/2)/\tau} = CN_q^{-s/d}. \tag{60} \]
At first sight, it seems that we obtain with only $N$ bits the same rate as for nonlinear approximation, which requires $N$ real coefficients. However, a specific additional difficulty of adaptive quantization is that we also need to encode the addresses $\lambda$ such that $2^{j-1}\varepsilon < |d_\lambda| \le 2^j\varepsilon$. The bit cost $N_a$ of this addressing can be comparable to $N_q$ or even higher. If the class of signals is modeled by (56) alone, we actually find that $N_a$ is infinite, since the large coefficients could be located anywhere. In order to have $N_a \le C\varepsilon^{-\tau}$ as well, and thus obtain the desired estimate $\|f - \tilde f\|_{L^2} \le CN^{-s/d}$ with $N = N_q + N_a$, it is necessary to make a mild additional assumption on $Y$ that restricts the location of the large coefficients, and to develop a suitable addressing strategy. The most efficient wavelet compression algorithms, such as the one introduced in [20] (and further developed in the compression standard JPEG 2000), typically apply addressing strategies based on tree structures within the indices $\lambda$. We also refer to [10], where it is proved that such strategies allow us to recover optimal rate/distortion bounds - i.e., optimal behaviour of the compression error with respect to the number of bits $N$ - for various deterministic classes $Y$ modeling the signals. In practice such results can only be observed for a certain range of $N$, since the original signal itself is most often given by a finite number of bits $N_o$, e.g. a digital image. Therefore, modeling the signal by a function class and deriving rate/distortion bounds from this model is usually relevant only for low bit rates $N \ll N_o$, i.e., high compression ratios. One should then of course address the questions of "what are the natural deterministic classes which model real signals" and "what can one say about the sparsity of wavelet representations for these classes". An interesting example is given by real images, which are often modeled by the space $BV$ of functions of bounded variation. This function space consists of functions which have one order of smoothness
in $L^1$, in the sense that their gradient is a finite measure. This includes in particular functions of the type $\chi_\Omega$ for domains $\Omega$ with boundaries of finite length. In [11] it is proved that the wavelet coefficients of a function $f \in BV$ are sparse, in the sense that they are in $\ell^1_w$. This allows us to expect a nonlinear approximation error of order $N^{-1/2}$ for images, and a similar rate for compression, provided that we can handle the addressing with a reasonable number of bits. The latter task turns out to be feasible, thanks to some additional properties, such as the $L^\infty$-boundedness of images.
7 Statistical Estimation

In recent years, wavelet-based thresholding methods have been widely applied to a large range of problems in statistics - density estimation, white noise removal, nonparametric regression, diffusion estimation - since the pioneering work of Donoho, Johnstone, Kerkyacharian and Picard (see e.g. [16]). In some sense, the growing interest in thresholding strategies represents a significant "switch" from linear to nonlinear/adaptive methods. Here we shall consider the simple white noise model: given a function $f(t)$, we observe on $[0,1]$
\[ dg(t) = f(t)\,dt + \varepsilon\,dw(t), \tag{61} \]
where $w(t)$ is a Brownian motion. In other words, we observe the function $f$ with an additive white Gaussian noise of variance $\varepsilon^2$. This model can of course be generalized to higher dimension. We are now interested in constructing an estimator $\tilde f$ from the data $g$. The most common measure of the estimation error is in the mean square sense: assuming that $f \in L^2$, we are interested in the quantity $E(\|\tilde f - f\|_{L^2}^2)$. Similarly to data compression, the design of an estimation procedure minimizing the mean square error is relative to a specific model of the signal $f$, either by a deterministic class $Y$ or by a stochastic process. Linear estimation methods define $\tilde f$ by applying a linear operator to $g$. In many practical situations this operator is translation invariant and amounts to a filtering procedure, i.e., $\tilde f = h * g$. For example, in the case of a second order stationary process, the Wiener filter gives an optimal solution in terms of $\hat h(\omega) := \hat r(\omega)/(\hat r(\omega) + \varepsilon^2)$, where $\hat r(\omega)$ is the power spectrum of $f$, i.e., the Fourier transform of $r(u) := E(f(t)f(t+u))$. Another frequently used linear method is projection on some finite dimensional subspace $V$, i.e., $\tilde f = Pg = \sum_{n=1}^N \langle g, \tilde e_n\rangle e_n$, where $(e_n, \tilde e_n)_{n=1,\cdots,N}$ is a biorthogonal basis system for $V$ and $N := \dim(V)$. In this case, using the fact that $E(\tilde f) = Pf$, we can estimate the error as follows:
\[ E(\|\tilde f - f\|_{L^2}^2) = E(\|Pf - f\|^2) + E(\|P(g-f)\|^2) \le E(\|Pf - f\|^2) + CN\varepsilon^2. \]
If $P$ is an orthogonal projection, we can assume that $e_n = \tilde e_n$ is an orthonormal basis, so that $E(\|P(g-f)\|^2) = \sum_n E(|\langle g - f, e_n\rangle|^2) = N\varepsilon^2$, and
therefore the above constant C is equal to 1. Otherwise this constant depends on the "angle" of the projection P. In the above estimate, the first term ‖Pf − f‖² is the bias of the estimator. It reflects the approximation property of the space V for the model, and typically decreases as the dimension of V grows. Note that in the case of a deterministic class Y, it is simply given by ‖Pf − f‖². The second term CNε² represents the variance of the estimator, which increases with the dimension of V. A good estimator should find an optimal balance between these two terms.

Consider for instance the projection on the multiresolution space V_j, i.e., f̃ := Σ_{|λ|≤j} ⟨g, ψ̃_λ⟩ ψ_λ, together with the deterministic model consisting of the functions f that satisfy

‖f‖_{H^s} ≤ C, (62)

where H^s is the Sobolev space of smoothness s. Then we can estimate the bias by the linear approximation estimate C2^{-2sj} and the variance by C2^j ε², since the dimension of V_j adapted to [0, 1] is of order 2^j. Assuming a-priori knowledge of the noise level ε, we find that the scale level balancing the bias and variance terms is j(ε) such that 2^{j(ε)(1+2s)} ∼ ε^{-2}. We thus select as our estimator

f̃ := P_{j(ε)} g. (63)

With such a choice, the resulting estimation error is bounded by

E(‖f̃ − f‖²_{L²}) ≤ C ε^{4s/(1+2s)}. (64)
Let us make a few comments on this simple result:

• The convergence rate 4s/(1 + 2s) of the estimator, as the noise level tends to zero, improves with the smoothness of the model. It can be shown that this is actually the optimal or minimax rate, in the sense that for any estimation procedure there always exists an f in the class (62) for which E(‖f̃ − f‖²_{L²}) ≥ c ε^{4s/(1+2s)}.

• One of the main limitations of the above estimator is that it depends not only on the noise level (which in practice can often be evaluated), but also on the model class itself, since j(ε) depends on s. A better estimator should give an optimal rate for a large variety of function classes.

• The projection P_{j(ε)} is essentially equivalent to a low-pass filter which eliminates the frequencies larger than 2^{j(ε)}. The drawbacks of such denoising strategies are well known in practice: while they remove the noise, low-pass filters tend to blur the singularities of the signal, such as the edges in an image. This problem is implicitly reflected in the fact that signals with edges correspond to a value of s which cannot exceed 1/2, so that the convergence rate is at most O(ε).

Let us now turn to nonlinear estimation methods based on wavelet thresholding. The simplest thresholding estimator is defined by
f̃ := Σ_{|⟨g,ψ̃_λ⟩| ≥ η} ⟨g, ψ̃_λ⟩ ψ_λ, (65)
i.e., discarding the coefficients of the data of size less than some η > 0. Let us remark that the wavelet coefficients of the observed data can be expressed as
⟨g, ψ̃_λ⟩ = ⟨f, ψ̃_λ⟩ + ε b_λ, (66)
where the b_λ are Gaussian variables. These variables have variance 1 if we assume (which is always possible up to a renormalization) that ‖ψ̃_λ‖_{L²} = 1. In the case of an orthonormal basis the b_λ are independent. Therefore the observed coefficients appear as those of the real signal perturbed by an additive noise of level ε. It thus seems at first sight that a natural choice for the threshold is simply η := ε: we can hope to remove most of the noise, while preserving the most significant coefficients of the signal, which is particularly appropriate if the wavelet decomposition of f is sparse. In order to understand the rate that we could expect from such a procedure, we shall again consider the class of signals described by (56). For a moment, let us assume that we have access to an oracle which tells us for which λ the wavelet coefficients of the real signal are larger than ε, so that we could build the modified estimator
f̃ := Σ_{|⟨f,ψ̃_λ⟩| ≥ ε} ⟨g, ψ̃_λ⟩ ψ_λ. (67)
In this case, f̃ can be viewed as the projection Pg of g onto the space V(f, ε) spanned by the ψ_λ such that |⟨f, ψ̃_λ⟩| ≥ ε, so that we can estimate the error by a sum of bias and variance terms according to

E(‖f̃ − f‖²_{L²}) = ‖f − Pf‖² + E(‖P(f − g)‖²) ≤ C [ Σ_{|⟨f,ψ̃_λ⟩| < ε} |⟨f, ψ̃_λ⟩|² + ε² #{λ s.t. |⟨f, ψ̃_λ⟩| ≥ ε} ].

For the bias term, we recognize the nonlinear approximation error, which is bounded by Cε^{2−τ} according to (58). From the definition of the class (56) we find that the variance term is also bounded by Cε^{2−τ}. In turn, we obtain for the oracle estimator the convergence rate ε^{2−τ}. In particular, if we consider the model

‖f‖_{B^s_{τ,τ}} ≤ C, (68)

with 1/τ = 1/2 + s, we obtain

E(‖f̃ − f‖²_{L²}) ≤ C ε^{2−τ} = C ε^{4s/(1+2s)}. (69)
Let us again make a few comments:

• In a similar way to approximation rates, nonlinear methods achieve the same estimation rate as linear methods, but for much weaker models: the exponent 4s/(1 + 2s) was achieved by the linear estimator for the class (62), which is more restrictive than (56).
• In contrast with the linear estimator, the nonlinear estimator does not need to be tuned according to the value of τ or s. In this sense, it is very robust.

• Unfortunately, (67) is unrealistic since it is based on the "oracle assumption". In practice, we threshold according to the values of the observed coefficients ⟨g, ψ̃_λ⟩ = ⟨f, ψ̃_λ⟩ + ε b_λ, and we need to face the possible event that the additive noise ε b_λ moves an observed coefficient to the other side of the threshold. Another unrealistic aspect, also present in (65), is that one cannot evaluate the full set of coefficients (⟨g, ψ̃_λ⟩)_{λ∈∇}, which is infinite.

The strategy proposed in [16] solves these difficulties as follows: a realistic estimator is built by (i) a systematic truncation of the estimator (65) above a scale j(ε) such that 2^{-2αj(ε)} ∼ ε² for some fixed α > 0, and (ii) a choice of threshold slightly above the noise level, namely

η(ε) := C(α) ε |log ε|^{1/2}. (70)

It is then possible to prove that the resulting more realistic estimator

f̃ := Σ_{|λ|≤j(ε), |⟨g,ψ̃_λ⟩| ≥ η(ε)} ⟨g, ψ̃_λ⟩ ψ_λ (71)

has the rate [ε |log ε|^{1/2}]^{4s/(1+2s)} (i.e., almost the same asymptotic performance as the oracle estimator) for the functions which are both in the class (56) and in the Sobolev class H^α. The "minimal" Sobolev smoothness α - which is needed to allow the truncation of the estimator - can be taken arbitrarily close to zero, up to a change of the constants in the threshold and in the convergence estimate.
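The truncated and thresholded estimator just described fits in a few lines. The NumPy sketch below is illustrative only: the orthonormal Haar filters, the constant C = 3 in the threshold, the test signal, and the discrete per-sample noise model standing in for (61) are all our own arbitrary choices, not prescriptions from the text.

```python
import numpy as np

def haar_analysis(c):
    """Orthonormal Haar transform: samples -> (coarse part, details coarse-to-fine)."""
    c = np.asarray(c, dtype=float)
    details = []
    while c.size > 1:
        details.append((c[0::2] - c[1::2]) / np.sqrt(2.0))  # detail coefficients
        c = (c[0::2] + c[1::2]) / np.sqrt(2.0)              # averages
    return c, details[::-1]

def haar_synthesis(c0, details):
    """Inverse of haar_analysis."""
    c = np.asarray(c0, dtype=float)
    for d in details:
        nxt = np.empty(2 * c.size)
        nxt[0::2] = (c + d) / np.sqrt(2.0)
        nxt[1::2] = (c - d) / np.sqrt(2.0)
        c = nxt
    return c

def threshold_estimator(g, eps, C=3.0):
    """Hard thresholding of the detail coefficients at eta(eps) = C*eps*|log eps|^(1/2)."""
    eta = C * eps * np.sqrt(abs(np.log(eps)))
    c0, details = haar_analysis(g)
    details = [np.where(np.abs(d) >= eta, d, 0.0) for d in details]
    return haar_synthesis(c0, details)

rng = np.random.default_rng(0)
n = 2 ** 12
t = (np.arange(n) + 0.5) / n
f = t + (t > 0.5)                      # smooth ramp with one jump (an "edge")
eps = 0.05                             # per-coefficient noise level (assumed known)
g = f + eps * rng.standard_normal(n)   # discrete analogue of the white noise model
f_tilde = threshold_estimator(g, eps)
mse_noisy = np.mean((g - f) ** 2)
mse_denoised = np.mean((f_tilde - f) ** 2)
```

As the text predicts, the jump is preserved (its few large coefficients survive the threshold) while most of the noise, spread over many small coefficients, is removed.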
8 Adaptive Numerical Simulation

Numerical simulation is nowadays an essential tool for the understanding of physical processes modeled by partial differential or integral equations. In many instances, the solution of these equations exhibits singularities, resulting in a slower convergence of the numerical schemes as the mesh size tends to zero. Moreover, these singularities are often physically significant (shocks in fluid dynamics, local accumulation of stress in elasticity) and should therefore be well approximated by the numerical method. In order to keep the memory size and computational cost at a reasonable level, it is then necessary to use adaptive discretizations, which should typically be more refined near the singularities.

In the finite element context, such discretizations are produced by mesh refinement: starting from an initial coarse triangulation, we allow further subdivision of certain elements into finer triangles, and we define the discretization
space according to this locally refined triangulation. This is of course subject to certain rules, in particular preserving the conformity of the discretization when continuity is required in the finite element space. The use of wavelet bases as an alternative to finite elements is still in its infancy (some first surveys are [6] and [12]), and was strongly motivated by the possibility of producing simple adaptive approximations. In the wavelet context, a more adapted terminology is space refinement: we directly produce an approximation space

V_Λ := Span{ψ_λ ; λ ∈ Λ}, (72)
by selecting a set Λ which is well adapted to describe the solution of our problem. If N denotes the cardinality of the adapted finite element or wavelet space, i.e., the number of degrees of freedom used in the computations, we see that in both cases the numerical solution u_N can be viewed as an adaptive approximation of the solution u in a nonlinear space Σ_N.

A specific difficulty of adaptive numerical simulation is that the solution u is unknown at the start, except for some rough a-priori information such as global smoothness. In particular, the location and structure of the singularities are often unknown, and therefore the design of an optimal discretization for a prescribed number of degrees of freedom is a much more difficult task than the simple compression of fully available data. This difficulty has motivated the development of adaptive strategies based on a-posteriori analysis, i.e., using the currently computed numerical solution to update the discretization and derive a better adapted numerical solution. In the finite element setting, such an analysis has been developed since the 1970s (see [1] or [22]) in terms of local error indicators which aim to measure the contribution of each element to the error. The rule of thumb is then to refine the triangles which exhibit the largest error indicators. More recently, similar error indicators and refinement strategies were also proposed in the wavelet context (see [2] and [13]).

Nonlinear approximation can be viewed as a benchmark for adaptive strategies: if the solution u can be adaptively approximated in Σ_N with a certain error σ_N(u) in a certain norm X, we would ideally like the adaptive strategy to produce an approximation u_N ∈ Σ_N such that the error ‖u − u_N‖_X is of the same order as σ_N(u). In the case of wavelets, this means that the error produced by the adaptive scheme should be of the same order as the error produced by keeping the N largest coefficients of the exact solution.
In most instances, unfortunately, such a program cannot be achieved by an adaptive strategy, and a more reasonable goal is to obtain an optimal asymptotic rate: if σ_N(u) ≤ CN^{-s} for some s > 0, an optimal adaptive strategy should produce an error ‖u − u_N‖_X ≤ C̃N^{-s}. An additional important aspect is the computational cost of deriving u_N: a computationally optimal strategy should produce u_N in a number of operations proportional to N. A typical instance of a computationally optimal algorithm - for a fixed discretization - is the multigrid method for linear elliptic PDEs. It should be noted that very often the norm X in which one can hope for an optimal error estimate is dictated by the problem at hand: for example, in the case of an elliptic problem,
this will typically be a Sobolev norm equivalent to the energy norm (e.g., the H^1 norm when solving the Laplace equation).

Most existing wavelet adaptive schemes have in common the following general structure. At some step n of the computation, a set Λ_n is used to represent the numerical solution u_{Λ_n} = Σ_{λ∈Λ_n} d_λ^n ψ_λ. In the context of an initial value problem of the type

∂_t u = E(u), u(x, 0) = u_0(x), (73)
the numerical solution at step n is typically an approximation to u at time n∆t, where ∆t is the time step of the resolution scheme. In the context of a stationary problem of the type

F(u) = 0, (74)
the numerical solution at step n is typically an approximation to u which should converge to the exact solution as n tends to +∞. In both cases, the derivation of (Λ_{n+1}, u_{Λ_{n+1}}) from (Λ_n, u_{Λ_n}) typically proceeds in three basic steps:

• Refinement: a larger set Λ̃_{n+1} with Λ_n ⊂ Λ̃_{n+1} is derived from an a-posteriori analysis of the computed coefficients d_λ^n, λ ∈ Λ_n.

• Computation: an intermediate numerical solution ũ^{n+1} = Σ_{λ∈Λ̃_{n+1}} d̃_λ^{n+1} ψ_λ is computed from u^n and the data of the problem.

• Coarsening: the smallest coefficients of ũ^{n+1} are thresholded, resulting in the new approximation u^{n+1} = Σ_{λ∈Λ_{n+1}} d_λ^{n+1} ψ_λ supported on the smaller set Λ_{n+1} ⊂ Λ̃_{n+1}.

Of course, the precise description and tuning of these operations strongly depends on the type of equation at hand, as well as on the type of wavelets being used. In the case of linear elliptic problems, it was recently proved in [7] that an appropriate tuning of these three steps results in an optimal adaptive wavelet strategy, both in terms of approximation properties and of computational time. These results have been extended to more general problems such as saddle point problems [8] and nonlinear problems [9]. In the elliptic case, similar results have also been proved in the finite element context: in [3] it is shown that optimal approximation rates can be achieved by an adaptive mesh refinement algorithm which incorporates coarsening steps playing a role analogous to wavelet thresholding.
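The refine / compute / coarsen loop can be caricatured on a toy problem. In the sketch below the "equation" is simply a diagonal, well-conditioned system a·u = f (as one would obtain in suitably preconditioned wavelet coordinates), the refinement indicator is the residual, and every tuning constant (grow, sweeps, damp, tol) is a hypothetical choice for illustration; this is not the tuned algorithm of [7].

```python
import numpy as np

def adaptive_solve(a, f, steps=20, damp=0.4, tol=1e-3, grow=32, sweeps=5):
    """Caricature of the refine/compute/coarsen loop for a diagonal system a*u = f."""
    u = np.zeros_like(f)
    active = np.argsort(-np.abs(f))[:grow]     # start from the largest data coefficients
    for _ in range(steps):
        r = f - a * u                          # residual = a-posteriori error indicator
        # Refinement: enlarge the active set where the residual is largest
        active = np.union1d(active, np.argsort(-np.abs(r))[:grow])
        # Computation: damped Richardson sweeps restricted to the active set
        for _ in range(sweeps):
            u[active] += damp * (f[active] - a[active] * u[active])
        # Coarsening: threshold the smallest computed coefficients
        u[np.abs(u) < tol] = 0.0
        active = np.flatnonzero(u)
    return u

rng = np.random.default_rng(1)
N = 512
a = 1.0 + rng.random(N)                  # diagonal entries in [1, 2): well conditioned
f = np.sort(rng.random(N))[::-1] ** 4    # data with rapidly decaying sorted coefficients
u = adaptive_solve(a, f)
err = np.max(np.abs(u - f / a))          # compare with the exact solution f/a
```

The point of the caricature is the interplay of the three steps: refinement tracks the residual, computation only ever touches the active set, and coarsening keeps the support (and hence the cost) proportional to what the solution actually needs.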
9 The Curse of Dimensionality The three applications that were discussed in the previous sections exploit the sparsity properties of wavelet decompositions for certain classes of functions, or equivalently the convergence properties of nonlinear wavelet approximations of these functions. Nonlinear adaptive methods in such applications are
typically relevant if these functions have isolated singularities, in which case there may be a substantial gain in convergence rate when switching from linear to nonlinear wavelet approximation. However, a closer look at some simple examples shows that this gain tends to decrease for multivariate functions.

Consider the L²-approximation of the characteristic function f = χ_Ω of a smooth domain Ω ⊂ [0, 1]^d. Due to the singularity on the boundary ∂Ω, one can easily check that the linear approximation cannot behave better than

σ_N(f) = ‖f − P_j f‖_{L²} ∼ O(2^{-j/2}) ∼ O(N^{-1/(2d)}), (75)

where N = dim(V_j) ∼ 2^{dj}. Turning to nonlinear approximation, we notice that since ∫ ψ̃_λ = 0, all the coefficients d_λ are zero except those for which the support of ψ̃_λ overlaps the boundary. At scale level j there are thus at most K2^{(d-1)j} non-zero coefficients, where K depends on the support of the ψ_λ and on the (d−1)-dimensional measure of ∂Ω. For such coefficients, we have the estimate

|d_λ| ≤ ‖ψ̃_λ‖_{L¹} ≤ C2^{-dj/2}. (76)

In the univariate case, i.e., when Ω is a simple interval, the number of non-zero coefficients up to scale j is bounded by jK. Therefore, using N non-zero coefficients at the coarsest levels gives an error estimate with exponential decay

σ_N(f) ≤ [ Σ_{j≥N/K} K |C2^{-dj/2}|² ]^{1/2} ≤ C̃2^{-dN/(2K)}, (77)
which is a spectacular improvement on the linear rate. In the multivariate case, the number of non-zero coefficients up to scale j is bounded by Σ_{l=0}^{j} K2^{(d-1)l} and thus by K̃2^{(d-1)j}. Therefore, using N non-zero coefficients at the coarsest levels gives the error estimate

σ_N(f) ≤ [ Σ_{K̃2^{(d-1)j} ≥ N} K2^{(d-1)j} |C2^{-dj/2}|² ]^{1/2} ≤ C̃N^{-1/(2d-2)}, (78)
which is much less of an improvement. For example, in the 2D case, we only go from N^{-1/4} to N^{-1/2} by switching to nonlinear wavelet approximation.

This simple example illustrates the curse of dimensionality in the context of nonlinear wavelet approximation. The main reason for the degradation of the approximation rate is the large number K2^{(d-1)j} of wavelets needed to refine the boundary from level j to level j + 1. On the other hand, if we view the boundary itself as the graph of a smooth function, it is clear that approximating this graph with accuracy 2^{-j} should require far fewer parameters than K2^{(d-1)j}. This reveals a fundamental limitation of wavelet bases: they fail to exploit the smoothness of the boundary and therefore cannot capture the simplicity of f in a small number of parameters. Another way of describing this limitation is to remark that nonlinear wavelet approximation allows local refinement of the approximation, but imposes some
isotropy in this refinement process. In order to capture the boundary with a small number of parameters, one would typically need to refine more in the normal direction than in the tangential directions, i.e., apply anisotropic local refinement. In this context, other approximation tools outperform wavelet bases: it is easy to check that using piecewise constant functions on an adaptive partition of N triangles in 2D produces the rate σ_N(f) ∼ O(N^{-1}), precisely because one is allowed to use arbitrarily anisotropic triangles to match the boundary. In the case of rational functions, an even more spectacular result is conjectured: if ∂Ω is C^∞, then σ_N(f) ≤ C_r N^{-r} for any r > 0. These remarks reveal that, in contrast to the 1D case, free triangulations or rational approximation outperform N-term wavelet approximation, and could be thought of as better tools in view of applications such as those discussed throughout this paper. This is not really true in practice: in numerical simulation, rational functions are difficult to use and free triangulations are often limited by shape constraints which restrict their anisotropy, and neither method is used in statistical estimation or data compression, principally due to the absence of fast and robust algorithms that would produce an optimal adaptive approximation in a manner similar to wavelet thresholding. The development of new approximation and representation tools, which could both capture anisotropic features such as edges with a very small number of parameters and be implemented by fast and robust procedures, is currently the object of active research. A significant breakthrough was recently achieved by Donoho and Candes, who developed representations in ridgelet bases which possess the scale-space localization of wavelets together with some directional selectivity.
Such bases make it possible, for example, to recover with a simple thresholding procedure the rate O(N^{-1}) for a bivariate function which is smooth except along a smooth curve of discontinuity.
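The level-by-level count behind (75)-(78) is easy to check numerically. The sketch below counts, for d = 2 and Ω a disc (our arbitrary choice of smooth domain), the dyadic squares of side 2^{-j} whose corners do not all lie on the same side of ∂Ω - a proxy (slightly undercounting near tangencies) for the squares carrying non-zero Haar coefficients - and confirms that this count grows like 2^j per level, while the total number of squares grows like 4^j.

```python
import numpy as np

def straddling_squares(j, center=(0.5, 0.5), r=0.3):
    """Count dyadic squares of side 2^-j in [0,1]^2 straddling the circle |x-c| = r,
    detected by a corner-sign test."""
    n = 2 ** j
    k = np.arange(n) / n
    x0, y0 = np.meshgrid(k, k, indexing="ij")
    inside = []
    for dx in (0.0, 1.0 / n):          # evaluate the four corners of every square
        for dy in (0.0, 1.0 / n):
            inside.append(np.hypot(x0 + dx - center[0], y0 + dy - center[1]) < r)
    inside = np.stack(inside)
    # a square straddles the boundary iff some corners are inside and some outside
    return int((inside.any(axis=0) & ~inside.all(axis=0)).sum())

counts = [straddling_squares(j) for j in range(3, 9)]
ratios = [b / a for a, b in zip(counts, counts[1:])]   # should approach 2, not 4
```

The ratio between successive levels hovers around 2 (= 2^{d-1} for d = 2): exactly the K2^{(d-1)j} growth responsible for the curse of dimensionality in the argument above.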
References

1. Babuška, I. and W. Rheinboldt (1978) A-posteriori analysis for adaptive finite element computations, SIAM J. Numer. Anal. 15, 736–754.
2. Bertoluzza, S. (1995) A posteriori error estimates for wavelet Galerkin methods, Appl. Math. Lett. 8, 1–6.
3. Binev, P., W. Dahmen and R. DeVore (2002) Adaptive finite element methods with convergence rate, preprint IGPM-RWTH Aachen, to appear in Numer. Math.
4. Candes, E. and D. Donoho (1999) Ridgelets: a key to higher-dimensional intermittency?, Phil. Trans. Roy. Soc.
5. Ciarlet, P.G. (1991) Basic error estimates for the finite element method, in: Handbook of Numerical Analysis, vol. II, P. Ciarlet and J.-L. Lions, eds., Elsevier, Amsterdam.
6. Cohen, A. (2001) Wavelets in numerical analysis, in: Handbook of Numerical Analysis, vol. VII, P.G. Ciarlet and J.L. Lions, eds., Elsevier, Amsterdam.
7. Cohen, A., W. Dahmen and R. DeVore (2000) Adaptive wavelet methods for elliptic operator equations - convergence rate, Math. Comp. 70, 27–75.
8. Cohen, A., W. Dahmen and R. DeVore (2002) Adaptive wavelet methods for elliptic equations - beyond the elliptic case, Found. of Comp. Math. 2, 203–245.
9. Cohen, A., W. Dahmen and R. DeVore (2003) Adaptive wavelet schemes for nonlinear variational problems, preprint IGPM-RWTH Aachen, to appear in SIAM J. Numer. Anal.
10. Cohen, A., W. Dahmen, I. Daubechies and R. DeVore (2001) Tree approximation and optimal encoding, Appl. Comp. Harm. Anal. 11, 192–226.
11. Cohen, A., R. DeVore, P. Petrushev and H. Xu (1999) Nonlinear approximation and the space BV(ℝ²), Amer. Jour. of Math. 121, 587–628.
12. Dahmen, W. (1997) Wavelet and multiscale methods for operator equations, Acta Numerica 6, 55–228.
13. Dahmen, W., S. Dahlke, R. Hochmuth and R. Schneider (1997) Stable multiscale bases and local error estimation for elliptic problems, Appl. Numer. Math. 23, 21–47.
14. Daubechies, I. (1992) Ten Lectures on Wavelets, SIAM, Philadelphia.
15. DeVore, R. (1997) Nonlinear approximation, Acta Numerica, 51–150.
16. Donoho, D., I. Johnstone, G. Kerkyacharian and D. Picard (1992) Wavelet shrinkage: asymptopia? (with discussion), Jour. Roy. Stat. Soc., Ser. B 57, 301–369.
17. DeVore, R. and X.M. Yu (1990) Degree of adaptive approximation, Math. Comp. 55, 625–635.
18. Mallat, S. (1998) A Wavelet Tour of Signal Processing, Academic Press, New York.
19. Petrushev, P. (1988) Direct and converse theorems for spline and rational approximation and Besov spaces, in: Function Spaces and Applications, M. Cwikel, J. Peetre, Y. Sagher and H. Wallin, eds., Lecture Notes in Math. 1302, Springer, Berlin, 363–377.
20. Shapiro, J. (1993) Embedded image coding using zerotrees of wavelet coefficients, IEEE Trans. Signal Processing 41, 3445–3462.
21. Temlyakov, V. (1998) Best N-term approximation and greedy algorithms, Adv. Comp. Math. 8, 249–265.
22. Verfürth, R. (1994) A posteriori error estimation and adaptive mesh refinement techniques, Jour. Comp. Appl. Math. 50, 67–83.
Multiscale and Wavelet Methods for Operator Equations

Wolfgang Dahmen

Institut für Geometrie und Praktische Mathematik, RWTH Aachen, Germany
[email protected]

Summary. These notes bring out some basic mechanisms governing wavelet methods for the numerical treatment of differential and integral equations. Some introductory examples illustrate the quasi-sparsity of wavelet representations of functions and operators. This leads us to identify the key features of general wavelet bases in the present context, namely locality, cancellation properties and norm equivalences. Some analysis and construction principles regarding these properties are discussed next. The scope of problems to which these concepts apply is outlined, along with a brief discussion of the principal obstructions to an efficient numerical treatment. This covers elliptic boundary value problems as well as saddle point problems. The remainder of these notes is concerned with a new paradigm for the adaptive solution of such problems. It is based on an equivalent formulation of the original variational problem in wavelet coordinates. Combining the well-posedness of the original problem with the norm equivalences induced by the wavelet basis, the transformed problem can be arranged to be well-posed in the Euclidean metric. This in turn allows one to devise convergent iterative schemes for the infinite dimensional problem over ℓ₂. The numerical realization then consists of the adaptive application of the wavelet representations of the involved operators. Such application schemes are described, and the basic concepts for analyzing their computational complexity, rooted in nonlinear approximation, are outlined. We conclude with an outlook on possible extensions, in particular to nonlinear problems.
1 Introduction

These lecture notes are concerned with some recent developments of wavelet methods for the numerical treatment of certain types of operator equations. A central theme is a new approach to discretizing such problems that differs from conventional concepts in the following way. Wavelets are used to transform the problem into an equivalent one which is well-posed in ℓ₂. The solution of the latter is based on concepts from nonlinear approximation. This part of the material is taken from recent joint work with A. Cohen, R. DeVore and also with S. Dahlke and K. Urban.
J.H. Bramble, A. Cohen, and W. Dahmen: LNM 1825, C. Canuto (Ed.), pp. 31–96, 2003. c Springer-Verlag Berlin Heidelberg 2003
The notes are organized as follows. Some simple examples first serve to identify key features of multiscale bases that will motivate and guide the subsequent discussions. Some construction and analysis principles are then reviewed that can be used to build bases with the relevant properties in realistic application settings. The scope of these applications is detailed next. After these preparations, the remaining part is devoted to the discussion of adaptive solution techniques.
2 Examples, Motivation

Rather than approximating a given function by elements of a suitable finite dimensional space, the use of bases aims at representing functions as expansions, thereby identifying the function with an array of coefficients – its digits. The following considerations indicate why wavelet bases offer particularly favorable digit representations.

2.1 Sparse Representations of Functions, an Example

Consider the space L²([0, 1]) of all square integrable functions on the interval [0, 1]. For a given dyadic mesh of mesh size 2^{-j}, piecewise constant approximations on such meshes are conveniently formed by employing dilates and translates of the indicator function of [0, 1):

φ(x) := χ_{[0,1)}(x), φ_{j,k} := 2^{j/2} φ(2^j · − k), k = 0, …, 2^j − 1.

In fact, denoting by ⟨f, g⟩ = ⟨f, g⟩_{[0,1]} := ∫_0^1 f(x) g(x) dx the standard inner product on [0, 1],

P_j(f) := Σ_{k=0}^{2^j−1} ⟨f, φ_{j,k}⟩ φ_{j,k}

is the orthogonal projection of f onto the space S_j := span{φ_{j,k} : k = 0, …, 2^j − 1}. Figure 1 displays such approximations for several levels of resolution.
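Concretely, P_j(f) takes the mean of f on each dyadic cell, since ⟨f, φ_{j,k}⟩ φ_{j,k} is constant on [2^{-j}k, 2^{-j}(k+1)) with value equal to that average. The small NumPy check below (the midpoint quadrature rule and the test function sin(2πx) are our own arbitrary choices) computes these cell averages and exhibits the first-order decay ‖f − P_j f‖_{L²} = O(2^{-j}) one expects for smooth f.

```python
import numpy as np

def cell_averages(f, j, m=64):
    """Values of P_j f on the 2^j dyadic cells: the cell averages of f,
    approximated here by an m-point midpoint rule per cell."""
    n = 2 ** j
    x = (np.arange(n * m) + 0.5) / (n * m)
    return f(x).reshape(n, m).mean(axis=1)

def l2_error(f, j, m=64):
    """Midpoint-rule approximation of ||f - P_j f||_{L2([0,1])}."""
    n = 2 ** j
    x = (np.arange(n * m) + 0.5) / (n * m)
    pj = np.repeat(cell_averages(f, j, m), m)     # piecewise constant interpolant
    return float(np.sqrt(np.mean((f(x) - pj) ** 2)))

f = lambda x: np.sin(2 * np.pi * x)
errs = [l2_error(f, j) for j in range(2, 8)]
ratios = [a / b for a, b in zip(errs, errs[1:])]  # halving the mesh halves the error
```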
Fig. 1. Piecewise constant approximations to f
If a chosen level of resolution turns out to be insufficient, a naive approach would be to recompute everything for a smaller mesh size. More cleverly, one can avoid wasting prior effort by monitoring the difference between successive levels of resolution, as indicated in Figure 2.
Fig. 2. Splitting averages and details (fine structure = averaging + detail)
To encode the split-off details, in addition to the averaging profile φ a second, oscillatory profile is needed. Defining

ψ(x) := φ(2x) − φ(2x − 1), ψ_{j,k} := 2^{j/2} ψ(2^j · − k),

it is not hard to show that

(P_{j+1} − P_j) f = Σ_{k=0}^{2^j−1} d_{j,k}(f) ψ_{j,k}, where d_{j,k}(f) = ⟨f, ψ_{j,k}⟩. (1)
Thus, due to the denseness of piecewise constants in L2 ([0, 1]), the telescoping expansion converges strongly to f , so that
f = P_0 f + Σ_{j=1}^{∞} (P_j − P_{j−1}) f = Σ_{j=−1}^{∞} Σ_{k=0}^{2^j−1} d_{j,k}(f) ψ_{j,k} =: d(f)^T Ψ (2)

is a representation of f (with the convention that level j = −1 stands for the single scaling function, ψ_{−1,0} := φ_{0,0}).

Norm equivalence: Due to the fact that Ψ = {φ_{0,0}} ∪ {ψ_{j,k} : k = 0, …, 2^j − 1, j ≥ 0} is an orthonormal collection of functions - a basis in this case - one has

‖f‖_{L²} = ( Σ_{j=0}^{∞} ‖(P_j − P_{j−1}) f‖²_{L²} )^{1/2} = ‖d(f)‖_{ℓ²}. (3)
Since small changes in d(f) therefore cause only equally small changes in f, and vice versa, we have a particularly favorable digit representation of f. Ψ is called the Haar basis - a first simple example of a wavelet basis.

Vanishing moment: While summability in (3) implies that the detail or wavelet coefficients d_{j,k}(f) eventually tend to zero (just as in Fourier expansions), it is interesting to see to what extent (in contrast to Fourier expansions) their size conveys local information about f. Since the ψ_{j,k} are orthogonal to constants, one readily concludes

|d_{j,k}(f)| = inf_{c∈ℝ} |⟨f − c, ψ_{j,k}⟩| ≤ inf_{c∈ℝ} ‖f − c‖_{L²(I_{j,k})} ≤ 2^{-j} ‖f′‖_{L²(I_{j,k})}. (4)
Thus d_{j,k}(f) is small when f is smooth on the support I_{j,k} of ψ_{j,k}. If all wavelet coefficients d_{j,k}(f) were known, keeping the N largest among them and replacing all others by zero would provide an approximation to f involving only N terms that minimizes, in view of (3), the L²-error among all competing N-term approximations from the Haar system. These N terms would give rise to a possibly highly nonuniform mesh for the corresponding piecewise constant approximation. Moreover, (4) tells us that the mesh would be relatively coarse in regions where f has a small first derivative, while small intervals would be encountered where f varies strongly. This ability to provide sparse approximate representations of functions is a key feature of wavelet bases.

Wavelet transform: Computing wavelet coefficients directly through quadrature poses difficulties. Since the support of low-level wavelets is comparable to the whole domain, a sufficiently accurate quadrature would be quite expensive. A remedy is offered by the following strategy, which can be used when the wavelet coefficients of interest have at most some highest level J, say. The accurate computation of the scaling function coefficients c_{J,k} := ⟨f, φ_{J,k}⟩, k = 0, …, 2^J − 1, is much less expensive, due to their uniformly small support. The transformation from the array c_J into the array of wavelet coefficients
d^J = (c_0, d_0, …, d_{J−1}), d_j := {d_{j,k}(f) : k = 0, …, 2^j − 1},

is already implicit in (1). Its concrete form can be derived from the two-scale relations

φ_{j,k} = (1/√2)(φ_{j+1,2k} + φ_{j+1,2k+1}), ψ_{j,k} := (1/√2)(φ_{j+1,2k} − φ_{j+1,2k+1}), (5)
φ_{j+1,2k} = (1/√2)(φ_{j,k} + ψ_{j,k}), φ_{j+1,2k+1} = (1/√2)(φ_{j,k} − ψ_{j,k}). (6)

In fact, viewing the array of basis functions as a vector (in some fixed but unspecified order) and adopting in the following the shorthand notation

Σ_{k=0}^{2^{j+1}−1} c_{j+1,k} φ_{j+1,k} =: Φ_{j+1}^T c_{j+1},

Figure 2 suggests writing the fine scale scaling function representation as a coarse scale representation plus detail: Φ_{j+1}^T c_{j+1} = Φ_j^T c_j + Ψ_j^T d_j. One easily derives from (5) that the coefficients are interrelated as follows:

c_{j,k} = (1/√2)(c_{j+1,2k} + c_{j+1,2k+1}), d_{j,k} = (1/√2)(c_{j+1,2k} − c_{j+1,2k+1}),
c_{j+1,2k} = (1/√2)(c_{j,k} + d_{j,k}), c_{j+1,2k+1} = (1/√2)(c_{j,k} − d_{j,k}).

This leads to the following cascadic transforms, whose structure is shared also by more complex wavelet bases.

Fast (orthogonal) transform T_J : d^J := (c_0, d_0, …, d_{J−1}) → c_J:

c_0 → c_1 → c_2 → ⋯ → c_{J−1} → c_J
d_0   d_1   d_2   ⋯   d_{J−1}

Inverse transform T_J^{−1} = T_J^T : c_J → d^J:

c_J → c_{J−1} → c_{J−2} → ⋯ → c_1 → c_0
      d_{J−1}   d_{J−2}   ⋯   d_1   d_0

Obviously, both transforms T_J and T_J^{−1} are efficient in the sense that the computational effort stays proportional to the size of c_J. However, one should already note at this point that this strategy does not fit the case when f actually admits a much sparser approximation in wavelet coordinates. Finding alternative strategies will be an important subject of the subsequent discussions.
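The two cascades transcribe directly into code. The following is a minimal NumPy sketch for the Haar filters only; the function names and the test vector are ours. It exhibits both the linear-cost pyramid structure and the orthogonality underlying the norm equivalence (3).

```python
import numpy as np

def to_wavelet(cJ):
    """Cascade c_J -> d^J = (c_0, d_0, ..., d_{J-1}) (the transform T_J^{-1})."""
    c = np.asarray(cJ, dtype=float)
    coeffs = []
    while c.size > 1:
        coeffs.append((c[0::2] - c[1::2]) / np.sqrt(2.0))  # detail d_j
        c = (c[0::2] + c[1::2]) / np.sqrt(2.0)             # coarser c_j
    return [c] + coeffs[::-1]                              # (c_0, d_0, ..., d_{J-1})

def to_scaling(dJ):
    """Cascade d^J -> c_J (the transform T_J), via the relations (6)."""
    c = dJ[0]
    for d in dJ[1:]:
        nxt = np.empty(2 * c.size)
        nxt[0::2] = (c + d) / np.sqrt(2.0)
        nxt[1::2] = (c - d) / np.sqrt(2.0)
        c = nxt
    return c

rng = np.random.default_rng(7)
cJ = rng.standard_normal(2 ** 10)
dJ = to_wavelet(cJ)
```

Each sweep halves the array length, so the total work is proportional to 2^J, and since the transform is orthogonal, the ℓ₂ norm of d^J equals that of c_J.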
2.2 (Quasi-) Sparse Representation of Operators

Wavelet bases offer nearly sparse representations not only of functions but also of operators from a wide class. Let us first explain what is meant by the standard wavelet representation of a linear operator, for the example of the Haar basis. Suppose for simplicity that L takes L²([0, 1]) into itself. Expanding f in the wavelet basis, applying L to each basis function and expanding the resulting images again in the wavelet basis yields

Lf = Σ_{l,m} ⟨f, ψ_{l,m}⟩ Lψ_{l,m} = Σ_{j,k} ( Σ_{l,m} ⟨ψ_{j,k}, Lψ_{l,m}⟩ ⟨f, ψ_{l,m}⟩ ) ψ_{j,k} = Ψ^T L d,

where

L := (⟨ψ_{j,k}, Lψ_{l,m}⟩)_{(j,k),(l,m)} =: ⟨Ψ, LΨ⟩, d := (⟨ψ_{j,k}, f⟩)_{(j,k)} =: ⟨Ψ, f⟩. (7)

When L is local, like a differential operator, many of the entries vanish. However, when L is global, like an integral operator, in principle all the entries of L could be nonzero, so that finite sections would be densely populated matrices, with corresponding adverse effects on computational complexity. It was observed in [13] that for certain singular integral operators L is almost sparse in the sense that many entries become very small. Let us quantify this statement for the example of the Hilbert transform

(Lf)(x) := (1/π) p.v. ∫_ℝ f(y)/(x − y) dy.
Using Taylor expansion of the kernel and again the fact that the wavelets are orthogonal to constants, one obtains

π L_{(j,k),(l,m)} = ∫_{2^{-l}m}^{2^{-l}(m+1)} ∫_{2^{-j}k}^{2^{-j}(k+1)} ( 1/(x − y) − 1/(x − 2^{-l}m) ) ψ_{j,k}(x) dx ψ_{l,m}(y) dy

= ∫_{2^{-l}m}^{2^{-l}(m+1)} ∫_{2^{-j}k}^{2^{-j}(k+1)} ( (y − 2^{-l}m)/(x − y_{l,m})² ) ψ_{j,k}(x) dx ψ_{l,m}(y) dy

= ∫_{2^{-l}m}^{2^{-l}(m+1)} ∫_{2^{-j}k}^{2^{-j}(k+1)} ( (y − 2^{-l}m)/(x − y_{l,m})² − (y − 2^{-l}m)/(2^{-j}k − y_{l,m})² ) ψ_{j,k}(x) dx ψ_{l,m}(y) dy,

where y_{l,m} is an intermediate point provided by the Taylor remainder. Since ∫_ℝ |ψ_{j,k}(x)| dx ≤ 2^{-j/2}, one therefore obtains, for instance when l ≤ j,

π |L_{(j,k),(l,m)}| ≲ 2^{-(3/2)(l+j)} |2^{-j}k − 2^{-l}m|^{-3} = 2^{-(3/2)|j−l|} / |2^{l−j}k − m|³,

which says that the entries decay exponentially with increasing difference of levels and also polynomially with increasing spatial distance of the involved wavelets. A more general estimate for general wavelet bases will later play an important role.

2.3 Preconditioning

The next example addresses an effect of wavelets which is related to preconditioning. As a simple example, consider the boundary value problem

−u″ = f on [0, 1], u(0) = u(1) = 0,
whose weak formulation requires finding u ∈ H01 ([0, 1]) such that
u , v = f, v,
v ∈ H01 ([0, 1]).
Here H01 ([0, 1]) denotes the space of functions in L2 ([0, 1]) whose first derivative is also in L2 ([0, 1]) and which vanish at 0 and 1. Note that this formulation requires less regularity than the original strong form. Therefore we can employ piecewise linear continuous functions as building blocks for representing approximate solutions. Specifically,
denoting by Sj this time the span of the functions φj,k = 2j/2 φ 2j · −k , k = 1, . . . , 2j − 1, which are dilates and translates of the classical hat function φ(x) := (1 − |x|)+ , the corresponding Galerkin scheme requires finding uj ∈ Sj such that
uj , v = f, v, Clearly, making the ansatz uJ = of equations AJ uJ = fJ ,
2J −1 k=1
v ∈ Sj .
(8)
uJ,k φJ,k , gives rise to a linear system
where AJ := ΦJ , ΦJ , fJ := Φj , f .
(9)
Here and below we denote for any two countable collections Θ ⊂ Y, Ξ ⊂ X and any bilinear form c(·, ·) on Y × X by c(Θ, Ξ) the matrix (c(θ, ξ))θ∈Θ,ξ∈Ξ . In particular, as used before, Φj , f is then a column vector. Although in practice one would not apply an iterative scheme for the solution of the particular system (9), it serves well to explain what will be relevant for more realistic multidimensional problems. The performance of an iterative scheme for a symmetric positive system is known to depend on the condition number of that system which in this case is the quotient of the maximal and minimal eigenvalue. Although it will be explained later in a little
38
Wolfgang Dahmen
bit more detail, what can be said about the condition numbers for problems of the above type (see Section 10.3), it should suffice for the moment to note that the condition numbers grow like h−2 (here 22J for h = 2−J ) for a given mesh size h, which indeed adversely affects the performance of the iteration. This growth can be made plausible by noting that the second derivative treats highly oscillatory functions very differently from slowly varying functions. Since the trial space SJ contains functions with frequency ranging between one and 2J , this accounts for the claimed growth. This motivates to split the space SJ into direct summands consisting of functions with fixed frequency. On each such summand the differential operator would be well conditioned. If the summands were well separated with respect to the energy inner product a(v, w) := v , w this should enable one to precondition the whole problem. This is the essence of multilevel preconditioners and can be illustrated well in the present simple case. The key is again that the main building block, the hat function, is refinable in the sense that φ(x) = 12 φ(2x + 1) + φ(2x) + 12 φ(2x − 1) φj,k =
√1 ( 1 φj+1,2k−1 2 2
1
+ φj+1,2k + 12 φj+1,2k+1 ),
-1
- 1/2
1/2
1
which means that the trial spaces Sj are nested. As in the case of the Haar basis one builds now an alternative multilevel basis for SJ by retaining basis functions on lower levels and adding additional functions in a complement space between two successive spaces Sj and Sj+1 . The simplest complement spaces are those spanned by the hat functions for the new nodes on the next higher level, see Figure 3. The resulting multilevel basis has become known as hierarchical basis [75]. Ψj
:= {ψj,k := φj+1,2k+1 : k = 0, . . . , 2j − 1}
1
0
Sj+1 = Sj ⊕ span (Ψj )
Fig. 3. basis
1
Hierarchical
H Denoting the Haar wavelets for the sake of distinction here by ψj,k , one d d j+ 32 H ψj,k (x) so that can check that dx ψj,k (x) = dx φj+1,2k+1 (x) = 2
d d ψj,k , ψl,m = 22j+3 δ(j,k),(l,m) . dx dx
(10)
Multiscale and Wavelet Methods for Operator Equations
39
Thus the stiffness matrix relative to the hierarchical basis is even diagonal and (in a slight abuse of the term) a simple diagonal scaling would yield uniformly bounded condition numbers independent of the mesh size. 2.4 Summary So far we have seen some basic effects of two features, namely Vanishing Moments – Cancellation Property (CP) of wavelets which entail: • •
sparse representations of functions; sparse representations of operators;
as well as norm equivalences (NE) which imply: • •
tight relations between functions and their digit representation; well conditioned systems.
Of course, the above examples are very specific and the question arise to what extend the above properties are needed in less simple situations. Specifically, what is the role of orthogonality? Diagonalization is certainly much stronger than preconditioning. We will extract next somewhat weaker features of multilevel bases that have a better chance to be practically feasible in a much more general framework and will then point out why these features are actually sufficient for the purposes indicated above.
3 Wavelet Bases – Main Features The objective is to develop flexible concepts for the construction of multiscale bases on one hand for realistic domain geometries Ω (bounded Euclidean domains, surfaces, manifolds) as well as for interesting classes of operator equations so as to exploit in essence the features exhibited by the above examples. 3.1 The General Format We postpone discussing the concrete construction of such bases but are content for the moment with describing their general format, always keeping the above examples as guide line in mind. We consider collections Ψ = {ψλ : λ ∈ J } ⊂ L2 (Ω) of functions – wavelets – that are normalized in L2 , i.e. ψλ L2 = 1, λ ∈ J , where dim Ω = d. Here J = Jφ ∪ Jψ is an infinite index set where: #Jφ < ∞ representing the “scaling functions”, like the box or hat functions above, living on the coarsest scale. For Euclidean domains these functions will span polynomials up to some order which will be called the order of the basis Ψ . The indices in Jψ represent the “true” wavelets spanning complements between refinement levels. Each index λ ∈ J
40
Wolfgang Dahmen
encodes different types of information, namely the scale j = j(λ) = |λ|, the spatial location k = k(λ) and the type e = e(λ) of the wavelet. Recall that e.g. for tensor product constructions one has 2d − 1 different types of wavelets associated with each spatial index k. For d = 2 one has, for instance, ψλ (x, y) = 2j ψ 1,0 (2j (x, y) − (k, l)) = 2j/2 ψ(2j x − k)2j/2 φ(2j y − l). We will explain later what exactly qualifies Ψ as a wavelet basis in our context. 3.2 Notational Conventions As before it will be convenient to view a collection Ψ as an (infinite) vector (with respect to some fixed but unspecified order of the indices in J ). We will always denote by D = diag (wλ : λ ∈ J ) a diagonal matrix and write D = Ds when the diagonal entries are wλ = 2s|λ| , a frequently occurring case. Thus D−s Ψ = {2−s|λ| ψλ } will often stand for a scaled wavelet basis. Arrays of wavelet coefficients will be denoted by boldface letters like v = {vλ }λ∈J , d, u, . . . ,. The above shorthand notation allows us then to write wavelet expansions as dT Ψ := λ∈J dλ ψλ . In the following the notation d for the wavelet coefficients indicates expansions with respect to the unscaled L2 -normalized basis while v, u will typically be associated with a scaled basis. Recall from (9) that (generalized) Gramian matrices are for any bilinear form c(·, ·) : X × Y → IR and (countable) collections Ξ ⊂ X, Θ ⊂ Y denoted by c(Ξ, Θ) := (c(ξ, θ))ξ∈Ξ,θ∈Θ ,
Ψ, f = ( f, ψλ )Tλ∈J ,
where the right expression is viewed as a column vector. 3.3 Main Features The main features of wavelet bases for our purposes can be summarized as follows: • Locality (L); • Cancellation Properties (CP); • Norm Equivalences (NE); Some comments are in order. Locality (L): By this we mean that the wavelets are compactly supported and that the supports scale as follows. Ωλ := supp ψλ ,
diam (Ωλ ) ∼ 2−|λ| .
(11)
Of course, one could replace 2 by some ρ > 1. But this is merely a technical point and will be dismissed for simplicity.
Multiscale and Wavelet Methods for Operator Equations
41
Cancellation Property (CP) of Order m: ˜ This means that integration against a wavelet annihilates smooth parts which can be expressed in terms of inequalities of the following type: −|λ|(m+ ˜ 2−p) |v|Wpm˜ (Ωλ ) , | v, ψλ | < ∼ 2 d
d
λ ∈ Jψ ,
(12)
see (4) for m ˜ = 1, d = 1, p = 2. The most familiar sufficient (in fact equivalent) condition for (12) for wavelets on Euclidean domains is that the wavelets have vanishing polynomial moments of order m ˜ which means that
P, ψλ = 0,
P ∈ Pm ˜ , λ ∈ Jψ ,
(13)
where Pm ˜ − 1 (order ˜ denotes the space of all polynomials of (total) degree m m). ˜ In fact, the same argument as in (4) yields for p1 + p1 = 1 | v, ψλ | =
inf | v − P, ψλ | ≤ inf v − P Lp (Ωλ ) ψλ Lp ,
P ∈Pm ˜
d −|λ|( d 2−p)
< 2 ∼
P ∈Pm ˜
inf v − P Lp (Ωλ ) ,
P ∈Pm ˜
where we have used that |λ|
ψλ Lp ∼ 2
d p
−d 2
∼ 2|λ|( 2 − p ) . d
d
(14)
Now one can invoke standard Whitney-type estimates for local polynomial approximation of the form k inf v − P Lp (Ω) < ∼ (diam Ω) |v|Wpk (Ω)
P ∈Pk
(15)
to confirm (12) in this case (for more details about (15) see Section 10.2). Norm Equivalences (NE): This is perhaps the most crucial point. It should be emphasized that, in comparison with conventional numerical schemes, wavelet concepts aim at more than just finite dimensional approximations of a given or searched for function but rather at its representation in terms of an array of wavelet coefficients – the digits of the underlying function. The tighter the interrelation between function and digits is, the better. We will make heavy use of the following type of such interrelations. For some γ, γ˜ > 0 and s ∈ (−˜ γ , γ) there exist positive bounded constants cs , Cs such that cs v2 ≤ vT D−s Ψ H s ≤ Cs v2 ,
v ∈ 2 ,
(16)
where for s ≥ 0 the space H s will always stand for a closed subspace of the Sobolev space H s (Ω), defined e.g. by imposing homogeneous boundary conditions on parts or all of the boundary of Ω, i.e., H0s (Ω) ⊆ H s ⊆ H s (Ω). For s < 0, the space H s is always understood as the dual space H s := (H −s ) . (16) means that the scaled basis D−s Ψ is a Riesz-basis for H s , i.e., every element in H s has a unique expansion satisfying (16). Thus small changes in
42
Wolfgang Dahmen
the coefficients can cause only small changes in the function and vice versa, which is obviously a desirable feature with respect to stability. We will discuss next some ramifications of (16). The first observation is that (16) entails further norm equivalences for other Hilbert spaces. The following example addresses the issue of robustness of such norm equivalences. The point here is that these relations can often be arranged to be independent of some parameter in an energy inner product and the corresponding norm, a fact that will be, for instance, relevant in the context of preconditioning. Remark 1 Define v2H := ∇v, ∇v + v, v and consider the diagonal
√ matrix D := (1 + 2|λ| )δλ,µ λ,µ∈J . Then whenever (16) holds with γ > 1 one has for every v ∈ 2
2 −1 −1/2 2 1/2 v2 . v2 ≤ vT D−1 2(c−2 Ψ H ≤ C0 + C1 0 + c1 )
(17)
Proof: Let v = dT Ψ (= (D d)T D−1 Ψ ). We wish to show that D d2 ∼ vH ): √ {(1 + 2|λ| )dλ }λ∈J 2J ≤ 2 |dλ |2 + 22|λ| |dλ |2 λ∈J
−2 v2L2 + |v|2H 1 ≤ 2 c−2 0 + c1
−2 = 2 c−2 v2H 0 + c1
(N E)
Conversely one has (N E)
v2L2 + |v|2H 1 ≤ C02 d22 + C12 D1 d22 ≤ (C02 + C12 ) ≤
(C02
+
C12 )
(1 +
√
(1 + 22|λ| )|dλ |2
λ∈J |λ| 2
2 ) |dλ | , 2
λ∈J
which finishes the proof.
2
The next consequence is of general nature and concerns duality. Remark 2 Let H be a Hilbert space, ·, · : H × H → IR, and suppose that cv2 ≤ vT ΘH ≤ Cv2 ,
(18)
i.e., Θ is a Riesz-basis for H. Then one has C −1 Θ, v2 ≤ vH ≤ c−1 Θ, v2 .
(19)
An important Application of (19) is the case H = H01 (Ω), Θ := D−1 Ψ which gives
Multiscale and Wavelet Methods for Operator Equations −1 C1−1 D−1 Ψ, w2 ≤ wH −1 (Ω) ≤ c−1
Ψ, w2 . 1 D
43
(20)
The H −1 norm arise naturally in connection with second order elliptic equations. Its evaluation is, for instance, important in connection with least squares formulations [17, 18] and poses serious difficulties in the finite element context. Proof of Remark 2: We proceed in three steps. 1) Consider the mapping F : 2 → H, F : v → vT Θ. Then (18) means that F 2 →H = C,
F −1 H→2 = c−1 .
(21)
2) For the adjoint F ∗ : H → 2 , defined by F v, w = vT F ∗ w, one then has on one hand (F ∗ w)T v
F v, w = sup v v2 v v 2 F vH wH ≤ sup = F 2 →H wH , v2 v
F ∗ w2 = sup
and on the other hand,
F v, w (F ∗ w)T v sup wH w wH w F ∗ w2 v2 ≤ sup = F ∗ H →2 v2 . wH w
F vH = sup
Thus we conclude that F 2 →H = F ∗ H →2 = C,
F −1 H→2 = (F ∗ )−1 2 →H = c−1 . (22) 3) It remains to identify F ∗ w by applying it to the sequence eλ := (δλ,µ : µ ∈ J ). In fact, (F ∗ w)λ = (F ∗ w)T eλ = F eλ , w = θλ , w which means F ∗ w = Θ, w, 2
whence the assertion follows.
Riesz-Bases – Biorthogonality: We pause to stress the role of biorthogonality in the context of Riesz bases. Recall that under the assumption (18) the mappings F : 2 → H F : v → vT Θ,
F ∗ : H → 2
F v, w = vT F ∗ w,
˜ by are topological isomorphisms and that F eλ = θλ . Defining the collection Θ θ˜λ := (F ∗ )−1 eλ , we obtain
44
Wolfgang Dahmen
θλ , θ˜µ = F eλ , (F ∗ )−1 eµ = eTλ eµ = δλ,µ , which means ˜ = I.
Θ, Θ
(23)
˜ and (19) simply means that Θ ˜ is a Riesz basis Thus one has w = w, ΘΘ for H . Hence a Riesz basis for H always entails a Riesz basis for H . Of ˜ when Θ is an orthonormal basis and H is identified with course, Θ = Θ H . The following consequence often guides the construction of wavelet bases satisfying (16) also for s = 0. Corollary 3 When H = L2 (Ω) = H for every Riesz basis Ψ of L2 there exists a biorthogonal basis Ψ˜ which is also a Riesz basis for L2 . We conclude this section with some remarks on Further norm equivalences - Besov-Spaces: Norm equivalence of the type (16) extend to other function spaces such as Besov spaces. For a more detailed treatment the reader is referred, for instance, to [11, 24, 51, 53, 52], see also Section 10 for definitions. Note first that (16) can be viewed for 0 < p < ∞, 0 < q ≤ ∞ as a special case of vqB t (Lp ) ∼ vqLp + 2tq|λ| v, ψ˜λ ψλ qLp , (24) q
λ∈J
which holds again for some range of t > 0 depending on the regularity of the wavelets. Now, renormalizing 1
1
ψλ,p := 2d|λ|( p − 2 ) ψλ so that ψλ,p Lp ∼ 1, we denote by Ψp := {ψλ,p : λ ∈ J } the Lp -normalized version of Ψ and note that one still has a biorthogonal pair Ψp , Ψ˜p = I. Moreover, whenever we have an embedding Bqt (Lτ ) ⊂ Lp , (24) can be restated as vqB t (Lτ ) q
∼
vqLτ
+
∞
t 1 1 2jd( d + p − τ ) Ψ˜j,p , vτ
q ,
(25)
j=0
where Ψ˜j,p := {ψ˜λ,p : |λ| = j}. Note that the embedding Bqt (Lτ ) ⊂ Lp holds, by the Sobolev embedding theorem, as long as t 1 1 ≤ + , τ d p
(26)
(with a restricted range for q when equality holds in (26)). In particular, in the extreme case one has t 1 1 + = d p τ
; vBτt (Lτ ) ∼ vLτ + Ψ˜p , vτ ,
(27)
Multiscale and Wavelet Methods for Operator Equations
45
i.e., the Besov norm is equivalent to an τ -norm, where no scale dependent weight arises any more. This indicates that the embedding is not compact. It is however, an important case that plays a central role in nonlinear approximation and adaptivity as explained later. Figure 4 illustrates the situation. With every point in the (1/p, s) coordinate plane we associate (a collection of) spaces with smoothness s in Lp . Left of the critical line through (1/p, 0) with slope d all spaces are still embedded in Lp (with a compact embedding strictly left of the critical line), see [24, 51].
Fig. 4. Critical embedding line
4 Criteria for (NE) In view of the importance of (NE), we discuss briefly ways of affirming its validity. Fourier techniques are typically applied when dealing with bases living on the torus or in Euclidean space. Different concepts are needed when dealing with more general domain geometries. We have already seen that the existence of a biorthogonal Riesz basis comes with a Riesz basis. Thus biorthogonality is a necessary but unfortunately not sufficient criterion. 4.1 What Could Additional Conditions Look Like? Let us see next what kind of conditions, in addition to biorthogonality, are related to (NE). To this end, suppose that (NE) holds for 0 ≤ t < γ, so that
|λ|<J
(NE)
Jt dλ ψλ H t ∼ {2t|λ| dλ }|λ|<J 2 < ∼ 2 {dλ }|λ|<J 2 (NE)
∼ 2Jt
|λ|<J
Thus defining the hierarchy of spaces
dλ ψλ L2 .
46
Wolfgang Dahmen
Sj := closH t span (ψλ : |λ| < j),
(28)
we see that (NE) implies the Inverse (or Bernstein) Estimate: jt vH t < ∼ 2 vL2 ,
v ∈ Sj , t < γ.
(29)
Furthermore, consider the canonical projectors Qj v :=
v, ψ˜λ ψλ , Q∗j v :=
v, ψλ ψ˜λ , |λ|<j
(30)
|λ|<j
and note that < ∼
v − Qj vL2
1/2 | v, ψ˜λ |2
≤ 2−jt
|λ|≥j
1/2 22t|λ| | v, ψ˜λ |2
|λ|≥j
(NE)
< 2−jt vH t . ∼
This entails the Direct (or Jackson) Estimate: −jt inf v − vj L2 < ∼ 2 vH t ,
vj ∈Sj
t < γ.
(31)
Conversely, it will be seen below that direct and inverse estimates imply the validity of (NE) for a certain range of t. 4.2 Fourier- and Basis-free Criteria Note that the above consequences of (NE) are actually properties of the multiresolution spaces Sj , not of the specific bases. Let us therefore use possibly basis-free formulations. To this end, note that details between two successive spaces Sj and Sj+1 can be expressed as
v, ψ˜λ ψλ = (Qj+1 − Qj )v. |λ|=j
Collections of functions that are stable on each level are relatively easy to construct. In fact, when the biorthogonal basis is also local, this is easy to see, as explained later, see Remark 5 in Section 5.1. In such a case one has | v, ψ˜λ |2 , (Qj+1 − Qj )v2L2 ∼ |λ|=j
Multiscale and Wavelet Methods for Operator Equations
47
which means Ns,Q (v) :=
∞
1/2 2
j2s
(Qj+1 −
Qj )v2L2
∼
j=0
1/2 2
2s|λ|
| v, ψ˜λ |2
.
λ∈J
Moreover, biorthogonality of Ψ, Ψ˜ means that the operators Qj defined by (30) commute. Thus, the seemingly relevant properties like direct and inverse estimates as well as biorthogonality can be formulated without explicit reference to the specific bases Ψ, Ψ˜ (which will be important for construction principles). In fact, in summary, one is led by the above discussion to ask the following question: • Given a multiresolution sequence of nested spaces S = {Sj }j∈IN 0 : Sj ⊂ H s for s < γ, and an associate sequence Q = {Qj }j∈IN 0 of uniformly L2 bounded projectors mapping L2 onto Sj , such that • the commutator property (C): l ≤ j,
Ql Qj = Ql ,
(32)
• the Jackson estimate (J):
−m j vH m , inf v − vj L2 < ∼ 2 vj ∈Vj
v ∈ Hm
(33)
and the • Bernstein estimate (B): sj vj H s < ∼ 2 vj L2 ,
vj ∈ Vj , s < γ ,
(34)
hold for Sj = Vj and some γ > 0, m ∈ IN , can one ensure that Ns,Q (·) ∼ · H s ? The following statement is a special case of a more general result from [39]. Theorem 1. For S, Q as above suppose that (32) and (34), (33) hold for Vj = Sj with m = m > γ = γ > 0. Then vH s
1/2 ∞ ∼ 22sj (Qj − Qj−1 )v2L2 ,
0 < s < γ.
(35)
j=0
˜ > Moreover, if (33) and (34) also hold for Vj = S˜j := range Q∗j with m = m γ < s < γ, where γ = γ˜ > 0, then the above equivalence (35) also holds for −˜ for s < 0 it is understood that H s = (H −s ) .
48
Wolfgang Dahmen
A few comments are in order. Note that (35) is easier to realize for s > 0. The case s ≤ 0 requires more effort and involves the dual multiresolution sequence S˜ as well. The space H s above may have incorporated homogeneous boundary conditions. As for the usefulness of the above criterion for the construction of a Riesz basis for L2 , the following remarks should be kept in mind. One actually starts usually with a multiresolution sequence S where the Sj are then not given as spans of wavelets but of single scale bases Φj . They consist of compactly supported functions like scaling functions in classical settings or by standard finite element nodal basis functions, see the examples in Section 2. One then tries to construct suitable complement bases spanning complements between any two successive spaces in S. It is of course not clear how to get a good guess for such complement bases whose union would be a candidate for Ψ . The difficulty is to choose these complements in such way that they give rise to a dual multiresolution (which is completely determined once the ψλ are chosen) satisfying the conditions (J), (B) as well for some range. It will be seen later how to solve this problem in a systematic fashion, provided that for ˜j can be found which then a single scale basis Φj for Sj biorthogonal bases Φ ˜ One can then construct Ψ and Ψ˜ based only on the knowledge of the define S. ˜j . collections Φj , Φ ˜j is possible in Although the construction of dual single scale pairs Φj , Φ some important cases, one faces serious difficulties with this approach when dealing with finite elements on nested triangulations. Therefore criteria, that ˜j , are desirable. We outline do not require explicit knowledge of dual pairs Φj , Φ next such criteria following essentially the developments in [49]. The key is that both multiresolution sequences S and S are prescribed. 
It will be seen that one need not know corresponding biorthogonal single scale bases to start with, but it rather suffices to ensure that single scale bases for these spaces can be biorthogonalized. This latter property can be expressed as follows. Remark 4 Let S and S˜ be two given multi-resolution sequences satisfying dim Sj = dim S˜j and inf
vj ∈Sj
sup ˜j v ˜j ∈S
| vj , v˜j | > 1. vj L2 (Ω) ˜ vj L2 (Ω) ∼
(36)
Then there exists a sequence Q of uniformly L2 -bounded projectors with ranges S such that range (id − Qj ) = (S˜j )⊥L2 ,
range (id − Q∗j ) = (Sj )⊥L2 .
(37)
Moreover, Q satisfies the commutator property (C), see (32). This leads to a version of Theorem 1, that prescribes both multiresolution sequences S and S˜ without requiring explicit knowledge neither of the dual generator bases nor of the dual wavelets.
Multiscale and Wavelet Methods for Operator Equations
49
Theorem 2. If in addition to the hypotheses of Remark 4 S and S˜ satisfy direct and inverse estimates (33) and (34) with respective parameters m, m ˜ ∈ IN and γ, γ˜ > 0, then one has for any wj ∈ range (Qj − Qj−1 ), Qj given by Remark 4, 2 ∞ ∞ < 22sj wj 2L2 , s ∈ [−m, ˜ γ), (38) w j ∼ j=0 s j=0 H
and
∞ j=0
2 22sj (Qj − Qj−1 )v2L2 < ∼ vH s ,
s ∈ (−˜ γ , m].
(39)
Thus for s ∈ (−˜ γ , γ) (38), (39), v → {(Qj − Qj−1 )v}j is a bounded mapping from H s onto 2,s (Q) := {{wj }j : wj ∈ range (Qj − Qj−1 ), {wj }j 2,s (Q) := 1/2 ∞ ∞ 2sj wj 2L2 (Ω) < ∞}, with bounded inverse {wj }j → j=0 2 j=0 wj ,
i.e., for s ∈ (−˜ γ , γ) the above relations hold with ‘ < ∼ replaced by ‘ ∼ . Clearly Theorem 2 implies Theorem 1. We sketch now the main steps in the proof of Theorem 2 (see [39, 49] for more details). Define orthogonal projectors Pj : L2 → S˜j by
v, v˜j = Pj v, v˜j ,
v ∈ L2 , v˜j ∈ S˜j ,
and set Rj : Pj |Sj so that Rj L2 ≤ 1. Then (36) implies Rj vj L2 > ∼ vj L2 , ˜ vj ∈ Sj . Now we claim that range Rj = Sj , since otherwise there would exist v˜j ∈ S˜j , v˜j ⊥L2 range Rj , contradicting (36). Thus, the Rj−1 : S˜j → Sj are uniformly L2 -bounded. Then the projectors Qj := Rj−1 Qj : L2 → Sj are uniformly L2 bounded, range Qj = Sj and Qj v, v˜j = v, v˜j , which provides range (I − Qj ) ⊂ (S˜j )⊥L2 . Conversely, v ∈ (S˜j )⊥L2 implies range (I − Qj ) ⊂ (S˜j )⊥L2 , which confirms analogous properties for the adjoints Q∗j . Note that S˜j ⊂ S˜j+1 , which implies (C) Qj Qj+1 = Qj . 2 As for Theorem 2, observe first j(s±) wj H s± < wj L2 , ∼ 2
∀ wj ∈ range (Qj − Qj−1 ), s ± ∈ [−m, ˜ γ). (40) In fact, (40) follows from (B) (34) for s ± ∈ [0, γ). For t := s ± ∈ [−m, ˜ 0] and any wj ∈ range (Qj − Qj−1 ), one has wj H t
=
sup z∈H −t
< ∼
sup z∈H −t
wj , (Q∗j − Q∗j−1 )z
wj , z = sup zH −t zH −t z∈H −t wj L2 inf v˜j−1 ∈S˜j−1 z − v˜j−1 L2 zH −t
(J)
< 2tj wj L2 . ∼
50
Wolfgang Dahmen
Hence 2 wj j
=
wj ,
j
Hs (B)
< ∼
l≥j
< ∼
wl
l
j
Hs
j
wj H s+ wl H s−
l≥j
2j 2−l (2sk wj L2 )(2sl wl L2 ) < ∼
22sj wj 2L2 .
j
Thus, we have shown vH s < ∼ Ns,Q (v),
s ∈ [−m, ˜ γ),
(41)
˜ the same argument gives which is (38). Interchanging the roles of S and S, vH s < ∼ Ns,Q∗ (v), s ∈ [−m, γ˜ ). In order to prove now (39), note first that Ns,Q (v)2 = 22sj (Qj − Qj−1 )v, (Qj − Qj−1 )v j
=
(42)
2
2sj
(Q∗j
−
Q∗j−1 )(Qj
− Qj−1 )v, v
j
≤
22sj (Q∗j − Q∗j−1 )(Qj − Qj−1 )v H −s vH s
j
:=w (42)
< N−s,Q∗ (w)vH s , s ∈ (−˜ γ , m] (43) ∼ Since by (C) (see (32)) (Q∗l − Q∗l−1 )(Q∗j − Q∗j−1 ) = δj,l (Q∗l − Q∗l−1 ) one has 2sl (Q∗l −Q∗l−1 )wL2 = 22sl (Q∗l −Q∗l−1 )(Ql −Ql−1 )vL2 < ∼ 2 (Ql −Ql−1 )vL2 . This gives 1/2 N−s,Q∗ (w) = 2−2sl (Q∗l − Q∗l−1 )w2L2 l
< ∼
1/2 2−2sl 24sl (Ql − Ql−1 )v2L2
= Ns,Q (v).
j
Therefore we can bound N−s,Q∗ (w) on the right hand side of (43) by a constant multiple of Ns,Q (v). Dividing both sides of the resulting inequality by 2 Ns,Q (v), yields (39). Direct and inverse estimates (J), (B) are satisfied for all standard hierarchies of trial spaces where m is the order of polynomial exactness.
Multiscale and Wavelet Methods for Operator Equations
51
• A possible strategy for (J) is to construct L2 -bounded local projectors onto Sj and use the reproduction of polynomials and corresponding local polynomial inequalities; • (B) follows from stability and rescaling arguments. The simplest case is s ∈ IN . One first establishes an estimate on a reference domain, then rescales and sums up the local quantities.
5 Multiscale Decompositions – Construction and Analysis Principles We indicate next some construction principles that allow us to make use of the above stability criteria. The main tool is the notion of multiresolution hierarchies. 5.1 Multiresolution The common starting point for the construction of multiscale bases is an ascending sequence of spaces S0 ⊂ S1 ⊂ S2 ⊂ . . . L2 (Ω), Sj = L2 (Ω), j
which are spanned by single scale bases Sj = span Φj =: S(Φj ), One then seeks decompositions Sj+1 = Sj
Φj = {φj,k : k ∈ Ij }.
Wj
along with corresponding complement bases Wj = span Ψj ,
Ψj = {ψλ : λ ∈ Jj }.
The union of the coarse scale basis Φ0 and all the complement bases provides a multi-scale basis Ψ := Ψj (Ψ−1 := Φ0 ), j∈IN 0
which is a candidate for a wavelet basis. Before discussing how to find suitable complement bases, it is important to distinguish several stability notions. Uniform single scale stability: This means that the relations cl2 (Ij ) ∼ cT Φj L2 ,
dl2 (Jj ) ∼ dT Ψj L2
(44)
hold uniformly in j. Single scale stability is easily established, for instance, when local dual bases are available.
52
Wolfgang Dahmen
˜j are dual pairs of single scale bases, i.e., Remark 5 Suppose Φj , Φ φj,k L2 ∼ 1, φ˜j,k L2 ∼ 1,
˜j = I,
Φj , Φ
˜j ) ∼ 1. diam (supp Φj ), diam (supp Φ ˜j . Then (44) holds for Φj and Φ Proof: Let σj,k denote the support of φj,k and 2j,k := 2−j (k + [0, 1]d ). Then for v = cT Φj one has 2 ˜j 2 < v2L2 (˜σj,k ) < c22 = v, Φ 2 ∼ ∼ vL2 . k∈Ij
Conversely, v2L2 (2j,k ) ≤
2 |cm |φj,m L2 (2j,k )
σj,m ∩2j,k =∅
< ∼
|cm |2 ,
σj,m ∩2j,k =∅
which, upon summation, yields 2 v2L2 < ∼ c2 .
2
This finishes the proof.
Of course, this does not mean that the multiscale basis Ψ is stable in the sense that d2 ∼ dT Ψ L2 , which we refer to as stability over all levels, or equivalently the Riesz-basis property in L2 . 5.2 Stability of Multiscale Transformations Let us pause here to indicate a first instance where the Riesz-basis property has computational significance. To this end, note that for each j the elements of Sj have two equivalent representations j−1 l=−1 λ∈Jl
dλ ψλ =
ck φj,k
k∈Ij
Tj : d → c and the corresponding arrays of single-scale, respectively multiscale coefficients c, d are interrelated by the multiscale transformation Tj , see Section 2.1. Remark 6 (cf. [23, 39]) Assume that the Φj are uniformly L2 -stable. Then Tj , T−1 j = O(1) ⇐⇒ Ψ is a Riesz basis in L2 .
Multiscale and Wavelet Methods for Operator Equations
53
Proof: Let Ψ j = {ψλ : |λ| < j} and v = dTj Ψ j = cTj Φj . Assume that Ψ is a Riesz basis. Then one has cj 2 ∼ vL2 = dTj Ψ j L2 ∼ dj 2 = T−1 j c j 2 . Conversely:
dj 2 = T−1 j cj 2 ∼ cj 2 ∼ vL2 ,
whence the assertion follows. 2 Stability over all scales is a more involved issue and will be addressed next. 5.3 Construction of Biorthogonal Bases – Stable Completions We have seen in Section 4 that biorthogonality is necessary for the Riesz basis property. Once a pair of biorthogonal bases is given, one can use Theorem 1 or Theorem 2 to check the desired norm equivalences (16), in particular, for s = 0. The construction and analysis of biorthogonal wavelets on IR or IRd is greatly facilitated by Fourier methods (see [25, 50]). For more general domains the construction of good complement spaces Wj is based on different strategies. A rough road map looks as follows: •
Construct stable dual multiresolution sequences S = {Sj }j∈IN 0 , Sj = span(Φj ),
˜j ), S˜ = {S˜j }j∈IN 0 , S˜j = span(Φ
˜j = I, j ∈ IN 0 . such that Φj , Φ ˇ j. • Construct some simple initial complement spaces W • Change the initial complements into better ones Wj . We will comment next on these steps. 5.4 Refinement Relations Stability (44) and nestedness of the spaces Sj imply refinement equations of the form φj,k = mj,l,k φj+1,l , (45) l∈Ij+1
which we briefly express as ΦTj = ΦTj+1 Mj,0 .
(46)
Here the columns of the refinement matrix Mj,0 consist of the arrays of mask coefficients mj,l,k . The objective is then to find a basis Ψj+1 = {ψλ : |λ| = j} ⊂ Sj+1 , i.e., T Ψj+1 = ΦTj+1 Mj,1 , (47)
54
Wolfgang Dahmen
that spans a complement of Sj in Sj+1 . It is easy to see that this is equivalent to the fact that the matrix Mj := (Mj,0 , Mj,1 ) is invertible, i.e, for M−1 =: j,0
Gj = G Gj,1 one has T Gj,1 ⇐⇒ Mj Gj = Gj Mj = I. ΦTj+1 = ΦTj Gj,0 + Ψj+1
(48)
Remark 7 [23] The bases Ψj , Φj are uniformly stable in the sense of (44) if and only if Mj 2 →2 , Gj 2 →2 = O(1). (49) The matrix Mj,1 is called a completion of the refinement matrix Mj,0 . The sequence {Mj,1 }j is called (uniformly) stable completions when (49) holds. Let us illustrate these notions by the hierarchical complement bases from Section 2.3. From the refinement equations one readily infers (see Figure 5)
Mj,0
=
1 √ 0 0 2 2 √1 0 0 2 1 1 √ √ 0 2 2 2 2 1 0 √2 0 1 1 √ 0 2√ 2 2 2
0 .. .
0 0
and
... ... 0 ... , 0 ... .. .. .. . . . 0 1 1 √ √ 2 2 2 2 ... 0 √12 1 ... 0 2√ 2
Mj,1
...
1 0 0 . = ..
0 0 1 .. .
0 0 0 .. .
...
.. . . 0 1 0 0 0 0 001
Multiscale and Wavelet Methods for Operator Equations
55
1
- 1/2
-1
1/2
1
Fig. 5. Hat function
One also derives directly from Figure 5 how to express the fine scale basis functions in terms of the coarse scale and complement basis functions (the latter being simply the fine scale basis functions at the new knots):
φj+1,2k+1 φj+1,0
√
1 (φj+1,2k−1 + φj+1,2k+1 ) 2 √ 1 = 2 φj,k − (ψj,k−1 + ψj,k ) , k = 1, . . . , 2j − 1, 2 = ψj,k , k = 0, . . . , 2j − 1, √ √ 1 1 = 2 φj,0 − ψj,0 , φj+1,2j+1 = 2 φj,2j − ψj,2j −1 , 2 2
φj+1,2k =
2 φj,k −
Accordingly, the blocks of the matrix Gj look as follows: √ 0 2 0 0 0 ... 0 0 .. 0 0 0 √2 0 . 0 0 .. .. Gj,0 = .. .. . . . . √ 0 2 0 0 0 √ 0 0 ... 0 0 20
Gj,1
1 − 12 0 0 . . .
0 −1 1 −1 2 . 2 . = . . . . 0 ...
.. . .. . . − 12 0 − 12 1
5.5 Structure of Multiscale Transformations Recall that by (46) and (47)
ΦTj cj + ΨjT dj = ΦTj+1 Mj,0 cj + Mj,1 dj ,
(50)
56
Wolfgang Dahmen
so that TJ : d → c has the form M0,0
M1,0
M2,0
MJ−1,0
M0,1
M1,1
M2,1
MJ−1,1
c0 → c1 → c2 → · · ·
d0
d1
···
d2
→
cJ
dJ−1
see Section 2.1. Thus the transform consists of the successive application of the matrices Mj 0 , TJ = TJ,J−1 · · · TJ,0 . TJ,j := 0 I In order to determine the inverse transform T−1 J : c → d, note that by (48), ΦTj+1 cj+1 = ΦTj (Gj,0 cj+1 ) + ΨjT (Gj,1 cj+1 ) = ΦTj cj + ΨjT dj , which yields cJ
GJ−1,0
→
cJ−1
GJ−2,0
→
cJ−2
GJ−2,1
GJ−1,1
G0,0
→
· · · → c0
···
GJ−3,1
dJ−1
GJ−3,0
dJ−2
G0,1
d0 .
5.6 Parametrization of Stable Completions

The above example of the hierarchical basis shows that some stable completions are often easy to construct. The corresponding complement bases may, however, not yet be appropriate, because the resulting multiscale basis may fail to have a dual basis in L₂; the hierarchical basis is an example. The idea is then to modify some initial stable completion so as to obtain a better one. Therefore the following parametrization of all possible stable completions is of interest [23].

Theorem 3. Given some initial completion M̌_{j,1} (and Ǧ_j), all other completions have the form

  M_{j,1} = M_{j,0} L + M̌_{j,1} K      (51)

and

  G_{j,0} = Ǧ_{j,0} − Ǧ_{j,1} (K^T)^{−1} L^T,      G_{j,1} = Ǧ_{j,1} (K^T)^{−1},      (52)

with K invertible. The main point of the proof is the identity

  I = M_j G_j = M_j [ I  L ; 0  K ] [ I  −LK^{−1} ; 0  K^{−1} ] G_j =: M̌_j Ǧ_j.

The special case K = I is often referred to as the Lifting Scheme [70]. Modifications of the above type can be used for the following purposes:
Multiscale and Wavelet Methods for Operator Equations
57
• Raising the order of vanishing moments: choose K = I and L such that

  ∫_Ω Ψ_j^T P dx = ∫_Ω Φ_{j+1}^T M_{j,1} P dx = ∫_Ω (Φ_j^T L + Ψ̌_j^T) P dx = 0,      P ∈ P_{m*}.
• Construction of finite element based wavelets through coarse grid corrections [23, 40, 42, 74].
• Construction of biorthogonal wavelets on intervals and multivariate domains [21, 22, 30, 46, 47].
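To make the lifting mechanism tangible, here is an added toy sketch (not from the notes): starting from the hierarchical ("interpolatory") decomposition, i.e. K = I, an update step realizing one particular choice of L equips the complement functions with a vanishing moment; this is the CDF(2,2) lifting step. Periodic boundary conditions and unnormalized coefficients are assumed to keep the code short, and the function names are invented:

```python
import numpy as np

def lifted_decompose(s):
    # s: samples on a dyadic periodic grid, even length
    even, odd = s[::2], s[1::2]
    # predict (initial completion, K = I): hierarchical detail
    d = odd - 0.5 * (even + np.roll(even, -1))
    # update (a choice of L): gives the new complement functions a
    # vanishing moment, i.e. coarse sums are preserved
    c = even + 0.25 * (d + np.roll(d, 1))
    return c, d

def lifted_reconstruct(c, d):
    # exact inverse: undo the update, then undo the predict step
    even = c - 0.25 * (d + np.roll(d, 1))
    odd = d + 0.5 * (even + np.roll(even, -1))
    out = np.empty(2 * len(c))
    out[::2], out[1::2] = even, odd
    return out
```

The update weights 1/4 preserve the zeroth moment: the sum of the coarse coefficients equals half the sum of the input, which is precisely the effect of raising the order of vanishing moments of the complement basis.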
In particular, a systematic biorthogonalization can be described as follows. Suppose that dual pairs of single scale bases are given,

  Φ_j^T = Φ_{j+1}^T M_{j,0},      Φ̃_j^T = Φ̃_{j+1}^T M̃_{j,0},      ⟨Φ_j, Φ̃_j⟩ = I.

Theorem 4. Let M̌_{j,1}, Ǧ_j be as above. Then (K := I, L := −M̃_{j,0}^T M̌_{j,1})

  M_{j,1} := (I − M_{j,0} M̃_{j,0}^T) M̌_{j,1},      M̃_{j,1} := Ǧ_{j,1}^T

are new uniformly stable completions satisfying M_j M̃_j^T = I, and

  Ψ_j^T := Φ_{j+1}^T M_{j,1},      Ψ̃_j^T := Φ̃_{j+1}^T M̃_{j,1}

form biorthogonal wavelet bases. The criteria from Section 4 can be used to show that the constructions based on these concepts indeed give rise to Riesz bases, cf. [21, 22, 30, 46, 47]. Very useful explicit expressions for the entries of L := −M̃_{j,0}^T M̌_{j,1} are derived in [62].
6 Scope of Problems

We are going to describe next the scope of problems to which the above tools will be applied.

6.1 Problem Setting

Let H be a Hilbert space and A(·,·) : H × H → ℝ a continuous bilinear form, i.e.,

  |A(V, U)| ≲ ‖V‖_H ‖U‖_H,      V, U ∈ H.      (53)

We will be concerned with the variational problem: given F ∈ H′, find U ∈ H such that
  A(V, U) = ⟨V, F⟩,      V ∈ H.      (54)

We explain first what we mean when saying that (54) is well-posed. To this end, define the operator L : H → H′ by

  ⟨V, LU⟩ = A(V, U),      V ∈ H,      (55)

so that (54) is equivalent to

  LU = F.      (56)

Then (54) is called well-posed (on H) if there exist positive finite constants c_L, C_L such that

  c_L ‖V‖_H ≤ ‖LV‖_{H′} ≤ C_L ‖V‖_H,      V ∈ H.      (57)
We will refer to (57) as the mapping property (MP). Clearly (MP) implies the existence of a unique solution for any F ∈ H′ which depends continuously on the data F.

Remark 8. It should be noted that in many cases, given A(·,·), the first (and often most crucial) task is to identify a suitable Hilbert space H such that the mapping property (57) holds.

All examples to be considered later will have the following format. H will in general be a product space H = H_{1,0} × ··· × H_{m,0}, where the component spaces H_{i,0} ⊆ H_i are closed subspaces of Hilbert spaces H_i with norms ‖·‖_{H_i}. The H_i are typically Sobolev spaces, and the H_{i,0} could be determined by homogeneous boundary conditions, H_0^{t_i}(Ω_i) ⊆ H_{i,0} ⊆ H^{t_i}(Ω_i). We use capital letters to indicate that the elements of H have in general several components, so that V = (v_1, …, v_m)^T and ‖V‖_H² = Σ_{i=1}^m ‖v_i‖²_{H_i}. Denoting by ⟨·,·⟩_i a dual pairing on H_i′ × H_i and setting ⟨V, W⟩ = Σ_{i=1}^m ⟨v_i, w_i⟩_i, the dual space H′ is endowed, as usual, with the norm

  ‖W‖_{H′} = sup_{V ∈ H} ⟨V, W⟩ / ‖V‖_H.

The bilinear form A(·,·) will in general have the form A(V, W) = Σ_{i,l=1}^m a_{i,l}(v_i, w_l), so that the operator L is matrix valued as well, L = (L_{i,l})_{i,l=1}^m. We now briefly discuss several examples and typical obstructions to their numerical treatment.
6.2 Scalar 2nd Order Elliptic Boundary Value Problem

Suppose that Ω ⊂ ℝ^d is a bounded (Lipschitz) domain and a(x) is a symmetric (bounded) matrix that is uniformly positive definite on Ω. The classical boundary value problem associated with this second order partial differential equation reads

  −div(a(x)∇u) + k(x)u = f on Ω,      u = 0 on ∂Ω.      (58)

We shall reserve lower case letters for such scalar problems. Its weak formulation has the form (54) with m = 1 and

  a(v, w) := ∫_Ω ((a∇v)^T ∇w + k v w) dx,      H = H_0^1(Ω),   H′ = H^{−1}(Ω).      (59)
Classical finite difference or finite element discretizations turn (59) into a finite dimensional linear system of equations. When solving these systems, one encounters the following

Obstructions:
• The systems are sparse but in realistic cases often very large, so that the use of direct solvers based on elimination techniques is ruled out. In fact, the fill-in caused by elimination would result in prohibitive storage and CPU demands.
• Hence one has to resort to iterative solvers, whose efficiency depends on the condition numbers of the systems. Unfortunately, the systems become increasingly ill-conditioned as the mesh size h decreases: cond₂(a(Φ_h, Φ_h)) ∼ h^{−2}, where a(Φ_h, Φ_h) denotes the stiffness matrix with respect to an L₂-stable single scale basis, such as a standard nodal finite element basis.

6.3 Global Operators – Boundary Integral Equations

Let Ω⁻ be again a bounded domain in ℝ^d (d ∈ {2, 3}) and consider the following special case of (58):

  −Δw = 0 on Ω,
  (Ω = Ω⁻ or Ω⁺ := ℝ³ \ Ω⁻),      (60)

subject to the boundary conditions

  w = f on Γ := ∂Ω⁻      (w(x) → 0, |x| → ∞, when Ω = Ω⁺).      (61)
Of course, the unbounded domain Ω⁺ poses an additional difficulty in the case of such an exterior boundary value problem. A well-known strategy is to transform (60), (61) into a boundary integral equation that lives only on the manifold Γ. There are several ways to do that. They all involve the fundamental solution of the Laplace operator, E(x, y) = 1/(4π|x − y|), which gives rise to the single layer potential operator
  (Lu)(x) = (Vu)(x) := ∫_Γ E(x, y) u(y) dΓ_y,      x ∈ Γ.      (62)
One can then show that the solution u of the first kind integral equation

  Vu = f on Γ      (63)

provides the solution w of (60) through the representation formula

  w(x) = ∫_Γ E(x, y) u(y) dΓ_y,      x ∈ Ω.

Here one has (see e.g. [64])

  a(v, u) = ⟨v, Vu⟩_Γ,      H = H^{−1/2}(Γ),   H′ = H^{1/2}(Γ).
An alternative way uses the double layer potential

  (Kv)(x) := ∫_Γ (∂/∂n_y) E(x, y) v(y) dΓ_y = (1/(4π)) ∫_Γ (ν_y^T (x − y) / |x − y|³) v(y) dΓ_y,      x ∈ Γ,      (64)

where ν is the outward normal to Γ. Now the solution of the second kind integral equation

  Lu := (½ I ± K) u = f      (Ω = Ω^±)      (65)

gives the solution to (60) through

  w(x) = ∫_Γ K(x, y) u(y) dΓ_y.
In this case the bilinear form and the corresponding energy space are as follows:

  a(v, w) = ⟨v, (½ I ± K) w⟩_Γ,      H = L₂(Γ) = H′.

The so-called hypersingular operator offers yet another alternative, in which case H turns out to be H^{1/2}(Γ). According to the shifts caused by these operators in the Sobolev scale, the single layer potential, double layer potential and hypersingular operator have order −1, 0 and 1, respectively. Neumann boundary conditions can be treated in a similar way. Moreover, other classical elliptic systems can be treated similarly, with the above operators serving as core ingredients. The obvious advantage of the above approach is the reduction of the spatial dimension and the fact that in all cases only bounded domains have to be discretized. On the other hand, one faces the following
Obstructions:
• Discretizations lead in general to densely populated matrices. This severely limits the number of degrees of freedom when using direct solvers. But even iterative techniques are problematic, due to the cost of the matrix/vector multiplications.
• Whenever the order of the operator is different from zero, the problem of growing condition numbers arises (e.g. for L = V), see Section 10.

6.4 Saddle Point Problems

All the above examples involve scalar coercive bilinear forms. An important class of problems which are no longer coercive are saddle point problems. A detailed treatment of this type of problems can be found in [19, 59]. Suppose X, M are Hilbert spaces and that a(·,·), b(·,·) are bilinear forms on X × X, respectively X × M, which are continuous:

  |a(v, w)| ≲ ‖v‖_X ‖w‖_X,      |b(v, q)| ≲ ‖v‖_X ‖q‖_M.      (66)
Given f ∈ X′, g ∈ M′, find U = (u, p) ∈ X × M =: H such that for all V = (v, q) ∈ H

  A(U, V) = a(u, v) + b(v, p) = ⟨f, v⟩,
            b(u, q) = ⟨g, q⟩.      (67)

Note that when a(·,·) is symmetric positive definite, the solution component u minimizes the quadratic functional J(w) := ½ a(w, w) − ⟨f, w⟩ subject to the constraint b(u, q) = ⟨g, q⟩ for all q ∈ M, i.e., (u, p) solves

  inf_{v ∈ X} sup_{q ∈ M} { ½ a(v, v) + b(v, q) − ⟨f, v⟩ − ⟨g, q⟩ }.

This accounts for the term saddle point problem (even under more general assumptions on a(·,·)). In order to write (67) as an operator equation, define the operators A and B by

  a(v, w) =: ⟨v, Aw⟩,  v, w ∈ X,      b(v, q) =: ⟨Bv, q⟩,  q ∈ M,

so that (67) becomes

  LU := [ A  B′ ; B  0 ] [ u ; p ] = [ f ; g ] =: F.      (68)
As for the mapping property (MP) (57), a simple (sufficient) condition reads as follows [19, 59]. If a(·, ·) is elliptic on ker B := {v ∈ X : b(v, q) = 0, ∀ q ∈ M },
i.e.,

  a(v, v) ∼ ‖v‖²_X,      v ∈ ker B,      (69)

and if b(·,·) satisfies the inf-sup condition

  inf_{q ∈ M} sup_{v ∈ X} b(v, q) / (‖v‖_X ‖q‖_M) ≥ β      (70)

for some positive β, then (67) is well-posed, i.e., L defined by (68) satisfies

  c_L (‖v‖²_X + ‖q‖²_M)^{1/2} ≤ ‖L [ v ; q ]‖_{X′×M′} ≤ C_L (‖v‖²_X + ‖q‖²_M)^{1/2}.      (71)

Condition (70) means that B is surjective (and thus has closed range). Condition (69) is actually too strong; it can be replaced by requiring bijectivity of A on ker B, see [19], which will be used in some of the examples below. Before turning to these examples we summarize some principal

Obstructions:
• As in previous examples, discretizations usually lead to large linear systems that become more and more ill-conditioned as the resolution of the discretization increases.
• An additional difficulty is caused by the fact that the form (67) is indefinite, so that more care has to be taken when devising an iterative scheme.
• An important point is that the well-posedness (71) of the infinite dimensional problem is not automatically inherited by a finite dimensional Galerkin discretization. In fact, the trial spaces in X and M have to be compatible in the sense that they satisfy the inf-sup condition (70) uniformly with respect to the resolution of the chosen discretizations. This is called the Ladyshenskaja–Babuška–Brezzi (LBB) condition, and ensuring it may, depending on the problem, be a delicate task.

We discuss next some special cases.

Second Order Problem – Fictitious Domains

Instead of incorporating boundary conditions for the second order problem (58), posed on Ω ⊂ □, into the finite dimensional trial spaces, one can treat them as constraints for a variational problem that is formulated over some possibly larger but simpler domain □, e.g. a cube. This is of interest when the boundary varies or when boundary values are used as control variables. Appending these constraints with the aid of Lagrange multipliers leads to the following saddle point problem, cf. [16, 43, 60]. Find U = (u, p) ∈ H := H¹(□) × H^{−1/2}(Γ), Γ := ∂Ω, such that

  ⟨∇v, a∇u⟩ + ⟨v, p⟩_Γ = ⟨v, f⟩      for all v ∈ H¹(□),
  ⟨q, u⟩_Γ = ⟨g, q⟩                   for all q ∈ H^{−1/2}(Γ).      (72)
Remark 9. The problem (72) is well posed, i.e., (71) holds.

Proof: Clearly, (69) is satisfied. Moreover, the inf-sup condition (70) is a consequence of the 'inverse' Trace Theorem, which states that

  inf_{v ∈ H¹(Ω), v|_{∂Ω} = g} ‖v‖_{H¹(Ω)} ≲ ‖g‖_{H^{1/2}(∂Ω)},

[19, 59]. In fact, in the present case we have b(v, p) = ⟨v, p⟩_Γ. Given p ∈ H^{−1/2}(Γ), choose g ∈ H^{1/2}(Γ) such that

  ‖p‖_{H^{−1/2}(Γ)} ≤ 2 ⟨g, p⟩_Γ / ‖g‖_{H^{1/2}(Γ)} ≲ ⟨v, p⟩_Γ / ‖v‖_{H¹(Ω)} ≲ ⟨v, p⟩_Γ / ‖v‖_{H¹(□)}

for some v ∈ H¹(□). Thus b(v, p) ≳ ‖v‖_{H¹(□)} ‖p‖_{H^{−1/2}(Γ)}, which confirms the claim. □
First Order Systems

One is often more interested in derivatives of the solution to the boundary value problem (58). Introducing the fluxes θ := −a∇u, (58) can be written as a system of first order equations, whose weak formulation reads

  ⟨θ, η⟩ + ⟨η, a∇u⟩ = 0            ∀ η ∈ L₂(Ω),
  −⟨θ, ∇v⟩ + ⟨ku, v⟩ = ⟨f, v⟩      ∀ v ∈ H¹_{0,Γ_D}(Ω).      (73)

One now looks for a solution

  U = (θ, u) ∈ H := L₂(Ω) × H¹_{0,Γ_D}(Ω),      (74)

where H¹_{0,Γ_D}(Ω) is the closure in H¹(Ω) of all smooth functions whose support does not intersect Γ_D. For a detailed discussion in the finite element context see e.g. [17, 18]. It turns out that in this case the Galerkin discretization inherits the stability of the original second order problem.
The Stokes System

The simplest model for viscous incompressible fluid flow is the Stokes system

  −νΔu + ∇p = f   in Ω,
  div u = 0        in Ω,
  u|_Γ = 0,                      (75)

where u and p are the velocity and pressure, respectively, see [19, 59]. The relevant function spaces are

  X = H¹₀(Ω) := (H¹₀(Ω))^d,      M = L_{2,0}(Ω) := { q ∈ L₂(Ω) : ∫_Ω q = 0 }.      (76)
In fact, one can show that the range of the divergence operator is L_{2,0}(Ω). The weak formulation of (75) is

  ν ⟨∇v, ∇u⟩_{L₂(Ω)} + ⟨div v, p⟩_{L₂(Ω)} = ⟨f, v⟩,      v ∈ H¹₀(Ω),
  ⟨div u, q⟩_{L₂(Ω)} = 0,                                 q ∈ L_{2,0}(Ω),      (77)
i.e., one seeks a solution U = (u, p) in the energy space H = X × M = H¹₀(Ω) × L_{2,0}(Ω), for which the mapping property (57) can be shown to hold [19, 59].

Stokes System – Fictitious Domain Formulation

An example with m = 3 components is obtained by weakly enforcing inhomogeneous boundary conditions for the Stokes system (see [61]):

  −νΔu + ∇p = f in Ω,    div u = 0 in Ω,    u|_Γ = g,    ∫_Ω p dx = 0,    (∫_Γ g·n ds = 0),

whose weak formulation is

  ν ⟨∇v, ∇u⟩_{L₂(Ω)} + ⟨v, λ⟩_{L₂(Γ)} + ⟨div v, p⟩_{L₂(Ω)} = ⟨f, v⟩      ∀ v ∈ H¹(Ω),
  ⟨u, µ⟩_{L₂(Γ)} = ⟨g, µ⟩                                                  ∀ µ ∈ H^{−1/2}(Γ),
  ⟨div u, q⟩_{L₂(Ω)} = 0                                                   ∀ q ∈ L_{2,0}(Ω).

The unknowns are now velocity, pressure and an additional Lagrange multiplier for the boundary conditions, U = (u, λ, p). An appropriate energy space is H := H¹(Ω) × H^{−1/2}(Γ) × L_{2,0}(Ω). Well-posedness can either be argued by taking b(v, (λ, p)) = ⟨v, λ⟩_{L₂(Γ)} + ⟨div v, p⟩_{L₂(Ω)} and observing that (69) holds, or by taking b(v, λ) = ⟨v, λ⟩_{L₂(Γ)} and using the fact that the Stokes operator is bijective on the kernel of this constraint operator.

First Order Stokes System

The same type of argument can be used to show that the following first order formulation of the Stokes system leads to a well-posed variational problem (with m = 4), [20, 44]. The fluxes θ are now matrix valued, indicated by the underscore:
  θ + ∇u = 0                in Ω,
  −ν (div θ)^T + ∇p = f     in Ω,
  div u = 0                 in Ω,
  u = g                     on Γ.
The unknowns are now U = (θ, u, p, λ) ∈ H, where the energy space H := L₂(Ω) × H¹(Ω) × L_{2,0}(Ω) × H^{−1/2}(Γ) can be shown to make the following weak formulation

  ⟨θ, η⟩ + ⟨η, ∇u⟩ = 0,                               η ∈ L₂(Ω),
  ν ⟨θ, ∇v⟩ − ⟨p, div v⟩ − ⟨λ, v⟩_Γ = ⟨f, v⟩,          v ∈ H¹(Ω),
  ⟨div u, q⟩ = 0,                                      q ∈ L_{2,0}(Ω),
  ⟨µ, u⟩_Γ = ⟨µ, g⟩_Γ,                                 µ ∈ H^{−1/2}(Γ),

satisfy the mapping property (57).

Transmission Problem

The following example is interesting because it involves both local and global operators [31]:
  −∇·(a∇u) = f in Ω₀,    −Δu = 0 in Ω₁,    u|_{Γ₀} = 0,    H := H¹_{0,Γ_D}(Ω₀) × H^{−1/2}(Γ₁).

(Figure: the subdomains Ω₀ and Ω₁, with boundary piece Γ₀ and interface Γ₁.)
Both boundary value problems are coupled by the interface conditions u⁻ = u⁺, (∂_n)u⁻ = (∂_n)u⁺. A well-posed weak formulation of this problem with respect to the above H is

  ⟨a∇u, ∇v⟩_{Ω₀} + ⟨Wu − (½ I − K′)σ, v⟩_{Γ₁} = ⟨f, v⟩_{Ω₀},      v ∈ H¹_{0,Γ_D}(Ω₀),
  ⟨(½ I − K)u, δ⟩_{Γ₁} + ⟨Vσ, δ⟩_{Γ₁} = 0,                         δ ∈ H^{−1/2}(Γ₁),

where W denotes here the hypersingular operator, see [31, 44]. Note that in all these examples, as an additional obstruction, one faces the occurrence and evaluation of difficult norms such as ‖·‖_{H^{1/2}(Γ)}, ‖·‖_{H^{−1/2}(Γ)} and ‖·‖_{H^{−1}(Ω)}.
7 An Equivalent ℓ₂-Problem

We shall now describe how the wavelet concepts from Section 3 apply to the above type of problems. An important distinction from conventional approaches lies in the fact that, for suitable choices of wavelet bases, the original problem (54) can be transformed into an equivalent one defined on ℓ₂. Moreover, it will turn out to be well-posed in the Euclidean metric. Here 'suitable' means that the component spaces of H admit a wavelet characterization in the sense of (16). Specifically, we will assume that for each H_{i,0} one has suitable bases Ψ^i and scaling matrices D_i such that

  c_Ψ ‖v‖_{ℓ₂(J_i)} ≤ ‖v^T D_i^{−1} Ψ^i‖_{H_i} ≤ C_Ψ ‖v‖_{ℓ₂(J_i)},      v ∈ ℓ₂(J_i),   i = 1, …, m.      (78)
It will be convenient to adopt the following notational conventions:

  J := J_1 × ··· × J_m,      D := diag(D_1, …, D_m),      V = (v^1, …, v^m)^T.

Thus, catenating the quantities corresponding to the component spaces, we will employ the compact notation

  V^T D^{−1} Ψ := ((v^1)^T D_1^{−1} Ψ^1, …, (v^m)^T D_m^{−1} Ψ^m)^T.

In these terms the wavelet characterization of the energy space can be expressed as

  c_Ψ ‖V‖_{ℓ₂} ≤ ‖V^T D^{−1} Ψ‖_H ≤ C_Ψ ‖V‖_{ℓ₂}.      (79)

Recalling Section 6.1, the (scaled) wavelet representation of the operators L_{i,l}, defined by (55), is given by

  A_{i,l} := D_i^{−1} a_{i,l}(Ψ^i, Ψ^l) D_l^{−1},      i, l = 1, …, m.

The scaled standard representation of L, defined by (55), and the dual wavelet representation of the right hand side data are given by

  𝐋 := (A_{i,l})_{i,l=1}^m = D^{−1} ⟨Ψ, LΨ⟩ D^{−1},      𝐅 := D^{−1} ⟨Ψ, F⟩.

We shall make essential use of the following fact.

Theorem 5. Suppose that U = 𝐔^T D^{−1} Ψ is the scaled wavelet representation of the solution to (54). Then one has

  LU = F  ⟺  𝐋𝐔 = 𝐅,      (80)

and there exist positive constants c_𝐋, C_𝐋 such that

  c_𝐋 ‖𝐕‖_{ℓ₂} ≤ ‖𝐋𝐕‖_{ℓ₂} ≤ C_𝐋 ‖𝐕‖_{ℓ₂},      𝐕 ∈ ℓ₂.      (81)

In fact, lower respectively upper estimates for these constants are c_𝐋 ≥ c_Ψ² c_L and C_𝐋 ≤ C_Ψ² C_L.
Proof: The proof follows from (79) and (57). In fact, let V = 𝐕^T D^{−1} Ψ. Then

  ‖𝐕‖_{ℓ₂} ≤ c_Ψ^{−1} ‖V‖_H                                (by (79))
           ≤ c_Ψ^{−1} c_L^{−1} ‖LV‖_{H′}                   (by (MP))
           ≤ c_Ψ^{−2} c_L^{−1} ‖D^{−1} ⟨Ψ, LV⟩‖_{ℓ₂}        (by (19))
           = c_Ψ^{−2} c_L^{−1} ‖D^{−1} ⟨Ψ, LΨ⟩ D^{−1} 𝐕‖_{ℓ₂} = c_Ψ^{−2} c_L^{−1} ‖𝐋𝐕‖_{ℓ₂}.

The converse estimate works analogously in reverse order. □
7.1 Connection with Preconditioning

The above result can be related to preconditioning as follows. Denote by Λ some finite subset of J and let Ψ_Λ := {ψ_λ : λ ∈ Λ ⊂ J}, so that S_Λ := span Ψ_Λ is a finite dimensional subspace of H. Let D_Λ denote the finite principal section of D determined by Λ. The stiffness matrix of L with respect to D_Λ^{−1} Ψ_Λ is then given by 𝐋_Λ := D_Λ^{−1} A(Ψ_Λ, Ψ_Λ) D_Λ^{−1}. The following observation is obvious.

Remark 10. If A(·,·) in (54) is H-elliptic, then 𝐋 is symmetric positive definite (s.p.d.), which, in turn, implies that

  cond₂(𝐋_Λ) ≤ C_Ψ² C_L / (c_Ψ² c_L).      (82)

Ellipticity implies stability of Galerkin discretizations, which explains why the infinite dimensional result (81) implies (82). The following observation stresses that it is indeed the stability of the Galerkin discretization that, together with (81), implies (82).

Remark 11. If the Galerkin discretizations with respect to the trial spaces S_Λ are stable, i.e., ‖𝐋_Λ^{−1}‖_{ℓ₂→ℓ₂} = O(1), #Λ → ∞, then one also has

  ‖𝐕_Λ‖₂ ∼ ‖𝐋_Λ 𝐕_Λ‖₂,      𝐕_Λ ∈ ℝ^{#Λ}.      (83)

Hence the mapping property (MP) of L and the norm equivalences (NE) imply uniformly bounded condition numbers,

  cond₂(𝐋_Λ) = O(1),      #Λ → ∞,

whenever the Galerkin discretizations with respect to S_Λ are (uniformly) stable.

Proof: By stability of the Galerkin discretizations, we only have to verify the uniform boundedness of the ‖𝐋_Λ‖. Similarly as before we obtain, with V_Λ = 𝐕_Λ^T D_Λ^{−1} Ψ_Λ,

  ‖𝐋_Λ 𝐕_Λ‖₂ = ‖D_Λ^{−1} A(Ψ_Λ, Ψ_Λ) D_Λ^{−1} 𝐕_Λ‖₂ = ‖D_Λ^{−1} ⟨Ψ_Λ, LV_Λ⟩‖₂
            ≤ ‖D^{−1} ⟨Ψ, LV_Λ⟩‖₂ ≤ c_Ψ^{−1} ‖LV_Λ‖_{H′}      (by (19))
            ≤ c_Ψ^{−1} C_L ‖V_Λ‖_H ≤ c_Ψ^{−1} C_L C_Ψ ‖𝐕_Λ‖₂      (by (79)),

which proves the assertion. □
Recall that Galerkin stability for indefinite problems is in general not guaranteed for any choice of trial spaces.
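As a small numerical illustration of Remarks 10 and 11 (added here, not part of the original text), one can compare the nodal stiffness matrix of a 1D model problem, whose condition number grows like h^{−2}, with its diagonally scaled representation in a multiscale (hierarchical) basis, which stays essentially level independent. The coefficient a(x) and the helper names are chosen for this sketch only:

```python
import numpy as np

def stiffness(J, a=lambda x: 1.0 + 0.5 * np.sin(2 * np.pi * x)):
    # 3-point FE stiffness of -(a u')' on (0,1), zero boundary values
    h = 2.0 ** (-J)
    am = a((np.arange(2 ** J) + 0.5) * h)     # a at subinterval midpoints
    main = (am[:-1] + am[1:]) / h
    off = -am[1:-1] / h
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

def hierarchical_basis(J):
    # columns: nodal values (at interior level-J grid points) of the
    # hierarchical hat functions of width 2^(1-j), j = 1, ..., J
    x = np.arange(1, 2 ** J) / 2.0 ** J
    cols = []
    for j in range(1, J + 1):
        h = 2.0 ** (-j)
        for k in range(2 ** (j - 1)):
            cols.append(np.maximum(0.0, 1.0 - np.abs(x - (2 * k + 1) * h) / h))
    return np.array(cols).T

for J in (4, 6, 8):
    A = stiffness(J)
    T = hierarchical_basis(J)
    A_hb = T.T @ A @ T                          # multiscale stiffness matrix
    Di = np.diag(1.0 / np.sqrt(np.diag(A_hb)))  # diagonal scaling D^{-1}
    print(J, np.linalg.cond(A), np.linalg.cond(Di @ A_hb @ Di))
```

For constant coefficients the multiscale stiffness is in 1D even diagonal; with the variable a(x) above the scaled condition numbers remain small across levels, while cond₂(A) grows by roughly a factor of 4 per level.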
7.2 There is always a Positive Definite Formulation – Least Squares

Once a well-posed problem on ℓ₂ is given, squaring yields a symmetric positive definite formulation.

Theorem 6. Adhering to the previous notation, let

  M := 𝐋^T 𝐋,      G := 𝐋^T 𝐅.

Then, with U = 𝐔^T D^{−1} Ψ, one has

  LU = F  ⟺  M𝐔 = G,

and

  c_𝐋² ‖𝐕‖₂ ≤ ‖M𝐕‖₂ ≤ C_𝐋² ‖𝐕‖₂.

Note that M𝐔 = G if and only if 𝐔 minimizes ‖𝐋𝐕 − 𝐅‖_{ℓ₂} which, by (79) and (57), is up to constants equivalent to minimizing ‖LV − F‖_{H′}. The latter expression corresponds to the natural norm least squares formulation of (54), see [44]. Such least squares formulations have been studied extensively in the finite element context [17, 18]. A major obstruction there stems from the fact that the dual norm ‖·‖_{H′} is often numerically hard to handle; see the examples in Section 6 involving broken trace norms or the H^{−1}-norm. Once suitable wavelet bases are available, these norms become just weighted ℓ₂-norms in the transformed domain. Nevertheless, finite dimensional Galerkin approximations would still require approximating corresponding infinite sums, which again raises the issue of stability. This is analyzed in [44]. However, we will see below that this problem disappears when employing adaptive solution techniques.
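The effect of Theorem 6 can be mimicked in finitely many dimensions by a toy sketch (added for illustration, not from the text): take a well-conditioned but nonsymmetric matrix in the role of 𝐋, square it, and run a damped Richardson iteration on the resulting normal equations:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
# stand-in for the scaled wavelet representation: nonsymmetric,
# with singular values in [1, 3]  (playing the roles of c_L and C_L)
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
L = Q1 @ np.diag(np.linspace(1.0, 3.0, n)) @ Q2
F = rng.standard_normal(n)

M = L.T @ L                 # s.p.d., spectrum contained in [1, 9]
G = L.T @ F
omega = 2.0 / (1.0 + 9.0)   # damping 2 / (c_L^2 + C_L^2)
U = np.zeros(n)
for _ in range(200):        # error contracts by (9-1)/(9+1) = 0.8 per step
    U += omega * (G - M @ U)
```

The price of squaring is visible in the contraction rate, which is governed by C_L²/c_L² rather than C_L/c_L; this is one motivation for the Uzawa alternative discussed for saddle point problems in Section 8.4.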
8 Adaptive Wavelet Schemes

8.1 Introductory Comments

A natural way of applying the tools developed so far is to choose a finite subset Ψ_Λ of wavelets and solve for U_Λ ∈ S_Λ satisfying

  A(V, U_Λ) = ⟨V, F⟩,      V ∈ S_Λ.      (84)

The perhaps best understood case corresponds to choosing Λ = (J), the set of all indices in J up to level J. For instance, when using piecewise polynomial wavelets, Ψ_{(J)} spans a finite dimensional space of piecewise polynomials on a mesh with mesh size h ∼ 2^{−J}. For this type of trial spaces classical error estimates are available. For instance, in the scalar case (m = 1) with H = H^t one has

  ‖u − u_{(J)}‖_{H^t} ≲ 2^{−J(s−t)} ‖u‖_{H^s},      J → ∞,      (85)
provided that the solution u belongs to H^s. One can then choose J so as to meet some target accuracy ε. However, when the highest Sobolev regularity s of u exceeds t only by a little, one has to choose J correspondingly large, which amounts to a possibly very large number of required degrees of freedom N ∼ 2^{dJ}. Of course, it is then very important to have efficient ways of solving the corresponding large systems of equations. This issue has been studied extensively in the literature. Suitable preconditioning techniques, based e.g. on the observations in Section 7.1, combined with nested iteration (i.e., solving on successively finer discretization levels while using the approximation from the current level as initial guess on the next level), allow one to solve the linear systems within discretization error accuracy at a computational expense that stays proportional to the number N of degrees of freedom. Thus, under the above circumstances, the amount of work needed to achieve accuracy ε remains proportional to ε^{−d/(s−t)}, which is the larger the closer s is to t. In other words, relative to the target accuracy, the computational work related to preassigned uniform discretizations is the larger the lower the regularity, when measured in the same metric as the error.

Our point of view here will be different. We wish to be as economical as possible with the number of degrees of freedom needed to achieve some desired target accuracy. Thus, instead of choosing a discretization in an a-priori manner, we wish to adapt the discretization to the particular case at hand. Given any target accuracy ε, the goal would then be to identify a possibly small set Λ(ε) ⊂ J and a vector 𝐔(ε) with support Λ(ε) such that

  ‖U − 𝐔(ε)^T D_{Λ(ε)}^{−1} Ψ_{Λ(ε)}‖_H ≤ ε.      (86)
The following two questions immediately come to mind.

(I) What can be said about the relation between #Λ(ε), as a measure for the minimum computational complexity, and the accuracy ε, when one has complete knowledge about 𝐔? We will refer to this as the optimal work/accuracy balance corresponding to the best N-term approximation

  σ_{N,H}(U) := inf_{#Λ ≤ N, supp 𝐕 = Λ} ‖U − 𝐕^T D_Λ^{−1} Ψ_Λ‖_H,      (87)

which minimizes the error in H for any given number N of degrees of freedom. This is a question arising in nonlinear approximation, see e.g. [51]. Precise answers are known for various spaces H. Roughly speaking, the asymptotic behavior of the error of best N-term approximation in the above sense is governed by certain regularity scales.

(II) Best N-term approximation is in the present context only an ideal benchmark, since 𝐔 is not known. The question then is: can one devise algorithms that are able to track, during the solution process, approximately the significant coefficients of 𝐔, so that the resulting work/accuracy balance is, up to a uniform constant, asymptotically the same as that of best N-term approximation?
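In ℓ₂ coordinates, best N-term approximation is simply "keep the N largest coefficients in modulus". A short added sketch (the coefficient decay used here is made up for illustration):

```python
import numpy as np

def best_n_term_error(v, N):
    # sigma_N = l2-norm of all but the N largest (in modulus) entries
    mags = np.sort(np.abs(v))[::-1]
    return np.sqrt(np.sum(mags[N:] ** 2))

# coefficients decaying like k^(-1) give sigma_N ~ N^(-1/2)
v = (1.0 + np.arange(100000)) ** -1.0
for N in (10, 100, 1000):
    print(N, best_n_term_error(v, N))
```

The printed errors decrease like N^{−1/2}, a simple instance of how the decay of the wavelet coefficients governs the work/accuracy balance.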
A scheme that tracks the significant coefficients of 𝐔 in the above sense, based at every stage on knowledge acquired during prior calculations, is called adaptive.

8.2 Adaptivity from Several Perspectives

Adaptive resolution concepts have been the subject of numerous studies from several different perspectives. In the theory of Information Based Complexity, the performance of adaptive versus nonadaptive strategies has been studied in a fairly general framework [65, 72]. For the computational model assumed in this context the results are not in favor of adaptive techniques. On the other hand, adaptive local mesh refinements, based on a-posteriori error estimators/indicators in the finite element context, indicate a very promising potential of such techniques [56, 4, 5, 15, 57]. However, there seem to be no rigorous work/accuracy balance estimates that support the gained experience. Likewise, there have been numerous investigations of adaptive wavelet schemes that are also not backed by a complexity analysis, see e.g. [2, 12, 14, 30, 35, 37]. More recently, significant progress was made by multiresolution compression techniques for hyperbolic conservation laws, initiated by the work of A. Harten and continued by R. Abgrall, F. Arándiga, A. Cohen, W. Dahmen, R. DeVore, R. Donat, B. Sjögreen, T. Sonar and others. This line was developed into a fully adaptive technique by A. Cohen, S. M. Kaber, S. Müller and M. Postel. The approach is based on a perturbation analysis applied to the compressed array of cell averages. Another direction of recent developments concerns the scope of problems described in Section 6. In this context, first asymptotically optimal work/accuracy balances have been established in [26, 27, 36]; they will be addressed in more detail below.

8.3 The Basic Paradigm

The classical approach to the numerical treatment of operator equations may be summarized as follows:

Starting with a variational formulation, the choice of finite dimensional trial and test spaces determines a discretization of the continuous problem, which eventually leads to a finite dimensional problem. The issue then is to develop efficient solvers for such problems.

As indicated in Section 6, typical obstructions are then the size of the systems, ill-conditioning, as well as compatibility constraints like the LBB condition. In connection with wavelet based schemes a different paradigm suggests itself, reflected by the following steps [27].
(I) One starts again with a variational formulation, but puts first most emphasis on the mapping property (MP) (cf. (57)), as exemplified in Section 6.4.
(II) Instead of turning to a finite dimensional approximation, the continuous problem is transformed into an equivalent ∞-dimensional ℓ₂-problem which is well-conditioned, see Section 7.
(III) One then tries to devise a convergent iteration for the ∞-dimensional ℓ₂-problem.
(IV) This iteration is, of course, only conceptual. Its numerical realization relies on the adaptive application of the operators involved.

In this framework adaptivity enters only through the way infinite dimensional operators are applied within some accuracy tolerance. Moreover, all algorithmic steps take place in ℓ₂. The choice of the wavelet basis fully encodes the original problem, in particular the underlying geometry. Of course, the realization of appropriate bases may by itself be a difficult problem, depending on the case at hand. We will assume throughout the remainder of the paper that bases with the features described in Sections 3 and 7 are available. Thus, steps (I) and (II) above have already been discussed in Sections 6 and 7. It remains to treat (III) and (IV).

8.4 (III) Convergent Iteration for the ∞-dimensional Problem

When (54) involves a symmetric positive definite form, the matrix 𝐋 is symmetric positive definite and, by (81), well-conditioned. Hence simple iterative schemes like gradient or Richardson iterations converge for any initial guess with some fixed reduction rate. However, in general 𝐋 is not definite. The idea is therefore to transform 𝐋𝐔 = 𝐅 into an equivalent system

  𝐋𝐔 = 𝐅  ⟺  Mp = G

with possibly different coordinates p, such that one still has

  c_M ‖q‖_{ℓ₂} ≤ ‖Mq‖_{ℓ₂} ≤ C_M ‖q‖_{ℓ₂},      q ∈ ℓ₂,      (88)

while in addition there exists some relaxation weight ω with

  ‖I − ωM‖_{ℓ₂→ℓ₂} ≤ ρ < 1.      (89)

Thus simple iterations of the form

  p^{n+1} = p^n + ω (G − Mp^n)      (90)

would still converge with a reduction rate ≤ ρ per step. Of course, one could think of more sophisticated iterations with better convergence properties. We will nevertheless confine the discussion essentially to (90), for two reasons. First, it makes things much more transparent. Second, the ideal
iteration (90) will eventually have to be perturbed in practical realizations. There is always a risk of losing superior convergence properties when the applications of the operators involved are performed only approximately. Thus, economizing on the application of operators offers a chance to realize the reduction rate offered by (90) at minimal cost. In fact, it will be shown later that this leads to schemes that achieve the target accuracy at asymptotically minimal cost. In the following we discuss two choices of M for which the above program can be carried through. The first one works for any well-posed problem, the second one offers an alternative for saddle point problems. For further options in the latter case, see [27].

Two choices of M

Following [44, 27], we put

  M := 𝐋^T 𝐋,      G := 𝐋^T 𝐅,      p = 𝐔.      (91)

As pointed out in Theorem 6, one then has

  M𝐔 = G  ⟺  ‖LU − F‖_{H′} = min_{V ∈ H} ‖LV − F‖_{H′},      (92)

so that (89) can be realized whenever (57) and (79) hold. Of course, the condition number still gets squared, so that the quantitative behavior of the iteration (90) may in this case be rather poor. We will therefore discuss an alternative for the class of saddle point problems, see Section 6. In this case it will turn out that one can take M as the Schur complement

  M = BA^{−1}B^T.      (93)

The interpretation of (89) for the least squares formulation is clear, because the application of M means applying successively 𝐋 and then 𝐋^T. The question then remains how to approximate the application of 𝐋 and 𝐋^T, which will be discussed later. When M is the Schur complement, things are less clear even in the idealized situation, because of the occurrence of A^{−1}. We shall therefore point out next how (89) is to be understood even in the idealized infinite dimensional case. The application of the Schur complement will be facilitated with the aid of an Uzawa iteration, see [37]. To explain this, we need some preparations and recall that in the saddle point case we have H = X × M (cf. Section 6.4), and that we need a wavelet basis for each component space:

  X ↔ Ψ_X,      M ↔ Ψ_M.

The norm equivalences (78) then read
  c_X ‖v‖_{ℓ₂(J_X)} ≤ ‖v^T D_X^{−1} Ψ_X‖_X ≤ C_X ‖v‖_{ℓ₂(J_X)}      (94)

and

  c_M ‖q‖_{ℓ₂(J_M)} ≤ ‖q^T D_M^{−1} Ψ_M‖_M ≤ C_M ‖q‖_{ℓ₂(J_M)}.      (95)
Specifically, in the case of the Stokes problem one can take the scaling weights (D_X)_{λ,λ} = 2^{|λ|}, (D_M)_{λ,λ} = 1. Setting

  A := D_X^{−1} a(Ψ_X, Ψ_X) D_X^{−1},      B := D_M^{−1} b(Ψ_M, Ψ_X) D_X^{−1},
  f := D_X^{−1} ⟨Ψ_X, f⟩,      g := D_M^{−1} ⟨Ψ_M, g⟩,

(67) is equivalent to

  𝐋𝐔 = 𝐅  ⟺  [ A  B^T ; B  0 ] [ u ; p ] = [ f ; g ].      (96)

Moreover, under the assumptions (69), (70), together with (94), (95), the mapping property

  c_𝐋 ‖𝐕‖_{ℓ₂} ≤ ‖𝐋𝐕‖_{ℓ₂} ≤ C_𝐋 ‖𝐕‖_{ℓ₂},      𝐕 ∈ ℓ₂,      (97)
holds, see Theorem 5. One actually has to be somewhat careful with (94), which expresses that A is invertible on all of ℓ₂ or, in other words, that A is invertible on all of X. Recall from (69) that this is neither necessary nor generally the case. However, whenever the saddle point problem (67) is well-posed, one can show that for some suitable c > 0 the matrix Â := A + cB^T B is invertible on all of ℓ₂ and satisfies (94). One can then replace (96) by an equivalent system (with adjusted right hand side data), so that (MP) is valid and the Schur complement is well defined. Therefore we will assume in the following that either A is invertible on all of X, or that the above precaution has been used, leading to a matrix Â, henceforth again denoted by A, so that (93) makes sense. Thus we can use block elimination and observe that (96) is equivalent to

  Mp := BA^{−1}B^T p = BA^{−1} f − g =: G,      Au = f − B^T p.

On account of (97), we know that

  M := BA^{−1}B^T : ℓ₂(J_M) → ℓ₂(J_M),      ‖Mq‖_{ℓ₂(J_M)} ∼ ‖q‖_{ℓ₂(J_M)}.      (98)

Therefore there exists a positive relaxation parameter ω such that a fixed point iteration (or a gradient iteration) based on the identity
Wolfgang Dahmen
p = p + ω((B A^{-1} f − g) − M p)
  = p + ω(B A^{-1}(f − B^T p) − g)
  = p + ω(B u − g),   where u := A^{-1}(f − B^T p),

converges with a fixed reduction rate ρ < 1. Thus, replacing for some iterate p^n the expression A^{-1}(f − B^T p^n) in this iteration by the solution u^n of

A u^n = f − B^T p^n,   (99)

the iteration (90) for M given by (93) reduces to the simple update

p^{n+1} = p^n + ω(B u^n − g),   (100)

with u^n from (99). This is the idea of the Uzawa scheme, here formulated for the original infinite dimensional problem in wavelet coordinates. We are now prepared to discuss the last step (IV).

8.5 (IV) Adaptive Application of Operators

For notational simplicity we dispense with distinguishing between the unknowns U and p (see Section 8.4) and continue to use U for the moment to denote the vectors of wavelet coefficients. Our objective is to turn the idealized iteration

M U = G;   U^{n+1} = U^n + ω(G − M U^n),

into a practicable version. Neither can we evaluate the generally infinite array G exactly, nor can we compute M U^n, even when U^n has finite support. Thus we need approximations to these two ingredients, which we will formulate as Basic Routines:

RHS [η, G] → G_η, such that ‖G − G_η‖_{ℓ2} ≤ η;
APPLY [η, M, V] → W_η, such that ‖M V − W_η‖_{ℓ2} ≤ η;
COARSE [η, W] → W̄_η, such that ‖W − W̄_η‖_{ℓ2} ≤ η.

A few comments on these routines are in order. The input to APPLY and COARSE will always be finitely supported. COARSE can then be realized by sorting the coefficients and adding successively the squares of the entries from small to large until the target threshold is reached. For details see [26]. It is pointed out in [7] how to avoid log-terms caused by sorting. The simplest example of RHS is encountered in the scalar elliptic case. Then G = f consists of the dual wavelet coefficients of the right hand side f. One should then think of computing, in a preprocessing step, a highly accurate approximation to f in the dual basis along with the corresponding coefficients. The necessary accuracy can easily be related to the target accuracy of
the adaptive process. Ordering these finitely many coefficients by size, RHS then becomes an application of COARSE to this finitely supported array. In more general cases, e.g. in the least squares formulation, RHS may take different formats, to be discussed for each special case. In general, however, it will always involve a combination of the two last routines APPLY and COARSE. It is less obvious how to go about the routine APPLY, which will be explained later in more detail.

8.6 The Adaptive Algorithm

We will assume for the moment that we have the above routines at hand and wish to determine first for which tolerances η the corresponding perturbation of the ideal scheme (90) converges.

SOLVE [ε, M, G] → Ū(ε)
(i) Set Ū^0 = 0, ε_0 := c_M^{-1} ‖G‖_{ℓ2}, j = 0.
(ii) If ε_j ≤ ε, stop and output Ū(ε) := Ū^j. Else set V^0 := Ū^j.
  (ii.1) For l = 0, ..., K − 1:
    RHS [ρ^l ε_j, G] → G_l;  APPLY [ρ^l ε_j, M, V^l] → W_l;  set V^{l+1} := V^l + α(G_l − W_l).
  (ii.2) COARSE [2ε_j/5, V^K] → Ū^{j+1}; ε_{j+1} := ε_j/2; j + 1 → j; go to (ii).

Here ρ is the reduction rate from (89), and K depends only on the constants in (79), (88) and (57). In fact, it can be shown that, based on these constants and the reduction rate ρ in (89), there exists a uniformly bounded K so that ‖U − V^K‖_{ℓ2} ≤ ε_j/10. The coarsening step (ii.2) then leads to the following estimate.

Proposition 12 The approximations Ū^j satisfy

‖U − Ū^j‖_{ℓ2} ≤ ε_j,   j ∈ ℕ.   (101)
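A finite-dimensional numpy sketch of SOLVE may help fix ideas. It is not the scheme of the text: RHS and APPLY are taken exact here (only COARSE perturbs the iteration), M is a small SPD matrix, and the routine names, the default K and the damping rule are illustrative assumptions.

```python
import numpy as np

def coarse(w, eta):
    """COARSE [eta, w]: sort by modulus and accumulate squares from small to
    large; zero all entries whose accumulated l2-mass stays below eta."""
    idx = np.argsort(np.abs(w))                 # ascending by modulus
    droppable = np.cumsum(w[idx] ** 2) <= eta ** 2
    out = w.copy()
    out[idx[droppable]] = 0.0
    return out

def solve(M, G, eps, K=20):
    """Sketch of SOLVE for SPD M: K damped Richardson steps per outer stage,
    followed by a COARSE step with tolerance 2*eps_j/5."""
    lam = np.linalg.eigvalsh(M)
    alpha = 2.0 / (lam.min() + lam.max())       # optimal damping for SPD M
    U = np.zeros_like(G)
    eps_j = np.linalg.norm(G) / lam.min()       # eps_0 = c_M^{-1} ||G||
    while eps_j > eps:
        V = U.copy()
        for _ in range(K):                      # step (ii.1), exact apply
            V = V + alpha * (G - M @ V)
        U = coarse(V, 2 * eps_j / 5)            # step (ii.2)
        eps_j /= 2
    return U
```

The invariant of Proposition 12, ‖U_exact − Ū^j‖ ≤ ε_j, holds as long as K is large enough that the K-step Richardson contraction plus the coarsening tolerance 2ε_j/5 stay below ε_j/2.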
Thus any target accuracy is met after finitely many steps. Note that the accuracy tolerances are at each stage comparable to the current accuracy, which will be important for later complexity estimates. The above scheme should be viewed as the simplest example of a perturbed representation of an iteration for the infinite dimensional problem. Several alternatives come to mind. Instead of always applying K steps in (ii.1), one can monitor the approximate residual for possible earlier termination. Furthermore, the fixed relaxation parameter α can be replaced by a stage dependent parameter α_j resulting from a line search in a gradient iteration. Finally, one
could resort to (approximate) conjugate gradient iterations. To minimize technicalities we will stick, however, with the above simpler Richardson scheme. The central question now is how to realize the basic routines in practice and what their complexity is.

8.7 Ideal Benchmark – Best N-Term Approximation

We wish to compare the performance of the above algorithm with what could be achieved ideally, namely with the work/accuracy balance of the best N-term approximation, recall (86). Since the relevant domain is just ℓ2, the following version matters:

σ_{N,ℓ2}(V) := ‖V − V_N‖_{ℓ2} = min_{#supp W ≤ N} ‖V − W‖_{ℓ2}.   (102)

Due to the norm equivalences (79), one has

σ_{N,ℓ2}(V) ∼ inf_{W, #supp W ≤ N} ‖V − W^T D^{-1} Ψ‖_H = σ_{N,H}(V),   (103)

i.e., the best N-term approximation in the computational domain ℓ2 corresponds directly to the best N-term approximation in the energy norm. The following interrelation between best N-term approximation and coarsening or thresholding sheds some light on the role of COARSE in step (ii.2) of algorithm SOLVE, see [26].

Remark 13 Suppose that the finitely supported vector w satisfies ‖v − w‖_{ℓ2} ≤ η/5. Clearly w̄_η := COARSE [4η/5, w] still satisfies ‖v − w̄_η‖_{ℓ2} ≤ η. Moreover, whenever ‖v − v_N‖_{ℓ2} ≲ N^{-s}, one has

#supp w̄_η ≲ η^{-1/s},   ‖v − w̄_η‖_{ℓ2} ≤ η.   (104)

Thus the application of COARSE pulls a current approximation towards the best N-term approximation.
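Remark 13 is easy to check numerically. The sketch below (the test vector, perturbation and parameter choices are illustrative assumptions, not data from the text) thresholds a perturbed version of a vector with σ_{N,ℓ2}(v) ∼ N^{-s} and verifies that accuracy η is retained with only few surviving terms.

```python
import numpy as np

def coarse(w, eta):
    """COARSE [eta, w]: drop the smallest entries of w while their accumulated
    l2-contribution stays below eta (cf. step (ii.2) of SOLVE)."""
    idx = np.argsort(np.abs(w))
    drop = np.cumsum(w[idx] ** 2) <= eta ** 2
    out = w.copy()
    out[idx[drop]] = 0.0
    return out

s = 1.0
n = 2000
v = np.arange(1, n + 1, dtype=float) ** (-(s + 0.5))   # sigma_N(v) ~ N^{-s}
w = v + 1e-4 * np.sin(np.arange(n))                    # perturbed approximation
eta = 5 * np.linalg.norm(v - w)                        # so that ||v - w|| <= eta/5
w_eta = coarse(w, 4 * eta / 5)                         # Remark 13: COARSE [4 eta/5, w]
err = np.linalg.norm(v - w_eta)                        # <= eta (triangle inequality)
nnz = np.count_nonzero(w_eta)                          # few terms, ~ eta^{-1/s}
```

The accuracy bound is guaranteed by the triangle inequality, while the support bound reflects (104).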
8.8 Compressible Matrices

We turn now to the routine APPLY. At this point the cancellation properties (12) come into play. As indicated in Section 2.2, wavelet representations of many operators are quasi-sparse. In the present context the following quantification of sparsity or compressibility is appropriate [26]. A matrix C is said to be s*-compressible – C ∈ C_{s*} – if for any 0 < s < s* and every j ∈ ℕ there exists a matrix C_j, obtained by replacing all but on the order of α_j 2^j entries per row and column in C by zero, such that

‖C − C_j‖ ≤ α_j 2^{-js},   j ∈ ℕ,   where Σ_j α_j < ∞.   (105)
As mentioned above, one can use the cancellation properties (CP) to confirm the following claim.
Remark 14 The scaled wavelet representations L_{i,l} in the above examples belong to C_{s*} for some s* = s*(L, Ψ) > 0.

8.9 Fast Approximate Matrix/Vector Multiplication

The key is that for compressible matrices one can devise approximate application schemes that exhibit, in a certain range, an asymptotically optimal work/accuracy balance. To this end, consider for simplicity the scalar case, abbreviate for any finitely supported v the best 2^j-term approximations as v_[j] := v_{2^j}, and define

w_j := A_j v_[0] + A_{j-1}(v_[1] − v_[0]) + ... + A_0(v_[j] − v_[j-1])   (106)

as an approximation to A v. In fact, the triangle inequality together with the above compression estimates yields

‖A v − w_j‖_{ℓ2} ≤ c ( ‖v − v_[j]‖_{ℓ2} + Σ_{l=0}^{j} α_l 2^{-ls} ‖v_[j-l] − v_[j-l-1]‖_{ℓ2} ),   (107)

where ‖v − v_[j]‖_{ℓ2} ≲ σ_{2^j,ℓ2}(v) and ‖v_[j-l] − v_[j-l-1]‖_{ℓ2} ≲ σ_{2^{j-l-1},ℓ2}(v). One can now exploit the a-posteriori information offered by the quantities σ_{2^{j-l-1},ℓ2}(v) to choose the smallest j for which the right hand side of (107) is smaller than a given target accuracy η. Since the sum is finite for each finitely supported input v, such a j does indeed exist. This leads to a concrete multiplication scheme

MULT [η, A, v] → w_η   such that   ‖A v − w_η‖_{ℓ2} ≤ η,

which is analyzed in [26] and implemented in [8]. The main result can be formulated as follows [26].

Theorem 7. If A ∈ C_{s*} and ‖v − v_N‖_{ℓ2} ≲ N^{-s} for s < s*, then

#supp w_η ≲ η^{-1/s},   #flops ≲ #supp v + η^{-1/s}.

Thus MULT has, in some range, an asymptotically optimal work/accuracy balance. In fact, it is pointed out in [7] that an original logarithmic factor, due to sorting operations, can be avoided.
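The structure of (106) can be sketched in numpy under simplifying assumptions: the compressed matrices A_m here simply keep the 2^m largest entries per column (a crude stand-in for the compression rule behind (105)), and the stopping test uses the exact residual instead of the a-posteriori bound (107); all names are illustrative.

```python
import numpy as np

def mult(A, v, eta):
    """Approximate A @ v via w_j = A_j v_[0] + A_{j-1}(v_[1]-v_[0]) + ...,
    cf. (106); v_[l] is the best 2^l-term approximation of v."""
    n = len(v)
    order = np.argsort(np.abs(v))[::-1]

    def v_block(l):                               # best 2^l-term approximation
        out = np.zeros_like(v)
        keep = order[:min(2 ** l, n)]
        out[keep] = v[keep]
        return out

    def A_trunc(m):                               # keep 2^m largest entries per column
        k = min(2 ** m, n)
        B = np.zeros_like(A)
        for c in range(n):
            rows = np.argsort(np.abs(A[:, c]))[::-1][:k]
            B[rows, c] = A[rows, c]
        return B

    def w_of(j):                                  # assemble w_j as in (106)
        w = A_trunc(j) @ v_block(0)
        for l in range(1, j + 1):
            w = w + A_trunc(j - l) @ (v_block(l) - v_block(l - 1))
        return w

    L = int(np.ceil(np.log2(n)))
    j, w = 0, w_of(0)
    while np.linalg.norm(A @ v - w) > eta and j < 2 * L:
        j += 1                                    # refine until tolerance met
        w = w_of(j)
    return w
```

At j = 2L all blocks used with nonzero increments are untruncated, so w coincides with A @ v and the loop always terminates within tolerance.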
Fig. 6. Adaptive solution of the Poisson problem on an L-shaped domain
Note that whenever the bilinear form in (54) is coercive, one can take M = L to be just the wavelet representation of L, which is typically s*-compressible for some positive s*. In this case one can take APPLY = MULT. Again Poisson's equation, i.e., (58) with a(x) the identity matrix and k(x) = 0, may serve as an example. Figure 6 illustrates the progressive formation of the solution when Ω ⊂ ℝ² is an L-shaped domain. The lower right part displays the position of the significant coefficients of the approximate solution at the final stage. The reentrant corner causes a singularity in spite of a smooth right hand side. In the example shown in Figure 6 the right hand side is chosen so as to create the strongest possible singularity for this type of problem, see [8] for more details. The role of the regularity of the solution with regard to the computational complexity will be addressed later. In general, M will differ from L, and we will indicate briefly how to choose the scheme APPLY for the two choices of M mentioned above. The simplest case is the least squares formulation M = L^T L, G = L^T F, p = U:

RHS_ls [η, G] := MULT [η/2, L^T, COARSE [η/(2C_L), F]],
APPLY_ls [η, M, V] := MULT [η/2, L^T, MULT [η/(2C_L), L, V]].
In this case the routines RHS and APPLY are compositions of COARSE and MULT. The situation is more involved when employing Uzawa iterations for saddle point problems, as explained next. The reader is referred to [36] for details.

8.10 Application Through Uzawa Iteration

As indicated before, the application of the Schur complement M (98) can be based on Uzawa iterations. To explain this, recall first the

IDEAL UZAWA: Given any p^0 ∈ ℓ2(J_M), compute for i = 1, 2, ...

A u^i = f − B^T p^{i-1},
p^i = p^{i-1} + ω R (B u^i − g),   R := ⟨Ψ̃_M, Ψ̃_M⟩,   (108)

where ω is chosen such that ‖I − ω R S‖_{ℓ2 → ℓ2} ≤ ρ < 1, S denoting the Schur complement from (98).
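A finite-dimensional sketch of the ideal Uzawa iteration (108) may be useful; here R = I, the inner elliptic problem is solved exactly rather than by ELLSOLVE, and the toy data in the usage note are illustrative assumptions.

```python
import numpy as np

def uzawa(A, B, f, g, omega, tol=1e-12, maxit=1000):
    """Uzawa iteration: solve A u = f - B^T p, then p <- p + omega (B u - g).
    Converges when ||I - omega S|| < 1 for the Schur complement S = B A^{-1} B^T."""
    p = np.zeros(B.shape[0])
    u = np.zeros(B.shape[1])
    for _ in range(maxit):
        u = np.linalg.solve(A, f - B.T @ p)   # inner (elliptic) solve
        r = B @ u - g                          # residual of the constraint equation
        if np.linalg.norm(r) < tol:
            break
        p = p + omega * r
    return u, p
```

For instance, A = diag(1, ..., 6) and B the first two rows of the 6x6 identity give S = diag(1, 1/2), so ω = 1 yields a contraction.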
A few comments on the role of R are in order. The operator B in (68) maps X to M′, while p belongs to M. In principle this does not matter, since the wavelet transform maps all spaces to ℓ2. However, as in the case of the Stokes problem, M may be a closed subspace (with finite codimension) of some larger Hilbert space that permits a wavelet characterization. In the Stokes problem, L_{2,0}(Ω) is obtained by factoring out constants. Due to the vanishing moments of wavelets, these constants have to be removed only from the finite dimensional coarse scale space. But the corresponding linear relations on the coefficients are in general different from their counterparts for the dual basis Ψ̃_M. Since the operators are not applied exactly, a correction through the application of R, which maps the vectors into the "right domain", turns out to be necessary, see [36]. The application of M then consists of solving the elliptic problem in (108) approximately by invoking an elliptic version of SOLVE (with APPLY = MULT) called ELLSOLVE. The application of B, B^T and R in turn is facilitated by MULT. See [36] for details and the corresponding precise tolerances.

8.11 Main Result – Convergence/Complexity

We return now to the general problem (54). In the following it is understood that the scheme APPLY is based on either one of the above versions. Moreover, full accessibility of the right hand side data is assumed. Finally, we will assume that the entries of L can be computed on average at unit cost. This is justified, for instance, when dealing with constant coefficient differential operators. In general this is a delicate task that often motivates resorting to the so-called nonstandard form. However, it can be shown that this assumption can be realized also in more realistic situations, see also [7, 9]. The main result can then be formulated as follows.
Theorem 8. Assume that L ∈ C_{s*} for some s* = s*(L, Ψ) > 0. If the exact solution U = U^T D^{-1} Ψ of (54) satisfies, for some s < s*,

inf_{#supp V ≤ N} ‖U − V^T D^{-1} Ψ‖_H ≲ N^{-s},

then, for any ε > 0, the approximations Ū(ε) produced by SOLVE satisfy

‖U − Ū(ε)^T D^{-1} Ψ‖_H ≲ ε   and   #supp Ū(ε), computational work ≲ ε^{-1/s}.

It should be noted that, aside from the complexity aspect, the adaptive scheme has the interesting effect that compatibility conditions like the LBB condition become void. Roughly speaking, the adaptive application of the (full infinite dimensional) operators within the right accuracy tolerances inherits at each stage enough of the stability of the infinite dimensional problem. At no stage is a fixed trial space chosen that would require taking care of such constraints.
8.12 Some Ingredients of the Proof of Theorem 8

We shall discuss next some typical ingredients from nonlinear approximation entering the proof of Theorem 8, see [24, 51, 54] for more details. Recall that typical approximation errors for spaces arising from spatial refinements decay like powers of the mesh size, and thus like negative powers of the number of degrees of freedom. In view of (103), it is therefore of interest to understand the structure of those sequences v ∈ ℓ2 whose best N-term approximation decays like N^{-s} for some s > 0. Since best N-term approximation in ℓ2 is realized by simply retaining the largest N coefficients, it is directly related to thresholding.

Thresholding

One way to describe sequences that are sparse, in the sense that σ_{N,ℓ2}(v) decays fast, is to control the number of terms exceeding any given threshold. To this end, define the thresholding operator

(T_η v)_λ := v_λ if |v_λ| ≥ η,   (T_η v)_λ := 0 if |v_λ| < η,

and set for some τ < 2

ℓ^w_τ := { v ∈ ℓ2 : N(v, η) := #supp T_η v ≤ C_v η^{-τ} }.   (109)

In fact, for τ ≥ 2 the condition #supp T_η v ≤ C_v η^{-τ} is satisfied for all v ∈ ℓ2. For sequences v in ℓ^w_τ the error produced by thresholding is easily determined, noting that for v ∈ ℓ^w_τ
‖v − T_η v‖²_{ℓ2} = Σ_{l=0}^{∞} Σ_{2^{-l-1}η ≤ |v_λ| < 2^{-l}η} |v_λ|² ≤ C_v Σ_{l=0}^{∞} (2^{-l}η)² (2^{-l-1}η)^{-τ} = (4 C_v / (2^{2-τ} − 1)) η^{2-τ}.   (110)
To understand the nature of the constant C_v appearing in the definition of ℓ^w_τ, consider the decreasing rearrangement (v*_n)_{n ∈ ℕ} of v, defined by v*_{n+1} ≤ v*_n, v*_n = |v_{λ_n}|. By definition one has

v*_{N(v,η)+1} N(v,η)^{1/τ} ≤ η N(v,η)^{1/τ} ≤ C_v^{1/τ}.

Thus, defining

C_v^{1/τ} = sup_{n ∈ ℕ} n^{1/τ} v*_{1+n} =: |v|_{ℓ^w_τ},   (111)

we see that

‖v‖_{ℓ^w_τ} := ‖v‖_{ℓ2} + |v|_{ℓ^w_τ}   (112)

is a (quasi-) norm for ℓ^w_τ. The next observation is that ℓ^w_τ is very close to ℓ_τ. In fact,

n^{1/τ} v*_{n+1} ≤ ( n (v*_n)^τ )^{1/τ} ≤ ( Σ_{j ≤ n} (v*_j)^τ )^{1/τ} ≤ ‖v‖_{ℓ_τ},

so that

ℓ_τ ⊂ ℓ^w_τ ⊂ ℓ_{τ+ε} ⊂ ℓ2,   τ < τ + ε < 2.   (113)

The estimate (110) can be used to establish the following facts, which will serve as prerequisites for Remark 13, see [26].

Lemma 15. Suppose that v ∈ ℓ^w_τ for some 0 < τ < 2 and that w ∈ ℓ2 satisfies

‖v − w‖_{ℓ2} ≤ ε   for some ε > 0.

Then one has for any η > 0

‖v − T_η w‖_{ℓ2} ≤ 2ε + C̄ ‖v‖^{τ/2}_{ℓ^w_τ} η^{1-τ/2},   #supp T_η w ≤ 4ε²/η² + 4 C̄ ‖v‖^τ_{ℓ^w_τ} η^{-τ},

where C̄ depends only on τ as τ → 0. This can be used to prove Remark 13, see [26].

Characterization of the Rates N^{-s}

We now turn to the characterization of those sequences whose best N-term approximation rates decay like N^{-s}, and recall that

σ_{N,ℓ2}(v) := ‖v − v_N‖_{ℓ2} = min_{#supp w ≤ N} ‖v − w‖_{ℓ2}.
Proposition 16 Let 1/τ = s + 1/2. Then v ∈ ℓ^w_τ if and only if σ_{N,ℓ2}(v) ≲ N^{-s}, and

‖v − v_N‖_{ℓ2} ≲ N^{-s} ‖v‖_{ℓ^w_τ}.

Proof: On the one hand,

σ_{N,ℓ2}(v)² = Σ_{n>N} (v*_n)² ≤ |v|²_{ℓ^w_τ} Σ_{n≥N} n^{-2/τ} ≤ C N^{1-2/τ} |v|²_{ℓ^w_τ} = C N^{-2s} |v|²_{ℓ^w_τ}.

Conversely,

N (v*_{2N})² ≤ Σ_{N<n≤2N} (v*_n)² ≤ σ_{N,ℓ2}(v)² ≤ C N^{-2s},

which means that

v*_{2N} ≤ C N^{-(s+1/2)} = C N^{-1/τ},

and hence v ∈ ℓ^w_τ, which completes the proof. □
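Proposition 16 can be checked numerically. The sketch below (the model sequence and the sample values of N are illustrative assumptions) evaluates σ_{N,ℓ2}(v) for v_n = n^{-1/τ}, which has |v|_{ℓ^w_τ} = 1, and confirms that the ratio σ_{N,ℓ2}(v)/N^{-s} stays bounded.

```python
import numpy as np

s = 0.75
tau = 1.0 / (s + 0.5)
v = np.arange(1, 100001, dtype=float) ** (-1.0 / tau)  # already decreasing
# tail[N] = sigma_{N,l2}(v): l2-norm of the entries beyond the N largest
tail = np.sqrt(np.cumsum(v[::-1] ** 2))[::-1]
ratios = [tail[N] / N ** (-s) for N in (10, 100, 1000)]
```

Comparing with the integral Σ_{n>N} n^{-2/τ} ≈ N^{-2s}/(2s), the ratios should approach 1/√(2s).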
Key Complexity Bounds

The above preliminaries can be used to show that the basic routines in SOLVE exhibit, under certain circumstances, optimal work/accuracy balances. In fact, counting the terms in (106) and taking the characterization of σ_{N,ℓ2}(v) = O(N^{-s}) given in Proposition 16 into account leads to the following fact, which implies Theorem 7 stated above.

Proposition 17 Suppose that C ∈ C_{s*} and let 1/τ = s + 1/2, s < s*. Then w_η = MULT [η, C, v] satisfies for any finitely supported input v:

• ‖w_η‖_{ℓ^w_τ} ≲ ‖v‖_{ℓ^w_τ};
• #supp w_η ≲ ‖v‖^{1/s}_{ℓ^w_τ} η^{-1/s},   #flops ≲ #supp v + ‖v‖^{1/s}_{ℓ^w_τ} η^{-1/s},

which indeed confirms the optimal balance accuracy η ↔ cost η^{-1/s} for s < s*. This, in turn, is one of the main ingredients of a proof of the following result [27].

Theorem 9. Let L ∈ C_{s*} and suppose that U ∈ ℓ^w_τ for 1/τ = s + 1/2, s < s*. Then in all the above cases the output G_η of the right hand side scheme RHS [η, G] satisfies

(1) ‖G_η‖_{ℓ^w_τ} ≲ ‖G‖_{ℓ^w_τ} ≲ ‖U‖_{ℓ^w_τ};
(2) #flops ∼ #supp G_η ≲ ‖G‖^{1/s}_{ℓ^w_τ} η^{-1/s}.
Moreover, for APPLY ∈ {MULT, APPLY_ls, APPLY_Uz} the output W_η of APPLY [η, M, V] satisfies, for s < s* and any finitely supported input V:

(3) ‖W_η‖_{ℓ^w_τ} ≲ ‖V‖_{ℓ^w_τ};
(4) #flops ≲ #supp V + ‖V‖^{1/s}_{ℓ^w_τ} η^{-1/s},   #supp W_η ≲ ‖V‖^{1/s}_{ℓ^w_τ} η^{-1/s}.

Thus, according to Theorem 8, APPLY [ε, M^{-1}, G] := SOLVE [ε, M, G] exhibits the same work/accuracy balance as its ingredients. In fact, the proof of Theorem 8 is based on Theorem 9 above.

Compressibility Criteria

We conclude this section with some sufficient conditions for a wavelet representation L to be compressible. To this end, recall that in all the examples considered before, the operator L is either local or of the form

(L u)(x) = ∫_Γ K(x, y) u(y) dΓ_y,

where

|∂_x^α ∂_y^β K(x, y)| ≲ dist(x, y)^{-(d + 2t + |α| + |β|)}.   (114)

Results of the following type can be found e.g. in [45, 66].

Theorem 10. Suppose that L has order 2t and satisfies for some r > 0

‖L v‖_{H^{-t+a}} ≲ ‖v‖_{H^{t+a}},   v ∈ H^{t+a},   0 ≤ |a| ≤ r.

Assume that D^{-s} Ψ is a Riesz basis for H^s for −γ̃ < s < γ (16) and has cancellation properties (CP) (12) of order m̃. Then for any 0 < σ ≤ min{r, d/2 + m̃ + t}, t + σ < γ, t − σ > −γ̃, one has

|⟨ψ_λ, L ψ_{λ'}⟩| ≲ 2^{-(|λ|+|λ'|)t} 2^{-||λ|-|λ'||σ} (1 + 2^{min(|λ|,|λ'|)} dist(Ω_λ, Ω_{λ'}))^{-(d + 2m̃ + 2t)}.   (115)
Thus the entries of the wavelet representation of operators of the above type exhibit a polynomial spatial decay, depending on the order of the cancellation properties, and an exponential scalewise decay, depending on the regularity of the wavelets. Since estimates of this type are also basic for matrix compression techniques in connection with boundary integral equations, we give a sketch of the argument (see [45, 66, 67] for more details). Let again Ω_λ := supp ψ_λ. One distinguishes two cases. Suppose first that dist(Ω_λ, Ω_{λ'}) ≳ 2^{-min(|λ|,|λ'|)}. Since x then stays away from y, the kernel is smooth. In contrast to the nonstandard form, one can then apply (CP) (see (12)) of order m̃ to the kernel
∫_Γ ∫_Γ K(x, y) ψ_λ(x) ψ_{λ'}(y) dx dy = ⟨K, ψ_λ ⊗ ψ_{λ'}⟩

successively with respect to both wavelets. The argument is similar to that in Section 2.2 and one eventually obtains

|⟨L ψ_λ, ψ_{λ'}⟩| ≲ 2^{-(|λ|+|λ'|)(d/2+m̃)} dist(Ω_λ, Ω_{λ'})^{-(d + 2m̃ + 2t)}.   (116)

If, on the other hand, dist(Ω_λ, Ω_{λ'}) ≲ 2^{-min(|λ|,|λ'|)}, we follow [35] and use the continuity of L together with (NE) (see (16)) to conclude that

‖L v‖_{H^{-t+s}} ≲ ‖v‖_{H^{t+s}},   v ∈ H^{t+s},   0 ≤ |s| ≤ r.   (117)

Assuming without loss of generality that |λ| > |λ'| and employing the Schwarz inequality yields

|⟨L ψ_λ, ψ_{λ'}⟩| ≤ ‖L ψ_λ‖_{H^{-t+σ}} ‖ψ_{λ'}‖_{H^{t-σ}} ≲ ‖ψ_λ‖_{H^{t+σ}} ‖ψ_{λ'}‖_{H^{t-σ}}.

Under the above assumptions on σ one obtains

|⟨L ψ_λ, ψ_{λ'}⟩| ≲ 2^{t(|λ|+|λ'|)} 2^{σ(|λ'|-|λ|)}.

This completes the proof. □
The estimates for wavelets with overlapping supports can actually be refined using the so-called second compression. This allows one to remove log-factors in matrix compression estimates arising in the treatment of boundary integral operators, see [62, 68]. Estimates of the type (115) combined with the Schur Lemma lead to the following result [26].

Proposition 18 Suppose that

|C_{λ,ν}| ≲ 2^{-σ||λ|-|ν||} (1 + d(λ, ν))^{-β},   d(λ, ν) := 2^{min{|λ|,|ν|}} dist(Ω_λ, Ω_ν),

and define

s* := min { σ/d − 1/2, β/d − 1 }.

Then C ∈ C_{s*}. As mentioned above, the proof is based on the Schur Lemma: if for some C < ∞ and any positive sequence {ω_λ}_{λ ∈ J}

Σ_{ν ∈ J} |C_{λ,ν}| ω_ν ≤ C ω_λ,   λ ∈ J,   and   Σ_{λ ∈ J} |C_{λ,ν}| ω_λ ≤ C ω_ν,   ν ∈ J,
then ‖C‖_{ℓ2 → ℓ2} ≤ C, see e.g. [71]. This can be proved by establishing ℓ∞-estimates for W^{-1} C W and W^{-1} C^T W, where W := diag(ω_λ : λ ∈ J), and then using that ℓ2 is obtained by interpolation between ℓ1 and ℓ∞. In the proof of Proposition 18 this is applied to ‖C − C_j‖_{ℓ2} with the weights ω_λ = 2^{-d|λ|/2} [26].
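The Schur Lemma is easy to verify numerically; in the sketch below (constant weights and an artificial decay matrix, both illustrative assumptions) the spectral norm indeed stays below the common bound on the weighted row and column sums.

```python
import numpy as np

n = 128
i = np.arange(n)
C = 1.0 / (1.0 + np.abs(i[:, None] - i[None, :])) ** 2   # polynomial decay
w = np.ones(n)                                           # weights omega_lambda
row_bound = np.max((np.abs(C) @ w) / w)                  # sum_nu |C_{lam,nu}| w_nu <= Cbar w_lam
col_bound = np.max((np.abs(C).T @ w) / w)
Cbar = max(row_bound, col_bound)
spec = np.linalg.norm(C, 2)                              # actual l2 -> l2 norm
```

For matrices with stronger off-diagonal decay the gap between the two quantities shrinks; the Schur bound is sharp for nonnegative matrices with a Perron-type weight.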
8.13 Approximation Properties and Regularity

A fundamental statement in approximation theory is that approximation properties can be expressed in terms of the regularity of the approximated function. Judging the performance of an adaptive scheme versus a much simpler scheme based on uniform refinements, say, requires, in view of Theorem 8, comparing the approximation power of best N-term approximation with that of approximations based on uniform refinements. The latter are governed essentially by regularity in L2. Convergence rates N^{-s} on uniform grids with respect to some energy norm ‖·‖_{H^t} are obtained essentially if and only if the approximated function belongs to H^{t+ds}. On the other hand, the same rate can still be achieved by best N-term approximation as long as the approximated functions belong to the (much larger) space B^{t+sd}_τ(L_τ) where τ^{-1} = s + 1/2. The connection of these facts with the present adaptive schemes is clear from the remarks in Section 8.12. In fact, (113) says that those sequences for which σ_{N,ℓ2}(v) decays like N^{-s} are (almost) those in ℓ_τ. On the other hand, ℓ_τ is directly related to regularity through relations of the type (27). Here the following version is relevant, which refers to measuring the error in H = H^t, say, see [34]. In fact, when H = H^t, D = D^t and D^{-t} Ψ is a Riesz basis for H^t, one has

u ∈ ℓ_τ  ⇐⇒  u = Σ_λ u_λ 2^{-t|λ|} ψ_λ ∈ B^{t+sd}_τ(L_τ(Ω)),
where again 1/τ = s + 1/2. Thus, functions in the latter space that do not belong to H^{t+ds} can be recovered by the adaptive scheme at an asymptotically better rate than with uniform refinements. The situation is illustrated again by Figure 7, which indicates the topography of function spaces. While in Figure 4 embedding in L2 mattered, Figure 7 shows embedding in H^t. The larger r = t + sd, the bigger the gap between H^r and B^r_τ(L_τ). The loss of regularity when moving to the right from H^r at height r is compensated by judiciously placing the degrees of freedom through nonlinear approximation. Moreover, Theorem 8 says that this is preserved by the adaptive scheme. Now the question is whether, and under which circumstances, the solutions to (54) have higher Besov regularity than Sobolev regularity, so as to take full advantage of adaptive solvers. For scalar elliptic problems this problem has been
treated e.g. in [32, 33]. The result essentially says that for rough boundaries, such as Lipschitz or even polygonal boundaries, the Besov regularity of solutions is indeed higher than the relevant Sobolev regularity, which indicates the effective use of adaptive techniques. The analysis also shows that the quantitative performance of the adaptive scheme, compared with one based on uniform refinements, is the better the larger the H^r-norm of the solution is compared with its B^r_τ(L_τ)-norm.

Fig. 7. Embedding in H^t (regularity index r over integrability index 1/p; the Besov scale B^r_τ(L_τ) lies on the embedding line 1/τ = (r − t)/d + 1/2)
Similar results can be found for the Stokes system [32, 36, 33]. We know that, if the solution U = (u, p) of (75) satisfies

u ∈ B^{1+sd}_τ(L_τ(Ω)),   p ∈ B^{sd}_τ(L_τ(Ω)),   1/τ = s + 1/2,   (118)

the solution components satisfy

σ_{N,H^1_0(Ω)}(u) ≲ N^{-s},   σ_{N,L2(Ω)}(p) ≲ N^{-s}.

Thus, again the question arises under which circumstances the solution to the Stokes problem has high Besov regularity, which, according to Theorem 8 and (118), would result in correspondingly high convergence rates provided by the adaptive scheme. The following result can be found in [36].

Theorem 11. For d = 2 the strongest singularity solutions (u_S, p_S) of the Stokes problem on an L-shaped domain in ℝ² belong to the above scale of Besov spaces for any s > 0. The Sobolev regularity is limited by 1.544483..., resp. 0.544483....

Thus arbitrarily high asymptotic rates can be obtained by adaptive schemes of correspondingly high order. Numerical experiments with the adaptive Uzawa scheme from Section 8.4 (see (100)) for the situation described in Theorem 11 are reported in [36]. Therefore we give here only some brief excerpts. The first example concerns a strong singularity of the pressure component. Graphs of the solution components are depicted in Figure 8.
Fig. 8. Exact solution for the first example. Velocity components (left and middle) and pressure (right). The pressure function exhibits a strong singularity
The second example involves a pressure which is localized around the reentrant corner, has strong gradients but is smooth, see Figure 9. To assess the
Fig. 9. Exact solution for the second example. Velocity components (left and middle) and pressure (right).
quantitative performance of the adaptive schemes, we relate the errors of the approximations produced by the scheme to the corresponding best N-term approximation error and define

ρ_x := ‖x − x_Λ‖_{ℓ2} / ‖x − x_{#Λ}‖_{ℓ2},   r_x := ‖x − x_Λ‖_{ℓ2} / ‖x‖_{ℓ2}.
A detailed discussion of the results for the first example is given in [36]. The factors ρ_x for velocity and pressure stay clearly below 1.5, except on the first three refinement levels. This is somewhat due to the provisional way of factoring out the constants in the pressure component, which pulls in the full level of scaling functions. Therefore we concentrate here only on an experiment for the second example, where the wavelet bases Ψ_X and Ψ_M are chosen in such a way that for fixed trial spaces the LBB condition would be violated. The results are shown in Table 1, where Λ_x refers to the support of the component x.

It | #Λ_u | ρ_u  | r_u    | #Λ_v | ρ_v  | r_v    | #Λ_p | ρ_p     | r_p
 3 |    5 | 1.00 | 0.7586 |    5 | 1.00 | 0.7588 |  243 | 2.23810 | 0.1196
 4 |   20 | 1.13 | 0.4064 |   24 | 1.45 | 0.3979 |  262 | 2.08107 | 0.0612
 5 |   61 | 1.47 | 0.2107 |   77 | 1.79 | 0.2107 |  324 | 2.72102 | 0.0339
 6 |  178 | 1.33 | 0.1060 |  198 | 1.52 | 0.1306 |  396 | 2.81079 | 0.0209
 7 |  294 | 1.19 | 0.0533 |  286 | 1.46 | 0.0744 |  674 | 2.21371 | 0.0108
 8 |  478 | 1.25 | 0.0271 |  531 | 1.46 | 0.0362 |  899 | 1.83271 | 0.0071

Table 1. Results for the second example with piecewise linear trial functions for velocity and pressure; the LBB condition is violated

The results fully confirm the theoretical predictions concerning the role of the LBB condition. The factor ρ_p is only slightly larger than for LBB-compatible choices of Ψ_X and Ψ_M.
9 Further Issues, Applications

We shall briefly touch upon several further issues that are currently under investigation or suggest themselves from the previous developments.

9.1 Nonlinear Problems

A natural step is to apply the same paradigm to nonlinear variational problems, which is done in [28, 29]. We shall briefly indicate some of the ideas for the simple example of the following boundary value problem:

−Δu + u³ = f in Ω,   u = 0 on Γ = ∂Ω.   (119)

A natural weak formulation is to look for u ∈ H = H^1_0(Ω) such that

a(v, u) := ⟨∇v, ∇u⟩ + ⟨v, u³⟩ = ⟨v, f⟩,   v ∈ H^1_0(Ω).   (120)

This presupposes, of course, that u³ ∈ H^{-1}(Ω), i.e., that the mapping u → u³ takes H^1_0(Ω) into H^{-1}(Ω). By simple duality arguments this can indeed be confirmed as long as the spatial dimension d is bounded by 4. It
is then not hard to show that (120) is the Euler equation for the minimization of a strictly convex functional and thus possesses a unique solution. The next step is to transform (120) into wavelet coordinates:

R(u) = 0,   where   R(u) := A u + D^{-1} ⟨Ψ, u³⟩ − f,   (121)

with (D)_{λ,ν} = δ_{λ,ν} 2^{|λ|}, A := D^{-1} ⟨∇Ψ, ∇Ψ⟩ D^{-1} and f := D^{-1} ⟨Ψ, f⟩. Now (121) is to be treated by an iterative method. The simplest version is a gradient iteration of the form

u^{n+1} = u^n − ω R(u^n),   n = 0, 1, 2, ...   (122)
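To illustrate the damped gradient iteration (122), here is a small numpy experiment for the 1D analogue −u'' + u³ = f on (0,1) with homogeneous Dirichlet conditions, discretized by finite differences; the grid, the damping rule and the manufactured solution are illustrative assumptions standing in for the wavelet-coordinate formulation.

```python
import numpy as np

n = 49
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
# A ~ discrete -d^2/dx^2 with Dirichlet boundary conditions
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h ** 2

u_exact = np.sin(np.pi * x)                 # manufactured solution
f = np.pi ** 2 * u_exact + u_exact ** 3     # so that -u'' + u^3 = f

omega = 0.9 * h ** 2 / 4.0                  # damping ~ 1/lambda_max(A)
u = np.zeros(n)
for _ in range(20000):                      # gradient iteration (122)
    u = u - omega * (A @ u + u ** 3 - f)
```

Since the Jacobian A + 3 diag(u²) is positive definite, a damping parameter below 2/λ_max makes the iteration a contraction, mirroring the reduction factor ρ < 1 claimed below.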
In fact, taking the bounded invertibility of A on ℓ2 as well as the above mentioned mapping property of the nonlinearity into account, one can show that for a suitable damping parameter ω the error is reduced in each step by a factor ρ < 1. As before, the objective is to carry out the iteration approximately by an adaptive application of the involved operators. The application of A can again be based on the routine MULT discussed above. The treatment of the nonlinearity raises several issues: (1) knowing the significant coefficients of v ∈ ℓ2, predict the location of the significant coefficients of R(v), so that an overall accuracy tolerance is met; (2) knowing the location of the significant coefficients, compute them efficiently. Problem (1) is treated for a class of nonlinearities in [29]. The resulting algorithm is then used in [28] to devise adaptive algorithms that again exhibit, in some range, asymptotically optimal work/accuracy balances, provided that the significant coefficients of the nonlinearity can be computed efficiently. This latter task (2), in turn, can be attacked by the techniques from [48, 10]. The main idea can be sketched as follows. The searched-for array

w := F(u) = D^{-1} ⟨Ψ, F(u)⟩,   F(u) := u³,

consists of the dual wavelet coefficients of the function w := w^T Ψ̃ ∈ L2(Ω). Thus, as soon as one has some approximation w̃ to w in L2, the Riesz basis property says that

‖w − w̃‖_{ℓ2} ≲ ‖w − w̃‖_{L2},

that is, one also has a good approximation w̃ to w in ℓ2 when w̃ are the dual wavelet coefficients of the approximation w̃. The objective is now to construct
w̃ with the aid of suitable quadrature techniques and quasi-interpolants in combination with local single scale representations. The array w̃ is then obtained by local multiscale transformations. An important point is that quadrature is not used for the approximation of the individual entries but for determining the global approximant w̃. For details the reader is referred to [48, 10].

9.2 Time Dependent Problems

The simplest example of a time dependent problem that can be attacked by the above concepts is the heat equation, or more generally ∂_t u = L u, where L is an elliptic operator (incorporating, as above, homogeneous boundary conditions). Using an implicit time discretization, the method of lines turns this into a series of elliptic boundary value problems, which can be treated by the concepts discussed above. An alternative is to write u(t) = e^{tL} u(0) and utilize the following representation of the exponential:

e^{tL} u_0 = (1/2πi) ∫_Γ e^{tz} (zI − L)^{-1} u_0 dz,

which in a somewhat different context has also been done in [58]. Here Γ is a curve in the complex plane avoiding the spectrum of L. Approximating the right hand side by quadrature, we obtain

u(t) ≈ Σ_n ω_n e^{t z_n} (z_n I − L)^{-1} u_0,

see [58, 63]. Thus one can solve the individual problems (z_n I − L) u^{(n)} = u_0 in parallel.
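The contour representation can be tested directly: for a small symmetric matrix L with negative spectrum, the trapezoidal rule on a circle enclosing the spectrum reproduces e^{tL} u_0 to high accuracy, and each resolvent solve is independent (hence parallelizable). The matrix, contour and node count below are illustrative assumptions, not the quadrature rules of [58, 63].

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, N = 6, 0.3, 96
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
d = np.linspace(0.5, 5.0, n)
L = -Q @ np.diag(d) @ Q.T                   # symmetric, spectrum in [-5, -0.5]
u0 = rng.standard_normal(n)

c, r = -2.75, 3.0                            # circle enclosing the spectrum
z = c + r * np.exp(2j * np.pi * np.arange(N) / N)
u = np.zeros(n, dtype=complex)
for zk in z:                                 # independent resolvent solves
    u += np.exp(t * zk) * (zk - c) * np.linalg.solve(zk * np.eye(n) - L, u0)
u = (u / N).real                             # trapezoid rule for the contour integral
```

Because the integrand is analytic in an annulus around the circle, the trapezoidal rule converges geometrically in the number of quadrature nodes N.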
10 Appendix: Some Useful Facts

For the convenience of the reader, a few useful facts which have been frequently referred to in previous sections are summarized here.

10.1 Function Spaces

We collect first a few definitions concerning function spaces. Standard references are [1, 11, 73].
Sobolev spaces of integer order k (on bounded domains or ℝ^d) are defined as W^k_p(Ω) := { f : ∂^α f ∈ L_p(Ω), |α| ≤ k }, where derivatives are understood in the weak sense. The corresponding (semi-)norms are given by

|f|_{W^k_p(Ω)} := ( Σ_{|α|=k} ‖∂^α f‖^p_{L_p(Ω)} )^{1/p},   ‖v‖_{W^k_p(Ω)} := ( Σ_{m=0}^{k} |v|^p_{W^m_p(Ω)} )^{1/p}.

Fractional order spaces can be defined by intrinsic norms: with k := ⌊t⌋,

‖v‖_{W^t_p(Ω)} := ( ‖v‖^p_{W^k_p(Ω)} + Σ_{|α|=k} ∫_Ω ∫_Ω |∂^α v(x) − ∂^α v(y)|^p / |x − y|^{d+(t−k)p} dx dy )^{1/p}.

Negative indices are handled by duality. An alternative is offered by extension to ℝ^d and then using Fourier transforms, or by interpolation. This leads to the notion of Besov spaces. For a given fixed r and any t < r, the quantities

|v|_{B^t_q(L_p(Ω))} := ( ∫_0^∞ [ s^{-t} ω_r(v, s, Ω)_p ]^q ds/s )^{1/q},   0 < q < ∞;
|v|_{B^t_∞(L_p(Ω))} := sup_{s>0} s^{-t} ω_r(v, s, Ω)_p,   q = ∞,

define equivalent Besov semi-norms. Here we have employed the L_p-modulus of continuity

ω_r(f, t, Ω)_p := sup_{|h| ≤ t} ‖Δ^r_h f‖_{L_p(Ω_{r,h})},

where Δ_h f := f(· + h) − f(·), Δ^k_h := Δ_h ∘ Δ^{k-1}_h, and Ω_{r,h} := { x : x + sh ∈ Ω, s ∈ [0, r] }. Important special cases are:

• W^t_p = B^t_p(L_p) for t > 0, t ∉ ℕ (p ≠ 2);
• H^t := W^t_2 = B^t_2(L_2) for t ∈ ℝ (H^{-t} := (H^t)'),

where as before

‖f‖_{X'} := sup_{g ∈ X, g ≠ 0} ⟨f, g⟩ / ‖g‖_X.
10.2 Local Polynomial Approximation

Estimates of the following type are frequently used, see e.g. [55]:
$$ \inf_{P \in \mathcal{P}_k} \| v - P \|_{W^m_p(\Omega)} \lesssim (\operatorname{diam} \Omega)^{k-m}\, |v|_{W^k_p(\Omega)}, \qquad (m < k), $$
$$ \inf_{P \in \mathcal{P}_k} \| v - P \|_{L_p(\Omega)} \lesssim (\operatorname{diam} \Omega)^{t}\, |v|_{B^t_q(L_p(\Omega))}, \qquad \inf_{P \in \mathcal{P}_k} \| v - P \|_{L_p(\Omega)} \lesssim \sup_{t > 0}\, \omega_k(v, t, \Omega)_p. $$
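A quick numerical sanity check of the first estimate, in a hypothetical one-dimensional setting with $m = 0$, $p = \infty$, $k = 3$: the best-approximation error by quadratics on an interval should scale like $(\operatorname{diam}\Omega)^3$, so halving the interval should reduce the error by roughly a factor of $8$. A least-squares fit stands in for the infimum.

```python
import numpy as np

errs = []
for d in [0.4, 0.2, 0.1]:
    x = np.linspace(0.0, d, 200)
    c = np.polyfit(x, np.sin(x), 2)           # least-squares quadratic on [0, d]
    errs.append(np.max(np.abs(np.sin(x) - np.polyval(c, x))))
    print(d, errs[-1])

print(errs[0] / errs[1], errs[1] / errs[2])   # both ratios close to 2^3 = 8
```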
Wolfgang Dahmen
Idea of proof: By rescaling it suffices to consider a reference domain with unit diameter. Suppose the first estimate fails; then there exist $v_n$ such that
$$ \inf_{P \in \mathcal{P}_k} \| v_n - P \|_{W^m_p(\Omega)} \ge n\, |v_n|_{W^k_p(\Omega)}. $$
Rescale to conclude that
$$ 1 = \inf_{P \in \mathcal{P}_k} \| w_n - P \|_{W^m_p(\Omega)} = \| w_n \|_{W^m_p(\Omega)} \ge n\, |w_n|_{W^k_p(\Omega)}. $$
Thus $\{w_n\}_n$ is precompact in $W^m_p$, that is, there exists $w \in W^m_p$ such that
$$ |w|_{W^k_p(\Omega)} = \lim_{n \to \infty} |w_n|_{W^k_p(\Omega)} = 0, $$
which implies that $w \in \mathcal{P}_k$. On the other hand, $\inf_{P \in \mathcal{P}_k} \| w - P \|_{W^m_p(\Omega)} = 1$, which is a contradiction. This provides the first estimate. The remaining cases are similar. □

10.3 Condition Numbers

The connection between the order of an operator and the condition numbers of related stiffness matrices can be described as follows.

Remark 19 If $a(\cdot,\cdot) \sim \| \cdot \|^2_{H^t}$ is symmetric and $\Psi$ is an $L_2$-Riesz basis, then for $J := \max\{|\lambda| : \lambda \in \Lambda\}$, $\min\{|\lambda| : \lambda \in \Lambda\} \lesssim 1$, one has
$$ \operatorname{cond}_2\big( a(\Psi_\Lambda, \Psi_\Lambda) \big) \sim 2^{2|t|J}, \qquad J \to \infty. $$

Proof: For every $\lambda \in \Lambda$,
$$ \max_{v \in S_\Lambda} \frac{a(v,v)}{\|v\|^2_{L_2}} \ \ge\ \frac{a(\psi_\lambda, \psi_\lambda)}{\| \psi_\lambda \|^2_{L_2}} \ \ge\ \min_{v \in S_\Lambda} \frac{a(v,v)}{\|v\|^2_{L_2}}, $$
which gives
$$ \operatorname{cond}_2\big( a(\Psi_\Lambda, \Psi_\Lambda) \big) \gtrsim \frac{ a(\psi_{\lambda_1}, \psi_{\lambda_1}) / \| \psi_{\lambda_1} \|^2_{L_2} }{ a(\psi_{\lambda_2}, \psi_{\lambda_2}) / \| \psi_{\lambda_2} \|^2_{L_2} }. $$
Since $a(\psi_\lambda, \psi_\lambda) \sim \| \psi_\lambda \|^2_{H^t}$, the norm equivalences (16) imply $2^{t|\lambda|} \sim \| \psi_\lambda \|_{H^t}$. Now, setting
$$ |\lambda_1| = \begin{cases} \max\{|\lambda| : \lambda \in \Lambda\} & \text{if } t \ge 0, \\ \min\{|\lambda| : \lambda \in \Lambda\} & \text{if } t < 0, \end{cases} \qquad |\lambda_2| = \begin{cases} \min\{|\lambda| : \lambda \in \Lambda\} & \text{if } t \ge 0, \\ \max\{|\lambda| : \lambda \in \Lambda\} & \text{if } t < 0, \end{cases} $$
we conclude $\operatorname{cond}_2\big( a(\Psi_\Lambda, \Psi_\Lambda) \big) \gtrsim 2^{2J|t|}$. As for the upper estimate, consider first the case $t < 0$:
$$ \min_{v \in S_\Lambda} \frac{a(v,v)}{\|v\|^2_{L_2}} \gtrsim \min_{d_\Lambda \in \mathbb{R}^{\#\Lambda}} \frac{ \| d_\Lambda^T \Psi_\Lambda \|^2_{H^t} }{ \| d_\Lambda \|^2_2 } \overset{\mathrm{(NE)}}{\sim} \min_{d_\Lambda \in \mathbb{R}^{\#\Lambda}} \frac{ \| D^{t} d_\Lambda \|^2_2 }{ \| d_\Lambda \|^2_2 } \gtrsim 2^{-2|t|J}, $$
where $D^t := \operatorname{diag}\big(2^{t|\lambda|}\big)_{\lambda \in \Lambda}$, while
$$ \max_{v \in S_\Lambda} \frac{a(v,v)}{\|v\|^2_{L_2}} \lesssim 1. $$
In the case $t > 0$, the Bernstein estimate (29) yields
$$ \max_{v \in S_\Lambda} \frac{a(v,v)}{\|v\|^2_{L_2}} \lesssim \max_{v \in S_\Lambda} \frac{ \|v\|^2_{H^t} }{ \|v\|^2_{L_2} } \lesssim 2^{2Jt}, $$
which shows that $\operatorname{cond}_2\big( a(\Psi_\Lambda, \Psi_\Lambda) \big) \lesssim 2^{2J|t|}$. □
References

1. R.A. Adams, Sobolev Spaces, Academic Press, 1978.
2. A. Averbuch, G. Beylkin, R. Coifman, and M. Israeli, Multiscale inversion of elliptic operators, in: Signal and Image Representation in Combined Spaces, J. Zeevi, R. Coifman (eds.), Academic Press, 1995.
3. I. Babuška, The finite element method with Lagrange multipliers, Numer. Math., 20 (1973), 179–192.
4. I. Babuška and W.C. Rheinboldt, Error estimates for adaptive finite element computations, SIAM J. Numer. Anal., 15 (1978), 736–754.
5. R.E. Bank and A. Weiser, Some a posteriori error estimates for elliptic partial differential equations, Math. Comp., 44 (1985), 283–301.
6. H.J.C. Barbosa and T.J.R. Hughes, Boundary Lagrange multipliers in finite element methods: Error analysis in natural norms, Numer. Math., 62 (1992), 1–15.
7. A. Barinka, Fast evaluation tools for adaptive wavelet schemes, PhD Thesis, RWTH Aachen, 2003.
8. A. Barinka, T. Barsch, P. Charton, A. Cohen, S. Dahlke, W. Dahmen, K. Urban, Adaptive wavelet schemes for elliptic problems – Implementation and numerical experiments, SIAM J. Sci. Comp., 23, No. 3 (2001), 910–939.
9. A. Barinka, S. Dahlke, W. Dahmen, Adaptive application of operators in standard representation, in preparation.
10. A. Barinka, W. Dahmen, R. Schneider, Y. Xu, An algorithm for the evaluation of nonlinear functionals of wavelet expansions, in preparation.
11. J. Bergh, J. Löfström, Interpolation Spaces, An Introduction, Springer, 1976.
12. S. Bertoluzza, A-posteriori error estimates for wavelet Galerkin methods, Appl. Math. Lett., 8 (1995), 1–6.
13. G. Beylkin, R.R. Coifman, V. Rokhlin, Fast wavelet transforms and numerical algorithms I, Comm. Pure and Appl. Math., 44 (1991), 141–183.
14. G. Beylkin, J.M. Keiser, An adaptive pseudo-wavelet approach for solving nonlinear partial differential equations, in: Multiscale Wavelet Methods for PDEs, W. Dahmen, A.J. Kurdila, P. Oswald (eds.), Academic Press, 137–197, 1997.
15. F. Bornemann, B. Erdmann, and R. Kornhuber, A posteriori error estimates for elliptic problems in two and three space dimensions, SIAM J. Numer. Anal., 33 (1996), 1188–1204.
16. J.H. Bramble, The Lagrange multiplier method for Dirichlet's problem, Math. Comp., 37 (1981), 1–11.
17. J.H. Bramble, R.D. Lazarov, and J.E. Pasciak, A least-squares approach based on a discrete minus one inner product for first order systems, Math. Comp., 66 (1997), 935–955.
18. J.H. Bramble, R.D. Lazarov, and J.E. Pasciak, Least-squares for second order elliptic problems, Comp. Meth. Appl. Mech. Engnrg., 152 (1998), 195–210.
19. F. Brezzi and M. Fortin, Mixed and Hybrid Finite Element Methods, Springer, 1991.
20. Z. Cai, R. Lazarov, T.A. Manteuffel, and S.F. McCormick, First-order system least squares for second-order partial differential equations; Part I, SIAM J. Numer. Anal., 31 (1994), 1785–1799.
21. C. Canuto, A. Tabacco, K. Urban, The wavelet element method, part I: Construction and analysis, Appl. Comp. Harm. Anal., 6 (1999), 1–52.
22. C. Canuto, A. Tabacco, K. Urban, The wavelet element method, part II: Realization and additional features in 2D and 3D, Appl. Comp. Harm. Anal., 8 (2000), 123–165.
23. J.M. Carnicer, W. Dahmen and J.M. Peña, Local decomposition of refinable spaces, Appl. Comp. Harm. Anal., 3 (1996), 127–153.
24. A. Cohen, Wavelet methods in numerical analysis, in: Handbook of Numerical Analysis, vol. VII, P.-G. Ciarlet and J.-L. Lions (eds.), Elsevier, Amsterdam, 2000.
25. A. Cohen, I. Daubechies, J.-C. Feauveau, Biorthogonal bases of compactly supported wavelets, Comm. Pure and Appl. Math., 45 (1992), 485–560.
26. A. Cohen, W. Dahmen, R. DeVore, Adaptive wavelet methods for elliptic operator equations – Convergence rates, Math. Comp., 70 (2001), 27–75.
27. A. Cohen, W. Dahmen, R. DeVore, Adaptive wavelet methods II – Beyond the elliptic case, Foundations of Computational Mathematics, 2 (2002), 203–245.
28. A. Cohen, W. Dahmen, R. DeVore, Adaptive wavelet schemes for nonlinear variational problems, IGPM Report # 221, RWTH Aachen, July 2002.
29. A. Cohen, W. Dahmen, R. DeVore, Sparse evaluation of nonlinear functionals of multiscale expansions, IGPM Report # 222, RWTH Aachen, July 2002.
30. A. Cohen, R. Masson, Wavelet adaptive methods for second order elliptic problems: boundary conditions and domain decomposition, Numer. Math., 86 (2000), 193–238.
31. M. Costabel and E.P. Stephan, Coupling of finite and boundary element methods for an elastoplastic interface problem, SIAM J. Numer. Anal., 27 (1990), 1212–1226.
32. S. Dahlke, Besov regularity for elliptic boundary value problems on polygonal domains, Appl. Math. Lett., 12 (1999), 31–36.
33. S. Dahlke, R. DeVore, Besov regularity for elliptic boundary value problems, Comm. Partial Differential Equations, 22 (1997), 1–16.
34. S. Dahlke, W. Dahmen, R. DeVore, Nonlinear approximation and adaptive techniques for solving elliptic operator equations, in: Multiscale Wavelet Methods for PDEs, W. Dahmen, A. Kurdila, P. Oswald (eds.), Academic Press, London, 237–283, 1997.
35. S. Dahlke, W. Dahmen, R. Hochmuth, R. Schneider, Stable multiscale bases and local error estimation for elliptic problems, Appl. Numer. Math., 8 (1997), 21–47.
36. S. Dahlke, W. Dahmen, K. Urban, Adaptive wavelet methods for saddle point problems – Convergence rates, SIAM J. Numer. Anal., 40, No. 4 (2002), 1230–1262.
37. S. Dahlke, R. Hochmuth, K. Urban, Adaptive wavelet methods for saddle point problems, Math. Model. Numer. Anal. (M2AN), 34(5) (2000), 1003–1022.
38. W. Dahmen, Some remarks on multiscale transformations, stability and biorthogonality, in: Wavelets, Images and Surface Fitting, P.J. Laurent, A. Le Méhauté, L.L. Schumaker (eds.), AK Peters, Wellesley, Massachusetts, 157–188, 1994.
39. W. Dahmen, Stability of multiscale transformations, Journal of Fourier Analysis and Applications, 2 (1996), 341–361.
40. W. Dahmen, Wavelet and multiscale methods for operator equations, Acta Numerica, Cambridge University Press, 6 (1997), 55–228.
41. W. Dahmen, Wavelet methods for PDEs – Some recent developments, J. Comp. Appl. Math., 128 (2001), 133–185.
42. W. Dahmen, A. Kunoth, Multilevel preconditioning, Numer. Math., 63 (1992), 315–344.
43. W. Dahmen, A. Kunoth, Appending boundary conditions by Lagrange multipliers: Analysis of the LBB condition, Numer. Math., 88 (2001), 9–42.
44. W. Dahmen, A. Kunoth, R. Schneider, Wavelet least squares methods for boundary value problems, SIAM J. Numer. Anal., 39(6) (2002), 1985–2013.
45. W. Dahmen, S. Prößdorf, R. Schneider, Multiscale methods for pseudodifferential equations on smooth manifolds, in: Proceedings of the International Conference on Wavelets: Theory, Algorithms, and Applications, C.K. Chui, L. Montefusco, L. Puccio (eds.), Academic Press, 385–424, 1994.
46. W. Dahmen and R. Schneider, Composite wavelet bases for operator equations, Math. Comp., 68 (1999), 1533–1567.
47. W. Dahmen and R. Schneider, Wavelets on manifolds I: Construction and domain decomposition, SIAM J. Math. Anal., 31 (1999), 184–230.
48. W. Dahmen, R. Schneider, Y. Xu, Nonlinear functions of wavelet expansions – Adaptive reconstruction and fast evaluation, Numerische Mathematik, 86 (2000), 49–101.
49. W. Dahmen and R. Stevenson, Element-by-element construction of wavelets – stability and moment conditions, SIAM J. Numer. Anal., 37 (1999), 319–325.
50. I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Math. 61, SIAM, Philadelphia, 1992.
51. R. DeVore, Nonlinear approximation, Acta Numerica, Cambridge University Press, 7 (1998), 51–150.
52. R. DeVore, V. Popov, Interpolation of Besov spaces, Trans. Amer. Math. Soc., 305 (1988), 397–414.
53. R. DeVore, B. Jawerth, and V. Popov, Compression of wavelet decompositions, Amer. J. Math., 114 (1992), 737–785.
54. R. DeVore, G.G. Lorentz, Constructive Approximation, Grundlehren vol. 303, Springer-Verlag, Berlin, 1993.
55. R. DeVore, R. Sharpley, Maximal functions measuring smoothness, Memoirs of the American Mathematical Society, 47 (No. 293), 1984.
56. W. Dörfler, A convergent adaptive algorithm for Poisson's equation, SIAM J. Numer. Anal., 33 (1996), 1106–1124.
57. K. Eriksson, D. Estep, P. Hansbo, and C. Johnson, Introduction to adaptive methods for differential equations, Acta Numerica, Cambridge University Press, 4 (1995), 105–158.
58. I.P. Gavrilyuk, W. Hackbusch, B.N. Khoromskij, H-matrix approximation for the operator exponential with applications, Preprint No. 42, Max-Planck-Institut für Mathematik, Leipzig, 2000, to appear in Numer. Math.
59. V. Girault and P.-A. Raviart, Finite Element Methods for Navier-Stokes Equations, Springer, 1986.
60. R. Glowinski and V. Girault, Error analysis of a fictitious domain method applied to a Dirichlet problem, Japan J. Industr. Appl. Math., 12 (1995), 487–514.
61. M.D. Gunzburger, S.L. Hou, Treating inhomogeneous boundary conditions in finite element methods and the calculation of boundary stresses, SIAM J. Numer. Anal., 29 (1992), 390–424.
62. H. Harbrecht, Wavelet Galerkin schemes for the boundary element method in three dimensions, Doctoral Dissertation, Technical University of Chemnitz, June 2001.
63. M. Jürgens, Adaptive methods for time dependent problems, in preparation.
64. R. Kress, Linear Integral Equations, Springer-Verlag, Berlin-Heidelberg, 1989.
65. E. Novak, On the power of adaptation, J. Complexity, 12 (1996), 199–237.
66. T. von Petersdorff, C. Schwab, Wavelet approximation for first kind integral equations on polygons, Numer. Math., 74 (1996), 479–516.
67. T. von Petersdorff, C. Schwab, Fully discrete multiscale Galerkin BEM, in: Multiscale Wavelet Methods for PDEs, W. Dahmen, A. Kurdila, P. Oswald (eds.), Academic Press, London, 287–346, 1997.
68. R. Schneider, Multiskalen- und Wavelet-Matrixkompression: Analysisbasierte Methoden zur effizienten Lösung großer vollbesetzter Gleichungssysteme, Habilitationsschrift, Technische Hochschule Darmstadt, 1995; Advances in Numerical Mathematics, Teubner, 1998.
69. W. Sweldens, The lifting scheme: A custom-design construction of biorthogonal wavelets, Appl. Comput. Harm. Anal., 3 (1996), 186–200.
70. W. Sweldens, The lifting scheme: A construction of second generation wavelets, SIAM J. Math. Anal., 29 (1998), 511–546.
71. P. Tchamitchian, Wavelets, functions, and operators, in: Wavelets: Theory and Applications, G. Erlebacher, M.Y. Hussaini, and L. Jameson (eds.), ICASE/LaRC Series in Computational Science and Engineering, Oxford University Press, 83–181, 1996.
72. J. Traub and H. Woźniakowski, A General Theory of Optimal Algorithms, Academic Press, New York, 1980.
73. H. Triebel, Interpolation Theory, Function Spaces, and Differential Operators, North-Holland, Amsterdam, 1978.
74. P.S. Vassilevski, J. Wang, Stabilizing the hierarchical basis by approximate wavelets, I: Theory, Numer. Linear Algebra Appl., 4 (1997), 103–126.
75. H. Yserentant, On the multilevel splitting of finite element spaces, Numer. Math., 49 (1986), 379–412.
Multilevel Methods in Finite Elements
James H. Bramble
Mathematics Department, Texas A&M University, College Station, Texas, USA
[email protected]
Summary. This survey consists of five parts. In the first section we describe a model problem and a two-level algorithm in order to motivate the multilevel approach. In Section 2 an abstract multilevel algorithm is described and analyzed under some regularity assumptions. An analysis under less stringent assumptions is given in Section 3. Non-nested spaces and varying forms are treated in Section 4. Finally, we show how the multilevel framework provides computationally efficient realizations of norms on Sobolev scales.
1 Introduction

We consider a two-level multigrid algorithm applied to a simple model problem in this section. The purpose here is to describe and motivate the use of multiple grids. We will give a complete analysis of this algorithm. For this purpose and for the treatment of multilevel methods, we first provide some preliminary definitions. Next, a model problem and its finite element approximation are described. The two-level method is then defined and an analysis of its properties as a reducer/preconditioner is provided.

1.1 Sobolev Spaces

The iterative convergence estimates for multigrid algorithms applied to the computation of the discrete approximations to partial differential equations are most naturally analyzed using Sobolev spaces and their associated norms. To be precise, we shall give the definitions here, although a more thorough discussion can be found in, for example, [1], [14] and [19]. Let Ω be a Lebesgue measurable set in d-dimensional Euclidean space $\mathbb{R}^d$ and let f be a real valued function defined on Ω. We denote by
$$ \|f\|_{L_p(\Omega)} = \Big( \int_\Omega |f(x)|^p \, dx \Big)^{1/p} $$
J.H. Bramble, A. Cohen, and W. Dahmen: LNM 1825, C. Canuto (Ed.), pp. 97–151, 2003. c Springer-Verlag Berlin Heidelberg 2003
the $L_p(\Omega)$ norm of f. Let $\mathbb{N}$ denote the set of nonnegative integers and let $\alpha = (\alpha_1, \ldots, \alpha_d)$, with $\alpha_i \in \mathbb{N}$, be a multi-index. We set $|\alpha| = \alpha_1 + \cdots + \alpha_d$ and
$$ D^\alpha = \Big( \frac{\partial}{\partial x_1} \Big)^{\alpha_1} \Big( \frac{\partial}{\partial x_2} \Big)^{\alpha_2} \cdots \Big( \frac{\partial}{\partial x_d} \Big)^{\alpha_d}. $$
The Sobolev spaces $W^s_p(\Omega)$ for $s \in \mathbb{N}$ are defined to be the set of distributions $f \in \mathcal{D}'(\Omega)$ (cf. [17]) for which the norm
$$ \|f\|_{W^s_p(\Omega)} = \Big( \sum_{|\alpha| \le s} \| D^\alpha f \|^p_{L_p(\Omega)} \Big)^{1/p} < \infty. $$
When p = 2, the spaces are Hilbert spaces and are of special interest in this book. We shall denote these by $H^s(\Omega) \equiv W^s_2(\Omega)$. The corresponding norm will be denoted by $\| \cdot \|_{s,\Omega} = \| \cdot \|_{W^s_2(\Omega)}$. For real s with $i < s < i+1$, the Sobolev space $H^s(\Omega)$ is defined by interpolation (using the real method) between $H^i(\Omega)$ and $H^{i+1}(\Omega)$ (see, e.g., [12], [19] or Appendix A of [3]). The norm and inner product notation will be further simplified when the domain Ω is clear from the context, in which case we use $\| \cdot \|_s = \| \cdot \|_{s,\Omega}$ and $(\cdot, \cdot) = (\cdot, \cdot)_\Omega$.
We will also use certain Sobolev spaces with negative indices. These will be defined later as needed.

1.2 A Model Problem

In this section we consider the Dirichlet problem on a bounded domain Ω in $\mathbb{R}^2$. This problem and its finite element approximation can be used to illustrate some of the most fundamental properties of the multigrid algorithms. Let
$$ \Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}. $$
Given f in an appropriately defined Sobolev space, consider the Dirichlet problem
$$ -\Delta u = f \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega. \qquad (1) $$
For $v, w \in H^1(\Omega)$, let
$$ D(v, w) = \int_\Omega \Big( \frac{\partial v}{\partial x}\frac{\partial w}{\partial x} + \frac{\partial v}{\partial y}\frac{\partial w}{\partial y} \Big) \, dx\, dy. $$
Denote by $C_0^\infty(\Omega)$ the space of infinitely differentiable functions with compact support in Ω. By Green's identity, for $\phi \in C_0^\infty(\Omega)$,
$$ (f, \phi) = (-\Delta u, \phi) = D(u, \phi). \qquad (2) $$
Let $H^1_0(\Omega)$ be the closure of $C_0^\infty(\Omega)$ with respect to $\| \cdot \|_1$. The Poincaré inequality implies that there is a constant C > 0 such that
$$ \|v\|_0^2 \le C\, D(v, v), \qquad \text{for all } v \in H^1_0(\Omega). $$
Hence, we can take $D(\cdot,\cdot)^{1/2}$ to be the norm on $H^1_0(\Omega)$. This changes the Hilbert space structure. In the above inequality, C represents a generic positive constant. Such constants will appear often in this book and will be denoted by C and c, with or without subscript. These constants can take on different values at different occurrences; however, they will always be independent of mesh and grid level parameters. For a bounded linear functional f on $H^1_0(\Omega)$, the weak solution u of (1) satisfies
$$ D(u, \phi) = (f, \phi), \qquad \text{for all } \phi \in H^1_0(\Omega). \qquad (3) $$
Here $(f, \phi)$ is the value of the functional f at φ. If $f \in L_2(\Omega)$, it coincides with the $L_2$-inner product. Existence and uniqueness of the function $u \in H^1_0(\Omega)$ satisfying (3) follow from the Poincaré inequality and the Riesz Representation Theorem.

Theorem 1.1 (Riesz Representation Theorem) Let H be a Hilbert space with norm $\| \cdot \|_H$ and inner product $(\cdot,\cdot)_H$. Let f be a bounded linear functional on H, i.e., $|f(\phi)| \le C(f)\, \|\phi\|_H$. Then there exists a unique $u_f \in H$ such that
$$ (u_f, \phi)_H = f(\phi), \qquad \text{for all } \phi \in H. $$

To apply the above theorem to (3), we take $H = H^1_0(\Omega)$ with $(\cdot,\cdot)_H = D(\cdot,\cdot)$ and set $f(\phi) = (f, \phi)$ for all $\phi \in H^1_0(\Omega)$. Then, by the definition of $\| \cdot \|_{(H^1_0(\Omega))'}$ and the Poincaré inequality,
$$ |f(\phi)| \le \|f\|_{(H^1_0(\Omega))'}\, \|\phi\|_1 \le C\, \|f\|_{(H^1_0(\Omega))'}\, D(\phi,\phi)^{1/2}. $$
Thus, $f(\cdot)$ is a bounded linear functional on H and Theorem 1.1 implies that there is a unique function $u \in H = H^1_0(\Omega)$ satisfying (3).
1.3 Finite Element Approximation of the Model Problem

In this subsection, we consider the simplest multilevel finite element approximation spaces. We start with the Galerkin method. Let M be a finite dimensional subspace of $H^1_0(\Omega)$. The Galerkin approximation is the function $u \in M$ satisfying
$$ D(u, \phi) = (f, \phi), \qquad \text{for all } \phi \in M. \qquad (4) $$
As in the continuous case, the existence and uniqueness of solutions to (4) follows from the Poincar´e inequality and the Riesz Representation Theorem. We shall consider spaces M which result from multilevel finite element constructions. We first define a nested sequence of triangulations. Let Ω be a domain with polygonal boundary and let T1 be a given (coarse) triangulation of Ω. Successively finer triangulations {Tk }, k = 2, . . . , J are formed by subdividing the triangles of Tk−1 . More precisely, for each triangle τ of Tk , Tk+1 has four triangles corresponding to those formed by connecting the midpoints of the sides of τ . Note that the angles of the triangles in the finest triangulation are the same as those in the coarsest. Thus, the triangles on all grids are of quasi-uniform shape independent of the mesh parameter k. This construction is illustrated in Figure 1. Nested approximation spaces are defined in terms
Fig. 1. Nested triangulations
of the triangulations. Let $M_k$, for $k = 1, \ldots, J$, be the space of continuous piecewise linear functions on $\mathcal{T}_k$ which vanish on ∂Ω. Set $h_1$ to be the length of the side of maximum length in $\mathcal{T}_1$ and $h_k = 2^{-k+1} h_1$. We clearly have that
$$ M_1 \subset M_2 \subset \cdots \subset M_J \equiv M \subset H^1_0(\Omega). $$
We also denote $h = h_J$. The sequence of spaces defined above satisfies the following approximation property: For $v \in H^1_0(\Omega) \cap H^r(\Omega)$ with r = 1 or 2, there exists $\chi \in M_k$ such that
$$ \|v - \chi\|_0^2 + h_k^2 \|v - \chi\|_1^2 \le C h_k^{2r} \|v\|_r^2. \qquad (5) $$
See, e.g., [11] for a proof.
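As a quick numerical check of the $L_2$ part of the approximation property (5), in a hypothetical one-dimensional analogue with $r = 2$: halving the mesh size should reduce the $L_2$ error of the piecewise linear interpolant by roughly a factor of four.

```python
import numpy as np

def interp_err(n):
    """Discrete L2 error of the piecewise linear interpolant of sin(pi x)
    on a uniform n-element grid over (0, 1)."""
    xf = np.linspace(0.0, 1.0, 20 * n + 1)        # fine evaluation grid
    nodes = np.linspace(0.0, 1.0, n + 1)
    chi = np.interp(xf, nodes, np.sin(np.pi * nodes))
    return np.sqrt(np.mean((np.sin(np.pi * xf) - chi) ** 2))

e1, e2 = interp_err(8), interp_err(16)
print(e1 / e2)    # close to 4, consistent with the O(h^2) bound in (5)
```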
1.4 The Stiffness Matrix and its Condition Number

The point of our multigrid algorithms is to develop effective iterative techniques for computing approximate solutions satisfying equations exemplified by (4). Functions in M are most naturally represented in terms of a basis. Order the interior vertices $x_i$ of $\mathcal{T}_J$, $1 \le i \le N_J$, and let $\phi_i \in M$ be such that
$$ \phi_i(x_j) = \begin{cases} 1 & \text{at } x_j = x_i, \\ 0 & \text{at } x_j \ne x_i. \end{cases} $$
Since any continuous, piecewise linear function is determined by its values at the vertices,
$$ u = \sum_{i=1}^{N_J} \tilde u_i \phi_i \qquad \text{where } \tilde u_i = u(x_i). $$
Hence,
$$ \sum_{i=1}^{N_J} \tilde u_i D(\phi_i, \phi_j) = (f, \phi_j). \qquad (6) $$
Let $A_J$ denote the stiffness matrix $[A_J]_{ij} = D(\phi_i, \phi_j)$. Equation (6) is equivalent to
$$ A_J \tilde u = \tilde f, \qquad (7) $$
where $\tilde u = (\tilde u_1, \ldots, \tilde u_{N_J})^T$, $\tilde f = (\tilde f_1, \ldots, \tilde f_{N_J})^T$ and $\tilde f_i = (f, \phi_i)$. It is well known that the rate of convergence of simple linear iterative methods or the conjugate gradient method, directly applied to (7), can be bounded in terms of the condition number of $A_J$ (cf. [3]). We will now estimate this condition number. Let v be in $M_J$ and write
$$ v = \sum_{i=1}^{N_J} \tilde v_i \phi_i. $$
Let τ be a triangle of $\mathcal{T}_J$. Since τ is of quasi-uniform shape and v is linear on τ, we clearly have that $\|v\|^2_{0,\tau}$ is equivalent to the sum of the squares of the values of v at the vertices of τ times $h^2$. By summing over the triangles,
$$ \|v\|_0^2 \approx h^2 \sum_{i=1}^{N_J} \tilde v_i^2, \qquad v \in M_J. $$
The notation ≈ means equivalence of norms with constants of equivalence independent of h. Hence, by the Poincaré inequality,
$$ h^2 \sum_{i=1}^{N_J} \tilde v_i^2 \le C \|v\|_0^2 \le C\, D(v, v) = C \sum_{i,j=1}^{N_J} [A_J]_{ij}\, \tilde v_i \tilde v_j, \qquad v \in M_J. $$
This means that the smallest eigenvalue of $A_J$ is bounded below by $ch^2$. Let τ be a triangle in $\mathcal{T}_J$. Since v is linear on τ and τ is of quasi-uniform shape, we have
$$ \int_\tau |\nabla v|^2 \, dx \approx [v(a) - v(b)]^2 + [v(b) - v(c)]^2, \qquad v \in M_J. $$
Here a, b, c are the vertices of τ. By summing and using obvious manipulations, it follows that
$$ D(v, v) \le C_1 \sum_{i=1}^{N_J} \tilde v_i^2, \qquad \text{for all } v \in M_J. $$
This means that the largest eigenvalue is bounded and hence the condition number of $A_J$ is bounded by
$$ \kappa(A_J) \le C h^{-2}. \qquad (8) $$
It is not difficult to show that the estimate (8) is sharp, i.e., there is a constant c not depending on J such that
$$ \kappa(A_J) \ge c h^{-2}. $$
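The growth (8) is easy to observe numerically. The sketch below uses a hypothetical one-dimensional analogue (piecewise linear elements for $-u'' = f$ on $(0,1)$ with zero boundary values); the condition number roughly quadruples each time h is halved.

```python
import numpy as np

for n in [8, 16, 32, 64]:          # number of elements, h = 1/n
    h = 1.0 / n
    A = (np.diag(2 * np.ones(n - 1)) - np.diag(np.ones(n - 2), 1)
         - np.diag(np.ones(n - 2), -1)) / h
    print(n, np.linalg.cond(A))    # grows like h^{-2} = n^2
```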
We leave this as an exercise for the reader. This means that the problem is rather ill conditioned and, thus, iterative methods applied directly to (7) will, in general, converge slowly.

1.5 A Two-Level Multigrid Method

To motivate the two-level multigrid method, we start by considering the simple linear iterative method
$$ \tilde u^{n+1} = \tilde u^n - \lambda_J^{-1} (A_J \tilde u^n - \tilde f). \qquad (9) $$
Here $\lambda_J$ denotes the largest eigenvalue of $A_J$. Since $A_J$ is symmetric and positive definite, there is a basis $\{\tilde\psi_i\}$, for $i = 1, \ldots, N$, of eigenvectors with corresponding eigenvalues $0 < \eta_1 \le \eta_2 \le \cdots \le \eta_N = \lambda_J$. From (9), it immediately follows that the error
$$ \tilde e^n = \tilde u - \tilde u^n = \sum_{i=1}^{N} \tilde e_i^n \tilde\psi_i $$
satisfies
$$ \tilde e_i^{n+1} = (1 - \eta_i/\lambda_J)\, \tilde e_i^n. $$
This means that the rate of reduction of the i'th error component is $\rho_i = (1 - \eta_i/\lambda_J)$. For $\eta_i$ on the order of $\lambda_J$, this reduction is bounded away from one and corresponds to a good reduction. However, for $\eta_i$ near $\eta_1 \approx c h^2$,
$\rho_i = 1 - ch^2$, which is near 1. This means that the components of the error corresponding to small eigenvalues will converge to zero rather slowly. The two-level multigrid algorithm can now be motivated as follows. Since the form $(A_J \tilde v, \tilde v)$ is equivalent to the sum of the squares of differences between neighboring mesh points, the components corresponding to the larger eigenvalues are highly oscillatory. In contrast, the components corresponding to the smaller eigenvalues are smoother and should be adequately approximated by coarser grid functions. This suggests combining a simple iteration of the form of (9) (to reduce the components corresponding to large eigenvalues) with a coarse grid solve (to reduce the components corresponding to smaller eigenvalues). To describe this procedure, it is most natural to consider the computational problem more abstractly as one of finding functions in finite element subspaces. To this end, we define linear operators on the finite element spaces as follows: Let $A_k : M_k \to M_k$ be defined by
$$ (A_k v, \phi) = D(v, \phi), \qquad \text{for all } \phi \in M_k. \qquad (10) $$
Clearly, $A_k$ is well defined since $M_k$ is finite dimensional. Moreover, $A_k$ is clearly symmetric and positive definite with respect to the inner product $(\cdot,\cdot)$. Let $Q_k$ denote the $L_2(\Omega)$ projection onto $M_k$ and let $P_k$ denote the orthogonal projector with respect to the $D(\cdot,\cdot)$ inner product. These are defined by
$$ (Q_k v, \phi) = (v, \phi), \qquad \text{for all } v \in L_2(\Omega),\ \phi \in M_k, $$
and
$$ D(P_k v, \phi) = D(v, \phi), \qquad \text{for all } v \in H^1(\Omega),\ \phi \in M_k. $$
Now (4) can be rewritten as $A_J u = f_J \equiv Q_J f$. We denote by $\lambda_J$ the largest eigenvalue of $A_J$. By the well known inverse properties for M,
$$ (A_J v, v) = D(v, v) \le C h^{-2} (v, v), \qquad \text{for all } v \in M. $$
This means that the largest eigenvalue $\lambda_J$ of $A_J$ is bounded by $Ch^{-2}$. The Poincaré inequality implies that the smallest eigenvalue of $A_J$ is bounded from below. In this notation, the two-level multigrid algorithm is given as follows.

Algorithm 1.1 Given $u^i \in M$ approximating the solution $u = A_J^{-1} f_J$ of (4), define $u^{i+1} \in M$ as follows:
(1) Set $u^{i+1/3} = u^i + \lambda_J^{-1}(f_J - A_J u^i)$.
(2) Define $u^{i+2/3} = u^{i+1/3} + q$, where $A_{J-1}\, q = Q_{J-1}(f_J - A_J u^{i+1/3})$.
(3) Finally, set $u^{i+1} \equiv u^{i+3/3} = u^{i+2/3} + \lambda_J^{-1}(f_J - A_J u^{i+2/3})$.
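Algorithm 1.1 can be sketched in matrix form for the hypothetical 1D model used above. Richardson smoothing is applied directly to the matrix system (i.e. the Gram matrix is dropped, in the spirit of Remark 1.2), and the interpolation matrix P and grid sizes are illustrative choices. The observed energy-norm contraction stays bounded away from one independently of h, in line with Theorem 1.2.

```python
import numpy as np

def laplace_1d(n):
    """P1 stiffness matrix for -u'' on (0,1), n elements, zero boundary values."""
    h = 1.0 / n
    return (np.diag(2 * np.ones(n - 1)) - np.diag(np.ones(n - 2), 1)
            - np.diag(np.ones(n - 2), -1)) / h

def interpolation(n):
    """Linear interpolation from the n/2-element grid to the n-element grid."""
    P = np.zeros((n - 1, n // 2 - 1))
    for j in range(n // 2 - 1):
        P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
    return P

def two_level(A, Ac, P, u, f, omega):
    u = u + omega * (f - A @ u)                  # Step 1: smoothing
    q = np.linalg.solve(Ac, P.T @ (f - A @ u))   # Step 2: coarse grid solve
    u = u + P @ q                                #         and correction
    return u + omega * (f - A @ u)               # Step 3: smoothing

n = 128
A = laplace_1d(n)
P = interpolation(n)
Ac = P.T @ A @ P                                 # nested spaces: coarse stiffness
omega = 1.0 / np.linalg.eigvalsh(A).max()        # lambda_J^{-1}
f = np.ones(n - 1)
u_exact = np.linalg.solve(A, f)

u = np.zeros(n - 1)
for i in range(10):
    u = two_level(A, Ac, P, u, f, omega)
err = u - u_exact
print(np.sqrt(err @ (A @ err)))                  # energy norm of the final error
```

Note that for nested piecewise linear spaces, the Galerkin product $P^T A P$ coincides with the stiffness matrix assembled directly on the coarse grid.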
Steps 1 and 3 above are simple iterations of the form
$$ u^{i+1} = u^i + \lambda_J^{-1}(f_J - A_J u^i). \qquad (11) $$
In terms of the coefficient vectors $\tilde u^i$ and the dual vector $\tilde f_J$, it can be written as
$$ \tilde u^{i+1} = \tilde u^i + \lambda_J^{-1} G_J^{-1} (\tilde f_J - A_J \tilde u^i), $$
where $G_J = [(\phi_i, \phi_j)]$ is the Gram matrix. This matrix is symmetric and positive definite and all its eigenvalues are on the order of $Ch_J^2$. Note that $\lambda_J \approx h_J^{-2}$, hence $\lambda_J^{-1} G_J^{-1}$ is spectrally equivalent to $\bar\lambda_J^{-1} I$, with $\bar\lambda_J$ the largest eigenvalue of the stiffness matrix. Consequently iteration (11), although slightly different from iteration (9), has a smoothing property similar to that of (9), i.e., the resulting error after application of (11) should be less oscillatory. Steps 1 and 3 above are often referred to as smoothing steps. The middle step is called a coarse grid correction step. Note that for $\phi \in M_{J-1}$,
$$ (Q_{J-1} A_J v, \phi) = (A_J v, \phi) = D(v, \phi) = D(P_{J-1} v, \phi) = (A_{J-1} P_{J-1} v, \phi), $$
i.e., $Q_{J-1} A_J = A_{J-1} P_{J-1}$. Set $e^i = u - u^i$ and $e^{i+j/3} = u - u^{i+j/3}$. We see that
$$ q = A_{J-1}^{-1} Q_{J-1} A_J (u - u^{i+1/3}) = P_{J-1} e^{i+1/3}. $$
Thus, $e^{i+2/3} = (I - P_{J-1}) e^{i+1/3}$. This means that $e^{i+2/3}$ is the $D(\cdot,\cdot)$ orthogonal projection of $e^{i+1/3}$ into the subspace of $M_J$ which is orthogonal to $M_{J-1}$. We now prove the following theorem.

Theorem 1.2 Set $e^i = u - u^i$, where u is the solution of (4) and $u^i$ is given by Algorithm 1.1. Then
$$ |||e^{i+1}||| \le \delta\, |||e^i|||. $$
Here δ < 1 independently of h, and $||| \cdot |||$ denotes the norm defined by $|||v||| = D(v, v)^{1/2}$.

Proof. From Step (1) and Step (3) of Algorithm 1.1 and the above discussion,
$$ e^{i+1/3} = (I - \lambda_J^{-1} A_J)\, e^i, \qquad e^{i+2/3} = (I - P_{J-1})\, e^{i+1/3}, $$
and
$$ e^{i+1} = e^{i+3/3} = (I - \lambda_J^{-1} A_J)\, e^{i+2/3}. $$
Thus,
$$ e^{i+1} = (I - \lambda_J^{-1} A_J)(I - P_{J-1})(I - \lambda_J^{-1} A_J)\, e^i \equiv E e^i. $$
Clearly, $|||e^{i+1}||| \le |||E|||\, |||e^i|||$, where $|||E||| \equiv \sup_{v \in M_J} |||Ev|||/|||v|||$ is the operator norm of E. Note that E is symmetric with respect to $D(\cdot,\cdot)$. Hence
$$ |||E||| = \sup_{v \in M_J} \frac{ |D(Ev, v)| }{ |||v|||^2 } = \sup_{v \in M_J} \frac{ |||(I - P_{J-1})(I - \lambda_J^{-1} A_J) v|||^2 }{ |||v|||^2 } = |||(I - P_{J-1})(I - \lambda_J^{-1} A_J)|||^2 = |||(I - \lambda_J^{-1} A_J)(I - P_{J-1})|||^2. $$
We will estimate the last norm above. Let $w \in M$ and set $\hat w = (I - P_{J-1}) w$. Then $D(\hat w, \theta) = 0$ for all $\theta \in M_{J-1}$. Hence, for any $\theta \in M_{J-1}$,
$$ D(\hat w, \hat w) = D(\hat w, \hat w - \theta) = (A_J \hat w, \hat w - \theta) \le (A_J \hat w, A_J \hat w)^{1/2}\, \| \hat w - \theta \|_0. $$
Using the approximation property (5) gives
$$ D(\hat w, \hat w) \le C h_{J-1}\, (A_J \hat w, A_J \hat w)^{1/2}\, D(\hat w, \hat w)^{1/2}. $$
Cancelling the common factor and using $\lambda_J \le C h_J^{-2} = 4C h_{J-1}^{-2}$, we obtain
$$ D(\hat w, \hat w) \le C h_{J-1}^2\, (A_J \hat w, A_J \hat w) \le \hat C \lambda_J^{-1} (A_J \hat w, A_J \hat w). $$
Now $I - \lambda_J^{-1} A_J$ is symmetric with respect to $D(\cdot,\cdot)$ and $\sigma(I - \lambda_J^{-1} A_J) \subseteq [0, 1)$. Hence
$$ |||(I - \lambda_J^{-1} A_J)\hat w|||^2 \le D((I - \lambda_J^{-1} A_J)\hat w, \hat w) = D(\hat w, \hat w) - \lambda_J^{-1} D(A_J \hat w, \hat w) \le (1 - 1/\hat C)\, D(\hat w, \hat w) = (1 - 1/\hat C)\, |||\hat w|||^2. $$
This shows that $|||E||| = |||(I - \lambda_J^{-1} A_J)(I - P_{J-1})|||^2 \le (1 - 1/\hat C)$. Therefore
$$ |||e^{i+1}||| \le (1 - 1/\hat C)\, |||e^i||| = \delta\, |||e^i|||, $$
where $\delta = 1 - 1/\hat C$ is independent of h. □

Remark 1.1 If we omit Step 1 in Algorithm 1.1, then
$$ e^{i+1} = (I - \lambda_J^{-1} A_J)(I - P_{J-1})\, e^i $$
and hence
$$ |||e^{i+1}||| = |||(I - \lambda_J^{-1} A_J)(I - P_{J-1})\, e^i||| \le |||(I - \lambda_J^{-1} A_J)(I - P_{J-1})|||\, |||e^i||| \le \delta^{1/2}\, |||e^i|||. $$
We obtain the same bound if we omit, instead, Step (3). One obvious advantage of the symmetric algorithm is that it can be used as a preconditioner in the PCG algorithm. What have we gained by using the two-level scheme? In this example,
$$ \dim M_{J-1} \approx \tfrac{1}{4} \dim M_J. $$
Hence $A_{J-1}^{-1}$ is "cheaper" to compute than $A_J^{-1}$. This is not a significant improvement for very large problems. This algorithm was described only to illustrate the multilevel technique. More efficient algorithms will be developed in later sections by recursive application of the above idea.
Remark 1.2 The algorithm described above is not too convenient for direct application. First of all, the largest eigenvalue explicitly appears in Steps 1 and 3. It is not difficult to see that the largest eigenvalue in these algorithms can be replaced by any upper bound $\bar\lambda_J$ for $\lambda_J$ provided that $\bar\lambda_J \le C \lambda_J$. More importantly, Steps 1 and 3 require that Gram matrix systems be solved at each step of the iteration. Even though these matrices are well conditioned, this results in some unnecessary computational effort. Both of the above mentioned problems will be avoided by the introduction of more appropriate smoothing procedures. Smoothing procedures are studied in more detail in [3]. The more abstract multigrid algorithms which we will now introduce use generic smoothing operators in place of $\lambda_J^{-1} I$.
2 Multigrid I

In this section we develop the abstract multilevel theory. The first part generalizes the two-level method of the introduction to the multilevel case, and an analysis of the V-cycle algorithm is given under somewhat restrictive conditions. The first convergence theory for the V-cycle is given in the case in which the multilevel spaces are nested and the corresponding forms are inherited from the form on the finest space. The case in which smoothing is done only on certain subspaces is not treated here but may be found in [3]. That theory covers the case of local mesh refinements discussed in [3] in the chapter on applications. A more general setting is introduced in the next section of these notes. There, the requirements that the spaces be nested and that the forms be inherited are dropped. The W-cycle is defined, as well as the variable V-cycle preconditioner.
2.1 An Abstract V-cycle Algorithm

We consider an abstract V-cycle algorithm in this section. We shall provide a theorem which can be used to estimate the rate of convergence of this multigrid algorithm. The analysis follows that of [7] and requires the so-called full regularity and approximation assumption. The fundamental ideas of this analysis are contained in the paper [2]. However, multigrid methods are applied to many problems which do not satisfy this hypothesis. Further results may be found in [3] which give some bounds for the rate of convergence in the case of less than full regularity. The first theorem provides a uniform convergence estimate while the second gives rise to a rate of convergence which deteriorates with the number of levels. In the next section a stronger result is proved using a product representation of the error. For generality, the algorithms and theory presented in this section are given abstractly. However, the usefulness of any abstract theory depends on the possibility of verification of the hypotheses in various applications. We will indicate how these conditions are satisfied for many applications. Again, details are contained in [3].

2.2 The Multilevel Framework

In this subsection, we will set up the abstract multilevel framework. Let us consider a nested sequence of finite dimensional vector spaces
$$ M_1 \subset M_2 \subset \cdots \subset M_J. $$
In addition, let $A(\cdot,\cdot)$ and $(\cdot,\cdot)$ be symmetric positive definite bilinear forms on $M_J$. The norm corresponding to $(\cdot,\cdot)$ will be denoted by $\| \cdot \|$. We shall develop multigrid algorithms for the solution of the following problem: Given $f \in M_J$, find $u \in M_J$ satisfying
$$ A(u, \phi) = (f, \phi), \qquad \text{for all } \phi \in M_J. \qquad (12) $$
As in the previous section, we will use auxiliary operators to define the multigrid algorithms. For k = 1, . . . , J, let the operator A_k : M_k → M_k be defined by

(A_k v, φ) = A(v, φ), for all φ ∈ M_k.

The operator A_k is clearly positive definite and symmetric in both the A(·, ·) and (·, ·) inner products. Also, we define the projectors P_k : M_J → M_k and Q_k : M_J → M_k by

A(P_k v, φ) = A(v, φ), for all φ ∈ M_k,

and

(Q_k v, φ) = (v, φ), for all φ ∈ M_k.
Note that (12) can be rewritten in the above notation as

A_J u = f. (13)
We will use generic smoothing operators in our abstract multigrid algorithm. Assume that we are given linear operators R_k : M_k → M_k for k = 2, . . . , J. Denote by R_k^t the adjoint of R_k with respect to (·, ·). We shall state further conditions concerning these operators as needed in the discussion. The construction and analysis of effective smoothing operators may be found in [3].

2.3 The Abstract V-cycle Algorithm, I

We describe a V-cycle multigrid algorithm for computing the solution u of (13) by means of an iterative process. This involves recursively applying a generalization of Algorithm 1.1. Given an initial iterate u^0 ∈ M_J, we define a sequence approximating u by

u^{m+1} = Mg_J(u^m, f). (14)

Here Mg_J(·, ·) is the map of M_J × M_J into M_J defined by the following algorithm.

Algorithm 2.1 Set Mg_1(v, g) = A_1^{-1} g. Let k be greater than one and let v and g be in M_k. Assuming that Mg_{k-1}(·, ·) has been defined, we define Mg_k(v, g) as follows:
(1) Set v^1 = v + R_k^t(g − A_k v).
(2) Set v^2 = v^1 + q where q = Mg_{k-1}(0, Q_{k-1}(g − A_k v^1)).
(3) Define Mg_k(v, g) ≡ v^3 = v^2 + R_k(g − A_k v^2).

The above algorithm is a recursive generalization of Algorithm 1.1. In fact, if we take J = 2 and R_J = λ_J^{-1} I then Algorithm 2.1 coincides with Algorithm 1.1. The above algorithm replaces the solve on the (J−1)-st subspace (in Algorithm 1.1) with one recursive application of the multilevel procedure. A straightforward induction argument shows that Mg_k(·, ·) is a linear map of M_k × M_k into M_k. Moreover, it is obviously consistent. The linear operator E_J v ≡ Mg_J(v, 0) is the error reduction operator for (14). That is,

E_J(u − u^m) ≡ u − u^{m+1} = Mg_J(u, f) − Mg_J(u^m, f) = Mg_J(u − u^m, 0).

Steps 1 and 3 above are referred to as smoothing steps. Step 2 is called a coarse grid correction step. Step 3 is included so that the resulting linear multigrid operator B_J g = Mg_J(0, g) is symmetric with respect to the inner product (·, ·), and hence can be used as a preconditioner for A_J. It is straightforward to show that, with B_k g = Mg_k(0, g),

Mg_k(v, g) = v + B_k(g − A_k v).
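As a concrete illustration, the recursion of Algorithm 2.1 can be sketched in a few lines of numpy. This is a hedged sketch of the nested-inherited case only: the spaces are realized as coordinate vectors, the coarse forms are built by Galerkin restriction through linear-interpolation prolongations, (·, ·) is taken to be the Euclidean inner product so that Q_{k-1} acts as the transpose of the prolongation matrix, and the smoothers are symmetric Richardson sweeps. The 1D model Laplacian, the function names, and the parameter ω = 0.9 are illustrative assumptions, not taken from the text.

```python
import numpy as np

def laplacian(n):
    # 1D Dirichlet model problem on n interior points (illustrative choice)
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def prolongation(nc):
    # linear interpolation from nc coarse points to 2*nc + 1 fine points
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j: 2 * j + 3, j] = [0.5, 1.0, 0.5]
    return P

def build_hierarchy(J, n1=3, omega=0.9):
    # nested spaces M_1 c ... c M_J via prolongations; Galerkin coarse forms
    P, n = [None], n1
    for _ in range(J - 1):
        P.append(prolongation(n))
        n = 2 * n + 1
    A = [None] * J
    A[J - 1] = laplacian(n)
    for k in range(J - 1, 0, -1):
        A[k - 1] = P[k].T @ A[k] @ P[k]
    # Richardson smoothers R_k = (omega/lambda_k) I; symmetric, so R_k^t = R_k
    R = [omega / np.linalg.eigvalsh(Ak)[-1] * np.eye(Ak.shape[0]) for Ak in A]
    return A, P, R

def mg(k, v, g, A, P, R):
    # Algorithm 2.1, levels indexed 0..J-1 here; returns Mg_k(v, g)
    if k == 0:
        return np.linalg.solve(A[0], g)          # exact solve on coarsest space
    v1 = v + R[k].T @ (g - A[k] @ v)             # (1) pre-smoothing with R_k^t
    r = P[k].T @ (g - A[k] @ v1)                 # Q_{k-1} applied to the residual
    v2 = v1 + P[k] @ mg(k - 1, np.zeros(P[k].shape[1]), r, A, P, R)  # (2) correction
    return v2 + R[k] @ (g - A[k] @ v2)           # (3) post-smoothing with R_k
```

Iterating u^{m+1} = mg(J−1, u^m, f, ...) reduces the error at a rate bounded independently of the number of levels for this model problem, in line with the theory of this section.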
To use B_k as a preconditioner, it is often convenient to define B_k g ≡ Mg_k(0, g) directly, and the following algorithm gives a recursive definition of B_k.

Algorithm 2.2 Let B_1 = A_1^{-1}. Assuming B_{k-1} : M_{k-1} → M_{k-1} has been defined, we define B_k : M_k → M_k as follows. Let g ∈ M_k.
(1) Set v^1 = R_k^t g.
(2) Set v^2 = v^1 + q where q = B_{k-1} Q_{k-1}(g − A_k v^1).
(3) Set B_k g ≡ v^3 = v^2 + R_k(g − A_k v^2).

2.4 The Two-level Error Recurrence

We next derive a two-level error recurrence for the multigrid process defined by Algorithm 2.1, i.e., we compute an expression for the error operator E_k v = Mg_k(v, 0). Let u = A_k^{-1} g. By consistency, E_k(u − v) ≡ u − Mg_k(v, g) = Mg_k(u − v, 0). We want to express u − Mg_k(v, g) in terms of u − v. We start by considering the effect of the smoothing steps. Now v^3 of Algorithm 2.1 satisfies

u − Mg_k(v, g) = u − v^3 = (I − R_k A_k)(u − v^2) ≡ K_k(u − v^2) (15)
and v^1 satisfies

u − v^1 = (I − R_k^t A_k)(u − v) ≡ K_k^*(u − v). (16)
Note that K_k^* = (I − R_k^t A_k) is the adjoint of K_k with respect to the A(·, ·) inner product. We next consider the effect of the coarse grid correction step. Note that for φ ∈ M_i with i < k,

(Q_i A_k v, φ) = (A_k v, φ) = A(v, φ) = A(P_i v, φ) = (A_i P_i v, φ).

This means that Q_i A_k = A_i P_i. Thus, q of Step 2 is given by

q = Mg_{k-1}(0, Q_{k-1}(g − A_k v^1)) = Mg_{k-1}(0, Q_{k-1} A_k(u − v^1)) = Mg_{k-1}(0, A_{k-1} P_{k-1}(u − v^1)) = P_{k-1}(u − v^1) − E_{k-1} P_{k-1}(u − v^1).

We used consistency for the last equality. Consequently,

u − v^2 = [(I − P_{k-1}) + E_{k-1} P_{k-1}](u − v^1). (17)
Combining (15), (16) and (17) gives the two-level error recurrence

E_k = K_k[(I − P_{k-1}) + E_{k-1} P_{k-1}]K_k^*. (18)
A simple argument using mathematical induction shows that E_k is symmetric, positive semidefinite with respect to the A(·, ·) inner product, i.e.,

0 ≤ A(E_k v, v), for all v ∈ M_k.

In particular,

0 ≤ A(E_J v, v), for all v ∈ M_J. (19)
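The symmetry and positive semidefiniteness just established can be checked numerically. The following hedged sketch (the Galerkin hierarchy on a 1D model Laplacian with Richardson smoothers is an assumption of the example, not from the text) assembles B_J = Mg_J(0, ·) as a matrix following the recursion of Algorithm 2.2 and verifies that B_J is (·, ·)-symmetric and that the spectrum of B_J A_J lies in (0, 1], i.e., that E_J = I − B_J A_J is A(·, ·)-symmetric positive semidefinite with A(E_J v, v) < A(v, v).

```python
import numpy as np

def laplacian(n):
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def prolongation(nc):
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j: 2 * j + 3, j] = [0.5, 1.0, 0.5]
    return P

# nested Galerkin hierarchy (sizes 3, 7, 15) with symmetric Richardson smoothers
P = [None, prolongation(3), prolongation(7)]
A = [None, None, laplacian(15)]
for k in (2, 1):
    A[k - 1] = P[k].T @ A[k] @ P[k]
R = [None] + [0.9 / np.linalg.eigvalsh(A[k])[-1] * np.eye(A[k].shape[0]) for k in (1, 2)]

# assemble B_k = Mg_k(0, .) as a matrix, following Algorithm 2.2
B = np.linalg.inv(A[0])
for k in (1, 2):
    Id = np.eye(A[k].shape[0])
    S1 = R[k].T                                        # step (1): v1 = R_k^t g
    S2 = S1 + P[k] @ B @ P[k].T @ (Id - A[k] @ S1)     # step (2): coarse correction
    B = S2 + R[k] @ (Id - A[k] @ S2)                   # step (3): final smoothing

L, V = np.linalg.eigh(A[2])
Ah = V @ np.diag(np.sqrt(L)) @ V.T                     # A^{1/2}
spec = np.linalg.eigvalsh(Ah @ B @ Ah)                 # spectrum of B_J A_J
```

The final smoothing step with R_k (rather than R_k^t) is exactly what makes B symmetric here, as the text notes after Algorithm 2.1.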
2.5 The Braess-Hackbusch Theorem

We prove a result given by [2] in this subsection. This result was the first which showed that multigrid algorithms could be expected to converge with very weak conditions on the smoother. For this result, we require two conditions. The first is on the smoother. Let λ_k be the largest eigenvalue of A_k. Let ω be in (0, 2). For v, g ∈ M_k, Richardson's iteration for the solution of A_k v = g is defined by

v^{m+1} = v^m + ωλ_k^{-1}(g − A_k v^m).
The corresponding error reduction operator is K_{k,ω} = I − (ωλ_k^{-1})A_k. Note that K_k = (I − R_k A_k) and A(K_k v, K_k v) = A((I − R̄_k A_k)v, v), where

R̄_k = (I − K_k^* K_k)A_k^{-1} = R_k + R_k^t − R_k^t A_k R_k

is the linear smoothing operator corresponding to the symmetrized iterative process with reducer K_k^* K_k. Our condition concerning the smoothers R_k, k = 2, . . . , J, is related to Richardson's iteration with an appropriate parameter ω.

Condition (SM.1) There exists an ω ∈ (0, 1), not depending on J, such that

0 ≤ A(K_k v, K_k v) ≤ A(K_{k,ω} v, v), for all v ∈ M_k, k = 2, 3, . . . , J,

or equivalently

(ω/λ_k)(v, v) ≤ (R̄_k v, v) ≤ (A_k^{-1} v, v), for all v ∈ M_k, k = 2, 3, . . . , J.
Remark 2.1 Note that both K_k^* K_k and K_{k,ω} are symmetric with respect to A(·, ·) and positive semi-definite on M_k. Condition (SM.1) says that K_k^* K_k is less than or equal to K_{k,ω} in the sense of the A(·, ·) quadratic forms. That is, the smoother R_k results in a reduction which is at least as good as the result obtained using the Richardson smoother (ω/λ_k)I.
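The comparison in Remark 2.1 can be confirmed numerically for the Richardson smoother itself. In this hedged sketch (the tridiagonal model matrix and ω = 0.9 are assumptions of the example), R̄_k = R_k + R_k^t − R_k^t A_k R_k is formed for R_k = (ω/λ_k)I and both sides of the equivalent form of Condition (SM.1) are checked by eigenvalue computations.

```python
import numpy as np

n, omega = 50, 0.9
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # an SPD model operator A_k
lam = np.linalg.eigvalsh(A)[-1]                         # largest eigenvalue lambda_k
Rk = (omega / lam) * np.eye(n)                          # Richardson smoother R_k
Rbar = Rk + Rk.T - Rk.T @ A @ Rk                        # symmetrized smoother

# (SM.1): (omega/lambda_k)(v, v) <= (Rbar v, v) <= (A_k^{-1} v, v)
lower_margin = np.linalg.eigvalsh(Rbar - (omega / lam) * np.eye(n)).min()
upper_margin = np.linalg.eigvalsh(np.linalg.inv(A) - Rbar).min()
```

Both margins come out nonnegative. For this particular smoother one can also verify (SM.1) by hand, since R̄_k = (ω/λ_k)(2I − (ω/λ_k)A_k).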
The second condition which we will require is the so-called full regularity and approximation condition.

Condition (A.1) There is a constant C_P not depending on J such that

‖(I − P_{k-1})v‖² ≤ C_P λ_k^{-1} A(v, v), for all v ∈ M_k, k = 2, . . . , J.

In applications, the verification of this condition requires that the underlying elliptic problem satisfy full elliptic regularity. In the model problem of Section 1, P_{k-1} is the elliptic projector. The above condition is obtained by application of finite element duality techniques. We can now state the Braess-Hackbusch Theorem.

Theorem 2.1 Let Mg_J(·, ·) be defined by Algorithm 2.1. Assume that Condition (A.1) holds and that the smoothers R_k satisfy Condition (SM.1). Then the error reduction operator E_J v = Mg_J(v, 0) of Algorithm 2.1 satisfies

0 ≤ A(E_J v, v) ≤ (C_P/(C_P + ω)) A(v, v), for all v ∈ M_J.
This means that the sequence u^m defined by (14) satisfies

|||u − u^m||| ≤ (C_P/(C_P + ω))^m |||u − u^0|||.

Before proving this theorem, we state and prove the following lemma.

Lemma 2.1 Assume that Condition (SM.1) holds. Then for any v ∈ M_k,

(1/λ_k)‖A_k K_k^* v‖² ≤ (1/ω)[A(v, v) − A(K_k^* v, K_k^* v)].

Proof. Let v be in M_k. By Condition (SM.1) with v replaced by A_k K_k^* v,

(1/λ_k)‖A_k K_k^* v‖² ≤ (1/ω)(R̄_k A_k K_k^* v, A_k K_k^* v) = (1/ω)A((I − K_k^* K_k)K_k^* v, K_k^* v) = (1/ω)A((I − K̄_k)K̄_k v, v),

where K̄_k = K_k K_k^*. The operator K̄_k is symmetric and non-negative with respect to the A(·, ·) inner product. It follows from the symmetry of K̄_k that

A((I − K̄_k)K̄_k v, v) ≤ A((I − K̄_k)v, v) = A(v, v) − A(K_k^* v, K_k^* v).

Combining the above inequalities completes the proof of the lemma.

Proof of Theorem 2.1. By (19), we need only prove the upper inequality. The proof is by induction. We will show that for i = 1, . . . , J,

A(E_i v, v) ≤ δ A(v, v), for all v ∈ M_i (20)
holds for δ = C_P/(C_P + ω). Since E_1 = 0 is the trivial operator on M_1, (20) obviously holds for i = 1. Let k be greater than one and assume that (20) holds for i = k − 1. Let v be in M_k. Using the two-level recurrence formula (18) gives

A(E_k v, v) = A((I − P_{k-1})K_k^* v, K_k^* v) + A(E_{k-1} P_{k-1} K_k^* v, P_{k-1} K_k^* v).

Applying the induction hypothesis we have

A(E_k v, v) ≤ A((I − P_{k-1})K_k^* v, K_k^* v) + δ A(P_{k-1} K_k^* v, P_{k-1} K_k^* v)
= (1 − δ)A((I − P_{k-1})K_k^* v, K_k^* v) + δ A(K_k^* v, K_k^* v). (21)

By the Cauchy-Schwarz inequality and Condition (A.1),

A((I − P_{k-1})K_k^* v, K_k^* v) ≤ ‖(I − P_{k-1})K_k^* v‖ ‖A_k K_k^* v‖ ≤ [C_P λ_k^{-1} A((I − P_{k-1})K_k^* v, K_k^* v)]^{1/2} ‖A_k K_k^* v‖.

Obvious manipulations give

A((I − P_{k-1})K_k^* v, K_k^* v) ≤ (C_P/λ_k) ‖A_k K_k^* v‖². (22)

Thus, (21), (22) and Lemma 2.1 imply that

A(E_k v, v) ≤ ((1 − δ)C_P/ω)[A(v, v) − A(K_k^* v, K_k^* v)] + δ A(K_k^* v, K_k^* v) = δ A(v, v).
This completes the proof of the theorem.

Remark 2.2 The original result given by [2] was stated in terms of the particular smoother R_k = λ_k^{-1} I. They also allowed a fixed number m of applications of this smoother on each level. Theorems with variable numbers of smoothings will appear later in these notes. The results in this section are based on the two-level error recurrence for the multigrid algorithm. We refer to the survey paper [4] for a more detailed discussion of the development of multigrid methods.
3 Multigrid II: V-cycle with Less Than Full Elliptic Regularity

3.1 Introduction and Preliminaries

In this section, we provide an analysis of Algorithm 2.1 with an assumption which is weaker than Condition (A.1). To state this condition, we require scales of discrete norms. The operator A_k is symmetric and positive definite on M_k. Consequently, its fractional powers are well defined. For real s, we consider the scale of discrete norms |||·|||_{s,k} on M_k defined by

|||v|||_{s,k} = (A_k^s v, v)^{1/2}, for all v ∈ M_k.
Note that

|||v|||_{0,k} = ‖v‖ and |||v|||_{1,k} = |||v|||.

By expanding in the eigenfunctions of A_k and using the Hölder inequality, it follows easily that

|||v|||_{1+α,k} ≤ |||v|||_{2,k}^α |||v|||_{1,k}^{1−α}, 0 ≤ α ≤ 1.

Our theorem will be based on the following generalization of Condition (A.1).

Condition (A.2) There exists a number α ∈ (0, 1] and a constant C_P not depending on J such that

|||(I − P_{k-1})v|||²_{1−α,k} ≤ C_P^α λ_k^{−α} A(v, v), for all v ∈ M_k, k = 2, . . . , J.
In the model problem of Section 1, P_{k-1} is the elliptic projector. It can be shown that for this example and s ∈ [0, 1], the norm |||·|||_{s,k} is equivalent on M_k (with constants independent of J) to the Sobolev norm ‖·‖_s. The above condition can then be obtained by application of finite element duality techniques.

We will also need a geometrical growth property of λ_k which can be verified easily for elliptic finite element problems on a sequence of nested meshes. We postulate this property in the following condition.

Condition (A.3) There exist positive constants γ_1 ≤ γ_2 < 1 such that

γ_1 λ_{k+1} ≤ λ_k ≤ γ_2 λ_{k+1}, for k = 1, 2, . . . , J − 1.

When using this condition, it is convenient to introduce the symmetric matrix Λ^{(α)}, with 0 < α ≤ 1, whose entries are given by

Λ^{(α)}_{ki} = (λ_k/λ_i)^{α/2}, for k ≤ i. (23)

Denote by Λ_L^{(α)} and Λ_U^{(α)} the lower and upper triangular parts of Λ^{(α)}. Condition (A.3) then implies that Λ^{(α)}_{ki} = (λ_k/λ_i)^{α/2} ≤ γ_2^{(i−k)α/2}. Therefore,

‖Λ_L^{(α)}‖_2 = ‖Λ_U^{(α)}‖_2 ≤ γ_2^{α/2}/(1 − γ_2^{α/2}) and ‖Λ^{(α)}‖_2 ≤ (1 + γ_2^{α/2})/(1 − γ_2^{α/2}).

The last condition needed is a condition on the uniform boundedness of the operators Q_k. This is the following.
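The norm bounds for Λ^{(α)} are easy to confirm numerically. In this hedged sketch, λ_k is assumed to grow like 4^k, so that Condition (A.3) holds with γ_2 = 1/4 (roughly the behavior of λ_k for second-order problems on meshes refined by halving); the entries (λ_k/λ_i)^{α/2} are assembled and the spectral norms of the strict triangular part and of the full matrix are compared with the stated bounds.

```python
import numpy as np

J, alpha, gamma2 = 8, 1.0, 0.25
lam = 4.0 ** np.arange(1, J + 1)      # lambda_k ~ 4^k, so lambda_k/lambda_{k+1} = gamma2
K, I = np.meshgrid(np.arange(J), np.arange(J), indexing="ij")
Lam = (np.minimum(lam[K], lam[I]) / np.maximum(lam[K], lam[I])) ** (alpha / 2)
LamU = np.triu(Lam, k=1)              # strict upper triangular part Lambda_U
bound_U = gamma2 ** (alpha / 2) / (1 - gamma2 ** (alpha / 2))
bound_full = (1 + gamma2 ** (alpha / 2)) / (1 - gamma2 ** (alpha / 2))
norm_U = np.linalg.norm(LamU, 2)      # spectral norm
norm_full = np.linalg.norm(Lam, 2)
```

Here bound_U = 1 and bound_full = 3, and both computed norms stay below these values for any J.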
Condition (A.4) Let α ∈ (0, 1] be given. Then there exists a constant C_Q not depending on J such that

|||Q_k v|||²_{1−α,k} ≤ C_Q |||v|||²_{1−α,i}, for all v ∈ M_i, 1 ≤ k ≤ i ≤ J.
Before proceeding we will prove a basic lemma.

Lemma 3.1 Assume that Conditions (A.2) and (A.4) hold for the same α ∈ (0, 1] and that Condition (A.3) holds. Then

Σ_{k=1}^J |||(P_k − Q_k)v|||² ≤ C_b A(v, v), for all v ∈ M_J, (24)

A(v, v) ≤ C_c ( |||Q_1 v|||² + Σ_{k=2}^J |||(Q_k − Q_{k-1})v|||² ), for all v ∈ M_J, (25)

and

|||Q_1 v|||² + Σ_{k=2}^J λ_k ‖(Q_k − Q_{k-1})v‖² ≤ C_a A(v, v), for all v ∈ M_J. (26)

Here C_a = 4 C_Q C_P (1/(1 − γ_2^{α/2}))², C_b = C_Q C_P^α (γ_2^{α/2}/(1 − γ_2^{α/2}))², and C_c = (√C_b + √(C_b + 1))².

Proof. Let v be in M_J. By the definition of λ_k,

Σ_{k=1}^J |||(P_k − Q_k)v|||² ≤ Σ_{k=1}^J λ_k^α |||(P_k − Q_k)v|||²_{1−α,k} = Σ_{k=1}^J λ_k^α |||Q_k(P_k − I)v|||²_{1−α,k}.

Note that (I − P_k) = Σ_{i=k+1}^J (P_i − P_{i-1}). By the triangle inequality and Condition (A.4),

Σ_{k=1}^J |||(P_k − Q_k)v|||² ≤ C_Q Σ_{k=1}^J λ_k^α ( Σ_{i=k+1}^J |||(P_i − P_{i-1})v|||_{1−α,i} )².

Note that (P_i − P_{i-1}) is a projector and hence Condition (A.2) implies that

Σ_{k=1}^J |||(P_k − Q_k)v|||² ≤ C_Q C_P^α Σ_{k=1}^J ( Σ_{i=k+1}^J (λ_k/λ_i)^{α/2} |||(P_i − P_{i-1})v||| )² ≤ C_Q C_P^α ‖Λ_U^{(α)}‖²_2 Σ_{k=2}^J |||(P_k − P_{k-1})v|||².

Condition (A.3) implies that ‖Λ_U^{(α)}‖²_2 ≤ (γ_2^{α/2}/(1 − γ_2^{α/2}))². Inequality (24) now follows from the orthogonality of (P_i − P_{i-1}).

Now let v_k = (Q_k − Q_{k-1})v, with Q_0 = 0. Then Σ_{i=k+1}^J v_i = (I − Q_k)v. We have

A(v, v) = Σ_{k,i=1}^J A(v_k, v_i) = Σ_{k=1}^J A(v_k, v_k) + 2 Σ_{k=1}^J Σ_{i=k+1}^J A(v_k, v_i)
= Σ_{k=1}^J A(v_k, v_k) + 2 Σ_{k=1}^J A(v_k, (I − Q_k)v)
= Σ_{k=1}^J A(v_k, v_k) + 2 Σ_{k=1}^J A(v_k, (P_k − Q_k)v)
≤ Σ_{k=1}^J |||v_k|||² + 2 ( Σ_{k=1}^J |||v_k|||² )^{1/2} ( Σ_{k=1}^J |||(P_k − Q_k)v|||² )^{1/2}
≤ Σ_{k=1}^J |||v_k|||² + 2 ( Σ_{k=1}^J |||v_k|||² )^{1/2} [C_b A(v, v)]^{1/2}.

Elementary algebra shows that

A(v, v) ≤ C_c Σ_{k=1}^J |||v_k|||² = C_c ( |||Q_1 v|||² + Σ_{k=2}^J |||(Q_k − Q_{k-1})v|||² )

for C_c = (√C_b + √(C_b + 1))². This proves (25). For a proof of (26) see [3].

We now return to the multigrid method defined by Algorithm 2.1. In this section we continue in the framework of the previous two sections and establish additional theorems which provide bounds for the convergence rate of Algorithm 2.1. The proofs of these results depend on expressing the error propagator for the multigrid algorithm in terms of a product of operators defined on the finest space. The theorem presented here gives a uniform convergence estimate and can be used in many cases without full elliptic regularity. Several other results based on this error representation may be found in [3].
3.2 The Multiplicative Error Representation

The theory of Subsection 2.1 was based on a two-level error representation. To get strong results in the case of less than full elliptic regularity, we shall need to express the error in a different way. The fine grid error operator E_J is expressed as a product of factors associated with the smoothings on individual levels. By (18), for k > 1 and any v ∈ M_J,

(I − P_k)v + E_k P_k v = (I − P_k)v + K_k[(I − P_{k-1}) + E_{k-1} P_{k-1}]K_k^* P_k v.

Let T_k = R_k A_k P_k for k > 1 and set T_1 = P_1. The adjoint of T_k (for k > 1) with respect to the inner product A(·, ·) is given by T_k^* = R_k^t A_k P_k. Note that I − T_k and I − T_k^* respectively extend the operators K_k and K_k^* to operators defined on all of M_J. Clearly (I − P_k)v = 0 for all v ∈ M_k. In addition, P_k commutes with T_k. Thus,

(I − P_k)v + E_k P_k v = (I − T_k)[(I − P_{k-1} P_k) + E_{k-1} P_{k-1} P_k](I − T_k^*)v = (I − T_k)[(I − P_{k-1}) + E_{k-1} P_{k-1}](I − T_k^*)v.

The second equality follows from the identity P_{k-1} P_k = P_{k-1}. Repeatedly applying the above identity gives

E_J = (I − T_J)(I − T_{J-1}) · · · (I − T_1)(I − T_1^*)(I − T_2^*) · · · (I − T_J^*). (27)

Equation (27) displays the J-th level error operator as a product of factors. These factors show the precise effect of the individual grid smoothings on the fine grid error propagator. Our goal is to show that

0 ≤ A(E_J v, v) ≤ (1 − 1/C_M)A(v, v). (28)
This means that the sequence u^m defined by (14) satisfies

|||u − u^m||| ≤ (1 − 1/C_M)^m |||u − u^0|||.

Let Ẽ_0 = I and set Ẽ_k = (I − T_k)Ẽ_{k-1}, for k = 1, 2, . . . , J. By (27), E_J = Ẽ_J Ẽ_J^*, where Ẽ_J^* is the adjoint of Ẽ_J with respect to the A(·, ·) inner product. We can rewrite (28) as

A(Ẽ_J^* v, Ẽ_J^* v) ≤ (1 − 1/C_M)A(v, v), for all v ∈ M_J,

which is equivalent to

A(Ẽ_J v, Ẽ_J v) ≤ (1 − 1/C_M)A(v, v), for all v ∈ M_J.

Consequently, (28) is equivalent to

A(v, v) ≤ C_M [A(v, v) − A(Ẽ_J v, Ẽ_J v)], for all v ∈ M_J. (29)
We will establish a typical uniform convergence estimate for the multigrid algorithm by proving (29) using the above conditions. For some other results in this direction see [3].
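The product representation (27) can be verified numerically in the nested-inherited case. In the sketch below, all concrete choices (the 1D model Laplacian, Galerkin coarse forms, three levels, symmetric Richardson smoothers so that T_k^* = T_k as fine-space operators) are assumptions of the example. E_J is assembled once by running Algorithm 2.1 with zero right hand side, and once from the product of the fine-space factors I − T_k together with the A(·, ·)-adjoint, and the two matrices are compared.

```python
import numpy as np

def laplacian(n):
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def prolongation(nc):
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j: 2 * j + 3, j] = [0.5, 1.0, 0.5]
    return P

# three nested levels (sizes 3, 7, 15), Galerkin forms, symmetric Richardson smoothers
P = [None, prolongation(3), prolongation(7)]
A = [None, None, laplacian(15)]
for k in (2, 1):
    A[k - 1] = P[k].T @ A[k] @ P[k]
R = [None] + [0.9 / np.linalg.eigvalsh(A[k])[-1] * np.eye(A[k].shape[0]) for k in (1, 2)]

def mg(k, v, g):
    # Algorithm 2.1 (R_k symmetric here, so R_k^t = R_k); E_J v = mg(J, v, 0)
    if k == 0:
        return np.linalg.solve(A[0], g)
    v1 = v + R[k] @ (g - A[k] @ v)
    v2 = v1 + P[k] @ mg(k - 1, np.zeros(P[k].shape[1]), P[k].T @ (g - A[k] @ v1))
    return v2 + R[k] @ (g - A[k] @ v2)

n = 15
E_rec = np.column_stack([mg(2, e, np.zeros(n)) for e in np.eye(n)])

# composite embeddings C_k : M_k -> M_J; T_k = R_k A_k P_k as fine-space matrices
C = [P[2] @ P[1], P[2], np.eye(n)]
T = [C[0] @ np.linalg.solve(A[0], C[0].T @ A[2]),   # T_1 = P_1, the A-projector onto M_1
     C[1] @ R[1] @ C[1].T @ A[2],
     C[2] @ R[2] @ C[2].T @ A[2]]
Etil = (np.eye(n) - T[2]) @ (np.eye(n) - T[1]) @ (np.eye(n) - T[0])
E_prod = Etil @ np.linalg.solve(A[2], Etil.T @ A[2])  # E_J = Etil Etil* (A-adjoint)
```

The two assemblies agree to machine precision, which is exactly the content of (27) for this example.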
3.3 Some Technical Lemmas

We will prove in this subsection an abstract estimate which is essential to all of our estimates. For this, we will need to use the following lemmas. The first two are simply identities. Recall that R̄_k = R_k + R_k^t − R_k^t A_k R_k.

Lemma 3.2 The following identity holds:

A(v, v) − A(Ẽ_J v, Ẽ_J v) = Σ_{k=1}^J (R̄_k A_k P_k Ẽ_{k-1} v, A_k P_k Ẽ_{k-1} v). (30)

Proof. For any w ∈ M_J, a simple calculation shows that

A(w, w) − A((I − T_k)w, (I − T_k)w) = A((2I − T_k)w, T_k w).

Let v be in M_J. Taking w = Ẽ_{k-1} v in the above identity and summing gives

A(v, v) − A(Ẽ_J v, Ẽ_J v) = Σ_{k=1}^J A((2I − T_k)Ẽ_{k-1} v, T_k Ẽ_{k-1} v) = Σ_{k=1}^J (R̄_k A_k P_k Ẽ_{k-1} v, A_k P_k Ẽ_{k-1} v).
This is (30).

Lemma 3.3 Let π_k : M → M_k be a sequence of linear operators with π_J = I. Set π_0 = 0 and Ẽ_0 = I. Then the following identity holds:

A(v, v) = Σ_{k=1}^J A(Ẽ_{k-1} v, (π_k − π_{k-1})v) + Σ_{k=1}^{J−1} A(T_k Ẽ_{k-1} v, (P_k − π_k)v)
= A(P_1 v, v) + Σ_{k=2}^J A(Ẽ_{k-1} v, (π_k − π_{k-1})v) + Σ_{k=2}^{J−1} A(T_k Ẽ_{k-1} v, (P_k − π_k)v). (31)
Proof. Expanding v in terms of π_k gives

A(v, v) = Σ_{k=1}^J A(v, (π_k − π_{k-1})v) = Σ_{k=1}^J A(Ẽ_{k-1} v, (π_k − π_{k-1})v) + Σ_{k=1}^J A((I − Ẽ_{k-1})v, (π_k − π_{k-1})v).

Let F_k = I − Ẽ_k = Σ_{i=1}^k T_i Ẽ_{i-1}. Then since F_0 = 0 and π_0 = 0,

Σ_{k=1}^J A(F_{k-1} v, (π_k − π_{k-1})v) = Σ_{k=1}^J A(F_{k-1} v, π_k v) − Σ_{k=1}^{J−1} A(F_k v, π_k v)
= Σ_{k=1}^{J−1} A((F_{k-1} − F_k)v, π_k v) + A(F_{J−1} v, v).

Now F_{k-1} − F_k = −T_k Ẽ_{k-1}. Rearranging the terms gives

Σ_{k=1}^J A(F_{k-1} v, (π_k − π_{k-1})v) = −Σ_{k=1}^{J−1} A(T_k Ẽ_{k-1} v, π_k v) + Σ_{k=1}^{J−1} A(T_k Ẽ_{k-1} v, v)
= Σ_{k=1}^{J−1} A(T_k Ẽ_{k-1} v, (P_k − π_k)v).
This is the first equality in (31). Note that R_1 = A_1^{-1} and T_1 = P_1, and thus the second equality in (31) follows from the first.

Set ‖w‖_{R̄_k^{-1}} = (R̄_k^{-1} w, w)^{1/2} and ‖w‖_{R̄_k} = (R̄_k w, w)^{1/2} for all w ∈ M_k. Using (30) and (31) we can prove the following basic estimate.

Lemma 3.4 Let π_k : M → M_k be a sequence of linear operators with π_J = I. Let π_0 = 0. Then we have the following basic estimate:

A(v, v) ≤ [A(v, v) − A(Ẽ_J v, Ẽ_J v)]^{1/2} [σ_1(v)^{1/2} + σ_2(v)^{1/2}], (32)

where

σ_1(v) = A(P_1 v, v) + Σ_{k=2}^J ‖(π_k − π_{k-1})v‖²_{R̄_k^{-1}}

and

σ_2(v) = Σ_{k=2}^{J−1} ‖R_k^t A_k (P_k − π_k)v‖²_{R̄_k^{-1}}.
Proof. We will bound the right hand side of (31). By the Cauchy-Schwarz inequality, the first sum of (31) can be bounded as

A(P_1 v, v) + Σ_{k=2}^J A(Ẽ_{k-1} v, (π_k − π_{k-1})v) = A(P_1 v, v) + Σ_{k=2}^J (A_k P_k Ẽ_{k-1} v, (π_k − π_{k-1})v)
≤ ( Σ_{k=1}^J ‖A_k P_k Ẽ_{k-1} v‖²_{R̄_k} )^{1/2} ( A(P_1 v, v) + Σ_{k=2}^J ‖(π_k − π_{k-1})v‖²_{R̄_k^{-1}} )^{1/2}.

Similarly the second sum of (31) can be bounded as

Σ_{k=2}^{J−1} A(T_k Ẽ_{k-1} v, (P_k − π_k)v) = Σ_{k=2}^{J−1} (R_k A_k P_k Ẽ_{k-1} v, A_k (P_k − π_k)v)
≤ ( Σ_{k=2}^{J−1} ‖A_k P_k Ẽ_{k-1} v‖²_{R̄_k} )^{1/2} ( Σ_{k=2}^{J−1} ‖R_k^t A_k (P_k − π_k)v‖²_{R̄_k^{-1}} )^{1/2}.

Combining these two estimates and using (31) we obtain

A(v, v) ≤ ( Σ_{k=1}^J (R̄_k A_k P_k Ẽ_{k-1} v, A_k P_k Ẽ_{k-1} v) )^{1/2} [σ_1(v)^{1/2} + σ_2(v)^{1/2}].
Using (30) in the above inequality proves the lemma.

3.4 Uniform Estimates

In this subsection, we will show that the uniform convergence estimate of Theorem 2.1 often extends to the case of less than full regularity and approximation. The results are obtained by bounding the second factor in Lemma 3.4 from above by A(v, v). We shall require some conditions on the smoother which we state here. Recall that R̄_k = R_k + R_k^t − R_k^t A_k R_k, K_k = (I − R_k A_k) and K_{k,ω} = I − (ωλ_k^{-1})A_k.

Condition (SM.1) There exists a constant ω ∈ (0, 1) not depending on J such that

0 ≤ A(K_k v, K_k v) ≤ A(K_{k,ω} v, v), for all v ∈ M_k, k = 2, 3, . . . , J,

or equivalently

(ω/λ_k)(v, v) ≤ (R̄_k v, v) ≤ (A_k^{-1} v, v), for all v ∈ M_k, k = 2, 3, . . . , J.

In addition to the smoother Condition (SM.1), we will require that the smoother R_k be properly scaled. More precisely the condition is the following:

Condition (SM.2) There exists a constant θ ∈ (0, 2) not depending on J such that

A(R_k v, R_k v) ≤ θ(R_k v, v), for all v ∈ M_k, k = 2, 3, . . . , J.

This inequality is the same as each of the inequalities in the chain

(R̄_k v, v) ≥ (2 − θ)(R_k v, v) ≥ ((2 − θ)/θ)((R_k^t A_k R_k)v, v), for all v ∈ M_k.

A simple change of variable shows that Condition (SM.2) is equivalent to the following condition for T_k = R_k A_k P_k:

A(T_k v, T_k v) ≤ θ A(T_k v, v), for all v ∈ M_J, k = 2, 3, . . . , J.
Note that if the smoother R_k satisfies Condition (SM.1), then it also satisfies the inequality in Condition (SM.2) with some θ < 2, but possibly depending on k. Therefore smoothers that satisfy (SM.1) will satisfy Condition (SM.2) after a proper scaling. Our theorem uses Condition (A.2), a regularity and approximation property of P_k, Condition (A.3), a geometrical growth condition for λ_k, and Condition (A.4), a stability condition on Q_k.

Theorem 3.1 Assume that Conditions (A.2) and (A.4) hold with the same α and that Condition (A.3) holds. Assume in addition that the smoothers R_k satisfy Conditions (SM.1) and (SM.2). Then

0 ≤ A(E_J v, v) ≤ (1 − 1/C_M)A(v, v), for all v ∈ M_J.

Here C_M = [ (θC_b/(2 − θ))^{1/2} + (1 + C_a/ω)^{1/2} ]², with

C_a = 4 C_Q C_P (1/(1 − γ_2^{α/2}))² and C_b = C_Q C_P^α (γ_2^{α/2}/(1 − γ_2^{α/2}))².
Proof. The theorem is proved by bounding the second factor on the right hand side of (32) from above by A(v, v). By Lemma 3.1,

Σ_{k=1}^J |||(P_k − Q_k)v|||² ≤ C_b A(v, v), for all v ∈ M_J,

and

|||Q_1 v|||² + Σ_{k=2}^J λ_k ‖(Q_k − Q_{k-1})v‖² ≤ C_a A(v, v), for all v ∈ M_J.

By Condition (SM.1), (R̄_k^{-1} w, w) ≤ ω^{-1} λ_k ‖w‖², for all w ∈ M_k, k ≥ 2. Hence,

A(P_1 v, v) + Σ_{k=2}^J ‖(Q_k − Q_{k-1})v‖²_{R̄_k^{-1}} ≤ (1 + C_a/ω) A(v, v).

Condition (SM.2) implies that (R_k R̄_k^{-1} R_k^t w, w) ≤ (θ/(2 − θ))(A_k^{-1} w, w) for w ∈ M_k. Hence,

Σ_{k=2}^J ‖R_k^t A_k (P_k − Q_k)v‖²_{R̄_k^{-1}} ≤ (θ/(2 − θ)) Σ_{k=2}^J |||(P_k − Q_k)v|||² ≤ (θC_b/(2 − θ)) A(v, v).

Combining these two estimates and applying Lemma 3.4, with π_k = Q_k, shows that

A(v, v) ≤ C_M [A(v, v) − A(Ẽ_J v, Ẽ_J v)].

The theorem follows.
4 Non-nested Multigrid

4.1 Non-nested Spaces and Varying Forms

There is not a lot in the literature on this subject. Since one of the lectures was devoted to this more general point of view, and an extensive treatment was given in [3], the material presented here is largely taken from [3]. We therefore depart from the framework of the previous sections, where we considered only nested sequences of finite dimensional spaces. More generally, assume that we are given a sequence of finite dimensional spaces M_1, M_2, . . . , M_J = M. Each space M_k is equipped with an inner product (·, ·)_k, and we denote by ‖·‖_k the induced norm. In addition, we assume that we are given symmetric and positive definite forms A_k(·, ·) defined on M_k × M_k and set A(·, ·) = A_J(·, ·). Each form A_k(·, ·) induces an operator A_k : M_k → M_k defined by

(A_k w, φ)_k = A_k(w, φ), for all φ ∈ M_k.

Denote by λ_k the largest eigenvalue of A_k. These spaces are connected through J − 1 linear operators I_k : M_{k-1} → M_k, for k = 2, . . . , J. These operators are often referred to as prolongation operators. The operators Q_{k-1} : M_k → M_{k-1} and P_{k-1} : M_k → M_{k-1} are defined by

(Q_{k-1} w, φ)_{k-1} = (w, I_k φ)_k, for all φ ∈ M_{k-1},

and

A_{k-1}(P_{k-1} w, φ) = A_k(w, I_k φ), for all φ ∈ M_{k-1}.

Finally, we assume that smoothers R_k : M_k → M_k are given and set

R_k^{(ℓ)} = R_k if ℓ is odd, R_k^t if ℓ is even.
In the case discussed in Section 2.1, M_k ⊂ M_{k+1}, (·, ·)_k = (·, ·), A_k(·, ·) = A(·, ·), I_k = I, and Q_k and P_k are projectors with respect to (·, ·) and A(·, ·) respectively. Here Q_k and P_k are not necessarily projectors. Note that the relationship A_{k-1} P_{k-1} = Q_{k-1} A_k still holds.

4.2 General Multigrid Algorithms

Given f ∈ M_J, we are interested in solving

A_J u = f.

With u^0 given, we will consider the iterative algorithm

u^i = Mg_J(u^{i-1}, f) ≡ u^{i-1} + B_J(f − A_J u^{i-1}),

where B_J : M → M is defined recursively by the following general multigrid procedure.

Algorithm 4.1 Let p be a positive integer and let m_k be a positive integer depending on k. Set B_1 = A_1^{-1}. Assuming that B_{k-1} : M_{k-1} → M_{k-1} has been defined, we define B_k : M_k → M_k as follows. Let g ∈ M_k.
(1) Pre-smoothing: Set x^0 = 0 and define x^ℓ, ℓ = 1, . . . , m_k, by

x^ℓ = x^{ℓ−1} + R_k^{(ℓ+m_k)}(g − A_k x^{ℓ−1}).

(2) Correction: y^{m_k} = x^{m_k} + I_k q^p, where q^0 = 0 and q^i for i = 1, . . . , p is defined by

q^i = q^{i−1} + B_{k-1}[Q_{k-1}(g − A_k x^{m_k}) − A_{k-1} q^{i−1}].

(3) Post-smoothing: Define y^ℓ for ℓ = m_k + 1, . . . , 2m_k by

y^ℓ = y^{ℓ−1} + R_k^{(ℓ+m_k)}(g − A_k y^{ℓ−1}).

(4) B_k g = y^{2m_k}.

The cases p = 1 and p = 2 correspond to the V-cycle and the W-cycle multigrid algorithms respectively. The case p = 1 with the number of smoothings, m_k, varying is known as the variable V-cycle multigrid algorithm. In addition to the V-cycle and the W-cycle multigrid algorithms with the number of smoothings the same on each level, we will consider a variable V-cycle multigrid algorithm in which we will assume that the number of smoothings increases geometrically as k decreases. More precisely, in such a case we assume that there exist two constants β_0 and β_1 with 1 < β_0 ≤ β_1 such that

β_0 m_k ≤ m_{k-1} ≤ β_1 m_k.
Note that the case β_0 = β_1 = 2 corresponds to doubling the number of smoothings as we proceed from M_k to M_{k-1}, i.e., m_k = 2^{J−k} m_J. Our aim is to study the error reduction operator E_k = I − B_k A_k and provide conditions under which we can estimate δ_k between zero and one such that

|A_k(E_k u, u)| ≤ δ_k A_k(u, u), for all u ∈ M_k.

We also show that the operator B_k corresponding to the variable V-cycle multigrid algorithm provides a good preconditioner for A_k even in the cases where E_k is not a reducer.

We first derive a recurrence relation. Let K_k = I − R_k A_k, K_k^* = I − R_k^t A_k, and set

K_k^{(m)} = (K_k^* K_k)^{m/2} if m is even, and K_k^{(m)} = (K_k^* K_k)^{(m−1)/2} K_k^* if m is odd.

By the definition of B_k the following two-level recurrence relation is easily derived:

E_k = (K_k^{(m_k)})^* [I − I_k P_{k-1} + I_k E_{k-1}^p P_{k-1}] K_k^{(m_k)}. (33)

Our main assumption relating the spaces M_k and M_{k-1} is the following regularity-approximation property.

Condition (A.2'') For some α with 0 < α ≤ 1 there exists C_P independent of k such that

|A_k((I − I_k P_{k-1})v, v)| ≤ C_P^{2α} (‖A_k v‖²_k/λ_k)^α [A_k(v, v)]^{1−α}.

The so-called variational assumption refers to the case in which all of the forms A_k(·, ·) are defined in terms of the form A_J(·, ·) by

A_{k-1}(v, v) = A_k(I_k v, I_k v), for all v ∈ M_{k-1}. (34)

We call this case the inherited case. Given the operators I_k, all of the forms then come from A(·, ·). If the spaces are nested, then I_k can be chosen to be the natural injection operator and the forms can be defined by

A_k(v, v) = A(v, v), for all v ∈ M_k.

We call this case the nested-inherited case. It follows that in the nested-inherited case, Condition (A.2) implies Condition (A.2''). In general, (34) does not hold in the nonnested case. We will consider, at first, a weaker condition than (34).

Condition (I.1) For k = 2, . . . , J, the operators I_k satisfy

A_k(I_k v, I_k v) ≤ A_{k-1}(v, v), for all v ∈ M_{k-1}.

It follows from the definition of P_{k-1} that Condition (I.1) holds if and only if
A_{k-1}(P_{k-1} v, P_{k-1} v) ≤ A_k(v, v), for all v ∈ M_k,

and if and only if

A_k((I − I_k P_{k-1})v, v) ≥ 0, for all v ∈ M_k.

Lemma 4.1 If Condition (I.1) holds then the error propagator E_k is symmetric and positive semidefinite with respect to A_k(·, ·), i.e.,

A_k(E_k v, v) = A_k((I − B_k A_k)v, v) ≥ 0, for all v ∈ M_k.

Proof. Since A_k(I_k P_{k-1} w, v) = A_{k-1}(P_{k-1} w, P_{k-1} v), the operator I_k P_{k-1} is symmetric with respect to A_k(·, ·). Hence by induction, since E_1 = I − B_1 A_1 = 0, it follows, using (33), that I − B_k A_k is symmetric and positive semidefinite.

The assumption on the smoother is the same as in Section 2.1; we restate it here.

Condition (SM.1) There exists ω > 0 not depending on J such that

(ω/λ_k)‖v‖²_k ≤ (R̄_k v, v)_k, for all v ∈ M_k, k = 2, . . . , J.

This is a condition local to M_k and hence does not depend on whether or not the spaces M_k are nested. The following lemma is a generalization of Lemma 2.1.

Lemma 4.2 If Condition (SM.1) holds, then

‖A_k K_k^{(m)} v‖²_k/λ_k ≤ (ω^{-1}/m)[A_k(v, v) − A_k(K_k^{(m)} v, K_k^{(m)} v)].

Proof. Let ṽ = K_k^{(m)} v. By Condition (SM.1),

‖A_k ṽ‖²_k/λ_k ≤ ω^{-1} A_k(R̄_k A_k ṽ, ṽ) = ω^{-1} A_k((I − K_k^* K_k)ṽ, ṽ).

Suppose m is even. Then ṽ = (K_k^* K_k)^{m/2} v. Set K̄_k = K_k^* K_k. Then

‖A_k ṽ‖²_k/λ_k ≤ ω^{-1} A_k((I − K̄_k)K̄_k^m v, v) ≤ (ω^{-1}/m) A_k((I − K̄_k^m)v, v) = (ω^{-1}/m)[A_k(v, v) − A_k(ṽ, ṽ)].

For m odd, we set K̄_k = K_k K_k^* and the result follows.
4.3 Multigrid V-cycle as a Reducer

We will provide an estimate for the convergence rate of the V-cycle and the variable V-cycle multigrid algorithms under Conditions (A.2''), (I.1) and (SM.1). It follows from Lemma 4.1 that if Condition (I.1) holds, then A_k(E_k u, u) ≥ 0. Hence our aim in such a case is to estimate δ_k between zero and one, where δ_k is such that

0 ≤ A_k((I − B_k A_k)u, u) ≤ δ_k A_k(u, u), for all u ∈ M_k.

Theorem 4.1 (V-cycle) Assume that Conditions (A.2'') and (I.1) hold and that the smoothers satisfy Condition (SM.1). Let p = 1. Then

0 ≤ A_k(E_k v, v) ≤ δ_k A_k(v, v), for all v ∈ M_k, (35)

with 0 ≤ δ_k < 1 given in each case as follows:
(a) If p = 1 and m_k = m for all k, then (35) holds with

δ_k = M k^{(1−α)/α} / (M k^{(1−α)/α} + m^α) with M = e^{1−α}(αC_P²/ω + 1) − 1.

(b) If p = 1 and m_k increases as k decreases, then (35) holds with

δ_k = 1 − (1/(1 + αC_P²/(ω m_k^α))) ∏_{i=2}^k (1 − (1 − α)/m_i^α).

In particular, if there exist two constants β_0 and β_1 with 1 < β_0 ≤ β_1 such that β_0 m_k ≤ m_{k-1} ≤ β_1 m_k, then δ_k ≤ M/(M + m_k^α) for M sufficiently large.
Proof. By Lemma 4.1, the lower estimate in (35) follows from Condition (I.1). For the upper estimate, we claim that Conditions (A.2'') and (SM.1) imply that

A_k(E_k v, v) ≤ δ_k(τ) A_k(v, v), for all v ∈ M_k, (36)

for τ > 0 if 0 < α < 1 and for τ ≥ max_i(1/m_i^α) if α = 1. Here δ_k(τ) is defined by

1 − δ_k(τ) = (1/(1 + (αC_P²/ω)τ)) ∏_{i=2}^k (1 − (1 − α)(τ m_i)^{−α/(1−α)}). (37)

From this the theorem will follow by choosing an appropriate τ.

We shall prove (36) by induction. Since E_1 = I − B_1 A_1 = 0, (36) holds for k = 1. Assume that

A_{k-1}(E_{k-1} v, v) ≤ δ_{k-1}(τ) A_{k-1}(v, v), for all v ∈ M_{k-1}.

Let ṽ = K_k^{(m_k)} v. Using the two-level recurrence relation (33), we have
Ak (Ek v, v) ≡ Ak ((I − Bk Ak )v, v) = Ak ((I − Ik Pk−1 )˜ v , v˜) + Ak−1 (Ek−1 Pk−1 v˜, Pk−1 v˜) ≤ Ak ((I − Ik Pk−1 )˜ v , v˜) + δk−1 (τ )Ak−1 (Pk−1 v˜, Pk−1 v˜) = Ak ((I − Ik Pk−1 )˜ v , v˜) + δk−1 (τ )Ak (Ik Pk−1 v˜, v˜) = [1 − δk−1 (τ )] Ak ((I − Ik Pk−1 )˜ v , v˜) + δk−1 (τ )Ak (˜ v , v˜). By Conditions (A.2”) and (SM.1) and using Lemma 4.2, α Ak v˜2k 2α Ak ((I − Ik Pk−1 )˜ v , v˜) ≤ CP [Ak (˜ v , v˜)]1−α λk CP2α v , v˜)]1−α . ≤ [Ak (v, v) − Ak (˜ v , v˜)]α [Ak (˜ (ωmk )α We now put things together to get Ak (Ek v, v) ≤ [1−δk−1 (τ )]
CP2α (|||v||| 2 −|||˜ v ||| 2 )α (|||˜ v ||| 2 )1−α +δk−1 (τ )|||˜ v ||| 2 . (ωmk )α
Let x = |||˜ v ||| 2 /|||v||| 2 . Then Ak (Ek v, v) ≤ f (δk−1 (τ ); x, mk )Ak (v, v), where f is defined by f (δ; x, m) = (1 − δ)
CP2α (1 − x)α x1−α + δx, (ωm)α
0 ≤ x < 1.
(38)
By the H¨older inequality, we have α CP2α (1 − x)α x1−α ≤ (αCP2 /ω)τ (1 − x) + (1 − α)(τ m)− 1−α x, (ωm)α
for all τ > 0 if 0 < α < 1 and for all τ ≥ 1/mα if α = 1. Hence ! " α f (δ; x, m) ≤ (δ; x) ≡ δx + (1 − δ) (αCP2 /ω)τ (1 − x) + (1 − α)(τ m)− 1−α x . Note that (δ; x) is linear in x and therefore f (δ; x, m) ≤ (δ; x) ≤ max((δ; 0), (δ; 1)). If
2 (αCP /ω)τ 2 /ω)τ 1+(αCP
≤ δ < 1, then
" ! α (δ; 0) = (1 − δ)(αCP2 /ω)τ ≤ δ ≤ δ + (1 − δ) (1 − α)(τ m)− 1−α = (δ; 1), and hence
Multilevel Methods in Finite Elements
127
" ! α f (δ; x, m) ≤ (δ; 1) = δ + (1 − δ) (1 − α)(τ m)− 1−α . In particular, since
2 (αCP /ω)τ 2 /ω)τ 1+(αCP
= δ1 (τ ) ≤ δk−1 (τ ) < 1,
f (δk−1 (τ ); x, mk ) ≤ δk−1 (τ ) + [1 − δk−1 (τ )] [(1 − α)(τ mk )− 1−α ] ≡ δk (τ ). α
As a consequence, Ak (Ek v, v) ≤ f (δk−1 (τ ); x, mk ) Ak (v, v) ≤ δk (τ ) Ak (v, v) holds for any τ > 0 if 0 < α < 1 and for any τ ≥ maxi (1/mi ) if α = 1. This proves (36). If mk ≡ m, then setting τ = k (1−α)/α /mα in (37) shows that δk (τ ) ≤
Mk Mk
1−α α
1−α α
+ mα
with M = e1−α (αCP2 /ω + 1) − 1 ≤ CP2 /ω + (e1−α − 1). This proves part (a) of the theorem. Part (b) of the theorem follows by setting τ = m−α in (37) and notk ing that the product term is uniformly bounded from below if mk increases geometrically as k decreases. Remark 1. Notice that, for p = 1 and mk = m, δk → 1 as k → ∞. This is in contrast to the results in the nested-inherited case under similar hypotheses. The deterioration, however is only like a power of k and hence may not be too serious. Remark 2. For the variable V-cycle methods, if mk increases geometrically as k decreases, then clearly the product term is strictly larger than 0 and thus δk is strictly less than 1, independently of k. So, e.g., if mJ = 1 we have 0 ≤ AJ (EJ v, v) ≤
M AJ (v, v), M +1
for all v ∈ MJ .
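Remarks 1 and 2 can be illustrated by iterating the level-by-level recursion δ_k = δ_{k−1} + (1 − δ_{k−1})(1 − α)(τ m_k)^{−α/(1−α)} from the proof above. The following sketch uses illustrative values α = 1/2 and αC_P²/ω = 1 (hypothetical choices, not fixed by the text) and compares a constant number of smoothings with a geometrically increasing one:

```python
# Iterate delta_k = delta_{k-1} + (1 - delta_{k-1})*(1-alpha)*(tau*m_k)**(-alpha/(1-alpha)),
# comparing constant smoothing m_k = 1 (Remark 1: delta_k creeps toward 1) with the
# variable V-cycle choice m_k = 2**(J-k) (Remark 2: delta_k stays away from 1).
# alpha = 1/2 and C = alpha*C_P^2/omega = 1 are illustrative values only.

def delta_bound(smoothings, tau, alpha=0.5, C=1.0):
    """Return delta_1, ..., delta_J for the given smoothing numbers m_2, ..., m_J."""
    delta = C * tau / (1.0 + C * tau)          # delta_1(tau)
    out = [delta]
    for m in smoothings:                       # m = m_k for k = 2, ..., J
        delta = delta + (1.0 - delta) * (1.0 - alpha) * (tau * m) ** (-alpha / (1.0 - alpha))
        out.append(delta)
    return out

J = 12
constant = delta_bound([1] * (J - 1), tau=1.0)               # m_k = 1 on every level
variable = delta_bound([2 ** (J - k) for k in range(2, J + 1)], tau=1.0)

print(constant[-1])   # close to 1 for large J (deterioration of Remark 1)
print(variable[-1])   # bounded away from 1, uniformly in J (Remark 2)
```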
If m_k = 2^{J−k} m_J, then the cost per iteration of the variable V-cycle multigrid algorithm is comparable to that of the W-cycle multigrid algorithm.

4.4 Multigrid W-cycle as a Reducer

We now provide an estimate for the rate of convergence of the W-cycle multigrid algorithm.

Theorem 4.2 (W-cycle) Assume that Conditions (A.2”) and (I.1) hold and that the smoothers satisfy Condition (SM.1). Let p = 2 and m_k = m for all k. Then
128
James H. Bramble
0 ≤ A_k(E_k v, v) ≤ δ A_k(v, v), for all v ∈ M_k,

with δ = M/(M + m^α) and M sufficiently large but independent of k.
Proof. By Lemma 4.1, the lower estimate follows from Condition (I.1). We obtain the upper estimate inductively. Since E₁ = 0, the estimate holds for k = 1. Assume that

A_{k−1}(E_{k−1} v, v) ≤ δ A_{k−1}(v, v), for all v ∈ M_{k−1}.

Then, since A_{k−1}(E_{k−1} v, v) ≥ 0, we have

A_{k−1}(E²_{k−1} v, v) ≤ δ² A_{k−1}(v, v), for all v ∈ M_{k−1}.

Let ṽ = K_k^{(m)} v. Using the two-level recurrence (33), we obtain for p = 2

A_k(E_k v, v) ≡ A_k((I − B_k A_k)v, v)
 = A_k((I − I_k P_{k−1})ṽ, ṽ) + A_{k−1}(E²_{k−1} P_{k−1}ṽ, P_{k−1}ṽ)
 ≤ A_k((I − I_k P_{k−1})ṽ, ṽ) + δ² A_{k−1}(P_{k−1}ṽ, P_{k−1}ṽ)
 = A_k((I − I_k P_{k−1})ṽ, ṽ) + δ² A_k(I_k P_{k−1}ṽ, ṽ)
 = (1 − δ²) A_k((I − I_k P_{k−1})ṽ, ṽ) + δ² A_k(ṽ, ṽ).

By Conditions (A.2”) and (SM.1), and using Lemma 4.2,

A_k((I − I_k P_{k−1})ṽ, ṽ) ≤ C_P^{2α} (‖A_k ṽ‖_k²/λ_k)^α [A_k(ṽ, ṽ)]^{1−α}
 ≤ (C_P^{2α}/(ωm)^α) [A_k(v, v) − A_k(ṽ, ṽ)]^α [A_k(ṽ, ṽ)]^{1−α}.

We now put things together to get

A_k(E_k v, v) ≤ (1 − δ²)(C_P^{2α}/(ωm)^α)(|||v|||² − |||ṽ|||²)^α (|||ṽ|||²)^{1−α} + δ² |||ṽ|||².

Let x = |||ṽ|||²/|||v|||². Then A_k(E_k v, v) ≤ f(δ²; x, m) A_k(v, v), with f(δ; x, m) defined by (38). The theorem will follow if we can show that

f(δ²; x, m) ≤ δ, for δ = M/(M + m^α) with M sufficiently large. (39)

We now prove (39). We have shown in the proof of Theorem 4.1 that
f(δ²; x, m) ≤ max(ℓ(δ²; 0), ℓ(δ²; 1)), for all τ > 0.
Here

ℓ(δ; x) ≡ δx + (1 − δ)[(αC_P²/ω)τ(1 − x) + (1 − α)(τm)^{−α/(1−α)} x].

It thus suffices to show that there exists a number δ with 0 < δ < 1 such that, with an appropriately chosen τ (with the possible restriction τ ≥ 1/m for α = 1),

ℓ(δ²; 0) = (1 − δ²)(αC_P²/ω)τ ≤ δ

and

ℓ(δ²; 1) = δ² + (1 − δ²)(1 − α)(τm)^{−α/(1−α)} ≤ δ.

These two inequalities can be written as

[(1 + δ)(1 − α) / (δ m^{α/(1−α)})]^{(1−α)/α} ≤ τ ≤ [δ/(1 − δ²)] [1/(αC_P²/ω)].

This is equivalent to choosing a δ ∈ (0, 1) so that

(1 + δ)(1 − δ)^α / δ ≤ m^α / [(αC_P²/ω)^α (1 − α)^{1−α}] ≡ m^α/C_α

holds uniformly in m ≥ 1. Clearly, we can take δ = M/(M + m^α) with M large enough that the above inequality holds; for example, take

M ≥ ½ (4C_α)^{1/α} = ½ · 4^{1/α} (αC_P²/ω)(1 − α)^{(1−α)/α}.

We have thus proved (39) and hence the theorem.
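The sufficiency of δ = M/(M + m^α) can be spot-checked numerically: for sample values of α and C_P²/ω (illustrative choices, not fixed by the lecture), one verifies that M = ½·4^{1/α}(αC_P²/ω)(1 − α)^{(1−α)/α} makes (1 + δ)(1 − δ)^α/δ ≤ m^α/C_α for a range of m ≥ 1:

```python
# Spot-check (1 + d)(1 - d)**alpha / d <= m**alpha / C_alpha with
# d = M/(M + m**alpha), C_alpha = (alpha*CP2/omega)**alpha * (1-alpha)**(1-alpha),
# and M = 0.5 * 4**(1/alpha) * (alpha*CP2/omega) * (1-alpha)**((1-alpha)/alpha).
# The sampled parameter values are illustrative, not taken from the lecture.

def check(alpha, cp2_over_omega, m_values):
    c = alpha * cp2_over_omega
    C_alpha = c ** alpha * (1.0 - alpha) ** (1.0 - alpha)
    M = 0.5 * 4.0 ** (1.0 / alpha) * c * (1.0 - alpha) ** ((1.0 - alpha) / alpha)
    ok = True
    for m in m_values:
        d = M / (M + m ** alpha)
        lhs = (1.0 + d) * (1.0 - d) ** alpha / d
        rhs = m ** alpha / C_alpha
        ok = ok and lhs <= rhs
    return ok

results = [check(alpha, c, range(1, 200))
           for alpha in (0.25, 0.5, 0.75) for c in (0.5, 1.0, 4.0)]
print(results)
```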
In many applications, Condition (I.1) is not valid. We note that Condition (I.1) is used in Lemma 4.1 to prove A_k(E_k v, v) ≡ A_k((I − B_k A_k)v, v) ≥ 0. Without (I.1), we have to prove

|A_k(E_k v, v)| ≤ δ A_k(v, v), for some δ < 1.
It is sufficient to assume either that the number of smoothings, m, is sufficiently large (but independent of k), or that the following stability condition on I_k holds.

Condition (I.2) For k = 2, . . . , J, the operators I_k satisfy

A_k(I_k v, I_k v) ≤ 2 A_{k−1}(v, v), for all v ∈ M_{k−1}.
Theorem 4.3 (W-cycle) Suppose that Condition (A.2”) holds and that the smoothers, Rk , satisfy Condition (SM.1). Let p = 2 and mk = m for all k. Assume that either (a) Condition (I.2) holds or (b) the number of smoothings, m, is sufficiently large.
Then

|A_k(E_k v, v)| ≤ δ A_k(v, v), for all v ∈ M_k, (40)

with δ = M/(M + m^α) and M sufficiently large.
Proof. We proceed by induction. Let δ be defined as in Theorem 4.2. Since E₁ = I − B₁A₁ = 0, (40) holds for k = 1. Assume that

|A_{k−1}(E_{k−1} v, v)| ≤ δ A_{k−1}(v, v), for all v ∈ M_{k−1}.

Then

0 ≤ A_{k−1}(E²_{k−1} v, v) ≤ δ² A_{k−1}(v, v), for all v ∈ M_{k−1}.

We can prove, in a way similar to the proof of the previous theorem, that A_k(E_k v, v) ≤ δ A_k(v, v), with δ = M/(M + m^α) given by the previous theorem. We now show that −A_k(E_k v, v) ≤ δ A_k(v, v), with the same δ.

We first consider the case in which m is sufficiently large. Note that E_k² is always symmetric positive semidefinite. The two-level recurrence relation (33), with p = 2, implies that

−A_k(E_k v, v) ≤ −A_k((I − I_k P_{k−1})ṽ, ṽ) ≤ (C_P^{2α}/(ωm)^α) A_k(v, v).

Without loss of generality, we assume that M > C_P^{2α}/ω^α. Then we choose m so large that C_P^{2α}/(ωm)^α ≤ δ = M/(M + m^α), and then (40) follows in this case.

Now if Condition (I.2) holds, then

−A_k((I − I_k P_{k−1})ṽ, ṽ) ≤ A_k(ṽ, ṽ).

This implies that for any δ ∈ (0, 1),

−A_k(E_k v, v) ≤ (1 − δ²)|A_k((I − I_k P_{k−1})ṽ, ṽ)| + δ² A_k(ṽ, ṽ).

This is the same bound as for A_k(E_k v, v). Thus the proof proceeds as before to show that |A_k(E_k v, v)| ≤ δ A_k(v, v), for δ = M/(M + m^α) with M sufficiently large. This proves the theorem.
4.5 Multigrid V-cycle as a Preconditioner

We now consider using the multigrid operator B_k as a preconditioner for A_k. Any symmetric positive definite operator can be used as a preconditioner for A_k; we will discuss the conditions under which the multigrid operator B_k is a good one. Theorem 4.1 states that if Conditions (A.2”) and (I.1) hold and the smoothers satisfy Condition (SM.1), then

0 ≤ A_k(E_k v, v) ≡ A_k((I − B_k A_k)v, v) ≤ δ_k A_k(v, v),

which is equivalent to

(1 − δ_k) A_k(v, v) ≤ A_k(B_k A_k v, v) ≤ A_k(v, v). (41)

Therefore, in such a case, B_k is an optimal preconditioner for A_k. In many situations, Condition (I.1) is not satisfied. In such a case A_k(E_k v, v) could become negative, and thus E_k may not be a reducer. For the V-cycle and the variable V-cycle multigrid algorithms, we have proved, without using Condition (I.1), that A_k(E_k v, v) ≤ δ_k A_k(v, v), which is equivalent to the lower estimate in (41). Condition (I.1) is only used to get the upper estimate in (41), and we do not need the upper estimate in (41) in order to use B_k as a preconditioner. For a good preconditioner, we want to find numbers η_k and η̄_k such that

η_k A_k(v, v) ≤ A_k(B_k A_k v, v) ≤ η̄_k A_k(v, v), for all v ∈ M_k. (42)
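The payoff of a two-sided bound like (42) is a condition-number estimate κ(B_kA_k) ≤ η̄_k/η_k, which controls the iteration count of preconditioned conjugate gradients through the standard estimate ‖e_n‖_A ≤ 2((√κ − 1)/(√κ + 1))ⁿ ‖e_0‖_A. A small illustration (the values of η_k and η̄_k below are hypothetical):

```python
# Given eta <= A(B A v, v)/A(v, v) <= eta_bar as in (42), the preconditioned system
# has condition number kappa <= eta_bar/eta, and the standard CG estimate
#   ||e_n||_A <= 2 * ((sqrt(kappa)-1)/(sqrt(kappa)+1))**n * ||e_0||_A
# bounds the iterations needed for a given error reduction.  The sample values
# of eta and eta_bar are hypothetical.
import math

def cg_iterations(eta, eta_bar, tol=1e-8):
    kappa = eta_bar / eta
    rho = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    if rho == 0.0:
        return 1
    # smallest n with 2 * rho**n <= tol
    return math.ceil(math.log(tol / 2.0) / math.log(rho))

print(cg_iterations(0.5, 1.0))    # kappa = 2: a handful of iterations
print(cg_iterations(0.01, 1.0))   # kappa = 100: substantially more
```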
We will provide estimates for η_k and η̄_k for the variable V-cycle multigrid operator under the Conditions (SM.1) and (A.2”). Note again that (SM.1) and (A.2”) imply the lower estimate in (41), and hence the lower estimate of (42) follows with η_k ≥ 1 − δ_k.

Theorem 4.4 (V-cycle preconditioner) Let p = 1. Assume that Condition (A.2”) holds and that the smoothers, R_k, satisfy Condition (SM.1). Then (42) holds with

η_k = [ωm_k^α/(αC_P² + ωm_k^α)] ∏_{i=2}^{k} (1 − (1 − α)/m_i^α) and η̄_k = ∏_{i=2}^{k} (1 + δ̄_i),

where δ̄_i = C_P^{2α} α^α (1 − α)^{1−α}/(ω^α m_i^α). In particular, if the number of smoothings, m_k, satisfies

β₀ m_k ≤ m_{k−1} ≤ β₁ m_k for some 1 < β₀ ≤ β₁ independent of k,

then

η_k ≥ m_k^α/(M + m_k^α) and η̄_k ≤ 1 + M/m_k^α = (M + m_k^α)/m_k^α

for some M.
Proof. We have shown in the proof of Theorem 4.1 that Conditions (A.2”) and (SM.1) imply

A_k((I − B_k A_k)v, v) = A_k(E_k v, v) ≤ δ_k A_k(v, v).

Consequently, (1 − δ_k) A_k(v, v) ≤ A_k(B_k A_k v, v), and the lower estimate in (42) holds with

η_k = 1 − δ_k = [1/(αC_P²/(ωm_k^α) + 1)] ∏_{i=2}^{k} (1 − (1 − α)/m_i^α).

We prove the upper estimate in (42) by induction. For k = 1 there is nothing to prove. Assume (42) is true for k − 1. Then

−A_{k−1}((I − B_{k−1}A_{k−1})w, w) ≤ (η̄_{k−1} − 1) A_{k−1}(w, w), for all w ∈ M_{k−1}.

Set ṽ = K_k^{(m_k)} v. By the induction hypothesis,

−A_k((I − B_k A_k)v, v)
 = −A_k((I − I_k P_{k−1})ṽ, ṽ) − A_{k−1}((I − B_{k−1}A_{k−1})P_{k−1}ṽ, P_{k−1}ṽ)
 ≤ −A_k((I − I_k P_{k−1})ṽ, ṽ) + (η̄_{k−1} − 1) A_{k−1}(P_{k−1}ṽ, P_{k−1}ṽ)
 = −η̄_{k−1} A_k((I − I_k P_{k−1})ṽ, ṽ) + (η̄_{k−1} − 1) A_k(ṽ, ṽ)
 ≤ −η̄_{k−1} A_k((I − I_k P_{k−1})ṽ, ṽ) + (η̄_{k−1} − 1) A_k(v, v). (43)

It remains to estimate −A_k((I − I_k P_{k−1})ṽ, ṽ). By Condition (A.2”),

−A_k((I − I_k P_{k−1})ṽ, ṽ) ≤ C_P^{2α} (‖A_k ṽ‖_k²/λ_k)^α [A_k(ṽ, ṽ)]^{1−α}.

By Lemma 4.2,

‖A_k ṽ‖_k²/λ_k ≤ (1/(ωm_k)) [A_k(v, v) − A_k(ṽ, ṽ)].

Hence, since 0 ≤ A_k(ṽ, ṽ) ≤ A_k(v, v), we have that

−A_k((I − I_k P_{k−1})ṽ, ṽ) ≤ (C_P^{2α}/(ωm_k)^α) [A_k(v, v) − A_k(ṽ, ṽ)]^α [A_k(ṽ, ṽ)]^{1−α}
 ≤ (C_P^{2α}/(ωm_k)^α) α^α (1 − α)^{1−α} A_k(v, v) = δ̄_k A_k(v, v). (44)

Combining (43) and (44), we obtain

−A_k((I − B_k A_k)v, v) ≤ [η̄_{k−1} δ̄_k + (η̄_{k−1} − 1)] A_k(v, v) = (η̄_k − 1) A_k(v, v).

Hence A_k(B_k A_k v, v) ≤ η̄_k A_k(v, v), and the theorem is proved.

We now prove a general result for the V-cycle multigrid without assuming the regularity-approximation assumption (A.2”).
Theorem 4.5 If the operators R_k, k = 1, 2, . . . , J, are symmetric and positive definite, then the operator B_J corresponding to the V-cycle multigrid method (p = 1) is symmetric and positive definite. If, in addition, Condition (SM.1) holds, then

(ω/λ_k) A_k(v, v) ≤ A_k((B_k A_k)v, v).

Proof. It suffices to prove that B_k A_k is symmetric and positive definite with respect to A_k(·, ·). The symmetry is easy to see. Let ṽ = K_k^{(m_k)} v. Since p = 1, we get

A_k((I − B_k A_k)v, v) = A_k(ṽ, ṽ) − A_{k−1}((B_{k−1}A_{k−1})P_{k−1}ṽ, P_{k−1}ṽ).

The induction hypothesis implies that A_k((I − B_k A_k)v, v) < A_k(ṽ, ṽ). If R_k is symmetric and positive definite, then A_k(ṽ, ṽ) < A_k(v, v), and B_k is symmetric and positive definite. If, in addition, (SM.1) holds, then A_k(ṽ, ṽ) ≤ (1 − ω/λ_k) A_k(v, v), and thus ωλ_k^{−1} A_k(v, v) ≤ A_k(B_k A_k v, v).

The theorem shows that the (variable) V-cycle multigrid operator B_k is symmetric and positive definite, and thus can always be used as a preconditioner if the size of the condition number is not a concern. If in addition we know that Condition (I.1) holds, then by Lemma 4.1 we have

0 ≤ A_k((I − B_k A_k)v, v), for all v ∈ M_k,

or equivalently

A_k(B_k A_k v, v) ≤ A_k(v, v), for all v ∈ M_k.

Therefore, if Conditions (I.1) and (SM.1) hold, then

A_k(R_k A_k v, v) ≤ A_k(B_k A_k v, v) ≤ A_k(v, v).
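The spectral picture above can be reproduced in a small computation. The sketch below is a generic symmetric two-grid operator for the 1-D Dirichlet Laplacian, written from scratch for illustration (it is not the lecture's B_k verbatim): with a Richardson smoother of (SM.1) type and an exact Galerkin coarse solve, the eigenvalues of B_kA_k lie in (0, 1], consistent with (41):

```python
# Symmetric two-grid preconditioner B for the 1-D Dirichlet Laplacian:
# error propagator E = S^m (I - P Ac^{-1} P^T A) S^m, so that B A = I - E.
# S = I - A/lambda_max is a Richardson smoother; P is linear interpolation.
# We check that the eigenvalues of B A lie in (0, 1].
import numpy as np

n = 63                               # fine-grid interior points, h = 1/(n+1)
A = (n + 1)**2 * (2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))

nc = (n - 1) // 2                    # coarse-grid interior points
P = np.zeros((n, nc))                # piecewise linear interpolation
for j in range(nc):
    P[2*j, j], P[2*j + 1, j], P[2*j + 2, j] = 0.5, 1.0, 0.5
Ac = P.T @ A @ P                     # Galerkin coarse operator

lam = np.linalg.eigvalsh(A).max()
S = np.eye(n) - A / lam              # Richardson smoother step
Sm = np.linalg.matrix_power(S, 2)    # m = 2 smoothings

Pi = np.eye(n) - P @ np.linalg.solve(Ac, P.T @ A)   # A-orthogonal coarse correction
BA = np.eye(n) - Sm @ Pi @ Sm        # B A (A-selfadjoint, real spectrum)

ev = np.sort(np.linalg.eigvals(BA).real)
print(ev[0], ev[-1])                 # spectrum contained in (0, 1]
```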
5 Computational Scales of Sobolev Norms

5.1 Introduction

In this lecture we provide a framework for developing computationally efficient multilevel preconditioners and representations for Sobolev norms. The material, for the most part, is taken from [6]. Specifically, given a Hilbert space V and a nested sequence of subspaces, V₁ ⊂ V₂ ⊂ . . . ⊂ V, we construct operators which are spectrally equivalent to those of the form A = Σ_k µ_k (Q_k − Q_{k−1}). Here µ_k, k = 1, 2, . . . , are positive numbers and
Q_k is the orthogonal projector onto V_k, with Q₀ = 0. We first present abstract results which show when A is spectrally equivalent to a similarly constructed operator Ã defined in terms of an approximation Q̃_k of Q_k, for k = 1, 2, . . . . We describe how these results lead to efficient preconditioners for discretizations of differential and pseudo-differential operators of positive and negative order, and to sums of such operators.

Multilevel subspace decompositions provide tools for the construction of preconditioners. One of the first examples of such a construction was provided in [9], where a simple additive multilevel operator (BPX) was developed for preconditioning second order elliptic boundary value problems. The analysis of the BPX preconditioner involves the verification of norm equivalences of the form

‖u‖²_{H¹(Ω)} ≃ Σ_{k=1}^{J} h_k^{−2} ‖(Q_k − Q_{k−1})u‖²_{L²(Ω)}, for all u ∈ V_J. (45)
The above norms are those corresponding to the Sobolev space H¹(Ω) and to L²(Ω), respectively. The quantity h_k is the approximation parameter associated with V_k. The original results in [9] were sharpened by [23] and [30] to show that (45) holds with constants of equivalence independent of J. Practical preconditioners involve the replacement of the operator Q_k − Q_{k−1} by easily computable operators, as discussed in [9].

In addition to the above application, there are other practical applications of multilevel decompositions. In particular, for boundary element methods, it is important to have computationally simple operators which are equivalent to pseudo-differential operators of order one and minus one. In addition, multilevel decompositions which provide norm equivalences for H^{1/2}(∂Ω) can be used to construct bounded extension operators used in nonoverlapping domain decomposition with inexact subdomain solves.

The equivalence (45) is the starting point of the multilevel analysis. This equivalence is valid for J = ∞, in which case we get a norm equivalence on H¹(Ω). It follows from (45) that

‖v‖²_{H^s(Ω)} ≃ Σ_{k=1}^{∞} h_k^{−2s} ‖(Q_k − Q_{k−1})v‖²_{L²(Ω)},

for s ∈ [0, 1]. Here ‖·‖_{H^s(Ω)} denotes the norm on the Sobolev space H^s(Ω) of order s. This means that the operator

A_s = Σ_{k=1}^{∞} h_k^{−2s} (Q_k − Q_{k−1}) (46)

can be used as a preconditioner. However, A_s is somewhat expensive to evaluate, since the evaluation of the projector Q_k requires the solution of a Gram matrix problem. Thus, many researchers have sought computationally efficient operators which are equivalent to A_s.
Some techniques for constructing such operators based on wavelet or wavelet-like space decompositions are given by [13], [18], [20], [26], [25], [28], [29] and others. In the domain decomposition literature, extension operators that exploit multilevel decompositions were used in [10], [16], and [22].

In this lecture we construct simple multilevel decomposition operators which can also be used to define norms equivalent to the usual norms on Sobolev spaces. Specifically, we develop computationally efficient operators which are uniformly equivalent to the more general operator

A_J = Σ_{k=1}^{J} µ_k (Q_k − Q_{k−1}), (47)

where 1 ≤ J ≤ ∞ and {µ_k} are positive constants. We start by stating an abstract theorem; the proof is in [6]. Let {Q̃_k}, with Q̃_k : V_J → V_k, be another sequence of linear operators. The theorem shows that the operators A_J and

Ã_J = Σ_{k=1}^{J} µ_k (Q̃_k^t − Q̃_{k−1}^t)(Q̃_k − Q̃_{k−1}) (48)

are spectrally equivalent under appropriate assumptions on the spaces V_k, the operators Q̃_k, and the sequence {µ_k}. Here Q̃_k^t is the adjoint of Q̃_k. The abstract results are subsequently applied to develop efficient preconditioners when Q̃_k is defined in terms of a simple averaging operator. Some partial results involving the operator used here were stated in [22].

Because of the generality of the abstract results, they can be applied to preconditioning sums of operators. An example of this is the so-called "singularly perturbed" problem resulting from preconditioning parabolic time stepping problems, which leads to µ_k = (ϵ h_k^{−2} + 1)^{−1}. In this example ϵ is the time step size. Our results give rise to preconditioned systems with uniformly bounded condition numbers, independent of the parameter ϵ.

We note that [25] provides an L²-stable local basis for the spaces {Range(Q_k − Q_{k−1})}. With such a construction it is possible to obtain preconditioners for many of the applications considered in this lecture. However, our approach is somewhat simpler to implement. In addition, our abstract framework allows for easy application to other situations, such as function spaces which are piecewise polynomials of higher order.

5.2 A Norm Equivalence Theorem

We now provide abstract conditions which imply the spectral equivalence of (47) and (48). We start by introducing the multilevel spaces. Let V be a
Hilbert space with inner product (·, ·). We assume that we are given a nested sequence of approximation subspaces, V₁ ⊂ V₂ ⊂ . . . ⊂ V, and that this sequence is dense in V. Let θ_j, j = 1, 2, . . ., be a non-decreasing sequence of positive real numbers. Define H to be the subspace of V on which the norm

|||v||| = ( Σ_{j=1}^{∞} θ_j ‖(Q_j − Q_{j−1})v‖² )^{1/2}

is finite. Here ‖·‖ denotes the norm in V, Q_j, for j > 0, denotes the orthogonal projection onto V_j, and Q₀ = 0. Clearly H is a Hilbert space and {V_k} is dense in H. The following properties are obvious from the construction.

1. The "inverse inequality" holds for V_j, i.e.,

|||v||| ≤ θ_j^{1/2} ‖v‖, for all v ∈ V_j. (49)

2. The "approximation property" holds for V_j, i.e.,

‖(Q_j − Q_{j−1})v‖ ≤ θ_j^{−1/2} |||v|||, for all v ∈ H. (50)
As discussed in the introduction, the abstract results will be stated in terms of an additional sequence of "approximation" operators, Q̃_k : V → V_k for k > 0, with Q̃₀ = 0. These operators are assumed to satisfy the following three conditions, for k = 1, 2, . . . .

1. An "approximation property": there exists a constant C_A such that

‖(Q_k − Q̃_k)v‖ ≤ C_A θ_k^{−1/2} |||v|||, for all v ∈ H. (51)

2. Uniform coercivity of Q̃_k: there exists a δ > 0 such that

δ ‖v_k‖² ≤ (Q̃_k v_k, v_k), for all v_k ∈ V_k. (52)

3. The range of Q̃_k^t, the adjoint of Q̃_k, is contained in V_k. This condition is equivalent to

Q̃_k Q_k = Q_k Q̃_k. (53)

Remark 5.1 Let {φ_i}_{i=1}^{m} be a basis for V_k. It is not difficult to see that there exists {f_i}_{i=1}^{m} with f_i ∈ V such that

Q̃_k v = Σ_{i=1}^{m} (v, f_i) φ_i, for all v ∈ V.

Then

Q̃_k^t w = Σ_{i=1}^{m} (w, φ_i) f_i, for all w ∈ V_k.

Thus Condition 3 above holds if and only if f_i ∈ V_k, for i = 1, . . . , m.
The purpose of this section is to provide abstract conditions which guarantee that the symmetric operators A_J and Ã_J, defined respectively by (47) and (48), are spectrally equivalent. Let L = (ℓ_{k,j}) be the lower triangular (infinite) matrix with nonzero entries

ℓ_{k,j} = (θ_j µ_k / (θ_k µ_j))^{1/2}, k ≥ j. (54)

We assume that L has bounded ℓ² norm, i.e.,

‖L‖₂ ≡ sup_{{ξ_k},{ζ_k}} [ Σ_{k=1}^{∞} Σ_{j≤k} ℓ_{k,j} ξ_k ζ_j ] / [ (Σ_{k=1}^{∞} ξ_k²)^{1/2} (Σ_{k=1}^{∞} ζ_k²)^{1/2} ] ≤ C_L. (55)

The above condition implies that µ_k ≤ C θ_k for C = C_L² µ₁/θ₁. Thus, (A_J v, v) < ∞ for all v ∈ H. We introduce one final condition: there exists a constant α such that

µ_k + µ_{k+1} ≤ α µ_k, for k = 1, 2, . . . . (56)
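Condition (55) is easy to check for the geometric scalings that appear later: with θ_k = h_k^{−2γ} and µ_k = h_k^{−2s}, h_k = 2^{−k}, the entries are ℓ_{k,j} = 2^{−(k−j)(γ−s)}, and the ℓ² norm of the lower triangular Toeplitz matrix is bounded by the geometric-series constant (1 − 2^{−(γ−s)})^{−1}. A sketch (the sampled γ, s pairs are illustrative):

```python
# Numerically verify the bound ||L||_2 <= C_L = 1/(1 - 2**(-(gamma - s)))
# for l_{k,j} = 2**(-(k-j)*(gamma-s)), using a large finite truncation of L.
import numpy as np

def CL_check(gamma, s, N=60):
    L = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1):
            L[i, j] = 2.0 ** (-(i - j) * (gamma - s))
    return np.linalg.norm(L, 2), 1.0 / (1.0 - 2.0 ** (-(gamma - s)))

results = []
for gamma, s in [(1.0, 0.5), (1.5, 1.0), (1.0, 0.0)]:
    norm, bound = CL_check(gamma, s)
    results.append(norm <= bound + 1e-10)
print(results)
```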
We can now state the main abstract theorem.

Theorem 5.1 Assume that conditions (51)–(53), (55), and (56) are satisfied. Then the operator Ã_J defined by (48), with 1 ≤ J ≤ ∞, satisfies

[3(1 + αδ^{−2} C_A² C_L²)]^{−1} (A_J v, v) ≤ (Ã_J v, v) ≤ 3(1 + α C_A² C_L²) (A_J v, v),

for all v ∈ H.

Remark 5.2 If W is the completion of H under the norm ‖v‖_A = (A_∞ v, v)^{1/2}, then the estimate of Theorem 5.1 extends to all of W by density.

With the following lemma the theorem is easily proved. Its proof is in [6].

Lemma 5.1 Assume that conditions (51)–(53) and (55) are satisfied. Then for all u ∈ H,

Σ_{k=1}^{J} µ_k ‖(Q_k − Q̃_k)u‖² ≤ C_A² C_L² (A_J u, u) (57)

and

Σ_{k=1}^{J} µ_k ‖(Q_k − Q̃_k)u‖² ≤ δ^{−2} C_A² C_L² (Ã_J u, u). (58)
Proof (Proof of Theorem 5.1). Note that

(Q̃_k − Q̃_{k−1}) = (Q_k − Q_{k−1}) − (Q_k − Q̃_k) + (Q_{k−1} − Q̃_{k−1}).

Thus for v ∈ H,

(Ã_J v, v) = Σ_{k=1}^{J} µ_k ‖(Q̃_k − Q̃_{k−1})v‖²
 ≤ 3 [ Σ_{k=1}^{J} µ_k ‖(Q_k − Q_{k−1})v‖² + Σ_{k=1}^{J} (µ_k + µ_{k+1}) ‖(Q_k − Q̃_k)v‖² ]
 ≤ 3(1 + α C_A² C_L²) (A_J v, v).

We used (56) and Lemma 5.1 for the last inequality above. The proof for the other inequality is essentially the same. This completes the proof of the theorem.

5.3 Development of Preconditioners

The above results can be applied to the development of preconditioners. Indeed, consider preconditioning an operator on V_J which is spectrally equivalent to

L_J = Σ_{k=1}^{J} µ_k^{−1} (Q_k − Q_{k−1}). (59)
Our preconditioner B_J is to be spectrally equivalent to the operator

A_J ≡ L_J^{−1} = Σ_{k=1}^{J} µ_k (Q_k − Q_{k−1}).

Let

B_J = Σ_{k=1}^{J} µ_k (Q̃_k − Q̃_{k−1})^t (Q̃_k − Q̃_{k−1}). (60)
Then B_J and A_J are spectrally equivalent provided that {µ_k} and {Q̃_k} satisfy the hypotheses of the theorem. It follows that B_J L_J is well conditioned.

5.4 Preconditioning Sums of Operators

We next consider the case of preconditioning sums of operators. Suppose {µ̂_k} is another sequence which satisfies conditions (55) and (56). Then

L̂_J = Σ_{k=1}^{J} µ̂_k^{−1} (Q_k − Q_{k−1}) (61)

can be preconditioned by the operator defined by replacing µ_k by µ̂_k in (60) above. The following corollary shows that the result can be extended to nonnegative combinations of L_J and L̂_J.
Corollary 5.1 Assume that conditions (51)–(53) are satisfied and that (55) and (56) hold for both {µ_k} and {µ̂_k}. For nonnegative c₁, c₂ with c₁ + c₂ > 0, define

B_J = Σ_{k=1}^{J} (c₁ µ_k^{−1} + c₂ µ̂_k^{−1})^{−1} (Q̃_k − Q̃_{k−1})^t (Q̃_k − Q̃_{k−1}). (62)

Then for 1 ≤ J ≤ ∞,

[3(1 + 4αδ^{−2} C_A² C_L²)]^{−1} ((c₁L_J + c₂L̂_J)^{−1} v, v) ≤ (B_J v, v) ≤ 3(1 + 4α C_A² C_L²) ((c₁L_J + c₂L̂_J)^{−1} v, v),

for all v ∈ H.

The above corollary shows that B_J is spectrally equivalent to (c₁L_J + c₂L̂_J)^{−1} and hence provides a uniform preconditioner for c₁L_J + c₂L̂_J. Moreover, the resulting condition number (for the preconditioned system) is bounded independently of the parameters c₁ and c₂.

Proof. Note that

(c₁L_J + c₂L̂_J)^{−1} = Σ_{k=1}^{J} (c₁ µ_k^{−1} + c₂ µ̂_k^{−1})^{−1} (Q_k − Q_{k−1}).

To apply the theorem to this operator, we simply must check the conditions on the sequence µ̃_k = (c₁ µ_k^{−1} + c₂ µ̂_k^{−1})^{−1}. The corresponding lower triangular matrix has entries

(L̃)_{k,j} = (θ_j µ̃_k / (θ_k µ̃_j))^{1/2} = [θ_j (c₁µ_j^{−1} + c₂µ̂_j^{−1}) / (θ_k (c₁µ_k^{−1} + c₂µ̂_k^{−1}))]^{1/2}
 ≤ [θ_j µ_k/(θ_k µ_j)]^{1/2} + [θ_j µ̂_k/(θ_k µ̂_j)]^{1/2} = (L + L̂)_{k,j}.

Since 0 ≤ (L̃)_{k,j} ≤ (L + L̂)_{k,j} for every pair k, j, it follows that

‖L̃‖₂ ≤ ‖L + L̂‖₂ ≤ 2C_L.

Because (56) holds for both {µ_k} and {µ̂_k}, it clearly holds for {µ̃_k}. The corollary follows by application of the theorem.

5.5 A Simple Approximation Operator Q̃_k

In this section, we define and analyze a simple approximation operator Q̃_k. Our applications involve Sobolev spaces with possibly mixed boundary conditions. Let Ω be a polygonal domain in R² with boundary ∂Ω = Γ_D ∪ Γ_N, where Γ_D and Γ_N are essentially disjoint. Dirichlet boundary conditions are imposed on Γ_D. We consider domains in R² for convenience. Generalizations of the
results to be presented to domains in R^d, with d > 2, at least for rectangular parallelepipeds, are straightforward.

For non-negative integers s, let H^s(Ω) denote the Sobolev space of order s on Ω (see, e.g., [14], [15]). The corresponding norm and semi-norm are denoted ‖·‖_{H^s(Ω)} and |·|_{H^s(Ω)}, respectively. The space H¹_D(Ω) is defined to be the set of functions in H¹(Ω) which vanish on Γ_D, and for s > 1, H^s_D(Ω) = H^s(Ω) ∩ H¹_D(Ω). For positive non-integer s, the spaces H^s(Ω) and H^s_D(Ω) are defined by interpolation between the neighboring integers using the real method of Lions and Peetre (cf. [15]). For negative s, H^s(Ω) is defined to be the space of linear functionals for which the norm

‖u‖_{H^s(Ω)} = sup_{φ ∈ H^{−s}_D(Ω)} ⟨u, φ⟩ / ‖φ‖_{H^{−s}_D(Ω)}

is finite. Here ⟨·, ·⟩ denotes the duality pairing. Clearly, for s < 0, L²(Ω) ⊆ H^s(Ω) if we identify u ∈ L²(Ω) with the functional ⟨u, φ⟩ ≡ (u, φ).

Some Basic Approximation Properties

Let T be a locally quasi-uniform triangulation of Ω and τ a closed triangle in T with diameter h_τ. Let τ̃ be the union of the triangles in T whose boundaries intersect τ, and define V_τ̃ to be the finite element approximation subspace consisting of functions which are continuous on τ̃ and piecewise linear with respect to the triangles of τ̃. Note that there are no boundary conditions imposed on the elements of V_τ̃. We restrict the discussion in this paper to piecewise linear subspaces; extensions of these considerations to more general nodal finite element subspaces pose no significant additional difficulties. The following facts are well known.

1. Given u ∈ H¹(τ̃), there exists a constant ū such that

‖u − ū‖_{H^s(τ̃)} ≤ C h_τ^{1−s} |u|_{H¹(τ̃)}, s = 0, 1. (63)

2. Given u ∈ H²(τ̃), there exists a linear function ū such that

‖u − ū‖_{H^s(τ̃)} ≤ C h_τ^{2−s} |u|_{H²(τ̃)}, s = 0, 1, 2. (64)
The best constants satisfying the above inequalities clearly depend on the shape of the domain τ̃. However, under the assumption that the triangulation is locally quasi-uniform, it is possible to show that the above inequalities hold with constants depending only on s and on the quasi-uniformity constants.

For the purpose of analyzing our multilevel example we define the following local approximation operator Q̃_τ : L²(Ω) → V_τ̃. Let φ_i, i = 1, 2, . . . , m, be the nodal basis for V_τ̃. The operator Q̃_τ is given by

Q̃_τ u = Σ_{i=1}^{m} [(u, φ_i)_τ̃ / (1, φ_i)_τ̃] φ_i, (65)
141
with (·, ·)τ the inner product in L2 ( τ ). For u, v ∈ L2 ( τ ), τu, v)τ = (Q
m (u, φi )τ(v, φi )τ
(1, φi )τ
i=1
τ is symmetric on L2 ( τ and hence it immediately follows that Q τ ). Moreover, Q is positive definite when restricted to Vτ (see, Lemma 5.5). The next lemma τ. provides a basic approximation property for Q Lemma 5.2 Let τ be in T . Then for s = 0, 1, there exists a constant C, independent of τ , such that τuL2 (τ ) ≤ Chs uH s (τ ) , u − Q τ
for all u ∈ H s ( τ ).
(66)
Proof. A simple computation shows that τuL2 (τ ) ≤ CuL2 (τ ) Q from which (66) immediately follows for s = 0. For s = 1, let u ˜ be the constant function satisfying (63). Using the previous estimate, since Qτu ˜=u ˜, we have τuL2 (τ ) ≤ u − u τ(u − u u − Q ˜L2 (τ ) + Q ˜)L2 (τ ) ≤ Cu − u ˜L2 (τ ) . Combining the above inequalities and (63) completes the proof of (66) for s = 1. Approximation Properties: the Multilevel Case We provide some stronger approximation properties in the case when the mesh results from a multilevel refinement strategy. Again we describe the case of d = 2. The analogous constructions for d > 2, at least for the case of rectangular parallelepipeds, are straightforward generalizations. Assume that an initial coarse triangulation T1 of Ω has been provided with ΓD aligning with the mesh T1 . By this we mean that any edge of T1 on ∂Ω is either contained in ΓD or intersects ΓD at most at the endpoints of the edge. Multilevel triangulations are defined recursively. For k > 1, the triangulation Tk is defined by breaking each triangle in Tk−1 into four, by connecting the centers of the edges. The finite element space Vk consists of the functions which are continuous on Ω, piecewise linear with respect to Tk and vanish on ΓD . Let hk = maxτ ∈Tk hτ . Clearly, hk = 2−k+1 h1 . k : L2 (Ω) → Vk . We now define a sequence of approximation operators Q Let φi , i = 1, . . . , m be the nodal basis for Vk . We define Qk by k u = Q
m (u, φi ) i=1
(1, φi )
φi .
(67)
142
James H. Bramble
τu and Q k u Remark 5.3 Let τ be a triangle of Tk . It is easy to see that Q agree on τ as long as τ ∩ ΓD = ∅. In the multilevel case, we have the following stronger version of Lemma 5.2. Lemma 5.3 Let s be in [0, 3/2). There exists a constant Cs not depending on hk such that k uL2 (Ω) ≤ Cs hs uH s (Ω) , u − Q k
s for all u ∈ HD (Ω).
For the proof of the lemma, we will use the following lemma which is a slight modification of Lemma 6.1 of [8]. Its proof is contained in the proof of Lemma 6.1 of [8]. Lemma 5.4 Let Ω η denote the strip {x ∈ Ω | dist(x, ∂Ω) < η} and 0 ≤ s < 1/2. Then for all v ∈ H 1+s (Ω), vH 1 (Ω η ) ≤ Cη s vH 1+s (Ω) .
(68)
η In addition, let ΩD denote the strip {x ∈ Ω | dist(x, ΓD ) < η}. Then for all v 1 in HD (Ω), η vL2 (ΩD (69) ) ≤ Cη vH 1 (Ω η ) .
Proof (Proof of Lemma 5.3). The proof for s = 0 is trivial (see, Lemma 5.2). For positive s, we consider two cases. First we examine triangles whose boundaries do not intersect the boundary of any triangle in T1 . We shall denote this set by τ ∩ T1 = ∅ and the remaining set of triangles by τ ∩ T1 = ∅. Let φi be the nodal basis function in the space Vk associated with the node xki . Assume that xki does not lie on the boundary of any triangle τ ∈ T1 . Because of the multilevel construction, the mesh Tk is symmetric with respect to reflection through the point xki . It follows that the nodal basis function φi , restricted to a line passing through xki , is an even function with respect to xki . Let xki = (p1 , p2 ). Then, both of the functions x − p1 and y − p2 are odd on each such line. Consequently, (x − p1 , φi ) = (y − p2 , φi ) = 0. k u Thus, it follows from Remark 5.3 that Q (xki ) = u (xki ) for any linear function u ˜. Let τ be a triangle whose boundary does not intersect the boundary of any triangle of T1 . Applying the above argument to each node of τ shows that k u Q = u on τ for any linear function u . Let τ be as in Lemma 5.2. Given τ ), let u ˜ be the linear function satisfying (64). As in the proof of u ∈ H 2 ( Lemma 5.2, we get k uL2 (τ ) = u − Q τuL2 (τ ) ≤ Cu − u u − Q ˜L2 (τ ) ≤ Chsk uH s (τ ) for s = 0, 1, 2. Summing the above inequality and interpolating gives
Multilevel Methods in Finite Elements
k u2 2 u − Q L (τ )
143
1/2 ≤ Chsk uH s (Ω)
(70)
τ ∩T1 =∅
for s ∈ [0, 2]. We next consider the case when τ intersects an edge in the triangulation T1 . Suppose that τ intersects ΓD . We clearly have that k uL2 (τ ) ≤ CuL2 (τ ) . Q Thus,
(71)
k uL2 (τ ) ≤ CuL2 (τ ) . u − Q
Summing the above inequality and applying (69) gives
k u2 2 u − Q L (τ )
1/2 ≤ Chk uH 1 (Ω 2hk ) .
(72)
τ ∩ΓD =∅
Finally, we consider the case when τ intersects an edge in the triangulation T1 and does not intersect ΓD . By Remark 5.3 and Lemma 5.2, k uL2 (τ ) ≤ Chk uH 1 (τ ) . u − Q Summing the above inequality and using (72) gives
k u2 2 u − Q L (τ )
1/2 ≤ Chk uH 1 (E 2hk ) .
(73)
τ ∩T1 =∅
Here E 2hk denotes the strip of width O(2hk ) around all element edges from the initial triangulation T1 . The lemma for s = 1 follows combining (70) and (73). The result for s ∈ (0, 1) follows by interpolation. For 1 < s < 3/2, (68) and (73) imply
k u2 2 u − Q L (τ )
1/2 ≤ Chsk uH s (Ω) .
τ ∩T1 =∅
The lemma for 1 < s < 3/2 follows combining the above inequality with (70). This completes the proof of the lemma. Remark 5.4 We can extend these arguments to the case when Vk consists of piecewise quadratic functions with respect to the k’th triangulation. Again {φk } ˜ k defined by (67) satisfies Lemma 5.3. denotes the nodal basis for Vk . Then Q The proof is identical to the case of linears.
144
James H. Bramble
The Coercivity Estimate k . Actually, we We next show that the coercivity estimate (52) holds for Q only require that the triangulation Th be locally quasi-uniform. We assume that ΓD aligns with this triangulation and let Vh be the functions which are piecewise linear with respect to this triangulation, continuous on Ω and vanish h defined analogously to Q k in (67) on ΓD . We consider the linear operator Q and show that h v, v), for all v ∈ Vh . v2 ≤ C(Q The constant C above only depends on the quasi-uniformity constant (or minimal angle). Let {xi } for i = 1, . . . , m be the nodes of the triangulation and {φi } be the corresponding nodal basis functions. The mesh is quasi-uniform so for each φi , there is a parameter hi such that h τ # hi
(74)
holds for all triangles τ which have the node xi as a vertex. Here we define a # b to mean that a ≤ Cb and b ≤ Ca with constant C independent of the triangulation. It is well known that (v, v) # h2τ v(xl )2 , for all v ∈ Vh . (75) τ ∈Th
xl ∈τ
It follows from (74) that (v, v) #
m
h2i v(xi )2 ,
for all v ∈ Vh .
(76)
i=1
We can now prove the coercivity estimate. This result was essentially given in [9] for the case of a globally quasi-uniform triangulation. Lemma 5.5 Assume that the mesh Th is locally quasi-uniform. There is a constant C only depending on the quasi-uniformity condition such that h v, v) ≤ C (v, v), C −1 (v, v) ≤ (Q
for all v ∈ Vh .
Proof. Let G be the Gram matrix, i.e. Gij = (φi , φj ),
i, j = 1, . . . , m
and D be the diagonal matrix with entries Dii = h2i . Let v be in Vh and w be the coefficient vector satisfying v=
m i=1
wi φi .
Multilevel Methods in Finite Elements
145
Note that (76) can be rewritten as C −1 ((Gw, w)) ≤ ((Dw, w)) ≤ C((Gw, w)),
for all w ∈ Rm .
Here ((·, ·)) denotes the inner product on Rm . This is equivalent to C −1 ((D−1 Gw, Gw)) ≤ ((Gw, w)) ≤ C((D−1 Gw, Gw)),
for all w ∈ Rm .
Since (1, φi ) # h2i , it follows that h v, v) = (Q
m (v, φi )2 i=1
(1, φi )
=
m ((Gw)i )2 i=1
(1, φi )
# ((D−1 Gw, Gw)) # ((Gw, w)) = (v, v). This completes the proof of the lemma. 5.6 Applications Here we apply some of the above results. We follow [6]. As we have seen k satisfies the approximation and coercivity in the previously, the operator Q estimates required for application of the abstract results. Throughout this discussion we assume that V1 ⊂ V2 ⊂ . . . is a sequence of nested piecewise linear and continuous multilevel spaces as described earlier. We take V = L2 (Ω) and (·, ·) to be the corresponding inner product. With a slight abuse of notation we also use (·, ·) to denote the obvious duality pairing. k extend natuRemark 5.5 Since Vk ⊂ H s (Ω), for 0 ≤ s < 3/2, Qk and Q −s s rally to all of H (Ω). Let −3/2 < s < 3/2 and define A as in (46). It is known that the norm (As u, u)1/2 is equivalent to · H s (Ω) ; cf. [24]. Fix γ < 3/2. By Lemma 5.3, the triangle inequality and well known properties of Qk k )uL2 (Ω) ≤ Cθ−1/2 uH γ (Ω) (Qk − Q k where θk = h−2γ . Let s < γ and set µk = h−2s k . Then, k γ−s hk k,j = hj decays exponentially as a function of k − j. An elementary computation gives that γ−s −1 1 L ≤ CL = 1 − . 2 The next theorem immediately follows from Remark 5.5, Remark 5.2 and Theorem 5.1.
James H. Bramble
Theorem 5.2 Let -3/2 < s < 3/2. Then (A^{(s)} u, u)^{1/2} provides a norm on H^s_D(Ω) which is equivalent to the usual Sobolev norm. Here

A^{(s)} u = Σ_{k=1}^{∞} h_k^{-2s} (Q̂_k - Q̂_{k-1})^2 u.
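The kind of norm equivalence stated in Theorem 5.2 can be made concrete in a Fourier caricature, with sharp frequency truncations standing in for the projections. This is only a heuristic sketch; the truncation model, the dyadic bands, and all names are our own illustration, not the operators of the text:

```python
import numpy as np

# Heuristic Fourier analogue of Theorem 5.2: model Q_k by truncation to
# frequencies n <= 2^k, so (Q_k - Q_{k-1})u keeps the dyadic band
# 2^{k-1} < n <= 2^k.  With h_k = 2^{-k}, the multilevel sum
#   sum_k h_k^{-2s} ||(Q_k - Q_{k-1})u||^2
# should be uniformly equivalent to the Sobolev-type norm
#   sum_n n^{2s} |c_n|^2.
rng = np.random.default_rng(1)
K = 12
n = np.arange(1, 2 ** K + 1, dtype=float)
c = rng.standard_normal(n.size) / n            # coefficients of a sample u

ratios = []
for s in (-0.5, 0.5, 1.0):
    sobolev = np.sum(n ** (2 * s) * c ** 2)
    multilevel = np.sum(c[:1] ** 2)            # lowest mode (weight 1)
    for k in range(1, K + 1):
        band = (n > 2 ** (k - 1)) & (n <= 2 ** k)
        multilevel += 2.0 ** (2 * s * k) * np.sum(c[band] ** 2)
    ratios.append(multilevel / sobolev)
print(ratios)   # each ratio stays within bounds independent of K
```

Within each dyadic band the weights 2^{2sk} and n^{2s} differ by at most the fixed factor 2^{2|s|}, which is exactly why the two norms are uniformly equivalent in this caricature.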
A Preconditioning Example

We consider applying the earlier results to develop a preconditioner for an example involving a pseudo-differential operator of order minus one. The canonical example of such an application is associated with the form

V(u, v) = ∫_Ω ∫_Ω u(s_1) v(s_2) / |s_1 - s_2| ds_1 ds_2.

For this application, Γ_D is empty and we seek preconditioners for the problem: Find U ∈ V_J satisfying

V(U, φ) = F(φ), for all φ ∈ V_J.

Here F is a given functional. It is shown in [5] that

V(u, u) # ‖u‖^2_{H^{-1/2}(Ω)}, for all u ∈ V_J. (77)

It is convenient to consider the problem of preconditioning in terms of operators. Specifically, let V : V_J → V_J be defined by

(Vv, w) = V(v, w), for all v, w ∈ V_J.

We shall see that A_J^{(1/2)} defined by

A_J^{(1/2)} = Σ_{k=1}^{J} h_k^{-1} (Q̂_k - Q̂_{k-1})^2

provides a computationally efficient preconditioner for V. Indeed, by Theorem 5.1,

(A_J^{(1/2)} u, u) # (A^{1/2} u, u), for all u ∈ V_J.

Applying Remark 5.5 and (77) implies that

(V A^{1/2} u, A^{1/2} u) # (A^{-1/2} A^{1/2} u, A^{1/2} u) = (u, A^{1/2} u), for all u ∈ V_J.

Thus, A_J^{(1/2)} V has a bounded spectral condition number.

It is easy to evaluate the action of A_J^{(1/2)} in a preconditioned iteration procedure. For k = 1, 2, . . . , J, let {φ_i^k} denote the nodal basis for V_k. In typical preconditioning applications, one is required to evaluate the action of the preconditioner on a function v where only the quantities {(v, φ_i^J)} are
known. One could, of course, compute v from {(v, φ_i^J)}, but this would require solving a Gram matrix problem. Our preconditioner avoids the Gram matrix problem. To evaluate the action of Q̂_k, for 1 ≤ k ≤ J, one is only required to take linear combinations of the quantities {(v, φ_i^k)}. Note that (v, φ_i^k) is a simple linear combination of {(v, φ_i^{k+1})}. Thus, we see that all of the Q̂_k's can be computed efficiently (with work proportional to the number of unknowns on the finest level J) by a V-cycle-like algorithm.

Two Examples Involving Sums of Operators

The first example involves preconditioning the discrete systems which result from time stepping a parabolic initial value problem. For the second example we consider a Tikhonov regularization of a problem with noisy data.

Fully discrete time stepping schemes for parabolic problems often lead to problems of the form: Find u ∈ S_h satisfying

(u, φ) + ε D(u, φ) = F(φ), for all φ ∈ S_h. (78)

Here D(·, ·) denotes the Dirichlet form on Ω and S_h is the finite element approximation space. The parameter ε is related to the time step size and is often small. Assume that S_h = V_J where V_J is a multilevel approximation space as developed earlier. Let µ_k = 1 and µ̂_k = h_k^2, for k = 1, 2, . . .. For convenience, we assume that Γ_D is non-empty so that D(v, v) # ‖v‖_1^2, for all v ∈ H^1_D(Ω). Then for L_J and L̂_J defined respectively by (59) and (61), we have

(L_J v, v) # (v, v) and (L̂_J v, v) # D(v, v), for all v ∈ V_J.

Applying Corollary 5.1 gives that

B_J = Σ_{k=1}^{J} (µ_k^{-1} + ε µ̂_k^{-1})^{-1} (Q̂_k - Q̂_{k-1})^2 (79)

provides a uniform preconditioner for the discrete operator associated with (78). The resulting condition number for the preconditioned system can be bounded independently of the time step size and the number of levels J.

We next consider an example which results from Tikhonov regularization of a problem with noisy data. We consider approximating the solution of the problem Tv = f, where T denotes the inverse of the Laplacian and f ∈ L^2(Ω). This is replaced by the discrete problem T_h v = f_h, where T_h is the Galerkin solution operator, i.e., T_h v = w where w ∈ V_J satisfies
D(w, θ) = (v, θ), for all θ ∈ V_J,

and f_h is the L^2(Ω) orthogonal projection of f onto V_J. If it is known that v is smooth but f is noisy, better approximations result from regularization [21], [27]. We consider the regularized solution w̃ ∈ V_J satisfying

(T_h + α A_h) w̃ = f_h. (80)

Here A_h : V_J → V_J is defined by

(A_h v, w) = D(v, w), for all v, w ∈ V_J.

The regularization parameter α is often small (see [27]) and can be chosen optimally in terms of the magnitude of the noise in f. Preconditioners for the sum in (80) of the form of (79) result from the application of Corollary 5.1. In this case, µ_k = h_k^{-2} and µ̂_k = h_k^2. The condition numbers for the resulting preconditioned systems can be bounded independently of the regularization parameter α.

Preconditioners for systems like (80) are generally not easily developed. The problem is that the operator applied to the higher frequencies (depending on the size of α) behaves like a differential operator, while on the lower frequencies it behaves like the inverse of a differential operator. This causes difficulty in most multilevel methods.

H^1(Ω) Bounded Extensions

We finally consider the construction of H^1(Ω) bounded extensions. Such extensions are useful in the development of domain decomposition preconditioners with inexact subdomain solves. The construction given here is essentially the same as that in [22]. We include it here in detail as an application of Theorem 4.1.

With {V_j} as above, let Ṽ_k (for k = 1, 2, . . . , J) be the space of functions defined on ∂Ω which are restrictions of those in V_k. This gives a multilevel structure on the finest space Ṽ_J. These spaces inherit a nodal basis from the original nodal basis on V_k: the nodal basis function associated with a boundary node x_i is just the restriction of the basis function for V_k associated with x_i. Denoting this basis by {ψ_i^k}, we define

q̃_k(f) = Σ_i (< f, ψ_i^k > / < 1, ψ_i^k >) ψ_i^k.
The above sum is taken over the nodal basis elements for Ṽ_k and < ·, · > denotes the L^2(∂Ω) inner product. We note that it is known [24] that

‖θ‖^2_{H^{1/2}(∂Ω)} # Σ_{k=1}^{J} h_k^{-1} ‖(q_k - q_{k-1})θ‖^2_{L^2(∂Ω)},

where q_k denotes the L^2-projection onto Ṽ_k. It is easy to see that Theorem 4.1 holds for these spaces. Thus

‖θ‖^2_{H^{1/2}(∂Ω)} # Σ_{k=1}^{J} h_k^{-1} ‖(q̃_k - q̃_{k-1})θ‖^2_{L^2(∂Ω)}, (81)

with q̃_J θ = θ and q̃_0 θ = 0.

Now given a function θ ∈ Ṽ_J, we define E_J θ ∈ V_J by E_J θ = Σ_{k=1}^{J} ω_k, with ω_k defined as follows. Let θ̄ be the mean value of θ on ∂Ω. Then ω_1 is the function in V_1 defined by

ω_1(x_i) = (q̃_1 θ)(x_i), if x_i is a node of V_1 on ∂Ω,

and

ω_1(x_i) = θ̄, if x_i is a node of V_1 in the interior of Ω.

For J ≥ k > 1, ω_k is the function in V_k defined by

ω_k(x_i) = [q̃_k θ - q̃_{k-1} θ](x_i), if x_i is a node of V_k on ∂Ω,

and

ω_k(x_i) = 0, if x_i is a node of V_k in the interior of Ω.

Note that E_J θ = θ on ∂Ω, so that E_J is an extension operator. Recall that | · |_{H^1(Ω)} denotes the semi-norm on H^1(Ω). Then

|E_J θ|_{H^1(Ω)} = |E_J θ - θ̄|_{H^1(Ω)} = |E_J(θ - θ̄)|_{H^1(Ω)} ≤ ‖E_J(θ - θ̄)‖_{H^1(Ω)}.

We now use the following well known multilevel characterization of the H^1(Ω) norm on V_J:

‖v‖^2_{H^1(Ω)} # inf Σ_{k=1}^{J} h_k^{-2} ‖v_k‖^2_{L^2(Ω)},

where the infimum is taken over all splittings v = Σ_{k=1}^{J} v_k, with v_k ∈ V_k. Applying this with v = E_J(θ - θ̄) = (ω_1 - θ̄) + Σ_{k=2}^{J} ω_k and using (81), we conclude that

‖E_J(θ - θ̄)‖^2_{H^1(Ω)} ≤ C ( h_1^{-2} ‖ω_1 - θ̄‖^2_{L^2(Ω)} + Σ_{k=2}^{J} h_k^{-2} ‖ω_k‖^2_{L^2(Ω)} )
≤ C Σ_{k=1}^{J} h_k^{-1} ‖(q̃_k - q̃_{k-1})(θ - θ̄)‖^2_{L^2(∂Ω)}
≤ C ‖θ - θ̄‖^2_{H^{1/2}(∂Ω)} ≤ C |θ|^2_{H^{1/2}(∂Ω)},

where | · |_{H^{1/2}(∂Ω)} denotes the H^{1/2}(∂Ω) semi-norm. Thus we see that

|E_J θ|_{H^1(Ω)} ≤ C |θ|_{H^{1/2}(∂Ω)}.

This type of bounded extension operator is precisely what is required for the development of non-overlapping domain decomposition algorithms with inexact solves.
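Returning to the preconditioner evaluation discussed earlier: the claim that the moments (v, φ_i^k) on every level can be generated from the finest-level moments alone, by one V-cycle-like sweep, can be tested in a 1-D model problem. The uniform dyadic meshes, the function names, and the quadrature below are our own illustrative choices, not the text's algorithm verbatim:

```python
import numpy as np

def hat_moments(v, level, nq=10):
    """Moments (v, phi_i) of the interior P1 hat functions on the uniform
    mesh of [0, 1] with spacing h = 2**-level, via Gauss quadrature."""
    h = 2.0 ** -level
    gx, gw = np.polynomial.legendre.leggauss(nq)
    out = []
    for i in range(1, 2 ** level):
        xi = i * h
        m = 0.0
        for a, b in ((xi - h, xi), (xi, xi + h)):
            x = 0.5 * (b - a) * gx + 0.5 * (a + b)
            w = 0.5 * (b - a) * gw
            m += np.sum(w * v(x) * (1.0 - np.abs(x - xi) / h))
        out.append(m)
    return np.array(out)

def restrict(m_fine):
    """Coarse moments from fine moments: a coarse hat satisfies
    phi_i^k = 0.5 phi_{2i-1}^{k+1} + phi_{2i}^{k+1} + 0.5 phi_{2i+1}^{k+1},
    so (v, phi_i^k) is the same linear combination of fine moments."""
    n_coarse = (len(m_fine) + 1) // 2 - 1
    j = 2 * np.arange(1, n_coarse + 1) - 1      # 0-based index of fine node 2i
    return 0.5 * m_fine[j - 1] + m_fine[j] + 0.5 * m_fine[j + 1]

v = lambda x: np.sin(np.pi * x) + x ** 2        # sample function
J = 6
m = hat_moments(v, J)                           # finest-level moments only
for k in range(J - 1, 0, -1):                   # one sweep down the levels
    m = restrict(m)
    assert np.allclose(m, hat_moments(v, k))    # agrees with direct integration
print("coarse moments recovered on all levels below", J)
```

Each restriction step is a fixed-stencil linear combination, so the whole sweep costs work proportional to the number of fine-level unknowns, matching the efficiency claim made for the Q̂_k evaluation.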
References

1. R.A. Adams. Sobolev Spaces. Academic Press, Inc., New York, 1975.
2. D. Braess and W. Hackbusch. A new convergence proof for the multigrid method including the V-cycle. SIAM J. Numer. Anal., 20:967–975, 1983.
3. J. H. Bramble and X. Zhang. The analysis of multigrid methods. In Handbook of Numerical Analysis, Vol. VII. North-Holland, Amsterdam, 2001.
4. James H. Bramble. On the development of multigrid methods and their analysis. In Proceedings of Symposia in Applied Mathematics, volume 48, pages 5–19, 1994.
5. James H. Bramble, Zbigniew Leyk, and Joseph E. Pasciak. Iterative schemes for nonsymmetric and indefinite elliptic boundary value problems. Math. Comp., 60:1–22, 1993.
6. James H. Bramble, J. E. Pasciak, and P. Vassilevski. Computational scales of Sobolev spaces with application to preconditioning. Math. Comp., 69:463–480, 2000.
7. James H. Bramble and Joseph E. Pasciak. New convergence estimates for multigrid algorithms. Math. Comp., 49:311–329, 1987.
8. James H. Bramble and Joseph E. Pasciak. New estimates for multigrid algorithms including the V-cycle. Math. Comp., 60:447–471, 1993.
9. James H. Bramble, Joseph E. Pasciak, and Jinchao Xu. Parallel multilevel preconditioners. Math. Comp., 55:1–22, 1990.
10. James H. Bramble and Panayot S. Vassilevski. Wavelet-like extension operators in interface domain decomposition methods. 1997, 1998.
11. James H. Bramble and J. Xu. Some estimates for weighted L2 projections. Math. Comp., 56:463–476, 1991.
12. P.L. Butzer and H. Berens. Semi-Groups of Operators and Approximation. Springer-Verlag, New York, 1967.
13. J. M. Carnicer, Wolfgang Dahmen, and J. M. Peña. Local decompositions of refinable spaces and wavelets. Appl. Comp. Harm. Anal., 3:127–153, 1996.
14. P. G. Ciarlet. The Finite Element Method for Elliptic Problems. North-Holland, New York, 1978.
15. P. Grisvard. Elliptic Problems in Nonsmooth Domains. Pitman, Boston, 1985.
16. G. Haase, U. Langer, A. Meyer, and S. V. Nepomnyaschikh. Hierarchical extension operators and local multigrid methods in domain decomposition preconditioners. East–West J. Numer. Math., 2:173–193, 1994.
17. L. Hörmander. Linear Partial Differential Operators. Springer-Verlag, New York, 1963.
18. U. Kotyczka and Peter Oswald. Piecewise linear prewavelets of small support. In C. Chui and L.L. Schumaker, editors, Approximation Theory VIII, Vol. 2 (Wavelets and Multilevel Approximation), pages 235–242, World Scientific, Singapore, 1995.
19. J.L. Lions and E. Magenes. Non-Homogeneous Boundary Value Problems and Applications. Springer-Verlag, New York, 1972.
20. R. Lorentz and Peter Oswald. Multilevel additive methods for elliptic finite element problems. In Proceedings of the Domain Decomposition Conference held in Bergen, Norway, June 3–8, 1996.
21. Frank Natterer. Error bounds for Tikhonov regularization in Hilbert scales. Appl. Anal., 18:29–37, 1984.
22. S. V. Nepomnyaschikh. Optimal multilevel extension operators. Report SPC 95–3, Technische Universität Chemnitz–Zwickau, Germany, 1995.
23. P. Oswald. On discrete norm estimates related to multilevel preconditioners in the finite element method. In K. G. Ivanov, P. Petrushev, and B. Sendov, editors, Constructive Theory of Functions, Proc. Int. Conf. Varna, pages 203–214, Bulg. Acad. Sci., Sofia, 1992.
24. Peter Oswald. Multilevel Finite Element Approximation: Theory and Applications. Teubner, Stuttgart, 1994.
25. Rob Stevenson. Piecewise linear (pre-)wavelets on non-uniform meshes. Technical report, Department of Mathematics, University of Nijmegen, The Netherlands, 1995.
26. Rob Stevenson. A robust hierarchical basis preconditioner on general meshes. Technical report, Department of Mathematics, University of Nijmegen, The Netherlands, 1995.
27. U. Tautenhahn. Error estimates for regularization methods in Hilbert scales. SIAM J. Numer. Anal., 33:2120–2130, 1996.
28. Panayot S. Vassilevski and Junping Wang. Stabilizing the hierarchical basis by approximate wavelets, I: Theory. Numer. Linear Alg. Appl., 4:103–126, 1997.
29. Panayot S. Vassilevski and Junping Wang. Stabilizing the hierarchical basis by approximate wavelets, II: Implementation and numerical experiments. SIAM J. Sci. Comput., 2000.
30. Xuejun Zhang. Multi-level additive Schwarz methods. Technical report, Courant Inst. Math. Sci., Dept. Comp. Sci., 1991.
List of Participants
1. BAEZA Antonio, Spain,
[email protected] 2. BELDA Ana Maria, Spain,
[email protected] 3. BERRONE Stefano, Italy,
[email protected] 4. BERTOLUZZA Silvia, Italy,
[email protected] 5. BRAMBLE James, USA, (lecturer)
[email protected] 6. BRAUER Philipp, Germany,
[email protected] 7. BUFFA Annalisa, Italy
[email protected] 8. BURSTEDDE Carsten, Germany
[email protected] 9. CARFORA Maria Francesca, Italy,
[email protected] 10. CATINAS Daniela, Romania,
[email protected] 11. COHEN Albert, France, (lecturer)
[email protected] 12. CANUTO Claudio, Italy, (editor)
[email protected] 13. CORDERO Elena, Italy,
[email protected] 14. DAHMEN Wolfgang, Germany, (lecturer)
[email protected] 15. FALLETTA Silvia, Italy,
[email protected]
16. FRYZLEWICZ Piotr, Poland,
[email protected] 17. GHEORGHE Simona, Romania,
[email protected] 18. GRAVEMEIER Volker, Germany,
[email protected] 19. HAUSENBLAS Erika, Austria,
[email protected] 20. HUND Andrea, Germany,
[email protected] 21. JAECKEL Uwe, Germany,
[email protected] 22. JUERGENS Markus, Germany,
[email protected] 23. JYLHAKALLIO Mervi, Finland, Mervi.Jylhakallio@fi.abb.com 24. KONDRATYUK Yaroslav, Ukraine,
[email protected] 25. LEE Kelvin, Singapore,
[email protected] 26. LELIEVRE Tony, France,
[email protected] 27. LENZ Stefano, Germany,
[email protected] 28. LOIX Fabrice, Belgium,
[email protected] 29. MALENGIER Benny, Belgium,
[email protected] 30. MICHELETTI Stefano, Italy,
[email protected] 31. MOMMER Mario, Germany,
[email protected] 32. NGUYEN Hoang, Netherlands,
[email protected] 33. PEROTTO Simona, Italy,
[email protected] 34. ROHRIG Susanne, Germany,
[email protected] 35. RUSSO Alessandro, Italy,
[email protected] 36. SALERNO Ginevra, Italy,
[email protected] 37. SANGALLI Giancarlo, Italy,
[email protected]
38. SANSALONE Vittorio, Italy,
[email protected] 39. SERNA Susana, Spain,
[email protected] 40. SHYSHKANOVA Ganna, Ukraine,
[email protected] 41. TABACCO Anita, Italy,
[email protected] 42. VAN GOTHEM Nicolas, Belgium,
[email protected] 43. VERANI Marco, Italy,
[email protected]
LIST OF C.I.M.E. SEMINARS
1954 (published by C.I.M.E.)
1. Analisi funzionale
2. Quadratura delle superficie e questioni connesse
3. Equazioni differenziali non lineari

1955
4. Teorema di Riemann-Roch e questioni connesse
5. Teoria dei numeri
6. Topologia
7. Teorie non linearizzate in elasticità, idrodinamica, aerodinamica

1956
8. Geometria proiettivo-differenziale
9. Equazioni alle derivate parziali a caratteristiche reali
10. Propagazione delle onde elettromagnetiche
11. Teoria delle funzioni di più variabili complesse e delle funzioni automorfe

1957
12. Geometria aritmetica e algebrica (2 vol.)
13. Integrali singolari e questioni connesse
14. Teoria della turbolenza (2 vol.)

1958
15. Vedute e problemi attuali in relatività generale
16. Problemi di geometria differenziale in grande
17. Il principio di minimo e le sue applicazioni alle equazioni funzionali

1959
18. Induzione e statistica
19. Teoria algebrica dei meccanismi automatici (2 vol.)
20. Gruppi, anelli di Lie e teoria della coomologia

1960
21. Sistemi dinamici e teoremi ergodici
22. Forme differenziali e loro integrali

1961
23. Geometria del calcolo delle variazioni (2 vol.)
24. Teoria delle distribuzioni
25. Onde superficiali

1962
26. Topologia differenziale
27. Autovalori e autosoluzioni
28. Magnetofluidodinamica

1963
29. Equazioni differenziali astratte
30. Funzioni e varietà complesse
31. Proprietà di media e teoremi di confronto in Fisica Matematica
1964
32. Relatività generale
33. Dinamica dei gas rarefatti
34. Alcune questioni di analisi numerica
35. Equazioni differenziali non lineari

1965
36. Non-linear continuum theories
37. Some aspects of ring theory
38. Mathematical optimization in economics

1966 (published by Ed. Cremonese, Firenze)
39. Calculus of variations
40. Economia matematica
41. Classi caratteristiche e questioni connesse
42. Some aspects of diffusion theory

1967
43. Modern questions of celestial mechanics
44. Numerical analysis of partial differential equations
45. Geometry of homogeneous bounded domains

1968
46. Controllability and observability
47. Pseudo-differential operators
48. Aspects of mathematical logic

1969
49. Potential theory
50. Non-linear continuum theories in mechanics and physics and their applications
51. Questions of algebraic varieties

1970
52. Relativistic fluid dynamics
53. Theory of group representations and Fourier analysis
54. Functional equations and inequalities
55. Problems in non-linear analysis

1971
56. Stereodynamics
57. Constructive aspects of functional analysis (2 vol.)
58. Categories and commutative algebra

1972
59. Non-linear mechanics
60. Finite geometric structures and their applications
61. Geometric measure theory and minimal surfaces

1973
62. Complex analysis
63. New variational techniques in mathematical physics
64. Spectral analysis

1974
65. Stability problems
66. Singularities of analytic spaces
67. Eigenvalues of non linear problems

1975 (published by Ed. Liguori, Napoli)
68. Theoretical computer sciences
69. Model theory and applications
70. Differential operators and manifolds

1976
71. Statistical Mechanics
72. Hyperbolicity
73. Differential topology

1977
74. Materials with memory
75. Pseudodifferential operators with applications
76. Algebraic surfaces
1978 (published by Ed. Liguori, Napoli & Birkhäuser)
77. Stochastic differential equations
78. Dynamical systems

1979
79. Recursion theory and computational complexity
80. Mathematics of biology

1980
81. Wave propagation
82. Harmonic analysis and group representations
83. Matroid theory and its applications
1981 (published by Springer-Verlag)
84. Kinetic Theories and the Boltzmann Equation (LNM 1048)
85. Algebraic Threefolds (LNM 947)
86. Nonlinear Filtering and Stochastic Control (LNM 972)

1982
87. Invariant Theory (LNM 996)
88. Thermodynamics and Constitutive Equations (LN Physics 228)
89. Fluid Dynamics (LNM 1047)

1983
90. Complete Intersections (LNM 1092)
91. Bifurcation Theory and Applications (LNM 1057)
92. Numerical Methods in Fluid Dynamics (LNM 1127)

1984
93. Harmonic Mappings and Minimal Immersions (LNM 1161)
94. Schrödinger Operators (LNM 1159)
95. Buildings and the Geometry of Diagrams (LNM 1181)

1985
96. Probability and Analysis (LNM 1206)
97. Some Problems in Nonlinear Diffusion (LNM 1224)
98. Theory of Moduli (LNM 1337)

1986
99. Inverse Problems (LNM 1225)
100. Mathematical Economics (LNM 1330)
101. Combinatorial Optimization (LNM 1403)

1987
102. Relativistic Fluid Dynamics (LNM 1385)
103. Topics in Calculus of Variations (LNM 1365)

1988
104. Logic and Computer Science (LNM 1429)
105. Global Geometry and Mathematical Physics (LNM 1451)

1989
106. Methods of nonconvex analysis (LNM 1446)
107. Microlocal Analysis and Applications (LNM 1495)

1990
108. Geometric Topology: Recent Developments (LNM 1504)
109. H∞ Control Theory (LNM 1496)
110. Mathematical Modelling of Industrial Processes (LNM 1521)

1991
111. Topological Methods for Ordinary Differential Equations (LNM 1537)
112. Arithmetic Algebraic Geometry (LNM 1553)
113. Transition to Chaos in Classical and Quantum Mechanics (LNM 1589)

1992
114. Dirichlet Forms (LNM 1563)
115. D-Modules, Representation Theory, and Quantum Groups (LNM 1565)
116. Nonequilibrium Problems in Many-Particle Systems (LNM 1551)
1993
117. Integrable Systems and Quantum Groups (LNM 1620)
118. Algebraic Cycles and Hodge Theory (LNM 1594)
119. Phase Transitions and Hysteresis (LNM 1584)

1994
120. Recent Mathematical Methods in Nonlinear Wave Propagation (LNM 1640)
121. Dynamical Systems (LNM 1609)
122. Transcendental Methods in Algebraic Geometry (LNM 1646)

1995
123. Probabilistic Models for Nonlinear PDE's (LNM 1627)
124. Viscosity Solutions and Applications (LNM 1660)
125. Vector Bundles on Curves. New Directions (LNM 1649)

1996
126. Integral Geometry, Radon Transforms and Complex Analysis (LNM 1684)
127. Calculus of Variations and Geometric Evolution Problems (LNM 1713)
128. Financial Mathematics (LNM 1656)

1997
129. Mathematics Inspired by Biology (LNM 1714)
130. Advanced Numerical Approximation of Nonlinear Hyperbolic Equations (LNM 1697)
131. Arithmetic Theory of Elliptic Curves (LNM 1716)
132. Quantum Cohomology (LNM 1776)

1998
133. Optimal Shape Design (LNM 1740)
134. Dynamical Systems and Small Divisors (LNM 1784)
135. Mathematical Problems in Semiconductor Physics (LNM 1823)
136. Stochastic PDE's and Kolmogorov Equations in Infinite Dimension (LNM 1715)
137. Filtration in Porous Media and Industrial Applications (LNM 1734)

1999
138. Computational Mathematics driven by Industrial Applications (LNM 1739)
139. Iwahori-Hecke Algebras and Representation Theory (LNM 1804)
140. Theory and Applications of Hamiltonian Dynamics (to appear)
141. Global Theory of Minimal Surfaces in Flat Spaces (LNM 1775)
142. Direct and Inverse Methods in Solving Nonlinear Evolution Equations (to appear)

2000
143. Dynamical Systems (LNM 1822)
144. Diophantine Approximation (LNM 1819)
145. Mathematical Aspects of Evolving Interfaces (LNM 1812)
146. Mathematical Methods for Protein Structure (to appear)
147. Noncommutative Geometry (to appear)

2001
148. Topological Fluid Mechanics (to appear)
149. Spatial Stochastic Processes (LNM 1802)
150. Optimal Transportation and Applications (LNM 1813)
151. Multiscale Problems and Methods in Numerical Simulations (LNM 1825)

2002
152. Real methods in Complex and CR Geometry (to appear)
153. Analytic Number Theory (to appear)
154. Imaging (to appear)
2003
155. Stochastic Methods in Finance (announced)
156. Hyperbolic Systems of Balance Laws (announced)
157. Symplectic 4-Manifolds and Algebraic Surfaces (announced)
158. Mathematical Foundation of Turbulent Viscous Flows (announced)

2004
159. Representation Theory and Complex Analysis (announced)
160. Nonlinear and Optimal Control Theory (announced)
161. Stochastic Geometry (announced)
Fondazione C.I.M.E.
Centro Internazionale Matematico Estivo
International Mathematical Summer Center
http://www.math.unifi.it/∼cime
[email protected]fi.it
2004 COURSES LIST

Representation Theory and Complex Analysis
June 10–17, Venezia
Course Directors:
Prof. Enrico Casadio Tarabusi (Università di Roma "La Sapienza")
Prof. Andrea D'Agnolo (Università di Padova)
Prof. Massimo A. Picardello (Università di Roma "Tor Vergata")
Nonlinear and Optimal Control Theory
June 21–29, Cetraro (Cosenza)
Course Directors:
Prof. Paolo Nistri (Università di Siena)
Prof. Gianna Stefani (Università di Firenze)
Stochastic Geometry
September 13–18, Martina Franca (Taranto)
Course Director:
Prof. W. Weil (Univ. of Karlsruhe, Karlsruhe, Germany)