EMS Tracts in Mathematics 14
EMS Tracts in Mathematics

Editorial Board:
Carlos E. Kenig (The University of Chicago, USA)
Andrew Ranicki (The University of Edinburgh, Great Britain)
Michael Röckner (Universität Bielefeld, Germany, and Purdue University, USA)
Vladimir Turaev (Indiana University, Bloomington, USA)
Alexander Varchenko (The University of North Carolina at Chapel Hill, USA)

This series includes advanced texts and monographs covering all fields in pure and applied mathematics. Tracts will give a reliable introduction and reference to special fields of current research. The books in the series will in most cases be authored monographs, although edited volumes may be published if appropriate. They are addressed to graduate students seeking access to research topics as well as to the experts in the field working at the frontier of research.

1 Panagiota Daskalopoulos and Carlos E. Kenig, Degenerate Diffusions
2 Karl H. Hofmann and Sidney A. Morris, The Lie Theory of Connected Pro-Lie Groups
3 Ralf Meyer, Local and Analytic Cyclic Homology
4 Gohar Harutyunyan and B.-Wolfgang Schulze, Elliptic Mixed, Transmission and Singular Crack Problems
5 Gennadiy Feldman, Functional Equations and Characterization Problems on Locally Compact Abelian Groups
6 Erich Novak and Henryk Woźniakowski, Tractability of Multivariate Problems. Volume I: Linear Information
7 Hans Triebel, Function Spaces and Wavelets on Domains
8 Sergio Albeverio et al., The Statistical Mechanics of Quantum Lattice Systems
9 Gebhard Böckle and Richard Pink, Cohomological Theory of Crystals over Function Fields
10 Vladimir Turaev, Homotopy Quantum Field Theory
11 Hans Triebel, Bases in Function Spaces, Sampling, Discrepancy, Numerical Integration
12 Erich Novak and Henryk Woźniakowski, Tractability of Multivariate Problems. Volume II: Standard Information for Functionals
13 Laurent Bessières et al., Geometrisation of 3-Manifolds
Steffen Börm
Efficient Numerical Methods for Non-local Operators: H²-Matrix Compression, Algorithms and Analysis
Author:
Prof. Dr. Steffen Börm
Institut für Informatik
Christian-Albrechts-Universität zu Kiel
24118 Kiel
Germany
E-mail:
[email protected]
2010 Mathematics Subject Classification: 65-02; 65F05, 65F30, 65N22, 65N38, 65R20

Key words: Hierarchical matrix, data-sparse approximation, boundary element method, preconditioner
ISBN 978-3-03719-091-3

The Swiss National Library lists this publication in The Swiss Book, the Swiss national bibliography, and the detailed bibliographic data are available on the Internet at http://www.helveticat.ch.

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. For any kind of use permission of the copyright owner must be obtained.

© 2010 European Mathematical Society

Contact address:
European Mathematical Society Publishing House
Seminar for Applied Mathematics
ETH-Zentrum FLI C4
CH-8092 Zürich
Switzerland

Phone: +41 (0)44 632 34 36
Email: info@ems-ph.org
Homepage: www.ems-ph.org

Typeset using the author's TeX files: I. Zimmermann, Freiburg
Printing and binding: Druckhaus Thomas Müntzer GmbH, Bad Langensalza, Germany
Printed on acid-free paper
Foreword
Non-local operators appear naturally in the field of scientific computing: non-local forces govern the movement of objects in gravitational or electromagnetic fields, non-local density functions describe jump processes used, e.g., to investigate stock prices, and non-local kernel functions play an important role when studying population dynamics.

Applying standard discretization schemes to non-local operators yields matrices that consist mostly of non-zero entries ("dense matrices") and therefore cannot be treated efficiently by standard sparse matrix techniques. They can, however, be approximated by data-sparse representations that significantly reduce the storage requirements. Hierarchical matrices (H-matrices) [62] are one of these data-sparse representations: H-matrices not only approximate matrices arising in many important applications very well, they also offer a set of matrix arithmetic operations like evaluation, multiplication, factorization and inversion that can be used to construct efficient preconditioners or solve matrix equations. H²-matrices [70], [64] introduce an additional hierarchical structure to reduce the storage requirements and computational complexity of H-matrices.

In this book, I focus on presenting an overview of theoretical results and practical algorithms for working with H²-matrices. I assume that the reader is familiar with basic techniques of numerical linear algebra, e.g., norm estimates, orthogonal transformations and factorizations. The error analysis of integral operators, particularly Section 4.7, requires some results from polynomial approximation theory, while the error analysis of differential operators, particularly Section 9.2, is aimed at readers familiar with standard finite element techniques and makes use of a number of fundamental properties of Sobolev spaces.

Different audiences will probably read the book in different ways. I would like to offer the following suggestions: Chapters 1–3 provide the basic concepts and definitions used in this book, and any reader should at least be familiar with the terms H-matrix, H²-matrix, cluster tree, block cluster tree, admissible and inadmissible blocks, and cluster bases. After this introduction, different courses are possible:

• If you are a student of numerical mathematics, you should read Sections 4.1–4.4 on integral operators, Sections 5.1–5.5 on orthogonalization and truncation, Sections 6.1–6.4 on matrix compression, and maybe Sections 7.1, 7.2, 7.6 and 7.7 to get acquainted with the concepts of matrix arithmetic operations.

• If you are interested in using H²-matrices to treat integral equations, you should read Chapter 4 on basic approximation techniques and Chapters 5 and 6 in order to understand how the storage requirements can be reduced as far as possible. Remarks on practical applications can be found in Sections 10.1–10.4.
• If you are interested in applying H²-matrices to elliptic partial differential equations, you should consider Sections 5.1–5.5 on truncation, Chapter 6 on compression, Chapter 8 on adaptive matrix arithmetic operations, and Sections 10.5 and 10.3. Convergence estimates can be found in Chapter 9.

If you would like to try the algorithms presented in this book, you can get the HLib software package that I have used to provide the numerical experiments described in Chapters 4–10. Information on this package is available at http://www.hlib.org and it is provided free of charge for research purposes.

This book would not exist without the help and support of Wolfgang Hackbusch, whom I wish to thank for many fruitful discussions and insights and for the chance to work at the Max Planck Institute for Mathematics in the Sciences in Leipzig. I am also indebted to my colleagues Stefan A. Sauter, Lars Grasedyck and J. Markus Melenk, who have helped me find answers to many questions arising during the course of this work. Last, but not least, I thank Maike Löhndorf, Kai Helms and Jelena Djokić for their help with proofreading the (already quite extensive) drafts of this book and Irene Zimmermann for preparing the final version for publication.

Kiel, November 2010
Steffen Börm
Contents

1 Introduction
  1.1 Origins of H²-matrix methods
  1.2 Which kinds of matrices can be compressed?
  1.3 Which kinds of operations can be performed efficiently?
  1.4 Which problems can be solved efficiently?
  1.5 Organization of the book

2 Model problem
  2.1 One-dimensional integral operator
  2.2 Low-rank approximation
  2.3 Error estimate
  2.4 Local approximation
  2.5 Cluster tree and block cluster tree
  2.6 Hierarchical matrix
  2.7 Matrix approximation error
  2.8 H²-matrix
  2.9 Numerical experiment

3 Hierarchical matrices
  3.1 Cluster tree
  3.2 Block cluster tree
  3.3 Construction of cluster trees and block cluster trees
  3.4 Hierarchical matrices
  3.5 Cluster bases
  3.6 H²-matrices
  3.7 Matrix-vector multiplication
  3.8 Complexity estimates for bounded rank distributions
  3.9 Technical lemmas

4 Application to integral operators
  4.1 Integral operators
  4.2 Low-rank approximation
  4.3 Approximation by Taylor expansion
  4.4 Approximation by interpolation
  4.5 Approximation of derivatives
  4.6 Matrix approximation
  4.7 Variable-order approximation
  4.8 Technical lemmas
  4.9 Numerical experiments

5 Orthogonal cluster bases and matrix projections
  5.1 Orthogonal cluster bases
  5.2 Projections into H²-matrix spaces
  5.3 Cluster operators
  5.4 Orthogonalization
  5.5 Truncation
  5.6 Computation of the Frobenius norm of the projection error
  5.7 Numerical experiments

6 Compression
  6.1 Semi-uniform matrices
  6.2 Total cluster bases
  6.3 Approximation by semi-uniform matrices
  6.4 General compression algorithm
  6.5 Compression of hierarchical matrices
  6.6 Recompression of H²-matrices
  6.7 Unification and hierarchical compression
  6.8 Refined error control and variable-rank approximation
  6.9 Numerical experiments

7 A priori matrix arithmetic
  7.1 Matrix forward transformation
  7.2 Matrix backward transformation
  7.3 Matrix addition
  7.4 Projected matrix-matrix addition
  7.5 Exact matrix-matrix addition
  7.6 Matrix multiplication
  7.7 Projected matrix-matrix multiplication
  7.8 Exact matrix-matrix multiplication
  7.9 Numerical experiments

8 A posteriori matrix arithmetic
  8.1 Semi-uniform matrices
  8.2 Intermediate representation
  8.3 Coarsening
  8.4 Construction of adaptive cluster bases
  8.5 Numerical experiments

9 Application to elliptic partial differential operators
  9.1 Model problem
  9.2 Approximation of the solution operator
  9.3 Approximation of matrix blocks
  9.4 Compression of the discrete solution operator

10 Applications
  10.1 Indirect boundary integral equation
  10.2 Direct boundary integral equation
  10.3 Preconditioners for integral equations
  10.4 Application to realistic geometries
  10.5 Solution operators of elliptic partial differential equations

Bibliography

Algorithm index

Subject index
Chapter 1
Introduction
The goal of this book is to describe a method for handling certain large dense matrices efficiently. The fundamental idea of the H²-matrix approach is to reduce the storage requirements by using an alternative multilevel representation of a dense matrix instead of the standard representation by a two-dimensional array.
1.1 Origins of H²-matrix methods

The need for efficient algorithms for handling dense matrices arises from several fields of applied mathematics: in the simulation of many-particle systems governed by the laws of gravitation or electrostatics, a fast method for computing the forces acting on the individual particles is required, and these forces can be expressed by large dense matrices. Certain homogeneous partial differential equations can be reformulated as boundary integral equations, and compared to the standard approach, these formulations have the advantage that they reduce the spatial dimension, improve the convergence and can even simplify the handling of complicated geometries. The discretization of the boundary integral equations leads again to large dense matrices. A number of models used in the fields of population dynamics or machine learning also lead to integral equations that, after discretization, yield large dense matrices.

Several approaches for handling these kinds of problems have been investigated: for special integral operators and special geometries, the corresponding dense matrices are of Toeplitz or circulant form, and the fast Fourier transform [37] can be used to compute the matrix-vector multiplication in O(n log n) operations, where n is the matrix dimension. The restriction to special geometries limits the range of applications that can be treated by this approach.

The panel clustering method [71], [72], [91], [45] follows a different approach to handle arbitrary geometries: the matrix is not represented exactly, but approximated by a data-sparse matrix, i.e., by a matrix that is still dense, but can be represented in a compact form. This approximation is derived by splitting the domain of integration into a partition of subdomains and replacing the kernel function by local separable approximations. The resulting algorithms have a complexity of O(n m^α log^β n) for problem-dependent small exponents α, β > 0 and a parameter m controlling the accuracy of the approximation.

The well-known multipole method [58], [60] is closely related and takes advantage of the special properties of certain kernel functions to improve the efficiency. It has
originally been introduced for the simulation of many-particle systems, but can also be applied to integral equations [88], [86], [85], [57]. "Multipole methods without multipoles" [2], [82], [108] replace the original multipole approximation by more general or computationally more efficient expansions while keeping the basic structure of the corresponding algorithms. Of particular interest is a fully adaptive approach [46] that constructs approximations based on singular value decompositions of polynomial interpolants and thus can automatically find efficient approximations for relatively general kernel functions. It should be noted that the concept of separable approximations used in both the panel clustering and the multipole method is already present in the Ewald summation technique [44], introduced far earlier to evaluate Newton potentials efficiently in crystallographic research.

Wavelet techniques use a hierarchy of nested subspaces combined with a Galerkin method in order to approximate integral operators [94], [9], [41], [39], [73], [105], [102]. This approach reaches very good compression rates, but the construction of suitable subspaces on complicated geometries is significantly more complicated than for the techniques mentioned before.

Hierarchical matrices [62], [68], [67], [49], [52], [63] and the closely related mosaic-skeleton matrices [103] are the algebraic counterparts of panel-clustering and multipole methods: a partition of the matrix takes the place of the partition of the domains of integration, and low-rank submatrices take the place of local separable expansions. Due to their algebraic structure, hierarchical matrices can be applied not only to integral equations and particle systems, but also to more general problems, e.g., partial differential equations [6], [56], [55], [54], [76], [77], [78] or matrix equations from control theory [53], [51]. Efficient approximations of densely populated matrices related to integral equations can be constructed by interpolation [16] or more efficient cross approximation schemes [5], [4], [7], [17].

H²-matrices [70], [64] combine the advantages of hierarchical matrices, i.e., their flexibility and wide range of applications, with those of wavelet and fast multipole techniques, i.e., the high compression rates achieved by using a multilevel basis. The construction of this cluster basis for different applications is one of the key challenges in the area of H²-matrices: it has to be efficient, i.e., it has to consist of a small number of vectors, but it also has to be accurate, i.e., it has to be able to approximate the original matrix up to a given tolerance. In some situations, an H²-matrix approximation can reach the optimal order O(n) of complexity while keeping the approximation error consistent with the requirements of the underlying discretization scheme [91], [23].

Obviously, we cannot hope to be able to approximate all dense matrices in this way: if a matrix contains only independent random values, the standard representation is already optimal and no compression scheme will be able to reduce the storage requirements. Therefore we first have to address the question "Which kinds of matrices can be compressed by H²-matrix methods?"

It is not sufficient to know that a matrix can be compressed, we also have to be able to find the compressed representation and to use it in applications, e.g., to perform
matrix-vector multiplications or solve systems of linear equations. Of course, we do not want to convert the H²-matrices back to the less efficient standard format, therefore we have to consider the question "Which kinds of operations can be performed efficiently with compressed matrices?"

Once these two theoretical questions have been answered, we can consider practical applications of the H²-matrix technique, i.e., try to answer the question "Which problems can be solved efficiently by H²-matrices?"
1.2 Which kinds of matrices can be compressed?

There are two answers to this question: in the introductory Chapter 2, a very simple one-dimensional integral equation is discussed, and it is demonstrated that its discrete counterpart can be handled by H²-matrices: if we replace the kernel function by a separable approximation, the resulting matrix will be an H²-matrix and can be treated efficiently. Chapter 4 generalizes this result to the more general setting of integral operators with asymptotically smooth kernel functions.

In Chapter 6, on the other hand, a relatively general characterization of H²-matrices is introduced. Using this characterization, we can determine whether arbitrary matrices can be approximated by H²-matrices. In this framework, the approximation of integral operators can be treated as a special case, but it is also possible to investigate more general applications, e.g., the approximation of solution operators of ordinary [59], [96] and elliptic partial differential equations by H²-matrices [6], [15]. The latter very important case is treated in Chapter 9.
Separable approximations. Constructing an H²-matrix based on separable approximations has the advantage that the problem is split into two relatively independent parts: the first task is to approximate the kernel function in suitable subdomains by separable kernel functions. This task can be handled by Taylor expansions [72], [100] or interpolation [45], [65], [23] if the kernel function is locally analytic. Both of these approaches are discussed in Chapter 4. For special kernel functions, special approximations like the multipole expansion [58], [60] or its counterparts for the Helmholtz kernel [1], [3] can be used. The special techniques required by these methods are not covered here.

Once a good separable approximation of the kernel function has been found, we face the second task: the construction of an H²-matrix. This is accomplished by splitting the integral operator into a sum of local operators on suitably defined subsets and then replacing the original kernel function by its separable approximations. Discretizing the resulting perturbed integral operator by a standard scheme (e.g., Galerkin methods, collocation or Nyström techniques) then yields an H²-matrix approximation of the original matrix.
The challenge in this task is to ensure that the number of local operators is as small as possible: using one local operator for each matrix entry will not lead to a good compression ratio, therefore we are looking for methods that ensure that only a small number of local operators are required. The standard approach in this context is to use cluster trees, i.e., to split the domains defining the integral operator into a hierarchy of subdomains and use an efficient recursive scheme to find an almost optimal decomposition of the original integral operator into local operators which can be approximated. The efficiency of this technique depends on the properties of the discretization scheme. If the supports of the basis functions are local, i.e., if a neighborhood of the support of a basis function intersects only a small number of supports of other basis functions, it can be proven that the cluster trees will lead to efficient approximations of the matrix [52]. For complicated anisotropic meshes or higher-order basis functions, the situation becomes more complicated and special techniques have to be employed.
General characterization. Basing the construction of an H²-matrix on the general theory presented in Chapter 6 has the advantage that it allows us to treat arbitrary dense matrices. Whether a matrix can be approximated by an H²-matrix or not can be decided by investigating the effective ranks of two families of submatrices, the total cluster bases. If all of these submatrices can be approximated using low ranks, the matrix itself can be approximated by an H²-matrix.

Since this characterization relies only on low-rank approximations, but requires no additional properties, it can be applied in relatively general situations, e.g., to prove that solution operators of strongly elliptic partial differential operators with L^∞-coefficients can be approximated by H²-matrices. Chapter 9 gives the details of this result.
1.3 Which kinds of operations can be performed efficiently?

In this book, we consider three types of operations: first the construction of an approximation of the system matrix, then arithmetic operations like matrix-vector and matrix-matrix multiplications, and finally more complicated operations like matrix factorizations or matrix inversion, which can be constructed based on the elementary arithmetic operations.
Construction. An H²-matrix can be constructed in several ways: if it is the approximation of an explicitly given integral operator, we can proceed as described above and compute the
H²-matrix by discretizing a number of local separable approximations. For integral operators with locally smooth kernel functions, the implementation of this method is relatively straightforward and it performs well. This approach is described in Chapter 4.

If we want to approximate a given matrix, we can use the compression algorithms introduced in Chapter 6. These algorithms have the advantage that they construct quasi-optimal approximations, i.e., they will find an approximation that is almost as good as the best possible H²-matrix approximation. This property is very useful, since it allows us to use H²-matrices as a "black box" method.

It is even possible to combine both techniques: if we want to handle an integral operator, we can construct an initial approximation by using the general and simple interpolation scheme, and then improve this approximation by applying the appropriate compression algorithm. The experimental results in Chapter 6 indicate that this technique can reduce the storage requirements by large factors.
Arithmetic operations. If we want to solve a system of linear equations with a system matrix in H²-representation, we at least have to be able to evaluate the product of the matrix with a vector. This and related operations, like the product with the transposed matrix or forward and backward substitution steps for solving triangular systems, can be accomplished in optimal complexity for H²-matrices: not more than two operations are required per unit of storage. Using Krylov subspace methods, it is even possible to construct solvers based exclusively on matrix-vector multiplications and a number of simple vector operations. This is the reason why most of today's schemes for solving dense systems of equations (e.g., based on panel clustering [72], [91] or multipole expansions [58], [60]) provide only efficient algorithms for matrix-vector multiplications, but not for more complicated operations.

Hierarchical matrices and H²-matrices, on the other hand, are purely algebraic objects, and since we have efficient compression algorithms at our disposal, we are able to approximate the results of complex operations like the matrix-matrix multiplication. In Chapters 7 and 8, two techniques for performing this fundamental computation are presented. The first one reaches the optimal order of complexity, but requires a priori knowledge of the structure of the result. The second one is slightly less efficient, but has the advantage that it is fully adaptive, i.e., that it is possible to guarantee a prescribed accuracy of the result.
Inversion and preconditioners. Using the matrix-matrix multiplication algorithms, we can perform more complicated arithmetic operations like the inversion or the LU factorization. The derivation of the
corresponding algorithms is straightforward: if we express the result in terms of block matrices, we see that it can be computed by a sequence of matrix-matrix multiplications. We replace each of these products by its H²-matrix approximation and combine all of the H²-submatrices to get an H²-matrix approximation of the result (cf. Section 6.7 and Chapter 10).

If we perform all operations with high accuracy, the resulting inverse or factorization can be used as a direct solver for the original system, although it may require a large amount of storage. If we use only a low accuracy, we can still expect to get a good preconditioner which can be used in an efficient iterative or semi-iterative scheme, e.g., a conjugate gradient or GMRES method.
1.4 Which problems can be solved efficiently?

In this book, we focus on dense matrices arising from the discretization of integral equations, especially those connected to solving homogeneous elliptic partial differential equations with the boundary integral method. For numerical experiments, these matrices offer the advantage that they are discretizations of a continuous problem, therefore we have a scale of discretizations of differing resolution at our disposal and can investigate the behavior of the methods for very large matrices and high condition numbers. The underlying continuous problem is relatively simple, so we can easily construct test cases and verify the correctness of an implementation.

We also consider the construction of approximate inverses for the stiffness matrices arising from finite element discretizations of elliptic partial differential operators. In the paper [6], it has been proven that these inverses can be approximated by hierarchical matrices [62], [52], [63], but the proof is based on a global approximation argument that does not carry over directly to the case of H²-matrices. Chapter 9 uses the localized approach presented in [15] to construct low-rank approximations of the total cluster bases, and applying the general results of Chapter 6 and [13] yields the existence of H²-matrix approximations.

H²-matrices have also been successfully applied to problems from the field of electromagnetism [24], heat radiation, and machine learning.
1.5 Organization of the book

In the following, I try to give an overview of the current state of the field of H²-matrices. The presentation is organized in nine chapters covering basic definitions, algorithms with corresponding complexity analysis, approximation schemes with corresponding error analysis, and a number of numerical experiments.
Chapter 2: Model problem. This chapter introduces the basic concepts of H²-matrices for a one-dimensional model problem. In this simple setting, the construction of an H²-matrix and the analysis of its complexity and approximation properties is fairly straightforward.

Chapter 3: Hierarchical matrices. This chapter considers the generalization of the definition of H²-matrices to the multi-dimensional setting. H²-matrices are defined based on a block cluster tree describing a partition of the matrix into a hierarchy of submatrices and cluster bases describing the form of these submatrices. If a number of relatively general conditions for the block cluster tree and the cluster bases are fulfilled, it is possible to derive optimal-order estimates for the storage requirements and the time needed to compute the matrix-vector multiplication.

Chapter 4: Integral operators. A typical application of H²-matrices is the approximation of matrices resulting from the finite element (or boundary element) discretization of integral operators. This chapter describes simple approximation schemes based on Taylor expansion and constant-order interpolation, but also more advanced approaches based on variable-order interpolation. The error of the resulting H²-matrices is estimated by using error bounds for the local approximants of the kernel function.

Chapter 5: Orthogonal cluster bases. This chapter describes techniques for finding the optimal H²-matrix approximation of a given arbitrary matrix under the assumption that a suitable block cluster tree and good cluster bases are already known. If the cluster bases are orthogonal, the construction of the optimal approximation is straightforward, therefore this chapter contains two algorithms for converting arbitrary cluster bases into orthogonal cluster bases: the first algorithm yields an orthogonal cluster basis that is equivalent to the original one, the second algorithm constructs an approximation of lower complexity.

Chapter 6: Compression. This chapter introduces the total cluster bases that allow us to give an alternative characterization of H²-matrices and to develop algorithms for approximating arbitrary matrices. The analysis of these algorithms relies on the results of Chapter 5 in order to establish quasi-optimal error estimates.

Chapter 7: A priori matrix arithmetic. Once a matrix has been approximated by an H²-matrix, the question of solving corresponding systems of linear equations has to be answered. For dense matrices, the usual solution strategies require factorizations of the matrix or sometimes even its inverse. Since applying these techniques directly to H²-matrices would be very inefficient, this chapter introduces an alternative: factorization and inversion can be performed using matrix-matrix products, therefore finding an efficient algorithm for approximating these products is an important step towards solving linear systems. By using the orthogonal projections introduced in Chapter 5 and preparing suitable quantities in advance, the best approximation of a matrix-matrix product in a given H²-matrix space can be computed very efficiently.
Chapter 8: A posteriori matrix arithmetic. The algorithms introduced in Chapter 7 compute the best approximation of the matrix-matrix product in a given matrix space, but if this space is not chosen correctly, the resulting error can be quite large. This chapter describes an alternative algorithm that constructs an H²-matrix approximation of the matrix-matrix product and chooses the cluster bases in such a way that a given precision can be guaranteed.

Chapter 9: Elliptic partial differential equations. Based on the a posteriori arithmetic algorithms of Chapter 8, it is possible to compute approximate inverses of H²-matrices, but it is not clear whether these inverses can be represented efficiently by an H²-matrix. This chapter proves that the inverse of the stiffness matrix of an elliptic partial differential equation can indeed be approximated well in the compressed format, and due to the best-approximation property of the compression algorithm, this means that the computation can be carried out efficiently.

Chapter 10: Applications. The final chapter considers a number of practical applications of H²-matrices. Most of the applications are related to boundary integral formulations for Laplace's equation, but there are also some examples related to more general elliptic partial differential equations.

In some chapters, I have collected technical lemmas in a separate section in the hope of focusing the attention on the important results, not on the often rather technical proofs of auxiliary statements.
Chapter 2
Model problem
In this chapter, we introduce the basic concepts of hierarchical matrices and H²-matrices. Since the underlying ideas are closely related to panel-clustering techniques [72] for integral equations, we use a simple integral operator as a model problem.
2.1 One-dimensional integral operator

Let us consider the integral operator

$$\mathcal{G}[u](x) := \int_0^1 \log|x-y|\, u(y)\, dy$$

for functions $u \in L^2[0,1]$. For $n \in \mathbb{N}$, we discretize it by Galerkin's method using the $n$-dimensional space spanned by the basis $(\varphi_i)_{i=1}^n$ of piecewise constant functions given by

$$\varphi_i(x) = \begin{cases} 1 & \text{if } x \in [(i-1)/n, i/n],\\ 0 & \text{otherwise}, \end{cases}$$

for all $i \in \{1,\dots,n\}$ and $x \in [0,1]$. This leads to a matrix $G \in \mathbb{R}^{n \times n}$ with entries

$$G_{ij} := \int_0^1 \varphi_i(x) \int_0^1 g(x,y)\,\varphi_j(y)\, dy\, dx = \int_{(i-1)/n}^{i/n} \int_{(j-1)/n}^{j/n} g(x,y)\, dy\, dx, \tag{2.1}$$

where the kernel function is given by

$$g(x,y) := \begin{cases} \log|x-y| & \text{if } x \neq y,\\ 0 & \text{otherwise}. \end{cases}$$

Due to $\operatorname{supp} g = [0,1]^2$, all entries $G_{ij}$ of the matrix are non-zero, therefore even a simple task like computing the matrix-vector product in the standard way requires at least $n^2$ operations. Finite element techniques tend to require a large number of degrees of freedom in order to reach a suitable precision, and if $n$ is large, a quadratic complexity means that the algorithm will take very long to complete. Therefore we have to look for more efficient techniques for handling the matrix $G$.
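To make the discussion concrete, the following sketch assembles the matrix $G$ from (2.1) numerically. It is a minimal illustration, not part of the book or its HLib package: the function name and the tensor Gauss quadrature are my own choices, and the weakly singular diagonal entries would need a singular quadrature rule or the exact antiderivative in serious code.

```python
import numpy as np

def assemble_dense(n, quad_points=4):
    """Assemble G from (2.1) for g(x,y) = log|x-y| with piecewise
    constant basis functions on [0,1], using tensor Gauss quadrature
    on each cell pair (inaccurate near the diagonal, see above)."""
    xq, wq = np.polynomial.legendre.leggauss(quad_points)
    h = 1.0 / n
    G = np.zeros((n, n))
    for i in range(n):
        xi = (i + 0.5 + 0.5 * xq) * h          # nodes in the i-th cell
        for j in range(n):
            yj = (j + 0.5 + 0.5 * xq) * h      # nodes in the j-th cell
            dx = xi[:, None] - yj[None, :]
            vals = np.zeros_like(dx)
            np.log(np.abs(dx), where=(dx != 0.0), out=vals)
            G[i, j] = (h / 2) ** 2 * np.einsum("p,q,pq->", wq, wq, vals)
    return G
```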
2.2 Low-rank approximation

Since typically all entries of $G$ will be non-zero, the standard sparse matrix representations used in the context of partial differential equations will not be efficient. Therefore we have to settle for an approximation which is not sparse, but only data-sparse, i.e., which requires significantly less storage than the original matrix.

In order to derive a data-sparse approximation of $G$, we rely on an approximation of the kernel function $g$. By defining

$$f: \mathbb{R}_{>0} \to \mathbb{R}, \qquad z \mapsto \log z,$$

we have

$$g(x,y) = \begin{cases} f(x-y) & \text{if } x > y,\\ f(y-x) & \text{if } x < y,\\ 0 & \text{otherwise}, \end{cases}$$

for all $x, y \in \mathbb{R}$. Since $g$ is symmetric, we only consider the case $x > y$ in the following.

We approximate $f$ by its $m$-th order Taylor expansion (see, e.g., Chapter XIII, §6 in [75]) around a center $z_0 \in \mathbb{R}_{>0}$, which is given by

$$\tilde f_{z_0,m}(z) := \sum_{\alpha=0}^{m-1} f^{(\alpha)}(z_0)\, \frac{(z-z_0)^\alpha}{\alpha!}.$$

We pick $x_0, y_0 \in \mathbb{R}$ with $z_0 = x_0 - y_0$ and notice that the Taylor expansion of $f$ gives rise to an approximation of $g$:

$$\begin{aligned}
\tilde g_{z_0,m}(x,y) := \tilde f_{z_0,m}(x-y)
&= \sum_{\alpha=0}^{m-1} f^{(\alpha)}(z_0)\, \frac{(x-y-z_0)^\alpha}{\alpha!}\\
&= \sum_{\alpha=0}^{m-1} f^{(\alpha)}(x_0-y_0)\, \frac{(x-x_0+(y_0-y))^\alpha}{\alpha!}\\
&= \sum_{\alpha=0}^{m-1} \frac{f^{(\alpha)}(x_0-y_0)}{\alpha!} \sum_{\nu=0}^{\alpha} \binom{\alpha}{\nu} (x-x_0)^\nu (y_0-y)^{\alpha-\nu}\\
&= \sum_{\alpha=0}^{m-1} \sum_{\nu=0}^{\alpha} (-1)^{\alpha-\nu} f^{(\alpha)}(x_0-y_0)\, \frac{(x-x_0)^\nu}{\nu!}\, \frac{(y-y_0)^{\alpha-\nu}}{(\alpha-\nu)!}\\
&= \sum_{\nu=0}^{m-1} \sum_{\mu=0}^{m-1-\nu} (-1)^\mu f^{(\nu+\mu)}(x_0-y_0)\, \frac{(x-x_0)^\nu}{\nu!}\, \frac{(y-y_0)^\mu}{\mu!}.
\end{aligned} \tag{2.2}$$

The main property of $\tilde g_{z_0,m}$, as far as hierarchical matrices are concerned, is that the variables $x$ and $y$ are separated in each term of the sum. Expansions with this property, no matter whether they are based on polynomials or more general functions, are called degenerate.

A data-sparse approximation of $G$ can be obtained by replacing the original kernel function $g$ by $\tilde g_{z_0,m}$ in (2.1):

$$\tilde G_{ij} := \int_{(i-1)/n}^{i/n} \int_{(j-1)/n}^{j/n} \tilde g_{z_0,m}(x,y)\, dy\, dx = \sum_{\nu=0}^{m-1} \sum_{\mu=0}^{m-1-\nu} (-1)^\mu f^{(\nu+\mu)}(x_0-y_0) \int_{(i-1)/n}^{i/n} \frac{(x-x_0)^\nu}{\nu!}\, dx \int_{(j-1)/n}^{j/n} \frac{(y-y_0)^\mu}{\mu!}\, dy.$$

We introduce $K := \{0,\dots,m-1\}$ and $\mathcal{I} := \{1,\dots,n\}$ and matrices $V, W \in \mathbb{R}^{\mathcal{I} \times K}$ and $S \in \mathbb{R}^{K \times K}$ with

$$V_{i\nu} := \int_{(i-1)/n}^{i/n} \frac{(x-x_0)^\nu}{\nu!}\, dx, \qquad W_{j\mu} := \int_{(j-1)/n}^{j/n} \frac{(y-y_0)^\mu}{\mu!}\, dy,$$
$$S_{\nu\mu} := \begin{cases} (-1)^\mu f^{(\nu+\mu)}(x_0-y_0) & \text{if } x_0 > y_0 \text{ and } \nu+\mu < m,\\ (-1)^\nu f^{(\nu+\mu)}(y_0-x_0) & \text{if } x_0 < y_0 \text{ and } \nu+\mu < m,\\ 0 & \text{otherwise}, \end{cases} \tag{2.3}$$

for $i, j \in \mathcal{I}$ and $\nu, \mu \in K$, and find

$$\tilde G = V S W^*, \tag{2.4}$$

i.e., we can represent $\tilde G$ in a factorized form which requires only $2nm + m^2$ units of storage instead of $n^2$.

The factorized representation (2.4) implies that the rank of $\tilde G$ is bounded by $m$. Conversely, each rank-$m$ matrix can be expressed in this factorized form.
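The separated form in the last line of (2.2) can be checked numerically. The following sketch is my own illustration (the evaluation points are arbitrary): it evaluates $\tilde g_{z_0,m}$ for the case $x > y$ and $x_0 > y_0$ and exhibits the exponential decay of the error that Lemma 2.2 below predicts.

```python
import math

def f_deriv(z, a):
    """Derivatives of f(z) = log z: f^(0)(z) = log z and
    f^(a)(z) = (a-1)! (-1)^(a-1) z^(-a) for a >= 1."""
    if a == 0:
        return math.log(z)
    return math.factorial(a - 1) * (-1) ** (a - 1) * z ** (-a)

def g_tilde(x, y, x0, y0, m):
    """Degenerate approximation (2.2) of g(x,y) = log|x-y| around
    z0 = x0 - y0, for the case x > y and x0 > y0."""
    total = 0.0
    for nu in range(m):
        for mu in range(m - nu):
            total += ((-1) ** mu * f_deriv(x0 - y0, nu + mu)
                      * (x - x0) ** nu / math.factorial(nu)
                      * (y - y0) ** mu / math.factorial(mu))
    return total

# error decays exponentially in m for well-separated points
x, y, x0, y0 = 0.8, 0.1, 0.75, 0.15
for m in (1, 2, 4, 8):
    print(m, abs(math.log(x - y) - g_tilde(x, y, x0, y0, m)))
```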
2.3 Error estimate

Replacing $g$ by $\tilde g_{z_0,m}$ leads to an approximation error which we have to control in order to ensure that the matrix $\tilde G$ is not only data-sparse, but also useful.

A simple first error estimate can be obtained by using the Lagrange representation of the remainder of the Taylor expansion: we consider the approximation of $f$ in an interval $[a,b]$ with $0 < a \le b$ using $z_0 = (a+b)/2$ as the center of expansion. We have

$$|f(z) - \tilde f_{z_0,m}(z)| = \left| f^{(m)}(\hat z)\, \frac{(z-z_0)^m}{m!} \right| = \frac{(m-1)!}{m!} \left( \frac{|z-z_0|}{\hat z} \right)^m \le \frac{1}{m} \left( \frac{b-a}{2a} \right)^m \qquad \text{for all } z \in [a,b],\ m \in \mathbb{N},$$

for intermediate points $\hat z \in [a,b]$ depending on $z$ and $m$.
Figure 2.1. Approximation of f by linear and quadratic Taylor polynomials on [a, b] with center z0.
We can see that fast convergence is guaranteed if the diameter $b-a$ of the interval is less than its distance $a$ from the singularity at zero. This simple error estimate is useful since it stresses the importance of controlling the ratio of diameter and distance, but it is not optimal: it suggests that we can only expect convergence if the diameter is not too large, while a more careful analysis reveals that the Taylor series will always converge as long as $a > 0$ holds. The refined proof is based on the Cauchy representation of the remainder:

Lemma 2.1 (Cauchy error representation). Let $f \in C^\infty[a,b]$ and $z_0 \in [a,b]$. We have

$$f(z) - \tilde f_{z_0,m}(z) = \int_0^1 (1-t)^{m-1} f^{(m)}(z_0 + t(z-z_0))\, dt\, \frac{(z-z_0)^m}{(m-1)!}$$

for all $z \in [a,b]$ and all $m \in \mathbb{N}$.

Proof. See, e.g., [75], Chapter XIII, §6.

Applying this error representation to the logarithmic function $f$ and bounding the resulting terms carefully yields the following improved error estimate:

Lemma 2.2 (Error of the Taylor expansion). Let $a, b \in \mathbb{R}$ with $0 < a < b$. Let $z_0 := (a+b)/2$ and $\gamma := 2a/(b-a)$. For all $z \in [a,b]$ and all $m \in \mathbb{N}$, we have

$$|f(z) - \tilde f_{z_0,m}(z)| \le \log\left( \frac{\gamma+1}{\gamma} \right) \left( \frac{1}{\gamma+1} \right)^{m-1}.$$

Proof. Let $z \in [a,b]$ and $m \in \mathbb{N}$. Due to Lemma 2.1, the Taylor expansion satisfies

$$f(z) - \tilde f_{z_0,m}(z) = \int_0^1 (1-t)^{m-1} f^{(m)}(z_0 + t(z-z_0))\, \frac{(z-z_0)^m}{(m-1)!}\, dt. \tag{2.5}$$
For our special kernel function, we have

$$f^{(\nu)}(z) = \begin{cases} \log(z) & \text{if } \nu = 0,\\ (\nu-1)!\,(-1)^{\nu-1} z^{-\nu} & \text{otherwise}, \end{cases}$$

and the remainder takes the form

$$\begin{aligned}
|f(z) - \tilde f_{z_0,m}(z)| &\le \int_0^1 (1-t)^{m-1} (m-1)!\, |z_0 + t(z-z_0)|^{-m}\, dt\, \frac{|z-z_0|^m}{(m-1)!}\\
&= \int_0^1 \frac{(1-t)^{m-1}\, |z-z_0|^m}{|z_0 + t(z-z_0)|^m}\, dt
\le \int_0^1 (1-t)^{m-1} \left( \frac{|z-z_0|}{|z_0| - t|z-z_0|} \right)^m dt.
\end{aligned}$$

Due to

$$|z_0| = a + \frac{b-a}{2} = \gamma\, \frac{b-a}{2} + \frac{b-a}{2} = (\gamma+1)\, \frac{b-a}{2}, \qquad |z-z_0| \le \frac{b-a}{2},$$

we find

$$\frac{|z-z_0|}{|z_0| - t|z-z_0|} \le \frac{(b-a)/2}{(\gamma+1)(b-a)/2 - t(b-a)/2} = \frac{1}{\gamma+1-t}$$

and observe

$$|f(z) - \tilde f_{z_0,m}(z)| \le \int_0^1 \frac{(1-t)^{m-1}}{(\gamma+1-t)^m}\, dt = \int_0^1 \frac{1}{\gamma+s} \left( \frac{s}{\gamma+s} \right)^{m-1} ds$$

by substituting $s = 1-t$. Since elementary computations yield

$$\frac{s}{\gamma+s} \le \frac{1}{\gamma+1} \qquad \text{for all } s \in [0,1],$$

we can conclude

$$|f(z) - \tilde f_{z_0,m}(z)| \le \left( \frac{1}{\gamma+1} \right)^{m-1} \int_0^1 \frac{1}{\gamma+s}\, ds = \left( \frac{1}{\gamma+1} \right)^{m-1} (\log(1+\gamma) - \log\gamma) = \left( \frac{1}{\gamma+1} \right)^{m-1} \log\frac{1+\gamma}{\gamma},$$

which is the desired result.

The speed of convergence depends on the quantity $\gamma$, the ratio between $a$ and the radius of the interval $[a,b]$. In order to ensure uniform convergence of $\tilde g_{z_0,m}$, we have to assume a uniform lower bound of $|z| = |x-y|$.
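As a sanity check on Lemma 2.2, one can compare the actual Taylor error of the logarithm with the bound. This sketch is my own illustration, reusing f_deriv from the snippet in Section 2.2 and picking the arbitrary interval [1, 3], so that γ = 1 and the bound halves with each additional term.

```python
import math

def taylor_log_error(z, z0, m):
    """|f(z) - f~_{z0,m}(z)| for f = log by direct summation,
    using f_deriv from the sketch in Section 2.2."""
    approx = sum(f_deriv(z0, a) * (z - z0) ** a / math.factorial(a)
                 for a in range(m))
    return abs(math.log(z) - approx)

a, b = 1.0, 3.0
z0, gamma = (a + b) / 2, 2 * a / (b - a)
for m in (2, 4, 8, 16):
    bound = math.log((1 + gamma) / gamma) / (1 + gamma) ** (m - 1)
    print(m, taylor_log_error(b, z0, m), bound)   # error stays below bound
```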
Corollary 2.3. Let $\eta \in \mathbb{R}_{>0}$, and let $t, s \subseteq \mathbb{R}$ be non-trivial intervals satisfying

$$\operatorname{diam}(t) + \operatorname{diam}(s) \le 2\eta \operatorname{dist}(t,s). \tag{2.6}$$

Let $x_0$ be the midpoint of $t$ and let $y_0$ be the midpoint of $s$, and let $z_0 := x_0 - y_0$. Then the estimate

$$|g(x,y) - \tilde g_{z_0,m}(x,y)| \le \log(\eta+1) \left( \frac{\eta}{\eta+1} \right)^{m-1}$$

holds for all $x \in t$, $y \in s$, and $m \in \mathbb{N}$.

Proof. Let $x \in t$, $y \in s$, and $m \in \mathbb{N}$. Since $g$ and $\tilde g_{z_0,m}$ are symmetric, we can restrict our attention to the case $x > y$ without loss of generality. For $a_t := \inf t$, $b_t := \sup t$, $a_s := \inf s$, $b_s := \sup s$ we have

$$\operatorname{diam}(t) = b_t - a_t, \qquad \operatorname{diam}(s) = b_s - a_s, \qquad \operatorname{dist}(t,s) = a_t - b_s,$$
$$x_0 = \frac{b_t + a_t}{2}, \qquad y_0 = \frac{b_s + a_s}{2}, \qquad z_0 = \frac{b_t + a_t - b_s - a_s}{2} = \frac{b+a}{2}$$

with $a := a_t - b_s = \operatorname{dist}(t,s)$ and $b := b_t - a_s$. We apply Lemma 2.2 to $z = x - y \in [a,b]$ and get

$$|g(x,y) - \tilde g_{z_0,m}(x,y)| \le \log\left( 1 + \frac{1}{\gamma} \right) \left( \frac{1}{1+\gamma} \right)^{m-1}.$$

Now the admissibility condition (2.6) implies

$$\gamma = \frac{2a}{b-a} = \frac{2\operatorname{dist}(t,s)}{\operatorname{diam}(t) + \operatorname{diam}(s)} \ge \frac{2\operatorname{dist}(t,s)}{2\eta\operatorname{dist}(t,s)} = \frac{1}{\eta},$$

and we can conclude

$$|g(x,y) - \tilde g_{z_0,m}(x,y)| \le \log(\eta+1) \left( \frac{1}{1+1/\eta} \right)^{m-1} = \log(\eta+1) \left( \frac{\eta}{\eta+1} \right)^{m-1}$$

for all $x \in t$ and $y \in s$.

This means that the Taylor expansion $\tilde g_{z_0,m}$ will converge exponentially in $m$, and that the speed of the convergence depends on the ratio of the distance and the diameter of the intervals. The assumption that the intervals containing $x$ and $y$ have positive distance is crucial: if $x$ and $y$ could come arbitrarily close, the resulting singularity in $g$ could no longer be approximated by the polynomial $\tilde g_{z_0,m}$.
2.4 Local approximation
2.4 Local approximation Corollary 2.3 implies that we cannot use a global Taylor expansion to store the entire matrix G in factorized form: at least the diagonal entries Gi i correspond to integrals with singular integrands which cannot be approximated efficiently by our approach. Instead, we only use local Taylor expansions for subblocks of the matrix G. Let Ot ; sO , and let t; s Œ0; 1 be intervals satisfying Œ.i 1/=n; i=n t
and
Œ.j 1/=n; j=n s
(2.7)
for all i 2 tO and j 2 sO . Let x t 2 t be the midpoint of t , and let ys 2 s be the midpoint of s. We assume that t and s satisfy the admissibility condition (2.6), so Corollary 2.3 implies that the Taylor expansion of g at the point z0 ´ x t ys will converge exponentially for all x 2 t and y 2 s. Similar to the global case, we introduce the local approximation z t;s ´ V t S t;s Ws G with S t;s 2 RKK and V t ; Ws 2 RK given by ´R i=n .xx t / dx if i 2 tO; .i1/=n Š .V t /i ´ 0 otherwise; ´R j=n .yys / dy if j 2 sO ; .Ws /j ´ .j 1/=n Š 0 otherwise; 8 .C/ ˆ .x t ys / if x t > ys and C < m; <.1/ f .C/ .S t;s / ´ .1/ f .ys x t / if x t < ys and C < m; ˆ : 0 otherwise;
(2.8)
(2.9)
for all i; j 2 and ; 2 K. The error estimate from Corollary 2.3 yields Z
i=n
z t;s /ij j jGij .G .i1/=n
Z
j=n
.j 1/=n
jg.x; y/ gQ z0 ;m .x; y/j dy dx
1 maxfjg.x; y/ gQ z0 ;m .x; y/j W x 2 Œ.i 1/=n; i=n; y 2 Œ.j 1/=n; j=ng n2 1 2 maxfjg.x; y/ gQ z0 ;m .x; y/j W x 2 t; y 2 sg n m1 1 2 log. C 1/ (2.10) n C1
for all m 2 N, i 2 tO and j 2 sO .
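For the model problem, the entries of $V_t$, $W_s$ and $S_{t,s}$ in (2.9) can be computed in closed form, since the monomial integrals have elementary antiderivatives. The following sketch uses my own helper names and reuses f_deriv from the Section 2.2 snippet; it builds the local rank-$m$ factorization for a pair of admissible intervals, which can be compared against the exact block computed by assemble_dense above.

```python
import math
import numpy as np

def monomial_integrals(idx, center, n, m):
    """Entries of V_t (or W_s) from (2.9): the integral of
    (x - center)^nu / nu! over [(i-1)/n, i/n] equals
    (hi^(nu+1) - lo^(nu+1)) / (nu+1)! with lo, hi shifted by center."""
    V = np.zeros((len(idx), m))
    for r, i in enumerate(idx):
        lo, hi = (i - 1) / n - center, i / n - center
        for nu in range(m):
            V[r, nu] = (hi ** (nu + 1) - lo ** (nu + 1)) / math.factorial(nu + 1)
    return V

def coupling(xt, ys, m):
    """Coupling matrix S_{t,s} from (2.9), using f_deriv (Section 2.2)."""
    S = np.zeros((m, m))
    for nu in range(m):
        for mu in range(m - nu):
            if xt > ys:
                S[nu, mu] = (-1) ** mu * f_deriv(xt - ys, nu + mu)
            else:
                S[nu, mu] = (-1) ** nu * f_deriv(ys - xt, nu + mu)
    return S

# admissible pair for eta = 1: t = [0, 1/4], s = [1/2, 3/4]
n, m = 64, 5
t_idx, s_idx = list(range(1, 17)), list(range(33, 49))
xt, ys = 0.125, 0.625
block = monomial_integrals(t_idx, xt, n, m) @ coupling(xt, ys, m) \
        @ monomial_integrals(s_idx, ys, n, m).T   # rank-m block (2.8)
```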
2.5 Cluster tree and block cluster tree

We have seen that local approximations $\tilde G_{t,s}$ of subblocks $\hat t \times \hat s$ of the matrix $G$ will converge exponentially if the intervals $t$ and $s$ satisfy the admissibility condition (2.6). In order to approximate the entire matrix, we therefore have to cover it with subblocks that either satisfy the condition, and can therefore be stored efficiently in the factorized form (2.8), or which contain only a small number of entries, so that we can afford to store them in the standard way, i.e., as a two-dimensional array.

Techniques for constructing this covering for general matrices are discussed in Chapter 3, here we only consider a simple approach suitable for the model problem. We assume that there is a $q \in \mathbb{N}_0$ such that $n = 2^q$, and we define

$$t_{\ell,\alpha} := [\alpha 2^{-\ell}, (\alpha+1) 2^{-\ell}], \qquad \hat t_{\ell,\alpha} := \{\alpha 2^{q-\ell}+1, \dots, (\alpha+1) 2^{q-\ell}\}$$

for all $\ell \in \{0,\dots,q\}$ and all $\alpha \in \{0,\dots,2^\ell-1\}$. By definition, we have

$$t_{\ell,\alpha} \subseteq [0,1], \qquad \hat t_{\ell,\alpha} \subseteq \{1,\dots,n\} = \mathcal{I},$$

and we observe

$$[(i-1)/n, i/n] = [(i-1)2^{-q}, i\,2^{-q}] \subseteq [\alpha 2^{q-\ell} 2^{-q}, (\alpha+1) 2^{q-\ell} 2^{-q}] = t_{\ell,\alpha}$$

for all $i \in \hat t_{\ell,\alpha}$. This means that the support of all basis functions $\varphi_i$ with $i \in \hat t_{\ell,\alpha}$ is contained in $t_{\ell,\alpha}$. We call the intervals $t_{\ell,\alpha}$ clusters, and we call $\hat t_{\ell,\alpha}$ the index set corresponding to $t_{\ell,\alpha}$.

A very important property of clusters is that they can be arranged in a hierarchy: we have

$$\begin{aligned}
t_{\ell,\alpha} &= [\alpha 2^{-\ell}, (\alpha+1) 2^{-\ell}] = [2\alpha 2^{-(\ell+1)}, (2\alpha+2) 2^{-(\ell+1)}]\\
&= [2\alpha 2^{-(\ell+1)}, (2\alpha+1) 2^{-(\ell+1)}] \cup [(2\alpha+1) 2^{-(\ell+1)}, (2\alpha+2) 2^{-(\ell+1)}] = t_{\ell+1,2\alpha} \cup t_{\ell+1,2\alpha+1},\\
\hat t_{\ell,\alpha} &= \{\alpha 2^{q-\ell}+1, \dots, (\alpha+1) 2^{q-\ell}\} = \{2\alpha 2^{q-\ell-1}+1, \dots, (2\alpha+2) 2^{q-\ell-1}\}\\
&= \{2\alpha 2^{q-\ell-1}+1, \dots, (2\alpha+1) 2^{q-\ell-1}\} \,\dot\cup\, \{(2\alpha+1) 2^{q-\ell-1}+1, \dots, (2\alpha+2) 2^{q-\ell-1}\} = \hat t_{\ell+1,2\alpha} \,\dot\cup\, \hat t_{\ell+1,2\alpha+1}
\end{aligned}$$

for all $\ell \in \{0,\dots,q-1\}$ and all $\alpha \in \{0,\dots,2^\ell-1\}$. This observation suggests organizing the clusters in a tree. We pick the largest cluster $t_{0,0}$ as the root, and we define the father-son relationships by

$$\operatorname{sons}(t_{\ell,\alpha}) = \begin{cases} \{t_{\ell+1,2\alpha}, t_{\ell+1,2\alpha+1}\} & \text{if } \ell < q,\\ \emptyset & \text{otherwise}, \end{cases}$$
for all $\ell \in \{0,\dots,q\}$ and $\alpha \in \{0,\dots,2^\ell-1\}$. This defines a balanced binary tree of depth $q$ with $2^{q+1}-1$ nodes. Due to its relationship to the index set $\mathcal{I}$ we denote it by $T_{\mathcal{I}}$, and since it consists of clusters, we call it a cluster tree.

Figure 2.2. Cluster tree for q = 3: the root {1,...,8} has the sons {1,2,3,4} and {5,6,7,8} on level ℓ = 1, these are split into {1,2}, {3,4}, {5,6}, {7,8} on level ℓ = 2, and the leaves on level ℓ = 3 are the singletons {1},...,{8}.
The most important purpose of a cluster tree is to allow us to split the matrix $G$ efficiently into submatrices $G|_{\hat t \times \hat s}$ that can be approximated efficiently, i.e., that correspond to intervals satisfying the admissibility condition. Using the cluster tree, we can organize the search for these submatrices by using a hierarchy: starting with $t = s = t_{0,0}$, we check whether a pair $(t,s)$ of clusters is admissible. If it is, we can approximate the corresponding matrix block $G|_{\hat t \times \hat s}$. If the admissibility condition does not hold, we proceed to check all pairs consisting of sons of $t$ and sons of $s$. The recursive procedure stops if a pair is admissible or if no more sons exist. In the latter case, our construction implies that $\hat t$ and $\hat s$ contain only a small number of indices (in our construction even only one), so we can afford to simply store all coefficients of the submatrix $G|_{\hat t \times \hat s}$.

The recursive search for admissible pairs of clusters suggests a second tree of pairs of clusters: its root is $(t_{0,0}, t_{0,0})$, and the father-son relationship is defined by

$$\operatorname{sons}(t,s) = \begin{cases} \emptyset & \text{if } (t,s) \text{ is admissible},\\ \emptyset & \text{if } \operatorname{sons}(t) = \emptyset = \operatorname{sons}(s),\\ \{(t',s') : t' \in \operatorname{sons}(t),\ s' \in \operatorname{sons}(s)\} & \text{otherwise}. \end{cases}$$

The elements of this tree are called blocks, since each of them is a pair $(t,s)$ corresponding to a block $G|_{\hat t \times \hat s}$ of the matrix $G$. The tree is called block cluster tree and
denoted by $T_{\mathcal{I} \times \mathcal{I}}$ since it describes a hierarchy of subsets of the index set $\mathcal{I} \times \mathcal{I}$ in the same way that the cluster tree $T_{\mathcal{I}}$ describes a hierarchy of subsets of the index set $\mathcal{I}$.

Since the root of $T_{\mathcal{I} \times \mathcal{I}}$ is the pair $(t_{0,0}, t_{0,0})$, the definition implies that each block $b = (t,s) \in T_{\mathcal{I} \times \mathcal{I}}$ is of the form $b = (t_{\ell,\alpha}, t_{\ell,\beta})$ for $\ell \in \{0,\dots,q\}$ and $\alpha, \beta \in \{0,\dots,2^\ell-1\}$. We use the short notation

$$b_{\ell,\alpha,\beta} := (t_{\ell,\alpha}, t_{\ell,\beta}).$$

In order to keep the presentation simple, we only consider the admissibility condition (2.6) with $\eta = 1$. A block $b_{\ell,\alpha,\beta}$ is admissible if and only if

$$\operatorname{diam}(t_{\ell,\alpha}) + \operatorname{diam}(t_{\ell,\beta}) \le 2 \operatorname{dist}(t_{\ell,\alpha}, t_{\ell,\beta})$$

holds. Due to the definition of the cluster tree, it is easy to see that

$$\operatorname{diam}(t_{\ell,\alpha}) = 2^{-\ell}, \qquad \operatorname{dist}(t_{\ell,\alpha}, t_{\ell,\beta}) = 2^{-\ell} \max\{|\alpha-\beta|-1, 0\}$$

hold for all $\ell \in \{0,\dots,q\}$ and $\alpha, \beta \in \{0,\dots,2^\ell-1\}$, therefore a block $b_{\ell,\alpha,\beta}$ is admissible if and only if

$$1 \le \max\{|\alpha-\beta|-1, 0\} \iff |\alpha-\beta| \ge 2$$

holds. This gives us a simple and practical alternative definition of the block cluster tree: its root is $b_{0,0,0}$, and we have

$$\operatorname{sons}(b_{\ell,\alpha,\beta}) = \begin{cases} \emptyset & \text{if } \ell = q \text{ or } |\alpha-\beta| \ge 2,\\ \{b_{\ell+1,2\alpha,2\beta},\ b_{\ell+1,2\alpha+1,2\beta},\ b_{\ell+1,2\alpha,2\beta+1},\ b_{\ell+1,2\alpha+1,2\beta+1}\} & \text{otherwise}. \end{cases}$$

Since we only have to store data for the leaves of the block cluster tree, we are interested in knowing how many leaves there are on the different levels $\ell$.

Lemma 2.4 (Types of blocks). A block $b_{\ell,\alpha,\beta}$ is called diagonal if $\alpha = \beta$ holds, it is called neighbour if $|\alpha-\beta| = 1$ holds, and it is called admissible otherwise. We denote the sets of diagonal, neighbour and admissible blocks on a level $\ell \in \{0,\dots,q\}$ of the block cluster tree $T_{\mathcal{I} \times \mathcal{I}}$ by

$$D_\ell := \{b_{\ell,\alpha,\beta} \in T_{\mathcal{I} \times \mathcal{I}} : \alpha = \beta\}, \qquad N_\ell := \{b_{\ell,\alpha,\beta} \in T_{\mathcal{I} \times \mathcal{I}} : |\alpha-\beta| = 1\}, \qquad A_\ell := \{b_{\ell,\alpha,\beta} \in T_{\mathcal{I} \times \mathcal{I}} : |\alpha-\beta| > 1\}.$$

Then we have

$$\#D_\ell = 2^\ell, \qquad \#N_\ell = 2^{\ell+1}-2, \qquad \#A_\ell = \begin{cases} 0 & \text{if } \ell = 0,\\ 3(2^\ell-2) & \text{otherwise}. \end{cases} \tag{2.11}$$
Figure 2.3. Diagonal, neighbour and admissible blocks on different levels of the block cluster tree. The bottom right picture shows all leaf blocks.
Proof. By induction on $\ell$. On level $\ell = 0$ of the block cluster tree, there is exactly one block $b_{0,0,0}$, and it is obviously diagonal, so we have

$$\#D_0 = 1 = 2^0, \qquad \#N_0 = 0 = 2^{0+1}-2, \qquad \#A_0 = 0.$$

Let now $\ell \in \{0,\dots,q-1\}$ be fixed such that (2.11) holds. Let $b_{\ell,\alpha,\beta} \in D_\ell$, i.e., we have $\alpha = \beta$. Since this block is not admissible, it is split into its sons $b_{\ell+1,2\alpha,2\alpha}$, $b_{\ell+1,2\alpha+1,2\alpha}$, $b_{\ell+1,2\alpha,2\alpha+1}$, $b_{\ell+1,2\alpha+1,2\alpha+1}$. The first and the last son are again diagonal, the others are neighbour blocks.

Let $b_{\ell,\alpha,\beta} \in N_\ell$. Without loss of generality we assume $\alpha+1 = \beta$. This block, too, is not admissible, so we split it into $b_{\ell+1,2\alpha,2\alpha+2}$, $b_{\ell+1,2\alpha+1,2\alpha+2}$, $b_{\ell+1,2\alpha,2\alpha+3}$, $b_{\ell+1,2\alpha+1,2\alpha+3}$. The second of these sons is a neighbour block, the others are admissible.

Let $b_{\ell,\alpha,\beta} \in A_\ell$. Since admissible blocks are not split, this block contributes nothing to the next level. Since each block has to be either diagonal, neighbour or admissible, we conclude

$$\#D_{\ell+1} = 2\#D_\ell = 2 \cdot 2^\ell = 2^{\ell+1}, \qquad \#N_{\ell+1} = 2\#D_\ell + \#N_\ell = 2 \cdot 2^\ell + 2^{\ell+1} - 2 = 2^{\ell+2}-2, \qquad \#A_{\ell+1} = 3\#N_\ell = 3(2^{\ell+1}-2),$$
and the induction is complete.

The lemma implies that the number of blocks on a given level $\ell \in \{0,\dots,q\}$ can be bounded by

$$\#D_\ell + \#N_\ell + \#A_\ell < 2^\ell + 2 \cdot 2^\ell + 3 \cdot 2^\ell = 6 \cdot 2^\ell$$

and that the entire block cluster tree therefore contains not more than

$$\sum_{\ell=0}^{q} 6 \cdot 2^\ell = 6(2^{q+1}-1) \le 12 \cdot 2^q = 12n$$
blocks, i.e., the number of blocks grows only linearly with the matrix dimension, not quadratically. This is a key property of the block cluster tree. Since the sons of a block describe a disjoint partition of their father, a simple induction (cf. Corollary 3.15) yields that the set $L$ of leaves of $T_{\mathcal{I} \times \mathcal{I}}$ corresponds to a disjoint partition of $\mathcal{I} \times \mathcal{I}$, and this disjoint partition can be used to define an approximation of the matrix $G$.
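The recursive construction of the block cluster tree translates directly into code. The following sketch (with my own function names) builds the tree for the model problem, collects its leaves, and confirms that the total number of blocks stays below the bound 12n derived above.

```python
def build_blocks(level, alpha, beta, q, leaves):
    """Recursion of Section 2.5: a block b_{l,alpha,beta} is admissible
    iff |alpha - beta| >= 2; inadmissible blocks are split until the
    maximal level q. Returns the total number of blocks visited."""
    if abs(alpha - beta) >= 2:                      # admissible leaf
        leaves.append((level, alpha, beta, True))
        return 1
    if level == q:                                  # inadmissible leaf
        leaves.append((level, alpha, beta, False))
        return 1
    return 1 + sum(build_blocks(level + 1, a, b, q, leaves)
                   for a in (2 * alpha, 2 * alpha + 1)
                   for b in (2 * beta, 2 * beta + 1))

q = 6                                               # n = 2^q = 64
leaves = []
total = build_blocks(0, 0, 0, q, leaves)
print(total, "blocks,", len(leaves), "leaves, bound 12n =", 12 * 2 ** q)
```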
2.6 Hierarchical matrix

Given a block cluster tree, we can now define an approximation of the matrix $G$. Until now we have denoted submatrices corresponding to blocks $b = (t,s)$ by $G|_{\hat t \times \hat s}$. In order to avoid technical complications dealing with different subsets of the index set $\mathcal{I}$, it is a good idea to use a different notation: we consider submatrices of $G$ as elements of $\mathbb{R}^{\mathcal{I} \times \mathcal{I}}$ which vanish outside of the index set $\hat t \times \hat s$. This does not increase the storage requirements, since only non-zero entries have to be stored, but it makes treating interactions between submatrices corresponding to different clusters easier.

For each cluster $t \in T_{\mathcal{I}}$, we introduce the diagonal matrix $\chi_t \in \mathbb{R}^{\mathcal{I} \times \mathcal{I}}$ by

$$(\chi_t)_{ij} := \begin{cases} 1 & \text{if } i = j \in \hat t,\\ 0 & \text{otherwise}, \end{cases} \qquad \text{for all } i, j \in \mathcal{I}.$$

The restriction of $G$ to a submatrix corresponding to a block $b = (t,s)$ is expressed by $\chi_t G \chi_s$ in the new notation: all coefficients outside the rows in $\hat t$ are eliminated, and also all outside the columns of $\hat s$. Since the leaves $L$ of the block cluster tree $T_{\mathcal{I} \times \mathcal{I}}$ form a disjoint partition of $\mathcal{I} \times \mathcal{I}$, we get

$$G = \sum_{b=(t,s) \in L} \chi_t G \chi_s. \tag{2.12}$$
We split the leaves into admissible and inadmissible ones using

$$L^+ := \{b \in L : b \text{ is admissible}\}, \qquad L^- := L \setminus L^+$$

and note that for admissible leaves the submatrix $\chi_t G \chi_s$ can be replaced by the approximation (2.8). This yields the approximation of $G$ by a hierarchical matrix

$$\tilde G := \sum_{b=(t,s) \in L^+} \tilde G_{t,s} + \sum_{b=(t,s) \in L^-} \chi_t G \chi_s = \sum_{b=(t,s) \in L^+} V_t S_b W_s^* + \sum_{b=(t,s) \in L^-} \chi_t G \chi_s. \tag{2.13}$$

The submatrices corresponding to admissible blocks have the form

$$\tilde G_{t,s} = V_t S_b W_s^*$$

and are typically stored in matrices $A_b = V_t S_b \in \mathbb{R}^{\hat t \times K}$ and $B_b = W_s \in \mathbb{R}^{\hat s \times K}$ in the simple product form

$$\tilde G_{t,s} = A_b B_b^* \tag{2.14}$$

requiring $(\#\hat t + \#\hat s)m$ units of storage. This representation is called the hierarchical matrix representation or H-matrix representation.

Let us take a look at the storage requirements of this representation. Since the factorized representation is only useful if $(\#\hat t)(\#\hat s) \ge (\#\hat t + \#\hat s)m$ holds, it makes sense to stop subdividing the block cluster tree before the blocks become too small. On level $\ell \in \{0,\dots,q\}$, we have $\#\hat t_{\ell,\alpha} = 2^{q-\ell}$ for all $\alpha \in \{0,\dots,2^\ell-1\}$ and get

$$2^{q-\ell}\, 2^{q-\ell} = (\#\hat t_{\ell,\alpha})(\#\hat t_{\ell,\beta}) \ge (\#\hat t + \#\hat s)m = 2 \cdot 2^{q-\ell} m \iff 2^{q-\ell} \ge 2m.$$

We let

$$p_0 := \lceil \log_2 m \rceil + 1, \qquad 2m = 2^{\log_2 m + 1} \le 2^{p_0} < 2^{\log_2 m + 2} = 4m \tag{2.15}$$

and choose to stop subdividing blocks at level $p := q - p_0$. This implies $2^{q-\ell} \ge 2^{q-p} = 2^{p_0} \ge 2m$ and therefore ensures that we apply the approximation only to submatrices that are sufficiently large. From now on we assume that the cluster tree $T_{\mathcal{I}}$ and the block cluster tree stop at level $p$ and that the hierarchical matrix $\tilde G$ is defined accordingly.
By definition, admissible blocks $b_{\ell,\alpha,\beta} \in T_{\mathcal{I} \times \mathcal{I}}$ are stored in the factorized form (2.14) requiring

$$(\#\hat t_{\ell,\alpha} + \#\hat t_{\ell,\beta})m = (2^{q-\ell} + 2^{q-\ell})m = 2m2^{q-\ell}$$

units of storage. Using Lemma 2.4 yields that all admissible blocks require

$$\sum_{\ell=0}^{p} (\#A_\ell)\, 2m2^{q-\ell} = \sum_{\ell=1}^{p} 3(2^\ell-2)\, 2m2^{q-\ell} = 6m \sum_{\ell=1}^{p} (2^q - 2^{q-\ell+1}) = 6mpn - 12mn + 6m2^{p_0+1} = 6m(p-2)n + 6m2^{p_0+1}$$

units of storage if $p \ge 2$. Due to $2^{p_0} \le n$ we get the upper bound

$$6m(p-2)n + 6m2^{p_0+1} \le 6m(p-2)n + 6m \cdot 2n = 6mpn, \tag{2.16}$$

i.e., the storage requirements for the admissible blocks are in $O(nm \log_2 n)$.

The inadmissible blocks occurring on the maximal level $p$ are stored as two-dimensional arrays requiring

$$(\#\hat t_{p,\alpha})(\#\hat t_{p,\beta}) = 2^{q-p}\, 2^{q-p} = 2^{2p_0}$$

units of storage per block, and using Lemma 2.4 shows that all inadmissible blocks require

$$(\#D_p + \#N_p)\, 2^{2p_0} = (2^p + 2^{p+1} - 2)\, 2^{2p_0} = (3 \cdot 2^p - 2)\, 2^{2p_0} = 3 \cdot 2^{p+p_0}\, 2^{p_0} - 2^{2p_0+1} = 3n2^{p_0} - 2^{2p_0+1}$$

units of storage. Due to $2^{p_0} \le 4m$ we get the upper bound

$$3n2^{p_0} - 2^{2p_0+1} \le 3n \cdot 4m = 12nm, \tag{2.17}$$

i.e., the storage requirements for the inadmissible blocks are in $O(nm)$.

We can conclude that the representation of $\tilde G$ by a hierarchical matrix requires not more than

$$6m(p+2)n \tag{2.18}$$

units of storage, therefore the storage complexity is in $O(nm \log_2 n)$, a major improvement over the quadratic complexity of the traditional representation by a two-dimensional array.
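The storage count of this section is easy to tabulate. The following sketch is an illustration under the assumptions of this section (n = 2^q, truncation at level p = q − p0): it evaluates the exact admissible and inadmissible contributions and compares them with the bound 6m(p+2)n from (2.18) and with the dense storage n².

```python
import math

def hmatrix_storage(q, m):
    """Exact storage units for the model H-matrix of Section 2.6:
    each admissible block of level l costs 2*m*2^(q-l), each
    inadmissible leaf block on level p costs 2^(2*p0)."""
    p0 = math.ceil(math.log2(m)) + 1
    p = q - p0
    n = 2 ** q
    adm = sum(3 * (2 ** l - 2) * 2 * m * 2 ** (q - l) for l in range(1, p + 1))
    inadm = (3 * 2 ** p - 2) * 2 ** (2 * p0)
    return adm + inadm, 6 * m * (p + 2) * n, n * n

for q in (10, 14, 18):
    print(2 ** q, *hmatrix_storage(q, m=4))   # actual, bound (2.18), dense
```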
2.7 Matrix approximation error

The approximation error is given by

$$G - \tilde G = \sum_{b=(t,s) \in L^+} (\chi_t G \chi_s - \tilde G_{t,s}),$$
i.e., we can construct global error estimates by combining local ones. For the Frobenius norm, we can use (2.10) to find the following simple estimate:

$$\|G - \tilde G\|_F = \Big( \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{I}} (G_{ij} - \tilde G_{ij})^2 \Big)^{1/2} \le \Big( \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{I}} \frac{1}{n^4} \log^2(\eta+1) \Big( \frac{\eta}{\eta+1} \Big)^{2(m-1)} \Big)^{1/2} = \frac{1}{n} \log(\eta+1) \Big( \frac{\eta}{\eta+1} \Big)^{m-1}. \tag{2.19}$$

For the spectral norm, we can derive a simple global error estimate by using the Cauchy–Schwarz inequality: for all vectors $x \in \mathbb{R}^{\mathcal{I}}$ we have

$$\|Gx\|_2 = \Big( \sum_{i \in \mathcal{I}} \Big( \sum_{j \in \mathcal{I}} G_{ij} x_j \Big)^2 \Big)^{1/2} \le \Big( \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{I}} G_{ij}^2 \sum_{k \in \mathcal{I}} x_k^2 \Big)^{1/2} = \|G\|_F \|x\|_2,$$

and this implies $\|G\|_2 \le \|G\|_F$, so we conclude

$$\|G - \tilde G\|_2 \le \|G - \tilde G\|_F \le \frac{1}{n} \log(\eta+1) \Big( \frac{\eta}{\eta+1} \Big)^{m-1}. \tag{2.20}$$

The storage requirements of the H-matrix are bounded by $6mn(p+2)$, therefore they only increase linearly with respect to $m$, while (2.19) and (2.20) show that the error decreases exponentially (for our choice $\eta = 1$, each additional Taylor term at least halves the error).
2.8 H 2 -matrix We have seen that the nearfield matrices require not more than 12nm units of storage, i.e., the storage requirements grow only linearly in the number n of degrees of freedom. The farfield representation of an H -matrix, on the other hand, requires roughly 6mnp units of storage, i.e., it scales like n log2 n if the matrix dimension grows. For large problems, the logarithmic factor is undesirable, therefore we are interested z that scales linearly with n. in finding a representation of G The key idea is to consider the matrices V t and Ws appearing in (2.8) not independently, but to treat the entire families .V t / t2T and .Ws /s2T and take advantage of relationships between “family members”. These families are referred to as cluster z t;s , while the bases, since the columns of V t form a generating set for the range of G z t;s columns of Ws form a generating set for the range of G . The columns do not have to be linearly independent, although they usually will be (e.g., for the orthogonal cluster bases introduced in Chapter 5).
24
2 Model problem
Let us take a look at a matrix V t for t 2 T : it is defined by Z
i=n
.V t /i D .i1/=n
.x x t / dx Š
for the midpoint x t of t, i 2 tO and 2 K. We consider the case that t is not a leaf. Let t 0 2 sons.t /, and let x t 0 be the midpoint of t 0 . In this case, we can use .x x t / 1 X 1 .x x t 0 / .x t 0 x t / D .x x t 0 C x t 0 x t / D Š Š Š D0
D
X .x x t 0 / .x t 0 x t / Š . /Š D0
to express the integrand in terms of Taylor monomials centered at x t 0 of t 0 instead of x t and get Z i=n X .x t 0 x t / .x x t / .x x t 0 / .V t /i D dx D dx Š Š . /Š .i1/=n D0 .i 1/=n (2.21) for all i 2 tO0 and all 2 K. The first factor is just the coefficient .V t 0 /i of the matrix corresponding to the cluster t 0 . The second factor describes the change of basis from the center x t to the center x t 0 , and we collect its coefficients in a transfer matrix E t 0 2 RKK given by ´ .x t 0 x t / if ; ./Š 0 .E t / ´ 0 otherwise;
Z
i=n
for all ; 2 K. Using this matrix, the equation (2.21) can be expressed by X .V t /i D .V t 0 /i .E t 0 / D .V t 0 E t 0 /i
(2.22)
2K
for all i 2 tO0 and all 2 K. Let now i 2 tO and 2 K. Due to our definition of the cluster tree, there is exactly one ti 2 sons.t/ with i 2 tOi . Combining the equations (2.9) and (2.22) yields X .V t 0 E t 0 /i D .V ti E ti /i D .V t /i ; t 0 2sons.t/
since the i -th row of V t 0 can only differ from zero if i 2 tO0 , and we have already established that this only happens for t 0 D ti .
2.8 H 2 -matrix
We can conclude that Vt D
X
25
Vt 0 Et 0
t 0 2sons.t/
holds for all cluster t 2 T that are not leaves, i.e., the matrix V t corresponding to any non-leaf cluster t can be expressed in terms of the matrices V t 0 corresponding to the sons of t . Cluster bases with this property are called nested. This property suggests a data-sparse, hierarchical representation for .V t / t2T : we store V t only for leaf clusters and use the expansion matrices E t to express all other cluster basis matrices. Since the expansion matrices require only m2 units of storage, this representation is far more efficient than the original one. Let us consider the storage requirements for a cluster basis .V t / t2T represented in this form. With q, p and p0 as in Section 2.6, we see that the cluster tree T consists of p X 2` D 2pC1 1 (2.23) `D0
clusters. We have to store a transfer matrix E t for all clusters except for the root, and since each of these matrices is given by m2 coefficients, we need m2 .2pC1 2/ units of storage. We also have to store the matrices V t for all leaf clusters. Due to our construction, all leaf clusters are on level p, and this level contains 2p clusters t of size #tO D 2qp . Since we only have to store the #tO non-zero rows of each V t , the storage requirements for one matrix are .#tO/m D 2qp m, and for all matrices 2p 2qp m D 2q m D nm: Therefore the entire cluster basis .V t / t2T requires nm C 2m2 .2p 1/ units of storage, and (2.15) yields the bound nmC2m2 .2p 1/ nmC2p0 m2p D nmCm2pCp0 D nmCm2q D 2nm (2.24) for the storage requirements of the cluster basis .V t / t2T . Since Ws D Vs holds for all s 2 T , the cluster basis .Ws /s2T requires no additional storage. This leaves only the matrices .Sb /b2LC describing the coupling coefficients of clusters t and s with respect to the cluster bases .V t / t2T and .Ws /s2T . Each of these matrices requires m2 units of storage, and we have to store one matrix for each admissible block. Using Lemma 2.4 we can compute that all the coupling matrices require p X `D0
.#A` /m2 D
p X `D1
3.2` 2/m2 D 6m2
p1 X
.2` 1/ D 6m2 .2p 1 p/
`D0
26
2 Model problem
units of storage, and due to (2.15) we can bound this by 6m2 .2p 1 p/ 3m2m2p 3m2p0 2p D 3m2q D 3mn:
(2.25)
We see that we have reached our goal: the storage requirements for the cluster basis and the coupling matrices, i.e., for the representation of all admissible blocks, are bounded by 5mn, and therefore grow only linearly with n. A matrix given in the form (2.13), where the cluster bases .V t / t2T and .Ws /s2T are nested and represented by expansion matrices, is called an H 2 -matrix. We have seen that the nearfield requires less than 12mn units of storage, the coupling matrices require less than 3mn units, and the cluster basis requires 2mn, so we get a total of less than 17mn units of storage for the H 2 -matrix. More precisely, the H -matrix representation of the admissible blocks requires 6m..p 2/n C 2p0 C1 / D 6mn.p 2/ C 12m2p0 units of storage according to (2.16), while the H 2 -matrix requires not more than 5mn units of storage for the same purpose according to (2.24) and (2.25). We can see that the H 2 -matrix will be more efficient as soon as p grows larger than 3. The estimates even imply that the ratio of the storage requirements of H - and H 2 -matrices is bounded from below by p 2, i.e., the H 2 -matrix representation will become more efficient as the matrix dimension grows.
2.9 Numerical experiment We conclude this chapter with a simple numerical experiment: we compare the matrix G and its H 2 -matrix approximation GQ for different problem dimensions n and different expansion orders m. In order to improve the storage efficiency, we stop the construction of the cluster tree as soon as the leaves contain not more than 4m indices. Q 2 in the spectral norm, Table 2.1 contains the absolute approximation errors kG Gk estimated by a power iteration. For D 1, our theory predicts that the approximation error will be proportional to .1=2/m , and this behaviour is indeed visible. The theory also states that the error will be proportional to 1=n, and this prediction also coincides with the experiment. Table 2.2 contains the storage requirements in units of 1 KByte = 1024 Bytes. Our theoretical estimates predict that the storage requirements will be proportional to n, and this is clearly visible: H 2 -matrices indeed have linear complexity in the number n of degrees of freedom. The prediction of the dependence on m is far less accurate, since the expansion order influences the depth of the cluster tree and thereby the balance
2.9 Numerical experiment
27
Table 2.1. Approximation errors for the model problem.
n mD1 mD2 mD3 mD4 mD5 mD6 mD7 256 3:34 7:05 1:25 3:86 1:16 4:27 1:57 512 1:74 3:65 6:06 2:06 5:67 2:27 7:58 1024 8:45 1:95 3:06 1:06 2:97 1:17 3:88 2048 4:25 9:46 1:56 5:37 1:47 5:78 1:98 4096 2:15 4:76 7:67 2:77 7:28 2:98 9:59 8192 1:15 2:46 3:87 1:37 3:68 1:48 4:89
Table 2.2. Storage requirements in KB per degree of freedom for H 2 -matrix approximations and standard array representation for the model problem.
n m D 1 m D 2 m D 3 m D 4 m D 5 m D 6 m D 7 Array 256 0:45 0:39 0:43 0:51 0:55 0:59 0:64 2:0 0:46 0:40 0:45 0:53 0:58 0:62 0:68 4:0 512 0:47 0:41 0:46 0:55 0:59 0:64 0:70 8:0 1024 2048 0:47 0:41 0:46 0:56 0:60 0:65 0:71 16:0 4096 0:47 0:42 0:47 0:56 0:60 0:66 0:71 32:0 0:48 0:42 0:47 0:56 0:61 0:66 0:72 64:0 8192 1048576 0:48 0:42 0:47 0:56 0:61 0:66 0:72 8192:0
between storage required for the nearfield, coefficient, expansion and cluster basis matrices and storage required for internal bookkeeping of the program. Still we can see that using an expansion order of m D 7 will yield a precision which should be sufficient for most practical applications at a cost of less than 1 KByte per degree of freedom. For n D 8192 D 213 , this translates to a compression factor of 1:13%, and the compression factor will improve further as n grows larger. As an example, the case n D 1048576 D 220 has been included. The array representation would require more than 8 TBytes, while the H 2 -representation takes less than 1 GByte. This corresponds to a compression factor of 0:009%. On a single 900 MHz UltraSPARC IIIcu processor of a SunFire 6800 computer, the setup of the standard representation for n D 8192 requires 118 seconds, while the setup of the H 2 -matrix approximation with m D 7 is accomplished in less than 0:5 seconds, so we can conclude that the compressed format saves not only storage, but also time.
Chapter 3
Hierarchical matrices
We have seen that H 2 -matrices can be used to represent the dense matrices corresponding to the model problem efficiently. In order to be able to treat more general problems, we require more general structures which still retain the important properties observed in the model problem. We start by introducing general cluster trees and block cluster trees. The definitions can be kept relatively simple: the most important property of a cluster tree, as far as theoretical investigations are involved, is that the index set corresponding to a nonleaf cluster is the disjoint union of the index sets corresponding to its sons. The most important property of a block cluster tree is that it is a cluster tree and that the index sets corresponding to its nodes are of product structure. Once we have general block cluster trees at our disposal, we can define general hierarchical matrices and prove bounds for their storage complexity. The definition of general H 2 -matrices is slightly more challenging, since we have to be able to handle simple constant-order approximation schemes as well as a variety of variable-order schemes, and we are looking for definitions that allow us to treat all of these methods in a unified way and still get optimal estimates. The key is the definition of a bounded rank distribution that covers all applications mentioned above and is still relatively simple. The complexity estimate for H 2 -matrices given for the model problem in the previous chapter essentially relies on the fact that the number of matrix blocks is proportional to the number of clusters. We can use the same approach for general H 2 -matrices, i.e., we introduce the concept of sparse block cluster trees and prove that the storage requirements of an H 2 -matrix grow linearly with respect to the number of clusters if the rank distributions are bounded and the block cluster tree is sparse. There is a price to pay for the optimal order of complexity: we cannot store cluster bases directly, but have to rely on transfer matrices. This implies that we cannot compute matrix-vector products directly, but require suitable recursive algorithms: the forward and backward transformations, which allow us to perform the computation in linear complexity. This chapter is organized as follows: • Section 3.1 introduces the general definition of a cluster tree and proves a number of its basic properties. • Section 3.2 contains the general definition of a block cluster tree. • Section 3.3 describes a simple geometrical construction for cluster and block cluster trees, which will be used in numerical examples presented in the other chapters.
3.1 Cluster tree
29
• Section 3.4 gives the general definition of hierarchical matrices, a predecessor of H 2 -matrices (cf. [62], [68], [67], [52], [63]). • Section 3.5 introduces general cluster bases and the basic concepts for estimating the complexity of algorithms for H 2 -matrices. • Section 3.6 is devoted to the general definition of H 2 -matrices and H 2 -matrix spaces and to proving bounds for the storage complexity (cf. [70]). • Section 3.7 presents the most important algorithm in the context of H 2 -matrices: the evaluation of the product of an H 2 -matrix and an arbitrary vector. • Section 3.8 contains a number of definitions that allow us to express complexity estimates in terms of matrix dimensions instead of numbers of clusters. • Section 3.9 contains a number of auxiliary results required in the other sections. The proofs are only included for the sake of completeness. Assumptions in this chapter: We assume that finite index sets and J are given and let n ´ # and nJ ´ #J denote the cardinalities of these index sets.
3.1 Cluster tree The construction of the H - and H 2 -matrix approximations of the matrix G in Chapter 2 is based on a cluster tree T and a block cluster tree T . The elements t 2 T of the cluster tree correspond to index sets tO , and the elements b D .t; s/ 2 T of the block cluster tree correspond to pairs of these index sets describing submatrices GjtOOs of the matrix G. The cluster tree used in the model case has a very regular structure: all leaves are on the same level of the tree, and if a cluster is not a leaf, it has exactly two sons of exactly the same size. These properties severely limit the applicability of these regular cluster trees. In order to be able to handle more general matrices, we require more general cluster trees. Definition 3.1 (Tree). Let T D .V; S; r/ be a triple of a finite set V , a mapping S W V ! P .V / of V into the subsets of V , and an element r 2 V . A tuple v0 ; v1 ; : : : ; v` 2 V is called a path connecting v0 and v` in T if vi 2 S.vi1 /
holds for all i 2 f1; : : : ; `g:
The triple T is called a tree if for each v 2 T there is a unique path connecting r to v in T . In this case, we call V the set of nodes, r the root, and S.v/ the set of sons of v 2 V .
30
3 Hierarchical matrices
Definition 3.2 (Tree notations). Let T D .V; S; r/ be a tree. In order to avoid working with the triple .V; S; r/ explicitly, we introduce the short notation v 2 T for v 2 V , sons.v/ D S.v/ (assuming that the tree T corresponding to v is clear from the context), and root.T / D r. Due to Definition 3.1, for each v 2 T there is a unique path r D v0 ; : : : ; v` D v connecting the root to v. If v ¤ r, we have ` > 0, therefore v`1 2 T exists, i.e., there is a unique v`1 2 T with v 2 sons.v`1 /. We call this node the father of v and denote it by father.v/. In the model problem, there is an index set tO associated with each cluster t 2 T . In the general case, we use labels to attach additional information to each node of a tree: Definition 3.3 (Labeled tree). Let T D .V; S; r; ; L/. T is a labeled tree if .V; S; r/ is a tree and if W V ! L is a mapping from the set of nodes V into L. The set L is called the label set, and for all v 2 V , .v/ is called the label of v and denoted by v. O The notations for trees introduced in Definition 3.2 are also used for labeled trees. Using the general labeled tree, we can define the general cluster tree that keeps the most important properties of the simple one introduced in Section 2.5, but can also be useful in far more general situations. Definition 3.4 (Cluster tree). Let T be a labeled tree. T is a cluster tree for the index set if the following conditions hold: • The label of the root r D root.T / is , i.e., rO D . • If t 2 T has at least one son, then the labels of the sons form a disjoint partition S of the label of the father, i.e., tO D P fOs W s 2 sons.t /g. The nodes t 2 T of a cluster tree T are called clusters. A cluster t 2 T with sons.t/ D ; is called a leaf, and the set of all leaves is denoted by L ´ ft 2 V W sons.t / D ;g: Differently from the special case considered for the model problem, we allow arbitrary partitions of the index set instead of the contiguous sets used in the onedimensional case, see Figure 3.1. Definition 3.5 (Cluster relationships). For all t 2 T , we define the set of descendants inductively by ´ S ¹t º [ t 0 2sons.t/ sons .t 0 / if sons.t / ¤ ; sons .t / ´ ¹tº otherwise: The set of predecessors of a cluster t 2 T is given by pred.t / ´ ft 2 T W t 2 sons .t /g:
31
3.1 Cluster tree
f1; 2; 3; 4; 5; 6; 7; 8; 9g
f2; 3; 4; 6; 9g
f4g
f1; 5; 7; 8g
f2; 3; 6; 9g
f6g
f2; 9g
f1; 8g
f3g
f1; 8g
f5; 7g
f7g
f5g
Figure 3.1. Simple cluster tree for D f1; 2; 3; 4; 5; 6; 7; 8; 9g.
t
t
Figure 3.2. Descendants (top) and predecessors (bottom) of a cluster t .
It is frequently useful to split the cluster tree into levels, i.e., to assign each cluster an integer according to its distance from the root. Definition 3.6 (Cluster level). Let T be a cluster tree. The level of a cluster t 2 T is defined inductively by ´ level.t C / C 1 if there is a cluster t C 2 T with t 2 sons.t C / level.t/ D 0 otherwise, i.e., if t D root.T /: The definition implies that the root will be assigned a level of zero and all other clusters
32
3 Hierarchical matrices
T0 T1 T2 T3
Figure 3.3. Levels in a cluster tree.
t will be assigned the level of their father t C plus one. All clusters on a given level ` 2 N0 are collected in the set T` ´ ft 2 T W level.t / D `g and all leaves on a given level ` 2 N0 in the set L` ´ T` \ L D ft 2 L W level.t / D `g: As an example for the usefulness of the concept of cluster levels, let us consider the proof of the following lemma: Lemma 3.7 (Transitivity). Let t; s; r 2 T with s 2 sons .t / and r 2 sons .s/. Then we have r 2 sons .t /. Proof. By induction on level.r/ level.t /. For level.r/level.t / D 0, Definition 3.5 yields r D s D t , so our claim is trivially fulfilled. Let now n 2 N0 , and let us assume that r 2 sons .t / holds for all t; s; r 2 T with s 2 sons .t/, r 2 sons .s/ and level.r/ level.t / D n. Let t; s; r 2 T with s 2 sons .t /, r 2 sons .s/ and level.r/ level.t / D n C 1. Case 1: t D s. This case is trivial, since we have r 2 sons .s/ D sons .t /. Case 2: t ¤ s. Definition 3.5 implies that there is a cluster t 0 2 sons.t / with s 2 sons .t 0 /. Since level.t 0 / D level.t / C 1, we have level.r/ level.t 0 / D n and can apply the induction assumption in order to prove that r 2 sons .t 0 / sons .t / holds, which concludes the induction. For the simple cluster tree used in the one-dimensional example, the relationships between clusters are fairly simple: on each level, the clusters describe a disjoint partition of , all clusters on level p are leaves, while all clusters on all other levels have exactly two sons. Since all of these properties are lost in the general case, we have to base proofs of complexity or error estimates on the following result:
3.1 Cluster tree
33
Lemma 3.8 (Intersecting clusters). Let t; s 2 T with tO \ sO ¤ ;. Then the following statements holds: 1. level.t/ D level.s/ implies t D s. 2. level.t/ level.s/ implies s 2 sons .t /. Proof. We prove level.t / D level.s/ ^ tO \ sO ¤ ; H) t D s
(3.1)
for all t; s 2 T by induction. If level.t/ D level.s/ D 0 holds, t and s are both the root of T , so they have to be identical. Let now m 2 N0 be such that (3.1) holds for all t; s 2 T with level.t / D level.s/ D m. Let t; s 2 T with level.t / D level.s/ D mC1 and tO\Os ¤ ;. Due to the definition of the level, there are uniquely defined clusters t C ; s C 2 T with level.t C / D level.s C / D m and t 2 sons.t C /, s 2 sons.s C /. Since tOC \ sO C tO \ sO ¤ ; holds, we can apply the induction assumption to prove t C D s C , i.e., t and s are sons of the same father t C . Since all sons of t C are disjoint by definition, tO \ sO ¤ ; implies t D s, which concludes the induction. Let t; s 2 T with tO \ sO ¤ ; and level.t / level.s/. The definition of the level implies that we can find a cluster s C 2 pred.s/ with level.s C / D level.t /. Due to tO \ sO C tO \ sO ¤ ;, we can apply the first statement of this lemma to find t D s C . This lemma states that if the index sets corresponding to two clusters intersect, one of the clusters has to be a descendant of the other. A simple consequence of this result is that the leaves of a cluster tree form a disjoint partition of . Corollary 3.9 (Leaf clusters). Let T be a cluster tree, and let r 2 T . The set O As a special case, the set ftO W t 2 L \ sons .r/g is a disjoint partition of r. ftO W t 2 L g is a disjoint partition of . Proof. Let t; s 2 L with tO \ sO ¤ ;. Due to Lemma 3.8, we have s 2 sons .t / or t 2 sons .s/. Since t and s are leaves, sons .t / D ft g and sons .s/ D fsg hold and we conclude t D s, which means that the index sets corresponding to leaf clusters are pairwise disjoint. Let i 2 r. O We have to find t 2 sons .r/ \ L with i 2 tO. Let us consider the set C ´ ft 2 sons .r/ W i 2 tOg. It contains r and is therefore not empty. Since the cluster tree T is finite, C contains only a finite number of elements, so there has to be a cluster t 2 C with level.t / D maxflevel.s/ W s 2 C g: If sons.t/ ¤ ; would hold, Definition 3.4 implies that we could find s 2 sons.t / with i 2 sO , i.e., s 2 C with level.s/ D level.t / C 1, which would contradict the maximality property of t. Therefore t has to be a leaf.
34
3 Hierarchical matrices
Corollary 3.10 (Level partitions). Let T be a cluster tree and let ` 2 N0 . The set ftO W t 2 T` g is a disjoint partition of a subset of . Proof. Let t; s 2 T` . If tO \ sO ¤ ; holds, Lemma 3.8 implies t D s, therefore all elements of ftO W t 2 T` g are disjoint. Since most of the complexity estimates in the following sections depend on characteristic quantities of cluster trees, we introduce the following notations: Definition 3.11 (Cluster tree quantities). Let T be a cluster tree for the index set . We denote the number of indices by n ´ # and the number of clusters by c ´ #T : For some estimates we also require the depth p ´ maxflevel.t / W t 2 T g of a cluster tree. In the model case, we have p D p q D log2 n, c D 2pC1 1 and for any ` 2 f0; : : : ; pg, t 2 T` implies that we can find an integer ˛ 2 f0; : : : ; 2` 1g with tO D f˛2q` C 1; : : : ; .˛ C 1/2q` g.
3.2 Block cluster tree Let T and TJ be cluster trees for finite index sets and J. Using these cluster trees, we can now proceed to define the block cluster tree TJ which gives rise to the hierarchical block partitions used in constructing H - and H 2 -matrices. In the model case, all blocks correspond to square subblocks of the matrix, and a block is either a leaf or has exactly four descendants. These properties are lost in the general case, since clusters on the same level in the cluster tree can correspond to index sets of different sizes and since not all leaves of the cluster tree are on the same level. Definition 3.12 (Block cluster tree). Let TJ be a labeled tree. TJ is a block cluster tree for T and TJ if it satisfies the following conditions: • root.TJ / D .root.T /; root.TJ //. • Each node b 2 TJ has the form b D .t; s/ for t 2 T and s 2 TJ and its label satisfies bO D tO sO .
3.2 Block cluster tree
• Let b D .t; s/ 2 TJ . If sons.b/ ¤ ;, we have 8 ˆ if sons.t / D ;; sons.s/ ¤ ;; <¹t º sons.s/ sons.b/ D sons.t / ¹sº if sons.t / ¤ ;; sons.s/ D ;; ˆ : sons.t / sons.s/ otherwise:
35
(3.2)
The nodes b 2 TJ of a block cluster tree are called blocks. In order to handle the three cases of (3.2) in a unified fashion, we introduce the following notation: Definition 3.13 (Extended set of sons). Let T be a cluster tree. We let ´ sons.t / if sons.t / ¤ ;; sonsC .t / ´ for all t 2 T : ¹t º otherwise We can see that (3.2) takes the short form sons.b/ D sonsC .t / sonsC .s/. Lemma 3.14 (Cluster tree of blocks). Let TJ be a block cluster tree for T and TJ . Then TJ is a cluster tree for the product index set J. Proof. Let r be the root of TJ , let r and rJ be the roots of T and TJ . By definition, we have r D .r ; rJ / and rO D rO rOJ D J. Let b D .t; s/ 2 TJ with sons.b/ ¤ ;. This implies sons.b/ D sonsC .t / sonsC .s/. By definition, we have [ [ P P 0 O tO D t ; s O D sO 0 ; 0 C 0 C t 2sons .t/
therefore we can conclude bO D tO sO D D
[ P [ P
t 0 2sonsC .t/
s 2sons .s/
[ P tO0
.t 0 ;s 0 /2sons.b/
s 0 2sonsC .s/
[ P tO0 sO 0 D
sO 0
b 0 2sons.b/
bO 0 :
Combining this lemma with Corollary 3.9 yields the following useful observation: Corollary 3.15 (Block partition). Let b D .t ; s / 2 TJ . Then fbO D tO sO W b D .t; s/ 2 sons .b / \ LJ g
(3.3)
is a disjoint partition of bO . As a special case, fbO D tO sO W b D .t; s/ 2 LJ g is a disjoint partition of J.
(3.4)
36
3 Hierarchical matrices
Proof. Due to Lemma 3.14, TJ is a cluster tree for the index set J. Applying Corollary 3.9 concludes the proof of the first claim. In order to prove the second claim, we let b D root.TJ / and observe sons .b / \ LJ D LJ . In short, the partition (3.4) corresponds to a decomposition of matrices in RJ into non-overlapping submatrices. Among the elements of this partition, we now have to distinguish admissible and inadmissible subblocks, since we can apply low-rank approximation schemes only to admissible submatrices. Definition 3.16 (Admissibility condition). Let T and TJ be cluster trees. A predicate A W T TJ ! ftrue; falseg is an admissibility condition for T and TJ if A.t; s/ ) A.t 0 ; s/
holds for all t 2 T ; s 2 TJ ; t 0 2 sons.t /
A.t; s/ ) A.t; s 0 /
holds for all t 2 T ; s 2 TJ ; s 0 2 sons.s/:
and If A.t; s/ D true, the pair .t; s/ is called admissible. In the model case, the clusters t; s 2 T are intervals, and we can use (2.6) as an admissibility condition: ´ true if diam.t / C diam.s/ 2 dist.t; s/; A1d .t; s/ ´ (3.5) false otherwise; for all intervals t; s 2 T and the admissibility parameter 2 R>0 . Since t 0 t holds for all t 0 2 sons.t / and s 0 s holds for all s 0 2 sons.s/, A1d satisfies the conditions of Definition 3.16. Remark 3.17 (Weak admissibility condition). In certain situations, we can replace (3.5) by the weak admissibility condition ´ true if t ¤ s; A1d;w .t; s/ ´ false otherwise; investigated in [69]. This condition leads to the simple matrix structure investigated in the first paper [62] on hierarchical matrices. It can be proven that this structure is wellsuited for treating certain one-dimensional problems [59], [96] and yields acceptable results for some essentially one-dimensional integral equations [84]. In the case of integral equations, experiments [69] indicate that the original admissibility condition leads to more robust approximation properties, while the weak admissibility condition offers algorithmic advantages.
3.3 Construction of cluster trees and block cluster trees
37
Using an admissibility condition, we can split the subblocks in the set (3.4) into admissible and inadmissible ones. The admissible leaves of TJ can be represented by a suitable factorization and can therefore be handled efficiently. The inadmissible leaves of TJ will be stored as dense matrices, therefore we have to ensure that these matrices are not too large. A simple way of doing this is to require all inadmissible leaves of the block cluster tree to be formed from leaves of the cluster tree, since these can be assumed to correspond only to small sets of indices. Definition 3.18 (Admissible block cluster tree). Let TJ be a block cluster tree for T and TJ , and let A be an admissibility condition. If for each .t; s/ 2 LJ either sons.t/ D ; D sons.s/ or A.t; s/ D true holds, the block cluster tree TJ is called A-admissible. Definition 3.19 (Farfield and nearfield). Let TJ be a block cluster tree for T and TJ , and let A be an admissibility condition for these cluster trees. The set LC J ´ f.t; s/ 2 LJ W A.t; s/ D trueg is called the set of farfield blocks. The set L J ´ f.t; s/ 2 LJ W A.t; s/ D falseg is called the set of nearfield blocks. Obviously, the labels of the pairs in LC J and L form a disjoint partition of the labels of the pairs in L . J J From now on, we assume that each block cluster tree TJ is implicitly associated with an admissibility condition A and that the sets LC J and LJ are defined with respect to this condition.
3.3 Construction of cluster trees and block cluster trees Before we proceed to consider H - and H 2 -matrices based on general cluster trees and block cluster trees, let us briefly discuss the construction of these trees in practical applications. The problem of constructing a “good” cluster tree for an index set can be approached in different ways: if the index set is already equipped with a hierarchy, e.g., if it is the result of the application of a refinement strategy to an initial coarse discretization, we can use the given hierarchy. If the index set corresponds to a quasi-uniform discretization, it is possible to construct cluster trees and block cluster trees by recursive binary splittings of the underlying domain, as demonstrated in [90], [91] and [92], Section 7.4.1. Even in the case of non-uniform discretizations, it is possible to find cluster trees and prove that algorithms based on these trees are efficient [52].
38
3 Hierarchical matrices
Binary space partitioning Here we only introduce relatively simple, yet quite general, algorithms for the construction of suitable cluster and block cluster trees. The theoretical investigation of these algorithms is not the topic of this book, we only use them as the foundation for numerical experiments. Our construction is based on the assumption that the index set corresponds to a family of subsets of a domain or manifold Rd . As a typical example let us consider the discretization of a differential or integral equation: each i 2 corresponds to a basis function or test functional, and, as in the model case, the success of an approximation is determined by the quality of the approximation on the support of this function or functional. As in the model case described in Chapter 2, we base general clustering strategies only on the supports of these functions or functionals and neglect their other features. Definition 3.20 (Support). A family . i /i2 of subsets of is called a family of supports for . Given such a family, the set i is called the support of i for all i 2 . If a cluster tree T for is given, we define the support of a cluster t 2 T by [ t ´ i : (3.6) i2tO
In standard applications, e.g., for finite element or boundary element discretizations, the supports i will be small, i.e., their diameters will be proportional to the meshwidth of the underlying grid. Therefore it makes sense to avoid the necessity of handling possibly complicated subdomains i by picking a single point xi 2 i for each i 2 and basing constructive algorithms on these points instead of the original subdomains i . Definition 3.21 (Characteristic point). A family .xi /i2 of points in is called a family of characteristic points for . Given such a family, the point xi is called the characteristic point of i for all i 2 . If a family of supports . i /i2 satisfies xi 2 i for all i 2 , we say that the characteristic points .xi /i2 match the family of supports . i /i2 . Using families of characteristic points, a wide variety of practical constructions for cluster trees can be investigated. A fairly general approach, which guarantees that the resulting cluster trees and block cluster trees lead to efficient algorithms for hierarchical matrices, is described in [52]. We restrict our attention to two simple constructions of cluster trees, which are both based on subdivisions of the “cloud” of characteristic points in d -dimensional space. The basic idea is to assign each cluster t an axis-parallel box B t D Œa t;1 ; b t;1 Œa t;d ; b t;d
3.3 Construction of cluster trees and block cluster trees
39
containing all corresponding characteristic points, i.e., satisfying xi 2 B t
for all i 2 tO:
(3.7)
If we decide that t should not be a leaf of the cluster tree, e.g., if the cardinality #tO is not small enough, we have to construct sons of t . We do this by picking a coordinate direction 2 f1; : : : ; d g, fixing the midpoint m ´
b t; C a t; 2
of the corresponding interval Œa t; ; b t; , and sorting the indices of tO into “left” and “right” portions tO1 ´ fi 2 tO W xi; < m g;
tO2 ´ fi 2 tO W xi; m g:
P tO2 , we can use these sets to define sons t1 and t2 and use Due to tO D tO1 [ 8 ˆ if tO1 ¤ ;; tO2 D ;; <¹t1 º sons.t / ´ ¹t2 º if tO1 D ;; tO2 ¤ ;; ˆ : ¹t1 ; t2 º otherwise: In order to be able to proceed by recursion, we need boxes B t1 and B t2 for the new sons. Due to our construction, the boxes B t1 ´ Œa t;1 ; b t;1 Œa t;1 ; b t;1 Œa t; ; m Œa t;C1 ; b t;C1 Œa t;d ; b t;d ; B t2 ´ Œa t;1 ; b t;1 Œa t;1 ; b t;1 Œm ; b t; Œa t;C1 ; b t;C1 Œa t;d ; b t;d are a simple choice and satisfy (3.7). We also need coordinate directions 1 ; 2 2 f1; : : : ; d g to be used for splitting these boxes. A simple approach is to cycle through all possible directions using ´
C 1 if < d;
1 D 2 D 1 otherwise: This approach guarantees that the characteristic boxes on level ` C d of the cluster tree will be similar to the boxes on level `, only scaled by 1=2 and shifted. Due to this self-similarity property, we call this the “geometrically regular” clustering strategy. It is summarized in Algorithm 1. Figure 3.4 illustrates the procedure for the twodimensional case: a (not necessarily optimal) axis-parallel box is split into two halves, and the set of indices is subdivided accordingly, creating two sons. The “left” son is not small enough, so it is split again.
40
3 Hierarchical matrices
Algorithm 1. Create a geometrically regular cluster tree. function ConstructRegularTree(tO, B t , ) : cluster; Create a new cluster t with index set tO; sons.t / ;; Œa t;1 ; b t;1 Œa t;d ; b t;d Bt ; if #tO not small enough then m .a t; C b t; /=2; tO1 ;; tO2 ;; for i 2 tO do if xi; < m then tO1 tO1 [ fi g else tO2 tO2 [ fi g end if end for; B t1 fx 2 B t W x mg; B t2 fx 2 B t W x mg; if < d then
0
C1 else
0 1 end if; if tO1 ¤ ; then t1 ConstructRegularTree(tO1 , B t1 , 0 ); sons.t/ sons.t / [ ft1 g end if if tO2 ¤ ; then t2 ConstructRegularTree(tO2 , B t2 , 0 ); sons.t/ sons.t / [ ft2 g end if end if; return t For anisotropic situations, e.g., if the width and height of the boxes differ significantly, it is desirable to split the boxes in the direction of maximal extent, i.e., to choose the direction 2 f1; : : : ; d g with b t; a t; D maxfb t; a t; W 2 f1; : : : ; d gg: By this technique, we can ensure that the splitting strategy tries to equilibrate the dimensions of the boxes, i.e., to minimize their diameters. This is a very useful feature, since most admissibility criteria (cf. (2.6)) are based on diameters and distances, and clusters with small diameters are more likely to be admissible than those with large diameters. In order to reduce the diameters even further, we can recompute the boxes B t in each step in order to ensure that they are minimal. As a desirable side-effect,
3.3 Construction of cluster trees and block cluster trees
x4
x2
x4
x5 x1 x7
x2 x5 x1
x3 x6
x4
x6
x7
x2 x5 x1
x3
41
x7
x3 x6
f1; 2; 3; 4; 5; 6; 7g f1; 2; 4; 5; 7g f1; 7g
f3; 6g
f2; 4; 5g
f3; 6g
Figure 3.4. Two steps of the construction of a geometrically regular cluster tree for characteristic points in two spatial dimensions.
this will also ensure that splitting a cluster will create two sons as long as the cluster is not trivial, i.e., as long as we can find i; j 2 tO with xi ¤ xj . This approach is called “geometrically balanced” clustering. The resulting procedure is given in Algorithm 2. All clusters t in a cluster tree constructed by the geometrically balanced clustering Algorithm 2 will satisfy either # sons.t / D 0 or # sons.t / D 2. This fact allows us to bound the number of clusters: Lemma 3.22 (Number of clusters). Let T be a cluster tree constructed by Algorithm 2. Then the number of clusters is bounded by c 2n 1; with n D # and c D #T as in Definition 3.11. Proof. Due to ¤ ;, Algorithm 2 ensures that tO ¤ ; holds for all t 2 T . According to Corollary 3.9, the index sets tO of all leaf clusters t 2 L are disjoint, so there cannot be more than n leaf clusters. We can conclude the proof by using Lemma 3.52. Remark 3.23 (Improved bound). Under certain conditions we can ensure that all leaves of the cluster tree have a minimal size of m, i.e., we have #tO m for all leaf clusters t 2 L . According to Corollary 3.9, all leaves correspond to disjoint index sets, and since all of these sets contain at least m indices, there cannot be more than n =m leaves. Applying Lemma 3.52 yields the bound n c 2 1 m for the number of clusters. This is similar to the result (2.23) for the model case.
42
3 Hierarchical matrices
Algorithm 2. Create a geometrically balanced cluster tree. function ConstructGeometricTree(tO) : cluster; Create a new cluster t ; sons.t / ;; if #tO not small enough then for 2 f1; : : : ; d g do a t; minfxi; W i 2 tOg; b t; maxfxi; W i 2 tOg; end for
1; for 2 f2; : : : ; d g do if b t; a t; > b t; a t; then
end if end for; if b t; a t; > 0 then m .a t; C b t; /=2; tO1 ;; tO2 ;; O for i 2 t do if xi; < m then tO1 [ fi g tO1 else tO2 tO2 [ fi g end if end for; t1 ConstructGeometricTree(tO1 ); t2 ConstructGeometricTree(tO2 ); sons.t/ ft1 ; t2 g end if end if; return t
Block cluster tree Once we have found cluster trees T and TJ for the index sets and J, the next step is to construct an admissible block cluster tree TJ . If an admissibility condition A is given, Definitions 3.12 and 3.18 suggest a simple recursive algorithm: according to Definition 3.12, the root of TJ is given by .root.T /; root.TJ //. Due to Definition 3.18, a block b D .t; s/ 2 TJ can only be a leaf if it is admissible or if t and s are leaves of the respective cluster trees. If the block is not a leaf, Definition 3.12 uniquely defines its sons, and we can proceed by recursion. Algorithm 3 summarizes the resulting construction of a minimal A-admissible block cluster tree: if called with t D root.T / and s D root.TJ /, it will construct the minimal
3.3 Construction of cluster trees and block cluster trees
43
A-admissible block cluster tree and return its root block. Algorithm 3. Create a minimal A-admissible block cluster tree. function ConstructBlockClusterTree(t , s) : block; Create a new block b ´ .t; s/; bO tO sO ; sons.b/ ;; if A.t; s/ then LC LC fAdmissible leafg J J [ fbg else if sons.t / D ; then if sons.s/ D ; then L L fInadmissible leafg J J [ fbg else for s 0 2 sons.s/ do b0 ConstructBlockClusterTree(t, s 0 ); sons.b/ sons.b/ [ fb 0 g end for end if else if sons.s/ D ; then for t 0 2 sons.t / do ConstructBlockClusterTree(t 0 , s); b0 sons.b/ sons.b/ [ fb 0 g end for else for t 0 2 sons.t /, s 0 2 sons.s/ do b0 ConstructBlockClusterTree(t 0 , s 0 ); sons.b/ sons.b/ [ fb 0 g end for end if end if; return b The efficiency of Algorithm 3 is determined by the algorithmic complexity of the test for A-admissibility. We now examine two situations in which this test can be performed efficiently.
44
3 Hierarchical matrices
Admissibility condition for spherical domains In applications based on Taylor expansions or spherical harmonics, we will usually encounter admissibility conditions of the type ´ true if max¹diam.K t /; diam.Ks /º 2 dist.K t ; Ks /; A.t; s/ ´ false otherwise; where K t and Ks are d -dimensional balls satisfying t K t and s Ks and 2 R>0 is a parameter. If the ball K t corresponding to t 2 T is described by its center c t 2 Rd and its radius r t 2 R0 , the test for admissibility is straightforward: we have diam.K t / D 2r t and diam.K t / D 2r t ; diam.Ks / D 2rs ; ´ kc t cs k2 r t rs if kc t cs k2 r t C rs ; dist.K t ; Ks / D 0 otherwise: If the cluster tree has been constructed by Algorithms 1 or 2, the construction of the balls K t for all t 2 T can be handled by a simple recursion: if # sons.t / D 0, i.e., if t is a leaf, the computation of a suitable center c t and radius r t depends on the application, e.g., on the underlying discretization scheme. If # sons.t / D 1, we let t 0 2 sons.t / and observe tO D tO0 , therefore we can use K t ´ K t 0 , i.e., c t ´ c t 0 and r t ´ r t 0 . Otherwise, i.e., if # sons.t / D 2, we pick t1 ; t2 2 sons.t / with sons.t / D ft1 ; t2 g. If K t2 K t1 , we use K t D K t1 , i.e., c t ´ c t1 and r t ´ r t1 . If K t1 K t2 , we use K t D K t2 , i.e., c t ´ c t2 and r t ´ r t2 . Otherwise, we are in the situation depicted in Figure 3.5.
c t2 ct c t1
Figure 3.5. Covering two discs by one larger disc.
The optimal point c t is characterized by the fact that it minimizes the radius r t required to ensure that K t covers K t1 and K t2 , i.e., that the distance of c t to the
3.3 Construction of cluster trees and block cluster trees
45
boundaries of K t1 and K t2 is minimal. This leads to the problem of finding a point c t which minimizes r t D maxfkc t c t1 k2 C r t1 ; kc t c t2 k2 C r t2 g: We can see that the center c t of the optimal ball K t has to be situated on the line connecting c t1 and c t2 , i.e., c t D .1 /c t1 C c t2
(3.8)
has to hold for a parameter 2 Œ0; 1, so the minimization problem now takes the form r t D maxfkc t2 c t1 k2 C r t1 ; .1 /kc t2 c t1 k2 C r t2 g and we find that the minimum is determined by kc t2 c t1 k2 C r t1 D .1 /kc t2 c t1 k2 C r t2 : This equation implies 2kc t2 c t1 k2 D kc t2 c t1 k2 C r t2 r t1 :
(3.9)
Since neither K t2 K t1 nor K t1 K t2 hold, we have c t2 ¤ c t1 and jr t2 r t1 j < 1; kc t2 c t1 k2 so the equation (3.9) has the unique solution 1 r t2 r t1 ´ : 1C 2 kc t2 c t1 k2 Combining this choice of with (3.8) yields the recursive Algorithm 4 for constructing K t for all t 2 T . If an efficient method for computing good covering balls K t for the leaves t 2 L is available (since #tO is assumed to be small in this case, only a small number of supports has to be taken into account), Algorithm 4 will also be very efficient.
Admissibility condition for rectangular domains Let us now consider a second admissibility condition, typically used by applications based on tensor-product interpolation. Here, we assume that we have axis-parallel bounding boxes Q t and Qs satisfying t Qt ;
s Qs
for all t 2 T ; s 2 TJ
(3.10)
46
3 Hierarchical matrices
Algorithm 4. Construct covering balls for cluster supports. procedure ConstructCoveringBalls(t ); if # sons.t/ D 0 then Compute c t and r t such that the corresponding ball K t contains t else if # sons.t / D 1 then Pick t1 2 sons.t /; ConstructCoveringBalls(t1 ); c t1 ; r t r t1 ct else Pick t1 ; t2 2 sons.t / with sons.t / D ft1 ; t2 g; ConstructCoveringBalls(t 1 ); ConstructCoveringBalls(t2 ); r t2 r t1 1 1 C kc t c t k2 ; 2 2 1 .1 /c t1 C c t2 ; r t kc t2 c t1 k2 C r t1 ct end if at our disposal, and use ´ true A.t; s/ ´ false
if max¹diam.Q t /; diam.Qs /º 2 dist.Q t ; Qs /; otherwise
to determine whether a block .t; s/ is admissible. Here 2 R>0 is an additional parameter controlling the strictness of the admissibility condition (cf. (2.6)). If the boxes Q t and Qs are described by Q t D Œa t;1 ; b t;1 Œa t;d ; b t;d ;
Qs D Œas;1 ; bs;1 Œas;d ; bs;d ;
the test for admissibility can be handled efficiently due to diam.Q t / D
d X
1=2
diam2 .Œa t; ; b t; /
;
D1
diam.Qs / D
d X
1=2
diam2 .Œas; ; bs; /
;
D1
dist.Q t ; Qs / D
d X
1=2
dist2 .Œa t; ; b t; ; Œas; ; bs; /
:
D1
The parameters of the optimal bounding box Q t are determined by a t; ´ inffx W x 2 t g;
b t; ´ supfx W x 2 t g
for 2 f1; : : : ; d g:
As in the case of covering balls, we can compute the optimal bounding boxes by a recursion.
3.4 Hierarchical matrices
47
Q t2 Q t1 Qt
Figure 3.6. Covering two boxes by one large box.
The computation of Q t for leaf clusters t 2 L depends on the application, but we can assume that any reasonable data structure allows us to find minimal and maximal coordinates efficiently, especially since only a small number of supports has to be considered due to our assumption that #tO is small for leaf clusters. Let now t 2 T be a cluster with sons.t / ¤ ;. Definitions 3.4 and 3.20 imply [ t D t 0 ; t 0 2sons.t/
and we find a t; D inffx W x 2 t g D
min inffx W x 2 t 0 g D
t 0 2sons.t/
b t; D supfx W x 2 t g D
max supfx W x 2 t 0 g D
t 0 2sons.t/
min a t 0 ; ;
t 0 2sons.t/
max b t 0 ; :
t 0 2sons.t/
This suggests the simple recursive Algorithm 5. As in the case of Algorithm 4, the efficiency of Algorithm 5 depends mainly on the efficiency of the algorithm used to find good bounding boxes for leaf clusters.
3.4 Hierarchical matrices Let TJ be an A-admissible block cluster tree. Let X 2 RJ . As in the model case, we use cut-off matrices in order to restrict matrices to submatrices related to single clusters or blocks: Definition 3.24 (Cut-off matrices). Let T be a cluster tree. For all t 2 T , the cut-off matrix t 2 R corresponding to t is defined by ´ 1 if i D j 2 tO; . t /ij ´ for all i; j 2 : 0 otherwise;
48
3 Hierarchical matrices
Algorithm 5. Construct bounding boxes for cluster supports. procedure ConstructBoundingBoxes(t); if # sons.t/ D 0 then for 2 f1; : : : ; d g do a t; inffx W x 2 t g; b t; supfx W x 2 t g end for else for t 0 2 sons.t / do ConstructBoundingBoxes(t 0 ) end for; for 2 f1; : : : ; d g do a t; minfa t 0 ; W t 0 2 sons.t /g; b t; maxfb t 0 ; W t 0 2 sons.t /g end for end if The cut-off matrices correspond to subspaces of vectors and matrices which vanish on subsets of the index sets. Definition 3.25 (Restricted spaces). Let 0 , let J 0 J, let K be a finite index set. We define RK ´ fX 2 RK W Xik D 0 for all i 2 n 0 and k 2 Kg; 0 J RJ W Xij D 0 for all i 2 n 0 and j 2 J n J 0 g; 0 ;J 0 ´ fX 2 R
contains all matrices which are zero outside of the rows corresponding to i.e., RK 0 J 0 , and R0 ;J 0 contains all matrices which are zero outside of the block 0 J 0 . J0
0
J Figure 3.7. Matrix in RJ 0 ;J 0 .
49
3.4 Hierarchical matrices
The cut-off matrices t provide us with a purely algebraic characterization of matrices in the restricted spaces: Remark 3.26 (Restricted spaces). Let K be a finite index set, let t 2 T and s 2 TJ . Let X 2 RK and N 2 RJ . We have X 2 RK if and only if X D t X holds, tO and we have N 2 RJ if and only if N D t Ns holds. tO;Os
Proof. Combine Definitions 3.25 and 3.24. A generalization of (2.12) is provided by Corollary 3.15: we can split the matrix X into submatrices X XD t Xs : (3.11) bD.t;s/2LJ
Remark 3.26 implies t Xs 2 RJ for all blocks b D .t; s/ 2 LJ , therefore each tO;Os of these blocks requires only .#tO/.#sO / units of storage. As in the model case, we are looking for a more efficient representation of the admissible leaves. In order to define a general hierarchical matrix, we follow the approach suggested by (2.14) and represent each admissible leaf of TJ in a factorized form using “slim” matrices, i.e., matrices with only a small number of columns. Definition 3.27 (Hierarchical matrix). Let X 2 RJ , let TJ be an admissible block cluster tree, and let .Kb /b2LC be a family of finite index sets. X is a hierarchical J
Kb
matrix (or short H -matrix) if for each b D .t; s/ 2 LC J there are Ab 2 RtO Bb 2
JK RsO b
The quantity
with
and
t Xs D Ab Bb : k ´ maxf#Kb W b 2 LC J g
is called the local rank of the H -matrix. For an H -matrix, equation (3.11) implies X Ab Bb C XD bD.t;s/2LC J
X
t Xs ;
(3.12)
bD.t;s/2L J
and we call this the H -matrix representation of X . Usually hierarchical matrices [62], [19], [18], [63] are defined by requiring that the ranks of submatrices corresponding to admissible blocks are bounded. This approach coincides with Definition 3.27: Remark 3.28 (Alternative definition). Let X 2 RJ , let TJ be an admissible block cluster tree, and let rank. t Xs / k
50
3 Hierarchical matrices
hold for all b D .t; s/ 2 LC J .
Let b D .t; s/ 2 LC J and Xb ´ t Xs . The definition of the rank of a matrix implies that the dimension kb of the range of Xb is bounded by k, therefore we can find an orthonormal basis of range.Xb / consisting of kb vectors. We use these vectors k as columns of a matrix Vb 2 RtO b satisfying Vb Vb D I: Let now x 2 range.Xb /. Due to range.Xb / D range.Vb / we can find y 2 Rkb with x D Vb y and get Vb Vb x D Vb Vb Vb y D Vb y D x; i.e., Vb Vb is a projection into range.Xb /. This means Xb D Vb Vb Xb D Vb .Xb Vb / D Ab Bb
for Ab D Vb and Bb D Xb Vb . This is the factorized representation required by Definition 3.27, therefore X is an H -matrix. Now let us consider the storage requirements of hierarchical matrices. In the model case, we can compute the numbers of admissible and inadmissible blocks per level explicitly, and this makes computing the storage requirements a simple task. In the general case, the level of the block cluster tree is no longer connected to the size of its blocks, therefore we need an approach that allows us to take care of different block sizes. Although the size of a block b D .t; s/ is not connected to its level, it is connected to the sizes of its row cluster t and its column cluster s. This observation leads to one of the central ideas of the complexity analysis of H - and H 2 -matrices: by bounding the number of blocks b connected to a row cluster t or a column cluster s, we can derive bounds for the number of blocks and for the complexity of many important algorithms [49], [52]. Definition 3.29 (Block rows and columns). For each t 2 T and s 2 TJ , we define row.TJ ; t / ´ fs 0 2 TJ W .t; s 0 / 2 TJ g and col.TJ ; s/ ´ ft 0 2 T W .t 0 ; s/ 2 TJ g: The set row.TJ ; t / is called the block row corresponding to t , the set col.TJ ; s/ is called the block column corresponding to s (cf. Figure 3.8).
3.4 Hierarchical matrices
51
Figure 3.8. Block rows (top) and columns (bottom) of clusters in the model case.
If this does not lead to ambiguity, we use row.t / instead of row.TJ ; t / and col.s/ instead of col.TJ ; s/. A block cluster tree is called sparse if block rows and columns contain only a bounded number of elements. Definition 3.30 (Sparsity). Let Csp 2 N. TJ is called Csp -sparse if # row.TJ ; t / Csp
and
# col.TJ ; s/ Csp
(3.13)
hold for all t 2 T and s 2 TJ . In the case of the one-dimensional model problem of Chapter 2, we can establish sparsity by the following argument: let t 2 T . Due to the definition of the cluster tree, there are ` 2 f0; : : : ; pg and ˛ 2 f0; : : : ; 2` 1g with t D t`;˛ . Let s 2 row.t / and b D .t; s/ 2 T . Due to the definition of the block cluster tree, there is a ˇ 2 f0; : : : ; 2` 1g with b D b`;˛;ˇ . If ` D 0, we have t D t0;0 and therefore row.t/ D fb0;0;0 g and # row.t / D 1.
52
3 Hierarchical matrices
If ` > 0, the construction of the block cluster tree implies that the father b`1;˛C ;ˇ C of b, given by ˛ C ´ b˛=2c;
ˇ C ´ bˇ=2c;
cannot have been admissible, since admissible blocks are not subdivided by our algorithm. Due to our admissibility condition, this means j˛ C ˇ C j 1 and ˇ C 2 f˛ C 1; ˛ C ; ˛ C C 1g. We conclude 2ˇ C 2 f2˛ C 2; 2˛ C ; 2˛ C C 2g; 2ˇ C C 1 2 f2˛ C 1; 2˛ C C 1; 2˛ C C 3g; ˇ 2 f2˛ C 2; : : : ; 2˛ C C 3g: This means that for each t`;˛ 2 T there cannot be more than six blocks of the form b`;˛;ˇ 2 T in the block cluster tree, i.e., # row.t`;˛ / 6. Due to symmetry, this means that the block cluster tree T used for the model problem is 6-sparse. Lemma 3.31 (Storage requirements). Let the block cluster tree TJ be Csp -sparse and admissible. Let pJ be its depth. Let X 2 RJ be a hierarchical matrix with local rank k 2 N and let r 2 N satisfy #tO r;
#sO r
hold for all leaves t 2 L ; s 2 LJ :
(3.14)
Then the representation (3.12) requires not more than Csp maxfk; r=2g.pJ C 1/.n C nJ / units of storage. Proof. We consider first the storage requirements for one block b D .t; s/ 2 TJ . If b is not a leaf, no storage is required (at least not for the matrix coefficients). If b is an admissible leaf, the matrix block Xb D t Xs is represented as Xb D Ab Bb with Ab 2 Rk and Bb 2 RJk . This requires k.#tO C #sO / units of storage. sO tO If b is an inadmissible leaf, Definition 3.18 implies that t and s are leaves of T and TJ , respectively. Due to the assumption (3.14), we get #tO r and #sO r. For inadmissible blocks, we store all coefficients of the submatrices Xb D t Xs , and this requires .#tO/.#sO / r#sO ;
.#tO/.#sO / r#tO
units of storage. Adding both estimates and dividing by two yields .#tO/.#sO /
r .#tO C #sO /; 2
and we conclude that the storage requirements for each individual block are bounded by maxfk; r=2g.#tO C #sO /:
3.5 Cluster bases
53
In order to get the total storage requirements, we have to sum over all blocks: X maxfk; r=2g.#tO C #sO / bD.t;s/2TJ
X
D maxfk; r=2g
X
#tO C
bD.t;s/2TJ
#sO :
bD.t;s/2TJ
Due to Definition 3.12, we have level.t /; level.s/ level.b/ for all b D .t; s/ 2 TJ , and the sparsity assumption (3.13) yields X X X X #tO D #tO Csp #tO: t2T s2row.t/ level.t/pJ
bD.t;s/2TJ
t2T level.t/pJ
In order to bound this sum, we have to know how often an index i 2 can appear in clusters t 2 T . Due to Corollary 3.10, it can only be part of at most one cluster per level, and we find X
pJ
#tO D
t2T level.t/pJ
X X
pJ
#tO D
`D0 t2T `
X `D0
#
[ t2T`
pJ
tO
X
# D .pJ C 1/n :
`D0
The same arguments can be applied to the second sum and yield X #sO Csp .pJ C 1/nJ : bD.t;s/2TJ
We combine both bounds to conclude that the H -matrix requires not more than X maxfk; r=2g.#tO C #sO / Csp maxfk; r=2g.pJ C 1/.n C nJ / bD.t;s/2TJ
units of storage. In the model case, we have k D m, r D 4m and Csp D 6, and Lemma 3.31 yields a bound of 24m.p C 1/n, approximately four times the bound (2.18) we derived in Chapter 2. The additional factor is due to the fact that the proof does not distinguish between admissible and inadmissible blocks.
3.5 Cluster bases As we have seen in the model case, H 2 -matrices are based on cluster bases, which we now introduce in a general setting required for our applications.
54
3 Hierarchical matrices
Definition 3.32 (Rank distribution). Let K D .K t / t2T be a family of finite index sets. Then K is called a rank distribution for T . Definition 3.33 (Cluster basis). Let K D .K t / t2T be a rank distribution. Let V D t .V t / t2T be a family of matrices satisfying V t 2 RK for all t 2 T . Then V is called tO a cluster basis with rank distribution K and the matrices V t are called cluster basis matrices. Due to Remark 3.26, we have V t D t V t for each t 2 T . In a matrix V t , only rows corresponding to indices i 2 tO can have non-zero entries. This implies that we need only .#K t /.#tO/ units of memory to store the matrix V t (cf. Figure 3.9).
tO
Kt Figure 3.9. Cluster basis.
In order to store a general cluster basis, even one with constant rank k 1, we have to handle the matrices V t for all t 2 T , and since each of these matrices requires #tO units of storage, we will need a total of O.n .p C 1// units of storage. As in the case of the model problem, we can reduce the storage requirements significantly if we assume that the matrices V t are connected to each other. Definition 3.34 (Nested cluster basis). Let V be a cluster basis with rank distribution K. V is called nested if there is a family .E t / t2T of matrices satisfying the following conditions: • For all t 2 T and all t 0 2 sons.t /, we have E t 0 2 RK t 0 K t . • For all t 2 T with sons.t / ¤ ;, the following equation holds: X Vt 0 Et 0 : Vt D
(3.15)
t 0 2sons.t/
The matrices E t are called transfer matrices or expansion matrices. V is given in nested representation if the cluster basis matrices V t are only stored for leaf clusters t and expressed by transfer matrices for all other clusters.
3.5 Cluster bases
55
Due to equation (3.15), we do not have to store the matrices V t for all t 2 T : if sons.t/ ¤ ;, we only have to store the, typically small, transfer matrices for each of the sons and derive quantities corresponding to V t by a recursion (cf. Figure 3.10) whenever necessary. V t1
Vt
tO
tO1
V t2 E t1
E t2
tO2
K t1
Kt
K t2
Figure 3.10. Nested cluster basis.
In the model case, we have seen that this strategy leads to the optimal order of complexity, i.e., we can find a bound for the amount of storage which is proportional to the number of degrees of freedom. In more general situations, especially in the case of variable-order approximations (cf. Section 4.7), we require more flexible methods for controlling the complexity. We base all of our complexity estimates on the rank distribution K D .K t / t2T . A typical H 2 -matrix algorithm performs a number of operations for each cluster t 2 T , and the number of operations depends on the cluster: if t is a leaf, the dimensions of the matrix V t , i.e., #tO and #K t are important, if t is not a leaf, the dimensions of the transfer matrices E t 0 for the sons t 0 2 sons.t / are relevant. In summary, the complexity of most algorithms is determined by the quantities defined by ´ tOº max¹#K ± if sons.t / D ;; for all t 2 T : (3.16) ° t ; #P kt ´ otherwise max #K t ; t 0 2sons.t/ #K t 0 As an example, we can use this definition to prove a simple bound for the storage requirements of a cluster basis: Lemma 3.35 (Storage complexity). Let T be a cluster tree, and let V be a nested cluster basis for T with rank distribution K. Then the nested representation of V requires not more than X X k t #K t k t2 t2T
units of storage.
t2T
56
3 Hierarchical matrices
Proof. Let .k t / t2T be defined by (3.16). t Let t 2 T . If t is a leaf, we have to store V t 2 RK , and this takes .#tO/.#K t / tO k t .#K t / units of storage. If t is not a leaf, we have to store E t 0 2 RK t 0 K t for all t 0 2 sons.t/, and this takes X X .#K t 0 /.#K t / D #K t 0 .#K t / k t .#K t / t 0 2sons.t/
t 0 2sons.t/
units of storage. For the total storage requirements, we sum over all t 2 T and can use #K t k t to get the bound X
X
k t .#K t /
t2T
k t2 :
t2T
In the model case, we have #K t D m and #T D 2pC1 1, and we can bound k t by 2m for the 2p 1 non-leaf clusters t 2 T n L and by #tO for the leaf clusters t 2 L , so Lemma 3.35 combined with Corollary 3.9 yields the bound X
2m2 .2p 1/ C m
#tO < 2m2 2p C m#
t2L
[
tO m2pCp0 C mn D 2mn;
t2L
and this is exactly the estimate given in (2.24).
3.6 H 2 -matrices Let V be a nested cluster basis for T with rank distribution K, and let W be a nested cluster basis for TJ with rank distribution L. Let TJ be an admissible block cluster tree for T and TJ . Based on V , W and TJ , we can introduce the corresponding space of H 2 -matrices. Definition 3.36 (H 2 -matrix). Let X 2 RJ . X is an H 2 -matrix for the block cluster tree TJ , the row cluster basis V and the column cluster basis W if there is a family S D .Sb /b2LC of matrices satisfying Sb 2 RK t Ls for all b D .t; s/ 2 LC J and J
XD
X bD.t;s/2LC J
V t Sb Ws C
X
t Xs :
(3.17)
bD.t;s/2L J
The elements Sb of the family S are called coupling matrices and are sometimes also referred to by S t;s D Sb for b D .t; s/ 2 LC J .
3.6 H 2 -matrices
57
Before we investigate the storage requirements of H 2 -matrices, we observe that the set of all H 2 -matrices for the same block cluster tree and cluster bases is a matrix subspace. This is important since it not only means that we can add and scale H 2 matrices efficiently, but also that we can construct approximations of general matrices by projecting into the H 2 -matrix subspace (cf. Chapter 5 for optimal projections into this space and Chapter 7 for an application). Remark 3.37 (Matrix subspace). The set H 2 .TJ ; V; W / ´ fX 2 RJ W X is an H 2 -matrix for TJ ; V and W g is a subspace of RJ . Proof. Let A; B 2 H 2 .TJ ; V; W /, and let SA D .SA;b /b2LC , SB D .SB;b /b2LC J
be the corresponding coupling matrices. Let 2 R and C ´ A C B. We have C D A C B X D
V t SA;t;s Ws C X
X X
bD.t;s/2LC J
V t SC;t;s Ws C
t Bs
bD.t;s/2L J
X
V t .SA;t;s C SB;t;s /Ws C
bD.t;s/2LC J
D
X
V t SB;t;s Ws C
bD.t;s/2LC J
D
t As
bD.t;s/2L J
bD.t;s/2LC J
C
X
J
X
t .A C B/s
bD.t;s/2L J
t Cs ;
bD.t;s/2L J
C A B where we let S t;s ´ S t;s C S t;s for all b D .t; s/ 2 LC J . Therefore C can 2 be represented in the form (3.17), which implies C 2 H .TJ ; V; W /. Observing 0 2 H 2 .TJ ; V; W / concludes the proof.
We have already seen that the storage requirements for the cluster basis V grow at most proportionally to the number of clusters c in the corresponding cluster tree T . The same holds for the cluster basis W and the cluster tree TJ . In order to establish an upper bound for the storage requirements of an H 2 -matrix, we therefore only have to consider the nearfield and coupling matrices. Lemma 3.38 (Storage). Let TJ be an admissible Csp -sparse block cluster tree for the cluster trees T and TJ . Let V and W be nested cluster bases with rank distributions K and L, let .k t / t2T be defined as in (3.16) and define ´ tOº max¹#L ± if sons.s/ D ;; for all s 2 T ° s ; #P (3.18) ls ´ J otherwise; max #Ls ; s 0 2sons.s/ #Ls 0
58
3 Hierarchical matrices
correspondingly. The H 2 -matrix representation of a matrix in H 2 .TJ ; V; W / requires not more than X Csp X 2 kt C ls2 2 t2T
s2TJ
units of storage. If we also include the representations of the cluster bases V and W , a total of X Csp C 2 X 2 kt C ls2 2 t2T
s2TJ
units of storage is sufficient. Proof. Let X 2 H 2 .TJ ; V; W /. Let b D .t; s/ 2 LJ . If b is admissible, we store the coupling matrix Sb 2 RK t Ls , and this takes .#K t /.#Ls / k t ls
1 2 .k C ls2 / 2 t
, and this units of storage. If b is not admissible, we store the matrix t Xs 2 RJ tOOs requires 1 .#tO/.#sO / k t ls .k t2 C ls2 / 2 units of storage. For the total storage requirements we add the results for all blocks and use (3.13) to get X bD.t;s/2LJ
1 2 1 .k t C ls2 / 2 2
X
k t2 C
bD.t;s/2TJ
1 X D 2
X
k t2 C
t2T s2row.t/
1 2
X
ls2
bD.t;s/2TJ
1 X X 2 ls 2 s2TJ t2col.s/
Csp X 2 Csp X 2 kt C ls : 2 2
(3.13)
t2T
s2TJ
Lemma 3.35 implies that V and W require not more than X X k t2 C ls2 t2T
s2TJ
units of storage, and adding both estimates completes the proof. In the model case, we have Csp D 6, k t #tO 4m for leaf clusters and k t D 2m for the 2p 1 non-leaf clusters, so Lemma 3.38 yields a bound of X .#tO/2 C .2p 1/.2m/2 8.4mn C 2mn/ D 48mn: 8 t2L
3.7 Matrix-vector multiplication
59
The result in Chapter 2 is better by a factor of almost 3 because it distinguishes admissible and inadmissible blocks, while we avoided this for the sake of simplicity in the proof of the general estimate. Remark 3.39 (HSS-matrices). A quadratic H 2 -matrix based on a balanced binary cluster tree T and a block cluster tree T using the weak admissibility condition (cf. Remark 3.17) is called a hierarchically semi-separable matrix (or short an HSS-matrix) [30], [106]. Matrices of this type can be handled very efficiently, even robust direct solvers of linear complexity are available [29], [31]. Since the weak admissibility condition is only adequate for essentially one-dimensional problems, the range of applications of HSS-matrices is limited to problems of this type, e.g., there are exact solvers for one-dimensional boundary value problems [59], [96], approximate solvers for Toeplitz matrices [32], and it is possible to use nested dissection techniques to reduce two-dimensional sparse problems to one-dimensional problems that can be treated by HSS-matrices [83], [107]. Hierarchically semi-separable matrices are closely related to the matrix set Mk . / introduced and analyzed in Section 4 of [69].
3.7 Matrix-vector multiplication Merely storing a matrix in the efficient H 2 -matrix representation (3.17) is not sufficient for the majority of applications, since these will require a way of evaluating the operator associated with the matrix, i.e., of multiplying the matrix with a vector. Since the H 2 -matrix is given in factorized form, and since we do not intend to convert the efficient representation back into a less efficient one, we have to find a way of performing the necessary operations based on the available factorized representation. Let X 2 H 2 .TJ ; V; W / be given in the form (3.17), and let the nested cluster bases V and W be given in nested representation. Let x 2 RJ . We are looking for a way of computing y ´ Xx efficiently. Due to (3.17), y is given in the form X X yD V t S t;s Ws x C t Xs x: bD.t;s/2LC J
bD.t;s/2L J
In order to be able to rewrite this equation in a form more suitable for our algorithm, we introduce the following subsets of block rows and columns (cf. Definition 3.29). Definition 3.40 (Admissible block rows and columns). For each t 2 T and s 2 TJ , we define rowC .TJ ; t / ´ fs 0 2 TJ W .t; s 0 / 2 LC J g
60
3 Hierarchical matrices
and colC .TJ ; s/ ´ ft 0 2 T W .t 0 ; s/ 2 LC J g: The set rowC .TJ ; t / is called the admissible block row corresponding to t , the set colC .TJ ; s/ is called the admissible block column corresponding to s. If the block cluster tree is implied by the context, we use rowC .t / instead of rowC .TJ ; t/ and colC .s/ instead of colC .TJ ; s/. Using the admissible block rows, we find yD
X
X
Vt
t Xs x;
bD.t;s/2L J
s2rowC .t/
t2T
X
S t;s Ws x C
therefore the task of computing y can be split into four subtasks: • Compute xO s ´ Ws x for all s 2 TJ . This is called the forward transformation. • Compute yO t ´
P
s2rowC .t/
• Compute yfar ´
P t2T
• Compute y D yfar C
P
S t;s xO s . This is called the multiplication step.
V t yO t . This is called the backward transformation.
bD.t;s/2L J
t Xs x. This is called the nearfield step.
The multiplication and nearfield steps do not present us with a challenge, since they only involve the “small” matrices S t;s and t Xs . The forward transformation requires a close investigation, since it involves the matrix Ws , which may have a large number of rows and is only implicitly given by equation (3.15). Fortunately, the nested structure implies xO s D Ws x D
X
Ws 0 Fs 0
xD
s 0 2sons.s/
X
X
Fs0 Ws0 x D
s 0 2sons.s/
Fs0 xO s 0
s 0 2sons.s/
for all s 2 TJ with sons.s/ ¤ ;, i.e., we require the matrices Ws only for leaves of the cluster tree and can use the transfer matrices for all other clusters. Applying the transfer matrices recursively leads to Algorithm 6. The backward transformation can be handled in a similar fashion: let t 2 T with sons.t/ ¤ ;. Then (3.15) implies V t yO t C
X t 0 2sons.t/
V t 0 yO t 0 D
X t 0 2sons.t/
V t 0 E t 0 yC O
X t 0 2sons.t/
V t 0 yO t 0 D
X
V t 0 .yO t 0 CE t 0 yO t /:
t 0 2sons.t/
Applying this equation recursively allows us to reduce the original sum over T to a sum over L , which can be easily computed. The result is Algorithm 7.
3.7 Matrix-vector multiplication
61
Algorithm 6. Forward transformation. procedure ForwardTransformation(t, V , x, var x); O if sons.t/ D ; then xO t V t x else xO t 0; for t 0 2 sons.t / do ForwardTransformation(t 0 , V , x, x); O xO t xO t C E t0 xO t 0 end for end if
Algorithm 7. Backward transformation. procedure BackwardTransformation(t, V , var y, y); O if sons.s/ D ; then y y C V t yO t else for t 0 2 sons.t / do yO t 0 yO t 0 C E t 0 yO t ; BackwardTransformation(t 0 , V , y, y) O end for end if Lemma 3.41 (Forward and backward transformation). Let V D .V t / t2T be a nested cluster basis with transfer matrices .E t / t2T and rank distribution K D .K t / t2T . Let .k t / t2T be defined as in (3.16) . Then the Algorithms 6 and 7 require not more than X X 2 k t #K t 2 k t2 t2T
t2T
arithmetic operations. t Proof. Let t 2 T . If t is a leaf cluster, both algorithms multiply the matrix V t 2 RK tO or its adjoint by a vector. This requires not more than 2.#tO/.#K t / 2k t #K t arithmetic operations. If t is not a leaf cluster, both algorithms multiply the matrices E t 0 2 RK t 0 K t or their adjoints by vectors for all sons t 0 2 sons.t /. For one matrix this takes not more than 2.#K t 0 /.#K t / operations, for all sons we get a bound of X 2.#K t 0 /.#K t / D 2k t #K t :
t 0 2sons.t/
Summing over all t 2 T yields the desired bound.
62
3 Hierarchical matrices
A closer look at the algorithms reveals that for each coefficient stored in the nested representation of V the algorithms perform one multiplication and at most one addition, so we could also have based the proof on the estimate of Lemma 3.35. Algorithm 8. Matrix-vector multiplication. procedure MVM(X , x, var y); ForwardTransformation(root.TJ /, W , x, x); O for t 2 T do yO t 0; for s 2 rowC .t / do yO t yO t C S t;s xO s end for end for; BackwardTransformation(root.T /, V , y, y); O for b D .t; s/ 2 LJ do y y C t Xs x end for Theorem 3.42 (Matrix-vector multiplication). Let X 2 H 2 .TJ ; V; W /. Let TJ be Csp -sparse and admissible, let K and L be the rank distributions of V and W , and let .k t / t2T and .ls /s2TJ be defined as in (3.16) and (3.18). The Algorithm 8 requires not more than X X .Csp C 2/ k t2 C ls2 t2T
s2TJ
operations. Proof. Both the forward and backward transformation are covered by Lemma 3.41. The multiplication step is carried out for each admissible block b D .t; s/ 2 LC J and requires 2.#K t /.#Ls / 2k t ls operations for multiplying by the matrix Sb and adding the result to yO t . The nearfield step is carried out for each inadmissible leaf b D .t; s/ 2 L J and requires 2.#tO/.#sO / operations for multiplying by the matrix t Xs and adding the result to y. Since TJ is admissible, both t and s are leaf clusters, so we have #tO k t and #sO ls and can bound the number of operations in this case by 2k t ls . Summing over all leaf blocks yields a bound of X X 2k t ls k t2 C ls2 bD.t;s/2LJ
bD.t;s/2TJ
D
X
X
k t2 C
t2T s2row.t/
Csp
X
t2T
k t2 C Csp
X X s2TJ t2col.s/
X
s2TJ
ls2 ;
ls2
3.8 Complexity estimates for bounded rank distributions
63
and adding the estimates of Lemma 3.41 completes the proof. In the model case, we have Csp D 6, k t ; ls D 2p0 4m for leaf clusters and k t ; ls 2m for non-leaf clusters, and Theorem 3.42 yields a bound of X .#tO/2 C .2p 1/.2m/2 16.4mn C 2mn/ D 96mn: 16 t2L
Our detailed analysis of the H 2 -matrix representation in the model case shows that not more than 17mn units of storage are required, and since we have seen that at most two operations are performed for each coefficient, we can derive the improved estimate of 34mn operations. The difference between this result and the bound provided by the general Theorem 3.42 is again due to the fact that we have simplified the proof by treating admissible and inadmissible blocks similarly.
3.8 Complexity estimates for bounded rank distributions Our bounds for the storage requirements and the number of operations involved in the matrix-vector multiplication are sums of powers of the quantities k t and ls describing the work corresponding to individual clusters. In the model problem, k t is uniformly bounded by 4m, and this leads to relatively simple estimates for the complexity. A closer look reveals that we can get useful bounds for the number of operations and for the storage requirements even for unbounded k t as long as large values occur only for a small number of clusters. This property is very important for variable-order techniques that can significantly improve the efficiency of H 2 -matrix approximations (cf. Sections 4.7 and 6.8). The storage requirements of the H 2 -matrices used for the model problem grow linearly with the number of degrees of freedom. This is a very desirable property, and we would like to preserve it even if the ranks of the cluster basis are unbounded. In order to do so, we have to ensure that there are only a few clusters that require a larger amount of work. We have seen that the amount of work for a cluster t depends on ´ if sons.t / D ;; max¹#K t ; #tOº ® ¯ kt ´ P max #K t ; t 0 2sons.t/ #K t 0 otherwise; and in order to bound this quantity, we have to bound #K t , and we also have to bound #tO if t is a leaf and # sons.t / if it is not. In order to keep the definition as general as possible, we use an indirect approach: we require that the number of clusters decreases exponentially as the amount of work increases algebraically. This guarantees that the sum of the amounts of work for all clusters converges.
64
3 Hierarchical matrices
Definition 3.43 (Bounded cluster tree). Let T be a cluster tree, let Cbc 2 R1 , ˛ 2 R>0 , ˇ 2 R0 , r 2 R1 , 2 R>1 . The tree T is .Cbc ; ˛; ˇ; r; /-bounded if #ft 2 L W #tO > .˛ C ˇ.` 1//r g Cbc ` #T
holds for all ` 2 N
and holds for all t 2 T :
# sons.t / Cbc Let k 2 N. The tree T is .Cbc ; k/-bounded if #tO k # sons.t / Cbc
holds for all t 2 L ; holds for all t 2 T :
Definition 3.44 (Bounded rank distribution). Let T be a cluster tree, let K D .K t / t2T be a rank distribution for T . Let Cbn 2 R1 , ˛ 2 R>0 , ˇ 2 R0 , r 2 R1 and 2 R>1 . The rank distribution K is .Cbn ; ˛; ˇ; r; /-bounded, if #ft 2 T W #K t > .˛ C ˇ.` 1//r g Cbn ` #T
holds for all ` 2 N:
(3.19)
Let k 2 N. The rank distribution K is k-bounded if #K t k
holds for all t 2 T :
If a rank distribution K is k-bounded, it is also .1; k; 0; 1; /-bounded for arbitrary values of 2 R>1 . If, on the other hand, K is .Cbn ; ˛; 0; r; /-bounded, taking the limit ` ! 1 of (3.19) implies #K t ˛ r for all t 2 T , i.e., K is ˛ r -bounded, and both definitions are equivalent for ˇ D 0. In the model case, the rank distribution is 4m-bounded, since we have k t D 2p0 < 4m for all leaf clusters t 2 L and even k t D 2m for the remaining clusters t 2 T nL . For variable-order approximations (cf. Section 4.7), a simple approach is to allow the order to increase as the level decreases, i.e., to use #K t D ˛ C ˇ.p level.t //; where p is the maximal level and ˛ and ˇ are parameters that can be used to control the approximation error. The resulting rank distribution is illustrated in Figure 3.11 for ˛ D ˇ D 1. We can see that ˛ C ˇ.p level.t // D #K t > ˛ C ˇ.` 1/ implies level.t/ < p C 1 `, so there are exactly 2p C1` 1 clusters t 2 T with #K t > ˛ C ˇ.` 1/. The total number of clusters satisfies #T D 2p C1 1 D 2` .2p C1` 2` / > 2` .2p C1` 1/ D 2` #ft 2 T W #K t > ˛ C ˇ.` 1/g;
3.8 Complexity estimates for bounded rank distributions
65
4 2
3
4
8
5
9
6
11
10
2
7
1
13
12
Figure 3.11. Rank distribution for variable-order approximation based on the level, .1; 1; 1; 1; 2/bounded.
4 2
3
4
8
5
9
10
6
12
2
7
13
2
13
Figure 3.12. Rank distribution for variable-order approximation with an additional singularity at the left endpoint, .Cbn ; 1; 1; 1; /-bounded.
therefore our rank distribution is .1; ˛; ˇ; 1; 2/-bounded. We conclude the discussion of Definition 3.44 by a non-trivial example. Sometimes, e.g., when approximating certain integral operators on manifolds [80], [81], additional singularities appear at endpoints of intervals, and we have to increase the order close to these points in order to ensure that the error is sufficiently small. Let us consider the situation given in Figure 3.12: we use #K t D ˛ C ˇ maxfp level.t /; p blog2 .dist.t; 0/= diam.t / C 1/cg; i.e., clusters close to a singularity assumed at the left endpoint of the interval Œ0; 1 are assigned a higher order. For this distribution, we can see that #K t > ˛ C ˇ.` 1/
66
3 Hierarchical matrices
implies level.t/ < p C 1 ` or blog2 .dist.t; 0/= diam.t / C 1/c < p C 1 `: We have already seen that the first condition holds for 2p C1` 1 clusters. The second condition is equivalent to dist.t; 0/= diam.t / C 1 < 2p C1` ;
dist.t; 0/ < diam.t /.2p C1` 1/;
so on each level there are at most 2p C1` 1 clusters matching the second condition. We conclude #ft 2 T W #K t > ˛ C ˇ.` 1/g D #ft 2 T W level.t / < p C 1 `g C #ft 2 T W level.t / p C 1 `; dist.t; 0/ < diam.t /.2p C1` 1/g .2p C1` 1/ C `.2p C1` 1/ D .` C 1/.2p C1` 1/: In order to apply Definition 3.44, we have to prove that this quantity decays exponentially if ` grows. Due to the factor ` C 1, we cannot get a convergence rate of 1=2, but we can get arbitrarily close: let 2 .1; 2/, and let q ´ =2 < 1. Due to Lemma 3.50, there is a constant C 2 R1 with .` C 1/q `
1 X
.` C 1/q ` C
for all ` 2 N0 ;
`D0
and we have #ft 2 T W #K t > ˛ C ˇ.` 1/g .` C 1/.2p C1` 1/ < .` C 1/2` .2p C1 1/ D .` C 1/q ` ` #T C ` #T for all ` 2 N. This means that our rank distribution is .C; ˛; ˇ; 1; /-bounded. The main advantage of Definition 3.44 is that it allows us to simplify complexity estimates of the form encountered in Lemma 3.38 or Theorem 3.42: Lemma 3.45 (Complexity bounds). Let T be .Cbc ; ˛; ˇ; r; /-bounded, and let K D .K t / t2T be a .Cbn ; ˛; ˇ; r; /-bounded rank distribution. Let .k t / t2T be defined as in (3.16), and let m 2 N. Then there is a constant Ccb 2 R1 depending only on Cbc , Cbn , r, and m satisfying X k tm Ccb .˛ C ˇ/rm c t2T
with c D #T denoting the number of clusters in T .
67
3.8 Complexity estimates for bounded rank distributions
Proof. We split the clusters into a disjoint partition according to the corresponding complexity: we let ´ if ` D 0; ¹t 2 T W k t Cbc ˛ r º C` ´ r r ¹t 2 T W Cbc .˛ C ˇ.` 1// < k t Cbc .˛ C ˇ`/ º otherwise for all ` 2 N0 . We have T D
[ P `2N0
C` ;
i.e., the complexity levels C` define a disjoint partition of the cluster tree, and k t Cbc .˛ C ˇ`/r holds for all t 2 C` and all ` 2 N0 by definition. Let ` 2 N, and let t 2 C` . This means ´ tOº max¹#K ° t ; #P ± Cbc .˛ C ˇ.` 1//r < k t D max #K t ; t 0 2sons.t/ #K t 0
(3.20)
if sons.t / D ;; otherwise:
If t is a leaf, this implies #K t > Cbc .˛ C ˇ.` 1//r .˛ C ˇ.` 1//r or #tO > Cbc .˛ C ˇ.` 1//r .˛ C ˇ.` 1//r : If t is not a leaf, we cannot have more than Cbc son clusters and can conclude #K t > .˛ C ˇ.` 1//r or #K t 0 > .˛ C ˇ.` 1//r
for at least one t 0 2 sons.t /:
This means that t has to be an element of at least one of the three sets R` ´ ft 2 T W #K t > .˛ C ˇ.` 1//r g; L` ´ ft 2 L W #tO > .˛ C ˇ.` 1//r g; ` ´ ft 2 T W #K t 0 > .˛ C ˇ.` 1/r for at least one t 0 2 sons.t /g; and we have proven the inclusion C` R` [ L` [ ` : Due to our assumptions we have #R` Cbn ` c ;
#L` Cbc ` c ;
68
3 Hierarchical matrices
and since each cluster can have at most one father, we also find ` D ffather.t / W t 2 T ; #K t > .˛ C ˇ.` 1//r g D ffather.t / W t 2 R` g; #` #R` Cbn ` c and obtain the bound #C` .2Cbn C Cbc / ` c
for all ` 2 N0 :
Combining this estimate with (3.20) yields X X X k tm D k tm t2T
`2N0 t2C`
X
.˛ C ˇ`/rm #C`
`2N0
X
Cbc .2Cbn C Cbc /
.˛ C ˇ`/rm ` c :
`2N0
Due to Lemma 3.50, there is an arm 2 R0 depending only on , r and m with X .˛ C ˇ`/rm ` .arm C 1/.˛ C ˇ/rm ; `2N0
and we conclude
X
k tm Cbc .2Cbn C Cbc /.arm C 1/.˛ C ˇ/rm c :
t2T
We set Ccb ´ Cbc .2Cbn C Cbc /.arm C 1/ to complete the proof. Using this result we can translate complexity estimates into a more manageable form: Corollary 3.46 (H 2 -matrix complexity). Let T and TJ be .Cbc ; ˛; ˇ; r; /-bounded cluster trees. Let V and W be nested cluster bases for T and TJ with .Cbn ; ˛; ˇ; r; /bounded rank distributions K and L. Let TJ be an admissible Csp -sparse block cluster tree, and let X 2 H 2 .TJ ; V; W /. The storage requirements for the H 2 matrix representation of X are in O..˛ C ˇ/2r .c C cJ //, and the matrix-vector multiplication can be performed in O..˛ C ˇ/2r .c C cJ // operations, with c ´ #T and cJ ´ #TJ denoting the numbers of clusters in T and TJ . Proof. Combine Lemma 3.38 and Theorem 3.42 with Lemma 3.45. Typically, we are not interested in complexity estimates based on the number of clusters, but on estimates based on the number of degrees of freedom. In order to relate both numbers, we require lower bounds for the size of leaf clusters and for the number of sons of non-leaf clusters:
3.8 Complexity estimates for bounded rank distributions
69
Definition 3.47 (Regular cluster tree). Let T be a cluster tree. Let Crc 2 R1 , ˛ 2 R>0 , ˇ 2 R0 , r 2 R1 and 2 R>1 . The tree T is .Crc ; ˛; ˇ; r; /-regular if it is .Crc ; ˛; ˇ; r; /-bounded and satisfies Crc #tO .˛ C ˇ/r # sons.t / 2
for all leaf clusters t 2 L ; for all non-leaf clusters t 2 T n L :
The tree T is .Crc ; k/-regular if it is .Crc ; k/-bounded and satisfies Crc #tO k # sons.t / 2
for all t 2 L ; for all non-leaf clusters t 2 T n L :
In the model case, the cluster tree is constructed by splitting clusters into two equalsized sons until their size drops below 4m. Due to the regular structure, each leaf cluster still contains at least 2m indices, so the cluster tree is .2; 4m; 0; 1/-regular. Lemma 3.48 (Number of clusters). Let T be a .Crc ; ˛; ˇ; r; /-regular cluster tree. Then we have n c D #T 2Crc ; .˛ C ˇ/r i.e., we can bound the number of clusters c by the number of indices. Proof. According to Definition 3.47, each leaf of the cluster tree contains at least Crc1 .˛ C ˇ/r indices. Due to Corollary 3.9, each index i 2 appears in exactly one leaf, and this means #L
n n D Crc : Crc1 .˛ C ˇ/r .˛ C ˇ/r
Since # sons.t/ 2 holds for all non-leaf clusters t 2 T nL , we can use Lemma 3.52 to get n c D #T 2#L 2Crc : .˛ C ˇ/r Regular cluster trees allow us to bound the number of clusters by the number of indices and thus get complexity estimates similar to the ones derived for the model case: Corollary 3.49 (Linear complexity). Let V and W be nested cluster bases for T and TJ with .Cbn ; ˛; ˇ; r; /-bounded rank distributions K and L. Let T and TJ be .Crc ; ˛; ˇ; r; /-regular. Let TJ be an admissible Csp -sparse block cluster tree, and let X 2 H 2 .TJ ; V; W /. The storage requirements for the H 2 -matrix representation of X are in O..˛ Cˇ/r .n CnJ //, and the matrix-vector multiplication can be performed in O..˛ C ˇ/r .n C nJ // operations. Proof. Combine Corollary 3.46 with Lemma 3.48.
70
3 Hierarchical matrices
3.9 Technical lemmas In order to derive bounds for the storage complexity for nested cluster bases, we have to investigate the relationship between the polynomial growth of #K t and the exponential decay of the number of clusters. Our theory is based on the following estimate: Lemma 3.50 (Exponential decay and polynomial growth). Let q 2 Œ0; 1/. We define .ak /k2N0 by the recurrence relation a0 D and find
1 ; 1q
1 X
akC1 D
k q X kC1 aj j 1q
(3.21)
j D0
.˛ C ˇ`/k q ` .ak C 1/.˛ C ˇ/k
(3.22)
`D0
for all ˛; ˇ 2 R0 and all k 2 N0 . Proof. For all n 2 N0 and k 2 N0 , we introduce the partial sums akn ´
n X
`k q ` :
`D0
We prove D ak by induction on k 2 N0 . In the case k D 0, the geometric summation formula implies limn!1 akn
a0n D
n X
q` D
`D0
1 q `C1 ; 1q
and this converges to a0 D 1=.1 q/. Let now k 2 N0 be such that limn!1 ajn D aj holds for all j 2 f0; : : : ; kg. The definition of akn implies nC1 akC1 D
nC1 X
`kC1 q ` D
`D0
Dq
n kC1 X X `D0 j D0
nC1 X
`kC1 q ` D
`D1
n X `D0
kC1 j ` ` q Dq j
.` C 1/kC1 q `C1
kC1 X j D0
n kC1 X k C 1 kC1 X j ` ajn ` q Dq j j `D0
j D0
for n 2 N, so we find nC1 n qakC1 akC1
1q
k q X kC1 n aj : D j 1q j D0
(3.23)
71
3.9 Technical lemmas
Due to our assumption, the right-hand side of equation (3.23) converges from below to akC1 , and due to n akC1 D
n n akC1 qakC1
1q
nC1 n akC1 qakC1
1q
D
k q X kC1 n aj akC1 ; j 1q j D0
n the increasing sequence .akC1 /n2N0 also converges to a limit ˛ akC1 . Taking equation (3.23) to the limit n ! 1 yields ˛ D akC1 and concludes the induction, thus proving 1 X mk q m D ak : (3.24) mD0
Due to 1 X
k m
.˛ C ˇm/ q
k
˛ C
mD0
1 X
k m
.˛m C ˇm/ q
k
k
.˛ C ˇ/ C .˛ C ˇ/
mD1
1 X
mk q m ;
mD1
we can use equation (3.24) to prove (3.22). Lemma 3.51 (Bound of ak ). The quantities .ak /k2N0 defined in Lemma 3.50 satisfy ak
1 1q
q 1 C 1q 2
k kŠ
(3.25)
for all k 2 N0 . Proof. We introduce ˛´
q 1 1 C 1q 2 2
and prove aj
1 ˛j j Š 1q
(3.26)
for all j 2 N0 by induction. For j D 0, this estimate is obvious. Let k 2 N0 be such that (3.26) holds for all j 2 f0; : : : ; kg. The recurrence relation (3.21) implies akC1
k k q X .k C 1/Š q X kC1 1 aj ˛j j Š D j 1q 1q1q .k C 1 j /Šj Š j D0
D
q.k C 1/Š .1 q/2
j D0
k X j D0
˛ q.k C 1/Š X ˛ j .k C 1 j /Š .1 q/2 2kj j
j D0
q.k C 1/Š k X 2 .2˛/j .1 q/2 k
D
k
j D0
72
3 Hierarchical matrices
q.k C 1/Š k .2˛/kC1 1 q.k C 1/Š k .2˛/kC1 2 2 .1 q/2 2˛ 1 .1 q/2 2˛ 1 kC1 q.k C 1/Š ˛ D 2 .1 q/2 2˛ 1 2q 1 D ˛ kC1 .k C 1/Š .1 q/2 2˛ 1 2q 1 q D ˛ kC1 .k C 1/Š .1 q/2 2q 1 D ˛ kC1 .k C 1/Š; 1q D
which proves (3.26) for j D k C 1 and concludes the induction. Lemma 3.52 (Bounding nodes by leaves). Let 2 N2 . Let T be a cluster tree satisfying for all non-leaf clusters t 2 T n L :
# sons.t / Then we have # sons .t/
1 #.L \ sons .t // 2#.L \ sons .t // 1 1
(3.27)
for all t 2 T . Applying this estimate to t D root.T / yields #T
1 #L 2#L ; 1 1
i.e., that the number of clusters can be bounded by the number of leaves. Proof. We prove (3.27) for all t 2 T by induction on # sons .t / 2 N. Let t 2 T with # sons .t / D 1. This implies sons.t / D ;, i.e., t 2 L and # sons .t / D 1 D
1 1 D #.L \ sons .t // : 1 1 1
Now let n 2 N. We assume that (3.27) holds for all t 2 T with sons .t / n. Let t 2 T with # sons .t / D n C 1. Due to # sons .t / > 1, we have [ sons .t 0 / sons .t / D ft g [ t 0 2sons.t/
and observe
# sons .t / D 1 C
X t 0 2sons.t/
# sons .t 0 /:
3.9 Technical lemmas
73
Since sons .t 0 / sons .t / n ft g holds, we have # sons .t 0 / n and can use the assumption in order to conclude X # sons .t / D 1 C # sons .t 0 / t 0 2sons.t/
1C
X t 0 2sons.t/
1 #.L \ sons .t 0 // 1 1
# sons.t / #.L \ sons .t // C 1 1 1 1 #.L \ sons .t // C 1 1 1 1 D #.L \ sons .t // : 1 1 D
Chapter 4
Application to integral operators
We now consider the application of the general H 2 -matrix structures introduced in Chapter 3 to a class of problems in which densely populated matrices occur naturally: the numerical treatment of integral operators. Our approach is similar to the one applied to the model problem in Chapter 2. We find a separable approximation of the kernel function underlying the integral operator, and this approximation gives rise to an H 2 -matrix approximation of the corresponding stiffness matrix. This is also the basic idea of the well-known panel clustering [72] and multipole [88], [58] techniques. A separable approximation can be constructed in many different ways. The generalization of the truncated Taylor expansion used in the model problem (and in the original paper [72] introducing the panel clustering algorithm) to the multi-dimensional setting is straightforward and yields an efficient technique if the coefficients of the Taylor expansion, i.e., arbitrary derivatives of the kernel function up to a certain order, can be computed efficiently. The latter requirement can be satisfied by using a recursion formula for the derivatives, but since these have to be derived by hand for each individual kernel function, the range of problems that can be covered by Taylor expansions is limited. Typical multipole methods [58], [60] use a more efficient approximation of the kernel function instead of the Taylor expansion, but they share the latter’s limitations to specific functions. A more general approach is based on interpolation: instead of approximating the kernel function by a Taylor expansion, we interpolate it by a polynomial [45], [65]. Using tensor-product interpolation operators in the multi-dimensional setting leads to a method that requires only pointwise evaluations of the kernel function instead of derivatives, so it is far better suited for general kernel functions and its implementation is very simple. The theoretical investigation of the interpolation approach is significantly more complicated than in the Taylor-based case, but it also yields significantly better results: it is possible to prove that the interpolant will always converge as long as the kernel function is locally analytic, and that in situations in which both interpolant and Taylor expansion converge, the interpolant will converge faster. In practical applications, we sometimes have to handle integral operators based on derivatives of analytic kernel functions. Instead of approximating the derivative directly, we can also use the derivatives of an approximation. The latter approach is usually preferable from the viewpoint of implementation, and it is possible to prove similar convergence rates for both techniques. Using the error estimates derived for the approximation of the kernel function, it is possible to find bounds for the error of the matrix approximation both in the Frobenius and the spectral norm. In case of the spectral norm, the error in each block is determined
75 by the product of the kernel approximation error and the ratio of the measure of the support of a cluster and a power of its diameter. If the singularity of the kernel function is sufficiently weak, the error is dominated by the large clusters and converges to zero as the clusters shrink. This suggests a variableorder scheme [91], [23], [100]: if we increase the order of the approximation in large clusters, we can ensure convergence of the global spectral error without harming the asymptotic complexity. In order to ensure that the cluster bases are nested despite the varying orders, the construction of the approximation scheme has to be modified. It can be proven that only a minimal change in the implementation is required to provide us with a stable and convergent method. This chapter is organized as follows: • Section 4.1 introduces the basic problem we want to solve. • Section 4.2 describes a general symmetric panel-clustering approach and demonstrates that it leads to an H 2 -matrix approximation of the relevant stiffness matrix. • Section 4.3 summarizes the well-known basic properties of Taylor approximations of asymptotically smooth kernel functions. • Section 4.4 describes the approach based on interpolation of the kernel function and presents the corresponding approximation results (cf. [65], [23]). • Section 4.5 gives error estimates for the derivatives of interpolants, which are useful for approximating, e.g., the classical double layer potential operator or the hyper-singular operator (cf. Section 4.1 in [17]). • Section 4.6 uses the results of the previous three sections to establish error estimates for the H 2 -matrix in the spectral and Frobenius norm. • Section 4.7 introduces the variable-order interpolation scheme (cf. [91], [23]). • Section 4.8 contains a number of auxiliary results required in the other sections. The proofs are only included for the sake of completeness. • Section 4.9 presents numerical experiments that demonstrate that the theoretical error and complexity estimates are close to optimal. Assumptions in this chapter: We assume that cluster trees T and TJ for the finite index sets and J, respectively, are given. Let TJ be an admissible block cluster tree for T and TJ . Let n ´ # and nJ ´ #J denote the number of indices in and J, and let c ´ #T and cJ ´ #TJ denote the number of clusters in T and TJ .
76
4 Application to integral operators
4.1 Integral operators In the model problem described in Chapter 2, we have considered the integral operator Z 1 G Œu.x/ ´ log jx yj u.y/ dy: 0
Its discretization by Galerkin’s method using a family .'i /niD1 of basis functions leads to a matrix G 2 Rnn given by Z 1 Z 1 Gij ´ 'i .x/ log jx yj 'j .y/ dy dx for all i; j 2 f1; : : : ; ng; 0
0
which is dense in general. Handling this matrix directly leads to an algorithmic complexity of O.n2 /, i.e., the method will require too much storage and too much time for large values of n. Using an H 2 -matrix approximation, it is possible to reduce the complexity to O.nm/, where m 2 N is a parameter controlling the accuracy of the approximation. Let us now consider the general integral operator Z G Œu.x/ ´ g.x; y/u.y/ dy; (4.1)
defined for a suitable compact subdomain or submanifold of Rd , a suitable kernel function g and functions u from a suitable linear space. Discretizing the operator G by a Petrov–Galerkin method using finite families .'i /i2 and . j /j 2J of basis functions leads to a matrix G 2 RJ given by Z Z Gij ´ 'i .x/ g.x; y/ j .y/ dy dx for all i 2 ; j 2 J: (4.2)
In typical applications, the kernel function g has global support, i.e., in general the matrix G is dense. Handling the matrix G directly leads to algorithms with a complexity of O.n nJ /, which is unacceptable for large-scale computations.
4.2 Low-rank approximation Using the techniques introduced in Section 3.3, we can construct suitable cluster trees T and TJ for the index sets and J and an admissible block cluster tree TJ for these trees. As in the case of the model problem, the approximation of the matrix G will be based on local separable approximations of the kernel function g. We define families of supports . i /i2 and . j /j 2J for and J by i ´ supp 'i for all i 2 ;
j ´ supp
j
for all j 2 J:
4.2 Low-rank approximation
77
As in Definition 3.20, we define the supports of clusters by [ i for all t 2 T ; t ´ i2tO
s ´
[
j
for all s 2 TJ :
j 2Os
Based on these subsets of , we can now introduce “nested cluster bases” in infinitedimensional spaces of functions: Definition 4.1 (Expansion system). Let K D .K t / t2T be a rank distribution for T . A family .v t; / t2T ;2K t of functions defined on the domain is called an expansion system for T with rank distribution K. An expansion system .v t; / t2T ;2K t is called nested if there is a family .E t / t2T of matrices satisfying the following conditions: • For all t 2 T and all t 0 2 sons.t /, we have E t 0 2 RK t 0 K t . • For all t 2 T and all t 0 2 sons.t /, we have X .E t 0 / 0 v t 0 ; 0 .x/ v t; .x/ D
for all 2 K t ; x 2 t 0 :
0 2K t 0
The matrices E t are called transfer matrices or expansion matrices. Let K D .K t / t2T and L D .Ls /s2TJ be rank distributions for T and TJ . Let .v t; / t2T ;2K t be a nested expansion system for T with expansion matrices .E t / t2T . Let .ws; /s2TJ ;2Ls be a nested expansion system for TJ with expansion matrices .Fs /s2TJ . Let S D .Sb /b2LC be a family of matrices satisfying Sb 2 RK t Ls for J
all admissible blocks b D .t; s/ 2 LC J . For all admissible blocks b D .t; s/ 2 LC J in TJ , we can define the separable approximation X X .Sb / v t; .x/ws; .y/ for all x 2 t ; y 2 s (4.3) gQ t;s .x; y/ ´ 2K t 2Ls
of the function g in the subdomain t s corresponding to the supports of t and s. Replacing g by gQ t;s in (4.2) gives rise to an approximation Z Z z 'i .x/ gQ t;s .x; y/ j .y/ dy dx .G t;s /ij ´ Z Z X X D .Sb / 'i .x/v t; .x/ dx (4.4) j .y/ws; .y/ dy 2K t 2Ls
D
X X
2K t 2Ls
.Sb / .V t /i .Ws /j D .V t Sb Ws /ij
for all i 2 tO; j 2 sO ;
78
4 Application to integral operators
s t where the matrices V t 2 RK and Ws 2 RsJL are given by O tO
´R .V t /i ´ .Ws /j ´
0 ´R
if i 2 tO; otherwise;
v t; .x/'i .x/ dx ws; .y/
j .y/ dy
for all t 2 T ; i 2 ; 2 K t ;
if j 2 sO ; otherwise;
0
for all s 2 TJ ; j 2 J; 2 Ls :
For all t 2 T , t 0 2 sons.t /, i 2 tO0 and 2 K t , we observe Z Z .V t /i ´ v t; .x/'i .x/ dx D v t; .x/'i .x/ dx t 0 X Z X D .E t / 0 v t 0 ; 0 .x/'i .x/ dx D .E t / 0 .V t 0 /i 0 D .V t 0 E t 0 /i ; 0 2K t 0
t 0
0 2K t 0
and this is equivalent to
X
Vt D
Vt 0 Et 0 ;
t 0 2sons.t/
i.e., the family V D .V t / t2T is a nested cluster basis for the row cluster tree T with transfer matrices .E t / t2T . By the same arguments, we can prove that W D .Ws /s2TJ is a nested cluster basis for the column cluster tree TJ with transfer matrices .Fs /s2TJ . Using the family S D .Sb /b2LC of coupling matrices, we find that we have J
constructed an H 2 -matrix approximation X zD G V t Sb Ws C bD.t;s/2LC J
X
t Gs
(4.5)
bD.t;s/2L J
of the matrix G. If gQ t;s j t s is a sufficiently accurate approximation of gj t s , the factorized z t;s ´ V t Sb Ws is an approximation of the block t Gs of the original matrix G matrix G. Remark 4.2 (General discretization schemes). Introducing the families of functionals .ˆi /i2 and .‰j /j 2J defined by Z Z ˆi .f / ´ f .x/'i .x/ dx; ‰j .f / ´ f .y/ j .y/ dy;
we can write equation (4.2) in the form Gij D .ˆi ˝ ‰j /.g/
for all i 2 ; j 2 J:
4.3 Approximation by Taylor expansion
79
For an admissible block b D .t; s/ 2 LC Q t;s yields J , replacing g by g z t;s /ij D .ˆi ˝ ‰j /.gQ t;s / D .G
X X
.Sb / ˆi .v t; /‰j .ws; /
2K t 2Ls
D .V t Sb Ws /ij
for all i 2 tO; j 2 sO ;
s t where the matrices V t 2 RK and Ws 2 RsJL are defined by O tO
´ .V t /i D ´ .Ws /j D
ˆi .v t; / 0
if i 2 tO; otherwise;
‰j .ws; / 0
if j 2 sO ; otherwise
for all i 2 , j 2 J, 2 K t and 2 Ls . Using this more general form allows us to handle general discretization schemes: a Nystrøm method corresponds to functionals ˆi .f / ´ f . i /;
‰j .f / ´ f .j /
for families . i /i2 and .j /j 2J of interpolation points, while a collocation method corresponds to Z ˆi .f / ´ f . i /; ‰j .f / ´ f .y/ j .y/ dy
for collocation points . i /i2 and basis functions .
j /j 2J .
4.3 Approximation by Taylor expansion We have seen that a separable approximation of the type X X gQ t;s .x; y/ D s t;s;; v t; .x/ws; .y/ 2K t 2Ls
can be used to find factorized representations for admissible blocks b D .t; s/ 2 LC J . In the model problem discussed in Chapter 2, this approximation is derived by using the Taylor expansion of g, which leads to (2.2).
80
4 Application to integral operators
Multi-dimensional Taylor expansion In order to generalize this approach to the d -dimensional case, we rely on the usual multi-index notation: a tuple 2 Nd0 is called a multi-index. The norm and the factorial of multi-indices are defined by jj ´
d X
D 1 C C d
for all 2 Nd0 ;
Š D .1 Š/ : : : .d Š/
for all 2 Nd0 :
D1
Š ´
d Y D1
A partial order on the set of multi-indices in Nd0 is defined by W() 8 2 f1; : : : ; d g W
for all ; 2 Nd0 :
We can raise a vector to a multi-index power given by p ´
d Y
p D p11 : : : pdd
for all 2 Nd0 ; p 2 Rd :
D1
As a “special case” of this notation, we introduce the partial differential operator
@ ´ @11 : : : @dd
for all 2 Nd0 :
Using these notations, we can introduce the truncated Taylor expansion: let z0 2 Rd , let ! Rd be a star-shaped domain with center z0 , let m 2 N and let f 2 C m .!/. The m-th order Taylor approximation of f in z0 is given by fQz0 ;m W ! ! R;
z 7!
X jj<m
@ f .z0 /
.z z0 / : Š
Separable approximation In the model case discussed in Chapter 2, the kernel function g is shift-invariant, i.e., we have g.x; y/ D g.x y; 0/, which allows us to treat g as a function of only one variable and apply a Taylor approximation directly. In the general case, we cannot assume shift-invariance. Instead, we use the fact that we have multi-dimensional Taylor expansions at our disposal: we handle the kernel function g directly as a 2d -dimensional function. We assume that d -dimensional balls K t and Ks satisfying t K t and s Ks are given. Since K t and Ks are star-shaped with respect to their centers x t and ys ,
81
4.3 Approximation by Taylor expansion
respectively, the 2d -dimensional domain ! ´ K t Ks is star-shaped with respect to z0 ´ .x t ; ys / 2 R2d . We assume that gj! 2 C m .!/ holds, and let gQ t;s be the m-th order Taylor expansion of g in z0 . It is given by gQ t;s .x; y/ D
X jCj<m
D
X jCj<m
@.;/ g.x t ; ys / .x x t ; y ys /.;/ .; /Š d C g.x t ; ys / .x x t / .y ys / dx dy Š Š
(4.6)
for all x 2 K t and all y 2 Ks . We can see that this function is of the form (4.3) if we let K t D Ls D f 2 Nd0 W jj < mg: As in the general case, the separable expansion gives rise to an approximation t Gs V t Sb Ws t of the matrix block corresponding to b D .t; s/, where the matrices V t 2 RK , tO JLs K t Ls Ws 2 R and Sb 2 R are given by ´R .xx t / 'i .x/ dx if i 2 tO; Š for all i 2 ; 2 K t ; .V t /i ´ 0 otherwise ´R .yys / if j 2 sO ; j .y/ dy .Ws /j ´ Š for all j 2 J; 2 Ls ; 0 otherwise ´ C d g if j C j < m; .x t ; ys / for all 2 K t ; 2 Ls : .Sb / ´ dx dy 0 otherwise
In order to reach the optimal order of complexity, the cluster bases V ´ .V t / t2T and W ´ .Ws /s2TJ have to be nested. In the case of Taylor expansions, we find .x x t 0 C x t 0 x t / 1 X .x x t / 0 0 .x x t 0 / .x t 0 x t / D D Š Š Š 0 0
X .x x t 0 / 0 .x t 0 x t / 0 D 0Š . 0 /Š 0
for all t; t 0 2 T and all 2 K t . This identity implies that for all t 2 T and all t 0 2 sons.t/ the matrices E t 0 2 RK t 0 K t defined by ´ 0 .x t 0 x t / if 0 ; 0 . /Š .E t 0 / 0 ´ for all 0 2 K t 0 ; 2 K t ; 0 otherwise
82
4 Application to integral operators
satisfy the equation Vt D
X
Vt 0 Et 0 ;
t 0 2sons.t/
that is, the matrices .E t / t2T are transfer matrices for the nested cluster basis V D .V t / t2T . A family .Fs /s2TJ of transfer matrices for the nested cluster basis W D .Ws /s2TJ can be constructed by a similar argument. Remark 4.3 (Implementation). Computing the entries of V t and Ws is straightforward: if the basis functions 'i and j are piecewise polynomials, we can use standard quadrature rules to evaluate the integrals. If ; 2 K t satisfy D .1 ; : : : ; 1 ; C 1; C1 ; : : : ; d / for a 2 f1; : : : ; d g, we get .x x t / .x x t / x x t; D ; Š Š C 1 and this equation allows us to use an efficient recurrence to evaluate the monomials in a given quadrature point rapidly. The computation of the transfer matrices E t and Fs is even simpler: we only have to evaluate monomials and factorials at points in Rd , no quadrature is required. The construction of the matrices Sb is more challenging. In order to find an efficient algorithm, recursive expressions for the derivatives of g have to be derived by hand. Lemma 4.4 (Complexity). Let m 2 N. We have m1Cd md #K t D d
for all t 2 T ;
i.e., the rank distribution is md -bounded. Proof. According to Lemma 4.74, we have
#K t D #f 2
Nd0
m1Cd W jj < mg D : d
Elementary computations yield
m1Cd d
D
d Y .m 1 C d /Š m1Ck D .m 1/Šd Š k kD1
d d Y Y k.m 1/ C k m D md ; D k kD1
so we conclude #K t md .
kD1
4.3 Approximation by Taylor expansion
83
Error analysis Let us now investigate the error introduced by the truncated Taylor expansion. The multi-dimensional counterpart of the representation (2.5) of this error is given by Lemma 4.75. In order to stress the similarities between the multi-dimensional error term (4.79) and its one-dimensional counterpart (2.5), we introduce the directional derivative X mŠ @ f .z/p @pm f .z/ ´ Š jjDm
for z 2 ! and a vector p 2 Rd . Using p ´ z z0 in Lemma 4.75 yields Z 1 1 Q .1 t /m1 @pm f .z0 C tp/ dt: f .z/ fz0 ;m .z/ D .m 1/Š 0
(4.7)
Let us now investigate the approximation properties of the function gQ t;s . In order to prove useful bounds for the approximation error introduced by replacing g with gQ t;s , we have to be able to control the growth of the derivatives of the original kernel function. Since we frequently have to work with quotients of factorials, we use the notation ´Q n .n C k/Š n if n > 0; `D1 .k C `/ for all n; k 2 N0 : ´ (4.8) D kŠ k 1 otherwise Definition 4.5 (Asymptotically smooth kernels). Let g 2 Rd Rd ! R. Let Cas 2 R>0 , c0 2 R1 and 2 N. The function g is called .Cas ; ; c0 /-asymptotically smooth (cf. [28], [27]) if j@p g.x; y/j Cas
c0 kpk2 1 kx ykC 2
(4.9)
holds for all 2 N0 , x; y 2 Rd with x ¤ y and all directions p 2 Rd Rd . For D 0, the function g is called .Cas ; 0; c0 /-asymptotically smooth if j@p g.x; y/j Cas . 1/Š
c0 kpk2 kx yk2
(4.10)
holds for all 2 N, x; y 2 Rd with x ¤ y and all directions p 2 Rd Rd . In this context is called the order of the singularity of g, D 0 corresponds to logarithmic singularities. Example 4.6. The most important examples of asymptotically smooth kernel functions are ´ 1 1 if x ¤ y; 3 3 g3 W R R ! R; .x; y/ 7! 4 kxyk2 0 otherwise;
84
4 Application to integral operators
the fundamental solution of Poisson’s equation in three-dimensional space, and ´ g2 W R2 R2 ! R;
.x; y/ 7!
1 log kx yk2 2 0
if x ¤ y; otherwise;
its two-dimensional counterpart. According to [63], Appendix E, both functions are asymptotically smooth: for a given c0 > 1, [63], Satz E.2.1, yields a constant Cas 2 R>0 such that g3 is .Cas ; 1; c0 /asymptotically smooth and g2 is .Cas ; 0; c0 /-asymptotically smooth. In order to be able to formulate the approximation error estimate in the familiar terms of diameters and distances, we require the following result (which is obvious if considered geometrically, cf. Figure 4.1): xt
ys rt
dist rs
Figure 4.1. Distance of the centers x t and ys of two circles expressed by the distance of the circles and their radii.
Lemma 4.7 (Distance of centers). Let x t 2 K t and ys 2 Ks be the centers of the balls K t and Ks , respectively. Let r t 2 R0 and rs 2 R0 be the radii of K t and Ks , respectively. If dist.K t ; Ks / > 0 holds, we have kx t ys k2 dist.K t ; Ks / C r t C rs : Proof. We define the continuous functions x W Œ0; r t ! Rd ; y W Œ0; rs ! Rd ; h W Œ0; r t Œ0; rs ! R;
x t ys ; kx t ys k2 x t ys ˇ 7! ys C ˇ ; kx t ys k2 .˛; ˇ/ 7! kx t ys k2 ˛ ˇ; ˛ 7! x t ˛
85
4.3 Approximation by Taylor expansion
and observe x.˛/ 2 K t ;
y.ˇ/ 2 Ks
for all ˛ 2 Œ0; r t ; ˇ 2 Œ0; rs ;
which implies x t ys 0 < dist.K t ; Ks / kx.˛/ y.ˇ/k2 D .kx t ys k2 ˛ ˇ/ kx y k t
s 2 2
D jkx t ys k2 ˛ ˇj D jh.˛; ˇ/j for all ˛ 2 Œ0; r t and ˇ 2 Œ0; rs . Since h is continuous with h.0; 0/ D kx t ys k2 dist.K t ; Ks / > 0; we conclude h.˛; ˇ/ > 0 for all ˛ 2 Œ0; r t and all ˇ 2 Œ0; rs , i.e., dist.K t ; Ks / jh.r t ; rs /j D h.r t ; rs / D kx t ys k2 r t rs ; and this is the desired estimate. Now we can proceed to prove an estimate for the approximation error resulting from Taylor approximation: Theorem 4.8 (Approximation error). Let 2 R>0 . Let K t and Ks be d -dimensional balls satisfying the admissibility condition diam.K t / C diam.Ks / 2 dist.K t ; Ks /:
(4.11)
Let the kernel function g be .Cas ; ; c0 /-asymptotically smooth. Let x t 2 K t and ys 2 Ks be the centers of K t and Ks , respectively. Then 8 m1 ˆ
ˆC m1 c0 c0 : otherwise as C1 dist.K t ;Ks / holds for all m 2 N, x 2 K t and all y 2 Ks . Proof. Combining equation (4.7) with the bound (4.9) yields jg.x; y/ gQ t;s .x; y/j D jg.x; y/ gQ z0 ;m .x; y/j Z 1 1 D .1 t /m1 j@pm f .z0 C pt /j dt .m 1/Š 0 Z 1 .1 t /m1 m m y C c0 kpk2 dt mC 0 k.x t ys / C .x x t .y ys //t k2
86
4 Application to integral operators
for p D .x x t ; y ys /, z0 D .x t ; ys / and ´ Cas .m 1/Š D Cas m Cy ´ .m1/Š .m1C/Š Cas D Cas .m1/Š .m1/Š 1 . 1/Š
if D 0; otherwise:
In order to derive a useful estimate, we have to find a bound for the integral term. To this end, we introduce relative coordinates ´ x y and 0 ´ x t ys and observe Z 1 .1 t /m1 m m y dt: (4.13) jg.x; y/ gQ t;s .x; y/j C c0 kpk2 mC 0 k0 C . 0 /t k2 Let r t ´ diam.K t /=2 and rs ´ diam.Ks /=2 denote the radii of K t and Ks , respectively. We let ˛ ´ r t C rs ; ˇ ´ dist.K t ; Ks / and use Lemma 4.7 in order to prove k0 k2 D kx t ys k2 dist.K t ; Ks / C r t C rs D ˛ C ˇ; k 0 k2 D kx x t .y ys /k2 kx x t k2 C ky ys k2 r t C rs D ˛; kpk2 D .kx x t k22 C ky ys k22 /1=2 kx x t k2 C ky ys k2 r t C rs D ˛; which implies k0 C . 0 /tk2 k0 k2 k 0 k2 t ˛ C ˇ ˛t D ˛.1 t / C ˇ for all t 2 Œ0; 1. Combining this inequality with (4.13) yields Z jg.x; y/ gQ t;s .x; y/j Cy c0m kpkm 2 Z
0
1
.1 t /m1 dt k0 C . 0 /t kmC 2
1
.1 t/m1 ˛ m dt mC 0 ..1 t /˛ C ˇ/ m1 Z 1 ˛ .1 t /˛ m Cy c0 dt: .1 t /˛ C ˇ ..1 t /˛ C ˇ/C1 0 Cy c0m
The assumption (4.11) implies ˛ ˇ, i.e., .1 t /˛ 1t 1 .1 t/˛ D D : .1 t/˛ C ˇ .1 t /˛ C ˛= 1 t C 1= 1 C 1= C1 If D 0 holds, we have ˇ ˇ Z 1 Z 1 ˇ˛ C ˇ ˇ ˛ ˛ ˇ ˇ dt D ds D log j˛ C ˇj log jˇj D log ˇ C1 ˇ ˇ 0 ..1 t/˛ C ˇ/ 0 s˛ C ˇ
4.4 Approximation by interpolation
87
ˇ ˇ ˇ ˇ. C 1/ ˇ ˇ D log. C 1/: ˇ
log ˇˇ
ˇ
For > 0, we find Z 1 Z 1 ˛ ˛ 1 1 1 dt D ds D C1 C1 .˛ C ˇ/ ˇ 0 ..1 t/˛ C ˇ/ 0 .s˛ C ˇ/ 1 1 1 . C 1/ 1 1 D ˇ .ˇ C ˇ/ . C 1/ ˇ 1 ˇ and can conclude
8 m1 ˆ
C1
if D 0; otherwise;
and substituting the value of Cy completes the proof. Theorem 4.8 is a generalization of the result of Lemma 2.2: the former implies the latter if we let Cas D 1, D 0 and c0 D 1. If the distance of the d -dimensional balls K t and Ks is sufficiently large compared to their diameters, i.e., if qopt ´ c0 =. C 1/ < 1 holds, Theorem 4.8 yields that the approximant gQ t;s jK t Ks converges to the kernel function gjK t Ks at an exponential rate if m grows: for fixed K t and Ks , the error is proportional to .m C 1/Š m1 qopt ; .m 1/Š and here the exponential decay of the second factor dominates the polynomial growth of the first one: for each q 2 R>qopt , we can find a constant Cta 2 R>0 such that kg gQ t;s kK t Ks
Cta qm dist.K t ; Ks /
holds for all b D .t; s/ 2 LC J ; m 2 N:
4.4 Approximation by interpolation Using Taylor expansions to construct separable approximations of the kernel function has many advantages: significant portions of the resulting transfer and coupling matrices contain only zero entries, i.e., we can save storage by using efficient data formats, the evaluation of the monomials corresponding to the cluster bases is straightforward, and the error analysis is fairly simple.
88
4 Application to integral operators
Unfortunately, the approach via Taylor expansions has also two major disadvantages: the construction of the coupling matrices requires the efficient evaluation of derivatives of the kernel function g, e.g., by recursion formulas that have to be derived by hand, and the error estimates are not robust with respect to the parameter c0 appearing in the Definition 4.5 of asymptotic smoothness: if c0 grows, we have to adapt in order to guarantee exponential convergence. Both properties limit the applicability of Taylor-based approximations in general situations. We can overcome the disadvantages by using an alternative approximation scheme: instead of constructing a polynomial approximation of the kernel function g by Taylor expansion, we use Lagrangian interpolation.
One-dimensional interpolation Let us first consider the one-dimensional case. For each interpolation order m 2 N, we fix interpolation points . m; /m D1 in the interval Œ1; 1. We require that the points corresponding to one m 2 N are pairwise different, i.e., that ¤ ) m; ¤ m;
holds for all m 2 N and ; 2 f1; : : : ; mg:
(4.14)
The one-dimensional interpolation operator of order m 2 N is given by Im W C Œ1; 1 ! Pm ;
f 7!
m X
f . m; /Lm; ;
D1
where the Lagrange polynomials Lm; 2 Pm are given by Lm; .x/ ´
m Y
x m;
m; D1 m;
for all x 2 R; m 2 N; 2 f1; : : : ; mg:
¤
Since Lm; . m; / D ı holds for all m 2 N and ; 2 f1; : : : ; mg, we have Im Œf . m; / D f . m; /
for all f 2 C Œ1; 1; m 2 N and 2 f1; : : : ; mg:
Combining (4.14) with this equation and the identity theorem for polynomials yields Im Œp D p
for all m 2 N and p 2 Pm ;
(4.15)
i.e., the interpolation Im is a projection with range Pm . In order to define an interpolation operator for general non-empty intervals Œa; b, we consider the linear mapping ˆŒa;b W Œ1; 1 ! Œa; b;
t 7!
bCa ba C t; 2 2
4.4 Approximation by interpolation
89
W C Œa; b ! Pm by and define the transformed interpolation operator IŒa;b
m 1 IŒa;b
m Œf ´ .Im Œf B ˆŒa;b / B ˆŒa;b
for all m 2 N;
i.e., by mapping f into C Œ1; 1, applying the original interpolation operator, and is an affine mapping the resulting polynomial back to the interval Œa; b. Since ˆ1 Œa;b
mapping, the result will still be a polynomial in Pm . Let m 2 N. The definition of Im yields IŒa;b
m Œf D
m X
f .ˆŒa;b . m; //Lm; B ˆ1 Œa;b ;
D1
Œa;b m and defining the transformed interpolation points m; in the interval Œa; b by D1 bCa ba C
m; 2 2
m and the corresponding transformed Lagrange polynomials LŒa;b
m; D1 by Œa;b
m; ´ ˆŒa;b . m; / D
1 LŒa;b
m; ´ Lm; B ˆŒa;b ;
we get the more compact notation IŒa;b
m
m X Œa;b Œa;b
Lm; : D f m; D1
Since the equation Œa;b
1 LŒa;b
m; . m; / D Lm; B ˆŒa;b .ˆŒa;b . m; // D Lm; . m; / D ı
D
m Œa;b
Œa;b
Y
m; m; D1 ¤
Œa;b
Œa;b
m; m;
holds for all ; 2 f1; : : : ; mg, the identity theorem for polynomials yields the equation LŒa;b
m; .x/
D
m Y D1 ¤
Œa;b
x m; Œa;b
Œa;b
m; m;
for all x 2 R; m 2 N and 2 f1; : : : ; mg;
which we can use to evaluate the transformed Lagrange polynomials efficiently.
90
4 Application to integral operators
Separable approximation by multi-dimensional interpolation Since we intend to apply interpolation to construct a separable approximation of the kernel function g defined in a multi-dimensional domain, we require multi-dimensional interpolation operators. Let us consider an axis-parallel d -dimensional box Q D Œa1 ; b1 Œad ; bd with a1 < b1 , …, ad < bd . The order m of the one-dimensional interpolation scheme is replaced by an order vector m 2 Nd , and the corresponding tensor-product interpolation operator IQ m is defined by Œa1 ;b1
Œad ;bd
˝ ˝ Im : IQ m ´ Im1 d
We can observe that it takes the familiar form X Q IQ Œf D f . m; /LQ m m;
(4.16)
for all f 2 C.Q/;
0<m
if the multi-dimensional interpolation points and Lagrange polynomials are given by Œa1 ;b1
Œad ;bd
Q
m; ´ . m ; : : : ; m / 1 ;1 d ;d
for all 2 Nd with m;
Œa1 ;b1
Œad ;bd
LQ m; ´ Lm1 ;1 ˝ ˝ Lmd ;d
for all 2 Nd with m:
In practice, the multi-dimensional Lagrange polynomials can be easily evaluated by using the equation LQ m; .x/ D
d Y
;b
LŒa m ; .x /
D1
D
m d Y Y
Œa ;b
x m ;
Œa ;b
D1 D1 m ; ¤
Œa ;b
m ;
for all x 2 Rd and 2 Ndm :
In order to construct separable approximations of g, we enclose the supports t and s of all clusters t 2 T and s 2 TJ in axis-parallel bounding boxes (cf. 3.10), i.e., we fix axis-parallel boxes .Q t / t2T and .Qs /s2TJ satisfying t Qt ;
s Qs
for all t 2 T ; s 2 TJ :
We also fix order vectors .m t / t2T and .ms /s2TJ for all clusters in T and TJ , respectively. Using these bounding boxes and order vectors, we can define interpolation operators .I t / t2T and .Is /s2TJ for all clusters in T and TJ , respectively, by t I t ´ IQ mt ;
s Is ´ IQ ms
for all t 2 T ; s 2 TJ :
(4.17)
91
4.4 Approximation by interpolation
If we let K t ´ f 2 Nd W 1 m t; for all 2 f1; : : : ; d gg
for all t 2 T ;
Ls ´ f 2 N
for all s 2 TJ
d
W 1 ms; for all 2 f1; : : : ; d gg
and define the interpolation points and Lagrange polynomials for all clusters by Qt
t; ´ m ; t ;
t L t; ´ LQ m t ;
for all t 2 T ; 2 K t ;
Qs
s; ´ m ; s ;
s Ls; ´ LQ ms ;
for all s 2 TJ ; 2 Ls ;
we can see that the interpolation operators take the form X f . t; /L t; for all t 2 T ; f 2 C.Q t /; I t Œf D 2K t
Is Œf D
X
f . s; /Ls;
for all s 2 TJ ; f 2 C.Qs /:
(4.18)
2Ls
Let b D .t; s/ 2 LC J be an admissible pair of clusters. By construction, Q ´ Q t Qs is an axis-parallel box in R2d and m ´ .m t ; ms / is a 2d -dimensional order vector. By definition, we have Qt Qs IQ m D I m t ˝ I ms D I t ˝ I s
and the 2d -dimensional interpolant of g is given by X X g. t ; s /Lt .x/Ls .y/ gQ t;s .x; y/ ´ IQ m Œg.x; y/ D
for all x; y 2 Rd ;
2K t 2Ls
(4.19) i.e., the 2d -dimensional interpolation leads to a separable approximation of the desired form (4.3). We can use this approximation of g to define an approximation t Gs V t Sb Ws t of the matrix block corresponding to b D .t; s/, where the matrices V t 2 RK , tO s Ws 2 RsJL and Sb 2 RK t Ls are given by O ´R L t; .x/'i .x/ dx if i 2 tO; .V t /i ´ 0 otherwise ´R Ls; .y/ j .y/ dy if j 2 sO ; .Ws /j ´ 0 otherwise
for all i 2 ; 2 K t ;
(4.20a)
for all j 2 J; 2 Ls ; (4.20b)
92
4 Application to integral operators
.Sb / ´ g. t; ; s; /
for all 2 K t ; 2 Ls :
(4.20c)
We can only obtain an algorithm with optimal order of complexity if the resulting cluster bases V ´ .V t / t2T and W ´ .Ws /s2TJ are nested. We assume m t m t 0 for all t 2 T and all t 0 2 sons.t / and observe that (4.15) implies I t 0 ŒL t; D L t;
for all t 2 T ; t 0 2 sons.t / and all 2 K t :
Due to (4.18), this means X
L t; . t 0 ; 0 /L t 0 ; 0 D L t; ;
0 2K t 0
and we find that for all t 2 T and all t 0 2 sons.t / the matrices E t 0 2 RK t 0 K t defined by .E t 0 / 0 ´ L t; . t 0 ; 0 / satisfy the equation Vt D
for all 0 2 K t 0 ; 2 K t ;
X
Vt 0 Et 0 ;
t 0 2sons.t/
i.e., the matrices .E t / t2T are transfer matrices for the nested cluster basis V D .V t / t2T . A family .Fs /s2TJ of transfer matrices for the nested cluster basis W D .Ws /s2TJ can be constructed by a similar argument. Remark 4.9 (Implementation). We can see that the construction of the coupling matrices Sb for this approximation scheme is simpler than in the case of Taylor approximations: instead of having to evaluate arbitrarily high derivatives of the kernel function g, interpolation only requires the evaluation of the function g itself. This means that we can apply interpolation in quite general situations as a “black box” strategy. Finding the correct interpolation points is straightforward: we have seen in Section 3.3 that optimal bounding boxes Q t and Qs can be constructed by a simple and efficient algorithm. Once these boxes and one-dimensional interpolation points . /m D1 are given, the construction of tensor interpolation points is trivial. Based on these interpolation points, the Lagrange polynomials L t can be evaluated efficiently, i.e., the computation of the entries of the transfer matrices and the values of the integrands appearing in the definition of V t and Ws is easily accomplished. A simple quadrature rule can be used to take care of the integration. The transfer matrices E t 0 are of a special structure: we have .E t 0 / 0 D L t; . t 0 ; 0 / D
d Y
Œa0 ;b 0
LŒa ;b . 0 / ƒ‚ … D1 „ μ.E t 0 ; / 0
4.4 Approximation by interpolation
93
for Q t D Œa1 ; : : : ; ad and Q t 0 D Œa10 ; : : : ; ad0 . We can prepare the d.m C 1/2 coefficients of the d auxiliary matrices E t 0 ; using only 4d.m C 1/3 operations, and the computation of one entry of E t 0 then requires only d 1 multiplications. Lemma 4.10 (Complexity). If 2 N satisfies m t; we have #K t D
for all t 2 T ; 2 f1; : : : ; d g; d Y
m t; d
for all t 2 T ;
D1
i.e., the rank distribution is d -bounded. Proof. Similar to the proof of Lemma 4.4.
Error analysis in the one-dimensional case Let m 2 N. The interpolation operator Im is continuous since we have ˇ ®ˇ P ¯ ˇ kIm Œf k1;Œ1;1 D max ˇ m D1 f . m; /Lm; .x/ W x 2 Œ1; 1 ® Pm ¯ max D1 kf k1;Œ1;1 jLm; .x/j W x 2 Œ1; 1 D ƒm kf k1;Œ1;1
(4.21)
for all f 2 C Œ1; 1 with the Lebesgue constant ® Pm ¯ ƒm ´ max D1 jLm; .x/j W x 2 Œ1; 1 : Combining the stability estimate (4.21) with the projection property (4.15) yields the lower bound ƒm 1 for all m 2 N. Definition 4.11 (Stable interpolation scheme). A family .Im /1 mD1 of interpolation operators is called an interpolation scheme. If there are constants ƒ; 2 R1 satisfying ƒm ƒ.m C 1/
for all m 2 N;
(4.22)
the interpolation scheme .Im /1 mD0 is called .ƒ; /-stable. Example 4.12 (Chebyshev interpolation). The Chebyshev interpolation points are given by 2 1 for all m 2 N; 2 f1; : : : ; mg:
m; ´ cos 2m
94
4 Application to integral operators
The corresponding interpolation scheme has the advantage that its Lebesgue constants satisfy ƒm
2 ln.m C 1/ C 1 m C 1
for all m 2 N;
(4.23)
i.e., the interpolation scheme is .1; 1/-stable [87]. Let m 2 N, and let Œa; b be a non-trivial interval. Since we have 1 kIŒa;b
m Œf k1;Œa;b D k.Im Œf B ˆŒa;b / B ˆŒa;b k1;Œa;b
D kIm Œf B ˆŒa;b k1;Œ1;1
ƒm kf B ˆŒa;b k1;Œ1;1 D ƒm kf k1;Œa;b
(4.24)
for all f 2 C Œa; b, we can conclude that the transformed interpolation operator IŒa;b
m has the same Lebesgue constant as Im . Using a bound for the Lebesgue constant of an interpolation scheme, we can demonstrate that the interpolant is close to the best possible polynomial approximation: Lemma 4.13 (Best approximation). Let m 2 N and f 2 C Œ1; 1. We have kf Im Œf k1;Œ1;1 .ƒm C 1/kf pk1;Œ1;1
for all p 2 Pm :
(4.25)
Proof. Let p 2 Pm . Due to (4.15), we have Im Œp D p and find kf Im Œf k1;Œ1;1 D kf p C Im Œp Im Œf k1;Œ1;1
kf pk1;Œ1;1 C kIm Œp f k1;Œ1;1
kf pk1;Œ1;1 C ƒm kf pk1;Œ1;1
D .ƒm C 1/kf pk1;Œ1;1 : Due to this best-approximation property, we can bound the interpolation error by finding an approximating polynomial. We follow the approach described in [23], Lemma 3.13: we construct a holomorphic extension of f into an elliptic subdomain of the complex plane C and use the following result to find the desired polynomial: Lemma 4.14 (Approximation of holomorphic functions). Let % 2 R>1 and ´ μ 2 2 2x 2y E% ´ x C iy W x; y 2 R; C 1 : % C 1=% % 1=% Let f 2 C 1 .E% / be a holomorphic function. We have min kf pk1;Œ1;1
p2Pm
2% m % kf k1;E% %1
for all m 2 N:
(4.26)
4.4 Approximation by interpolation
95
Proof. See, e.g., [42], Chapter 7, Section 8, equation (8.7). % D 5=2 %D2 % D 3=2
Figure 4.2. Analyticity domains E% for different values of %.
In order to apply this lemma, we have to be able to find holomorphic extensions of functions f 2 C 1 Œ1; 1 to a regularity ellipse E% . If the derivatives of f are bounded, the extension exists and we obtain the following existence result for polynomial approximations: Lemma 4.15 (Polynomial approximation). Let f 2 C 1 Œ1; 1, Cf 2 R0 , f 2 R>0 and 2 N be such that jf ./ .x/j
Cf f 1
holds for all x 2 Œ1; 1; 2 N0 :
(4.27)
For all m 2 N, we can then find a polynomial p 2 Pm with kf pk1;Œ1;1 2Cf e.m C 1/
q m f C 2 f C 1 C f2 ; f
Proof. Let m 2 N. Let r 20; f Œ. Due to Lemma 4.76, we can find a holomorphic extension fQ of f to the domain Rr which satisfies f Q kf k1;Rr Cf : f r p Let % ´ r C 1 C r 2 . We can apply Lemma 4.77 in order to prove E% Rr , i.e., f kfQk1;E% Cf : f r
96
4 Application to integral operators
Due to Lemma 4.14, we can find a polynomial p 2 Pm with 2% m Q kf pk1;Œ1;1 D kfQ pk1;Œ1;1 % kf k1;E% %1 f 2% m Cf : % %1 f r Using the definition of % yields p p p % .r C 1 C r 2 /.r 1 1 C r 2 / r C 1 C r2 D D p p p %1 r 1 C 1 C r2 .r 1 C 1 C r 2 /.r 1 1 C r 2 / p p p r2 r r 1 C r2 C r 1 C r2 1 C r2 1 r2 D r 2 2r C 1 1 r 2 p p p r 1 1 C r 2 r C 1 C 1 C r2 r C 1 C 1 C 2r C r 2 D D 2r 2r 2r 2r C 2 r C1 D D : 2r r Applying these estimates to r ´ ˛f with ˛ 20; 1Œ yields f 2% m % kf pk1;Œ1;1 Cf %1 f r m p r C1 f 2Cf r C 1 C r2 r f r q m f ˛f C 1 2 2 ˛f C 1 C ˛ f D 2Cf ˛f f ˛f q m ˛f C 1 1 ˛f C ˛ 1 C f2 2Cf ˛f .1 ˛/ q m ˛f C 1 1 2 D 2Cf C 1 C : f f ˛f ˛ m .1 ˛/ We choose ˛ ´ m=.m C 1/ and get f C 1=˛ f C .m C 1/=m f C 2 ˛f C 1 D D ; ˛f f f f 1 m mC1 m 1 m 1 D D 1C .m C 1/ < e.m C 1/ ˛ m .1 ˛/ m mC1 m and conclude kf pk1;Œ1;1 < 2Cf e.m C 1/
q m f C 2 f C 1 C f2 : f
4.4 Approximation by interpolation
97
Using this general approximation result for analytic functions in the interval Œ1; 1, we can now bound the interpolation error using the best-approximation result Lemma 4.13. Theorem 4.16 (Interpolation error). Let Œa; b R be a non-trivial interval. Let m 2 N and n 2 f1; : : : ; mg. Let f 2 C 1 Œa; b, Cf 2 R0 , f 2 R>0 and 2 N be such that jf ./ .x/j
Cf f 1
holds for all x 2 Œa; b; 2 N0 :
For the function % W R0 ! R1 ;
r 7! r C
p
1 C r 2;
(4.28)
we have
2f n ba kf IŒa;b
1 C % Œf k 2eC .ƒ C 1/.n C 1/ : m 1;Œa;b
f m ba f (4.29)
Proof. We let fO ´ f B ˆŒa;b and Of ´ 2f =.b a/ and observe ˇ ˇ ˇ b a ./ ˇ ./ O ˇ jf .x/j D ˇ f .ˆŒa;b .x//ˇˇ 2 Cf for all x 2 Œ1; 1; 2 N0 ; Of 1 so we can apply Lemma 4.15 to construct a polynomial pO 2 Pn satisfying q n 2 O kf pk O 1;Œ1;1 2Cf e.n C 1/ 1 C : Of C 1 C Of2 Of Due to Lemma 4.13, this implies Œa;b
kf IŒa;b
m Œf k1;Œa;b D kf B ˆŒa;b Im Œf B ˆŒa;b k1;Œ1;1
D kfO Im ŒfOk1;Œ1;1
q n 2 2eCf .ƒm C 1/.n C 1/ 1 C Of C 1 C Of2 ; Of
which is the required estimate. Remark 4.17 (Rate of convergence). The function % determining the rate of convergence introduced in (4.28) is monotonic increasing, since it is the sum of two monotonic increasing functions. It can be bounded from below by p p %.r/ D r C 1 C r 2 > r C 1 and %.r/ D r C 1 C r 2 > r C r D 2r
98
4 Application to integral operators
for all r 2 R>0 . The first estimate guarantees convergence even if a holomorphic extension exists only in a small neighbourhood of Œ1; 1. The second estimates shows that for r 1 we get the improved convergence rate of standard Chebyshev interpolation.
Error analysis in the multi-dimensional case We now consider the d -dimensional interpolation operator IQ m defined by (4.16) for the axis-parallel box Q D Œa1 ; b1 Œad ; bd : For all 2 f1; : : : ; d g, we let Œa ;b
IQ I ˝ ˝ I ˝Im ˝ I ˝ ˝ I : m; ´ „ ƒ‚ … „ ƒ‚ …
(4.30)
d times
1 times
For an index 2 f1; : : : ; d g and a function f 2 C.Q/, the interpolant IQ m; Œf is polynomial in the -th coordinate: we have IQ m; Œf
.x/ D
m X D1
Œa ;b
;b
f .x1 ; : : : ; x1 ; m ; xC1 ; : : : ; xd /LŒa m ; .x / ;
(4.31)
for all x 2 Q. The definition of IQ m implies IQ m D
d Y
IQ m; ;
D1
therefore we can analyze the tensor interpolation operator by analyzing interpolation in each coordinate direction separately. Lemma 4.18 (Stability of directional interpolation). Let 2 f1; : : : ; d g be a coordinate direction. We have kIQ m; Œf k1;Q ƒm kf k1;Q
for all f 2 C.Q/:
Proof. Let f 2 C.Q/. We fix y 2 Œa ; b for all 2 f1; : : : ; 1; C 1; : : : ; d g. We define the function f W Œa ; b ! R;
x 7! f .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd /:
We have f .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd / D f .x/ IQ m; Œf
.y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd / D
Œa ;b
Im Œf .x/
for all x 2 Œa ; b ; for all x 2 Œa ; b ;
99
4.4 Approximation by interpolation
therefore the estimate (4.24) yields ˇ ˇ ˇ Q ˇ ;b
Œf .x/j ˇIm; Œf .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd /ˇ D jIŒa m ƒm kf k1;Œa ;b ƒm kf k1;Q for all x 2 Œa ; b . Since this estimate holds for all y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd , it implies the desired bound. Lemma 4.19 (Directional interpolation error). Let 2 f1; : : : ; d g. Let % be defined by (4.28). Let f 2 C 1 .Q/, Cf 2 R0 , f 2 R>0 and 2 N be such that k@ f k1;Q
Cf f 1
holds for all 2 N0 :
(4.32)
Let m 2 Nd be an order vector, and let n 2 f1; : : : ; m g. We have kf
IQ m; Œf
k1;Q 2eCf .ƒm C 1/.n C 1/
b a 1C f
n 2f : % b a
Proof. We fix y 2 Œa ; b for all 2 f1; : : : ; 1; C 1; : : : ; d g. We define the function f W Œa ; b ! R;
x 7! f .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd /;
and observe that jf .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd / Œa ;b
IQ Œf .x/j m; Œf .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd /j D jf .x/ Im
(4.33)
holds for all x 2 Œa ; b . Our assumption (4.32) implies jf.i/ .x/j D j@i f .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd /j
Cf f 1
for all x 2 Œa ; b and 2 N0 . Due to Theorem 4.16, this means kf
;b
IŒa k1;Œa ;b
m
2eCf .ƒm C 1/.n C 1/
b a 1C f
n 2f : % b a
Applying (4.33) concludes the proof. Combining the stability estimate of Lemma 4.18 with the interpolation error estimate of Lemma 4.19 yields the following result:
100
4 Application to integral operators
Theorem 4.20 (Multi-dimensional interpolation error). Let % be defined as in (4.28). Let f 2 C 1 .Q/, Cf 2 R0 , f 2 R>0 and 2 N be such that k@ f k1;Q
Cf f 1
holds for all 2 f1; : : : ; d g; 2 N0 :
(4.34)
Let m; n 2 Nd be order vectors with n m. We let ƒm ´
d Y
.ƒm C 1/;
diam1 .Q/ ´ maxfb a W 2 f1; : : : ; d gg:
D1
Then the following error estimate holds: kf
IQ m Œf
d n 2f diam1 .Q/ X 1C .n C 1/ % : f b a D1 (4.35)
k1;Q 2eCf ƒm
Proof. Let ı ´ diam1 .Q/. For all 2 f0; : : : ; d g, we define the operator P ´
Y
IQ m; :
D1
We can see that P D P1 IQ m;
holds for all 2 f1; : : : ; d g;
and due to P0 D I and Pd D IQ m , we have kf
IQ m Œf
k1;Q D kP0 Œf Pd Œf k1;Q
d X
kP1 Œf P Œf k1;Q
D1
D
d h i X Œf P1 f IQ m; D1
1;Q
:
According to Lemma 4.18, we find kP1 Œgk1;Q
1 Y
ƒm kgk1;Q
for all g 2 C.Q/;
D1
and applying this to g D f IQ m; Œf for 2 f1; : : : ; mg leads to kf IQ Œf k1;Q
d Y 1 X D1
D1
ƒm kf IQ m; Œf k1;Q :
4.4 Approximation by interpolation
101
Now we can use Lemma 4.19 to conclude kf IQ Œf k1;Q d n 1 2f ı X Y 2eCf 1 C ƒm .ƒm C 1/.n C 1/ % f D1 D1 b a n d 2f ı X Y .ƒm C 1/ .n C 1/ % 2eCf 1 C b a f D1 D1 d n 2f ı X 2eCf ƒm 1 C .n C 1/ % : f D1 b a Theorem 4.20 provides us with a fairly general interpolation error estimate for the case of anisotropic boxes Q: if the extent of the box in one direction is significantly smaller than in the other ones, estimate (4.35) allows us to use a lower interpolation order in this direction, i.e., to reach a higher efficiency. In most applications, the clustering strategy is chosen in such a way that anisotropic boxes are avoided. In this situation, we can afford to choose approximately identical interpolation orders in all coordinate directions and base the error estimate only on the minimal order: Corollary 4.21 (Isotropic interpolation). Let f 2 C 1 .Q/, Cf 2 R0 , f 2 R>0 and 2 N be such that (4.34) holds. Let m 2 Nd be an order vector, and let n ´ minfm W 2 f1; : : : ; d gg;
ƒm ´
d Y
ƒm ;
D1
diam1 .Q/ ´ maxfb a W 2 f1; : : : ; d gg: Then we have kf
IQ m Œf
k1;Q 2edCf ƒm .n C 1/
n 2f diam1 .Q/ : 1C % f diam1 .Q/ (4.36)
Proof. Since % is monotonic and 2f 2f diam1 .Q/ b a
holds for all 2 f1; : : : ; d g;
this is a simple consequence of Theorem 4.20.
Application to the kernel function Let us now apply the general error estimates of Theorem 4.20 and Corollary 4.21 to the problem of approximating the kernel function g by gQ t;s .
102
4 Application to integral operators
Theorem 4.22 (Approximation error). Let 2 R>0 . We consider d -dimensional axis-parallel bounding boxes Q t D Œa1 ; b1 Œad ; bd ;
Qs D Œad C1 ; bd C1 Œa2d ; b2d ;
satisfying the admissibility condition maxfdiam1 .Q t /; diam1 .Qs /g D diam1 .Q t Qs / 2 dist.Q t ; Qs /:
(4.37)
Let the kernel function g be .Cas ; ; c0 /-asymptotically smooth with > 0. Let m ´ .m t ; ms / and n ´ minfm W 2 f1; : : : ; 2d gg;
ƒm ´
2d Y
.ƒm C 1/:
D1
Then the separable approximation gQ t;s defined by (4.19) satisfies jg.x; y/ gQ t;s .x; y/j
4edCas ƒm .n C 1/ 1 n .1 C 2c /% 0 dist.Q t ; Qs / c0
for all x 2 Q t and y 2 Qs . Proof. Since g is .Cas ; ; c0 /-asymptotically smooth, we have k@ gk1;Q t Qs
Cf c0 Cas D dist.Q t ; Qs / dist.Q t ; Qs / 1 f 1
for all 2 N0 , 2 f1; : : : ; 2d g, where we let Cf ´
Cas ; dist.Q t ; Qs /
f ´
dist.Q t ; Qs / : c0
Applying Corollary 4.21 and observing diam1 .Q t Qs / 2c0 ; f
2f 1 diam1 .Q t Qs / c0
concludes the proof. The convergence rate of the interpolant is determined by %.1=.c0 //n . Since working with this quantity is slightly inconvenient, we replace it by simpler expressions: due to Remark 4.17, we can bound %.1=.c0 //1 from above by .c0 /=.c0 C 1/ < 1 and see that the interpolant will always converge as long as > 0 holds. On the other hand, we can bound %.1=.c0 //1 from above by .c0 /=2 and thus reproduce the convergence rate estimates predicted by classical results. It is even possible to combine both estimates and simplify the estimate even further:
4.5 Approximation of derivatives
103
Remark 4.23. Let us assume that the interpolation scheme .Im /1 mD1 is .ƒ; /-stable and that we use interpolation of constant order n, i.e., that the order vectors .m t / t2T and .ms /s2TJ satisfy m t n ms for all t 2 T , s 2 TJ . Under these assumptions, the estimate of Theorem 4.22 can be simplified: we let
c0 1 c0 c0 1 q ´ min D ; ; > D q q 1 1 %.1=.c0 // c0 C 1 2 C 1 C c 2 2 1 C c02 2 C 1 c0 0
and combining the estimate ´ q%.1=.c0 // > 1 with Lemma 3.50 yields that there is a constant Cin 2 R>0 satisfying 4edCas ƒ2d .n C 2/2d .n C 1/ .1 C 2c0 / n Cin
for all n 2 N:
Due to the stability assumption and m D n, we have ƒm D
2d Y
.ƒm C 1/
D1
2d Y
.ƒ.m C 1/ C 1/
D1
2d Y
.ƒ.n C 2/ / D ƒ2d .n C 2/2d
D1
and Theorem 4.22 yields kg gQ t;s kQ t Qs
Cin qn dist.Q t ; Qs /
for all b D .t; s/ 2 LC J ; n 2 N;
i.e., the interpolant will converge exponentially if the order n is increased. Remark 4.24 (Taylor expansion and interpolation). Although both Taylor expansion and interpolation provide polynomial approximations of the kernel function g, the latter approach has several significant advantages: the convergence rate of the Taylor expansion is bounded by .c0 /=. C 1/, and we can expect no convergence if c0 is too large. The convergence rate of interpolation, on the other hand, is bounded by .c0 /=.c0 C 1/ < 1, so exponential convergence is always guaranteed, even though the rate of convergence may deteriorate for large values of the product c0 . If is small enough, the interpolant will converge with a rate of c0 =2. In addition, the parameter in the admissibility condition (4.11) for the Taylor expansion has to be large enough to bound the sum of the diameters of K t and Ks , while in the condition (4.37) for interpolation only the maximum of the diameters has to be bounded. If the clusters are of comparable size, this means that the parameter for the Taylor expansion has to be chosen twice as large as for interpolation, i.e., interpolation will converge even faster for similar block cluster trees.
4.5 Approximation of derivatives The theory presented so far requires the kernel function g to be asymptotically smooth on the entire domain f.x; y/ 2 Rd Rd W x ¤ yg. In some applications, e.g., when
104
4 Application to integral operators
dealing with boundary integral equations, the function g will not be globally defined, i.e., our definition of asymptotically smooth functions will not apply. Fortunately, g can usually be expressed as a product of a derivative of a globally-defined asymptotically smooth generator function W Rd Rd ! R and a separable function. An example is the classical double layer potential operator Z 1 hx y; n.y/i GDLP Œu.x/ D u.y/ dy 2 kx yk22 given on a curve R2 in two spatial dimensions, where for each y 2 , n.y/ 2 R2 is the outward-pointing normal vector. The corresponding kernel function g.x; y/ D
1 hx y; n.y/i 2 kx yk22
is only defined on and will, in general, not be smooth, since n does not have to be a smooth function.
Construction of a separable approximation We can see that g.x; y/ D hgrady .x; y/; n.y/i
holds for all x; y 2 with x ¤ y;
where the generator function is given by W R2 R2 ! R;
.x; y/ 7!
´
1 log kx yk2 2 0
(4.38)
if x ¤ y; otherwise:
The generator function is asymptotically smooth (cf. Appendix E of [63]). The equation (4.38) suggests two different approaches to approximating g: since is asymptotically smooth, the same will hold for the components of grady , so we can use Taylor expansions or interpolation to find separable approximations of these components and combine them to form a separable approximation of the gradient. This approach will lead to optimal convergence, but it will also increase the rank of the approximation by a factor of d , since each component of grady has to be approximated individually. We can avoid the higher rank and the corresponding higher complexity of the implementation by constructing a local approximation Q t;s of the generator function and using grady Q t;s as an approximation of grady . In short: the first approach uses an approximation of the derivative, while the second uses the derivative of an approximation. The first approach is covered by the theory in the previous section, so we focus on the second approach.
4.5 Approximation of derivatives
105
For each admissible block b D .t; s/ 2 LC J , we can construct a local separable approximation X X Q t;s .x; y/ D .Sb / v t; .x/ws; .y/ for all x 2 t ; y 2 s ; 2K t 2Ls
by Taylor expansion or interpolation, where Sb 2 RK t Ls is the corresponding coupling matrix and .v t; /2K t and .ws; /2Ls are suitable expansion functions. If we assume that the functions .ws; /2Ls are differentiable, we can replace in (4.38) by Q t;s and obtain gQ t;s .x; y/ ´ hgrady Q t;s .x; y/; n.y/i X X D .Sb / v t; .x/hgrady ws; .y/; n.y/i 2K t 2Ls
D
X X
.Sb / v t; .x/wQ s; .y/
for all x 2 t ; y 2 s ;
2K t 2Ls
where we let wQ s; .y/ ´ hgrady ws; .y/; n.y/i
for all y 2 ; 2 Ls ;
i.e., we have found a separable approximation gQ t;s of g. In order to bound the error introduced by replacing g with gQ t;s , we have to find estimates for jg.x; y/ gQ t;s .x; y/j D jhgrady . Q t;s /.x; y/; n.y/ij k grady . Q t;s /.x; y/k2 kn.y/k2 in all points x; y 2 t s . Since kn.y/k2 D 1 holds for all y 2 , we face the task of finding a bound for k grady . Q t;s /.x; y/k2 in all points x 2 t , y 2 s , i.e., we require error bounds for the derivatives of Taylor expansions or interpolants.
Error analysis in the one-dimensional case Due to its theoretical and practical advantages (cf. Remarks 4.9 and 4.24), we focus on approximation by interpolation, i.e., we will only consider the approximation X X t Qs Q t;s .x; y/ ´ IQ Œ .x; y/ D . t; ; s; /L t; .x/Ls; .y/ (4.39) .m t ;ms / 2K t 2Ls
for x 2 Q t , y 2 Qs , where K t , Ls , . t; /2K t , . s; /2Ls , .L t; /2K t and .Ls; /2Ls are defined as in Section 4.4.
106
4 Application to integral operators
We base our analysis on one-dimensional results and derive multi-dimensional estimates by tensor arguments. Let us start by considering the approximation of f 0 for a function f 2 C 1 Œ1; 1. We have to prove that the interpolation error kf 0 .Im Œf /0 k1;Œ1;1 is small. As in the previous section, the proof is split into two parts: we start by proving a stability estimate for derivatives of polynomials, since this allows us to bound the error of the interpolation by kf 0 p 0 k1;Œ1;1 for an arbitrary polynomial p 2 Pm . Now we have to construct a suitable polynomial p 2 Pm . We do this by an indirect approach: we find a polynomial p0 2 Pm1 approximating the derivative f 0 of f and let p be an antiderivative of p0 , i.e., we have p 0 D p0 and kf 0 p 0 k1;Œ1;1 D kf 0 p0 k1;Œ1;1 . This allows us to re-use the approximation results of the previous section. The stability estimate for derivatives of polynomials is based on an inverse estimate for polynomials: Lemma 4.25 (Markov’s inequality). Let m 2 N. We have ku0 k1;Œ1;1 m2 kuk1;Œ1;1
for all u 2 Pm :
Proof. See, e.g., Theorem 4.1.4 in [42]. In order to be able to handle higher derivatives, we use the following straightforward generalization of Markov’s inequality: Lemma 4.26 (Markov’s inequality iterated). Let m 2 N and ` 2 N0 . We have 8 < mŠ 2 kuk1;Œ1;1 if ` m; .m`/Š ku.`/ k1;Œ1;1 for all u 2 Pm : (4.40) :0 otherwise Proof. By induction on `. For ` D 0, the inequality is trivial. Let now ` 2 N0 be such that (4.40) holds. We have to prove the estimate for ` C 1. If ` m holds, we have ` C 1 > m, i.e., u.`C1/ D 0 and (4.40) is trivial. Otherwise, we have u.`/ 2 Pm` and can apply Lemma 4.25 in order to get ku.`C1/ k1;Œ1;1 D k.u.`/ /0 k1;Œ1;1 .m `/2 ku.`/ k1;Œ1;1 : Using the induction assumption, we can conclude .`C1/
ku
2
.`/
2
k1;Œ1;1 .m `/ ku k1;Œ1;1 .m `/ D
mŠ .m .` C 1//Š
2
and this is the estimate (4.40) for ` C 1.
kuk1;Œ1;1 ;
mŠ .m `/Š
2 kuk1;Œ1;1
4.5 Approximation of derivatives
107
Combining Markov’s inequality with a simple Taylor approximation yields a stability estimate for derivatives of interpolants: Lemma 4.27 (Stability of derivatives). Let Œa; b R be a non-trivial interval. Let ` 2 N0 and let m 2 N` . We have .`/ .`/ .`/ k1;Œa;b
k.IŒa;b
m Œf / k1;Œa;b ƒm kf
for all f 2 C ` Œa; b;
where the stability constant is given by ƒ.`/ m ´
ƒm `Š
mŠ .m `/Š
2 :
Proof. Let f 2 C ` Œa; b. Let fO ´ f B ˆŒa;b 2 C ` Œ1; 1. Since we are only interested in the `-th derivative of fO, we can subtract any polynomial of degree less than ` without changing the derivative. We choose the truncated Taylor expansion fQ W Œ1; 1 ! R;
x 7!
`1 X
x fO./ .0/ ; Š D0
observe fQ.`/ D 0 and Im ŒfQ D fQ due to ` m, and conclude .Im ŒfO/.`/ D .Im ŒfO/.`/ fQ.`/ D .Im ŒfO Im ŒfQ/.`/ D .Im ŒfO fQ/.`/ : (4.41) We apply Lemma 4.26 and the stability estimate (4.21) to u ´ Im ŒfO fQ 2 Pm and get 2 mŠ kIm ŒfO fQk1;Œ1;1
k.Im ŒfO fQ/.`/ k1;Œ1;1 .m `/Š (4.42) 2 mŠ O Q ƒm kf f k1;Œ1;1 : .m `/Š Since fQ is the Taylor expansion of fO, we can apply Lemma 4.75 in order to get Z 1 jzj` dt .1 t /`1 jfO.`/ .t z/j jfO.z/ fQ.z/j .` 1/Š 0 Z kfO.`/ k1;Œ1;1 1 .1 t /`1 dt (4.43) .` 1/Š 0 kfO.`/ k1;Œ1;1
D `Š for all z 2 Œ1; 1: Combining (4.41), (4.42) and (4.43) yields ƒm mŠ O.`/ k1;Œ1;1 : kfO.`/ k1;Œ1;1 D ƒ.`/ k.Im ŒfO/.`/ k1;Œ1;1 m kf `Š .m `/Š
108
4 Application to integral operators
Due to the definition of IŒa;b
m , we have .`/ 1 .`/ O k.IŒa;b
m Œf / k1;Œa;b D k.Im Œf B ˆŒa;b / k1;Œa;b
` ` 2 2 D k.Im ŒfO/.`/ B ˆ1 k D k.Im ŒfO/.`/ k1;Œ1;1
Œa;b 1;Œa;b
ba ba ` ` 2 2 O.`/ k1;Œ1;1 D ƒ.`/ ƒ.`/ k f kfO.`/ B ˆ1 m m Œa;b k1;Œa;b
ba ba D ƒ.`/ k.fO B ˆ1 /.`/ k1;Œa;b ; m
Œa;b
and this concludes the proof. We can combine this stability result with Lemma 4.15 to prove that the derivative of IŒa;b
m Œf is a good approximation of the derivative of f . Since we have to apply the lemma to the derivative, we have to modify the analyticity condition (4.27) accordingly. Theorem 4.28 (Derived interpolation). Let Œa; b 2 R be a non-trivial interval. Let ` 2 N0 . Let f 2 C 1 Œa; b, Cf.`/ 2 R0 , f 2 R>0 and .`/ 2 N be such that jf .`C/ .x/j
Cf.`/ f
.`/ 1
holds for all x 2 Œa; b; 2 N0 :
Let % be defined as in (4.28). We have .`/ kf .`/ .IŒa;b
m Œf / k1;Œa;b
2eCf.`/ .ƒ.`/ m C 1/.n C 1/
.`/
1C
2f n ba % f ba
for all m 2 N` and all n 2 f1; : : : ; m `g. Proof. Let m 2 N` and n 2 f0; : : : ; m `g. We let fO ´ f B ˆŒa;b 2 C 1 Œ1; 1 and find O.`C/
jf
ˇ ˇ .`/ ˇ b a `C .`C/ ˇ Cf b a `C .x/j D ˇˇ f .ˆŒa;b .x//ˇˇ 2 2 .`/ 1 f D
Cyf.`/ Of
.`/ 1
for all x 2 Œ1; 1; 2 N0
with the constants Cyf.`/
´
Cf.`/
ba 2
` ;
Of ´ f
2 ; ba
4.5 Approximation of derivatives
109
i.e., the derivative fO.`/ satisfies the conditions of Lemma 4.15. This result allows us to find a polynomial pO0 2 Pn satisfying 2 .`/ kfO.`/ pO0 k1;Œ1;1 2e Cyf.`/ .n C 1/ 1C %.Of /n : Of For all i 2 f1; : : : ; `g, we define polynomials pOi 2 PnCi by Z x pOi .x/ D pOi1 .y/ dy for all x 2 Œ1; 1: 1
Due to this definition, we have pOi0 D pOi1 for all i 2 f1; : : : ; `g. We let pO ´ pO` 2 PnC` Pm and observe pO .`/ D pO0 . This means 2 .`/ .`/ .`/ .`/ .`/ O O y kf pO k1;Œ1;1 D kf pO0 k1;Œ1;1 2e Cf .nC1/ 1C %.Of /n : Of We define p ´ pO B ˆ1 and find Œa;b
kf
.`/
p
.`/
k1;Œa;b D
2 ba
`
.`/ kfO.`/ B ˆ1 B ˆ1 Œa;b pO Œa;b k1;Œa;b
` 2 kfO.`/ pO .`/ k1;Œ1;1
D ba ` 2 2 .`/ .`/ y 1C %.Of /n 2e Cf .n C 1/ ba Of 2 .`/ .`/ 2eCf .n C 1/ 1C %.Of /n : Of
Since pO 2 PnC` Pm holds and since ˆŒa;b is an affine map, we have p 2 Pm and can use the stability result of Lemma 4.27 in order to conclude .`/ .`/ .`/ p .`/ k1;Œa;b C k.IŒa;b
kf .`/ .IŒa;b
m Œf / k1;Œa;b kf m Œf p/ k1;Œa;b
.`/ .1 C ƒ.`/ p .`/ k1;Œa;b : m /kf
Applying the bound for the right-hand term completes the proof.
Error analysis in the multi-dimensional case As in Section 4.4, we now extend the one-dimensional approximation result to the multi-dimensional setting by investigating the properties of the directional interpolation operators IQ m; . We first establish the counterpart of Lemma 4.27:
110
4 Application to integral operators
Lemma 4.29 (Stability of directional interpolation). Let 2 Nd0 with m. Let
2 f1; : : : ; d g. We have . / k@ IQ m; Œf k1;Q ƒm k@ f k1;Q
for all f 2 C 1 .Q/:
Proof. Let f 2 C 1 .Q/. Due to (4.31), we have Q Q @ Im; Œf D Im; Œ@ f
for all 2 f1; : : : ; 1; C 1; : : : ; d g:
(4.44)
In order to handle the differential operator @ , we proceed as in the proof of Lemma 4.18: we fix y 2 Œa ; b for all 2 f1; : : : ; 1; C 1; : : : ; d g and define the function
f W Œa ; b ! R;
x 7! f .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd /:
We observe . / .x/ @ f .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd / D f Q @ Im; Œf
.y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd / D
for all x 2 Œa ; b ;
Œa ;b
.Im Œf /. / .x/
for all x 2 Œa ; b ;
and applying Lemma 4.27 yields ˇ ˇ ˇ ˇ Q ˇ@ I Œf .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd /ˇ D ˇ.IŒa ;b Œf /. / .x/ˇ m; m . / ƒm kf. / k1;Œa ;b
D By definition, we have
. / ƒm kf
(4.45)
k1;Q :
1 d @ D @ 1 : : : @d ;
so combining (4.44) and (4.45) concludes the proof. Next, we require a multi-dimensional counterpart of the one-dimensional interpolation error estimate provided by Theorem 4.28: Lemma 4.30 (Directional interpolation error). Let 2 Nd0 with m. Let 2 f1; : : : ; d g. Let % be defined by (4.28). Let f 2 C 1 .Q/, Cf./ 2 R0 , f 2 R>0 and ./ 2 N be such that k@ @ f
k1;Q
Cf./ f
holds for all 2 N0 :
./ 1
Let n 2 f1; : : : ; m g. We have k@ f @ IQ m; Œf k1;Q
. / 2eCf./ .ƒm
C 1/.n C 1/
./
b a 1C f
n 2f % : b a
111
4.5 Approximation of derivatives
Proof. Let 0 ´ .1 ; : : : ; 1 ; 0; C1 ; : : : ; d / 2 Nd0 . Let y 2 Œa ; b for all 2 f1; : : : ; 1; C 1; : : : ; d g. We define the function 0
f W Œa ; b ! R;
x 7! @ f .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd /;
0
0
Q and since (4.44) implies @ IQ m; Œf D Im; Œ@ f , we observe 0
@ f .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd / D @ .@ f /.y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd /
D f. / .x/
for all x 2 Œa ; b ;
and @ IQ m; Œf .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd / 0
Q D @ Im; Œ@ f .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd / Œa ;b
D .Im Œf /. / .x/
for all x 2 Œa ; b :
Our assumption implies jf.C / .x/j D j@ @ f .y1 ; : : : ; y1 ; x; yC1 ; : : : ; yd /j
Cf./ f
./ 1
for all x 2 Œa ; b ; 2 N0 ;
and we can apply Theorem 4.28 to obtain Œa ;b
Œf /. / k1;Œa ;b
kf. / .Im . / 2eCf./ .ƒm C 1/.n C 1/
./
1C
b a f
n 2f : % b a
Since the right-hand side of this estimate does not depend on the choice of the variables y1 ; : : : ; y1 ; yC1 ; : : : ; yd , a look at the definition of f reveals that the proof is already complete. Combining the Lemmas 4.29 and 4.30 allows us to prove the necessary estimate for the derivatives of the multi-dimensional tensor product interpolation operator IQ m: Theorem 4.31 (Multi-dimensional interpolation error). Let % be defined as in (4.28). Let 2 Nd0 . Let f 2 C 1 .Q/, Cf./ 2 R0 , f 2 R>0 and ./ 2 N be such that k@ @ f k1;Q
Cf./ f
./ 1
holds for all 2 f1; : : : ; d g; 2 N0 :
(4.46)
Let m; n 2 Nd be order vectors with m and n m . We let ƒ./ m ´
d Y
. / .ƒm C 1/;
D1
diam1 .Q/ ´ maxfb a W 2 f1; : : : ; d gg:
112
4 Application to integral operators
Then the following error estimate holds: k@ f @ IQ m Œf k1;Q d n 2f diam1 .Q/ X ./ ./ ./ 2eCf ƒm 1 C .n C 1/ % : f b a D1 Proof. Let ı ´ diam1 .Q/. As in the proof of Theorem 4.20, we define the operator P ´
Y
IQ m;
for all 2 f0; : : : ; d g;
D1
and apply the triangle inequality to obtain k@ f @ IQ m Œf k1;Q
d X @ P1 f IQ Œf : m; 1;Q D1
We can use Lemma 4.29 in order to prove k@ f @ IQ m Œf k1;Q
d Y 1 X D1
/ @ f @ I Q ƒ. m m; Œf 1;Q :
D1
Since n m holds for all 2 f1; : : : ; d g, we can apply Lemma 4.30 to find k@ f @ IQ m; Œf k1;Q
n 2f b a C 1/.n C 1/ 1C % f b a n 2f ı ./ . / ./ 1C % 2eCf .ƒm C 1/.n C 1/ f b a . / 2eCf./ .ƒm
./
and conclude k@ f @ IQ m Œf k1;Q
d Y 1 X D1
D1
. / k@ f @ IQ ƒm m; Œf k1;Q
d n 1 2f ı X Y . / . / ./ .ƒm C 1/.n C 1/ ƒ % 1C f D1 D1 m b a d n 2f ı X ./ 1 C 2eCf./ ƒ./ .n C 1/ % : m f D1 b a 2eCf./
113
4.5 Approximation of derivatives
In an isotropic situation, we can simplify the error estimate provided by Theorem 4.31: Corollary 4.32 (Isotropic interpolation). Let % be defined as in (4.28). Let 2 Nd0 . Let f 2 C 1 .Q/, Cf./ 2 R0 , f 2 R>0 and ./ 2 N be such that (4.46) holds. Let m 2 Nd be an order vector with m , and let n ´ minfm W 2 f1; : : : ; d gg;
d Y
ƒ./ m ´
. / .ƒm C 1/;
D1
diam1 .Q/ ´ maxfb a W 2 f1; : : : ; d gg: Then the following error estimate holds: k@ f @ IQ m Œf k1;Q 2deCf./ ƒ./ m .n C 1/
./
1C
n 2f diam1 .Q/ % : diam1 .Q/ f
Proof. This is a simple consequence of Theorem 4.31.
Application to the kernel function Now we can apply the general result of Theorem 4.31 to the kernel function g D @ . In order to keep the expressions from becoming too complicated, we only consider the isotropic case: Theorem 4.33 (Approximation error). Let 2 R>0 . We consider d -dimensional axis-parallel bounding boxes Q t D Œa1 ; b1 Œad ; bd ;
Qs D Œad C1 ; bd C1 Œa2d ; b2d ;
satisfying the admissibility condition maxfdiam1 .Q t /; diam1 .Qs /g D diam1 .Q t Qs / 2 dist.Q t ; Qs /: 2d be an order vector with m . Let @ be Let 2 N2d 0 and let m 2 N ./ ./ .Cas ; ; c0 /-asymptotically smooth with ./ > 0. We let
n ´ minfm W 2 f1; : : : ; 2d gg;
ƒ./ m ´
2d Y
. / .ƒm C 1/:
D1
Then the separable approximation Q t;s defined by (4.39) satisfies
j@ .x; y/ @ Q t;s .x; y/j for all x 2 Q t and y 2 Qs .
4edCas./ ƒ./ m .n C 1/
dist.Q t ; Qs / ./
./
1 .1 C 2c0 /% c0
n
114
4 Application to integral operators
Proof. We let Q ´ Q t Qs , define ı ´ diam1 .Q/;
Cf./ ´
Cas./ dist.Q t ; Qs
/ ./
;
f ´
dist.Q t ; Qs / ; c0
and observe that the asymptotic smoothness of the derivative @ of implies k@ @ k1;Q Cas./
c0
./ 1 dist.Q t ; Qs / ./ C
D
Cf./ f
./ 1
:
This means that we can apply Corollary 4.32 to Q in order to get 2f n ı ./ ./ ./ 1 C % Œ k 4deC ƒ .n C 1/ k@ @ IQ 1;Q m m f f ı n ./ Cas 1 ./ D 4de ƒ./ .n C 1/ .1 C 2c0 /% m ./ c0 dist.Q t ; Qs / by using the admissibility condition ı 2 dist.Q t ; Qs /. Remark 4.34. If is the order of the singularity of the generator function , its -th derivative will typically have a singularity of order ./ ´ Cjj. If we disregard the polynomial factors, the results from Theorem 4.22 and Theorem 4.33 are quite similar: in the isotropic case, the first corresponds to the direct approximation of g D @ by gQ and yields an estimate of the type C.m/ 1 m % kg gk Q 1;Q t Qs c0 dist.Q t ; Qs /Cjj for an interpolation order m, the second correspond to the approximation of g by @ Q and yields an estimate of the type
kg @ Q k1;Q t Qs
C.m/ 1 mCjj1 % ; c0 dist.Q t ; Qs /Cjj
where we define jj1 ´ max . Roughly speaking, the second approach yields the same approximation quality as the first for the reduced order mjj1 of interpolation. This is not surprising, since polynomials of low order will not effect the derivative of Q and therefore contribute nothing to the approximation.
4.6 Matrix approximation Due to the Theorems 4.8, 4.22 and 4.33, we can find bounds . b /bD.t;s/2LC such J that jg.x; y/ gQ t;s .x; y/j b
holds for all x 2 t ; y 2 s
(4.47)
4.6 Matrix approximation
115
and all admissible blocks b D .t; s/ 2 LC J . Based on these bounds, we now investigate the error of the approximation of the matrix G by X X zD G t Gs C V t Sb Ws bD.t;s/2L J
bD.t;s/2LC J
constructed in (4.5). In order to simplify the presentation, we only consider the case that the basis functions .'i /i2 and . j /j 2J are elements of L2 . /. More general vector spaces can be handled by the approach described in Remark 4.2. We measure the error introduced by the H 2 -matrix approximation in the Frobenius and in the spectral norm.
Frobenius norm estimates Let us first consider the Frobenius norm. Definition 4.35 (Frobenius norm). Let X 2 RJ . The Frobenius norm of X is given by 1=2 XX kX kF ´ Xij2 : i2 j 2J
In the model problem, we can base the analysis of the error on the element-wise estimate provided by (2.10), since the supports of the basis functions are disjoint and we can find bounds for their L2 -norms. In general, we cannot assume that the supports are disjoint, since most popular finite element spaces do not satisfy this requirement. Instead, we assume that the overlap of the supports can be bounded by a constant, i.e., that in each point x 2 only a bounded number of basis functions can differ from zero. Definition 4.36 (Overlapping supports). Let . i /i2 be a family of supports in (cf. Definition 3.20). Let Cov 2 N. If #fi 2 W x 2 i g Cov
holds for almost all x 2 ;
we call the family . i /i2 Cov -overlapping. In the model case, i.e., for non-overlapping supports, the measure of is the sum of the measures of the supports i . If a family of supports if Cov -overlapping, we can still bound the sum of the measures of all supports i by the measure of multiplied by the constant Cov .
116
4 Application to integral operators
Lemma 4.37 (Bounded overlap). Let . i /i2 be a Cov -overlapping family of supports in . Then we have X j i j Cov j t j for all t 2 T ; i2tO
X
j i j Cov j j:
i2
Proof. For all i 2 , we introduce
´
i W ! R;
x 7!
if x 2 i ; otherwise:
1 0
Let t 2 T . Since . i /i2 is Cov -overlapping, we have Z X Z X XZ j i j D i .x/ dx D i .x/ dx D i2tO
i2tO
Z
i2tO
t
Z
#fi 2 W x 2 i g dx
D t
X
i .x/ dx
i2tO
Cov dx D Cov j t j: t
Applying this result to the root of T yields the second estimate of this lemma. Using this estimate, we can derive the following estimate for the Frobenius norm of the blockwise error: Lemma 4.38 (Blockwise error). Let . b /b2LC
J
be a family satisfying (4.47). Let the
families . i /i2 and . j /j 2J be Cov -overlapping. Let C ´ maxfk'i kL2 W i 2 tOg;
CJ ´ maxfk
j kL2
W j 2 sO g:
(4.48)
We have k t Gs V t Sb Ws kF C CJ b
X
1=2 X
j i j
i2tO
1=2 j j j
j 2Os
for all b D .t; s/ 2 LC J . Proof. Let b D .t; s/ 2 LC J . According to Definition 4.35 and (4.4), we have k t Gs V t Sb Ws k2F D
X X Z i2tO j 2Os
b2
Z 'i .x/
2 .y/ dy dx j
X X Z i2tO j 2Os
.g.x; y/ gQ t;s .x; y//
2 Z j'i .x/j dx j
2 : j .y/j dy
4.6 Matrix approximation
117
Let us consider the sum over i 2 . We use the Cauchy–Schwarz inequality in order to get 2 X Z X Z j'i .x/j dx j i j 'i .x/2 dx i2tO
i2tO
X 2 O maxfk'i kL j i j 2 W i 2 tg C
X
i2tO
j i j:
i2tO
Applying the same reasoning to the sum over j 2 J yields the desired estimate. Now we can combine the blockwise error estimates in order to derive a global estimate. Due to the structure of the Frobenius norm, combining blockwise estimates is straightforward: Lemma 4.39 (Global Frobenius norm). Let X 2 RJ . We have 1=2 X k t Xs k2F : kX kF D bD.t;s/2LJ
Proof. We once more use the equation XD
X
t Xs :
bD.t;s/2LJ
The matrices t and s are orthogonal projections for all t 2 T and s 2 TJ , and the definition of the norm implies X h t Xs ; X iF kX k2F D hX; XiF D bD.t;s/2LJ
D
X
h t Xs ; t Xs iF
bD.t;s/2LJ
D
X
k t Xs k2F :
bD.t;s/2LJ
Using Lemma 4.39 in combination with the blockwise error estimate of Lemma 4.38 yields the following error bound: Lemma 4.40 (Frobenius error bound). Let . b /b2LC
J
be a family satisfying (4.47).
Let the families . i /i2 and . j /j 2J be Cov -overlapping. Let C ; CJ 2 R0 be defined as in (4.48). We have z F Cov C CJ j j maxf b W b 2 LC g: kG Gk J
118
4 Application to integral operators
z and use Lemma 4.38 in order Proof. We apply Lemma 4.39 to the matrix X ´ G G to obtain X z 2F D k t Gs V t Sb Ws k2F kG Gk bD.t;s/2LC J
X
C2 CJ2
b2
bD.t;s/2LC J
X
X
j i j
i2tO
j j j
j 2Os
X
X
C2 CJ2 maxf b2 W b 2 LC J g
bD.t;s/2LJ
X
j i j
i2tO
j j j :
j 2Os
Combining Lemma 3.14 with Corollary 3.9 yields that ftO sO W b D .t; s/ 2 LJ g is a disjoint partition of J, and we find X X XX X j i j j j j D j i j j j j bD.t;s/2LJ
i2tO
i2 j 2J
j 2Os
D
X i2
X
j i j
2 j j j Cov j j2
j 2J
by using Lemma 4.37 in the last step. Now we can conclude 2 2 2 2 z 2F Cov C CJ maxf b2 W b 2 LC kG Gk J gj j :
This result is a generalization of the corresponding estimate for the model problem: if we let Cov D 1 (i.e., no overlap between the supports of the basis functions), C D CJ D n1=2 (corresponding to piecewise constant basis functions) and use the bound b D log. C 1/.=. C 1//m1 established in Corollary 2.3, we recover the estimate (2.19) derived for the model problem in Chapter 2. Let us now consider the application of this norm estimate to the kernel approximations constructed by Taylor expansion and interpolation. We consider only the case > 0. Comparing the Theorems 4.8, 4.22 and 4.33 yields the error bounds kg gQ t;s k1;K t Ks
Cg q n dist.K t ; Ks /
for all b D .t; s/ 2 LC J ;
Cg q n dist.Q t ; Qs /
for all b D .t; s/ 2 LC J ;
Cg q n dist.Q t ; Qs / Cjj
for all b D .t; s/ 2 LC J ;
if Taylor expansions are used, kg gQ t;s k1;Q t Qs if interpolation is applied and kg gQ t;s k1;Q t Qs
119
4.6 Matrix approximation
if the -th partial derivative of an interpolant is considered. Here Cg 2 R>0 is a constant which does not depend on n and b, and q 2 Œ0; 1Œ is the rate of convergence (recall that the admissibility parameter has to be sufficiently small in the case of the Taylor expansion). In the case of the Taylor approximation, the corresponding admissibility condition (4.11) implies 1 .diam.K t / C diam.Ks // 2 dist.K t ; Ks / for all b D .t; s/ 2 LC J ;
diam.K t /1=2 diam.Ks /1=2
and the error estimate takes the form kg gQ t;s k1;K t Ks
Cg q n : diam.K t /=2 diam.Ks /=2
In order to derive a similar result for approximations constructed by interpolation, we have to replace the admissibility condition (4.37) by the slightly stronger condition maxfdiam.Q t /; diam.Qs /g 2 dist.Q t ; Qs /
(4.49)
and observe that it yields diam.Q t /1=2 diam.Qs /1=2 maxfdiam.Q t /; diam.Qs /g 2 dist.Q t ; Qs /
for all b D .t; s/ 2 LC J
(4.50)
if the block cluster tree is constructed based on the new admissibility condition. Then the error estimate implies kg gQ t;s k1;Q t Qs
Cg .2/ q n diam.Q t /=2 diam.Qs /=2
in the case of the interpolation and kg gQ t;s k1;Q t Qs
Cg .2/Cjj q n diam.Q t /.Cjj/=2 diam.Qs /.Cjj/=2
if -th partial derivatives of interpolants are used. Since all three cases can be handled in a similar fashion, we restrict our attention to approximations constructed by interpolation, i.e., we use kg gQ t;s k1;Q t Qs b ´
Cg; q n diam.Q t /=2 diam.Qs /=2
(4.51)
for all b D .t; s/ 2 LC J , where we let Cg; ´ Cg .2/ . Combining this estimate with Lemma 4.40 yields the following global error bound for the Frobenius norm:
120
4 Application to integral operators
Theorem 4.41 (Frobenius approximation error). Let C ; CJ 2 R0 be defined as in (4.48), and let q 2 Œ0; 1Œ be the rate of convergence introduced in (4.51). Let the families . i /i2 and . j /j 2J be Cov -overlapping. For all admissible leaves b D .t; s/ 2 LC Q t;s satisfies (4.51). Then we have J , we assume that the g z F kG Gk
Cov C CJ Cg; j j =2 minfdiam.Q t / diam.Qs / W b D .t; s/ 2 LC J g
qn:
(4.52)
Proof. Combine Lemma 4.40 with (4.51). Remark 4.42 (Asymptotic behaviour of the error). Let us assume that is a d dimensional subset or submanifold of Rd , and that the discretization is based on a quasiuniform hierarchy of grids with a decreasing sequence h0 ; h1 ; : : : of mesh parameters. The minimum in estimate (4.52) is attained for leaf clusters, and we can assume that the diameters of their bounding boxes are approximately proportional to the grid parameter. If we neglect the constants C and CJ corresponding to the scaling of the basis functions, the Frobenius approximation error on mesh level ` can be expected to behave like h qn. `
Spectral norm estimates z of the H 2 -matrix approxiNow let us focus on the spectral norm of the error G G mation. Definition 4.43 (Spectral norm). Let X 2 RJ . The spectral norm (or operator norm) of X is given by ² ³ kXuk2 kXk2 ´ sup W u 2 RJ n ¹0º : kuk2 Under the same conditions as in the case of the Frobenius norm we can also derive a blockwise error estimate for the spectral norm: Lemma 4.44 (Blockwise error). Let . b /b2LC
J
be a family satisfying (4.47). Let
C ; CJ 2 R>0 be defined as in (4.48). We have X 1=2 X 1=2 k t Gs V t Sb Ws k2 C CJ b j i j j j j i2tO
j 2Os
for all b D .t; s/ 2 LC J . Proof. Let E ´ t Gs V t Sb Ws . Let u 2 RJ . Due to 2 X X X X XX kEuk22 D .Eu/2i D Eij uj Eij2 uj2 i2
i2
j 2J
i2
j 2J
j 2J
121
4.6 Matrix approximation
D
XX
Eij2 kuk22 D kEk2F kuk22 ;
i2 j 2J
we can conclude by using the Frobenius norm estimate of Lemma 4.38. In order to find a global error bound, we have to combine these blockwise error bounds using the operator-norm counterpart of Lemma 4.39: Lemma 4.45 (Global spectral norm). Let X 2 RJ . We have
X
kX k2
k t Xs k22
1=2 :
bD.t;s/2LJ
Proof. Let v 2 R and u 2 RJ . We have ˇ ˇ X ˇ ˇ hv; t Xs ui2 ˇ jhv; Xui2 j D ˇ
X
bD.t;s/2LJ
bD.t;s/2LJ
X
k t Xs k2 k t vk2 ks uk2
bD.t;s/2LJ
jh t v; t Xs s ui2 j
X
k t Xs k22
1=2
bD.t;s/2LJ
X
k t vk22 ks uk22
1=2 :
bD.t;s/2LJ
Combining Lemma 3.14 and Corollary 3.9 yields that LJ corresponds to a disjoint partition of J, i.e., we have X k t vk22 ks uk22 D kvk22 kuk22 bD.t;s/2LJ
and conclude jhv; Xui2 j
X
k t Xs k22
1=2 kuk2 kvk2 :
bD.t;s/2LJ
Setting v ´ Xu and proceeding as in the proof of Lemma 4.44 proves our claim. We can combine the Lemmas 4.44 and 4.45 in order to prove the following bound for the H 2 -matrix approximation error: Lemma 4.46 (Spectral error bound). Let . b /b2LC
J
be a family of positive real
numbers satisfying (4.47). Let the families . i /i2 and . j /j 2J be Cov -overlapping, and let C ; CJ 2 R>0 be defined as in (4.48). We have z 2 Cov C CJ j j maxf b W b 2 LC g: kG Gk J
122
4 Application to integral operators
z and use Lemma 4.44 in order Proof. We apply Lemma 4.45 to the matrix X ´ G G to obtain X z 22 kG Gk k t Gs V t Sb Ws k22 bD.t;s/2LC J
X
C2 CJ2
b2
bD.t;s/2LC J
X
X
j i j
i2tO
j j j
j 2Os
X
X
C2 CJ2 maxf b2 W b 2 LC J g
bD.t;s/2LC J
i2tO
X
j i j
j j j :
j 2Os
As in the proof of Lemma 4.40, we combine Lemma 3.14 with Corollary 3.9 and find X X XX X j i j j j j D j i j j j j bD.t;s/2LC J
i2tO
i2 j 2J
j 2Os
D
X i2
X
j i j
2 j j j Cov j j2
j 2J
by using Lemma 4.37 in the last step. The estimate of Lemma 4.45 is quite straightforward, general and completely sufficient for many standard applications, but it is, in general, far from optimal: the Lemmas 4.40 and 4.46 even yield exactly the same upper bounds for Frobenius and spectral norm, although the spectral norm will indeed be significantly smaller than the Frobenius norm in practical applications. To illustrate this, let us consider the example of the identity matrix 0 1 1 B C nn :: I D@ A2R : : 1 It can bepconsidered as an n n block matrix, and Lemma 4.45 yields the estimate kI k2 n, which is obviously far from the optimal bound kI k2 D 1. Therefore we now consider an improved estimate (a generalization of [49], Satz 6.2) which takes the block structure of a matrix into account: Theorem 4.47 (Global spectral norm). Let X 2 RJ , and let TJ be Csp -sparse. pJ p Let p and pJ be the depths of T and TJ . Let . ;` /`D0 and . J;` /`D0 be families in R0 satisfying 1=2 1=2 J;level.s/ k t Xs k2 ;level.t/
for all b D .t; s/ 2 LJ :
(4.53)
123
4.6 Matrix approximation
Then we have kX k2 Csp
p X
;`
pJ 1=2 X
`D0
1=2 J;`
:
`D0
Proof. Let v 2 R and u 2 RJ . Due to the triangle inequality, estimate (4.53), and the Cauchy–Schwarz inequality, we have X jh t v; t Xs s ui2 j jhv;Xui2 j bD.t;s/2LJ
X
k t Xs k2 k t vk2 ks uk2
bD.t;s/2LJ
X
1=2 1=2 ;level.t/ k t vk2 J;level.s/ ks uk2
bD.t;s/2LJ
X
;level.t/ k t vk22
1=2
bD.t;s/2LJ
X
J;level.s/ ks uk22
1=2 : (4.54)
bD.t;s/2LJ
Let us take a look at the first factor. Using sparsity and Corollary 3.10, we find X X ;level.t/ k t vk22 Csp ;level.t/ k t vk22 t2T
bD.t;s/2LJ
X p
D Csp
D Csp
;`
X
k t vk22 Csp
`D0
t2T`
p X
;` kvk22 :
p X
;` kvk22
`D0
`D0
By the same reasoning, the second factor in (4.54) can be bounded by X
J;level.s/ ks uk22
Csp
bD.t;s/2LJ
pJ X
J;` kuk22 ;
`D0
combining both bounds yields jhv; Xui2 j Csp
p X `D0
;`
pJ 1=2 X
1=2 J;`
kvk2 kuk2 ;
`D0
and we can complete the proof by setting v D Xu. Let us now consider the application of these estimates to the kernel function. Combining Lemma 4.44 with (4.51) yields the following result:
124
4 Application to integral operators
Lemma 4.48 (Factorized error estimate). Let b D .t; s/ 2 LC J satisfy the strong admissibility condition (4.49). Let the families . i /i2 and . j /j 2J of supports be Cov -overlapping. Let the approximation gQ t;s satisfy the error estimate (4.51). Then we have 1=2 1=2 j t j j s j n k t Gs V t Sb Ws k2 Cov C CJ Cg; q : diam.Q t / diam.Qs / (4.55) Proof. Combining Lemma 4.44 with (4.51) yields k t Gs V t Sb Ws k2
X 1=2 X 1=2 C CJ Cg; q n j j j j : i j diam.Q t /=2 diam.Qs /=2 i2tO
Due to Lemma 4.37, we have X j i j Cov j t j; i2tO
X
j 2Os
j j j Cov j s j;
j 2Os
so combining both estimates concludes the proof. This lemma provides us with error bounds matching the requirements of Theorem 4.47 perfectly. Theorem 4.49 (Spectral approximation error). Let C ; CJ 2 R>0 be given as in (4.48). Let the families . i /i2 and . j /j 2J of supports be Cov -overlapping. Let the block cluster tree TJ be Csp -sparse and let all of its admissible leaves satisfy the strong admissibility condition (4.49). Let p and pJ be the depths of T and TJ , respectively. Let ² ³ j t j .`/ ;` ´ max W t 2 T for all ` 2 f0; : : : ; p g; diam.Q t / ² ³ j s j J;` ´ max W s 2 TJ.`/ for all ` 2 f0; : : : ; pJ g: diam.Qs / Then we have z 2 Csp Cg; Cov C CJ q n kG Gk
p X
;`
pJ 1=2 X
`D0
1=2 J;`
:
(4.56)
`D0
Proof. We combine Lemma 4.48 with Theorem 4.47. Remark 4.50 (Asymptotic behaviour of the error). As in Remark 4.42, let us assume that is a d -dimensional subset or submanifold and that the discretization is based on a quasi-uniform grid hierarchy with a decreasing sequence h0 ; h1 ; : : : of mesh
4.7 Variable-order approximation
125
parameters. We again neglect the scaling of the basis functions .'i /i2 and . j /j 2J captured by the constants C and CJ . For piecewise smooth geometries, we can expect the diameters diam. t / of cluster supports and diam.Q t / of the corresponding bounding boxes to be approximately proportional, and we can expect that j t j behaves like diam. t /d . Under these assumptions, we find ;` maxfdiam. t /d W t 2 T.`/ g J;` maxfdiam. s /
d
W s2
TJ.`/ g
for all ` 2 f0; : : : ; p g; for all ` 2 f0; : : : ; pJ g
and have to distinguish between three different cases: • If d > holds, i.e., if the singularity of the kernel function is weak, the sums appearing in the estimate (4.56) will be dominated by the large clusters. Since the large clusters will remain essentially unchanged when refining the grid, the sums can be bounded by a constant, and we can conclude that the spectral error will behave like q n on all levels of the grid. • If d D holds, all terms in the sums appearing in (4.56) can be individually bounded by a constant. Therefore we can expect that the error is approximately proportional to the depth of the cluster tree, i.e., that it will behave like j log h` jq n and grow very slowly when the grid is refined. • If d < holds, i.e., if the kernel function is strongly singular, the sums appearing in (4.56) will be dominated by the small clusters, and therefore the spectral error will behave like h`d q n . In all three cases, the spectral error estimate is better than the Frobenius error estimate given in Remark 4.42 as h qn. `
4.7 Variable-order approximation Until now, we have assumed that the order of the Taylor expansion or the interpolation scheme is constant for all clusters. We have seen in Lemma 4.48 that the size of the support of a cluster plays a major role in determining the spectral approximation error, and we can now take advantage of this observation in order to construct approximation schemes that lead to better error estimates than the ones provided by Theorem 4.49. The fundamental idea is to use different approximation orders for different cluster sizes. It was introduced in [70] and analyzed for Taylor expansions and weakly singular kernel functions in [90], [91]. A refined analysis of this approach was presented in [101], [100]. By using interpolation instead of Taylor expansions, the convergence results can be significantly improved [23].
126
4 Application to integral operators
The restriction to weakly singular kernel functions can sometimes be overcome by using suitable globally-defined antiderivatives of the kernel function [25], but since this approach does not fit the concept of H 2 -matrices introduced here and since its error analysis requires fairly advanced tools which cannot be introduced in the context of this book, we do not discuss it further.
Motivation We assume that the kernel function is only weakly singular, i.e., that the order of the singularity is smaller than the dimension d of the subset or submanifold . For this case, Remark 4.50 states that the spectral error will be dominated by the error arising in the large clusters, while the blockwise error in leaf clusters will behave like h`d , i.e., it will decrease as the meshwidth h` decreases. We would like to ensure the same favourable convergence behaviour for all clusters, z In order to do this, we reconsider Lemma 4.48: let and thus for the entire matrix G. C b D .t; s/ 2 LJ be an admissible block. If we can ensure kg gQ t;s k1;Q t Qs .
q nt diam.Q t /
1=2
q ns diam.Qs /
1=2
instead of (4.51), where .n t / t2T and .ns /s2TJ are suitably-chosen families of parameters, the estimate (4.55) takes the form nt ns q j t j 1=2 q j s j 1=2 k t Gs V t Sb Ws k2 . ; (4.57) diam.Q t / diam.Qs / and proceeding as in Theorem 4.49 yields z 2. kG Gk
p X
;`
pJ 1=2 X
`D0 p
1=2 J;`
`D0
p
J for the families . ;` /`D0 and . J;` /`D0 given by ² nt ³ q j t j .`/ ;` ´ max W t 2 T diam.Q t / ² ns ³ q j s j .`/ J;` ´ max W s 2 TJ diam.Qs /
for all ` 2 f0; : : : ; p g; for all ` 2 f0; : : : ; pJ g:
This error estimate differs significantly from the one provided by Theorem 4.49: instead of working with the same interpolation order for all clusters, we can use a different order for each individual cluster. This variable-order approach [91], [90] allows us to compensate the growth of the cluster supports j t j by increasing the order of interpolation.
4.7 Variable-order approximation
127
We assume that the measure of the support of a cluster can be bounded by the diameter of the corresponding bounding box, i.e., that there is a constant Ccu 2 R>0 satisfying j t j Ccu diam.Q t /d
for all t 2 T ;
(4.58a)
d
for all s 2 TJ :
(4.58b)
j s j Ccu diam.Qs /
The size of the constant Ccu is determined by the geometry: if is a subset of Rd , we always have Ccu 1. If is a submanifold, the size of Ccu depends on how “tightly folded” is, e.g., on the curvature of the manifold (cf. Figure 4.3). Qt
Qt
t
t
Ccu D
p 2
Ccu > 10
Figure 4.3. Influence of the geometry of on the constant Ccu .
We also have to assume that the diameters of bounding boxes of leaf clusters are approximately proportional to the mesh parameter h and that they do not grow too rapidly as we proceed from the leaves of the cluster trees towards the root, i.e., that there are constants Cgr 2 R>0 and 2 R1 satisfying diam.Q t / Cgr h p level.t/
for all t 2 T ;
(4.59a)
diam.Qs / Cgr h pJ level.s/
for all s 2 TJ :
(4.59b)
Based on the assumptions (4.58) and (4.59), we find ³ ² nt q j t j .`/ ;` D max W t 2 T diam.Q t / Ccu maxfq n t diam.Q t /d W t 2 T.`/ g Ccu Cgrd hd maxfq n t . d /p ` W t 2 T.`/ g: For arbitrary parameters ˛; ˇ 2 N0 , we can choose the interpolation orders high enough to ensure n t ˛ C ˇ.p level.t // for all t 2 T . Then the error estimates takes the form ;` Ccu Cgrd hd q ˛ .q ˇ d /p ` :
128
4 Application to integral operators
For any given 20; 1Œ, we can let ˇ and observe
log .d / log log q q ˇ d ;
which implies the bound ;` Ccu Cgrd hd q ˛ p ` : This estimate allows us to bound the sum over all levels ` by a geometric sum, and bounding the sum by its limit yields p X
;` Ccu Cgrd hd q ˛
`D0
Ccu Cgrd hd q ˛
p X `D0 1 X
p ` D Ccu Cgrd hd q ˛
p X
`
`D0
` D Ccu Cgrd hd q ˛
`D0
1 : 1
p
J Applying the same reasoning to the family . J;` /`D0 yields
z 2 . hd kG Gk
q˛ : 1
(4.60)
Since we have assumed < d , this estimate implies that the approximation error will decrease like hd if the grid is refined. The additional parameter ˛ can be used to ensure that the approximation error is sufficiently small. The major advantage of this variable-order approximation scheme is that the order of the interpolation is bounded in the leaf clusters and grows only slowly if we proceed towards the root of the cluster tree. This means that the leaf clusters, and these dominate in the complexity estimates, are handled very efficiently. A detailed complexity analysis shows that the resulting rank distribution is bounded, i.e., the computational and storage complexity is optimal.
Re-interpolation scheme We have seen that we have to ensure n t ˛ C ˇ.p level.t//
for all t 2 T
if we want to reach the desirable error estimate (4.60). If we restrict our analysis to the case of isotropic interpolation, this can be achieved by using the order vectors m t; ´ ˛ C ˇ.p level.t //
for all t 2 T ; 2 f1; : : : ; d g:
(4.61)
4.7 Variable-order approximation
129
Since larger clusters now use a higher approximation order than smaller clusters, we have to expect that I t 0 ŒL t; ¤ L t; will hold for certain clusters t 2 T with t 0 2 sons.t / and certain 2 K t : if the order in the cluster t 0 is lower than the order used in t , not all Lagrange polynomials used for t can be represented in the basis used for t 0 . According to (4.20a), this can lead to X Vt ¤ Vt 0 Et 0 ; t 0 2sons.t/
i.e., the cluster basis V D .V t / t2T will in general not be nested. Losing the nested structure of the cluster basis is not acceptable, since it would lead to a less efficient representation of the matrix approximation. As in [23], [22], we fix this by constructing a nested cluster basis based on the – no longer nested – cluster basis V : we define Vz D .Vzt / t2T by ´ if sons.t / D ;; Vt for all t 2 T : Vzt ´ P z 0 0 otherwise t 0 2sons.t/ V t E t This cluster basis is obviously nested and uses the same rank distribution K D .K t / t2T as V . The same reasoning applies to the cluster basis W D .Ws /s2TJ , and we introduce z D .W zs /s2T by a similarly modified nested clusterbasis W J ´ if sons.s/ D ;; s zs ´ W for all s 2 TJ : W P z otherwise s 0 2sons.s/ Ws 0 Fs 0 z is the same as for W . Instead of approximating an admisThe rank distribution for W z z sible block b D .t; s/ 2 LC J by V t Sb Ws , we now approximate it by V t Sb Ws . z require no Remark 4.51 (Implementation). The algorithms for constructing Vz and W modifications compared to the standard case, we only have to ensure that the correct interpolation orders and Lagrange polynomials are used. Lemma 4.52 (Complexity). According to Lemma 4.10, we have #K t D .˛ C ˇ.p level.t///d
for all t 2 T :
Let T be quasi-balanced, i.e., let there be constants Cba 2 N and 2 R>1 satisfying #ft 2 T W level.t / D `g Cba `p c
for all ` 2 N0 :
Then the rank distribution K D .K t / t2T is .Cbn ; ˛; ˇ; d; /-bounded with Cbn ´ Cba
:
1
(4.62)
130
4 Application to integral operators
Proof. Let ` 2 N and R` ´ ft 2 T W #K t > .˛ C ˇ.` 1//d g: For a cluster t 2 R` , we have .˛ C ˇ.` 1//d < #K t D .˛ C ˇ.p level.t ///d ; ˛ C ˇ.` 1/ < ˛ C ˇ.p level.t //; ` 1 < p level.t /; level.t / < p ` C 1; level.t / p `: Since T is quasi-balanced, the estimate (4.62) yields p `
R`
[
ft 2 T W level.t / D ng;
nD0 p `
#R`
X
p `
#ft 2 T W level.t / D ng
nD0
Cba np c
nD0
X p
D Cba c
X
n D Cba c `
p `
X
n < Cba c `
nD0
nD`
1 D Cbn ` c 1 1=
with Cbn D Cba =. 1/. In order to analyze the approximation error corresponding to the new cluster bases z , we have to relate them to suitable approximation schemes for the kernel Vz and W function. Let t 2 T and i 2 tO. If sons.t / D ; holds, we have Z .V t /i D L t; .x/'i .x/ dx for all 2 K t
by definition. Otherwise, we let t 0 2 sons.t / with i 2 tO0 . If sons.t 0 / D ; holds, we have Z .V t 0 /i D L t 0 ; .x/'i .x/ dx for all 2 K t 0 ;
and since the transfer matrix E t 0 is given by .E t 0 / D L t; .x t 0 ; /
for all 2 K t ; 2 K t 0 ;
the definition of Vzt implies .Vzt /i D Vzt 0 E t 0 D V t 0 E t 0 D
X 2K t 0
Z L t; .x t 0 ; /
L t 0 ; .x/'i .x/ dx
4.7 Variable-order approximation
131
Z I t 0 ŒL t; .x/'i .x/ dx
D
for all 2 K t ;
i.e., the Lagrange polynomial L t; corresponding to t is replaced by its lower-order interpolant I t 0 ŒL t; . If t 0 is not a leaf, we proceed by recursively applying interpolation operators until we reach a leaf cluster containing i . The resulting nested interpolation operator is defined as follows: Definition 4.53 (Re-interpolation). For all t 2 T and all r 2 sons .t /, we define the re-interpolation operator Ir;t by ´ Ir;t 0 I t if there is a t 0 2 sons.t / with r 2 sons .t 0 /; Ir;t ´ It otherwise, i.e., if t D r: Due to Vzt D V t for t 2 L , we can derive interpolation operators corresponding to the cluster basis Vz inductively starting from the leaves of T . We collect all leaf clusters influencing a cluster in the families .L t / t2T and .Ls /s2TJ of sets given by L t ´ fr 2 L W r 2 sons .t /g Ls ´ fr 2 LJ W r 2 sons .s/g
for all t 2 T ; for all s 2 TJ :
and can express the connection between re-interpolation operators and the new cluster basis Vz by a simple equation: Lemma 4.54 (Re-interpolation). Let t 2 T , and let r 2 L t . Then we have Z z Ir;t ŒL t; .x/'i .x/ dx for all i 2 r; O 2 Kt : .V t /i D
(4.63)
Proof. By induction on level.r/ level.t /. Let t 2 T and r 2 L t with level.r/ level.t/ D 0. We have t D r 2 L , i.e., Vzt D V t by definition. According to X X L t; .x t; /L t; D ı L t; D L t; ; I t ŒL t; D 2K t
we find
2K t
Z .Vzt /i D .V t /i D Z
L t; .x/'i .x/ dx I t ŒL t; .x/'i .x/ dx
D
for all i 2 tO; 2 K t :
Let now n 2 N0 be such that (4.63) holds for all t 2 T and r 2 L t with level.r/ level.t/ D n. Let t 2 T and r 2 L t with level.r/ level.t / D n C 1. Due to the
132
4 Application to integral operators
definition of sons .t /, we can find t 0 2 sons.t / with r 2 sons .t 0 /. The definition of Vzt yields X .Vzt 0 /i .E t 0 / .Vzt /i D .Vzt 0 E t 0 /i D 2K t 0
X
D
.Vzt 0 /i L t; .x t 0 ; /
for all i 2 r; O 2 Kt :
2K t 0
Since level.r/ level.t 0 / D level.r/ level.t / 1 D n holds, we can apply the induction assumption and get Z X z 0 .V t /i D L t; .x t ; / Ir;t 0 ŒL t 0 ; .x/'i .x/ dx 2K t 0
Z
Ir;t 0
D
h X
i L t; .x t 0 ; /L t 0 ; .x/'i .x/ dx
2K t 0
Z
Ir;t 0 ŒI t 0 ŒL t; .x/'i .x/ dx
D Z
Ir;t ŒL t; .x/'i .x/ dx
D
for all i 2 r; O 2 Kt ;
which concludes the induction. zs Using this lemma, we can find a representation for subblocks of the matrix Vzt Sb W used to approximate a matrix block: Lemma 4.55 (Re-interpolated kernel). Let b D .t; s/ 2 LC 2 L t and J . Let t s 2 Ls . We have Z Z zs /ij D 'i .x/ .I t ;t ˝ Is ;s /Œg.x; y/'j .y/ dy dx .Vzt Sb W
O
for all i 2 t , j 2 sO . Proof. Let i 2 tO and j 2 sO . Lemma 4.54 and the definition of Sb (cf. 4.20c) yield X X zs /ij D zs /j .Vzt /i .Sb / .W .Vzt Sb W 2K t 2Ls
D
X X Z 2K t 2Ls
Z
D
Z
'i .x/
Z
I t ;t ŒL t; .x/'i .x/
Is ;s ŒLs; .y/
j .y/g.x t; ; xs; / dy
.I t ;t ˝ Is ;s / h X X i g.x t; ; xs; /L t; ˝ Ls; .x; y/
2K t 2Ls
j .y/ dy
dx
dx
4.7 Variable-order approximation
Z
Z 'i .x/
D Z D
Z
.I t ;t ˝ Is ;s /ŒI t ˝ Is Œg.x; y/ .I t ;t ˝ Is ;s /Œg
'i .x/
133
j .y/ dy
j .y/ dy
dx
dx:
In the last step, we have used the fact that I t and Is are projections. Using this representation of the matrix approximation, we can generalize the error estimate of Lemma 4.38 to the case of variable-order approximation: Lemma 4.56 (Blockwise error). Let b D .t; s/ 2 LC J , and let b 2 R>0 satisfy for all t 2 L t ; s 2 Ls :
kg .I t ;t ˝ Is ;s /Œgk1;Q t Qs b Let C and CJ satisfy (4.48). Then we have
zs k2 k t Gs Vzt Sb W zs kF k t Gs Vzt Sb W X 1=2 X 1=2 C CJ b j i j j j j : i2tO
j 2Os
zs . For all u 2 RJ , we have Proof. Let X ´ t Gs Vzt Sb W 2 X X X XX kXuk22 D Xij uj Xij2 uj2 D kX k2F kuk22 i2
j 2J
i2
j 2J
j 2J
and can conclude kX k2 kX kF . Corollary 3.9 and Lemma 3.8 yield that ftO W t 2 L t g and fOs W s 2 Ls g are disjoint partitions of tO and sO , respectively, and we find X X X X XX kXk2F D k t Xs k2F D Xij2 ; t 2L t s 2Ls
t 2L t s 2Ls i2tO j 2Os
and we can apply Lemma 4.55 to each t 2 sons .t / \ L and s 2 sons .s/ \ LJ in order to find zs /ij j jXij j D jGij .Vzt Sb W ˇZ ˇ Z ˇ ˇ ˇ D ˇ 'i .x/ .g.x; y/ .I t ;t ˝ Is ;s /Œg.x; y// j .y/ dy dx ˇˇ Z Z b j'i .x/j dx j j .y/j dy b j i j1=2 k'i kL2 j j j1=2 k j kL2
b C CJ j i j
1=2
j j j1=2 :
Due to this estimate, we can conclude X X XX XX j i j j j j D b2 C2 CJ2 j i j j j j; kX k2F b2 C2 CJ2 t 2L t s 2Ls i2tO j 2Os
which is the desired result.
i2tO j 2Os
134
4 Application to integral operators
Figure 4.4. Re-interpolation of a Lagrange polynomial.
Assuming that the overlap of the supports is bounded, we can simplify this result: Corollary 4.57 (Blockwise spectral error). Let b D .t; s/ 2 LC J , and let b 2 R>0 satisfy kg .I t ;t ˝ Is ;s /Œgk1;Q t Qs b
for all t 2 L t ; s 2 Ls :
Let C and CJ satisfy (4.48), and let the families . i /i2 and . j /j 2J be Cov overlapping. Then we have zs k2 Cov C CJ b j t j1=2 j s j1=2 : k t Gs Vzt Sb W Proof. Combine Lemma 4.56 with Lemma 4.37.
Error analysis in the one-dimensional case Before we can analyze the properties of the re-interpolation operators I t ;t in d dimensional space, we have to investigate the one-dimensional case. Let ˛ 2 N and ˇ 2 N0 . Let p 2 N, and let .J` /p`D0 be a family of non-trivial intervals J` D Œa` ; b` satisfying J`C1 J`
for all ` 2 f0; : : : ; p 1g:
For each ` 2 f0; : : : ; pg, we introduce the interpolation operator J
` ´ I˛Cˇ.p`/ : IJ;˛;ˇ `
The one-dimensional re-interpolation operator IJ;˛;ˇ is given recursively by ` ;` ´ IJ;˛;ˇ if ` < ` ; IJ;˛;ˇ ` ;`C1 ` for all `; ` 2 f0; : : : ; pg with ` `: ´ IJ;˛;ˇ ` ;` otherwise IJ;˛;ˇ ` (4.64)
4.7 Variable-order approximation
135
We can see that this definition implies IJ;˛;ˇ D IJ;˛;ˇ IJ;˛;ˇ : : : IJ;˛;ˇ IJ;˛;ˇ ` ;` ` ` 1 `C1 `
for all `; ` 2 f0; : : : ; pg with ` `: (4.65)
In order to bound the interpolation error by a best-approximation estimate, we have to establish the stability of IJ;˛;ˇ . The simple approach ` ;` kIJ;˛;ˇ Œf k1;Œa` ;b` ƒ˛Cˇ.p` / kIJ;˛;ˇ Œf k1;Œa` ;b`
` ;` ` 1;` ƒ˛Cˇ.p` / kIJ;˛;ˇ Œf k1;Œa` 1 ;b` 1
` 1;` ::: ƒ˛Cˇ.p` / ƒ˛Cˇ.p` C1/ : : : ƒ˛Cˇ.p`/ kf k1;Œa` ;b`
suggested by (4.65) will not yield a sufficiently good bound, since the resulting “stability constant” will grow too rapidly if ` ` grows. An improved estimate can be derived if we require that the intervals shrink sufficiently fast: Definition 4.58 ( -regular intervals). Let 20; 1Œ. If jJ`C1 j jJ` j
holds for all ` 2 f0; : : : ; p 1g;
we call the family J D .J` //p`D0 -regular. In the -regular case, we can exploit the fact that all but the rightmost interpolation operators in (4.65) are applied to polynomials and not to general functions: the growth of polynomials can be controlled, and applying Lemma 4.14 yields the following improved stability estimate: Theorem 4.59 (Stability of re-interpolation). Let the family J D .J` /p`D0 be -regular. Let the interpolation scheme .Im /1 mD1 be .ƒ; /-stable. Then there is a constant Cre 2 R1 depending only on , ƒ, and ˇ which satisfies kIJ;˛;ˇ Œf k1;J` Cre ƒ˛Cˇ.p`/ kf k1;J` ` ;` for all `; ` 2 f0; : : : ; pg with ` ` and all f 2 C.J` /. Proof. Cf. Theorem 3.11 in [23]. Using this inequality, we can derive an error estimate for the re-interpolation operator:
136
4 Application to integral operators
Lemma 4.60 (Re-interpolation error). Let the family J D .J` /p`D0 be -regular. Let the interpolation scheme be .ƒ; /-stable. Let Cre 2 R1 be the constant introduced in Theorem 4.59. Let `; ` 2 f0; : : : ; pg with ` ` . We have
kf
IJ;˛;ˇ Œf ` ;`
k1;J` Cre ƒ
` X
.˛ C ˇ.p n/ C 1/ kf IJ;˛;ˇ Œf k1;Jn (4.66) n
nD`
for all f 2 C.J` /. Proof. Let f 2 C.J` /. We define ´ I if n D ` C 1; Pn ´ ˛;ˇ I` ;n otherwise
for all n 2 f`; : : : ; ` C 1g:
, and Theorem 4.59 yields According to (4.64), we have Pn D PnC1 IJ;˛;ˇ n kPn Œgk1;J` Cre ƒ˛Cˇ.pn/ kgk1;Jn
for all g 2 C.Jn /; n 2 f`; : : : ; ` g:
We can use the .ƒ; /-stability of the interpolation scheme in order to prove ´ Cre ƒ.˛ C ˇ.p .n C 1// C 1/ if n < ` ; kPnC1 Œgk1;J` kgk1;Jn 1 otherwise Cre ƒ.˛ C ˇ.p n/ C 1/ kgk1;Jn for all g 2 C.Jn / and all n 2 f`; : : : ; ` g. Using a telescoping sum, we get kf
IJ;˛;ˇ Œf ` ;`
k1;J`
` X D PnC1 Œf Pn Œf
1;J`
nD`
` X D PnC1 f IJ;˛;ˇ Œf n
1;J`
nD` ` X Œf PnC1 f IJ;˛;ˇ n nD`
1;J`
Cre ƒ
` X
.˛ C ˇ.p n/ C 1/ kf IJ;˛;ˇ Œf k1;Jn ; n
nD`
and this is the required estimate. We can use Theorem 4.16 in order to bound the terms in the sum (4.66): if we again assume jf
./
Cf .x/j f 1
for all x 2 Œa` ; b` ; 2 N0 ;
4.7 Variable-order approximation
137
we observe Œf k1;Jn 2eCf .ƒ˛Cˇ.pn/ C 1/.˛ C ˇ.p n/ C 1/ kf IJ;˛;ˇ n 2f ˛ˇ.pn/ jJn j 1C % f jJn j for all n 2 f`; : : : ; ` g. Due to -regularity, we have jJn j n` jJ` j
for all n 2 f`; : : : ; ` g;
and Lemma 4.78 yields that there is a O 20; 1Œ satisfying ` 2f 2f 2f 1 % % % ` : jJn j jJ` j
jJ` j
O We can use the additional factor O ` in order to bound the sum (4.66): Theorem 4.61 (Re-interpolation error). Let the family .J` /p`D0 be -regular. Let the interpolation scheme be .ƒ; /-stable. Let a rank parameter ˇ 2 N0 and a lower bound r0 2 R>0 for the convergence radius be given. There are constants Cri 2 R1 , ˛0 2 N such that 1 2C kf IJ;˛;ˇ 1 C %.r0 /˛ˇ.p`/ Œf k C C .˛ C ˇ.p `/ C 1/ 1;J` f ri ` ;` r0 holds for all `; ` 2 f0; : : : ; pg with ` `, all ˛ ˛0 and all f 2 C 1 .J` / satisfying jf
./
Cf .x/j f 1
for all x 2 J` ; 2 N0
(4.67)
for constants Cf 2 R0 , f 2 R>0 and 2 N with 2f =jJ` j r0 . Proof. Let Cre 2 R1 be the constant introduced in Theorem 4.59. According to Lemma 4.78, there is a constant O 20; 1Œ satisfying 1 r %.r/ for all r 2 Rr0 : %
O Let w 20; 1Œ. Due to O < 1, we can find ˛0 2 N such that
O ˛0 w%.r0 /ˇ holds. We define Cri ´ 2eƒ2 Cre =.1 w/. Let now `; ` 2 f0; : : : ; pg with ` `, ˛ 2 N˛0 , and let f 2 C 1 .J` / be a function satisfying (4.67) for constants Cf 2 R0 , f 2 R>0 and 2 N with 2f =jJ` j r0 . According to Theorem 4.16, we have
138
4 Application to integral operators
kf IJ;˛;ˇ Œf k1;Jn n
2f ˛ˇ.pn/ jJn j % 2eCf .ƒ˛Cˇ.pn/ C 1/.˛ C ˇ.p n/ C 1/ 1 C 2f jJn j ˛ˇ.pn/ 2f 1 2eƒCf .˛ C ˇ.p n/ C 1/C 1 C % n` r0
jJ` j 2f ˛ˇ.pn/ 1 O .n`/.˛Cˇ.pn// C 2eƒCf .˛ C ˇ.p n/ C 1/ % 1C
r0 jJ` j 1 2eƒCf .˛ C ˇ.p `/ C 1/C 1 C
O ˛0 .n`/ %.r0 /˛ˇ.pn/ r0 1 C 1C w n` %.r0 /ˇ.n`/ %.r0 /˛ˇ.pn/ 2eƒCf .˛ C ˇ.p `/ C 1/ r0 1 w n` %.r0 /˛ˇ.p`/ D 2eƒCf .˛ C ˇ.p `/ C 1/C 1 C r0
for all n 2 f`; : : : ; ` g. We can combine this estimate with Lemma 4.60 in order to find Œf k1;J` kf IJ;˛;ˇ ` ;`
Cre ƒ
` X
.˛ C ˇ.p n// kf IJ;˛;ˇ Œf k1;Jn n
nD`
` X 1 ˛ˇ.p`/ 1C %.r0 / Cre 2eƒ Cf .˛ C ˇ.p `/ C 1/ w n` r0 nD` 1 Cre 2eƒ2 %.r0 /˛ˇ.p`/ Cf .˛ C ˇ.p `/ C 1/2C 1 C 1w r0 1 2C 1C %.r0 /˛ˇ.p`/ ; D Cf Cri .˛ C ˇ.p `/ C 1/ r0 2
2C
which is the desired result.
Error analysis in the multi-dimensional case Now we can turn our attention to the multi-dimensional case. Let p 2 N0 . Let .Q` /p`D0 be a family of non-trivial axis-parallel boxes Q` D Œa`;1 ; b`;1 Œa`;d ; b`;d satisfying Q`C1 Q`
for all ` 2 f0; : : : ; p 1g:
139
4.7 Variable-order approximation
We let m` ´ ˛ C ˇ.p `/ for all ` 2 f0; : : : ; pg and introduce a tensor-product interpolation operator ` IQ;˛;ˇ ´ IQ m` `
for all ` 2 f0; : : : ; pg:
(4.68)
The multi-dimensional re-interpolation operators are given by ´ IQ;˛;ˇ IQ;˛;ˇ if ` < ` ; Q;˛;ˇ ` ;`C1 ` I` ;` ´ for all `; ` 2 f0; : : : ; pg with ` `: otherwise IQ;˛;ˇ ` (4.69) For each 2 f1; : : : ; d g, the family Q D .Q` /p`D0 gives rise to a family J ´ .J;` /p`D0 of non-trivial intervals J;` ´ Œa`; ; b`; satisfying J;`C1 J;`
for all 2 f1; : : : ; d g; ` 2 f0; : : : ; p 1g;
and the equation Q` D J1;` Jd;` implies that IQ;˛;ˇ D IJ` 1 ;˛;ˇ ˝ ˝ I` d `
J ;˛;ˇ
holds for all ` 2 f0; : : : ; pg:
Using the definition of the re-interpolation operators yields ;˛;ˇ IQ;˛;ˇ D IJ`1;` ˝ ˝ I`d;` ` ;`
J ;˛;ˇ
for all `; ` 2 f0; : : : ; pg with ` ` ;
i.e., we have found a tensor-product representation for the multi-dimensional re-interpolation operator. We proceed as in Section 4.4: the operators IQ;˛;ˇ are characterized by ` ;` directional re-interpolation operators, which can be analyzed by using one-dimensional results. For all 2 f1; : : : ; d g and all `; ` 2 f0; : : : ; pg with ` ` , we let IQ;˛;ˇ ´ I ˝ ˝ I ˝I`J ;˛;ˇ ˝ I ˝ ˝ I ;` ` ;`; „ ƒ‚ … „ ƒ‚ … d times
1 times
and observe IQ;˛;ˇ D ` ;`
d Y
IQ;˛;ˇ ` ;`;
for all `; ` 2 f0; : : : ; pg with ` ` :
D1
The directional operators can again be expressed as polynomials in one variable: for
2 f1; : : : ; d g, `; ` 2 f0; : : : ; pg with ` ` , a function f 2 C.Q` / and all points x 2 Q` , we have IQ;˛;ˇ Œf .x/ D ` ;`;
m` X D1
J
J
f .x1 ; : : : ; x1 ; m;` ; xC1 ; : : : ; xd /I`J ;˛;ˇ ŒLm;` .x /: ` ; ` ; ;`
140
4 Application to integral operators
Now we can turn our attention to the problem of establishing stability and approximation error estimates for the directional operators. For the one-dimensional re-interpolation operators, the -regularity of the sequence of intervals is of central importance, therefore it is straightforward to introduce its multi-dimensional counterpart: Definition 4.62 ( -regular boxes). Let 20; 1Œ. The family Q D .Q` /p`D0 of boxes is called -regular if we have jJ;`C1 j jJ;` j
for all ` 2 f0; : : : ; p 1g; 2 f1; : : : ; d g;
i.e., if all the families J are -regular, where Q` D J1;` Jd;`
for all ` 2 f0; : : : ; pg:
If the bounding boxes are -regular, we can generalize the Lemmas 4.18 and 4.19: Lemma 4.63 (Stability of directional re-interpolation). Let Q D .Q` /p`D0 be -regular. Let the interpolation scheme be .ƒ; /-stable. Then there is a constant Cre 2 R1 depending only on , ƒ, and ˇ which satisfies Œf k1;Q` Cre ƒ˛Cˇ.p`/ kf k1;Q` kIQ;˛;ˇ ` ;`; for all 2 f1; : : : ; d g, f 2 C.Q` / and `; ` 2 f0; : : : ; pg with ` ` . Proof. Combine the technique used in the proof of Lemma 4.18 with the result of Theorem 4.59. Lemma 4.64 (Directional re-interpolation error). Let Q D .Q` /p`D0 be a family of
-regular boxes. Let the interpolation scheme be .ƒ; /-stable. Let a rank parameter ˇ 2 N0 and a lower bound r0 2 R>0 for the convergence radius be given. There are constants Cri 2 R1 , ˛0 2 N such that 1 2C kf IQ;˛;ˇ Œf k C C .˛ C ˇ.p `/ C 1/ 1 C %.r0 /˛ˇ.p`/ 1;Q` f ri ` ;`; r0 holds for all 2 f1; : : : ; d g, all `; ` 2 f0; : : : ; pg with ` ` , all ˛ 2 N˛0 and all f 2 C 1 .Q` / satisfying k@ f k1;Q`
Cf f 1
for all 2 N0
for constants Cf 2 R0 , f 2 R>0 and 2 N with 2f =jJ;` j r0 . Proof. Proceed as in the proof of Lemma 4.19 and use Theorem 4.61.
4.7 Variable-order approximation
141
Combining the Lemmas 4.63 and 4.64 yields the following multi-dimensional error estimate for the re-interpolation operator: Theorem 4.65 (Multi-dimensional re-interpolation error). Let Q D .Q` /p`D0 be regular. Let the interpolation scheme be .ƒ; /-stable. Let a rank parameter ˇ 2 N0 and a lower bound r0 2 R>0 for the convergence radius be given. There are constants Cmr 2 R1 , ˛0 2 N such that kf IQ;˛;ˇ Œf k1;Q` ` ;`
1 Cf Cmr .˛ C ˇ.p `/ C 1/.d C1/C 1 C %.r0 /˛ˇ.p`/ r0
holds for all `; ` 2 f0; : : : ; pg with ` ` , all ˛ 2 N˛0 and all f 2 C 1 .Q` / satisfying k@ f k1;Q`
Cf f 1
for all 2 N0 ; 2 f1; : : : ; d g
and 2f =jJ;` j r0 for all 2 f1; : : : ; d g for constants Cf 2 R0 , f 2 R>0 and 2 N. Proof. As in the proof of Theorem 4.20, replacing the Lemmas 4.18 and 4.19 by the Lemmas 4.63 and 4.64, and using Cmr ´ Cri Cred 1 .
Application to the kernel function Before we can apply Theorem 4.65, we have to ensure that all relevant sequences of bounding boxes are -regular. Definition 4.66 ( -regular bounding boxes). Let 20; 1Œ. Let .Q t / t2T be a family of bounding boxes for a cluster tree T . It is called -regular if all boxes have to form Q t D J t;1 J t;d for suitable non-trivial intervals J t;1 ; : : : ; J t;d and if J t 0 ; J t; ;
jJ t 0 ; j jJ t; j
holds for all t 2 T ; t 0 2 sons.t /; 2 f1; : : : ; d g:
Let .Q t / t2T and .Qs /s2TJ be families of bounding boxes for the cluster trees T and TJ . In order to be able to apply Theorem 4.65, we have to find a sequence of bounding boxes for each pair t 2 T , t 2 L t of clusters.
142
4 Application to integral operators
Lemma 4.67 (Sequence of ancestors). For each t 2 T and each t 2 L t , there is a sequence .t` /p`D0 with p D level.t / satisfying t0 D root.T /;
tlevel.t/ D t;
tp D t
and t`C1 2 sons.t` /
for all ` 2 f0; : : : ; p 1g:
Proof. By induction on level.t / 2 N0 . If level.t / D 0 holds, t0 D t D root.T / is the solution. Now let n 2 N0 be such that we can find the desired sequence for all t 2 T , t 2 L t with level.t / D n. Let t 2 T and t 2 L t with level.t / D n C 1. Due to Definition 3.6, there is a cluster t C 2 T with t 2 sons.t C / and level.t C / D n. We can apply the induction assumption to get a sequence t0 ; : : : ; tn of clusters with t0 D root.T /, tn D t C and t`C1 2 sons.t` / for all ` 2 f0; : : : ; n1g. We let p ´ nC1 and tp ´ t . Due to definition, we have t 2 sons .tlevel.t/ / and t 2 sons .t /, i.e., ; ¤ tO tOlevel.t/ \ tO, and since the levels of tlevel.t/ and t are identical, Lemma 3.8 yields t D tlevel.t/ , thus concluding the induction. We can use this result to construct sequences of bounding boxes corresponding to the re-interpolation operators I t ;t and Is ;s and get the following error estimate: Theorem 4.68 (Variable-order interpolation). Let .Q t / t2T and .Qs /s2TJ be -regular families of bounding boxes. Let the kernel function g be .Cas ; ; c0 /-asymptotically smooth with > 0. Let the interpolation scheme be .ƒ; /-stable. Let 2 R>0 . Let ˇ 2 N0 . There are constants Cvo 2 R1 and ˛0 2 N such that kg .I t ;t ˝ Is ;s /Œgk1;Q t Qs
Cvo .˛ C ˇ`C C 1/.2d C1/C 1 ˛ˇ ` % dist.Q t ; Qs / c0
with `C ´ maxfp level.t/; pJ level.s/g; ` ´ minfp level.t /; pJ level.s/g holds for all ˛ 2 N˛0 , all blocks b D .t; s/ 2 LC J satisfying the standard admissibility condition maxfjJ t; j; jJs; j W 2 f1; : : : ; d gg 2 dist.Q t ; Qs /
(4.70)
and all t 2 L t and s 2 Ls . Proof. Let ˛ 2 N˛0 , let b D .t; s/ 2 LC J satisfying (4.70), and let t 2 L t and s 2 Ls . In order to be able to apply Theorem 4.65, we have to find families of boxes Q D .Q` /p`D0 such that the corresponding re-interpolation operators IQ;˛;ˇ coincide ` ;` with I t ;t or Is ;s .
4.7 Variable-order approximation
143
Without loss of generality, we only consider I t ;t . Let ` t ´ level.t / and `t ´ `
t satisfying t` D t , level.t /. According to Lemma 4.67, there is a sequence .t` /`D0 t` D t and
t`C1 2 sons.t` /
for all ` 2 f0; : : : ; `t 1g:
`
t by Q` ´ Q t` . This family is -regular due to We define the family Q ´ .Q` /`D0 our assumptions, and comparing Definition 4.53 and (4.68), (4.69) yields t ;ˇ I`Q;˛ D I t ;t ;` t
t
with ˛ t ´ ˛ C ˇ.p `t / ˛:
The asymptotic smoothness of g implies that we can apply the re-interpolation error estimate provided by Theorem 4.65 in order to get kg .I t ;t ˝ I /Œgk1;Q t Qs
Cf Cmr .˛ t C ˇ.`t ` t //.d C1/C .1 C c0 / 1 ˛ t ˇ.` t ` t / % dist.Q t ; Qs / c0 Cf Cmr .˛ C ˇ.p ` t //.d C1/C .1 C c0 / 1 ˛ˇ.p ` t / % dist.Q t ; Qs / c0 ˛ˇ ` Cf Cmr .˛ C ˇ`C /.d C1/C .1 C c0 / 1 % : dist.Q t ; Qs / c0
Applying the same reasoning to Is ;s yields kg .I ˝ Is ;s /Œgk1;Q t Qs
Cf Cmr .˛ C ˇ`C /.d C1/C .1 C c0 / 1 ˛ˇ ` % : dist.Q t ; Qs / c0
Due to the stability estimate of Theorem 4.59, we have k.I t ;t ˝ I /Œf k1;Q t Qs Cred ƒd .˛ C ˇ`C /d kf kQ t Qs for all f 2 C.Q t Qs /, and we can conclude kg .I t ;t ˝ Is ;s /Œgk1;Q t Qs kg .I t ;t ˝ I /Œgk1;Q t Qs C k.I t ;t ˝ I /Œg .I ˝ Is ;s /Œgk1;Q t Qs kg .I t ;t ˝ I /Œgk1;Q t Qs C Cred ƒd .˛ C ˇ`C /d kg .I ˝ Is ;s /Œgk1;Q t Qs Cf Cmr Cred ƒd .˛ C ˇ`C C 1/.2d C1/C .1 C c0 / 1 ˛ˇ ` % : dist.Q t ; Qs / c0
144
4 Application to integral operators
Defining the constant Cvo as Cvo ´ Cf Cmr Cred ƒd .1 C c0 / yields the desired estimate. In order to be able to apply Theorem 4.47, we have to separate the clusters t and s appearing in the estimate of Theorem 4.68. We can do this by assuming that the quantities p level.t / and pJ level.s/ describing the number of re-interpolation steps are close to each other: Definition 4.69 (Ccn -consistency). Let Ccn 2 N0 , and let TJ be an admissible block cluster tree. TJ is called Ccn -consistent if j.p level.t // .pJ level.s//j Ccn
for all b D .t; s/ 2 LC J :
For standard grid hierarchies, we can find a uniform bound for the difference between p and pJ for all meshes in the hierarchy, i.e., the left-hand term in Definition 4.69 can be bounded uniformly for all meshes. This implies that the corresponding block cluster trees will be Ccn -consistent for a constant Ccn which does not depend on the mesh, but only on the underlying geometry and the discretization scheme. Before we can apply Theorem 4.47 to bound the spectral error of the variable-order approximation, we have to establish the factorized error estimates of the form (4.53). Corollary 4.70 (Factorized estimate). Let .Q t / t2T and .Qs /s2TJ be -regular families of bounding boxes. Let TJ be Ccn -consistent. Let the kernel function g be .Cas ; ; c0 /-asymptotically smooth with > 0. Let the interpolation scheme be .ƒ; /stable. Let 2 R>0 . Let ˇ 2 N0 . There are constants Cin 2 R>0 and ˛0 2 N such that kg .I t ;t ˝ Is ;s /Œgk1;Q t Qs !1=2 !1=2 Cin q ˛Cˇ.pJ level.s// Cin q ˛Cˇ.p level.t// diam.Q t / diam.Qs / holds with
²
c0 c0 ; q ´ min c0 C 1 2
(4.71)
³
for all ˛ 2 N˛0 , all blocks b D .t; s/ 2 LC J satisfying the admissibility condition maxfdiam.Q t /; diam.Qs /g 2 dist.Q t ; Qs / and all t 2 L t and s 2 Ls .
(4.72)
4.7 Variable-order approximation
145
Proof. Let ˛ 2 N˛0 and let b D .t; s/ 2 LC J satisfy the admissibility condition (4.72). We have to find bounds for the three important factors appearing in the error estimate of Theorem 4.68. The admissibility condition (4.72) yields 1 .2/ dist.Q t ; Qs / maxfdiam.Q t / ; diam.Qs / g 1 1 .2/ : diam.Q t /=2 diam.Qs /=2
(4.73)
Using the consistency of the block cluster tree TJ , we find ˛ C ˇ`C C 1 ˛ C ˇ.p level.t // C 1 C Ccn ˇ .Ccn ˇ C 1/.˛ C ˇ.p level.t // C 1/; ˛ C ˇ`C C 1 ˛ C ˇ.pJ level.s// C 1 C Ccn ˇ .Ccn ˇ C 1/.˛ C ˇ.pJ level.s// C 1/; and can bound the stability term by .˛ C ˇ`C C 1/.2d C1/C .Ccn ˇ C 1/1=2 .˛ C ˇ.p level.t // C 1/1=2 .˛ C ˇ.pJ level.s// C 1/1=2 :
(4.74)
The convergence term can also be bounded by using the consistency. We get
1 % c0
%
1 c0
˛ˇ `
˛ˇ `
1 ˛ˇ.p level.t//CCcn ˇ % c0 1 Ccn ˇ 1 ˛ˇ.p level.t// D% % ; c0 c0 1 ˛ˇ.pJ level.s//CCcn ˇ % c0 1 Ccn ˇ 1 ˛ˇ.pJ level.s// D% % c0 c0
and conclude 1 ˛ˇ ` 1 Ccn ˇ=2 1 .˛Cˇ.p level.t///=2 % % % c0 c0 c0 .˛Cˇ.pJ level.s///=2 1 % : c0
(4.75)
Combining Theorem 4.68 with (4.73), (4.74) and (4.75) yields the factorized error
146
4 Application to integral operators
estimate kg .I t ;t ˝ Is ;s /Œgk1;Q t Qs 1=2 0 C .˛ C ˇ.p level.t // C 1/.2d C1/C diam.Q t / %.1=.c0 //˛Cˇ.p level.t// 1=2 0 C .˛ C ˇ.pJ level.s// C 1/.2d C1/C ; diam.Qs / %.1=.c0 //˛Cˇ.pJ level.s//
(4.76)
where we abbreviate C 0 ´ Cvo .2/ .Ccn ˇ C 1/1=2 %
1 c0
Ccn ˇ=2 :
As in Remark 4.23, we can see that q>
1 %.1=.c0 //
holds, i.e., we have ´ q%.1=.c0 // > 1. We consider only the first factor on the right-hand side of estimate (4.76): introducing the polynomial S W R ! R;
x 7! C 0 .x C 1/.2d C1/C
yields C 0 .˛ C ˇ.p level.t // C 1/.2d C1/C S.˛ C ˇ.p level.t /// D ˛Cˇ.p level.t// %.1=.c0 // %.1=.c0 //˛Cˇ.p level.t// S.n/ S.n/ D D qn n n .=q/ for n ´ ˛ C ˇ.p level.t //. Since S is a polynomial and due to > 1, we can find a constant Cin 2 R>0 such that S.x/ Cin x
holds for all x 2 R1 ;
and we conclude C 0 .˛ C ˇ.p level.t // C 1/.2d C1/C Cin q n : %.1=.c0 //˛Cˇ.p level.t//
Global spectral error estimate Now we can assemble the spectral error estimate provided by Theorem 4.47, the blockwise estimate of Lemma 4.45 and the factorized error estimate of Corollary 4.70.
4.7 Variable-order approximation
147
We assume that the families .Q t / t2T and .Qs /s2TJ of bounding boxes are regular, that the block cluster tree TJ is Ccn -consistent and Csp -sparse, that the kernel function is .Cas ; ; c0 /-asymptotically smooth with > 0, that the interpolation scheme is .ƒ; /-stable, that the supports . i /i2 and . j /j 2J of the basis functions are Cov -overlapping and that the L2 -norms of the basis functions .'i /i2 and . j /j 2J are bounded by constants C ; CJ 2 R>0 defined as in (4.48). We fix an admissibility parameter 2 R>0 and assume that (4.58) holds, i.e., that the domain is not folded too tightly, and that (4.59) holds, i.e., that the bounding boxes do not grow too rapidly. Lemma 4.71 (Factorized error estimate). Let ˇ 2 N0 . With the constants Cin 2 R>0 , q 20; 1Œ and ˛0 2 N from Corollary 4.70, we have zs k2 k t Gs Vzt Sb W
q Cin C2 Cov j t j
˛Cˇ.p level.t//
!1=2
diam.Q t /
q Cin CJ2 Cov j s j
˛Cˇ.pJ level.s//
!1=2
diam.Qs /
for all ˛ 2 N˛0 and all blocks b D .t; s/ 2 LC J satisfying the admissibility condition (4.72). Proof. We can use Corollary 4.70 to bound the interpolation error by kg .I t ;t ˝ Is ;s /Œgk1;Q t Qs b ´
Cin q ˛Cˇ.p level.t// diam.Q t /
!1=2
Cin q ˛Cˇ.pJ level.s// diam.Qs /
!1=2
and since b does not depend on t or s , we can use Lemma 4.56 to conclude zs k2 C CJ b k t Gs Vzt Sb W
X i2tO
1=2 X
j i j
1=2 j j j
:
j 2Os
Since the supports are Cov -overlapping, this estimate can be simplified: zs k2 C CJ b j t j1=2 j s j1=2 : k t Gs Vzt Sb W Combining this estimate with the definition of b yields the desired bound for the blockwise error. Combining Theorem 4.47 with this estimate yields the following error bound for the global spectral error: Theorem 4.72 (Variable-order spectral error). Let ˇ 2 N0 , and let d . With the constants Cin 2 R>0 , q 20; 1Œ and ˛0 2 N from Corollary 4.70 and Ccu 2 R>0 and
148
4 Application to integral operators
d 2 N from (4.58), we have z 2 Csp Ccu Cgrd Cin Cov C CJ hd q ˛ kG Gk p X
ˇ d p `
.q
/
pJ 1=2 X
`D0
.q ˇ d /pJ `
1=2
`D0
for all ˛ 2 R˛0 . Proof. We combine Theorem 4.47 with Lemma 4.71. In the resulting estimate, we use (4.58) and (4.59) in order to find Ccu diam.Q t /d j t j D Ccu diam.Q t /d diam.Q t / diam.Q t / Ccu Cgrd hd .d /.p level.t//
for all t 2 T ;
j s j Ccu diam.Qs /d D Ccu diam.Qs /d diam.Qs / diam.Qs / Ccu Cgrd hd .d /.pJ level.s//
for all s 2 TJ
and notice that combining these factors with the ones from Lemma 4.71 yields the desired result. We can use the rank parameter ˇ to compensate the growth of the clusters and the rank parameter ˛ to reduce the global error to any prescribed accuracy: Corollary 4.73 (Variable-order approximation). Let d . Let C 2 R>0 . We can ensure z 2 C C CJ hd kG Gk by choosing the rank parameters ˛ 2 N and ˇ 2 N0 appropriately. Proof. Let ˇ 2 N0 with
q ˇ d < 1=2:
According to Theorem 4.72, we can choose ˛ 2 N˛0 with 2Csp Ccu Cgrd Cin Cov q ˛ C and observe p X `D0 pJ
X `D0
ˇ d p `
.q
/
.q ˇ d /pJ `
p p ` X 1 `D0 pJ
2
2;
X 1 pJ ` 2; 2 `D0
and applying Theorem 4.72 yields the desired bound.
4.8 Technical lemmas
149
4.8 Technical lemmas Lemma 4.74 (Cardinality of multi-index sets). We have nCd d #f 2 N0 W jj ng D d
(4.77)
for all n 2 N0 and all d 2 N. Proof. We first prove
n X j Cd nCd C1 D d d C1
(4.78)
j D0
for all d 2 N and all n 2 N0 by induction on n. Let d 2 N. For n D 0, the equation is trivial. Assume now that (4.78) holds for n 2 N0 . We conclude the induction by observing n nC1 X j C d X j Cd nCd C1 nCd C1 nCd C1 D C D C d d d d C1 d j D0
j D0
.n C d C 1/Š .n C d C 1/Š C nŠ.d C 1/Š .n C 1/Šd Š .n C d C 1/Š.n C 1/ .n C d C 1/Š.d C 1/ D C .n C 1/Š.d C 1/Š .n C 1/Š.d C 1/Š .n C d C 1/Š.n C d C 2/ nCd C2 D D : .n C 1/Š.d C 1/Š d C1
D
Now we prove (4.77) by induction on d 2 N. For d D 1, we have f 2 Nd0 W jj ng D f0; : : : ; ng; i.e.,
.n C 1/Š nCd #f 2 W jj ng D n C 1 D nŠ D : 1Š d Let us now assume that (4.77) holds for a given d 2 N and all n 2 N0 . Let n 2 N0 . We have n [ d C1 W jj ng D f.; i / W 2 Nd0 ; jj n i g; f 2 N0 Nd0
iD0
which implies #f 2 Nd0 C1 W jj ng D
n X
#f.; i / W 2 Nd0 ; jj n i g
iD0
D
n X iD0
f 2 Nd0 W jj n i g
150
4 Application to integral operators
X n n X j Cd nCd C1 ni Cd D ; D D d d C1 d j D0
iD0
where we have used (4.78) in the last step. Lemma 4.75 (Taylor approximation error). Let ! Rd be a star-shaped domain with center z0 2 Rd . Let m 2 N, and let f 2 C m .!/. For all z 2 !, the approximation error of the Taylor approximation fQz0 ;m .z/ D
X
@ f .z0 /
jj<m
.z z0 / Š
is given by X Z
f .z/ fQz0 ;m .z/ D m
jjDm
1
.1 t /m1 @ f .z0 C .z z0 /t / dt
0
.z z0 / : (4.79) Š
Proof. Let z 2 ! and p ´ z z0 . We define the function F W Œ0; 1 ! R;
t 7! f .z0 C pt /;
which satisfies F .1/ D f .z/, and we define its m-th order Taylor approximation FQm W Œ0; 1 ! R;
t 7!
m1 X i iD0
t .i/ F .0/; iŠ
for all m 2 N. A simple induction yields F .i/ .t/ D
X iŠ @ f .z0 C pt /p ; fQz0 ;m .z/ D FQm .1/ Š
for all i 2 f0; : : : ; mg;
jjDi
so we can apply Lemma 2.1 to get Z 1 1 .1 t /m1 F .m/ .t / dt .m 1/Š 0 X mŠ Z 1 1 D .1 t /m1 @ f .z0 C pt /p dt .m 1/Š Š 0 jjDm X Z 1 .z z0 / Dm ; .1 t /m1 @ f .z0 C pt / dt Š 0
F .1/ FQm .1/ D
jjDm
and this is the desired result.
4.8 Technical lemmas
151
Lemma 4.76 (Holomorphic extension). Let f 2 C 1 Œ1; 1, Cf 2 R>0 , f 2 R>0 and 2 N be such that jf ./ .x/j
Cf f 1
holds for all 2 N0 ; x 2 Œ1; 1:
(4.80)
Let r 2 Œ0; f Œ, and let Rr ´ fz 2 C W there is a z0 2 Œ1; 1 with jz z0 j rg: We have Œ1; 1 Rr , and there is a holomorphic extension fQ 2 C 1 .Rr / of f satisfying f Q jf .z/j Cf for all z 2 Rr : (4.81) f r Proof. For all 2 N, we define the function s W Œ0; 1Œ! R; and observe that s .`/
7!
` . 1 C `/Š D s C` s C` D 1 . 1/Š
1 ; .1 /
holds for all ` 2 N0 ; 2 N:
Computing the Taylor expansion of s at zero yields s ./ D
1 X 1 D0
for all 2 Œ0; 1Œ:
(4.82)
Let us now turn our attention to the function f . For all z0 2 Œ1; 1, we define the disc Dz0 ´ fz 2 C W jz z0 j rg centered at z0 with radius r. Let z0 2 Œ1; 1. For all z 2 Dz0 , we have jz z0 j r < f , and the upper bound (4.80) implies ˇ 1 1 ˇ ˇ X X ˇ ./ jz z0 j ˇf .z0 / .z z0 / ˇ Cf ˇ ˇ Š f 1 D0 D0 1 1 X X r Cf D Cf 1 1 f D0 D0 for ´ r=f 2 Œ0; 1Œ. Comparing this equation with (4.82) yields ˇ 1 ˇ ˇ X ˇ ./ ˇf .z0 / .z z0 / ˇ Cf s ./ < 1; ˇ ˇ Š D0
(4.83)
152
4 Application to integral operators
i.e., the Taylor series in the point z0 converges on Dz0 , therefore the function fQz0 W Dz0 ! C;
z 7!
1 X
f .i/ .z0 /
iD0
.z z0 /i ; iŠ
is holomorphic. Let z 2 Dz0 \ Œ1; 1 R. Combining (4.27) and the error equation (2.5) of the truncated Taylor expansion, we find ˇ ˇ m1 X ˇ .z z0 /i ˇˇ .i/ ˇf .z/ f .z0 / ˇ ˇ iŠ iD0 ˇZ 1 ˇ ˇ .z z0 /m ˇˇ m1 .m/ ˇ D ˇ .1 t / f .z0 C t .z z0 // dt ˇ .m 1/Š 0 Z 1 jz z0 jm dt .1 t /m1 jf .m/ .z0 C t .z z0 //j .m 1/Š 0 m Z 1 jz z0 j .1 t /m1 Cf . 1 C m/Š dt f .m 1/Š 0 . 1 C m/Š r m Cf mŠ f for all m 2 N. Due to r=f < 1, sending m to infinity yields f .z/ D fQz0 .z/ for all z 2 Dz0 \ Œ1; 1, i.e., fQz0 is an extension of f into Dz0 . Now we can define the extension fQ of f into Rr . For all z 2 Rr , we pick a point z0 2 Œ1; 1 with jz z0 j r and define fQ.z/ ´ fQz0 .z/. Since all holomorphic functions fQz0 are extensions of f , the identity theorem for holomorphic functions yields that fQ is a well-defined holomorphic extension of f in Rr . The inequality (4.83) implies (4.81). p Lemma 4.77 (Covering of E% ). Let r 2 R>0 and % ´ r C 1 C r 2 . Let Rr C be defined in Lemma 4.76 and let E% C be the regularity ellipse defined in Lemma 4.14. Then we have E% Rr . Proof. Let z 2 E% , and let x; y 2 R with z D x C iy. We have to find z0 2 Œ1; 1 with jz z0 j r. The definition (4.26) implies 2 2 2x 2y C 1; % C 1=% % 1=% and we can conclude x2
.% C 1=%/2 ; 4
y2
.% 1=%/2 : 4
4.8 Technical lemmas
Rr
153
E%
Figure 4.5. Covering E% by a family of discs.
If x 2 Œ1; 1 holds, we let z0 ´ x and observe p %2 1 .r C 1 C r 2 /2 1 % 1=% jz z0 j D jyj D D 2 2% 2% p p 2 2 2 2r C 2r 1 C r 2 r C 2r 1 C r 2 C 1 C r 1 D D 2% 2% p 2 2r.r C 1 C r / D r; D 2% i.e., z 2 Rr . Otherwise we have jxj > 1. Due to symmetry, we can restrict our attention to the case x > 1. We let z0 ´ 1. We have 1<x
%2 C 1 ; 2%
which implies 4x 2 %2 4%2
.%2 C 1/2 D .%2 C 1/2 ; 4%2
and we can conclude x 2 .%2 1/2 D x 2 .%4 2%2 C 1/ D x 2 .%4 C 2%2 C 1/ 4x 2 %2 D x 2 .%2 C 1/2 4x 2 %2 x 2 .%2 C 1/2 .%2 C 1/2 D .x 2 1/.%2 C 1/2 > .x 2 2x C 1/.%2 C 1/2 D .x 1/2 .%2 C 1/2 : This estimate is equivalent to 4x 2 4.x 1/2 > 2 ; 2 C 1/ .% 1/2
.%2
154
4 Application to integral operators
so we get 2 2 2x 2y 4x 2 %2 4y 2 %2 1 C D 2 C % C 1=% % 1=% .% C 1/2 .%2 1/2 2 2 2 2 2 4.x 1/ % 4y % 4.x 1/ 4y 2 .x 1/2 y2 > C D C D C ; .%2 1/2 .%2 1/2 .% 1=%/2 .% 1=%/2 r2 r2 i.e., .x 1/2 C y 2 < r 2 . This is equivalent to jz z0 j < r and implies z 2 Rr . Lemma 4.78 (Scaling of %). Let % W R0 ! R1 ;
r 7! r C
p
1 C r 2:
Let r0 2 R>0 . For each 2 .0; 1/, there is a O 2 .0; 1/ such that 1 r %.r/ holds for all r 2 Rr0 : %
O Proof. Let x 2 R>0 . We start by considering the function p f W R>1 ! R0 ; ˛ 7! 1 C x 2 ˛ 2 : Elementary computations yield ˛x 2 x2 f 0 .˛/ D p ; f 00 .˛/ D 0; p 1 C x2˛2 .1 C x 2 ˛ 2 / 1 C x 2 ˛ 2
for all ˛ 2 R>0 :
Applying the error equation (2.5) to the second-order Taylor expansion of f in the point 1 yields f .˛/ f .1/ f 0 .1/.˛ 1/ 0
for all ˛ 1;
and we conclude p p x2 1 C x 2 ˛ 2 D f .˛/ f .1/Cf 0 .1/.˛ 1/ D 1 C x 2 C p .˛ 1/: (4.84) 1 C x2 Now let r 2 Rr0 . Applying (4.84) to x ´ r and ˛ ´ 1= > 1 yields p p r2 1 C r 2 ˛ 2 r˛ C 1 C r 2 C p .˛ 1/ 1 C r2 p p r 1 C r2 C r2 2 .˛ 1/ Dr C 1Cr C p 1 C r2 p r D .r C 1 C r 2 / 1 C p .˛ 1/ 1 C r2
%.r= / D %.r˛/ D r˛ C
4.9 Numerical experiments
155
r .˛ 1/ : D %.r/ 1 C p 1 C r2 Due to r r0 0, we have r 2 r02 and r 2 .1 C r02 / D r 2 C r 2 r02 r02 C r 2 r02 D r02 .1 C r 2 /. This means r02 r2 ; 1 C r2 1 C r02 so we can conclude
i.e.,
p
r 1C
r2
r0 q ; 1 C r02
! 1 %.r= / %.r/ 1 C q : 1 C r2 r0
(4.85)
0
We let
q 1 C r02
<1
O ´ q
1 C r02 C .1 /r0 and observe
q 1 C r02 C .1 /r0 1 1 r0 D D1C ; q q O
1 C r02
1 C r02
which combined with (4.85) is the desired result.
4.9 Numerical experiments The purpose of this section is to demonstrate that the theoretical properties of the techniques introduced in this chapter can be observed in the implementation, i.e., that the estimates for the complexity and the error are close to the real behaviour of the algorithms. In order to allow us to observe the relevant effects, we choose a relatively simple setting: we consider the stiffness matrices V 2 R and K 2 R corresponding to the three-dimensional single and double layer potential operator, given by Z Z 1 Vij ´ 'i .x/ 'j .y/ dy dx for all i; j 2 ; 4kx yk2 Z Z hx y; n.y/i Kij ´ 'i .x/ 'j .y/ dy dx for all i; j 2 ; 4kx yk32
discretized by piecewise constant basis functions .'i /i2 on a piecewise triangular approximation of the unit sphere S ´ fx 2 R3 W kxk2 D 1g.
156
4 Application to integral operators
Table 4.1. Single layer potential operator, approximated by interpolation with m D 4 and D 2 for different matrix dimensions n.
n Bld Mem Mem=n MVM kX Xz k2 512 1:2 2:9 5:8 < 0:01 0 11:0 33:9 17:0 0:04 3:67 2048 8192 45:7 183:2 22:9 0:23 1:57 32768 211:8 923:3 28:9 1:31 3:68 131072 1219:6 4274:5 33:4 5:99 9:09 524288 5107:4 18330:0 35:8 26:92 2:29 10000
100000
Bld O(n)
Mem O(n)
10000
1000
1000 100 100 10
1 100
10
1000
10000
100
100000
1 100
1e+06
MVM O(n)
10
1e-06
1
1e-07
0.1
1e-08
0.01 100
1000
10000
100000
1000
10000
1e-05
1e+06
1e-09 100
100000
1e+06
Err O(h^2)
1000
10000
100000
1e+06
The integrals are computed by the black-box quadrature techniques introduced in [89], [43]. Since we are only interested in the approximation properties of the matrix, not in those of the underlying operators, we use a constant order for the quadrature. Let us now consider the approximation of the matrix V corresponding to the single layer potential operator by constant-order interpolation (cf. Section 4.4). The manifold S is approximated by a sequence of polygons with n 2 f512; 2048; : : : ; 524288g triangles, and one piecewise constant basis function is used in each triangle. Table 4.1 lists the results for H 2 -matrices constructed by using tensor-product interpolation of order m D 4 for a block cluster tree satisfying the admissibility condition (4.49) with D 2. The table gives the dimension of the matrix (“n”), the time in seconds1 for the construction of the H 2 -matrix approximation (“Bld”), the stor1
On a single 900 MHz UltraSPARC IIIcu processor of a SunFire 6800 computer.
157
4.9 Numerical experiments
Table 4.2. Single layer potential operator, approximated by interpolation with different orders m and D 2 for the matrix dimension n D 32768.
m Bld Mem Mem=n MVM kX Xz k2 1:85 1 59:4 134:2 4:2 0:77 2 60:3 195:7 6:1 1:03 2:96 3 123:6 451:3 14:1 0:99 2:67 28:9 1:32 3:68 4 210:2 923:3 5 399:0 1655:8 51:7 1:99 5:19 6 582:3 2595:6 81:1 2:82 6:810 7 860:3 3845:2 120:2 4:09 1:910 900
4000
Bld O(m^3)
800
Mem O(m^3)
3500
700
3000
600
2500
500 2000 400 1500
300 200
1000
100
500
0
0 1
2
3
4
4.5
5
6
7
MVM O(m^3)
4
1
2
3
4
5
0.0001
6
7
6
7
Err O(q^m)
1e-05
3.5 1e-06
3 2.5
1e-07
2
1e-08
1.5 1e-09
1 0.5
1e-10 1
2
3
4
5
6
7
1
2
3
4
5
age requirements in MBytes2 (“Mem”), the storage required per degree of freedom in KBytes3 (“Mem=n”), the time in seconds for the matrix-vector multiplication and an estimate for the spectral norm of the error (“kX Xz k2 ”). The complexity estimate of Lemma 3.38 leads us to expect that the time for constructing the H 2 -matrix and its storage requirements will behave like O.n/, and this is clearly visible in the table. Theorem 3.42 suggests that O.n/ operations are required to perform a matrix-vector multiplication, and this estimate also coincides with the numerical results. Remark 4.50 implies that the error of the kernel approximation will 2 3
1 MByte = 220 Bytes 1 KByte = 210 Bytes
158
4 Application to integral operators
Table 4.3. Single layer potential operator, approximated by interpolation with different orders m and D 1 for the matrix dimension n D 32768.
m Bld Mem Mem=n MVM kX Xz k2 1:15 1 103:6 297:0 9:3 1:66 2 105:6 403:7 12:6 2:16 1:26 3 272:6 961:8 30:1 2:10 5:98 517:1 1963:8 61:4 2:80 5:09 4 5 1015:9 3562:9 111:3 4:27 3:110 6 1293:0 5226:7 163:3 5:68 2:911 7 1926:3 6291:1 216:3 7:30 4:012 2500
9000
Bld O(m^3)
Mem O(m^3)
8000
2000
7000 6000
1500
5000 4000
1000
3000 2000
500
1000 0
0 1
2
3
4
5
8
6
7
MVM O(m^3)
7
1
2
3
4
5
0.0001
6
7
Err O(q^m)
1e-05 1e-06
6
1e-07
5
1e-08
4
1e-09 3
1e-10 2
1e-11 1 1
2
3
4
5
6
7
1e-12 1
2
3
4
5
6
7
be constant on all grid levels, and the scaling of the basis functions (C and CJ are in O.h/) implies that the error should behave like O.h2 /, which is indeed visible. We can conclude that the theoretical predictions concerning the dependence on n match the practical observations. Now let us investigate the relationship between the order m of the interpolation and the complexity of the H 2 -matrix approximation. Table 4.2 lists the storage and time requirements and the accuracy of the H 2 -matrix approximation for different choices of the order m 2 f1; : : : ; 7g and a fixed grid with n D 32768 D 215 triangles. Lemma 4.10 yields that the rank distribution is m3 -bounded. According to Lemma 3.38, we expect the build time and storage requirements to behave like O.m3 /, and according to The-
159
4.9 Numerical experiments
Table 4.4. Double layer potential operator, approximated by interpolation with different orders m and D 2 for the matrix dimension n D 32768.
m 1 2 3 4 5 6 7
Bld Mem Mem=n MVM kX Xz k2 1:84 61:4 134:2 4:2 0:77 62:3 195:7 6:1 1:03 2:55 128:3 451:3 14:1 0:99 2:36 218:3 923:3 28:9 1:32 5:07 413:4 1655:8 51:7 1:99 6:18 605:8 2595:6 81:1 2:83 1:58 895:1 3845:2 120:2 3:98 3:29
1000
4000
Bld O(m^3)
900
Mem O(m^3)
3500
800
3000
700 600
2500
500
2000
400
1500
300
1000
200
500
100 0
0 1
2
3
4
5
0.001
6
7
1
2
3
4
5
0.001
Err O(q^m)
7
6
7
DLP SLP
0.0001
0.0001
6
1e-05
1e-05
1e-06 1e-06 1e-07 1e-07
1e-08
1e-08
1e-09
1e-09
1e-10 1
2
3
4
5
6
7
1
2
3
4
5
orem 3.42, the matrix-vector multiplication should also require O.m3 / operations. These predictions coincide with the results given in Table 4.2. Due to Theorem 4.22, we expect the approximation error to decrease exponentially with a rate of q 2=3, but we measure a rate of approximately 0:12. Whether this is due to suboptimal estimates or to a pre-asymptotic behaviour of the interpolation is not clear. Repeating the same experiment for the admissibility parameter D 1 gives the results listed in Table 4.3. We can see that the storage and time requirements are approximately doubled, while the approximation error is reduced by a factor between 2 and 50. Theorem 4.22 suggests that we have to expect a convergence rate of q 1=2, but we again measure a better rate of approximately 0:09. We can conclude that using a lower value of will improve the rate of convergence, but that the increase in storage
160
4 Application to integral operators
Table 4.5. Single layer potential operator, approximated by variable-order interpolation with ˛ D 1, ˇ D 1 and D 2 for different matrix dimensions n.
n Bld Mem Mem=n MVM kX Xz k2 512 0:9 1:8 3:6 < 0:01 4:04 2048 3:5 10:3 5:2 0:04 2:04 8192 14:1 54:8 6:8 0:22 4:15 32768 64:6 300:8 9:4 1:08 7:66 131072 300:6 1508:3 11:8 4:60 1:36 1:57 524288 1633:0 7181:7 14:0 20:26 2097152 6177:5 30584:1 14:9 89:50 2:08 10000
100000
Bld O(n)
1000
10000
100
1000
10
100
1
10
0.1 100
1000
10000
100000
100
1e+06
1e+07
1 100
Mem O(n)
1000
10000
0.01
MVM O(n)
100000
1e+06
1e+07
Err O(h^3)
0.001 10 0.0001 1
1e-05 1e-06
0.1 1e-07 0.01 100
1000
10000
100000
1e+06
1e+07
1e-08 100
1000
10000
100000
1e+06
1e+07
and algorithmic complexity renders this approach unattractive in practice: we can get a similar improvement of the error by increasing the order instead of decreasing . We approximate the matrix K corresponding to the double layer potential operator by the approach described in Section 4.5: we construct an approximation of the kernel function hx y; n.y/i @ 1 D 3 @n.y/ 4kx yk2 4kx yk2 by taking the normal derivative of an interpolant of the generator function .x; y/ ´ 1=.4kx yk2 /. As before, we test the approximation scheme by using different orders m 2 f1; : : : ; 7g for a fixed grid with n D 32768 D 215 triangles. The results
4.9 Numerical experiments
161
Table 4.6. Single layer potential operator, approximated by variable-order interpolation with ˛ D 2, ˇ D 1 and D 2 for different matrix dimensions n.
n Bld Mem Mem=n MVM kX Xz k2 9:15 512 1:0 2:0 4:1 < 0:01 2048 8:5 21:9 11:0 0:04 1:45 8192 30:7 97:6 12:2 0:19 7:36 124:6 463:3 14:5 0:94 8:77 32768 131072 607:0 2393:5 18:7 4:61 1:07 524288 2743:0 10539:6 20:6 19:19 1:48 2097152 11494:6 44237:5 21:6 83:06 1:99 100000
100000
Bld O(n)
10000
10000
1000
1000
100
100
10
10
1 100
1000
10000
100000
100
1e+06
1e+07
1 100
Mem O(n)
1000
10000
0.001
MVM O(n)
100000
1e+06
1e+07
Err O(h^3)
0.0001 10 1e-05 1
1e-06 1e-07
0.1 1e-08 0.01 100
1000
10000
100000
1e+06
1e+07
1e-09 100
1000
10000
100000
1e+06
1e+07
are listed in Table 4.4: we can see that, the time and storage requirements behave like O.m3 / and that the error decreases exponentially. Once again the measured rate of convergence, approximately 0:16, is far better than the predicted rate of q 2=3. Now let us turn our attention to the variable-order interpolation scheme described in Section 4.7. Instead of the isotropic scheme described in this section, we use the more practical anisotropic rank distribution algorithm described in [23], [80]. The results for the rank parameters ˛ D 1 and ˇ D 1 and the dimensions n 2 f29 ; 211 ; : : : ; 221 g are given in Table 4.5. We can see that the time for the construction of the matrix, its storage requirements and the time required for a matrix-vector multiplication converge to O.n/ and that the error decays like O.h3 /, which is the rate predicted by Corollary 4.73 for d D 2 and D 1.
162
4 Application to integral operators
Table 4.6 is concerned with the case where the “slope” of the order distribution is still ˇ D 1, but the lower bound is now ˛ D 2. In our theory, ˛ is used to compensate certain pollution effects caused by the growth of cluster supports and bounding boxes, therefore a higher value of ˛ can be expected to lead to a more regular convergence of the error. We can observe this behaviour in this experiment: except for a short preasymptotic phase, the error converges exactly like O.h3 /. The price for this improved convergence are an increase in storage requirements (by approximately 33%) and a pronounced increase in setup time (which is almost doubled).
Chapter 5
Orthogonal cluster bases and matrix projections
We have seen that we can construct H 2 -matrix approximations for certain types of integral operators by using polynomial expansions. These approximations have a reasonable accuracy and can be constructed efficiently, but they require a large amount of storage. On modern computers, which have typically very fast processors and a comparatively small amount of memory, the storage requirements are the limiting factor for most practical applications. Since the storage requirements of an H 2 -matrix are governed by the rank distributions of the row and column cluster bases, we have to reduce these ranks without sacrificing accuracy. Then we have to convert the given matrix into an element of the H 2 -matrix space defined by the improved cluster bases. An elegant solution for the second task is available if the cluster basis matrices are orthogonal, i.e., if their columns are pairwise orthogonal vectors. In this case, the best approximation of an arbitrary matrix in the H 2 -matrix space can be constructed by a simple orthogonal projection. The concept of orthogonal cluster basis matrices also provides a good approach to the first task: in an orthogonal cluster basis matrix, no column is redundant, since it cannot be expressed in terms of the other columns. This chapter introduces algorithms for turning a cluster basis into an orthogonal cluster basis with the same or almost the same approximation properties, for converting an arbitrary matrix into an H 2 -matrix using orthogonal cluster bases, and for estimating or even explicitly computing the error introduced by this conversion. It is organized as follows: • Section 5.1 introduces the concept of orthogonal cluster bases and provides us with a simple algorithm for checking whether a given cluster basis is orthogonal. • Section 5.2 contains algorithms for converting dense and H -matrices into H 2 matrices using orthogonal cluster bases. • Section 5.3 defines cluster operators, which are used to describe transformations between different cluster bases and allow us to switch efficiently between different H 2 -matrix spaces. • Section 5.4 is devoted to an algorithm that turns an arbitrary nested cluster basis into an orthogonal nested cluster basis in optimal complexity. • Section 5.5 introduces a variant of the former algorithm that only produces an approximation of the original cluster basis. A careful error analysis allows us to control the resulting approximation error (cf. [10], [20]).
164
5 Orthogonal cluster bases and matrix projections
• Section 5.6 describes algorithms for computing the error introduced by a projection into an H 2 -matrix space. • Section 5.7 contains numerical experiments that demonstrate that the bounds for the complexity of the new algorithms are optimal and that the adaptive error control works as predicted by theory. Assumptions in this chapter: We assume that cluster trees T and TJ for the finite index sets and J, respectively, are given. Let TJ be an admissible block cluster tree for T and TJ . Let n ´ # and nJ ´ #J denote the number of indices in and J, and let c ´ #T and cJ ´ #TJ denote the number of clusters in T and TJ . Let p , pJ and pJ be the depths of T , TJ and TJ .
5.1 Orthogonal cluster bases While the construction of cluster bases using polynomials is quite straightforward, it typically includes a certain degree of redundancy: the rank of cluster basis matrices t V t 2 RK for leaf clusters t 2 L is bounded by #tO and #K t , i.e., if #K t > #tO holds, tO the matrix V t has more columns than necessary, which leads to suboptimal efficiency. We can avoid redundant columns in cluster basis matrices V t by ensuring that all columns of V t are pairwise orthogonal, since this implies that we cannot eliminate any of the columns without changing the range of the matrix. Definition 5.1 (Orthogonal cluster basis). Let V be a cluster basis for T . V is called orthogonal if V t V t D I
(5.1)
holds for each t 2 T , i.e., if all cluster basis matrices are orthogonal. For nested cluster bases, we can find a criterion for the orthogonality which depends only on the cluster basis matrices V t in leaf clusters t 2 L and the transfer matrices E t , i.e., on quantities that are available in a standard H 2 -matrix representation. We require the following simple auxiliary result: Lemma 5.2. Let V1 and V2 be cluster bases for T . Let t 2 T with sons.t / ¤ ;. Let t1 ; t2 2 sons.t/. We have ´ V V1;t 1 2;t2
D
V V1;t 1 2;t1 0
if t1 D t2 ; otherwise:
(5.2)
5.1 Orthogonal cluster bases
165
Proof. The case t1 D t2 is obvious, so we only consider t1 ¤ t2 . Due to Definition 3.33, we have V1;t1 D t1 V1;t1 and V2;t2 D t2 V2;t2 . Definition 3.4 implies tO1 \ tO2 D ;, and this means t1 t2 D 0 due to Definition 3.24. Combining these equations yields V D . t1 V1;t1 / t2 V2;t2 D V1;t V D0 V1;t 1 2;t2 1 t1 t2 2;t2
and completes the proof. Now we can proceed to prove an orthogonality criterion based only on matrices available in the standard representation: Lemma 5.3 (Orthogonality criterion). Let V D .V t / t2T be a nested cluster basis for T with transfer matrices E D .E t / t2T . V is orthogonal if and only if V t V t D I and
X
holds for all leaf clusters t 2 L
E t0 E t 0 D I
holds for all non-leaf clusters t 2 T n L :
t 0 2sons.t/
Proof. Let us first assume that V is orthogonal. Let t 2 T . The case sons.t / D ; is trivial. If sons.t / ¤ ;, we have X
X
(5.1)
E t0 E t 0 D
t 0 2sons.t/
t 0 2sons.t/
X
D
V t1 E t1
t1 2sons.t/
X
X
(5.2)
E t0 V t0 V t 0 E t 0 D
X
E t1 V t1 V t2 E t2
t1 2sons.t/ t2 2sons.t/
V t2 E t2
(3.15) D V t V t D I;
t2 2sons.t/
which yields the desired result. assume that V t V t D I holds for all leaf clusters t 2 L and that P Let us now t 0 2sons.t/ E t 0 E t 0 D I holds for all remaining clusters t 2 T n L . We prove V t V t D I by induction on # sons .t /. For # sons .t / D 1, we have sons.t / D ; and V t V t D I holds by assumption. Let now n 2 N be such that V t V t D I holds for all t 2 T with # sons .t / n. Let t 2 T with # sons .t / D n C 1. For all t 0 2 sons.t /, we have # sons .t 0 / # sons .t/ 1 D n, and the induction assumption implies V t0 V t 0 D I . We find (3.15)
V t V t D
X t1 2sons.t/
(5.2)
D
X
X
V t1 E t1
t2 2sons.t/
E t0 V t0 V t 0 E t 0 D
t 0 2sons.t/
which concludes the induction step.
V t2 E t2
X
t 0 2sons.t/
E t0 E t 0 D I;
166
5 Orthogonal cluster bases and matrix projections
It is important to note that this criterion can only be used to determine whether the entire cluster basis is orthogonal. If there is a cluster t 2 T with V t V t ¤ I , the criterion cannot be applied to its predecessors. We can use the criterion of Lemma 5.3 in a recursion to determine whether a given nested cluster basis is orthogonal. This approach is summarized in Algorithm 9. Since it is a special case of Algorithm 13, the complexity analysis of Lemma 5.13 applies. Algorithm 9. Check a nested cluster basis for orthogonality. function IsOrthogonal(t, V ) : boolean; if sons.t/ D ; then ortho .V t V t D I / else ortho true; P 0; for t 0 2 sons.t / do ortho ortho and IsOrthogonal(t 0 , V ); P P C E t0 E t 0 end for; ortho ortho and (P D I ) end if; return ortho
5.2 Projections into H 2 -matrix spaces Remark 3.37 states that H 2 .TJ ; V; W / is a subspace of RJ . Equipped with the Frobenius inner product given by XX X X Xij Yij D .X Y /i i D .X Y /jj hX; Y iF ´ i2 j 2J
j 2J
i2
for all X; Y 2 RJ and the corresponding Frobenius norm kX kF ´ hX; Xi1=2 F D
XX
Xij2
1=2
i2 j 2J
for all X 2 RJ , the matrix space RJ is a Hilbert space. This implies that the best approximation of an arbitrary matrix X 2 RJ in the subspace H 2 .TJ ; V; W / with respect to the Frobenius norm is given by an orthogonal projection. Before we can investigate its properties, we need the following auxiliary results:
5.2 Projections into H 2 -matrix spaces
167
Lemma 5.4 (Block restriction). Let .Xb /b2LJ be a family of matrices Xb 2 RJ . For all b 0 D .t 0 ; s 0 / 2 LJ , we have X t 0 t Xb s s 0 D t 0 Xb 0 s 0 : bD.t;s/2LJ
Proof. Let b 0 D .t 0 ; s 0 / 2 LJ . Let b D .t; s/ 2 LJ with t 0 t Xb s s 0 ¤ 0. Due to Definition 3.24, this implies that there are i 2 tO \ tO0 and j 2 sO \ sO 0 , i.e., that bO \ bO 0 D .tO sO / \ .tO0 sO 0 / ¤ ;. Due to Lemma 3.14, LJ is the set of leaves of a cluster tree for J, and due to Corollary 3.9, this means that the corresponding index sets form a disjoint partition of J, and therefore bO \ bO 0 ¤ ; implies b D b 0 . We can use this lemma to pick single blocks from the equations (3.12) and (3.17), which helps us to find an explicit representation of the orthogonal projection from RJ into H 2 .TJ ; V; W /. Lemma 5.5 (H 2 -matrix projection). Let V and W be orthogonal cluster bases. The operator …TJ ;V;W W RJ ! H 2 .TJ ; V; W /; X X 7! V t V t X Ws Ws C
X
t Xs ;
bD.t;s/2L J
bD.t;s/2LC J
is the orthogonal projection into H 2 .TJ ; V; W /. This implies kX …TJ ;V;W X kF D
inf
2 .T z X2H J ;V;W /
kX Xz kF
(5.3)
for all matrices X 2 RJ . Proof. Let … ´ …TJ ;V;W . In a first step we establish that … is a projection into H 2 .TJ ; V; W /. Let X 2 H 2 .TJ ; V; W /, and let .Sb /b2LC be the correspondJ
ing coupling matrices. Due to Lemma 5.4, we have X X V t Sb Ws C t 0 Xs 0 D t 0 D t 0
bD.t;s/2LC J
X
bD.t;s/2LC J
t Xs s 0
bD.t;s/2L J
X
t V t Sb Ws s C
t Xs s 0 D V t 0 Sb 0 Ws0
bD.t;s/2L J
and therefore V t 0 V t0 XWs 0 Ws0 D V t 0 V t0 . t 0 Xs 0 /Ws 0 Ws0 D V t 0 V t0 .V t 0 Sb 0 Ws0 /Ws 0 Ws0 (5.1)
D V t 0 Sb 0 Ws0 D t 0 Xs 0
168
5 Orthogonal cluster bases and matrix projections
0 0 0 for all b 0 D .t 0 ; s 0 / 2 LC J . We can apply a similar argument to b D .t ; s / 2 LJ and add the contributions of all blocks to get …X D X , and observing range.…/ H 2 .TJ ; V; W / concludes this part of the proof. In order to prove that … is an orthogonal projection, we have to demonstrate … D …. For X; Y 2 RJ , we find X X h…X; t Ys iF D h t .…X /s ; Y iF : h…X; Y iF D bD.t;s/2LJ
bD.t;s/2LJ
Lemma 5.4 implies h t .…X /s ; Y iF D hV t V t X Ws Ws ; Y iF D hX; V t V t Y Ws Ws iF D hX; t .…Y /s iF for b D .t; s/ 2 LC J and h t .…X/s ; Y iF D h t Xs ; Y iF D hX; t Ys iF D hX; t .…Y /s iF for b D .t; s/ 2 L J , so we can conclude X
h…X; Y iF D
h t .…X /s ; Y iF
bD.t;s/2LJ
X
D
hX; t .…Y /s iF D hX; …Y iF ;
bD.t;s/2LJ
which implies … D …. Let now X 2 RJ and Xz 2 H 2 .TJ ; V; W /. We observe z 2F D kX …X C …X Xz k2F kX Xk D kX …X k2F C 2hX …X; …X Xz i2F C k…X Xz k2F D kX …X k2F C 2hX …X; ….X Xz /i2F C k…X Xz k2F D kX …X k2F C 2h…X …2 X; X Xz i2F C k…X Xz k2F D kX …X k2F C 2h…X …X; X Xz i2F C k…X Xz k2F D kX …X k2F C k…X Xz k2F kX …X k2F ; and this result implies (5.3). Let us now turn our attention to the computation of the best approximation …TJ ;V;W X for a given matrix X 2 RJ . According to Lemma 5.5, we only have to compute Sb ´ V t X Ws
for all b D .t; s/ 2 LC J ;
5.2 Projections into H 2 -matrix spaces
169
since the nearfield blocks can be copied directly. This computation can be split into two parts: first we compute the intermediate matrix Xyt;s ´ Ws X t 2 RLs #tO , then we compute the result Sb D V t X Ws D V t t X Ws D V t .Ws X t / D V t Xyt;s . Both steps of this procedure require us to multiply a matrix with the transposed of a cluster basis matrix from the left. The forward transformation Algorithm 6 introduced in Section 3.7 solves this problem by an elegant recursion, and it is straightforward to apply the same recursion to the task at hand. For a given matrix X and a cluster t 2 T , the block forward transformation Algorithm 10 computes the matrices Xyr ´ Vr X for all descendants r 2 sons .t / of t . Algorithm 10. Block forward transformation. y procedure BlockForwardTransformation(t, V , X , var X); if sons.t/ D ; then Xyt V t X else Xyt 0; for t 0 2 sons.t / do BlockForwardTransformation(t 0 , V , X , Xy ); Xyt Xyt C E t0 Xyt 0 end for end if
Lemma 5.6. Let t 2 T and let X 2 RP for a finite index set P . Let k D .k t / t2T be defined as in (3.16). The computation of Xyr ´ Vr X for all r 2 sons .t / by Algorithm 10 requires not more than 2.#P /
X
X
kr .#Kr / 2.#P /
r2sons .t/
kr2
r2sons .t/
operations. Proof. By induction on # sons .t / 2 N. For t 2 T with # sons .t / D 1 we have t sons.t/ D ;, and Algorithm 10 computes the product of the adjoint of V t 2 RK tO and X 2 RP in not more than 2.#tO/.#K t /.#P / 2k t .#K t /.#P / operations. Let now n 2 N be such that the bound holds for all # sons .t / n. Let t 2 T with # sons.t/ D n C 1. Then we have sons.t / ¤ ; and # sons.t 0 / n for all sons t 0 2 sons.t/, so the computation of Xyt 0 for all sons requires not more than 2.#P /
X r2sons .t 0 /
kr .#Kr /
170
5 Orthogonal cluster bases and matrix projections
operations due to the induction assumption. Algorithm 10 multiplies each Xyt 0 2 RK t P by the adjoint of the transfer matrix E t 0 2 RK t 0 K t , and this takes not more than 2.#K t 0 /.#K t /.#P / operations per son t 0 2 sons.t / and X 2.#K t 0 /.#K t /.#P / 2k t .#K t /.#P / t 0 2sons.t/
operations for all sons. Adding the bound for the subtrees corresponding to the sons yields a total of not more than X X 2k t .#K t /.#P / C 2.#P / kr .#Kr / D 2.#P / kr .#Kr / r2sons .t 0 /
r2sons .t/
operations. Algorithm 11 makes use of the block forward transformation to convert an arbitrary dense matrix into an H 2 -matrix by computing the coupling matrices for all farfield blocks: the block forward transformation applied to . t Xs / and the cluster basis W yields the matrices Xyr D Wr X t 2 RLr for all r 2 sons .s/, and applying the block forward transformation to the matrix Xys and the cluster basis V then yields the matrix Yyt D V t Xys D V t X Ws , i.e., the desired coupling matrix (cf. Figure 5.1). Algorithm 11. Convert a dense matrix into an H 2 -matrix. procedure ConvertDense(M , V , W , var S ); for b D .t; s/ 2 LC J do BlockForwardTransformation(s, W , . t Xs / , Xy ); BlockForwardTransformation(t, V , Xys , Yy ); Sb Yyt end for Let us now consider the complexity of Algorithm 11. Since the block forward transformation of Xy is performed by a sequence of operations for all r 2 sons .t /, each of which treats #Ls #sO units of data, we need a method for bounding sums of the type X X X X X X X X #sO D #sO D #sO : r2sons .t/ bD.t;s/2LC J
t2T s2rowC .t/ r2sons .t/
r2T t2pred.r/ s2rowC .t/
Fortunately, we can find a simple upper bound for the two rightmost sums in this expression by using the following result: Lemma 5.7 (Extended block rows). For all t 2 T , let row .TJ ; t / ´ fs 2 TJ W there is a t C 2 pred.t / with .t C ; s/ 2 LC J g [ D rowC .t C /: t C 2pred.t/
(5.4)
5.2 Projections into H 2 -matrix spaces
171
Xys
X
Yyt Kt t
t
Ls
s
Ls
Figure 5.1. Conversion of a dense matrix by block forward transformations applied to columns and rows.
If this does not lead to ambiguity, we use row .t / instead of row .TJ ; t /. For t 2 T and s1 ; s2 2 row .t / with s1 ¤ s2 , we have sO1 \ sO2 D ;, i.e., the index sets corresponding to the clusters in row .t / are pairwise disjoint. For t 2 T and t C 2 pred.t /, we have row .t C / row .t /. Proof. We prove the first statement by contraposition. Let t 2 T and s1 ; s2 2 row .t / with sO1 \ sO2 ¤ ;. By definition, there are t1C ; t2C 2 pred.t / such that s1 2 row.t1C / and s2 2 row.t2C / hold. Without loss of generality, we assume level.t1C / level.t2C /, which implies t2C 2 sons .t1C / due to Lemma 3.8 and therefore tO2C tO1C . Let j 2 sO1 \ sO2 and i 2 tO2C tO1C . Then we have .i; j / 2 .tO1C sO1 / \ .tO2C sO2 /, and since .t1C ; s1 / and .t2C ; s2 / are leaves of the block cluster tree TJ , Corollary 3.9 implies .t1C ; s1 / D .t2C ; s2 /, and therefore s1 D s2 and t1C D tsC . The second statement is a simple consequence of the fact that pred.t / pred.t 0 / holds for all t 0 2 sons.t /. Due to this lemma, we find X X X #sO D bD.t;s/2LC J
r2sons .t/
X
X
#sO D
r2T t2pred.r/ s2rowC .t/
X
X
r2T s2row .t/
#sO
X
nJ
r2T
and can now proceed to prove an upper bound for the complexity of Algorithm 11: Lemma 5.8 (Conversion of dense matrices). Let V and W be orthogonal nested cluster bases with rank distributions K and L, and let .k t / t2T and .ls /s2TJ be defined as in (3.16) and (3.18). Let X 2 RJ . The computation of Xz ´ …TJ ;V;W X by Algorithm 11 requires not more than X X 2nJ k t2 C 2n ls2 t2T
s2TJ
172
5 Orthogonal cluster bases and matrix projections
operations. If K, L are .Cbn ; ˛; ˇ; r; /-bounded and if T , TJ are .Crc ; ˛; ˇ; r; /regular, the number of operations is in O.n nJ .˛ C ˇ/r /. Proof. Let k D .k t / t2T and l D .ls /s2TJ be defined as in (3.16) and (3.18) for K and L, respectively. Let b D .t; s/ 2 LC J . According to Lemma 5.6, the computation of y Xs requires not more than X lr2 2.#tO/ r2sons .s/
operations, and the computation of Yyt requires not more than X X 2.#Ls / kr2 2.#sO / kr2 r2sons .t/
r2sons .t/
operations. Applying Lemma 5.7 to the first sum yields a bound of X X X X X 2.#tO/ lr2 D 2 .#tO/lr2 bD.t;s/2LC J
r2sons .s/
s2TJ t2colC .s/ r2sons .s/
D2
X
X
X
.#tO/lr2
r2TJ s2pred.r/ t2colC .s/
D2
X
r2TJ
2
X
X
lr2
#tO
t2col .r/
lr2 n ;
r2TJ
and by a similar argument we get the bound X X 2.#sO / bD.t;s/2LC J
r2sons .t/
kr2 2
X
kr2 nJ
r2T
for the second sum. This proves the estimate for the number of operations. Using Lemma 3.45 and Lemma 3.48 completes the proof. We can conclude that the number of operations required to convert an arbitrary matrix into an H 2 -matrix grows quadratically with the matrix dimension. This is not surprising, since in this case the matrix is represented by n nJ coefficients, and all of these coefficients have to be taken into account to compute the best approximation. In order to reach a better than quadratic complexity, we have to assume that the input matrix is given in a more efficient representation. If the matrix X 2 RJ is given in the hierarchical matrix representation (3.12), we can take advantage of the factorized structure: Lemma 5.4 implies V t X Ws D V t Ab Bb Ws D .V t Ab /.Ws Bb /
5.2 Projections into H 2 -matrix spaces
173
for all b D .t; s/ 2 LC J . As in the case of the conversion of a dense matrix, we can use the block forward transformation Algorithm 10 to compute Ayb ´ V t Ab ;
Byb ´ Ws Bb
efficiently and then let Sb ´ Ayb Byb . The result is Algorithm 12. Algorithm 12. Convert an H -matrix into an H 2 -matrix. procedure ConvertH(A, B, V , W , var S ); for b D .t; s/ 2 LC J do BlockForwardTransformation(t, V , Ab , Ayb ); BlockForwardTransformation(s, W , Bb , Byb ); Sb Xyt Yys end for Lemma 5.9 (Conversion of H -matrices). Let V and W be orthogonal nested cluster bases with rank distributions K and L, let .k t / t2T and .ls /s2TJ be defined as in (3.16) and (3.18). Let X 2 RJ be a hierarchical matrix with local rank kH given in the form (3.12). Let pJ be the depth of the block cluster tree TJ . The computation of Xz ´ …TJ ;V;W X by Algorithm 12 requires not more than X X 3Csp kH .pJ C 1/ k t2 C ls2 t2T
s2TJ
operations. If K, L are .Cbn ; ˛; ˇ; r; /-bounded and if T , TJ are .Crc ; ˛; ˇ; r; /regular, the number of operations is in O.kH .pJ C 1/.n C nJ /.˛ C ˇ/r /. X
Ab Bb
Ayb Byb Kt
t kH L s s
kH
Figure 5.2. Conversion of a hierarchical matrix by block forward transformations applied to the factors Ab and Bb .
174
5 Orthogonal cluster bases and matrix projections
y Proof. Let b D .t; s/ 2 LC J . According to Lemma 5.6, the computation of Ab requires not more than X kr2 2kH r2sons .t/
operations, and the computation of Byb can be accomplished in not more than X 2kH lr2 r2sons .s/
operations. Using these auxiliary matrices, Sb can be constructed in not more than 2.#K t /.#Ls /kH 2k t ls kH kH .k t2 C ls2 / operations, and the handling of the block b requires not more than a total of X X 3kH kr2 C 3kH lr2 r2sons .t/
r2sons .s/
operations. Definition 3.12 implies level.t / level.b/ pJ ; level.s/ level.b/ pJ
for all b D .t; s/ 2 TJ ;
and summing over all blocks b D .t; s/ 2 LC J yields X X 3kH kr2 r2sons .t/
bD.t;s/2LC J
X
D 3kH
X
X
kr2
t2T s2rowC .t/ r2sons .t/ level.t/pJ
X
X
t2T level.t/pJ
r2sons .t/
3Csp kH
D 3Csp kH
D 3Csp kH
X
X
r2T
t2pred.r/ level.t/pJ
X
kr2
kr2
.minflevel.r/; pJ g C 1/kr2
r2T
3Csp kH .pJ C 1/
X
kr2
r2T
for the first term and X bD.t;s/2LC J
3kH
X r2sons .t/
lr2 3Csp kH .pJ C 1/
X
lr2
r2TJ
for the second. Using Lemma 3.45 and Lemma 3.48 completes the proof.
5.3 Cluster operators
175
As the storage complexity for a hierarchical matrix is O.kH .pJ C 1/.n C nJ //, the number of operations required for the conversion into an H 2 -matrix is approximately proportional to the amount of storage required to represent the input data.
5.3 Cluster operators We have seen that we can convert dense and hierarchical matrices into the H 2 -matrix representation efficiently once we have found suitable orthogonal cluster bases. The efficient conversion of an H 2 -matrix X into an H 2 -matrix with different cluster bases is a slightly more challenging task: let VX and WX be nested cluster bases, let KX and LX be the corresponding rank distributions, and let X 2 H 2 .TJ ; VX ; WX / be given in H 2 -matrix representation X X XD VX;t SX;b WX;s C t Xs bD.t;s/2L J
bD.t;s/2LC J
(cf. (3.17)) with coupling matrices .SX;b /b2LC . We are looking for an approximation J
of X in the space H 2 .TJ ; V; W /. Applying the same arguments as in the previous section yields that Sb D V t VX;t SX;b WX;s Ws (5.5) is the best coefficient matrix for an admissible block b D .t; s/ 2 LC J . By introducing the auxiliary matrices PV;t ´ V t VX;t 2 RK t KX;t
and
PW;s ´ Ws WX;s 2 RLs LX;s ;
(5.6)
we can write equation (5.5) in the form : Sb D PV;t SX;b PW;s
(5.7)
This expression is well-suited for efficient computations, since it only involves products of matrices with dimensions #K t ,#KX;t ,#Ls and #LX;s , which we can assume to be small. Definition 5.10 (Cluster operator). Let K1 D .K1;t / t2T and K2 D .K2;t / t2T be rank distributions. Let P D .P t / t2T be a family of matrices satisfying P t 2 RK1;t K2;t for all t 2 T . Then P is called a cluster operator for K1 and K2 . We can see that PV ´ .PV;t / t2T is a cluster operator for K and KX , while PW ´ .PW;s /s2TJ is one for L and LX . Equation (5.7) allows us to compute the best approximation of X in the space H 2 .TJ ; V; W / efficiently if PV and PW are given. Therefore we now investigate the general problem of constructing the matrices from (5.6) efficiently.
176
5 Orthogonal cluster bases and matrix projections
Definition 5.11 (Cluster basis product). Let V1 D .V1;t / t2T and V2 D .V2;t / t2T be nested cluster bases with rank distributions K1 D .K1;t / t2T and K2 D .K2;t / t2T . The cluster operator P D .P t / t2T defined by P t ´ V1;t V2;t
for all t 2 T is called the cluster basis product of V1 and V2 . According to (5.6), PV is the cluster basis product of V and VX , and PW is the cluster basis product of W and WX . Constructing the cluster basis product for V1 and V2 directly would mean computing P t D V1;t V2;t , which requires O..#K1;t /.#K2;t /#tO/ operations, for all t 2 T , and the resulting algorithm would have non-optimal complexity. In order to reach the optimal complexity, we have to rely on the fact that V1 and V2 are nested: equation (3.15) implies X X P t D V1;t V2;t D V1;t 0 E1;t 0 V2;t 0 E2;t 0 (5.2)
D
X t 0 2sons.t/
t 0 2sons.t/
t 0 2sons.t/
X
E1;t 0 V1;t 0 V2;t 0 E2;t 0 D
E1;t 0 P t 0 E2;t 0
t 0 2sons.t/
for all t 2 T with sons.t / ¤ ;, therefore we can compute the entire family P by the recursive approach given in Algorithm 13. Algorithm 13. Compute P t ´ V1;t V2;t for all t 2 T .
procedure ClusterBasisProduct(t, V1 , V2 , var P ); if sons.t/ D ; then Pt V1;t V2;t else 0; Pt for t 0 2 sons.t / do ClusterBasisProduct(t 0 , V1 , V2 , P ); Pt P t C E1;t 0 P t 0 E2;t 0 end for end if The complexity analysis of Algorithm 13 involves the two rank distributions K1 and K2 for the same cluster tree. We simplify the proof by introducing an auxiliary rank distribution: Lemma 5.12 (Maximum of rank distributions). Let 2 N, and let K1 ; : : : ; K be .Cbn ; ˛; ˇ; r; /-bounded rank distributions for the cluster tree T . The rank distribution Ky ´ .Kyt / t2T defined by Kyt ´ f1; : : : ; maxf#K1;t ; : : : ; #K;t gg
for all t 2 T
177
5.3 Cluster operators
is .Cbn ; ˛; ˇ; r; /-bounded and satisfies #Kyt D maxf#K;t W 2 f1; : : : ; gg. We call Ky the maximum rank distribution for K1 ; : : : ; K . Proof. In order to prove that Ky is .Cbn ; ˛; ˇ; r; /-bounded, we have to investigate the cardinality of the sets D` ´ ft 2 T W #Kyt > .˛ C ˇ.` 1//r g for all ` 2 N0 . We introduce D;` ´ ft 2 T W #K;t > .˛ C ˇ.` 1//r g for all 2 f1; : : : ; g and all ` 2 N0 . Let t 2 T . If t 2 D` , we have maxf#K;t W 2 f1; : : : ; gg D #Kyt > .˛ C ˇ.` 1//r ; and so we can find at least one 2 f1; : : : ; g with #K;t > .˛ C ˇ.` 1//r , i.e., t 2 D;` . We have proven [ D;` D` D1
and therefore also #D`
X D1
#D;`
X
Cbn ` c D Cbn ` c :
D1
The maximum rank distribution allows us to apply the crucial Lemma 3.45 also in situations involving multiple cluster bases: since #K1;t and #K2;t can be bounded by #Kyt for all t 2 T , we also have k1;t ; k2;t kO t for all t 2 T , where kO t is defined as y and therefore can express complexity estimates in terms of .kO t / t2T in (3.16) for K, and use Lemma 3.45. Lemma 5.13 (Cluster basis product). Let V1 D .V1;t / t2T and V2 D .V2;t / t2T be nested cluster bases with rank distributions K1 D .K1;t / t2T and K2 D .K2;t / t2T . Let .k1;t / t2T and .k2;t / t2T be defined as in (3.16). The computation of the cluster basis product P D .P t / t2T by Algorithm 13 requires not more than X 3 3 2 .k1;t C k2;t / t2T
operations. If K1 and K2 are .Cbn ; ˛; ˇ; r; /-bounded and if T is .Crc ; ˛; ˇ; r; /regular, the number of operations is in O.n .˛ C ˇ/2r /. K t;1
Proof. Let t 2 T . If t is a leaf, the algorithm multiplies the adjoint of V1;t 2 RtO K t;2
and V2;t 2 RtO
, and this requires not more than 2.#tO/.#K1;t /.#K2;t / operations.
178
5 Orthogonal cluster bases and matrix projections
If t is not a leaf, there are two ways of computing the product E1;t 0 P t 0 E2;t 0 : y y we can first compute P D P t 0 E2;t 0 and then E1;t 0 P , or we can start with Py D y y E1;t 0 P t 0 and then compute P E2;t 0 . We consider only the first case. Finding P requires at most 2.#K1;t 0 /.#K2;t 0 /.#K2;t / operations, the second product takes at most 2.#K1;t /.#K1;t 0 /.#K2;t / operations, and the total number of operations for this cluster is bounded by X 2.#K1;t 0 /.#K2;t 0 /.#K2;t / C 2.#K1;t /.#K1;t 0 /.#K2;t / t 0 2sons.t/
X
t 0 2sons.t/
X
.#K1;t 0 /2 C
X
2 #K
1;t 0
.#K2;t 0 /2 #K2;t C 2.#K1;t /k1;t .#K2;t /
t 0 2sons.t/
C
t 0 2sons.t/
X
2 #K2;t C 2.#K1;t /k1;t .#K2;t /
#K
2;t 0
t 0 2sons.t/
2 2 2 2 2 3 3 k1;t k2;t C k2;t k2;t C 2k1;t k2;t D .3k1;t C k2;t /k2;t 2.k1;t C k2;t /:
In the last step, we have used the inequality 3x 2 y D 2x.2xy/ x 2 y 2x.x 2 C y 2 / x 2 y D 2x 3 C y.2xy/ x 2 y 2x 3 C y.x 2 C y 2 / x 2 y D 2x 3 C y 3
for all x; y 2 R0 :
(5.8)
Let now K1 and K2 be .Cbn ; ˛; ˇ; r; /-bounded and let T be .Crc ; ˛; ˇ; r; /-regular. By Lemma 5.12, we can find a rank distribution .Kyt / t2T with #kO t maxfk1;t ; k2;t g that is .2Cbn ; ˛; ˇ; r; /-bounded. Our upper bound takes the form X X 3 3 kO t3 ; .k1;t C k2;t /4 2 t2T
t2T
and we can apply Lemma 3.45 and Lemma 3.48 again to complete the proof. Algorithm 14. Convert an H 2 -matrix into an H 2 -matrix. procedure ConvertH2(SX , V , W , var S); ClusterBasisProduct(root.T /, V , VX , PV ); ClusterBasisProduct(root.TJ /, W , WX , PW ); for b D .t; s/ 2 LC J do PV;t SX;b PW;s Sb end for Lemma 5.14 (Conversion of H 2 -matrices). Let V and W be nested orthogonal cluster bases with rank distributions K and L, let VX and WX be nested cluster bases with rank distributions KX and LX . Let X 2 H 2 .TJ ; VX ; WX / be a matrix in H 2 -matrix
5.3 Cluster operators
179
representation. Let .k t / t2T , .kX;t / t2T , .ls /s2TJ and .lX;s /s2TJ be defined as in (3.16) and (3.18). Algorithm 14 requires not more than X X 3 3 maxfk t3 ; kX;t g C 2.Csp C 2/ maxfls3 ; lX;s g 2.Csp C 2/ s2TJ
t2T
operations. If K, KX , L and LX are .Cbn ; ˛; ˇ; r; /-bounded and if T and TJ are .Crc ; ˛; ˇ; r; /-regular, the number of operations is in O..˛ C ˇ/2r .n C nJ //. y be the maxProof. Let Ky be the maximum rank distribution of K and KX , and let L O O imum rank distribution of L and LX . Let .k t / t2T and .ls /s2TJ be defined as in y respectively. This implies kO t D maxfk t ; kX;t g and (3.16) and (3.18) for Ky and L, lOs D maxfls ; lX;s g for all t 2 T and s 2 TJ . Let b D .t; s/ 2 LC J . We can compute Sb in two ways depending on whether or PV;t first. Here we restrict the investigation we perform the multiplication by PW;s to the first case. Computing SX;b PW;s requires 2.#KX;t /.#LX;s /.#Ls / 2kX;t lX;s ls operations, and subsequently computing the matrix Sb D PV;t .SM;b PW;s / requires 2.#K t /.#KX;t /.#Ls / 2k t kX;t ls operations. This means that the number of operations for one admissible block can be bounded by 2kX;t lX;s ls C 2k t kX;t ls 2kO t lOs2 C 2kO t2 lOs : Due to (5.8), we have x 2 y C xy 2
1 .2x 3 C y 3 C 2y 3 C x 3 / D x 3 C y 3 3
for all x; y 2 R0
(5.9)
and can bound the number of operations for one admissible block by 2.kO t lOs2 C kO t2 lOs / 2.kO t3 C lOs3 /:
(5.10)
Once more we rely on Csp -sparsity of the block cluster tree TJ in order to get the bound X X X X X kO t3 C 2 lOs3 2.kO t3 C lOs3 / D 2 bD.t;s/2LC J
t2T s2rowC .t/
2Csp
X
kO t3 C 2Csp
t2T
s2TJ t2colC .s/
X
lOs3
s2TJ
for the number of operations. Due to Lemma 5.13, the preparation of .PV;t / t2T and .PW;s /s2TJ requires not more than X X 4 kO t3 C 4 lOs3 t2T
s2TJ
180
5 Orthogonal cluster bases and matrix projections
operations, and adding both estimates yields our result. Let now K, KX , L and LX be .Cbn ; ˛; ˇ; r; /-bounded, and let T and TJ be y are .2Cbn ; ˛; ˇ; r; /-bounded by Lemma 5.12, .Crc ; ˛; ˇ; r; /-regular. Then Ky and L and we can once again apply the Lemmas 3.45 and 3.48 to complete the proof. Remark 5.15 (Conversion of submatrices). In many applications the cluster operators .PV;t / t2T and .PW;s /s2TJ are given and therefore do not have to be computed by Algorithm 13. In this situation, (5.10) yields the bound 2
X
.kO t3 C lOs3 / 2Csp
bD.t;s/2TJ
X
3 maxfk t3 ; kX;t g C 2Csp
t2T
X
3 maxfls3 ; lX;s g
s2TJ
for the number of operations of the projection into the H 2 -matrix space corresponding to the cluster bases V and W .
5.4 Orthogonalization Obviously, orthogonality is a very desirable property of a cluster basis: it ensures that no redundant columns are present and it facilitates the computation of matrix projections. We therefore now investigate an algorithm for constructing orthogonal cluster bases from general ones. If we do not require the resulting cluster basis to be nested, this is a straightforward task: we can simply compute the Householder factorizations Q t R t of the cluster basis matrices V t and use the orthogonal matrices Q t to construct the desired cluster basis. Lemma 5.16 (Householder factorization). Let 0 . Let X 2 RJ 0 . Let p ´ 0 0 minf# ; #Jg. There are an index set K with #K D p, an orthogonal matrix and a matrix R 2 RKJ satisfying Q 2 RK 0 X D QR: Proof. Let m ´ # 0 and n ´ #J. Let m W f1; : : : ; mg ! 0 and n W f1; : : : ; ng ! J be arbitrary bijective mappings. Let Xy 2 Rmn be defined by .Xy / D Xm ./;n ./ for all 2 f1; : : : ; mg and all 2 f1; : : : ; ng. We apply p ´ minfm; ng Householder transformations [48], Section 5.2, to the y 2 Rmm and an upper triangular matrix Xy in order to find an orthogonal matrix Q mn y2R y R. y Since R y is upper triangular, its rows p C 1; : : : ; m matrix R with Xy D Q are zero.
5.4 Orthogonalization
181
We let K ´ m .f1; : : : ; pg/ 0 and define Q 2 RK and R 2 RKJ by 0 ´ Qik D
y 1 1 Q m .i/;m .k/ 0
if i 2 0 ; otherwise;
y 1 Rkj D R m .k/;1 n .j /
for i 2 , k 2 K and j 2 J. Let i 2 and j 2 J. If i 62 0 , we obviously have .QR/ij D 0 D Mij . If i 2 0 , we find .QR/ij D
X k2K
Qik Rkj D
p X
y 1 1 R y D Xy1 D Xij : Q 1 1 m .i/;m .k/ 1 m .k/;n .j / m .i/;n .j /
kD1
Lemma 5.17. There is a constant Cqr 2 N such that Algorithm 15 requires not more than Cqr .# 0 /nJ minf# 0 ; #Jg operations. Proof. See Section 5.2 in [48]. Algorithm 15. Householder factorization QR D X of a matrix X 2 RJ with an 0 K KJ orthogonal matrix Q 2 R0 and a weight matrix R 2 R . procedure Householder(X , 0 , var Q, R, K) m # 0 ; n #J; p minfm; ng; Fix arbitrary isomorphisms m W f1; : : : ; mg ! 0 and n W f1; : : : ; ng ! J; Xy 0 2 Rmn for i 2 f1; : : : ; mg, j 2 f1; : : : ; ng do Xyij Xm .i/;n .j / end for; y 2 Rmm and an upper triangular matrix Compute an orthogonal matrix Q y R; y y 2 Rmn with Xy Q R K
m .f1; : : : ; pg/; ; R 0 2 RKJ ; Q 0 2 RK 0 for i 2 f1; : : : ; mg, k 2 f1; : : : ; pg do y ik Qm .i/;m .k/ Q end for; for k 2 f1; : : : ; pg, j 2 f1; : : : ; ng do ykj Rm .k/;n .j / R end for
Using this lemma and the corresponding Algorithm 15, constructing an orthogonal cluster basis Q D .Q t / t2T with rank distribution L D .L t / t2T satisfying #L t #K t and range.V t / range.Q t / for all t 2 T is straightforward: let t 2 T . Let
182
5 Orthogonal cluster bases and matrix projections
l t ´ minf#tO; k t g. Using Lemma 5.16, we can find an index set L t with #L t D l t , an t orthogonal matrix Q t 2 RtL and a matrix R t 2 RL t K t satisfying V t D Q t R t . O The family Q D .Q t / t2T constructed in this way is an orthogonal cluster basis with rank distribution L D .L t / t2T , and due to V t D Q t R t D Q t .Qt Q t /R t D Q t Qt V t ;
(5.11)
it can be used to express anything that can be expressed by V t , and R D .R t / t2T is the cluster operator describing the change of basis from V to Q. We have already seen that only nested cluster bases allow us to reach the optimal efficiency, therefore we are interested in finding an algorithm that turns a nested cluster basis V into an orthogonal nested cluster basis Q. For leaf clusters, we can still use the Householder factorization directly, but for the remaining clusters we have to find a way to work with transfer matrices instead of V t and Q t . Let t 2 T with ´ # sons.t / > 0. We denote the sons of t by ft1 ; : : : ; t g D L 0 sons.t/. We assume that the matrices Q t 0 2 RtO0 t , index sets L t 0 and cluster operator matrices R t 0 2 RL t 0 K t 0 with V t 0 D Q t 0 Rr 0 have been constructed for all t 0 2 sons.t /. Without loss of generality, we can also assume that the index sets fL t 0 W t 0 2 sons.t /g are pairwise disjoint. Using the nested structure of V , this implies 1 0 R t1 E t1 X X
B C Vt D Vt 0 Et 0 D Q t 0 R t 0 E t 0 D Q t1 : : : Q t @ ::: A : t 0 2sons.t/ t 0 2sons.t/ R t E t We introduce the auxiliary matrices 1 R t1 E t1 C B and Vyt ´ @ ::: A 2 RM t K t R t E t 0
U t ´ Q t1
:::
t Q t 2 RM tO
for the new index set Mt ´
[
(5.12)
Lt 0 :
t 0 2sons.t/
The matrix Vyt contains the coefficients required to express the matrix V t in terms of the cluster basis matrices Q t1 ; : : : ; Q t , and U t simply describes the transformation from these coefficients back to R , so that we have V t D U t Vyt (cf. Figure 5.3). Due to Lemma 5.2, we find 1 0 1 0 Q t1 Q t1 Q t1 : : : Qt1 Q t
B C C B :: :: U t U t D @ ::: A Q t1 : : : Q t D @ ::: ADI : : Q t Q t Q t1 : : : Q t Q t
183
5.4 Orthogonalization Vt tO
V t1
tO1
V t2 E t1
Q t1 E t2
Q t2 R t1 E t 1
Ut Vyt
R t2 E t2
tO2
Kt
K t1
K t2
L t1
Mt
L t2
Figure 5.3. Construction of the representation V t D U t Vyt used in the orthogonalization algorithm.
and conclude that U t is an orthogonal matrix. The coefficient matrix Vyt has only [ X #M t D # Lt 0 D #L t 0 t 0 2sons.t/
t 0 2sons.t/
rows and #K t columns, therefore we can compute its Householder decomposition y t Rt Vyt D Q y t 2 RM t L t is an orthogonal matrix. Since efficiently using Algorithm 15, where Q y t , and we observe y t are orthogonal, so is Q t ´ U t Q U t and Q y t Rt D Qt Rt ; V t D U t Vyt D U t Q i.e., we have found a Householder decomposition of V t . y t jL 0 L t , observe For all t 0 2 sons.t /, we let F t 0 ´ Q t 0 1 F t1 B y t D @ :: C Q : A
(5.13)
F t and conclude y t D Q t1 Qt D Ut Q
1 F t1
B C Q t @ ::: A D 0
:::
F t
X
Qt 0 Ft 0 ;
(5.14)
t 0 2sons.t/
therefore the new orthogonal cluster basis is nested indeed. This construction is summarized in Algorithm 16. Lemma 5.18 (Complexity of orthogonalization). Let V be a nested cluster basis with rank distribution K, and let .k t / t2T be defined as in (3.16). Algorithm 16 requires not more than X X .Cqr C 2/ k t2 #K t .Cqr C 2/ k t3 t2T
t2T
184
5 Orthogonal cluster bases and matrix projections
operations. If K is .Cbn ; ˛; ˇ; r; /-bounded and if T is .Crc ; ˛; ˇ; r; /-regular, the number of operations is in O..˛ C ˇ/2r n /. Algorithm 16. Orthogonalize a cluster basis. procedure Orthogonalize(t, V , var Q, R, L) if sons.t/ D ; then Householder(V t , tO, Q t , R t , L t ) else Mt ;; for t 0 2 sons.t / do Orthogonalize(t 0 , V , Q, R, L); P Lt 0 Mt [ Mt end for; Vyt 0 2 RM t K t ; 0 for t 2 sons.t / do Rt 0 Et 0 Vyt jL t 0 K t end for; y t , R t , L t ); Householder(Vyt , M t , Q 0 for t 2 sons.t / do y t jL 0 L t Ft 0 Q t end for end if
fAlgorithm 15g
fAlgorithm 15g
t Proof. Let t 2 T . If t is a leaf, we apply Algorithm 15 to the matrix V t 2 RK , and tO according to Lemma 5.17, this takes not more than
Cqr .#tO/.#K t / minf#tO; #K t g Cqr .#tO/.#K t /2 Cqr k t2 #K t operations. If t is not a leaf, we have to compute R t 0 E t 0 for all t 0 2 sons.t /, using not more than 2.#L t 0 /.#K t 0 /.#K t / 2.#K t 0 /2 .#K t / operations per son and X .#K t 0 /2 2.#K t / 2.#K t / t 0 2sons.t/
2
X
#K t 0
2.#K t /k t2
t 0 2sons.t/
for all sons. Due to #L t 0 #K t 0 , we have X #M t D #L t 0 t 0 2sons.t/
X
#K t 0 k t ;
t 0 2sons.t/
and so applying Algorithm 15 to Vyt 2 RM t K t takes not more than Cqr k t .#K t / minfk t ; #K t g Cqr k t .#K t /2 Cqr k t2 #K t
(5.15)
5.5 Truncation
185
operations due to Lemma 5.17. We conclude that the number of operations for one cluster t 2 T is bounded by .Cqr C 2/k t2 #K t , and summing over all clusters completes the proof. If K is .Cbn ; ˛; ˇ; r; /-bounded and if T is .Crc ; ˛; ˇ; r; /-regular, we can again use the Lemmas 3.45 and 3.48 to bound the asymptotic complexity.
5.5 Truncation The orthogonalization Algorithm 16 ensures that Q t is orthogonal and satisfies V t D Q t R t . By construction, the rank of Q t is always equal to minfk t ; #K t g, in particular it can be significantly larger than the rank of V t . Since the rank distribution determines the algorithmic complexity of most algorithms, we would like to reduce the rank as far as possible, i.e., to modify Algorithm 16 in such a way that it ensures #L t rank.V t /. In practical algorithms, computing the true rank of V t can be impossible due to rounding errors, therefore we prefer to look for an orthogonal cluster basis Q with rank distribution L which can be used to approximate the original cluster basis V D .V t / t2T , i.e., which satisfies minfkV t Q t Z t k2 W Z t 2 RL t K t g D kV t Q t Qt V t k2
(5.16)
for a given tolerance 2 R>0 and all t 2 T . We call the process of constructing an orthogonal cluster basis which approximates the original basis up to a certain error truncation and refer to the new basis Q satisfying a condition of the type (5.16) as the truncated cluster basis. In order to reach optimal efficiency, we have to look for approximations of low rank. Approximations of this type can be constructed by the singular value decomposition: Lemma 5.19 (Singular value decomposition). Let 0 . Let X 2 RJ 0 . Let p ´ rank.X/ minf# 0 ; #Jg with p > 0. Let 1 p > 0 be the non-zero singular values of X . For each l 2 f1; : : : ; pg, we can find an index set K 0 with #K D l, an orthogonal matrix Q 2 RK and a matrix R 2 RKJ which satisfy 0 ´ lC1 if l < p; kX QRk2 D (5.17a) 0 otherwise; ´ P
1=2 p 2 if l < p; iDlC1 i kX QRkF D (5.17b) 0 otherwise: The orthogonality of Q implies R D Q X . Proof. Let m ´ # 0 and n ´ #J. Let m , n and Xy be defined as in the proof of Lemma 5.16.
186
5 Orthogonal cluster bases and matrix projections
Due to Theorem 2.5.2 in [48], there is a singular value decomposition of Xy , i.e., there are orthogonal matrices Uy 2 Rmm and Vy 2 Rnn such that Uy Xy Vy D diag.1 ; : : : ; q / 2 Rmn holds for the family 1 2 q 0, q ´ minfm; ng of singular values. Since Uy and Vy are invertible, we have p q and pC1 D D q D 0. y ´ Uy jml , y 2 Rml by setting Q Let now l 2 f1; : : : ; pg. We define the matrix Q y y y i.e., by using the first l columns of U to define the columns of Q. Since U is orthogonal, y the same holds for Q. y implies The definition of Q y Xy D .Uy Xy /jln D .Uy Xy Vy /jln Vy D diagmn .1 ; : : : ; q /jln Vy Q D diagln .1 ; : : : ; l /Vy and therefore
yQ y Xy D Uy diagmn .1 ; : : : ; l ; 0; : : : ; 0/Vy : Q
Since Uy and Vy are orthogonal matrices, we find yQ y Xk y 2 kXy Q D kUy diagnk .1 ; : : : ; q /Vy Uy diagnk .1 ; : : : ; l ; 0; : : : ; 0/Vy k2 D k diagnk .1 ; : : : ; q / diagnk .1 ; : : : ; l ; 0; : : : ; 0/k2 D k diagnk .0; : : : ; 0; lC1 ; : : : ; p /k2 ´ lC1 if l < p D 0 otherwise:
(5.18)
This is the desired error bound for the spectral norm. The error bound for the Frobenius norm follows by a similar argument. We let K ´ m .f1; : : : ; lg/ and define Q 2 RK 0 by setting ´ y 1 1 if i 2 0 ; Q m .i/;m .k/ Qik ´ 0 otherwise; for all i 2 and k 2 K and see that the error bound (5.18) is equivalent to (5.17a). The Algorithm 18 follows the structure of the proof of Lemma 5.19 in order to compute index sets K 0 with #K rank.X / and orthogonal matrices Q 2 satisfying (5.17a) or (5.17b), depending on which strategy of the ones given in RK 0 Algorithm 17 is used to determine the rank. We use a minor modification: we employ Householder factorizations in order to ensure that the singular value decomposition has only to be computed for a square matrix of dimension q D minfn; mg.
5.5 Truncation
187
Algorithm 17. Find the optimal rank l for approximating a matrix with singular values 1 > > q up to an accuracy of . z k2 g procedure FindRankEuclAbs ( , q, .i /qiD1 , var l); fkM M l 0; while l < q and lC1 do l l C1 end while procedure FindRankEuclRel ( , q, .i /qiD1 , var l); l 1; while l < q and lC1 1 do l l C1 end while
z k2 kM k2 g fkM M
procedure FindRankFrobAbs ( , q, .i /qiD1 , var l); l q; ı 0; while l > 0 and ı C l2 2 do l 1 ı ı C l2 ; l end while
z kF g fkM M
procedure FindRankFrobRel ( , q, .i /qiD1 , var l); 0; for i 2 f1; : : : ; qg do C i2 end for; l q; ı 0; while l > 0 and ı C l2 2 do ı ı C l2 ; l l 1 end while
z kF kM kF g fkM M
Remark 5.20 (Complexity of the SVD). A singular value decomposition is typically computed by an iteration, therefore the number of iteration steps depends on the desired precision. Due to [48], Section 5.4.5, there is a constant Csvd 2 N such that the computation of the decomposition of the matrix Yy 2 Rqq to an arbitrary, but fixed, machine precision requires not more than Csvd q 3 operations. Lemma 5.21. There is a constant Cpr 2 N such that Algorithm 18 requires not more than Cpr .# 0 /nJ minf# 0 ; #Jg operations. Proof. Let m ´ # 0 , n ´ #J and q ´ minfm; ng. yR y of If m > n holds, we construct Yy by finding the Householder factorization Q the m n matrix Xy . According to [48], Section 5.2, this can be accomplished in not more than Cqr mn2 D Cqr mnq operations.
188
5 Orthogonal cluster bases and matrix projections
Algorithm 18. Construct an approximative factorization QR X of a matrix X 2 RJ with an orthogonal matrix Q 2 RK and a weight matrix R 2 RKJ . 0 0 procedure Lowrank(X, 0 , , var Q, R, K) m # 0 ; n #J; q minfm; ng; Fix arbitrary isomorphisms m W f1; : : : ; mg ! 0 and n W f1; : : : ; ng ! J; Xy 0 2 Rmn ; for i 2 f1; : : : ; mg, j 2 f1; : : : ; ng do Xyij Xm .i/;n .j / end for; if m > n then yR y of Xy ; Compute a Householder factorization Xy D Q y 2 Rqq Yy R else yR y of Xy ; Compute a Householder factorization Xy D Q qq y y Y R 2R end if; Compute a singular value decomposition Yy D Uy diag.1 ; : : : ; q /Vy of Yy ; if m > n then y Uy Uy Q end if; FindRank( , q, .i /qiD1 , l); fcf. Algorithm 17g K
m .f1; : : : ; lg/; Q 0 2 RK ; for i 2 f1; : : : ; mg, k 2 f1; : : : ; lg do Qm .i/;m .k/ Uyik end for; R Q X 2 RKJ
yR y of the Otherwise, we construct Yy be finding the Householder factorization Q y n m matrix X . Again according to [48], Section 5.2, this requires not more than Cqr nm2 D Cqr mnq operations. Due to Remark 5.20, we know that finding the singular values and left singular vectors of the matrix Yy 2 Rqq up to machine precision requires not more than Csvd q 3 operations. If m > n holds, an additional multiplication is needed to construct Uy , this takes not more than 2mq 2 operations. Filling the matrix R takes not more than 2q nm operations, so our claim holds with the constant Cpr D .Cqr C Csvd C 4/. Remark 5.22 (Optimality of low-rank projections). Due to the optimality result [48], Theorem 2.5.3, the low-rank projection constructed by Algorithm 18 satisfies kX QRk2 kX ABk2
for all A 2 RK ; B 2 RKJ ; 0
5.5 Truncation
189
i.e., the projection computed by using the singular value decomposition is at least as good as any other low-rank projection. In this sense, Algorithm 18 provides us with an approximation with minimal rank #K. We can apply Algorithm 18 to the cluster basis matrices V t in order to find an orthogonal cluster basis matrix Q t satisfying (5.16) for any given precision 2 R>0 by ensuring that all but the first #L t singular values of V t are below . Picking the minimal rank with this property yields the optimal orthogonal cluster basis matrix t Q t 2 RtL for the given tolerance . O If V is nested, we again have to ensure that the new cluster basis Q has the same property. Let t 2 T be a cluster with ´ # sons.t / > 0. We denote the sons of t L 0 by ft1 ; : : : ; t g D sons.t /. We assume that orthogonal matrices Q t 0 2 RtO0 t , index sets L t 0 and cluster operator matrices R t 0 D Qt 0 V t 0 2 RL t 0 K t 0 have already been constructed for all t 0 2 sons.t /. Without loss of generality, we can also assume that the index sets fL t 0 W t 0 2 sons.t /g are pairwise disjoint. Since we want the new cluster basis to be nested, there have to be transfer matrices F t 0 for all t 0 2 sons.t / satisfying 0 1 Ft X
B :1C yt : Q t 0 F t 0 D Q t1 : : : Q t @ :: A D U t Q (5.19) Qt D „ ƒ‚ … t 0 2sons.t/ F t DU t „ ƒ‚ … yt DQ
This means that Q can only be nested if the range of the matrix Q t is a subspace of the range of the matrix U t . Since U t is orthogonal, the best approximation of V t in the range of U t is given by 0 1 Q t1 V t B :: C x Vt ´ Ut Ut Vt D Ut @ : A ; Qts V t
and Lemma 5.2 and (3.15) imply X Qti V t D Qti
t 0 2sons.t/
for all i 2 f1; : : : ; g, so we find
V t 0 E t 0 D Qti V ti E ti D R ti E ti
0
1 R t1 E t1 B C Vxt D U t @ ::: A D U t Vyt : R t E t „ ƒ‚ … DVyt
(5.20)
190
5 Orthogonal cluster bases and matrix projections
Here we can see the fundamental difference between the orthogonalization algorithm and the truncation algorithm: since the former is based on equation (5.11), no information is lost when switching from the original basis to the orthogonal one and we have V t D U t Vyt . In the truncation algorithm, this equality no longer holds and we have to replace V t by its projection Vxt D U t U t V t into the space prescribed by the sons of t . In order to simplify the presentation, we extend the notation Vxt to leaf clusters by setting ´ U t U t V t if sons.t / ¤ ;; (5.21) Vxt ´ otherwise; Vt for all t 2 T . We can now proceed as in the case of the orthogonalization algorithm: instead of computing a Householder factorization of Vyt , we use a low-rank approximation of the M t K t -matrix Vyt , which is provided efficiently by Algorithm 18, and observe that y t and by extension to an orthogonal it gives rise to an orthogonal matrix Q t ´ U t Q cluster basis. The entire recursive procedure is given in Algorithm 19. Algorithm 19. Truncate a cluster basis. procedure Truncate(t, V , , var Q, R, L) if sons.t/ D ; then Lowrank(V t , tO, t , Q t , R t , L t ) else Mt ;; for t 0 2 sons.t / do Truncate(t 0 , V , , Q, R, L) P Lt 0 Mt [ Mt end for; Vyt 0 2 RM t K t ; 0 for t 2 sons.t / do Rt 0 Et 0 Vyt jL t 0 K t end for; y t , R t , L t ); Lowrank(Vyt , M t , t , Q 0 for t 2 sons.t / do y t jL 0 L t Ft 0 Q t end for end if
fAlgorithm 18g
fAlgorithm 18g
Lemma 5.23 (Complexity of truncation). Let V be a nested cluster basis with rank distribution K, and let .k t / t2T be defined as in (3.16). Algorithm 19 requires not more than X .Cpr C 2/ k t2 #K t t2T
5.5 Truncation
191
operations. If K is .Cbn ; ˛; ˇ; r; /-bounded and if T is .Crc ; ˛; ˇ; r; /-regular, the number of operations is in O..˛ C ˇ/2r n /. Proof. Let t 2 T . If sons.t / D ;, the algorithm constructs a low-rank projection for V t directly. According to Lemma 5.21, this requires not more than Cpr .#tO/.#K t /2 Cpr k t2 #K t operations. If sons.t/ ¤ ;, we form the matrix Vyt by computing R t 0 E t 0 for all sons t 0 of t , which requires 2.#L t 0 /.#K t 0 /.#K t / 2.#K t 0 /2 .#K t / operations for each t 0 2 sons.t /, i.e., a total of 2 X X 2.#K t / .#K t 0 /2 2.#K t / #K t 0 2.#K t /k t2 : t 0 2sons.t/
t 0 2sons.t/
According to Lemma 5.21, finding the low-rank projection for Vyt by Algorithm 18 requires not more than (5.15)
Cpr .#M t /.#K t /2 Cpr k t2 #K t operations, and we can proceed as in the proof of Lemma 5.18. Remark 5.24. The truncation algorithm is related to the well-known Tausch–White construction [102] of wavelets on manifolds: instead of looking for a wavelet basis with vanishing moments, we are interested in finding a cluster basis Q that approximates the original cluster basis V . If V has been constructed using a polynomial basis, both techniques will yield similar results. The advantage of the general truncation Algorithm 19 is that it also can be applied to general cluster bases, e.g., to the total cluster basis introduced in Chapter 6. Algorithm 19 guarantees the error bound (5.16) only for leaf clusters, since the construction in the remaining clusters involves the additional projection U t U t used to ensure a nested structure. We now analyze the combined error of this projection and the truncated singular value decomposition. In order to be able to reach the optimal result, we first have to establish that the errors introduced by the singular value decompositions are pairwise perpendicular. Lemma 5.25. Let V be a nested cluster basis. Let Q be an orthogonal nested cluster basis. For each t 2 T , we define the cluster-wise projection error D t ´ Vxt Q t Qt Vxt
(5.22)
where Vxt is given by (5.21). The equation Qt Ds D 0 holds for all t 2 T and s 2 sons .t /.
(5.23)
192
5 Orthogonal cluster bases and matrix projections
Proof. We start with t 2 T and s 2 sons .t / satisfying level.s/ level.t / D 0. This implies t D s, and we find Qt D t D Qt Vxt Qt Q t Qt Vxt D Qt Vxt Qt Vxt D 0: „ƒ‚… DI
Let ` 2 N0 . We assume that equation (5.23) holds for all t 2 T , s 2 sons .t / with level.s/ level.t / `. Let t 2 T and s 2 sons .t / with level.s/ level.t / D ` C 1. Since t ¤ s, there is a cluster t 2 sons.t / with s 2 sons .t /. The definition (5.22) implies s Ds D Ds , and since the sons of t correspond to disjoint subsets of tO, we have ´ s if t 0 D t ; t 0 s D (5.24) 0 otherwise; for all t 0 2 sons.t /, therefore we can apply (3.15) to Q and find X Qt Ds D F t0 Qt 0 Ds t 0 2sons.t/
D
X
F t0 Qt 0 t 0 s Ds D F t Qt s Ds D F t Qt Ds :
t 0 2sons.t/
Since t 2 sons.t /, we have level.t / D level.t / C 1. This implies level.s/ level.t / D `, so the induction assumption yields Qt Ds D 0, which concludes the induction step. Lemma 5.26 (Error orthogonality). Let V be a nested cluster basis. Let Q be an orthogonal nested cluster basis. Let .D t / t2T be defined as in (5.22). The equation D t Ds D 0
(5.25)
holds for all t; s 2 T with t ¤ s. Proof. Let t; s 2 T with t ¤ s. Due to symmetry, it is sufficient to restrict our attention to the case level.t / level.s/. Case 1: level.t / D level.s/. Lemma 3.8 implies tO \ sO D ; and therefore t s D 0, which allows us to conclude D t Ds D D t t s Ds D 0:
(5.26)
Case 2: level.t / < level.s/. Case 2a: s 62 sons .t /. Lemma 3.8 implies again tO \ sO D ; and we proceed as in (5.26). Case 2b: s 2 sons .t /. Due to level.t / < level.s/, we can infer ´ # sons.t / > 0. We let ft1 ; : : : ; t g ´ sons.t / and find y t Qt Vxt D U t Vyt U t Q y t Qt V t D t D Vxt Q t Qt Vxt D U t U t V t U t Q
5.5 Truncation
D Q t1 D
1 0 Rt Et
B 1: 1 C Q t @ :: A Q t1 R t E t
:::
X
:::
193
0 1 Ft
B :1C Q t @ :: A Q t V t F t
Q t 0 .R t 0 E t 0 F t 0 Qt V t /:
t 0 2sons.t/
By definition of sons .t /, there is a cluster t 2 sons.t / with s 2 sons .t /, and (5.24) yields X .R t 0 E t 0 F t 0 Qt V t / Qt 0 Ds D t Ds D t 0 2sons.t/
D
X
.R t 0 E t 0 F t 0 Qt V t / Qt 0 t 0 s Ds
t 0 2sons.t/
D .R t E t F t Qt V t / Qt Ds : Due to Lemma 5.25, we have Qt Ds D 0. Since any descendant r 2 sons .t / of a cluster t can be relevant to the error estimate, we require a method for investigating the interaction between t and all of its descendants. A straightforward approach is to introduce suitably generalized transfer matrices: Definition 5.27 (Long-range transfer matrices). Let V be a nested cluster basis with rank distribution K and transfer matrices .E t / t2T . For all t 2 T and r 2 sons .t /, we define the matrix Er;t 2 RKr K t by ´ Er;t 0 E t 0 if there is a t 0 2 sons.t / with r 2 sons .t 0 / Er;t ´ I otherwise, i.e., if t D r: Lemma 5.28 (Transitivity). Let t; s; r 2 T with s 2 sons .t / and r 2 sons .s/. Then we have Er;t D Er;s Es;t : (5.27) Proof. By induction on level.r/ level.t /. If level.r/ level.t / D 0, we have t D s D r and the statement is trivial. Let now n 2 N0 be such that (5.27) holds for all t; s; r 2 T with level.r/ level.t/ D n, s 2 sons .t / and r 2 sons .s/. Let t; s; r 2 T with level.r/ level.t / D n C 1, s 2 sons .t / and r 2 sons .s/. Case 1: s D t. This case is trivial, since we have Er;t D Er;s D Er;s Es;s D Er;s Es;t : Case 2: s ¤ t . Since s 2 sons .t /, there is a t 0 2 sons.t / with s 2 sons .t 0 /. Due to Lemma 3.7, r 2 sons .s/ implies r 2 sons .t 0 /, and Definition 5.27 yields Er;t D Er;t 0 E t 0 :
(5.28)
194
5 Orthogonal cluster bases and matrix projections
We have r 2 sons .s/, s 2 sons .t 0 / and level.r/ level.t 0 / D n, so we can apply the induction assumption to Er;t 0 and get Er;t D Er;t 0 E t 0 D Er;s Es;t 0 E t 0 ;
0
and s 2 sons .t / yields Es;t 0 E t 0 D Es;t , which concludes the induction. The long-range transfer matrices give an explicit representation of the approximation error: the error matrix is the sum of local error matrices, which are weighted by the long-range transfer matrices. Lemma 5.29 (Error decomposition). Let V be a nested cluster basis. Let Q be an orthogonal nested cluster basis. Let .Vxt / t2T be defined as in (5.21). Then the approximation error has the representation X V t Q t Qt V t D .Vxr Qr Qr Vxr /Er;t (5.29) r2sons .t/
for all t 2 T . Proof. By induction on # sons .t / 2 N. We start by considering t 2 T with # sons .t/ D 1. This implies sons.t / D ; and (5.29) follows directly from Definition 5.27 and (5.21). Let now n 2 N be such that (5.29) holds for all t 2 T with # sons .t / n. Let t 2 T with # sons .t / D n C 1. This implies ´ # sons.t / > 0. Let ft1 ; : : : ; t g ´ y t (cf. (5.14)) and find sons.t/. Since Q is nested, we have Q t D U t Q y t U t V t D Q y t .U t U t /U t V t D Qt Vxt ; Qt V t D Q
(5.30)
which together with (3.15) implies V t Q t Qt V t D V t Vxt C Vxt Q t Qt Vxt D V t U t Vyt C D t 1 0 R t1 E t1 X
B C D Dt C V t 0 E t 0 Q t1 : : : Q t @ ::: A t 0 2sons.t/ R t E t X .V t 0 Q t 0 Rr 0 /E t 0 D Dt C t 0 2sons.t/
X
D Dt C
.V t 0 Q t 0 Qt 0 V t 0 /E t 0 :
t 0 2sons.t/
Due to # sons .t 0 / < # sons .t / D n C 1, we can apply the induction assumption to get X X V t Q t Qt V t D D t C Dr Er;t 0 E t 0 t 0 2sons.t/ r2sons .t 0 /
D Dt C
X
X
t 0 2sons.t/ r2sons .t 0 /
which concludes the induction.
Dr Er;t D
X r2sons .t/
Dr Er;t ;
5.5 Truncation
195
In order to stress the similarities between leaf and non-leaf clusters and to simplify y t (cf. (5.12) and (5.13), the notation, we extend the definitions of the matrices Vyt and Q respectively) to the case of leaf clusters: let 1 80 E R ˆ t t 1 1 ˆ ˆ
for all t 2 T and all x 2 RK t . Proof. Let x 2 RJ and t 2 T . We can combine Lemma 5.29 with Lemma 5.26 in order to get X 2 kV t x Q t Qt V t xk22 D Dr Er;t x r2sons .t/
D
X
2
X
hDr1 Er1 ;t x; Dr2 Er2 ;t xi
r1 2sons .t/ r2 2sons .t/
D
X
kDr Er;t xk22
r2sons .t/
D
X
k.Vxr Qr Qr Vxr /Er;t xk22
r2sons .t/
for all t 2 T ; s 2 RK t . If sons.t / D ;, equation (5.32) is the same as (5.33).
(5.33)
196
5 Orthogonal cluster bases and matrix projections
If sons.t/ ¤ ;, we have Vxr D Ur Ur Vr D Ur Vyr ;
yr Qr D Ur Q
for all r 2 sons .t /
and observe yr Q y r Ur Ur Vyr /Er;t xk2 k.Vxr Qr Qr Vxr /Er;t xk2 D k.Ur Vyr Ur Q yr Q y r Vyr /Er;t xk2 for all r 2 sons .t /: D kUr .Vyr Q Since the matrices Ur are orthogonal, we get y r Vyr /Er;t xk2 D k.Vyr Q y r Vyr /Er;t xk2 ; yr Q yr Q k.Vxr Qr Qr Vxr /Er;t xk2 D kUr .Vyr Q and this is the desired equation. Corollary 5.31 (Spectral and Frobenius norm error). Let V be a nested cluster basis. Let Q be an orthogonal nested cluster basis. We define y t Vyt k2 yt Q 2;t ´ kVyt Q yt Q y t Vyt kF F ;t ´ kVyt Q
for all t 2 T ; for all t 2 T :
Then the cluster basis Q satisfies the estimates X 2 kV t Q t Qt V t k22 2;r kEr;t k22
for all t 2 T ;
(5.34a)
for all t 2 T :
(5.34b)
r2sons .t/
X
kV t Q t Qt V t k2F
F2 ;r kEr;t k22
r2sons .t/
for all t 2 T . Proof. Let K be the rank distribution of V . For the spectral norm estimate, we pick an arbitrary x 2 RK t and see that Theorem 5.30 implies k.V t Q t Qt V t /xk22 X yr Q y r Vyr /Er;t xk22 D k.Vyr Q r2sons .t/
X
y r Vyr /k22 kEr;t k22 kxk22 yr Q k.Vyr Q
r2sons .t/
D
X
r2sons .t/
and this implies (5.34a).
2 kEr;t k22 kxk22 ; 2;r
5.5 Truncation
197
For j 2 K t , we define the canonical unit vectors e j 2 RK t by eij D ıij . The square of the Frobenius norm of a matrix equals the sum of the squares of the norms of its columns, so we have X kV t e j Q t Qt V t e j k22 kV t Q t Qt V t k2F D j 2K t
X
D
X
y r Vyr /Er;t e j k22 yr Q k.Vyr Q
r2sons .t/ j 2K t
X
D
y r Vyr /Er;t k2F yr Q k.Vyr Q
r2sons .t/
X
yr Q y r Vyr k2F kEr;t k22 kVyr Q
r2sons .t/
X
D
F2 ;r kEr;t k22
r2sons .t/
and this is (5.34b). Remark 5.32. According to Lemma 5.19, we can control the quantities . 2;t / t2T and . F ;t / t2T by choosing the ranks .l t / t2T appropriately. In order to be able to apply Corollary 5.31, we require bounds for the long-range transfer matrices kEr;t k2 . If the original cluster basis V is constructed by interpolation (cf. Section 4.4), we have j.Er;t / j D jL t; . r; /j kL t; k1;Q t ƒm , i.e., each individual entry of the transfer matrix is bounded by the stability constant and we get kEr;t k2 kEr;t kF .#Kr /1=2 .#K t /1=2 ƒm D md ƒm . For variable-order interpolation and Taylor expansion, similar estimates can be derived. Remark 5.33 (Practical error control). In the situation of Remark 5.32, i.e., if there is a constant Clt 2 R>0 satisfying kEr;t k2 Clt for all t 2 T and r 2 sons .t /, we can choose O 2 R>0 and p 2 .0; 1/ and use t ´ O p level.t/
for all t 2 T
(5.35)
to determine the error tolerances used in the truncation Algorithm 19. According to Lemma 5.19 and Corollary 5.31, we get kV t Q t Qt V t k22
X
r2 Clt2 D Clt2
r2sons .t/
Clt2 O 2
p X `Dlevel.t/
p X
X
r2
`Dlevel.t/ r2sons .t/ level.r/D`
p 2` #fr 2 sons .t / W level.r/ D `g:
198
5 Orthogonal cluster bases and matrix projections
If we assume # sons.t / Cbc (cf. Definition 3.43 of bounded cluster trees) for all t 2 T , a simple induction yields `level.t/ ; #fr 2 sons .t / W level.r/ D `g Cbc
and assuming p 2 < 1=Cbc allows us to conclude kV t
Q t Qt V t k22
Clt2 O 2
p X
`level.t/ p 2` Cbc
`Dlevel.t/ p level.t/
D
Clt2 O 2 p 2 level.t/
X
.Cbc p 2 /`
(5.36)
`D0
< Clt2 O 2 p 2 level.t/
1 X
.Cbc p 2 /` D
`D0
Clt2 O 2 p 2 level.t/ : 1 Cbc p 2
This means that we can reach any accuracy by choosing O small enough. Remark 5.34 (Matrix error control). In practical applications, we are not interested in bounds for kV t Q t Qt V t k2 , i.e., the approximation error for the cluster basis, but in bounds for kX …TJ ;Q; ŒX k2 , the approximation error for the matrix. We assume that X is an H 2 -matrix constructed by constant-order interpolation (cf. Section 4.4) and that we have used Algorithm 19 with the error tolerances (5.35). For a single admissible block b D .t; s/ 2 LC J , we have k t .X…TJ ;Q; ŒX /s k2 D k t Xs Q t Qt Xs k2 D kV t Sb Ws Q t Qt V t Sb Ws k2 D k.V t Q t Qt V t /Sb Ws k2 kV t Q t Qt V t k2 kSb Ws k2 : Remark 5.33 allows us to bound the first factor, so we only have to find a bound for the second one. The definitions (4.20b) and (4.20c) yield ˇ ˇX X ˇ ˇ j.Sb Ws u/ j D ˇ .Sb / .Ws /j uj ˇ j 2Os 2Ls
Z ˇX X ˇ Dˇ g. t; ; s; / Ls; .y/
j 2Os 2Ls
ˇZ X ˇ D ˇˇ Is Œg. t; ; /.y/ uj
j 2Os
ˇ ˇ ˇ j .y/ dy ˇ
Z ˇX ˇ kIs Œg. t; ; /k1;Qs uj ˇ
j 2Os
ˇ ˇ .y/ dyu j jˇ
ˇ ˇ
j .y/ˇ dy
5.5 Truncation
199
for all u 2 RJ . Using the stability of the interpolation scheme, the asymptotic smoothness of g and the admissibility condition (4.49), we prove kIs Œg. t; ; /k1;Qs ƒm kg. t; ; /k1;Qs
Cas ƒm Cas ƒm .2/ ; dist.Q t ; Qs / diam.Qs /
and assuming that the supports are Cov -overlapping and that the “curvature bound” (4.58) holds implies Z ˇX ˇ XZ ˇ ˇ uj j .y/ dy ˇ j j .y/j dy juj j ˇ
j 2Os
j 2Os
X
uj2
j 2Os
ks uk2
1=2 X Z
X j 2Os
j 2Os
2 1=2 j
j .y/j dy
1=2
Z
j j j
1=2 CJ ks uk2 Cov j s j1=2
2 j .y/ dy
CJ ks uk2
X
1=2 j j j
j 2Os
1=2 1=2 CJ ks uk2 Cov Ccu
diam.Qs /d =2
for all u 2 RJ . Combining these estimates gives us 1=2 1=2 Ccu CJ diam.Qs /d =2 : kSb Ws k2 Cas ƒm .2/ Cov
(5.37)
In practical applications, CJ and diam.Qs / can be bounded, and combining the resulting estimate with (5.36) allows us to ensure that the error is small enough in each block, and a bound for the global error is provided by Lemma 4.46. Finding estimates of the type (5.36) and (5.37) for more general approximation schemes can be complicated. In these situation, the algorithms presented in Chapter 6, especially Algorithm 30, are more attractive, since these take the matrices Er;t and Sb Ws directly into account and guarantee a given accuracy without the need for additional estimates. The orthogonalization Algorithm 16 only guarantees range.V t / range.Q t / for all t 2 T . With the truncation algorithm 19, we can reach equality of these spaces, which is a useful property for theoretical investigations. Corollary 5.35 (Lossless truncation). Let V be a nested cluster basis. There is an orthogonal nested cluster basis Q satisfying range.Q t / D range.V t / and, consequently, rank.Q t / D rank.V t / for all t 2 T . Proof. We construct Q by using Algorithm 19 with an error tolerance of D 0. Let t 2 T . Due to Remark 5.32 and Theorem 5.30, we find V t D Q t Qt V t , i.e., range.V t / range.Q t / and rank.V t / rank.Q t /. According to the construction of Lemma 5.19, we have rank.Q t / rank.V t /, which implies rank.Q t / D rank.V t / and therefore range.V t / D range.Q t /.
200
5 Orthogonal cluster bases and matrix projections
5.6 Computation of the Frobenius norm of the projection error In order to compute the Frobenius norm kX kF D
XX
Xij2
1=2
i2 j 2J
of the error of an H 2 -matrix approximation explicitly, we use Lemma 4.39 to express the global approximation error by local errors: Lemma 5.36 (Frobenius approximation error). Let X 2 RJ , and let V and W be orthogonal. We have kX …TJ ;V;W ŒX kF D
X
k t Xs k2F kV t X Ws k2F
1=2 :
bD.t;s/2LC J
Proof. Due to Lemma 4.39 and Lemma 5.4, we only have to investigate k t .X …TJ ;V;W ŒX /s k2F ´ k t Xs V t V t X Ws Ws k2F D k t Xs t Xs k2F
if b D .t; s/ 2 LC J ; otherwise;
for all b D .t; s/ 2 LJ . Since the case b 2 LJ n LC J D LJ is trivial, we focus on the admissible blocks and pick b D .t; s/ 2 LC J . The definition of the Frobenius norm implies
k t Xs V t V t X Ws Ws k2F D k t Xs k2F C kV t V t X Ws Ws k2F 2h t Xs ; V t V t X Ws Ws iF ; and the orthogonality of V t and Ws yields h t Xs ; V t V t X Ws Ws iF D h t Xs ; V t .V t V t /V t X Ws .Ws Ws /Ws iF D hV t V t X Ws Ws ; V t V t X Ws Ws iF D kV t V t X Ws Ws k2F ; so we find k t Xs V t V t X Ws Ws k2F D k t Xs k2F kV t V t X Ws Ws k2F and observing kV t V t X Ws Ws k2F D kV t X Ws k2F concludes the proof. Using this lemma, we can embed the efficient computation of the approximation error directly into Algorithm 11: after computing Sb D V t X Ws for an admissible
5.6 Computation of the Frobenius norm of the projection error
201
2 2 block b D .t; s/ 2 LC J , we determine kSb kF and k t Xs kF and accumulate the differences of these norms. The result is the modified Algorithm 20, which computes the coupling matrices .Sb /b2LC of the best approximation Xz ´ …TJ ;V;W ŒX of J
X and returns the approximation error ´ kX …TJ ;V;W ŒX kF in the Frobenius norm. Algorithm 20. Convert a dense matrix into an H 2 -matrix and compute the resulting approximation error. procedure ConvertDenseAndError(X, V , W , var S , ); 0; for b D .t; s/ 2 LC J do BlockForwardTransformation(s, W , . t Xs / , Xy ); BlockForwardTransformation(t, V , Xys , Yy ); Sb Yyt ; C .k t Xs k2F kSb k2F / end for; 1=2 Finding the approximation error when converting a hierarchical matrix into the H 2 matrix format is slightly more complicated since we need an efficient way of computing k t Xs kF D kAb Bb kF for admissible blocks b D .t; s/ 2 LC J given in factorized form. Due to kAb Bb k2F D
XX
.Ab Bb /2ij D
i2 j 2J
D
k XXX i2 j 2J
k XXX
k X
2 .Ab /i .Bb /j
D1
.Ab /i .Bb /j .Ab /i .Bb /j
i2 j 2J D1 D1
D
k X k X X X .Ab /i .Ab /i .Bb /j .Bb /j D1 D1
D
k X
k X
i2
j 2J
.Ab Ab / .Bb Bb / ;
D1 D1
this task is reduced to the computation of the k-by-k Gram matrices Ab Ab and Bb Bb and the summation of k 2 products. We can exploit the symmetry of the Gram matrices in order to derive Algorithm 21. While finding the approximation error for matrices in dense and hierarchical matrix representation is straightforward, carrying out this computation in the case of H 2 matrices presents us with a challenge: we have to compute the norm k t Xs kF D
202
5 Orthogonal cluster bases and matrix projections
Algorithm 21. Convert an H -matrix into an H 2 -matrix and compute the resulting approximation error. procedure ConvertHAndError(A, B, V , W , var S , ); 0; for b D .t; s/ 2 LC J do y BlockForwardTransformation(t, V , Ab , X); BlockForwardTransformation(s, W , Bb , Yy ); Sb Xyt Yys ; 0; b for 2 f1; : : : ; kg do b b C .Ab Ab / .Bb Bb / ; for 2 f C 1; : : : ; kg do b b C 2.Ab Ab / .Bb Bb / end for end for; ´ C . b kSb k2F / end for; ´ 1=2 kVX;t SX;b WX;s kF , and in order to reach optimal efficiency, we cannot use more than 3 3 O.k t C ls / operations. Our approach relies on Algorithm 16: applied to the cluster bases VX and WX , it provides us with orthogonal cluster bases QV and QW and cluster operators RV and RW satisfying VX;t D QV;t RV;t and WX;s D QW;s RW;s for all t 2 T and s 2 TJ . For an admissible block b D .t; s/ 2 LC J , the orthogonality of QV;t and QW;s implies k t Xs kF D kVX;t SX;b WX;s kF QW;s kF D kQV;t RV;t SX;b RW;s D kRV;t SX;b RW;s kF ;
and this norm can be computed in the desired complexity. We can see that in this case the orthogonal cluster bases QV and QW are not required, we are only interested in the cluster operators RV and RW , which are used as weights in place of VX and WX . Combining this approach with Algorithm 14 yields Algorithm 22.
5.7 Numerical experiments Let us now investigate how closely the theoretical estimates for the complexity and the approximation error correspond to practical properties of our algorithms.
5.7 Numerical experiments
203
Algorithm 22. Convert an H 2 -matrix into an H 2 -matrix and compute the resulting approximation error. procedure ConvertH2(SX , V , W , var S, ); ClusterBasisProduct(root.T /, V , VX , PV ); ClusterBasisProduct(root.TJ /, W , WX , PW ); Orthogonalize(root.T /, V , QV , RV ); Orthogonalize(root.TJ /, W , QW , RW ); 0; for b D .t; s/ 2 LC J do PV;t SX;b PW;s ; Sb C .kRV;t SX;b RW;s k2F kSb k2F / end for 1=2
Orthogonalization In a first experiment, we apply the orthogonalization Algorithm 16 to the H 2 -matrix approximation of the single layer potential V introduced in Section 4.9. We first consider a fixed interpolation order m D 4 and different matrix dimensions n 2 f512; 2048; : : : ; 524288g, similar to Table 4.1. The results of this series of experiments are given in Table 5.1. According to the Lemmas 5.18 and 5.14, we expect that the time required for the orthogonalization of the cluster bases and the projection into the corresponding H 2 -matrix space is proportional to n, and this is indeed the case. Since the orthogonalization does not change the H 2 -matrix space, the approximation errors in Table 5.1 and Table 4.1 are identical. Now let us investigate the influence of the interpolation order. This time, we fix n D 32768 and consider different orders m 2 f1; 2; : : : ; 7g. The results are given in Table 5.2 for the admissibility parameter D 2 and in Table 5.3 for the parameter D 1. Lemma 5.18 predicts that the time for the computation will be proportional to m6 , and the prediction fits the experimental results. For the change of cluster bases, Lemma 5.14 also predicts a complexity proportional to m6 , but the experiments suggest that m3 is a better fit for the practical results. One possible explanation for these surprisingly good execution times is that the matrices appearing in the matrix-matrix multiplications of Algorithm 14 are so small that the complexity is dominated by the O.m6 / write accesses instead of the O.m9 / read accesses to storage. In the next experiment, we consider a variable-order approximation of the matrix V . Our theory predicts a complexity of O.n/ both in time and in storage requirements, and the results in Table 5.4 match these predictions. Comparing this table with Table 4.6 shows that the orthogonalization procedure increases the runtime only by approximately 25% and reduces the storage requirements by 14%. Since the orthogonalization leaves the underlying H 2 -matrix approximation unchanged (only the cluster bases are
204
5 Orthogonal cluster bases and matrix projections
Table 5.1. Approximation of the three-dimensional single layer potential by orthogonalized polynomial interpolation of order m D 4.
n Ortho Proj Mem Mem=n MVM kX XQ k2 512 0:2 1:1 2:9 5:8 < 0:01 0 2048 0:8 10:7 33:8 16:9 0:04 3:67 8192 3:2 45:7 176:1 22:0 0:22 1:57 32768 13:8 215:6 854:0 26:7 1:24 3:58 131072 56:8 1241:3 3968:2 31:0 5:67 9:09 33:7 25:53 2:29 524288 226:8 5150:9 17263:8 1000
10000
Ortho O(n)
100
1000
10
100
1
10
0.1 100
1000
10000
100000
100000
1 100
1e+06
Proj O(n)
1000
10000
100
Mem O(n)
10000
100000
1e+06
MVM O(n)
10
1000 1 100 0.1
10
1 100
1000
10000
100000
1e+06
0.01 100
1000
10000
100000
1e+06
swapped), the approximation errors of the orthogonalized matrix are the same as for the original matrix. The orthogonalization algorithm reduces the storage requirements only if #tO #K t holds for a cluster t 2 T , i.e., if the matrix V t has more columns than rows. In the previous experiments, we have constructed cluster trees satisfying #tO #K t for most clusters, therefore the orthogonalization algorithm could not reduce the storage requirements by a significant amount. Let us now consider a situation in which there are a large number of clusters with #tO #K t : we stop subdividing clusters if they contain less than Clf D 16 indices, therefore the average size of leaf clusters will be close to 8. We once more compute an approximation of V for different interpolation orders m 2 f1; 2; : : : ; 7g. Without
205
5.7 Numerical experiments
Table 5.2. Approximation of the three-dimensional single layer potential by orthogonalized polynomial interpolation with D 2 and leaf bound Clf D 2m3 .
m Ortho Proj Mem Mem=n MVM kX XQ k2 1 0:4 59:5 134:2 4:2 0:80 1:75 2 1:2 61:0 188:9 5:9 1:01 2:96 3 4:4 130:8 436:2 13:6 1:02 2:37 4 13:8 213:3 854:0 26:7 1:31 3:68 5 41:1 431:4 1607:7 50:2 1:98 5:19 77:2 2:73 6:910 6 101:7 620:4 2469:3 7 234:3 968:1 3830:7 119:7 4:14 1:910 300
1600
Ortho O(m^6)
Proj O(m^3) O(m^6)
1400
250
1200 200
1000
150
800 600
100
400 50
200
0
0 1
2
3
4
4000
5
6
7
Mem O(m^3)
3500
1
3.5
2500
3
2000
2.5
1500
2
1000
1.5
500
1
0
0.5 2
3
4
5
3
4
6
7
5
6
7
6
7
MVM O(m^3)
4
3000
1
2
4.5
1
2
3
4
5
orthogonalization, the storage requirements would grow like m6 , since the number of clusters now no longer depends on m, and each cluster requires O.m6 / units of storage. Table 5.5 shows that orthogonalization can be used to prevent this: the storage requirements are comparable to the standard case described in Tables 5.2 and 4.2, i.e., they grow like m3 despite the unsuitable choice of Clf , since the orthogonality of the matrices Q t ensure that they cannot have more columns than rows. This is a very important application of the orthogonalization Algorithm 16: in many applications, the cluster basis is constructed adaptively by applying truncation or the compression techniques introduced in Chapter 6 to an initial approximation of the matrix. In order to reach the optimal performance, we have to choose the cluster
206
5 Orthogonal cluster bases and matrix projections
Table 5.3. Approximation of the three-dimensional single layer potential by orthogonalized polynomial interpolation with D 1 and leaf bound Clf D 2m3 .
Mem Mem=n MVM kX XQ k2 297:0 9:3 1:68 1:35 392:2 12:3 2:17 1:26 930:4 29:1 2:08 5:48 1825:0 57:0 2:69 4:69 3425:3 107:0 4:20 2:910 153:9 5:48 2:811 4923:7 6850:3 214:1 7:57 4:012
m Ortho Proj 1 0:4 104:5 2 1:2 107:8 3 4:4 280:7 4 14:0 524:4 5 41:0 1056:4 6 102:0 1356:4 7 234:2 2034:2 250
3500
Ortho O(m^6)
Proj O(m^3) O(m^6)
3000
200 2500 150
2000 1500
100
1000 50 500 0
0 1
2
3
4
8000
5
6
7
Mem O(m^3)
7000
1
2
3
4
5
8
7
6
7
MVM O(m^3)
7
6000
6
6
5000 5 4000 4 3000 3
2000
2
1000 0
1 1
2
3
4
5
6
7
1
2
3
4
5
tree and the block cluster tree in such a way that the amount of storage required for the nearfield is approximately proportional to the amount of storage required for the farfield, i.e., if the ranks of the final cluster bases are small, the leaf clusters also have to be small, usually far smaller than the number of expansion functions used for the initial approximation. This means that we are in the situation described above: the matrices V t corresponding to the initial approximation have far more columns than rows, therefore we can apply orthogonalization in order to reduce both storage and time requirements by a large amount, especially for large problems with initial approximations of high order.
5.7 Numerical experiments
207
Table 5.4. Approximation of the three-dimensional single layer potential by orthogonalized polynomial interpolation of variable order with ˛ D 2 and ˇ D 1.
n Ortho Proj Mem Mem=n MVM kX XQ k2 512 < 0:1 1:0 2:0 4:1 < 0:01 9:15 2048 0:1 8:5 21:9 11:0 0:04 1:45 8192 1:6 30:9 93:8 11:7 0:18 7:36 32768 9:5 128:1 438:7 13:7 0:91 8:77 131072 98:7 654:1 2210:2 17:3 4:31 1:07 18:3 18:29 1:58 524288 564:5 2867:4 9353:5 2097152 2221:7 12052:1 38480:8 18:8 77:88 1:99 10000
100000
Ortho O(n)
1000
Proj O(n)
10000
100 1000 10 100 1 10
0.1 0.01 100
1000
10000
100000
100000
1e+06
1 100
1e+07
1000
10000
100000
100
Mem O(n)
10000
1e+06
1e+07
MVM O(n)
10
1000 1 100 0.1
10
1 100
1000
10000
100000
1e+06
1e+07
0.01 100
1000
10000
100000
1e+06
1e+07
Truncation Now we apply the truncation Algorithm 19 to the cluster bases constructed by interpolation. Since the truncated cluster bases only approximate the original ones, we have to be able to control the error. We use constant-order interpolation and base our investigation on the Remarks 5.33 and 5.34: we have kEr;t k2 ƒm .#K t / D ƒm md and kSb Ws k2 . ƒm CJ diam.Qs /d =2 : We use Lagrangian basis functions on a two-dimensional surface, hence CJ is in O.h/.
208
5 Orthogonal cluster bases and matrix projections
Table 5.5. Approximation of the three-dimensional single layer potential by orthogonalized polynomial interpolation with D 2 and fixed leaf bound Clf D 16.
m Ortho Proj Mem Mem=n MVM kX XQ k2 1 0:4 59:5 134:2 4:2 0:77 1:85 2 1:2 61:3 188:9 5:9 1:01 2:86 3 7:6 84:6 446:8 14:0 1:62 2:67 4 30:7 150:7 894:4 27:9 2:25 3:68 5 115:6 401:4 1617:2 50:5 3:14 5:19 80:7 4:16 7:110 6 341:8 963:0 2580:8 7 944:8 2175:2 3858:5 120:6 5:52 1:310 1000
2500
Ortho O(m^6)
900 800
Proj O(m^6)
2000
700 1500
600 500 400
1000
300 200
500
100 0
0 1
2
3
4
4000
5
6
7
Mem O(m^3)
3500
1
2
3
4
5
7
7
6
7
MVM O(m^3)
6
3000
6
5
2500 4 2000 3 1500 2
1000
1
500 0
0 1
2
3
4
5
6
7
1
2
3
4
5
For the single layer potential, we get d D 2 and D 1, i.e., kSb Ws k2 is in O.h/ and we have to ensure kV t Q t Qt V t k2 . h in order to reach the full accuracy of O.h2 /. Choosing the local error tolerances according to (5.35) with p D 2=3 yields the results given in Table 5.6: the approximation error is doubled compared to the original approximation (cf. Table 4.1), but the storage requirements are halved. The loss of accuracy is not critical, since we can always increase the order of the interpolation and get “exponential convergence at polynomial cost”. The reduction of storage is very important, since most computers provide only a (relatively) small amount of storage, and the storage efficiency of algorithms determines whether a certain computation can
5.7 Numerical experiments
209
Table 5.6. Approximation of the three-dimensional single layer potential by truncated polynomial interpolation with D 2, fixed leaf bound Clf D 32 and O D 2 103 h.
n Trunc Proj Mem Mem=n MVM kX XQ k2 512 0:4 1:0 3:3 6:6 < 0:01 1:26 2048 1:7 6:2 23:5 11:8 0:05 6:27 8192 7:1 27:2 104:4 13:0 0:25 2:27 32768 27:8 124:9 468:1 14:6 1:23 6:48 131072 111:8 630:9 2018:9 15:8 5:39 1:88 15:6 22:15 4:59 524288 486:0 2602:5 7965:0 1000
10000
Ortho O(n)
100
1000
10
100
1
10
0.1 100
1000
10000
10000
100000
1e+06
1 100
1e-06
100
1e-07
10
1e-08
1000
10000
100000
1000
10000
1e-05
Mem O(n)
1000
1 100
Proj O(n)
1e+06
1e-09 100
100000
1e+06
Err O(h^2)
1000
10000
100000
1e+06
be performed or not. For the double layer potential, we find D 2, i.e., kSb Ws k2 is in O.1/ and we have to ensure kV t Q t Qt V t k2 . h2 for the row cluster basis V D .V t / t2T in order to reach the full accuracy. For the column cluster basis W D .Ws /s2TJ , we still have kV t Sb k2 in O.h/ and an accuracy of O.h/ is sufficient for the approximation of W . We once again choose the local error tolerances according to (5.35) with p D 2=3 and obtain the results given in Table 5.7. As expected, the approximation quality is reduced by one order (due to the normal derivative used to define K), but otherwise the errors behave as expected. The storage requirements and execution times are comparable to the case of the single layer potential.
210
5 Orthogonal cluster bases and matrix projections
Table 5.7. Approximation of the three-dimensional double layer potential by truncated polynomial interpolation with D 2, fixed leaf bound Clf D 32 and Orow D 5 102 h2 , Ocol D 2 102 h.
Proj Mem Mem=n MVM kX XQ k2 n Trunc 512 0:4 1:1 3:0 6:0 < 0:01 2:04 2048 1:6 6:4 22:1 11:1 0:05 2:86 13:2 0:27 1:86 8192 7:3 27:9 105:7 32768 29:4 129:9 490:9 15:3 1:29 5:97 131072 120:4 674:2 2191:3 17:1 5:70 1:77 524288 546:1 2799:1 8936:3 17:5 23:58 4:68 1000
10000
Ortho O(n)
100
1000
10
100
1
10
0.1 100
1000
10000
100000
100000
1e+06
1 100
0.0001
1000
1e-05
100
1e-06
10
1e-07
1000
10000
100000
1000
10000
0.001
Mem O(n)
10000
1 100
Proj O(n)
1e+06
1e-08 100
100000
1e+06
Err O(h^2)
1000
10000
100000
1e+06
Chapter 6
Compression
We have seen that we can compute H 2 -matrix approximations of matrices efficiently if suitable nested cluster bases are given, and that we can improve H 2 -matrix approximations by removing redundant information from the cluster bases. Now we investigate the approximability of general matrices and address the following three questions: • What are the basic structural properties of H 2 -matrices? • Which types of matrices can be approximated by H 2 -matrices? • If a matrix can be approximated, how can we find the approximation efficiently? In order to simplify the investigation of the first question, we introduce the space of semi-uniform matrices and demonstrate that a matrix X is an H 2 -matrix if and only if X and X are semi-uniform matrices, therefore we only have to investigate this more general class of matrices. The answer to the first question is then rather simple: a matrix is semi-uniform if and only if all elements of a family of submatrices, called the total cluster basis, have low rank. The second question can also be answered by referring to total cluster bases: a matrix can be approximated by a semi-uniform matrix if and only if the total cluster basis matrices can be approximated by low-rank matrices. To answer the third question, we derive an efficient algorithm which uses recursively constructed low-rank approximations of the total cluster basis matrices in order to find an orthogonal nested cluster basis. This chapter is organized as follows: • Section 6.1 introduces the spaces of left and right semi-uniform matrices for a given block cluster tree and given cluster bases. • Section 6.2 contains the definition of total cluster bases and the proof that a matrix is semi-uniform if and only if the total cluster bases have low rank. • Section 6.3 investigates the approximation of general matrices by semi-uniform matrices. The fundamental result is the explicit error representation given by Theorem 6.16 (cf. [13]). • Section 6.4 is devoted to the introduction and analysis of an algorithm for finding good cluster bases, and thereby good H 2 -matrix approximations, for arbitrary matrices (cf. [64] for a less robust predecessor of this algorithm).
212
6 Compression
• Section 6.5 introduces a modification of the basic compression algorithm that allows us to approximate H -matrices efficiently (cf. [64] for a less robust predecessor of this algorithm). • Section 6.6 is devoted to another modification of the algorithm that allows us to approximate H 2 -matrices in optimal complexity (cf. [10] for a less robust predecessor of this algorithm). • Section 6.7 describes a variant of this algorithm to find cluster bases that approximate multiple H 2 -matrices simultaneously, the resulting hierarchical compression strategy is of crucial importance for several advanced applications including the matrix arithmetic operations presented in Chapter 8. • Section 6.8 provides refined strategies for controlling the error introduced by the H 2 -matrix approximation algorithm (cf. [12]). • Section 6.9 concludes this chapter by presenting numerical results that show that the compression algorithm reaches optimal complexity and that the resulting approximation error behaves as expected. Assumptions in this chapter: We assume that cluster trees T and TJ for the finite index sets and J, respectively, are given. Let TJ be an admissible block cluster tree for T and TJ . Let n ´ # and nJ ´ #J denote the number of indices in and J, and let c ´ #T and cJ ´ #TJ denote the number of clusters in T and TJ . Let p , pJ and pJ be the depths of T , TJ and TJ .
6.1 Semi-uniform matrices Let V and W be nested cluster bases for T and TJ . Let K and L be the rank distributions of V and W , and let E D .E t / t2T and F D .Fs /s2TJ be their transfer matrices. Our goal is to investigate the approximation properties of H 2 -matrices. Instead of taking the influence of both row and column cluster bases into account simultaneously, we consider only “half” H 2 -matrices, i.e., matrices which depend only on row or column cluster bases. Definition 6.1 (Semi-uniform hierarchical matrices). Let X 2 RJ . X is a left semiuniform matrix for TJ and V if there is a family B D .Bb /b2LC of matrices J
t for all b D .t; s/ 2 LC satisfying Bb 2 RsJK J and O
XD
X bD.t;s/2LC J
V t Bb C
X bD.t;s/2L J
t Xs :
(6.1)
6.1 Semi-uniform matrices
The matrices B D .Bb /b2LC
J
213
are called right coefficient matrices.
X is a right semi-uniform matrix for TJ and W if there is a family A D s .Ab /b2LC of matrices satisfying Ab 2 RL for all b D .t; s/ 2 LC J and tO J
X
XD
Ab Ws C
bD.t;s/2LC J
The matrices A D .Ab /b2LC
J
X
t Xs :
(6.2)
bD.t;s/2L J
are called left coefficient matrices.
Semi-uniform hierarchical matrices and H 2 -matrices share an important property: unlike hierarchical matrices, both are subspaces of a matrix space. Remark 6.2 (Matrix subspaces). The sets H 2 .TJ ; V; / ´ fX 2 RJ W X is a left semi-uniform matrix for TJ and V g; H 2 .TJ ; ; W / ´ fX 2 RJ W X is a right semi-uniform matrix for TJ and W g are subspaces of RJ . Proof. We introduce the trivial cluster bases ´ . t / t2T and J ´ .s /s2TJ , observe H 2 .TJ ; V; / D H 2 .TJ ; V; J /; H 2 .TJ ; ; W / D H 2 .TJ ; ; W /; and apply Remark 3.37. Our goal is to establish a connection between left and right semi-uniform matrices and H 2 -matrices. The key result is that a matrix is an H 2 -matrix if and only if it is both left and right semi-uniform. Lemma 6.3 (H 2 - and semi-uniform matrices). Let X 2 RJ . We have X 2 H 2 .TJ ; V; W / if and only if X is both left and right semi-uniform, i.e., if X 2 H 2 .TJ ; V; / and X 2 H 2 .TJ ; ; W / hold. Proof. Let X be left and right semi-uniform. Let b D .t; s/ 2 LC J . Due to Definit s tion 6.1, there are matrices Ab 2 RL and Bb 2 RJK with sO tO
t Xs D V t Bb D Ab Ws : Let n ´ #tO and k ´ dim.range.V t // n. Due to range.V t / RtO , we can find a basis .vi /niD1 of RtO which satisfies range.V t / D spanfv1 ; : : : ; vk g. For each i 2 f1; : : : ; kg, we fix a vector xi 2 RK t with vi D V t xi . Since the vectors .vi /niD1 are linear independent, the same holds for the vectors .xi /niD1 .
214
6 Compression
t We define the linear operator R 2 RtK by O
´ Rvi D
xi 0
if i k; otherwise;
for all i 2 f1; : : : ; ng. This definition implies V t Rvi D V t xi D vi for all i 2 f1; : : : ; kg, and since .vi /kiD1 is a basis of range.V t /, we can conclude V t RV t D V t . Due to t Xs D V t Bb , we have t Xs D V t Bb D V t RV t Bb D V t R t Xs ; and due to t Xs D Ab Ws , we have V t R t Xs D V t RAb Ws ; so we can set Sb ´ RAb and find t Xs D V t R t Xs D V t RAb Ws D V t Sb Ws ; which proves that X is an H 2 -matrix. The rest of the proof is trivial. If V and W are orthogonal, the best approximations of a given matrix in the spaces H 2 .TJ ; V; / and H 2 .TJ ; ; W / are given by orthogonal projections (with respect to the Frobenius norm). Lemma 6.4 (Semi-uniform matrix projections). Let V and W be orthogonal. The operators …TJ ;V; W RJ ! H 2 .TJ ; V; /; X X 7! V t V t Xs C bD.t;s/2LC J
X
…TJ ;;W W RJ ! H 2 .TJ ; ; W /; X X 7! t X Ws Ws C bD.t;s/2LC J
t Xs ;
bD.t;s/2L J
X
t Xs ;
bD.t;s/2L J
are orthogonal projections into H 2 .TJ ; V; / and H 2 .TJ ; ; W /. They satisfy …TJ ;V; …TJ ;;W D …TJ ;V;W D …TJ ;;W …TJ ;V; : Proof. For the first part of the proof, we again use the trivial cluster bases ´ . t / t2T ;
J ´ .s /s2TJ ;
6.1 Semi-uniform matrices
215
observe H 2 .TJ ; V; / D H 2 .TJ ; V; J /; H 2 .TJ ; ; W / D H 2 .TJ ; ; W / as well as …TJ ;V; D …TJ ;V;J and …TJ ;;W D …TJ ; ;W , and apply Lemma 5.5. The second part of the proof is a simple consequence of Lemma 5.4. We would like to restrict our attention to left semi-uniform matrices and handle right semi-uniform matrices by switching to adjoint matrices. In order to be able to switch from a matrix X to its adjoint X in a meaningful way, we require the block cluster tree corresponding to X : Definition 6.5 (Transposed block cluster tree). Let TJ be a block cluster tree for TJ and T . TJ is the transposed block cluster tree of TJ if the following conditions hold: • For all t 2 T and s 2 TJ , we have b D .s; t / 2 TJ if and only if .t; s/ 2 TJ . • For all t 2 T and s 2 TJ , we have b D .s; t / 2 LC J if and only if .t; s/ 2 . LC J . The transposed block cluster tree of TJ is unique and denoted by TJ
Using the transposed block cluster tree, we can replace the projection into the space of right semi-uniform matrices by the one into the space of left semi-uniform matrices: Lemma 6.6. Let W be orthogonal. For all X 2 RJ , we have .…TJ ;;W ŒX / D …TJ ;W; ŒX : Proof. Let X 2 RJ and TJ D TJ . We denote the admissible and inadmissible C leaves of TJ by LJ and LJ . Due to the definitions in Lemma 6.4, we find
.…TJ ;;W ŒX / D
X
t X Ws Ws C
.t;s/2LC J
D
X
Ws Ws X t C
D
t Xs
.t;s/2L J
X
s X t
.t;s/2L J
.t;s/2LC J
X
X
Ws Ws X t C
.s;t/2LC J D …TJ ;W; ŒX :
X
.s;t/2L J
s X t
216
6 Compression
This lemma allows us to formulate estimates for the operator norm of the H 2 -matrix approximation error which depend only on row matrix projections. Lemma 6.7 (Spectral norm estimate). Let V and W be orthogonal cluster bases. Let X 2 RJ . Introducing Xl ´ …TJ ;V; ŒX and Xr ´ …TJ ;;W ŒX D .…TJ ;W; ŒX / , we find kX …TJ ;V;W ŒX k2
8
TJ ;W; ŒX
k2 C kXr …TJ ;V; ŒXr k2 :
Proof. Combining the triangle inequality and Lemma 6.6, we find kX …TJ ;V;W ŒX k2 kX Xl k2 C kXl …TJ ;;W ŒXl k2 D kX Xl k2 C kXl …TJ ;W; ŒXl k2
and kX …TJ ;V;W ŒX k2 kX Xr k2 C kXr …TJ ;V; ŒXr k2 D kX …TJ ;W; ŒX k2 C kXr …TJ ;V; ŒXr k2 :
Differently from the spectral norm, the Frobenius norm of a matrix can be expressed in terms of the Frobenius norms of the subblocks of the matrix. This allows us to derive error estimates which are significantly more elegant than the ones for the operator norm. Lemma 6.8 (Frobenius norm estimate). Let V and W be orthogonal. Let TJ be a block cluster tree. Let X 2 RJ . Introducing Xl ´ …TJ ;V; X and Xr ´ …TJ ;;W X D .…TJ ;W; X / , we find kX …TJ ;V;W ŒX k2F D kX Xl k2F C kXl …TJ ;;W ŒXl k2F kX Xl k2F C kX Xr k2F :
(6.3)
Proof. Let … ´ …TJ ;V;W , …l ´ …TJ ;V; and …r ´ …TJ ;;W . Let X 2 RJ . Due to Lemma 6.4, …l is an orthogonal projection, and we find kX …ŒX k2F D kX …l ŒX C …l ŒX …l …r ŒX k2F D kX …l ŒX k2F C 2hX …l ŒX ; …l ŒX …r X iF C k…l ŒX …r ŒX k2F D kX …l ŒX k2F C 2h…l ŒX …2l ŒX ; X …r ŒX iF C k…l ŒX …r ŒX k2F
6.1 Semi-uniform matrices
217
D kX …l ŒX k2F C k…l ŒX …l …r ŒX k2F D kX …l ŒX k2F C k…l ŒX …r …l ŒX k2F ; which is the first half of our claim. For Xe ´ X …r ŒX , we have k…l ŒX …l …r ŒX k2F D k…l ŒXe k2F k…l ŒXe k2F C kXe …l ŒXe k2F D k…l ŒXe k2F C kXe k2F 2hXe ; …l ŒXe iF C k…l ŒXe k2F D 2k…l ŒXe k2F C kXe k2F 2h…l ŒXe ; …l ŒXe iF D kXe k2F D kX …r ŒX k2F ; and this completes the proof. Now we can conclude that, as far as error estimates in the Frobenius norm are concerned, investigating only the left matrix projections of a matrix and its transposed provides us with enough information to bound the H 2 -matrix approximation error. Theorem 6.9 (Separation of cluster bases). Let V and W be orthogonal. Let TJ be a block cluster tree. For all X 2 RJ , we have 2 kX …TJ ;V;W ŒX k2F kX …TJ ;V; ŒX k2F C kX …TJ ;W; ŒX kF :
Proof. Combine Lemma 6.8 with Lemma 6.6. We can even demonstrate that treating a matrix and its transposed separately increases the error estimate only by a small factor: Remark 6.10. Let X 2 RJ . We have kX …TJ ;V; ŒX k2F D kX k2F 2hX; …TJ ;V; ŒX iF C k…TJ ;V; ŒX k2F D kX k2F 2h…TJ ;V; ŒX ; …TJ ;V; ŒX iF C k…TJ ;V; ŒX k2F D kX k2F k…TJ ;V; ŒX k2F kX k2F k…TJ ;;W …TJ ;V; ŒX k2F D kX k2F k…TJ ;V;W ŒX k2F D kX k2F 2hX; …TJ ;V;W ŒX iF C k…TJ ;V;W ŒX k2F D kX …TJ ;V;W ŒX k2F : The same upper bound holds for kX …TJ ;;W ŒX k2F , and we can conclude 2 2 kX …TJ ;V; ŒX k2F C kX …TJ ;W; ŒX kF 2kX …TJ ;V;W ŒX kF :
This means that Theorem 6.9 only slightly overestimates the true approximation error.
218
6 Compression
6.2 Total cluster bases Due to Theorem 6.9, we can investigate row and column cluster bases independently, and it is sufficient to consider only the left semi-uniform matrices. Our goal is to find an alternative characterization of these matrices which can be used for our theoretical investigations. We start by fixing an arbitrary matrix X 2 RJ . We intend to express it as a left semi-uniform matrix, therefore we have to construct a suitable cluster basis. If we define X X0;t ´ t Xr for all t 2 T ; r2row.t/
the resulting family .X0;t / t2T satisfies X t Xs D t X r s r2row.t/
D
X
t Xr s D X0;t s
for all b D .t; s/ 2 LC J
r2row.t/
due to Lemma 5.7, i.e., the condition (6.1) can be fulfilled by letting Bb ´ s for all b D .t; s/ 2 LC J . The family .X0;t / t2T is a cluster basis with the rank distribution .J/ t2T , but it is not nested. Fortunately, this can be remedied by a simple trick: a cluster basis is nested (cf. (3.15)), if the cluster basis matrix for a non-leaf cluster t can be expressed in terms of the cluster basis matrices of its sons. Due to Lemma 5.7, all matrices X0;t correspond to disjoint blocks of X , therefore the condition (3.15) can be fulfilled by adding suitable restrictions of all predecessors of a cluster t to X0;t (cf. Figure 6.1). This idea gives rise to the following definition: Definition 6.11 (Total cluster basis). Let X 2 RJ . The family .X t / t2T defined by X X X Xt ´ t Xs D t Xs for all t 2 T s2row .t/
t C 2pred.t/ s2rowC .t C /
is called the total cluster basis corresponding to X . Using Lemma 5.7, we can easily verify that .X t / t2T is indeed a nested cluster basis and that the matrix X is a left semi-uniform matrix with respect to this cluster basis: Lemma 6.12 (Properties of the total cluster basis). Let X 2 RJ . The total cluster basis .X t / t2T for X is a nested cluster basis with rank distribution .J/ t2T , and its transfer matrices are given by ´P if there exists t C 2 T with t 2 sons.t C /; s2row .t C / s (6.4) Et ´ 0 otherwise.
6.2 Total cluster bases
219
Figure 6.1. Matrices X0;t (top) and X t (bottom) for the model problem.
Using the total cluster basis, X can be represented as X X X t s C XD
t Xs ;
(6.5)
bD.t;s/2L J
bD.t;s/2LC J
i.e., X is a left semi-uniform matrix for the cluster basis .X t / t2T . The long-range transfer matrices (cf. Definition 5.27) satisfy ´P if r ¤ t; s2row .t/ s (6.6) Er;t D I otherwise; for t 2 T and r 2 sons .t /. Proof. Let t 2 T with sons.t / ¤ ;. Definition 3.4 and Definition 3.24 imply X t 0 ; t D t 0 2sons.t/
0
and due to Lemma 5.7, we have row .t / row .t / and r s D ıs;r s
for t 2 sons.t/, s 2 row .t / row .t 0 / and r 2 row .t 0 /. This means X X X t Xs D t 0 Xs Xt D 0
s2row .t/
D
X
X
t 0 2sons.t/ s2row .t/
X
t 0 2sons.t/ s2row .t/ r2row .t 0 /
t 0 Xr s
220
6 Compression
D
X
X
t 0 2sons.t/ r2row .t 0 /
D
X
t 0 2sons.t/
Xt 0
X
t 0 Xr
s
s2row .t/
X
X
s D
s2row .t/
Xt 0 Et 0 :
t 0 2sons.t/
Due to Lemma 5.7 and t 2 pred.t /, the second part of the proof is trivial. We prove (6.6) by induction on level.r/ level.t / 2 N0 . Let t 2 T and r 2 sons .t/ with level.r/ level.t / D 0. This implies t D r, and (6.6) follows directly. Let now n 2 N0 be such that (6.6) holds for all t 2 T and all r 2 sons .t / with level.r/ level.t / D n. Let t 2 T and r 2 sons .t / with level.r/ level.t / D n C 1. Due to the definition of sons .t /, we can find t 2 sons.t / such that r 2 sons .t / holds. We have level.r/ level.t / D n, so we can apply the induction assumption in order to find ´P if s ¤ t ; s2row .t / s Er;t D I otherwise; and (6.4) implies
X
Et D
s :
s2row .t/
Lemma 5.7 yields row .t / row .t /, and we can conclude X Er;t D Er;t E t D s : s2row .t/
In general, the total cluster basis is not useful in practical applications, since its rank is too large by far (although there are some efficient algorithms that are based on working implicitly with the total cluster basis, cf. Section 6.4). On the other hand, the total cluster basis is a useful tool when investigating whether an arbitrary matrix can be approximated by an H 2 -matrix. In order to illustrate this, we require the following general result concerning restrictions of cluster basis matrices: Lemma 6.13 (Restriction). Let V D .V t / t2T be a nested cluster basis. Let t 2 T and r 2 sons .t/. With the long-range transfer matrix Er;t introduced in Definition 5.27, we have r V t D Vr Er;t : (6.7) Proof. By induction on # sons .t / 2 N. Let t 2 T and r 2 sons .t /. If # sons .t / D 1 holds, we have sons.t / D ; and therefore t D r. The definition of the cluster basis implies r V t D t V t D V t D V t E t;t . Let n 2 N be such that (6.7) holds for all t 2 T with # sons .t / n. Let t 2 T with # sons .t/ D n C 1. Let r 2 sons .t /. If r D t , we can proceed as before. If r ¤ t, there is a t 2 sons.t / with r 2 sons .t /, and we find X r V t D r V t 0 E t 0 D r V t E t : t 0 2sons.t/
6.2 Total cluster bases
221
Due to # sons.t / # sons.t / 1 D n, we can apply the induction assumption to find r V t D Vr Er;t , which implies r V t D r V t E t D Vr Er;t E t D Vr Er;t ; and concludes the induction. Using this result, total cluster bases can be used to characterize left semi-uniform matrices without the need for explicitly given cluster bases. The basic result is that the rank of the total cluster basis matrices is bounded if the corresponding matrix is left semi-uniform: Lemma 6.14. Let V D .V t / t2T be a nested cluster basis. We fix a left semi-uniform matrix X 2 H 2 .TJ ; V; / and its total cluster basis .X t / t2T . Then we have range.X t / range.V t / and therefore rank.X t / rank.V t / for all t 2 T . Proof. Let t 2 T , let t C 2 pred.t / and s 2 rowC .t C /. We have b ´ .t C ; s/ 2 LC J
t by definition, and since X is left semi-uniform, we can find a matrix Bb 2 RsJK O satisfying t C Xs D V t C Bb . Lemma 6.13 implies
t Xs D t t C Xs D t V t C Bb D V t E t;t C Bb : For X t , this means X Xt D
X
t Xs D V t
t C 2pred.t/ s2rowC .t C /
E t;t C B.t C ;s/ ; (6.9)
X
X
(6.8)
t C 2pred.t/ s2rowC .t C /
i.e., the range of X t is a subspace of the range of V t . The following lemma can be considered the counterpart of the previous one: it states that the matrix X is left semi-uniform if the ranks of the total cluster basis matrices are bounded. Lemma 6.15. Let X 2 RJ . There exists an orthogonal nested cluster basis Q D .Q t / t2T with rank distribution L D .L t / t2T satisfying #L t D rank.Q t / D rank.X t / for all t 2 T such that X 2 H 2 .TJ ; Q; / holds. Proof. Due to Lemma 6.12, .X t / t2T is a nested cluster basis. Therefore we can apply Corollary 5.35 to it in order to find a nested orthogonal cluster basis Q D .Q t / t2T with Q t Qt X t D X t and #L t D rank.Q t / D rank.X t / for all t 2 T . We use (6.5) to conclude X X XD X t s C t Xs bD.t;s/2LC J
D
X
bD.t;s/2LC J
bD.t;s/2L J
Q t Qt X t s C
X
bD.t;s/2L J
t Xs
222
6 Compression
D
X bD.t;s/2LC J
Q t Bb C
X
t Xs
bD.t;s/2L J
t with Bb ´ .Qt X t s / 2 RsJL for all b D .t; s/ 2 LC J . O
In the standard situation, i.e., if each cluster t corresponds to a subdomain t Rd and each admissible block b D .t; s/ satisfies the condition maxfdiam. t /; diam. s /g 2 dist. t ; s /; the total cluster basis has a geometric interpretation: for t 2 T and s 2 row .t /, we can find a cluster t C 2 pred.t / with s 2 rowC .t C /. Due to the admissibility condition, this means diam. t / diam. t C / 2 dist. t C ; s / 2 dist. t ; s /; i.e., for all s 2 rowC .t /, the set s is contained in the set ct ´ fx 2 W diam. t / 2 dist. t ; x/g: In this sense, the matrix X t describes the transfer of information from the “geometric farfield” ct (cf. Figure 6.2) of the cluster t into the domain t . ct
t
Figure 6.2. Geometric interpretation of total cluster bases: the matrix X t describes the transfer of information from the farfield ct to the cluster t .
6.3 Approximation by semi-uniform matrices We have seen that a matrix X is left semi-uniform if and only if the ranks of the total cluster basis matrices .X t / t2T are bounded. In practice, we frequently encounter
6.3 Approximation by semi-uniform matrices
223
matrices which do not have low rank, but which can be approximated by low-rank matrices, therefore we have to investigate whether the existence of low-rank approximations of the matrices .X t / t2T implies that we can approximate X by a left semi-uniform matrix. Let X 2 RJ , and let .X t / t2T be its total cluster basis. Let K D .K t / t2T be a rank distribution, and let A D .A t / t2T and .B t / t2T be families of matrices satisfying t A t 2 RK tO
and B t 2 RJK t
(6.10)
for all t 2 T . Let us assume that the low-rank matrices A t B t approximate X t , i.e., that a suitable norm of X t A t B t is small enough for all t 2 T . In order to find a left semi-uniform approximation of X , we have to construct an orthogonal nested cluster basis Q D .Q t / t2T . Once we have this basis, we can find the best approximation of X in H 2 .TJ ; Q; / by applying the orthogonal projection …TJ ;Q; . Obviously, the challenging task is the construction of Q. We use a variant of the orthogonalization Algorithm 16: in leaf clusters t 2 T , we construct the orthogonal cluster basis matrices by computing the Householder factorization A t D Q t R t of A t . Let now t 2 T with ´ # sons.t / > 0. We let ft1 ; : : : ; t g ´ sons.t / and assume L 0 that we have already constructed orthogonal cluster basis matrices Q t 0 2 RtO0 t with disjoint index sets L t 0 for all t 0 2 sons.t /. As in the case of the truncation Algorithm 19, we have to replace A t by its projection 0 1 Q t1 A t X B :: C N At ´ Ut Ut At D Ut @ : A D Q t 0 Qt 0 A t Qt A t
t 0 2sons.t/
using the orthogonal matrix U t ´ Q t1
:::
t Q t 2 RM tO
for
Mt ´
[
Lt 0
t 0 2sons.t/
we have introduced in (5.12). Using 0
1 Qt1 A t C B Ayt ´ @ ::: A 2 RM t K t ; Qt A t
we find AN t D U t Ayt . Now we can proceed as in the orthogonalization algorithm: the y t R t of Ayt yields an index set L t and an orthogonal Householder factorization Ayt D Q M t L t y t is also y matrix Q t 2 R , and the orthogonality of U t implies that Q t ´ U t Q
224
6 Compression
y t jL 0 L t , orthogonal. The corresponding transfer matrices can be defined by F t 0 ´ Q t since this yields 0 1 F t1 X
: C y t D Q t1 : : : Q t B Qt 0 Ft 0 : Qt D Ut Q @ :: A D 0 t 2sons.t/ F t The resulting recursive procedure is given in Algorithm 23. Algorithm 23. Converting a general cluster basis into a nested one. procedure Nestify(t, A, var Q, R, L) if sons.t/ D ; then Householder(A t , tO, Q t , R t , L t ) else Mt ;; for t 0 2 sons.t / do Nestify(t 0 , A, Q, R, L); P Lt 0 Mt Mt [ end for; for t 0 2 sons.t / do Ayt jL t 0 K t Qt 0 A t end for; y t , R t , L t ); Householder(Ayt , M t , Q 0 for t 2 sons.t / do y t jL 0 L t Ft 0 Q t end for end if Since Algorithm 23 uses a projection to ensure that the resulting cluster basis is nested, we have to analyze the resulting error. Theorem 6.16 (Matrix approximation). Let Q D .Q t / t2T be a nested orthogonal cluster basis for T . Let X 2 RJ . Let .X t / t2T be the total cluster basis for X , and let ´ U t U t X t if sons.t / ¤ ;; N Xt ´ Xt otherwise; for all t 2 T (compare (5.21)). The error of the left matrix projection is given by X X …TJ ;Q; ŒX D (6.11) XN t Q t Qt XN t t2T
and satisfies k.X …TJ ;Q; ŒX /xk22 D
X t2T
k.XN t Q t Qt XN t /xk22
for all x 2 RJ : (6.12)
225
6.3 Approximation by semi-uniform matrices
Proof. We have
X
X…TJ ;Q; ŒX D D
X
X
t Xs Q t Qt Xs
.t;s/2LC J
t2T s2rowC .t/
As in Lemma 5.25, we let
X
X
. t X Q t Qt X /s D
.X t Q t Qt X t /s :
t2T s2rowC .t/
D t ´ XN t Q t Qt XN t
for all t 2 T and apply Lemma 5.29 to each term in the above sum in order to find X X X X …TJ ;Q; ŒX D Dr Er;t s t2T s2rowC .t/ r2sons .t/
D
X
X
X
Dr Er;t s ;
r2T t2pred.r/ s2rowC .t/
where Er;t are the long-range transfer matrices of the total cluster basis (cf. (6.6)). Let r 2 T , t 2 pred.r/ and s 2 rowC .t /. Due to Lemma 6.12, we have ´ P s s 0 2row .t/ s 0 if r ¤ t; Er;t s D s Er;t D otherwise; s and s 2 rowC .t / row .t / implies Er;t s D s , which leads to X X X X X X …TJ ;Q; ŒX D Dr s D r2T s2row .r/
r2T t2pred.r/ s2rowC .t/
Dr s D
X
Dr :
r2T
This is the desired equation (6.11). According to Lemma 5.26, the ranges of the matrices Dr are pairwise orthogonal and we get X k.X …TJ ;Q; ŒX /xk22 D kD t xk22 ; t2T
which concludes the proof. With this general approximation result, we can now investigate the error introduced by Algorithm 23. Corollary 6.17 (Approximation error bound). Let X 2 RJ , and let .A t / t2T and .B t / t2T be as in (6.10). Let Q D .Q t / t2T be the nested orthogonal cluster basis constructed by Algorithm 23. We have X k.X …TJ ;Q; ŒX /xk22 k.X t A t B t /xk22 (6.13) t2T
for all x 2 RJ .
226
6 Compression
Proof. Let Xl ´ …TJ ;Q; ŒX and let x 2 RJ . According to Theorem 6.16, we have to find bounds for k.XN t Q t Qt XN t /xk22 . Let t 2 T . If sons.t / D ;, we have XN t D X t and Q t R t D A t by definition and find X t Q t Qt X t D X t A t B t C A t B t Q t Qt X t D X t A t B t C Q t .R t B t Qt X t / D X t A t B t Q t Qt .X t A t B t / D .I Q t Qt /.X t A t B t /: Since Q t Qt is an orthogonal projection, so is I Q t Qt , and we conclude k.X t Q t Qt X t /xk2 k.X t A t B t /xk2 :
(6.14)
y t R t D U t Ayt D AN t and If sons.t/ ¤ ;, we have XN t D U t U t X t and Q t R t D U t Q observe XN t Q t Qt XN t D XN t AN t B t C AN t B t Q t Qt XN t D XN t AN t B t C Q t .R t B t Qt XN t / D XN t AN t B t Q t Qt .XN t AN t B t / D .I Q t Qt /.XN t AN t B t /: Using the fact that I Q t Qt and U t U t are orthogonal projections, we find k.XN t Q t Qt XN t /xk2 D k.I Q t Qt /.XN t AN t B t /xk2 k.XN t AN t B t /xk2 D kU t U t .X t A t B t /xk2 k.X t A t B t /xk2 : (6.15) Combining (6.14) and (6.15) with Theorem 6.16 yields (6.13). Corollary 6.18 (Approximation error). Let X 2 RJ , and let .A t / t2T and .B t / t2T be as in (6.10). There is a nested orthogonal cluster basis Q D .Q t / t2T (the one defined by Algorithm 23) such that we have X kX t A t B t k22 (6.16a) kX …TJ ;Q; ŒX k22 t2T
and kX …TJ ;Q; ŒX k2F
X
kX t A t B t k2F :
(6.16b)
t2T
Proof. Let Q D .Q t / t2T be the orthogonal nested cluster basis defined by applying Algorithm 23 to the family .A t / t2T .
6.4 General compression algorithm
227
We denote the best left semi-uniform approximation of the matrix X in the space H 2 .TJ ; Q; / by Xl ´ …TJ ;Q; ŒX . Due to Corollary 6.17, we have X
(6.13)
k.X Xl /xk22
k.X t A t B t /xk22
t2T
X
kX t A t B t k22 kxk22
t2T
for all x 2 RJ , and this implies (6.16a). In order to prove (6.16b), we proceed as in Theorem 5.30: for all j 2 J, we introduce the canonical unit vectors e j 2 RJ by eij D ıij . The Frobenius norm is given by kX Xl k2F D
X
(6.13)
k.X Xl /e j k22
j 2J
D
X
XX
k.X t A t B t /e j k22
j 2J t2T
kX t
A t B t k2F ;
t2T
and this is the desired estimate. We can combine this result with Theorem 6.9 in order to prove that a matrix X 2 RJ can be approximated efficiently by an H 2 -matrix if the total cluster bases of X and X can be approximated by low-rank matrices.
6.4 General compression algorithm Up to this point, we have only investigated the approximability of an arbitrary matrix from a theoretical point of view: we know that a matrix can be approximated by a left semi-uniform matrix if the corresponding total cluster bases can be approximated by low-rank matrices. Let us now consider the practical side of the problem: how can we construct a good left semi-uniform approximation of an arbitrary matrix X 2 RJ ? A straightforward approach could be based on the construction of the previous section: we construct lowrank approximations of the total cluster basis matrices .X t / t2T by the singular value decomposition (cf. Lemma 5.19) or a similar technique and then apply Algorithm 23 to create a nested cluster basis. The major disadvantage of this procedure is that finding low-rank approximations of all total cluster basis matrices leads to a rather high algorithmic complexity: constructing optimal low-rank approximations by singular value decompositions typically involves a cubic complexity in the number n of degrees of freedom, and although more efficient and less accurate techniques like rank-revealing factorization [47], [33] or cross-approximation [104], [7] schemes can reduce the complexity, they will also not reach the optimal order of complexity.
228
6 Compression
Lemma 6.12 suggests a more efficient approach: the total cluster basis .X t / t2T is a nested cluster basis, and due to Theorem 6.16, a good approximation of this cluster basis by an orthogonal nested cluster basis will yield a good approximation of the matrix X in left semi-uniform representation. The problem of approximating a nested cluster basis is solved by the truncation Algorithm 19, therefore applying this procedure to the total cluster basis .X t / t2T can be expected to yield good results. According to Lemma 6.12, the transfer matrices of the total cluster basis .X t / t2T are given by ´P if there exists t C 2 T with t 2 sons.t C /; s2row .t C / s Et ´ 0 otherwise; for all t 2 T , therefore applying Algorithm 19 to the total cluster basis leads to Algorithm 24. We recall that in this algorithm, as in Algorithm 19, the matrices R t ´ Qt X t describe the cluster operator used to switch from the original total cluster basis X t to the truncated orthogonal cluster basis Q t and that the matrices Xyt are given by 1 0 R t1 C X B Xyt D U t X t D @ ::: A s 2 RM t J R t s2row .t/ P [ P L t . for ´ # sons.t /, ft1 ; : : : ; t g ´ sons.t / and M t ´ L t1 [ Algorithm 24 looks fairly simple from a mathematical point of view, but its implementation can be challenging since the total cluster basis matrices X t is usually not directly available. Let us therefore now consider a more practical variant of the algorithm. We introduce block-related counterparts of the matrices X t , Xyt and R t by defining X t;s ´ t Xs ;
Xyt;s ´ Xyt s ;
Due to the definition of X t , we have Xt D
R t;s ´ R t s X
for all s 2 row .t /:
X t;s ;
s2row .t/
and this implies Xyt D
X
Xyt;s ;
Rt D
s2row .t/
X
R t;s
s2row .t/
In the data structures typically used when implementing H - and H 2 -matrix algorithms, the set [ row .t / D rowC .t C / t C 2pred.t/
6.4 General compression algorithm
229
Algorithm 24. Theoretical algorithm for finding adaptive cluster bases. procedure AdaptiveDenseSimple(t, X , , var Q, R) if sons.t/ D ; then Lowrank(X t , tO, t , Q t , R t , L t ) fAlgorithm 18g else Mt ;; for t 0 2 sons.t / do AdaptiveDenseSimple(t 0 , X , , Q, R); P Lt 0 Mt Mt [ end for; Xyt 0 2 RM t J ; 0 for t 2 sons.t / do P R t 0 s2row .t/ s Xyt jL t 0 J end for; y t , R t , L t ); Lowrank(Xyt , tO, t , Q fAlgorithm 18g 0 for t 2 sons.t / do y t jL 0 L t Q Ft 0 t end for end if is not directly available, therefore we have to construct it from rowC .t / by using the recursion ´ r C [ rowC .t / if there is a cluster t C 2 T with t 2 sons.t C /; (6.17) rt ´ t C otherwise, i.e., if t is the root of T : row .t / We can see that r t D row .t / holds for all t 2 T . We express X t , Xyt and R t by their block counterparts X t;s , Xyt;s and R t;s and use the recursive definition (6.17) to determine r t . This results in the practical Algorithm 25 for the construction of an adaptive cluster basis. Remark 6.19. Algorithm 25 actually computes not only the orthogonal nested cluster basis Q D .Q t / t2T , but also the matrices B D .Bb /b2LC corresponding to the J
best approximation of X in H 2 .TJ ; Q; /: for b D .t; s/ 2 LC J , we have R t;s D Qt X t;s D Qt Xs and conclude Q t Qt Xs D Q t R t;s ; i.e., Bb D Rt;s .
230
6 Compression
Algorithm 25. Algorithm for finding adaptive cluster bases for dense matrices. procedure AdaptiveDense(t, r t C , X , , var Q, R, L) r t C [ rowC .t /; rt if sons.t/ D ; then Xt 0 2 RJ ; tO for s 2 r t do t Xs ; X t;s Xt X t C X t;s end for; fAlgorithm 18g Lowrank(X t , tO, t , Q t , R t , L t ); for s 2 r t do R t;s R t s end for else Mt ;; for t 0 2 sons.t / do AdaptiveDense(t 0 , r t , X , , Q, R, L); P Lt 0 Mt Mt [ end for; Xyt 0 2 RM t J ; for s 2 r t do Xyt;s 0 2 RM t J ; 0 for t 2 sons.t / do Xyt;s jL t 0 J R t 0 ;s end for; Xyt C Xyt;s Xyt end for; y t , R t , L t ); fAlgorithm 18g Lowrank(Xyt , tO, t , Q 0 for t 2 sons.t / do y t jL 0 L t Ft 0 Q t end for; for s 2 r t do R t;s R t s end for end if
Lemma 6.20 (Complexity). Let X 2 RJ , let L D .L t / t2T be the rank distribution computed by Algorithm 25, and let ´ lt ´
max¹#tO; #L t º ®P ¯ max t 0 2sons.t/ #L t 0 ; #L t
if sons.t / D ;; otherwise
(6.18)
6.4 General compression algorithm
231
for all t 2 T . Algorithm 25 requires not more than X Cpr nJ l t2 t2T
operations. If L is .Cbn ; ˛; ˇ; r; /-bounded and if T is .Crc ; ˛; ˇ; r; /-regular, the number of operations is in O..˛ C ˇ/r n nJ /. Proof. Lemma 5.7 states that all s 2 row .t / correspond to disjoint index sets sO , therefore constructing the matrices X t and Xyt requires no arithmetic operations, we only have to copy coefficients. Let t 2 T . If sons.t / D ;, Algorithm 25 computes the singular value decomposition of the matrix X t and determines Q t , R t and L t . Due to Lemma 5.21, this takes not more than Cpr .#tO/nJ minf#tO; nJ g Cpr .#tO/2 nJ Cpr l t2 nJ operations. If sons.t/ ¤ ;, the algorithm computes the singular value decomposition of the y t , R t and L t . Since the matrix Xyt has only M t rows and matrix Xyt and determines Q since X #M t D #L t 0 l t t 0 2sons.t/
holds by definition, the computation requires not more than Cpr .#M t /nJ minf#M t ; nJ g Cpr .#M t /2 nJ Cpr l t2 nJ operations. Once again we can use the Lemmas 3.45 and 3.48 to complete the proof. As in the case of the conversion of dense matrices to H 2 -matrices considered in Lemma 5.8, the number of operations required for constructing the cluster basis grows quadratically with the matrix dimension. Since the matrix is represented by n nJ coefficients, this can be considered to be the optimal rate. Let us now investigate the error introduced by the compression Algorithm 25. As in Theorem 5.30, we extend the notation to cover leaf as well as non-leaf clusters by introducing 1 80 R ˆ t 1 ˆB ˆ < :::CP A s2row .t/ s D U t X t if sons.t / ¤ ;; @ y Xt ´ for all t 2 T ; ˆ R t ˆ ˆ : X t otherwise 80 1 F t1 ˆ ˆ ˆ
232
6 Compression
y t / t2T are computed explicitly during the course Since the matrices .Xyt / t2T and .Q of Algorithm 25, an error estimate based on these quantities is desirable, since this gives us the possibility to ensure that the error is below a given tolerance. Theorem 6.21 (Compression error). Let X 2 RJ . Let Q D .Q t / t2T be the nested orthogonal cluster basis constructed by Algorithm 25. We have X y t Xyt /xk22 for all x 2 RJ : yt Q k.Xyt Q k.X …TJ ;Q; ŒX /xk22 D t2T
Proof. In order to be able to apply Theorem 6.16, we have to derive bounds for the error terms appearing on the right-hand side of equation (6.12). Let x 2 RJ and t 2 T . If sons.t/ D ; holds, we have yt Q y t Xyt /xk2 k.XN t Q t Qt XN t /xk2 D k.X t Q t Qt X t /xk2 D k.Xyt Q
(6.19)
by definition. If, on the other hand, sons.t / ¤ ; holds, we find yt Q y t U t U t U t X t /xk2 k.XN t Q t Qt XN t /xk2 D k.U t U t X t U t Q y t U t X t /xk2 yt Q D kU t .U t X t Q for all x 2 RJ . The orthogonality of U t and the equation U t X t D Xyt imply yt Q yt Q y t U t X t /xk2 D k.Xyt Q y t Xyt /xk2 k.XN t Q t Qt XN t /xk2 D k.U t X t Q (6.20) J for all x 2 R . Combining (6.19) and (6.20) with Theorem 6.16 concludes the proof. Corollary 6.22 (Error control). Let X 2 RJ . Let Q D .Q t / t2T be the nested orthogonal cluster basis constructed by Algorithm 25. For all t 2 T , we let y t Xyt k2 ; yt Q 2;t ´ kXyt Q
y t Xyt kF : yt Q F ;t ´ kXyt Q
Then we have kX …TJ ;Q; ŒX k22
X
2 2;t ;
kX …TJ ;Q; ŒX k2F D
t2T
X
F2 ;t : (6.21)
t2T
Proof. Theorem 6.21 yields k.X …TJ ;Q; X /xk22 D
X
y t Xyt /xk22 yt Q k.Xyt Q
t2T
for all x 2 RJ . This implies the left part of (6.21).
X t2T
2 2;t kxk22
6.4 General compression algorithm
233
For the proof of the right part, we once again use the canonical unit vectors e j 2 RJ defined by eij ´ ıij for all i; j 2 J and observe kX …TJ ;Q; X k2F D
X
k.X …TJ ;Q; X /e j k22
j 2J
D
XX
yt Q y t Xyt /e j k22 k.Xyt Q
j 2J t2T
D
X
yt Q y t Xyt /k2F D kXyt Q
t2T
X
F2 ;t :
t2T
Lemma 6.23 (Quasi-optimality). Let Q D .Q t / t2T be the nested orthogonal cluster basis with rank distribution L D .L t / t2T computed by Algorithm 25. Let A D .A t / t2T and B D .B t / t2T be families satisfying t A t 2 RtL O
and B t 2 RJL t
for all t 2 T (compare (6.10)). Then we have kX …TJ ;Q; ŒX k22
X
kX t A t B t k22 ;
t2T
i.e., the approximation constructed by Algorithm 25 is at least as good as the one resulting from applying Algorithm 23 to arbitrary low-rank approximations of the total cluster basis. Proof. In order to apply Corollary 6.22, we have to find bounds for the quantities 2;t . Let t 2 T . If sons.t / D ;, we can use Theorem 2.5.3 in [48] to prove 2;t D kX t Q t Qt X t k2 kX t A t B t k2 : If sons.t/ ¤ ;, the same theorem implies yt Q y t Xyt k2 kXyt Ayt B t k2 D kU t .Xyt Ayt B t k2 2;t D kXyt Q D kU t .U t X t U t A t B t k2 D kU t U t .X t A t B t /k2 kX t A t B t k2 : Combining these estimates with Corollary 6.22 yields the desired result. Remark 6.24 (Adaptivity). Due to Lemma 5.19, we can ensure that the quantities . 2;t / t2T and . F ;t / t2T used in Corollary 6.22 are arbitrarily small by choosing suitable ranks l t for the clusters adaptively, e.g., by using Algorithm 18.
234
6 Compression
6.5 Compression of hierarchical matrices Algorithm 25 can be used to find a left semi-uniform approximation of an arbitrary matrix, but the number of operations required to do this will usually behave like n nJ , which is optimal considering the fact that every single entry of the matrix X 2 RJ has to be taken into account. Sometimes the matrix X is not given in the standard representation, but in a datasparse format. In this case we can take advantage of the compact representation of the input matrix in order to reach a better than quadratic complexity. Let us assume that the matrix X 2 RJ is an H -matrix for the admissible block cluster tree TJ with local rank kH 2 N. And let us also assume that X is given in hierarchical matrix representation (3.12), i.e., that there are a family .Kb /b2LC J
of index sets and families .Ab /b2LC Bb 2
JK RsO b ,
J
and .Bb /b2LC
J
Kb
satisfying Ab 2 RtO
,
#Kb kH and X t;s D t Xs D Ab Bb
for all b D .t; s/ 2 LC J . Our goal is to take advantage of the factorized representation in order to reduce the complexity of the basic Algorithm 25. Let b D .t; s/ 2 LC J be an admissible leaf of TJ . We apply the Householder factorization as introduced in Lemma 5.16 to the matrix Bb in order to find an index y y JK set Kyb sO , an orthogonal matrix Pb 2 RsO b and a matrix Cb 2 RKb Kb with Bb D Pb Cb : According to the definition, we have #Kyb D minf#sO ; #Kb g kH . We define the matrix y K c ´ Ab Cb 2 RtO b X t;s and observe that c Pb X t;s D t Xs D Ab Bb D .Ab Cb /Pb D X t;s
(6.22)
c is a factorization of the submatrix into a left factor X t;s with only #Kyt D minf#sO ; #Kb g columns and an orthogonal right factor Pb (cf. Figure 6.3). Since the construction of the cluster basis Q D .Q t / t2T is based only on the left singular vectors and the non-zero singular values, multiplications by orthogonal matrices from the right do not c change the result, therefore our algorithm has to consider only the matrix X t;s instead of X t;s . While the latter matrix has #sO columns, the former cannot have more than minf#Kb ; #sO g kH , i.e., it can be handled more efficiently. The process of using an orthogonal matrix in this way to translate a matrix into a smaller matrix that contains the same information is called condensation, and we refer c as a condensed counterpart of X t;s . to X t;s
6.5 Compression of hierarchical matrices
235
Pb Cb
Bb X t;s
Ab
c X t;s
Ab
Figure 6.3. Condensation of a low-rank block. c Due to (6.22), we only have to consider the condensed matrices X t;s when constructing the adaptive cluster basis. In order to reach an efficient algorithm, it is not c sufficient to replace X t;s by X t;s in Algorithm 25, we also have to replace Xyt;s and R t;s by suitable condensed counterparts defined by y c ´ Xyt;s Pb 2 RM t Kb ; Xyt;s
y
Rct;s ´ R t;s Pb 2 RL t Kb
for all b D .t; s/ with t 2 T and s 2 row .t /. Combining all of these condensed matrices, we can construct condensed counterparts of X t and Xyt by c
c t : : : X t;s ; 2 RN X tc ´ X t;s 1 tO t N t c c Xytc ´ Xyt;s 2 RM ; : : : Xyt;s 1 tO where we let ´ # row .t /;
fs1 ; : : : ; s g ´ row .t /;
P [ P Kyt;s : N t ´ Kyt;s1 [
The set N t is the disjoint union of the sets Kyt;s since we have Kyt;s sO by definition and since Lemma 5.7 guarantees that all sO with s 2 row .t / are disjoint. The sets N t can be constructed by a recursion that closely resembles the one applied to r t in (6.17): 8 S ˆ
236
6 Compression
P
X tc
Xt
Figure 6.4. Condensation of X t for the model problem.
Lemma 6.25 (Complexity). Let X 2 RJ be a hierarchical matrix for the block cluster tree TJ and the local rank kH 2 N, given in H -matrix representation (3.12). Let pJ be the depth of TJ . Let L D .L t / t2T be the rank distribution computed by Algorithm 27, and let .l t / t2T be defined as in (6.18). Algorithm 27 requires not more than X 2 2 nJ C Csp .pJ C 1/ .Cpr kH l t2 C 2kH lt / Csp Cqr .pJ C 1/kH t2T
operations. If L is .Cbn ; ˛; ˇ; r; /-bounded and if T is .Crc ; ˛; ˇ; r; /-regular, the 2 number of operations is in O.kH .pJ C 1/.n C nJ / C kH .pJ C 1/.˛ C ˇ/r n /. Proof. Algorithm 27 starts by computing the weight matrices Cb for all b D .t; s/ 2 LC J using Algorithm 15. According to Lemma 5.17, computing Cb for an admissible leaf b D .t; s/ 2 LC J requires not more than 2 Cqr .#sO /.#Kb / minf#sO ; #Kb g Cqr .#sO /.#Kb /2 Cqr .#sO /kH
operations. Since the block cluster tree TJ is Csp -sparse and since Definition 3.12 implies level.s/ level.b/ pJ for all b D .t; s/ 2 TJ , we get the bound X X X 2 2 Cqr kH #sO D Cqr kH #sO bD.t;s/2LC J
s2TJ t2colC .s/ level.s/pJ
pJ
X
2 Csp Cqr kH
#sO D
s2TJ level.s/pJ Cor. 3.10
2 Csp Cqr kH
X X
#sO
`D0 s2T ` J
pJ 2 Csp Cqr kH
X `D0
2 nJ D Csp Cqr kH .pJ C 1/nJ
6.5 Compression of hierarchical matrices
237
Algorithm 26. Recursion for finding adaptive cluster bases for H -matrices. y var Q, R, L) procedure AdaptiveHRec(t , r t C , N t C , , A, C , K, r t C [ rowC .t /; N t Nt C ; rt for s 2 rowC .t / do P Kyt;s Nt Nt [ end for; if sons.t/ D ; then t 0 2 RtN ; X tc O for s 2 r t do X tc jKyt;s t Ab Cb end for; fAlgorithm 18g Lowrank(X tc , tO, t , Q t , Rct , L t ); for s 2 r t do Rct;s Rct jL t Kyt;s end for else Mt ;; for t 0 2 sons.t / do y Q, R, L); AdaptiveHRec(t 0 , r t , N t , , A, C , K, P Lt 0 Mt Mt [ end for; 0 2 RM t N t ; Xytc for s 2 r t do y c 0 2 RM t K t;s ; Xyt;s 0 for t 2 sons.t / do c Xyt;s jL 0 Kyt;s Rct0 ;s t end for; c Xytc jM t Kyt;s Xyt;s end for; yt , R yct , L t ); fAlgorithm 18g Lowrank(Xytc , M t , t , Q 0 for t 2 sons.t / do y t jL 0 L t Ft 0 Q t end for; for s 2 r t do yct;s yct j R R y t;s L t K end for end if
for the number of operations required to prepare all matrices .Cb /b2LC . Once these J matrices are available, the recursive Algorithm 26 is called.
238
6 Compression
Algorithm 27. Adaptive cluster bases for H -matrices. procedure AdaptiveH(A, B, , var Q, R, L) for b D .t; s/ 2 LC J do Householder(Bb , sO , Pb , Cb , Kyb ) end for; y var Q, R, L) AdaptiveHRec(root.T /, ;, ;, , A, C , K,
fAlgorithm 15g fAlgorithm 26g
Let t 2 T . We can bound the number of columns of X tc by X X X #Kyt;s D #Kyt;s Csp kH .pJ C 1/: #Nyt D s2row .t/
s2rowC .t C / t C 2pred.t/ level.t C /pJ
If sons.t/ D ;, Algorithm 26 first constructs the matrix X tc by multiplying t Ab and Cb , which requires not more than 2.#tO/kH #Kyt;s operations for each s 2 row .t /, and a total of X 2 2.#tO/kH #Kyt;s D 2.#tO/kH #Nyt 2Csp l t kH .pJ C 1/: s2row .t/
Finding the low-rank factorization of this matrix requires not more than Cpr .#tO/2 #N t Cpr .#tO/2 Csp kH .pJ C 1/ Csp Cpr l t2 kH .pJ C 1/ operations according to Lemma 5.21. If sons.t/ ¤ ;, the algorithm uses Algorithm 18 to find a low-rank approximation of the matrix Xytc 2 RM t N t . According to Lemma 5.21, this requires not more than Cpr .#M t /2 #N t Cpr .#M t /2 Csp kH .pJ C 1/ Cpr Csp l t2 kH .pJ C 1/ operations. Combining the estimates for leaf and non-leaf clusters yields the bound 2 Csp .pJ C 1/.2l t kH C Cpr kH l t2 /
for the number of operations for one cluster t , and summing over all clusters proves our claim. If L is .Cbn ; ˛; ˇ; r; /-bounded and T is .Crc ; ˛; ˇ; r; /-regular, we can again use the Lemmas 3.45 and 3.48 to bound the asymptotic complexity. According to Lemma 3.31, the H -matrix representation of a matrix X 2 RJ requires O.kH .pJ C 1/.n C nJ // units of storage. The preparation of X tc (or Xytc ) 2 in Algorithm 27 requires O.kH .pJ C 1/.n C nJ // operations, since approximately O.kH / operations are performed for each coefficient of the input representation. The computation of the cluster bases Q then takes additional O.kH .˛ Cˇ/r .pJ C1/n / operations, this reminds us of the bound of Lemma 6.20 for the original algorithm, only that nJ in the original estimate is replaced by kH .pJ C 1/ due to condensation.
6.6 Recompression of H 2 -matrices
239
6.6 Recompression of H 2 -matrices We have seen that we can improve the efficiency of the general Algorithm 25 if the matrix X we intend to convert to the H 2 -matrix format is already given in a data-sparse representation, e.g., as an H -matrix. The complexity of the modified Algorithm 27 is essentially proportional to the amount of storage required to represent the matrix X . Let us now consider an even more compact representation of X : we assume that X is given in H 2 -matrix form and intend to compute the adapted cluster bases Q D .Q t / t2T defined by Algorithm 25 efficiently. At first glance, this task makes no sense: if X is already an H 2 -matrix, why should we convert it into an H 2 -matrix? The answer is given by Lemma 6.23: the cluster bases computed by the adaptive algorithm have almost optimal rank, i.e., we could create an initial H 2 -matrix approximation of a matrix by polynomial interpolation or a similar, probably non-optimal, approach and then construct adaptive cluster bases with lower storage requirements. Let VX D .VX;t / t2T be a nested cluster basis with rank distribution .KX;t / t2T for T . Let WX D .WX;s /s2TJ be a nested cluster basis with rank distribution .LX;s /s2TJ for TJ . Let X 2 H 2 .TJ ; VX ; WX /. Let S D .Sb /b2LC be the coefficient matrices J
of X, i.e., let XD
X
VX;t Sb WX;s C
bD.t;s/2LC J
X
t Xs :
(6.24)
bD.t;s/2L J
As in the case of hierarchical matrices, the key to finding an efficient algorithm is to use suitable condensed matrices. We consider an admissible leaf b D .t; s/ 2 LC J . Due to Lemma 5.4, we have X t;s D t Xs D VX;t Sb WX;s :
(6.25)
Let now t 2 T and s 2 row .t /. Due to (5.4), we can find a predecessor t C 2 pred.t / of t and a cluster s 2 rowC .t C / such that .t C ; s/ 2 LC J is an admissible leaf. Lemma 6.13 yields X t;s D t Xs D t t C Xs D t VX;t C S t C ;s WX;s D VX;t EX;t;t C S t C ;s WX;s ;
where EX;t;t C denotes a long-range transfer matrix (cf. Definition 5.27) for the cluster basis VX , and applying this equation to all s 2 row .t / gives us the representation X X EX;t;t C S t C ;s WX;s X t D VX;t t C 2pred.t/ s2rowC .t C /
(cf. (6.9) for a similar result for left semi-uniform matrices). Introducing the matrix X X JKX;t WX;s S tC ;s EX;t;t ; Yt ´ C 2 R t C 2pred.t/ s2rowC .t C /
240
6 Compression
we conclude that
X t D VX;t Y t
holds, i.e., that we have a factorized representation of the entire total cluster basis matrix X t , not only of one block X t;s . We apply Lemma 5.16 in order to construct an y index set Kyt J with #Kyt #KX;t , an orthogonal matrix P t 2 RJK t and a weight y matrix Z t 2 RK t KX;t satisfying Yt D Pt Zt : Using this factorization, we conclude X t D VX;t Y t D .VX;t Z t /P t ;
(6.26)
i.e., we can replace the entire total cluster basis matrix X t by the condensed matrix X tc ´ VX;t Z t with not more than #KX;t columns. This is a significant improvement compared to the case of H -matrices, where the condensed total cluster basis matrix has #N t
Csp kH .pJ C 1/ columns. Obviously, working with the condensed matrix X tc is far more efficient than working directly with X t . Computing Z t in the way described here would involve the factorization of the matrix Y t 2 RJKX;t and lead immediately to an algorithm of quadratic complexity. Instead of computing the factorization of Y t directly, we have to use a recursive construction: if we assume that Y t has already been computed and that we want to compute Y t 0 for a son t 0 2 sons.t /, Lemma 5.28 implies EX;t 0 ;t C D EX;t 0 EX;t;t C and we find Yt 0 D
X
X
WX;s S tC ;s EX;t 0 ;t C
t C 2pred.t 0 / s2rowC .t C /
D
X
s2rowC .t 0 /
D
X
for all t C 2 pred.t /
WX;s S t0 ;s C
X
X
t C 2pred.t/ s2rowC .t C /
WX;s S tC ;s EX;t;t C EX;t 0
(6.27)
WX;s S t0 ;s C Y t EX;t 0:
s2rowC .t 0 /
This equation allows us to compute all matrices Y t by the recursive relationship 8 P ˆ C s2rowC .t/ WX;s S t;s if there exists a t C 2 T
6.6 Recompression of H 2 -matrices
We first consider the second case. Finding a factorization of X WX;s S t;s Yt D
241
(6.29)
s2rowC .t/
directly leads to a complexity of O..#tO/.#LX;s /.#KX;t // for each term in the sum, and the resulting total complexity would not be optimal. We avoid this problem by applying the orthogonalization Algorithm 16 to the cluster basis WX D .WX;s /s2TJ . This procedure is of optimal complexity and yields an yX D .L yX;s /s2T orthogonal cluster basis QX D .QX;s /s2TJ with rank distribution L J and a family of weight matrices RX D .RX;s /s2TJ satisfying for all s 2 TJ :
WX;s D QX;s RX;s
(6.30)
Using these factorizations allows us to write the sum (6.29) in the form X Yt D WX;s S t;s s2rowC .t/
D
X
QX;s RX;s S t;s
s2rowC .t/
D QX;s1
1 RX;s1 S t;s 1
B C :: QX;s @ A D UX;t Yyt : 0
:::
RX;s S t;s
for ´ # rowC .t / and fs1 ; : : : ; s g ´ rowC .t / and the matrices 1 0 RX;s1 S t;s 1
C B :: N K UX;t ´ QX;s1 : : : QX;s 2 RJN t ; Yyt ´ @ A 2 R t X;t : RX;s S t;s with the index set Nt ´
[
yX;s : L
s2rowC .t/
Due to Lemma 5.7, the matrix UX;t is orthogonal. The matrix Yyt has only X yX;s #L #N t D s2rowC .t/
rows, therefore we can afford to apply the Householder factorization Algorithm 15 in y order to construct an index set Kyt N t , an orthogonal matrix Pyt 2 RN t K t and a y weight matrix Z t 2 RK t KX;t satisfying Yyt D Pyt Z t :
242
6 Compression
Since UX;t is orthogonal, the same holds for the matrix P t ´ U t Pyt ; and we see that we have efficiently constructed the desired factorization X Yt D WX;s S t;s D UX;t Yyt D UX;t Pyt Z t D P t Z t : s2rowC .t/
Now let us consider the first line in (6.28). We assume that a factorization Yt C D Pt C Zt C of Y t C has already been computed. Using the weight matrices RX;s provided by the orthogonalization Algorithm 16, we find X X C WX;s S t;s D P t C Z t C EX;t C QX;s RX;s S t;s Y t D Y t C EX;t s2rowC .t/
D Pt C
QX;s1
s2rowC .t/
1
0
Z t C EX;t C
B B RX;s1 S t;s1 C
: : : QX;s B C D UX;t Yyt :: A @ : RX;s S t;s
for ´ # rowC .t /, fs1 ; : : : ; s g ´ rowC .t / and the matrices 1 0 Z t C EX;t B RX;s S C
1 t;s1 C B UX;t ´ P t C QX;s1 : : : QX;s 2 RJN t ; Yyt ´ B C 2 RN t KX;t :: A @ : RX;s S t;s
(6.31) with the index set N t ´ Kyt C [
[
yX;s : L
s2rowC .t/
We apply Algorithm 15 to the matrix Yyt in order to find an index set Kyt N t , an y y orthogonal matrix Pyt 2 RN t K t and a weight matrix Z t 2 RK t KX;t with Yyt D Pyt Z t :
(6.32)
Due to Lemma 5.7, the matrix UX;t is orthogonal, and since Pyt is also orthogonal, the same holds for P t ´ UX;t Pyt ;
6.6 Recompression of H 2 -matrices
243
and we find Y t D Y t C EX;t C
X
WX;s S t;s D UX;t Yyt D UX;t Pyt Z t D P t Z t :
s2rowC .t/
Fortunately, we are only interested in the weight matrices .Z t / t2T , therefore we do not have to construct P t , UX;t and QX;s explicitly. The fairly small matrices RX;s and Yyt are sufficient, and these matrices we can handle efficiently. We construct adaptive cluster bases for H 2 -matrices in three phases: the weights RX D .RX;s /s2TJ are computed by applying the orthogonalization Algorithm 16 to the column cluster basis WX . Once the weights for the column cluster basis have been prepared, we have to compute the weights .Z t / t2T for the total cluster basis are computed using (6.31) and (6.32). The corresponding top-down recursion is summarized in Algorithm 28. Algorithm 28. Recursive construction of weight matrices for the total cluster basis. y procedure TotalWeights(t , TJ , Kyt C , Z t C , VX , RX , S , var Z, K) y Nt Kt C ; for s 2 rowC .TJ ; t / do yX;s P L Nt [ Nt end for; 0 2 RN t KX;t ; Yyt y if K t C ¤ ; then Yyt jKy C KX;t Z t C EX;t ; ft C D father.t /g t end if; for s 2 rowC .TJ ; t / do RX;s S t;s Yyt jLyX;s KX;t end for; Householder(Yyt , N t , Pyt , Z t , Kyt ); fAlgorithm 15g for t 0 2 sons.t / do y TotalWeights(t 0 , Kyt , Z t , VX , RX , S, Z, K) end for Lemma 6.26 (Complexity of finding weights). Let .kX;t / t2T and .lX;s /s2TJ be defined as in (3.16) and (3.18) for the rank distributions KX and LX . Algorithm 28 requires not more than X X 3 3 3 maxf2; Cqr g 2 kX;t C .kX;t C lX;s / t2T
operations.
bD.t;s/2LC J
244
6 Compression
Proof. Since Algorithm 28 uses Algorithm 15 to construct Kyt by splitting the matrix Yyt 2 RN t KX;t , we have #Kyt #KX;t for all t 2 T . Let now t 2 T n froot.T /g, and let t C 2 T be its father. The computation of Z t C EX;t requires not more than 2.#Kyt C /.#KX;t C /.#KX;t / 2.#KX;t C /2 .#KX;t / operations, the computation of RX;s S t;s requires not more than 2 yX;s /.#LX;s /.#KX;t / 2.#LX;s /2 .#KX;t / 2lX;s lX;t 2.#L
operations, and therefore 2 .#KX;t /.#KX;t C /2 C
X
2 kX;t lX;s
s2rowC .t/
operations are sufficient to set up the matrix Yyt 2 RKX;t N t . Due to Lemma 5.17, computing its Householder factorization using Algorithm 15 requires not more than Cqr .#KX;t /.#N t / minf#KX;t ; #N t g Cqr .#KX;t /2 .#N t / X yX;s Cqr .#KX;t /2 #Kyt C C #L
s2rowC .t/
Cqr .#KX;t /2 .#KX;t C / C
X
2 kX;t lX;s
s2rowC .t/
operations. Using (5.9), we get the bound maxf2; Cqr g .#KX;t /.#KX;t C /2 C .#KX;t /2 .#KX;t C / C
X
2 2 kX;t lX;s C kX;t lX;s
s2rowC .t/
maxf2; Cqr g .#KX;t /.#KX;t C /2 C .#KX;t /2 .#KX;t C / C
X
3 3 kX;t C lX;s
s2rowC .t/
for the number of operations required in one cluster t 2 T n froot.T /g. For the root cluster r ´ root.T / we get the same estimate without the terms involving #KX;t C , i.e., X 3 3 maxf2; Cqr g kX;r C lX;s : s2rowC .r/
For the number of operations for all clusters except for the root r we get the bound X X X X maxf2; Cqr g .#KX;t /2 .#KX;t C / C .#KX;t /.#KX;t C /2 t C 2T t2sons.t C /
t C 2T t2sons.t C /
6.6 Recompression of H 2 -matrices
C
X
X
3 3 kX;t C lX;s
t2T nfrg s2rowC .t/
X
C
X
X
2 kX;t C .#KX;t C / C
maxf2; Cqr g
t C 2T
X
3 3 kX;t C lX;s
t2T nfrg s2rowC .t/
X 3 kX;t maxf2; Cqr g 2 C C
245
kX;t C .#KX;t C /2
t C 2T
X
X
3 3 kX;t C lX;s ;
t2T nfrg s2rowC .t/
t C 2T
and adding the bound for the root cluster yields the bound X X X 3 3 3 kX;t C maxf2; Cqr g kX;t C lX;s 2 maxf2; Cqr g t2T
D 2 maxf2; Cqr g
X
t2T s2rowC .t/ 3 kX;t C maxf2; Cqr g
X
3 3 kX;t C lX;s
bD.t;s/2LC J
t2T
for the total number of operations. In the last step, we compute the adaptive cluster basis by using the weights .Z t / t2T and equation (6.26) to replace the total cluster basis matrices X t of the original Algorithm 25 by the condensed matrices X tc D VX;t Z t . Since the handling of the total cluster bases has been almost completely moved into Algorithm 28, the resulting Algorithm 29 closely resembles the truncation Algorithm 19 up to an important difference: the singular value decompositions are not computed for VX;t and VyX;t , but for the weighted matrices X tc D VX;t Z t and Xytc D VyX;t Z t . Due to this modification, the z t D Qt VyX;t Z t do not, as in Algorithm 19, describe the change of basis matrices R from VX;t to Q t , and we have to compute R t D Qt VX;t explicitly. All three phases of the procedure are collected in the “front end” Algorithm 30, which calls the orthogonalization Algorithm 16, the weighting Algorithm 28 and the modified truncation Algorithm 29 in order to construct an adaptive orthogonal cluster basis for an H 2 -matrix X . We can avoid the preparation step of calling Algorithm 16 to get the matrices .RX;s /s2TJ if we know that the cluster basis W is orthogonal, since then we have RX;s D I for all s 2 TJ (cf. Section 6.7). Theorem 6.27 (Complexity). Let .kX;t / t2T and .lX;s /s2TJ be defined as in (3.16) and (3.18) for the rank distributions KX and LX . There is a constant Cad 2 R3 such that Algorithm 30 requires not more than X X 3 3 Cad .Csp C 2/ kX;t C lX;s t2T
s2TJ
246
6 Compression
Algorithm 29. Recursion for finding adaptive cluster bases for H 2 -matrices. y , var Q, R, L) procedure AdaptiveH2Rec(t, VX , Z, K, if sons.t/ D ; then y
Kt VX;t Z t 2 R ; X tc tO c O z t , L t ); Lowrank(X t , t , t , Q t , R Rt Qt VX;t else ;; Mt for t 0 2 sons.t / do y , Q, R, L); AdaptiveH2Rec(t 0 , VX , Z, K, P Lt 0 Mt Mt [ end for; VyX;t 0 2 RM t KX;t ; 0 for t 2 sons.t / do R t 0 EX;t 0 ; VyX;t jL 0 KyX;t t end for; y Xytc VyX;t Z t 2 RM t K t ; yt , R z t , L t ); Lowrank(Xytc , M t , t , Q 0 for t 2 sons.t / do y t jL 0 L t Ft 0 Q t end for; y t VyX;t Rt Q end if
Algorithm 30. Adaptive cluster bases for H 2 -matrices. procedure AdaptiveH2(VX , WX , S, , var Q, R, L) yX ); Orthogonalize(root.TJ /, WX , QX , RX , L y TotalWeights(root.T /, TJ , ;, 0, VX , RX , S, Z, K); y , Q, R, L) AdaptiveH2Rec(root.T /, VX , Z, K,
fAlgorithm 18g
fAlgorithm 18g
fAlgorithm 16g fAlgorithm 28g fAlgorithm 29g
operations. If the rank distributions KX and LX are .Cbn ; ˛; ˇ; r; /-bounded and if the trees T and TJ are .Crc ; ˛; ˇ; r; /-regular, the number of operations is in O..˛ C ˇ/2r .n C nJ //. Proof. According to Lemma 5.18, the computation of the weight matrices .RX;s /s2TJ requires not more than X 3 lX;s (6.33) .Cqr C 2/ s2TJ
operations. According to Lemma 6.26, the computation of the total weight matrices
6.6 Recompression of H 2 -matrices
.Z t / t2T requires not more than X X 3 kX;t C maxf2; Cqr g 2 t2T
3 3 kX;t C lX;s
bD.t;s/2LC J
X X 3 D maxf2; Cqr g 2 kX;t C
X
3 kX;t C
t2T s2rowC .t/
t2T
247
X
X
3 lX;s
s2TJ t2colC .s/
X X X 3 3 3 kX;t C Csp kX;t C Csp lX;s maxf2; Cqr g 2 t2T
X
.Csp C 2/ maxf2; Cqr g
t2T 3 kX;t C
t2T
X
3 lX;s
s2TJ
(6.34)
s2TJ
operations. Once the weights .Z t / t2T have been computed, Algorithm 30 starts the main recursion by calling Algorithm 29. Let t 2 T . If sons.t / D ;, we compute X tc D 3 operations, construct Q t using Algorithm 18 VX;t Z t in 2.#tO/.#KX;t /.#Kyt / 2kX;t 3 2 O in not more than Cpr ..#t /.#KX;t / / Cpr kX;t operations (cf. Lemma 5.21) and R t D 3 Q t VX;t in 2.#L t /.#tO/.#KX;t / 2kX;t operations, which leads to a total complexity of 3 .Cpr C 4/kX;t for each leaf cluster. If sons.t/ ¤ ;, (5.15) implies that we can compute the matrix VyX;t in X X 2.#L t 0 /.#KX;t 0 /.#KX;t / 2.#KX;t / .#KX;t 0 /2 t 0 2sons.t/
t 0 2sons.t/
2.#KX;t /
X
2 #KX;t 0
3 2kX;t
t 0 2sons.t/ 3 operations, construct Xytc D VyX;t Z t in 2.#M t /.#KX;t /.#Kyt / 2kX;t operations, 3 2 y t in not more than Cpr .#M t /.#Kyt / Cpr k operations apply Algorithm 18 to find Q X;t 3 and then compute R t in 2.#L t /.#M t /.#KX;t / 2kX;t operations. Therefore we also have a complexity of 3 .Cpr C 4/kX;t
for each non-leaf cluster. Adding the bounds for all t 2 T yields X 3 kX;t : .Cpr C 4/
(6.35)
t2T
Adding (6.33), (6.34) and (6.35) gives us the desired estimate with Cad ´ maxf2; Cqr g C maxfCqr =2 C 1; Cpr =2 C 2g: If KX and LX are .Cbn ; ˛; ˇ; r; /-bounded and if T and TJ are .Crc ; ˛; ˇ; r; /-regular, we can again use the Lemmas 3.45 and 3.48 to complete the proof.
248
6 Compression
6.7 Unification and hierarchical compression The Algorithm 27 for the construction of an adaptive row cluster basis for an H matrix X 2 RJ computes the same cluster basis as Algorithm 25, and according to Lemma 6.23, this means that we achieve an almost optimal result. The disadvantage of this procedure is that the entire matrix X has to be given in H matrix representation before Algorithm 27 can begin to construct the cluster basis. In some applications, e.g., if we are approximating a boundary element matrix by adaptive [4], [7] or hybrid cross approximation [17], if the block cluster tree is determined by an adaptive coarsening strategy [50] or if the blocks result from an a posteriori matrix product (cf. Chapter 8), we would like to convert submatrices into an H 2 -matrix format as soon as possible in order to save storage. Turning a single low-rank matrix block Ab Bb into an H 2 -matrix with adaptive cluster bases is a simple task: we can use Sb D I , Vb D Ab and Wb D Bb in order to get Ab Bb D Vb Wb D Vb Sb Wb : This means that we can convert the submatrices corresponding to the leaf blocks of TJ into individual H 2 -matrices, but each of these H 2 -matrices will have its own cluster bases, so we cannot simply add them to construct a global H 2 -matrix approximation. We solve this problem by a blockwise recursion: let b D .t; s/ 2 TJ . If b is a leaf, the construction of cluster bases Vb D .Vb;t / t 2sons .t/ , Wb D .Wb;s /s 2sons .s/ is straightforward. Otherwise, we assume that cluster bases Vb 0 and Wb 0 and H 2 -matrix approximations Xb 0 2 H 2 .TJ ; Vb 0 ; Wb 0 / have been computed for all submatrices t 0 Xs 0 corresponding to sons b 0 D .t 0 ; s 0 / 2 sons.b/ of b and aim to combine these submatrices into a new H 2 -matrix approximation Xb 2 H 2 .TJ ; Vb ; Wb / of t Xs .
Construction of unified cluster bases Let us first consider the task in a more general setting. We assume that we have a finite index set B and a family .Xb /b2B of H 2 -matrices. For each b 2 B, there are a row cluster basis Vb D .Vb;t / t2T with rank distribution Kb D .Kb;t / t2T and a column cluster basis Wb D .Wb;s /s2TJ with rank distribution Lb D .Lb;s /s2TJ such that Xb 2 H 2 .TJ ; Vb ; Wb /
holds for all b 2 B:
We are looking for a row cluster basis Q D .Q t / t2T such that the combined matrix given by
Xz D Xb1 : : : Xbˇ for ˇ ´ #B and fb1 ; : : : ; bˇ g ´ B can be approximated, i.e., such that each of the matrices Xb can be approximated in the H 2 -matrix space H 2 .TJ ; Q; Wb /. Since this cluster basis Q can be used for all matrices Xb , we call it a unified row cluster basis for the family .Xb /b2B .
6.7 Unification and hierarchical compression
249
For each b 2 B, we introduce the corresponding total cluster basis .Xb;t / t2T given by X X Xb;t ´ (6.36) t Xb s for all t 2 T : t C 2pred.t/ s2rowC .t C /
Let t 2 T . Since Xb is an H 2 -matrix in the space H 2 .TJ ; Vb ; Wb /, we can proceed as in the previous section to find a matrix Yb;t 2 RJKb;t with ; Xb;t D Vb;t Yb;t
and applying Lemma 5.16 to Yb;t yields an index set Kyb;t J with #Kyb;t #Kb;t , y y an orthogonal matrix Pb;t 2 RJKb;t and a weight matrix Zb;t 2 RKb;t Kb;t such that Yb;t D Pb;t Zb;t holds and we get
Pb;t ; Xb;t D Vb;t Zb;t
(6.37)
c i.e., we can replace the total cluster basis Xb;t by the condensed matrix Xb;t ´ Vb;t Zb;t . The total cluster basis of Xz is given by
Xzt ´ Xb1 ;t : : : Xbˇ ;t for all t 2 T ;
and using equation (6.37) allows us to represent it in the condensed form
Xzt D Xb1 ;t : : : Xbˇ ;t 0 1 Pb1 ;t B C :: D Vb1 ;t Zb1 ;t : : : Vbˇ ;t Zbˇ ;t @ A : Pbˇ ;t 0 1 Pb1 ;t C :: cB z D Xt @ A : Pbˇ ;t for the matrix
Xztc ´ Vb1 ;t Zb1 ;t
:::
and the index set Nt ´
[
t Vbˇ ;t Zbˇ ;t 2 RN tO Kyb;t :
b2B
Due to #N t .#Kb1 ;t / C C .#Kbˇ ;t /, the condensed matrix Xztc will typically be significantly smaller than the original total cluster basis matrix Xzt .
250
6 Compression
We can proceed as in Algorithm 30: in leaf clusters, a singular value decomposition of Xztc yields the cluster basis matrix Q t and Rb;t ´ Qt Vb;t , in non-leaf clusters we let D # sons.t / and ft1 ; : : : ; t g D sons.t / and use 1 0 Rb1 ;t1 Eb;t1 C B :: M K Vyb;t D @ A 2 R t b;t ; :
Rb1 ;t Eb;t
Xytc D Vyb1 ;t Zb1 ;t
:::
Vybˇ ;t Zbˇ ;t 2 RM t N t
y t and Rb;t ´ Q y t Vyb;t . Essentially the only difference between to construct the matrix Q the new algorithm and the original Algorithm 30 is that the new basis is required to approximate multiple total cluster bases simultaneously, and that we therefore have to work with multiple original cluster bases and multiple weights in parallel. Lemma 6.28 (Complexity). Let .l t / t2T and .kb;t /b2B;t2T be defined as in (6.18) and (3.16) for the rank distributions L and Kb , b 2 B. We assume that the weight matrices .Zb;t / t2T have already been computed. Algorithm 31 requires not more than X X 2 maxf4; Cpr C 2g .l t kb;t C l t2 kb;t / b2B t2T
operations. If the rank distributions L and Kb , b 2 B are .Cbn ; ˛; ˇ; r; /-bounded and if T is .Crc ; ˛; ˇ; r; /-regular, the number of operations is in O..˛ C ˇ/2r .#B/n /. Proof. Let t 2 T . for all b 2 If sons.t/ D ;, Algorithm 31 computes Xztc by multiplying Vb;t and Zb;t 2 2 B, and this requires not more than 2.#tO/.#Kb;t /.#Kyb;t / 2.#tO/.#Kb;t / 2l t kb;t operations for each b 2 B. Then Algorithm 18 is called to compute Q t . According to Lemma 5.21, this can be accomplished in Cpr .#tO/2 .#N t / Cpr l t2 .#N t / operations. Finally the matrices Rb;t are computed by not more than 2.#L t /.#tO/.#Kb;t / 2 2l t kb;t operations for each b 2 B. We conclude that a leaf cluster requires not more than X 2 2l t kb;t C .Cpr C 2/l t2 kb;t operations. (6.38) b2B
If sons.t/ ¤ ;, Algorithm 31 computes Vyb;t by multiplying Rb;t 0 and Eb;t 0 for all blocks b 2 B and all sons t 0 2 sons.t /. This requires 2.#L t 0 /.#Kb;t 0 /.#Kb;t / operations per block and son, leading to a total of X X 2.#L t 0 /.#Kb;t 0 /.#Kb;t / b2B t 0 2sons.t/
6.7 Unification and hierarchical compression
2
X b2B
2
X
b2B
X
.#L t 0 /2
t 0 2sons.t/
X
1=2
#L t 0
t 0 2sons.t/
X
X
.#Kb;t 0 /2
251
1=2 .#Kb;t /
t 0 2sons.t/
X 2 #Kb;t 0 .#Kb;t / 2 l t kb;t
t 0 2sons.t/
b2B
operations. Once the matrices Vyb;t are available, the matrix Xytc is computed by multiplying Vyb;t 2 by Zb;t , which takes 2.#M t /.#Kb;t /.Kyb;t / 2.#M t /.#Kb;t /2 2l t kb;t operations per block and a total of X 2 l t kb;t operations. 2 b2B
y t . Due to Lemma 5.21, this takes not We use Algorithm 18 to compute the matrix Q more than X 2 X Cpr .#M t /2 .#N t / D Cpr #L t 0 .#N t / Cpr l t2 #Kyb;t t 0 2sons.t/
Cpr l t2
X
#Kb;t Cpr l t2
b2B
X
b2B
kb;t
operations.
b2B
Finally the matrices Rb;t are computed by 2.#L t /.#M t /.#Kb;t / 2l t2 kb;t operations per block, so we can conclude that a non-leaf cluster requires not more than X 2 4l t kb;t C .Cpr C 2/l t2 kb;t operations. (6.39) b2B
We can complete the proof by combining the estimates (6.38) and (6.39) to get the bound X 2 maxf4; Cpr C 2g l t kb;t C l t2 kb;t (6.40) b2B
for one cluster, and summing the bounds for all clusters completes the first part of the proof. If L and Kb , b 2 B, are .Cbn ; ˛; ˇ; r; /-bounded and if T is .Crc ; ˛; ˇ; r; /regular, we can apply the Lemmas 3.45 and 3.48 to prove the asymptotic bound. The complexity estimate of Lemma 6.28 for Algorithm 31 requires explicit bounds for the rank distribution L of the new cluster basis Q, while the complexity estimate of Theorem 6.27 for the closely related Algorithm 30 uses only bounds for the original cluster bases. Assuming that the cardinality of B is bounded, we can derive bounds for L from bounds for the rank distributions .Kb /b2T and find the following result, which closely resembles the one of Theorem 6.27. Corollary 6.29. Let kb D .kb;t / t2T be defined as in (3.16) for the rank distributions Kb , b 2 B. We assume that the weight matrices .Zb;t / t2T have already been
252
6 Compression
computed. Algorithm 31 requires not more than maxf4; Cpr C 2g..#B/3 C 1/
X X
3 kb;t
b2B t2T
operations. Algorithm 31. Recursion for finding unified cluster bases for multiple H 2 -matrices. procedure UnifiedBasesRec(t , .Vb /b2B , .Zb /b2B , .Kyb /b2B , , var Q, .Rb /b2B , L) Nt ;; for b 2 B do P Kyb;t Nt Nt [ end for; if sons.t/ D ; then Xztc 0 2 RN t ; for b 2 B do Xztc jKyb;t Vb;t Zb;t end for; z t , L t ); Lowrank(Xztc , tO, t , Q t , R fAlgorithm 18g for b 2 B do Rb;t Qt Vb;t end for else Mt ;; for t 0 2 sons.t / do UnifiedBasesRec(t 0 , .Vb /b2B , .Zb /b2B , .Kyb /b2B , , Q, .Rb /b2B , L); P Lt 0 Mt [ Mt end for; Xytc 0 2 RM t N t ; for b 2 B do y Vyb;t 0 2 RM t Kb;t ; for t 0 2 sons.t / do Rb;t 0 Eb;t 0 Vyb;t jL 0 Kyb;t t end for; Xytc jM t Kyb;t Vyb;t Zb;t end for; yt , R z t , L t ); Lowrank(Xytc , M t , t , Q fAlgorithm 18g for b 2 B do y t Vyb;t Rb;t Q end for end if
6.7 Unification and hierarchical compression
253
Proof. Let l D .l t / t2T be defined as in (6.18) for the rank distribution L D .L t / t2T of the cluster basis Q D .Q t / t2T . Lemma 5.19 implies that #L t minf#M t ; #N t g holds for all t 2 T . The definition of N t yields X #L t #N t D #Kb;t for all t 2 T ; b2B
and this gives us lt
X
kb;t
for all t 2 T :
b2B
According to (6.40) and the simple estimate (5.9), the number of operations for a cluster t 2 T is bounded by X X 2 3 kb;t l t C kb;t l t2 maxf4; Cpr C 2g .kb;t C l t3 /: maxf4; Cpr C 2g b2B
b2B
We apply Hölder’s inequality to the second term and find 3 X 2 X X X 3 3 D .#B/2 kb;t 13=2 kb;t kb;t ; l t3 b2B
b2B
b2B
(6.41)
b2B
therefore we can bound the number of operations for t 2 T by X X 3 kb;t maxf4; Cpr C 2g C .#B/2 kb30 ;t b 0 2B
b2B 3
D maxf4; Cpr C 2g..#B/ C 1/
X
3 kb;t ;
b2B
and summing these bounds for all t 2 T completes the proof.
Unification of submatrices Now we can return our attention to the problem of constructing a unified cluster basis for a matrix consisting of several submatrices. Submatrices correspond to subtrees of the block cluster tree TJ . The elements of these subtrees are formed from clusters contained in subtrees of the cluster trees T and TJ . In order to handle these subtrees efficiently, we introduce the following notation: Definition 6.30 (Subtrees). For t 2 T , we denote the subtree of T with root t by T t . For s 2 TJ , we denote the subtree of TJ with root s by Ts . For b 2 TJ , we denote the subtree of TJ with root b by Tb . We denote its leaves by Lb and its admissible leaves by LC . b
254
6 Compression
Let b0 D .t0 ; s0 / 2 TJ be a non-leaf block in TJ . We assume that Xb D t Xb s 2 H 2 .TJ ; Vb ; Wb / holds for all b D .t; s/ 2 sons.b0 / with a row basis .Vb;t / t 2T t and a column basis .Wb;s /s 2Ts , i.e., that all submatrices are H 2 -matrices and represented accordingly. Let t 2 sonsC .t0 /, and let B ´ f.t; s/ W s 2 sonsC .s0 /g sons.b0 / (with sonsC .s0 / as in Definition 3.13). We use Algorithm 28 to compute weight matrices .Zb;t / t2T for all b 2 B, then we apply Algorithm 31 to determine the new orthogonal cluster basis Vb0 for all descendants of t (cf. Figure 6.5).
X t1 ;s1
X t1 ;s2
Vb0 ;t1
X t1 ;s1
X t1 ;s2
X t2 ;s1
X t2 ;s2
Vb0 ;t2
X t2 ;s1
X t2 ;s2
Wb0 ;s1
Wb0 ;s2
Figure 6.5. Computation of unified cluster bases Vb0 and Wb0 for a block matrix b0 D .t0 ; s0 / with sons.t0 / D ft1 ; t2 g and sons.s0 / D fs1 ; s2 g.
Once we have done this for all t 2 sonsC .t0 /, we have determined the cluster basis matrices Vb0 ;t for all t 2 sons .t0 / n ft0 g. Since no admissible block uses Vb0 ;t0 , we can set it equal to zero and complete the construction of the new cluster basis Vb0 . We repeat this procedure for the column cluster basis Wb0 and project all submatrices Xb into the new H 2 -matrix space H 2 .TJ ; Vb0 ; Wb0 / by using the corresponding cluster operators provided by Algorithm 31. The procedure is summarized in Algorithm 32. To keep the notation simple, we denote the coupling matrices of Xb by Sb D ..Sb /s ;t /b D.t ;s /2LC , defined as b
.Sb /s ;t ´ .Sb;.t ;s / / for all b D .t ; s / 2 LC . This allows us to compute total weights for Xb by applying b Algorithm 28 to the transposed block cluster tree Tb and the coupling matrices Sb .
6.7 Unification and hierarchical compression
255
Algorithm 32. Construct an approximation Xb0 2 H 2 .Tb0 ; Vb0 ; Wb0 / by combining approximations Xb 2 H 2 .Tb ; Vb ; Wb / of the subblocks b 2 sons.b0 /. procedure UnifySubmatrices(b0 , .Vb /b2sons.b0 / , .Wb /b2sons.b0 / , .Sb /b2sons.b0 / , , var Vb0 , Wb0 , Sb0 ) b0 ; .t0 ; s0 / fConstruct row cluster basis Vb0 g for t 2 sonsC .t0 / do B ;; for s 2 sonsC .s0 / do b .t; s/; B B [ fbg; Orthogonalize(t, Wb , Q, Rb , Lb ); fAlgorithm 16, Q is not usedg TotalWeights(t, Tb , ;, 0, Vb , Rb , Sb , Zb , Kyb ) fAlgorithm 28g end for; UnifiedBasesRec(t , .Vb /b2B , .Zb /b2B , .Kyb /b2B , , Vb0 , .RV;b /b2B , Kb0 ) end for; fConstruct column cluster basis Wb0 g for s 2 sonsC .s0 / do B ;; for t 2 sonsC .t0 / do b .t; s/; B B [ fbg; Orthogonalize(t , Vb , Q, Rb , Lb ); fAlgorithm 16, Q is not usedg TotalWeights(s, Tb , ;, 0, Wb , Rb , Sb , Zb , Kyb ) fAlgorithm 28g end for; UnifiedBasesRec(s, .Wb /b2B , .Zb /b2B , .Kyb /b2B , , Wb0 , .RW;b /b2B , Lb0 ) end for; fSwitch coupling matrices to new basesg for b 2 sons.b0 / do for b D .t ; s / 2 LC do b Sb0 ;b RV;b;t Sb;b RW;b;s end for end for
Lemma 6.31 (Complexity). Let b0 2 TJ , and let row cluster bases Vb with rank distributions Kb and columns cluster bases Wb with rank distributions Lb be given for all b 2 sons.b0 /. Let kb and lb be defined as in (3.16) and (3.18) for Kb and Lb . Let Kb0 and Lb0 be the rank distributions computed by Algorithm 32, and let kb0 and lb0 be defined as before. Let kO t ´ maxfkb0 ;t ; kb;t W b D .t; s/ 2 sons.b0 / with t 2 T t g; lOs ´ maxflb0 ;s ; lb;s W b D .t; s/ 2 sons.b0 / with s 2 Ts g
256
6 Compression
for all t 2 T t , s 2 Ts , .t; s/ 2 sons.b0 /. Algorithm 32 requires not more than X X X X Cuc kO t3 C Cuc lOs3 C Cub .kO t3 C lOs3 / bD.t;s/2sons.b0 /
t 2T t
s 2Ts
b D.t ;s /2LC b
operations, where Cuc ; Cub 2 N are constants that depend only on Cqr and Cpr . Proof. We first consider the construction of the unified row basis Vb0 . According to Lemma 5.18, the weights Rb for each b D .t; s/ 2 sons.b0 / can be computed in X 2 maxf2; Cqr g lOs3 s2Ts
operations, and Lemma 6.26 states that the weights of the total cluster basis for the block b can be computed in X X maxf2; Cqr g 2 .kO t3 C lOs3 / kO t3 C t 2T t
b D.t ;s /2LC b
operations. Together we get a bound of X X maxf2; Cqr g 2 kO t3 C 2 lOs3 C t 2T t
s 2Ts
X
.kO t3 C lOs3 /
(6.42)
b D.t ;s /2LC b
for each block. Once the total weights for all blocks have been computed, Algorithm 31 is applied to each t 2 sonsC .t0 /, and Lemma 6.28 yields the bound X X 2 maxf4; Cpr C 2g kO t3 bD.t;s/ t 2T t s2sonsC .s0 /
for each t 2 sonsC .t0 /. Adding up gives us the bound X X 2 maxf4; Cpr C 2g kO t3 : bD.t;s/2sons.b0 /
t 2T
(6.43)
t
By similar arguments, (6.42) is also a bound for the construction of the weights for the unified column basis Wb0 , and the construction of the basis itself requires not more than X X lOs3 (6.44) 2 maxf4; Cpr C 2g bD.t;s/2sons.b0 / s 2Ts
operations. We assume that the computation of the new coupling matrices Sb0 ;b for all b 2 sons.b0 / and b 2 LC is performed from right to left: the product Sb;b RW;b;s b
257
6.7 Unification and hierarchical compression
can be computed in 2.#Kb;t /.#Lb;s /.#Lb0 ;s / 2kO t lOs2 operations, and the prod O2 O uct RV;b;t Sb;b RW;b;s can be computed in 2.#Kb0 ;t /.#Kb;t /.#Lb0 ;s / 2k t ls additional operations. Constructing all coupling matrices therefore takes not more than X X X X 2kO t lOs2 C 2kO t2 lOs 2 .kO t3 C lOs3 / b2sons.b0 / b D.t ;s /2LC
b2sons.b0 / b D.t ;s /2LC
b
b
(6.45) operations. Here we have used (5.9) again to separate the factors in the sum. Adding (6.42) twice, (6.43), (6.44) and (6.45) yields the upper bound X X X X Cuc kO 3 C Cuc lO3 C Cub .kO 3 C lO3 / t
bD.t;s/2sons.b0 /
t Tt
s
t
s 2Ts
s
b D.t ;s /2LC b
with the constants Cuc ´ 4 maxf2; Cqr g C 2 maxf4; Cpr C 2g;
Cub ´ 2 maxf2; Cqr g C 2:
Hierarchical compression Now that we have the unification Algorithm 32 at our disposal, we can convert an H matrix into an H 2 -matrix by the blockwise approach of Algorithm 33: we recursively merge submatrices until we have found an approximation of the entire matrix [14]. Algorithm 33. Blockwise conversion of an H -matrix into an H 2 -matrix with adaptively chosen cluster bases. procedure ConvertHtoH2(b, X , , var Vb , Wb , Sb ) .t; s/ b; if b 2 L J then fNothing to do for inadmissible leavesg else if b D .t; s/ 2 LC J then K
JK
Ab 2 RtO b ; Wb;s Bb 2 RsO b ; Sb;b I 2 RKb Kb Vb;t Kb;t Kb ; Lb;s Kb else for b 0 2 sons.b/ do b 0 ; Xb 0 t 0 Xs 0 ; .t 0 ; s 0 / ConvertHtoH2(b 0 , Xb 0 , , Vb 0 , Wb 0 , Sb 0 ) end for; UnifySubmatrices(b, .Vb 0 /b 0 2sons.b/ , .Wb 0 /b 0 2sons.b/ , .Sb 0 /b 0 2sons.b/ , b , Vb , Wb , Sb ) end if
258
6 Compression
Theorem 6.32 (Complexity). Let X 2 RJ be a hierarchical matrix for the block cluster tree TJ , given in the form (3.12). Similar to Lemma 5.12, we let Kyt ´ f1; : : : ; maxfkb;t W b D .t C ; s/ 2 TJ ; t C 2 pred.t /gg y s ´ f1; : : : ; maxflb;s W b D .t; s C / 2 TJ ; s C 2 pred.s/gg L
for all t 2 T ; for all s 2 TJ
denote index sets for the maximal ranks. Let .kb;t / t 2T t and .lb;s /s 2Ts be defined as in (3.16) and (3.18) for all b D .t; s/ 2 TJ , and let .kO t / t2T and .lOs /s2TJ be y Let pJ be the depth of TJ . Algorithm 33 requires defined similarly for Ky and L. not more than X X Csp .Cuc C Cub /.pJ C 1/ kO t3 C lOs3 t2T
s2TJ
y s are .Cbn ; ˛; ˇ; r; /-bounded and if T , TJ are .Crc ; ˛; ˇ; r; /operations. If Kyt , L regular, the number of operations is in O..˛ C ˇ/2r .pJ C 1/.n C nJ //. Proof. The algorithm only performs work for blocks b0 2 TJ n LJ that are not leaves, and the number of operations for each of these blocks can be bounded by Lemma 6.31. Adding these bounds yields X X X X Cuc kO 3 C Cuc lO3 t
s
t 2T t
b0 2TJ nLJ bD.t;s/2sons.b0 /
X
C Cub
X
Cuc
X
kO t3 C Cuc X
C Cub X
D Cuc
X
X
X
lOs3
s 2Ts
kO t3 C lOs3
b D.t ;s /2LC b
X
X
X
kO t3 C Cuc
t2T s2row.t/ t 2T t level.t/pJ
C Cub
kO t3 C lOs3
b D.t ;s /2LC b
t 2T t
bD.t;s/2TJ
s 2Ts
X
s2TJ t2col.s/ s 2Ts level.s/pJ
kO t3 C lOs3
b2TJ b D.t ;s /2LC b
Cuc Csp
X
kO t3 #ft 2 pred.t / W level.t / pJ g
t 2T
C Cuc Csp
X
lOs3 #fs 2 pred.s / W level.s/ pJ g
s 2TJ
C Cub
X
b 2LC b
X
.kO t3 C lOs3 /#fb 2 pred.b /g
lOs3
6.8 Refined error control and variable-rank approximation
259
X X kO t3 C lOs3 Cuc Csp .pJ C 1/ t2T
C Cub .pJ
X C 1/
s2TJ
X
kO t3 C
t2T s2rowC .t/
Cuc Csp .pJ
C Cub Csp .pJ
X
lOs3
s2TJ t2colC .s/
X X kO t3 C lOs3 C 1/ t2T
X
s2TJ
X X C 1/ kO t3 C lOs3 t2T
D Csp .Cuc C Cub /.pJ
s2TJ
X X kO t3 C lOs3 ; C 1/ t2T
s2TJ
where we have used again the Csp -sparsity of TJ and the fact that we have level.t /, level.s/ level.b/ pJ for all b D .t; s/ 2 TJ due to Definition 3.12. y are .Cbn ; ˛; ˇ; r; /-bounded and if T and TJ are .Crc ; ˛; ˇ; r; /-regular, If Ky and L we can again apply Lemmas 3.45 and 3.48 to complete the proof. Remark 6.33 (Error control). Since the Algorithm 31 used to construct the unified cluster bases is only an efficient realization of the original compression Algorithm 25, the error estimates provided by Corollary 6.22 and the comments of Remark 6.24 apply, i.e., we can guarantee any given accuracy by choosing the error tolerances used in the truncation procedures appropriately. Using the simple recursive estimate X X Xb 0 C Xb 0 Xb k t Xs Xb k D t Xs
X
b 0 2sons.b/
b 0 D.t 0 ;s 0 /2sons.b/
b 0 2sons.b/
k t 0 Xs 0 Xb 0 k C
X
Xb 0 X b
b 0 2sons.b/
for b D .t; s/ 2 TJ , we can see that the total error resulting from Algorithm 33 can be bounded by the sum of the errors introduced by the unification Algorithm 32 for all non-leaf blocks of the block cluster tree TJ , i.e., we can guarantee a given accuracy of the result by ensuring that the intermediate approximation steps are sufficiently accurate.
6.8 Refined error control and variable-rank approximation The compression Algorithms 25, 27, 30 and 32 allow us to control only the total error by means of Corollary 6.22. In some situations, it is desirable to ensure that the error in individual blocks can be controlled. Important applications include estimates for the
260
6 Compression
relative error and the algebraic counterpart of the variable-order schemes introduced in Section 4.7. We assume that an error tolerance b 2 R>0 is given for all blocks b 2 LC J , and our goal is to ensure k t Xs Q t Qt Xs k2 b
for all b D .t; s/ 2 LC J ;
(6.46)
i.e., that the spectral error in each individual admissible block b 2 LC J is bounded by the corresponding parameter b . Let b D .t; s/ 2 LC J . According to Lemma 5.7, we have t Xs D X t s . Applying Theorem 5.30 yields X y r Xyr /Er;t s k22 ; yr Q k.Xyr Q k.X t Q t Qt X t /s k22 r2sons .t/
where the transfer matrix Er;t for the total cluster basis is given by equation (6.6) of Lemma 6.12. Since s 2 row .r/ holds for all r 2 sons .t /, we can use Lemma 5.7 again to prove Er;t s D s and get the estimate X yr Q y r Xyr /s k22 : k.X t Q t Qt X t /s k22 k.Xyr Q (6.47) r2sons .t/
The quantities on the right-hand side of this estimate appear explicitly during the course of the compression Algorithm 25, and Lemma 5.19 allows us to ensure that they are below a prescribed bound. Therefore we now have to find bounds for the right-hand side that ensure that the left-hand side of the estimate is bounded by b2 . As in Remark 5.33, we achieve this by turning the sum into a geometric sum: we fix p 2 .0; 1/ and assume yr Q y r Xyr /s k2 Ob p level.r/level.t/ k.Xyr Q
for all r 2 sons .t /;
(6.48)
for a constant Ob 2 R>0 we will specify later. Under this assumption, we find X y r Xyr /s k22 yr Q k t Xs Q t Qt Xs k22 k.Xyr Q
X
Ob2
r2sons .t/
p
2.level.r/level.t//
r2sons .t/
D
Ob2
p X
p 2.`level.t// #fr 2 sons .t / W level.r/ D `g:
(6.49)
`Dlevel.t/
In order to control the right-hand side, we have to assume that the cluster tree T is .Cbc ; ˛; ˇ; r; /-bounded, since this implies # sons.t / Cbc for all t 2 T , and a simple induction yields `level.t/ #fr 2 sons .t / W level.r/ D `g Cbc :
6.8 Refined error control and variable-rank approximation
261
Assuming p 2 < 1=Cbc 1, we can combine the estimate with (6.49) and find k t Xs
Q t Qt Xs k22
p X
Ob2
.Cbc p 2 /`level.t/
`Dlevel.t/
< Ob2
1 X
.Cbc p 2 /` D Ob2
`D0
1 : 1 Cbc p 2
p If we ensure Ob b 1 Cbc p 2 , we get k t Xs Q t Qt Xs k22 < Ob2
1 b2 ; 1 Cbc p 2
(6.50)
i.e., we have reached our goal provided that we can ensure (6.48). For a single block, this is straightforward. For all blocks, we have to use a simple reformulation: we introduce the weights !r;s ´ Ob p level.r/level.t/
for all b D .t; s/ 2 LC J ; r 2 sons .t /;
(6.51)
which are well-defined since due to Corollary 3.15 there is exactly one t 2 pred.r/ with .t; s/ 2 LC J , and observe that the inequalities 2 yr Q y r Xyr /s k22 !r;s k.Xyr Q D Ob2 p 2.level.r/level.t//
and 1 y y r Xyr /s k22 1 yr Q .Xr Q k!r;s
(6.52)
are equivalent. The second inequality has the advantage that we can “hide” the additional weight in the matrix Xyr by introducing a suitable generalization of a total cluster basis: Definition 6.34 (Weighted total cluster basis). Let X 2 RJ be a matrix and ! D .!r;s /r2T ;s2row .r/ a family of weights. The family .X!;t / t2T defined by X X 1 X!;t ´ !r;s t Xs for all t 2 T t C 2pred.t/ s2rowC .t C /
is called the weighted total cluster basis corresponding to X and !. The relation between weighted and original total cluster basis is given by 1 1 r Xs D !r;s Xr s X!;r s D !r;s
for all b D .t; s/ 2 LC J ; r 2 sons .t /;
and the inequality (6.52) takes the form yr Q y r Xy!;r /s k22 1 k.Xy!;r Q
for all b D .t; s/ 2 LC J ; r 2 sons .t /: (6.53)
262
6 Compression
Since we have to treat all blocks b D .t; s/ 2 LC J simultaneously, we use the condition yr Q y r Xy!;r k22 1 kXy!;r Q
for all r 2 T ;
(6.54)
which, by definition, implies (6.53) for all admissible blocks and fits perfectly into the context of Algorithm 25. Due to (6.51), the weights ! t;s can be computed by the recursion ´ Ob if .t; s/ 2 LC J ; ! t;s ´ for all t 2 T ; s 2 row .t /: p! t C ;s otherwise with t C D father.t / Including this recursion in Algorithm 25 yields the new Algorithm 34 for computing an adaptive cluster basis for weighted matrices. The additional complexity compared to Algorithm 25 is negligible, therefore the complexity estimate of Lemma 6.20 also holds for Algorithm 34. Of course, the same modifications can also be included in the Algorithms 27 and 30 for converting H - and H 2 -matrices. Now let us consider two important applications of the new algorithm. The first is the construction of cluster bases ensuring that the blockwise relative error of the matrix approximation is bounded, i.e., that k t Xs Q t Qt Xs k2 k t Xs k2
holds for all b D .t; s/ 2 LC J :
This property is very important for adaptive-rank arithmetic operations (cf. [49]). Obviously, the condition fits well into the framework presented here: we only have to apply Algorithm 34 to b ´ k t Xs k2 , i.e., to p Ob ´ k t Xs k2 1 Cbc p 2 : In practice, the local norms k t Xs k2 can be approximated by a power iteration. For dense matrices, this is straightforward, for H - and H 2 -matrices, we can use the techniques introduced in Section 5.6 to reduce the complexity: by Householder factorizations or Algorithm 16, we find orthogonal transformations which reduce t Xs to a small matrix, and applying the power iteration to this matrix is very efficient. Remark 6.35 (Frobenius norm). The construction also applies to the Frobenius norm: Lemma 5.19 allows us to control the approximation error with respect to this norm in each step of the compression algorithm using the weighted matrix, i.e., to ensure yr Q y r Xy!;r k2F 1 kXy!;r Q
for all r 2 T ; p and using b ´ k t Xs kF and Ob ´ k t Xs kF 1 Cbc p 2 yields k t Xs Q t Qt Xs kF k t Xs kF
for all b D .t; s/ 2 LC J :
6.8 Refined error control and variable-rank approximation
263
Algorithm 34. Algorithm for finding adaptive cluster bases for weighted dense matrices. procedure AdaptiveDenseWeighted(t, r t C , X , , var Q, R, L) rt ;; for s 2 r t C do r t [ fsg ! t;s p! t C ;s ; r t end for; for s 2 rowC .t / do Ob ; r t r t [ fsg b .t; s/; ! t;s end for; if sons.t/ D ; then X!;t 0 2 RJ ; tO for s 2 r t do 1 X t;s t Xs ; X!;t X!;t C ! t;s X t;s end for; Lowrank(X!;t , tO, 1, Q t , R!;t , L t ); fAlgorithm 18g for s 2 r t do ! t;s R!;t s R t;s end for else Mt ;; for t 0 2 sons.t / do P Lt 0 AdaptiveDenseWeighted(t 0 , r t , X , , Q, R, L); M t Mt [ end for; 0 2 RM t J ; Xy!;t for s 2 r t do Xyt;s 0 2 RM t J ; 0 for t 2 sons.t / do Xyt;s jL t 0 J R t 0 ;s end for; 1 y Xy!;t Xyt C ! t;s X t;s end for; y t , R!;t , L t ); Lowrank(Xy!;t , tO, 1, Q fAlgorithm 18g 0 for t 2 sons.t / do y t jL 0 L t Ft 0 Q t end for; for s 2 r t do R t;s ! t;s R!;t s end for end if
264
6 Compression
Due to the special properties of the Frobenius norm, we can even prove X k t Xs Q t Qt Xs k2F kX …TJ ;Q; ŒX k2F D b2LC J
X
b2 D
b2LC J
X
X
2 k t Xs k2F
b2LC J
2 k t Xs k2F D 2 kX k2F ;
b2LJ
i.e., the global relative error is also bounded. The second application of Algorithm 34 is related to the variable-order approximation schemes introduced in Section 4.7. We have seen that these schemes can significantly improve the approximation quality while retaining the optimal order of complexity. In general situations, e.g., for kernel functions involving derivatives and for complicated geometries, the construction of a good variable-order interpolation scheme can be fairly complicated. Fortunately, the variable-order scheme is only a means to an end: we are only interested in the resulting H 2 -matrix, not in the way it was constructed. By choosing the weights for Algorithm 34 correctly, we can ensure that the resulting H 2 -matrix has the same properties as one constructed by a variable-order interpolation scheme without actually using this scheme. Given an error tolerance 2 R>0 , we want to ensure kX …TJ ;Q; ŒX k2 ;
(6.55)
and we want to keep the ranks of the cluster basis Q D .Q t / t2T as low as possible. To reach this goal, we combine the spectral norm estimate of Theorem 4.47 with another geometric sum: we let O 2 R>0 and 2 .0; 1/ and use the weights Ob ´ O .p level.t//=2 .pJ level.s//=2
for all b D .t; s/ 2 LC J :
For these weights, (6.50) gives us k t Xs Q t Qt Xs k2 O
.p level.t//=2 .pJ level.s//=2 p 1 Cbc p 2
for all b D .t; s/ 2 LC J ;
and Theorem 4.47 yields X 1=2 X 1=2 Csp O kX …TJ ;Q; ŒX k2 p
p `
pJ ` 1 Cbc p 2 `D0 `D0
p
pJ
1 X Csp O Csp O 1 ;
` D p 1 Cbc p 2 `D0 1 Cbc p 2 1
6.8 Refined error control and variable-rank approximation
265
i.e., the estimate (6.55) holds if we can ensure p .1 / 1 Cbc p 2 : O Csp Choosing values for Ob and ! t;s is simple, the interesting part is to demonstrate that these choices lead to a reasonable complexity of the resulting approximation. As an example, we consider the variable-order approximation of matrices corresponding to integral operators introduced in Section 4.7: we consider the matrix G 2 RJ defined by Z Z 'i .x/ g.x; y/ j .y/ dy dx for all i 2 ; j 2 J; (6.56) Gij ´
where the subdomain or submanifold 2 Rd , the kernel function g and the finite element basis functions .'i /i2 and . j /j 2J are as in Chapter 4. Our goal is to prove that applying Algorithm 34 to this matrix leads to cluster bases that are at least as efficient as the ones constructed in Section 4.7: a quasi-optimality property (cf. Lemma 6.23) allows us to use general results of polynomial interpolation theory to derive upper bounds for the rank, the complicated analysis of the stability of re-interpolation carried out in Section 4.7 can be avoided, since Algorithm 34 replaces the re-interpolation operators by simple orthogonal projections that are always stable. For variable-order parameters ˛; ˇ 2 N – which we will specify later – let the order vectors .m t / t2T be defined as in (4.61) and the corresponding interpolation operators .I t / t2T as in (4.17). Assuming that the cluster tree T is quasi-balanced, Lemma 4.52 guarantees that the rank distribution K D .K t / t2T corresponding to the order vectors .m t / t2T is .Cbn ; ˛; ˇ; d; /-bounded. We require the following generalization of Lemma 6.23: Lemma 6.36 (Weighted quasi-optimality). Let L D .L t / t2T be the rank distribution of the orthogonal cluster basis Q computed by Algorithm 34. Let .A t / t2T t and .B t / t2T be families of matrices satisfying A t 2 RK , B t 2 RJK t , and let tO .G!;t / t2T be the weighted total cluster bases corresponding to G. If kG!;t A t B t k2 1
holds for all t 2 T ;
we have #L t #K t for all t 2 T , i.e., L is also .Cbn ; ˛; ˇ; d; /-bounded. Proof. Let t 2 T . We define ´ U t A t Ayt ´ At
if sons.t / ¤ ;; otherwise
for all t 2 T
and the orthogonality of U t yields y!;t Ayt B t k2 kG!;t A t B t k2 1 kG
for all t 2 T :
(6.57)
266
6 Compression
y!;t with rank #K t . Since the matrices This means that Ayt B t is an approximation of G y t and the corresponding rank distribution L D .L t / t2T are constructed by singular Q value decompositions in Algorithm 18, the quasi-optimality stated in Remark 5.22 y t cannot be larger than #K t . Since the matrices implies that the rank of the matrices Q y Q t are orthogonal, this implies #L t #K t . In order to be able to apply this lemma, we have to construct the matrix families .A t / t2T and .B t / t2T . Among several possible approaches, including Taylor and multipole expansion, we choose interpolation to keep the presentation consistent with Section 4.7: let t 2 T . We interpolate the kernel function in the first argument, i.e., we approximation g by the degenerate function X g. t; ; y/L t; .x/ for all x 2 Q t ; y 2 n Q t ; gQ t .x; y/ ´ 2K t
where . t; /2K t and .L t; /2K t are the interpolation points and Lagrange polynomials corresponding to the bounding box Q t and the order vector m t . As in Section 4.4, we define the low-rank approximation of A t B t of G!;t by replacing g with gQ t : ´R 'i .x/L t; .x/ dx if i 2 tO; for all i 2 ; 2 K t ; .A t /i ´ 0 otherwise ´R if j 2 sO ; j .y/g. t; ; y/ dy .B t;s /j ´ for all j 2 J; 2 K t ; 0 otherwise X 1 Bt ´ ! t;s B t;s : s2row .t/
We can see that A t B t corresponds to a blockwise approximation of G!;t : X X 1 1 ! t;s t Gs ! t;s A t B t;s G!;t A t B t D s2row .t/
D
X
s2row .t/ 1 ! t;s . t Gs A t B t;s /:
s2row .t/
Due to the different weights involved in this sum, we cannot bound its norm directly. Instead, we break it down into individual blocks: 1=2 Lemma 6.37. Let p 2 .0; Cbc / and 2 .0; 1/. Let .!r;s /r2T ;s2row .r/ be as in Definition 6.34. If s 1 .level.t/level.t C //=2 k t Gs A t B t;s k2 ! t;s
(6.58) Csp
267
6.8 Refined error control and variable-rank approximation
holds for all t C 2 pred.t /, s 2 rowC .t C /, we have kG!;t A t B t k2 1: Proof. We apply Lemma 4.45 to the equation X 1 ! t;s . t Gs A t B t;s / G!;t A t B t D s2row .t/
and get kG!;t A t B t k22
X
2 2 ! t;s k t Gs A t B t;s k2
s2row .t/
D
1 Csp 1 Csp
X
X
2 2 level.t/level.t ! t;s ! t;s
C/
t C 2pred.t/ s2rowC .t C /
X
X
level.t/level.t
C/
t C 2pred.t/ s2rowC .t C /
.1 /
X
level.t/level.t
C/
D .1 /
t C 2pred.t/
< .1 /
1 X `D0
` D .1 /
level.t/ X
`
`D0
1 D 1: 1
are determined by the As in Chapter 4, the approximation properties of A t B t;s interpolation error. We apply the results of Section 4.4 to the interpolant g t :
Lemma 6.38. Let g be .Cas ; ; c0 /-asymptotically smooth with > 0. Let the family of interpolation operators .Im /1 mD0 be .ƒ; /-stable. Let 2 .0; 1/ be such that diam.Q t 0 / diam.Q t /
holds for all t 2 T ; t 0 2 sons.t /:
(6.59)
As in Remark 4.23, we denote the convergence rate of the interpolation scheme by ³ ² c0 c0 ; 2 .0; 1/: q ´ min c0 C 1 2 There are constants Cin 2 R>0 and O 2 .0; 1/ which only depend on d , g, , , ƒ and such that Cin C kg gQ t k1;Q t Qs O ˛.level.t/level.t // q ˛Cˇ.p level.t// (6.60) dist.Q t ; Qs / holds for all t 2 T , t C 2 pred.t / and s 2 rowC .t C / satisfying the admissibility condition diam.Q t C / 2 dist.Q t C ; Qs / (which is implied by the standard admissibility condition (4.49)).
268
6 Compression
Proof. Let t 2 T . We start by observing that Cf ´
Cas ; dist.Q t ; Qs /
f ´
dist.Q t ; Qs / c0
fulfill the requirements of Theorem 4.20, which gives us n t 2f diam.Q t / d kg gQ t k1;Q t Qs 2edCf ƒm 1 C .n t C 1/ % : f diam.Q t / We apply (6.59) to bound the diameter of Q t by that of Q t C : n t n t 2f 2f % % : diam.Q t / level.t/level.t C / diam.Q t C / The admissibility condition yields 2f 2 dist.Q t ; Qs / 2 dist.Q t C ; Qs / 2 dist.Q t C ; Qs / 1 D D ; diam.Q t C / c0 diam.Q t C / c0 diam.Q t C / 2c0 dist.Q t C ; Qs / c0 c0 diam.Q t / c0 diam.Q t C / 2c0 dist.Q t C ; Qs / diam.Q t / D D 2c0 f dist.Q t C ; Qs / dist.Q t C ; Qs / dist.Q t C ; Qs / and we conclude
kg gQ t k1;Q t Qs
2edCf ƒdm .1
C 2c0 /.n t C 1/ %
n t
1
:
level.t/level.t C / c0
Now we can use Lemma 4.78 to find a O 2 .0; 1/ such that %.1=.c0 // %.1=.c0 //=O holds, and applying this estimate level.t / level.t C / times yields 1 n t d O n t .level.t/level.t C // kg gQ t k1;Q t Qs 2edCf ƒm .1 C 2c0 /.n t C 1/ % : c0 Remark 4.17 implies %.1=.c0 // > maxf1=.c0 / C 1; 2=.c0 /g D 1=q, so we have ´ q%.1=.c0 // 2 R>1 , and Lemma 3.50 implies that there is a constant Cin 2 R>0 such that 2edCas .1 C 2c0 /ƒd .x C 2/d .x C 1/ x Cin
holds for all x 2 N;
therefore we find kg gQ t k1;Q t Qs d
2edCf ƒ .n t C 2/
d
O n t .level.t/level.t C //
.1 C 2c0 /.n t C 1/
1 % c0
n t
2edCas .1 C 2c0 /ƒd C .n t C 2/d .n t C 1/ n t O n t .level.t/level.t // q n t dist.Q t ; Qs / Cin C O n t .level.t/level.t // q n t : dist.Q t ; Qs / D
Observing n t D ˛ C ˇ.p level.t // ˛ completes the proof.
269
6.8 Refined error control and variable-rank approximation
It is important to note that estimates similar to (6.60) also hold for Taylor or multipole expansion schemes. Based on this estimate, we now derive the required bound for the blockwise spectral error: Lemma 6.39. Let Cin 2 R>0 and O 2 .0; 1/ be such that (6.60) holds for all ˛; ˇ; t; t C 1=2 and s. Let O 2 R>0 , p 2 .0; Cbc / and 2 .0; 1/. We require the following assumptions: • The measure of the support of a cluster can be bounded by a power of its diameter, i.e., there are constants Ccu 2 R1 and d 2 N satisfying for all t 2 T :
j t j Ccu diam.Q t /d
(6.61a)
This assumption defines the “effective dimension” d of the subdomain or submanifold (cf. (4.58)). • The order of the singularity of g is not too high, i.e., we have d . • The growth of bounding boxes can be controlled, and the diameters of bounding boxes of leaf clusters can be bounded by a multiple of the grid parameter h, i.e., there are constants Cgr 2 R>0 and 2 .0; 1/ satisfying p level.t/ diam.Q t / Cgr h
for all t 2 T :
(6.61b)
This assumption is related to the quasi-uniformity of the underlying grid and the regularity of the clustering scheme (cf. (4.59)). • The levels of row and column clusters of admissible blocks are not too far apart, i.e., there is a constant Ccn 2 N0 satisfying j.p level.t // .pJ level.s//j Ccn
for all b D .t; s/ 2 LC J : (6.61c)
This assumption is related to the compatibility of the cluster tree T and TJ and to the “level-consistency” (cf. [18]) of the resulting block cluster tree TJ . We assume that the rank parameters are large enough, i.e., that ˛ ˛
log.p 1=2 / ; log O
ˇ
log. / ; log q
log.Cin Cov Ccu Cgr Csp1=2 .2/ .1 /1=2 . /Ccn =2 C CJ hd =O / log q
hold. This ensures
1 C .p 1=2 /level.t/level.t / .p level.t//=2 .pJ level.s//=2 Csp (6.62) C C 2 pred.t / and s 2 row .t /.
k t Gs A t B t;s k2 O for all t 2 T , t C
s
270
6 Compression
Proof. Let t 2 T , t C 2 pred.t / and s 2 rowC .t C /. We denote the levels of t , t C and s by l t ´ level.t /, l t C ´ level.t C / and ls ´ level.s/. Lemma 4.44 yields k t Gs A t B t;s k2 Cov C CJ j t j1=2 j s j1=2 kg gQ t k1;Q t Qs :
We apply (6.61a) and get k t Gs A t B t;s k2
Cov Ccu C CJ diam.Q t /d =2 diam.Qs /d =2 kg gQ t k1;Q t Qs : Now we use the interpolation error estimate (6.60) in order to find k t Gs A t B t;s k2
Cin Cov Ccu C CJ
diam.Q t /d =2 diam.Qs /d =2 O ˛.l t l C / ˛Cˇ.p l t / t q : dist.Q t ; Qs /
Since .t C ; s/ 2 LC J is admissible, we have 1 1 diam.Q t C / diam.Q t / ; dist.Q t ; Qs / dist.Q t C ; Qs / 2 2 1 dist.Q t ; Qs / dist.Q t C ; Qs / diam.Qs / ; 2 and these inequalities can be used to eliminate the denominator of our estimate and get k t Gs A t B t;s k2 Cin Cov Ccu C CJ .2/
diam.Q t /.d /=2 diam.Qs /.d /=2 O ˛.l t l t C / q ˛Cˇ.p l t / : Now we use (6.61b) in order to prove k t Gs A t B t;s k2 C1 C CJ hd .l t p /=2 .ls pJ /=2 O ˛.l t l t C / q ˛Cˇ.p l t / ;
where the constants are collected in C1 ´ Cin Cov Ccu Cgr .2/ for better readability. Due to our choice of ˛ and ˇ, we have q ˇ and O ˛ p 1=2 , and the consistency assumption (6.61c) yields O ˛.l t l t C / q ˇ.p l t / .p 1=2 /l t l t C . /p l t D .p 1=2 /l t l t C . /.l t l t C /=2 . /.p l t /=2 . /.p l t /=2 D .p 1=2 /l t l t C . /.p l t /=2 . /.p l t C /=2 .p 1=2 /l t l t C . /.p l t /=2 . /.pJ ls /=2 . /Ccn =2 ; therefore our error estimate now takes the form k t Gs A t B t;s k2
C1 C CJ . /Ccn =2 hd q ˛ .p 1=2 /l t l t C .p l t /=2 .pJ ls /=2 :
6.9 Numerical experiments
271
Observing that our choice of ˛ implies . /Ccn =2 q C1 C CJ hd
s
˛
1 O Csp
concludes the proof. It is important to note that the rank parameter ˇ depends only on the constants , and q, but not on O , while ˛ depends on the quotient O ; C CJ hd i.e., if O decays not faster than C CJ hd , both ˛ and ˇ can be chosen as constants independent of h, i.e., the corresponding rank distributions K and L will be bounded independently of the mesh parameter, while the discretization error converges like hd . This means that the approximation resulting from Algorithm 34 will have the same advantages as the variable-order interpolation scheme introduced in Section 4.7, while requiring weaker assumptions and no complicated analysis of re-interpolation operators: the existence of low-rank approximations A t B t for each cluster is sufficient, they do not have to be nested. Other panel clustering schemes, e.g., Taylor expansion or multipole expansion, can also be used to construct these low-rank approximations, and the quasi-optimality result of Lemma 6.36 implies that the adaptively constructed H 2 -matrix approximation will be at least as good as any of these.
6.9 Numerical experiments The theoretical estimates for the complexity of the compression algorithms require the rank distribution of the resulting cluster basis to be bounded. In the case of the recompression of an existing H 2 -matrix, this requirement is easily fulfilled, in the general case we have to rely on the quasi-optimality estimate of Lemma 6.23: if the total cluster bases can be approximated by bounded-rank matrices, then the rank distribution of the resulting cluster basis will also be bounded. In a first series of experiments, we consider the compression of dense matrices in standard array representation. The dense matrices S; D 2 R are given by Z Z 1 'i .x/ 'j .y/ log kx yk2 dy dx for all i; j 2 ; Sij ´ 2 Z Z 1 hx y; n.y/i Dij ´ 'i .x/ 'j .y/ dy dx for all i; j 2 ; 2 kx yk22
272
6 Compression
where is a polygonal curve in two-dimensional space and .'i /i2 is a family of piecewise constant basis functions in L2 . /. The matrices S and D correspond to the two-dimensional single and double layer potential operators. The entries of the matrices S and D are approximated by hierarchical quadrature [21]. In the first experiment, we apply the weighted compression Algorithm 34 to the matrix S, where is a regular polygonal approximation of the unit p circle C ´ fx 2 R2 W kxk2 D 1g. We use the weighting parameters p D 2=3 < 1=2 and D 5=6 < 1 and observe the results given in Table 6.1: the times for the construction of the original matrix (“Matrix”), of the adaptive cluster basis (“Basis”) and of the corresponding
Table 6.1. Variable-order recompression of the two-dimensional single layer potential operator on the circle with O D h2 , based on a dense matrix.
n Matrix Basis Proj Mem Mem=n kX XQ k2 256 0:1 < 0:1 < 0:1 109:2 0:43 1:04 0:6 0:1 < 0:1 223:0 0:44 4:15 512 1024 2:2 0:4 0:2 446:4 0:44 1:35 2048 8:9 1:8 0:7 896:6 0:44 3:76 4096 36:0 7:8 3:1 1794:3 0:44 9:37 8192 144:2 38:3 12:9 3590:4 0:44 2:57 16384 580:6 164:1 55:6 7176:0 0:44 6:38 32768 2318:5 688:3 236:3 14369:1 0:44 1:78 10000
1000
Basis O(n^2)
1000
Proj O(n^2)
100
100 10 10 1
1
0.1 1000
10000
100000
100000
0.1 1000
10000
0.0001
Memory O(n)
100000 Error O(h^2)
1e-05 10000 1e-06 1000 1e-07
100 1000
10000
100000
1e-08 1000
10000
100000
6.9 Numerical experiments
273
Table 6.2. Variable-order recompression of the two-dimensional single layer potential operator on the square with O D h2 , based on a dense matrix.
n Matrix Basis Proj Mem Mem=n kX XQ k2 0:57 2:14 256 0:1 < 0:1 < 0:1 146:1 512 0:6 0:1 < 0:1 283:7 0:55 2:45 1024 2:2 0:4 0:1 553:6 0:54 8:26 8:9 1:9 0:6 1094:7 0:53 3:26 2048 4096 36:0 7:6 2:4 2170:8 0:53 8:67 8192 144:2 37:5 10:1 4318:5 0:53 2:07 16384 576:7 154:1 43:4 8569:3 0:52 5:08 32768 2320:5 654:4 187:0 17040:3 0:52 1:38 10000
1000
Basis O(n^2)
1000
Proj O(n^2)
100
100 10 10 1
1
0.1 1000
10000
100000
100000
0.1 1000
10000
0.0001
Memory O(n)
100000 Error O(h^2)
1e-05 10000 1e-06 1000 1e-07
100 1000
10000
100000
1e-08 1000
10000
100000
H 2 -matrix representation (“Proj”) scale like n2 , which can be considered optimal for a method based on dense matrices and is in accordance with the Lemmas 6.20 and 5.8. The storage requirements scale like n and the approximation error scales like n2 , therefore we can conclude that the weighted compression algorithm indeed finds a variable-order approximation of the dense matrix. In the next experiment, we apply the algorithm to a regular triangulation of the unit square1 S ´ f1; 1g Œ1; 1 [ Œ1; 1 f1; 1g. The results given in Table 6.2 are 1
In practical applications, a graded triangulation would be more appropriate, but the discussion of the
274
6 Compression
Table 6.3. Variable-order recompression of the two-dimensional double layer potential operator on the square with O D h2 , based on a dense matrix.
n Matrix Row Column Proj Mem Mem=n kX XQ k2 256 0:1 < 0:1 < 0:1 < 0:1 153:8 0:60 2:44 512 0:2 0:1 0:1 < 0:1 302:9 0:59 4:35 1024 0:9 0:4 0:4 0:1 598:5 0:58 1:25 2048 3:7 1:9 1:9 0:6 1185:5 0:58 3:76 15:1 7:8 7:6 2:4 2346:5 0:57 6:57 4096 8192 60:4 37:1 31:0 10:1 4620:8 0:56 1:47 16384 243:3 154:0 123:9 43:2 9132:9 0:56 4:78 32768 971:6 601:8 514:0 175:6 18058:7 0:55 1:78 1000
1000
Row basis Column basis O(n^2)
100
100
10
10
1
1
0.1 1000
10000
100000
100000
0.1 1000
Proj O(n^2)
10000
0.0001
Memory O(n)
100000 Error O(h^2)
1e-05 10000 1e-06 1000 1e-07
100 1000
10000
100000
1e-08 1000
10000
100000
very similar to those observed for the unit sphere: again the time for the construction scales like n2 , while the storage requirements of the resulting H 2 -matrix scale like n and the error scales like n2 . The previous two cases are covered by the theory presented in Sections 4.7 and 6.8. The situation is more complicated for the double layer potential operator D on the unit square: a variable-order approximation exists [81], but its construction is technical details connected to this approach is beyond the scope of this book, cf. [66].
6.9 Numerical experiments
275
complicated. Fortunately, the existence of an approximation is already sufficient for Algorithm 34, and the numerical results in Table 6.3 show that it finds an efficient H 2 -matrix approximation. Let us now consider the three-dimensional case. Since the matrix dimensions tend to grow rapidly due to the higher spatial dimension, we can no longer base our experiments on dense matrices, but construct an initial H 2 -matrix and apply recompression to improve its efficiency. We construct this initial approximation by the interpolation approaches discussed in the Sections 4.4 and 4.5. We start with a simple example: the matrix V corresponding to the single layer operator (cf. Section 4.9) is approximated by interpolation with constant order m D 4
Table 6.4. Recompression of the single layer potential operator with O D 4h2 105 , based on an initial approximation by constant-order interpolation with m D 4.
n Build Mem Mem=n MVM kX XQ k2 512 1:9 1:8 3:6 < 0:01 6:26 2048 14:6 8:6 4:3 0:04 9:57 8192 70:4 36:8 4:6 0:23 2:57 32768 322:6 165:5 5:2 1:05 6:38 131072 1381:5 664:5 5:2 4:25 1:28 524288 5975:2 2662:6 5:2 17:72 2:59 10000
10000
Build O(n)
1000
1000
100
100
10
10
1 100
1000
10000
100
100000
1e+06
1 100
MVM O(n)
1e-06
1
1e-07
0.1
1e-08
1000
10000
100000
1000
10000
1e-05
10
0.01 100
Memory O(n)
1e+06
1e-09 100
100000
1e+06
Error O(h^2)
1000
10000
100000
1e+06
276
6 Compression
on a cluster tree with Clf D 16, then we use Algorithm 30 with the variable-order error control scheme (cf. Section 6.8) to construct a more efficient approximation. According to Section 4.6, it is reasonable to use an error tolerance O h2 , where h is again the minimal meshwidth of the underlying triangulation. The results are given in Table 6.4. We can see that the computation time, the storage requirements and the time for the matrix-vector multiplication are in O.n/, as predicted by our theory, and that the error decreases at a rate of approximately h2 . Compared to the uncompressed case (cf. Table 4.1), we can see that the storage requirements are reduced by a factor of more than six, while the approximation error and the computation time are only slightly increased.
Table 6.5. Recompression of the double layer potential operator with O D 4h2 104 , based on an initial approximation by constant-order interpolation with m D 4.
n Build Mem Mem=n MVM kX XQ k2 512 1:9 1:8 3:5 < 0:01 9:15 2048 14:2 7:6 3:8 0:03 9:26 8192 68:6 31:0 3:9 0:18 2:26 32768 312:8 143:3 4:5 0:84 6:57 131072 1360:5 585:2 4:6 3:45 1:77 524288 5937:8 2388:1 4:7 15:00 4:78 10000
10000
Build O(n)
1000
1000
100
100
10
10
1 100
1000
10000
100
100000
1e+06
1 100
Memory O(n)
1000
10000
0.0001
MVM O(n)
10
100000
1e+06
Error O(h^2)
1e-05
1 1e-06 0.1 1e-07
0.01
0.001 100
1000
10000
100000
1e+06
1e-08 100
1000
10000
100000
1e+06
6.9 Numerical experiments
277
Table 6.6. Variable-order recompression of the single layer potential operator with O D 2h3 102 , based on an initial approximation by variable-order interpolation with ˛ D 2 and ˇ D 1.
Build Mem Mem=n MVM kX XQ k2 n 1:0 1:7 3:5 < 0:01 2:94 512 2048 6:9 7:3 3:6 0:02 1:24 8192 43:1 30:7 3:8 0:18 6:06 32768 267:9 142:9 4:5 0:84 6:77 131072 1574:9 590:5 4:6 3:49 8:28 524288 8271:4 2449:4 4:8 15:60 9:89 2097152 38640:7 9921:5 4:8 65:74 1:29 100000
100000
Build O(n)
10000
10000
1000
1000
100
100
10
10
1 100
1000
10000
100000
100
1e+06
1e+07
1 100
Memory O(n)
1000
10000
0.001
MVM O(n)
100000
1e+06
1e+07
Error O(h^3)
0.0001
10
1e-05 1 1e-06 0.1 1e-07 0.01
0.001 100
1e-08
1000
10000
100000
1e+06
1e+07
1e-09 100
1000
10000
100000
1e+06
1e+07
In the next experiment, we construct an approximation of the matrix K corresponding to the double layer operator (cf. Section 4.9) by using the derivatives of local interpolants (cf. Section 4.5) and again use Algorithm 30 with variable-order error control and a tolerance of O h2 . The results given in Table 6.5 are comparable to the previous experiment: the computation times are almost identical, and the storage requirements for K are slightly lower than for V . We have seen in Chapter 4 that variable-order interpolation schemes yield a very good approximation quality compared to their time and storage requirements, and of course we would like to ensure that these properties are preserved by the recompression algorithm. Combining an initial variable-order approximation with the variable-order
278
6 Compression
Table 6.7. Variable-order recompression of the double layer potential operator with O D 2h3 101 , based on an initial approximation by constant-order interpolation with m D 6.
Build Mem Mem=n MVM kX XQ k2 n 8:0 1:7 3:4 < 0:01 3:33 512 2048 71:4 7:2 3:6 0:02 1:13 8192 521:4 28:9 3:6 0:17 3:25 32768 2373:9 134:5 4:2 0:79 7:66 131072 10102:2 567:8 4:4 3:43 3:57 524288 41980:5 2353:6 4:6 15:52 3:38 100000
10000
Build O(n)
10000
Memory O(n)
1000
1000 100 100 10
10
1 100
1000
10000
100
100000
1e+06
1 100
1000
10000
0.01
MVM O(n)
100000
1e+06
Error O(h^3)
0.001
10
0.0001 1 1e-05 0.1 1e-06 0.01
0.001 100
1e-07
1000
10000
100000
1e+06
1e-08 100
1000
10000
100000
1e+06
recompression algorithm yields the results reported in Table 6.6: compared to the uncompressed case (cf. Table 4.6), the storage requirements are reduced by a factor of four, the approximation error is even smaller than before, only the computation time is increased. This latter effect can be explained by a closer look at the proof of Theorem 6.27: we have to bound the product of a ninth-order polynomial and an exponential term, and the corresponding constant is so large that the asymptotic behaviour is not yet visible. As a final example, we consider the variable-order recompression of the matrix K corresponding to the double layer potential operator. It is possible to base the recompression on variable-order interpolation schemes [23] or on Taylor expansions [90], [100], but the implementation would be relatively complicated and not very
6.9 Numerical experiments
279
efficient. Instead, we simply use an initial approximation constructed by constant-order interpolation with m D 6 and compress it using the strategy described in Section 6.8. The experimental results given in Table 6.7 show that the compression strategy is successful: the error decays like h3 , the storage requirements increase only like n.
Chapter 7
A priori matrix arithmetic
The structure of H - and H 2 -matrices is purely algebraic, therefore it is straightforward to wonder whether it is possible to perform matrix arithmetic operations like addition, multiplication, inversion or factorizations efficiently in these data-sparse formats. Factorizations could be used to construct efficient solvers for systems of linear equations, the matrix inverse would be a useful tool for the approximation of matrix functions or the solution of matrix equations like Lyapunov’s or Riccati’s. Both the inversion and the basic Cholesky and LU -factorizations can be expressed by means of sums and products of matrices, therefore we focus on algorithms for performing these fundamental operations efficiently. We consider three ways of handling sums and products of two H 2 -matrices A and B with different structure: • We can prescribe an H 2 -matrix space and compute the best approximations of A C B and AB C C in this space. Using orthogonal cluster bases, we can use the results of Chapter 5 in order to handle these computations very efficiently. • The exact sum A C B and the exact product AB are elements of H 2 -matrix spaces with suitably chosen block cluster trees and cluster bases. • We can compute an auxiliary approximation of a sum or a product in the form of a hierarchical matrix, which can then be approximated by an H 2 -matrix with adaptively chosen cluster bases by applying Algorithm 33. The three methods have different advantages and disadvantages: the first approach computes the best approximation of the result in a prescribed H 2 -matrix space, i.e., in order to reach a given precision, the space has to be chosen correctly. Constructing a suitable matrix space can be quite complicated in practical applications, but once it is available, the algorithm reaches the optimal order of complexity. The second approach yields exact sums and products, but at the cost of a significant increase both in the number of nodes in the block cluster tree and in the rank. Therefore it is, at least at the moment, only interesting for theoretical investigations, but computationally too complex to be used in practical applications. The third approach combines both methods: it computes an intermediate approximation by a technique similar to the second approach, relying on simple blockwise low-rank approximations instead of an exact representation in an H 2 -matrix format. Since this approach is only loosely related to the other two, it is discussed in the separate Chapter 8.
7.1 Matrix forward transformation
281
The current chapter is organized as follows: • Section 7.1 introduces the matrix forward transformation algorithm, a variant of the forward transformation Algorithm 6 that applies to matrices and is of crucial importance for the algorithms introduced in the following Sections 7.4 and 7.7. • Section 7.2 is devoted to its counterpart, the matrix backward transformation algorithm that is also required in Sections 7.4 and 7.7. • Section 7.3 contains the definitions and notations required to investigate the matrix addition. • Section 7.4 considers an algorithm for computing the best approximation of the sum of two matrices in a given H 2 -matrix space (cf. [11], Section 3.3). • Section 7.5 considers the computation of the exact sum of two matrices in an extended H 2 -matrix space that will usually be too large for practical applications. • Section 7.6 contains the definitions and notations required to investigate the matrix multiplication. • Section 7.7 is devoted to an algorithm for computing the best approximation of the product of two matrices in a given H 2 -matrix space (cf. [11], Section 4). This algorithm and the proof of its optimal complexity are the main results of this chapter. • Section 7.8 considers the computation of the exact product of two matrices in an extended H 2 -matrix space that will usually be far too large for practical applications. • Section 7.9 presents numerical experiments that show that the complexity estimates for the algorithm introduced in Section 7.7 are of optimal order. Assumptions in this chapter: We assume that cluster trees T , TJ and TK for the finite index sets , J and K, respectively, are given. Let n ´ #, nJ ´ #J and nK ´ #K denote the number of indices in each of the sets, and let c ´ #T , cJ ´ #TJ and cK ´ #TK denote the number of clusters in each of the cluster trees.
7.1 Matrix forward transformation In this chapter, we frequently have to compute representations of blocks of an H 2 matrix in varying cluster bases. Differently from the simple techniques introduced in Section 5.3, which only handle transformations between admissible leaves of a block cluster tree, we require transformations between admissible leaves, inadmissible leaves and non-leaf blocks.
282
7 A priori matrix arithmetic
We now introduce algorithms for handling these transformations efficiently. Let A 2 H 2 .TA;J ; VA ; WA /, where VA D .VA;t / t2T and WA D .WA;s /s2TJ are nested cluster bases for T and TJ , respectively, and TA;J is an admissible block cluster tree. Let V D .V t / t2T and W D .Ws /s2TJ be nested cluster bases for T and TJ , respectively, and let K D .K t / t2T and L D .Ls /s2TJ be the rank distributions for V and W , respectively. We consider the computation of the matrix family SyA D .SyA;b /b2TA;J (cf. Figure 7.1) defined by SyA;b ´ V t AWs 2 RK t Ls
for all b 2 TA;J :
(7.1)
If V and W are orthogonal cluster bases, the matrix V t SyA;b Ws D V t V t AWs Ws is the best approximation of the block t As with respect to the Frobenius norm (cf. Lemma 5.5), but the matrices SyA D .SyA;b /b2TA;J are also useful in the nonorthogonal case (cf. equation (7.17) used for computing matrix-matrix products).
Figure 7.1. Blocks involved in the matrix forward transformation.
If b D .t; s/ 2 TA;J is an inadmissible leaf of TA;J , both t and s have to be leaves of T and TJ , respectively, and we can compute SyA;b directly by using its definition (7.1). If b is an admissible leaf of TA;J , we have t As D VA;t SA;b WA;s
and conclude Ws D PV;t SA;b PW;s ; SyA;b D V t AWs D V t VA;t SA;b WA;s
(7.2)
where the cluster operators PV D .PV;t / t2T and PW D .PW;s /s2TJ are defined by PV;t ´ V t VA;t ;
PW;s ´ WA;s Ws ;
for all t 2 T ; s 2 TJ :
We can see that PV and PW are cluster basis products of the cluster bases V , VA , W and WA , so we can use Algorithm 13 to compute these cluster operators efficiently.
7.1 Matrix forward transformation
283
This leaves us with the case that b is not a leaf, i.e., that sons.TA;J ; b/ ¤ ; holds. According to Definition 3.12, we have 8 ˆ if sons.t / D ;; sons.s/ ¤ ;; <¹t º sons.s/ sons.TA;J ; b/ D sons.t / ¹sº if sons.t / ¤ ;; sons.s/ D ;; ˆ : sons.t / sons.s/ if sons.t / ¤ ;; sons.s/ ¤ ;: Using Definition 3.13, we can see that sons.TA;J ; b/ D sonsC .t / sonsC .s/ holds for all b D .t; s/ 2 TA;J with sons.b/ ¤ ;. For all t 0 2 sonsC .t /, the Definition 5.27 of long-range transfer matrices yields ´ E t 0 if sons.t / ¤ ;; E t 0 ;t D (7.3) I otherwise; and since V is nested, equation (3.15) takes the form X Vt D V t 0 E t 0 ;t : t 0 2sonsC .t/
Obviously, the same holds for the cluster basis W , i.e., we have X Ws D Ws 0 Fs 0 ;s : s 0 2sonsC .s/
Applying these equations to (7.1) yields X X SyA;b D V t AWs D D
X
E t0 ;t V t0 AWs 0 Fs 0 ;s
t 0 2sonsC .t/ s 0 2sonsC .s/
b 0 D.t 0 ;s 0 /2sons.b/
E t0 ;t V t0 AWs 0 Fs 0 ;s D
X
E t0 ;t SyA;b 0 Fs 0 ;s :
b 0 D.t 0 ;s 0 /2sons.b/
Using the definitions of E t 0 ;t and Fs 0 ;s (e.g., E t 0 ;t D I for t 0 D t and E t 0 ;t D E t 0 for t 0 2 sons.t/), we can return this equation to the more explicit form 8P ˆ if sons.t / D ; and sons.s/ ¤ ;;
284
7 A priori matrix arithmetic
Collect
Figure 7.2. Collect matrices SyA;b 0 corresponding to subblocks of b in order to form SyA;b .
Lemma 7.1 (Complexity of collecting submatrices). Let V and W be nested cluster bases with rank distributions K and L, and let .k t / t2T and .ls /s2TJ be defined as in (3.16) and (3.18). Let b D .t; s/ 2 TJ . Algorithm 35 requires not more than 2.k t2 ls C k t ls2 / operations. Proof. Depending on whether t and s are leaves, we have to distinguish four cases: if t and s are leaves, no operations are performed. If t is a leaf and s is not, the matrix SyA;b can be computed in X 2.#K t /.#Ls 0 /.#Ls / 2.#K t /ls .#Ls / 2k t ls2 s 0 2sons.s/
operations. If s is a leaf and t is not, the matrix can be computed in X 2.#K t /.#K t 0 /.#Ls / 2.#K t /k t .#Ls / 2k t2 ls t 0 2sons.t/
operations. If both t and s are not leaves, the matrix X t 0 can be computed in X 2.#K t 0 /.#Ls 0 /.#Ls / 2.#K t 0 /ls .#Ls / 2.#K t 0 /ls2 s 0 2sons.s/
operations for each t 0 2 sons.t /, and using these matrices, SyA;b can be computed in X 2.#K t /.#K t 0 /.#Ls / 2.#K t /k t .#Ls / 2k t2 ls : t 0 2sons.t/
The total number of operations is bounded by X 2k t2 ls C 2.#K t 0 /ls2 2k t2 ls C 2k t ls2 : t 0 2sons.t/
7.1 Matrix forward transformation
285
Algorithm 35. Collect son blocks to compute their father. procedure Collect(b, V , W , var SyA ); .t; s/ b; SyA;b 0; if sons.t/ D ; then if sons.s/ D ; then Do nothing fLeaf blocks are handled differentlyg else for s 0 2 sons.s/ do b0 .t; s 0 /; fsons.t / D ;, sons.s/ ¤ ;g y SyA;b C SyA;b 0 Fs 0 SA;b end for end if else if sons.s/ D ; then for t 0 2 sons.t / do .t 0 ; s/; fsons.t / ¤ ;, sons.s/ D ;g b0 y SA;b SyA;b C E t0 SyA;b 0 end for else for t 0 2 sons.t / do Xt 0 0 2 RK t 0 Ls ; fsons.t / ¤ ;, sons.s/ ¤ ;g 0 for s 2 sons.s/ do b0 .t 0 ; s 0 /; 0 X t 0 C SyA;b 0 Fs 0 Xt end for SyA;b SyA;b C E t0 X t 0 end for end if end if
Combining (7.4), as implemented in Algorithm 35, with (7.1) for inadmissible leaves and (7.2) for admissible leaves yields the recursive Algorithm 36 for computing the entire family SyA D .SyA;b /b2TA;J efficiently. It closely resembles the forward transformation Algorithm 6: instead of computing coefficient vectors corresponding to an input vector by proceeding upwards through a cluster tree, it computes coefficient matrices of an input matrix by proceeding upwards through a block cluster tree. Due to this similarity, we call it the matrix forward transformation.
286
7 A priori matrix arithmetic
Algorithm 36. Matrix forward transformation. procedure MatrixForward(b, V , W , PV , PW , SA , var SyA ); .t; s/ b; if sons.TA;J ; b/ ¤ ; then for b 0 2 sons.TA;J ; b/ do MatrixForward(b 0 , V , W , PV , PW , SA , SyA ) end for; Collect(b, V , W , SyA ) fAlgorithm 35g else if b is admissible then SyA;b PV;t SA;b PW;s fAdmissible leafg else SyA;b V t AWs fInadmissible leafg end if Remark 7.2. If the cluster bases V D .V t / t2T and W D .Ws /s2TJ are orthogonal, for all blocks b D .t; s/ 2 TA;J the matrix V t SyA;b Ws D V t V t AWs Ws is the best approximation of the subblock t As by the cluster bases V t and Ws . In this sense, the matrix forward transformation constructs a hierarchy of approximations of the matrix A, and this hierarchy is of crucial importance for matrix arithmetic operations. Lemma 7.3 (Complexity). Let V and W be nested cluster bases with rank distributions K and L, let VA and WA be nested cluster bases with rank distributions KA and LA . Let TA;J be an admissible block cluster tree. Let .k t / t2T , .ls /s2TJ , .kA;t / t2T and .lA;s /s2TJ be defined as in (3.16) and (3.18). Let kO t ´ maxfk t ; kA;t g;
lOs ´ maxfls ; lA;s g
for all t 2 T and s 2 TJ . Algorithm 36, started with b D root.TA;J /, requires not more than X X .k t2 ls C k t ls2 / 2 .k t3 C ls3 / 2 bD.t;s/2TA;J
bD.t;s/2TA;J
operations. If TA;J be Csp -sparse, this can be bounded by X X 2Csp k t3 C ls3 : t2T
s2TJ
If in addition K, L, KA and LA are .Cbn ; ˛; ˇ; r; /-bounded and if T and TJ are .Crc ; ˛; ˇ; r; /-regular, the number of operations is in O..˛ C ˇ/2r .n C nJ //.
7.2 Matrix backward transformation
287
Proof. Let b D .t; s/ 2 TA;J . If b is not a leaf, the algorithm handles the subblocks b 0 2 sons.b/ by recursion and then uses Algorithm 35. According to Lemma 7.1, this requires not more than 2.kO t2 lOs C kO t lOs2 / operations. If b is an admissible leaf, the algorithm can compute the product PV;t SA;b PW;s “from right to left”, i.e., by first computing SA;b PW;s in 2.#KA;t /.#LA;s /.#Ls / 2kO t lOs2 operations, and then computing SyA;b in 2.#K t /.#KA;t /.#Ls / 2kO t2 lOs operations. The entire computation again takes not more than 2.kO t2 lOs C kO t lOs2 / operations. If b is an inadmissible leaf, both t and s have to be leaves, since TJ is admissible. This implies #tO k t and #sO ls . If we compute V t AWs again “from right to left”, the first product AWs takes not more than 2.#tO/.#sO /.#Ls / 2kO t lOs2 operations, and the second takes not more than 2.#K t /.#tO/.#Ls / 2kO t2 lOs operations, and we again get the upper bound 2.kO t2 lOs C kO t lOs2 /: Adding these bounds for all b D .t; s/ 2 TJ and applying (5.9) yields the bound X X 2 kO t2 lOs C kO t lOs2 2 kO t3 C lOs3 bD.t;s/2TJ
bD.t;s/2TJ
for the total number of operations. If TA;J is Csp -sparse, we get X X X X X 2 kO t3 C lOs3 D 2 kO t3 C 2 lOs3 bD.t;s/2TJ
t2T s2row.t/
2Csp
X
t2T
kO t3 C 2Csp
s2TJ t2col.s/
X
lOs3 D 2Csp
X
s2TJ
t2T
kO t3 C
X
lOs3 ;
s2TJ
and we can complete the proof using the Lemmas 3.45 and 3.48.
7.2 Matrix backward transformation We have seen that the matrix forward transformation can be used to approximate subblocks of an H 2 -matrix by an admissible block. Its counterpart is the matrix backward transformation, which approximates an admissible block by an H 2 -matrix.
288
7 A priori matrix arithmetic
Let VC D .VC;t / t2T and WC D .WC;s /s2TJ be orthogonal nested cluster bases for T and TJ , respectively, and let TC;J be an admissible block cluster tree. Let V D .V t / t2T and W D .Ws /s2TJ be nested cluster bases for T and TJ , respectively, and let K D .K t / t2T and L D .Ls /s2TJ be the corresponding rank distributions. Let b D .t; s/ 2 TC;J , and let Sb 2 RK t Ls . We consider the approximation of the matrix C ´ V t Sb Ws in the space H 2 .TC;J ; VC ; WC /. According to Lemma 5.5, the best approximation is given by X VC;t VC;t Cz ´ …TC;J ;VC ;WC C D C WC;s WC;s b D.t ;s /2LC b
X
C
t Cs
b D.t ;s /2L b
X
D
VC;t VC;t V t Sb Ws WC;s WC;s
(7.5)
b D.t ;s /2LC b
X
C
t V t Sb Ws s :
b D.t ;s /2L b
Handling the nearfield blocks is straightforward: for all b D .t ; s / 2 L , Definib tion 3.33 implies t V t Sb Ws s D t t V t Sb Ws s s ; and due to Lemma 3.8, this matrix can only differ from zero if t 2 sons .t / and s 2 sons .s/ hold. In this case, Lemma 6.13 implies t V t D V t E t ;t ; and we can conclude ´ V t E t ;t Sb Fs ;s Ws t Cs D 0
s Ws D Ws Fs ;s ;
if b D .t ; s / 2 sons .TC;J ; b/; otherwise:
(7.6)
Here E t ;t and Fs ;s denote the long-range transfer matrices introduced in Definition 5.27. For the farfield blocks, we have to compute SC;b ´ VC;t V t Sb Ws WC;s
for all b D .t ; s / 2 LC . According to Definition 3.33, we have b SC;b D VC;t . t t /V t Sb Ws .s s /WC;s ;
7.2 Matrix backward transformation
289
and Lemma 3.8 yields that SC;b can only be non-zero if t 2 sons .t / and s 2 sons .s/. In this case, we can again use Lemma 6.13 to find ´ if b D .t ; s / 2 sons .TC;J ; b/; VC;t V t E t ;t Sb Fs ;s Ws WC;s SC;b D 0 otherwise: We once more employ cluster operators to simplify this equation: we introduce PV ´ .PV;t / t2T and PW ´ .PW;s /s2TJ by Vt ; PV;t ´ VC;t
for all t 2 T , s 2 TJ , and find ´ PV;t E t ;t Sb Fs ;s PW;s SC;b D 0
PW;s ´ Ws WC;s
if b D .t ; s / 2 sons .TC;J ; b/; otherwise:
Both equations (7.6) and (7.7) require matrices of the type ´ E t ;t Sb Fs ;s if b D .t ; s / 2 sons .TC;J ; b/; SyC;b ´ 0 otherwise;
(7.7)
(7.8)
and the efficient computation of these matrices is the key component of the matrix backward transformation. Let b 2 sons .TC;J ; b/, and let b 0 2 sons.b /. According to Lemma 5.28, we have E t 0 ;t D E t 0 ;t E t ;t ;
Fs 0 ;s D Fs 0 ;s Fs ;s ;
and (7.8) yields y SyC;b 0 D E t 0 ;t Sb Fs0 ;s D E t 0 ;t E t ;t Sb Fs;s Fs 0 ;s D E t 0 ;t SC;b Fs 0 ;s :
Due to Definition 3.12, we have either t 0 s 0 D s, so Definition 5.27 implies 8 ˆ <SyC;b Fs 0 SyC;b 0 D E t 0 SyC;b ˆ : 0y E t SC;b F 0 s
2 sons.t / or t 0 D t and either s 0 2 sons.s/ or if sons.t / D ;; sons.s/ ¤ ;; if sons.t / ¤ ;; sons.s/ D ;; if sons.t / ¤ ;; sons.s/ ¤ ;:
(7.9)
Lemma 7.4 (Complexity of splitting). Let V and W be nested cluster bases with rank distributions K and L, and let .k t / t2T and .ls /s2TJ be defined as in (3.16) and (3.18). Let b D .t; s/ 2 TJ . Algorithm 37 requires not more than 2.k t2 ls C k t ls2 / operations.
290
7 A priori matrix arithmetic
Split
Figure 7.3. Split a matrix SyC;b into matrices SyC;b 0 corresponding to subblocks b 0 of b.
Algorithm 37. Split a block into son blocks. procedure Split(b, V , W , var SyC ); .t; s/ b; if sons.t/ D ; then if sons.s/ D ; then Do nothing fLeaf blocks are handled differentlyg else for s 0 2 sons.s/ do b0 .t; s 0 /; fsons.t / D ;, sons.s/ ¤ ;g y SyC;b 0 C SyC;b Fs0 SC;b 0 end for end if else if sons.s/ D ; then for t 0 2 sons.t / do b0 .t 0 ; s/; fsons.t / ¤ ;, sons.s/ D ;g SyC;b 0 C E t 0 SyC;b SyC;b 0 end for else for t 0 2 sons.t / do Xt 0 E t 0 SyC;b 2 RK t 0 Ls ; fsons.t / ¤ ;, sons.s/ ¤ ;g 0 for s 2 sons.s/ do .t 0 ; s 0 /; b0 SyC;b 0 SyC;b 0 C X t 0 Fs0 end for end for end if end if
Proof. Similar to the proof of Lemma 7.1.
7.2 Matrix backward transformation
291
The step (7.9) is handled by Algorithm 37, which translates a coupling matrix corresponding to the father block b D .t; s/ into coupling matrices corresponding to son blocks b 0 . If t and s are not leaf clusters, we employ auxiliary matrices X t 0 to store the intermediate results corresponding to sons t 0 of t . As in Algorithm 35, this improves the efficiency of the implementation. Using (7.9), as implemented in Algorithm 37, to construct SyC D .SyC;b /b2TC;J for all blocks by a top-down recursion and noting that (7.6) takes the form t Cs D V t SyC;b Ws for inadmissible leaves b D .t; s/ 2 L C;J and that (7.7) takes the form SC;b D PV;t SyC;b PW;s for admissible leaves b D .t; s/ 2 LC C;J leads to the recursive Algorithm 38 for z constructing the best approximation C of C D V t Sb Ws in the H 2 -matrix space H 2 .TC;J ; VC ; WC /. Algorithm 38. Matrix backward transformation. procedure MatrixBackward(b, V , W , PV , PW , var SyC , SC ); .t; s/ ´ b; if sons.TC;J ; b/ ¤ ; then fAlgorithm 37g Split(b, V , W , SyC ); for b 0 2 sons.TC;J ; b/ do MatrixBackward(b 0 , V , W , PV , PW , SyC , SC ) end for else if b is admissible then SC;b SC;b C PV;t SyC;b PW;s fAdmissible leafg else t Cs t Cs C V t SyC;b Ws fInadmissible leafg end if This algorithm closely resembles the backward transformation Algorithm 7, but works on matrices instead of vectors, therefore it is called the matrix backward transformation. As in the case of the former, it is a good idea to ensure that the algorithm adds the contribution of a father block to the son blocks instead of simply replacing the latter, since this leads to a significant reduction in complexity for the matrix-matrix multiplication discussed in Secion 7.7. The matrix backward transformation is the counterpart of the matrix forward transformation: the latter performs a bottom-up recursion to transform leaf blocks into larger blocks, while the matrix backward transformation uses a top-down recursion to transform larger blocks into leaf blocks.
292
7 A priori matrix arithmetic
Lemma 7.5 (Complexity). Let V and W be nested cluster bases with rank distributions K and L, let VC and WC be nested cluster bases with rank distributions KC and LC . Let TC;J be an admissible block cluster tree. Let .k t / t2T , .ls /s2TJ , .kC;t / t2T and .lC;s /s2TJ be defined as in (3.16) and (3.18). Let kO t ´ maxfk t ; kC;t g;
lOs ´ maxfls ; lC;s g
for all t 2 T and s 2 TJ . Algorithm 38, started with b D root.TC;J /, requires not more than X X .k t2 ls C k t ls2 / 2 .k t3 C ls3 / 2 bD.t;s/2TC;J
bD.t;s/2TC;J
operations. If TC;J be Csp -sparse, this can be bounded by X X 2Csp k t3 C ls3 : t2T
s2TJ
If in addition K, L, KC and LC are .Cbn ; ˛; ˇ; r; /-bounded and if T and TJ are .Crc ; ˛; ˇ; r; /-regular, the number of operations is in O..˛ C ˇ/2r .n C nJ //. Proof. Similar to the proof of Lemma 7.3. Remark 7.6 (Optimality). If the row cluster basis VC D .VC;t / t2T and the column cluster basis WC D .WC;s /s2TJ chosen for the matrix C are orthogonal, our construction implies that the matrix Cz defined in (7.5) is the best approximation of V t Sb Ws in H 2 .TC;J ; VC ; WC /.
7.3 Matrix addition Let us now consider the computation of M ´ C C A for A; C; M 2 RJ . According to Remark 3.37, H 2 -matrix spaces are subspaces of RJ , i.e., the sum of two H 2 -matrices based on the same block cluster tree and the same row and column cluster bases will again be an H 2 -matrix with the same structure, and its representation can be computed in linear complexity by adding the near- and farfield matrices. We consider generalizations of this result: the matrices C and A can be based on different block cluster trees and different cluster bases. Let VA D .VA;t / t2T and VC D .VC;t / t2T be nested cluster bases for T with rank distributions KA D .KA;t / t2T , KC D .KC;t / t2T and families of transfer matrices EA D .EA;t / t2T , EC D .EC;t / t2T . Let WA D .WA;s /s2TJ and WC D .WC;s /s2TJ be nested cluster bases for TJ with rank distributions LA D .LA;s /s2TJ , LC D .LC;s /s2TJ and families of transfer matrices FA D .FA;s /s2TJ , FC D .FC;s /s2TJ .
293
7.4 Projected matrix-matrix addition
Let TA;J and TC;J be admissible block cluster trees for the row cluster tree T and the column cluster tree TJ . Let A 2 H 2 .TA;J ; VA ; WA / and C 2 H 2 .TC;J ; VC ; WC /. We assume that the 2 H -matrices A and C are given in the usual H 2 -matrix representations AD
X X
t As ;
(7.10a)
t Cs
(7.10b)
bD.t;s/2LA;J
C bD.t;s/2LA;J
C D
X
VA;t SA;b WA;s C
X
VC;t SC;b WC;s C
bD.t;s/2L C;J
bD.t;s/2LC C;J
for coupling matrices .SA;b /b2LC
A;J
algorithms for computing M .
and .SC;b /b2LC
and now introduce two
C;J
7.4 Projected matrix-matrix addition The sum M ´ C C A is, in general, not an element of the H 2 -matrix spaces H 2 .TA;J ; VA ; WA / or H 2 .TC;J ; VC ; WC /. We assume that M can be approximated reasonably well in the space H 2 .TC;J ; VC ; WC / and try to find its best approximation in this matrix space. In order to be able to use the techniques introduced in Chapter 5, we require the cluster bases VC and WC to be orthogonal, which can be easily ensured by using the Algorithms 16 or 19. According to Lemma 5.5, the best approximation (with respect to the Frobenius norm) of the matrix M in the space H 2 .TC;J ; VC ; WC / is given by …TC;J ;VC ;WC M D
X X
VC;t SM;b WC;s C
bD.t;s/2LC C;J
for the matrices .SM;b /b2LC
t Ms
bD.t;s/2L C;J
bD.t;s/2LC C;J
D
X
VC;t VC;t M WC;s WC;s C
X
t Ms
bD.t;s/2L C;J
defined by
C;J
M WC;s SM;b D VC;t
for all b D .t; s/ 2 LC C;J . Computing the desired result …TC;J ;VC ;WC M is equivalent to computing SM;b for all admissible leaves b 2 LC C;J and computing t Ms for all inadmissible leaves b 2 LC;J .
294
7 A priori matrix arithmetic
Case 1: Admissible leaves b 2 LC C; J Let b D .t; s/ 2 LC C;J . For M D C C A, the equation above takes the form SM;b D VC;t C WC;s C VC;t AWC;s :
Since VC and WC are orthogonal, combining Lemma 5.4 and (7.10b) yields SM;b D SC;b C VC;t AWC;s ;
(7.11)
i.e., computing SM;b is a simple task once we have prepared the matrix VC;t AWC;s in an efficient way. If b D .t; s/ 2 TA;J holds, we can apply the matrix forward transformation to the matrix A and the cluster bases VC and WC in order to compute SyA;b D VC;t AWC;s :
We can see that this is exactly the matrix we need, and that (7.11) takes the form SM;b D SC;b C SyA;b : If b D .t; s/ 62 TA;J holds, we can use the following result in order to find a predecessor b 2 TA;J of b: 6 TA;J . Then we can find an admissible Lemma 7.7. Let b 2 TC;J with b 2 C of TA;J which is a predecessor of b, i.e., which satisfies b 2 leaf b 2 LA;J sons .TC;J ; b /. Proof. We consider the set A ´ fb C 2 TA;J \ TC;J W b 2 sons .TC;J ; b C /g: Since TA;J and TC;J share the same root, A is not empty. Let b D .t ; s / 2 A be the block with maximal level in A. If it had sons in TA;J , Definition 3.12 would imply that the sons in TA;J would coincide with those in TC;J , and one of the latter would have to be a predecessor of b. This predecessor would therefore be included in A, and its level would be higher than that of b , which contradicts the assumption. Therefore b has to be a leaf of TA;J . If b was an inadmissible leaf, both t and s would have to be leaves, according to Definition 3.18, therefore b would also have to be a leaf in TC;J . Since b ¤ b , this is impossible, so b has to be an admissible leaf. C According to this result, there is a b D .t ; s / 2 LA;J with SM;b D SC;b C VC;t VA;t SA;b WA;s WC;s ;
7.4 Projected matrix-matrix addition
295
and due to Lemma 3.15, b is unique. Applying the matrix backward transformation to the block b yields a matrix SyC;b with SM;b D SC;b C VC;t VA;t SyC;b WA;s WC;s ;
and using the cluster operators PV ´ .PV;t / t2T and PW ´ .PW;s /s2TJ defined by PV;t ´ VC;t VA;t ;
PW;s ´ WA;s WC;s
for all t 2 T , s 2 TJ gives us SM;b D SC;b C PV;t SyC;b PW;s : We conclude that admissible leaves b of TC;J can be handled by the matrix forward transformation if they are contained in TA;J and by the matrix backward transformation if they are not.
Case 2: Inadmissible leaves b 2 L C; J Let b D .t; s/ 2 L C;J . If b 62 TA;J , Lemma 7.7 yields the existence of an C with b 2 sons .TC;J ; b /, and we can use the matrix admissible leaf b 2 LA;J backward transformation. If b 2 TA;J , Definition 3.12 implies that the block has to be a leaf in LA;J , too. If it is admissible, we can compute its explicit representation by the matrix backward transformation. Otherwise, we can simply add t As to t Cs . Algorithm 39 uses the matrix forward and backward transformation in order to compute the projection …TC;J ;VC ;WC .A C C / of the sum of A and C . The far- and nearfield matrices of C are overwritten with the result. Lemma 7.8 (Complexity). Let VA and VC be nested cluster bases for T with rank distributions KA and KC , and let .kA;t / t2T and .kC;t / t2T be defined as in (3.16). Let WA and WC be nested cluster bases for TJ with rank distributions LA and LC , and let .lA;s /s2TJ and .lC;s /s2TJ be defined as in (3.18). Let kO t ´ maxfkA;t ; kC;t g;
lOs ´ maxflA;s ; lC;s g
for all t 2 T and s 2 TJ . Let TA;J and TC;J be Csp -sparse admissible block cluster trees. Algorithm 39 requires not more than X X Csp X X 4 kO t3 C lOs3 C kO t2 C ls2 2 t2T
s2TJ
t2T
s2TJ
operations. If KA , KC , LA and LC are .Cbn ; ˛; ˇ; r; /-bounded and if T and TJ are .Crc ; ˛; ˇ; r; /-regular, the number of operations is in O..˛ C ˇ/2r .n C nJ /.
296
7 A priori matrix arithmetic
Algorithm 39. Projected matrix addition. procedure ProjectedAdd(A, var C ); fAlgorithm 13g ClusterBasisProduct(root.T /, VC , VA , PV ); ClusterBasisProduct(root.TJ /, WA , WC , PW ); fAlgorithm 13g fAlgorithm 36g MatrixForward(root.TA;J /, VC , WC , PV , PW , SA , SyA ); for b 2 TC;J do SyC;b 0 end for; for b D .t; s/ 2 TA;J \ TC;J do if b 2 LC C;J then SC;b SC;b C SyA;b C else if b 2 LA;J then y SA;b SC;b else if b 2 L C;J then t Cs C t As fb 2 L t Cs C;J implies b 2 LA;J g end if end for; MatrixBackward(root.TC;J /, VA , WA , PV , PW , SyC;b , SC ) fAlgorithm 38g Proof. According to Lemma 5.13, we can compute PV and PW using Algorithm 13 in X X X X 3 3 3 3 2 kO t3 C lOs3 .kA;t C kC;t /C2 .lA;s C lC;s /4 t2T
s2TJ
t2T
s2TJ
operations. Handling a block b D .t; s/ 2 TA;J \ TC;J in the main loop of Algorithm 39 requires not more than kO t lOs operations, and we get an upper bound of X bD.t;s/2TA;J \TC;J
1 kO t lOs 2
1 2
X
kO t2 C lOs2
bD.t;s/2TA;J \TC;J
X
kO t2 C lOs2
bD.t;s/2TA;J
1 X D 2
X
t2T s2row.TA;J ;t/
1 X kO t2 C 2
X Csp X O 2 kt C lOs2 : 2 t2T
X
lOs2
s2TJ t2col.TA;J ;s/
(7.12)
s2TJ
Adding the bound for the number of operation required for the computation of the cluster basis products yields the desired estimate.
7.5 Exact matrix-matrix addition
297
If KA , KC , LA and LC are .Cbn ; ˛; ˇ; r; /-bounded, we can use Lemma 5.12 in combination with the Lemmas 3.45 and 3.48 to complete the proof. Remark 7.9. If VA D VC , WA D WC and TA;J D TC;J hold, we do not have to prepare the cluster operators PV and PW , and since we also do not have to use the matrix forward and backward transformation, we can reduce Algorithm 39 to its main loop, which requires not more than X Csp X O 2 lOs2 kt C 2 t2T
s2TJ
operations. If the rank distributions KA and LA are .Cbn ; ˛; ˇ; r; /-bounded and if T and TJ are .Crc ; ˛; ˇ; r; /-regular, we find that the number of operations is in O..˛ C ˇ/r .n C nJ /. Remark 7.10 (Auxiliary storage). Algorithm 39 requires temporary storage for the families SyA D .SyA;b /b2TA;J and SyC D .SyC;b /b2TC;J used in the matrix forward and backward transformation. As in Lemma 3.38, we can prove that not more than X Csp X O 2 kt C lOs2 2 t2T
s2TJ
units of auxiliary storage are required. In the special case of Algorithm 39, we can avoid storing all matrices in SyA and SyC by relying on the fact that the computation of, e.g., SyC;b for b 2 LC;J can be accomplished by keeping only the predecessors of b in storage. Using this approach means that we only have to store a small number of matrices per level of the block cluster tree instead of per block, i.e., it is far more storage-efficient as long as the block cluster tree is not degenerate.
7.5 Exact matrix-matrix addition We have seen that we can compute the best approximation of the sum of two H 2 matrices in a prescribed H 2 -matrix space by the efficient Algorithm 39. Estimates for the error of this best approximation depend on the nature of the matrices A and C , i.e., on the underlying problem. Our goal is now to construct an H 2 -matrix space H 2 .TM;J ; VM ; WM / that can represent any matrix M D A C C with A 2 H 2 .TA;J ; VA ; WA / and C 2 H 2 .TC;J ; VC ; WC /, i.e., the error of the best approximation in this space will be zero. Based on the exact representation of the sum in H 2 .TM;J ; VM ; WM /, we can construct more efficient approximations by truncating the cluster bases VM and WM using Algorithm 19 or (preferably) by applying the compression Algorithm 30.
298
7 A priori matrix arithmetic
We construct the block cluster tree TM;J by merging the trees TA;J and TC;J : we require that TM;J has the same root as TA;J and TC;J , namely .root.T /; root.TJ //, and that the set of blocks of TM;J is the union of the blocks of TA;J and TC;J . Definition 7.11 (Induced block cluster tree). Let TM;J be the minimal block cluster tree (i.e., the block cluster tree with minimal number of blocks) for T and TJ satisfying the following conditions: • A block b D .t; s/ 2 TM;J is subdivided if it is subdivided in TA;J or TC;J , i.e., the sons of b in TM;J are given by 8 C C ˆ <sons .t / sons .s/ if b 2 TA;J n LA;J sons.TM;J ; b/ D or b 2 TC;J n LC;J ; (7.13) ˆ : ; otherwise: • A block b D .t; s/ 2 TM;J is admissible if and only if it is a descendant of admissible blocks in TA;J and TC;J , i.e., C C b 2 LM;J () there are bA 2 LA;J and bC 2 LC C;J with
b 2 sons .TM;J ; bA / \ sons .TM;J ; bC /:
(7.14)
Then TM;J is called the induced block cluster tree for the addition. Lemma 7.12. The induced block cluster tree TM;J is admissible. For each b 2 , we have TM;J , we have b 2 TA;J or b 2 TC;J . For each b 2 LM;J b 2 LA;J or b 2 LC;J . If TA;J and TC;J are Csp -sparse, the tree TM;J is 2Csp -sparse. Proof. We start by proving that each b 2 TM;J is an element of TA;J or TC;J by induction on level.b/. Let b 2 TM;J with level.b/ D 0. Then we have b D root.TM;J / D root.TA;J / 2 TA;J . Let now n 2 N be such that b 2 TA;J or b 2 TC;J holds for all b 2 TM;J with level.b/ D n, and let b 2 TM;J with level.b/ D n C 1. Then we can find a father b C 2 TM;J with b 2 sons.TM;J ; b C / and level.b C / D n. According to the induction assumption, we have b C 2 TA;J or b C 2 TC;J , and (7.13) yields b 2 sons.TA;J ; b C / TA;J or b 2 sons.TC;J ; b C / TC;J , respectively, which completes the induction. Let now b 2 LM;J . We prove that b 2 LA;J or b 2 L C;J holds. We have already seen that b has to be a block in TA;J or TC;J . Without loss of generality, let us assume b 2 TA;J . Since b is a leaf of TM;J , (7.13) implies that it has to be a leaf of TA;J . If it is an inadmissible leaf of TA;J , we have already proven . b 2 LA;J
7.5 Exact matrix-matrix addition
299
If b is an admissible leaf of TA;J , we can let bA ´ b and find b D bA 2 sons .TM;J ; bA /. If b 2 sons .TM;J ; bC / would hold for a bC 2 LC C;J , (7.14) would imply that b is admissible in TM;J , which is not the case. Therefore b 62 sons .TM;J ; bC / has to hold for all bC 2 LC C;J . If b 62 TC;J would hold, Lemma 7.7 would give us an admissible leaf bC 2 LC C;J with b 2 sons .TA;J ; bC / sons .TM;J ; bC /, and we have already seen that this is impossible. Therefore we have b 2 TC;J , and since b is a leaf in TM;J , it has to be a leaf in LC;J , as well. Since b is inadmissible in TM;J and admissible in TA;J , (7.14) implies that it has to be inadmissible in TC;J , i.e., b 2 L C;J . In order to prove that TM;J is an admissible block cluster tree, we only have to verify that sons.t / D ; D sons.s/ holds for all inadmissible leaves b D .t; s/ 2 LM;J . Let b 2 LM;J . Since we know that b 2 LA;J or b 2 L C;J holds and that TA;J and TC;J are admissible block cluster trees, this is obvious. Let us now consider the sparsity of TM;J . Let t 2 T , and let s 2 row.TM;J ; t /. This means b D .t; s/ 2 TM;J , and we have already seen that this implies b 2 TA;J or b 2 TB;J , so we find # row.TM;J ; t / # row.TA;J ; t / C # row.TB;J ; t / 2Csp ; which concludes the proof. We can see that the block cluster tree TM;J is “finer” than TA;J and TC;J and that each admissible leaf of TM;J is a descendant of admissible leaves in TA;J and TC;J . In order to be able to express both A and C in the space H 2 .TM;J ; VM ; WM /, we therefore only have to ensure that the descendants of admissible leaves in TA;J and TC;J can be expressed by VM and WM . In order to simplify the presentation, let us assume that the index sets are disjoint, i.e., that KA;t \ KC;t D ;;
LA;s \ LC;s D ;
holds for all t 2 T ; s 2 TJ :
Due to this assumption, we can construct VM or WM by simply merging VA and VC or WA and WC , respectively. Definition 7.13 (Induced cluster bases). We define the induced row cluster basis VM ´ .VM;t / t2T with corresponding rank distribution KM ´ .KM;t / t2T by P KC;t ; KM;t ´ KA;t [
VM;t ´ VA;t
K VC;t 2 RtO M;t
for all t 2 T
and the induced column cluster basis WM ´ .WM;s /s2TJ with corresponding rank distribution LM ´ .LM;s /s2TJ by P LC;s ; LM;s ´ LA;s [
WM;s ´ WA;s
JL WC;s 2 RsO M;s
for all s 2 TJ :
300
7 A priori matrix arithmetic
Lemma 7.14. The families VM D .VM;t / t2T and WM D .WM;s /s2TJ are nested cluster bases for T and TJ , respectively. The transfer matrices EM ´ .EM;t / t2T and FM ´ .FM;s /s2TJ are given by EM;t ´
EA;t EC;t
;
FM;s
´
FA;s
for all t 2 T ; s 2 TJ :
FC;s
We have for all t 2 T ; for all s 2 TJ ;
VA;t D VM;t PVA;t ; VC;t D VM;t PV C;t WA;s D WM;s PWA;s ; WC;s D WM;s PW C;s
where the cluster operators PVA ´ .PVA;t / t2T , PV C ´ .PV C;t / t2T , PWA ´ .PWA;s /s2TJ and PW C ´ .PW C;s /s2TJ are defined by PVA;t PWA;s
I 0 KM;t KA;t ´ 2R ; PV C;t ´ 2 RKM;t KC;t 0 I I 0 ´ 2 RLM;s LA;s ; PW C;s ´ 2 RLM;s LC;s 0 I
for all t 2 T ; for all s 2 TJ ;
i.e., the cluster bases VM or WM can be used to express anything which can be expressed by VA and VC or WA and WC , respectively. Proof. This is trivial. Theorem 7.15 (Exact matrix addition). Let TM;J be the induced block cluster tree from Definition 7.11. Let VM D .VM;t / t2T and WM D .WM;s /s2TJ be the induced row and column cluster bases from Definition 7.13. Then we have M D C C A 2 H 2 .TM;J ; VM ; WM /. Proof. Due to Remark 3.37, it suffices to prove that A 2 H 2 .TM;J ; VM ; WM / and C 2 H 2 .TM;J ; VM ; WM / hold. Without loss of generality, we consider only the representation of the matrix A in H 2 .TM;J ; VM ; WM /. Let b D .t; s/ 2 LA;J . Lemma 7.12 implies b 2 TM;J . If b is inadmissible, it has to be a leaf in TM;J , and we can use the corresponding nearfield matrix directly. If b is admissible, we observe that VM;t holds.
SA;b 0
0 WM;s D VA;t 0
SA;b VC;t 0
0 0
WA;s WC;s
D VA;t SA;b WA;s
7.6 Matrix multiplication
301
The representation of M D C C A can be computed efficiently by using the matrix backward transformation Algorithm 38: we introduce SyM ´ .SyM;b /b2TM;J by
SyM;b
! 8 ˆ 0 SA;b ˆ ˆ ˆ ˆ 0 SC;b ˆ ˆ ! ˆ ˆ ˆ ˆ < SA;b 0 ´ 0 0 ! ˆ ˆ ˆ ˆ 0 0 ˆ ˆ ˆ ˆ 0 SC;b ˆ ˆ ˆ : 0
C if b 2 LA;J \ LC C;J ; C if b 2 LA;J n LC C;J ;
for all b 2 TM;J
C if b 2 LC C;J n LA;J ;
otherwise
and use Algorithm 38 with trivial PV and PW in order to construct the representation of M D A C C in the space H 2 .TM;J ; VM ; WM /.
7.6 Matrix multiplication Let us now turn our attention towards the computation of M ´ C CAB for A 2 RJ , B 2 RJK and C; M 2 RK . Let VA D .VA;t / t2T and VC D .VC;t / t2T be nested cluster bases for the cluster tree T with rank distributions KA D .KA;t / t2T , KC D .KC;t / t2T and families EA D .EA;t / t2T , EC D .EC;t / t2T of transfer matrices Let WA D .WA;s /s2TJ and VB D .VB;s /s2TJ be nested cluster bases for the cluster tree TJ with rank distributions LA D .LA;s /s2TJ , KB D .KB;s /s2TJ and families FA D .FA;s /s2TJ , EB D .EB;s /s2TJ of transfer matrices. Let WB D .WB;r /r2TK and WC D .WC;r /r2TK be nested cluster bases for the cluster tree TK with rank distributions LB D .LB;r /r2TK , LC D .LC;r /r2TK and families FB D .FB;r /r2TK , FC D .FC;r /r2TK of transfer matrices. Let TA;J , TB;JK and TC;K be admissible block cluster trees for the cluster trees T , TJ and TK , respectively. We assume that A, B and C are H 2 -matrices given in the usual H 2 -matrix representation: X X AD VA;t SA;b WA;s C t As ; C bD.t;s/2LA;J
BD
X
bD.t;s/2LA;J
VB;s SB;b WB;r C
X
bD.t;r/2LC C;K
s Br ;
bD.s;r/2LB;JK
C bD.s;r/2LB;JK
C D
X
VC;t SC;b WC;r C
X
bD.t;r/2L C;K
t Cr
302
7 A priori matrix arithmetic
for families SA D .SA;b /b2LC , SB D .SB;b /b2LC and SC D .SC;b /b2LC A;J B;JK C;K of coupling matrices.
7.7 Projected matrix-matrix multiplication As in the case of the addition, we first consider the projection of the result M ´ C CAB into the space H 2 .TC;K ; VC ; WC /. We once again use the results of Chapter 5 to compute the best approximation of M , and to keep the presentation simple we assume that the cluster bases VC and WC are orthogonal. According to Lemma 5.5, the best approximation of the matrix M in the H 2 -matrix space H 2 .TC;K ; VC ; WC / is given by the projection X X VC;t SM;b WC;r C t Mr …TC;K ;VC ;WC M D bD.t;r/2L C;K
bD.t;r/2LC C;K
for the matrices .SM;b /b2LC
defined by
C;K
SM;b ´ VC;t M WC;r
for all b D .t; r/ 2 LC C;K :
We have to compute SM;b for all admissible leaves b 2 LC C;K and t Mr for all . inadmissible leaves b 2 L C;K The main difficulty of the matrix-matrix multiplication is due to the fact that the computation of one block t Mr D t .C C AB/r D t Cr C t ABr requires us to consider the interaction of all indices of J in the evaluation of AB, i.e., it is a non-local operation. In order to handle the non-locality efficiently, we split the computation of t ABr into a sum X t ABr D t As Br s2 t;r
over a suitably-chosen subset t;r of clusters s 2 TJ and have to compute X t As Br t Mr D t Cr C s2 t;r
in the case of an inadmissible leaf b D .t; r/ 2 L C;J and VC;t M WC;r D SC;b C
X s2 t;r
VC;t As BWC;r
7.7 Projected matrix-matrix multiplication
303
in the case of an admissible leaf b D .t; r/ 2 LC C;J . We can split both computations into a number of elementary steps t Mr
t Mr C t As Br ;
SM;b
SM;b C VC;t As BWC;r ;
(7.15)
and we can manage these elementary steps by a recursion. Let t 2 T , s 2 TJ and r 2 TK . We let bA ´ .t; s/, bB ´ .s; r/ and bC ´ .t; r/. Each of these three blocks can fall into one of three categories: it can be admissible, it can be an inadmissible leaf, and it can be a non-leaf block. In the first case, we can use the factorized representation of the corresponding submatrix, in the second case, we can assume that the submatrix is small, while the third case requires a proper handling of the block’s descendents. Definition 7.16 (Subsets of blocks). Let TJ be a block cluster tree. We define the set of subdivided blocks by J ´ TJ n LJ and the set of inadmissible blocks by J ´ TJ n LC J : The following observation simplifies the treatment of recursions for subdivided blocks: Lemma 7.17. For all admissible block cluster trees TJ , we have J J TJ : If b D .t; s/ 2 J , we have b 0 ´ .t 0 ; s 0 / 2 TJ for all t 0 2 sonsC .t / and all s 0 2 sonsC .s/. Proof. Since LC J LJ , the first inclusion is a direct consequence of Definition 7.16. Let now b D .t; s/ 2 J , t 0 2 sonsC .t /, s 0 2 sonsC .s/ and b 0 ´ .t 0 ; s 0 /. If sons.t/ D ; D sons.s/, we have t 0 D t and s 0 D s, i.e., b 0 D b 2 J TJ . Otherwise, b cannot be an inadmissible leaf, since TJ is admissible, and b cannot be an admissible leaf, since admissible leaves are excluded from J . Therefore b is not a leaf of TJ , and Definition 3.12 yields sons.TJ ; b/ D sonsC .t / sonsC .s/, i.e., .t 0 ; s 0 / 2 sons.TJ ; b/ TJ . Depending on the nature of bA , bB and bC , i.e., depending on whether they are admissible leaves, inadmissible leaves or subdivided blocks, we have to handle different combinations of block types differently. We do not discuss situations involving inadmissible leaf blocks in detail. Since inadmissible leaves of an admissible block cluster tree correspond to leaves of the cluster tree, which we can assume to correspond to small sets of indices, we can handle most of these situations in the standard way.
304
7 A priori matrix arithmetic
Case 1: bC is subdivided Let bC 2 C;K be a subdivided block. We have to perform the elementary step t Mr
t Mr C t As Br :
Case 1a: bA and bB are subdivided Let bA 2 A;J and bB 2 B;JK be subdivided blocks in TA;J and TB;JK , respectively. Since Definition 3.12 implies sons.bA / D sonsC .t / sonsC .s/;
sons.bB / D sonsC .s/ sonsC .r/;
sons.bC / D sonsC .t / sonsC .r/; the elementary step (7.15) is equivalent to t Mr
t Mr C t As Br X X D t 0 Mr 0 C t 0 As Br 0 t 0 2sonsC .t/ r 0 2sonsC .r/
D
X
X
t 0 Mr 0 C
t 0 2sonsC .t/ s 0 2sonsC .s/
X
t 0 As 0 Br 0 ;
s 0 2sonsC .s/
and we can split it up into a sequence of elementary steps t 0 Mr 0
t 0 Mr 0 C t 0 As 0 Br 0 ;
where t 0 2 sonsC .t /, s 0 2 sonsC .s/ and r 0 2 sonsC .r/. Each of these steps can be handled by recursion. Case 1b: bA is admissible, bB is subdivided Let bA be admissible and let bB 2 B;JK be a subdivided block in TB;JK . Under these conditions, the step (7.15) is given by t Mr D
t Mr C t As Br X
. t 0 Mr 0 C t 0 As Br 0 /
0 D.t 0 ;r 0 /2sons.T bC C;K ;bC /
D D
X
X
t 0 2sonsC .t/
r 0 2sonsC .r/
X
. t 0 Mr 0 C t 0 As Br 0 /
X
t 0 2sonsC .t/ r 0 2sonsC .r/
t 0 Mr 0 C
X s 0 2sonsC .s/
t 0 As 0 Br 0 :
7.7 Projected matrix-matrix multiplication
305
CD
Split
CD
Recursion X
X
X
X
CD
Figure 7.4. Case 1b of the multiplication: bA is admissible, bB and bC are subdivided.
Since bA is admissible, and since the cluster bases are nested, we can use s 0 D VA;t 0 EA;t 0 ;t SA;bA FA;s t 0 As 0 D t 0 VA;t SA;bA WA;s 0 ;s WA;s 0 D VA;t 0 SA;b 0 WA;s 0 A
with the auxiliary matrix SA;bA0 ´ EA;t 0 ;t SA;bA FA;s 0 ;s
for all bA0 ´ .t 0 ; s 0 / 2 sonsC .t / sonsC .s/ in order to get t Mr
t Mr C t As Br X X t 0 Mr 0 C D t 0 2sonsC .t/ r 0 2sonsC .r/
X
VA;t 0 SA;.t 0 ;s 0 / WA;s 0 s 0 Br 0 ;
s 0 2sonsC .s/
and this operation can be split up into a sequence of elementary steps of the form t 0 Mr 0
t 0 Mr 0 C VA;t 0 SA;bA0 WA;s 0 s 0 Br 0 ;
for all bA0 D .t 0 ; s 0 / 2 sonsC .t / sonsC .s/ and r 0 2 sonsC .r/, where the original matrix t As is now replaced by t 0 As 0 D VA;t 0 SA;bA0 WA;s 0 . The same recursion as in Case 1a can be applied.
306
7 A priori matrix arithmetic
The auxiliary matrices SA;bA0 can be computed by the same approach as in the matrix backward transformation, i.e., by using the splitting Algorithm 37. Once these matrices have been constructed, we can proceed to evaluate the sum over t 0 , s 0 and r 0 by recursion as if .t 0 ; s 0 / was an admissible leaf of TA;J . The case that bA is subdivided and bB is admissible can be handled by a similar procedure. Case 1c: bA and bB are admissible Let bA and bB be admissible (either admissible leaves of TA;J and TB;J , respectively, or auxiliary blocks introduced in Case 1b). We have t As D VA;t SA;bA WA;s ;
s Br D VB;s SB;bB WB;r
and find that the step (7.15) takes the form t Mr
t Mr C t As Br VB;s SB;bB WB;r D t Mr C VA;t SA;bA WA;s D t Mr C VA;t SA;bA PAB;s SB;bB WB;r ;
where the cluster operator PAB ´ .PAB;s /s2TJ is defined by PAB;s ´ WA;s VB;s
for all s 2 TJ . We can introduce SyC;bC ´ SA;bA PAB;s SB;bB in order to get t Mr
t Mr C VA;t SyC;bC WB;r :
Now we only have to add an admissible block to a subdivided block matrix, and this task we can manage by means of the matrix backward transformation Algorithm 38. The computation of the coupling matrix SyC;bC involves only matrices of dimensions #KA;t , #LA;s , #KB;s and #LB;r , so it can be handled efficiently. The cluster operator PAB can be prepared in advance by using the cluster basis product Algorithm 13.
Case 2: bC is admissible Let bC be admissible (either an admissible leaf of TC;J or an auxiliary block introduced in Case 2a). In this case, we have to perform the elementary step SM;bC
SM;bC C VC;t As BWC;r :
7.7 Projected matrix-matrix multiplication
307
CD
Elementary
CD
Matrix backward X
X
X
X
CD
Figure 7.5. Case 1c of the multiplication: bA and bB are admissible, bC is subdivided.
Case 2a: bA and bB are subdivided Let bA 2 A;J and bB 2 B;JK be subdivided blocks in TA;J and TB;JK , respectively. Definition 3.12 implies sons.bA / D sonsC .t / sonsC .s/; and using VC;t D
X
VC;t 0 EC;t 0 ;t ;
sons.bB / D sonsC .s/ sonsC .r/; X
WC;r D
t 0 2sonsC .t/
WC;r 0 FC;r 0 ;r
r 0 2sonsC .r/
allows us to translate the elementary step (7.15) into X X SM;bC C EC;t SM;bC 0 ;t VC;t 0 As BWC;r 0 FC;r 0 ;r t 0 2sonsC .t/ r 0 2sonsC .r/
D
X
X
t 0 2sonsC .t/ r 0 2sonsC .r/
D
X
X
t 0 2sonsC .t/ r 0 2sonsC .r/
SM;b C EC;t 0 ;t
X
(7.16)
VC;t 0 As 0 BWC;r 0 FC;r 0 ;r
s 0 2sonsC .s/ SM;b C EC;t 0 ;t SM;b 0 FC;r 0 ;r C
308
7 A priori matrix arithmetic
CD
Recursion
CD
Collect
X
CD
Figure 7.6. Case 2a of the multiplication: bA and bB are subdivided, bC is admissible.
for bC0 ´ .t 0 ; r 0 / and the auxiliary matrices SM;bC0 ´
X
VC;t 0 As 0 BWC;r 0 ;
s 0 2sonsC .s/
which can be computed by a sequence of elementary steps SM;bC0
SM;bC0 C VC;t 0 As 0 BWC;r 0
for all bC0 D .t 0 ; r 0 / 2 sonsC .t / sonsC .r/ and s 0 2 sonsC .s/. As in the Cases 1a and 1b, these elementary steps are handled by a recursion. Once all auxiliary matrices SM;bC0 have been computed, we can use the collecting Algorithm 35 to evaluate (7.16) and update SM;b . Case 2b: bA is admissible, bB is subdivided Let bA be admissible (either an admissible leaf of TA;J or an auxiliary block introduced in Case 1b) and let bB 2 B;JK be a subdivided block in TB;JK . Since bA is admissible, we have t As D VA;t SA;bA WA;s ;
7.7 Projected matrix-matrix multiplication
309
CD
Matrix forward
CD
Elementary
X
CD
Figure 7.7. Case 2b of the multiplication: bA and bC are admissible, bB is subdivided.
and the step (7.15) is equivalent to SM;bC
SM;bC C VC;t As BWC;r D SM;bC C VC;t VA;t SA;bA WA;s BWC;r D SM;bC C PCA;t SA;bA WA;s BWC;r
(7.17)
D SM;bC C PCA;t SA;bA SyB;bB for the cluster operator PCA D .PCA;t / t2T given by VA;t PCA;t D VC;t
for all t 2 T and for the matrices SyB ´ .SyB;b /b2TB;JK defined by SyB;b ´ WA;s BWC;r
for all b D .t; s/ 2 TB;JK . The matrices SyB D .SyB;b /b2TB;JK can be prepared in advance by using the matrix forward transformation Algorithm 36, and once they are available, the step (7.17) can be performed efficiently. The case that bA is subdivided and bB is admissible can be handled similarly and requires the matrices SyA D .SyA;b /b2TA;J given by SyA;b ´ VC;t AVB;s
310
7 A priori matrix arithmetic
for all b D .t; s/ 2 TA;J which can also be prepared by the matrix forward transformation. Case 2c: bA and bB are admissible Let bA and bB be admissible blocks (either admissible leaves of TA;J and TB;JK , respectively, or auxiliary blocks introduced in Case 1b). We have t As D VA;t SA;bA WA;s ;
s Br D VB;s SB;bB WB;r ;
and the step (7.15) is given by SM;bC
SM;bC C VC;t As BWC;r D SM;bC C VC;t VA;t SA;bA WA;s VB;s SB;bB WB;r WC;r
D SM;bC C PCA;t SA;bA PAB;s SB;bB PBC;r ; where the cluster operator PAB is defined as in Case 1c, the cluster operator PCA is defined as in Case 2b, and the cluster operator PBC D .PBC;r /r2TK is given by WC;r PBC;r ´ WB;r
for all r 2 TK . This update involves only matrices of dimensions #KC;t , #KA;t , #KB;s , #LA;s , #LB;r and #LC;r , so it can be handled efficiently. The cluster operators PCA , PAB and PBC can be prepared efficiently by the cluster basis product Algorithm 13.
Overview of the multiplication algorithm Let us summarize the algorithm. Its core is the update step (7.15) for blocks bA D .t; s/, bB D .s; r/ and bC D .t; r/. Each of these blocks can be inadmissible or admissible. If all blocks are admissible (cf. Case 2c), we can rely on cluster operators in order to perform the update efficiently. The necessary cluster operators PCA , PAB and PBC can be prepared in advance. If two blocks are admissible (cf. Cases 1c and 2b), we use either the matrix forward transformation to find a suitable projection of the remaining inadmissible block, or the matrix backward transformation to translate an admissible matrix representation into the one required by the inadmissible block. If one block is admissible (cf. Cases 1b and 2a), we split this block into auxiliary subblocks and proceed by recursion. The auxiliary blocks either correspond to a splitting of the admissible block, which ensures compatibility with the subsequent recursion steps, or collect the results of the recursion steps, which can then be accumulated to update the admissible block. If no block is admissible (cf. Case 1a), we proceed by recursion if subdivided blocks are present or compute the result directly if all blocks are leaves.
7.7 Projected matrix-matrix multiplication
311
In order to implement the projected matrix-matrix multiplication efficiently, we have to handle all of these cases in the optimal way. Each of the blocks bA , bB and bC can be admissible (i.e., an admissible leaf of the corresponding block cluster tree or an auxiliary block created in the Cases 1b and 2a), an inadmissible leaf of the corresponding block cluster tree, or a non-leaf element of this tree. This leads to 27 D 33 different situations, which all require special handling, so the resulting algorithm is rather lengthy. In order to make it more readable, we split the algorithm into a number of parts: we have Algorithms 41, 42 and 43 for handling the special cases that bA , bB and bC are admissible, respectively, Algorithm 44 for the case that none of the blocks is admissible, and a central “dispatch” Algorithm 40 which decides which one of the other algorithms is appropriate. Algorithm 40. Projected matrix multiplication, recursion. procedure ProjectedMulRec(t, s, r, PCA , PAB , PBC , A, B, var C ); C if .t; s/ 2 LA;J or .t; s/ 62 TA;J then ProjectedMulAdmA(t , s, r, PCA , PAB , PBC , A, B, C ) fAlgorithm 41g C else if .s; r/ 2 LB;JK or .s; r/ 62 TB;JK then ProjectedMulAdmB(t, s, r, PCA , PAB , PBC , A, B, C ) fAlgorithm 42g else if .t; r/ 2 LC or .t; r/ 2 6 T then C;K C;K ProjectedMulAdmC(t, s, r, PCA , PAB , PBC , A, B, C ) fAlgorithm 43g else ProjectedMulInadm(t , s, r, PCA , PAB , PBC , A, B, C ) fAlgorithm 44g end if Algorithm 41 handles the case that the block bA is admissible, i.e., that it is an admissible leaf of TA;J or an auxiliary block, i.e., not an element of TA;J . In both cases, we have t As D VA;t SA;bA WA;s . If bB is also admissible, we can compute the product by using the cluster basis product PAB and use the matrix backward transformation to translate the result into a format which is suitable for the block t Cr corresponding to bC . If bB is not admissible, but bC is, we can apply the matrix forward transformation to bB in order to construct a projection of the matrix block s Br . If bB and bC are inadmissible leaves, we can use Definition 3.18 to infer that t , s and r have to be leaves of the respective cluster trees, so we can afford to expand t As and compute the product directly. Otherwise, i.e., if bB and bC are not admissible and at least one of them is further subdivided, we split bA into auxiliary matrices and proceed by recursion. Algorithm 42 is responsible for the case that bB is admissible, i.e., that bB is an admissible leaf of TB;JK or an auxiliary block, i.e., not an element of TB;JK . In this situation, we have s Br D VB;s SB;bB WB;r . If bA is admissible, the multiplication is handled by Algorithm 41.
312
7 A priori matrix arithmetic
Algorithm 41. Projected matrix multiplication, bA is admissible. procedure ProjectedMulAdmA(t, s, r, PCA , PAB , PBC , A, B, var C ); bA .t; s/; bB .s; r/; bC .t; r/; C if bB 2 LB;JK or bB 62 TB;JK then SyC;b C SA;b PAB;s SB;b fbB is admissibleg SyC;b C
C
A
B
else if bC 2 LC C;K or bC 62 TC;K then SC;b SC;b C PCA;t SA;b SyB;b C
C
LB;JK
A
B
and bC 2 L C;K then t Cs C VA;t SA;bA WA;s Br fbB
fbC is admissibleg
else if bB 2 t Cs and bC are inadmissible leavesg else for t 0 2 sonsC .t /, s 0 2 sonsC .s/ do bA0 .t 0 ; s 0 /; SA;bA0 0 fSet up auxiliary blocks bA0 g end for; Split(bA , VA , WA , SA ); fAlgorithm 37g for t 0 2 sonsC .t /, s 0 2 sonsC .s/, r 0 2 sonsC .r/ do ProjectedMulRec(t 0 , s 0 , r 0 , PCA , PAB , PBC , A, B, C ) fAlgorithm 40g end for end if Algorithm 42. Projected matrix multiplication, bB is admissible. procedure ProjectedMulAdmB(t, s, r, PCA , PAB , PBC , A, B, var C ); .t; s/; bB .s; r/; bC .t; r/; bA if bC 2 LC or b 2 6 T then C C;K C;K y SC;bC SC;bC C SA;bA SB;bB PBC;r fbC is admissibleg else if bA 2 LA;J and bC 2 L then C;K t Cs C t AVB;s SB;bB WB;r fbA and bC are inadmissible leavesg t Cs else for s 0 2 sonsC .s/, r 0 2 sonsC .r/ do .s 0 ; r 0 /; SB;bB0 0 fSet up auxiliary blocks bB0 g bB0 end for; Split(bB , VB , WB , SB ); fAlgorithm 37g for t 0 2 sonsC .t /, s 0 2 sonsC .s/, r 0 2 sonsC .r/ do ProjectedMulRec(t 0 , s 0 , r 0 , PCA , PAB , PBC , A, B, C ) fAlgorithm 40g end for end if
If bC is admissible, we can use the matrix forward transformation to get a “coarse” representation of the block of t As corresponding to bA and compute the product directly.
7.7 Projected matrix-matrix multiplication
313
If bA and bC are inadmissible leaves, Definition 3.18 yields that t , s and r are leaves of the respective cluster trees, so we expand s Br and compute the product explicitly. Otherwise, i.e., if bA and bC are not admissible and at least one of them is subdivided, we split bB into auxiliary blocks, initialize them by using Algorithm 37 and proceed by using Algorithm 40 recursively. Algorithm 43 takes care of the case that bC is admissible, i.e., that bC is an admissible leaf of TC;K or an auxiliary block, i.e., not an element of TC;K . In this case, As BWC;r to the coupling matrix SC;bC . we are interested in adding VC;t Algorithm 43. Projected matrix multiplication, bC is admissible. procedure ProjectedMulAdmC(t, s, r, PCA , PAB , PBC , A, B, var C ); .t; s/; bB .s; r/; bC .t; r/; bA if bA 2 LA;J and bB 2 LB;JK then SC;bC C VC;t As BWC;r fbA and bB are inadmissible leavesg SC;bC else for t 0 2 sonsC .t /, r 0 2 sonsC .r/ do bC0 .t 0 ; r 0 /; SC;bC0 0 fSet up auxiliary matrices blocks bC0 g end for; for t 0 2 sonsC .t /, s 0 2 sonsC .s/, r 0 2 sonsC .r/ do ProjectedMulRec(t 0 , s 0 , r 0 , PCA , PAB , PBC , A, B, C ) fAlgorithm 40g end for; Collect(bC , VC , WC , SC ) fAlgorithm 35g end if If bA or bB are admissible, Algorithm 41 or 42 are responsible. If bA and bB are inadmissible leaves, Definition 3.18 implies that t , s and r are leaf clusters, and we compute the product directly. Otherwise, i.e., if bA and bB are not admissible and at least one of them is subdivided, we split bC into auxiliary blocks, which are initialized by zero. We use Algorithm 40 to compute all subblocks and then apply Algorithm 35 to combine them and perform the update of SC;bC . Finally, we consider the case that neither bA , bB or bC are admissible. This means that each of these blocks can either be an inadmissible leaf or further subdivided. If all blocks are inadmissible leaves, we compute the update of t Cr directly. Otherwise, at least one of the blocks has to be subdivided, and we apply Algorithm 40 recursively to handle all of the subblocks. The final Algorithm 45 performs the computation C …TC;K ;VC ;WC .C C AB/. It uses the cluster basis product Algorithm 13 to prepare the cluster operators PCA D .PCA;t / t2T , PAB D .PAB;s /s2TJ and PBC D .PBC;r /r2TK and the matrix forward transformation Algorithm 36 to prepare the families SyA D .SyA;b /b2TA;J and SyB D .SyB;b /b2TB;JK . Then it uses the recursive multiplication Algorithm 40 to perform the central part of the operation. Afterwards, the intermediate results stored
314
7 A priori matrix arithmetic
Algorithm 44. Projected matrix multiplication, bA , bB and bC are not admissible. procedure ProjectedMulAdmC(t, s, r, PCA , PAB , PBC , A, B, var C ); .t; s/; bB .s; r/; bC .t; r/; bA if bA 2 LA;J and bB 2 LB;JK and bC 2 L C;K then t Cr C t As Br fbA , bB and bC are inadmissible leavesg t Cr else for t 0 2 sonsC .t /, s 0 2 sonsC .s/, r 0 2 sonsC .r/ do fAlgorithm 40g ProjectedMulRec(t 0 , s 0 , r 0 , PCA , PAB , PBC , A, B, C ) end for end if Algorithm 45. Projected matrix-matrix multiplication, computes the best approximation of C C C AB in the space H 2 .TC;J ; VC ; WC /. procedure ProjectedMul(A, B, var C ); r root.T /; rJ root.TJ /; rK root.TK /; fAlgorithm 13g ClusterBasisProduct(r , VC , VA , PCA ); ClusterBasisProduct(rJ , WA , VB , PAB ); fAlgorithm 13g ClusterBasisProduct(rK , WB , WC , PBC ); fAlgorithm 13g MatrixForward(root.TA;J /, VC , VB , PCA , PAB , A, SyA ); fAlgorithm 36g fAlgorithm 36g MatrixForward(root.TB;JK /, WA , WC , PAB , PBC , B, SyB ); for b 2 TC;K do SyC;b 0 fInitialize SyC g end for ProjectedMulRec(r , rJ , rK , PCA , PAB , PBC , A, B, C ); fAlgorithm 40g y MatrixBackward(root.TC;K /, VA , WB , PCA , PBC , SC , C ) fAlgorithm 38g in the family SyC D .SyC;b /b2TC;K have to be added to C in order to complete the computation. This is done by the matrix backward transformation Algorithm 38.
Complexity analysis We base the analysis of the complexity of the projected matrix-matrix multiplication on the triples .t; s; r/ of clusters passed as parameters to the recursive Algorithm 40. Since it uses Algorithms 41, 42, 43 and 44 to accomplish the computation, and since each of these algorithms only uses sums and products of small matrices, we can bound the total complexity once we have bounded the number of cluster triples .t; s; r/ passed to Algorithm 40. Let us consider the situations in which this algorithm is called. The first call occurs in Algorithm 45 with the triple .t; s; r/ D .root.T /; root.TJ /; root.TK // corresponding to the roots of the cluster trees T , TJ and TK . Taking a close look at the Algorithms 41,
7.7 Projected matrix-matrix multiplication
315
42, 43 and 44 reveals that a recursive call to the dispatch Algorithm 40 takes place for the triples .t 0 ; s 0 ; r 0 / with t 0 2 sonsC .t /, s 0 2 sonsC .s/ and r 0 2 sonsC .r/ if, and only if, at least two of the blocks .t; s/, .s; r/ and .t; r/ are inadmissible and at least one of them is subdivided. We define the subsets A ´ A;J D TA;J n LA;J ; B ´ B;JK D TB;JK n LB;JK ; C ´ C;K D TC;K n LC;K ;
(7.18a) (7.18b) (7.18c)
C ; A ´ A;J D TA;J n LA;J
(7.18d)
B ´ B;JK D C ´ C;K D
C TB;JK n LB;JK ; TC;K n LC C;K ;
(7.18e) (7.18f)
of subdivided and inadmissible blocks. The case that .t; s/ is not a leaf and .s; r/ is inadmissible, e.g., can now be characterized by .t; s/ 2 A and .s; r/ 2 B . We organize the triples .t; s; r/ for which Algorithm 40 is called in a tree structure, the call tree TJK . It is uniquely defined as the minimal tree with root root.TJK / ´ .root.T /; root.TJ /; root.TK // and a father-son relation given by 8 sonsC .t / sonsC .s/ sonsC .r/ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ < sons.t; s; r/ ´ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ : ;
if .t; s/ 2 A ; .s; r/ 2 B ; or .t; s/ 2 A ; .s; r/ 2 B ; or .s; r/ 2 B ; .t; r/ 2 C ; or .s; r/ 2 B ; .t; r/ 2 C ; or .t; r/ 2 C ; .t; s/ 2 A ; or .t; r/ 2 C ; .t; s/ 2 A ; otherwise; (7.19) for all triples .t; s; r/ 2 TJK it contains. Since the concept of sparsity has served us well in the analysis of block cluster trees, we aim to establish a similar property for the call tree TJK . Lemma 7.18 (Sparsity of the call tree). Let TA;J , TB;JK and TC;K be Csp -sparse admissible block cluster trees, and let C .t/ ´ f.s; r/ 2 TJ TK W .t; s; r/ 2 TJK g CJ .s/ ´ f.t; r/ 2 T TK W .t; s; r/ 2 TJK g CK .r/ ´ f.t; s/ 2 T TJ W .t; s; r/ 2 TJK g
for all t 2 T ; for all s 2 TJ ; for all r 2 TK :
Then we have #C .t/; #CJ .s/; CK .r/ 3Csp2 ;
for all t 2 T ; s 2 TJ ; r 2 TK :
316
7 A priori matrix arithmetic
Proof. Let t 2 T . We now prove C .t/ C0 .t / ´ f.s; r/ W .t; s/ 2 TA;J ; .s; r/ 2 TB;JK g [ f.s; r/ W .s; r/ 2 TB;JK ; .t; r/ 2 TC;K g [ f.s; r/ W .t; r/ 2 TC;K ; .t; s/ 2 TA;J g: Let .s; r/ 2 C .t /. If .t; s; r/ D root.TJK /, we have t D root.T /, s D root.TJ / and r 2 root.TK /, which implies .t; s/ D root.TA;J / 2 TA;J and .s; r/ D root.TB;JK / 2 TB;JK . Otherwise, i.e., if .t; s; r/ ¤ root.TJK /, the definition of the call tree implies that we can find a father triple .t C ; s C ; r C / 2 TJK with .t; s; r/ 2 sons.t C ; s C ; r C /. Due to A ; A TA;J ;
B ; B TB;JK ;
C ; C TC;K ;
the definition (7.19) yields either .t; s/ 2 TA;J and .s; r/ 2 TB;JK or .s; r/ 2 TB;JK and .t; r/ 2 TC;K or .t; r/ 2 TC;K and .t; s/ 2 TA;J , i.e., .s; r/ is contained in C0 .t /. Since TA;J , TB;JK and TC;K are Csp -sparse, we find #f.s; r/ W .t; s/ 2 TA;J ; .s; r/ 2 TB;JK g D #f.s; r/ W s 2 row.TA;J ; t /; r 2 row.TB;JK ; s/g X # row.TB;JK ; s/ Csp2 ; s2row.TA;J ;t/
#f.s; r/ W .s; r/ 2 TB;JK ; .t; r/ 2 TC;K g D #f.s; r/ W r 2 row.TC;K ; t /; s 2 col.TB;JK ; r/g X # col.TB;JK ; r/ Csp2 ; r2row.TC;K ;t/
#f.s; r/ W .t; r/ 2 TC;K ; .t; s/ 2 TA;J g D #f.s; r/ W r 2 row.TC;K ; t /; s 2 row.TA;J ; t /g X # row.TA;J ; t / Csp2 ; r2row.TC;K ;t/
and can conclude #C .t / #C0 .t / 3Csp2 : The same arguments can be applied to prove bounds for CJ and CK . In order to prove a bound for the complexity, we only have to bound the number of operations required for one call to the dispatch Algorithm 40, i.e., for one triple .t; s; r/ 2 TJK .
7.7 Projected matrix-matrix multiplication
317
Theorem 7.19 (Complexity). Let KA , KB , KC , LA , LB and LC be the rank distributions of VA , VB , VC , WA , WB and WC , respectively, and let .kA;t / t2T , .kB;s /s2TJ , .kC;t / t2T , .lA;s /s2TJ , .lB;r /r2TK and .lC;r /r2TK be defined as in (3.16) and (3.18). Let kO t ´ maxfkC;t ; kA;t g lOs ´ maxflA;s ; kB;s g m O r ´ maxflB;r ; lC;r g
for all t 2 T ;
(7.20)
for all s 2 TJ ; for all r 2 TK :
(7.21) (7.22)
Algorithm 45 requires not more than X X X 4 kO t3 C lOs3 C m O 3r t2T
C2
s2TJ
X
r2TK
.kO t3 C lOs3 / C
.t;s/2TA;J
C3
X
X
X
.lOs3 C m O 3r / C
.s;r/2TB;JK
.kO t3 C m O 3r /
.t;r/2TC;K
.kO t3 C lOs3 C m O 3r /:
.t;s;r/2TJK
operations. If TA;J , TB;JK and TC;K are Csp -sparse, this can be bounded by X X X .4 C 4Csp C 9Csp2 / m O 3r : kO t3 C lOs3 C t2T
s2TJ
r2TK
If KA , KB , KC , LA , LB and LC are .Cbn ; ˛; ˇ; r; /-bounded and if T , TJ and TK are .Crc ; ˛; ˇ; r; /-regular, the number of operations is in O..˛ C ˇ/2r .n C nJ C nK //. Proof. The preparation of the cluster operators PCA , PAB and PBC by Algorithm 13 requires not more than X X X 4 m O 3r kO t3 C lOs3 C t2T
s2TJ
r2TK
operations due to Lemma 5.13. The computation of SyA and SyB by Algorithm 36 and the handling of SyC by Algorithm 38 requires not more than X X X 2 .kO t3 C lOs3 / C .lOs3 C m O 3r / C .kO t3 C m O 3r / .t;s/2TA;J
.s;r/2TB;JK
.t;r/2TC;K
operations due to the Lemmas 7.3 and 7.5. This leaves onlyAlgorithm 40 to be analyzed. In order to bound the total complexity, we have to find bounds for the number of operations required for each triple .t; s; r/ 2 TJK of the call tree.
318
7 A priori matrix arithmetic
Let .t; s; r/ 2 TJK . Algorithm 40 uses one of the Algorithms 41, 42, 43 and 44 to handle this triple. Let us consider Algorithm 41. If bB is admissible, it requires Or 2.#KA;t /.#LA;s /.#KB;s / C 2.#KA;t /.#KB;s /.#LB;r / 2kO t lOs2 C 2kO t lOs m operations to update SyC;bC . Otherwise, if bC is admissible, it uses Or 2.#KC;t /.#KA;t /.#LA;s / C 2.#KC;t /.#LA;s /.#LC;r / 2kO t2 lOs C 2kO t lOs m operations to update SC;bC directly. Otherwise, if bB and bC are inadmissible leaves, the algorithm needs O 2kO t2 lOs C2kO t lOs2 C2kO t lOs m Or 2.#tO/.#KA;t /.#LA;s /C2.#tO/.#LA;s /.#sO /C2.#tO/.#sO /.#r/ operations to update t Cs . Otherwise, i.e., if bB and bC are not admissible and one of these blocks is subdivided, auxiliary blocks are created by using Algorithm 37, which requires not more than 2kO t2 lOs C 2kO t lOs2 operations due to Lemma 7.4, before proceeding to the sons in the call tree by recursion. We can conclude that the number of operations performed for one triple .t; s; r/ 2 TJK by Algorithm 41 is bounded by 2kO t2 lOs C 2kO t lOs2 C 2kO t lOs m O r: Using (5.9) and the elementary inequality 1 ..2xy/z C .2xz/y C .2yz/x/ 6 1 .x 2 z C y 2 z C x 2 y C z 2 y C y 2 x C z 2 x/ 6 1 1 .x 3 C z 3 C y 3 C z 3 C x 3 C y 3 / D .x 3 C y 3 C z 3 / 6 3
xyz D
(7.23)
for all x; y; z 2 R0 , we get the bound Or 2kO t2 lOs C 2kO t lOs2 C 2kO t lOs m
8 O 3 8 O3 2 3 O 3r /: k C ls C m O < 3.kO t3 C lOs3 C m 3 t 3 3 r
By similar arguments, we can prove that the remaining Algorithms 42, 43 and 44 also O 3r / operations per triple. require not more than 3.kO t3 C lOs3 C m Let now TA;J , TB;JK and TC;K be Csp -sparse. Due to Lemma 7.18, we can bound the total number of operations for Algorithm 40 by X 3.kO t3 C lOs3 C m O 3r / .t;s;r/2TJK
3
X
t2T
kO t3 #C .t / C 3
X s2TJ
lOs3 #CJ .s/ C 3
X r2TK
m O 3r #CK .r/
7.8 Exact matrix-matrix multiplication
9Csp2
X
kO t3 C
t2T
X
lOs3 C
s2TJ
X
319
m O 3r ;
r2TK
while Lemmas 7.3 and 7.5 yield the bound X X X X X X 2Csp kO t3 C lOs3 C lOs3 C kO t3 C m O 3r C m O 3r t2T
4Csp
s2TJ
X
t2T
kO t3
s2TJ
C
X
lOs3
r2TK
C
s2TJ
X
m O 3r
t2T
r2TK
:
r2TK
We can complete the proof using the Lemmas 5.12, 3.45 and 3.48. This is a quite surprising result: although the matrix-matrix multiplication involves non-trivial interactions between all levels of the block cluster trees, it can be carried out in linear complexity. Remark 7.20 (Auxiliary storage). Algorithm 45 requires auxiliary storage for the matrix families SyA , SyB and SyC . As in Lemma 3.38, we can bound the storage requirements by X X X .kO t2 C lOs2 / C .lOs2 C m O 2r / C .kO t2 C m O 2r /: .t;s/2TA;J
.s;r/2TB;JK
.t;r/2TC;K
If TA;J , TB;JK and TC;K are Csp -sparse, this can be bounded by X X X 2Csp m O 2r ; kO t2 C lOs2 C t2T
s2TJ
r2TK
and if the rank distributions are .Cbn ; ˛; ˇ; r; /-bounded and if T , TJ and TK are .Crc ; ˛; ˇ; r; /-regular, the storage requirements are in O..˛ C ˇ/r .n C nJ C nK //. By arranging the recursion in a manner similar to that suggested in Remark 7.10, we can significantly reduce the storage requirements for SyA and SyB : in a first phase we perform all operations for admissible blocks bA . By using an outer loop over sons.bA / and an inner loop over sonsC .r/, we can simultaneously compute the matrix forward transformation for A. In a second phase we perform the remaining operations. By using an outer loop over sons.bB / and an inner loop over sonsC .t /, we can simultaneously compute the matrix forward transformation for B.
7.8 Exact matrix-matrix multiplication The projection …TC;K ;VC ;WC M D C C …TC;K ;VC ;WC .AB/ of the matrix M D C C AB can be computed efficiently by the recursive Algorithm 45. The quality of
320
7 A priori matrix arithmetic
the approximation depends on how well the block cluster tree TC;K and the cluster bases VC and WC match the inherent structure of the product AB. We now investigate a different approach: we construct an admissible block cluster tree TM;K and nested cluster bases VM and WM such that M D C C AB 2 H 2 .TM;K ; VM ; WM / holds, i.e., that the product can be represented exactly as an H 2 -matrix. In order to reach a meaningful result, we require that the block cluster trees TA;J , TB;JK and TC;K are admissible and Csp -sparse. Let t 2 T , s 2 TJ and r 2 TK , and let bA ´ .t; s/, bB ´ .s; r/ and bC ´ .t; r/. We are looking for a block cluster tree TM;K and cluster bases VM and WM which ensure that all intermediate products of the form t Mr D t As Br can be expressed exactly in the space H 2 .TM;K ; VM ; WM /, since this implies that also the sum of all intermediate products, i.e., the product AB, can be expressed in this space. We assume that we have chosen a block cluster tree TM;K for M and that bC C is one of its admissible leaves, i.e., that bC 2 LM;K holds (inadmissible leaves of TM;K are not interesting, since they are not limited by VM and WM ). In order to gain some insight into the proper choices for TM;K , VM and WM , we investigate three special cases. Case 1: bB is admissible If bB is admissible, we find s Br D VB;s SB;bB WB;r
and have to ensure that VM and WM can represent the block : t As Br D t As VB;s SB;bB WB;r
If bA is admissible, this is not a problem, since we find VB;s SB;bB WB;r D VA;t SA;bA PAB;s SB;bB WB;r t As Br D VA;t SA;bA WA;s
and observe that the cluster bases VA and WB are sufficient. If bA is inadmissible, we have to be able to express terms of the form t As VB;s D t AVB;s in the basis VM , i.e., we require that there is a matrix RV;t;s with t AVB;s D VM;t RV;t;s ;
(7.24)
since this implies D VM;t RV;t;s SB;bB WB;r ; t As Br D t As VB;s SB;bB WB;r
i.e., the product can be expressed by VM and WB . Since bA is not admissible, we can infer bA 2 A;J TA;J , i.e., s 2 row.TA;J ; t /. If TA;J is Csp -sparse, we therefore have to ensure (7.24) for not more than Csp clusters s.
321
7.8 Exact matrix-matrix multiplication
In order to determine the induced row cluster basis VM D .VM;t / t2T , we introduce the set C row .TA;J ; t / ´ fs 2 TJ W .t; s/ 2 TA;J n LA;J g
D row.TA;J ; t / n rowC .TA;J ; t / (compare Definition 3.29 and 3.40) and can define a row cluster basis satisfying our requirements: Definition 7.21 (Induced row basis). Let all index sets in the rank distributions KA D .KA;t / t2T , KB D .KB;s /s2TJ and KC D .KC;t / t2T be disjoint. We define the induced row cluster basis for the multiplication by VM ´ .VM;t / t2T by
VM;t ´ VC;t VA;t t AVB;s1 : : : t AVB;s (7.25) for all t 2 T , where ´ # row .TA;J ; t / is the number of inadmissible blocks connected to t and fs1 ; : : : ; s g D row .TA;J ; t / are the corresponding column clusters. The rank distribution for VM is given by KM ´ .KM;t / t2T with P KA;t [ P KB;s1 [ P [ P KB;s : KM;t ´ KC;t [ The definition of H 2 -matrices requires the cluster bases to be nested. Fortunately the induced row cluster basis inherits this property from the nested structure of the cluster bases VA and VC and from the structure of the block cluster tree TA;J : Lemma 7.22 (Nested cluster basis). The induced row cluster basis VM is nested. Proof. Let t 2 T with sons.t / ¤ ;. Proving X VM;t 0 EM;t 0 VM;t D t 0 2sons.t/
for suitably chosen transfer matrices .EM;t 0 / t 0 2T is equivalent to proving t 0 VM;t D VM;t 0 EM;t 0
(7.26)
for all t 0 2 sons.t /. We fix t 0 2 sons.t / and have to find a matrix EM;t 0 satisfying equation (7.26). Let ´ # row .TA;J ; t / and 0 ´ # row .TA;J ; t 0 /. For fs1 ; : : : ; s g D row .TA;J ; t / and fs10 ; : : : ; s0 0 g D row .TA;J ; t 0 /, the matrices VM;t 0 and t 0 VM;t take the form VM;t 0 D VC;t 0 VA;t 0 t 0 AVB;s10 : : : t 0 AVB;s0 0 ;
t 0 VM;t D t 0 VC;t t 0 VA;t t 0 AVB;s1 : : : t 0 AVB;s :
322
7 A priori matrix arithmetic
Since VC and VA are nested, we have 1 1 0 0 0 EC;t 0 BEA;t 0 C B 0 C C C B B C C B B t 0 VC;t D VC;t 0 EC;t 0 D VM;t 0 B 0 C ; t 0 VA;t D VA;t 0 EA;t 0 D VM;t 0 B 0 C ; B :: C B :: C @ : A @ : A 0 0 (7.27) this defines the first two columns of the transfer matrix EM;t 0 . Let us now consider s 2 row .TA;J ; t /. According to our definition, bA ´ C .t; s/ 2 TA;J n LA;J holds. Since TA;J is an admissible block cluster tree and since t is not a leaf cluster, Definition 3.18 implies that bA cannot be an inadmissible leaf, therefore it has to be a subdivided block, i.e., bA 2 A;J holds. We have X X t 0 t AVB;s D t 0 As VB;s D t 0 As 0 VB;s D t 0 AVB;s 0 EB;s 0 ;s s 0 2sonsC .s/
s 0 2sonsC .s/
(7.28) for the long-range transfer matrices EB;s 0 ;s from (7.3) and Definition 5.27. Let s 0 2 sonsC .s/. Due to .t; s/ 2 A;J , Lemma 7.17 yields bA0 ´ .t 0 ; s 0 / 2 TA;J . If bA0 is inadmissible, we have s 0 2 row .TA;J ; t 0 /, therefore we can find an index i 2 f1; : : : ; 0 g with s 0 D si0 and conclude 1 0 0 B :: C B : C C B B 0 C C B C (7.29) t 0 AVB;s 0 EB;s 0 ;s D . t 0 AVB;si0 /EB;si0 ;s D VM;t 0 B BEB;si ;s C I B 0 C C B B :: C @ : A 0 the entry of the right matrix is in the row corresponding to si0 D s 0 2 row .TA;J ; t 0 /. If bA0 is admissible, we have s 0 2 rowC .TA;J ; t 0 / and can use the factorized representation of t 0 As 0 in order to get 1 0 0 BSA;b 0 PAB;s 0 EB;s 0 ;s C C B A C B 0 t 0 AVB;s 0 EB;s 0 ;s D VA;t 0 SA;bA0 WA;s 0 VB;s 0 EB;s 0 ;s D VM;t 0 B C C B :: A @ : 0 (7.30)
7.8 Exact matrix-matrix multiplication
323
Accumulating the matrices (7.29) and (7.30) according to (7.28) yields the column of EM;t 0 corresponding to s. Combining the columns yields EM;t 0 and thus proves that VM is nested. Our complexity estimates rely on bounds for the rank distributions of the cluster bases. Using standard assumptions we can establish these bounds also for the induced cluster basis VM : Lemma 7.23 (Bounded cluster basis). Let KA , KB and KC be .Cbn ; ˛; ˇ; r; /-bounded, let TA;J be Csp -sparse, and let Ccr 2 R be a constant satisfying cJ Ccr c (as usual with c D #T and cJ D #TJ ). Then the rank distribution KM corresponding to the O r; /-bounded with O ˇ; induced row cluster basis is .Cybn ; ˛; p p Cybn ´ Cbn .Ccr Csp C 2/; ˛O ´ ˛ r Csp C 2; ˇO ´ ˇ r Csp C 2: Proof. We introduce the sets DA;` ´ ft 2 T W #KA;t > .˛ C ˇ.` 1//r g; DB;` ´ ft 2 T W #KB;s > .˛ C ˇ.` 1//r for s 2 row .TA;J ; t /g; DC;` ´ ft 2 T W #KC;t > .˛ C ˇ.` 1//r g; O 1//r g DM;` ´ ft 2 T W #KM;t > .˛O C ˇ.` O for all ` 2 N. Let ` 2 N, and let t 2 DM;` . According to the definition of ˛O and ˇ, this implies X .Csp C 2/.˛ C ˇ`/r < #KM;t D #KC;t C #KA;t C #KB;s : (7.31) s2row .TA;J ;t/
Due to row .TA;J ; t / row.TA;J ; t /, we have # row .TA;J ; t / Csp and find that (7.31) can only hold if at least one of the following holds: Case 1: #KC;t > .˛ C ˇ.` 1//r . This implies t 2 DC;` . Case 2: #KA;t > .˛ C ˇ.` 1//r . This implies t 2 DA;` . Case 3: There exists a cluster s 2 row .TA;J ; t / with #KB;s > .˛ C ˇ.` 1//r . This implies t 2 DB;` . We conclude that DM;` DC;` [ DA;` [ DB;` (7.32) holds. Since KA and KC are .Cbn ; ˛; ˇ; r; /-bounded, we have #DC;` Cbn ` c ;
#DA;` Cbn ` c :
For the set DB;` , we have DB;` D ft 2 T W #KB;s > .˛ C ˇ.` 1//r for an s 2 row .TA;J ; t /g ft 2 T W #KB;s > .˛ C ˇ.` 1//r for an s 2 row.TA;J ; t /g
324
7 A priori matrix arithmetic
D ft 2 T W #KB;s > .˛ C ˇ.` 1//r for an s 2 TJ with t 2 col.TA;J ; s/g [ D col.TA;J ; s/ s2TJ #KB;s >.˛Cˇ.`1//r
and can use the sparsity of TA;J to find #DB;` Csp #fs 2 TJ W #KB;s > .˛ C ˇ.` 1//r g Csp Cbn ` cJ Csp Cbn Ccr ` c : Combining the estimates for the cardinalities of DC;` , DA;` and DB;` with the inclusion (7.32) yields the required bound for #DM;` . Case 2: bA is admissible If bA is admissible, we have t As D VA;t SA;bA WA;s
and need to ensure that the cluster bases VM and WM are chosen in such a way that t As Br D VA;t SA;bA WA;s s Br
can be expressed. Case 1 already covers the situation that bB is admissible. If bB is not admissible, we have to ensure that the cluster basis WM is able to express r B s WA;s , i.e., that there is a matrix RW;r;s with r B s WA;s D WM;r RW;r;s ;
(7.33)
since then the product takes the form t As Br D VA;t SA;bA WA;s s Br D VA;t SA;bA RW;r;s WM;r ;
i.e., it can be expressed in VA and WM . Since bB is not admissible, we have bB 2 TB;JK , i.e., s 2 col.TB;JK ; r/, and since TB;JK is Csp -sparse, we only have to ensure (7.33) for up to Csp clusters s. We introduce the set C col .TB;JK ; r/ ´ fs 2 TJ W .s; r/ 2 TB;JK n LB;JK g
D col.TB;JK ; r/ n colC .TB;JK ; r/ and can now define a cluster basis satisfying our requirements.
7.8 Exact matrix-matrix multiplication
325
Definition 7.24 (Induced column basis). Let all index sets in the rank distributions LA D .LA;s /s2TJ , LB D .LB;r /r2TK and LC D .LC;r /r2TK be disjoint. We define the induced column cluster basis for the multiplication by WM ´ .WM;r /r2TK with WM;r ´ WC;r
WB;r
r B WA;s1
:::
r B WA;s
(7.34)
for all r 2 TK , where ´ # col .TB;JK ; r/ is the number of inadmissible blocks connected to r and fs1 ; : : : ; s g D col .TB;JK ; r/ are the corresponding row clusters. The rank distribution for WM is given by LM ´ .LM;r /r2TK with P LB;r [ P LA;s1 [ P [ P LA;s : LM;r ´ LC;r [ Lemma 7.25. The induced column cluster basis WM is nested. If the rank distributions LA , LB and LC are .Cbn ; ˛; ˇ; r; /-bounded and if cJ Ccr cK holds, the rank distribution LM of the induced column cluster basis O r; /-bounded with is .Cybn ; ˛; O ˇ; Cybn ´ Cbn .Ccr Csp C 2/;
˛O ´ ˛
p r
Csp C 2;
ˇO ´ ˇ
p r
Csp C 2:
Proof. Similar to the proofs of the Lemmas 7.22 and 7.23. Case 3: bA and bB are inadmissible We cannot handle the case that bA and bB are both inadmissible while bC is admissible, since it leaves us with no information whatsoever about the structure of the product t As Br . Therefore we have to ensure that we never encounter the situation that bC is an admissible leaf, but bA and bB are inadmissible. This can be done by choosing the block cluster tree TM;K appropriately: we construct TM;K based on TC;K and subdivide its leaves .t; r/ as long as there is a cluster s 2 TJ such that .t; s/ and .s; r/ are both inadmissible and at least one of them can be subdivided further. A block in TM;K is considered admissible if it can be expressed by VM and WM , i.e., if for each s 2 TJ with .t; s/ 2 TA;J and .s; r/ 2 TB;JK at least one of the blocks is admissible. Definition 7.26 (Induced block cluster tree). Let TM;K be the minimal block cluster tree for T and TK satisfying the following conditions: • A block b D .t; r/ 2 TM;K is subdivided if it is subdivided in TC;K or if there exists a cluster s 2 TJ with .t; s/ 2 TA;J and .s; r/ 2 TB;JK such that both are inadmissible and at least one of them is subdivided, i.e., the sons of b
326
7 A priori matrix arithmetic
in TM;K are given by 8 ˆ sonsC .t / sonsC .r/ ˆ ˆ ˆ ˆ < sons.TM;K ; b/ D ˆ ˆ ˆ ˆ ˆ :;
if b 2 C or there exists s 2 TJ with .t; s/ 2 A ; .s; r/ 2 B or .t; s/ 2 A ; .s; r/ 2 B ; otherwise: (7.35)
• A block b D .t; r/ 2 TM;K is admissible if and only if for all s 2 TJ with .t; s/ 2 TA;J and .s; r/ 2 TB;JK at least one of these blocks is admissible, i.e., C b D .t; r/ 2 LM;K () .8s 2 TJ W .t; s/ 2 TA;J ^ .s; r/ 2 TB;JK C C ) .t; s/ 2 LA;J _ .s; r/ 2 LB;JK /: (7.36)
Then TM;K is called the induced block cluster tree for the multiplication. Lemma 7.27 (Admissibility). The induced block cluster tree TM;K is admissible. Proof. In order to prove that TM;K is admissible, we have to ensure that each inadmissible leaf b D .t; r/ 2 TM;K corresponds to leaf clusters t and r. Let b D .t; r/ 2 LM;K be an inadmissible leaf. According to (7.36), this means that there exists a cluster s 2 TJ satisfying .t; s/ 2 TA;J and .s; r/ 2 TB;JK with C C .t; s/ 62 LA;J and .s; r/ 62 LB;JK , i.e., .t; s/ 2 A and .s; r/ 2 B . If sons.t/ ¤ ; would hold, we would have .t; s/ 62 LA;J , and .t; s/ 2 A D C TA;J n LA;J would imply .t; s/ 2 A D TA;J n LA;J . By (7.35), we would get sons.TM;K ; b/ ¤ ;, which is impossible, since b is a leaf of TM;K . Therefore we can conclude sons.t / D ;. We can apply the same reasoning to r in order to prove sons.r/ D ;, which implies that TM;K is indeed an admissible block cluster tree. Lemma 7.28 (Sparsity). If TA;J , TB;JK and TC;K are Csp -sparse, the tree TM;K is Csp .Csp C 1/-sparse. Proof. Let t 2 T . We prove row.TM;K ; t / row.TC;K ; t / [ fr 2 T W there exists s 2 TJ with .t; s/ 2 TA;J ; .s; r/ 2 TB;JK g:
(7.37)
Let r 2 row.TM;K ; t /. If b ´ .t; r/ D root.TM;K /, we have t D root.T / and r D root.TK /, i.e., .t; r/ D root.TC;K / 2 TC;K .
7.8 Exact matrix-matrix multiplication
327
Otherwise, b has to have a father, i.e., a block b C D .t C ; r C / 2 TM;K with b 2 sons.TM;K ; b C /. If b C 2 C , we have b 2 sons.TM;K ; b C / D sons.TC;K ; b C / and therefore r 2 row.TC;K ; t /. Otherwise, we can find a cluster s C 2 TJ with bAC ´ .t C ; s C / 2 A TA;J and bBC ´ .s C ; r C / 2 B TB;JK or with bAC D .t C ; s C / 2 A TA;J and bBC D .s C ; r C / 2 B TB;JK . We let s 2 sonsC .s C / and bA ´ .t; s/ and bB ´ .s; r/. Due to t 2 sonsC .t C / and r 2 sonsC .r C /, we can apply Lemma 7.17 in order to find bA 2 TA;J and bB 2 TB;JK , which proves (7.37). This inclusion yields X # row.TM;K ; t / # row.TC;K ; t / C # row.TB;JK ; s/ s2row.TA;J ;t/
Csp C
Csp2
D Csp .Csp C 1/:
A similar argument can be applied to demonstrate # col.TM;K ; r/ Csp .Csp C 1/ and conclude this proof. Theorem 7.29 (Exact matrix multiplication). Let VM D .VM;t / t2T be the induced row cluster basis from Definition 7.21 and let WM D .WM;r /r2TK be the induced column cluster basis from Definition 7.24. Let TM;K be the induced block cluster tree from Definition 7.26. Then we have M D C C AB 2 H 2 .TM;K ; VM ; WM /. Proof. Due to the construction of TM;K , VM and WM , we can directly see that C 2 H 2 .TM;K ; VM ; WM / holds. Since H 2 .TM;K ; VM ; WM / is a matrix space (cf. Remark 3.37), it is sufficient to prove AB 2 H 2 .TM;K ; VM ; WM /. We prove the more general statement t As Br 2 H 2 .TM;K ; VM ; WM /
(7.38)
for all t 2 T , s 2 TJ and r 2 TK with .t; s/ 2 TA;J , .s; r/ 2 TB;JK and .t; r/ 2 TM;K . For t D root.T /, s D root.TJ / and r D root.TK /, this implies the desired property. C holds, i.e., that Case 1: Let us first consider the case that bA ´ .t; s/ 2 LA;J bA is an admissible leaf. We have seen in Case 1 that t As Br D VA;t .SA;bA RW;r;s /WM;r 2 H 2 .TM;K ; VM ; WM /;
holds, i.e., the product can be expressed by the induced row and column cluster bases. C Case 2: Let us now consider the case that bB ´ .s; r/ 2 LB;JK holds, i.e., that bB is an admissible leaf. According to Case 2, we have t As Br D VM;t .RV;t;s SB;bB /WB;r 2 H 2 .TM;K ; VM ; WM /;
328
7 A priori matrix arithmetic
i.e., the product can once more be expressed in the induced cluster bases. Induction: We prove (7.38) by an induction on the number of descendants of bA and bB , i.e., on # sons .bA / C # sons .bB / 2 N2 . Let us first consider bA D .t; s/ 2 TA;J and bB D .s; r/ 2 TB;JK with # sons .bA / C # sons .bB / D 2. This implies sons .bA / D fbA g and sons .bB / D fbB g, i.e., bA and bB are leaves. If at least one of them is admissible, we have already established that (7.38) holds. Otherwise, i.e., if bA and bB are inadmissible leaves, (7.36) implies that bC ´ .t; r/ is also an inadmissible leaf of TM;K , i.e., (7.38) holds trivially. Let now n 2 N2 , and assume that (7.38) holds for all t 2 T , s 2 TJ and r 2 TK with bA D .t; s/ 2 TA;J , bB D .s; r/ 2 TB;JK and bC D .t; r/ 2 TM;K satisfying # sons .bA / C # sons .bB / n. Let t 2 T , s 2 TJ and r 2 TK with bA D .t; s/ 2 TA;J , bB D .s; r/ 2 TB;JK and bC D .t; r/ 2 TM;K and # sons .bA / C # sons .bB / D n C 1. C C If bA or bB are admissible leaves, i.e., if bA 2 LA;J or bB 2 LB;JK holds, we can again use Case 1 or Case 2 to conclude that (7.38) holds. Otherwise, i.e., if bA and bB are inadmissible, they cannot both be leaves, since this would imply # sons .TA;J ; bA / C # sons .TB;JK ; bB / D 2 < n C 1. Therefore at least one of bA and bB has to be subdivided. Definition 7.26 yields sons.TM;K ; bC / D sonsC .t / sonsC .r/, and we observe X X X t 0 As 0 Br 0 : t As Br D t 0 2sonsC .t/ s 0 2sonsC .s/ r 0 2sonsC .r/
For all t 0 2 sonsC .t /, s 0 2 sonsC .s/ and r 0 2 sonsC .r/, Lemma 7.17 yields .t 0 ; s 0 / 2 TA;J and .s 0 ; r 0 / 2 TB;JK . Since at least one of bA and bB is subdivided, we can apply the induction assumption to get t 0 As 0 Br 0 2 H 2 .TM;K ; VM ; WM / for all t 0 2 sonsC .t /, s 0 2 sonsC .s/ and r 0 2 sonsC .r/. According to Remark 3.37, we can conclude X X X t As Br D t 0 As 0 Br 0 2 H 2 .TM;K ; VM ; WM /: t 0 2sonsC .t/ s 0 2sonsC .s/ r 0 2sonsC .r/
7.9 Numerical experiments On a smooth surface, the matrices V and K introduced in Section 4.9 correspond to discretizations of asymptotically smooth kernel functions. The products V V , VK, K V and KK of the matrices correspond, up to discretization errors, to the convolutions of these kernel functions, and we expect (cf. [63], Satz E.3.7) that these convolutions
7.9 Numerical experiments
329
will again be asymptotically smooth. Due to the results of Chapter 4, this means that they can be approximated by polynomials and that the approximation will converge exponentially if the order is increased. Up to errors introduced by the discretization process, the matrix products V V , K V , VK and KK can be expected to share the properties of their continuous counterparts, i.e., it should be possible to approximate the products in the H 2 -matrix space defined by polynomials of a certain order m, and the approximation should converge exponentially as the order increases. In the first example, we consider a sequence of polygonal approximations of the unit sphere S D fx 2 R3 W kxk D 1g consisting of n 2 f512; 2048; : : : ; 131072g plane triangles. On each of these grids, we construct orthogonal nested cluster bases by applying Algorithm 16 to the polynomial bases introduced in Sections 4.4 and 4.5. Together with a standard block cluster tree, these cluster bases define H 2 -matrix spaces, and we use Algorithm 45 to compute the best approximation of the products V V , K V , VK and KK in these spaces. The results are collected in Table 7.1 (which has been taken from [11]). For each grid, each product and each interpolation order, it contains the
Table 7.1. Multiplying double and single layer potential on the unit sphere.
n mD2 512 2:2 1:54 2048 13:0 2:64 8192 66:5 4:64 32768 283:9 5:44 131072 1196:8 5:64 512 2:3 2:63 KV 2048 13:8 5:43 8192 71:1 1:02 32768 304:3 1:62 131072 1257:4 2:32 VK 512 2:3 5:03 2048 14:0 2:12 8192 72:7 4:22 32768 313:1 7:02 131072 1323:6 1:11 KK 512 2:2 6:94 2048 12:9 2:93 8192 66:4 6:13 32768 283:9 1:12 131072 1169:1 1:62
Oper. VV
mD3 1:0 8:57 32:3 9:66 184:2 3:65 897:0 4:95 3625:6 4:75 1:0 4:95 34:4 1:74 196:7 8:54 959:9 2:33 3876:3 4:03 1:0 7:86 35:6 4:24 204:3 2:03 1003:3 4:13 4101:6 7:13 1:0 9:46 32:4 5:25 184:0 2:64 894:0 5:54 3602:0 9:64
mD4 0:4 conv. 40:8 4:67 355:2 1:66 1919:2 2:36 7918:5 3:66 0:4 conv. 41:7 5:96 374:8 5:35 1982:2 1:44 8361:8 2:84 0:4 conv. 41:7 1:95 395:3 1:34 2098:5 2:64 8744:8 5:14 0:4 conv. 40:6 4:86 354:4 1:85 1881:5 3:55 7839:9 6:85
330
7 A priori matrix arithmetic
time1 in seconds for computing the product by Algorithm 45 and the resulting approximation error in the relative operator norm kX ABk2 =kABk2 , which was estimated using a power iteration. We can see that increasing the interpolation order m will typically reduce the approximation error by a factor of 10, i.e., we observe the expected exponential convergence. Since we are working with interpolation in three space dimensions, the rank of the cluster bases is bounded by k D m3 , i.e., we expect a behaviour like O.nk 2 / D O.nm6 / in the computation time. Especially for higher interpolation orders and higher problem dimensions, this behaviour can indeed be observed. In Table 7.2, we investigate the behaviour of the polynomial approximation on the surface C D @Œ1; 13 of the cube. Since it is not a smooth manifold, we expect Table 7.2. Multiplying double and single layer potential on the unit cube.
n mD2 768 2:0 2:93 3072 10:9 3:63 12288 49:7 3:83 49152 208:6 4:03 196608 833:0 4:03 KV 768 2:2 6:32 3072 11:7 7:02 12288 53:4 8:52 49152 222:8 8:62 196608 869:8 8:22 VK 768 2:4 1:91 3072 11:8 2:71 12288 55:0 3:41 49152 232:3 3:71 196608 930:5 3:71 KK 768 2:0 2:72 3072 11:0 3:52 12288 49:7 4:72 49152 206:5 5:72 196608 804:8 6:32
Oper. VV
mD3 3:8 3:54 32:0 6:34 176:7 7:94 850:7 8:54 3692:2 8:64 3:9 5:53 34:9 1:42 200:4 1:92 978:0 2:12 4245:9 2:12 3:8 4:42 37:7 1:11 214:5 1:61 1059:7 2:01 4614:4 2:31 3:8 5:13 32:6 1:22 185:3 1:92 903:1 2:42 3945:2 2:72
mD4 1:2 conv. 133:7 2:04 455:6 2:84 1867:4 3:44 7542:0 3:64 1:2 conv. 134:6 3:23 470:0 7:23 1922:1 8:63 7862:8 9:13 1:2 conv. 134:4 3:92 484:0 8:32 2045:5 1:21 8454:1 1:41 1:2 conv. 132:6 5:13 454:3 9:43 1823:6 1:42 7374:3 1:82
that the smoothness of the result of the multiplication, and therefore the speed of convergence, will be reduced. This is indeed visible in the numerical experiments: we no longer observe a convergence like 10m , but only 2m for V V and K V and even 1
On a single 900 MHz UltraSPARC IIIcu processor of a SunFire 6800 computer.
7.9 Numerical experiments
331
slower convergence for VK and KK. The latter effect might be a consequence of the fact that the column cluster bases used for K involve the normal derivative of the kernel function, which reduces the order of smoothness even further.
Chapter 8
A posteriori matrix arithmetic
In the previous chapter, we have considered two techniques for performing arithmetic operations like matrix addition and multiplication with H 2 -matrices. The first approach computes the best approximation of sums and products in a prescribed H 2 -matrix space, and this can be accomplished in optimal complexity. Since the a priori construction of suitable cluster bases and block partitions can be very complicated in practical applications, the applicability of this technique is limited. The second approach overcomes these limitations by constructing H 2 -matrix spaces that can represent the exact sums and products without any approximation error. Unfortunately, the resulting matrix spaces lead to a very high computational complexity, and this renders them unattractive as building blocks for higher-level operations like the LU factorization or the matrix inversion. In short, the first approach reaches the optimal complexity, while the second approach reaches the optimal accuracy. In this chapter, we introduce a third approach that combines the advantages of both: it allows us to construct adaptive cluster bases a posteriori which ensure a prescribed accuracy and it does not lead to excessively high storage requirements. The new algorithm consists of three parts: first the exact matrix-matrix product is computed in an intermediate representation that is closely related to the left and right semi-uniform matrices introduced in Section 6.1. Since the block cluster tree for the intermediate representation would lead to prohibitively high storage requirements, we do not construct this representation explicitly, but instead approximate it on the fly by a suitable H -matrix format with a coarse block structure. In the last step, this H -matrix is converted into an H 2 -matrix by Algorithm 33. The resulting procedure for approximating the matrix-matrix product is sketched in Figure 8.1. In practical applications, it is advisable to combine the second and third step in order to avoid the necessity of storing the entire intermediate H -matrix (cf. Algorithm 33), but for our theoretical investigations, it is better to treat them separately. The accuracy of these two approximation steps can be controlled, therefore the product can be computed with any given precision. Since the product is computed block by block, the new procedure can be used as a part of inversion and factorization algorithms. The price for its flexibility is an increased algorithmic complexity: the new algorithm does not scale linearly with the number of degrees of freedom, but involves an additional logarithmic factor. The chapter is organized as follows: • Section 8.1 introduces the structure of semi-uniform matrices, a generalization of the left and right semi-uniform matrices defined in Section 6.1.
333 • Section 8.2 contains an algorithm for computing the matrix M0 ´ C C AB exactly and representing it as a semi-uniform matrix. • Section 8.3 introduces an algorithm for coarsening the block structure to compute z that approximates M0 . an H -matrix M • Section 8.4 is devoted to the construction of adaptive cluster bases for this H z into an H 2 -matrix approximation of C C AB. matrix and the conversion of M • Section 8.5 presents numerical experiments that demonstrate that the new multiplication algorithm ensures the desired accuracy and produces data-sparse approximations of the exact result. H 2 -matrices A, B, C
Input A, B, C
Exact computation 2 Semi-uniform matrix M0 2 Hsemi .TM;K ; VA ; WB /
M0 ´ C C AB
Coarsening z 2 H .TC;K ; kH / H -matrix M
z M0 M
Compression M C C AB
H 2 -matrix M with adaptive cluster bases
Figure 8.1. A posteriori multiplication.
Assumptions in this chapter: We assume that cluster trees T , TJ and TK for the finite index sets , J and K, respectively, are given. Let n ´ #, nJ ´ #J and nK ´ #K denote the number of indices in each of the sets, and let c ´ #T , cJ ´ #TJ and cK ´ #K denote the number of clusters in each of the cluster trees. Let TA;J , TB;JK and TC;K be admissible block cluster trees for T and TJ , TJ and TK , and T and TK , respectively. Let VA and VC be nested cluster bases for T with rank distributions KA and KC , let WA and VB be nested cluster bases for TJ with rank distributions LA and KB , and let WB and WC be nested cluster bases for TK with rank distributions LB and LC .
334
8 A posteriori matrix arithmetic
8.1 Semi-uniform matrices The new algorithm for approximating the matrix-matrix product can be split into two phases: first an exact representation of the product is computed, then this representation is converted into an H 2 -matrix with adaptively-chosen cluster bases. In Section 7.8, we have seen that the exact product is an H 2 -matrix for a modified block cluster tree TM;K and modified cluster bases VM and WM . In practice, these modified cluster bases VM and WM will have very high ranks, i.e., the representation of the exact product as an H 2 -matrix will require large amounts of storage (cf. Lemma 7.23 taking into account that Csp 100 holds in important applications). In order to reduce the storage requirements, we replace the H 2 -matrix space 2 H .TM;J ; VM ; WM / used in Section 7.8 by a new kind of matrix subspace: Definition 8.1 (Semi-uniform hierarchical matrices). Let X 2 RJ be matrix. Let TJ be an admissible block cluster tree, and let V D .V t / t2T and W D .Ws /s2TJ be nested cluster bases with rank distributions K D .K t / t2T and L D .Ls /s2TJ . X is a semi-uniform matrix for TJ , V and W if there are families A D .Ab /b2LC J
t s of matrices satisfying Ab 2 RL and Bb 2 RsJK for all O tO
and B D .Bb /b2LC
J
b D .t; s/ 2 LC J and X
XD
bD.t;s/2LC J
.Ab Ws C V t Bb / C
X
t Xs :
(8.1)
bD.t;s/2L J
The matrices A D .Ab /b2LC and B D .Bb /b2LC are called left and right coeffiJ J cient matrices. Obviously, a matrix X is semi-uniform if and only if it can be split into a sum of a left and a right semi-uniform matrix, i.e., the set of semi-uniform matrices is the sum of the spaces of left and right semi-uniform matrices. Remark 8.2 (Matrix subspaces). Let TJ , V and W be as in Definition 8.1. The set 2 .TJ ; V; W / ´ fX 2 RJ W X is a semi-uniform Hsemi matrix for TJ ; V and W g
is a subspace of RJ . 2 .TJ ; V; W / during Proof. Since we have to compute sums of matrices in Hsemi the course of the new multiplication algorithm, we prove the claim constructively instead of simply relying on the fact that H 2 .TJ ; ; W / and H 2 .TJ ; V; / are subspaces. 2 Let X; Y 2 Hsemi .TJ ; V; W /, and let ˛ 2 R. Let b D .t; s/ 2 LC J be an admissible leaf. Due to Definition 8.1 and Lemma 5.4, there are matrices AX;b ; AY;b 2
8.1 Semi-uniform matrices
335
t s and BX;b ; BY;b 2 RsJK satisfying RL O tO
C ˛.AY;b Ws C V t BY;b / t .X C ˛Y /s D t Xs C ˛ t Ys D AX;b Ws C V t BX;b
D .AX;b C ˛AY;b /Ws C V t .BX;b C ˛BY;b / : Applying this equation to all admissible leaves b 2 LC J of TJ yields the desired result and proves that we can compute the representation of linear combinations efficiently by forming linear combinations of the left and right coefficient matrices. We construct semi-uniform matrices by accumulating blockwise updates, and since these updates can appear for arbitrary blocks of the block cluster tree TJ , not only for leaf blocks, we have to ensure that the result still matches Definition 8.1. t s and B 2 RJK . Let TJ , Lemma 8.3 (Restriction). Let t 2 T , s 2 TJ , A 2 RL sO tO V and W be as in Definition 8.1. For all t 2 sons .t / and s 2 sons .s/, we have
t .AWs C V t B /s D . t AFs ;s /Ws C V t .s BE t ;t / : If .t; s/ 2 TJ holds, this equation implies 2 AWs C V t B 2 Hsemi .TJ ; V; W /:
Proof. Let t 2 sons .t / and s 2 sons .s/. Due to Lemma 6.13, we have t .AWs /s D t A.s Ws / D t A.Ws Fs ;s / D t AFs ;s Ws ; t .V t B /s D t V t B s D V t E t ;t B s D V t .s BE t ;t / ; and adding both equations proves our first claim. Now let .t; s/ 2 TJ . We have to prove that AWs C V t B is an element of the 2 .TJ ; V; W /. Let b ´ .t; s/ 2 TJ , and let Tb be the subtree of TJ space Hsemi with root b (cf. Definition 6.30). Due to Corollary 3.15, we find that Lb D fb D .t ; s / 2 LJ W b 2 sons .b/g describes a disjoint partition fbO D tO sO W b 2 Lb g O and this implies of b, AWs C V t B D
X
t .AWs C V t B /s
b D.t ;s /2Lb
D
X
. t AFs ;s /Ws C V t .s BE t ;t / :
b D.t ;s /2Lb
Since all blocks b appearing in this sum are leaves of Tb , each of the terms in the sum 2 is an element of Hsemi .TJ ; V; W / due to Definition 8.1. According to Remark 8.2, 2 Hsemi .TJ ; V; W / is a linear space, so the sum itself is also contained and we conclude 2 .TJ ; V; W /. AWs C V t B 2 Hsemi
336
8 A posteriori matrix arithmetic
8.2 Intermediate representation We now consider the computation of the matrix M0 ´ C C AB for A 2 RJ , B 2 RJK and C 2 RK . As in Section 7.6, we assume that A, B and C are H 2 -matrices given in the standard H 2 -matrix representation X X VA;t SA;b WA;s C t As ; AD bD.t;s/2LA;J
C bD.t;s/2LA;J
BD
X
VB;s SB;b WB;r C
X
s Br ;
bD.s;r/2LB;JK
C bD.s;r/2LB;JK
C D
X
VC;t SC;b WC;r C
X
t Cr
bD.t;r/2L C;K
bD.t;r/2LC C;K
for families SA D .SA;b /b2LC , SB D .SB;b /b2LC and SC D .SC;b /b2LC A;J B;JK C;K of coupling matrices. We denote the sets of subdivided blocks of TA;J , TB;JK and TC;K by J , JK and K and the sets of inadmissible blocks by J , JK and K (cf. Definition 7.16). The first step of the new multiplication algorithm consists of representing the exact result M0 ´ C C AB as a semi-uniform matrix. As in Section 7.7, we express the operation M0
C C AB
M0 M0
C; M0 C t1 As1 Br1 ;
as a sequence
:: : M0
M0 C tm Asm Brm
of updates involving subblocks ti Asi of A and si Bri of B. If we can find a good bound for the number of updates and perform each update efficiently, the resulting algorithm is also efficient. Let us consider one of the updates. We fix t 2 T , s 2 TJ and r 2 TK , and let bA ´ .t; s/, bB ´ .s; r/ and bC ´ .t; r/. Our goal is to perform the update t M0 r
t M0 r C t As Br
efficiently. As in Section 7.8, we now investigate three special cases.
(8.2)
8.2 Intermediate representation
337
Case 1: bB is admissible If bB is admissible, we have s Br D VB;s SB;bB WB;r ;
since B is an H 2 -matrix. This means that the product on the right-hand side of (8.2) has the form t As Br D t AVB;s SB;bB WB;r D . t AVB;s SB;bB /WB;r :
(8.3)
We recall Definition 6.1 and see that the structure of the product is that of a right semi-uniform matrix. We can split the computation of the product t AVB;s SB;bB into two parts: the challenging part is finding AybA ´ t AVB;s
(8.4)
efficiently, the remaining multiplication by SB;bB can be handled directly. Since the matrices AybA resemble the matrices computed by the matrix forward transformation Algorithm 36, we can adapt its recursive approach: we have X X t D t 0 ; VB;s D VB;s 0 EB;s 0 ;s : t 0 2sonsC .t/
s 0 2sonsC .s/
If sons.bA / ¤ ;, Definition 3.12 implies sons.bA / D sonsC .t / sonsC .s/ and we find X X AybA D t AVB;s D t 0 AVB;s 0 EB;s 0 ;s D
X
t 0 2sonsC .t/ s 0 2sonsC .s/
0 D.t 0 ;s 0 /2sons.b / bA A
AybA0 EB;s 0 ;s :
(8.5)
Otherwise, i.e., if bA is a leaf of TA;J , it can be either an inadmissible leaf, in which case we can compute AybA directly by using its definition (8.4), or it is an admissible leaf, and we get AybA D t AVB;s D t As VB;s D VA;t SA;bA WA;s VB;s :
The computation of this product can be split into three steps: the cluster basis product PAB D .PAB;s /s2TJ given by PAB;s ´ WA;s VB;s for all s 2 TJ can be computed efficiently by Algorithm 13, multiplying PAB;s by SA;bA is straightforward, so we only VB;s D require a method for multiplying the intermediate matrix Yyt ´ SA;bA WA;s SA;bA PAB;s with VA;t . We could use a block variant of the backward transformation
338
8 A posteriori matrix arithmetic
Algorithm 7 to compute VA;t Yyt , but for our application it is simpler and in the general case even more efficient to prepare and store the matrices .VA;t / t2T explicitly instead of using the transfer matrices. Algorithm 46 can be used to compute these matrices efficiently. Combining Algorithm 13 for the computation of cluster basis products with the block backward transformation and the recursion (8.5) yields Algorithm 47, the semiuniform matrix forward transformation. Algorithm 46. Expansion of a nested cluster basis. procedure ExpandBasis(t , var V ); if sons.t/ ¤ ; then t 0 2 RtK ; Vt O 0 for t 2 sons.t / do ExpandBasis(t 0 , V ); Vt Vt C Vt 0 Et 0 end for end if
Algorithm 47. Semi-uniform matrix forward transformation for the matrix A. y procedure SemiMatrixForward(b, VA , VB , PAB , SA , A, var A); .t; s/ b; if sons.b/ ¤ ; then Ayb 0; for b 0 2 sons.b/ do b0; .t 0 ; s 0 / y SemiMatrixForward(b 0 , VA , VB , PAB , SA , A, A); Ayb Ayb C Ayb 0 EB;s 0 ;s end for else if b is admissible then Yyt SA;b PAB;s ; VA;t Yyt Ayb else Ayb t AVB;s end if This procedure is a variant of the matrix forward transformation Algorithm 36 used in Chapter 7: instead of applying cluster basis matrices from the left and the right and efficiently computing a projection of a matrix block into a given H 2 -matrix space, we apply only one cluster basis matrix from the right and end up with a matrix in a semi-uniform matrix space.
8.2 Intermediate representation
339
Remark 8.4 (Block backward transformation). The computation of the product VA;t Yyt is closely related to the computation of the vector VA;t yO t performed by the backward transformation Algorithm 7 that is introduced in Section 3.7, and generalizing this procedure yields the block backward transformation Algorithm 48, the counterpart of the block forward transformation Algorithm 10 introduced in Section 5.2. Instead of using Algorithm 46 to compute all matrices VA;t and WB;r explicitly, we therefore can also apply the block backward transformation Algorithm 48 to Yyt and Yyr and reduce the amount of auxiliary storage. This will increase the number of operations, but should not hurt the asymptotic complexity. Algorithm 48. Block backward transformation. procedure BlockBackwardTransformation(t , V , var Y , Yy ); if sons.t/ D ; then Y Y C V t Yyt else for t 0 2 sons.t / do Yyt 0 E t 0 Yyt ; BlockBackwardTransformation(t 0 , V , Y , Yy ) end for end if
Case 2: bA is admissible If bA is admissible, the fact that A is an H 2 -matrix implies t As D VA;t SA;bA WA;s
and we conclude that the product on the right-hand side of (8.2) now satisfies Br D VA;t .r B WA;s SA;b / ; t As Br D VA;t SA;bA WA;s A
(8.6)
i.e., it is a block in a left semi-uniform matrix. As in Case 1, we can split the computation of the product r B WA;s SA;b A into two parts: first we compute BybB ´ r B WA;s
(8.7)
can be for all bB D .s; r/ 2 TB;JK , then the remaining multiplication with SA;b A handled directly. The structure of (8.7) closely resembles the structure of (8.4): if
340
8 A posteriori matrix arithmetic
sons.bB / ¤ ;, we can use the recursion X BybB D r 0 B WA;s 0 FA;s 0 ;s D b 0 D.s 0 ;r 0 /2sons.bB /
X
BybB FA;s 0 ;s ;
(8.8)
b 0 D.s 0 ;r 0 /2sons.bB /
otherwise, we can either rely on the definition (8.7) if bB is an inadmissible leaf or on V WA;s D WB;r SB;b P BybB D r B WA;s D WB;r SB;b B B;s B AB;s
if it is an admissible leaf. The resulting recursive procedure is given in Algorithm 49. Algorithm 49. Semi-uniform matrix forward transformation for the matrix B . y procedure SemiTransMatrixForward(b, WB , WA , PAB , SB , B, var B); .s; r/ b; if sons.b/ ¤ ; then Byb 0; for b 0 2 sons.b/ do .s 0 ; r 0 / b0; y SemiTransMatrixForward(b 0 , WB , WA , PAB , SB , B, B); y y y Bb B C Bb 0 FA;s 0 ;s end for else if b is admissible then Yyr SB;b PAB;s ; y y Bb WB;r Y t else Byb r B WA;s end if
Case 3: bA and bB are inadmissible As in Section 7.8, we do not have any information about the product of two inadmissible blocks and cannot hope to be able to express it in a simple way, e.g., as a semiuniform matrix. Fortunately, we can use the induced block cluster tree introduced in Definition 7.26 to ensure that this situation only appears if bC is also inadmissible. If bC is a leaf, we can handle it directly. Otherwise, we can proceed to the sons of bA , bB and bC by recursion. Lemma 8.5 (Exact matrix product). Let TM;K be the induced block cluster tree of 2 Definition 7.26. Then we have AB 2 Hsemi .TM;K ; VA ; WB /. Proof. Similar to the proof of Theorem 7.29: we prove the more general statement 2 .TM;K ; VA ; WB / t As Br 2 Hsemi
8.2 Intermediate representation
341
for all t 2 T , s 2 TJ and r 2 TK with .t; s/ 2 TA;J , .s; r/ 2 TB;JK and .t; r/ 2 TM;K by induction. C holds, our investigation of Case 1 yields If bA D .t; s/ 2 LA;J / 2 H 2 .TM;K ; VA ; / t As Br D VA;t .r BWA;s SA;b A 2 Hsemi .TM;K ; VA ; WB /: C If bB D .s; r/ 2 LB;JK holds, the results of Case 2 imply 2 H 2 .TM;K ; ; WB / t As Br D . t AVB;s SB;bB /WB;s 2 Hsemi .TM;K ; VA ; WB /:
We can continue as in the proof of Theorem 7.29. Our algorithm requires an explicit representation X .Xb WB;r C VA;t Yb / C MAB ´ AB D
X
t ABr
bD.t;r/2LM;K
C bD.t;r/2LM;K
(8.9) of the product AB as a semi-uniform matrix, i.e., we have to compute the left and right coefficient matrices .Xb /b2LC and .Yb /b2LC . M;K
M;K
We accomplish this task in two steps: first we compute an intermediate representation X X MAB D AB D .Xb WB;r C VA;t Yb / C t ABr bD.t;r/2TM;K
bD.t;r/2LM;K
of the product with coefficient matrices in all blocks, and then we apply a procedure similar to the matrix backward transformation Algorithm 38 in order to translate it into the desired form (8.9). This second step is part of the coarsening Algorithm 54. For the first part of the construction, the structure of the proof of Theorem 7.29 suggests a recursive approach: we start with t D root.T /, s D root.TJ / and r D root.TK /. If bA ´ .t; s/ or bB ´ .s; r/ are admissible leaves, we update XbC or YbC for bC ´ .t; r/ using the equations (8.3) and (8.6). Otherwise, all blocks are inadmissible, and we can either handle an inadmissible leaf directly or proceed by recursion. Since we are looking for the matrix M0 D C C AB D C C MAB , we also have to handle the matrix C . We solve this task by a very simple procedure: we construct a representation of C 2 H 2 .TC;K ; VC ; WC / in the matrix space H 2 .TM;K ; VC ; WC / corresponding to the induced block partition TM;K . Due to Definition 7.26, each block of TC;K is also present in TM;K , and we can use the matrix backward transformation Algorithm 38 to compute the representation of C in H 2 .TM;K ; VC ; WC / efficiently.
342
8 A posteriori matrix arithmetic
Algorithm 50 computes the exact product MAB D AB of two H 2 -matrices A 2 and B, represented as a semi-uniform matrix AB 2 Hsemi .TM;J ; VA ; WB /. Algo2 rithm 38 computes an H -matrix representation of C in H 2 .TM;K ; VC ; WC / 2 Hsemi .TM;K ; VC ; WC /. Each admissible block b D .t; r/ of the matrix M D C C AB is therefore represented in the form VC;t SyC;b WC;r C Xb WB;r C VA;t Yb , i.e., its rank is bounded by .#LC;r / C .#LB;r / C .#KA;t /. This fairly low rank is a major advantage of the new approach compared to the H 2 -matrix representation introduced in Section 7.8: the latter is based on induced cluster bases, and the rank of these cluster bases can become very large (depending on the size of the sparsity constant Csp , cf. Lemma 7.23). Algorithm 50. Semi-uniform representation of the matrix product. procedure MatrixProductSemi(t, s, r, A, B, var MAB , X , Y ); bA .t; s/; bB .s; r/; bC .t; r/; if bB is admissible then C XbC XbC C AybA SB;bB fbB 2 LB;JK g else if bA is admissible then C YbC YbC C BybB SA;b fbA 2 LA;J g A else if bA and bB are leaves then MAB C t As Br fbC 2 L MAB C;K g else for t 0 2 sonsC .t /, s 0 2 sonsC .s/, r 0 2 sonsC .r/ do fbA 2 A or bB 2 B g MatrixProductSemi(t 0 , s 0 , r 0 , A, B, MAB , X , Y ) end for end if As already mentioned this advantage comes at a price: the storage requirements of semi-uniform matrices typically do not grow linearly with the matrix dimension, as for H -matrices the depth of the block cluster tree is also a factor in the complexity estimates.
Complexity estimates Let us now investigate the complexity of Algorithm 50. Since it is based on the matrices .Ayb /b2TA;J and .Byb /b2TB;JK prepared by Algorithm 47 and 49, we first have to analyze the complexity of these preliminary steps. Lemma 8.6. Let V be a nested cluster basis with rank distribution K, let .k t / t2T be defined as in (3.16), and let kO ´ maxfk t W t 2 T g:
343
8.2 Intermediate representation
The computation of all matrices .V t / t2T by Algorithm 46 requires not more than 2.p C 1/kO 2 n operations. Proof. Let t 2 T . If t is not a leaf, the algorithm computes the products of V t 0 2 K 0 RtO0 t and E t 0 2 RK t 0 K t for all t 0 2 sons.t /. This can be accomplished in X
2.#tO0 /.#K t 0 /.#K t /
X
2.#tO0 /k t2 D 2.#tO/k t2 2.#tO/kO 2
t 0 2sons.t/
t 0 2sons.t/
operations, since Definition 3.4 implies X #tO0 D # t 0 2sons.t/
[
tO0 D #tO:
t 0 2sons.t/
Adding these bounds for all t 2 T and applying Corollary 3.10 yields the bound X
2.#tO/kO 2 D 2kO 2
t2T
X t2T
2kO 2
p X
#tO D 2kO 2
p X
X
`D0
t2T level.t/D`
#tO
n D 2kO 2 .p C 1/n :
`D0
Once the cluster bases have been prepared, the complexity of the semi-uniform matrix forward transformation can be investigated. Lemma 8.7. Let KA , LA , KB and LB be the rank distributions of VA , WA , VB and WB . Let .kA;t / t2T , .lA;s /s2TJ , .kB;s /s2TJ and .lB;r /r2TK be defined as in (3.16) and (3.18), and let kO ´ maxfkA;t ; lA;s ; kB;s ; lB;r W t 2 T ; s 2 TJ ; r 2 TK g:
(8.10)
If TA;J is Csp -sparse, Algorithm 47 requires not more than 2Csp .kO 3 c C kO 2 .p C 1/n / operations. If TB;JK is Csp -sparse, Algorithm 49 requires not more than 2Csp .kO 3 cK C kO 2 .pK C 1/nK / O 0; 1/-regular, the number of operations is in operations. If T and TK are .Crc ; k; 2 2 O O O.k .p C 1/n / and O.k .pK C 1/nK /, respectively.
344
8 A posteriori matrix arithmetic
Proof. Since Algorithms 47 and 49 are very similar, we only consider the first one. Let b D .t; s/ 2 TA;J . If sons.b/ ¤ ;, the algorithm computes Ayb 0 EB;s 0 ;s for all sons b 0 of b and adds the result to Ayb . This can be accomplished in X 2.#tO0 /.#KB;s 0 /.#KB;s / b 0 D.t 0 ;s 0 /2sons.b/
D2
X
#tO0
t 0 2sonsC .t/
2 2.#tO/kB;s
X
#KB;s 0 .#KB;s /
s 0 2sonsC .s/
2.#tO/kO 2 :
Otherwise, i.e., if b is a leaf, it can be either inadmissible or admissible. If it is inadmissible, the clusters t and s are leaves of T and TJ , respectively, so we have #sO kB;s and find that we can compute the product in 2.#tO/.#sO /.#KB;s / 2.#tO/kB;s .#KB;s / 2.#tO/kO 2 operations. If we are dealing with an admissible leaf, the computation of Yyt can be performed in 2.#KA;t /.#LA;s /.#KB;s / 2kA;t lA;s kB;s 2kO 3 operations, and the multiplication by VA;t takes not more than 2.#tO/.#KA;t /.#KB;s / 2.#tO/kA;t kB;s 2.#tO/kO 2 operations. We can summarize that we need not more than 2.kO C #tO/kO 2 operations for the block b. Adding the bounds for all blocks yields X X X X kO 3 C 2 2.kO C #tO/kO 2 D 2 .#tO/kO 2 bD.t;s/2TA;J
b2TA;J
t2T s2row.t/
2k #TA;J C 2Csp kO 2 O3
X
#tO
t2T
D 2kO 3 #TA;J C 2Csp kO 2
2kO 3 #TA;J C 2Csp kO 2
p X
X
`D0
t2T level.t/D`
p X
n
`D0
D 2kO 3 #TA;J C 2Csp kO 2 .p C 1/n :
#tO
8.2 Intermediate representation
Using the estimate #TA;J D
X
345
# row.t / Csp #T D Csp c
t2T
O 0; 1/-regular, we can find a constant C we get the desired result, and if T is .Crc ; k; O independent of k and n satisfying c C n =kO and complete the proof. In order to analyze the complexity of Algorithm 50, we follow the approach used in Section 7.7: we observe that the triples .t; s; r/ for which the algorithm is called are organized in a tree structure. The corresponding call tree TJK is the minimal tree with root root.TJK / ´ .root.T /; root.TJ /; root.TK // and a father-son relation given by 8 C C C ˆ <sons .t / sons .s/ sons .r/ sons.t; s; r/ ´ ˆ : ;
if .t; s/ 2 A ; .s; r/ 2 B ; or .t; s/ 2 A ; .s; r/ 2 B ; otherwise; (8.11) for all triples .t; s; r/ 2 TJK it contains. Here A A TA;J and B B TB;JK denote the sets of subdivided and inadmissible blocks of A and B, respectively (cf. Definition 7.16 and (7.18)). As in Section 7.7, the complexity estimate depends on a sparsity estimate for the call tree. Lemma 8.8 (Sparsity of the call tree). Let TA;J and TB;JK be Csp -sparse admissible block cluster trees, and let C .t/ ´ f.s; r/ 2 TJ TK W .t; s; r/ 2 TJK g CJ .s/ ´ f.t; r/ 2 T TK W .t; s; r/ 2 TJK g CK .r/ ´ f.t; s/ 2 T TJ W .t; s; r/ 2 TJK g
for all t 2 T ; for all s 2 TJ ; for all r 2 TK :
Then we have #C .t /; #CJ .s/; #CK .r/ 3Csp2
for all t 2 T ; s 2 TJ ; r 2 TK :
Proof. As in the proof of Lemma 7.18: comparing (8.11) and (7.19) reveals that each node of the call tree of Algorithm 50 is also a node of the call tree of Algorithm 45 (with a minimal TC;K consisting only of the root). Since we are working with semi-uniform matrices instead of H 2 -matrices, the number of operations for a block b D .t; r/ 2 TM;K depends on the cardinalities of tO and r, O which means that the weak assumptions used in the previous chapters do not allow us to derive meaningful complexity estimates. Instead, we have to require that the strict conditions introduced in Section 3.8 are met, i.e., that the rank distribution is k-bounded and that the cluster trees are .Crc ; k/-bounded.
346
8 A posteriori matrix arithmetic
Lemma 8.9 (Complexity). Let KA , LA , KB and LB be the rank distributions of VA , WA , VB and WB . Let .kA;t / t2T , .lA;s /s2TJ , .kB;s /s2TJ and .lB;r /r2TK be defined as in (3.16) and (3.18), and let kO be defined as in (8.10). If TA;J and TB;JK are Csp -sparse, Algorithm 50 requires not more than 6Csp2 kO 2 ..p C 1/n C .pK C 1/nK / operations to compute the semi-uniform representation MAB D AB of the product of the H 2 -matrices A and B. Proof. Let .t; s; r/ 2 TJK , and let bA ´ .t; s/, bB ´ .s; r/ and bC ´ .t; r/. If bB is admissible, we have to compute the matrix AybA SB;bB and add it to XbC , and this requires not more than 2.#tO/.#KB;s /.#LB;r / 2.#tO/kO 2 operations. If bA is admissible, we have to compute the matrix BybB SA;b and add it to A YbC , and this requires not more than
2.#r/.#L O O kO 2 A;s /.#KA;t / 2.# r/ operations. If bA and bB are inadmissible leaves, the fact that TA;J and TB;JK are admissible block cluster trees implies that the clusters t , s and r are leaves of T , TJ and TK , and the update of t MAB r requires not more than 2.#tO/.#sO /.#r/ O .#tO/kO 2 operations, and we conclude that the number of operations for each triple .t; s; r/ 2 TJK is bounded by 2kO 2 .#tO C #r/: O Combining Corollary 3.10 and Lemma 8.8 we can find and estimate for the number of operations of the entire algorithm: we have X X X #tO D .#tO/#C .t / 3Csp2 #tO t2T
.t;s;r/2TJK
t2T
X
X
`D0
t2T level.t/D`
p
D X
#rO D
.t;s;r/2TJK
and this implies X .t;s;r/2TJK
3Csp2 X
#tO 3Csp2
p X
n D 3Csp2 .p C 1/n ;
`D0
2 .#r/#C O K .r/ 3Csp .pK C 1/nK ;
r2TK
2kO 2 .#tO C #r/ O 6Csp2 kO 2 ..p C 1/n C .pK C 1/nK /:
8.3 Coarsening
347
Remark 8.10 (Computation of M0 D C C AB). Algorithm 50 computes the representation of MAB D AB as a semiuniform matrix in the block cluster tree TzM;K defined by the call tree through .t; r/ 2 TzM;K () there is an s 2 TJ with .t; s; r/ 2 TJK : In general, TzM;K is only a subtree of the induced block cluster tree TM;K , but we can use Lemma 8.3 to compute a semi-uniform representation of MAB using TM;K . Similarly we can use the matrix backward transformation Algorithm 38 to compute an H 2 -matrix representation of C in the space H 2 .TM;K ; VC ; WC /. The result is a representation of the exact result C C AB D C C MAB in the matrix 2 space H 2 .TM;K ; VC ; WC / C Hsemi .TM;K ; VA ; WB /.
8.3 Coarsening In most practical applications, the induced block cluster tree TM;K used in the exact representation of M0 D C C AB provided by Lemma 8.5 will not coincide with the block cluster tree TC;K we have prescribed for the result of the multiplication: although Definition 7.26 ensures that each block of TC;K will also appear in TM;K , leaves of TC;K may correspond to subdivided blocks in TM;K , and admissible blocks in TC;K may be inadmissible in TM;K due to the modified admissibility condition used in the construction of TM;K . The second phase of the a posteriori matrix-matrix multiplication algorithm addresses this problem: for all admissible leaves b D .t; r/ 2 TC;K , we check whether they are also admissible leaves in TM;K . If they are not, we construct low-rank approximations of the corresponding submatrices by using the hierarchical approximation approach described in [52]. The result of this procedure is an approxiz of M0 D C C AB by a hierarchical matrix based on the prescribed block mation M cluster tree TC;K instead of the induced block cluster tree TM;K . In the third and final phase of the multiplication algorithm, we convert this intermediate approximation into the desired H 2 -matrix based on the block cluster tree TC;K . Since we aim to approximate the matrix M D C C AB by an H -matrix, we can compute all approximations block by block and do not have to take interactions within rows or columns into account. This computation can be carried out by a recursive procedure working from the leaves of TM;K upwards.
Inadmissible leaves of TM; K Let b 2 LM;K be an inadmissible leaf of TM;K . If b is also an inadmissible leaf of TC;K , we are done. Otherwise, we can use a singular value decomposition to turn the corresponding submatrix into a low-rank matrix:
348
8 A posteriori matrix arithmetic
Lemma 8.11 (Singular value decomposition). Let 0 and J 0 J. Let M 2 0 0 RJ 0 ;J 0 . Let p ´ rank.M / minf# ; #J g with p > 0. Let 1 p > 0 be the non-zero singular values of M . For each l 2 f1; : : : ; pg, we can find an index set and W 2 RJK and a diagonal K with nK D l, orthogonal matrices V 2 RK 0 J0 KK matrix S 2 R which satisfy ´ lC1 if l < p; (8.12) kM V SW k2 D 0 otherwise; ´ P
1=2 p i2 if l < p; iDlC1 kM V SW kF D (8.13) 0 otherwise; Proof. As in Lemma 5.19. For a given error tolerance b 2 R>0 , this result allows us to find an approximation with sufficiently small error and the lowest possible rank: for an inadmissible leaf K b D .t; r/ 2 TM;K , we can find an index set Kb , orthogonal matrices Vb 2 RtO b , KKb
Wb 2 RrO
and a diagonal matrix Sb 2 RKb Kb such that
k t Mr Vb Sb Wb k2 b
or
k t Mr Vb Sb Wb kF b
(8.14)
holds. Low-rank representations of this type are closely related to the ones used in the context of H -matrices: by setting Ab ´ Vb Sb and Bb ´ Wb , we immediately get the desired factorized representation of a low-rank approximation of the submatrix corresponding to the block b. Remark 8.12 (Complexity). According to Lemma 5.17, the Householder factorization in Algorithm 51 requires not more than Cqr mnq operations, and according to Remark 5.20 the singular value decomposition can be computed in not more than Csvd q 3 operations. If m > n, the computation of Vy can be carried out in 2mn2 D 2mnq y can be performed in 2m2 n D 2mnq operations, otherwise the computation of W operations. The entire Algorithm 51 therefore takes not more than Csvdf .# 0 /.#J 0 / minf# 0 ; #J 0 g operations with Csvdf D Cqr C Csvd C 2 to complete.
8.3 Coarsening
349
Algorithm 51. Compute a low-rank approximation V S W of a matrix X 2 RJ 0 ;J 0 . y procedure LowrankFactor(X , 0 , J 0 , , var V , S, W , K); 0 0 m # ; n #J ; q minfm; ng; Fix arbitrary isomorphisms m W f1; : : : ; mg ! 0 and n W f1; : : : ; ng ! J 0 ; Xy 0 2 Rmn for i 2 f1; : : : ; mg, j 2 f1; : : : ; ng do Xyij Xm .i/;n .j / end for; if m > n then yR y of Xy ; Compute a Householder factorization Xy D Q qq y y Y R2R else yR y of Xy ; Compute a Householder factorization Xy D Q qq y y Y R 2R end if y of Yy ; Compute a singular value decomposition Yy D Vy diag.1 ; : : : ; q /W if m > n then y Vy Vy Q else y W
yW y Q end if FindRank( , q, .i /qiD1 , l); Ky
m .f1; : : : ; lg/; y
fAlgorithm 17g y
K K V 0 2 R ; W 0 2 RJ ; J0 0 for i 2 f1; : : : ; mg, j 2 f1; : : : ; lg do Vm .i/;m .j / Vyij end for; for i 2 f1; : : : ; ng, j 2 f1; : : : ; lg do yij Wn .i/;m .j / W end for; for i 2 f1; : : : ; lg do Sm .i/;m .i/ i end for
S
y
y
0 2 RKK ;
Admissible leaves of TM; K C We can apply the same approach to admissible leaves: let b D .t; r/ 2 LM;K . Assuming that the index sets LC;r , LB;r and KA;t are pairwise disjoint, the fact that
350
8 A posteriori matrix arithmetic
M is semi-uniform implies C VA;t Yb t Mr D VC;t SyC;b WC;r C Xb WB;r 0 1
WC;r A y ; D VC;t SC;b Xb VA;t @WB;r Yb
(8.15)
i.e., the submatrix corresponding to b is already given in a factorized representation of rank not higher than .#LC;r / C .#LB;r / C .#KA;t /. If we want to keep the storage requirements low, we can try to reduce the rank by applying Lemma 8.11 to t Mr . Since the direct computation of the singular value decomposition of the submatrix corresponding to b could become computationally expensive if the index sets tO and rO become large, we have to exploit the factorized structure (8.15): we apply Lemma 5.16 to the left factor in (8.15) in order to find an index set Kyb , an orthogonal matrix yb K yb 2 RKyb .LC;r [LB;r [KA;t / with and a matrix R Qb 2 R tO
VC;t SyC;b
Xb
yb : VA;t D Qb R
Combining this factorization with (8.15) yields 0 1
WC;r A t Mr D VC;t SyC;b Xb VA;t @WB;r Yb
yb j y D Qb R Kb LC;r
0 1 WC;r yb j y @W A R Kb KA;t B;r Yb
yb j y R Kb LB;r
yb j y yb j y yb j y y D Qb R W CR W CR Y D Qb M b Kb LC;r C;r Kb LB;r B;r Kb KA;t b for the auxiliary matrix yb j y b ´ WC;r R M y K
b LC;r
yb j C WB;r R y K
b LB;r
yb j C Yb R y K
b KA;t
yb KK
2 RsO
:
y b has only #Kyb .#LC;r / C .#LB;r / C .#KA;t / rows, we can compute its Since M singular value decomposition efficiently, and this gives us an index set Kb , orthogonal y JK matrices Vyb 2 RKb Kb , Wb 2 RsO b , and a diagonal matrix Sb 2 RKb Kb such that y b Wb Sb Vy k2 b kM b
or
y b Wb Sb Vy kF b kM b
holds. Since Qb is an orthogonal matrix, the same holds for Vb ´ Qb Vyb , and we find y Vyb Sb W /; t Mr Vb Sb Wb D Qb .M b b so (8.14) is guaranteed (since Sb is diagonal, we have Sb D Sb ). The resulting procedure for finding low-rank approximations of leaves of TM;K is given in Algorithm 52.
8.3 Coarsening
351
Algorithm 52. Low-rank approximation of leaf blocks. procedure CoarsenLeaf(b, M , , var Vb , Sb , Wb ); .t; r/ b; if b is admissible in TM;K then VC;t SyC;b ; X1;b X2;b Xb ; VA;t ; X3;b .LC;r [LB;r [KA;t / ; Zb 0 2 RtO Zb jtOLC;r X1;b ; Zb jtOLB;r X2;b ; Zb jtOKA;t X3;b ; yb , Kyb ); fAlgorithm 15g Householder(Zb , tO, Qb , R y1;b y y y2;b y y y3;b y y R Rj ; R Rj ; R Rj ; Kb LC;r
y 1;b M y 2;b M
y ; WC;r R 1;b y ; WB;r R
y 3;b M
y ; Yb R 3;b
Kb LB;r
Kb KA;t
2;b
y 1;b C M y 2;b C M y 3;b 2 RJKyb ; yb M M sO y b , sO , Kyb , b , Wb , Sb , Vyb , Kb ); LowrankFactor(M Qb Vyb Vb else LowrankFactor( t Mr , tO, r, O b , Vb , Sb , Wb , Kb ) end if
fAlgorithm 51g
fAlgorithm 51g
Subdivided blocks in TM; K Now let us consider blocks b D .t; r/ 2 TM;K which are not leaves. We assume that we have already found low-rank approximations Vb 0 Sb 0 Wb0 for all sons b 0 2 sons.b/, and we have to find a way of combining these approximations into a low-rank approximation of the entire block. This means that we have to approximate the matrix X X X Vb 0 Sb 0 Wb0 D V t 0 ;r 0 S t 0 ;r 0 W t0 ;r 0 : (8.16) b 0 2sons.b/
t 0 2sonsC .t/ r 0 2sonsC .r/
Let ´ # sonsC .t / and ft1 ; : : : ; t g ´ sonsC .t /, and let ´ # sonsC .r/ and fr1 ; : : : ; r g ´ sonsC .r/. The inner sum can be expressed in the form 1 W t0 ;r1
B C V t 0 ;r S t 0 ;r @ ::: A : W t0 ;r 0
X r 0 2sonsC .r/
V t 0 ;r 0 S t 0 ;r 0 W t0 ;r 0 D V t 0 ;r1 S t 0 ;r1
:::
(8.17)
352
8 A posteriori matrix arithmetic
We once more use Lemma 5.16 in order to find an index set Kyt 0 , an orthogonal matrix y 0 K y t 0 2 RKyt 0 .K t 0 ;r1 [[K t 0 ;r / with Q t 0 2 R 0 t and a matrix R tO
V t 0 ;r1 S t 0 ;r1
yt 0 : V t 0 ;r S t 0 ;r D Q t 0 R
:::
We combine this equation with (8.17) and get 1 W t0 ;r1
B C V t 0 ;r S t 0 ;r @ ::: A W t0 ;r 0
X
V t 0 ;r 0 S t 0 ;r 0 W t0 ;r 0 D V t 0 ;r1 S t 0 ;r1
:::
r 0 2sonsC .r/
yt 0 j y D Qt 0 R K t 0 K t 0 ;r
yt 0 j y R K
::: 1
D Qt 0
X
yt 0 j y R K
r 0 2sonsC .r/
t 0 K t 0 ;r 0
t 0 K t 0 ;r
1 W t0 ;r1 B :: C @ : A W t0 ;r 0
y t0 W t0 ;r 0 D Q t 0 M
for the auxiliary matrix y t0 ´ M
X r 0 2sonsC .r/
y t 0 j W t 0 ;r 0 R y K
t 0 K t 0 ;r 0
y 0 KK t
2 RrO
:
We assume that the index sets Kyt1 ; : : : ; Kyt are pairwise disjoint and combine the above equation with (8.16) in order to get 0 1 y M X
B :t1 C y Vb 0 Sb 0 Wb 0 D Q t1 : : : Q t @ :: A D Qb M b b 0 2sons.b/ y t M
for the auxiliary matrices Qb ´ Q t1
:::
yb K Q t 2 R ; tO
yb ´ M y t1 M
:::
y t 2 RKKyb M rO
P [ P Kyt . We have Q t 0 D t 0 Q t 0 for all t 0 2 sonsC .t /, and since with Kyb ´ Kyt1 [ the index sets corresponding to different sons of t are disjoint, we can conclude that the orthogonality of the matrices Q t 0 implies that also Qb is orthogonal. Therefore we can proceed as in the case of leaf blocks: we have to find a low-rank approximation of y . The resulting procedure for finding a low-rank approximation of a subdivided Qb M b matrix is given in Algorithm 53. Remark 8.13 (Intermediate approximation). In practice, it can be more efficient to replace the exact QR factorization of the matrices Z t 0 by truncations which reduce the y t 0 at an early stage. rank of the intermediate matrices Q t 0 R
8.3 Coarsening
Algorithm 53. Low-rank approximation of subdivided blocks. procedure CoarsenSubdivided(b, M , , var Vb , Sb , Wb ); .t; r/ b; Kyb ;; for t 0 2 sonsC .t / do K t 0 ; ;; for r 0 2 sonsC .r/ do b0 .t 0 ; r 0 /; K t 0 ; K t 0 ; [ Kb 0 end for; K 0 0 2 RtO t ; ; Zt 0 for r 0 2 sonsC .r/ do b0 .t 0 ; r 0 /; Z t 0 jKb0 Vb 0 Sb 0 end for; y t 0 , Kyt 0 ); Householder(Z t 0 , tO0 , Q t 0 , R y J K y t0 t0 ; M 02R 0 C for r 2 sons .r/ do b0 .t 0 ; r 0 /; y t0 y t 0 C Wb 0 R y t 0 j M M y
353
fAlgorithm 15g
K t 0 Kb 0
end for; Kyb Kyb [ Kyt 0 end for; y y K JK yb 0 2 RtO b ; M 0 2 RrO b ; Qb for t 0 2 sonsC .t / do y b jJK 0 y t0 Qb jK t 0 Qt 0 ; M M t end for; y b , r, LowrankFactor(M O Kyb , b , Wb , Sb , Vyb , Kb ); y Vb Qb Vb
fAlgorithm 51g
This approach also makes the implementation of the algorithm simpler: it is only necessary to implement the approximation of matrices of the form 0 1 X1
B :: C X D X1 : : : X ; X D @ : A X and handle the case of general block matrices by first compressing all rows, thus creating an intermediate block matrix with only one block column, and then compressing this intermediate matrix in order to get the final result.
354
8 A posteriori matrix arithmetic
Using the Algorithm 52 for leaf blocks and the Algorithm 53 for subdivided blocks recursively leads to the coarsening Algorithm 54 which approximates the semi-uniform 2 z 2 H .TC;K ; k/ given by matrix M 2 Hsemi .TM;K ; VA ; WB / by an H -matrix M z D M
X
X
Vb Sb Wb C
bD.t;r/2LC C;K
t Mr :
(8.18)
bD.t;r/2L C;K
Since we use the matrices Xb and Yb of the semi-uniform representation only in leaf blocks, we also have to embed a variant of the matrix backward transformation into Algorithm 54 (cf. Lemma 8.3). Algorithm 54. Coarsening of the block structure. procedure Coarsen(b, M , , var Vb , Sb , Wb ); .t; r/ b; if sons.TM;K ; b/ D ; then if b 62 TC;K or b 2 LC C;K then CoarsenLeaf(b, M , b , Vb , Sb , Wb ); fAlgorithm 52g end if else for b 0 D .t 0 ; r 0 / 2 sons.TM;K ; b/ do Xb 0 Xb 0 C t 0 Xb FB;r Yb 0 Yb 0 C r 0 Yb EA;t 0 ;r ; 0 ;t ; 0 Coarsen(b , M , , Vb 0 , Sb 0 , Wb 0 ) end for; if b 62 TC;K or b 2 LC C;K then CoarsenSubdivided(b, M , b , Vb , Sb , Wb ) fAlgorithm 53g end if end if
z be the matrix (8.18) constructed by Algorithm 54 Lemma 8.14 (Error estimate). Let M with the error tolerances . b /b2TM;K . We have z k2 kM M
X
b ;
b2TM;K
if the matrices are approximated with respect to the spectral norm, and z kF kM M
X b2TM;K
if the Frobenius norm is used instead.
b ;
355
8.3 Coarsening
Proof. We introduce the intermediate matrices
zb ´ M
8 ˆ t Mr < P ˆ :
b 0 2sons.TM;K ;b/ Vb Sb Wb
if b 2 L C;K ; if b 2 TC;K n LC;K ; otherwise
z b0 M
for b 2 TM;K
z b yields that M zb z DM z root.T . A closer look at the definition of M and observe M M;K / is the approximation of the submatrix t Mr computed by Algorithm 54 for a block b D .t; r/ 2 TM;K . We assume that matrices are approximated with respect to the spectral norm and now prove X z b k2 k t Mr M b 0 (8.19) b 0 2sons .b/
by induction on # sons .b/ 2 N. We start by considering blocks b 2 TM;K with # sons .b/ D 1. In this case, we have sons.b/ D ;, i.e., b is a leaf. If it is inadmissible, we change nothing an get an error of zero. Otherwise we apply Algorithm 52 which guarantees (8.14) and therefore (8.19) directly. Let now n 2 N, and assume that the induction assumption holds for all b 2 TM;K with # sons .b/ n. Let b 2 TM;K with # sons .b/ D n C 1. Due to # sons .b/ D n C 1 > 1, we find that b is not a leaf, so Algorithm 54 computes approximations for all blocks b 0 2 sons.b/. Due to sons .b 0 / sons .b/ n fbg, we have # sons .b 0 / n and can apply the induction assumption to all of these submatrices: X X z b0 M z b k2 D z b0 / C zb k t Mr M M . t 0 Mr 0 M b 0 D.t 0 ;r 0 /2sons.b/
z b 0 k2 C k t 0 Mr 0 M
X
b 0 D.t 0 ;r 0 /2sons.b/
X
b 0 2sons.b/
X
b
C
b 0 D.t 0 ;r 0 /2sons.b/ b 2sons .b 0 /
X
2
zb z b0 M M
b 0 2sons.b/
X
2
zb z b0 M M :
b 0 2sons.b/
2
z z 0 If b 62 TC;K or b 2 LC C;K , Mb is constructed from the submatrices Mb by Algorithm 53 and the norm on the right is bounded by b . Otherwise, no further approximation takes place and this norm is equal to zero. In both cases, we can conclude X X X z b k2 b C b D b k t Mr M b 0 2sons.b/ b 2sons .b 0 /
b 2sons .b/
and have completed the induction. For the Frobenius norm, we can apply exactly the same argument.
356
8 A posteriori matrix arithmetic
Complexity estimates We have seen that Algorithm 54 can provide us with a good H -matrix approximation z of the matrix M D C C AB. Now we have to investigate the number of operations M required to reach this goal. Lemma 8.15. Let .kC;t / t2T , .lC;r /r2TK , .kA;t / t2T and .lB;r /r2TK be defined as in (3.16) and (3.18). Let kO ´ maxfkC;t ; lC;r ; kA;t ; lB;r W t 2 T ; r 2 TK g: Computing low-rank approximations Vb Sb Wb for all leaves b 2 LM;K of TM;K by applying Algorithm 52 requires not more than a total of Clfc Csp2 kO 2 ..p C 1/n C .pK C 1/nK / operations for a constant Clfc 2 R>0 depending only on Cqr and Csvdf (cf. Lemma 5.17 and Remark 8.12). Proof. Let b D .t; r/ 2 LM;K be a leaf of TM;K . If b is inadmissible, Algorithm 52 uses the singular value decomposition to compute a low-rank approximation. Since b D .t; r/ is an inadmissible leaf, both t and r have to be leaves of T and O TK , respectively, and since the cluster trees are bounded, we have #rO kC;r k. According to Remark 8.12, the singular value decomposition can be found in not more than Csvdf .#tO/.#r/ O minf#tO; #rg O Csvdf .#tO/.#r/ O 2 Csvdf kO 2 #tO operations. If b is admissible, Algorithm 52 computes the matrix X1;b D VC;t SyC;b , and this can be done in 2.#tO/.#KC;t /.#LC;r / 2kO 2 #tO operations. Next, Algorithm 15 is used to compute the QR factorization of Zb . Since Zb has #tO rows and .#LC;r / C .#LB;r / C .#KA;t / 3kO columns, this computation requires not more than O 2 9Cqr kO 2 #tO Cqr .#tO/.3k/ operations. y 1;b is the product of WC;r 2 RKLC;r and R y 2 RLC;r Kyb , adding The matrix M 1;b rO O2 O y b takes not more than 2.#r/.#L O O it to M O C;r /.# Kb / 6k # rO operations since # Kb 3k. KL y B;r y 2;b is the product of WB;r 2 R y 2 RLB;r Kb and can The matrix M and R rO
2;b
O2 O y be added in 2.#r/.#L O B;r /.# Kb / 6k # rO operations. The matrix M3;b is the product
8.3 Coarsening KK
357
y
A;t y 2 RKA;t Kb and can be added in 2.#r/.#K O and R O of Yb 2 RrO A;t /.# Kb / 3;b 2 y b can be constructed in 6kO #rO operations. We conclude that M
18kO 2 #rO operations. y b is available, Algorithm 51 is applied, and according to Remark 8.12 it Once M requires not more than 2 O 2 D 9Csvdf kO 2 #rO O O k/ Csvdf .#r/.#K b / Csvdf .# r/.3
operations. The matrix Vb is the product of Qb and Vyb , and this product can be computed in O 2 D 18kO 2 #tO 2.#tO/.#Kyb /.#Kb / 2.#tO/.3k/ operations. Adding the bounds yields that the algorithm requires not more than .20 C 9Cqr /kO 2 #tO C .18 C 9Csvdf /kO 2 #rO operations for one block b D .t; r/ 2 TM;K . In order to get a bound for the entire algorithm, we add the bounds for all blocks. Combining the structure of the proof of Lemma 3.31 with the estimate of Lemma 8.8, we find X kO 2 .#tO C #r/ O 3Csp2 kO 2 ..p C 1/n C .pK C 1/nK / (8.20) C bD.t;r/2LM;K
and setting Clfc ´ 3 maxf20 C 9Cqr ; 18 C 9Csvdf g completes the proof. Lemma 8.16 (Complexity). Let .kC;t / t2T , .lC;r /r2TK , .kA;t / t2T and .lB;r /r2TK be defined as in (3.16) and (3.18). Let kO ´ maxfkC;t ; lC;r ; kA;t ; lB;r ; kb W t 2 T ; r 2 TK ; b 2 TM;K g; Cbc ´ maxf# sons.t /; # sons.r/ W t 2 T ; r 2 TK g: Computing an H -matrix approximation of M D C C AB with the block cluster tree TC;K using Algorithm 54 requires not more than 4 /Csp2 kO 2 ..p C 1/n C .pK C 1/nK / .Clfc C Csbc Cbc
operations for a constant Csbc 2 R>0 and the constant Clfc defined in Lemma 8.15. Both constants depend only on Cqr and Csvdf .
358
8 A posteriori matrix arithmetic
Proof. Due to Lemma 8.15, the compression of all leaf blocks of TM;K can be handled in Clfc Csp2 kO 2 ..p C 1/n C .pK C 1/nK / operations, so we only have to consider non-leaf blocks treated by Algorithm 53. Let b D .t; r/ 2 TM;K . If b 2 TC;K , no arithmetic operations are needed. Otherwise Algorithm 53 constructs the matrix Z t 0 using not more than X 2.#tO0 /.#K t 0 ;r 0 /2 2Cbc .#tO0 /kO 2 r 0 2sonsC .r/
operations. Due to [
#K t 0 ; D #
K t 0 ;r 0
r 0 2sonsC .r/
X
O #K t 0 ;r 0 Cbc k;
r 0 2sonsC .r/
the matrix Z t 0 has not more than Cbc kO columns, and Lemma 5.17 yields that Algorithm 15 requires not more than O 2 D Cqr C 2 .#tO0 /kO 2 Cqr .#tO0 /.Cbc k/ bc y t 0 . The matrix M y t 0 can be assembled in operations to compute Q t 0 and R X X O bc k/ O D 2Cbc .#r/ 2.#rO 0 /.#K t 0 ;r 0 /.#Kyt 0 / 2 .#rO 0 /k.C O kO 2 r 0 2sonsC .r/
r 0 2sonsC .r/
operations. This procedure is repeated for all t 0 2 sons.t /, and adding the estimates yields the bound X 2 2 2 .2Cbc CCqr Cbc /.#tO0 /kO 2 C2Cbc .#r/ O kO 2 .2Cbc CCqr Cbc /.#tO/kO 2 C2Cbc .#r/ O kO 2 t 0 2sons.t/
y b . Due to Remark 8.12, the matrices for the construction of the matrices Qb and M Wb , Sb and Vyb can be computed by Algorithm 51 in 2 O 2 4 Csvdf .#r/.# k/ D Csvdf Cbc O Kyb /2 Csvdf .#r/.C O bc .#r/ O kO 2
operations due to #Kyb D #
[ t 0 2sonsC .t/
Kyt 0 D
X
#Kyt 0
t 0 2sonsC .t/
X
2 O k: Cbc kO Cbc
t 0 2sonsC .t/
Finally the algorithm constructs Vb in not more than 2 O 2 O 4 2.#tO/.#Kyb /.#Kb / 2.#tO/.Cbc k/.Cbc k/ D 2Cbc .#tO/kO 2
8.4 Construction of adaptive cluster bases
359
operations. Adding the estimates gives us the bound 2 2 4 4 .2Cbc C Cqr Cbc /.#tO/kO 2 C 2Cbc .#r/ O C Csvdf Cbc .#r/ O kO 2 C 2Cbc .#tO/kO 2 2 4 2 4 maxf2Cbc C Cqr Cbc C 2Cbc ; 2Cbc C Csvdf Cbc g.#tO C #r/ O kO 2 maxf4 C Cqr ; 2 C Csvdf gC 4 .#tO C #r/ O kO 2 bc
for the number of operations used by Algorithm 53 applied to a block b D .t; r/ 2 TM;K . Using (8.20) yields the bound 4 2 O2 Csp k ..p C 1/n C .pK C 1/nK / Csbc Cbc
for the number of operations required by all calls to Algorithm 53, where Csbc ´ 3 maxf4 C Cqr ; 2 C Csvdf g: Adding this estimate to the one given by Lemma 8.15 completes the proof.
8.4 Construction of adaptive cluster bases In the first step of the adaptive multiplication algorithm, we have computed the exact product by Algorithm 50 using O.kO 2 ..p C 1/n C nJ C .pK C 1/nK // operations. z of In the second step, we have approximated the exact product by an H -matrix M 2 O O local rank k with the prescribed block cluster tree TC;K using O.k ..p C 1/n C .pK C 1/nK // operations. Now we have to approximate this intermediate H -matrix by an H 2 -matrix, i.e., we have to find suitable cluster bases and compute the H 2 -matrix best approximation of z in the corresponding H 2 -matrix space. M z can be apFortunately, we have already solved both problems: assuming that M 2 O proximated by an H -matrix with k-bounded cluster bases, Algorithm 27 computes adaptive cluster bases in O.kO 2 .pJ C 1/.n C nK // operations (cf. Lemma 6.25), z in the corresponding space and Algorithm 12 computes the best approximation of M 2 O in O.k .pJ C 1/.n C nK // operations (cf. Lemma 5.9). We can conclude that computing the adaptive H 2 -matrix approximation of the product M D C C AB can be computed in O.kO 2 .p C pJ C pK C 1/.n C nK / C kO 2 nJ / operations. Due to the structure of Algorithm 27, the computation of a cluster basis for a cluster t 2 T requires information on all admissible blocks b D .t C ; r/ 2 LC C;K connected C to ancestors t 2 pred.t /, i.e., we essentially have to store the H -matrix representation O J C 1/.n C nK // units of auxiliary storage are required. z explicitly, i.e., O.k.p of M In order to avoid the need for this large amount of temporary storage, the blockwise compression Algorithm 33 can be used. Combining this procedure with the coarsening
360
8 A posteriori matrix arithmetic
and multiplication routines allows us to avoid storing both the intermediate semiz , since we can convert each uniform matrix M0 and the H -matrix approximation M admissible block into an H 2 -matrix as soon as it becomes available. According to Theorem 6.32, this requires O.kO 2 .pJ C 1/.n C nK // operations, and the order of complexity is not changed if we use the blockwise compression Algorithm 33 instead of the direct Algorithm 27.
8.5 Numerical experiments We test the a posteriori matrix-matrix multiplication algorithm by applying it to compressed approximations of the single and double layer potential matrices V and K on the sphere and the cube. The results for the approximation of X ´ V 2 are given in Table 8.1: O n1=2 is the error tolerance for the compression algorithm, the column “A post.” gives the time in seconds for the matrix-matrix multiplication with adaptive cluster bases, the columns “Mem” and “Mem=n” the total storage requirements in MB and the requirements per degree of freedom in KB, and the column gives the relative spectral error kX V 2 k2 =kV 2 k2 of the product. Table 8.1. A posteriori and a priori multiplication for the single layer potential on the unit sphere (top half) and the unit cube (bottom half.)
n 512 2048 8192 32768 131072 768 3072 12288 49152 196608
O A post. A prio. Mem Mem=n 2:04 0:4 0:3 1:9 3:7 2:85 1:04 14:6 9:2 9:8 4:9 4:05 5:05 113:3 53:3 42:6 5:3 2:45 2:05 685:0 249:0 183:7 5:7 9:06 1:05 3626:3 1030:9 753:7 5:9 5:56 2:04 1:0 0:7 3:6 4:8 4:85 1:04 20:8 8:7 12:4 4:1 5:75 5:05 166:3 81:2 96:6 8:1 2:95 2:05 1041:7 356:5 415:4 8:7 9:96 1:05 5661:1 1316:3 1599:9 8:3 4:66
We can combine the new multiplication algorithm with the approach discussed in Chapter 7: according to Lemma 6.23, we can assume that the cluster bases chosen by the adaptive algorithm will be almost optimal, therefore computing the best approximation of the exact product in the corresponding H 2 -matrix space should yield good results. According to Theorem 7.19, we can expect the a priori multiplication Algorithm 45 to compute this best approximation in optimal complexity. The column “A prio.” in Table 8.1 gives the time in seconds required by Algorithm 45 to compute the best
8.5 Numerical experiments
361
approximation of the product with respect to the quasi-optimal cluster bases constructed by the adaptive multiplication algorithm. We can see that the algorithms work as expected: the time for the adaptive algorithm grows like O.n log.n//, the time for the a priori algorithm grows like O.n/. The storage requirements seem also to grow like O.n/. The measured error is always below the prescribed tolerance O , and this indicates that the error control works reliably. Now we perform the same experiment with the double layer potential matrix K instead of V . The results given in Table 8.2 show that the adaptive algorithm works as expected: the time required for the multiplication is almost linear, and the measured error is always bounded by the prescribed tolerance O . Using the cluster bases chosen by the adaptive algorithm, the a priori multiplication Algorithm 45 can compute approximations of the product that are far better than the ones given in Table 7.2 for the non-adaptive approach.
Table 8.2. A posteriori and a priori multiplication for the double layer potential on the unit sphere (top half) and the unit cube (bottom half).
n 512 2048 8192 32768 131072 768 3072 12288 49152 196608
O A post. A prio. Mem Mem=n 2:04 0:4 0:3 1:9 3:8 8:96 1:04 16:6 11:9 11:0 5:5 1:15 5:05 140:2 78:5 49:4 6:2 1:15 2:05 895:2 384:6 216:4 6:8 7:46 1:05 5245:4 1709:7 916:1 7:2 5:56 2:04 1:3 0:9 3:8 5:0 1:35 1:04 29:0 14:7 16:6 5:5 1:35 5:05 202:3 96:6 107:3 8:9 1:25 2:05 1184:7 383:8 445:0 9:3 8:96 1:05 5896:7 1268:8 1664:4 8:7 6:46
In a final experiment, we compare the adaptive multiplication algorithm of Chapter 8, the projected multiplication algorithm of Chapter 7 and the H -matrix multiplication algorithm [49], [52]. Table 8.3 lists computing times and accuracies for the adaptive algorithm (“Adaptive”), the a priori algorithm combined with the cluster bases provided by the adaptive algorithm (“A priori/new”) and the H -matrix algorithm. We can see that the H -matrix algorithm yields accuracies that are slightly better than those of the other algorithms, but also that it takes very long to complete. The adaptive algorithm finds H 2 -matrix approximations that are almost as good as the ones of the H -matrix algorithm, but it is faster and exhibits a better scaling behaviour. The a priori algorithm is easily the fastest, but it is also very inaccurate if the wrong kind of cluster basis is used. In combination with the cluster bases provided by the adaptive algorithm, the a priori algorithm is very fast and yields very good approximations.
362
8 A posteriori matrix arithmetic
Table 8.3. Adaptive and non-adaptive multiplication algorithms for the single and double layer potential on the cube.
n Adaptive A priori/new H -arithmetic 768 0:7 conv. 0:7 conv. 0:7 conv. 3072 35:8 1:85 32:6 1:15 153:0 1:75 276:6 3:15 72:3 3:05 824:8 2:55 12288 49152 1343:7 4:15 240:5 4:05 6591:3 2:55 196608 6513:3 4:75 805:5 4:65 29741:6 2:55 KV 768 0:7 conv. 0:7 conv. 0:7 conv. 3072 37:6 2:75 37:2 2:65 152:9 5:55 12288 270:3 3:55 87:6 3:45 755:5 5:45 49152 1267:0 4:35 291:6 4:25 5688:3 5:65 196608 5933:5 9:05 989:9 9:05 26404:6 5:55 VK 768 0:7 conv. 0:7 conv. 0:7 conv. 3072 37:8 3:25 40:2 3:15 150:2 3:35 12288 267:1 3:35 94:4 3:35 737:4 3:75 49152 1219:5 4:55 310:4 4:45 5886:5 3:45 196608 5734:5 8:05 1067:0 7:85 27303:6 3:65 KK 768 0:7 conv. 0:7 conv. 0:7 conv. 3072 39:8 6:86 44:0 6:66 151:4 5:36 12288 272:4 1:65 100:3 1:65 658:2 6:46 49152 1327:9 3:25 324:5 3:25 5066:8 1:45 196608 5791:6 4:55 1080:8 4:55 24862:1 1:55
Oper. VV
Chapter 9
Application to elliptic partial differential operators According to Theorem 6.21 and Corollary 6.22, we can approximate a matrix X 2 RJ by an efficient H 2 -matrix if the total cluster bases of X and X can be approximated by low rank. We have already seen (e.g., in Chapter 4 and Lemma 6.39) that these assumptions hold for integral operators. Now we turn our attention to elliptic partial differential operators. The discretization of a partial differential operator L by a standard finite element scheme always leads to a matrix L in which for all blocks b D .t; s/ satisfying even the fairly weak admissibility condition dist. t ; s / > 0 the equation t Xs D 0 (cf. Definition 3.20) holds, i.e., each admissible block can be “approximated” by rank zero without any loss. Therefore the matrix L is an H 2 -matrix with trivial cluster bases. The inverse matrix L1 is more interesting: in typical situations, it corresponds to the non-local inverse of the partial differential operator, and we can expect most of its entries to be non-zero. In order to be able to handle L1 efficiently, we need a datasparse representation, and our goal in this chapter is to prove that an H 2 -matrix can be used to approximate L1 up to a certain accuracy proportional to the discretization error. This limitation is due to the structure of the proof, it is not experienced in numerical experiments. The proof is based on an approximation result [6] for the solution operator L1 corresponding to the equation: if and are subdomains satisfying a suitable admissibility condition and if the support of a functional f is contained in , the restriction of L1 f to can be approximated efficiently in a low-dimensional space. The operator L1 and the matrix L1 are connected by the Galerkin projection: applying this projection and its adjoint from the left and right to L1 , respectively, directly yields L1 due to Galerkin orthogonality. Unfortunately, the Galerkin projection is typically a non-local mapping, therefore local approximation properties of L1 would be lost by this procedure. In order to fix this problem, we replace the Galerkin projection by a different mapping into the discrete space. In [6], the L2 -orthogonal projection is used, which is at least quasi-local (i.e., exhibits exponential decay as the distance to the support grows) and leads to an error estimate for the approximation of L1 by an H -matrix. Since the L2 -projection is only quasi-local, the construction of the blockwise error estimates needed by H 2 -matrix approximation theory is complicated. In [15] a different approach is presented: the Galerkin projection is replaced by a Clément-type interpolation operator, and we get a new matrix S approximating the
364
9 Application to elliptic partial differential operators
inverse L1 . Since the interpolation operators are local, they can be used to easily derive the blockwise error estimates we need. Since the operators are also L2 -stable, they provide approximations that are almost as good as those of the L2 -orthogonal projection. This chapter is organized as follows: • Section 9.1 introduces a model problem for an elliptic partial differential equations with non-smooth coefficients. • Section 9.2 describes a construction for a low-rank approximation of the solution operator of the partial differential equation. • Section 9.3 uses this result to find low-rank approximations of admissible submatrices of the discrete solution operator S . • Section 9.4 applies the error estimates of Theorem 6.16 and Corollary 6.17 to prove that the discrete solution operator S can be approximated by an efficient H 2 -matrix. • Numerical experiments are described together with the approximative inversion algorithm in Section 10.5. Assumptions in this chapter: We assume that a cluster tree T for the finite index is given. Let T be an admissible block cluster tree for T . Let n ´ # and c ´ #T denote the number of indices and clusters of and T . Let p be the depth of T .
9.1 Model problem We fix a domain Rd and a coefficient function C W ! Rd d satisfying C.x/ D C.x/ ;
.C.x// Œ˛; ˇ
for all x 2 :
We are interested in the partial differential operator Lu ´
d X
@i Cij @j u
i;j D1
mapping the Sobolev space H01 . / into H 1 . /. For f 2 H 1 . /, the partial differential equation Lu D f (9.1)
9.1 Model problem
is equivalent to the variational equation Z a.v; u/ ´ hrv.x/; C.x/ru.x/i2 dx D f .v/
for all v 2 H01 . /:
365
(9.2)
The bounds for the spectrum of C imply ˛kwk22 hC.x/w; wi2 D kC.x/1=2 wk22 ˇkwk22 ;
for all w 2 Rd :
Combining this inequality with the Cauchy–Schwarz inequality provides us with the upper bound hrv.x/; C.x/ru.x/i2 D hC.x/1=2 rv.x/; C.x/1=2 ru.x/i2 kC.x/1=2 rv.x/k2 kC.x/1=2 ru.x/k2 ˇkrv.x/k2 kru.x/k2 ; and the definition of the Sobolev space H 1 . / yields ja.v; u/j ˇkvkH 1 ./ kukH 1 ./
for all u; v 2 H 1 . /:
This means that the bilinear form a is bounded, i.e., continuous. Due to hru.x/; C.x/ru.x/i2 ˛kru.x/k22 ; Friedrichs’ inequality implies the existence of a constant C 2 R>0 depending only on the domain such that 2 2 a.u; u/ ˛krukL 2 ./ C ˛kukH 1 ./
for all u 2 H01 . /;
i.e., a is a coercive bilinear form, therefore (9.2) and the equivalent (9.1) possess unique solutions [34]. Usually, strongly elliptic partial differential equations of the type (9.1) are treated numerically by a finite element method: a mesh h for the domain is constructed, and basis functions .'i /i2 are used to define a finite-dimensional space Vn ´ spanf'i W i 2 g H01 . /; where is a finite index set and n ´ # is the dimension of the discrete space Vn . Using the standard Galerkin approach, an approximation un 2 Vn of u is represented in the form X xi 'i un D i2
for the solution vector x 2 R of the linear system Lx D b
(9.3)
366
9 Application to elliptic partial differential operators
given by the stiffness matrix L 2 R and the load vector b 2 R defined by Lij D a.'i ; 'j /;
bi D f .'i /
for all i; j 2 :
(9.4)
The system (9.3) can be solved by several techniques, e.g., by fast direct solvers [93], multigrid iterations [61] or H - and H 2 -matrix methods [62], [52], [18], [15]. H - and H 2 -matrix techniques offer the advantage that they can handle jumps and anisotropies in the coefficient matrix C better than multigrid techniques and that they are more efficient than direct solvers for large problems.
9.2 Approximation of the solution operator Let Rd be a convex set with \ ¤ ;. Let be a subset with dist.; / > 0.
Let 2 R>0 . We are looking for a low-dimensional space V H 1 . / such that for each right-hand side f 2 H 1 . / with supp f the corresponding solution u 2 H01 . / of the variational equation (9.2) can be approximated in V , i.e., such that there exists a v 2 V with ku vkH 1 . / kf kH 1 ./ : Since V is required to be independent of f , this property implies that the interaction between the domains and can be described by a low-rank operator. If the coefficient function C and the boundary of were sufficiently smooth, interior regularity estimates would yield an estimate of the form m c kuj kH m . / C mŠkf kH 1 ./ for all m 2 N0 dist.; / and we could simply approximate uj by a polynomial uQ of order m. In this setting, the space V would have a dimension . md and the approximation uQ would converge
9.2 Approximation of the solution operator
367
exponentially with respect to the order m if an admissibility condition of the type (4.11) or (4.37) holds. In the general case, we have to use a refined approach first presented in [6]: since u 2 H 1 . / holds, we can approximate the solution by a piecewise constant function, but the convergence rate will not be exponential. Projecting this function into a local space of L-harmonic functions (cf. Definition 9.1 below) yields an approximation v1 . We can apply a weak interior regularity argument to show that v1 j 1 is contained in H 1 .1 / for a subset 1 , therefore the error u1 ´ uj 1 v1 j 1 is also an Lharmonic function in H 1 .1 /, and the argument can be repeated until a sufficiently accurate approximation v ´ v1 C C vp has been found. The key element of the proof is the space of locally L-harmonic functions: Definition 9.1 (Locally L-harmonic functions). Let ! Rd be a domain (that may be unrelated to ). A function u 2 L2 .!/ is called locally L-harmonic on ! if for all !Q ! with dist.!; Q @!/ > 0 the following conditions hold: uj!Q 2 H 1 .!/; Q a.v; uj / D 0 uj!n D 0:
(9.5a) for all v 2
H01 . /
with supp v !; Q
(9.5b) (9.5c)
The space of all locally L-harmonic functions on ! is denoted by H .!/. For functions in H .!/, the following weak interior regularity estimate holds (cf. Lemma 2.4 in [6]): Lemma 9.2 (Cacciopoli inequality). Let u 2 H .!/, and let !Q ! be a domain with Q and dist.!; Q @!/ > 0. Then we have uj!Q 2 H 1 .!/ p creg krukL2 .!/ kukL2 .!/ ; creg ´ 4 ˇ=˛ 4: Q dist.!; Q @!/ Proof. Let ı ´ dist.!; Q @!/. Let 2 C 1 . [ !/ be a function satisfying 0 1; j!Q 1; krk2 2=ı; .x/ D 0 for all x 2 ! with dist.x; @!/ < ı=4: Such a function exists since the distance between the subdomain !Q and the boundary of ! is ı > 0. For the domain !O ´ fx 2 ! W dist.x; @!/ > ı=8g; we have dist.!; O @!/ ı=8 > 0, so (9.5a) implies uj!O 2 H 1 .!/, O and since is continuously differentiable and bounded, we have v ´ 2 u 2 H01 .!/ and can extend this function by zero to H01 . [ !/. Due to (9.5c), we get vj 2 H01 . / with supp v !O and uj 2 H01 . /, therefore we can use (9.5b) in order to prove Z 0 D a.vj ; uj / D hC.x/r.2 u/.x/; ru.x/i2 dx !\ O
368
9 Application to elliptic partial differential operators
Z D !\ O
hC.x/.2ur C 2 ru/.x/; rui2 dx:
Moving the second term of this sum to the left side of the equation yields Z Z 2 1=2 2 .x/ kC.x/ ru.x/k2 dx D .x/2 hC.x/ru.x/; ru.x/i2 dx !\ O !\ O Z D 2 .x/u.x/hC.x/r.x/; ru.x/ dx !\ O Z 2 .x/ju.x/j kC.x/1=2 r.x/k2 kC.x/1=2 ru.x/k2 dx !\ O Z 1=2 2ˇ .x/ju.x/j kr.x/k2 kC.x/1=2 ru.x/k2 dx !\ O
4 4
ˇ
1=2
Z
ı
!\ O
.x/ju.x/j kC.x/1=2 ru.x/k2 dx
ˇ 1=2 kL2 .!\/ kuj!\ O O ı
1=2
Z !\ O
.x/2 kC.x/1=2 ru.x/k22 dx
Dividing both sides by the rightmost factor (if it is not zero) yields Z
1=2 2
1=2
.x/ kC.x/ !\ O
ru.x/k22
dx
4
ˇ 1=2 kL2 .!\/ : kuj!\ O O ı
Due to j!Q 1 and (9.5c), we get Z kruj!Q kL2 .!/ kL2 .!\/ D Q Q D kruj!\ Q ˛ ˛
1=2
1=2
4 ı
ˇ ˛
Z
1=2 2
.x/ !\ Q
2
1=2
ru.x/k22
2
1=2
ru.x/k22
.x/ kC.x/ Z
kru.x/k22
!\ Q
dx
1=2 dx 1=2
.x/ kC.x/ !\ O 1=2
kuj!\ kL2 .!\/ O O
dx p 4 ˇ=˛ kukL2 .!/ : ı
This is the required estimate. As mentioned before, we use orthogonal projections to map functions from L2 .!/ into H .!/. The construction of these projections is straightforward if H .!/ is a complete set, i.e., closed in L2 .!/. Using Lemma 9.2, this property can be proven (the proof is a variant of that of Lemma 2.2 in [6] which requires only elementary tools): Lemma 9.3. The space H .!/ is a closed subspace of L2 .!/.
9.2 Approximation of the solution operator
369
Proof. Let .un /n2N be a Cauchy sequence in H .!/ with respect to the L2 .!/-norm. Since L2 .!/ is complete, we can find a function u 2 L2 .!/ with lim kun ukL2 .!/ D 0:
(9.6)
n!1
Let !Q ! be a domain with dist.!; Q @!/ > 0. Lemma 9.2 implies 1=2 2 2 kvj!Q kH 1 .!/ C krvj k Q kL2 .!/ ! Q 2 Q D kvj! Q Q L .!/ C kvkL2 .!/ with the constant
for all v 2 H .!/
16ˇ=˛ C ´ 1C dist.!; Q @!/2
(9.7)
1=2 ;
therefore we have kun j!Q um j!Q kH 1 .!/ Q C kun um kL2 .!/
for all n; m 2 N
and conclude that .un j!Q /n2N is a Cauchy sequence with respect to the H 1 .!/-norm. Q Since H 1 .!/ Q is complete, we can find a function u!Q 2 H 1 .!/ Q with lim kun j!Q u!Q kH 1 .!/ Q D 0:
(9.8)
n!1
Q (9.6) implies Since the restriction to !Q is a continuous mapping from L2 .!/ to L2 .!/, lim kun j!Q uj!Q kL2 .!/ Q D 0:
n!1
Combining this property with (9.8) and the estimate kuj!Q u!Q kL2 .!/ Q un j! Q C un j! Q u! Q kL2 .!/ Q D kuj! Q kuj!Q un j!Q kL2 .!/ Q u! Q kL2 .!/ Q C kun j! Q kuj!Q un j!Q kL2 .!/ Q u! Q kH 1 .!/ Q C kun j! Q
for all n 2 N
1 yields kuj!Q u!Q kL2 .!/ Q Q D u! Q 2 H .!/. Q D 0, i.e., uj! 1 Let now v 2 H0 . / with supp v !. Q We have just proven that uj!Q 2 H 1 .!/ Q is 1 the limit of un j!Q with respect to the H -norm, therefore (9.5b) and the continuity of a imply a.v; uj / D lim a.v; un j / D 0: n!1
The continuity of the restriction from L2 .!/ to L2 .! n / yields uj!n D lim un j!n D 0; n!1
and we can conclude that u 2 H .!/ holds, therefore H .!/ is closed.
370
9 Application to elliptic partial differential operators
We introduce the maximum-norm diameter diam1 .!/ ´ supfkx yk1 W x; y 2 !g D supfjxi yi j W x; y 2 !; i 2 f1; : : : ; d g and can now state the basic approximation result (the proof is a slight modification of the proof of Lemma 2.6 in [6]): Lemma 9.4 (Finite-dimensional approximation). Let ! Rd be a convex domain. Let ` 2 N. Let Z be a closed subspace of L2 .!/. There is a space V Z with dim.V / `d such that for all u 2 Z \ H 1 .!/ a function v 2 V can be found with p diam1 .!/ 2 d ku vkL2 .!/ capx krukL2 .!/ ; capx ´ : ` Proof. We let ı ´ diam1 .!/ and introduce ai ´ inffxi W x 2 !g
for all i 2 f1; : : : ; d g:
By definition, we have xi ai D jxi ai j ı
for all i 2 f1; : : : ; d g; x 2 !;
i.e., the d -dimensional hypercube Q ´ a C Œ0; ıd satisfies ! Q. We let SQ ´ f1; : : : ; `gd and define Q ´ a C
d O i 1 i ı; ı ` `
for all 2 SQ
iD1
and observe that .Q /2SQ is a family of `d disjoint open hypercubes satisfying p diam1 .Q / D ı=` and diam.Q / D d ı=` that covers Q up to a null set. For each 2 SQ , we let ! ´ ! \ Q . Defining S! ´ f 2 SQ W j! j > 0g; we have found that .! /2S! is a family of not more than `d convex sets that covers ! up to a null set. We construct an intermediate approximation by piecewise constant functions defined on .! /2S! , i.e., by functions in the space W ´ fw 2 L2 .!/ W wj! is constant almost everywhere for all 2 S! g L2 .!/: For a function u 2 L2 .!/, we let w ´
1 j! j
Z u.x/ dx !
for all 2 S!
9.2 Approximation of the solution operator
371
and introduce the piecewise constant approximation w 2 W by w.x/ ´ w
for all 2 S! ; x 2 ! :
Due to u 2 H 1 .!/, the Poincaré inequality yields Z diam.! /2 kru.x/k22 dx ju.x/ w j 2 ! ! Z d ı2 2 2 kru.x/k22 dx for all 2 S! ` !
Z
2
and summing over all 2 S! yields ku wkL2 .!/
p dı krukL2 .!/ : `
This is already the desired estimate, but w is not necessarily contained in the space Z.
!2;3
Figure 9.1. Domains for the piecewise constant approximation in Lemma 9.4.
Since Z is a closed subspace of L2 .!/, the orthogonal projection …Z W L2 .!/ ! Z defined by h…Z f; giL2 .!/ D hf; giL2 .!/
for all f 2 L2 .!/; g 2 Z
exists and satisfies k…Z kL2 .!/ L2 .!/ 1 and …Z g D g for all g 2 Z. We let V ´ …Z W and v ´ …Z w 2 V and conclude p dı ku vkL2 .!/ D k…Z u …Z wkL2 .!/ ku wkL2 .!/ krukL2 .!/ ; ` i.e., V is a subspace of Z with a dimension not larger than `d satisfying the desired estimate.
372
9 Application to elliptic partial differential operators
Combining the approximation result of Lemma 9.4 with the regularity result of Lemma 9.2 allows us to find finite-dimensional spaces approximating the solutions of the variational equation (9.2): Theorem 9.5 (Low-rank approximation). Let 2 R>0 and q 2 .0; 1/. There are constants Capx ; Cdim 2 R>0 such that for all convex open domains Rd and all p 2 N2 , we can find a space V H . / satisfying dim V Cdim p d C1 :
(9.9)
diam1 . / 2 dist.; /
(9.10)
For all domains with
and all right-hand sides f 2 H 1 . / with supp f , the corresponding solution u 2 H01 . / of the variational equation (9.2) can be approximated by a function v 2 V with kruj rvj kL2 . / Capx q p kf kH 1 ./ ;
(9.11) p
kuj vj kH 1 . / Capx .dist.; /=8 C 1/q kf kH 1 ./ :
(9.12)
Proof. Let Rd be a convex open domain, let be a domain satisfying (9.10), let ı ´ diam. /=.2/, and let p; ` 2 N. We introduce the domains ² ³ .p i /ı !i ´ x 2 Rd W dist.x; / < for all i 2 f0; : : : ; pg: p
!0 !p
By construction we have !p , !i !i1 for all i 2 f1; : : : ; pg and !0 \ D ;. In order to apply Lemma 9.2, we need an estimate for the distance between the boundaries of these subdomains. Let i 2 f1; : : : ; pg, x 2 !i and y 2 @!i1 . Due to dist.x; / < .p i /ı=p, we can find z 2 with kx zk2
.p i /ı p
9.2 Approximation of the solution operator
373
Due to dist.y; / D .p i C 1/ı=p and z 2 , we have ky zk2
.p i C 1/ı p
and conclude kx yk2 ky zk2 kz xk2
ı .p i C 1/ı .p i /ı D ; p p p
i.e., dist.!i ; @!i1 / ı=p. Let f 2 H 1 . / with supp f . Let u 2 H01 . / be the corresponding solution of the variational equation (9.2). We define u0 2 L2 .!0 / by letting u0 j!0 \ ´ uj!0 ;
u0 j!0 n ´ 0:
Q Since u equals zero on @ , we get u0 2 H 1 .!0 /. For all v 2 H01 . / with supp v !, we have supp v \ supp f D ; and therefore a.v; u0 j / D a.v; u/ D f .v/ D 0; so we can conclude u0 2 H .!0 /. We now construct ui 2 H .!i / and vi 2 Vi H .!i1 / for all i 2 f1; : : : ; pg such that ui D .ui1 vi /j!i and 2capx . C 1/ı krui1 kL2 .!i 1 / ; ` (9.13) p krui kL2 .!i / c krui1 kL2 .!i 1 / ` hold for the constant c ´ 2creg capx . C 1/. Let i 2 f1; : : : ; pg and assume that ui1 2 H .!i1 / is given. We apply Lemma 9.4 to find a space Vi H .!i1 / with dim Vi `d and a function vi 2 Vi with kui kL2 .!i /
diam1 .!i1 / krui1 kL2 .!i 1 / ` diam1 . / C 2ı capx krui1 kL2 .!i 1 / : ` By definition, we have diam1 . / 2ı, and the estimate becomes kui1 vi kL2 .!i 1 / capx
2. C 1/ı (9.14) krui1 kL2 .!i 1 / : ` According to Lemma 9.2, the restriction ui ´ .ui1 vi /j!i of the error ui1 vi is contained in H .!i / and the interior regularity estimate creg krui kL2 .!i / kui1 vi kL2 .!i 1 / dist.!i ; @!i1 / p 2. C 1/ı p creg capx krui1 kL2 .!i 1 / D c krui1 kL2 .!i 1 / : ı ` ` kui1 vi kL2 .!i 1 / capx
374
9 Application to elliptic partial differential operators
holds. Restricting the left side of (9.14) to the subdomain !i !i1 completes the induction and we have proven (9.13) for all i 2 f1; : : : ; pg. Iterating the second estimate of (9.13) yields p i krui kL2 .!i / c kru0 kL2 .!0 / for all i 2 f1; : : : ; pg: ` By construction, we have up j D up1 j vp j D up2 j vp1 j vp j D D u0 j v1 j vp j ; so v ´ v1 j C C vp j is an approximation of u0 j D uj . We define the space V ´ V1 j C C Vp j ; where the restriction of the space is interpreted as the space spanned by the restriction of its elements, and get v 2 V with dim V p`d . The estimates (9.13) imply 2capx . C 1/ı krup1 kL2 .!i 1 / ` 2capx . C 1/ı p p1 kru0 kL2 .!0 / c ` ` ı p p D kru0 kL2 .!0 / ; c creg p ` p p D krup j kL2 . / c kru0 kL2 .!0 / : `
kuj vkL2 . / D kup j kL2 . /
kr.uj v/kL2 . /
Combining both estimates allows us to bound the full H 1 -norm by 1=2 2 2 kuj vkH 1 . / D kuj vkL C kr.uj v/j k L2 . / 2 . / !1=2 p p ı2 c C 1 kru0 kL2 .!0 / 2 p2 creg ` 2 1=2 p p ı c C 1 kru0 kL2 .!0 / 2 2 4 2 ` p p .ı=8 C 1/ c kru0 kL2 .!0 / : ` Since u is the solution of (9.2), we have 2 kukH 1 ./
1 C ˛
a.u; u/ D
1 C ˛
f .u/
1 C ˛
kf kH 1 ./ kukH 1 ./ ;
and this implies kru0 kL2 .!0 / krukL2 ./ kukH 1 ./
1 kf kH 1 ./ : C ˛
9.2 Approximation of the solution operator
375
Combining this estimate with the error bounds for u v yields 1 p p kr.uj v/kL2 . / kf kH 1 ./ ; c C ˛ ` p p 1 kuj vkH 1 . / kf kH 1 ./ : .ı=8 C 1/ c C ˛ ` In order to get the estimates (9.9), (9.11) and (9.12), we have to choose ` appropriately. A simple approach is to let cp ; `´ q since this yields p p p pq c c D q; c qp ; cp ` ` and the dimension of V can be bounded by `
cp c 1 C1 pC p D q q 2
c 1 C p q 2
due to p 2, so setting Cdim ´
c 1 C q 2
d ;
Capx ´
1 C ˛
yields dim V p`d Cdim p d C1 and kr.uj v/kL2 . / Capx q p kf kH 1 ./ ; kuj vkH 1 . / Capx q p .ı=8 C 1/kf kH 1 ./ ; therefore the proof is complete. This result is closely related to Theorem 2.8 in [6], but it yields an H 1 -norm estimate for the solution of the variational equation (9.2) using the H 1 -norm of the right-hand side functional instead of an L2 -norm estimate of Green’s function. The main difference between both proofs is that the one given here exploits the fact that the original solution u already is L-harmonic in , therefore we can perform the approximation by Lemma 9.4 first, and follow it by the regularity estimate of Lemma 9.2 in order to get an H 1 -estimate for the error. The proof of [6], Theorem 2.8, on the other hand, deals with Green’s function, and this function is not globally in H 1 , therefore the first step has to be the regularity estimate and the resulting error bound is given only for the L2 -norm. Remark 9.6 (Direct L2 -norm estimate). Of course, we can use the same ordering of regularity estimates and approximation steps in Theorem 9.5 in order to get an estimate of the form p p kuj vkL2 . / c ku0 kL2 .!0 / Capx q p kf kH 1 ./ `
376
9 Application to elliptic partial differential operators
instead of (9.11). Since the space V constructed in this way would differ from the one used in Theorem 9.5, we cannot simply combine both estimates in order to get an estimate for the full H 1 -norm and have to rely on results of the type (9.12) instead. Remark 9.7 (Influence of ˛ and ˇ). The bounds ˛ and ˇ for the spectra of the coefficient matrices influence the constants Cdim and Capx . Due to p p 2 d . C 1/; c D 2creg capx . C 1/ D 8 ˇ=˛ we have p p 2 d 1 d Cdim D 16 ˇ=˛ . C 1/ C q 2 and can expect Cdim . .ˇ=˛/d=2 for ˇ ˛. The constant ˛ also appears in the definition of Capx , and we get Capx . 1=˛.
9.3 Approximation of matrix blocks The problem of proving that the H 2 -matrix arithmetic algorithms yield a sufficiently accurate approximation of the inverse can be reduced to an existence result: since the adaptive arithmetic operations (cf. Section 8 and [49]) have a best-approximation property, we only have to show that an approximation of L1 by an H 2 -matrix exists, because this already implies that the computed approximation Sz will be at least as good as this approximation. This proof of existence can be accomplished using our main result stated in Theorem 9.5: a block t L1 s describes the mapping from a right-hand side vector b with support in s to the restriction of the corresponding discrete solution to t . In order to apply our approximation result, we have to exploit the relationship between the inverse matrix L1 and the inverse operator L1 . In [6], this problem is solved by applying L2 -orthogonal projections. Since these projections are non-local operators, additional approximation steps are required, which increase the rank, lead to sub-optimal error estimates, and make the overall proof quite complicated. We follow the approach presented in [15]: instead of a non-local L2 -projection, a Clément-type interpolation operator [35] can be used to map continuous functions into the discrete space. These operators are “sufficiently local” to provide us with improved error estimates and guarantee that the rank of the approximation will not deteriorate. In order to keep the presentation simple, we assume that the finite element mesh is shape-regular in the sense of [38], Definition 2.2. It is not required to be quasi-uniform. Let us recall the basic definitions and properties of Clément interpolation operators: we fix a family .i /i2 of functionals mapping L2 . / into R such that supp i supp 'i
for all i 2 ;
(9.15)
9.3 Approximation of matrix blocks
377
holds, that the local projection property i .'j / D ıij
for all i; j 2
(9.16)
is satisfied and also that the local stability property ki .u/'i kL2 ./ Ccs kukL2 .supp 'i /
for all i 2 ; u 2 L2 . /
(9.17)
holds for a constant Ccs 2 R>0 depending only on the shape-regularity of the mesh. Constructions of this kind can be found in [95], [8]. The interpolation operator is defined by X I W L2 . / ! Vn ; u 7! i .u/'i : (9.18) i2
The local projection property (9.16) implies its global counterpart Ivn D vn
for all vn 2 Vn :
(9.19)
Since the matrices we are dealing with are given with respect to the space R , not Vn , we need a way of switching between both spaces. This is handled by the standard basis isomorphism X ˆ W R ! Vn H01 . /; x 7! xi 'i : i2
The interpolation operator I can be expressed by I D ˆƒ if we define ƒ W L2 . / ! R by .ƒv/i ´ i .v/
for all i 2 ; v 2 L2 . /:
In order to construct the approximation of L1 by using L1 , we turn a vector b 2 R into a functional, apply L1 , and approximate the result again in Vh . The first step can be accomplished by using the adjoint of ƒ: we define X ƒ W R ! .L2 . //0 ; b 7! bi i : i2
This operator turns each vector in R into a functional on L2 . /, and due to X X X hƒ b; vi D bi hi ; vi D bi i .v/ D bi .ƒv/i D hb; ƒvi2 ; i2
it is indeed the adjoint of ƒ.
i2
i2
378
9 Application to elliptic partial differential operators
If the vector b is given by (9.4) for a right-hand side functional f 2 H 1 . /, the projection property (9.16) implies X .ƒ b/.'j / D bi i .'j / D bj D f .'j / for all j 2 ; b 2 R ; i2
therefore the functional ƒ b and the original right-hand side f of (9.1) yield the same Galerkin approximation un . We have to prove that ƒ is a bounded mapping with respect to the correct norms. In order to do so, we first have to prove that I is L2 -stable. The shape-regularity of the mesh implies that there is a constant Csr 2 N such that #fj 2 W supp 'i \ supp 'j ¤ ;g Csr
for all i 2
holds. Since the local finite element spaces are finite-dimensional, there is also a constant Cov 2 N such that #fj 2 W 'j .x/ ¤ 0g Cov
for all x 2 :
Using these estimates, we can not only prove the well-established global L2 -stability of the interpolation operator I, but also localized counterparts on subdomains corresponding to clusters: p Lemma 9.8 (Stability). Let Ccl ´ Ccs Csr Cov . For all tO and all with supp 'i
for all i 2 tO
we define the local interpolation operator I t W L2 . / ! Vn ;
v 7!
X
i .v/'i :
i2tO
Then we have kI t vkL2 ./ Ccl kvkL2 . /
for all v 2 L2 . /:
(9.20)
In particular, the interpolation operator I is L2 -stable, i.e., satisfies kIvkL2 ./ Ccl kvkL2 ./
for all v 2 L2 . /:
Proof. Let t 2 T and v 2 L2 . t /. We define the functions i and ij by ´ 1 if 'i .x/ ¤ 0; i W ! N; x 7! for all i 2 ; 0 otherwise; ´ 1 if 'i .x/ ¤ 0; 'j .x/ ¤ 0; ij W ! N; x 7! for all i; j 2 0 otherwise
(9.21)
9.3 Approximation of matrix blocks
and observe X i .x/ Cov ;
X
i2
ij .x/ D
i2
X
379
for all j 2 ; x 2 :
j i .x/ Csr
i2
Combining these estimates with Cauchy’s inequality yields Z X 2 X 2 2 i .v/'i 2 D i .v/'i .x/ dx kI t vkL2 ./ D L ./
O
D
t Z i2X X
D
i2tO
i .v/j .v/'i .x/'j .x/ dx
i2tO j 2tO
Z XX
ij .x/i .v/j .v/'i .x/'j .x/ dx
i2tO j 2tO
Z XX
ij .x/i .v/2 'i .x/2
1=2
i2tO j 2tO
XX
ij .x/j .v/2 'j .x/2
1=2 dx
i2tO j 2tO
D
Z XX
D Csr
ij .x/i .v/2 'i .x/2 dx Csr
i2tO j 2tO
X
XZ i2tO
i .v/2 'i .x/2 dx
2 ki .v/'i kL 2 ./ :
i2tO
Now we apply the local stability estimate (9.17) to get 2 kI t vkL 2 ./
Csr Ccs2
X i2tO
D Csr Ccs2 D
2 kvkL 2 .supp ' / i
XZ
D
Csr Ccs2
XZ i2tO
v.x/2 dx supp 'i
Z
i .x/v.x/2 dx Csr Cov Ccs2
i2tO 2 Csr Cov Ccs2 kvkL 2 . /
v.x/2 dx
This is equivalent to (9.20). For tO ´ and ´ , this estimate implies (9.21). We are interested in bounding ƒ b by a norm of the vector b, so we need a connection between coefficient vectors and the corresponding elements of Vn . Since the finite element mesh is shape-regular, a simple application of Proposition 3.1 in [38] yields that there is a positive definite diagonal matrix H 2 R satisfying Cb1 kH d=2 xk2 kˆxkL2 ./ Cb2 kH d=2 xk2
for all x 2 R ;
380
9 Application to elliptic partial differential operators
where Cb1 ; Cb2 2 R>0 are constants depending only on the shape-regularity of the mesh. Using this inequality, we can prove the necessary properties of ƒ : Lemma 9.9 (ƒ bounded and local). Let % 2 Œ0; 1 and b 2 R . We have kƒ bkH 1C% ./
Ccl kH d=2 bk2 ; Cb1
(9.22)
i.e., ƒ is a continuous mapping from R to H 1C% . /. The mapping preserves locality, i.e., it satisfies [ fsupp 'i W i 2 ; bi ¤ 0g: supp.ƒ b/
(9.23)
Proof. Let v 2 H01% . /. Let y ´ ƒv and vn ´ ˆy D Iv. By the definition of ƒ we get .ƒ b/.v/ D hb; ƒvi2 D hb; yi2 D hb; H d=2 H d=2 yi2 D hH d=2 b; H d=2 yi2 kH d=2 bk2 kH d=2 yk2 kH d=2 bk2 kH d=2 bk2 kˆyk2 D kvn kL2 ./ Cb1 Cb1 kH d=2 bk2 Ccl D kIvkL2 ./ kH d=2 bk2 kvkL2 ./ ; Cb1 Cb1
and this implies (9.22) due to kvkL2 ./ kvkH 1% ./ . Due to (9.15) and the definition of ƒ , we have X X [ [ supp.ƒ b/ D supp bi i D supp b i i supp i supp 'i : i2
i2 bi ¤0
i2 bi ¤0
i2 bi ¤0
This is the desired inclusion. This result allows us to switch from the vector b corresponding to the discrete setting to the functional f of the variational setting. In the variational setting, we can apply Theorem 9.5 to construct the desired approximation of the solution, then we have to switch back to the discrete setting. Unfortunately, we cannot use the Galerkin projection to perform this last step, which would be the natural choice considering that we want to approximate un , since it is a global operator and the approximation result only holds for a subdomain. Therefore we have to rely on the Clément-type interpolation operator again, which has the desired locality property. Using interpolation instead of the Galerkin projection leads to a second discrete approximation of L1 , given by the matrix S D ƒL1 ƒ 2 R :
9.3 Approximation of matrix blocks
381
If we let b 2 R , f ´ ƒ b, u ´ L1 f and uQ n ´ Iu, we observe ˆS b D uQ n , i.e., S provides us with the coefficients of the Clément-type approximation of the solution operator. Since I is L2 -stable (cf. (9.21)) and a projection (cf. (9.19)), we have the estimate kuQ n ukL2 ./ D kIu ukL2 ./ D kIu Ivn C vn ukL2 ./ D kI.u vn / C .vn u/kL2 ./ kI.vn u/kL2 ./ C kvn ukL2 ./ .Ccl C 1/kvn ukL2 ./
(9.24)
for all vn 2 Vn , i.e., uQ n is close to the best possible approximation of u with respect to the L2 -norm. In particular we can apply (9.24) to the Galerkin solution un and get kuQ n ukL2 ./ .Ccl C 1/kun ukL2 ./ ;
(9.25)
therefore uQ n will converge to the same limit as un , and its rate of convergence will be at least as good. In fact, (9.24) even implies that uQ n may converge faster than un in situations of low regularity: due to u 2 H 1 . /, we can always expect uQ n to converge at least like O.h/ with respect to the L2 -norm, where h is the maximal meshwidth. If we assume that the equation (9.1) is H 1% . /-regular, it is possible to derive a refined error estimate for the matrices S and L1 : Lemma 9.10 (Clément vs. Galerkin). Let % 2 Œ0; 1. We assume that for all functionals f 2 H 1C% . /, the solution u ´ L1 f satisfies u 2 H01C% . / and kukH 1C% ./ Crg kf kH 1C% ./ :
(9.26)
Then there is a constant Ccg 2 R>0 depending only on Crg , Ccl , Cb1 , and the shaperegularity of the mesh with kH d=2 .S L1 /bk2 Ccg h2˛ kH d=2 bk2
for all b 2 R ;
where h 2 R>0 is the maximal meshwidth. Proof. Let b 2 R , let f ´ ƒ b and u ´ L1 f . Let x ´ L1 b and un ´ ˆx. Q By definition, we have uQ n D Iu. Let xQ ´ S b and uQ n ´ ˆx. Using the standard Aubin–Nitsche lemma yields ku un kL2 ./ Can h2˛ kf kH 1C% ./ ; with a constant Can depending only on Crg and the shape-regularity parameters of the mesh. Combining this estimate with (9.25) gives us kun uQ n kL2 ./ kun ukL2 ./ C ku uQ n kL2 ./ Can .Ccl C 2/h2˛ kf kH 1C˛ ./ :
382
9 Application to elliptic partial differential operators
We observe 1 kˆS b ˆL1 bkL2 ./ Cb1 1 Can .Ccl C 2/ 2˛ D kuQ n un kL2 ./ h kf kH 1C˛ ./ Cb1 Cb1 Can .Ccl C 2/ 2˛ Ccl h kH d=2 bk2 Cb1 Cb1
kH d=2 .S L1 /bk2
2 and complete the proof be setting Ccg ´ Can Ccl .Ccl C 2/=Cb1 .
Due to this result, a good approximation of S on a sufficiently fine mesh is also a good approximation of L1 , and a good approximation of S can be constructed by Theorem 9.5: we pick subsets tO; sO and subsets ; Rd with supp 'i ;
supp 'j ;
is convex
for all i 2 tO; j 2 sO :
(9.27)
Theorem 9.5 yields the following result: Theorem 9.11 (Blockwise low-rank approximation). Let 2 R>0 , q 2 .0; 1/. There are constants Cblk ; Cdim 2 R>0 depending only on , q, and the shape regularity of the mesh such that for all tO; sO and ; Rd satisfying (9.27) and the admissibility condition diam. / 2 dist.; / (9.28) and all p 2 N2 we can find a rank k 2 N with k Cdim p d C1 and matrices X t;s 2 RtOk , Y t;s 2 RsO k with /bk2 Cblk q p kHsd=2 bk2 kH td=2 .SjtOOs X t;s Y t;s
for all b 2 RsO
for H t ´ H jtOtO , Hs ´ H jsO Os , i.e., the submatrix of S corresponding to the block tO sO can be approximated by a matrix of rank k. Proof. If \ D ; or \ D ;, we have u 0 and the error estimates holds for the trivial space V D f0g. Therefore we can restrict our attention to the case \ ¤ ;, \ ¤ ;. Due to Theorem 9.5, there is a space V L2 . / with dim V Cdim p d C1 such that for all f 2 H 1 . / with supp f we can find a function v 2 V satisfying kuj vkH 1 . / Capx .dist.; /=8 C 1/q p kf kH 1 ./ with u ´ L1 f . Let b 2 RsO . We extend b to a vector bO 2 R by ´ bj if j 2 sO ; bOj ´ for all j 2 : 0 otherwise
(9.29)
9.3 Approximation of matrix blocks
383
Due to Lemma 9.9, the functional f ´ ƒ bO satisfies supp f ;
kf kH 1 ./
Ccl O 2 D Ccl kH d=2 bk2 : kH d=2 bk s Cb1 Cb1
(9.30)
Let u ´ L1 f , and let v 2 V be the local approximation introduced in (9.29). Since v approximates u only locally, we need local variants of ƒ and ˆ: ƒ t W L2 . / ! RtO ; ˆ t W RtO ! Vh ;
v 7! .i .v//i2tO ; X y 7! yi 'i : i2tO
O O D Sj O b. We let xQ ´ ƒ t u. According to the definition of S , we have xQ D .S b/j t t Os Let us now turn our attention to the local approximation of u. Due to \ ¤ ; and \ ¤ ;, we have dist.; / diam. /, and we have already seen that we can find a function v 2 V with kuj vkH 1 . / Capx .dist.; /=8 C 1/q p kf kH 1 ./ Capx .diam. /=8 C 1/q p kf kH 1 ./ :
(9.31)
We let yQ ´ ƒ t v and observe that Lemma 9.8 implies 1 1 kˆ t .xQ y/k Q L2 . / D kˆ t ƒ t .uj v/kL2 . / Cb1 Cb1 1 Ccl D kI t .uj v/kL2 . / kuj vkL2 . / : (9.32) Cb1 Cb1
kH td=2 .xQ y/k Q L2 . /
Now we can define Cblk ´ Capx .diam. /=8 C 1/
Ccl2 2 Cb1
and combining (9.30), (9.31) and (9.32) yields kH td=2 .xQ y/k Q L2 . / Cblk q p kHsd=2 bk2 : of S jtOOs : we Using this result, we can now derive the low-rank approximation X t;s Y t;s introduce the space Zh ´ fH td=2 ƒ t w W w 2 V g
and observe k ´ dim Zh dim V Cdim p d C1 for the dimension of Zh and H td=2 yQ D H td=2 ƒ t v 2 Zh . We fix an orthogonal basis of Zh , i.e., a matrix Q 2 RtOk with orthogonal columns and range Q D Zh . We define zQ ´ H t1=2 QQ H td=2 x. Q Since Q is orthogonal, QQ is the orthogonal projection onto Zh and we get hH td=2 .xQ z/; Q wi2 D hH td=2 xQ QQ H td=2 x; Q QQ wi2 D 0
384
9 Application to elliptic partial differential operators
for all w 2 Zh , , i.e., H td=2 zQ is the best approximation of H td=2 xQ in the space Zh . In Q and we get particular, H td=2 zQ is at least as good as H td=2 y, kH td=2 .xQ z/k Q 2 kH td=2 .xQ y/k Q 2 Cblk q p kHs1=2 bk2 : We let X t;s ´ H t1=2 Q 2 RtOk ;
Y t;s ´ S jtOOs H td=2 Q 2 RsO k
and conclude b; zQ D H t1=2 QQ H td=2 xQ D .H t1=2 Q/.Q H td=2 S jtOOs /b D X t;s Y t;s
which completes the proof.
9.4 Compression of the discrete solution operator In order to apply Theorem 9.11, we have to use an admissibility condition of the type (9.10). The straightforward choice for the convex set is the bounding box (cf. (3.10)), and with this choice, the admissibility condition takes the form maxfdiam.Q t /; diam.Qs /g 2 dist.Q t ; Qs / we have already encountered in (4.49). According to Corollary 6.17, finding low-rank approximations for the total cluster basis .S t / t2T (cf. Definition 6.11) corresponding to the matrix S is equivalent to finding an orthogonal nested cluster basis V D .V t / t2T such that the orthogonal projection …T ;V; S into the space of left semi-uniform matrices defined by T and V is a good approximation of S. Lemma 9.12 (Approximation of S t ). Let 2 R>0 and q 2 .0; 1/. There are constants Cblk ; Cdim 2 R>0 depending only on , q, and the shape regularity of the mesh such that for all t 2 T and all p 2 N2 we can find a rank k 2 N with k Cdim p d C1 and matrices X t 2 RtOk ; Y t 2 Rk with kH d=2 .S t X t Y t /bk2 Cblk q p kH d=2 bk
for all b 2 R :
Proof. Let t 2 T . The matrix S t of the total cluster basis of S is given by X St D t Ss ; s2row .t/
i.e., it corresponds to the restriction of S to the rows in tO and the columns in [ Nt ´ sO : s2row .t/
9.4 Compression of the discrete solution operator
385
Due to definition (cf. Lemma 5.7) we can find a t C 2 pred.t / for each s 2 row .t / such that .t C ; s/ is an admissible leaf of T , i.e., .t C ; s/ 2 LC . Since we are using the admissibility condition (4.49), this implies diam.Q t / diam.Q t C / 2 dist.Q t C ; Qs / 2 dist.Q t ; Qs /: We let ´ Qt ;
´
[
Qs
s2row .t/
and conclude that is convex, that supp 'i ;
supp 'j
hold for all i 2 tO; j 2 N t ;
and that diam./ D diam.Q t / minf2 dist.Q t ; Qs / W s 2 row .t /g D 2 dist.; / holds, i.e., and satisfy the requirements of Theorem 9.11. Applying the theorem and extending the resulting rank-k-approximation of S jtON t by zero completes the proof. Theorem 9.13 (Approximation of S). Let 2 R>0 and q 2 .0; 1/. There are constants Cblk ; Cdim 2 R>0 depending only on , q, and the shape regularity of the mesh such that for all p 2 N2 we can find a nested cluster basis V D .V t / t2T with a rank distribution .L t / t2T satisfying #L t Cdim p d C1 and z 2 Cyblk .p C 1/pc q p kH d=2 bk2 kH d=2 .S S/bk
for all b 2 R ;
where Sz 2 H 2 .T ; V; V / is an H 2 -matrix and p is the depth of T . Proof. Let Cblk and Cdim be defined as in Lemma 9.12. We let Sy ´ H d=2 SH d=2 and denote its total cluster basis by .Syt / t2T . Replacing b by H 1=2 bO in this lemma yields O 2 Cblk q p kbk O 2 /H d=2 bk kH d=2 .S t X t;s Y t;s
for all bO 2 R ;
i.e., the matrices Syt can be approximated by rank Cdim p d C1 up to an accuracy of Cblk q p . We apply Corollary 6.18 to prove that there exists a nested orthogonal cluster basis Q D .Q t / t2T with rank distribution L D .L t / t2T satisfying #L t Cdim p d C1 and X 2 2p 2 y 22 Cblk q Cblk .#T /q 2p : (9.33) kSy …T ;Q; ŒSk t2T
This provides us with an estimate for the row cluster basis.
386
9 Application to elliptic partial differential operators
Since L is self-adjoint, we have L D L and L1 D .L1 / and conclude S D .ƒL1 ƒ / D .ƒ / .L1 / ƒ D ƒL1 ƒ D S; therefore we can use Q also as a column cluster basis. Restricting (9.33) to admissible blocks b D .t; s/ 2 T yields p y s k2 Cblk #T q p for all b D .t; s/ 2 LC : k t Sys Q t Qt S We introduce the cluster basis V D .V t / t2T by V t ´ H d=2 Q t define the approximation of S by X Sz ´ t Ss C bD.t;s/2L
for all t 2 T ; X
V t Qt SyQs Vs
bD.t;s/2LC
and get z s /H d=2 k22 D kH d=2 . t Ss V t Qt SyQs Vs /H d=2 k22 kH d=2 . t .S S/ y s Q t Qt SQ y s Qs k22 D k t S y s Q t Qt S y s k22 C kQ t Qt .Sys SyQs Qs /k22 D k t S y s Q t Qt S y s k22 C ks Sy t Qs Qs Sy t k22 k t S 2 2Cblk .#T /q 2p
for all b D .t; s/ 2 LC :
We use Theorem 4.47 to get kH d=2 .S Sz/H d=2 k2 Csp
p X
Cblk
p p 2#T q p Cblk Csp .p C 1/ 2#T q p ;
`D0
p
and setting Cyblk ´ Csp Cblk 2 and substituting b for bO completes the proof.
Chapter 10
Applications
We have considered methods for approximating integral operators in the Chapters 2 and 4, we have discussed methods for reducing the storage requirements of these approximations in the Chapters 5 and 6, and we have investigated techniques for performing algebraic operations with H 2 -matrices in the Chapters 7 and 8. Using the tools provided by these chapters, we can now consider practical applications of H 2 -matrices. Due to their special structure, H 2 -matrix techniques are very well suited for problems related to elliptic partial differential equations, and we restrict our attention to applications of this kind: • In Section 10.1, we consider a simple boundary integral equation related to Poisson’s equation on a bounded or unbounded domain. • In Section 10.2, we approximate the operator mapping Dirichlet to Neumann boundary values of Poisson’s equation. • Section 10.3 demonstrates that the approximative arithmetic operations for H 2 matrices can be used to construct efficient preconditioners for boundary integral equations. • Section 10.4 shows that our techniques also work for non-academic examples. • Section 10.5 is devoted to the construction of solution operators for elliptic partial differential equations. These operators are non-local and share many properties of the integral operators considered here, but they cannot be approximated by systems of polynomials. Our goal in this chapter is not to prove theoretical estimates for the accuracy or the complexity, but to demonstrate that the H 2 -matrix techniques work in practice and to provide “rules of thumb” for choosing the parameters involved in setting up the necessary algorithms. Assumptions in this chapter: We assume that cluster trees T and TJ for the finite index sets and J are given. Let T be an admissible block cluster tree for T . Let p and pJ be the depths of T and TJ .
388
10 Applications
10.1 Indirect boundary integral equation Let R3 be a bounded Lipschitz domain. We consider Laplace’s equation u ´
3 X
@2i u D 0
(10.1a)
iD1
with Dirichlet-type boundary conditions uj D uD
(10.1b)
for the solution u 2 H 1 . / and the Dirichlet boundary values uD 2 H 1=2 ./.
Boundary integral equation According to [92], Subsection 4.1.1, we can solve the problem (10.1) by finding a density function f 2 H 1=2 ./ satisfying Symm’s integral equation Z f .y/ dy D uD .x/ for almost all x 2 : (10.2)
4kx yk2 Using this f , the solution u 2 H 1 . / is given by Z f .y/ u.x/ D dy for almost all x 2 : 4kx yk2
We introduce the single layer potential operator Z V W H 1=2 ./ ! H 1=2 ./; f 7! x 7!
f .y/ dy ; 4kx yk2
and note that (10.2) takes the form V f D uD :
(10.3)
We multiply both sides of the equation by test functions v 2 H 1=2 ./, integrate, and see that (10.3) is equivalent to the variational equation aV .v; f / D hv; uD i
for all v 2 H 1=2 ./
with the bilinear form aV W H 1=2 ./ H 1=2 ./ ! R; Due to [92], Satz 3.5.3, aV is H tion.
1=2
Z .v; f / 7!
Z v.x/
(10.4)
f .y/ dy dx: 4kx yk2
./-coercive, therefore (10.4) has a unique solu-
10.1 Indirect boundary integral equation
389
Discretization In order to discretize (10.4) by Galerkin’s method, we assume that can be represented by a conforming triangulation G ´ fi W i 2 g (cf. Definition 4.1.2 in [92]) and introduce the family .'i /i2 of piecewise constant basis functions given by ´ 1 if x 2 i ; 'i .x/ ´ for all i 2 ; x 2 : 0 otherwise Applying Galerkin’s method to the bilinear variational equation (10.4) in the finitedimensional space S0 ´ spanf'i W i 2 g H 1=2 ./ leads to the system of linear equations Vx D b for the matrix V 2 R and the right-hand side vector b 2 R defined by Z Z 'j .y/ Vij ´ 'i .x/ dy dx 4kx yk2
Z Z 1 D dy dx for all i; j 2 ; 4kx yk2 Zi j uD .x/ dx for all i 2 : bi ´
(10.5)
(10.6)
i
The solution vector x 2 R of (10.5) defines the approximation X fh ´ xi 'i i2
of the solution f 2 H 1=2 ./.
Approximation of the stiffness matrix We can see that Vij > 0 holds for all i; j 2 , therefore the matrix V is densely populated and cannot be handled efficiently by standard techniques. Fortunately, the matrix V 2 R fits perfectly into the framework presented in Chapter 4, i.e., we can construct an H 2 -matrix approximation Vz 2 R and solve the perturbed system Vz xQ D b:
(10.7)
390
10 Applications
The solution of the perturbed system is given by X xQ i 'i fQh ´ i2
and can be characterized by the variational equation aQ V .vh ; fQh / D hvh ; uD i
for all vh 2 S0 ;
where the perturbed bilinear form aQ V .; / is defined by aQ V .'i ; 'j / ´ Vzij
for all i; j 2 :
(10.8)
We have to ensure that approximating V by Vz , i.e., approximating a.; / by a.; Q /, does not reduce the speed of convergence of the discretization scheme. According to [92], Satz 4.1.32, we can expect kf fh k1=2; . h3=2 in the case of maximal regularity, i.e., if f 2 H 1 ./ holds, therefore our goal is to choose the accuracy of the approximation Vz in such a way that kf fQh k1=2; . h3=2 holds. Strang’s first lemma (e.g., Theorem 4.1.1 in [34] or Satz III.1.1 in [26]) yields the estimate jaV .vh ; wh / aQ V .vh ; wh /j kf fQh k1=2; . inf kf vh k1=2; C sup vh 2S0 kwh k1=2; wh 2S0 if aQ V is H 1=2 ./-elliptic, i.e., if aQ V is “close enough” to the elliptic bilinear form aV . We choose vh D fh in order to bound the first term on the right-hand side and now have to prove that the second term is also bounded. The latter term is related to the approximation error introduced by using Vz instead of V . In order to bound it, we assume that the triangulation G is shape-regular and quasiuniform with grid parameter h 2 R>0 . Due to the inverse estimate [38], Theorem 4.6, we have h1=2 kwh k0; . kwh k1=2; and conclude
for all wh 2 S0
jaV .vh ; wh / aQ V .vh ; wh /j 1=2 sup kf vh k1=2; Ch : vh 2S0 kwh k0; wh 2S0
kf fQh k1=2; . inf
We can bound the right-hand side of this estimate by using vh D fh and the following estimate of the approximation error:
391
10.1 Indirect boundary integral equation
Lemma 10.1 (Accuracy of aQ V .; /). We have jaV .vh ; wh / aQ V .vh ; wh /j . hd kV Vz k2 kvh k0; kwh k0;
for all vh ; wh 2 S0 :
Proof. Let x; y 2 R be the coefficient vectors of vh and wh satisfying X X vh D yj 'j : xi 'i ; wh D j 2
i2
We have
ˇXX ˇ ˇ ˇ jaV .vh ; wh / aQ V .vh ; wh /j D ˇ xi yj .aV .'i ; 'j / aQ V .'i ; 'j //ˇ i2 j 2J
ˇXX ˇ ˇ ˇ Dˇ xi yj .Vij Vzij /ˇ i2 j 2J
D jhx; .V Vz /yi2 j kV Vz k2 kxk2 kyk2 : Due to [38], Proposition 3.1, we find hd =2 kxk2 . kvh k0; ;
hd =2 kyk2 . kwh k0; ;
and combining these estimates completes the proof. Applying this result to the estimate from Strang’s lemma yields kf fQh k1=2; . kf fh k1=2; C hd 1=2 kV Vz k2 kfh k0; if kV Vz k2 is small enough to ensure that Vz is still positive definite. Since fh converges to f if the mesh size decreases, we can assume that kfh k0; is bounded by a constant and get kf fQh k1=2; . kf fh k1=2; C hd 1=2 kV Vz k2 : If we can ensure kV Vz k2 . hd C2 ;
(10.9)
we get kf fQh k1=2; . h3=2 ; which yields the desired optimal order of convergence. Combining the error estimate provided by Theorem 4.49 with the simplified interpolation error estimate of Remark 4.23 yields kV Vz k2 . Cov C2 q m
p X `D0
² max
³ j t j .`/ W t 2 T : diam.Q t /
(10.10)
392
10 Applications
The piecewise constant basis functions satisfy Cov D 1 and C . hd =2 . The kernel function defining V has a singularity of order D 1, therefore we can assume j t j . diam.Q t /d D diam.Q t /: diam.Q t / This means that the sum in estimate (10.10) can be bounded by a geometric sum dominated by the large clusters (cf. Remark 4.50), and these clusters can be bounded independently of h. We conclude kV Vz k2 . hd q m : In order to guarantee (10.9), we have to choose the order m 2 N large enough to yield q m . h2 , i.e., log.h/= log.q/ . m.
Experiments According to our experiments (cf. Table 4.2), increasing the interpolation order by one if the mesh width h is halved should be sufficient to ensure q m . h2 and therefore kf fQh k1=2; . h3=2 . We test the method for the approximations of the unit sphere S used in the previous chapters and the harmonic functions u1 W R3 ! R;
x 7! x1 C x2 C x3 ;
u2 W R3 ! R;
x 7! x12 x32 ; 1 x 7! kx x0 k2
u3 W R3 ! R;
for x0 D .6=5; 6=5; 6=5/
by computing the corresponding densities f1;h , f2;h and f3;h and comparing Z fi;h .y/ ui;h .x/ ´ dy for all i 2 f1; 2; 3g
4kx yk2 to the real value of the harmonic function at the test point xO ´ .1=2; 1=2; 1=2/. Due to the results in Chapter 12 of [97], we expect the error to behave like i ´ jui .x/ O ui;h .x/j O . h3
for all i 2 f1; 2; 3g:
In a first series of experiments, we use interpolation (cf. Section 4.4) to create an initial approximation of V , and then apply Algorithm 30 with blockwise control of the relative spectral error (cf. Section 6.8) for an error tolerance O h2 . In order to reduce the complexity, we use an incomplete cross approximation [104] of the coupling matrices Sb , which can be computed efficiently by theACA algorithm [4]. The nearfield matrices are constructed using the black-box quadrature scheme described in [89], [43]. The
393
10.1 Indirect boundary integral equation
resulting system of linear equations is solved approximately by the conjugate gradient method [99] with a suitable preconditioner (cf. [50]). The parameters used for the interpolation and the recompression are given in Table 10.1: as usual, n gives the number of degrees of freedom, in this case the number of triangles of the surface approximation, m is the order of the initial interpolation and O is the tolerance used by the recompression algorithm. Table 10.1. Solving Poisson’s equation using an indirect boundary element formulation. The matrix is constructed by applying the recompression Algorithm 30 to an initial H 2 -matrix approximation constructed by interpolation.
n m O 512 3 23 2048 4 54 8192 5 14 32768 6 25 131072 7 56 524288 8 16 100000
Build Mem Mem=n 1 3:1 1:3 2:7 6:64 21:7 8:2 4:1 1:84 152:1 45:3 5:7 2:76 1027:8 238:1 7:4 2:97 6950:0 1160:9 9:1 1:27 67119:0 5583:8 10:9 4:08
2 2:74 2:35 2:36 2:67 3:08 4:69
0.14
Build O(n log(n)^4)
3 1:84 1:35 8:06 9:07 8:48 9:610
Build/n O(log(n)^4)
0.12
10000 0.1 1000
0.08 0.06
100
0.04 10 0.02 1 100
1000
10000
10000
100000
1e+06
0 100
1000
10000
0.001
Memory O(n log(n)^2)
1e+06
eps1 eps2 eps3 O(h^3)
0.0001
1000
100000
1e-05 1e-06
100 1e-07 1e-08
10
1e-09 1 100
1000
10000
100000
1e+06
1e-10 100
1000
10000
100000
1e+06
We know that a local rank of k log2 .n/ is sufficient to ensure (10.9), since multipole expansions require this rank. Due to the best-approximation property of Algorithm 25, and therefore Algorithm 30, we expect the compressed H 2 -matrix to
394
10 Applications
have the same rank, and this expectation is indeed justified by Table 10.1: the time required for the matrix construction is proportional to n log4 .n/, i.e., nk 2 for k log2 .n/, and the storage requirements are proportional to n log2 .n/, i.e., nk for our estimate of k. The surprisingly long computing time for the last experiment (n D 524288) may be attributed to the NUMA architecture of the computer: although the resulting matrix requires less than 6 GB, Algorithm 30 needs almost 40 GB for weight matrices of the original cluster bases. Since this amount of storage is not available on a single processor board of the computer, some memory accesses have to use the backplane, and the bandwidth of the backplane is significantly lower than that of the local storage. The high temporary storage requirements are caused by the high rank of the initial approximation constructed by interpolation: for m D 8, the initial rank is k D m3 D 512, and using two 512 512-matrices for each of the 23855 clusters requires a very large amount of temporary storage. In practical applications, it can even happen that the auxiliary matrices require far more storage than the resulting compact H 2 -matrix approximation (it is not uncommon to see that the matrices .RX;s /s2TJ require more than two times the storage of the final H 2 -matrix, and the ratio grows as the order m becomes larger). Therefore we look for an alternative approach providing a lower rank for the initial approximation. A good choice is the hybrid cross approximation technique [17], which constructs blockwise low-rank approximations of the matrix, thus providing an initial H -matrix approximation. We can apply the recompression Algorithm 33 in order to convert the low-rank blocks into H 2 -matrices as soon as they become available. This allows us to take advantage of both the fast convergence of the HCA approximation and the low storage requirements of the resulting H 2 -matrix. The results of this approach are listed in Table 10.2. We use a modified HCA implementation by Lars Grasedyck that is able to increase the order m according to an internal heuristic depending on the prescribed accuracy O . The resulting orders are given in the column “mad ”. We can see that this approach yields results similar to those provided by Algorithm 30, but that the reduction of the amount of temporary storage seems to lead to an execution time proportional to n log5 .n/ instead of the time proportional to n log4 .n/ that we could observe for Algorithm 30. Remark 10.2 (Symmetry). In the case investigated here, the matrix V is symmetric and can therefore be approximated by a symmetric H 2 -matrix Vz . Instead of storing the entire matrix, it would be sufficient to store the lower or upper half, thus reducing the storage and time requirements by almost 50 percent.
395
10.2 Direct boundary integral equation
Table 10.2. Solving Poisson’s equation using an indirect boundary element formulation. The matrix is constructed by applying the recompression Algorithm 33 to an initial H -matrix approximation constructed by the HCA method.
n 512 2048 8192 32768 131072 524288
O 23 54 14 25 56 16
1e+06
Mem Mem=n 1 mad Build 4 7:1 1:6 3:2 4:24 5 35:4 8:1 4:0 5:76 7 298:3 45:6 5:7 5:37 9 2114:5 240:9 7:5 1:47 11 12774:5 1170:0 9:1 4:18 14 91501:2 5507:3 10:8 4:08
2 8:06 8:16 8:97 1:17 3:88 4:810
0.2
Build O(n log(n)^5)
Build/n O(log(n)^5)
0.18
100000
3 1:74 6:65 8:06 9:27 8:88 1:49
0.16 0.14
10000
0.12 1000
0.1 0.08
100
0.06 0.04
10
0.02 1 100
1000
10000
10000
100000
1e+06
0 100
1000
10000
0.001
Memory O(n log(n)^2)
1e+06
eps1 eps2 eps3 O(h^3)
0.0001
1000
100000
1e-05 1e-06
100 1e-07 1e-08
10
1e-09 1 100
1000
10000
100000
1e+06
1e-10 100
1000
10000
100000
1e+06
10.2 Direct boundary integral equation Now we consider a refined boundary integral approach for solving Laplace’s equation (10.1) with Dirichlet boundary values. Instead of working with the density function f , we now want to compute the Neumann values @u uN W ! R; x 7! .x/; @n corresponding to the Dirichlet values uD D uj of the harmonic function u directly.
396
10 Applications
According to [97], Kapitel 12, both functions are connected by the variational equation 1 aV .v; uN / D aK .v; uD / C hv; uD i0; 2
for all v 2 H 1=2 ./;
(10.11)
where the bilinear form aK corresponding to the double layer potential is given by Z Z hx y; n.y/iu.y/ 1=2 1=2 aK W H ./ H ./ ! R; .v; u/ 7! v.x/ dy dx: 4kx yk32
As in the previous example, existence and uniqueness of uN 2 H 1=2 ./ is guaranteed since aV is H 1=2 ./-coercive.
Discretization Usually we cannot compute the right-hand side of (10.11) explicitly, instead we rely on a discrete approximation: we introduce the family .xj /j 2J of vertices of the triangulation G and the family . j /j 2J of corresponding continuous piecewise linear basis functions given by ´ 1 if i D k; for all j; k 2 J; j .xk / D 0 otherwise j j i
for all i 2 ; j 2 J:
is affine
We approximate uD by its L2 -orthogonal projection uD;h into the space S1 ´ spanf
j
W j 2 Jg H 1=2 ./:
This projection can be computed by solving the variational equation hvh ; uD;h i0; D hvh ; uD i0;
for all vh 2 S1 ;
(10.12)
and the corresponding system of linear equations is well-conditioned and can be solved efficiently by the conjugate gradient method. Now we proceed as before: we replace the infinite-dimensional space H 1=2 ./ by the finite-dimensional space S0 and the exact Dirichlet value uD by uD;h and get the system of linear equations 1 Vx D K C M y (10.13) 2 for the matrix V 2 R already introduced in (10.6), the matrix K 2 RJ defined by Z Z hx y; n.y/i2 j .y/ Kij ´ 'i .x/ dy dx for all i 2 ; j 2 J; 4kx yk32
397
10.2 Direct boundary integral equation
the mass matrix M 2 RJ defined by Z 'i .x/ j .y/ dy dx Mij ´
for all i 2 ; j 2 J;
and the coefficient vector y 2 RJ corresponding to uD;h , which is defined by X yj j : uD;h D j 2J
Since aV is H 1=2 ./-coercive, the matrix V is symmetric and positive definite, so (10.13) has a unique solution x 2 R which corresponds to the Galerkin approximation X xi 'i uN;h ´ i2
of the Neumann boundary values uN .
Approximation of the matrices The mass matrix M and the matrix corresponding to (10.12) are sparse and can be computed exactly by standard quadrature. The matrices V and K are not sparse, therefore we have to replace them by efficient approximations. For V we can use interpolation (cf. Section 4.4), while for K the equation hx y; n.y/i2 @ 1 D 3 @n.y/ 4kx yk2 4kx yk2 suggests using derivatives of interpolants (cf. Section 4.5). Replacing V and K by their respective H 2 -matrix approximations Vz and Kz leads to the perturbed system
1 Vz xQ D Kz C M y: 2
(10.14)
The solution of the corresponding perturbed system corresponds to the function X uQ N;h ´ xQ i 'i i2
and can be characterized by the variational equation 1 aQ V .vh ; uQ N;h / D aQ K .vh ; uD;h / C hvh ; uD;h i0; 2
for all vh 2 S0
for the perturbed bilinear forms aQ V defined by (10.8) and aQ K defined by aQ K .'i ;
j/
D Kzij
for all i 2 ; j 2 J:
(10.15)
398
10 Applications
Strang’s first lemma (e.g., Satz III.1.1 in [26]) yields kuN uQ N;h k1=2; . inf kuN vh k1=2; vh 2S0
C sup wh 2S0
C sup wh 2S0
jaV .vh ; wh / aQ V .vh ; wh /j kwh k1=2;
Q h /j j.wh / .w kwh k1=2;
for the right-hand side functional 1 .wh / ´ aK .wh ; uD / C hwh ; uD i0; 2
for all wh 2 S0
and its perturbed counterpart Q h / ´ aQ K .wh ; uD;h / C 1 hwh ; uD;h i0; for all wh 2 S0 : .w 2 We have already given estimates for the first two terms of the error estimate in the previous section, so we now only have to find a suitable bound for the difference of the right-hand sides. To this end, we separate the approximation of uD and the approximation of aK : Q h / D aK .wh ; uD / aQ K .wh ; uD;h / C 1 hwh ; uD uD;h i0; .wh / .w 2 D aK .wh ; uD uD;h / C aK .wh ; uD;h / 1 aQ K .wh ; uD;h / C hwh ; uD uD;h i0; : 2 Since uD;h is computed by the L2 -projection, we have kuD uD;h k1=2; . h3=2 according to [97], equation (12.19), in the case of optimal regularity uD 2 H 2 ./. This implies hwh ; uD uD;h i0; kwh k1=2; kuD uD;h k1=2; . h3=2 kwh k1=2; : According to Satz 6.11 in [97], we have aK .wh ; uD uD;h / . kwh k1=2; kuD uD;h k1=2; . h3=2 kwh k1=2; : For the perturbed bilinear form, combining the technique used in the proof of Lemma 10.1 with the inverse estimate ([38], Theorem 4.6) yields z 2 kwh k0; kuD;h k0; jaK .wh ; uD;h / aQ K .wh ; uD;h /j . hd kK Kk d 1=2 z 2 kwh k1=2; kuD;h k0; : .h kK Kk
10.2 Direct boundary integral equation
399
Due to uD 2 H 1=2 ./ L2 ./, we can assume that kuD;h k0; is bounded independently of h and conclude Q h /j . .hd 1=2 kK Kk z 2 C h3=2 /kwh k1=2; ; j.wh / .w i.e., if we can ensure
z 2 . hd C2 ; kK Kk
(10.16)
we get kuN uQ N;h k1=2; . h3=2 : Since the kernel function corresponding to K is based on first derivatives of a kernel function with singularity order D 1, Corollary 4.32 yields an approximation error of kg gk Q 1; t s .
qm dist.Q t ; Qs /2
for all b D .t; s/ 2 LC J :
Assuming again j t j . diam.Q t /d ;
j s j . diam.Qs /d
for all t 2 T ; s 2 TJ
and applying Theorem 4.49 gives us z 2 . Cov C CJ q m maxfp ; pJ g: kK Kk
(10.17)
For our choice of basis functions, we have Cov D 3, C ; CJ . hd =2 and get z 2 . hd q m maxfp ; pJ g: kK Kk In order to guarantee (10.16), we have to choose the order m large enough to yield q m maxfp ; pJ g . h2 .
Experiments We can assume p . log n and pJ . log nJ , therefore our previous experiments (cf. Table 4.4) suggest that increasing the interpolation order by one if the mesh width h is halved should be sufficient to ensure q m maxfp ; pJ g . h2 and therefore the desired error bound kuN uQ N;h k1=2; . h3=2 . We once more test the method for the unit sphere and the harmonic functions u1 , u2 and u3 introduced in the previous section. For these functions, the Neumann boundary values u1;N , u2;N and u3;N can be computed and compared to their discrete approximations u1;N;h , u2;N;h and u3;N;h . Since computing a sufficiently accurate approximation of the norm of H 1=2 ./ is too complicated, we measure the L2 -norm instead. According to the inverse estimate [38], Theorem 4.6, we can expect i ´ kuN;i uN;i;h k0; . h
for all i 2 f1; 2; 3g:
400
10 Applications
The results in Table 10.3 match our expectations: the errors decrease at least like h, the faster convergence of 1 is due to the fact that uN;1 can be represented exactly by the discrete space, therefore the discretization error is zero and the error 1 is only caused by the quadrature and the matrix approximation, and its fast convergence is owed to the fact that we choose the error tolerance O slightly smaller than strictly necessary. Table 10.3. Solving Poisson’s equation using a direct boundary element formulation. The matrix V is constructed by applying the recompression Algorithm 30 to an initial H 2 -matrix approximation constructed by interpolation. The matrix K is approximated on the fly by interpolation and not stored.
n m O 512 3 23 2048 4 54 8192 5 14 32768 6 25 131072 7 56 524288 8 16 100000
Build Mem Mem=n 1 3:1 1:3 2:7 3:22 21:7 8:2 4:1 1:12 152:1 45:3 5:7 4:43 1027:8 238:1 7:4 1:23 6950:0 1160:9 9:1 4:64 67119:0 5583:8 10:9 1:44
2 2:51 1:21 6:22 3:12 1:52 7:73
0.14
Bld O(n log(n)^4)
3 5:12 2:32 1:52 5:63 2:83 1:43
Bld/n O(log(n)^4)
0.12
10000 0.1 1000
0.08 0.06
100
0.04 10 0.02 1 100
1000
10000
10000
100000
1e+06
0 100
Mem O(n log(n)^2)
1000
0.1
100
0.01
10
0.001
1 100
1000
10000
100000
1000
10000
1
1e+06
0.0001 100
100000
1e+06
eps1 eps2 eps3 O(h)
1000
10000
100000
1e+06
10.3 Preconditioners for integral equations
401
10.3 Preconditioners for integral equations According to Lemma 4.5.1 in [92], the condition number of the matrix V can be expected to behave like h1 , i.e., the linear systems (10.5) and (10.13) become illconditioned if the resolution of the discrete space grows. For moderate problem dimensions, this poses no problem, since the conjugate gradient method used for solving the linear systems is still quite fast, especially if V is approximated by an H 2 -matrix and Algorithm 8 is used to perform the matrix-vector multiplication required in one step of the solver. For large problem dimensions, the situation changes: due to the growing condition number, a large number of matrix-vector multiplications are required by the solver, and each of these multiplications requires an amount of time that is not negligible.
Preconditioned conjugate gradient method If we still want to be able to solve the linear systems efficiently, we have to either speed up the matrix-vector multiplication or reduce the number of these multiplications required by the solver. The time per matrix-vector multiplication can be reduced by compressing the matrix as far as possible, e.g., by applying the Algorithms 30 or 33, since reducing the storage required for the H 2 -matrix also reduces the time needed to process it, or by parallelizing Algorithm 8. The number of multiplications required by the conjugate gradient method can be reduced by using a preconditioner, i.e., by solving the system Vy xO D bO for the transformed quantities Vy ´ P 1=2 VP 1=2 ;
xO ´ P 1=2 x;
bO ´ P 1=2 b;
where P 2 R is a symmetric positive definite matrix, called the preconditioning matrix for V . The idea is to construct a matrix P which is, in a suitable sense, “close” to V , since then we can expect Vy to be “close” to the identity, i.e., to be well-conditioned. The matrix P can be constructed by a number of techniques: wavelet methods [74], [40], [36] apply a basis transformation and suitable diagonal scaling, the matrix P can be constructed by discretizing a suitable pseudo-differential operator [98], but we can also simply rely on H 2 -matrix arithmetic operations introduced in Chapter 8 by computing an approximative LU factorization of V .
402
10 Applications
LU decomposition In order to do so, let us first consider a suitable recursive algorithm for the computation of the exact LU factorization of V . For all t; s 2 T , we denote the submatrix corresponding to the block .t; s/ by V t;s ´ V jtOOs . Let t 2 T be a cluster. The LU factorization of V t;t can be computed by a recursive algorithm: if t is a leaf cluster, the admissibility condition (4.49) and, in fact, all other admissibility conditions we have introduced, implies that .t; t / 2 L is an inadmissible leaf of the block cluster tree T . Therefore it is stored in one of the standard representations for densely populated matrices and we can compute the LU factorization of V t;t by standard algorithms. If t is not a leaf cluster, the admissibility condition guarantees that .t; t / is inadmissible, i.e., that V t;t is a block matrix. We consider only the simple case # sons.t / D 2, the general case # sons.t / 2 N can be handled by induction. Let sons.t / D ft1 ; t2 g. We have V t1 ;t1 V t1 ;t2 V t;t D V t2 ;t1 V t2 ;t2 and the LU factorization V t;t D L.t/ U .t/ takes the form ! L.t/ U t.t/ V t1 ;t1 V t1 ;t2 t1 ;t1 1 ;t1 D .t/ V t2 ;t1 V t2 ;t2 L.t/ L t2 ;t1 t2 ;t2
! U t.t/ 1 ;t2 ; U t.t/ 2 ;t2
which is equivalent to the four equations .t/ V t1 ;t1 D L.t/ t1 ;t1 U t1 ;t1 ;
V t1 ;t2 D V t2 ;t1 D V t2 ;t2 D
.t/ L.t/ t1 ;t1 U t1 ;t2 ; .t/ L.t/ t2 ;t1 U t1 ;t1 ; .t/ L.t/ t2 ;t1 U t1 ;t2 C
(10.18a) (10.18b) (10.18c) .t/ L.t/ t2 ;t2 U t2 ;t2 :
(10.18d)
According to (10.18a), we compute L.t1 / and U .t1 / by recursively constructing the LU .t1 / factorization V t1 ;t1 D L.t1 / U .t1 / of the submatrix V t1 ;t1 and setting L.t/ t1 ;t1 ´ L and U t.t/ ´ U .t1 / . 1 ;t1 .t/ .t/ .t/ Once we have L.t/ t1 ;t1 and U t1 ;t1 , we compute U t1 ;t2 and L t2 ;t1 by using equations (10.18b) and (10.18c). Then equation (10.18d) yields .t/ .t/ .t/ V t2 ;t2 L.t/ t2 ;t1 U t1 ;t2 D L t2 ;t2 U t2 ;t2 ; .t/ and we compute the left-hand side by constructing V t02 ;t2 ´ V t2 ;t2 L.t/ t2 ;t1 U t1 ;t2 , using .t2 / a recursion to find its LU factorization V t02 ;t2 D L.t2 / U .t2 / , and setting L.t/ t2 ;t2 ´ L
and U t.t/ ´ U .t2 / . 2 ;t2
10.3 Preconditioners for integral equations
403
Algorithm 55. LU decomposition. procedure LUDecomposition(t, X , var L, U ); if sons.t/ D ; then Compute factorization X D LU else # sons.t /; ft1 ; : : : ; t g sons.t /; for i; j 2 f1; : : : ; g do Xij X jtOi tOj end for; for i D 1 to do LUDecomposition(ti , Xi i , var Li i , Ui i ); for j 2 fi C 1; : : : ; g do Uij Xij ; fAlgorithm 56g LowerBlockSolve(ti , Li i , Uij ); Lj i Xj i ; UpperBlockSolve(ti , Ui i , Lj i ) fAlgorithm 57g end for; for j; k 2 fi C 1; : : : ; g do Xj k Xj k Lj i Uik fUse approximative multiplicationg end for end for; for i; j 2 f1; : : : ; g do if i > j then LjtOi ;tOj Lij ; U jtOi ;tOj 0 fUnify resulting matrices by Algorithm 32g else if i < j then LjtOi ;tOj 0; U jtOi ;tOj Uij fUnify resulting matrices by Algorithm 32g else LjtOi ;tOj Lij ; U jtOi ;tOj Uij fUnify resulting matrices by Algorithm 32g end if end for end if The equations (10.18b) and (10.18c) are solved by recursive forward substitution: for an arbitrary finite index set K and a matrix Y 2 RtOK , we can find the solution X 2 RtOK of Y D L.t/ X by again using the block representation. We let X t1 ´ XjtO1 K , X t2 ´ X jtO2 K , Y t1 ´ Y jtO1 K and Y t2 ´ Y jtO2 K and get ! L.t/ X t1 Y t1 t1 ;t1 D ; .t/ .t/ Y t2 X t2 L t2 ;t1 L t2 ;t2 which is equivalent to Y t1 D L.t/ t1 ;t1 X t1 ;
(10.19a)
404
10 Applications .t/ Y t2 D L.t/ t2 ;t1 X t1 C L t2 ;t2 X t2 :
(10.19b)
The first equation (10.19a) is solved by recursion. For the second equation (10.19b), we .t/ 0 compute the matrix Y t02 ´ Y t2 L.t/ t2 ;t1 X t1 and then solve the equation Y t2 D L t2 ;t2 X t2 by recursion. For the general case # sons.t / 2 N, the entire recursion is summarized in Algorithm 56. Algorithm 56. Forward substitution for a lower triangular block matrix. procedure LowerBlockSolve(t, L, var X ); if sons.t/ D ; then Y X; Apply standard backward substitution to solve LX D Y else # sons.t /; ft1 ; : : : ; t g sons.t /; for i; j 2 f1; : : : ; g do Lij LjtOi tOj end for; for i 2 f1; : : : ; g do Xi X jtOi K end for; for i D 1 to do LowerBlockSolve(ti , Li i , Xi ); for j 2 fi C 1; : : : ; g do Xj Xj Lj i Xi fUse approximative multiplicationg end for end for; for i 2 f1; : : : ; g do Xj Oi K Xi fUnify resulting matrix by Algorithm 32g end for end if A similar procedure can be applied to (10.18c): for a finite index set K and a matrix Y 2 RKtO , we compute the solution X 2 RKtO of Y D X U .t/ by considering the subblocks X t1 ´ X jKtO1 , X t2 ´ X jKtO2 , Y t1 ´ Y jKtO1 and Y t2 ´ Y jKtO2 and the corresponding block equation !
U t.t/;t U t.t/;t 1 1 1 2 Y t1 Y t 2 D X t1 X t2 : U t.t/ 2 ;t2 The resulting equations Y t1 D X t1 U t.t/ ; 1 ;t1 Y t2 D
X t1 U t.t/ 1 ;t2
C
(10.20a) X t2 U t.t/ 2 ;t2
(10.20b)
10.3 Preconditioners for integral equations
405
and then are solved by applying recursion to (10.20a), computing Y t02 ´ Y t2 X t1 U t.t/ 1 ;t2
. The entire recursion is given in Algorithm 57. applying recursion to Y t02 D X t2 U t.t/ 2 ;t2 Algorithm 57. Forward substitution for an upper triangular block matrix. procedure UpperBlockSolve(t, U , var X ); if sons.t/ D ; then Y X; Apply standard backward substitution to solve X U D Y else # sons.t /; ft1 ; : : : ; t g sons.t /; for i; j 2 f1; : : : ; g do Uij U jtOi tOj end for; for i 2 f1; : : : ; g do Xi X jKtOi end for; for i D 1 to do UpperBlockSolve(ti , Ui i , Xi ); for j 2 fi C 1; : : : ; g do Xj Xj Xi Uij fUse approximative multiplicationg end for end for; for i 2 f1; : : : ; g do XjK Oi Xi fUnify resulting matrix by Algorithm 32g end for end if
The Algorithms 55, 56 and 57 are given in the standard notation for dense matrices and will compute the exact LU factorization of a given matrix. An excellent preconditioner for our system V x D b would then be the matrix P D LU , since P 1 D U 1 L1 can be easily computed by forward and backward substitution and P 1=2 VP 1=2 D I means that the resulting preconditioned method would converge in one step.
Approximate H 2 -LU decomposition This strategy is only useful for small problems, i.e., in situations when the matrices can be stored in standard array representation. For large problems, we cannot treat V in this way, and we also cannot compute the factors L and U in this representation, either. Therefore we use the H 2 -matrix approximation Vz of V and try to compute H 2 -matrix z and Uz of the factors L and U . approximations L
406
10 Applications
Fortunately, switching from the original Algorithms 55, 56 and 57 to their approximative H 2 -matrix counterparts is straightforward: we replace the exact matrix-matrix multiplication by the approximative multiplication algorithms introduced in Chapter 7 or Chapter 8. This allows us to compute all submatrices Lij , Uij and Xij in the algorithms efficiently. Since we are working with H 2 -matrices, we cannot simply combine the submatrices to form L, U or X , but we have to use the unification Algorithm 32 to construct uniform row and column cluster bases for the results. z Uz Vz is a sufficiently good approximation, we can now use P ´ Assuming that L z z LU as a preconditioner if we are able to provide efficient algorithms for evaluating z D y and Uz x D y for vectors z 1 , which is equivalent to solving Lx P 1 D Uz 1 L x; y 2 R . We consider only the forward substitution, i.e., the computation of x 2 R solving Lx D y for a vector y 2 R and a lower triangular H 2 -matrix for a cluster tree T and nested cluster bases V D .V t / t2T and W D .Ws /s2T with transfer matrices .E t / t2T and .Fs /s2T , respectively. Let us first investigate the simple case # sons.t / D 2 and ft1 ; t2 g D sons.t /. As before, we let L11 ´ LjtO1 tO1 , L21 ´ LjtO2 tO1 , L22 ´ LjtO2 tO2 , x1 ´ xjtO1 , x2 ´ xjtO2 , y1 ´ yjtO1 and y2 ´ yjtO2 and observe that Lx D y is equivalent to y1 L11 x1 D ; y2 L21 L22 x2 i.e., to the two equations y1 D L11 x1 ; y2 D L21 x1 C L22 x2 : The first of these equations can be solved by recursion, for the second one we introduce y20 ´ y2 L21 x1 and solve L22 x2 D y20 , again by recursion. So far, the situation is similar to the one encountered in the discussion of Algorithm 56. Now we have to take a look at the computation of y20 . If we would use the standard matrix-vector multiplication Algorithm 8 to compute L21 x1 , the necessary forward and backward transformations would lead to a suboptimal complexity, since some coefficient vectors would be recomputed on all levels of the cluster tree. In order to reach the optimal order of complexity, we have to avoid these redundant computations. Fortunately, this is possible: in the simple case of our 2 2 block matrix, we can see that the computation of x1 requires no matrix-vector multiplication, and therefore no forward or backward transformations. The computation of L21 x1 requires the coefficient vectors xO s ´ Ws x for all s 2 sons .t1 / and computes contributions to y only in the coefficient vectors yOs for all s 2 sons .t2 /. This observation suggests a simple algorithm: we assume that y is given in the form X y D y0 C V t yO t t 2sons .t/
10.3 Preconditioners for integral equations
407
and that we have to compute xjtO and all coefficient vectors xO s ´ Ws x for s 2 sons .t/. If t is a leaf, we have sons .t / D ft g and compute y D y0 C V t yO t directly. Since t is a leaf, LjtOtO is a dense matrix and we use the standard forward substitution algorithm for dense matrices. Then we can compute xO t ´ W t x directly. If t is not a leaf, we use the equation X V t yO t D V t 0 E t 0 yO t t 0 2sons.t/
to express yO t in terms of the sons of t, perform the recursion described above, and then assemble the coefficient vector X F t0 xO t 0 : xO t D t 0 2sons.t/
In short, we mix the elementary steps of the forward and backward transformations with the computation of the solution. Algorithm 58. Forward substitution. procedure SolveLower(t , L, var x, y, x, O y); O if sons.t/ D ; then yjtO yjtO C .V t yO t /jtO ; Solve LjtOtO xjtO D yjtO ; xO t W t x else # sons.t /; ft1 ; : : : ; t g sons.t /; xO t 0; for i D 1 to do yO ti yO ti C E ti yO t ; for j 2 f1; : : : ; i 1g do b0 .ti ; tj /; for b D .t ; s / 2 sons .b 0 / do if b 2 L then yjtO LjtO Os xjsO yjtO else if b 2 LC then yO t Sb xO s yO t end if end for end for; SolveLower(ti , L, x, y, x, O y); O xO t C F ti xO ti xO t end for end if
408
10 Applications
The resulting recursive procedure is given in Algorithm 58. An efficient algorithm for the backward substitution to solve the system Ux D y can be derived in a very similar way. A closer investigation reveals that both algorithms have the same complexity as the standard matrix-vector multiplication Algorithm 8, i.e., that their complexity can be considered to be of optimal order.
Experiments In the first experiment, we construct an approximation Vz of the matrix V for a polygonal approximation of the unit sphere with n D 32768 degrees of freedom using the same parameters as before, i.e., with an interpolation order of m D 6, a tolerance Orc D 106 for the recompression and a quadrature order of 3. We have already seen that these parameters ensure that the error introduced by the matrix approximation is sufficiently small. The approximation of Vz requires 336:8 MB, its nearfield part accounts for 117:5 MB, and the construction of the matrix is accomplished in 1244:7 seconds. The preconditioner is constructed by computing an approximate H 2 -matrix LU factorization of Vz with an error tolerance O for the blockwise relative spectral error. Table 10.4 lists the experimental results for different values of O : the first column contains the value of O , the second the time in seconds required for the construction of z and Uz , the third the storage requirements in MB, the fourth the storage requirements L z Uz k2 computed by per degree of freedom in KB, the fifth an estimate for ´ kVz L a number of steps of the power iteration, the sixth gives the maximal number of steps that where required to solve the problems (10.5) and (10.13), and the last column gives the time required for solving the system. Without preconditioning, up to 321 steps of the conjugate gradient method are required to solve one of the problems (10.5) or (10.13), and due to the corresponding large number of matrix-vector multiplications the solver takes 270 seconds. Even with the lowest accuracy O D 0:2, the number of iterations is reduced significantly, and only 44:4 seconds are required by the solver, and the storage requirements of 142:1 MB are clearly dominated by the nearfield part of 117:5 MB and less than half of the amount needed by Vz . Although the estimate for seems to indicate that the approximate LU factors are too inaccurate, they are well-suited for the purpose of preconditioning. The construction of the approximate LU factors requires 434:6 seconds, i.e., approximately a third of the time required to find the approximate matrix Vz . If we use smaller values of O , we can see that the error is proportional to the error tolerance O and that the storage requirements and the time for preparing the preconditioner seem to grow like log2 .1=O /. Now let us consider the behaviour of the preconditioner for different matrix dimensions n. We have seen in the previous experiment that the error is proportional to the error tolerance O . Since the condition number can be expected to grow like h1 as the mesh parameter h decreases, we use the choice O h to ensure stable convergence rates.
409
10.3 Preconditioners for integral equations
Table 10.4. Preconditioners of different quality for the single layer potential matrix V and the matrix dimension n D 32768.
O 21 11 52 22 12 53 23 13 1000
Mem Mem=n 142:1 4:4 7:20 146:6 4:6 5:10 152:9 4:8 3:10 162:6 5:1 1:10 170:0 5:3 5:41 178:9 5:6 3:61 194:9 6:1 1:61 210:4 6:6 7:62
LU 434:6 471:1 524:8 613:1 682:6 746:7 851:3 958:3
Steps Solve 32 44:4 26 36:1 21 30:2 15 22:7 11 17:3 9 15:1 7 12:2 6 11:0
[Figure: setup time for the LU factorization and storage requirements, both growing like O(log²(1/ε̂)), the error ε behaving like O(ε̂), and the number of iteration steps, plotted against the tolerance ε̂.]
The results in Table 10.5 suggest that this approach works: the number of iteration steps required for solving the system is bounded and even seems to decay for large problem dimensions, while the time for the construction of the preconditioner seems to grow like n log³(n), i.e., the ratio between the time required for the construction of Ṽ and that for the preconditioner improves as n grows. Both the storage requirements and the time for solving the linear system grow like n log²(n), i.e., at the same rate as the storage requirements for the matrix Ṽ. We can conclude that the preconditioner works well: the time for solving the linear systems (10.5) and (10.13) is proportional to the storage requirements for the matrix approximation Ṽ, which is the best result we can hope for, and the construction of the preconditioner takes less time than the construction of Ṽ; it even becomes comparatively more efficient as the problem dimension grows.
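The coupling ε̂ ∼ h used for Table 10.5 can be realized by a simple rule of thumb: on a quasi-uniform surface mesh, h behaves like n^(−1/2), so one may scale a base tolerance accordingly. The constants below are illustrative, chosen to roughly match the tolerances in the table, not prescribed by the book.

```python
def lu_tolerance(n, n0=512, eps0=0.1):
    """Scale the factorization tolerance like the mesh width:
    h ~ n**(-1/2) on a surface mesh, hence eps_hat ~ eps0 * sqrt(n0 / n)."""
    return eps0 * (n0 / n) ** 0.5
```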
Table 10.5. Preconditioners for the single layer potential matrix V for different matrix dimensions n and ε̂ ∼ h.

n        ε̂          Build     LU        Mem      Mem/n   ε          Steps   Solve
512      1.0·10⁻¹    3.1       2.4       1.2      2.4     2.0·10⁻²   11      0.1
2048     6.7·10⁻²    21.2      19.7      6.1      3.1     5.5·10⁻¹   13      0.9
8192     3.2·10⁻²    151.9     102.2     32.1     4.0     9.1·10⁻¹   13      4.2
32768    1.6·10⁻²    1038.3    582.1     165.1    5.2     7.5·10⁻¹   13      18.8
131072   7.9·10⁻³    6953.5    3131.9    825.9    6.5     9.9·10⁻¹   10      56.1
524288   3.9·10⁻³    66129.6   18771.4   4009.8   7.8     1.2·10⁰    10      259.8

[Figure: time for the LU factorization, growing like O(n log³(n)), storage requirements and solution time, both growing like O(n log²(n)), and the number of iteration steps, plotted against the matrix dimension n.]
Remark 10.3 (Symmetry). If Ṽ is symmetric and positive definite, it is advisable to use the Cholesky decomposition instead of the general LU decomposition. In this way, the computation time can be halved, and the Cholesky factorization can be computed even if only the lower half of Ṽ has been stored (cf. Remark 10.2).
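As an illustration of how the approximate factors enter the solver, the sketch below builds a preconditioner from triangular solves and hands it to a conjugate gradient method. Dense scipy stand-ins replace the H²-matrix triangular solves of the actual implementation, and for a symmetric positive definite Ṽ one would pass Cholesky factors L and U = Lᵀ as suggested in Remark 10.3.

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.sparse.linalg import LinearOperator, cg

def lu_preconditioner(L, U):
    """Return the action x -> U^{-1} (L^{-1} x) as a LinearOperator."""
    def apply(x):
        y = solve_triangular(L, x, lower=True)      # forward substitution
        return solve_triangular(U, y, lower=False)  # backward substitution
    n = L.shape[0]
    return LinearOperator((n, n), matvec=apply)

# Hypothetical usage: V is (a stand-in for) the compressed matrix and
# b the right-hand side of (10.5) or (10.13):
# x, info = cg(V, b, M=lu_preconditioner(L, U))
```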
10.4 Application to realistic geometries

Until now, we have only considered problems on “academic” domains: the unit circle or sphere and the unit square. Since these geometries are very smooth, we expect the discretization error to be small, at least for those solutions that are of practical interest. According to Strang’s first lemma, this means that we have to ensure that the approximation of the matrices V and K is very accurate if we intend to get the optimal result. For geometries appearing in practical applications, a lower accuracy may be acceptable, since the discretization error will usually be quite large, but the algorithm has to be able to handle non-uniform meshes, possibly with fine geometric details. In order to investigate the performance of our techniques under these conditions, we now consider geometries that are closer to practical applications: Figure 10.1 contains a surface resembling a crank shaft¹, a sphere² pierced with cylindrical “drill holes” and an object with foam-like structure³.
Figure 10.1. Examples of practical computational domains.
We test our algorithms in the same way as in the previous section: we construct an H²-matrix approximation of the matrix V on the surface mesh by compressing an initial approximation based on interpolation, we compute the L²-projection of the Dirichlet values, and we apply an H²-matrix approximation of the matrix K to compute the right-hand side of equation (10.14). Then we solve the equation by a conjugate gradient method using an approximate LU factorization of Ṽ as a preconditioner. Since the triangulation is less regular than the ones considered before, we use higher quadrature orders q ∈ {4, 5, 6} to compute the far- and nearfield entries and determine the accuracy of the matrix approximation by trial and error: the matrices are constructed for the interpolation orders m ∈ {6, 7, 8} and the error tolerances ε̂ ∈ {10⁻⁶, 10⁻⁷, 10⁻⁸}, and we check whether the approximation of the Neumann data changes. The results for the “crank shaft” problem are given in Table 10.6: switching from m = 6 to m = 7 hardly changes the errors ε₂ and ε₃; only the error ε₁ is affected.
¹ This mesh was created by the NGSolve package by Joachim Schöberl.
² Also created by NGSolve.
³ This mesh is used by courtesy of Heiko Andrä and Günther Of.
Table 10.6. Matrix approximations of different degrees of accuracy for the “crank shaft” geometry.

m   q   ε̂      Build V   Mem V    Build LU   Mem LU   Slv.   ε₁         ε₂         ε₃
6   4   10⁻⁶   1543.3    456.3    1011.5     180.1    20.6   5.7·10⁻³   1.1·10⁻²   5.5·10⁻³
7   4   10⁻⁷   2478.7    597.1    881.4      213.1    18.9   3.1·10⁻³   1.1·10⁻²   4.1·10⁻³
8   4   10⁻⁸   5009.6    753.3    893.3      248.9    20.1   2.9·10⁻³   1.1·10⁻²   4.0·10⁻³
6   5   10⁻⁶   2268.6    456.3    1016.3     180.1    21.1   5.1·10⁻³   1.1·10⁻²   4.3·10⁻³
7   5   10⁻⁷   3505.1    597.1    890.6      213.1    18.9   1.7·10⁻³   1.1·10⁻²   2.2·10⁻³
8   5   10⁻⁸   6319.0    753.3    890.3      248.9    20.1   1.3·10⁻³   1.1·10⁻²   2.1·10⁻³
7   6   10⁻⁷   5311.9    597.1    885.3      213.1    19.2   1.4·10⁻³   1.1·10⁻²   2.0·10⁻³
8   6   10⁻⁸   8660.5    753.3    896.0      248.9    20.2   9.7·10⁻⁴   1.1·10⁻²   1.8·10⁻³
If we increase the quadrature order, the errors ε₁ and ε₃ decrease, while the error ε₂ is not affected. As mentioned before, ε₁ is determined only by the quadrature and matrix approximation errors, since the corresponding solution u_{N,1} is contained in the discrete space and would therefore be represented exactly by an exact Galerkin scheme. If we increase the interpolation order to m = 8, all errors remain almost unchanged. The fact that ε₁ changes only slightly suggests that the remaining errors are caused by the quadrature scheme, not by the matrix approximation.

Table 10.7. Matrix approximations of different degrees of accuracy for the “pierced sphere” problem.
m   q   ε̂        Build V   Mem V    Build LU   Mem LU   Slv.   ε₁         ε₂         ε₃
6   4   10⁻⁶     1682.1    440.4    955.1      187.2    24.2   7.8·10⁻²   4.4·10⁻²   6.0·10⁻²
7   4   10⁻⁷     2696.0    577.2    873.1      213.9    24.2   7.8·10⁻²   4.4·10⁻²   6.0·10⁻²
8   4   10⁻⁸     5155.2    729.5    883.5      249.3    23.8   7.8·10⁻²   4.4·10⁻²   6.0·10⁻²
6   5   10⁻⁶     2444.1    440.4    959.2      187.2    24.3   4.3·10⁻²   3.8·10⁻²   3.3·10⁻²
7   5   10⁻⁷     3705.9    577.2    871.4      213.9    22.8   4.3·10⁻²   3.8·10⁻²   3.3·10⁻²
5   6   5·10⁻⁶   2595.6    359.2    1168.0     174.7    25.7   2.7·10⁻²   3.7·10⁻²   2.0·10⁻²
6   6   10⁻⁶     3758.8    440.4    962.1      187.2    24.2   2.2·10⁻²   3.6·10⁻²   1.8·10⁻²
7   6   10⁻⁷     5497.8    577.2    874.8      213.9    22.8   2.2·10⁻²   3.6·10⁻²   1.8·10⁻²
We can conclude that an interpolation order of m = 7 and a quadrature order of q = 5 are sufficient to approximate the solutions u_{N,2} and u_{N,3} up to their respective discretization errors.

For the “pierced sphere” problem, the results are collected in Table 10.7: even an interpolation order of m = 6 seems to be sufficient, since changing to m = 7 or m = 8 does not significantly change the approximation error. Increasing the quadrature order q, on the other hand, reduces the errors ε₁ and ε₃; therefore we can conclude that the quadrature error dominates and the H²-matrix approximation is sufficiently accurate.

For the “foam block” problem, the situation is similar: Table 10.8 shows that an interpolation order of m = 5 seems to be sufficient and that the quadrature error dominates. The approximations of u_{N,2} and u_{N,3} change only slightly if the interpolation or quadrature order is increased.

Table 10.8. Matrix approximations of different degrees of accuracy for the “foam block” problem.
m   q   ε̂        Build V   Mem V    Build LU   Mem LU   Slv.   ε₁         ε₂         ε₃
5   4   5·10⁻⁶   1926.0    817.3    2151.0     290.0    91.1   6.9·10⁻²   1.4·10⁻¹   7.2·10⁻²
6   4   10⁻⁶     2969.7    976.7    1807.4     317.8    85.5   6.3·10⁻²   1.4·10⁻¹   7.1·10⁻²
7   4   10⁻⁷     4693.8    1240.3   1703.4     382.7    78.7   6.2·10⁻²   1.4·10⁻¹   7.1·10⁻²
8   4   10⁻⁸     8017.0    1521.0   1767.4     469.4    78.6   6.2·10⁻²   1.4·10⁻¹   7.1·10⁻²
5   5   5·10⁻⁶   2852.8    817.3    2180.5     290.0    90.9   4.1·10⁻²   1.4·10⁻¹   6.6·10⁻²
6   5   10⁻⁶     4295.7    976.7    1822.2     317.8    83.4   2.8·10⁻²   1.4·10⁻¹   6.4·10⁻²
7   5   10⁻⁷     6561.9    1240.3   1693.7     382.6    79.3   2.7·10⁻²   1.4·10⁻¹   6.4·10⁻²
8   5   10⁻⁸     10861.6   1521.0   1769.5     469.4    78.8   2.7·10⁻²   1.4·10⁻¹   6.4·10⁻²
5   6   5·10⁻⁶   4503.1    817.3    2164.0     290.0    92.3   3.3·10⁻²   1.4·10⁻¹   6.4·10⁻²
6   6   10⁻⁶     6636.4    976.7    1816.3     317.9    88.3   1.3·10⁻²   1.4·10⁻¹   6.2·10⁻²
10.5 Solution operators of elliptic partial differential equations

All examples we have considered so far have been, in one way or another, based on polynomial expansion: even the hybrid cross approximation relies on interpolation. Now we investigate a problem class that cannot be treated in this way: the approximation of the inverse of a finite element stiffness matrix corresponding to an elliptic partial differential equation (cf. Chapter 9).
Finite element discretization

We consider the equation

\[
-\operatorname{div} C(x) \operatorname{grad} u(x) = f(x) \quad \text{for all } x \in \Omega, \tag{10.21a}
\]
\[
u(x) = 0 \quad \text{for all } x \in \partial\Omega, \tag{10.21b}
\]

where Ω ⊆ ℝ² is a bounded domain and C ∈ L∞(Ω, ℝ^{2×2}) is a mapping that assigns a 2×2 matrix to each point x ∈ Ω. We assume that C is symmetric and that there are constants α, β ∈ ℝ with 0 < α ≤ β and σ(C(x)) ⊆ [α, β] for all x ∈ Ω, i.e., that the partial differential operator is strongly elliptic. The variational formulation of the partial differential equation (10.21) is given by

\[
\int_\Omega \langle \nabla v(x), C(x) \nabla u(x) \rangle_2 \, dx = \int_\Omega v(x) f(x) \, dx \quad \text{for all } v \in H^1_0(\Omega), \tag{10.22}
\]

where u ∈ H¹₀(Ω) is now only a weak solution (which nevertheless coincides with u if a classical solution exists). The variational formulation is discretized by a Galerkin scheme: as in Section 10.2, we use the space S¹ spanned by the family (φᵢ)_{i∈I} of piecewise linear nodal basis functions on a conforming triangulation G and consider the equation

\[
A x = b, \tag{10.23}
\]

where the matrix A ∈ ℝ^{I×I} is defined by

\[
A_{ij} = \int_\Omega \langle \nabla\varphi_i(x), C(x) \nabla\varphi_j(x) \rangle_2 \, dx \quad \text{for all } i, j \in I, \tag{10.24}
\]

the vector b ∈ ℝ^I is defined by

\[
b_i = \int_\Omega \varphi_i(x) f(x) \, dx \quad \text{for all } i \in I,
\]

and the solution vector x ∈ ℝ^I corresponds to the discrete solution

\[
u_h := \sum_{i \in I} x_i \varphi_i
\]
of the variational equation. A is a symmetric positive definite matrix, therefore we could compute approximations of its Cholesky or LU factorization in order to construct efficient preconditioners (cf. [52], [79], [56]). A factorization is very efficient if we are only interested in solving a system of linear equations, but there are tasks, e.g., the computation of the Schur complement of a saddle point problem or the treatment of a matrix equation, for which we require a sufficiently accurate approximation of the inverse B := A⁻¹ of the matrix A.
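For orientation, here is a compact sketch of how (10.24) can be assembled with piecewise linear elements. This is a standard P1 assembly written for clarity, not the code used for the experiments; the mesh arrays, the one-point quadrature for C, and the dense matrix are simplifying assumptions.

```python
import numpy as np

def assemble_stiffness(points, triangles, C):
    """Assemble A_ij = int <grad phi_i, C(x) grad phi_j> dx for P1 elements.
    points: (n, 2) vertex coordinates; triangles: (m, 3) vertex indices;
    C: callable mapping a point to a symmetric 2x2 matrix."""
    n = len(points)
    A = np.zeros((n, n))                 # dense only for illustration
    for tri in triangles:
        p = points[tri]
        G = np.column_stack([p[1] - p[0], p[2] - p[0]])   # edge matrix
        area = 0.5 * abs(np.linalg.det(G))
        # gradients of the three nodal basis functions on this triangle
        ref = np.array([[-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]])
        grads = np.linalg.solve(G.T, ref)
        Cx = C(p.mean(axis=0))           # one-point quadrature for C
        for a in range(3):
            for b in range(3):
                A[tri[a], tri[b]] += area * grads[:, a] @ Cx @ grads[:, b]
    return A  # Dirichlet condition (10.21b): restrict to interior indices
```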
Inversion

As in the previous section, we reduce this task to a blockwise recursion using matrix-matrix multiplications. For all clusters t, s, we denote the submatrices corresponding to the block (t, s) by A_{t,s} := A|_{t̂×ŝ} and B_{t,s} := B|_{t̂×ŝ}.

Let t be a cluster. If t is a leaf cluster, the admissibility condition implies that (t, t) is an inadmissible leaf of the block cluster tree, and A_{t,t} is stored in standard array representation. Its inverse B^{(t)} can be computed by the well-known algorithms from linear algebra.

If t is not a leaf cluster, A_{t,t} is a block matrix. As in the previous section, we consider only the simple case # sons(t) = 2 with sons(t) = {t₁, t₂}. The defining equation A_{t,t} B^{(t)} = I of B^{(t)} takes the form

\[
\begin{pmatrix} A_{t_1,t_1} & A_{t_1,t_2}\\ A_{t_2,t_1} & A_{t_2,t_2} \end{pmatrix}
\begin{pmatrix} B^{(t)}_{t_1,t_1} & B^{(t)}_{t_1,t_2}\\ B^{(t)}_{t_2,t_1} & B^{(t)}_{t_2,t_2} \end{pmatrix}
= \begin{pmatrix} I & \\ & I \end{pmatrix}.
\]

We compute the inverse B^{(t₁)} of A_{t₁,t₁} by recursion and multiply the equation by the matrix

\[
\begin{pmatrix} B^{(t_1)} & \\ -A_{t_2,t_1} B^{(t_1)} & I \end{pmatrix}
\]

in order to get

\[
\begin{pmatrix} I & B^{(t_1)} A_{t_1,t_2}\\ & A_{t_2,t_2} - A_{t_2,t_1} B^{(t_1)} A_{t_1,t_2} \end{pmatrix}
\begin{pmatrix} B^{(t)}_{t_1,t_1} & B^{(t)}_{t_1,t_2}\\ B^{(t)}_{t_2,t_1} & B^{(t)}_{t_2,t_2} \end{pmatrix}
= \begin{pmatrix} B^{(t_1)} & \\ -A_{t_2,t_1} B^{(t_1)} & I \end{pmatrix}.
\]

Denoting the Schur complement by S^{(t₂)} := A_{t₂,t₂} − A_{t₂,t₁} B^{(t₁)} A_{t₁,t₂}, we can write this equation in the short form

\[
\begin{pmatrix} I & B^{(t_1)} A_{t_1,t_2}\\ & S^{(t_2)} \end{pmatrix}
\begin{pmatrix} B^{(t)}_{t_1,t_1} & B^{(t)}_{t_1,t_2}\\ B^{(t)}_{t_2,t_1} & B^{(t)}_{t_2,t_2} \end{pmatrix}
= \begin{pmatrix} B^{(t_1)} & \\ -A_{t_2,t_1} B^{(t_1)} & I \end{pmatrix}.
\]

We compute the inverse T^{(t₂)} of S^{(t₂)} by recursion and multiply the equation by the matrix

\[
\begin{pmatrix} I & -B^{(t_1)} A_{t_1,t_2} T^{(t_2)}\\ & T^{(t_2)} \end{pmatrix},
\]

which yields the desired result

\[
\begin{pmatrix} B^{(t)}_{t_1,t_1} & B^{(t)}_{t_1,t_2}\\ B^{(t)}_{t_2,t_1} & B^{(t)}_{t_2,t_2} \end{pmatrix}
= \begin{pmatrix} I & -B^{(t_1)} A_{t_1,t_2} T^{(t_2)}\\ & T^{(t_2)} \end{pmatrix}
\begin{pmatrix} B^{(t_1)} & \\ -A_{t_2,t_1} B^{(t_1)} & I \end{pmatrix}
= \begin{pmatrix} B^{(t_1)} + B^{(t_1)} A_{t_1,t_2} T^{(t_2)} A_{t_2,t_1} B^{(t_1)} & -B^{(t_1)} A_{t_1,t_2} T^{(t_2)}\\ -T^{(t_2)} A_{t_2,t_1} B^{(t_1)} & T^{(t_2)} \end{pmatrix}.
\]
This construction of the inverse of A_{t,t} gives rise to the recursive Algorithm 59: we start with the upper left block, invert it by recursion, and use it to compute the matrices X_{1,2} := B^{(t₁)} A_{t₁,t₂} and X_{2,1} := A_{t₂,t₁} B^{(t₁)}. Then we overwrite A_{t₂,t₂} by the Schur complement, given by A_{t₂,t₂} − A_{t₂,t₁} X_{1,2}. In the next iteration of the loop, A_{t₂,t₂} is inverted, which gives us the lower right block of the inverse. The remaining blocks can be computed by multiplying this block by X_{1,2} and X_{2,1}.

Algorithm 59. Matrix inversion, overwrites A with its inverse A⁻¹ and uses X for temporary matrices.

procedure Inversion(t, var A, X);
if sons(t) = ∅ then
  Compute A⁻¹ directly
else
  σ ← # sons(t);  {t₁, …, t_σ} ← sons(t);
  for i, j ∈ {1, …, σ} do A_{ij} ← A|_{t̂ᵢ×t̂ⱼ} end for;
  for i = 1 to σ do
    Inversion(tᵢ, A_{ii}, X_{ii});
    for j ∈ {i+1, …, σ} do
      X_{ij} ← A_{ii} A_{ij};  {Use approximative multiplication}
      X_{ji} ← A_{ji} A_{ii}   {Use approximative multiplication}
    end for;
    for j, k ∈ {i+1, …, σ} do
      A_{jk} ← A_{jk} − A_{ji} X_{ik}  {Use approximative multiplication}
    end for
  end for;
  for i = σ downto 1 do
    for j ∈ {i+1, …, σ} do
      A_{ij} ← 0;  A_{ji} ← 0;
      for k ∈ {i+1, …, σ} do
        A_{ij} ← A_{ij} − X_{ik} A_{kj};  {Use approximative multiplication}
        A_{ji} ← A_{ji} − A_{jk} X_{ki}   {Use approximative multiplication}
      end for;
      A_{ii} ← A_{ii} − X_{ij} A_{ji}  {Use approximative multiplication}
    end for
  end for;
  for i, j ∈ {1, …, σ} do
    A|_{t̂ᵢ×t̂ⱼ} ← A_{ij}  {Unify resulting matrix by Algorithm 32}
  end for
end if
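The recursion is perhaps easiest to see in the special case σ = 2. The following dense sketch mirrors the derivation above; in the H²-matrix algorithm every product below is replaced by the approximative multiplication, and the leaf size and numpy stand-ins are assumptions of this illustration.

```python
import numpy as np

def block_invert(A, leaf=64):
    """Invert A by the Schur complement recursion of Algorithm 59
    for the case of two sons per cluster."""
    n = A.shape[0]
    if n <= leaf:
        return np.linalg.inv(A)                # inadmissible leaf: direct
    m = n // 2
    A11, A12 = A[:m, :m], A[:m, m:]
    A21, A22 = A[m:, :m], A[m:, m:]
    B1 = block_invert(A11, leaf)               # B^(t1) = A11^{-1}
    X12, X21 = B1 @ A12, A21 @ B1              # temporaries X_{1,2}, X_{2,1}
    T = block_invert(A22 - A21 @ X12, leaf)    # inverse Schur complement
    B = np.empty_like(A)
    B[:m, :m] = B1 + X12 @ (T @ X21)  # B^(t1) + B^(t1) A12 T A21 B^(t1)
    B[:m, m:] = -X12 @ T              # -B^(t1) A12 T
    B[m:, :m] = -T @ X21              # -T A21 B^(t1)
    B[m:, m:] = T
    return B
```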
As in the case of the LU factorization, we need only matrix-matrix multiplications and recursion; therefore we can construct an approximate inverse by replacing the matrix-matrix products by their H²-matrix approximations, which can be computed by the algorithms introduced in Chapters 7 and 8. Using the unification Algorithm 32, the blocks of the intermediate representation of the inverse can then be combined into the final H²-matrix representation.
Experiments

We use the inversion Algorithm 59 to construct H²-matrix approximations of the inverse of the stiffness matrix (10.24) for different types of coefficients. In our first experiment, we consider the simple case C = I, i.e., we discretize Poisson’s equation on the square domain Ω = [−1, 1]² with a regular triangulation. Table 10.9 gives the computation times, storage requirements and approximation errors ε := ‖I − BA‖₂ for matrix dimensions ranging from n = 1024 to n = 1048576. The tolerance ε̂ for the approximative arithmetic operations is chosen to ensure ε ≤ 2·10⁻⁴. Since we can expect the condition number of A to grow like n, using ε̂ ∼ 1/n should ensure a good approximation, and the experiment shows that this choice leads to good results.

Table 10.9. Approximate inverse of Poisson’s equation in two spatial dimensions for different matrix dimensions n.
n         ε̂        Build     Mem      Mem/n   ε
1024      4·10⁻⁶    1.8       3.3      3.3     6.3·10⁻⁵
4096      1·10⁻⁶    23.0      17.4     4.3     2.0·10⁻⁴
16384     1·10⁻⁷    229.0     86.4     5.4     1.2·10⁻⁴
65536     2·10⁻⁸    1768.7    395.1    6.2     1.4·10⁻⁴
262144    4·10⁻⁹    12588.0   1785.2   7.0     1.5·10⁻⁴
1048576   1·10⁻⁹    80801.3   7585.0   7.4     2.2·10⁻⁴
[Figure: inversion time, growing like O(n log⁴(n)), and storage per degree of freedom, growing like O(log(n)), plotted against the matrix dimension n.]
Combining Theorem 9.13 (cf. Theorem 2.8 in [6] for a related result for H-matrices) with Lemma 9.10 yields that a rank of log(n)³ should be sufficient to approximate the inverse B of A, but the numerical experiment shows that in fact log(n) is sufficient, and Lemma 3.38 then yields that the storage requirements are in O(n log(n)).

Our estimates also apply to strongly elliptic partial differential equations with non-smooth coefficients in L∞, and even to anisotropic coefficients. We therefore consider two examples of non-smooth coefficients: in the first case, we split Ω = [−1, 1]² into four equal quarters and choose C = 100·I in the lower left and upper right quarters and C = I in the remaining quarters:

\[
C_Q \colon \Omega \to \mathbb{R}^{2\times 2}, \quad x \mapsto
\begin{cases}
\begin{pmatrix} 100 & \\ & 100 \end{pmatrix} & \text{if } x \in [-1,0) \times [-1,0) \cup [0,1] \times [0,1],\\[1ex]
\begin{pmatrix} 1 & \\ & 1 \end{pmatrix} & \text{otherwise.}
\end{cases}
\]

Due to the jump in the coefficients, the Green function of the corresponding elliptic differential operator will not be smooth, and approximating it by polynomials is not an option. The general theory of Section 6.3 and Chapter 9, on the other hand, can still be applied; therefore we expect to be able to approximate the inverse of A by an H²-matrix.

Table 10.10. Approximate inverse of elliptic partial differential equations with coefficients in L∞ for different matrix dimensions n.
                   Quartered                       Anisotropic
n        ε̂        Build     Mem      ε            Build     Mem      ε
1024     4·10⁻⁶    1.8       3.3      9.2·10⁻⁵     1.8       3.3      2.3·10⁻⁴
4096     1·10⁻⁶    22.7      17.5     2.6·10⁻⁴     22.8      17.9     4.5·10⁻⁴
16384    1·10⁻⁷    223.6     87.1     1.7·10⁻⁴     240.4     91.2     2.1·10⁻⁴
65536    2·10⁻⁸    1756.2    399.6    1.9·10⁻⁴     2137.2    429.4    2.8·10⁻⁴
262144   4·10⁻⁹    12670.2   1786.4   2.2·10⁻⁴     17021.6   1964.7   3.9·10⁻⁴

[Figure: inversion times for the quartered and anisotropic coefficients, growing like O(n log⁴(n)), and storage per degree of freedom, growing like O(log(n)), plotted against the matrix dimension n.]
We can conclude that H²-matrices can be used in situations where the underlying operator cannot be approximated by polynomials and that the H²-matrix arithmetic algorithms find suitable approximations in almost linear complexity.

In a second example, we investigate the influence of anisotropic coefficients. We split Ω into two halves and use an anisotropic coefficient matrix in one of them:

\[
C_A \colon \Omega \to \mathbb{R}^{2\times 2}, \quad x \mapsto
\begin{cases}
\begin{pmatrix} 100 & \\ & 1 \end{pmatrix} & \text{if } x \in [-1,1] \times [0,1],\\[1ex]
\begin{pmatrix} 1 & \\ & 1 \end{pmatrix} & \text{otherwise.}
\end{cases}
\]

The approximate inverse is computed by the recursive inversion Algorithm 59 in combination with the adaptive matrix-matrix multiplication introduced in Chapter 8, yielding the results given in Table 10.10. We can see that, compared to the case of Poisson’s equation, the storage requirements are only slightly increased and still proportional to n log(n). For the anisotropic problem, the computation time is increased by approximately 35%, but it is still proportional to n log⁴(n).
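For reference, both coefficient fields can be written down directly. The following sketch matches the definitions above and could be used, e.g., with a P1 assembly routine like the one sketched in the finite element paragraph:

```python
import numpy as np

def C_quartered(x):
    """C_Q: 100*I on the lower left and upper right quarters of [-1,1]^2,
    the identity elsewhere."""
    same_sign = (x[0] < 0) == (x[1] < 0)
    return 100.0 * np.eye(2) if same_sign else np.eye(2)

def C_anisotropic(x):
    """C_A: diag(100, 1) on the upper half [-1,1] x [0,1],
    the identity elsewhere."""
    return np.diag([100.0, 1.0]) if x[1] >= 0 else np.eye(2)
```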
Bibliography

The numbers at the end of each item refer to the pages on which the respective work is cited.

[1] S. Amini and A. T. J. Profit, Multi-level fast multipole solution of the scattering problem. Eng. Anal. Bound. Elem. 27 (2003), 547–564. 3
[2] C. R. Anderson, An implementation of the fast multipole method without multipoles. SIAM J. Sci. Statist. Comput. 13 (1992), 923–947. 2
[3] L. Banjai and W. Hackbusch, Hierarchical matrix techniques for low- and high-frequency Helmholtz problems. IMA J. Numer. Anal. 28 (2008), 46–79. 3
[4] M. Bebendorf, Approximation of boundary element matrices. Numer. Math. 86 (2000), 565–589. 2, 248, 392
[5] M. Bebendorf, Effiziente numerische Lösung von Randintegralgleichungen unter Verwendung von Niedrigrang-Matrizen. PhD thesis, Universität Saarbrücken, Saarbrücken 2000; dissertation.de-Verlag im Internet GmbH, Berlin 2001. 2
[6] M. Bebendorf and W. Hackbusch, Existence of H-matrix approximants to the inverse FE-matrix of elliptic operators with L∞-coefficients. Numer. Math. 95 (2003), 1–28. 2, 3, 6, 363, 367, 368, 370, 375, 376, 418
[7] M. Bebendorf and S. Rjasanow, Adaptive low-rank approximation of collocation matrices. Computing 70 (2003), 1–24. 2, 227, 248
[8] C. Bernardi and V. Girault, A local regularization operator for triangular and quadrilateral finite elements. SIAM J. Numer. Anal. 35 (1998), 1893–1916. 377
[9] G. Beylkin, R. Coifman, and V. Rokhlin, Fast wavelet transforms and numerical algorithms I. Comm. Pure Appl. Math. 44 (1991), 141–183. 2
[10] S. Börm, Approximation of integral operators by H²-matrices with adaptive bases. Computing 74 (2005), 249–271. 163, 212
[11] S. Börm, H²-matrix arithmetics in linear complexity. Computing 77 (2006), 1–28. 281, 329
[12] S. Börm, Adaptive variable-rank approximation of dense matrices. SIAM J. Sci. Comput. 30 (2007), 148–168. 212
[13] S. Börm, Data-sparse approximation of non-local operators by H²-matrices. Linear Algebra Appl. 422 (2007), 380–403. 6, 211
[14] S. Börm, Construction of data-sparse H²-matrices by hierarchical compression. SIAM J. Sci. Comput. 31 (2009), 1820–1839. 257
[15] S. Börm, Approximation of solution operators of elliptic partial differential equations by H- and H²-matrices. Numer. Math. 115 (2010), 165–193. 3, 6, 363, 366, 376
[16] S. Börm and L. Grasedyck, Low-rank approximation of integral operators by interpolation. Computing 72 (2004), 325–332. 2
[17] S. Börm and L. Grasedyck, Hybrid cross approximation of integral operators. Numer. Math. 101 (2005), 221–249. 2, 75, 248, 394
[18] S. Börm, L. Grasedyck, and W. Hackbusch, Hierarchical matrices. Lecture Note 21 of the Max Planck Institute for Mathematics in the Sciences, Leipzig 2003. http://www.mis.mpg.de/de/publications/andere-reihen/ln/lecturenote-2103.html 49, 269, 366
[19] S. Börm, L. Grasedyck, and W. Hackbusch, Introduction to hierarchical matrices with applications. Eng. Anal. Bound. Elem. 27 (2003), 405–422. 49
[20] S. Börm and W. Hackbusch, Approximation of boundary element operators by adaptive H²-matrices. In Foundations of computational mathematics: Minneapolis 2002, London Math. Soc. Lecture Note Ser. 312, Cambridge University Press, Cambridge 2004, 58–75. 163
[21] S. Börm and W. Hackbusch, Hierarchical quadrature of singular integrals. Computing 74 (2005), 75–100. 272
[22] S. Börm, M. Löhndorf, and J. M. Melenk, Approximation of integral operators by variable-order interpolation. Preprint 82/2002, Max Planck Institute for Mathematics in the Sciences, Leipzig 2002. http://www.mis.mpg.de/publications/preprints/2002/prepr2002-82.html 129
[23] S. Börm, M. Löhndorf, and J. M. Melenk, Approximation of integral operators by variable-order interpolation. Numer. Math. 99 (2005), 605–643. 2, 3, 75, 94, 125, 129, 135, 161, 278
[24] S. Börm and J. Ostrowski, Fast evaluation of boundary integral operators arising from an eddy current problem. J. Comput. Phys. 193 (2004), 67–85. 6
[25] S. Börm and S. A. Sauter, BEM with linear complexity for the classical boundary integral operators. Math. Comp. 74 (2005), 1139–1177. 126
[26] D. Braess, Finite Elemente. 4th ed., Springer-Verlag, Berlin 2007. 390, 398
[27] A. Brandt, Multilevel computations of integral transforms and particle interactions with oscillatory kernels. Comput. Phys. Comm. 65 (1991), 24–38. 83
[28] A. Brandt and A. A. Lubrecht, Multilevel matrix multiplication and fast solution of integral equations. J. Comput. Phys. 90 (1990), 348–370. 83
[29] S. Chandrasekaran, M. Gu, and W. Lyons, A fast adaptive solver for hierarchically semiseparable representations. Calcolo 42 (2005), 171–185. 59
[30] S. Chandrasekaran, M. Gu, and T. Pals, Fast and stable algorithms for hierarchically semi-separable representations. Tech. rep., Department of Mathematics, University of California, Berkeley, 2004. 59
[31] S. Chandrasekaran, M. Gu, and T. Pals, A fast ULV decomposition solver for hierarchically semiseparable representations. SIAM J. Matrix Anal. Appl. 28 (2006), 603–622. 59
[32] S. Chandrasekaran, M. Gu, X. Sun, J. Xia, and J. Zhu, A superfast algorithm for Toeplitz systems of linear equations. SIAM J. Matrix Anal. Appl. 29 (2007), 1247–1266. 59
[33] S. Chandrasekaran and I. C. F. Ipsen, On rank-revealing factorisations. SIAM J. Matrix Anal. Appl. 15 (1994), 592–622. 227
[34] P. G. Ciarlet, The finite element method for elliptic problems. Classics Appl. Math. 40, Society for Industrial and Applied Mathematics (SIAM), Philadelphia 2002. 365, 390
[35] P. Clément, Approximation by finite element functions using local regularization. RAIRO Anal. Numér. 9 (1975), 77–84. 376
[36] A. Cohen, W. Dahmen, and R. DeVore, Adaptive wavelet methods for elliptic operator equations: convergence rates. Math. Comp. 70 (2001), 27–75. 401
[37] J. W. Cooley and J. W. Tukey, An algorithm for the machine calculation of complex Fourier series. Math. Comp. 19 (1965), 297–301. 1
[38] W. Dahmen, B. Faermann, I. G. Graham, W. Hackbusch, and S. A. Sauter, Inverse inequalities on non-quasi-uniform meshes and application to the mortar element method. Math. Comp. 73 (2004), 1107–1138. 376, 379, 390, 391, 398, 399
[39] W. Dahmen, H. Harbrecht, and R. Schneider, Compression techniques for boundary integral equations—asymptotically optimal complexity estimates. SIAM J. Numer. Anal. 43 (2006), 2251–2271. 2
[40] W. Dahmen, S. Prössdorf, and R. Schneider, Multiscale methods for pseudo-differential equations on smooth closed manifolds. In Wavelets: theory, algorithms, and applications (Taormina, 1993), Wavelet Anal. Appl. 5, Academic Press, San Diego 1994, 385–424. 401
[41] W. Dahmen and R. Schneider, Wavelets on manifolds I: Construction and domain decomposition. SIAM J. Math. Anal. 31 (1999), 184–230. 2
[42] R. A. DeVore and G. G. Lorentz, Constructive approximation. Grundlehren Math. Wiss. 303, Springer-Verlag, Berlin 1993. 95, 106
[43] S. Erichsen and S. A. Sauter, Efficient automatic quadrature in 3-d Galerkin BEM. Comput. Methods Appl. Mech. Engrg. 157 (1998), 215–224. 156, 392
[44] P. P. Ewald, Die Berechnung optischer und elektrostatischer Gitterpotentiale. Ann. Physik 369 (1921), 253–287. 2
[45] K. Giebermann, Multilevel approximation of boundary integral operators. Computing 67 (2001), 183–207. 1, 3, 74
[46] Z. Gimbutas and V. Rokhlin, A generalized fast multipole method for nonoscillatory kernels. SIAM J. Sci. Comput. 24 (2002), 796–817. 2
[47] G. Golub, Numerical methods for solving linear least squares problems. Numer. Math. 7 (1965), 206–216. 227
[48] G. H. Golub and C. F. Van Loan, Matrix computations. 3rd ed., The Johns Hopkins University Press, Baltimore 1996. 180, 181, 186, 187, 188, 233
[49] L. Grasedyck, Theorie und Anwendungen Hierarchischer Matrizen. PhD thesis, Universität Kiel, Kiel 2001. http://eldiss.uni-kiel.de/macau/receive/dissertation_diss_00000454 2, 50, 122, 262, 361, 376
[50] L. Grasedyck, Adaptive recompression of H-matrices for BEM. Computing 74 (2005), 205–223. 248, 393
[51] L. Grasedyck, Existence of a low-rank or H-matrix approximant to the solution of a Sylvester equation. Numer. Linear Algebra Appl. 11 (2004), 371–389. 2
[52] L. Grasedyck and W. Hackbusch, Construction and arithmetics of H-matrices. Computing 70 (2003), 295–334. 2, 4, 6, 29, 37, 38, 50, 347, 361, 366, 414
[53] L. Grasedyck, W. Hackbusch, and B. N. Khoromskij, Solution of large scale algebraic matrix Riccati equations by use of hierarchical matrices. Computing 70 (2003), 121–165. 2
[54] L. Grasedyck, W. Hackbusch, and R. Kriemann, Performance of H-LU preconditioning for sparse matrices. Comput. Methods Appl. Math. 8 (2008), 336–349. 2
[55] L. Grasedyck, R. Kriemann, and S. Le Borne, Parallel black box H-LU preconditioning for elliptic boundary value problems. Comput. Visual Sci. 11 (2008), 273–291. 2
[56] L. Grasedyck, R. Kriemann, and S. Le Borne, Domain decomposition based H-LU preconditioning. Numer. Math. 112 (2009), 565–600. 2, 414
[57] L. Greengard, D. Gueyffier, P.-G. Martinsson, and V. Rokhlin, Fast direct solvers for integral equations in complex three-dimensional domains. Acta Numer. 18 (2009), 243–275. 2
[58] L. Greengard and V. Rokhlin, A fast algorithm for particle simulations. J. Comput. Phys. 73 (1987), 325–348. 1, 3, 5, 74
[59] L. Greengard and V. Rokhlin, On the numerical solution of two-point boundary value problems. Comm. Pure Appl. Math. 44 (1991), 419–452. 3, 36, 59
[60] L. Greengard and V. Rokhlin, A new version of the fast multipole method for the Laplace equation in three dimensions. Acta Numer. 6 (1997), 229–269. 1, 3, 5, 74
[61] W. Hackbusch, Multigrid methods and applications. Springer Ser. Comput. Math. 4, Springer-Verlag, Berlin 1985. 366
[62] W. Hackbusch, A sparse matrix arithmetic based on H-matrices. Part I: Introduction to H-matrices. Computing 62 (1999), 89–108. v, 2, 6, 29, 36, 49, 366
[63] W. Hackbusch, Hierarchische Matrizen. Springer-Verlag, Dordrecht 2009. 2, 6, 29, 49, 84, 104, 328
[64] W. Hackbusch and S. Börm, Data-sparse approximation by adaptive H²-matrices. Computing 69 (2002), 1–35. v, 2, 211, 212
[65] W. Hackbusch and S. Börm, H²-matrix approximation of integral operators by interpolation. Appl. Numer. Math. 43 (2002), 129–143. 3, 74, 75
[66] W. Hackbusch and B. N. Khoromskij, H-matrix approximation on graded meshes. In The mathematics of finite elements and applications X (MAFELAP 1999), Elsevier, Kidlington 2000, 307–316. 274
[67] W. Hackbusch and B. N. Khoromskij, A sparse H-matrix arithmetic: general complexity estimates. J. Comp. Appl. Math. 125 (2000), 479–501. 2, 29
[68] W. Hackbusch and B. N. Khoromskij, A sparse H-matrix arithmetic. Part II: Application to multi-dimensional problems. Computing 64 (2000), 21–47. 2, 29
[69] W. Hackbusch, B. N. Khoromskij, and R. Kriemann, Hierarchical matrices based on a weak admissibility criterion. Computing 73 (2004), 207–243. 36, 59
[70] W. Hackbusch, B. N. Khoromskij, and S. A. Sauter, On H²-matrices. In Lectures on applied mathematics, Springer-Verlag, Berlin 2000, 9–29. v, 2, 29, 125
[71] W. Hackbusch and Z. P. Nowak, O cloznosti metoda panelej (On complexity of the panel method, Russian). In Vychislitelnye protsessy i sistemy 6, Nauka, Moscow 1988, 233–244. 1
[72] W. Hackbusch and Z. P. Nowak, On the fast matrix multiplication in the boundary element method by panel clustering. Numer. Math. 54 (1989), 463–491. 1, 3, 5, 9, 74
[72] W. Hackbusch and Z. P. Nowak, On the fast matrix multiplication in the boundary element method by panel clustering. Numer. Math. 54 (1989), 463–491. 1, 3, 5, 9, 74 [73] H. Harbrecht and R. Schneider, Wavelet Galerkin schemes for boundary integral equations—implementation and quadrature. SIAM J. Sci. Comput. 27 (2006), 1347–1370. 2 [74] S. Jaffard, Wavelet methods for fast resolution of elliptic problems. SIAM J. Numer. Anal. 29 (1992), 965–986. 401 [75] S. Lang, Real and functional analysis. 3rd ed., Graduate Texts in Math. 142, SpringerVerlag, New York 1993. 10, 12 [76] S. Le Borne, H -matrices for convection-diffusion problems with constant convection. Computing 70 (2003), 261–274. 2 [77] S. Le Borne, Hierarchical matrices for convection-dominated problems. In Domain decomposition methods in science and engineering (Berlin, 2003), Lect. Notes Comput. Sci. Eng. 40, Springer-Verlag, Berlin 2005, 631–638. 2 [78] S. Le Borne and L. Grasedyck, H -matrix preconditioners in convection-dominated problems. SIAM J. Matrix Anal. Appl. 27 (2006), 1172–1183. 2 [79] S. Le Borne, L. Grasedyck, and R. Kriemann, Domain-decomposition based H -LU preconditioners. In Domain decomposition methods in science and engineering XVI, Lect. Notes Comput. Sci. Eng. 55, Springer-Verlag, Berlin 2007, 667–674. 414 [80] M. Löhndorf, Effiziente Behandlung von Integraloperatoren mit H 2 -Matrizen variabler Ordnung. PhD thesis, Universität Leipzig, Leipzig 2003. http://www.mis.mpg.de/scicomp/Fulltext/Dissertation_Loehndorf.pdf 65, 161 [81] M. Löhndorf and J. M. Melenk, Approximation of integral operators by variable-order approximation. Part II: Non-smooth domains. In preparation. 65, 274 [82] J. Makino, Yet another fast multipole method without multipoles—pseudoparticle multipole method. J. Comput. Phys. 151 (1999), 910–920. 2 [83] P.-G. Martinsson, A fast direct solver for a class of elliptic partial differential equations. J. Sci. Comput. 38 (2009), 316–330. 59 [84] E. Michielssen, A. Boag, and W. C. Chew, Scattering from elongated objects: direct solution in O.N log2 N / operations. IEE Proc.-Microw. Antennas Propag. 143 (1996),277– 283. 36 [85] G. Of, O. Steinbach, and W. L. Wendland, The fast multipole method for the symmetric boundary integral formulation. IMA J. Numer. Anal. 26 (2006), 272–296. 2 [86] J. Ostrowski, Boundary element methods for inductive hardening. PhD thesis, Universität Tübingen, Tübingen 2003. http://tobias-lib.uni-tuebingen.de/dbt/volltexte/2003/672/ 2 [87] T. J. Rivlin, The Chebyshev polynomials. Wiley-Interscience, New York 1974. 94 [88] V. Rokhlin, Rapid solution of integral equations of classical potential theory. J. Comput. Phys. 60 (1985), 187–207. 2, 74 [89] S. A. Sauter, Cubature techniques for 3-D Galerkin BEM. In Boundary elements: implementation and analysis of advanced algorithms (Kiel, 1996), Notes Numer. Fluid Mech. 54, Vieweg, Braunschweig 1996, 29–44. 156, 392
[90] S. A. Sauter, Variable order panel clustering (extended version). Preprint 52/1999, Max Planck Institute for Mathematics in the Sciences, Leipzig 1999. http://www.mis.mpg.de/publications/preprints/1999/prepr1999-52.html 37, 125, 126, 278
[91] S. A. Sauter, Variable order panel clustering. Computing 64 (2000), 223–261. 1, 2, 5, 37, 75, 125, 126
[92] S. A. Sauter and C. Schwab, Randelementmethoden. B. G. Teubner, Wiesbaden 2004. 37, 388, 389, 390, 401
[93] O. Schenk, K. Gärtner, W. Fichtner, and A. Stricker, PARDISO: a high-performance serial and parallel sparse linear solver in semiconductor device simulation. Future Generat. Comput. Syst. 18 (2001), 69–78. 366
[94] R. Schneider, Multiskalen- und Wavelet-Matrixkompression. B. G. Teubner, Stuttgart 1998. 2
[95] L. R. Scott and S. Zhang, Finite element interpolation of nonsmooth functions satisfying boundary conditions. Math. Comp. 54 (1990), 483–493. 377
[96] P. Starr and V. Rokhlin, On the numerical solution of two-point boundary value problems II. Comm. Pure Appl. Math. 47 (1994), 1117–1159. 3, 36, 59
[97] O. Steinbach, Numerische Näherungsverfahren für elliptische Randwertprobleme. B. G. Teubner, Wiesbaden 2003. 392, 396, 398
[98] O. Steinbach and W. L. Wendland, The construction of some efficient preconditioners in the boundary element method. Adv. Comput. Math. 9 (1998), 191–216. 401
[99] E. Stiefel, Über einige Methoden der Relaxationsrechnung. Z. Angew. Math. Physik 3 (1952), 1–33. 393
[100] J. Tausch, The variable order fast multipole method for boundary integral equations of the second kind. Computing 72 (2004), 267–291. 3, 75, 125, 278
[101] J. Tausch, A variable order wavelet method for the sparse representation of layer potentials in the non-standard form. J. Numer. Math. 12 (2004), 233–254. 125
[102] J. Tausch and J. White, Multiscale bases for the sparse representation of boundary integral operators on complex geometry. SIAM J. Sci. Comput. 24 (2003), 1610–1629. 2, 191
[103] E. Tyrtyshnikov, Mosaic-skeleton approximations. Calcolo 33 (1996), 47–57. 2
[104] E. E. Tyrtyshnikov, Incomplete cross approximation in the mosaic-skeleton method. Computing 64 (2000), 367–380. 227, 392
[105] T. von Petersdorff and C. Schwab, Fully discrete multiscale Galerkin BEM. In Multiscale wavelet methods for partial differential equations, Wavelet Anal. Appl. 6, Academic Press, San Diego 1997, 287–346. 2
[106] J. Xia, S. Chandrasekaran, M. Gu, and X. S. Li, Fast algorithms for hierarchically semiseparable matrices. Numer. Linear Algebra Appl., to appear. Article first published online: 22 DEC 2009, DOI: 10.1002/nla.691. 59
[107] J. Xia, S. Chandrasekaran, M. Gu, and X. S. Li, Superfast multifrontal method for large structured linear systems of equations. SIAM J. Matrix Anal. Appl. 31 (2009), 1382–1411. 59
[108] L. Ying, G. Biros, and D. Zorin, A kernel-independent adaptive fast multipole algorithm in two and three dimensions. J. Comput. Phys. 196 (2004), 591–626. 2
Algorithm index
Adaptive bases for H-matrices, 238
Adaptive bases for H²-matrices, 246
Adaptive bases for dense matrices, 230
Adaptive bases for dense matrices, theoretical, 229
Adaptive bases for weighted dense matrices, 263
Backward transformation, 61
Block backward transformation, 339
Block cluster tree, 43
Block forward transformation, 169
Blockwise conversion of an H-matrix, 257
Bounding boxes, 48
Check for orthogonality, 166
Cluster basis product, 176
Coarsening, 354
Construction of a nested cluster basis from a general one, 224
Construction of weights for the total cluster basis, 243
Conversion, H to H², 173
Conversion, H to H², with error norm, 202
Conversion, H² to H², 178
Conversion, H² to H², with error norm, 203
Conversion, dense to H², 170
Conversion, dense to H², with error norm, 201
Covering balls, 46
Expansion of a cluster basis, 338
Find optimal rank, 187
Forward substitution, 407
Forward transformation, 61
Geometric cluster trees, 42
Householder factorization, 181
Low-rank approximation of leaf blocks, 351
Low-rank approximation of matrix blocks, 349
Low-rank approximation of subdivided blocks, 353
Low-rank projection, 188
Lower forward substitution, 404
LU decomposition, 403
Matrix addition, 296
Matrix backward transformation, 291
Matrix backward transformation, elementary step, 290
Matrix forward transformation, 286
Matrix forward transformation, elementary step, 285
Matrix inversion, 416
Matrix multiplication, 314
Matrix multiplication, b_A admissible, 312
Matrix multiplication, b_B admissible, 312
Matrix multiplication, b_C admissible, 313
Matrix multiplication, inadmissible, 314
Matrix multiplication, recursion, 311
Matrix-vector multiplication, 62
Orthogonalization of cluster bases, 184
Recursive construction of cluster bases for H-matrices, 237
Recursive construction of cluster bases for H²-matrices, 246
Recursive construction of unified cluster bases, 252
Regular cluster trees, 40
Semi-uniform matrix forward transformation, 338, 340
Semi-uniform matrix product, multiplication step, 342
Truncation of cluster bases, 190
Unification of submatrices, 255
Upper forward substitution, 405
Subject index
Admissibility condition, 36
Admissibility condition, weak, 36
Anisotropic coefficients, 419
Approximation of derivatives, 104
Asymptotically smooth function, 83
Backward transformation, blocks, 339
Backward transformation, matrices, 287
Block backward transformation, 339
Block cluster tree, 34
Block cluster tree, admissible, 37
Block cluster tree, consistent, 144
Block cluster tree, induced, 298, 325
Block cluster tree, sparse, 51
Block cluster tree, transposed, 215
Block columns, 50
Block columns, admissible, 59
Block forward transformation, 169
Block restriction, 167
Block rows, 50
Block rows, admissible, 59
Block rows, extended, 170
Boundary integral formulation, direct, 395
Boundary integral formulation, indirect, 388
Bounding boxes, 45, 90
Caccioppoli inequality, 367
Call tree for multiplication, 315
Characteristic point, 38
Chebyshev interpolation, 93
Clément interpolation operator, 377
Cluster basis, 54
Cluster basis product, 176
Cluster basis, conversion into nested basis, 223
Cluster basis, induced, 299, 321, 325
Cluster basis, nested, 54
Cluster basis, orthogonal, 164
Cluster basis, orthogonalization, 180
Cluster basis, truncation, 185
Cluster basis, unified, 248
Cluster operator, 175
Cluster tree, 30
Cluster tree, bounded, 63
Cluster tree, depth, 34
Cluster tree, quasi-balanced, 129
Cluster tree, regular, 69
Cluster, descendants, 30
Cluster, father, 30
Cluster, level, 31
Cluster, predecessors, 30
Coarsening the block structure, 354
Complexity of block forward transformation, 169
Complexity of cluster basis expansion, 342
Complexity of coarsening, 357
Complexity of collecting submatrices, 284
Complexity of compression of dense matrices, 229
Complexity of compression of H-matrices, 236
Complexity of compression of H²-matrices, 245
Complexity of conversion of dense matrices, 171
Complexity of conversion of H-matrices, 173
Complexity of conversion of H²-matrices, 178
Complexity of finding cluster weights, 243
Complexity of hierarchical compression, 258
Complexity of matrix backward transformation, 292
Complexity of matrix forward transformation, 286
Complexity of orthogonalization, 183
Complexity of projected matrix addition, 295
Complexity of projected multiplication, 317
Complexity of semi-uniform forward transformation, 343
Complexity of semi-uniform matrix multiplication, 346
Complexity of splitting into submatrices, 289
Complexity of the cluster basis product, 177
Complexity of truncation, 190
Complexity of unification of cluster bases, 250
Complexity of unification of submatrices, 255
Condensation of H-matrix blocks, 234
Condensation of total cluster bases, 240, 249
Conjugate gradient method, 401
Control of blockwise relative error, 262
Control of spectral error by variable rank compression, 264
Covering a regularity ellipse with circles, 152
Density function, 388
Direct boundary integral formulation, 395
Directional interpolation, 98
Double layer potential, 155
Elliptic partial differential equation, 363, 413
Error decomposition for truncation, 194
Error estimate for asymptotically smooth functions, 102
Error estimate for coarsening, 354
Error estimate for compression, 232
Error estimate for derived asymptotically smooth functions, 113
Error estimate for discrete solution operators, 385
Error estimate for isotropic interpolation, 101
Error estimate for multi-dimensional derived interpolation, 111
Error estimate for multi-dimensional interpolation, 100
Error estimate for multi-dimensional re-interpolation, 141
Error estimate for one-dimensional derived interpolation, 108
Error estimate for one-dimensional interpolation, 97
Error estimate for one-dimensional re-interpolation, 136
Error estimate for re-interpolation of analytic functions, 137
Error estimate for re-interpolation of asymptotically smooth functions, 142
Error estimate for semi-uniform projection, 224
Error estimate for solution operators, 372
Error estimate for submatrices of solution operators, 382
Error estimate for the double layer potential, 399
Error estimate for the single layer potential, 391
Error estimate for the Taylor expansion, 85
Error estimate for truncation, 195
Error estimate for variable-order interpolation, 147
Error orthogonality, 192
Expansion matrices, 54
Expansion system, 77
Farfield, 37
Finite element method, 76, 365, 389, 414
Forward substitution, 406
Forward substitution, blocks, 402
Forward transformation, blocks, 169
Forward transformation, matrices, 285
Forward transformation, semi-uniform matrices, 338, 340
Frobenius error, blockwise, 116
Frobenius error, integral operator, 120
Frobenius error, projection, 200
Frobenius error, total, 117
Frobenius inner product, 166
Frobenius norm, 115, 166, 200
Galerkin's method, 76, 365, 389, 414
H-matrix, 49
H²-matrix, 56
Hierarchical compression, 257
Hierarchical matrix, 49
Holomorphic extension, 151
Householder factorization, 180
HSS-matrix, 59
Indirect boundary integral formulation, 388
Induced block cluster tree, 298, 325
Induced cluster basis, 299, 321, 325
Integral operator, double layer potential, 155
Integral operator, Frobenius error, 120
Integral operator, single layer potential, 155, 388
Integral operator, spectral error, 124
Integral operator, truncation, 198
Integral operator, variable-rank compression, 265
Interior regularity, 366, 367
Interpolation, 88
Interpolation scheme, 93
Interpolation, best approximation property, 94
Jumping coefficient, 418
Lebesgue constant, 93
Locally L-harmonic, 367
LU decomposition, 402
Markov's inequality, 106
Matrix addition, exact, 300
Matrix addition, projected, 293
Matrix backward transformation, 287
Matrix forward transformation, 285
Matrix inversion, 415
Matrix multiplication, exact, 327
Matrix multiplication, projected, 310
Matrix multiplication, semi-uniform, 340
Nearfield, 37
Operator norm, 120
Orthogonality criterion, 165
Orthogonalization of cluster bases, 180
Overlapping supports, 115
Partial differential equation, 364
Partial interpolation, 98
Polynomial approximation of analytic functions, 95
Preconditioner, 401
Projection into H²-matrix spaces, 167
Projection into semi-uniform matrix spaces, 214
Projection of H-matrices, 172
Projection of dense matrices, 168
Quasi-optimality of compression, 233
Rank distribution, 54
Rank distribution, bounded, 64
Rank distribution, maximum, 176
Re-interpolated kernel, 132
Re-interpolation, 131
Re-interpolation cluster basis, 131
Restriction of cluster basis, 220
Schur complement, 415
Semi-uniform hierarchical matrix, 212, 334
Semi-uniform matrix forward transformation, 338, 340
Separable approximation by Taylor expansion, 81
Separable approximation by tensor interpolation, 90
Sequence of -regular bounding boxes, 141
Sequence of -regular boxes, 140
Sequence of -regular intervals, 135
Single layer potential, 155, 388
Singular value decomposition, 185, 348
Spaces of semi-uniform hierarchical matrices, 213
Spectral error, blockwise, 120
Spectral error, factorized estimate, 124
Spectral error, integral operator, 124
Spectral error, projection, 216
Spectral norm, 120
Spectral norm, total, 121, 122
Stability of re-interpolation, 135
Stable interpolation scheme, 93
Strang's lemma, 390
Support, 38
Symm's equation, 388
Tausch/White wavelets, 191
Taylor expansion, 80
Tensor interpolation, 90
Total cluster basis, 218
Total cluster basis, geometric interpretation, 222
Total cluster basis, properties, 218
Total cluster basis, weighted, 261
Transfer matrices, 54
Transfer matrices, long-range, 193
Tree, 29, 30
Truncation of cluster bases, 185
Unification, 248
Variable-order approximation, 125
Variable-order interpolation of the kernel function, 142
Variable-rank compression, integral operator, 265