PROCEEDINGS OF THE CONFERENCE ON APPLIED MATHEMATICS AND SCIENTIFIC COMPUTING
Edited by
ZLATKO DRMAČ, University of Zagreb, Croatia
MILJENKO MARUŠIĆ, University of Zagreb, Croatia
ZVONIMIR TUTEK, University of Zagreb, Croatia
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN 1-4020-3196-3 (HB) ISBN 1-4020-3197-1 (e-book)
Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands. Sold and distributed in North, Central and South America by Springer, 101 Philip Drive, Norwell, MA 02061, U.S.A. In all other countries, sold and distributed by Springer, P.O. Box 322, 3300 AH Dordrecht, The Netherlands.
Printed on acid-free paper
All Rights Reserved © 2005 Springer No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Printed in the Netherlands.
Preface
The Third Conference on Applied Mathematics and Scientific Computing took place June 23–27, 2003 on the island of Brijuni, Croatia. The main goal of the conference was to exchange ideas among applied mathematicians in the broadest sense, both from inside and outside academia, as well as experts from other areas who apply different mathematical techniques. During the meeting there were invited and contributed talks and software presentations. Invited presentations were given by active researchers from the fields of approximation theory, numerical methods for differential equations and numerical linear algebra. These proceedings contain research and review papers by invited speakers and selected contributed papers from the fields of applied and numerical mathematics. A particular aim of the conference was to encourage young scientists to present the results of their research. Traditionally, the best presentation given by a PhD student was awarded a prize. This year's awardee was Luka Grubišić (University of Hagen, Hagen, Germany), and we congratulate him on this achievement. It would have been hard to organize the conference without the generous support of the Croatian Ministry of Science and Technology, which we gratefully acknowledge. We are also indebted to the main organizer, the Department of Mathematics, University of Zagreb. The motivating beautiful natural setting should also be mentioned. Finally, we are thankful to Drs. Josip Tambača and Ivica Nakić for giving this book its final shape.

MILJENKO MARUŠIĆ
ZLATKO DRMAČ
ZVONIMIR TUTEK
Contents
Preface v

Part I
Invited lectures
Skew-Hamiltonian and Hamiltonian Eigenvalue Problems: Theory, Algorithms and Applications 3
Peter Benner, Daniel Kressner, Volker Mehrmann
1 Preliminaries 5
2 The Skew-Hamiltonian Eigenvalue Problem 8
3 The Hamiltonian Eigenvalue Problem 18
4 Applications 30
5 Concluding Remarks 34
Acknowledgments 34
References 35
A General Frame for the Construction of Constrained Curves 41
Paolo Costantini, Maria Lucia Sampoli
1 Introduction 41
2 The general structure of Abstract Schemes 42
3 Construction of constrained curves 52
4 A new application: geometric interpolation 60
5 Concluding remarks 62
References 64

DMBVP for Tension Splines 67
Boris I. Kvasov
1 Introduction 67
2 1–D DMBVP. Finite Difference Approximation 68
3 System Splitting and Mesh Solution Extension 69
4 Computational Aspects 72
5 2–D DMBVP. Problem Formulation 76
6 Finite–Difference Approximation of DMBVP 78
7 Algorithm 80
8 SOR Iterative Method 81
9 Method of Fractional Steps 82
viii
APPLIED MATHEMATICS AND SCIENTIFIC COMPUTING
10 Graphical Examples 84
Acknowledgments 92
References 93
Robust numerical methods for the singularly perturbed Black-Scholes equation 95
J. J. H. Miller, G. I. Shishkin
1 Introduction 95
2 Problem formulation 96
3 Numerical solutions of singular perturbation problems 98
4 Upwind uniform mesh method 99
5 Upwind piecewise-uniform fitted mesh method 101
6 Summary 104
Acknowledgments 105
References 105
Part II
Contributed lectures
On certain properties of spaces of locally Sobolev functions 109
Nenad Antonić, Krešimir Burazin
1 Introduction 109
2 Spaces of locally Sobolev functions 111
3 Duality of spaces W^{m,p}_c(Ω) and W^{-m,p}_{loc}(Ω) 113
4 Weak convergence and some imbeddings 116
5 Concluding remarks 119
References 119
On some properties of homogenised coefficients for stationary diffusion problem 121
Nenad Antonić, Marko Vrdoljak
1 Introduction 122
2 Two-dimensional case 123
3 Three-dimensional case 125
4 Some special cases 128
References 129
Solving Parabolic Singularly Perturbed Problems by Collocation Using Tension Splines 131
Ivo Beroš, Miljenko Marušić
1 Introduction 131
2 Collocation method 132
3 Collocation method for parabolic differential equation 134
4 Numerical results 135
References 139
On accuracy properties of one–sided bidiagonalization algorithm and its applications 141
Nela Bosner, Zlatko Drmač
1 Introduction 141
2 One–sided bidiagonalizations 142
3 Error analysis 143
4 On the (ir)relevance of the (non)orthogonality of Ũ 146
5 Conclusion 150
References 150
Knot Insertion Algorithms for Weighted Splines 151
Tina Bosner
1 Introduction and Preliminaries 151
2 Weighted splines of order 4 (k = 2) 154
3 Weighted splines of order k + 2 (k > 2) 156
4 Conclusion 159
References 159
Numerical procedures for the determination of an unknown source parameter in a parabolic equation 161
Emine Can Baran
1 Introduction 161
2 Procedure I (canonical representation) 162
3 Procedure II (TTF formulation) 163
4 Numerical Result and Discussion 164
References 168
Balanced central NT schemes for the shallow water equations 171
Nelida Crnjarić-Žic, Senka Vuković, Luka Sopta
1 Central NT scheme 172
2 Balanced central NT scheme for the shallow water equations 175
3 Numerical results 179
4 Concluding remarks 184
References 184
Hidden Markov Models and Multiple Alignments of Protein Sequences 187
Pavle Goldstein, Maja Karaga, Mate Kosor, Ivana Nižetić, Marija Tadić, Domagoj Vlah
1 Introduction 188
2 Hidden Markov Models 188
3 Expectation Maximization 189
4 Suboptimal Alignments 191
5 Results and Conclusions 192
References 195
On strong consistency for one–step approximations of stochastic ordinary differential equations 197
Rózsa Horváth Bokor
1 Introduction 197
2 Strong Convergence and Consistency 199
References 205
On the dimension of bivariate spline space S_3^1(△) 207
Gašper Jaklič, Jernej Kozak
1 Introduction 207
2 The approaches to the dimension problem 209
3 The reduction step 210
4 The reduction possibilities considered 212
References 216
Total least squares problem for the Hubbert function 217
Dragan Jukić, Rudolf Scitovski, Kristian Sabo
1 Introduction 217
2 The existence problem and its solution 221
3 Choice of initial approximation 231
4 Numerical examples 231
References 233

Heating of oil well by hot water circulation 235
Mladen Jurak, Žarko Prnić
1 Mathematical model 236
2 Variational problem 238
3 Numerical approximation 240
References 243

Geometric Interpolation of Data in R³ 245
Jernej Kozak, Emil Žagar
1 Introduction 245
2 The system of nonlinear equations 246
3 The proof of the theorem 248
4 Numerical example 251
References 252
One-dimensional flow of a compressible viscous micropolar fluid: stabilization of the solution 253
Nermina Mujaković
1 Statement of the problem and the main result 254
2 Some properties of the nonstationary solution 256
3 Proof of Theorem 1.1 259
References 261

On parameter classes of solutions for system of quasilinear differential equations 263
Alma Omerspahić, Božo Vrdoljak
1 Introduction 263
2 The main results 265
References 272
Algebraic Proof of the B–Spline Derivative Formula 273
Mladen Rogina
1 Introduction and preliminaries 273
2 The derivative formula 274
Acknowledgment 281
References 281

Relative Perturbations, Rank Stability and Zero Patterns of Matrices 283
Sanja Singer, Saša Singer
1 Introduction 283
2 Problem reduction for rank deficient matrices 285
3 Vanishing determinant stability 286
4 Zero patterns 289
References 292
Numerical Simulations of Water Wave Propagation and Flooding 293
Luka Sopta, Nelida Crnjarić-Žic, Senka Vuković, Danko Holjević, Jerko Škifić, Siniša Družeta
1 Introduction 294
2 Wetting and drying 296
3 Simulations 298
References 303
Derivation of a model of leaf springs 305
Josip Tambača
1 Introduction 305
2 Geometry of straight multileaf springs 306
3 3D elasticity problem 307
4 The problem in ε–independent domain 307
5 A priori estimates 309
6 The first test function 311
7 The second test function 312
8 The model 314
References 315
Quantum site percolation on amenable graphs 317
Ivan Veselić
1 Introduction: The Quantum percolation model 317
2 Results: Spectral properties of finite range hopping operators 319
3 Proofs of the theorems 322
4 Outlook: finitely supported and exponentially decaying states 326
References 327
Order of Accuracy of Extended WENO Schemes 329
Senka Vuković, Nelida Crnjarić-Žic, Luka Sopta
1 Introduction 330
2 Extended WENO schemes 330
3 Application to one-dimensional shallow water equations 333
4 Application to one-dimensional linear acoustics equations 334
5 Application to one-dimensional Burgers equations with source term describing bathymetry 337
6 Concluding remarks 337
References 339

Index 343
I
INVITED LECTURES
SKEW-HAMILTONIAN AND HAMILTONIAN EIGENVALUE PROBLEMS: THEORY, ALGORITHMS AND APPLICATIONS∗

Peter Benner
Technische Universität Chemnitz, Fakultät für Mathematik
[email protected]
Daniel Kressner
Technische Universität Berlin, Institut für Mathematik
[email protected]
Volker Mehrmann
Technische Universität Berlin, Institut für Mathematik
[email protected]
Abstract
Skew-Hamiltonian and Hamiltonian eigenvalue problems arise from a number of applications, particularly in systems and control theory. The preservation of the underlying matrix structures often plays an important role in these applications and may lead to more accurate and more efficient computational methods. We will discuss the relation of structured and unstructured condition numbers for these problems as well as algorithms exploiting the given matrix structures. Applications of Hamiltonian and skew-Hamiltonian eigenproblems are briefly described.
Keywords:
Hamiltonian matrix, skew-Hamiltonian matrix, structured condition numbers, structure-preserving algorithms.
∗ Supported by the DFG Research Center “Mathematics for key technologies” (FZT 86) in Berlin.
Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 3–39.
© 2005 Springer. Printed in the Netherlands.
Introduction

Computing eigenvalues and invariant subspaces of matrices with structure has been an active field of research during the last two decades. In many instances it has been shown that the exploitation of matrix structures may give rise to more accurate and more efficient numerical methods. In this paper we will discuss this issue for two classes of matrices, skew-Hamiltonian and Hamiltonian matrices. A skew-Hamiltonian matrix has the form
\[
W = \begin{bmatrix} A & G \\ Q & A^T \end{bmatrix}, \qquad G = -G^T, \quad Q = -Q^T, \tag{1}
\]
while a Hamiltonian matrix reads as
\[
H = \begin{bmatrix} A & G \\ Q & -A^T \end{bmatrix}, \qquad G = G^T, \quad Q = Q^T, \tag{2}
\]
where A, G and Q are real n × n matrices. A number of applications from control theory and related areas lead to eigenvalue problems involving such matrices, with a stronger emphasis on Hamiltonian matrices, see Section 4.
One of the first questions one should always ask when dealing with structured eigenvalue problems is what kind of advantages can principally be expected from exploiting structures. With respect to the accuracy of computed eigenvalues and invariant subspaces this question leads to the notion of structured condition numbers and their relationship to unstructured ones. It is interesting to note that the two matrix structures under consideration differ significantly in this aspect. While it is absolutely necessary to use a structure-preserving algorithm for computing invariant subspaces of skew-Hamiltonian matrices, the merits of structure preservation for Hamiltonian matrices are of a more subtle nature and not always relevant in applications.
If one is interested in efficiency then there is not so much that can be expected. Both matrix classes depend on 2n² + O(n) parameters, compared to the 4n² parameters of a general 2n × 2n matrix. Hence, a structure-preserving algorithm can be expected to be at best a decent factor faster than a general-purpose method; for the matrix classes considered here, this factor is usually in the range of 2–3, see [Benner et al., 2000; Benner and Kressner, 2004; Benner et al., 1998].
Another important question is whether it is actually possible to design an algorithm capable of achieving the possible advantages mentioned above. An ideal method tailored to the matrix structure would
- be strongly backward stable in the sense of Bunch described in [Bunch, 1987], i.e., the computed solution is the exact solution corresponding to a nearby matrix with the same structure;
- be reliable, i.e., capable of solving all eigenvalue problems in the considered matrix class; and
- require O(n³) floating point operations (flops), preferably less than a competitive general-purpose method.
While for skew-Hamiltonian matrices such a method is known [Van Loan, 1984b], it has been a long-standing open problem to develop an ideal method for the Hamiltonian eigenvalue problem. So far no method is known that meets all three requirements satisfactorily.
The main purpose of this paper is to survey theory and algorithms for (skew-)Hamiltonian eigenvalue problems. With respect to algorithms, the account will necessarily be rather incomplete, simply because of the vast number of algorithms that have been developed. Instead, our focus will be on methods that are based on orthogonal transformations and suitable for dense, small to medium-sized matrices. Nevertheless, they will be related to other existing methods. Another goal in this work is to describe applications of (skew-)Hamiltonian eigenvalue problems and identify the extent to which a structure-preserving algorithm may help to address these applications in a more accurate or more efficient manner.
The structure of this survey is as follows. After having introduced some notation and preliminary material in the first section we devote the second section to the skew-Hamiltonian eigenvalue problem. We review structured Hessenberg-like, Schur-like and block diagonal decompositions. This is followed by some recent and new results on structured condition numbers for the eigenvalues and invariant subspaces. The section is concluded by a description of the ideal method for skew-Hamiltonian matrices that was mentioned above. Section 3 contains similar results for the Hamiltonian eigenvalue problem, with a more extensive treatment of structure-preserving algorithms.
In particular, we present an explicit version of the Hamiltonian QR algorithm, describe an alternative derivation of the method given in [Benner et al., 1998] via an embedding in skew-Hamiltonian matrices, and give an example of an iterative refinement algorithm. Some applications related to systems and control theory, and how they may benefit from the use of structure-preserving algorithms, are the subject of Section 4.
This paper is accompanied by a Matlab software library for solving skew-Hamiltonian and Hamiltonian eigenvalue problems. The library is based on recently developed Fortran 77 routines [Benner and Kressner, 2004] and is described in [Kressner, 2003b], which also contains numerical examples illustrating some of the aspects in this survey.
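Before moving on, the block structures (1) and (2) are easy to experiment with numerically. The following short NumPy sketch (illustrative code, not part of the paper; sizes and random data are arbitrary) builds a skew-Hamiltonian W and a Hamiltonian H and verifies the equivalent characterizations WJ = −(WJ)^T and HJ = (HJ)^T that are established in the preliminaries below.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

A = rng.standard_normal((n, n))
M1 = rng.standard_normal((n, n))
M2 = rng.standard_normal((n, n))
G_skew, Q_skew = M1 - M1.T, M2 - M2.T   # skew-symmetric blocks for (1)
G_sym, Q_sym = M1 + M1.T, M2 + M2.T     # symmetric blocks for (2)

Z, I = np.zeros((n, n)), np.eye(n)
J = np.block([[Z, I], [-I, Z]])         # the matrix J_{2n} of (3)

W = np.block([[A, G_skew], [Q_skew, A.T]])   # skew-Hamiltonian, cf. (1)
H = np.block([[A, G_sym], [Q_sym, -A.T]])    # Hamiltonian, cf. (2)

# characterizations: W J is skew-symmetric, H J is symmetric
assert np.allclose(W @ J, -(W @ J).T)
assert np.allclose(H @ J, (H @ J).T)
print("structure checks passed")
```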
1. Preliminaries

A ubiquitous matrix in this work is the skew-symmetric matrix
\[
J_{2n} = \begin{bmatrix} 0 & I_n \\ -I_n & 0 \end{bmatrix}, \tag{3}
\]
where I_n denotes the n × n identity matrix. In the following we will drop the subscripts n and 2n whenever the dimension of the corresponding matrix is clear from its context. By straightforward algebraic manipulation one can show that a Hamiltonian matrix H is equivalently defined by the property HJ = (HJ)^T. Likewise, a matrix W is skew-Hamiltonian if and only if WJ = −(WJ)^T. Any matrix S ∈ R^{2n×2n} satisfying S^T J S = S J S^T = J is called symplectic, and since
\[
(S^{-1} H S) J = S^{-1} H J S^{-T} = S^{-1} J^T H^T S^{-T} = [(S^{-1} H S) J]^T
\]
we see that symplectic equivalence transformations preserve Hamiltonian structure. There are cases, however, where both H and S^{-1}HS are Hamiltonian but S is not a symplectic matrix [Freiling et al., 2002]. In a similar fashion the same can be shown for skew-Hamiltonian matrices.
From a numerical point of view it is desirable that a symplectic matrix U ∈ R^{2n×2n} is also orthogonal. Such a matrix is called orthogonal symplectic; the two relations U^T J U = J and U^T U = I imply J U J^T = U, which effectively means that every orthogonal symplectic matrix U has the block structure
\[
U = \begin{bmatrix} U_1 & U_2 \\ -U_2 & U_1 \end{bmatrix}, \qquad U_1, U_2 \in \mathbb{R}^{n \times n}.
\]
Two types of elementary orthogonal matrices have this form. These are 2n × 2n Givens rotation matrices of the type
\[
G_j(\theta) = \begin{bmatrix}
I_{j-1} & & & & \\
& \cos\theta & & \sin\theta & \\
& & I_{n-1} & & \\
& -\sin\theta & & \cos\theta & \\
& & & & I_{n-j}
\end{bmatrix}, \qquad 1 \le j \le n,
\]
for some angle θ ∈ [−π/2, π/2), and the direct sum of two identical n × n Householder matrices
\[
(H_j \oplus H_j)(v, \beta) = \begin{bmatrix} I_n - \beta v v^T & \\ & I_n - \beta v v^T \end{bmatrix},
\]
where v is a vector of length n with its first j − 1 elements equal to zero and β a scalar satisfying β(βv^T v − 2) = 0. Here, ‘⊕’ denotes the direct sum of matrices. A simple combination of these transformations can be used to map an arbitrary vector x ∈ R^{2n} into the linear space
\[
\mathcal{E}_j = \operatorname{span}\{e_1, \ldots, e_j, e_{n+1}, \ldots, e_{n+j-1}\},
\]
where e_i is the i-th unit vector of length 2n. Such mappings form the backbone of virtually all structure-preserving algorithms based on orthogonal symplectic
Figure 1. The three steps of Algorithm 1 for n = 4 and j = 2.
transformations. They can be constructed using the following algorithm, where it should be noted that elements 1, . . . , j − 1 and n + 1, . . . , n + j − 1 of the vector x remain unaffected.

Algorithm 1.
Input: A vector x ∈ R^{2n} and an index j ≤ n.
Output: Vectors v, w ∈ R^n and β, γ, θ ∈ R so that [(H_j ⊕ H_j)(v, β) · G_j(θ) · (H_j ⊕ H_j)(w, γ)]^T x ∈ E_j.
1. Determine v ∈ R^n and β ∈ R such that the last n − j elements of x ← (H_j ⊕ H_j)(v, β)x are zero, see [Golub and Van Loan, 1996, p. 209].
2. Determine θ ∈ [−π/2, π/2) such that the (n + j)th element of x ← G_j(θ)x is zero, see [Golub and Van Loan, 1996, p. 215].
3. Determine w ∈ R^n and γ ∈ R such that the (j + 1)th to the nth elements of x ← (H_j ⊕ H_j)(w, γ)x are zero.

The three steps of this algorithm are illustrated in Figure 1. Orthogonal symplectic matrices of the form
\[
E_j(x) \equiv E_j(v, w, \beta, \gamma, \theta) := (H_j \oplus H_j)(v, \beta) \cdot G_j(\theta) \cdot (H_j \oplus H_j)(w, \gamma), \tag{4}
\]
as computed by Algorithm 1, will be called elementary.
Let
\[
F = \begin{bmatrix} 0 & I_n \\ I_n & 0 \end{bmatrix};
\]
then we obtain the following variant of elementary orthogonal symplectic matrices:
\[
[F \cdot E_j(Fx) \cdot F]^T x \in \operatorname{span}\{e_1, \ldots, e_{j-1}, e_{n+1}, \ldots, e_{n+j}\}.
\]
For the sake of brevity we set E_{n+j}(x) := F · E_j(Fx) · F, whenever 1 ≤ j ≤ n.
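A dense implementation of Algorithm 1 can be sketched as follows. This is illustrative code, not the paper's accompanying library: indices are 0-based, so the parameter j below corresponds to the paper's j − 1, and the symplectic Givens factor is formed directly from the two affected entries rather than through an angle θ.

```python
import numpy as np

def house_pair(vec_half, j, n):
    """(H_j ⊕ H_j)(v, β): two identical n x n Householder reflections, chosen
    to zero the entries of vec_half below position j (0-based); v[:j] = 0."""
    v = np.zeros(n)
    v[j:] = vec_half[j:]
    sigma = np.linalg.norm(v)
    if sigma == 0.0:
        return np.eye(2 * n)
    v[j] += np.copysign(sigma, v[j] if v[j] != 0 else 1.0)
    Hn = np.eye(n) - (2.0 / (v @ v)) * np.outer(v, v)
    Z = np.zeros((n, n))
    return np.block([[Hn, Z], [Z, Hn]])

def givens_sym(xj, xnj, j, n):
    """Symplectic Givens rotation in the (j, n+j) plane zeroing position n+j."""
    r = np.hypot(xj, xnj)
    c, s = (1.0, 0.0) if r == 0 else (xj / r, -xnj / r)
    G = np.eye(2 * n)
    G[j, j] = c;      G[j, n + j] = -s
    G[n + j, j] = s;  G[n + j, n + j] = c
    return G

def algorithm1(x, j):
    """Sketch of Algorithm 1 (j is 0-based). Returns an orthogonal symplectic
    E and y = E.T @ x, with the nonzeros of y confined to positions 0..j
    and n..n+j-1."""
    x = np.asarray(x, float)
    n = x.size // 2
    H1 = house_pair(x[n:], j, n)            # step 1: zero positions n+j+1 .. 2n-1
    y = H1 @ x
    G = givens_sym(y[j], y[n + j], j, n)    # step 2: zero position n+j
    y = G @ y
    H2 = house_pair(y[:n], j, n)            # step 3: zero positions j+1 .. n-1
    y = H2 @ y
    return H1 @ G.T @ H2, y                 # E.T @ x == y (Householders are symmetric)

# demo with n = 5, j = 2
rng = np.random.default_rng(5)
x = rng.standard_normal(10)
E, y = algorithm1(x, 2)
n, j = 5, 2
assert np.allclose(E.T @ x, y)
assert np.allclose(E.T @ E, np.eye(2 * n))                  # orthogonal
assert np.allclose(y[j + 1:n], 0) and np.allclose(y[n + j:], 0)
```

One can additionally check that E is symplectic, i.e., E^T J E = J, since every factor has the orthogonal symplectic block structure displayed above.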
2. The Skew-Hamiltonian Eigenvalue Problem
Imposing skew-Hamiltonian structure on a matrix W has a number of consequences for the eigenvalues and eigenvectors of W; one is that every eigenvalue has even algebraic multiplicity and hence appears at least twice. An easy way to access all these spectral properties is to observe that for any skew-Hamiltonian matrix W there exists a symplectic matrix S so that
\[
S^{-1} W S = \begin{bmatrix} W_{11} & 0 \\ 0 & W_{11}^T \end{bmatrix}. \tag{5}
\]
This decomposition – among others – will be described in the following section.
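The doubling of eigenvalues is easy to observe numerically. The sketch below (illustrative, with arbitrary random data) forms a skew-Hamiltonian matrix as in (1) and checks that every computed eigenvalue occurs an even number of times, up to roundoff:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
G = rng.standard_normal((n, n)); G -= G.T   # skew-symmetric
Q = rng.standard_normal((n, n)); Q -= Q.T   # skew-symmetric
W = np.block([[A, G], [Q, A.T]])            # skew-Hamiltonian, cf. (1)

ev = np.linalg.eigvals(W)
# every eigenvalue has even algebraic multiplicity, so each computed
# eigenvalue should sit in an even-sized cluster of near-identical copies
for lam in ev:
    cluster = np.sum(np.abs(ev - lam) < 1e-5)
    assert cluster % 2 == 0
```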
2.1
Structured Decompositions
As a first application of elementary matrices we obtain Algorithm 2 below, which constructs the following structured Hessenberg-like form: given a skew-Hamiltonian matrix W ∈ R^{2n×2n} there is always an orthogonal symplectic matrix U so that U^T W U has Paige/Van Loan (PVL) form, i.e.,
\[
U^T W U = \begin{bmatrix} W_{11} & W_{12} \\ 0 & W_{11}^T \end{bmatrix}, \tag{6}
\]
where W_{11} ∈ R^{n×n} is an upper Hessenberg matrix [Van Loan, 1984b].

Algorithm 2 (PVL decomposition [Van Loan, 1984b]).
Input: A skew-Hamiltonian matrix W ∈ R^{2n×2n}.
Output: An orthogonal symplectic matrix U ∈ R^{2n×2n}; W is overwritten with U^T W U having the form (6).

U ← I_{2n}
for j ← 1, 2, . . . , n − 1
  Set x ← W e_j.
  Apply Algorithm 1 to compute E_{j+1}(x).
  Update W ← E_{j+1}(x)^T W E_{j+1}(x), U ← U E_{j+1}(x).
end for

A proper implementation of this algorithm requires (40/3)n³ + O(n²) flops for reducing W and additionally (16/3)n³ + O(n²) flops for computing U. Figure 2 illustrates Algorithm 2 for n = 4.
An immediate consequence of the PVL form (6) is that each eigenvalue of W has even algebraic multiplicity. The same is true for the geometric multiplicities. To see this we need to eliminate the skew-symmetric off-diagonal block W_{12}, for which we can use solutions of the following singular Sylvester equation.
Figure 2. Illustration of two loops of Algorithm 2 for n = 4.
Proposition 3. The matrix equation
\[
W_{11} R - R W_{11}^T = -W_{12} \tag{7}
\]
is solvable for all skew-symmetric matrices W_{12} ∈ R^{n×n} if and only if W_{11} ∈ R^{n×n} is nonderogatory, i.e., each eigenvalue of W_{11} has geometric multiplicity one. In this case, any solution R of (7) is real and symmetric.

Proof. This result can be found in [Gantmacher, 1960; Faßbender et al., 1999]. Actually, the second part is not explicitly stated there but follows easily from the proof of Proposition 5 in [Faßbender et al., 1999].

We now use this proposition to block-diagonalize a skew-Hamiltonian matrix in PVL form (6), assuming that W_{11} is nonderogatory. For this purpose let R be a solution of (7); then the symmetry of R implies that
\[
\begin{bmatrix} I & R \\ 0 & I \end{bmatrix}
\]
is symplectic. Applying the corresponding symplectic equivalence transformation yields the transformed matrix
\[
\begin{bmatrix} I & R \\ 0 & I \end{bmatrix}^{-1}
\begin{bmatrix} W_{11} & W_{12} \\ 0 & W_{11}^T \end{bmatrix}
\begin{bmatrix} I & R \\ 0 & I \end{bmatrix}
=
\begin{bmatrix} W_{11} & 0 \\ 0 & W_{11}^T \end{bmatrix}. \tag{8}
\]
Note that there is a lot of freedom in the choice of R, as equation (7) admits infinitely many solutions. From a numerical point of view the matrix R should be chosen so that its norm is as small as possible. The same question arises in the context of structured condition numbers and will be discussed in the next section.
It should be stressed that assuming W_{11} to be nonderogatory is not necessary and thus the even geometric multiplicity of eigenvalues also holds in the general case. In fact, in [Faßbender et al., 1999] it is shown that any skew-Hamiltonian
matrix can be reduced to block diagonal form (8) using symplectic equivalence transformations. The proof, however, is much more involved than the simple derivation given above.
Another way to go from a skew-Hamiltonian matrix W in PVL form (6) to a more condensed form is to reduce W_{11} further to real Schur form. This can be achieved by constructing an orthogonal matrix Q_1 so that T = Q_1^T W_{11} Q_1 is in real Schur form [Golub and Van Loan, 1996, Thm. 7.4.1]:
\[
T = \begin{bmatrix}
T_{11} & T_{12} & \cdots & T_{1m} \\
0 & T_{22} & \ddots & \vdots \\
\vdots & \ddots & \ddots & T_{m-1,m} \\
0 & \cdots & 0 & T_{mm}
\end{bmatrix}, \tag{9}
\]
where all diagonal blocks T_{jj} of T are of order one or two. Each scalar diagonal block contains a real eigenvalue and each two-by-two diagonal block contains a pair of complex conjugate eigenvalues of W_{11}. Setting Ũ = U(Q_1 ⊕ Q_1), we obtain a skew-Hamiltonian Schur decomposition of W:
\[
\tilde{U}^T W \tilde{U} = \begin{bmatrix} T & \tilde{G} \\ 0 & T^T \end{bmatrix}, \tag{10}
\]
where G̃ = Q_1^T W_{12} Q_1 is skew-symmetric.
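Equation (7) and the block diagonalization (8) can be checked numerically. In the sketch below (illustrative, with random data), instead of solving the singular equation (7) we pick a symmetric R first and define W₁₂ := R W₁₁ᵀ − W₁₁ R, so that (7) holds by construction and W₁₂ is automatically skew-symmetric; the symplectic similarity with the unit upper triangular block matrix then removes the off-diagonal block as in (8).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
W11 = rng.standard_normal((n, n))
R = rng.standard_normal((n, n)); R += R.T   # symmetric
W12 = R @ W11.T - W11 @ R                   # W11 R - R W11^T = -W12, cf. (7)
assert np.allclose(W12, -W12.T)             # W12 is skew-symmetric

I, Z = np.eye(n), np.zeros((n, n))
W = np.block([[W11, W12], [Z, W11.T]])      # PVL-type form, cf. (6)
S = np.block([[I, R], [Z, I]])              # symplectic because R = R^T
J = np.block([[Z, I], [-I, Z]])
assert np.allclose(S.T @ J @ S, J)

D = np.linalg.solve(S, W @ S)               # S^{-1} W S, cf. (8)
assert np.allclose(D[:n, n:], 0)            # off-diagonal block vanishes
assert np.allclose(D[:n, :n], W11)
assert np.allclose(D[n:, n:], W11.T)
```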
2.2
Structured Condition Numbers
In this section we investigate the change of eigenvalues and certain invariant subspaces of a skew-Hamiltonian matrix W under a sufficiently small, skew-Hamiltonian perturbation E. Requiring the perturbation to be structured as well may have a strong positive impact on the sensitivity of the skew-Hamiltonian eigenvalue problem; this is demonstrated by the following example.

Example 4. Consider the parameter-dependent matrix
\[
W(\varepsilon_1, \varepsilon_2) = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
\varepsilon_1 & \varepsilon_2 & 1 & 0 \\
-\varepsilon_2 & 0 & 0 & 2
\end{bmatrix}.
\]
The vector e_1 = [1, 0, 0, 0]^T is an eigenvector of W(0, 0) associated with the eigenvalue λ = 1. No matter how small ε_1 > 0 is, any eigenvector of W(ε_1, 0) associated with λ has the completely different form [0, 0, α, 0]^T for some α ≠ 0. On the other hand, the skew-Hamiltonian matrix W(0, ε_2) has an eigenvector [1, 0, 0, ε_2]^T rather close to e_1.
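Example 4 can be reproduced directly (a small NumPy check, not from the paper): for the unstructured perturbation W(ε₁, 0) the eigenspace for λ = 1 collapses to span{e₃}, while the structured perturbation W(0, ε₂) keeps an exact eigenvector within O(ε₂) of e₁.

```python
import numpy as np

def W(eps1, eps2):
    return np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 2.0, 0.0, 0.0],
                     [eps1, eps2, 1.0, 0.0],
                     [-eps2, 0.0, 0.0, 2.0]])

eps = 1e-3

# unstructured case W(eps, 0): the null space of W - I is exactly span{e3}
_, s, Vt = np.linalg.svd(W(eps, 0.0) - np.eye(4))
v = Vt[-1]                       # right singular vector for smallest singular value
assert s[-1] < 1e-12             # lambda = 1 is still an eigenvalue ...
assert abs(v[0]) < 1e-12         # ... but its eigenvector has no e1 component
assert abs(abs(v[2]) - 1.0) < 1e-12

# structured (skew-Hamiltonian) case W(0, eps): [1, 0, 0, eps] is an eigenvector
u = np.array([1.0, 0.0, 0.0, eps])
assert np.allclose(W(0.0, eps) @ u, u)
```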
Before we deal with structured perturbations, we briefly review standard perturbation results that apply to general matrices and perturbations; for further details see, e.g., [Stewart and Sun, 1990; Sun, 1998]. Let A ∈ R^{n×n} and let X ⊂ R^n be a k-dimensional (right) invariant subspace of A, i.e., AX ⊆ X. If the columns of X and X_⊥ span orthonormal bases for X and X^⊥, respectively, then we obtain a block Schur decomposition
\[
\begin{bmatrix} X & X_\perp \end{bmatrix}^T A \begin{bmatrix} X & X_\perp \end{bmatrix}
= \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}, \tag{11}
\]
where A_{11} ∈ R^{k×k} and A_{22} ∈ R^{(n−k)×(n−k)}. The block A_{11} satisfies the relation AX = X A_{11}, which implies that A_{11} is the representation of A with respect to X. An important operator associated with the decomposition (11) is the linear matrix operator T : R^{(n−k)×k} → R^{(n−k)×k} with
\[
\mathbf{T} : R \mapsto A_{22} R - R A_{11}. \tag{12}
\]
One can show that this Sylvester operator T is invertible if and only if A_{11} and A_{22} have no eigenvalue in common [Golub and Van Loan, 1996, pp. 366–369]. If this condition holds then X is called a simple invariant subspace. We are now ready to formulate a perturbation expansion theorem for invariant subspaces and their representations, as it can be found, e.g., in [Sun, 1998, Sec. 2.1.2]. In the following we denote by ‖·‖_2 the Euclidean norm and the associated spectral norm for matrices, and by ‖·‖_F the Frobenius norm.

Theorem 5. Let A ∈ R^{n×n} have a block Schur decomposition of the form (11) and assume that the invariant subspace X spanned by the columns of X is simple. Let E ∈ C^{n×n} be a perturbation of sufficiently small norm. Then there is an invariant subspace X̂ = span X̂ of A + E with representation Â_{11} satisfying the expansions
\[
\hat{A}_{11} = A_{11} + (Y^H X)^{-1} Y^H E X + O(\|E\|_2^2), \tag{13}
\]
\[
\hat{X} = X - X_\perp \mathbf{T}^{-1}(X_\perp^H E X) + O(\|E\|_2^2), \tag{14}
\]
where T is as in (12), the columns of Y form an orthonormal basis for the left invariant subspace of A belonging to the eigenvalues of A_{11}, and X^H(X̂ − X) = 0.
Bounding the effects of E in the expansions (13) and (14) can be used to derive condition numbers for eigenvalues and invariant subspaces. For example, let λ be a simple eigenvalue of A with right and left eigenvectors x and y, respectively. Then Theorem 5 with A_{11} = [λ] and Â_{11} = [λ̂] yields
\[
|\hat{\lambda} - \lambda| \le \frac{\|x\|_2 \cdot \|y\|_2}{|y^H x|} \, \|E\|_2 + O(\|E\|_2^2).
\]
This inequality is attained up to first order by E = ε x y^H for any ε > 0, which shows that the absolute condition number of a simple eigenvalue λ can be written as
\[
c(\lambda) := \lim_{\varepsilon \to 0} \sup_{\|E\|_2 \le \varepsilon} \frac{|\hat{\lambda} - \lambda|}{\varepsilon}
= \frac{\|x\|_2 \cdot \|y\|_2}{|y^H x|}. \tag{15}
\]
Note that c(λ) is independent of the choice of x and y. For a simple invariant subspace X spanned by the columns of X we obtain
\[
\|\hat{X} - X\|_F \le \|\mathbf{T}^{-1}\| \cdot \|E\|_F + O(\|E\|_F^2), \tag{16}
\]
where ‖T^{-1}‖ is the norm induced by the Frobenius norm:
\[
\|\mathbf{T}^{-1}\| := \sup_{S \neq 0} \frac{\|\mathbf{T}^{-1}(S)\|_F}{\|S\|_F}
= \left( \inf_{R \neq 0} \frac{\|A_{22} R - R A_{11}\|_F}{\|R\|_F} \right)^{-1}.
\]
Again, inequality (16) can be attained up to first order by choosing E = ε X_⊥ V X^H with ε > 0 and a matrix V ∈ R^{(n−k)×k} with ‖V‖_F = 1 satisfying ‖T^{-1}(V)‖_F = ‖T^{-1}‖. Turning (16) into a condition number for an invariant subspace further requires relating ‖X̂ − X‖_F to quantities that are independent of the choice of bases for X and X̂. The matrix
\[
\Theta(\mathcal{X}, \hat{\mathcal{X}}) = \operatorname{diag}\big(\theta_1(\mathcal{X}, \hat{\mathcal{X}}), \theta_2(\mathcal{X}, \hat{\mathcal{X}}), \ldots, \theta_k(\mathcal{X}, \hat{\mathcal{X}})\big),
\]
where θ_i(X, X̂) are the canonical angles between X and X̂ [Stewart and Sun, 1990, p. 43], is such a quantity. One can show that X^H(X̂ − X) = 0 implies
\[
\|\Theta(\mathcal{X}, \hat{\mathcal{X}})\|_F = \|\hat{X} - X\|_F + O(\|\hat{X} - X\|_F^3).
\]
Hence, we obtain the following condition number for an invariant subspace X:
\[
c(\mathcal{X}) := \lim_{\varepsilon \to 0} \sup_{\|E\|_F \le \varepsilon} \frac{\|\Theta(\mathcal{X}, \hat{\mathcal{X}})\|_F}{\varepsilon} = \|\mathbf{T}^{-1}\|.
\]
Note that ‖T^{-1}‖ is invariant under an orthonormal change of basis for X. A direct (albeit expensive) way to compute this quantity is to express T in terms of Kronecker products:
\[
\operatorname{vec}(\mathbf{T}(R)) = K_{\mathbf{T}} \cdot \operatorname{vec}(R), \qquad
K_{\mathbf{T}} = I_k \otimes A_{22} - A_{11}^T \otimes I_{n-k},
\]
with the Kronecker product ‘⊗’ of two matrices [Golub and Van Loan, 1996, Sec. 4.5.5] and the operator vec, which stacks the columns of a matrix into one long vector in their natural order. Then ‖T^{-1}‖^{-1} is the minimal singular value of the k(n − k) × k(n − k) matrix K_T. In practice, one estimates ‖T^{-1}‖ by solving a few Sylvester equations with particularly chosen right-hand sides, see e.g. [Higham, 1996].
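For small examples, ‖T⁻¹‖ and hence c(X) can be computed explicitly via this Kronecker formula. The following sketch (illustrative data) forms K_T, checks it against the definition T(R) = A₂₂R − RA₁₁, and reads off c(X) as the reciprocal of the smallest singular value:

```python
import numpy as np

rng = np.random.default_rng(3)
k, nk = 2, 3                                 # block sizes k and n - k
A11 = rng.standard_normal((k, k))
A22 = rng.standard_normal((nk, nk))

KT = np.kron(np.eye(k), A22) - np.kron(A11.T, np.eye(nk))

# consistency with T(R) = A22 R - R A11 (vec stacks columns, i.e. order="F")
R = rng.standard_normal((nk, k))
assert np.allclose(KT @ R.flatten(order="F"),
                   (A22 @ R - R @ A11).flatten(order="F"))

sigma_min = np.linalg.svd(KT, compute_uv=False)[-1]
assert sigma_min > 0            # A11 and A22 share no eigenvalue (generic data)
cond_X = 1.0 / sigma_min        # condition number c(X) = ||T^{-1}||
print(cond_X)
```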
Structured condition numbers for eigenvalues. We now turn to the perturbation theory for an eigenvalue λ of a matrix W under a perturbation E, where both W and E are skew-Hamiltonian. As λ is necessarily a multiple eigenvalue we cannot apply Theorem 5 to λ alone but must consider the eigenvalue cluster containing all copies of λ. Assuming that λ has algebraic multiplicity two, there are two linearly independent eigenvectors x_1 and x_2 corresponding to λ. Let [x_1, x_2] = XR be a QR decomposition with X ∈ C^{2n×2} and R ∈ C^{2×2}; then
\[
W X = W [x_1, x_2] R^{-1} = [x_1, x_2] A_{11} R^{-1} = [x_1, x_2] R^{-1} A_{11} = X A_{11},
\]
where A_{11} = diag(λ, λ). An analogous relation holds for the two eigenvectors x̂_1, x̂_2 belonging to the eigenvalue λ̂ of the perturbed matrix W + E. As the spectral norm of Â_{11} − A_{11} is given by |λ̂ − λ|, Theorem 5 implies that
\[
|\hat{\lambda} - \lambda| = \|(\bar{X}^H J X)^{-1} \bar{X}^H J E X\|_2 + O(\|E\|_2^2)
\le \|(\bar{X}^H J X)^{-1}\|_2 \, \|E\|_2 + O(\|E\|_2^2), \tag{17}
\]
where we also used the fact that the columns of J X̄ span the two-dimensional left invariant subspace belonging to λ. Note that X̄ denotes the complex conjugate of X. For real λ we may assume X ∈ R^{2n×2} and use the skew-Hamiltonian matrix E = ε J_{2n}^T X J_2 X^T to show that inequality (17) can be attained up to first order by a skew-Hamiltonian perturbation. This implies that the structured eigenvalue condition number for an eigenvalue λ ∈ R of a skew-Hamiltonian matrix satisfies
\[
c_W(\lambda) := \lim_{\varepsilon \to 0} \;
\sup_{\substack{\|E\|_2 \le \varepsilon \\ E \text{ skew-Hamiltonian}}}
\frac{|\hat{\lambda} - \lambda|}{\varepsilon} = \|(X^H J X)^{-1}\|_2.
\]
Likewise for complex λ, we can use perturbations of the form E = ε J_{2n}^T X̄ J_2 X^H. Note that this E satisfies (EJ)^T = −(EJ), i.e., E may be regarded as a complex skew-Hamiltonian matrix. It is an open problem whether one can construct a real skew-Hamiltonian perturbation to show c_W(λ) = ‖(X^H J X)^{-1}‖_2 for complex eigenvalues. By straightforward computation one can obtain a simple expression for c_W (or an upper bound thereof if λ ∈ C) in terms of the eigenvectors x_1, x_2 belonging to λ:
\[
\|(X^H J X)^{-1}\|_2 = \frac{1}{|\bar{x}_1^H J x_2|}
\sqrt{\|x_1\|_2^2 \cdot \|x_2\|_2^2 - |x_1^H x_2|^2}.
\]
Note that this happens to be the unstructured condition number of the mean of the eigenvalue cluster containing both copies of λ, see [Bai et al., 1993] and the references therein.
APPLIED MATHEMATICS AND SCIENTIFIC COMPUTING
Structured condition numbers for invariant subspaces The invariant subspaces of a skew-Hamiltonian matrix W that are usually of interest in applications are those which are isotropic. Definition 6. A subspace X ⊆ R2n is called isotropic if X ⊥ J2n X . A maximal isotropic subspace is called Lagrangian. Obviously, any eigenvector of W spans an isotropic invariant subspace but ˜ in a skew-Hamiltonian Schur dealso the first k ≤ n columns of the matrix U composition (10) share this property. Roughly speaking, an invariant subspace X of W is isotropic if X corresponds to at most one copy of each eigenvalue of W . Necessarily, X is not simple, which makes the application of Theorem 5 impossible. Instead, it was shown in [Kressner, 2003d] how one can adapt a technique developed by Stewart in [Stewart, 1971; Stewart, 1973] in order to obtain perturbation bounds for X . Here, we describe this approach only for the important special case when X has maximal dimension n, i.e., X is Lagrangian. Let the columns of X form an orthonormal basis for X . Then [X, JX] is orthogonal and we have the skew-Hamiltonian block Schur decomposition T W11 W12 X JX W X JX = . (18) T 0 W11 If W is perturbed by a skew-Hamiltonian matrix E then ˜ 11 W ˜ 12 T W X JX (W + E) X JX = ˜ 21 W ˜T W 11
(19)
where $\tilde W_{12}$, $\tilde W_{21}$ are skew-symmetric and $\|\tilde W_{21}\|_F \le \|E\|_F$. Any matrix $\hat X \in \mathbb{R}^{2n\times n}$ with orthonormal columns and $X^T \hat X$ nonsingular can be written as
$$\hat X = (X + JXR)(I + R^T R)^{-1/2} \qquad (20)$$
for some matrix $R \in \mathbb{R}^{n\times n}$ [Stewart, 1973]. The columns of $\hat X$ span an invariant subspace $\hat{\mathcal X}$ of $W + E$ if and only if R satisfies the matrix equation
$$T_{\tilde W}(R) + \Phi(R) = \tilde W_{21} \qquad (21)$$
with the Sylvester operator $T_{\tilde W}: R \mapsto R\tilde W_{11} - \tilde W_{11}^T R$ and the quadratic matrix operator $\Phi: R \mapsto R\tilde W_{12} R$. Moreover, $\hat{\mathcal X}$ is isotropic if and only if R is symmetric. The solution of (21) is complicated by the fact that the dominating linear operator $T_{\tilde W}$ is singular. However, if $\tilde W_{11}$ is nonderogatory, then Proposition 3 shows that the restricted operator $T_{\tilde W}: \mathrm{symm}(n) \to \mathrm{skew}(n)$, where $\mathrm{symm}(n)$ $\{\mathrm{skew}(n)\}$ denotes the set of {skew-}symmetric $n \times n$ matrices, is onto. This allows us to define an operator $T^\dagger_{\tilde W}: \mathrm{skew}(n) \to \mathrm{symm}(n)$,
which maps a skew-symmetric matrix $Q \in \mathbb{R}^{n\times n}$ to the minimal Frobenius-norm solution of $R\tilde W_{11} - \tilde W_{11}^T R = Q$, which must be symmetric according to Proposition 3. The norm of $T^\dagger_{\tilde W}$ induced by the Frobenius norm is given by
$$\|T^\dagger_{\tilde W}\| := \sup_{\substack{Q \ne 0 \\ Q \in \mathrm{skew}(n)}} \frac{\|T^\dagger_{\tilde W}(Q)\|_F}{\|Q\|_F}. \qquad (22)$$
This can be used to estimate the norm of a solution of the nonlinear equation (21).

Theorem 7. Let the matrices $\tilde W_{ij}$ be defined by (19) and assume that $\tilde W_{11}$ is nonderogatory. If $4\|T^\dagger_{\tilde W}\|^2 \cdot \|\tilde W_{12}\|_F \cdot \|\tilde W_{21}\|_F < 1$ with $\|T^\dagger_{\tilde W}\|$ as in (22), then there is a symmetric solution R of (21) satisfying
$$\|R\|_F \le 2\|T^\dagger_{\tilde W}\| \cdot \|\tilde W_{21}\|_F.$$

Proof. This result can be proven along the lines of the proof of Theorem 2.11 in [Stewart and Sun, 1990] by constructing a sequence
$$R_0 = 0, \qquad R_{i+1} = T^\dagger_{\tilde W}(\tilde W_{21} - \Phi(R_i)),$$
and applying the contraction mapping theorem [Ortega and Rheinboldt, 1970] to this sequence.

Using the fact that the tangents of the canonical angles between $\mathcal X$ and $\hat{\mathcal X}$ are the singular values of R [Stewart and Sun, 1990, p. 232], Theorem 7 implies the following perturbation bound for isotropic invariant subspaces.

Corollary 8. Let $W, E \in \mathbb{R}^{2n\times 2n}$ be skew-Hamiltonian matrices, and let the columns of $X \in \mathbb{R}^{2n\times n}$ form an orthonormal basis for an isotropic invariant subspace $\mathcal X$ of W. Assume that $\tilde W_{11} = X^T(W+E)X$ is nonderogatory and that $4\|T^\dagger_{\tilde W}\|^2 \cdot \|W+E\|_F \cdot \|E\|_F < 1$, with $\|T^\dagger_{\tilde W}\|$ defined as in (22). Then there is an isotropic invariant subspace $\hat{\mathcal X}$ of $W+E$ satisfying
$$\|\tan[\Theta(\mathcal X, \hat{\mathcal X})]\|_F \le \alpha \|T^\dagger_{\tilde W}\| \cdot \|E\|_F, \qquad (23)$$
where $\alpha \le 2$.

It should be remarked that the factor α in (23) can be made arbitrarily close to one if we let $\|E\|_F \to 0$. Furthermore, (23) still holds in an approximate sense if the operator $T_{\tilde W}$ is replaced by $T_W: R \mapsto RW_{11} - W_{11}^T R$ corresponding to the unperturbed block Schur decomposition (18). This shows that the structured condition number for an isotropic invariant subspace of a skew-Hamiltonian matrix satisfies
$$c_W(\mathcal X) := \lim_{\varepsilon\to 0} \sup_{\substack{\|E\|_F \le \varepsilon \\ E \text{ skew-Hamiltonian}}} \frac{\|\Theta(\mathcal X, \hat{\mathcal X})\|_F}{\varepsilon} \le \|T^\dagger_W\|.$$
It can be shown that actually $c_W(\mathcal X) = \|T^\dagger_W\|$ holds [Kressner, 2003d]. An extension of this condition number to lower-dimensional isotropic invariant subspaces and a discussion on the computation of $\|T^\dagger_W\|$ can also be found in [Kressner, 2003d].

Example 9. For the matrix W(0,0) from Example 4, the structured condition number of the Lagrangian invariant subspace $\mathcal X$ spanned by the columns of $[I_2, 0]^T$ is small, since
$$c_W(\mathcal X) = \|(I_2 \otimes \mathrm{diag}(1,2) - \mathrm{diag}(1,2) \otimes I_2)^\dagger\|_2 = 1.$$
This implies that a strongly backward stable method is guaranteed to compute an excellent approximation of $\mathcal X$. On the other hand, the unstructured condition number $c(\mathcal X)$ must be considered infinite due to the fact that $\mathcal X$ is not simple. Hence, a method which is not strongly backward stable may return arbitrarily bad results. Similar remarks apply to the eigenvectors of W.
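The Kronecker-product expression in Example 9 can be checked numerically in a few lines. In the sketch below (Python/NumPy; only the matrix diag(1, 2) is taken from the example, the rest is our scaffolding), the Sylvester operator $R \mapsto RW_{11} - W_{11}^T R$ is represented by its Kronecker matrix and the 2-norm of the pseudoinverse is evaluated:

```python
import numpy as np

# Kronecker representation of the (singular) Sylvester operator from
# Example 9, assuming W_11 = diag(1, 2) as the example suggests.
D = np.diag([1.0, 2.0])
I2 = np.eye(2)
K = np.kron(I2, D) - np.kron(D, I2)

# The structured condition number is the 2-norm of the pseudoinverse.
c_W = np.linalg.norm(np.linalg.pinv(K), 2)
print(c_W)  # 1.0
```

Here K is diagonal with entries (0, 1, -1, 0), so the norm of its pseudoinverse is exactly 1, in agreement with $c_W(\mathcal X) = 1$.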
2.3 Algorithms

In Section 2.1 we used a constructive approach to prove the skew-Hamiltonian Schur decomposition
$$U^T W U = \begin{bmatrix} T & \tilde G \\ 0 & T^T \end{bmatrix}, \qquad (24)$$
where U is orthogonal symplectic and T has real Schur form. The following algorithm summarizes this construction.

Algorithm 10 (Skew-Hamiltonian Schur decomposition).
Input: A skew-Hamiltonian matrix $W \in \mathbb{R}^{2n\times 2n}$.
Output: An orthogonal symplectic matrix $U \in \mathbb{R}^{2n\times 2n}$; W is overwritten with $U^T W U$ having skew-Hamiltonian Schur form (24).
1. Apply Algorithm 2 to compute an orthogonal symplectic matrix U so that $W \leftarrow U^T W U$ has PVL form (6).
2. Apply the QR algorithm to the (1,1) block $W_{11}$ of W to compute an orthogonal matrix Q so that $Q^T W_{11} Q$ has real Schur form.
3. Update $W \leftarrow (Q \oplus Q)^T W (Q \oplus Q)$, $U \leftarrow U(Q \oplus Q)$.

This algorithm requires around $20n^3$ flops if only the eigenvalues are desired, and $44n^3$ flops if the skew-Hamiltonian Schur form and the orthogonal
symplectic factor U are computed; here we used the flop estimates for the QR algorithm listed in [Golub and Van Loan, 1996, p. 359]. This compares favorably with the QR algorithm applied to the whole matrix W, which takes $80n^3$ and $200n^3$ flops, respectively. The finite precision properties of this algorithm are as follows. Similarly as for the QR algorithm [Wilkinson, 1965], one can show that there is an orthogonal symplectic matrix V which transforms the computed skew-Hamiltonian Schur form $\hat W = \begin{bmatrix} \hat T & \hat G \\ 0 & \hat T^T \end{bmatrix}$ to a skew-Hamiltonian matrix near to W, i.e., $V\hat W V^T = W + E$, where E is skew-Hamiltonian, $\|E\|_2 = O(u)\|W\|_2$ and u denotes the unit roundoff. Moreover, the computed factor $\hat U$ is almost orthogonal in the sense that $\|\hat U^T \hat U - I\|_2 = O(u)$, and it has the block representation $\hat U = \begin{bmatrix} \hat U_1 & \hat U_2 \\ -\hat U_2 & \hat U_1 \end{bmatrix}$. This implies that $\hat U$ is close to an orthogonal symplectic matrix, see e.g. [Kressner, 2004, Lemma 4.6].

Once a skew-Hamiltonian Schur decomposition has been computed, the eigenvalues can be easily obtained from the diagonal blocks of T. Furthermore, if the $(k+1, k)$ entry of T is zero, then the first k columns of U span an isotropic invariant subspace of W. Other isotropic invariant subspaces can be obtained by swapping the diagonal blocks of T as described, e.g., in [Bai and Demmel, 1993; Bai et al., 1993].
Symplectic QR decomposition. The following algorithm is only indirectly related to the skew-Hamiltonian eigenvalue problem. It can be used, for example, to compute orthonormal bases for isotropic subspaces. Let $A \in \mathbb{R}^{2m\times n}$ with $m \ge n$; then there exists an orthogonal symplectic matrix $Q \in \mathbb{R}^{2m\times 2m}$ so that $A = QR$ and
$$R = \begin{bmatrix} R_{11} \\ R_{21} \end{bmatrix}, \qquad (25)$$
where the matrix $R_{11} \in \mathbb{R}^{m\times n}$ is upper triangular and $R_{21} \in \mathbb{R}^{m\times n}$ is strictly upper triangular. A decomposition of this form is called a symplectic QR decomposition [Bunse-Gerstner, 1986] and can be computed by the following algorithm.

Algorithm 11 (Symplectic QR decomposition).
Input: A general matrix $A \in \mathbb{R}^{2m\times n}$ with $m \ge n$.
Output: An orthogonal symplectic matrix $Q \in \mathbb{R}^{2m\times 2m}$; A is overwritten with $R = Q^T A$ having the form (25).
$Q \leftarrow I_{2m}$.
for $j \leftarrow 1, \dots, n$
    Set $x \leftarrow Ae_j$.
    Apply Algorithm 1 to compute $E_j(x)$.
    Update $A \leftarrow E_j(x)^T A$, $Q \leftarrow QE_j(x)$.
end for

If properly implemented, this algorithm requires $8(mn^2 - n^3/3) + O(n^2)$ flops for computing the matrix R, and additionally $\frac{16}{3}n^3 + 16m^2n - 16mn^2 + O(n^2)$ flops for accumulating the orthogonal factor Q in reversed order.
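Algorithm 1, which produces the elementary orthogonal symplectic transformations $E_j(x)$ used above, is not reproduced in this section. The following sketch is our reconstruction of such a transformation for the case j = 1, assuming the standard construction from two "doubled" Householder reflections and one symplectic Givens rotation; the function names and the elimination order are our choices, not the paper's:

```python
import numpy as np

def householder(v):
    """Orthogonal H with H @ v = -sign(v[0]) * ||v|| * e_1 (H = I if v = 0)."""
    v = np.asarray(v, dtype=float)
    w = v.copy()
    w[0] += np.copysign(np.linalg.norm(v), v[0])
    if np.linalg.norm(w) == 0.0:
        return np.eye(len(v))
    return np.eye(len(v)) - 2.0 * np.outer(w, w) / (w @ w)

def elementary_symplectic(x):
    """Orthogonal symplectic E in R^{2m x 2m} with E^T x = beta * e_1."""
    m = len(x) // 2
    Z = np.zeros((m, m))
    # Step 1: doubled Householder condensing the lower half into e_1.
    H1 = householder(x[m:])
    E1 = np.block([[H1, Z], [Z, H1]])
    x1 = E1 @ x
    # Step 2: symplectic Givens rotation in the (1, m+1) plane,
    # eliminating the remaining entry of the lower half.
    r = np.hypot(x1[0], x1[m])
    c, s = (1.0, 0.0) if r == 0 else (x1[0] / r, x1[m] / r)
    G = np.eye(2 * m)
    G[0, 0] = G[m, m] = c
    G[0, m], G[m, 0] = s, -s
    x2 = G @ x1
    # Step 3: doubled Householder condensing the upper half.
    H2 = householder(x2[:m])
    E2 = np.block([[H2, Z], [Z, H2]])
    # E^T = E2 @ G @ E1, hence:
    return E1.T @ G.T @ E2.T

rng = np.random.default_rng(0)
m = 4
x = rng.standard_normal(2 * m)
E = elementary_symplectic(x)
J = np.block([[np.zeros((m, m)), np.eye(m)],
              [-np.eye(m), np.zeros((m, m))]])
y = E.T @ x
print(np.allclose(E.T @ E, np.eye(2 * m)))  # orthogonality
print(np.allclose(E.T @ J @ E, J))          # symplecticity
print(np.allclose(y[1:], 0.0))              # x mapped to a multiple of e_1
```

Embedding such transformations into the identity at position j, for both halves of the vector, yields the loop bodies of Algorithms 11 and 16.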
Other algorithms. Similarly as the Hessenberg form of a general matrix can be computed by Gauss transformations [Golub and Van Loan, 1996, Sec. 7.4.7], it is shown in [Stefanovski and Trenčevski, 1998] how non-orthogonal symplectic transformations can be used to compute the PVL form of a skew-Hamiltonian matrix. A modification of the Arnoldi method, suitable for computing eigenvalues and isotropic invariant subspaces of large and sparse skew-Hamiltonian matrices, has been proposed in [Mehrmann and Watkins, 2000]. Balancing a matrix by a simple and accurate similarity transformation may have a positive impact on the performance of numerical methods for computing eigenvalues. A structure-preserving balancing procedure based on symplectic similarity transformations is presented in [Benner, 2000]. The LAPACK [Anderson et al., 1999] subroutines for computing standard orthogonal decompositions, such as the QR or Hessenberg decomposition, attain high efficiency by (implicitly) employing WY representations of the involved orthogonal transformations [Bischof and Van Loan, 1987; Dongarra et al., 1989; Schreiber and Van Loan, 1989]. A variant of this representation can be used to derive efficient block algorithms for computing orthogonal symplectic decompositions, such as the symplectic QR and URV decompositions [Kressner, 2003a].
3. The Hamiltonian Eigenvalue Problem

One of the most remarkable properties of a Hamiltonian matrix
$$H = \begin{bmatrix} A & G \\ Q & -A^T \end{bmatrix} \in \mathbb{R}^{2n\times 2n}$$
is that its eigenvalues always occur in pairs $\{\lambda, -\lambda\}$ if $\lambda \in \mathbb{R}$ or $\lambda \in \imath\mathbb{R}$, or in quadruples $\{\lambda, -\lambda, \bar\lambda, -\bar\lambda\}$ if $\lambda \in \mathbb{C}\setminus(\mathbb{R} \cup \imath\mathbb{R})$. The preservation of these pairings in finite precision arithmetic is a major benefit of using a structure-preserving algorithm for computing the eigenvalues of H. Generally, we will only briefly touch the difficulties that arise when H has eigenvalues on the imaginary axis. Although this case is well analyzed with respect to structured decompositions, see [Lin and Ho, 1990; Lin et al., 1999; Freiling et al., 2002] and the references given therein, it is still an open research problem to define appropriate structured condition numbers and design satisfactory algorithms for this case.
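This eigenvalue symmetry is easy to observe numerically. The following sketch (a random test matrix of our own construction; the eigenvalues are computed with the unstructured QR algorithm, so the pairing holds only up to roundoff) checks that $-\lambda$ accompanies every computed eigenvalue λ:

```python
import numpy as np

# Random Hamiltonian matrix H = [[A, G], [Q, -A^T]] with G, Q symmetric.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
G = rng.standard_normal((n, n)); G = G + G.T
Q = rng.standard_normal((n, n)); Q = Q + Q.T
H = np.block([[A, G], [Q, -A.T]])

ev = np.linalg.eigvals(H)
# For every eigenvalue lam of H, -lam is also an eigenvalue (up to roundoff).
for lam in ev:
    assert np.min(np.abs(ev + lam)) < 1e-8 * np.linalg.norm(H)
```

A structure-preserving method would reproduce this symmetry exactly in finite precision arithmetic, rather than only to within roundoff of an unstructured computation.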
3.1 Structured Decompositions
A major difficulty in developing computational methods for the Hamiltonian eigenvalue problem is that so far no $O(n^3)$ method for computing a useful structured Hessenberg-like form is known. Although a slight modification of Algorithm 2 can be used to construct an orthogonal symplectic matrix U so that
$$U^T H U = \begin{bmatrix} W_{11} & W_{12} \\ W_{21} & -W_{11}^T \end{bmatrix},$$
i.e., $W_{11}$ has upper Hessenberg form and $W_{21}$ is a diagonal matrix, this form is of limited use. The Hamiltonian QR algorithm, see Section 3.3 below, only preserves this form if the (2,1) block can be written as $W_{21} = \gamma e_n e_n^T$ for some $\gamma \in \mathbb{R}$. In this case, $U^T H U$ is called a Hamiltonian Hessenberg form. Byers derived in [Byers, 1983] a simple method for reducing H to such a form under the assumption that one of the off-diagonal blocks G or Q in H has tiny rank, i.e., rank 1, 2 or at most 3. The general case, however, remains elusive. That it might be difficult to find a simple method is indicated by a result in [Ammar and Mehrmann, 1991], which shows that the first column x of an orthogonal symplectic matrix U that reduces H to Hamiltonian Hessenberg form has to satisfy the set of nonlinear equations
$$x^T J H^{2i-1} x = 0, \qquad i = 1, \dots, n.$$
This result can even be extended to non-orthogonal symplectic transformations [Raines and Watkins, 1994]. A Schur-like form for Hamiltonian matrices is given by the following theorem [Paige and Van Loan, 1981; Lin et al., 1999].

Theorem 12. Let H be a Hamiltonian matrix and assume that all eigenvalues of H that are on the imaginary axis have even algebraic multiplicity. Then there is an orthogonal symplectic matrix U so that $U^T H U$ is in Hamiltonian Schur form, i.e.,
$$U^T H U = \begin{bmatrix} T & \tilde G \\ 0 & -T^T \end{bmatrix}, \qquad (26)$$
where $T \in \mathbb{R}^{n\times n}$ has real Schur form (9).

If H has no eigenvalues on the imaginary axis, then the invariant subspace $\mathcal X$ belonging to the n (counting multiplicities) eigenvalues in the open left half plane is called the stable invariant subspace of H. By a suitable reordering of the Hamiltonian Schur form, see also Section 3.3, one can see that $\mathcal X$ is isotropic. If the columns of X form an orthonormal basis for $\mathcal X$, then $[X, JX]$
is orthogonal and we have the Hamiltonian block Schur decomposition
$$[X, JX]^T\, H\, [X, JX] = \begin{bmatrix} A_{11} & G_{11} \\ 0 & -A_{11}^T \end{bmatrix}.$$
3.2 Structured Condition Numbers
An extensive perturbation analysis of (block) Hamiltonian Schur forms for the case that H has no purely imaginary eigenvalues has been presented in [Konstantinov et al., 2001]. The analysis used therein is based on the technique of splitting operators and Lyapunov majorants. The approach used in this section is somewhat simpler; it is based on the perturbation expansions given in Theorem 5.
Structured condition numbers for eigenvalues. Let λ be a simple eigenvalue of a Hamiltonian matrix H with right and left eigenvectors x and y, respectively. The perturbation expansion (13) implies that for a sufficiently small perturbation E, there is an eigenvalue $\hat\lambda$ of $H + E$ so that
$$|\hat\lambda - \lambda| = \frac{|y^H E x|}{|y^H x|} + O(\|E\|_2^2) \le \frac{\|x\|_2 \cdot \|y\|_2}{|y^H x|}\,\|E\|_2 + O(\|E\|_2^2). \qquad (27)$$
If λ is real then we may assume that x and y are real and normalized so that $\|x\|_2 = \|y\|_2 = 1$. For the Hamiltonian perturbation $E = \varepsilon[y, Jx] \cdot [x, J^T y]^H$ we have $|y^H E x| = \varepsilon(1 + |y^H J x|^2)$ and
$$\|E\|_2 = \varepsilon\|[x, Jy]\|_2^2 = \varepsilon(1 + |y^H Jx|).$$
The minimum of $(1 + |y^H Jx|^2)/(1 + |y^H Jx|)$ is $\beta = 2\sqrt{2} - 2$. This implies that for $\varepsilon \to 0$ both sides in (27) differ at most by a factor $1/\beta$. Hence, the structured eigenvalue condition number for a simple eigenvalue of a Hamiltonian matrix,
$$c_H(\lambda) := \lim_{\varepsilon\to 0} \sup_{\substack{\|E\|_2 \le \varepsilon \\ E \text{ is Hamiltonian}}} \frac{|\hat\lambda - \lambda|}{\varepsilon},$$
satisfies the inequalities
$$(2\sqrt{2} - 2)\, c(\lambda) \le c_H(\lambda) \le c(\lambda)$$
if $\lambda \in \mathbb{R}$. This inequality still holds for complex λ if one allows complex Hamiltonian perturbations E, i.e., $(EJ)^H = EJ$. A tight lower bound for the structured condition number of a complex eigenvalue under real perturbations is an open problem. Structured backward errors and condition numbers for eigenvalues of Hamiltonian matrices with additional structures can be found in [Tisseur, 2003].
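The constant $2\sqrt{2} - 2 \approx 0.83$ arises as the minimum of $(1 + t^2)/(1 + t)$ over $t = |y^H Jx| \in [0, 1]$, attained at $t = \sqrt{2} - 1$; a quick numerical check of this elementary fact:

```python
import numpy as np

# Minimize f(t) = (1 + t^2) / (1 + t) on [0, 1] by a fine grid search;
# calculus gives the minimizer t = sqrt(2) - 1 and minimum 2*sqrt(2) - 2.
t = np.linspace(0.0, 1.0, 1_000_001)
beta = np.min((1 + t**2) / (1 + t))
print(beta, 2 * np.sqrt(2) - 2)  # both approximately 0.8284271
```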
Structured condition numbers for invariant subspaces. Let the columns of $X \in \mathbb{R}^{2n\times k}$ span a simple, isotropic invariant subspace $\mathcal X$ of H. By the symplectic QR decomposition (25) there is always a matrix $Y \in \mathbb{R}^{2n\times k}$ so that $U = [X, Y, JX, JY]$ is an orthogonal symplectic matrix. Moreover, we have the block Hamiltonian Schur form
$$U^T H U = \begin{bmatrix} A_{11} & A_{12} & G_{11} & G_{12} \\ 0 & A_{22} & G_{12}^T & G_{22} \\ 0 & 0 & -A_{11}^T & 0 \\ 0 & Q_{22} & -A_{12}^T & -A_{22}^T \end{bmatrix}.$$
Assuming that the perturbation E is sufficiently small, the perturbation expansion (14) implies that there is a matrix $\hat X$ so that $\hat{\mathcal X} = \mathrm{span}\,\hat X$ is an invariant subspace of $H + E$ satisfying
$$\hat X = X - X_\perp T_H^{-1}(X_\perp^T E X) + O(\|E\|_F^2),$$
and $\hat X^T(\hat X - X) = 0$. The involved Sylvester operator $T_H$ is given by
$$T_H: (R_1, R_2, R_3) \mapsto \begin{bmatrix} A_{22} & G_{12}^T & G_{22} \\ 0 & -A_{11}^T & 0 \\ Q_{22} & -A_{12}^T & -A_{22}^T \end{bmatrix}\begin{bmatrix} R_1 \\ R_2 \\ R_3 \end{bmatrix} - \begin{bmatrix} R_1 \\ R_2 \\ R_3 \end{bmatrix} A_{11}.$$
If the perturbation E is Hamiltonian, then $X_\perp^T E X$ takes the form
$$S = [A_{21}^T, Q_{11}^T, Q_{21}^T]^T,$$
where $Q_{11} \in \mathrm{symm}(k)$ and $A_{21}$, $Q_{21}$ are general $(n-k) \times k$ matrices. Hence, if we let
$$\|T_H^{-1}\| := \sup_{S \ne 0} \left\{ \frac{\|T_H^{-1}(S)\|_F}{\|S\|_F} : S \in \mathbb{R}^{(n-k)\times k} \times \mathrm{symm}(k) \times \mathbb{R}^{(n-k)\times k} \right\},$$
then the structured condition number for an isotropic invariant subspace of a Hamiltonian matrix satisfies
$$c_H(\mathcal X) = \lim_{\varepsilon\to 0} \sup_{\substack{\|E\|_F \le \varepsilon \\ E \text{ Hamiltonian}}} \frac{\|\Theta(\mathcal X, \hat{\mathcal X})\|_F}{\varepsilon} = \|T_H^{-1}\|.$$
Obviously, this quantity coincides with the unstructured condition number if $\mathcal X$ is one-dimensional, i.e., $\mathcal X$ is spanned by a real eigenvector. A less trivial observation is that the same holds if $\mathcal X$ is the stable invariant subspace, i.e., the n-dimensional subspace belonging to all eigenvalues in the open left half plane. To show this, first note that in this case
$$\|T_H^{-1}\| = \sup_{\substack{S \ne 0 \\ S \in \mathrm{symm}(n)}} \frac{\|T_H^{-1}(S)\|_F}{\|S\|_F} = \left( \inf_{\substack{S \ne 0 \\ S \in \mathrm{symm}(n)}} \frac{\|A_{11}S + SA_{11}^T\|_F}{\|S\|_F} \right)^{-1}.$$
Using a result in [Byers and Nash, 1987], we have
$$\inf_{\substack{S \ne 0 \\ S \in \mathrm{symm}(n)}} \frac{\|A_{11}S + SA_{11}^T\|_F}{\|S\|_F} = \inf_{S \ne 0} \frac{\|A_{11}S + SA_{11}^T\|_F}{\|S\|_F},$$
which indeed shows that the structured and unstructured condition numbers for the maximal stable invariant subspace coincide.

However, there is a severe loss if we do not require E to be Hamiltonian: the subspace $\hat{\mathcal X}$ might not be isotropic. To obtain a nearby isotropic subspace one can apply the symplectic QR decomposition to an orthonormal basis $\hat X$ of $\hat{\mathcal X}$. This yields the orthonormal basis Z of an isotropic subspace $\mathcal Z = \mathrm{span}\, Z$ so that
$$\|Z - X\|_F \le 2\|\hat X - X\|_F \le 2c_H(\mathcal X)\|E\|_F + O(\|E\|_F^2).$$
Note that for the original subspace $\hat{\mathcal X}$ we have the desirable property $\|\hat X_\perp^T H \hat X\|_F \le \|E\|_F$, where the columns of $\hat X_\perp$ form an orthonormal basis for $\hat{\mathcal X}^\perp$. For the isotropic subspace $\mathcal Z$, however, we can only guarantee
$$\|(JZ)^T H Z\|_F \le 4c_H(\mathcal X) \cdot \|H\|_F \cdot \|E\|_F + O(\|E\|_F^2),$$
which signals a severe loss of backward stability. The following numerical example demonstrates the undesirable appearance of the factor $c_H(\mathcal X)$ in $\|(JZ)^T H Z\|_F$.

Example 13. Let
$$H = \begin{bmatrix} -10^{-5} & -1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 10^{-5} & -1 \\ 0 & 0 & 1 & 0 \end{bmatrix},$$
and consider the stable invariant subspace spanned by the columns of $X = [I_2, 0]^T$, which has condition number $10^5$. If we add a random (non-Hamiltonian) perturbation E with $\|E\|_F = 10^{-10}$ to H, and compute (using Matlab) an orthonormal basis $\hat X$ for the invariant subspace $\hat{\mathcal X}$ of $H + E$ belonging to the eigenvalues in the open left half plane, we observe that
$$\|\hat X_\perp^T H \hat X\|_F \approx 4.0 \times 10^{-11}.$$
By computing a symplectic QR decomposition of $\hat X$ we constructed an orthonormal basis Z satisfying $Z^T(JZ) = 0$ and observed
$$\|(JZ)^T H Z\|_F \approx 4.7 \times 10^{-6}.$$
3.3 Algorithms
An explicit Hamiltonian QR algorithm. The Hamiltonian QR algorithm [Byers, 1983] is a strongly backward stable method for computing the Hamiltonian Schur form of a Hamiltonian matrix H with no purely imaginary eigenvalues. Its only obstacle is that there is no implementation of complexity less than $O(n^4)$ known, except for the case when a Hamiltonian Hessenberg form exists [Byers, 1983; Byers, 1986].

One iteration of the Hamiltonian QR algorithm computes the symplectic QR decomposition of the first n columns of the symplectic matrix
$$M = [(H - \sigma_1 I)(H - \sigma_2 I)][(H + \sigma_1 I)(H + \sigma_2 I)]^{-1}, \qquad (28)$$
where $\{\sigma_1, \sigma_2\}$ is a pair of real or complex conjugate shifts. This yields an orthogonal symplectic matrix U so that the first n columns of $U^T M$ have the form (25), i.e.,
$$U^T M = \begin{bmatrix} R_{11} & \star \\ R_{21} & \star \end{bmatrix} \qquad (29)$$
with $R_{11}$ upper triangular and $R_{21}$ strictly upper triangular. The next iterate is obtained by updating $H \leftarrow U^T H U$. Let us partition H as follows:
$$H = \begin{bmatrix} A_{11} & A_{12} & G_{11} & G_{12} \\ A_{21} & A_{22} & G_{12}^T & G_{22} \\ Q_{11} & Q_{12} & -A_{11}^T & -A_{21}^T \\ Q_{12}^T & Q_{22} & -A_{12}^T & -A_{22}^T \end{bmatrix}, \qquad (30)$$
with $A_{11} \in \mathbb{R}^{2\times 2}$ and $A_{22} \in \mathbb{R}^{(n-2)\times(n-2)}$. Under rather mild assumptions and a fortunate choice of shifts, it can be shown that the submatrices $A_{21}$, $Q_{11}$ and $Q_{12}$ converge to zero, i.e., H converges to a Hamiltonian block Schur form [Watkins and Elsner, 1991]. Choosing the shifts $\sigma_1, \sigma_2$ as those eigenvalues of the submatrix $\begin{bmatrix} A_{11} & G_{11} \\ Q_{11} & -A_{11}^T \end{bmatrix}$ that have positive real part results in quadratic convergence. If this submatrix has two purely imaginary eigenvalues, then we suggest to choose the one eigenvalue with positive real part twice, and if all four of its eigenvalues are purely imaginary, then our suggestion is to choose random shifts.

If the norms of the blocks $A_{21}$, $Q_{11}$ and $Q_{12}$ become less than $u \cdot \|H\|_F$, then we may safely regard them as zero and apply the iteration to the submatrix $\begin{bmatrix} A_{22} & G_{22} \\ Q_{22} & -A_{22}^T \end{bmatrix}$. This will finally yield a Hamiltonian Schur form of H. Note that the Hamiltonian QR algorithm will generally not converge if H has eigenvalues on the imaginary axis. In our numerical experiments, however, we often observed convergence to a Hamiltonian block Schur form, where the unreduced block $\begin{bmatrix} A_{22} & G_{22} \\ Q_{22} & -A_{22}^T \end{bmatrix}$ contains all eigenvalues on the imaginary axis.
Remark 14. One can avoid the explicit computation of the potentially ill-conditioned matrix M in (28) by the following product QR decomposition approach. First, an orthogonal matrix $Q_r$ is computed so that $(H + \sigma_1 I)(H + \sigma_2 I)Q_r^T$ has the block triangular structure displayed in (29). This can be achieved by a minor modification of the standard RQ decomposition [Benner et al., 1998]. Secondly, the orthogonal symplectic matrix U is computed from the symplectic QR decomposition of the first n columns of $(H - \sigma_1 I)(H - \sigma_2 I)Q_r^T$.
Reordering a Hamiltonian Schur decomposition. If the Hamiltonian QR algorithm has successfully computed a Hamiltonian Schur decomposition
$$U^T H U = \begin{bmatrix} T & \tilde G \\ 0 & -T^T \end{bmatrix}, \qquad (31)$$
then the first n columns of the orthogonal symplectic matrix U span an isotropic subspace belonging to the eigenvalues of T. Many applications require the stable invariant subspace; for this purpose, the Schur decomposition (31) must be reordered so that T contains all eigenvalues with negative real part. One way to achieve this is as follows. If there is a block in T which contains a real eigenvalue or a pair of complex conjugate eigenvalues with positive real part, then this block is swapped to the bottom right diagonal block $T_{22}$ of T using the algorithms described in [Bai and Demmel, 1993; Bai et al., 1993]. Now, let $G_{22}$ denote the corresponding block in $\tilde G$; it remains to find an orthogonal symplectic matrix $U_{22}$ so that
$$U_{22}^T \begin{bmatrix} T_{22} & G_{22} \\ 0 & -T_{22}^T \end{bmatrix} U_{22} = \begin{bmatrix} \tilde T_{22} & \tilde G_{22} \\ 0 & -\tilde T_{22}^T \end{bmatrix} \qquad (32)$$
and the eigenvalues of $\tilde T_{22}$ have negative real part. If X is the solution of the Lyapunov equation $T_{22}X + XT_{22}^T = G_{22}$, then X is symmetric and the columns of $[-X, I]^T$ span an isotropic subspace. Thus, there is a symplectic QR decomposition
$$\begin{bmatrix} -X \\ I \end{bmatrix} = U_{22}\begin{bmatrix} R \\ 0 \end{bmatrix}.$$
By direct computation, it can be shown that $U_{22}$ is an orthogonal symplectic matrix which produces a reordering of the form (32). As shown in [Bai and Demmel, 1993], in some pathological cases the norm of the (2,1) block in the reordered matrix may be larger than $O(u)\|H\|_F$. In this case, which may only happen if the eigenvalues of $T_{22}$ are close to the imaginary axis, the swap must be rejected in order to guarantee the strong backward stability of the algorithm. A different kind of reordering algorithm, which is based on Hamiltonian QR iterations with perfect shifts, can be found in [Byers, 1983].
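The reordering step can be sketched as follows (Python/SciPy; the test matrices are our toy example, and the symplectic QR decomposition of $[-X, I]^T$ is replaced by an explicit Cholesky-based orthonormalization, which is valid here because X is symmetric):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(2)
k = 3
# T22 in real Schur form with eigenvalues 1, 2, 3 in the right half plane,
# G22 symmetric (standing in for the corresponding block of G~).
T22 = np.triu(rng.standard_normal((k, k)), 1) + np.diag([1.0, 2.0, 3.0])
G22 = rng.standard_normal((k, k)); G22 = G22 + G22.T
Hblk = np.block([[T22, G22], [np.zeros((k, k)), -T22.T]])
J = np.block([[np.zeros((k, k)), np.eye(k)],
              [-np.eye(k), np.zeros((k, k))]])

# Lyapunov equation T22 X + X T22^T = G22 (unique solution, since the
# eigenvalues of T22 lie in the open right half plane).
X = solve_sylvester(T22, T22.T, G22)

# Orthonormal basis Y of span([-X; I]) with [Y, JY] orthogonal:
# Y = [-X; I] (I + X^2)^{-1/2}, realized via a Cholesky factor.
L = np.linalg.cholesky(np.eye(k) + X @ X)
Y = np.vstack([-X, np.eye(k)]) @ np.linalg.inv(L).T
U22 = np.hstack([Y, J @ Y])

T_new = U22.T @ Hblk @ U22
print(np.allclose(X, X.T))                        # X symmetric
print(np.allclose(T_new[k:, :k], 0, atol=1e-10))  # block triangular again
print(np.linalg.eigvals(T_new[:k, :k]))           # now in left half plane
```

After the transformation, the leading diagonal block carries the eigenvalues $-1, -2, -3$, i.e., the stable eigenvalues have been moved to the top, as required by (32).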
In conclusion, we have a method for computing eigenvalues and selected invariant subspaces of Hamiltonian matrices. This method is strongly backward stable and reliable, as long as there are no eigenvalues on the imaginary axis. However, as mentioned above, it generally requires $O(n^4)$ flops, making it unattractive for even moderately large problems.
Algorithms based on $H^2$. One of the first $O(n^3)$ structure-preserving methods for the Hamiltonian eigenvalue problem was developed in [Van Loan, 1984b]. It is based on the fact that $H^2$ is a skew-Hamiltonian matrix, because
$$(H^2 J)^T = (HJ)^T H^T = HJH^T = -H(HJ)^T = -H^2 J.$$
Thus, one can apply Algorithm 10 to $H^2$ and take the positive and negative square roots of the computed eigenvalues, which gives the eigenvalues of H. An implicit version of this algorithm has been implemented in [Benner et al., 2000]. The main advantage of this approach is that the eigenvalue symmetries of H are fully recovered in finite precision arithmetic. Also, the computational cost is low when compared to the QR algorithm. The disadvantage of Van Loan's method is that a loss of accuracy up to half the number of significant digits of the computed eigenvalues of H is possible. An error analysis in [Van Loan, 1984b] shows that for an eigenvalue λ of H the computed $\hat\lambda$ satisfies
$$|\hat\lambda - \lambda| \lesssim c(\lambda) \cdot \min\{u\|H\|_2^2/|\lambda|,\ \sqrt{u}\|H\|_2\}.$$
This indicates that particularly eigenvalues with $|\lambda| \ll \|H\|_2$ are affected by the $\sqrt{u}$-effect. Note that a similar effect occurs when one attempts to compute the singular values of a general matrix A from the eigenvalues of $A^T A$, see e.g. [Stewart, 2001, Sec. 3.3.2].

An algorithm that is based on the same idea but achieves numerical backward stability by completely avoiding the squaring of H was developed in [Benner et al., 1998]. In the following, we show how this algorithm can be directly derived from Algorithm 10. In lieu of $H^2$ we make use of the skew-Hamiltonian matrix
$$W = \begin{bmatrix} 0 & A & 0 & G \\ -A & 0 & -G & 0 \\ 0 & Q & 0 & -A^T \\ -Q & 0 & A^T & 0 \end{bmatrix} \in \mathbb{R}^{4n\times 4n} \qquad (33)$$
for given $H = \begin{bmatrix} A & G \\ Q & -A^T \end{bmatrix}$. As W is permutationally equivalent to $\begin{bmatrix} 0 & H \\ -H & 0 \end{bmatrix}$, we see that $\pm\lambda$ is an eigenvalue of H if and only if $\pm\sqrt{-\lambda^2}$ is an eigenvalue of W. Note that the matrix W has a lot of extra structure besides being skew-Hamiltonian, which is not exploited if we apply Algorithm 10 directly to W. Instead, we consider the shuffled matrix $\tilde W = (P \oplus P)^T W (P \oplus P)$, where
$$P = [e_1, e_3, \dots, e_{2n-1}, e_2, e_4, \dots, e_{2n}].$$
This matrix has the form
$$\tilde W = \begin{bmatrix} \tilde W_A & \tilde W_G \\ \tilde W_Q & \tilde W_A^T \end{bmatrix},$$
where each of the matrices $\tilde W_A$, $\tilde W_G$ and $\tilde W_Q$ is a block matrix composed of two-by-two blocks having the form
$$\begin{bmatrix} 0 & x_{ij} \\ -x_{ij} & 0 \end{bmatrix}, \qquad i, j = 1, \dots, n.$$
If an orthogonal symplectic matrix $\tilde Q$ has the form
$$\tilde Q = (P \oplus P)^T \begin{bmatrix} V_1 & 0 & V_2 & 0 \\ 0 & U_1 & 0 & U_2 \\ -V_2 & 0 & V_1 & 0 \\ 0 & -U_2 & 0 & U_1 \end{bmatrix} (P \oplus P), \qquad (34)$$
then $\tilde Q^T \tilde W \tilde Q$ is skew-Hamiltonian and has the same zero pattern as $\tilde W$.

Lemma 15. The orthogonal symplectic factor of the PVL decomposition computed by Algorithm 2 has the form (34).

Proof. Assume that after $(j-1)$ loops of Algorithm 2 the matrix $\tilde W$ has been overwritten by a matrix with the same zero pattern as $\tilde W$. Let $\tilde x$ denote the jth column of $\tilde W$. If j is odd then $\tilde x$ can be written as $\tilde x = x \otimes e_2$, and if j is even then $\tilde x = x \otimes e_1$, where x is a vector of length 2n and $e_1, e_2$ are unit vectors of length two. This implies that Algorithm 1 produces an elementary orthogonal matrix $E_j(x)$ having the same zero pattern as the matrix $\tilde Q$ in (34), see [Kressner, 2003c]. This shows that the jth loop of Algorithm 2 preserves the zero pattern of $\tilde W$. The proof is concluded by using the fact that the set of matrices having the form (34) is closed under multiplication.

This also shows that the PVL form returned by Algorithm 2 must take the form
$$\tilde Q^T \tilde W \tilde Q = (P \oplus P)^T \begin{bmatrix} 0 & R_{22} & 0 & R_{12} \\ -R_{11} & 0 & -R_{12}^T & 0 \\ 0 & 0 & 0 & -R_{11}^T \\ 0 & 0 & R_{22}^T & 0 \end{bmatrix} (P \oplus P), \qquad (35)$$
where $R_{11}$ is an upper triangular matrix and $R_{22}$ is an upper Hessenberg matrix. Rewriting (35) in terms of the block entries of $\tilde W$ and $\tilde Q$ yields
$$U^T H V = \begin{bmatrix} R_{11} & R_{12} \\ 0 & -R_{22}^T \end{bmatrix} \qquad (36)$$
with the orthogonal symplectic matrices $U = \begin{bmatrix} U_1 & U_2 \\ -U_2 & U_1 \end{bmatrix}$ and $V = \begin{bmatrix} V_1 & V_2 \\ -V_2 & V_1 \end{bmatrix}$ formed from the entries of $\tilde Q$ in (34). This is a so-called symplectic URV decomposition [Benner et al., 1998].

As Algorithm 2 exclusively operates on the nonzero entries of $\tilde W$, it should be possible to reformulate it purely in terms of these entries. This amounts to the following algorithm [Benner et al., 1998, Alg. 4.4].

Algorithm 16 (Symplectic URV decomposition).
Input: A matrix $H \in \mathbb{R}^{2n\times 2n}$.
Output: Orthogonal symplectic matrices $U, V \in \mathbb{R}^{2n\times 2n}$; H is overwritten with $U^T H V$ having the form (36).
$U \leftarrow I_{2n}$, $V \leftarrow I_{2n}$.
for $j \leftarrow 1, 2, \dots, n$
    Set $x \leftarrow He_j$.
    Apply Algorithm 1 to compute $E_j(x)$.
    Update $H \leftarrow E_j(x)^T H$, $U \leftarrow UE_j(x)$.
    if $j < n$ then
        Set $y \leftarrow H^T e_{n+j}$.
        Apply Algorithm 1 to compute $E_{n+j+1}(y)$.
        Update $H \leftarrow HE_{n+j+1}(y)$, $V \leftarrow VE_{n+j+1}(y)$.
    end if
end for
If properly implemented, Algorithm 16 requires $\frac{80}{3}n^3 + O(n^2)$ floating point operations (flops) to reduce H and additionally $\frac{16}{3}n^3 + O(n^2)$ flops to compute each of the orthogonal symplectic factors U and V. Note that Algorithm 16 does not require H to be a Hamiltonian matrix, but even if H is Hamiltonian, this structure will be destroyed.

In the second step of Algorithm 10, the QR algorithm is applied to the upper left $2n \times 2n$ block of the PVL form (35). In [Kressner, 2003c] it is shown that this is equivalent to applying the periodic QR algorithm [Bojanczyk et al., 1992; Hench and Laub, 1994; Van Loan, 1975] to the matrix product $-R_{22} \cdot R_{11}$, which constructs orthogonal matrices $Q_1$ and $Q_2$ so that $Q_1^T R_{22} Q_2$ is reduced to real Schur form while $Q_2^T R_{11} Q_1$ stays upper triangular. The periodic QR algorithm is a backward stable method for computing the eigenvalues of $R_{22} \cdot R_{11}$. The positive and negative square roots of these eigenvalues are the eigenvalues of H.

The procedure, as described above, is a numerically backward stable method for computing the eigenvalues of a Hamiltonian matrix H. It preserves the eigenvalue symmetries of H in finite precision arithmetic and its complexity is $O(n^3)$. As the periodic QR algorithm inherits the reliability of the standard QR algorithm, this method can be regarded as highly reliable. Its only drawback is
Figure 3. Illustration of two loops of Algorithm 16 for n = 4.
that it does not take full advantage of the structure of H. It is not clear whether the method is strongly backward stable or not.
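The $\sqrt{u}$ loss of accuracy of the explicit squaring approach, which the symplectic URV/periodic QR method is designed to avoid, can be reproduced in a few lines. The toy example below (our construction: a Hamiltonian matrix with eigenvalues $\pm 1$ and $\pm 10^{-8}$, hidden by a random orthogonal symplectic similarity) compares the direct eigenvalue computation with taking square roots of the eigenvalues of $H^2$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2
# Hamiltonian H with eigenvalues +-1 and +-1e-8. If U1 + i*U2 is unitary,
# then Q = [[U1, U2], [-U2, U1]] is orthogonal symplectic, so H = Q^T H0 Q
# is again Hamiltonian with the same eigenvalues.
D = np.diag([1.0, 1e-8])
H0 = np.block([[D, np.zeros((n, n))], [np.zeros((n, n)), -D]])
U, _ = np.linalg.qr(rng.standard_normal((n, n))
                    + 1j * rng.standard_normal((n, n)))
Q = np.block([[U.real, U.imag], [-U.imag, U.real]])
H = Q.T @ H0 @ Q

ev_direct = np.sort(np.abs(np.linalg.eigvals(H)))
ev_squared = np.sort(np.abs(np.emath.sqrt(np.linalg.eigvals(H @ H))))

err_direct = abs(ev_direct[0] - 1e-8)   # tiny eigenvalue via eig(H)
err_squared = abs(ev_squared[0] - 1e-8) # same eigenvalue via sqrt(eig(H^2))
print(err_direct, err_squared)
```

In a typical run the direct computation recovers the eigenvalue $10^{-8}$ to nearly full precision, while the squared approach loses roughly half of the significant digits, exactly the $\sqrt{u}$-effect described above.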
Computation of invariant subspaces based on $H^2$. Having computed an invariant subspace for the skew-Hamiltonian matrix $H^2$, it is possible to extract invariant subspaces for H from it [Xu and Lu, 1995; Hwang et al., 2003]. However, we have already observed that the explicit computation of $H^2$ can lead to numerical instabilities and should be avoided. The above idea of embedding H in a skew-Hamiltonian matrix W of double dimension can be extended for computing invariant subspaces, see [Benner et al., 1997]. However, it should be noted that this approach might encounter numerical difficulties if H has eigenvalues on or close to the imaginary axis.

Refinement of stable invariant subspaces. With all the difficulties in deriving a strongly backward stable method, it might be preferable to use some kind of iterative refinement algorithm to improve the quantities computed by a less stable method. This idea is used, for example, in the multishift algorithm [Ammar et al., 1993] and hybrid methods for solving algebraic Riccati equations [Benner and Faßbender, 2001].

In the following we describe a method for improving an isotropic subspace $\hat{\mathcal X}$ that approximates the stable invariant subspace $\mathcal X$ of a Hamiltonian matrix H. Let the columns of $\hat X$ span an orthonormal basis for $\hat{\mathcal X}$ and consider
$$[\hat X, J\hat X]^T\, H\, [\hat X, J\hat X] = \begin{bmatrix} \tilde A & \tilde G \\ \tilde Q & -\tilde A^T \end{bmatrix}.$$
If $\hat X$ has been computed by a strongly backward stable method, then $\|\tilde Q\|$ is of order $u \cdot \|H\|$ and it is not possible to refine $\hat X$ much further. However, as we have seen before, if a less stable method has been used then $\|\tilde Q\|$ might be much larger. In this case we can apply the following algorithm to improve the accuracy of $\hat X$.

Algorithm 17.
Input: A Hamiltonian matrix $H \in \mathbb{R}^{2n\times 2n}$, a matrix $\hat X \in \mathbb{R}^{2n\times n}$ so that $[\hat X, J\hat X]$ is orthogonal, and a tolerance tol > 0.
Output: The matrix $\hat X$ is updated until $\|(J\hat X)^T H \hat X\|_F \le \mathrm{tol} \cdot \|H\|_F$.
while $\|(J\hat X)^T H \hat X\|_F > \mathrm{tol} \cdot \|H\|_F$
    Set $\tilde A \leftarrow \hat X^T H \hat X$ and $\tilde Q \leftarrow (J\hat X)^T H \hat X$.
    Solve the Lyapunov equation $R\tilde A + \tilde A^T R = -\tilde Q$.
    Compute $Y \in \mathbb{R}^{2n\times n}$ so that $[Y, JY]$ is orthogonal and $\mathrm{span}\, Y = \mathrm{span}\begin{bmatrix} I \\ -R \end{bmatrix}$, using a symplectic QR decomposition.
    Update $[\hat X, J\hat X] \leftarrow [\hat X, J\hat X] \cdot [Y, JY]$.
end while

As this algorithm is a special instance of a Newton method for refining invariant subspaces [Stewart, 1973; Chatelin, 1984; Demmel, 1987; Benner, 1997] or a block Jacobi-like algorithm [Hüper and Van Dooren, 2003], it converges locally quadratically. On the other hand, Algorithm 17 can be seen as a particular implementation of a Newton method for solving algebraic Riccati equations [Kleinman, 1968; Lancaster and Rodman, 1995; Mehrmann, 1991]. By a more general result in [Guo and Lancaster, 1998], this implies, under some mild conditions, global convergence if H has no eigenvalues on the imaginary axis and if the iteration is initialized with a matrix $\hat X$ so that all eigenvalues of $\tilde A = \hat X^T H \hat X$ are in the open left half plane $\mathbb{C}^-$.

In finite precision arithmetic, the minimal attainable tolerance is $\mathrm{tol} \approx n^2 \cdot u$ under the assumption that a forward stable method such as the Bartels-Stewart method [Bartels and Stewart, 1972] is used to solve the Lyapunov equations $R\tilde A + \tilde A^T R = -\tilde Q$ [Higham, 1996; Tisseur, 2001].
Other Algorithms. As mentioned in the introduction, there is a vast number of algorithms for the Hamiltonian eigenvalue problem available. Other algorithms based on orthogonal transformations are the Hamiltonian Jacobi algorithm [Byers, 1990; Bunse-Gerstner and Faßbender, 1997], its variants for Hamiltonian matrices that have additional structure [Faßbender et al., 2001] and the multishift algorithm [Ammar et al., 1993]. Algorithms based on symplectic but non-orthogonal transformations include the SR algorithm [Bunse-Gerstner and Mehrmann, 1986; Bunse-Gerstner, 1986; Mehrmann, 1991] as well as related methods [Bunse-Gerstner et al., 1989; Raines and Watkins, 1994]. A completely different class of algorithms is based on the matrix sign function, see, e.g., [Benner, 1999; Mehrmann, 1991; Sima, 1996] and the references therein. Other Newton-like methods directed towards the computation of invariant subspaces for Hamiltonian matrices can be found in [Absil and Van Dooren, 2002; Guo and Lancaster, 1998]. A structure-preserving Arnoldi method based on the $H^2$ approach was developed in [Mehrmann and Watkins, 2000; Kressner, 2004]. There are also a number of symplectic Lanczos methods available, see [Benner and Faßbender, 1997; Ferng et al., 1997; Watkins, 2002]. The remarks on balancing and block algorithms at the end of Section 2.3 carry over to Hamiltonian matrices. We only note that in [Benner and Kressner, 2003] a balancing algorithm is described which is particularly suited for large and sparse Hamiltonian matrices.
4. Applications
Most applications of skew-Hamiltonian and Hamiltonian eigenvalue problems are in the area of systems and control theory. In the following, we consider a linear continuous-time system with constant coefficients, which can be described by a set of matrix differential and algebraic equations
ẋ(t) = Ax(t) + Bu(t),   x(0) = x0,
y(t) = Cx(t) + Du(t),   (37)
where x(t) ∈ R^n is the vector of states, u(t) ∈ R^m the vector of inputs (or controls) and y(t) ∈ R^r the vector of outputs at time t ∈ [0, ∞). The system is described by the state matrix A ∈ R^{n×n}, the input (control) matrix B ∈ R^{n×m}, the output matrix C ∈ R^{r×n} and the feedthrough matrix D ∈ R^{r×m}. It is well beyond the scope of this work to give an introduction to such systems; for this purpose, the reader is referred to any modern, state-space oriented monograph in this area, see e.g. [Green and Limebeer, 1995; Petkov et al., 1991; Van Dooren, 2003; Zhou et al., 1996].
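As a small, self-contained illustration (ours, not from the survey; the matrices below are arbitrary choices), the continuous-time system (37) can be simulated with a simple forward-Euler discretization:

```python
import numpy as np

# Hypothetical example system: n = 2 states, m = 1 input, r = 1 output.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])  # state matrix (eigenvalues -1, -2: stable)
B = np.array([[0.0], [1.0]])              # input matrix
C = np.array([[1.0, 0.0]])                # output matrix
D = np.array([[0.0]])                     # feedthrough matrix

def simulate(x0, u, T, steps):
    """Forward-Euler integration of x' = Ax + Bu, y = Cx + Du."""
    h = T / steps
    x = np.asarray(x0, dtype=float).reshape(-1, 1)
    ys = []
    for k in range(steps):
        uk = np.atleast_2d(u(k * h))
        ys.append((C @ x + D @ uk).item())
        x = x + h * (A @ x + B @ uk)
    return np.array(ys)

# With zero input, the output of this stable system decays toward zero.
y = simulate([1.0, 0.0], lambda t: 0.0, T=5.0, steps=500)
print(y[0], y[-1])
```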
31
Skew-Hamiltonian and Hamiltonian Eigenvalue Problems
4.1 Stability Radius Computation
The system (37) is called (asymptotically) stable if all eigenvalues λ(A) of the state matrix A lie in C⁻. It is often important to know how near the system is to an unstable one, i.e., what is the smallest norm of a perturbation E ∈ C^{n×n} so that λ(A + E) ⊄ C⁻. This corresponds to the computation of the stability radius of A, which is defined as

γ(A) := min{ ‖E‖₂ : λ(A + E) ∩ ıR ≠ ∅ }.

A bisection method for measuring γ(A) can be based on the following observation [Byers, 1988]: if α ≥ 0, then the Hamiltonian matrix

H(α) = [ A     −αIₙ
         αIₙ   −Aᵀ ]

has an eigenvalue on the imaginary axis if and only if α ≥ γ(A). This suggests a simple bisection algorithm. Start with a lower bound β ≥ 0 and an upper bound δ > γ(A) (an easy-to-compute upper bound is ‖A + Aᵀ‖_F / 2 [Van Loan, 1984a]). Then in each step, set α := (β + δ)/2 and compute λ(H(α)). If there is an eigenvalue on the imaginary axis, set δ = α; otherwise, set β = α.

The correct decision whether H(α) has eigenvalues on the imaginary axis is crucial for the success of the bisection method. [Byers, 1988] shows that if the eigenvalues of H(α) are computed by a strongly backward stable method, then the computed γ(A) will be within an O(u) · ‖A‖₂ distance of the exact stability radius.
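The bisection just described can be sketched as follows (our illustration; it uses an unstructured eigensolver and an ad-hoc threshold for detecting imaginary-axis eigenvalues, which is exactly the delicate decision discussed above; a robust implementation would use a structure-preserving method):

```python
import numpy as np

def stability_radius(A, tol=1e-8):
    """Bisection sketch for gamma(A), following [Byers, 1988].

    Tests whether H(alpha) = [[A, -alpha*I], [alpha*I, -A.T]] has an
    eigenvalue on the imaginary axis with a plain (unstructured)
    eigensolver; `imag_tol` is an ad-hoc detection threshold.
    """
    n = A.shape[0]
    I = np.eye(n)
    beta = 0.0
    delta = np.linalg.norm(A + A.T, 'fro') / 2   # upper bound [Van Loan, 1984a]
    imag_tol = 1e-10 * max(1.0, np.linalg.norm(A))
    while delta - beta > tol:
        alpha = (beta + delta) / 2
        H = np.block([[A, -alpha * I], [alpha * I, -A.T]])
        eig = np.linalg.eigvals(H)
        if np.any(np.abs(eig.real) < imag_tol):  # eigenvalue on the imaginary axis
            delta = alpha
        else:
            beta = alpha
    return (beta + delta) / 2

# Normal matrix with eigenvalues -1 and -2: gamma(A) = 1.
A = np.array([[-1.0, 0.0], [0.0, -2.0]])
print(stability_radius(A))   # ~1.0
```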
4.2 H∞ Norm Computation
A similar problem is the computation of the H∞ norm of a stable system. Consider the transfer function G(s) of a stable system of the form (37), G(s) = C(sI − A)⁻¹B + D. Then

‖G‖_{H∞} = ess sup{ ‖G(ıω)‖₂ : ω ∈ R }

is the H∞ norm of G, see e.g. [Green and Limebeer, 1995; Zhou et al., 1996]. Let σmax(D) denote the largest singular value of D and let α ∈ R be such that α > σmax(D). Then consider the parameter-dependent Hamiltonian matrix

H(α) = [ H₁₁(α)    H₁₂(α)
         H₂₁(α)   −H₁₁(α)ᵀ ],
where, for R(α) = α²I − DᵀD,

H₁₁(α) = A + B R(α)⁻¹ Dᵀ C,
H₁₂(α) = B R(α)⁻¹ Bᵀ,
H₂₁(α) = −Cᵀ (I + D R(α)⁻¹ Dᵀ) C.

The following result can be used to approximate ‖G‖_{H∞}, see e.g. [Zhou et al., 1996]:

‖G‖_{H∞} < α  ⇔  σmax(D) < α and λ(H(α)) ∩ ıR = ∅.

Using this fact, a bisection algorithm analogous to the stability radius computation can be formulated, starting with the lower bound β = σmax(D) and an upper bound δ > ‖G‖_{H∞}; see [Boyd et al., 1989] for details. Again, the bisection algorithm benefits if the decisions are based on eigenvalues computed by a method preserving the eigenvalue symmetries of H(α). Faster convergent versions of this algorithm, which may also involve the eigenvectors of H(α), can be found in [Boyd and Balakrishnan, 1990; Bruinsma and Steinbuch, 1990; Genin et al., 1998].
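A sketch of this second bisection (ours; again with a plain eigensolver and an ad-hoc imaginary-axis test, plus a simple doubling phase to find an initial upper bound):

```python
import numpy as np

def hinf_norm(A, B, C, D, tol=1e-8):
    """Bisection sketch for the H-infinity norm of a stable system (A,B,C,D).

    Uses the characterization from [Zhou et al., 1996]: alpha is an upper
    bound iff sigma_max(D) < alpha and H(alpha) has no purely imaginary
    eigenvalues.  The detection threshold below is an ad-hoc choice.
    """
    m, r = B.shape[1], C.shape[0]
    smax_D = np.linalg.norm(D, 2)          # largest singular value of D

    def has_imag_eig(alpha):
        R = alpha**2 * np.eye(m) - D.T @ D
        Ri = np.linalg.inv(R)
        H11 = A + B @ Ri @ D.T @ C
        H12 = B @ Ri @ B.T
        H21 = -C.T @ (np.eye(r) + D @ Ri @ D.T) @ C
        H = np.block([[H11, H12], [H21, -H11.T]])
        return np.any(np.abs(np.linalg.eigvals(H).real) < 1e-10)

    beta = smax_D
    delta = max(2 * smax_D, 1.0)           # grow until a valid upper bound is found
    while has_imag_eig(delta):
        delta *= 2
    while delta - beta > tol:
        alpha = (beta + delta) / 2
        if alpha <= smax_D or has_imag_eig(alpha):
            beta = alpha                   # alpha is not an upper bound
        else:
            delta = alpha
    return (beta + delta) / 2

# G(s) = 1/(s+1): the H-infinity norm is sup_w |1/(1+iw)| = 1.
A = np.array([[-1.0]]); B = np.array([[1.0]])
C = np.array([[1.0]]);  D = np.array([[0.0]])
print(hinf_norm(A, B, C, D))   # ~1.0
```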
4.3 Algebraic Riccati Equations
Given a Hamiltonian matrix H as in (2), there is always a corresponding algebraic Riccati equation (ARE)

0 = Q + AᵀX + XA − XGX.   (38)
AREs have played a fundamental role in systems and control theory since the early 1960s, as they are the major tool to compute feedback controllers using LQR/LQG (H₂) or H∞ approaches. The correspondence between feedback controllers and AREs can be found in literally any modern textbook on control, see, e.g., [Anderson and Moore, 1990; Green and Limebeer, 1995; Zhou et al., 1996] and many others. In these applications, usually a particular solution of (38) is required which is stabilizing in the sense that λ(A − GX) is contained in the open left half plane. This solution is unique if it exists and is related to the Hamiltonian eigenproblem as follows. Suppose X is a symmetric solution of (38); then it is easy to see that

H [ Iₙ   0      [ Iₙ   0     [ A − GX      G
    −X   Iₙ ] =   −X   Iₙ ] ·     0    −(A − GX)ᵀ ].

Hence, the columns of [Iₙ, −X]ᵀ span an H-invariant subspace corresponding to λ(H) ∩ λ(A − GX). This implies that we can solve AREs by computing
H-invariant subspaces. In particular, if we want the stabilizing solution, we need the maximal stable H-invariant subspace. Suppose that a basis of this subspace is given by the columns of [X₁ᵀ, X₂ᵀ]ᵀ with X₁, X₂ ∈ R^{n×n}; then, under mild assumptions, X₁ is invertible and X = −X₂X₁⁻¹ is the stabilizing solution of (38). Therefore, any algorithm to compute invariant subspaces of Hamiltonian matrices may be used to solve AREs. For discussions of this topic see [Benner, 1999; Mehrmann, 1991; Sima, 1996]. It should be noted, though, that often the ARE is a detour. In feedback control, the solution of the ARE can usually be avoided by working only with the H-invariant subspaces, see [Benner et al., 2004; Mehrmann, 1991]. A correspondence between the skew-Hamiltonian eigenproblem (1) and the anti-symmetric ARE

0 = Q − AᵀX + XA − XGX,   Q = −Qᵀ, G = −Gᵀ,

is discussed in [Stefanovski and Trenčevski, 1998].
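The invariant-subspace route to the stabilizing solution can be sketched as follows (our illustration; it uses an ordered unstructured Schur decomposition via SciPy rather than the structure-preserving methods advocated in this survey):

```python
import numpy as np
from scipy.linalg import schur

def solve_are_via_subspace(A, G, Q):
    """Stabilizing solution of 0 = Q + A^T X + X A - X G X via the stable
    invariant subspace of H = [[A, G], [Q, -A^T]] (sketch only)."""
    n = A.shape[0]
    H = np.block([[A, G], [Q, -A.T]])
    T, U, k = schur(H, sort='lhp')       # stable eigenvalues ordered first
    assert k == n, "expected exactly n eigenvalues in the open left half plane"
    X1, X2 = U[:n, :n], U[n:, :n]        # columns of [X1; X2] span the subspace
    return -X2 @ np.linalg.inv(X1)       # X = -X2 X1^{-1}

# Scalar check: 0 = 0 + 2x - 2x^2 has roots x = 0 and x = 1; x = 1 is the
# stabilizing one, since A - GX = 1 - 2 = -1 < 0.
A, G, Q = np.array([[1.0]]), np.array([[2.0]]), np.array([[0.0]])
X = solve_are_via_subspace(A, G, Q)
print(X)                                                 # ~[[1.]]
print(np.linalg.norm(Q + A.T @ X + X @ A - X @ G @ X))   # ~0
```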
4.4 Quadratic Eigenvalue Problems
The quadratic eigenvalue problem (QEP) is to find scalars λ and nonzero vectors x satisfying

(λ²M + λG + K)x = 0,   (39)

where M, G, K ∈ R^{n×n}. It arises, for example, from linear systems that are governed by second order differential equations, see [Tisseur and Meerbergen, 2001]. Gyroscopic systems yield QEPs with symmetric positive definite M, skew-symmetric G and symmetric K. In this case, the eigenvalues of (39) have the same symmetries as in the Hamiltonian eigenvalue problem, i.e., if λ is an eigenvalue then −λ, λ̄ and −λ̄ are also eigenvalues.

By [Mehrmann and Watkins, 2000], a linearization of (39) reflecting this property is the skew-Hamiltonian/Hamiltonian matrix pencil λ W_L W_R − H, where

W_L = [ I   G/2     W_R = [ M   G/2     H = [ 0   −K
        0    M ],           0    I ],         M    0 ].

From this, it is easy to see that W_L⁻¹ H W_R⁻¹ is Hamiltonian and has the same eigenvalues as (39). Hence, a structure-preserving algorithm applied to W_L⁻¹ H W_R⁻¹ will preserve the eigenvalue pairings of (39). This is particularly important for testing the stability of the underlying gyroscopic system, which amounts to checking whether all eigenvalues of (39) are on the imaginary axis, see e.g. [Tisseur and Meerbergen, 2001, Sec. 5.3].

However, it should be noted that such an approach is only advisable as long as the matrix M is sufficiently well conditioned. Otherwise, structure-preserving
algorithms that work directly on the pencil λ W_L W_R − H should be preferred [Benner et al., 1998]. Linearizations that lead to skew-Hamiltonian eigenvalue problems are described in [Mehrmann and Watkins, 2000], and have been used for computing corner singularities in anisotropic elastic structures [Apel et al., 2002].
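The linearization is easy to verify numerically; the following sketch (ours, with an arbitrary small gyroscopic example) checks that W_L⁻¹ H W_R⁻¹ is Hamiltonian and has the eigenvalues of the QEP:

```python
import numpy as np

n = 2
M = np.eye(n)                                 # symmetric positive definite
G = np.array([[0.0, 1.0], [-1.0, 0.0]])       # skew-symmetric (gyroscopic)
K = np.diag([2.0, 3.0])                       # symmetric

I, Z = np.eye(n), np.zeros((n, n))
WL = np.block([[I, G / 2], [Z, M]])
WR = np.block([[M, G / 2], [Z, I]])
H = np.block([[Z, -K], [M, Z]])

# S = WL^{-1} H WR^{-1} is Hamiltonian: J S is symmetric for J = [[0, I], [-I, 0]].
S = np.linalg.inv(WL) @ H @ np.linalg.inv(WR)
J = np.block([[Z, I], [-I, Z]])
print(np.linalg.norm(J @ S - (J @ S).T))      # ~0

# Its eigenvalues coincide with those of the QEP; here they are purely
# imaginary (the gyroscopic system is stable).  Compare with the ordinary
# companion linearization, sorting by imaginary part.
comp = np.block([[Z, I], [-np.linalg.inv(M) @ K, -np.linalg.inv(M) @ G]])
eS, eC = np.linalg.eigvals(S), np.linalg.eigvals(comp)
print(np.max(np.abs(np.sort(eS.imag) - np.sort(eC.imag))))  # ~0
```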
4.5 Other Applications
Other applications for Hamiltonian eigenvalue problems include passivity preserving model reduction [Antoulas and Sorensen, 2001; Sorensen, 2002], the computation of pseudospectra [Burke et al., 2003b] and the distance to uncontrollability [Gu, 2000; Burke et al., 2003a].
5. Concluding Remarks
We have presented structured decompositions, condition numbers, algorithms and applications for skew-Hamiltonian and Hamiltonian eigenvalue problems. It is our hope that the reader is now convinced that the exploitation of such structures is an interesting area of research, not only from a theoretical point of view but also with respect to applications. Many problems remain open. In particular, Hamiltonian matrices with eigenvalues on the imaginary axis require further investigation.

Most of the presented material is already available in the cited literature. In this survey, the novel pieces of the (skew-)Hamiltonian puzzle are: explicit formulas and bounds for the structured eigenvalue condition numbers; a relation between the structured and unstructured condition numbers for stable invariant subspaces of Hamiltonian matrices; a new reordering algorithm for the Hamiltonian Schur form based on symplectic QR decompositions; the derivation of the symplectic URV decomposition from the PVL decomposition; and a structure-preserving iterative refinement algorithm for stable invariant subspaces of Hamiltonian matrices.
Acknowledgments The authors gratefully thank Ralph Byers and Vasile Sima for helpful discussions.
References

Absil, P.-A. and Van Dooren, P. (2002). Two-sided Grassmann Rayleigh quotient iteration. Submitted to SIAM J. Matrix Anal. Appl.
Ammar, G., Benner, P., and Mehrmann, V. (1993). A multishift algorithm for the numerical solution of algebraic Riccati equations. Electr. Trans. Num. Anal., 1:33–48.
Ammar, G. and Mehrmann, V. (1991). On Hamiltonian and symplectic Hessenberg forms. Linear Algebra Appl., 149:55–72.
Anderson, B. and Moore, J. (1990). Optimal Control – Linear Quadratic Methods. Prentice-Hall, Englewood Cliffs, NJ.
Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J. J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., and Sorensen, D. (1999). LAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition.
Antoulas, A. C. and Sorensen, D. C. (2001). Approximation of large-scale dynamical systems: an overview. Int. J. Appl. Math. Comput. Sci., 11(5):1093–1121.
Apel, T., Mehrmann, V., and Watkins, D. S. (2002). Structured eigenvalue methods for the computation of corner singularities in 3D anisotropic elastic structures. Comput. Methods Appl. Mech. Engrg., 191:4459–4473.
Bai, Z., Demmel, J., and McKenney, A. (1993). On computing condition numbers for the nonsymmetric eigenproblem. ACM Trans. Math. Software, 19(2):202–223.
Bai, Z. and Demmel, J. W. (1993). On swapping diagonal blocks in real Schur form. Linear Algebra Appl., 186:73–95.
Bartels, R. H. and Stewart, G. W. (1972). Algorithm 432: The solution of the matrix equation AX + XB = C. Communications of the ACM, 15:820–826.
Benner, P. (1997). Contributions to the Numerical Solution of Algebraic Riccati Equations and Related Eigenvalue Problems. Logos-Verlag, Berlin, Germany.
Benner, P. (1999). Computational methods for linear-quadratic optimization. Supplemento ai Rendiconti del Circolo Matematico di Palermo, Serie II, No. 58:21–56.
Benner, P. (2000). Symplectic balancing of Hamiltonian matrices. SIAM J. Sci. Comput., 22(5):1885–1904.
Benner, P., Byers, R., and Barth, E. (2000). Algorithm 800: Fortran 77 subroutines for computing the eigenvalues of Hamiltonian matrices I: The square-reduced method. ACM Trans. Math. Software, 26:49–77.
Benner, P., Byers, R., Mehrmann, V., and Xu, H. (2004). Robust numerical methods for robust control. Technical Report 06-2004, Institut für Mathematik, TU Berlin.
Benner, P. and Faßbender, H. (1997). An implicitly restarted symplectic Lanczos method for the Hamiltonian eigenvalue problem. Linear Algebra Appl., 263:75–111.
Benner, P. and Faßbender, H. (2001). A hybrid method for the numerical solution of discrete-time algebraic Riccati equations. Contemporary Mathematics, 280:255–269.
Benner, P. and Kressner, D. (2003). Balancing sparse Hamiltonian eigenproblems. To appear in Linear Algebra Appl.
Benner, P. and Kressner, D. (2004). Fortran 77 subroutines for computing the eigenvalues of Hamiltonian matrices II. In preparation. See also http://www.math.tu-berlin.de/~kressner/hapack/.
Benner, P., Mehrmann, V., and Xu, H. (1997). A new method for computing the stable invariant subspace of a real Hamiltonian matrix. J. Comput. Appl. Math., 86:17–43.
Benner, P., Mehrmann, V., and Xu, H. (1998). A numerically stable, structure preserving method for computing the eigenvalues of real Hamiltonian or symplectic pencils. Numerische Mathematik, 78(3):329–358.
Bischof, C. and Van Loan, C. F. (1987). The WY representation for products of Householder matrices. SIAM J. Sci. Statist. Comput., 8(1):S2–S13. Parallel processing for scientific computing (Norfolk, Va., 1985).
Bojanczyk, A., Golub, G. H., and Van Dooren, P. (1992). The periodic Schur decomposition; algorithm and applications. In Proc. SPIE Conference, volume 1770, pages 31–42.
Boyd, S. and Balakrishnan, V. (1990). A regularity result for the singular values of a transfer matrix and a quadratically convergent algorithm for computing its L∞-norm. Systems Control Lett., 15(1):1–7.
Boyd, S., Balakrishnan, V., and Kabamba, P. (1989). A bisection method for computing the H∞ norm of a transfer matrix and related problems. Math. Control, Signals, Sys., 2:207–219.
Bruinsma, N. A. and Steinbuch, M. (1990). A fast algorithm to compute the H∞-norm of a transfer function matrix. Sys. Control Lett., 14(4):287–293.
Bunch, J. R. (1987). The weak and strong stability of algorithms in numerical linear algebra. Linear Algebra Appl., 88/89:49–66.
Bunse-Gerstner, A. (1986). Matrix factorizations for symplectic QR-like methods. Linear Algebra Appl., 83:49–77.
Bunse-Gerstner, A. and Faßbender, H. (1997). A Jacobi-like method for solving algebraic Riccati equations on parallel computers. IEEE Trans. Automat. Control, 42(8):1071–1084.
Bunse-Gerstner, A. and Mehrmann, V. (1986). A symplectic QR like algorithm for the solution of the real algebraic Riccati equation. IEEE Trans. Automat. Control, 31(12):1104–1113.
Bunse-Gerstner, A., Mehrmann, V., and Watkins, D. S. (1989). An SR algorithm for Hamiltonian matrices based on Gaussian elimination. In XII Symposium on Operations Research (Passau, 1987), volume 58 of Methods Oper. Res., pages 339–357. Athenäum/Hain/Hanstein, Königstein.
Burke, J. V., Lewis, A. S., and Overton, M. L. (2003a). Pseudospectral components and the distance to uncontrollability. Submitted to SIAM J. Matrix Anal. Appl.
Burke, J. V., Lewis, A. S., and Overton, M. L. (2003b). Robust stability and a criss-cross algorithm for pseudospectra. IMA J. Numer. Anal., 23(3):359–375.
Byers, R. (1983). Hamiltonian and Symplectic Algorithms for the Algebraic Riccati Equation. PhD thesis, Cornell University, Dept. Comp. Sci., Ithaca, NY.
Byers, R. (1986). A Hamiltonian QR algorithm. SIAM J. Sci. Statist. Comput., 7(1):212–229.
Byers, R. (1988). A bisection method for measuring the distance of a stable matrix to unstable matrices. SIAM J. Sci. Statist. Comput., 9:875–881.
Byers, R. (1990). A Hamiltonian-Jacobi algorithm. IEEE Trans. Automat. Control, 35:566–570.
Byers, R. and Nash, S. (1987). On the singular "vectors" of the Lyapunov operator. SIAM J. Algebraic Discrete Methods, 8(1):59–66.
Chatelin, F. (1984). Simultaneous Newton's iteration for the eigenproblem. In Defect correction methods (Oberwolfach, 1983), volume 5 of Comput. Suppl., pages 67–74. Springer, Vienna.
Demmel, J. W. (1987). Three methods for refining estimates of invariant subspaces. Computing, 38:43–57.
Dongarra, J. J., Sorensen, D. C., and Hammarling, S. J. (1989). Block reduction of matrices to condensed forms for eigenvalue computations. J. Comput. Appl. Math., 27(1-2):215–227. Reprinted in Parallel algorithms for numerical linear algebra, 215–227, North-Holland, Amsterdam, 1990.
Faßbender, H., Mackey, D. S., and Mackey, N. (2001). Hamilton and Jacobi come full circle: Jacobi algorithms for structured Hamiltonian eigenproblems. Linear Algebra Appl., 332/334:37–80.
Faßbender, H., Mackey, D. S., Mackey, N., and Xu, H. (1999). Hamiltonian square roots of skew-Hamiltonian matrices. Linear Algebra Appl., 287(1-3):125–159.
Ferng, W., Lin, W.-W., and Wang, C.-S. (1997). The shift-inverted J-Lanczos algorithm for the numerical solutions of large sparse algebraic Riccati equations. Comput. Math. Appl., 33(10):23–40.
Freiling, G., Mehrmann, V., and Xu, H. (2002). Existence, uniqueness, and parametrization of Lagrangian invariant subspaces. SIAM J. Matrix Anal. Appl., 23(4):1045–1069.
Gantmacher, F. (1960). The Theory of Matrices. Chelsea, New York.
Genin, Y., Van Dooren, P., and Vermaut, V. (1998). Convergence of the calculation of H∞ norms and related questions. In Beghi, A., Finesso, L., and Picci, G., editors, Proceedings of the Conference on the Mathematical Theory of Networks and Systems, MTNS '98, pages 429–432.
Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations. Johns Hopkins University Press, Baltimore, MD, third edition.
Green, M. and Limebeer, D. (1995). Linear Robust Control. Prentice-Hall, Englewood Cliffs, NJ.
Gu, M. (2000). New methods for estimating the distance to uncontrollability. SIAM J. Matrix Anal. Appl., 21(3):989–1003.
Guo, C. and Lancaster, P. (1998). Analysis and modification of Newton's method for algebraic Riccati equations. Math. Comp., 67:1089–1105.
Hench, J. J. and Laub, A. J. (1994). Numerical solution of the discrete-time periodic Riccati equation. IEEE Trans. Automat. Control, 39(6):1197–1210.
Higham, N. J. (1996). Accuracy and Stability of Numerical Algorithms. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.
Hüper, K. and Van Dooren, P. (2003). New algorithms for the iterative refinement of estimates of invariant subspaces. Future Generation Computer Systems, 19:1231–1242.
Hwang, T.-M., Lin, W.-W., and Mehrmann, V. (2003). Numerical solution of quadratic eigenvalue problems with structure-preserving methods. SIAM J. Sci. Comput., 24(4):1283–1302.
Kleinman, D. (1968). On an iterative technique for Riccati equation computations. IEEE Trans. Automat. Control, AC-13:114–115.
Konstantinov, M., Mehrmann, V., and Petkov, P. (2001). Perturbation analysis of Hamiltonian Schur and block-Schur forms. SIAM J. Matrix Anal. Appl., 23(2):387–424.
Kressner, D. (2003a). Block algorithms for orthogonal symplectic factorizations. BIT, 43(4):775–790.
Kressner, D. (2003b). A Matlab toolbox for solving skew-Hamiltonian and Hamiltonian eigenvalue problems. Online available from http://www.math.tu-berlin.de/~kressner/hapack/matlab/.
Kressner, D. (2003c). The periodic QR algorithm is a disguised QR algorithm. To appear in Linear Algebra Appl.
Kressner, D. (2003d). Perturbation bounds for isotropic invariant subspaces of skew-Hamiltonian matrices. To appear in SIAM J. Matrix Anal. Appl.
Kressner, D. (2004). Numerical Methods and Software for General and Structured Eigenvalue Problems. PhD thesis, Institut für Mathematik, TU Berlin, Berlin, Germany.
Lancaster, P. and Rodman, L. (1995). The Algebraic Riccati Equation. Oxford University Press, Oxford.
Lin, W.-W. and Ho, T.-C. (1990). On Schur type decompositions for Hamiltonian and symplectic pencils. Technical report, Institute of Applied Mathematics, National Tsing Hua University, Taiwan.
Lin, W.-W., Mehrmann, V., and Xu, H. (1999). Canonical forms for Hamiltonian and symplectic matrices and pencils. Linear Algebra Appl., 302/303:469–533.
Mehrmann, V. (1991). The Autonomous Linear Quadratic Control Problem, Theory and Numerical Solution. Number 163 in Lecture Notes in Control and Information Sciences. Springer-Verlag, Heidelberg.
Mehrmann, V. and Watkins, D. S. (2000). Structure-preserving methods for computing eigenpairs of large sparse skew-Hamiltonian/Hamiltonian pencils. SIAM J. Sci. Comput., 22(6):1905–1925.
Ortega, J. M. and Rheinboldt, W. C. (1970). Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York.
Paige, C. and Van Loan, C. F. (1981). A Schur decomposition for Hamiltonian matrices. Linear Algebra Appl., 41:11–32.
Petkov, P. H., Christov, N. D., and Konstantinov, M. M. (1991). Computational Methods for Linear Control Systems. Prentice-Hall, Hertfordshire, UK.
Raines, A. C. and Watkins, D. S. (1994). A class of Hamiltonian-symplectic methods for solving the algebraic Riccati equation. Linear Algebra Appl., 205/206:1045–1060.
Schreiber, R. and Van Loan, C. F. (1989). A storage-efficient WY representation for products of Householder transformations. SIAM J. Sci. Statist. Comput., 10(1):53–57.
Sima, V. (1996). Algorithms for Linear-Quadratic Optimization, volume 200 of Pure and Applied Mathematics. Marcel Dekker, Inc., New York, NY.
Sorensen, D. C. (2002). Passivity preserving model reduction via interpolation of spectral zeros. Technical Report TR02-15, ECE-CAAM Depts., Rice University.
Stefanovski, J. and Trenčevski, K. (1998). Antisymmetric Riccati matrix equation. In 1st Congress of the Mathematicians and Computer Scientists of Macedonia (Ohrid, 1996), pages 83–92. Sojuz. Mat. Inform. Maked., Skopje.
Stewart, G. W. (1971). Error bounds for approximate invariant subspaces of closed linear operators. SIAM J. Numer. Anal., 8:796–808.
Stewart, G. W. (1973). Error and perturbation bounds for subspaces associated with certain eigenvalue problems. SIAM Rev., 15:727–764.
Stewart, G. W. (2001). Matrix Algorithms. Vol. II: Eigensystems. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.
Stewart, G. W. and Sun, J.-G. (1990). Matrix Perturbation Theory. Academic Press, New York.
Sun, J.-G. (1998). Stability and accuracy: Perturbation analysis of algebraic eigenproblems. Technical Report UMINF 98-07, Department of Computing Science, University of Umeå, Umeå, Sweden.
Tisseur, F. (2001). Newton's method in floating point arithmetic and iterative refinement of generalized eigenvalue problems. SIAM J. Matrix Anal. Appl., 22(4):1038–1057.
Tisseur, F. (2003). A chart of backward errors for singly and doubly structured eigenvalue problems. SIAM J. Matrix Anal. Appl., 24(3):877–897.
Tisseur, F. and Meerbergen, K. (2001). The quadratic eigenvalue problem. SIAM Rev., 43(2):235–286.
Van Dooren, P. (2003). Numerical Linear Algebra for Signal Systems and Control. Draft notes prepared for the Graduate School in Systems and Control.
Van Loan, C. F. (1975). A general matrix eigenvalue algorithm. SIAM J. Numer. Anal., 12(6):819–834.
Van Loan, C. F. (1984a). How near is a matrix to an unstable matrix? Lin. Alg. and its Role in Systems Theory, 47:465–479.
Van Loan, C. F. (1984b). A symplectic method for approximating all the eigenvalues of a Hamiltonian matrix. Linear Algebra Appl., 61:233–251.
Watkins, D. S. (2002). On Hamiltonian and symplectic Lanczos processes. To appear in Linear Algebra Appl.
Watkins, D. S. and Elsner, L. (1991). Convergence of algorithms of decomposition type for the eigenvalue problem. Linear Algebra Appl., 143:19–47.
Wilkinson, J. H. (1965). The Algebraic Eigenvalue Problem. Clarendon Press, Oxford.
Xu, H. and Lu, L. Z. (1995). Properties of a quadratic matrix equation and the solution of the continuous-time algebraic Riccati equation. Linear Algebra Appl., 222:127–145.
Zhou, K., Doyle, J. C., and Glover, K. (1996). Robust and Optimal Control. Prentice-Hall, Upper Saddle River, NJ.
A GENERAL FRAME FOR THE CONSTRUCTION OF CONSTRAINED CURVES

Paolo Costantini¹, Maria Lucia Sampoli¹
[email protected]
Abstract
The aim of the present paper is to review the basic ideas of the so-called abstract schemes (AS) and to show that they can be used to solve any problem concerning the construction of spline curves subject to local (i.e. piecewise defined) constraints. In particular, we will use AS to solve a planar parametric interpolation problem with free knots.
Keywords: Shape preserving interpolation, fairness, splines, CAGD.
1. Introduction
In recent years, a substantial part of the research on mathematical methods for the construction of curves (and surfaces) has been devoted to developing new algorithms which satisfy, along with the classical interpolation or approximation conditions, other constraints given by the context in which the curve is sought. Typical examples are given by data interpolation in industrial applications or by data approximation in the context of reverse engineering, where it is often required that the shape of the curve reproduces the shape of the data. Typically, the solution is provided by ad-hoc methods, specifically constructed for a single class of problems. On the other hand, we notice that, in the case of piecewise (polynomial, exponential, rational, etc.) curves, these schemes very often follow a common procedure. First, a suitable set of parameters is chosen and each piece of the function is expressed using some of these parameters; second, the constraints are rewritten in terms of these parameters and a set of admissible domains
¹Dipartimento di Scienze Matematiche ed Informatiche, Pian dei Mantellini, 44 – 53100 Siena – Italy
Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 41–66. © 2005 Springer. Printed in the Netherlands.
is derived; third, a theory is developed for checking the feasibility of the problem and, possibly, an algorithm for computing a solution is provided. A unified algorithmic approach was proposed some years ago in the so-called abstract schemes, which provide a general-purpose practical theory giving a common framework in which various different problems and targets can be dealt with. It can indeed be proven that any problem regarding piecewise defined interpolating or approximating functions subject to any kind of local (i.e. piecewise defined) constraints can be modelled by means of abstract schemes and therefore solved with a general algorithmic procedure. In general, a constrained curve construction problem admits a set of possible solutions; abstract schemes can easily be linked with many optimization functionals which can be used to select the best solution among the admissible ones. It is worthwhile to emphasize that AS are merely a strategy to check the feasibility of a set defined by weakly coupled relations and, possibly, to find an interior point. Despite the applications developed so far, AS can be (and will be) described without any connection to spline problems. The aim of this paper is to explain the basic structure of these schemes, to discuss some recent improvements and to show some applications to practical problems. The content is divided into five sections. In the next one we will briefly recall the main algorithms, and in Section 3 we will explain how AS can be adopted to construct curves subject to constraints. Section 4 will be devoted to exploring a new example, fair, shape-preserving geometric interpolation of planar data, and Section 5 to final conclusions and remarks.
2. The general structure of Abstract Schemes
In order to make this paper self-contained, we recall some results on AS in their more general form; details and proofs can be found in the quoted papers.
2.1 The basic algorithms
Let Xi ≠ ∅, i = 0, 1, . . . , N, be an arbitrary sequence of sets and let

Di ⊆ Xi × Xi+1, Di ≠ ∅; i = 0, 1, . . . , N − 1.   (1)

We consider the new set D ⊆ X0 × X1 × · · · × XN given by

D = {(d0, . . . , dN) s.t. di ∈ Xi, (di, di+1) ∈ Di; i = 0, . . . , N − 1}   (2)
and we set the following two problems.

P1 Is D non-empty? In other words, do there exist sequences (di ∈ Xi; i = 0, . . . , N) such that

(di, di+1) ∈ Di, i = 0, 1, . . . , N − 1?   (3)
P2 If there exist sequences fulfilling (3), is it possible to build up an algorithm which computes one of them efficiently?

The solution was developed in [24] (see also [25]), where the term staircase algorithm is used, and independently in [12], [5] (see also [6]). The idea is to process the data first in one direction, for instance from 0 to N, through algorithm ALG1(D) (forward sweep), and then in the opposite direction, through algorithm ALG2(A0, . . . , AN, D) (backward sweep). In more detail, let us denote by Π^j_{i,i+1} : Xi × Xi+1 → Xj, j = i, i + 1, the projection maps from the "i, i + 1-plane" onto the "i-axis" and "i + 1-axis" respectively, and let us define the sets

Bi := Π^i_{i,i+1}(Di); i = 0, 1, . . . , N − 1;   BN := XN.   (4)
In the forward sweep the admissible domains Ai are determined. Observe in fact that for every parameter di both the constraint domain coming from the segment (i − 1, i), that is Di−1, and the one coming from the segment (i, i + 1), that is Di, have to be taken into account. Thus it is necessary to determine for every parameter the true admissible domain, and this is indeed done by algorithm ALG1(D).

Algorithm ALG1(D)
1. Set A0 := B0, J := N.
2. For i = 1, . . . , N
   2.1 Set Ai := Π^i_{i−1,i}(Di−1 ∩ {Ai−1 × Bi}).
   2.2 If Ai = ∅ set J := i and stop.
3. Stop.

In this regard, the following result holds ([5]).

Theorem 1. P1 has a solution if, and only if, J = N, that is Ai ≠ ∅, i = 0, 1, . . . , N. If (d0, d1, . . . , dN) is a solution then

di ∈ Ai; i = 0, 1, . . . , N.   (5)
Figure 1. Algorithm ALG1(D): construction of the admissible domains Ai.
We remark that, in general, a solution of P1 is not unique and that the necessary condition (5) is not sufficient. Thus, if the sequence of non-empty sets A0, . . . , AN has been defined by algorithm ALG1(D), a first simple scheme for computing a sequence (d0, d1, . . . , dN) is provided by the following algorithm (backward sweep), whose effectiveness is guaranteed by Theorem 2 (we refer again to [5] for the proof).

Algorithm ALG2(A0, . . . , AN, D)
1. Choose any dN ∈ AN.
2. For i = N − 1, N − 2, . . . , 0
   2.1 Set Ci(di+1) := Π^i_{i,i+1}(Di ∩ {Ai × {di+1}}).
   2.2 Choose any di ∈ Ci(di+1).
3. Stop.

Theorem 2. Let the sequence A0, A1, . . . , AN be given by algorithm ALG1(D), with Ai ≠ ∅; i = 0, 1, . . . , N. Then algorithm ALG2(A0, . . . , AN, D) can be completed (that is, the sets Ci(di+1) are not empty) and any sequence (d0, d1, . . . , dN) computed by algorithm ALG2(A0, . . . , AN, D) is a solution for problem P2.
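The two sweeps can be made concrete in the finite case, where each X_i is a finite set and each D_i an explicit set of pairs (our illustrative specialization; the example data are arbitrary):

```python
def alg1(N, D):
    """Forward sweep: admissible domains A_0, ..., A_N, or None if P1 fails."""
    B = [{p for (p, _) in D[i]} for i in range(N)]   # B_i = projection of D_i
    A = [B[0]]                                       # A_0 := B_0
    for i in range(1, N + 1):
        in_Bi = (lambda q, i=i: q in B[i]) if i < N else (lambda q: True)  # B_N = X_N
        Ai = {q for (p, q) in D[i - 1] if p in A[i - 1] and in_Bi(q)}
        if not Ai:
            return None                              # D is empty (Theorem 1)
        A.append(Ai)
    return A

def alg2(N, D, A):
    """Backward sweep: one admissible sequence (d_0, ..., d_N)."""
    d = [None] * (N + 1)
    d[N] = next(iter(A[N]))                          # choose any d_N in A_N
    for i in range(N - 1, -1, -1):
        Ci = {p for (p, q) in D[i] if q == d[i + 1] and p in A[i]}
        d[i] = next(iter(Ci))                        # non-empty by Theorem 2
    return d

# Tiny chain with N = 2 and integer parameters (arbitrary data).
D = [{(0, 1), (1, 2), (2, 0)},    # D_0 in X_0 x X_1
     {(1, 5), (0, 7)}]            # D_1 in X_1 x X_2
A = alg1(2, D)
print(A)                           # [{0, 1, 2}, {0, 1}, {5, 7}]
d = alg2(2, D, A)
print(d)                           # a sequence with (d_i, d_{i+1}) in D_i
```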
Figure 2. Algorithm ALG2(A0, . . . , AN, D): graphical sketch of step 2.1.

2.2 Abstract schemes and set valued maps
We intend to reformulate algorithms ALG1(D) and ALG2(A0, . . . , AN, D) in terms of set valued maps, so as to enhance AS with general boundary conditions. The reader is referred to the classical books [2] and [4] for the basic definitions. Here and in the next subsection we suppose D ≠ ∅ (ALG1(D) gives J = N) and we restrict the generality of the previous definitions by assuming that the sets X0, . . . , XN are Banach spaces, that, for all i, Di ⊂ Xi × Xi+1 is closed and convex and that, for all i, Bi := Π^i_{i,i+1}(Di) ⊆ Xi is closed. We start by observing that ALG1(D) is nothing more than the description of a map which takes the initial admissible set, A0 := B0 ⊆ X0, and gives the last one, AN = AN(D) ⊆ XN. In order to express this fact more precisely, let δ0 ∈ Π^0_{0,1}(D0) and let

∆0 := {(d0, d1) ∈ D0 s.t. d0 = δ0},
from which we have Π00,1 (∆0 ) = δ0 , and Dδ0 = {di ∈ Xi s.t. (d0 , d1 ) ∈ ∆0 , (di , di+1 ) ∈ Di ; i = 1, . . . , N − 1} . Let us consider ALG1(Dδ0 ) (note that the algorithm starts with A 0 = δ0 ) and let AN = AN (Dδ0 ) be the corresponding last admissible subset. Then Φ : X0 → XN defined by ∅ if δ0 ∈ / A0 Φ(δ0 ) = (6) AN = AN (Dδ0 ) given by ALG1(Dδ0 ) if δ0 ∈ A0 is a set valued map, and, setting by definition ([2], [4]) Φ(A0 ) := Φ(δ0 ), δ0 ∈A0
we immediately have AN = AN(D) = Φ(A0), which furnishes the description of ALG1 in terms of set valued maps. The proof of the following result (a simple consequence of the closed graph theorem [2, p.42]) can be found in [7].

Theorem 3. Φ defined by (6) is upper semi-continuous (u.s.c.).

Now let β : X0 → XN be a single valued continuous function with continuous inverse β−1, such that the image β(U0) of any convex set U0 ⊂ X0 is convex and the inverse image β−1(UN) of any convex set UN ⊂ XN is convex. Assume that Φ(A0) ∩ β(A0) ≠ ∅ and consider the set valued function Γ : X0 → X0 defined by

Γ(δ0) := β−1(Φ(δ0) ∩ β(δ0)).   (7)
Note that β−1(Φ(δ0) ∩ β(δ0)) ⊆ A0, so that Γ(A0) ⊆ A0, where both A0 and Γ(A0) are convex and compact. Moreover, the continuity of β and β−1 together with Theorem 3 imply that Γ is u.s.c.; we can therefore apply Kakutani's fixed point theorem [2, p.85] and obtain the following result.

Theorem 4. There exists δ0 ∈ A0 such that δ0 ∈ Γ(δ0).
2.3 Boundary conditions
As anticipated in the introduction, we intend to keep the exposition in this section at an abstract level. However, in order to appreciate the results of this subsection, it is useful to recall that our goal is to construct spline functions or spline curves and that the domains Di are typically given by shape constraints imposed on each polynomial piece. In applications, spline curves are often required to satisfy boundary conditions: assigned end tangents and periodicity conditions are two typical examples.
Constrained Curves
We will consider two different problems. Given δ0, δN, find sequences (di ∈ Xi; i = 0, . . . , N) such that

(di, di+1) ∈ Di, i = 0, 1, . . . , N − 1, with d0 = δ0, dN = δN.   (8)
Given β as described in the previous subsection, find sequences (di ∈ Xi; i = 0, . . . , N) such that

(di, di+1) ∈ Di, i = 0, 1, . . . , N − 1, with dN = β(d0).   (9)
Conditions (8) and (9) are called, respectively, separable and non-separable. The two situations are quite different. Separable boundary conditions require only a reformulation of the first and last domain and no modification of algorithms ALG1 and ALG2. Using the same notation as in subsection 2.2, we define the new domains ∆0 := {(d0, d1) ∈ D0 s.t. d0 = δ0}, ∆N−1 := {(dN−1, dN) ∈ DN−1 s.t. dN = δN} and Dδ0,δN = {di ∈ Xi s.t. (d0, d1) ∈ ∆0, (dN−1, dN) ∈ ∆N−1, (di, di+1) ∈ Di; i = 1, . . . , N − 2}. A solution to (8) can be obtained by running the algorithms ALG1(Dδ0,δN) (note that in this case ALG1 returns A0 = {δ0}, AN = {δN}) and ALG2(A0, . . . , AN, Dδ0,δN). We remark that (8) is only the simplest example; in general, we call separable all the boundary conditions that can be expressed in terms of d0, d1 and dN−1, dN alone. The situation is much more complicated for non-separable conditions: β relates the first and the last parameter and destroys the sequential structure which is the basis of ALG1 and ALG2. Using the notation of subsection 2.2, we start with the following result [7].

Theorem 5. There is a sequence satisfying (9) if, and only if, Φ(A0) ∩ β(A0) ≠ ∅.

We now make use of Theorem 4, which says that there exists δ0 ∈ A0 such that δ0 ∈ β−1(Φ(δ0) ∩ β(δ0)), or, in other words, such that β(δ0) ∈ Φ(δ0). Recalling definition (6), a first procedure for computing a solution for (9) is sketched below.
Algorithm ALG3(D, β)
1. Use ALG1(D) to compute AN = Φ(A0)
2. If Φ(A0) ∩ β(A0) = ∅, Stop
3. Else, use ALG1(Dδ0) and some search strategy to find δ0 such that β(δ0) ∈ Φ(δ0)
4. Use ALG2(δ0, . . . , β(δ0), Dδ0) to compute a solution

We conclude this subsection by observing that, for space limitations, we omit here the description of more efficient techniques, which are reported in [7]; for a general purpose code which, using the set valued approach, produces shape preserving interpolating splines with boundary conditions, the reader is referred to [9] and [10].
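A minimal discrete rendering of ALG3 might look as follows. Here Φ(δ0) is computed by a forward sweep started from the single point δ0, and the "search strategy" of step 3 is a naive enumeration of A0; all names and data are illustrative assumptions, not the techniques of [7].

```python
# Hypothetical discrete sketch of ALG3: search for delta0 in A_0 with
# beta(delta0) in Phi(delta0), where Phi runs the forward sweep ALG1
# starting from the singleton {delta0}.  Data are illustrative.

def phi(domains, delta0):
    """Phi(delta0): last admissible set of ALG1 started from {delta0}."""
    A = {delta0}
    for D in domains:
        A = {b for (a, b) in D if a in A}
        if not A:
            return set()          # Phi(delta0) empty: delta0 inadmissible
    return A

def alg3(domains, A0, beta):
    for delta0 in sorted(A0):     # naive search strategy over A_0
        if beta(delta0) in phi(domains, delta0):
            return delta0         # non-separable BC d_N = beta(d_0) holds
    return None                   # Phi(A_0) and beta(A_0) do not intersect

# toy periodic-like condition d_N = d_0 (beta = identity)
D = [{(0, 1), (1, 2), (2, 0)}, {(1, 0), (2, 1), (0, 2)}]
delta0 = alg3(D, {0, 1, 2}, beta=lambda x: x)
```

Once such a δ0 is found, the solution itself is extracted by the backward sweep ALG2, exactly as in the separable case.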
2.4 Choice of an optimal solution
It is clear from algorithms ALG1 and ALG2 that, in general, there are infinitely many sequences (d0, d1, . . . , dN) satisfying the constraints, because in algorithm ALG2 the admissible sets Ci(di+1), defined in step 2.1, do not in general reduce to a single point. It is therefore natural to seek an optimal sequence, where the optimality criterion is given as the minimum of a suitable functional (classical examples will be given in the next section)

F : X0 × X1 × · · · × XN → R   (10)
which must be defined in terms of local contributions gi(di, di+1). Although several possibilities could be conceived for (10), for simplicity we will limit ourselves to functionals of the form

F(d0, d1, . . . , dN) := Σ_{i=0}^{N−1} gi(di, di+1)   (11)

or

F(d0, d1, . . . , dN) := max {gi(di, di+1), i = 0, 1, . . . , N − 1},   (12)
where gi : Xi × Xi+1 → R. Let δi ∈ Xi be given; for functionals of the form (11) let

Ψi(δi) := min_{d0,...,di−1} (g0(d0, d1) + · · · + gi−2(di−2, di−1) + gi−1(di−1, δi))

and for functionals of the form (12) let

Ψi(δi) := min_{d0,...,di−1} max {g0(d0, d1), . . . , gi−2(di−2, di−1), gi−1(di−1, δi)}.
Note that

min_{d0,d1,...,dN} F(d0, d1, . . . , dN) = min_{dN} ΨN(dN).
To solve the optimization problem we present here an approach based on dynamic programming (DP) [3]. It is well known that there are many algorithms which are more efficient than DP. However, DP is extremely flexible: many functionals and any kind of separable constraints can be processed with the same algorithmic structure and, unlike other optimization methods, constraints play here a positive role, limiting the size of the decision space. In this regard, we may observe that the functional recurrence relations of dynamic programming can be linked very efficiently with the constraints in Algorithm ALG2. We refer to [14] for full details on how to implement dynamic programming in Algorithm ALG2. A sketch of the algorithm for the form (11) is reported below, where Ψi stores the cost associated with the i-th stage and Ti stores the optimal policy. As a consequence, starting with the optimal dN, we obtain the optimal dN−1 := TN−1(dN) and so on.

Algorithm ALG2DP(A0, . . . , AN, D)
1. For any δ0 ∈ A0 set Ψ0(δ0) := 0
2. For i = 1, 2, . . . , N
   2.1 For any δi ∈ Ai compute Ci−1(δi) := Π_{i−1}^{i−1,i}(Di−1 ∩ {Ai−1 × {δi}})
   2.2 For any δi ∈ Ai compute
       Ψi(δi) := min_{δi−1 ∈ Ci−1(δi)} (gi−1(δi−1, δi) + Ψi−1(δi−1)) = gi−1(Ti−1(δi), δi) + Ψi−1(Ti−1(δi))
       and the corresponding optimizing value Ti−1(δi)
3. Compute dN such that ΨN(dN) = min_{δN ∈ AN} ΨN(δN)
4. For i = N − 1, . . . , 0
   4.1 di := Ti(di+1)
5. Stop.

We note that for functionals of the form (12) we need only change step 2.2 of the algorithm into

Ψi(δi) := min_{δi−1 ∈ Ci−1(δi)} max {gi−1(δi−1, δi), Ψi−1(δi−1)}
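For finite admissible sets, ALG2DP can be sketched directly; in the following hypothetical fragment the dictionaries `Psi` and `T` mirror the cost tables Ψi and the policy tables Ti of the algorithm, and the domains, admissible sets and local costs are illustrative stand-ins.

```python
# Hypothetical sketch of ALG2DP for the additive functional (11):
# dynamic programming over the admissible sets produced by ALG1.

def alg2dp(A, domains, g):
    """A: admissible sets A_0..A_N from ALG1; g[i]: local cost g_i."""
    N = len(domains)
    Psi = [{a: 0.0 for a in A[0]}]           # step 1: Psi_0 = 0 on A_0
    T = []                                    # optimal policy tables
    for i in range(1, N + 1):                 # step 2
        Psi_i, T_i = {}, {}
        for di in A[i]:
            # step 2.1: C_{i-1}(d_i), the admissible predecessors of d_i
            C = [a for (a, b) in domains[i-1] if b == di and a in A[i-1]]
            # step 2.2: minimize the DP recurrence over C_{i-1}(d_i)
            best = min(C, key=lambda a: g[i-1](a, di) + Psi[i-1][a])
            Psi_i[di] = g[i-1](best, di) + Psi[i-1][best]
            T_i[di] = best
        Psi.append(Psi_i); T.append(T_i)
    dN = min(A[N], key=lambda a: Psi[N][a])   # step 3
    seq = [dN]
    for i in range(N - 1, -1, -1):            # step 4: follow the policy
        seq.insert(0, T[i][seq[0]])
    return seq, Psi[N][dN]

D = [{(0, 1), (1, 2), (2, 2)}, {(1, 1), (2, 0)}]
A = [{0, 1, 2}, {1, 2}, {0, 1}]               # as produced by ALG1(D)
g = [lambda a, b: abs(a - b)] * 2             # g_i(d_i, d_{i+1}) = |d_i - d_{i+1}|
seq, cost = alg2dp(A, D, g)
```

The max-form (12) would only change the recurrence in step 2.2 from a sum to a maximum, leaving the rest of the structure untouched.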
2.5 Practical applications
Despite the abstract description of the previous results, in the following we will assume Xi = R^q, i = 0, 1, . . . , N, since all the practical applications have obviously been implemented in this case. The two-sweep scheme, given by algorithms ALG1 and ALG2 (or ALG2DP), has turned out to be an effective method for solving several problems, and its main attraction lies in its generality, being applicable to a wide range of problems. We refer to [8] and [26] and to the references therein for some applications of this idea. However, a closer inspection shows that the practical usage of this method has been confined for many years to the case Di ⊂ R × R, that is q = 1. This is due to the fact that in both ALG1 and ALG2 we have to compute the projection of intersections of subsets in a product space. More precisely, we may recall, for instance, that step 2.1 of ALG1, which is the kernel of all the modifications and improvements later developed, requires the computation of the following set: Ai := Π_i^{i−1,i}(Di−1 ∩ {Ai−1 × Bi}), and this leads, even in the simplest higher dimensional case, that is q = 2, to intersections and projections of arbitrary subsets of R² × R². Even in the case of linear inequality constraints (Di would then be a polytope of R⁴), the corresponding algorithm is extremely difficult to implement and has an unaffordable computational cost. Indeed, in R^q × R^q, the computational cost of set intersections and their projections is O(n^{q²−1} log n), where n is the number of polytope vertices; see [22] for full details. Thus, the practical application of abstract schemes has been restricted for many years to univariate problems, where we have only one parameter associated with every knot (two for every segment).
This limitation is rather restrictive: univariate problems suffice in general to model the interpolation of functions, but are not suitable for the interpolation of parametric curves, which can represent closed curves. Indeed, already the interpolation of parametric planar curves gives rise to constraint domains in R² × R². Recent research has therefore been devoted to developing a new theory and constructing new methods applicable to multivariate constrained problems (see for instance [13], [21]). Recently a new approach has been proposed (see [14], [15]). It is based on the observation that, if we consider unions of 2q-boxes (i.e. rectangular parallelepipeds with facets parallel to the coordinate hyperplanes), the computational cost of their intersections and projections is reduced to O(n log^{q²−1} n), [22].
The basic simple idea of the new method is to approximate the constraint domains Di with unions of 2q-boxes D̃i. We refer to [14] for full details on this new approach; for the sake of completeness we report here the main ideas on how this approximation is performed. For every domain Di, we suppose we are able to give an estimate of a lower and/or upper bound for each dimension. Then we may choose a step size h = (h1, h2, . . . , hq) and, starting from the lower (upper) bound, construct a multidimensional grid in R^q × R^q whose dimension is assigned. We then approximate every domain Di with the union of those boxes whose vertices are contained in Di. By construction we have D̃i ⊆ Di, and we may easily see that, for h → 0, meas(Di \ D̃i) → 0.
The next step consists in a further approximation. Once we have obtained the domains D̃i, we consider only the discrete values of the parameters (di, di+1) corresponding to the vertices of the considered boxes. This is equivalent to working with discrete domains, which we denote by D̄i: we select the points of the grid which are vertices of a 2q-box contained in D̃i. At the end of this process we obtain a sequence of domains D̄i which again approximate Di, with D̄i ⊆ Di. As in the continuous case, we may select an optimal solution by optimizing a suitable functional, using dynamic programming; the fact that the parameters di vary in discrete domains is well suited to applying dynamic programming in the minimization process. Regarding the convergence analysis, the following result holds (we refer to [14] for the proof).

Theorem 6. Let the domains D0, D1, . . . , DN−1, with Di ⊂ R^q × R^q, be given. Let D̄0, D̄1, . . . , D̄N−1 be the corresponding discrete domains obtained with a grid of step size h. Let (d*0, d*1, . . . , d*N) denote a solution in D which also maximizes a continuous functional F with a unique absolute maximum, and let (d̄*0, d̄*1, . . . , d̄*N) be a discrete counterpart. Then

lim_{hmax→0} (d̄*0, d̄*1, . . . , d̄*N) = (d*0, d*1, . . . , d*N), where hmax := max(h1, h2, . . . , hq).
We remark that, as the parameters (di, di+1) can assume only the discrete values corresponding to non-zero elements of the i-th logical matrix, the operations of intersection, projection, Cartesian product, etc. are easily performed on the matrix by the logical operators AND and OR, taking only some planes, putting together several planes and so on. This way of proceeding has proved to be very effective from the computational point of view, and it extends straightforwardly to domains in R^q × R^q (the number of planes is in general 2q).
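The logical-matrix representation can be sketched as follows for q = 1: a discrete domain D̄i is a boolean matrix over grid indices, and the forward-sweep projection of step 2.1 becomes an OR over rows combined with an AND against the current admissible set. The matrices below are illustrative stand-ins.

```python
# Hypothetical sketch of the "logical matrix" representation: a discrete
# domain bar-D_i over grid indices is a boolean matrix M[j][k], true when
# the pair (delta_j, delta_k) of grid values is admissible.  Intersection
# and projection then reduce to AND / OR over rows and columns.

def project_rows(M, admissible):
    """A_{i+1}[k] = OR over rows j of (M[j][k] AND admissible[j])."""
    n = len(M[0])
    return [any(M[j][k] and admissible[j] for j in range(len(M)))
            for k in range(n)]

# two stages on a 4-point grid
M0 = [[False, True,  False, False],
      [False, False, True,  True ],
      [False, False, False, True ],
      [False, False, False, False]]
M1 = [[True,  False, False, False],
      [False, True,  True,  False],
      [False, False, True,  False],
      [False, False, False, False]]

A0 = [True, True, True, True]     # A_0: every grid value admissible
A1 = project_rows(M0, A0)         # forward sweep, stage 0
A2 = project_rows(M1, A1)         # forward sweep, stage 1
```

For q > 1 the same idea applies plane by plane, with one boolean "plane" per coordinate pair, so the 2q planes mentioned above.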
Figure 3. (a) Every domain Di is replaced by a union of 2q-boxes; (b) only the values at the vertices of the considered boxes are taken.
3. Construction of constrained curves
The aim of this section is to show that the construction of any piecewise curve, subject to any kind of locally defined constraints and minimizing any functional of the form (10), can be done using the AS. Obviously, such strong statements refer to a theoretical level; practical implementations can sometimes be limited by technical bounds. Here, the term piecewise curve denotes either a function s = s(t), s : R → R, or s = s(t), s : R → R^n, n = 2, 3, whose pieces are defined on a knot sequence t0 < t1 < · · · < tN (sometimes, the case n = 1 will be referred to as the functional case). For simplicity we will not deal with Frenet continuity or geometric continuity (see, e.g., [19]) but only with analytical continuity, where the lower triangular connection matrix reduces to the identity matrix.
We start with a simple and well-known example. Given a sequence of interpolation points {(t0, f0), (t1, f1), . . . , (tN, fN)}, let λ̃i = (fi+1 − fi)/(ti+1 − ti) and assume that the data are non-decreasing and convex, that is λ̃0 ≥ 0 and λ̃i ≤ λ̃i+1. We want to construct a shape preserving interpolating cubic spline, that is a function s ∈ S_3^r, where S_3^r = {s ∈ C^r[t0, tN] s.t. s|[ti,ti+1] ∈ P3}, such that s(ti) = fi, i = 0, . . . , N, and s′(t) ≥ 0, s″(t) ≥ 0, t ∈ [t0, tN]. Let si = si(t) denote the i-th polynomial piece and let ρi,j = si(tj), ρ′i,j = s′i(tj), ρ″i,j = s″i(tj); j = i, i + 1.
Obviously si can be expressed in Hermite form using the values (ρi,i, ρ′i,i), (ρi,i+1, ρ′i,i+1); the interpolation conditions give ρi−1,i = ρi,i = fi, and the shape conditions restricted to [ti, ti+1] easily lead to s′i(t) ≥ 0, s″i(t) ≥ 0 iff (ρ′i,i, ρ′i,i+1) ∈ Ri, where

Ri = {(ξ, η) s.t. ξ ≥ 0; η ≥ 0; η ≤ −2ξ + 3λ̃i; η ≥ −(1/2)ξ + (3/2)λ̃i}.
The simplest case is r = 1. We can formally set di = (ρi−1,i, ρi,i, ρ′i−1,i, ρ′i,i), i = 0, 1, . . . , N, define the constraint domains as

Di = {(di, di+1) s.t. ρi−1,i = ρi,i = fi; ρ′i−1,i = ρ′i,i; ρi,i+1 = ρi+1,i+1 = fi+1; ρ′i,i+1 = ρ′i+1,i+1; (ρ′i,i, ρ′i,i+1) ∈ Ri},

state problems P1 and P2 and solve them with the aid of ALG1, ALG2 or ALG2DP or, in the case of non-separable boundary conditions, with the aid of ALG3. In terms of the notation of subsection 2.5 this is apparently a 4-dimensional problem; obviously the equality constraints imply a reduction in the number of variables, and the problem can be reformulated as a simple 1-dimensional one (see [9], where the present example can be seen as a particular case).
Consider now r = 0; the constraint domains become

Di = {(di, di+1) s.t. ρi−1,i = ρi,i = fi; ρi,i+1 = ρi+1,i+1 = fi+1; (ρ′i,i, ρ′i,i+1) ∈ Ri},

and again the usual formulation can be repeated. Finally, in order to deal with the case r = 2, we must observe that si can be expressed as the solution of the following overdetermined interpolation problem

si(tj) = ρi,j; s′i(tj) = ρ′i,j; s″i(tj) = ρ″i,j; j = i, i + 1,

where ρ″i,i, ρ″i,i+1 are subject to the constraint of being the second derivatives of a third degree polynomial expressed in Hermite form; in other words, we have relations of the form

ρ″i,i = φi,i(ρi,i, ρi,i+1, ρ′i,i, ρ′i,i+1); ρ″i,i+1 = φi,i+1(ρi,i, ρi,i+1, ρ′i,i, ρ′i,i+1).   (13)
In this case we set di = (ρi−1,i, ρi,i, ρ′i−1,i, ρ′i,i, ρ″i−1,i, ρ″i,i), i = 0, 1, . . . , N, define the constraint domains as

Di = {(di, di+1) s.t. ρi−1,i = ρi,i = fi; ρ′i−1,i = ρ′i,i; ρ″i−1,i = ρ″i,i; ρi,i+1 = ρi+1,i+1 = fi+1; ρ′i,i+1 = ρ′i+1,i+1; ρ″i,i+1 = ρ″i+1,i+1; (ρ′i,i, ρ′i,i+1) ∈ Ri; ρ″i,i = φi,i(ρi,i, ρi,i+1, ρ′i,i, ρ′i,i+1); ρ″i,i+1 = φi,i+1(ρi,i, ρi,i+1, ρ′i,i, ρ′i,i+1)},

and use again the theory developed in section 2. As we have already pointed out, there are sometimes practical limitations. In this last case we have Di ⊂ R⁶ × R⁶, which, using the interpolation and continuity conditions, can be transformed into a new Di ⊂ R² × R²; however, the relations (13) imply that for this new subset meas(Di) = 0, and therefore the discrete algorithms proposed in subsection 2.5 cannot be applied. Indeed, we have not so far been able to handle situations of the form Di ⊂ R^q × R^q, meas(Di) = 0, for q > 1.
The previous example can be extended to the construction of general curves. The basic idea is to express every component of each piece of the curve as the solution of a (possibly overdetermined) Hermite interpolation problem (and, if possible, to eliminate the common variables, e.g. ρi,i = ρi−1,i = fi in the example). When the continuity conditions exceed the number of parameters which define the piece of the curve (in the last case of our example 2(r + 1) = 2(2 + 1) > (3 + 1) = 4), we transfer the relations among the redundant variables into the domain of constraints. On the other hand, we have already put in evidence the high generality both of the constraint domains (2) and of the optimization functional (10). In the remaining part of this section we will briefly mention some recent applications to parametric interpolating curves. We also recall the older papers [9], [10] (where C¹ or C² shape preserving spline functions, interpolating a set of data and possibly subject to boundary conditions, are constructed) and [20] (where positive and co-monotone quadratic histosplines are considered).
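As a concrete illustration of the r = 1 functional case, the following hypothetical sketch runs the two sweeps over a uniform grid of candidate slopes (the discretization of subsection 2.5), using the region Ri defined above; the data, the grid and the particular choices made in the backward sweep are illustrative.

```python
# Hypothetical discrete sketch of the r = 1 shape-preserving example:
# after eliminating the equality constraints, d_i is the slope at t_i,
# and (xi, eta) in R_i makes the cubic piece nondecreasing and convex.

def in_R(xi, eta, lam):
    """(xi, eta) in R_i for slope lam = lambda~_i of the data."""
    return (xi >= 0 and eta >= 0 and
            eta <= -2 * xi + 3 * lam and eta >= -0.5 * xi + 1.5 * lam)

# convex, nondecreasing data: f = t^2 on t = 0, 1, 2, 3
t = [0.0, 1.0, 2.0, 3.0]
f = [x * x for x in t]
lam = [(f[i+1] - f[i]) / (t[i+1] - t[i]) for i in range(3)]  # 1, 3, 5

grid = [0.25 * j for j in range(29)]       # candidate slopes in [0, 7]
A = [set(range(len(grid)))]                # A_0: all grid slopes
for l in lam:                              # forward sweep (ALG1)
    A.append({k for k in range(len(grid)) for j in A[-1]
              if in_R(grid[j], grid[k], l)})

d = [max(A[-1])]                           # backward sweep (ALG2): any choice
for i in (2, 1, 0):
    C = {j for j in A[i] if in_R(grid[j], grid[d[0]], lam[i])}
    d.insert(0, min(C))
slopes = [grid[j] for j in d]              # one admissible slope sequence
```

Every consecutive pair of the resulting slopes lies in the corresponding Ri, so the cubic Hermite spline built from them interpolates the data and is nondecreasing and convex on the whole interval.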
3.1 C¹ cubic curves interpolating planar data
Given a set of data points and the corresponding parameter values (ti , fi ) , fi ∈ R2 , i = 0, 1, . . . , N,
we define the differences and the slopes of the data,

λi := fi+1 − fi, λ̃i := (fi+1 − fi)/ki, where ki := ti+1 − ti, i = 0, 1, . . . , N − 1;

we want to construct an interpolating spline curve s which is shape preserving and such that each component belongs to S_3^1. Each cubic piece can be defined using the Bézier control points bi,0, bi,1, bi,2, bi,3, with bi,0 = fi, bi,3 = fi+1 and

bi,1 = fi + (ki/3) Ti, bi,2 = fi+1 − (ki/3) Ti+1,

where the (unknown) tangent vectors can be expressed as

Ti := ui λ̃i−1 + vi λ̃i, i = 0, . . . , N.

Therefore, using the notation of section 2, we have di = (ui, vi) and q = 2. We now intend to define the sets Di in terms of shape constraints. For simplicity we will limit ourselves to imposing that

(λ̃i−1 ∧ λ̃i) · ((bi,1 − fi) ∧ (bi,2 − bi,1)) ≥ 0,
(λ̃i ∧ λ̃i+1) · ((bi,2 − bi,1) ∧ (fi+1 − bi,2)) ≥ 0   (14)
(for more complete constraints and for a geometric interpretation see [15]) and define the constraint domains as

Di = {(ui, vi, ui+1, vi+1) ∈ R² × R² s.t. (14) is satisfied}.

Note that in this case the boundaries of Di can be explicitly computed. Strictly speaking, relations (14) act on the shape of the control polygon; however, assuming that the hypotheses of [18] are satisfied, as usually occurs in practice, the shape of s is the same as that of its control polygon.
Usually, the additional goal for this kind of problem is to obtain a pleasant curve. The mathematical translation of this qualitative request is the minimization of some fairness functional. Let, as usual,

k(σ) := (s′(σ) ∧ s″(σ)) / ‖s′(σ)‖³, if s′(σ) ≠ 0,

denote the curvature vector, where

σ = σ(t) = ∫₀ᵗ |ṡ(τ)| dτ   (15)
is the arc length. We can consider, in the optimization process of ALG2DP, two possible functionals:

F(d0, d1, . . . , dN) := Σ_{i=0}^{N−1} ∫_{σ(ti)}^{σ(ti+1)} ‖k(σ)‖₂² dσ   (16)

or

F(d0, . . . , dN) := max_{0≤i≤N−1} max_{σ(ti)≤σ≤σ(ti+1)} ‖k(σ)‖∞;   (17)
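The local contribution to (16) for a single planar cubic Bézier piece can be approximated numerically; the following sketch (an illustrative reconstruction, not the implementation of [15]) integrates the squared planar curvature with respect to arc length by the midpoint rule.

```python
# Hypothetical numeric evaluation of the local contribution to the
# fairness functional (16) for one planar cubic Bezier piece.

def bezier_d1(b, t):
    """First derivative of a cubic Bezier with control points b."""
    return tuple(3 * ((b[1][c]-b[0][c])*(1-t)**2
                 + 2*(b[2][c]-b[1][c])*(1-t)*t
                 + (b[3][c]-b[2][c])*t**2) for c in (0, 1))

def bezier_d2(b, t):
    """Second derivative of a cubic Bezier."""
    return tuple(6 * ((b[2][c]-2*b[1][c]+b[0][c])*(1-t)
                 + (b[3][c]-2*b[2][c]+b[1][c])*t) for c in (0, 1))

def bending_energy(b, n=2000):
    """Midpoint rule for the integral of k(sigma)^2 d sigma on the piece."""
    total = 0.0
    for j in range(n):
        t = (j + 0.5) / n
        dx, dy = bezier_d1(b, t)
        ddx, ddy = bezier_d2(b, t)
        speed = (dx*dx + dy*dy) ** 0.5
        k = (dx*ddy - dy*ddx) / speed**3      # planar curvature
        total += k*k * speed / n              # d sigma = |s'(t)| dt
    return total

# control points of a quarter-circle-like arc of radius 1
c = 0.5522847498307936                        # common circle approximation
b = ((1, 0), (1, c), (c, 1), (0, 1))
E = bending_energy(b)   # approximately pi/2 for a near-unit-curvature arc
```

In an actual ALG2DP run, such a value would be the local cost gi(di, di+1) attached to the pair of tangent parameters defining the piece.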
in Fig. 4 a spline curve and its porcupine plot, that is the normalized curvature along the outer main normal, are shown.

Figure 4. C¹ interpolating spline curve with fairness criterion (17)
We conclude this example by observing that the space S_3^1 inherently produces discontinuous curvature; a possible alternative is to use algorithms ALG1 and ALG2DP for the construction of spline curves with components in S_6^2. Details and graphical examples can be found in [15].
3.2 Interpolating curves with maximal area
Given the data points as in the previous subsection, which in addition are supposed to be closed, that is f0 = fN, we want to construct a cubic, C¹ planar
curve which again satisfies the constraints (14). The difference is that we want the curve to bound a region of maximal area (a problem which is important in some engineering applications, for instance in the design of ship hulls); this goal can be immediately achieved by considering the functional

F(d0, d1, . . . , dN) := Σ_{i=0}^{N−1} ai,

where (see the graphical interpretation of Fig. 5)

ai = ∫_{ti}^{ti+1} ( sx(t) dy/dt − sy(t) dx/dt ) dt   (18)

Figure 5. Geometric interpretation of formula (18)
(where s = (sx, sy)) and minimizing −F using algorithm ALG2DP. A graphical example is reported in Fig. 6. We conclude this subsection by noting that C² spline curves of degree 6 can also be constructed in this case; for details we refer to [23].
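The terms ai of (18) are also easy to approximate numerically; the sketch below (an illustrative reconstruction, not the code of [23]) evaluates the line integral for cubic Bézier pieces and checks it on a closed, roughly circular control polygon, for which the sum is close to twice the enclosed area.

```python
# Hypothetical numeric evaluation of the area terms a_i of (18) for a
# closed curve made of cubic Bezier pieces; summed over a closed curve,
# the integral of (x y' - y x') dt equals twice the enclosed area.

def bezier(b, t):
    u = 1 - t
    return tuple(u**3*b[0][c] + 3*u*u*t*b[1][c]
                 + 3*u*t*t*b[2][c] + t**3*b[3][c] for c in (0, 1))

def bezier_d1(b, t):
    u = 1 - t
    return tuple(3*((b[1][c]-b[0][c])*u*u + 2*(b[2][c]-b[1][c])*u*t
                 + (b[3][c]-b[2][c])*t*t) for c in (0, 1))

def a_i(b, n=2000):
    """Midpoint rule for the line integral (18) over one piece."""
    total = 0.0
    for j in range(n):
        t = (j + 0.5) / n
        x, y = bezier(b, t)
        dx, dy = bezier_d1(b, t)
        total += (x*dy - y*dx) / n
    return total

# four quarter-circle-like arcs closing up into a unit-circle shape
c = 0.5522847498307936
arcs = [((1, 0), (1, c), (c, 1), (0, 1)),
        ((0, 1), (-c, 1), (-1, c), (-1, 0)),
        ((-1, 0), (-1, -c), (-c, -1), (0, -1)),
        ((0, -1), (c, -1), (1, -c), (1, 0))]
F = sum(a_i(b) for b in arcs)      # close to 2*pi for a unit-circle shape
```

Maximizing F (i.e. minimizing −F in ALG2DP) therefore maximizes the enclosed area while the constraints (14) preserve the shape of the control polygon.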
3.3 C¹ cubic curves interpolating spatial data
We are given a set of data points and the corresponding parameter values (ti, fi), fi ∈ R³, i = 0, 1, . . . , N, and define the differences λi and the slopes λ̃i of the data as in subsection 3.1. The polynomial pieces can again be defined in terms of the Bézier control points
Figure 6. Cubic C¹ interpolating spline curve with maximal area
where bi,0 = fi, bi,3 = fi+1 and

bi,1 = fi + (ki/3) Ti, bi,2 = fi+1 − (ki/3) Ti+1,

but in the present case

Ti = ui λi−1 + vi λi + wi Ni, i = 0, . . . , N,

where Ni = λi−1 ∧ λi is the discrete curvature; therefore, using the notation of section 2, we now have di = (ui, vi, wi) and q = 3. Since we are now dealing with spatial curves, the shape constraints must take the torsion into account. We introduce the discrete torsion

τi = det(λi−1, λi, λi+1) / (‖Ni‖ ‖Ni+1‖)
and define the shape constraints as

[(bi,1 − fi) ∧ (bi,2 − bi,1)] · Ni > 0, [(bi,1 − fi) ∧ (bi,2 − bi,1)] · Ni+1 > 0,
[(bi,2 − bi,1) ∧ (fi+1 − bi,2)] · Ni+1 > 0, [(bi,2 − bi,1) ∧ (fi+1 − bi,2)] · Ni > 0,
det(bi,1 − fi, bi,2 − bi,1, fi+1 − bi,2) τi ≥ 0.   (19)
For the sake of simplicity we suppose the data points make the above definitions consistent, that is Ni ≠ 0 for all i; for more general constraints and for a geometric interpretation of (19) we refer to [14]. The constraint domains are defined as

Di = {(ui, vi, wi, ui+1, vi+1, wi+1) ∈ R³ × R³ s.t. (19) is satisfied}.

Also in this case the boundaries of Di can be explicitly computed; moreover, assuming that the hypotheses of [18] are satisfied, the shape of s is the same as that of its control polygon. The plots reported in figures 7 and 8 are obtained using the functionals (16) and (17); we remark that other functionals, which minimize the variation of the torsion, could also be used. Other examples can be found in [14].
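The discrete quantities Ni and τi entering (19) can be computed directly from the data; the following fragment is a plausible rendering with illustrative sample points (a right-handed helix, for which a positive discrete torsion is expected).

```python
# Hypothetical computation of the discrete quantities used in the
# spatial shape constraints (19): N_i = lambda_{i-1} x lambda_i and the
# discrete torsion tau_i, from a sample of 3D data points.
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def det3(a, b, c):
    bc = cross(b, c)
    return a[0]*bc[0] + a[1]*bc[1] + a[2]*bc[2]   # a . (b x c)

def norm(a):
    return sum(x*x for x in a) ** 0.5

def discrete_torsion(f, i):
    """tau_i = det(lam_{i-1}, lam_i, lam_{i+1}) / (|N_i| |N_{i+1}|)."""
    lam = [sub(f[j+1], f[j]) for j in range(len(f) - 1)]
    Ni  = cross(lam[i-1], lam[i])        # discrete curvature at t_i
    Ni1 = cross(lam[i], lam[i+1])        # discrete curvature at t_{i+1}
    return det3(lam[i-1], lam[i], lam[i+1]) / (norm(Ni) * norm(Ni1))

# points sampled from a right-handed helix: positive torsion expected
f = [(math.cos(t), math.sin(t), 0.5 * t)
     for t in (0.0, 0.5, 1.0, 1.5, 2.0)]
tau = discrete_torsion(f, 1)
```

The sign constraint in the last line of (19) then requires the determinant of the three Bézier legs to agree in sign with this τi, so that each piece twists the same way as the data.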
Figure 7. C¹ interpolating spline curve with fairness criterion (16)
Figure 8. C¹ interpolating spline curve with fairness criterion (17)

4. A new application: geometric interpolation
In this section we will show a new application of AS. We consider again planar data as in subsection 3.1, fi ∈ R², i = 0, 1, . . . , N (with the nontrivial difference that the parameter values corresponding to the interpolation points are now not assigned), and we wish to construct a fair, cubic, C¹ interpolating spline curve. This problem is sometimes referred to as geometric interpolation. It is well known that in CAGD applications there is often no a priori relation between data and parameter values, and that there are several formulas for their computation; uniform, chord length and centripetal (see, e.g., [19]) are the most used. Their utility is obvious: the construction of spline curves interpolating the data at given knots gives a well posed linear problem, whereas the corresponding construction with free knots gives an overdetermined non-linear problem. On the other hand, the deep influence of the parameterization on the curve shape, and the advantage of a data dependent formula for the location of the knots, are also well known.
In the context of abstract schemes such a goal can easily be achieved. We use the notation of subsection 3.1 and observe that the i-th polynomial piece is defined by the control points bi,0, bi,1, bi,2, bi,3 and by the two knots ti, ti+1. Therefore we have di = (ti, ui, vi) and q = 3, and, recalling that ki := ti+1 − ti is used in the definition of the slopes λ̃i and consequently in the shape constraints (14), we obtain

Di = {(ti, ui, vi, ti+1, ui+1, vi+1) ∈ R³ × R³ s.t. (14) is satisfied}.   (20)

We have only to find a sequence (di ∈ R³; i = 0, . . . , N) such that (di, di+1) ∈ Di, i = 0, 1, . . . , N − 1, where the Di are defined by (20), and which minimizes the fairness functional (16) or (17). We omit the technical details here; we limit ourselves to saying that, in order to reduce the computational cost (mainly dependent on the cardinality of the discretized domains D̄i defined in subsection 2.5), we impose the heuristic restriction

0.8 k̄min ≤ ki ≤ 1.2 k̄max, i = 0, . . . , N − 1,

where k̄min = min0≤i≤N−1 k̄i, k̄max = max0≤i≤N−1 k̄i and the k̄i come from a given, fixed sequence of knots, typically given either by the centripetal or by the arc length formula. We conclude this section with two examples. Figures 9 and 10 show the porcupine plots given by the cubic, C¹ interpolating splines minimizing (17) with, respectively, the fixed arc length knots {ξ0, ξ1, . . . , ξN} and the free knots {t0, t1, . . . , tN}. This comparison is justified by the consideration that the arc length parameterization usually produces fair curves, so that the minimum of the fairness functional is not far from it. The knot sequences are {ξ0, . . . , ξ9} = {0, 2.2361, 5.2775, 6.6917, 8.2728, 8.7827, 11.5684, 12.7767, 14.9699, 17.0315}, {t0, . . . , t9} = {0, 2.8134, 5.3989, 6.8446, 8.9742, 9.5081, 12.5495, 14.4511, 17.4925, 19.3941}; the values of (17) are, respectively, 2.2782 and 2.0647.
The data set used in this example is taken from a benchmark of the FAIRSHAPE project; other graphical and numerical examples can be found in [1]. The graphical results of an analogous test are shown in figures 11 and 12, where the positive effect of the optimal choice of the knots clearly appears. For this test we have {ξ0 , . . . , ξ5 } = {0, 0.99, 2, 3, 3.99, 4.99} , {t0 , . . . , t5 } = {0, 0.8, 1.6, 2.4, 3.2, 4} ; the values of (17) are, respectively, 1.0760 and 1.0157.
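For reference, the fixed-knot sequences used as starting guesses can be computed with the standard formulas; the helper below (illustrative, following the usual definitions recalled in [19]) produces chord length knots for exponent 1 and centripetal knots for exponent 1/2.

```python
# Hypothetical helper for the standard fixed-knot formulas against which
# the free-knot search is compared: chord length and centripetal.

def knots(points, exponent):
    """t_0 = 0, t_{i+1} = t_i + |f_{i+1} - f_i|**exponent."""
    t = [0.0]
    for p, q in zip(points, points[1:]):
        dist = sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        t.append(t[-1] + dist ** exponent)
    return t

pts = [(0, 0), (1, 0), (1, 1), (4, 1)]
chord       = knots(pts, 1.0)   # chord length: t = [0, 1, 2, 5]
centripetal = knots(pts, 0.5)   # square roots of the chord lengths
```

In the free-knot setting, such a sequence supplies the interval lengths k̄i appearing in the heuristic restriction above.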
Figure 9. C¹ interpolating spline curve with fairness criterion (17); fixed knots

Figure 10. C¹ interpolating spline curve with fairness criterion (17); free knots
5. Concluding remarks
We have presented a short review of the basic ideas underlying the Abstract Schemes and proposed some examples with the simultaneous goals of showing
Figure 11. C¹ interpolating spline curve with fairness criterion (17); fixed knots

Figure 12. C¹ interpolating spline curve with fairness criterion (17); free knots
both the very general mathematical setting of the theory and the effectiveness of their practical use. In this connection, we remark that the results reported in the previous section have never appeared before. Due to the authors’ main interest, the applications so far developed concern shape preserving interpolation. Since the most popular methods are based
on tension methods (TM), a comparison with the algorithms based on AS is undoubtedly interesting. Consider the interval [ti, ti+1] and denote by τ = (t − ti)/ki the local variable. Limiting ourselves to standard TM, the idea is to substitute P3 = span{(1 − τ)³, P1, τ³} with TP3 = span{φi(τ; αi), P1, ψi(τ; αi+1)}, where φi, ψi are tension functions, for instance of the form

φi(τ; αi) = (1 − τ)^{αi}, ψi(τ; αi+1) = τ^{αi+1};

or

φi(τ; αi) = (1/αi²) (sinh(αi(1 − τ))/sinh(αi) − (1 − τ)),
ψi(τ; αi+1) = (1/αi+1²) (sinh(αi+1 τ)/sinh(αi+1) − τ).

The parameters αi, αi+1 are called tension parameters because, for any p ∈ P1,

lim_{αi,αi+1→∞} (c0 φi(τ; αi) + p(τ) + c1 ψi(τ; αi+1)) = p(τ), ∀τ ∈ [a, b] ⊂ (0, 1).
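The limit above can be checked numerically for the hyperbolic tension functions; in the hypothetical sketch below, the choice of p ∈ P1 and of the coefficients c0, c1 is arbitrary, and the blend approaches p(τ) as the tension parameter grows.

```python
# Hypothetical numeric check of the tension behaviour: the hyperbolic
# tension functions vanish as the tension parameters grow, so the blend
# c0*phi + p + c1*psi tends to its linear part p on (0, 1).
import math

def phi(tau, alpha):
    return (math.sinh(alpha * (1 - tau)) / math.sinh(alpha)
            - (1 - tau)) / alpha**2

def psi(tau, alpha):
    return (math.sinh(alpha * tau) / math.sinh(alpha) - tau) / alpha**2

def blend(tau, c0, c1, alpha, p=lambda x: 2 * x + 1):
    """c0*phi + p + c1*psi with equal tension on both ends."""
    return c0 * phi(tau, alpha) + p(tau) + c1 * psi(tau, alpha)

tau = 0.5                                    # p(0.5) = 2
low  = blend(tau, c0=10, c1=-7, alpha=2)     # visibly far from p(tau)
high = blend(tau, c0=10, c1=-7, alpha=60)    # close to p(tau)
```

This is exactly the mechanism by which TM pull the curve towards its piecewise linear interpolant, and also the source of the sharp curvature changes mentioned below.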
The most important features of TM are that they usually have a nice mathematical structure and that in some cases (e.g. for variable degree polynomial splines [17], [11]) we can reproduce almost the entire Bernstein-Bézier and B-spline theory, with a clear geometric interpretation of the constraints and of the shape parameters. As an immediate consequence, the construction of parametric curves and surfaces can be obtained by straightforward extensions. On the other hand, we need, for each choice of tension functions, an ad hoc mathematical theory; the curves do not conform to the standard of CAD/CAM systems (which are based on low degree NURBS) and, due to their intrinsic structure, when the curves tend to a piecewise linear form the curvature and the torsion show sharp and unpleasant changes.
The situation for AS is complementary. The advantage is that they are described by a general mathematical theory, which allows the use of low degree splines with optimal values of the fairness functionals. The main disadvantage is that their application to the construction of curves and surfaces requires specialized techniques and algorithms. In this connection, it is worthwhile to note that AS can also be used for the construction of surfaces subject to local constraints; the algorithms are based on the von Neumann alternating projection and can be applied if the sets Xi used in definition (1) are Hilbert spaces, see [16].
References

[1] Applegarth, I., P. D. Kaklis and S. Wahl (eds.): Benchmark Tests on the Generation of Fair Shapes subject to Constraints, B. G. Teubner, Stuttgart, 2000.
[2] Aubin, J.P. and A. Cellina: Differential Inclusions, Grundlehren der mathematischen Wissenschaften 264, Springer-Verlag, Berlin, 1984.
[3] Bellman, R. and S. Dreyfus: Applied Dynamic Programming, Princeton University Press, New York, 1962.
[4] Berge, C.: Espaces Topologiques. Fonctions Multivoques, Collection Universitaire de Mathématiques 3, Dunod, Paris, 1959.
[5] Costantini, P.: "On monotone and convex spline interpolation", Math. Comp., 46 (1986), 203–214.
[6] Costantini, P.: "An algorithm for computing shape-preserving interpolating splines of arbitrary degree", Journal of Computational and Applied Mathematics, 22 (1988), 89–136.
[7] Costantini, P.: "A general method for constrained curves with boundary conditions", in Multivariate Approximation: From CAGD to Wavelets, K. Jetter and F.I. Utreras (eds.), World Scientific Publishing Co., Inc., Singapore, 1993.
[8] Costantini, P.: "Abstract schemes for functional shape-preserving interpolation", in Advanced Course on FAIRSHAPE, J. Hoschek and P. Kaklis (eds.), B. G. Teubner, Stuttgart, 1996, 185–199.
[9] Costantini, P.: "Boundary-Valued Shape-Preserving Interpolating Splines", ACM Transactions on Mathematical Software, 23 (1997), 229–251.
[10] Costantini, P.: "Algorithm 770: BVSPIS - A Package for Computing Boundary-Valued Shape-Preserving Interpolating Splines", ACM Transactions on Mathematical Software, 23 (1997), 252–254.
[11] Costantini, P.: "Curve and surface construction using variable degree polynomial splines", CAGD, 17 (2000), 419–446.
[12] Costantini, P. and R. Morandi: "Monotone and convex cubic spline interpolation", CALCOLO, 21 (1984), 281–294.
[13] Costantini, P. and M. L. Sampoli: "Abstract Schemes and Constrained Curve Interpolation", in Designing and Creating Shape-Preserving Curves and Surfaces, H. Nowacki and P. Kaklis (eds.), B. G. Teubner, Stuttgart, 1998, 121–130.
[14] Costantini, P. and M. L. Sampoli: "Constrained Interpolation in R3 by Abstract Schemes", in Curve and Surface Design: Saint-Malo 2002, T. Lyche, M.L. Mazure and L.L. Schumaker (eds.), Nashboro Press, Nashville, 2003, 93–102.
[15] Costantini, P. and M. L. Sampoli: "A General Scheme for Shape Preserving Planar Interpolating Curves", BIT, 40 (2003), 297–317.
[16] Costantini, P. and M.L. Sampoli: "Abstract schemes and monotone surface interpolation", Università di Siena, Dipartimento di Matematica, Rapporto 398, Aprile 2000.
[17] Kaklis, P.D. and D.G. Pandelis: "Convexity preserving polynomial splines of non-uniform degree", IMA J. Numer. Anal., 10 (1990), 223–234.
[18] Goodman, T.N.T.: "Total positivity and the shape of curves", in Total Positivity and its Applications, M. Gasca and C. A. Micchelli (eds.), Kluwer, Dordrecht, 1996, 157–186.
[19] Hoschek, J. and D. Lasser: Fundamentals of Computer Aided Geometric Design, AK Peters Ltd., Wellesley, 1993.
[20] Morandi, R. and P. Costantini: "Piecewise monotone quadratic histosplines", SIAM Journal on Scientific and Statistical Computing, 10 (1989), 397–406.
66
APPLIED MATHEMATICS AND SCIENTIFIC COMPUTING
[21] Mulansky, B. and J. W. Schmidt: “Convex interval interpolation using three-term staircase algorithm”, Numerische Mathematik, 82 (1999), 313–337. [22] Preparata, F. P. and M. I. Shamos: Computational Geometry, Springer-Verlag, Berlin, New York, 1985. [23] Sampoli, M.L.: “Closed Spline Curves Bounding Maximal Area”, Rendiconti di Matematica, 64 (2004), 377–391. [24] Schmidt, J.W. and W. Heß: “Schwach verkoppelte Ungleichungssysteme und konvexe Spline-Interpolation”, El. Math., 39 (1984), 85–96. [25] Schmidt J. W.: “On shape-preserving spline interpolation: existence theorems and determination of optimal splines”, Approximation and Function Spaces, Vol. 22 PWN-Polish Scientific Publishers, Warsaw (1989). [26] Schmidt J. W.: “Staircase algorithm and construction of convex spline interpolants up to the continuity C 3 ”, Computer Math. Appl., 31 (1996), 67–79.
DMBVP FOR TENSION SPLINES

Boris I. Kvasov
Institute of Computational Technologies, Russian Academy of Sciences, Lavrentyev Avenue 6, 630090 Novosibirsk, Russia.
[email protected]
Abstract
This paper addresses a new approach to solving the problem of shape preserving spline interpolation. Based on the formulation of the latter problem as a differential multipoint boundary value problem for hyperbolic and biharmonic tension splines, we consider its finite-difference approximation. The resulting system of linear equations can be efficiently solved either by a direct method (Gaussian elimination) or by iterative methods (the successive over-relaxation (SOR) method and finite-difference schemes in fractional steps). We consider the basic computational aspects and illustrate the main advantages of this original approach.
Keywords:
Hyperbolic and biharmonic tension splines, differential multipoint boundary value problem, successive over-relaxation method, finite-difference schemes in fractional steps, shape preserving interpolation.
1.
Introduction
Spline theory is mainly grounded on two approaches: the algebraic one (where splines are understood as smooth piecewise functions, see, e.g., [19]) and the variational one (where splines are obtained via minimization of quadratic functionals with equality and/or inequality constraints, see, e.g., [13]). Although less common, a third approach [8], where splines are defined as the solutions of differential multipoint boundary value problems (DMBVP for short), has been considered in [3, 11, 12] and closely relates to the idea of polysplines [10]. Even though some of the important classes of splines can be obtained from all three schemes, specific features sometimes make the last one an important tool in practical settings. We want to illustrate this fact by the examples of interpolating hyperbolic and biharmonic tension splines. Introduced by Schweikert in 1966 [20], hyperbolic tension splines are still very popular [9, 15–18]. Biharmonic (thin plate) tension splines were considered earlier in [2, 4, 5, 7, 12].

Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 67–94. © 2005 Springer. Printed in the Netherlands.
For the numerical treatment of a DMBVP we replace the differential operator by its finite-difference approximation. This gives a linear system of difference equations with a matrix of special structure. The latter system can be efficiently treated by Gaussian elimination or by iterative methods such as the SOR method or finite-difference schemes in fractional steps [21]. We present numerical examples illustrating the main features of this approach. The content of this paper is as follows. In Section 2 we formulate the 1–D problem. In Section 3 we prove the existence of a mesh solution by constructing its extension as a discrete hyperbolic tension spline. Section 4, with its subsections, is devoted to the discussion of practical aspects and computational advantages of our discrete spline. In Sections 5 and 6 we formulate the 2–D problem and give its finite-difference approximation. The algorithm for the numerical solution of the 2–D problem is described in Section 7. Section 8 gives the SOR iterative method. In Section 9 we consider a finite-difference scheme in fractional steps and treat its approximation and stability properties. Finally, Section 10 provides graphical examples illustrating the main properties of discrete hyperbolic and biharmonic tension splines.
2.
1–D DMBVP. Finite-Difference Approximation

Let the data

    (x_i, f_i), \quad i = 0, \dots, N+1,    (1)

be given, where a = x_0 < x_1 < \cdots < x_{N+1} = b. Let us put h_i = x_{i+1} - x_i, i = 0, \dots, N.
Definition 2.1. An interpolating hyperbolic spline S with a set of tension parameters \{p_i \ge 0 \mid i = 0, \dots, N\} is a solution of the DMBVP

    \frac{d^4 S}{dx^4} - \left(\frac{p_i}{h_i}\right)^2 \frac{d^2 S}{dx^2} = 0 \quad \text{in each } (x_i, x_{i+1}), \quad i = 0, \dots, N,    (2)

    S \in C^2[a, b],    (3)

with the interpolation conditions

    S(x_i) = f_i, \quad i = 0, \dots, N+1,    (4)

and the end conditions

    S''(a) = f_0'' \quad \text{and} \quad S''(b) = f_{N+1}''.    (5)
We consider the classical end constraints (5) only for the sake of simplicity; other types of end conditions can also be used [11].
Let us now consider a discretized version of the previous DMBVP. Let n_i \in \mathbb{N}, i = 0, \dots, N, be given; we look for

    \{u_{ij}, \quad j = -1, \dots, n_i + 1, \quad i = 0, \dots, N\},

satisfying the difference equations

    \left(\Lambda_i^2 - \left(\frac{p_i}{h_i}\right)^2 \Lambda_i\right) u_{ij} = 0, \quad j = 1, \dots, n_i - 1, \quad i = 0, \dots, N,    (6)

where

    \Lambda_i u_{ij} = \frac{u_{i,j-1} - 2u_{ij} + u_{i,j+1}}{\tau_i^2}, \quad \tau_i = \frac{h_i}{n_i}.

The smoothness condition (3) is changed into

    u_{i-1,n_{i-1}} = u_{i0},
    \frac{u_{i-1,n_{i-1}+1} - u_{i-1,n_{i-1}-1}}{2\tau_{i-1}} = \frac{u_{i,1} - u_{i,-1}}{2\tau_i},    (7)
    \Lambda_{i-1} u_{i-1,n_{i-1}} = \Lambda_i u_{i,0}, \quad i = 1, \dots, N,

while conditions (4)–(5) take the form

    u_{i,0} = f_i, \quad i = 0, \dots, N, \qquad u_{N,n_N} = f_{N+1},
    \Lambda_0 u_{0,0} = f_0'', \qquad \Lambda_N u_{N,n_N} = f_{N+1}''.    (8)

Our discrete mesh solution will then be defined as

    \{u_{ij}, \quad j = 0, \dots, n_i, \quad i = 0, 1, \dots, N\}.    (9)
In the next section we prove the existence of the solution of the previous linear system while we postpone to Section 4 the comments on the practical computation of the mesh solution.
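For concreteness, the interior difference equation (6) can be sketched as follows (our illustration, not code from the paper); the grid size `n`, step `h` and tension `p` below are illustrative assumptions. Any linear grid function satisfies (6) exactly, since all second differences vanish:

```python
import numpy as np

def second_difference(u, tau):
    """Apply the operator Lambda: (u[j-1] - 2 u[j] + u[j+1]) / tau^2 at interior points."""
    return (u[:-2] - 2.0 * u[1:-1] + u[2:]) / tau**2

def dmbvp_residual(u, h, n, p):
    """Residual of (Lambda^2 - (p/h)^2 Lambda) u at the nodes j = 1, ..., n-1.

    u holds the n + 3 values u_{-1}, ..., u_{n+1} of one subinterval."""
    tau = h / n
    lam_u = second_difference(u, tau)        # defined at j = 0, ..., n
    lam2_u = second_difference(lam_u, tau)   # defined at j = 1, ..., n-1
    return lam2_u - (p / h) ** 2 * lam_u[1:-1]

h, n, p = 1.0, 8, 2.0
x = np.linspace(-h / n, h + h / n, n + 3)    # nodes j = -1, ..., n+1
u = 3.0 * x + 1.0                            # a linear grid function
print(np.allclose(dmbvp_residual(u, h, n, p), 0.0))  # zero residual for linear data
```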
3.
System Splitting and Mesh Solution Extension

In order to analyze the solution of system (6)–(8) we introduce the notation

    m_{ij} = \Lambda_i u_{ij}, \quad j = 0, \dots, n_i, \quad i = 0, \dots, N.    (10)

Then, on the interval [x_i, x_{i+1}], (6) takes the form

    \frac{m_{i,j-1} - 2m_{ij} + m_{i,j+1}}{\tau_i^2} - \left(\frac{p_i}{h_i}\right)^2 m_{ij} = 0, \quad j = 1, \dots, n_i - 1,
    m_{i0} = m_i, \quad m_{i,n_i} = m_{i+1},    (11)
where m_i and m_{i+1} are prescribed numbers. System (11) has a unique solution, which can be represented as

    m_{ij} = M_i(x_{ij}), \quad x_{ij} = x_i + j\tau_i, \quad j = 0, \dots, n_i,

with

    M_i(x) = m_i \frac{\sinh k_i(1 - t)}{\sinh k_i} + m_{i+1} \frac{\sinh k_i t}{\sinh k_i}, \quad t = \frac{x - x_i}{h_i},

and where the parameters k_i are the solutions of the transcendental equations

    2 n_i \sinh\frac{k_i}{2 n_i} = p_i, \quad p_i \ge 0,

that is,

    k_i = 2 n_i \ln\left(\frac{p_i}{2 n_i} + \sqrt{\left(\frac{p_i}{2 n_i}\right)^2 + 1}\,\right) \ge 0, \quad i = 0, \dots, N.
From (10) and from the interpolation conditions (8) we have

    u_{i0} = f_i,
    \frac{u_{i,j-1} - 2u_{ij} + u_{i,j+1}}{\tau_i^2} = m_{ij}, \quad j = 0, \dots, n_i,    (12)
    u_{i,n_i} = f_{i+1}.

For each sequence m_{ij}, j = 0, \dots, n_i, system (12) has a unique solution, which can be represented as

    u_{ij} = U_i(x_{ij}), \quad j = -1, \dots, n_i + 1,

where

    U_i(x) = f_i (1 - t) + f_{i+1} t + \varphi_i(1 - t) h_i^2 m_i + \varphi_i(t) h_i^2 m_{i+1},    (13)

with

    \varphi_i(t) = \frac{\sinh(k_i t) - t \sinh k_i}{p_i^2 \sinh k_i}.
In order to solve system (6)–(8), we only need to determine the values m_i, i = 0, \dots, N + 1, so that the smoothness conditions (7) and the end conditions in (8) are satisfied. From (12)–(13), conditions (7) can be rewritten as

    U_{i-1}(x_i) = U_i(x_i),
    \frac{U_{i-1}(x_i + \tau_{i-1}) - U_{i-1}(x_i - \tau_{i-1})}{2\tau_{i-1}} = \frac{U_i(x_i + \tau_i) - U_i(x_i - \tau_i)}{2\tau_i},    (14)
    \Lambda_{i-1} U_{i-1}(x_i) = \Lambda_i U_i(x_i),

where

    \Lambda_j U_j(x) = \frac{U_j(x + \tau_j) - 2U_j(x) + U_j(x - \tau_j)}{\tau_j^2}, \quad x \in [x_j, x_{j+1}].

Then, from (10)–(11) and (12), the first and the third equalities in (14) are immediately satisfied, while, using (13) and the end conditions in (8), the second equality provides the following linear system with a tridiagonal matrix for the unknown values m_i:

    m_0 = f_0'',
    \alpha_{i-1} h_{i-1} m_{i-1} + (\beta_{i-1} h_{i-1} + \beta_i h_i) m_i + \alpha_i h_i m_{i+1} = d_i, \quad i = 1, \dots, N,    (15)
    m_{N+1} = f_{N+1}'',

where

    d_i = \frac{f_{i+1} - f_i}{h_i} - \frac{f_i - f_{i-1}}{h_{i-1}},

    \alpha_i = -\frac{\varphi_i\!\left(\frac{1}{n_i}\right) - \varphi_i\!\left(-\frac{1}{n_i}\right)}{2/n_i} = -\frac{n_i \sinh\frac{k_i}{n_i} - \sinh k_i}{p_i^2 \sinh k_i},

    \beta_i = \frac{\varphi_i\!\left(1 + \frac{1}{n_i}\right) - \varphi_i\!\left(1 - \frac{1}{n_i}\right)}{2/n_i} = \frac{n_i \cosh k_i \sinh\frac{k_i}{n_i} - \sinh k_i}{p_i^2 \sinh k_i}.

Expanding the hyperbolic functions in the above expressions as power series we obtain

    \beta_i \ge 2\alpha_i > 0, \quad i = 0, \dots, N, \quad \text{for all } n_i > 1, \ p_i \ge 0.

Therefore, the system (15) is diagonally dominant and has a unique solution. We can now conclude that system (6)–(8) has a unique solution, which can be represented as

    U_i(x_{ij}), \quad j = -1, \dots, n_i + 1, \quad i = 0, \dots, N,

whenever the constants m_i are the solution of (15). Let us put

    U(x) := U_i(x), \quad x \in [x_i, x_{i+1}], \quad i = 0, 1, \dots, N.    (16)

Due to the previous construction we will refer to U as the discrete hyperbolic tension spline interpolating the data (1). We observe that we recover the result of [14] for discrete cubic splines, since

    \lim_{p_i \to 0} \alpha_i = \frac{1}{6}\left(1 - \frac{1}{n_i^2}\right), \quad \lim_{p_i \to 0} \beta_i = \frac{1}{6}\left(2 + \frac{1}{n_i^2}\right), \quad \lim_{p_i \to 0} \varphi_i(t) = \frac{t(t^2 - 1)}{6}.    (17)
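The coefficients \alpha_i, \beta_i can be evaluated directly from the formulas above. The following sketch (ours, with illustrative values of p_i and n_i) checks the diagonal-dominance inequality and the agreement with the limits (17); note that the naive evaluation loses digits to cancellation as p_i \to 0, which is exactly why [17] treats small k_i separately:

```python
import math

def tension_coefficients(p, n):
    """Coefficients alpha_i, beta_i of system (15) for tension p >= 0 and n subintervals.

    k solves 2 n sinh(k / (2n)) = p, i.e. k = 2 n asinh(p / (2n))."""
    if p == 0.0:                        # use the limit values (17) directly
        return (1.0 - 1.0 / n**2) / 6.0, (2.0 + 1.0 / n**2) / 6.0
    k = 2.0 * n * math.asinh(p / (2.0 * n))
    s, sn = math.sinh(k), math.sinh(k / n)
    alpha = -(n * sn - s) / (p**2 * s)
    beta = (n * math.cosh(k) * sn - s) / (p**2 * s)
    return alpha, beta

a, b = tension_coefficients(5.0, 30)
print(b >= 2.0 * a > 0.0)               # diagonal dominance of (15)

# For moderate tension the values agree with the limits (17); for much
# smaller p the cancellation in (n sinh(k/n) - sinh k) degrades accuracy.
a0, b0 = tension_coefficients(1e-3, 30)
al, bl = tension_coefficients(0.0, 30)
print(abs(a0 - al) < 1e-6 and abs(b0 - bl) < 1e-6)
```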
4.
Computational Aspects
The aim of this section is to investigate the practical aspects related to the numerical evaluation of the mesh solution defined in (9). A standard approach [17] consists of solving the tridiagonal system (15) and then evaluating (13) at the mesh points, as is usually done for the evaluation of continuous hyperbolic splines. At first sight, this approach seems preferable because of its modest computational cost and the good classical estimates for the condition number of the matrix in (15). However, it should be observed that, as in the continuous case, we have to perform a large number of numerical evaluations of hyperbolic functions of the form sinh(k_i t) and cosh(k_i t), both to define system (15) and to tabulate the functions (13). This is a delicate task, because of cancellation errors (when k_i \to 0) and overflow problems (when k_i \to \infty). A stable computation of the hyperbolic functions was proposed in [17], where different formulas for the cases k_i \le 0.5 and k_i > 0.5 were considered and a specialized polynomial approximation for sinh(\cdot) was used. We note, however, that this approach is the only one possible if we want a continuous extension of the discrete solution beyond the mesh points. In contrast, the discretized structure of our construction provides a much cheaper and simpler way to compute the mesh solution (9). This can be achieved either by following the system splitting approach presented in Section 3, or by a direct computation of the solution of the linear system (6)–(8). As for the system splitting approach, the following algorithm can be considered.

Step 1. Solve the tridiagonal system (15) for m_i, i = 1, \dots, N.

Step 2. Solve the N + 1 tridiagonal systems (11) for m_{ij}, j = 1, \dots, n_i - 1, i = 0, \dots, N.

Step 3. Solve the N + 1 tridiagonal systems (12) for u_{ij}, j = 1, \dots, n_i - 1, i = 0, \dots, N.

In this algorithm, hyperbolic functions need only be computed in Step 1. Furthermore, the solution of any system (11) or (12) requires 8q arithmetic operations, namely 3q additions, 3q multiplications, and 2q divisions [22], where q is the number of unknowns, and is thus substantially cheaper than direct computation by formula (13).
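Each of the three steps is a tridiagonal solve. A minimal sketch of such a solver (the standard Thomas algorithm, not the paper's code; the data values below are illustrative) is:

```python
def solve_tridiagonal(a, b, c, d):
    """Solve a[i] x[i-1] + b[i] x[i] + c[i] x[i+1] = d[i] (a[0], c[-1] unused).

    Standard Thomas algorithm: O(q) operations for q unknowns, no pivoting,
    which is safe here because systems (11), (12), (15) are diagonally dominant."""
    q = len(d)
    cp, dp = [0.0] * q, [0.0] * q
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, q):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * q
    x[-1] = dp[-1]
    for i in range(q - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# System (12) on one subinterval with all m_ij = 0 reproduces linear interpolation:
n, f_i, f_ip1 = 5, 1.0, 3.0
a = [1.0] * (n - 1); b = [-2.0] * (n - 1); c = [1.0] * (n - 1)
d = [0.0] * (n - 1)
d[0] -= f_i          # known boundary values moved to the right-hand side
d[-1] -= f_ip1
u = solve_tridiagonal(a, b, c, d)
print(all(abs(u[j] - (f_i + (f_ip1 - f_i) * (j + 1) / n)) < 1e-12 for j in range(n - 1)))
```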
Steps 2 and 3 can be replaced by a direct splitting of the system (6)–(8) into N + 1 systems with pentadiagonal matrices

    \left(\Lambda_i^2 - \left(\frac{p_i}{h_i}\right)^2 \Lambda_i\right) u_{i,j} = 0, \quad j = 1, \dots, n_i - 1, \quad i = 0, \dots, N,
    u_{i,0} = f_i, \quad \Lambda_i u_{i,0} = M_i, \quad u_{i,n_i} = f_{i+1}, \quad \Lambda_i u_{i,n_i} = M_{i+1}.    (18)

Also in this case, the calculations for Steps 2 and 3 or for system (18) can be tailored to a multiprocessor computer system. Let us now discuss the direct solution of system (6)–(8), which, of course, involves only rational computations on the given data. To do so, in the next subsections we investigate in some detail the structure of this system.
4.1
The Pentadiagonal System
Eliminating the unknowns \{u_{i,-1}, i = 1, \dots, N\} and \{u_{i,n_i+1}, i = 0, \dots, N-1\} from (7), determining the values of the mesh solution at the data sites x_i by the interpolation conditions, and eliminating u_{0,-1}, u_{N,n_N+1} from the end conditions (8), we can collect (6)–(8) into the system

    A u = b,    (19)

where u = (u_{01}, \dots, u_{0,n_0-1}, u_{11}, \dots, u_{1,n_1-1}, \dots, u_{N1}, \dots, u_{N,n_N-1})^T and A is the pentadiagonal matrix

    A = \begin{bmatrix}
    b_0 - 1 & a_0 & 1 & & & & & & \\
    a_0 & b_0 & a_0 & 1 & & & & & \\
    1 & a_0 & b_0 & a_0 & 1 & & & & \\
    & & & \ddots & & & & & \\
    & 1 & a_0 & b_0 & a_0 & & & & \\
    & & 1 & a_0 & \eta_{0,n_0-1} & \delta_{0,n_0-1} & & & \\
    & & & \delta_{1,1} & \eta_{1,1} & a_1 & 1 & & \\
    & & & & a_1 & b_1 & a_1 & 1 & \\
    & & & & & \ddots & & & \\
    & & & & 1 & a_N & b_N & a_N & 1 \\
    & & & & & 1 & a_N & b_N & a_N \\
    & & & & & & 1 & a_N & b_N - 1
    \end{bmatrix}

with

    a_i = -(4 + \omega_i), \quad b_i = 6 + 2\omega_i, \quad \omega_i = \left(\frac{p_i}{n_i}\right)^2, \quad i = 0, 1, \dots, N;

    \eta_{i-1,n_{i-1}-1} = 6 + 2\omega_{i-1} + \frac{1 - \rho_i}{1 + \rho_i}, \quad \eta_{i,1} = 6 + 2\omega_i + \frac{\rho_i - 1}{\rho_i + 1},
    \delta_{i-1,n_{i-1}-1} = \frac{2}{\rho_i(\rho_i + 1)}, \quad \delta_{i,1} = \frac{2\rho_i^2}{\rho_i + 1}, \quad \rho_i = \frac{\tau_i}{\tau_{i-1}}, \quad i = 1, 2, \dots, N;

and

    b = \left(-(a_0 + 2) f_0 - \tau_0^2 f_0'', \ -f_0, \ 0, \dots, 0, \ -f_1, \ -\gamma_{0,n_0-1} f_1, \ -\gamma_{1,1} f_1, \ -f_1, \ 0, \dots, 0, \ -f_{N+1}, \ -(a_N + 2) f_{N+1} - \tau_N^2 f_{N+1}''\right)^T,

with

    \gamma_{i-1,n_{i-1}-1} = -\left(4 + \omega_{i-1} + \frac{2(1 - \rho_i)}{\rho_i}\right), \quad \gamma_{i,1} = -(4 + \omega_i + 2(\rho_i - 1)), \quad i = 1, 2, \dots, N.
4.2
The Uniform Case
From the practical point of view it is interesting to examine the structure of A when dealing with a uniform mesh, that is, \tau_i = \tau. In this case it is immediately seen that A is symmetric. In addition, following [14], we observe that A = C + D, where both C and D are symmetric block-diagonal matrices. More specifically,

    C = \operatorname{diag}(C_0, C_1, \dots, C_N), \quad C_i = B_i^2 - \omega_i B_i,

where B_i is the (n_i - 1) \times (n_i - 1) tridiagonal matrix

    B_i = \begin{bmatrix}
    -2 & 1 & & & \\
    1 & -2 & 1 & & \\
    & \ddots & \ddots & \ddots & \\
    & & 1 & -2 & 1 \\
    & & & 1 & -2
    \end{bmatrix},

and D is zero everywhere except for N two-by-two blocks

    \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}

placed on the diagonal, one straddling each interface between consecutive subintervals.
The eigenvalues of C, \lambda_k(C), are the collection of the eigenvalues of the C_i. Since (see [14])

    \lambda_j(B_i) = -2\left(1 - \cos\frac{j\pi}{n_i}\right), \quad j = 1, \dots, n_i - 1,

we have

    \lambda_j(C_i) = 4\left(1 - \cos\frac{j\pi}{n_i}\right)^2 + 2\omega_i\left(1 - \cos\frac{j\pi}{n_i}\right), \quad j = 1, \dots, n_i - 1.

In addition, the eigenvalues of D are 0 and 2; thus we deduce from a corollary of the Courant–Fischer theorem [6] that the eigenvalues of A satisfy

    \lambda_k(A) \ge \min_k \lambda_k(C) = \min_{i,j} \lambda_j(C_i) = \min_i \left[4\left(1 - \cos\frac{\pi}{n_i}\right)^2 + 2\omega_i\left(1 - \cos\frac{\pi}{n_i}\right)\right].

Hence A is positive definite, and we directly obtain that the pentadiagonal linear system has a unique solution. In addition, by Gershgorin's theorem,

    \lambda_k(A) \le \max_i\,[16 + 4\omega_i].
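The eigenvalue formulas above are easy to verify numerically; in the following sketch (ours, with illustrative n_i and \omega_i) the closed-form eigenvalues of B_i and C_i = B_i^2 - \omega_i B_i are compared with those computed directly:

```python
import numpy as np

def block_B(n):
    """(n-1) x (n-1) second-difference matrix B_i = tridiag(1, -2, 1)."""
    return (np.diag(-2.0 * np.ones(n - 1))
            + np.diag(np.ones(n - 2), 1) + np.diag(np.ones(n - 2), -1))

n, omega = 10, 0.5                      # illustrative n_i and omega_i = (p_i / n_i)^2
B = block_B(n)
C_i = B @ B - omega * B                 # C_i = B_i^2 - omega_i B_i

j = np.arange(1, n)
lam_B = -2.0 * (1.0 - np.cos(j * np.pi / n))
lam_C = 4.0 * (1.0 - np.cos(j * np.pi / n)) ** 2 + 2.0 * omega * (1.0 - np.cos(j * np.pi / n))
print(np.allclose(np.sort(lam_B), np.sort(np.linalg.eigvalsh(B))))
print(np.allclose(np.sort(lam_C), np.sort(np.linalg.eigvalsh(C_i))))
print(np.linalg.eigvalsh(C_i).min() > 0)   # each C_i, hence A = C + D, is positive definite
```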
Then we obtain the following upper bound for the condition number of A, which is independent of the number of data points N + 2 and recovers the result presented in [14] for the limit case p_i = 0, i = 0, \dots, N:

    \|A\|_2 \|A^{-1}\|_2 \le \frac{\max_i \left[16 + 4\left(\frac{p_i}{n_i}\right)^2\right]}{\min_i \left[4\left(1 - \cos\frac{\pi}{n_i}\right)^2 + 2\left(\frac{p_i}{n_i}\right)^2 \left(1 - \cos\frac{\pi}{n_i}\right)\right]} \approx \frac{\max_i \left[16 + 4\left(\frac{p_i}{n_i}\right)^2\right]}{\min_i \frac{1}{n_i^4}\left[\pi^4 + (\pi p_i)^2\right]}.    (20)
Summarizing, in the particular but important uniform case we can compute the mesh solution by solving a symmetric, pentadiagonal, positive definite system, and therefore we can use specialized algorithms with a computational cost of 17q arithmetic operations, namely 7q additions, 7q multiplications, and 3q divisions [22], where q is the number of unknowns. Moreover, since the upper bound (20) for the condition number of the matrix A does not depend on the number of interpolation points, such methods can be used with some confidence. In the general case of a non-uniform mesh, the matrix A is no longer symmetric, and an analysis of its condition number cannot be carried out analytically. However, several numerical experiments have shown that the condition number is not influenced by the non-symmetric structure, but does depend on the maximum number of grid points in each subinterval, exactly as in the symmetric case. In other words, symmetric and nonsymmetric matrices of the same dimension, produced by difference equations with the same largest n_i, have very close condition numbers. Non-uniform discrete hyperbolic tension splines have in fact been used for the graphical tests of Section 10.
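For a single subinterval (N = 0) there are no interface blocks, so A reduces to B^2 - \omega B, and the bound (20) can be checked directly (our sketch; n and p are illustrative):

```python
import numpy as np

# Single-subinterval uniform case (N = 0): A = B^2 - omega B, B = tridiag(1, -2, 1).
n, p = 20, 3.0
omega = (p / n) ** 2
B = (np.diag(-2.0 * np.ones(n - 1))
     + np.diag(np.ones(n - 2), 1) + np.diag(np.ones(n - 2), -1))
A = B @ B - omega * B

cond = np.linalg.cond(A, 2)
# First fraction of (20): Gershgorin bound over the smallest eigenvalue of C.
bound = (16.0 + 4.0 * omega) / (4.0 * (1.0 - np.cos(np.pi / n)) ** 2
                                + 2.0 * omega * (1.0 - np.cos(np.pi / n)))
print(cond <= bound * (1 + 1e-9))   # the computed condition number respects (20)
```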
5.
2–D DMBVP. Problem Formulation

Let us consider a rectangular domain \overline{\Omega} = \Omega \cup \Gamma, where \Omega = \{(x, y) \mid a < x < b, \ c < y < d\} and \Gamma is the boundary of \Omega. We consider on \overline{\Omega} a mesh of lines \Delta = \Delta_x \times \Delta_y with

    \Delta_x : a = x_0 < x_1 < \cdots < x_{N+1} = b,
    \Delta_y : c = y_0 < y_1 < \cdots < y_{M+1} = d,

which divides the domain \Omega into the rectangles \overline{\Omega}_{ij} = \Omega_{ij} \cup \Gamma_{ij}, where \Omega_{ij} = \{(x, y) \mid x \in (x_i, x_{i+1}), \ y \in (y_j, y_{j+1})\} and \Gamma_{ij} is the boundary of \Omega_{ij}, i = 0, \dots, N, j = 0, \dots, M.
Let us associate to the mesh \Delta the data

    (x_i, y_j, f_{ij}), \quad i = 0, \dots, N+1, \quad j = 0, \dots, M+1,
    f_{ij}^{(2,0)}, \quad i = 0, N+1, \quad j = 0, \dots, M+1,
    f_{ij}^{(0,2)}, \quad i = 0, \dots, N+1, \quad j = 0, M+1,
    f_{ij}^{(2,2)}, \quad i = 0, N+1, \quad j = 0, M+1,

where

    f_{ij}^{(r,s)} = \frac{\partial^{r+s} f(x_i, y_j)}{\partial x^r \partial y^s}, \quad r, s = 0, 2.
We denote by C^{2,2}[\overline{\Omega}] the set of all continuous functions f on \overline{\Omega} having continuous partial and mixed derivatives up to order 2 in the x and y variables. We call the problem of searching for a function S \in C^{2,2}[\overline{\Omega}] such that S(x_i, y_j) = f_{ij}, i = 0, \dots, N+1, j = 0, \dots, M+1, and such that S preserves the shape of the initial data, the shape preserving interpolation problem. This means that wherever the data increases (decreases) monotonically, S has the same behaviour, and S is convex (concave) over intervals where the data is convex (concave). Evidently, the solution of the shape preserving interpolation problem is not unique. We look for a solution of this problem in the form of a biharmonic tension spline.

Definition 5.1. An interpolating biharmonic spline S with two sets of tension parameters \{0 \le p_{ij} < \infty \mid i = 0, \dots, N, \ j = 0, \dots, M+1\} and \{0 \le q_{ij} < \infty \mid i = 0, \dots, N+1, \ j = 0, \dots, M\} is a solution of the DMBVP

    \frac{\partial^4 S}{\partial x^4} + 2 \frac{\partial^4 S}{\partial x^2 \partial y^2} + \frac{\partial^4 S}{\partial y^4} - \left(\frac{\bar p_{ij}}{h_i}\right)^2 \frac{\partial^2 S}{\partial x^2} - \left(\frac{\bar q_{ij}}{l_j}\right)^2 \frac{\partial^2 S}{\partial y^2} = 0    (21)

in each \Omega_{ij}, where

    h_i = x_{i+1} - x_i, \quad l_j = y_{j+1} - y_j, \quad \bar p_{ij} = \max(p_{ij}, p_{i,j+1}), \quad \bar q_{ij} = \max(q_{ij}, q_{i+1,j}), \quad i = 0, \dots, N, \ j = 0, \dots, M,
    \frac{\partial^4 S}{\partial x^4} - \left(\frac{p_{ij}}{h_i}\right)^2 \frac{\partial^2 S}{\partial x^2} = 0, \quad x \in (x_i, x_{i+1}), \ y = y_j, \quad i = 0, \dots, N, \ j = 0, \dots, M+1,    (22)

    \frac{\partial^4 S}{\partial y^4} - \left(\frac{q_{ij}}{l_j}\right)^2 \frac{\partial^2 S}{\partial y^2} = 0, \quad y \in (y_j, y_{j+1}), \ x = x_i, \quad j = 0, \dots, M, \ i = 0, \dots, N+1,    (23)

    S \in C^{2,2}[\overline{\Omega}],    (24)

with the interpolation conditions

    S(x_i, y_j) = f_{ij}, \quad i = 0, \dots, N+1, \quad j = 0, \dots, M+1,    (25)

and the boundary conditions

    S^{(2,0)}(x_i, y_j) = f_{ij}^{(2,0)}, \quad i = 0, N+1, \quad j = 0, \dots, M+1,
    S^{(0,2)}(x_i, y_j) = f_{ij}^{(0,2)}, \quad i = 0, \dots, N+1, \quad j = 0, M+1,    (26)
    S^{(2,2)}(x_i, y_j) = f_{ij}^{(2,2)}, \quad i = 0, N+1, \quad j = 0, M+1.

By this definition an interpolating biharmonic tension spline S is a set of interpolating biharmonic tension functions which satisfy (21), match up smoothly, and form a twice continuously differentiable function in both the x and y variables:

    S^{(r,0)}(x_i - 0, y) = S^{(r,0)}(x_i + 0, y), \quad r = 0, 1, 2, \quad i = 1, \dots, N,
    S^{(0,s)}(x, y_j - 0) = S^{(0,s)}(x, y_j + 0), \quad s = 0, 1, 2, \quad j = 1, \dots, M.    (27)
C^2 smoothness of the interpolating hyperbolic tension splines in (22) and (23) was proven in [3, 11]. The computation of the interpolating biharmonic tension spline thus reduces to the computation of infinitely many proper one-dimensional hyperbolic tension splines. For all p_{ij}, q_{ij} \to 0 the solution of (21)–(26) becomes a biharmonic spline [4], while in the limiting case p_{ij}, q_{ij} \to \infty in the rectangle \overline{\Omega}_{ij} the spline S turns into a function that is linear in x and in y separately, and hence obviously preserves the shape properties of the data on \overline{\Omega}_{ij}. By increasing one or more of the tension parameters, the surface is pulled towards the shape inherent in the data while at the same time keeping its smoothness. Thus, the DMBVP gives an approach to the solution of the shape preserving interpolation problem.
6.
Finite–Difference Approximation of DMBVP
For practical purposes, it is often necessary to know the values of the solution S of a DMBVP only over a prescribed grid instead of its global analytic expression. In this section, we consider a finite-difference approximation of the DMBVP. This provides a linear system whose solution is called a mesh solution. Note that the mesh solution is not a tabulation of S, but is expected to approximate it. Let n_i, m_j \in \mathbb{N}, i = 0, \dots, N, j = 0, \dots, M, be given such that

    \frac{h_i}{n_i} = \frac{l_j}{m_j} = h.

We are looking for a mesh function

    \{u_{ik;jl} \mid k = -1, \dots, n_i + 1, \ i = 0, \dots, N; \quad l = -1, \dots, m_j + 1, \ j = 0, \dots, M\},

satisfying the difference equations

    \left(\Lambda_1^2 + 2\Lambda_1 \Lambda_2 + \Lambda_2^2 - \left(\frac{\bar p_{ij}}{h_i}\right)^2 \Lambda_1 - \left(\frac{\bar q_{ij}}{l_j}\right)^2 \Lambda_2\right) u_{ik;jl} = 0,    (28)
    k = 1, \dots, n_i - 1, \ i = 0, \dots, N; \quad l = 1, \dots, m_j - 1, \ j = 0, \dots, M,

    \left(\Lambda_1^2 - \left(\frac{p_{ij}}{h_i}\right)^2 \Lambda_1\right) u_{ik;jl} = 0,    (29)
    k = 1, \dots, n_i - 1, \ i = 0, \dots, N; \quad l = \begin{cases} 0 & \text{if } j = 0, \dots, M-1, \\ 0, m_M & \text{if } j = M, \end{cases}

    \left(\Lambda_2^2 - \left(\frac{q_{ij}}{l_j}\right)^2 \Lambda_2\right) u_{ik;jl} = 0,    (30)
    k = \begin{cases} 0 & \text{if } i = 0, \dots, N-1, \\ 0, n_N & \text{if } i = N, \end{cases} \quad l = 1, \dots, m_j - 1, \ j = 0, \dots, M,

where

    \Lambda_1 u_{ik;jl} = \frac{u_{i,k+1;jl} - 2u_{ik;jl} + u_{i,k-1;jl}}{h^2}, \quad
    \Lambda_2 u_{ik;jl} = \frac{u_{ik;j,l+1} - 2u_{ik;jl} + u_{ik;j,l-1}}{h^2}.
The smoothness conditions (27) are changed to

    u_{i-1,n_{i-1};jl} = u_{i0;jl},
    \frac{u_{i-1,n_{i-1}+1;jl} - u_{i-1,n_{i-1}-1;jl}}{2h} = \frac{u_{i1;jl} - u_{i,-1;jl}}{2h},    (31)
    \Lambda_1 u_{i-1,n_{i-1};jl} = \Lambda_1 u_{i0;jl},
    i = 1, \dots, N, \quad l = 0, \dots, m_j, \quad j = 0, \dots, M,

and

    u_{ik;j-1,m_{j-1}} = u_{ik;j0},
    \frac{u_{ik;j-1,m_{j-1}+1} - u_{ik;j-1,m_{j-1}-1}}{2h} = \frac{u_{ik;j1} - u_{ik;j,-1}}{2h},    (32)
    \Lambda_2 u_{ik;j-1,m_{j-1}} = \Lambda_2 u_{ik;j0},
    k = 0, \dots, n_i, \quad i = 0, \dots, N, \quad j = 1, \dots, M.

Conditions (25) and (26) take the form

    u_{i0;j0} = f_{ij}, \quad u_{N,n_N;j0} = f_{N+1,j}, \quad i = 0, \dots, N, \ j = 0, \dots, M,
    u_{i0;M,m_M} = f_{i,M+1}, \quad u_{N,n_N;M,m_M} = f_{N+1,M+1},    (33)

and

    \Lambda_1 u_{00;j0} = f_{0j}^{(2,0)}, \quad \Lambda_1 u_{N,n_N;j0} = f_{N+1,j}^{(2,0)}, \quad j = 0, \dots, M;
    \Lambda_1 u_{00;M,m_M} = f_{0,M+1}^{(2,0)}, \quad \Lambda_1 u_{N,n_N;M,m_M} = f_{N+1,M+1}^{(2,0)};
    \Lambda_2 u_{i0;00} = f_{i0}^{(0,2)}, \quad \Lambda_2 u_{i0;M,m_M} = f_{i,M+1}^{(0,2)}, \quad i = 0, \dots, N;
    \Lambda_2 u_{N,n_N;00} = f_{N+1,0}^{(0,2)}, \quad \Lambda_2 u_{N,n_N;M,m_M} = f_{N+1,M+1}^{(0,2)};
    \Lambda_1 \Lambda_2 u_{00;00} = f_{00}^{(2,2)}, \quad \Lambda_1 \Lambda_2 u_{N,n_N;00} = f_{N+1,0}^{(2,2)},
    \Lambda_1 \Lambda_2 u_{00;M,m_M} = f_{0,M+1}^{(2,2)}, \quad \Lambda_1 \Lambda_2 u_{N,n_N;M,m_M} = f_{N+1,M+1}^{(2,2)}.    (34)

7.
Algorithm
To solve the finite-difference system (28)–(34) we propose first to find its solution on a refinement of the main mesh \Delta. This can be achieved in four steps.
First step. Evaluate all tension parameters p_{ij} on the lines y = y_j, j = 0, \dots, M+1, and q_{ij} on the lines x = x_i, i = 0, \dots, N+1, by one of the 1–D algorithms for the automatic selection of shape control parameters, see, e.g., [11, 16, 17].

Second step. Construct discrete hyperbolic tension splines [3] in the x direction by solving the M + 2 linear systems (29). As a result, one finds the values of the mesh solution in the x direction on the lines y = y_j, j = 0, \dots, M+1, of the mesh \Delta.

Third step. Construct discrete hyperbolic tension splines in the y direction by solving the N + 2 linear systems (30). This gives the values of the mesh solution in the y direction on the lines x = x_i, i = 0, \dots, N+1, of the mesh \Delta.

Fourth step. Construct discrete hyperbolic tension splines in the x and y directions interpolating the data f_{ij}^{(2,0)}, i = 0, N+1, j = 0, \dots, M+1, and f_{ij}^{(0,2)}, i = 0, \dots, N+1, j = 0, M+1, on the boundary \Gamma. This gives the values

    \Lambda_1 u_{00;jl}, \ \Lambda_1 u_{N,n_N;jl}, \quad l = 0, \dots, m_j, \ j = 0, \dots, M,
    \Lambda_2 u_{ik;00}, \ \Lambda_2 u_{ik;M,m_M}, \quad k = 0, \dots, n_i, \ i = 0, \dots, N.    (35)

Now the system of difference equations (28)–(34) can be substantially simplified by eliminating the unknowns

    u_{ik;jl}, \quad k = -1, n_i+1, \ i = 0, \dots, N, \quad l = 0, \dots, m_j, \ j = 0, \dots, M,
    u_{ik;jl}, \quad k = 0, \dots, n_i, \ i = 0, \dots, N, \quad l = -1, m_j+1, \ j = 0, \dots, M,

using relations (31), (32) and the boundary values (35). As a result one obtains a system with (n_i - 1)(m_j - 1) difference equations and the same number of unknowns in each rectangle \overline{\Omega}_{ij}, i = 0, \dots, N, j = 0, \dots, M. This linear system can be efficiently solved by the SOR algorithm or by applying finite-difference schemes in fractional steps on single- or multiprocessor computers.
8.
SOR Iterative Method
Using a piecewise linear interpolation of the mesh solution from the main mesh \Delta onto the refinement, let us define an initial mesh function

    \{u_{ik;jl}^{(0)} \mid k = 0, \dots, n_i, \ i = 0, \dots, N, \ l = 0, \dots, m_j, \ j = 0, \dots, M\}.    (36)
In each rectangle \overline{\Omega}_{ij}, i = 0, \dots, N, j = 0, \dots, M, the difference equation (28) can be rewritten in componentwise form

    u_{ik;jl} = \frac{1}{\alpha_{ij}} \Big[ \beta_{ij} \big(u_{i,k-1;jl} + u_{i,k+1;jl}\big) + \gamma_{ij} \big(u_{ik;j,l-1} + u_{ik;j,l+1}\big)
        - 2 \big(u_{i,k-1;j,l-1} + u_{i,k-1;j,l+1} + u_{i,k+1;j,l-1} + u_{i,k+1;j,l+1}\big)
        - u_{ik;j,l-2} - u_{ik;j,l+2} - u_{i,k-2;jl} - u_{i,k+2;jl} \Big],    (37)

where

    \alpha_{ij} = 20 + 2\left(\frac{\bar p_{ij}}{n_i}\right)^2 + 2\left(\frac{\bar q_{ij}}{m_j}\right)^2, \quad
    \beta_{ij} = 8 + \left(\frac{\bar p_{ij}}{n_i}\right)^2, \quad
    \gamma_{ij} = 8 + \left(\frac{\bar q_{ij}}{m_j}\right)^2.

Now using (37) we can write down the SOR iteration to obtain a numerical solution on the refinement:

    \tilde u_{ik;jl}^{(\nu+1)} = \frac{1}{\alpha_{ij}} \Big[ \beta_{ij} \big(u_{i,k-1;jl}^{(\nu+1)} + u_{i,k+1;jl}^{(\nu)}\big) + \gamma_{ij} \big(u_{ik;j,l-1}^{(\nu+1)} + u_{ik;j,l+1}^{(\nu)}\big)
        - 2 \big(u_{i,k-1;j,l-1}^{(\nu+1)} + u_{i,k-1;j,l+1}^{(\nu)} + u_{i,k+1;j,l-1}^{(\nu+1)} + u_{i,k+1;j,l+1}^{(\nu)}\big)
        - u_{ik;j,l-2}^{(\nu+1)} - u_{ik;j,l+2}^{(\nu)} - u_{i,k-2;jl}^{(\nu+1)} - u_{i,k+2;jl}^{(\nu)} \Big],

    u_{ik;jl}^{(\nu+1)} = u_{ik;jl}^{(\nu)} + \omega \big(\tilde u_{ik;jl}^{(\nu+1)} - u_{ik;jl}^{(\nu)}\big), \quad 1 < \omega < 2, \quad \nu = 0, 1, \dots,

    k = 1, \dots, n_i - 1, \ i = 0, \dots, N, \quad l = 1, \dots, m_j - 1, \ j = 0, \dots, M.

Note that outside the domain \Omega the extra unknowns u_{0,-1;jl}, u_{N,n_N+1;jl}, l = 0, \dots, m_j, j = 0, \dots, M, and u_{ik;0,-1}, u_{ik;M,m_M+1}, k = 0, \dots, n_i, i = 0, \dots, N, are eliminated using (35) and are not part of the iterations.
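One relaxation sweep of the iteration above can be sketched as follows (our illustration, not the paper's code; a single rectangle with uniform tension and illustrative grid values). A grid function linear in k and l satisfies (37) exactly, so the sweep leaves it unchanged:

```python
import numpy as np

def sor_sweep(u, p_over_n, q_over_m, omega=1.5):
    """One SOR sweep of (37) over the interior of one rectangle.

    u is a 2-D array indexed u[k, l]; the two outermost layers on each side
    are assumed already filled (interpolation/smoothness/boundary data)."""
    alpha = 20.0 + 2.0 * p_over_n**2 + 2.0 * q_over_m**2
    beta = 8.0 + p_over_n**2
    gamma = 8.0 + q_over_m**2
    K, L = u.shape
    for k in range(2, K - 2):
        for l in range(2, L - 2):
            gs = (beta * (u[k - 1, l] + u[k + 1, l])
                  + gamma * (u[k, l - 1] + u[k, l + 1])
                  - 2.0 * (u[k - 1, l - 1] + u[k - 1, l + 1]
                           + u[k + 1, l - 1] + u[k + 1, l + 1])
                  - u[k, l - 2] - u[k, l + 2] - u[k - 2, l] - u[k + 2, l]) / alpha
            u[k, l] += omega * (gs - u[k, l])   # over-relaxation step
    return u

# A grid function u[k, l] = 2k + 3l + 1 is a fixed point of (37):
k = np.arange(9.0)
u = np.add.outer(2.0 * k, 3.0 * k) + 1.0
v = sor_sweep(u.copy(), p_over_n=1.0, q_over_m=0.5)
print(np.allclose(u, v))
```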
9.
Method of Fractional Steps
The per-rectangle system of difference equations obtained in Section 7 can also be efficiently solved by the method of fractional steps [21]. Using the initial approximation (36), let us consider in each rectangle \overline{\Omega}_{ij}, i = 0, \dots, N, j = 0, \dots, M, the following splitting scheme:

    \frac{u^{n+1/2} - u^n}{\tau} + \Lambda_{11} u^{n+1/2} + \Lambda_{12} u^n = 0,
    \frac{u^{n+1} - u^{n+1/2}}{\tau} + \Lambda_{22} u^{n+1} + \Lambda_{12} u^{n+1/2} = 0,    (38)
where

    \Lambda_{11} = \Lambda_1^2 - p\Lambda_1, \quad \Lambda_{22} = \Lambda_2^2 - q\Lambda_2, \quad \Lambda_{12} = \Lambda_1 \Lambda_2,
    p = \left(\frac{\bar p_{ij}}{h_i}\right)^2, \quad q = \left(\frac{\bar q_{ij}}{l_j}\right)^2,
    u = \{u_{ik;jl} \mid k = 1, \dots, n_i - 1, \ i = 0, \dots, N; \ l = 1, \dots, m_j - 1, \ j = 0, \dots, M\}.

Eliminating from here the fractional step u^{n+1/2} yields the following scheme in whole steps, equivalent to the scheme (38):

    \frac{u^{n+1} - u^n}{\tau} + (\Lambda_{11} + \Lambda_{22}) u^{n+1} + 2\Lambda_{12} u^n + \tau \big(\Lambda_{11} \Lambda_{22} u^{n+1} - \Lambda_{12}^2 u^n\big) = 0.    (39)

It follows from here that the scheme (39), and hence the equivalent scheme (38), possesses the property of complete approximation [21] only in the case

    \Lambda_{11} \Lambda_{22} = \Lambda_{12}^2, \quad \text{that is,} \quad p_{ij} = q_{ij} = 0 \ \text{for all } i, j.

Let us prove the unconditional stability of the scheme (38) or, equivalently, of the scheme (39). Using the usual harmonic analysis [21], assume that

    u^n = \eta_n e^{i\pi z}, \quad u^{n+1/2} = \eta_{n+1/2} e^{i\pi z}, \quad z = k_1 \frac{x - x_i}{h_i} + k_2 \frac{y - y_j}{l_j}.    (40)

Substituting (40) into equations (38) we obtain the amplification factors

    \rho_1 = \frac{\eta_{n+1/2}}{\eta_n} = \frac{1 - a_1 a_2}{1 - p\sqrt{\tau}\, a_1 + a_1^2}, \quad
    \rho_2 = \frac{\eta_{n+1}}{\eta_{n+1/2}} = \frac{1 - a_1 a_2}{1 - q\sqrt{\tau}\, a_2 + a_2^2},

    \rho = \rho_1 \rho_2 = \frac{(1 - a_1 a_2)^2}{(1 - p\sqrt{\tau}\, a_1 + a_1^2)(1 - q\sqrt{\tau}\, a_2 + a_2^2)},

where

    a_1 = -\frac{4\sqrt{\tau}}{h^2} \sin^2\frac{k_1 h \pi}{2 h_i}, \quad a_2 = -\frac{4\sqrt{\tau}}{h^2} \sin^2\frac{k_2 h \pi}{2 l_j},
    k_1 = 1, \dots, n_i - 1, \quad k_2 = 1, \dots, m_j - 1, \quad n_i h = h_i, \quad m_j h = l_j.

It follows from here that

    0 \le \rho \le \frac{(1 - a_1 a_2)^2}{(1 + a_1^2)(1 + a_2^2)} \le \left(\frac{1 - a_1 a_2}{1 + a_1 a_2}\right)^2 < 1

for any \tau. This proves the unconditional stability of the scheme (38).
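The final chain of inequalities can also be probed numerically; the following sketch (ours) samples random admissible values a_1, a_2 < 0, p, q \ge 0 and \tau > 0 and confirms 0 \le \rho < 1:

```python
import math
import random

def rho(a1, a2, p, q, tau):
    """Amplification factor of the splitting scheme (38) for one harmonic."""
    return ((1.0 - a1 * a2) ** 2
            / ((1.0 - p * math.sqrt(tau) * a1 + a1**2)
               * (1.0 - q * math.sqrt(tau) * a2 + a2**2)))

random.seed(0)
ok = True
for _ in range(10000):
    a1 = -random.uniform(1e-3, 50.0)   # a_1, a_2 are negative by construction
    a2 = -random.uniform(1e-3, 50.0)
    p, q = random.uniform(0.0, 10.0), random.uniform(0.0, 10.0)
    tau = random.uniform(1e-6, 10.0)
    ok = ok and (0.0 <= rho(a1, a2, p, q, tau) < 1.0)
print(ok)   # 0 <= rho < 1 for every sampled harmonic and any tau
```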
At each fractional step in (38) one has to solve a linear system with a symmetric positive definite pentadiagonal matrix. This is much cheaper than directly solving the linear system (28). However, in general the scheme (38) has the property of incomplete approximation [21]. For this reason, in the iterations we have to use small values of the iteration parameter \tau, e.g., \sqrt{\tau}/h^2 = \text{const}.
10.
Graphical Examples
The aim of this final section is to illustrate the tension features of discrete hyperbolic and biharmonic tension splines with some (famous) examples. First, we note that the continuous form U_i of our solution given in (13) has the good shape-preserving properties of cubics (see, e.g., [17]), in the sense that U_i is convex (concave) in [x_i, x_{i+1}] if and only if m_{i+j} \ge 0 \ (\le 0), j = 0, 1, and has at most one inflection point in [x_i, x_{i+1}]. In order to preserve the shape of the data, we therefore simply have to analyze the values \Lambda_i u_{i,0} and \Lambda_i u_{i,n_i} and increase the tension parameters if necessary. All the strategies proposed for the automatic choice of tension parameters in continuous hyperbolic tension spline interpolation can be used in our discrete context, see, e.g., [16, 17]. In our first example we have interpolated the radio chemical data reported in Table 1.

Table 1. Radio chemical data:

    x_i :  7.99  8.09        8.19        8.7       9.2       10        12        15        20
    f_i :  0     2.76429e-5  4.37498e-2  0.169183  0.469428  0.943740  0.998636  0.999916  0.999994
The effects of changing the tension values p i are depicted in Figs. 1–2. We have adopted a non–uniform mesh, assigning the same number of points (30) to each interval of the main mesh, and imposed natural end conditions, that is, following formulas (15), m0 = mN +1 = 0. Fig. 1 is obtained setting pi = 0, that is considering the discrete cubic spline interpolating the data. In Fig. 2 a new discrete interpolant with p 0 = p1 = 300, pi = 15, i = 2, . . . , 7, is displayed for the same data, and the stretching effect of the increase in tension parameters is evident. In the second example we have taken Akima’s data of Table 2 and constructed discrete interpolants with 20 points for each interval, with natural end conditions m0 = mN +1 = 0.
Figure 1. The radio chemical data with natural end conditions m_0 = m_{N+1} = 0. Left: interpolation by a discrete cubic spline (p_i = 0). Right: a magnification of the lower left corner.

Figure 2. The same as Fig. 1 with p_0 = p_1 = 300, p_i = 15, i = 2, \dots, 7.
Fig. 3, left, shows the plot produced by a uniform choice of tension factors, namely p_i = 0. The right part of the same figure shows a second mesh solution, which perfectly reproduces the data shape, where we have set p_5 = p_6 = p_8 = 10 while the remaining p_i are unchanged.

Table 2. Akima's data [1]:

    x_i :  0   2   3   5   6   8   9    11  12  14  15
    f_i :  10  10  10  10  10  10  10.5 15  50  60  85
90
80
80
70
70
60
60
50
50
40
40
30
30
20
20
10
10
0
0
2
4
6
8
10
12
14
0
16
0
2
4
6
8
10
12
14
16
Figure 3. Akima’s data with natural end conditions. Left: Discrete interpolating cubic spline (pi = 0). Right: discrete hyperbolic spline with p5 = p6 = p8 = 10.
In the 2-D case the approach developed in this paper was tested on a number of numerical examples. Because of space limitations we consider here only some of them. The initial data (xi, yj, f̃ij) in Fig. 4 were obtained by taking Akima's data in Table 2 in both the x and y directions and using the formula f̃ij = fi + fj. As shown in Fig. 5, the usual discrete biharmonic spline does not preserve the monotonicity and convexity properties of the initial data.
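The tensor-product test data of Fig. 4 can be rebuilt in a few lines directly from Table 2. This is an illustrative sketch of ours (not the authors' code), applying the formula f̃ij = fi + fj:

```python
# Akima's data from Table 2; the bivariate test data are f~_ij = f_i + f_j
# on the tensor-product grid (x_i, y_j).
akima_x = [0, 2, 3, 5, 6, 8, 9, 11, 12, 14, 15]
akima_f = [10, 10, 10, 10, 10, 10, 10.5, 15, 50, 60, 85]

f2d = [[fi + fj for fj in akima_f] for fi in akima_f]
```

Since the univariate values fi are non-decreasing, the resulting grid data are monotone along every grid line, which is exactly the shape the tension spline of Fig. 6 is asked to preserve.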
Figure 4. The initial data.
Figure 5. The biharmonic spline interpolation.
On the other hand the discrete biharmonic tension spline in Fig. 6 preserves the data shape and gives a visually smooth surface.
Figure 6. The biharmonic tension spline interpolation.
The exponential function

f(x, y) = (3/4) e^{−[(9x−2)² + (9y−2)²]/4} + (3/4) e^{−[(9x+1)²/49 + (9y+1)/10]}
          − (1/5) e^{−[(9x−4)² + (9y−7)²]} + (1/2) e^{−[(9x−7)² + (9y−3)²]/4}   (41)
has been used in [5, 7] to obtain scattered data. A graph of the function (41) with the data points marked by circles is shown in Fig. 7.
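For concreteness, (41) — the well-known test function of Franke — can be evaluated as follows (our sketch, not part of the paper):

```python
import math

def franke(x, y):
    """Franke's test function, equation (41)."""
    t1 = 0.75 * math.exp(-((9 * x - 2) ** 2 + (9 * y - 2) ** 2) / 4.0)
    t2 = 0.75 * math.exp(-((9 * x + 1) ** 2 / 49.0 + (9 * y + 1) / 10.0))
    t3 = -0.2 * math.exp(-((9 * x - 4) ** 2 + (9 * y - 7) ** 2))
    t4 = 0.5 * math.exp(-((9 * x - 7) ** 2 + (9 * y - 3) ** 2) / 4.0)
    return t1 + t2 + t3 + t4
```

Sampling this function at scattered sites in the unit square produces data of the kind plotted in Figs. 7–8.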
Figure 7. A graph of the function (41) with the data points marked by circles.
Figure 8. The initial data. Left: a projection of the data points on the xy plane. Right: a surface obtained by joining the data points.
A projection of the data points on the xy plane and a surface obtained by joining the data points by pieces of straight lines are given in Fig. 8. Fig. 9 presents the resulting biharmonic surface under tension. The initial topographical data in the next test are shown in Fig. 10. Fig. 11 is obtained by setting all tension parameters to zero, that is, by considering the usual discrete biharmonic spline interpolating the data. It gives oscillations which are unnatural for the data. The situation can be substantially improved by
Figure 9. The resulting biharmonic surface under tension.
Figure 10. A view of the initial topographical data.
using a biharmonic tension spline with automatic selection of the shape control parameters. The resulting discrete tension spline in Fig. 12 has no oscillations and simultaneously keeps a visually smooth surface. A reconstruction of the jet's surface is shown in Figs. 13–15. The initial data were defined as a set of 16 pointwise-assigned, non-intersecting and in general curvilinear sections of a 3-D body. The number of points varied from section to section, with a total of 212 points. Fig. 13 gives the initial data. Figs. 14 and
Figure 11. A surface "without tension".
Figure 12. The resulting surface under tension.
15 show the biharmonic surfaces without tension and with "optimal" tension parameters, respectively. As a last numerical test, we tried to reconstruct the surface of a "Viking ship". The initial data, which the author obtained from Professor T. Lyche of the University of Oslo, were defined pointwise in the form of the envelopes of the sides and the keel of the boat, as well as six ribs. A 3-D view of the data is given in Fig. 16. In Figs. 17 and 18 the resulting biharmonic tension surface is given for very large and "optimal" tension parameters with a mesh of lines 100 × 100.
Figure 13. The initial jet's data.
Figure 14. The biharmonic surface without tension.
Applying the SOR iterative method or using the method of fractional steps, we obtain practically the same results. The method of fractional steps converges about three times faster than the SOR iterations, but the operation count at each step of the SOR iterative method is approximately three times less than that in the method of fractional steps. Therefore, the performance of the two methods is very similar. Both can also be easily modified for use on parallel computers.
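As a generic illustration of the first of these two solvers, here is a minimal successive over-relaxation (SOR) sweep for a diagonally dominant linear system. This is our sketch of the textbook iteration, not the authors' implementation of the mesh equations; the relaxation parameter omega is an assumed illustrative value.

```python
import numpy as np

def sor(A, rhs, omega=1.5, iters=200):
    """Basic SOR iteration for A x = rhs (A assumed diagonally dominant)."""
    n = len(rhs)
    x = np.zeros(n)
    for _ in range(iters):
        for i in range(n):
            # Gauss-Seidel-style update, over-relaxed by omega
            s = rhs[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]
            x[i] = (1.0 - omega) * x[i] + omega * s / A[i, i]
    return x
```

For 1 < omega < 2 the iteration converges for symmetric positive definite systems such as the discrete biharmonic equations; choosing omega well is what makes SOR competitive with fractional steps.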
Figure 15. The resulting surface under tension.
Acknowledgments
Work partially supported by the Russian Academy of Sciences. Appreciation is also rendered to the Russian Foundation for Basic Research for the financial support which has made this research possible.
Figure 16. A 3-D view of the data.
Figure 17. The biharmonic surface for very large tension parameters.
Figure 18. The resulting biharmonic surface with "optimal" tension parameters.
References
[1] Akima, H., A new method of interpolation and smooth curve fitting based on local procedures, J. Assoc. Comput. Mach. 17 (1970), 589–602.
[2] Bouhamedi, A. and Le Méhauté, A., Spline curves and surfaces with tension, Wavelets, Images, and Surface Fitting, Laurent, P. J., Le Méhauté, A., and Schumaker, L. L. (eds.), A K Peters, Wellesley, MA, pp. 51–58, 1994.
[3] Costantini, P., Kvasov, B. I., and Manni, C., On discrete hyperbolic tension splines, Adv. Comput. Math. 11 (1999), 331–354.
[4] Duchon, J., Splines minimizing rotation invariant semi-norms in Sobolev spaces, Constructive Theory of Functions of Several Variables, Schempp, W. and Zeller, K. (eds.), Lecture Notes in Mathematics, Vol. 571, Springer, pp. 85–100, 1977.
[5] Franke, R., Thin plate splines with tension, Surfaces in CAGD'84, Barnhill, R. E. and Böhm, W. (eds.), North-Holland, pp. 87–95, 1985.
[6] Golub, G. H. and Van Loan, C. F., Matrix Computations, Johns Hopkins University Press, Baltimore, 1991.
[7] Hoschek, J. and Lasser, D., Fundamentals of Computer Aided Geometric Design, A K Peters, Wellesley, MA, 1993.
[8] Janenko, N. N. and Kvasov, B. I., An iterative method for construction of polycubic spline functions, Soviet Math. Dokl. 11 (1970), 1643–1645.
[9] Koch, P. E. and Lyche, T., Interpolation with Exponential B-splines in Tension, Geometric Modelling, Computing/Supplementum 8, Farin, G. et al. (eds.), Springer-Verlag, Wien, pp. 173–190, 1993.
[10] Kounchev, O., Multivariate Polysplines: Applications to Numerical and Wavelet Analysis, Academic Press, San Diego, 2001.
[11] Kvasov, B. I., Methods of Shape-Preserving Spline Approximation, World Scientific Publ. Co. Pte. Ltd., Singapore, 2000.
[12] Kvasov, B. I., On interpolating thin plate tension splines, Curve and Surface Fitting: Saint-Malo 2002, Cohen, A., Merrien, J.-L., and Schumaker, L. L. (eds.), Nashboro Press, Brentwood, 2003, pp. 239–248.
[13] Laurent, P. J., Approximation et Optimisation, Hermann, Paris, 1972.
[14] Malcolm, M. A., On the computation of nonlinear spline functions, SIAM J. Numer. Anal. 14 (1977), 254–282.
[15] Marušić, M. and Rogina, M., Sharp error bounds for interpolating splines in tension, J. Comput. Appl. Math. 61 (1995), 205–223.
[16] Renka, R. J., Interpolatory tension splines with automatic selection of tension factors, SIAM J. Sci. Stat. Comput. 8 (1987), 393–415.
[17] Rentrop, P., An algorithm for the computation of exponential splines, Numer. Math. 35 (1980), 81–93.
[18] Sapidis, N. S.
and Kaklis, P. D., An algorithm for constructing convexity and monotonicity-preserving splines in tension, Computer Aided Geometric Design 5 (1988), 127–137.
[19] Schumaker, L. L., Spline Functions: Basic Theory, John Wiley & Sons, New York, 1981.
[20] Schweikert, D. G., An interpolating curve using a spline in tension, J. Math. Phys. 45 (1966), 312–317.
[21] Yanenko, N. N., The Method of Fractional Steps, Springer-Verlag, New York, 1971.
[22] Zav'yalov, Yu. S., Kvasov, B. I., and Miroshnichenko, V. L., Methods of Spline Functions, Nauka, Moscow, 1980 (in Russian).
ROBUST NUMERICAL METHODS FOR THE SINGULARLY PERTURBED BLACK-SCHOLES EQUATION
J J H Miller
Department of Mathematics, Trinity College Dublin, Ireland
[email protected]
G I Shishkin
Institute for Mathematics & Mechanics, Russian Academy of Sciences
Ekaterinburg, Russia
[email protected]
Abstract
We discuss a dimensionless formulation of the Black-Scholes equation for the value of a European call option. We observe that, for some values of the parameters, this may be a singularly perturbed problem. We demonstrate numerically that, in such a case, a standard numerical method on a uniform mesh does not produce robust numerical solutions. We then construct a new numerical method, on an appropriately fitted piecewise-uniform mesh, which generates numerical approximations that converge parameter-uniformly in the maximum norm to the exact solution.
Keywords:
Black Scholes, singular perturbation, numerical method, robust.
1. Introduction
It is known that singular perturbation problems typically arise in financial models. The Black-Scholes equation in dimensionless variables is an equation with two small parameters multiplying the highest space and time derivatives. For such problems, both boundary and initial parabolic layers appear in the solution. It was shown previously that fitted operator methods on uniform meshes do not make it possible to compute robust numerical approximations that converge in the maximum norm to the exact solution uniformly with respect to the small parameters. However, the use of an appropriate fitted mesh technique allows us to construct a finite difference method converging in the maximum
norm parameter-uniformly. By parameter-uniform convergence we mean that the order of convergence in the maximum norm and the error constant are independent of these parameters. A robust layer-resolving numerical method, or, more concisely, a robust numerical method, is a parameter-uniform numerical method that is also monotone and requires a parameter-uniform quantity of computational effort to compute its approximations. The main goal of this paper is to show experimentally that, in the case of an initial layer, the errors in the maximum norm of an upwind finite difference method on uniform meshes are unsatisfactorily large, while the errors in the maximum norm of the same upwind finite difference method on piecewise-uniform meshes, appropriately fitted to the initial layer in some neighbourhood of the layer, do not depend on the value of the parameter k and have an order of convergence close to unity.
2. Problem formulation
The value of a European call option is denoted by C = C(S, t), where S is the current value of the underlying asset and t is the time. S and t are the independent variables and the domain is (0, ∞) × (0, T). The value of the option also depends on σ, the volatility of the underlying asset; E, the exercise price; T, the expiry time; and r, the interest rate. Typical ranges of values of T in years, r in percent per annum and σ in percent per annum arising in practice are

1/12 ≤ T ≤ 1,   (1)
.01 ≤ r ≤ .2,   (2)
.01 ≤ σ ≤ .5.   (3)
The Black–Scholes equation governing C(S, t) is

∂C/∂t + (1/2) σ² S² ∂²C/∂S² + r S ∂C/∂S − r C = 0;   (4)
the final condition is

C(S, T) = max(S − E, 0);   (5)

the boundary condition at S = 0 is

C(0, t) = 0   (6)

and the boundary condition at +∞ is

C(S, t) ∼ S as S → ∞.   (7)
See e.g. [7] for an elementary treatment. We can transform this final-value problem to an initial-value problem in dimensionless form in the following way. We change the independent variables S, t to the new independent variables x, τ by the transformation

S = E e^x,   t = T − τ/(½σ²),

and the dependent variable C(S, t) to the new dependent variable v(x, τ) by the transformation C(S, t) = E v(x, τ). In these variables the domain becomes (−∞, ∞) × (0, T) and the equation transforms to

∂v/∂τ = ∂²v/∂x² + (k − 1) ∂v/∂x − kv   (8)

where k = r/(½σ²) is a dimensionless parameter.
The initial condition is now

v(x, 0) = max(e^x − 1, 0)   (9)
and the boundary conditions at x = ±∞ are

v(−∞, τ) = 0,   (10)
v(x, τ) ∼ e^x as x → ∞.   (11)
The only other independent parameter in the problem is the dimensionless time to expiry T = ½σ²T, since the other four parameters E, T, σ and r can all be expressed in terms of these two. The ranges of values of T and k arising from (1)–(3) are

.0000041 ≤ T ≤ .125,   (12)
.08 ≤ k ≤ 4000.   (13)
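As a quick sanity check (our sketch, not from the paper), evaluating k = r/(½σ²) and the dimensionless expiry ½σ²T at the extremes of (1)–(3) recovers the bounds quoted in (12)–(13):

```python
def dimensionless(r, sigma, T):
    """Map (r, sigma, T) to the dimensionless pair (k, T_dimless)."""
    half_sig2 = 0.5 * sigma ** 2
    return r / half_sig2, half_sig2 * T

k_min, _ = dimensionless(r=0.01, sigma=0.5, T=1.0)        # k = 0.08
k_max, _ = dimensionless(r=0.2, sigma=0.01, T=1.0)        # k = 4000
_, T_min = dimensionless(r=0.01, sigma=0.01, T=1.0 / 12)  # ~ 4.2e-6
_, T_max = dimensionless(r=0.01, sigma=0.5, T=1.0)        # 0.125
```

Note that k is large precisely when the volatility is small relative to the interest rate, which is the singularly perturbed regime studied below.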
The exact solution of equation (8) satisfying the initial and boundary conditions (9), (10), (11) is

v(x, τ) = e^x N(d₊(x, τ)) − e^{−kτ} N(d₋(x, τ))   (14)
where

d₊(x, τ) = x/√(2τ) + ½(k + 1)√(2τ),   (15)
d₋(x, τ) = x/√(2τ) + ½(k − 1)√(2τ),   (16)
and N is the cumulative distribution function of the normal distribution with mean 0 and standard deviation 1, given by

N(y) = (1/√(2π)) ∫_{−∞}^{y} e^{−s²/2} ds.   (17)

A problem involving a differential equation becomes a singular perturbation problem when the coefficient of the highest derivative in the differential equation is multiplied by a small parameter, called the singular perturbation parameter. In equation (8) the parameter k can be large, in which case its inverse is the singular perturbation parameter. Thin layer phenomena arise in singular perturbation problems, which therefore form an important subclass of mathematical problems having non-smooth solutions.
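The formulas (14)–(17) translate directly into code; in this sketch of ours the cumulative normal N is expressed through the error function. As x → +∞ both N(d±) → 1, so v ≈ e^x − e^{−kτ}, consistent with (11).

```python
import math

def N(y):
    """Standard normal CDF, equation (17), via the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def v(x, tau, k):
    """Exact solution (14), with d+ and d- from (15)-(16)."""
    s = math.sqrt(2.0 * tau)
    d_plus = x / s + 0.5 * (k + 1.0) * s
    d_minus = x / s + 0.5 * (k - 1.0) * s
    return math.exp(x) * N(d_plus) - math.exp(-k * tau) * N(d_minus)
```

Evaluating this on a grid reproduces surfaces such as the one in Figure 1 below.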
3. Numerical solutions of singular perturbation problems
In this paper we are interested in studying the numerical difficulties that arise in the computation of the numerical solutions of the Black-Scholes equation (8) when the parameter k is large. Since there are several sources of these difficulties, we focus on those arising from the singularly perturbed nature of the equation. Therefore, we restrict the discussion to problems with smooth initial conditions and, instead of the family of piecewise-smooth initial conditions (9), we consider the family of smooth initial conditions

v(x, 0) = e^{αx}   (18)
for various values of the parameter α. The exact solution of the problem with the smooth initial condition (18) is

v(x, τ) = e^{αx − βτ}   (19)
where β = (1 − α)(k + α). We note that β ≥ 0 when −k ≤ α ≤ 1 and that there is an initial layer of width 1/β when β ≫ 1. When α = 0 we have β = k, so that 1/β is small when k is large, which shows that for large k the initial layer in the solution is thin. In the following sections we construct and test different numerical methods for determining numerical approximations to the solution of problem (8), (18), (10) and (11) for α = 0. It is shown, in the next section, that a standard upwind numerical method on a uniform mesh does not generate robust approximations to the solution of this problem. In the following section it is shown that the
modification of the uniform mesh to an appropriately fitted piecewise-uniform mesh in this standard numerical method allows us to overcome this difficulty. We remark that the problems considered here have simple analytic solutions and consequently need not be solved numerically. Our reason for solving them numerically is that their simplicity enables us to gain a clear understanding of the numerical difficulties encountered in attempts to compute robust numerical approximations to their solution and to find ways of overcoming these difficulties. It is important to note that the new numerical methods developed in the present paper produce robust approximations not only for the simple problems considered here, but also for more complex problems of practical interest. The detailed justification of the latter assertion will be the subject of future publications.
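The value of β can be verified directly: for v = e^{αx−βτ} one has v_τ = −βv, v_x = αv, v_xx = α²v, so substituting into (8) leaves a residual proportional to v that vanishes exactly when β = (1 − α)(k + α). A small sketch of ours checks this for a few parameter pairs:

```python
def residual_coefficient(alpha, k):
    """Residual of equation (8) for v = exp(alpha*x - beta*tau), divided
    by v, using the analytic derivatives v_tau = -beta v, v_x = alpha v,
    v_xx = alpha^2 v."""
    beta = (1.0 - alpha) * (k + alpha)
    # v_tau - (v_xx + (k-1) v_x - k v), divided by v:
    return -beta - (alpha ** 2 + (k - 1.0) * alpha - k)
```

The residual is identically zero (up to rounding) for any α and k, confirming that (19) solves (8).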
4. Upwind uniform mesh method
We begin by showing that a widely-used standard numerical method, see e.g. [3], [7], does not generate robust approximations to the solution of the problem (8), (18), (10) and (11) with α = 0 and various values of the parameter k. A graph of the exact solution of this problem with k = 2^9 is given in Figure 1. The thin initial layer in the neighbourhood of τ = 0 is easily identified.
Figure 1. Graph of exact solution with α = 0, k = 2^9.
We solve the problem numerically on the finite domain (−1, 1) × (0, T). We use a uniform rectangular mesh on this domain with Nx + 1 equal mesh
sub-intervals of width ∆x in the x-direction and Nτ equal mesh sub-intervals of width ∆τ in the τ-direction. We use a finite difference operator, implicit in the time-like variable τ, with upwind first order and centered second order finite differences in the x variable. The finite difference equations are

D_τ⁻ V_{i,j} = D_x⁺ D_x⁻ V_{i,j} + (k − 1)⁺ D_x⁺ V_{i,j} + (k − 1)⁻ D_x⁻ V_{i,j} − k V_{i,j}   (20)

where 1 ≤ i ≤ Nx and 1 ≤ j ≤ Nτ, and we use the standard notation D_x⁺, D_x⁻, D_τ⁻ for the first order forward and backward discrete partial derivatives, and φ⁺ = ½(φ + |φ|), φ⁻ = ½(φ − |φ|) for any function φ. The initial condition is

V_{i,0} = v(x_i, 0)   (21)

for 0 ≤ i ≤ Nx + 1 and the boundary conditions are
V_{0,j} = v(x₀, τ_j),   (22)
V_{Nx+1,j} = v(x_{Nx+1}, τ_j),   (23)
for 1 ≤ j ≤ Nτ. We remark that in general the quantities on the right sides of the initial and boundary conditions (22) and (23) are unknown, and so in general we must replace them by appropriate approximations. However, for the problems considered here we know the exact solutions, and we use these to give exact values in (22) and (23) to eliminate an additional potential source of error in the numerical approximations. Introducing the notation

a = −(∆τ/(∆x)²)(1 − ∆x(k − 1)⁻),
b = −(∆τ/(∆x)²)(1 + ∆x(k − 1)⁺),
d = 1 − a − b + k∆τ,

the finite difference method can be written in the form

A V_{j+1} = V_j + F_j   (24)

where V_j and F_j are the vectors

V_j = (V_{1,j}, …, V_{Nx,j})ᵀ,   F_j = (−a V_{0,j}, 0, …, 0, −b V_{Nx+1,j})ᵀ
and A is the tridiagonal Toeplitz matrix of order N x ⎛
d
b .. . .. . .. .
0 ... .. .. ⎜ . . ⎜ a ⎜ . . ⎜ .. .. A=⎜ 0 ⎜ . .. .. ⎝ .. . . 0 ... 0 a
⎞ 0 .. ⎟ . ⎟ ⎟ ⎟ 0 ⎟. ⎟ b ⎠ d
The maximum pointwise error using this numerical method on a uniform mesh for various values of k and N = Nx = Nτ for the problem with α = 0 is given in Table 1. In this table the maximum value in each column is shown in italics. We see that these maxima all lie on a diagonal of the table and that they increase from 1.7344e-001 to 1.8787e-001 as the mesh is refined. This indicates that there is a persistent error along the diagonal, which cannot be reduced by refining the mesh. In other words, no matter how fine the mesh is made, we can find a value of the parameter k for which the error in the maximum norm for the corresponding problem cannot be made less than about 17%. Of course, if we are not interested in solving problems for arbitrarily large values of k, then we can take the mesh sufficiently fine so that the maximum value of k lies above the diagonal containing the maximum error in its column. But for this to hold the number of mesh points N = Nx = Nτ must be much larger than the maximum value of k. A graph of the error in the numerical solution generated by this standard uniform mesh method is given in Figure 2 for k = 2^9 and N = Nx = Nτ = 32. Notice the difference in vertical scales in Figures 1 and 2.
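The qualitative behaviour in Table 1 can be reproduced with a compact reimplementation of the scheme (20)–(24). The sketch below is ours, not the authors' code: it assumes T = 0.125 (the upper end of range (12)), uses the exact solution e^{−kτ} of the α = 0 problem for the boundary values (22)–(23), and solves the tridiagonal system with a dense solver for brevity.

```python
import numpy as np

def upwind_uniform(k, N, T=0.125):
    """Implicit upwind scheme on a uniform N x N mesh; returns the maximum
    pointwise error against the exact solution v = exp(-k*tau) (alpha = 0)."""
    Nx = Ntau = N
    dx = 2.0 / (Nx + 1)                  # Nx + 1 sub-intervals on (-1, 1)
    dtau = T / Ntau
    kp = 0.5 * ((k - 1) + abs(k - 1))    # (k-1)^+
    km = 0.5 * ((k - 1) - abs(k - 1))    # (k-1)^-
    a = -dtau / dx ** 2 * (1 - dx * km)
    b = -dtau / dx ** 2 * (1 + dx * kp)
    d = 1 - a - b + k * dtau
    # tridiagonal Toeplitz matrix A of order Nx
    A = (np.diag(np.full(Nx, d))
         + np.diag(np.full(Nx - 1, a), -1)
         + np.diag(np.full(Nx - 1, b), 1))
    V = np.ones(Nx)                      # initial condition v(x, 0) = 1
    err = 0.0
    for j in range(1, Ntau + 1):
        exact = np.exp(-k * j * dtau)
        F = np.zeros(Nx)
        F[0], F[-1] = -a * exact, -b * exact   # exact boundary values
        V = np.linalg.solve(A, V + F)
        err = max(err, float(np.max(np.abs(V - exact))))
    return err
```

With k = 2^9 and N = 32 this gives an error of order 10^{-1}, comparable in size to the italicised entry in Table 1, while for small k the error is orders of magnitude smaller.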
5. Upwind piecewise-uniform fitted mesh method
In this section we construct a new numerical method, which overcomes the limitations of the upwind method of the previous section. This is achieved by replacing the uniform mesh in the τ -dimension by an appropriately fitted piecewise-uniform mesh, in which the choice of the transition point between the coarse and fine parts of the mesh is the crucial non-trivial issue; see [4] for the first paper on this approach to the resolution of layers for parabolic equations, and [8], [2], [1] for further developments of the theory and applications of the technique. In [5], [6] it was proved that appropriately fitted piecewise-uniform meshes are a necessity for the construction of robust layer-resolving methods for problems with parabolic initial and boundary layers. In these two papers it was also proved that fitted operator methods on uniform meshes are not robust layer-resolving. The new numerical method is composed of the above upwind finite difference operator on an appropriately fitted piecewise-uniform mesh. Because
Table 1. Computed maximum pointwise error for α = 0 on a uniform mesh for various values of k and N = Nx = Nτ. The maximum value in each column is marked here with an asterisk.

k\N     8             16            32            64            128           256           512
2^-1    2.1701e-004   1.1051e-004   5.5659e-005   2.7921e-005   1.3982e-005   6.9962e-006   3.4994e-006
2^0     8.1383e-004   4.1491e-004   2.0912e-004   1.0492e-004   5.2543e-005   2.6291e-005   1.3150e-005
2^1     2.8351e-003   1.4531e-003   7.3503e-004   3.6951e-004   1.8523e-004   9.2735e-005   4.6397e-005
2^2     8.4554e-003   4.4332e-003   2.2646e-003   1.1436e-003   5.7455e-004   2.8795e-004   1.4415e-004
2^3     1.9340e-002   1.0365e-002   5.3669e-003   2.7323e-003   1.3784e-003   6.9225e-004   3.4689e-004
2^4     3.7801e-002   2.0799e-002   1.0927e-002   5.5933e-003   2.8292e-003   1.4224e-003   7.1314e-004
2^5     7.0453e-002   4.0019e-002   2.1482e-002   1.1133e-002   5.6614e-003   2.8529e-003   1.4317e-003
2^6     1.2165e-001   7.3201e-002   4.0942e-002   2.1732e-002   1.1191e-002   5.6732e-003   2.8554e-003
2^7     *1.7344e-001  1.2616e-001   7.4673e-002   4.1305e-002   2.1811e-002   1.1202e-002   5.6743e-003
2^8     1.5393e-001   *1.8052e-001  1.2789e-001   7.5239e-002   4.1455e-002   2.1836e-002   1.1205e-002
2^9     9.1527e-002   1.5911e-001   *1.8448e-001  1.2882e-001   7.5521e-002   4.1517e-002   2.1845e-002
2^10    4.7856e-002   9.4906e-002   1.6275e-001   *1.8628e-001  1.2927e-001   7.5650e-002   4.1546e-002
2^11    2.4442e-002   4.9702e-002   9.6618e-002   1.6479e-001   *1.8714e-001  1.2948e-001   7.5712e-002
2^12    1.2353e-002   2.5405e-002   5.0645e-002   9.7786e-002   1.6577e-001   *1.8760e-001  1.2958e-001
2^13    6.2099e-003   1.2844e-002   2.5899e-002   5.1120e-002   9.8446e-002   1.6625e-001   *1.8787e-001
2^14    3.1134e-003   6.4582e-003   1.3097e-002   2.6148e-002   5.1437e-002   9.8772e-002   1.6649e-001
2^15    1.5588e-003   3.2382e-003   6.5861e-003   1.3225e-002   2.6273e-002   5.1618e-002   9.8933e-002
2^16    7.7992e-004   1.6213e-003   3.3025e-003   6.6507e-003   1.3289e-002   2.6356e-002   5.1708e-002
2^17    3.9009e-004   8.1124e-004   1.6536e-003   3.3350e-003   6.6832e-003   1.3321e-002   2.6403e-002
2^18    1.9508e-004   4.0576e-004   8.2739e-004   1.6699e-003   3.3513e-003   6.6995e-003   1.3342e-002
2^19    9.7548e-005   2.0292e-004   4.1384e-004   8.3556e-004   1.6781e-003   3.3595e-003   6.7077e-003
2^20    4.8776e-005   1.0147e-004   2.0696e-004   4.1793e-004   8.3966e-004   1.6822e-003   3.3636e-003
2^21    2.4389e-005   5.0736e-005   1.0349e-004   2.0900e-004   4.1998e-004   8.4172e-004   1.6843e-003
2^22    1.2194e-005   2.5369e-005   5.1746e-005   1.0451e-004   2.1003e-004   4.2101e-004   8.4275e-004
the computational domain is rectangular, we can use a two-dimensional mesh Ω_ε^N, N = (Nx, Nτ), that is a tensor product of one-dimensional meshes. We take a uniform mesh in the direction of the x-axis and an appropriately fitted piecewise-uniform mesh in the direction of the τ-axis. The tensor product of these meshes is Ω_ε^N = Ω_u^{Nx} × Ω_ε^{Nτ}. Here Ω_u^{Nx} is a uniform mesh with Nx + 1 mesh intervals on the interval [−1, 1] of the x-axis, and Ω_ε^{Nτ} is a piecewise-uniform fitted mesh with Nτ mesh intervals on the interval [0, T] of the τ-axis, such that the subinterval [0, σ] and the subinterval [σ, T] are both subdivided into ½Nτ uniform mesh intervals.
Figure 2. Graph of error for uniform mesh with α = 0, k = 2^9, Nx = Nτ = 32.
In the present case the correct choice of the transition parameter σ is σ = min{T/2, (ln Nτ)/k}, which ensures that the piecewise-uniform mesh is appropriately fitted to the initial layer. The factor 1/k can be motivated from a priori estimates of the derivatives of the solution v. The maximum pointwise error for α = 0 on a piecewise-uniform mesh for various values of k and N = Nx = Nτ is given in Table 2. In this table we see that the error in each row decreases as N increases. Also, the error in each column increases until it stabilises at its maximum value. This maximum value is significantly less than the maximum value in the corresponding column of Table 1 for the uniform mesh case. Thus, for this particular problem, we have convergence of the numerical approximations which is independent of the parameter k. Using the techniques in [4], [8] and [2] it can be proved theoretically that the above fitted mesh method generates approximations that converge k-uniformly for a large class of problems including the problem (8), (18), (10) and (11). A graph of the error in the numerical solution generated by this piecewise-uniform fitted mesh method is given in Figure 3 for k = 2^9 and N = Nx = Nτ = 32. Notice the difference in vertical scales in Figures 1, 2 and 3.
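The fitted mesh itself is easy to construct. The sketch below is ours (T = 0.125 is an assumed illustrative value): it places ½Nτ uniform intervals on [0, σ] and ½Nτ on [σ, T], with σ = min{T/2, (ln Nτ)/k}.

```python
import math

def fitted_tau_mesh(k, n_tau, T=0.125):
    """Piecewise-uniform mesh on [0, T] fitted to the initial layer.
    n_tau is assumed even; half the intervals land in [0, sigma]."""
    sigma = min(T / 2.0, math.log(n_tau) / k)
    half = n_tau // 2
    fine = [i * sigma / half for i in range(half + 1)]
    coarse = [sigma + i * (T - sigma) / half for i in range(1, half + 1)]
    return fine + coarse
```

For k = 2^9 and Nτ = 32 the transition point is σ = (ln 32)/512 ≈ 0.0068, so half of the mesh points are squeezed into the thin initial layer.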
Table 2. Computed maximum pointwise error for α = 0 on a piecewise uniform mesh for various values of k and N = Nx = Nτ.

k\N     8             16            32            64            128           256           512
2^-1    2.1701e-004   1.1051e-004   5.5659e-005   2.7921e-005   1.3982e-005   6.9962e-006   3.4994e-006
2^0     8.1383e-004   4.1491e-004   2.0912e-004   1.0492e-004   5.2543e-005   2.6291e-005   1.3150e-005
2^1     2.8351e-003   1.4531e-003   7.3503e-004   3.6951e-004   1.8523e-004   9.2735e-005   4.6397e-005
2^2     8.4554e-003   4.4332e-003   2.2646e-003   1.1436e-003   5.7455e-004   2.8795e-004   1.4415e-004
2^3     1.9340e-002   1.0365e-002   5.3669e-003   2.7323e-003   1.3784e-003   6.9225e-004   3.4689e-004
2^4     3.7801e-002   2.0799e-002   1.0927e-002   5.5933e-003   2.8292e-003   1.4224e-003   7.1314e-004
2^5     7.0453e-002   4.0019e-002   2.1482e-002   1.1133e-002   5.6614e-003   2.8529e-003   1.4317e-003
2^6     7.3221e-002   5.3686e-002   3.5841e-002   2.1732e-002   1.1191e-002   5.6732e-003   2.8554e-003
2^7     7.3537e-002   5.4142e-002   3.6010e-002   2.2586e-002   1.3505e-002   7.8266e-003   4.4373e-003
2^8     7.4511e-002   5.4283e-002   3.6068e-002   2.2612e-002   1.3509e-002   7.8271e-003   4.4373e-003
2^9     7.5007e-002   5.4327e-002   3.6149e-002   2.2626e-002   1.3511e-002   7.8273e-003   4.4373e-003
2^10    7.5258e-002   5.4369e-002   3.6176e-002   2.2631e-002   1.3512e-002   7.8273e-003   4.4373e-003
2^11    7.5384e-002   5.4566e-002   3.6184e-002   2.2637e-002   1.3512e-002   7.8274e-003   4.4373e-003
2^12    7.5448e-002   5.4665e-002   3.6187e-002   2.2641e-002   1.3513e-002   7.8274e-003   4.4373e-003
2^13    7.5479e-002   5.4714e-002   3.6198e-002   2.2642e-002   1.3513e-002   7.8274e-003   4.4373e-003
2^14    7.5495e-002   5.4739e-002   3.6232e-002   2.2642e-002   1.3513e-002   7.8274e-003   4.4373e-003
2^15    7.5503e-002   5.4751e-002   3.6249e-002   2.2642e-002   1.3513e-002   7.8274e-003   4.4373e-003
2^16    7.5507e-002   5.4758e-002   3.6258e-002   2.2643e-002   1.3513e-002   7.8274e-003   4.4373e-003
2^17    7.5509e-002   5.4761e-002   3.6262e-002   2.2645e-002   1.3513e-002   7.8274e-003   4.4373e-003
2^18    7.5510e-002   5.4762e-002   3.6265e-002   2.2648e-002   1.3513e-002   7.8274e-003   4.4373e-003
2^19    7.5510e-002   5.4763e-002   3.6266e-002   2.2649e-002   1.3513e-002   7.8274e-003   4.4373e-003
2^20    7.5511e-002   5.4763e-002   3.6266e-002   2.2650e-002   1.3513e-002   7.8274e-003   4.4373e-003
2^21    7.5511e-002   5.4764e-002   3.6266e-002   2.2650e-002   1.3513e-002   7.8274e-003   4.4373e-003
2^22    7.5511e-002   5.4764e-002   3.6267e-002   2.2650e-002   1.3513e-002   7.8274e-003   4.4373e-003
6. Summary
We discussed a dimensionless formulation of the Black-Scholes equation for the value of a European call option. We observed that, for some values of the parameters, this may be a singularly perturbed problem. We demonstrated numerically that, in such a case, a standard numerical method on a uniform mesh does not produce robust numerical solutions. We then constructed a new numerical method, on an appropriately fitted piecewise-uniform mesh, which generates numerical approximations that converge parameter-uniformly in the maximum norm to the exact solution.
Figure 3. Graph of error for piecewise uniform mesh with α = 0, k = 2^9, Nx = Nτ = 32.
Acknowledgments
This research was supported in part by the Enterprise Ireland Basic Research Grant No. SC-2000-070 and by the Russian Foundation for Basic Research Grant No. 01-01-01022.
References
[1] Farrell, P.A., Hegarty, A.F., Miller, J.J.H., O'Riordan, E., Shishkin, G.I., Robust Computational Techniques for Boundary Layers, Chapman and Hall/CRC, Boca Raton (2000).
[2] Miller, J.J.H., O'Riordan, E., Shishkin, G.I., Fitted Numerical Methods for Singular Perturbation Problems, World Scientific, Singapore (1996).
[3] Shaw, W., Modelling Financial Derivatives with Mathematica, Cambridge University Press (1998).
[4] Shishkin, G.I., Approximation of solutions of singularly perturbed boundary value problems with a parabolic boundary layer, USSR Comput. Maths. Math. Phys., 29, 1–10 (1989).
[5] Shishkin, G.I., On finite difference fitted schemes for singularly perturbed boundary value problems with a parabolic boundary layer, J. Math. Anal. & Applications, 208, 181–204 (1997).
[6] Shishkin, G.I., On finite difference schemes for singularly perturbed problems with an initial parabolic layer, Communications in Applied Analysis, 5 (1), 1–16 (2001).
[7] Wilmott, P., Howison, S., Dewynne, J., The Mathematics of Financial Derivatives, Cambridge University Press (1995).
[8] Shishkin, G.I., Grid Approximations of Singularly Perturbed Elliptic and Parabolic Equations, Ural Branch of Russian Acad. Sci., Ekaterinburg (1992) (in Russian).
II
CONTRIBUTED LECTURES
ON CERTAIN PROPERTIES OF SPACES OF LOCALLY SOBOLEV FUNCTIONS
Nenad Antonić
Department of Mathematics, University of Zagreb
Bijenička cesta 30, 10000 Zagreb, Croatia
[email protected]
Krešimir Burazin
Department of Mathematics, University of Osijek
Trg Ljudevita Gaja 6, 31000 Osijek, Croatia
[email protected]
Abstract
In recent years the locally Sobolev functions got quite popular in works on applications of partial differential equations. However, the properties of those spaces have not been systematically studied and proved in the literature, resulting in many particular proofs by reduction to classical Sobolev spaces. Following some hints of general theory scattered through classical literature, as well as some proofs of special cases, we systematically present the main results regarding the properties of the W^{m,p}_loc and W^{m,p}_c spaces, their duality, reflexivity, imbeddings, density, weak topologies, etc., with particular emphasis on applications in partial differential equations of mathematical physics.
Keywords:
locally Sobolev functions, function space
1. Introduction
Let us briefly state some basic facts concerning Sobolev spaces; more details can be found in [1], [2], [3], [4], [8] and [10]. Even though most of the statements are true for functions defined on paracompact manifolds with appropriate smoothness, bearing in mind the intended use in the theory of partial differential equations and applications, we shall restrict ourselves to functions defined on an open set Ω ⊆ R^d. The functions will in general be complex-valued (thus covering the real case as well), although the results are true for functions taking values in C^r as well. Assuming the familiarity of the reader with standard properties of Lebesgue spaces L^p(Ω), for p ∈ [1, ∞], and distributions (the derivatives will be assumed
110
APPLIED MATHEMATICS AND SCIENTIFIC COMPUTING
in the distributional sense), for m ∈ N 0 we define: ⎧ 1 ⎨ , p α f p ∂ , p |α|≤m L (Ω) f Wm,p (Ω) := ⎩ max α |α|≤m ∂ f L∞ (Ω) ,
1 ≤ p < ∞, p = ∞,
and the Sobolev spaces $W^{m,p}(\Omega) := \{ f \in L^p(\Omega) : \|f\|_{W^{m,p}(\Omega)} < \infty \}$, which are Banach spaces with the above norm. In particular, $W^{0,p}(\Omega) = L^p(\Omega)$, and we denote $H^m(\Omega) := W^{m,2}(\Omega)$, which are Hilbert spaces. The space $\mathcal{D}(\Omega)$ ($= C^\infty_c(\Omega)$) is continuously imbedded in $W^{m,p}(\Omega)$, and by $W^{m,p}_0(\Omega)$ we denote the closure of $\mathcal{D}(\Omega)$ in $W^{m,p}(\Omega)$, which contains all functions from $W^{m,p}(\Omega)$ with compact support. The spaces $W^{m,p}(\Omega)$ and $W^{m,p}_0(\Omega)$ are continuously imbedded in distributions; if $p < \infty$ they are separable and imbedded in the pair of weak topologies as well; they are reflexive and uniformly convex for $1 < p < \infty$. For $\varphi \in \mathcal{D}(\Omega)$, the mapping $f \mapsto \varphi f$ is a continuous linear operator $W^{m,p}(\Omega) \longrightarrow W^{m,p}_0(\Omega)$. For $p \in [1,\infty]$, let $p'$ be the conjugate exponent defined by $\frac{1}{p} + \frac{1}{p'} = 1$. If
$p < \infty$, by $W^{-m,p'}(\Omega) = W^{-m,p'}_0(\Omega)$ we denote the space of all continuous antilinear functionals on $W^{m,p}_0(\Omega)$. $W^{-m,p'}(\Omega)$ is continuously imbedded in distributions, which remains true if we consider the weak $*$ topology on $W^{-m,p'}(\Omega)$ and the weak ($*$) topology on $\mathcal{D}'(\Omega)$. Due to the imbedding by the transpose operator, the duality product is preserved. For $|\alpha| \le m$ and $v \in L^{p'}(\Omega)$, the distribution $\partial^\alpha v$ is in $W^{-m,p'}(\Omega)$, in the sense (note our consistent use of sesquilinear products)
\[
{}_{W^{-m,p'}(\Omega)}\langle \partial^\alpha v, u \rangle_{W^{m,p}_0(\Omega)}
= {}_{L^{p'}(\Omega)}\langle v, (-1)^{|\alpha|}\partial^\alpha u \rangle_{L^p(\Omega)}
= (-1)^{|\alpha|} \int_\Omega v\, \overline{\partial^\alpha u}\, .
\]
More precisely, $f \mapsto \partial^\alpha f$ is a continuous mapping $L^{p'}(\Omega) \longrightarrow W^{-m,p'}(\Omega)$; the space $L^{p'}(\Omega)$ (thus $\mathcal{D}(\Omega)$ as well) is continuously, and for $p > 1$ also densely, imbedded in $W^{-m,p'}(\Omega)$. The space $W^{-m,p'}(\Omega)$ is isometrically isomorphic to the Banach space of distributions of the form
\[
T = \sum_{|\alpha|\le m} (-1)^{|\alpha|}\partial^\alpha v_\alpha ,
\]
for some $v = (v_\alpha)_{|\alpha|\le m} \in L^{p'}(\Omega;\mathbf{C}^N)$ ($N := \sum_{|\alpha|\le m} 1$), with the norm
\[
\|T\| := \inf \Bigl\{ \|v\|_{L^{p'}(\Omega;\mathbf{C}^N)} : T = \sum_{|\alpha|\le m} (-1)^{|\alpha|}\partial^\alpha v_\alpha \Bigr\} .
\]
The multiplication of distributions by a test function is a continuous linear operator on $W^{-m,p'}(\Omega)$ and satisfies
\[
{}_{W^{-m,p'}(\Omega)}\langle \varphi v, u \rangle_{W^{m,p}_0(\Omega)}
:= {}_{W^{-m,p'}(\Omega)}\langle v, \bar\varphi u \rangle_{W^{m,p}_0(\Omega)} .
\]
For results concerning locally convex topological vector spaces, in particular those equipped with the topology of a strict inductive limit, we refer to [5], [7], [9], [10] or [12]. Concluding these introductory remarks, let us note that the impetus for writing down precise statements and proofs of these results arose when the first author was teaching a course on homogenisation theory following some unpublished lecture notes by Luc Tartar.
2. Spaces of locally Sobolev functions
Let $m \in \mathbf{Z}$, $1 < p \le \infty$ (if $m \ge 0$ we also allow $p = 1$); for $K \in \mathcal{K}(\Omega)$ (compact) we define
\[
W^{m,p}_K(\Omega) := \{ f \in W^{m,p}(\Omega) : \mathop{\rm supp} f \subseteq K \} .
\]

Lemma 1. $W^{m,p}_K(\Omega)$ is a Banach subspace of $W^{m,p}(\Omega)$.
Next we define
\[
W^{m,p}_c(\Omega) := \bigcup_{K \in \mathcal{K}(\Omega)} W^{m,p}_K(\Omega) = W^{m,p}(\Omega) \cap \mathcal{E}'(\Omega) .
\]
On $W^{m,p}_c(\Omega)$ we define the strict inductive limit topology of the spaces $W^{m,p}_K(\Omega)$, $K \in \mathcal{K}(\Omega)$. With this topology $W^{m,p}_c(\Omega)$ is a complete locally convex topological vector space which is not metrisable. Also, $W^{m,p}_c(\Omega)$ is continuously imbedded in $W^{m,p}(\Omega)$. Similarly as in [8] (Lemma 5.2, p. 256), the following can be shown.
Lemma 2. The canonical imbedding $C^\infty_c(\Omega) \hookrightarrow W^{m,p}_c(\Omega)$ is continuous. Furthermore, if $p < \infty$, it has a dense image.

Due to the imbedding by the transpose operator we get

Corollary 1. For $p < \infty$, the space $(W^{m,p}_c(\Omega))'$ of all continuous antilinear functionals on $W^{m,p}_c(\Omega)$ is continuously imbedded in $\mathcal{D}'(\Omega)$, both in the pair of strong and the pair of weak $*$ topologies.

Let $\Phi$ be a subset of $\mathcal{D}(\Omega)$ with the following property:
\[
(\forall x \in \Omega)(\exists \varphi \in \Phi)\quad \mathop{\rm Re}\varphi(x) > 0 . \tag{1}
\]
We define
\[
W^{m,p}_{\rm loc}(\Omega) := \{ T \in \mathcal{D}'(\Omega) : (\forall \varphi \in \Phi)\ \varphi T \in W^{m,p}(\Omega) \} .
\]
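A simple example (ours, not taken from the paper) shows that the inclusion of $W^{m,p}(\Omega)$ into $W^{m,p}_{\rm loc}(\Omega)$, established below, is strict in general:

```latex
% The constant function f \equiv 1 on \Omega = \mathbf{R}^d satisfies
% \varphi f = \varphi \in \mathcal{D}(\mathbf{R}^d) \subseteq W^{m,p}(\mathbf{R}^d)
% for every test function \varphi, hence f \in W^{m,p}_{loc}(\mathbf{R}^d);
% however f \notin L^p(\mathbf{R}^d) for p < \infty, so f \notin W^{m,p}(\mathbf{R}^d):
f \equiv 1 \ \in\ W^{m,p}_{loc}(\mathbf{R}^d) \setminus W^{m,p}(\mathbf{R}^d),
\qquad m \in \mathbf{N}_0,\ 1 \le p < \infty .
```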
$W^{m,p}_{\rm loc}(\Omega)$ is a vector subspace of $\mathcal{D}'(\Omega)$, and we equip it with the locally convex topology induced by the family of seminorms $(|\cdot|_\varphi)_{\varphi\in\Phi}$, where, for $\varphi \in \Phi$ and $T \in W^{m,p}_{\rm loc}(\Omega)$,
\[
|T|_\varphi := \|\varphi T\|_{W^{m,p}(\Omega)} .
\]

Lemma 3. The definition of $W^{m,p}_{\rm loc}(\Omega)$ and its topology do not depend on the choice of a family $\Phi$ with property (1).
Proof. Let us show that the space defined by $\mathcal{D}(\Omega)$ is included in the one generated by $\Phi$. Take $T \in \mathcal{D}'(\Omega)$ such that $\varphi T \in W^{m,p}(\Omega)$ for every $\varphi \in \Phi$. For arbitrary $\psi \in \mathcal{D}(\Omega)$, let us show $\psi T \in W^{m,p}(\Omega)$. For $x \in \Omega$ there exists $\varphi \in \Phi$ such that $\mathop{\rm Re}\varphi(x) > 0$. The family $(U^x_\varphi)_{x\in\Omega}$, $U^x_\varphi := \{ y \in \Omega : \mathop{\rm Re}\varphi(y) > \frac{1}{2}\mathop{\rm Re}\varphi(x) \}$, is an open cover of $\Omega$. In particular, it is an open cover of the compact set $\mathop{\rm supp}\psi$, and therefore there exists a finite subcover $U^{x_1}_{\varphi_1}, U^{x_2}_{\varphi_2}, \dots, U^{x_n}_{\varphi_n}$. Denoting $U := \bigcup_{i=1}^n U^{x_i}_{\varphi_i}$, the function $\frac{1}{\sum_{i=1}^n \varphi_i} \in C^\infty(U)$ is well defined and
\[
\psi T = (\psi T)|_U = \sum_{i=1}^n \frac{\psi}{\sum_{j=1}^n \varphi_j}\, \varphi_i T \ \in\ W^{m,p}(\Omega) .
\]
The other inclusion is obvious, so the sets are equal. Clearly, the topology defined by $\mathcal{D}(\Omega)$ is stronger than the one defined by $\Phi$. To show the converse, take $\psi$ and $\varphi_1, \varphi_2, \dots, \varphi_n$ as before:
\[
\|\psi T\|_{W^{m,p}(\Omega)} = \Bigl\| \sum_{i=1}^n \frac{\psi}{\sum_{j=1}^n \varphi_j}\, \varphi_i T \Bigr\|_{W^{m,p}(\Omega)} \le M \sum_{i=1}^n \|\varphi_i T\|_{W^{m,p}(\Omega)} ,
\]
which proves the statement.

Lemma 4. For each nonzero distribution $T \in W^{m,p}_{\rm loc}(\Omega)$ there exists $\varphi \in \Phi$ such that $|T|_\varphi \ne 0$.

This lemma implies that the above defined topology on $W^{m,p}_{\rm loc}(\Omega)$ is Hausdorff, and since we can choose a countable $\Phi = \{\varphi_n : n \in \mathbf{N}\}$, it follows that $W^{m,p}_{\rm loc}(\Omega)$ is metrisable, a metric being
\[
d(T,S) := \max_{n\in\mathbf{N}} 2^{-n} \frac{|T-S|_{\varphi_n}}{1 + |T-S|_{\varphi_n}} .
\]
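The construction of $d$ from the seminorms is the standard metrisation trick; a small sketch (ours, with toy stand-in seminorms, not an implementation of $W^{m,p}_{\rm loc}$) may make it concrete:

```python
# d(T,S) = max_n 2^{-n} q_n / (1 + q_n), q_n = |T - S|_{phi_n}: a metric built
# from countably many seminorms; here the "seminorms" are coordinate moduli.

def metric(T, S, n_terms=50):
    best = 0.0
    for n in range(1, n_terms + 1):
        q = abs(T(n) - S(n))             # stand-in for |T - S|_{phi_n}
        best = max(best, 2.0 ** (-n) * q / (1.0 + q))
    return best

T = lambda n: 1.0 / n
S = lambda n: 0.0
assert metric(T, T) == 0.0               # d(T, T) = 0
assert 0.0 < metric(T, S) <= 0.5         # each term is at most 2^{-n}/1 <= 1/2
```

Because of the factors $2^{-n}$ the maximum is attained, and $d(T,S) = 0$ forces every seminorm of $T - S$ to vanish, which is where Lemma 4 (the Hausdorff property) enters.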
Theorem 1. With this metric, $W^{m,p}_{\rm loc}(\Omega)$ is a Fréchet space.
Proof. It remains to show completeness. Let $(T_j)$ be a Cauchy sequence in $W^{m,p}_{\rm loc}(\Omega)$; then for each $n \in \mathbf{N}$, $(\varphi_n T_j)_j$ is a Cauchy sequence in $W^{m,p}(\Omega)$, and therefore converges to some $T^n$. In particular,
\[
\varphi_n T_j \longrightarrow T^n \quad\text{in } \mathcal{D}'(\Omega) .
\]
Take $\psi \in \mathcal{D}(\Omega)$ arbitrary, and let $U^{x_1}_{\varphi_1}, U^{x_2}_{\varphi_2}, \dots, U^{x_m}_{\varphi_m}$ be a finite subcover for $\mathop{\rm supp}\psi$ as in the proof of Lemma 3. Then
\[
\langle T_j, \psi \rangle = \sum_{n=1}^m \Bigl\langle \varphi_n T_j, \frac{\psi}{\sum_{k=1}^m \bar\varphi_k} \Bigr\rangle \longrightarrow \sum_{n=1}^m \Bigl\langle T^n, \frac{\psi}{\sum_{k=1}^m \bar\varphi_k} \Bigr\rangle ,
\]
which implies that $\langle T, \psi\rangle := \lim_{j\to\infty} \langle T_j, \psi\rangle$ defines a distribution $T$. For $\varphi_n \in \Phi$ it holds
\[
\langle \varphi_n T, \psi \rangle = \langle T, \bar\varphi_n \psi \rangle = \lim_{j\to\infty} \langle T_j, \bar\varphi_n \psi \rangle = \lim_{j\to\infty} \langle \varphi_n T_j, \psi \rangle = \langle T^n, \psi \rangle ,
\]
so $\varphi_n T = T^n \in W^{m,p}(\Omega)$, and therefore $T \in W^{m,p}_{\rm loc}(\Omega)$. Now, as $\varphi_n T_j \longrightarrow \varphi_n T$ in $W^{m,p}(\Omega)$ for every $\varphi_n \in \Phi$, it follows that $T_j$ converges to $T$ in the metric of $W^{m,p}_{\rm loc}(\Omega)$.

For $m \in \mathbf{N}_0$, $f \in W^{m,p}_{\rm loc}(\Omega)$ is a measurable function on $\Omega$, and for $\varphi \in \mathcal{D}(\Omega)$ we have an alternative representation of the duality by an integral:
\[
{}_{\mathcal{D}'(\Omega)}\langle f, \varphi \rangle_{\mathcal{D}(\Omega)} = \int_\Omega f \bar\varphi .
\]
The Leibniz formula implies
\[
(\forall \alpha \in \mathbf{N}^d_0)\quad |\alpha| \le m \Longrightarrow \partial^\alpha f \in L^p_{\rm loc}(\Omega) ,
\]
which is also a sufficient condition for $f \in W^{m,p}_{\rm loc}(\Omega)$. As the multiplication by a test function is continuous on $W^{m,p}(\Omega)$, for $\varphi \in \mathcal{D}(\Omega)$ and $f \in W^{m,p}(\Omega)$ we have $\|\varphi f\|_{W^{m,p}(\Omega)} \le M_\varphi \|f\|_{W^{m,p}(\Omega)}$, implying the following.

Lemma 5. $W^{m,p}(\Omega)$ is continuously imbedded in $W^{m,p}_{\rm loc}(\Omega)$.

Lemma 6. The canonical imbedding $C^\infty(\Omega) \hookrightarrow W^{m,p}_{\rm loc}(\Omega)$ is continuous. Furthermore, if $p < \infty$ then $C^\infty_c(\Omega)$ is dense in $W^{m,p}_{\rm loc}(\Omega)$.
Lemma 6 can be proved along the same lines as [8] (Lemma 5.5, p. 257); the detailed proof can be found in [6]. Similarly as Corollary 1 we get

Corollary 2. For $p < \infty$, the space $(W^{m,p}_{\rm loc}(\Omega))'$ of all continuous antilinear functionals on $W^{m,p}_{\rm loc}(\Omega)$ is continuously imbedded in $\mathcal{D}'(\Omega)$, both in the pair of strong and the pair of weak $*$ topologies.
3. Duality of the spaces $W^{m,p}_c(\Omega)$ and $W^{-m,p'}_{\rm loc}(\Omega)$
In the sequel we assume p < ∞.
Theorem 2. $W^{-m,p'}_{\rm loc}(\Omega)$ is topologically isomorphic to the strong dual of $W^{m,p}_c(\Omega)$.
Proof. Let us first show that $W^{-m,p'}_{\rm loc}(\Omega) = (W^{m,p}_c(\Omega))'$ (as subsets of $\mathcal{D}'(\Omega)$). A functional $T \in (W^{m,p}_c(\Omega))'$ can be restricted to a continuous antilinear functional on $\mathcal{D}(\Omega)$, as well as to one on $W^{m,p}_K(\Omega)$, for $K \subseteq \Omega$ compact. For $\varphi, \psi \in \mathcal{D}(\Omega)$:
\[
{}_{\mathcal{D}'(\Omega)}\langle \varphi T, \psi \rangle_{\mathcal{D}(\Omega)}
= {}_{\mathcal{D}'(\Omega)}\langle T, \bar\varphi\psi \rangle_{\mathcal{D}(\Omega)}
= {}_{(W^{m,p}_c(\Omega))'}\langle T, \bar\varphi\psi \rangle_{W^{m,p}_c(\Omega)}
= {}_{(W^{m,p}_K(\Omega))'}\langle T, \bar\varphi\psi \rangle_{W^{m,p}_K(\Omega)} ,
\]
where $K = \mathop{\rm supp}\varphi$, which implies
\[
\bigl| {}_{\mathcal{D}'(\Omega)}\langle \varphi T, \psi \rangle_{\mathcal{D}(\Omega)} \bigr| \le \|T\|_{(W^{m,p}_K(\Omega))'} \|\bar\varphi\psi\|_{W^{m,p}_K(\Omega)} \le C \|\psi\|_{W^{m,p}_0(\Omega)} ,
\]
and we conclude that $\varphi T$ is a continuous antilinear functional on $\mathcal{D}(\Omega)$ in the topology of $W^{m,p}_0(\Omega)$. Since $\mathcal{D}(\Omega)$ is dense in $W^{m,p}_0(\Omega)$, $\varphi T \in W^{-m,p'}(\Omega)$, thus $T \in W^{-m,p'}_{\rm loc}(\Omega)$.

Conversely, we need to interpret $T \in W^{-m,p'}_{\rm loc}(\Omega)$ as a continuous antilinear functional $\tilde T$ on $W^{m,p}_c(\Omega)$:
\[
\langle \tilde T, f \rangle := {}_{W^{-m,p'}(\Omega)}\langle \varphi T, f \rangle_{W^{m,p}_0(\Omega)} .
\]
Here, for given $f$, we took an open $U$, $\mathop{\rm supp} f \subseteq U \subseteq \Omega$, and $\varphi \in \mathcal{D}(\Omega)$ such that $\varphi$ is identically $1$ on $U$. It is not difficult to show that this definition does not depend on the particular choice of $U$ and $\varphi$. In order to prove that $\tilde T$ is continuous in the strict inductive limit topology on $W^{m,p}_c(\Omega)$, it is enough to show that it is continuous on $W^{m,p}_K(\Omega)$, for $K \subseteq \Omega$ compact. Take $\varphi \in \mathcal{D}(\Omega)$ which is identically $1$ on some open set containing $K$. Then, for $f \in W^{m,p}_K(\Omega)$,
\[
|\langle \tilde T, f \rangle| = \bigl| {}_{W^{-m,p'}(\Omega)}\langle \varphi T, f \rangle_{W^{m,p}_0(\Omega)} \bigr| \le \|\varphi T\|_{W^{-m,p'}(\Omega)} \|f\|_{W^{m,p}_0(\Omega)} .
\]
Since $T$ and $\tilde T$ agree on test functions, $\tilde T$ is the unique continuous extension of $T$ to $W^{m,p}_c(\Omega)$.

It remains to be shown that the earlier defined topology on $W^{-m,p'}_{\rm loc}(\Omega)$ is indeed the strong topology $\beta(W^{-m,p'}_{\rm loc}(\Omega), W^{m,p}_c(\Omega))$, given by the seminorms
\[
p_U(T) = \sup_{f\in U} \bigl| {}_{W^{-m,p'}_{\rm loc}(\Omega)}\langle T, f \rangle_{W^{m,p}_c(\Omega)} \bigr| ,
\]
where $U \subseteq W^{m,p}_c(\Omega)$ is bounded. For such $U$, let $K$ be a compact set such that $U$ is a bounded subset of $W^{m,p}_K(\Omega)$, and $C > 0$ such that $\|f\|_{W^{m,p}_0(\Omega)} \le C$ for $f \in U$. Taking a test function $\varphi$ identically equal to $1$ on some open set containing $K$:
\[
p_U(T) = \sup_{f\in U} \bigl| {}_{W^{-m,p'}(\Omega)}\langle \varphi T, f \rangle_{W^{m,p}_0(\Omega)} \bigr| \le \|\varphi T\|_{W^{-m,p'}(\Omega)} \sup_{f\in U} \|f\|_{W^{m,p}_0(\Omega)} \le C |T|_\varphi ,
\]
so $\beta(W^{-m,p'}_{\rm loc}(\Omega), W^{m,p}_c(\Omega))$ is weaker than the usual topology on the space $W^{-m,p'}_{\rm loc}(\Omega)$. To show the converse, take a test function $\varphi$ and a compact $K$ such that $\mathop{\rm supp}\varphi \subseteq \mathop{\rm Int} K$. If $M_\varphi > 0$ is such that $\|\varphi f\|_{W^{m,p}_0(\Omega)} \le M_\varphi \|f\|_{W^{m,p}_0(\Omega)}$, then
\[
\begin{aligned}
|T|_\varphi = \|\varphi T\|_{W^{-m,p'}(\Omega)}
&= \sup_{\|f\|_{W^{m,p}_0(\Omega)} \le 1} \bigl| {}_{W^{-m,p'}(\Omega)}\langle \varphi T, f \rangle_{W^{m,p}_0(\Omega)} \bigr|
= \sup_{\substack{\psi\in C^\infty_K(\Omega) \\ \|\psi\|_{W^{m,p}_0(\Omega)}\le 1}} \bigl| {}_{W^{-m,p'}(\Omega)}\langle \varphi T, \psi \rangle_{W^{m,p}_0(\Omega)} \bigr| \\
&\le \sup_{\substack{f\in W^{m,p}_K(\Omega) \\ \|f\|_{W^{m,p}_0(\Omega)}\le M_\varphi}} \bigl| {}_{W^{-m,p'}_{\rm loc}(\Omega)}\langle T, f \rangle_{W^{m,p}_c(\Omega)} \bigr|
= p_B(T) ,
\end{aligned}
\]
where $B$ is the open ball in $W^{m,p}_K(\Omega)$ around zero of radius $M_\varphi + 1$, which is bounded in $W^{m,p}_c(\Omega)$.
From the previous theorem and Corollary 1 we obtain
Corollary 3. $W^{-m,p'}_{\rm loc}(\Omega)$ is continuously imbedded in $\mathcal{D}'(\Omega)$, both in the strong topologies and in the weak $*$ topologies.
Theorem 3. $W^{-m,p'}_c(\Omega)$ is linearly isomorphic to the dual of $W^{m,p}_{\rm loc}(\Omega)$.
Proof. Let $T \in W^{-m,p'}_c(\Omega) \subseteq W^{-m,p'}(\Omega)$, and let $\varphi$ be a test function identically equal to $1$ on some open set containing $\mathop{\rm supp} T$. For $f \in W^{m,p}_{\rm loc}(\Omega)$, by
\[
\langle \tilde T, f \rangle := {}_{W^{-m,p'}(\Omega)}\langle T, \varphi f \rangle_{W^{m,p}_0(\Omega)}
\]
the antilinear extension of $T$ to $W^{m,p}_{\rm loc}(\Omega)$ is well defined. Since
\[
|\langle \tilde T, f \rangle| \le \|T\|_{W^{-m,p'}(\Omega)} \|\varphi f\|_{W^{m,p}_0(\Omega)} ,
\]
$\tilde T$ is continuous on $W^{m,p}_{\rm loc}(\Omega)$.

Now, let $T$ be a continuous antilinear functional on the space $W^{m,p}_{\rm loc}(\Omega)$. Since $W^{m,p}_0(\Omega)$ is continuously and densely imbedded in $W^{m,p}_{\rm loc}(\Omega)$, it follows that $T \in W^{-m,p'}(\Omega)$. Furthermore, since $C^\infty(\Omega)$ is continuously and densely imbedded in $W^{m,p}_{\rm loc}(\Omega)$, it follows that $T \in \mathcal{E}'(\Omega)$, and therefore $T \in W^{-m,p'}_c(\Omega)$.

If we restrict to $1 < p < \infty$, we have the following two corollaries.

Corollary 4. The spaces $W^{m,p}_c(\Omega)$ and $W^{m,p}_{\rm loc}(\Omega)$ are reflexive.
Proof. From the above two theorems it follows that $W^{m,p}_c(\Omega)$ is semireflexive, and since it is also barrelled (as a strict inductive limit of Banach spaces), it is reflexive. Since the strong dual of a reflexive space is also reflexive, we conclude that $W^{m,p}_{\rm loc}(\Omega)$ is reflexive.
Corollary 5. The space $W^{-m,p'}_c(\Omega)$ is continuously imbedded in $\mathcal{D}'(\Omega)$, both in the strong topologies and in the weak $*$ topologies.

Remark 1. Note that Corollaries 3 and 5 do not include the spaces $W^{m,1}_{\rm loc}(\Omega)$, $W^{m,1}_c(\Omega)$ and $W^{m,\infty}_c(\Omega)$. By comparing adequate seminorms that generate the involved topologies, one can easily check the following statements:
a) $W^{m,1}_{\rm loc}(\Omega)$ and $W^{m,1}_c(\Omega)$ are continuously imbedded in $\mathcal{D}'(\Omega)$, both in the strong and the weak topologies;
b) $W^{m,\infty}_c(\Omega)$ is continuously imbedded in $\mathcal{D}'(\Omega)$; if $m < 0$, both in the strong and the weak $*$ topologies.
4. Weak convergence and some imbeddings

In the sequel let $m \in \mathbf{N}_0$ and $p < \infty$, if not explicitly stated otherwise.
Theorem 4. The space $L^{p'}_{\rm loc}(\Omega)$ is continuously imbedded in $W^{-m,p'}_{\rm loc}(\Omega)$. Furthermore, for $p > 1$, $L^{p'}_{\rm loc}(\Omega)$ is dense in $W^{-m,p'}_{\rm loc}(\Omega)$.
Proof. If $f \in L^{p'}_{\rm loc}(\Omega)$, then for each test function $\varphi$, $\varphi f \in L^{p'}(\Omega) \subseteq W^{-m,p'}(\Omega)$, thus $f \in W^{-m,p'}_{\rm loc}(\Omega)$. It is easy to see that this mapping is linear and injective, and since $\|\varphi f\|_{W^{-m,p'}(\Omega)} \le M \|\varphi f\|_{L^{p'}(\Omega)}$, also continuous. The density can be shown as in [1] (3.12, p. 51).
Theorem 5. For $|\alpha| \le m$, $\partial^\alpha : L^{p'}_{\rm loc}(\Omega) \longrightarrow W^{-m,p'}_{\rm loc}(\Omega)$ is a continuous linear mapping.
Proof. We use mathematical induction on $k = |\alpha| \le m$, the case $k = 0$ being clear from the previous theorem. For $|\alpha| = k+1 \le m$, $f \in L^{p'}_{\rm loc}(\Omega)$ and $\varphi \in \mathcal{D}(\Omega)$, we have $\partial^\alpha(\varphi f) \in W^{-m,p'}(\Omega)$. The Leibniz formula implies
\[
\varphi\,\partial^\alpha f = \partial^\alpha(\varphi f) - \sum_{\beta<\alpha} \binom{\alpha}{\beta} \partial^{\alpha-\beta}\varphi\, \partial^\beta f ,
\]
and by the inductive assumption all $\partial^{\alpha-\beta}\varphi\,\partial^\beta f$ belong to $W^{-m,p'}(\Omega)$, so we have $\varphi\,\partial^\alpha f \in W^{-m,p'}(\Omega)$. Since the mappings $f \mapsto \partial^\alpha(\varphi f)$ and $f \mapsto \partial^{\alpha-\beta}\varphi\,\partial^\beta f$ are continuous from $L^{p'}_{\rm loc}(\Omega)$ to $W^{-m,p'}(\Omega)$, it follows that so is
$f \mapsto \varphi\,\partial^\alpha f$. This implies $\partial^\alpha f \in W^{-m,p'}_{\rm loc}(\Omega)$, and $f \mapsto \partial^\alpha f$ is a continuous linear mapping $L^{p'}_{\rm loc}(\Omega) \longrightarrow W^{-m,p'}_{\rm loc}(\Omega)$.

Theorem 6. Let $N := \sum_{|\alpha|\le m} 1$. Then every distribution $f \in W^{-m,p'}_{\rm loc}(\Omega)$ has the form
\[
f = \sum_{|\alpha|\le m} (-1)^{|\alpha|}\partial^\alpha v_\alpha ,
\]
for some $v = (v_\alpha)_{|\alpha|\le m} \in L^{p'}_{\rm loc}(\Omega;\mathbf{C}^N)$, i.e. there is such a $v$ that
\[
(\forall u \in W^{m,p}_c(\Omega))\qquad {}_{W^{-m,p'}_{\rm loc}(\Omega)}\langle f, u \rangle_{W^{m,p}_c(\Omega)} = \sum_{|\alpha|\le m} {}_{L^{p'}_{\rm loc}(\Omega)}\langle v_\alpha, \partial^\alpha u \rangle_{L^p_c(\Omega)} .
\]
Proof. $Pu := (\partial^\alpha u)_\alpha$ defines a continuous linear mapping $P : W^{m,p}_c(\Omega) \longrightarrow L^p_c(\Omega;\mathbf{C}^N)$, which is an isomorphism onto its image $W := \mathop{\rm im} P$. For $f \in W^{-m,p'}_{\rm loc}(\Omega)$ we define $f^* : W \longrightarrow \mathbf{C}$ by
\[
f^*(w) := {}_{W^{-m,p'}_{\rm loc}(\Omega)}\langle f, P^{-1}(w) \rangle_{W^{m,p}_c(\Omega)} .
\]
It is easy to see that $f^* \in W'$, so by the Hahn–Banach theorem there is a continuous antilinear extension $\tilde f$ to $L^p_c(\Omega;\mathbf{C}^N)$. By the Riesz representation,
\[
(\exists v \in L^{p'}_{\rm loc}(\Omega;\mathbf{C}^N))(\forall w \in L^p_c(\Omega;\mathbf{C}^N))\qquad \tilde f(w) = \sum_{|\alpha|\le m} {}_{L^{p'}_{\rm loc}(\Omega)}\langle v_\alpha, w_\alpha \rangle_{L^p_c(\Omega)} .
\]
In particular, for $u \in W^{m,p}_c(\Omega)$ we have
\[
{}_{W^{-m,p'}_{\rm loc}(\Omega)}\langle f, u \rangle_{W^{m,p}_c(\Omega)} = f^*(Pu) = \tilde f(Pu) = \sum_{|\alpha|\le m} {}_{L^{p'}_{\rm loc}(\Omega)}\langle v_\alpha, \partial^\alpha u \rangle_{L^p_c(\Omega)} .
\]
Theorem 7. A sequence $(u_n)$ converges weakly in $W^{m,p}_c(\Omega)$ if and only if for every $|\alpha| \le m$ the sequence $(\partial^\alpha u_n)$ converges weakly in $L^p_c(\Omega)$. If we denote by $u$ the weak limit of $(u_n)$, then $\partial^\alpha u_n \rightharpoonup \partial^\alpha u$.
Proof. Let $u_n \rightharpoonup u$ in $W^{m,p}_c(\Omega)$ weakly. Since for $v \in L^{p'}_{\rm loc}(\Omega)$ and $|\alpha| \le m$ we have $(-1)^{|\alpha|}\partial^\alpha v \in W^{-m,p'}_{\rm loc}(\Omega)$, it holds
\[
{}_{L^{p'}_{\rm loc}(\Omega)}\langle v, \partial^\alpha u_n \rangle_{L^p_c(\Omega)} = {}_{W^{-m,p'}_{\rm loc}(\Omega)}\langle (-1)^{|\alpha|}\partial^\alpha v, u_n \rangle_{W^{m,p}_c(\Omega)} \longrightarrow {}_{W^{-m,p'}_{\rm loc}(\Omega)}\langle (-1)^{|\alpha|}\partial^\alpha v, u \rangle_{W^{m,p}_c(\Omega)} = {}_{L^{p'}_{\rm loc}(\Omega)}\langle v, \partial^\alpha u \rangle_{L^p_c(\Omega)} ,
\]
and that implies $\partial^\alpha u_n \rightharpoonup \partial^\alpha u$ in $L^p_c(\Omega)$ weakly.
Assume now that the derivatives of $u_n$ converge weakly in $L^p_c(\Omega)$. If we denote by $u$ the weak limit of $(u_n)$ in $L^p_c(\Omega)$, then it is easy to see that $\partial^\alpha u_n \rightharpoonup \partial^\alpha u$ weakly in $L^p_c(\Omega)$ (the imbedding into distributions, and the derivative is continuous on the space of distributions). Using the representation of $f \in W^{-m,p'}_{\rm loc}(\Omega)$ from Theorem 6, we have
\[
{}_{W^{-m,p'}_{\rm loc}(\Omega)}\langle f, u_n \rangle_{W^{m,p}_c(\Omega)} = \sum_{|\alpha|\le m} {}_{L^{p'}_{\rm loc}(\Omega)}\langle v_\alpha, \partial^\alpha u_n \rangle_{L^p_c(\Omega)} \longrightarrow \sum_{|\alpha|\le m} {}_{L^{p'}_{\rm loc}(\Omega)}\langle v_\alpha, \partial^\alpha u \rangle_{L^p_c(\Omega)} = {}_{W^{-m,p'}_{\rm loc}(\Omega)}\langle f, u \rangle_{W^{m,p}_c(\Omega)} ,
\]
and that implies $u_n \rightharpoonup u$ in $W^{m,p}_c(\Omega)$ weakly.

In a similar way as the previous two results, we obtain

Theorem 8. Let $N := \sum_{|\alpha|\le m} 1$. Then every distribution $f \in W^{-m,p'}_c(\Omega)$ has the form
\[
f = \sum_{|\alpha|\le m} (-1)^{|\alpha|}\partial^\alpha v_\alpha ,
\]
for some $v = (v_\alpha)_\alpha \in L^{p'}_c(\Omega;\mathbf{C}^N)$, i.e. for any $u \in W^{m,p}_{\rm loc}(\Omega)$,
\[
{}_{W^{-m,p'}_c(\Omega)}\langle f, u \rangle_{W^{m,p}_{\rm loc}(\Omega)} = \sum_{|\alpha|\le m} {}_{L^{p'}_c(\Omega)}\langle v_\alpha, \partial^\alpha u \rangle_{L^p_{\rm loc}(\Omega)} .
\]
Corollary 6. A sequence $(u_n)$ converges weakly in $W^{m,p}_{\rm loc}(\Omega)$ if and only if for every $|\alpha| \le m$ the sequence $(\partial^\alpha u_n)$ converges weakly in $L^p_{\rm loc}(\Omega)$. If we denote by $u$ the weak limit of $(u_n)$, then $\partial^\alpha u_n \rightharpoonup \partial^\alpha u$.

Theorem 9. If $m \in \mathbf{Z}$ and $1 \le p < \infty$, then $(f_n)$ converges weakly in $W^{m,p}_{\rm loc}(\Omega)$ if and only if for every test function $\varphi$ the sequence $(\varphi f_n)$ converges weakly in $W^{m,p}_0(\Omega)$. If we denote the weak limit of $(f_n)$ by $f$, then $\varphi f_n \rightharpoonup \varphi f$.
Proof. For technical reasons we prove the theorem only for $p > 1$. If $f_n \rightharpoonup f$, then $(f_n)$ is bounded, and thus for $\varphi \in \mathcal{D}(\Omega)$, $(\varphi f_n)$ is bounded in $W^{m,p}_0(\Omega)$, so there is a subsequence $(f_{n'})$ such that $\varphi f_{n'} \rightharpoonup T$ in $W^{m,p}_0(\Omega)$ (and in distributions). By Corollary 3, $f_{n'} \rightharpoonup f$ in $\mathcal{D}'(\Omega)$, so $\varphi f_{n'} \rightharpoonup \varphi f$, thus $T = \varphi f$. As any subsequence of the relatively compact sequence $(\varphi f_n)$ converges to $\varphi f$, the whole sequence is convergent.

If for each test function $\varphi$ the sequence $(\varphi f_n)$ converges weakly in $W^{m,p}_0(\Omega)$, then $(\varphi f_n)$ is bounded in $W^{m,p}_0(\Omega)$, implying that $(f_n)$ is bounded in $W^{m,p}_{\rm loc}(\Omega)$. Thus it has a subsequence $(f_{n'})$ converging weakly to some $f$ in $W^{m,p}_{\rm loc}(\Omega)$, and in distributions. To show that the whole sequence converges to $f$, it is enough to show that each of its subsequences converging in $W^{m,p}_{\rm loc}(\Omega)$ converges to $f$. Note that if some subsequence $(f_{n''})$ converges to $g$, then $\varphi f_{n''} \rightharpoonup \varphi g$, implying $\varphi f_n \rightharpoonup \varphi g$, since $(\varphi f_n)$ is convergent. In particular $\varphi f_{n'} \rightharpoonup \varphi g$, which implies that for each test function $\varphi$, $\varphi f = \varphi g$. Now we conclude $f = g$.

Theorem 10. For $m \in \mathbf{Z}$, $H^{m+1}_{\rm loc}(\Omega)$ is compactly imbedded in $H^m_{\rm loc}(\Omega)$.

Proof. Since the continuity of the imbedding is obvious, it remains to be shown that any bounded sequence $(f_n)$ in $H^{m+1}_{\rm loc}(\Omega)$ has a convergent subsequence in $H^m_{\rm loc}(\Omega)$. Since $(f_n)$ is bounded, we have $f_n \rightharpoonup f$ in $H^{m+1}_{\rm loc}(\Omega)$ weakly (after passing to a subsequence if necessary). For $\varphi \in \mathcal{D}(\Omega)$, let us show that $\varphi f_n \longrightarrow \varphi f$ in $H^m_{\rm loc}(\Omega)$ strongly. Let $\Omega' \subseteq \Omega$ be an open and bounded set such that $\mathop{\rm supp}\varphi \subseteq \Omega'$. Then the sequence $(\varphi f_n)$ is bounded in $H^{m+1}_0(\Omega')$, and since $H^{m+1}_0(\Omega')$ is compactly imbedded in $H^m_0(\Omega')$ [11] (p. 113), there is a subsequence $(f_{n'})$ and $f_\varphi \in H^m_0(\Omega')$ such that $\varphi f_{n'} \longrightarrow f_\varphi$ in $H^m_0(\Omega')$. In particular, $\varphi f_{n'} \rightharpoonup f_\varphi$ in $\mathcal{D}'(\Omega')$, and we conclude $f_\varphi = \varphi f$ on $\Omega'$. Now, since every subsequence of $(\varphi f_n)$ has a convergent subsequence in $H^m_0(\Omega')$, and every subsequence that converges has the same limit $\varphi f$, it follows that the whole sequence $(\varphi f_n)$ converges to $\varphi f$ in $H^m_0(\Omega')$ strongly, and then also strongly in $H^m_{\rm loc}(\Omega)$.
5. Concluding remarks

There are, no doubt, many more results on locally Sobolev spaces which could be established along the same lines. We hope that this paper will motivate some readers to explore this path further.

For the benefit of the reader, we summarise the main results in a table, comparing the spaces $W^{m,p}_{\rm loc}(\Omega)$ and $W^{m,p}_c(\Omega)$ to the classical ones. Let us stress once more that in this table it is assumed that $\Omega$ is a domain in $\mathbf{R}^d$.
|  | $W^{m,p}(\Omega)$ | $W^{m,p}_0(\Omega)$ | $W^{m,p}_{\rm loc}(\Omega)$ | $W^{m,p}_c(\Omega)$ |
| topology | norm | norm | metric | str. ind. lim. |
| completeness | yes | yes | yes | yes |
| separability | $p < \infty$ | $p < \infty$ | $p < \infty$ | $p < \infty$ |
| density of $\mathcal{D}(\Omega)$, $p < \infty$ | yes, $m \le 0$ | yes | yes | yes |
| reflexivity | $1 < p < \infty$ | $1 < p < \infty$ | $1 < p < \infty$ | $1 < p < \infty$ |
| dual space, $p < \infty$ |  | $W^{-m,p'}(\Omega)$ | $W^{-m,p'}_c(\Omega)$ | $W^{-m,p'}_{\rm loc}(\Omega)$ |
| $f = \sum_{|\alpha|\le m}(-1)^{|\alpha|}\partial^\alpha v_\alpha$ ($m < 0$, $p > 1$) | $v_\alpha \in L^p(\Omega)$ | $v_\alpha \in L^p(\Omega)$ | $v_\alpha \in L^p_{\rm loc}(\Omega)$ | $v_\alpha \in L^p_c(\Omega)$ |
| $f_n \rightharpoonup f$ iff $\partial^\alpha f_n \rightharpoonup \partial^\alpha f$, $|\alpha| \le m$ ($m > 0$, $p < \infty$) | in $L^p(\Omega)$ | in $L^p(\Omega)$ | in $L^p_{\rm loc}(\Omega)$ | in $L^p_c(\Omega)$ |

References

[1] Robert A. Adams: Sobolev spaces, Academic Press, 1975.
[2] Claudio Baiocchi, Antonio Capelo: Variational and quasivariational inequalities: applications to free boundary problems, Wiley, 1984.
[3] Haïm Brezis: Analyse fonctionnelle, Masson, 1987.
[4] J. Deny, Jacques-Louis Lions: Les espaces du type de Beppo Levi, Ann. Inst. Fourier (Grenoble) 5 (1955) 305–379.
[5] Gottfried Köthe: Topological vector spaces I, Springer, 1983.
[6] Krešimir Burazin: Application of compensated compactness in the theory of hyperbolic systems (in Croatian), M.Sc. thesis, Zagreb, March 2004.
[7] Lawrence Narici, Edward Beckenstein: Topological vector spaces, Dekker, 1985.
[8] Bent E. Petersen: Introduction to the Fourier transform and pseudo-differential operators, Pitman, 1983.
[9] Helmut H. Schaefer: Topological vector spaces, Springer, 1980.
[10] François Trèves: Topological vector spaces, distributions and kernels, Academic Press, 1967.
[11] Joseph Wloka: Partial differential equations, Cambridge University Press, 1987.
[12] Kôsaku Yosida: Functional analysis, Springer, 1980.
ON SOME PROPERTIES OF HOMOGENISED COEFFICIENTS FOR STATIONARY DIFFUSION PROBLEM

Nenad Antonić
Department of Mathematics, University of Zagreb
Bijenička cesta 30, 10000 Zagreb, Croatia
[email protected]
Marko Vrdoljak
Department of Mathematics, University of Zagreb
Bijenička cesta 30, 10000 Zagreb, Croatia
[email protected]
Abstract
We consider optimal design of stationary diffusion problems for two-phase materials. Such problems usually have no solution. A relaxation consists in introducing the notion of composite materials, as fine mixtures of different phases, mathematically described by the homogenisation theory. The problem can be written as an optimisation problem over K(θ), the set of all possible composite materials with given local proportion θ. Tartar and Murat (1985) described the set K(θ)e, for some vector e, and used this result to replace the optimisation over the complicated set K(θ) by a much simpler one. Analogous characterisation holds even for the case of mixing more than two materials (possibly anisotropic), where the set K(θ) is not effectively known (Tartar, 1995). We address the question of describing the set {(Ae, Af ) : A ∈ K(θ)} (for given e and f ), which is important for optimal design problems with multiple state equations (different right-hand sides). In other words, we are interested in describing two columns of matrices in K(θ). In two dimensions we describe this set in appropriate coordinates and give some geometric interpretation. For the three-dimensional case we consider the set {Af : A ∈ K(θ), Ae = t}, for a fixed t, and show how it can be reduced to a two-dimensional one, albeit through tedious computations.
Keywords:
homogenisation, stationary diffusion.
1. Introduction

We consider the conductivity problem in an open set $\Omega \subseteq \mathbf{R}^d$:
\[
-\mathop{\rm div}(A\nabla u) = f ,
\]
where $A$ is a matrix function (the conductivity coefficients), $f \in H^{-1}(\Omega)$ the source term, and $u$ an unknown function, modelling the temperature or, in electrostatics, the potential. We are interested in effective conductivities obtained by mixing two isotropic phases $\alpha I$ and $\beta I$, described by a characteristic function $\chi$:
\[
A = \chi\alpha I + (1-\chi)\beta I . \tag{1}
\]
In optimal design we look for an arrangement of the given materials that is optimal in a certain sense. Such problems usually have no solution, and one way of relaxing them is to consider generalised (composite) materials obtained by homogenisation [1], [2], [4], [5], [6], [7]. The composite materials are described as limits, in the sense of the H-topology, of sequences of the form (1). In this case the characterisation of the set of all possible limits is known. More precisely, if the sequence of characteristic functions converges weakly $*$ to $\theta$ in $L^\infty(\Omega)$, then the corresponding sequence of conductivity matrices, given by (1), H-converges to a matrix function $A$ such that $A \in K(\theta)$ holds almost everywhere on $\Omega$, where $K(\theta)$ is the set of matrices described in terms of their eigenvalues [1], [3]:
\[
\lambda^-_\theta \le \lambda_j \le \lambda^+_\theta \quad (j = 1, \dots, d) , \tag{2}
\]
\[
\sum_{j=1}^d \frac{1}{\lambda_j - \alpha} \le \frac{1}{\lambda^-_\theta - \alpha} + \frac{d-1}{\lambda^+_\theta - \alpha} , \tag{3}
\]
\[
\sum_{j=1}^d \frac{1}{\beta - \lambda_j} \le \frac{1}{\beta - \lambda^-_\theta} + \frac{d-1}{\beta - \lambda^+_\theta} . \tag{4}
\]
The numbers $\lambda^-_\theta$, $\lambda^+_\theta$ are the harmonic and arithmetic mean values of $\alpha$ and $\beta$ (in proportions $\theta$ and $1-\theta$): $\lambda^+_\theta = \theta\alpha + (1-\theta)\beta$ and $\frac{1}{\lambda^-_\theta} = \frac{\theta}{\alpha} + \frac{1-\theta}{\beta}$. The important fact is that the converse statement is true as well: for any $\theta \in L^\infty(\Omega;[0,1])$ and any measurable matrix function $A$ with $A(x) \in K(\theta(x))$ for a.e. $x \in \Omega$, there exists a sequence of characteristic functions $(\chi_n)$ converging weakly $*$ to $\theta$ such that $\chi_n\alpha I + (1-\chi_n)\beta I \xrightarrow{H} A$.
123
On homogenised coefficients
A similar result holds for the case of multi-phase mixtures, possibly with anisotropic phases, where the characterisation of K(θ) is not known [5]. For optimal design problems with two state equations, where one is trying to find the best arrangement of given materials in terms of solutions obtained for different source terms (and some boundary conditions), the same idea leads to the question of characterisation of the set {(Ae, Af ) : A ∈ K(θ)} for two vectors e, f ∈ Rd . In [5] the convexity of this set for d ≥ 3 was proved. Primarily we are interested in the three-dimensional case, where this set is 5-dimensional, and our approach is to fix the first column Ae to some vector t ∈ R 3 and describe the set {Af : Ae = t, A ∈ K(θ)} . As we shall see in the following, the three-dimensional case can be reduced to two-dimensional.
2.
Two-dimensional case
Firstly, let us address the question of convexity for d = 2. We can, without the loss of generality, suppose that e and f form an orthonormal basis. So, our problem is to express the conditions (2)–(4) in terms of matrix’ entries. The expressions are especially simple if we present the matrices as
x+g b A= . b x−g Here, x = 2 tr A and d = x2 − g 2 − b2 is the determinant. The first step is to notice that (2)–(4) are equivalent to (γ and δ are the right-hand sides of (3) and (4), respectively) d ≤ x2 , x ∈ [λ− , λ+ ] ,
θ θ 1 2 d ≥ 2 + α x − α − α2 , γ γ
1 2 d ≥ 2 β− x + β − β2 . δ δ
(5)
Now, since g 2 + b2 = x2 − d, after a straightforward calculation we get a characterisation of the set K(θ) in x, g, b coordinates: 0 ≤ g 2 + b2 ≤ (x − x1 )(x − α) , 0 ≤ g 2 + b2 ≤ (x2 − x)(β − x) ,
x ∈ [x1 , x] , x ∈ [x, x2 ] ,
where 2 , γ 2 x2 = β − , δ
x1 = α +
1 1 + ; λ− − α λ+ − α 1 1 δ= + , β − λ− β − λ+
γ=
x=
λ− + λ + , 2
and $K(\theta)$ is clearly convex. Moreover, the above characterisation simplifies the calculations below.

As mentioned in the Introduction, the main approach to our problem is to study the set $\{Af : Ae = t, A \in K(\theta)\}$ for a fixed vector $t$. Fixing $e$ and $t$ in $\mathbf{R}^2$, among all effective conductivities we want to find those that map $e$ to $t$. Let us start with $e = (1,0)^\tau$ and $t = (a,b)^\tau$. This problem has a solution if $t$ belongs to the ball whose diameter is the segment $[\lambda^-_\theta e, \lambda^+_\theta e]$. The matrix of the operator $A$ in the canonical basis is of the form
\[
A = \begin{pmatrix} a & b \\ b & c \end{pmatrix} , \tag{6}
\]
with the range of the number $c$ to be calculated. Since $K(\theta)$ is described in terms of the spectrum of $A$, it is natural to write down the trace and the determinant of $A$: $a + c = \lambda_1 + \lambda_2$, $ac - b^2 = \lambda_1\lambda_2$. This leads to the formula $c = \lambda_1 + \lambda_2 - a$, where $\lambda_1$ and $\lambda_2$ must satisfy the constraint $-b^2 = (a - \lambda_1)(a - \lambda_2)$. Together with the constraints describing the set $K(\theta)$, we are led to compute the intersections of this hyperbola with the hyperbolas (3) and (4). Because of the convexity of $K(\theta)$, all possible values of $c$ form a segment. The computation becomes much easier if we write all the constraints in terms of the trace and determinant, as in (5). The same result can be obtained using the geometric property stated in the following theorem. Since $A$ is symmetric, we can diagonalise it using a rotation matrix $R_\varphi$; it is not hard to see that it is enough to consider $\varphi \in [0, \pi\rangle$.

Theorem 1. For $\lambda_1, \lambda_2 \in \mathbf{R}$ and $e \in \mathbf{R}^2$, the set $\{Ae : A \in \mathop{\rm Sym}(\mathbf{R}^2), \sigma(A) = \{\lambda_1, \lambda_2\}\}$ is the circle with diameter $[\lambda_1 e, \lambda_2 e]$. More precisely, if $A = R^\tau_\varphi \Lambda R_\varphi$, with $\Lambda = \mathop{\rm diag}(\lambda_1, \lambda_2)$ and $R_\varphi$ the rotation by angle $\varphi$, then
\[
Ae - \frac{\lambda_1 + \lambda_2}{2}\, e = R_{-2\varphi}\Bigl( \Lambda e - \frac{\lambda_1 + \lambda_2}{2}\, e \Bigr) . \tag{7}
\]
By the theorem, the equality $Ae = R^\tau_\varphi \Lambda R_\varphi e = t$ means that $t$ should lie on the circle whose diameter is $[\lambda_1 e, \lambda_2 e]$. For the case $e = (1,0)^\tau$ and $t = (a,b)^\tau$ this gives the equality $-b^2 = (a - \lambda_1)(a - \lambda_2)$. Moreover, the angle $\varphi$ satisfies $2\varphi + \arg(t) = \pi$, leading us, after a simple calculation, to the same expression for $A$ as before.
Figure 1. How matrices with prescribed eigenvalues act on the vector $e$.
For general $e$ (not necessarily a unit vector), one can simply change coordinates from the orthonormal basis whose first vector points in the direction of $e$ to the canonical basis. If $e = (e_1, e_2)^\tau$ and $t = (a,b)^\tau$, all possible symmetric matrices that send $e$ to $t$ are of the form
\[
A = \frac{1}{\|e\|^2} \begin{pmatrix} e_1 & -e_2 \\ e_2 & e_1 \end{pmatrix} \begin{pmatrix} a & b \\ b & \lambda_1 + \lambda_2 - a \end{pmatrix} \begin{pmatrix} e_1 & e_2 \\ -e_2 & e_1 \end{pmatrix} , \tag{8}
\]
where the eigenvalues $\lambda_1$ and $\lambda_2$ must satisfy
\[
\|e\|^2 \lambda_1\lambda_2 - (\lambda_1 + \lambda_2)\, e\cdot t + \|t\|^2 = 0 . \tag{9}
\]
(9)
Three-dimensional case
Any symmetric operator with given spectrum σ(A) = {λ 1 , λ2 , λ3 } (some of the eigenvalues may be equal) can be written using the Euler angles: ψ ∈ [0, 2π, ϑ ∈ [0, π, ϕ ∈ [0, 2π as A = Pτϕ Qτϑ Pτψ ΛPψ Qϑ Pϕ . Here Λ = diag(λ1 , λ2 , λ3 ), while ⎛ ⎞ ⎛ ⎞ cos α − sin α 0 1 0 0 Pα = ⎝ sin α cos α 0 ⎠ and Qα = ⎝ 0 cos α − sin α ⎠ 0 0 1 0 sin α cos α are rotations around z and x axis, respectively. If we use that spectral decomposition, the equality Ae = t reads Pτϕ Qτϑ Pτψ ΛPψ Qϑ Pϕ e = t. Multiplying by Qϑ Pϕ from the left we come to the equation Pτψ ΛPψ k = s ,
(10)
where $k = Q_\vartheta P_\varphi e$ and $s = Q_\vartheta P_\varphi t$. The third component of equation (10) is simple:
\[
\lambda_3 k_3 = s_3 , \tag{11}
\]
while the first two components of equation (10) can be written as
\[
A' e' = t' , \tag{12}
\]
where $A'$ is a symmetric linear operator on $\mathbf{R}^2$ with eigenvalues $\lambda_1$ and $\lambda_2$, $e' = \begin{pmatrix} k_1 \\ k_2 \end{pmatrix}$ and $t' = \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}$.
λ2
λ+ θ
λ− θ λ− θ
Figure 2.
λ+ θ
λ1
Constraints on (λ1 , λ2 ).
127
On homogenised coefficients
To calculate all values attainable by $\frac{\lambda_1+\lambda_2}{2}$ (which is obviously a segment), it is best to write all the constraints in terms of $x = \frac{\lambda_1+\lambda_2}{2}$ and $d = \lambda_1\lambda_2$:
\[
d \le x^2 , \qquad x \in [\lambda^-_\theta, \lambda^+_\theta] , \tag{13}
\]
\[
d \ge 2\lambda^+_\theta x - (\lambda^+_\theta)^2 , \tag{14}
\]
\[
d \ge 2\Bigl(\frac{1}{\gamma} + \alpha\Bigr)x - \frac{2\alpha}{\gamma} - \alpha^2 , \tag{15}
\]
\[
d \ge 2\Bigl(\beta - \frac{1}{\delta}\Bigr)x + \frac{2\beta}{\delta} - \beta^2 , \tag{16}
\]
\[
\|e'\|^2 d = 2(e'\cdot t')\, x - \|t'\|^2 . \tag{17}
\]
Here, $\gamma$ is obtained by subtracting $\frac{1}{\lambda_3 - \alpha}$ from the right-hand side in (3), and $\delta$ by subtracting $\frac{1}{\beta - \lambda_3}$ from the right-hand side in (4).
A 0 τ τ Af = Pϕ Qϑ Qϑ Pϕ f . (18) 0 λ3 t=[1.62, .4e–1, .1e–1]
z
0.06
0.06
0.04
0.04
0.02
0.02 z
0
0
–0.02
–0.02
–0.04
–0.04
–0.06
–0.06 1.56 1.58 1.6 1.62 1.64 1.66 1.68 y
1.56 1.58 1.6 1.62 1.64 1.66 1.68 y
Figure 3.
In Figure 3 we present two examples with $\alpha = 1$, $\beta = 2$, $\theta = 0.3$, $e = (1,0,0)^\tau$ and $f = (0,1,0)^\tau$. Since the considered set lies in the plane $x = t_2$, for better comparison of the two examples we also draw the circle in which that plane intersects the ball $\{Af \in \mathbf{R}^3 : A \in K(\theta)\}$.
4. Some special cases

Theorem 2. If $t = \lambda e$ and $f$ is orthogonal to $e$, then the set $\{Af : A \in K(\theta), Ae = t\}$ is the ball in the plane $e^\perp$ whose diameter is $[\lambda_1 f, \lambda^+_\theta f]$, where $\lambda_1$ is given by the equation
\[
\frac{1}{\lambda_1 - \alpha} + \frac{1}{\lambda - \alpha} = \frac{1}{\lambda^-_\theta - \alpha} + \frac{1}{\lambda^+_\theta - \alpha} .
\]
The border of the set is attained by sequential laminates of order two, with the matrix phase $\alpha I$ and core $\beta I$.

Proof. $Ae = t$ means that $\lambda$ is an eigenvalue, let us say $\lambda_3$. The two-dimensional subspace $M = e^\perp$, the orthogonal complement of $\{e\}$, is invariant for the operator $A$. Since $f \in M$, we can use the result of Theorem 1. More precisely, for fixed $\lambda_1$ and $\lambda_2$, the set $\{Af : A \in \mathop{\rm Sym}(\mathbf{R}^2), \sigma(A) = \{\lambda_1, \lambda_2\}\} \subseteq M$ is a circle with diameter $[\lambda_1 f, \lambda_2 f]$. Therefore, we only need to look for the conditions on the eigenvalues $\lambda_1$ and $\lambda_2$, and these are given by (2)–(4) with $\lambda_3 = \lambda$. This set corresponds to the shaded region in Figure 2. It is obvious that the biggest circle corresponds to the upper left corner of that region, since there $\lambda_1$ attains its lowest value and $\lambda_2$ the highest ($\lambda^+_\theta$). These eigenvalues correspond to sequential laminates of order two, with the matrix phase $\alpha I$ and core $\beta I$ [1]. Although it is not hard to see that all other circles obtained by varying $\lambda_1$ and $\lambda_2$ are covered by the biggest one, we can simply use the convexity of the considered set to obtain the same conclusion.

The above result, together with numerical experiments, leads us to the following conjecture: the border of the set $\{(Ae, Af) : A \in K(\theta)\}$ is obtained by sequential laminates of order two, with the matrix phase $\alpha I$ and core $\beta I$.

To see what the difficulties in the calculations described above are, we shall next present them for $e = (1,0,0)^\tau$, $f = (0,1,0)^\tau$ and $t = (t_1, t_2, 0)^\tau$. Theorem 2 describes the solution if $t_2 = 0$. The equation (11) is rewritten as
\[
\bigl( (\lambda_3 - t_1)\sin\varphi - t_2\cos\varphi \bigr)\sin\vartheta = 0 , \tag{19}
\]
and for nonzero $\vartheta$ we have
\[
\mathop{\rm ctg}\varphi = \frac{\lambda_3 - t_1}{t_2} . \tag{20}
\]
On homogenised coefficients
If ϑ = 0 in (19), then Qϑ = I and A becomes P_{ϕ+ψ}^τ Λ P_{ϕ+ψ}. We can then define ϕ as in (20), because ψ is arbitrary, and follow the same procedure as in the case ϑ ≠ 0. We can eliminate ϕ from the system (12) by dividing both e and t by sin ϕ (which is nonzero since t2 ≠ 0), and arrive at the system
  A (λ3 − t1, t2 cos ϑ)^τ = ((λ3 − t1)t1 − t2², λ3 t2 cos ϑ)^τ.  (21)

Let us denote by I(λ3, ϑ) (which is either an empty set or a segment) the set of all possible values for x = (λ1 + λ2)/2 such that (13)–(17) are satisfied. Since its explicit expression is lengthy, we omit it here. Then, for fixed x ∈ I(λ3, ϑ) we can express the solution by (8) and (18):

  Af = ( t2,
         λ3 − (t2² cos²ϑ / ((λ3 − t1)² + t2² cos²ϑ)) ((ax + b)/((λ3 − t1)² + t2²) + λ3 − t1),
         −(t2² sin ϑ cos ϑ / ((λ3 − t1)² + t2² cos²ϑ)) ((ax + b)/((λ3 − t1)² + t2²) + λ3 − t1) )^τ,

where

  a = 2((λ3 − t1)² + t2²)(λ3 − t1),
  b = ((λ3 − t1)² + t2²)(t1² + t2² − λ3²).

If λ3 = t1, the formula reads (for ϑ = π/2, (21) obviously holds for no A):

  Af = (t2, 2x − t1, −|t2| tg ϑ)^τ.
References

[1] G. Allaire: Shape optimization by the homogenization method, Springer-Verlag, 2002.
[2] N. Antonić, M. Vrdoljak: Optimal design and hyperbolic problems, Mathematical Communications 4 (1999), pp. 121–129.
[3] V. V. Jikov, S. M. Kozlov, O. A. Oleinik: Homogenization of differential operators and integral functionals, Springer-Verlag, 1994.
[4] F. Murat, L. Tartar: Calcul des variations et homogénéisation, in Les méthodes de l'homogénéisation: théorie et applications en physique, Coll. Dir. Études et Recherches EDF 57, pp. 319–369, Eyrolles, Paris, 1985.
[5] L. Tartar: An introduction to the homogenization method in optimal design, in Optimal shape design, Lecture Notes in Math. 1740, pp. 47–156, Springer-Verlag, 2000.
[6] Topics in the mathematical modelling of composite materials, A. Cherkaev, R. Kohn (eds.), Birkhäuser, 1997.
[7] M. Vrdoljak: On principal eigenvalue of stationary diffusion problem with nonsymmetric coefficients, Applied Mathematics and Scientific Computing, pp. 313–322, Kluwer, 2003.
SOLVING PARABOLIC SINGULARLY PERTURBED PROBLEMS BY COLLOCATION USING TENSION SPLINES

Ivo Beroš
Department of Mathematics, University of Zagreb
Bijenička cesta 30, 10000 Zagreb, Croatia
[email protected]
Miljenko Marušić
Department of Mathematics, University of Zagreb
Bijenička cesta 30, 10000 Zagreb, Croatia
[email protected]
Abstract

A tension spline is a function that, for a given partition x0 < x1 < … < xn, on each interval [xi, xi+1] satisfies the differential equation (D⁴ − ρi²D²)u = 0, where the ρi's are prescribed nonnegative real numbers. In the literature, tension splines are used in collocation methods applied to two-point singularly perturbed boundary value problems with Dirichlet boundary conditions. In this paper, we adapt the collocation method to a time-dependent reaction-diffusion problem of the form

  ε² ∂²u/∂x² − c(x, t)u − p(x, t) ∂u/∂t = f(x, t)

with Dirichlet boundary conditions. We tested our method on a time-uniform mesh with Nx × Nt elements. Numerical results show ε-uniform convergence of the method.
1. Introduction

We consider the singularly perturbed parabolic equation

  ε² ∂²u/∂x² − c(x, t)u − p(x, t) ∂u/∂t = f(x, t)  (1)

Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 131–140. © 2005 Springer. Printed in the Netherlands.
on the domain D̄, where D = (0, 1) × (0, T], with initial and boundary conditions

  u(x, 0) = u0(x), x ∈ (0, 1),
  u(0, t) = g0(t), u(1, t) = g1(t), t ∈ (0, T].  (2)

We assume that

1. the functions c(x, t), p(x, t) and f(x, t) are sufficiently smooth on D̄, and c(x, t) ≥ 0, p(x, t) ≥ p0 > 0 for (x, t) ∈ D̄,
2. the functions g0(t) and g1(t) are sufficiently smooth on [0, T] and u0(x) is smooth on [0, 1],
3. ε ∈ (0, 1], and
4. compatibility conditions are satisfied at the corner points (0, 0) and (1, 0).

This ensures the existence and smoothness of the solution of the boundary value problem. It is known that classical methods, like the standard finite difference or finite element method, fail on this problem when ε is small relative to the mesh width. To overcome this, different authors have proposed difference schemes on fitted meshes [3, 4, 6, 17]. We will describe a collocation method using tension splines on different types of meshes.

A tension spline is a function that, for a given partition x0 < x1 < … < xn, on each interval [xi, xi+1] satisfies the differential equation (D⁴ − ρi²D²)u = 0, where the ρi's are prescribed nonnegative real numbers. In other words, a tension spline is a function whose restriction to a non-empty interval (xi, xi+1) lies in span{1, x, sinh(ρi x), cosh(ρi x)}. Tension splines are mostly applied to the problem of shape preserving approximation [20, 21] and in collocation methods for singularly perturbed two-point boundary value problems for ODEs [5, 10, 13].

In what follows, T denotes the space of all tension splines defined on a given partition a = x0 < x1 < … < xn = b of the interval [a, b] with prescribed ρi's. Of particular interest are the subspaces T¹ = T ∩ C¹(a, b) and T² = T ∩ C²(a, b) of T. In our algorithm we will use the B-spline bases of T¹ and T² [7, 9, 11, 14, 15].
2. Collocation method

We start with a brief description of the collocation method using tension splines applied to a singularly perturbed two-point boundary value problem for an ODE. More details can be found in [2, 5, 10, 13].
So, we consider the problem

  (Lu)(x) := ε²u″(x) + c(x)u(x) = f(x), x ∈ (0, 1),  (3)
  u(0) = u0, u(1) = u1.  (4)
Let 0 = x0 < x1 < … < xn = 1 be a partition of the interval [0, 1], and let the parameters ρi be given by (cf. [5, 10])

  ρi := max{ √|c(xi)|, √|c(xi+1)| } / ε.

A spline s ∈ T² is the collocation spline for the problem (3)–(4) if it satisfies the collocation equations

  (Ls)(xi) = f(xi), i = 0, …, n,  (5)

and the boundary conditions

  s(0) = u0 and s(1) = u1.  (6)
These n + 3 conditions uniquely determine the tension spline [13]. If we use a tension spline s ∈ T¹, then s is uniquely determined by the boundary conditions (6) and the collocation equations

  (Ls)(ξi) = f(ξi), i = 0, …, 2n − 1,  (7)

where

  ξ2i = xi + hi ti and ξ2i+1 = xi+1 − hi ti,

and

  ti = 1/2 − (1/pi) cosh⁻¹( (2/pi) sinh(pi/2) ), pi = ρi hi

(see [5, 10]). The collocation points ξ2i and ξ2i+1 are a generalization of the classical Gaussian points used in cubic spline collocation. Note that the dimension of the space T¹ is 2n + 2 and the dimension of the space T² is n + 3. In both cases, the collocation spline s can be written in terms of the B-spline basis: s(x) = Σi ci Bi(x).
In this representation the collocation equations (5)–(6) or (6)–(7) yield the matrix equation Ac = f, whose solution c (c = (c1, …, cn+3) or c = (c1, …, c2n+2)) determines the collocation spline. For s ∈ T¹, the choice of the special collocation points leads to a quadratically convergent method for a small perturbation parameter ε (for a large perturbation parameter, the order of convergence is four) [5, 10]. For s ∈ T² the collocation method is linearly convergent for a small and quadratically convergent for a large perturbation parameter ε [13].
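The offsets ti and the resulting collocation points are easy to tabulate. The helper below is an illustrative Python sketch (the function names are ours, not from the authors' FORTRAN 77 code); it evaluates ti from pi = ρi hi and places the two collocation points in each interval.

```python
import math

def collocation_offset(p):
    """Offset t_i = 1/2 - (1/p) * acosh((2/p) * sinh(p/2)), p = rho_i * h_i.
    As p -> 0 this tends to the Gauss point 1/2 - 1/(2*sqrt(3)) of cubic
    spline collocation; as p grows it tends to 0."""
    if p == 0.0:
        return 0.5 - 1.0 / (2.0 * math.sqrt(3.0))
    return 0.5 - math.acosh((2.0 / p) * math.sinh(0.5 * p)) / p

def collocation_points(knots, rhos):
    """Points xi_{2i} = x_i + h_i t_i and xi_{2i+1} = x_{i+1} - h_i t_i."""
    xi = []
    for i in range(len(knots) - 1):
        h = knots[i + 1] - knots[i]
        t = collocation_offset(rhos[i] * h)
        xi.append(knots[i] + h * t)
        xi.append(knots[i + 1] - h * t)
    return xi
```

For large pi the offset goes to 0, so the collocation points move toward the knots, which matches the boundary-layer character of the problems the tension parameters are tuned for.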
3. Collocation method for parabolic differential equation
Now we will modify and adapt the described methods and apply them to parabolic singularly perturbed problems. To solve equation (1) with our method we perform two main steps: the first step discretizes the time variable with the backward Euler method, and the second step solves time-independent singularly perturbed problems.

Precisely, we introduce the uniform partition 0 = t0 < t1 < … < tm = T of the time interval [0, T], where ti = i ht, ht = T/m. For the space interval [0, 1] we define a partition 0 = x0 < x1 < … < xn = 1. In the first step we replace the time derivative at the points (xi, tj), i = 0, …, n, j = 1, …, m, with the first backward difference

  ∂u/∂t (xi, tj) ≈ (u(xi, tj) − u(xi, tj−1))/ht,

and (1) becomes

  ε² ∂²u/∂x² (xi, tj) − cij u(xi, tj) − pij (u(xi, tj) − u(xi, tj−1))/ht = fij.  (8)
Here, for the sake of simplicity, we wrote

  cij = c(xi, tj), pij = p(xi, tj), fij = f(xi, tj).

At the time step tj we already know the values u(xi, tj−1), so we rewrite (8) as

  ε² ∂²u/∂x² (xi, tj) − (cij + pij/ht) u(xi, tj) = fij − (pij/ht) u(xi, tj−1).  (9)
Equation (9) can now be solved using the collocation method for the ODE. Precisely, at the time step tj, j > 0, we solve the problem

  ε² uj″(x) − Cj(x)uj(x) = Fj(x), x ∈ (0, 1),
  uj(0) = g0(tj), uj(1) = g1(tj),

where

  uj(x) = u(x, tj), Cj(x) = c(x, tj) + p(x, tj)/ht,
  Fj(x) = f(x, tj) − p(x, tj) u(x, tj−1)/ht.
Our method can be written in the form of an algorithm.

Algorithm. Solving singularly perturbed problems by collocation using tension splines.
Define the partition 0 = x0 < x1 < … < xn = 1 of the interval (0, 1).

Evaluate the collocation spline s ∈ T² (or s ∈ T¹) for the problem

  (Lu)(x) := ε²u″(x) − C1(x)u(x) = F1(x), x ∈ (0, 1),
  u(0) = g0(t1), u(1) = g1(t1),

as a solution of the collocation equations (5)–(6) (or (6)–(7)), where

  C1(x) = c(x, t1) + p(x, t1)/ht, F1(x) = f(x, t1) − p(x, t1) u0(x)/ht,

and the ρi are defined by

  ρi := max{ √|C1(xi)|, √|C1(xi+1)| } / ε.

Let u(x, t1) = s(x).

For j = 2, …, m:

  Evaluate the collocation spline s ∈ T² (or s ∈ T¹) for the problem

    (Lu)(x) := ε²u″(x) − Cj(x)u(x) = Fj(x), x ∈ (0, 1),
    u(0) = g0(tj), u(1) = g1(tj),

  as a solution of the collocation equations (5)–(6) (or (6)–(7)), where

    uj(x) = u(x, tj), Cj(x) = c(x, tj) + p(x, tj)/ht,
    Fj(x) = f(x, tj) − p(x, tj) uj−1(x)/ht,

  and the ρi are defined by

    ρi := max{ √|Cj(xi)|, √|Cj(xi+1)| } / ε.

  Let u(x, tj) = s(x).
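The algorithm above is a backward Euler outer loop around a one-dimensional boundary value solve. The Python sketch below keeps that loop structure but is only an illustration: the spatial solve is done here by standard central finite differences instead of tension-spline collocation (so it is not ε-uniform), and all function names are ours.

```python
import numpy as np

def solve_ode_fd(eps, C, F, g0, g1, x):
    """Solve eps^2 u'' - C(x) u = F(x) on the uniform grid x with
    u(x[0]) = g0, u(x[-1]) = g1; a finite-difference stand-in for the
    tension-spline collocation solve of Section 2."""
    n = len(x) - 1
    h = x[1] - x[0]
    A = np.zeros((n + 1, n + 1))
    rhs = np.empty(n + 1)
    A[0, 0] = A[n, n] = 1.0
    rhs[0], rhs[n] = g0, g1
    for i in range(1, n):
        A[i, i - 1] = A[i, i + 1] = eps**2 / h**2
        A[i, i] = -2.0 * eps**2 / h**2 - C(x[i])
        rhs[i] = F(x[i])
    return np.linalg.solve(A, rhs)

def backward_euler_march(eps, c, p, f, u0, g0, g1, T, n, m):
    """Backward Euler in time for eps^2 u_xx - c u - p u_t = f:
    at t_j solve eps^2 u'' - (c + p/ht) u = f - (p/ht) u_{j-1}, cf. (9)."""
    x = np.linspace(0.0, 1.0, n + 1)
    ht = T / m
    u = u0(x)
    for j in range(1, m + 1):
        t = j * ht
        uprev = u.copy()
        Cj = lambda s: c(s, t) + p(s, t) / ht
        Fj = lambda s: f(s, t) - p(s, t) * np.interp(s, x, uprev) / ht
        u = solve_ode_fd(eps, Cj, Fj, g0(t), g1(t), x)
    return x, u
```

The design point is that each time level reduces to exactly the two-point problem of Section 2 with C = c + p/ht and a right-hand side carrying the previous time level.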
4. Numerical results

In this section we present numerical results for the problem from [17]:

  ε² ∂²u/∂x² − ∂u/∂t = 0, (x, t) ∈ (0, 1) × (0, 1).

For the initial and boundary conditions

  u(x, 0) = 0, x ∈ (0, 1),
  u(0, t) = t,
  u(1, t) = (t + 1/(2ε²)) erfc(1/(2ε√t)) − (1/ε)√(t/π) e^(−1/(4ε²t)), t ∈ (0, 1],

the exact solution is given by

  u(x, t) = (t + x²/(2ε²)) erfc(x/(2ε√t)) − (x/ε)√(t/π) e^(−x²/(4ε²t)).
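The closed form above is easy to evaluate and sanity-check. The helper below (illustrative Python; the function name is ours) codes the exact solution, and one can verify numerically both the boundary value at x = 0 and the PDE residual ε²u_xx − u_t.

```python
from math import erfc, exp, pi, sqrt

def u_exact(x, t, eps):
    """(t + x^2/(2 eps^2)) erfc(x/(2 eps sqrt(t)))
       - (x/eps) sqrt(t/pi) exp(-x^2/(4 eps^2 t)),
    a solution of eps^2 u_xx - u_t = 0 with u(x, 0) = 0."""
    if t == 0.0:
        return 0.0
    eta = x / (2.0 * eps * sqrt(t))
    return (t + x * x / (2.0 * eps * eps)) * erfc(eta) \
        - (x / eps) * sqrt(t / pi) * exp(-eta * eta)
```

For small ε the factor erfc(x/(2ε√t)) confines the solution to a layer of width O(ε) near x = 0, which is what the piecewise-uniform meshes below are designed to resolve.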
Let 0 = t0 < t1 < … < tm = 1 be a uniform partition of the time interval and let 0 = x0 < x1 < … < xn = 1 be a partition of the space interval. First, we used tension splines from T¹ and T² on equidistant partitions of the time and space intervals. For splines from T² we have n = m, but for splines from T¹ we take n = m/2, since the dimension of T¹ defined over a partition with m/2 knots is m + 2, which is comparable with the dimension of T² over m knots.

In Table 1 and Table 2 the errors E(n, m; ε) of the methods using the different spline spaces are presented. The error E(n, m; ε) is defined by

  E(n, m; ε) = max_{j=0,…,m} max_{i=0,…,n} |u(xi, tj) − uij|,

where uij is the approximation of the solution of problem (1) at the point (xi, tj) and u(xi, tj) is the exact solution of problem (1) at that point for the parameter ε. The numbers m and n denote the numbers of time and space intervals. The results show that our methods converge for fixed ε. We also compared the errors of our methods with the errors of the finite difference method from [17] and obtained similar behaviour of the error.

A piecewise uniform partition of the space interval is widely used in finite difference methods for solving singularly perturbed problems ([16, 17]), so we also tried our method on such a partition. We used tension splines from T¹ and T² on the equidistant partition of the time interval, but the space interval is divided into three subintervals, I1 = [0, σ], I2 = [σ, 1 − σ] and I3 = [1 − σ, 1], and on each of them a uniform partition is defined, with m/4 knots on I1 and I3, and m/2 knots on I2. The point σ is called the transition point [16] and is defined by

  σ = min{ 1/4, 2ε ln n }.

As in the first case, we use a partition with m knots for the splines from T² and m/2 knots for the splines from T¹. In Table 3 and Table 4 the errors E(n, m; ε) of the methods using the different spline spaces are presented. Again we see that our methods converge for fixed ε. The last row of each table contains E(m), the maximum error over different ε for a fixed number of time and space intervals:

  E(m) = max_ε E(m, m; ε) for splines s ∈ T²,
  E(m) = max_ε E(m/2, m; ε) for splines s ∈ T¹.
Table 1. Error E(m, m; ε) for the method using splines s ∈ T² on the equidistant partition of the space interval; m = 4, 16, 64, 256, 1024 and ε = 2^(−k), k = 0, 2, …, 24.

   k      m = 4      m = 16     m = 64     m = 256    m = 1024
   0     1.56E-02   6.13E-03   1.79E-03   4.69E-04   1.19E-04
   2     2.86E-02   8.60E-03   2.26E-03   5.74E-04   1.44E-04
   4     6.92E-02   2.15E-02   5.74E-03   1.46E-03   3.67E-04
   6     1.15E-01   5.87E-02   1.73E-02   4.54E-03   1.15E-03
   8     8.58E-02   1.50E-01   5.58E-02   1.63E-02   4.25E-03
  10     4.50E-02   2.35E-01   1.47E-01   5.52E-02   1.60E-02
  12     2.30E-02   1.80E-01   2.71E-01   1.47E-01   5.51E-02
  14     1.16E-02   1.02E-01   3.69E-01   2.70E-01   1.47E-01
  16     5.83E-03   5.46E-02   2.96E-01   3.80E-01   2.74E-01
  18     2.92E-03   2.83E-02   1.85E-01   4.95E-01   3.97E-01
  20     1.46E-03   1.44E-02   1.06E-01   4.26E-01   4.93E-01
  22     7.32E-04   7.26E-03   5.69E-02   2.98E-01   6.04E-01
  24     3.66E-04   3.65E-03   2.96E-02   1.86E-01   5.51E-01
  max.   1.15E-01   2.35E-01   3.69E-01   4.95E-01   6.04E-01
Table 2. Error E(m/2, m; ε) for the method using splines s ∈ T¹ on the equidistant partition of the space interval; m = 4, 16, 64, 256, 1024 and ε = 2^(−k), k = 0, 2, …, 24.

   k      m = 4      m = 16     m = 64     m = 256    m = 1024
   0     1.56E-02   5.98E-03   1.78E-03   4.65E-04   1.18E-04
   2     2.37E-02   7.08E-03   1.86E-03   4.70E-04   1.18E-04
   4     2.06E-02   7.30E-03   1.88E-03   4.72E-04   1.18E-04
   6     5.18E-03   6.60E-03   1.86E-03   4.72E-04   1.18E-04
   8     7.05E-03   5.68E-03   1.71E-03   4.67E-04   1.18E-04
  10     2.92E-03   8.63E-03   2.77E-03   4.31E-04   1.17E-04
  12     9.82E-04   9.40E-03   1.26E-02   7.09E-04   1.08E-04
  14     3.05E-04   4.08E-03   5.59E-03   5.74E-03   1.78E-04
  16     9.10E-05   1.40E-03   8.17E-03   2.39E-02   1.48E-03
  18     2.65E-05   4.37E-04   4.23E-03   7.32E-03   9.77E-03
  20     1.07E-06   1.30E-04   1.57E-03   4.57E-03   3.34E-02
  22     2.17E-10   5.35E-06   5.06E-04   3.52E-03   1.54E-02
  24     8.05E-18   1.08E-09   2.24E-05   1.53E-03   1.72E-03
  max.   2.37E-02   9.40E-03   1.26E-02   2.39E-02   3.34E-02
Table 3. Error E(m, m; ε) for the method using splines s ∈ T² on the non-equidistant partition of the space interval; m = 4, 16, 64, 256, 1024 and ε = 2^(−k), k = 0, 2, …, 24.

   k      m = 4      m = 16     m = 64     m = 256    m = 1024
   0     1.56E-02   5.72E-03   1.45E-03   3.62E-04   9.06E-05
   2     2.18E-02   5.72E-03   1.45E-03   3.62E-04   9.06E-05
   4     3.01E-02   5.50E-03   1.43E-03   3.62E-04   9.05E-05
   6     5.60E-02   1.41E-02   1.42E-03   3.62E-04   9.06E-05
   8     5.51E-02   3.63E-02   7.22E-03   5.33E-04   9.06E-05
  10     3.87E-02   5.12E-02   1.84E-02   2.71E-03   1.99E-04
  12     2.29E-02   4.38E-02   2.65E-02   7.44E-03   8.31E-04
  14     1.25E-02   2.88E-02   2.89E-02   1.26E-02   2.34E-03
  16     6.52E-03   1.65E-02   2.17E-02   1.35E-02   4.77E-03
  18     3.33E-03   8.87E-03   1.33E-02   1.25E-02   7.13E-03
  20     1.68E-03   4.59E-03   7.35E-03   8.53E-03   6.68E-03
  22     8.46E-04   2.34E-03   3.87E-03   4.97E-03   4.63E-03
  24     4.24E-04   1.18E-03   1.99E-03   2.68E-03   2.95E-03
  max.   5.60E-02   5.12E-02   2.89E-02   1.35E-02   7.13E-03
Table 4. Error E(m/2, m; ε) for the method using splines s ∈ T¹ on the non-equidistant partition of the space interval; m = 4, 16, 64, 256, 1024 and ε = 2^(−k), k = 0, 2, …, 24.

   k      m = 4      m = 16     m = 64     m = 256    m = 1024
   0     2.52E-01   5.98E-03   1.78E-03   4.65E-04   1.18E-04
   2     4.51E-01   7.08E-03   1.86E-03   4.70E-04   1.18E-04
   4     3.35E-01   7.30E-03   1.88E-03   4.72E-04   1.18E-04
   6     1.81E-01   1.67E-02   1.82E-03   4.71E-04   1.18E-04
   8     9.41E-02   8.49E-02   6.43E-03   4.61E-04   1.18E-04
  10     4.80E-02   1.18E-01   3.77E-02   1.86E-03   1.16E-04
  12     2.42E-02   1.01E-01   8.09E-02   1.14E-02   4.87E-04
  14     1.22E-02   6.79E-02   1.03E-01   2.66E-02   3.20E-03
  16     6.10E-03   3.96E-02   9.10E-02   4.05E-02   8.03E-03
  18     3.05E-03   2.15E-02   6.35E-02   5.08E-02   3.07E-02
  20     1.53E-03   1.14E-02   3.82E-02   5.15E-02   1.62E-02
  22     7.64E-04   5.77E-03   2.24E-02   4.05E-02   1.86E-02
  24     3.82E-04   2.90E-03   1.15E-02   3.30E-02   2.02E-02
  max.   4.51E-01   1.18E-01   1.03E-01   5.15E-02   3.07E-02
In Table 3 and Table 4 we can see that E(m) decreases as m increases. These results show that the collocation method using tension splines (from either T¹ or T²), on the non-equidistant partition defined above, exhibits the ε-uniform convergence property [16]. For the evaluation of tension splines we use the algorithm described in [1]. All calculations were done in FORTRAN 77 in double precision arithmetic.
References

[1] I. Beroš and M. Marušić, Evaluation of tension splines, Mathematical Communications, 4 (1999), pp. 73–81.
[2] I. Beroš and M. Marušić, Collocation with high order tension splines, in Proceedings of the 1st Conference on Applied Mathematics and Computation, Dubrovnik, Croatia, 1999, M. Rogina, V. Hari, N. Limić, and Z. Tutek, eds., Zagreb, 2001.
[3] C. Clavero, J. C. Jorge, F. Lisbona and G. I. Shishkin, An alternating direction scheme on a nonuniform mesh for reaction-diffusion parabolic problems, IMA J. Numer. Anal. 20 (2000), pp. 263–280.
[4] C. Clavero, J. C. Jorge and F. Lisbona, A uniformly convergent scheme on a nonuniform mesh for convection-diffusion parabolic problems, J. Comp. Appl. Math. 154 (2003), pp. 415–429.
[5] J. E. Flaherty and W. Mathon, Collocation with polynomial and tension splines for singularly-perturbed boundary value problems, SIAM J. Sci. Stat. Comput., 1 (1980), pp. 260–289.
[6] P. W. Hemker, G. I. Shishkin and L. P. Shishkina, ε-uniform schemes with high-order time-accuracy for parabolic singular perturbation problems, IMA J. Numer. Anal. 20 (2000), pp. 99–121.
[7] P. E. Koch and T. Lyche, Construction of exponential tension B-splines of arbitrary order, in P. J. Laurent, A. Le Méhauté and L. L. Schumaker, eds., Curves and Surfaces, N.Y., 1991, Academic Press, pp. 255–258.
[8] J. Kozak, Shape preserving approximation, Computers in Industry, 7 (1986), pp. 435–440.
[9] B. I. Kvasov and P. Sattayatham, GB-splines of arbitrary order, J. Comput. Appl. Math., 104 (1999), pp. 63–68.
[10] M. Marušić, A fourth/second order accurate collocation method for singularly perturbed two-point boundary value problems using tension splines, Numer. Math., 88 (2001), pp. 135–158.
[11] M. Marušić, Stable calculation by splines in tension, Grazer Math. Ber., 328 (1996), pp. 65–76.
[12] M. Marušić, On interpolation by Hermite tension splines of arbitrary order, in C. K. Chui and L. L. Schumaker, eds., Approximation Theory IX, Nashville, 1998, Vanderbilt University Press, pp. 1–8.
[13] M. Marušić and M. Rogina, A collocation method for singularly perturbed two-point boundary value problems with splines in tension, Adv. Comput. Math., 6 (1996), pp. 65–76.
[14] M. Marušić and M. Rogina, B-splines in tension, in R. Scitovski, ed., Proceedings of VII Conference on Applied Mathematics, Osijek, 1989, pp. 129–134.
[15] B. J. McCartin, Theory of exponential splines, J. Approx. Theory, 66 (1991), pp. 1–23.
[16] J. J. H. Miller, E. O'Riordan and G. I. Shishkin, Fitted Numerical Methods for Singular Perturbation Problems: Error Estimates in the Maximum Norm for Linear Problems in One and Two Dimensions, World Scientific, Singapore, 1996.
[17] J. J. H. Miller, E. O'Riordan, G. I. Shishkin and L. P. Shishkina, Fitted mesh methods for problems with parabolic boundary layers, Math. Proc. Roy. Irish Acad. 98A (1998), pp. 173–190.
[18] M. Rogina, Basis of splines associated with some singular differential operators, BIT, 32 (1992), pp. 496–505.
[19] R. D. Russell and L. F. Shampine, A collocation method for boundary value problems, Numer. Math., 19 (1972), pp. 1–28.
[20] D. G. Schweikert, An interpolating curve using a spline in tension, J. Math. Physics, 45 (1966), pp. 312–317.
[21] H. Späth, Exponential spline interpolation, Computing, 4 (1969), pp. 225–233.
ON ACCURACY PROPERTIES OF ONE-SIDED BIDIAGONALIZATION ALGORITHM AND ITS APPLICATIONS∗

Nela Bosner
Department of Mathematics, University of Zagreb
Bijenička cesta 30, 10000 Zagreb, Croatia
[email protected]
Zlatko Drmač
Department of Mathematics, University of Zagreb
Bijenička cesta 30, 10000 Zagreb, Croatia
[email protected]
Abstract
The singular value decomposition (SVD) of a general matrix is a fundamental theoretical and computational tool in numerical linear algebra. The most efficient way to compute the SVD is to reduce the matrix to bidiagonal form by a finite number of orthogonal (unitary) transformations, and then to compute the bidiagonal SVD. This paper gives a detailed error analysis and proposes modifications of a recently proposed one-sided bidiagonalization procedure, suitable for parallel computing. It also demonstrates its application in solving two common problems in linear algebra.

1. Introduction
The singular value decomposition (SVD) of a real m × n matrix A, A = U_A Σ_A V_A^T (U_A, V_A orthogonal, Σ_A diagonal), is a fundamental theoretical and computational tool in numerical linear algebra. In particular, the highest level of mathematical elegance and perfectionism is achieved in computing the SVD of a bidiagonal matrix B, cf. [5]. The SVD of B, B = U_B Σ_B V_B^T, can be computed to full accuracy in quadratic time. To compute the SVD of a general A, one uses

∗ The work of the authors is supported by the Croatian Ministry of Science and Technology under grant 0037120, and by the Volkswagen-Stiftung.
Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 141–150. © 2005 Springer. Printed in the Netherlands.
some bidiagonalization algorithm to write A as A = U B V^T with orthogonal U, V and bidiagonal B, and then obtains the SVD as A = (U U_B) Σ_B (V V_B)^T. In this paper we analyze numerical aspects of the SVD computed via one-sided bidiagonalization, and its applications. We are in particular interested in the one-sided bidiagonalization by Barlow [1], which prompted this study.
2. One-sided bidiagonalizations

The idea of one-sided bidiagonalization is simple. If A = U B V^T is the bidiagonalization of A, then H ≡ A^T A = V (B^T B) V^T = V T V^T, T = B^T B, is a tridiagonalization of H. Thus, if the orthogonal transformation V is designed to tridiagonalize H, V^T H V = T, but is applied to A from the right-hand side, then A V = U B, where B is the Cholesky factor of T. In fact, A V = U B is the QR factorization of A V, and it can be computed using the Gram-Schmidt algorithm. Since both A ← A V = F and the Gram-Schmidt orthogonalization transform only the columns of the array A, the whole procedure is one-sided and therefore well suited for vector and distributed parallel computing.
2.1 Barlow's modification of Ralha's algorithm: U^T A one-sided bidiagonalization

Quite recently, Barlow [1] proposed a simple but far-reaching modification of the algorithm from [9], whose numerical properties were not satisfactory, and proved its backward stability. Barlow noted that a mathematically equivalent formulation can use the vector zk defined as zk = A_{k−1}(:, k+1 : n)^T uk to determine the k-th Householder reflector in V, where U = [u1, u2, …, un]. This use of the computed uk's has nontrivial numerical consequences. The modified algorithm reads:

Algorithm 2.1. For A ∈ R^{m×n}, rank(A) = n > 2, this algorithm computes orthonormal U, bidiagonal B and orthogonal V such that A = U B V^T. The main steps of the algorithm are:

  uk is produced by orthogonalization of A_{k−1}(:, k) against u_{k−1};

  A_k = A_{k−1} H_k (A_0 = A), where the Householder reflector H_k is chosen so that H_k z_k = φ_{k+1} e_1 for z_k = A_{k−1}(:, k+1 : n)^T u_k;

  U = [u1, u2, …, un], V = H_1 H_2 ⋯ H_{n−2}.
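The structure of the scheme can be illustrated with a dense NumPy sketch. This is not Barlow's implementation, only a transcription of the main steps (Gram-Schmidt of each column against the previous u, and right-sided Householder reflectors chosen from zk); none of the numerical subtleties analyzed in Section 3 are addressed here, and the function name is ours.

```python
import numpy as np

def onesided_bidiag(A):
    """One-sided bidiagonalization sketch: returns U (orthonormal columns),
    upper bidiagonal B and orthogonal V with A = U B V^T, assuming A has
    full column rank. Only the columns of the working array F are
    transformed (F <- F H_k), which is what makes the scheme one-sided."""
    m, n = A.shape
    F = A.astype(float).copy()
    V = np.eye(n)
    U = np.zeros((m, n))
    B = np.zeros((n, n))
    for k in range(n):
        # u_k: orthogonalize F(:, k) against the previous u only
        v = F[:, k].copy()
        if k > 0:
            B[k - 1, k] = U[:, k - 1] @ v      # superdiagonal entry
            v -= B[k - 1, k] * U[:, k - 1]
        B[k, k] = np.linalg.norm(v)            # diagonal entry
        U[:, k] = v / B[k, k]
        if k < n - 2:
            # Householder reflector H_k with H_k z_k = phi e_1, where
            # z_k = F(:, k+1:)^T u_k; applied to F and accumulated in V
            z = F[:, k + 1:].T @ U[:, k]
            nz = np.linalg.norm(z)
            if nz > 0.0:
                s = 1.0 if z[0] >= 0.0 else -1.0
                w = z.copy()
                w[0] += s * nz
                w /= np.linalg.norm(w)
                F[:, k + 1:] -= 2.0 * np.outer(F[:, k + 1:] @ w, w)
                V[:, k + 1:] -= 2.0 * np.outer(V[:, k + 1:] @ w, w)
    return U, B, V
```

After step k the reflector guarantees u_k^T F(:, j) = 0 for j ≥ k + 2, which is exactly why a single orthogonalization against u_{k−1} suffices and B comes out bidiagonal.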
3. Error analysis
Our goal in this section is to show that the computed bidiagonal matrix determines the singular values of the original matrix with a small absolute backward error. By ε we denote the machine round-off.
3.1 Backward stability

Theorem 3.1. If B̃ is the bidiagonal matrix computed by Algorithm 2.1 without breakdown, then there exist an (m+n) × (m+n) orthogonal matrix P̂, an orthogonal n × n matrix V̂ and backward perturbations ∆A, δA such that

  [ ∆A ; A + δA ] = P̂ [ B̃ ; 0 ] V̂^T,  ‖[ ∆A ; δA ]‖_F ≤ ξ ‖A‖_F,  (1)

where 0 ≤ ξ ≤ K(mn + n³)ε. The computed approximation Ṽ of the matrix V̂ satisfies ‖Ṽ − V̂‖_F ≤ O(n²)ε. Further, there exist an orthonormal Ŭ and a perturbation δĂ such that

  A + δĂ = Ŭ B̃ V̂^T,  ‖δĂ‖_F ≤ √2 ξ ‖A‖_F.  (2)

Proof: In [3] and [1], using the results from [7] and [2].

An immediate consequence of this theorem is:

Corollary 3.2. If σ1 ≥ ⋯ ≥ σn are the singular values of A, then the singular values σ̃1 ≥ ⋯ ≥ σ̃n of B̃ from Theorem 3.1 satisfy

  max_{1≤i≤n} |σ̃i − σi| / √( Σ_{j=1}^n σj² ) ≤ ξ.

Corollary 3.3. If Algorithm 2.1 is implemented in parallel with a certain data distribution, then the assertion of Theorem 3.1 essentially remains true for both Householder and Givens transformation based eliminations. Only the upper bound for ξ may have different constants and a slightly different polynomial expression.

Remark 3.4. The computed Ũ is not guaranteed to be numerically orthogonal. In fact, it is not hard to construct examples where the departure from orthogonality is enormous.
3.1.1 HRA properties of the one-sided bidiagonalization. Barlow shows that even the U^T A-based SVD can compute a numerically singular Ũ! He also notes that his modified one-sided bidiagonalization computes all singular values to high relative accuracy in many cases. These facts raise important questions about the relative accuracy of the one-sided bidiagonalization. Does it, or can it, possess the high relative accuracy (HRA) property?
We will first derive simple multiplicative backward error relations, valid for both one-sided bidiagonalization algorithms. Assume that A is a square, row-wise well-scaled matrix. In that case, the matrix A can be written as DX, where D is diagonal with arbitrarily high condition number, and X is square and well-conditioned. (More generally, we can use a full row rank rectangular X and use X† instead of X⁻¹.)

Proposition 1. Let σ1 ≥ ⋯ ≥ σn > 0 and σ̂1 ≥ ⋯ ≥ σ̂n > 0 be the singular values of A and Ũ B̃ Ṽ⁻¹, respectively, where
= D(X + δX)Vˆ , δX F ≤ 1 X F , Vˆ T Vˆ = I ˜ B, ˜ ˜ ||B|), ˜ = U |δF | ≤ 2 (|F | + |U 2 ≤ O(n)ε 2 = (I + E)Vˆ , E F ≤ 3 ≤ O(n )ε = D(X + ∆X)(I + E)−1 = DX(I + Z), ∆X = δX + δ X, δ X = D −1 δF Vˆ T .
(3)
Then, for all i,

  |σ̂i − σi| / √(σi σ̂i) ≤ ζ,  ζ ≤ ‖X⁻¹∆X − E‖₂ / (1 − ‖E‖₂) + O(‖X⁻¹∆X − E‖₂²).  (4)
Proof: In [3], using the results from [8].

Remark 3.5. The perturbation bound above correctly reflects the possibility that X⁻¹∆X and E cancel each other. In practice, the useful bound is

  ‖Z‖₂ ≤ (‖X⁻¹∆X‖₂ + ‖E‖₂) / (1 − ‖E‖₂),

where ‖X⁻¹∆X‖₂ ≤ κ(X) η1 + ‖X⁻¹‖₂ ‖δ′X‖_F.

Proposition 2. If we use the singular values σ̌1 ≥ ⋯ ≥ σ̌n of Ũ B̃ V̂^T instead of those of Ũ B̃ Ṽ⁻¹, then we set E = 0 in the above definition of Z to get, for all i,

  |σ̌i − σi| / √(σi σ̌i) ≤ ‖X⁻¹∆X‖₂ + O(‖X⁻¹∆X‖₂²).

To estimate δ′X, we use |Ũ||B̃| ≤ |Ũ B̃| |B̃⁻¹||B̃| and obtain, under the assumption that β ≡ η2 ‖ |B̃⁻¹||B̃| ‖_F < 1/2,

  ‖δ′X‖_F = ‖D⁻¹ δF‖_F ≤ ( 2 η2 (1 + η1)(1 + β) / (1 − β) ) ‖X‖_F ≡ η4 ‖X‖_F.

Hence,

  ‖Z‖₂ ≤ ( κ(X)(η1 + η4) + η3 ) / (1 − η3) ≡ η5 κ(X).
Remark 3.6. This analysis identifies β as one parameter that influences the size of the perturbations in the decompositions (3), which in the case of moderate β become HRA decompositions. Now, suppose that (3) is an HRA decomposition, and that B̃ ≈ Ũ_B̃ Σ̃ Ṽ_B̃^T is the SVD of B̃ computed to high relative accuracy. We still have no guarantee that Ũ is numerically orthogonal, and (Ũ Ũ_B̃) Σ̃ (Ṽ Ṽ_B̃)^T may or may not be a numerically acceptable SVD of A. Note also that the carrier of the information on the singular values is Ũ B̃, and not B̃ (cf. (3)).
3.1.2 Sharpening the error bound using quadratic estimates. It has been observed that the computed singular values are often much better approximations of the true values than the error bounds and the quality of the left singular vectors indicate. We want to know more precisely how (much) the departure from orthogonality of the left singular vectors harms the relative accuracy of the computed singular values. For the sake of simplicity, we will consider the singular values σ̂1 ≥ ⋯ ≥ σ̂n of Ũ B̃ V̂^T and compare them with certain simple functions of the singular values of B̃.

Let B̃ = U_B̃ Σ_B̃ V_B̃^T be the exact SVD of the computed bidiagonal B̃, and let the decomposition Ũ B̃ V̂^T = (Ũ U_B̃) Σ_B̃ (V̂ V_B̃)^T be accepted as the SVD of Ũ B̃ V̂^T. Since the matrix Ũ U_B̃ is not necessarily numerically orthogonal, we need an error estimate for the computed singular values. Let Ũ^T Ũ = I + Γ and (Ũ U_B̃)^T (Ũ U_B̃) = I + Ξ, Ξ = U_B̃^T Γ U_B̃. Obviously, ‖Ξ‖_F = ‖Γ‖_F. Since the columns of Ũ are explicitly normalized, max_i |Γ_ii| ≤ O(m)ε.

Now let diag(Ξ) be the diagonal matrix of the diagonal entries of Ξ, and off(Ξ) = Ξ − diag(Ξ). Note that ‖diag(Ξ)‖₂ ≤ ‖Γ‖₂. If ‖Γ‖_F < 1, we can perform the following scaling:

  (I + diag(Ξ))^(−1/2) (I + Ξ) (I + diag(Ξ))^(−1/2) = I + Ξ̃,

where

  Ξ̃ = (I + diag(Ξ))^(−1/2) off(Ξ) (I + diag(Ξ))^(−1/2), with Ξ̃_ii = 0 for all i.

Since for i ≠ j, Ξ̃_ij = Ξ_ij / √((1 + Ξ_ii)(1 + Ξ_jj)), the reasonable assumption ‖Γ‖₂ < 1/2 implies that ‖Ξ̃‖_F < 2 ‖Ξ‖_F.

Note that this scaling corresponds to scaling the columns of Ũ U_B̃ to unit Euclidean norm. Let Ū = Ũ U_B̃ (I + diag(Ξ))^(−1/2) and Σ̄_B̃ = (I + diag(Ξ))^(1/2) Σ_B̃. Then

  Ũ B̃ V̂^T = Ū Σ̄_B̃ (V̂ V_B̃)^T

can be considered as another acceptable approximate SVD. If the diagonal entries of Σ̄_B̃ are not ordered, we can apply a permutation π, Ū π (π^T Σ̄_B̃ π) (V̂ V_B̃ π)^T, and write Ū := Ū π, Σ̄_B̃ := π^T Σ̄_B̃ π.
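The diagonal scaling above is easy to verify numerically. The snippet below is an illustrative check (a randomly perturbed, nearly orthonormal U stands in for the computed factor): it forms Ξ, the scaling matrix (I + diag(Ξ))^(−1/2) and the scaled Ξ̃.

```python
import numpy as np

rng = np.random.default_rng(1)
# columns nearly, but not exactly, orthonormal (mimicking a computed factor)
Q, _ = np.linalg.qr(rng.standard_normal((8, 4)))
U = Q + 1e-3 * rng.standard_normal((8, 4))

Xi = U.T @ U - np.eye(4)                  # U^T U = I + Xi
d = np.diag(Xi)
S = np.diag(1.0 / np.sqrt(1.0 + d))       # S = (I + diag(Xi))^{-1/2}
Xi_tilde = S @ (Xi - np.diag(d)) @ S      # scaled off-diagonal part
```

The identity S (I + Ξ) S = I + Ξ̃ holds because S and diag(Ξ) commute, so the diagonal part is scaled back to the identity exactly.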
˜ Vˆ T : Proposition 3. Consider the general case of multiple singular values of U˜ B σ ˆ1 = . . . = σ ˆ s1 > σ ˆs1 +1 = . . . = σ ˆ s2 > . . . > σ ˆs−1 +1 = . . . = σ ˆ s , s = n, s0 ≡ 0. ˜ ≥ · · · ≥ σn (B) ˜ and σ (B) ˜ ≥ · · · ≥ σn (B) ˜ be the ordered singular Let σ1 (B) 1 ˜ and Σ , respectively. values of B ˜ B (i) If Γ 2 < 1 then ˜ |ˆ σj − σj (B)| 1 Γ 2 . max ≤ 2 1≤j≤n 2 1 − Γ 2 ˜ σ ˆj σj (B) (ii) Let the relative gaps in the spectrum be defined by γˆi = min j= i
|ˆ σs2i − σ ˆs2j | σ ˆs2i + σ ˆs2j
, i = 1, . . . , ; γˆ = min γˆi .
˜ 2 < γ/ If Ξ ˆ 3 then for all i 3 3 4 1 12 4 si 4 4 1σ 1 ˜ 1 4 4 1 ˆsi − σj (B) 5 ≤5 1 1 ˜ 1 1 σj (B) j=si−1 +1
i
1 12 1 1 2 σ ˆ 2 ˜ 2 1 1 si ≤ Ξ 11 − 1 F. ˜ )2 1 1 γˆi (σ σj (B) j=si−1 +1 si
In particular, ˜ | |ˆ σj − σj (B) 2 ˜ 2 ≤ Ξ F. ˜ 1≤j≤n γˆ σj (B) max
Proof: In [3], using the results from [6].
4. On the (ir)relevance of the (non)orthogonality of Ũ

The matrix Ũ = [ũ1, …, ũn] is not numerically orthogonal in cases of large condition number of A. Still, it is seldom in practice that the SVD is the final target of the computation. More often, the SVD is a computational tool used to analyze and solve some other problem. For all that, the possible loss of orthogonality is not always as damaging as might be expected, provided that the particular application of the SVD is properly formulated. Instead of trying to fix the loss of orthogonality, we can try to make it irrelevant by a proper modification of its use in the given situation. Our effort is motivated by the potential of the one-sided bidiagonalization in distributed computing, and our argument is inspired by the discussion in Björck and Paige [2] and with Barlow.
4.1 Symmetric definite eigenvalue problem
Suppose we need to diagonalize a symmetric positive definite matrix H = A^T A, where A is the (computed) Cholesky factor, or any other full column rank factor obtained from the application that generates H. Here we note that in some important applications the numerically most important step is not to assemble H, but to formulate the problem in terms of A.

Algorithm 4.1. This algorithm diagonalizes a symmetric positive definite H.
1 Factor H as H = A^T A, where A is the Cholesky or any other full column rank factor of H.
2 Compute the singular value matrix Σ = diag(σ_i) and the right singular vectors V of A using the bidiagonalization of Algorithm 2.1 and some state of the art bidiagonal SVD.
3 The spectral factorization of H is H = V Λ V^T, Λ = diag(λ_i), λ_i = σ_i^2.

Our formulation of the algorithm is numerically correct, as the following results show.

Theorem 4.2. If in Algorithm 4.1 the matrix A is given, and H is implicitly defined as H = A^T A, then Algorithm 4.1 is backward stable. If H is given and ‖H_s^{-1}‖_2 < (1 − 2(n+1)ε)/(n(n+1)ε), then Algorithm 4.1 will successfully compute the Cholesky factorization and the overall computation is backward stable. Here H_s is the matrix (H_s)_{ij} = H_{ij}/√(H_{ii} H_{jj}), 1 ≤ i, j ≤ n.
Proof: In [3], using the results from [7].

Since Algorithm 4.1 computes the eigenvalues as squares of the singular values, the forward error in the computed eigenvalues of Ã^T Ã is governed by the condition number of Ã, which is approximately the square root of the condition number of H.

Corollary 4.3. Let λ̃_1 ≥ · · · ≥ λ̃_n be the approximations of the eigenvalues of H, computed in Algorithm 4.1. Then, using the notation and the assumptions of Theorem 4.2, for all i it holds that

λ̃_i = (1 + ε_i) λ_i,   |ε_i| ≤ 2√n ξ √(κ_2(H)) + O(n^2) ε ‖H_s^{-1}‖_2 + 2τ(n) ε + O(ε^2),

where τ(n) denotes the bound on the relative forward error for the SVD performed on B̃. Further, if all eigenvalues are simple, with H v_i = λ_i v_i, V = [v_1, . . . , v_n], V^T V = I, then the columns ṽ_i of the computed Ṽ satisfy

sin ∠(ṽ_i, v_i) ≤ [ O(mn + n^3) / min_{j=1,...,n, j≠i} |σ̃_j^2 − λ_i| + O(n^2) ] ε + O(ε^2).

Proof: In [3], using the results from [4].
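As a quick numerical illustration of the approach in Algorithm 4.1, the sketch below computes the spectral factorization of H = A^T A from the factor A alone; `np.linalg.svd` stands in for the one-sided bidiagonalization SVD of Algorithm 2.1, so this is a sketch of the idea, not the paper's actual kernel:

```python
import numpy as np

def eig_via_factor_svd(A):
    """Steps 2-3 of Algorithm 4.1: eigenpairs of H = A^T A from the SVD of A.

    Returns (lam, V) with H = V diag(lam) V^T and lam = sigma**2;
    H itself is never assembled, which is the numerically important point.
    """
    _, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    return sigma**2, Vt.T

# Sanity check on a random full column rank factor.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4))
lam, V = eig_via_factor_svd(A)
H = A.T @ A
resid = np.linalg.norm(H @ V - V * lam)  # columnwise residual of H v_i = lambda_i v_i
```

Squaring the singular values means the forward error in the λ_i is governed by κ_2(A) ≈ √(κ_2(H)), which is the content of Corollary 4.3.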
APPLIED MATHEMATICS AND SCIENTIFIC COMPUTING

4.2 Least squares problems
Consider now the m × n least squares problem ‖Ax − b‖_2 → min. If A = U Σ V^T is the SVD of A, with m × n orthonormal U and n × n orthogonal V, then writing

‖Ax − b‖_2^2 = ‖U Σ V^T x − U U^T b − (I − U U^T) b‖_2^2 = ‖Σ V^T x − U^T b‖_2^2 + ‖b − U (U^T b)‖_2^2,

we obtain the minimal norm solution x = V Σ^† U^T b. Note that the mutual orthogonality of the columns of U was important in splitting b (and Ax − b) using its orthogonal projection, as well as in using the orthogonal invariance of the Euclidean norm.

Suppose now we have computed the SVD, where the computed Ũ, Ṽ, Σ̃ satisfy A + δA = Ũ Σ̃ Ṽ^T. The numerical analysis of such an approach would reveal that a backward stable SVD with numerically orthogonal singular vectors produces a vector close to the exact solution of a nearby problem. Strictly speaking, this mixed stability is the best we can hope to achieve in the SVD solution of the least squares problem.

Using SVD via one-sided bidiagonalization. Consider now the above solution procedure, but with the SVD computed using one-sided bidiagonalization. Since in the computed bidiagonalization A ≈ Ũ B̃ Ṽ^T the computed Ũ cannot be guaranteed numerically orthogonal, the above analysis does not yield the wanted conclusion. Recall that certain numerical orthogonality is ensured in the augmented matrix formulation, and we therefore write the least squares problem in the following obviously equivalent form:

‖ [0 ; A] x − [0 ; b] ‖_2 → min.

The augmented bidiagonalization and the SVD read

[0 ; A] = P [B ; 0] V^T = P [U_B 0 ; 0 I] [Σ ; 0] (V V_B)^T,

where B = U_B Σ V_B^T is the SVD of the bidiagonal B. Using this SVD, we have

‖Ax − b‖_2^2 = ‖Σ (V V_B)^T x − U_B^T P_21^T b‖_2^2 + ‖P_22^T b‖_2^2,   where   P = [P_11 P_12 ; P_21 P_22],   P_11 ∈ R^{n×n},

and the minimal norm solution is x = V V_B Σ^† U_B^T P_21^T b. Premultiplication by P^T expresses (theoretically and numerically, cf. [2]) the modified Gram-Schmidt procedure, and note that P_21^T b is the upper n × 1 part of P^T [0 ; b]. Instead of forming P_21 explicitly we have the computed matrix Ũ, but using it in Ũ^T b or for explicit forming of the left singular vector matrix Ũ U_B may introduce unacceptable numerical error.
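The basic SVD least squares solution x = V Σ^† U^T b described above can be sketched as follows; `np.linalg.svd` again stands in for a bidiagonalization-based SVD, and the rank-truncation tolerance is a common generic choice, not one prescribed by the paper:

```python
import numpy as np

def lstsq_min_norm(A, b):
    """Minimal norm solution of min ||Ax - b||_2 via x = V Sigma^+ U^T b."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    tol = max(A.shape) * np.finfo(float).eps * (sigma[0] if sigma.size else 0.0)
    sig_inv = np.zeros_like(sigma)
    sig_inv[sigma > tol] = 1.0 / sigma[sigma > tol]   # Sigma^+ (pseudoinverse)
    return Vt.T @ (sig_inv * (U.T @ b))

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)
x = lstsq_min_norm(A, b)
# the least squares residual must be orthogonal to range(A): A^T (Ax - b) = 0
normal_resid = np.linalg.norm(A.T @ (A @ x - b))
```

The splitting of the residual in the derivation above is exactly why the orthogonality of the computed U matters: without it, the term ‖b − U(U^T b)‖_2 is no longer the orthogonal-complement part of b.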
Lemma 4.4. Let the approximation ỹ = (γ̃_i)_{i=1}^n of y = Ũ^T b be computed using the formulae
(1) d̃_1 = b ; γ̃_1 = computed(ũ_1^T d̃_1) ;
(2) for i = 2 : n
      d̃_i = computed(d̃_{i−1} − ũ_{i−1} γ̃_{i−1})
      γ̃_i = computed(ũ_i^T d̃_i),
where Ũ = [ũ_1, . . . , ũ_n] is computed as described in Algorithm 2.1 and Theorem 3.1. Then there exist perturbations δ_0 b, Δ_0 b such that

[ỹ ; Δ_0 b] = P̂^T (b + δ_0 b),

where P̂ is the matrix from Theorem 3.1, and ‖Δ_0 b‖_2 + ‖δ_0 b‖_2 ≤ ξ ‖b‖_2, where ξ = O(mn)ε.
Proof: In [3], using the results from [7].

Lemma 4.5. Let A, B̃, V̂ be as in Theorem 3.1. There exist an m × n orthonormal matrix Q̂ and perturbations δ_1 A, δ_1 b such that

Q̂^T (A + δ_1 A) V̂ = B̃,   Q̂^T (b + δ_1 b) = ỹ,
‖δ_1 A‖_F ≤ ξ ‖A‖_F,   ‖δ_1 b‖_2 ≤ ξ ‖b‖_2,

with ξ = O(mn + n^3)ε.
Proof: In [3], using the results from [2].

Theorem 4.6. Let x̃ = computed(Ṽ Ṽ_B Σ̃^† Ũ_B^T ỹ). Then the vector x̃ satisfies ‖x̃ − x̂‖_2 ≤ O(n^2) ε ‖x̂‖_2, where x̂ denotes the minimal norm solution of the problem ‖(A + δA)x − (b + δb)‖_2 → min, with ‖δA‖_F ≤ ξ ‖A‖_F, ‖δb‖_2 ≤ ξ ‖b‖_2, ξ = O(mn + n^3)ε.
Proof: In [3].
5. Conclusion
Our goal was to explore the numerical properties of a new one-sided bidiagonalization algorithm proposed by Barlow, used as the first step in computing the SVD factorization of a given matrix. We proved that this algorithm is numerically backward stable, despite the possible loss of orthogonality of the computed left singular vectors. Further, we showed that in some cases it possesses the high relative accuracy property, and we demonstrated how the departure from orthogonality of the left singular vectors influences the relative accuracy of the computed singular values. We also presented two applications of the SVD factorization in which, with appropriate modifications, the problems being solved become insensitive to the possible loss of orthogonality.
References
[1] J. L. Barlow, N. Bosner, and Z. Drmač. A new stable bidiagonal reduction algorithm. In preparation.
[2] Å. Björck and C. C. Paige. Loss and recapture of orthogonality in the modified Gram-Schmidt algorithm. SIAM J. Matrix Anal. Appl., 13(1): pp. 176-190, January 1992.
[3] N. Bosner and Z. Drmač. On numerical properties of the one-sided bidiagonalization algorithm. In preparation.
[4] S. C. Eisenstat and I. C. F. Ipsen. Relative perturbation bounds for eigenspaces and singular vector subspaces. In J. G. Lewis, editor, Proceedings of the Fifth SIAM Conference on Applied Linear Algebra, pages 62-66, Philadelphia, 1994. SIAM Publications.
[5] K. V. Fernando and B. N. Parlett. Accurate singular values and differential qd algorithms. Numerische Mathematik, 67(2): pp. 191-229, March 1994.
[6] V. Hari and Z. Drmač. On scaled almost-diagonal Hermitian matrix pairs. SIAM J. Matrix Anal. Appl., 18(4): pp. 1000-1012, October 1997.
[7] N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 1996.
[8] R.-C. Li. Relative Perturbation Theory: (I) Eigenvalue and singular value variations. LAPACK Working Note 84, Computer Science Department, University of Tennessee, Knoxville, 1994. Revised May 1997.
[9] R. Ralha. One-sided reduction to bidiagonal form. Linear Algebra and its Applications, 358: pp. 219-238, January 2003.
KNOT INSERTION ALGORITHMS FOR WEIGHTED SPLINES

Tina Bosner
Dept. of Mathematics, University of Zagreb
Bijenička 30, 10000 Zagreb, Croatia
[email protected]
Abstract
We develop a technique for computing with weighted splines of arbitrary order, i.e. with splines from the kernel of the operator D^k w D^2 with w piecewise constant, based on a knot insertion type algorithm. The algorithm is a generalization of the de Boor algorithm for polynomial splines, and it inserts the evaluation point into the knot sequence with maximal multiplicity. To achieve this, we use a general form of knot insertion matrices, and an Oslo type algorithm for calculating integrals of B-splines in reduced Chebyshev systems. We use the fact that the space of weighted splines is a subspace of the polynomial spline space. The complexity of the proposed algorithm can be reduced to a computationally reasonable size. As a result, we can calculate weighted splines, and the splines associated with their reduced systems, in a stable and efficient manner.
Keywords:
Chebyshev system, weighted spline, knot insertion, de Boor algorithm, Oslo algorithm.
1. Introduction and Preliminaries
The choice of the spline space to be used in computer-aided geometric design, solving differential equations, and many other applications relies heavily on efficient and fast evaluation algorithms. Often, polynomial splines are used, because elegant and stable recurrences exist enabling their fast evaluation, as well as that of the associated derivatives and integrals. The extension of tools like knot insertion to a wider family of splines is therefore important in the setting of the above applications. Weighted splines serve as the first and relatively simple step in the generalization of these tools. Until now, stable algorithms exist only for order 4 [7], and our aim is to derive stable algorithms for calculating with weighted splines of higher order. In Section 2, the algorithm for splines of order 4 is developed as a special case, and in Section 3 for the general case.

For a given measure vector dσ := (dσ_2, . . . , dσ_k)^T and x ∈ [a, b], we can define generalized powers (or a Canonical Complete Chebyshev (CCC) system)

151 Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 151-160. © 2005 Springer. Printed in the Netherlands.
{1, u_2, . . . , u_k} [11]:

u_2(x) = ∫_a^x dσ_2(τ_2), . . . , u_k(x) = ∫_a^x dσ_2(τ_2) ∫_a^{τ_2} dσ_3(τ_3) · · · ∫_a^{τ_{k−1}} dσ_k(τ_k).

The i-th reduced system is defined as the Chebyshev system corresponding to the reduced measure vector, that is dσ^{(i)}(δ) := (dσ_{i+2}(δ), . . . , dσ_k(δ))^T ∈ R^{k−(i+1)}, i = 1, . . . , k − 2, for measurable δ ⊂ [a, b]. The generalized derivatives L_{j,dσ} := D_j · · · D_1, where

D_j f(x) := lim_{δ→0+} (f(x + δ) − f(x)) / (σ_{j+1}(x + δ) − σ_{j+1}(x)),   j = 1, . . . , k − 1,

for f ∈ S(k, dσ) := span{1, u_2, . . . , u_k}, are linear mappings S(k, dσ) → S(k − j, dσ^{(j)}).

For a partition ∆ = {x_i}_{i=0}^{l+1} of an interval [a, b], a given multiplicity vector m = (m_1, . . . , m_l)^T (0 < m_i ≤ k), and M := Σ_{i=1}^l m_i, we shall denote by {t_1, . . . , t_{2k+M}} an extended partition in the usual way:

t_1 ≤ . . . ≤ t_k = x_0 = a,
t_{k+1} ≤ . . . ≤ t_{k+M} = x_1, . . . , x_1 (m_1 times), . . . , x_l, . . . , x_l (m_l times),
b = x_{l+1} = t_{k+M+1} ≤ . . . ≤ t_{2k+M}.

S(k, m, dσ, ∆) is the spline space spanned by functions that are piecewise in S(k, dσ), with generalized derivatives up to order k − m_i − 1 joining continuously at x_i for i = 1, . . . , l. Chebyshev B-splines in S(k, m, dσ, ∆) have compact support [t_i, t_{i+k}], and we henceforth assume that these are the unique splines such that Σ_{i=1}^n B_i^{k−j}(x) = 1 (j = 0, . . . , k − 1), where n := k − j + M. In fact, n and M depend on the length of dσ^{(j)}, but we avoid more complex notation, and also assume that m_i = 1 for all i.

In our case ∆ = {t_i}_{i=k+3}^n, where t_1 ≤ t_2 ≤ · · · ≤ a = t_{k+2} < t_{k+3} < · · · < t_n < t_{n+1} = b ≤ · · · ≤ t_{n+k+1} ≤ t_{n+k+2}. If we define a positive function w as w|_{[t_i, t_{i+1})} = w_i, w_i = const, then dσ = (dλ(τ), dλ(τ)/w(τ), dλ(τ), . . . , dλ(τ)), where dλ is the Lebesgue measure, and the weighted powers of order k + 2, with
k ≥ 2, are:

u_1(x) = 1,
u_2(x) = ∫_a^x dτ_2,
u_3(x) = ∫_a^x dτ_2 ∫_a^{τ_2} dτ_3/w(τ_3),
   ⋮
u_{k+2}(x) = ∫_a^x dτ_2 ∫_a^{τ_2} dτ_3/w(τ_3) ∫_a^{τ_3} dτ_4 · · · ∫_a^{τ_{k+1}} dτ_{k+2},

while the first reduced CCC-system is:

u_{1,1}(x) = 1,
u_{1,2}(x) = ∫_a^x dτ_3/w(τ_3),
   ⋮
u_{1,k+1}(x) = ∫_a^x dτ_3/w(τ_3) ∫_a^{τ_3} dτ_4 · · · ∫_a^{τ_{k+1}} dτ_{k+2}.

Further, we need the first two generalized derivatives: L_{1,dσ} = D, L_{2,dσ} = w D^2, as well as the generalized derivative with respect to the reduced system: L_{1,dσ^{(1)}} = w D, and

S(k + 2, dσ) = Ker L_{k+2,dσ} = Ker D^k w D^2,
S(k + 1, dσ^{(1)}) = Ker L_{k+1,dσ^{(1)}} = Ker D^k w D.

This produces a few important properties: weighted splines are C^1 polynomial splines of order k + 2, splines from the first reduced system are C^0 polynomial splines of order k + 1, and splines from the second reduced system are ordinary polynomial splines of order k.

Instead of the de Boor-Cox type recurrence we use the derivative formula [6], stating that for x ∈ [a, b], and a multiplicity vector m whose components satisfy m_i < k − 1 (i = 1, . . . , l), the derivative of a B-spline is

L_{1,dσ} B_{i,dσ}^k(x) = B_{i,dσ^{(1)}}^{k−1}(x) / C_{k−1}(i) − B_{i+1,dσ^{(1)}}^{k−1}(x) / C_{k−1}(i + 1),   (1)

where

C_{k−1}(i) := ∫_{t_i}^{t_{i+k−1}} B_{i,dσ^{(1)}}^{k−1} dσ_2.
The last tool needed is the Lebesgue dominated convergence Theorem, by which integrals of B-splines (with some measure) are continuous functions of their knots [9].
From now on, the weighted B-splines of order k + 2 and the B-splines from the reduced systems will be denoted by T_i^l, 1 ≤ l ≤ k + 2, and polynomial B-splines of order l by B_i^l.
2. Weighted splines of order 4 (k = 2)
Let T = {t_i}_{i=1}^{n+4} and T̃ = {t̃_i}_{i=1}^{2n} be extended partitions, where t_1 = t_2 = t_3 = t_4 = a, t_i < t_{i+1} for i = 4, . . . , n, t_{n+1} = t_{n+2} = t_{n+3} = t_{n+4} = b, and t̃_1 = t̃_2 = t̃_3 = t̃_4 = a, t̃_{2i−5} = t̃_{2i−4} = t_i for i = 5, . . . , n, t̃_{2n−3} = t̃_{2n−2} = t̃_{2n−1} = t̃_{2n} = b:

· · ·   t_{i−1}            t_i            t_{i+1}            t_{i+2}            · · ·
· · ·   t̃_{r−3}, t̃_{r−2}   t̃_{r−1}, t̃_r   t̃_{r+1}, t̃_{r+2}   t̃_{r+3}, t̃_{r+4}   · · ·
Let T_i^4 be the weighted B-splines, and similarly T_i^3, T_i^2 the B-splines with respect to the first and second reduced systems, associated with T. Obviously T̃_i^4 = B̃_i^4, T̃_i^3 = B̃_i^3, and T̃_i^2 = B̃_i^2. The integrals of B-splines, denoted by C_2(i), C_3(i) for B-splines on T, and C̃_2(i), C̃_3(i) on T̃, are:

C_2(i) = ∫_{t_i}^{t_{i+2}} T_i^2(τ) dτ/w(τ) = (1/2) (h_i/w_i + h_{i+1}/w_{i+1}),

C_3(i) = ∫_{t_i}^{t_{i+3}} T_i^3(τ) dτ
       = (1/3) [ (h_i (h_i + h_{i+1}) / w_i) / (h_i/w_i + h_{i+1}/w_{i+1}) + h_{i+1} + (h_{i+2} (h_{i+1} + h_{i+2}) / w_{i+2}) / (h_{i+1}/w_{i+1} + h_{i+2}/w_{i+2}) ],

C̃_2(r − 1) = ∫_{t̃_{r−1}}^{t̃_{r+1}} T̃_{r−1}^2(τ) dτ/w(τ) = h_i / (2 w_i),
C̃_2(r) = ∫_{t̃_r}^{t̃_{r+2}} T̃_r^2(τ) dτ/w(τ) = h_i / (2 w_i),
C̃_3(r − 1) = ∫_{t̃_{r−1}}^{t̃_{r+2}} T̃_{r−1}^3(τ) dτ = h_i / 3,
C̃_3(r) = ∫_{t̃_r}^{t̃_{r+3}} T̃_r^3(τ) dτ = (h_i + h_{i+1}) / 3,

with h_i := t_{i+1} − t_i.
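A useful check of the closed forms for C_2(i) and C_3(i) is that for w ≡ const they must reduce to the classical polynomial B-spline moments (t_{i+2} − t_i)/2 and (t_{i+3} − t_i)/3, since the algorithm then degenerates to the de Boor case. A small sketch (the indexing convention is local to this snippet):

```python
def C2(h, w, i):
    # C2(i) = (1/2)(h_i/w_i + h_{i+1}/w_{i+1})
    return 0.5 * (h[i] / w[i] + h[i + 1] / w[i + 1])

def C3(h, w, i):
    # C3(i) = (1/3)[ (h_i(h_i+h_{i+1})/w_i)/(h_i/w_i + h_{i+1}/w_{i+1}) + h_{i+1}
    #              + (h_{i+2}(h_{i+1}+h_{i+2})/w_{i+2})/(h_{i+1}/w_{i+1} + h_{i+2}/w_{i+2}) ]
    t1 = (h[i] * (h[i] + h[i + 1]) / w[i]) / (h[i] / w[i] + h[i + 1] / w[i + 1])
    t3 = (h[i + 2] * (h[i + 1] + h[i + 2]) / w[i + 2]) / (h[i + 1] / w[i + 1] + h[i + 2] / w[i + 2])
    return (t1 + h[i + 1] + t3) / 3.0

h = [0.5, 1.0, 0.25, 0.75]   # interval lengths h_i = t_{i+1} - t_i
w1 = [1.0] * 4               # w == const: the polynomial case
poly_C2 = C2(h, w1, 0)       # expected (h_0 + h_1)/2 = 0.75
poly_C3 = C3(h, w1, 0)       # expected (h_0 + h_1 + h_2)/3
```

For a genuinely piecewise constant w the two fractions in C_3(i) weight the outer intervals by their local 1/w_i, which is where the weighted structure enters.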
The following lemma and theorem connect general Chebyshev B-splines of orders 3 and 4 with less smooth ones, which are, in our case, trivial to calculate. Proofs are omitted and may be found in [8].

Lemma 1. Let T_{i,dσ^{(1)}}^3 ∈ S(3, m, dσ^{(1)}, ∆) be a Chebyshev 3rd order spline associated with the multiplicity vector m = (1, . . . , 1)^T, and let us assume that T̃_{i,dσ^{(1)}}^3 ∈ S(3, m̃, dσ^{(1)}, ∆) are B-splines associated with the multiplicity vector m̃ = (2, . . . , 2)^T on the same knot sequence. If {t_1, . . . , t_{k+6}} and {t̃_1, . . . , t̃_{2k+6}} are the associated extended partitions, and r an index such that t_i = t̃_r < t̃_{r+1}, then for i = 1, . . . , k + 3:

T_{i,dσ^{(1)}}^3 = T_{i,dσ^{(1)}}^3(t_{i+1}) T̃_{r,dσ^{(1)}}^3 + T̃_{r+1,dσ^{(1)}}^3 + T_{i,dσ^{(1)}}^3(t_{i+2}) T̃_{r+2,dσ^{(1)}}^3.

Theorem 2. Let T_{i,dσ}^4 ∈ S(4, m, dσ, ∆), T̃_{i,dσ}^4 ∈ S(4, m̃, dσ, ∆), the multiplicity vectors m, m̃ being as in Lemma 1. Then positive δ_i^4(j) exist such that

T_{i,dσ}^4 = Σ_{j=r}^{r+3} δ_i^4(j) T̃_{j,dσ}^4,

where r = r_i satisfies t_i = t̃_{r_i} < t̃_{r_i+1}. Let the extended partitions be {t_1, . . . , t_{k+8}} and {t̃_1, . . . , t̃_{2k+8}}. Then δ_i^4(j), j = r, . . . , r + 3, are determined by the formulæ:

δ_i^4(r)   = T_{i,dσ^{(1)}}^3(t_{i+1}) C̃(r) / [ T_{i,dσ^{(1)}}^3(t_{i+1}) C̃(r) + C̃(r+1) + T_{i,dσ^{(1)}}^3(t_{i+2}) C̃(r+2) ],
δ_i^4(r+1) = [ T_{i,dσ^{(1)}}^3(t_{i+1}) C̃(r) + C̃(r+1) ] / [ T_{i,dσ^{(1)}}^3(t_{i+1}) C̃(r) + C̃(r+1) + T_{i,dσ^{(1)}}^3(t_{i+2}) C̃(r+2) ],
δ_i^4(r+2) = [ C̃(r+3) + T_{i+1,dσ^{(1)}}^3(t_{i+3}) C̃(r+4) ] / [ T_{i+1,dσ^{(1)}}^3(t_{i+2}) C̃(r+2) + C̃(r+3) + T_{i+1,dσ^{(1)}}^3(t_{i+3}) C̃(r+4) ],
δ_i^4(r+3) = T_{i+1,dσ^{(1)}}^3(t_{i+3}) C̃(r+4) / [ T_{i+1,dσ^{(1)}}^3(t_{i+2}) C̃(r+2) + C̃(r+3) + T_{i+1,dσ^{(1)}}^3(t_{i+3}) C̃(r+4) ],

where

C̃(i) = ∫_{support} T̃_{i,dσ^{(1)}}^3 dσ_2.
Let ∆ and ∆̃ be arbitrary partitions of [a, b], m and m̃ their multiplicity vectors, such that S(k, m, dσ, ∆) ⊂ S(k, m̃, dσ, ∆̃). Let T = {t_j}_{j=1}^{n+k} and T̃ = {t̃_i}_{i=1}^{m+k}, with n ≤ m, be the extended partitions associated to the spline spaces S(k, m, dσ, ∆) and S(k, m̃, dσ, ∆̃), respectively, and let B_{j,dσ}^k (B̃_{i,dσ}^k) be B-splines in these spaces. Then the m × n matrix Γ = (γ_{i,j})_{i=1,j=1}^{m,n} such that

B_{j,dσ}^k(x) = Σ_{i=1}^m γ_{i,j} B̃_{i,dσ}^k(x),   j = 1, . . . , n,   (2)

is called the knot insertion matrix of order k from T to T̃. Lemma 1 and Theorem 2 enable us to calculate explicitly the elements of Γ^3 = (γ_{i,j}^3)_{i=2,j=2}^{2n−4,n}, a knot insertion matrix for the first reduced system from T to T̃, where

γ_{r,i}^3 = C̃_2(r) / C_2(i),
γ_{r+1,i}^3 = 1,
γ_{r+2,i}^3 = C̃_2(r + 3) / C_2(i + 1),

and Γ^4 = (γ_{i,j}^4)_{i=1,j=1}^{2n−4,n}, a knot insertion matrix for the weighted splines from T to T̃, where

γ_{r,i}^4 = γ_{r,i}^3 C̃_3(r) / C_3(i),
γ_{r+1,i}^4 = [ γ_{r,i}^3 C̃_3(r) + γ_{r+1,i}^3 C̃_3(r + 1) ] / C_3(i),
γ_{r+2,i}^4 = [ γ_{r+3,i+1}^3 C̃_3(r + 3) + γ_{r+4,i+1}^3 C̃_3(r + 4) ] / C_3(i + 1),
γ_{r+3,i}^4 = γ_{r+4,i+1}^3 C̃_3(r + 4) / C_3(i + 1).

3. Weighted splines of order k + 2 (k > 2)
As in the case of order 4, in the general case we could proceed in the same way by calculating the knot insertion matrix for inserting all interior knots up to multiplicity k, but it is not obvious how to calculate the elements of such a matrix in a stable manner. Therefore, we choose an algorithm based on inserting only one knot at each step.
Let T = {t_i}_{i=1}^{n+k+2} and T̃ = {t̃_i}_{i=1}^{n+k+3} be extended partitions of [a, b], T̃ = T ∪ {t}, where t ∈ [t_i, t_{i+1}) for some i:

T:   · · ·   t_{i−1}   t_i    t (•)    t_{i+1}   t_{i+2}   · · ·
T̃:   · · ·   t̃_{i−1}   t̃_i   t̃_{i+1}   t̃_{i+2}   t̃_{i+3}   · · ·

Let Γ^l = (γ_{i,j}^l) be an elementary knot insertion matrix, i.e. a knot insertion matrix of order l from T to T̃ that inserts only one knot. Its elements are given by the following theorem:
Theorem 3. Let T = (t_1 ≤ t_2 ≤ · · · ≤ t_{k−1} ≤ a = t_k < t_{k+1} < · · · < t_n < t_{n+1} = b ≤ t_{n+2} ≤ · · · ≤ t_{n+k−1} ≤ t_{n+k}) be an extended partition of [a, b], and let {w_i}_{i=2}^k be piecewise continuous on [a, b], with possible jumps located at (t_i)_{i=k+1}^n. Let {1, u_2, . . . , u_k} be the CCC-system associated with the measure vector dσ := (w_2(s_2) ds_2, . . . , w_k(s_k) ds_k). For t ∈ (a, b), and i such that t ∈ [t_i, t_{i+1}), let T̃ = T ∪ {t}. Then the nontrivial elements of the knot insertion matrix Γ^l = (γ_{i,j}^l) of order l from T to T̃ are:

γ_{j,j}^1 = 1   for j ≤ i,
γ_{j,j−1}^1 = 1   for j ≥ i + 1,

for l = 1, and

γ_{j,j}^l = 1   for j ≤ i − l + 1,
γ_{j,j}^l = γ_{j,j}^{l−1} C̃_{l−1}(j) / C_{l−1}(j)   for i − l + 2 ≤ j ≤ i,
γ_{j,j−1}^l = γ_{j+1,j}^{l−1} C̃_{l−1}(j + 1) / C_{l−1}(j)   for i − l + 2 ≤ j ≤ i,
γ_{j,j−1}^l = 1   for j ≥ i + 1,

for l = 2, . . . , k, where

C_{l−1}(j) := ∫_{t_j}^{t_{j+l−1}} B_{j,dσ^{(k−l+1)}}^{l−1} dσ_{k−l+2},
C̃_{l−1}(j) := ∫_{t̃_j}^{t̃_{j+l−1}} B̃_{j,dσ^{(k−l+1)}}^{l−1} dσ_{k−l+2},

for l = 2, . . . , k. The B-splines B_{j,dσ^{(k−l+1)}}^{l−1} and B̃_{j,dσ^{(k−l+1)}}^{l−1} are associated with the extended partitions T and T̃, respectively.
Proof. Theorem 3 is proved by induction. Γ^1 is trivial. Let us assume that the theorem holds for l − 1. We use the derivative formula (1) and the fact that two splines are equal if and only if they have equal generalized derivatives and the same value at the same point. By (2), let

f(x) := B_{j,dσ^{(k−l)}}^l(x),
g(x) := γ_{j,j}^{l−1} (C̃_{l−1}(j) / C_{l−1}(j)) B̃_{j,dσ^{(k−l)}}^l(x) + γ_{j+2,j+1}^{l−1} (C̃_{l−1}(j + 2) / C_{l−1}(j + 1)) B̃_{j+1,dσ^{(k−l)}}^l(x).

Obviously, f(t_j) = g(t_j), and

L_{1,dσ^{(k−l)}} f(x) = B_{j,dσ^{(k−l+1)}}^{l−1} / C_{l−1}(j) − B_{j+1,dσ^{(k−l+1)}}^{l−1} / C_{l−1}(j + 1),

L_{1,dσ^{(k−l)}} g(x) = γ_{j,j}^{l−1} (C̃_{l−1}(j) / C_{l−1}(j)) ( B̃_{j,dσ^{(k−l+1)}}^{l−1} / C̃_{l−1}(j) − B̃_{j+1,dσ^{(k−l+1)}}^{l−1} / C̃_{l−1}(j + 1) )
                    + γ_{j+2,j+1}^{l−1} (C̃_{l−1}(j + 2) / C_{l−1}(j + 1)) ( B̃_{j+1,dσ^{(k−l+1)}}^{l−1} / C̃_{l−1}(j + 1) − B̃_{j+2,dσ^{(k−l+1)}}^{l−1} / C̃_{l−1}(j + 2) ).

Next we make use of the matrix Γ^{l−1} and (2) to prove that L_{1,dσ^{(k−l)}} f(x) = L_{1,dσ^{(k−l)}} g(x).

In our case, this recursive formula will have only two steps, because the knot insertion matrix Γ^k for the second reduced system is just the knot insertion matrix for polynomial splines of order k. Also, the required integrals of B-splines are:

C_{k+1}(i) = ∫_{t_i}^{t_{i+k+1}} T_i^{k+1}(τ) dτ,
C_k(i) = ∫_{t_i}^{t_{i+k}} B_i^k(τ) dτ/w(τ).   (3)
To evaluate the weighted spline f(t) = Σ_i a_i T_i(t) at the point t, we use an algorithm which is a generalization of the well known de Boor algorithm for polynomial splines [2]. The point t is inserted until maximal multiplicity by using elementary knot insertion matrices. The only problem that remains is to calculate the integrals of B-splines (3). One of the optimal algorithms to do that is the generalized Oslo algorithm [1], where each interior knot is inserted until multiplicity k. Again, elementary knot insertion matrices, but of lower order, are used. This algorithm can also be accelerated by avoiding operations where multiplication by zero or addition of zero occurs.
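For w ≡ const the whole procedure collapses to the classical de Boor algorithm, which already works by the same principle: the evaluation point is inserted to full multiplicity, after which the spline value is one of the coefficients. A minimal sketch of that polynomial special case (the general weighted algorithm differs only in how the integrals (3) enter the insertion coefficients):

```python
def find_span(t, p, x):
    """Index i with t[i] <= x < t[i+1] for a clamped knot vector t."""
    n = len(t) - p - 2                 # index of the last B-spline
    if x >= t[n + 1]:
        return n
    i = p
    while not (t[i] <= x < t[i + 1]):
        i += 1
    return i

def de_boor(t, c, p, x):
    """Evaluate s(x) = sum_j c[j] B_j(x) for degree-p polynomial B-splines
    on knots t, by repeated knot insertion of x (the de Boor recurrence)."""
    i = find_span(t, p, x)
    d = [c[j] for j in range(i - p, i + 1)]
    for r in range(1, p + 1):
        for j in range(p, r - 1, -1):
            jj = i - p + j
            alpha = (x - t[jj]) / (t[jj + p - r + 1] - t[jj])
            d[j] = (1 - alpha) * d[j - 1] + alpha * d[j]
    return d[p]

# Quadratic check: with Greville coefficients c_j = (t_{j+1} + t_{j+2})/2
# the spline reproduces s(x) = x exactly.
t = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 3.0, 3.0]
c = [(t[j + 1] + t[j + 2]) / 2 for j in range(5)]
val = de_boor(t, c, 2, 1.3)
```

Each inner pass of the double loop is one elementary knot insertion of x; after p passes the multiplicity of x is maximal and d[p] is the spline value.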
4. Conclusion
The difference from the standard de Boor algorithm in the polynomial case lies only in the calculation of the integrals of B-splines. If w ≡ const, our algorithm becomes the standard de Boor algorithm, as expected. Although the complete method is somewhat complex, the complexity of the proposed algorithm can be reduced to a computationally reasonable size. During the development of the algorithm, we tried different ways of calculating the integrals of B-splines (Gaussian integration and some special recurrences), but the biggest problem was the long execution time. The final version is drastically faster than the previous ones. We can now calculate weighted splines, and the splines associated with their reduced systems, in a stable and efficient manner.

Acknowledgments. This research was supported by Grant 0037114 of the Ministry of Science, Education and Sports of the Republic of Croatia, and by BICRO ltd. under contract 0023-2002/2003-U-1.
References
[1] P. J. Barry and R. N. Goldman, Knot insertion algorithms, in Knot Insertion and Deletion Algorithms for B-Spline Curves and Surfaces, R. N. Goldman and T. Lyche (eds.), SIAM, 1993, pp. 89-133.
[2] C. de Boor, A Practical Guide to Splines, Springer, New York, 1978.
[3] T. Bosner, Polar forms of splines and knot insertion algorithms, master's thesis, Department of Mathematics, University of Zagreb, 2002 (in Croatian), pp. 58-78.
[4] T. A. Foley, Interpolation with interval and point tension controls using cubic weighted ν-splines, ACM Transactions on Mathematical Software, Vol. 13, No. 1, 1987, pp. 68-96.
[5] T. A. Foley, A shape preserving interpolant with tension controls, Computer Aided Geometric Design 5, 1988, pp. 105-118.
[6] M. Rogina, Basis of splines associated with some singular differential operators, BIT, No. 32, 1992, pp. 496-505.
[7] M. Rogina, A knot insertion algorithm for weighted cubic splines, in Curves and Surfaces with Applications in CAGD, A. Le Méhauté, C. Rabut, L. L. Schumaker (eds.), Vanderbilt University Press, Nashville & London, 1997, pp. 387-395.
[8] M. Rogina and T. Bosner, On calculating with lower order Chebyshev splines, in Curve and Surface Design, P.-J. Laurent, P. Sablonnière, L. L. Schumaker (eds.), Vanderbilt University Press, Nashville, 1999, pp. 343-353.
[9] M. Rogina and T. Bosner, A de Boor type algorithm for tension splines, in Curve and Surface Fitting: Saint-Malo 2002, A. Cohen, J. L. Merrien, L. L. Schumaker (eds.), Nashboro Press, Brentwood, 2003, pp. 343-353.
[10] K. Salkauskas, C^1 splines for interpolation of rapidly varying data, Rocky Mountain Journal of Mathematics, Vol. 14, No. 1, 1974, pp. 239-250.
[11] L. L. Schumaker, Spline Functions: Basic Theory, John Wiley & Sons, New York, 1981.
NUMERICAL PROCEDURES FOR THE DETERMINATION OF AN UNKNOWN SOURCE PARAMETER IN A PARABOLIC EQUATION

Emine Can Baran
University of Kocaeli, Department of Physics
Izmit, Kocaeli, Turkey
[email protected]
Abstract
Numerical procedures for the solution of an inverse problem of determining an unknown source parameter in a parabolic equation are considered. Two different numerical procedures are studied and a comparative analysis of them is presented. One of these procedures is obtained by introducing a transformation of the unknown function, while the other is based on a trace type functional formulation of the problem.
Keywords:
parabolic equation, inverse problem, unknown source parameter, finite-difference method.
1. Introduction
In this paper, we study numerical procedures for the following inverse problem of simultaneously finding unknown coefficients p(t) and u(x, t) satisfying the equation

u_t = u_xx + q u_x + p(t) u + f(x, t),   x ∈ (0, 1), t ∈ (0, T],   (1)

with the initial-boundary conditions

u(x, 0) = ϕ(x),   x ∈ (0, 1),   (2)
u(0, t) = g_1(t),   t ∈ [0, T],   (3)
u(1, t) = g_2(t),   t ∈ [0, T],   (4)

and the additional specification

u(x*, t) = E(t),   x* ∈ (0, 1), t ∈ (0, T],   (5)

161 Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 161-169. © 2005 Springer. Printed in the Netherlands.
where f(x, t), ϕ(x), g_1(t), g_2(t) and E(t) ≠ 0 are known functions, q is a known constant, and x* is a fixed prescribed interior point in (0, 1). If u is a temperature, then (1)-(5) can be regarded as a control problem of finding the control p = p(t) such that the internal constraint (5) is satisfied. If p(t) is known, then the direct initial-boundary value problem (1)-(4) has a unique smooth solution u(x, t) [1]. Similar inverse problems have been studied in [2, 3]. In this work we study two types of numerical procedures for the considered problem. One of them was proposed in [2, 3, 4]. According to this procedure, the term p(t)u in (1) is eliminated by introducing a suitable transformation, and the system (1)-(5) is written in a canonical form suitable for the finite difference solution. The other procedure for the solution of the same problem is obtained by using the trace type functional (TTF) formulation [5].
2. Procedure I (canonical representation)
This procedure is as follows: the term p(t)u in (1) is eliminated by introducing the transformation

v(x, t) = u(x, t) exp( (q/2) x − ∫_0^t (p(s) − q^2/4) ds ),   (6)
r(t) = exp( − ∫_0^t (p(s) − q^2/4) ds ),   (7)

and

u(x, t) = (v(x, t)/r(t)) exp(−(q/2) x),   p(t) = q^2/4 − r′(t)/r(t);   (8)

then for these new functions and variables the auxiliary problem becomes

v_t = v_xx + r(t) exp((q/2) x) f(x, t),   0 < x < 1, 0 < t ≤ T,   (9)
v(x, 0) = ϕ(x) exp((q/2) x),   0 < x < 1,   (10)
v(0, t) = g_1(t) r(t),   0 < t ≤ T,   (11)
v(1, t) = g_2(t) r(t) exp(q/2),   0 < t ≤ T,   (12)

subject to

r(t) = (v(x*, t)/E(t)) exp(−(q/2) x*),   x* ∈ (0, 1), 0 < t ≤ T,   (13)

which we can obtain by elementary calculations. Problem (9)-(12) can be viewed as a nonlocal parabolic problem with nonlocal boundary conditions.
This system can be solved by the finite difference method. Existence and uniqueness of solutions to similar problems have been proved in [6, 7]. Let v_j^n and v_*^n be approximations to v(x_j, t_n) and v(x*, t_n) respectively, let τ = ∆t > 0 and h = ∆x > 0 be the step lengths in time and space, and let {0 = t_0 < t_1 < ... < t_M = T} and {0 = x_0 < x_1 < ... < x_N = 1}, where t_n = nτ, x_j = jh, denote partitions of [0, T] and [0, 1], respectively. Then the following finite difference scheme can be used to approximate (9)-(13). Find v_j^n such that

(v_j^n − v_j^{n−1})/τ = (v_{j+1}^n − 2 v_j^n + v_{j−1}^n)/h^2 + f_j^n R(v_*^{n−1}) exp((q/2) x_j),   j = 1, ..., N−1, n = 1, ..., M,   (14)
v_j^0 = ϕ(x_j) exp((q/2) x_j),   j = 0, ..., N,   (15)
v_0^n = g_1^n R(v_*^{n−1}),   n = 1, ..., M,   (16)
v_N^n = g_2^n R(v_*^{n−1}) exp(q/2),   n = 1, ..., M,   (17)

where f_j^n = f(x_j, t_n), g_1^n = g_1(t_n), g_2^n = g_2(t_n),

R(v_*^{n−1}) = (v_*^{n−1}/E(t_{n−1})) exp(−(q/2) x*),

and, for x* ∈ (x_m, x_{m+1}],

v_*^{n−1} = ((x_{m+1} − x*)/h) v_m^{n−1} + ((x* − x_m)/h) v_{m+1}^{n−1}

is the approximate value of v(x*, t_{n−1}) obtained by linear interpolation. The scheme (14)-(17) has second order approximation in space and first order in time. Once v_j^n is known numerically, the unknown pair (u, p(t)) can be recovered through the inverse transformation (8) and (13) via numerical differentiation.
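Scheme (14)-(17) is straightforward to implement. The sketch below uses the exact-solution test data from the numerical results section (u = t sin x + 1, q = 2, p(t) = 10t e^{-t^2}, x* = 0.26) and recovers u at t = T through the inverse transformation (8); the dense `np.linalg.solve` on the tridiagonal matrix of the implicit step is for brevity only:

```python
import numpy as np

# Test data from the numerical results section.
q, xs, T = 2.0, 0.26, 1.0
u_ex = lambda x, t: t * np.sin(x) + 1.0
p_ex = lambda t: 10.0 * t * np.exp(-t * t)
f = lambda x, t: np.sin(x) * (1 + t) - 2 * t * np.cos(x) - p_ex(t) * (t * np.sin(x) + 1)
E = lambda t: t * np.sin(xs) + 1.0

N, M = 50, 2000
h, tau = 1.0 / N, T / M
mu = tau / h**2
x = np.linspace(0.0, 1.0, N + 1)
m = int(xs / h)                                     # x* in (x_m, x_{m+1}]

# Tridiagonal matrix of the implicit step (14), interior nodes only.
A = (np.diag((1 + 2 * mu) * np.ones(N - 1))
     + np.diag(-mu * np.ones(N - 2), 1)
     + np.diag(-mu * np.ones(N - 2), -1))

v = u_ex(x, 0.0) * np.exp(q * x / 2)                # (15)
for n in range(1, M + 1):
    tn = n * tau
    vstar = ((x[m + 1] - xs) * v[m] + (xs - x[m]) * v[m + 1]) / h
    R = vstar * np.exp(-q * xs / 2) / E(tn - tau)   # R(v_*^{n-1})
    rhs = v[1:-1] + tau * f(x[1:-1], tn) * R * np.exp(q * x[1:-1] / 2)
    v0 = u_ex(0.0, tn) * R                          # (16), g1(t) = u(0,t)
    vN = u_ex(1.0, tn) * R * np.exp(q / 2)          # (17), g2(t) = u(1,t)
    rhs[0] += mu * v0
    rhs[-1] += mu * vN
    v = np.concatenate(([v0], np.linalg.solve(A, rhs), [vN]))

# Inverse transformation (8), with r(T) computed from (13).
r = ((x[m + 1] - xs) * v[m] + (xs - x[m]) * v[m + 1]) / h * np.exp(-q * xs / 2) / E(T)
u_num = v / r * np.exp(-q * x / 2)
err = np.max(np.abs(u_num - u_ex(x, T)))
```

Note that u = v/r is insensitive to a global multiplicative drift of v, since r in (13) is computed from v itself; the remaining error is the O(τ) + O(h^2) discretization error of the scheme.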
3. Procedure II (TTF formulation)

If the pair of functions (u, p) solves the inverse problem (1)-(4), then

E′(t) = u_xx|_{x=x*} + q u_x|_{x=x*} + p(t) u|_{x=x*} + f(x*, t),   (18)

from which

p(t) = ( E′(t) − u_xx|_{x=x*} − q u_x|_{x=x*} − f(x*, t) ) / E(t).   (19)
Substituting this in (1) leads to the following initial-boundary value problem:

u_t = u_xx + q u_x + ( (E′(t) − u_xx|_{x=x*} − q u_x|_{x=x*} − f(x*, t))/E(t) ) u + f(x, t),   x ∈ (0, 1),   (20)
u(x, 0) = ϕ(x),   x ∈ [0, 1],   (21)
u(0, t) = g_1(t),   t ∈ (0, T],   (22)
u(1, t) = g_2(t),   t ∈ (0, T].   (23)
Such a representation is called the trace type functional (TTF) formulation of problem (1)-(4) (see [5]). From the solution of this system the approximate solution p(t) can be determined by (19). The numerical solution of (20)-(23) is realized by an implicit finite difference scheme, which can be written as follows:

(u_i^{j+1} − u_i^j)/τ = (u_{i+1}^{j+1} − 2 u_i^{j+1} + u_{i−1}^{j+1})/h^2 + q (u_{i+1}^{j+1} − u_{i−1}^{j+1})/(2h) + k(u_*^j) u_i^{j+1} + f_i^{j+1},   1 ≤ i ≤ N−1, 0 ≤ j ≤ M−1,   (24)
u_i^0 = ϕ(x_i),   0 ≤ i ≤ N,   (25)
u_0^j = g_1(t_j),   1 ≤ j ≤ M,   (26)
u_N^j = g_2(t_j),   1 ≤ j ≤ M,   (27)

where f_i^{j+1} = f(x_i, t_{j+1}) and

k(u_*^j) = ( (E_{j+1} − E_j)/τ − (u_{i*+1}^j − 2 u_{i*}^j + u_{i*−1}^j)/h^2 − q (u_{i*+1}^j − u_{i*−1}^j)/(2h) − f(x_{i*}, t_j) ) / E_{j+1}.   (28)

The system (24)-(28) can again be solved by a standard numerical solver.
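Each time level of the implicit scheme (24)-(27) yields a tridiagonal linear system, which a "standard numerical solver" can handle in O(N) operations with the Thomas algorithm. A generic sketch (array naming is ours, not the paper's):

```python
def thomas(sub, diag, sup, rhs):
    """Solve a tridiagonal system: sub (len n-1) is the sub-diagonal,
    diag (len n) the diagonal, sup (len n-1) the super-diagonal."""
    n = len(diag)
    cp = [0.0] * (n - 1)
    dp = [0.0] * n
    cp[0] = sup[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):
        den = diag[i] - sub[i - 1] * cp[i - 1]   # forward elimination
        if i < n - 1:
            cp[i] = sup[i] / den
        dp[i] = (rhs[i] - sub[i - 1] * dp[i - 1]) / den
    sol = [0.0] * n
    sol[-1] = dp[-1]
    for i in range(n - 2, -1, -1):               # back substitution
        sol[i] = dp[i] - cp[i] * sol[i + 1]
    return sol

# As we read (24), with mu = tau/h^2 and nu = q*tau/(2h), each row is
#   -(mu - nu) u_{i-1} + (1 + 2*mu - tau*k) u_i - (mu + nu) u_{i+1} = u_i^j + tau*f.
# Small symmetric example: [[4,-1,0],[-1,4,-1],[0,-1,4]] x = [2,4,2].
sol = thomas([-1.0, -1.0], [4.0, 4.0, 4.0], [-1.0, -1.0], [2.0, 4.0, 2.0])
```

The forward sweep normalizes each row against the previous one, so the whole solve is linear in N and no matrix is ever stored.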
4. Numerical Results and Discussion
In this section we report some results of our numerical calculations using the numerical procedures described in the previous sections. If we take the solution u(x, t), coefficient q, source parameter p(t) and x* as u(x, t) = t sin x + 1, q = 2, p(t) = 10 t exp(−t^2), x* = 0.26, then substituting in (1), it can be seen that the input data and the additional condition in (1)-(5) can be taken as follows: f(x, t) = sin x (1 + t) − 2t cos x − p(t)(t sin x + 1), ϕ(x) = u(x, 0) = 1,
g_1(t) = u(0, t) = 1, g_2(t) = u(1, t) = t sin(1) + 1 and E(t) = t sin(0.26) + 1. The first examples were carried out to check the approximation properties of the expressions used in procedures I and II. The numerical results for different time and space steps are given in Figs. 1-4 for procedures I and II, respectively. As seen from the figures, the approximation improves as the number of nodes increases, and for a sufficiently large number of nodes the agreement between the numerical and the exact solution becomes uniformly good.
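That this data is consistent with equation (1) is easy to verify by substituting u(x, t) = t sin x + 1 (so u_t = sin x, u_x = t cos x, u_xx = −t sin x) into (1):

```python
import math

q = 2.0
p = lambda t: 10.0 * t * math.exp(-t * t)
u = lambda x, t: t * math.sin(x) + 1.0
f = lambda x, t: math.sin(x) * (1 + t) - 2 * t * math.cos(x) - p(t) * (t * math.sin(x) + 1)

def residual(x, t):
    """u_t - (u_xx + q u_x + p u + f), derivatives taken analytically."""
    u_t, u_x, u_xx = math.sin(x), t * math.cos(x), -t * math.sin(x)
    return u_t - (u_xx + q * u_x + p(t) * u(x, t) + f(x, t))

worst = max(abs(residual(i / 10, j / 10)) for i in range(11) for j in range(11))
```

The residual vanishes (up to roundoff) on the whole grid, confirming that f, ϕ, g_1, g_2 and E above are exactly the data generated by this manufactured solution.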
The second set of examples was carried out to test the sensitivity of the procedures to errors. Artificial errors were introduced into the additional specification data by defining the functions Ẽ(t_n) = E(t_n)(1 + d(t_n, δ)), where d(t_n, δ) represents the level of relative error in the corresponding piece of data. Two cases were considered:

a) d(t, δ) = d = const (constant errors),
b) d(t, δ) is a random function of t uniformly distributed on (−δ, δ) (random errors).

Calculation results with the grid N × M = 300 × 600 are presented for cases a) and b), respectively. Results with constant errors d = 0.13 and d = 0.03 are given in Figs. 5-6. In Figs. 7-8 the results for the case of random errors d(t, 0.001) and d(t, 0.002) are presented. These results are obtained for p(t) = (t^3 + t) exp(−t^2).
As seen from the figures, with constant errors the results worsen, but not by much. In the case of random errors the pointwise approximation deteriorates, although an approximation in some integral norm is retained. Both procedures were tried on different tests, and the results we observed indicate that procedure I is more stable than procedure II. On the other hand, procedure II is more effective for the solution of some problems and less sensitive to artificial errors than procedure I.
References
[1] O. A. Ladyzhenskaya, V. A. Solonnikov, N. N. Uralceva, Linear and Quasi-linear Equations of Parabolic Type, Nauka, Moscow (1967).
[2] A. G. Fatullayev and E. Can, 'Numerical procedures for determining unknown source parameter in parabolic equation', Mathematics and Computers in Simulation, 1845, 1-9 (2000).
[3] Y. Lin, 'An inverse problem for a class of quasilinear parabolic equations', SIAM J. Math. Anal. 22, 146-156 (1991).
[4] A. G. Fatullayev, 'Numerical procedure for determination of an unknown parameter in a parabolic equation', Intern. J. Computer Math., 78, 103-109 (2001).
[5] D. Colton, R. Ewing, and W. Rundell, Inverse Problems in Partial Differential Equations, SIAM Press, Philadelphia, 1990.
[6] J. R. Cannon and Y. Lin, 'Determination of a parameter p(t) in some quasi-linear parabolic differential equations', Inverse Problems, 4, 35-45 (1988).
[7] J. R. Cannon, Y. Lin and S. Wang, 'Determination of source parameter in parabolic equations', Meccanica, 27, 85-94 (1992).
BALANCED CENTRAL NT SCHEMES FOR THE SHALLOW WATER EQUATIONS

Nelida Črnjarić-Žic
Faculty of Engineering, University of Rijeka, Croatia
[email protected]
Senka Vuković
Faculty of Engineering, University of Rijeka, Croatia
[email protected]
Luka Sopta
Faculty of Engineering, University of Rijeka, Croatia
[email protected]
Abstract
The numerical method we consider is based on the nonstaggered central scheme proposed by Jiang, Levy, Lin, Osher, and Tadmor (SIAM J. Numer. Anal. 35, 2147 (1998)), which was obtained by conversion of the standard central NT scheme to a nonstaggered mesh. The generalization we propose concerns the numerical evaluation of the geometrical source term. The presented scheme is applied to the nonhomogeneous shallow water system. By including an appropriate numerical treatment of the source term evaluation, we obtain a scheme that preserves the quiescent steady state of the shallow water equations exactly. We consider two different approaches, which depend on the discretization of the riverbed bottom. The obtained schemes are well balanced and give accurate and robust results in both steady and unsteady flow simulations.
Keywords:
balance law, central schemes, exact C-property, shallow water equations.
Introduction

In recent years many numerical schemes have been adapted for application to hyperbolic balance laws. Different schemes are obtained according to the discretization of the source term. In the presence of stiff source terms in balance laws, an implicit evaluation of the source term is needed, since an explicit evaluation can produce numerical instabilities. For other types of balance laws, which incorporate geometrical source terms, such as the shallow water equations, an essentially different approach must be used. Here an explicit evaluation of the source term that additionally accounts for the crucial property of balancing between the flux gradient and the source term leads to very accurate and robust numerical schemes. One of the first numerical schemes based on that approach was developed by Bermudez and Vazquez ([1, 2]). Their numerical scheme is of finite volume type, with a source term evaluation that includes upwinding in such a way that the obtained scheme is consistent with the quiescent steady state, i.e., it satisfies the C-property. In [9] the surface gradient method, used in combination with the MUSCL-Hancock scheme, leads to a balanced numerical scheme. Central-upwind schemes ([3]) have also been developed for the shallow water equations. Furthermore, in [7] higher order numerical schemes, i.e., the finite difference ENO and WENO schemes, were extended to balance laws. In this work we focus on the nonstaggered central NT scheme ([8, 10]). In [10], central NT schemes were already developed for balance laws. However, the approach used there is aimed at balance laws with a stiff source term, while here we consider systems with a geometrical source term and present a completely different numerical treatment. The paper is organized as follows. After the nonstaggered central NT scheme for the homogeneous case is presented, its extension to the balance law is given. In the second section we apply the extended schemes to the shallow water equations. The discretizations of the source term are made according to the required balancing property. Additionally, the numerical scheme must be adapted in such a way that the transformations from the nonstaggered to the staggered values and vice versa preserve the quiescent flow. In that sense, based on different riverbed discretizations, we introduce two reformulations of the numerical scheme for the shallow water flow case. In this section we also prove that both reformulations satisfy the exact C-property. In numerical tests in the last section we verify the accuracy of the given schemes and present the improvement obtained by using the balanced version of the schemes.

171 Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 171–185. © 2005 Springer. Printed in the Netherlands.
1. Central NT scheme
In this section we give a short overview of the central schemes. A detailed description of these schemes can be found, e.g., in [8, 10, 5]. Let us consider the one-dimensional homogeneous hyperbolic conservation law system

\partial_t u + \partial_x f(u) = 0.   (1)

We define cells of size \Delta x, I_i = [x_{i-1/2}, x_{i+1/2}], i = 0, \ldots, N, where x_{i\pm 1/2} = x_i \pm \Delta x / 2, with the points x_i = i \Delta x as the cell centers. Furthermore, the staggered cells [x_i, x_{i+1}] are denoted by I_{i+1/2}. For a solution u(x, t), u_i^n = u(x_i, t^n) denotes a point value of the solution at t = t^n. The notations \bar{u}_i^n and \bar{u}_{i+1/2}^n are used for the average values of the solution over the cells I_i and I_{i+1/2}, respectively. We start with the integration of (1) over a control volume I_{i+1/2} \times [t^n, t^{n+1}] and obtain the expression

\bar{u}_{i+1/2}^{n+1} = \bar{u}_{i+1/2}^n - \frac{1}{\Delta x} \left( \int_{t^n}^{t^{n+1}} f(u(x_{i+1}, t))\,dt - \int_{t^n}^{t^{n+1}} f(u(x_i, t))\,dt \right).   (2)
The second order Nessyahu–Tadmor central scheme (central NT scheme) is based on a piecewise linear representation of the solution on each grid cell,

u(x, t^n) = \sum_i \left( u_i^n + u_i' (x - x_i) \right) \chi_{I_i}(x).   (3)

A slope u_i' inside the cell is computed by using some standard slope limiting procedure ([5]). The simplest choice is the minmod limiter u_i' = \frac{1}{\Delta x} MM(u_{i+1} - u_i, u_i - u_{i-1}), where MM(a, b) is the minmod function. Now, \bar{u}_{i+1/2}^n is the cell average at time t^n obtained by integrating the piecewise linear function (3) over the cell I_{i+1/2}, i.e.,

\bar{u}_{i+1/2}^n = \frac{1}{2} (u_i^n + u_{i+1}^n) + \frac{\Delta x}{8} (u_i' - u_{i+1}').   (4)
Thus, with (4) a second order accuracy in space is obtained. Approximating the integrals in (2) so that second order accuracy in time is attained yields the central NT scheme, which can be written in the predictor–corrector form

u_i^{n+1/2} = u_i^n - \frac{\Delta t}{2 \Delta x} f_i', \qquad u_i^n = \bar{u}_i^n,   (5)

\bar{u}_{i+1/2}^{n+1} = \bar{u}_{i+1/2}^n - \frac{\Delta t}{\Delta x} \left( f(u_{i+1}^{n+1/2}) - f(u_i^{n+1/2}) \right).   (6)
Here f_i' denotes the spatial derivative of the flux. In order to prevent spurious oscillations in the numerical solution, it is necessary to evaluate the quantity f_i' using a suitable slope limiter ([8]). The slope limiter procedure can be applied directly to the values f(u_i^n), or the relation f_i' = A(u_i^n) u_i' can be used. In this work the second approach, in combination with a minmod slope limiter, is chosen. After the staggered values \bar{u}_{i+1/2}^{n+1} in the corrector step of the scheme are computed, the nonstaggered version of the central NT scheme developed in [8] returns back to the nonstaggered mesh. That is, the average nonstaggered values \bar{u}_i^{n+1} must be determined. In order to do that, first the piecewise linear representation

u(x, t^{n+1}) = \sum_i \left( \bar{u}_{i+1/2}^{n+1} + u_{i+1/2}' (x - x_{i+1/2}) \right) \chi_{I_{i+1/2}}(x)   (7)

is constructed. The staggered cell derivatives are computed by applying a slope limiter procedure to the staggered values \bar{u}_{i+1/2}^{n+1}. The values \bar{u}_i^{n+1} are now obtained by averaging this linear interpolant over the cell I_i,

\bar{u}_i^{n+1} = \frac{1}{2} (\bar{u}_{i-1/2}^{n+1} + \bar{u}_{i+1/2}^{n+1}) - \frac{\Delta x}{8} (u_{i+1/2}' - u_{i-1/2}').   (8)
We now consider a balance law system

\partial_t u + \partial_x f(u) = g(u, x).   (9)
In order to solve it with the central NT scheme, an appropriate extension of the presented scheme should be applied. Several possible approaches are given in [10]. We consider here only geometrical type source terms, therefore an upwinded discretization will be crucial for obtaining a stable numerical scheme. The additional requirements on the source term evaluation, which depend on the particular balance law and are proposed in the next section, ensure the good accuracy of the numerical scheme developed in this work. Let us proceed as in the homogeneous case. The integration of (9) over a control volume I_{i+1/2} \times [t^n, t^{n+1}] gives

\bar{u}_{i+1/2}^{n+1} = \bar{u}_{i+1/2}^n - \frac{1}{\Delta x} \left( \int_{t^n}^{t^{n+1}} f(u(x_{i+1}, t))\,dt - \int_{t^n}^{t^{n+1}} f(u(x_i, t))\,dt \right) + \frac{1}{\Delta x} \int_{t^n}^{t^{n+1}} \int_{x_i}^{x_{i+1}} g(u(x, t), x)\,dx\,dt.   (10)
To obtain a second order scheme, all the integrals in the above expression must be evaluated to this order. The flux integral is approximated as before by using the midpoint rule, i.e.,

\int_{t^n}^{t^{n+1}} f(u(x_i, t))\,dt \approx \Delta t\, f(u_i^{n+1/2}),

where the predictor values u_i^{n+1/2} are now evaluated by using the relation

u_i^{n+1/2} = u_i^n + \frac{\Delta t}{2 \Delta x} \left( -f_i' + g_i^n \Delta x \right)   (11)
obtained from (9). The term g_i^n can be evaluated pointwise, or some other approximation can be applied, as we will see in the remainder of this work. Furthermore, the approximation of the source term integral in (10) is defined such that second order accuracy in time is obtained. With this discretization the corrector step of our scheme,

\bar{u}_{i+1/2}^{n+1} = \bar{u}_{i+1/2}^n - \frac{\Delta t}{\Delta x} \left( f(u_{i+1}^{n+1/2}) - f(u_i^{n+1/2}) \right) + \Delta t\, g(u_i^{n+1/2}, u_{i+1}^{n+1/2}),   (12)

is obtained. The spatial accuracy depends on the definition of the term g(u_i^{n+1/2}, u_{i+1}^{n+1/2}). Transformations from the staggered values to the nonstaggered ones and in the opposite direction are obtained with the relations (4) and (8), as in the homogeneous case.
2. Balanced central NT scheme for the shallow water equations
In this section we apply the nonstaggered central NT schemes to the shallow water equations. In the shallow water case, (9) is defined with

u = \begin{pmatrix} h \\ hv \end{pmatrix}, \quad f = \begin{pmatrix} hv \\ hv^2 + \frac{1}{2} g h^2 \end{pmatrix}, \quad g = \begin{pmatrix} 0 \\ gh \left( -\frac{dz}{dx} - \frac{M^2 v|v|}{h^{4/3}} \right) \end{pmatrix}.   (13)

Here h = h(x, t) is the water depth, v = v(x, t) is the water velocity, g is the acceleration due to gravity, z = z(x) is the bed level, and M = M(x) is the Manning friction factor. The crucial property we want to be satisfied when the central NT scheme is applied to the shallow water equations is the exact C-property ([1]). A numerical scheme has the exact C-property if it preserves the quiescent steady state h + z = const, v = 0 exactly. Since in that case the balancing between the flux gradient and the source term must be obtained, we refer to the scheme developed in this paper as the balanced central NT scheme. In order to define the central NT scheme for the shallow water system, the source term g_i^n in the predictor step (11) and the term g(u_i^{n+1/2}, u_{i+1}^{n+1/2}) that arises in the corrector step (12) should be determined. From this point on, when the derivatives of the variables are evaluated, we use just the minmod limiter function. Following the idea of decomposing the source term, we propose to evaluate g_i^n as

g_i^n = g_{i,L}^n + g_{i,R}^n,   (14)

where

g_{i,L}^n = \frac{s_i^2 - s_i}{2} \begin{pmatrix} 0 \\ -g h_i^n \frac{z_i - z_{i-1}}{\Delta x} \end{pmatrix}, \qquad g_{i,R}^n = \frac{s_i^2 + s_i}{2} \begin{pmatrix} 0 \\ -g h_i^n \frac{z_{i+1} - z_i}{\Delta x} \end{pmatrix}.
The parameter s_i in the i-th cell is defined by

s_i = \begin{cases} -1, & \text{if } h_i' = h_i^n - h_{i-1}^n, \\ 1, & \text{if } h_i' = h_{i+1}^n - h_i^n, \\ 0, & \text{if } h_i' = 0. \end{cases}   (15)

Depending on the side that is chosen when the variable and flux derivatives are evaluated, the defined parameter changes its sign. Thus, the expression (14) actually includes source term upwinding. In this way the source term discretization is made according to the flux gradient evaluation. For the term g(u_i^{n+1/2}, u_{i+1}^{n+1/2}) we propose to use just the centered approximation

g(u_i^{n+1/2}, u_{i+1}^{n+1/2}) = \begin{pmatrix} 0 \\ g \frac{h_i^{n+1/2} + h_{i+1}^{n+1/2}}{2} \left( -\frac{z_{i+1} - z_i}{\Delta x} \right) \end{pmatrix}.   (16)
The part of the source term concerning friction forces is omitted in (14) and (16). The reason lies in the fact that it does not appear when the quiescent flow case is considered, and we evaluate it just pointwise. Since we want the defined numerical scheme to preserve the quiescent flow exactly, we must first check whether the balancing between the flux gradient and the source term is obtained. In the quiescent flow case the variable, the flux and the source term vector reduce to

u = \begin{pmatrix} h \\ 0 \end{pmatrix}, \quad f = \begin{pmatrix} 0 \\ \frac{1}{2} g h^2 \end{pmatrix}, \quad g = \begin{pmatrix} 0 \\ gh \left( -\frac{dz}{dx} \right) \end{pmatrix}.   (17)

If we use the definition (14) in (11), it is not hard to see that in the quiescent flow case the equality

u_i^{n+1/2} = u_i^n   (18)

holds. In a similar way, from (12) by using (16) we get

\bar{u}_{i+1/2}^{n+1} = \bar{u}_{i+1/2}^n.   (19)

The obtained equalities are a consequence of balancing in both steps of the numerical scheme. From (18) and (19) we can conclude that in the quiescent flow case no time evolution of the variables occurs. Hence, if the initial discretization satisfies the quiescent flow condition, this condition will be preserved provided the procedure of passing from the original to the staggered mesh and vice versa is defined in an appropriate way. For that purpose a modification of the original nonstaggered version of the central NT scheme is needed when applying it to the shallow water equations. We propose here two different reformulations of the algorithm for the evaluation of the staggered and the nonstaggered cell averages in the shallow water case. These reformulations are based on the discretizations of the riverbed bottom.
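The balancing claim (18) is easy to check numerically. The following sketch is our own (arbitrary smooth bed, friction omitted, periodic indexing): for quiescent data h + z = const, v = 0, the momentum component of the predictor increment -f_i' + g_i^n \Delta x in (11) vanishes identically when the bed difference in (14) is taken on the same side as the limited depth difference selected by s_i in (15):

```python
import numpy as np

def minmod(a, b):
    return np.where(a * b > 0.0, np.where(np.abs(a) < np.abs(b), a, b), 0.0)

grav = 9.81
x = np.linspace(0.0, 20.0, 41)
z = 0.2 * np.exp(-0.5 * (x - 10.0) ** 2)    # an arbitrary smooth bed
h = 1.0 - z                                  # quiescent data: h + z = 1, v = 0

dh_l, dh_r = h - np.roll(h, 1), np.roll(h, -1) - h
hprime = minmod(dh_r, dh_l)                  # limited depth differences (times dx)
s = np.where(hprime == 0.0, 0, np.where(hprime == dh_l, -1, 1))  # eq. (15)

# momentum component of the (difference-scaled) flux derivative at v = 0:
# f'_i = A(u_i) u'_i = (0, grav * h_i * h'_i)
fprime2 = grav * h * hprime

# decomposition (14): the bed difference follows the side selected by s_i
src2 = np.where(s == -1, -grav * h * (z - np.roll(z, 1)),
       np.where(s == 1, -grav * h * (np.roll(z, -1) - z), 0.0))

# predictor increment of (11), momentum component: -f'_i + g_i^n * dx
print(np.allclose(-fprime2 + src2, 0.0))     # balanced: prints True
```

Since h' equals minus the bed difference on the same side, the two terms cancel cell by cell, which is exactly the mechanism behind (18).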
2.1 The interface type reformulation
We consider first the case where the bed topography is defined at the cell interfaces and the bed shape is approximated as a linear function inside each cell. That is, the values z_{i-1/2} and z_{i+1/2} are known, while the height of the riverbed bottom inside the cell I_i is expressed as z(x) = z_i + \frac{1}{\Delta x}(z_{i+1/2} - z_{i-1/2})(x - x_i). At the cell center the relation z_i = \frac{z_{i-1/2} + z_{i+1/2}}{2} is valid.

Now we start with our reformulation. The corrections we propose are connected with the way of evaluating u_i' and u_{i+1/2}' in (3) and (7). The given reformulation is based on the surface gradient method. Since in the quiescent flow case the second component of the variable vector is equal to zero, the modifications will be made just for the first component, i.e., for the variable h. When the central NT scheme is considered, the water depth and the riverbed bottom are supposed to be linear inside each cell. Here, the linearization of the water depth will be made indirectly, by first prescribing the linearization of the water level H(x) and then using the relation h(x) = H(x) - z(x). The linearization H(x) inside a cell I_i is obtained by using a slope limiting procedure on the cell values H_i = h_i + z_i. Thus, for x \in I_i we have H(x) = H_i + H_i'(x - x_i). The derivative of the water depth can then obviously be calculated as

h_i' = H_i' - \frac{1}{\Delta x}(z_{i+1/2} - z_{i-1/2}).   (20)

When the staggered values are considered, the reformulation is again applied just to h. First we define the point values of the water level on the staggered mesh as

\tilde{H}_{i+1/2} = \bar{h}_{i+1/2} + \tilde{z}_{i+1/2}.   (21)

Here the term \tilde{z}_{i+1/2} = z_{i+1/2} - \frac{1}{2} \left( z_{i+1/2} - \frac{z_i + z_{i+1}}{2} \right) is the corrected riverbed bottom. The reason for this correction lies in the fact that the riverbed is not linear inside the staggered cell I_{i+1/2}. Now the discrete derivatives \tilde{H}_{i+1/2}' are derived from the staggered values \{\tilde{H}_{i+1/2}\} by using a standard slope limiter procedure. Then the relation

h_{i+1/2}' = \tilde{H}_{i+1/2}' - \frac{1}{\Delta x}(z_{i+1} - z_i)   (22)

is applied. We claim that with the described treatment of the cell average evaluations the reformulated nonstaggered central scheme is consistent with the quiescent flow case. Let us prove that.
From the relation (19) it follows that the staggered values do not change in the time step of the numerical scheme. That means it is enough to prove that the transformations from the staggered values to the nonstaggered ones and then back return the same values that we started from. We concentrate just on the variable h. The quiescent flow at the discrete level can be written as

H_i = h_i + z_i = const.   (23)

From relations (4) and (20) we obtain

\bar{h}_{i+1/2}^n = \frac{1}{2}(h_i^n + h_{i+1}^n) + \frac{\Delta x}{8} \left( H_i' - \frac{z_{i+1/2} - z_{i-1/2}}{\Delta x} - H_{i+1}' + \frac{z_{i+3/2} - z_{i+1/2}}{\Delta x} \right) = \frac{1}{2}(h_i^n + h_{i+1}^n) - \frac{1}{2} \left( z_{i+1/2} - \frac{z_i + z_{i+1}}{2} \right).   (24)

The last equality is obtained by using the fact that for the quiescent flow case H_i' = 0 and by applying the relations z_{i+1/2} - z_{i-1/2} = 2(z_{i+1/2} - z_i) and z_{i+3/2} - z_{i+1/2} = 2(z_{i+1} - z_{i+1/2}). By using (24) in (21), simple calculations give \tilde{H}_{i+1/2} = \frac{1}{2}(h_i + h_{i+1} + z_i + z_{i+1}), and since (23) is valid, \tilde{H}_{i+1/2} is constant over the whole domain. Finally, the nonstaggered values are evaluated from (8) by using (22) as

\bar{h}_i^{n+1} = \frac{1}{2}(\bar{h}_{i-1/2}^{n+1} + \bar{h}_{i+1/2}^{n+1}) - \frac{\Delta x}{8} \left( \tilde{H}_{i+1/2}' - \frac{z_{i+1} - z_i}{\Delta x} - \tilde{H}_{i-1/2}' + \frac{z_i - z_{i-1}}{\Delta x} \right).   (25)–(26)

By taking into account the expression (24) for the staggered values of h and the fact \tilde{H}_{i+1/2}' = 0, the right side of (26) reduces to \bar{h}_i^n. With this the proof of the consistency with the quiescent flow case is complete.
2.2 The cell centered type reformulation
Now we consider the case in which the bottom heights z_i at the cell centers are given. The surface gradient method is then applied in the following way. Let us notice that the term \bar{u}_{i+1/2}^n appears only in relation (12), where the approximation of the spatial part is added to this term. Therefore it is not necessary to evaluate the term \bar{h}_{i+1/2}^n directly. Instead, we compute the staggered values \bar{H}_{i+1/2}^n in the same way as described in the previous paragraph, i.e., by using the values H_i = h_i + z_i and a slope limiter procedure for evaluating the derivatives. After the evolution step (12) is applied, we obtain, instead of \bar{h}_{i+1/2}^{n+1}, the staggered value of the water level \bar{H}_{i+1/2}^{n+1} as the first component of the variable \bar{u}_{i+1/2}^{n+1}. The next step of the method, in which the nonstaggered values are computed, gives us the water level values \bar{H}_i^{n+1}. Finally, by applying the simple relation \bar{h}_i^{n+1} = \bar{H}_i^{n+1} - z_i, the water depth values at time step t = t^{n+1} are obtained. We now prove that the scheme obtained with this reformulation also preserves the quiescent steady state exactly. Again, as in the previous reformulation, due to the equalities (18) and (19), we concentrate just on verifying that the procedure of passing from staggered to nonstaggered values and back preserves the water depth in the quiescent flow case. Since (23) is valid, H_i' = 0, so from (4) we get

\bar{H}_{i+1/2}^n = \frac{1}{2}(\bar{H}_i^n + \bar{H}_{i+1}^n) = const.

As the staggered values do not change in the evolution step, the values \bar{H}_{i+1/2}^{n+1} will be constant and the term \bar{H}_{i+1/2}' will be equal to zero. By including the established facts in (8) we have

\bar{H}_i^{n+1} = \frac{1}{2}(\bar{H}_{i-1/2}^n + \bar{H}_{i+1/2}^n) = \bar{H}_i^n,   (27)

so the equality \bar{h}_i^{n+1} = \bar{h}_i^n is obviously fulfilled.

3. Numerical results
In this section we present the improvements obtained by using the proposed balanced versions of the nonstaggered central NT scheme on several test problems. In all the test problems the CFL coefficient is set to 0.5.
3.1 A quiescent steady test
In this test we are interested in the quiescent steady state preserving property of our scheme. We test it on the problem with the riverbed geometry proposed by the Working Group on Dam-Break Modelling, as described in [2]. The water level is initially defined with H = 15 m and the water is at rest. The riverbed and the initial water level are presented in Fig. 1. Computations are performed by using the interface type reformulation and \Delta x = 7.5 m. In Fig. 2 we can see the performance obtained by using the balanced and the pointwise central NT scheme. The numerical errors that appear when just the pointwise source term evaluation is used are very large, and therefore unacceptable for practical use.
3.2 Tidal wave propagation in a channel with a discontinuous bottom
We consider here an unsteady problem, again taken from [2]. It is used to establish the correctness of the central NT scheme in the case of gradually varied flow and to show that the proposed source term evaluation is necessary when a discontinuous bottom is present. The riverbed is the same as in the previous test problem. The incoming tidal wave at the left boundary is defined by

h(0, t) = 16.0 - 4.0 \sin \left( \pi \left( \frac{4t}{86400} + \frac{1}{2} \right) \right).

The water, initially 12 m deep, is at rest. The right boundary condition is v(1500, t) = 0. The computations are performed with the space step \Delta x = 7.5 m. We give numerical results after t = 10800 s. The results presented in Fig. 3, where the comparison between the balanced and nonbalanced versions of the central NT schemes is made, clearly illustrate the superiority of the balanced schemes. Then in Fig. 4 the numerically obtained velocity profile is compared with the approximate one (see [1]). We
Figure 1. Initial conditions for the test problem 3.1 (riverbed and initial water level over 0 ≤ x ≤ 1500).
Figure 2. Comparison in water level for the quiescent steady state at t = 100 s (balanced vs. pointwise central NT scheme). Test problem 3.1.
can conclude that the agreement is excellent. This suggests that the proposed scheme is accurate for tidal flow over an irregular bed. Such behaviour is very encouraging for real water flow simulations over natural watercourses.
Figure 3. Comparison of velocity at t = 10800 s in the test problem 3.2 (balanced vs. pointwise central NT scheme).
Figure 4. Velocity computed with the balanced central NT scheme vs. asymptotic solution at t = 10800 s. Test problem 3.2.
3.3 A convergence test over an exponential bump
This is a steady state test problem used for testing the convergence properties of the balanced central NT scheme. We know that a central NT scheme is second order accurate when it is used on homogeneous conservation laws. Now we want to confirm this order of accuracy for balance laws also. The riverbed bottom is supposed to be given by the smooth function z(x) = 0.2 e^{-\frac{4}{25}(x-10)^2}. The domain is the range [0, 20] and the initial condition is a steady subcritical flow with a constant discharge equal to 4.42 m^2/s. The stationary solution can be evaluated analytically and should be preserved. With the given test problem we examine the accuracy and the convergence properties of our scheme. Here we test the interface type reformulation. The convergence test results are presented in Table 1. We note that the experimentally established orders coincide very well with the theoretical ones.

Table 1. Accuracy of the central NT scheme. Test problem 3.3.

Errors in water level
  N     L1 error       L1 order   L-inf error    L-inf order
  20    3.72 x 10^-3      --      1.58 x 10^-2      --
  40    1.18 x 10^-3     1.65     5.34 x 10^-3     1.57
  80    2.83 x 10^-4     2.07     1.71 x 10^-3     1.64
  160   6.76 x 10^-5     2.07     4.98 x 10^-4     1.78
  320   1.66 x 10^-5     2.02     1.29 x 10^-4     1.95

Errors in discharge
  N     L1 error       L1 order   L-inf error    L-inf order
  20    6.82 x 10^-3      --      2.51 x 10^-2      --
  40    2.06 x 10^-3     1.73     1.10 x 10^-2     1.20
  80    5.20 x 10^-4     1.99     3.74 x 10^-3     1.55
  160   1.29 x 10^-4     2.01     1.07 x 10^-3     1.81
  320   3.21 x 10^-5     2.00     2.76 x 10^-4     1.95
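The tabulated orders can be reproduced from the printed errors: since the grid is refined by a factor of two, the observed order between resolutions N and 2N is log2 of the error ratio. Small deviations from the table come from rounding of the printed error values:

```python
import math

# L1 water-level errors from Table 1 for N = 20, 40, 80, 160, 320
e = [3.72e-3, 1.18e-3, 2.83e-4, 6.76e-5, 1.66e-5]

# observed order between successive resolutions: log2(e_N / e_2N)
orders = [math.log2(e[k] / e[k + 1]) for k in range(len(e) - 1)]
print([round(o, 2) for o in orders])  # close to the tabulated 1.65, 2.07, 2.07, 2.02
```

The computed sequence approaches 2, consistent with the claimed second order accuracy of the balanced scheme.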
3.4 LeVeque test example over a bump
This test problem was suggested by LeVeque ([4]). The bottom topography is defined by

z(x) = \begin{cases} 0.25 (\cos(10\pi(x - 0.5)) + 1), & \text{if } |x - 0.5| < 0.1, \\ 0, & \text{otherwise}, \end{cases}   (28)

over the domain [0, 1]. The initial conditions are

v(x, 0) = 0 \quad \text{and} \quad h(x, 0) = \begin{cases} 1.01 - z(x), & \text{if } 0.1 < x < 0.2, \\ 1.0 - z(x), & \text{otherwise}. \end{cases}   (29)

As in [4] we take g = 1. The small perturbation defined by the initial conditions splits into two waves. The left-going wave leaves the domain, while the right-going one moves over the bump. Results are shown at time t = 0.7 s,
after the left-going wave has already left the domain, while the right-going one passes over the bump. The computations are performed with the space step \Delta x = 0.005 and by using the cell centered type reformulation. The disturbance in the pointwise version caused by the varying riverbed bottom can be clearly seen in Fig. 5. These numerical errors are of the same order as the disturbance that is moving over the domain. This leads to the conclusion that the nonbalanced scheme is especially unfavorable in cases where small disturbances appear.
Figure 5. Comparison in water level at t = 0.7 s (balanced vs. pointwise central NT scheme). Test problem 3.4.

3.5 Dam-break over a rectangular bump
This is a test problem taken from [7]. The purpose of this test is to check the balanced central NT scheme in the case of rapidly varying flow over a discontinuous bottom. The riverbed is given by

z(x) = \begin{cases} 8, & \text{if } |x - 1500/2| < 1500/8, \\ 0, & \text{otherwise}, \end{cases}   (30)

while the initial conditions are

H(x, 0) = \begin{cases} 20, & \text{if } x \le 750, \\ 15, & \text{otherwise}, \end{cases} \quad \text{and} \quad v(x, 0) = 0.   (31)

The Manning friction factor is set to 0.1. The computations are performed with the space step \Delta x = 2.5 m and the cell centered type reformulation. In Figs. 6 and 7 we compare the balanced and the nonbalanced central NT scheme at time t = 15 s. The improvements obtained by using the balanced version are clearly visible.
4. Concluding remarks
In this paper we present the extension of the nonstaggered central NT schemes to balance laws with geometrical source terms. An equilibrium type discretization of the source term, which includes the balancing with the flux gradient, is used. The schemes are applied to the shallow water equations. The computations performed on several test problems show very good results in both steady and unsteady flow cases.
Figure 6. Comparison in water level at t = 15 s (bed, balanced and pointwise central NT scheme). Test problem 3.5.

Figure 7. Comparison in discharge at t = 15 s (balanced vs. pointwise central NT scheme). Test problem 3.5.

References
[1] A. Bermúdez and M. E. Vázquez, Upwind methods for hyperbolic conservation laws with source terms, Computers & Fluids 23(8), 1049-1071 (1994).
[2] M. E. Vázquez-Cendón, Improved treatment of source terms in upwind schemes for the shallow water equations in channels with irregular geometry, Journal of Computational Physics 148, 497-526 (1999).
[3] A. Kurganov and D. Levy, Central-upwind schemes for the Saint-Venant system, Mathematical Modelling and Numerical Analysis (M2AN) 33(3), 547-571 (1999).
[4] R. J. LeVeque, Balancing source terms and flux gradients in high-resolution Godunov methods: the quasi-steady wave-propagation algorithm, Journal of Computational Physics 146, 346 (1998).
[5] R. J. LeVeque, Finite Volume Methods for Hyperbolic Problems, Cambridge University Press, 2002.
[6] T. Gallouët, J.-M. Hérard and N. Seguin, Some approximate Godunov schemes to compute shallow-water equations with topography, Computers & Fluids 32, 479-513 (2003).
[7] S. Vuković and L. Sopta, ENO and WENO schemes with the exact conservation property for one-dimensional shallow water equations, Journal of Computational Physics 179, 593-621 (2002).
[8] G.-S. Jiang, D. Levy, C.-T. Lin, S. Osher and E. Tadmor, High-resolution nonoscillatory central schemes with nonstaggered grids for hyperbolic conservation laws, SIAM J. Numer. Anal. 35(6), 2147-2168 (1998).
[9] J. G. Zhou, D. M. Causon, C. G. Mingham and D. M. Ingram, The surface gradient method for the treatment of source terms in the shallow-water equations, Journal of Computational Physics 168, 1-25 (2001).
[10] S. F. Liotta, V. Romano and G. Russo, Central schemes for balance laws of relaxation type, SIAM J. Numer. Anal. 38(4), 1337-1356 (2000).
HIDDEN MARKOV MODELS AND MULTIPLE ALIGNMENTS OF PROTEIN SEQUENCES

Pavle Goldstein
Department of Mathematics, University of Zagreb
[email protected]
Maja Karaga Department of Mathematics, University of Zagreb
[email protected]
Mate Kosor Department of Mathematics, University of Zagreb
Ivana Nižetić
Department of Mathematics, University of Zagreb

Marija Tadić
Department of Mathematics, University of Zagreb
Domagoj Vlah Department of Mathematics, University of Zagreb
Abstract
A multiple alignment algorithm for protein sequences is considered. The alignment is obtained from a hidden Markov model of the family, which is built using a simulated annealing variant of the EM algorithm. Several methods for obtaining the optimal model/alignment are discussed and applied to a family of globins.
Keywords:
multiple alignment, hidden Markov model, simulated annealing, expectation maximization, suboptimal alignment.
187 Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 187–196. © 2005 Springer. Printed in the Netherlands.
1. Introduction
Multiple alignments of protein sequences are among the most important computational methods for studying proteins. A multiple alignment of a protein family can be used to describe the evolutionary relationships within the family and build a phylogenetic tree, to detect structurally important or functional sites, and to infer structure and function. Let us also point out that, given sufficient similarity between the sequences, "the correct" multiple alignment of sequences will actually describe the best possible match of the associated three-dimensional objects (i.e., proteins in their folded state). Consequently, MSA algorithms, or, given the uncertainties involved, MSA strategies, are a topic of great interest in computational molecular biology. In this note, we describe several MSA strategies based on hidden Markov models. Our aim was to design a robust procedure with resulting alignments comparable in quality to the ones produced by heuristic algorithms, primarily CLUSTALW (cf. [4]). It should be pointed out that, since they were introduced into computational biology approximately fifteen years ago (cf. [1]), HMMs have found various applications, primarily as family profiles, but also as MSA tools (cf. [6]). However, those methods, in contrast to CLUSTALW, require a representative sample of the family to work with. Here we show that, when working with an unrepresentative sample, the problem of finding the optimal alignment is quite different from the problem of determining the optimal model. We also propose modifications to the model that we believe will rectify this problem. Let us briefly describe our setup and introduce some notation: in the next Section, we give a short description of the HMM used. Sections 3 and 4 contain details on the expectation maximization implementation and suboptimal alignment procedures, respectively, while in the last Section we show the results of our tests.

Throughout the paper, the family of protein sequences will be denoted by x = \{x^i\}_{i=1}^n, where each x^i = x^i_1 x^i_2 \ldots x^i_{k(i)} is a finite word in the alphabet A of 20 standard amino acids. For a family x, a multiple alignment of x is the family MA(x) = \hat{x} = \{\hat{x}^i\}_{i=1}^n, where each \hat{x}^i is a word in the alphabet A \cup \{-\}, and \hat{x}^i restricted to A equals x^i. Since we do not allow columns consisting entirely of gaps, for each l there exists j such that \hat{x}^j_l \neq -.
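The conditions in this definition are straightforward to state in code. The following sketch is our own (with a toy two-sequence example): it checks that all aligned rows have equal length, that each row restricted to A recovers the original sequence, and that no column consists entirely of gaps:

```python
# the 20-letter amino-acid alphabet A
A = set("ACDEFGHIKLMNPQRSTVWY")

def is_alignment(xs, xhats):
    """Check the MA(x) conditions for candidate aligned rows xhats."""
    if any(set(x) - A for x in xs):               # sequences must use the alphabet A
        return False
    L = len(xhats[0])
    if any(len(xh) != L for xh in xhats):         # aligned rows have equal length
        return False
    if any(xh.replace("-", "") != x for x, xh in zip(xs, xhats)):
        return False                              # restriction to A recovers x^i
    # no column may consist entirely of gaps
    return all(any(xh[l] != "-" for xh in xhats) for l in range(L))

xs = ["HEAGAWGHEE", "PAWHEAE"]                    # a toy family
xhats = ["HEAGAWGHE-E", "--P-AW-HEAE"]            # a candidate alignment of it
print(is_alignment(xs, xhats))                    # prints True
```

An alignment with an all-gap column, or one whose gap-free rows differ from the input sequences, is rejected by the same function.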
2. Hidden Markov Models
The putative Markov process under consideration consists of the emission of a single amino acid. It is common to use the HMM from Figure 1 to model this process (cf. [1, 2]). Here, the squares M correspond to the match states, diamonds I to the insert states and circles D to the silent delete states. The model is determined in terms of its parameters: the emission probabilities e_S(b), the probability of state S emitting the symbol b \in A, for S being M or I, and the transition probabilities a_{ST}.
[Diagram: a chain of match states M_j (squares), insert states I_j (diamonds), and delete states D_j (circles), connected by transitions between Begin and End states.]

Figure 1. Standard HMM for protein family modelling
Remark 2.1. While it looks reasonable to assume that insert states correspond to insert regions (regions of low similarity) and match states correspond to conserved regions, note that there is nothing in the model so far that would ensure it behaves in such a manner. Namely, unless certain restrictions are put on the emission and transition probabilities, a conserved column in an alignment could very well be emitted from an insert state. In order to distinguish between match and insert states, different kinds of pseudo-counts were used to adjust the match and insert emission probabilities. Let q_1, ..., q_20 stand for the average occurrences of the twenty standard amino acids. For the insert state emissions, the prior was assumed to be either flat or given by the q_i. For the match state emissions, a substitution matrix (BLOSUM 50) was used in the following fashion: since the entry B(a, b) is defined as

B(a, b) = log ( p_ab / (q_a q_b) ),

it follows that – after renormalization – the vector

( q_{b_1} K^{B(a,b_1)}, ..., q_{b_20} K^{B(a,b_20)} ),   K ∈ [1, ∞),   (1)

gives an average distribution of the amino acid a after some time T(K). Obviously, K = 1 corresponds to a very long evolutionary time, while, for example, K = 100 will give a distribution almost entirely concentrated in a.
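The rescaling in (1) is simple to compute; a minimal sketch with a toy two-letter alphabet and hypothetical log-odds scores standing in for BLOSUM 50 (the function name and data are ours, not from the paper):

```python
def rescaled_distribution(a, q, B, K):
    """Renormalized vector (q_b * K**B[a][b]) from (1).

    q: background frequencies {symbol: q_b}; B: log-odds scores B[a][b]
    (here a toy stand-in for BLOSUM 50).  K = 1 returns the background
    distribution; large K concentrates the mass on symbols that score
    high against a (typically a itself).
    """
    w = {b: q[b] * K ** B[a][b] for b in q}
    z = sum(w.values())
    return {b: wb / z for b, wb in w.items()}

# Toy example (hypothetical scores, not real BLOSUM values):
q = {"A": 0.5, "R": 0.5}
B = {"A": {"A": 2, "R": -1}, "R": {"A": -1, "R": 2}}
```

With K = 1 the weights reduce to the background frequencies, matching the remark about a very long evolutionary time; with K = 100 nearly all the mass sits on a.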
3. Expectation Maximization
Given a protein family x, one can construct a model – i.e. determine the parameters of the model – that maximizes the score of the family. This is usually known as the Baum–Welch or, more generally, the EM algorithm. It is also well known that this algorithm tends to get stuck in local maxima, and hence may fail to determine the best parameters. We briefly describe two different approaches for avoiding local maxima.
3.1 Noise Injection
Noise injection was introduced in [1] as a simplified version of simulated annealing. Namely, the EM algorithm is an iterative procedure in which each iteration computes the parameters of a new model that increases the score of the data, given the data and the old model. Noise injection amounts to adding a random component to the re-estimated values of the parameters at each iteration, and decreasing its proportion slowly as the number of iterations grows. We defined the new parameters as a convex linear combination of three components – with weights λ1, λ2 and λ3 – the model-based re-estimate, a pseudo-count component and a random part, respectively. The values λ_j depend on the iteration number (as well as on the total number of iterations) and we tested several schemes for setting these parameters.
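One such re-estimation step can be sketched as follows (the helper names and the particular decay scheme are our own illustration; drawing the random part from a Dirichlet distribution is one possible choice):

```python
import numpy as np

def noise_injected_step(reestimate, pseudo, lam, rng):
    """One noise-injection update of a probability vector.

    reestimate: Baum-Welch re-estimated probabilities (lambda_1 part),
    pseudo: pseudo-count prior (lambda_2 part); the third, random part
    gets weight lambda_3.  lam = (l1, l2, l3) must sum to 1.
    """
    l1, l2, l3 = lam
    noise = rng.dirichlet(np.ones(len(reestimate)))
    p = l1 * np.asarray(reestimate) + l2 * np.asarray(pseudo) + l3 * noise
    return p / p.sum()  # guard against rounding error

def schedule(it, total, l3_max=0.5):
    """One possible scheme: the noise weight decays linearly to zero."""
    l3 = l3_max * (1.0 - it / total)
    return (0.8 * (1.0 - l3), 0.2 * (1.0 - l3), l3)
```

The result is again a distribution, since a convex combination of distributions is a distribution.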
3.2 Simulated Annealing
A simulated annealing version of the EM algorithm (in the domain of computational biology) was considered by Eddy in [6], who described simulated annealing Viterbi training (see also [2] for general background). Since our method includes the Baum–Welch algorithm, that approach was not directly applicable. Another approach – Monte Carlo based methods (see, for example, [7] and references therein) – also cannot be applied directly, because it involves random moves over the whole parameter set, one parameter per iteration. An HMM of length n has over 44n independent parameters, so a straightforward implementation would require too much time. Taking this into account, we implemented simulated annealing only as an extension of the noise injection. More precisely, we considered fixed λ1, λ2 and λ3 as in the previous subsection, and then, in each iteration, computed the model and accepted it with respect to a Metropolis-type criterion, i.e. with probability p, where
p = a ( P_new(x) / P_old(x) )^j.   (2)
In the above equation, P_old and P_new are the scores of the old and the new model, respectively, j is the iteration number, while a is a normalizing constant depending on the range of j. It should be pointed out that the above formula is not entirely consistent with the theoretical background, since the new model will, in general, score better (= have higher energy) than the old one. Nevertheless, the algorithm performs reasonably well, considerably better than noise injection alone.
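As a sketch, reading (2) as an acceptance probability min(1, a·(P_new/P_old)^j) and working with log-scores for numerical safety (the helper and its defaults are ours, not the paper's implementation):

```python
import math
import random

def metropolis_accept(log_p_new, log_p_old, j, log_a=0.0, rng=random):
    """Metropolis-type acceptance for the annealed EM iteration.

    One reading of formula (2): accept with probability
    min(1, a * (P_new / P_old)**j), computed in log space so that very
    small sequence scores do not underflow.  log_a stands in for the
    normalizing constant a, an assumption of this sketch.
    """
    log_p = log_a + j * (log_p_new - log_p_old)
    return log_p >= 0.0 or rng.random() < math.exp(log_p)
```

A better-scoring model is always accepted; a worse one is accepted occasionally, less and less often as j (and hence the effective inverse temperature) grows.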
4. Suboptimal Alignments
In this section we present two algorithms that deal with suboptimal alignments. Namely, given an HMM π and a family x = {x^i}_{i=1}^n, by considering the most probable path of x^i through π, for each i, one gets the optimal MA(x) (note that, while the optimal alignment of x with respect to π does not have to be unique, it will be unique in any meaningful application, due to numerical considerations). If, for k ∈ N, and for each x^i ∈ x, we consider the k best paths of x^i through π, we get the k-suboptimal alignment of the family x, called MA_k(x). MA_k(x) is an nk × l matrix, for some l ∈ N, that, clearly, contains more gaps than MA(x). Furthermore, it is clear that those columns in MA(x) – corresponding to match states – that remain unchanged for k = 1, 2, ..., up to some large k0, represent rigid – in other words, conserved – regions of the alignment MA(x). Here, by "conserved" we mean both numerically – in the sense of the model – as well as biologically. Namely, in several applications we found that, for sufficiently large k, conserved regions contain only conservative substitutions, or no substitutions at all. Furthermore, testing MA_k(x) for various models, and for some fixed k, also shows a considerable amount of rigidity. Hence, we can cut our family into several pieces – by cutting each member of the family at the appropriate place – and then consider the usual model for each part. In other words, instead of dealing with a model as in Figure 1, we consider a model shown in Figure 2. It is clear that this greatly reduces the complexity of the problem.
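The k best paths of a sequence through a model can be obtained by keeping the k best scores per node of the Viterbi graph, which is a DAG. A minimal sketch on an abstract DAG (the graph layout and weights below are illustrative only):

```python
def k_best_path_scores(adj, topo_order, source, sink, k):
    """Scores of the k highest-scoring source->sink paths in a DAG.

    adj[u] = list of (v, weight) edges; topo_order must list every node
    in topological order, so that best[u] is final when u is processed.
    """
    best = {u: [] for u in topo_order}
    best[source] = [0.0]
    for u in topo_order:
        for v, w in adj.get(u, []):
            merged = best[v] + [s + w for s in best[u]]
            best[v] = sorted(merged, reverse=True)[:k]
    return best[sink]
```

For an HMM, the nodes would be (state, position) pairs and the weights log-probabilities; realigning each of the k paths yields MA_k(x).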
Figure 2. Piecewise standard HMM
Remark 4.1. The fact that using suboptimal alignments (instead of the usual ones) in the optimization schedule increases the robustness of the method is not surprising. Namely, building a model from MA_k(x) for some large k ∈ N will result in a "model with a lot of noise" – similar to the one produced by simulated
annealing Viterbi training from [6]. On the other hand, if each sequence in MA_k(x) is taken with its (normalized) Viterbi score, the resulting models will roughly converge to the one that we started from as k becomes large. However, it should be pointed out that, as k becomes large, this procedure becomes prohibitively expensive in terms of memory, and for larger families (as well as for longer sequences) an alternative, sampling strategy should be employed. The next algorithm that we employed is called k-match Viterbi; it gives the most probable path (of a sequence through the model) with the property that the groups of match states in the path are of length at least k. In other words, once a symbol is emitted from the bottom row in Figure 1, at least another k − 1 symbols have to be emitted from the bottom row before moving upwards. While the usual dynamic programming principles do not hold for this path, an easy solution is to use two matrices – one for the match states, the other for inserts and deletes – and impose the required length in the match matrix.
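The run-length restriction itself is a standard dynamic programming exercise. A simplified sketch – one score per step for the match track and for the insert/delete track, ignoring the full HMM recursion – that keeps two arrays as described above:

```python
def best_score_min_match_run(match, other, k):
    """Best total score over state sequences in which every maximal run
    of match states has length at least k (assume k >= 2 here).

    match[t], other[t]: scores for taking step t on the match track or
    on the insert/delete track.  M[r] holds the best score of a path
    currently in a match run of length r (r = k meaning "at least k");
    a match run may only be left once it has reached length k.
    """
    NEG = float("-inf")
    M = [NEG] * (k + 1)   # M[0] unused
    I = 0.0               # best score of a path currently off the match track
    for t in range(len(match)):
        newM = [NEG] * (k + 1)
        newM[1] = I + match[t]                    # start a new match run
        for r in range(2, k):
            newM[r] = M[r - 1] + match[t]         # extend a short run
        newM[k] = max(M[k - 1], M[k]) + match[t]  # reach or keep length >= k
        I = max(I, M[k]) + other[t]               # leave only completed runs
        M = newM
    return max(I, M[k])
```

With k = 4 and only three symbols, no match run can be completed, so the best path stays on the insert/delete track.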
5. Results and Conclusions
In this section, we describe our optimization schedule and present the results of the tests that we carried out. We selected a small, rather non-representative sample of seven sequences from the globin family as our test data. The family is shown in Figure 5, with their SWISS-PROT identifiers (the structural alignment of the family comes from [9], and the picture is taken from [2]). First of all, let us briefly review the difference between an algorithm for determining a family profile (associated to some protein family) and an MSA algorithm. A family profile is an HMM (or a collection of HMMs) associated with a protein family, with the property that the members of the family score well, and unrelated sequences score poorly, against the model. Hence, a family profile is defined in terms of the whole protein space (or, at least, a representative sample of it). It is usually built from a multiple alignment (cf. [3, 5]), and it should, in principle, correspond to the optimal model. On the other hand, a multiple alignment of a group of (possibly related) sequences is determined only by the sequences themselves, without any regard for the rest of the protein space. Now we can describe the optimization schedule that we used: typically, we ran simulated annealing/noise injection for up to 100 iterations, produced an alignment, built a model from the alignment, and repeated this up to 10 times. For the alignment, the usual Viterbi algorithm and the alternatives from Section 4 were used, while to construct a model from an alignment we used the MAP assignment from [2]. A sample alignment, obtained using the 3-match Viterbi algorithm, is shown in Figure 4, and Figure 3 shows sample suboptimal alignments. We also tested the simulated annealing algorithm with various parameters m, p and r from Section 3. The values of the parameters and the highest score of a model are shown in Table 1.

Table 1. Noise injection/simulated annealing scheme. Number of iterations is 1000.
 m     p     r     f-b score
0.1   0.1   0.8     3.52996
0.1   0.3   0.6     8.49378
0.1   0.5   0.4    17.96390
0.1   0.7   0.2    25.98360
0.2   0.1   0.7     9.76561
0.2   0.3   0.5    19.67840
0.2   0.5   0.3    28.99580
0.2   0.7   0.1    37.50630
0.3   0.1   0.6    21.53350
0.3   0.3   0.4    32.84920
0.3   0.5   0.2    41.53100
0.4   0.1   0.5    36.65950
0.4   0.3   0.3    45.70090
0.4   0.5   0.1    55.08630
0.5   0.1   0.4    51.58710
0.5   0.3   0.2    60.26250
0.6   0.1   0.3    63.94570
0.6   0.3   0.1    70.62970
0.7   0.1   0.2    76.62950
0.8   0.1   0.1    85.16000
Notice that the highest score in Table 1 was reached for the highest value of m, with the corresponding model completely degenerate and the alignment, basically, left-justified. The value m = 1 produced the same result, but with fewer iterations. That is, clearly, a consequence of dealing with an unrepresentative sample. Furthermore, we found that even within one row of Table 1, the best model – or, better to say, the model giving the best alignment – was not the one with the highest score. This strongly suggests that some other function – measuring the quality of the alignment – should be used in formula (2) when the aim is to find an optimal alignment rather than to optimize the model. Finally, we briefly review our suboptimal alignments and list several changes of the procedure that we intend to implement in the near future. In Figure 3, the 5- and 10-suboptimal alignments, against some family profile, of two sequences from
the family are shown. As a quick comparison with Figure 5 shows, while the overall alignment is rather poor, the parts conserved in both alignments are exactly the most conserved sites in the structural alignment. As for the changes, we intend to determine the amount of pseudo-count in a match state, as well as K from equation (1), as functions of the entropy of the distribution in a state. Furthermore, a weak interaction between neighbouring states should be introduced in order to force the grouping of match regions in the alignment.
Figure 3. 5- and 10-suboptimal alignments of two sequences from the family
Figure 4. Sample alignment

Figure 5. A sample from the globin family
References

[1] Krogh, A., Brown, M., Mian, I.S., Sjölander, K., Haussler, D. Hidden Markov models in computational biology: applications to protein modelling. Journal of Molecular Biology 235:1501-1531, 1994.
[2] Durbin, R., Eddy, S. R., Krogh, A., Mitchison, G. Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press, 1998.
[3] Sonnhammer, E. L., Eddy, S. R., Durbin, R. Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28:405-420, 1997.
[4] Thompson, J. D., Higgins, D. G., Gibson, T. J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment... Nucleic Acids Research 22:4673-4680, 1994.
[5] Wistrand, M., Sonnhammer, E. L. Improving Profile HMM Discrimination by Adapting Transition Probabilities. Journal of Molecular Biology 338:847-854, 2004.
[6] Eddy, S.R. Multiple alignment using hidden Markov models. In Rawlings, C., Clark, D., Altman, R., Hunter, L., Lengauer, T., Wodak, S., eds., Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology, 114-120. AAAI Press, 1995.
[7] Baran, S., Szabó, A. An application of simulated annealing to ML-estimation of a partially observed Markov chain. Third International Conference on Applied Informatics, Eger-Noszvaj, Hungary, 1997.
[8] Zuker, M. Suboptimal Sequence Alignment in Molecular Biology: Alignment with Error Analysis. Journal of Molecular Biology 221:403-420, 1991.
[9] Bashford, D., Chothia, C., Lesk, A. M. Determinants of a protein fold: unique features of the globin amino acid sequence. Journal of Molecular Biology 196:199-216, 1987.
ON STRONG CONSISTENCY FOR ONE-STEP APPROXIMATIONS OF STOCHASTIC ORDINARY DIFFERENTIAL EQUATIONS

Rózsa Horváth-Bokor∗
Department of Mathematics and Computing, University of Veszprém, Egyetem utca 10, 8201 Veszprém, Hungary
[email protected]
Abstract
In numerical approximation of stochastic ordinary differential equations (SODEs), the main concepts, such as the relationship between local errors and strong consistency, are considered. The main result is that the consistency conditions given in [P. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations, Springer-Verlag, 1992] and local errors are equivalent under appropriate conditions.
Keywords:
Stochastic ordinary differential equations, numerical methods, convergence, consistency, local errors.
1. Introduction
We usually do not know the exact solution of an initial value problem that we are trying to approximate by a finite difference method. Then, to ensure that an approximation will be reasonably accurate, we have to be able to keep the unknown discretization and roundoff errors under control and sufficiently small. We can use certain a priori information about the difference method – that is, information obtainable without explicit knowledge of the exact solution – to tell us whether this is possible. So the main concepts in deterministic numerical analysis are convergence, consistency and stability. Lax's equivalence theorem ensures that, for a given properly posed initial-value problem and a finite difference approximation satisfying the consistency condition, stability is necessary and sufficient for convergence. The conditions for a one-step method of order 1 are usually called consistency conditions.
∗ Supported by Hungarian National Scientific Foundation Grant (OTKA) T031935 and by Ministry of Science and Technology, Croatia, Grant 0037114.
Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 197-205. © 2005 Springer. Printed in the Netherlands.
We will verify that property for stochastic difference methods alongside their discretizations. Consider Itô stochastic ordinary differential equations (SODEs) of the form

dX(t) = a(t, X(t))dt + b(t, X(t))dW(t), for 0 ≤ t ≤ T,   (1)

with X(t) ∈ R^d, X(0) = x0 ∈ R^d, where a : [0, +∞) × R^d → R^d and b : [0, +∞) × R^d → R^d. The system is driven by a one-dimensional Brownian motion; details about this stochastic object and the corresponding calculus can be found in [1] and [5]. In equation (1), (W(t), 0 ≤ t ≤ T) is a Wiener process satisfying the initial condition W(0) = 0. Let F_t denote the increasing family of σ-algebras (the augmented filtration) generated by the random variables W(s), s ≤ t. Assume also that equation (1) has a unique, mean-square bounded strong solution X(t). The proof of the next theorem is given in [6].

Theorem 1. Let (X(t)) be the exact solution of the Itô stochastic ordinary differential equation (SODE) given by equation (1). Assume that the functions a and b in (1) satisfy the following conditions:

(B1) ‖a(t, x) − a(t, y)‖ ≤ K1 · ‖x − y‖ and ‖b(t, x) − b(t, y)‖ ≤ K1 · ‖x − y‖ for all t ∈ [0, T] and x, y ∈ R^d,

(B2) ‖a(t, x)‖² ≤ K2 · (1 + ‖x‖²) and ‖b(t, x)‖² ≤ K2 · (1 + ‖x‖²) for all t ∈ [0, T] and x ∈ R^d,

and that E(‖X(0)‖^{2n}) < +∞ for some integer n ≥ 1. Then the solution (X(t), 0 ≤ t ≤ T) of (1) satisfies

E(‖X(t)‖^{2n}) ≤ (1 + E(‖X(0)‖^{2n})) exp(C · t),
for t ∈ [0, T], where T < +∞, and C is a positive constant depending only on n, K1, K2 and T.

The usual and simplest time discretization of a bounded interval [0, T], T > 0, is of the form (0, ∆, 2∆, ..., N∆), where N is a natural number and ∆ = T/N ∈ (0, ∆0) for some finite ∆0 > 0. We denote n∆ by t_n, for n = 0, 1, ..., N, and

n_t = max{n = 0, 1, ... : t_n ≤ t}.   (2)

The sequence (Y_n, 0 ≤ n ≤ N) always denotes the approximation of X(t_n) by a given numerical method with an equidistant step size ∆. We suppose that, for all natural N and n = 0, 1, 2, ..., N, we consider one-step schemes which satisfy

E‖Y_n‖² ≤ K(1 + E‖Y_0‖²).

The constant K is positive and independent of ∆, but it may depend on T. Let ‖·‖ denote the Euclidean norm in R^d.
2. Strong Convergence and Consistency

We require the concept of convergence in the following sense:
Definition 1. We shall say that a general time discrete approximation (Y_n, 0 ≤ n ≤ N) with equidistant step size ∆ converges strongly to (X(t), 0 ≤ t ≤ T) if

lim_{∆→0} sup_{0≤s≤T} E‖X(s) − Y_{n_s}‖² = 0.
Definition 2. We shall say that a general time discrete approximation (Y_n, 0 ≤ n ≤ N) converges strongly with order γ > 0 at time T if there exist positive constants ∆0 ∈ (0, +∞) and C ∈ (0, +∞), not depending on ∆, such that

max_{n∈{0,...,N}} (E‖X(t_n) − Y_n‖²)^{1/2} ≤ C∆^γ, for all ∆ < ∆0.   (3)
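The order γ in Definition 2 can be checked empirically. A sketch (our own helper) for the linear SODE dX = λX dt + μX dW, whose exact solution is known explicitly and can be driven by the same Brownian increments as the explicit Euler scheme:

```python
import numpy as np

def strong_endpoint_error(lam, mu, x0, T, N, n_paths, rng):
    """Root mean-square endpoint error of the Euler scheme for
    dX = lam*X dt + mu*X dW, against the exact solution
    X(T) = x0 * exp((lam - mu**2/2)*T + mu*W(T)) built from the same
    Brownian increments as the scheme."""
    dt = T / N
    err2 = 0.0
    for _ in range(n_paths):
        dW = rng.normal(0.0, np.sqrt(dt), N)
        y = x0
        for n in range(N):
            y += lam * y * dt + mu * y * dW[n]
        exact = x0 * np.exp((lam - 0.5 * mu**2) * T + mu * dW.sum())
        err2 += (y - exact) ** 2
    return np.sqrt(err2 / n_paths)
```

Halving ∆ should reduce this error by roughly a factor 2^γ, with γ = 1/2 for the Euler scheme.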
As for deterministic numerical methods, the concept of consistency of a stochastic time discrete approximation is closely related to the idea of convergence.

Definition 3. We shall say that a discrete time approximation (Y_n, 0 ≤ n ≤ N) is strongly consistent if there exists a nonnegative function c = c(∆) with lim_{∆→0} c(∆) = 0 such that

E‖ E( (Y_{n+1} − Y_n)/∆ | F_{t_n} ) − a(t_n, Y_n) ‖² ≤ c(∆),   (4)
and

E( (1/∆) ‖Y_{n+1} − Y_n − E(Y_{n+1} − Y_n | F_{t_n}) − b(t_n, Y_n)∆W_n‖² ) ≤ c(∆),   (5)
for all fixed values Y_n = y and n = 0, 1, ..., where ∆W_n = W(t_{n+1}) − W(t_n) are Gaussian random variables with mean 0 and variance ∆.

We introduce a new notation. For 0 ≤ s ≤ t < +∞ and y ∈ R^d, X^{s,y}(t) denotes the value of a solution of (1) at time t which starts at y ∈ R^d at time s, and Y^{E,t_n,Y_n}_{n+1} denotes the Euler approximation with the initial condition Y_n, namely

Y^{E,t_n,Y_n}_{n+1} = Y_n + a(t_n, Y_n)∆ + b(t_n, Y_n)∆W_n.
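The Euler approximation just defined is straightforward to implement; a minimal sketch for the scalar case of (1):

```python
import numpy as np

def euler_maruyama(a, b, x0, T, N, rng):
    """One path of dX = a(t, X)dt + b(t, X)dW on [0, T],
    computed on the equidistant grid t_n = n*Delta, Delta = T/N."""
    dt = T / N
    X = np.empty(N + 1)
    X[0] = x0
    t = 0.0
    for n in range(N):
        dW = rng.normal(0.0, np.sqrt(dt))   # Delta W_n ~ N(0, Delta)
        X[n + 1] = X[n] + a(t, X[n]) * dt + b(t, X[n]) * dW
        t += dt
    return X
```

With b ≡ 0 the scheme reduces to the explicit Euler method for the ODE dX = a(t, X)dt, which gives a deterministic sanity check.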
The next theorem is a result of [4].

Theorem 2. Let (X(t)) be the exact solution of the Itô stochastic ordinary differential equation (SODE) given by equation (1). Assume that the functions a and b in (1) satisfy the following conditions:

(B1) ‖a(t, x) − a(t, y)‖ ≤ K1 · ‖x − y‖ and ‖b(t, x) − b(t, y)‖ ≤ K1 · ‖x − y‖ for all t ∈ [0, T] and x, y ∈ R^d,

(B2) ‖a(t, x)‖² ≤ K2 · (1 + ‖x‖²) and ‖b(t, x)‖² ≤ K2 · (1 + ‖x‖²) for all t ∈ [0, T] and x ∈ R^d,

(B3) ‖a(t, x) − a(s, x)‖ ≤ K3 · (1 + ‖x‖)|s − t|^{1/2} and ‖b(t, x) − b(s, x)‖ ≤ K3 · (1 + ‖x‖)|s − t|^{1/2} for all s, t ∈ [0, T] and x ∈ R^d,

where K1, K2 and K3 are positive constants independent of ∆. Then, for every strongly consistent one-step method (Y_n, 0 ≤ n ≤ N) there exists a function ω(∆) ↓ 0 such that

(E‖E(Y_{n+1} − X^{t_n,Y_n}(t_{n+1}) | F_{t_n})‖²)^{1/2} ≤ ∆ · ω(∆),
and

(E‖Y_{n+1} − X^{t_n,Y_n}(t_{n+1})‖²)^{1/2} ≤ ∆^{1/2} · ω(∆).

The left-hand sides of the above inequalities are called local errors.

Proof. For 0 ≤ n ≤ N − 1 we obtain

‖E(Y_{n+1} − X^{t_n,Y_n}(t_{n+1}) | F_{t_n})‖ ≤ ‖E(Y_{n+1} − Y^{E,t_n,Y_n}_{n+1} | F_{t_n})‖ + ‖E(Y^{E,t_n,Y_n}_{n+1} − X^{t_n,Y_n}(t_{n+1}) | F_{t_n})‖.
By using the triangle inequality for ‖·‖ and in view of strong consistency, it follows from (4) and the local mean error of the Euler approximation that

(E‖E(Y_{n+1} − X^{t_n,Y_n}(t_{n+1}) | F_{t_n})‖²)^{1/2}
  ≤ (E‖E(Y_{n+1} − Y^{E,t_n,Y_n}_{n+1} | F_{t_n})‖²)^{1/2} + (E‖E(Y^{E,t_n,Y_n}_{n+1} − X^{t_n,Y_n}(t_{n+1}) | F_{t_n})‖²)^{1/2}   (6)
  ≤ c(∆)∆ + K5 ∆^{3/2} ≤ ω(∆)∆.   (7)
By using the same arguments as above we get

(E‖Y_{n+1} − X^{t_n,Y_n}(t_{n+1})‖²)^{1/2} ≤ (E‖Y_{n+1} − Y^{E,t_n,Y_n}_{n+1}‖²)^{1/2} + (E‖Y^{E,t_n,Y_n}_{n+1} − X^{t_n,Y_n}(t_{n+1})‖²)^{1/2}.   (8)
We postpone the justification of the estimates of the second terms in (7) and (8) till the end of the proof. Now, the form of the Euler approximation yields

Y_{n+1} − Y^{E,t_n,Y_n}_{n+1} = Y_{n+1} − Y_n − a(t_n, Y_n)∆ − b(t_n, Y_n)∆W_n
  = Y_{n+1} − Y_n − E(Y_{n+1} − Y_n | F_{t_n}) − b(t_n, Y_n)∆W_n + E(Y_{n+1} − Y_n | F_{t_n}) − a(t_n, Y_n)∆.
Now we easily get, from the definition of strong consistency,

(E‖Y_{n+1} − X^{t_n,Y_n}(t_{n+1})‖²)^{1/2}
  ≤ (E‖Y^{E,t_n,Y_n}_{n+1} − X^{t_n,Y_n}(t_{n+1})‖²)^{1/2}
    + (E‖Y_{n+1} − Y_n − E(Y_{n+1} − Y_n | F_{t_n}) − b(t_n, Y_n)∆W_n‖²)^{1/2}
    + (E‖E(Y_{n+1} − Y_n | F_{t_n}) − a(t_n, Y_n)∆‖²)^{1/2}
  ≤ c(∆)(∆^{1/2} + ∆) + K6 · ∆ = ω(∆)∆^{1/2},

from which the assertion of the proposition follows.
Now we turn to the proofs of the estimates in (7).

‖E(Y^{E,t_n,Y_n}_{n+1} − X^{t_n,Y_n}(t_{n+1}) | F_{t_n})‖
  = ‖E( ∫_{t_n}^{t_{n+1}} (a(s, X^{t_n,Y_n}(s)) − a(t_n, Y_n)) ds | F_{t_n} )‖
  = ‖E( ∫_{t_n}^{t_{n+1}} (a(s, X^{t_n,Y_n}(s)) − a(t_n, X^{t_n,Y_n}(s)) + a(t_n, X^{t_n,Y_n}(s)) − a(t_n, Y_n)) ds | F_{t_n} )‖.

Here we used the fact that the conditional expectation of the diffusion term is zero. For estimating the first difference we apply assumption (B3):

‖E( ∫_{t_n}^{t_{n+1}} (a(s, X^{t_n,Y_n}(s)) − a(t_n, X^{t_n,Y_n}(s))) ds | F_{t_n} )‖
  ≤ K3 · E( ∫_{t_n}^{t_{n+1}} (1 + ‖X^{t_n,Y_n}(s)‖)|s − t_n|^{1/2} ds | F_{t_n} )
  ≤ K3 · ∫_{t_n}^{t_{n+1}} E(1 + ‖X^{t_n,Y_n}(s)‖ | F_{t_n})(s − t_n)^{1/2} ds ≤ K4 · ∆^{3/2},

where K4 is a positive constant depending only on the constants C, K, K1, K2, K3 and T. The second inequality follows from the finiteness of the second moment of the exact solution X(t) (cf. Theorem 1). For estimating the second difference we apply assumptions (B1) and (B2), the Jensen inequality and the bound on the second moment of the stochastic integral, which give

‖E(Y^{E,t_n,Y_n}_{n+1} − X^{t_n,Y_n}(t_{n+1}) | F_{t_n})‖ ≤ K5 · ∆^{3/2}.
Finally, we prove the estimation used in (8):
By using the triangle inequality, and all the above considerations for the unconditional expectation, we get:

(E‖Y^{E,t_n,Y_n}_{n+1} − X^{t_n,Y_n}(t_{n+1})‖²)^{1/2}
  = (E‖ ∫_{t_n}^{t_{n+1}} (a(s, X^{t_n,Y_n}(s)) − a(t_n, Y_n)) ds + ∫_{t_n}^{t_{n+1}} (b(s, X^{t_n,Y_n}(s)) − b(t_n, Y_n)) dW(s) ‖²)^{1/2}
  ≤ (E‖ ∫_{t_n}^{t_{n+1}} (a(s, X^{t_n,Y_n}(s)) − a(t_n, Y_n)) ds ‖²)^{1/2} + (E‖ ∫_{t_n}^{t_{n+1}} (b(s, X^{t_n,Y_n}(s)) − b(t_n, Y_n)) dW(s) ‖²)^{1/2}
  ≤ (E‖ ∫_{t_n}^{t_{n+1}} (a(s, X^{t_n,Y_n}(s)) − a(t_n, X^{t_n,Y_n}(s)) + a(t_n, X^{t_n,Y_n}(s)) − a(t_n, Y_n)) ds ‖²)^{1/2}
    + (E‖ ∫_{t_n}^{t_{n+1}} (b(s, X^{t_n,Y_n}(s)) − b(t_n, X^{t_n,Y_n}(s)) + b(t_n, X^{t_n,Y_n}(s)) − b(t_n, Y_n)) dW(s) ‖²)^{1/2}
  ≤ K6 · ∆,

where K6 is a positive constant depending only on the constants C, K, K1, K2, K3 and T.

We now give the converse direction of the theorem above.

Theorem 3. Let (X(t)) be the exact solution of the Itô stochastic ordinary differential equation (SODE) given by equation (1). If for the given one-step method (Y_n, 0 ≤ n ≤ N) there exists a function ω(∆) ↓ 0 such that the local errors satisfy
(E‖E(Y_{n+1} − X^{t_n,Y_n}(t_{n+1}) | F_{t_n})‖²)^{1/2} ≤ ∆ · ω(∆),   (9)

and

(E‖Y_{n+1} − X^{t_n,Y_n}(t_{n+1})‖²)^{1/2} ≤ ∆^{1/2} · ω(∆),   (10)
then the one-step method (Y_n, 0 ≤ n ≤ N) is strongly consistent in the sense of Definition 3.

Proof. To prove Theorem 3, we will use the estimates proved for the errors of the Euler approximation. In view of the local error estimate (9) and the local mean
error of the Euler approximation, for 0 ≤ n ≤ N − 1 we obtain

(E‖E(Y_{n+1} − Y^{E,t_n,Y_n}_{n+1} | F_{t_n})‖²)^{1/2}
  ≤ (E‖E(Y_{n+1} − X^{t_n,Y_n}(t_{n+1}) | F_{t_n})‖²)^{1/2} + (E‖E(Y^{E,t_n,Y_n}_{n+1} − X^{t_n,Y_n}(t_{n+1}) | F_{t_n})‖²)^{1/2}
  ≤ ω(∆)∆ + K5 · ∆^{3/2}.

We estimate the second inequality in Definition 3:

(1/√∆)(E‖Y_{n+1} − Y_n − E(Y_{n+1} − Y_n | F_{t_n}) − b(t_n, Y_n)∆W_n‖²)^{1/2}
  ≤ (1/√∆)[(E‖Y_{n+1} − Y_n − a(t_n, Y_n)∆ − b(t_n, Y_n)∆W_n‖²)^{1/2} + (E‖E(Y_{n+1} − Y_n | F_{t_n}) − a(t_n, Y_n)∆‖²)^{1/2}]
  = (1/√∆)[(E‖Y_{n+1} − Y^{E,t_n,Y_n}_{n+1}‖²)^{1/2} + (E‖E(Y_{n+1} − Y_n | F_{t_n}) − a(t_n, Y_n)∆‖²)^{1/2}]
  ≤ (1/√∆)[(E‖Y_{n+1} − X^{t_n,Y_n}(t_{n+1})‖²)^{1/2} + (E‖X^{t_n,Y_n}(t_{n+1}) − Y^{E,t_n,Y_n}_{n+1}‖²)^{1/2} + (E‖E(Y_{n+1} − Y_n | F_{t_n}) − a(t_n, Y_n)∆‖²)^{1/2}]
  ≤ (1/√∆)(√∆ ω(∆) + K6 ∆ + ∆ ω(∆) + K7 ∆^{3/2}).

Now we can easily choose a function c(∆) such that lim_{∆→0} c(∆) = 0,

(1/√∆)(√∆ ω(∆) + K6 ∆ + ∆ ω(∆) + K7 ∆^{3/2}) ≤ c(∆),

and
ω(∆)∆ + K4 · ∆^{3/2} ≤ c(∆)∆,

so we obtain the strong consistency conditions in the sense of Definition 3.

Our conclusion is that the definition of strong consistency is equivalent to the behaviour of the local errors in the mean and mean-square sense. By using the results from [4, page 38], we get the forward stochastic Lax principle; namely, consistency and stability imply convergence. We can prove the same results for Itô stochastic delay differential equations (SDDEs) of the form

dX(t) = f(t, X(t), X(t − τ))dt + g(t, X(t), X(t − τ))dW(t),
for 0 ≤ t ≤ T, with X(t) = ψ(t), t ∈ [−τ, 0], or for scalar stochastic functional differential equations with a distributed memory term of the form

dX(t) = f(t, X(t), Y(t))dt + g(t, X(t), Y(t))dW(t), for 0 ≤ t ≤ T,

where Y(t) represents a memory term of the type

Y(t) = ∫_{t−τ}^{t} K(t, s − t, X(s)) ds,

with X(t) = ψ(t), t ∈ [−τ, 0]. For details we refer to [2] and [3].
References

[1] L. Arnold, Stochastic Differential Equations, Wiley, New York, 1974.
[2] C.T.H. Baker and E. Buckwar, Numerical analysis of explicit one-step methods for stochastic delay differential equations, LMS J. Comput. Math. 3, pp. 315-335, 2000.
[3] E. Buckwar, Euler-Maruyama and Milstein approximations for stochastic functional differential equations with distributed memory term, submitted for publication.
[4] R. Horváth-Bokor, Convergence and Stability Properties for Numerical Approximations of Stochastic Ordinary Differential Equations, Ph.D. thesis, University of Zagreb, Croatia, 2000.
[5] I. Karatzas, S. E. Shreve, Brownian Motion and Stochastic Calculus, Springer-Verlag, 1988.
[6] P. E. Kloeden, E. Platen, Numerical Solution of Stochastic Differential Equations, Springer-Verlag, New York, 1992.
ON THE DIMENSION OF BIVARIATE SPLINE SPACE S_3^1(△)

Gašper Jaklič
Institute of Mathematics, Physics and Mechanics, Jadranska 19, 1000 Ljubljana, Slovenia
[email protected]
Jernej Kozak
University of Ljubljana, Faculty of Mathematics and Physics, and Institute of Mathematics, Physics and Mechanics, Jadranska 19, 1000 Ljubljana, Slovenia
[email protected]
Abstract
In this paper the problem of determining the dimension of the bivariate spline space S_3^1(△) is studied. Under certain assumptions on the degrees of the vertices and on the collinearity of edges, it is shown that the dimension of the spline space is equal to the lower bound established by Schumaker in [7].
Keywords:
bivariate spline space, dimension, blossoming approach.
1. Introduction
Suppose that Ω ⊂ IR² is a closed, simply connected polygonal region and let

△ := {Ω_i}_{i=1}^N,   Ω = ⋃_{i=1}^N Ω_i,

denote a regular triangulation of Ω. The triangulation is regular if two triangles Ω_i, Ω_j can have only one vertex or a whole edge in common. Let Π_n(IR²) denote the space of bivariate polynomials of total degree ≤ n and let

S_n^r(△) := {f ∈ C^r(Ω); f|_{Ω_i} ∈ Π_n(IR²), i = 1, 2, ..., N}

be the spline space over the triangulation △. It is well known that the bivariate spline space has a complex structure. Even such basic problems as determining its dimension or the construction of
its basis are still unsolved. The main difficulty is caused by the fact that the dimension depends not only on the topology (the graph of the triangulation), but also on the geometry (exact vertex positions) of the triangulation. The space of cubic C¹ splines S_3^1(△) is of particular interest, since it is the smallest space that allows interpolation at the vertices of the triangulation. The dimension of the spline space S_3^1(△) has not been established in general. It has been determined only for some special triangulations (triangulations of type 1, 2, nested polygon triangulations ([7, 8, 2]), etc.). The lower and the upper bound on the dimension can be found in [7, 8]. Some improvements on the upper bound are given in [5, 6]. The lower bound reads

dim S_3^1(△) ≥ 3V_B(△) + 2V_I(△) + σ(△) + 1,   (1)
where V_B(△) denotes the number of boundary vertices, V_I(△) the number of internal vertices, and σ(△) = Σ_{i=1}^{V_I(△)} σ_i, where

σ_i := σ(T_i) := 1 if the vertex T_i is singular, and 0 otherwise.

A vertex is singular iff it is of degree exactly 4 and is given as an intersection of two lines (Fig. 1). If a triangulation has exactly one inner vertex T, it is called a cell, and will be denoted cell(T).
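The right-hand side of (1) is easy to evaluate for a concrete triangulation; a sketch (our own helper, assuming a vertex-coordinate/triangle-list representation and a tolerance-based collinearity test):

```python
import numpy as np
from collections import defaultdict

def schumaker_lower_bound(pts, triangles, tol=1e-12):
    """Lower bound 3*V_B + 2*V_I + sigma + 1 from (1).

    pts: (V, 2) array of vertex coordinates; triangles: index triples.
    Boundary edges are those belonging to exactly one triangle."""
    edges = defaultdict(int)
    nbrs = defaultdict(set)
    for t in triangles:
        for u, v in ((t[0], t[1]), (t[1], t[2]), (t[2], t[0])):
            edges[frozenset((u, v))] += 1
            nbrs[u].add(v)
            nbrs[v].add(u)
    boundary = set()
    for e, cnt in edges.items():
        if cnt == 1:
            boundary |= set(e)
    interior = [v for v in nbrs if v not in boundary]
    VB, VI = len(boundary), len(interior)
    sigma = 0
    for v in interior:
        if len(nbrs[v]) != 4:
            continue
        d = [pts[w] - pts[v] for w in nbrs[v]]
        cross = lambda a, b: abs(a[0] * b[1] - a[1] * b[0])
        # singular: the four edges lie on two lines through v, i.e. the
        # directions split into exactly two (anti)parallel pairs
        pairs = sum(cross(d[i], d[j]) < tol
                    for i in range(4) for j in range(i + 1, 4))
        if pairs == 2:
            sigma += 1
    return 3 * VB + 2 * VI + sigma + 1
```

For a square cell with its centre at the intersection of the diagonals, the centre vertex is singular; moving it off the diagonals removes the singularity and lowers the bound by one.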
Figure 1. A singular vertex T.
Figure 2. A triangulation △ = △₁ ∪ (△\△₁).
Suppose △ is a triangulation with a boundary vertex T₀ of degree s + 1, and let △₁ denote its subtriangulation that includes all the triangles with the vertex T₀ (Fig. 2). Here is the main result of the paper.

Theorem 1. The dimension dim(S_3^1(△)) is equal to the lower bound (1) if dim(S_3^1(△\△₁)) is equal to the lower bound, there are no consecutive pairs of collinear edges on the common boundary of △₁ and △\△₁, and one of the following conditions is fulfilled:

1) s ≤ 4, |△₁| = s (Fig. 4),
2) s = 2, 3, |△₁| = s + 2, and △₁ includes a cell (Fig. 5, Fig. 6).

The proof of Theorem 1 will be given in Sections 3 and 4. It is easy to give simple sufficient conditions on the triangulation △ under which the reductions of Theorem 1 can be applied.

Corollary 2. Let there be no collinearities of the edges at the inner vertices of the triangulation △. Let the degrees of the inner vertices be at most 6 and let the degrees of the vertices on the outer face be at most 5. Then the dimension of the spline space S_3^1(△) is equal to Schumaker's lower bound (1).
2. The approaches to the dimension problem
There are four most common approaches to the dimension problem: those based on the standard polynomial or the Bernstein-Bézier basis, the minimal determining set approach, and polynomial blossoms. Here the blossoming approach will be followed. With the help of multiindex notation it is straightforward to write the smoothness conditions in blossoming form ([1], Thm. 2.1). The smoothness conditions should be satisfied for each inner edge of the triangulation △. If all the conditions are combined together, a system of linear equations is obtained, and its matrix plays the key role in the dimension problem. The rank of this matrix gives the number of independent conditions. By ([1], Thm. 2.2, 3.1) some equations can be proven independent, and some dependent ones can be removed. The cubic case simplifies to ([1])
dim S_3^1(△) = 7N + 3 − rank M,   (2)

where

M := M(△) := [ M₁₁  M₁₂
               0    M₂₂ ],   (3)
and the M_km are block matrices with E_I block rows and N block columns. Each block row corresponds to the smoothness conditions over the edge ℓ between the neighbouring triangles Ω_i, Ω_j. In every block row there are 2 nonzero blocks Q_{km,i}, Q_{km,j} = −Q_{km,i}, and the matrices Q_{km,j} are circular matrices of size 2 × (m + 2). Let v_ℓ = (α_ℓ, β_ℓ) denote a normalized directional vector of the edge e_ℓ between the triangles Ω_i and Ω_j, and let t_ℓ = (c_ℓ, d_ℓ) denote some point on the edge e_ℓ (Fig. 3). Further let

v_i × v_j := α_i β_j − α_j β_i   (4)

denote the planar vector product. It is equal to 0 iff the vectors v_i and v_j are collinear. Throughout the paper it will be assumed that the triangulation is in general position, i.e. α_ℓ ≠ 0, β_ℓ ≠ 0, for all ℓ. That can be simply achieved by
Figure 3. Neighbouring triangles and the notation.
using a proper rotation. In the block matrix M 11 , the blocks read α β 0 Q11,i = −Q11,j = , 0 α β and in M22 Q22,i = −Q22,j
2 β2 0 α 2α β = . 0 α2 2α β β2
In the block M12 not only directions but also some vertices of the triangulation are involved. It is possible to simplify some of the blocks without changing the rank of M by choosing the points t_ℓ := (c_ℓ, d_ℓ) at the inner vertices of the triangulation, and by choosing some arbitrary additional points z_k := (x_k, y_k) ∈ R^2, k = 1, 2, ..., N ([1], Lemma 3.1). The block Q12,ℓi is of the form

[ α_ℓ(c_ℓ − x_i)  α_ℓ(d_ℓ − y_i) + β_ℓ(c_ℓ − x_i)  β_ℓ(d_ℓ − y_i)                    0
  0               α_ℓ(c_ℓ − x_i)                    α_ℓ(d_ℓ − y_i) + β_ℓ(c_ℓ − x_i)  β_ℓ(d_ℓ − y_i) ],   (5)

and the block Q12,ℓj reads

− [ α_ℓ(c_ℓ − x_j)  α_ℓ(d_ℓ − y_j) + β_ℓ(c_ℓ − x_j)  β_ℓ(d_ℓ − y_j)                    0
    0               α_ℓ(c_ℓ − x_j)                    α_ℓ(d_ℓ − y_j) + β_ℓ(c_ℓ − x_j)  β_ℓ(d_ℓ − y_j) ],   (6)

so the choice z_k = t_ℓ, k ∈ {i, j}, reduces (5) or (6) to a zero block. Of course, not all blocks can be simplified this way.
3. The reduction step
It is very unlikely that the rank of a large symbolic matrix M can be determined in general. It is therefore perhaps best to look for sufficient conditions that allow the problem to be reduced to a smaller one, and to carry this step over to the smaller problem. The exact dimension count, established in advance for a particular triangulation that satisfies these sufficient conditions, is a necessary basis for practical computations.
On the dimension of bivariate spline space S_3^1(Δ)
Here is one possible approach. Let us recall the setup of Theorem 1. Let Δ_1 denote the subtriangulation of Δ attached to the boundary vertex, and let Δ\Δ_1 denote the rest of the triangulation (Fig. 2). The matrix M can be written as

M(Δ) = [ M(Δ_1)      0
         M(Δ_1, Δ)   M(Δ, Δ_1)
         0           M(Δ\Δ_1) ].   (7)

The matrices M(Δ_1, Δ), M(Δ, Δ_1) represent the common part of the smoothness conditions between Δ_1 and Δ\Δ_1, M(Δ_1) represents the conditions inside Δ_1, and M(Δ\Δ_1) the conditions inside Δ\Δ_1. Fact: if the submatrix

M̃_s(Δ_1, Δ) := [ M(Δ_1)
                 M(Δ_1, Δ) ] ∈ R^(r×c)   (8)

satisfies the condition r ≤ c and is of full rank, then

rank M(Δ) = rank M̃_s(Δ_1, Δ) + rank M(Δ\Δ_1).   (9)

Let now, by the inductive supposition, the dimension of the spline space over Δ\Δ_1 be equal to the lower bound,

dim S_3^1(Δ\Δ_1) = 3 V_B(Δ\Δ_1) + 2 V_I(Δ\Δ_1) + σ(Δ\Δ_1) + 1.   (10)

Then (2) and (9) imply

dim S_3^1(Δ) = 7 |Δ_1| − rank M̃_s(Δ_1, Δ) + 3 V_B(Δ\Δ_1) + 2 V_I(Δ\Δ_1) + σ(Δ\Δ_1) + 1.   (11)

Therefore, if one is able to prove the inequality

rank M̃_s(Δ_1, Δ) ≥ 7 |Δ_1| + 3 (V_B(Δ\Δ_1) − V_B(Δ)) + 2 (V_I(Δ\Δ_1) − V_I(Δ)) + (σ(Δ\Δ_1) − σ(Δ)),   (12)

the dimension dim S_3^1(Δ) will equal the lower bound too. At the last reduction step one is left with one triangle, i.e. N = 1, V_B = 3, V_I = 0, σ = 0, and 3V_B + 2V_I + σ + 1 = 10 = 7N + 3, so the induction hypothesis is satisfied at the beginning. So, under certain conditions the subtriangulation Δ_1 can be cut off and the procedure carried on the rest of the matrix/triangulation. There are some limitations, in particular the condition r ≤ c. Δ_1 may consist of more than s
triangles. However, we allowed only an additional cell, in order to keep the problem within reasonable complexity limits. Even with this restriction, the proof of inequality (12) requires calculating the ranks of large symbolic matrices. It turned out that some of the problems are too large for the algorithms implemented in symbolic packages such as Mathematica and Maple, even on a quite powerful PC. Other known algorithms, based on the special structure of the matrix, were also implemented: Gaussian elimination, Laplacian decomposition ([3]), Chio's pivotal condensation ([9]), and methods for the calculation of large determinants ([4]). The most successful turned out to be a combination of the calculation of determinants with a special structure, factorization of polynomials, and Gaussian elimination. The general idea is the following: to prove that a given matrix is of full rank r, it is enough to find a suitable nonzero minor of size r. Factorization of such a minor proves that the matrix is of rank r in general, and gives the conditions on Δ under which the rank can be lower. In order to calculate the rank in the degenerate cases we find a subminor that can never be zero. The change of the rank in the submatrix M(Δ_1) is shown by using Gaussian elimination.
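Computationally, the rank arguments above reduce to checking that a suitable minor is nonzero. The sketch below (an illustrative aid, not the authors' implementation; all names are made up) shows the underlying subroutine: exact rank computation of a rational matrix by Gaussian elimination over Q, which avoids the floating-point pitfalls that make numerical rank decisions unreliable.

```python
from fractions import Fraction

def rank_exact(rows):
    """Rank of a rational matrix, by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    nrows = len(m)
    ncols = len(m[0]) if m else 0
    rank, col = 0, 0
    while rank < nrows and col < ncols:
        # find a nonzero pivot in the current column
        piv = next((r for r in range(rank, nrows) if m[r][col] != 0), None)
        if piv is None:
            col += 1
            continue
        m[rank], m[piv] = m[piv], m[rank]
        for r in range(rank + 1, nrows):
            f = m[r][col] / m[rank][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
        col += 1
    return rank

# a 2x3 block of the form [[a, b, 0], [0, a, b]] (cf. the blocks Q11)
a, b = Fraction(3, 5), Fraction(4, 5)   # a normalized direction, a^2 + b^2 = 1
print(rank_exact([[a, b, 0], [0, a, b]]))  # full rank 2
```

For a symbolic matrix one would factor the chosen minor instead, as described above; exact rational arithmetic plays the same role for a concrete instance of the triangulation.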
4. The reduction possibilities considered
Let us conclude the proof of Theorem 1. First, let us consider case 1) (Fig. 4). Note that |Δ_1| = s, Δ_1 = ⋃_{i=1}^s Ω_i, and T_0 is a vertex common to all Ω_i. The edges e_ℓ := T_ℓ T_0, ℓ = 1, 2, ..., s − 1, are chosen to be the inner edges
Figure 4. Part of the triangulation at the boundary vertex T_0.
of Δ_1, and the edges e_ℓ, ℓ = s, s + 1, ..., 2s − 1, belong to the common border of Δ_1 and Δ\Δ_1. The choice of the points t_ℓ is given by

t := (t_ℓ) = (T_1, T_2, ..., T_{s−1}, T_1, T_1, T_2, ..., T_{s−1}, •, •, ..., •),   (13)

with the last E_I − 2s + 1 points • arbitrary,
and the vertices z_k that simplify M12 are selected as

z := (z_k) = (T_1, T_1, T_2, ..., T_{s−1}, •, •, ..., •),   (14)

with the last N − s points • arbitrary.
Let us show that M̃_s(Δ_1, Δ), s ≤ 4, is of full rank. If s = 1, the matrix

M̃_1(Δ_1, Δ) = [ Q11,11  0
                0       Q22,11 ]_(4×7)

is of full rank, since rank Q_kk,ℓj = 2. Let us recall Fig. 3. Then

det [ Q22,ℓi
      Q22,mi ]_(4×4) = (v_ℓ × v_m)^4 ≠ 0,   (15)

and

det [ Q11,ℓi  Q11,ℓj
      Q11,mi  0
      0       Q11,nj ]_(6×6) = (v_ℓ × v_m)(v_m × v_n)(v_n × v_ℓ) ≠ 0.   (16)

But (15) and (16) imply that the matrix M̃_2(Δ_1, Δ),

M̃_2(Δ_1, Δ) = [ Q11,11  Q11,12  0       0
                Q11,21  0       0       0
                0       Q11,32  0       0
                0       0       Q22,11  Q22,12
                0       0       Q22,21  0
                0       0       0       Q22,32 ]_(12×14),

is of full rank.
If s = 3, M̃_3(Δ_1, Δ) is given as

M̃_3(Δ_1, Δ) = [ Q11,11  Q11,12  0       0       0       0
                0       Q11,22  Q11,23  0       Q12,22  0
                Q11,31  0       0       0       0       0
                0       Q11,42  0       0       0       0
                0       0       Q11,53  0       0       0
                0       0       0       Q22,11  Q22,12  0
                0       0       0       0       Q22,22  Q22,23
                0       0       0       Q22,31  0       0
                0       0       0       0       Q22,42  0
                0       0       0       0       0       Q22,53 ]_(20×21).
By (15) one can omit block rows 3, 4, 8, 10 and block columns 4 and 6, and consider the rest of the matrix. If the last column of the matrix

[ Q11,23  Q12,22
  Q11,53  0
  0       Q22,42 ]_(6×7)

is omitted, one gets the 6 × 6 minor

α_3^4 ‖e_4‖ (v_4 × v_2)(v_2 × v_5)(v_5 × v_4) ≠ 0.
Figure 5. Cell of degree 4 at the boundary (s = 2).

Figure 6. Cell of degree 4 with an additional triangle at the boundary (s = 3).
Together with (16) this establishes that the matrix M̃_3(Δ_1, Δ) is of full rank. In the last case, s = 4, the matrix M̃_4(Δ_1, Δ) is a square matrix,

M̃_4(Δ_1, Δ) = [ Q11,11  Q11,12  0       0       0       0       0       0
                0       Q11,22  Q11,23  0       0       Q12,22  0       0
                0       0       Q11,33  Q11,34  0       0       Q12,33  0
                Q11,41  0       0       0       0       0       0       0
                0       Q11,52  0       0       0       0       0       0
                0       0       Q11,63  0       0       0       0       0
                0       0       0       Q11,74  0       0       0       0
                0       0       0       0       Q22,11  Q22,12  0       0
                0       0       0       0       0       Q22,22  Q22,23  0
                0       0       0       0       0       0       Q22,33  Q22,34
                0       0       0       0       Q22,41  0       0       0
                0       0       0       0       0       Q22,52  0       0
                0       0       0       0       0       0       Q22,63  0
                0       0       0       0       0       0       0       Q22,74 ]_(28×28),
and has determinant

det M̃_4(Δ_1, Δ) = ‖e_5‖ ‖e_6‖ (v_1 × v_4)^5 (v_1 × v_5)(v_2 × v_5)^3 (v_2 × v_6)^3 (v_3 × v_6)(v_3 × v_7)^5 (v_4 × v_5)(v_5 × v_6)^2 (v_6 × v_7) ≠ 0.

Since

rank M̃_s(Δ_1, Δ) = 4 (2s − 1),   |Δ_1| = s,
V_B(Δ\Δ_1) = V_B(Δ) + s − 2,   V_I(Δ\Δ_1) = V_I(Δ) − s + 1,

and σ(Δ\Δ_1) = σ(Δ) by assumption, (12) is satisfied. This concludes the first part of the proof of Theorem 1.

Now let us consider the second part. Let s = 2. Then |Δ_1| = 4 and Δ_1 is a cell of degree 4 (Fig. 5). Here the points on the edges, t_ℓ, and the additional points z_k for the faces of the triangulation are chosen in a slightly different way, i.e. the smoothness conditions for the inner edges of cell(T_1) are included in M(Δ_1): e_1: T_1, e_2: T_1, e_3: T_1, e_4: T_1, e_5: T_2, e_6: T_2, and for the faces: Ω_1: T_1, Ω_2: T_1, Ω_3: T_1, Ω_4: T_1. This choice leaves only two nonzero blocks, Q12,53 and Q12,64, in the block M12. The matrix M̃_2(Δ_1, Δ) reads
M̃_2(Δ_1, Δ) = [ Q11,11  Q11,12  0       0       0       0       0       0
                Q11,21  0       Q11,23  0       0       0       0       0
                0       Q11,32  0       Q11,34  0       0       0       0
                0       0       Q11,43  Q11,44  0       0       0       0
                0       0       Q11,53  0       0       0       Q12,53  0
                0       0       0       Q11,64  0       0       0       Q12,64
                0       0       0       0       Q22,11  Q22,12  0       0
                0       0       0       0       Q22,21  0       Q22,23  0
                0       0       0       0       0       Q22,32  0       Q22,34
                0       0       0       0       0       0       Q22,43  Q22,44
                0       0       0       0       0       0       Q22,53  0
                0       0       0       0       0       0       0       Q22,64 ]_(24×28).
If one omits columns 6, 9, 19, 20 of the matrix M̃_2(Δ_1, Δ), one gets a minor of size 24:

2 α_4^3 α_5 α_6 (v_2 × v_1)^5 (v_3 × v_4)(α_2 α_3 (v_4 × v_1) + α_1 α_4 (v_3 × v_2))(v_5 × v_4)^2 (v_6 × v_4)^2 (v_6 × v_5)^2 (β_5(c_1 − c_2) + α_5(−d_1 + d_2))(β_6(c_1 − c_2) + α_6(−d_1 + d_2)).   (17)

Since the vector w_ℓ := (β_ℓ, −α_ℓ) is orthogonal to v_ℓ = (α_ℓ, β_ℓ), the geometry of the triangulation implies

β_ℓ(c_1 − c_2) + α_ℓ(−d_1 + d_2) = ⟨w_ℓ, T_1 − T_2⟩ ≠ 0,   ℓ = 5, 6.

Therefore the minor (17) is nonzero, except in the case when

α_2 α_3 (v_4 × v_1) + α_1 α_4 (v_3 × v_2) = 0.

This can happen iff T_1 is a singular vertex. So, in the nonsingular case, the matrix M̃_2(Δ_1, Δ) is of full rank. In the singular case the rank drops by at least 1. But if one takes the 23 × 23 submatrix obtained by deleting row 1 and columns 3, 6, 9, 19, 20 of M̃_2(Δ_1, Δ), one gets a minor

−2 α_1 α_2^2 α_5^3 α_4 α_5 α_6 (v_1 × v_2)^4 (v_3 × v_4)(v_5 × v_4)^2 (v_6 × v_4)^2 (v_6 × v_5)^2 (β_5(c_1 − c_2) − α_5(d_1 − d_2))(β_6(c_1 − c_2) − α_6(d_1 − d_2)) ≠ 0.

By using a sequence of rank-preserving row eliminations on the first 8 rows of M̃_2(Δ_1, Δ) it is easy to see that M(Δ_1) is of full rank 8, except in the singular case, when the rank equals 7. The independence of the rest of the matrix follows from the structure of the matrix and the results on the rank. The case s = 3 is proved in a similar way. Since

rank M̃_s(Δ_1, Δ) = 4 (2s + 2) − σ(Δ_1),   |Δ_1| = s + 2,
V_B(Δ\Δ_1) = V_B(Δ) + s − 2,   V_I(Δ\Δ_1) = V_I(Δ) − s,

and σ(Δ\Δ_1) = σ(Δ) − σ(Δ_1) by assumption, (12) is satisfied. This concludes the proof of Theorem 1.
References

[1] Chen, Z. B., Feng, Y. Y., Kozak, J., The blossom approach to the dimension of the bivariate spline space, J. Comput. Math. 18 (2000), No. 2, 183–199.
[2] Davydov, O., Nürnberger, G., Zeilfelder, F., Cubic spline interpolation on nested polygon triangulations, in: Curve and Surface Fitting: Saint-Malo 1999, A. Cohen, C. Rabut, L. L. Schumaker (eds.), Vanderbilt University Press, 2000, 161–170.
[3] Karlin, S., Total Positivity, Stanford University Press, Stanford, 1968.
[4] Krattenthaler, C., Advanced determinant calculus, http://www.mat.univie.ac.at/~kratt/artikel/detsurv.html.
[5] Manni, C., On the dimension of bivariate spline spaces on generalized quasi-cross-cut partitions, J. Approx. Theory 69 (1992), 141–155.
[6] Riepmeester, D. J., Upper bounds on the dimension of bivariate spline spaces and duality in the plane, in: Mathematical Methods for Curves and Surfaces, M. Daehlen, T. Lyche, L. L. Schumaker (eds.), Vanderbilt University Press, Nashville, 1995, 455–466.
[7] Schumaker, L. L., On the dimension of the space of piecewise polynomials in two variables, in: Multivariate Approximation Theory, W. Schempp, K. Zeller (eds.), Birkhäuser, Basel, 1979, 251–264.
[8] Schumaker, L. L., Bounds on the dimension of spaces of multivariate piecewise polynomials, Rocky Mountain J. Math. 14 (1984), 251–265.
[9] Weisstein, E., World of Mathematics, http://mathworld.wolfram.com/ChioPivotalCondensation.html.
TOTAL LEAST SQUARES PROBLEM FOR THE HUBBERT FUNCTION

Dragan Jukić
Department of Mathematics, University of Osijek
Gajev trg 6, HR-31 000 Osijek, Croatia
[email protected]
Rudolf Scitovski Department of Mathematics, University of Osijek Gajev trg 6, HR-31 000 Osijek, Croatia
[email protected]
Kristian Sabo Department of Mathematics, University of Osijek Gajev trg 6, HR-31 000 Osijek, Croatia
[email protected]
Abstract
In this paper we consider the parameter estimation (PE) problem for the logistic function-model in the case when it is not possible to measure its values. We show that the PE problem for the logistic function can be reduced to the PE problem for its derivative, known as the Hubbert function. Our proposed method is based on finite differences and the total least squares method. Given the data (p_i, t_i, y_i), i = 1, ..., m, m > 3, we give necessary and sufficient conditions which guarantee the existence of the total least squares estimate of parameters for the Hubbert function, suggest a choice of a good initial approximation, and give some numerical examples.
Keywords:
logistic function, Hubbert function, nonlinear least squares, total least squares, existence problem.
1. Introduction

Mathematical models described by the Verhulst logistic function

g(t; A, β, γ) = A / (1 + e^(β−γt)),   A, γ > 0, β ∈ R,   (1)

Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 217–234.
© 2005 Springer. Printed in the Netherlands.
are often used in various applied areas, e.g. biology (see [19]), marketing (see [6], [22]), economics (see [18]), etc. The unknown parameters A, β and γ have to be determined on the basis of some experimentally or empirically obtained data. This problem is known as the parameter estimation (PE) problem. There are two basic cases which have to be considered separately when trying to solve this problem: Case 1. when the data for the logistic function are available, and Case 2. when the data for the logistic function are not available.
Case 1. The data for the logistic function are available

Suppose we are given the experimental or empirical data (p_i, t_i, y_i), i = 1, ..., m, where t_i denotes the values of the independent variable, y_i the respective measured function value, and p_i > 0 is the data weight. If the errors in the measurements of the independent variable are negligible, and the errors in the measurements of the dependent variable are independent random variables following a normal distribution with zero expectation, then in practical applications the unknown parameters A, β and γ of the logistic function (1) are usually estimated in the sense of the least squares (LS) method by minimizing the functional (see [2], [12], [20])

S_g(A, β, γ) = Σ_{i=1}^m p_i ( A / (1 + e^(β−γt_i)) − y_i )^2

on the set B := {(A, β, γ) ∈ R^3 : A, γ > 0}. The minimizing value (A*, β*, γ*) ∈ B is called the least squares (LS) estimate, if it exists. Numerical methods for solving the nonlinear least squares problem can be found in [5] and [7]. Prior to the iterative minimization of the sum of squares it is still necessary to ask whether an LS estimate exists at all. There has been much recent literature regarding the existence of the LS estimate for the logistic function (see e.g. [4] and [17]). This problem is solved in [12], where necessary and sufficient conditions on the data are given which guarantee the existence of the LS estimate for the logistic function.
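The functional S_g is straightforward to evaluate; the sketch below (illustrative only, with synthetic noise-free data and unit weights) confirms that the generating parameters minimize it.

```python
import math

def logistic(t, A, beta, gamma):
    """Verhulst logistic function (1)."""
    return A / (1.0 + math.exp(beta - gamma * t))

def S_g(params, data):
    """Weighted least-squares functional for the logistic model; data = [(p, t, y), ...]."""
    A, beta, gamma = params
    return sum(p * (logistic(t, A, beta, gamma) - y) ** 2 for p, t, y in data)

# synthetic noise-free data from g(t; 2, 3, 0.4), unit weights
data = [(1.0, t, logistic(t, 2.0, 3.0, 0.4)) for t in range(1, 13)]
print(S_g((2.0, 3.0, 0.4), data))          # 0 at the generating parameters
print(S_g((2.1, 3.0, 0.4), data) > 0.0)    # True: a perturbation increases S_g
```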
Case 2. The data for the logistic function are not available

There are many models for which it is not possible to measure the values of the corresponding logistic function, but one still has to determine its unknown parameters A, β and γ. Such a situation occurs almost always when the logistic function is used for modeling some cumulative quantity. For example, in marketing it serves as a model of cumulative production and cumulative sales (see e.g. [6, 22]); in geology, as a model of cumulative consumption of a finite non-renewable resource such as crude oil or gas (see [9, 10]). In such situations it is usually possible to measure the increments of the function-model (annual production, annual sales, annual consumption).

Assumption. In this case we assume that for some values

τ_1 < τ_2 < ... < τ_{m+1}   (2)

of the independent variable we have measured the value y_i of the difference quotient

( g(τ_{i+1}; A, β, γ) − g(τ_i; A, β, γ) ) / (τ_{i+1} − τ_i),   i = 1, ..., m.

Note that

y_i = ( g(τ_{i+1}; A, β, γ) − g(τ_i; A, β, γ) ) / (τ_{i+1} − τ_i) + ε_i,   i = 1, ..., m,   (3)
where ε_i are some unknown additive measurement errors. Using this assumption, we are going to show how the unknown parameters A, β and γ of the logistic function can be estimated by using its derivative

f(t; α, β, γ) = α e^(β−γt) / (1 + e^(β−γt))^2,   α, γ > 0, β ∈ R,   (4)

where

α = Aγ.   (5)
Function (4) is known as the Hubbert curve, in honour of the American geophysicist M. K. Hubbert, who used this curve for projections of future annual crude oil production and consumption (see e.g. [9]). Figure 1 displays the logistic curve (as a model of cumulative production) and the corresponding Hubbert curve (as a model of annual production). Note that the Hubbert curve is similar in shape to the Gaussian curve. Suppose that (α*, β*, γ*) are, in some way, estimated values of the unknown parameters of the Hubbert function (4). Then, according to (5), the unknown parameters (A, β, γ) of the logistic function (1) can be approximated by

(α*/γ*, β*, γ*).   (6)
In order to estimate the unknown parameters α, β and γ of the Hubbert function (4) we are going to use the data (p_i, t_i, y_i), i = 1, ..., m, where

t_i := (τ_i + τ_{i+1}) / 2,   i = 1, ..., m,   (7)

and p_i > 0 are some data weights given in advance. According to the mean value theorem, there exists a vector δ = (δ_1, ..., δ_m) ∈ R^m such that

( g(τ_{i+1}; A, β, γ) − g(τ_i; A, β, γ) ) / (τ_{i+1} − τ_i) = f(t_i + δ_i; α, β, γ),   i = 1, ..., m,   (8)
Figure 1. The logistic curve (cumulative production) and the corresponding Hubbert curve (annual production).

and therefore (3) can be rewritten as

y_i = f(t_i + δ_i; α, β, γ) + ε_i,   i = 1, ..., m.   (9)
In model (9), δ_i can be interpreted as an unknown error in the measurement t_i of the independent variable, whereas ε_i can be interpreted as an unknown error in the measurement of the corresponding value y_i of the dependent variable. The unknown parameters of the Hubbert function (4) can be estimated by the LS method, i.e. by solving the following problem:

min_{(α,β,γ)∈B} S(α, β, γ),   where   S(α, β, γ) = Σ_{i=1}^m p_i ( α e^(β−γt_i) / (1 + e^(β−γt_i))^2 − y_i )^2.
Necessary and sufficient conditions on the data which guarantee the existence of the LS estimate for the Hubbert function can be found in [13]. Note that in model (9) considerable errors can occur in the measurements of all variables. In this case it is reasonable to estimate the unknown parameters α, β and γ by minimizing the weighted sum of squares of all errors. This approach is known as the total least squares (TLS) method, and it leads to the minimization of the weighted sum of squares of the distances from the data points (t_i, y_i) to the curve t ↦ f(t; α, β, γ), i.e. one has to solve the following problem (see [1], [11], [16], [21]): Does there exist a (3 + m)-tuple (α*, β*, γ*, δ*) ∈ B × R^m such that

F(α*, β*, γ*, δ*) = inf_{(α,β,γ,δ)∈B×R^m} F(α, β, γ, δ),   (10)

where

F(α, β, γ, δ) = Σ_{i=1}^m p_i ( α e^(β−γ(t_i+δ_i)) / (1 + e^(β−γ(t_i+δ_i)))^2 − y_i )^2 + Σ_{i=1}^m p_i δ_i^2   (11)
and B := {(α, β, γ) ∈ R^3 : α, γ > 0}? A point (α*, β*, γ*) ∈ B is called the total least squares (TLS) estimate of the parameters of the Hubbert function. In the statistics literature the TLS approach is known as errors-in-variables or orthogonal distance regression; in numerical analysis it was first considered by Golub and Van Loan [8]. In Figure 2 we illustrate the difference between the OLS and the TLS approach.

Figure 2. a) The OLS approach (vertical residuals ε_i at the data points (t_i, y_i)); b) the TLS approach (distances d_i from (t_i, y_i) to the points (t_i + δ_i, f(t_i + δ_i)) on the curve).
Since in the case of the TLS problem one has to deal with a large number of independent variables, special numerical methods for finding the TLS estimate have been designed (see e.g. [1, 21]). In this paper we give necessary and sufficient conditions on the data which guarantee the existence of the TLS estimate for the Hubbert function, suggest a choice of a good initial approximation, and give some numerical examples.
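The per-point distance that the TLS method minimizes can be approximated by a one-dimensional search over the shift δ. The crude grid search below (an illustrative sketch; the search window and step count are arbitrary choices) confirms the elementary fact that this distance never exceeds the vertical OLS residual, since δ = 0 is among the candidates.

```python
import math

def hubbert(t, alpha, beta, gamma):
    e = math.exp(beta - gamma * t)
    return alpha * e / (1.0 + e) ** 2

def squared_distance(ti, yi, alpha, beta, gamma, width=5.0, steps=2001):
    """min over delta of (f(ti+delta) - yi)^2 + delta^2, by grid search on [-width, width]."""
    best = float("inf")
    for k in range(steps):
        d = -width + 2 * width * k / (steps - 1)
        r = hubbert(ti + d, alpha, beta, gamma) - yi
        best = min(best, r * r + d * d)
    return best

ti, yi = 5.0, 0.05
vertical = (hubbert(ti, 0.8, 3.0, 0.4) - yi) ** 2   # squared OLS residual (delta = 0)
print(squared_distance(ti, yi, 0.8, 3.0, 0.4) <= vertical)  # True: delta = 0 is on the grid
```

In practice one would refine the minimizing δ (or solve the stationarity condition), which is what the specialized methods cited above do simultaneously for all m shifts.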
2. The existence problem and its solution
Suppose we are given the experimental or empirical data (p_i, t_i, y_i), i = 1, ..., m, m ≥ 3, where t_1 < t_2 < ... < t_m denote the values of the independent variable, y_i > 0 are the respective measured function values, and p_i > 0 are the data weights. The following lemma, which will be used in the proof of Theorem 1, shows that there exist data for which problem (10)–(11) has no solution.

Lemma 1. If the points (t_i, y_i), i = 1, ..., m, m ≥ 3, all lie on some exponential curve y(t) = b e^(ct), b, c > 0, then problem (10)–(11) has no solution.
Proof. Since F(α, β, γ, δ) ≥ 0 for all (α, β, γ, δ) ∈ B × R^m, and

lim_{α→∞} F(α, ln α − ln b, c, 0)
 = lim_{α→∞} Σ_{i=1}^m p_i ( α^2 e^(−ln b − c t_i) / (1 + α e^(−ln b − c t_i))^2 − y_i )^2
 = lim_{α→∞} Σ_{i=1}^m p_i ( e^(−ln b − c t_i) / (1/α + e^(−ln b − c t_i))^2 − y_i )^2
 = Σ_{i=1}^m p_i ( b e^(c t_i) − y_i )^2 = 0,

this means that inf_{(α,β,γ,δ)∈B×R^m} F(α, β, γ, δ) = 0. Furthermore, since any Hubbert curve intersects the graph of the exponential function y(t) = b e^(ct) in at most two points, and m ≥ 3, it follows that F(α, β, γ, δ) > 0 for all (α, β, γ, δ) ∈ B × R^m, and hence problem (10)–(11) has no solution.

The following theorem gives necessary and sufficient conditions on the data which guarantee the existence of the TLS estimate for the Hubbert function. First, let us introduce the following notation. Let F_E be the infimum of the sum of squares of orthogonal distances for the exponential function y(t) = b e^(ct) (b, c > 0), i.e.

F_E = inf_{(b,c,Δ)∈R_+^2×R^m} F_E(b, c, Δ),

where

F_E(b, c, Δ) = Σ_{i=1}^m p_i ( b e^(c(t_i+Δ_i)) − y_i )^2 + Σ_{i=1}^m p_i Δ_i^2.

Some results referring to the existence of the TLS estimate for the exponential function can be found in [23].

Theorem 1. Let the data (p_i, t_i, y_i), i = 1, ..., m, m ≥ 3, be given, such that t_1 < t_2 < ... < t_m and y_i > 0, i = 1, ..., m. Then problem (10)–(11) has a solution if and only if there exists a point (α_0, β_0, γ_0, δ^0) ∈ B × R^m such that F(α_0, β_0, γ_0, δ^0) ≤ F_E.

Proof. Suppose problem (10)–(11) has a solution, and let (α*, β*, γ*, δ*) ∈ B × R^m be the TLS estimate. Then

F(α*, β*, γ*, δ*) ≤ F(α, β, γ, δ),   ∀(α, β, γ, δ) ∈ B × R^m.

Let (b, c, Δ) ∈ R_+^2 × R^m be arbitrary. Since

F(α*, β*, γ*, δ*) ≤ F(α, ln α − ln b, c, Δ)
for all α > 0, taking the limit we obtain (see the proof of Lemma 1)

F(α*, β*, γ*, δ*) ≤ lim_{α→∞} F(α, ln α − ln b, c, Δ) = F_E(b, c, Δ),

and therefore F(α*, β*, γ*, δ*) ≤ F_E.

Let us now show the converse. Suppose that there exists a point (α_0, β_0, γ_0, δ^0) ∈ B × R^m such that F(α_0, β_0, γ_0, δ^0) ≤ F_E. Since the functional F is nonnegative, there exists

F* := inf_{(α,β,γ,δ)∈B×R^m} F(α, β, γ, δ).

Let (α_n, β_n, γ_n, δ^n) be a sequence in B × R^m such that

F* = lim_{n→∞} F(α_n, β_n, γ_n, δ^n)
   = lim_{n→∞} [ Σ_{i=1}^m p_i ( α_n e^(β_n−γ_n(t_i+δ_i^n)) / (1 + e^(β_n−γ_n(t_i+δ_i^n)))^2 − y_i )^2 + Σ_{i=1}^m p_i (δ_i^n)^2 ].   (12)

Note that the sequences (δ_i^n), i = 1, ..., m, are bounded. Otherwise we would have lim sup F(α_n, β_n, γ_n, δ^n) = ∞, which contradicts the assumption that the sequence F(α_n, β_n, γ_n, δ^n) converges to F*. Furthermore, since

lim_{n→∞} F(α_n, β_n, γ_n, δ^n) = F* ≤ F(α_0, β_0, γ_0, δ^0) ≤ F_E,

notice that the point (α_0, β_0, γ_0, δ^0) solves our TLS problem (10)–(11) if

lim_{n→∞} F(α_n, β_n, γ_n, δ^n) = F_E.

Hence we can further assume that

lim_{n→∞} F(α_n, β_n, γ_n, δ^n) < F_E.   (13)

Without loss of generality, whenever we have an unbounded sequence we can assume that it tends to ∞ or −∞; otherwise, by the Bolzano–Weierstrass theorem, we take a convergent subsequence. Similarly, whenever we have a bounded sequence we can assume it is convergent; otherwise we take a convergent subsequence. Let

lim_{n→∞} δ_i^n = δ̄_i,   i = 1, ..., m.
Now we are going to show that the sequence (α_n, β_n, γ_n) is bounded, by showing that the functional F cannot attain its infimum F* in any of the following three ways:

I. α_n → ∞ and β_n^2 + γ_n^2 → r for some r ∈ [0, ∞),
II. α_n → α* for some α* ∈ [0, ∞), and β_n^2 + γ_n^2 → ∞,
III. α_n → ∞ and β_n^2 + γ_n^2 → ∞.

In each of these cases we are going to find a point in B × R^m at which the functional F attains a value smaller than lim_{n→∞} F(α_n, β_n, γ_n, δ^n), thus showing that none of the three cases is possible.

I. In the first case we would have lim_{n→∞} F(α_n, β_n, γ_n, δ^n) = ∞, which means that in this case the functional F cannot attain its infimum.

II. Case α_n → α* ≥ 0 and β_n^2 + γ_n^2 → ∞. Let us denote

I := {1, ..., m},   l_i := lim_{n→∞} f(t_i + δ_i^n; α_n, β_n, γ_n),   i ∈ I,
I_0 := {i ∈ I : l_i > 0}.

Note that in this case the limit of the corresponding weighted sum of squares reads

lim_{n→∞} F(α_n, β_n, γ_n, δ^n) = Σ_{i∈I\I_0} p_i y_i^2 + Σ_{i∈I_0} p_i (l_i − y_i)^2 + Σ_{i∈I} p_i δ̄_i^2
 ≥ Σ_{i∈I\I_0} p_i y_i^2 + Σ_{i∈I_0} p_i δ̄_i^2 =: F_{I_0,δ̄}.   (14)
Let us find a point in B × R^m at which the functional F attains a value smaller than F_{I_0,δ̄}.

II.1. Case I_0 = ∅. In this case we have

F_{I_0,δ̄} ≥ Σ_{i=1}^m p_i y_i^2 − p_r y_r^2,

where (p_r, t_r, y_r) is any datum chosen in advance. Now consider the following class of Hubbert curves,

t ↦ f(t; 4y_r, γ t_r, γ) = 4y_r e^(−γ(t−t_r)) / (1 + e^(−γ(t−t_r)))^2,   γ > 0,

whose graphs contain the point (t_r, y_r). Then

0 < f(t_i; 4y_r, γ t_r, γ) ≤ y_i,   i = 1, ..., m,

for every sufficiently large γ ∈ R, whereby the equality holds only if i = r (see Figure 3). Therefore, for every sufficiently large γ ∈ R we obtain

F(4y_r, γ t_r, γ, 0) = Σ_{i=1}^m p_i ( 4y_r e^(−γ(t_i−t_r)) / (1 + e^(−γ(t_i−t_r)))^2 − y_i )^2
 < Σ_{i=1}^m p_i y_i^2 − p_r y_r^2 < F_{I_0,δ̄}.

This means that in this way the functional F cannot attain its infimum.

Figure 3. Graphs of the Hubbert functions t ↦ 4y_r e^(−γ(t−t_r)) / (1 + e^(−γ(t−t_r)))^2 through the point (t_r, y_r), for γ_3 > γ_2 > γ_1 > 0.
II.2. Case I_0 ≠ ∅. Let us first show that I* := {i ∈ I_0 : δ̄_i = 0} is either an empty set or a one-point set. For that purpose, denote

μ_n := (β_n, −γ_n),   u_i^n := (1, t_i + δ_i^n),   n ∈ N, i = 1, ..., m,
μ_n^0 := μ_n / ‖μ_n‖,   μ^0 = (μ_1^0, μ_2^0) := lim_{n→∞} μ_n^0,
ū_i := lim_{n→∞} u_i^n,   i = 1, ..., m.

With the above notation we can write

f(t_i + δ_i^n; α_n, β_n, γ_n) = α_n e^(μ_n·u_i^n) / (1 + e^(μ_n·u_i^n))^2,   (15)

where μ_n · u_i^n denotes the inner (scalar) product of μ_n and u_i^n.
Note that

μ^0 · ū_i = (μ_1^0, μ_2^0) · (1, t_i + δ̄_i) = 0,   ∀i ∈ I_0.   (16)

Namely, otherwise we would have |μ^0 · ū_i| > 0, and because of ‖μ_n‖ → ∞ there would hold

| lim_{n→∞} μ_n · u_i^n | = | lim_{n→∞} ‖μ_n‖ μ_n^0 · u_i^n | = lim_{n→∞} ‖μ_n‖ · | μ^0 · ū_i | = ∞,

so (15) would imply l_i = 0. This contradicts the definition of the set I_0. Since (μ_1^0)^2 + (μ_2^0)^2 = 1, (16) implies

μ_2^0 ≠ 0   and   t_i + δ̄_i = −μ_1^0 / μ_2^0 =: τ_0,   ∀i ∈ I_0.   (17)

From (17) it follows that I* is a one-point set, I* = {r}, if it is not empty.

Case (a): I* = {r}. In this case, for any datum (p_r, t_r, y_r) chosen in advance, from (14) we obtain

F_{I_0,δ̄} > Σ_{i=1}^m p_i y_i^2 − p_r y_r^2.

In the considerations under II.1 we have shown that there exists a point in B × R^m at which the functional F attains a value smaller than Σ_{i=1}^m p_i y_i^2 − p_r y_r^2.

Case (b): I* = ∅. Let y_0 be any real number such that y_0 > max_{i∈I_0} y_i, and consider the following class of Hubbert curves,

t ↦ f(t; 4y_0, γ τ_0, γ) = 4y_0 e^(−γ(t−τ_0)) / (1 + e^(−γ(t−τ_0)))^2,   γ > 0,

whose graphs contain the point (τ_0, y_0) (see Figure 4). Furthermore, define the functions δ_i: (0, ∞) → R, i = 1, ..., m, by

δ_i(γ) = τ_0 − t_i + (1/γ) ln( (2y_0 − y_i + 2√(y_0^2 − y_0 y_i)) / y_i ),  if i ∈ I_0 and t_i ≥ τ_0,
δ_i(γ) = τ_0 − t_i + (1/γ) ln( (2y_0 − y_i − 2√(y_0^2 − y_0 y_i)) / y_i ),  if i ∈ I_0 and t_i < τ_0,
δ_i(γ) = 0,  if i ∈ I\I_0.

Note that

(2y_0 − y_i + 2√(y_0^2 − y_0 y_i)) / y_i > 1,   if i ∈ I_0 and t_i ≥ τ_0,   (18)
TLS problem for the Hubbert function
2 2y0 − yi − 2 y02 − y0 yi 0< < 1, if i ∈ I0 and ti < τ0 yi By using (18) and (19) we obtain 2 ? 2y0 − yi + 2 y 2 − y0 yi τ0 −ti 0 0 < yi
and
(19)
i∈I0 ti >ττ0
2 ? 2y0 − yi − 2 y 2 − y0 yi τ0 −ti 0 × < 1. yi
(20)
i∈I0 ti <ττ0
The last inequality will be used later.

Figure 4. The Hubbert curve through (τ_0, y_0) and the data points (t_i, y_i), (t_j, y_j), shifted horizontally to the points (t_i + δ_i(γ), y_i), (t_j + δ_j(γ), y_j) on the curve.
Since (see Figure 4)

f(t_i + δ_i(γ); 4y_0, γ τ_0, γ) = y_i,   i ∈ I_0,   (21)

we have

F(4y_0, γ τ_0, γ, δ(γ)) = Σ_{i∈I\I_0} p_i ( f(t_i + δ_i(γ); 4y_0, γ τ_0, γ) − y_i )^2 + Σ_{i∈I_0} p_i δ_i^2(γ)   (22)

and

lim_{γ→∞} F(4y_0, γ τ_0, γ, δ(γ)) = Σ_{i∈I\I_0} p_i y_i^2 + Σ_{i∈I_0} p_i δ̄_i^2 = F_{I_0,δ̄}.
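Identity (21) — each shifted point (t_i + δ_i(γ), y_i) lies exactly on the Hubbert curve through (τ_0, y_0) — can be checked numerically. The sketch below (with made-up values for τ_0, y_0, γ and the data points) uses the '+' branch of δ_i(γ) for t_i ≥ τ_0 and the '−' branch otherwise, as in (18)–(19).

```python
import math

def hubbert(t, alpha, beta, gamma):
    e = math.exp(beta - gamma * t)
    return alpha * e / (1.0 + e) ** 2

def delta_i(ti, yi, tau0, y0, gamma):
    """Shift from the construction above: moves (ti, yi) horizontally onto
    the curve t -> f(t; 4*y0, gamma*tau0, gamma); requires y0 > yi."""
    s = 2.0 * math.sqrt(y0 * y0 - y0 * yi)
    ratio = (2.0 * y0 - yi + s) / yi if ti >= tau0 else (2.0 * y0 - yi - s) / yi
    return tau0 - ti + math.log(ratio) / gamma

tau0, y0, gamma = 4.0, 1.0, 2.0
for ti, yi in [(6.0, 0.3), (1.0, 0.5)]:       # one point on each side of tau0
    d = delta_i(ti, yi, tau0, y0, gamma)
    # the shifted point lies on the Hubbert curve through (tau0, y0), cf. (21)
    assert abs(hubbert(ti + d, 4.0 * y0, gamma * tau0, gamma) - yi) < 1e-9
print("each shifted point lies on the curve, as in (21)")
```

The two branches correspond to the two roots u of y_i u^2 + (2y_i − 4y_0) u + y_i = 0, whose product is 1, which is why either side of the peak can be matched.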
Let us now show that

F(4y_0, γ τ_0, γ, δ(γ)) < F_{I_0,δ̄}

for every sufficiently large γ ∈ R. This will mean that in this way the functional F cannot attain its infimum. Note that

0 < f(t_i + δ_i(γ); 4y_0, γ τ_0, γ) < y_i,   i ∈ I\I_0,

for every sufficiently large γ ∈ R. Therefore, for every sufficiently large γ ∈ R we have

Σ_{i∈I\I_0} p_i ( f(t_i + δ_i(γ); 4y_0, γ τ_0, γ) − y_i )^2 < Σ_{i∈I\I_0} p_i y_i^2.

It remains to show that for the second sum in (22) there holds

Ψ(γ) := Σ_{i∈I_0} p_i δ_i^2(γ) < Σ_{i∈I_0} p_i δ̄_i^2

whenever γ is large enough. From

(γ^2/2) (dΨ/dγ)(γ) = − Σ_{i∈I_0, t_i>τ_0} p_i ( τ_0 − t_i + (1/γ) ln( (2y_0 − y_i + 2√(y_0^2 − y_0 y_i)) / y_i ) ) ln( (2y_0 − y_i + 2√(y_0^2 − y_0 y_i)) / y_i )
 − Σ_{i∈I_0, t_i<τ_0} p_i ( τ_0 − t_i + (1/γ) ln( (2y_0 − y_i − 2√(y_0^2 − y_0 y_i)) / y_i ) ) ln( (2y_0 − y_i − 2√(y_0^2 − y_0 y_i)) / y_i )

and (20) we obtain

lim_{γ→∞} (γ^2/2) (dΨ/dγ)(γ) = − ln [ Π_{i∈I_0, t_i>τ_0} ( (2y_0 − y_i + 2√(y_0^2 − y_0 y_i)) / y_i )^(p_i(τ_0−t_i)) × Π_{i∈I_0, t_i<τ_0} ( (2y_0 − y_i − 2√(y_0^2 − y_0 y_i)) / y_i )^(p_i(τ_0−t_i)) ] > 0.

This means that (dΨ/dγ)(γ) is positive whenever γ is large enough. Hence there exists a real number γ_0 such that the function Ψ is strictly increasing on the interval (γ_0, ∞), and thus

Σ_{i∈I_0} p_i δ_i^2(γ) = Ψ(γ) < lim_{γ→∞} Ψ(γ) = Σ_{i∈I_0} p_i δ̄_i^2

for every γ ∈ (γ_0, ∞).
for every γ ∈ (γ0 , ∞). III. Consider the case αn → ∞ and βn2 + γn2 → ∞. Let us first introduce vectors νn ν n := (ln αn − βn , γn ), ν 0n := , n ∈ N ; ν 0 := lim ν 0n , n→∞ ν n n n ui := (1, ti + δi ), ui := (1, ti + δi ), i = 1, . . . , m . In this case for each αn large enough we have f (t; αn , βn , γn ) =
αn eβn −γn t 1 + eβn −γn t
2 =
eβn −γn t √1 αn
1
+ eβn − 2 ln αn −γn t
≈ eln αn −βn +γn t = eν n (1,t) , T
n ∈ N.
2 (23)
Only one of the following two cases can occur: 1. (ν n ) is unbounded,
2. (ν n ) is bounded.
III.1. Suppose (ν_n) is unbounded, i.e. ‖ν_n‖ → ∞. With the notations

I := {1, ..., m},   l_i := lim_{n→∞} f(t_i + δ_i^n; α_n, β_n, γ_n),   i ∈ I,
Î_0 := {i ∈ I : l_i > 0},

the limit of the corresponding weighted sum of squares reads

lim_{n→∞} F(α_n, β_n, γ_n, δ^n) = Σ_{i∈I\Î_0} p_i y_i^2 + Σ_{i∈Î_0} p_i (l_i − y_i)^2 + Σ_{i∈I} p_i δ̄_i^2
 ≥ Σ_{i∈I\Î_0} p_i y_i^2 + Σ_{i∈Î_0} p_i δ̄_i^2 =: F_{Î_0,δ̄}.

Arguing similarly as in case II.2, one can show that

ν^0 · ū_i = (ν_1^0, ν_2^0) · (1, t_i + δ̄_i) = 0,   ∀i ∈ Î_0.

So, because of the equality (ν_1^0)^2 + (ν_2^0)^2 = 1, we conclude

ν_2^0 ≠ 0   and   t_i + δ̄_i = −ν_1^0 / ν_2^0 =: τ_0,   ∀i ∈ Î_0.   (24)
This means that the set Î_0 is of the same type as the set I_0 considered in II.2. Furthermore, from (24) it follows that Î* := {i ∈ Î_0 : δ̄_i = 0} is either an empty set or a one-point set Î* = {r}, just like I* defined in II.2. Replacing I_0 with Î_0 and I* with Î* in II.2, we conclude that there exists a point in B × R^m at which the functional F attains a value smaller than F_{Î_0,δ̄}, which means that also in this way one cannot obtain the infimum of the functional F.

III.2. Consider the case when (ν_n) = (ln α_n − β_n, γ_n) is bounded. Let (ln α_n − β_n, γ_n) → (ln b̂, ĉ). Note that ĉ ≥ 0, because γ_n > 0, n ∈ N. By means of (23) we obtain

lim_{n→∞} F(α_n, β_n, γ_n, δ^n) = Σ_{i=1}^m p_i ( b̂ e^(ĉ(t_i+δ̄_i)) − y_i )^2 + Σ_{i=1}^m p_i δ̄_i^2 ≥ F_E,

which contradicts assumption (13). This means that in this way the functional F cannot attain its infimum.

This completes the proof that the sequence (α_n, β_n, γ_n) is bounded. Let (α_n, β_n, γ_n) → (α*, β*, γ*). By the continuity of the functional F, from (12) we have

F* = inf_{(α,β,γ,δ)∈B×R^m} F(α, β, γ, δ) = lim_{n→∞} F(α_n, β_n, γ_n, δ^n) = F(α*, β*, γ*, δ*),

where δ* := (δ̄_1, ..., δ̄_m).

If α* = 0, then F(α*, β*, γ*, δ*) ≥ Σ_{i=1}^m p_i y_i^2. In the considerations under II.1 it has been shown that there exists a point at which the functional F attains a smaller value. Thus α* ≠ 0.

If γ* = 0, then f(·; α*, β*, 0) is a constant function, so F* = F(α*, β*, γ*, δ*) ≥ Σ_{i=1}^m p_i (y_i − x)^2 for some x ∈ R. Since the quadratic function x ↦ Σ_{i=1}^m p_i (y_i − x)^2 attains its minimum Σ_{i=1}^m p_i (y_i − ȳ)^2 at the point ȳ = Σ_{i=1}^m p_i y_i / Σ_{i=1}^m p_i, we have

F* ≥ Σ_{i=1}^m p_i (y_i − ȳ)^2.

Furthermore, taking the limit c → 0+ in

F_E(ȳ, c, 0) = Σ_{i=1}^m p_i ( ȳ e^(c t_i) − y_i )^2 ≥ F_E,

we obtain Σ_{i=1}^m p_i (y_i − ȳ)^2 ≥ F_E, and therefore F* ≥ F_E. This contradicts assumption (13). Thus γ* ≠ 0. Therefore (α*, β*, γ*, δ*) ∈ B × R^m. This completes the proof of Theorem 1.
3. Choice of initial approximation
Numerical methods for minimizing the functional F defined by (11) require an initial approximation (α_0, β_0, γ_0, δ^0) ∈ B × R^m which is as good as possible. Using the fact that the observed values y_i must clearly be close in some sense to the corresponding exact values, i.e.

α e^(β−γt_i) / (1 + e^(β−γt_i))^2 ≈ y_i   and   δ_i ≈ 0,   i = 1, ..., m,

and that α/4 is the maximal value of the Hubbert function (4), we can determine a possibly good initial approximation (α_0, β_0, γ_0, δ^0). We suggest to do this in the following way. Define

δ^0 := 0,   α_0 := 4y_r,   where y_r := max{y_i : i = 1, ..., m}.   (25)

Now choose two data points (t_{i1}, y_{i1}) and (t_{i2}, y_{i2}) such that

t_{i1} < t_{i2} < t_r   and   y_{i1} < y_{i2} < y_r,   (26)

and for (β_0, γ_0) take a solution of the system

1) α_0 e^(β−γt_{i1}) / (1 + e^(β−γt_{i1}))^2 = y_{i1},
2) α_0 e^(β−γt_{i2}) / (1 + e^(β−γt_{i2}))^2 = y_{i2}.

Taking into account (26) and the requirement that γ_0 be positive, it is easy to check that

β_0 = ( t_{i1} ln( (α_0 − 2y_{i2} + √(α_0^2 − 4α_0 y_{i2})) / (2y_{i2}) ) − t_{i2} ln( (α_0 − 2y_{i1} + √(α_0^2 − 4α_0 y_{i1})) / (2y_{i1}) ) ) / (t_{i1} − t_{i2}),

γ_0 = ( −ln( (α_0 − 2y_{i1} + √(α_0^2 − 4α_0 y_{i1})) / (2y_{i1}) ) + ln( (α_0 − 2y_{i2} + √(α_0^2 − 4α_0 y_{i2})) / (2y_{i2}) ) ) / (t_{i1} − t_{i2}).   (27)
4. Numerical examples
Example 1. To illustrate the efficiency of our parameter estimation method for the logistic function by using the Hubbert function, consider the logistic function

    g(t; 2, 3, 0.4) = 2 / (1 + e^{3 − 0.4t}).

Let m = 12 and τ_i = 0.5 + i, i = 1, ..., m + 1.
APPLIED MATHEMATICS AND SCIENTIFIC COMPUTING
Now, according to (7) and (3), we define

    t_i := (τ_i + τ_{i+1}) / 2,   y_i := ( g(τ_{i+1}; 2, 3, 0.4) − g(τ_i; 2, 3, 0.4) ) / ( τ_{i+1} − τ_i ),   i = 1, ..., m.
We calculated the initial approximation (α_0, β_0, γ_0, δ^0) = (0.789501, 2.98145, 0.401547) from (25) and (27), taking (t_r, y_r) = (t_6, y_6), (t_{i1}, y_{i1}) = (t_1, y_1) and (t_{i2}, y_{i2}) = (t_5, y_5). By using the minimization method described in [1] we obtain the TLS estimate (α*, β*, γ*) = (0.797485, 2.98868, 0.398474) for the Hubbert function. Figure 5.a shows the graph of the Hubbert function f(t; α*, β*, γ*) and the data (t_i, y_i), i = 1, ..., m. According to (6), our method approximates the parameters (A, β, γ) = (2, 3, 0.4) of the logistic function by (A*, β*, γ*) = (2.00134, 2.98868, 0.398474). Figure 5.b shows the graph of the logistic function g(t; A*, β*, γ*) together with the data points (τ_i, g(τ_i; 2, 3, 0.4)), i = 1, ..., m + 1.
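The construction above can be sketched in a few lines (a hypothetical implementation, not the authors' code; the closed-form roots u_k of the quadratic y u² + (2y − α)u + y = 0 underlying (27) are an assumption of this sketch). It reproduces α_0 ≈ 0.7895 and gives (β_0, γ_0) close to the values reported in the text:

```python
import math

def g(t, A=2.0, beta=3.0, gamma=0.4):
    """Logistic function g(t; A, beta, gamma)."""
    return A / (1.0 + math.exp(beta - gamma * t))

m = 12
tau = [0.5 + i for i in range(1, m + 2)]            # tau_1, ..., tau_{m+1}
t = [(tau[i] + tau[i + 1]) / 2 for i in range(m)]   # midpoints t_i
y = [(g(tau[i + 1]) - g(tau[i])) / (tau[i + 1] - tau[i]) for i in range(m)]

# (25): delta^0 = 0 and alpha_0 = 4 y_r, with y_r the largest observation
alpha0 = 4.0 * max(y)

def ln_u(alpha, yk):
    # log of the larger root u of y u^2 + (2y - alpha) u + y = 0,
    # so that u = e^{beta - gamma t} at the chosen data point
    return math.log((alpha - 2 * yk + math.sqrt(alpha**2 - 4 * alpha * yk)) / (2 * yk))

# (27) with (t_1, y_1) and (t_5, y_5) as in the text
(t1, y1), (t2, y2) = (t[0], y[0]), (t[4], y[4])
lu1, lu2 = ln_u(alpha0, y1), ln_u(alpha0, y2)
beta0 = (t1 * lu2 - t2 * lu1) / (t1 - t2)
gamma0 = (-lu1 + lu2) / (t1 - t2)
print(alpha0, beta0, gamma0)   # alpha0 ≈ 0.7895; (beta0, gamma0) near (3, 0.4)
```

Since the initial approximation only needs to land in the basin of attraction of the minimizer, small deviations from the printed digits of β_0 and γ_0 are harmless.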
Figure 5a. Hubbert function f(t; α*, β*, γ*) and the data (t_i, y_i).

Figure 5b. Logistic function g(t; A*, β*, γ*) and the data points (τ_i, g(τ_i; 2, 3, 0.4)).
Example 2. Table 1 shows US 48 States crude oil production. The data are obtained from the web site http://www.eia.doe.gov/. Fitting the Hubbert curve (4), we obtained the TLS estimate α* = 33.3606, β* = 120.519 and γ* = 0.061153. The data and the graph of the corresponding Hubbert function f(t; α*, β*, γ*) are shown in Figure 6. Deviations from the Hubbert curve can be explained by political factors.
Table 1. US 48 States crude oil production in million barrels per day.

    t_i   1954   1955   1956   1957   1958   1959   1960   1961   1962
    y_i   6.342  6.807  7.151  7.170  6.710  7.053  7.034  7.166  7.304

    t_i   1963   1964   1965   1966   1967   1968   1969   1970   1971
    y_i   7.512  7.584  7.774  8.256  8.730  8.915  9.035  9.408  9.245

    t_i   1972   1973   1974   1975   1976   1977   1978   1979   1980
    y_i   9.242  9.010  8.581  8.183  7.958  7.781  7.478  7.151  6.980

    t_i   1981   1982   1983   1984   1985   1986   1987   1988   1989
    y_i   6.962  6.953  6.974  7.157  7.146  6.814  6.387  6.123  5.739

    t_i   1990   1991   1992   1993   1994   1995   1996   1997   1998
    y_i   5.582  5.618  5.457  5.264  5.103  5.076  5.071  5.156  5.077

    t_i   1999   2000
    y_i   4.832  4.863

Figure 6. US 48 States crude oil production data and the graph of the Hubbert function f(t; α*, β*, γ*).
References

[1] P. T. Boggs, R. H. Byrd and R. B. Schnabel, A stable and efficient algorithm for nonlinear orthogonal distance regression, SIAM J. Sci. Statist. Comput., 8 (1987), pp. 1052–1078.
[2] Å. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996.
[3] E. Z. Demidenko, On the existence of the least squares estimate in nonlinear growth curve models of exponential type, Commun. Statist.-Theory Meth., 25 (1996), pp. 159–182.
[4] E. Z. Demidenko, Optimization and Regression, Nauka, Moscow, 1989 (in Russian).
[5] J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Philadelphia, 1996.
[6] C. J. Easingwood, Early product life cycle forms for infrequently purchased major products, Intern. J. of Research in Marketing, 4 (1987), pp. 3–9.
[7] P. E. Gill, W. Murray and M. H. Wright, Practical Optimization, Academic Press, London, 1981.
[8] G. H. Golub and C. F. Van Loan, An analysis of the total least squares problem, SIAM J. Numer. Anal., 17 (1980), pp. 883–893.
[9] M. K. Hubbert, Nuclear energy and the fossil fuels, American Petroleum Institute, Drilling and production practices, (1956), pp. 7–25.
[10] M. K. Hubbert, Oil and gas supply modeling, NBS special publication 631, U.S. Department of Commerce / National Bureau of Standards, May 1982, p. 90.
[11] S. Van Huffel and H. Zha, The total least squares problem, Elsevier, North-Holland, Amsterdam, 1993.
[12] D. Jukić and R. Scitovski, Solution of the least squares problem for logistic function, J. Comp. Appl. Math., 156 (2003), pp. 159–177.
[13] D. Jukić, K. Sabo and G. Bokun, Least squares problem for the Hubbert function, in Proceedings of the 9th Int. Conf. on Operational Research, Trogir, October 2-4, 2002, pp. 37–46.
[14] D. Jukić and R. Scitovski, The best least squares approximation problem for a 3-parametric exponential regression model, ANZIAM J., 42 (2000), pp. 254–266.
[15] D. Jukić, R. Scitovski and H. Späth, Partial linearization of one class of the nonlinear total least squares problem by using the inverse model function, Computing, 62 (1999), pp. 163–178.
[16] D. Jukić and R. Scitovski, Existence results for special nonlinear total least squares problem, J. Math. Anal. Appl., 226 (1998), pp. 348–363.
[17] D. Jukić and R. Scitovski, The existence of optimal parameters of the generalized logistic function, Appl. Math. Comput., 77 (1996), pp. 281–294.
[18] R. Lewandowsky, Prognose und Informationssysteme und ihre Anwendungen, Walter de Gruyter, Berlin, New York, 1980.
[19] J. A. Nelder, The fitting of a generalization of the logistic curve, Biometrics, (1961), pp. 89–100.
[20] D. A. Ratkowsky, Handbook of Nonlinear Regression Models, Marcel Dekker, New York, 1990.
[21] H. Schwetlick and V. Tiller, Numerical methods for estimating parameters in nonlinear models with errors in the variables, Technometrics, 27 (1985), pp. 17–24.
[22] R. Scitovski and M. Meler, Solving parameter estimation problem in new diffusion models, Appl. Math. Comput., 127 (2002), pp. 45–63.
[23] R. Scitovski and D. Jukić, Total least-squares problem for exponential function, Inverse Problems, 12 (1996), pp. 341–349.
HEATING OF OIL WELL BY HOT WATER CIRCULATION Mladen Jurak Department of Mathematics University of Zagreb Zagreb, Croatia
[email protected]
Žarko Prnić INA-Naftaplin, d.o.o. Zagreb, Croatia
[email protected]
Abstract
When highly viscous oil is produced at low temperatures, large pressure drops significantly decrease the production rate. One possible solution to this problem is heating the oil well by hot water recirculation. We construct and analyze a mathematical model of oil-well heating composed of three linear parabolic PDEs coupled with one Volterra integral equation. We then construct a numerical method for the model and present some simulation results.
Keywords:
Oil well, integro-differential equation, Volterra integral equation.
Introduction

An oil well producing at low temperatures may experience large pressure drops due to the high viscosity of the oil and wax formation. One way to avoid these pressure drops is to heat the oil by hot water recirculation. The tubing is surrounded by two annuli for water circulation. Hot water is injected into the inner annulus and flows out of the system through the outer annulus. The main technical concern is to minimize the energy lost in the system while keeping the oil temperature sufficiently high. The configuration just described will be called counter flow exchange. If the hot water is injected into the outer annulus and leaves the system through the inner annulus, we speak of parallel heat flow exchange.
Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 235–244. © 2005 Springer. Printed in the Netherlands.
Figure 1. Counter flow heat exchange (schematic: hot water in, cold water out, tubing, blind column, production column, inner and outer annulus, formation, oil).
The outline of the paper is as follows. In the first section we present a simple one-dimensional mathematical model describing the heat exchange in the system. We present only the counter flow configuration, since the parallel flow configuration differs only in the signs of the water velocities. Solvability of the system of integro-differential equations describing the heat exchange is discussed in the second section. It is shown that a result of Artola [1] can be applied. In the final section we discuss a numerical method for approximating the solution and present some numerical results for the counter flow and parallel flow configurations. A similar problem was considered in the engineering literature in [6].
1. Mathematical model
Cross-sectional mean velocities of the oil and of the water in the inner and outer annulus will be denoted by v_o, v_i and v_e. They are assumed to be constant, and therefore the fluids have constant pressure drops. Furthermore, to simplify the model, we neglect friction and take the mass densities ρ_o (oil) and ρ_w (water) to be constant. Heat is transferred between the tubing, the inner and outer annulus and the formation according to Newton's law. With these simplifying assumptions, and taking the direction of the z axis vertically downwards, we obtain the following three parabolic equations (see [2] for example):

    a_o [ ∂T_o/∂t − v_o ∂T_o/∂z ] + b_o (T_o − T_i) = D_o ∂²T_o/∂z²,   (1)
    a_i [ ∂T_i/∂t + v_i ∂T_i/∂z ] + b_o (T_i − T_o) + b_e (T_i − T_e) = D_i ∂²T_i/∂z²,   (2)
    a_e [ ∂T_e/∂t − v_e ∂T_e/∂z ] + b_e (T_e − T_i) + b_f (T_e − T_f) = D_e ∂²T_e/∂z²,   (3)

for z ∈ (0, L) and t ∈ (0, t_max). The main variables are the temperatures of the oil, of the water in the inner annulus, of the water in the outer annulus and of the formation, denoted respectively by T_o, T_i, T_e and T_f. All coefficients are constant and have the following meaning: a_o = A_o ρ_o c_o, a_i = A_i ρ_w c_w, a_e = A_e ρ_w c_w, where A_o, A_i and A_e are cross-sectional areas and c_o, c_w are heat capacities. By b_o, b_e and b_f we denote the heat transfer coefficients from Newton's law, and by D_o, D_i and D_e the thermal conductivities multiplied by the cross-sectional areas. In counter flow exchange all three fluid velocities are positive. From mass conservation it follows that a_i v_i = a_e v_e.

Heat flow in the surrounding formation is assumed to be radial with respect to the tubing and to have a constant (geothermal) gradient in the vertical direction. We denote by T_z(z) the geothermal temperature and by T_s(r, z, t) the temperature in the soil. The formation temperature T_f is then given by T_f(z, t) = T_s(r_f, z, t), where r_f is the formation radius. The temperature T_s is the solution of the heat equation with initial temperature T_z, temperature at infinity equal to T_z, and prescribed heat flux q_f at r = r_f. On the other hand, q_f is given by Newton's law

    q_f = b_f (T_e − T_f).   (4)

Then, by applying Duhamel's principle, we can represent the formation temperature by the formula

    T_f(z, t) = T_z(z) + ∫_0^t p(t − τ) (d/dτ) q_f(z, τ) dτ,   (5)

where p(t) = P(r_f, t)/(2π k_f) (k_f is the thermal conductivity of the soil) and P(r, t) is the solution of the problem

    (ρ_f c_f / k_f) ∂P/∂t = (1/r) ∂/∂r ( r ∂P/∂r ),   r > r_f, t > 0,
    P(r, 0) = 0,   r > r_f,
    P(∞, t) = 0,   t > 0,
    −2π k_f ∂P/∂r |_{r=r_f} = 1   (6)

(ρ_f and c_f are the mass density and heat capacity of the soil, respectively). It can be shown, as in van Everdingen and Hurst [7], that p(t) = O(√t) and p′(t) = O(1/√t) as t → 0. Therefore p′ ∈ L¹_loc([0, ∞)) and we can integrate by parts in (5). Under the natural assumption that q_f = 0 at t = 0 (that is, T_e = T_z at t = 0), and using (4), we obtain a Volterra integral equation for q_f:

    b_f ( T_e(z, t) − T_z(z) ) = q_f(z, t) + b_f ∫_0^t p′(t − τ) q_f(z, τ) dτ.   (7)
This equation has a resolvent r ∈ L¹_loc([0, ∞)) and can be solved by the formula (see Gripenberg, Londen and Staffans [3])

    q_f = b_f [ T_e − T_z − r ∗ (T_e − T_z) ],   (8)

where we have introduced the convolution operator (r ∗ φ)(t) = ∫_0^t r(t − τ) φ(τ) dτ.
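For a bounded kernel, a second-kind Volterra equation of the form (7) can also be solved directly on a time grid by the composite trapezoidal rule (a hedged sketch, not the authors' scheme: in the well problem the kernel b_f p′ is weakly singular at 0 and the equation holds pointwise in z, so the smooth stand-in kernel K(t) = e^{−t} below is purely illustrative). With f ≡ 1 the exact solution of q + K ∗ q = f is q(t) = (1 + e^{−2t})/2, which the sketch reproduces:

```python
import math

def solve_volterra(kernel, f, t):
    """Solve q(t) + int_0^t kernel(t - s) q(s) ds = f(t) on a uniform grid t
    (second-kind Volterra equation, composite trapezoidal rule)."""
    h = t[1] - t[0]
    q = [f[0]]                                   # at t = 0 the integral vanishes
    for i in range(1, len(t)):
        conv = 0.5 * h * kernel(t[i] - t[0]) * q[0]
        for k in range(1, i):
            conv += h * kernel(t[i] - t[k]) * q[k]
        # the trapezoidal weight of the unknown q_i moves to the left-hand side
        q.append((f[i] - conv) / (1.0 + 0.5 * h * kernel(0.0)))
    return q

# demo: K(t) = e^{-t}, f = 1, exact solution q(t) = (1 + e^{-2t})/2
t = [i / 100.0 for i in range(101)]
q = solve_volterra(lambda s: math.exp(-s), [1.0] * len(t), t)
err = max(abs(q[i] - 0.5 * (1.0 + math.exp(-2.0 * t[i]))) for i in range(len(t)))
```

The representation (8) via the resolvent avoids this step analytically; the grid version is what a direct discretization of (7) would look like.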
By use of (8) and (4) we can eliminate the formation temperature from (3), which is then transformed into

    a_e [ ∂T_e/∂t − v_e ∂T_e/∂z ] + b_e (T_e − T_i) + b_f (T_e − r ∗ T_e) = D_e ∂²T_e/∂z² + F,   (9)

where F = b_f (T_z − r ∗ T_z) is a smooth known function. We see that equations (1), (2) and (9) represent a parabolic system perturbed by the operator M given by

    M u(z, t) = ∫_0^t r(t − τ) u(z, τ) dτ.   (10)
The problem is to solve the system (1), (2) and (9) with suitable boundary and initial conditions. We assume that the temperatures of the entering water at z = 0 and of the oil at z = L are given. At the bottom of the inner and outer annulus we have equality of the water temperatures and continuity of the total thermal flux. Therefore, we take

    ∂T_o/∂z (0, t) = 0,   ∂T_e/∂z (0, t) = 0,   T_o(L, t) = T_oL,   T_i(0, t) = T_i0,   (11)
    T_i(L, t) = T_e(L, t),   D_i ∂T_i/∂z (L, t) + D_e ∂T_e/∂z (L, t) = 0,   (12)

for all t > 0, where T_oL = T_z(L) and T_i0 are given. The initial conditions are

    T_o(z, 0) = T_e(z, 0) = T_z(z),   T_i(z, 0) = T_z1(z),   (13)

where the function T_z1 satisfies the compatibility conditions T_z1(0) = T_i0, T_z1(L) = T_z(L) and is close to the geothermal temperature T_z. All functions involved are assumed to be smooth.
2. Variational problem

We consider the variational formulation of the problem (1), (2) and (9) with boundary and initial conditions (11)–(13). Without loss of generality we may consider homogeneous boundary conditions T_oL = T_i0 = 0.
We introduce the Hilbert space

    V = { (φ_o, φ_i, φ_e) ∈ H¹(0, L)³ : φ_o(L) = 0, φ_i(0) = 0, φ_i(L) = φ_e(L) },

with the norm ‖·‖ inherited from H¹(0, L)³, and bilinear forms A, B and C over V × V defined as follows: for T = (T_o, T_i, T_e), Φ = (φ_o, φ_i, φ_e) we set

    A(T, Φ) = A_o(T_o, φ_o) + A_i(T_i, φ_i) + A_e(T_e, φ_e) + B(T, Φ),
    A_o(T_o, φ_o) = ∫_0^L ( D_o ∂T_o/∂z ∂φ_o/∂z − a_o v_o ∂T_o/∂z φ_o ) dz,
    A_i(T_i, φ_i) = ∫_0^L ( D_i ∂T_i/∂z ∂φ_i/∂z + a_i v_i ∂T_i/∂z φ_i ) dz,
    A_e(T_e, φ_e) = ∫_0^L ( D_e ∂T_e/∂z ∂φ_e/∂z − a_e v_e ∂T_e/∂z φ_e ) dz,

    B(T, Φ) = b_o ∫_0^L (T_o − T_i)(φ_o − φ_i) dz + b_e ∫_0^L (T_i − T_e)(φ_i − φ_e) dz + b_f ∫_0^L T_e φ_e dz,

    C(T, Φ) = −b_f ∫_0^L (r ∗ T_e) φ_e dz.

Duality between V′ and V will be given by the formula

    ⟨F, Φ⟩ = a_o ⟨F_o, φ_o⟩ + a_i ⟨F_i, φ_i⟩ + a_e ⟨F_e, φ_e⟩,

where F ∈ V′ is of the form F = (F_o, F_i, F_e), F_o, F_i, F_e ∈ (H¹(0, L))′, and the brackets on the right-hand side signify duality between (H¹(0, L))′ and H¹(0, L). We set H = L²(0, L)³, with the usual norm denoted by |·|, and by identifying H with its dual we have V ⊂ H ⊂ V′, with dense and continuous injections. Furthermore, by W(V, V′) we denote the space of all functions from L²(0, t_max; V) with time derivative in L²(0, t_max; V′). It is well known that W(V, V′) is continuously embedded in C([0, t_max]; H). With this notation we can reformulate the problem (1), (2), (9), (11)–(13) as the following variational problem: find T ∈ W(V, V′) such that T(0) = T⁰ ∈ H and, for a.e. t ∈ (0, t_max),

    ⟨T′, Φ⟩ + A(T, Φ) + C(T, Φ) = ⟨F, Φ⟩,   ∀Φ ∈ V.   (14)

The linear form on the right-hand side is given by

    ⟨F, Φ⟩ = ∫_0^L F φ_e dz
and it is obviously continuous. It is easy to see that A(·, ·) is a continuous bilinear form on V which satisfies

    A(T, T) + γ |T|² ≥ α ‖T‖²,   ∀T ∈ V,

for some constants α, γ > 0. The bilinear form C(·, ·) comes from the perturbation operator M. It is not difficult to see that for any function u : (0, t_max) → L²(0, L) it holds that

    ‖M u(t)‖²_{L²(0,L)} ≤ K(t) ∫_0^t |r(t − τ)| ‖u(τ)‖²_{L²(0,L)} dτ,   (15)

where K(t) = ∫_0^t |r(τ)| dτ. From here it follows that M is a linear and continuous operator from L^∞(0, t_max; L²(0, L)) to itself, and it has the following continuity property: if u_n, u ∈ L^∞(0, t_max; L²(0, L)) are such that

    u_n(t) → u(t) in L²(0, L) for a.e. t ∈ (0, t_max),

then M u_n(t) → M u(t) in L²(0, L) for a.e. t ∈ (0, t_max). Furthermore, it is easy to see that M is an operator of local type, as defined in Artola [1], and therefore we can apply Theorem 1 from [1] and conclude:

Theorem 1. The variational problem (14) has a unique solution T ∈ W(V, V′) for any T⁰ ∈ H and F ∈ L²(0, t_max; V′).
3. Numerical approximation
In this section we discuss the numerical approximation of the problem (1), (2), (9), (11)–(13) by the finite difference method. Instead of using equation (9), we find it more convenient to apply the finite difference method to equations (1), (2), (3) and to discretize equations (4) and (5) directly. We avoid the numerical resolution of problem (6) by using the approximation of Hasan and Kabir [4]:

    p(t) = p_n ( k_f t / (ρ_f c_f r_f²) ),

where

    p_n(s) = (2/√π) √s (1 − 0.3 √s)            for s ≤ 1.5,
    p_n(s) = (1/2) (0.80907 + ln s)(1 + 0.6/s)  for s > 1.5.
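As a minimal sketch (assuming the two-branch formula as reconstructed above), the dimensionless function p_n and the kernel p are direct to implement:

```python
import math

def p_n(s):
    """Dimensionless transient heat-loss function p_n(s) of the
    Hasan-Kabir approximation, as reconstructed from the text."""
    if s <= 1.5:
        return 2.0 / math.sqrt(math.pi) * math.sqrt(s) * (1.0 - 0.3 * math.sqrt(s))
    return 0.5 * (0.80907 + math.log(s)) * (1.0 + 0.6 / s)

def p(t, k_f, rho_f, c_f, r_f):
    """Kernel p(t) = p_n(k_f t / (rho_f c_f r_f^2)) appearing in (5)."""
    return p_n(k_f * t / (rho_f * c_f * r_f**2))
```

Note that the two branches do not match exactly at s = 1.5; the formula is itself an approximation to the solution of (6), so the small jump is inherited from [4].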
Furthermore, in our problem the constants D_o, D_i and D_e are very small, and it is natural to consider the hyperbolic system (D_o = D_i = D_e = 0) instead of the parabolic one. Due to limited space we will not enter here into a discussion of the existence theory for the hyperbolic system. We just note that any difference scheme adapted to the hyperbolic version of the system (1)–(3) will produce a certain amount of numerical dispersion that covers the thermal diffusion in equations (1)–(3), at least for reasonable mesh sizes. Therefore we chose to neglect thermal diffusion and, consequently, to drop the superfluous Neumann boundary conditions for the oil and for the water in the outer annulus. This will generally change the solution only in the corresponding boundary layers.

We apply an explicit finite difference scheme of first order, with the convective terms treated by upwinding. In all the experiments we have used a uniform grid in space and time. The spatial step h and the time step τ are related through the fixed positive number λ = τ/h. In the discretization of the integral equation (5) we use the composite trapezoidal rule, which gives the following procedure for the calculation of the formation temperature at t = nτ and z = ih:

    T^n_{F,i} = [ T_{Z,i} + Σ_{k=1}^{n−1} (T^k_{V,i} − T^k_{F,i})(P_{n+1−k} − P_{n−1−k}) + P_1 T^n_{V,i} ] / (1 + P_1).
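The recursion can be sketched at a single depth as follows (a hypothetical implementation: the grouping of the printed formula and the meaning of the symbols are interpreted here, with T_V taken as the outer-annulus temperature history at that depth, P_j as the kernel samples, and the whole bracket divided by 1 + P_1 so that the implicit q_f^n = T_V^n − T_F^n term is resolved):

```python
def formation_temperature(T_V, T_Z, P):
    """Formation temperature history at one depth from the trapezoidal
    discretization of (5), solving at each step
      T_F^n = [T_Z + sum_{k=1}^{n-1} (T_V^k - T_F^k)(P[n+1-k] - P[n-1-k])
                   + P[1] * T_V^n] / (1 + P[1]).
    T_V: temperature history driving the flux; P: kernel samples, P[0] = 0."""
    T_F = [T_Z]                                  # q_f = 0 at t = 0
    for n in range(1, len(T_V)):
        hist = sum((T_V[k] - T_F[k]) * (P[n + 1 - k] - P[n - 1 - k])
                   for k in range(1, n))
        T_F.append((T_Z + hist + P[1] * T_V[n]) / (1.0 + P[1]))
    return T_F

# demo: geothermal temperature 50, water held at 100, increasing kernel samples
P = [0.1 * j for j in range(12)]
T_F = formation_temperature([100.0] * 11, 50.0, P)
```

With this reading the scheme is consistent in the trivial cases: if T_V ≡ T_Z the formation stays at T_Z, and under steady heating T_F rises monotonically from T_Z toward T_V, as one expects physically.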
As a consequence of the convolution in formula (5), the solution on the next time level involves the solutions on all previous time levels. It can be shown that the described explicit scheme is TVB (total variation bounded) and L^∞-stable if the following CFL condition is satisfied:

    λ ≤ 1 / ( max{v_o, v_i, v_e} + C h ),

where C > 0 is a certain constant that can be calculated from the coefficients in (1)–(3).

We now proceed with some numerical results. To evaluate the merits of one flow arrangement over the other (counter flow versus parallel flow), some conditions must be equal. The interval of time during which the water is cooled is not equal to the interval of time during which the water is heated. The sum of these time intervals is called the circulating period, or cycle. The two arrangements can then be compared using the same circulating period. Results of our simulations after four cycles are presented in Figures 2 and 3. The counter flow heat exchange temperature calculations are shown in Figure 2. The tubing temperature is almost always less than the inner annulus temperature and greater than the outer annulus temperature. The parallel flow heat exchange temperature calculations are shown in Figure 3. The tubing temperature lies between the inner annulus temperature and the formation temperature. In this case the oil temperature is lower than either annulus temperature. Moreover, the formation temperature is higher than in the previous case.
The tubing temperature, as well as the outer annulus temperature, very soon reaches an almost constant level. The important thing to note with respect to the bottomhole fluid temperature is that this temperature changes continually with time. A steady-state condition is never attained. Hence the stabilization of both outlet temperatures does not mean that all of the temperatures in the circulating system are constant. Under the same conditions we found that in the parallel flow arrangement the temperature drop is smaller. Therefore, we may conclude that parallel flow seems to be better.
Figure 2. Temperature calculation for counter flow heat exchange (temperature vs. depth). Legend: inner annulus, outer annulus, tubing, earth.
Figure 3. Temperature calculation for parallel flow heat exchange (temperature vs. depth). Legend: inner annulus, outer annulus, tubing, earth.
To conclude, we point out that the main advantage of the linear model presented in this article is its simplicity. It is not difficult to implement in a computer code, and it gives a useful initial estimate of the heat exchange in the system. Yet important physical processes, such as dissipation due to friction and the variation of viscosities and mass densities with temperature, are not taken into account. They lead to a nonlinear model that will be considered in a forthcoming publication.
References

[1] Artola, M. (1969). Sur les perturbations des équations d'évolution, applications à des problèmes de retard, Ann. scient. Éc. Norm. Sup., 4 (2), 137–253.
[2] Carslaw, H. S. and Jaeger, J. C. (1950). Conduction of Heat in Solids, Oxford U. Press, London.
[3] Gripenberg, G., Londen, S-O., Staffans, O. (1990). Volterra Integral and Functional Equations, Cambridge: Cambridge University Press.
[4] Hasan, A. R. and Kabir, C. S. (1991). Heat Transfer During Two-Phase Flow in Wellbores: Part I, Formation Temperatures, paper SPE 22866 presented at the SPE Annual Technical Conference and Exhibition, Dallas, TX, Oct. 6-9.
[5] Ramey, H. J. Jr. (1962). Wellbore Heat Transmission, J. Pet. Tech., 427–435; Trans., AIME, 225.
[6] Raymond, L. R. (1969). Temperature Distribution in a Circulating Drilling Fluid, J. Pet. Tech., 98–106.
[7] van Everdingen, A. F. and Hurst, W. (1949). The Application of the Laplace Transformations to Flow Problems in Reservoirs, Trans. AIME, 186, 305–324.
GEOMETRIC INTERPOLATION OF DATA IN R3 Jernej Kozak Faculty of Mathematics and Physics and IMFM Jadranska 19, SI-1000 Ljubljana, Slovenia
[email protected]
Emil Žagar Faculty of Mathematics and Physics and IMFM Jadranska 19, SI-1000 Ljubljana, Slovenia
[email protected]
Abstract
In this paper the problem of geometric interpolation of space data is considered. A cubic polynomial parametric curve is supposed to interpolate five points in three-dimensional space. This is a case of a more general problem, i.e., the conjecture about the number of points in R^d which can be interpolated by a parametric polynomial curve of degree n. Necessary and sufficient conditions are found which assure the existence and uniqueness of the interpolating polynomial curve.
Keywords:
Parametric curve, geometric interpolation.
1. Introduction
Interpolation by parametric polynomial curves is an important approximation procedure in computer graphics, computer aided geometric design, computer aided modeling, mathematical modeling, etc. The word geometric refers to the fact that the interpolating curve is not forced to pass through the points at given parameter values, but is allowed to take its "minimum norm shape". It is well known, too, that this kind of interpolation can lead to interpolation schemes of high order accuracy. In [4], the authors conjectured that a parametric polynomial curve of degree n in R^d can, in general, interpolate

    n + 1 + ⌊ (n − 1)/(d − 1) ⌋

data points. Some results obtained by means of asymptotic analysis can be found in [2], [5] and [8], but there are only a few results on this conjecture which do not

Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 245–252. © 2005 Springer. Printed in the Netherlands.
involve asymptotic analysis, e.g., [3], [6] and [7]. In this paper the conjecture is proved to be true in the simplest nontrivial space case. More precisely, a cubic polynomial curve is found which interpolates five points in R³. It is clear that this cannot be done in general. Necessary and sufficient conditions on the data points which ensure the existence of a unique interpolating polynomial curve are provided. These conditions are purely geometric and do not require any asymptotic approach.

The problem described above can be formalized as follows. Suppose five points T_j ∈ R³, j = 0, 1, 2, 3, 4, are given. It is assumed that T_j ≠ T_{j+1}. Is there a unique regular cubic parametric polynomial curve B which satisfies the interpolating conditions

    B(t_j) = T_j,   j = 0, 1, 2, 3, 4,   (1)

where t_0 < t_1 < t_2 < t_3 < t_4 are unknown parameter values? Clearly, t_0 and t_4 can, e.g., be chosen as t_0 := 0 and t_4 := 1, since one can always apply a linear reparametrization. Thus the only unknown parameters left are t_1, t_2 and t_3, which have to lie in the domain

    D := { t := (t_1, t_2, t_3) ; 0 =: t_0 < t_1 < t_2 < t_3 < t_4 := 1 }.   (2)

Recall that B is a vector polynomial in R³, and its coefficients are also unknown. But once t_j, j = 1, 2, 3, are determined, any classical interpolation scheme on arbitrary four of the points trivially produces the coefficients of B. Thus the main problem is how to determine the parameters t_j. Since the interpolating polynomial curve is cubic, the problem is clearly nonlinear, and one can expect a system of nonlinear equations. One way to obtain it is described in the next section.
2. The system of nonlinear equations
A polynomial curve which satisfies (1) is cubic, and any fourth order divided difference maps it to zero; this gives the vector equations (3). Since the t_j are distinct, the equations (3) can also be written in the form (4). By (1), equation (4) rewrites to (5), i.e., a system of three nonlinear vector equations for t_1, t_2 and t_3. Furthermore, one of the terms in (5) can always be cancelled by subtracting (6) for any k. This leads to scalar equations for the unknown t_j. More precisely, if (6) is subtracted from (5) with k = 4, e.g., the equation (7) is obtained. Cross multiplication of (7) by (T_3 − T_4) and the scalar product with (T_2 − T_4) lead to (8). Since (a × b) · c = det(a, b, c), a simple manipulation of determinants reduces (8) to a scalar equation in t_1, t_2, t_3 whose data-dependent coefficient is

    α_1 := det(ΔT_0, ΔT_2, ΔT_3) / det(ΔT_1, ΔT_2, ΔT_3),   (10)

where ΔT_j := T_{j+1} − T_j. Two other nonlinear scalar equations can be derived in a similar way, with a different k applied in (7); their coefficients α_2 and α_3 are the analogous determinant ratios, and we finally obtain the system (9). The system (9) can be written shortly as

    F(t; α) := [ f_1(t_1, t_2, t_3; α_1), f_2(t_1, t_2, t_3; α_2), f_3(t_1, t_2, t_3; α_3) ]^T = 0,

where α := [α_1, α_2, α_3]^T. The main theorem of this paper is now the following.
Theorem 1. A cubic parametric curve through five points T_j ∈ R³, j = 0, 1, 2, 3, 4, is uniquely determined if and only if the components of α, defined by (10), are all positive.
3. The proof of the theorem
In this section the proof of Theorem 1 will be given. If the system (9) has a unique solution in D defined by (2), then a straightforward computation shows that the α_i are given by expressions that are positive on D. This implies that α_i, i = 1, 2, 3, must be positive, and the first part of the theorem is proved. The proof that positivity of the components of α is also a sufficient condition will be split into two main parts.

a) A unique solution of the system for a particular vector α* will be established.

b) The fact that a unique solution exists will be extended to all admissible vectors α with the aid of homotopy theory.

a) Consider first the particular system (9), i.e.,

    F(t; α*) = 0,   α* := [3, 1, 3]^T.   (11)

Its polynomial equivalent on D reads (12).
One possible approach to such polynomial systems is to use resultants as a tool that reduces the system to a single-variable problem of higher degree. Let Res(p, q, x) denote the resultant of the polynomials p and q with respect to the variable x. Computing resultants of the equations in (12) is straightforward and yields a polynomial q of degree six in t_1 alone. Since t_1 ≠ 0, 1, the only candidates for the first component of the solution t are the six roots of the polynomial q. The second equation in (12) is obviously linear in t_2, and it is easy to deduce that only the root t_1 = 1/4 produces the (unique) solution (13) of the system (11) in D.

b) In order to extend the result of a) to general α, consider the linear homotopy

    H(t, α; λ) := (1 − λ) F(t; α*) + λ F(t; α).   (14)

A particular form of the Brouwer degree of a differentiable map G reads

    degree(G, D) = Σ_{t ∈ D, G(t) = 0} sign( det( J(G)(t) ) ),   (15)

where J(G) is the Jacobian of G with respect to t. It gives some information about the number of zeros of G in D. In particular, if degree(G, D) = ±1, then G has at least one zero in D. Even more ([1, p. 52]), if (15) is applied to H, the Brouwer degree is invariant for all λ ∈ [0, 1], provided

    H(t, α; λ) ≠ 0,   t ∈ ∂D,   λ ∈ [0, 1].   (16)

It is also important to note that if J(G) in (15) is globally nonsingular, then the Brouwer degree gives the exact number of zeros of G in D. In our case the Jacobian

    J(H)(t) := ∂H(t, α; λ)/∂t = ∂F(t; α*)/∂t
really is globally nonsingular on D, since its determinant at any point t ∈ D simplifies to

    6 (t_0 − t_4)³ (t_4 − t_1)(t_4 − t_2)(t_4 − t_3) / [ (t_1 − t_0)(t_2 − t_0)(t_3 − t_0)(t_2 − t_1)²(t_3 − t_1)²(t_3 − t_2) ] < 0.

Since for λ = 0 the homotopy (14) becomes our particular system (11), for which a unique solution has been established, and degree(H(·, α*; 0), D) = −1, the Brouwer degree of H will be −1 for all λ ∈ [0, 1], provided (16) holds. Unfortunately, H is not differentiable on ∂D. Even more, it is not continuous (it is even unbounded) at some points of the boundary. Thus the following lemma is needed.
Lemma 1. There is a compact set D̃ ⊂ D which contains the particular solution (13), and H(t, α; λ) ≠ 0 for t ∈ ∂D̃, λ ∈ [0, 1] and α with positive components.

Proof: Let us first prove that H(t, α; λ) cannot have a zero arbitrarily close to ∂D. Note that the structure of the second component implies that H_2 is either unbounded or

    H_2(t, α; λ) = (1 − λ) α_2* + λ α_2 > 0,

since the components of α* and α are positive. Thus only the cases with coalescing parameters remain to be examined. Since t_0 = 0 < 1 = t_4, there are only two possibilities in the first case, t_0 = t_1 < t_2 and t_2 < t_3 = t_4. These imply the obvious contradictions

    H_1(t, α; λ) = (1 − λ) α_1* + λ α_1 > 0   and   H_3(t, α; λ) = (1 − λ) α_3* + λ α_3 > 0.

In the second case one has t_0 < t_1 = t_2 < t_3 < t_4, t_0 < t_1 < t_2 = t_3 < t_4 or t_0 < t_1 = t_2 = t_3 < t_4. But now H_1 or H_3 is unbounded. So all the zeros of H lie strictly in D. But D is an open set, so there is a compact set D̃ ⊂ D with a smooth boundary which contains all the zeros of H in its interior.

The proof of this lemma also completes the proof of Theorem 1.
4. Numerical example
The results of the previous sections will now be illustrated by a numerical example. Suppose that the interpolation points are taken from the helix

    f(η) = [ cos 3η, sin 3η, 3η ]^T.

Let T_j = f(η_j), where η_j = j/4, j = 0, 1, 2, 3, 4. It is a matter of straightforward computation to verify that the α_i, i = 1, 2, 3, defined by (10), are positive, so the conditions of Theorem 1 are met. The solution of the nonlinear system (9) can be obtained by applying, e.g., Newton's method or any of the continuation methods. This gives the solution

    t = [0.2313, 0.5000, 0.7687]^T.
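The positivity check itself is a few triple products. The sketch below computes α_1 exactly as defined in (10); the remaining components α_2 and α_3 are the analogous determinant ratios, which is an interpretation here, so only α_1 is verified:

```python
import math

# interpolation points on the helix f(eta) = [cos 3eta, sin 3eta, 3eta]^T
eta = [j / 4.0 for j in range(5)]
T = [(math.cos(3 * e), math.sin(3 * e), 3 * e) for e in eta]
dT = [tuple(T[j + 1][k] - T[j][k] for k in range(3)) for j in range(4)]

def det3(a, b, c):
    """det(a, b, c) as the scalar triple product a . (b x c)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

# alpha_1 from (10)
alpha1 = det3(dT[0], dT[2], dT[3]) / det3(dT[1], dT[2], dT[3])
print(alpha1)   # positive, as Theorem 1 requires
```

Once all three components are confirmed positive, Theorem 1 guarantees a unique solution of (9) in D, and Newton's method started from, say, the uniform guess (1/4, 1/2, 3/4) can be used to reach the parameter values quoted above.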
Now one can use any classical interpolation scheme on arbitrary four interpolation points T,?,which gives the interpolating polynomial curve
+
3.041 t3 - 4.823t2 - 0.207t 1 -0.216t3 - 3.384t2 3.741 t 1.172t3 - 1.759t2 3.586t
+ +
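The computation above can be cross-checked numerically. With the parameter values t = (0, 0.2313, 0.5, 0.7687, 1) fixed, each component of the cubic solves an overdetermined 5×4 Vandermonde least-squares system, and the residual is (numerically) zero precisely because the interior parameters solve the nonlinear system (9). A minimal sketch, assuming NumPy is available (variable names are ours, not the paper's):

```python
import numpy as np

# Interpolation points on the helix f(eta) = [cos 3eta, sin 3eta, 3eta]^T
eta = np.arange(5) / 4.0
P = np.column_stack([np.cos(3 * eta), np.sin(3 * eta), 3 * eta])  # 5 x 3

# Parameter values: endpoints fixed, interior ones from the nonlinear system
t = np.array([0.0, 0.2313, 0.5000, 0.7687, 1.0])

# 5 x 4 Vandermonde matrix for a cubic in t (columns t^3, t^2, t, 1)
V = np.vander(t, 4)

# Least-squares fit of each coordinate; a cubic has only 12 degrees of
# freedom, yet all 15 interpolation conditions are met up to rounding of t
C, *_ = np.linalg.lstsq(V, P, rcond=None)

residual = np.max(np.abs(V @ C - P))
print(C.T)        # rows: coefficients of the three components
print(residual)   # tiny: the data is interpolated, not merely approximated
```

Up to the rounding of the printed parameters, the recovered coefficients agree with the polynomial curve displayed above.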
Figure 1. The interpolated data points and the parametric polynomial curve.
APPLIED MATHEMATICS AND SCIENTIFIC COMPUTING
ONE-DIMENSIONAL FLOW OF A COMPRESSIBLE VISCOUS MICROPOLAR FLUID: STABILIZATION OF THE SOLUTION

Nermina Mujaković
Faculty of Philosophy, University of Rijeka
Omladinska 14, 51000 Rijeka, Croatia
[email protected]
Abstract: An initial-boundary value problem for one-dimensional flow of a compressible viscous heat-conducting micropolar fluid is considered. It is assumed that the fluid is thermodynamically perfect and polytropic. This problem has a unique strong solution on ]0, 1[ × ]0, T[ for each T > 0 ([7]). We also have some estimates of the solution independent of T ([8]). Using these results we prove stabilization of the solution.

Keywords: Micropolar fluid, stabilization, convergence.
Introduction

In this paper we consider nonstationary 1-D flow of a compressible and heat-conducting micropolar fluid which is, in the thermodynamical sense, perfect and polytropic ([6]). A corresponding initial-boundary value problem has a unique strong solution on ]0, 1[ × ]0, T[ for each T > 0 ([7]). For this solution we also have some a priori estimates independent of T ([8]). Using these results we prove that the solution converges to a stationary one in the space (H¹(]0, 1[))⁴ as T → ∞. Stabilization of the solution of the Cauchy problem for a classical fluid has been considered by Ya. I. Kanel ([3]) and by A. Matsumura and T. Nishida ([5]) under the assumption that the initial functions are sufficiently small. In our proof we follow some ideas of S. N. Antontsev, A. V. Kazhykhov and V. N. Monakhov ([1]), applied to the 1-D initial-boundary value problem for a classical fluid.
Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 253–262. © 2005 Springer. Printed in the Netherlands.
1. Statement of the problem and the main result
Let ρ, v, ω and θ denote, respectively, the mass density, velocity, microrotation velocity and temperature of the fluid in the Lagrangean description. The governing equations of the flow under consideration are as follows ([7]):

\[ \frac{\partial\rho}{\partial t} + \rho^2\frac{\partial v}{\partial x} = 0, \tag{1.1} \]
\[ \frac{\partial v}{\partial t} = \frac{\partial}{\partial x}\Bigl(\rho\frac{\partial v}{\partial x}\Bigr) - K\frac{\partial}{\partial x}(\rho\theta), \tag{1.2} \]
\[ \rho\frac{\partial\omega}{\partial t} = A\Bigl[\rho\frac{\partial}{\partial x}\Bigl(\rho\frac{\partial\omega}{\partial x}\Bigr) - \omega\Bigr], \tag{1.3} \]
\[ \rho\frac{\partial\theta}{\partial t} = -K\rho^2\theta\frac{\partial v}{\partial x} + \rho^2\Bigl(\frac{\partial v}{\partial x}\Bigr)^2 + \rho^2\Bigl(\frac{\partial\omega}{\partial x}\Bigr)^2 + \omega^2 + D\rho\frac{\partial}{\partial x}\Bigl(\rho\frac{\partial\theta}{\partial x}\Bigr) \tag{1.4} \]

in ]0, 1[ × R₊, where K, A and D are positive constants. We take the homogeneous boundary conditions

\[ v(0,t) = v(1,t) = 0, \tag{1.5} \]
\[ \omega(0,t) = \omega(1,t) = 0, \tag{1.6} \]
\[ \frac{\partial\theta}{\partial x}(0,t) = \frac{\partial\theta}{\partial x}(1,t) = 0 \tag{1.7} \]

for t > 0, and non-homogeneous initial conditions
\[ \rho(x,0) = \rho_0(x), \tag{1.8} \]
\[ v(x,0) = v_0(x), \tag{1.9} \]
\[ \omega(x,0) = \omega_0(x), \tag{1.10} \]
\[ \theta(x,0) = \theta_0(x) \tag{1.11} \]

for x ∈ Ω = ]0, 1[, where ρ₀, v₀, ω₀ and θ₀ are given functions. We assume that there exist m, M ∈ R₊ such that

\[ 0 < m \le \rho_0(x) \le M, \qquad m \le \theta_0(x) \le M, \qquad x \in \Omega. \tag{1.12} \]

Let

\[ \rho_0, \theta_0 \in H^1(\Omega), \qquad v_0, \omega_0 \in H_0^1(\Omega). \tag{1.13} \]
Then for each T > 0 the problem (1.1)–(1.11) has a unique strong solution ([7])

\[ (x,t) \mapsto (\rho, v, \omega, \theta)(x,t), \qquad (x,t) \in Q_T = \Omega \times ]0,T[, \tag{1.14} \]

with the properties:

\[ \rho \in L^\infty(0,T; H^1(\Omega)) \cap H^1(Q_T), \tag{1.15} \]
\[ v, \omega, \theta \in L^\infty(0,T; H^1(\Omega)) \cap H^1(Q_T) \cap L^2(0,T; H^2(\Omega)), \tag{1.16} \]
\[ \rho > 0, \quad \theta > 0 \quad \text{on } \bar Q_T. \tag{1.17} \]

From embedding and interpolation theorems ([4]) one can conclude that (1.15) and (1.16) imply:

\[ \rho \in L^\infty(0,T; C(\bar\Omega)) \cap C([0,T]; L^2(\Omega)), \tag{1.18} \]
\[ v, \omega, \theta \in L^2(0,T; C^1(\bar\Omega)) \cap C([0,T]; H^1(\Omega)), \tag{1.19} \]
\[ \rho, v, \omega, \theta \in C(\bar Q_T). \tag{1.20} \]
We also obtain that in the domain Q = Ω × ]0, ∞[ the problem (1.1)–(1.11) has the solution (ρ, v, ω, θ) ([8]) with the properties:

\[ \inf_Q \rho > 0, \tag{1.21} \]
\[ \rho \in L^\infty(0,\infty; H^1(\Omega)), \tag{1.22} \]
\[ \partial\rho/\partial x \in L^2(0,\infty; L^2(\Omega)), \tag{1.23} \]
\[ \partial\rho/\partial t \in L^\infty(0,\infty; L^2(\Omega)) \cap L^2(Q), \tag{1.24} \]
\[ v, \omega \in L^\infty(0,\infty; H^1(\Omega)) \cap L^2(0,\infty; H^2(\Omega)) \cap H^1(Q), \tag{1.25} \]
\[ \theta \in L^\infty(0,\infty; H^1(\Omega)), \tag{1.26} \]
\[ \partial\theta/\partial x \in L^2(0,\infty; H^1(\Omega)), \tag{1.27} \]
\[ \partial\theta/\partial t \in L^2(Q), \tag{1.28} \]
\[ \theta^{1/2}\,\partial\rho/\partial x \in L^2(0,\infty; L^2(\Omega)). \tag{1.29} \]

From embedding one can conclude that (1.22), (1.25), (1.26) and (1.27) imply:

\[ \rho \in L^\infty(0,\infty; C(\bar\Omega)), \tag{1.30} \]
\[ v, \omega \in L^\infty(0,\infty; C(\bar\Omega)) \cap L^2(0,\infty; C^1(\bar\Omega)), \tag{1.31} \]
\[ \theta \in L^\infty(0,\infty; C(\bar\Omega)), \tag{1.32} \]
\[ \partial\theta/\partial x \in L^2(0,\infty; C(\bar\Omega)). \tag{1.33} \]
Let

\[ \alpha = \int_0^1 \frac{dx}{\rho_0(x)}, \tag{1.34} \]
\[ E_1 = \frac{1}{2}\|v_0\|^2 + \frac{1}{2A}\|\omega_0\|^2 + \|\theta_0\|_{L^1(\Omega)}, \tag{1.35} \]

where ‖·‖ = ‖·‖_{L²(Ω)}. It is easy to see that a stationary solution x ↦ (ρ, v, ω, θ)(x), x ∈ Ω, of the system (1.1)–(1.7) satisfying the conditions

\[ \int_0^1 \frac{dx}{\rho(x)} = \alpha, \qquad \frac{1}{2}\|v\|^2 + \frac{1}{2A}\|\omega\|^2 + \|\theta\|_{L^1(\Omega)} = E_1 \tag{1.36} \]

is (ρ, v, ω, θ) ≡ (α⁻¹, 0, 0, E₁). Our purpose is to prove the following result.

Theorem 1.1. The above solution converges to the stationary solution (α⁻¹, 0, 0, E₁) in the space (H¹(Ω))⁴ as t → ∞.
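The limit state in Theorem 1.1 is computable directly from the initial data via (1.34)–(1.35). A small quadrature sketch, assuming NumPy; the initial functions below are illustrative choices satisfying (1.12)–(1.13), not data taken from the paper:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]
A = 1.0  # the micropolar constant from (1.3); illustrative value

def trapezoid(f):
    # composite trapezoid rule on the uniform grid x
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * dx))

# Illustrative initial data: bounded away from 0 and infinity, cf. (1.12)
rho0   = 1.0 + 0.5 * np.sin(np.pi * x) ** 2
v0     = x * (1.0 - x)
omega0 = np.sin(np.pi * x)
theta0 = 2.0 + np.cos(np.pi * x)

alpha = trapezoid(1.0 / rho0)            # conserved quantity, cf. (1.34), (2.1)
E1 = (0.5 * trapezoid(v0 ** 2)
      + 0.5 / A * trapezoid(omega0 ** 2)
      + trapezoid(np.abs(theta0)))       # conserved energy, cf. (1.35), (2.3)

# The solution stabilizes to (1/alpha, 0, 0, E1) as t -> infinity
print(1.0 / alpha, E1)
```

For these particular initial functions the integrals are known in closed form, which makes the sketch easy to check against hand computation.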
2. Some properties of the nonstationary solution

In what follows we denote by C > 0 a generic constant, not depending on t > 0 and possibly having different values at different places. We use some considerations from [1] and [7]; in these cases we omit the proofs, making reference to the corresponding pages of [1] and [7].

Lemma 1. ([1], pp. 43–44) For t > 0 it holds

\[ \int_0^1 \frac{dx}{\rho(x,t)} = \alpha. \tag{2.1} \]

There exists a bounded function t ↦ r(t), 0 ≤ r(t) ≤ 1, such that

\[ \rho(r(t), t) = \alpha^{-1}. \tag{2.2} \]

Lemma 2. ([7], p. 201) For t > 0 it holds

\[ \frac{1}{2}\|v(\cdot,t)\|^2 + \frac{1}{2A}\|\omega(\cdot,t)\|^2 + \|\theta(\cdot,t)\|_{L^1(\Omega)} = E_1. \tag{2.3} \]
Lemma 3. There exists a constant C ∈ R₊ such that the following inequalities hold:

\[ \int_0^\infty \Bigl|\frac{d}{dt}\|\partial v/\partial x\|^2\Bigr|\,dt \le C, \tag{2.4} \]
\[ \int_0^\infty \Bigl|\frac{d}{dt}\|\partial\omega/\partial x\|^2\Bigr|\,dt \le C, \tag{2.5} \]
\[ \int_0^\infty \Bigl|\frac{d}{dt}\|\partial\theta/\partial x\|^2\Bigr|\,dt \le C, \tag{2.6} \]
\[ \int_0^\infty \Bigl|\frac{d}{dt}\|\partial\ln\rho/\partial x\|^2\Bigr|\,dt \le C. \tag{2.7} \]
Proof. Multiplying the equation (1.2) by ∂²v/∂x² and integrating over ]0, 1[, we obtain

\[ \frac{1}{2}\frac{d}{dt}\|\partial v/\partial x\|^2 = -\int_0^1 \rho\Bigl(\frac{\partial^2 v}{\partial x^2}\Bigr)^2 dx + K\int_0^1 \theta\frac{\partial\rho}{\partial x}\frac{\partial^2 v}{\partial x^2}\,dx + K\int_0^1 \rho\frac{\partial\theta}{\partial x}\frac{\partial^2 v}{\partial x^2}\,dx - \int_0^1 \frac{\partial\rho}{\partial x}\frac{\partial v}{\partial x}\frac{\partial^2 v}{\partial x^2}\,dx. \tag{2.8} \]

Taking into account (1.30), the inequality

\[ \Bigl(\frac{\partial v}{\partial x}\Bigr)^2 \le 2\,\|\partial v/\partial x\|\,\|\partial^2 v/\partial x^2\| \le 2\,\|\partial^2 v/\partial x^2\|^2, \tag{2.9} \]

the property ([1], p. 95)

\[ \theta^2(x,t) \le C\bigl[\theta(x,t) + \|\partial\theta/\partial x(\cdot,t)\|^2\bigr], \tag{2.10} \]

and applying the Young inequality with a parameter ε > 0 on the right-hand side of (2.8), we find the following estimates:

\[ K\Bigl|\int_0^1 \theta\frac{\partial\rho}{\partial x}\frac{\partial^2 v}{\partial x^2}\,dx\Bigr| \le C\Bigl\|\theta\frac{\partial\rho}{\partial x}\Bigr\|\,\|\partial^2 v/\partial x^2\| \le \varepsilon\,\|\partial^2 v/\partial x^2\|^2 + C\Bigl(\Bigl\|\theta^{1/2}\frac{\partial\rho}{\partial x}\Bigr\|^2 + \|\partial\theta/\partial x\|^2\,\|\partial\rho/\partial x\|^2\Bigr); \tag{2.11} \]

\[ K\Bigl|\int_0^1 \rho\frac{\partial\theta}{\partial x}\frac{\partial^2 v}{\partial x^2}\,dx\Bigr| \le C\,\|\partial\theta/\partial x\|\,\|\partial^2 v/\partial x^2\| \le \varepsilon\,\|\partial^2 v/\partial x^2\|^2 + C_\varepsilon\,\|\partial\theta/\partial x\|^2; \tag{2.12} \]

\[ \Bigl|\int_0^1 \frac{\partial\rho}{\partial x}\frac{\partial v}{\partial x}\frac{\partial^2 v}{\partial x^2}\,dx\Bigr| \le \max_{x\in\bar\Omega}\Bigl|\frac{\partial v}{\partial x}\Bigr|\,\|\partial\rho/\partial x\|\,\|\partial^2 v/\partial x^2\| \le C\,\|\partial\rho/\partial x\|\,\|\partial v/\partial x\|^{1/2}\,\|\partial^2 v/\partial x^2\|^{3/2} \le \varepsilon\,\|\partial^2 v/\partial x^2\|^2 + C_\varepsilon\,\|\partial\rho/\partial x\|^4\,\|\partial v/\partial x\|^2. \tag{2.13} \]

Inserting (2.11)–(2.13) in (2.8), we obtain

\[ \Bigl|\frac{d}{dt}\|\partial v/\partial x\|^2\Bigr| \le C\Bigl(\|\partial^2 v/\partial x^2\|^2 + \|\partial\rho/\partial x\|^4\,\|\partial v/\partial x\|^2 + \|\partial\theta/\partial x\|^2 + \Bigl\|\theta^{1/2}\frac{\partial\rho}{\partial x}\Bigr\|^2 + \|\partial\theta/\partial x\|^2\,\|\partial\rho/\partial x\|^2\Bigr). \tag{2.14} \]
Integrating (2.14) over ]0, ∞[ and using (1.25), (1.29), (1.22) and (1.27), we get (2.4).

Multiplying the equations (1.3) and (1.4) respectively by A⁻¹ρ⁻¹ ∂²ω/∂x² and ρ⁻¹ ∂²θ/∂x², integrating over ]0, 1[ and making use of (1.6) and (1.7), we obtain

\[ \frac{1}{2A}\frac{d}{dt}\|\partial\omega/\partial x\|^2 = -\int_0^1 \rho\Bigl(\frac{\partial^2\omega}{\partial x^2}\Bigr)^2 dx + \int_0^1 \frac{\omega}{\rho}\frac{\partial^2\omega}{\partial x^2}\,dx - \int_0^1 \frac{\partial\rho}{\partial x}\frac{\partial\omega}{\partial x}\frac{\partial^2\omega}{\partial x^2}\,dx, \tag{2.15} \]

\[ \frac{1}{2}\frac{d}{dt}\|\partial\theta/\partial x\|^2 = -D\int_0^1 \rho\Bigl(\frac{\partial^2\theta}{\partial x^2}\Bigr)^2 dx + K\int_0^1 \rho\theta\frac{\partial v}{\partial x}\frac{\partial^2\theta}{\partial x^2}\,dx - \int_0^1 \rho\Bigl(\frac{\partial v}{\partial x}\Bigr)^2\frac{\partial^2\theta}{\partial x^2}\,dx - \int_0^1 \rho\Bigl(\frac{\partial\omega}{\partial x}\Bigr)^2\frac{\partial^2\theta}{\partial x^2}\,dx - \int_0^1 \frac{\omega^2}{\rho}\frac{\partial^2\theta}{\partial x^2}\,dx - D\int_0^1 \frac{\partial\rho}{\partial x}\frac{\partial\theta}{\partial x}\frac{\partial^2\theta}{\partial x^2}\,dx. \tag{2.16} \]

Taking into account (1.21), (1.22), (2.9) and the inequalities

\[ \omega^2 \le 2\,\|\omega\|\,\|\partial\omega/\partial x\|, \qquad \Bigl(\frac{\partial\omega}{\partial x}\Bigr)^2 \le 2\,\|\partial\omega/\partial x\|\,\|\partial^2\omega/\partial x^2\|, \tag{2.17} \]

\[ \Bigl(\frac{\partial\theta}{\partial x}\Bigr)^2 \le 2\,\|\partial\theta/\partial x\|\,\|\partial^2\theta/\partial x^2\|, \tag{2.18} \]

and applying again the Young inequality on the right-hand sides of (2.15) and (2.16), we find that

\[ \Bigl|\frac{d}{dt}\|\partial\omega/\partial x\|^2\Bigr| \le C\Bigl(\|\partial^2\omega/\partial x^2\|^2 + \|\omega\|^2 + \max_{x\in\bar\Omega}\Bigl|\frac{\partial\omega}{\partial x}\Bigr|\,\|\partial\rho/\partial x\|\,\|\partial^2\omega/\partial x^2\|\Bigr) \le C\Bigl(\|\partial^2\omega/\partial x^2\|^2 + \|\omega\|^2 + \|\partial\rho/\partial x\|^4\,\|\partial\omega/\partial x\|^2\Bigr), \tag{2.19} \]

\[ \Bigl|\frac{d}{dt}\|\partial\theta/\partial x\|^2\Bigr| \le C\Bigl(\|\partial^2\theta/\partial x^2\|^2 + \|\partial^2 v/\partial x^2\|^2\,\|\theta\|^2 + \|\partial^2 v/\partial x^2\|^2\,\|\partial v/\partial x\|^2 + \|\partial^2\omega/\partial x^2\|^2\,\|\partial\omega/\partial x\|^2 + \|\omega\|^2\,\|\partial\omega/\partial x\|^2 + \|\partial\rho/\partial x\|^4\,\|\partial\theta/\partial x\|^2\Bigr). \tag{2.20} \]

Integrating (2.19) and (2.20) over ]0, ∞[ and using (1.25), (1.22), (1.27) and (1.26), we immediately obtain (2.5) and (2.6).

Finally, from the equations (1.1) and (1.2) it follows that

\[ \frac{\partial}{\partial t}\Bigl(\frac{\partial\ln\rho}{\partial x}\Bigr) = -K\theta\frac{\partial\rho}{\partial x} - K\rho\frac{\partial\theta}{\partial x} - \frac{\partial v}{\partial t}. \tag{2.21} \]

Multiplying (2.21) by ∂ln ρ/∂x and integrating over ]0, 1[, we obtain

\[ \frac{1}{2}\frac{d}{dt}\|\partial\ln\rho/\partial x\|^2 = -K\int_0^1 \rho\theta\Bigl(\frac{\partial\ln\rho}{\partial x}\Bigr)^2 dx - K\int_0^1 \rho\frac{\partial\theta}{\partial x}\frac{\partial\ln\rho}{\partial x}\,dx - \int_0^1 \frac{\partial v}{\partial t}\frac{\partial\ln\rho}{\partial x}\,dx. \tag{2.22} \]

With the help of (1.30), in the same way as before we get

\[ \Bigl|\frac{d}{dt}\|\partial\ln\rho/\partial x\|^2\Bigr| \le C\Bigl(\Bigl\|\theta^{1/2}\frac{\partial\ln\rho}{\partial x}\Bigr\|^2 + \|\partial\theta/\partial x\|^2 + \|\partial\ln\rho/\partial x\|^2 + \|\partial v/\partial t\|^2\Bigr). \tag{2.23} \]

Integrating over ]0, ∞[ and taking into account (1.21), (1.29), (1.27), (1.23) and (1.25), we get the estimate (2.7).
3. Proof of Theorem 1.1

Lemma 1. For each ε > 0 there exists t₀ ∈ R₊ such that

\[ \|\partial v/\partial x(\cdot,t_0)\|^2 \le \varepsilon, \qquad \|\partial\omega/\partial x(\cdot,t_0)\|^2 \le \varepsilon, \tag{3.1} \]

and it holds

\[ \lim_{t\to\infty}\|v(\cdot,t)\|_{H^1(\Omega)}^2 = 0, \qquad \lim_{t\to\infty}\|\omega(\cdot,t)\|_{H^1(\Omega)}^2 = 0. \tag{3.2} \]

Proof. With the help of (2.4), (2.5) and (1.25) we conclude that for each ε > 0 there exists t₀ > 0 such that for each t > t₀ it holds

\[ \int_{t_0}^t \Bigl|\frac{d}{d\tau}\|\partial v/\partial x\|^2\Bigr|\,d\tau \le \varepsilon, \qquad \int_{t_0}^t \Bigl|\frac{d}{d\tau}\|\partial\omega/\partial x\|^2\Bigr|\,d\tau \le \varepsilon, \tag{3.3} \]

\[ \int_{t_0}^t \|\partial v/\partial x\|^2\,d\tau \le \varepsilon, \qquad \int_{t_0}^t \|\partial\omega/\partial x\|^2\,d\tau \le \varepsilon. \tag{3.4} \]
For τ > t₀ it follows from (3.3) that

\[ -\varepsilon \le \|\partial v/\partial x(\cdot,t_0)\|^2 - \|\partial v/\partial x(\cdot,\tau)\|^2 \le \varepsilon, \tag{3.5} \]
\[ -\varepsilon \le \|\partial\omega/\partial x(\cdot,t_0)\|^2 - \|\partial\omega/\partial x(\cdot,\tau)\|^2 \le \varepsilon. \tag{3.6} \]

Integrating (3.5) and (3.6) over ]t₀, t[ (t > t₀) and using (3.4), we obtain

\[ \|\partial v/\partial x(\cdot,t_0)\|^2 \le \varepsilon + \frac{\varepsilon}{t - t_0}, \qquad \|\partial\omega/\partial x(\cdot,t_0)\|^2 \le \varepsilon + \frac{\varepsilon}{t - t_0}, \tag{3.7} \]

and (3.1) is valid. With the help of (3.1), from (3.5) and (3.6) we easily conclude that for t > t₀ it holds

\[ \|\partial v/\partial x(\cdot,t)\|^2 \le 2\varepsilon, \qquad \|\partial\omega/\partial x(\cdot,t)\|^2 \le 2\varepsilon. \tag{3.8} \]

For the functions v and ω it holds

\[ \|v(\cdot,t)\| \le 2\,\|\partial v/\partial x(\cdot,t)\|, \qquad \|\omega(\cdot,t)\| \le 2\,\|\partial\omega/\partial x(\cdot,t)\|, \qquad t > 0, \tag{3.9} \]

from which we conclude that

\[ \|v(\cdot,t)\|^2 \le C\varepsilon, \qquad \|\omega(\cdot,t)\|^2 \le C\varepsilon \qquad \text{for } t > t_0. \tag{3.10} \]

The convergence (3.2) now follows from (3.8) and (3.10).

Lemma 2. For each ε > 0 there exists t₀ ∈ R₊ such that

\[ \|\partial\ln\rho/\partial x(\cdot,t_0)\|^2 \le \varepsilon, \qquad \|\partial\theta/\partial x(\cdot,t_0)\|^2 \le \varepsilon, \tag{3.11} \]

and it holds

\[ \lim_{t\to\infty}\|\rho(\cdot,t) - \alpha^{-1}\|_{H^1(\Omega)} = 0, \tag{3.12} \]
\[ \lim_{t\to\infty}\|\theta(\cdot,t) - E_1\|_{H^1(\Omega)} = 0. \tag{3.13} \]

Proof. In the same way as in Lemma 3.1, with the help of (2.7), (1.21), (1.23), (2.6) and (1.27), we get (3.11) and the following estimates:

\[ \|\partial\ln\rho/\partial x(\cdot,t)\|^2 \le 2\varepsilon, \qquad t > t_0, \tag{3.14} \]
\[ \|\partial\theta/\partial x(\cdot,t)\|^2 \le 2\varepsilon, \qquad t > t_0. \tag{3.15} \]
Using (1.30), from (3.14) we conclude that (as t → ∞)

\[ \|\partial\rho/\partial x(\cdot,t)\|^2 \to 0. \tag{3.16} \]

With the help of the result of Lemma 2.1 we have

\[ |\rho(x,t) - \alpha^{-1}| = |\rho(x,t) - \rho(r(t),t)| = \Bigl|\int_{r(t)}^x \frac{\partial\rho}{\partial x}(\xi,t)\,d\xi\Bigr| \le \|\partial\rho/\partial x(\cdot,t)\|, \tag{3.17} \]

and because of (3.16) we obtain

\[ \lim_{t\to\infty}\|\rho(\cdot,t) - \alpha^{-1}\|^2 = 0. \tag{3.18} \]

Using (2.3) we find that there exists x₁(t) ∈ [0, 1] such that

\[ \theta(x_1(t),t) = \int_0^1 \theta(x,t)\,dx = E_1 - \frac{1}{2}\|v(\cdot,t)\|^2 - \frac{1}{2A}\|\omega(\cdot,t)\|^2. \tag{3.19} \]

It holds

\[ \Bigl|\theta(x,t) - E_1 + \frac{1}{2}\|v(\cdot,t)\|^2 + \frac{1}{2A}\|\omega(\cdot,t)\|^2\Bigr| = |\theta(x,t) - \theta(x_1(t),t)| \le \|\partial\theta/\partial x(\cdot,t)\|, \tag{3.20} \]

and from (3.20) we get

\[ |\theta(x,t) - E_1| - \Bigl(\frac{1}{2}\|v(\cdot,t)\|^2 + \frac{1}{2A}\|\omega(\cdot,t)\|^2\Bigr) \le \|\partial\theta/\partial x(\cdot,t)\|, \qquad t > 0. \tag{3.21} \]

From (3.21), (3.15) and (3.2) we conclude that

\[ \lim_{t\to\infty}\|\theta(\cdot,t) - E_1\|^2 \le C\lim_{t\to\infty}\bigl(\|\partial\theta/\partial x(\cdot,t)\|^2 + \|v(\cdot,t)\|^4 + \|\omega(\cdot,t)\|^4\bigr) = 0. \tag{3.22} \]

With the help of (3.16), (3.18), (3.15) and (3.22) one easily verifies (3.12) and (3.13).

Theorem 1.1 is an immediate consequence of the above lemmas.
References
[1] Antontsev, S. N., Kazhykhov, A. V., Monakhov, V. N. (1990). Boundary Value Problems in Mechanics of Nonhomogeneous Fluids, North-Holland.
[2] Brezis, H. (1983). Analyse fonctionnelle, Masson, Paris.
[3] Kanel, Ya. I. (1979). On a Cauchy problem for gas dynamics equations with viscosity, Sibirsk. Mat. Zh. 20, no. 2, 293–306. (Russian)
[4] Lions, J. L., Magenes, E. (1972). Non-Homogeneous Boundary Value Problems and Applications, Vol. 1, Springer-Verlag, Berlin.
[5] Matsumura, A., Nishida, T. (1980). The initial value problem for the equations of motion of viscous and heat-conductive gases, J. Math. Kyoto Univ. 20-1, 67–104.
[6] Mujaković, N. (1998). One-dimensional flow of a compressible viscous micropolar fluid: a local existence theorem, Glasnik Matematički 33(53), 71–91.
[7] Mujaković, N. (1998). One-dimensional flow of a compressible viscous micropolar fluid: a global existence theorem, Glasnik Matematički 33(53), 199–208.
[8] Mujaković, N. (to appear). One-dimensional flow of a compressible viscous micropolar fluid: a priori estimations of the solution independent of time.
ON PARAMETER CLASSES OF SOLUTIONS FOR SYSTEM OF QUASILINEAR DIFFERENTIAL EQUATIONS

Alma Omerspahić
Faculty of Mechanical Engineering in Sarajevo, University of Sarajevo
Vilsonovo šetalište 9, 71000 Sarajevo, Bosnia and Herzegovina
[email protected]

Božo Vrdoljak
Faculty of Civil Engineering, University of Split
Matice hrvatske 15, 21000 Split, Croatia
[email protected]
Abstract: The paper presents some results on the existence and behaviour of some parameter classes of solutions for a system of quasilinear differential equations. The behaviour of integral curves in neighbourhoods of an arbitrary curve is considered. The obtained results contain an answer to the question of stability as well as approximation of the solutions whose existence is established. The errors of the approximation are defined by functions that can be made sufficiently small. The theory of qualitative analysis of differential equations and the topological retraction method are used.

Keywords: quasilinear differential equation, parameter classes of solutions, approximation of solutions.
1. Introduction

Let us consider a system of quasilinear differential equations

\[ \dot{x} = A(x,t)\,x + F(x,t), \tag{1} \]

where x(t) = (x₁(t), ..., xₙ(t))^τ, n ≥ 2, t ∈ I = ⟨a, ∞⟩, a ∈ ℝ, D ⊂ ℝⁿ is an open set, Ω = D × I, A(x,t) = (a_{ij}(x,t))_{n×n} is the matrix function with elements a_{ij} ∈ C(Ω, ℝ), and F(x,t) = (f₁(x,t), ..., fₙ(x,t))^τ is the vector function with elements f_i ∈ C(Ω, ℝ). Moreover, A(x,t) and F(x,t) satisfy sufficient conditions for the existence and uniqueness of the solution of any Cauchy problem for system (1) in Ω.

Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 263–272. © 2005 Springer. Printed in the Netherlands.

Let

\[ \Gamma = \{(x,t) \in \Omega : x = \varphi(t),\ t \in I\}, \tag{2} \]

where φ(t) = (φ₁(t), ..., φₙ(t)), φ_i(t) ∈ C¹(I, ℝ), is a certain curve in Ω. We shall consider the behaviour of the integral curves (x(t), t), t ∈ I, of the system (1) with respect to the set

\[ \omega = \{(x,t) \in \Omega : |x_i - \varphi_i(t)| < r_i(t),\ i = 1, ..., n\}, \tag{3} \]

where r_i ∈ C¹(I, ℝ₊), i = 1, ..., n. The boundary surfaces of the set ω with respect to the set Ω are

\[ W_i^k = \{(x,t) \in \mathrm{Cl}\,\omega \cap \Omega : B_i^k(x,t) := (-1)^k (x_i - \varphi_i(t)) - r_i(t) = 0\}, \tag{4} \]

k = 1, 2, i = 1, ..., n. Let us denote by T the tangent vector field to an integral curve (x(t), t), t ∈ I, of (1). The vectors ∇B_i^k are the external normals on the surfaces W_i^k. We have

\[ T = \Bigl(\sum_{j=1}^n a_{1j}x_j + f_1,\ \ldots,\ \sum_{j=1}^n a_{ij}x_j + f_i,\ \ldots,\ \sum_{j=1}^n a_{nj}x_j + f_n,\ 1\Bigr), \]
\[ \nabla B_i^k = \bigl((-1)^k\delta_{1i},\ \ldots,\ (-1)^k\delta_{ni},\ -(-1)^k\varphi_i'(t) - r_i'(t)\bigr), \]

where δ_{mi} is the Kronecker delta. Considering the sign of the scalar products P_i^k(x,t) = (∇B_i^k(x,t), T(x,t)) on W_i^k, k = 1, 2, i = 1, ..., n, we shall establish the behaviour of the integral curves of (1) with respect to the set ω. The results of this paper are based on Lemmas 1 and 2 in [7] and on the following lemma. In what follows, (n₁, ..., nₙ) denotes a permutation of the indices (1, ..., n).

Lemma 1. If, for the system (1), the scalar products satisfy

\[ P_i^k = (\nabla B_i^k, T) < 0 \quad\text{on } W_i^k,\ k = 1, 2,\ i = n_1, ..., n_p, \tag{5} \]
and
\[ P_i^k = (\nabla B_i^k, T) > 0 \quad\text{on } W_i^k,\ k = 1, 2,\ i = n_{p+1}, ..., n_n, \tag{6} \]

where p ∈ {0, 1, ..., n}, then the system (1) has a p-parameter class of solutions whose graphs belong to the set ω for all t ∈ I.
Notice that, according to this lemma, the case p = 0 means that the system (1) has at least one solution belonging to the set ω for all t ∈ I. The conditions (5) and (6) imply that the set U = ⋃_{i=n₁}^{n_p} (W_i¹ ∪ W_i²) has no point of exit and that V = ⋃_{i=n_{p+1}}^{n_n} (W_i¹ ∪ W_i²) is the set of points of strict exit from the set ω with respect to the set Ω for the integral curves of system (1), which, according to the retraction method, makes the statement of the lemma valid (see [2–7]). In the case p = n this lemma gives the statement of Lemma 1, and for p = 0 the statement of Lemma 2, in [7].
2. The main results

Let

\[ X_i(x,t) := \sum_{j=1}^n a_{ij}(x,t)\,x_j + f_i(x,t) - \varphi_i'(t), \]
\[ \Phi_i(x,t) := \sum_{j=1}^n a_{ij}(x,t)\,\varphi_j(t) + f_i(x,t) - \varphi_i'(t); \qquad i = 1, ..., n. \]

Theorem 1. If, on W_i^k, k = 1, 2,

\[ |X_i(x,t)| < r_i'(t), \qquad i = n_1, ..., n_p, \tag{7} \]
and
\[ |X_i(x,t)| < -r_i'(t), \qquad i = n_{p+1}, ..., n_n, \tag{8} \]

(p ∈ {0, 1, ..., n}), then the system (1) has a p-parameter class of solutions which belongs to the set ω.

Proof. Let us consider the behaviour of the integral curves of system (1) with respect to the set ω defined by (3). For P_i^k(x,t) on W_i^k we have

\[ P_i^k = (-1)^k\Bigl(\sum_{j=1}^n a_{ij}x_j + f_i\Bigr) - (-1)^k\varphi_i' - r_i' = (-1)^k X_i - r_i'. \]

According to (7) and (8), we have

\[ P_i^k \le |X_i| - r_i' < 0 \quad\text{on } W_i^k,\ k = 1, 2,\ i = n_1, ..., n_p, \]
\[ P_i^k \ge -|X_i| - r_i' > 0 \quad\text{on } W_i^k,\ k = 1, 2,\ i = n_{p+1}, ..., n_n. \]

Hence, in the direction of p of the axes we have P_i^k(x,t) < 0 on W_i^k, and in the direction of the other n − p axes P_i^k(x,t) > 0 on W_i^k, k = 1, 2. These estimates, according to Lemma 1, confirm the statement of the theorem.
Theorem 2. If

\[ |X_i(x,t) + a_{ii}(x,t)(\varphi_i(t) - x_i)| < -a_{ii}(x,t)\,r_i(t) + r_i'(t) \tag{9} \]

on W_i^k, k = 1, 2, i = n₁, ..., n_p, and

\[ |X_i(x,t) + a_{ii}(x,t)(\varphi_i(t) - x_i)| < a_{ii}(x,t)\,r_i(t) - r_i'(t) \tag{10} \]

on W_i^k, k = 1, 2, i = n_{p+1}, ..., n_n, then the system (1) has a p-parameter class of solutions belonging to the set ω.

Proof. Here, for the scalar products P_i^k(x,t) on W_i^k, we have

\[ P_i^k = (-1)^k\Bigl(\sum_{j=1}^n a_{ij}x_j + f_i - \varphi_i'\Bigr) - r_i' \tag{11} \]
\[ \phantom{P_i^k} = (-1)^k a_{ii}(x_i - \varphi_i) - r_i' + (-1)^k\Bigl(\sum_{j=1,\,j\ne i}^n a_{ij}x_j + a_{ii}\varphi_i + f_i - \varphi_i'\Bigr) \tag{12} \]
\[ \phantom{P_i^k} = a_{ii}\,r_i + (-1)^k\bigl[X_i + a_{ii}(\varphi_i - x_i)\bigr] - r_i'. \]

According to (9) and (10), we have on W_i^k, k = 1, 2,

\[ P_i^k \le a_{ii}\,r_i + |X_i + a_{ii}(\varphi_i - x_i)| - r_i' < 0, \qquad i = n_1, ..., n_p, \]
\[ P_i^k \ge a_{ii}\,r_i - |X_i + a_{ii}(\varphi_i - x_i)| - r_i' > 0, \qquad i = n_{p+1}, ..., n_n. \]

These estimates imply the statement of the theorem.
|aij (x, t)| rj (t) + |Φi (x, t)| < −aii (x, t) ri (t) + ri (t)
(13)
j=1(j= i)
on Wik , k = 1, 2, i = n1 , ..., np , and n
|aij (x, t)| rj (t) + |Φi (x, t)| < aii (x, t) ri (t) − ri (t)
(14)
j=1(j= i)
on Wik , k = 1, 2, i = np+1 , ..., nn , then the system (1) has a p−parameter class of solutions which belongs to the set ω.
System of quasilinear differential equations
267
Proof. For the scalar products Pik (x, t) on Wik , we have, using (12), Pik = aii ri + (−1)k
n
aij (xj − ϕj ) + (−1)k Φi − ri .
j=1(j= i)
Moreover, it is sufficient to note that on W ik , according to (13) and (14) Pik ≤ aii ri +
n
|aij | rj + |Φi | − ri < 0, k = 1, 2, i = n1 , ..., np ,
j=1(j= i)
Pik ≥ aii ri −
n
|aij | rj − |Φi | − ri > 0, k = 1, 2, i = np+1 , ..., nn .
j=1(j= i)
We shall now consider the case of diagonal matrix-function A (x, t), aij (x, t) ≡ 0 , j = i , i, j = 1, ..., n .
(15)
Corollary 1. Let the condition (15) be satisfied. If, on W_i^k, k = 1, 2,

\[ |a_{ii}(x,t)\,x_i + f_i(x,t) - \varphi_i'(t)| < r_i'(t), \qquad i = n_1, ..., n_p, \]
and
\[ |a_{ii}(x,t)\,x_i + f_i(x,t) - \varphi_i'(t)| < -r_i'(t), \qquad i = n_{p+1}, ..., n_n, \]

then the conclusion of Theorem 1 holds.

Corollary 2. Let (15) hold. If

\[ |a_{ii}(x,t)\,\varphi_i(t) + f_i(x,t) - \varphi_i'(t)| < -a_{ii}(x,t)\,r_i(t) + r_i'(t) \]

on W_i^k, k = 1, 2, i = n₁, ..., n_p, and

\[ |a_{ii}(x,t)\,\varphi_i(t) + f_i(x,t) - \varphi_i'(t)| < a_{ii}(x,t)\,r_i(t) - r_i'(t) \]

on W_i^k, k = 1, 2, i = n_{p+1}, ..., n_n, then the conclusion of Theorem 2 holds.

This corollary follows from Theorems 2 and 3. In the diagonal case we also have the following result.

Corollary 3. Let (15) hold. If, on W_i^k, k = 1, 2,

\[ |f_i(x,t)| < -a_{ii}(x,t)\,r_i(t) + r_i'(t), \qquad i = n_1, ..., n_p, \]
and
\[ |f_i(x,t)| < a_{ii}(x,t)\,r_i(t) - r_i'(t), \qquad i = n_{p+1}, ..., n_n, \]

then the system (1) has a p-parameter class of solutions x(t) which satisfy the condition |x_i(t)| < r_i(t), t ∈ I, i = 1, ..., n.

The proof follows from Theorem 2 for φ(t) = 0. We shall now consider the case

\[ A(x,t) = C(t) + D(x,t), \qquad F(x,t) = G(t) + H(x,t). \]

Let us consider the system (1) and the systems

\[ \dot{x} = C(t)\,x + G(t), \tag{16} \]
\[ \dot{x} = C(t)\,x + F(x,t). \tag{17} \]
Theorem 4. Let Γ (given by (2)) be an integral curve of the system (16) and let C be a diagonal matrix. If

\[ \Bigl|\sum_{j=1,\,j\ne i}^n d_{ij}(x,t)\,x_j + d_{ii}(x,t)\,\varphi_i(t) + h_i(x,t)\Bigr| < -a_{ii}(x,t)\,r_i(t) + r_i'(t) \]

on W_i^k, k = 1, 2, i = n₁, ..., n_p, and

\[ \Bigl|\sum_{j=1,\,j\ne i}^n d_{ij}(x,t)\,x_j + d_{ii}(x,t)\,\varphi_i(t) + h_i(x,t)\Bigr| < a_{ii}(x,t)\,r_i(t) - r_i'(t) \]

on W_i^k, k = 1, 2, i = n_{p+1}, ..., n_n, then the system (1) has a p-parameter class of solutions belonging to the set ω.

Proof. Here we have, on W_i^k, k = 1, 2,

\[ P_i^k = a_{ii}\,r_i + (-1)^k\Bigl(\sum_{j=1,\,j\ne i}^n d_{ij}x_j + (c_{ii} + d_{ii})\varphi_i + g_i + h_i - \varphi_i'\Bigr) - r_i' = a_{ii}\,r_i + (-1)^k\Bigl(\sum_{j=1,\,j\ne i}^n d_{ij}x_j + d_{ii}\varphi_i + h_i\Bigr) - r_i', \]

since φ_i' = c_{ii}φ_i + g_i (Γ is an integral curve of (16) with diagonal C), and

\[ P_i^k \le a_{ii}\,r_i + \Bigl|\sum_{j=1,\,j\ne i}^n d_{ij}x_j + d_{ii}\varphi_i + h_i\Bigr| - r_i' < 0, \qquad i = n_1, ..., n_p, \]
\[ P_i^k \ge a_{ii}\,r_i - \Bigl|\sum_{j=1,\,j\ne i}^n d_{ij}x_j + d_{ii}\varphi_i + h_i\Bigr| - r_i' > 0, \qquad i = n_{p+1}, ..., n_n. \]

According to Lemma 1, the above estimates for P_i^k on W_i^k confirm the statement of the theorem.
In the case (15) we have:

Corollary 4. Let Γ be an integral curve of the system (16) and let (15) hold. If

\[ |d_{ii}(x,t)\,\varphi_i(t) + h_i(x,t)| < -a_{ii}(x,t)\,r_i(t) + r_i'(t) \]

on W_i^k, k = 1, 2, i = n₁, ..., n_p, and

\[ |d_{ii}(x,t)\,\varphi_i(t) + h_i(x,t)| < a_{ii}(x,t)\,r_i(t) - r_i'(t) \]

on W_i^k, k = 1, 2, i = n_{p+1}, ..., n_n, then the conclusion of Theorem 4 holds.

Let us now consider the behaviour of the integral curves of the system (17) using the system (16).

Theorem 5. Let Γ be an integral curve of the system (16). If

\[ \Bigl|\sum_{j=1,\,j\ne i}^n c_{ij}(t)\,(x_j - \varphi_j(t)) + h_i(x,t)\Bigr| < -c_{ii}(t)\,r_i(t) + r_i'(t) \tag{18} \]

on W_i^k, k = 1, 2, i = n₁, ..., n_p, and

\[ \Bigl|\sum_{j=1,\,j\ne i}^n c_{ij}(t)\,(x_j - \varphi_j(t)) + h_i(x,t)\Bigr| < c_{ii}(t)\,r_i(t) - r_i'(t) \tag{19} \]

on W_i^k, k = 1, 2, i = n_{p+1}, ..., n_n, then the system (17) has a p-parameter class of solutions which belongs to the set ω.
Proof. Here we have, on W_i^k, k = 1, 2,

\[ P_i^k = (-1)^k\Bigl(\sum_{j=1}^n c_{ij}x_j + g_i + h_i - \varphi_i'\Bigr) - r_i' = (-1)^k\Bigl(\sum_{j=1}^n c_{ij}(x_j - \varphi_j) + h_i + \sum_{j=1}^n c_{ij}\varphi_j + g_i - \varphi_i'\Bigr) - r_i' = (-1)^k\Bigl(\sum_{j=1}^n c_{ij}(x_j - \varphi_j) + h_i\Bigr) - r_i' \]
\[ \phantom{P_i^k} = (-1)^k c_{ii}(x_i - \varphi_i) + (-1)^k\Bigl(\sum_{j=1,\,j\ne i}^n c_{ij}(x_j - \varphi_j) + h_i\Bigr) - r_i' = c_{ii}\,r_i + (-1)^k\Bigl(\sum_{j=1,\,j\ne i}^n c_{ij}(x_j - \varphi_j) + h_i\Bigr) - r_i'. \]

In view of (18) and (19) the following estimates are valid:

\[ P_i^k \le c_{ii}\,r_i + \Bigl|\sum_{j=1,\,j\ne i}^n c_{ij}(x_j - \varphi_j) + h_i\Bigr| - r_i' < 0, \qquad i = n_1, ..., n_p, \]
\[ P_i^k \ge c_{ii}\,r_i - \Bigl|\sum_{j=1,\,j\ne i}^n c_{ij}(x_j - \varphi_j) + h_i\Bigr| - r_i' > 0, \qquad i = n_{p+1}, ..., n_n. \]
Corollary 5. Let Γ be an integral curve of the system (16). If C is a diagonal matrix and

\[ |h_i(x,t)| < -c_{ii}(t)\,r_i(t) + r_i'(t) \]

on W_i^k, k = 1, 2, i = n₁, ..., n_p, and

\[ |h_i(x,t)| < c_{ii}(t)\,r_i(t) - r_i'(t) \]

on W_i^k, k = 1, 2, i = n_{p+1}, ..., n_n, then the conclusion of Theorem 5 holds.

Example. Let us consider the quasilinear differential equation

\[ \ddot{x} + (1 + p(x,t))\,\dot{x} + p(x,t)\,x = f(x,t), \tag{20} \]
where p, f ∈ C¹(D, ℝ), D = {(x,t) ∈ ℝ² : |x| < M, t ∈ I}, M ∈ ℝ₊.

Theorem 6. Let r₁, r₂ ∈ C¹(I, ℝ₊).

a) If

\[ r_2(t) < r_1(t) + r_1'(t), \qquad t \in I, \]
and
\[ |f(x,t)| < p(x,t)\,r_2(t) + r_2'(t) \quad\text{on } D, \tag{21} \]

then the equation (20) has a two-parameter class of solutions x(t) satisfying the conditions

\[ |x(t)| < r_1(t), \qquad |\dot{x}(t) + x(t)| < r_2(t), \qquad t \in I. \tag{22} \]

b) If (21) holds and

\[ r_2(t) < -r_1(t) - r_1'(t), \qquad t \in I, \tag{23} \]

then the equation (20) has a one-parameter class of solutions x(t) satisfying (22).

c) If (23) holds and

\[ |f(x,t)| < -p(x,t)\,r_2(t) - r_2'(t) \quad\text{on } D, \]

then the equation (20) has at least one solution x(t) satisfying (22).

We study the equation (20) by means of the equivalent system

\[ \dot{x} = -x + y, \qquad \dot{y} = -p(x,t)\,y + f(x,t), \]

considering the behaviour of the integral curves with respect to the set

\[ \omega = \{(x,y,t) \in \mathbb{R}^3 : |x| < r_1(t),\ |y| < r_2(t),\ t \in I\}. \]

Corollary 6. If

\[ p(x,t) > 3, \qquad |f(x,t)| < c\,e^{-3t}\,[p(x,t) - 3] \quad\text{on } D, \]

then the equation (20) has a one-parameter class of solutions x(t) satisfying the conditions

\[ |x(t)| < c\,e^{-3t}, \qquad |\dot{x}(t) + x(t)| < c\,e^{-3t}, \qquad t \in \langle 0, \infty\rangle. \]

Remark. The results which establish the existence of a p-parameter class of solutions belonging to the set ω for all t ∈ I also make it possible to discuss the stability (or instability) of solutions, including auto-stability and stability along the coordinates of certain classes of solutions. The obtained results also contain an answer to the question of approximation of the solutions x(t) whose existence is established. The errors of the approximation are defined by the functions r(t), which can be made arbitrarily small for any t ∈ I.
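The Example and Corollary 6 can also be checked numerically. A sketch assuming SciPy is available: we take p ≡ 4 > 3, f(x,t) = 0.5e⁻³ᵗ and c = 1 (so |f| < ce⁻³ᵗ[p − 3] holds), pick one member of the one-parameter class by a closed-form computation, and verify that its integral curve stays inside the tube |x| < ce⁻³ᵗ, |y| < ce⁻³ᵗ (these concrete choices are ours, not the paper's):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Equation (20) with p(x,t) = 4 > 3 and f(x,t) = 0.5*exp(-3t),
# so |f| < c*exp(-3t)*[p - 3] holds with c = 1 (Corollary 6)
p, c = 4.0, 1.0
f = lambda t: 0.5 * np.exp(-3.0 * t)

# Equivalent first-order system: x' = -x + y, y' = -p*y + f(t)
def rhs(t, z):
    x, y = z
    return [-x + y, -p * y + f(t)]

# One member of the one-parameter class, chosen in closed form:
# x(t) = 0.2 e^{-4t} - 0.25 e^{-3t}, y = x' + x
sol = solve_ivp(rhs, (0.0, 3.0), [-0.05, -0.10],
                rtol=1e-10, atol=1e-12, dense_output=True)

tt = np.linspace(0.0, 3.0, 300)
x, y = sol.sol(tt)
tube = c * np.exp(-3.0 * tt)
print(np.max(np.abs(x) / tube), np.max(np.abs(y) / tube))  # both < 1
```

The printed ratios stay strictly below 1, i.e. the trajectory never leaves the exponentially shrinking tube, exactly as Corollary 6 asserts.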
References
[1] J. Diblík, On asymptotic behaviour of solutions of certain classes of ordinary differential equations, Journal of Differential Equations 95 (1992), 203–217.
[2] A. Omerspahić, Retraction method in the qualitative analysis of the solutions of the quasilinear second order differential equation, Proceedings of the First Conference on Applied Mathematics and Computation, Dubrovnik 1999, (2001).
[3] B. Vrdoljak, Curvilinear "tubes" in the retraction method and the behaviour of solutions for the system of differential equations, Matematički Vesnik 4(17)(32) (1980), 381–392.
[4] B. Vrdoljak, On parameter classes of solutions for system of linear differential equations, Glasnik Matematički 20(40) (1985), 61–69.
[5] B. Vrdoljak, Existence and approximation of radial solutions of semilinear elliptic equations in an annulus, Glasnik Matematički 30(50) (1995), 243–259.
[6] B. Vrdoljak, On behaviour and stability of system of linear differential equations, Proceedings of the 2nd Congress of Croatian Society of Mechanics, Supetar, 1997, 631–638.
[7] B. Vrdoljak, On behaviour of solutions of system of linear differential equations, Mathematical Communications 2 (1997), 47–57.
[8] B. Vrdoljak and A. Omerspahić, Qualitative analysis of some solutions of quasilinear system of differential equations, Proceedings of the Second Conference on Applied Mathematics and Scientific Computing, Edited by Z. Drmač et al., Kluwer Academic/Plenum Publishers, New York (2003), 323–332.
[9] T. Ważewski, Sur un principe topologique de l'examen de l'allure asymptotique des intégrales des équations différentielles ordinaires, Ann. Soc. Polon. Math. 20 (1947), 279–313.
ALGEBRAIC PROOF OF THE B–SPLINE DERIVATIVE FORMULA Mladen Rogina Department of Mathematics University of Zagreb
[email protected]
Abstract: We prove a well known formula for the generalized derivatives of Chebyshev B-splines,

\[ L_1 B_i^k(x) = \frac{B_i^{k-1}(x)}{C_{k-1}(i)} - \frac{B_{i+1}^{k-1}(x)}{C_{k-1}(i+1)}, \]

where

\[ C_{k-1}(i) = \int_{t_i}^{t_{i+k-1}} B_i^{k-1}(x)\,d\sigma, \]

in a purely algebraic fashion, and thus show that it holds for the most general spaces of splines. The integration is performed with respect to a certain measure associated in a natural way to the underlying Chebyshev system of functions. Next, we discuss the implications of the formula for some special spline spaces, with an emphasis on those that are not associated with ECC-systems.

Keywords: Chebyshev splines, divided differences.

1. Introduction and preliminaries

The classic formula for the derivatives of polynomial B-splines,

\[ \frac{d}{dx}B_i^k(x) = (k-1)\Bigl(\frac{B_i^{k-1}(x)}{t_{i+k-1}-t_i} - \frac{B_{i+1}^{k-1}(x)}{t_{i+k}-t_{i+1}}\Bigr), \]

may be written in the form

\[ \frac{d}{dx}B_i^k(x) = \frac{B_i^{k-1}(x)}{C_{k-1}(i)} - \frac{B_{i+1}^{k-1}(x)}{C_{k-1}(i+1)}, \tag{1} \]

where

\[ C_{k-1}(i) = \int_{t_i}^{t_{i+k-1}} B_i^{k-1}(x)\,dx. \]

Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 273–282. © 2005 Springer. Printed in the Netherlands.
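For the polynomial case, formula (1) is easy to verify numerically with a standard B-spline implementation. A sketch assuming SciPy is available (the knots and the index are illustrative choices of ours):

```python
import numpy as np
from scipy.interpolate import BSpline

t = np.array([0.0, 0.5, 1.5, 2.0, 4.0, 5.0])  # illustrative knots
i, k = 0, 4                                    # order-4 (cubic) spline B_i^k

# B_i^k lives on t_i..t_{i+k}; B_i^{k-1} and B_{i+1}^{k-1} on k knots each
Bk     = BSpline.basis_element(t[i:i + k + 1], extrapolate=False)
Bkm1   = BSpline.basis_element(t[i:i + k],     extrapolate=False)
Bkm1_1 = BSpline.basis_element(t[i + 1:i + k + 1], extrapolate=False)

def val(f, x):
    # evaluate, mapping "outside the support" (nan) to 0
    return np.nan_to_num(f(x))

x = np.linspace(t[i] + 1e-6, t[i + k] - 1e-6, 400)

lhs = val(Bk.derivative(), x)
rhs = (k - 1) * (val(Bkm1, x) / (t[i + k - 1] - t[i])
                 - val(Bkm1_1, x) / (t[i + k] - t[i + 1]))
print(np.max(np.abs(lhs - rhs)))  # ~ machine precision
```

Here (k − 1)/(t_{i+k−1} − t_i) is exactly 1/C_{k−1}(i), since a normalized polynomial B-spline of order k − 1 integrates to (t_{i+k−1} − t_i)/(k − 1).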
The same formula holds for Chebyshev splines if the integration in (1) is performed with respect to a certain measure associated in a natural way to the underlying Chebyshev system of functions. In this way we can define Chebyshev B-splines recursively, and inductively prove their properties. To the best of our knowledge, the derivative formula for non-polynomial splines first appeared in [9] for one-weight Chebyshev systems. Later, special cases appeared for various Chebyshev splines, like GB-splines [4], tension splines [3], and Chebyshev polynomial splines [10]. The general version for Chebyshev splines, which appeared in [1] in the form of a defining recurrence relation for B-splines, is based on an indirect argument relying on induction and on the uniqueness of Chebyshev B-splines. A direct proof, valid for CCC-systems and Lebesgue–Stieltjes measures, follows.
2. The derivative formula

We begin by introducing some new notation and restating some known facts, to make the proof of the derivative formula easier. Let δ ⊆ [a, b] be measurable with respect to the Lebesgue–Stieltjes measures dσ₂, ..., dσₙ, and let P^{n−1} be the (n−1) × (n−1) permutation matrix, which we call duality:

\[ (P^{n-1})_{ij} := \delta_{i,n-j}, \qquad i = 1, \ldots, n-1;\ j = 1, \ldots, n-1. \]

We shall use the following notation:

measure vector: d := (dσ₂(δ), ..., dσₙ(δ))^T ∈ ℝ^{n−1};
reduced measure vectors: d^{(j)} := (dσ_{j+2}, ..., dσₙ)^T ∈ ℝ^{n−j−1};
dual measure vector: P^{n−1} d.
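The duality matrix P^{n−1} is simply the order-reversing permutation. A small numeric sketch, assuming NumPy and with plain numbers standing in for the measures dσᵢ, also makes visible that reduction and duality do not commute, i.e. P^{n−j−1} d^{(j)} ≠ (P^{n−1} d)^{(j)}:

```python
import numpy as np

def P(n):
    # (n-1) x (n-1) "duality" permutation: P_ij = delta_{i, n-j}
    M = np.zeros((n - 1, n - 1))
    for i in range(1, n):
        M[i - 1, n - i - 1] = 1.0
    return M

n = 4
d = np.array([2.0, 3.0, 4.0])        # stands in for (dsigma_2, dsigma_3, dsigma_4)
dual = P(n) @ d                       # reversed: (4, 3, 2)

j = 1
reduced_dual = dual[j:]               # (P^{n-1} d)^{(j)}      -> (3, 2)
dual_of_reduced = P(n - j) @ d[j:]    # P^{n-j-1} d^{(j)}      -> (4, 3)
print(reduced_dual, dual_of_reduced)  # different, as claimed in the text
```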
A CCC-system S(n, d) of order n is a set of functions {u₁, ..., uₙ}:

\[ u_2(x) = u_1(x)\int_a^x d\sigma_2(t_2), \]
\[ \vdots \]
\[ u_n(x) = u_1(x)\int_a^x d\sigma_2(t_2)\cdots\int_a^{t_{n-1}} d\sigma_n(t_n) \]

(see [12] and references therein). If all of the measures dσᵢ are dominated by the Lebesgue measure, then they possess densities 1/pᵢ, i = 2, ..., n; if the pᵢ are smooth, i.e. 1/pᵢ := dσᵢ/dt ∈ C^{n−i+1}, the functions form an Extended Complete Chebyshev system (ECC-system). Reduction and duality define the reduced, dual, and reduced dual Chebyshev systems as the Chebyshev systems determined, respectively, by the appropriate measure vectors:
B–Spline Derivative Formula
$j$-reduced system: $S(n-j, d^{(j)}) = \{u_{j,1}, \dots, u_{j,n-j}\}$
dual system: $S(n, P_{n-1} d) = \{u^*_1, \dots, u^*_n\}$
$j$-reduced dual system: $S(n-j, (P_{n-1} d)^{(j)}) = \{u^*_{j,1}, \dots, u^*_{j,n-j}\}$.
We define the generalized derivatives as linear operators mapping the Chebyshev space of functions spanned by $S(n, d)$ to the one spanned by $S(n-j, d^{(j)})$ by $L_{j,d} := D_j \cdots D_1$, where the $D_j$ are measure derivatives:
$$D_j f(x) := \lim_{\delta \to 0^+} \frac{f(x+\delta) - f(x)}{d\sigma_{j+1}(x, x+\delta)}.$$
Generalized derivatives with respect to the dual measure vector are known as dual generalized derivatives. For example, if $n = 4$:
$$S(4, d) = \{u_1, u_2, u_3, u_4\}: \quad 1,\ \int_0^x d\sigma_2(t_2),\ \int_0^x d\sigma_2(t_2) \int_0^{t_2} d\sigma_3(t_3),\ \int_0^x d\sigma_2(t_2) \int_0^{t_2} d\sigma_3(t_3) \int_0^{t_3} d\sigma_4(t_4)$$
$$S(4, P_3 d) = \{u^*_1, u^*_2, u^*_3, u^*_4\}: \quad 1,\ \int_0^y d\sigma_4(t_4),\ \int_0^y d\sigma_4(t_4) \int_0^{t_4} d\sigma_3(t_3),\ \int_0^y d\sigma_4(t_4) \int_0^{t_4} d\sigma_3(t_3) \int_0^{t_3} d\sigma_2(t_2)$$
$$S(3, d^{(1)}) = \{u_{1,1}, u_{1,2}, u_{1,3}\}: \quad 1,\ \int_0^x d\sigma_3(t_3),\ \int_0^x d\sigma_3(t_3) \int_0^{t_3} d\sigma_4(t_4)$$
$$S(3, (P_3 d)^{(1)}) = \{u^*_{1,1}, u^*_{1,2}, u^*_{1,3}\}: \quad 1,\ \int_0^y d\sigma_3(t_3),\ \int_0^y d\sigma_3(t_3) \int_0^{t_3} d\sigma_2(t_2)$$
$$S(2, d^{(2)}) = \{u_{2,1}, u_{2,2}\}: \quad 1,\ \int_0^x d\sigma_4(t_4) \qquad S(2, (P_3 d)^{(2)}) = \{u^*_{2,1}, u^*_{2,2}\}: \quad 1,\ \int_0^y d\sigma_2(t_2)$$
$$S(1, d^{(3)}) = \{u_{3,1}\}: \quad 1 \qquad S(1, (P_3 d)^{(3)}) = \{u^*_{3,1}\}: \quad 1$$
Note that the dual of the reduced system is different from the reduced dual system, i.e.: $P_{n-j-1} d^{(j)} \neq (P_{n-1} d)^{(j)}$. The function $G_{n,d}(x, y) : [a,b] \times [a,b] \to \mathbb{R}$ defined by
$$G_{n,d}(x, y) := \begin{cases} \int_y^x d\sigma_2(s_2) \int_y^{s_2} d\sigma_3(s_3) \cdots \int_y^{s_{n-1}} d\sigma_n(s_n), & x \geq y, \\ 0, & \text{otherwise}, \end{cases}$$
is called the Green's function with respect to $d$. It follows easily that
$$L_{i,d}\, G_{n,d}(x, \cdot) = G_{n-i, d^{(i)}}(x, \cdot), \qquad \text{for} \quad i = 1, \dots, n-1. \tag{2}$$
APPLIED MATHEMATICS AND SCIENTIFIC COMPUTING
We shall say that $\Delta = \{x_0, \dots, x_{k+1}\}$, $a \leq x_i \leq b$, is the knot sequence if $a = x_0 < x_1 < x_2 < \dots < x_k < x_{k+1} = b$, and that $m = (n_1, \dots, n_k)^T$ is the multiplicity vector if the $n_i$ are integers with $1 \leq n_i \leq n$. $\{t_1, \dots, t_{2n+k}\}$ is an extended partition if
$$t_1 = \dots = t_n = a, \qquad t_{n+k+1} = \dots = t_{2n+k} = b,$$
$$t_{n+1} \leq \dots \leq t_{n+k} = \underbrace{x_1, \dots, x_1}_{n_1}, \dots, \underbrace{x_k, \dots, x_k}_{n_k}.$$
The space of Chebyshev splines of order $n$ associated with the knot sequence $\Delta$ and the vectors $m$ and $d$ is denoted by $S(n, m, d, \Delta)$. Next, in order to define divided differences, we need to extend the Chebyshev systems by one extra function, and that means involving an additional artificial measure. To this end, let us define the extension operator
$$E_i : \mathbb{R}^i \to \mathbb{R}^{i+1}, \qquad E_i = \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & \dots & 1 \\ 0 & \dots & 0 \end{pmatrix},$$
and the extended measure vector
$$\bar{d} = (d\sigma_2, \dots, d\sigma_n, d\lambda)^T = E_{n-1} d + e_n\, d\lambda,$$
where $e_n = [0, \dots, 0, 1]^T \in \mathbb{R}^n$, and $d\lambda$ is the artificial measure (usually taken to be the Lebesgue one). The Chebyshev divided difference of order $n$ is then
$$[t_1, \dots, t_{n+1}]_{S(n+1,\bar{d})} f = \frac{D\begin{pmatrix} t_1, \dots, t_{n+1} \\ u_1, \dots, u_n, f \end{pmatrix}}{D\begin{pmatrix} t_1, \dots, t_{n+1} \\ u_1, \dots, u_{n+1} \end{pmatrix}}.$$
For the definition of the determinants defining the divided differences, see [12]. The important thing is the annihilation property, which we quote for the sake of notation:
$$[t_1, \dots, t_{n+1}]_{S(n+1,\bar{d})}\, u = 0 \qquad \forall u \in S(n, d).$$
In this notation, divided differences satisfy Mühlbach's recurrence [5]:
$$[t_1, \dots, t_{n+1}]_{S(n+1,\bar{d})} f = \frac{[t_2, \dots, t_{n+1}]_{S(n,d)} f - [t_1, \dots, t_n]_{S(n,d)} f}{[t_2, \dots, t_{n+1}]_{S(n,d)}\, u_{n+1} - [t_1, \dots, t_n]_{S(n,d)}\, u_{n+1}}. \tag{3}$$
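In the familiar polynomial case ($u_i(x) = x^{i-1}$, all measures Lebesgue), the recurrence (3) reduces to the classical Newton recurrence for divided differences, since then $[t_2, \dots, t_{n+1}]\,u_{n+1} - [t_1, \dots, t_n]\,u_{n+1} = t_{n+1} - t_1$. The following Python sketch (our own function name) illustrates this special case only; the general Chebyshev version would replace the denominator by divided differences of $u_{n+1}$:

```python
def dd(ts, f):
    """Classical divided difference [t_1, ..., t_k]f via Newton's recurrence,
    the polynomial special case of the Muehlbach recurrence (3)."""
    if len(ts) == 1:
        return f(ts[0])
    return (dd(ts[1:], f) - dd(ts[:-1], f)) / (ts[-1] - ts[0])

# Annihilation property: [t_1, ..., t_{n+1}]u = 0 for u in the order-n space.
print(dd([0.0, 1.0, 2.0], lambda x: 3 * x + 5))      # → 0.0 (degree 1 < 2)
# Leading-coefficient extraction: [t_1, ..., t_4]x^3 = 1 for distinct knots.
print(dd([0.0, 0.5, 2.0, 3.5], lambda x: x ** 3))    # → 1.0
```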
Formula (3) can even be generalized to the complex case [6]. The un-normalized Chebyshev B-splines are then defined as
$$Q^n_{i,d}(x) = (-1)^n [t_i, \dots, t_{i+n}]_{S(n+1,\bar{d})}\, G_{n,d}(x, \cdot).$$
Let $K = \sum_{i=1}^k n_i$. The B-splines $\{Q^n_{i,d}\}_{i=1}^{n+K}$ are a basis for $S(n, m, d, \Delta)$, and it is known [12] that they can be normalized so as to make a partition of unity, i.e. there are constants $\alpha^n_i(d) > 0$ such that
$$T^n_{i,d}(x) = \alpha^n_i(d)\, Q^n_{i,d}(x), \tag{4}$$
and $\sum_{i=1}^{n+K} T^n_{i,d}(x) = 1$ for $x \in [a,b]$. Moreover, the $T^n_{i,d}(x)$ do not depend on the artificial measure, that is, on the extension operator $E_{n-1}$ needed to define divided differences. Indeed,
$$\alpha^n_i(d) = \frac{D\begin{pmatrix} t_i, \dots, t_{i+n} \\ u^*_1, \dots, u^*_{n+1} \end{pmatrix} D\begin{pmatrix} t_{i+1}, \dots, t_{i+n-1} \\ u^*_1, \dots, u^*_{n-1} \end{pmatrix}}{D\begin{pmatrix} t_{i+1}, \dots, t_{i+n} \\ u^*_1, \dots, u^*_n \end{pmatrix} D\begin{pmatrix} t_i, \dots, t_{i+n-1} \\ u^*_1, \dots, u^*_n \end{pmatrix}}$$
and
$$T^n_{i,d}(x) = \frac{D\begin{pmatrix} t_i, \dots, t_{i+n} \\ u^*_1, \dots, u^*_{n+1} \end{pmatrix} D\begin{pmatrix} t_{i+1}, \dots, t_{i+n-1} \\ u^*_1, \dots, u^*_{n-1} \end{pmatrix}}{D\begin{pmatrix} t_{i+1}, \dots, t_{i+n} \\ u^*_1, \dots, u^*_n \end{pmatrix} D\begin{pmatrix} t_i, \dots, t_{i+n-1} \\ u^*_1, \dots, u^*_n \end{pmatrix}} \cdot \frac{D\begin{pmatrix} t_i, \dots, t_{i+n} \\ u^*_1, \dots, u^*_n, G_{n,d} \end{pmatrix}}{D\begin{pmatrix} t_i, \dots, t_{i+n} \\ u^*_1, \dots, u^*_{n+1} \end{pmatrix}},$$
so that the determinants involving $u^*_{n+1}$ cancel.

Theorem 2.1. Let $L_{1,d}$ be the first generalized derivative with respect to the CCC-system $S(n, d)$, and let the multiplicity vector $m$ satisfy $n_i < n-1$ for $i = 1, \dots, k$. Then for all $x \in [a,b]$ and $i = 1, \dots, n+K$:
$$L_{1,d}\, T^n_{i,d}(x) = \frac{T^{n-1}_{i,d^{(1)}}(x)}{C_{n-1}(i)} - \frac{T^{n-1}_{i+1,d^{(1)}}(x)}{C_{n-1}(i+1)}, \tag{5}$$
where
$$C_{n-1}(i) = \int_{t_i}^{t_{i+n-1}} T^{n-1}_{i,d^{(1)}}\, d\sigma_2.$$
Proof. By Sylvester's determinant identity [3, p. 158]:
$$T^n_{i,d}(x) = (-1)^n \{ [t_{i+1}, \dots, t_{i+n}]_{S(n, P_{n-1}d)}\, G_{n,d}(x, \cdot) - [t_i, \dots, t_{i+n-1}]_{S(n, P_{n-1}d)}\, G_{n,d}(x, \cdot) \}.$$
If we apply the first generalized derivative and utilize (2), we obtain
$$L_{1,d}\, T^n_{i,d}(x) = -(\omega_{i+1} - \omega_i),$$
where $\omega_i := (-1)^{n-1} [t_i, \dots, t_{i+n-1}]_{S(n, P_{n-1}d)}\, G_{n-1,d^{(1)}}(x, \cdot)$. Mühlbach's recurrence (3) reduces the order of the divided differences:
$$\omega_i = \frac{(-1)^{n-1}}{\gamma_i} \{ [t_{i+1}, \dots, t_{i+n-1}]_{S(n-1, P_{n-2}d^{(1)})}\, G_{n-1,d^{(1)}} - [t_i, \dots, t_{i+n-2}]_{S(n-1, P_{n-2}d^{(1)})}\, G_{n-1,d^{(1)}} \},$$
where
$$\gamma_i = [t_{i+1}, \dots, t_{i+n-1}]_{S(n-1, P_{n-2}d^{(1)})}\, u^*_n - [t_i, \dots, t_{i+n-2}]_{S(n-1, P_{n-2}d^{(1)})}\, u^*_n.$$
Therefore, by Sylvester's identity, $\omega_i = T^{n-1}_{i,d^{(1)}}(x) / \gamma_i$, and it remains to prove that $\gamma_i = C_{n-1}(i)$. To this end, let $\tilde{d} := E_{n-2} P_{n-2} d^{(1)} + e_{n-1}\, d\lambda$. The recurrence for divided differences in $S(n, \tilde{d})$ applied to $u^*_n$ yields
$$[t_i, \dots, t_{i+n-1}]_{S(n,\tilde{d})}\, u^*_n = \frac{[t_{i+1}, \dots, t_{i+n-1}]_{S(n-1, P_{n-2}d^{(1)})}\, u^*_n - [t_i, \dots, t_{i+n-2}]_{S(n-1, P_{n-2}d^{(1)})}\, u^*_n}{[t_{i+1}, \dots, t_{i+n-1}]_{S(n-1, P_{n-2}d^{(1)})}\, v^*_n - [t_i, \dots, t_{i+n-2}]_{S(n-1, P_{n-2}d^{(1)})}\, v^*_n}, \tag{6}$$
where $v^*_n$ is an element of the extended dual reduced system:
$$v^*_n(y) = \int_a^y d\sigma_n(s_n) \cdots \int_a^{s_4} d\sigma_3(s_3) \int_a^{s_3} d\lambda(s_2).$$
Equation (6) implies that
$$\gamma_i = [t_i, \dots, t_{i+n-1}]_{S(n,\tilde{d})}\, u^*_n \cdot \{ [t_{i+1}, \dots, t_{i+n-1}]_{S(n-1, P_{n-2}d^{(1)})}\, v^*_n - [t_i, \dots, t_{i+n-2}]_{S(n-1, P_{n-2}d^{(1)})}\, v^*_n \}.$$
By the Peano representation of Chebyshev divided differences [12, p. 382],
$$[t_i, \dots, t_{i+n-1}]_{S(n,\tilde{d})}\, u^*_n = \int_{t_i}^{t_{i+n-1}} Q^{n-1}_{i,d^{(1)}}\, L_{n-1,\tilde{d}}\, u^*_n\, d\lambda,$$
and
$$L_{n-1,\tilde{d}}\, u^*_n\, d\lambda = d\sigma_2.$$
By yet another application of Sylvester's determinant identity, the term in braces can be identified with the normalization constant $\alpha^{n-1}_i(d^{(1)})$.

Theorem 2.1 may now be used to calculate, at least in theory, all derivatives of a Chebyshev spline. The generalized derivative can be factorized:
$$L_{i+1,d} = L_{i-k+1,d^{(k)}}\, L_{k,d} \qquad \text{for } k = 1, \dots, i; \quad i = 1, \dots, n-2, \tag{7}$$
and this fact can be used inductively to find higher derivatives as linear combinations of lower order splines.

Theorem 2.2. Let $s(x) = \sum_{j=r-n+1}^{l-1} \delta_j T^n_{j,d}(x)$ be the B-representation of a Chebyshev spline $s \in S(n, m, d, \Delta)$ for $x \in [t_r, t_l]$, $0 < r < l < k+1$. Then the B-representation of its generalized derivative $L_{i,d}\, s \in S(n-i, m, d^{(i)}, \Delta)$ is:
$$L_{i,d}\, s(x) = \sum_{j=r-n+i+1}^{l-1} \delta^i_j\, T^{n-i}_{j,d^{(i)}}(x) \qquad \text{for } i = 1, \dots, n-1-\max_i n_i. \tag{8}$$
The coefficients $\delta^i_j$ can be calculated recursively:
$$\delta^0_j = \delta_j, \qquad \delta^i_j = \frac{\delta^{i-1}_j - \delta^{i-1}_{j-1}}{C_{n-i}(j)},$$
where
$$C_{n-i}(j) = \int_{t_j}^{t_{j+n-i}} T^{n-i}_{j,d^{(i)}}\, d\sigma_{i+1}.$$
Proof. We know that (8) holds for $i = 1$. Let us suppose that
$$L_{i,d}\, s(x) = \sum_j \delta^i_j\, T^{n-i}_{j,d^{(i)}}(x). \tag{9}$$
Equation (7) for $k = i$ yields $L_{i+1,d} = L_{1,d^{(i)}}\, L_{i,d}$, whence by (9)
$$L_{i+1,d}\, s(x) = L_{1,d^{(i)}} \Big( \sum_j \delta^i_j\, T^{n-i}_{j,d^{(i)}}(x) \Big). \tag{10}$$
Theorem 2.1 may now be applied to the $T^{n-i}$-splines in (10) to obtain
$$L_{i+1,d}\, s(x) = \sum_j \frac{\delta^i_j - \delta^i_{j-1}}{C_{n-i-1}(j)}\, T^{n-i-1}_{j,d^{(i+1)}}(x) = \sum_j \delta^{i+1}_j\, T^{n-i-1}_{j,d^{(i+1)}}(x).$$
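Numerically, one step of the coefficient recursion of Theorem 2.2 is just a scaled difference of the coefficients of the previous derivative order. A minimal Python sketch (our own names), assuming the normalization constants $C_{n-i}(j)$ have already been computed, e.g. by quadrature of the lower-order B-spline against $d\sigma_{i+1}$, and are supplied by the caller:

```python
def derivative_coeffs(delta, C):
    """One step of the recursion delta_j^i = (delta_j^{i-1} - delta_{j-1}^{i-1}) / C(j).

    delta : B-spline coefficients of the current derivative order,
    C     : callable returning the (caller-supplied) constant C_{n-i}(j).
    The result has one coefficient fewer, matching the index shift in (8).
    """
    return [(delta[j] - delta[j - 1]) / C(j) for j in range(1, len(delta))]

# With all constants equal to 1 the step is a plain finite difference:
print(derivative_coeffs([1.0, 3.0, 6.0], lambda j: 1.0))  # → [2.0, 3.0]
```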
Applications

We can define, and calculate (at least theoretically), Chebyshev B-splines by a recurrence relation implied by the derivative formula:
$$T^n_{i,d}(x) = \frac{1}{C_{n-1}(i)} \int_{t_i}^x T^{n-1}_{i,d^{(1)}}\, d\sigma_2 - \frac{1}{C_{n-1}(i+1)} \int_{t_{i+1}}^x T^{n-1}_{i+1,d^{(1)}}\, d\sigma_2.$$
From the numerical point of view, the recurrence involves dangerous subtractions resulting in the loss of significant digits, even for polynomial splines. For Chebyshev splines the numerical instability sometimes destroys the result. To illustrate this, we consider a CCC-system associated with the measure vector $d = (t_2^{-\alpha}\, dt_2, dt_3, dt_4)^T$, where $0 < \alpha < 1$. The system originates from a realistic problem concerning axially symmetric potentials, and is not an ECC-system, since the measures do not possess smooth densities [8]. The Green's function is
$$g_{4,d}(x, y) = \begin{cases} \dfrac{x^{3-\alpha} - y^{3-\alpha}}{2(3-\alpha)} - \dfrac{y\,(x^{2-\alpha} - y^{2-\alpha})}{2-\alpha} + \dfrac{y^2 (x^{1-\alpha} - y^{1-\alpha})}{2(1-\alpha)}, & x \geq y, \\ 0, & \text{otherwise.} \end{cases}$$
The simplest case is that of 4th order splines on triplets of knots, since the first reduced system consists of ordinary powers, and therefore the B-splines in the first reduced system are scaled Bernstein polynomials. The following simple Mathematica code generates some Chebyshev B-splines for $\alpha = \frac{1}{2}$:

    B1[x_] := ((b - x)/(b - a))^2;
    B2[x_] := 2/(b - a)^2*(x - a)*(b - x);
    a = 1000; b = 1001;
    C1 = Simplify[Integrate[B1[t]*Sqrt[t], {t, a, b}]]
    C2 = Simplify[Integrate[B2[t]*Sqrt[t], {t, a, b}]]
    first[x_] := Simplify[Integrate[B1[t]*Sqrt[t], {t, a, x}]/C1];
    second[x_] := Simplify[Integrate[B2[t]*Sqrt[t], {t, a, x}]/C2];
    Plot[first[x] - second[x], {x, a, b + (b - a)/3}]

Depending on $a$ and $b$ (e.g. if they are far away from 0, as in the above example, or close to each other), this can lead to the loss of half of the significant digits. Indeed, recalculation in 64-bit arithmetic shows that only the first seven digits hold, and the error is shown in Fig. 1.
Figure 1. Roundoff error for the derivative formula ($\alpha = \frac{1}{2}$).
Observe also that accuracy is lost if we calculate the normalization constants on small intervals by analytic formulæ. For example, the constant C1 above is
$$C_1 = \frac{2\left(-15\, a^{7/2} + 42\, a^{5/2} b - 35\, a^{3/2} b^2 + 8\, b^{7/2}\right)}{105\, (a - b)^2}.$$
It is therefore better to use a Gaussian formula with the appropriate weight. In special cases such as this, where only one measure is different from the Lebesgue one, the derivative formula and knot insertion can sometimes be used to obtain numerically stable algorithms (at least for rational $\alpha$'s); Theorem 2.1 by itself is not enough. The same qualitative behaviour also occurs for some ECC-systems, like tension powers, where the ECC-system is determined by the measure vector $d := (d\lambda, \cosh(px)\, d\lambda, \frac{1}{\cosh^2(px)}\, d\lambda)^T$, and $p > 0$ is known as the tension parameter. The tension parameter and the interval length can be chosen so that a straightforward application of the derivative formula leads to the loss of all significant digits. There is a way out through knot insertion [11], but Theorem 2.1 still plays an important role in the construction. Finally, the first recorded proof of the famous de Boor–Cox recurrence [7] for polynomial splines is based on the derivative formula, plus an additional algebraic fact which does not hold in the Chebyshev setting [2].
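The closed form for C1 can be checked against a simple quadrature, and the check also exhibits the cancellation just discussed. A Python sketch of the experiment (our own function names; $\alpha = \frac{1}{2}$, $a = 1000$, $b = 1001$ as in the Mathematica example), using a high-precision Decimal evaluation of the closed form as a reference:

```python
from decimal import Decimal, getcontext

def c1_closed_form(a, b):
    """Analytic value of C1 = int_a^b ((b-t)/(b-a))^2 sqrt(t) dt.

    Accepts floats or Decimals; the high-precision Decimal path serves as a
    reference to expose the cancellation suffered in double precision."""
    sa = a.sqrt() if isinstance(a, Decimal) else a ** 0.5
    sb = b.sqrt() if isinstance(b, Decimal) else b ** 0.5
    num = 2 * (-15 * a**3 * sa + 42 * a**2 * sa * b - 35 * a * sa * b**2 + 8 * b**3 * sb)
    return num / (105 * (a - b) ** 2)

def c1_simpson(a, b, n=2000):
    """Composite Simpson approximation of the same integral (n even)."""
    h = (b - a) / n
    f = lambda t: ((b - t) / (b - a)) ** 2 * t ** 0.5
    s = f(a) + f(b) + sum(f(a + i * h) * (4 if i % 2 else 2) for i in range(1, n))
    return s * h / 3

getcontext().prec = 50
reference = float(c1_closed_form(Decimal(1000), Decimal(1001)))
err_closed = abs(c1_closed_form(1000.0, 1001.0) - reference) / reference
err_simpson = abs(c1_simpson(1000.0, 1001.0) - reference) / reference
# The analytic formula combines terms of size ~1e11 into a result of size ~10,
# so roughly ten digits cancel in double precision, while the quadrature of
# the smooth, nonnegative integrand stays near machine accuracy.
```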
Acknowledgment

This research was supported by Grant 0037114 of the Ministry of Science and Technology of the Republic of Croatia.
References
[1] Bister, D. and Prautzsch, H. A new approach to Tchebycheffian B-splines. Curves and Surfaces with Applications in CAGD, C. Rabut, L. L. Schumaker, A. Le Méhauté eds., Vanderbilt Univ. Press: 35–43, (1997).
[2] de Boor, C. and Pinkus, A. The B-spline recurrence relations of Chakalov and of Popoviciu.
[3] Koch, P. E. and Lyche, T. Interpolation with exponential B-splines in tension. Geometric Modelling, Computing Suppl., 8, G. Farin, H. Hagen and H. Noltemeier eds., Springer-Verlag, Wien: 173–190, (1993).
[4] Kvasov, B. and Sattayatham, P. GB-splines of arbitrary order. J. Comput. Appl. Math. 104: 63–88, (1999).
[5] Mühlbach, G. A recurrence formula for generalized divided differences and some applications. J. of Approx. Theory 9: 165–172, (1973).
[6] Mühlbach, G. A recurrence relation for generalized divided differences with respect to ECT-systems. Numer. Algorithms 22: 317–326, (1999).
[7] Popoviciu, T. Sur le prolongement des fonctions convexes d'ordre supérieur. Bull. Math. Soc. Roumaine des Sc. 36: 75–109, (1934).
[8] Reddien, G. W. and Schumaker, L. L. On a collocation method for singular two point boundary value problems. Numer. Math. 25: 427–432, (1976).
[9] Rogina, M. Basis of splines associated with some singular differential operators. BIT 32: 496–505, (1992).
[10] Rogina, M. and Bosner, T. On calculating with lower order Chebyshev splines. Curve and Surface Design, C. Rabut, L. L. Schumaker, A. Le Méhauté eds., Vanderbilt Univ. Press: 343–353, (2000).
[11] Rogina, M. and Bosner, T. A de Boor type algorithm for tension splines. Curve and Surface Fitting, A. Cohen, J.-L. Merrien, L. L. Schumaker eds., Nashboro Press: 343–353, (2003).
[12] Schumaker, L. L. On Tchebychevian spline functions. J. Approx. Theory 18: 278–303, (1976).
RELATIVE PERTURBATIONS, RANK STABILITY AND ZERO PATTERNS OF MATRICES∗

Sanja Singer
Faculty of Mechanical Engineering and Naval Architecture, University of Zagreb, I. Lučića 5, 10000 Zagreb, Croatia.
[email protected]

Saša Singer
Department of Mathematics, University of Zagreb, P.O. Box 335, 10002 Zagreb, Croatia.
[email protected]
Abstract

A matrix A is defined to be rank stable if rank(A) is unchanged for all relatively small perturbations of its elements. In this paper we investigate some properties and zero patterns of such matrices.

1. Introduction
Let $A$ be a general rectangular $m \times n$ matrix with real or complex elements. We investigate a class of matrices whose rank remains unchanged under all sufficiently small relative perturbations of its elements.

Definition 1. A matrix $A \in M_{mn}$, $M = \mathbb{R}$ or $\mathbb{C}$, is rank stable if there exists $\varepsilon_0 > 0$ such that for all $\delta A \in M_{mn}$
$$|\delta A| \leq \varepsilon_0 |A| \implies \operatorname{rank}(A + \delta A) = \operatorname{rank}(A).$$
Here $|A|$ denotes the componentwise absolute value of a matrix, i.e., $(|A|)_{ij} = |a_{ij}|$. The same notion of stability can be applied to other objects (functions) which depend on $A$, i.e., an object is stable if it is insensitive to all sufficiently small relative perturbations of the elements of $A$.

∗This work was supported in part by Grant No. 0037114 from the Ministry of Science and Technology of the Republic of Croatia.
Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 283–292. © 2005 Springer. Printed in the Netherlands.
This definition of rank stability can be interpreted as stability of vanishing singular values of $A$ under all sufficiently small relative perturbations. A similar problem has been investigated by Demmel and Gragg in [1]. They showed that small relative perturbations of the elements of $A$ cause small relative perturbations of all singular values $\sigma(A)$ if and only if the associated bipartite graph of $A$ is acyclic.

Let $r = \operatorname{rank}(A)$. Since $r$ remains unchanged if we permute rows and columns of $A$, we can find permutation matrices $P$ and $Q$ such that the first $r$ rows and columns of $PAQ$ are linearly independent. Therefore, without loss of generality, we assume that $A$ is already in such a form. This implies that the leading principal submatrix $A_{11} = A([1:r],[1:r])$ of order $r$ in $A$ is nonsingular. Note that $A_{11}$ remains nonsingular under all sufficiently small relative (and absolute) perturbations. This is easily seen, since the determinant of a matrix is a continuous function of its elements. It follows that $\operatorname{rank}(A)$ can only increase, but not decrease, under sufficiently small perturbations. This immediately proves the following theorem.

Theorem 1. If $A \in M_{mn}$ is of full rank, i.e., $r = \min\{m, n\}$, then $A$ is rank stable.

Furthermore, any componentwise relative perturbation $\delta A$ of $A$ can change only nonzero elements of $A$, and all zeroes remain intact.

Theorem 2. If $A \in M_{mn}$ is a zero matrix, or is equal to a full rank matrix bordered by a zero matrix, i.e., $A([1:m],[r+1:n]) = 0$ or $A([r+1:m],[1:n]) = 0$, then $A$ is also rank stable.

Proof. If $A = 0$, the only relative perturbation is $\delta A = 0$ and the result is trivial. Otherwise, $r \geq 1$, and only the full rank block $A([1:m],[1:r])$ or $A([1:r],[1:n])$ can be changed by $\delta A$. Thus, $\operatorname{rank}(A + \delta A) \leq r$. But, for all sufficiently small $\delta A$, the rank cannot decrease, so $A$ is rank stable.

It remains to consider rank deficient nonzero matrices. Since $1 \leq r < \min\{m, n\}$, they can be partitioned as
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad \det(A_{11}) \neq 0. \tag{1}$$
We may also assume that the second block-row and block-column in (1) are not trivial. The following simple example shows that such matrices are not rank stable, in general, without further requirements.

Example 1. Take the rank 1 matrix
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$
Relative perturbations and rank stability
For almost all perturbations δA, the matrix A + δA is nonsingular, so the rank increases. The rest of the paper is organized as follows. In the next section, we reduce the problem of rank stability of a general m × n matrix to the same problem for singular matrices of order r + 1 and rank r. Section 3 gives a characterization of such rank stable singular matrices. In the final section we show that rank stable rank deficient matrices have certain nontrivial zero patterns.
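Example 1 is easy to verify numerically. A minimal Python check (our own helper name), using the $2 \times 2$ determinant as a rank test: the unperturbed matrix has vanishing determinant (rank 1), while perturbing a single element by an arbitrarily small relative amount makes it nonsingular:

```python
def det2(A):
    """Determinant of a 2x2 matrix, used here as a rank test:
    a nonzero 2x2 matrix has rank 2 iff its determinant is nonzero."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

A = [[1.0, 1.0], [1.0, 1.0]]                    # rank 1: det vanishes
eps = 1e-12                                     # tiny relative perturbation of a_11
A_pert = [[(1 + eps) * 1.0, 1.0], [1.0, 1.0]]

print(det2(A))                                  # → 0.0
print(det2(A_pert) != 0.0)                      # perturbed matrix is nonsingular
```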
2. Problem reduction for rank deficient matrices
Rank deficient matrices $A$ have singular square submatrices. To preserve the rank deficiency, the singularity of some of these singular blocks must be stable under small relative perturbations. But this property does not have to hold for all singular blocks in $A$.

Example 2. The matrix
$$A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}$$
is nonsingular, $\det(A) = 1$. It has several singular submatrices of order 2:
$$B_1 = A([1\ 2],[1\ 3]) = \begin{pmatrix} 2 & 0 \\ 1 & 0 \end{pmatrix}, \qquad B_2 = A([1\ 2],[2\ 3]) = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}$$
and
$$B_3 = A([2\ 3],[1\ 2]) = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$
The first two matrices $B_1$ and $B_2$ are structurally singular and their singularity is stable, by Theorem 2. But Example 1 shows that the singularity of $B_3$ is not stable. Yet, it does not affect the rank stability of $A$.

The same argument applies for rank deficient matrices, too. Just border $A$ with several zero rows and/or zero columns. The following result characterizes the rank stability in terms of rank stable singularity of submatrices of order $r+1$.

Theorem 3. A rank deficient nonzero matrix $A$ defined by (1) is rank stable if and only if all square submatrices
$$B_{ij} := A([1:r,\, i],[1:r,\, j]), \qquad i = r+1, \dots, m, \quad j = r+1, \dots, n, \tag{2}$$
(of order $r+1$ and rank $r$) are rank stable.

Proof. If $B_{ij}$ is not rank stable for some $i > r$ and $j > r$, the same perturbation that makes this block unstable will increase the overall rank, so $A$ cannot be rank stable. The converse is obvious.
Each one of these submatrices is singular and has rank deficiency equal to 1. Thus, it is rank stable if and only if it remains stably singular with respect to all sufficiently small componentwise relative perturbations. Indeed, if the rank increases, it becomes nonsingular.
3. Vanishing determinant stability
The previous result shows that it is enough to consider singular square matrices $A$. Quite generally, the stability of singularity of $A$ is completely described by the manner in which its determinant vanishes.

Theorem 4. Let $A$ be a singular matrix of order $n$. Then $A$ is stably singular if and only if $\det(A)$ contains only trivially vanishing terms, i.e.,
$$\prod_{i=1}^n a_{i,p(i)} = 0 \tag{3}$$
for all permutations $p \in S_n$.

Proof. First, suppose that $A$ is stably singular, i.e., $\det(A + \delta A) = 0$ for all sufficiently small relative perturbations $\delta A$, $|\delta A| \leq \varepsilon_0 |A|$, for some $\varepsilon_0 > 0$. The proof of (3) is by induction on $n$. If $A = 0$, then $\delta A = 0$, and the claim follows trivially. This also proves the case $n = 1$, since $A = [0]$ is the only singular matrix of order 1. Now, suppose that (3) is true for all stably singular matrices of order $n-1 \geq 1$. Let $A \neq 0$ be a stably singular matrix of order $n$ and let $a_{ij}$ be any nonzero element of $A$. To prove (3), it is sufficient to show that all $(n-1)!$ terms in $\det(A)$ which contain $a_{ij}$ as a factor must vanish trivially. The sum of these terms in $\det(A)$ can be written as $(-1)^{i+j} a_{ij} \det(C_{ij})$, where $(-1)^{i+j} \det(C_{ij})$ is the cofactor of $a_{ij}$ in $A$, and $C_{ij}$ is the corresponding submatrix of $A$. Therefore, we shall show that $\det(C_{ij})$ contains only trivially vanishing terms.

Let $\delta A$ be any sufficiently small relative perturbation of $A$, $|\delta A| \leq \varepsilon_0 |A|$, with $\delta a_{ij} = \varepsilon_{ij} a_{ij} \neq 0$. Then $\det(A + \delta A) = 0$. Now we split $\delta A$ into two perturbations $\delta_1 A$ and $\delta_2 A$, so that the only nonzero element of $\delta_1 A$ is $\delta a_{ij}$ at position $(i, j)$, i.e., $\delta A = \delta_1 A + \delta_2 A$,
$$\delta_1 a_{kl} = \begin{cases} \delta a_{ij}, & \text{for } k = i \text{ and } l = j, \\ 0, & \text{otherwise.} \end{cases}$$
Obviously, $|\delta_1 A|, |\delta_2 A| \leq \varepsilon_0 |A|$, so $\det(A + \delta_1 A) = \det(A + \delta_2 A) = 0$, as well. The Laplace expansion with respect to the $i$-th row gives
$$0 = \det(A) = \sum_{l=1}^n (-1)^{i+l} a_{il} \det(C_{il}),$$
$$0 = \det(A + \delta_1 A) = \sum_{l=1}^n (-1)^{i+l} (a_{il} + \delta_1 a_{il}) \det(C_{il} + \delta_1 C_{il}).$$
Note that $\delta_1 C_{il} = 0$ for all $l$, and $\delta_1 a_{il} = 0$, except for $l = j$. Subtraction yields
$$0 = \det(A + \delta_1 A) - \det(A) = \delta a_{ij} \det(C_{ij}).$$
Since $\delta a_{ij} \neq 0$, we conclude that $\det(C_{ij}) = 0$. In exactly the same way we have
$$0 = \det(A + \delta A) - \det(A + \delta_2 A) = \delta a_{ij} \det(C_{ij} + \delta C_{ij}),$$
which shows $\det(C_{ij} + \delta C_{ij}) = 0$ for all sufficiently small relative perturbations $\delta C_{ij}$ such that $|\delta C_{ij}| \leq \varepsilon_0 |C_{ij}|$. In other words, $C_{ij}$ is stably singular of order $n-1$, and by the induction hypothesis, all the terms in $\det(C_{ij})$ must vanish trivially.

Now suppose that $\det(A)$ contains only trivially vanishing terms, as in (3). For any permutation $p \in S_n$, there exists an index $i$ such that $a_{i,p(i)} = 0$. Since zeroes are not changed by componentwise relative perturbations $\delta A$, this remains true for $A + \delta A$. Thus, $\det(A + \delta A) = 0$ (with trivially vanishing terms) holds for all (not just sufficiently small) relative perturbations $\delta A$, so $A$ is stably singular.

We see that a stably singular matrix $A$ is structurally singular in two ways: all relative perturbations of $A$ are also stably singular, and $A$ must have a certain number of zero elements — at least $n$ zeroes are required to annihilate all $n!$ terms in $\det(A)$. This result also resolves the rank stability of the matrices $B_{ij}$ from (2) in Theorem 3.

Theorem 5. Let $A$ be a rank deficient square matrix of order $r+1$ and rank $r$. Then $A$ is rank stable if and only if $\det(A)$ contains only trivially vanishing terms.

Proof. Such a matrix is rank stable if and only if it is stably singular.

There is a subtle difference between Theorems 4 and 5. While stable singularity extends to all relative perturbations of $A$, Theorem 5 is valid only if these perturbations are sufficiently small, to prevent a potential decrease of rank.
Remark 1. These two theorems also hold for all permutations of $A$, since $\det(PAQ) = \pm \det(A)$, where $P$ and $Q$ are permutation matrices.

Finally, from Theorems 3 and 5, we immediately obtain the following characterization of rank stability.

Theorem 6. Let $A$ be a rank deficient nonzero matrix, partitioned as in (1). Then $A$ is rank stable if and only if the determinants of all submatrices $B_{ij}$ in (2) contain only trivially vanishing terms.

As can be expected, rank stability is a stronger concept than stable singularity. This is clearly illustrated by our next example, which also shows the necessity of the $\operatorname{rank}(A) = r$ condition in Theorem 5.

Example 3. The rank 1 matrix
$$A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
is stably singular, but not rank stable. Moreover, note that zero-bordered singular or rank deficient matrices do not have to be rank stable (in contrast to Theorem 2).

We have seen that stably singular or rank stable rank deficient matrices must have a certain number of zero elements. From now on, we turn our attention to these zero patterns.

Corollary 1. If $A$ in (1) is rank stable, then $A_{22} = 0$.

Proof. Let $a_{ij} \in A_{22}$ be any element, so $i > r$ and $j > r$, and let $B_{ij} = A([1:r,\, i],[1:r,\, j])$, as in (2). If $A$ is rank stable, then Theorem 6 implies $\det(B_{ij}) = 0$ with trivially vanishing terms. The terms in $\det(B_{ij})$ which contain $a_{ij}$ can be written as $(-1)^{i+j} a_{ij} \det(C_{ij})$, where $(-1)^{i+j} \det(C_{ij})$ is the cofactor of $a_{ij}$ in $B_{ij}$. But, obviously, $C_{ij} = A_{11}$, which is nonsingular, so we must have $a_{ij} = 0$ to annihilate all these terms in $B_{ij}$.

This proof shows that one "structural" zero element of $B_{ij}$ has to be $a_{ij} = 0$. Since we know that each $B_{ij}$ has at least $r+1$ "structural" zeroes, we expect that $A_{22} = 0$ is necessary, but not sufficient, for rank stability in (1). Indeed, the following example shows this.

Example 4. The rank 2 matrix
$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ -1 & 1 & 0 \end{pmatrix}$$
is not rank stable, even though $A_{22} = 0$, since $\det(A) = \det(B_{33}) = 0$ has two nonvanishing terms. A small perturbation of any single nonzero element in $A$ makes the matrix nonsingular.

Interestingly enough, the blocks $A_{12}$ and $A_{21}$ in (1) need not be trivial.

Example 5. The rank 2 matrix
$$A = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$
is rank stable. Note that $\det(B_{ij}) = 0$ vanishes trivially, for $i, j > 2$.
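Theorem 6 turns rank stability into a finite check over the submatrices $B_{ij}$. A brute-force Python sketch (our own helper names; it assumes $A$ is already permuted into form (1) with a nonsingular leading $r \times r$ block, and $r$ is supplied by the caller):

```python
from itertools import permutations

def trivially_vanishing(B):
    """True iff every permutation term of det(B) contains a zero factor."""
    n = len(B)
    return all(any(B[i][p[i]] == 0 for i in range(n)) for p in permutations(range(n)))

def rank_stable(A, r):
    """Theorem 6: check every B_ij = A([1:r, i], [1:r, j]) for i, j > r
    (0-based indices used internally)."""
    m, n = len(A), len(A[0])
    lead = list(range(r))
    return all(
        trivially_vanishing([[A[p][q] for q in lead + [j]] for p in lead + [i]])
        for i in range(r, m) for j in range(r, n)
    )

A5 = [[1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]  # Example 5, r = 2
A4 = [[1, 0, 1], [0, 1, 1], [-1, 1, 0]]                        # Example 4, r = 2
print(rank_stable(A5, 2))  # → True
print(rank_stable(A4, 2))  # → False
```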
4. Zero patterns
Now we investigate the zero patterns of rank stable rank deficient matrices in more detail. Our goal is to locate the position of some of those "structural" zeroes in each $B_{ij}$ more precisely. Each $B_{ij}$ has at least $r+1$ "structural" zeroes. So far, we only know that one of them is in the bottom right corner, at position $(r+1, r+1)$. We shall show that the last row and the last column of $B_{ij}$, together, contain at least $r+1$ zeroes.

All the results in this section will be stated for the columns of $A$, denoted by $a_j = A(:, j)$, for $j = 1, \dots, n$. Similar results for the rows of $A$ follow trivially by considering $A^T$.

Let $A$ be a rank deficient matrix, partitioned as in (1). Corollary 1 shows that $A_{22} = 0$ is necessary for the rank stability of $A$. If, in addition, $A_{21} = 0$ or $A_{12} = 0$, then $A$ is rank stable by Theorem 2. So, we assume that $A_{21}, A_{12} \neq 0$, and apply additional column partitioning to investigate the structure of $A$. The first $r$ columns of $A$ can be reordered into the following form
$$\begin{pmatrix} a_{11} & \cdots & a_{1s} & a_{1,s+1} & \cdots & a_{1r} \\ \vdots & \cdots & \vdots & \vdots & \cdots & \vdots \\ a_{r1} & \cdots & a_{rs} & a_{r,s+1} & \cdots & a_{rr} \\ 0 & \cdots & 0 & a_{r+1,s+1} & \cdots & a_{r+1,r} \\ \vdots & \cdots & \vdots & \vdots & \cdots & \vdots \\ 0 & \cdots & 0 & a_{m,s+1} & \cdots & a_{mr} \end{pmatrix}, \tag{4}$$
with $a_j(r+1:m) \neq 0$, for $j = s+1, \dots, r$. Note that $s$ depends on the structure of $A_{21}$. Generally, $0 \leq s \leq r$, and $A_{21} \neq 0$ implies $s < r$. From now on, we may assume that $A(:,[1:r])$ is already in this form.
Since $\operatorname{rank}(A) = r < n$, for any $j \in \{r+1, \dots, n\}$ the column $a_j$ is a linear combination of the first $r$ columns:
$$a_j = \sum_{k=1}^r \lambda_{kj}\, a_k. \tag{5}$$
Theorem 7. If $A$ is rank stable, with $A_{21}, A_{12} \neq 0$ and $A(:,[1:r])$ as in (4), then $s > 0$ and
$$a_j = \sum_{k=1}^s \lambda_{kj}\, a_k, \qquad j = r+1, \dots, n, \tag{6}$$
i.e., $\lambda_{kj} = 0$ in (5), for $k = s+1, \dots, r$.

Proof. Let $j \in \{r+1, \dots, n\}$ be fixed, and let $\lambda_j = (\lambda_{1j}, \dots, \lambda_{rj})^T$ be the vector of coefficients in (5). From $A_{22} = 0$ we have $a_j(r+1:m) = 0$, and (5) can be written as
$$\begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix} \lambda_j = a_j = \begin{pmatrix} a_j(1:r) \\ 0 \end{pmatrix}.$$
Nonsingularity of $A_{11}$ implies that $\lambda_j$ is the unique solution of the linear system $A_{11} \lambda_j = a_j(1:r)$, or $\lambda_j = A_{11}^{-1} a_j(1:r)$. Furthermore, $\lambda_j$ satisfies $A_{21} \lambda_j = 0$.

Let $k \in \{s+1, \dots, r\}$ be fixed, and let $\delta A$ be a sufficiently small relative perturbation of $A$ which changes only one nonzero element in $a_k(r+1:m)$ (from (4), at least one such element exists). Since $A$ is rank stable, we have $\operatorname{rank}(A + \delta A) = \operatorname{rank}(A) = r$, and the first $r$ columns of $A + \delta A$ remain linearly independent (for sufficiently small $\delta A$). Starting from (5) written for $A + \delta A$,
$$a_j + \delta a_j = \sum_{k=1}^r (\lambda_{kj} + \delta\lambda_{kj})(a_k + \delta a_k), \qquad j = r+1, \dots, n,$$
the same argument gives
$$\begin{pmatrix} A_{11} + \delta A_{11} \\ A_{21} + \delta A_{21} \end{pmatrix} (\lambda_j + \delta\lambda_j) = a_j + \delta a_j = \begin{pmatrix} (a_j + \delta a_j)(1:r) \\ 0 \end{pmatrix}.$$
But $\delta A_{11} = 0$, $\delta a_j = 0$, by construction, so the first block equation is $A_{11}(\lambda_j + \delta\lambda_j) = a_j(1:r)$, and $\delta\lambda_j = 0$ follows. From the second block equation $(A_{21} + \delta A_{21})\lambda_j = 0$ and $A_{21}\lambda_j = 0$, we obtain $\delta A_{21}\, \lambda_j = 0$. Since $\delta A$ is different from zero only in the $k$-th column, we conclude that $\lambda_{kj} = 0$. This holds for all $j > r$ and $k > s$.

Finally, $A_{12} \neq 0$ implies $a_j(1:r) \neq 0$ for some $j > r$. This gives $\lambda_j = A_{11}^{-1} a_j(1:r) \neq 0$, so $s > 0$.

Note that $s = 0$ in (6) means $a_j = 0$ for all $j > r$, which is also correct for $A_{12} = 0$.
We see that $A_{12} \neq 0$ implies a nontrivial zero structure of $A_{21}$ in (4).

Example 6. The converse of Theorem 7 is not valid. The rank 2 matrix
$$A = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$
is already in form (4) with $s = 1$. We have $a_3 = a_1$, so (6) is also satisfied with $\lambda_{13} = 1$, $\lambda_{23} = 0$. But $A$ is not rank stable, since $\det(A) = \det(B_{33}) = 0$ has two nonvanishing terms. A small perturbation of any single nonzero element in $A$, except $a_{22}$, makes the matrix nonsingular.

Note that $a_3$ has 2 nonzero elements which can be perturbed independently, so $a_3$ simply has too many nonzeroes for $A$ to be rank stable. On the other hand, the rank stable matrix in Example 5 has just enough zeroes in $A_{12}$ and $A_{21}$. This example leads to the following conclusion.

Theorem 8. Let $A$ be rank stable, with $A_{21}, A_{12} \neq 0$ and $A(:,[1:r])$ as in (4). Suppose that there are exactly $q_j$ nonzero elements in $a_j$, for $j = r+1, \dots, n$. Then $s \geq q_j$, for all $j = r+1, \dots, n$.

Proof. First note that $A_{22} = 0$ implies $q_j \leq r$, for $j = r+1, \dots, n$. Let $\delta A$ be any sufficiently small componentwise relative perturbation of $A$. From Definition 1 it is obvious that $A + \delta A$ is also rank stable, and has the same zero pattern as $A$. Therefore, Theorem 7 holds for $A + \delta A$ with the same value of $s$. Every nonzero element of $a_j$ can be perturbed independently, so all (sufficiently small) relative perturbations of $a_j$ form a $q_j$-dimensional ball around $a_j$, which cannot be spanned by fewer than $q_j$ vectors.

Example 7. It is interesting that we can have $s > q_j$, for all $j > r$, without $A$ being trivially rank stable. For example,
$$A = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
is rank stable, with $\operatorname{rank}(A) = 3$, since $\det(A) = \det(B_{44}) = 0$ has only trivially vanishing terms. We have $q_4 = 1$, $s = 2$, and $a_4 = a_1 - a_2$ is a nontrivial linear combination of two columns. This is also true for all small relative perturbations $a_4 + \delta a_4$, which form a 1-dimensional ball around $a_4$:
$$a_4 + \delta a_4 = \begin{pmatrix} 1 + \varepsilon_{14} \\ 0 \\ 0 \\ 0 \end{pmatrix} = (1 + \varepsilon_{14})\, a_1 - (1 + \varepsilon_{14})\, a_2,$$
but $a_4 + \delta a_4$ is still spanned by $a_1$ and $a_2$. The same remains true even if we allow small relative perturbations of the whole matrix.

Finally, we can now prove that $B_{ij}$ contains at least $r+1$ zeroes among the $2r+1$ elements in its last row and column, together.

Corollary 2. Let $A$ be a rank stable rank deficient matrix, partitioned as in (1). Each submatrix $B_{ij} = A([1:r,\, i],[1:r,\, j])$ from (2) has at least $r+1$ zeroes which lie either in its last row or in its last column, for $i = r+1, \dots, m$, and $j = r+1, \dots, n$.

Proof. From Corollary 1 we know that $A_{22} = 0$, so each $B_{ij}$ has at least one zero element at position $(r+1, r+1)$. If $A_{12} = 0$, the last column of $B_{ij}$ is equal to zero. Likewise, if $A_{21} = 0$, the last row of $B_{ij}$ is zero. In both cases, the claim follows trivially. Otherwise, suppose that $a_j$ contains exactly $q_j$ nonzero elements. Since $A_{22} = 0$, these nonzeroes have to be in the upper part $a_j(1:r)$ of $a_j$. In other words, the last column of $B_{ij}$ contains exactly $(r - q_j) + 1$ zeroes (including the last element). Theorem 8 implies $s \geq q_j$, so the last row of $B_{ij}$ (without the last element) contains at least $s$ zeroes. All together, the last row and column of $B_{ij}$ have at least $(r - q_j) + 1 + s \geq (r - q_j) + 1 + q_j = r + 1$ zero elements.
NUMERICAL SIMULATIONS OF WATER WAVE PROPAGATION AND FLOODING Luka Sopta University of Rijeka 51000 Rijeka, Vukovarska 58, Croatia
[email protected]
Nelida Črnjarić-Žic University of Rijeka 51000 Rijeka, Vukovarska 58, Croatia
[email protected]
Senka Vuković University of Rijeka 51000 Rijeka, Vukovarska 58, Croatia
[email protected]
Danko Holjević Croatian Waters 51000 Rijeka, Đure Šporera 3, Croatia
[email protected]
Jerko Škifić University of Rijeka 51000 Rijeka, Vukovarska 58, Croatia
[email protected]
Siniša Družeta University of Rijeka 51000 Rijeka, Vukovarska 58, Croatia
[email protected]
Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 293–304. © 2005 Springer. Printed in the Netherlands.
Abstract
In this paper we present the main points in the process of applying numerical schemes for hyperbolic balance laws to water wave propagation and flooding. The appropriate mathematical models are the one-dimensional open channel flow equations and the two-dimensional shallow water equations. Therefore good simulation results can only be obtained with well-balanced numerical schemes, such as the ones developed by Bermúdez and Vázquez, Hubbard and García-Navarro, LeVeque, etc., as well as the ones developed by the authors of this paper. We also propose a modification of the well-balanced Q-scheme for the two-dimensional shallow water equations that solves the wetting and drying problem. Finally, we present numerical results for three simulation tasks: the CADAM dam break experiment, the water wave propagation in the Toce river, and the catastrophic dam break on the Malpasset river.
Keywords: open channel flow, shallow water, hyperbolic balance laws, well-balanced schemes, wetting and drying problem.
1. Introduction
The foundation for the numerical simulation is an appropriate mathematical model for the physical phenomena under consideration. In the case of water wave propagation and flooding, the one-dimensional open channel flow equations

∂A/∂t + ∂Q/∂x = 0
∂Q/∂t + ∂/∂x (Q²/A + g I1) = g (I2 − A dz/dx) − g A M²Q|Q| / (A² (A/P)^{4/3})   (1)

and the two-dimensional shallow water equations

∂h/∂t + ∂(h v1)/∂x1 + ∂(h v2)/∂x2 = 0
∂(h v1)/∂t + ∂(h v1² + ½ g h²)/∂x1 + ∂(h v1 v2)/∂x2 = g h (−∂z/∂x1 − M²|v| v1 / h^{4/3})
∂(h v2)/∂t + ∂(h v1 v2)/∂x1 + ∂(h v2² + ½ g h²)/∂x2 = g h (−∂z/∂x2 − M²|v| v2 / h^{4/3})   (2)

are used. In (1), t is the time, x is the space coordinate, A = A(x, t) is the wetted cross-section area, Q = Q(x, t) is the discharge, z is the riverbed elevation, M is the Manning friction coefficient, P is the wetted perimeter, and I1 and I2 denote the hydrostatic pressure terms. In (2), h is the water depth, v = (v1, v2) is the velocity, and z = z(x1, x2) is the riverbed elevation.
two-dimensional and then the two models can be connected through appropriate boundary conditions. Also, various other boundary conditions necessary for the simulation of sources, junctions, weirs, dams, etc. must be introduced. Therefore, in applications physical domains are covered with a non-trivial net of one-dimensional and two-dimensional mathematical models connected through numerous boundary conditions.

The next level in the simulations is the application of the numerical schemes. Both mathematical models (1) and (2) are actually hyperbolic systems of balance laws, so only appropriate numerical schemes can be used. Additionally, system (1) has a spatially variable flux, and both systems (1) and (2) have a source term of the geometrical type. These two additional difficulties are not usually covered by general-case numerical schemes for hyperbolic conservation laws, and only numerical schemes that are well-balanced are acceptable. Bermúdez and Vázquez in [3] introduced the concept of the conservation property (C-property) for the one-dimensional shallow water equations, i.e., they pointed out that source terms must be upwinded and balancing between the flux gradient and the source term must be obtained. Greenberg and LeRoux in [12] constructed and proved convergence of a well-balanced scheme for a balance law with a source term which is, for example, the prototype for shallow water equations. As Gosse explains in [14], the class of well-balanced schemes in the sense of Greenberg and LeRoux strictly contains schemes with the C-property. Furthermore, Bermúdez et al.
applied the concept of the C-property to the two-dimensional shallow water equations [2]; Bristeau and Perthame created kinetic schemes for the model of transport of a pollutant or temperature in shallow waters that also have the capability to preserve the steady state of still water [4]; Perthame and Simeoni in [23] showed analytically and numerically that their well-balanced kinetic scheme for shallow waters, in addition to the C-property, has the capability to preserve the nonnegativity of the height of the water and to satisfy an entropy inequality; LeVeque introduced a quasi-steady wave propagation algorithm in [21]; Jenny and Müller modified a Rankine–Hugoniot–Riemann solver for the presence of source terms [18]; Jin constructed a steady-state capturing method combining Godunov or Roe type upwinding with source term balancing [19]; Chinnaya and LeRoux developed a new well-balanced general Riemann solver for the shallow water equations [8]; Gosse created well-balanced Godunov-type schemes for conservation laws with source terms [13], [14], based on a nonconservative reformulation of the right-hand side [10], [20] and on modified homogeneous Riemann problems where the sources are included in generalized jump relations; and so forth. Regarding the second difficulty, i.e., the spatially dependent flux, Towers in [24] proves convergence of a simple difference scheme for conservation laws with a spatially varying source term and a spatially varying flux. However, the discussed case consists of a flux in the form of a product of a positive coefficient
dependent on the space variable and a function dependent on the state variables, so it does not apply to the open channel flow equations. On the other hand, Vázquez in [25], Hubbard and García-Navarro in [17], García-Navarro and Vázquez-Cendón in [11] and Burguete and García-Navarro in [7] treated this additional problem; papers [25], [17], [11] are restricted to open channel flows with rectangular geometries, and only [7] gives results regarding the general case. In particular, in [27] we solved this general case with appropriately modified upwind schemes, the Q-scheme and the flux limited scheme. Furthermore, Bale et al. [1] deal with balance laws with spatially variable flux functions, give a new version of the wave propagation method and apply it to elastic wave equations. We must emphasize that all the mentioned schemes for balance laws of the discussed type are at most of the second order of accuracy. A very well known set of high order and shock capturing schemes are the essentially nonoscillatory (ENO) and the weighted essentially nonoscillatory (WENO) schemes. ENO schemes were developed by Harten et al. [15], [16], and WENO schemes were first proposed by Liu et al. [22]. These schemes were originally designed for hyperbolic conservation laws with autonomous, i.e., not spatially variable, flux functions. In [26] we extended the finite difference version of these schemes to new schemes for hyperbolic balance laws with a geometrical source term and applied them successfully to the shallow water equations, i.e., we obtained schemes that are well-balanced and, moreover, exactly consistent with quiescent flow. We proposed a similar WENO extension for the sediment transport equations in [9].
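As an illustration of the nonlinear weighting that makes these schemes essentially non-oscillatory, the following generic textbook fragment (our sketch, not code from the authors' solver) computes the classical fifth-order WENO weights of Jiang–Shu type: for smooth data the weights approach the optimal values (0.1, 0.6, 0.3), while a substencil crossing a discontinuity gets a weight near zero.

```python
def weno5_weights(f, eps=1e-6):
    """WENO-JS nonlinear weights for reconstructing f_{i+1/2} from the
    five values f = (f_{i-2}, f_{i-1}, f_i, f_{i+1}, f_{i+2})."""
    fm2, fm1, f0, fp1, fp2 = f
    # Jiang-Shu smoothness indicators of the three 3-point substencils
    b0 = 13/12*(fm2 - 2*fm1 + f0)**2 + 1/4*(fm2 - 4*fm1 + 3*f0)**2
    b1 = 13/12*(fm1 - 2*f0 + fp1)**2 + 1/4*(fm1 - fp1)**2
    b2 = 13/12*(f0 - 2*fp1 + fp2)**2 + 1/4*(3*f0 - 4*fp1 + fp2)**2
    d = (0.1, 0.6, 0.3)                      # optimal linear weights
    a = [d[k]/(eps + b)**2 for k, b in enumerate((b0, b1, b2))]
    s = sum(a)
    return [ak/s for ak in a]
```

For a step profile such as (0, 0, 0, 1, 1) essentially all the weight falls on the leftmost smooth substencil, which is what suppresses spurious oscillations at shocks.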
2. Wetting and drying
Furthermore, in simulations of water wave propagation and flooding a very significant problem appears: the problem of wetting and drying. Several ideas on how to deal with this problem can be found in [5], where the Q-scheme is considered, and in [6], where the wetting/drying front treatment is combined with some multidimensional upwinding techniques. We developed a different approach, which we explain here on the application of the balanced Q-scheme to the two-dimensional shallow water equations (2). The Q-scheme is a conservative scheme of the form

u_i^{n+1} = u_i^n − (Δt/|Ci|) Σ_{j∈Ki} lij F_{ij}^n + (Δt/|Ci|) Σ_{j∈Ki} |Tij| g_{ij}^n.   (3)

Here u_i^n is the average value of the solution in the cell Ci at time t = t^n, Ki is the set of indices of all cells neighboring Ci, |Ci| is the area of Ci, |Tij| is the area of the part Tij of the cell Ci, lij is the length of the edge Γij between Ci and Cj, Δt is the time step, and nij is the unit vector normal to Γij. The term F_{ij}^n is the numerical flux through the edge Γij, while g_{ij}^n is the approximation to the
source term. The numerical flux for the Q-scheme is given by

F_{ij}^n = ½ (F_i^n · nij + F_j^n · nij) − ½ |Q_{ij}^n| (u_j^n − u_i^n),   (4)

which in combination with the formula for the source term evaluation

g_{ij}^n = (I − |Q_{ij}^n| (Q_{ij}^n)^{-1}) G_{ij}^n   (5)

leads to a numerical scheme that includes balancing between the flux gradient and the source term. Here Q_{ij}^n = A(u_{ij}^n), where u_{ij}^n denotes the average state (Roe or van Leer) of the states u_i^n and u_j^n. G_{ij}^n is a numerical approximation of the source term at the state u_{ij}^n, and for the two-dimensional shallow water equations it is

G_{ij}^n = ( 0,  −g h_{ij}^n (lij (zj − zi) / (2|Tij|)) (nij)_x,  −g h_{ij}^n (lij (zj − zi) / (2|Tij|)) (nij)_y )^T.   (6)
The second part of the source term, which models friction losses, is evaluated pointwise. The described numerical procedure can be applied to the cells of the wetted domain where the hyperbolic balance law is valid. The problem arises on the moving boundaries of the wetted domain, which consist, in the numerical sense, of cells that are not completely surrounded by other wet cells. We propose first to include the whole domain in the computations, treating the moving boundaries as wetting fronts included in the ordinary numerical procedure, with the dry cells having zero depth. This approach gives very good results when the wetting front advances over a flat or downward-sloping bed, while spurious velocities arise along the wet/dry contour when it advances over an adversely sloped riverbed. In such a situation some adaptation of the numerical scheme is necessary. The correction we propose must be done on the cells with partially wet bed slopes. The crucial property we want the corrected numerical scheme to retain is the global balancing between the flux gradient and the source term. For the sake of simplicity, let us consider just the quiescent steady state in the presence of a wet/dry boundary on which the balancing condition is not satisfied. Precisely, the problematic situations that need special treatment appear inside a cell Ci that has some neighboring dry cell Cj, and they can be numerically identified by the conditions

hi > 0,  hj = 0  and  hi + zi < zj.   (7)
Table 1. Computed propagation times, Toce river

one-dimensional models            two-dimensional models
author                   time     author                   time
Ahmed, HR Wallingford     36s     Alcrudo, UT Zaragoza      54s
Villanueva, UT Zaragoza   79s     Goutal, EDF               54s
Goutal, EDF               42s     Nujic, UBW Munchen        54s
Paquier, CEMAGREF         56s     Paquier, CEMAGREF         51s
Soares, UC Louvain        36s     Soares, UC Louvain        58s
Sopta                     38s     Sopta                     50s
                                  Szydlowski, TU Gdansk     50s
To ensure the quiescent steady state on that problematic edge Γij we propose the correction of (6)

G_{ij}^n = ( 0,  −g h_{ij}^n (lij hi / (2|Tij|)) (nij)_x,  −g h_{ij}^n (lij hi / (2|Tij|)) (nij)_y )^T.   (8)
This redefinition of the numerical source term can be understood as an instantaneous repositioning of the cell boundary to the real fluid boundary. Additionally, we propose to reduce the local velocities to zero. Although we focus here only on preserving the still water state, the presented wetting/drying front treatment gives very good results for unsteady flows as well.
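The switch between (6) and (8) is a purely local decision per edge. A minimal sketch of that decision (our illustration; the function name is ours, and in the full scheme the result multiplies −g h_ij lij/(2|Tij|) as in (6) and (8)):

```python
def source_elevation_increment(h_i, h_j, z_i, z_j):
    """Effective elevation difference used in the interface source G_ij.

    Implements the wet/dry correction (7)-(8): when cell j is dry and its
    bed lies above the free surface of cell i, the geometric source is
    built from h_i instead of z_j - z_i (and, separately, the local
    velocities are reduced to zero).
    """
    if h_i > 0.0 and h_j == 0.0 and h_i + z_i < z_j:
        return h_i          # corrected source, eq. (8)
    return z_j - z_i        # ordinary balanced source, eq. (6)
```

Note that a dry neighbor whose bed is below the free surface is a genuine wetting front and keeps the ordinary source (6).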
3. Simulations
In this section we present several simulations of water wave propagation and flooding. All the computations are executed in the software we developed. Most of the state of the art numerical schemes, as well as the numerical schemes we developed, are implemented in that software, which gives us the opportunity to test and compare various methods and approaches. The software code is object oriented and written in C++. In the software we use OpenGL for the graphical presentation of the computational results. All the results presented in this section are obtained by using our wetting/drying modification of the balanced Q-scheme for the two-dimensional shallow water model. The first numerical simulation we present is the CADAM dam break experiment. The physical model on which the experiment was executed is shown in Fig. 1. One view of the computed water level at the end of the propagation is presented in Fig. 2. The comparison between measurement and computation at the gauge point placed just at the channel corner is shown in Fig. 3. The second presented problem is the problem of water wave propagation in the Toce river. For the Toce river the CADAM group constructed the physical model (Fig. 4), and measurements at several gauge points were then conducted.
Table 2. Highest water levels, Malpasset river

point   measurement   computation
P1      79.15 m       78.80 m
P2      87.20 m       86.54 m
P3      54.90 m       53.21 m
P4      64.70 m       58.60 m
P5      51.10 m       48.12 m
P6      43.75 m       48.10 m
P7      44.35 m       46.00 m
P8      38.60 m       33.79 m
P9      31.90 m       32.63 m
P10     40.75 m       36.73 m
P11     24.15 m       20.52 m
P12     24.90 m       30.00 m
P13     17.25 m       19.93 m
P14     20.70 m       17.47 m
P15     18.60 m       18.00 m
P16     17.25 m       16.95 m
P17     14.00 m
The computed water level in our simulations is shown in Fig. 5. Table 1 compares the computational results obtained by different authors, including ours, for the propagation time from gauge point P1 to P26; the measured value for that time is 40 seconds. The last presented simulations are of the dam break on the Malpasset river. This dam break was not an experiment on a physical model but an actual event that happened in 1959. The geometry of the domain and the triangulation used in our computations, as well as in those of other authors, are shown in Fig. 6. Fig. 7 presents the computed water level at two different propagation moments. Measured and computed highest water levels at different measurement points along the Malpasset riverbed are given in Table 2. The measured data were collected by the local police during the catastrophic event and are therefore certainly of limited accuracy.
Figure 1. Physical model for the CADAM dam break experiment
Figure 2. Computed water level, CADAM dam break experiment
Figure 3. Measurement vs. computation, the CADAM dam break experiment
Figure 4. Physical model for the Toce river
Figure 5. Computed water level, Toce river
Figure 6. Triangulation and gauge points for the Malpasset river
Figure 7. Computed water level for different propagation times
References

[1] D. S. Bale, R. J. LeVeque, S. Mitran, and J. A. Rossmanith, A wave propagation method for conservation laws and balance laws with spatially varying flux functions, preprint (2001).
[2] A. Bermúdez, A. Dervieux, J. A. Désidéri and M. E. Vázquez, Upwind schemes for the two-dimensional shallow water equations with variable depth using unstructured meshes, Comput. Methods Appl. Mech. Eng. 155, 49 (1998).
[3] A. Bermúdez and M. E. Vázquez, Upwind methods for hyperbolic conservation laws with source terms, Comput. Fluids 23(8), 1049 (1994).
[4] M. O. Bristeau and B. Perthame, Transport of pollutant in shallow water using kinetic schemes, ESAIM Proceedings Vol. 10 - CEMRACS, 9 (1999).
[5] P. Brufau, M. E. Vázquez-Cendón, P. García-Navarro, A numerical model for the flooding and drying of irregular domains, Int. J. Numer. Meth. Fluids 39, 247-275 (2002).
[6] P. Brufau, P. García-Navarro, Unsteady free surface flow simulation over complex topography with a multidimensional upwind technique, J. Comput. Phys. 186, 503-526 (2003).
[7] J. Burguete and P. García-Navarro, Efficient construction of high-resolution TVD conservative schemes for equations with source terms: application to shallow water flows, Int. J. Numer. Meth. Fluids 37 (2001), doi: 10.1002/fld.175.
[8] A. Chinnaya and A.-Y. LeRoux, A new general Riemann solver for the shallow water equations, with friction and topography, http://www.math.ntnu.no/conservation/1999/021.html.
[9] N. Crnjaric-Zic, S. Vukovic, and L. Sopta, Extension of ENO and WENO schemes to one-dimensional sediment transport equations, Comput. Fluids, article in press.
[10] G. Dal Maso, P. G. LeFloch and F. Murat, Definition and weak stability of a nonconservative product, J. Math. Pures Appl. 74, 483 (1995).
[11] P. García-Navarro and M. E. Vázquez-Cendón, On numerical treatment of the source terms in the shallow water equations, Comput. Fluids 29, 951 (2000).
[12] J. M. Greenberg and A.-Y. LeRoux, A well-balanced scheme for the numerical processing of source terms in hyperbolic equations, SIAM J. Numer. Anal. 33, 1 (1996).
[13] L. Gosse, A well-balanced flux-vector splitting scheme designed for hyperbolic systems of conservation laws with source terms, Comput. Math. Appl. 39, 135 (2000).
[14] L. Gosse, A well-balanced scheme using non-conservative products designed for hyperbolic systems of conservation laws with source terms, Math. Models Methods Appl. Sci. 11, 339 (2001).
[15] A. Harten and S. Osher, Uniformly high-order accurate non-oscillatory schemes, I, SIAM J. Numer. Anal. 24, 279 (1987).
[16] A. Harten, B. Engquist, S. Osher, and S. R. Chakravarthy, Uniformly high-order accurate non-oscillatory schemes, III, J. Comput. Phys. 71, 231 (1987).
[17] M. E. Hubbard and P. García-Navarro, Flux difference splitting and the balancing of source terms and flux gradients, J. Comput. Phys. 165 (2000), doi: 10.1006/jcph.2000.6603.
[18] P. Jenny and B. Müller, Rankine–Hugoniot–Riemann solver considering source terms and multidimensional effects, J. Comput. Phys. 145, 575 (1998).
[19] S. Jin, A steady-state capturing method for hyperbolic systems with geometrical source terms, Math. Model. Num. Anal. 35, 631 (2001).
[20] P. G. LeFloch and A. E. Tzavaras, Representation of weak limits and definition of nonconservative products, SIAM J. Math. Anal. 30, 1309 (1999).
[21] R. J. LeVeque, Balancing source terms and flux gradients in high-resolution Godunov methods: the quasi-steady wave-propagation algorithm, J. Comput. Phys. 146, 346 (1998).
[22] X.-D. Liu, S. Osher, and T. Chan, Weighted essentially non-oscillatory schemes, J. Comput. Phys. 115, 200 (1994).
[23] B. Perthame and C. Simeoni, A kinetic scheme for the Saint-Venant system with a source term, CALCOLO 38(4), 201 (2001).
[24] J. D. Towers, A difference scheme for conservation laws with a discontinuous flux: the nonconvex case, SIAM J. Numer. Anal. 39, 1197 (2001).
[25] M. E. Vázquez-Cendón, Improved treatment of source terms in upwind schemes for the shallow water equations in channels with irregular geometry, J. Comput. Phys. 148, 497 (1999).
[26] S. Vukovic and L. Sopta, ENO and WENO schemes with the exact conservation property for one-dimensional shallow water equations, J. Comput. Phys. 179, 593 (2002), doi: 10.1006/jcph.2002.7076.
[27] S. Vukovic and L. Sopta, Upwind schemes with exact conservation property for one-dimensional open channel flow equations, SIAM J. Sci. Comput. 24(5), 1630 (2003).
DERIVATION OF A MODEL OF LEAF SPRINGS Josip Tambača Department of Mathematics, University of Zagreb, Bijenička 30, 10000 Zagreb, Croatia
[email protected]
Abstract
The behavior of springs in linearized theory is expressed by the stiffness coefficient of the spring. In this paper we derive the stiffness coefficient for leaf springs in terms of the geometric properties of the spring and the elastic properties of the material the spring is made of.
Keywords: leaf springs, stiffness coefficient, elasticity, rod model.
1. Introduction
The linearized behavior of springs is expressed through the relation F = ku, where F is the force applied at the spring end, u is the displacement it produces and k is the stiffness coefficient. Therefore the behavior of a spring is determined by k. It depends on the geometry (number of leaves, length, width, height and shape) and on the material (elasticity constants of the material). Once the spring is made, the stiffness coefficient is easily measured. The purpose of this paper is to determine the coefficient (approximately) in advance, knowing the material and geometric properties of the leaf spring. We consider the leaf spring of length 2ℓ > 0 as the union of n leaves of lengths 2ℓ_i, where ℓ_i = iℓ/n, i = 1, . . . , n, and apply equal forces at the ends of the longest leaf. We suppose the spring is supported in the middle.
Figure 1. The multileaf spring
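For orientation, recall the standard single-leaf benchmark (a classical Euler–Bernoulli result, not derived in this paper): a clamped leaf of length ℓ, Young's modulus E and cross-sectional moment of inertia I, loaded by a tip force F, has

```latex
% classical cantilever: tip load F, tip deflection u(\ell)
u(\ell) = \frac{F\ell^{3}}{3EI}
\qquad\Longrightarrow\qquad
k = \frac{F}{u(\ell)} = \frac{3EI}{\ell^{3}} .
```

The model derived below generalizes this to n leaves of graded lengths.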
Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 305–315. © 2005 Springer. Printed in the Netherlands.
In the case of one straight rod this problem can be replaced by the problem defined on one half of the rod, with the clamping boundary condition at the middle. We do the same here and consider the problem on the domains in Figure 2, with the clamping boundary condition at the left end.
Figure 2. A half of the spring
The idea of considering the leaf spring as a union of straight (or curved) rods is to apply the techniques developed for the modelling of rods. The obstacle appears in the transmission conditions at the contacts of the rods and in the construction of test functions satisfying them.
2. Geometry of straight multileaf springs
In this section we define the multileaf spring with n leaves of length ℓ and thickness ε > 0. Let a, b > 0. Let Φ_ε^i : [0, ℓ_i] → R³ be defined by

Φ_ε^i(s) = (s, 2(i − 1)εa, 0)^T.

Then Φ_ε^i is the natural parametrization of a straight line. The straight lines defined by Φ_ε^i will be the middle curves of the straight rods. For their cross-sections we use S = [−a, a] × [−b, b], S_ε = εS. Then the sets

Ω^i = [0, ℓ_i] × S,  Ω_ε^i = [0, ℓ_i] × S_ε,  i = 1, . . . , n,

have cross-sections B^i(s) = {s} × S, B_ε^i(s) = {s} × S_ε, i = 1, . . . , n. Throughout the paper e1, e2, e3 denote the canonical basis of R³. Then using the mappings

P_ε^i : Ω_ε^i → R³,  P_ε^i(y) = Φ_ε^i(y1) + y2 e2 + y3 e3,  i = 1, . . . , n,

we define

B̃_ε^i(s) = P_ε^i(B_ε^i(s)),  Ω̃_ε^i = P_ε^i(Ω_ε^i),  i = 1, . . . , n,  Ω̃_ε = ∪_{i=1}^n Ω̃_ε^i.   (2.1)
Each set Ω̃_ε^i we call a straight rod. Therefore Ω̃_ε is a union of n straight rods. Note that Ω̃_ε is a connected set. The contact of the neighboring rods is given by

C̃_ε^i = Ω̃_ε^i ∩ Ω̃_ε^{i+1},  i = 1, . . . , n − 1.

Remark 1. For a point x ∈ C̃_ε^i one has P_ε^i(y^i) = x = P_ε^{i+1}(y^{i+1}). The connection between y^i and y^{i+1} is obvious:

y1^i = y1^{i+1},  y2^i = εa,  y2^{i+1} = −εa,  y3^i = y3^{i+1}.

Let r : R³ → R³, r(y1, y2, y3) = (y1, −y2, y3)^T, and

C_ε^i = [0, ℓ_i] × {εa} × [−εb, εb],  C̃_ε^i = P_ε^i(C_ε^i).

Then P_ε^i|_{C_ε^i} = P_ε^{i+1} ∘ r|_{C_ε^i}. We also denote C^i = [0, ℓ_i] × {a} × [−b, b].
3. 3D elasticity problem
We consider Ω̃_ε to be a linearized isotropic homogeneous elastic body with the Lamé coefficients λ and μ, and denote Aσ = λ (tr σ) I + 2μσ, σ ∈ Sym(R³, R³). We assume that the elastic body is clamped at all bases B̃_ε^i(0), i = 1, . . . , n, that a contact force with surface density F̃^ε = −F̃^ε e2 is applied at the base B̃_ε^n(ℓ), and that the rest of the boundary is force free. Then the equilibrium displacement Ũ^ε of the body Ω̃_ε belongs to the function space

V(Ω̃_ε) = { Ṽ ∈ H¹(Ω̃_ε; R³) : Ṽ|_{B̃_ε^i(0)} = 0, i = 1, . . . , n },

which is a Hilbert space equipped with the scalar product of H¹(Ω̃_ε; R³). The equilibrium displacement Ũ^ε is the unique solution of the following variational equation: find Ũ^ε ∈ V(Ω̃_ε) such that

∫_{Ω̃_ε} Ae(Ũ^ε) · e(Ṽ) dV = ∫_{B̃_ε^n(ℓ)} F̃^ε · Ṽ dS,  Ṽ ∈ V(Ω̃_ε).   (3.1)

The existence and uniqueness of the weak solution of (3.1) is mainly a consequence of the Korn inequality and the Lax–Milgram lemma. We want to study the behavior of the solution Ũ^ε of (3.1) as ε tends to zero, to obtain a one-dimensional model of leaf springs. In order to perform this asymptotic analysis we need to change the coordinates in the equation.
4. The problem in ε-independent domain

Let us denote Ṽ^i = Ṽ|_{Ω̃_ε^i}, i = 1, . . . , n, Ṽ⁰ = (Ṽ¹, . . . , Ṽⁿ) and

V(Ω̃_ε^i) = { Ṽ ∈ H¹(Ω̃_ε^i; R³) : Ṽ|_{B̃_ε^i(0)} = 0 },  i = 1, . . . , n,

Ṽ⁰_ε = { Ṽ⁰ ∈ (V(Ω̃_ε^1), . . . , V(Ω̃_ε^n)) : Ṽ^i|_{C̃_ε^i} = Ṽ^{i+1}|_{C̃_ε^i}, i = 1, . . . , n − 1 }.

Then Ṽ ∈ V(Ω̃_ε) ⟺ Ṽ⁰ ∈ Ṽ⁰_ε. Using (2.1) we can split the integral over Ω̃_ε into the sum of n integrals over the straight rods. Therefore (3.1) becomes: find Ũ⁰_ε ∈ Ṽ⁰_ε such that

Σ_{i=1}^n ∫_{P_ε^i(Ω_ε^i)} Ae(Ũ_ε^i) · e(Ṽ^i) dV = ∫_{P_ε^n(B_ε^n(ℓ))} F̃^ε · Ṽ^n dS,  Ṽ⁰ ∈ Ṽ⁰_ε.   (4.1)

Now we rewrite each integral on the left-hand side of this formula in local coordinates, using the parametrization by P_ε^i. We introduce the notation

U_ε^i = Ũ_ε^i ∘ P_ε^i,  V^i = Ṽ^i ∘ P_ε^i,  F_ε = F̃^ε ∘ P_ε^n,  i = 1, . . . , n,

on Ω_ε^i and denote V⁰ = (V¹, . . . , Vⁿ), U⁰_ε = (U_ε^1, . . . , U_ε^n). The function space corresponding to V(Ω̃_ε^i) is the space V(Ω_ε^i) = { V ∈ H¹(Ω_ε^i; R³) : V|_{B_ε^i(0)} = 0 }, while the space corresponding to Ṽ⁰_ε is

V⁰_ε = { V⁰ ∈ (V(Ω_ε^1), . . . , V(Ω_ε^n)) : V^i|_{C_ε^i} = V^{i+1} ∘ r|_{C_ε^i}, i = 1, . . . , n − 1 }.

Then the equilibrium equation (4.1) reads as follows: find U⁰_ε ∈ V⁰_ε such that

Σ_{i=1}^n ∫_{Ω_ε^i} Ae(U_ε^i) · e(V^i) dV = ∫_{B_ε^n(ℓ)} F_ε · V^n dS,  V⁰ ∈ V⁰_ε.   (4.2)
˜ 0ε and U0ε are posed on ε–dependent domains. Now we Problems for both U transform the problem (4.2) to ε–independent domain. As a consequence, the coefficients of the resulting weak formulation will depend on ε explicitly. Let Rε : Ωn → Ωnε be defined by Rε (z) = (z1 , εz2 , εz3 )T , z ∈ Ωn , ε ∈ (0, ε0 ). To the functions U iε , F ε defined on Ωiε we associate the functions ui (ε), f (ε)
Figure 3.
Change of varibles
defined on Ωi by composition with Rε for i = 1, . . . , n. Note that Rε ◦ r =
309
Derivation of a model of leaf springs
r ◦ Rε . Let V(Ωi ) = {v ∈ H 1 (Ωi ; R3 ) : v|B i (0) = 0} and v0 = (v 1 , . . . , v n ), u0 (ε) = (u1 (ε), . . . , un (ε)). Moreover we define V 0 (ε) = v0 ∈ (V(Ω1 ), · · · , V(Ωn )) : v i |C i = v i+1 ◦ r|C i , i = 1, . . . , n − 1 , We also denote γ ε (v) = 1ε γ z (v) + γ y (v) where ⎛ ⎞ ∂1 v1 21 ∂1 v2 12 ∂1 v3 γ y (v) = ⎝ 12 ∂1 v2 0 0 ⎠, 1 0 0 2 ∂1 v3 ⎛ ⎞ 1 1 0 2 ∂2 v1 2 ∂3 v1 1 ⎠. γ z (v) = ⎝ 12 ∂2 v1 ∂2 v2 2 (∂2 v3 + ∂3 v2 ) 1 1 ∂3 v3 2 ∂3 v1 2 (∂2 v3 + ∂3 v2 ) Then (4.2) can be written by: find u 0 (ε) ∈ V 0 (ε) such that ! n ! ε i ε i Aγ (u (ε)) · γ (v ) dV = f (ε) · v n dS, v0 ∈ V 0 (ε). (4.3) i=1
Ωi
B n ()
For the purpose of asymptotic analysis we assume f (ε) = ε 2 f = ε2 f e2 .
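The scaled strain γ^ε is exactly the symmetrized gradient seen through the change of variables: if ṽ = v ∘ R_ε^{-1}, then e(ṽ) ∘ R_ε = (1/ε)γ_z(v) + γ_y(v). A small sketch (our own check, using a linear displacement field v(z) = Bz so that all derivatives are exact constants) verifies this identity componentwise:

```python
def gamma_eps(B, eps):
    """(1/eps)*gamma_z(v) + gamma_y(v) for the linear field v(z) = B z,
    where B[m][k] = d v_m / d z_k is a constant gradient."""
    gy = [[B[0][0],     0.5*B[1][0], 0.5*B[2][0]],
          [0.5*B[1][0], 0.0,         0.0],
          [0.5*B[2][0], 0.0,         0.0]]
    gz = [[0.0,         0.5*B[0][1],              0.5*B[0][2]],
          [0.5*B[0][1], B[1][1],                  0.5*(B[1][2] + B[2][1])],
          [0.5*B[0][2], 0.5*(B[1][2] + B[2][1]),  B[2][2]]]
    return [[gz[i][j]/eps + gy[i][j] for j in range(3)] for i in range(3)]

def strain_after_rescaling(B, eps):
    """Symmetrized gradient of v o R_eps^{-1}, i.e. of x -> B R_eps^{-1} x."""
    Bt = [[B[m][k]/(1.0 if k == 0 else eps) for k in range(3)] for m in range(3)]
    return [[0.5*(Bt[i][j] + Bt[j][i]) for j in range(3)] for i in range(3)]
```

The two matrices agree for every B and ε, which is why the (1/ε²), (1/ε) and O(1) orders of (4.3) can be analyzed separately below.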
5. A priori estimates
We begin with the properties of the coefficients of (4.3). A direct calculation shows that there are constants m_A, M_A > 0, independent of ε ∈ (0, ε0), such that for all z ∈ Ω one has m_A I ≤ A ≤ M_A I. The major tool in proving the a priori estimates is the Korn inequality. We use the result derived for curved rods in [2] and [3].

Lemma 1 (The Korn inequality). There are C_K > 0 and ε0 > 0 such that for all ε, 0 < ε ≤ ε0, one has

‖v‖_{H¹(Ω)³} ≤ (C_K/ε) ‖γ^ε(v)‖_{L²(Ω)⁹},  v ∈ V(Ω).

Now we apply the Korn inequality on each rod Ω^i to obtain that there are ε_K and C_K such that for all ε ∈ (0, ε_K) one has

(m_A/C_K²) Σ_{i=1}^n ‖u^i(ε)‖²_{H¹(Ω^i)³} ≤ m_A Σ_{i=1}^n (1/ε²) ‖γ^ε(u^i(ε))‖²_{L²(Ω^i)⁹}
≤ (1/ε²) Σ_{i=1}^n ∫_{Ω^i} Aγ^ε(u^i(ε)) · γ^ε(u^i(ε)) dV
≤ ‖f‖_{L²(B^n(ℓ))} ‖u^n(ε)‖_{L²(B^n(ℓ))},

where the last inequality follows from (4.3) with v⁰ = u⁰(ε) and f(ε) = ε²f.
The a priori estimates follow: there are ε_K and C such that for all ε ∈ (0, ε_K) one has

‖u^i(ε)‖_{H¹(Ω^i)} ≤ C,  ‖(1/ε) γ^ε(u^i(ε))‖_{L²(Ω^i)} ≤ C,  i = 1, . . . , n.   (5.1)

These a priori estimates and the weak sequential compactness of the unit ball in H¹(Ω^i)³ and L²(Ω^i)⁹ imply the following convergence result. More precisely, there is a sequence in (0, ε_K) converging toward zero, still denoted by ε, and functions u^i ∈ V(Ω^i) and γ^i ∈ L²(Ω^i)⁹, i = 1, . . . , n, such that

u^i(ε) ⇀ u^i weakly in V(Ω^i),   (5.2)
(1/ε) γ^ε(u^i(ε)) ⇀ γ^i weakly in L²(Ω^i)⁹,   (5.3)

as ε tends to zero, for i = 1, . . . , n. The limit functions u^i and γ^i are not independent. The connection is given in the following lemma, which is borrowed from [2] as well. Let G¹₀(0, c) = { v ∈ H¹(0, c) : v(0) = 0 } and G²₀(0, c) = { v ∈ H²(0, c) : v(0) = v′(0) = 0 }.

Lemma 2. Let (ε_n)_{n∈N} be a sequence of positive real numbers such that ε_n → 0. Let a sequence (v(ε_n))_{n∈N} in V(Ω) satisfy

v(ε_n) ⇀ v weakly in H¹(Ω)³,   (5.4)
(1/ε_n) γ^{ε_n}(v(ε_n)) ⇀ γ weakly in L²(Ω)⁹,   (5.5)
as ε_n → 0. Then v ∈ {0} × G²₀(0, ℓ) × G²₀(0, ℓ) and there is ψ ∈ G¹₀(0, ℓ) such that

−v2″ = ∂2γ11,  −v3″ = ∂3γ11,  ψ′ = ∂2γ13 − ∂3γ12.

If the convergence in (5.5) is strong, then the convergence in (5.4) is also strong.

An application of Lemma 2 implies that for each i ∈ {1, . . . , n} there is a function φ^i such that (u^i, φ^i) ∈ {0} × G²₀(0, ℓ_i) × G²₀(0, ℓ_i) × G¹₀(0, ℓ_i) and

−(u2^i)″ = ∂2γ11^i,  −(u3^i)″ = ∂3γ11^i,  (φ^i)′ = ∂2γ13^i − ∂3γ12^i.   (5.6)

The property u⁰(ε) ∈ V⁰(ε) implies u^i(ε)|_{C^i} = u^{i+1}(ε) ∘ r|_{C^i}, i = 1, . . . , n − 1. Using the convergence (5.2) we obtain

u^i|_{C^i} = u^{i+1} ∘ r|_{C^i},  i = 1, . . . , n − 1.

Independence of u^i of the last two variables implies the following result:

Lemma 3. If we denote u^n = u, one has u^i = u|_{[0, ℓ_i]}, i = 1, . . . , n.
6. The first test function
Let b, c, e, g ∈ R and d^i = −2(n − i)ac, h^i = −2(n − i)ag for i = 1, . . . , n, and let q ∈ H¹(0, ℓ) be such that q(0) = 0. Then we define the functions

v^i(ε)(z) = q(z1) [ (½ b z2² − ½ g z3² + c z2 + d^i) e2 + (g z2 z3 + (e + h^i) z3) e3 ].

Then v^i(ε)(z1, a, z3) = v^{i+1}(ε)(z1, −a, z3), so v⁰(ε) = (v¹(ε), . . . , vⁿ(ε)) ∈ Ṽ(ε) is a test function for the equation (4.3). One has

ε γ^ε(v^i(ε)) → q(z1) [ 0  0  0 ; 0  b z2 + c  0 ; 0  0  g z2 + h^i + e ].

We now insert the constructed test function in (4.3) and take the limit ε → 0 to obtain

0 = Σ_{j=1}^n ∫_{ℓ_{j−1}}^{ℓ_j} q(z1) Σ_{i=j}^n ∫_S Aγ^i · [ 0  0  0 ; 0  b z2 + c  0 ; 0  0  g z2 + h^i + e ] dS dz1.

Because q is arbitrary in H¹(0, ℓ), it follows that for a.e. z1 one has

0 = Σ_{i=j}^n ∫_S [ (Aγ^i)22 (b z2 + c) + (Aγ^i)33 (g z2 + h^i + e) ] dS.   (6.1)

The free choice of e implies Σ_{i=j}^n ∫_S (Aγ^i)33 dS = 0 for all j = 1, . . . , n. Therefore h^i can be dropped from (6.1) to obtain

0 = Σ_{i=j}^n ∫_S (Aγ^i)αα z2 dS,  0 = Σ_{i=j}^n ∫_S (Aγ^i)αα dS,  α = 2, 3,   (6.2)
for almost every z1 and j = 1, . . . , n. Let us define functions γ̂22^i, γ̂23^i, γ̂33^i by

γ22^i = −½ (λ/(λ+μ)) γ11^i + γ̂22^i,  γ33^i = −½ (λ/(λ+μ)) γ11^i + γ̂33^i,  γ23^i = γ̂23^i.

Then

(Aγ^i)11 = E γ11^i + λ(γ̂22^i + γ̂33^i),
(Aγ^i)22 = λ(γ̂22^i + γ̂33^i) + 2μ γ̂22^i,
(Aγ^i)33 = λ(γ̂22^i + γ̂33^i) + 2μ γ̂33^i.

Summing the results in (6.2) we obtain

0 = Σ_{i=j}^n ∫_S (γ̂22^i + γ̂33^i) z2 dS,   (6.3)
0 = Σ_{i=j}^n ∫_S (γ̂22^i + γ̂33^i) dS.   (6.4)
7. The second test function

Let us take v⁰ ∈ {0} × G²₀(0, ℓ) × G²₀(0, ℓ) and define

v^{0,i}(s) = v⁰(s),  s ∈ [0, ℓ_i],  i = 1, . . . , n,
v1^{1,i}(s, z2) = −(v2⁰)′(s) z2 + 2(n − i)a (v2⁰)′(s) + C(s),
v2^{1,i} = v3^{1,i} = 0,

for i = 1, . . . , n and C ∈ G¹(0, ℓ). Then one has

v^{0,i} = v^{0,i+1} ∘ r,  v^{1,i} = v^{1,i+1} ∘ r  on C_ε^i,  i = 1, . . . , n − 1.   (7.1)

Let us now define

v^i(ε) = v^{0,i} + ε v^{1,i},  i = 1, . . . , n.   (7.2)

The conditions (7.1) make v⁰(ε) = (v¹(ε), . . . , vⁿ(ε)) a good test function, i.e. it belongs to Ṽ⁰(ε). We calculate (1/ε) γ^ε(v^i(ε)):

(1/ε) γ^ε(v^i(ε)) = (1/ε²) γ_z(v^{0,i}) + (1/ε) (γ_z(v^{1,i}) + γ_y(v^{0,i})) + γ_y(v^{1,i}) + O(ε),

γ_z(v^{0,i}) = 0,  γ_z(v^{1,i}) + γ_y(v^{0,i}) = 0,

γ_y(v^{1,i}) = [ Q1^i(v⁰, C) − (v2⁰)″ z2  ·  · ; 0  0  · ; 0  0  0 ],

where Q1^i(v⁰, C) = 2(n − i)a (v2⁰)″ + C′. Now we insert the test function of the form (7.2) into the variational equation (4.3) and take the limit ε → 0 to obtain

Σ_{i=1}^n ∫_{Ω^i} Aγ^i · γ_y(v^{1,i}) dV = ∫_{B^n(ℓ)} f · v⁰ dS.
Let us denote the integral on the left hand side by I. Then according to (6.3) and (6.4) one gets ! n ! n ! j n i 0 i 0 i I= Q1 (v , C) Eγ11 dS − (v2 ) Eγ11 z2 dS dz1 j=1
j −1
i=j
S
i=j
S
From (5.6) we know that there is a function of z 1 only, denoted by Qi , such that i = Qi − (ui ) z − (ui ) z . Using ui = u we obtain γ11 2 3 2 3 i γ11 = Qi − u 2 z2 − u 3 z3 .
The property of the cross–section S and the form of γ i implies that the limit equation is given by ! n ! j n i 0 i 0 0 E |S| Q1 (v , C)Q +(n−j+1)EII2 u2 (v2 ) dz1 = v2 ()) f dS. j −1 j=1 1
i=j
S
Derivation of a model of leaf springs
The equation for the test function C implies

0 = ∑_{j=1}^n ∫_{ℓ_{j−1}}^{ℓ_j} E |S| C' ∑_{i=j}^n Q^i dz_1 = ∑_{j=1}^n ∫_{ℓ_{j−1}}^{ℓ_j} C' E |S| ∑_{i=j}^n Q^i dz_1.   (7.3)

Let C|_{[ℓ_{j−1}, ℓ_j]} ∈ G_0^1(ℓ_{j−1}, ℓ_j) and let C be constant outside of [ℓ_{j−1}, ℓ_j] to satisfy C ∈ G_0^1(0, ℓ). For such test functions from (7.3) we obtain ∑_{i=j}^n Q^i = 0, j = 1, ..., n, which implies Q^i = 0, i = 1, ..., n.

Theorem 1. The limit function u_2 is the unique solution of the variational problem: find u_2 ∈ G_0^2(0, ℓ) such that

∑_{j=1}^n ∫_{ℓ_{j−1}}^{ℓ_j} (n − j + 1) E I_2 u_2'' (v_2^0)'' dz_1 = v_2^0(ℓ) ∫_S f dS,   v_2^0 ∈ G_0^2(0, ℓ).   (7.4)
Proof. All constants in the form on the left-hand side are positive, so the form itself is G_0^2(0, ℓ)-elliptic. Therefore, the Lax-Milgram lemma implies the existence and uniqueness of the solution of the model.

Remark 2. The number of terms in (7.4) at a point z_1 equals the number of leaves at that point (see Figure 4).
Figure 4. The model
In the sequel we follow the exposition in the case of the curved rods and prove the strong convergence of the whole families with respect to ε and the uniqueness of the limit. Let us define

Λ^ε := ∑_{i=1}^n ∫_{Ω^i} A((1/ε) γ^ε(u^i(ε)) − γ^i) · ((1/ε) γ^ε(u^i(ε)) − γ^i),   ε > 0.

The tensor A is positive definite, so there is a constant C > 0 such that

‖(1/ε) γ^ε(u^i(ε)) − γ^i‖^2_{L^2(Ω^i)^9} ≤ C Λ^ε.   (7.5)

The equation (4.3) implies that in the limit, when ε tends to zero, we obtain

Λ = lim_{ε→0} Λ^ε = u_2(ℓ) ∫_S f dS − ∑_{i=1}^n ∫_{Ω^i} Aγ^i · γ^i.
The definition of A and γ^i after some calculation implies

Λ = −( ∑_{j=1}^n ∫_{ℓ_{j−1}}^{ℓ_j} (n − j + 1) E I_3 (u_3'')^2 + ∑_{i=1}^n ∫_{Ω^i} [ 4µ(γ_{12}^i)^2 + 4µ(γ_{13}^i)^2 + λ(γ_{22}^{Hi} + γ_{33}^{Hi})^2 + 2µ(γ_{22}^{Hi})^2 + 4µ(γ_{23}^{Hi})^2 + 2µ(γ_{33}^{Hi})^2 ] ).

As a limit of nonnegative numbers Λ is nonnegative. Therefore u_3'' = 0,

γ_{12}^i = γ_{13}^i = γ_{22}^{Hi} = γ_{23}^{Hi} = γ_{33}^{Hi} = 0,

and because u_3 ∈ G_0^2(0, ℓ) one has u_3 = 0 as well. Finally, (7.5) implies the strong convergence in (5.3), which implies the strong convergence in (5.2) by Lemma 2.

Theorem 2. Let u_0(ε) be the unique solution of (4.3). Then

u^i(ε) → (0, u_2, 0)^T strongly in H^1(Ω)^3,   (7.6)

(1/ε) γ^ε(u^i(ε)) → diag( −u_2'' z_2, (1/2) (λ/(λ+µ)) u_2'' z_2, (1/2) (λ/(λ+µ)) u_2'' z_2 ) strongly in L^2(Ω)^9,   (7.7)

where u_2 ∈ G_0^2(0, ℓ) is the unique solution of (7.4).

Proof. Uniqueness of the solution of the model (7.4) implies that the limits in (7.6) and (7.7) are unique. Therefore the whole families (w.r.t. ε) converge.
8. The model
To derive the model of leaf springs and the stiffness coefficient we write (7.4) in the differential form. The solution u_2 ∈ G_0^2(0, ℓ) of (7.4) satisfies

u_2'''' = 0 on (ℓ_{j−1}, ℓ_j),   j = 1, ..., n.   (8.1)

The fact u_2 ∈ G_0^2(0, ℓ) implies (here + and − denote the right and left limit)

u_2(ℓ_j−) = u_2(ℓ_j+),   u_2'(ℓ_j−) = u_2'(ℓ_j+),   j = 1, ..., n − 1.   (8.2)

Partial integration also gives the contact conditions

u_2''(ℓ_j−) = ((n−j)/(n−j+1)) u_2''(ℓ_j+),   u_2'''(ℓ_j−) = ((n−j)/(n−j+1)) u_2'''(ℓ_j+),   (8.3)

for j = 1, ..., n − 1. The boundary conditions are

u_2(0+) = 0,   u_2'(0+) = 0,   −E I_2 u_2'''(ℓ_n−) = f,   u_2''(ℓ_n−) = 0.   (8.4)
The solution of the transmission problem (8.1)–(8.4) at the end ℓ is given by

u(ℓ) = ( (2/3) ∑_{k=1}^n 1/k^3 + n^2 − n ) (3 ℓ^3 f)/(8 n^3 E a^3 b),

where I_2 = (4/3) a^3 b. Therefore the stiffness coefficient is given by

k = (8 n^3/3) (E a^3 b/ℓ^3) / ( (2/3) ∑_{k=1}^n 1/k^3 + n^2 − n ).   (8.5)
Asymptotically, when n → ∞, the stiffness behaves like k ≈ (8/3)(n + 1) E a^3 b/ℓ^3.

Figure 5. The asymptotics (the dots are the values 2n^3/((2/3) ∑_{k=1}^n 1/k^3 + n^2 − n), the line is given by 2n + 2)
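These closed-form expressions are easy to check numerically. The Python sketch below (with the arbitrary normalization f = ℓ = E = a = b = 1; none of the numeric choices come from the paper) verifies the n = 1 case against the classical cantilever deflection u(ℓ) = fℓ^3/(3EI_2) and reproduces the comparison of Figure 5:

```python
import math

def deflection(n, f=1.0, ell=1.0, E=1.0, a=1.0, b=1.0):
    """End deflection u(ell) of the n-leaf spring from the solution of the
    transmission problem (8.1)-(8.4)."""
    s = sum(1.0 / k**3 for k in range(1, n + 1))
    return (2.0 / 3.0 * s + n**2 - n) * 3.0 * ell**3 * f / (8.0 * n**3 * E * a**3 * b)

def stiffness(n, ell=1.0, E=1.0, a=1.0, b=1.0):
    """Stiffness coefficient k of (8.5)."""
    return 1.0 / deflection(n, ell=ell, E=E, a=a, b=b)

# n = 1 is a single cantilever with I2 = (4/3) a^3 b: u(l) = f l^3 / (3 E I2)
assert math.isclose(deflection(1), 1.0 / (3.0 * (4.0 / 3.0)))

# Figure 5 comparison: 2 n^3 / ((2/3) sum + n^2 - n) against the line 2n + 2
for n in (2, 5, 10, 20):
    s = sum(1.0 / k**3 for k in range(1, n + 1))
    print(n, round(2.0 * n**3 / (2.0 / 3.0 * s + n**2 - n), 3), 2 * n + 2)
```

Already at n = 10 the dotted quantity is within a few percent of the line, consistent with the stated relative asymptotic error 1/(n + 1).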
Remark 3. In [1] one can find the following procedure for obtaining the stiffness coefficient of leaf springs. Let us consider the straight rod of length ℓ with variable cross-sections of the form S(x) = [−a, a] × [−b_0(1 − x/ℓ), b_0(1 − x/ℓ)] that satisfies the problem:

(E I(x) u''(x))'' = 0 in (0, ℓ),   −(E I u'')'(ℓ) = f,   (E I u'')(ℓ) = 0,   u(0) = 0,   u'(0) = 0.

Then I(x) = (4/3) a^3 b_0 (1 − x/ℓ) and by integration we obtain (with b = b_0/n)

u(ℓ) = (3 ℓ^3 f)/(8 n E a^3 b),   so f = k u(ℓ),   k = (8n/3) E a^3 b/ℓ^3.

Note that the relative asymptotic error of k with respect to (8.5) is 1/(n + 1).
References

[1] D. Bazjanac, Strength of materials, Tehnička knjiga, Zagreb, 1968, in Croatian.
[2] M. Jurak and J. Tambača, Derivation and justification of a curved rod model. Math. Models Methods Appl. Sci. 9 (1999), 991–1014.
[3] M. Jurak and J. Tambača, Linear curved rod model. General curve. Math. Models Methods Appl. Sci. 11 (2001), 1237–1252.
QUANTUM SITE PERCOLATION ON AMENABLE GRAPHS

Ivan Veselić*
Forschungsstipendiat der Deutschen Forschungsgemeinschaft; currently: Department of Mathematics, California Institute of Technology, CA 91125, USA
[email protected],
http://homepage.ruhr-uni-bochum.de/ivan.veselic
Abstract
We consider the quantum site percolation model on graphs with an amenable group action. It consists of a random family of Hamiltonians. Basic spectral properties of these operators are derived: non-randomness of the spectrum and its components, existence of a self-averaging integrated density of states, and an associated trace formula.
Keywords:
integrated density of states, random Schrödinger operators, random graphs, site percolation.
1. Introduction: The Quantum percolation model
The quantum percolation model (QPM) consists of two building blocks which are both well studied in the physics of disordered media. Let us first introduce the classical site percolation model. It is used to model the flow of liquid through porous media, the spreading of forest fires or of diseases, etc. Consider the graph Z^d where two vertices are connected by an edge if their Euclidean distance equals one. Equip each vertex v ∈ Z^d with a random variable q(v) taking the value 0 with probability p and "∞" with probability 1 − p, independently of the random variables at all other vertices. For each configuration of the randomness ω ∈ Ω let V(ω) := {v ∈ Z^d | q(v) = 0}. The percolation problem consists in studying the properties of connected components (called clusters) of V(ω). Typical questions are: With what probability do infinite clusters exist? What is the average vertex number or diameter of a cluster? What is the probability that 0, v ∈ Z^d are in the same cluster? One of the central results of percolation theory is the existence of a critical probability p_c, such that for p > p_c (respectively for p < p_c) an infinite cluster exists (respectively does not exist) almost surely. See e.g. [11, 10] and the literature cited there.

* New address: Fak. f. Mathematik, D–09107 TU Chemnitz

Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 317–328. © 2005 Springer. Printed in the Netherlands.

Random lattice Hamiltonians are used to describe the motion of waves in disordered media. Each of them is a family of operators on l^2(Z^d) indexed by elements of a probability space. The family obeys an equivariance relation with respect to the action of a group. More precisely, the group acts in a consistent way on l^2(Z^d) by translations and on the probability space by ergodic, measure preserving transformations. The spectral features of these random operators allow one to draw conclusions about the transport and wave-spreading properties of the modelled medium. Monograph expositions of this topic can be found in [4, 16, 19].

Let us define the simplest QPM: Let

(A_ω f)(v) = ∑_{w ∈ V(ω): dist(v,w)=1} f(w)   for all v ∈ V(ω) and all f ∈ l^2(V(ω))   (1)
be the adjacency operator of V(ω) introduced above. More precisely, A_ω is the adjacency operator of the induced sub-graph of Z^d with vertex set V(ω). Here "dist" denotes the distance function on this graph.

At this point let us explain why we chose ∞ as one of the values the random variable q(v) takes. The adjacency operator on Z^d corresponds (up to an energy shift) to the kinetic energy part of a quantum Hamiltonian on the lattice. In this picture q corresponds to the potential energy. In the quantum percolation model it vanishes on some sites; on others it is infinitely high, i.e. it forms an impenetrable barrier for the quantum wave function.

The interesting feature of the QPM is that it defines a Laplacian on random geometry. More precisely, its domain of definition l^2(V(ω)) varies with ω. This is the main difference from the random lattice operators considered in [4, 16]. After an extension of the notion of random lattice Hamiltonians the QPM fits in this framework. In our approach we rely on methods from [17, 14, 13], developed there to study operators on manifolds.

The QPM was first studied in [8, 7]. There it was considered as a quantum mechanical model for electron propagation in binary alloys where only one of the two types of atoms participates in the spreading of the wavepacket. The model attracted special attention because of the existence of molecular states, i.e. eigenvectors supported on finite regions of the infinite cluster, see [12, 5]. The last cited reference is the motivation of the present paper and our results can be seen as a mathematically rigorous version of some arguments in [5] and their extension to more general graphs.

The integrated density of states (IDS) of a Hamiltonian is the number of eigenvalues per unit volume below a certain energy value. Thanks to the stationarity and ergodicity assumptions it is well defined for random Hamiltonians.
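The operators A_ω are easy to generate on a finite box for numerical experiments. A minimal sketch in Python/NumPy (the box size, the probability p = 0.6 and the random seed are arbitrary illustrative choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_active(L, p):
    """Sample a percolation configuration on the box {0,...,L-1}^2 of Z^2:
    a site is active (q = 0) with probability p, deleted otherwise."""
    return rng.random((L, L)) < p

def adjacency(active):
    """Adjacency matrix of the sub-graph induced by the active sites
    (nearest-neighbour edges both of whose endpoints are active)."""
    idx = {v: k for k, v in enumerate(map(tuple, np.argwhere(active)))}
    A = np.zeros((len(idx), len(idx)))
    for (i, j), k in idx.items():
        for nb in ((i + 1, j), (i, j + 1)):   # right and down neighbours
            if nb in idx:
                A[k, idx[nb]] = A[idx[nb], k] = 1.0
    return A

A = adjacency(sample_active(30, 0.6))
evals = np.linalg.eigvalsh(A)   # real spectrum of the finite restriction
```

Since every vertex of Z^2 has degree at most four, the spectrum of any such finite restriction lies in [−4, 4], a toy instance of the general norm bound used later in the proofs.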
The IDS of a random Hamiltonian captures its global spectral features and its understanding is the prerequisite of the study of finer spectral properties. In the present work we analyze this quantity and provide therewith a basis for a further study of the QPM, cf. Section 4 and reference [21]. The next section states the results of this note, Section 3 is devoted to their proofs, and the last section concludes with a discussion of further research topics.
2. Results: Spectral properties of finite range hopping operators
To describe precisely the geometric setting we are working in, let us recall basic notions from graph theory and fix the notation along the way. A graph G = (V, E) is given by a set of vertices V = V(G) and a set of edges E = E(G) ⊂ (V × V \ {(v, v) | v ∈ V}) / ∼. Here ∼ denotes the relation (v, w) ∼ (w, v). If e = (v, w) ∈ E, we call v, w ∈ V nearest neighbours and endpoints of the edge e. By our definition a graph is simple: it contains neither multiple edges nor self-loops, i.e. edges joining a vertex to itself.

A path (of length n) in G is an alternating sequence of vertices and edges {v_0, e_1, v_1, ..., e_n, v_n} such that e_j = (v_{j−1}, v_j) for all j = 1, ..., n. If there is a path between two vertices v and w they are called (path) connected. This relation partitions the graph into (path connected) components. If a component contains an infinite number of distinct vertices we call it an infinite component. The distance between two vertices v, w ∈ V is defined by

dist_G(v, w) := dist(v, w) := min{length of p | p is a path connecting v and w}.   (1)

Note that the distance between v and w in a sub-graph of G may be larger than their distance in the original graph G. The vertex degree deg(v) of a vertex v ∈ V equals the number of edges e ∈ E such that v is an endpoint of e.

Let G and G' be graphs. A map φ: G → G' is called a graph-map or graph-homomorphism if φ: V(G) → V(G'), φ: E(G) → E(G'), and if for any e = (v, w) ∈ E(G) the image φ(e) equals (φ(v), φ(w)). A graph-map φ: G → G which has an inverse graph-map is called a graph-automorphism or simply an automorphism of G.

Let Γ be a group of graph-automorphisms acting on a graph X. It induces a projection map proj: X → X/Γ. We assume that the quotient is a finite graph. This implies in particular that the degree of the vertices in V is uniformly bounded. We denote the smallest upper bound by deg_+. Choose a vertex [v] ∈ V(X/Γ) and a representative v ∈ [v] ⊂ V(X).
Starting from v, lift pathwise the vertices and edges of X/Γ to obtain a connected set of vertices and edges F̃ ⊂ X, such that proj|_F̃ : F̃ → X/Γ is a bijective map. The set F := F̃ ∪ {v ∈ V(X) | v is an endpoint of an edge in F̃} is a graph, which we call a fundamental domain. Note that proj|_F : F → X/Γ is a graph-map which is bijective on the set of edges, but not on the set of vertices.

We construct a probability space (Ω, A, P) associated to percolation on X. Let Ω = ×_{v∈V} {0, ∞} be equipped with the σ-algebra A generated by finite dimensional cylinder sets. Denote by P a probability measure on Ω and assume that the measurable shift transformations τ_γ : Ω → Ω,

(τ_γ ω)_v = ω_{γ^{−1} v},

are measure preserving. Moreover, let the family τ_γ, γ ∈ Γ act ergodically on Ω. By the definition of τ_γ, γ ∈ Γ the stochastic field q: Ω × V → {0, ∞} given by q(ω, v) = ω_v, v ∈ V, is stationary or equivariant, i.e. q(τ_γ ω, v) = q(ω, γ^{−1} v). An element ω of the probability space will be called a configuration. The mathematical expectation associated to the probability P will be denoted by E. For a configuration ω, a site v with q(ω, v) = 0 will be called active or undeleted, and a site v with q(ω, v) = ∞ deleted. For each ω ∈ Ω denote by

V(ω) = V(X(ω)) = {v ∈ V | q(ω, v) = 0}

the subset of active vertices, and denote by X(ω) the corresponding induced sub-graph of X. It is the sub-graph of X whose vertex set is V(ω) and whose edge set is

E(ω) = E(X(ω)) = {e ∈ E(X) | both endpoints of e are in V(ω)}.

Let Λ = (V(Λ), E(Λ)) be a (deterministic) induced sub-graph of X. It gives rise to a random family of induced sub-graphs Λ(ω) := X(ω) ∩ Λ.

On any of the graphs introduced so far we will consider operators of finite hopping range. The easiest example to have in mind is the adjacency operator A_ω considered already in (1). More generally, an operator of finite hopping range H on a graph G is a linear map H: l^2(V(G)) → l^2(V(G)) such that there exist C, R < ∞ with (i) H(v, w) = H(w, v), (ii) H(γv, γw) = H(v, w) for all γ ∈ Γ, (iii) |H(v, w)| ≤ C, and (iv) H(v, w) = 0 if dist(v, w) ≥ R, for all v, w ∈ V(G). Here H(v, w) := ⟨δ_v, Hδ_w⟩ and δ_v ∈ l^2(V(G)) is the function taking the value 1 at v and 0 elsewhere.
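The equivariance of the stochastic field can be checked in a toy model where Γ is the cyclic group rotating a finite ring of sites (the ring length, the probabilities and the seed below are arbitrary choices, not from the paper):

```python
import numpy as np

# Configurations on a cycle of N sites stand in for Omega; gamma in Z/NZ acts
# by rotation, so gamma^{-1} v = (v - gamma) mod N.
N = 12
rng = np.random.default_rng(1)
omega = rng.choice([0.0, np.inf], size=N, p=[0.7, 0.3])  # 0 = active, inf = deleted

def tau(gamma, omega):
    """Shift transformation: (tau_gamma omega)_v = omega_{gamma^{-1} v}."""
    return omega[(np.arange(N) - gamma) % N]

def q(omega, v):
    return omega[v % N]

# equivariance q(tau_gamma omega, v) = q(omega, gamma^{-1} v) at one point
assert q(tau(5, omega), 3) == q(omega, (3 - 5) % N)
```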
For a sub-graph G ⊂ X and a finite hopping range operator H denote by H^G the compression of H to l^2(V(G)); in other words

H^G(v, w) = H(v, w) if v, w ∈ G and H^G(v, w) = 0 otherwise.

If V = V(G) is finite, H^G is a (|V| × |V|)-matrix, where |·| denotes the cardinality of a set. Thus the spectrum of H^G is real and consists entirely of eigenvalues λ_i(H^G), which we enumerate in increasing order counting multiplicity. Let us define the normalized eigenvalue counting function of H^G as

N^G(H, λ) := |{i ∈ N | λ_i(H^G) < λ}| / |V|.

We assume that the discrete group Γ is amenable, i.e. there exists a Følner sequence {I_j}_j of finite, non-empty subsets of Γ. A sequence {I_j}_j is called a Følner sequence if for any finite K ⊂ Γ and ε > 0

|I_j Δ K I_j| ≤ ε |I_j|   (2)

for all j large enough. Since the quotient X/Γ is compact, it follows that K := {γ ∈ Γ | γF ∩ F ≠ ∅} is a finite generator set for Γ, cf. §3 in [2] for a similar statement in the context of manifolds. Now for finitely generated amenable groups there exists a Følner sequence of subsets which is increasing and exhausts Γ, cf. Theorem 4 in [1]. From [15] we infer that each Følner sequence has a tempered subsequence. A tempered Følner sequence is a sequence which satisfies, in addition to (2), the growth condition: there exists C < ∞ such that for all j ∈ N

|I_{j−1}^{−1} I_j| ≤ C |I_j|.

To each increasing, tempered Følner sequence associate an admissible exhaustion {Λ_j}_j of X given by

Λ_j := ∪_{γ ∈ I_j^{−1}} γF ⊂ X,
where I_j^{−1} := {γ | γ^{−1} ∈ I_j}. For a finite hopping range operator H, a Følner sequence {I_j}_j, and a random configuration ω ∈ Ω introduce, for brevity's sake, the following notation: H_ω := H^{X(ω)}, H_ω^j := H^{Λ_j(ω)}, and N_ω^j(λ) := N(H_ω^j, λ). Denote by P_ω(I) := χ_I(H_ω) the spectral projection of H_ω associated to the energy interval I.

Theorem 2.1. There exists a distribution function N, called the integrated density of states, such that for almost all ω ∈ Ω and any admissible exhaustion Λ_j, j ∈ N, we have

lim_{j→∞} N_ω^j(E) = N(E)   (3)
at all continuity points of N. The following trace formula holds for the IDS:

N(E) = (1/|F|) E{Tr(χ_F P_ω(]−∞, E[))}.
We say that the IDS N is associated to the sequence of random operators {H_ω^j}_{ω∈Ω}, j ∈ N. Next we address the question of boundary condition dependence. Denote Λ^c = X \ Λ.

Proposition 2.2. Let H be a finite hopping range operator, Λ_j, j ∈ N an admissible exhaustion, and R̃ ∈ N, C < ∞. Let B^j : l^2(Λ_j) → l^2(Λ_j), j ∈ N be any sequence of symmetric operators such that for all v, w ∈ V we have |B^j(v, w)| ≤ C and B^j(v, w) = 0 if dist(v, Λ_j^c) + dist(w, Λ_j^c) > R̃. Then the IDSs associated to the sequences {H_ω^j}_{ω∈Ω}, j ∈ N and {H_ω^j + B^j}_{ω∈Ω}, j ∈ N coincide.

Next we establish the non-randomness of the spectrum of H_ω and its components, its relation to the IDS, and an understanding of the IDS as a von Neumann trace. Denote by σ_disc, σ_ess, σ_ac, σ_sc, σ_pp the discrete, essential, absolutely continuous, singular continuous, and pure point part of the spectrum. Denote by σ_comp the set of eigenvalues which possess an eigenfunction with compact, i.e. finite, support. In the following theorem Γ need not be amenable, but X must be countable.

Theorem 2.3. There exists an Ω′ ⊂ Ω of full measure and subsets of the real numbers Σ and Σ_•, where • ∈ {disc, ess, ac, sc, pp, comp}, such that for all ω ∈ Ω′

σ(H_ω) = Σ and σ_•(H_ω) = Σ_•

for any • = disc, ess, ac, sc, pp, comp. If Γ is infinite, Σ_disc = ∅. The almost-sure spectrum Σ coincides with the set of points of increase of the IDS:

Σ = {λ ∈ R | N(λ + ε) > N(λ − ε) for all ε > 0}.

Furthermore, N is the distribution function of the spectral measure of the direct integral operator

H := ∫_Ω^⊕ H_ω dP(ω).
On the von Neumann algebra associated to H there is a canonical trace and N (E) is the value of this trace on the spectral projection of H associated to the interval ] − ∞, E[.
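Theorem 2.1 is easy to probe numerically in the simplest setting, site percolation on Z. The sketch below (Python/NumPy; the box sizes, p = 0.7, the energy E and the seed are arbitrary illustrative choices, not from the paper) computes finite-volume eigenvalue counting functions normalized by the box volume; their values for independent configurations and growing boxes settle near a common number, an approximation of N(E):

```python
import numpy as np

rng = np.random.default_rng(0)
P_ACTIVE = 0.7   # probability that a site is active
E = 0.5          # energy at which the counting functions are evaluated

def counting(L):
    """Eigenvalue counting function of the percolation adjacency operator on
    the box {0,...,L-1} of Z, normalized by the box volume L."""
    active = rng.random(L) < P_ACTIVE
    H = np.zeros((L, L))
    for v in range(L - 1):
        if active[v] and active[v + 1]:
            H[v, v + 1] = H[v + 1, v] = 1.0
    H = H[np.ix_(active, active)]     # compression H^G to the active sites
    if H.shape[0] == 0:
        return 0.0
    return np.count_nonzero(np.linalg.eigvalsh(H) < E) / L

# self-averaging: three independent configurations per box size
for L in (200, 800, 3200):
    print(L, [round(counting(L), 4) for _ in range(3)])
```

The spread between the three values shrinks as the boxes grow, which is the self-averaging behaviour expressed by the almost-sure limit (3).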
3. Proofs of the theorems
Let H be a finite hopping range operator and assume without loss of generality |H(v, w)| ≤ 1 for all matrix elements. It follows that the l^2-norm of H is bounded by K := 2 deg_+^R. Since H is symmetric, it is a selfadjoint operator. In particular the spectrum of H_ω is contained in [−K, K] for all ω ∈ Ω. Each γ ∈ Γ induces a unitary operator U_{ω,γ} : l^2(V(τ_γ^{−1} ω)) → l^2(V(ω)), (U_{ω,γ} f)(v) := f(γ^{−1} v). Note that V(τ_γ ω) = γV(ω). By the definition of τ_γ the action of Γ on Ω and on X is compatible:

U_{ω,γ}^* H_ω U_{ω,γ} = H_{τ_γ ω}.   (1)

The equivariance formula (1) implies

U_{ω,γ}^* f(H_ω) U_{ω,γ} = f(H_{τ_γ ω})   (2)

for any polynomial f. For continuous functions f, g we have ‖f(H_ω) − g(H_ω)‖ ≤ ‖f − g‖_∞. Thus f_n → f in C([−K, K], ‖·‖_∞) implies f_n(H_ω) → f(H_ω) in operator norm, and (2) extends by Weierstraß' approximation theorem to all f ∈ C([−K, K]). By taking scalar products we obtain the corresponding equivariance relation for the matrix elements:

f(H_ω)(γ^{−1} v, γ^{−1} w) = f(H_{τ_γ ω})(v, w).

For the proof of the main Theorem 2.1 we need two key ingredients: an estimate of boundary effects on traces and a sufficiently general ergodic theorem, which will be applied to trace functionals of the type

F(ω) := |F|^{−1} ∑_{v∈F} f(H_ω)(v, v) = |F|^{−1} Tr(f(H_ω) χ_F).
Let us first estimate the boundary effects.

Proposition 3.1. Let f(x) = x^m for m ∈ N. Then

sup_{ω∈Ω} (1/|Λ_j|) |Tr(f(H_ω^j)) − Tr(χ_{Λ_j} f(H_ω))| → 0

as j → ∞.

Proof. We introduce the notion of a thickened boundary on a graph. For a sub-graph Λ and h ∈ N set ∂_h Λ := {v ∈ Λ | dist(v, Λ^c) ≤ h}. We expand the trace of powers of H_ω^j:

Tr (H_ω^j)^m = ∑_{v∈Λ_j} (H_ω^j)^m(v, v) = ∑_{v∈Λ_j} ∑_{v_1,...,v_{m−1}∈Λ_j} H_ω(v, v_1) ··· H_ω(v_{m−1}, v).

By an analogous formula for Tr(χ_{Λ_j} H_ω^m) we obtain

|Tr[χ_{Λ_j} H_ω^m − (H_ω^j)^m]| = |∑^• H_ω(v, v_1) ··· H_ω(v_{m−1}, v)| ≤ |∂_{Rm} Λ_j| (deg_+^{2R})^m,

where the bullet denotes summation over (m−1)-tuples (paths) in V(X) with at least one vertex outside Λ_j. By the Følner property of the sequence I_j, j ∈ N, we have

lim_{j→∞} |∂_h Λ_j| / |Λ_j| = 0 for any h ≥ 0.

This is the content of Lemma 2.4 in [17]. In fact, there manifolds are considered, but the proof applies literally to the case of graphs.

Lindenstrauss proved in [15] a remarkable ergodic theorem which applies to locally compact, second countable, amenable groups. It includes the following statement for discrete groups.

Theorem 3.2. Let Γ be an amenable discrete group and (Ω, B_Ω, P) a probability space. Assume that Γ acts ergodically on Ω by measure preserving transformations τ_γ. Let {I_j}_j be a tempered Følner sequence in Γ. Then for every F ∈ L^1(Ω)

lim_{j→∞} (1/|I_j|) ∑_{γ∈I_j} F(τ_γ ω) = E{F}   (3)
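The vanishing boundary-to-volume ratio invoked in the proof is elementary for boxes in Z^2; a quick computation (plain Python; the box sizes and the thickness h = 3 are arbitrary choices):

```python
def boundary_ratio(L, h):
    """|∂_h Λ| / |Λ| for the box Λ = {0,...,L-1}^2 in Z^2, where ∂_h Λ is
    the set of sites within graph distance h of the complement of the box."""
    interior = max(L - 2 * h, 0) ** 2   # sites at distance > h from the complement
    return 1.0 - interior / L**2

for L in (10, 100, 1000):
    print(L, boundary_ratio(L, h=3))
```

The ratio decays like roughly 4h/L, the surface-to-volume decay that the Følner property generalizes to arbitrary amenable groups.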
for almost all ω ∈ Ω. In the application we have in mind F ∈ L^∞, so the convergence holds in the L^1-topology, too.

Proof of Theorem 2.1. To prove the convergence of the distribution functions N_ω^j it is sufficient to prove the convergence of the integrals ∫ f dN_ω^j for continuous f, and by the Weierstraß theorem it suffices to consider polynomials f_k: given ε > 0 one can choose k large enough such that ‖f_k − f‖_∞ < ε/3 and subsequently j large enough (depending on ε and f_k) such that |∫ f_k dN_ω^j − ∫ f_k dN| < ε/3. Since the total measure of N and any N_ω^j is bounded by one,

|∫ f dN_ω^j − ∫ f dN| ≤ |∫ (f − f_k) dN_ω^j| + |∫ f_k dN_ω^j − ∫ f_k dN| + |∫ (f_k − f) dN| < ε.

Thus it is sufficient to prove the convergence of moments

lim_{j→∞} ∫_R λ^m N_ω^j(dλ) = ∫_R λ^m N(dλ) for all m ∈ N.   (4)
Next we show that the limit on the left-hand side equals (1/|F|) E{Tr(χ_F H_ω^m)} for almost all ω. For this aim we write the moment of the IDS as a trace

∫_R λ^m N_ω^j(dλ) = |Λ_j|^{−1} Tr(f(H_ω^j)),

which by Proposition 3.1 converges for j → ∞ to the same limit as |Λ_j|^{−1} Tr(χ_{Λ_j} f(H_ω)). Now we decompose the trace according to local contributions and apply Lindenstrauss' theorem:

|Λ_j|^{−1} Tr(χ_{Λ_j} f(H_ω)) = |Λ_j|^{−1} ∑_{v∈Λ_j} f(H_ω)(v, v) = |Λ_j|^{−1} ∑_{γ∈I_j} ∑_{v∈F} f(H_ω)(γ^{−1} v, γ^{−1} v)
= |I_j|^{−1} ∑_{γ∈I_j} |F|^{−1} ∑_{v∈F} f(H_{τ_γ ω})(v, v) → E{F} as j → ∞ for almost all ω ∈ Ω,

where F(ω) = |F|^{−1} ∑_{v∈F} f(H_ω)(v, v) = |F|^{−1} Tr(χ_F f(H_ω)). Set E_ω(λ) := P_ω(]−∞, λ[). The expectation of F equals

E{F} = (1/|F|) ∑_{v∈F} E ∫ λ^m E_ω(dλ)(v, v) = (1/|F|) ∫ λ^m E{Tr(χ_F E_ω(dλ))} = ∫ λ^m N(dλ).

Proof of Proposition 2.2. Denote H̃_ω^j := H_ω^j + B^j. Similarly as in the proof of Proposition 3.1 we have

|Tr[(H̃_ω^j)^m − (H_ω^j)^m]| = |∑^• ( H̃_ω(v, v_1) ··· H̃_ω(v_{m−1}, v) − H_ω(v, v_1) ··· H_ω(v_{m−1}, v) )| ≤ [1 + (1 + C)^m] |∂_{Rm+R̃} Λ_j| (deg_+^{2r})^m.

Here the bullet denotes the summation over all paths v_1, ..., v_{m−1} ∈ Λ_j with at least one vertex in ∂_{R̃} Λ_j and r := max(R, R̃). By the Følner property |Λ_j|^{−1} Tr[(H̃_ω^j)^m − (H_ω^j)^m] converges to zero as j → ∞.
Proof of Theorem 2.3. First we prove the non-randomness of σ_comp. Set

Σ̃ := {E ∈ R | ∃ finite induced sub-graph G ⊂ X and f ∈ l^2(G) such that H^G f = Ef}.   (5)

Then σ_comp(H_ω) ⊂ Σ̃ for all ω ∈ Ω. Since X is countable, there exists a countable exhaustion of X by finite sets D_j, j ∈ N. If we set Σ̃_j := {E ∈ R | ∃ω ∈ Ω and f ∈ l^2(D_j(ω)) s.t. H^{D_j(ω)} f = Ef}, then Σ̃ = ∪_j Σ̃_j and thus Σ̃ is countable.

For any E ∈ Σ̃ set Ω_E := {ω | ∃f with finite support and H_ω f = Ef}. This set is invariant under the ergodic action of Γ by the transformations τ_γ. Therefore, either P(Ω_E) = 1 or P(Ω_E) = 0. In the first case set Ω̃_E = Ω_E and E ∈ Σ_comp; in the second set Ω̃_E = Ω_E^c and E ∉ Σ_comp. Here the superscript c denotes the complement of a set. The set Ω̃ := ∩_{E∈Σ̃} Ω̃_E has full measure and each ω ∈ Ω̃ satisfies σ_comp(H_ω) = Σ_comp.

The remaining statements of the theorem follow from the results of [13]. One just has to check that the required assumptions are satisfied. This is not hard, but it would require introducing the notion of groupoids and basic features of Connes' non-commutative integration theory [6]. Therefore we leave the details of the proof of Theorem 2.3 to another occasion.
4. Outlook: finitely supported and exponentially decaying states
Once we have a rigorous definition of the integrated density of states for the QPM, we can study finer spectral properties. One of the main interests in the physics literature is the properties of bound states and their contribution to numerically observed "peaks" of the density of states. This quantity is the distributional derivative of the IDS. In the following we restrict our discussion to the QPM corresponding to the adjacency operator A on the lattice Z^d, and to the sequence q(·, v), v ∈ Z^d, consisting of independent, identically distributed random variables.

There seem to be three different types of bound states of the QPM: finite cluster states, molecular states and exponentially decaying states, cf. [8, 7, 12, 18, 5]. Finite cluster states occur since almost surely there are active clusters of finite size, which, consequently, can support only bound states. These states are mathematically not challenging. However, as pointed out earlier, there exist molecular states. They are eigenvectors of the adjacency operator restricted to the active sites with support on a finite region of the infinite cluster. This means that due to the deletion of sites the unique continuation property of eigenfunctions of the adjacency operator on the lattice breaks down.
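Finite cluster states are easy to exhibit concretely: every finite active cluster contributes the eigenvalues of its own adjacency matrix. A toy computation for path-shaped clusters (Python/NumPy; the cluster shapes are hand-picked illustrative examples):

```python
import numpy as np

def path_adjacency(k):
    """Adjacency matrix of a cluster consisting of k active sites in a row."""
    A = np.zeros((k, k))
    for v in range(k - 1):
        A[v, v + 1] = A[v + 1, v] = 1.0
    return A

# A path of k sites has eigenvalues 2*cos(m*pi/(k+1)), m = 1,...,k;
# an isolated dimer (k = 2) contributes the two eigenvalues -1 and +1.
for k in (2, 3, 4):
    print(k, np.round(np.sort(np.linalg.eigvalsh(path_adjacency(k))), 4))
```

Since such clusters occur with positive probability per unit volume, the IDS jumps at each of these eigenvalues; the molecular states on the infinite cluster contribute further discontinuities, which is the content of Theorem 4.1 below.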
For the analysis of molecular states it is convenient to introduce the restriction A_ω^∞ of A to the infinite active cluster X^∞(ω), and to define the corresponding IDS by

N^∞(λ) = |F|^{−1} E{Tr[χ_{F^∞(ω)} E_ω(λ)]},

where F^∞(ω) = F ∩ X^∞(ω) and F = {0}. N^∞ is self-averaging, i.e. it can be defined by an exhaustion procedure, similarly as N in Theorem 2.1. Here is a result on molecular states, whose proof is given in [21].

Theorem 4.1. The set of points of discontinuity of the IDS N of {A_ω}_ω coincides with the set of points of discontinuity of the IDS N^∞ of {A_ω^∞}_ω and equals Σ̃ as defined in (5).

Finally, one can hope that a multi-scale argument will yield a proof of the existence of exponentially decaying eigenstates, as in the case of the Anderson model [3, 9, 20, 19]. For this aim one has to develop new tools to deal with the singular randomness present in the QPM. On the other hand, once this is done, one might use the new ideas to approach the exponential localization problem for the Anderson model with Bernoulli disorder of the coupling constants. In the multi-dimensional case this problem has been open for decades.
References

[1] T. Adachi. A note on the Følner condition for amenability. Nagoya Math. J., 131:67–74, 1993.
[2] T. Adachi and T. Sunada. Density of states in spectral geometry. Comment. Math. Helv., 68(3):480–493, 1993.
[3] P. Anderson. Absence of diffusion in certain random lattices. Phys. Rev., 109:1492, 1958.
[4] R. Carmona and J. Lacroix. Spectral Theory of Random Schrödinger Operators. Birkhäuser, Boston, 1990.
[5] J. T. Chayes, L. Chayes, J. R. Franz, J. P. Sethna, and S. A. Trugman. On the density of states for the quantum percolation problem. J. Phys. A, 19(18):L1173–L1177, 1986.
[6] A. Connes. Sur la théorie non commutative de l'intégration. In Algèbres d'opérateurs (Sém., Les Plans-sur-Bex, 1978), pages 19–143. Springer, Berlin, 1979.
[7] P.-G. de Gennes, P. Lafore, and J. Millot. Amas accidentels dans les solutions solides désordonnées. J. of Phys. and Chem. of Solids, 11(1–2):105–110, 1959.
[8] P.-G. de Gennes, P. Lafore, and J. Millot. Sur un phénomène de propagation dans un milieu désordonné. J. Phys. Rad., 20:624, 1959.
[9] J. Fröhlich and T. Spencer. Absence of diffusion in the Anderson tight binding model for large disorder or low energy. Commun. Math. Phys., 88:151–184, 1983.
[10] G. Grimmett. Percolation, volume 321 of Grundlehren der Mathematischen Wissenschaften. Springer, Berlin, 1999.
[11] H. Kesten. Percolation theory for mathematicians, volume 2 of Progress in Probability and Statistics. Birkhäuser, Boston, 1982.
[12] S. Kirkpatrick and T. P. Eggarter. Localized states of a binary alloy. Phys. Rev. B, 6:3598, 1972.
[13] D. Lenz, N. Peyerimhoff, and I. Veselić. Integrated density of states for random metrics on manifolds. (arXiv.org, math-ph/0212058), to appear in Proc. London Math. Soc.
[14] D. Lenz, N. Peyerimhoff, and I. Veselić. Von Neumann algebras, groupoids and the integrated density of states. (math-ph/0203026 on arXiv.org), submitted, March 2002.
[15] E. Lindenstrauss. Pointwise theorems for amenable groups. Invent. Math., 146(2):259–295, 2001.
[16] L. A. Pastur and A. L. Figotin. Spectra of Random and Almost-Periodic Operators. Springer Verlag, Berlin, 1992.
[17] N. Peyerimhoff and I. Veselić. Integrated density of states for ergodic random Schrödinger operators on manifolds. Geom. Dedicata, 91(1):117–135, 2002.
[18] Y. Shapir, A. Aharony, and A. B. Harris. Localization and quantum percolation. Phys. Rev. Lett., 49(7):486–489, 1982.
[19] P. Stollmann. Caught by disorder: A Course on Bound States in Random Media, volume 20 of Progress in Mathematical Physics. Birkhäuser, 2001.
[20] H. von Dreifus and A. Klein. A new proof of localization in the Anderson tight binding model. Commun. Math. Phys., 124:285–299, 1989.
[21] I. Veselić. Spectral Analysis of Percolation Hamiltonians. arxiv.org/math-ph/0405006.
ORDER OF ACCURACY OF EXTENDED WENO SCHEMES

Senka Vuković
University of Rijeka, 51000 Rijeka, Vukovarska 58, Croatia
[email protected]

Nelida Črnjarić-Žic
University of Rijeka, 51000 Rijeka, Vukovarska 58, Croatia
[email protected]

Luka Sopta
University of Rijeka, 51000 Rijeka, Vukovarska 58, Croatia
[email protected]
Abstract
Extended finite difference WENO schemes for hyperbolic balance laws with spatially varying flux and geometrical source term were developed by the authors. In these schemes high order ENO and WENO reconstructions are used for the characteristicwise components of the flux and of the source term; therefore the schemes give high resolution results. In this paper these high order properties are analyzed through several convergence tests: two shallow water tests, one linear acoustics test, and one test for the Burgers equation with a source term. Experimentally established orders of accuracy are compared with the theoretically expected orders for the new schemes, as well as for the original WENO schemes combined with pointwise source term evaluation.
Keywords:
hyperbolic balance laws, source term, spatially varying flux, WENO schemes, order of accuracy.
Z. Drmač et al. (eds.), Proceedings of the Conference on Applied Mathematics and Scientific Computing, 329–341. © 2005 Springer. Printed in the Netherlands.
1. Introduction
Construction of numerical schemes that respect the balance between the flux gradient and the source term is a significant issue in the ongoing research in numerical methods for hyperbolic balance laws. Significant results for the case of a geometrical source term and autonomous flux were given by Bermúdez and Vázquez [4], Greenberg and LeRoux [11], Bermúdez et al. [3], LeVeque [21], Jenny and Müller [18], Bristeau and Perthame [5], Chinnaya and LeRoux [7], Gosse [12], [13], Hubbard and García-Navarro [17], Jin [20], Perthame and Simeoni [23], Zhou et al. [31], etc. In the case of the spatially varying flux the same problem was treated by Vázquez [27], Hubbard and García-Navarro [17], García-Navarro and Vázquez-Cendón [10], Burguete and García-Navarro [6], Bale et al. [1], and Vuković and Sopta [30]. However, all of these papers were concerned with at most second order accurate numerical schemes. On the other hand, the high order essentially nonoscillatory (ENO) schemes, developed by Harten et al. [15], [16], and the weighted essentially nonoscillatory (WENO) schemes, first proposed by Liu et al. [22], were known only in the version applicable to hyperbolic conservation laws with autonomous flux. Appropriate extensions to WENO schemes for hyperbolic balance laws with a geometrical source term and additionally with spatially varying flux were given by Vuković and Sopta [28] and Vuković et al. [29]. In those papers the construction as well as the proof of the exact conservation property of the proposed schemes in application to several balance laws was presented, while the question of the order of accuracy of these schemes was only lightly discussed. In this paper a more detailed discussion and extensive numerical testing results are given.
2. Extended WENO schemes
In this section a brief overview of the extended WENO schemes is given, together with a discussion of the order of accuracy of these schemes. For a detailed description of the original WENO schemes, [26] is recommended, while a detailed construction and motivation of the extended WENO schemes can be found in [28] and [29]. The extended WENO schemes are designed to numerically solve hyperbolic balance laws with spatially varying flux and geometrical source term

\partial_t u + \partial_x f(u, x) = g(u, x).   (1)

Here, t is the time, x is the space coordinate, u is the vector of conserved variables, f is the flux, and g is the source term. The time integration in (1) is performed as usual in WENO schemes, i.e., through a Runge–Kutta type method. Therefore, the order of accuracy in time equals the order of the applied Runge–Kutta method. In particular, the accuracy in time and the accuracy in space can be effectively made equal by taking \Delta t \propto (\Delta x)^{R/k}. Here, k is the order of
the Runge–Kutta time integration and R is the order of accuracy of the WENO reconstruction applied in the approximations of L = -\partial_x f + g. In all the WENO approximations of L the conservation form for the flux gradient is applied, i.e.,

L_i = -\frac{1}{\Delta x} \left( f_{i+1/2} - f_{i-1/2} \right) + g_i .   (2)

Here, a space discretization with cells [x_{i-1/2}, x_{i+1/2}] of uniform width \Delta x is assumed, L_i, i = 1, \ldots, N, is the numerical approximation to L in the ith cell center, f_{i+1/2}, i = 0, \ldots, N, is the numerical flux at the (i+1/2)th cell boundary, and g_i, i = 1, \ldots, N, is the numerical source term in the ith cell center. Furthermore, since the source term in the extended WENO schemes is of the geometrical type, the source term decomposition

g_i = g_{i-1/2,R} + g_{i+1/2,L}   (3)

is adopted.
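The semi-discrete form above, advanced with Runge–Kutta time stepping, can be sketched for a scalar law as follows. This is a minimal illustration only, not the authors' balanced scheme: it assumes a periodic grid, uses a simple Lax–Friedrichs numerical flux, and omits the source term, but it shows the conservation form (2) combined with the third order TVD Runge–Kutta method of Shu and Osher [24].

```python
import numpy as np

def residual(u, dx, flux, alpha):
    """L_i = -(f_{i+1/2} - f_{i-1/2}) / dx in conservation form, cf. (2).
    Lax-Friedrichs numerical flux on a periodic grid; source term omitted."""
    up = np.roll(u, -1)                                          # u_{i+1}
    fhat = 0.5 * (flux(u) + flux(up)) - 0.5 * alpha * (up - u)   # f_{i+1/2}
    return -(fhat - np.roll(fhat, 1)) / dx

def tvd_rk3_step(u, dt, dx, flux, alpha):
    """One third-order TVD Runge-Kutta step."""
    u1 = u + dt * residual(u, dx, flux, alpha)
    u2 = 0.75 * u + 0.25 * (u1 + dt * residual(u1, dx, flux, alpha))
    return u / 3.0 + 2.0 / 3.0 * (u2 + dt * residual(u2, dx, flux, alpha))
```

With R = 3 and k = 3, taking \Delta t \propto \Delta x keeps the temporal and spatial orders matched; the conservation form guarantees that the total amount of u is preserved on a periodic grid.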
At first glance, a simple pointwise evaluation of the source term, i.e., g_i = g(u_i, x_i), would guarantee an exact evaluation of the source term, so the order of accuracy in space would depend only on the approximation of the flux. However, the geometrical source term typically contains spatial derivatives of some property of the media or of the domain. In applications this property is usually not known analytically, therefore the derivative can only be evaluated through some finite difference approximation. If low order finite difference approximations are applied, the accuracy of the scheme will deteriorate. On the other hand, the source term decomposition (3) is the first step towards the application of the WENO reconstruction to the source term approximations, instead of only to the flux approximation, as is the case in the original WENO schemes. The next step in the extended WENO schemes is to compute characteristicwise components of the flux and source term in the form

f^{(p)}_{i+1/2} = f^{(p)}_{i+1/2,1} + P^{(p)}_{i+1/2,+} + P^{(p)}_{i+1/2,-} ,   (4)

g^{(p)}_{i+1/2,L} = g^{(p)}_{i+1/2,L,1} + \frac{1}{\Delta x} Q^{(p)}_{i+1/2,+} + \frac{1}{\Delta x} Q^{(p)}_{i+1/2,-} ,   (5)

g^{(p)}_{i+1/2,R} = g^{(p)}_{i+1/2,R,1} - \frac{1}{\Delta x} Q^{(p)}_{i+1/2,+} - \frac{1}{\Delta x} Q^{(p)}_{i+1/2,-} ,   (6)

i.e., as the sum of an appropriately chosen balanced first order scheme term – f^{(p)}_{i+1/2,1}, g^{(p)}_{i+1/2,L,1}, and g^{(p)}_{i+1/2,R,1} – and of the high order WENO reconstruction terms – P^{(p)}_{i+1/2,\pm} and Q^{(p)}_{i+1/2,\pm}. The exact form of these terms depends on the particular version of the WENO schemes.
In the WENO-LLF case the appropriate first order scheme is the balanced Q-scheme [4]

f^{(p)}_{i+1/2,1} = \frac{1}{2} \left( f_i + f_{i+1} - \left| \lambda^{(p)}_{i+1/2} \right| (u_{i+1} - u_i) - \mathrm{sign} \left( \lambda^{(p)}_{i+1/2} \right) v_{i+1/2} \, \Delta x \right) \cdot l^{(p)}_{i+1/2} ,   (7)

g^{(p)}_{i+1/2,L,1} = \frac{1 - \mathrm{sign} \left( \lambda^{(p)}_{i+1/2} \right)}{2} \, g_{i+1/2} \cdot l^{(p)}_{i+1/2} ,   (8)

g^{(p)}_{i+1/2,R,1} = \frac{1 + \mathrm{sign} \left( \lambda^{(p)}_{i+1/2} \right)}{2} \, g_{i+1/2} \cdot l^{(p)}_{i+1/2} .   (9)

Here, \lambda^{(p)}_{i+1/2} and l^{(p)}_{i+1/2}, p = 1, \ldots, m, are numerical approximations of the eigenvalues and left eigenvectors of the local characteristic field, and m is the number of equations in (1). Also, in (7)–(9), v_{i+1/2} = \frac{1}{\Delta x} V(u_i, u_{i+1}, x_i, x_{i+1}) and g_{i+1/2} = \frac{1}{\Delta x} G(u_i, u_{i+1}, x_i, x_{i+1}), where V and G are numerical approximations of \int \partial_x f \, dx and \int g \, dx, respectively. Furthermore, P^{(p)}_{i+1/2,\pm} and Q^{(p)}_{i+1/2,\pm} are Rth order WENO reconstructions of the functions

v^{\pm} = \frac{1}{2} \left( f - f_{I^{\pm}} \pm \left| \lambda^{(p)}_{i+1/2} \right| (u - u_{I^{\pm}}) \pm \mathrm{sign} \left( \lambda^{(p)}_{i+1/2} \right) \left( V_1(u_{I^{\pm}}, u, x_{I^{\pm}}, x) + \beta_{i+1/2} V_2(u_{I^{\pm}}, u, x_{I^{\pm}}, x) \right) \right) \cdot l^{(p)}_{i+1/2} ,   (10)

and

w^{\pm} = \frac{1}{2} \left( G_1(u_{I^{\pm}}, u, x_{I^{\pm}}, x) \pm \mathrm{sign} \left( \lambda^{(p)}_{i+1/2} \right) \gamma_{i+1/2} \, G_2(u_{I^{\pm}}, u, x_{I^{\pm}}, x) \right) \cdot l^{(p)}_{i+1/2} ,   (11)

respectively. In both WENO reconstructions the same weights, computed from the smoothness measures of v^{\pm} - w^{\pm}, are used. In (10) and (11), V = V_1 + \beta V_2, \beta_{i+1/2} = \beta(u_i, u_{i+1}, x_i, x_{i+1}), G = G_1 = \gamma G_2, \gamma_{i+1/2} = \gamma(u_i, u_{i+1}, x_i, x_{i+1}), I^{+} = i, and I^{-} = i + 1. If the balance law (1) is homogeneous and the flux is autonomous, the presented extended WENO scheme reduces to the original algorithm, as is proved in [28]. Furthermore, in application to a particular hyperbolic balance law, appropriate definitions of V, G, and the related terms lead to schemes with the ability to exactly preserve some chosen subset of steady state solutions [28], [29]. Finally, the question of the order of accuracy in space can be discussed in the following
way. If the properties of the media and domain are smooth enough, as well as the solution, then the functions v^{\pm} and w^{\pm} are smooth enough too, and then from the theory of WENO reconstruction via the primitive function it follows that

P^{(p)}_{i+1/2,\pm} = v^{\pm} \left( x_{i+1/2} \right) + O \left( \Delta x^R \right) ,   (12)

Q^{(p)}_{i+1/2,\pm} = w^{\pm} \left( x_{i+1/2} \right) + O \left( \Delta x^R \right) .   (13)

Further analytical analysis of this question in the general case becomes less transparent, therefore experimental tests are needed. In the WENO-RF case the appropriate first order scheme is the balanced upwind scheme, i.e., (8) and (9) are again valid, while

f^{(p)}_{i+1/2,1} = \left( \frac{1 + \mathrm{sign} \left( \lambda^{(p)}_{i+1/2} \right)}{2} \, f_i + \frac{1 - \mathrm{sign} \left( \lambda^{(p)}_{i+1/2} \right)}{2} \, f_{i+1} \right) \cdot l^{(p)}_{i+1/2} ,   (14)

and P^{(p)}_{i+1/2,\pm} and Q^{(p)}_{i+1/2,\pm} are computed for the functions

v^{\pm} = \frac{1 \pm \mathrm{sign} \left( \lambda^{(p)}_{i+1/2} \right)}{2} \, (f - f_{I^{\pm}}) \cdot l^{(p)}_{i+1/2} ,   (15)

w^{\pm} = \frac{1 \pm \mathrm{sign} \left( \lambda^{(p)}_{i+1/2} \right)}{2} \, G(u_{I^{\pm}}, u, x_{I^{\pm}}, x) \cdot l^{(p)}_{i+1/2} .   (16)

Similar observations regarding the exact conservation property and the order of accuracy in space as those for the WENO-LLF version are valid now too.
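As a concrete illustration of the Rth order reconstruction used for the P and Q terms, a fifth order (R = 5) WENO reconstruction with the classical smoothness measures and weights of Jiang and Shu [19] can be sketched as follows. This is a generic sketch acting on scalar stencil values, not the authors' implementation.

```python
def weno5_reconstruct(vm2, vm1, v0, vp1, vp2, eps=1e-6):
    """Fifth-order WENO-JS reconstruction of v at x_{i+1/2} from the five
    values v_{i-2}, ..., v_{i+2} (left-biased stencil)."""
    # three third-order candidate reconstructions
    p0 = (2.0 * vm2 - 7.0 * vm1 + 11.0 * v0) / 6.0
    p1 = (-vm1 + 5.0 * v0 + 2.0 * vp1) / 6.0
    p2 = (2.0 * v0 + 5.0 * vp1 - vp2) / 6.0
    # Jiang-Shu smoothness measures
    b0 = 13.0 / 12.0 * (vm2 - 2.0 * vm1 + v0) ** 2 + 0.25 * (vm2 - 4.0 * vm1 + 3.0 * v0) ** 2
    b1 = 13.0 / 12.0 * (vm1 - 2.0 * v0 + vp1) ** 2 + 0.25 * (vm1 - vp1) ** 2
    b2 = 13.0 / 12.0 * (v0 - 2.0 * vp1 + vp2) ** 2 + 0.25 * (3.0 * v0 - 4.0 * vp1 + vp2) ** 2
    # nonlinear weights from the linear weights 1/10, 6/10, 3/10
    a0 = 0.1 / (eps + b0) ** 2
    a1 = 0.6 / (eps + b1) ** 2
    a2 = 0.3 / (eps + b2) ** 2
    return (a0 * p0 + a1 * p1 + a2 * p2) / (a0 + a1 + a2)
```

In the extended schemes this kind of reconstruction is applied to v^{\pm} and w^{\pm}, with the same weights used in both, so that the flux and source reconstructions cancel exactly on the preserved steady states.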
3. Application to one-dimensional shallow water equations

If in (1)

u = \begin{pmatrix} h \\ hv \end{pmatrix} , \quad f = \begin{pmatrix} hv \\ hv^2 + \frac{1}{2} g h^2 \end{pmatrix} , \quad \text{and} \quad g = \begin{pmatrix} 0 \\ -g h \frac{dz}{dx} \end{pmatrix} ,   (1)

the resulting system is the one-dimensional shallow water equations. Here, h is the water depth, v is the water velocity, g is the acceleration due to gravity, and z = z(x) is the bed level. In order to apply the extended WENO schemes to these equations the following definitions are needed [28]:

V_1 = V_2 = 0 , \quad \beta = 0 , \quad \gamma = g \, \frac{h' + h''}{2} ,

G_1 = \begin{pmatrix} 0 \\ -g \, \frac{h' + h''}{2} \left( z'' - z' \right) \end{pmatrix} , \quad G_2 = \begin{pmatrix} 0 \\ -\left( z'' - z' \right) \end{pmatrix} .   (2)
Then, as is proved in [28], the resulting schemes exactly preserve the quiescent flow – h + z = const. and v = 0. Furthermore, for both the WENO-LLF and WENO-RF versions it can be easily verified that

f^{(p)}_{i+1/2} = f^{(p)}_{i+1/2,\mathrm{exact}} + O \left( \Delta x^R \right) .   (3)

However, the estimation of the accuracy of the source term approximations leads to more complicated relations, and experimental tests are needed. Since the extended WENO schemes exactly preserve the quiescent flow regardless of the space step, the error in such computations is only the machine round-off error. Therefore, two types of numerical convergence tests are possible – for nonquiescent steady state flow, where the exact solution is known, and for nonstationary flow, where instead of the exact solution a numerical solution computed on a very fine grid must be used.
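The exact preservation of the quiescent flow rests on a discrete balance between the flux gradient and the source term. The algebraic identity behind it can be checked directly: with h + z = const, the interfacewise difference of the hydrostatic pressure flux \frac{1}{2} g h^2 equals the interfacewise source evaluated with the arithmetic mean depth, which is exactly the structure of G_1 above. A minimal numerical check follows; the domain, the bump, and g = 9.81 are assumptions for illustration only.

```python
import numpy as np

g = 9.81                                      # assumed gravitational constant
x = np.linspace(0.0, 25.0, 101)               # assumed computational interval
z = 0.2 * np.exp(-(0.4 * (x - 10.0)) ** 2)    # a bump, as in test 3.1
h = 1.0 - z                                   # quiescent flow: h + z = 1, v = 0

# interfacewise difference of the pressure part of the momentum flux
dflux = 0.5 * g * (h[1:] ** 2 - h[:-1] ** 2)
# interfacewise source with the arithmetic-mean depth, cf. G_1 in (2)
dsource = -g * 0.5 * (h[1:] + h[:-1]) * (z[1:] - z[:-1])

# the two cancel to machine round-off on every interface
print(np.max(np.abs(dflux - dsource)))
```

The identity \frac{1}{2}(h''^2 - h'^2) = \frac{h' + h''}{2}(h'' - h') together with h'' - h' = -(z'' - z') makes the cancellation exact in exact arithmetic, which is why only round-off remains in the computed residual.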
3.1 A Stationary Shallow Water Flow Test

In the first test the bed level has a bump z(x) = 0.2 \cdot e^{-\left( \frac{2}{5} (x - 10) \right)^2}, while the initial condition is the stationary subcritical flow over the bump with unit discharge hv(x, 0) = 4.42 \, \mathrm{m^3/s}. Experimentally obtained orders of accuracy are presented in Table 1.
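The experimental orders in the tables are computed from errors on successively halved grids as p = \log_2 (e_N / e_{2N}). For example, with the first two L^\infty errors of the balanced Q-scheme from Table 1:

```python
import math

def observed_order(err_coarse, err_fine):
    """Experimental order of accuracy for grids with halved spacing."""
    return math.log2(err_coarse / err_fine)

# balanced Q-scheme, L-infinity errors on 10 and 20 cells (Table 1)
p = observed_order(4.4865e-09, 7.1772e-10)
print(round(p, 2))   # 2.64
```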
3.2 A Nonstationary Shallow Water Flow Test

In the second test the bed level is given by z(x) = 0.1 \cdot e^{-(8x)^2} and the initial condition is still water with the water level H(x, 0) = 1 + 0.4 \cdot e^{-(8x)^2}. Experimentally obtained orders of accuracy are presented in Table 2.
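The data of this test can be set up as follows; the computational interval is an assumption for illustration, the initial velocity is zero, and the depth is h = H - z.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)           # assumed computational interval
z = 0.1 * np.exp(-(8.0 * x) ** 2)         # bed level
H = 1.0 + 0.4 * np.exp(-(8.0 * x) ** 2)   # initial water level
h = H - z                                 # initial water depth
v = np.zeros_like(x)                      # still water: zero velocity
```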
4. Application to one-dimensional linear acoustics equations

The one-dimensional linear acoustics equations are obtained from (1) with

u = \begin{pmatrix} \rho \varepsilon \\ \rho u \end{pmatrix} , \quad f = \begin{pmatrix} -\rho u \\ -\sigma \end{pmatrix} , \quad \text{and} \quad g = \begin{pmatrix} -u \frac{d\rho}{dx} \\ 0 \end{pmatrix} .   (1)

Here \rho = \rho(x) is the media density, \varepsilon is the strain, u is the velocity, and \sigma is the stress, defined through the stress–strain relation \sigma = K \varepsilon, where K = K(x) is the bulk modulus of compressibility. If the definitions

\beta = \frac{K' + K''}{\rho' + \rho''} , \quad \gamma = 1 ,

V_1 = \begin{pmatrix} 0 \\ -\left( \sigma'' - \sigma' \right) \end{pmatrix} , \quad V_2 = \begin{pmatrix} 0 \\ -\left( \rho'' \varepsilon'' - \rho' \varepsilon' \right) \end{pmatrix} ,   (2)

G_1 = G_2 = \begin{pmatrix} -\frac{u' + u''}{2} \left( \rho'' - \rho' \right) \\ 0 \end{pmatrix} ,   (3)
Table 1. Convergence results, test problem 3.1

Method                         N. of cells    L∞ error      L∞ order    L1 error      L1 order
Q-scheme, balanced                  10        4.4865E-09       –        2.0865E-08       –
                                    20        7.1772E-10      2.64      2.6690E-09      2.97
                                    40        1.0164E-10      2.82      3.5307E-10      2.92
                                    80        1.3200E-11      2.94      4.4862E-11      2.98
                                   160        1.6600E-12      2.99      5.6873E-12      2.98
Q-scheme, pointwise                 10        1.2188E-05       –        5.4689E-05       –
                                    20        6.4309E-06      0.92      2.8866E-05      0.92
                                    40        3.2593E-06      0.98      1.4507E-05      0.99
                                    80        1.6352E-06      1.00      7.2710E-06      1.00
                                   160        8.1828E-07      1.00      3.6355E-06      1.00
3rd order ENO-RF, balanced          10        2.4101E-09       –        9.6405E-09       –
                                    20        1.5603E-10      3.95      3.5330E-10      4.77
                                    40        8.2501E-12      4.24      1.8820E-11      4.23
                                    80        4.3010E-13      4.26      8.4135E-13      4.48
                                   160        1.9984E-14      4.43      8.5612E-14      3.30
3rd order ENO-RF, pointwise         10        1.4245E-06       –        5.6992E-06       –
                                    20        2.8191E-07      2.34      9.3514E-07      2.61
                                    40        3.8595E-08      2.87      1.4075E-07      2.73
                                    80        4.9248E-09      2.97      1.9122E-08      2.88
                                   160        6.2017E-10      2.99      2.4855E-09      2.94
5th order ENO-RF, balanced          10        1.7470E-09       –        6.9896E-09       –
                                    20        2.0059E-10      3.12      3.2481E-10      4.43
                                    40        3.1699E-12      5.98      3.7377E-12      6.44
                                    80        4.0190E-14      6.30      2.1885E-13      4.09
                                   160        1.0214E-14      1.98      8.1865E-14      1.42
5th order ENO-RF, pointwise         10        2.7218E-07       –        1.0901E-06       –
                                    20        1.4471E-08      4.23      5.6547E-08      4.27
                                    40        4.4283E-10      5.03      2.8651E-09      4.30
                                    80        2.4160E-11      4.20      1.2826E-10      4.48
                                   160        3.6500E-12      2.73      1.5856E-11      3.02
3rd order WENO-RF, balanced         10        1.7379E-09       –        8.1335E-09       –
                                    20        1.5667E-10      3.47      4.3867E-10      4.21
                                    40        7.2700E-12      4.43      1.8823E-11      4.54
                                    80        2.6001E-13      4.81      7.6616E-13      4.62
                                   160        1.9984E-14      3.70      9.3106E-14      3.04
3rd order WENO-RF, pointwise        10        3.9226E-06       –        1.5031E-05       –
                                    20        2.0979E-06      0.90      5.8593E-06      1.36
                                    40        1.0661E-06      0.98      1.5730E-06      1.90
                                    80        5.3372E-07      1.00      3.8640E-07      2.03
                                   160        2.4952E-07      1.10      8.6037E-08      2.17
Table 2. Convergence results, test problem 3.2

Method                         N. of cells    L∞ error      L∞ order    L1 error      L1 order
6th order ENO-RF, balanced          10        7.6543E-06       –        3.0649E-06       –
                                    20        4.4605E-06      0.78      1.7825E-06      0.78
                                    40        6.7366E-07      2.73      1.3460E-07      3.73
                                    80        3.7350E-08      4.17      4.1762E-09      5.01
                                   160        1.6468E-09      4.50      1.2795E-10      5.03
6th order ENO-RF, pointwise         10        5.7348E-06       –        2.2971E-06       –
                                    20        3.3475E-06      0.78      1.3369E-06      0.78
                                    40        5.0501E-07      2.73      1.0077E-07      3.73
                                    80        2.7769E-08      4.18      3.1010E-09      5.02
                                   160        1.1581E-09      4.58      9.5498E-11      5.02
11th order WENO-RF, balanced        10        3.5436E-05       –        1.4178E-05       –
                                    20        9.9411E-06      1.83      2.2051E-06      2.68
                                    40        4.9194E-07      4.34      6.5971E-08      5.06
                                    80        2.2891E-09      7.75      1.8260E-10      8.50
11th order WENO-RF, pointwise       10        3.5436E-05       –        1.4178E-05       –
                                    20        9.9411E-06      1.83      2.2051E-06      2.68
                                    40        3.6766E-07      4.76      4.9339E-08      5.48
                                    80        2.2891E-09      7.33      1.8260E-10      8.08
2nd order ENO-RF, balanced          10        2.9363E-05       –        1.1748E-05       –
                                    20        1.0225E-05      1.52      4.9860E-06      1.24
                                    40        1.5488E-05     -0.60      3.0963E-06      0.69
                                    80        9.7069E-06      0.67      1.0945E-06      1.50
                                   160        5.4801E-06      0.82      3.4805E-07      1.65
4th order ENO-RF, balanced          10        1.3046E-05       –        5.2216E-06       –
                                    20        7.4715E-06      0.80      2.9870E-06      0.81
                                    40        2.6646E-06      1.49      5.3263E-07      2.49
                                    80        3.9290E-07      2.76      4.8165E-08      3.47
                                   160        6.4296E-08      2.61      4.8336E-09      3.32
3rd order WENO-RF, balanced         10        4.9601E-05       –        1.9844E-05       –
                                    20        2.4373E-05      1.03      4.8747E-06      2.03
                                    40        1.3317E-05      0.87      2.4968E-06      0.97
                                    80        7.2244E-06      0.88      9.7803E-07      1.35
                                   160        3.6601E-06      0.98      2.6906E-07      1.86
7th order WENO-RF, balanced         10        4.0051E-05       –        1.6024E-05       –
                                    20        1.1650E-05      1.78      2.3301E-06      2.78
                                    40        1.4325E-06      3.02      2.6287E-07      3.15
                                    80        8.8643E-08      4.01      6.5593E-09      5.32
                                   160        1.9912E-09      5.48      4.8336E-09      6.23
are taken, the extended WENO schemes exactly conserve all the steady states – \sigma = const. and u = const. – as is proven in [29]. Regarding the estimation of the accuracy of the flux and source term approximations, (3) is valid and experimental tests can be carried out. Since the extended WENO schemes exactly preserve all steady state solutions of the linear acoustics equations, numerical convergence tests are possible only on nonstationary solutions, and there the exact solution must be replaced with a numerical solution computed on a very fine grid.
4.1 A Nonstationary Linear Acoustics Test

In this test problem the sound speed and the impedance Z = \rho c of the media are given by c(x) = 1 - 0.5 \sin(\pi x) and Z(x) = 1, while the initial condition is \sigma(x, 0) = -1 - 1.5 e^{-(8x)^2}, u(x, 0) = 0. The computed orders of accuracy are given in Table 3.
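Since Z = \rho c and c = \sqrt{K / \rho}, the given sound speed and impedance determine the density and the bulk modulus as \rho = Z / c and K = Z c. A small sketch of the test data; the computational interval is an assumption for illustration.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)                   # assumed computational interval
c = 1.0 - 0.5 * np.sin(np.pi * x)                 # sound speed
Z = np.ones_like(x)                               # impedance Z = rho * c
rho = Z / c                                       # media density
K = Z * c                                         # bulk modulus, c = sqrt(K / rho)
sigma0 = -1.0 - 1.5 * np.exp(-(8.0 * x) ** 2)     # initial stress
u0 = np.zeros_like(x)                             # initial velocity
```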
5. Application to the one-dimensional Burgers equation with a source term describing bathymetry

If in (1)

u = u , \quad f = \frac{1}{2} u^2 , \quad \text{and} \quad g = -u \frac{dz}{dx} ,   (1)

are used, the resulting equation is the one-dimensional Burgers equation with a source term describing bathymetry. Here, z = z(x) is a given function describing the bathymetry. For this application of the extended WENO schemes the definitions

V_1 = V_2 = 0 , \quad \beta = 0 , \quad \gamma = \frac{u' + u''}{2} ,

G_1 = -\frac{u' + u''}{2} \left( z'' - z' \right) , \quad G_2 = -\left( z'' - z' \right) ,   (2)

lead to the exact conservation property for steady states of the type u + z = const. The proof is analogous to the proof of this property for the one-dimensional shallow water equations [28]. Analogous conclusions are also valid for the estimation of the order of accuracy of this application, as in the shallow water equations case. Experimentally obtained orders of accuracy in the stationary and nonstationary tests are almost identical to the ones presented in Sections 3.1 and 3.2, therefore tables with results are omitted in this section.
6. Concluding remarks

It is known [4] that the balanced Q-scheme is first order accurate, but that in stationary shallow water flow problems it performs like a second order accurate scheme. This fact is confirmed by the convergence results in Table 1. In Table 1 the same unusual property is demonstrated for the extended WENO schemes. In fact, while the experimentally established orders of the pointwise
Table 3. Convergence results, test problem 4.1

Method                         N. of cells    L∞ error      L∞ order    L1 error      L1 order
6th order ENO-RF, balanced          10        3.9867E-04       –        1.5936E-04       –
                                    20        6.2253E-05      2.68      2.1093E-05      2.92
                                    40        6.6501E-06      3.23      1.3028E-06      4.02
                                    80        2.5451E-07      4.71      3.5502E-08      5.20
                                   160        1.0695E-08      4.57      1.0001E-09      5.15
6th order ENO-RF, pointwise         10        3.5749E-04       –        1.4702E-04       –
                                    20        6.2253E-05      2.52      2.1093E-05      2.80
                                    40        6.6541E-06      3.23      1.3024E-06      4.02
                                    80        2.5483E-07      4.71      3.5472E-08      5.20
                                   160        1.0670E-08      4.58      9.9652E-10      5.15
11th order WENO-RF, balanced        10        3.8744E-04       –        1.5487E-04       –
                                    20        9.2013E-05      2.07      2.1121E-05      2.87
                                    40        2.5937E-06      5.15      5.2809E-07      5.32
                                    80        7.4077E-09      8.45      7.1653E-10      9.53
11th order WENO-RF, pointwise       10        3.8744E-04       –        1.5487E-04       –
                                    20        9.2010E-05      2.07      2.1120E-05      2.87
                                    40        2.5920E-06      5.15      5.2780E-07      5.32
                                    80        6.9258E-09      8.55      6.7121E-10      9.62
3rd order ENO-RF, balanced          10        4.9500E-04       –        1.9785E-04       –
                                    20        1.4359E-04      1.79      4.8663E-05      2.02
                                    40        5.3350E-05      1.43      1.0571E-05      2.20
                                    80        1.2019E-05      2.15      1.7468E-06      2.60
                                   160        1.6934E-06      2.83      2.6344E-07      2.73
5th order ENO-RF, balanced          10        1.1740E-04       –        4.6971E-05       –
                                    20        7.1203E-05      0.72      2.4430E-05      0.94
                                    40        1.0487E-05      2.76      2.0794E-06      3.55
                                    80        8.4816E-07      3.63      1.1383E-07      4.19
                                   160        3.2901E-08      4.69      5.1997E-09      4.45
5th order WENO-RF, balanced         10        4.9500E-04       –        1.9785E-04       –
                                    20        1.8093E-04      1.45      4.4898E-05      2.14
                                    40        2.4973E-05      2.86      7.0921E-06      2.66
                                    80        2.1678E-06      3.53      4.8606E-07      3.87
                                   160        1.2825E-07      4.08      1.9299E-08      4.65
9th order WENO-RF, balanced         10        4.1693E-04       –        1.6665E-04       –
                                    20        1.0525E-04      1.99      2.1045E-05      2.99
                                    40        5.7218E-06      4.20      1.1414E-06      4.20
                                    80        2.1076E-08      8.08      4.1107E-09      8.12
                                   160        8.6470E-11      7.93      1.0716E-11      8.58
versions are comparable to the expected theoretical orders, the extended, i.e., balanced versions exhibit an even better order of accuracy. Furthermore, the results in Table 2 show that the balanced and pointwise versions produce almost identical errors and orders of accuracy in the case of nonstationary shallow water flow, and that these experimentally obtained orders approach the expected theoretical ones. Finally, the same conclusions can be drawn from the nonstationary linear acoustics test, i.e., from the convergence results in Table 3. Naturally, in the linear acoustics case convergence measurements cannot be performed on steady state solutions, since the extended WENO schemes have the exact conservation property for all steady state solutions of the linear acoustics equations, and that property overrides any convergence measurement. All the presented results lead to the conclusion that the modifications introduced in the WENO schemes in order to obtain schemes that respect the balance between the flux gradient and the source term do not deteriorate the order of accuracy of those schemes. Moreover, in the case of steady state solutions the new schemes demonstrate convergence properties superior to those of the pointwise versions of the same schemes.
References

[1] D. S. Bale, R. J. LeVeque, S. Mitran, and J. A. Rossmanith, A wave propagation method for conservation laws and balance laws with spatially varying flux functions, preprint, (2001)
[2] D. S. Balsara and C. W. Shu, Monotonicity preserving weighted essentially nonoscillatory schemes with increasingly high order of accuracy, J. Comput. Phys. 160, (2000), doi:10.1006/jcph.2000.6443
[3] A. Bermúdez, A. Dervieux, J. A. Désidéri and M. E. Vázquez, Upwind schemes for the two-dimensional shallow water equations with variable depth using unstructured meshes, Comput. Methods Appl. Mech. Eng. 155, 49 (1998)
[4] A. Bermúdez and M. E. Vázquez, Upwind methods for hyperbolic conservation laws with source terms, Comput. Fluids 23(8), 1049 (1994)
[5] M. O. Bristeau and B. Perthame, Transport of pollutant in shallow water using kinetic schemes, ESAIM Proceedings Vol. 10 – CEMRACS, 9 (1999)
[6] J. Burguete and P. García–Navarro, Efficient construction of high-resolution TVD conservative schemes for equations with source terms: application to shallow water flows, Int. J. Numer. Meth. Fluids 37 (2001), doi:10.1002/fld.175
[7] A. Chinnaya and A.-Y. LeRoux, A new general Riemann solver for the shallow water equations, with friction and topography, www.math.ntnu.no/conservation/1999/021.html
[8] N. Crnjarić-Žic, S. Vuković, and L. Sopta, Extension of ENO and WENO schemes to one-dimensional sediment transport equations, Comput. Fluids 33/1, 31 (2003)
[9] P. García–Navarro, F. Alcrudo and J. M. Savirón, 1-D open-channel flow simulation using TVD-McCormack scheme, Journal of Hydraulic Engineering 118, 1359 (1992)
[10] P. García–Navarro and M. E. Vázquez–Cendón, On numerical treatment of the source terms in the shallow water equations, Comput. Fluids 29, 951 (2000)
[11] J. M. Greenberg and A.-Y. LeRoux, A well-balanced scheme for the numerical processing of source terms in hyperbolic equations, SIAM J. Numer. Anal. 33, 1 (1996)
[12] L. Gosse, A well-balanced flux-vector splitting scheme designed for hyperbolic systems of conservation laws with source terms, Comput. Math. Appl. 39, 135 (2000)
[13] L. Gosse, A well-balanced scheme using non-conservative products designed for hyperbolic systems of conservation laws with source terms, Math. Models Methods Appl. Sci. 11, 339 (2001)
[14] A. Harten, P. D. Lax and B. van Leer, On upstream differencing and Godunov-type schemes for hyperbolic conservation laws, SIAM Review 25, 35 (1983)
[15] A. Harten and S. Osher, Uniformly high-order accurate non-oscillatory schemes, I, SIAM J. Numer. Anal. 24, 279 (1987)
[16] A. Harten, B. Engquist, S. Osher, and S. R. Chakravarthy, Uniformly high-order accurate non-oscillatory schemes, III, J. Comput. Phys. 71, 231 (1987)
[17] M. E. Hubbard and P. García–Navarro, Flux difference splitting and the balancing of source terms and flux gradients, J. Comput. Phys. 165, (2000), doi:10.1006/jcph.2000.6603
[18] P. Jenny and B. Müller, Rankine–Hugoniot–Riemann solver considering source terms and multidimensional effects, J. Comput. Phys. 145, 575 (1998)
[19] G. Jiang and C. W. Shu, Efficient implementation of weighted ENO schemes, J. Comput. Phys. 126, 202 (1996)
[20] S. Jin, A steady-state capturing method for hyperbolic systems with geometrical source terms, Math. Model. Numer. Anal. 35, 631 (2001)
[21] R. J. LeVeque, Balancing source terms and flux gradients in high-resolution Godunov methods: the quasi-steady wave-propagation algorithm, J. Comput. Phys. 146, 346 (1998)
[22] X.-D. Liu, S. Osher, and T. Chan, Weighted essentially non-oscillatory schemes, J. Comput. Phys. 115, 200 (1994)
[23] B. Perthame and C. Simeoni, A kinetic scheme for the Saint-Venant system with a source term, CALCOLO 38(4), 201 (2001)
[24] C. W. Shu and S. Osher, Efficient implementation of essentially non-oscillatory shock-capturing schemes, J. Comput. Phys. 77, 439 (1988)
[25] C. W. Shu and S. Osher, Efficient implementation of essentially non-oscillatory shock-capturing schemes, II, J. Comput. Phys. 83, 32 (1989)
[26] C. W. Shu, Essentially non-oscillatory and weighted essentially non-oscillatory shock-capturing schemes for hyperbolic conservation laws, in Advanced Numerical Approximation of Nonlinear Hyperbolic Equations, edited by B. Cockburn, C. Johnson, C. W. Shu, and E. Tadmor, Lecture Notes in Mathematics, Vol. 1697 (Springer-Verlag, Berlin/New York, 1998), p. 325
[27] M. E. Vázquez–Cendón, Improved treatment of source terms in upwind schemes for the shallow water equations in channels with irregular geometry, J. Comput. Phys. 148, 497 (1999)
[28] S. Vuković and L. Sopta, ENO and WENO schemes with the exact conservation property for one-dimensional shallow water equations, J. Comput. Phys. 179, 593 (2002), doi:10.1006/jcph.2002.7076
[29] S. Vuković, N. Crnjarić-Žic, and L. Sopta, WENO schemes for balance laws with spatially varying flux, submitted to J. Comput. Phys., (2003)
[30] S. Vuković and L. Sopta, Upwind schemes with exact conservation property for one-dimensional open channel flow equations, SIAM J. Sci. Comput. 24(5), 1630 (2003)
[31] J. G. Zhou, D. M. Causon, C. G. Mingham, and D. M. Ingram, The surface gradient method for the treatment of source terms in the shallow water equations, J. Comput. Phys. 168, 1 (2001), doi:10.1006/jcph.2000.6670